🤖 Algorithms#

Under construction 🚧

Included in the Composer library is a suite of algorithmic speedup methods. These modify the basic training procedure and are intended to be composed together to easily build more efficient training routines. While other libraries may have implementations of some of these methods, the implementations in Composer are specifically written to be combined with other methods.

Below is a brief overview of the algorithms currently in Composer. For more detailed information about each algorithm, see the method cards, also linked in the table. Each algorithm has a functional implementation intended for use with your own training loop and an implementation intended for use with Composer's trainer.

| Name | tldr | functional |
| --- | --- | --- |
| ChannelsLast | Uses channels last memory format (NHWC) | apply_channels_last |
| CutMix | Combines pairs of examples in non-overlapping regions and mixes labels | cutmix_batch |
| SWA | Computes a running average of model weights | |
| ColOut | Removes columns and rows from the image for augmentation and efficiency | colout_batch |
| ScaleSchedule | Scale the learning rate schedule by a factor | |
| SAM | SAM optimizer measures sharpness of optimization space | |
| Alibi | Replace attention with AliBi | apply_alibi |
| SeqLengthWarmup | Progressively increase sequence length | set_batch_sequence_length |
| MixUp | Blends pairs of examples and labels | mixup_batch |
| SqueezeExcite | Replaces eligible layers with Squeeze-Excite layers | apply_squeeze_excite |
| StochasticDepth | Replaces a specified layer with a stochastic version that randomly drops the layer or samples during training | apply_stochastic_depth |
| Factorize | Factorize GEMMs into smaller GEMMs | apply_factorization |
| GhostBatchNorm | Use smaller samples to compute batchnorm | apply_ghost_batchnorm |
| SelectiveBackprop | Drops examples with small loss contributions | selective_backprop |
| ProgressiveResizing | Increases the input image size during training | resize_batch |
| RandAugment | Applies a series of random augmentations | randaugment_image |
| BlurPool | Applies blur before pooling or downsampling | apply_blurpool |
| LayerFreezing | Progressively freezes layers during training | freeze_layers |
| LabelSmoothing | Smooths the labels with a uniform prior | smooth_labels |
| AugMix | Image-preserving data augmentations | augmix_image |
| CutOut | Randomly erases rectangular blocks from the image | cutout_batch |

Functional API#

The simplest way to use Composer's algorithms is through the functional API. Composer's algorithms can be grouped into three broad classes:

  • data augmentations add additional transforms to the training data.

  • model surgery algorithms modify the network architecture.

  • training loop modifications change various properties of the training loop.

Data augmentations can be inserted either into the dataloader as a transform or applied after a batch has been loaded, depending on what the augmentation acts on. Here is an example of using RandAugment with the functional API:

import torch
from torchvision import datasets, transforms

from composer import functional as cf

# CIFAR-10 channel statistics for normalization
mean = (0.4914, 0.4822, 0.4465)
std = (0.2470, 0.2435, 0.2616)

c10_transforms = transforms.Compose([cf.randaugment_image,  # <---- Add RandAugment
                                     transforms.ToTensor(),
                                     transforms.Normalize(mean, std)])

dataset = datasets.CIFAR10('../data',
                           train=True,
                           download=True,
                           transform=c10_transforms)
dataloader = torch.utils.data.DataLoader(dataset, batch_size=1024)

Other data augmentations, such as CutMix, act on a batch of inputs. These can be inserted into the training loop after a batch is loaded from the dataloader as follows:

from composer import functional as cf

cutmix_alpha = 1
num_classes = 10

# model, optimizer, loss_fn, and dataloader are assumed to be defined elsewhere
for batch_idx, (data, target) in enumerate(dataloader):
    ### Insert CutMix here ###
    data = cf.cutmix_batch(data, target, cutmix_alpha, num_classes)
    ### ------------------ ###
    optimizer.zero_grad()
    output = model(data)
    loss = loss_fn(output, target)
    loss.backward()
    optimizer.step()

Model surgery algorithms make direct modifications to the network itself. Functionally, these can be called as follows, using BlurPool as an example:

import torchvision.models as models

from composer import functional as cf

model = models.resnet18()
cf.apply_blurpool(model)
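
Training loop modifications act on quantities computed inside the training loop, such as the targets or the loss. For example, LabelSmoothing can be applied through its functional form, smooth_labels, listed in the table above. The sketch below assumes that smooth_labels takes the logits, the targets, and a smoothing parameter and returns dense smoothed targets; check the LabelSmoothing method card for the exact signature in your version of Composer.

import torch
import torch.nn.functional as F

from composer import functional as cf

# model, optimizer, and dataloader are assumed to be defined elsewhere
for data, target in dataloader:
    optimizer.zero_grad()
    logits = model(data)

    ### Insert LabelSmoothing here ###
    # the `smoothing` keyword is an assumption; some versions name this argument differently
    smoothed_target = cf.smooth_labels(logits, target, smoothing=0.1)
    ### --------------------------- ###

    # cross entropy against the dense (smoothed) targets
    loss = torch.sum(-smoothed_target * F.log_softmax(logits, dim=-1), dim=-1).mean()
    loss.backward()
    optimizer.step()

Because label smoothing only changes how the targets enter the loss, it composes cleanly with the data augmentation and model surgery methods above.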

Each method card has a section describing how to use these methods in your own training loop.

Composer Trainer#

To make full use of Composer, we recommend using our algorithms and trainer together. Using algorithms with the trainer is simple: just pass a list of the algorithms you want to run as the algorithms argument when initializing the trainer. Composer will automatically run each algorithm at the appropriate time during training and handle any collisions and reorderings that are needed.

Here is an example of how to construct the trainer with a few algorithms:

from composer import Trainer
from composer.algorithms.blurpool import BlurPool
from composer.algorithms.channels_last import ChannelsLast

channels_last = ChannelsLast()
blurpool = BlurPool(replace_convs=True,
                    replace_maxpools=True,
                    blur_first=True)

trainer = Trainer(model=model,
                  train_dataloader=train_dataloader,
                  eval_dataloader=test_dataloader,
                  max_duration='90ep',
                  device='gpu',
                  algorithms=[channels_last, blurpool],
                  validate_every_n_epochs=-1,
                  seed=42)

Custom algorithms#

To implement a custom algorithm, you first need to understand how Composer uses events to determine where in the training loop an algorithm runs, and how algorithms can modify the state used for subsequent computations.
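
As an illustration, here is a minimal sketch of a custom algorithm. It assumes Composer's two-method Algorithm interface, in which match decides whether the algorithm should run on a given event and apply performs the modification to the state; the class below is only a hypothetical example, so consult the custom algorithms documentation for the exact interface in your version.

from composer.core import Algorithm, Event

class ScaleLoss(Algorithm):
    """Hypothetical example: scale the loss by a constant factor after it is computed."""

    def __init__(self, scale: float = 0.5):
        self.scale = scale

    def match(self, event, state):
        # Run only after the loss has been computed for the current batch.
        return event == Event.AFTER_LOSS

    def apply(self, event, state, logger):
        # Modify the training state in place; subsequent steps (backward pass,
        # optimizer step) will see the scaled loss.
        state.loss = state.loss * self.scale

An instance of this class can then be passed in the algorithms list when constructing the Trainer, just like the built-in algorithms above.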