🤖 Algorithms

Composer has a curated collection of speedup methods (“Algorithms”) that can be composed to easily create efficient training recipes.

Below is a brief overview of the algorithms currently in Composer. For more detailed information about each algorithm, see the method cards, also linked in the table. Each algorithm has a functional implementation intended for use with your own training loop and an implementation intended for use with Composer's trainer.

| Name | tl;dr | Functional |
| --- | --- | --- |
| ChannelsLast | Uses channels last memory format (NHWC) | apply_channels_last() |
| CutMix | Combines pairs of examples in non-overlapping regions and mixes labels | cutmix_batch() |
| SWA | Computes a running average of model weights | |
| ColOut | Removes columns and rows from the image for augmentation and efficiency | colout_batch() |
| ScaleSchedule | Scales the learning rate schedule by a factor | |
| SAM | SAM optimizer that measures the sharpness of the optimization space | |
| Alibi | Replaces attention with ALiBi | apply_alibi() |
| SeqLengthWarmup | Progressively increases the sequence length | set_batch_sequence_length() |
| MixUp | Blends pairs of examples and labels | mixup_batch() |
| SqueezeExcite | Replaces eligible layers with Squeeze-Excite layers | apply_squeeze_excite() |
| StochasticDepth | Replaces a specified layer with a stochastic version that randomly drops the layer or individual samples during training | apply_stochastic_depth() |
| Factorize | Factorizes GEMMs into smaller GEMMs | apply_factorization() |
| GhostBatchNorm | Uses smaller samples to compute batchnorm | apply_ghost_batchnorm() |
| SelectiveBackprop | Drops examples with small loss contributions | selective_backprop() |
| ProgressiveResizing | Increases the input image size during training | resize_batch() |
| RandAugment | Applies a series of random augmentations | randaugment_image() |
| BlurPool | Applies blur before pooling or downsampling | apply_blurpool() |
| LayerFreezing | Progressively freezes layers during training | freeze_layers() |
| LabelSmoothing | Smooths the labels with a uniform prior | smooth_labels() |
| AugMix | Image-preserving data augmentations | augmix_image() |
| CutOut | Randomly erases rectangular blocks from the image | cutout_batch() |
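Each method in the table exposes both a functional form and an Algorithm class for the trainer. As a rough sketch, label smoothing can be applied either way; here, logits, target, loss_fn, model, and the dataloaders stand in for your own objects, and the smoothing argument name is an assumption that may differ between Composer versions:

from composer import Trainer
from composer import functional as cf
from composer.algorithms import LabelSmoothing

# Functional form: smooth the targets yourself, inside your own training loop.
smoothed_target = cf.smooth_labels(logits, target, smoothing=0.1)
loss = loss_fn(logits, smoothed_target)  # the loss must accept soft targets

# Trainer form: pass the Algorithm object and Composer applies it during training.
trainer = Trainer(
    model=model,
    algorithms=[LabelSmoothing(smoothing=0.1)],
    train_dataloader=train_dataloader,
    eval_dataloader=test_dataloader,
    max_duration='10ep',
)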

Functional API

The simplest way to use Composer's algorithms is via the functional API. Composer's algorithms can be grouped into three broad classes:

  • data augmentations add additional transforms to the training data.

  • model surgery algorithms modify the network architecture.

  • training loop modifications change various properties of the training loop.

Data Augmentations

Data augmentations can be inserted into your dataset.transforms, similar to Torchvision's transforms. For example, with 🎲 RandAugment:

import torch
from torchvision import datasets, transforms

from composer import functional as cf

# CIFAR-10 per-channel statistics for normalization
mean = (0.4914, 0.4822, 0.4465)
std = (0.2470, 0.2435, 0.2616)

c10_transforms = transforms.Compose([cf.randaugment_image,  # <---- Add RandAugment
                                     transforms.ToTensor(),
                                     transforms.Normalize(mean, std)])

dataset = datasets.CIFAR10('../data',
                           train=True,
                           download=True,
                           transform=c10_transforms)
dataloader = torch.utils.data.DataLoader(dataset, batch_size=1024)

Some augmentations, such as ✂️ CutMix, act on a batch of inputs. Insert these in your training loop after a batch is loaded from the dataloader:

from composer import functional as cf

cutmix_alpha = 1
num_classes = 10
for batch_idx, (data, target) in enumerate(dataloader):
    data = cf.cutmix_batch(
        data,
        target,
        alpha=cutmix_alpha,
        num_classes=num_classes
    )
    optimizer.zero_grad()
    output = model(data)
    loss = loss_fn(output, target)
    loss.backward()
    optimizer.step()

Model Surgery

Model surgery algorithms make direct modifications to the network itself. For example, applying 🏊 BlurPool inserts a blur layer before strided convolution layers, as demonstrated here:

from composer import functional as cf
import torchvision.models as models

model = models.resnet18()
cf.apply_blurpool(model)

For a transformer model, we can swap out the attention of a 🤗 Transformers GPT-2 model for 🥸 ALiBi attention:

from composer import functional as cf
from composer.algorithms.alibi.gpt2_alibi import _attn
from composer.algorithms.alibi.gpt2_alibi import enlarge_mask

from transformers import GPT2Model
from transformers.models.gpt2.modeling_gpt2 import GPT2Attention


model = GPT2Model.from_pretrained("gpt2")

cf.apply_alibi(
    model=model,
    heads_per_layer=12,
    max_sequence_length=8192,
    position_embedding_attribute="module.transformer.wpe",
    attention_module=GPT2Attention,
    attr_to_replace="_attn",
    alibi_attention=_attn,
    mask_replacement_function=enlarge_mask
)

Training Loop

Methods such as 🏞️ Progressive Image Resizing or ❄️ Layer Freezing apply changes to the training loop. See their method cards for details on how to use them in your own code.
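As a rough sketch of what such a training-loop modification can look like, here is progressive resizing applied manually with the functional API. This assumes cf.resize_batch takes an (input, target) pair plus a scale_factor and returns the resized pair; check the Progressive Resizing method card for the exact signature in your Composer version. The model, optimizer, loss_fn, and dataloader names are the same placeholders used in the earlier examples.

from composer import functional as cf

initial_scale = 0.5  # assumed hyperparameter: start training at half resolution
total_epochs = 10

for epoch in range(total_epochs):
    # Grow the image scale linearly from initial_scale back up to full size.
    scale_factor = initial_scale + (1.0 - initial_scale) * epoch / (total_epochs - 1)
    for data, target in dataloader:
        # Resize the batch before the forward pass.
        data, target = cf.resize_batch(data, target, scale_factor=scale_factor, mode='resize')
        optimizer.zero_grad()
        output = model(data)
        loss = loss_fn(output, target)
        loss.backward()
        optimizer.step()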

Composer Trainer

Building training recipes requires composing all of these different methods together, which is the purpose of our Trainer. Pass a list of algorithm objects to the trainer, and it will automatically run each one at the appropriate time during training, handling any collisions or reordering as needed.

from composer import Trainer
from composer.algorithms import BlurPool, ChannelsLast

trainer = Trainer(
    model=model,
    algorithms=[ChannelsLast(), BlurPool()],
    train_dataloader=train_dataloader,
    eval_dataloader=test_dataloader,
    max_duration='10ep',
)

For more information, see: ⚙️ Using the Trainer and 🚌 Welcome Tour.

Two-way callbacks

The way our algorithms insert themselves into our trainer is based on the two-way callback system developed by Howard et al. (2020). Algorithms interact with the training loop at various Events and effect their changes by modifying the trainer State.
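To make the pattern concrete, here is a minimal sketch of a toy custom algorithm under this system. It assumes the Algorithm and Event interfaces exposed by composer.core and that the batch stored in State is an (inputs, targets) pair; the ScaleInputs class and its behavior are illustrative only, not a built-in method.

from composer.core import Algorithm, Event


class ScaleInputs(Algorithm):
    """Toy algorithm that rescales each input batch right after it is loaded."""

    def __init__(self, scale: float = 0.5):
        self.scale = scale

    def match(self, event, state):
        # The trainer asks every algorithm whether it wants to run at each Event.
        return event == Event.AFTER_DATALOADER

    def apply(self, event, state, logger):
        # Effect the change by modifying the trainer State in place.
        inputs, targets = state.batch  # assumes an (inputs, targets) batch pair
        state.batch = (inputs * self.scale, targets)

An algorithm written this way can be passed to the trainer alongside the built-in methods, e.g. algorithms=[ScaleInputs(), BlurPool()].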