composer.algorithms

Modules

composer.algorithms.agc

Adaptive Gradient Clipping clips all gradients in the model based on the ratio of gradient norms to parameter norms.

composer.algorithms.algorithm_hparams

composer.algorithms.algorithm_hparams

composer.algorithms.algorithm_registry

composer.algorithms.algorithm_registry

composer.algorithms.alibi

ALiBi (Attention with Linear Biases; Press et al, 2021) dispenses with position embeddings for tokens in transformer-based NLP models, instead encoding position information by biasing the query-key attention scores proportionally to each token pair's distance.

composer.algorithms.augmix

AugMix (Hendrycks et al, 2020) creates multiple independent realizations of sequences of image augmentations, applies each sequence with random intensity, and returns a convex combination of the augmented images and the original image.

composer.algorithms.blurpool

BlurPool adds anti-aliasing filters to convolutional layers to increase accuracy and invariance to small shifts in the input.

composer.algorithms.channels_last

Changes the memory format of the model to torch.channels_last.

composer.algorithms.colout

Drops a fraction of the rows and columns of an input image.

composer.algorithms.cutmix

CutMix trains the network on non-overlapping combinations of pairs of examples and interpolated targets rather than individual examples and targets.

composer.algorithms.cutout

Cutout is a data augmentation technique that works by masking out one or more square regions of an input image.

composer.algorithms.ema

Exponential moving average (EMA) maintains a moving average of the model parameters and uses the averaged parameters at test time.

composer.algorithms.factorize

Decomposes linear operators into pairs of smaller linear operators.

composer.algorithms.ghost_batchnorm

Replaces batch normalization modules with Ghost Batch Normalization modules that simulate the effect of using a smaller batch size.

composer.algorithms.hparams

composer.algorithms.hparams

composer.algorithms.label_smoothing

Shrinks targets towards a uniform distribution to counteract label noise.

composer.algorithms.layer_freezing

Progressively freeze the layers of the network during training, starting with the earlier layers.

composer.algorithms.mixup

Create new samples using convex combinations of pairs of samples.

composer.algorithms.no_op_model

Replaces the model with a dummy model of type NoOpModelClass.

composer.algorithms.progressive_resizing

Apply Fastai's progressive resizing data augmentation to speed up training.

composer.algorithms.randaugment

Randomly applies a sequence of image data augmentations (Cubuk et al, 2019) to an image.

composer.algorithms.sam

SAM (Foret et al, 2020) wraps an existing optimizer with a SAMOptimizer, which makes the optimizer minimize both the loss value and the loss sharpness. This can improve model generalization and provide robustness to label noise.

composer.algorithms.scale_schedule

Deprecated - do not use.

composer.algorithms.selective_backprop

Selective Backprop prunes minibatches according to the difficulty of the individual training examples, and only computes weight gradients over the pruned subset, reducing iteration time and speeding up training.

composer.algorithms.seq_length_warmup

Sequence length warmup progressively increases the sequence length during training of NLP models.

composer.algorithms.squeeze_excite

Adds Squeeze-and-Excitation blocks (Hu et al, 2019) after the Conv2d modules in a neural network.

composer.algorithms.stochastic_depth

Implements stochastic depth (Huang et al, 2016) for ResNet blocks.

composer.algorithms.swa

Stochastic Weight Averaging (SWA; Izmailov et al, 2018) averages model weights sampled at different times near the end of training.

composer.algorithms.utils

Helper utilities for algorithms.

composer.algorithms.warnings

composer.algorithms.warnings

Efficiency methods for training.

Examples include LabelSmoothing and adding SqueezeExcite blocks, among many others.

Algorithms are implemented in both a standalone functional form (see composer.functional) and as subclasses of Algorithm for integration in the Composer Trainer. The former are easier to integrate piecemeal into an existing codebase. The latter are easier to compose together, since they all have the same public interface and work automatically with the Composer Trainer.
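
As a rough sketch of the difference (the torchvision model and the composer.functional surgery helpers apply_blurpool and apply_squeeze_excite used below are assumptions about the installed version), the functional form modifies an existing model in place:

import composer.functional as cf
from torchvision import models

model = models.resnet18()

# Functional form: apply model surgery directly to an existing torchvision model.
cf.apply_blurpool(model)        # assumed helper: adds anti-aliasing filters to strided convolutions
cf.apply_squeeze_excite(model)  # assumed helper: inserts Squeeze-and-Excitation blocks after Conv2d modules

The class form of the same methods (e.g. BlurPool(), SqueezeExcite()) would instead be constructed and passed to the Trainer's algorithms argument, and the Trainer runs them at the appropriate training events.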

For ease of composability, algorithms in our Trainer are based on the two-way callbacks concept from Howard et al, 2020. Each algorithm implements two methods:

  • Algorithm.match(): returns True if the algorithm should be run given the current State and Event.

  • Algorithm.apply(): performs an in-place modification of the given State.

For example, a simple algorithm that shortens training:

from composer import Algorithm, State, Event, Logger

class ShortenTraining(Algorithm):

    def match(self, state: State, event: Event, logger: Logger) -> bool:
        return event == Event.INIT

    def apply(self, state: State, event: Event, logger: Logger):
        state.max_duration /= 2  # cut training time in half

For more information about events, see Event.
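
To make the control flow concrete, here is a minimal illustrative sketch (not Composer's actual Engine) of how a trainer could drive these two-way callbacks: at each event, every algorithm whose match() returns True has apply() called on the shared training state.

def run_algorithms(event, state, logger, algorithms):
    # Hypothetical dispatch loop: apply each algorithm that matches the current event.
    for algorithm in algorithms:
        if algorithm.match(state, event, logger):
            algorithm.apply(state, event, logger)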

Functions

get_algorithm_registry

composer.algorithms.algorithm_registry.get_algorithm_registry

list_algorithms

composer.algorithms.algorithm_registry.list_algorithms
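
A hedged usage sketch for the two registry helpers above; the assumption is that list_algorithms() returns the registered algorithm names and get_algorithm_registry() returns the underlying name-to-class mapping.

from composer.algorithms import algorithm_registry

# Assumed behavior: enumerate the names under which algorithms are registered.
print(algorithm_registry.list_algorithms())

# Assumed behavior: inspect the mapping from registered names to their classes.
registry = algorithm_registry.get_algorithm_registry()
print(sorted(registry.keys()))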

Classes

AGC

Clips all gradients in the model based on the ratio of gradient norms to parameter norms.

Alibi

ALiBi (Attention with Linear Biases; Press et al, 2021) dispenses with position embeddings and instead directly biases attention matrices such that nearby tokens attend to one another more strongly.

AugMix

AugMix (Hendrycks et al, 2020) creates width sequences of depth image augmentations, applies each sequence with random intensity, and returns a convex combination of the width augmented images and the original image, where width and depth are configurable parameters.

AugmentAndMixTransform

Wrapper module for augmix_image() that can be passed to torchvision.transforms.Compose.

BlurPool

BlurPool adds anti-aliasing filters to convolutional layers to increase accuracy and invariance to small shifts in the input.

ChannelsLast

Changes the memory format of the model to torch.channels_last.

ColOut

Drops a fraction of the rows and columns of an input image and (optionally) a target image.

ColOutTransform

Torchvision-like transform for performing the ColOut augmentation, where random rows and columns are dropped from up to two Torch tensors or two PIL images.

CutMix

CutMix trains the network on non-overlapping combinations of pairs of examples and interpolated targets rather than individual examples and targets.

CutOut

Cutout is a data augmentation technique that works by masking out one or more square regions of an input image.

EMA

Maintains a shadow model with weights that follow the exponential moving average of the trained model weights.

Factorize

Decomposes linear operators into pairs of smaller linear operators.

GhostBatchNorm

Replaces batch normalization modules with Ghost Batch Normalization modules that simulate the effect of using a smaller batch size.

LabelSmoothing

Shrink targets towards a uniform distribution as in Szegedy et al.

LayerFreezing

Progressively freeze the layers of the network during training, starting with the earlier layers.

MixUp

MixUp trains the network on convex combinations of pairs of examples and targets rather than individual examples and targets.

NoOpModel

Runs on Event.INIT and replaces the model with a dummy model of type NoOpModelClass.

ProgressiveResizing

Apply Fastai's progressive resizing data augmentation to speed up training.

RandAugment

Randomly applies a sequence of image data augmentations (Cubuk et al, 2019) to an image.

RandAugmentTransform

Wraps randaugment_image() in a torchvision-compatible transform.

SAM

Adds sharpness-aware minimization (Foret et al, 2020) by wrapping an existing optimizer with a SAMOptimizer.

SWA

Applies Stochastic Weight Averaging (Izmailov et al, 2018).

ScaleSchedule

Deprecated - do not use.

SelectiveBackprop

Selectively backpropagate gradients from a subset of each batch.

SeqLengthWarmup

Progressively increases the sequence length during training.

SqueezeExcite

Adds Squeeze-and-Excitation blocks (Hu et al, 2019) after the Conv2d modules in a neural network.

SqueezeExcite2d

Squeeze-and-Excitation block from Hu et al, 2019.

SqueezeExciteConv2d

Helper class used to add a SqueezeExcite2d module after a Conv2d.

StochasticDepth

Applies Stochastic Depth (Huang et al, 2016) to the specified model.

Hparams

These classes are used with yahp for YAML-based configuration.
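
As a minimal sketch of the intended pattern (assuming the usual yahp convention that each hparams class exposes an initialize_object() method constructing the corresponding algorithm), ChannelsLastHparams is used here because, as noted below, it has no member variables:

from composer.algorithms import ChannelsLastHparams

hparams = ChannelsLastHparams()          # ChannelsLast takes no hyperparameters
algorithm = hparams.initialize_object()  # assumed yahp pattern: builds a ChannelsLast algorithm instance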

AGCHparams

See AGC

AlgorithmHparams

Hyperparameters for algorithms.

AlibiHparams

See Alibi

AugMixHparams

See AugMix

BlurPoolHparams

See BlurPool

ChannelsLastHparams

ChannelsLast has no hyperparameters, so this class has no member variables.

ColOutHparams

See ColOut

CutMixHparams

See CutMix

CutOutHparams

See CutOut

EMAHparams

See EMA

FactorizeHparams

See Factorize

GhostBatchNormHparams

See GhostBatchNorm

LabelSmoothingHparams

See LabelSmoothing

LayerFreezingHparams

See LayerFreezing

MixUpHparams

See MixUp

NoOpModelHparams

composer.algorithms.hparams.NoOpModelHparams

ProgressiveResizingHparams

See ProgressiveResizing

RandAugmentHparams

See RandAugment

SAMHparams

See SAM

SWAHparams

See SWA

ScaleScheduleHparams

See ScaleSchedule

SelectiveBackpropHparams

See SelectiveBackprop

SeqLengthWarmupHparams

composer.algorithms.hparams.SeqLengthWarmupHparams

SqueezeExciteHparams

See SqueezeExcite

StochasticDepthHparams

See StochasticDepth

Methods

  • load_multiple()

  • load()