composer.Algorithm
Algorithms are implemented both in a standalone functional form (see composer.functional) and as class Algorithm for integration with the MosaicML Trainer. This section describes the latter form.
For ease of composability, algorithms in our Trainer are based on the two-way callbacks concept from Howard et al., 2020. Each algorithm implements two methods:

- Algorithm.match(): returns True if the algorithm should be run given the current State and Event.
- Algorithm.apply(): performs an in-place modification of the given State.
For example, a simple algorithm that shortens training:
from composer import Algorithm, State, Event, Logger

class ShortenTraining(Algorithm):

    def match(self, event: Event, state: State) -> bool:
        return event == Event.TRAINING_START

    def apply(self, event: Event, state: State, logger: Logger) -> None:
        state.max_epochs //= 2  # cut training time in half, keeping an integer epoch count
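Once defined, the algorithm is passed to the Trainer, which calls match() at every event and invokes apply() whenever match() returns True. A minimal sketch of the wiring, assuming my_model and my_train_dataloader are a ComposerModel and dataloader you have already constructed (the exact Trainer constructor arguments vary by version; these are illustrative):

from composer import Trainer

trainer = Trainer(
    model=my_model,
    train_dataloader=my_train_dataloader,
    max_epochs=10,
    algorithms=[ShortenTraining()],  # two-way callbacks run inside the training loop
)
trainer.fit()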
For a complete list of algorithms, see composer.algorithms.
For reference, available events include:
Name | Description
---|---
INIT | Immediately after model initialization. Typically used for model surgery.
TRAINING_START | Start of training. For multi-GPU training, runs after the DDP process fork.
EPOCH_START, EPOCH_END | Start and end of an Epoch.
BATCH_START, BATCH_END | Start and end of a batch, inclusive of the optimizer step and any gradient scaling.
AFTER_DATALOADER | Immediately after the dataloader is called. Typically used for on-GPU dataloader transforms.
BEFORE_TRAIN_BATCH, AFTER_TRAIN_BATCH | Before and after the forward-loss-backward computation for a training batch. When using gradient accumulation, these are still called only once.
BEFORE_FORWARD, AFTER_FORWARD | Before and after the call to model.forward().
BEFORE_LOSS, AFTER_LOSS | Before and after the loss computation.
BEFORE_BACKWARD, AFTER_BACKWARD | Before and after the backward pass.
TRAINING_END | End of training.
EVAL_START, EVAL_END | Start and end of evaluation through the validation dataset.
EVAL_BATCH_START, EVAL_BATCH_END | Before and after each batch during evaluation.
EVAL_BEFORE_FORWARD, EVAL_AFTER_FORWARD | Before and after the forward pass during evaluation.
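To make the event/algorithm pairing concrete, here is a sketch of an algorithm hooked to AFTER_DATALOADER. The GPUNormalize class is illustrative (not part of composer.algorithms), and it assumes batches arrive as (inputs, targets) tuples of uint8 images:

from composer import Algorithm, State, Event, Logger

class GPUNormalize(Algorithm):

    def match(self, event: Event, state: State) -> bool:
        # AFTER_DATALOADER fires once the batch is on the device,
        # making it the natural hook for on-GPU dataloader transforms.
        return event == Event.AFTER_DATALOADER

    def apply(self, event: Event, state: State, logger: Logger) -> None:
        inputs, targets = state.batch  # assumes (inputs, targets) batches
        state.batch = (inputs.float() / 255.0, targets)  # scale to [0, 1] on the GPU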
For more information about events, see composer.Event.
- class composer.core.Algorithm(*args, **kwargs)
Base class for algorithms.
Algorithms are pieces of code which run at specific events in the training loop. Algorithms modify the trainer’s state, generally with the effect of improving the model’s quality, or increasing the efficiency and throughput of the training loop.
Algorithms must implement two methods: match(), which returns whether the algorithm should be run given the current event and state, and apply(), which makes an in-place change to the State.

- abstract apply(event: Event, state: State, logger: Logger) → Optional[int]
Applies the algorithm to make an in-place change to the State. Can optionally return an exit code to be stored in a Trace.
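As a sketch of this contract, an apply() might modify the State in place and report an exit code; the ScaleLoss class and its 0.5 factor below are illustrative, and the returned integer is simply recorded in the Trace:

from typing import Optional

from composer import Algorithm, State, Event, Logger

class ScaleLoss(Algorithm):

    def match(self, event: Event, state: State) -> bool:
        return event == Event.AFTER_LOSS

    def apply(self, event: Event, state: State, logger: Logger) -> Optional[int]:
        state.loss = state.loss * 0.5  # in-place modification of the State
        return 0  # optional exit code, stored in this algorithm's Trace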
- property find_unused_parameters: bool
Indicates that the effect of this algorithm may cause some model parameters to be unused.
Used to tell DDP that some parameters will be frozen during training, and hence it should not expect gradients from them. All algorithms which do any kind of parameter freezing should override this property to return True.
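For example, a hypothetical layer-freezing algorithm might override the property as follows (the layer-name prefix is an assumption about the model architecture):

from composer import Algorithm, State, Event, Logger

class FreezeEarlyLayers(Algorithm):

    @property
    def find_unused_parameters(self) -> bool:
        # Frozen parameters produce no gradients, so DDP must be told
        # not to wait for reductions on them.
        return True

    def match(self, event: Event, state: State) -> bool:
        return event == Event.EPOCH_START and state.epoch == 5

    def apply(self, event: Event, state: State, logger: Logger) -> None:
        for name, param in state.model.named_parameters():
            if name.startswith('layer1.'):  # assumes ResNet-style layer names
                param.requires_grad = False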
- abstract match(event: Event, state: State) → bool

Determines whether this algorithm should run, given the current Event and State.

Examples:
To only run on a specific event:
>>> return event == Event.BEFORE_LOSS
Switching based on state attributes:
>>> return state.epoch > 30 and state.world_size == 1
See State for accessible attributes.

- Parameters
event (Event) – The current event.
state (State) – The current state.
- Returns
bool – True if this algorithm should run now.
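Combining the checks above, a complete match() might read as follows (a sketch; the epoch threshold is arbitrary):

def match(self, event: Event, state: State) -> bool:
    # Run just before the loss computation, but only late in training
    # and only when training is single-process.
    return (event == Event.BEFORE_LOSS
            and state.epoch > 30
            and state.world_size == 1)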