composer.Algorithm

Algorithms are implemented both in a standalone functional form (see composer.functional) and as the class Algorithm for integration with the MosaicML Trainer. This section describes the latter form.

For ease of composability, algorithms in our Trainer are based on the two-way callbacks concept from Howard et al., 2020. Each algorithm implements two methods: match(), which decides whether the algorithm should run at the current event, and apply(), which makes an in-place change to the trainer's State.

For example, a simple algorithm that shortens training:

from composer import Algorithm, Event, Logger, State

class ShortenTraining(Algorithm):

    def match(self, event: Event, state: State) -> bool:
        return event == Event.TRAINING_START

    def apply(self, event: Event, state: State, logger: Logger) -> None:
        state.max_epochs //= 2  # cut training time in half
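
Algorithms are passed to the Trainer via its algorithms argument. Below is a minimal sketch of wiring the example above into a training run, assuming a model and dataloader constructed elsewhere (exact Trainer arguments may vary by version):

from composer import Trainer

trainer = Trainer(
    model=model,                        # a ComposerModel, defined elsewhere
    train_dataloader=train_dataloader,  # defined elsewhere
    max_epochs=10,
    algorithms=[ShortenTraining()],     # the algorithm from the example above
)
trainer.fit()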

For a complete list of algorithms, see composer.algorithms.

For reference, available events include:

  • INIT – Immediately after model initialization, and before creation of optimizers and schedulers. Model surgery typically occurs here.

  • TRAINING_START – Start of training. For multi-GPU training, runs after the DDP process fork.

  • EPOCH_START, EPOCH_END – Start and end of an epoch.

  • BATCH_START, BATCH_END – Start and end of a batch, inclusive of the optimizer step and any gradient scaling.

  • AFTER_DATALOADER – Immediately after the dataloader is called. Typically used for on-GPU dataloader transforms (see the sketch after this list).

  • BEFORE_TRAIN_BATCH, AFTER_TRAIN_BATCH – Before and after the forward-loss-backward computation for a training batch. When using gradient accumulation, these are still called only once per batch.

  • BEFORE_FORWARD, AFTER_FORWARD – Before and after the call to model.forward().

  • BEFORE_LOSS, AFTER_LOSS – Before and after the loss computation.

  • BEFORE_BACKWARD, AFTER_BACKWARD – Before and after the backward pass.

  • TRAINING_END – End of training.

  • EVAL_START, EVAL_END – Start and end of evaluation over the validation dataset.

  • EVAL_BATCH_START, EVAL_BATCH_END – Start and end of processing a single evaluation batch.

  • EVAL_BEFORE_FORWARD, EVAL_AFTER_FORWARD – Before and after the call to model.validate(batch).
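
As an illustration of targeting one of these events, here is a sketch of an AFTER_DATALOADER transform that moves the batch to channels-last memory format on the GPU. The ChannelsLastBatch name and the (inputs, targets) batch structure are assumptions for this example, not part of the library:

import torch

from composer import Algorithm, Event, Logger, State

class ChannelsLastBatch(Algorithm):  # hypothetical example, not a built-in

    def match(self, event: Event, state: State) -> bool:
        return event == Event.AFTER_DATALOADER

    def apply(self, event: Event, state: State, logger: Logger) -> None:
        # assumes the batch is an (inputs, targets) pair of tensors
        inputs, targets = state.batch
        state.batch = (inputs.to(memory_format=torch.channels_last), targets)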

For more information about events, see composer.Event.

class composer.core.Algorithm(*args, **kwargs)

Base class for algorithms.

Algorithms are pieces of code which run at specific events in the training loop. Algorithms modify the trainer’s state, generally with the effect of improving the model’s quality, or increasing the efficiency and throughput of the training loop.

Algorithms must implement two methods: match(), which returns whether the algorithm should be run given the current event and state, and apply(), which makes an in-place change to the State.

abstract apply(event: Event, state: State, logger: Logger) → Optional[int]

Applies the algorithm to make an in-place change to the State.

Can optionally return an exit code to be stored in a Trace.

Parameters
  • event (Event) – The current event.

  • state (State) – The current state.

  • logger (Logger) – A logger to use for logging algorithm-specific metrics.

Returns

int or None – an exit code that is stored in a Trace and made accessible for debugging.
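
For example, apply() can use the exit code to report what it did. A minimal sketch extending the ShortenTraining example above; the specific code values are algorithm-defined, not fixed by the library:

from typing import Optional

from composer import Algorithm, Event, Logger, State

class ShortenTraining(Algorithm):

    def match(self, event: Event, state: State) -> bool:
        return event == Event.TRAINING_START

    def apply(self, event: Event, state: State, logger: Logger) -> Optional[int]:
        if state.max_epochs <= 1:
            return -1  # algorithm-defined code: nothing left to shorten
        state.max_epochs //= 2
        return 0  # algorithm-defined code: training was shortened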

property find_unused_parameters: bool

Indicates whether this algorithm may cause some model parameters to be unused.

Used to tell DDP that some parameters will be frozen during training, so it should not expect gradients for them. Any algorithm that freezes parameters should override this property to return True, as in the sketch below.
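
A minimal sketch of such a freezing algorithm; the FreezeEncoder name, the freezing schedule, and the `encoder` submodule are illustrative assumptions:

from composer import Algorithm, Event, Logger, State

class FreezeEncoder(Algorithm):  # hypothetical example, not a built-in

    @property
    def find_unused_parameters(self) -> bool:
        # frozen parameters produce no gradients, so DDP must not wait for them
        return True

    def match(self, event: Event, state: State) -> bool:
        return event == Event.EPOCH_START and state.epoch == 5  # illustrative schedule

    def apply(self, event: Event, state: State, logger: Logger) -> None:
        # assumes state.model is a torch.nn.Module with an `encoder` submodule
        for param in state.model.encoder.parameters():
            param.requires_grad = False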

abstract match(event: Event, state: State) → bool

Determines whether this algorithm should run, given the current Event and State.

Examples:

To only run on a specific event:

>>> return event == Event.BEFORE_LOSS

Switching based on state attributes:

>>> return state.epoch > 30 and state.world_size == 1
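
Running periodically during training (assuming a state.step attribute that counts optimizer steps):

>>> return event == Event.BATCH_END and state.step % 100 == 0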

See State for accessible attributes.

Parameters
  • event (Event) – The current event.

  • state (State) – The current state.

Returns

bool – True if this algorithm should run now.