composer.Algorithm
Algorithms are implemented both in a standalone functional form (see composer.functional) and as class Algorithm for integration with the MosaicML Trainer. This section describes the latter form.
For ease of composability, algorithms in our Trainer are based on the two-way callbacks concept from Howard et al., 2020. Each algorithm implements two methods:

- Algorithm.match(): returns True if the algorithm should be run given the current State and Event.
- Algorithm.apply(): performs an in-place modification of the given State.
For example, a simple algorithm that shortens training:
from composer import Algorithm, State, Event, Logger

class ShortenTraining(Algorithm):

    def match(self, event: Event, state: State) -> bool:
        return event == Event.TRAINING_START

    def apply(self, event: Event, state: State, logger: Logger) -> None:
        state.max_epochs //= 2  # cut training time in half, keeping an integer epoch count
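Once defined, the algorithm is passed to the Trainer, which calls match() at every event and invokes apply() whenever match() returns True. A minimal sketch of the wiring, assuming my_model and my_train_dataloader are a ComposerModel and dataloader you have already constructed (the exact Trainer constructor arguments vary by version; these are illustrative):

from composer import Trainer

trainer = Trainer(
    model=my_model,
    train_dataloader=my_train_dataloader,
    max_epochs=10,
    algorithms=[ShortenTraining()],  # two-way callbacks run inside the training loop
)
trainer.fit()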
For a complete list of algorithms, see composer.algorithms.
For reference, available events include:
Name | Description
---|---
INIT | Immediately after model initialization. Typically used for model surgery.
TRAINING_START | Start of training. For multi-GPU training, runs after the DDP process fork.
EPOCH_START, EPOCH_END | Start and end of an Epoch.
BATCH_START, BATCH_END | Start and end of a batch, inclusive of the optimizer step and any gradient scaling.
AFTER_DATALOADER | Immediately after the dataloader is called. Typically used for on-GPU dataloader transforms.
BEFORE_TRAIN_BATCH, AFTER_TRAIN_BATCH | Before and after the forward-loss-backward computation for a training batch. When using gradient accumulation, these are still called only once.
BEFORE_FORWARD, AFTER_FORWARD | Before and after the call to model.forward().
BEFORE_LOSS, AFTER_LOSS | Before and after the loss computation.
BEFORE_BACKWARD, AFTER_BACKWARD | Before and after the backward pass.
TRAINING_END | End of training.
EVAL_START, EVAL_END | Start and end of evaluation through the validation dataset.
EVAL_BATCH_START, EVAL_BATCH_END | Before and after each batch during evaluation.
EVAL_BEFORE_FORWARD, EVAL_AFTER_FORWARD | Before and after the forward pass during evaluation.
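To make the event/algorithm pairing concrete, here is a sketch of an algorithm hooked to AFTER_DATALOADER. The GPUNormalize class is illustrative (not part of composer.algorithms), and it assumes batches arrive as (inputs, targets) tuples of uint8 images:

from composer import Algorithm, State, Event, Logger

class GPUNormalize(Algorithm):

    def match(self, event: Event, state: State) -> bool:
        # AFTER_DATALOADER fires once the batch is on the device,
        # making it the natural hook for on-GPU dataloader transforms.
        return event == Event.AFTER_DATALOADER

    def apply(self, event: Event, state: State, logger: Logger) -> None:
        inputs, targets = state.batch  # assumes (inputs, targets) batches
        state.batch = (inputs.float() / 255.0, targets)  # scale to [0, 1] on the GPU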
For more information about events, see composer.Event.
- class composer.core.Algorithm(*args, **kwargs)
Base class for algorithms.
Algorithms are pieces of code which run at specific events in the training loop. Algorithms modify the trainer’s state, generally with the effect of improving the model’s quality, or increasing the efficiency and throughput of the training loop.
Algorithms must implement two methods: match(), which returns whether the algorithm should be run given the current event and state, and apply(), which makes an in-place change to the State.

- abstract apply(event: Event, state: State, logger: Logger) → Optional[int]
Applies the algorithm to make an in-place change to the State. Can optionally return an exit code to be stored in a Trace.
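As a sketch of this contract, an apply() might modify the State in place and report an exit code; the ScaleLoss class and its 0.5 factor below are illustrative, and the returned integer is simply recorded in the Trace:

from typing import Optional

from composer import Algorithm, State, Event, Logger

class ScaleLoss(Algorithm):

    def match(self, event: Event, state: State) -> bool:
        return event == Event.AFTER_LOSS

    def apply(self, event: Event, state: State, logger: Logger) -> Optional[int]:
        state.loss = state.loss * 0.5  # in-place modification of the State
        return 0  # optional exit code, stored in this algorithm's Trace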
- property find_unused_parameters: bool
Indicates that the effect of this algorithm may cause some model parameters to be unused.
Used to tell DDP that some parameters will be frozen during training, and hence it should not expect gradients from them. All algorithms which do any kind of parameter freezing should override this property to return True.
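For example, a hypothetical layer-freezing algorithm might override the property as follows (the layer-name prefix is an assumption about the model architecture):

from composer import Algorithm, State, Event, Logger

class FreezeEarlyLayers(Algorithm):

    @property
    def find_unused_parameters(self) -> bool:
        # Frozen parameters produce no gradients, so DDP must be told
        # not to wait for reductions on them.
        return True

    def match(self, event: Event, state: State) -> bool:
        return event == Event.EPOCH_START and state.epoch == 5

    def apply(self, event: Event, state: State, logger: Logger) -> None:
        for name, param in state.model.named_parameters():
            if name.startswith('layer1.'):  # assumes ResNet-style layer names
                param.requires_grad = False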
- abstract match(event: Event, state: State) → bool

Determines whether this algorithm should run, given the current Event and State.

Examples:
To only run on a specific event:
>>> return event == Event.BEFORE_LOSS
Switching based on state attributes:
>>> return state.epoch > 30 and state.world_size == 1
See State for accessible attributes.

- Parameters
event (Event) – The current event.
state (State) – The current state.
- Returns
bool – True if this algorithm should run now.
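Combining the checks above, a complete match() might read as follows (a sketch; the epoch threshold is arbitrary):

def match(self, event: Event, state: State) -> bool:
    # Run just before the loss computation, but only late in training
    # and only when training is single-process.
    return (event == Event.BEFORE_LOSS
            and state.epoch > 30
            and state.world_size == 1)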