composer.trainer.ddp
Helpers for running distributed data parallel training.
Functions
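ddp_sync_context | A context manager for handling the DDPSyncStrategy.
prepare_ddp_module | Wraps the module in a torch.nn.parallel.DistributedDataParallel object if running distributed training.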
Classes
DDPSyncStrategy | How and when DDP gradient synchronization should happen.
- class composer.trainer.ddp.DDPSyncStrategy(value)
Bases: composer.utils.string_enum.StringEnum
How and when DDP gradient synchronization should happen.
- SINGLE_AUTO_SYNC
The default behavior for DDP. Gradients are synchronized as they are computed, but only for the final microbatch of a batch. This is the most efficient strategy, but it can lead to errors when find_unused_parameters is set, since different microbatches may use different sets of parameters, leading to an incomplete sync.
- MULTI_AUTO_SYNC
The default behavior for DDP when find_unused_parameters is set. Gradients are synchronized as they are computed for all microbatches. This ensures complete synchronization, but is less efficient than SINGLE_AUTO_SYNC. The efficiency gap is usually small as long as either DDP syncs are a small portion of the trainer's overall runtime or the number of microbatches per batch is relatively small.
- FORCED_SYNC
Gradients are manually synchronized only after all gradients have been computed for the final microbatch of a batch. Like MULTI_AUTO_SYNC, this strategy ensures complete gradient synchronization, but it tends to be slower than MULTI_AUTO_SYNC. This is because syncs can ordinarily happen in parallel with the loss.backward() computation, so they are mostly complete by the time that call finishes. In certain circumstances, however, syncs may take a very long time to complete; if there are also many microbatches per batch, this strategy may be optimal.
- composer.trainer.ddp.ddp_sync_context(state, is_final_microbatch, sync_strategy)
A context manager for handling the DDPSyncStrategy.
- Parameters
is_final_microbatch (bool) – Whether or not the context is being used during the final microbatch of the gradient accumulation steps.
sync_strategy (str | DDPSyncStrategy) – The DDP sync strategy to use. If a string is provided, the string must be one of the values in DDPSyncStrategy.
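A minimal sketch of how a context manager like this could dispatch on the strategy, assuming the wrapped model exposes DDP's `no_sync()` context manager. To keep it runnable without a process group, `RecordingModel` is a hypothetical stand-in that only records whether each backward pass ran under `no_sync()`, and `all_reduce_grads` is a hypothetical callback standing in for the manual gradient all-reduce. The lowercase string values of `SyncStrategy` are also assumed. Passing the argument through the enum constructor is one way to accept either a string or a member while rejecting anything else with `ValueError`.

```python
from __future__ import annotations

import contextlib
from enum import Enum
from typing import Callable, Iterator


class SyncStrategy(str, Enum):
    # Hypothetical stand-in for DDPSyncStrategy; the string values are assumed.
    SINGLE_AUTO_SYNC = "single_auto_sync"
    MULTI_AUTO_SYNC = "multi_auto_sync"
    FORCED_SYNC = "forced_sync"


class RecordingModel:
    """Hypothetical stand-in for a DDP-wrapped model.

    It mimics only DDP's no_sync() context manager and records, for each
    backward pass, whether the automatic all-reduce would have fired.
    """

    def __init__(self) -> None:
        self._in_no_sync = False
        self.events: list[str] = []

    @contextlib.contextmanager
    def no_sync(self) -> Iterator[None]:
        self._in_no_sync = True
        try:
            yield
        finally:
            self._in_no_sync = False

    def backward(self) -> None:
        self.events.append("local" if self._in_no_sync else "auto_sync")


@contextlib.contextmanager
def sync_context(
    model: RecordingModel,
    is_final_microbatch: bool,
    sync_strategy: str | SyncStrategy,
    all_reduce_grads: Callable[[], None],
) -> Iterator[None]:
    # Accepts a member or its string value; anything else raises ValueError.
    strategy = SyncStrategy(sync_strategy)
    if strategy is SyncStrategy.SINGLE_AUTO_SYNC:
        # Suppress the automatic sync except on the final microbatch.
        ctx = contextlib.nullcontext() if is_final_microbatch else model.no_sync()
        with ctx:
            yield
    elif strategy is SyncStrategy.MULTI_AUTO_SYNC:
        # Let DDP sync automatically on every microbatch.
        yield
    else:
        # FORCED_SYNC: never auto-sync; all-reduce explicitly after the final backward.
        with model.no_sync():
            yield
        if is_final_microbatch:
            all_reduce_grads()


model = RecordingModel()
for i in range(3):
    with sync_context(model, i == 2, "forced_sync",
                      lambda: model.events.append("manual_all_reduce")):
        model.backward()
print(model.events)  # ['local', 'local', 'local', 'manual_all_reduce']
```

In a real DDP setup, the manual step would all-reduce each parameter's `.grad` and divide by the world size; the callback here only marks where that would happen.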
- composer.trainer.ddp.prepare_ddp_module(module, find_unused_parameters)
Wraps the module in a torch.nn.parallel.DistributedDataParallel object if running distributed training.
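A sketch of the conditional wrapping described above, written only with public PyTorch APIs (`torch.distributed.is_available`, `torch.distributed.is_initialized`, and the `find_unused_parameters` argument of `DistributedDataParallel`). `maybe_wrap_ddp` is a hypothetical name, not Composer's helper: it simply returns the module unchanged when no process group is initialized, which may not match how `prepare_ddp_module` handles that case.

```python
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel


def maybe_wrap_ddp(module: torch.nn.Module, find_unused_parameters: bool) -> torch.nn.Module:
    """Hypothetical helper: wrap in DDP only when a process group is active."""
    if dist.is_available() and dist.is_initialized():
        # DDP refuses to wrap a module that has no trainable parameters.
        if any(p.requires_grad for p in module.parameters()):
            return DistributedDataParallel(
                module, find_unused_parameters=find_unused_parameters
            )
    return module


model = torch.nn.Linear(4, 2)
wrapped = maybe_wrap_ddp(model, find_unused_parameters=False)
# In a single process with no initialized process group, the module is returned as-is.
print(type(wrapped).__name__)  # Linear
```

Setting `find_unused_parameters=True` is what makes the SINGLE_AUTO_SYNC caveat above relevant, since DDP then expects some parameters to receive no gradient in a given backward pass.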