composer.State
The `State` object captures the state of the trainer and is available for algorithms to modify during `Algorithm.apply()`. A summary of the available attributes and properties is given below:
Attribute | Type | Description
---|---|---
**Training arguments** | |
`model` | `torch.nn.Module` | Model, typically a subclass of `BaseMosaicModel`
`train_batch_size` | `int` | Global batch size for training
`eval_batch_size` | `int` | Batch size for evaluation
`grad_accum` | `int` | Gradient accumulation steps. The size of each microbatch would be `train_batch_size / num_gpus / grad_accum`
`max_epochs` | `int` | Maximum number of epochs
`precision` | `Precision` | Precision, one of `fp32` or `amp`
`precision_context` | `Callable` | Called with the precision to return a contextmanager (see the sketch after this table)
**Timing Information** | |
`epoch` | `int` | The current epoch
`step` | `int` | The current step (in terms of optimization steps)
`batch_idx` | `int` | Index of the batch in the current epoch. Not mutable.
`steps_per_epoch` | `int` | Number of optimization steps per epoch. Not mutable.
**Training Loop Tensors** | |
`batch` | `Batch` | Batch returned by the dataloader. We currently support a tuple of tensors or a dict of tensors.
`batch_pair` | `BatchPair` | Helper to cast `batch` as an `(input, target)` pair
`batch_dict` | `BatchDict` | Helper to cast `batch` as a dictionary
`loss` | `Tensors` | Last computed loss
`last_batch_size` | `int` | Batch size returned from the dataloader. This can be different from the current size of `batch` if it has since been modified.
`outputs` | `Tensors` | Output of the model's forward pass
**Optimizers** | |
`optimizers` | `Optimizer` | Optimizers. Multiple optimizers are not currently supported.
`schedulers` | `Scheduler` | LR schedulers, wrapped in `ComposableScheduler`
`scaler` | `GradScaler` | Gradient scaler for mixed precision
**Dataloaders** | |
`train_dataloader` | `DataLoader` | Dataloader for training
`eval_dataloader` | `DataLoader` | Dataloader for evaluation
**Algorithms** | |
`algorithms` | `Sequence[Algorithm]` | List of algorithms
`callbacks` | `Sequence[Callback]` | List of callbacks, including loggers
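As noted in the table, `precision_context` is called with a precision and returns a context manager. Below is a minimal sketch of how an algorithm might use it together with the other attributes above; whether the model accepts the batch directly like this is an assumption, not something this page specifies:

```python
# Minimal sketch: run a forward pass under the trainer's precision.
# `state` is the composer.State instance passed to Algorithm.apply();
# only attributes from the table above are used. The model's calling
# convention on the batch is an assumption.
with state.precision_context(state.precision):
    state.outputs = state.model(state.batch)
```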
Note
To support multi-GPU training, `state.model` may be wrapped in `DistributedDataParallel`, and the dataloaders may be wrapped in a device-specific dataloader that handles moving tensors to the device.
Note
Schedulers are wrapped in `ComposableScheduler`, which handles stepping either stepwise or epochwise, and also properly sets up learning rate warmups.
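Putting it together, below is a minimal sketch of an algorithm that reads and mutates `State` during `Algorithm.apply()`. The `ScaleLoss` class is hypothetical, and the `match`/`apply` interface, the `Event.AFTER_LOSS` event, and the import path are assumptions that may differ by version; only the `State` attributes come from the table above:

```python
from composer.core import Algorithm, Event, State  # import path is an assumption


class ScaleLoss(Algorithm):
    """Hypothetical algorithm: halve the loss once it has been computed."""

    def match(self, event: Event, state: State) -> bool:
        # Run only once the loss tensor is available on the state.
        return event == Event.AFTER_LOSS

    def apply(self, event: Event, state: State, logger) -> None:
        # `loss` is the last computed loss (see table above); algorithms
        # may modify it before the backward pass.
        state.loss = state.loss * 0.5
```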