composer.State
The `State` object is available for algorithms to modify during `Algorithm.apply()`, and captures the state of the trainer.
A summary of available attributes and properties is given below:
| Attribute | Type | Description |
|---|---|---|
| **Training arguments** | | |
| `model` | `Model` | Model to train, typically a subclass of `torch.nn.Module` |
| `train_batch_size` | `int` | Global batch size for training |
| `eval_batch_size` | `int` | Batch size for evaluation |
| `grad_accum` | `int` | Gradient accumulation steps. The size of each microbatch would be `train_batch_size / num_gpus / grad_accum` |
| `max_epochs` | `int` | Maximum number of epochs |
| `precision` | `Union[str, Precision]` | Precision, one of `[fp32, amp]` |
| `precision_context` | `Callable[[Union[str, Precision]], ContextManager]` | Called with the precision to return a contextmanager |
| **Timing Information** | | |
| `epoch` | `int` | The current epoch |
| `step` | `int` | The current step (in terms of optimization steps) |
| `batch_idx` | `int` | Index of the batch in the current epoch. Not mutable. |
| `steps_per_epoch` | `int` | Number of optimization steps per epoch. Not mutable. |
| **Training Loop Tensors** | | |
| `batch` | `Batch` | Batch returned by the dataloader |
| `batch_pair` | `BatchPair` | Helper to access the batch as a `BatchPair` |
| `batch_dict` | `BatchDict` | Helper to access the batch as a `BatchDict` |
| `loss` | `Tensors` | Last computed loss |
| `last_batch_size` | `int` | Batch size returned from the dataloader. This can be different from the current size of `batch` if algorithms have modified it |
| `outputs` | `Tensors` | Output of the model's forward pass |
| **Optimizers** | | |
| `optimizers` | `Optimizers` | Optimizers. Multiple optimizers are not currently supported |
| `schedulers` | `Schedulers` | LR schedulers, wrapped in `ComposableScheduler` |
| `scaler` | `Scaler` | Gradient scaler for mixed precision |
| **Dataloaders** | | |
| `train_dataloader` | `DataLoader` | Dataloader for training |
| `eval_dataloader` | `DataLoader` | Dataloader for evaluation |
| **Algorithms** | | |
| `algorithms` | `Sequence[Algorithm]` | List of algorithms |
| `callbacks` | `Sequence[Callback]` | List of callbacks, including loggers |
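To make the mutation model concrete, here is a minimal sketch of how an algorithm might modify state in-place, using a simplified stand-in dataclass rather than the real `composer.State` (the `MockState` class and `apply_scale_batch` function are illustrations invented for this example, not Composer APIs). It also shows the microbatch arithmetic from the `grad_accum` row above:

```python
from dataclasses import dataclass, field
from typing import List

# Simplified stand-in for composer.State -- illustration only, not the real class.
@dataclass
class MockState:
    train_batch_size: int
    grad_accum: int
    num_gpus: int = 1
    batch: List[float] = field(default_factory=list)

    @property
    def microbatch_size(self) -> int:
        # Size of each microbatch: train_batch_size / num_gpus / grad_accum
        return self.train_batch_size // self.num_gpus // self.grad_accum

# An algorithm's apply() mutates the state in place:
def apply_scale_batch(state: MockState, factor: float) -> None:
    state.batch = [x * factor for x in state.batch]

state = MockState(train_batch_size=512, grad_accum=4, num_gpus=8, batch=[1.0, 2.0])
apply_scale_batch(state, 2.0)
print(state.microbatch_size)  # 512 / 8 / 4 = 16
print(state.batch)            # [2.0, 4.0]
```

Because algorithms mutate the single shared `State` object rather than returning copies, later algorithms in the pass see the modifications made by earlier ones.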
Note

To support multi-GPU training, `state.model` may be wrapped in `DistributedDataParallel`, and the dataloaders may be wrapped in a device-specific dataloader that handles moving tensors to the device.
Note

Schedulers are wrapped in `ComposableScheduler`, which handles stepping either stepwise or epochwise, and also properly sets up learning rate warmups.
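The stepwise-vs-epochwise dispatch can be sketched as follows. This is a hypothetical illustration of the dispatch idea only (the class names and methods here are invented; the real `ComposableScheduler` has a different interface):

```python
# Hypothetical sketch: step an underlying scheduler either per batch ("step")
# or per epoch ("epoch"). A counting stand-in replaces a real torch scheduler.
class IntervalScheduler:
    def __init__(self, scheduler, interval: str = "epoch"):
        assert interval in ("epoch", "step")
        self.scheduler = scheduler
        self.interval = interval

    def on_batch_end(self) -> None:
        if self.interval == "step":
            self.scheduler.step()

    def on_epoch_end(self) -> None:
        if self.interval == "epoch":
            self.scheduler.step()

class CountingScheduler:
    """Stand-in for a torch LR scheduler; just counts step() calls."""
    def __init__(self):
        self.steps = 0
    def step(self) -> None:
        self.steps += 1

sched = CountingScheduler()
wrapper = IntervalScheduler(sched, interval="step")
for _ in range(3):          # three batches
    wrapper.on_batch_end()
wrapper.on_epoch_end()      # no-op for a stepwise scheduler
print(sched.steps)          # 3
```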
- class composer.State(model: types.Model, train_batch_size: int, eval_batch_size: int, grad_accum: int, max_epochs: int, precision: Union[str, types.Precision] = <property object>, precision_context: Callable[[Union[str, Precision]], ContextManager] = <factory>, epoch: int = 0, step: int = 0, loss: types.Tensors = <factory>, last_batch_size: int = 0, batch: types.Batch = <factory>, outputs: types.Tensors = <factory>, optimizers: Optional[types.Optimizers] = None, schedulers: Optional[types.Schedulers] = None, scaler: Optional[types.Scaler] = None, train_dataloader: Optional[types.DataLoader] = None, eval_dataloader: Optional[types.DataLoader] = None, algorithms: Sequence[Algorithm] = (), callbacks: Sequence[Callback] = ())[source]
The class used to store the state of the trainer.
Contains variables that the trainer tracks throughout the training loop. Note that the entire state is serialized when the trainer is checkpointed so that it can be used to restore the trainer and continue training from a checkpoint. Algorithms are able to modify this object in-place.
- grad_accum
The number of gradient accumulation steps to use. The size of each microbatch is `train_batch_size / num_gpus / grad_accum`.
- Type
int
- precision
The numerical precision to use for training. Should be one of `[fp32, amp]`.
- precision_context ((precision: Precision) -> ContextManager): Function to produce a context manager to mandate precision.
- last_batch_size
The size of the batch last returned from the dataloader. This can be different from the current size of `batch` if algorithms have modified the `batch`.
- Type
int
- optimizers
The optimizers being used to train the model. Multiple optimizers are not currently supported.
- Type
Optimizers, optional
- schedulers
The learning rate schedulers, typically wrapped in `ComposableScheduler`.
- Type
Schedulers, optional
- scaler
The gradient scaler in use for mixed precision training.
- Type
GradScaler, optional
- train_dataloader
The dataloader used for training.
- Type
DataLoader, optional
- eval_dataloader
The dataloader used for evaluation.
- Type
DataLoader, optional
- property batch_pair: Union[Tuple[Union[Tensor, Tuple[Tensor, ...], List[Tensor]], Union[Tensor, Tuple[Tensor, ...], List[Tensor]]], List[Tensor]]
The current batch, represented as a `BatchPair`.
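Per the annotation above, a `BatchPair` is a two-element (inputs, targets) structure. The helper below is a hypothetical sketch of that shape check, not Composer code, with plain lists standing in for tensors:

```python
# Sketch: validate that a batch has BatchPair shape, i.e. exactly two
# elements (inputs, targets). Lists stand in for torch Tensors here.
def as_batch_pair(batch):
    """Return the batch as an (inputs, targets) pair, or raise TypeError."""
    if isinstance(batch, (tuple, list)) and len(batch) == 2:
        inputs, targets = batch
        return inputs, targets
    raise TypeError("batch is not a BatchPair")

inputs, targets = as_batch_pair(([1, 2, 3], [0, 1, 1]))
print(inputs)   # [1, 2, 3]
print(targets)  # [0, 1, 1]
```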
- load_state_dict(state: Dict[str, Any])[source]
Loads the state.
- Parameters
state (StateDict) – the object returned from a call to `state_dict()`.
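The save/restore round-trip follows the usual `state_dict()` / `load_state_dict()` pattern. A minimal sketch of that pattern (the `TinyState` class is invented for illustration; the real `composer.State` serializes far more, including the model, optimizers, and timing information):

```python
from typing import Any, Dict

# Minimal sketch of the state_dict()/load_state_dict() round-trip pattern.
class TinyState:
    def __init__(self, epoch: int = 0, step: int = 0):
        self.epoch = epoch
        self.step = step

    def state_dict(self) -> Dict[str, Any]:
        # Serialize everything needed to resume training.
        return {"epoch": self.epoch, "step": self.step}

    def load_state_dict(self, state: Dict[str, Any]) -> None:
        # Restore from an object previously returned by state_dict().
        self.epoch = state["epoch"]
        self.step = state["step"]

saved = TinyState(epoch=3, step=1200).state_dict()
restored = TinyState()
restored.load_state_dict(saved)
print(restored.epoch, restored.step)  # 3 1200
```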