composer.State

The State object captures the state of the trainer and is available for algorithms to modify during Algorithm.apply().
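As a rough illustration of that pattern, the sketch below shows an algorithm reading and overwriting a field on state. The ScaleLoss class and its 0.5 factor are made up for this example, and the import paths may differ between Composer versions; it also assumes the loss is a single tensor.

```python
# Illustrative sketch only: ScaleLoss is a made-up algorithm; import paths
# may vary across Composer versions.
from composer.core import Algorithm, Event, State


class ScaleLoss(Algorithm):
    """Scales the computed loss by a constant factor."""

    def __init__(self, factor: float = 0.5):
        self.factor = factor

    def match(self, event: Event, state: State) -> bool:
        # Only run after the loss has been computed for the current batch.
        return event == Event.AFTER_LOSS

    def apply(self, event: Event, state: State, logger) -> None:
        # state.loss holds the last computed loss; this assumes a single tensor.
        state.loss = state.loss * self.factor
```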

A summary of available attributes and properties is given below:

Training arguments

| Attribute | Type | Description |
| --- | --- | --- |
| model | torch.nn.Module | Model, typically a subclass of BaseMosaicModel. |
| train_batch_size | int | Global batch size for training. |
| eval_batch_size | int | Batch size for evaluation. |
| grad_accum | int | Number of gradient accumulation steps. The size of each microbatch is train_batch_size / num_gpus / grad_accum. |
| max_epochs | int | Maximum number of epochs. |
| precision | str or Precision | Precision, one of [fp32, amp]. |
| precision_context | Callable | Called with the precision; returns a context manager. |
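A minimal sketch of how precision and precision_context fit together, assuming a state instance is in scope (e.g. inside an algorithm or callback) and that the model's forward accepts the batch directly; the helper name is hypothetical:

```python
def forward_in_precision(state):
    """Run a forward pass under the trainer's precision context (illustrative)."""
    with state.precision_context(state.precision):
        return state.model(state.batch)
```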

Timing Information

| Attribute | Type | Description |
| --- | --- | --- |
| epoch | int | The current epoch. |
| step | int | The current step (in terms of optimization steps). |
| batch_idx | int | Index of the batch in the current epoch. Not mutable. |
| steps_per_epoch | int | Number of optimization steps per epoch. Not mutable. |
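Algorithms that anneal a hyperparameter over the course of training commonly derive a progress fraction from these timing fields; a minimal sketch (the helper name is hypothetical):

```python
def training_progress(state) -> float:
    """Fraction of training completed, measured in optimization steps."""
    total_steps = state.max_epochs * state.steps_per_epoch
    return state.step / total_steps
```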

Training Loop Tensors

| Attribute | Type | Description |
| --- | --- | --- |
| batch | Batch | Batch returned by the dataloader. We currently support a tuple pair of tensors, or a dict of tensors. |
| batch_pair | BatchPair | Helper property that checks that the batch is a tuple pair of tensors, and returns the batch. |
| batch_dict | BatchDict | Helper property that checks that the batch is a dict, and returns the batch. |
| loss | Tensors | Last computed loss. |
| last_batch_size | int | Batch size returned from the dataloader. This can differ from the current size of the Batch tensors if algorithms have modified the batch data. |
| outputs | Tensors | Output of the model's forward pass; outputs is passed to the model.loss calculation. |
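For example, an algorithm that modifies an (input, target) batch can read it through batch_pair and write the result back to state.batch. The sketch below is illustrative only: the helper name is hypothetical, and it assumes the inputs are a single tensor; batch_pair raises if the batch is not a tuple pair.

```python
def keep_first_half(state) -> None:
    """Drop the second half of each (input, target) batch (illustrative)."""
    inputs, targets = state.batch_pair
    keep = inputs.shape[0] // 2
    state.batch = (inputs[:keep], targets[:keep])
```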

Optimizers

| Attribute | Type | Description |
| --- | --- | --- |
| optimizers | Optimizer or Tuple[Optimizer] | Optimizers. Multiple optimizers are not currently supported. |
| schedulers | Scheduler or Tuple[Scheduler] | LR schedulers, wrapped in ComposableScheduler. |
| scaler | torch.cuda.amp.GradScaler | Gradient scaler for mixed precision. |
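Because optimizers may be either a single Optimizer or a tuple, code that touches it typically normalizes to a tuple first; a sketch with a hypothetical helper:

```python
def set_lr(state, lr: float) -> None:
    """Set the learning rate on every param group of every optimizer (illustrative)."""
    optimizers = state.optimizers if isinstance(state.optimizers, tuple) else (state.optimizers,)
    for optimizer in optimizers:
        for group in optimizer.param_groups:
            group["lr"] = lr
```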

Dataloaders

| Attribute | Type | Description |
| --- | --- | --- |
| train_dataloader | DataLoader | Dataloader for training. |
| eval_dataloader | DataLoader | Dataloader for evaluation. |
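A small sketch of reading from these fields, for example to report how many batches one epoch yields; it assumes the dataloader defines a length, and the helper name is hypothetical:

```python
def batches_per_epoch(state) -> int:
    """Number of batches the training dataloader yields per epoch (illustrative)."""
    return len(state.train_dataloader)
```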

Algorithms

| Attribute | Type | Description |
| --- | --- | --- |
| algorithms | Sequence[Algorithm] | List of algorithms. |
| callbacks | Sequence[Callback] | List of callbacks, including loggers. |
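state.algorithms can also be inspected to detect interactions with other active algorithms; a hedged sketch, where the class-name comparison and helper name are purely illustrative:

```python
def has_algorithm(state, name: str) -> bool:
    """Check whether an algorithm with the given class name is active (illustrative)."""
    return any(type(algorithm).__name__ == name for algorithm in state.algorithms)
```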

Note

To support multi-GPU training, state.model may be wrapped in DistributedDataParallel, and the dataloaders may be wrapped in a device-specific dataloader that handles moving tensors to device.
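Algorithms that need the underlying module (for example, to swap out layers) therefore often unwrap DistributedDataParallel first; a minimal sketch with a hypothetical helper name:

```python
import torch


def unwrap_model(state) -> torch.nn.Module:
    """Return the underlying module, unwrapping DistributedDataParallel if present."""
    model = state.model
    if isinstance(model, torch.nn.parallel.DistributedDataParallel):
        model = model.module
    return model
```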

Note

Schedulers are wrapped in ComposableScheduler, which handles stepping either stepwise or epochwise, and also properly sets up learning rate warmups.