composer.State

The State object captures the state of the trainer and is made available to algorithms, which modify it in place during composer.core.algorithm.Algorithm.apply().
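For orientation, the sketch below shows the typical pattern an algorithm uses to read and modify State inside apply(). The ScaleInputs algorithm, its scaling factor, and the import path are illustrative assumptions; the match()/apply() signatures follow the Algorithm interface referenced above, but check them against your Composer version.

    from composer.core import Algorithm, Event, State

    class ScaleInputs(Algorithm):
        """Hypothetical algorithm: rescales input tensors after each dataloader fetch."""

        def __init__(self, factor: float = 0.5):
            self.factor = factor

        def match(self, event: Event, state: State) -> bool:
            # Run immediately after a batch is fetched from the dataloader.
            return event == Event.AFTER_DATALOADER

        def apply(self, event: Event, state: State, logger) -> None:
            # batch_pair raises TypeError if the batch is not an (input, target) pair.
            inputs, targets = state.batch_pair
            state.batch = (inputs * self.factor, targets)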

A summary of available attributes and properties is given below:

Training arguments

model (torch.nn.Module) – Model, typically a subclass of BaseMosaicModel.

train_batch_size (int) – Global batch size for training.

eval_batch_size (int) – Batch size for evaluation.

grad_accum (int) – Gradient accumulation steps. The size of each microbatch is train_batch_size / num_gpus / grad_accum.

max_epochs (int) – Maximum number of epochs.

precision (str | Precision) – Precision; one of [fp32, amp].

precision_context (Callable) – Called with the precision to return a context manager.

Timing Information

epoch (int) – The current epoch.

step (int) – The current step (in terms of optimization steps).

batch_idx (int) – Index of the batch in the current epoch. Not mutable.

steps_per_epoch (int) – Number of optimization steps per epoch. Not mutable.

Training Loop Tensors

batch (Batch) – Batch returned by the dataloader. Currently, a tuple pair of tensors or a dict of tensors is supported.

batch_pair (BatchPair) – Helper property that checks that the batch is a tuple pair of tensors and returns it.

batch_dict (BatchDict) – Helper property that checks that the batch is a dict and returns it.

loss (Tensors) – The last computed loss.

last_batch_size (int) – Batch size returned from the dataloader. This can differ from the current size of the batch tensors if algorithms have modified the batch data.

outputs (Tensors) – Output of the model's forward pass, passed to the model.loss calculation.

Optimizers

optimizers (Optimizer | Tuple[Optimizer]) – Optimizers. Multiple optimizers are not currently supported.

schedulers (Scheduler | Tuple[Scheduler]) – LR schedulers, wrapped in ComposableScheduler.

scaler (torch.cuda.amp.GradScaler) – Gradient scaler for mixed precision.

Dataloaders

train_dataloader (DataLoader) – Dataloader for training.

eval_dataloader (DataLoader) – Dataloader for evaluation.

Algorithms

algorithms (Sequence[Algorithm]) – List of algorithms.

callbacks (Sequence[Callback]) – List of callbacks, including loggers.

Note

To support multi-GPU training, state.model may be wrapped in DistributedDataParallel, and the dataloaders may be wrapped in a device-specific dataloader that handles moving tensors to device.

Note

Schedulers are wrapped in ComposableScheduler, which handles stepping either stepwise or epochwise, and also properly sets up learning rate warmups.

class composer.core.State(model: types.Model, train_batch_size: int, eval_batch_size: int, grad_accum: int, max_epochs: int, precision: Union[str, types.Precision] = <property object>, precision_context: Callable[[Union[str, Precision]], ContextManager] = <factory>, epoch: int = 0, step: int = 0, loss: types.Tensors = <factory>, last_batch_size: int = 0, batch: types.Batch = <factory>, outputs: types.Tensors = <factory>, optimizers: Optional[types.Optimizers] = None, schedulers: Optional[types.Schedulers] = None, scaler: Optional[types.Scaler] = None, train_dataloader: Optional[types.DataLoader] = None, eval_dataloader: Optional[types.DataLoader] = None, algorithms: Sequence[Algorithm] = (), callbacks: Sequence[Callback] = (), world_size: int = 1, nproc_per_node: int = 1, seed: Optional[int] = None)[source]

The current state of the trainer.

Algorithms are able to modify this object in-place.

model

The model, typically a subclass of BaseMosaicModel.

Type

types.Model, often BaseMosaicModel

train_batch_size

The global batch size used for training.

Type

int

eval_batch_size

The batch size used for evaluation.

Type

int

grad_accum

The number of gradient accumulation steps to use. The size of each microbatch is train_batch_size / num_gpus / grad_accum.

Type

int
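To make the microbatch arithmetic concrete, a small worked example (the numbers are illustrative, not defaults):

    train_batch_size = 2048   # global batch size
    num_gpus = 8              # data-parallel processes
    grad_accum = 2            # gradient accumulation steps

    microbatch_size = train_batch_size // num_gpus // grad_accum
    print(microbatch_size)    # 128 samples per forward/backward pass on each GPU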

max_epochs

The maximum number of epochs to train for.

Type

int

precision

The numerical precision to use for training. Should be one of [fp32, amp].

Type

str | Precision

precision_context

Function to produce a context manager to mandate precision.

Type

(precision: Precision) -> ContextManager
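A minimal sketch of how precision_context might be used, assuming a State instance named state is in scope (e.g. inside an algorithm or the training loop); calling the model directly on the full batch is an assumption that holds for typical BaseMosaicModel subclasses:

    # Run the forward pass under the configured numerical precision.
    with state.precision_context(state.precision):
        state.outputs = state.model(state.batch)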

epoch

The index of the current epoch.

Type

int

step

The index of the current step/batch (measured globally).

Type

int

batch

The most recently retrieved batch.

Type

types.Batch

loss

The most recently computed loss.

Type

types.Tensors

last_batch_size

The size of the batch last returned from the dataloader. This can be different from the current size of batch if algorithms have modified the batch.

Type

int

outputs

The most recently computed output from the model’s forward pass.

Type

types.Tensors

optimizers

The optimizers being used to train the model. Multiple optimizers are not currently supported.

Type

Optimizer | Tuple[Optimizer]

schedulers

The learning rate schedulers, wrapped in ComposableScheduler.

Type

Scheduler | Tuple[Scheduler]

scaler

The gradient scaler in use for mixed precision training.

Type

torch.cuda.amp.GradScaler, optional
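For context, the scaler follows the standard torch.cuda.amp.GradScaler protocol. The sketch below mirrors PyTorch's documented AMP pattern rather than Composer's internal loop, and assumes amp is enabled, state.loss is a single tensor, and state.optimizers is a single Optimizer:

    optimizer = state.optimizers               # assumed to be a single Optimizer here

    state.scaler.scale(state.loss).backward()  # scale the loss, then backpropagate
    state.scaler.step(optimizer)               # unscale gradients and call optimizer.step()
    state.scaler.update()                      # adjust the scale factor for the next step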

train_dataloader

The dataloader used for training.

Type

DataLoader

eval_dataloader

The dataloader used for evaluation.

Type

DataLoader

algorithms

The algorithms used for training.

Type

list of Algorithm

callbacks

The callbacks used for training.

Type

list of Callback

property batch_dict: Dict[str, torch.Tensor]

The current batch, represented as a BatchDict.

Raises

TypeError – If the current batch is not a BatchDict.

Type

BatchDict

property batch_idx: int

batch_idx is the index of the batch in the current epoch.

Type

int

property batch_pair: Sequence[Union[torch.Tensor, Tuple[torch.Tensor, ...], List[torch.Tensor]]]

The current batch, represented as a BatchPair.

Raises

TypeError – If the current batch is not a BatchPair.

Type

BatchPair
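A short sketch of the two batch helpers, assuming state is in scope; which one applies depends on what the dataloader yields, and the "input_ids" key is illustrative only:

    # Pair-style batches, e.g. (inputs, targets) tuples:
    inputs, targets = state.batch_pair   # raises TypeError if the batch is not a pair

    # Dict-style batches:
    batch = state.batch_dict             # raises TypeError if the batch is not a dict
    input_ids = batch["input_ids"]       # illustrative key; depends on the dataset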

load_state_dict(state: Dict[str, Any])[source]

Loads the state.

Parameters

state (types.StateDict) – Object returned from a call to state_dict().

state_dict() → Dict[str, Any][source]

Returns the state as a dict.
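A minimal checkpointing sketch using these two methods, assuming a State instance named state; torch.save/torch.load are standard PyTorch, and the on-disk format written by Composer's own checkpointing utilities may differ:

    import torch

    # Capture the full trainer state (model, optimizers, timing information, ...).
    torch.save(state.state_dict(), "state.pt")

    # Later, restore into a State constructed with the same model and optimizers.
    state.load_state_dict(torch.load("state.pt"))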

property steps_per_epoch: int

The number of steps (batches) per epoch.

Type

int
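The timing attributes compose into simple progress calculations; a sketch, assuming state is in scope inside an algorithm or callback:

    # Fraction of the current epoch completed so far.
    epoch_progress = state.batch_idx / state.steps_per_epoch

    # Approximate fraction of the whole run completed, measured in optimization steps.
    total_steps = state.max_epochs * state.steps_per_epoch
    run_progress = state.step / total_steps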