composer.callbacks.speed_monitor

Monitor throughput during training.

Classes

SpeedMonitor

Logs the training throughput.

class composer.callbacks.speed_monitor.SpeedMonitor(window_size=100)

Bases: composer.core.callback.Callback

Logs the training throughput.

The training throughput, in samples per second, is logged on the BATCH_END event once window_size batches have been seen. The per-epoch average throughput and the total wall-clock train time are also logged on the EPOCH_END event.

Example
>>> from composer import Trainer
>>> from composer.callbacks import SpeedMonitor
>>> # construct the trainer with the SpeedMonitor callback
>>> trainer = Trainer(
...     model=model,
...     train_dataloader=train_dataloader,
...     eval_dataloader=eval_dataloader,
...     optimizers=optimizer,
...     max_duration="1ep",
...     callbacks=[SpeedMonitor(window_size=100)],
... )

The Logger records the training throughput under the following keys:

throughput/step
    Rolling average, over the window_size most recent batches, of the number of samples processed per second.

throughput/epoch
    Number of samples processed per second, averaged over an entire epoch.

wall_clock_train
    Total elapsed training time.
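
For intuition, the rolling average can be computed by keeping a timestamp and sample count for each of the most recent window_size batches. The helper below is a hypothetical sketch of that computation, not the callback's actual implementation:

import time
from collections import deque

class RollingThroughput:
    """Hypothetical sketch: rolling samples/sec over the last window_size batches."""

    def __init__(self, window_size=100):
        # Each entry is (timestamp, num_samples) for one completed batch.
        self.history = deque(maxlen=window_size)

    def batch_end(self, num_samples):
        self.history.append((time.monotonic(), num_samples))

    def throughput(self):
        # Report only once a full window has accumulated, mirroring
        # the window_size threshold on BATCH_END.
        if len(self.history) < self.history.maxlen:
            return None
        elapsed = self.history[-1][0] - self.history[0][0]
        # Count samples completed after the first timestamp in the window.
        samples = sum(n for _, n in list(self.history)[1:])
        return samples / elapsed if elapsed > 0 else None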

Parameters

window_size (int, optional) – Number of batches to use for a rolling average of throughput. Defaults to 100.

load_state_dict(state)

Restores the state of the SpeedMonitor object.

Parameters

state (StateDict) – The state of the object, as previously returned by state_dict().

state_dict()

Returns a dictionary representing the internal state of the SpeedMonitor object.

The returned dictionary is picklable via torch.save().

Returns

StateDict – The state of the SpeedMonitor object.
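
As a usage sketch, the returned state can be checkpointed with torch.save() and later restored via load_state_dict(); the file name below is illustrative.

>>> import torch
>>> from composer.callbacks import SpeedMonitor
>>> monitor = SpeedMonitor(window_size=100)
>>> torch.save(monitor.state_dict(), "speed_monitor.pt")  # illustrative path
>>> restored = SpeedMonitor(window_size=100)
>>> restored.load_state_dict(torch.load("speed_monitor.pt"))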