composer.callbacks.grad_monitor

Monitor gradients during training.

Classes

GradMonitor    Computes and logs the L2 norm of gradients on the AFTER_TRAIN_BATCH event.

class composer.callbacks.grad_monitor.GradMonitor(log_layer_grad_norms=False)

Bases: composer.core.callback.Callback

Computes and logs the L2 norm of gradients on the AFTER_TRAIN_BATCH event.

L2 norms are calculated after gradients have been reduced across GPUs. Because this callback iterates over every parameter of the model, it may reduce throughput when training large models. When gradients are scaled (e.g., during mixed-precision training), this callback must run after gradient unscaling so that the reported norms are correct.
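For reference, a minimal sketch of the equivalent computation (this is not the callback's internal code; model is assumed to be a torch.nn.Module whose gradients have already been reduced and, if applicable, unscaled):

>>> total_sq = 0.0
>>> for p in model.parameters():
...     if p.grad is not None:
...         # Accumulate the squared L2 norm of this parameter's gradient
...         total_sq += float(p.grad.detach().norm(2)) ** 2
>>> grad_l2_norm = total_sq ** 0.5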

Example

>>> from composer.callbacks import GradMonitor
>>> # constructing trainer object with this callback
>>> trainer = Trainer(
...     model=model,
...     train_dataloader=train_dataloader,
...     eval_dataloader=eval_dataloader,
...     optimizers=optimizer,
...     max_duration="1ep",
...     callbacks=[GradMonitor()],
... )

The L2 norms are logged by the Logger under the following keys:

Key                              Logged data
grad_l2_norm/step                L2 norm of the gradients of all parameters in the model, logged on the AFTER_TRAIN_BATCH event
layer_grad_l2_norm/LAYER_NAME    Layer-wise L2 norms; logged only if log_layer_grad_norms is True (default: False)

Parameters

log_layer_grad_norms (bool, optional) – Whether to log the L2 norm of each layer's gradients. Default: False.
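
To log per-layer norms under layer_grad_l2_norm/LAYER_NAME in addition to the aggregate norm, enable the flag when constructing the callback:

>>> trainer = Trainer(
...     model=model,
...     train_dataloader=train_dataloader,
...     eval_dataloader=eval_dataloader,
...     optimizers=optimizer,
...     max_duration="1ep",
...     callbacks=[GradMonitor(log_layer_grad_norms=True)],
... )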