grad_monitor#
Monitor gradients during training.
Classes

GradMonitor
    Computes and logs the L2 norm of gradients on the Event.AFTER_TRAIN_BATCH event.
- class composer.callbacks.grad_monitor.GradMonitor(log_layer_grad_norms=False)[source]#
Bases: composer.core.callback.Callback
Computes and logs the L2 norm of gradients on the Event.AFTER_TRAIN_BATCH event.

L2 norms are calculated after the reduction of gradients across GPUs. Because this callback iterates over all of the model's parameters, it may reduce throughput when training large models. When gradients are scaled (e.g. for mixed-precision training), this callback should run after gradient unscaling to ensure the norms are correct.
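The relationship between the overall norm and the per-layer norms can be sketched in plain Python (an illustrative example, not Composer's implementation; the layer names and gradient values are hypothetical, and real gradients would be parameter tensors):

```python
import math

# Hypothetical per-layer gradients, shown as plain lists of floats.
grads = {
    "linear1.weight": [3.0, 4.0],
    "linear2.weight": [5.0, 12.0],
}

# Per-layer L2 norms, analogous to what is logged under
# layer_grad_l2_norm/LAYER_NAME.
layer_norms = {
    name: math.sqrt(sum(g * g for g in grad))
    for name, grad in grads.items()
}

# The overall norm (analogous to grad_l2_norm/step) is the L2 norm of the
# concatenated gradient vector, i.e. the square root of the sum of the
# squared per-layer norms.
total_norm = math.sqrt(sum(n * n for n in layer_norms.values()))

print(layer_norms["linear1.weight"])  # 5.0
print(layer_norms["linear2.weight"])  # 13.0
print(total_norm)
```

This also shows why the per-layer norms carry strictly more information than the overall norm: the overall norm can always be recovered from them, but not vice versa.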
Example
>>> from composer import Trainer
>>> from composer.callbacks import GradMonitor
>>> # constructing trainer object with this callback
>>> trainer = Trainer(
...     model=model,
...     train_dataloader=train_dataloader,
...     eval_dataloader=eval_dataloader,
...     optimizers=optimizer,
...     max_duration="1ep",
...     callbacks=[GradMonitor()],
... )
The L2 norms are logged by the Logger to the following keys, as described below.

Key                                Logged data
grad_l2_norm/step                  L2 norm of the gradients of all parameters in the model on the Event.AFTER_TRAIN_BATCH event.
layer_grad_l2_norm/LAYER_NAME      Layer-wise L2 norms, logged only if log_layer_grad_norms is True. Default: False.
Parameters
    log_layer_grad_norms (bool, optional) – Whether to also log the L2 norm of each layer's gradients. Default: False.