composer.algorithms.agc.agc
Core adaptive gradient clipping classes and functions.
Functions
Clips all gradients in model based on ratio of gradient norms to parameter norms.
Classes
Clips all gradients in model based on ratio of gradient norms to parameter norms.
- class composer.algorithms.agc.agc.AGC(clipping_threshold=0.01)
Bases: composer.core.algorithm.Algorithm
Clips all gradients in model based on ratio of gradient norms to parameter norms.
From <https://arxiv.org/abs/2102.06171>. Computes the norm of the weights and the norm of their corresponding gradients, then scales the gradients by (weight_norm / grad_norm) * clipping_threshold for gradients whose norms are greater than weight_norm * clipping_threshold. Norms are taken across rows for weight matrices in MLPs, across entire filters/kernels for CNNs (channel and spatial dimensions), and across the whole vector for biases.
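The clipping rule described above can be sketched in a few lines. This is a minimal pure-Python illustration for a weight matrix whose units are its rows, not the library's implementation (which operates on model tensors). The helper name `clip_rows` is hypothetical, and the sketch adds no epsilon floor on the weight norm, so an all-zero row would have its gradient clipped to zero.

```python
import math


def clip_rows(weights, grads, clipping_threshold=0.01):
    """Return a clipped copy of grads, using one norm per row (unit).

    Hypothetical helper for illustration only. It follows the rule in the
    text: a row's gradient is rescaled by
    (weight_norm / grad_norm) * clipping_threshold whenever
    grad_norm > weight_norm * clipping_threshold.
    """
    clipped = []
    for w_row, g_row in zip(weights, grads):
        weight_norm = math.sqrt(sum(w * w for w in w_row))
        grad_norm = math.sqrt(sum(g * g for g in g_row))
        max_norm = weight_norm * clipping_threshold
        if grad_norm > max_norm:
            # grad_norm > max_norm >= 0 here, so the division is safe.
            scale = max_norm / grad_norm
            clipped.append([g * scale for g in g_row])
        else:
            # Gradient is already small relative to the weights; keep it.
            clipped.append(list(g_row))
    return clipped


weights = [[3.0, 4.0], [1.0, 0.0]]   # row norms: 5.0 and 1.0
grads = [[0.6, 0.8], [0.001, 0.0]]   # row norms: about 1.0 and 0.001
clipped = clip_rows(weights, grads, clipping_threshold=0.01)
```

With these inputs the first row exceeds its limit (5.0 * 0.01 = 0.05) and is rescaled to that norm, while the second row is under its limit (1.0 * 0.01 = 0.01) and is left unchanged.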
Runs on Event.AFTER_TRAIN_BATCH.
Example
from composer.algorithms import AGC
from composer.trainer import Trainer

agc_algorithm = AGC()
trainer = Trainer(
    model=model,
    train_dataloader=train_dataloader,
    eval_dataloader=eval_dataloader,
    max_duration="1ep",
    algorithms=[agc_algorithm],
    optimizers=[optimizer]
)
- Parameters
clipping_threshold (float, optional): The largest acceptable ratio of gradient norm to parameter norm before clipping is applied. Default: 0.01.