composer.algorithms.agc.agc

Core adaptive gradient clipping classes and functions.

Functions

apply_agc

Clips all gradients in the model based on the ratio of gradient norms to parameter norms.

Classes

AGC

Clips all gradients in the model based on the ratio of gradient norms to parameter norms.

class composer.algorithms.agc.agc.AGC(clipping_threshold=0.01)

Bases: composer.core.algorithm.Algorithm

Clips all gradients in the model based on the ratio of gradient norms to parameter norms.

Based on <https://arxiv.org/abs/2102.06171>. Computes the norm of each weight unit and the norm of its corresponding gradient. For any unit whose gradient norm is greater than weight_norm * clipping_threshold, the gradient is scaled by (weight_norm / grad_norm) * clipping_threshold, which brings its norm down to exactly weight_norm * clipping_threshold. Norms are computed unit-wise: across each row for weight matrices in MLPs, across each entire filter (channel and spatial dimensions) for CNN kernels, and across the whole vector for biases.
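The rule above can be sketched in NumPy for a single parameter and its gradient. This is a minimal illustration, not Composer's implementation: `unitwise_norm` and `agc_clip` are hypothetical helper names, the code assumes PyTorch's weight layout (output units along dim 0), and the `eps` floor on the weight norm follows the paper's max(||W||, eps) safeguard against zero-initialized weights. Whether Composer applies the same floor is an assumption.

```python
import numpy as np


def unitwise_norm(x: np.ndarray) -> np.ndarray:
    """Per-unit L2 norm, shaped to broadcast against x.

    Assumes PyTorch weight layout, where dim 0 indexes output units:
    rows of a Linear weight (out, in), filters of a Conv2d weight (out, in, kH, kW).
    """
    if x.ndim <= 1:
        # Biases: a single norm over the whole vector.
        return np.sqrt(np.sum(x * x))
    # Weights: reduce over every dimension except the output-unit dimension.
    axes = tuple(range(1, x.ndim))
    return np.sqrt(np.sum(x * x, axis=axes, keepdims=True))


def agc_clip(weight, grad, clipping_threshold=0.01, eps=1e-3):
    """Return grad with each unit's norm capped at clipping_threshold * max(||W||, eps)."""
    # eps floor on the weight norm is an assumption taken from the paper.
    w_norm = np.maximum(unitwise_norm(weight), eps)
    g_norm = unitwise_norm(grad)
    max_norm = clipping_threshold * w_norm
    # Scale factor (clipping_threshold * weight_norm) / grad_norm, guarded against zero gradients.
    scale = max_norm / np.maximum(g_norm, 1e-6)
    # Only units whose gradient norm exceeds the limit are rescaled.
    return np.where(g_norm > max_norm, grad * scale, grad)


rng = np.random.default_rng(0)
weight = rng.normal(size=(4, 3))       # Linear-style weight with 4 output units
grad = 10.0 * rng.normal(size=(4, 3))  # deliberately large gradients
clipped = agc_clip(weight, grad)
```

Because clipping is per unit, one row with an outsized gradient is rescaled without shrinking the gradients of the other rows, unlike clipping by a single global norm.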

Runs on Event.AFTER_TRAIN_BATCH.

Example

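from composer.algorithms import AGC
from composer.trainer import Trainer

# model, train_dataloader, eval_dataloader, and optimizer are assumed
# to be defined elsewhere.
agc_algorithm = AGC()  # default clipping_threshold=0.01
trainer = Trainer(
    model=model,
    train_dataloader=train_dataloader,
    eval_dataloader=eval_dataloader,
    max_duration="1ep",
    algorithms=[agc_algorithm],
    optimizers=[optimizer]
)
trainer.fit()

Parameters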

clipping_threshold (float, optional) – The largest acceptable ratio between gradient norms and parameter norms before clipping is applied. Default: 0.01.
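As a worked example of what the threshold means numerically, the plain-Python helper below computes the factor applied to one unit's gradient from that unit's weight norm and gradient norm. `agc_scale` is a hypothetical name, not part of Composer, and the sketch ignores any small-norm safeguards an actual implementation may use.

```python
def agc_scale(weight_norm: float, grad_norm: float, clipping_threshold: float = 0.01) -> float:
    """Factor applied to one unit's gradient under the clipping rule above."""
    max_norm = clipping_threshold * weight_norm
    if grad_norm <= max_norm:
        return 1.0  # ratio within the threshold: gradient left unchanged
    return max_norm / grad_norm  # rescaled gradient norm equals max_norm


# A unit with weight norm 2.0 tolerates gradient norms up to 0.01 * 2.0 = 0.02.
within = agc_scale(2.0, 0.015)  # under the limit, so the factor is 1.0
clipped = agc_scale(2.0, 0.5)   # over the limit, so the factor is 0.02 / 0.5 = 0.04
```

Raising clipping_threshold loosens the limit proportionally: with 0.1 instead of 0.01, the same unit would tolerate gradient norms up to 0.2.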

apply(event, state, logger)

Apply adaptive gradient clipping to the model's gradients.

match(event, state)

Run on Event.AFTER_TRAIN_BATCH.

composer.algorithms.agc.agc.apply_agc(model, clipping_threshold=0.01)

Clips all gradients in the model based on the ratio of gradient norms to parameter norms.

Example

from composer.algorithms.agc import apply_agc

# Call once gradients exist: after loss.backward() and before optimizer.step().
apply_agc(model=model)

Parameters
  • model (Module) – The model being trained.

  • clipping_threshold (float, optional) – The largest acceptable ratio between gradient norms and parameter norms before clipping is applied. Default: 0.01.
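Outside the Trainer, the functional form has to run at the point where the AGC algorithm's Event.AFTER_TRAIN_BATCH hook fires: after the backward pass and before the optimizer step. The sketch below shows that placement in a plain PyTorch loop. `agc_clip_` is a hypothetical stand-in written with torch.linalg.vector_norm, not Composer's apply_agc, and its eps floor on weight norms is an assumption borrowed from the paper; with Composer installed, apply_agc(model=model) would go at the same spot.

```python
import torch
from torch import nn


@torch.no_grad()
def agc_clip_(parameters, clipping_threshold=0.01, eps=1e-3):
    """Hypothetical stand-in for apply_agc: unit-wise AGC applied to .grad in place."""
    for p in parameters:
        # Skip parameters without gradients; scalar parameters are not handled in this sketch.
        if p.grad is None or p.ndim == 0:
            continue
        # Biases: one norm over the whole vector. Weights: one norm per output unit (dim 0).
        dims = tuple(range(1, p.ndim)) if p.ndim > 1 else (0,)
        w_norm = torch.linalg.vector_norm(p, dim=dims, keepdim=True).clamp_min(eps)
        g_norm = torch.linalg.vector_norm(p.grad, dim=dims, keepdim=True)
        max_norm = clipping_threshold * w_norm
        scale = max_norm / g_norm.clamp_min(1e-6)
        # Rescale only the units whose gradient norm exceeds the limit.
        p.grad.copy_(torch.where(g_norm > max_norm, p.grad * scale, p.grad))


torch.manual_seed(0)
model = nn.Linear(3, 2)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
x, y = torch.randn(8, 3), torch.randn(8, 2)

for _ in range(3):
    optimizer.zero_grad()
    loss = nn.functional.mse_loss(model(x), y)
    loss.backward()                # gradients now exist
    agc_clip_(model.parameters())  # clip here: after backward, before the update
    optimizer.step()
```

Clipping after optimizer.step() would have no effect on that step's update, which is why the hook sits between the backward pass and the step.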