gradient_clipping#

Core gradient clipping classes and functions.

Functions

apply_gradient_clipping

Clips all gradients in the model based on the specified clipping_type.

Classes

GradientClipping

Clips all gradients in the model based on the specified clipping_type.

class composer.algorithms.gradient_clipping.gradient_clipping.GradientClipping(clipping_type, clipping_threshold)[source]#

Bases: composer.core.algorithm.Algorithm

Clips all gradients in the model based on the specified clipping_type.

Runs on Event.AFTER_TRAIN_BATCH.

Example

from composer.algorithms import GradientClipping
from composer.trainer import Trainer
gc = GradientClipping(clipping_type='norm', clipping_threshold=0.1)
trainer = Trainer(
    model=model,
    train_dataloader=train_dataloader,
    eval_dataloader=eval_dataloader,
    max_duration="1ep",
    algorithms=[gc],
    optimizers=[optimizer]
)
Parameters
  • clipping_type ('adaptive', 'norm', 'value') – String denoting which type of gradient clipping to perform. The options are: 'norm', which clips the gradient norm using torch.nn.utils.clip_grad_norm_; 'value', which clips each gradient to a specified value using torch.nn.utils.clip_grad_value_; and 'adaptive', which clips each gradient based on its gradient-norm-to-parameter-norm ratio using composer.algorithms.gradient_clipping.gradient_clipping._apply_agc. See the sketch after the Raises list below for how each option behaves.

  • clipping_threshold (float, optional) – The value to clip gradients to (for 'value'), the value to clip gradient norms to (for 'norm'), or, for 'adaptive', the threshold on the ratio grad_norm / weight_norm: whenever grad_norm / weight_norm exceeds this threshold, the gradients are scaled by clipping_threshold * (weight_norm / grad_norm).

Raises
  • NotImplementedError – if deepspeed is enabled and clipping_type is not 'norm'.

  • ValueError – if deepspeed is enabled and clipping_type is not 'norm'.
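
The mapping from each clipping_type to the underlying tensor operation can be summarized with a minimal sketch. This is illustrative only, written against the parameter descriptions above: clip_gradients_sketch is a hypothetical helper, and the 'adaptive' branch is a stand-in for the private _apply_agc, whose exact implementation may differ.

import torch

def clip_gradients_sketch(parameters, clipping_type, clipping_threshold, eps=1e-6):
    # Hypothetical helper illustrating the three clipping_type options.
    params = [p for p in parameters if p.grad is not None]
    if clipping_type == 'norm':
        # Rescale so the total gradient norm is at most clipping_threshold.
        torch.nn.utils.clip_grad_norm_(params, max_norm=clipping_threshold)
    elif clipping_type == 'value':
        # Clamp each gradient element to [-clipping_threshold, clipping_threshold].
        torch.nn.utils.clip_grad_value_(params, clip_value=clipping_threshold)
    elif clipping_type == 'adaptive':
        # Per-parameter sketch of adaptive clipping: shrink a gradient whenever
        # its norm is too large relative to the corresponding weight norm.
        for p in params:
            grad_norm = p.grad.detach().norm()
            weight_norm = p.detach().norm().clamp_min(eps)
            ratio = grad_norm / weight_norm
            if ratio > clipping_threshold:
                # Equivalent to scaling by clipping_threshold * (weight_norm / grad_norm).
                p.grad.detach().mul_(clipping_threshold / ratio)
    else:
        raise ValueError(f'Unknown clipping_type: {clipping_type}')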

composer.algorithms.gradient_clipping.gradient_clipping.apply_gradient_clipping(parameters, clipping_type, clipping_threshold)[source]#

Clips all gradients in the model based on the specified clipping_type.

Parameters
  • parameters (Tensor or Iterable[Tensor]) – The parameters of the model whose gradients we will clip.

  • clipping_type ('adaptive', 'norm', 'value') – String denoting which type of gradient clipping to perform. The options are: 'norm', which clips the gradient norm using torch.nn.utils.clip_grad_norm_; 'value', which clips each gradient to a specified value using torch.nn.utils.clip_grad_value_; and 'adaptive', which clips each gradient based on its gradient-norm-to-parameter-norm ratio using composer.algorithms.gradient_clipping.gradient_clipping._apply_agc.

  • clipping_threshold (float, optional) – The value to clip gradients to (for 'value'), the value to clip gradient norms to (for 'norm'), or, for 'adaptive', the threshold on the ratio grad_norm / weight_norm: whenever grad_norm / weight_norm exceeds this threshold, the gradients are scaled by clipping_threshold * (weight_norm / grad_norm).
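
As a usage sketch, apply_gradient_clipping can be called directly between the backward pass and the optimizer step. The toy model below is illustrative; the import follows the module path documented above.

import torch
from composer.algorithms.gradient_clipping.gradient_clipping import apply_gradient_clipping

model = torch.nn.Linear(10, 2)
loss = model(torch.randn(8, 10)).sum()
loss.backward()

# Clip gradients in place before the optimizer step.
apply_gradient_clipping(model.parameters(), clipping_type='norm', clipping_threshold=0.1)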