composer.algorithms.selective_backprop.selective_backprop

Functions

select_using_loss

Selectively backpropagate gradients from a subset of each batch (Jiang et al., 2019).

should_selective_backprop

Decide if selective backprop should be run based on time in training.

Classes

Algorithm

Base class for algorithms.

ComposerModel

The interface needed to make a PyTorch model compatible with composer.Trainer.

Event

Enum to represent events in the training loop.

Logger

An interface to record training data.

SelectiveBackprop

Selectively backpropagate gradients from a subset of each batch (Jiang et al., 2019).

State

The state of the trainer.


class composer.algorithms.selective_backprop.selective_backprop.SelectiveBackprop(start=0.5, end=0.9, keep=0.5, scale_factor=0.5, interrupt=2)

Bases: composer.core.algorithm.Algorithm

Selectively backpropagate gradients from a subset of each batch (Jiang et al., 2019).

Selective Backprop (SB) prunes minibatches according to the difficulty of the individual training examples, and only computes weight gradients over the pruned subset, reducing iteration time and speeding up training. The fraction of the minibatch that is kept for gradient computation is specified by the argument keep, where 0 <= keep <= 1.

To speed up SB's selection forward pass, the argument scale_factor can be used to spatially downsample input image tensors. The full-sized inputs will still be used for the weight gradient computation.

To preserve convergence, SB can be interrupted with vanilla minibatch gradient steps every interrupt steps. When interrupt=0, SB will be used at every step during the SB interval. When interrupt=2, SB will alternate with vanilla minibatch steps.

Parameters
  • start (float, optional) – SB interval start, as a fraction of training duration. Default: 0.5.

  • end (float, optional) – SB interval end, as a fraction of training duration. Default: 0.9.

  • keep (float, optional) – fraction of the minibatch to select and keep for gradient computation. Default: 0.5.

  • scale_factor (float, optional) – scale for downsampling the input for the selection forward pass. Default: 0.5.

  • interrupt (int, optional) – interrupt SB with a vanilla minibatch step every interrupt batches. Default: 2.
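
When training with the Composer Trainer, the algorithm only needs to be passed to the algorithms argument. The following is a minimal sketch under assumed placeholders: the toy dataset, the small convolutional module, and the use of composer.models.ComposerClassifier are illustrative choices, not part of this module:

import torch
from torch.utils.data import DataLoader, TensorDataset
from composer import Trainer
from composer.algorithms import SelectiveBackprop
from composer.models import ComposerClassifier

# Toy data and a size-agnostic conv model (placeholders, not part of this module).
dataset = TensorDataset(torch.randn(128, 3, 8, 8), torch.randint(0, 10, (128,)))
module = torch.nn.Sequential(
    torch.nn.Conv2d(3, 8, 3, padding=1),
    torch.nn.ReLU(),
    torch.nn.AdaptiveAvgPool2d(1),
    torch.nn.Flatten(),
    torch.nn.Linear(8, 10),
)
model = ComposerClassifier(module)

trainer = Trainer(
    model=model,
    train_dataloader=DataLoader(dataset, batch_size=32),
    optimizers=torch.optim.SGD(model.parameters(), lr=0.1),
    max_duration='4ep',
    # Keep half of each minibatch during [0.5, 0.9] of training, run the selection
    # forward pass on inputs downsampled by 0.5, and alternate SB with vanilla steps.
    algorithms=[SelectiveBackprop(start=0.5, end=0.9, keep=0.5, scale_factor=0.5, interrupt=2)],
)
trainer.fit()

A model used with scale_factor < 1 must accept spatially downsampled inputs for the selection forward pass, which is why the sketch uses adaptive pooling rather than a fixed-size flatten.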

apply(event, state, logger=None)

Apply selective backprop to the current batch.

match(event, state)

Matches Event.INIT and Event.AFTER_DATALOADER.

  • Uses Event.INIT to get the loss function before the model is wrapped.

  • Uses Event.AFTER_DATALOADER to apply selective backprop if the current time is between self.start and self.end.

composer.algorithms.selective_backprop.selective_backprop.select_using_loss(input, target, model, loss_fun, keep=0.5, scale_factor=1)

Selectively backpropagate gradients from a subset of each batch (Jiang et al., 2019).

Selective Backprop (SB) prunes minibatches according to the difficulty of the individual training examples and only computes weight gradients over the selected subset. This reduces iteration time and speeds up training. The fraction of the minibatch that is kept for gradient computation is specified by the argument keep, where 0 <= keep <= 1.

To speed up SB's selection forward pass, the argument scale_factor can be used to spatially downsample input tensors. The full-sized inputs will still be used for the weight gradient computation.

Parameters
  • input (Tensor) – Input tensor to prune.

  • target (Tensor) – Target tensor to prune.

  • model (Callable) – Model with which to predict outputs.

  • loss_fun (Callable) – Loss function of the form loss(outputs, targets, reduction='none'). The function must take the keyword argument reduction='none' to ensure that per-sample losses are returned.

  • keep (float, optional) – Fraction of examples in the batch to keep. Default: 0.5.

  • scale_factor (float, optional) – Multiplier between 0 and 1 for the spatial size. Downsampling requires the input tensor to be at least 3D. Default: 1.

Returns

(torch.Tensor, torch.Tensor) – The pruned batch of inputs and targets.

Raises
  • ValueError – If scale_factor > 1.

  • TypeError – If loss_fun has the wrong signature or is not callable.

Note: This function runs an extra forward pass through the model on the batch of data. If you are using a non-default precision, ensure that this forward pass runs in your desired precision. For example:

with torch.cuda.amp.autocast(True):
    X_new, y_new = select_using_loss(X, y, model, loss_fun, keep, scale_factor)
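
For a fuller, self-contained sketch, the tensors and the tiny model below are placeholders; torch.nn.functional.cross_entropy is used because it already accepts reduction='none' and therefore satisfies the loss_fun requirement:

import torch
import torch.nn.functional as F
from composer.algorithms.selective_backprop.selective_backprop import select_using_loss

model = torch.nn.Sequential(torch.nn.Flatten(), torch.nn.Linear(3 * 32 * 32, 10))  # placeholder classifier
X = torch.randn(64, 3, 32, 32)   # placeholder image batch
y = torch.randint(0, 10, (64,))  # placeholder labels

# Rank examples by per-sample loss and keep roughly half of the batch.
X_new, y_new = select_using_loss(X, y, model, F.cross_entropy, keep=0.5, scale_factor=1)
print(f"kept {X_new.shape[0]} of {X.shape[0]} examples for the gradient computation")

scale_factor is left at 1 here because the placeholder model flattens its input and cannot accept spatially downsampled tensors; a convolutional model with adaptive pooling could use scale_factor < 1.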

composer.algorithms.selective_backprop.selective_backprop.should_selective_backprop(current_duration, batch_idx, start=0.5, end=0.9, interrupt=2)

Decide if selective backprop should be run based on time in training.

Returns true if the current_duration is between start and end. It is recommended that SB be applied during the later stages of a training run, once the model has already "learned" the easy examples.

To preserve convergence, SB can be interrupted with vanilla minibatch gradient steps every interrupt steps. When interrupt=0, SB will be used at every step during the SB interval. When interrupt=2, SB will alternate with vanilla minibatch steps.

Parameters
  • current_duration (float) – The elapsed training duration. Must be within [0.0, 1.0).

  • batch_idx (int) – The current batch index within the epoch.

  • start (float, optional) – The duration at which selective backprop should be enabled. Default: 0.5.

  • end (float, optional) – The duration at which selective backprop should be disabled. Default: 0.9.

  • interrupt (int, optional) – The number of batches between vanilla minibatch gradient updates. Default: 2.

Returns

bool – Whether selective backprop should be performed on this batch.
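
As an illustration of how this gating function composes with select_using_loss, here is a minimal manual training loop; the toy model, data, and duration schedule below are placeholders, and the Composer Trainer performs the equivalent bookkeeping automatically when the SelectiveBackprop algorithm is enabled:

import torch
import torch.nn.functional as F
from torch.utils.data import DataLoader, TensorDataset
from composer.algorithms.selective_backprop.selective_backprop import (
    select_using_loss,
    should_selective_backprop,
)

model = torch.nn.Sequential(torch.nn.Flatten(), torch.nn.Linear(3 * 8 * 8, 10))  # placeholder classifier
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
loader = DataLoader(TensorDataset(torch.randn(128, 3, 8, 8), torch.randint(0, 10, (128,))), batch_size=32)

max_epochs = 4
for epoch in range(max_epochs):
    current_duration = epoch / max_epochs  # elapsed training fraction, in [0.0, 1.0)
    for batch_idx, (X, y) in enumerate(loader):
        if should_selective_backprop(current_duration, batch_idx, start=0.5, end=0.9, interrupt=2):
            # Extra forward pass to rank examples by loss, then prune the batch.
            X, y = select_using_loss(X, y, model, F.cross_entropy, keep=0.5, scale_factor=1)
        optimizer.zero_grad()
        loss = F.cross_entropy(model(X), y)  # mean reduction for the actual update
        loss.backward()
        optimizer.step()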