composer.algorithms.selective_backprop.selective_backprop
Core SelectiveBackprop class and functions.
Functions
select_using_loss | Prunes minibatches as a subroutine of SelectiveBackprop.
should_selective_backprop | Decides if selective backprop should be run based on time in training.
Classes
SelectiveBackprop | Selectively backpropagate gradients from a subset of each batch.
- class composer.algorithms.selective_backprop.selective_backprop.SelectiveBackprop(start=0.5, end=0.9, keep=0.5, scale_factor=0.5, interrupt=2)
Bases: composer.core.algorithm.Algorithm
Selectively backpropagate gradients from a subset of each batch.
Based on (Jiang et al, 2019), Selective Backprop (SB) prunes minibatches according to the difficulty of the individual training examples, and only computes weight gradients over the pruned subset, reducing iteration time and speeding up training.
The fraction of the minibatch that is kept for gradient computation is specified by the argument 0 <= keep <= 1.
To speed up SB's selection forward pass, the argument scale_factor can be used to spatially downsample input image tensors. The full-sized inputs will still be used for the weight gradient computation.
To preserve convergence, SB can be interrupted with vanilla minibatch gradient steps every interrupt steps. When interrupt=0, SB will be used at every step during the SB interval. When interrupt=2, SB will alternate with vanilla minibatch steps.
- Args:
- start (float, optional): SB interval start as fraction of training duration. Default: 0.5.
- end (float, optional): SB interval end as fraction of training duration. Default: 0.9.
- keep (float, optional): fraction of minibatch to select and keep for gradient computation. Default: 0.5.
- scale_factor (float, optional): scale for downsampling input for selection forward pass. Default: 0.5.
- interrupt (int, optional): interrupt SB with a vanilla minibatch step every interrupt batches. Default: 2.
Example
    from composer.algorithms import SelectiveBackprop
    from composer.trainer import Trainer

    # model, train_dataloader, eval_dataloader, and optimizer are assumed
    # to be defined elsewhere.
    algorithm = SelectiveBackprop(start=0.5, end=0.9, keep=0.5)
    trainer = Trainer(
        model=model,
        train_dataloader=train_dataloader,
        eval_dataloader=eval_dataloader,
        max_duration="1ep",
        algorithms=[algorithm],
        optimizers=[optimizer],
    )
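To make the roles of keep and scale_factor concrete, the following framework-free sketch works out how much data each pass would handle. The helper name selection_plan is hypothetical and not part of Composer. It assumes image inputs with a height and width, and that scaled dimensions are truncated to integers. The library's actual rounding may differ.

```python
def selection_plan(batch_size, height, width, keep=0.5, scale_factor=0.5):
    """Hypothetical helper: sizes involved in one selective-backprop step.

    Returns the number of examples kept for the gradient computation and
    the spatial size used for the selection forward pass.
    """
    if not 0.0 <= keep <= 1.0:
        raise ValueError("keep must be in [0, 1]")
    if scale_factor > 1:
        raise ValueError("scale_factor must be <= 1")
    # Number of examples that go through the full forward/backward pass.
    n_kept = int(keep * batch_size)
    # Spatial size of the downsampled inputs used only for scoring
    # (assumption: dimensions are truncated toward zero).
    scaled_size = (int(height * scale_factor), int(width * scale_factor))
    return n_kept, scaled_size


n_kept, scaled_size = selection_plan(128, 224, 224, keep=0.5, scale_factor=0.5)
print(n_kept, scaled_size)  # 64 (112, 112)
```

With the defaults, a batch of 128 images at 224x224 would be scored at 112x112, and only 64 full-sized images would contribute to the weight gradients.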
- composer.algorithms.selective_backprop.selective_backprop.select_using_loss(input, target, model, loss_fun, keep=0.5, scale_factor=1)
Prunes minibatches as a subroutine of SelectiveBackprop. Computes the loss function on the provided training examples and prunes the minibatch according to the difficulty of the examples. The fraction of the minibatch that is kept for gradient computation is specified by the argument 0 <= keep <= 1.
To speed up SB's selection forward pass, the argument scale_factor can be used to spatially downsample input tensors. The full-sized inputs will still be used for the weight gradient computation.
- Parameters
input (Tensor) – Input tensor to prune
target (Tensor) – Output tensor to prune
model (Callable) – Model with which to predict outputs
loss_fun (Callable) – Loss function of the form loss(outputs, targets, reduction='none'). The function must take the keyword argument reduction='none' to ensure that per-sample losses are returned.
keep (float, optional) – Fraction of examples in the batch to keep. Default: 0.5.
scale_factor (float, optional) – Multiplier between 0 and 1 for spatial size. Downsampling requires the input tensor to be at least 3D. Default: 1.
- Returns
(torch.Tensor, torch.Tensor) – The pruned batch of inputs and targets
- Raises
ValueError – If scale_factor > 1
TypeError – If loss_fun has the wrong signature or is not callable
Note
This function runs an extra forward pass through the model on the batch of data. If you are using a non-default precision, ensure that this forward pass runs in your desired precision. For example:
    import torch
    from composer.algorithms.selective_backprop import select_using_loss

    # X_sb, y_sb, lin_model, and loss_fun are assumed to be defined elsewhere.
    with torch.cuda.amp.autocast(True):
        X_new, y_new = select_using_loss(X_sb, y_sb, lin_model, loss_fun, keep=0.5, scale_factor=1)
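The core idea of loss-based pruning can be illustrated without any framework. The sketch below operates on a plain list of per-example losses, such as a loss function would return with reduction='none'. The function select_hardest is a hypothetical name, and it assumes a deterministic "keep the highest-loss examples" rule. This is a simplification for illustration and may not match how select_using_loss actually chooses examples.

```python
def select_hardest(losses, keep=0.5):
    """Hypothetical sketch: indices of the `keep` fraction with highest loss."""
    if not 0.0 <= keep <= 1.0:
        raise ValueError("keep must be in [0, 1]")
    n_select = int(keep * len(losses))
    # Rank example indices by loss, hardest (largest loss) first.
    ranked = sorted(range(len(losses)), key=lambda i: losses[i], reverse=True)
    # Return the selected indices in their original batch order.
    return sorted(ranked[:n_select])


per_example_losses = [0.1, 2.3, 0.7, 1.5]
kept = select_hardest(per_example_losses, keep=0.5)
print(kept)  # [1, 3]
```

The returned indices would then be used to slice both the inputs and the targets, so that only the selected examples take part in the gradient computation.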
- composer.algorithms.selective_backprop.selective_backprop.should_selective_backprop(current_duration, batch_idx, start=0.5, end=0.9, interrupt=2)
Decides if selective backprop should be run based on time in training.
Returns true if the current_duration is between start and end. It is recommended that SB be applied during the later stages of a training run, once the model has already "learned" easy examples.
To preserve convergence, SB can be interrupted with vanilla minibatch gradient steps every interrupt steps. When interrupt=0, SB will be used at every step during the SB interval. When interrupt=2, SB will alternate with vanilla minibatch steps.
- Parameters
current_duration (float) – The elapsed training duration. Must be within [0.0, 1.0).
batch_idx (int) – The current batch within the epoch
start (float, optional) – The duration at which selective backprop should be enabled. Default: 0.5.
end (float, optional) – The duration at which selective backprop should be disabled. Default: 0.9.
interrupt (int, optional) – The number of batches between vanilla minibatch gradient updates. Default: 2.
- Returns
bool – Whether selective backprop should be performed on this batch.
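The scheduling rule described above can be sketched in a few lines. The function sketch_should_sb below is a hypothetical stand-in, not the library function. It assumes that start is inclusive and end is exclusive, and that the last batch of every interrupt-sized cycle is the vanilla step. The description leaves both details open, so treat them as assumptions.

```python
def sketch_should_sb(current_duration, batch_idx, start=0.5, end=0.9, interrupt=2):
    """Hypothetical sketch of the SB schedule described above."""
    # Assumption: the SB interval is [start, end).
    in_interval = start <= current_duration < end
    # interrupt=0 disables interruption. Otherwise every interrupt-th batch
    # (assumed to be the last one in each cycle) is a vanilla step.
    is_sb_step = interrupt == 0 or (batch_idx + 1) % interrupt != 0
    return in_interval and is_sb_step


# 60% of the way through training with interrupt=2: SB alternates with vanilla steps.
schedule = [sketch_should_sb(0.6, batch_idx) for batch_idx in range(4)]
print(schedule)  # [True, False, True, False]
```

Before start or at and after end the sketch always returns False, so training proceeds with ordinary minibatch steps outside the SB interval.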