composer.algorithms.mixup.mixup

Core MixUp classes and functions.

Functions

mixup_batch

Create new samples using convex combinations of pairs of samples.

Classes

MixUp

MixUp trains the network on convex combinations of pairs of examples and targets rather than individual examples and targets.

class composer.algorithms.mixup.mixup.MixUp(num_classes, alpha=0.2)[source]

Bases: composer.core.algorithm.Algorithm

MixUp trains the network on convex combinations of pairs of examples and targets rather than individual examples and targets.

This is done by taking a convex combination of a given batch X with a randomly permuted copy of X. The mixing coefficient is drawn from a Beta(alpha, alpha) distribution.

Training in this fashion sometimes reduces generalization error.
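The core operation can be sketched in a few lines of NumPy (an illustrative re-implementation for clarity, not the library's code; the batch shape and seed here are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)
alpha = 0.2
lam = rng.beta(alpha, alpha)           # mixing coefficient ~ Beta(alpha, alpha)
x = rng.standard_normal((4, 3))        # toy batch: B=4 examples, 3 features each
perm = rng.permutation(len(x))         # random pairing within the batch
x_mix = lam * x + (1 - lam) * x[perm]  # convex combination of each pair
```

Each mixed example lies on the line segment between an original example and its randomly chosen partner.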

Example

from composer.algorithms import MixUp
from composer.trainer import Trainer
mixup_algorithm = MixUp(num_classes=1000, alpha=0.2)
trainer = Trainer(
    model=model,
    train_dataloader=train_dataloader,
    eval_dataloader=eval_dataloader,
    max_duration="1ep",
    algorithms=[mixup_algorithm],
    optimizers=[optimizer]
)
Parameters
  • num_classes (int) – the number of classes in the task labels.

  • alpha (float) – the pseudocount for the Beta distribution used to sample interpolation parameters. As alpha grows, the two samples in each pair tend to be weighted more equally. As alpha approaches 0 from above, the combination approaches using only one element of the pair.
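The effect of alpha on the sampled coefficients can be checked empirically (a minimal sketch; the specific alpha values and sample count are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
# Large alpha: Beta(alpha, alpha) concentrates near 0.5, so the two
# samples in each pair are weighted roughly equally.
lam_large = rng.beta(10.0, 10.0, size=10_000)
# Small alpha: draws pile up near 0 and 1, so each mixed sample is
# dominated by a single element of the pair.
lam_small = rng.beta(0.1, 0.1, size=10_000)
```

Both distributions are symmetric around 0.5; only their spread differs.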

apply(event, state, logger)[source]

Applies MixUp augmentation to the input batch in State.

Parameters
  • event (Event) – the current event

  • state (State) – the current trainer state

  • logger (Logger) – the training logger

match(event, state)[source]

Runs on Event.INIT and Event.AFTER_DATALOADER.

Parameters
  • event (Event) – The current event.

  • state (State) – The current state.

Returns

bool – True if this algorithm should run now.

composer.algorithms.mixup.mixup.mixup_batch(x, y, n_classes, interpolation_lambda=None, alpha=0.2, indices=None)[source]

Create new samples using convex combinations of pairs of samples.

This is done by taking a convex combination of x with a randomly permuted copy of x. The interpolation parameter lambda should be chosen from a Beta(alpha, alpha) distribution for some parameter alpha > 0. Note that the same lambda is used for all examples within the batch.

Both the original and shuffled labels are returned. This is done because for many loss functions (such as cross entropy) the targets are given as indices, so interpolation must be handled separately.

Example

from composer.algorithms.mixup import mixup_batch
new_inputs, new_targets, perm = mixup_batch(
                                    x=X_example,
                                    y=y_example,
                                    n_classes=1000,
                                    alpha=0.2
                                    )
Parameters
  • x – input tensor of shape (B, d1, d2, …, dn), where B is the batch size and d1-dn are the feature dimensions.

  • y – target tensor of shape (B, f1, f2, …, fm), where B is the batch size and f1-fm are the possible target dimensions.

  • n_classes – total number of classes.

  • interpolation_lambda – coefficient used to interpolate between the two examples. If provided, must be in [0, 1]. If None, the value is drawn from a Beta(alpha, alpha) distribution.

  • alpha – parameter for the Beta distribution over interpolation_lambda. Only used if interpolation_lambda is not provided.

  • indices – permutation of the batch indices 1..B. Used for permuting without randomness.

Returns
  • x_mix – batch of inputs after mixup has been applied

  • y_mix – labels after mixup has been applied

  • perm – the permutation used
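Because index-valued targets (as used by cross entropy) cannot be interpolated directly, a common pattern is to interpolate the per-batch loss values instead, using the returned permutation. A minimal NumPy sketch, where `cross_entropy` is a hypothetical helper and the values of `perm` and `lam` stand in for those produced during mixup:

```python
import numpy as np

def cross_entropy(logits, targets):
    # Hypothetical index-based cross entropy, for illustration only.
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(len(targets)), targets].mean()

rng = np.random.default_rng(0)
logits = rng.standard_normal((4, 10))  # model outputs: B=4, 10 classes
y = rng.integers(0, 10, size=4)        # original integer class labels
perm = rng.permutation(4)              # stands in for the returned permutation
lam = 0.3                              # stands in for the sampled coefficient

# Interpolate the loss values rather than the index-valued targets.
loss = lam * cross_entropy(logits, y) + (1 - lam) * cross_entropy(logits, y[perm])
```

This reproduces the same objective as training on interpolated one-hot targets, without materializing dense label vectors.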