composer.algorithms.seq_length_warmup.seq_length_warmup#

Core code for sequence length warmup.

Functions

set_batch_sequence_length

Set the sequence length of the current batch.

Classes

SeqLengthWarmup

Progressively increases the sequence length during training.

class composer.algorithms.seq_length_warmup.seq_length_warmup.SeqLengthWarmup(duration=0.3, min_seq_length=8, max_seq_length=1024, step_size=8, truncate=True)[source]#

Bases: composer.core.algorithm.Algorithm

Progressively increases the sequence length during training.

Changes the sequence length of all tensors in the input batch. The sequence length increases from min_seq_length to max_seq_length in steps of step_size during the first duration fraction of training.

The sequence length is then kept at max_seq_length for the rest of training.

Tensors are either truncated (truncate=True) or reshaped to create new examples from the extra tokens (truncate=False).
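The schedule described above can be sketched directly. Below is a minimal illustration of the sequence length a batch would receive at a given fraction of training (a hypothetical helper using the constructor defaults, not the library's internal code):

    def warmup_seq_len(frac_of_training, duration=0.3, min_seq_length=8,
                       max_seq_length=1024, step_size=8):
        """Sequence length implied by the warmup schedule described above."""
        if frac_of_training >= duration:
            return max_seq_length
        # Interpolate linearly over the warmup window, round down to a
        # multiple of step_size, and clamp to [min_seq_length, max_seq_length].
        progress = frac_of_training / duration
        seq_len = min_seq_length + progress * (max_seq_length - min_seq_length)
        seq_len = int(seq_len) // step_size * step_size
        return max(min_seq_length, min(seq_len, max_seq_length))

    warmup_seq_len(0.0)   # 8
    warmup_seq_len(0.15)  # 512 (halfway through the warmup window)
    warmup_seq_len(0.5)   # 1024 (warmup finished)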

This algorithm runs on the Event.AFTER_DATALOADER event to modify the sequence length of a batch of data after the model and data have been moved to accelerators.

Note

step_size should be a multiple of eight for optimal throughput on NVIDIA GPUs.

Note

Variable input lengths can cause CUDA out-of-memory (OOM) errors. To avoid this, we follow the PyTorch notes and pre-allocate the memory with a blank forward and backward pass.
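A minimal sketch of this pre-allocation idea (a hypothetical helper, not the library's implementation; assumes model is a HuggingFace-style torch.nn.Module whose output carries a .loss, and dummy_batch is a batch already padded to the maximum sequence length):

    def preallocate_memory(model, dummy_batch):
        # One blank forward/backward pass at the maximum sequence length
        # reserves the largest activation buffers up front, so later
        # variable-length batches reuse cached memory instead of forcing
        # new, potentially fragmenting CUDA allocations.
        loss = model(**dummy_batch).loss
        loss.backward()
        model.zero_grad(set_to_none=True)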

See the Method Card for more details.

Example:
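A minimal sketch of attaching the algorithm to a Composer Trainer (model and train_dataloader stand in for an already-constructed ComposerModel and a dataloader that yields dictionary batches):

    from composer import Trainer
    from composer.algorithms import SeqLengthWarmup

    # Warm up from 8 to 1024 tokens over the first 30% of training.
    seq_length_warmup = SeqLengthWarmup(
        duration=0.3,
        min_seq_length=8,
        max_seq_length=1024,
        step_size=8,
        truncate=True,
    )

    trainer = Trainer(
        model=model,                        # a ComposerModel (assumed)
        train_dataloader=train_dataloader,  # yields Dict[str, Tensor] batches
        max_duration='1ep',
        algorithms=[seq_length_warmup],
    )
    trainer.fit()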

Parameters
  • duration (float, optional) – Fraction of total training over which to perform the warmup. Default = 0.3.

  • min_seq_length (int, optional) – Minimum sequence length, used at the start of the warmup. Default = 8.

  • max_seq_length (int, optional) – Maximum sequence length, reached at the end of the warmup. Default = 1024.

  • step_size (int, optional) – Amount by which the sequence length increases at each step of the warmup. Default = 8.

  • truncate (bool, optional) – Whether to truncate tensors (True) or reshape the extra tokens into new examples (False). Default = True.

composer.algorithms.seq_length_warmup.seq_length_warmup.set_batch_sequence_length(batch, curr_seq_len, truncate=True)[source]#

Set the sequence length of the current batch.

Changes the sequence length of all tensors in the provided dictionary to curr_seq_len by either truncating them (truncate=True) or reshaping them to create new examples from the extra tokens (truncate=False).

Note

The schedule for curr_seq_len over training time should be managed outside of this function.

Example:
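A minimal sketch with a synthetic batch (the tensor names and vocabulary size are illustrative; the reshape arithmetic assumes the total token count divides evenly by curr_seq_len):

    import torch
    from composer.algorithms.seq_length_warmup import set_batch_sequence_length

    def make_batch():
        # A synthetic batch of 4 examples, each 1024 tokens long.
        return {
            'input_ids': torch.randint(0, 30522, (4, 1024)),
            'attention_mask': torch.ones(4, 1024, dtype=torch.long),
        }

    # Truncate every tensor in the batch to its first 256 tokens.
    truncated = set_batch_sequence_length(make_batch(), curr_seq_len=256)
    truncated['input_ids'].shape  # torch.Size([4, 256])

    # Reshape instead of truncating: the extra tokens become new examples,
    # so the batch grows from 4 to 16 examples (4 * 1024 // 256 = 16).
    reshaped = set_batch_sequence_length(make_batch(), curr_seq_len=256,
                                         truncate=False)
    reshaped['input_ids'].shape  # torch.Size([16, 256])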

Parameters
  • batch (Dict[str, Tensor]) – The input batch to the model; must be a dictionary.

  • curr_seq_len (int) – The desired sequence length to apply.

  • truncate (bool, optional) – Whether to truncate sequences early (True) or reshape tensors to create new examples out of the extra tokens (False). Default = True.

Returns

Dict[str, Tensor] – A mapping of input tensors to the model, in which every tensor has curr_seq_len as its second (sequence) dimension.