seq_length_warmup#
Core code for sequence length warmup.
Functions

set_batch_sequence_length – Set the sequence length of a batch.

Classes

SeqLengthWarmup – Progressively increases the sequence length during training.
- class composer.algorithms.seq_length_warmup.seq_length_warmup.SeqLengthWarmup(duration=0.3, min_seq_length=8, max_seq_length=1024, step_size=8, truncate=True, preserve_end_of_sequence=False)[source]#
Bases: composer.core.algorithm.Algorithm
Progressively increases the sequence length during training.

Changes the sequence length of all tensors in the input batch. The sequence length increases from min_seq_length to max_seq_length in steps of step_size during the first duration fraction of training. The sequence length is then kept at max_seq_length for the rest of training.

Tensors are either truncated (truncate=True) or reshaped to create new examples from the extra tokens (truncate=False).

This algorithm runs on Event.AFTER_DATALOADER to modify the sequence length of a batch of data after the model and data have been moved to accelerators.

Note

step_size should be a multiple of eight for optimal throughput on NVIDIA GPUs.

Note

Variable input lengths can create CUDA OOM errors. To avoid this, we follow the PyTorch notes and pre-allocate the memory with a blank forward and backward pass.

See the Method Card for more details.
Example:

```python
from composer.algorithms import SeqLengthWarmup
from composer import Trainer

seq_length_warmup = SeqLengthWarmup(
    duration=0.5,
    min_seq_length=8,
    max_seq_length=1024,
    step_size=8,
    truncate=True,
    preserve_end_of_sequence=False,
)

trainer = Trainer(
    model=model,
    train_dataloader=train_dataloader,
    max_duration="1ep",
    algorithms=[seq_length_warmup],
)
```
- Parameters
  - duration (float, optional) – Fraction of total training spent on the sequence length warmup. Default: 0.3.
  - min_seq_length (int, optional) – Minimum sequence length to start the warmup. Default: 8.
  - max_seq_length (int, optional) – Maximum sequence length to stop the warmup. Default: 1024.
  - step_size (int, optional) – Step size of the sequence length. Default: 8.
  - truncate (bool, optional) – Truncate sequences early, or reshape tensors to create new examples out of the extra tokens. Default: True.
  - preserve_end_of_sequence (bool, optional) – Preserve the end-of-sequence of the batch when truncating. Useful when input formats include a unique end-of-sequence token. Ignored if truncate=False. Default: False. E.g., if batch["input_ids"] is [[10, 11, 12, 13, 14, 15]] and curr_seq_length=3, "input_ids" in the returned batch would be [[10, 11, 12]] with preserve_end_of_sequence=False and [[10, 11, 15]] with preserve_end_of_sequence=True. This behavior applies to any batch tensor with 2 or more dimensions.
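The two truncation modes can be illustrated with a toy re-implementation of the documented slicing semantics on plain Python lists. This is not the library code (which operates on batched tensors); truncate_example is a hypothetical helper that handles a single row:

```python
def truncate_example(seq, curr_seq_length, preserve_end_of_sequence=False):
    """Toy sketch of the documented truncation semantics for one example."""
    if preserve_end_of_sequence:
        # Keep the first curr_seq_length - 1 tokens plus the final token.
        return seq[:curr_seq_length - 1] + seq[-1:]
    # Plain truncation: keep only the first curr_seq_length tokens.
    return seq[:curr_seq_length]

batch = [[10, 11, 12, 13, 14, 15]]
[truncate_example(row, 3) for row in batch]
# -> [[10, 11, 12]]
[truncate_example(row, 3, preserve_end_of_sequence=True) for row in batch]
# -> [[10, 11, 15]]
```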
- composer.algorithms.seq_length_warmup.seq_length_warmup.set_batch_sequence_length(batch, curr_seq_len, truncate=True, preserve_end_of_sequence=False)[source]#
Set the sequence length of a batch.
Changes the sequence length of all tensors in the provided dictionary to curr_seq_len by either truncating the tensors (truncate=True) or reshaping the tensors to create new examples from the extra tokens (truncate=False).

Note

The schedule for curr_seq_len over training time should be managed outside of this function.

Note

Variable input lengths can create CUDA OOM errors. To avoid this, we follow the PyTorch notes and pre-allocate the memory with a blank forward and backward pass.
- Parameters
  - batch (Dict[str, Tensor]) – The input batch to the model; must be a dictionary.
  - curr_seq_len (int) – The desired sequence length to apply.
  - truncate (bool, optional) – Truncate sequences early, or reshape tensors to create new examples out of the extra tokens. Default: True.
  - preserve_end_of_sequence (bool, optional) – Preserve the end-of-sequence of the batch when truncating. Useful when input formats include a unique end-of-sequence token. Ignored if truncate=False. Default: False. E.g., if batch["input_ids"] is [[10, 11, 12, 13, 14, 15]] and curr_seq_len=3, "input_ids" in the returned batch would be [[10, 11, 12]] with preserve_end_of_sequence=False and [[10, 11, 15]] with preserve_end_of_sequence=True. This behavior applies to any batch tensor with 2 or more dimensions.
- Returns
  - Dict[str, Tensor] – A mapping of input tensors to the model, where all tensors have curr_seq_len in the second dimension.
Example:

```python
import composer.functional as cf

for epoch in range(num_epochs):
    for X, y in train_loader:
        X = cf.set_batch_sequence_length(X, sequence_length)
        y_hat = model(X)
        loss = loss_fn(y_hat, y)
```
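The truncate=False path can also be sketched in miniature. The real function reshapes batched tensors; this toy version on nested lists only shows the general idea of folding extra tokens into new examples, and reshape_batch is a hypothetical helper, not part of the Composer API. How the library handles example boundaries and leftover tokens may differ:

```python
def reshape_batch(rows, curr_seq_len):
    """Toy sketch of truncate=False: turn extra tokens into new examples."""
    # Flatten the batch, then re-chunk it into rows of length curr_seq_len.
    flat = [tok for row in rows for tok in row]
    usable = (len(flat) // curr_seq_len) * curr_seq_len  # drop any remainder
    return [flat[i:i + curr_seq_len] for i in range(0, usable, curr_seq_len)]

reshape_batch([[10, 11, 12, 13, 14, 15]], 3)
# -> [[10, 11, 12], [13, 14, 15]]
```

A batch of shape (1, 6) thus becomes shape (2, 3): no tokens are discarded, unlike plain truncation.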