composer.datasets

DataloaderHparams contains the torch.utils.data.DataLoader settings that are common across both the training and evaluation dataloaders (see the sketch after this list):

  • num_workers

  • prefetch_factor

  • persistent_workers

  • pin_memory

  • timeout
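
As a rough sketch, these shared settings can be pictured as a dataclass. The field names follow the list above; the defaults shown here are illustrative assumptions, not Composer's actual values:

    import dataclasses

    @dataclasses.dataclass
    class DataloaderHparams:
        """Settings shared by the training and evaluation dataloaders (sketch)."""
        num_workers: int = 8              # worker processes per dataloader
        prefetch_factor: int = 2          # batches prefetched per worker (requires num_workers > 0)
        persistent_workers: bool = True   # keep workers alive between epochs (requires num_workers > 0)
        pin_memory: bool = True           # page-lock host memory for faster host-to-GPU copies
        timeout: float = 0.0              # seconds to wait for a batch; 0 disables the timeout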

Each DatasetHparams is then responsible for returning a DataloaderSpec, which is a NamedTuple of dataset-specific settings (see the sketch after this list) such as:

  • dataset

  • drop_last

  • shuffle

  • collate_fn
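
A minimal sketch of such a NamedTuple, assuming only the fields listed above (the real class may carry additional fields):

    from typing import Callable, NamedTuple, Optional

    import torch.utils.data

    class DataloaderSpec(NamedTuple):
        """Dataset-specific dataloader settings returned by a DatasetHparams (sketch)."""
        dataset: torch.utils.data.Dataset
        drop_last: bool
        shuffle: bool
        collate_fn: Optional[Callable] = None  # None falls back to the default collate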

This indirection (rather than creating the dataloader directly at the start) is needed because, for multi-GPU training, dataloaders require the global rank to initialize their torch.utils.data.distributed.DistributedSampler.

As a result, our trainer uses the DataloaderSpec and DataloaderHparams to create the dataloaders after DDP has forked the processes.
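
The following sketch shows how the two pieces might be combined once the global rank is available; build_dataloader is a hypothetical helper for illustration, not the trainer's actual entry point:

    import torch.distributed as dist
    from torch.utils.data import DataLoader
    from torch.utils.data.distributed import DistributedSampler

    def build_dataloader(spec: DataloaderSpec, hparams: DataloaderHparams,
                         batch_size: int) -> DataLoader:
        # The sampler needs the global rank, which exists only after DDP
        # has forked the processes and initialized the process group.
        sampler = DistributedSampler(
            spec.dataset,
            num_replicas=dist.get_world_size(),
            rank=dist.get_rank(),
            shuffle=spec.shuffle,
            drop_last=spec.drop_last,
        )
        return DataLoader(
            spec.dataset,
            batch_size=batch_size,
            sampler=sampler,  # shuffling is delegated to the sampler
            drop_last=spec.drop_last,
            collate_fn=spec.collate_fn,
            num_workers=hparams.num_workers,
            prefetch_factor=hparams.prefetch_factor,
            persistent_workers=hparams.persistent_workers,
            pin_memory=hparams.pin_memory,
            timeout=hparams.timeout,
        )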

Base Classes and Hyperparameters

Datasets