composer.datasets
DataloaderHparams contains the torch.utils.data.DataLoader settings that are common across both the training and eval datasets:

- num_workers
- prefetch_factor
- persistent_workers
- pin_memory
- timeout
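A minimal sketch of such a shared-settings container, assuming a dataclass-style holder. The field names mirror torch.utils.data.DataLoader arguments; the class body and defaults here are illustrative, not Composer's actual definition.

```python
from dataclasses import dataclass

# Hypothetical sketch: one object holding the DataLoader settings shared by
# training and eval. Defaults are illustrative, not Composer's.
@dataclass
class DataloaderHparams:
    num_workers: int = 8
    prefetch_factor: int = 2
    persistent_workers: bool = True
    pin_memory: bool = True
    timeout: float = 0.0

# The same instance can parameterize every dataloader in the run:
hparams = DataloaderHparams(num_workers=4)
```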
Each DatasetHparams is then responsible for returning a DataloaderSpec, which is a NamedTuple of dataset-specific settings such as:

- dataset
- drop_last
- shuffle
- collate_fn
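As a sketch, the per-dataset spec might look like the following NamedTuple. The field set comes from the list above, but the defaults and exact types are assumptions; the real DataloaderSpec may carry additional fields.

```python
from typing import Any, Callable, NamedTuple, Optional

# Hypothetical sketch of a DataloaderSpec-style NamedTuple; defaults are
# illustrative, not Composer's actual definition.
class DataloaderSpec(NamedTuple):
    dataset: Any
    drop_last: bool = True
    shuffle: bool = True
    collate_fn: Optional[Callable] = None

spec = DataloaderSpec(dataset=list(range(10)), shuffle=False)
```

Because it is a NamedTuple, the spec stays immutable and cheap to pass between processes.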
This indirection (instead of directly creating the dataloader at the start) is needed because, for multi-GPU training, dataloaders require the global rank to initialize their torch.utils.data.distributed.DistributedSampler. As a result, our trainer uses the DataloaderSpec and DataloaderHparams to create the dataloaders after DDP has forked the processes.