composer.datasets.dataloader

Common settings across both the training and eval datasets.

These settings are dataset independent.

Functions

unwrap_data_loader

Recursively unwraps a dataloader if it is of type WrappedDataLoader.

Classes

WrappedDataLoader

A wrapper around a dataloader.

Hparams

These classes are used with yahp for YAML-based configuration.

DataLoaderHparams

Hyperparameters to initialize a torch.utils.data.DataLoader.

class composer.datasets.dataloader.DataLoaderHparams(num_workers=8, prefetch_factor=2, persistent_workers=True, pin_memory=True, timeout=0)

Bases: yahp.hparams.Hparams

Hyperparameters to initialize a torch.utils.data.DataLoader.

Parameters
  • num_workers (int, optional) – Number of CPU workers to use per device to fetch data. Set to 0 to use the main training thread for dataloading. Zero workers can be useful for debugging, but it slows data loading and should be avoided when performance matters. Default: 8.

  • prefetch_factor (int, optional) – Number of batches loaded in advance by each worker. For example, 2 means there will be a total of 2 * num_workers batches prefetched across all workers. If num_workers is 0, then prefetch_factor must be left at its default value. Default: 2.

  • persistent_workers (bool) – Whether to reuse dataloader workers across epochs. If num_workers is 0, then this field must be False. Default: True.

  • pin_memory (bool, optional) – Whether to copy Tensors into CUDA pinned memory before returning them. If num_workers is 0, then pin_memory must be False. Default: True.

  • timeout (float) – Timeout, in seconds, for collecting a batch from workers. Set to 0 for no timeout. Default: 0.
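The constraints among these fields can be sketched in plain Python. The `LoaderSettings` class below is a hypothetical stand-in, not part of Composer or PyTorch. It only mirrors the documented defaults and illustrates, under the rules stated above, which combinations are valid and how the prefetch total scales with the worker count.

```python
from dataclasses import dataclass


@dataclass
class LoaderSettings:
    """Hypothetical stand-in mirroring the documented fields and defaults."""

    num_workers: int = 8
    prefetch_factor: int = 2
    persistent_workers: bool = True
    pin_memory: bool = True
    timeout: float = 0.0

    def validate(self) -> None:
        # With zero workers, loading happens on the main training thread, so
        # the worker-related options must be disabled or left at their defaults.
        if self.num_workers == 0:
            if self.prefetch_factor != 2:
                raise ValueError("prefetch_factor must stay at its default when num_workers is 0")
            if self.persistent_workers:
                raise ValueError("persistent_workers must be False when num_workers is 0")
            if self.pin_memory:
                raise ValueError("pin_memory must be False when num_workers is 0")

    def total_prefetched(self) -> int:
        # Each worker loads prefetch_factor units of data in advance.
        return self.prefetch_factor * self.num_workers


defaults = LoaderSettings()
defaults.validate()
print(defaults.total_prefetched())  # 16

# A debugging configuration that satisfies the zero-worker rules.
debug = LoaderSettings(num_workers=0, persistent_workers=False, pin_memory=False)
debug.validate()
```

Note that simply setting `num_workers=0` while keeping the other defaults would fail validation in this sketch, because `persistent_workers` and `pin_memory` both default to True.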

initialize_object(dataset, *, batch_size, sampler, drop_last, collate_fn=None, worker_init_fn=None)

Create a dataloader.

Parameters
  • dataset (Dataset) – The dataset.

  • batch_size (int) – The per-device batch size.

  • sampler (Sampler[int] or None) – The sampler to use for the dataloader.

  • drop_last (bool) – Whether to drop the last batch if the number of samples is not evenly divisible by the batch size.

  • collate_fn (callable, optional) – Custom collate function. Default: None.

  • worker_init_fn (callable, optional) – Custom worker init function. Default: None.

Returns

DataLoader – The dataloader.
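The effect of drop_last on the number of batches can be illustrated with a little arithmetic. The helper `num_batches` below is hypothetical and assumes the sampler visits each of `num_samples` items exactly once per epoch.

```python
import math


def num_batches(num_samples: int, batch_size: int, drop_last: bool) -> int:
    """Number of batches produced in one pass over num_samples items."""
    if drop_last:
        # The trailing partial batch is discarded.
        return num_samples // batch_size
    # The trailing partial batch is kept as a smaller final batch.
    return math.ceil(num_samples / batch_size)


print(num_batches(1000, 64, drop_last=True))   # 15
print(num_batches(1000, 64, drop_last=False))  # 16
```

With 1000 samples and a per-device batch size of 64, the final 40 samples form a partial batch that is either dropped or returned, depending on the flag.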

class composer.datasets.dataloader.WrappedDataLoader(dataloader)

Bases: composer.core.types.DataLoader

A wrapper around a dataloader.

Parameters

dataloader (DataLoader) – A wrapped or unwrapped dataloader.

classmethod is_dataloader_already_wrapped(dataloader)

Returns whether the dataloader is wrapped with cls. This helper method checks recursively through all wrappings until the underlying dataloader is reached.

Parameters

dataloader (DataLoader) – The dataloader to check.

Returns

bool – Whether the dataloader is wrapped recursively with cls.
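A minimal sketch of such a recursive check, using hypothetical stand-in classes rather than Composer's own (the attribute name `dataloader` and the class names here are assumptions for illustration):

```python
class Wrapper:
    """Hypothetical stand-in for a wrapping dataloader; not Composer's class."""

    def __init__(self, dataloader):
        self.dataloader = dataloader

    @classmethod
    def is_already_wrapped(cls, dataloader) -> bool:
        # Walk inward through every wrapper layer, looking for an instance of cls.
        while isinstance(dataloader, Wrapper):
            if isinstance(dataloader, cls):
                return True
            dataloader = dataloader.dataloader
        return False


class TimingWrapper(Wrapper):
    """Hypothetical wrapper subclass."""


class LoggingWrapper(Wrapper):
    """Hypothetical wrapper subclass."""


base = [1, 2, 3]  # any iterable stands in for the underlying dataloader
stacked = LoggingWrapper(TimingWrapper(base))

print(TimingWrapper.is_already_wrapped(stacked))               # True
print(LoggingWrapper.is_already_wrapped(TimingWrapper(base)))  # False
```

Because the check is a classmethod, each wrapper subclass can ask whether it specifically already appears somewhere in the chain, which helps avoid applying the same wrapper twice.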

composer.datasets.dataloader.unwrap_data_loader(dataloader)

Recursively unwraps a dataloader if it is of type WrappedDataLoader.

Parameters

dataloader (DataLoader) – The dataloader to unwrap.

Returns

DataLoader – The underlying dataloader.
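The recursive unwrapping can be sketched as follows. `Wrapper` and `unwrap` are hypothetical stand-ins written for illustration, assuming the wrapper keeps the wrapped object in a `dataloader` attribute; they are not Composer's implementation.

```python
class Wrapper:
    """Hypothetical stand-in for a wrapping dataloader."""

    def __init__(self, dataloader):
        self.dataloader = dataloader


def unwrap(dataloader):
    # Recurse until the object is no longer a wrapper.
    if isinstance(dataloader, Wrapper):
        return unwrap(dataloader.dataloader)
    return dataloader


base = range(4)  # any iterable stands in for the underlying dataloader
doubly_wrapped = Wrapper(Wrapper(base))

print(unwrap(doubly_wrapped) is base)  # True
print(unwrap(base) is base)            # True
```

An object that is not wrapped at all is returned unchanged, so the function is safe to call on any dataloader.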