composer.datasets.hparams

Hparams

These classes are used with yahp for YAML-based configuration.

DatasetHparams

Abstract base class for hyperparameters to initialize a dataset.

SyntheticHparamsMixin

Synthetic dataset parameter mixin for DatasetHparams.

class composer.datasets.hparams.DatasetHparams(is_train=True, drop_last=True, shuffle=True, datadir=None)

Bases: yahp.hparams.Hparams, abc.ABC

Abstract base class for hyperparameters to initialize a dataset.

Parameters
  • datadir (str) – The path to the data directory.

  • is_train (bool) – Whether to load the training data (the default) or validation data.

  • drop_last (bool) – If the number of samples is not divisible by the batch size, whether to drop the last batch (the default) or pad the last batch with zeros.

  • shuffle (bool) – Whether to shuffle the dataset. Defaults to True.
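The effect of drop_last can be illustrated with a small, library-independent sketch. The helper `make_batches` and its `pad_value` argument are hypothetical names used only for illustration and are not part of Composer. The sketch follows the description above, and how a given dataset actually pads may differ.

```python
from typing import List, Sequence


def make_batches(samples: Sequence[int], batch_size: int, drop_last: bool,
                 pad_value: int = 0) -> List[List[int]]:
    """Split samples into batches, dropping or zero-padding an incomplete final batch."""
    batches = []
    for start in range(0, len(samples), batch_size):
        batch = list(samples[start:start + batch_size])
        if len(batch) < batch_size:
            if drop_last:
                # Only the final batch can be short, so stop here.
                break
            # Pad the short batch up to the full batch size.
            batch.extend([pad_value] * (batch_size - len(batch)))
        batches.append(batch)
    return batches


samples = list(range(1, 11))  # 10 samples, not divisible by 4
dropped = make_batches(samples, batch_size=4, drop_last=True)
padded = make_batches(samples, batch_size=4, drop_last=False)
print(dropped)  # two full batches
print(padded)   # three batches, the last one padded with zeros
```

With 10 samples and a batch size of 4, drop_last=True yields two batches and discards the remaining two samples, while drop_last=False yields three batches.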

abstract initialize_object(batch_size, dataloader_hparams)

Creates a DataLoader or DataSpec for this dataset.

Parameters
  • batch_size (int) – The size of the batch the dataloader should yield. This batch size is device-specific and already incorporates the world size.

  • dataloader_hparams (DataloaderHparams) – The dataset-independent hparams for the dataloader.

Returns
  • DataLoader or DataSpec – The dataloader, or, if the dataloader yields batches of custom types, a DataSpec.
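The abstract-base pattern described above can be sketched with the standard library alone. Everything here is a stand-in: `DatasetHparamsSketch`, `RangeDatasetHparams`, its `num_samples` field, and `per_device_batch_size` are hypothetical names. The real class derives from yahp.hparams.Hparams and returns a DataLoader or DataSpec, not a list. The division by world size is one reading of "already incorporates the world size", not a confirmed implementation detail.

```python
import abc
from dataclasses import dataclass
from typing import Any, Dict, List, Optional


@dataclass
class DatasetHparamsSketch(abc.ABC):
    """Stand-in mirroring the documented fields; NOT Composer's implementation."""
    is_train: bool = True
    drop_last: bool = True
    shuffle: bool = True
    datadir: Optional[str] = None

    @abc.abstractmethod
    def initialize_object(self, batch_size: int,
                          dataloader_hparams: Dict[str, Any]) -> List[List[int]]:
        """Build the batch source; a real subclass would return a dataloader."""


@dataclass
class RangeDatasetHparams(DatasetHparamsSketch):
    num_samples: int = 10  # hypothetical dataset-specific field

    def initialize_object(self, batch_size, dataloader_hparams):
        # Shuffling is omitted to keep the sketch deterministic.
        samples = list(range(self.num_samples))
        batches = [samples[i:i + batch_size]
                   for i in range(0, len(samples), batch_size)]
        if self.drop_last and batches and len(batches[-1]) < batch_size:
            batches.pop()  # discard the incomplete final batch
        return batches


def per_device_batch_size(global_batch_size: int, world_size: int) -> int:
    """Assumed relationship: each device receives an equal share of the global batch."""
    if global_batch_size % world_size != 0:
        raise ValueError("global batch size must be divisible by world size")
    return global_batch_size // world_size


hparams = RangeDatasetHparams(shuffle=False, num_samples=10)
batches = hparams.initialize_object(per_device_batch_size(8, 2), dataloader_hparams={})
print(batches)
```

Because initialize_object is abstract, the base class itself cannot be instantiated; only concrete subclasses that implement the method can.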

class composer.datasets.hparams.SyntheticHparamsMixin(use_synthetic=False, synthetic_num_unique_samples=100, synthetic_device='cpu', synthetic_memory_format=MemoryFormat.CONTIGUOUS_FORMAT)

Bases: yahp.hparams.Hparams, abc.ABC

Synthetic dataset parameter mixin for DatasetHparams.

Parameters
  • use_synthetic (bool, optional) – Whether to use synthetic data. (Default: False)

  • synthetic_num_unique_samples (int, optional) – The number of unique samples to allocate memory for. Ignored if use_synthetic is False. (Default: 100)

  • synthetic_device (str, optional) – The device on which to store the sample pool. Set to cuda to store samples on the GPU and eliminate PCI-e transfers in the dataloader. Set to cpu to move data between host memory and the device on every batch. Ignored if use_synthetic is False. (Default: cpu)

  • synthetic_memory_format – The MemoryFormat to use. Ignored if use_synthetic is False. (Default: CONTIGUOUS_FORMAT)
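The idea of a fixed pool of unique synthetic samples can be sketched as follows. This is an illustration under assumptions, not Composer's implementation: `SyntheticSketch`, `build_pool`, `cycle_batches`, and the `sample_len` and `seed` arguments are hypothetical, and device placement and memory format are left out because they require a tensor library.

```python
import random
from dataclasses import dataclass
from itertools import islice
from typing import Iterator, List


@dataclass
class SyntheticSketch:
    """Stand-in carrying two of the documented fields."""
    use_synthetic: bool = False
    synthetic_num_unique_samples: int = 100

    def build_pool(self, sample_len: int, seed: int = 0) -> List[List[float]]:
        """Allocate the pool once; return an empty pool when synthetic data is disabled."""
        if not self.use_synthetic:
            return []
        rng = random.Random(seed)
        return [[rng.random() for _ in range(sample_len)]
                for _ in range(self.synthetic_num_unique_samples)]


def cycle_batches(pool: List[List[float]], batch_size: int) -> Iterator[List[List[float]]]:
    """Yield batches indefinitely by wrapping around the fixed pool."""
    if not pool:
        raise ValueError("sample pool is empty")
    index = 0
    while True:
        yield [pool[(index + offset) % len(pool)] for offset in range(batch_size)]
        index += batch_size


spec = SyntheticSketch(use_synthetic=True, synthetic_num_unique_samples=3)
pool = spec.build_pool(sample_len=2)
first_two = list(islice(cycle_batches(pool, batch_size=2), 2))
print(len(pool), len(first_two))
```

Only synthetic_num_unique_samples samples are ever allocated; batches beyond that count reuse the same objects, which keeps memory use bounded regardless of how many batches are drawn.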