SyntheticBatchPairDataset

class composer.datasets.SyntheticBatchPairDataset(*, total_dataset_size, data_shape, num_unique_samples_to_create=100, data_type=SyntheticDataType.GAUSSIAN, label_type=SyntheticDataLabelType.CLASSIFICATION_INT, num_classes=None, label_shape=None, device='cpu', memory_format=MemoryFormat.CONTIGUOUS_FORMAT, transform=None)

Emulates a dataset of provided size and shape.

Parameters
  • total_dataset_size (int) – The total size of the dataset to emulate.

  • data_shape (List[int]) – Shape of the tensor for input samples.

  • num_unique_samples_to_create (int) – The number of unique samples to allocate memory for.

  • data_type (str or SyntheticDataType, optional) – Type of synthetic data to create. Default: SyntheticDataType.GAUSSIAN.

  • label_type (str or SyntheticDataLabelType, optional) – Type of synthetic label to create. Default: SyntheticDataLabelType.CLASSIFICATION_INT.

  • num_classes (int, optional) – Number of classes to use. Required if label_type is CLASSIFICATION_INT or CLASSIFICATION_ONE_HOT. Default: None.

  • label_shape (List[int], optional) – Shape of the tensor for each sample label. Default: None.

  • device (str) – Device on which to store the sample pool. Set to 'cuda' to keep samples on the GPU and eliminate PCIe traffic from the dataloader. Set to 'cpu' to move data from host memory to the GPU on every batch. Default: 'cpu'.

  • memory_format (MemoryFormat, optional) – Memory format for the sample pool. Default: MemoryFormat.CONTIGUOUS_FORMAT.

  • transform (Callable, optional) – Transform(s) to apply to data. Default: None.
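
The relationship between total_dataset_size and num_unique_samples_to_create can be illustrated with a small standard-library sketch. The class TinySyntheticPairs below is hypothetical and is not Composer's implementation. It assumes that the emulated dataset reports total_dataset_size as its length while only a small pool of samples is allocated, and that indices wrap around that pool. It also assumes Gaussian inputs and integer class labels, mirroring the default data_type and label_type.

```python
import random
from typing import List, Tuple


class TinySyntheticPairs:
    """Hypothetical stand-in for illustration only; not Composer's code."""

    def __init__(self, *, total_dataset_size: int, data_shape: List[int],
                 num_unique_samples_to_create: int = 100,
                 num_classes: int = 10, seed: int = 0) -> None:
        rng = random.Random(seed)
        # Number of values in one sample of shape data_shape.
        numel = 1
        for dim in data_shape:
            numel *= dim
        self.total_dataset_size = total_dataset_size
        # Allocate only the small pool of unique samples (flattened Gaussian values).
        self.inputs = [[rng.gauss(0.0, 1.0) for _ in range(numel)]
                       for _ in range(num_unique_samples_to_create)]
        # One integer class label per pooled sample.
        self.labels = [rng.randrange(num_classes)
                       for _ in range(num_unique_samples_to_create)]

    def __len__(self) -> int:
        # The dataset reports the emulated size, not the pool size.
        return self.total_dataset_size

    def __getitem__(self, idx: int) -> Tuple[List[float], int]:
        if not 0 <= idx < self.total_dataset_size:
            raise IndexError(idx)
        # Assumption: wrap the index so the pool repeats across the dataset.
        pool_idx = idx % len(self.inputs)
        return self.inputs[pool_idx], self.labels[pool_idx]


dataset = TinySyntheticPairs(total_dataset_size=1000, data_shape=[3, 4, 4],
                             num_unique_samples_to_create=8, num_classes=5)
x, y = dataset[0]
```

Under these assumptions, memory use scales with num_unique_samples_to_create rather than total_dataset_size, which is why a very large dataset can be emulated cheaply for throughput benchmarking.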