composer.datasets.synthetic

Synthetic datasets used for testing, profiling, and debugging.

Classes

SyntheticBatchPairDataset

Emulates a dataset of provided size and shape.

SyntheticDataLabelType

Defines the class label type of the synthetic data.

SyntheticDataType

Defines the distribution of the synthetic data.

SyntheticPILDataset

Similar to SyntheticBatchPairDataset, but yields samples of type Image and supports dataset transformations.

class composer.datasets.synthetic.SyntheticBatchPairDataset(*, total_dataset_size, data_shape, num_unique_samples_to_create=100, data_type=SyntheticDataType.GAUSSIAN, label_type=SyntheticDataLabelType.CLASSIFICATION_INT, num_classes=None, label_shape=None, device='cpu', memory_format=MemoryFormat.CONTIGUOUS_FORMAT, transform=None)

Bases: torch.utils.data.dataset.Dataset

Emulates a dataset of provided size and shape.

Parameters
  • total_dataset_size (int) – The total size of the dataset to emulate.

  • data_shape (List[int]) – Shape of the tensor for input samples.

  • num_unique_samples_to_create (int) – The number of unique samples to allocate memory for.

  • data_type (str or SyntheticDataType, optional) – Distribution of the synthetic data to create. Default: SyntheticDataType.GAUSSIAN.

  • label_type (str or SyntheticDataLabelType, optional) – Type of class label to create. Default: SyntheticDataLabelType.CLASSIFICATION_INT.

  • num_classes (int, optional) – Number of classes to use. Required if label_type is CLASSIFICATION_INT or CLASSIFICATION_ONE_HOT. Default: None.

  • label_shape (List[int], optional) – Shape of the tensor for each sample label. Default: None.

  • device (str) – Device on which to store the sample pool. Set to 'cuda' to store samples on the GPU, which avoids transferring each batch over PCIe. Set to 'cpu' to keep samples in host memory and move each batch to the GPU. Default: 'cpu'.

  • memory_format (MemoryFormat, optional) – Memory format for the sample pool. Default: MemoryFormat.CONTIGUOUS_FORMAT.

  • transform (Callable, optional) – Transform(s) to apply to data. Default: None.
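
The relationship between total_dataset_size and num_unique_samples_to_create can be illustrated with a minimal, standard-library-only sketch. PoolBackedDataset is a hypothetical stand-in, not the Composer class: it assumes that only the pool is allocated and that indices wrap around it, which may differ from how SyntheticBatchPairDataset selects samples. Samples here are flat lists of floats rather than tensors.

```python
import random


class PoolBackedDataset:
    """Hypothetical stand-in: reports a large length but stores a small pool."""

    def __init__(self, *, total_dataset_size, data_shape,
                 num_unique_samples_to_create=100, num_classes=2, seed=0):
        rng = random.Random(seed)
        self.total_dataset_size = total_dataset_size

        # Number of scalar values in one sample of shape `data_shape`.
        n_values = 1
        for dim in data_shape:
            n_values *= dim

        # Allocate memory only for the pool of unique (sample, label) pairs.
        self.pool = [
            ([rng.gauss(0.0, 1.0) for _ in range(n_values)],
             rng.randrange(num_classes))
            for _ in range(num_unique_samples_to_create)
        ]

    def __len__(self):
        # The emulated size, not the pool size.
        return self.total_dataset_size

    def __getitem__(self, index):
        if not 0 <= index < self.total_dataset_size:
            raise IndexError(index)
        # Assumption: indices wrap around the pool of unique samples.
        return self.pool[index % len(self.pool)]


dataset = PoolBackedDataset(total_dataset_size=1000, data_shape=[3, 4, 4],
                            num_unique_samples_to_create=100, num_classes=10)
sample, label = dataset[0]
```

Under these assumptions, memory use scales with the pool size rather than the emulated dataset size, which is why a small pool is useful for profiling a data pipeline.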

class composer.datasets.synthetic.SyntheticDataLabelType(value)

Bases: composer.utils.string_enum.StringEnum

Defines the class label type of the synthetic data.

CLASSIFICATION_INT

Class labels are ints.

CLASSIFICATION_ONE_HOT

Class labels are one-hot vectors.
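
To make the difference between the two label types concrete, here is a small standard-library sketch. The helper one_hot is hypothetical and is not part of Composer; it only shows how an integer class label corresponds to a one-hot vector of length num_classes.

```python
def one_hot(label, num_classes):
    """Return a one-hot list with a 1.0 at position `label` (hypothetical helper)."""
    if not 0 <= label < num_classes:
        raise ValueError(f"label {label} is outside [0, {num_classes})")
    vec = [0.0] * num_classes
    vec[label] = 1.0
    return vec


int_label = 2                                       # integer-style label
one_hot_label = one_hot(int_label, num_classes=4)   # one-hot-style label
```

Both forms need num_classes to be known, which is consistent with num_classes being required for either label type.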

class composer.datasets.synthetic.SyntheticDataType(value)

Bases: composer.utils.string_enum.StringEnum

Defines the distribution of the synthetic data.

GAUSSIAN

Standard Gaussian distribution.

SEPARABLE

Gaussian distributed, but classes will be mean-shifted for separability.
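
The idea of mean-shifting classes for separability can be sketched with the standard library alone. The function sample_separable and its shift parameter are hypothetical; the sketch assumes each class mean is offset by label * shift, and the exact shifting scheme Composer uses may differ.

```python
import random
from statistics import mean


def sample_separable(rng, n, label, shift=3.0):
    """Draw n unit-variance Gaussian values whose mean depends on the class label."""
    # Assumption: class `label` is centered at label * shift.
    return [rng.gauss(label * shift, 1.0) for _ in range(n)]


rng = random.Random(0)
class0 = sample_separable(rng, 2000, label=0)  # centered near 0.0
class1 = sample_separable(rng, 2000, label=1)  # centered near 3.0

# The gap between class means is what makes the classes separable.
gap = mean(class1) - mean(class0)
```

With plain Gaussian data the labels carry no signal, so a model can only memorize them; shifting the class means gives a model something to learn, which can be handy when debugging a training loop.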

class composer.datasets.synthetic.SyntheticPILDataset(*, total_dataset_size, data_shape=(64, 64, 3), num_unique_samples_to_create=100, data_type=SyntheticDataType.GAUSSIAN, label_type=SyntheticDataLabelType.CLASSIFICATION_INT, num_classes=None, label_shape=None, transform=None)

Bases: torchvision.datasets.vision.VisionDataset

Similar to SyntheticBatchPairDataset, but yields samples of type Image and supports dataset transformations.

Parameters
  • total_dataset_size (int) – The total size of the dataset to emulate.

  • data_shape (List[int]) – Shape of the tensor for input samples.

  • num_unique_samples_to_create (int) – The number of unique samples to allocate memory for.

  • data_type (str or SyntheticDataType, optional) – Distribution of the synthetic data to create. Default: SyntheticDataType.GAUSSIAN.

  • label_type (str or SyntheticDataLabelType, optional) – Type of class label to create. Default: SyntheticDataLabelType.CLASSIFICATION_INT.

  • num_classes (int, optional) – Number of classes to use. Required if label_type is CLASSIFICATION_INT or CLASSIFICATION_ONE_HOT. Default: None.

  • label_shape (List[int], optional) – Shape of the tensor for each sample label. Default: None.

  • transform (Callable, optional) – Transform(s) to apply to data. Default: None.