composer.datasets.synthetic

Classes

MemoryFormat

An enumeration.

StringEnum

Base class for Enums containing string values.

SyntheticBatchPairDataset

Emulates a dataset of provided size and shape.

SyntheticDataLabelType

An enumeration.

SyntheticDataType

An enumeration.

SyntheticPILDataset

Similar to SyntheticBatchPairDataset, but yields samples of type Image and supports dataset transformations.

VisionDataset

Base class for making datasets that are compatible with torchvision.

Attributes

  • Callable

  • Optional

  • Sequence

  • Union

class composer.datasets.synthetic.SyntheticBatchPairDataset(*, total_dataset_size, data_shape, num_unique_samples_to_create=100, data_type=SyntheticDataType.GAUSSIAN, label_type=SyntheticDataLabelType.CLASSIFICATION_INT, num_classes=None, label_shape=None, device='cpu', memory_format=MemoryFormat.CONTIGUOUS_FORMAT, transform=None)

Bases: torch.utils.data.dataset.Dataset

Emulates a dataset of provided size and shape.

Parameters
  • total_dataset_size (int) – The total size of the dataset to emulate.

  • data_shape (List[int]) – Shape of the tensor for input samples.

  • num_unique_samples_to_create (int) – The number of unique samples to allocate memory for. Default: 100.

  • data_type (str or SyntheticDataType, optional) – Type of synthetic data to create. Default: SyntheticDataType.GAUSSIAN.

  • label_type (str or SyntheticDataLabelType, optional) – Type of synthetic label to create. Default: SyntheticDataLabelType.CLASSIFICATION_INT.

  • num_classes (int, optional) – Number of classes to use. Required if label_type is CLASSIFICATION_INT or CLASSIFICATION_ONE_HOT. Otherwise, should be None.

  • label_shape (List[int]) – Shape of the tensor for each sample label.

  • device (str) – Device on which to store the sample pool. Set to cuda to store samples on the GPU and eliminate PCI-e transfers by the dataloader. Set to cpu to move data between host memory and the GPU on every batch. Default: 'cpu'.

  • memory_format (MemoryFormat, optional) – Memory format for the sample pool. Default: MemoryFormat.CONTIGUOUS_FORMAT.
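The relationship between total_dataset_size and num_unique_samples_to_create can be illustrated with a minimal, standard-library-only sketch. It is not Composer's implementation: the class name PooledSyntheticDataset, the seed argument, the flat-list sample representation, and the modulo indexing used to reuse the pool are all assumptions made for illustration.

```python
import random
from typing import List, Sequence, Tuple


class PooledSyntheticDataset:
    """Hypothetical stand-in that illustrates a fixed sample pool (not Composer's code)."""

    def __init__(self, *, total_dataset_size: int, data_shape: Sequence[int],
                 num_unique_samples_to_create: int = 100, num_classes: int = 10,
                 seed: int = 0) -> None:
        self.total_dataset_size = total_dataset_size
        self.num_unique_samples_to_create = num_unique_samples_to_create
        rng = random.Random(seed)
        # Number of scalar values in one sample of the requested shape.
        num_values = 1
        for dim in data_shape:
            num_values *= dim
        # Allocate the pool once: Gaussian values (flattened) plus an integer class label.
        self.pool: List[Tuple[List[float], int]] = [
            ([rng.gauss(0.0, 1.0) for _ in range(num_values)], rng.randrange(num_classes))
            for _ in range(num_unique_samples_to_create)
        ]

    def __len__(self) -> int:
        # The dataset reports the emulated size, not the pool size.
        return self.total_dataset_size

    def __getitem__(self, idx: int) -> Tuple[List[float], int]:
        if not 0 <= idx < self.total_dataset_size:
            raise IndexError(idx)
        # Assumed reuse strategy: wrap the index around the pool.
        return self.pool[idx % self.num_unique_samples_to_create]


dataset = PooledSyntheticDataset(total_dataset_size=1000, data_shape=[3, 4, 4],
                                 num_unique_samples_to_create=8)
```

Under these assumptions, memory use scales with num_unique_samples_to_create rather than total_dataset_size, which is why a large dataset can be emulated cheaply.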

class composer.datasets.synthetic.SyntheticDataLabelType(value)

Bases: composer.utils.string_enum.StringEnum

An enumeration.

class composer.datasets.synthetic.SyntheticDataType(value)

Bases: composer.utils.string_enum.StringEnum

An enumeration.
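The data_type and label_type parameters are documented as accepting either a string or an enum member. A minimal sketch of how a string-valued enum can support both forms follows. The class DataType, the helper coerce_data_type, the lowercase string values, and the SEPARABLE member are hypothetical; only the GAUSSIAN member name appears on this page, and StringEnum may behave differently.

```python
from enum import Enum
from typing import Union


class DataType(str, Enum):
    """Hypothetical string-valued enum; the values are illustrative assumptions."""
    GAUSSIAN = "gaussian"
    SEPARABLE = "separable"  # hypothetical second member, for illustration only


def coerce_data_type(value: Union[str, DataType]) -> DataType:
    """Return an enum member for either a member or its string value."""
    # Check for the enum first, because members of a str-mixin enum are also str.
    if isinstance(value, DataType):
        return value
    # Case-insensitive lookup by value; raises ValueError for unknown strings.
    return DataType(value.lower())
```

With this sketch, coerce_data_type("gaussian") and coerce_data_type(DataType.GAUSSIAN) resolve to the same member, so callers can pass whichever form is more convenient.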

class composer.datasets.synthetic.SyntheticPILDataset(*, total_dataset_size, data_shape=(64, 64, 3), num_unique_samples_to_create=100, data_type=SyntheticDataType.GAUSSIAN, label_type=SyntheticDataLabelType.CLASSIFICATION_INT, num_classes=None, label_shape=None, transform=None)

Bases: torchvision.datasets.vision.VisionDataset

Similar to SyntheticBatchPairDataset, but yields samples of type Image and supports dataset transformations.

Parameters
  • total_dataset_size (int) – The total size of the dataset to emulate.

  • data_shape (List[int]) – Shape of the image for input samples. Default: (64, 64, 3).

  • num_unique_samples_to_create (int) – The number of unique samples to allocate memory for. Default: 100.

  • data_type (str or SyntheticDataType, optional) – Type of synthetic data to create. Default: SyntheticDataType.GAUSSIAN.

  • label_type (str or SyntheticDataLabelType, optional) – Type of synthetic label to create. Default: SyntheticDataLabelType.CLASSIFICATION_INT.

  • num_classes (int, optional) – Number of classes to use. Required if label_type is CLASSIFICATION_INT or CLASSIFICATION_ONE_HOT. Otherwise, should be None.

  • label_shape (List[int]) – Shape of the tensor for each sample label.

  • transform (Callable) – Dataset transforms to apply to each sample.
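The num_classes rule stated above (required for the two classification label types, None otherwise) can be expressed as a small check. This is a sketch under assumptions, not the library's validation code: LabelType, its string values, the RANDOM_INT member, and validate_num_classes are hypothetical names, and raising an error for a non-None value in the non-classification case is a stricter reading of "should be None" than the source states.

```python
from enum import Enum
from typing import Optional


class LabelType(Enum):
    """Hypothetical mirror of the label types named in the parameter list."""
    CLASSIFICATION_INT = "classification_int"
    CLASSIFICATION_ONE_HOT = "classification_one_hot"
    RANDOM_INT = "random_int"  # hypothetical non-classification label type


def validate_num_classes(label_type: LabelType, num_classes: Optional[int]) -> None:
    """Raise ValueError if num_classes is inconsistent with label_type."""
    needs_classes = label_type in (LabelType.CLASSIFICATION_INT,
                                   LabelType.CLASSIFICATION_ONE_HOT)
    if needs_classes:
        # Classification labels need a positive class count.
        if num_classes is None or num_classes <= 0:
            raise ValueError("num_classes must be a positive int for classification labels")
    elif num_classes is not None:
        # Assumed strict handling of "should be None" for other label types.
        raise ValueError("num_classes should be None for non-classification labels")
```

Running a check of this kind before allocating the sample pool surfaces a misconfigured num_classes early instead of during label generation.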