composer.datasets.synthetic

Classes

`MemoryFormat`	An enumeration.
`StringEnum`	Base class for Enums containing string values.
`SyntheticBatchPairDataset`	Emulates a dataset of provided size and shape.
`SyntheticDataLabelType`	An enumeration.
`SyntheticDataType`	An enumeration.
`SyntheticPILDataset`	Similar to `SyntheticBatchPairDataset`, but yields samples of type `Image` and supports dataset transformations.
`VisionDataset`	Base class for making datasets that are compatible with torchvision.

Attributes

Callable
Optional
Sequence
Union

class composer.datasets.synthetic.SyntheticBatchPairDataset(*, total_dataset_size, data_shape, num_unique_samples_to_create=100, data_type=SyntheticDataType.GAUSSIAN, label_type=SyntheticDataLabelType.CLASSIFICATION_INT, num_classes=None, label_shape=None, device='cpu', memory_format=MemoryFormat.CONTIGUOUS_FORMAT, transform=None)

Bases: torch.utils.data.dataset.Dataset

Emulates a dataset of provided size and shape.

Parameters

total_dataset_size (int) – The total size of the dataset to emulate.
data_shape (List[int]) – Shape of the tensor for input samples.
num_unique_samples_to_create (int) – The number of unique samples to allocate memory for.
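data_type (str or SyntheticDataType, optional) – Type of synthetic data to create. Default: SyntheticDataType.GAUSSIAN.
label_type (str or SyntheticDataLabelType, optional) – Type of synthetic labels to create. Default: SyntheticDataLabelType.CLASSIFICATION_INT.
num_classes (int, optional) – Number of classes to use. Required if label_type is CLASSIFICATION_INT or CLASSIFICATION_ONE_HOT. Otherwise, should be None.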
label_shape (List[int]) – Shape of the tensor for each sample label.
device (str) – Device on which to store the sample pool. Set to cuda to keep samples on the GPU, which avoids host-to-GPU transfers over PCIe in the dataloader. Set to cpu to keep samples in host memory, so that data moves between host memory and the GPU on every batch. Default: cpu.
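memory_format (MemoryFormat, optional) – Memory format for the sample pool. Default: MemoryFormat.CONTIGUOUS_FORMAT.
transform (Callable, optional) – Dataset transforms. Default: None.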
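
The relationship between total_dataset_size and num_unique_samples_to_create can be illustrated with a small standard-library sketch. It is not the library's implementation: the class name PooledSyntheticDataset, the seed argument, the flat-list sample layout, and the assumption that indices wrap around the pool with a modulo are all hypothetical choices made for illustration.

```python
import random
from typing import List, Sequence, Tuple


class PooledSyntheticDataset:
    """Hypothetical stand-in: a small pool of random samples reused to emulate a larger dataset."""

    def __init__(self, *, total_dataset_size: int, data_shape: Sequence[int],
                 num_unique_samples_to_create: int = 100, num_classes: int = 10,
                 seed: int = 0) -> None:
        rng = random.Random(seed)
        self.total_dataset_size = total_dataset_size
        self.num_unique = num_unique_samples_to_create
        # Number of scalar values in one sample (product of the shape dimensions).
        numel = 1
        for dim in data_shape:
            numel *= dim
        # Allocate the pool once; each sample is a flat list of Gaussian values.
        self.inputs: List[List[float]] = [
            [rng.gauss(0.0, 1.0) for _ in range(numel)]
            for _ in range(self.num_unique)
        ]
        # Integer class labels in the range [0, num_classes).
        self.targets: List[int] = [
            rng.randrange(num_classes) for _ in range(self.num_unique)
        ]

    def __len__(self) -> int:
        # The reported length is the emulated size, not the pool size.
        return self.total_dataset_size

    def __getitem__(self, idx: int) -> Tuple[List[float], int]:
        if not 0 <= idx < self.total_dataset_size:
            raise IndexError(idx)
        # Assumption: indices wrap around the pool of unique samples.
        slot = idx % self.num_unique
        return self.inputs[slot], self.targets[slot]


dataset = PooledSyntheticDataset(total_dataset_size=1000, data_shape=[3, 4, 4],
                                 num_unique_samples_to_create=8, num_classes=5)
```

Under these assumptions, memory use scales with num_unique_samples_to_create rather than total_dataset_size. With the real class, the equivalent call would follow the signature above, for example SyntheticBatchPairDataset(total_dataset_size=1000, data_shape=[3, 4, 4], num_unique_samples_to_create=8, num_classes=5).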

class composer.datasets.synthetic.SyntheticDataLabelType(value)

Bases: composer.utils.string_enum.StringEnum

An enumeration of the label types for synthetic data, including CLASSIFICATION_INT and CLASSIFICATION_ONE_HOT.
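
Because label_type is documented as accepting either a str or a SyntheticDataLabelType, a string-valued enum along the following lines may help picture how both forms could resolve to the same member. This is a sketch only: LabelType and coerce_label_type are hypothetical names, and the lowercase string values are assumed rather than taken from the library.

```python
from enum import Enum
from typing import Union


class LabelType(str, Enum):
    """Hypothetical string-valued enum; the lowercase values are assumptions."""
    CLASSIFICATION_INT = "classification_int"
    CLASSIFICATION_ONE_HOT = "classification_one_hot"


def coerce_label_type(value: Union[str, LabelType]) -> LabelType:
    """Accept either an enum member or its string form, case-insensitively."""
    # Members are checked first because they are also instances of str.
    if isinstance(value, LabelType):
        return value
    # Lookup by value raises ValueError for an unknown string.
    return LabelType(value.lower())
```

Normalizing once at the boundary lets the rest of the code compare against enum members only.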

class composer.datasets.synthetic.SyntheticDataType(value)

Bases: composer.utils.string_enum.StringEnum

An enumeration of the synthetic data types. GAUSSIAN is the default used by the dataset classes on this page.

class composer.datasets.synthetic.SyntheticPILDataset(*, total_dataset_size, data_shape=(64, 64, 3), num_unique_samples_to_create=100, data_type=SyntheticDataType.GAUSSIAN, label_type=SyntheticDataLabelType.CLASSIFICATION_INT, num_classes=None, label_shape=None, transform=None)

Bases: torchvision.datasets.vision.VisionDataset

Similar to SyntheticBatchPairDataset, but yields samples of type Image and supports dataset transformations.

Parameters

total_dataset_size (int) – The total size of the dataset to emulate.
data_shape (List[int]) – Shape of the image for input samples. Default: (64, 64, 3).
num_unique_samples_to_create (int) – The number of unique samples to allocate memory for.
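data_type (str or SyntheticDataType, optional) – Type of synthetic data to create. Default: SyntheticDataType.GAUSSIAN.
label_type (str or SyntheticDataLabelType, optional) – Type of synthetic labels to create. Default: SyntheticDataLabelType.CLASSIFICATION_INT.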
num_classes (int, optional) – Number of classes to use. Required if SyntheticDataLabelType is CLASSIFICATION_INT or CLASSIFICATION_ONE_HOT. Otherwise, should be None.
label_shape (List[int]) – Shape of the tensor for each sample label.
transform (Callable, optional) – Dataset transforms to apply to each sample. Default: None.
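
The num_classes rule stated in the parameter list can be written as a small check. The sketch below is an assumption about how such validation might look, not the library's code: check_num_classes is a hypothetical helper, label types are represented as plain lowercase strings, and "regression" in the usage lines is only a placeholder for any non-classification label type.

```python
from typing import Optional

# Label types that need a class count, per the parameter description.
CLASSIFICATION_LABEL_TYPES = {"classification_int", "classification_one_hot"}


def check_num_classes(label_type: str, num_classes: Optional[int]) -> None:
    """Raise ValueError if num_classes is inconsistent with label_type."""
    if label_type in CLASSIFICATION_LABEL_TYPES:
        # Classification labels require a positive class count.
        if num_classes is None or num_classes <= 0:
            raise ValueError(
                f"num_classes must be a positive int for label_type={label_type!r}"
            )
    elif num_classes is not None:
        # Any other label type should leave num_classes unset.
        raise ValueError(
            f"num_classes should be None for label_type={label_type!r}"
        )


check_num_classes("classification_int", 10)  # valid: classification with a class count
check_num_classes("regression", None)        # valid: placeholder non-classification type
```

Failing early on an inconsistent combination keeps the error close to the constructor call instead of surfacing later during label generation.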