composer.datasets.ade20k

ADE20K semantic segmentation and scene parsing dataset.

Refer to the ADE20K dataset for more details.

Classes

`ADE20k`	PyTorch Dataset for ADE20k.

Hparams

These classes are used with yahp for YAML-based configuration.

`ADE20kDatasetHparams`	Defines an instance of the ADE20k dataset for semantic segmentation from a local disk.
`ADE20kWebDatasetHparams`	Defines an instance of the ADE20k dataset for semantic segmentation from a remote blob store.

class composer.datasets.ade20k.ADE20k(datadir, split='train', both_transforms=None, image_transforms=None, target_transforms=None)

Bases: torch.utils.data.dataset.Dataset

PyTorch Dataset for ADE20k.

Parameters

datadir (str) – the path to the ADE20k folder.
split (str) – the dataset split to use, either 'train', 'val', or 'test'. Default: 'train'.
both_transforms (Module) – transformations to apply to the image and target simultaneously. Default: None.
image_transforms (Module) – transformations to apply to the image only. Default: None.
target_transforms (Module) – transformations to apply to the target only. Default: None.
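The three transform hooks can be read as a small pipeline. The sketch below is a pure-Python stand-in, not Composer's implementation: `apply_transforms` and `flip` are hypothetical helpers, nested lists stand in for images, and the order (joint transform first, then the image-only and target-only transforms) is an assumption.

```python
from typing import Any, Callable, Optional, Tuple

Sample = Tuple[Any, Any]


def apply_transforms(
    image: Any,
    target: Any,
    both_transforms: Optional[Callable[[Sample], Sample]] = None,
    image_transforms: Optional[Callable[[Any], Any]] = None,
    target_transforms: Optional[Callable[[Any], Any]] = None,
) -> Sample:
    """Hypothetical helper: chain the three optional hooks (assumed order)."""
    if both_transforms is not None:
        # The joint transform receives the pair, so geometry stays aligned.
        image, target = both_transforms((image, target))
    if image_transforms is not None:
        image = image_transforms(image)
    if target_transforms is not None:
        target = target_transforms(target)
    return image, target


def flip(pair: Sample) -> Sample:
    """Toy joint transform: mirror image and mask horizontally together."""
    img, tgt = pair
    return [row[::-1] for row in img], [row[::-1] for row in tgt]


# Toy data: nested lists stand in for an image and its segmentation mask.
image = [[10, 20, 30]]
target = [[0, 1, 2]]
out_image, out_target = apply_transforms(
    image,
    target,
    both_transforms=flip,
    image_transforms=lambda img: [[v / 10 for v in row] for row in img],
)
```

Geometric augmentations such as flips or crops belong in `both_transforms` so the mask stays aligned with the image, while value-only changes such as color normalization fit `image_transforms`.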

class composer.datasets.ade20k.ADE20kDatasetHparams(use_synthetic=False, synthetic_num_unique_samples=100, synthetic_device='cpu', synthetic_memory_format=MemoryFormat.CONTIGUOUS_FORMAT, is_train=True, drop_last=True, shuffle=True, datadir=None, split='train', base_size=512, min_resize_scale=0.5, max_resize_scale=2.0, final_size=512, ignore_background=True)

Bases: composer.datasets.hparams.DatasetHparams, composer.datasets.hparams.SyntheticHparamsMixin

Defines an instance of the ADE20k dataset for semantic segmentation from a local disk.

Parameters

use_synthetic (bool, optional) – Whether to use synthetic data. Default: False.
synthetic_num_unique_samples (int, optional) – The number of unique samples to allocate memory for. Ignored if use_synthetic is False. Default: 100.
synthetic_device (str, optional) – The device to store the sample pool on. Set to 'cuda' to store samples on the GPU and eliminate PCI-e transfers from the dataloader. Set to 'cpu' to move data between host memory and the device on every batch. Ignored if use_synthetic is False. Default: 'cpu'.
synthetic_memory_format (MemoryFormat, optional) – The MemoryFormat to use. Ignored if use_synthetic is False. Default: MemoryFormat.CONTIGUOUS_FORMAT.
datadir (str) – The path to the data directory.
is_train (bool) – Whether to load the training data or validation data. Default: True.
drop_last (bool) – If the number of samples is not divisible by the batch size, whether to drop the last batch or pad the last batch with zeros. Default: True.
shuffle (bool) – Whether to shuffle the dataset. Default: True.
split (str) – the dataset split to use, either 'train', 'val', or 'test'. Default: 'train'.
base_size (int) – initial size of the image and target before other augmentations. Default: 512.
min_resize_scale (float) – the minimum value the samples can be rescaled. Default: 0.5.
max_resize_scale (float) – the maximum value the samples can be rescaled. Default: 2.0.
final_size (int) – the final size of the image and target. Default: 512.
ignore_background (bool) – if true, ignore the background class when calculating the training loss. Default: True.
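The size parameters describe a familiar segmentation augmentation: start from `base_size`, rescale by a random factor between `min_resize_scale` and `max_resize_scale`, then crop or pad to `final_size`. The arithmetic below is only an illustrative sketch. `sample_scaled_size` and `crop_or_pad_amount` are hypothetical helpers, uniform sampling is an assumption, and a single side length is used for simplicity.

```python
import random
from typing import Optional, Tuple


def sample_scaled_size(
    base_size: int = 512,
    min_resize_scale: float = 0.5,
    max_resize_scale: float = 2.0,
    rng: Optional[random.Random] = None,
) -> Tuple[float, int]:
    """Hypothetical helper: draw a scale and return (scale, scaled side length)."""
    rng = rng or random.Random()
    # Assumption: the scale is drawn uniformly from the closed range.
    scale = rng.uniform(min_resize_scale, max_resize_scale)
    return scale, int(base_size * scale)


def crop_or_pad_amount(size: int, final_size: int = 512) -> Tuple[int, int]:
    """Return (pixels to crop, pixels to pad) needed to reach final_size."""
    return max(size - final_size, 0), max(final_size - size, 0)


scale, scaled = sample_scaled_size(rng=random.Random(0))
crop, pad = crop_or_pad_amount(scaled)
```

With the defaults, the scaled side length falls between 256 and 1024 pixels, so a sample may need either padding or cropping to reach the 512-pixel `final_size`.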

initialize_object(batch_size, dataloader_hparams)

Creates a DataLoader or DataSpec for this dataset.

Parameters

batch_size (int) – The size of the batch the dataloader should yield. This batch size is device-specific and already incorporates the world size.
dataloader_hparams (DataLoaderHparams) – The dataset-independent hparams for the dataloader.

Returns

DataLoader or DataSpec – The DataLoader, or if the dataloader yields batches of custom types, a DataSpec.
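Because `batch_size` is already per device, a caller that starts from a global batch size divides by the world size first. A minimal sketch of that arithmetic follows; `per_device_batch_size` is a hypothetical helper, and the divisibility check is an assumption rather than documented Composer behavior.

```python
def per_device_batch_size(global_batch_size: int, world_size: int) -> int:
    """Hypothetical helper: split a global batch evenly across devices."""
    if world_size <= 0:
        raise ValueError("world_size must be positive")
    if global_batch_size % world_size != 0:
        # Assumption: uneven splits are rejected instead of rounded.
        raise ValueError("global_batch_size must be divisible by world_size")
    return global_batch_size // world_size


batch_size = per_device_batch_size(global_batch_size=128, world_size=8)
```

Under these assumptions, the resulting value is what would be passed as `batch_size` on each device.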

validate()

Validate that the hparams are of the correct types. Recurses through sub-hparams.

Raises: TypeError – If any field has an incorrect type.
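Recursive type validation of this kind can be illustrated with plain dataclasses. The sketch below is not yahp's implementation: `LoaderConfig`, `DatasetConfig`, and `validate_types` are hypothetical names, and only simple field types (`str`, `bool`, `int`) and nested dataclasses are handled.

```python
from dataclasses import dataclass, field, fields, is_dataclass
from typing import Any


@dataclass
class LoaderConfig:
    """Hypothetical nested config, standing in for sub-hparams."""
    num_workers: int = 8


@dataclass
class DatasetConfig:
    """Hypothetical top-level config with one nested dataclass field."""
    datadir: str = "/data/ade20k"
    shuffle: bool = True
    loader: LoaderConfig = field(default_factory=LoaderConfig)


def validate_types(cfg: Any) -> None:
    """Raise TypeError if a field value does not match its annotation."""
    for f in fields(cfg):
        value = getattr(cfg, f.name)
        if is_dataclass(value):
            # Recurse into nested configs.
            validate_types(value)
        elif not isinstance(value, f.type):
            raise TypeError(
                f"{f.name} should be {f.type.__name__}, "
                f"got {type(value).__name__}"
            )


validate_types(DatasetConfig())  # passes silently when all types match
```

A mistyped field, whether at the top level or inside the nested config, triggers the `TypeError`.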

class composer.datasets.ade20k.ADE20kWebDatasetHparams(is_train=True, drop_last=True, shuffle=True, datadir=None, webdataset_cache_dir='/tmp/webdataset_cache/', webdataset_cache_verbose=False, shuffle_buffer=256, remote='s3://mosaicml-internal-dataset-ade20k', name='ade20k', split='train', base_size=512, min_resize_scale=0.5, max_resize_scale=2.0, final_size=512, ignore_background=True)

Bases: composer.datasets.hparams.WebDatasetHparams

Defines an instance of the ADE20k dataset for semantic segmentation from a remote blob store.

Parameters

datadir (str) – The path to the data directory.
is_train (bool) – Whether to load the training data or validation data. Default: True.
drop_last (bool) – If the number of samples is not divisible by the batch size, whether to drop the last batch or pad the last batch with zeros. Default: True.
shuffle (bool) – Whether to shuffle the dataset. Default: True.
webdataset_cache_dir (str) – WebDataset cache directory. Default: '/tmp/webdataset_cache/'.
webdataset_cache_verbose (bool) – WebDataset cache verbosity. Default: False.
remote (str) – S3 bucket or root directory where dataset is stored. Default: 's3://mosaicml-internal-dataset-ade20k'.
name (str) – Key used to determine where dataset is cached on local filesystem. Default: 'ade20k'.
split (str) – the dataset split to use, either 'train', 'val', or 'test'. Default: 'train'.
base_size (int) – initial size of the image and target before other augmentations. Default: 512.
min_resize_scale (float) – the minimum value the samples can be rescaled. Default: 0.5.
max_resize_scale (float) – the maximum value the samples can be rescaled. Default: 2.0.
final_size (int) – the final size of the image and target. Default: 512.
ignore_background (bool) – if true, ignore the background class when calculating the training loss. Default: True.
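One common way to ignore a background class is to remap its label to a sentinel value that the loss skips. The sketch below illustrates the idea in pure Python; the sentinel value, the label shift, and both helper names are assumptions, not a description of Composer's internals.

```python
IGNORE_INDEX = -1  # assumed sentinel; not taken from the Composer source


def shift_background(mask):
    """Shift every label down by one so background (0) becomes IGNORE_INDEX."""
    return [[label - 1 for label in row] for row in mask]


def masked_mean(per_pixel_loss, mask, ignore_index=IGNORE_INDEX):
    """Average per-pixel losses, skipping pixels labelled ignore_index."""
    kept = [
        loss
        for loss_row, label_row in zip(per_pixel_loss, mask)
        for loss, label in zip(loss_row, label_row)
        if label != ignore_index
    ]
    return sum(kept) / len(kept) if kept else 0.0


mask = [[0, 1], [2, 0]]            # 0 marks background pixels
losses = [[9.0, 1.0], [3.0, 9.0]]  # made-up per-pixel loss values
shifted = shift_background(mask)
mean_loss = masked_mean(losses, shifted)
```

In this toy example the two background pixels carry large losses, yet neither contributes to the average.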

initialize_object(batch_size, dataloader_hparams)

Creates a DataLoader or DataSpec for this dataset.

Parameters

batch_size (int) – The size of the batch the dataloader should yield. This batch size is device-specific and already incorporates the world size.
dataloader_hparams (DataLoaderHparams) – The dataset-independent hparams for the dataloader.

Returns

DataLoader or DataSpec – The DataLoader, or if the dataloader yields batches of custom types, a DataSpec.
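The `drop_last` setting determines how many batches a dataloader yields per epoch when the sample count is not a multiple of the batch size. The arithmetic can be sketched as follows; `num_batches` is a hypothetical helper, not part of Composer.

```python
import math


def num_batches(num_samples: int, batch_size: int, drop_last: bool = True) -> int:
    """Hypothetical helper: count the batches yielded in one epoch."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    if drop_last:
        # The incomplete final batch is discarded.
        return num_samples // batch_size
    # The incomplete final batch is kept, so round up.
    return math.ceil(num_samples / batch_size)


dropped = num_batches(1000, 64, drop_last=True)
kept = num_batches(1000, 64, drop_last=False)
```

For 1000 samples and a batch size of 64, dropping the last batch discards the trailing 40 samples each epoch.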

validate()

Validate that the hparams are of the correct types. Recurses through sub-hparams.

Raises: TypeError – If any field has an incorrect type.

class composer.datasets.ade20k.PadToSize(size, fill=0)