datasets

Modules

composer.datasets.ade20k

ADE20K semantic segmentation and scene parsing dataset.

composer.datasets.ade20k_hparams

ADE20K semantic segmentation and scene parsing dataset hyperparameters.

composer.datasets.brats

BraTS (Brain Tumor Segmentation) dataset.

composer.datasets.brats_hparams

BraTS (Brain Tumor Segmentation) dataset hyperparameters.

composer.datasets.c4

C4 (Colossal Cleaned Common Crawl) dataset.

composer.datasets.c4_hparams

C4 (Colossal Cleaned Common Crawl) dataset hyperparameters.

composer.datasets.cifar

CIFAR image classification dataset.

composer.datasets.cifar_hparams

CIFAR image classification dataset hyperparameters.

composer.datasets.coco

COCO (Common Objects in Context) dataset.

composer.datasets.coco_hparams

COCO (Common Objects in Context) dataset hyperparameters.

composer.datasets.dataset_hparams

Dataloader and Dataset Hyperparameter classes.

composer.datasets.dataset_hparams_registry

Mapping between dataset names and corresponding HParams classes.

composer.datasets.evaluator_hparams

Hyperparameters for the Evaluator.

composer.datasets.ffcv_utils

Module ffcv_utils.

composer.datasets.glue_hparams

GLUE (General Language Understanding Evaluation) dataset hyperparameters (Wang et al., 2019).

composer.datasets.imagenet

ImageNet classification streaming dataset.

composer.datasets.imagenet_hparams

ImageNet classification dataset hyperparameters.

composer.datasets.lm_dataset_hparams

Generic hyperparameters for self-supervised training of autoregressive and masked language models.

composer.datasets.mnist_hparams

MNIST image classification dataset hyperparameters.

composer.datasets.streaming

MosaicML Streaming Datasets for cloud-native model training.

composer.datasets.synthetic

Synthetic datasets used for testing, profiling, and debugging.

composer.datasets.synthetic_hparams

Synthetic Dataset hyperparameter mixin.

composer.datasets.synthetic_lm

Synthetic language modeling datasets used for testing, profiling, and debugging.

composer.datasets.utils

Utility and helper functions for datasets.

Natively supported datasets.

Classes

ADE20k

PyTorch Dataset for ADE20K.

C4Dataset

Builds a streaming, sharded, sized torch.utils.data.IterableDataset for the C4 (Colossal Cleaned Common Crawl) dataset.

COCODetection

PyTorch Dataset for the COCO dataset.

PytTrain

Module PytTrain.

PytVal

Module PytVal.

StreamingADE20k

Implementation of the ADE20K dataset using StreamingDataset.

StreamingC4

Implementation of the C4 (Colossal Cleaned Common Crawl) dataset using StreamingDataset.

StreamingCIFAR10

Implementation of the CIFAR10 dataset using StreamingDataset.

StreamingCOCO

Implementation of the COCO dataset using StreamingDataset.

StreamingImageNet1k

Implementation of the ImageNet1k dataset using StreamingDataset.

SyntheticBatchPairDataset

Emulates a dataset of provided size and shape.

SyntheticDataLabelType

Defines the class label type of the synthetic data.

SyntheticDataType

Defines the distribution of the synthetic data.

SyntheticPILDataset

Similar to SyntheticBatchPairDataset, but yields samples of type PIL.Image.Image and supports dataset transformations.
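
Example

The synthetic dataset classes above are described as emulating a dataset of a provided size and shape. The sketch below illustrates that idea in plain Python, without any Composer or PyTorch imports. It is not Composer's implementation: the class name ToySyntheticDataset, its constructor parameters, and the per-index seeding scheme are all assumptions made for illustration.

```python
import random
from typing import List, Tuple


class ToySyntheticDataset:
    """Hypothetical stand-in for a synthetic dataset (not a Composer class).

    Generates (data, label) pairs on demand instead of reading them from disk.
    """

    def __init__(self, total_size: int, data_shape: Tuple[int, ...],
                 num_classes: int, seed: int = 0) -> None:
        self.total_size = total_size
        self.data_shape = data_shape
        self.num_classes = num_classes
        self.seed = seed

    def __len__(self) -> int:
        return self.total_size

    def __getitem__(self, index: int) -> Tuple[List[float], int]:
        if not 0 <= index < self.total_size:
            raise IndexError(index)
        # Seed per sample so the same index always yields the same pair.
        rng = random.Random(self.seed * 1_000_003 + index)
        num_values = 1
        for dim in self.data_shape:
            num_values *= dim
        # Gaussian values, stored flattened; a tensor library would reshape them.
        data = [rng.gauss(0.0, 1.0) for _ in range(num_values)]
        label = rng.randrange(self.num_classes)
        return data, label


dataset = ToySyntheticDataset(total_size=8, data_shape=(3, 4, 4), num_classes=10)
sample, label = dataset[0]
```

Because samples are generated from the index rather than loaded from storage, a dataset like this can stand in for real data when testing or profiling a training loop, which is the use case the synthetic modules above are described as serving.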