composer.datasets.coco#
COCO (Common Objects in Context) dataset.
COCO is a large-scale object detection, segmentation, and captioning dataset. Please refer to the COCO dataset website for more details.
Classes

COCODetection | PyTorch Dataset for the COCO dataset.
StreamingCOCO | Implementation of the COCO dataset using StreamingDataset.
Hparams

These classes are used with yahp for YAML-based configuration.

COCODatasetHparams | Defines an instance of the COCO Dataset.
StreamingCOCOHparams | DatasetHparams for creating an instance of StreamingCOCO.
- class composer.datasets.coco.COCODatasetHparams(is_train=True, drop_last=True, shuffle=True, datadir=None)[source]#
Bases:
composer.datasets.hparams.DatasetHparams
Defines an instance of the COCO Dataset.
- Parameters
  datadir (str) – The path to the data directory.
  is_train (bool) – Whether to load the training data or validation data. Default: True.
  drop_last (bool) – If the number of samples is not divisible by the batch size, whether to drop the last batch or pad the last batch with zeros. Default: True.
  shuffle (bool) – Whether to shuffle the dataset. Default: True.
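The drop_last behavior above can be illustrated with plain Python arithmetic (a standalone sketch, independent of Composer): with 10 samples and a batch size of 4, dropping the last batch yields 2 batches, while keeping and padding it yields 3.

```python
# Standalone sketch of the drop_last semantics described above
# (plain Python, no Composer dependency).
num_samples = 10
batch_size = 4

# drop_last=True: only full batches are kept.
full_batches = num_samples // batch_size

# drop_last=False: the final partial batch is kept (padded with zeros).
all_batches = -(-num_samples // batch_size)  # ceiling division

print(full_batches, all_batches)  # 2 3
```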
- initialize_object(batch_size, dataloader_hparams)[source]#
  Creates a DataLoader or DataSpec for this dataset.
  - Parameters
    batch_size (int) – The size of the batch the dataloader should yield. This batch size is device-specific and already incorporates the world size.
    dataloader_hparams (DataLoaderHparams) – The dataset-independent hparams for the dataloader.
  - Returns
    Iterable | DataSpec – An iterable that yields batches, or, if the dataset yields batches that need custom processing, a DataSpec.
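Because the batch_size argument is described as device-specific, a caller starting from a global batch size would divide by the world size before calling initialize_object. A minimal sketch of that arithmetic (the variable names are illustrative, not Composer API):

```python
# Illustrative arithmetic only: Composer's trainer performs this kind of
# division internally; the names here are hypothetical.
global_batch_size = 1024
world_size = 8  # number of devices participating in training

per_device_batch_size = global_batch_size // world_size
print(per_device_batch_size)  # 128
```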
- class composer.datasets.coco.COCODetection(img_folder, annotate_file, transform=None)[source]#
Bases:
torch.utils.data.dataset.Dataset
PyTorch Dataset for the COCO dataset.
- class composer.datasets.coco.StreamingCOCO(remote, local, split, shuffle, batch_size=None)[source]#
Bases:
composer.datasets.streaming.dataset.StreamingDataset
Implementation of the COCO dataset using StreamingDataset.
- Parameters
  remote (str) – Remote directory (S3 or local filesystem) where the dataset is stored.
  local (str) – Local filesystem directory where the dataset is cached during operation.
  split (str) – The dataset split to use, either 'train' or 'val'.
  shuffle (bool) – Whether to shuffle the samples in this dataset.
  batch_size (Optional[int]) – Hint of the batch size that will be used on each device's DataLoader. Default: None.
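The split parameter accepts only 'train' or 'val'. A minimal guard in that style (a hypothetical sketch, not Composer's actual validation code):

```python
# Hypothetical sketch: validate the `split` argument the way such a
# dataset might, rejecting anything other than 'train' or 'val'.
def validate_split(split: str) -> str:
    if split not in ('train', 'val'):
        raise ValueError(f"split must be 'train' or 'val', got {split!r}")
    return split

print(validate_split('train'))  # train
```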
- class composer.datasets.coco.StreamingCOCOHparams(is_train=True, drop_last=True, shuffle=True, datadir=None, remote='s3://mosaicml-internal-dataset-coco/mds/1/', local='/tmp/mds-cache/mds-coco/', split='train')[source]#
Bases:
composer.datasets.hparams.DatasetHparams
DatasetHparams for creating an instance of StreamingCOCO.
- Parameters
  datadir (str) – The path to the data directory.
  is_train (bool) – Whether to load the training data or validation data. Default: True.
  drop_last (bool) – If the number of samples is not divisible by the batch size, whether to drop the last batch or pad the last batch with zeros. Default: True.
  shuffle (bool) – Whether to shuffle the dataset. Default: True.
  remote (str) – Remote directory (S3 or local filesystem) where the dataset is stored. Default: 's3://mosaicml-internal-dataset-coco/mds/1/'.
  local (str) – Local filesystem directory where the dataset is cached during operation. Default: '/tmp/mds-cache/mds-coco/'.
  split (str) – The dataset split to use, either 'train' or 'val'. Default: 'train'.
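Since these Hparams classes are consumed by yahp for YAML-based configuration, the defaults above could be expressed in a YAML fragment along the following lines (a hypothetical sketch; the exact key names and nesting depend on the surrounding trainer configuration):

```yaml
# Hypothetical yahp-style fragment; the field names follow the signature
# above, but the surrounding structure is an assumption.
train_dataset:
  streaming_coco:
    is_train: true
    drop_last: true
    shuffle: true
    remote: s3://mosaicml-internal-dataset-coco/mds/1/
    local: /tmp/mds-cache/mds-coco/
    split: train
```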
- initialize_object(batch_size, dataloader_hparams)[source]#
  Creates a DataLoader or DataSpec for this dataset.
  - Parameters
    batch_size (int) – The size of the batch the dataloader should yield. This batch size is device-specific and already incorporates the world size.
    dataloader_hparams (DataLoaderHparams) – The dataset-independent hparams for the dataloader.
  - Returns
    Iterable | DataSpec – An iterable that yields batches, or, if the dataset yields batches that need custom processing, a DataSpec.