composer.datasets.coco

COCO (Common Objects in Context) dataset.

COCO is a large-scale object detection, segmentation, and captioning dataset. See the COCO dataset homepage for more details.

Classes

`COCODetection`	PyTorch Dataset for the COCO dataset.
`StreamingCOCO`	Implementation of the COCO dataset using StreamingDataset.

class composer.datasets.coco.COCODetection(img_folder, annotate_file, transform=None)

Bases: torch.utils.data.dataset.Dataset

PyTorch Dataset for the COCO dataset.

Parameters

img_folder (str) – Path to the COCO image folder.
annotate_file (str) – Path to the annotation file, which contains image IDs and annotations (e.g., bounding boxes and object classes).
transform (Module) – Transformations to apply to the image. Default: None.
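
To make the two path arguments concrete, the following sketch writes a tiny annotation file in the standard COCO JSON layout (`images`, `annotations`, `categories`) and groups the bounding boxes by image ID. It uses only the Python standard library. The file names, IDs, and box values are made up, and the helper `boxes_by_image` is hypothetical: it illustrates the file's contents and is not how `COCODetection` parses the file internally.

```python
import json
import os
import tempfile
from collections import defaultdict

# A minimal annotation file in the COCO detection layout.
# All values below are made up for illustration.
annotations = {
    "images": [
        {"id": 1, "file_name": "000000000001.jpg", "height": 480, "width": 640},
        {"id": 2, "file_name": "000000000002.jpg", "height": 427, "width": 640},
    ],
    "annotations": [
        # bbox is [x, y, width, height] in pixels; category_id refers to "categories".
        {"id": 10, "image_id": 1, "category_id": 18, "bbox": [73.0, 41.0, 200.0, 150.0]},
        {"id": 11, "image_id": 1, "category_id": 1, "bbox": [300.0, 100.0, 80.0, 220.0]},
        {"id": 12, "image_id": 2, "category_id": 1, "bbox": [10.0, 20.0, 50.0, 120.0]},
    ],
    "categories": [
        {"id": 1, "name": "person"},
        {"id": 18, "name": "dog"},
    ],
}


def boxes_by_image(annotate_file):
    """Return {image_id: [(category_name, bbox), ...]} from a COCO annotation file."""
    with open(annotate_file) as f:
        data = json.load(f)
    names = {cat["id"]: cat["name"] for cat in data["categories"]}
    grouped = defaultdict(list)
    for ann in data["annotations"]:
        grouped[ann["image_id"]].append((names[ann["category_id"]], ann["bbox"]))
    return dict(grouped)


# Write the file to a temporary directory standing in for a real COCO download.
tmp_dir = tempfile.mkdtemp()
annotate_file = os.path.join(tmp_dir, "instances_example.json")
with open(annotate_file, "w") as f:
    json.dump(annotations, f)

grouped = boxes_by_image(annotate_file)
print(grouped[1])
```

With Composer installed, the same `annotate_file` path, together with the folder that holds the listed image files, would be passed as `COCODetection(img_folder, annotate_file)`.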

class composer.datasets.coco.StreamingCOCO(remote, local, split, shuffle, batch_size=None)

Implementation of the COCO dataset using StreamingDataset.

Parameters

remote (str) – Remote directory (S3 or local filesystem) where the dataset is stored.
local (str) – Local filesystem directory where the dataset is cached during operation.
split (str) – The dataset split to use, either ‘train’ or ‘val’.
shuffle (bool) – Whether to shuffle the samples in this dataset.
batch_size (Optional[int]) – Hint for the batch size that will be used on each device’s DataLoader. Default: None.
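
As a rough illustration of how these parameters fit together, the sketch below wraps the constructor in a small helper that checks `split` before building the dataset. The helper `build_streaming_coco`, the bucket name, and the cache path are hypothetical. Only the `StreamingCOCO` signature shown above is taken from this page, and whether the class performs its own validation is not stated here.

```python
def build_streaming_coco(remote, local, split, shuffle, batch_size=None):
    """Validate the split name, then construct a StreamingCOCO dataset."""
    if split not in ("train", "val"):
        raise ValueError(f"split must be 'train' or 'val', got {split!r}")
    # Imported lazily so the check above works even without Composer installed.
    from composer.datasets.coco import StreamingCOCO
    return StreamingCOCO(
        remote=remote,
        local=local,
        split=split,
        shuffle=shuffle,
        batch_size=batch_size,
    )


# Example call (hypothetical bucket and cache paths; requires Composer and
# a COCO copy already stored in the streaming format at `remote`):
# train_dataset = build_streaming_coco(
#     remote="s3://my-bucket/coco",
#     local="/tmp/coco-cache",
#     split="train",
#     shuffle=True,
#     batch_size=32,
# )
```

Here `batch_size` would be the per-device batch size, matching the value given to each device's DataLoader.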