composer.datasets.coco

COCO (Common Objects in Context) dataset.

COCO is a large-scale object detection, segmentation, and captioning dataset. Refer to the COCO dataset website for more details.

Classes

COCODetection

PyTorch Dataset for the COCO dataset.

StreamingCOCO

Implementation of the COCO dataset using StreamingDataset.

class composer.datasets.coco.COCODetection(img_folder, annotate_file, transform=None)

Bases: torch.utils.data.dataset.Dataset

PyTorch Dataset for the COCO dataset.

Parameters
  • img_folder (str) – Path to the COCO folder.

  • annotate_file (str) – Path to a file that contains image IDs and annotations (e.g., bounding boxes and object classes).

  • transform (Module) – Transformations to apply to the image. Default: None.
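
As a rough illustration of the parameters above, the sketch below writes a tiny annotation file in the widely used COCO JSON layout (images, annotations, categories) and wraps the constructor call in a helper. The helper names write_tiny_annotations and make_coco_detection, the file name instances_tiny.json, and the sample values are hypothetical. Whether COCODetection accepts a file this small is an assumption, so the helper is defined but not called here.

```python
import json
import os
import tempfile


def write_tiny_annotations(path):
    """Write a one-image, one-box annotation file in the COCO JSON layout."""
    annotations = {
        "images": [
            {"id": 1, "file_name": "000000000001.jpg", "width": 640, "height": 480}
        ],
        "annotations": [
            {
                "id": 10,
                "image_id": 1,
                "category_id": 18,
                # COCO boxes are stored as [x, y, width, height] in pixels.
                "bbox": [100.0, 120.0, 200.0, 150.0],
                "area": 30000.0,
                "iscrowd": 0,
            }
        ],
        "categories": [{"id": 18, "name": "dog"}],
    }
    with open(path, "w") as f:
        json.dump(annotations, f)
    return annotations


def make_coco_detection(img_folder, annotate_file, transform=None):
    """Hypothetical helper: build the dataset from the documented arguments."""
    # Imported lazily so this sketch loads even where Composer is not installed.
    from composer.datasets.coco import COCODetection

    return COCODetection(img_folder, annotate_file, transform=transform)


# Create the annotation file in a temporary directory and read it back.
tmp_dir = tempfile.mkdtemp()
annotate_file = os.path.join(tmp_dir, "instances_tiny.json")
written = write_tiny_annotations(annotate_file)
with open(annotate_file) as f:
    loaded = json.load(f)
box = loaded["annotations"][0]["bbox"]
```

With a real COCO download, img_folder would point at the image directory and annotate_file at the matching annotation JSON, and the result of make_coco_detection(img_folder, annotate_file) could be handed to a torch.utils.data.DataLoader like any other PyTorch Dataset.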

class composer.datasets.coco.StreamingCOCO(remote, local, split, shuffle, batch_size=None)

Bases: composer.datasets.streaming.dataset.StreamingDataset, torchvision.datasets.vision.VisionDataset

Implementation of the COCO dataset using StreamingDataset.

Parameters
  • remote (str) – Remote directory (S3 or local filesystem) where the dataset is stored.

  • local (str) – Local filesystem directory where the dataset is cached during operation.

  • split (str) – The dataset split to use, either ‘train’ or ‘val’.

  • shuffle (bool) – Whether to shuffle the samples in this dataset.

  • batch_size (Optional[int]) – Hint of the batch size that will be used on each device’s DataLoader. Default: None.
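
The parameters above could be assembled as in the following minimal sketch. The helpers streaming_coco_kwargs and make_streaming_coco, the bucket path s3://my-bucket/coco, and the cache directory /tmp/coco_cache are hypothetical. The split check only mirrors the two values documented above and is not taken from the library itself.

```python
def streaming_coco_kwargs(remote, local, split, shuffle, batch_size=None):
    """Hypothetical helper: validate and collect the documented arguments."""
    # The parameter list documents only these two splits.
    if split not in ("train", "val"):
        raise ValueError(f"split must be 'train' or 'val', got {split!r}")
    return {
        "remote": remote,
        "local": local,
        "split": split,
        "shuffle": shuffle,
        "batch_size": batch_size,
    }


def make_streaming_coco(**kwargs):
    """Hypothetical helper: construct the dataset from the collected arguments."""
    # Imported lazily so this sketch loads even where Composer is not installed.
    from composer.datasets.coco import StreamingCOCO

    return StreamingCOCO(**kwargs)


# Placeholder locations; substitute a real remote directory and local cache path.
kwargs = streaming_coco_kwargs(
    remote="s3://my-bucket/coco",
    local="/tmp/coco_cache",
    split="train",
    shuffle=True,
    batch_size=32,
)
```

Calling make_streaming_coco(**kwargs) would then build the dataset. Passing the same batch_size that the per-device DataLoader will use matches the hint described above.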