build_streaming_cifar10_dataloader#

composer.datasets.build_streaming_cifar10_dataloader(global_batch_size, remote, *, local='/tmp/mds-cache/mds-cifar10', split='train', drop_last=True, shuffle=True, predownload=100000, keep_zip=None, download_retry=2, download_timeout=60, validate_hash=None, shuffle_seed=None, num_canonical_nodes=None, **dataloader_kwargs)[source]#

Builds a streaming CIFAR10 dataloader.

Parameters
  • global_batch_size (int) – Global batch size.

  • remote (str) – Remote directory (S3 or local filesystem) where the dataset is stored.

  • local (str, optional) – Local filesystem directory where the dataset is cached during operation. Defaults to '/tmp/mds-cache/mds-cifar10'.

  • split (str) – Which split of the dataset to use. Either 'train' or 'val'. Default: 'train'.

  • drop_last (bool, optional) – Whether to drop the last incomplete batch. Default: True.

  • shuffle (bool, optional) – Whether to shuffle the dataset. Defaults to True.

  • predownload (int, optional) – Target number of samples to download ahead of the current position while iterating. Defaults to 100_000.

  • keep_zip (bool, optional) – Whether to keep or delete the compressed file when decompressing downloaded shards. If None, the compressed file is kept only if remote is a local path. Defaults to None.

  • download_retry (int) – Number of download re-attempts before giving up. Defaults to 2.

  • download_timeout (float) – Number of seconds to wait for a shard to download before raising an exception. Defaults to 60.

  • validate_hash (str, optional) – Optional hash or checksum algorithm to use to validate shards. Defaults to None.

  • shuffle_seed (int, optional) – Seed for shuffling, or None for a random seed. Defaults to None.

  • num_canonical_nodes (int, optional) – Canonical number of nodes for shuffling with resumption. Defaults to None, which is interpreted as the number of nodes of the initial run.

  • **dataloader_kwargs (Dict[str, Any]) – Additional settings for the dataloader (e.g. num_workers).
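
A minimal usage sketch follows, assuming the CIFAR10 dataset has already been converted to MDS shards at the remote location; the bucket path, batch size, and num_workers value are illustrative placeholders, not defaults.

from composer.datasets import build_streaming_cifar10_dataloader

# Hypothetical remote location of the MDS-converted CIFAR10 shards.
remote = 's3://my-bucket/mds/cifar10'

# Training dataloader; num_workers is forwarded to the underlying
# DataLoader through **dataloader_kwargs.
train_dataloader = build_streaming_cifar10_dataloader(
    global_batch_size=512,
    remote=remote,
    local='/tmp/mds-cache/mds-cifar10',
    split='train',
    shuffle=True,
    num_workers=8,
)

# Evaluation dataloader over the 'val' split, keeping incomplete batches.
eval_dataloader = build_streaming_cifar10_dataloader(
    global_batch_size=512,
    remote=remote,
    split='val',
    shuffle=False,
    drop_last=False,
)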