composer.datasets.streaming.download

Download handling for StreamingDataset.

Functions

download_or_wait

Downloads a file from remote to local, or waits for it to be downloaded.

composer.datasets.streaming.download.dispatch_download(remote, local, timeout)

Dispatches to the download handler that matches the remote path and uses it to download the file.

Parameters
  • remote (Optional[str]) – Remote path (HTTP/HTTPS, S3, SFTP, or local filesystem).

  • local (str) – Local path (local filesystem).

  • timeout (float) – How long to wait for file to download before raising an exception.
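
One way to picture the dispatch step is a lookup on the URL scheme of `remote`. The sketch below is an assumption about the general shape, not Composer's actual implementation: `pick_handler` and `_HANDLERS` are hypothetical names, and the helper returns the name of a handler documented on this page instead of calling it.

```python
from typing import Optional
from urllib.parse import urlparse

# Hypothetical mapping from URL scheme to the handler names documented on this page.
_HANDLERS = {
    "http": "download_from_http",
    "https": "download_from_http",
    "s3": "download_from_s3",
    "sftp": "download_from_sftp",
}


def pick_handler(remote: Optional[str]) -> str:
    """Return the name of the handler a dispatcher might choose for ``remote``."""
    if not remote:
        raise ValueError("no remote path was given, so there is nothing to download")
    scheme = urlparse(remote).scheme
    # Assumption: anything without a recognized scheme is a local filesystem path.
    return _HANDLERS.get(scheme, "download_from_local")
```

For example, `pick_handler("s3://my-bucket/shard.mds")` selects the S3 handler, while a plain path such as `/data/shard.mds` falls through to the local one.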

composer.datasets.streaming.download.download_from_http(remote, local)

Download a file from an HTTP/HTTPS remote to local.
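
As a rough illustration of what an HTTP download to a local path involves, the sketch below uses only the standard library. It is not the library's code: `fetch_url` is a hypothetical name, and writing to a temporary file before renaming is an assumption about how one might avoid exposing partial files.

```python
import os
import shutil
import urllib.request


def fetch_url(remote: str, local: str) -> None:
    """Stream the body at ``remote`` into the file ``local``."""
    local_dir = os.path.dirname(local)
    if local_dir:
        os.makedirs(local_dir, exist_ok=True)
    # Write under a temporary name first so readers never see a partial file.
    tmp = local + ".tmp"
    with urllib.request.urlopen(remote) as response, open(tmp, "wb") as out:
        shutil.copyfileobj(response, out)
    # os.replace swaps the finished file into place in a single step.
    os.replace(tmp, local)
```

`urllib.request.urlopen` raises `urllib.error.HTTPError` on 4xx and 5xx responses, so a failed request leaves no file at `local`.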

composer.datasets.streaming.download.download_from_local(remote, local)

Copy a file from remote to local, where both paths are on the local filesystem.

Parameters
  • remote (str) – Remote path (local filesystem).

  • local (str) – Local path (local filesystem).
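
When both paths are on the local filesystem, the "download" amounts to a file copy. The following is a minimal sketch under that assumption; `copy_local` is a hypothetical name, and creating the destination directory is a convenience added here, not documented behavior.

```python
import os
import shutil


def copy_local(remote: str, local: str) -> None:
    """Copy ``remote`` to ``local``; both are local filesystem paths."""
    local_dir = os.path.dirname(local)
    if local_dir:
        # Assumption: the destination directory may not exist yet.
        os.makedirs(local_dir, exist_ok=True)
    shutil.copy(remote, local)
```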

composer.datasets.streaming.download.download_from_s3(remote, local, timeout)

Download a file from a remote S3 path to local.

Parameters
  • remote (str) – Remote path (S3).

  • local (str) – Local path (local filesystem).

  • timeout (float) – How long to wait for shard to download before raising an exception.
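
To make the S3 parameters concrete, the sketch below splits an `s3://` URI into bucket and key and passes them to boto3. The helper names `split_s3_uri` and `fetch_from_s3` are hypothetical, and mapping `timeout` onto botocore's `read_timeout` is an assumption, not a statement of how this function applies it.

```python
from typing import Tuple
from urllib.parse import urlparse


def split_s3_uri(remote: str) -> Tuple[str, str]:
    """Split ``s3://bucket/key/parts`` into ``(bucket, key)``."""
    parsed = urlparse(remote)
    if parsed.scheme != "s3":
        raise ValueError(f"expected an s3:// URI, got {remote!r}")
    return parsed.netloc, parsed.path.lstrip("/")


def fetch_from_s3(remote: str, local: str, timeout: float) -> None:
    """Download one object with boto3 (requires boto3 and AWS credentials)."""
    # Imported lazily so the parsing helper above works without boto3 installed.
    import boto3
    from botocore.config import Config

    bucket, key = split_s3_uri(remote)
    # Assumption: the timeout is applied as the client's socket read timeout.
    s3 = boto3.client("s3", config=Config(read_timeout=timeout))
    s3.download_file(bucket, key, local)
```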

composer.datasets.streaming.download.download_from_sftp(remote, local)

Download a file from remote SFTP server to local filepath.

Authentication must be provided via username/password in the remote URI, or a valid SSH config, or a default key discoverable in ~/.ssh/.

Parameters
  • remote (str) – Remote path (SFTP).

  • local (str) – Local path (local filesystem).
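
The note about credentials in the remote URI can be illustrated by parsing one. The sketch below is an assumption about the URI shape (`sftp://user:password@host:port/path`), and `SFTPTarget` and `parse_sftp_uri` are hypothetical names; it only extracts the connection details and does not open a connection.

```python
from typing import NamedTuple, Optional
from urllib.parse import urlparse


class SFTPTarget(NamedTuple):
    hostname: str
    port: int
    username: Optional[str]
    password: Optional[str]
    path: str


def parse_sftp_uri(remote: str) -> SFTPTarget:
    """Extract connection details from an ``sftp://`` URI."""
    parsed = urlparse(remote)
    if parsed.scheme != "sftp":
        raise ValueError(f"expected an sftp:// URI, got {remote!r}")
    if not parsed.hostname:
        raise ValueError(f"no hostname found in {remote!r}")
    return SFTPTarget(
        hostname=parsed.hostname,
        port=parsed.port or 22,  # 22 is the standard SSH/SFTP port
        username=parsed.username,  # None when the URI carries no credentials
        password=parsed.password,
        path=parsed.path,
    )
```

When `username` and `password` come back as `None`, authentication would have to come from one of the other sources named above: an SSH config or a default key in `~/.ssh/`.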

composer.datasets.streaming.download.download_or_wait(remote, local, wait=False, max_retries=2, timeout=60)

Downloads a file from remote to local, or waits for it to be downloaded.

This function performs no thread-safety checks; the caller is responsible for using wait correctly.

Parameters
  • remote (Optional[str]) – Remote path (S3, SFTP, or local filesystem).

  • local (str) – Local path (local filesystem).

  • wait (bool, default False) – If true, then do not actively download the file, but instead wait (up to timeout seconds) for the file to arrive.

  • max_retries (int, default 2) – Number of download re-attempts before giving up.

  • timeout (float, default 60) – How long to wait for file to download before raising an exception.
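
The two modes described by `wait` can be sketched as a polling loop and a retry loop. This is an illustration of the documented semantics, not the library's code: `wait_for_file`, `download_with_retries`, and the `poll_interval` parameter are hypothetical, and the exception types raised here are assumptions.

```python
import os
import time
from typing import Callable, Optional


def wait_for_file(local: str, timeout: float = 60, poll_interval: float = 0.25) -> None:
    """Block until ``local`` exists, or raise after ``timeout`` seconds."""
    start = time.monotonic()
    while not os.path.exists(local):
        if time.monotonic() - start > timeout:
            raise TimeoutError(f"{local} did not appear within {timeout} seconds")
        time.sleep(poll_interval)


def download_with_retries(download: Callable[[], None], max_retries: int = 2) -> None:
    """Call ``download`` once, then re-attempt up to ``max_retries`` times."""
    last_error: Optional[Exception] = None
    # One initial attempt plus up to max_retries re-attempts.
    for _ in range(1 + max_retries):
        try:
            download()
            return
        except Exception as error:
            last_error = error
    raise RuntimeError(f"download failed after {max_retries} retries") from last_error
```

In this picture, a caller with `wait=True` would only run the polling loop and rely on another caller to perform the download, which is why the caller has to coordinate who downloads and who waits.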