composer.utils.file_helpers

Helpers for working with files.

Functions

ensure_folder_is_empty

Ensure that the given folder is empty.

format_name_with_dist

Format format_str with the run_name, distributed variables, and extra_format_kwargs.

format_name_with_dist_and_time

Format format_str with the run_name, distributed variables, timestamp, and extra_format_kwargs.

get_file

Get a file from a local folder, URL, or object store.

is_tar

Returns whether name has a tar-like extension.

Exceptions

GetFileNotFoundException

Exception if get_file() failed due to a not found error.

exception composer.utils.file_helpers.GetFileNotFoundException

Bases: RuntimeError

Exception if get_file() failed due to a not found error.

composer.utils.file_helpers.ensure_folder_is_empty(folder_name)

Ensure that the given folder is empty.

Hidden files and folders (those beginning with .) are ignored. Sub-folders are checked recursively.

Parameters

folder_name (str | Path) – The folder to ensure is empty.

Raises

FileExistsError – If folder_name contains any non-hidden files, recursively.
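
The check described above can be approximated with the standard library alone. The following is a minimal sketch, not Composer's implementation: check_folder_is_empty is a hypothetical name, and the pruning of hidden folders via os.walk is an assumption about how "ignored" is meant.

```python
import os
import tempfile


def check_folder_is_empty(folder_name):
    """Raise FileExistsError if folder_name holds any non-hidden file (hypothetical helper)."""
    for root, dirs, files in os.walk(folder_name, topdown=True):
        # Prune hidden sub-folders in place so os.walk does not descend into them.
        dirs[:] = [d for d in dirs if not d.startswith('.')]
        for name in files:
            if not name.startswith('.'):
                raise FileExistsError(f'{os.path.join(root, name)} exists in {folder_name}')


with tempfile.TemporaryDirectory() as tmp:
    # A hidden file and a file inside a hidden folder are both ignored.
    open(os.path.join(tmp, '.hidden'), 'w').close()
    os.makedirs(os.path.join(tmp, '.cache'))
    open(os.path.join(tmp, '.cache', 'data.bin'), 'w').close()
    check_folder_is_empty(tmp)  # does not raise
    hidden_only_ok = True

    # A visible file in a nested sub-folder triggers the error.
    os.makedirs(os.path.join(tmp, 'sub'))
    open(os.path.join(tmp, 'sub', 'weights.pt'), 'w').close()
    try:
        check_folder_is_empty(tmp)
        raised = False
    except FileExistsError:
        raised = True
```

In this sketch a folder containing only empty, visible sub-folders still counts as empty, since the documented condition refers to files.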

composer.utils.file_helpers.format_name_with_dist(format_str, run_name, **extra_format_kwargs)

Format format_str with the run_name, distributed variables, and extra_format_kwargs.

The following format variables are available:

Variable – Description

  • {run_name} – The name of the training run. See run_name.

  • {rank} – The global rank, as returned by get_global_rank().

  • {local_rank} – The local rank of the process, as returned by get_local_rank().

  • {world_size} – The world size, as returned by get_world_size().

  • {local_world_size} – The local world size, as returned by get_local_world_size().

  • {node_rank} – The node rank, as returned by get_node_rank().

For example, assume that the rank is 0. Then:

>>> from composer.utils import format_name_with_dist
>>> format_str = '{run_name}/rank{rank}.{extension}'
>>> format_name_with_dist(
...     format_str,
...     run_name='awesome_training_run',
...     extension='json',
... )
'awesome_training_run/rank0.json'

Parameters
  • format_str (str) – The format string for the checkpoint filename.

  • run_name (str) – The value for the {run_name} format variable.

  • extra_format_kwargs (object) – Any additional format() kwargs.
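
To make the substitution concrete outside a distributed job, the sketch below mimics the behavior with plain str.format. It is an illustration under stated assumptions: format_with_dist_vars is a hypothetical stand-in, and the distributed values are passed in as a made-up dictionary rather than queried from the running process.

```python
def format_with_dist_vars(format_str, run_name, dist_vars, **extra_format_kwargs):
    """Hypothetical stand-in: dist_vars is supplied by the caller, not read from the process."""
    # str.format ignores keyword arguments that the format string does not reference.
    return format_str.format(run_name=run_name, **dist_vars, **extra_format_kwargs)


# Made-up values for global rank 5 of 8 processes, on the second of two 4-process nodes.
dist_vars = {
    'rank': 5,
    'local_rank': 1,
    'world_size': 8,
    'local_world_size': 4,
    'node_rank': 1,
}

name = format_with_dist_vars(
    '{run_name}/node{node_rank}/rank{rank}-of-{world_size}.{extension}',
    run_name='awesome_training_run',
    dist_vars=dist_vars,
    extension='json',
)
print(name)  # awesome_training_run/node1/rank5-of-8.json
```

Including {rank} in the format string gives every process its own filename, which avoids several ranks writing to the same path.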

composer.utils.file_helpers.format_name_with_dist_and_time(format_str, run_name, timestamp, **extra_format_kwargs)

Format format_str with the run_name, distributed variables, timestamp, and extra_format_kwargs.

In addition to the variables specified via extra_format_kwargs, the following format variables are available:

Variable – Description

  • {run_name} – The name of the training run. See run_name.

  • {rank} – The global rank, as returned by get_global_rank().

  • {local_rank} – The local rank of the process, as returned by get_local_rank().

  • {world_size} – The world size, as returned by get_world_size().

  • {local_world_size} – The local world size, as returned by get_local_world_size().

  • {node_rank} – The node rank, as returned by get_node_rank().

  • {epoch} – The total epoch count, as returned by epoch().

  • {batch} – The total batch count, as returned by batch().

  • {batch_in_epoch} – The batch count in the current epoch, as returned by batch_in_epoch().

  • {sample} – The total sample count, as returned by sample().

  • {sample_in_epoch} – The sample count in the current epoch, as returned by sample_in_epoch().

  • {token} – The total token count, as returned by token().

  • {token_in_epoch} – The token count in the current epoch, as returned by token_in_epoch().

For example, assume that the current epoch is 0, batch is 0, and rank is 0. Then:

>>> from composer.utils import format_name_with_dist_and_time
>>> format_str = '{run_name}/ep{epoch}-ba{batch}-rank{rank}.{extension}'
>>> # `state` is assumed to be an existing training State at epoch 0, batch 0.
>>> format_name_with_dist_and_time(
...     format_str,
...     run_name='awesome_training_run',
...     timestamp=state.timer.get_timestamp(),
...     extension='json',
... )
'awesome_training_run/ep0-ba0-rank0.json'

Parameters
  • format_str (str) – The format string for the checkpoint filename.

  • run_name (str) – The value for the {run_name} format variable.

  • timestamp (Timestamp) – The timestamp that supplies the time-based format variables ({epoch}, {batch}, {sample}, {token}, and their in-epoch counterparts).

  • extra_format_kwargs (object) – Any additional format() kwargs.
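
Because the time-based variables change as training progresses, the same format string yields a different name at each point in the run. The sketch below illustrates this with plain str.format and zero-padded format specs so that the names sort chronologically. format_with_time_vars and the counter values are hypothetical. Whether format_name_with_dist_and_time honors specs such as {epoch:03d} depends on the types it passes to format(), which is not stated above.

```python
def format_with_time_vars(format_str, run_name, rank, time_vars, **extra_format_kwargs):
    """Hypothetical stand-in: time_vars holds plain integer counters supplied by the caller."""
    return format_str.format(run_name=run_name, rank=rank, **time_vars, **extra_format_kwargs)


# Zero-padded fields keep lexicographic order equal to chronological order.
format_str = '{run_name}/ep{epoch:03d}-ba{batch:06d}-rank{rank}.{extension}'
batches_per_epoch = 50  # made-up value for the illustration

names = []
for epoch in range(3):
    time_vars = {
        'epoch': epoch,
        'batch': epoch * batches_per_epoch,
        'batch_in_epoch': 0,
    }
    names.append(
        format_with_time_vars(format_str, 'awesome_training_run', 0, time_vars, extension='pt')
    )

for name in names:
    print(name)
# awesome_training_run/ep000-ba000000-rank0.pt
# awesome_training_run/ep001-ba000050-rank0.pt
# awesome_training_run/ep002-ba000100-rank0.pt
```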

composer.utils.file_helpers.get_file(path, destination, object_store=None, chunk_size=1048576, progress_bar=True)

Get a file from a local folder, URL, or object store.

Parameters
  • path (str) –

    The path to the file to retrieve.

    • If object_store is specified, then the path should be the object name for the file to get. Do not include the cloud provider or bucket name.

    • If object_store is not specified but the path begins with http:// or https://, the object at this URL will be downloaded.

    • Otherwise, path is presumed to be a local filepath.

  • destination (str) –

    The destination filepath.

    If path is a local filepath, then a symlink to path at destination will be created. Otherwise, path will be downloaded to a file at destination.

  • object_store (ObjectStore, optional) –

    An ObjectStore, if path is located inside an object store (e.g., AWS S3 or Google Cloud Storage). (default: None)

    This ObjectStore instance will be used to retrieve the file. The path parameter should be set to the object name within the object store.

    Set this parameter to None (the default) if path is a URL or a local file.

  • chunk_size (int, optional) – Chunk size (in bytes). Ignored if path is a local file. (default: 1048576, i.e. 1 MiB)

  • progress_bar (bool, optional) – Whether to show a progress bar. Ignored if path is a local file. (default: True)

Raises

GetFileNotFoundException – If path does not exist.
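
The three-way choice among object store, URL, and local file, together with the symlink behavior for local paths, can be sketched as follows. This is an illustration under stated assumptions rather than Composer's code: resolve_source and link_local_file are hypothetical names, a plain object() stands in for an ObjectStore, and the built-in FileNotFoundError stands in for GetFileNotFoundException.

```python
import os
import tempfile


def resolve_source(path, object_store=None):
    """Return which documented source applies (hypothetical helper)."""
    if object_store is not None:
        return 'object_store'  # path is treated as an object name
    if path.startswith(('http://', 'https://')):
        return 'url'  # path would be downloaded
    return 'local'  # path is presumed to be a local filepath


def link_local_file(path, destination):
    """Create a symlink at destination pointing to a local path (hypothetical helper)."""
    if not os.path.exists(path):
        # Stand-in for GetFileNotFoundException in this sketch.
        raise FileNotFoundError(f'Local path {path} does not exist')
    os.symlink(os.path.abspath(path), destination)


kinds = [
    resolve_source('checkpoints/ep0.pt', object_store=object()),
    resolve_source('https://example.com/ep0.pt'),
    resolve_source('/tmp/ep0.pt'),
]
print(kinds)  # ['object_store', 'url', 'local']

with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, 'ep0.pt')
    dst = os.path.join(tmp, 'latest.pt')
    with open(src, 'w') as f:
        f.write('weights')
    link_local_file(src, dst)
    is_link = os.path.islink(dst)
    with open(dst) as f:
        linked_contents = f.read()
```

Note that the object store check comes first, so a path that happens to start with https:// is still treated as an object name when object_store is given.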

composer.utils.file_helpers.is_tar(name)

Returns whether name has a tar-like extension.

Parameters

name (str | Path) – The name to check.

Returns

bool – Whether name is a tarball.
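
A suffix check of this kind might look like the sketch below. The helper name looks_like_tar and the list of suffixes are assumptions made for illustration. The set of extensions that is_tar actually recognizes is not listed above and may differ.

```python
from pathlib import Path

# Assumed tar-like suffixes; the real function may accept a different set.
TAR_SUFFIXES = ('.tar', '.tgz', '.tar.gz', '.tar.bz2', '.tar.lzma')


def looks_like_tar(name):
    """Return True if name (str or Path) ends with a tar-like suffix (hypothetical helper)."""
    # str.endswith accepts a tuple and returns True if any suffix matches.
    return str(name).endswith(TAR_SUFFIXES)


results = {
    'ckpt.tar': looks_like_tar('ckpt.tar'),
    'run/ckpt.tar.gz': looks_like_tar(Path('run/ckpt.tar.gz')),
    'ckpt.pt': looks_like_tar('ckpt.pt'),
    'archive.tar.backup': looks_like_tar('archive.tar.backup'),
}
print(results)
# {'ckpt.tar': True, 'run/ckpt.tar.gz': True, 'ckpt.pt': False, 'archive.tar.backup': False}
```

The check looks only at the name, so it says nothing about whether the file's contents are a valid archive.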