file_helpers

Helpers for working with files.

Functions

create_symlink_file

Create a symlink file, which can be followed by get_file().

ensure_folder_has_no_conflicting_files

Ensure that the given folder does not have any files conflicting with the filename format string.

ensure_folder_is_empty

Ensure that the given folder is empty.

format_name_with_dist

Format format_str with the run_name, distributed variables, and extra_format_kwargs.

format_name_with_dist_and_time

Format format_str with the run_name, distributed variables, timestamp, and extra_format_kwargs.

get_file

Get a file from a local folder, URL, or object store.

is_tar

Returns whether name has a tar-like extension.

composer.utils.file_helpers.create_symlink_file(existing_path, destination_filename)

Create a symlink file, which can be followed by get_file().

Unlike unix symlinks, symlink files created by this function are normal text files. They can therefore be uploaded to object stores via ObjectStore.upload_object(), or to loggers via Logger.file_artifact(), that otherwise would not support unix-style symlinks.

Parameters
  • existing_path (str) – The name of the existing object that the symlink file should point to.

  • destination_filename (str | Path) – The filename to which to write the symlink. It must end in '.symlink'.
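
As a rough illustration of the idea, the sketch below writes and reads a plain-text pointer file using only the standard library. The helper names write_pointer_file and read_pointer_file are hypothetical, and the on-disk format (the target name stored as the file's only content) is an assumption made for illustration, not necessarily what create_symlink_file() writes.

```python
import os
import tempfile
from pathlib import Path


def write_pointer_file(existing_path, destination_filename):
    """Write a text file that records the name of the object it points to.

    Hypothetical helper; the file format is an assumption for illustration.
    """
    destination_filename = str(destination_filename)
    # Mirror the documented constraint on the destination name.
    if not destination_filename.endswith('.symlink'):
        raise ValueError("destination_filename must end in '.symlink'")
    parent = os.path.dirname(destination_filename)
    if parent:
        os.makedirs(parent, exist_ok=True)
    with open(destination_filename, 'w') as f:
        f.write(existing_path)


def read_pointer_file(symlink_filename):
    """Return the object name stored in a pointer file."""
    return Path(symlink_filename).read_text()


with tempfile.TemporaryDirectory() as tmp:
    pointer = os.path.join(tmp, 'latest-checkpoint.symlink')
    write_pointer_file('checkpoints/ep3-ba1200-rank0.pt', pointer)
    target = read_pointer_file(pointer)
```

Because the pointer is an ordinary file, it can be copied or uploaded like any other artifact, which is the property the description above relies on.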

composer.utils.file_helpers.ensure_folder_has_no_conflicting_files(folder_name, filename, timestamp)

Ensure that the given folder does not have any files conflicting with the filename format string.

If any file in the folder matches the filename format string with a timestamp whose epoch, batch, sample, or token count is after timestamp, a FileExistsError is raised. Files that occur before timestamp are ignored.

Parameters
  • folder_name (str | Path) – The folder to inspect.

  • filename (str) – The pattern string for potential files.

  • timestamp (Timestamp) – Ignore any files that occur before the provided timestamp.

Raises

FileExistsError – If folder_name contains any files matching the filename template that occur after timestamp.
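
The check described above can be sketched with the standard library alone. This is a simplified illustration, not the library's implementation: the helper names find_conflicts and check_no_conflicts are hypothetical, a plain dict stands in for the Timestamp object, and the pattern is assumed to contain only named integer fields, each used once.

```python
import os
import re
import string
import tempfile


def find_conflicts(folder_name, filename, current):
    """List files matching the filename pattern whose counts are after `current`.

    `current` is a dict such as {'epoch': 1, 'batch': 100}; it is a stand-in
    for a Timestamp and is an assumption of this sketch.
    """
    # Turn 'ep{epoch}-ba{batch}.pt' into a regex with one named group per field.
    pieces = []
    for literal, field, _spec, _conv in string.Formatter().parse(filename):
        pieces.append(re.escape(literal))
        if field is not None:
            pieces.append(f'(?P<{field}>\\d+)')
    pattern = re.compile(''.join(pieces))

    conflicts = []
    for name in sorted(os.listdir(folder_name)):
        match = pattern.fullmatch(name)
        if match is None:
            continue  # Does not match the pattern, so it cannot conflict.
        counts = {k: int(v) for k, v in match.groupdict().items()}
        # A file conflicts if any count it carries is after the current value.
        if any(v > current[k] for k, v in counts.items() if k in current):
            conflicts.append(name)
    return conflicts


def check_no_conflicts(folder_name, filename, current):
    """Raise FileExistsError if any conflicting file is present."""
    conflicts = find_conflicts(folder_name, filename, current)
    if conflicts:
        raise FileExistsError(f'{folder_name} contains conflicting files: {conflicts}')


with tempfile.TemporaryDirectory() as tmp:
    for name in ('ep0-ba50.pt', 'ep2-ba300.pt', 'notes.txt'):
        open(os.path.join(tmp, name), 'w').close()
    conflicts = find_conflicts(tmp, 'ep{epoch}-ba{batch}.pt', {'epoch': 1, 'batch': 100})
```

Here ep0-ba50.pt is ignored because it occurs before the current position, notes.txt is ignored because it does not match the pattern, and ep2-ba300.pt is reported.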

composer.utils.file_helpers.ensure_folder_is_empty(folder_name)

Ensure that the given folder is empty.

Hidden files and folders (those beginning with .) are ignored. Sub-folders are checked recursively.

Parameters

folder_name (str | Path) – The folder to ensure is empty.

Raises

FileExistsError – If folder_name contains any non-hidden files, recursively.
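
A minimal sketch of this kind of check, assuming a recursive walk that prunes hidden folders, is shown below. The helper names find_visible_files and check_folder_is_empty are hypothetical and exist only for this illustration.

```python
import os
import tempfile


def find_visible_files(folder_name):
    """Recursively list files under folder_name, skipping hidden files and folders."""
    found = []
    for root, dirs, files in os.walk(folder_name):
        # Prune hidden sub-folders in place so os.walk does not descend into them.
        dirs[:] = [d for d in dirs if not d.startswith('.')]
        for name in files:
            if not name.startswith('.'):
                found.append(os.path.join(root, name))
    return sorted(found)


def check_folder_is_empty(folder_name):
    """Raise FileExistsError if any non-hidden file exists, recursively."""
    visible = find_visible_files(folder_name)
    if visible:
        raise FileExistsError(f'{folder_name} is not empty; found {visible}')


with tempfile.TemporaryDirectory() as tmp:
    # Hidden entries only: the folder still counts as empty.
    os.makedirs(os.path.join(tmp, '.cache'))
    open(os.path.join(tmp, '.cache', 'index'), 'w').close()
    open(os.path.join(tmp, '.hidden'), 'w').close()
    only_hidden = find_visible_files(tmp)

    # A visible file in a sub-folder makes the folder non-empty.
    os.makedirs(os.path.join(tmp, 'sub'))
    open(os.path.join(tmp, 'sub', 'model.pt'), 'w').close()
    with_nested = [os.path.relpath(p, tmp) for p in find_visible_files(tmp)]
```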

composer.utils.file_helpers.format_name_with_dist(format_str, run_name, **extra_format_kwargs)

Format format_str with the run_name, distributed variables, and extra_format_kwargs.

The following format variables are available:

Variable              Description
{run_name}            The name of the training run. See Logger.run_name.
{rank}                The global rank, as returned by get_global_rank().
{local_rank}          The local rank of the process, as returned by get_local_rank().
{world_size}          The world size, as returned by get_world_size().
{local_world_size}    The local world size, as returned by get_local_world_size().
{node_rank}           The node rank, as returned by get_node_rank().

For example, assume that the rank is 0. Then:

>>> from composer.utils import format_name_with_dist
>>> format_str = '{run_name}/rank{rank}.{extension}'
>>> format_name_with_dist(
...     format_str,
...     run_name='awesome_training_run',
...     extension='json',
... )
'awesome_training_run/rank0.json'

Parameters
  • format_str (str) – The format string for the checkpoint filename.

  • run_name (str) – The value for the {run_name} format variable.

  • extra_format_kwargs (object) – Any additional format() kwargs.
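
To make the substitution step concrete without a distributed environment, the sketch below passes the distributed values in explicitly. The helper format_with_dist_values and the values in dist_values are hypothetical; the real function reads them from the distributed helpers listed in the table above.

```python
def format_with_dist_values(format_str, run_name, dist_values, **extra_format_kwargs):
    """Fill format_str from run_name, a dict of distributed values, and extra kwargs."""
    # str.format ignores keyword arguments that the format string does not use,
    # so every variable can be passed even when only some appear in format_str.
    return format_str.format(run_name=run_name, **dist_values, **extra_format_kwargs)


# Hypothetical values for one process on the first of two 8-process nodes.
dist_values = {
    'rank': 3,
    'local_rank': 3,
    'world_size': 16,
    'local_world_size': 8,
    'node_rank': 0,
}

name = format_with_dist_values(
    '{run_name}/node{node_rank}-rank{rank}.{extension}',
    run_name='awesome_training_run',
    dist_values=dist_values,
    extension='json',
)
```

In this sketch the format string uses only {run_name}, {node_rank}, {rank}, and the extra {extension} variable; the unused distributed values are simply ignored.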

composer.utils.file_helpers.format_name_with_dist_and_time(format_str, run_name, timestamp, **extra_format_kwargs)

Format format_str with the run_name, distributed variables, timestamp, and extra_format_kwargs.

In addition to the variables specified via extra_format_kwargs, the following format variables are available:

Variable              Description
{run_name}            The name of the training run. See Logger.run_name.
{rank}                The global rank, as returned by get_global_rank().
{local_rank}          The local rank of the process, as returned by get_local_rank().
{world_size}          The world size, as returned by get_world_size().
{local_world_size}    The local world size, as returned by get_local_world_size().
{node_rank}           The node rank, as returned by get_node_rank().
{epoch}               The total epoch count, as returned by epoch().
{batch}               The total batch count, as returned by batch().
{batch_in_epoch}      The batch count in the current epoch, as returned by batch_in_epoch().
{sample}              The total sample count, as returned by sample().
{sample_in_epoch}     The sample count in the current epoch, as returned by sample_in_epoch().
{token}               The total token count, as returned by token().
{token_in_epoch}      The token count in the current epoch, as returned by token_in_epoch().
{total_wct}           The total training duration in seconds, as returned by total_wct().
{epoch_wct}           The epoch duration in seconds, as returned by epoch_wct().
{batch_wct}           The batch duration in seconds, as returned by batch_wct().

For example, assume that the current epoch is 0, batch is 0, and rank is 0. Then:

>>> from composer.utils import format_name_with_dist_and_time
>>> format_str = '{run_name}/ep{epoch}-ba{batch}-rank{rank}.{extension}'
>>> format_name_with_dist_and_time(
...     format_str,
...     run_name='awesome_training_run',
...     timestamp=state.timestamp,
...     extension='json',
... )
'awesome_training_run/ep0-ba0-rank0.json'

Parameters
  • format_str (str) – The format string for the checkpoint filename.

  • run_name (str) – The value for the {run_name} format variable.

  • timestamp (Timestamp) – The timestamp.

  • extra_format_kwargs (object) – Any additional format() kwargs.

composer.utils.file_helpers.get_file(path, destination, object_store=None, overwrite=False, progress_bar=True)

Get a file from a local folder, URL, or object store.

Parameters
  • path (str) –

    The path to the file to retrieve.

    • If object_store is specified, then the path should be the object name for the file to get. Do not include the cloud provider or bucket name.

    • If object_store is not specified but the path begins with http:// or https://, the object at this URL will be downloaded.

    • Otherwise, path is presumed to be a local filepath.

  • destination (str) –

    The destination filepath.

    If path is a local filepath, then a symlink to path at destination will be created. Otherwise, path will be downloaded to a file at destination.

  • object_store (ObjectStore, optional) –

    An ObjectStore, if path is located inside an object store (e.g. AWS S3 or Google Cloud Storage). (default: None)

    This ObjectStore instance will be used to retrieve the file. The path parameter should be set to the object name within the object store.

    Set this parameter to None (the default) if path is a URL or a local file.

  • overwrite (bool) – Whether to overwrite an existing file at destination. (default: False)

  • progress_bar (bool, optional) – Whether to show a progress bar. Ignored if path is a local file. (default: True)

Raises

FileNotFoundError – If the path does not exist.
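
The rules for interpreting path can be summarized in a small sketch. The helpers classify_source and link_local_file are hypothetical and only model the documented decision order and the local-file case. Raising FileExistsError when destination exists and overwrite is False is an assumption of this sketch, since the text above does not say what happens in that case.

```python
import os
import tempfile


def classify_source(path, object_store=None):
    """Return which of the three documented sources `path` refers to."""
    if object_store is not None:
        return 'object_store'  # path is an object name inside the store
    if path.startswith('http://') or path.startswith('https://'):
        return 'url'
    return 'local'


def link_local_file(path, destination, overwrite=False):
    """Model the local case: create a symlink to path at destination."""
    if not os.path.exists(path):
        raise FileNotFoundError(f'Local path {path} does not exist')
    if os.path.lexists(destination):
        if not overwrite:
            # Assumed behavior; not stated in the documentation above.
            raise FileExistsError(f'{destination} already exists')
        os.remove(destination)
    os.symlink(os.path.abspath(path), destination)


with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, 'weights.pt')
    with open(src, 'w') as f:
        f.write('data')
    dst = os.path.join(tmp, 'weights-link.pt')
    link_local_file(src, dst)
    is_link = os.path.islink(dst)
    with open(dst) as f:
        linked_content = f.read()
```

Note that an object_store argument takes precedence: when it is given, path is treated as an object name even if it happens to look like a URL or a local path.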

composer.utils.file_helpers.is_tar(name)

Returns whether name has a tar-like extension.

Parameters

name (str | Path) – The name to check.

Returns

bool – Whether name is a tarball.
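
A check of this kind can be sketched as a suffix test. The helper looks_like_tar is hypothetical, and the list of suffixes is an assumption: the description above does not enumerate which extensions count as tar-like.

```python
from pathlib import Path

# Assumed list of tar-like suffixes; the documentation does not enumerate them.
TAR_LIKE_SUFFIXES = ('.tar', '.tgz', '.tar.gz', '.tar.bz2', '.tar.xz')


def looks_like_tar(name):
    """Return True if name ends with one of the assumed tar-like suffixes."""
    # str() lets the helper accept both str and Path, as the real function does.
    return str(name).endswith(TAR_LIKE_SUFFIXES)


results = {
    'ckpt.tar.gz': looks_like_tar('ckpt.tar.gz'),
    'runs/ckpt.tgz': looks_like_tar(Path('runs/ckpt.tgz')),
    'ckpt.pt': looks_like_tar(Path('ckpt.pt')),
}
```

Matching on the end of the full name, rather than on Path.suffix, keeps compound extensions such as .tar.gz intact.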