file_helpers
Helpers for working with files.
Functions
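create_symlink_file | Create a symlink file, which can be followed by get_file().
ensure_folder_has_no_conflicting_files | Ensure that the given folder does not have any files conflicting with the filename format string.
ensure_folder_is_empty | Ensure that the given folder is empty.
format_name_with_dist | Format format_str with the run_name, distributed variables, and extra_format_kwargs.
format_name_with_dist_and_time | Format format_str with the run_name, distributed variables, timestamp, and extra_format_kwargs.
get_file | Get a file from a local folder, URL, or object store.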
Returns whether |
- composer.utils.file_helpers.create_symlink_file(existing_path, destination_filename)
Create a symlink file, which can be followed by get_file().
Unlike unix symlinks, the symlink files created by this function are normal text files. They can therefore be uploaded to object stores via ObjectStore.upload_object() or to loggers via Logger.file_artifact(), which otherwise would not support unix-style symlinks.
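The idea of a symlink stored as ordinary text can be illustrated with a minimal sketch. The helper names and the on-disk format (a file whose only content is the target path) are assumptions made for illustration, not Composer's actual implementation.

```python
import os
import tempfile


def write_symlink_file(existing_path: str, destination_filename: str) -> None:
    """Write a plain text file whose only content is the path it points to (assumed format)."""
    with open(destination_filename, "w") as f:
        f.write(existing_path)


def read_symlink_file(symlink_filename: str) -> str:
    """Return the target path stored inside a symlink file."""
    with open(symlink_filename) as f:
        return f.read().strip()


with tempfile.TemporaryDirectory() as tmp:
    link = os.path.join(tmp, "latest.symlink")  # hypothetical file name
    write_symlink_file("checkpoints/ep1.pt", link)
    resolved = read_symlink_file(link)
    # The "link" is an ordinary file, so any tool that uploads files can handle it.
    is_regular_file = os.path.isfile(link) and not os.path.islink(link)
```

Because the target is stored as file content rather than filesystem metadata, the pointer survives an upload to storage that has no notion of links.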
- composer.utils.file_helpers.ensure_folder_has_no_conflicting_files(folder_name, filename, timestamp)
Ensure that the given folder does not have any files conflicting with the filename format string.
If any file in the folder matches filename and is formatted with a timestamp where the epoch, batch, sample, or token counts are after timestamp, a FileExistsError will be raised.
- Parameters
- Raises
FileExistsError – If folder_name contains any files matching the filename template that occur after timestamp.
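One way to picture this check is to turn the format string into a regular expression and compare the counts it captures against the current ones. The sketch below is a simplified illustration, not Composer's code: it assumes the timestamp is a plain dict of integer counts, and the names find_conflicts and ensure_no_conflicts are hypothetical.

```python
import os
import re
import tempfile


def find_conflicts(folder, filename_format, current):
    """Return files matching filename_format whose captured counts exceed `current`.

    `current` is a dict such as {"epoch": 1, "batch": 10}, standing in for a timestamp.
    """
    # Escape the literal text, then turn each {field} into a named integer group.
    pattern = re.escape(filename_format)
    for field in current:
        pattern = pattern.replace(re.escape("{" + field + "}"), f"(?P<{field}>\\d+)")
    regex = re.compile(pattern)

    conflicts = []
    for name in sorted(os.listdir(folder)):
        match = regex.fullmatch(name)
        # A file conflicts if any captured count is after the current one.
        if match and any(int(v) > current[k] for k, v in match.groupdict().items()):
            conflicts.append(name)
    return conflicts


def ensure_no_conflicts(folder, filename_format, current):
    """Raise FileExistsError if the folder holds files from a later point in training."""
    conflicts = find_conflicts(folder, filename_format, current)
    if conflicts:
        raise FileExistsError(f"{folder} contains conflicting files: {conflicts}")


with tempfile.TemporaryDirectory() as tmp:
    for name in ("ep0-ba5.pt", "ep1-ba20.pt", "notes.txt"):
        open(os.path.join(tmp, name), "w").close()
    conflicts = find_conflicts(tmp, "ep{epoch}-ba{batch}.pt", {"epoch": 1, "batch": 10})
```

Here only ep1-ba20.pt is reported: its batch count is past the current one, ep0-ba5.pt is earlier, and notes.txt does not match the template.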
- composer.utils.file_helpers.ensure_folder_is_empty(folder_name)
Ensure that the given folder is empty.
Hidden files and folders (those beginning with .) are ignored. Sub-folders are checked recursively.
- Parameters
- Raises
FileExistsError – If folder_name contains any non-hidden files, recursively.
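The documented behavior (ignore hidden entries, recurse into sub-folders) can be sketched with os.walk. This is an illustrative stand-in under those stated rules, not the library's implementation, and check_folder_is_empty is a hypothetical name.

```python
import os
import tempfile


def check_folder_is_empty(folder_name: str) -> None:
    """Raise FileExistsError if folder_name holds any non-hidden file, at any depth."""
    for root, dirs, files in os.walk(folder_name):
        # Prune hidden sub-folders in place so os.walk never descends into them.
        dirs[:] = [d for d in dirs if not d.startswith(".")]
        for name in files:
            if not name.startswith("."):
                raise FileExistsError(
                    f"{folder_name} is not empty: found {os.path.join(root, name)}"
                )


with tempfile.TemporaryDirectory() as tmp:
    # Only hidden entries: the check passes.
    os.makedirs(os.path.join(tmp, ".cache"))
    open(os.path.join(tmp, ".cache", "blob"), "w").close()
    open(os.path.join(tmp, ".gitkeep"), "w").close()
    check_folder_is_empty(tmp)
    hidden_only_ok = True

    # A visible file in a sub-folder is found by the recursive walk.
    os.makedirs(os.path.join(tmp, "sub"))
    open(os.path.join(tmp, "sub", "model.pt"), "w").close()
    try:
        check_folder_is_empty(tmp)
        nested_file_detected = False
    except FileExistsError:
        nested_file_detected = True
```

Note that this sketch treats empty visible sub-folders as acceptable, since the documentation only mentions non-hidden files.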
- composer.utils.file_helpers.format_name_with_dist(format_str, run_name, **extra_format_kwargs)
Format format_str with the run_name, distributed variables, and extra_format_kwargs.
The following format variables are available:
Variable | Description
{run_name} | The name of the training run. See Logger.run_name.
{rank} | The global rank, as returned by get_global_rank().
{local_rank} | The local rank of the process, as returned by get_local_rank().
{world_size} | The world size, as returned by get_world_size().
{local_world_size} | The local world size, as returned by get_local_world_size().
{node_rank} | The node rank, as returned by get_node_rank().
For example, assume that the rank is 0. Then:
>>> from composer.utils import format_name_with_dist
>>> format_str = '{run_name}/rank{rank}.{extension}'
>>> format_name_with_dist(
...     format_str,
...     run_name='awesome_training_run',
...     extension='json',
... )
'awesome_training_run/rank0.json'
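To see how the variables combine on a process other than rank 0, the following sketch reproduces the substitution with plain str.format. The function format_with_dist_sketch and the hard-coded dist_vars values are hypothetical. The real function obtains the values from the distributed environment.

```python
def format_with_dist_sketch(format_str, run_name, dist_vars, **extra_format_kwargs):
    """Fill format_str from run_name, a dict of distributed values, and extras."""
    # str.format ignores keyword arguments the format string does not use,
    # so a format string may reference any subset of the variables.
    return format_str.format(run_name=run_name, **dist_vars, **extra_format_kwargs)


# Hypothetical values for one process on a single node with 8 processes.
dist_vars = {
    "rank": 3,
    "local_rank": 3,
    "world_size": 8,
    "local_world_size": 8,
    "node_rank": 0,
}

name = format_with_dist_sketch(
    "{run_name}/rank{rank}-of-{world_size}.{extension}",
    "awesome_training_run",
    dist_vars,
    extension="json",
)
print(name)  # awesome_training_run/rank3-of-8.json

# A variable that is neither built in nor passed as an extra raises KeyError.
try:
    format_with_dist_sketch("{run_name}.{extension}", "awesome_training_run", dist_vars)
    missing_raises = False
except KeyError:
    missing_raises = True
```

In this sketch, a custom variable such as {extension} must be supplied as a keyword argument, as in the example call.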
- composer.utils.file_helpers.format_name_with_dist_and_time(format_str, run_name, timestamp, **extra_format_kwargs)
Format format_str with the run_name, distributed variables, timestamp, and extra_format_kwargs.
In addition to the variables specified via extra_format_kwargs, the following format variables are available:
Variable | Description
{run_name} | The name of the training run. See Logger.run_name.
{rank} | The global rank, as returned by get_global_rank().
{local_rank} | The local rank of the process, as returned by get_local_rank().
{world_size} | The world size, as returned by get_world_size().
{local_world_size} | The local world size, as returned by get_local_world_size().
{node_rank} | The node rank, as returned by get_node_rank().
{epoch} | The total epoch count, as returned by epoch().
{batch} | The total batch count, as returned by batch().
{batch_in_epoch} | The batch count in the current epoch, as returned by batch_in_epoch().
{sample} | The total sample count, as returned by sample().
{sample_in_epoch} | The sample count in the current epoch, as returned by sample_in_epoch().
{token} | The total token count, as returned by token().
{token_in_epoch} | The token count in the current epoch, as returned by token_in_epoch().
{total_wct} | The total training duration in seconds, as returned by total_wct().
{epoch_wct} | The epoch duration in seconds, as returned by epoch_wct().
{batch_wct} | The batch duration in seconds, as returned by batch_wct().
For example, assume that the current epoch is 0, batch is 0, and rank is 0. Then:
>>> from composer.utils import format_name_with_dist_and_time
>>> format_str = '{run_name}/ep{epoch}-ba{batch}-rank{rank}.{extension}'
>>> format_name_with_dist_and_time(
...     format_str,
...     run_name='awesome_training_run',
...     timestamp=state.timestamp,
...     extension='json',
... )
'awesome_training_run/ep0-ba0-rank0.json'
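The time variables change as training progresses, so the same format string yields a different name at each point. The sketch below shows this with a hypothetical TimestampSketch dataclass holding a few of the counts. It is a stand-in for illustration and does not use Composer's timestamp type.

```python
from dataclasses import asdict, dataclass


@dataclass
class TimestampSketch:
    """Hypothetical stand-in holding a subset of the documented time counts."""
    epoch: int = 0
    batch: int = 0
    batch_in_epoch: int = 0
    sample: int = 0
    token: int = 0


def format_with_time_sketch(format_str, run_name, rank, timestamp, **extra_format_kwargs):
    """Fill format_str from run_name, rank, the timestamp's fields, and extras."""
    return format_str.format(
        run_name=run_name, rank=rank, **asdict(timestamp), **extra_format_kwargs
    )


fmt = "{run_name}/ep{epoch}-ba{batch}-rank{rank}.{extension}"

start = format_with_time_sketch(
    fmt, "awesome_training_run", 0, TimestampSketch(), extension="json"
)
later = format_with_time_sketch(
    fmt,
    "awesome_training_run",
    0,
    TimestampSketch(epoch=2, batch=500, batch_in_epoch=100),
    extension="json",
)
print(start)  # awesome_training_run/ep0-ba0-rank0.json
print(later)  # awesome_training_run/ep2-ba500-rank0.json
```

Including a time variable such as {batch} in the format string is what keeps files written at different steps from overwriting one another.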
- composer.utils.file_helpers.get_file(path, destination, object_store=None, overwrite=False, progress_bar=True)
Get a file from a local folder, URL, or object store.
- Parameters
path (str) –
The path to the file to retrieve.
If object_store is specified, then path should be the object name for the file to get. Do not include the cloud provider or bucket name.
If object_store is not specified but path begins with http:// or https://, the object at this URL will be downloaded.
Otherwise, path is presumed to be a local filepath.
destination (str) –
The destination filepath.
If path is a local filepath, then a symlink to path at destination will be created. Otherwise, path will be downloaded to a file at destination.
object_store (ObjectStore, optional) –
An ObjectStore, if path is located inside an object store (e.g. AWS S3 or Google Cloud Storage). (default: None)
This ObjectStore instance will be used to retrieve the file. The path parameter should be set to the object name within the object store.
Set this parameter to None (the default) if path is a URL or a local file.
overwrite (bool) – Whether to overwrite an existing file at destination. (default: False)
progress_bar (bool, optional) – Whether to show a progress bar. Ignored if path is a local file. (default: True)
- Raises
FileNotFoundError – If the path does not exist.
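The three cases described for path (object store, URL, local file) amount to a simple dispatch, sketched below with the standard library. This is a rough illustration, not Composer's code. The download method on the object store, the FileExistsError raised when overwrite is False, and the name get_file_sketch are all assumptions.

```python
import os
import tempfile
import urllib.request


def get_file_sketch(path, destination, object_store=None, overwrite=False):
    """Fetch `path` to `destination` following the three documented cases."""
    is_url = path.startswith(("http://", "https://"))
    is_local = object_store is None and not is_url

    if is_local and not os.path.exists(path):
        raise FileNotFoundError(f"{path} does not exist")

    if os.path.lexists(destination):
        if not overwrite:
            # Assumed behavior when overwrite is False.
            raise FileExistsError(f"{destination} already exists")
        os.remove(destination)

    if object_store is not None:
        # Assumption: the store exposes download(object_name, destination).
        object_store.download(path, destination)
    elif is_url:
        urllib.request.urlretrieve(path, destination)
    else:
        # Local files are linked rather than copied.
        os.symlink(os.path.abspath(path), destination)


with tempfile.TemporaryDirectory() as tmp:
    source = os.path.join(tmp, "weights.bin")
    with open(source, "w") as f:
        f.write("data")

    dest = os.path.join(tmp, "link.bin")
    get_file_sketch(source, dest)
    dest_is_symlink = os.path.islink(dest)
    with open(dest) as f:
        dest_content = f.read()
```

Checking object_store first matches the documented precedence: when a store is given, path is treated as an object name even if it looks like a URL or a local path.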