composer.loggers.object_store_logger#

Log artifacts to an object store.

Classes

ObjectStoreLogger

Logger destination that uploads artifacts to an object store.

class composer.loggers.object_store_logger.ObjectStoreLogger(provider, container, provider_kwargs=None, should_log_artifact=None, object_name='{artifact_name}', num_concurrent_uploads=4, upload_staging_folder=None, use_procs=True)[source]#

Bases: composer.loggers.logger_destination.LoggerDestination

Logger destination that uploads artifacts to an object store.

This logger destination handles calls to file_artifact() and uploads files to an object store, such as AWS S3 or Google Cloud Storage.

object_store_logger = ObjectStoreLogger(
    provider='s3',
    container='my-bucket',
    provider_kwargs={
        'key': 'AKIA...',
        'secret': '*********',
        'region': 'ap-northeast-1',
    },
)

# Construct the trainer using this logger
trainer = Trainer(
    ...,
    loggers=[object_store_logger],
)

Note

This callback blocks the training loop only briefly, to copy each artifact for which should_log_artifact returns True into the staging folder; the upload itself happens in the background. Here are some additional tips for minimizing the performance impact:

  • Set should_log_artifact to filter which artifacts will be logged. By default, all artifacts are logged.

  • Set use_procs=True (the default) to use background processes, instead of threads, to perform the file uploads. Processes are recommended to ensure that the GIL does not block the training loop when performing CPU operations on uploaded files (e.g. computing and comparing checksums). Network I/O always occurs in the background.

  • Provide a RAM disk path for the upload_staging_folder parameter. Staging copies in RAM will be faster than writing them to disk. However, there must be sufficient free RAM, or MemoryErrors may be raised.
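As a sketch of the RAM-disk tip above, the helper below (hypothetical, not part of Composer; the `/dev/shm` path and 2 GiB threshold are assumptions) picks a RAM disk for staging only when it exists and has headroom, falling back to the system temp directory otherwise:

```python
import shutil
import tempfile


def pick_staging_folder(ram_disk='/dev/shm', min_free_bytes=2 * 1024**3):
    """Prefer a RAM disk for upload staging when it has enough free space.

    Hypothetical helper: the path and threshold are assumptions, not
    values taken from Composer.
    """
    try:
        if shutil.disk_usage(ram_disk).free >= min_free_bytes:
            return ram_disk
    except OSError:
        # Path does not exist (e.g. on platforms without /dev/shm).
        pass
    # Fall back to the system temp directory; Composer's own default when
    # upload_staging_folder is unspecified is a TemporaryDirectory().
    return tempfile.gettempdir()
```

The result can then be passed as upload_staging_folder=pick_staging_folder() when constructing the logger.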

Parameters
  • provider (str) – Cloud provider to use (for example, 's3', as in the example above).

  • container (str) – The name of the container (i.e. bucket) to use.

  • provider_kwargs (Dict[str, Any], optional) –

    Keyword arguments to pass into the constructor for the specified provider. These arguments would usually include the cloud region and credentials.

    Common keys are:

    • key (str): API key or username to be used (required).

    • secret (str): Secret password to be used (required).

    • secure (bool): Whether to use HTTPS or HTTP. Note: some providers only support HTTPS, and it is enabled by default.

    • host (str): Override hostname used for connections.

    • port (int): Override port used for connections.

    • api_version (str): Optional API version. Only used by drivers which support multiple API versions.

    • region (str): Optional driver region. Only used by drivers which support multiple regions.

  • should_log_artifact ((State, LogLevel, str) -> bool, optional) –

    A function to filter which artifacts are uploaded.

    The function should take the (current training state, log level, artifact name) and return a boolean indicating whether this file should be uploaded.

    By default, all artifacts will be uploaded.
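    For example, a minimal name-based filter (a sketch; the checkpoint extensions here are assumptions) that uploads only checkpoint-like artifacts could look like:

```python
def upload_checkpoints_only(state, log_level, artifact_name):
    """Return True only for checkpoint-like artifacts.

    ``state`` and ``log_level`` are accepted to match the expected
    (State, LogLevel, str) -> bool signature, but this simple filter
    decides purely on the artifact name.
    """
    return artifact_name.endswith(('.pt', '.tar'))
```

    Pass it as should_log_artifact=upload_checkpoints_only when constructing the ObjectStoreLogger.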

  • object_name (str, optional) –

    A format string used to determine the object name.

    The following format variables are available:

    Variable             Description
    {artifact_name}      The name of the artifact being logged.
    {run_name}           The name of the training run. See Logger.run_name.
    {rank}               The global rank, as returned by get_global_rank().
    {local_rank}         The local rank of the process, as returned by get_local_rank().
    {world_size}         The world size, as returned by get_world_size().
    {local_world_size}   The local world size, as returned by get_local_world_size().
    {node_rank}          The node rank, as returned by get_node_rank().

    Leading slashes ('/') will be stripped.

    Consider the following example, which subfolders the artifacts by their rank:

    >>> object_store_logger = ObjectStoreLogger(..., object_name='rank_{rank}/{artifact_name}')
    >>> trainer = Trainer(..., run_name='foo', loggers=[object_store_logger])
    >>> trainer.logger.file_artifact(
    ...     log_level=LogLevel.EPOCH,
    ...     artifact_name='bar.txt',
    ...     file_path='path/to/file.txt',
    ... )
    

    Assuming that the process's rank is 0, the object store would store the contents of 'path/to/file.txt' in an object named 'rank_0/bar.txt'.

    Default: '{artifact_name}'
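    The substitution itself is plain str.format followed by stripping leading slashes; a standalone sketch (the helper name and format string are just illustrations, not Composer's API):

```python
def format_object_name(object_name, **variables):
    # Fill in the format variables, then strip leading slashes,
    # as the docstring above describes.
    return object_name.format(**variables).lstrip('/')


name = format_object_name(
    '/{run_name}/rank_{rank}/{artifact_name}',
    run_name='foo',
    rank=0,
    artifact_name='bar.txt',
)
# name == 'foo/rank_0/bar.txt'
```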

  • num_concurrent_uploads (int, optional) – Maximum number of concurrent uploads. Defaults to 4.

  • upload_staging_folder (str, optional) – A folder to use for staging uploads. If not specified, defaults to using a TemporaryDirectory().

  • use_procs (bool, optional) – Whether to perform file uploads in background processes (as opposed to threads). Defaults to True.

get_uri_for_artifact(artifact_name)[source]#

Get the object store provider URI for an artifact.

Parameters

artifact_name (str) – The name of an artifact.

Returns

str – The URI corresponding to the uploaded location of the artifact.

Object stores do not natively support symlinks, so symlinks are emulated by adding a .symlink file to the object store: a small text file containing the name of the object it points to.
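A toy sketch of that emulation, using an in-memory dict as a stand-in object store (the helper names are hypothetical, not Composer's API):

```python
def write_symlink(store, link_name, target_object_name):
    # A 'symlink' is stored as '<link_name>.symlink', a small text object
    # whose body is the name of the object it points to.
    store[link_name + '.symlink'] = target_object_name.encode('utf-8')


def read_through_symlink(store, link_name):
    # Resolve the .symlink object, then fetch the target it names.
    target = store[link_name + '.symlink'].decode('utf-8')
    return store[target]


store = {'checkpoints/ep5.pt': b'checkpoint bytes'}
write_symlink(store, 'checkpoints/latest', 'checkpoints/ep5.pt')
data = read_through_symlink(store, 'checkpoints/latest')
# data == b'checkpoint bytes'
```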