composer.utils.object_store

Utility for uploading to and downloading from cloud object stores.

Classes

ObjectStore

Utility for uploading to and downloading from object (blob) stores, such as Amazon S3.

Hparams

These classes are used with yahp for YAML-based configuration.

ObjectStoreHparams

ObjectStore hyperparameters.

class composer.utils.object_store.ObjectStore(provider, container, provider_kwargs=None)

Utility for uploading to and downloading from object (blob) stores, such as Amazon S3.

Example

Here’s an example for an Amazon S3 bucket named MY_CONTAINER:

>>> from composer.utils import ObjectStore
>>> object_store = ObjectStore(
...     provider="s3",
...     container="MY_CONTAINER",
...     provider_kwargs={
...         "key": "AKIA...",
...         "secret": "*********",
...     }
... )
>>> object_store
<composer.utils.object_store.ObjectStore object at ...>
Parameters
  • provider (str) –

    Cloud provider to use. Valid options are:

  • container (str) – The name of the container (i.e. bucket) to use.

  • provider_kwargs (Dict[str, Any], optional) –

    Keyword arguments to pass into the constructor for the specified provider. These arguments would usually include the cloud region and credentials.

    Common keys are:

    • key (str): API key or username to be used (required).

    • secret (str): Secret password to be used (required).

    • secure (bool): Whether to use HTTPS or HTTP. HTTPS is on by default, and some providers support only HTTPS.

    • host (str): Override hostname used for connections.

    • port (int): Override port used for connections.

    • api_version (str): Optional API version. Only used by drivers which support multiple API versions.

    • region (str): Optional driver region. Only used by drivers which support multiple regions.
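
As a minimal sketch of how the common keys above might be assembled, the helper below reads the credentials from environment variables rather than hard-coding them. The helper name build_provider_kwargs and the environment variable names passed to it are assumptions for illustration; they are not part of composer.

```python
import os
from typing import Any, Dict, Optional


def build_provider_kwargs(
    key_environ: str,
    secret_environ: str,
    region: Optional[str] = None,
    host: Optional[str] = None,
    port: Optional[int] = None,
) -> Dict[str, Any]:
    """Hypothetical helper: build a ``provider_kwargs`` dict from the common keys."""
    # Required entries: read the credentials from the environment.
    kwargs: Dict[str, Any] = {
        "key": os.environ[key_environ],
        "secret": os.environ[secret_environ],
    }
    # Optional entries are included only when explicitly set.
    for name, value in (("region", region), ("host", host), ("port", port)):
        if value is not None:
            kwargs[name] = value
    return kwargs
```

The resulting dictionary could then be passed as the provider_kwargs argument of ObjectStore, alongside provider and container.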

property container_name

The name of the object storage container.

download_object(object_name, destination_path, overwrite_existing=False, delete_on_failure=True)

Download an object to the specified destination path.

Parameters
  • object_name (str) – The name of the object to download.

  • destination_path (str) – Full path to a file or a directory where the incoming file will be saved.

  • overwrite_existing (bool, optional) – Set to True to overwrite an existing file. (default: False)

  • delete_on_failure (bool, optional) – Set to True to delete a partially downloaded file if the download was not successful (for example, a hash or file-size mismatch). (default: True)
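
The parameters above could be combined as in the following sketch, which downloads an object into a local directory and replaces any stale copy. The helper name download_to_dir is hypothetical, and object_store is assumed to be any object exposing the download_object method documented here.

```python
import os
import posixpath


def download_to_dir(object_store, object_name: str, destination_dir: str) -> str:
    """Hypothetical helper: download ``object_name`` into ``destination_dir``."""
    os.makedirs(destination_dir, exist_ok=True)
    # Object names use forward slashes; keep only the last component
    # as the local file name.
    file_name = posixpath.basename(object_name)
    destination_path = os.path.join(destination_dir, file_name)
    object_store.download_object(
        object_name=object_name,
        destination_path=destination_path,
        overwrite_existing=True,  # replace a stale local copy
        delete_on_failure=True,   # do not keep a partially downloaded file
    )
    return destination_path
```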

download_object_as_stream(object_name, chunk_size=None)

Return an iterator that yields the object's data.

Parameters
  • object_name (str) – Object name.

  • chunk_size (Optional[int], optional) – Optional chunk size (in bytes).

Returns

Iterator[bytes] – The object, as a byte stream.
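
Because the return value is an iterator of byte chunks, a large object can be processed without holding it in memory all at once. The sketch below writes the stream to disk while computing a SHA-256 digest. The helper name stream_to_file and the 1 MiB default chunk size are assumptions, and object_store is assumed to be any object exposing download_object_as_stream as documented here.

```python
import hashlib


def stream_to_file(
    object_store,
    object_name: str,
    destination_path: str,
    chunk_size: int = 1024 * 1024,
) -> str:
    """Hypothetical helper: stream an object to disk and return its SHA-256 hex digest."""
    digest = hashlib.sha256()
    with open(destination_path, "wb") as f:
        # Each iteration yields one chunk of bytes.
        for chunk in object_store.download_object_as_stream(object_name, chunk_size=chunk_size):
            f.write(chunk)
            digest.update(chunk)
    return digest.hexdigest()
```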

get_object_size(object_name)

Get the size of an object, in bytes.

Parameters

object_name (str) – The name of the object.

Returns

int – The object size, in bytes.
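
One possible use of the size is a pre-download check that the local disk has enough free space. In this sketch the helper name fits_on_disk is hypothetical, and object_store is assumed to be any object exposing get_object_size as documented here.

```python
import shutil


def fits_on_disk(object_store, object_name: str, directory: str) -> bool:
    """Hypothetical helper: True if ``directory`` has room for the object."""
    size_in_bytes = object_store.get_object_size(object_name)
    # shutil.disk_usage reports total, used, and free bytes for the
    # filesystem that contains ``directory``.
    free_in_bytes = shutil.disk_usage(directory).free
    return size_in_bytes <= free_in_bytes
```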

property provider_name

The name of the cloud provider.

upload_object(file_path, object_name, verify_hash=True, extra=None, headers=None)

Upload an object that is currently stored on disk.

Parameters
  • file_path (str) – Path to the object on disk.

  • object_name (str) – Object name (i.e. where the object will be stored in the container).

  • verify_hash (bool, optional) – Whether to verify hashes. (default: True)

  • extra (Optional[Dict], optional) – Extra attributes to pass to the underlying provider driver. (default: None, which is equivalent to an empty dictionary)

  • headers (Optional[Dict[str, str]], optional) – Additional request headers, such as CORS headers. (default: None, which is equivalent to an empty dictionary)
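
As an illustration of how file_path and object_name relate, the sketch below uploads every file under a local directory, deriving each object name from a prefix plus the file's relative path. The helper name upload_directory and the prefix convention are assumptions, and object_store is assumed to be any object exposing upload_object as documented here.

```python
import os
from typing import List


def upload_directory(object_store, local_dir: str, prefix: str) -> List[str]:
    """Hypothetical helper: upload each file under ``local_dir``; return the object names."""
    uploaded: List[str] = []
    for root, _dirs, files in os.walk(local_dir):
        for file_name in sorted(files):
            file_path = os.path.join(root, file_name)
            # Object names use forward slashes regardless of the local OS.
            relative = os.path.relpath(file_path, local_dir).replace(os.sep, "/")
            object_name = f"{prefix}/{relative}"
            object_store.upload_object(
                file_path=file_path,
                object_name=object_name,
                verify_hash=True,
            )
            uploaded.append(object_name)
    return uploaded
```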

upload_object_via_stream(obj, object_name, extra=None, headers=None)

Upload an object from memory, given either as bytes or as an iterator of bytes.

Parameters
  • obj (bytes | Iterator[bytes]) – The object.

  • object_name (str) – Object name (i.e. where the object will be stored in the container).

  • verify_hash (bool, optional) – Whether to verify hashes. (default: True)

  • extra (Optional[Dict], optional) – Extra attributes to pass to the underlying provider driver. (default: None)

  • headers (Optional[Dict[str, str]], optional) – Additional request headers, such as CORS headers. (default: None)
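
Since obj may be plain bytes, small in-memory payloads can be uploaded without first writing a temporary file. The sketch below serializes a Python value as JSON and uploads it. The helper name upload_json is hypothetical, and object_store is assumed to be any object exposing upload_object_via_stream as documented here.

```python
import json
from typing import Any


def upload_json(object_store, data: Any, object_name: str) -> int:
    """Hypothetical helper: upload ``data`` as UTF-8 JSON; return the payload size in bytes."""
    # Serialize in memory; sort_keys makes the output deterministic.
    payload = json.dumps(data, sort_keys=True).encode("utf-8")
    object_store.upload_object_via_stream(obj=payload, object_name=object_name)
    return len(payload)
```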

class composer.utils.object_store.ObjectStoreHparams(provider, container, key_environ=None, secret_environ=None, region=None, host=None, port=None, extra_init_kwargs=<factory>)

Bases: yahp.hparams.Hparams

ObjectStore hyperparameters.

Example

Here’s an example of how to connect to an Amazon S3 bucket. This example assumes:

  • The container is named MY_CONTAINER.

  • The AWS Access Key ID is stored in an environment variable named AWS_ACCESS_KEY_ID.

  • The Secret Access Key is stored in an environment variable named AWS_SECRET_ACCESS_KEY.

>>> from composer.utils import ObjectStoreHparams
>>> provider_hparams = ObjectStoreHparams(
...     provider="s3",
...     container="MY_CONTAINER",
...     key_environ="AWS_ACCESS_KEY_ID",
...     secret_environ="AWS_SECRET_ACCESS_KEY",
... )
>>> provider = provider_hparams.initialize_object()
>>> provider
<composer.utils.object_store.ObjectStore object at ...>
Parameters
  • provider (str) –

    Cloud provider to use.

    See ObjectStore for documentation.

  • container (str) – The name of the container (i.e. bucket) to use.

  • key_environ (str, optional) –

    The name of an environment variable containing the API key or username to use to connect to the provider. If no key is required, then set this field to None. (default: None)

    For security reasons, Composer requires that the key be specified via an environment variable. For example, if your key is stored in an environment variable called OBJECT_STORE_KEY that is set to MY_KEY, then you should set this parameter equal to OBJECT_STORE_KEY. Composer will read the key like this:

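    >>> import os
    >>> # provider and container are required by the constructor
    >>> params = ObjectStoreHparams(provider="s3", container="MY_CONTAINER", key_environ="OBJECT_STORE_KEY")
    >>> key = os.environ[params.key_environ]
    >>> key
    'MY_KEY'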
    

  • secret_environ (str, optional) –

    The name of an environment variable containing the API secret or password to use for the provider. If no secret is required, then set this field to None. (default: None)

    For security reasons, Composer requires that the secret be specified via an environment variable. For example, if your secret is stored in an environment variable called OBJECT_STORE_SECRET that is set to MY_SECRET, then you should set this parameter equal to OBJECT_STORE_SECRET. Composer will read the secret like this:

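    >>> import os
    >>> # provider and container are required by the constructor
    >>> params = ObjectStoreHparams(provider="s3", container="MY_CONTAINER", secret_environ="OBJECT_STORE_SECRET")
    >>> secret = os.environ[params.secret_environ]
    >>> secret
    'MY_SECRET'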
    

  • region (str, optional) – Cloud region to use for the cloud provider. Most providers do not require the region to be specified. (default: None)

  • host (str, optional) – Override the hostname for the cloud provider. (default: None)

  • port (int, optional) – Override the port for the cloud provider. (default: None)

  • extra_init_kwargs (Dict[str, Any], optional) –

    Extra keyword arguments to pass into the constructor for the specified provider. (default: None, which is equivalent to an empty dictionary)
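
Because key_environ and secret_environ name environment variables rather than holding the credentials themselves, it can help to confirm that those variables are set before constructing the hparams. The following is a minimal sketch; the helper name missing_credential_vars is hypothetical and not part of composer.

```python
import os
from typing import List, Optional


def missing_credential_vars(
    key_environ: Optional[str],
    secret_environ: Optional[str],
) -> List[str]:
    """Hypothetical helper: names of the requested environment variables that are unset."""
    # None means "no credential required", so it is skipped.
    requested = [name for name in (key_environ, secret_environ) if name is not None]
    return [name for name in requested if name not in os.environ]
```

An empty list means every requested variable is present; otherwise the returned names can be reported to the user before any connection is attempted.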

get_provider_kwargs()

Returns the provider_kwargs argument, which is used to construct an ObjectStore.

Returns

Dict[str, Any] – The provider_kwargs for use in constructing an ObjectStore.
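
The returned dictionary may contain credentials under the key and secret entries described for ObjectStore, so it should not be logged verbatim. The sketch below masks those two entries before printing. The helper name redact_provider_kwargs and the "***" placeholder are assumptions for illustration.

```python
from typing import Any, Dict

# Entries treated as credentials, following the common keys listed for ObjectStore.
SENSITIVE_KEYS = ("key", "secret")


def redact_provider_kwargs(provider_kwargs: Dict[str, Any]) -> Dict[str, Any]:
    """Hypothetical helper: return a copy that is safe to log."""
    return {
        name: ("***" if name in SENSITIVE_KEYS else value)
        for name, value in provider_kwargs.items()
    }
```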

initialize_object()

Returns an instance of ObjectStore.

Returns

ObjectStore – The object_store.