composer.callbacks.mlperf#

Functions

get_system_description

Generates a valid system description.

rank_zero

composer.callbacks.mlperf.rank_zero

require_mlperf_logging

composer.callbacks.mlperf.require_mlperf_logging

Classes

Callback

Base class for callbacks.

DataLoader

Data loader.

LogLevel

LogLevel denotes when in the training loop log messages are generated.

Logger

An interface to record training data.

MLPerfCallback

Creates a compliant results file for MLPerf Training benchmark.

State

The state of the trainer.

Attributes

  • BENCHMARKS

  • DIVISIONS

  • STATUS

  • mlperf_available

class composer.callbacks.mlperf.MLPerfCallback(root_folder, index, benchmark='resnet', target=0.759, division='open', metric_name='Accuracy', metric_label='eval', submitter='MosaicML', system_name=None, status='onprem', cache_clear_cmd=None)[source]#

Bases: composer.core.callback.Callback

Creates a compliant results file for MLPerf Training benchmark.

A submission folder structure will be created with the root_folder as the base and the following directories:

root_folder/
    results/
        [system_name]/
            [benchmark]/
                results_0.txt
                results_1.txt
                ...
    systems/
        [system_name].json
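
Under this layout, the path to a single run's results file can be sketched as follows. The helper name and the example values are illustrative only, not part of the library:

```python
import os

def result_path(root_folder, system_name, benchmark, index):
    # Mirrors the submission layout above:
    # <root>/results/<system_name>/<benchmark>/results_<index>.txt
    return os.path.join(root_folder, 'results', system_name,
                        benchmark, f'results_{index}.txt')

print(result_path('/submission', '8xNVIDIA_A100_80GB_composer', 'resnet', 0))
# /submission/results/8xNVIDIA_A100_80GB_composer/resnet/results_0.txt
```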

A required system description will be generated automatically, with a best effort made to populate its fields, but it should be checked manually prior to submission.

Currently, only open division submissions are supported with this Callback.

Example:

from composer.callbacks import MLPerfCallback

callback = MLPerfCallback(
    root_folder='/submission',
    index=0,
    metric_name='Accuracy',
    metric_label='eval',
    target=0.759,
)

During training, the metric found in state.current_metrics[metric_label][metric_name] will be compared against the target criterion.
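
This success check can be sketched roughly as below. Here current_metrics is a stand-in dictionary, not the real State object, and the helper name is illustrative:

```python
def target_reached(current_metrics, metric_label, metric_name, target):
    # Look up the tracked metric, e.g. current_metrics['eval']['Accuracy'],
    # and compare it against the target criterion.
    metric = current_metrics[metric_label][metric_name]
    return metric >= target

print(target_reached({'eval': {'Accuracy': 0.76}}, 'eval', 'Accuracy', 0.759))
# True
```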

Note

This is currently an experimental callback that has not yet been used to submit an actual result to MLPerf. Please use with caution.

Note

MLPerf submissions require clearing the system cache prior to any training run. By default, this callback does not clear the cache, as that is a system-specific operation. To enable cache clearing, and thus pass the MLPerf compliance checker, provide a cache_clear_cmd that will be executed with os.system.

Parameters
  • root_folder (str) – The root submission folder.

  • index (int) – The repetition index of this run. The filename created will be result_[index].txt.

  • benchmark (str, optional) – Benchmark name. Currently only resnet is supported.

  • target (float, optional) – The target metric before the mllogger marks the stop of the timing run. Default: 0.759 (resnet benchmark).

  • division (str, optional) – Division of submission. Currently only the open division is supported.

  • metric_name (str, optional) – Name of the metric to compare against the target. Default: Accuracy.

  • metric_label (str, optional) – Label name. The metric will be accessed via state.current_metrics[metric_label][metric_name].

  • submitter (str, optional) – Submitting organization. Default: MosaicML.

  • system_name (str, optional) – Name of the system (e.g. 8xA100_composer). If not provided, the system name will default to [world_size]x[device_name]_composer, e.g. 8xNVIDIA_A100_80GB_composer.

  • status (str, optional) – Submission status. One of onprem, cloud, or preview. Default: "onprem".

  • cache_clear_cmd (str, optional) – Command to invoke to clear the cache. This callback will call os.system(cache_clear_cmd). Default: None (disabled).
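
The default system name described above can be sketched as follows. The world_size and device_name inputs are illustrative; the real callback derives them from the distributed environment and the detected accelerator:

```python
def default_system_name(world_size, device_name):
    # e.g. 8 NVIDIA A100 80GB GPUs -> '8xNVIDIA_A100_80GB_composer'
    return f'{world_size}x{device_name}_composer'

print(default_system_name(8, 'NVIDIA_A100_80GB'))
# 8xNVIDIA_A100_80GB_composer
```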

composer.callbacks.mlperf.get_system_description(submitter, division, status, system_name=None)[source]#

Generates a valid system description.

A best effort is made to auto-populate some of the fields, but they should be checked manually prior to submission. The system name is auto-generated as "[world_size]x[device_name]_composer", e.g. "8xNVIDIA_A100_80GB_composer".

Parameters
  • submitter (str) – Name of the submitting organization.

  • division (str) – Submission division (open, closed).

  • status (str) – System status (cloud, onprem, preview).

Returns

The system description as a dictionary.
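
A minimal sketch of the kind of mapping returned, built only from the documented parameters. The real function populates many additional MLPerf-mandated fields (hardware, software versions, etc.), and the helper and key names here are illustrative assumptions, not the actual schema:

```python
def sketch_system_description(submitter, division, status, system_name):
    # Illustrative subset only; the actual MLPerf system description
    # schema includes many more required fields.
    return {
        'submitter': submitter,
        'division': division,
        'status': status,
        'system_name': system_name,
    }

desc = sketch_system_description('MosaicML', 'open', 'onprem',
                                 '8xNVIDIA_A100_80GB_composer')
print(desc['division'])
# open
```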