composer.callbacks.callback_hparams#

Hyperparameters for callbacks.

Hparams

These classes are used with yahp for YAML-based configuration.
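For example, a monitor callback can be configured from a YAML file and then instantiated. A minimal sketch, assuming yahp's Hparams.create classmethod for YAML loading; the file name and field value are illustrative:

    # hparams.yaml (illustrative):
    #   window_size: 50
    from composer.callbacks.callback_hparams import SpeedMonitorHparams

    # Load hyperparameters from YAML; create() is assumed to be yahp's loader.
    hparams = SpeedMonitorHparams.create(f='hparams.yaml', cli_args=False)

    # Build the actual callback instance.
    speed_monitor = hparams.initialize_object()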

CallbackHparams

Base class for Callback hyperparameters.

CheckpointSaverHparams

CheckpointSaver hyperparameters.

EarlyStopperHparams

EarlyStopper hyperparameters.

GradMonitorHparams

GradMonitor hyperparameters.

LRMonitorHparams

LRMonitor hyperparameters.

MLPerfCallbackHparams

MLPerfCallback hyperparameters.

MemoryMonitorHparams

MemoryMonitor hyperparameters.

SpeedMonitorHparams

SpeedMonitor hyperparameters.

ThresholdStopperHparams

ThresholdStopper hyperparameters.

class composer.callbacks.callback_hparams.CallbackHparams[source]#

Bases: yahp.hparams.Hparams, abc.ABC

Base class for Callback hyperparameters.

abstract initialize_object()[source]#

Initialize the callback.

Returns

Callback – An instance of the callback.
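A concrete subclass declares its fields and implements initialize_object(). A minimal sketch, assuming yahp's dataclass field helpers and composer.core.Callback as the callback base class; MyCallback and its verbose field are hypothetical:

    import dataclasses

    import yahp as hp
    from composer.core import Callback
    from composer.callbacks.callback_hparams import CallbackHparams

    class MyCallback(Callback):
        # Hypothetical callback with a single knob.
        def __init__(self, verbose: bool = False):
            self.verbose = verbose

    @dataclasses.dataclass
    class MyCallbackHparams(CallbackHparams):
        verbose: bool = hp.optional('Whether to log verbosely.', default=False)

        def initialize_object(self) -> MyCallback:
            # Construct the callback from the stored hyperparameters.
            return MyCallback(verbose=self.verbose)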

class composer.callbacks.callback_hparams.CheckpointSaverHparams(save_folder='{run_name}/checkpoints', filename='ep{epoch}-ba{batch}-rank{rank}', artifact_name='{run_name}/checkpoints/ep{epoch}-ba{batch}-rank{rank}', latest_filename='latest-rank{rank}', overwrite=False, weights_only=False, save_interval='1ep', num_checkpoints_to_keep=-1)[source]#

Bases: composer.callbacks.callback_hparams.CallbackHparams

CheckpointSaver hyperparameters.

Parameters

See CheckpointSaver for documentation of these parameters.
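A sketch of direct construction (field values are illustrative; initialize_object() is assumed to return a CheckpointSaver, matching the pattern of the other classes on this page):

    from composer.callbacks.callback_hparams import CheckpointSaverHparams

    # Save a checkpoint every two epochs, keeping only the five most recent.
    hparams = CheckpointSaverHparams(
        save_interval='2ep',
        num_checkpoints_to_keep=5,
    )
    checkpoint_saver = hparams.initialize_object()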
class composer.callbacks.callback_hparams.EarlyStopperHparams(monitor, dataloader_label, comp=None, min_delta=0.0, patience=1)[source]#

Bases: composer.callbacks.callback_hparams.CallbackHparams

EarlyStopper hyperparameters.

Parameters
  • monitor (str) – The name of the metric to monitor.

  • dataloader_label (str) – The label of the dataloader or evaluator associated with the tracked metric. If monitor is in an Evaluator, the dataloader_label field should be set to the Evaluator's label. If monitor is a training metric or an ordinary evaluation metric not in an Evaluator, dataloader_label should be set to 'train' or 'eval', respectively.

  • comp (str, optional) – A string dictating which comparison operator to use to measure change in the monitored metric. Set comp to "less" to use torch.less(), or "greater" to use torch.greater(). The comparison operator is called as comp(current_value, prev_best). For example, for metrics where the optimal value is low (error, loss, perplexity), use a less-than operator.

  • min_delta (float, optional) – The minimum amount by which a new value must improve on the previous best to count as an improvement. Defaults to 0.0.

  • patience (int | str, optional) – How long the monitored metric may fail to improve before training is stopped. Defaults to 1 epoch. If patience is an integer, it is interpreted as a number of epochs.

initialize_object()[source]#

Initialize the EarlyStopper callback.

Returns

EarlyStopper – An instance of EarlyStopper.
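For example, to stop training once validation accuracy has failed to improve by at least 0.01 for three epochs (a sketch; the metric name and values are illustrative, and passing the result to the trainer's callbacks list is assumed):

    from composer.callbacks.callback_hparams import EarlyStopperHparams

    # Accuracy improves upward, so use the 'greater' comparison.
    hparams = EarlyStopperHparams(
        monitor='Accuracy',
        dataloader_label='eval',
        comp='greater',
        min_delta=0.01,
        patience=3,
    )
    early_stopper = hparams.initialize_object()
    # e.g. Trainer(..., callbacks=[early_stopper])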

class composer.callbacks.callback_hparams.GradMonitorHparams(log_layer_grad_norms=False)[source]#

Bases: composer.callbacks.callback_hparams.CallbackHparams

GradMonitor hyperparameters.

Parameters

log_layer_grad_norms (bool, optional) – See GradMonitor for documentation. Default: False.

initialize_object()[source]#

Initialize the GradMonitor callback.

Returns

GradMonitor – An instance of GradMonitor.
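A one-line sketch with per-layer gradient-norm logging enabled:

    from composer.callbacks.callback_hparams import GradMonitorHparams

    grad_monitor = GradMonitorHparams(log_layer_grad_norms=True).initialize_object()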

class composer.callbacks.callback_hparams.LRMonitorHparams[source]#

Bases: composer.callbacks.callback_hparams.CallbackHparams

LRMonitor hyperparameters.

There are no hyperparameters, as LRMonitor takes no parameters.

initialize_object()[source]#

Initialize the LRMonitor callback.

Returns

LRMonitor – An instance of LRMonitor.

class composer.callbacks.callback_hparams.MLPerfCallbackHparams(root_folder, index, benchmark='resnet', target=0.759, division='open', metric_name='Accuracy', metric_label='eval', submitter='MosaicML', system_name=None, status='onprem', cache_clear_cmd=None, host_processors_per_node=None)[source]#

Bases: composer.callbacks.callback_hparams.CallbackHparams

MLPerfCallback hyperparameters.

Parameters
  • root_folder (str) – The root submission folder.

  • index (int) – The repetition index of this run. The filename created will be result_[index].txt.

  • benchmark (str, optional) – Benchmark name. Currently only resnet is supported. Default: "resnet".

  • target (float, optional) – The target metric value at which the mllogger marks the end of the timing run. Default: 0.759 (resnet benchmark).

  • division (str, optional) – Division of the submission. Currently only the open division is supported. Default: "open".

  • metric_name (str, optional) – Name of the metric to compare against the target. Default: "Accuracy".

  • metric_label (str, optional) – The label under which the metric is reported. The metric will be accessed via state.current_metrics[metric_label][metric_name]. Default: "eval".

  • submitter (str, optional) – Submitting organization. Default: "MosaicML".

  • system_name (str, optional) – Name of the system (e.g. 8xA100_composer). If None, the system name defaults to [world_size]x[device_name]_composer, e.g. 8xNVIDIA_A100_80GB_composer. Default: None.

  • status (str, optional) – Submission status. One of "onprem", "cloud", or "preview". Default: "onprem".

  • cache_clear_cmd (str, optional) – Command to invoke during the cache clear. This callback will call subprocess(cache_clear_cmd). Default: None (disabled).

  • host_processors_per_node (int, optional) – Total number of host processors per node. Default: None.

initialize_object()[source]#

Initialize the MLPerf Callback.

Returns

MLPerfCallback – An instance of MLPerfCallback.
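A sketch with the two required fields; the folder path is illustrative:

    from composer.callbacks.callback_hparams import MLPerfCallbackHparams

    # First repetition of a resnet run; results are written to result_0.txt.
    hparams = MLPerfCallbackHparams(root_folder='./mlperf_submission', index=0)
    mlperf_callback = hparams.initialize_object()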

class composer.callbacks.callback_hparams.MemoryMonitorHparams[source]#

Bases: composer.callbacks.callback_hparams.CallbackHparams

MemoryMonitor hyperparameters.

There are no hyperparameters, as MemoryMonitor takes no parameters.

initialize_object()[source]#

Initialize the MemoryMonitor callback.

Returns

MemoryMonitor – An instance of MemoryMonitor.

class composer.callbacks.callback_hparams.SpeedMonitorHparams(window_size=100)[source]#

Bases: composer.callbacks.callback_hparams.CallbackHparams

SpeedMonitor hyperparameters.

Parameters

window_size (int, optional) – See SpeedMonitor for documentation. Default: 100.

initialize_object()[source]#

Initialize the SpeedMonitor callback.

Returns

SpeedMonitor – An instance of SpeedMonitor.
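The monitor hparams compose naturally into a single callbacks list. A sketch covering SpeedMonitor together with the parameterless LRMonitorHparams and MemoryMonitorHparams above; passing the list to the trainer is assumed:

    from composer.callbacks.callback_hparams import (
        LRMonitorHparams,
        MemoryMonitorHparams,
        SpeedMonitorHparams,
    )

    monitor_hparams = [
        LRMonitorHparams(),
        MemoryMonitorHparams(),
        SpeedMonitorHparams(window_size=50),
    ]
    callbacks = [h.initialize_object() for h in monitor_hparams]
    # e.g. Trainer(..., callbacks=callbacks)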

class composer.callbacks.callback_hparams.ThresholdStopperHparams(monitor, dataloader_label, threshold, comp=None, stop_on_batch=False)[source]#

Bases: composer.callbacks.callback_hparams.CallbackHparams

ThresholdStopper hyperparameters.

Parameters
  • monitor (str) – The name of the metric to monitor.

  • dataloader_label (str) – The label of the dataloader or evaluator associated with the tracked metric. If monitor is in an Evaluator, the dataloader_label field should be set to the Evaluator's label. If monitor is a training metric or an ordinary evaluation metric not in an Evaluator, dataloader_label should be set to 'train' or 'eval', respectively.

  • threshold (float) – The threshold that dictates when to halt training. Whether training stops when the metric exceeds or falls below the threshold depends on the comparison operator.

  • comp (str, optional) – A string dictating which comparison operator to use to measure change in the monitored metric. Set comp to "less" to use torch.less(), or "greater" to use torch.greater(). The comparison operator is called as comp(current_value, prev_best). For example, for metrics where the optimal value is low (error, loss, perplexity), use a less-than operator.

  • stop_on_batch (bool, optional) – Whether to stop training in the middle of an epoch if the training metrics satisfy the threshold comparison. Defaults to False.

initialize_object()[source]#

Initialize the ThresholdStopper callback.

Returns

ThresholdStopper – An instance of ThresholdStopper.
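For example, to halt training as soon as the training loss falls below a fixed value, even mid-epoch (a sketch; the metric name and threshold are illustrative):

    from composer.callbacks.callback_hparams import ThresholdStopperHparams

    # Stop once the training loss drops below 0.5; with comp='less', the
    # comparison succeeds when the metric falls below the threshold.
    hparams = ThresholdStopperHparams(
        monitor='CrossEntropy',
        dataloader_label='train',
        threshold=0.5,
        comp='less',
        stop_on_batch=True,
    )
    threshold_stopper = hparams.initialize_object()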