composer.callbacks.callback_hparams#
Hyperparameters for callbacks.
Hparams
These classes are used with yahp for YAML-based configuration.
- class composer.callbacks.callback_hparams.CallbackHparams[source]#
Bases: yahp.hparams.Hparams, abc.ABC
Base class for Callback hyperparameters.
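The pattern the classes on this page follow (a hyperparameter container whose initialize_object() builds the matching callback) can be sketched as below. This is a minimal illustration that does not import yahp or composer. The names CallbackHparamsSketch, SpeedMonitorSketch, and SpeedMonitorHparamsSketch are hypothetical stand-ins, and the real classes may differ in detail.

```python
import abc
from dataclasses import dataclass


class CallbackHparamsSketch(abc.ABC):
    """Hypothetical stand-in for CallbackHparams (the real class also derives from yahp.hparams.Hparams)."""

    @abc.abstractmethod
    def initialize_object(self):
        """Build the callback described by these hyperparameters."""


class SpeedMonitorSketch:
    """Hypothetical stand-in for a callback that takes a single argument."""

    def __init__(self, window_size):
        self.window_size = window_size


@dataclass
class SpeedMonitorHparamsSketch(CallbackHparamsSketch):
    """Holds the constructor arguments as plain fields."""

    window_size: int = 100

    def initialize_object(self):
        # Pass the stored fields straight to the callback constructor.
        return SpeedMonitorSketch(window_size=self.window_size)


callback = SpeedMonitorHparamsSketch(window_size=50).initialize_object()
```

Keeping the fields separate from the callback means a configuration can be loaded and validated before any callback object is created.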
- class composer.callbacks.callback_hparams.CheckpointSaverHparams(save_folder='{run_name}/checkpoints', filename='ep{epoch}-ba{batch}-rank{rank}', artifact_name='{run_name}/checkpoints/ep{epoch}-ba{batch}-rank{rank}', latest_filename='latest-rank{rank}', overwrite=False, weights_only=False, save_interval='1ep', num_checkpoints_to_keep=-1)[source]#
Bases: composer.callbacks.callback_hparams.CallbackHparams
CheckpointSaver hyperparameters.
- Parameters
save_folder (str, optional): See CheckpointSaver.
filename (str, optional): See CheckpointSaver.
artifact_name (str, optional): See CheckpointSaver.
latest_filename (str, optional): See CheckpointSaver.
overwrite (bool, optional): See CheckpointSaver. Default: False.
weights_only (bool, optional): See CheckpointSaver. Default: False.
save_interval (str, optional): Either a time-string or a path to a function. If a time-string, checkpoints are saved at this interval. If a path to a function, it should have the format 'path.to.function:function_name'. The function should take (State, Event) and return a boolean indicating whether a checkpoint should be saved given the current state and event. The event will be either Event.BATCH_CHECKPOINT or Event.EPOCH_CHECKPOINT. Default: "1ep".
num_checkpoints_to_keep (int, optional): See CheckpointSaver. Default: -1.
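As an illustration of the function form of save_interval, the sketch below defines a predicate with the (State, Event) shape described above and shows one way a 'path.to.function:function_name' string could be resolved. It is self-contained and does not import composer: the Event enum and the SimpleNamespace state are stand-ins, save_on_even_epochs and resolve_function_path are hypothetical names, and the resolution logic is an assumption rather than the library's own code.

```python
import importlib
from enum import Enum
from types import SimpleNamespace


class Event(Enum):
    """Stand-in for the two events named in the parameter description."""

    BATCH_CHECKPOINT = "batch_checkpoint"
    EPOCH_CHECKPOINT = "epoch_checkpoint"


def save_on_even_epochs(state, event):
    """Return True only at epoch checkpoints of even-numbered epochs."""
    return event is Event.EPOCH_CHECKPOINT and state.epoch % 2 == 0


def resolve_function_path(path):
    """Split 'module.path:function_name' and import the named attribute (assumed behavior)."""
    module_name, _, function_name = path.partition(":")
    module = importlib.import_module(module_name)
    return getattr(module, function_name)


# A stand-in state object exposing only the attribute the predicate reads.
state = SimpleNamespace(epoch=2)
decision = save_on_even_epochs(state, Event.EPOCH_CHECKPOINT)

# Resolving a path to a standard-library function, to show the string format.
sqrt = resolve_function_path("math:sqrt")
```

In a real configuration the string would point at a function importable from the training environment, for example 'my_package.checkpointing:save_on_even_epochs' (a hypothetical module path).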
- class composer.callbacks.callback_hparams.EarlyStopperHparams(monitor, dataloader_label, comp=None, min_delta=0.0, patience=1)[source]#
Bases: composer.callbacks.callback_hparams.CallbackHparams
EarlyStopper hyperparameters.
- Parameters
monitor (str): The name of the metric to monitor.
dataloader_label (str): The label of the dataloader or evaluator associated with the tracked metric. If monitor is in an Evaluator, dataloader_label should be set to the Evaluator's label. If monitor is a training metric or an ordinary evaluation metric not in an Evaluator, dataloader_label should be set to "train" or "eval" respectively.
comp (str, optional): Which comparison operator to use to measure change in the monitored metric. Set comp to "less" to use torch.less(), or to "greater" to use torch.greater(). The operator is called as comp(current_value, prev_best). For example, for metrics where the optimal value is low (error, loss, perplexity), use a less-than operator.
min_delta (float, optional): If set, a new value counts as an improvement only if it beats the best value by at least this amount. Defaults to 0.
patience (int | str, optional): How long the monitored metric may go without improving before training is stopped. Defaults to 1 epoch. If patience is an integer, it is interpreted as the number of epochs.
- initialize_object()[source]#
Initialize the EarlyStopper callback.
- Returns
EarlyStopper: An instance of EarlyStopper.
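How comp, min_delta, and patience interact can be sketched in plain Python. This is an illustrative approximation, not the EarlyStopper implementation: operator.lt and operator.gt stand in for torch.less() and torch.greater(), patience is treated as a whole number of epochs, the function name first_stopping_epoch is hypothetical, and the exact boundary handling (>= for min_delta, > for patience) is an assumption.

```python
import operator

# Plain-Python stand-ins for torch.less() and torch.greater().
COMPARATORS = {"less": operator.lt, "greater": operator.gt}


def first_stopping_epoch(values, comp="less", min_delta=0.0, patience=1):
    """Return the epoch index at which training would stop, or None if it never does."""
    if not values:
        return None
    compare = COMPARATORS[comp]
    best = values[0]
    epochs_without_improvement = 0
    for epoch, value in enumerate(values[1:], start=1):
        # The operator is called as comp(current_value, prev_best).
        improved = compare(value, best) and abs(value - best) >= min_delta
        if improved:
            best = value
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
        # Stop once the metric has gone more than `patience` epochs without improving.
        if epochs_without_improvement > patience:
            return epoch
    return None


losses = [1.0, 0.9, 0.95, 0.97, 0.8]
stop_epoch = first_stopping_epoch(losses, comp="less", patience=1)
```

With these losses and patience=1 the sketch stops at epoch index 3, because epochs 2 and 3 both fail to beat the best value of 0.9. Raising patience to 3 lets the run reach the improvement at epoch 4.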
- class composer.callbacks.callback_hparams.GradMonitorHparams(log_layer_grad_norms=False)[source]#
Bases: composer.callbacks.callback_hparams.CallbackHparams
GradMonitor hyperparameters.
- Parameters
log_layer_grad_norms (bool, optional): See GradMonitor for documentation. Default: False.
- initialize_object()[source]#
Initialize the GradMonitor callback.
- Returns
GradMonitor: An instance of GradMonitor.
- class composer.callbacks.callback_hparams.LRMonitorHparams[source]#
Bases: composer.callbacks.callback_hparams.CallbackHparams
LRMonitor hyperparameters.
There are no parameters as LRMonitor does not take any parameters.
- class composer.callbacks.callback_hparams.MLPerfCallbackHparams(root_folder, index, benchmark='resnet', target=0.759, division='open', metric_name='Accuracy', metric_label='eval', submitter='MosaicML', system_name=None, status='onprem', cache_clear_cmd=None, host_processors_per_node=None)[source]#
Bases: composer.callbacks.callback_hparams.CallbackHparams
MLPerfCallback hyperparameters.
- Parameters
root_folder (str): The root submission folder.
index (int): The repetition index of this run. The filename created will be result_[index].txt.
benchmark (str, optional): Benchmark name. Currently only resnet is supported. Default: "resnet".
target (float, optional): The target value the metric must reach before the mllogger marks the stop of the timing run. Default: 0.759 (resnet benchmark).
division (str, optional): Division of submission. Currently only the open division is supported. Default: "open".
metric_name (str, optional): Name of the metric to compare against the target. Default: "Accuracy".
metric_label (str, optional): Label name. The metric will be accessed via state.current_metrics[metric_label][metric_name]. Default: "eval".
submitter (str, optional): Submitting organization. Default: "MosaicML".
system_name (str, optional): Name of the system (e.g. 8xA100_composer). If None, the system name defaults to [world_size]x[device_name]_composer, e.g. 8xNVIDIA_A100_80GB_composer. Default: None.
status (str, optional): Submission status. One of onprem, cloud, or preview. Default: "onprem".
cache_clear_cmd (str, optional): Command to invoke during the cache clear. This callback will call subprocess(cache_clear_cmd). Disabled by default (None).
host_processors_per_node (int, optional): Total number of host processors per node. Default: None.
- initialize_object()[source]#
Initialize the MLPerf Callback.
- Returns
MLPerfCallback: An instance of MLPerfCallback.
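The naming conventions and the metric lookup described in the parameters above can be illustrated with a few lines of plain Python. This is a sketch only: the helper names are hypothetical, device_name is assumed to be already formatted with underscores, the metrics dictionary is a stand-in for state.current_metrics, and the >= comparison against the target is an assumption about how the target is checked.

```python
def default_system_name(world_size, device_name):
    """Build the [world_size]x[device_name]_composer default described above."""
    return f"{world_size}x{device_name}_composer"


def result_filename(index):
    """Build the result_[index].txt filename described above."""
    return f"result_{index}.txt"


def target_reached(current_metrics, metric_label="eval", metric_name="Accuracy", target=0.759):
    """Look up the metric by label and name, then compare it to the target (assumed >=)."""
    value = current_metrics[metric_label][metric_name]
    return value >= target


# Stand-in for state.current_metrics with made-up numbers.
metrics = {"eval": {"Accuracy": 0.76}}

system_name = default_system_name(8, "NVIDIA_A100_80GB")
filename = result_filename(0)
done = target_reached(metrics)
```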
- class composer.callbacks.callback_hparams.MemoryMonitorHparams[source]#
Bases: composer.callbacks.callback_hparams.CallbackHparams
MemoryMonitor hyperparameters.
There are no parameters as MemoryMonitor does not take any parameters.
- initialize_object()[source]#
Initialize the MemoryMonitor callback.
- Returns
MemoryMonitor: An instance of MemoryMonitor.
- class composer.callbacks.callback_hparams.SpeedMonitorHparams(window_size=100)[source]#
Bases: composer.callbacks.callback_hparams.CallbackHparams
SpeedMonitor hyperparameters.
- Parameters
window_size (int, optional): See SpeedMonitor for documentation.
- initialize_object()[source]#
Initialize the SpeedMonitor callback.
- Returns
SpeedMonitor: An instance of SpeedMonitor.
- class composer.callbacks.callback_hparams.ThresholdStopperHparams(monitor, dataloader_label, threshold, comp=None, stop_on_batch=False)[source]#
Bases: composer.callbacks.callback_hparams.CallbackHparams
ThresholdStopper hyperparameters.
- Parameters
monitor (str): The name of the metric to monitor.
dataloader_label (str): The label of the dataloader or evaluator associated with the tracked metric. If monitor is in an Evaluator, dataloader_label should be set to the Evaluator's label. If monitor is a training metric or an ordinary evaluation metric not in an Evaluator, dataloader_label should be set to "train" or "eval" respectively.
threshold (float): The threshold that dictates when to halt training. Whether training stops if the metric exceeds or falls below the threshold depends on the comparison operator.
comp (str, optional): Which comparison operator to use to measure change in the monitored metric. Set comp to "less" to use torch.less(), or to "greater" to use torch.greater(). The operator is called as comp(current_value, prev_best). For example, for metrics where the optimal value is low (error, loss, perplexity), use the less-than operator.
stop_on_batch (bool, optional): Whether to stop training in the middle of an epoch if the training metrics satisfy the threshold comparison. Defaults to False.
- initialize_object()[source]#
Initialize the ThresholdStopper callback.
- Returns
ThresholdStopper: An instance of ThresholdStopper.
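The role of threshold and comp can be sketched as a single comparison. This is an illustrative approximation rather than the ThresholdStopper implementation: operator.lt and operator.gt stand in for torch.less() and torch.greater(), the function name should_stop is hypothetical, and comparing the current value directly against the threshold is an assumption based on the threshold description above.

```python
import operator

# Plain-Python stand-ins for torch.less() and torch.greater().
COMPARATORS = {"less": operator.lt, "greater": operator.gt}


def should_stop(current_value, threshold, comp):
    """Return True when the metric is on the stopping side of the threshold."""
    return COMPARATORS[comp](current_value, threshold)


# Stop once accuracy rises above 0.759.
stop_for_accuracy = should_stop(0.80, 0.759, "greater")

# Stop once loss falls below 0.2.
stop_for_loss = should_stop(0.35, 0.2, "less")
```

Here "greater" suits metrics that improve upward, such as accuracy, while "less" suits metrics that improve downward, such as loss.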