composer.optim.scheduler#

Functions

asdict

Return the fields of a dataclass instance as a new dictionary mapping field names to field values.

compile

Compiles a stateless scheduler function into a PyTorch scheduler object.

constant_scheduler

Maintains a fixed learning rate.

cosine_annealing_scheduler

Decays the learning rate according to the decreasing part of a cosine curve.

cosine_annealing_warm_restarts_scheduler

Cyclically decays the learning rate according to the decreasing part of a cosine curve.

cosine_annealing_with_warmup_scheduler

Decays the learning rate according to the decreasing part of a cosine curve, with a linear warmup.

dataclass

Returns the same class as was passed in, with dunder methods added based on the fields defined in the class.

exponential_scheduler

Decays the learning rate exponentially.

linear_scheduler

Adjusts the learning rate linearly.

linear_with_warmup_scheduler

Adjusts the learning rate linearly, with a linear warmup.

multi_step_scheduler

Decays the learning rate discretely at fixed milestones.

multi_step_with_warmup_scheduler

Decays the learning rate discretely at fixed milestones, with a linear warmup.

polynomial_scheduler

Sets the learning rate to be proportional to a power of the fraction of training time remaining.

step_scheduler

Decays the learning rate discretely at fixed intervals.

Classes

ABC

Helper class that provides a standard way to create an ABC using inheritance.

ComposerSchedulerFn

Specification for a "stateless" scheduler function.

LambdaLR

Sets the learning rate of each parameter group to the initial lr times a given function.

Protocol

Base class for protocol classes.

_LRScheduler

Base class for PyTorch learning rate schedulers.

State

The state of the trainer.

Time

Time represents static durations of training time or points in the training process in terms of a TimeUnit enum (epochs, batches, samples, tokens, or duration).

TimeUnit

Units of time for the training process.

Hparams

These classes are used with yahp for YAML-based configuration.
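
As a rough illustration, the snippet below constructs one of these hparams dataclasses directly in Python; the field names come from the signatures documented below, while how the object is ultimately wired into a trainer depends on your yahp configuration and is not shown here.

    from composer.optim.scheduler import CosineAnnealingWithWarmupLRHparams

    # Each *Hparams class is a yahp dataclass; an equivalent YAML entry would simply
    # mirror these field names under the scheduler's key.
    hparams = CosineAnnealingWithWarmupLRHparams(
        warmup_time="5ep",   # required: no default in the signature below
        t_max="1dur",        # defaults as shown in the signature below
        min_factor=0.0,
    )
    print(hparams)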

ConstantLRHparams

Hyperparameters for the constant_scheduler() scheduler.

CosineAnnealingLRHparams

Hyperparameters for the cosine_annealing_scheduler() scheduler.

CosineAnnealingWarmRestartsHparams

Hyperparameters for the cosine_annealing_warm_restarts_scheduler() scheduler.

CosineAnnealingWithWarmupLRHparams

Hyperparameters for the cosine_annealing_with_warmup_scheduler() scheduler.

ExponentialLRHparams

Hyperparameters for the exponential_scheduler() scheduler.

LinearLRHparams

Hyperparameters for the linear_scheduler() scheduler.

LinearWithWarmupLRHparams

Hyperparameters for the linear_with_warmup_scheduler() scheduler.

MultiStepLRHparams

Hyperparameters for the multi_step_scheduler() scheduler.

MultiStepWithWarmupLRHparams

Hyperparameters for the multi_step_with_warmup_scheduler() scheduler.

PolynomialLRHparams

Hyperparameters for the polynomial_scheduler() scheduler.

SchedulerHparams

Abstract base class for scheduler hyperparameters.

StepLRHparams

Hyperparameters for the step_scheduler() scheduler.

Attributes

  • ComposerScheduler

  • List

  • TYPE_CHECKING

  • Union

  • log

class composer.optim.scheduler.ComposerSchedulerFn(*args, **kwargs)[source]#

Bases: Protocol

Specification for a "stateless" scheduler function.

A scheduler function should be a pure function that returns a multiplier to apply to the optimizer's provided learning rate, given the current trainer state, and optionally a "scale schedule ratio" (SSR). A typical implementation will read state.timer, and possibly other fields like state.max_duration, to determine the trainer's latest temporal progress.
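
For illustration only, here is a minimal sketch of such a stateless function. The drop_factor hyperparameter, the batch/max_batches attributes, and the SimpleNamespace stand-in state are all assumptions made for this sketch; the real State/Timer API is richer than this.

    from types import SimpleNamespace

    def halfway_drop_scheduler(state, *, ssr: float = 1.0, drop_factor: float = 0.1) -> float:
        # A pure, stateless scheduler: returns an LR multiplier computed from the trainer state.
        frac = state.batch / state.max_batches      # assumed attributes, for illustration only
        frac = min(frac / ssr, 1.0)                 # honor the scale schedule ratio property
        return 1.0 if frac < 0.5 else drop_factor   # full LR for the first half, then drop

    # Toy invocation with a stand-in state object (not a real composer.core.State):
    fake_state = SimpleNamespace(batch=300, max_batches=1000)
    print(halfway_drop_scheduler(fake_state))       # -> 1.0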

class composer.optim.scheduler.ConstantLRHparams(factor=1.0, total_time='1dur')[source]#

Bases: composer.optim.scheduler.SchedulerHparams

Hyperparameters for the constant_scheduler() scheduler.

scheduler_function(*, ssr=1.0, factor=1.0, total_time='1dur')#

Maintains a fixed learning rate.

Parameters
  • state (State) – The current Composer Trainer state.

  • ssr (float) – The scale schedule ratio. In general, the learning rate computed by this scheduler at time \(t\) with an SSR of 1.0 should be the same as that computed by this scheduler at time \(t \times s\) with an SSR of \(s\). Default = 1.0.

  • factor (float) – Factor. Default = 1.0.

  • total_time (str or Time) – Total time. Default = '1dur'.

class composer.optim.scheduler.CosineAnnealingLRHparams(t_max='1dur', min_factor=0.0)[source]#

Bases: composer.optim.scheduler.SchedulerHparams

Hyperparameters for the cosine_annealing_scheduler() scheduler.

scheduler_function(*, ssr=1.0, t_max='1dur', min_factor=0.0)#

Decays the learning rate according to the decreasing part of a cosine curve.

Parameters
  • state (State) – The current Composer Trainer state.

  • ssr (float) – The scale schedule ratio. In general, the learning rate computed by this scheduler at time \(t\) with an SSR of 1.0 should be the same as that computed by this scheduler at time \(t \times s\) with an SSR of \(s\). Default = 1.0.

  • t_max (str or Time) – Total time. Default = '1dur'.

  • min_factor (float) – Minimum factor. Default = 0.0.

class composer.optim.scheduler.CosineAnnealingWarmRestartsHparams(t_0='1dur', min_factor=0.0, t_mult=1.0)[source]#

Bases: composer.optim.scheduler.SchedulerHparams

Hyperparameters for the cosine_annealing_warm_restarts_scheduler() scheduler.

scheduler_function(*, ssr=1.0, t_0, t_mult=1.0, min_factor=0.0)#

Cyclically decays the learning rate according to the decreasing part of a cosine curve.

Parameters
  • state (State) – The current Composer Trainer state.

  • ssr (float) – The scale schedule ratio. In general, the learning rate computed by this scheduler at time \(t\) with an SSR of 1.0 should be the same as that computed by this scheduler at time \(t \times s\) with an SSR of \(s\). Default = 1.0.

  • t_0 (str or Time) – The first cycle's duration.

  • t_mult (float) – The multiplier for subsequent cycles' durations. Default = 1.0.

  • min_factor (float) – Minimum factor. Default = 0.0.

class composer.optim.scheduler.CosineAnnealingWithWarmupLRHparams(warmup_time, t_max='1dur', min_factor=0.0)[source]#

Bases: composer.optim.scheduler.SchedulerHparams

Hyperparameters for the cosine_annealing_with_warmup_scheduler() scheduler.

scheduler_function(*, ssr=1.0, warmup_time, t_max='1dur', min_factor=0.0)#

Decays the learning rate according to the decreasing part of a cosine curve, with a linear warmup.

Parameters
  • state (State) – The current Composer Trainer state.

  • ssr (float) – The scale schedule ratio. In general, the learning rate computed by this scheduler at time \(t\) with an SSR of 1.0 should be the same as that computed by this scheduler at time \(t \times s\) with an SSR of \(s\). Default = 1.0.

  • warmup_time (str or Time) – Warmup time.

  • t_max (str or Time) – Total time. Default = '1dur'.

  • min_factor (float) – Minimum factor. Default = 0.0.

class composer.optim.scheduler.ExponentialLRHparams(gamma)[source]#

Bases: composer.optim.scheduler.SchedulerHparams

Hyperparameters for the exponential_scheduler() scheduler.

scheduler_function(*, ssr=1.0, gamma)#

Decays the learning rate exponentially.

Parameters
  • state (State) – The current Composer Trainer state.

  • ssr (float) – The scale schedule ratio. In general, the learning rate computed by this scheduler at time \(t\) with an SSR of 1.0 should be the same as that computed by this scheduler at time \(t \times s\) with an SSR of \(s\). Default = 1.0.

  • gamma (float) – Gamma.

class composer.optim.scheduler.LinearLRHparams(start_factor=1.0, end_factor=0.0, total_time='1dur')[source]#

Bases: composer.optim.scheduler.SchedulerHparams

Hyperparameters for the linear_scheduler() scheduler.

scheduler_function(*, ssr=1.0, start_factor=1.0, end_factor=0.0, total_time='1dur')#

Adjusts the learning rate linearly.

Parameters
  • state (State) – The current Composer Trainer state.

  • ssr (float) – The scale schedule ratio. In general, the learning rate computed by this scheduler at time \(t\) with an SSR of 1.0 should be the same as that computed by this scheduler at time \(t \times s\) with an SSR of \(s\). Default = 1.0.

  • start_factor (float) – Start factor. Default = 1.0.

  • end_factor (float) – End factor. Default = 0.0.

  • total_time (str or Time) – Total time. Default = '1dur'.

class composer.optim.scheduler.LinearWithWarmupLRHparams(warmup_time, start_factor=1.0, end_factor=0.0, total_time='1dur')[source]#

Bases: composer.optim.scheduler.SchedulerHparams

Hyperparameters for the linear_with_warmup_scheduler() scheduler.

scheduler_function(*, ssr=1.0, warmup_time, start_factor=1.0, end_factor=0.0, total_time='1dur')#

Adjusts the learning rate linearly, with a linear warmup.

Parameters
  • state (State) – The current Composer Trainer state.

  • ssr (float) – The scale schedule ratio. In general, the learning rate computed by this scheduler at time \(t\) with an SSR of 1.0 should be the same as that computed by this scheduler at time \(t \times s\) with an SSR of \(s\). Default = 1.0.

  • warmup_time (str or Time) – Warmup time.

  • start_factor (float) – Start factor. Default = 1.0.

  • end_factor (float) – End factor. Default = 0.0.

  • total_time (str or Time) – Total time. Default = '1dur'.

class composer.optim.scheduler.MultiStepLRHparams(milestones, gamma=0.1)[source]#

Bases: composer.optim.scheduler.SchedulerHparams

Hyperparameters for the multi_step_scheduler() scheduler.

scheduler_function(*, ssr=1.0, milestones, gamma=0.1)#

Decays the learning rate discretely at fixed milestones.

Parameters
  • state (State) – The current Composer Trainer state.

  • ssr (float) – The scale schedule ratio. In general, the learning rate computed by this scheduler at time \(t\) with an SSR of 1.0 should be the same as that computed by this scheduler at time \(t \times s\) with an SSR of \(s\). Default = 1.0.

  • milestones (list of str or Time) – Milestones.

  • gamma (float) – Gamma. Default = 0.1.

class composer.optim.scheduler.MultiStepWithWarmupLRHparams(warmup_time, milestones, gamma=0.1)[source]#

Bases: composer.optim.scheduler.SchedulerHparams

Hyperparameters for the multi_step_with_warmup_scheduler() scheduler.

scheduler_function(*, ssr=1.0, warmup_time, milestones, gamma=0.1)#

Decays the learning rate discretely at fixed milestones, with a linear warmup.

Parameters
  • state (State) – The current Composer Trainer state.

  • ssr (float) – The scale schedule ratio. In general, the learning rate computed by this scheduler at time \(t\) with an SSR of 1.0 should be the same as that computed by this scheduler at time \(t \times s\) with an SSR of \(s\). Default = 1.0.

  • warmup_time (str or Time) – Warmup time.

  • milestones (list of str or Time) – Milestones.

  • gamma (float) – Gamma. Default = 0.1.

class composer.optim.scheduler.PolynomialLRHparams(power, t_max='1dur', min_factor=0.0)[source]#

Bases: composer.optim.scheduler.SchedulerHparams

Hyperparameters for the polynomial_scheduler() scheduler.

scheduler_function(*, ssr=1.0, t_max='1dur', power, min_factor=0.0)#

Sets the learning rate to be proportional to a power of the fraction of training time remaining.

Parameters
  • state (State) – The current Composer Trainer state.

  • ssr (float) – The scale schedule ratio. In general, the learning rate computed by this scheduler at time \(t\) with an SSR of 1.0 should be the same as that computed by this scheduler at time \(t \times s\) with an SSR of \(s\). Default = 1.0.

  • t_max (str or Time) – Total time. Default = '1dur'.

  • power (float) – Power.

  • min_factor (float) – Minimum factor. Default = 0.0.

class composer.optim.scheduler.SchedulerHparams[source]#

Bases: yahp.hparams.Hparams, abc.ABC

Abstract base class for scheduler hyperparameters.

class composer.optim.scheduler.StepLRHparams(step_size, gamma=0.1)[source]#

Bases: composer.optim.scheduler.SchedulerHparams

Hyperparameters for the step_scheduler() scheduler.

scheduler_function(*, ssr=1.0, step_size, gamma=0.1)#

Decays the learning rate discretely at fixed intervals.

Parameters
  • state (State) – The current Composer Trainer state.

  • ssr (float) – The scale schedule ratio. In general, the learning rate computed by this scheduler at time \(t\) with an SSR of 1.0 should be the same as that computed by this scheduler at time \(t \times s\) with an SSR of \(s\). Default = 1.0.

  • step_size (str or Time) – Step size.

  • gamma (float) – Gamma. Default = 0.1.

composer.optim.scheduler.constant_scheduler(state, *, ssr=1.0, factor=1.0, total_time='1dur')[source]#

Maintains a fixed learning rate.

Parameters
  • state (State) – The current Composer Trainer state.

  • ssr (float) – The scale schedule ratio. In general, the learning rate computed by this scheduler at time \(t\) with an SSR of 1.0 should be the same as that computed by this scheduler at time \(t \times s\) with an SSR of \(s\). Default = 1.0.

  • factor (float) – Factor. Default = 1.0.

  • total_time (str or Time) – Total time. Default = '1dur'.

composer.optim.scheduler.cosine_annealing_scheduler(state, *, ssr=1.0, t_max='1dur', min_factor=0.0)[source]#

Decays the learning rate according to the decreasing part of a cosine curve.

Parameters
  • state (State) – The current Composer Trainer state.

  • ssr (float) – The scale schedule ratio. In general, the learning rate computed by this scheduler at time \(t\) with an SSR of 1.0 should be the same as that computed by this scheduler at time \(t \times s\) with an SSR of \(s\). Default = 1.0.

  • t_max (str or Time) – Total time. Default = '1dur'.

  • min_factor (float) – Minimum factor. Default = 0.0.
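
For orientation, a sketch of the usual cosine-decay multiplier such a scheduler computes, written over a pre-computed fraction of t_max already elapsed (the frac argument and its ssr handling are assumptions; this is not necessarily the exact implementation):

    import math

    def cosine_multiplier(frac: float, min_factor: float = 0.0) -> float:
        # Decreasing half of a cosine: 1.0 at frac=0, min_factor at frac=1.
        frac = min(max(frac, 0.0), 1.0)
        return min_factor + (1.0 - min_factor) * 0.5 * (1.0 + math.cos(math.pi * frac))

    print(cosine_multiplier(0.0))   # 1.0
    print(cosine_multiplier(0.5))   # 0.5 (with min_factor=0.0)
    print(cosine_multiplier(1.0))   # 0.0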

composer.optim.scheduler.cosine_annealing_warm_restarts_scheduler(state, *, ssr=1.0, t_0, t_mult=1.0, min_factor=0.0)[source]#

Cyclically decays the learning rate according to the decreasing part of a cosine curve.

Parameters
  • state (State) – The current Composer Trainer state.

  • ssr (float) – The scale schedule ratio. In general, the learning rate computed by this scheduler at time \(t\) with an SSR of 1.0 should be the same as that computed by this scheduler at time \(t \times s\) with an SSR of \(s\). Default = 1.0.

  • t_0 (str or Time) – The first cycle's duration.

  • t_mult (float) – The multiplier for subsequent cycles' durations. Default = 1.0.

  • min_factor (float) – Minimum factor. Default = 0.0.
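
A hedged sketch of the cyclic behavior: the cosine decay restarts at the start of each cycle, and with t_mult > 1 every cycle is longer than the last. Elapsed time t, t_0, and cycle lengths are treated here as plain numbers rather than Time values, which is a simplification of the real machinery:

    import math

    def warm_restarts_multiplier(t: float, t_0: float, t_mult: float = 1.0, min_factor: float = 0.0) -> float:
        # Walk through cycles of length t_0, t_0*t_mult, t_0*t_mult**2, ... until t falls inside one.
        cycle_len = t_0
        while t >= cycle_len:
            t -= cycle_len
            cycle_len *= t_mult
        frac = t / cycle_len
        return min_factor + (1.0 - min_factor) * 0.5 * (1.0 + math.cos(math.pi * frac))

    print(warm_restarts_multiplier(0.0, t_0=10))    # 1.0 (start of the first cycle)
    print(warm_restarts_multiplier(10.0, t_0=10))   # 1.0 (restart at the second cycle)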

composer.optim.scheduler.cosine_annealing_with_warmup_scheduler(state, *, ssr=1.0, warmup_time, t_max='1dur', min_factor=0.0)[source]#

Decays the learning rate according to the decreasing part of a cosine curve, with a linear warmup.

Parameters
  • state (State) – The current Composer Trainer state.

  • ssr (float) – The scale schedule ratio. In general, the learning rate computed by this scheduler at time \(t\) with an SSR of 1.0 should be the same as that computed by this scheduler at time \(t \times s\) with an SSR of \(s\). Default = 1.0.

  • warmup_time (str or Time) – Warmup time.

  • t_max (str or Time) – Total time. Default = '1dur'.

  • min_factor (float) – Minimum factor. Default = 0.0.
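
The warmup variants prepend a linear ramp: roughly, the multiplier rises from 0 to 1 over warmup_time and only then follows the underlying decay. In the sketch below, times are plain numbers and the choice to spread the decay over the remaining (post-warmup) time is an assumption, not a statement about the exact implementation:

    import math

    def cosine_with_warmup_multiplier(t: float, warmup_time: float, t_max: float, min_factor: float = 0.0) -> float:
        if t < warmup_time:
            return t / warmup_time                                   # linear ramp from 0 to 1
        frac = (t - warmup_time) / max(t_max - warmup_time, 1e-12)   # decay over the remaining time (assumption)
        frac = min(frac, 1.0)
        return min_factor + (1.0 - min_factor) * 0.5 * (1.0 + math.cos(math.pi * frac))

    print(cosine_with_warmup_multiplier(2, warmup_time=4, t_max=20))   # 0.5, still warming up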

composer.optim.scheduler.exponential_scheduler(state, *, ssr=1.0, gamma)[source]#

Decays the learning rate exponentially.

Parameters
  • state (State) – The current Composer Trainer state.

  • ssr (float) – The scale schedule ratio. In general, the learning rate computed by this scheduler at time \(t\) with an SSR of 1.0 should be the same as that computed by this scheduler at time \(t \times s\) with an SSR of \(s\). Default = 1.0.

  • gamma (float) – Gamma.
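
A minimal sketch of exponential decay in terms of elapsed training time; the time unit for t and the exact handling of ssr are assumptions, since the real scheduler works through State and Time:

    def exponential_multiplier(t: float, gamma: float) -> float:
        # Multiplier shrinks by a factor of gamma for every unit of elapsed time.
        return gamma ** t

    print(exponential_multiplier(0, gamma=0.5))   # 1.0
    print(exponential_multiplier(3, gamma=0.5))   # 0.125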

composer.optim.scheduler.linear_scheduler(state, *, ssr=1.0, start_factor=1.0, end_factor=0.0, total_time='1dur')[source]#

Adjusts the learning rate linearly.

Parameters
  • state (State) – The current Composer Trainer state.

  • ssr (float) – The scale schedule ratio. In general, the learning rate computed by this scheduler at time \(t\) with an SSR of 1.0 should be the same as that computed by this scheduler at time \(t \times s\) with an SSR of \(s\). Default = 1.0.

  • start_factor (float) – Start factor. Default = 1.0.

  • end_factor (float) – End factor. Default = 0.0.

  • total_time (str or Time) – Total time. Default = '1dur'.
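
A sketch of the linear interpolation from start_factor to end_factor over total_time, written over a pre-computed elapsed fraction (the frac argument is an assumption standing in for elapsed time divided by total_time, already adjusted for ssr):

    def linear_multiplier(frac: float, start_factor: float = 1.0, end_factor: float = 0.0) -> float:
        frac = min(max(frac, 0.0), 1.0)
        # Straight line from start_factor at frac=0 to end_factor at frac=1.
        return start_factor + (end_factor - start_factor) * frac

    print(linear_multiplier(0.25))   # 0.75 with the defaults above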

composer.optim.scheduler.linear_with_warmup_scheduler(state, *, ssr=1.0, warmup_time, start_factor=1.0, end_factor=0.0, total_time='1dur')[source]#

Adjusts the learning rate linearly, with a linear warmup.

Parameters
  • state (State) – The current Composer Trainer state.

  • ssr (float) – The scale schedule ratio. In general, the learning rate computed by this scheduler at time \(t\) with an SSR of 1.0 should be the same as that computed by this scheduler at time \(t \times s\) with an SSR of \(s\). Default = 1.0.

  • warmup_time (str or Time) – Warmup time.

  • start_factor (float) – Start factor. Default = 1.0.

  • end_factor (float) – End factor. Default = 0.0.

  • total_time (str or Time) – Total time. Default = '1dur'.

composer.optim.scheduler.multi_step_scheduler(state, *, ssr=1.0, milestones, gamma=0.1)[source]#

Decays the learning rate discretely at fixed milestones.

Parameters
  • state (State) – The current Composer Trainer state.

  • ssr (float) – The scale schedule ratio. In general, the learning rate computed by this scheduler at time \(t\) with an SSR of 1.0 should be the same as that computed by this scheduler at time \(t \times s\) with an SSR of \(s\). Default = 1.0.

  • milestones (list of str or Time) – Milestones.

  • gamma (float) – Gamma. Default = 0.1.
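
A sketch of milestone-based decay: the multiplier is gamma raised to the number of milestones already passed. Milestones and the elapsed time t are shown as plain numbers here rather than Time strings, purely for illustration:

    def multi_step_multiplier(t: float, milestones, gamma: float = 0.1) -> float:
        passed = sum(1 for m in milestones if t >= m)   # how many milestones are behind us
        return gamma ** passed

    print(multi_step_multiplier(5,  milestones=[10, 20]))   # 1.0
    print(multi_step_multiplier(15, milestones=[10, 20]))   # 0.1
    print(multi_step_multiplier(25, milestones=[10, 20]))   # 0.01 (up to float rounding)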

composer.optim.scheduler.multi_step_with_warmup_scheduler(state, *, ssr=1.0, warmup_time, milestones, gamma=0.1)[source]#

Decays the learning rate discretely at fixed milestones, with a linear warmup.

Parameters
  • state (State) – The current Composer Trainer state.

  • ssr (float) – The scale schedule ratio. In general, the learning rate computed by this scheduler at time \(t\) with an SSR of 1.0 should be the same as that computed by this scheduler at time \(t \times s\) with an SSR of \(s\). Default = 1.0.

  • warmup_time (str or Time) – Warmup time.

  • milestones (list of str or Time) – Milestones.

  • gamma (float) – Gamma. Default = 0.1.

composer.optim.scheduler.polynomial_scheduler(state, *, ssr=1.0, t_max='1dur', power, min_factor=0.0)[source]#

Sets the learning rate to be proportional to a power of the fraction of training time remaining.

Parameters
  • state (State) – The current Composer Trainer state.

  • ssr (float) – The scale schedule ratio. In general, the learning rate computed by this scheduler at time \(t\) with an SSR of 1.0 should be the same as that computed by this scheduler at time \(t \times s\) with an SSR of \(s\). Default = 1.0.

  • t_max (str or Time) – Total time. Default = '1dur'.

  • power (float) – Power.

  • min_factor (float) – Minimum factor. Default = 0.0.
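
A sketch of the polynomial decay: the multiplier is proportional to a power of the fraction of training time remaining. The frac_remaining argument is an assumption standing in for 1 minus the elapsed fraction of t_max:

    def polynomial_multiplier(frac_remaining: float, power: float, min_factor: float = 0.0) -> float:
        frac_remaining = min(max(frac_remaining, 0.0), 1.0)
        return min_factor + (1.0 - min_factor) * (frac_remaining ** power)

    print(polynomial_multiplier(1.0, power=2.0))   # 1.0 at the start of training
    print(polynomial_multiplier(0.5, power=2.0))   # 0.25 halfway through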

composer.optim.scheduler.step_scheduler(state, *, ssr=1.0, step_size, gamma=0.1)[source]#

Decays the learning rate discretely at fixed intervals.

Parameters
  • state (State) – The current Composer Trainer state.

  • ssr (float) – The scale schedule ratio. In general, the learning rate computed by this scheduler at time \(t\) with an SSR of 1.0 should be the same as that computed by this scheduler at time \(t \times s\) with an SSR of \(s\). Default = 1.0.

  • step_size (str or Time) – Step size.

  • gamma (float) – Gamma. Default = 0.1.
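
A sketch of fixed-interval decay: the multiplier drops by a factor of gamma after every step_size units of elapsed time. step_size and t are shown as plain numbers here; the real scheduler accepts a Time value or string:

    def step_multiplier(t: float, step_size: float, gamma: float = 0.1) -> float:
        return gamma ** int(t // step_size)

    print(step_multiplier(4,  step_size=5))   # 1.0
    print(step_multiplier(12, step_size=5))   # gamma**2 = 0.01 (up to float rounding)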