composer.optim.scheduler#

Functions

asdict

Return the fields of a dataclass instance as a new dictionary mapping field names to field values.

compile

Compiles a stateless scheduler function into a PyTorch scheduler object.

constant_scheduler

Maintains a fixed learning rate.

cosine_annealing_scheduler

Decays the learning rate according to the decreasing part of a cosine curve.

cosine_annealing_warm_restarts_scheduler

Cyclically decays the learning rate according to the decreasing part of a cosine curve.

cosine_annealing_with_warmup_scheduler

Decays the learning rate according to the decreasing part of a cosine curve, with a linear warmup.

dataclass

Returns the same class as was passed in, with dunder methods added based on the fields defined in the class.

exponential_scheduler

Decays the learning rate exponentially.

linear_scheduler

Adjusts the learning rate linearly.

linear_with_warmup_scheduler

Adjusts the learning rate linearly, with a linear warmup.

multi_step_scheduler

Decays the learning rate discretely at fixed milestones.

multi_step_with_warmup_scheduler

Decays the learning rate discretely at fixed milestones, with a linear warmup.

polynomial_scheduler

Sets the learning rate to be proportional to a power of the fraction of training time remaining.

step_scheduler

Decays the learning rate discretely at fixed intervals.

Classes

ABC

Helper class that provides a standard way to create an ABC using inheritance.

ComposerSchedulerFn

Specification for a "stateless" scheduler function.

LambdaLR

Sets the learning rate of each parameter group to the initial lr times a given function.

Protocol

Base class for protocol classes.

_LRScheduler

Base class for PyTorch learning rate schedulers.

State

The state of the trainer.

Time

Time represents static durations of training time or points in the training process in terms of a TimeUnit enum (epochs, batches, samples, tokens, or duration).

TimeUnit

Units of time for the training process.

Hparams

These classes are used with yahp for YAML-based configuration.
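
As a rough illustration, the snippet below constructs one of these hparams dataclasses directly in Python; the field names come from the signatures documented below, while how the object is ultimately wired into a trainer depends on your yahp configuration and is not shown here.

    from composer.optim.scheduler import CosineAnnealingWithWarmupLRHparams

    # Each *Hparams class is a yahp dataclass; an equivalent YAML entry would simply
    # mirror these field names under the scheduler's key.
    hparams = CosineAnnealingWithWarmupLRHparams(
        warmup_time="5ep",   # required: no default in the signature below
        t_max="1dur",        # defaults as shown in the signature below
        min_factor=0.0,
    )
    print(hparams)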

ConstantLRHparams

Hyperparameters for the constant_scheduler() scheduler.

CosineAnnealingLRHparams

Hyperparameters for the cosine_annealing_scheduler() scheduler.

CosineAnnealingWarmRestartsHparams

Hyperparameters for the cosine_annealing_warm_restarts_scheduler() scheduler.

CosineAnnealingWithWarmupLRHparams

Hyperparameters for the cosine_annealing_with_warmup_scheduler() scheduler.

ExponentialLRHparams

Hyperparameters for the exponential_scheduler() scheduler.

LinearLRHparams

Hyperparameters for the linear_scheduler() scheduler.

LinearWithWarmupLRHparams

Hyperparameters for the linear_with_warmup_scheduler() scheduler.

MultiStepLRHparams

Hyperparameters for the multi_step_scheduler() scheduler.

MultiStepWithWarmupLRHparams

Hyperparameters for the multi_step_with_warmup_scheduler() scheduler.

PolynomialLRHparams

Hyperparameters for the polynomial_scheduler() scheduler.

SchedulerHparams

Abstract base class for scheduler hyperparameters.

StepLRHparams

Hyperparameters for the step_scheduler() scheduler.

Attributes

  • ComposerScheduler

  • List

  • TYPE_CHECKING

  • Union

  • log

class composer.optim.scheduler.ComposerSchedulerFn(*args, **kwargs)[source]#

Bases: Protocol

Specification for a "stateless" scheduler function.

A scheduler function should be a pure function that returns a multiplier to apply to the optimizer's provided learning rate, given the current trainer state, and optionally a "scale schedule ratio" (SSR). A typical implementation will read state.timer, and possibly other fields like state.max_duration, to determine the trainer's latest temporal progress.
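
For illustration only, here is a minimal sketch of such a stateless function. The drop_factor hyperparameter, the batch/max_batches attributes, and the SimpleNamespace stand-in state are all assumptions made for this sketch; the real State/Timer API is richer than this.

    from types import SimpleNamespace

    def halfway_drop_scheduler(state, *, ssr: float = 1.0, drop_factor: float = 0.1) -> float:
        # A pure, stateless scheduler: returns an LR multiplier computed from the trainer state.
        frac = state.batch / state.max_batches      # assumed attributes, for illustration only
        frac = min(frac / ssr, 1.0)                 # honor the scale schedule ratio property
        return 1.0 if frac < 0.5 else drop_factor   # full LR for the first half, then drop

    # Toy invocation with a stand-in state object (not a real composer.core.State):
    fake_state = SimpleNamespace(batch=300, max_batches=1000)
    print(halfway_drop_scheduler(fake_state))       # -> 1.0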

class composer.optim.scheduler.ConstantLRHparams(factor=1.0, total_time='1dur')[source]#

Bases: composer.optim.scheduler.SchedulerHparams

Hyperparameters for the constant_scheduler() scheduler.

scheduler_function(*, ssr=1.0, factor=1.0, total_time='1dur')#

Maintains a fixed learning rate.

Parameters
  • state (State) – The current Composer Trainer state.

  • ssr (float) – The scale schedule ratio. In general, the learning rate computed by this scheduler at time \(t\) with an SSR of 1.0 should be the same as that computed by this scheduler at time \(t \times s\) with an SSR of \(s\). Default = 1.0.

  • factor (float) – Factor. Default = 1.0.

  • total_time (str or Time) – Total time. Default = '1dur'.

class composer.optim.scheduler.CosineAnnealingLRHparams(t_max='1dur', min_factor=0.0)[source]#

Bases: composer.optim.scheduler.SchedulerHparams

Hyperparameters for the cosine_annealing_scheduler() scheduler.

scheduler_function(*, ssr=1.0, t_max='1dur', min_factor=0.0)#

Decays the learning rate according to the decreasing part of a cosine curve.

Parameters
  • state (State) – The current Composer Trainer state.

  • ssr (float) – The scale schedule ratio. In general, the learning rate computed by this scheduler at time \(t\) with an SSR of 1.0 should be the same as that computed by this scheduler at time \(t \times s\) with an SSR of \(s\). Default = 1.0.

  • t_max (str or Time) – Total time. Default = '1dur'.

  • min_factor (float) – Minimum factor. Default = 0.0.

class composer.optim.scheduler.CosineAnnealingWarmRestartsHparams(t_0='1dur', min_factor=0.0, t_mult=1.0)[source]#

Bases: composer.optim.scheduler.SchedulerHparams

Hyperparameters for the cosine_annealing_warm_restarts_scheduler() scheduler.

scheduler_function(*, ssr=1.0, t_0, t_mult=1.0, min_factor=0.0)#

Cyclically decays the learning rate according to the decreasing part of a cosine curve.

Parameters
  • state (State) – The current Composer Trainer state.

  • ssr (float) – The scale schedule ratio. In general, the learning rate computed by this scheduler at time \(t\) with an SSR of 1.0 should be the same as that computed by this scheduler at time \(t \times s\) with an SSR of \(s\). Default = 1.0.

  • t_0 (str or Time) – The first cycle's duration.

  • t_mult (float) – The multiplier for subsequent cycles' durations. Default = 1.0.

  • min_factor (float) – Minimum factor. Default = 0.0.

class composer.optim.scheduler.CosineAnnealingWithWarmupLRHparams(warmup_time, t_max='1dur', min_factor=0.0)[source]#

Bases: composer.optim.scheduler.SchedulerHparams

Hyperparameters for the cosine_annealing_with_warmup_scheduler() scheduler.

scheduler_function(*, ssr=1.0, warmup_time, t_max='1dur', min_factor=0.0)#

Decays the learning rate according to the decreasing part of a cosine curve, with a linear warmup.

Parameters
  • state (State) – The current Composer Trainer state.

  • ssr (float) – The scale schedule ratio. In general, the learning rate computed by this scheduler at time \(t\) with an SSR of 1.0 should be the same as that computed by this scheduler at time \(t \times s\) with an SSR of \(s\). Default = 1.0.

  • warmup_time (str or Time) – Warmup time.

  • t_max (str or Time) – Total time. Default = '1dur'.

  • min_factor (float) – Minimum factor. Default = 0.0.

class composer.optim.scheduler.ExponentialLRHparams(gamma)[source]#

Bases: composer.optim.scheduler.SchedulerHparams

Hyperparameters for the exponential_scheduler() scheduler.

scheduler_function(*, ssr=1.0, gamma)#

Decays the learning rate exponentially.

Parameters
  • state (State) – The current Composer Trainer state.

  • ssr (float) – The scale schedule ratio. In general, the learning rate computed by this scheduler at time \(t\) with an SSR of 1.0 should be the same as that computed by this scheduler at time \(t \times s\) with an SSR of \(s\). Default = 1.0.

  • gamma (float) – Gamma.

class composer.optim.scheduler.LinearLRHparams(start_factor=1.0, end_factor=0.0, total_time='1dur')[source]#

Bases: composer.optim.scheduler.SchedulerHparams

Hyperparameters for the linear_scheduler() scheduler.

scheduler_function(*, ssr=1.0, start_factor=1.0, end_factor=0.0, total_time='1dur')#

Adjusts the learning rate linearly.

Parameters
  • state (State) – The current Composer Trainer state.

  • ssr (float) – The scale schedule ratio. In general, the learning rate computed by this scheduler at time \(t\) with an SSR of 1.0 should be the same as that computed by this scheduler at time \(t \times s\) with an SSR of \(s\). Default = 1.0.

  • start_factor (float) – Start factor. Default = 1.0.

  • end_factor (float) – End factor. Default = 0.0.

  • total_time (str or Time) – Total time. Default = '1dur'.

class composer.optim.scheduler.LinearWithWarmupLRHparams(warmup_time, start_factor=1.0, end_factor=0.0, total_time='1dur')[source]#

Bases: composer.optim.scheduler.SchedulerHparams

Hyperparameters for the linear_with_warmup_scheduler() scheduler.

scheduler_function(*, ssr=1.0, warmup_time, start_factor=1.0, end_factor=0.0, total_time='1dur')#

Adjusts the learning rate linearly, with a linear warmup.

Parameters
  • state (State) – The current Composer Trainer state.

  • ssr (float) – The scale schedule ratio. In general, the learning rate computed by this scheduler at time \(t\) with an SSR of 1.0 should be the same as that computed by this scheduler at time \(t \times s\) with an SSR of \(s\). Default = 1.0.

  • warmup_time (str or Time) – Warmup time.

  • start_factor (float) – Start factor. Default = 1.0.

  • end_factor (float) – End factor. Default = 0.0.

  • total_time (str or Time) – Total time. Default = '1dur'.

class composer.optim.scheduler.MultiStepLRHparams(milestones, gamma=0.1)[source]#

Bases: composer.optim.scheduler.SchedulerHparams

Hyperparameters for the multi_step_scheduler() scheduler.

scheduler_function(*, ssr=1.0, milestones, gamma=0.1)#

Decays the learning rate discretely at fixed milestones.

Parameters
  • state (State) – The current Composer Trainer state.

  • ssr (float) – The scale schedule ratio. In general, the learning rate computed by this scheduler at time \(t\) with an SSR of 1.0 should be the same as that computed by this scheduler at time \(t \times s\) with an SSR of \(s\). Default = 1.0.

  • milestones (list of str or Time) – Milestones.

  • gamma (float) – Gamma. Default = 0.1.

class composer.optim.scheduler.MultiStepWithWarmupLRHparams(warmup_time, milestones, gamma=0.1)[source]#

Bases: composer.optim.scheduler.SchedulerHparams

Hyperparameters for the multi_step_with_warmup_scheduler() scheduler.

scheduler_function(*, ssr=1.0, warmup_time, milestones, gamma=0.1)#

Decays the learning rate discretely at fixed milestones, with a linear warmup.

Parameters
  • state (State) – The current Composer Trainer state.

  • ssr (float) – The scale schedule ratio. In general, the learning rate computed by this scheduler at time \(t\) with an SSR of 1.0 should be the same as that computed by this scheduler at time \(t \times s\) with an SSR of \(s\). Default = 1.0.

  • warmup_time (str or Time) – Warmup time.

  • milestones (list of str or Time) – Milestones.

  • gamma (float) – Gamma. Default = 0.1.

class composer.optim.scheduler.PolynomialLRHparams(power, t_max='1dur', min_factor=0.0)[source]#

Bases: composer.optim.scheduler.SchedulerHparams

Hyperparameters for the polynomial_scheduler() scheduler.

scheduler_function(*, ssr=1.0, t_max='1dur', power, min_factor=0.0)#

Sets the learning rate to be proportional to a power of the fraction of training time remaining.

Parameters
  • state (State) – The current Composer Trainer state.

  • ssr (float) – The scale schedule ratio. In general, the learning rate computed by this scheduler at time \(t\) with an SSR of 1.0 should be the same as that computed by this scheduler at time \(t \times s\) with an SSR of \(s\). Default = 1.0.

  • t_max (str or Time) – Total time. Default = '1dur'.

  • power (float) – Power.

  • min_factor (float) – Minimum factor. Default = 0.0.

class composer.optim.scheduler.SchedulerHparams[source]#

Bases: yahp.hparams.Hparams, abc.ABC

Abstract base class for scheduler hyperparameters.

class composer.optim.scheduler.StepLRHparams(step_size, gamma=0.1)[source]#

Bases: composer.optim.scheduler.SchedulerHparams

Hyperparameters for the step_scheduler() scheduler.

scheduler_function(*, ssr=1.0, step_size, gamma=0.1)#

Decays the learning rate discretely at fixed intervals.

Parameters
  • state (State) – The current Composer Trainer state.

  • ssr (float) – The scale schedule ratio. In general, the learning rate computed by this scheduler at time \(t\) with an SSR of 1.0 should be the same as that computed by this scheduler at time \(t \times s\) with an SSR of \(s\). Default = 1.0.

  • step_size (str or Time) – Step size.

  • gamma (float) – Gamma. Default = 0.1.

composer.optim.scheduler.constant_scheduler(state, *, ssr=1.0, factor=1.0, total_time='1dur')[source]#

Maintains a fixed learning rate.

Parameters
  • state (State) – The current Composer Trainer state.

  • ssr (float) – The scale schedule ratio. In general, the learning rate computed by this scheduler at time \(t\) with an SSR of 1.0 should be the same as that computed by this scheduler at time \(t \times s\) with an SSR of \(s\). Default = 1.0.

  • factor (float) – Factor. Default = 1.0.

  • total_time (str or Time) – Total time. Default = '1dur'.

composer.optim.scheduler.cosine_annealing_scheduler(state, *, ssr=1.0, t_max='1dur', min_factor=0.0)[source]#

Decays the learning rate according to the decreasing part of a cosine curve.

Parameters
  • state (State) – The current Composer Trainer state.

  • ssr (float) – The scale schedule ratio. In general, the learning rate computed by this scheduler at time \(t\) with an SSR of 1.0 should be the same as that computed by this scheduler at time \(t \times s\) with an SSR of \(s\). Default = 1.0.

  • t_max (str or Time) – Total time. Default = '1dur'.

  • min_factor (float) – Minimum factor. Default = 0.0.
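
For orientation, a sketch of the usual cosine-decay multiplier such a scheduler computes, written over a pre-computed fraction of t_max already elapsed (the frac argument and its ssr handling are assumptions; this is not necessarily the exact implementation):

    import math

    def cosine_multiplier(frac: float, min_factor: float = 0.0) -> float:
        # Decreasing half of a cosine: 1.0 at frac=0, min_factor at frac=1.
        frac = min(max(frac, 0.0), 1.0)
        return min_factor + (1.0 - min_factor) * 0.5 * (1.0 + math.cos(math.pi * frac))

    print(cosine_multiplier(0.0))   # 1.0
    print(cosine_multiplier(0.5))   # 0.5 (with min_factor=0.0)
    print(cosine_multiplier(1.0))   # 0.0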

composer.optim.scheduler.cosine_annealing_warm_restarts_scheduler(state, *, ssr=1.0, t_0, t_mult=1.0, min_factor=0.0)[source]#

Cyclically decays the learning rate according to the decreasing part of a cosine curve.

Parameters
  • state (State) – The current Composer Trainer state.

  • ssr (float) – The scale schedule ratio. In general, the learning rate computed by this scheduler at time \(t\) with an SSR of 1.0 should be the same as that computed by this scheduler at time \(t \times s\) with an SSR of \(s\). Default = 1.0.

  • t_0 (str or Time) – The first cycle's duration.

  • t_mult (float) – The multiplier for subsequent cycles' durations. Default = 1.0.

  • min_factor (float) – Minimum factor. Default = 0.0.
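
A hedged sketch of the cyclic behavior: the cosine decay restarts at the start of each cycle, and with t_mult > 1 every cycle is longer than the last. Elapsed time t, t_0, and cycle lengths are treated here as plain numbers rather than Time values, which is a simplification of the real machinery:

    import math

    def warm_restarts_multiplier(t: float, t_0: float, t_mult: float = 1.0, min_factor: float = 0.0) -> float:
        # Walk through cycles of length t_0, t_0*t_mult, t_0*t_mult**2, ... until t falls inside one.
        cycle_len = t_0
        while t >= cycle_len:
            t -= cycle_len
            cycle_len *= t_mult
        frac = t / cycle_len
        return min_factor + (1.0 - min_factor) * 0.5 * (1.0 + math.cos(math.pi * frac))

    print(warm_restarts_multiplier(0.0, t_0=10))    # 1.0 (start of the first cycle)
    print(warm_restarts_multiplier(10.0, t_0=10))   # 1.0 (restart at the second cycle)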

composer.optim.scheduler.cosine_annealing_with_warmup_scheduler(state, *, ssr=1.0, warmup_time, t_max='1dur', min_factor=0.0)[source]#

Decays the learning rate according to the decreasing part of a cosine curve, with a linear warmup.

Parameters
  • state (State) – The current Composer Trainer state.

  • ssr (float) – The scale schedule ratio. In general, the learning rate computed by this scheduler at time \(t\) with an SSR of 1.0 should be the same as that computed by this scheduler at time \(t \times s\) with an SSR of \(s\). Default = 1.0.

  • warmup_time (str or Time) – Warmup time.

  • t_max (str or Time) – Total time. Default = '1dur'.

  • min_factor (float) – Minimum factor. Default = 0.0.
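
The warmup variants prepend a linear ramp: roughly, the multiplier rises from 0 to 1 over warmup_time and only then follows the underlying decay. In the sketch below, times are plain numbers and the choice to spread the decay over the remaining (post-warmup) time is an assumption, not a statement about the exact implementation:

    import math

    def cosine_with_warmup_multiplier(t: float, warmup_time: float, t_max: float, min_factor: float = 0.0) -> float:
        if t < warmup_time:
            return t / warmup_time                                   # linear ramp from 0 to 1
        frac = (t - warmup_time) / max(t_max - warmup_time, 1e-12)   # decay over the remaining time (assumption)
        frac = min(frac, 1.0)
        return min_factor + (1.0 - min_factor) * 0.5 * (1.0 + math.cos(math.pi * frac))

    print(cosine_with_warmup_multiplier(2, warmup_time=4, t_max=20))   # 0.5, still warming up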

composer.optim.scheduler.exponential_scheduler(state, *, ssr=1.0, gamma)[source]#

Decays the learning rate exponentially.

Parameters
  • state (State) – The current Composer Trainer state.

  • ssr (float) – The scale schedule ratio. In general, the learning rate computed by this scheduler at time \(t\) with an SSR of 1.0 should be the same as that computed by this scheduler at time \(t \times s\) with an SSR of \(s\). Default = 1.0.

  • gamma (float) – Gamma.
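
A minimal sketch of exponential decay in terms of elapsed training time; the time unit for t and the exact handling of ssr are assumptions, since the real scheduler works through State and Time:

    def exponential_multiplier(t: float, gamma: float) -> float:
        # Multiplier shrinks by a factor of gamma for every unit of elapsed time.
        return gamma ** t

    print(exponential_multiplier(0, gamma=0.5))   # 1.0
    print(exponential_multiplier(3, gamma=0.5))   # 0.125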

composer.optim.scheduler.linear_scheduler(state, *, ssr=1.0, start_factor=1.0, end_factor=0.0, total_time='1dur')[source]#

Adjusts the learning rate linearly.

Parameters
  • state (State) – The current Composer Trainer state.

  • ssr (float) – The scale schedule ratio. In general, the learning rate computed by this scheduler at time \(t\) with an SSR of 1.0 should be the same as that computed by this scheduler at time \(t \times s\) with an SSR of \(s\). Default = 1.0.

  • start_factor (float) – Start factor. Default = 1.0.

  • end_factor (float) – End factor. Default = 0.0.

  • total_time (str or Time) – Total time. Default = '1dur'.
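
A sketch of the linear interpolation from start_factor to end_factor over total_time, written over a pre-computed elapsed fraction (the frac argument is an assumption standing in for elapsed time divided by total_time, already adjusted for ssr):

    def linear_multiplier(frac: float, start_factor: float = 1.0, end_factor: float = 0.0) -> float:
        frac = min(max(frac, 0.0), 1.0)
        # Straight line from start_factor at frac=0 to end_factor at frac=1.
        return start_factor + (end_factor - start_factor) * frac

    print(linear_multiplier(0.25))   # 0.75 with the defaults above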

composer.optim.scheduler.linear_with_warmup_scheduler(state, *, ssr=1.0, warmup_time, start_factor=1.0, end_factor=0.0, total_time='1dur')[source]#

Adjusts the learning rate linearly, with a linear warmup.

Parameters
  • state (State) – The current Composer Trainer state.

  • ssr (float) – The scale schedule ratio. In general, the learning rate computed by this scheduler at time \(t\) with an SSR of 1.0 should be the same as that computed by this scheduler at time \(t \times s\) with an SSR of \(s\). Default = 1.0.

  • warmup_time (str or Time) – Warmup time.

  • start_factor (float) – Start factor. Default = 1.0.

  • end_factor (float) – End factor. Default = 0.0.

  • total_time (str or Time) – Total time. Default = '1dur'.

composer.optim.scheduler.multi_step_scheduler(state, *, ssr=1.0, milestones, gamma=0.1)[source]#

Decays the learning rate discretely at fixed milestones.

Parameters
  • state (State) – The current Composer Trainer state.

  • ssr (float) – The scale schedule ratio. In general, the learning rate computed by this scheduler at time \(t\) with an SSR of 1.0 should be the same as that computed by this scheduler at time \(t \times s\) with an SSR of \(s\). Default = 1.0.

  • milestones (list of str or Time) – Milestones.

  • gamma (float) – Gamma. Default = 0.1.
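
A sketch of milestone-based decay: the multiplier is gamma raised to the number of milestones already passed. Milestones and the elapsed time t are shown as plain numbers here rather than Time strings, purely for illustration:

    def multi_step_multiplier(t: float, milestones, gamma: float = 0.1) -> float:
        passed = sum(1 for m in milestones if t >= m)   # how many milestones are behind us
        return gamma ** passed

    print(multi_step_multiplier(5,  milestones=[10, 20]))   # 1.0
    print(multi_step_multiplier(15, milestones=[10, 20]))   # 0.1
    print(multi_step_multiplier(25, milestones=[10, 20]))   # 0.01 (up to float rounding)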

composer.optim.scheduler.multi_step_with_warmup_scheduler(state, *, ssr=1.0, warmup_time, milestones, gamma=0.1)[source]#

Decays the learning rate discretely at fixed milestones, with a linear warmup.

Parameters
  • state (State) – The current Composer Trainer state.

  • ssr (float) – The scale schedule ratio. In general, the learning rate computed by this scheduler at time \(t\) with an SSR of 1.0 should be the same as that computed by this scheduler at time \(t \times s\) with an SSR of \(s\). Default = 1.0.

  • warmup_time (str or Time) – Warmup time.

  • milestones (list of str or Time) – Milestones.

  • gamma (float) – Gamma. Default = 0.1.

composer.optim.scheduler.polynomial_scheduler(state, *, ssr=1.0, t_max='1dur', power, min_factor=0.0)[source]#

Sets the learning rate to be proportional to a power of the fraction of training time remaining.

Parameters
  • state (State) – The current Composer Trainer state.

  • ssr (float) – The scale schedule ratio. In general, the learning rate computed by this scheduler at time \(t\) with an SSR of 1.0 should be the same as that computed by this scheduler at time \(t \times s\) with an SSR of \(s\). Default = 1.0.

  • t_max (str or Time) – Total time. Default = '1dur'.

  • power (float) – Power.

  • min_factor (float) – Minimum factor. Default = 0.0.
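
A sketch of the polynomial decay: the multiplier is proportional to a power of the fraction of training time remaining. The frac_remaining argument is an assumption standing in for 1 minus the elapsed fraction of t_max:

    def polynomial_multiplier(frac_remaining: float, power: float, min_factor: float = 0.0) -> float:
        frac_remaining = min(max(frac_remaining, 0.0), 1.0)
        return min_factor + (1.0 - min_factor) * (frac_remaining ** power)

    print(polynomial_multiplier(1.0, power=2.0))   # 1.0 at the start of training
    print(polynomial_multiplier(0.5, power=2.0))   # 0.25 halfway through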

composer.optim.scheduler.step_scheduler(state, *, ssr=1.0, step_size, gamma=0.1)[source]#

Decays the learning rate discretely at fixed intervals.

Parameters
  • state (State) – The current Composer Trainer state.

  • ssr (float) – The scale schedule ratio. In general, the learning rate computed by this scheduler at time \(t\) with an SSR of 1.0 should be the same as that computed by this scheduler at time \(t \times s\) with an SSR of \(s\). Default = 1.0.

  • step_size (str or Time) – Step size.

  • gamma (float) – Gamma. Default = 0.1.
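
A sketch of fixed-interval decay: the multiplier drops by a factor of gamma after every step_size units of elapsed time. step_size and t are shown as plain numbers here; the real scheduler accepts a Time value or string:

    def step_multiplier(t: float, step_size: float, gamma: float = 0.1) -> float:
        return gamma ** int(t // step_size)

    print(step_multiplier(4,  step_size=5))   # 1.0
    print(step_multiplier(12, step_size=5))   # gamma**2 = 0.01 (up to float rounding)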