composer.core.time

Utilities to track training progress in terms of epochs, batches, samples, and tokens.

Callbacks, algorithms, and schedulers can use the current training time to fire at certain points in the training process.

The Timestamp class tracks the total number of epochs, batches, samples, and tokens. The trainer is responsible for updating it at the end of every epoch and batch. There is only one instance of the Timestamp, which is attached to the State.

The Time class represents static durations of training time or points in the training process in terms of a specific TimeUnit enum. This class supports comparisons, arithmetic, and conversions.

See the Time Guide for more details on tracking time during training.

Functions

ensure_time

Ensure maybe_time is an instance of Time.

Classes

Time

Time represents static durations of training time in terms of a TimeUnit enum.

TimeUnit

Enum class to represent units of time for the training process.

Timestamp

Timestamp represents a snapshot of the current training progress.

class composer.core.time.Time(value, unit)

Bases: Generic[composer.core.time.TValue], composer.core.serializable.Serializable

Time represents static durations of training time in terms of a TimeUnit enum.

See the Time Guide for more details on tracking time during training.

To construct an instance of Time, you can either:

  1. Use a value followed by a TimeUnit enum or string. For example:

>>> Time(5, TimeUnit.EPOCH)  # describes 5 epochs.
Time(5, TimeUnit.EPOCH)
>>> Time(30_000, "tok")  # describes 30,000 tokens.
Time(30000, TimeUnit.TOKEN)
>>> Time(0.5, "dur")  # describes 50% of the training process.
Time(0.5, TimeUnit.DURATION)

  2. Use one of the helper methods: from_epoch(), from_batch(), from_sample(), from_token(), from_duration(), or from_timestring().

Time supports addition and subtraction with other Time instances that share the same TimeUnit. For example:

>>> Time(1, TimeUnit.EPOCH) + Time(2, TimeUnit.EPOCH)
Time(3, TimeUnit.EPOCH)

Time supports multiplication. The multiplier must be either a number or have units of TimeUnit.DURATION. The multiplicand is scaled, and its units are kept.

>>> Time(2, TimeUnit.EPOCH) * 0.5
Time(1, TimeUnit.EPOCH)
>>> Time(2, TimeUnit.EPOCH) * Time(0.5, TimeUnit.DURATION)
Time(1, TimeUnit.EPOCH)

Time supports division. If the divisor is an instance of Time, then it must have the same units as the dividend, and the result has units of TimeUnit.DURATION. For example:

>>> Time(4, TimeUnit.EPOCH) / Time(2, TimeUnit.EPOCH)
Time(2.0, TimeUnit.DURATION)

If the divisor is a number, then the dividend is scaled, and it keeps its units. For example:

>>> Time(4, TimeUnit.EPOCH) / 2
Time(2, TimeUnit.EPOCH)
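
The arithmetic rules above can be sketched with a small stand-in class. This is not Composer's implementation: MiniTime and its fields are hypothetical names used only to illustrate the unit checks, and units are assumed to be plain strings rather than TimeUnit members.

```python
from dataclasses import dataclass
from numbers import Real


@dataclass(frozen=True)
class MiniTime:
    """Hypothetical stand-in for Time; units are plain strings here."""
    value: float
    unit: str

    def __add__(self, other):
        # Addition requires both operands to share the same unit.
        if not isinstance(other, MiniTime) or other.unit != self.unit:
            raise TypeError("can only add MiniTime values with the same unit")
        return MiniTime(self.value + other.value, self.unit)

    def __mul__(self, other):
        # The multiplier is a plain number or a "dur" fraction; units are kept.
        if isinstance(other, MiniTime):
            if other.unit != "dur":
                raise TypeError("a MiniTime multiplier must have unit 'dur'")
            other = other.value
        if not isinstance(other, Real):
            return NotImplemented
        return MiniTime(self.value * other, self.unit)

    def __truediv__(self, other):
        # Same-unit division gives a "dur" fraction; dividing by a number keeps units.
        if isinstance(other, MiniTime):
            if other.unit != self.unit:
                raise TypeError("divisor must have the same unit as the dividend")
            return MiniTime(self.value / other.value, "dur")
        return MiniTime(self.value / other, self.unit)


half = MiniTime(2, "ep") * MiniTime(0.5, "dur")
ratio = MiniTime(4, "ep") / MiniTime(2, "ep")
```

Unlike the doctest output above, this sketch does not convert scaled results back to integers; it only shows which operand combinations are accepted and which unit the result carries.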
Parameters
  • value (int | float) – The amount of time.

  • unit (str | TimeUnit) – The unit for value.

classmethod from_batch(batch)

Create a Time with units of TimeUnit.BATCH.

Equivalent to Time(batch, TimeUnit.BATCH).

Parameters

batch (int) – Number of batches.

Returns

Time – Time instance, in batches.

classmethod from_duration(duration)

Create a Time with units of TimeUnit.DURATION.

Equivalent to Time(duration, TimeUnit.DURATION).

Parameters

duration (float) – Duration of the training process. Should be on [0, 1), where 0 represents the beginning of the training process and 1 represents a completed training process.

Returns

Time – Time instance, in duration.

classmethod from_epoch(epoch)

Create a Time with units of TimeUnit.EPOCH.

Equivalent to Time(epoch, TimeUnit.EPOCH).

Parameters

epoch (int) – Number of epochs.

Returns

Time – Time instance, in epochs.

classmethod from_sample(sample)

Create a Time with units of TimeUnit.SAMPLE.

Equivalent to Time(sample, TimeUnit.SAMPLE).

Parameters

sample (int) – Number of samples.

Returns

Time – Time instance, in samples.

classmethod from_timestring(timestring)

Parse a time string into a Time instance.

A time string is a numerical value followed by the value of a TimeUnit enum. For example:

>>> Time.from_timestring("5ep")  # describes 5 epochs.
Time(5, TimeUnit.EPOCH)
>>> Time.from_timestring("3e4tok")  # describes 30,000 tokens.
Time(30000, TimeUnit.TOKEN)
>>> Time.from_timestring("0.5dur")  # describes 50% of the training process.
Time(0.5, TimeUnit.DURATION)

Parameters

timestring (str) – The time string.

Returns

Time – An instance of Time.
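
As a rough illustration of the time-string format, the sketch below splits a string into a number and a unit suffix. It is not the library's parser: parse_timestring is a hypothetical helper, it only knows the three suffixes shown in the examples above ("ep", "tok", "dur"), and rejecting fractional values for non-duration units is an assumption of this sketch.

```python
# Unit suffixes taken from the examples above; other units are omitted here.
KNOWN_SUFFIXES = ("ep", "tok", "dur")


def parse_timestring(timestring: str) -> tuple:
    """Split a time string such as "3e4tok" into a (value, suffix) pair."""
    text = timestring.strip()
    for suffix in KNOWN_SUFFIXES:
        if text.endswith(suffix):
            # float() accepts scientific notation and raises ValueError otherwise.
            number = float(text[: -len(suffix)])
            if suffix == "dur":
                return number, suffix  # durations stay fractional
            if not number.is_integer():
                raise ValueError(f"{timestring!r} must be a whole number of units")
            return int(number), suffix
    raise ValueError(f"unrecognized unit in {timestring!r}")


parsed = [parse_timestring(s) for s in ("5ep", "3e4tok", "0.5dur")]
```

Going through float() first is what lets "3e4tok" resolve to 30000 tokens, matching the doctest above.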

classmethod from_token(token)

Create a Time with units of TimeUnit.TOKEN.

Equivalent to Time(token, TimeUnit.TOKEN).

Parameters

token (int) – Number of tokens.

Returns

Time – Time instance, in tokens.

to_timestring()

Get the time-string representation.

For example:

>>> Time(5, TimeUnit.EPOCH).to_timestring()
'5ep'

Returns

str – The time-string representation.

property unit

The unit of the time.

property value

The value of the time, as a number.

class composer.core.time.TimeUnit(value)

Bases: composer.utils.string_enum.StringEnum

Enum class to represent units of time for the training process.

EPOCH

Epochs.

Type

str

BATCH

Batches (i.e., the number of optimization steps).

Type

str

SAMPLE

Samples.

Type

str

TOKEN

Tokens. Applicable for natural language processing (NLP) models.

Type

str

DURATION

Fraction of the training process complete, on [0.0, 1.0).

Type

str

class composer.core.time.Timestamp(epoch=0, batch=0, sample=0, token=0, batch_in_epoch=0, sample_in_epoch=0, token_in_epoch=0, total_wct=None, epoch_wct=None, batch_wct=None)

Bases: composer.core.serializable.Serializable

Timestamp represents a snapshot of the current training progress.

The timestamp measures training progress in terms of epochs, batches, samples, tokens, and wall clock time. Timestamps are not updated in-place.

See the Time Guide for more details on tracking time during training.

Parameters
  • epoch (int | Time[int], optional) – The epoch.

  • batch (int | Time[int], optional) – The batch.

  • sample (int | Time[int], optional) – The sample.

  • token (int | Time[int], optional) – The token.

  • batch_in_epoch (int | Time[int], optional) – The batch in the epoch.

  • sample_in_epoch (int | Time[int], optional) – The sample in the epoch.

  • token_in_epoch (int | Time[int], optional) – The token in the epoch.

  • total_wct (timedelta, optional) – The total wall-clock duration.

  • epoch_wct (timedelta, optional) – The wall-clock duration of the last epoch.

  • batch_wct (timedelta, optional) – The wall-clock duration of the last batch.

property batch

The total batch count.

property batch_in_epoch

The batch count in the current epoch (resets to 0 at the beginning of every epoch).

property batch_wct

The wall-clock duration (in seconds) for the last batch.

copy(epoch=None, batch=None, sample=None, token=None, batch_in_epoch=None, sample_in_epoch=None, token_in_epoch=None, total_wct=None, epoch_wct=None, batch_wct=None)

Create a copy of the timestamp.

Any specified values will override the existing values in the returned copy.

Parameters
  • epoch (int | Time[int], optional) – The epoch.

  • batch (int | Time[int], optional) – The batch.

  • sample (int | Time[int], optional) – The sample.

  • token (int | Time[int], optional) – The token.

  • batch_in_epoch (int | Time[int], optional) – The batch in the epoch.

  • sample_in_epoch (int | Time[int], optional) – The sample in the epoch.

  • token_in_epoch (int | Time[int], optional) – The token in the epoch.

  • total_wct (timedelta, optional) – The elapsed duration from the beginning of training.

  • epoch_wct (timedelta, optional) – The wall-clock duration of the last epoch.

  • batch_wct (timedelta, optional) – The wall-clock duration of the last batch.

Returns

Timestamp – A new timestamp instance, created from a copy, but with any specified values overriding the existing values.
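
The copy-with-overrides behavior can be pictured with a frozen dataclass. This is a minimal sketch under stated assumptions, not the library's code: MiniTimestamp is a hypothetical, reduced stand-in that keeps only three integer counters, and treating None as "keep the existing value" mirrors the None defaults in the signature above.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class MiniTimestamp:
    """Hypothetical, reduced stand-in for Timestamp (integer counters only)."""
    epoch: int = 0
    batch: int = 0
    sample: int = 0

    def copy(self, **overrides):
        # Drop None so that unspecified fields keep their existing values.
        changes = {k: v for k, v in overrides.items() if v is not None}
        return replace(self, **changes)


start = MiniTimestamp(epoch=1, batch=10, sample=320)
later = start.copy(batch=11, sample=None)
```

Here later is a new object with batch set to 11, while start keeps its original values, which matches the statement that timestamps are not updated in place.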

property epoch

The total epoch count.

property epoch_wct

The wall-clock duration (in seconds) for the current epoch.

get(unit)

Returns the current time in the specified unit.

Parameters

unit (str | TimeUnit) – The desired unit.

Returns

Time – The current time, in the specified unit.

get_state()

Returns all values of the timestamp object in a dictionary.

Returns

Dict[str, Union[Time[int], datetime.timedelta]] – All values of the timestamp object.

property sample

The total sample count.

property sample_in_epoch

The sample count in the current epoch (resets to 0 at the beginning of every epoch).

to_next_batch(samples=0, tokens=0, duration=None)

Create a new Timestamp, advanced to the next batch.

Equivalent to:

>>> timestamp.copy(
...     batch=timestamp.batch + 1,
...     batch_in_epoch=timestamp.batch_in_epoch + 1,
...     sample=timestamp.sample + samples,
...     sample_in_epoch=timestamp.sample_in_epoch + samples,
...     token=timestamp.token + tokens,
...     token_in_epoch=timestamp.token_in_epoch + tokens,
...     total_wct=timestamp.total_wct + duration,
...     epoch_wct=timestamp.epoch_wct + duration,
...     batch_wct=duration,
... )
Timestamp(...)

Note

For accurate time tracking during distributed training, samples and tokens should be the totals across all ranks for the given batch. This method does not accumulate these counts automatically. If per-rank sample and token counts are provided, the counts will differ across ranks, which could lead to inconsistent behavior by Algorithm or Callback instances that use them.

Parameters
  • samples (int | Time, optional) – The number of samples trained in the batch. Defaults to 0.

  • tokens (int | Time, optional) – The number of tokens trained in the batch. Defaults to 0.

  • duration (timedelta, optional) – The duration to train the batch.
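
To make the note about distributed training concrete, the sketch below advances a set of counters using totals summed over all ranks. Everything here is illustrative: next_batch and the plain dict of counters are hypothetical stand-ins for Timestamp.to_next_batch, wall-clock fields are left out, and the per-rank lists stand in for values that a real job would gather with a collective operation.

```python
def next_batch(counters: dict, samples: int, tokens: int) -> dict:
    """Return new counters advanced by one batch; the input dict is not modified."""
    return {
        **counters,
        "batch": counters["batch"] + 1,
        "batch_in_epoch": counters["batch_in_epoch"] + 1,
        "sample": counters["sample"] + samples,
        "sample_in_epoch": counters["sample_in_epoch"] + samples,
        "token": counters["token"] + tokens,
        "token_in_epoch": counters["token_in_epoch"] + tokens,
    }


# Hypothetical per-rank counts for one batch on a 4-rank job.
samples_per_rank = [8, 8, 8, 6]
tokens_per_rank = [4096, 4096, 4096, 3072]

counters = {
    "epoch": 0, "batch": 0, "batch_in_epoch": 0,
    "sample": 0, "sample_in_epoch": 0, "token": 0, "token_in_epoch": 0,
}

# Every rank passes the same global totals, so the counters stay identical everywhere.
advanced = next_batch(counters, sum(samples_per_rank), sum(tokens_per_rank))
```

Had each rank passed only its own count, the last rank would record 6 samples while the others record 8, which is the inconsistency the note warns about.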

to_next_epoch()

Create a new Timestamp, advanced to the next epoch.

Equivalent to:

>>> timestamp.copy(
...     epoch=timestamp.epoch + 1,
...     batch_in_epoch=0,
...     sample_in_epoch=0,
...     token_in_epoch=0,
...     epoch_wct=datetime.timedelta(seconds=0),
...     batch_wct=datetime.timedelta(seconds=0),
... )
Timestamp(...)
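
The effect of the epoch rollover shown above is that per-epoch counters reset while running totals carry over. A minimal sketch, assuming a plain dict in place of a Timestamp (next_epoch is a hypothetical helper, not part of the library):

```python
import datetime


def next_epoch(ts: dict) -> dict:
    """Return a new dict advanced to the next epoch; totals are left untouched."""
    zero = datetime.timedelta(seconds=0)
    return {
        **ts,
        "epoch": ts["epoch"] + 1,
        "batch_in_epoch": 0,
        "sample_in_epoch": 0,
        "token_in_epoch": 0,
        "epoch_wct": zero,
        "batch_wct": zero,
    }


before = {
    "epoch": 2, "batch": 300, "sample": 9600, "token": 0,
    "batch_in_epoch": 100, "sample_in_epoch": 3200, "token_in_epoch": 0,
    "total_wct": datetime.timedelta(minutes=30),
    "epoch_wct": datetime.timedelta(minutes=10),
    "batch_wct": datetime.timedelta(seconds=6),
}
after = next_epoch(before)
```

In this sketch, batch, sample, and total_wct are the same in after as in before; only the epoch count and the per-epoch fields change.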
property token

The total token count.

property token_in_epoch

The token count in the current epoch (resets to 0 at the beginning of every epoch).

property total_wct

The wall-clock duration (in seconds) from the beginning of training.

composer.core.time.ensure_time(maybe_time, int_unit)

Ensure maybe_time is an instance of Time.

Parameters
  • maybe_time (Time | str | int) – A time string, integer, or instance of Time.

  • int_unit (TimeUnit | str) – The unit to use if maybe_time is an integer.

Returns

Time – An instance of Time.
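
The normalization that ensure_time describes can be sketched as a three-way dispatch on the input type. This is an illustration only: ensure_pair is a hypothetical helper, a (value, unit) tuple stands in for a Time instance, and the string branch recognizes only the "ep", "tok", and "dur" suffixes that appear in the examples on this page.

```python
def ensure_pair(maybe_time, int_unit: str) -> tuple:
    """Normalize an int, a time string, or a (value, unit) pair to a (value, unit) pair."""
    if isinstance(maybe_time, tuple):
        return maybe_time  # already in the target form
    if isinstance(maybe_time, int):
        return (maybe_time, int_unit)  # bare integers take the fallback unit
    if isinstance(maybe_time, str):
        for suffix in ("ep", "tok", "dur"):
            if maybe_time.endswith(suffix):
                number = float(maybe_time[: -len(suffix)])
                # Durations stay fractional; other units become integers.
                return (number if suffix == "dur" else int(number), suffix)
        raise ValueError(f"unrecognized unit in {maybe_time!r}")
    raise TypeError(f"unsupported type: {type(maybe_time).__name__}")


from_int = ensure_pair(10, "ep")
from_str = ensure_pair("10ep", "tok")
unchanged = ensure_pair((3, "tok"), "ep")
```

Note that int_unit is consulted only for bare integers: a string or an existing pair already carries its own unit, so the fallback is ignored in those branches.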