composer.core.time#
Utilities to track training progress in terms of epochs, batches, samples, and tokens.
Callbacks, algorithms, and schedulers can use the current training time to fire at certain points in the training process.
The Timestamp class tracks the total number of epochs, batches, samples, and tokens. The trainer is responsible for updating it at the end of every epoch and batch. There is only one instance of the Timestamp, which is attached to the State.
The Time class represents static durations of training time or points in the training process in terms of a specific TimeUnit enum. This class supports comparisons, arithmetic, and conversions.
See the Time Guide for more details on tracking time during training.
Functions
ensure_time – Ensure that a given value is an instance of Time.
Classes
Time – Represents static durations of training time or points in the training process in terms of a TimeUnit enum.
TimeUnit – Enum class to represent units of time for the training process.
Timestamp – Represents a snapshot of the current training progress, in terms of epochs, batches, samples, and tokens.
- class composer.core.time.Time(value, unit)[source]#
Bases: Generic[composer.core.time.TValue]
Time represents static durations of training time or points in the training process in terms of a TimeUnit enum (epochs, batches, samples, tokens, or duration).
See the Time Guide for more details on tracking time during training.
To construct an instance of Time, you can either:
Use a value followed by a TimeUnit enum or string. For example,
>>> Time(5, TimeUnit.EPOCH)  # describes 5 epochs.
Time(5, TimeUnit.EPOCH)
>>> Time(30_000, "tok")  # describes 30,000 tokens.
Time(30000, TimeUnit.TOKEN)
>>> Time(0.5, "dur")  # describes 50% of the training process.
Time(0.5, TimeUnit.DURATION)
Use one of the helper methods: from_epoch(), from_batch(), from_sample(), from_token(), from_duration(), or from_timestring().
Time supports addition and subtraction with other Time instances that share the same TimeUnit. For example:
>>> Time(1, TimeUnit.EPOCH) + Time(2, TimeUnit.EPOCH)
Time(3, TimeUnit.EPOCH)
Time supports multiplication. The multiplier must be either a number or have units of TimeUnit.DURATION. The multiplicand is scaled, and its units are kept. For example:
>>> Time(2, TimeUnit.EPOCH) * 0.5
Time(1, TimeUnit.EPOCH)
>>> Time(2, TimeUnit.EPOCH) * Time(0.5, TimeUnit.DURATION)
Time(1, TimeUnit.EPOCH)
Time supports division. If the divisor is an instance of Time, then it must have the same units as the dividend, and the result has units of TimeUnit.DURATION. For example:
>>> Time(4, TimeUnit.EPOCH) / Time(2, TimeUnit.EPOCH)
Time(2.0, TimeUnit.DURATION)
If the divisor is a number, then the dividend is scaled, and it keeps its units. For example:
>>> Time(4, TimeUnit.EPOCH) / 2
Time(2, TimeUnit.EPOCH)
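The arithmetic rules above can be summarized in a minimal, self-contained sketch. This is an illustration of the semantics only, not composer's actual Time implementation; the class and enum names here are made up for the example:

```python
from enum import Enum


class Unit(str, Enum):
    """Stand-in for composer's TimeUnit (illustration only)."""
    EPOCH = "ep"
    DURATION = "dur"


class SimpleTime:
    """Minimal sketch of Time's arithmetic rules."""

    def __init__(self, value, unit):
        self.value = value
        self.unit = unit

    def __add__(self, other):
        # Addition requires matching units.
        if self.unit != other.unit:
            raise RuntimeError("cannot add times with different units")
        return SimpleTime(self.value + other.value, self.unit)

    def __mul__(self, other):
        # The multiplier must be a number or have DURATION units;
        # the multiplicand keeps its units.
        if isinstance(other, SimpleTime):
            if other.unit != Unit.DURATION:
                raise RuntimeError("multiplier must be a number or DURATION")
            other = other.value
        return SimpleTime(self.value * other, self.unit)

    def __truediv__(self, other):
        if isinstance(other, SimpleTime):
            # Time / Time with matching units yields a DURATION.
            if self.unit != other.unit:
                raise RuntimeError("cannot divide times with different units")
            return SimpleTime(self.value / other.value, Unit.DURATION)
        # Time / number keeps the dividend's units.
        return SimpleTime(self.value / other, self.unit)
```

For instance, `SimpleTime(4, Unit.EPOCH) / SimpleTime(2, Unit.EPOCH)` yields a value of 2.0 with DURATION units, mirroring the doctest above.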
- Parameters
value (int | float) – The amount of time.
unit (str | TimeUnit) – The unit of time.
- classmethod from_batch(batch)[source]#
Create a Time with units of TimeUnit.BATCH. Equivalent to Time(batch, TimeUnit.BATCH).
- classmethod from_duration(duration)[source]#
Create a Time with units of TimeUnit.DURATION. Equivalent to Time(duration, TimeUnit.DURATION).
- classmethod from_epoch(epoch)[source]#
Create a Time with units of TimeUnit.EPOCH. Equivalent to Time(epoch, TimeUnit.EPOCH).
- classmethod from_sample(sample)[source]#
Create a Time with units of TimeUnit.SAMPLE. Equivalent to Time(sample, TimeUnit.SAMPLE).
- classmethod from_timestring(timestring)[source]#
Parse a time string into a Time instance. A time string is a numerical value followed by the value of a TimeUnit enum. For example:
>>> Time.from_timestring("5ep")  # describes 5 epochs.
Time(5, TimeUnit.EPOCH)
>>> Time.from_timestring("3e4tok")  # describes 30,000 tokens.
Time(30000, TimeUnit.TOKEN)
>>> Time.from_timestring("0.5dur")  # describes 50% of the training process.
Time(0.5, TimeUnit.DURATION)
- Returns
Time – An instance of Time.
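A time string like the ones above splits into a numeric prefix and a unit suffix. As a rough sketch of such a parser (not composer's implementation), the grammar can be handled with a single regular expression. The suffixes "ep", "tok", and "dur" appear in the examples above; "ba" and "sp" are assumed here for batches and samples:

```python
import re

# Unit suffixes: "ep", "tok", "dur" are from the examples above;
# "ba" (batches) and "sp" (samples) are assumptions for this sketch.
_UNIT_SUFFIXES = ("ep", "ba", "sp", "tok", "dur")

_TIMESTRING_RE = re.compile(r"^(?P<value>\d+(?:\.\d+)?(?:e\d+)?)(?P<unit>[a-z]+)$")


def parse_timestring(timestring):
    """Split a time string like '5ep' or '3e4tok' into (value, unit_suffix)."""
    match = _TIMESTRING_RE.match(timestring)
    if match is None or match.group("unit") not in _UNIT_SUFFIXES:
        raise ValueError(f"invalid time string: {timestring!r}")
    raw = match.group("value")
    # '5' stays an int; '0.5' and '3e4' go through float.
    value = float(raw) if ("." in raw or "e" in raw) else int(raw)
    if isinstance(value, float) and value.is_integer() and "." not in raw:
        value = int(value)  # '3e4' -> 30000, matching the doctest above
    return value, match.group("unit")
```

With this sketch, `parse_timestring("3e4tok")` returns `(30000, "tok")`, consistent with the `from_timestring` example.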
- classmethod from_token(token)[source]#
Create a Time with units of TimeUnit.TOKEN. Equivalent to Time(token, TimeUnit.TOKEN).
- to_timestring()[source]#
Get the time-string representation. For example:
>>> Time(5, TimeUnit.EPOCH).to_timestring()
'5ep'
- Returns
str – The time-string representation.
- property unit#
The unit of the time.
- property value#
The value of the time, as a number.
- class composer.core.time.TimeUnit(value)[source]#
Bases:
composer.utils.string_enum.StringEnum
Enum class to represent units of time for the training process.
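A string-valued enum of this shape can be sketched as follows. The member names follow the units named on this page; the short values mirror the time-string suffixes ("ep", "tok", and "dur" appear in the examples above, while "ba" and "sp" are assumed for this sketch):

```python
from enum import Enum


class TimeUnitSketch(str, Enum):
    """Illustrative stand-in for a string-valued unit enum."""
    # Member values double as the suffixes used in time strings.
    EPOCH = "ep"
    BATCH = "ba"      # assumed suffix
    SAMPLE = "sp"     # assumed suffix
    TOKEN = "tok"
    DURATION = "dur"
```

Because the enum mixes in str, members compare equal to their string values, which is what lets APIs accept either an enum member or a plain string like "tok".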
- class composer.core.time.Timestamp(epoch=0, batch=0, sample=0, token=0, batch_in_epoch=0, sample_in_epoch=0, token_in_epoch=0, total_wct=None, epoch_wct=None, batch_wct=None)[source]#
Bases:
composer.core.serializable.Serializable
Timestamp represents a snapshot of the current training progress, in terms of epochs, batches, samples, and tokens. Timestamps are not updated in-place.
See the Time Guide for more details on tracking time during training.
- Parameters
epoch (int | Time[int], optional) – The epoch.
batch (int | Time[int], optional) – The batch.
sample (int | Time[int], optional) – The sample.
token (int | Time[int], optional) – The token.
batch_in_epoch (int | Time[int], optional) – The batch in the epoch.
sample_in_epoch (int | Time[int], optional) – The sample in the epoch.
token_in_epoch (int | Time[int], optional) – The token in the epoch.
total_wct (timedelta, optional) – The total wall-clock duration.
epoch_wct (timedelta, optional) – The wall-clock duration of the last epoch.
batch_wct (timedelta, optional) – The wall-clock duration of the last batch.
- property batch#
The total batch count.
- property batch_in_epoch#
The batch count in the current epoch (resets at 0 at the beginning of every epoch).
- property batch_wct#
The wall-clock duration (in seconds) for the last batch.
- copy(epoch=None, batch=None, sample=None, token=None, batch_in_epoch=None, sample_in_epoch=None, token_in_epoch=None, total_wct=None, epoch_wct=None, batch_wct=None)[source]#
Create a copy of the timestamp. Any specified values will override the existing values in the returned copy.
- Parameters
epoch (int | Time[int], optional) – The epoch.
batch (int | Time[int], optional) – The batch.
sample (int | Time[int], optional) – The sample.
token (int | Time[int], optional) – The token.
batch_in_epoch (int | Time[int], optional) – The batch in the epoch.
sample_in_epoch (int | Time[int], optional) – The sample in the epoch.
token_in_epoch (int | Time[int], optional) – The token in the epoch.
total_wct (timedelta, optional) – The elapsed duration from the beginning of training.
epoch_wct (timedelta, optional) – The wall-clock duration of the current epoch.
batch_wct (timedelta, optional) – The wall-clock duration of the last batch.
- Returns
Timestamp – A new timestamp instance, created from a copy, but with any specified values overriding the existing values.
- property epoch#
The total epoch count.
- property epoch_wct#
The wall-clock duration (in seconds) for the current epoch.
- property sample#
The total sample count.
- property sample_in_epoch#
The sample count in the current epoch (resets at 0 at the beginning of every epoch).
- to_next_batch(samples=0, tokens=0, duration=None)[source]#
Create a new Timestamp, with the batch, sample, and token counts properly incremented.
Equivalent to:
>>> timestamp.copy(
...     batch=timestamp.batch + 1,
...     batch_in_epoch=timestamp.batch_in_epoch + 1,
...     sample=timestamp.sample + samples,
...     sample_in_epoch=timestamp.sample_in_epoch + samples,
...     token=timestamp.token + tokens,
...     token_in_epoch=timestamp.token_in_epoch + tokens,
...     total_wct=timestamp.total_wct + duration,
...     epoch_wct=timestamp.epoch_wct + duration,
...     batch_wct=duration,
... )
Timestamp(...)
Note
For accurate time tracking when doing distributed training, samples and tokens should be the total across all ranks for the given batch; this method will not accumulate these counts automatically. If per-rank sample and token counts are provided, these counts will differ across ranks, which could lead to inconsistent behavior by Algorithm or Callback instances that use them.
- to_next_epoch()[source]#
Create a new Timestamp, incremented by one epoch and with batch_in_epoch, sample_in_epoch, and token_in_epoch reset.
Equivalent to:
>>> timestamp.copy(
...     epoch=timestamp.epoch + 1,
...     batch_in_epoch=0,
...     sample_in_epoch=0,
...     token_in_epoch=0,
...     epoch_wct=datetime.timedelta(seconds=0),
...     batch_wct=datetime.timedelta(seconds=0),
... )
Timestamp(...)
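The interaction between to_next_batch and to_next_epoch can be sketched with a small immutable dataclass; this is an illustration of the bookkeeping only (total counters keep growing, *_in_epoch counters reset), not composer's Timestamp:

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class TinyTimestamp:
    """Immutable sketch of Timestamp's batch/epoch bookkeeping."""
    epoch: int = 0
    batch: int = 0
    batch_in_epoch: int = 0
    sample: int = 0
    sample_in_epoch: int = 0

    def to_next_batch(self, samples=0):
        # Total counts and in-epoch counts advance together.
        return replace(
            self,
            batch=self.batch + 1,
            batch_in_epoch=self.batch_in_epoch + 1,
            sample=self.sample + samples,
            sample_in_epoch=self.sample_in_epoch + samples,
        )

    def to_next_epoch(self):
        # Totals are kept; only the *_in_epoch counters reset.
        return replace(
            self,
            epoch=self.epoch + 1,
            batch_in_epoch=0,
            sample_in_epoch=0,
        )
```

After two batches of 8 samples and one epoch boundary, the sketch reports `batch == 2` and `sample == 16` while `batch_in_epoch` and `sample_in_epoch` are back to 0, matching the reset behavior described above.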
- property token#
The total token count.
- property token_in_epoch#
The token count in the current epoch (resets at 0 at the beginning of every epoch).
- property total_wct#
The wall-clock duration (in seconds) from the beginning of training.