time#
Utilities to track training progress in terms of epochs, batches, samples, and tokens.
Callbacks, algorithms, and schedulers can use the current training time to fire at certain points in the training process.
The Timestamp class tracks the total number of epochs, batches, samples, and tokens. The trainer is responsible for updating it at the end of every epoch and batch. There is only one instance of the Timestamp, which is attached to the State.
The Time class represents static durations of training time, or points in the training process, in terms of a specific TimeUnit enum. This class supports comparisons, arithmetic, and conversions.
See the Time Guide for more details on tracking time during training.
Functions
- ensure_time – Ensure that a time-like value is an instance of Time.
Classes
- Time – Represents static durations of training time in terms of a TimeUnit enum.
- TimeUnit – Enum class to represent units of time for the training process.
- Timestamp – Represents a snapshot of the current training progress.
- class composer.core.time.Time(value, unit)[source]#
Bases: Generic[composer.core.time.TValue], composer.core.serializable.Serializable
Time represents static durations of training time in terms of a TimeUnit enum. See the Time Guide for more details on tracking time during training.
To construct an instance of Time, you can either:
Use a value followed by a TimeUnit enum or string. For example:
>>> Time(5, TimeUnit.EPOCH)  # describes 5 epochs.
Time(5, TimeUnit.EPOCH)
>>> Time(30_000, "tok")  # describes 30,000 tokens.
Time(30000, TimeUnit.TOKEN)
>>> Time(0.5, "dur")  # describes 50% of the training process.
Time(0.5, TimeUnit.DURATION)
Use one of the helper methods: from_epoch(), from_batch(), from_sample(), from_token(), from_duration(), or from_timestring().
Time supports addition and subtraction with other Time instances that share the same TimeUnit. For example:
>>> Time(1, TimeUnit.EPOCH) + Time(2, TimeUnit.EPOCH)
Time(3, TimeUnit.EPOCH)
Time supports multiplication. The multiplier must be either a number or have units of TimeUnit.DURATION. The multiplicand is scaled, and its units are kept.
>>> Time(2, TimeUnit.EPOCH) * 0.5
Time(1, TimeUnit.EPOCH)
>>> Time(2, TimeUnit.EPOCH) * Time(0.5, TimeUnit.DURATION)
Time(1, TimeUnit.EPOCH)
Time supports division. If the divisor is an instance of Time, then it must have the same units as the dividend, and the result has units of TimeUnit.DURATION. For example:
>>> Time(4, TimeUnit.EPOCH) / Time(2, TimeUnit.EPOCH)
Time(2.0, TimeUnit.DURATION)
If the divisor is a number, then the dividend is scaled, and it keeps its units. For example:
>>> Time(4, TimeUnit.EPOCH) / 2
Time(2, TimeUnit.EPOCH)
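The arithmetic rules above can be captured in a minimal stand-in class. This is a sketch, not Composer's implementation; SimpleTime and Unit are illustrative names, and only two units are shown:

```python
from dataclasses import dataclass
from enum import Enum

class Unit(Enum):
    EPOCH = "ep"
    DURATION = "dur"

@dataclass(frozen=True)
class SimpleTime:
    value: float
    unit: Unit

    def __add__(self, other):
        # Addition and subtraction require matching units.
        if self.unit != other.unit:
            raise RuntimeError("units must match")
        return SimpleTime(self.value + other.value, self.unit)

    def __sub__(self, other):
        if self.unit != other.unit:
            raise RuntimeError("units must match")
        return SimpleTime(self.value - other.value, self.unit)

    def __mul__(self, other):
        # The multiplier must be a plain number or a DURATION;
        # the result keeps the multiplicand's units.
        if isinstance(other, SimpleTime):
            if other.unit is not Unit.DURATION:
                raise RuntimeError("multiplier must be a number or DURATION")
            other = other.value
        return SimpleTime(self.value * other, self.unit)

    def __truediv__(self, other):
        # Time / Time (same units) yields a DURATION;
        # Time / number scales and keeps the units.
        if isinstance(other, SimpleTime):
            if self.unit != other.unit:
                raise RuntimeError("units must match")
            return SimpleTime(self.value / other.value, Unit.DURATION)
        return SimpleTime(self.value / other, self.unit)
```

For example, `SimpleTime(4, Unit.EPOCH) / SimpleTime(2, Unit.EPOCH)` yields `SimpleTime(2.0, Unit.DURATION)`, mirroring the doctest above.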
- Parameters
value (int | float) – The amount of time.
unit (str | TimeUnit) – The TimeUnit for value.
- classmethod from_batch(batch)[source]#
Create a Time with units of TimeUnit.BATCH.
Equivalent to Time(batch, TimeUnit.BATCH).
- classmethod from_duration(duration)[source]#
Create a Time with units of TimeUnit.DURATION.
Equivalent to Time(duration, TimeUnit.DURATION).
- classmethod from_epoch(epoch)[source]#
Create a Time with units of TimeUnit.EPOCH.
Equivalent to Time(epoch, TimeUnit.EPOCH).
- classmethod from_sample(sample)[source]#
Create a Time with units of TimeUnit.SAMPLE.
Equivalent to Time(sample, TimeUnit.SAMPLE).
- classmethod from_timestring(timestring)[source]#
Parse a time string into a Time instance.
A time string is a numerical value followed by the value of a TimeUnit enum. For example:
>>> Time.from_timestring("5ep")  # describes 5 epochs.
Time(5, TimeUnit.EPOCH)
>>> Time.from_timestring("3e4tok")  # describes 30,000 tokens.
Time(30000, TimeUnit.TOKEN)
>>> Time.from_timestring("0.5dur")  # describes 50% of the training process.
Time(0.5, TimeUnit.DURATION)
- Returns
Time – An instance of Time.
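A parser for this "value followed by unit suffix" grammar can be sketched with a regular expression. This is a simplified stand-in, not Composer's implementation; the "ba" and "sp" suffixes for batches and samples are assumed, as only "ep", "tok", and "dur" appear in the examples above:

```python
import re

# Recognized unit suffixes; "ba" (batch) and "sp" (sample) are assumptions.
UNITS = {"ep", "ba", "sp", "tok", "dur"}

def parse_timestring(timestring):
    """Split a string like '5ep' or '3e4tok' into (value, unit_suffix)."""
    match = re.fullmatch(r"([0-9.]+(?:[eE][0-9]+)?)([a-z]+)", timestring)
    if match is None or match.group(2) not in UNITS:
        raise ValueError(f"invalid time string: {timestring!r}")
    raw, unit = match.groups()
    value = float(raw)
    # Return an int when the value is integral, mirroring Time(30000, ...).
    return (int(value) if value == int(value) else value, unit)
```

For example, `parse_timestring("3e4tok")` returns `(30000, "tok")`, matching the scientific-notation doctest above.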
- classmethod from_token(token)[source]#
Create a Time with units of TimeUnit.TOKEN.
Equivalent to Time(token, TimeUnit.TOKEN).
- class composer.core.time.TimeUnit(value)[source]#
Bases: composer.utils.string_enum.StringEnum
Enum class to represent units of time for the training process.
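The unit suffixes used in time strings suggest a string-valued enum along these lines. This is a stdlib sketch, not Composer's StringEnum; the "ba" and "sp" member values are assumptions, since only "ep", "tok", and "dur" appear in this document's examples:

```python
from enum import Enum

class TimeUnitSketch(str, Enum):
    """String-valued enum: each member compares equal to its suffix."""
    EPOCH = "ep"
    BATCH = "ba"      # assumed suffix
    SAMPLE = "sp"     # assumed suffix
    TOKEN = "tok"
    DURATION = "dur"
```

Mixing in `str` lets callers pass either the member or the raw string, which is how `Time(30_000, "tok")` can accept `"tok"` in place of `TimeUnit.TOKEN`: `TimeUnitSketch("tok")` looks up `TimeUnitSketch.TOKEN` by value.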
- class composer.core.time.Timestamp(epoch=0, batch=0, sample=0, token=0, batch_in_epoch=0, sample_in_epoch=0, token_in_epoch=0, total_wct=None, epoch_wct=None, batch_wct=None)[source]#
Bases: composer.core.serializable.Serializable
Timestamp represents a snapshot of the current training progress.
The timestamp measures training progress in terms of epochs, batches, samples, tokens, and wall clock time. Timestamps are not updated in-place.
See the Time Guide for more details on tracking time during training.
- Parameters
epoch (int | Time[int], optional) – The epoch.
batch (int | Time[int], optional) – The batch.
sample (int | Time[int], optional) – The sample.
token (int | Time[int], optional) – The token.
batch_in_epoch (int | Time[int], optional) – The batch in the epoch.
sample_in_epoch (int | Time[int], optional) – The sample in the epoch.
token_in_epoch (int | Time[int], optional) – The token in the epoch.
total_wct (timedelta, optional) – The total wall-clock duration.
epoch_wct (timedelta, optional) – The wall-clock duration of the last epoch.
batch_wct (timedelta, optional) – The wall-clock duration of the last batch.
- property batch_in_epoch[source]#
The batch count in the current epoch (resets to 0 at the beginning of every epoch).
- copy(epoch=None, batch=None, sample=None, token=None, batch_in_epoch=None, sample_in_epoch=None, token_in_epoch=None, total_wct=None, epoch_wct=None, batch_wct=None)[source]#
Create a copy of the timestamp.
Any specified values will override the existing values in the returned copy.
- Parameters
batch_in_epoch (int | Time[int], optional) – The batch in the epoch.
sample_in_epoch (int | Time[int], optional) – The sample in the epoch.
token_in_epoch (int | Time[int], optional) – The token in the epoch.
total_wct (timedelta, optional) – The elapsed duration from the beginning of training.
- Returns
Timestamp – A new timestamp instance, created from a copy, but with any specified values overriding the existing values.
- get_state()[source]#
Returns all values of the timestamp object in a dictionary.
- Returns
Dict[str, Union[Time[int], datetime.timedelta]] – All values of the timestamp object.
- property sample_in_epoch[source]#
The sample count in the current epoch (resets to 0 at the beginning of every epoch).
- to_next_batch(samples=0, tokens=0, duration=None)[source]#
Create a new Timestamp, advanced to the next batch.
Equivalent to:
>>> timestamp.copy(
...     batch=timestamp.batch + 1,
...     batch_in_epoch=timestamp.batch_in_epoch + 1,
...     sample=timestamp.sample + samples,
...     sample_in_epoch=timestamp.sample_in_epoch + samples,
...     token=timestamp.token + tokens,
...     token_in_epoch=timestamp.token_in_epoch + tokens,
...     total_wct=timestamp.total_wct + duration,
...     epoch_wct=timestamp.epoch_wct + duration,
...     batch_wct=duration,
... )
Timestamp(...)
Note
For accurate time tracking when doing distributed training, samples and tokens should be the totals across all ranks for the given batch. This method will not accumulate these counts automatically. If per-rank sample and token counts are provided, these counts will differ across ranks, which could lead to inconsistent behavior by Algorithm or Callback instances that use these counts.
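The copy-and-advance pattern above, where timestamps are never mutated in place, can be sketched with a frozen dataclass. TinyTimestamp is an illustrative stand-in with only a few fields, not the Composer class:

```python
from dataclasses import dataclass, replace
from datetime import timedelta

@dataclass(frozen=True)
class TinyTimestamp:
    """Immutable snapshot of training progress (a few fields, for brevity)."""
    batch: int = 0
    sample: int = 0
    total_wct: timedelta = timedelta(0)

    def to_next_batch(self, samples=0, duration=timedelta(0)):
        # Return a NEW snapshot; the original object is unchanged.
        return replace(
            self,
            batch=self.batch + 1,
            sample=self.sample + samples,
            total_wct=self.total_wct + duration,
        )

start = TinyTimestamp()
after = start.to_next_batch(samples=32, duration=timedelta(seconds=2))
```

Because the dataclass is frozen, any code holding a reference to `start` still sees the old counts, which is why callbacks can safely compare a saved timestamp against the current one.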
- to_next_epoch()[source]#
Create a new Timestamp, advanced to the next epoch.
Equivalent to:
>>> timestamp.copy(
...     epoch=timestamp.epoch + 1,
...     batch_in_epoch=0,
...     sample_in_epoch=0,
...     token_in_epoch=0,
...     epoch_wct=datetime.timedelta(seconds=0),
...     batch_wct=datetime.timedelta(seconds=0),
... )
Timestamp(...)