composer.Event
Events represent specific points in the training loop where Algorithms and Callbacks can run.
Note
By convention, Callbacks should not modify the state; they are used for non-essential reporting functions such as logging or timing. Methods that need to modify the state should be implemented as an Algorithm.
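For example, a reporting-only callback might time each epoch. The sketch below assumes that Callback exposes per-event hook methods named after the events (e.g. epoch_start, epoch_end), each receiving the training state and a logger; exact signatures and logging calls vary between Composer versions.

```python
import time

from composer import Callback


class EpochTimer(Callback):
    """Reports how long each epoch takes; reads the state but never modifies it."""

    def epoch_start(self, state, logger):
        self._epoch_start_time = time.time()

    def epoch_end(self, state, logger):
        elapsed = time.time() - self._epoch_start_time
        # In practice you would report this through the provided logger;
        # the exact logging call differs between Composer versions.
        print(f"epoch took {elapsed:.1f}s")
```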
Events List
Available events include:
Name | Description
---|---
`INIT` | Immediately after model initialization, and before the creation of optimizers and schedulers. Model surgery typically occurs here.
`TRAINING_START` | Start of training. For multi-GPU training, runs after the DDP process fork.
`EPOCH_START`, `EPOCH_END` | Start and end of an epoch.
`BATCH_START`, `BATCH_END` | Start and end of a batch, inclusive of the optimizer step and any gradient scaling.
`AFTER_DATALOADER` | Immediately after the dataloader is called. Typically used for on-GPU dataloader transforms.
`BEFORE_TRAIN_BATCH`, `AFTER_TRAIN_BATCH` | Before and after the forward-loss-backward computation for a training batch. When using gradient accumulation, these are still called only once.
`BEFORE_FORWARD`, `AFTER_FORWARD` | Before and after the call to `model.forward()`.
`BEFORE_LOSS`, `AFTER_LOSS` | Before and after the loss computation.
`BEFORE_BACKWARD`, `AFTER_BACKWARD` | Before and after the backward pass.
`TRAINING_END` | End of training.
`EVAL_START`, `EVAL_END` | Start and end of evaluation through the validation dataset.
`EVAL_BATCH_START`, `EVAL_BATCH_END` | Before and after the call to `model.validate(batch)` for each evaluation batch.
`EVAL_BEFORE_FORWARD`, `EVAL_AFTER_FORWARD` | Before and after the call to `model.validate(batch)`.
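These names are what an Algorithm sees in its match method. Below is a minimal sketch, assuming Composer's Algorithm interface of match(event, state) and apply(event, state, logger); the gradient-noise helper is hypothetical and only illustrates where a state-modifying method would hook in.

```python
from composer import Algorithm, Event


class GradNoise(Algorithm):
    """Toy example: perturb gradients right after the backward pass."""

    def match(self, event, state):
        # The trainer calls match() at every event; return True only for
        # the events this algorithm cares about.
        return event == Event.AFTER_BACKWARD

    def apply(self, event, state, logger):
        add_gradient_noise(state.model)  # hypothetical helper, not part of Composer
```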
Training Loop
For a conceptual understanding of when events run within the trainer, see the pseudo-code outline below:
model = your_model()

<INIT> # model surgery here

optimizers = SGD(model.parameters(), lr=0.01)
schedulers = CosineAnnealing(optimizers, T_max=90)

ddp.launch() # for multi-GPUs, processes are forked here

<TRAINING_START> # has access to process rank for DDP

for epoch in range(90):
    <EPOCH_START>

    for batch in dataloader:
        <AFTER_DATALOADER>
        <BATCH_START>

        # -- closure: forward/backward/loss -- #
        <BEFORE_TRAIN_BATCH>

        # for gradient accumulation
        for microbatch in batch:
            <BEFORE_FORWARD>
            outputs = model.forward(microbatch)
            <AFTER_FORWARD>

            <BEFORE_LOSS>
            loss = model.loss(outputs, microbatch)
            <AFTER_LOSS>

            <BEFORE_BACKWARD>
            loss.backward()
            <AFTER_BACKWARD>

        gradient_unscaling()  # for mixed precision
        gradient_clipping()

        <AFTER_TRAIN_BATCH>
        # ------------------------------------ #

        optimizers.step()  # grad scaling (AMP) also

        <BATCH_END>

        schedulers.step('step')
        maybe_eval()

    schedulers.step('epoch')
    maybe_eval()

    <EPOCH_END>

<TRAINING_END>


def maybe_eval():
    <EVAL_START>

    for batch in eval_dataloader:
        <EVAL_BATCH_START>

        <EVAL_BEFORE_FORWARD>
        outputs, targets = model.validate(batch)
        <EVAL_AFTER_FORWARD>

        metrics.update(outputs, targets)

        <EVAL_BATCH_END>

    <EVAL_END>
Note
Several events occur right after each other (e.g. AFTER_DATALOADER and BATCH_START). We keep these separate because algorithms and callbacks may want to run, for example, only after all of the dataloader transforms.
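For instance, an algorithm that applies on-GPU batch transforms would target AFTER_DATALOADER so that it runs once every dataloader transform has finished, while anything that should see the finalized batch can hook BATCH_START instead. A minimal sketch follows; the (inputs, targets) batch layout and the normalization constants are assumptions made for illustration.

```python
from composer import Algorithm, Event


class GPUNormalize(Algorithm):
    """Normalizes inputs on the device, after all dataloader transforms have run."""

    def __init__(self, mean: float, std: float):
        self.mean = mean
        self.std = std

    def match(self, event, state):
        return event == Event.AFTER_DATALOADER

    def apply(self, event, state, logger):
        inputs, targets = state.batch  # assumes (inputs, targets) batches
        state.batch = ((inputs - self.mean) / self.std, targets)
```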
API Reference
- class composer.core.event.Event(value)
An event that occurs during the execution of the training and evaluation loops.
For a conceptual understanding of when events run within the trainer, see the pseudo-code outline in the Training Loop section above.
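Since Event is an enum, members can be looked up by value or enumerated directly. The sketch below assumes the conventional lowercase string values (e.g. "epoch_start"); check the values in your Composer version.

```python
from composer.core.event import Event

ev = Event("epoch_start")   # look up a member by its value (assumed lowercase string)
assert ev is Event.EPOCH_START

for event in Event:         # enumerate every event the trainer can fire
    print(event.name, event.value)
```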