composer.Event

Events represent specific points in the training loop where an Algorithm or a Callback can run.

Note

By convention, a Callback should not modify the state; callbacks are used for non-essential reporting functions such as logging or timing. Methods that need to modify state should be implemented as an Algorithm.
Events List

Available events include:

Name | Description
---|---
`INIT` | Immediately after `model` initialization, and before creation of `optimizers` and `schedulers`. Model surgery typically occurs here.
`TRAINING_START` | Start of training. For multi-GPU training, runs after the DDP process fork.
`EPOCH_START`, `EPOCH_END` | Start and end of an Epoch.
`BATCH_START`, `BATCH_END` | Start and end of a batch, inclusive of the optimizer step and any gradient scaling.
`AFTER_DATALOADER` | Immediately after the dataloader is called. Typically used for on-GPU dataloader transforms.
`BEFORE_TRAIN_BATCH`, `AFTER_TRAIN_BATCH` | Before and after the forward-loss-backward computation for a training batch. When using gradient accumulation, these are still called only once.
`BEFORE_FORWARD`, `AFTER_FORWARD` | Before and after the call to `model.forward()`.
`BEFORE_LOSS`, `AFTER_LOSS` | Before and after the loss computation.
`BEFORE_BACKWARD`, `AFTER_BACKWARD` | Before and after the backward pass.
`TRAINING_END` | End of training.
`EVAL_START`, `EVAL_END` | Start and end of evaluation through the validation dataset.
`EVAL_BATCH_START`, `EVAL_BATCH_END` | Before and after the call to `model.validate(batch)`.
`EVAL_BEFORE_FORWARD`, `EVAL_AFTER_FORWARD` | Before and after the call to `model.validate(batch)`.
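The names in the table map naturally onto a string-valued Enum. The sketch below is a self-contained illustration covering a subset of the events; the exact string values are an assumption, not necessarily those of `composer.core.event`:

```python
from enum import Enum


class Event(Enum):
    """Illustrative subset of the training-loop events listed above."""
    INIT = "init"
    TRAINING_START = "training_start"
    EPOCH_START = "epoch_start"
    AFTER_DATALOADER = "after_dataloader"
    BATCH_START = "batch_start"
    BEFORE_FORWARD = "before_forward"
    AFTER_FORWARD = "after_forward"
    BATCH_END = "batch_end"
    EPOCH_END = "epoch_end"
    TRAINING_END = "training_end"


# String values allow lookup by name, e.g. from a config file.
print(Event("init"))
print(Event.BATCH_START.value)
```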
Training Loop
For a conceptual understanding of when events are run within the trainer, see the below pseudo-code outline:
```
model = your_model()
<INIT>  # model surgery here
optimizers = SGD(model.parameters(), lr=0.01)
schedulers = CosineAnnealing(optimizers, T_max=90)

ddp.launch()  # for multi-GPUs, processes are forked here
<TRAINING_START>  # has access to process rank for DDP

for epoch in range(90):
    <EPOCH_START>

    for batch in dataloader:
        <AFTER_DATALOADER>
        <BATCH_START>

        # -- closure: forward/backward/loss -- #
        <BEFORE_TRAIN_BATCH>
        for microbatch in batch:  # for gradient accumulation
            <BEFORE_FORWARD>
            outputs = model.forward(microbatch)
            <AFTER_FORWARD>

            <BEFORE_LOSS>
            loss = model.loss(outputs, microbatch)
            <AFTER_LOSS>

            <BEFORE_BACKWARD>
            loss.backward()
            <AFTER_BACKWARD>

        gradient_unscaling()  # for mixed precision
        gradient_clipping()
        <AFTER_TRAIN_BATCH>
        # ------------------------------------ #

        optimizer.step()  # grad scaling (AMP) also
        <BATCH_END>

        scheduler.step('step')
        maybe_eval()

    scheduler.step('epoch')
    maybe_eval()
    <EPOCH_END>

<TRAINING_END>

def maybe_eval():
    <EVAL_START>
    for batch in eval_dataloader:
        <EVAL_BATCH_START>

        <EVAL_BEFORE_FORWARD>
        outputs, targets = model.validate(batch)
        <EVAL_AFTER_FORWARD>

        metrics.update(outputs, targets)
        <EVAL_BATCH_END>
    <EVAL_END>
```
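The firing order above can be made concrete with a minimal, runnable dispatch loop. This is a toy sketch with hypothetical names (`RecordingCallback`, `train_one_epoch`), not the real composer trainer, and it fires only a subset of the events:

```python
from enum import Enum


class Event(Enum):
    EPOCH_START = "epoch_start"
    AFTER_DATALOADER = "after_dataloader"
    BATCH_START = "batch_start"
    BATCH_END = "batch_end"
    EPOCH_END = "epoch_end"


class RecordingCallback:
    """Toy callback: records every event it observes, mutates nothing."""
    def __init__(self):
        self.seen = []

    def run_event(self, event: Event) -> None:
        self.seen.append(event)


def train_one_epoch(batches, callbacks):
    """Fires events in the order shown in the pseudo-code outline."""
    def fire(event):
        for cb in callbacks:
            cb.run_event(event)

    fire(Event.EPOCH_START)
    for _ in batches:
        fire(Event.AFTER_DATALOADER)  # dataloader transforms have completed
        fire(Event.BATCH_START)       # fires immediately after AFTER_DATALOADER
        # ... forward/loss/backward events would fire here ...
        fire(Event.BATCH_END)
    fire(Event.EPOCH_END)


cb = RecordingCallback()
train_one_epoch(batches=[0, 1], callbacks=[cb])
print([e.value for e in cb.seen])
```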
Note
Several events occur right after each other (e.g. `AFTER_DATALOADER` and `BATCH_START`). We keep these separate because algorithms/callbacks may want to run, for example, after all the dataloader transforms.
API Reference
- class composer.core.event.Event(value)[source]
Enum to represent events.
- INIT
Immediately after `model` initialization, and before creation of `optimizers` and `schedulers`. Model surgery typically occurs here.
- TRAINING_START
Start of training. For multi-GPU training, runs after the DDP process fork.
- EPOCH_START
Start of an epoch.
- BATCH_START
Start of a batch.
- AFTER_DATALOADER
Immediately after the dataloader is called. Typically used for on-GPU dataloader transforms.
- BEFORE_TRAIN_BATCH
Before the forward-loss-backward computation for a training batch. When using gradient accumulation, this is still called only once.
- BEFORE_FORWARD
Before the call to
model.forward()
.
- AFTER_FORWARD
After the call to
model.forward()
.
- BEFORE_LOSS
Before the call to
model.loss()
.
- AFTER_LOSS
After the call to
model.loss()
.
- BEFORE_BACKWARD
Before the call to
loss.backward()
.
- AFTER_BACKWARD
After the call to
loss.backward()
.
- AFTER_TRAIN_BATCH
After the forward-loss-backward computation for a training batch. When using gradient accumulation, this is still called only once.
- BATCH_END
End of a batch, which occurs after the optimizer step and any gradient scaling.
- EPOCH_END
End of an epoch.
- TRAINING_END
End of training.
- EVAL_START
Start of evaluation through the validation dataset.
- EVAL_BATCH_START
Before the call to `model.validate(batch)`.
- EVAL_BEFORE_FORWARD
Before the call to `model.validate(batch)`.
- EVAL_AFTER_FORWARD
After the call to `model.validate(batch)`.
- EVAL_BATCH_END
After the call to `model.validate(batch)`.
- EVAL_END
End of evaluation through the validation dataset.
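Because each event has a string value, a callback runner can route an event to a like-named method on the callback. The sketch below shows this pattern with hypothetical names (`MyCallback`, `dispatch`); the real composer engine's dispatch mechanism may differ:

```python
from enum import Enum


class Event(Enum):
    EPOCH_START = "epoch_start"
    EPOCH_END = "epoch_end"


class MyCallback:
    """Implements a handler only for the events it cares about."""
    def epoch_start(self):
        return "epoch started"


def dispatch(callback, event):
    """Route an event to the method named after its string value, if defined."""
    handler = getattr(callback, event.value, None)
    return handler() if handler is not None else None


print(dispatch(MyCallback(), Event.EPOCH_START))
print(dispatch(MyCallback(), Event.EPOCH_END))
```

Unhandled events are simply skipped, so a callback only needs to define methods for the events it uses.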