composer.Event

Events represent specific points in the training loop where an Algorithm or a Callback can run.

Note

By convention, a Callback should not modify the state; callbacks are used for non-essential reporting functions such as logging or timing. Methods that need to modify state should be implemented as an Algorithm.
Events List

Available events include:

Name | Description
---|---
`INIT` | Immediately after `model` initialization, and before creation of `optimizers` and `schedulers`. Model surgery typically occurs here.
`TRAINING_START` | Start of training. For multi-GPU training, runs after the DDP process fork.
`EPOCH_START`, `EPOCH_END` | Start and end of an Epoch.
`BATCH_START`, `BATCH_END` | Start and end of a batch, inclusive of the optimizer step and any gradient scaling.
`AFTER_DATALOADER` | Immediately after the dataloader is called. Typically used for on-GPU dataloader transforms.
`BEFORE_TRAIN_BATCH`, `AFTER_TRAIN_BATCH` | Before and after the forward-loss-backward computation for a training batch. When using gradient accumulation, these are still called only once.
`BEFORE_FORWARD`, `AFTER_FORWARD` | Before and after the call to `model.forward()`.
`BEFORE_LOSS`, `AFTER_LOSS` | Before and after the loss computation.
`BEFORE_BACKWARD`, `AFTER_BACKWARD` | Before and after the backward pass.
`TRAINING_END` | End of training.
`EVAL_START`, `EVAL_END` | Start and end of evaluation through the validation dataset.
`EVAL_BATCH_START`, `EVAL_BATCH_END` | Before and after the call to `model.validate(batch)`.
`EVAL_BEFORE_FORWARD`, `EVAL_AFTER_FORWARD` | Before and after the call to `model.validate(batch)`.
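The names in the table map naturally onto a string-valued Enum. The sketch below is a self-contained illustration covering a subset of the events; the exact string values are an assumption, not necessarily those of `composer.core.event`:

```python
from enum import Enum


class Event(Enum):
    """Illustrative subset of the training-loop events listed above."""
    INIT = "init"
    TRAINING_START = "training_start"
    EPOCH_START = "epoch_start"
    AFTER_DATALOADER = "after_dataloader"
    BATCH_START = "batch_start"
    BEFORE_FORWARD = "before_forward"
    AFTER_FORWARD = "after_forward"
    BATCH_END = "batch_end"
    EPOCH_END = "epoch_end"
    TRAINING_END = "training_end"


# String values allow lookup by name, e.g. from a config file.
print(Event("init"))
print(Event.BATCH_START.value)
```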
Training Loop
For a conceptual understanding of when events are run within the trainer, see the below pseudo-code outline:
```
model = your_model()
<INIT>  # model surgery here
optimizers = SGD(model.parameters(), lr=0.01)
schedulers = CosineAnnealing(optimizers, T_max=90)

ddp.launch()  # for multi-GPUs, processes are forked here
<TRAINING_START>  # has access to process rank for DDP

for epoch in range(90):
    <EPOCH_START>

    for batch in dataloader:
        <AFTER_DATALOADER>
        <BATCH_START>

        # -- closure: forward/backward/loss -- #
        <BEFORE_TRAIN_BATCH>
        for microbatch in batch:  # for gradient accumulation
            <BEFORE_FORWARD>
            outputs = model.forward(microbatch)
            <AFTER_FORWARD>

            <BEFORE_LOSS>
            loss = model.loss(outputs, microbatch)
            <AFTER_LOSS>

            <BEFORE_BACKWARD>
            loss.backward()
            <AFTER_BACKWARD>

        gradient_unscaling()  # for mixed precision
        gradient_clipping()
        <AFTER_TRAIN_BATCH>
        # ------------------------------------ #

        optimizer.step()  # grad scaling (AMP) also
        <BATCH_END>

        scheduler.step('step')
        maybe_eval()

    scheduler.step('epoch')
    maybe_eval()
    <EPOCH_END>

<TRAINING_END>

def maybe_eval():
    <EVAL_START>
    for batch in eval_dataloader:
        <EVAL_BATCH_START>

        <EVAL_BEFORE_FORWARD>
        outputs, targets = model.validate(batch)
        <EVAL_AFTER_FORWARD>

        metrics.update(outputs, targets)
        <EVAL_BATCH_END>
    <EVAL_END>
```
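The firing order above can be made concrete with a minimal, runnable dispatch loop. This is a toy sketch with hypothetical names (`RecordingCallback`, `train_one_epoch`), not the real composer trainer, and it fires only a subset of the events:

```python
from enum import Enum


class Event(Enum):
    EPOCH_START = "epoch_start"
    AFTER_DATALOADER = "after_dataloader"
    BATCH_START = "batch_start"
    BATCH_END = "batch_end"
    EPOCH_END = "epoch_end"


class RecordingCallback:
    """Toy callback: records every event it observes, mutates nothing."""
    def __init__(self):
        self.seen = []

    def run_event(self, event: Event) -> None:
        self.seen.append(event)


def train_one_epoch(batches, callbacks):
    """Fires events in the order shown in the pseudo-code outline."""
    def fire(event):
        for cb in callbacks:
            cb.run_event(event)

    fire(Event.EPOCH_START)
    for _ in batches:
        fire(Event.AFTER_DATALOADER)  # dataloader transforms have completed
        fire(Event.BATCH_START)       # fires immediately after AFTER_DATALOADER
        # ... forward/loss/backward events would fire here ...
        fire(Event.BATCH_END)
    fire(Event.EPOCH_END)


cb = RecordingCallback()
train_one_epoch(batches=[0, 1], callbacks=[cb])
print([e.value for e in cb.seen])
```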
Note
Several events occur right after each other (e.g. `AFTER_DATALOADER` and `BATCH_START`). We keep these separate because algorithms/callbacks may want to run, for example, after all the dataloader transforms.
API Reference
- class composer.core.event.Event(value)[source]
Enum to represent events.
- INIT
Immediately after `model` initialization, and before creation of `optimizers` and `schedulers`. Model surgery typically occurs here.
- TRAINING_START
Start of training. For multi-GPU training, runs after the DDP process fork.
- EPOCH_START
Start of an epoch.
- BATCH_START
Start of a batch.
- AFTER_DATALOADER
Immediately after the dataloader is called. Typically used for on-GPU dataloader transforms.
- BEFORE_TRAIN_BATCH
Before the forward-loss-backward computation for a training batch. When using gradient accumulation, this is still called only once.
- BEFORE_FORWARD
Before the call to
model.forward()
.
- AFTER_FORWARD
After the call to
model.forward()
.
- BEFORE_LOSS
Before the call to
model.loss()
.
- AFTER_LOSS
After the call to
model.loss()
.
- BEFORE_BACKWARD
Before the call to
loss.backward()
.
- AFTER_BACKWARD
After the call to
loss.backward()
.
- AFTER_TRAIN_BATCH
After the forward-loss-backward computation for a training batch. When using gradient accumulation, this is still called only once.
- BATCH_END
End of a batch, which occurs after the optimizer step and any gradient scaling.
- EPOCH_END
End of an epoch.
- TRAINING_END
End of training.
- EVAL_START
Start of evaluation through the validation dataset.
- EVAL_BATCH_START
Before the call to `model.validate(batch)`.
- EVAL_BEFORE_FORWARD
Before the call to `model.validate(batch)`.
- EVAL_AFTER_FORWARD
After the call to `model.validate(batch)`.
- EVAL_BATCH_END
After the call to `model.validate(batch)`.
- EVAL_END
End of evaluation through the validation dataset.
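Because each event has a string value, a callback runner can route an event to a like-named method on the callback. The sketch below shows this pattern with hypothetical names (`MyCallback`, `dispatch`); the real composer engine's dispatch mechanism may differ:

```python
from enum import Enum


class Event(Enum):
    EPOCH_START = "epoch_start"
    EPOCH_END = "epoch_end"


class MyCallback:
    """Implements a handler only for the events it cares about."""
    def epoch_start(self):
        return "epoch started"


def dispatch(callback, event):
    """Route an event to the method named after its string value, if defined."""
    handler = getattr(callback, event.value, None)
    return handler() if handler is not None else None


print(dispatch(MyCallback(), Event.EPOCH_START))
print(dispatch(MyCallback(), Event.EPOCH_END))
```

Unhandled events are simply skipped, so a callback only needs to define methods for the events it uses.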