composer.Event

Events represent specific points in the training loop where Algorithms and Callbacks can run.

Note

By convention, a Callback should not modify the state; callbacks are used for non-essential reporting functions such as logging or timing. Logic that needs to modify state should be implemented as an Algorithm.
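
For example, a timing callback fits this convention, since it only observes the run. The sketch below assumes the Callback base class exposes hook methods named after the lowercase event names (epoch_start, epoch_end) and that it can be imported from composer.core; treat the import paths and hook names as assumptions rather than the exact API.

import time

from composer.core import Callback, State          # import paths are assumptions
from composer.core.logging import Logger

class EpochTimer(Callback):
    """Reports wall-clock time per epoch; never mutates state, per the convention above."""

    def __init__(self):
        super().__init__()
        self._epoch_start_time = 0.0

    def epoch_start(self, state: State, logger: Logger) -> None:
        # Fires on the EPOCH_START event.
        self._epoch_start_time = time.monotonic()

    def epoch_end(self, state: State, logger: Logger) -> None:
        # Fires on the EPOCH_END event; only reads timing, never touches training state.
        elapsed = time.monotonic() - self._epoch_start_time
        print(f"epoch took {elapsed:.1f}s")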

Events List

Available events include:

INIT

Immediately after model initialization, and before creation of optimizers and schedulers. Model surgery typically occurs here.

TRAINING_START

Start of training. For multi-GPU training, runs after the DDP process fork.

EPOCH_START, EPOCH_END

Start and end of an epoch.

BATCH_START, BATCH_END

Start and end of a batch, inclusive of the optimizer step and any gradient scaling.

AFTER_DATALOADER

Immediately after the dataloader is called. Typically used for on-GPU dataloader transforms (see the sketch after this list).

BEFORE_TRAIN_BATCH, AFTER_TRAIN_BATCH

Before and after the forward-loss-backward computation for a training batch. When using gradient accumulation, these are still called only once per batch.

BEFORE_FORWARD, AFTER_FORWARD

Before and after the call to model.forward().

BEFORE_LOSS, AFTER_LOSS

Before and after the loss computation.

BEFORE_BACKWARD, AFTER_BACKWARD

Before and after the backward pass.

TRAINING_END

End of training.

EVAL_START, EVAL_END

Start and end of evaluation through the validation dataset.

EVAL_BATCH_START, EVAL_BATCH_END

Start and end of an evaluation batch; these bracket the call to model.validate(batch) and the metrics update.

EVAL_BEFORE_FORWARD, EVAL_AFTER_FORWARD

Before and after the call to model.validate(batch).
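
As a concrete illustration of the AFTER_DATALOADER entry above, the sketch below normalizes the batch right after the dataloader returns it, the point the docs suggest for on-GPU transforms. It assumes the Algorithm interface consists of match(event, state) and apply(event, state, logger), that the current batch is available as state.batch in an (inputs, targets) layout, and that the shown import paths exist; all of these are assumptions made for the example.

from composer.core import Algorithm, Event, State  # import paths are assumptions
from composer.core.logging import Logger

class GpuNormalize(Algorithm):
    """Normalizes inputs once the dataloader has produced the batch."""

    def __init__(self, mean: float, std: float):
        super().__init__()
        self.mean = mean
        self.std = std

    def match(self, event: Event, state: State) -> bool:
        # Run right after the dataloader, before BATCH_START.
        return event == Event.AFTER_DATALOADER

    def apply(self, event: Event, state: State, logger: Logger) -> None:
        inputs, targets = state.batch               # assumed (inputs, targets) layout
        state.batch = ((inputs - self.mean) / self.std, targets)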

Training Loop

For a conceptual understanding of when events are run within the trainer, see the pseudo-code outline below:

model = your_model()
<INIT>  # model surgery here
optimizer = SGD(model.parameters(), lr=0.01)
scheduler = CosineAnnealing(optimizer, T_max=90)

ddp.launch()  # for multi-GPUs, processes are forked here
<TRAINING_START>  # has access to process rank for DDP

for epoch in range(90):
    <EPOCH_START>

    for batch in dataloader:
        <AFTER_DATALOADER>
        <BATCH_START>

        #-- closure: forward/backward/loss -- #
        <BEFORE_TRAIN_BATCH>

        # for gradient accumulation
        for microbatch in batch:
            <BEFORE_FORWARD>
            outputs = model.forward(microbatch)
            <AFTER_FORWARD>
            <BEFORE_LOSS>
            loss = model.loss(outputs, microbatch)
            <AFTER_LOSS>
            <BEFORE_BACKWARD>
            loss.backward()
            <AFTER_BACKWARD>

        gradient_unscaling()  # for mixed precision
        gradient_clipping()
        <AFTER_TRAIN_BATCH>
        # -------------------------- #

        optimizer.step()  # gradient scaling (AMP) also happens here

        <BATCH_END>
        scheduler.step('step')
        maybe_eval()

    scheduler.step('epoch')
    maybe_eval()
    <EPOCH_END>

<TRAINING_END>

def maybe_eval():
    <EVAL_START>

    for batch in eval_dataloader:
        <EVAL_BATCH_START>

        <EVAL_BEFORE_FORWARD>
        outputs, targets = model.validate(batch)
        <EVAL_AFTER_FORWARD>

        metrics.update(outputs, targets)
        <EVAL_BATCH_END>

    <EVAL_END>

Note

Several events occur right after each other (e.g. AFTER_DATALOADER and BATCH_START). We keep these separate because algorithms/callbacks may want to run, for example, after all the dataloader transforms.
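
The pseudo-code also makes the gradient accumulation behavior concrete: BEFORE_FORWARD and AFTER_FORWARD fire once per microbatch, while BEFORE_TRAIN_BATCH and AFTER_TRAIN_BATCH fire once per optimizer step. A small counting callback can make that visible; as above, the hook-method names and import paths are assumptions, not the exact API.

from composer.core import Callback, State          # import paths are assumptions
from composer.core.logging import Logger

class ForwardCounter(Callback):
    """Counts forward passes within a single training batch."""

    def __init__(self):
        super().__init__()
        self.forward_calls = 0

    def before_train_batch(self, state: State, logger: Logger) -> None:
        self.forward_calls = 0    # fires once per batch

    def before_forward(self, state: State, logger: Logger) -> None:
        self.forward_calls += 1   # fires once per microbatch

    def after_train_batch(self, state: State, logger: Logger) -> None:
        # With gradient accumulation over N microbatches, this reports N.
        print(f"forward passes this batch: {self.forward_calls}")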

API Reference

class composer.core.event.Event(value)

Enum to represent events.

For a conceptual understanding of when each event runs within the trainer, see the pseudo-code outline in the Training Loop section above.

INIT

Immediately after model initialization, and before creation of optimizers and schedulers. Model surgery typically occurs here.

TRAINING_START

Start of training. For multi-GPU training, runs after the DDP process fork.

EPOCH_START

Start of an epoch.

BATCH_START

Start of a batch.

AFTER_DATALOADER

Immediately after the dataloader is called. Typically used for on-GPU dataloader transforms.

BEFORE_TRAIN_BATCH

Before the forward-loss-backward computation for a training batch. When using gradient accumulation, this is still called only once.

BEFORE_FORWARD

Before the call to model.forward().

AFTER_FORWARD

After the call to model.forward().

BEFORE_LOSS

Before the call to model.loss().

AFTER_LOSS

After the call to model.loss().

BEFORE_BACKWARD

Before the call to loss.backward().

AFTER_BACKWARD

After the call to loss.backward().

AFTER_TRAIN_BATCH

After the forward-loss-backward computation for a training batch. When using gradient accumulation, this is still called only once.

BATCH_END

End of a batch, which occurs after the optimizer step and any gradient scaling.

EPOCH_END

End of an epoch.

TRAINING_END

End of training.

EVAL_START

Start of evaluation through the validation dataset.

EVAL_BATCH_START

Start of an evaluation batch, before the call to model.validate(batch).

EVAL_BEFORE_FORWARD

Before the call to model.validate(batch).

EVAL_AFTER_FORWARD

After the call to model.validate(batch).

EVAL_BATCH_END

End of an evaluation batch, after the call to model.validate(batch) and the metrics update.

EVAL_END

End of evaluation through the validation dataset.
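
Since Event(value) is a standard Python Enum (as the class signature above suggests), it supports the usual iteration and comparison patterns. A minimal sketch, with no Composer-specific assumptions beyond the import path shown in the class name:

from composer.core.event import Event

# List every event the trainer can fire.
for event in Event:
    print(event.name, event.value)

# Members compare by identity, which is how an Algorithm typically selects
# the events it wants to act on.
current = Event.BEFORE_FORWARD
assert current == Event.BEFORE_FORWARD
assert current in {Event.BEFORE_FORWARD, Event.AFTER_FORWARD}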