composer.models.transformer_shared

Classes

ComposerModel

The minimal interface needed to use a model with composer.trainer.Trainer.

ComposerTransformer

Implements the base logic that all Transformers can build on top of.

LanguageCrossEntropyLoss

Hugging Face compatible cross entropy loss.

Attributes

  • Mapping

  • TYPE_CHECKING

  • Tuple

  • annotations

  • log

class composer.models.transformer_shared.ComposerTransformer(module, config, tokenizer, gradient_checkpointing=False)[source]

Bases: composer.models.base.ComposerModel

Implements the base logic that all Transformers can build on top of.

Works with Hugging Face Transformers.

Parameters
  • module (transformers.PreTrainedModel) – An instance of PreTrainedModel that contains the forward pass function.

  • config (transformers.PretrainedConfig) – The PretrainedConfig object that stores information about the model hyperparameters.

  • tokenizer (transformers.PreTrainedTokenizer) – The tokenizer used for this model, necessary to assert required model inputs.

  • gradient_checkpointing (bool, optional) – Whether to enable gradient checkpointing. Default: False.
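
Example

A minimal construction sketch, assuming a Hugging Face BERT and the module path shown above. Note that ComposerTransformer leaves loss() unimplemented, so a real model subclasses it (see loss() below); the checkpoint name and model classes here are illustrative assumptions.

```python
# Minimal construction sketch; the BERT classes and checkpoint name are
# illustrative assumptions, not requirements of ComposerTransformer.
import transformers
from composer.models.transformer_shared import ComposerTransformer

config = transformers.BertConfig()                                # model hyperparameters
tokenizer = transformers.AutoTokenizer.from_pretrained("bert-base-uncased")
module = transformers.BertForMaskedLM(config)                     # provides the forward pass
model = ComposerTransformer(module=module, config=config, tokenizer=tokenizer)
```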

get_model_inputs()[source]

Returns a set of inputs that the model expects in the forward pass.

If an algorithm wants to interact with the model inputs (for instance, popping the labels for a custom loss function, or adding attention head masks for head pruning), it must access self.set_model_inputs().

Returns

The set of keys that are expected in the Mapping used to compute the forward pass.
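
Example

A hedged sketch of how an algorithm might consult these keys before modifying the batch; the variable names and logic here are hypothetical.

```python
# Hypothetical snippet inside an algorithm that wants to compute its own loss:
# check the expected input keys before popping the labels out of the batch.
model_inputs = model.get_model_inputs()
if "labels" in model_inputs and "labels" in batch:
    labels = batch.pop("labels")  # a custom loss is computed against these later
```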

loss(outputs, batch)[source]

Computes the loss from the model outputs and the batch.

We don't implement this for the generic Transformer abstraction, since loss functions are model- and objective-specific. A single model architecture could use a myriad of loss functions, which are better left to be expressed by the user.

Parameters
  • outputs (Mapping) – The dictionary output from the model. It could contain the loss as computed by Hugging Face, or algorithms can pop the labels from the input in case they modify the loss function.

  • batch (Batch) – The set of ground-truth labels to compute the loss against.

Returns

The loss as a Tensors object.

Raises

NotImplementedError – A model-specific and task-specific loss function must be written.
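
Example

Since the base class raises NotImplementedError, a downstream model overrides loss(). The subclass below is an illustrative sketch, not Composer's built-in language model.

```python
import torch.nn.functional as F
from composer.models.transformer_shared import ComposerTransformer

class MyCausalLMTransformer(ComposerTransformer):
    def loss(self, outputs, batch):
        # Prefer the loss Hugging Face already computed when labels were passed in.
        if outputs.get("loss") is not None:
            return outputs["loss"]
        # Otherwise (e.g. an algorithm popped the labels), compute cross entropy
        # directly. Shifting of logits/labels for causal LM is omitted in this sketch.
        logits = outputs["logits"]
        return F.cross_entropy(logits.view(-1, logits.size(-1)), batch["labels"].view(-1))
```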

metrics(train=False)[source]

Get metrics for evaluating the model.

Downstream models should override this method if they would like to add task-specific metrics.

Parameters

train (bool) – A boolean flag indicating whether to return training or validation metrics.

Warning

If train=True, the training loss might be calculated twice when algorithms override the loss function. This could be expensive due to the computational cost of softmax; it is worth exploring caching strategies.

Returns

A Metrics object that can be used to calculate task performance.
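
Example

An illustrative override for a downstream subclass. The choice of metric is task-specific; LanguageCrossEntropyLoss is used here only because this module already documents it, and its constructor arguments may differ across Composer versions.

```python
from torchmetrics import MetricCollection
from composer.models.transformer_shared import ComposerTransformer, LanguageCrossEntropyLoss

class MyLMTransformer(ComposerTransformer):
    def metrics(self, train=False):
        # This sketch returns the same metric collection for training and validation.
        return MetricCollection([LanguageCrossEntropyLoss()])
```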

validate(batch)[source]

Runs the validation step.

Parameters

batch (Batch) – A dictionary of type Dict[str, Tensor] containing the inputs that the model expects, as found in ComposerTransformer.get_model_inputs().

Returns

Tuple[Mapping, None] – A tuple containing the output from the forward pass. This is fed directly into the metrics returned by metrics().
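
Example

A rough sketch of the evaluation pattern that validate() and metrics() support together; the exact Trainer internals may differ.

```python
# Roughly how validate() and metrics() fit together during evaluation; the real
# Trainer loop additionally handles devices, distributed syncing, and more.
metrics = model.metrics(train=False)
for batch in eval_dataloader:                  # eval_dataloader is assumed to exist
    outputs, targets = model.validate(batch)   # targets is None for this base class
    metrics.update(outputs, targets)
results = metrics.compute()
```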