gated_linear_units
Functions

apply_gated_linear_units – Replaces the Linear layers in the feed-forward network with Gated Linear Units.

from_BertIntermediate – Defines a replacement policy from a transformers.models.bert.modeling_bert.BertIntermediate to a torch.nn.Identity.

from_BertOutput – Defines a replacement policy from a transformers.models.bert.modeling_bert.BertOutput to a composer.algorithms.gated_linear_units.gated_linear_unit_layers.BERTGatedFFOutput.
Classes

Algorithm – Base class for algorithms.

BERTGatedFFOutput – Defines a single feed-forward block that uses Gated Linear Units.

BertForMaskedLM – Bert Model with a language modeling head on top.

BertForSequenceClassification – Bert Model transformer with a sequence classification/regression head on top (a linear layer on top of the pooled output), e.g. for GLUE tasks.

Event – Enum to represent training loop events.

GatedLinearUnits – Replaces all instances of Linear layers in the feed-forward subnetwork with a Gated Linear Unit.

HuggingFaceModel – A wrapper class that converts 🤗 Transformers models to composer models.

Logger – An interface to record training data.

State – The state of the trainer.
Exceptions

MissingConditionalImportError – Handles errors for external packages that might not be installed.

NoEffectWarning – Warns when an algorithm did not have an effect.
class composer.algorithms.gated_linear_units.gated_linear_units.GatedLinearUnits(act_fn=None, gated_layer_bias=False, non_gated_layer_bias=False)

Bases: composer.core.algorithm.Algorithm
Replaces all instances of Linear layers in the feed-forward subnetwork with a Gated Linear Unit. The Gated Linear Units provide a more expressive form for the same number of parameters, at the cost of a slight degradation in throughput. A sketch of the resulting feed-forward block follows the example below.

Runs on Event.INIT, so it can swap the Linear layers in the FFN for GLUs before the model is DDP-wrapped.

Parameters

act_fn (Callable[[Tensor], Tensor], optional) – Optionally, the activation function to use. If None, the algorithm will use the existing activation function in the model.

gated_layer_bias (bool, optional) – Whether to use biases in the gated linear layers within the GLU. Default: False.

non_gated_layer_bias (bool, optional) – Whether to use biases in the non-gated linear layers within the GLU. Default: False.
Example

from composer import Trainer
from composer.algorithms import GatedLinearUnits

algorithm = GatedLinearUnits()
trainer = Trainer(
    model=model,
    train_dataloader=train_dataloader,
    max_duration="1ep",
    algorithms=[algorithm],
    optimizers=[optimizer],
)
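For intuition, the block that gets swapped in follows the GLU feed-forward pattern of Shazeer (2020): two parallel input projections, where one branch is passed through the activation and multiplied elementwise into the other. Below is a minimal, self-contained sketch of that pattern; the class name GatedFeedForward is hypothetical, not Composer's actual BERTGatedFFOutput, which also wires in the BERT block's dropout, LayerNorm, and residual connection.

import torch
import torch.nn as nn
import torch.nn.functional as F

class GatedFeedForward(nn.Module):
    """Illustrative GLU feed-forward block (not Composer's exact implementation)."""

    def __init__(self, d_model, d_ff, act_fn=F.gelu,
                 gated_layer_bias=False, non_gated_layer_bias=False):
        super().__init__()
        # Two parallel projections over the same input: the "gated" branch is
        # passed through the activation; the "non-gated" branch scales it.
        self.gated_layer = nn.Linear(d_model, d_ff, bias=gated_layer_bias)
        self.non_gated_layer = nn.Linear(d_model, d_ff, bias=non_gated_layer_bias)
        self.wo = nn.Linear(d_ff, d_model)
        self.act_fn = act_fn

    def forward(self, x):
        # GLU: act(x W) * (x V), then project back to d_model.
        return self.wo(self.act_fn(self.gated_layer(x)) * self.non_gated_layer(x))

glu_ffn = GatedFeedForward(d_model=768, d_ff=3072)
out = glu_ffn(torch.randn(2, 16, 768))  # (batch, seq, d_model)

Note that the gated branch adds a second d_model x d_ff weight matrix; GLU variants that hold the parameter count fixed typically shrink d_ff to compensate (Shazeer, 2020).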
composer.algorithms.gated_linear_units.gated_linear_units.apply_gated_linear_units(model, optimizers, act_fn=None, gated_layer_bias=False, non_gated_layer_bias=False)
Replaces the Linear layers in the feed-forward network with Gated Linear Units.
Parameters

model (torch.nn.Module) – The model to modify in-place.

optimizers (torch.optim.Optimizer | Sequence[torch.optim.Optimizer], optional) – Existing optimizers bound to model.parameters(). All optimizers that have already been constructed with model.parameters() must be specified here so that they will optimize the correct parameters. If the optimizer(s) are constructed after calling this function, then it is safe to omit this parameter; those optimizers will see the correct model parameters.

act_fn (Callable[[Tensor], Tensor], optional) – Optionally, the activation function to use. If None, the algorithm will use the existing activation function in the model.

gated_layer_bias (bool, optional) – Whether to use biases in the gated linear layers within the GLU. Default: False.

non_gated_layer_bias (bool, optional) – Whether to use biases in the non-gated linear layers within the GLU. Default: False.
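As a usage sketch, the functional form can be applied directly to a model before the optimizer is built, in which case optimizers can be left as None. Composer re-exports this function through composer.functional; the checkpoint name below is only an example.

import torch
import composer.functional as cf
from transformers import BertForSequenceClassification

model = BertForSequenceClassification.from_pretrained("bert-base-uncased")

# Surgery happens before the optimizer is constructed, so `optimizers`
# can safely be None here.
cf.apply_gated_linear_units(model, optimizers=None)

optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)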
composer.algorithms.gated_linear_units.gated_linear_units.from_BertIntermediate(layer, module_index)

Defines a replacement policy from a transformers.models.bert.modeling_bert.BertIntermediate to a torch.nn.Identity. The identity effectively acts as a no-op, since the intermediate projection and activation are folded into the new gated feed-forward block (see the sketch at the end of this section).
composer.algorithms.gated_linear_units.gated_linear_units.from_BertOutput(layer, module_index, act_fn, gated_layer_bias=False, non_gated_layer_bias=False)

Defines a replacement policy from a transformers.models.bert.modeling_bert.BertOutput to a composer.algorithms.gated_linear_units.gated_linear_unit_layers.BERTGatedFFOutput.
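Taken together, the two policies describe the swap as a mapping from module class to replacement function, which GatedLinearUnits hands to Composer's module-surgery machinery. The following is a hedged sketch of that wiring, assuming replace_module_classes invokes each policy as policy(module, module_index), and with GELU standing in for the activation (the algorithm normally reuses the activation already present in the model); the checkpoint name is only an example.

import functools

import torch
from transformers import BertForMaskedLM
from transformers.models.bert.modeling_bert import BertIntermediate, BertOutput

from composer.algorithms.gated_linear_units.gated_linear_units import (
    from_BertIntermediate,
    from_BertOutput,
)
from composer.utils import module_surgery

model = BertForMaskedLM.from_pretrained("bert-base-uncased")  # example checkpoint

policies = {
    # BertIntermediate -> Identity: its projection and activation move into
    # the gated feed-forward block created below.
    BertIntermediate: from_BertIntermediate,
    # BertOutput -> BERTGatedFFOutput, built around the chosen activation.
    BertOutput: functools.partial(
        from_BertOutput,
        act_fn=torch.nn.functional.gelu,  # assumption for illustration
    ),
}

module_surgery.replace_module_classes(model, policies=policies)

If optimizers have already been constructed against model.parameters(), pass them via the optimizers argument of replace_module_classes, mirroring the optimizers parameter of apply_gated_linear_units above.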