composer.algorithms.gated_linear_units.gated_linear_units
Functions
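Replaces the Linear layers in the feed-forward network with Gated Linear Units.
Defines a replacement policy from a transformers.models.bert.modeling_bert.BertIntermediate to a torch.nn.Identity.
Defines a replacement policy from a transformers.models.bert.modeling_bert.BertOutput to a composer.algorithms.gated_linear_units.gated_linear_unit_layers.BERTGatedFFOutput.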
Classes
Base class for algorithms.
Defines a single feed-forward block that uses Gated Linear Units.
BERT model based on 🤗 Transformers.
transformers.models.bert.modeling_bert.BertIntermediate
transformers.models.bert.modeling_bert.BertOutput
Enum to represent training loop events.
Replaces all instances of Linear layers in the feed-forward subnetwork with a Gated Linear Unit.
An interface to record training data.
The state of the trainer.
Exceptions
Handles errors for external packages that might not be installed.
Warns when an algorithm did not have an effect.
Attributes
Callable
Dict
IS_TRANSFORMERS_INSTALLED
Optional
Sequence
Type
Union
annotations
log
class composer.algorithms.gated_linear_units.gated_linear_units.GatedLinearUnits(act_fn=None, gated_layer_bias=False, non_gated_layer_bias=False)
Bases: composer.core.algorithm.Algorithm
Replaces all instances of Linear layers in the feed-forward subnetwork with a Gated Linear Unit. Gated Linear Units provide a more expressive form for the same number of parameters, at the cost of a slight degradation in throughput.
Runs on INIT, so it can swap the Linear layers in the FFN for GLUs before the model is DDP wrapped.
Parameters
act_fn (Callable[[Tensor], Tensor], optional) – Optionally, the activation function to use. If None, the algorithm will use the existing activation function in the model.
gated_layer_bias (bool, optional) – Whether to use biases in the linear layers within the GLU. Default: False.
non_gated_layer_bias (bool, optional) – Whether to use biases in the linear layers within the GLU. Default: False.
Example
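from composer import Trainer
from composer.algorithms import GatedLinearUnits

# `model`, `train_dataloader`, and `optimizer` are assumed to be defined elsewhere.
algorithm = GatedLinearUnits()
trainer = Trainer(
    model=model,
    train_dataloader=train_dataloader,
    max_duration="1ep",
    algorithms=[algorithm],
    optimizers=[optimizer],
)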
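To make the gated feed-forward computation concrete, here is a minimal, dependency-free sketch in plain Python rather than PyTorch. It is not Composer's implementation: the names `linear`, `gelu`, and `gated_ffn` are hypothetical, biases are omitted (matching the `False` defaults above), and the dropout, residual connection, and layer normalization that a BERT output block would normally include are left out. The sketch assumes the common GLU form, in which the activated output of one projection gates the output of a second projection element-wise before the final output projection.

```python
import math
from typing import Callable, List

Vector = List[float]
Matrix = List[List[float]]  # row-major: one row of weights per output feature


def linear(x: Vector, weight: Matrix) -> Vector:
    """Bias-free linear layer: one dot product per weight row."""
    return [sum(w * v for w, v in zip(row, x)) for row in weight]


def gelu(v: float) -> float:
    """Exact GELU, used here only as an example activation."""
    return 0.5 * v * (1.0 + math.erf(v / math.sqrt(2.0)))


def gated_ffn(
    x: Vector,
    w_gated: Matrix,
    w_non_gated: Matrix,
    w_out: Matrix,
    act_fn: Callable[[float], float] = gelu,
) -> Vector:
    """Gated feed-forward block: out = W_out (act(W_gated x) * (W_non_gated x))."""
    gate = [act_fn(v) for v in linear(x, w_gated)]   # activated gating branch
    value = linear(x, w_non_gated)                   # plain linear branch
    hidden = [g * v for g, v in zip(gate, value)]    # element-wise gating
    return linear(hidden, w_out)                     # project back to model width
```

Compared with a standard feed-forward block, which applies one projection, an activation, and an output projection, the gated form adds a second input projection whose output is multiplied into the first.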
composer.algorithms.gated_linear_units.gated_linear_units.apply_gated_linear_units(model, optimizers, act_fn=None, gated_layer_bias=False, non_gated_layer_bias=False)
Replaces the Linear layers in the feed-forward network with Gated Linear Units.
Parameters
model (torch.nn.Module) – The model to modify in-place.
optimizers (torch.optim.Optimizer | Sequence[torch.optim.Optimizer], optional) – Existing optimizers bound to model.parameters(). All optimizers that have already been constructed with model.parameters() must be specified here so that they will optimize the correct parameters.
If the optimizer(s) are constructed after calling this function, then it is safe to omit this parameter. These optimizers will see the correct model parameters.
act_fn (Callable[[Tensor], Tensor], optional) – Optionally, the activation function to use. If None, the algorithm will use the existing activation function in the model.
gated_layer_bias (bool, optional) – Whether to use biases in the linear layers within the GLU. Default: False.
non_gated_layer_bias (bool, optional) – Whether to use biases in the linear layers within the GLU. Default: False.
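The note on optimizers is easiest to see with a toy model of the problem. The sketch below uses hypothetical stand-in classes (`Param`, `ToyLayer`, `ToyModel`, `ToyOptimizer`, `swap_layer`), not PyTorch or Composer objects. It assumes only that an optimizer captures references to the parameters it was given at construction time, so replacing a layer afterwards leaves it pointing at the old parameters unless it is explicitly updated.

```python
from typing import Dict, Iterable, List, Sequence


class Param:
    """Toy stand-in for a trainable parameter."""

    def __init__(self, name: str) -> None:
        self.name = name


class ToyLayer:
    """Toy stand-in for a layer that owns a single parameter."""

    def __init__(self, name: str) -> None:
        self.weight = Param(name + ".weight")


class ToyModel:
    def __init__(self) -> None:
        self.layers: Dict[str, ToyLayer] = {"ffn": ToyLayer("ffn")}

    def parameters(self) -> List[Param]:
        return [layer.weight for layer in self.layers.values()]


class ToyOptimizer:
    def __init__(self, params: Iterable[Param]) -> None:
        # Snapshot of parameter references taken at construction time.
        self.params = list(params)

    def tracks(self, param: Param) -> bool:
        return any(p is param for p in self.params)


def swap_layer(
    model: ToyModel,
    key: str,
    new_layer: ToyLayer,
    optimizers: Sequence[ToyOptimizer] = (),
) -> None:
    """Replace a layer and re-point any optimizers passed in at the new parameter."""
    old_weight = model.layers[key].weight
    model.layers[key] = new_layer
    for opt in optimizers:
        kept = [p for p in opt.params if p is not old_weight]
        opt.params = kept + [new_layer.weight]
```

An optimizer built before the swap and not passed in keeps tracking the discarded parameter, one that is passed in is updated, and one built after the swap sees the new parameters directly.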
composer.algorithms.gated_linear_units.gated_linear_units.from_BertIntermediate(layer, module_index)
Defines a replacement policy from a transformers.models.bert.modeling_bert.BertIntermediate to a torch.nn.Identity. The identity effectively acts as a no-op.
composer.algorithms.gated_linear_units.gated_linear_units.from_BertOutput(layer, module_index, act_fn, gated_layer_bias=False, non_gated_layer_bias=False)
Defines a replacement policy from a transformers.models.bert.modeling_bert.BertOutput to a composer.algorithms.gated_linear_units.gated_linear_unit_layers.BERTGatedFFOutput.
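The two replacement policies share the `(layer, module_index)` calling convention. The following plain-Python sketch shows how such policies could drive a replacement pass. Everything in it is a hypothetical stand-in: `Intermediate`, `Output`, `Identity`, `GatedOutput`, and `replace_layers` are toy names, not the Transformers or Composer classes. It also assumes that `module_index` counts layers of the same type in traversal order and that a policy returning `None` means "leave this layer alone".

```python
from typing import Callable, Dict, List, Optional


class Intermediate:
    """Toy stand-in for a BertIntermediate layer."""


class Output:
    """Toy stand-in for a BertOutput layer."""


class Identity:
    """No-op layer: returns its input unchanged."""

    def __call__(self, x):
        return x


class GatedOutput:
    """Toy stand-in for a gated feed-forward output block."""

    def __init__(self, replaced_index: int) -> None:
        self.replaced_index = replaced_index


# A policy receives the layer and its index, and returns a replacement or None.
Policy = Callable[[object, int], Optional[object]]


def replace_layers(layers: List[object], policies: Dict[type, Policy]) -> int:
    """Apply the matching policy to each layer in place; return how many were replaced."""
    replaced = 0
    seen: Dict[type, int] = {}  # per-type counters used as module_index
    for position, layer in enumerate(layers):
        policy = policies.get(type(layer))
        if policy is None:
            continue
        module_index = seen.get(type(layer), 0)
        seen[type(layer)] = module_index + 1
        new_layer = policy(layer, module_index)
        if new_layer is not None:
            layers[position] = new_layer
            replaced += 1
    return replaced
```

Mapping the intermediate layer to an identity while the output layer becomes the gated block reflects the pairing described above: the gated block takes over the whole feed-forward computation, so the layer before it only needs to pass its input through.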