composer.algorithms.gated_linear_units.gated_linear_unit_layers#
Classes

BERTGatedFFOutput – Defines a single feed-forward block that uses Gated Linear Units.

Attributes

Callable
- class composer.algorithms.gated_linear_units.gated_linear_unit_layers.BERTGatedFFOutput(d_embed, d_ff, dropout_rate, act_fn, layernorm_eps, gated_layer_bias=False, non_gated_layer_bias=False)[source]#
Bases:
torch.nn.modules.module.Module
Defines a single feed-forward block that uses Gated Linear Units.
- Parameters
d_embed (int) – The input dimension for the feed-forward network.
d_ff (int) – The hidden dimension for the feed-forward network.
dropout_rate (float) – The dropout rate to use between the two projection matrices in the feed-forward block.
act_fn (Callable[[Tensor], Tensor]) – The activation function to use in the feed-forward network.
layernorm_eps (float) – The epsilon term to use in the LayerNorm operator. Useful for when the variance is small.
gated_layer_bias (bool) – Whether to use a bias term in the gated projection matrix.
non_gated_layer_bias (bool) – Whether to use a bias term in the non-gated projection matrix.
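To illustrate how these parameters fit together, here is a minimal sketch of a GLU feed-forward block with the same constructor arguments. This is an illustrative re-implementation, not the library's own code; the class name `GatedFFSketch`, the layer attribute names, and the `forward(hidden_states, residual)` signature are assumptions for the example.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GatedFFSketch(nn.Module):
    """Illustrative GLU feed-forward block (not the Composer class itself)."""

    def __init__(self, d_embed, d_ff, dropout_rate, act_fn, layernorm_eps,
                 gated_layer_bias=False, non_gated_layer_bias=False):
        super().__init__()
        # Two parallel projections from d_embed to d_ff: one gated, one not.
        self.gated_layer = nn.Linear(d_embed, d_ff, bias=gated_layer_bias)
        self.non_gated_layer = nn.Linear(d_embed, d_ff, bias=non_gated_layer_bias)
        # Output projection back down to the embedding dimension.
        self.wo = nn.Linear(d_ff, d_embed)
        self.dropout = nn.Dropout(dropout_rate)
        self.act = act_fn
        self.layernorm = nn.LayerNorm(d_embed, eps=layernorm_eps)

    def forward(self, hidden_states, residual):
        # GLU: elementwise product of the activated gate branch
        # and the plain linear branch.
        hidden = self.act(self.gated_layer(hidden_states)) * self.non_gated_layer(hidden_states)
        hidden = self.wo(self.dropout(hidden))
        # Residual add followed by LayerNorm, as in a standard BERT output block.
        return self.layernorm(hidden + residual)

# BERT-base-like dimensions for demonstration.
block = GatedFFSketch(d_embed=768, d_ff=3072, dropout_rate=0.1,
                      act_fn=F.gelu, layernorm_eps=1e-12)
x = torch.randn(2, 16, 768)   # (batch, sequence, d_embed)
out = block(x, x)              # output keeps the input shape
```

Note that the output shape matches the input shape `(batch, sequence, d_embed)`, so the block can drop into a transformer layer in place of a standard feed-forward block.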