gated_linear_unit_layers#

Module gated_linear_unit_layers.

Classes

BERTGatedFFOutput

Defines a single feed-forward block that uses Gated Linear Units.

class composer.algorithms.gated_linear_units.gated_linear_unit_layers.BERTGatedFFOutput(d_embed, d_ff, dropout_rate, act_fn, layernorm_eps, gated_layer_bias=False, non_gated_layer_bias=False)[source]#

Bases: torch.nn.modules.module.Module

Defines a single feed-forward block that uses Gated Linear Units.

Parameters
  • d_embed (int) – The input dimension for the feed-forward network.

  • d_ff (int) – The hidden dimension for the feed-forward network.

  • dropout_rate (float) – The dropout rate to use between the two projection matrices in the feed-forward block.

  • act_fn (Callable[[Tensor], Tensor]) – The activation function to use in the feed-forward network.

  • layernorm_eps (float) – The epsilon term to use in the LayerNorm operator. Useful for when the variance is small.

  • gated_layer_bias (bool) – Whether to use a bias term in the gated projection matrix.

  • non_gated_layer_bias (bool) – Whether to use a bias term in the non-gated projection matrix.
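
Example

A minimal construction sketch, not taken from the library's documentation: the dimensions below mirror BERT-base (d_embed=768, d_ff=3072), and the GELU activation, dropout rate, and epsilon are illustrative choices rather than defaults.

>>> import torch
>>> from composer.algorithms.gated_linear_units.gated_linear_unit_layers import BERTGatedFFOutput
>>> # BERT-base-like dimensions; act_fn is applied to the gated projection.
>>> glu_ff = BERTGatedFFOutput(
...     d_embed=768,
...     d_ff=3072,
...     dropout_rate=0.1,
...     act_fn=torch.nn.GELU(),
...     layernorm_eps=1e-12,
... )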

forward(hidden_states, residual_connection)[source]#
Parameters
  • hidden_states (Tensor) – The hidden states output by the attention block.

  • residual_connection (Tensor) – The residual connection to add before the LayerNorm operator.
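
Example

A usage sketch for forward(), reusing the glu_ff instance from the example above; the tensor shapes are illustrative. Conceptually, the block computes roughly layernorm(wo(dropout(act_fn(x @ W_gated) * (x @ W_non_gated))) + residual_connection), i.e. a GLU-variant feed-forward followed by the residual add and LayerNorm; the weight names here are descriptive, not the module's actual attribute names.

>>> # In a BERT block, the residual for the feed-forward sublayer is its own
>>> # input, i.e. the output of the preceding attention sublayer.
>>> hidden_states = torch.randn(8, 128, 768)  # (batch, seq_len, d_embed)
>>> output = glu_ff(hidden_states, residual_connection=hidden_states)
>>> output.shape
torch.Size([8, 128, 768])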