composer.algorithms.functional.apply_alibi

composer.algorithms.functional.apply_alibi(model: torch.nn.modules.module.Module, heads_per_layer: int, max_sequence_length: int, position_embedding_attribute: str, attention_module: torch.nn.modules.module.Module, attr_to_replace: str, alibi_attention: Callable, mask_replacement_function: Optional[Callable]) -> None

Removes position embeddings and replaces the attention function and attention mask according to ALiBi (Attention with Linear Biases).

Parameters
  • model – Model to transform.

  • heads_per_layer – Number of attention heads per layer.

  • max_sequence_length – Maximum sequence length that the model will be able to accept without raising an error.

  • position_embedding_attribute – Attribute of the model’s position embeddings. For example, in HuggingFace’s GPT2, the position embeddings are “transformer.wpe”.

  • attention_module – Module/class whose self-attention function will be replaced. For example, in HuggingFace’s GPT2, the self-attention module is transformers.models.gpt2.modeling_gpt2.GPT2Attention.

  • attr_to_replace – Attribute of attention_module that alibi_attention will replace. For example, in HuggingFace’s GPT2, the self-attention function is “_attn”.

  • alibi_attention – New self-attention function in which ALiBi is implemented. Used to replace “{attention_module}.{attr_to_replace}”.

  • mask_replacement_function – Function to replace the model’s attention mask. This is sometimes necessary for evaluating on sequence lengths longer than the model was initialized to accommodate.
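
Example

A minimal usage sketch for a HuggingFace GPT2 model. The alibi_attention argument must be a working ALiBi attention implementation whose signature matches the function it replaces; my_alibi_attention below is a hypothetical placeholder, not part of composer or transformers.

  from composer.algorithms.functional import apply_alibi
  from transformers import GPT2LMHeadModel
  from transformers.models.gpt2.modeling_gpt2 import GPT2Attention

  def my_alibi_attention(self, query, key, value, attention_mask=None,
                         head_mask=None, output_attentions=False):
      # Hypothetical placeholder. A real implementation computes the usual
      # attention scores and adds ALiBi's head-specific linear position
      # biases before the softmax. The signature must match the method it
      # replaces (GPT2Attention._attn here), which can vary across
      # transformers versions.
      raise NotImplementedError

  model = GPT2LMHeadModel.from_pretrained("gpt2")

  apply_alibi(
      model=model,
      heads_per_layer=12,                              # GPT2-small uses 12 heads
      max_sequence_length=1024,
      position_embedding_attribute="transformer.wpe",  # per the parameter docs
      attention_module=GPT2Attention,
      attr_to_replace="_attn",
      alibi_attention=my_alibi_attention,
      mask_replacement_function=None,                  # optional; mainly needed when
                                                       # evaluating on longer sequences
  )

Because apply_alibi returns None and modifies the model in place, the transformed model can be used directly afterward.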