GLUE Entry Point#
This notebook is intended to demonstrate how to use the GLUE (General Language Understanding Evaluation) entry point for pre-training and fine-tuning NLP models across the 8 GLUE tasks.
This will cover:
The basics of the entry point and what it enables
How to construct your YAML for training
Executing an example fine-tuning job
Setup#
Let's get started and configure our environment.
Install Composer#
First, install Composer if you haven't already:
[ ]:
%pip install 'mosaicml[nlp]'
Next, clone the Composer GitHub repository and cd into it:
[ ]:
!git clone https://github.com/mosaicml/composer
import os
os.chdir('composer/')
Basics of the Entry Point#
This entry point lets you pre-train an NLP model, fine-tune a model on the downstream GLUE tasks, or run the entire pipeline. When pre-training, the entry point handles distributed training across all available GPUs. When fine-tuning, it fine-tunes every given checkpoint on all 8 GLUE tasks in parallel using multiprocessing pools. This removes the tedium of spawning jobs individually and loading each model checkpoint by hand.
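The fine-tuning fan-out described above can be pictured as a grid of (checkpoint, task) jobs handed to a worker pool. The following is a minimal sketch, not the entry point's actual implementation: `build_jobs` and `finetune` are hypothetical stand-ins, and the task list assumes the 8 tasks are the GLUE tasks without WNLI.

```python
import itertools
from multiprocessing import Pool

# Assumption: the 8 GLUE tasks (WNLI excluded). The names are illustrative only.
GLUE_TASKS = ["cola", "mnli", "mrpc", "qnli", "qqp", "rte", "sst-2", "stsb"]


def build_jobs(checkpoints):
    """Pair every checkpoint with every GLUE task."""
    return list(itertools.product(checkpoints, GLUE_TASKS))


def finetune(job):
    """Hypothetical stand-in for one fine-tuning run; it only returns a label."""
    checkpoint, task = job
    return f"{checkpoint}:{task}"


if __name__ == "__main__":
    jobs = build_jobs(["path/to/checkpoint1", "path/to/checkpoint2"])
    # Each worker process picks up jobs from the grid until none are left.
    with Pool(processes=4) as pool:
        finished = pool.map(finetune, jobs)
    print(f"{len(finished)} fine-tuning jobs completed")
```

With two checkpoints the grid holds 16 jobs, which is why the number of checkpoints you pass in matters for total runtime.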
Constructing your YAML for training#
A full out-of-the-box YAML example for this entry point can be found in ./glue_example.yaml.
If you're already familiar with YAMLs, you can skip to the next part! If not, we'll break down how this is structured.
Pre-training#
If you are only pre-training an NLP model from scratch, you only need to specify the pretrain_hparams section of the YAML. This section holds the standard hyperparameters for pre-training a model: the model configuration, dataset and dataloader specifications, batch size, and so on. The default configuration uses the same parameters as composer/yamls/models/bert-base.yaml to pre-train a BERT model. See the TrainerHparams documentation for more information about what these parameters include.
pretrain_hparams:
  # Use a bert-base model, initialized from scratch
  model:
    bert:
      use_pretrained: false
      tokenizer_name: bert-base-uncased
      pretrained_model_name: bert-base-uncased
  # Train the model on the English C4 corpus
  train_dataset:
    streaming_c4:
      remote: s3://allenai-c4/mds/1/
      local: /tmp/mds-cache/mds-c4/
      split: train
      shuffle: true
      tokenizer_name: bert-base-uncased
      max_seq_len: 128
      group_method: truncate
      mlm: true
      mlm_probability: 0.15
  dataloader:
    pin_memory: true
    timeout: 0
    prefetch_factor: 2
    persistent_workers: true
    num_workers: 8
  # Periodically evaluate the LanguageCrossEntropy and Masked Accuracy
  # on the validation split of the dataset.
  evaluators:
    evaluator:
      label: bert_pre_training
      eval_dataset:
        streaming_c4:
          remote: s3://allenai-c4/mds/1/
          local: /tmp/mds-cache/mds-c4/
          split: val
          shuffle: false
          tokenizer_name: bert-base-uncased
          max_seq_len: 128
          group_method: truncate
          mlm: true
          mlm_probability: 0.15
      metric_names:
        - LanguageCrossEntropy
        - MaskedAccuracy
  # Run evaluation after every 1000 training steps
  eval_interval: 1000ba
  # Use the decoupled AdamW optimizer with learning rate warmup
  optimizers:
    decoupled_adamw:
      lr: 5.0e-4 # Peak learning rate
      betas:
        - 0.9
        - 0.98
      eps: 1.0e-06
      weight_decay: 1.0e-5 # Amount of weight decay regularization
  schedulers:
    linear_decay_with_warmup:
      t_warmup: 0.06dur # Point when peak learning rate is reached
      alpha_f: 0.02
  max_duration: 275184000sp # Subsample the training data for 275M samples
  train_batch_size: 4000 # Number of training examples to use per update
  eval_batch_size: 2000
  precision: amp # Use mixed-precision training
  grad_clip_norm: -1.0 # Turn off gradient clipping
  grad_accum: 'auto' # Use automatic gradient accumulation to avoid OOMs
  save_folder: checkpoints # The directory to save checkpoints to
  save_interval: 3500ba # Save checkpoints every 3500 batches
  save_artifact_name: '{run_name}/checkpoints/ep{epoch}-ba{batch}-rank{rank}'
  save_num_checkpoints_to_keep: 0
  save_overwrite: True
  loggers:
    object_store:
      object_store_hparams: # The bucket to save checkpoints to
        s3:
          bucket: your-bucket-here
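Several values in the YAML above carry unit suffixes: ba for batches, sp for samples, and dur for a fraction of the total training duration. To make the scale of the run concrete, here is a small sketch that converts them into batch counts. The parser `parse_time` is a hypothetical helper written for this illustration, not Composer's own time-string handling, and the rounding of the warmup length is an assumption.

```python
import re

# Illustrative parser, not Composer's implementation. It only handles the
# unit suffixes that appear in the YAML above: ba, sp and dur.
_TIME_PATTERN = re.compile(r"^(\d+(?:\.\d+)?)(ba|sp|dur)$")


def parse_time(text):
    """Split a string such as '1000ba' into a (value, unit) pair."""
    match = _TIME_PATTERN.match(text)
    if match is None:
        raise ValueError(f"unrecognized time string: {text!r}")
    value, unit = match.groups()
    return float(value), unit


train_batch_size = 4000

# max_duration is given in samples, so divide by the batch size to get batches.
max_samples, _ = parse_time("275184000sp")
total_batches = int(max_samples) // train_batch_size

# t_warmup is a fraction of the whole run (rounded down here as an assumption).
warmup_fraction, _ = parse_time("0.06dur")
warmup_batches = int(warmup_fraction * total_batches)

# eval_interval is given directly in batches.
eval_every, _ = parse_time("1000ba")
num_evals = total_batches // int(eval_every)

print(total_batches, warmup_batches, num_evals)
```

Under these assumptions the run covers 68796 batches, with roughly the first 4127 spent on learning rate warmup and 68 evaluations along the way.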
Fine-tuning#
If you are only fine-tuning checkpoints on the GLUE tasks, specify the checkpoints to load as a finetune_ckpts list in the finetune_hparams section of your YAML. When you run the entry point with this list, it automatically pulls every checkpoint and fine-tunes each one. Note that if the finetune_ckpts list contains paths in an object store, the entry point expects a load_object_store instance and its corresponding credentials to be specified; otherwise it will try to load from local disk. See our checkpointing guide if you're not familiar with our checkpoint saving and loading schema.
In every logger, such as Weights and Biases, and in the results table printed at the end of training, the fine-tuning runs are grouped by pre-train checkpoint name for easier organization and run tracking.
Below is an example finetune_hparams section that loads checkpoints from an Amazon S3 bucket:
finetune_hparams:
  ...
  finetune_ckpts:
    - path/to/checkpoint1
    - path/to/checkpoint2
  # if paths are in ObjectStore, the following is expected to be defined
  load_object_store:
    s3:
      bucket: your-bucket-here
Note: The load paths provided in finetune_ckpts must be relative paths within an object store bucket or local directory, as Composer does not currently allow checkpoints to be loaded via remote URIs. Alternatively, you can provide a full HTTPS URL to a remote checkpoint as your path, such as https://storage.googleapis.com/path/to/checkpoint.pt.
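The path rule in the note above can be checked before launching a job. This is a minimal sketch under one reading of that note: a path with no URI scheme is treated as relative to the bucket or local directory, a full https URL is accepted, and any other scheme is rejected. The helper `is_supported_ckpt_path` is hypothetical and not part of Composer.

```python
from urllib.parse import urlparse


def is_supported_ckpt_path(path):
    """Return True if `path` matches one of the forms described in the note."""
    scheme = urlparse(path).scheme
    if scheme == "":
        # No scheme: a path relative to the bucket or local directory.
        return True
    # The only remote form the note allows is a full https URL.
    return scheme == "https"


candidates = [
    "path/to/checkpoint1",
    "https://storage.googleapis.com/path/to/checkpoint.pt",
    "s3://your-bucket-here/path/to/checkpoint1",
]
for candidate in candidates:
    status = "ok" if is_supported_ckpt_path(candidate) else "not supported"
    print(f"{candidate}: {status}")
```

For an S3 checkpoint, this means putting the bucket name in load_object_store and only the key inside the bucket in finetune_ckpts.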
Pre-training and fine-tuning#
To run the entire end-to-end pipeline, provide the entry point with your pre-train configuration as explained above, along with any overrides to apply to the fine-tuning jobs. The entry point then runs in two distinct stages, distributed pre-training followed by multiprocessed fine-tuning, and automatically handles all information passed between them. Checkpoints are saved to your specified save_folder and loaded from wherever pre-training saved them, so the finetune_ckpts section of finetune_hparams is ignored if specified.
Note: The entry point runs all 8 GLUE fine-tuning tasks on every saved pre-training checkpoint, so set save_interval in your pretrain_hparams appropriately to avoid unnecessarily long evaluation times.
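To see why save_interval matters, here is a rough estimate of how many fine-tuning jobs the example pre-training configuration would trigger. It is a sketch, not output from the entry point: `count_finetune_jobs` is a hypothetical helper, and it assumes one extra checkpoint is written at the end of training when the last batch does not fall on a save interval.

```python
def count_finetune_jobs(total_batches, save_interval, num_tasks=8, final_checkpoint=True):
    """Estimate fine-tuning jobs as (saved checkpoints) x (GLUE tasks)."""
    checkpoints = total_batches // save_interval
    # Assumption: a final checkpoint is saved when training ends off-interval.
    if final_checkpoint and total_batches % save_interval != 0:
        checkpoints += 1
    return checkpoints * num_tasks


# max_duration of 275184000 samples at train_batch_size 4000
total_batches = 275_184_000 // 4000

print(count_finetune_jobs(total_batches, save_interval=3500))
print(count_finetune_jobs(total_batches, save_interval=35000))
```

Under these assumptions, saving every 3500 batches yields 20 checkpoints and 160 fine-tuning jobs, while saving every 35000 batches yields 2 checkpoints and 16 jobs.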
Executing your job#
Let's now put together everything we know about the entry point and launch a job that fine-tunes a pre-trained BERT model on the 8 GLUE tasks! Because we are only fine-tuning with no special configuration, we only need to specify our bucket information and the finetune_ckpts to load from. The following configuration loads a pre-trained model from our AWS S3 bucket and saves any fine-tune checkpoints under a local checkpoints folder:
[ ]:
data = {
    'finetune_hparams': {
        'load_object_store': {'s3': {'bucket': 'mosaicml-internal-checkpoints-bert'}},
        'save_folder': 'checkpoints',
        'finetune_ckpts': ['bert-baseline-tokenizer-2uoe/checkpoints/ep0-ba68796-rank0']
    }
}
Let's now dump our constructed hparams to a YAML file to be loaded by the entry point:
[ ]:
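import yaml
import tempfile

# The temporary file is deleted once tmp_file is closed, so keep this reference
# alive until the entry point has read the YAML.
tmp_file = tempfile.NamedTemporaryFile()
with open(tmp_file.name, 'w+') as f:
    yaml.dump(data, f)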
Let's launch it! At the end of training, we will see a table containing the per-task GLUE, GLUE-Large, and GLUE-All scores!
[ ]:
!python examples/glue/run_glue_trainer.py -f {tmp_file.name} --training_scheme finetune
Pro-tip: Try python examples/glue/run_glue_trainer.py --help to get more information about the entry point, and python examples/glue/run_glue_trainer.py {pretrain_hparams, finetune_hparams} --help to get a detailed breakdown of your hparams options!
Next steps#
Try pre-training and fine-tuning your own models with this framework! Also, feel free to check out the rest of our Composer docs to try using Composer speedups in this entry point!