composer.datasets.glue_hparams
GLUE (General Language Understanding Evaluation) dataset hyperparameters (Wang et al., 2019).
The GLUE benchmark datasets consist of nine sentence- or sentence-pair language understanding tasks designed to cover a diverse range of dataset sizes, text genres, and degrees of difficulty.
Note that the GLUE diagnostic dataset, which is designed to evaluate and analyze model performance with respect to a wide range of linguistic phenomena found in natural language, is not included here.
Please refer to the GLUE benchmark for more details.
Hparams

These classes are used with yahp for YAML-based configuration.

- GLUEHparams: Sets up a generic GLUE dataset loader.
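As a rough illustration only, a YAML fragment for configuring this class might look like the following. The keys are assumed to mirror the constructor arguments documented below; the exact schema is defined by yahp and composer, so consult their documentation for the authoritative format.

```yaml
# Hypothetical yahp-style configuration for GLUEHparams;
# key names are assumed to mirror the constructor arguments.
task: SST-2
tokenizer_name: bert-base-uncased
split: train
max_seq_length: 256
max_network_retries: 10
```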
- class composer.datasets.glue_hparams.GLUEHparams(use_synthetic=False, synthetic_num_unique_samples=100, synthetic_device='cpu', synthetic_memory_format=MemoryFormat.CONTIGUOUS_FORMAT, drop_last=True, shuffle=True, task=None, tokenizer_name=None, split=None, max_seq_length=256, max_network_retries=10)
Bases: composer.datasets.dataset_hparams.DatasetHparams, composer.datasets.synthetic_hparams.SyntheticHparamsMixin
Sets up a generic GLUE dataset loader.
- Parameters
  - task (str) – The GLUE task to train on; choose one of 'CoLA', 'MNLI', 'MRPC', 'QNLI', 'QQP', 'RTE', 'SST-2', or 'STS-B'.
  - tokenizer_name (str) – The name of the HuggingFace tokenizer to preprocess text with. See the HuggingFace documentation.
  - split (str) – Whether to use the 'train', 'validation', or 'test' split.
  - max_seq_length (int, optional) – A custom maximum sequence length for the training dataset. Default: 256.
  - max_network_retries (int, optional) – Number of times to retry HTTP requests if they fail. Default: 10.
- Returns
  - DataLoader – A PyTorch DataLoader object.
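To make the hyperparameter pattern concrete, the sketch below uses a plain dataclass standing in for GLUEHparams. The class name `GLUEHparamsSketch`, its `validate` method, and the stand-alone validation logic are all hypothetical illustrations of the documented fields, not composer's actual implementation.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class GLUEHparamsSketch:
    """Hypothetical stand-in mirroring the documented fields of GLUEHparams."""
    task: Optional[str] = None            # one of the GLUE task names below
    tokenizer_name: Optional[str] = None  # HuggingFace tokenizer name, e.g. 'bert-base-uncased'
    split: Optional[str] = None           # 'train', 'validation', or 'test'
    max_seq_length: int = 256
    max_network_retries: int = 10

    # Class-level constant (not a dataclass field): the valid GLUE tasks.
    _VALID_TASKS = {'CoLA', 'MNLI', 'MRPC', 'QNLI', 'QQP', 'RTE', 'SST-2', 'STS-B'}

    def validate(self) -> None:
        # Basic sanity checks one might run before building a dataloader.
        if self.task not in self._VALID_TASKS:
            raise ValueError(f'task must be one of {sorted(self._VALID_TASKS)}, got {self.task!r}')
        if self.split not in {'train', 'validation', 'test'}:
            raise ValueError(f"split must be 'train', 'validation', or 'test', got {self.split!r}")

hp = GLUEHparamsSketch(task='SST-2', tokenizer_name='bert-base-uncased', split='train')
hp.validate()
print(hp.max_seq_length)  # prints 256
```

In the real API, an instance configured this way would produce the PyTorch DataLoader described in the Returns section above.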