composer.datasets.glue
Functions
Cast a value to a type.
Returns the same class as was passed in, with dunder methods added based on the fields defined in the class.
Classes
Specifications for operating and training on data.
Hparams
These classes are used with yahp for YAML-based configuration.
Hyperparameters to initialize a
Abstract base class for hyperparameters to initialize a dataset.
Sets up a generic GLUE dataset loader.
Attributes
Dataset
log
- class composer.datasets.glue.GLUEHparams(is_train=True, drop_last=True, shuffle=True, datadir=None, task=None, tokenizer_name=None, split=None, max_seq_length=256, num_workers=64, max_network_retries=10)
Bases: composer.datasets.hparams.DatasetHparams
Sets up a generic GLUE dataset loader.
- Parameters
datadir (str) – The path to the data directory.
is_train (bool) – Whether to load the training data (the default) or validation data.
drop_last (bool) – If the number of samples is not divisible by the batch size, whether to drop the last batch (the default) or pad the last batch with zeros.
shuffle (bool) – Whether to shuffle the dataset. Default: True
task (str) – The GLUE task to train on. Choose one of: CoLA, MNLI, MRPC, QNLI, QQP, RTE, SST-2, or STS-B.
tokenizer_name (str) – The name of the HuggingFace tokenizer to preprocess text with.
split (str) – Which split to use: "train", "validation", or "test".
max_seq_length (int) – Optional custom sequence length for the training dataset. Default: 256
num_workers (int) – Optional number of CPU workers to use to preprocess the text. Default: 64
max_network_retries (int) – Optional number of times to retry HTTP requests if they fail. Default: 10
- Returns
A DataSpec object (composer.core.DataSpec).
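The parameter list above can be summarized as a small configuration object. The following is a minimal sketch, not the library's implementation: `GLUEHparamsSketch` and its `validate` method are hypothetical stand-ins that only mirror the documented defaults and allowed values, and the tokenizer name `bert-base-uncased` is an example value chosen for illustration. How the real class checks its inputs (including whether task names are case-sensitive) is not stated here and is assumed.

```python
from dataclasses import dataclass
from typing import Optional

# Allowed values copied from the parameter descriptions above.
GLUE_TASKS = ("CoLA", "MNLI", "MRPC", "QNLI", "QQP", "RTE", "SST-2", "STS-B")
SPLITS = ("train", "validation", "test")


@dataclass
class GLUEHparamsSketch:
    """Hypothetical stand-in that mirrors the documented signature and defaults."""

    is_train: bool = True
    drop_last: bool = True
    shuffle: bool = True
    datadir: Optional[str] = None
    task: Optional[str] = None
    tokenizer_name: Optional[str] = None
    split: Optional[str] = None
    max_seq_length: int = 256
    num_workers: int = 64
    max_network_retries: int = 10

    def validate(self) -> None:
        # Assumed checks: the documentation lists allowed values but does not
        # describe how (or whether) the real class enforces them.
        if self.task not in GLUE_TASKS:
            raise ValueError(f"task must be one of {GLUE_TASKS}, got {self.task!r}")
        if self.split not in SPLITS:
            raise ValueError(f"split must be one of {SPLITS}, got {self.split!r}")
        if self.tokenizer_name is None:
            raise ValueError("tokenizer_name is required")
        if self.max_seq_length <= 0:
            raise ValueError("max_seq_length must be positive")


# Example: an evaluation configuration for the RTE task.
hparams = GLUEHparamsSketch(
    is_train=False,
    shuffle=False,
    task="RTE",
    tokenizer_name="bert-base-uncased",  # example value, not prescribed by the docs
    split="validation",
)
hparams.validate()
```

Fields left unset keep the defaults from the signature, so `hparams.max_seq_length` is 256 and `hparams.num_workers` is 64 in this example.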
- initialize_object(batch_size, dataloader_hparams)
Creates a DataLoader or DataloaderSpec for this dataset.
- Parameters
batch_size (int) – The size of the batch the dataloader should yield. This batch size is device-specific and already incorporates the world size.
dataloader_hparams (DataloaderHparams) – The dataset-independent hparams for the dataloader.
- Returns
Dataloader or DataSpec – The dataloader or, if the dataloader yields batches of custom types, a DataSpec.
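Because `batch_size` is described as device-specific, the caller is expected to have divided any global batch size by the world size before calling `initialize_object`. The sketch below illustrates that arithmetic together with the effect of `drop_last` on the number of batches. Both helper functions are hypothetical and written only for this illustration; they are not part of composer, and the even-divisibility check is an assumption.

```python
import math


def per_device_batch_size(global_batch_size: int, world_size: int) -> int:
    """Split a global batch size evenly across devices (hypothetical helper)."""
    if global_batch_size % world_size != 0:
        raise ValueError("global batch size must be divisible by the world size")
    return global_batch_size // world_size


def num_batches(num_samples: int, batch_size: int, drop_last: bool) -> int:
    """Count the batches yielded for a given sample count (hypothetical helper)."""
    if drop_last:
        # The incomplete final batch is discarded.
        return num_samples // batch_size
    # The incomplete final batch is kept, so round up.
    return math.ceil(num_samples / batch_size)


# A global batch of 256 on 8 devices gives a device-specific batch size of 32.
device_batch = per_device_batch_size(256, 8)

# With 1000 samples on a device, the last 8 samples form an incomplete batch.
batches_dropped = num_batches(1000, device_batch, drop_last=True)   # 31
batches_kept = num_batches(1000, device_batch, drop_last=False)     # 32
```

With `drop_last=True` every batch has exactly `device_batch` samples, which keeps batch shapes uniform at the cost of skipping the trailing samples.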