InContextLearningSchemaTaskDataset
- class composer.datasets.InContextLearningSchemaTaskDataset(choices_key='context_options', *args, **kwargs)
A dataset that constructs batches for in-context learning schema evaluation. A schema task involves sentences with a fill-in-the-blank, where the correct word to fill in must be chosen from a set of N options. We use the partial evaluation technique from https://arxiv.org/abs/1806.02847 to determine the model's choice of fill-in word.
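As a rough illustration of the idea, the sketch below pairs each context option with the same continuation, scores only the continuation, and picks the option with the highest score. The names `pick_option`, `toy_score`, and `FAKE_LOGPROBS` are hypothetical, and the scores are made-up numbers standing in for a real language model; this is not Composer's implementation.

```python
from typing import Callable, List


def pick_option(context_options: List[str], continuation: str,
                score: Callable[[str, str], float]) -> int:
    """Return the index of the context option under which the continuation scores highest."""
    scores = [score(context, continuation) for context in context_options]
    return max(range(len(scores)), key=lambda i: scores[i])


# Made-up log-probabilities of the continuation given each context (assumption:
# a real scorer would sum model log-probabilities over the continuation tokens only).
FAKE_LOGPROBS = {
    "The trophy doesn't fit in the suitcase because the trophy": -1.3,
    "The trophy doesn't fit in the suitcase because the suitcase": -4.2,
}


def toy_score(context: str, continuation: str) -> float:
    # The continuation is ignored here because the table already encodes its score.
    return FAKE_LOGPROBS[context]


options = list(FAKE_LOGPROBS)
choice = pick_option(options, " is too big", toy_score)
print(choice)  # index of the option with the highest continuation score
```

Scoring only the shared continuation keeps the comparison fair, since every option is judged on the same tokens.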
The default input format is a jsonl file with the following fields:
- context_options: List of strings corresponding to possible preceding context options for the continuation
- gold: Index of the correct context from 'context_options'
- continuation: The finishing continuation
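A single line of such a file might look like the record built below. The sentence content is invented for illustration; only the three field names come from the description above.

```python
import json

# One hypothetical schema-task record; each line of the jsonl file holds one such object.
record = {
    "context_options": [
        "The trophy doesn't fit in the suitcase because the trophy",
        "The trophy doesn't fit in the suitcase because the suitcase",
    ],
    "gold": 0,
    "continuation": "is too big",
}

line = json.dumps(record)   # serialize to one jsonl line
parsed = json.loads(line)   # parse it back, as a loader would

# 'gold' indexes into 'context_options' to select the correct context.
correct_context = parsed["context_options"][parsed["gold"]]
print(correct_context)
```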
Each batch then consists of batch_size // N distinct tasks and has the following structure:
- input_ids: Input tensor batch x seqlen x # of tokens
- continuation_indices: List of length batch consisting of tensors indicating which indices in the sequence correspond to the question answer (aka continuation)
- mode: Indicates to the model that this is an ICL task and may rely on a custom code path to properly update metrics
- labels: Identical to the input, used by the model to calculate loss/metrics
- gold_indices: List of length batch_size // N indicating, for each question, which of the answers is correct (via an integer in [0, N-1])
- choice_groupings: Indicates which indices of the batch correspond to which questions
- construct_context(example, preceding_text='', add_answer=False)
Takes an example and constructs a context using the correct context option for the example's continuation.
- Parameters
- Returns
str – The single correct context for a given continuation
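The behaviour described above can be approximated with a standalone function. This is a minimal sketch under stated assumptions, not the library's code: `sketch_construct_context` and the parameters `choices_key`, `example_delimiter`, and `continuation_delimiter` are hypothetical, as is the rule that a delimiter is prepended only when `preceding_text` is non-empty.

```python
from typing import Any, Dict


def sketch_construct_context(example: Dict[str, Any],
                             preceding_text: str = '',
                             add_answer: bool = False,
                             choices_key: str = 'context_options',
                             example_delimiter: str = '\n',
                             continuation_delimiter: str = ' ') -> str:
    """Build a context from the gold context option (hypothetical sketch)."""
    # Select the single correct context using the 'gold' index.
    context = example[choices_key][example['gold']]
    # Assumption: separate this example from any text that comes before it.
    if preceding_text:
        context = f'{example_delimiter}{context}'
    # Assumption: optionally append the continuation, e.g. for few-shot examples.
    if add_answer:
        context = f"{context}{continuation_delimiter}{example['continuation']}"
    return context


example = {
    'context_options': [
        "The trophy doesn't fit in the suitcase because the trophy",
        "The trophy doesn't fit in the suitcase because the suitcase",
    ],
    'gold': 0,
    'continuation': 'is too big',
}

plain = sketch_construct_context(example)
with_answer = sketch_construct_context(example, add_answer=True)
print(with_answer)
```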