Evaluation
To track training progress, validation datasets can be provided to the Composer Trainer through the eval_dataloader parameter. The trainer will compute evaluation metrics on the evaluation dataset at the frequency specified by the Trainer parameter eval_interval.
from composer import Trainer

trainer = Trainer(
    ...,
    eval_dataloader=my_eval_dataloader,
    eval_interval="1ep",  # Default is every epoch
)
The metrics should be provided by ComposerModel.metrics().
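For reference, here is a minimal sketch of where those metrics come from, assuming the older ComposerModel interface in which metrics(train=False) returns the metrics to compute during evaluation. The SimpleClassifier class, its layer sizes, and the validate() pairing of outputs and targets are illustrative assumptions, not part of the examples in this section.

import torch
import torch.nn.functional as F
from torchmetrics import Accuracy, MetricCollection

from composer.models import ComposerModel

class SimpleClassifier(ComposerModel):
    # Hypothetical model used only to show where evaluation metrics are defined.
    def __init__(self, num_classes: int = 10):
        super().__init__()
        self.net = torch.nn.Linear(28 * 28, num_classes)
        self.val_acc = Accuracy()

    def forward(self, batch):
        inputs, _ = batch
        return self.net(inputs.flatten(1))

    def loss(self, outputs, batch):
        _, targets = batch
        return F.cross_entropy(outputs, targets)

    def validate(self, batch):
        # The outputs and targets returned here are fed to the metrics below.
        inputs, targets = batch
        return self.forward(batch), targets

    def metrics(self, train: bool = False):
        # The trainer computes these on eval_dataloader at each eval_interval.
        return MetricCollection([self.val_acc])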
Multiple Datasets
If there are multiple validation datasets that may have different metrics, use Evaluator to specify each pair of dataloader and metrics. This class is just a container for a few attributes:

- label: a user-specified name for the evaluator.
- dataloader: a PyTorch DataLoader or our DataSpec. See DataLoaders for more details.
- metrics: a torchmetrics.Metric or torchmetrics.MetricCollection.
For example, the GLUE tasks for language models can be specified as follows:
from composer.core import Evaluator
from composer.models.nlp_metrics import BinaryF1Score
from torchmetrics import Accuracy, MetricCollection

glue_mrpc_task = Evaluator(
    label='glue_mrpc',
    dataloader=mrpc_dataloader,
    metrics=MetricCollection([BinaryF1Score(), Accuracy()])
)

glue_mnli_task = Evaluator(
    label='glue_mnli',
    dataloader=mnli_dataloader,
    metrics=Accuracy()
)

trainer = Trainer(
    ...,
    eval_dataloader=[glue_mrpc_task, glue_mnli_task],
    ...
)
In this case, the metrics from ComposerModel.metrics() will be ignored since they are explicitly provided above.