Documentation Index
Fetch the complete documentation index at: https://docs.m4trix.dev/llms.txt
Use this file to discover all available pages before exploring further.
@m4trix/evals helps you build repeatable evaluation suites for AI systems. You describe the cases to run, group them into datasets, attach one or more evaluators, and execute named run configs from the CLI or runner API.
Use it when you want to:
- Keep prompt, agent, or workflow regressions visible over time
- Run the same test cases across several evaluators or scoring strategies
- Store run artifacts for later inspection
- Run evals locally, in CI, or through the interactive terminal UI
Core Model
An eval suite is made of four pieces (illustrated in the sketch after this list):
- Test cases define typed inputs, optional expected outputs, and tags.
- Datasets select test cases by tag and/or file path.
- Evaluators score each selected case and can record metrics, logs, and diffs.
- Run configs queue dataset/evaluator jobs with optional repetitions, sampling, and tags.
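Here is a minimal sketch of how these four pieces might fit together. The builder call shapes and option names (`input`, `expected`, `tags`, `evaluate`, `repetitions`, and so on) are illustrative assumptions, not the documented `@m4trix/evals` signatures; consult the full documentation index above for the real reference.

```ts
// Illustrative sketch only: option names and call shapes are assumptions,
// not the documented @m4trix/evals API.
import { TestCase, Dataset, Evaluator, RunConfig, percentScore } from "@m4trix/evals";

// 1. Test case: typed input, optional expected output, and tags.
const greetCase = TestCase({
  name: "greet-basic",
  input: { prompt: "Say hello to Ada." },
  expected: { text: "Hello, Ada!" },
  tags: ["greeting", "smoke"],
});

// 2. Dataset: selects test cases by tag (and/or file path).
const smokeSet = Dataset({
  name: "smoke",
  tags: ["smoke"],
});

// 3. Evaluator: scores each selected case; may also record metrics, logs, diffs.
const exactMatch = Evaluator({
  name: "exact-match",
  evaluate: async ({ output, expected }) =>
    percentScore(output.text === expected?.text ? 100 : 0),
});

// 4. Run config: queues dataset/evaluator jobs with repetitions, sampling, tags.
const nightly = RunConfig({
  name: "nightly-smoke",
  dataset: smokeSet,
  evaluators: [exactMatch],
  repetitions: 3,
});
```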
Package Exports
Import the builders and helpers from `@m4trix/evals`:
- `Dataset`, `TestCase`, `Evaluator`, `RunConfig`
- `Score`, `Metric`, `percentScore`, `deltaScore`, `binaryScore`
- `tokenCountMetric`, `latencyMetric`
- `TagAndFilter`, `TagOrFilter`, `TagSet`
- `defineConfig`, `createRunner`, `withRunnerConfig`
- `S`, re-exported from Effect Schema
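A typical import pulls in only the pieces a suite needs; the grouping below is illustrative, and every name comes from the export list above:

```ts
import {
  Dataset,
  TestCase,
  Evaluator,
  RunConfig,
  percentScore,
  tokenCountMetric,
  defineConfig,
  createRunner,
  S, // re-exported from Effect Schema, e.g. for typing test case inputs
} from "@m4trix/evals";
```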
Example Project
The repository includes a complete example at `examples/evals-example`. It contains examples of datasets, test cases, evaluators, run configs, sampling, tag filters, and configuration.
Run it from the example directory: