Documentation Index
Fetch the complete documentation index at: https://docs.m4trix.dev/llms.txt
Use this file to discover all available pages before exploring further.
Use @m4trix/evals to define datasets, test cases, and evaluators for repeatable AI evaluation runs.
Location
How It Works
- Dataset — Groups test cases by tags and/or file paths
- Test Case — Defines input/output pairs (e.g. prompt + expected score threshold)
- Evaluator — Applies scoring logic to each test case
- RunConfig + CLI — Groups dataset/evaluator jobs and executes them with `eval-agents-simple run --run-config "..."`
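The relationship between these three building blocks can be sketched as plain TypeScript. The types and names below are illustrative assumptions, not the real @m4trix/evals API:

```typescript
// Minimal sketch of the Dataset / Test Case / Evaluator relationship.
// All type and field names here are assumptions for illustration only.
interface TestCase {
  input: string;    // prompt sent to the model
  expected: string; // expected output
}

interface Dataset {
  name: string;
  includedTags: string[]; // tags used to group test cases
  cases: TestCase[];
}

// An evaluator applies scoring logic to one test case's actual output.
type Evaluator = (tc: TestCase, actual: string) => number; // score in [0, 1]

const demo: Dataset = {
  name: 'demo',
  includedTags: ['demo'],
  cases: [{ input: 'What is 2 + 2?', expected: '4' }],
};

// Exact-match evaluator: 1 when the output equals the expectation, else 0.
const exactMatch: Evaluator = (tc, actual) => (actual === tc.expected ? 1 : 0);

// A "run" applies each evaluator to each test case in the dataset.
const scores = demo.cases.map((tc) => exactMatch(tc, '4'));
// scores[0] === 1
```

In the real package these pieces live in separate files and are wired together by a RunConfig; the sketch only shows how the concepts compose.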
Setup
- `*.dataset.ts` — Dataset definitions
- `*.evaluator.ts` — Evaluator definitions
- `*.test-case.ts` — Test case definitions
- `*.run-config.ts` — Named multi-job configs
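As a rough sketch, a `*.test-case.ts` file might export input/output pairs with an expected score threshold. The shape below is an assumption for illustration; the real file format defined by @m4trix/evals may differ:

```typescript
// src/evals/demo.test-case.ts (illustrative sketch; field names are assumed)
interface TestCaseDef {
  tags: string[];   // used by a dataset's includedTags to select this case
  prompt: string;   // input sent to the model under test
  expected: string; // reference output
  minScore: number; // expected score threshold for the evaluator
}

export const cases: TestCaseDef[] = [
  {
    tags: ['demo'],
    prompt: 'Summarize this sentence in five words or fewer.',
    expected: 'A short five-word summary.',
    minScore: 0.8,
  },
];
```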
Run Evals
Repeat `--run-config` to queue multiple configs; queued configs share a single `--concurrency` cap.
RunConfig names may use kebab-case, snake_case, camelCase, etc.; only letters, digits, `_`, and `-` are allowed (no spaces). The CLI matches names case-insensitively.
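The naming rule above can be expressed as a simple validator. The regex is an assumption based on the stated rule (letters, digits, `_`, `-`); the CLI's actual validation may differ:

```typescript
// Assumed character rule for RunConfig names: letters, digits, _, and -.
const NAME_RE = /^[A-Za-z0-9_-]+$/;

const isValidName = (name: string): boolean => NAME_RE.test(name);

// Case-insensitive matching, as the CLI is described to do.
const namesMatch = (a: string, b: string): boolean =>
  a.toLowerCase() === b.toLowerCase();

// isValidName('example-name') and isValidName('snake_case') are true;
// isValidName('has space') is false.
// namesMatch('ExampleName', 'examplename') is true.
```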
Key Files in evals-example
- `src/evals/demo.dataset.ts` — Dataset with `includedTags: ['demo']`
- `src/evals/demo.evaluator.ts` — Evaluators (score, length, multi-score, diff)
- `src/evals/demo.test-case.ts` — Test cases with prompts and expected outputs
- `src/evals/example-name.run-config.ts` — Example `RunConfig`
- `m4trix-eval.config.ts` — Discovery and artifact paths
Config
An optional `m4trix-eval.config.ts` at the project root controls discovery and artifact paths.
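A minimal sketch of what such a config could look like, covering the discovery and artifact-path concerns mentioned above. The field names and shape are assumptions, not the package's real schema:

```typescript
// m4trix-eval.config.ts (sketch; field names are assumed, not the real schema)
interface EvalConfig {
  include: string[];    // glob patterns used to discover eval files
  artifactDir: string;  // where run artifacts (scores, diffs, logs) are written
  concurrency?: number; // default parallelism, presumably overridable via --concurrency
}

const config: EvalConfig = {
  include: ['src/evals/**/*.{dataset,evaluator,test-case,run-config}.ts'],
  artifactDir: '.m4trix/artifacts',
  concurrency: 4,
};

export default config;
```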
Next
- Testing Guide
- @m4trix/evals package docs