@m4trix/evals ships two CLI entry points:
- `eval-agents-simple` for scripted runs and CI.
- `eval-agents` for the interactive terminal UI.
eval-agents-simple
Run a Config
- `--run-config <name>` selects a discovered `RunConfig` by `name`; repeat it to queue several configs.
- `--concurrency, -c N` caps concurrent test-case executions. The simple CLI defaults to `4`.
- `--experiment <name>` forwards the label to evaluator `meta.experimentName`.
- `--ci` exits with code `1` if any test case fails.
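For example, a CI invocation that queues two configs might look like the following; the config names, experiment label, and concurrency value are placeholders:

```sh
# Queue two discovered RunConfigs, cap parallelism, label the run,
# and fail the build if any test case fails. Names are examples only.
eval-agents-simple \
  --run-config smoke \
  --run-config regression \
  --concurrency 8 \
  --experiment nightly \
  --ci
```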
Generate Dataset Cases
This command accepts the `Dataset.define({ name })` id: it resolves the dataset and generates a case file from the matching discovered cases.
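The id in question is the `name` passed to the dataset definition. A minimal sketch, assuming `Dataset` is exported from `@m4trix/evals` and that `name` alone is enough for discovery:

```ts
// checkout.dataset.ts — minimal sketch; the import path and the idea
// that `name` is the only required field are assumptions, not confirmed API.
import { Dataset } from "@m4trix/evals";

// "checkout-flow" is the id the generate command resolves.
export default Dataset.define({ name: "checkout-flow" });
```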
Interactive UI
Config File
Create `m4trix-eval.config.ts` in the project root to customize discovery, artifacts, and default runner concurrency.
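A minimal sketch of such a config; only `artifactDirectory` and the concurrency default of `4` appear elsewhere in these docs, so the default-export shape and the `concurrency` key name are assumptions:

```ts
// m4trix-eval.config.ts — hedged sketch; option names beyond
// artifactDirectory are assumptions about the config schema.
export default {
  artifactDirectory: ".eval-results", // where run snapshots are written
  concurrency: 4,                     // default runner concurrency
};
```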
Default Discovery
Without a config file, discovery starts at `process.cwd()` and scans for:
- Datasets: `.dataset.ts`, `.dataset.tsx`, `.dataset.js`, `.dataset.mjs`
- Evaluators: `.evaluator.ts`, `.evaluator.tsx`, `.evaluator.js`, `.evaluator.mjs`
- Run configs: `.run-config.ts`, `.run-config.tsx`, `.run-config.js`, `.run-config.mjs`
- Test cases: `.test-case.ts`, `.test-case.tsx`, `.test-case.js`, `.test-case.mjs`
Discovery skips `node_modules`, `dist`, `.next`, `.git`, and `.pnpm-store`.
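For example, given a layout like the following (file names are placeholders), discovery would register one artifact of each kind:

```
src/
  checkout.dataset.ts       # dataset
  accuracy.evaluator.ts     # evaluator
  smoke.run-config.ts       # run config
  checkout.test-case.ts     # test case
node_modules/               # skipped
dist/                       # skipped
```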
Artifacts
Run results are written to `.eval-results` by default. Each run snapshot includes:
- run id, status, timestamps, and artifact path
- dataset id and display name
- evaluator ids
- total, completed, passed, and failed test-case counts
- per-test-case scores, metrics, logs, diffs, and errors
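Mapped to a TypeScript shape, a snapshot might look roughly like this; the field names, types, and nesting are assumptions derived from the list above, not the library's actual format:

```ts
// Hedged sketch of a run snapshot; the on-disk format is unspecified
// in these docs, so every identifier here is illustrative.
interface RunSnapshot {
  runId: string;
  status: "running" | "passed" | "failed"; // assumed status values
  startedAt: string;
  finishedAt?: string;
  artifactPath: string;
  dataset: { id: string; displayName: string };
  evaluatorIds: string[];
  counts: { total: number; completed: number; passed: number; failed: number };
  testCases: Array<{
    score: number;
    metrics: Record<string, number>;
    logs: string[];
    diff?: string;
    error?: string;
  }>;
}
```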
Set `artifactDirectory` in `m4trix-eval.config.ts` when you want artifacts colocated with your eval files or persisted by CI.
Config Precedence
Runner settings are applied in this order:
- Built-in defaults
- `m4trix-eval.config.ts`
- Explicit `createRunner({...})` overrides
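So an explicit override always wins. A sketch, assuming `createRunner` is exported from `@m4trix/evals` and accepts a `concurrency` option mirroring the config file:

```ts
// Hedged sketch; the import path and option name are assumptions.
import { createRunner } from "@m4trix/evals";

// This value beats both m4trix-eval.config.ts and the built-in default of 4.
const runner = createRunner({ concurrency: 8 });
```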