Stop guessing if agents can use your service.
AXIS is an open scoring framework and CLI that measures how well your project, APIs, and tooling actually work when an AI agent tries to use them. Run real agents against real scenarios, capture every tool call, and get a comparable 0–100 score back.
$ npm i -g @netlify/axis
$ axis init
$ axis run

Four dimensions, one score.
A single pass/fail tells you nothing about why an agent struggled. AXIS scores four independent dimensions so you can focus on what matters: a slow API, a confusing project layout, a noisy tool, or the agent's own decisions.
- Did the agent actually finish the task? Scored against your rubric checks by an LLM judge.
- Shell, filesystem, build tools. Measures whether your project structure and dev workflow trip agents up.
- APIs, MCP tools, third-party services. Tells you whether your endpoints are actually usable by an agent.
- Planning, tool selection, self-organization. Captures the quality of the agent's own decisions.
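To make "four dimensions, one score" concrete, here is the kind of shape a run's scores could take. This is a sketch only: the dimension keys, the numbers, and the output format are assumptions for illustration, not AXIS's documented report.

```yaml
# Illustrative sketch only: key names and numbers are invented,
# not actual AXIS output.
score: 78                 # the single comparable 0-100 number
dimensions:
  task_completion: 90     # did the agent finish, per the rubric
  environment: 75         # shell, filesystem, build-tool friction
  integration: 60         # APIs, MCP tools, third-party services
  agent_quality: 85       # planning, tool selection, self-organization
```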
Every run, fully inspectable.
AXIS doesn't just hand you a number. Each run produces a self-contained HTML report with the full transcript, every tool call, the LLM judge's per-criterion grading, and a sparse-index view that's optimized for skimming.
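To give a feel for what "fully inspectable" means, the fragment below sketches the kind of detail a report surfaces: one captured tool call plus the judge's grade for one rubric criterion. Every field name here is invented for illustration; the real artifact is the HTML report itself.

```yaml
# Invented fragment: illustrates the kind of detail captured per run,
# not the actual report format.
interaction:
  tool: shell
  input: npm test
  exit_code: 1
  duration_ms: 8412
judge:
  criterion: build-passes
  grade: fail
  reasoning: "Test suite exited nonzero and the agent moved on without re-running it."
```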
Score it. Baseline it. Gate CI on it.
AXIS is built to slot into the same place in your workflow as your unit tests, just for agent experience. Run it locally to iterate, then turn on baselines and let your CI catch regressions automatically.
A JSON or YAML file with a prompt, a rubric, and optional setup steps. Five lines is enough to start.
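As a rough illustration of how small a scenario can be, here is a hypothetical YAML file. The field names (prompt, setup, rubric) are assumptions for this sketch, not the documented AXIS schema.

```yaml
# Hypothetical scenario: field names are illustrative, not the AXIS schema.
prompt: "Add a /healthz endpoint that returns 200 with build info."
setup:
  - npm install
rubric:
  - "GET /healthz returns HTTP 200"
  - "npm run build exits 0"
```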
40+ built-in agents, including Claude Code, Codex, and Gemini, plus any ACP-compliant agent. Bring your own with a small adapter.
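What a "small adapter" looks like is an AXIS detail not shown here; purely as a hypothetical sketch, registering a custom agent might amount to pointing AXIS at a command that speaks ACP. Every key below is invented for illustration.

```yaml
# Hypothetical config entry: keys are invented, not documented AXIS config.
agents:
  my-agent:
    command: ./bin/my-agent    # assumed to speak ACP over stdio
```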
Snapshot a passing run, commit the baseline, and any future regression beyond noise tolerance fails the build.
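In CI, the gate can look like any other test step. The workflow below is a sketch assuming GitHub Actions and assuming that axis run exits nonzero when a run regresses past the committed baseline; the exact baseline flags and mechanics are AXIS's own.

```yaml
# Sketch of a CI gate. Assumes `axis run` fails the step on a regression
# beyond the baseline's noise tolerance.
name: agent-experience
on: [pull_request]
jobs:
  axis:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
      - run: npm i -g @netlify/axis
      - run: axis run    # compares against the committed baseline (assumption)
```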
Ship for humans and the agents they use.
AXIS is open source and free.
You can wire it into your project in a couple of minutes.