Stop guessing if agents can use your service.
AXIS is an open scoring framework and CLI that measures how well your project, APIs, and tooling actually work when an AI agent tries to use them. Run real agents against real scenarios, capture every tool call, and get a comparable 0–100 score back.
$ npm i -g @netlify/axis
$ axis init
$ axis run

Four dimensions, one score.
A single pass/fail tells you nothing about why an agent struggled. AXIS scores four independent dimensions so you can focus on what matters: a slow API, a confusing project layout, a noisy tool, or the agent's own decisions.
- Did the agent actually finish the task? Scored against your rubric checks by an LLM judge.
- Shell, filesystem, build tools. Measures whether your project structure and dev workflow trip agents up.
- APIs, MCP tools, third-party services. Tells you whether your endpoints are actually usable by an agent.
- Planning, tool selection, self-organization. Captures the quality of the agent's own decisions.
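To make "four dimensions, one score" concrete, here is the kind of shape a run's scores could take. This is a sketch only: the dimension keys, the numbers, and the output format are assumptions for illustration, not AXIS's documented report.

```yaml
# Illustrative sketch only: key names and numbers are invented,
# not actual AXIS output.
score: 78                 # the single comparable 0-100 number
dimensions:
  task_completion: 90     # did the agent finish, per the rubric
  environment: 75         # shell, filesystem, build-tool friction
  integration: 60         # APIs, MCP tools, third-party services
  agent_quality: 85       # planning, tool selection, self-organization
```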
Every run, fully inspectable.
AXIS doesn't just hand you a number. Each run produces a self-contained HTML report with the full transcript, every tool call, the LLM judge's per-criterion grading, and a sparse-index view that's optimized for skimming.
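To give a feel for what "fully inspectable" means, the fragment below sketches the kind of detail a report surfaces: one captured tool call plus the judge's grade for one rubric criterion. Every field name here is invented for illustration; the real artifact is the HTML report itself.

```yaml
# Invented fragment: illustrates the kind of detail captured per run,
# not the actual report format.
interaction:
  tool: shell
  input: npm test
  exit_code: 1
  duration_ms: 8412
judge:
  criterion: build-passes
  grade: fail
  reasoning: "Test suite exited nonzero and the agent moved on without re-running it."
```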
Score it. Baseline it. Gate CI on it.
AXIS is built to slot into the same place in your workflow as your unit tests, just for agent experience. Run it locally to iterate, then turn on baselines and let your CI catch regressions automatically.
A JSON or YAML file with a prompt, a rubric, and optional setup steps. Five lines is enough to start.
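As a rough illustration of how small a scenario can be, here is a hypothetical YAML file. The field names (prompt, setup, rubric) are assumptions for this sketch, not the documented AXIS schema.

```yaml
# Hypothetical scenario: field names are illustrative, not the AXIS schema.
prompt: "Add a /healthz endpoint that returns 200 with build info."
setup:
  - npm install
rubric:
  - "GET /healthz returns HTTP 200"
  - "npm run build exits 0"
```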
40+ built-in agents, including Claude Code, Codex, and Gemini, plus any ACP-compliant agent. Bring your own with a small adapter.
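What a "small adapter" looks like is an AXIS detail not shown here; purely as a hypothetical sketch, registering a custom agent might amount to pointing AXIS at a command that speaks ACP. Every key below is invented for illustration.

```yaml
# Hypothetical config entry: keys are invented, not documented AXIS config.
agents:
  my-agent:
    command: ./bin/my-agent    # assumed to speak ACP over stdio
```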
Snapshot a passing run, commit the baseline, and any future regression beyond noise tolerance fails the build.
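In CI, the gate can look like any other test step. The workflow below is a sketch assuming GitHub Actions and assuming that axis run exits nonzero when a run regresses past the committed baseline; the exact baseline flags and mechanics are AXIS's own.

```yaml
# Sketch of a CI gate. Assumes `axis run` fails the step on a regression
# beyond the baseline's noise tolerance.
name: agent-experience
on: [pull_request]
jobs:
  axis:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
      - run: npm i -g @netlify/axis
      - run: axis run    # compares against the committed baseline (assumption)
```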
Ship for humans and the agents they use.
AXIS is open source and free.
You can wire it into your project in a couple of minutes.