You ask Claude to build a counter module. It writes the code in a tidy markdown block. You read it, copy it into your editor, hit run, and… nothing prints. Was the logic right? Did the constraint fire? Does facts.count actually end at three after three bumps?

You don't know. The AI doesn't know. You're both reasoning about static text.

We built a thing that changes that. Ask Claude (or Cursor, or Cline, or any MCP-compatible assistant) to build a Directive module and run it, and the answer comes back grounded in observed behavior – the actual facts, the actual logs, the actual errors – instead of "this should work."

This post is about why that matters and how to start using it today.

The gap in modern AI coding assistants

Every AI coding tool you've used works the same way: prompt in, code out. Sometimes the code's right. Sometimes it isn't. The cycle to find out is always the same – paste, run, read the output, copy the error back to the chat, ask the assistant to fix it, repeat.

That cycle assumes execution lives on your side. The AI writes; you run. Iteration speed is bottlenecked by how fast you can transcribe what just happened back into the conversation.

The fix is obvious in hindsight: give the assistant a runtime. Let it execute the code it just wrote and see what happened. Now the AI isn't guessing whether the constraint fires – it can check. It isn't asking you what the error message says – it has the error message. The "did this work?" round trip collapses from minutes to seconds.

That's the loop Directive's new run_in_sandbox tool closes.

What the AI sees in chat

Here's a real exchange with Claude running the Directive MCP server:

You: "Build me a Directive counter that doubles in a derivation, then run it and tell me what doubled looks like after three bumps."
Claude: [calls generate_module, gets a paired counter + runner] [calls run_in_sandbox with the paired files]
Sandbox returns:
logs:    ["[log] [start] facts: {\"count\":0}",
          "[log] [settled] facts: {\"count\":3}"]
facts:   { count: 3 }
derived: { doubled: 6 }
errors:  []
durationMs: 47
playgroundUrl: "https://directive.run/playground#files=…"
Claude: "After three bumps, facts.count is 3 and the derivation doubled is 6 – exactly as expected. The runner settled cleanly in 47 ms. Here's the source if you want to iterate, and a playground link so you can edit it live."

Notice what's different. The assistant tells you doubled: 6 because it read system.derive.doubled from the running system. It tells you the run settled in 47 ms because the sandbox timed it. The playground link is right there because the sandbox built one for you in the same response.

This is what people mean when they say "AI agents that close the loop." The agent isn't predicting what the code does; it's reporting what the code did.

Three ways to use it

1. As your AI assistant's runtime. Install the @directive-run/mcp server in Claude Desktop, Cursor, or any MCP-compatible client. Ask the assistant to build something with Directive, and ask it to run that something. The transcript comes back in chat. No copy-paste, no context switching.

2. As a web playground. Head to directive.run/playground, paste a Directive snippet, and click Run. The DevTools panel on the right shows the facts, the logs, and any errors. Edit the snippet in the StackBlitz embed and re-run with one click. Same sandbox, different UI.

3. As a CI gate or internal tool. The @directive-run/sandbox package is on npm. Drop it into any Node.js project to validate examples in your docs, gate generated code in a code-review bot, or build your own teaching playground. The API is one function:

import { runInSandbox } from "@directive-run/sandbox";

const result = await runInSandbox({
  files: [
    { path: "src/counter.ts", source: moduleCode },
    { path: "src/main.ts", source: runnerCode },
  ],
  timeoutMs: 5000,
});

if (result.errors.length === 0 && !result.timedOut) {
  console.log("Example settled cleanly:", result.facts);
}

What it can and can't do

It can:

Execute any TypeScript snippet that imports from the Directive ecosystem (@directive-run/core, react, vue, svelte, solid, lit, ai, query, and ten others).
Capture every console.log / warn / error.
Snapshot facts and computed derivations after the system settles.
Time out cleanly on infinite loops (default 5-second budget).
Run on Vercel, AWS Lambda, Cloud Run, or any Node 20+ environment.

It can't:

Reach the network – outbound HTTP to any private IP, localhost, or cloud metadata endpoint is blocked by design.
Touch the filesystem – no fs, no process, no require.
Run forever – wall-clock budget caps at 10 seconds.
Replace your real test suite – it's a "what does this code do at a glance?" tool, not a property-based testing framework.

For the curious: the sandbox documentation explains how the boundary is enforced, and the engineering retrospective walks through how we built it.

Why this matters for AI-assisted development

When the AI can run the code it writes, three things change:

Iteration gets faster. A failed run no longer requires you to paste the error back. The AI sees the error in its own response and can ask you whether to fix it, or just fix it and re-run.

Confidence gets real. "I built a module" becomes "I built a module and it produces {count: 3, doubled: 6} after three bumps." That's evidence, not assertion.

Documentation stays honest. A docs site that runs every example through the sandbox in CI can't ship code samples that don't actually settle. The transcript is the test.

Teaching becomes interactive. A tutorial blog post can ship a Directive snippet that the reader can edit in-page and see the result of, without ever leaving the article.

Get started in 60 seconds

If you're using Claude Desktop:

{
  "mcpServers": {
    "directive": {
      "command": "npx",
      "args": ["-y", "@directive-run/mcp"]
    }
  }
}

Drop that into ~/Library/Application Support/Claude/claude_desktop_config.json (Mac) or the equivalent on Windows, restart Claude, and ask:

"Use the directive MCP server to generate a traffic-light module and run it."

The first response should include the source, the captured transcript, and a clickable playground link. That's the loop closing.

If you're not using Claude Desktop yet, the playground needs no install – paste a snippet, click Run, watch the transcript appear.

What's next

The sandbox is the foundation for a much bigger story. We're working on:

Live re-run on every edit in the playground, so the transcript updates as you type.
Streaming transcripts in the MCP tool – the LLM sees facts.count: 1 arrive at the moment the system dispatches an event, not just the final snapshot.
Schema validation before execution – catch shape errors at the type layer instead of as runtime throws.
A validate_snippet MCP tool so the AI can check whether code WOULD run, without actually running it.

If you're building something that needs safe AI-generated code execution – a hosted playground, a CI gate, a teaching tool, an internal copilot – the sandbox is on npm today as @directive-run/sandbox. The MCP server with run_in_sandbox is @directive-run/mcp.

The whole point of an AI coding assistant is that you spend less time wondering whether the code works. Now your AI can find out for you.

Related reading:

Sandbox documentation – the API, the boundary, the cost model.
Inside the Directive Sandbox – the engineering retrospective.
Playground – paste, run, see what happened.
@directive-run/mcp on npm.
@directive-run/sandbox on npm.

AI That Actually Runs Your Code – How Directive's Sandbox Closes the Loop Between Code Generation and Execution