Sandbox
The sandbox executes a Directive snippet inside a resource-bounded worker_threads worker and returns what happened: captured console.log output, the post-settle() facts snapshot, derivation values, and any errors. It's the engine behind two surfaces – the run_in_sandbox MCP tool and the playground's live transcript panel – and it's available as a standalone npm package for anyone building their own playground, CI gate, or teaching tool.
Why a sandbox
LLMs and humans both generate Directive code. The natural next question is "does it actually work?" Without an execution layer, the answer is "probably; run it locally and see." The sandbox closes that loop:
- Agent-side – the MCP tool returns the transcript so the AI can self-correct based on observed behavior rather than just static lint output.
- Web-side – the playground's Run button hits the sandbox and renders the same transcript inline, so a user can iterate from edits in the StackBlitz embed without leaving the page.
- CI-side – a docs site that wants to verify every example actually settles can run them all through the sandbox in a single pass.
What the sandbox isn't: a general-purpose code runner. It's tuned for Directive – it knows how to read facts via $store.toObject(), how to walk system.derive, how to wait through settle(). It refuses imports outside the @directive-run/* allowlist by design.
API
Install:
pnpm add @directive-run/sandbox
The package has esbuild and ts-morph as optional dependencies; install them too if your runtime doesn't already vendor them.
The single entry point is runInSandbox:
import { runInSandbox } from "@directive-run/sandbox";

// moduleSource and runnerSource are strings holding the TypeScript source of each file.
const result = await runInSandbox({
  files: [
    { path: "src/counter.ts", source: moduleSource },
    { path: "src/main.ts", source: runnerSource },
  ],
  timeoutMs: 5000,
});

console.log(result.logs);    // ["[log] [start] count= 0", "[log] [settled] count= 2"]
console.log(result.facts);   // { count: 2 }
console.log(result.derived); // { isPositive: true }
console.log(result.errors);  // []
The response shape:
interface SandboxResult {
  logs: string[];                   // captured console.log/warn/error lines
  facts: Record<string, unknown>;   // system.facts.$store.toObject() snapshot
  derived: Record<string, unknown>; // system.derive[key] snapshot per declared key
  errors: string[];                 // structured error messages
  durationMs: number;
  timedOut: boolean;
}
For already-runnable snippets (the kind get_example or fix_code returns), pass { source: "..." } instead and the package maps it onto src/main.ts internally.
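A minimal sketch of that mapping, assuming the two input shapes described above. The SandboxInput type and the toFiles helper are hypothetical names used for illustration; the package's internals may differ.

```typescript
// Hypothetical input shapes; only `files`, `source`, and `timeoutMs`
// appear in the documentation above.
interface SandboxFile {
  path: string;
  source: string;
}

type SandboxInput =
  | { files: SandboxFile[]; timeoutMs?: number }
  | { source: string; timeoutMs?: number };

// Normalize either shape to a file list. A bare `source` becomes a
// single src/main.ts entry, matching the behavior described above.
function toFiles(input: SandboxInput): SandboxFile[] {
  if ("files" in input) {
    return input.files;
  }
  return [{ path: "src/main.ts", source: input.source }];
}
```

Normalizing up front means everything downstream of the entry point only has to handle one shape.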
Sandbox boundary
Four layers, in order:
1. AST allowlist validator (powered by ts-morph). Pre-flights every file before the bundler:
   - Imports must match the curated @directive-run/* set (core, ai, query, react, vue, svelte, solid, lit, el, optimistic, timeline, mutator, knowledge, scaffold, claude-plugin, lint) or a relative ./*.js path inside the payload. Anything else (node:fs, express, @sizls/*) is rejected.
   - @directive-run/{cli,mcp,sandbox,vite-plugin-api-proxy} are explicitly denied. They're build / CLI / sandbox-meta tooling with no legitimate use inside a sandboxed demo.
   - Free identifier references to process, require, fetch, Buffer, eval, setTimeout, etc. are denied.
   - Property-access bypass chains are rejected too (v0.3.0+): globalThis.process, globalThis.fetch, globalThis["X"] bracket syntax, .constructor access on any value, Function(...) calls, and Reflect.get(globalThis, …) smuggling. These were the holes that an earlier "skip property-key positions" rule unintentionally opened up; the Phase A security audit traced the chains and v0.3.0 closes them.
2. esbuild bundler with absolute-path rewriting. The payload is virtualized into a single ESM string with @directive-run/* imports rewritten to absolute file:// URLs of the host's resolved node_modules paths. The worker can then import the bundle from /tmp without needing a node_modules chain above it – which means Vercel, AWS Lambda, Cloud Run, and similar read-only-FS deploy targets all work.
3. worker_threads.Worker with resourceLimits. 32 MB heap, 16 MB code, and a wall-clock budget clamped to [100 ms, 10 s] (default 5 s). The worker is hard-terminated on overrun via worker.terminate() – no cooperative cancellation needed. The host writes the bundle to a fresh tmp directory per call and cleans up in finally, so leaked workers can't accumulate disk usage.
4. Outbound fetch wrapper. The worker patches globalThis.fetch before importing the bundle. The wrapper blocks loopback (127.0.0.0/8, ::1, localhost), link-local (169.254.0.0/16, which includes the AWS/GCP/Azure IMDS endpoint at 169.254.169.254), RFC-1918 private (10.0.0.0/8, 172.16.0.0/12, 192.168.0.0/16), multicast, IPv4-mapped IPv6 in both literal and hex form, and non-HTTP(S) protocols. The user's snippet can't call fetch (the validator blocks it as a free identifier and as globalThis.fetch access), but @directive-run/query's internal fetch calls run inside the package's module body and the validator never sees them – the wrapper is the only place that can intercept.
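The import rule and the timeout clamp can be sketched as two small pure functions. This is an illustration under stated assumptions, not the package's validator: the real check walks a ts-morph AST, while isImportAllowed and clampTimeoutMs are hypothetical helpers that operate on plain values. Judging subpath imports by their package name is also an assumption.

```typescript
// Package names from the curated allowlist described above.
const ALLOWED_PACKAGES: ReadonlySet<string> = new Set([
  "core", "ai", "query", "react", "vue", "svelte", "solid", "lit", "el",
  "optimistic", "timeline", "mutator", "knowledge", "scaffold",
  "claude-plugin", "lint",
]);

const SCOPE = "@directive-run/";

// Hypothetical helper: decide whether one import specifier passes.
function isImportAllowed(specifier: string): boolean {
  if (specifier.startsWith(SCOPE)) {
    // Assumption: a subpath import is judged by its package name.
    const pkg = specifier.slice(SCOPE.length).split("/")[0];
    // cli, mcp, sandbox, and vite-plugin-api-proxy are simply absent
    // from the set, so they fall out as denied.
    return ALLOWED_PACKAGES.has(pkg);
  }
  // Relative imports must look like "./x.js" and never climb with "..".
  return (
    specifier.startsWith("./") &&
    specifier.endsWith(".js") &&
    specifier.split("/").indexOf("..") === -1
  );
}

// Hypothetical helper: clamp a requested timeout to the documented
// [100 ms, 10 s] window, using the 5 s default when the value is
// missing or not a finite number.
function clampTimeoutMs(requested?: number): number {
  if (requested === undefined || !isFinite(requested)) {
    return 5000;
  }
  return Math.min(10000, Math.max(100, requested));
}
```

Checking that a relative path actually resolves to a file in the payload is left out of the sketch; it would need the payload's file list as an extra argument.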
What the boundary doesn't cover
- Network access to public hosts. A snippet that imports @directive-run/query can still hit any public URL it can spell. The wrapper blocks the private ranges, not the public internet.
- CPU and memory pressure outside the V8 heap. resourceLimits is V8-heap-only. A snippet that allocates a giant Buffer (blocked at the validator), spins microtasks, or builds deeply nested structures can still exhaust memory until the wall-clock limit kills it.
- Trust boundary inversion. The sandbox protects the host from the snippet, not the snippet from the host. If you embed the sandbox in a server, the server's process.env, node_modules layout, and child-process state are visible to your own code paths even though the snippet can't touch them.
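The private-versus-public distinction in the first bullet can be sketched as a URL classifier. This is a simplified illustration, not the package's wrapper: isBlockedUrl is a hypothetical name, IPv4-mapped IPv6 handling is omitted, and hostnames are not resolved through DNS, so a public name that points at a private address is not caught here.

```typescript
// Hypothetical helper: true when a URL targets a range the fetch
// wrapper is described as blocking. Simplified; see the caveats above.
function isBlockedUrl(raw: string): boolean {
  let url: URL;
  try {
    url = new URL(raw);
  } catch {
    return true; // unparseable input is refused
  }

  // Only HTTP(S) is allowed through.
  if (url.protocol !== "http:" && url.protocol !== "https:") {
    return true;
  }

  const host = url.hostname.toLowerCase();
  // The URL parser keeps brackets around IPv6 literals.
  if (host === "localhost" || host === "[::1]") {
    return true;
  }

  const match = /^(\d{1,3})\.(\d{1,3})\.(\d{1,3})\.(\d{1,3})$/.exec(host);
  if (!match) {
    return false; // a DNS name or other IPv6 literal: not classified here
  }

  const a = Number(match[1]);
  const b = Number(match[2]);
  if (a === 127) return true;                      // loopback 127.0.0.0/8
  if (a === 169 && b === 254) return true;         // link-local, incl. IMDS
  if (a === 10) return true;                       // RFC 1918 10.0.0.0/8
  if (a === 172 && b >= 16 && b <= 31) return true; // RFC 1918 172.16.0.0/12
  if (a === 192 && b === 168) return true;         // RFC 1918 192.168.0.0/16
  if (a >= 224 && a <= 239) return true;           // multicast 224.0.0.0/4
  return false;                                    // public IPv4
}
```

Any public address or hostname falls through to false, which is exactly the gap the bullet describes: the check narrows where a request can go, but it does not stop outbound traffic to the public internet.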
The full threat-model coverage map – defended-vs-not-defended class by class – lives in docs/AE-AUDIT-SANDBOX.md in the directive monorepo.
Two ways to use it
As an MCP tool
The @directive-run/mcp server exposes run_in_sandbox as a tool. AI clients (Claude Desktop, Cursor, Cline) call it directly after generating code:
"Generate a Directive counter module, then run it and tell me what facts I see."
The LLM calls generate_module, pipes the paired output through run_in_sandbox, and reads the transcript back. The response includes a playgroundUrl for click-through editing if the user wants to iterate further.
As a docs-site API route
The directive.run/playground page hits an internal /api/sandbox Next.js route that wraps runInSandbox. The DevTools panel's Run button POSTs the source and runner files, then renders the returned transcript inline. Same boundary, same execution stack, different UI.
For a sense of what's live: open the playground, paste a small Directive module, click Run, and watch the Facts / Logs / Errors tabs populate.
When to use what
- You want to demo a module to a user – use playground_link. The user gets a clickable URL that boots a real running project in StackBlitz; they can edit live.
- You want the AI to see what the module did and self-correct – use run_in_sandbox. The transcript comes back in-chat so the next reasoning step has the observed behavior to work with.
- You're building a CI gate that asserts every example settles cleanly – use the standalone @directive-run/sandbox package and check result.errors.length === 0 && !result.timedOut.
- You're shipping a teaching tool – wrap runInSandbox behind your own UI, and surface result.logs and result.facts next to the user's editor.
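For the CI-gate case, the check above can be wrapped in a small reporting helper. A sketch, assuming results have already been collected from runInSandbox; settledCleanly, collectFailures, and the NamedResult shape are hypothetical names, and SandboxResult mirrors the interface documented earlier.

```typescript
// Mirrors the SandboxResult shape documented in the API section.
interface SandboxResult {
  logs: string[];
  facts: Record<string, unknown>;
  derived: Record<string, unknown>;
  errors: string[];
  durationMs: number;
  timedOut: boolean;
}

// Hypothetical pairing of an example's name with its result.
interface NamedResult {
  name: string;
  result: SandboxResult;
}

// The pass condition from the list above.
function settledCleanly(result: SandboxResult): boolean {
  return result.errors.length === 0 && !result.timedOut;
}

// Build one human-readable line per failing example.
function collectFailures(results: NamedResult[]): string[] {
  const failures: string[] = [];
  for (const { name, result } of results) {
    if (settledCleanly(result)) {
      continue;
    }
    // A timeout takes precedence over any error messages in the report.
    const reason = result.timedOut
      ? `timed out after ${result.durationMs} ms`
      : result.errors.join("; ");
    failures.push(`${name}: ${reason}`);
  }
  return failures;
}
```

A CI script would print the returned lines and exit non-zero when the array is non-empty.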
Cost model
- Cold start (first call after a process boot) – ts-morph and esbuild load lazily, so first-call latency is in the 300–800 ms range.
- Warm path – about 50–70 ms per call for a typical Directive snippet, plus whatever the snippet itself spends in settle().
- Memory – workers are not pooled. Each call spins a fresh worker and tears it down, so memory state never carries between calls.
If you're driving high-QPS traffic, the package isn't the right fit; it's tuned for interactive use and CI gates, not for use as a hot-path code runner.
See also
- Playground – the live UI that wraps the sandbox API route.
- @directive-run/mcp – the MCP server that exposes run_in_sandbox as a tool.
- @directive-run/sandbox – the npm package, with the canonical API reference in the package README.
- Sources – the realtime primitive sandbox snippets often compose with.

