DevTools

A visual debugger that shows you what your AI agents did, why decisions were made, how long things took, and what went wrong.

Try it live

DevTools is active on all example pages. Visit Safety Shield or Checkpoint and click the Directive logo button (bottom-left) to inspect the system. For the full AI DevTools experience with streaming events, try the AI Chat demo.

What you can see: agent execution timeline, cost breakdown, constraint evaluation, guardrail results, breakpoints, live token streaming, session replay, and state inspection across 13 specialized views.

How It Works

When you enable debug: true on an orchestrator, it records every decision as a timestamped event in a debug timeline. The DevTools server streams these events over WebSocket to the DevTools UI, which visualizes them in real time.

Your App → Orchestrator (debug: true) → Timeline (event log)
                                                  ↓
                                        DevTools Server (WebSocket)
                                                  ↓
                                        DevTools UI (browser)

DevTools vs SSE Transport

These are two separate outputs from the same orchestrator. They serve different purposes and can be used independently.

DevTools is for you, the developer. It streams debug events (timeline, health, state) to the DevTools UI so you can inspect what happened inside the orchestrator.

SSE Transport (createSSETransport) is for your users. It streams agent text responses to your frontend – the typing effect you see in ChatGPT-style interfaces.

| | DevTools | SSE Transport |
|---|---|---|
| Purpose | Debug & inspect | Stream responses to users |
| Audience | Developer | End user |
| Protocol | WebSocket (bidirectional) | HTTP SSE (one-way) |
| Data | Timeline events, health, state | Agent text tokens |
| Function | connectDevTools() | createSSETransport() |

You can use one, both, or neither.


Quick Start

One line to connect DevTools to your orchestrator:

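// createMultiAgentOrchestrator is assumed to be exported from the same package.
import { connectDevTools, createMultiAgentOrchestrator } from '@directive-run/ai';

const PORT = 4040;

// `runner` is your agent runner, defined elsewhere in your app.
const orchestrator = createMultiAgentOrchestrator({
  runner,
  agents: { /* ... */ },
  debug: true, // record decisions to the debug timeline
});

const server = await connectDevTools(orchestrator, { port: PORT });

console.log(`DevTools server on ws://localhost:${PORT}`);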

Open the DevTools UI and connect to ws://localhost:4040.


Manual Setup

For full control over the server configuration:

import { createDevToolsServer, createWsTransport } from '@directive-run/ai';

// Assumes `orchestrator` was created with debug: true (as in Quick Start) and
// that `buildSnapshot` is a snapshot helper available in your code.
const transport = await createWsTransport({ port: 4040 });

const server = createDevToolsServer({
  transport,
  timeline: orchestrator.timeline!,
  healthMonitor: orchestrator.healthMonitor,
  getSnapshot: () => buildSnapshot(orchestrator),
  getBreakpointState: () => orchestrator.getPendingBreakpoints(),
  onResumeBreakpoint: (id, mods) => orchestrator.resumeBreakpoint(id, mods),
  onCancelBreakpoint: (id, reason) => orchestrator.cancelBreakpoint(id, reason),
  getScratchpadState: () => orchestrator.scratchpad?.getAll() ?? {},
  getDerivedState: () => orchestrator.derived ?? {},
  maxClients: 50,
  batchSize: 1,        // events per batch message
  batchIntervalMs: 50, // batch flush interval (ms)
});

Configuration

| Option | Type | Default | Description |
|---|---|---|---|
| port | number | 4040 | WebSocket server port |
| host | string | "localhost" | Host to bind to |
| maxClients | number | 50 | Maximum concurrent DevTools clients |
| batchSize | number | 1 | Events per batch message |
| batchIntervalMs | number | 50 | Batch flush interval (ms) |
| authenticate | (token: string) => boolean \| Promise<boolean> | (none) | Token validation callback (see Authentication) |
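
To illustrate how batchSize and batchIntervalMs could interact, here is a minimal sketch of an event batcher. It is not the library's implementation: createBatcher, push, and flush are hypothetical names, and the "send when the batch is full, otherwise after the interval" behaviour is an assumption about what the two options mean.

```typescript
// Hypothetical sketch: not Directive's actual batching code.
type Flush<T> = (batch: T[]) => void;

function createBatcher<T>(batchSize: number, batchIntervalMs: number, send: Flush<T>) {
  let pending: T[] = [];
  let timer: ReturnType<typeof setTimeout> | undefined;

  function flush(): void {
    if (timer !== undefined) {
      clearTimeout(timer);
      timer = undefined;
    }
    if (pending.length === 0) return;
    const batch = pending;
    pending = [];
    send(batch);
  }

  function push(item: T): void {
    pending.push(item);
    if (pending.length >= batchSize) {
      flush(); // size threshold reached: send immediately
    } else if (timer === undefined) {
      timer = setTimeout(flush, batchIntervalMs); // otherwise send after the interval
    }
  }

  return { push, flush };
}

// Usage: a batch size of 3 groups the first three events; the fourth is flushed manually.
const sent: number[][] = [];
const batcher = createBatcher<number>(3, 50, (batch) => sent.push(batch));
[1, 2, 3, 4].forEach((n) => batcher.push(n));
batcher.flush(); // force out the remainder instead of waiting 50 ms
```

Under this reading, the default batchSize of 1 would send every event as soon as it is recorded, and the interval only matters once batchSize is raised.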

Remote Connections

By default, the DevTools server binds to localhost – only accessible from the same machine. To debug a remote orchestrator (staging, production, another machine on your network):

const server = await connectDevTools(orchestrator, {
  port: 4040,
  host: "0.0.0.0", // Expose to all network interfaces
  authenticate: (token) => token === process.env.DEVTOOLS_TOKEN,
});

Security

Binding to 0.0.0.0 exposes the server to your entire network. Always use authentication when exposing DevTools beyond localhost. Use wss:// (WebSocket over TLS) in production via a reverse proxy.
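
The === comparison in the example above is fine for local experiments. For a server exposed to a network, a constant-time comparison is a common hardening step. The following is a minimal sketch using Node's built-in crypto module; tokensMatch is a hypothetical helper and not part of Directive.

```typescript
import { createHash, timingSafeEqual } from "node:crypto";

// Compare two tokens in constant time. Hashing first gives both buffers the
// same length, which timingSafeEqual requires.
function tokensMatch(provided: string, expected: string | undefined): boolean {
  if (!expected) return false; // refuse everything if no secret is configured
  const a = createHash("sha256").update(provided).digest();
  const b = createHash("sha256").update(expected).digest();
  return timingSafeEqual(a, b);
}

// Has the same shape as the `authenticate` option shown above.
const authenticate = (token: string): boolean =>
  tokensMatch(token, process.env["DEVTOOLS_TOKEN"]);
```

Returning false when the environment variable is unset avoids accidentally accepting connections on a misconfigured host.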

When you need this:

  • Debugging a staging/production orchestrator from your local DevTools UI
  • Team debugging – multiple developers inspecting the same orchestrator
  • Cloud-hosted DevTools connecting to your running server

In the DevTools UI, enter the remote URL (e.g., ws://staging.internal:4040) and the auth token to connect.


Authentication

Token-based authentication for remote DevTools connections. The browser WebSocket API doesn't allow setting custom headers, so the client authenticates by sending its token as the first message after the connection opens.

Server Side

const server = await connectDevTools(orchestrator, {
  port: 4040,
  host: "0.0.0.0",
  authenticate: async (token) => {
    // Validate against your secret, database, or auth service
    return token === process.env.DEVTOOLS_TOKEN;
  },
});

When authenticate is configured:

  1. New connections are held in a pending state
  2. The server waits for an authenticate message with the token
  3. If valid → sends welcome, proceeds normally
  4. If invalid → sends error with code AUTH_FAILED, closes connection

When authenticate is not configured, connections work exactly as before – no auth required. This is fully backward compatible.
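
The four handshake steps can be sketched as a small function that processes the first message on a pending connection. This is an illustration under assumptions, not the server's code: the message type names (authenticate, welcome, error, AUTH_FAILED) come from this page, while the field names (type, token, code), PendingConnection, and handleFirstMessage are hypothetical.

```typescript
// Hypothetical sketch of the handshake; field names are assumed.
type ServerReply =
  | { type: "welcome" }
  | { type: "error"; code: "AUTH_FAILED" };

interface PendingConnection {
  authenticated: boolean;
  closed: boolean;
  replies: ServerReply[]; // stands in for messages sent back to the client
}

async function handleFirstMessage(
  conn: PendingConnection,
  raw: string,
  authenticate: (token: string) => boolean | Promise<boolean>,
): Promise<void> {
  let token: unknown;
  try {
    const msg: unknown = JSON.parse(raw);
    if (typeof msg === "object" && msg !== null) {
      const rec = msg as Record<string, unknown>;
      if (rec["type"] === "authenticate") token = rec["token"];
    }
  } catch {
    // Malformed JSON is treated like a failed authentication.
  }

  const ok = typeof token === "string" && (await authenticate(token));
  if (ok) {
    conn.authenticated = true;
    conn.replies.push({ type: "welcome" });
  } else {
    conn.replies.push({ type: "error", code: "AUTH_FAILED" });
    conn.closed = true;
  }
}
```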

Client Side

The DevTools UI has an optional "Auth Token" field in the sidebar. Enter your token before connecting to a remote server. The token is sent automatically as the first message after the WebSocket opens.

Manual Setup

If using createDevToolsServer directly:

const server = createDevToolsServer({
  transport,
  timeline: orchestrator.timeline!,
  authenticate: async (token) => {
    return token === process.env.DEVTOOLS_TOKEN;
  },
  // ... other config
});

Custom Transports

The DevTools server is transport-agnostic. It works with any WebSocket library via the DevToolsTransport interface:

interface DevToolsTransport {
  onConnection(handler: (
    client: DevToolsClient,
    onMessage: (handler: (data: string) => void) => void,
    onClose: (handler: () => void) => void,
  ) => void): void;
  close(): void;
}

The built-in createWsTransport uses the Node.js ws package, but you can implement this interface for Bun, Deno, or any other runtime.

When you'd build a custom transport: HTTP long-polling for environments without WebSocket support, SSE-based transport for one-way streaming, or a custom auth layer that validates tokens at the transport level.

function createMyTransport(port: number): DevToolsTransport {
  // Your WebSocket/transport setup here
  return {
    onConnection(handler) { /* wire up new connections */ },
    close() { /* shut down */ },
  };
}

const server = createDevToolsServer({
  transport: createMyTransport(4040),
  timeline: orchestrator.timeline!,
});
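
As a more concrete illustration, the sketch below implements the interface with an in-memory transport, which can be handy in tests. The DevToolsTransport interface is copied from above so the example is self-contained; the shape of DevToolsClient (id, send, close) is an assumption, and createInMemoryTransport and connect are hypothetical names.

```typescript
// Local stand-ins so the sketch runs on its own. In real code these types
// would come from the library; the shape of DevToolsClient is an assumption.
interface DevToolsClient {
  id: string;
  send(data: string): void;
  close(): void;
}

interface DevToolsTransport {
  onConnection(handler: (
    client: DevToolsClient,
    onMessage: (handler: (data: string) => void) => void,
    onClose: (handler: () => void) => void,
  ) => void): void;
  close(): void;
}

type ConnectionHandler = Parameters<DevToolsTransport["onConnection"]>[0];

function createInMemoryTransport() {
  let connectionHandler: ConnectionHandler | undefined;
  let nextId = 1;

  const transport: DevToolsTransport = {
    onConnection(handler) { connectionHandler = handler; },
    close() { connectionHandler = undefined; },
  };

  // Simulate a client connecting; returns handles that a test can drive.
  function connect(onServerData: (data: string) => void) {
    const handler = connectionHandler;
    if (!handler) throw new Error("no server attached");
    let messageHandler: (data: string) => void = () => {};
    let closeHandler: () => void = () => {};
    const client: DevToolsClient = {
      id: `client-${nextId++}`,
      send: onServerData,
      close: () => closeHandler(),
    };
    handler(client, (h) => { messageHandler = h; }, (h) => { closeHandler = h; });
    return {
      send: (data: string) => messageHandler(data),
      close: () => closeHandler(),
    };
  }

  return { transport, connect };
}

// Usage: a toy "server" that echoes whatever the client sends.
const { transport, connect } = createInMemoryTransport();
const received: string[] = [];
transport.onConnection((client, onMessage) => {
  onMessage((data) => client.send(`echo:${data}`));
});
const conn = connect((data) => received.push(data));
conn.send("ping");
```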

Views

The DevTools UI has 13 specialized views (6 system + 7 AI), accessible as tabs. A time format selector (ms / elapsed / clock) applies across all views.

System Tabs

Six tabs for inspecting core Directive system state:

| Tab | Description |
|---|---|
| Facts | Live key-value table of all facts with filter, copy, inline editing, and breakpoint icons |
| Derivations | Live key-value table of all derivations with filter and copy |
| Pipeline | Constraint evaluation status, requirement lifecycle, and inflight resolvers |
| System Graph | Interactive React Flow diagram of facts → derivations → constraints → resolvers |
| Time Travel | Snapshot browser with diff view, undo/redo, and export/import |
| Breakpoints | Fact mutation breakpoints, trace event breakpoints, and pause/resume controls |

AI Tabs

1. Timeline

Horizontal lanes per agent with bar-per-event rendering and row packing to prevent overlap.

Filtering:

  • Agent filter chips – show/hide specific agents
  • Event type filter chips – filter by event type
  • Regex search across all event properties (150ms debounce, ReDoS-safe)
  • Error-only quick filter – show only error events
  • AND/OR filter mode toggle – combine filters with intersection or union
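
The AND/OR toggle amounts to combining filter predicates by intersection or union. A minimal sketch of that idea follows; the TimelineEvent shape and the combineFilters helper are hypothetical and only illustrate the semantics, not the UI's code.

```typescript
// Hypothetical event shape and filter helpers, for illustration only.
interface TimelineEvent {
  type: string;
  agentId?: string;
}
type Predicate = (e: TimelineEvent) => boolean;

function combineFilters(filters: Predicate[], mode: "AND" | "OR"): Predicate {
  if (filters.length === 0) return () => true; // no active filters: show everything
  return mode === "AND"
    ? (e) => filters.every((f) => f(e)) // intersection
    : (e) => filters.some((f) => f(e)); // union
}

const events: TimelineEvent[] = [
  { type: "agent_start", agentId: "researcher" },
  { type: "agent_error", agentId: "writer" },
  { type: "agent_error", agentId: "researcher" },
];
const byAgent: Predicate = (e) => e.agentId === "researcher";
const errorsOnly: Predicate = (e) => e.type.endsWith("_error");

// AND keeps only the researcher's error; OR keeps all three events.
const both = events.filter(combineFilters([byAgent, errorsOnly], "AND"));
const either = events.filter(combineFilters([byAgent, errorsOnly], "OR"));
```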

Navigation:

  • Zoom (1x–20x) with Ctrl+Scroll
  • Pan with click-and-drag (grab cursor when zoomed)
  • Canvas minimap for navigation (high-DPI, click-to-pan)
  • Time axis labels with configurable format

Live features:

  • Replay cursor line (red vertical) for stepping through events
  • Error highlighting with red rings on error events
  • Live token streaming panel – per-agent token preview (up to 500 chars) with count
  • Pause/resume button with pending event count badge

Task event types:

  • task_start – Task execution begins
  • task_complete – Task execution completed
  • task_error – Task execution failed
  • task_progress – Task reports intermediate progress

2. Cost & Budget

Combined cost analysis and budget tracking in a single tabbed view.

Cost section:

  • Total tokens, input/output breakdown, and estimated cost
  • Stacked bar chart per agent with hover tooltips
  • Cost breakdown table: Agent, Runs, Input, Output, Total, Cost, %
  • Per-model pricing editor (local only) with reset to defaults

Budget section:

  • Hourly and daily budget bars with color alerts (90% → red, 70% → amber)
  • Remaining budget percentage
  • Recent spend list with agent filter and sort (time, cost, tokens)
  • Totals footer with aggregate cost
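
The colour alerts and remaining percentage can be expressed as two small functions. This is a sketch under assumptions: the 70% and 90% thresholds come from the list above, but whether they are inclusive, the "green" default, and the function names are guesses.

```typescript
type BudgetColor = "green" | "amber" | "red";

// Thresholds from the list above; inclusive bounds and "green" are assumptions.
function budgetColor(spent: number, limit: number): BudgetColor {
  if (limit <= 0) return "red"; // treat a missing or zero budget as exhausted
  const used = spent / limit;
  if (used >= 0.9) return "red";
  if (used >= 0.7) return "amber";
  return "green";
}

// Share of the budget still available, as a whole percentage clamped at 0.
function remainingPercent(spent: number, limit: number): number {
  if (limit <= 0) return 0;
  return Math.max(0, Math.round((1 - spent / limit) * 100));
}
```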

3. State

Two sub-tabs with key count badges: Scratchpad and Derived.

  • Key-value display with syntax highlighting and search/filter
  • Live updates as values change
  • Refresh button with 600ms debounce feedback
  • "Edit & Fork" button – modify state values and fork the timeline from that point

4. Guardrails

Guardrail check results with pass/fail status.

  • Guardrail event list with type (input/output), name, and result
  • Pass rate statistics
  • Color-coded results (green for pass, red for fail)

5. Agent Graph

Interactive directed acyclic graph using React Flow showing agent execution flow.

  • Agent nodes with status colors and icons
  • Execution edges with animated connections
  • Node selection for detail inspection
  • Freehand drawing annotations

Task nodes appear as violet dashed-border nodes with a gear icon, distinct from agent nodes. They show label, run count, and a progress bar during execution. Hover to see the task description.

6. Goal

Goal and target progress tracking.

  • Progress indicators for configured goals
  • Completion status per objective

7. Memory

Agent memory and context inspection.

  • Memory usage per agent
  • Context window utilization

Event Detail Panel

Clicking any event in the Timeline opens a detail panel showing event properties.

Features:

  • Property rendering – Syntax-highlighted values (booleans, numbers, strings, objects) with depth-limiting
  • String expansion – "Show more/less" toggle for truncated content (>200 chars)
  • Copy to clipboard – Copy event ID or full event JSON
  • Token counts – Displays inputTokens, outputTokens, and totalTokens when available

Replay Mode

Step through recorded events with playback controls. At faster speeds, replay skips frames rather than slowing down, so the cursor stays accurate to the recorded timing.

Controls:

  • Play/Pause (Space)
  • Step forward/backward (Arrow keys)
  • Seek to any position (cursor slider)
  • Jump to start/end (Home/End)
  • Exit replay (Escape)
  • Speed: 1x, 2x, 5x, 10x
  • Replay from event – right-click or use "Replay from here" in the detail panel
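
One way to picture frame skipping: on each animation frame, the player computes where the replay clock should be and jumps the cursor straight to the latest event at or before that time, instead of advancing one event per frame. The sketch below shows that calculation. It is an assumption about the technique, and cursorAt is a hypothetical helper.

```typescript
// Given sorted event timestamps (ms), the wall-clock time elapsed since replay
// started, and the playback speed, return the index of the last event at or
// before the replay clock. Returns -1 only when there are no events.
function cursorAt(timestamps: number[], wallElapsedMs: number, speed: number): number {
  if (timestamps.length === 0) return -1;
  const replayTime = timestamps[0]! + wallElapsedMs * speed;

  // Binary search for the last timestamp <= replayTime.
  let lo = 0;
  let hi = timestamps.length - 1;
  let idx = -1;
  while (lo <= hi) {
    const mid = (lo + hi) >> 1;
    if (timestamps[mid]! <= replayTime) {
      idx = mid;
      lo = mid + 1;
    } else {
      hi = mid - 1;
    }
  }
  return idx;
}

const stamps = [0, 100, 250, 400, 1000];
const at1x = cursorAt(stamps, 50, 1);   // replay clock at 50 ms
const at10x = cursorAt(stamps, 50, 10); // replay clock at 500 ms, several events skipped
```

Because the cursor is derived from elapsed time rather than incremented per frame, a slow frame does not make playback drift at 10x.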

Error Highlighting

DevTools highlights error events in the Timeline view with red rings and distinct coloring. Error events include agent_error, resolver_error, and failed guardrail checks. Use the error-only quick filter to isolate these events.


Session Management

  • Export JSON – Save a session to JSON with version and timestamp metadata
  • Export HTML – Generate a standalone HTML trace viewer (no dependencies, no WebSocket – share with anyone)
  • Import – Load a saved session for replay (validates event types and structure, 50MB limit)
  • Fork – Truncate timeline to a past point, optionally edit state, and replay from there
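
A rough sketch of what export with metadata and validated import might look like. The SessionFile layout, field names, and helper functions are hypothetical and do not describe Directive's actual file format; only the version/timestamp metadata, the event validation, and the 50MB limit are taken from the list above.

```typescript
// Hypothetical session envelope; field names are assumptions.
interface SessionEvent {
  type: string;
  timestamp: number;
}
interface SessionFile {
  version: number;
  exportedAt: string; // ISO timestamp
  events: SessionEvent[];
}

const MAX_IMPORT_SIZE = 50 * 1024 * 1024; // the 50MB limit mentioned above
const KNOWN_TYPES = new Set(["agent_start", "agent_complete", "agent_error"]); // abbreviated list

function exportSession(events: SessionEvent[]): string {
  const file: SessionFile = { version: 1, exportedAt: new Date().toISOString(), events };
  return JSON.stringify(file);
}

function importSession(json: string): SessionFile {
  // String length is used as a cheap approximation of the byte size.
  if (json.length > MAX_IMPORT_SIZE) throw new Error("session file too large");
  const data: unknown = JSON.parse(json);
  if (typeof data !== "object" || data === null) throw new Error("not an object");
  const rec = data as Record<string, unknown>;
  const version = rec["version"];
  const exportedAt = rec["exportedAt"];
  const rawEvents = rec["events"];
  if (typeof version !== "number" || typeof exportedAt !== "string" || !Array.isArray(rawEvents)) {
    throw new Error("missing session metadata");
  }
  const events: SessionEvent[] = [];
  for (const e of rawEvents as unknown[]) {
    if (typeof e !== "object" || e === null) throw new Error("invalid event");
    const ev = e as Record<string, unknown>;
    const type = ev["type"];
    const timestamp = ev["timestamp"];
    if (typeof type !== "string" || !KNOWN_TYPES.has(type) || typeof timestamp !== "number") {
      throw new Error("invalid event");
    }
    events.push({ type, timestamp });
  }
  return { version, exportedAt, events };
}
```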

WebSocket Protocol

Server → Client

| Message | Description |
|---|---|
| welcome | Connection established (sent after successful auth if configured) |
| event / event_batch | Timeline events |
| snapshot | Full orchestrator state snapshot |
| health | Agent health data |
| breakpoints | Pending breakpoint state |
| scratchpad_state / scratchpad_update | Scratchpad data |
| derived_state / derived_update | Derived values |
| token_stream / stream_done | Live token streaming |
| fork_complete | Fork operation completed |
| error | Server error (includes AUTH_FAILED for authentication failures) |
| pong | Keepalive response |
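
A client consuming these messages might parse and validate them roughly as follows. This sketch covers only three of the message types, and the payload field names (type, code) are assumptions, as is the parseServerMessage helper. It also rejects payloads containing __proto__, constructor, or prototype keys, in the spirit of the defenses listed under Connection Details.

```typescript
// Subset of the server messages from the table above; payload fields are assumed.
type ServerMessage =
  | { type: "welcome" }
  | { type: "pong" }
  | { type: "error"; code: string };

const FORBIDDEN_KEYS = new Set(["__proto__", "constructor", "prototype"]);

// Recursively look for keys that could be used for prototype pollution.
function hasForbiddenKey(value: unknown): boolean {
  if (typeof value !== "object" || value === null) return false;
  for (const key of Object.keys(value)) {
    if (FORBIDDEN_KEYS.has(key)) return true;
    if (hasForbiddenKey((value as Record<string, unknown>)[key])) return true;
  }
  return false;
}

// Returns a typed message, or null if the payload is malformed or unrecognized.
function parseServerMessage(raw: string): ServerMessage | null {
  let data: unknown;
  try {
    data = JSON.parse(raw);
  } catch {
    return null;
  }
  if (typeof data !== "object" || data === null || hasForbiddenKey(data)) return null;
  const rec = data as Record<string, unknown>;
  switch (rec["type"]) {
    case "welcome":
      return { type: "welcome" };
    case "pong":
      return { type: "pong" };
    case "error": {
      const code = rec["code"];
      return typeof code === "string" ? { type: "error", code } : null;
    }
    default:
      return null;
  }
}
```

Switching on the type field lets TypeScript narrow each branch to the matching member of the union.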

Client → Server

| Message | Description |
|---|---|
| authenticate | Send auth token (required when server has authenticate configured) |
| request_snapshot | Request current state |
| request_health | Request health data |
| request_events | Request event history |
| request_breakpoints | Request breakpoint state |
| request_scratchpad | Request scratchpad state |
| request_derived | Request derived values |
| resume_breakpoint | Resume a paused breakpoint |
| cancel_breakpoint | Cancel a paused breakpoint |
| fork_from_snapshot | Fork timeline at a snapshot |
| export_session / import_session | Session persistence |
| ping | Keepalive ping |

Supported Event Types

The DevTools UI recognizes 32 event types grouped by category:

| Category | Event Types |
|---|---|
| Agent lifecycle | agent_start, agent_complete, agent_error, agent_retry |
| Constraints | constraint_evaluate, resolver_start, resolver_complete, resolver_error |
| Governance | guardrail_check, approval_request, approval_response, breakpoint_hit, breakpoint_resumed |
| Patterns | pattern_start, pattern_complete, race_start, race_winner, race_cancelled, debate_round, reflection_iteration |
| State | derivation_update, scratchpad_update |
| Checkpoints | checkpoint_save, checkpoint_restore |
| Tasks | task_start, task_complete, task_error, task_progress |
| Infrastructure | handoff_start, handoff_complete, reroute, dag_node_update |

Connection Details

  • Auto-reconnect: Exponential backoff up to 30s, max 20 attempts
  • Keepalive: Ping every 30 seconds
  • Event buffer: Max 5,000 events in memory, requestAnimationFrame flushing
  • Token streaming: Buffers up to 10KB per agent, 50 concurrent agents max, 5-minute inactivity timeout
  • Prototype pollution defense: __proto__, constructor, prototype blocked on all inbound messages
  • Input validation: All server messages validated against a typed discriminated union before processing
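
The auto-reconnect schedule can be sketched as a delay function. Only the 30s cap and the 20-attempt limit come from the list above; the 1s base delay, the doubling factor, and the reconnectDelayMs name are assumptions.

```typescript
// Delay before reconnect attempt `attempt` (1-based), or null once the
// attempts are exhausted. Base delay and doubling factor are assumptions.
function reconnectDelayMs(
  attempt: number,
  baseMs = 1000,
  maxMs = 30_000,
  maxAttempts = 20,
): number | null {
  if (attempt < 1 || attempt > maxAttempts) return null;
  return Math.min(maxMs, baseMs * Math.pow(2, attempt - 1));
}
```

With these assumed values the delays run 1s, 2s, 4s, 8s, 16s, and then stay at the 30s cap until the 20th attempt.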

DevToolsServer API

interface DevToolsServer {
  clientCount: number;
  broadcast(message: DevToolsServerMessage): void;
  pushHealth(): void;
  pushBreakpoints(): void;
  pushScratchpadUpdate(key: string, value: unknown): void;
  pushDerivedUpdate(id: string, value: unknown): void;
  pushTokenStream(agentId: string, tokens: string, tokenCount: number): void;
  pushStreamDone(agentId: string, totalTokens: number): void;
  close(): void;
}
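
As an illustration of the push methods, the sketch below forwards an agent's output chunks through pushTokenStream and finishes with pushStreamDone. It redeclares just the two methods it needs so it can run on its own. Treating each chunk as one token and passing a running total as tokenCount are assumptions about the intended semantics, and forwardStream is a hypothetical helper.

```typescript
// Subset of the DevToolsServer interface shown above, redeclared so the
// sketch is self-contained.
interface TokenStreamSink {
  pushTokenStream(agentId: string, tokens: string, tokenCount: number): void;
  pushStreamDone(agentId: string, totalTokens: number): void;
}

// Forward chunks to DevTools while keeping a running count.
// Assumption: one chunk counts as one token, and tokenCount is cumulative.
function forwardStream(sink: TokenStreamSink, agentId: string, chunks: Iterable<string>): number {
  let total = 0;
  for (const chunk of chunks) {
    total += 1;
    sink.pushTokenStream(agentId, chunk, total);
  }
  sink.pushStreamDone(agentId, total);
  return total;
}

// Usage with a recording stand-in for the real server.
const calls: string[] = [];
const fakeServer: TokenStreamSink = {
  pushTokenStream: (id, tokens, count) => { calls.push(`${id}:${tokens}:${count}`); },
  pushStreamDone: (id, total) => { calls.push(`${id}:done:${total}`); },
};
const totalForwarded = forwardStream(fakeServer, "writer", ["Hel", "lo"]);
```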

Next Steps

  • DevTools Plugin – Console API and floating panel for debugging any Directive system's facts, derivations, and events.
  • AI Chat Demo – Try the visual debugger in your browser.
