Resilience & Routing
Composable wrappers that make any AgentRunner production-ready.
Each feature follows the same pattern: wrap a runner, get a runner back. Stack them with care – order determines behavior.
Composition Model
Every wrapper has the signature (runner, config) => AgentRunner — except withFallback, which takes an array of runners: ([runners...], config) => AgentRunner. Chain them like middleware.
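For illustration, a custom wrapper with the same shape might look like this. This is a hypothetical sketch, not part of the library: withTiming is an invented name, and it assumes the AgentRunner type is exported (as BudgetRunner is) and is callable as runner(agent, input), matching the call sites on this page.
import type { AgentRunner } from '@directive-run/ai';
// Hypothetical wrapper: time each call, then delegate to the wrapped runner.
function withTiming(runner: AgentRunner, config: { label: string }): AgentRunner {
  return async (agent, input) => {
    const start = Date.now();
    try {
      return await runner(agent, input); // delegate to the wrapped runner
    } finally {
      console.log(`[${config.label}] ${Date.now() - start}ms`);
    }
  };
}
const timed = withTiming(baseRunner, { label: 'openai' });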
Composition order matters
Apply wrappers from inside out:
Model Selection → Fallback → Retry → Budget → Structured Output
Model selection runs closest to the provider. Budget checks happen before retries. Structured output validates after everything else.
Watch for retry multiplication. withRetry({ maxRetries: 3 }) wrapping withStructuredOutput({ maxRetries: 2 }) means up to 12 LLM calls (4 retry attempts × 3 parse attempts each). Similarly, withFallback with retried runners compounds attempts: two runners with maxRetries: 2 each = up to 6 total attempts.
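As a rule of thumb, nesting multiplies worst-case attempts while fallback adds them. In plain arithmetic (not a library API):
// Nested retrying wrappers multiply attempts: (outer + 1) × (inner + 1).
const nestedWorstCase = (outerRetries: number, innerRetries: number) =>
  (outerRetries + 1) * (innerRetries + 1);
nestedWorstCase(3, 2); // 12 LLM calls

// Fallback across retried runners adds attempts: sum of (maxRetries + 1) per runner.
const fallbackWorstCase = (...retriesPerRunner: number[]) =>
  retriesPerRunner.reduce((sum, r) => sum + r + 1, 0);
fallbackWorstCase(2, 2); // 6 total attempts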
The examples below assume these runners are set up. See Running Agents for all provider options. Several examples also reference pricing (a rates object defined under Cost Budget Guards) and MySchema, a stand-in for any Zod schema (see Structured Outputs).
import { createOpenAIRunner } from '@directive-run/ai/openai';
import { createAnthropicRunner } from '@directive-run/ai/anthropic';
const baseRunner = createOpenAIRunner({
apiKey: process.env.OPENAI_API_KEY!,
});
const backupRunner = createAnthropicRunner({
apiKey: process.env.ANTHROPIC_API_KEY!,
});
Using pipe()
The cleanest way to compose middleware is pipe(), which applies wrappers left-to-right: pipe(x, f, g) is equivalent to g(f(x)), so the first wrapper listed ends up innermost.
import {
pipe,
withRetry,
withFallback,
withBudget,
withModelSelection,
withStructuredOutput,
byInputLength,
} from '@directive-run/ai';
const runner = pipe(
baseRunner,
(r) => withModelSelection(r, [byInputLength(200, 'gpt-4o-mini')]),
(r) => withFallback([r, backupRunner]),
(r) => withRetry(r, { maxRetries: 3 }),
(r) => withBudget(r, { budgets: [{ window: 'hour', maxCost: 5, pricing }] }),
(r) => withStructuredOutput(r, { schema: MySchema }),
);
Manual Composition
Or apply wrappers manually:
import {
withRetry,
withFallback,
withBudget,
withModelSelection,
withStructuredOutput,
byInputLength,
} from '@directive-run/ai';
// Build from inside out – innermost wrapper runs closest to the provider
let runner = baseRunner;
runner = withModelSelection(runner, [byInputLength(200, 'gpt-4o-mini')]);
runner = withFallback([runner, backupRunner]);
runner = withRetry(runner, { maxRetries: 3 });
runner = withBudget(runner, { budgets: [{ window: 'hour', maxCost: 5, pricing }] });
runner = withStructuredOutput(runner, { schema: MySchema });
With Orchestrators
Pass the composed runner to either orchestrator:
import { createAgentOrchestrator, createMultiAgentOrchestrator, pipe, withRetry, withFallback } from '@directive-run/ai';
const runner = pipe(
baseRunner,
(r) => withFallback([r, backupRunner]),
(r) => withRetry(r, { maxRetries: 3 }),
);
// Single-agent
const single = createAgentOrchestrator({ runner, autoApproveToolCalls: true });
const result = await single.run(agent, 'Hello!');
// Multi-agent – the same composed runner is shared across all agents
const multi = createMultiAgentOrchestrator({
runner,
agents: {
researcher: { agent: researcher, maxConcurrent: 3 },
writer: { agent: writer, maxConcurrent: 1 },
},
});
const research = await multi.runAgent('researcher', 'Explain WASM');
Intelligent Retry
HTTP-status-aware retry with exponential backoff and jitter. Respects Retry-After headers on 429 responses and never retries client errors (400, 401, 403, 404, 422).
import { withRetry, RetryExhaustedError } from '@directive-run/ai';
const runner = withRetry(baseRunner, {
maxRetries: 3, // 3 retries + 1 initial = 4 total attempts
baseDelayMs: 1000, // Start with 1s delay
maxDelayMs: 30000, // Cap at 30s
onRetry: (attempt, error, delayMs) => {
console.log(`Retry ${attempt} in ${delayMs}ms: ${error.message}`);
},
});
try {
const result = await runner(agent, input);
} catch (err) {
if (err instanceof RetryExhaustedError) {
console.error(`All ${err.retryCount} retries failed`);
console.error('Last error:', err.lastError.message);
}
}
Retry Behavior by Status Code
| Status | Behavior |
|---|---|
| 429 | Retry after the Retry-After header value (exponential backoff when the header is absent) |
| 500, 502, 503 | Retry with exponential backoff + jitter |
| 400, 401, 403, 404, 422 | Never retry (client errors) |
| No HTTP status | Retry (network errors, timeouts) |
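For non-429 retryable errors, the delay grows exponentially from baseDelayMs up to maxDelayMs with jitter applied. The exact jitter strategy isn't documented here; a representative full-jitter sketch:
// Illustrative backoff shape only; the library's jitter strategy may differ.
function backoffDelay(attempt: number, baseDelayMs = 1000, maxDelayMs = 30000): number {
  const exponential = Math.min(maxDelayMs, baseDelayMs * 2 ** (attempt - 1));
  return Math.random() * exponential; // "full jitter": uniform in [0, exponential)
}
backoffDelay(1); // up to 1s
backoffDelay(4); // up to 8s
backoffDelay(6); // capped: up to 30s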
Custom Retry Predicate
const runner = withRetry(baseRunner, {
maxRetries: 2,
isRetryable: (error) => {
// Don't retry invalid API key errors
if (error.message.includes('invalid API key')) {
return false;
}
return true; // Retry everything else
},
});
Provider Fallback
Automatic failover across multiple runners. Tries each in order; moves to the next on failure.
import { withFallback, withRetry, AllProvidersFailedError } from '@directive-run/ai';
const runner = withFallback([
withRetry(openaiRunner, { maxRetries: 2 }), // Try OpenAI first (with retries)
withRetry(anthropicRunner, { maxRetries: 2 }), // Fall back to Anthropic
ollamaRunner, // Last resort: local Ollama
], {
shouldFallback: (error) => {
// Don't fall back on auth errors – they'll fail everywhere
return !error.message.includes('401');
},
onFallback: (fromIndex, toIndex, error) => {
console.log(`Provider ${fromIndex} failed, trying ${toIndex}: ${error.message}`);
},
});
try {
const result = await runner(agent, input);
} catch (err) {
if (err instanceof AllProvidersFailedError) {
console.error(`All ${err.errors.length} providers failed:`);
err.errors.forEach((e, i) => console.error(` [${i}] ${e.message}`));
}
}
Cost Budget Guards
Pre-call cost estimation and rolling budget windows prevent runaway spending. Each budget window tracks costs independently.
import { withBudget, BudgetExceededError } from '@directive-run/ai';
import type { BudgetRunner } from '@directive-run/ai';
const pricing = { inputPerMillion: 5, outputPerMillion: 15 };
const runner = withBudget(baseRunner, {
// Per-call limit
maxCostPerCall: 0.10,
pricing,
// Rolling windows – tracked independently; each covers the trailing hour/day, not the calendar one
budgets: [
{ window: 'hour', maxCost: 5.00, pricing },
{ window: 'day', maxCost: 50.00, pricing },
],
// Fine-tune estimation
charsPerToken: 4, // ~4 characters per token (default)
estimatedOutputMultiplier: 1.5, // Expect 1.5x output tokens vs input
onBudgetExceeded: (details) => {
console.warn(`Budget exceeded (${details.window}): $${details.estimated.toFixed(4)} > $${details.remaining.toFixed(4)}`);
},
});
try {
const result = await runner(agent, input);
} catch (err) {
if (err instanceof BudgetExceededError) {
console.error(`${err.window} budget exceeded: $${err.estimated.toFixed(4)}`);
}
}
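The pre-call estimate follows from the knobs above: input tokens are approximated from character count, expected output tokens from the multiplier, and both sides are priced per million tokens. A sketch of that arithmetic (the library's exact estimator may differ):
// Sketch of the documented estimation knobs; not the library's exact code.
function estimateCallCost(
  input: string,
  pricing: { inputPerMillion: number; outputPerMillion: number },
  charsPerToken = 4,
  estimatedOutputMultiplier = 1.5,
): number {
  const inputTokens = input.length / charsPerToken;
  const outputTokens = inputTokens * estimatedOutputMultiplier;
  return (inputTokens / 1_000_000) * pricing.inputPerMillion
    + (outputTokens / 1_000_000) * pricing.outputPerMillion;
}
// A 2,000-character prompt at the pricing above:
// 500 input tokens + 750 estimated output tokens ≈ $0.0025 + $0.0113 ≈ $0.014
estimateCallCost('x'.repeat(2000), { inputPerMillion: 5, outputPerMillion: 15 });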
Checking Spend
Use getSpent() to build dashboards or preemptive alerts:
// Cast needed because getSpent() is added by withBudget, not on the base AgentRunner type
const spent = (runner as BudgetRunner).getSpent('hour');
const limit = 5.00;
if (spent > limit * 0.8) {
console.warn(`Approaching hourly limit: $${spent.toFixed(2)} / $${limit.toFixed(2)}`);
}
Smart Model Selection
Route prompts to cheaper models based on rules. First match wins; unmatched prompts use the agent's original model.
import {
withModelSelection,
byInputLength,
byAgentName,
byPattern,
} from '@directive-run/ai';
// Shorthand – pass a rules array directly
const runner = withModelSelection(baseRunner, [
byInputLength(200, 'gpt-4o-mini'),
byAgentName('summarizer', 'gpt-4o-mini'),
]);
Config Object
For callbacks and advanced options, pass a config object:
const runner = withModelSelection(baseRunner, {
rules: [
byInputLength(200, 'gpt-4o-mini'), // Short inputs → mini
byAgentName('classifier', 'gpt-4o-mini'), // Classification agent → mini
byPattern(/summarize|translate/i, 'gpt-4o-mini'), // Summary/translate → mini
],
onModelSelected: (original, selected) => {
if (original !== selected) {
console.log(`Routed ${original} → ${selected}`);
}
},
});
Custom Rules
Write your own match function:
import type { ModelRule } from '@directive-run/ai';
const byLanguage: ModelRule = {
match: (agent, input) => /[\u4e00-\u9fff]/.test(input), // CJK (Han) characters
model: 'gpt-4o', // Use full model for CJK languages
};
const runner = withModelSelection(baseRunner, {
rules: [byLanguage, byInputLength(200, 'gpt-4o-mini')],
});
Structured Outputs
Parse and validate LLM responses against a schema. If validation fails, automatically retries up to maxRetries times with the validation error sent back to the LLM as feedback. Works with any Zod-compatible schema.
import { z } from 'zod';
import { withStructuredOutput, StructuredOutputError } from '@directive-run/ai';
const SentimentSchema = z.object({
sentiment: z.enum(['positive', 'negative', 'neutral']),
confidence: z.number().min(0).max(1),
reasoning: z.string(),
});
const runner = withStructuredOutput(baseRunner, {
schema: SentimentSchema,
maxRetries: 2, // Retry up to 2 times on validation failure
});
try {
const result = await runner(agent, 'Analyze: I love this product!');
// result.output is typed as { sentiment, confidence, reasoning }
console.log(result.output.sentiment); // "positive"
console.log(result.output.confidence); // 0.95
} catch (err) {
if (err instanceof StructuredOutputError) {
console.error('Failed to get valid JSON:', err.message);
console.error('Last raw output:', err.lastResult?.output);
}
}
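Conceptually, the wrapper loops: call the model, extract and validate JSON, and on failure re-prompt with the validation error as feedback. A simplified sketch of that loop; the library's actual internals and prompt format may differ:
import { z } from 'zod';
// Hypothetical sketch of the validate-and-retry loop described above.
async function runStructured<T extends z.ZodTypeAny>(
  runner: (agent: unknown, input: string) => Promise<{ output: unknown }>,
  agent: unknown,
  input: string,
  schema: T,
  maxRetries: number,
): Promise<z.infer<T>> {
  let prompt = input;
  let lastError = 'unknown';
  for (let attempt = 0; attempt <= maxRetries; attempt++) {
    const result = await runner(agent, prompt);
    try {
      return schema.parse(JSON.parse(String(result.output))); // extract + validate
    } catch (err) {
      lastError = err instanceof Error ? err.message : String(err);
      // Feed the validation error back so the model can correct itself.
      prompt = `${input}\n\nYour previous answer was invalid: ${lastError}\nRespond with valid JSON only.`;
    }
  }
  throw new Error(`Validation failed after ${maxRetries} retries: ${lastError}`);
}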
Custom JSON Extractor
Override the default JSON extraction (first {...} or [...] in output):
const runner = withStructuredOutput(baseRunner, {
schema: MySchema,
extractJson: (output) => {
// Extract from markdown code block
const match = output.match(/```json\n([\s\S]+?)\n```/);
if (match) {
return JSON.parse(match[1]);
}
return JSON.parse(output);
},
});
Batch Queue
Group agent calls into batches for efficient processing. Each submit() returns a promise that resolves when its individual call completes.
import { createBatchQueue } from '@directive-run/ai';
const queue = createBatchQueue(runner, {
maxBatchSize: 20, // Flush when 20 calls are queued
maxWaitMs: 5000, // Or after 5 seconds, whichever comes first
concurrency: 5, // Run 5 calls in parallel per batch
});
// Submit calls – they batch automatically
const results = await Promise.all([
queue.submit(agent, 'Classify: sports article'),
queue.submit(agent, 'Classify: tech article'),
queue.submit(agent, 'Classify: food article'),
]);
console.log(results.map(r => r.output));
// Force immediate flush
await queue.flush();
// Check queue depth (queued + in-flight)
console.log(`${queue.pending} calls pending`);
// Clean up (flushes remaining calls before disposing)
await queue.dispose();
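Because each submitted call settles its own promise, you can handle per-call failures without discarding the rest of the batch. Assuming a failed call rejects only its own submit() promise (the natural reading of per-call resolution above), Promise.allSettled keeps the successes:
// Standard JS, no extra API: collect successes and failures separately.
const settled = await Promise.allSettled([
  queue.submit(agent, 'Classify: sports article'),
  queue.submit(agent, 'Classify: tech article'),
]);
for (const s of settled) {
  if (s.status === 'fulfilled') console.log(s.value.output);
  else console.error('Call failed:', s.reason);
}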
Constraint-Driven Provider Routing
Use runtime state to select providers dynamically. Track cost, latency, and error rates per provider, then write constraints that react to them.
import { createConstraintRouter } from '@directive-run/ai';
const router = createConstraintRouter({
providers: [
{
name: 'openai',
runner: openaiRunner,
pricing: { inputPerMillion: 5, outputPerMillion: 15 },
},
{
name: 'anthropic',
runner: anthropicRunner,
pricing: { inputPerMillion: 3, outputPerMillion: 15 },
},
{
name: 'ollama',
runner: ollamaRunner,
// No pricing – local inference is free
},
],
defaultProvider: 'openai',
constraints: [
// Switch to local when costs exceed $100
{
when: (facts) => facts.totalCost > 100,
provider: 'ollama',
priority: 10,
},
// Fall back to Anthropic when OpenAI is unreliable
{
when: (facts) => (facts.providers['openai']?.errorCount ?? 0) > 5,
provider: 'anthropic',
},
],
// Opt-in: automatically prefer cheapest provider when no constraint matches
preferCheapest: true,
// Error cooldown: skip a provider for 30s after an error
errorCooldownMs: 30000,
// reason: "constraint" | "cheapest" | "default" | "cooldown-skip"
onProviderSelected: (name, reason) => {
console.log(`Using ${name} (${reason})`);
},
});
// Use like any other runner
const result = await router(agent, input);
// Access runtime stats
console.log('Total cost:', router.facts.totalCost);
console.log('Call count:', router.facts.callCount);
console.log('Avg latency:', router.facts.avgLatencyMs, 'ms');
RoutingFacts Type
The router.facts object exposes all runtime stats for use in constraints:
interface RoutingFacts {
totalCost: number;
callCount: number;
errorCount: number;
lastProvider: string | null;
avgLatencyMs: number;
providers: Record<string, ProviderStats>;
}
interface ProviderStats {
callCount: number;
errorCount: number;
totalCost: number;
avgLatencyMs: number;
lastErrorAt: number | null;
}
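These stats can drive constraints directly. For example, a latency-based rule that plugs into the constraints array shown earlier (the 3000 ms threshold is an arbitrary illustration):
// Prefer Anthropic while OpenAI's rolling average latency exceeds 3s.
const latencyConstraint = {
  when: (facts: RoutingFacts) => (facts.providers['openai']?.avgLatencyMs ?? 0) > 3000,
  provider: 'anthropic',
  priority: 5,
};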
Provider Stats
The router tracks per-provider statistics accessible via router.facts.providers:
const openaiStats = router.facts.providers['openai'];
console.log({
calls: openaiStats.callCount,
errors: openaiStats.errorCount,
cost: openaiStats.totalCost,
latency: openaiStats.avgLatencyMs,
lastError: openaiStats.lastErrorAt,
});
Full Composition Example
Compose all features onto a single runner with pipe(), then pass it to the Orchestrator:
import {
createAgentOrchestrator,
pipe,
withRetry,
withFallback,
withBudget,
withModelSelection,
withStructuredOutput,
byInputLength,
} from '@directive-run/ai';
const pricing = { inputPerMillion: 5, outputPerMillion: 15 };
const runner = pipe(
baseRunner,
(r) => withModelSelection(r, [byInputLength(200, 'gpt-4o-mini')]),
(r) => withFallback([r, backupRunner]),
(r) => withRetry(r, { maxRetries: 3, baseDelayMs: 1000 }),
(r) => withBudget(r, {
maxCostPerCall: 0.10,
pricing,
budgets: [{ window: 'hour', maxCost: 5, pricing }],
}),
(r) => withStructuredOutput(r, { schema: MySchema, maxRetries: 2 }),
);
const orchestrator = createAgentOrchestrator({ runner, autoApproveToolCalls: true });
const result = await orchestrator.run(myAgent, 'Hello!');
Token Budgets in Multi-Agent
The multi-agent orchestrator tracks token usage across all agents with maxTokenBudget. When the budget is reached, a built-in constraint pauses further agent runs. Set budgetWarningThreshold together with an onBudgetWarning callback to alert before the hard stop:
import { createMultiAgentOrchestrator, pipe, withRetry, withFallback } from '@directive-run/ai';
const runner = pipe(
baseRunner,
(r) => withFallback([r, backupRunner]),
(r) => withRetry(r, { maxRetries: 2 }),
);
const orchestrator = createMultiAgentOrchestrator({
runner,
agents: {
researcher: { agent: researcher, maxConcurrent: 3 },
writer: { agent: writer, maxConcurrent: 1 },
},
maxTokenBudget: 50000,
budgetWarningThreshold: 0.8, // Fire callback at 80% usage
onBudgetWarning: ({ currentTokens, maxBudget, percentage }) => {
console.warn(`Token budget ${(percentage * 100).toFixed(0)}% used: ${currentTokens}/${maxBudget}`);
},
});
// Each runAgent call contributes to the shared budget
const research = await orchestrator.runAgent('researcher', 'Summarize recent AI papers');
const article = await orchestrator.runAgent('writer', String(research.output));
console.log(`Total tokens used: ${orchestrator.totalTokens}`);
The budget is shared across all agents in the orchestrator. Individual agent runs that would exceed the remaining budget are blocked by a constraint before the LLM call is made.
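You can also preflight against the shared counter yourself using totalTokens. A sketch; the 50000 literal mirrors the maxTokenBudget from the config above, since no getter for the cap is documented here:
// Preflight: warn before a run that is likely to hit the shared cap.
const MAX_TOKEN_BUDGET = 50000; // keep in sync with maxTokenBudget above
const remaining = MAX_TOKEN_BUDGET - orchestrator.totalTokens;
if (remaining < 2000) {
  console.warn(`Only ${remaining} tokens left in the shared budget`);
}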
Next Steps
- Running Agents – basic runner setup
- Orchestrator – agent orchestration with constraints and approvals
- Guardrails – input validation and output safety
- Streaming – real-time token streaming

