Resilience & Routing

Composable wrappers that make any AgentRunner production-ready.

Each feature follows the same pattern: wrap a runner, get a runner back. Stack them with care – order determines behavior.


Composition Model

Every wrapper has the signature (runner, config) => AgentRunner – except withFallback, which takes an array of runners: ([runners...], config) => AgentRunner. Chain them like middleware.

Composition order matters

Apply wrappers from inside out:

Model Selection → Fallback → Retry → Budget → Structured Output

Model selection runs closest to the provider. Budget checks happen before retries. Structured output validates after everything else.

Watch for retry multiplication. withRetry({ maxRetries: 3 }) wrapping withStructuredOutput({ maxRetries: 2 }) means up to 12 LLM calls (4 outer attempts × 3 parse attempts each). Similarly, withFallback with retried runners multiplies: two runners with maxRetries: 2 each = up to 6 total attempts (3 per runner).
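The worst case multiplies out directly; a quick sanity check of the numbers above:

```typescript
// Worst-case LLM call count when withRetry wraps withStructuredOutput
const outerAttempts = 3 + 1;  // maxRetries: 3 → 4 attempts
const parseAttempts = 2 + 1;  // maxRetries: 2 → 3 parse attempts per outer attempt
console.log(outerAttempts * parseAttempts); // 12

// withFallback over two retried runners
const attemptsPerRunner = 2 + 1; // maxRetries: 2 → 3 attempts each
console.log(2 * attemptsPerRunner); // 6
```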

The examples below assume these runners are set up. See Running Agents for all provider options.

import { createOpenAIRunner } from '@directive-run/ai/openai';
import { createAnthropicRunner } from '@directive-run/ai/anthropic';

const baseRunner = createOpenAIRunner({
  apiKey: process.env.OPENAI_API_KEY!,
});
const backupRunner = createAnthropicRunner({
  apiKey: process.env.ANTHROPIC_API_KEY!,
});

Using pipe()

The cleanest way to compose middleware is pipe(), which applies wrappers left-to-right:

import {
  pipe,
  withRetry,
  withFallback,
  withBudget,
  withModelSelection,
  withStructuredOutput,
  byInputLength,
} from '@directive-run/ai';

const runner = pipe(
  baseRunner,
  (r) => withModelSelection(r, [byInputLength(200, 'gpt-4o-mini')]),
  (r) => withFallback([r, backupRunner]),
  (r) => withRetry(r, { maxRetries: 3 }),
  (r) => withBudget(r, { budgets: [{ window: 'hour', maxCost: 5, pricing }] }),
  (r) => withStructuredOutput(r, { schema: MySchema }),
);

Manual Composition

Or apply wrappers manually:

import {
  withRetry,
  withFallback,
  withBudget,
  withModelSelection,
  withStructuredOutput,
  byInputLength,
  byAgentName,
} from '@directive-run/ai';

// Build from inside out – innermost wrapper runs closest to the provider
let runner = baseRunner;
runner = withModelSelection(runner, [byInputLength(200, 'gpt-4o-mini')]);
runner = withFallback([runner, backupRunner]);
runner = withRetry(runner, { maxRetries: 3 });
runner = withBudget(runner, { budgets: [{ window: 'hour', maxCost: 5, pricing }] });
runner = withStructuredOutput(runner, { schema: MySchema });

With Orchestrators

Pass the composed runner to either orchestrator:

import { createAgentOrchestrator, createMultiAgentOrchestrator, pipe, withRetry, withFallback } from '@directive-run/ai';

const runner = pipe(
  baseRunner,
  (r) => withFallback([r, backupRunner]),
  (r) => withRetry(r, { maxRetries: 3 }),
);

// Single-agent
const single = createAgentOrchestrator({ runner, autoApproveToolCalls: true });
const result = await single.run(agent, 'Hello!');

// Multi-agent – the same composed runner is shared across all agents
const multi = createMultiAgentOrchestrator({
  runner,
  agents: {
    researcher: { agent: researcher, maxConcurrent: 3 },
    writer: { agent: writer, maxConcurrent: 1 },
  },
});
const research = await multi.runAgent('researcher', 'Explain WASM');

Intelligent Retry

HTTP-status-aware retry with exponential backoff and jitter. Respects Retry-After headers on 429 responses and never retries client errors (400, 401, 403, 404, 422).

import { withRetry, RetryExhaustedError } from '@directive-run/ai';

const runner = withRetry(baseRunner, {
  maxRetries: 3,         // 3 retries + 1 initial = 4 total attempts
  baseDelayMs: 1000,     // Start with 1s delay
  maxDelayMs: 30000,     // Cap at 30s
  onRetry: (attempt, error, delayMs) => {
    console.log(`Retry ${attempt} in ${delayMs}ms: ${error.message}`);
  },
});

try {
  const result = await runner(agent, input);
} catch (err) {
  if (err instanceof RetryExhaustedError) {
    console.error(`All ${err.retryCount} retries failed`);
    console.error('Last error:', err.lastError.message);
  }
}

Retry Behavior by Status Code

Status                     Behavior
429                        Retry with Retry-After header value (falls back to exponential backoff)
500, 502, 503              Retry with exponential backoff + jitter
400, 401, 403, 404, 422    Never retry (client errors)
No HTTP status             Retry (network errors, timeouts)

Custom Retry Predicate

const runner = withRetry(baseRunner, {
  maxRetries: 2,
  isRetryable: (error) => {
    // Don't retry invalid API key errors
    if (error.message.includes('invalid API key')) {
      return false;
    }
    return true; // Retry everything else
  },
});

Provider Fallback

Automatic failover across multiple runners. Tries each in order; moves to the next on failure.

import { withFallback, withRetry, AllProvidersFailedError } from '@directive-run/ai';

const runner = withFallback([
  withRetry(openaiRunner, { maxRetries: 2 }),    // Try OpenAI first (with retries)
  withRetry(anthropicRunner, { maxRetries: 2 }),  // Fall back to Anthropic
  ollamaRunner,                                   // Last resort: local Ollama
], {
  shouldFallback: (error) => {
    // Don't fall back on auth errors – they'll fail everywhere
    return !error.message.includes('401');
  },
  onFallback: (fromIndex, toIndex, error) => {
    console.log(`Provider ${fromIndex} failed, trying ${toIndex}: ${error.message}`);
  },
});

try {
  const result = await runner(agent, input);
} catch (err) {
  if (err instanceof AllProvidersFailedError) {
    console.error(`All ${err.errors.length} providers failed:`);
    err.errors.forEach((e, i) => console.error(`  [${i}] ${e.message}`));
  }
}

Cost Budget Guards

Pre-call cost estimation and rolling budget windows prevent runaway spending. Each budget window tracks costs independently.

import { withBudget, BudgetExceededError } from '@directive-run/ai';
import type { BudgetRunner } from '@directive-run/ai';

const pricing = { inputPerMillion: 5, outputPerMillion: 15 };

const runner = withBudget(baseRunner, {
  // Per-call limit
  maxCostPerCall: 0.10,
  pricing,

  // Rolling windows – each tracked independently, resets on a rolling basis (not calendar)
  budgets: [
    { window: 'hour', maxCost: 5.00, pricing },
    { window: 'day', maxCost: 50.00, pricing },
  ],

  // Fine-tune estimation
  charsPerToken: 4,               // ~4 characters per token (default)
  estimatedOutputMultiplier: 1.5,  // Expect 1.5x output tokens vs input

  onBudgetExceeded: (details) => {
    console.warn(`Budget exceeded (${details.window}): $${details.estimated.toFixed(4)} > $${details.remaining.toFixed(4)}`);
  },
});

try {
  const result = await runner(agent, input);
} catch (err) {
  if (err instanceof BudgetExceededError) {
    console.error(`${err.window} budget exceeded: $${err.estimated.toFixed(4)}`);
  }
}

Checking Spend

Access the getSpent() method to build dashboards or preemptive alerts:

// Cast needed because getSpent() is added by withBudget, not on the base AgentRunner type
const spent = (runner as BudgetRunner).getSpent('hour');
const limit = 5.00;

if (spent > limit * 0.8) {
  console.warn(`Approaching hourly limit: $${spent.toFixed(2)} / $${limit.toFixed(2)}`);
}

Smart Model Selection

Route prompts to cheaper models based on rules. First match wins; unmatched prompts use the agent's original model.

import {
  withModelSelection,
  byInputLength,
  byAgentName,
  byPattern,
} from '@directive-run/ai';

// Shorthand – pass a rules array directly
const runner = withModelSelection(baseRunner, [
  byInputLength(200, 'gpt-4o-mini'),
  byAgentName('summarizer', 'gpt-4o-mini'),
]);

Config Object

For callbacks and advanced options, pass a config object:

const runner = withModelSelection(baseRunner, {
  rules: [
    byInputLength(200, 'gpt-4o-mini'),                // Short inputs → mini
    byAgentName('classifier', 'gpt-4o-mini'),          // Classification agent → mini
    byPattern(/summarize|translate/i, 'gpt-4o-mini'),  // Summary/translate → mini
  ],
  onModelSelected: (original, selected) => {
    if (original !== selected) {
      console.log(`Routed ${original}${selected}`);
    }
  },
});

Custom Rules

Write your own match function:

import type { ModelRule } from '@directive-run/ai';

const byLanguage: ModelRule = {
  match: (agent, input) => /[\u4e00-\u9fff]/.test(input), // Chinese characters
  model: 'gpt-4o',  // Use full model for CJK languages
};

const runner = withModelSelection(baseRunner, {
  rules: [byLanguage, byInputLength(200, 'gpt-4o-mini')],
});

Structured Outputs

Parse and validate LLM responses against a schema. If validation fails, automatically retries up to maxRetries times with the validation error sent back to the LLM as feedback. Works with any Zod-compatible schema.

import { z } from 'zod';
import { withStructuredOutput, StructuredOutputError } from '@directive-run/ai';

const SentimentSchema = z.object({
  sentiment: z.enum(['positive', 'negative', 'neutral']),
  confidence: z.number().min(0).max(1),
  reasoning: z.string(),
});

const runner = withStructuredOutput(baseRunner, {
  schema: SentimentSchema,
  maxRetries: 2,  // Retry up to 2 times on validation failure
});

try {
  const result = await runner(agent, 'Analyze: I love this product!');
  // result.output is typed as { sentiment, confidence, reasoning }
  console.log(result.output.sentiment);   // "positive"
  console.log(result.output.confidence);  // 0.95
} catch (err) {
  if (err instanceof StructuredOutputError) {
    console.error('Failed to get valid JSON:', err.message);
    console.error('Last raw output:', err.lastResult?.output);
  }
}

Custom JSON Extractor

Override the default JSON extraction (first {...} or [...] in output):

const runner = withStructuredOutput(baseRunner, {
  schema: MySchema,
  extractJson: (output) => {
    // Extract from markdown code block
    const match = output.match(/```json\n([\s\S]+?)\n```/);
    if (match) {
      return JSON.parse(match[1]);
    }
    return JSON.parse(output);
  },
});

Batch Queue

Group agent calls into batches for efficient processing. Each submit() returns a promise that resolves when its individual call completes.

import { createBatchQueue } from '@directive-run/ai';

const queue = createBatchQueue(runner, {
  maxBatchSize: 20,   // Flush when 20 calls are queued
  maxWaitMs: 5000,    // Or after 5 seconds, whichever comes first
  concurrency: 5,     // Run 5 calls in parallel per batch
});

// Submit calls – they batch automatically
const results = await Promise.all([
  queue.submit(agent, 'Classify: sports article'),
  queue.submit(agent, 'Classify: tech article'),
  queue.submit(agent, 'Classify: food article'),
]);

console.log(results.map(r => r.output));

// Force immediate flush
await queue.flush();

// Check queue depth (queued + in-flight)
console.log(`${queue.pending} calls pending`);

// Clean up (flushes remaining calls before disposing)
await queue.destroy();

Constraint-Driven Provider Routing

Use runtime state to select providers dynamically. Track cost, latency, and error rates per provider, then write constraints that react to them.

import { createConstraintRouter } from '@directive-run/ai';

const router = createConstraintRouter({
  providers: [
    {
      name: 'openai',
      runner: openaiRunner,
      pricing: { inputPerMillion: 5, outputPerMillion: 15 },
    },
    {
      name: 'anthropic',
      runner: anthropicRunner,
      pricing: { inputPerMillion: 3, outputPerMillion: 15 },
    },
    {
      name: 'ollama',
      runner: ollamaRunner,
      // No pricing – local inference is free
    },
  ],
  defaultProvider: 'openai',
  constraints: [
    // Switch to local when costs exceed $100
    {
      when: (facts) => facts.totalCost > 100,
      provider: 'ollama',
      priority: 10,
    },
    // Fall back to Anthropic when OpenAI is unreliable
    {
      when: (facts) => (facts.providers['openai']?.errorCount ?? 0) > 5,
      provider: 'anthropic',
    },
  ],
  // Opt-in: automatically prefer cheapest provider when no constraint matches
  preferCheapest: true,
  // Error cooldown: skip a provider for 30s after an error
  errorCooldownMs: 30000,
  // reason: "constraint" | "cheapest" | "default" | "cooldown-skip"
  onProviderSelected: (name, reason) => {
    console.log(`Using ${name} (${reason})`);
  },
});

// Use like any other runner
const result = await router(agent, input);

// Access runtime stats
console.log('Total cost:', router.facts.totalCost);
console.log('Call count:', router.facts.callCount);
console.log('Avg latency:', router.facts.avgLatencyMs, 'ms');

RoutingFacts Type

The router.facts object exposes all runtime stats for use in constraints:

interface RoutingFacts {
  totalCost: number;
  callCount: number;
  errorCount: number;
  lastProvider: string | null;
  avgLatencyMs: number;
  providers: Record<string, ProviderStats>;
}

interface ProviderStats {
  callCount: number;
  errorCount: number;
  totalCost: number;
  avgLatencyMs: number;
  lastErrorAt: number | null;
}

Provider Stats

The router tracks per-provider statistics accessible via router.facts.providers:

const openaiStats = router.facts.providers['openai'];
console.log({
  calls: openaiStats.callCount,
  errors: openaiStats.errorCount,
  cost: openaiStats.totalCost,
  latency: openaiStats.avgLatencyMs,
  lastError: openaiStats.lastErrorAt,
});

Full Composition Example

Compose all features onto a single runner with pipe(), then pass it to the Orchestrator:

import {
  createAgentOrchestrator,
  pipe,
  withRetry,
  withFallback,
  withBudget,
  withModelSelection,
  withStructuredOutput,
  byInputLength,
} from '@directive-run/ai';

const pricing = { inputPerMillion: 5, outputPerMillion: 15 };

const runner = pipe(
  baseRunner,
  (r) => withModelSelection(r, [byInputLength(200, 'gpt-4o-mini')]),
  (r) => withFallback([r, backupRunner]),
  (r) => withRetry(r, { maxRetries: 3, baseDelayMs: 1000 }),
  (r) => withBudget(r, {
    maxCostPerCall: 0.10,
    pricing,
    budgets: [{ window: 'hour', maxCost: 5, pricing }],
  }),
  (r) => withStructuredOutput(r, { schema: MySchema, maxRetries: 2 }),
);

const orchestrator = createAgentOrchestrator({ runner, autoApproveToolCalls: true });
const result = await orchestrator.run(myAgent, 'Hello!');

Token Budgets in Multi-Agent

The multi-agent orchestrator tracks token usage across all agents with maxTokenBudget. When the budget is reached, a built-in constraint pauses further agent runs. Set budgetWarningThreshold along with an onBudgetWarning callback to alert before the hard stop:

import { createMultiAgentOrchestrator, pipe, withRetry, withFallback } from '@directive-run/ai';

const runner = pipe(
  baseRunner,
  (r) => withFallback([r, backupRunner]),
  (r) => withRetry(r, { maxRetries: 2 }),
);

const orchestrator = createMultiAgentOrchestrator({
  runner,
  agents: {
    researcher: { agent: researcher, maxConcurrent: 3 },
    writer: { agent: writer, maxConcurrent: 1 },
  },
  maxTokenBudget: 50000,
  budgetWarningThreshold: 0.8,   // Fire callback at 80% usage
  onBudgetWarning: ({ currentTokens, maxBudget, percentage }) => {
    console.warn(`Token budget ${(percentage * 100).toFixed(0)}% used: ${currentTokens}/${maxBudget}`);
  },
});

// Each runAgent call contributes to the shared budget
const research = await orchestrator.runAgent('researcher', 'Summarize recent AI papers');
const article = await orchestrator.runAgent('writer', String(research.output));

console.log(`Total tokens used: ${orchestrator.totalTokens}`);

The budget is shared across all agents in the orchestrator. Individual agent runs that would exceed the remaining budget are blocked by a constraint before the LLM call is made.

