Resilience & Routing

Composable wrappers that make any AgentRunner production-ready.

Each feature follows the same pattern: wrap a runner, get a runner back. Stack them with care – order determines behavior.


Composition Model

Every wrapper has the signature (runner, config) => AgentRunner, except withFallback, which takes an array of runners: ([runners...], config) => AgentRunner. Chain them like middleware.

Composition order matters

Apply wrappers from inside out:

Model Selection → Fallback → Retry → Budget → Structured Output

Model selection runs closest to the provider. Budget checks happen before retries. Structured output validates after everything else.

Watch for retry multiplication. withRetry({ maxRetries: 3 }) wrapping withStructuredOutput({ maxRetries: 2 }) means up to 12 LLM calls (4 retry attempts × 3 parse attempts each). withFallback over retried runners adds up as well: two runners with maxRetries: 2 each = up to 6 total attempts (3 per runner).
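The worst-case arithmetic above can be checked with a few lines. This is a minimal sketch; the helper names (attempts, nestedWorstCase, fallbackWorstCase) are made up for illustration and are not part of the library.

```typescript
// Worst-case attempts for one retrying layer: the initial call plus each retry.
const attempts = (maxRetries: number): number => maxRetries + 1;

// Nested retrying layers multiply: every outer attempt can trigger all inner attempts.
const nestedWorstCase = (...maxRetriesPerLayer: number[]): number =>
  maxRetriesPerLayer.reduce((total, r) => total * attempts(r), 1);

// Fallback tries runners one after another, so their worst cases add up.
const fallbackWorstCase = (...maxRetriesPerRunner: number[]): number =>
  maxRetriesPerRunner.reduce((total, r) => total + attempts(r), 0);

console.log(nestedWorstCase(3, 2));   // 12 (4 retry attempts × 3 parse attempts)
console.log(fallbackWorstCase(2, 2)); // 6  (3 attempts per runner, 2 runners)
```

Running numbers like these before stacking wrappers gives a quick upper bound on calls per request.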

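The examples below assume these runners are set up. Snippets that reference pricing or MySchema also assume a pricing object (as defined under Cost Budget Guards) and a Zod schema of your own; names such as agent, input, openaiRunner, anthropicRunner, and ollamaRunner are placeholders. See Running Agents for all provider options.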

import { createOpenAIRunner } from '@directive-run/ai/openai';
import { createAnthropicRunner } from '@directive-run/ai/anthropic';

const baseRunner = createOpenAIRunner({
  apiKey: process.env.OPENAI_API_KEY!,
});
const backupRunner = createAnthropicRunner({
  apiKey: process.env.ANTHROPIC_API_KEY!,
});

Using pipe()

The cleanest way to compose middleware is pipe(). It applies wrappers left to right, so the first wrapper listed ends up innermost (closest to the provider):

import {
  pipe,
  withRetry,
  withFallback,
  withBudget,
  withModelSelection,
  withStructuredOutput,
  byInputLength,
} from '@directive-run/ai';

const runner = pipe(
  baseRunner,
  (r) => withModelSelection(r, [byInputLength(200, 'gpt-4o-mini')]),
  (r) => withFallback([r, backupRunner]),
  (r) => withRetry(r, { maxRetries: 3 }),
  (r) => withBudget(r, { budgets: [{ window: 'hour', maxCost: 5, pricing }] }),
  (r) => withStructuredOutput(r, { schema: MySchema }),
);
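To make the left-to-right ordering concrete, here is a minimal sketch of how a pipe()-style helper could work. It is not the library's source: Runner is a simplified stand-in for AgentRunner, and pipeSketch and tag are hypothetical names used only for this illustration.

```typescript
// Simplified stand-in for AgentRunner (an assumption for illustration only).
type Runner = (input: string) => string;
type Wrapper = (runner: Runner) => Runner;

// Hypothetical pipe(): feed the result of each wrapper into the next one.
function pipeSketch(base: Runner, ...wrappers: Wrapper[]): Runner {
  return wrappers.reduce((runner, wrap) => wrap(runner), base);
}

// Toy wrapper that labels its output, which makes the nesting visible.
const tag = (name: string): Wrapper => (inner) => (input) =>
  `${name}(${inner(input)})`;

const base: Runner = (input) => `provider:${input}`;
const composed = pipeSketch(base, tag('modelSelection'), tag('retry'), tag('budget'));

console.log(composed('hi')); // budget(retry(modelSelection(provider:hi)))
```

The first wrapper in the list sits next to the provider and the last one is the outermost layer, which matches the inside-out order recommended earlier.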

Manual Composition

Or apply wrappers manually:

import {
  withRetry,
  withFallback,
  withBudget,
  withModelSelection,
  withStructuredOutput,
  byInputLength,
} from '@directive-run/ai';

// Build from inside out – innermost wrapper runs closest to the provider
let runner = baseRunner;
runner = withModelSelection(runner, [byInputLength(200, 'gpt-4o-mini')]);
runner = withFallback([runner, backupRunner]);
runner = withRetry(runner, { maxRetries: 3 });
runner = withBudget(runner, { budgets: [{ window: 'hour', maxCost: 5, pricing }] });
runner = withStructuredOutput(runner, { schema: MySchema });

With Orchestrators

Pass the composed runner to either orchestrator:

import { createAgentOrchestrator, createMultiAgentOrchestrator, pipe, withRetry, withFallback } from '@directive-run/ai';

const runner = pipe(
  baseRunner,
  (r) => withFallback([r, backupRunner]),
  (r) => withRetry(r, { maxRetries: 3 }),
);

// Single-agent
const single = createAgentOrchestrator({ runner, autoApproveToolCalls: true });
const result = await single.run(agent, 'Hello!');

// Multi-agent – the same composed runner is shared across all agents
const multi = createMultiAgentOrchestrator({
  runner,
  agents: {
    researcher: { agent: researcher, maxConcurrent: 3 },
    writer: { agent: writer, maxConcurrent: 1 },
  },
});
const research = await multi.runAgent('researcher', 'Explain WASM');

Intelligent Retry

HTTP-status-aware retry with exponential backoff and jitter. Respects Retry-After headers on 429 responses and never retries client errors (400, 401, 403, 404, 422).

import { withRetry, RetryExhaustedError } from '@directive-run/ai';

const runner = withRetry(baseRunner, {
  maxRetries: 3,         // 3 retries + 1 initial = 4 total attempts
  baseDelayMs: 1000,     // Start with 1s delay
  maxDelayMs: 30000,     // Cap at 30s
  onRetry: (attempt, error, delayMs) => {
    console.log(`Retry ${attempt} in ${delayMs}ms: ${error.message}`);
  },
});

try {
  const result = await runner(agent, input);
} catch (err) {
  if (err instanceof RetryExhaustedError) {
    console.error(`All ${err.retryCount} retries failed`);
    console.error('Last error:', err.lastError.message);
  }
}

Retry Behavior by Status Code

Status | Behavior
429 | Retry with Retry-After header value (falls back to exponential backoff)
500, 502, 503 | Retry with exponential backoff + jitter
400, 401, 403, 404, 422 | Never retry (client errors)
No HTTP status | Retry (network errors, timeouts)
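The table and the backoff settings can be sketched as two small functions. This is an illustration, not the library's implementation: the exact delay formula, the "full jitter" strategy, capping Retry-After at maxDelayMs, and the treatment of status codes not listed in the table are all assumptions.

```typescript
// Status codes the table lists as never retried.
const NEVER_RETRY = new Set([400, 401, 403, 404, 422]);

function isRetryableStatus(status?: number): boolean {
  if (status === undefined) return true;      // network errors, timeouts
  if (NEVER_RETRY.has(status)) return false;  // client errors
  return status === 429 || status >= 500;     // assumption: any 5xx behaves like 500/502/503
}

// Hypothetical delay calculation for the Nth retry (attempt starts at 1).
function backoffDelayMs(
  attempt: number,
  baseDelayMs: number,
  maxDelayMs: number,
  retryAfterSeconds?: number,
  random: () => number = Math.random,
): number {
  if (retryAfterSeconds !== undefined) {
    // Prefer the server's hint; capping it at maxDelayMs is an assumption.
    return Math.min(retryAfterSeconds * 1000, maxDelayMs);
  }
  // Double the delay each attempt, capped at maxDelayMs.
  const exponential = Math.min(baseDelayMs * 2 ** (attempt - 1), maxDelayMs);
  // Full jitter: pick a value between 0 and the exponential delay.
  return Math.floor(random() * exponential);
}

// A fixed "random" value keeps the example deterministic.
console.log(backoffDelayMs(3, 1000, 30000, undefined, () => 0.5)); // 2000
console.log(backoffDelayMs(2, 1000, 30000, 7));                    // 7000
```

Jitter spreads retries from many clients over time so they do not all hit the provider at the same instant.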

Custom Retry Predicate

const runner = withRetry(baseRunner, {
  maxRetries: 2,
  isRetryable: (error) => {
    // Don't retry invalid API key errors
    if (error.message.includes('invalid API key')) {
      return false;
    }
    return true; // Retry everything else
  },
});

Provider Fallback

Automatic failover across multiple runners. Tries each in order; moves to the next on failure.

import { withFallback, withRetry, AllProvidersFailedError } from '@directive-run/ai';

const runner = withFallback([
  withRetry(openaiRunner, { maxRetries: 2 }),    // Try OpenAI first (with retries)
  withRetry(anthropicRunner, { maxRetries: 2 }),  // Fall back to Anthropic
  ollamaRunner,                                   // Last resort: local Ollama
], {
  shouldFallback: (error) => {
    // Don't fall back on auth errors – they'll fail everywhere
    return !error.message.includes('401');
  },
  onFallback: (fromIndex, toIndex, error) => {
    console.log(`Provider ${fromIndex} failed, trying ${toIndex}: ${error.message}`);
  },
});

try {
  const result = await runner(agent, input);
} catch (err) {
  if (err instanceof AllProvidersFailedError) {
    console.error(`All ${err.errors.length} providers failed:`);
    err.errors.forEach((e, i) => console.error(`  [${i}] ${e.message}`));
  }
}
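Matching on error.message.includes('401') also matches unrelated text such as "timeout after 4010ms". A slightly stricter predicate is sketched below. It assumes provider errors may carry a numeric status field; that shape is hypothetical, so check what your runners actually throw.

```typescript
// Assumption: errors may expose an HTTP status code as `status`.
interface HttpLikeError extends Error {
  status?: number;
}

function shouldFallback(error: HttpLikeError): boolean {
  // Prefer the structured status code when it is available.
  if (error.status !== undefined) {
    return error.status !== 401;
  }
  // Otherwise look for "401" as a standalone number in the message.
  return !/\b401\b/.test(error.message);
}

console.log(shouldFallback(Object.assign(new Error('Unauthorized'), { status: 401 }))); // false
console.log(shouldFallback(new Error('HTTP 401 Unauthorized')));                        // false
console.log(shouldFallback(new Error('timeout after 4010ms')));                         // true
```

A function like this can be passed as the shouldFallback option shown above.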

Cost Budget Guards

Pre-call cost estimation and rolling budget windows prevent runaway spending. Each budget window tracks costs independently.

import { withBudget, BudgetExceededError } from '@directive-run/ai';
import type { BudgetRunner } from '@directive-run/ai';

const pricing = { inputPerMillion: 5, outputPerMillion: 15 };

const runner = withBudget(baseRunner, {
  // Per-call limit
  maxCostPerCall: 0.10,
  pricing,

  // Rolling windows – each is tracked independently and resets on a rolling basis (not calendar)
  budgets: [
    { window: 'hour', maxCost: 5.00, pricing },
    { window: 'day', maxCost: 50.00, pricing },
  ],

  // Fine-tune estimation
  charsPerToken: 4,               // ~4 characters per token (default)
  estimatedOutputMultiplier: 1.5,  // Expect 1.5x output tokens vs input

  onBudgetExceeded: (details) => {
    // console.warn works in Node and the browser; alert() is browser-only
    console.warn(`Budget exceeded (${details.window}): $${details.estimated.toFixed(4)} > $${details.remaining.toFixed(4)}`);
  },
});

try {
  const result = await runner(agent, input);
} catch (err) {
  if (err instanceof BudgetExceededError) {
    console.error(`${err.window} budget exceeded: $${err.estimated.toFixed(4)}`);
  }
}
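To see how charsPerToken, estimatedOutputMultiplier, and pricing could combine into a pre-call estimate, here is a worked sketch. The formula is an assumption based on the option names above, not the library's confirmed calculation, and estimateCallCost is a hypothetical helper.

```typescript
interface Pricing {
  inputPerMillion: number;   // USD per 1M input tokens
  outputPerMillion: number;  // USD per 1M output tokens
}

function estimateCallCost(
  input: string,
  pricing: Pricing,
  charsPerToken = 4,
  estimatedOutputMultiplier = 1.5,
): number {
  // Rough token count from character length.
  const inputTokens = Math.ceil(input.length / charsPerToken);
  // Guess the output size as a multiple of the input size.
  const outputTokens = Math.ceil(inputTokens * estimatedOutputMultiplier);
  return (
    (inputTokens * pricing.inputPerMillion + outputTokens * pricing.outputPerMillion) /
    1_000_000
  );
}

// 4,000 characters ≈ 1,000 input tokens and 1,500 estimated output tokens.
const estimate = estimateCallCost('x'.repeat(4000), { inputPerMillion: 5, outputPerMillion: 15 });
console.log(estimate.toFixed(4)); // 0.0275
```

With maxCostPerCall: 0.10, a call estimated at $0.0275 would pass the per-call check under these assumptions.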

Checking Spend

Access the getSpent() method to build dashboards or preemptive alerts:

// Cast needed because getSpent() is added by withBudget, not on the base AgentRunner type
const spent = (runner as BudgetRunner).getSpent('hour');
const limit = 5.00;

if (spent > limit * 0.8) {
  console.warn(`Approaching hourly limit: $${spent.toFixed(2)} / $${limit.toFixed(2)}`);
}
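"Rolling" means the window slides with the clock rather than resetting at the top of the hour. The sketch below shows one way such a tracker could work. RollingSpend is a hypothetical class for illustration and says nothing about how withBudget stores its data.

```typescript
class RollingSpend {
  private entries: { at: number; cost: number }[] = [];
  private readonly windowMs: number;
  private readonly now: () => number;

  constructor(windowMs: number, now: () => number = Date.now) {
    this.windowMs = windowMs;
    this.now = now;
  }

  record(cost: number): void {
    this.entries.push({ at: this.now(), cost });
  }

  getSpent(): number {
    // Drop entries that have aged out of the window, then sum the rest.
    const cutoff = this.now() - this.windowMs;
    this.entries = this.entries.filter((entry) => entry.at > cutoff);
    return this.entries.reduce((sum, entry) => sum + entry.cost, 0);
  }
}

// A fake clock makes the behavior easy to follow.
let clock = 0;
const MINUTE = 60 * 1000;
const hourly = new RollingSpend(60 * MINUTE, () => clock);

hourly.record(2);                // spent at t = 0
clock = 30 * MINUTE;
hourly.record(1);                // spent at t = 30 min
clock = 59 * MINUTE;
console.log(hourly.getSpent());  // 3
clock = 61 * MINUTE;
console.log(hourly.getSpent());  // 1 (the t = 0 entry has aged out)
```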

Smart Model Selection

Route prompts to cheaper models based on rules. First match wins; unmatched prompts use the agent's original model.

import {
  withModelSelection,
  byInputLength,
  byAgentName,
  byPattern,
} from '@directive-run/ai';

// Shorthand – pass a rules array directly
const runner = withModelSelection(baseRunner, [
  byInputLength(200, 'gpt-4o-mini'),
  byAgentName('summarizer', 'gpt-4o-mini'),
]);

Config Object

For callbacks and advanced options, pass a config object:

const runner = withModelSelection(baseRunner, {
  rules: [
    byInputLength(200, 'gpt-4o-mini'),                // Short inputs → mini
    byAgentName('classifier', 'gpt-4o-mini'),          // Classification agent → mini
    byPattern(/summarize|translate/i, 'gpt-4o-mini'),  // Summary/translate → mini
  ],
  onModelSelected: (original, selected) => {
    if (original !== selected) {
      console.log(`Routed ${original} → ${selected}`);
    }
  },
});

Custom Rules

Write your own match function:

import type { ModelRule } from '@directive-run/ai';

const byLanguage: ModelRule = {
  // Matches CJK Unified Ideographs (Chinese characters, also used as kanji)
  match: (_agent, input) => /[\u4e00-\u9fff]/.test(input),
  model: 'gpt-4o',  // Use the full model for text containing these characters
};

const runner = withModelSelection(baseRunner, {
  rules: [byLanguage, byInputLength(200, 'gpt-4o-mini')],
});
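The first-match-wins behavior can be sketched in a few lines. Everything here is a simplified stand-in: the AgentLike shape, the rule helpers, and the assumption that byInputLength matches inputs shorter than the threshold are illustrative, not the library's definitions.

```typescript
// Hypothetical, simplified shapes for illustration.
interface AgentLike {
  name: string;
  model: string;
}
interface RuleSketch {
  match: (agent: AgentLike, input: string) => boolean;
  model: string;
}

// Assumption: "short" means strictly fewer characters than the threshold.
const byInputLengthSketch = (maxChars: number, model: string): RuleSketch => ({
  match: (_agent, input) => input.length < maxChars,
  model,
});

const byAgentNameSketch = (name: string, model: string): RuleSketch => ({
  match: (agent) => agent.name === name,
  model,
});

// The first matching rule wins; otherwise keep the agent's own model.
function selectModel(rules: RuleSketch[], agent: AgentLike, input: string): string {
  const hit = rules.find((rule) => rule.match(agent, input));
  return hit ? hit.model : agent.model;
}

const rules = [
  byInputLengthSketch(200, 'gpt-4o-mini'),
  byAgentNameSketch('summarizer', 'gpt-4o-mini'),
];
const writer = { name: 'writer', model: 'gpt-4o' };

console.log(selectModel(rules, writer, 'Short prompt'));   // gpt-4o-mini
console.log(selectModel(rules, writer, 'x'.repeat(500)));  // gpt-4o
```

Because evaluation stops at the first match, put the most specific rules first.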

Structured Outputs

Parse and validate LLM responses against a schema. If validation fails, automatically retries up to maxRetries times with the validation error sent back to the LLM as feedback. Works with any Zod-compatible schema.

import { z } from 'zod';
import { withStructuredOutput, StructuredOutputError } from '@directive-run/ai';

const SentimentSchema = z.object({
  sentiment: z.enum(['positive', 'negative', 'neutral']),
  confidence: z.number().min(0).max(1),
  reasoning: z.string(),
});

const runner = withStructuredOutput(baseRunner, {
  schema: SentimentSchema,
  maxRetries: 2,  // Retry up to 2 times on validation failure
});

try {
  const result = await runner(agent, 'Analyze: I love this product!');
  // result.output is typed as { sentiment, confidence, reasoning }
  console.log(result.output.sentiment);   // e.g. "positive"
  console.log(result.output.confidence);  // e.g. 0.95
} catch (err) {
  if (err instanceof StructuredOutputError) {
    console.error('Failed to get valid JSON:', err.message);
    console.error('Last raw output:', err.lastResult?.output);
  }
}

Custom JSON Extractor

Override the default JSON extraction (first {...} or [...] in output):

const runner = withStructuredOutput(baseRunner, {
  schema: MySchema,
  extractJson: (output) => {
    // Extract from markdown code block
    const match = output.match(/```json\n([\s\S]+?)\n```/);
    if (match) {
      return JSON.parse(match[1]);
    }
    return JSON.parse(output);
  },
});
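Models often wrap JSON in prose ("Sure! Here is the result: ..."). The sketch below is one way to write a more tolerant extractor that pulls the first parseable JSON value out of surrounding text. extractFirstJson is a hypothetical helper and does not describe the library's default extractor.

```typescript
function extractFirstJson(output: string): unknown {
  // Find where the first object or array starts.
  const start = output.search(/[{[]/);
  if (start === -1) {
    throw new Error('No JSON found in output');
  }
  // Try the longest candidate first, shrinking to earlier closing brackets on failure.
  for (let end = output.length; end > start; end--) {
    const ch = output[end - 1];
    if (ch !== '}' && ch !== ']') continue;
    try {
      return JSON.parse(output.slice(start, end));
    } catch {
      // Not valid JSON yet; keep shrinking the candidate.
    }
  }
  throw new Error('No parseable JSON found in output');
}

const parsed = extractFirstJson('Sure! {"sentiment": "positive", "scores": [1, 2]} Hope that helps.');
console.log(parsed); // the parsed object, without the surrounding prose
```

Pass a function like this as extractJson; schema validation still runs on whatever it returns.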

Batch Queue

Group agent calls into batches for efficient processing. Each submit() returns a promise that resolves when its individual call completes.

import { createBatchQueue } from '@directive-run/ai';

const queue = createBatchQueue(runner, {
  maxBatchSize: 20,   // Flush when 20 calls are queued
  maxWaitMs: 5000,    // Or after 5 seconds, whichever comes first
  concurrency: 5,     // Run 5 calls in parallel per batch
});

// Submit calls – they batch automatically
const results = await Promise.all([
  queue.submit(agent, 'Classify: sports article'),
  queue.submit(agent, 'Classify: tech article'),
  queue.submit(agent, 'Classify: food article'),
]);

console.log(results.map(r => r.output));

// Force immediate flush
await queue.flush();

// Check queue depth (queued + in-flight)
console.log(`${queue.pending} calls pending`);

// Clean up (flushes remaining calls before disposing)
await queue.dispose();
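The concurrency option limits how many calls from a batch run at once. A common way to implement such a limit is a fixed number of worker "lanes" pulling from a shared index, sketched below. runWithConcurrency is a hypothetical helper, not the queue's actual internals.

```typescript
async function runWithConcurrency<T, R>(
  items: T[],
  limit: number,
  worker: (item: T) => Promise<R>,
): Promise<R[]> {
  const results = new Array<R>(items.length);
  let next = 0;

  // Each lane takes the next unclaimed item until none are left.
  async function lane(): Promise<void> {
    while (next < items.length) {
      const index = next++;
      results[index] = await worker(items[index]);
    }
  }

  const laneCount = Math.min(limit, items.length);
  await Promise.all(Array.from({ length: laneCount }, () => lane()));
  return results;
}

// Results come back in input order, even though at most 2 calls run at a time.
runWithConcurrency(['sports', 'tech', 'food'], 2, async (topic) => `Classified: ${topic}`)
  .then((out) => console.log(out));
```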

Constraint-Driven Provider Routing

Use runtime state to select providers dynamically. Track cost, latency, and error rates per provider, then write constraints that react to them.

import { createConstraintRouter } from '@directive-run/ai';

const router = createConstraintRouter({
  providers: [
    {
      name: 'openai',
      runner: openaiRunner,
      pricing: { inputPerMillion: 5, outputPerMillion: 15 },
    },
    {
      name: 'anthropic',
      runner: anthropicRunner,
      pricing: { inputPerMillion: 3, outputPerMillion: 15 },
    },
    {
      name: 'ollama',
      runner: ollamaRunner,
      // No pricing – local inference is free
    },
  ],
  defaultProvider: 'openai',
  constraints: [
    // Switch to local when costs exceed $100
    {
      when: (facts) => facts.totalCost > 100,
      provider: 'ollama',
      priority: 10,
    },
    // Fall back to Anthropic when OpenAI is unreliable
    {
      when: (facts) => (facts.providers['openai']?.errorCount ?? 0) > 5,
      provider: 'anthropic',
    },
  ],
  // Opt-in: automatically prefer cheapest provider when no constraint matches
  preferCheapest: true,
  // Error cooldown: skip a provider for 30s after an error
  errorCooldownMs: 30000,
  // reason: "constraint" | "cheapest" | "default" | "cooldown-skip"
  onProviderSelected: (name, reason) => {
    console.log(`Using ${name} (${reason})`);
  },
});

// Use like any other runner
const result = await router(agent, input);

// Access runtime stats
console.log('Total cost:', router.facts.totalCost);
console.log('Call count:', router.facts.callCount);
console.log('Avg latency:', router.facts.avgLatencyMs, 'ms');
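The constraint list can be read as "find the matching constraints and pick one". The sketch below assumes that the highest priority wins, that a missing priority counts as 0, and that ties keep declaration order. Those resolution rules are assumptions for illustration, and the types are simplified stand-ins.

```typescript
// Simplified stand-ins for the router's facts and constraints.
interface FactsSketch {
  totalCost: number;
  providers: Record<string, { errorCount: number }>;
}
interface ConstraintSketch {
  when: (facts: FactsSketch) => boolean;
  provider: string;
  priority?: number;
}

function pickProvider(
  facts: FactsSketch,
  constraints: ConstraintSketch[],
  defaultProvider: string,
): string {
  const matches = constraints
    .filter((constraint) => constraint.when(facts))
    // Array.prototype.sort is stable, so equal priorities keep their declaration order.
    .sort((a, b) => (b.priority ?? 0) - (a.priority ?? 0));
  return matches[0]?.provider ?? defaultProvider;
}

const constraints: ConstraintSketch[] = [
  { when: (f) => f.totalCost > 100, provider: 'ollama', priority: 10 },
  { when: (f) => (f.providers['openai']?.errorCount ?? 0) > 5, provider: 'anthropic' },
];

console.log(pickProvider({ totalCost: 150, providers: { openai: { errorCount: 9 } } }, constraints, 'openai')); // ollama
console.log(pickProvider({ totalCost: 10, providers: { openai: { errorCount: 9 } } }, constraints, 'openai'));  // anthropic
console.log(pickProvider({ totalCost: 10, providers: {} }, constraints, 'openai'));                             // openai
```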

RoutingFacts Type

The router.facts object exposes all runtime stats for use in constraints:

interface RoutingFacts {
  totalCost: number;
  callCount: number;
  errorCount: number;
  lastProvider: string | null;
  avgLatencyMs: number;
  providers: Record<string, ProviderStats>;
}

interface ProviderStats {
  callCount: number;
  errorCount: number;
  totalCost: number;
  avgLatencyMs: number;
  lastErrorAt: number | null;
}

Provider Stats

The router tracks per-provider statistics accessible via router.facts.providers:

const openaiStats = router.facts.providers['openai'];
console.log({
  calls: openaiStats.callCount,
  errors: openaiStats.errorCount,
  cost: openaiStats.totalCost,
  latency: openaiStats.avgLatencyMs,
  lastError: openaiStats.lastErrorAt,
});

Full Composition Example

Compose all features onto a single runner with pipe(), then pass it to an orchestrator:

import {
  createAgentOrchestrator,
  pipe,
  withRetry,
  withFallback,
  withBudget,
  withModelSelection,
  withStructuredOutput,
  byInputLength,
} from '@directive-run/ai';

const pricing = { inputPerMillion: 5, outputPerMillion: 15 };

const runner = pipe(
  baseRunner,
  (r) => withModelSelection(r, [byInputLength(200, 'gpt-4o-mini')]),
  (r) => withFallback([r, backupRunner]),
  (r) => withRetry(r, { maxRetries: 3, baseDelayMs: 1000 }),
  (r) => withBudget(r, {
    maxCostPerCall: 0.10,
    pricing,
    budgets: [{ window: 'hour', maxCost: 5, pricing }],
  }),
  (r) => withStructuredOutput(r, { schema: MySchema, maxRetries: 2 }),
);

const orchestrator = createAgentOrchestrator({ runner, autoApproveToolCalls: true });
const result = await orchestrator.run(myAgent, 'Hello!');

Token Budgets in Multi-Agent

The multi-agent orchestrator tracks token usage across all agents against maxTokenBudget. When the budget is reached, a built-in constraint pauses further agent runs. Set budgetWarningThreshold and an onBudgetWarning callback to get an alert before the hard stop:

import { createMultiAgentOrchestrator, pipe, withRetry, withFallback } from '@directive-run/ai';

const runner = pipe(
  baseRunner,
  (r) => withFallback([r, backupRunner]),
  (r) => withRetry(r, { maxRetries: 2 }),
);

const orchestrator = createMultiAgentOrchestrator({
  runner,
  agents: {
    researcher: { agent: researcher, maxConcurrent: 3 },
    writer: { agent: writer, maxConcurrent: 1 },
  },
  maxTokenBudget: 50000,
  budgetWarningThreshold: 0.8,   // Fire callback at 80% usage
  onBudgetWarning: ({ currentTokens, maxBudget, percentage }) => {
    console.warn(`Token budget ${(percentage * 100).toFixed(0)}% used: ${currentTokens}/${maxBudget}`);
  },
});

// Each runAgent call contributes to the shared budget
const research = await orchestrator.runAgent('researcher', 'Summarize recent AI papers');
const article = await orchestrator.runAgent('writer', String(research.output));

console.log(`Total tokens used: ${orchestrator.totalTokens}`);

The budget is shared across all agents in the orchestrator. Individual agent runs that would exceed the remaining budget are blocked by a constraint before the LLM call is made.

