Multi-Agent Orchestrator

Self-Healing Networks

Automatic agent rerouting when health degrades – Directive-native self-healing.

When an agent becomes unhealthy (its circuit breaker opens or its health score drops), the system reactively reroutes work to a healthy equivalent. When self-healing is not configured, it adds zero overhead.


Quick Start

Single-Agent Fallback

When the primary circuit breaker opens, fall back to alternate runners:

import { createAgentOrchestrator } from '@directive-run/ai';

const orchestrator = createAgentOrchestrator({
  runner: primaryRunner,
  circuitBreaker: { failureThreshold: 3, resetTimeoutMs: 30000 },
  selfHealing: {
    fallbackRunners: [backupRunner, emergencyRunner],
    onReroute: (event) => console.log(`Rerouted to ${event.reroutedTo}`),
  },
});

Fallback runners are tried in the order listed. When the primary's circuit recovers (half-open → closed), the primary runner automatically resumes handling work.
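The ordering rule can be sketched as a small function. This is a minimal illustration, not the library's implementation: `CircuitState`, `Runner`, `FallbackChain`, and `runWithFallback` are hypothetical names, runners are kept synchronous for brevity, and it assumes the primary is used whenever its circuit is not OPEN (so a HALF_OPEN trial call goes to the primary).

```typescript
type CircuitState = 'CLOSED' | 'HALF_OPEN' | 'OPEN';

// Hypothetical synchronous runner shape, kept synchronous for brevity.
type Runner = (input: string) => string;

interface FallbackChain {
  primary: Runner;
  primaryCircuit: () => CircuitState; // reads the primary breaker's current state
  fallbacks: Runner[];                // tried in the order listed
}

function runWithFallback(chain: FallbackChain, input: string): string {
  // CLOSED or HALF_OPEN: the primary handles the call, so a recovered primary
  // resumes automatically without any extra bookkeeping.
  if (chain.primaryCircuit() !== 'OPEN') {
    return chain.primary(input);
  }
  let lastError: unknown = new Error('primary circuit is open and no fallback succeeded');
  for (const fallback of chain.fallbacks) {
    try {
      return fallback(input);
    } catch (err) {
      lastError = err; // remember the failure and try the next fallback
    }
  }
  throw lastError;
}
```

Reading the circuit state on every call, rather than caching a "current runner", is what makes recovery automatic in this sketch.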

Multi-Agent Rerouting

When an agent's circuit breaker opens, reroute to an equivalent agent:

import { createMultiAgentOrchestrator } from '@directive-run/ai';

const orchestrator = createMultiAgentOrchestrator({
  runner,
  agents: {
    'gpt-writer': { agent: gptWriter, capabilities: ['writing'] },
    'claude-writer': { agent: claudeWriter, capabilities: ['writing'] },
    'researcher': { agent: researcher, capabilities: ['research'] },
  },
  selfHealing: {
    circuitBreakerDefaults: { failureThreshold: 3, resetTimeoutMs: 30000 },
    useCapabilities: true,
    selectionStrategy: 'healthiest',
    onReroute: (event) => console.log(`${event.originalAgent} → ${event.reroutedTo}`),
  },
});

When gpt-writer's circuit breaker opens, the system automatically reroutes its work to claude-writer, which shares the writing capability.


Health Monitor

The health monitor tracks per-agent metrics in a rolling time window:

import { createHealthMonitor } from '@directive-run/ai';

const monitor = createHealthMonitor({
  windowMs: 60000,          // 60-second rolling window
  maxNormalLatencyMs: 5000, // latency normalization ceiling
  maxEventsPerAgent: 1000,  // max events per agent before FIFO eviction
});

monitor.recordSuccess('agent-a', 120);
monitor.recordFailure('agent-a', 5000, new Error('timeout'));

const score = monitor.getHealthScore('agent-a'); // 0-100
const metrics = monitor.getMetrics('agent-a');
// { agentId, circuitState, successRate, avgLatencyMs, healthScore, lastErrors, ... }
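The rolling-window bookkeeping can be sketched as follows. This is a minimal sketch under stated assumptions: events older than `windowMs` are dropped when read, and the oldest events are evicted once `maxEventsPerAgent` is exceeded. `RollingWindow` and `HealthEvent` are hypothetical names, not exports of `@directive-run/ai`, and the clock is injectable so the window can be tested deterministically.

```typescript
interface HealthEvent {
  timestamp: number;
  success: boolean;
  latencyMs: number;
}

class RollingWindow {
  private events = new Map<string, HealthEvent[]>();
  private readonly windowMs: number;
  private readonly maxEventsPerAgent: number;
  private readonly now: () => number;

  constructor(windowMs: number, maxEventsPerAgent: number, now: () => number = Date.now) {
    this.windowMs = windowMs;
    this.maxEventsPerAgent = maxEventsPerAgent;
    this.now = now;
  }

  record(agentId: string, success: boolean, latencyMs: number): void {
    const list = this.events.get(agentId) ?? [];
    list.push({ timestamp: this.now(), success, latencyMs });
    // FIFO eviction: drop the oldest events once the per-agent cap is exceeded.
    while (list.length > this.maxEventsPerAgent) list.shift();
    this.events.set(agentId, list);
  }

  // Returns only the events inside the rolling window, pruning older ones.
  recent(agentId: string): HealthEvent[] {
    const cutoff = this.now() - this.windowMs;
    const kept = (this.events.get(agentId) ?? []).filter((e) => e.timestamp >= cutoff);
    if (kept.length > 0) this.events.set(agentId, kept);
    else this.events.delete(agentId);
    return kept;
  }
}
```

Capping the list bounds memory for very chatty agents, while the time cutoff keeps old failures from dragging down an agent that has since recovered.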

Health Score Formula

The score is a weighted combination of three factors (the weights are configurable):

| Factor | Default weight | How it's computed |
|---|---|---|
| Success rate | 0.5 | successes / totalEvents |
| Latency | 0.3 | 1 - (avgLatencyMs / maxNormalLatencyMs) |
| Circuit state | 0.2 | CLOSED = 1, HALF_OPEN = 0.5, OPEN = 0 |

An agent with no recorded events scores 50 (neutral). Scores range from 0 to 100.
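The table translates into a short function. This sketch assumes the weighted sum is multiplied by 100 to reach the 0-100 range and that the latency factor is clamped to [0, 1]; neither detail is confirmed above. `healthScore`, `ScoreInput`, and `ScoreWeights` are hypothetical names.

```typescript
type CircuitState = 'CLOSED' | 'HALF_OPEN' | 'OPEN';

interface ScoreInput {
  successes: number;
  totalEvents: number;
  avgLatencyMs: number;
  circuitState: CircuitState;
}

interface ScoreWeights {
  successRate: number;
  latency: number;
  circuit: number;
}

const DEFAULT_WEIGHTS: ScoreWeights = { successRate: 0.5, latency: 0.3, circuit: 0.2 };
const CIRCUIT_FACTOR: Record<CircuitState, number> = { CLOSED: 1, HALF_OPEN: 0.5, OPEN: 0 };

function healthScore(
  input: ScoreInput,
  maxNormalLatencyMs = 5000,
  weights: ScoreWeights = DEFAULT_WEIGHTS,
): number {
  if (input.totalEvents === 0) return 50; // no data: neutral score
  const successRate = input.successes / input.totalEvents;
  // Assumption: clamp so latencies above the ceiling contribute 0, never a negative value.
  const latency = Math.min(1, Math.max(0, 1 - input.avgLatencyMs / maxNormalLatencyMs));
  const circuit = CIRCUIT_FACTOR[input.circuitState];
  const weighted =
    weights.successRate * successRate + weights.latency * latency + weights.circuit * circuit;
  return 100 * weighted; // scale from [0, 1] to [0, 100]
}
```

For the two events recorded in the Health Monitor example (one success at 120 ms, one failure at 5000 ms, so an average of 2560 ms) with a CLOSED circuit, this sketch gives 100 × (0.5 × 0.5 + 0.3 × 0.488 + 0.2 × 1) = 59.64.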


Equivalency Resolution

The system finds equivalent agents through two mechanisms:

Capability Matching (default)

Agents whose capabilities arrays share at least one entry are considered equivalent:

agents: {
  'fast-writer': { agent: fastWriter, capabilities: ['writing', 'summarization'] },
  'deep-writer': { agent: deepWriter, capabilities: ['writing', 'analysis'] },
  'researcher':  { agent: researcher, capabilities: ['research'] },
},
selfHealing: { useCapabilities: true },

When fast-writer fails, deep-writer is a candidate (shared writing capability). researcher is not (no overlap).
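The overlap rule amounts to a set-intersection test. This minimal sketch assumes "overlap" means sharing at least one capability string. `capabilityEquivalents` and `AgentEntry` are hypothetical names, and the agent objects themselves are omitted.

```typescript
interface AgentEntry {
  capabilities: string[];
}

// Returns the IDs of agents that share at least one capability with `agentId`.
function capabilityEquivalents(agents: Record<string, AgentEntry>, agentId: string): string[] {
  const own = new Set(agents[agentId]?.capabilities ?? []);
  return Object.entries(agents)
    .filter(([id, entry]) => id !== agentId && entry.capabilities.some((c) => own.has(c)))
    .map(([id]) => id);
}
```

With the three agents above, `capabilityEquivalents(agents, 'fast-writer')` yields only `deep-writer`, and `researcher` has no equivalents.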

Explicit Groups

For fine-grained control, define equivalency groups:

selfHealing: {
  equivalencyGroups: {
    writers: ['fast-writer', 'deep-writer', 'backup-writer'],
    researchers: ['researcher', 'backup-researcher'],
  },
},

Explicit groups are checked first, then capability matching. Unhealthy agents are filtered out, and the selection strategy picks among the remaining candidates (by default, the one with the highest health score).
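The resolution order can be sketched as follows, under assumptions that go beyond the text: "checked first" is read as "capability matching is consulted only if no explicit group yields a healthy candidate", and "unhealthy" means a health score below `healthThreshold` (default 30). `resolveReplacement` and `AgentInfo` are hypothetical names, and selection uses the default healthiest strategy.

```typescript
interface AgentInfo {
  capabilities: string[];
}

function resolveReplacement(
  failedId: string,
  agents: Record<string, AgentInfo>,
  groups: Record<string, string[]>,
  healthScore: (id: string) => number,
  healthThreshold = 30,
): string | undefined {
  // Deduplicate, drop the failed agent itself, and drop anything below the threshold.
  const healthy = (ids: string[]): string[] =>
    [...new Set(ids)].filter((id) => id !== failedId && healthScore(id) >= healthThreshold);

  // 1. Members of any explicit group that contains the failed agent.
  const groupCandidates = healthy(
    Object.values(groups).filter((members) => members.includes(failedId)).flat(),
  );

  // 2. Only if no group yields a healthy candidate: capability overlap.
  const own = new Set(agents[failedId]?.capabilities ?? []);
  const candidates =
    groupCandidates.length > 0
      ? groupCandidates
      : healthy(
          Object.entries(agents)
            .filter(([, info]) => info.capabilities.some((c) => own.has(c)))
            .map(([id]) => id),
        );

  // Default 'healthiest' selection: highest health score wins.
  let best: string | undefined;
  for (const id of candidates) {
    if (best === undefined || healthScore(id) > healthScore(best)) best = id;
  }
  return best;
}
```

Returning `undefined` when nothing qualifies leaves the decision to the degradation policy rather than forcing a bad pick.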


Selection Strategy

| Strategy | Behavior |
|---|---|
| healthiest (default) | Pick the equivalent with the highest health score |
| round-robin | Rotate through equivalents evenly |

selfHealing: {
  selectionStrategy: 'round-robin',
},
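The two strategies differ only in how they pick from an already-filtered candidate list. In this minimal sketch, `createSelector` is a hypothetical name, and keeping one round-robin cursor per failed agent is an assumption about how "evenly" is tracked.

```typescript
type Strategy = 'healthiest' | 'round-robin';

// Hypothetical selector factory: the returned function picks one candidate per call.
function createSelector(strategy: Strategy, healthScore: (id: string) => number) {
  const cursors = new Map<string, number>(); // round-robin position per failed agent
  return (failedId: string, candidates: string[]): string | undefined => {
    if (candidates.length === 0) return undefined;
    if (strategy === 'round-robin') {
      const i = cursors.get(failedId) ?? 0;
      cursors.set(failedId, (i + 1) % candidates.length);
      return candidates[i % candidates.length];
    }
    // 'healthiest': keep the candidate with the highest score (first one wins ties).
    return candidates.reduce((best, id) => (healthScore(id) > healthScore(best) ? id : best));
  };
}
```

Round-robin spreads load across equivalents even when their scores are similar; healthiest concentrates traffic on the best-scoring agent.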

Degradation Policies

When all equivalents are exhausted, the degradation policy decides what the caller receives:

| Policy | Behavior |
|---|---|
| reject (default) | Throw the original error |
| fallback-response | Return a static response |

selfHealing: {
  degradation: 'fallback-response',
  fallbackResponse: { output: 'Service temporarily unavailable.' },
},
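Both policies come down to a single branch. In this sketch, `degrade` and `DegradationConfig` are hypothetical names, and it assumes the function is called only after every equivalent has been tried or filtered out.

```typescript
type Degradation = 'reject' | 'fallback-response';

interface DegradationConfig {
  degradation?: Degradation;
  fallbackResponse?: unknown;
}

// Called once every equivalent has been tried (or filtered out as unhealthy).
function degrade(config: DegradationConfig, originalError: unknown): unknown {
  if (config.degradation === 'fallback-response') {
    return config.fallbackResponse; // static value, returned as-is
  }
  // 'reject' (the default): surface the original failure unchanged.
  throw originalError;
}
```

Rethrowing the original error, rather than a generic "no agents available" error, keeps the root cause visible to the caller.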

Single-Agent Configuration

interface SelfHealingConfig {
  fallbackRunners?: AgentRunner[];   // tried in order
  fallbackAgent?: AgentLike;         // alternate agent definition
  circuitBreaker?: CircuitBreakerConfig;
  healthThreshold?: number;          // default 30
  degradation?: 'reject' | 'fallback-response';
  fallbackResponse?: unknown;
  onReroute?: (event: RerouteEvent) => void;
}

Circuit breaker required

selfHealing requires a circuitBreaker to detect failures. If you configure selfHealing without circuitBreaker, a dev-mode warning is emitted.
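The breaker matters because its state is the signal that triggers rerouting. The sketch below shows the conventional CLOSED / OPEN / HALF_OPEN state machine driven by the `failureThreshold` and `resetTimeoutMs` fields from the Quick Start. `SimpleCircuitBreaker` is a hypothetical class, not the library's breaker, and its exact transition rules are assumptions.

```typescript
type CircuitState = 'CLOSED' | 'HALF_OPEN' | 'OPEN';

class SimpleCircuitBreaker {
  private state: CircuitState = 'CLOSED';
  private failures = 0;
  private openedAt = 0;
  private readonly failureThreshold: number;
  private readonly resetTimeoutMs: number;
  private readonly now: () => number;

  constructor(failureThreshold: number, resetTimeoutMs: number, now: () => number = Date.now) {
    this.failureThreshold = failureThreshold;
    this.resetTimeoutMs = resetTimeoutMs;
    this.now = now;
  }

  getState(): CircuitState {
    // After the reset timeout, an OPEN circuit lets a trial request through.
    if (this.state === 'OPEN' && this.now() - this.openedAt >= this.resetTimeoutMs) {
      this.state = 'HALF_OPEN';
    }
    return this.state;
  }

  recordSuccess(): void {
    this.failures = 0;
    this.state = 'CLOSED';
  }

  recordFailure(): void {
    this.failures++;
    // A failed trial in HALF_OPEN, or reaching the threshold, (re)opens the circuit.
    if (this.getState() === 'HALF_OPEN' || this.failures >= this.failureThreshold) {
      this.state = 'OPEN';
      this.openedAt = this.now();
    }
  }
}
```

With `failureThreshold: 3` and `resetTimeoutMs: 30000`, three consecutive failures open the circuit, and 30 seconds later a single trial call decides whether it closes again.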


Multi-Agent Configuration

interface MultiAgentSelfHealingConfig {
  circuitBreakerDefaults?: CircuitBreakerConfig;
  healthThreshold?: number;                        // default 30
  equivalencyGroups?: Record<string, string[]>;
  useCapabilities?: boolean;                       // default true
  selectionStrategy?: 'healthiest' | 'round-robin';
  degradation?: 'reject' | 'fallback-response';
  fallbackResponse?: unknown;
  onReroute?: (event: RerouteEvent) => void;
  healthMonitor?: HealthMonitorConfig;
}

Lifecycle Hooks

selfHealing: {
  onReroute: (event) => {
    // event.originalAgent – the unhealthy agent
    // event.reroutedTo – the replacement
    // event.reason – why rerouting happened
    console.log(`Rerouted ${event.originalAgent} → ${event.reroutedTo}: ${event.reason}`);
  },
},

Circular Reroute Guard

A rerouted call cannot reroute again: rerouting is limited to a single hop. This prevents infinite reroute loops when several agents in an equivalency group fail at the same time.
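A one-hop guard can be implemented by carrying a flag on the request. In this minimal sketch, `RunRequest`, `runWithReroute`, and the `rerouted` flag are hypothetical, and runners are synchronous for brevity. Two agents that name each other as equivalents still produce exactly two attempts, not a loop.

```typescript
interface RunRequest {
  agentId: string;
  input: string;
  rerouted: boolean; // true once this request has already been rerouted
}

type Runner = (agentId: string, input: string) => string;

function runWithReroute(
  req: RunRequest,
  run: Runner,
  findEquivalent: (agentId: string) => string | undefined,
): string {
  try {
    return run(req.agentId, req.input);
  } catch (err) {
    // One hop only: a request that was already rerouted fails instead of rerouting again.
    const target = req.rerouted ? undefined : findEquivalent(req.agentId);
    if (target === undefined) throw err;
    return runWithReroute({ agentId: target, input: req.input, rerouted: true }, run, findEquivalent);
  }
}
```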


Zero-Overhead Guarantee

All self-healing setup is gated behind configuration checks. When selfHealing is not provided:

  • No HealthMonitor created
  • No extra facts in the Directive system
  • No extra constraints or resolvers
  • The existing circuitBreaker option works exactly as before
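The gating pattern itself is simple: every self-healing allocation sits behind one presence check on the config. In this sketch, `createOrchestratorSketch`, `OrchestratorOptions`, and the `Map` standing in for a health monitor are all hypothetical; it illustrates the pattern, not Directive's internals.

```typescript
interface OrchestratorOptions {
  circuitBreaker?: { failureThreshold: number; resetTimeoutMs: number };
  selfHealing?: { healthThreshold?: number };
}

interface Orchestrator {
  healthMonitor?: Map<string, number[]>; // stand-in for a real health monitor
}

function createOrchestratorSketch(options: OrchestratorOptions): Orchestrator {
  const orchestrator: Orchestrator = {};
  // Nothing self-healing related is allocated unless selfHealing is provided.
  if (options.selfHealing) {
    orchestrator.healthMonitor = new Map();
  }
  return orchestrator;
}
```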

Directive - Constraint-Driven State Management for TypeScript