Clobber Loop Detector

A single clobber on an abortOn:-bound fact is fine — the binding catches the race and the audit ledger records it. A loop is two or more resolvers whose when: predicates both satisfy a shared state and keep rewriting the fact every reconcile tick, indefinitely. Today the symptom surfaces as a value flapping between two states in a customer screenshot, even though the audit ledger holds 800 clobbers/sec of forensic evidence nobody is querying. clobberLoopPlugin closes the loop: one structured warning per detected loop, with a predicate-overlap proof that names the specific clauses fighting.


Quick Start

import { createSystem } from '@directive-run/core';
import { clobberLoopPlugin } from '@directive-run/core/plugins';

const detector = clobberLoopPlugin({
  threshold: 5,
  windowMs: 1000,
  onLoop: (event) => {
    pagerduty.trigger({
      severity: event.severity,
      summary: `Clobber loop on ${event.fact}: ${event.participants.join(' vs ')}`,
      details: event,
    });
  },
});

const system = createSystem({
  module: myModule,
  plugins: [detector.plugin],
});

// During incident response:
detector.disable();

The returned handle is { plugin, disable, enable, isEnabled }. Pass .plugin to createSystem; use the rest at runtime when an SRE needs to flip the detector off without redeploying.


When to use clobberLoopPlugin vs clobberAlertPlugin

| | clobberAlertPlugin | clobberLoopPlugin |
|---|---|---|
| Fires on | every clobber on an irreversible-tagged fact | sustained loops only (≥ N distinct rejections from ≥ 2 resolvers in window) |
| Output | per-event alert | one event per detected loop, with predicate-overlap proof |
| Use when | a single clobber is operationally urgent (money, PII) | the noise of "many clobbers" is the problem itself |
| Mounts with | clobber-alert | clobber-loop |
| Pair? | yes, both can run together | yes, both can run together |

Most production systems want both: clobberAlertPlugin pages on irreversible-tagged clobbers immediately, clobberLoopPlugin separately surfaces the rule-design bug behind a sustained churn.


How loop detection works

The detector subscribes to resolver.write.rejected events and aggregates them per fact:

┌────────────────────────────────────────────────────────┐
│  resolver.write.rejected stream                        │
│                                                        │
│  ringBuffer[fact] = [{ timestamp, requirementId,       │
│                       resolverId, seq }, ...]          │
│                                                        │
│  trim to windowMs                                      │
│                                                        │
│  if distinct(requirementId) ≥ threshold                │
│     AND distinct(resolverId) ≥ 2                       │
│     AND (fact, participantSet) not in cooldown         │
│  then:                                                 │
│    build PredicateOverlapProof from participants'      │
│      whenSpecs                                         │
│    PII-redact operands via system.meta.byTag("pii")    │
│    emit resolver.clobber.loop.detected                 │
│    enter cooldown for cooldownMs                       │
└────────────────────────────────────────────────────────┘

The distinct-by-requirement-id counting matters: a single resolver's retry storm shares one requirement ID, so it counts as one rejection. The detector only fires on true multi-participant contention — never on a single resolver retrying itself.
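
The counting rule above can be sketched as a pure function over one fact's buffer entries. This is a minimal illustration, not the plugin's implementation: the `Rejection` and `LoopCheck` shapes and the `checkLoop` helper are hypothetical names, and the cooldown check is left out for brevity.

```typescript
// Hypothetical shapes; field names mirror the buffer entries in the diagram.
interface Rejection {
  timestamp: number; // ms epoch
  requirementId: string;
  resolverId: string;
  seq: number;
}

interface LoopCheck {
  isLoop: boolean;
  count: number; // distinct requirement IDs in the window
  participants: string[]; // sorted unique resolver IDs
}

// Trim one fact's entries to the window ending at `now`, then apply both
// distinct-count conditions. Cooldown handling is intentionally omitted.
function checkLoop(
  entries: readonly Rejection[],
  now: number,
  windowMs = 1000,
  threshold = 5,
): LoopCheck {
  const inWindow = entries.filter((e) => now - e.timestamp <= windowMs);
  const requirements = new Set(inWindow.map((e) => e.requirementId));
  const participants = Array.from(new Set(inWindow.map((e) => e.resolverId))).sort();
  return {
    isLoop: requirements.size >= threshold && participants.length >= 2,
    count: requirements.size,
    participants,
  };
}

// One resolver retrying one requirement ten times: a single distinct requirement.
const retryStorm: Rejection[] = Array.from({ length: 10 }, (_, i) => ({
  timestamp: 1000 + i * 10,
  requirementId: 'req-1',
  resolverId: 'applyCoupon',
  seq: i,
}));

// Two resolvers alternating across six distinct requirements.
const contention: Rejection[] = Array.from({ length: 6 }, (_, i) => ({
  timestamp: 1000 + i * 50,
  requirementId: `req-${i}`,
  resolverId: i % 2 === 0 ? 'applyCoupon' : 'applyLoyaltyDiscount',
  seq: i,
}));

const stormResult = checkLoop(retryStorm, 1100);
const contentionResult = checkLoop(contention, 1300);
```

Under these assumptions the retry storm counts as one rejection and does not qualify, while the alternating pair clears both conditions.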

When the loop quiets (default 30s without a new rejection on the (fact, participantSet)), a resolver.clobber.loop.resolved event closes the alarm so dashboards show active loops, not historical loops.


Configuration

| Option | Type | Default | Description |
|---|---|---|---|
| windowMs | number | 1000 | Window over which rejections aggregate. |
| threshold | number | 5 | Minimum distinct-requirement rejections in window to trigger. |
| cooldownMs | number | 5000 | Suppress same-(fact, participantSet) re-fire for this duration after emission. |
| resolvedAfterMs | number | 30000 | Quiet window before firing resolver.clobber.loop.resolved. |
| maxTrackedFacts | number | 256 | Global LRU cap on facts the detector tracks. |
| maxParticipantsPerFact | number | 16 | Per-fact cap on participant resolvers. |
| maxEmissionsPerSec | number | 10 | Global emission cap. Above-cap detections surface in next event's suppressedSinceLastEmit. |
| capturePII | boolean | false | If false (default), whenSpec operands at PII-tagged fact paths are redacted to "[redacted]" BEFORE the event leaves the plugin. Set to true only when the deployment has a data-processing addendum. |
| onLoop | (event) => void | console.warn in dev, console.error to stderr in prod | Called for each detected loop. NOT noop in production: defaults to stderr so the signal lands in log pipelines without explicit routing. |
| onResolved | (event) => void | undefined | Called when a previously-detected loop closes. |

The defaults assume dev / staging operation. For production, wire onLoop: pagerduty.trigger or onLoop: slack.post explicitly; the stderr default is the floor under "no monitoring at all," not a recommended production sink.
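
One way to wire an explicit production sink is to route by the event's severity. The sketch below is an assumption-laden illustration: `makeOnLoop`, `Sink`, and `LoopEventLike` are hypothetical names, and the two sinks are stubs standing in for real paging and chat clients.

```typescript
// Minimal local view of the event: only the fields this handler reads.
interface LoopEventLike {
  fact: string;
  severity: 'warn' | 'error';
  participants: readonly string[];
}

type Sink = (summary: string, event: LoopEventLike) => void;

// Build an onLoop handler: 'error' events go to the pager, 'warn' events to chat.
function makeOnLoop(page: Sink, chat: Sink): (event: LoopEventLike) => void {
  return (event) => {
    const summary = `Clobber loop on ${event.fact}: ${event.participants.join(' vs ')}`;
    if (event.severity === 'error') {
      page(summary, event);
    } else {
      chat(summary, event);
    }
  };
}

// Stub sinks that record calls (hypothetical stand-ins for real clients).
const paged: string[] = [];
const chatted: string[] = [];
const routedOnLoop = makeOnLoop(
  (summary) => { paged.push(summary); },
  (summary) => { chatted.push(summary); },
);

routedOnLoop({
  fact: 'cart.discount',
  severity: 'warn',
  participants: ['applyCoupon', 'applyLoyaltyDiscount'],
});
routedOnLoop({
  fact: 'billing.total',
  severity: 'error',
  participants: ['applyCredit', 'applyTax'],
});
```

The returned function has the `(event) => void` shape the onLoop option expects, so it could be passed as `onLoop` directly.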


The detected event

{
  type: 'resolver.clobber.loop.detected',
  systemId: string,                         // multi-tenant routing key
  fact: string,                             // the contended fact key
  participants: readonly string[],          // sorted unique resolver IDs
  participantModules: readonly string[],    // each resolver's owning module
  count: number,                            // distinct-requirement rejections in window
  windowMs: number,
  firstAt: number,                          // ms epoch of first event in window
  lastAt: number,                           // ms epoch of trigger event
  predicateOverlap?: PredicateOverlapProof, // see below
  severity: 'warn' | 'error',               // escalated to "error" when fact tagged "pii" or "money"
  factTags: readonly string[],              // surfaces tags without leaking values
  suppressedSinceLastEmit: number,          // global rate-limit overflow counter
  rejectionSeqs: readonly number[],         // audit cross-references
}

Emitted through system.observe() so audit-ledger, devtools, and OTel exporters all see it without depending on clobberLoopPlugin directly.

The resolved companion:

{
  type: 'resolver.clobber.loop.resolved',
  systemId: string,
  fact: string,
  participants: readonly string[],
  durationMs: number,
  resolution: 'no-recurrence-in-window'
            | 'participant-disabled'
            | 'predicate-narrowed',
}
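
A dashboard that shows only active loops can pair the two events by fact and participant set. The following is a sketch under assumptions: `ActiveLoops` and `loopKey` are hypothetical helpers, and the local types carry only the fields the sketch reads.

```typescript
interface DetectedLike {
  type: 'resolver.clobber.loop.detected';
  fact: string;
  participants: readonly string[];
  firstAt: number;
}

interface ResolvedLike {
  type: 'resolver.clobber.loop.resolved';
  fact: string;
  participants: readonly string[];
  durationMs: number;
}

type LoopEvent = DetectedLike | ResolvedLike;

// Key a loop by fact plus participant set; participants arrive sorted,
// so a plain join gives a stable key.
function loopKey(e: LoopEvent): string {
  return `${e.fact}|${e.participants.join(',')}`;
}

class ActiveLoops {
  private readonly current = new Map<string, DetectedLike>();

  handle(e: LoopEvent): void {
    const key = loopKey(e);
    if (e.type === 'resolver.clobber.loop.detected') {
      // Keep the first detection so firstAt reflects when the loop started.
      if (!this.current.has(key)) this.current.set(key, e);
    } else {
      this.current.delete(key);
    }
  }

  list(): string[] {
    return Array.from(this.current.keys());
  }
}

const activeLoops = new ActiveLoops();
activeLoops.handle({
  type: 'resolver.clobber.loop.detected',
  fact: 'cart.discount',
  participants: ['applyCoupon', 'applyLoyaltyDiscount'],
  firstAt: 1000,
});
activeLoops.handle({
  type: 'resolver.clobber.loop.detected',
  fact: 'billing.total',
  participants: ['applyCredit', 'applyTax'],
  firstAt: 2000,
});
activeLoops.handle({
  type: 'resolver.clobber.loop.resolved',
  fact: 'cart.discount',
  participants: ['applyCoupon', 'applyLoyaltyDiscount'],
  durationMs: 31000,
});
const stillActive = activeLoops.list();
```

After the resolved event arrives, only the `billing.total` loop remains in the active set.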

The PredicateOverlapProof

This is the killer feature — when both fighting resolvers' constraints use data-form when: predicates, the plugin tells you the exact clauses that co-fire so the suggested fix isn't a guess.

type PredicateOverlapProof =
  | {
      verdict: 'matched';
      coFireClauses: LeafClause[];
      conflictingClauses: never[];
    }
  | {
      verdict: 'overlap';
      coFireClauses: LeafClause[];
      conflictingClauses: LeafClause[];
    }
  | {
      verdict: 'indeterminate';
      reason: 'non-comparable-operator';
      coFireClauses: LeafClause[];
    }
  | {
      verdict: 'function-form-opaque';
      reason: 'one-or-both-when-is-a-function';
      whenSourceHashes?: string[];
    };

| Verdict | Meaning | Warning text disclaims? |
|---|---|---|
| matched | Both predicates have identical structural clauses. The strongest verdict: the rules are syntactic duplicates. | No |
| overlap | Clauses share at least one path and at least one pairwise comparison says they co-fire, with no direct contradictions. | No |
| indeterminate | A non-comparable operator ($regex, $elemMatch, $matches) appeared. Cannot prove overlap structurally. | Yes ("cannot prove overlap — predicate uses non-comparable operator") |
| function-form-opaque | At least one constraint uses a function-form when:. Structural comparison is impossible. If audit-ledger is mounted, includes hashes of the function sources for cross-version diffing. | Yes ("cannot prove overlap — at least one constraint uses function-form when:") |

The proof builder uses the same flattenPredicate + clause-comparison machinery directive doctor uses to find contradictions before runtime — so a loop the detector flags at runtime is the same shape doctor could have flagged at design time.

Default warning text from console.warn:

[directive] CLOBBER LOOP on `cart.discount` (5 clobbers in 482ms)
  Participants: applyCoupon, applyLoyaltyDiscount
  Predicate overlap: matched
  Both `when:` predicates fire when:
    - user.loyaltyTier >= 2
    - coupon.code exists
  Suggested fix: add `priority:` to one, narrow `when:` to disjoint conditions, or merge into a single resolver.
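
A custom onLoop handler that renders its own message can branch on the verdict with an exhaustive switch. This sketch makes two assumptions: the `LeafClause` fields (`path`, `op`, `value`) are hypothetical because the shape is not specified here, and `describeProof` is an illustrative helper, not the plugin's formatter.

```typescript
// Hypothetical clause shape, for illustration only.
interface LeafClause {
  path: string;
  op: string;
  value?: unknown;
}

type Proof =
  | { verdict: 'matched'; coFireClauses: LeafClause[]; conflictingClauses: never[] }
  | { verdict: 'overlap'; coFireClauses: LeafClause[]; conflictingClauses: LeafClause[] }
  | { verdict: 'indeterminate'; reason: 'non-comparable-operator'; coFireClauses: LeafClause[] }
  | { verdict: 'function-form-opaque'; reason: 'one-or-both-when-is-a-function'; whenSourceHashes?: string[] };

interface ProofSummary {
  disclaims: boolean; // true when overlap could not be proven structurally
  lines: string[];
}

function describeProof(proof: Proof): ProofSummary {
  switch (proof.verdict) {
    case 'matched':
    case 'overlap':
      return {
        disclaims: false,
        lines: [
          `Predicate overlap: ${proof.verdict}`,
          ...proof.coFireClauses.map(
            (c) => `  - ${c.path} ${c.op}${c.value === undefined ? '' : ' ' + String(c.value)}`,
          ),
        ],
      };
    case 'indeterminate':
    case 'function-form-opaque':
      return { disclaims: true, lines: [`Predicate overlap: ${proof.verdict} (${proof.reason})`] };
    default: {
      // Compile-time exhaustiveness check: a new verdict fails to type-check here.
      const unreachable: never = proof;
      return unreachable;
    }
  }
}

const matchedSummary = describeProof({
  verdict: 'matched',
  coFireClauses: [
    { path: 'user.loyaltyTier', op: '>=', value: 2 },
    { path: 'coupon.code', op: 'exists' },
  ],
  conflictingClauses: [],
});

const opaqueSummary = describeProof({
  verdict: 'function-form-opaque',
  reason: 'one-or-both-when-is-a-function',
});
```

The `never` assignment in the default branch means adding a fifth verdict to the union becomes a type error until the switch handles it.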

Reason-aware shouldRetry integration

v1.23.0 also widened RetryPolicy.shouldRetry with an optional third argument — ShouldRetryContext — so a retry policy can decide based on WHY the attempt failed. The motivating case pairs with this detector: "retry on clobber, fail loud on real bugs."

resolvers: {
  applyDiscount: {
    requirement: 'APPLY_DISCOUNT',
    retry: {
      attempts: 5,
      backoff: 'exponential',
      shouldRetry: (err, attempt, ctx) => {
        if (ctx?.reason === 'clobbered') return attempt < 5;
        if (ctx?.reason === 'timeout')   return attempt < 2;
        return false;  // 'error' / 'cancelled' → no retry
      },
    },
    resolve: async (req, ctx) => { /* ... */ },
  },
}

The ShouldRetryContext carries:

  • reason: 'clobbered' | 'timeout' | 'cancelled' | 'error'
  • clobber?: { fact, expected, actual } — populated when reason === 'clobbered'

Two-argument shouldRetry(err, attempt) callers continue to work unchanged — the third argument is additive. Before this change, a clobber-induced abort never reached shouldRetry at all; the controller's aborted signal short-circuited the retry path silently. Now policies can opt into bounded retries on contention while still failing loud on real errors.
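
To see how such a policy bounds retries per reason, the sketch below types the callback locally and drives it with a toy loop. It rests on assumptions: the local `ShouldRetry` type only mirrors the signature described above, `attemptsAllowed` is a hypothetical driver, and attempt numbering is assumed to start at 1.

```typescript
type RetryReason = 'clobbered' | 'timeout' | 'cancelled' | 'error';

interface ShouldRetryContext {
  reason: RetryReason;
  clobber?: { fact: string; expected: unknown; actual: unknown };
}

type ShouldRetry = (err: unknown, attempt: number, ctx?: ShouldRetryContext) => boolean;

const reasonAware: ShouldRetry = (_err, attempt, ctx) => {
  if (ctx?.reason === 'clobbered') return attempt < 5;
  if (ctx?.reason === 'timeout') return attempt < 2;
  return false; // 'error' / 'cancelled': no retry
};

// A two-argument policy is still assignable to the widened type.
const legacyPolicy: ShouldRetry = (_err, attempt) => attempt < 3;

// Hypothetical driver: total attempts made when every attempt fails the same way.
// Assumes attempts are numbered from 1.
function attemptsAllowed(policy: ShouldRetry, ctx: ShouldRetryContext, max = 10): number {
  let attempt = 1;
  while (attempt < max && policy(new Error('failed'), attempt, ctx)) {
    attempt++;
  }
  return attempt;
}

const onClobber = attemptsAllowed(reasonAware, { reason: 'clobbered' });
const onTimeout = attemptsAllowed(reasonAware, { reason: 'timeout' });
const onError = attemptsAllowed(reasonAware, { reason: 'error' });
const legacyOnClobber = attemptsAllowed(legacyPolicy, { reason: 'clobbered' });
```

With 1-based numbering, the reason-aware policy allows five attempts under contention, two on timeout, and one on a real error, while the legacy policy ignores the reason entirely.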


PII safety

Redaction happens at event-construction time, not at message-format time. The plugin walks the participants' whenSpec through the same redactWhenSpec utility the audit-ledger uses, against system.meta.byTag("pii"), BEFORE the PredicateOverlapProof is attached to the emitted event. Downstream sinks (audit-ledger, devtools, third-party onLoop handlers) all receive the redacted form by default — there is no "raw vs formatted" split where a sink could accidentally log the unredacted operands.

capturePII: true is the explicit opt-out. Mirror of the audit-ledger contract — set it only when the deployment has a data-processing addendum that permits unredacted operand capture.

The slim per-rejection buffer entries ({ timestamp, requirementId, resolverId, seq }) deliberately don't carry expected / actual payloads at all — the audit ledger already keeps the forensic payload, so PII doesn't spread through the detector's own buffers.
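
Redacting at construction time amounts to replacing operands before the clauses are attached to anything. The sketch below is illustrative only: `redactClauses` is a hypothetical stand-in (not the redactWhenSpec utility), the `Clause` shape and the `$eq` / `$gte` operator names are assumptions, and the PII paths are modeled as a plain set.

```typescript
// Hypothetical clause shape, for illustration only.
interface Clause {
  path: string;
  op: string;
  operand: unknown;
}

// Return copies of the clauses with operands at PII-tagged paths replaced.
// The input array and its objects are never mutated.
function redactClauses(
  clauses: readonly Clause[],
  piiPaths: ReadonlySet<string>,
  capturePII = false,
): Clause[] {
  return clauses.map((c) =>
    !capturePII && piiPaths.has(c.path) ? { ...c, operand: '[redacted]' } : { ...c },
  );
}

const piiPaths = new Set(['user.email']);
const sampleClauses: Clause[] = [
  { path: 'user.email', op: '$eq', operand: 'a@example.com' },
  { path: 'user.loyaltyTier', op: '$gte', operand: 2 },
];

const redactedClauses = redactClauses(sampleClauses, piiPaths);
const rawClauses = redactClauses(sampleClauses, piiPaths, true);
```

Because the function returns new objects, every downstream consumer receives the same redacted copy and no reference to the original operands.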


Audit-ledger integration

When createAuditLedger is mounted alongside clobberLoopPlugin, both new event variants are captured as ledger entries:

import { createSystem } from '@directive-run/core';
import { createAuditLedger, clobberLoopPlugin, memorySink } from '@directive-run/core/plugins';

const ledger = createAuditLedger({ sink: memorySink() });
const detector = clobberLoopPlugin({ threshold: 5 });

const system = createSystem({
  module: myModule,
  plugins: [detector.plugin, ledger.plugin],
});

// Later — audit query for the loop:
const loopEntries = ledger
  .recent(1000)
  .filter((e) => e.kind === 'resolver.clobber.loop.detected');

Each resolver.clobber.loop.detected audit entry includes rejectionSeqs — the sequence numbers of the contributing resolver.write.rejected entries — so an auditor reading a loop entry can walk to every individual rejection. The proof's overlapVerdict is surfaced as a tag (the predicate clauses are PII-redacted upstream).
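
Walking from a loop entry to its contributing rejections is a lookup by sequence number. The sketch below assumes a simplified entry shape: only `kind` and `rejectionSeqs` come from the text above, while the `seq` field on each entry and the `rejectionsForLoop` helper are hypothetical.

```typescript
// Simplified, assumed entry shape for illustration.
interface LedgerEntryLike {
  seq: number;
  kind: string;
  rejectionSeqs?: readonly number[];
}

// Collect the rejection entries a loop entry points at.
function rejectionsForLoop(
  entries: readonly LedgerEntryLike[],
  loop: LedgerEntryLike,
): LedgerEntryLike[] {
  const wanted = new Set(loop.rejectionSeqs ?? []);
  return entries.filter(
    (e) => e.kind === 'resolver.write.rejected' && wanted.has(e.seq),
  );
}

const sampleEntries: LedgerEntryLike[] = [
  { seq: 1, kind: 'resolver.write.rejected' },
  { seq: 2, kind: 'resolver.write.rejected' },
  { seq: 3, kind: 'resolver.write.rejected' },
  { seq: 4, kind: 'resolver.clobber.loop.detected', rejectionSeqs: [1, 3] },
];

const loopEntry = sampleEntries[3];
const contributing = rejectionsForLoop(sampleEntries, loopEntry).map((e) => e.seq);
```

In a real deployment the entries would come from the ledger query shown above rather than a literal array.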


Runtime kill-switch

The plugin's return handle exposes disable(), enable(), and isEnabled() so an SRE can flip the detector off during incident response without redeploying. Buffer state survives a toggle, so enable() resumes cleanly without a warm-up delay.

const detector = clobberLoopPlugin({ threshold: 5 });

// At incident time:
detector.disable();
// ...investigate / mitigate...
detector.enable();
detector.isEnabled(); // true

disable() stops emission. Inbound resolver.write.rejected events still update the ring buffer so a re-enable doesn't lose the picture of what happened during the freeze.
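
The split between buffering and emitting can be modeled with a toy class. This is a simplified sketch, not the plugin's internals: `ToggleableDetector` is a hypothetical name, and it fires on a plain buffer-length threshold with no windowing or cooldown.

```typescript
class ToggleableDetector {
  private enabled = true;
  private readonly buffer: number[] = [];
  private readonly threshold: number;
  emissions = 0;

  constructor(threshold: number) {
    this.threshold = threshold;
  }

  disable(): void { this.enabled = false; }
  enable(): void { this.enabled = true; }
  isEnabled(): boolean { return this.enabled; }

  // Every rejection is buffered; only the emission step is gated by the switch.
  onRejected(seq: number): void {
    this.buffer.push(seq);
    if (!this.enabled) return;
    if (this.buffer.length >= this.threshold) this.emissions++;
  }

  buffered(): number {
    return this.buffer.length;
  }
}

const toy = new ToggleableDetector(3);
toy.disable();
[1, 2, 3].forEach((seq) => toy.onRejected(seq));
const emissionsWhileDisabled = toy.emissions;
const bufferedWhileDisabled = toy.buffered();

toy.enable();
toy.onRejected(4);
const emissionsAfterEnable = toy.emissions;
```

Because the buffer kept filling while disabled, the first rejection after enable() is enough to fire in this model, which is the "no warm-up delay" property described above.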


Cap behavior

| Cap | When hit | Behavior |
|---|---|---|
| maxTrackedFacts | 257th distinct fact churned | LRU eviction (not FIFO): the legitimate hot fact stays resident even if a hot-then-cold attacker churns the map. |
| maxParticipantsPerFact | 17th distinct resolver on one fact | Detailed participant tracking pauses for that fact until the cooldown expires. One "N-way contention" event still fires. |
| maxEmissionsPerSec | 11th loop detected in same second across all facts | The 11th-through-Nth detections increment suppressedSinceLastEmit. The NEXT event's suppressedSinceLastEmit field reports how many were dropped. |
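
The maxEmissionsPerSec row can be sketched as a small limiter that carries the dropped count forward to the next emitted event. Two assumptions apply: `EmissionLimiter` is a hypothetical name, and the sketch uses a fixed one-second window, which the table does not specify.

```typescript
class EmissionLimiter {
  private readonly maxPerSec: number;
  private windowStart = 0;
  private emittedInWindow = 0;
  private suppressed = 0;

  constructor(maxPerSec: number) {
    this.maxPerSec = maxPerSec;
  }

  // Returns the suppressedSinceLastEmit value to attach to the event,
  // or null when this detection is dropped by the cap.
  tryEmit(now: number): number | null {
    if (now - this.windowStart >= 1000) {
      // Assumed fixed window: start a fresh one-second bucket.
      this.windowStart = now;
      this.emittedInWindow = 0;
    }
    if (this.emittedInWindow >= this.maxPerSec) {
      this.suppressed++;
      return null;
    }
    this.emittedInWindow++;
    const carried = this.suppressed;
    this.suppressed = 0;
    return carried;
  }
}

const limiter = new EmissionLimiter(10);
const burst: Array<number | null> = [];
for (let i = 0; i < 13; i++) {
  burst.push(limiter.tryEmit(5000 + i));
}
const nextWindow = limiter.tryEmit(6000);
```

In this run the first ten detections emit with a counter of 0, the next three are dropped, and the first emission in the following window reports 3.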

The buffer memory bound is small: 32 entries × 256 facts × ~80 bytes ≈ 655 KB, under 1 MB worst case.
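
The difference between LRU and FIFO eviction for maxTrackedFacts is easiest to see on a small cap. The sketch below relies on JavaScript's Map preserving insertion order; `LruFacts` is a hypothetical helper, not the plugin's data structure.

```typescript
class LruFacts<V> {
  private readonly entries = new Map<string, V>();
  private readonly capacity: number;

  constructor(capacity: number) {
    this.capacity = capacity;
  }

  // Re-inserting moves the fact to the most-recent end of the Map's
  // insertion order; the first key is then the least recently used.
  touch(fact: string, value: V): void {
    this.entries.delete(fact);
    this.entries.set(fact, value);
    if (this.entries.size > this.capacity) {
      const oldest = this.entries.keys().next().value;
      if (oldest !== undefined) this.entries.delete(oldest);
    }
  }

  has(fact: string): boolean {
    return this.entries.has(fact);
  }

  size(): number {
    return this.entries.size;
  }
}

const tracked = new LruFacts<number>(3);
tracked.touch('hot', 1);
tracked.touch('a', 1);
tracked.touch('b', 1);
tracked.touch('hot', 2); // refresh: 'hot' is now the most recent
tracked.touch('c', 1);   // over capacity: evicts 'a', the least recently used
```

Under FIFO the oldest insertion, `hot`, would have been evicted; with LRU the refreshed `hot` stays resident and the cold `a` is dropped.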


Mounting alongside clobberAlertPlugin

Both plugins subscribe to the same resolver.write.rejected event stream and operate independently. Order doesn't matter; both will fire on the same rejection if their filters match.

const system = createSystem({
  module: myModule,
  plugins: [
    clobberAlertPlugin({
      irreversibleTags: ['money', 'pii'],
      onAlert: pagerduty.trigger,
    }),
    clobberLoopPlugin({
      threshold: 5,
      onLoop: slack.postIncident,
    }).plugin,
  ],
});

Pairing them is the recommended production posture: instant pages on irreversible-tagged clobbers via clobberAlertPlugin, slower-fuse rule-design signal via clobberLoopPlugin.
