Clobber Loop Detector
A single clobber on an abortOn:-bound fact is fine — the binding catches the race and the audit ledger records it. A loop is two or more resolvers whose when: predicates both satisfy a shared state and keep rewriting the fact every reconcile tick, indefinitely. Today the symptom surfaces as a value flapping between two states in a customer screenshot, even though the audit ledger holds 800 clobbers/sec of forensic evidence nobody is querying. clobberLoopPlugin closes the loop: one structured warning per detected loop, with a predicate-overlap proof that names the specific clauses fighting.
Quick Start
import { createSystem } from '@directive-run/core';
import { clobberLoopPlugin } from '@directive-run/core/plugins';

const detector = clobberLoopPlugin({
  threshold: 5,
  windowMs: 1000,
  onLoop: (event) => {
    pagerduty.trigger({
      severity: event.severity,
      summary: `Clobber loop on ${event.fact}: ${event.participants.join(' vs ')}`,
      details: event,
    });
  },
});

const system = createSystem({
  module: myModule,
  plugins: [detector.plugin],
});

// During incident response:
detector.disable();
The returned handle is { plugin, disable, enable, isEnabled }. Pass .plugin to createSystem; use the rest at runtime when an SRE needs to flip the detector off without redeploying.
When to use clobberLoopPlugin vs clobberAlertPlugin
| | clobberAlertPlugin | clobberLoopPlugin |
|---|---|---|
| Fires on | every clobber on an irreversible-tagged fact | sustained loops only (≥ N distinct rejections from ≥ 2 resolvers in window) |
| Output | per-event alert | one event per detected loop, with predicate-overlap proof |
| Use when | a single clobber is operationally urgent (money, PII) | the noise of "many clobbers" is the problem itself |
| Mounts with | clobber-alert | clobber-loop |
| Pair? | yes, both can run together | yes, both can run together |
Most production systems want both: clobberAlertPlugin pages on irreversible-tagged clobbers immediately, clobberLoopPlugin separately surfaces the rule-design bug behind a sustained churn.
How loop detection works
The detector subscribes to resolver.write.rejected events and aggregates them per fact:
┌────────────────────────────────────────────────────────┐
│ resolver.write.rejected stream │
│ │
│ ringBuffer[fact] = [{ timestamp, requirementId, │
│ resolverId, seq }, ...] │
│ │
│ trim to windowMs │
│ │
│ if distinct(requirementId) ≥ threshold │
│ AND distinct(resolverId) ≥ 2 │
│ AND (fact, participantSet) not in cooldown │
│ then: │
│ build PredicateOverlapProof from participants' │
│ whenSpecs │
│ PII-redact operands via system.meta.byTag("pii") │
│ emit resolver.clobber.loop.detected │
│ enter cooldown for cooldownMs │
└────────────────────────────────────────────────────────┘
The distinct-by-requirement-id counting matters: a single resolver's retry storm shares one requirement ID, so it counts as one rejection. The detector only fires on true multi-participant contention — never on a single resolver retrying itself.
When the loop quiets (default 30s without a new rejection on the (fact, participantSet)), a resolver.clobber.loop.resolved event closes the alarm so dashboards show active loops, not historical loops.
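The aggregation rules above can be sketched as a small self-contained class. This is a minimal illustration under stated assumptions, not the plugin's source: `LoopDetectorSketch`, `RejectionSketch`, `LoopHitSketch`, and `record` are hypothetical names, timestamps are passed in explicitly so the behavior is deterministic, and the proof building, redaction, and resolved-event steps are left out.

```typescript
// Illustrative sketch only: names and shapes are assumptions, not plugin internals.
interface RejectionSketch {
  timestamp: number; // ms epoch
  requirementId: string;
  resolverId: string;
  seq: number;
}

interface LoopHitSketch {
  fact: string;
  participants: string[]; // sorted unique resolver IDs
  count: number; // distinct requirement IDs inside the window
}

class LoopDetectorSketch {
  private buffers = new Map<string, RejectionSketch[]>();
  private cooldownUntil = new Map<string, number>();
  private threshold: number;
  private windowMs: number;
  private cooldownMs: number;

  constructor(threshold = 5, windowMs = 1000, cooldownMs = 5000) {
    this.threshold = threshold;
    this.windowMs = windowMs;
    this.cooldownMs = cooldownMs;
  }

  /** Record one rejection; returns a hit only when the loop condition is met outside cooldown. */
  record(fact: string, r: RejectionSketch): LoopHitSketch | undefined {
    // Trim the per-fact buffer to the window, then append the new rejection.
    const buf = (this.buffers.get(fact) ?? []).filter(
      (e) => e.timestamp > r.timestamp - this.windowMs,
    );
    buf.push(r);
    this.buffers.set(fact, buf);

    // Count distinct requirement IDs, so a retry storm on one requirement counts once.
    const requirements = new Set(buf.map((e) => e.requirementId));
    const participants = Array.from(new Set(buf.map((e) => e.resolverId))).sort();
    if (requirements.size < this.threshold || participants.length < 2) return undefined;

    // Suppress re-fires for the same (fact, participantSet) during cooldown.
    const key = `${fact}|${participants.join(',')}`;
    if ((this.cooldownUntil.get(key) ?? -Infinity) > r.timestamp) return undefined;
    this.cooldownUntil.set(key, r.timestamp + this.cooldownMs);
    return { fact, participants, count: requirements.size };
  }
}
```

With `threshold` set to 3, ten rejections from one resolver on one requirement ID return nothing, while three rejections with distinct requirement IDs split across two resolvers return a hit on the third call.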
Configuration
| Option | Type | Default | Description |
|---|---|---|---|
| windowMs | number | 1000 | Window over which rejections aggregate. |
| threshold | number | 5 | Minimum distinct-requirement rejections in window to trigger. |
| cooldownMs | number | 5000 | Suppress same-(fact, participantSet) re-fire for this duration after emission. |
| resolvedAfterMs | number | 30000 | Quiet window before firing resolver.clobber.loop.resolved. |
| maxTrackedFacts | number | 256 | Global LRU cap on facts the detector tracks. |
| maxParticipantsPerFact | number | 16 | Per-fact cap on participant resolvers. |
| maxEmissionsPerSec | number | 10 | Global emission cap. Above-cap detections surface in next event's suppressedSinceLastEmit. |
| capturePII | boolean | false | If false (default), whenSpec operands at PII-tagged fact paths are redacted to "[redacted]" BEFORE the event leaves the plugin. Set to true only when the deployment has a data-processing addendum. |
| onLoop | (event) => void | console.warn in dev, console.error to stderr in prod | Called for each detected loop. NOT a no-op in production: it defaults to stderr so the signal lands in log pipelines without explicit routing. |
| onResolved | (event) => void | undefined | Called when a previously-detected loop closes. |
The defaults assume dev / staging operation. For production, wire onLoop: pagerduty.trigger or onLoop: slack.post explicitly. The stderr default is the floor under "no monitoring at all," not a recommended production sink.
The detected event
{
  type: 'resolver.clobber.loop.detected',
  systemId: string,                          // multi-tenant routing key
  fact: string,                              // the contended fact key
  participants: readonly string[],           // sorted unique resolver IDs
  participantModules: readonly string[],     // each resolver's owning module
  count: number,                             // distinct-requirement rejections in window
  windowMs: number,
  firstAt: number,                           // ms epoch of first event in window
  lastAt: number,                            // ms epoch of trigger event
  predicateOverlap?: PredicateOverlapProof,  // see below
  severity: 'warn' | 'error',                // escalated to "error" when fact tagged "pii" or "money"
  factTags: readonly string[],               // surfaces tags without leaking values
  suppressedSinceLastEmit: number,           // global rate-limit overflow counter
  rejectionSeqs: readonly number[],          // audit cross-references
}
Emitted through system.observe() so audit-ledger, devtools, and OTel exporters all see it without depending on clobberLoopPlugin directly.
The resolved companion:
{
  type: 'resolver.clobber.loop.resolved',
  systemId: string,
  fact: string,
  participants: readonly string[],
  durationMs: number,
  resolution: 'no-recurrence-in-window'
    | 'participant-disabled'
    | 'predicate-narrowed',
}
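One way a dashboard could consume the detected/resolved pair is to keep a map of currently active loops. The sketch below is an assumption-laden illustration: `applyLoopEvent` and `loopKey` are hypothetical helpers, the event types are reduced copies of the shapes above containing only the fields the sketch reads, and how the events are delivered to the handler is left open.

```typescript
// Reduced copies of the event shapes above; only the fields this sketch reads.
type LoopDetectedSketch = {
  type: 'resolver.clobber.loop.detected';
  systemId: string;
  fact: string;
  participants: readonly string[];
  severity: 'warn' | 'error';
  lastAt: number;
};

type LoopResolvedSketch = {
  type: 'resolver.clobber.loop.resolved';
  systemId: string;
  fact: string;
  participants: readonly string[];
  durationMs: number;
};

type LoopEventSketch = LoopDetectedSketch | LoopResolvedSketch;

// Hypothetical key: one entry per (systemId, fact, participantSet).
function loopKey(e: LoopEventSketch): string {
  return [e.systemId, e.fact, e.participants.join(',')].join('|');
}

// Detected events open (or refresh) an entry; resolved events close it.
function applyLoopEvent(
  active: Map<string, LoopDetectedSketch>,
  e: LoopEventSketch,
): void {
  if (e.type === 'resolver.clobber.loop.detected') {
    active.set(loopKey(e), e);
  } else {
    active.delete(loopKey(e));
  }
}
```

Because `participants` is documented as sorted and unique, joining it yields a stable key without extra normalization.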
The PredicateOverlapProof
This is the killer feature — when both fighting resolvers' constraints use data-form when: predicates, the plugin tells you the exact clauses that co-fire so the suggested fix isn't a guess.
type PredicateOverlapProof =
  | {
      verdict: 'matched';
      coFireClauses: LeafClause[];
      conflictingClauses: never[];
    }
  | {
      verdict: 'overlap';
      coFireClauses: LeafClause[];
      conflictingClauses: LeafClause[];
    }
  | {
      verdict: 'indeterminate';
      reason: 'non-comparable-operator';
      coFireClauses: LeafClause[];
    }
  | {
      verdict: 'function-form-opaque';
      reason: 'one-or-both-when-is-a-function';
      whenSourceHashes?: string[];
    };
| Verdict | Meaning | Warning text disclaims? |
|---|---|---|
| matched | Both predicates have identical structural clauses. The strongest verdict: the rules are syntactic duplicates. | No |
| overlap | Clauses share at least one path and at least one pairwise comparison says they co-fire, with no direct contradictions. | No |
| indeterminate | A non-comparable operator ($regex, $elemMatch, $matches) appeared. Cannot prove overlap structurally. | Yes ("cannot prove overlap — predicate uses non-comparable operator") |
| function-form-opaque | At least one constraint uses a function-form when:. Structural comparison is impossible. If audit-ledger is mounted, includes hashes of the function sources for cross-version diffing. | Yes ("cannot prove overlap — at least one constraint uses function-form when:") |
The proof builder uses the same flattenPredicate + clause-comparison machinery directive doctor uses to find contradictions before runtime — so a loop the detector flags at runtime is the same shape doctor could have flagged at design time.
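A handler that consumes the proof can narrow on `verdict` before reading clause lists. The following is a sketch under assumptions: the `LeafClause` shape (`path`, `op`, optional `value`) is not spelled out in this page and is invented here for illustration, and `summarizeProof` is a hypothetical helper rather than part of the plugin.

```typescript
// Assumed leaf shape; the real LeafClause type may differ.
type LeafClause = { path: string; op: string; value?: unknown };

type PredicateOverlapProof =
  | { verdict: 'matched'; coFireClauses: LeafClause[]; conflictingClauses: never[] }
  | { verdict: 'overlap'; coFireClauses: LeafClause[]; conflictingClauses: LeafClause[] }
  | { verdict: 'indeterminate'; reason: 'non-comparable-operator'; coFireClauses: LeafClause[] }
  | {
      verdict: 'function-form-opaque';
      reason: 'one-or-both-when-is-a-function';
      whenSourceHashes?: string[];
    };

interface ProofSummary {
  proven: boolean; // true only when overlap was shown structurally
  clauses: string[]; // rendered co-fire clauses, if the verdict carries any
}

function summarizeProof(proof: PredicateOverlapProof): ProofSummary {
  // Function-form predicates carry no clause list at all.
  if (proof.verdict === 'function-form-opaque') {
    return { proven: false, clauses: [] };
  }
  // The remaining three variants all expose coFireClauses.
  const clauses = proof.coFireClauses.map((c) =>
    c.value === undefined ? `${c.path} ${c.op}` : `${c.path} ${c.op} ${String(c.value)}`,
  );
  const proven = proof.verdict === 'matched' || proof.verdict === 'overlap';
  return { proven, clauses };
}
```

Treating only `matched` and `overlap` as proven mirrors the table: the other two verdicts are the ones whose warning text carries a disclaimer.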
Default warning text from console.warn:
[directive] CLOBBER LOOP on `cart.discount` (5 clobbers in 482ms)
Participants: applyCoupon, applyLoyaltyDiscount
Predicate overlap: matched
Both `when:` predicates fire when:
- user.loyaltyTier >= 2
- coupon.code exists
Suggested fix: add `priority:` to one, narrow `when:` to disjoint conditions, or merge into a single resolver.
Reason-aware shouldRetry integration
v1.23.0 also widened RetryPolicy.shouldRetry with an optional third argument — ShouldRetryContext — so a retry policy can decide based on WHY the attempt failed. The motivating case pairs with this detector: "retry on clobber, fail loud on real bugs."
resolvers: {
  applyDiscount: {
    requirement: 'APPLY_DISCOUNT',
    retry: {
      attempts: 5,
      backoff: 'exponential',
      shouldRetry: (err, attempt, ctx) => {
        if (ctx?.reason === 'clobbered') return attempt < 5;
        if (ctx?.reason === 'timeout') return attempt < 2;
        return false; // 'error' / 'cancelled' → no retry
      },
    },
    resolve: async (req, ctx) => { /* ... */ },
  },
}
The ShouldRetryContext carries:
- reason: 'clobbered' | 'timeout' | 'cancelled' | 'error'
- clobber?: { fact, expected, actual }, populated when reason === 'clobbered'
Two-argument shouldRetry(err, attempt) callers continue to work unchanged — the third argument is additive. Before this change, a clobber-induced abort never reached shouldRetry at all; the controller's aborted signal short-circuited the retry path silently. Now policies can opt into bounded retries on contention while still failing loud on real errors.
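The same policy can be written as a standalone, table-driven function, which makes the per-reason budgets easy to test in isolation. This is a sketch under assumptions: `ShouldRetryContextSketch` is a local mirror of the context described above (the real type ships with the library), and `retryBudget` and `shouldRetrySketch` are hypothetical names.

```typescript
// Local mirror of the context described above, for illustration only.
type RetryReason = 'clobbered' | 'timeout' | 'cancelled' | 'error';

interface ShouldRetryContextSketch {
  reason: RetryReason;
  clobber?: { fact: string; expected: unknown; actual: unknown };
}

// Hypothetical attempt budgets per failure reason; 0 means "never retry".
const retryBudget: Record<RetryReason, number> = {
  clobbered: 5,
  timeout: 2,
  cancelled: 0,
  error: 0,
};

function shouldRetrySketch(
  _err: unknown,
  attempt: number,
  ctx?: ShouldRetryContextSketch,
): boolean {
  // No context means the failure reason is unknown, so fail loud.
  if (!ctx) return false;
  return attempt < retryBudget[ctx.reason];
}
```

Keeping the budgets in one record means adding a new reason to `RetryReason` fails type-checking until a budget is chosen for it.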
PII safety
Redaction happens at event-construction time, not at message-format time. The plugin walks the participants' whenSpec through the same redactWhenSpec utility the audit-ledger uses, against system.meta.byTag("pii"), BEFORE the PredicateOverlapProof is attached to the emitted event. Downstream sinks (audit-ledger, devtools, third-party onLoop handlers) all receive the redacted form by default — there is no "raw vs formatted" split where a sink could accidentally log the unredacted operands.
capturePII: true is the explicit opt-out from redaction. It mirrors the audit-ledger contract: set it only when the deployment has a data-processing addendum that permits unredacted operand capture.
The slim per-rejection buffer entries ({ timestamp, requirementId, resolverId, seq }) deliberately don't carry expected / actual payloads at all — the audit ledger already keeps the forensic payload, so PII doesn't spread through the detector's own buffers.
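The redact-at-construction idea can be illustrated with a small pure function. This is not the library's `redactWhenSpec`; it is a sketch under assumptions, where `ClauseSketch` is an invented flat clause shape, `redactClauses` is a hypothetical helper, and `piiPaths` stands in for whatever `system.meta.byTag("pii")` returns.

```typescript
// Invented flat clause shape for illustration.
type ClauseSketch = { path: string; op: string; value: unknown };

// Returns new clause objects; operands at PII-tagged paths become "[redacted]"
// unless capturePII is explicitly true. The input array is never mutated.
function redactClauses(
  clauses: readonly ClauseSketch[],
  piiPaths: ReadonlySet<string>,
  capturePII = false,
): ClauseSketch[] {
  return clauses.map((c) =>
    !capturePII && piiPaths.has(c.path) ? { ...c, value: '[redacted]' } : { ...c },
  );
}
```

Because the function returns copies before anything is attached to an event, every downstream consumer sees the same redacted form and no sink holds a reference to the raw operands.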
Audit-ledger integration
When createAuditLedger is mounted alongside clobberLoopPlugin, both new event variants are captured as ledger entries:
import { createSystem } from '@directive-run/core';
import { createAuditLedger, clobberLoopPlugin, memorySink } from '@directive-run/core/plugins';

const ledger = createAuditLedger({ sink: memorySink() });
const detector = clobberLoopPlugin({ threshold: 5 });

const system = createSystem({
  module: myModule,
  plugins: [detector.plugin, ledger.plugin],
});

// Later: audit query for the loop
const loopEntries = ledger
  .recent(1000)
  .filter((e) => e.kind === 'resolver.clobber.loop.detected');
Each resolver.clobber.loop.detected audit entry includes rejectionSeqs — the sequence numbers of the contributing resolver.write.rejected entries — so an auditor reading a loop entry can walk to every individual rejection. The proof's overlapVerdict is surfaced as a tag (the predicate clauses are PII-redacted upstream).
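Walking from a loop entry to its contributing rejections could look like the sketch below. The entry shape is an assumption: `kind` and `rejectionSeqs` appear in the text above, but a top-level `seq` field on ledger entries is a guess made for illustration, and `rejectionsForLoop` is a hypothetical helper operating on a plain array such as the result of `ledger.recent(...)`.

```typescript
// Assumed ledger entry shape; `seq` as a top-level field is an assumption.
interface LedgerEntrySketch {
  kind: string;
  seq: number;
  rejectionSeqs?: readonly number[];
}

// Returns the rejection entries a loop entry cross-references, in ledger order.
function rejectionsForLoop(
  entries: readonly LedgerEntrySketch[],
  loopEntry: LedgerEntrySketch,
): LedgerEntrySketch[] {
  const wanted = new Set<number>(loopEntry.rejectionSeqs ?? []);
  return entries.filter(
    (e) => e.kind === 'resolver.write.rejected' && wanted.has(e.seq),
  );
}
```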
Runtime kill-switch
The plugin's return handle exposes disable(), enable(), and isEnabled() so an SRE can flip the detector off during incident response without redeploying. The buffer state survives across toggle — enable() resumes cleanly without a warm-up delay.
const detector = clobberLoopPlugin({ threshold: 5 });
// At incident time:
detector.disable();
// ...investigate / mitigate...
detector.enable();
detector.isEnabled(); // true
disable() stops emission. Inbound resolver.write.rejected events still update the ring buffer so a re-enable doesn't lose the picture of what happened during the freeze.
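The split between "keep buffering" and "stop emitting" can be shown with a toy handle. This is a sketch, not the plugin's implementation: `createSwitchSketch`, `ingest`, and `buffered` are hypothetical names, and the buffer here is an unbounded array rather than a windowed ring buffer.

```typescript
// Toy handle: ingestion always updates the buffer, emission is gated by the switch.
function createSwitchSketch<T>(emit: (item: T) => void) {
  let enabled = true;
  const buffer: T[] = [];
  return {
    ingest(item: T): void {
      buffer.push(item); // state keeps updating even while disabled
      if (enabled) emit(item);
    },
    disable(): void {
      enabled = false;
    },
    enable(): void {
      enabled = true;
    },
    isEnabled(): boolean {
      return enabled;
    },
    buffered(): number {
      return buffer.length;
    },
  };
}
```

After `disable()`, calls to `ingest` still grow the buffer but produce no emissions, so a later `enable()` resumes with full context.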
Cap behavior
| Cap | When hit | Behavior |
|---|---|---|
| maxTrackedFacts | 257th distinct fact churned | LRU eviction (not FIFO): the legitimate hot fact stays resident even if a hot-then-cold attacker churns the map. |
| maxParticipantsPerFact | 17th distinct resolver on one fact | Detailed participant tracking pauses for that fact until the cooldown expires. One "N-way contention" event still fires. |
| maxEmissionsPerSec | 11th loop detected in same second across all facts | The 11th-through-Nth detections increment suppressedSinceLastEmit. The NEXT event's suppressedSinceLastEmit field reports how many were dropped. |
The buffer memory bound is small: 32 entries × 256 facts × ~80 bytes ≈ 655 KB, under 1 MB worst case.
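LRU eviction of tracked facts can be sketched with a plain `Map`, relying on the fact that JavaScript maps iterate in insertion order. This is an illustration under assumptions, not the plugin's data structure: `LruFactMap` and `touch` are hypothetical names. The last line restates the worst-case arithmetic using the figures quoted above.

```typescript
// Minimal LRU keyed by fact name. Re-inserting a key on access moves it to the
// "most recently used" end, so the first key in iteration order is the coldest.
class LruFactMap<V> {
  private entries = new Map<string, V>();
  private capacity: number;

  constructor(capacity: number) {
    this.capacity = capacity;
  }

  /** Get or create the value for a fact and mark it most recently used. */
  touch(fact: string, make: () => V): V {
    const value = this.entries.has(fact) ? (this.entries.get(fact) as V) : make();
    this.entries.delete(fact);
    this.entries.set(fact, value);
    if (this.entries.size > this.capacity) {
      const coldest = this.entries.keys().next().value;
      if (coldest !== undefined) this.entries.delete(coldest);
    }
    return value;
  }

  has(fact: string): boolean {
    return this.entries.has(fact);
  }
}

// Worst-case buffer size from the figures above: 32 × 256 × 80 bytes.
const worstCaseBytes = 32 * 256 * 80;
```

With a FIFO policy, a hot fact inserted early would be evicted as soon as enough new facts arrived; re-inserting on every touch is what keeps it resident.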
Mounting alongside clobberAlertPlugin
Both plugins subscribe to the same resolver.write.rejected event stream and operate independently. Order doesn't matter; both will fire on the same rejection if their filters match.
const system = createSystem({
  module: myModule,
  plugins: [
    clobberAlertPlugin({
      irreversibleTags: ['money', 'pii'],
      onAlert: pagerduty.trigger,
    }),
    clobberLoopPlugin({
      threshold: 5,
      onLoop: slack.postIncident,
    }).plugin,
  ],
});
Pairing them is the recommended production posture: instant pages on irreversible-tagged clobbers via clobberAlertPlugin, slower-fuse rule-design signal via clobberLoopPlugin.

