

Semantic Cache

Cache agent responses by semantic similarity so equivalent questions hit cache instead of calling the LLM.

The semantic cache matches similar inputs using embeddings, supports pluggable storage backends, and uses approximate nearest neighbor (ANN) indexes for fast lookups at scale.


Quick Start

import {
  createAgentOrchestrator,
  createSemanticCache,
  createBruteForceIndex,
  createInMemoryStorage,
  createSemanticCacheGuardrail,
} from '@directive-run/ai';

const cache = createSemanticCache({
  embedder: async (texts) => {
    // Your embedding function – returns number[][] of vectors
    const response = await openai.embeddings.create({
      model: 'text-embedding-3-small',
      input: texts,
    });

    return response.data.map((d) => d.embedding);
  },
  storage: createInMemoryStorage(),
  similarityThreshold: 0.85,  // Cosine similarity threshold (0–1)
  ttlMs: 3600000,             // Cache entries expire after 1 hour
  onHit: (entry, similarity) => console.log('Cache hit:', entry.query, similarity),
  onMiss: (query) => console.log('Cache miss:', query),
});

// Use as a guardrail – short-circuits the agent call on cache hit
const guardrail = createSemanticCacheGuardrail({ cache });

const orchestrator = createAgentOrchestrator({
  runner,
  guardrails: {
    input: [guardrail],
  },
});
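Under the hood, a cache hit is a nearest-neighbor check: the query is embedded, compared against stored entry vectors by cosine similarity, and the best match is returned only if it clears the threshold. A minimal sketch of that comparison (the `Entry` shape and `lookup` helper are illustrative, not the library's internals):

```typescript
// Illustrative sketch of the lookup a semantic cache performs per query.
type Entry = { query: string; response: string; vector: number[] };

function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
}

function lookup(queryVector: number[], entries: Entry[], threshold: number): Entry | null {
  let best: Entry | null = null;
  let bestScore = -Infinity;
  for (const entry of entries) {
    const score = cosineSimilarity(queryVector, entry.vector);
    if (score > bestScore) {
      bestScore = score;
      best = entry;
    }
  }
  // Below the threshold, treat even the best match as a miss.
  return bestScore >= threshold ? best : null;
}
```

This is why `similarityThreshold` is the key tuning knob: too low and unrelated questions collide, too high and paraphrases miss the cache.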

Configuration

| Option | Type | Default | Description |
| --- | --- | --- | --- |
| `embedder` | `EmbedderFn` | required | `(texts: string[]) => Promise<number[][]>` |
| `similarityThreshold` | `number` | `0.9` | Cosine similarity threshold (0–1) |
| `maxCacheSize` | `number` | `1000` | Maximum number of entries to cache |
| `ttlMs` | `number` | `3600000` | Cache entry TTL (ms) |
| `namespace` | `string` | – | Cache namespace for multi-tenant scenarios |
| `storage` | `SemanticCacheStorage` | in-memory | Custom storage backend (defaults to in-memory) |
| `perAgent` | `boolean` | – | Include agent name in cache key |
| `onHit` | `(entry, similarity) => void` | – | Cache hit callback |
| `onMiss` | `(query) => void` | – | Cache miss callback |
| `onError` | `(error) => void` | – | Cache lookup error callback |

ANN Indexes

Brute Force

Exact search – compares against every entry. Best for small datasets (<10K entries):

import { createBruteForceIndex } from '@directive-run/ai';

const index = createBruteForceIndex();

VP-Tree

Vantage-Point Tree for larger datasets. Reduces search from O(n) to O(log n) average:

import { createVPTreeIndex } from '@directive-run/ai';

const index = createVPTreeIndex();

Batched Embedding

Batch concurrent embedding calls to reduce API round-trips:

import { createBatchedEmbedder } from '@directive-run/ai';

const batchedEmbedder = createBatchedEmbedder(
  async (texts) => {
    const response = await openai.embeddings.create({
      model: 'text-embedding-3-small',
      input: texts,
    });

    return response.data.map((d) => d.embedding);
  },
  {
    maxBatchSize: 100,
    maxWaitMs: 50,
  }
);

Concurrent batchedEmbedder() calls within the maxWaitMs window are collected into a single batch call to the underlying embedder.
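The collect-then-flush behavior can be sketched as a micro-batcher: single-text calls queue up, and a timer (or a full batch) triggers one call to the underlying embedder. The `batchEmbedder` helper below is an illustrative sketch assuming only the `EmbedderFn` contract from this page, not the library's implementation:

```typescript
type EmbedderFn = (texts: string[]) => Promise<number[][]>;

// Collects concurrent single-text requests and embeds them in one batch call.
function batchEmbedder(
  embed: EmbedderFn,
  maxWaitMs: number,
  maxBatchSize: number,
): (text: string) => Promise<number[]> {
  let pending: { text: string; resolve: (v: number[]) => void; reject: (e: unknown) => void }[] = [];
  let timer: ReturnType<typeof setTimeout> | null = null;

  const flush = async () => {
    const batch = pending;
    pending = [];
    if (timer) clearTimeout(timer);
    timer = null;
    try {
      // One round-trip for the whole batch; fan results back out by position.
      const vectors = await embed(batch.map((b) => b.text));
      batch.forEach((b, i) => b.resolve(vectors[i]));
    } catch (err) {
      batch.forEach((b) => b.reject(err));
    }
  };

  return (text) =>
    new Promise((resolve, reject) => {
      pending.push({ text, resolve, reject });
      if (pending.length >= maxBatchSize) {
        void flush(); // Full batch: flush immediately.
      } else if (!timer) {
        timer = setTimeout(() => void flush(), maxWaitMs); // First call starts the window.
      }
    });
}
```

The trade-off is latency for throughput: `maxWaitMs` bounds the extra delay any single request pays in exchange for fewer API round-trips.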


Storage

In-Memory

import { createInMemoryStorage } from '@directive-run/ai';

const storage = createInMemoryStorage();

Custom Storage

Implement the SemanticCacheStorage interface for persistent backends:

const storage = {
  getEntries: async (namespace: string) => { /* ... */ },
  addEntry: async (namespace: string, entry: CacheEntry) => { /* ... */ },
  updateEntry: async (namespace: string, id: string, updates: Partial<CacheEntry>) => { /* ... */ },
  removeEntry: async (namespace: string, id: string) => { /* ... */ },
  clear: async (namespace: string) => { /* ... */ },
};
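A concrete (if trivial) implementation helps show the contract: the sketch below keeps a `Map` per namespace, so entries from different tenants never mix. The exact `CacheEntry` shape comes from the library; a minimal shape is assumed here for illustration:

```typescript
// Assumed minimal entry shape; the library's CacheEntry may carry more fields.
interface CacheEntry { id: string; query: string; response: string; vector: number[] }

function createMapStorage() {
  const data = new Map<string, Map<string, CacheEntry>>();
  const ns = (namespace: string): Map<string, CacheEntry> => {
    if (!data.has(namespace)) data.set(namespace, new Map());
    return data.get(namespace)!;
  };
  return {
    getEntries: async (namespace: string) => [...ns(namespace).values()],
    addEntry: async (namespace: string, entry: CacheEntry) => {
      ns(namespace).set(entry.id, entry);
    },
    updateEntry: async (namespace: string, id: string, updates: Partial<CacheEntry>) => {
      const existing = ns(namespace).get(id);
      if (existing) ns(namespace).set(id, { ...existing, ...updates });
    },
    removeEntry: async (namespace: string, id: string) => {
      ns(namespace).delete(id);
    },
    clear: async (namespace: string) => {
      ns(namespace).clear();
    },
  };
}
```

A persistent backend (Redis, Postgres, etc.) follows the same shape, swapping the `Map` operations for client calls.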

Cache Guardrail

createSemanticCacheGuardrail returns a guardrail that intercepts agent input and returns the cached response on a hit, bypassing the LLM entirely:

import { createSemanticCacheGuardrail } from '@directive-run/ai';

const guardrail = createSemanticCacheGuardrail({ cache });

// Single-agent
const orchestrator = createAgentOrchestrator({
  runner,
  guardrails: { input: [guardrail] },
});

// Multi-agent — cache guardrail at orchestrator level applies to all agents
const multi = createMultiAgentOrchestrator({
  runner,
  agents: { researcher: { agent: researcher }, writer: { agent: writer } },
  guardrails: { input: [guardrail] },
});

On a cache miss, the agent runs normally and the result is cached for future queries.


Testing

import { createTestEmbedder } from '@directive-run/ai';

// Deterministic embedder for tests – consistent vectors for same input
const testEmbedder = createTestEmbedder();

Embedder Function

The EmbedderFn type:

type EmbedderFn = (texts: string[]) => Promise<number[][]>;

It receives an array of strings and returns an array of embedding vectors (number arrays). The vectors must all have the same dimensionality.
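Any function that satisfies this contract works. As a toy illustration (not suitable for real semantic matching), a deterministic character-histogram embedder that always returns fixed-length vectors:

```typescript
type EmbedderFn = (texts: string[]) => Promise<number[][]>;

// Fixed dimensionality: every returned vector has length DIM.
const DIM = 8;

// Toy embedder: buckets character codes into a small histogram. Deterministic
// for the same input, but only a stand-in for a real embedding model.
const toyEmbedder: EmbedderFn = async (texts) =>
  texts.map((text) => {
    const vector: number[] = new Array(DIM).fill(0);
    for (let i = 0; i < text.length; i++) {
      vector[text.charCodeAt(i) % DIM] += 1;
    }
    return vector;
  });
```

In production, this is where a real model call (like the `text-embedding-3-small` example above) goes; the cache only cares that the contract holds.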


