Every query passes through a 5-tier routing system, fastest to slowest. Each tier acts as a short-circuit: if it can answer, it returns immediately and skips all slower tiers. Every result is cached so subsequent queries drop to Tier 0 or 1.
| Tier | Name | Latency | LLM Calls |
|------|------|---------|-----------|
| 0 | Exact cache | ~0ms | 0 |
| 1 | Fuzzy cache | ~50ms | 0 |
| 2 | Direct search | ~100–200ms | 0 |
| 3 | Single LLM | <5s | 1 |
| 4 | Agentic loop | 8–15s | Multiple |

Tier 0: Exact Cache Hit (0ms)

The fastest path. A normalized version of your query is looked up in an in-memory map.

Algorithm

  1. Normalize query: lowercase, trim, collapse whitespace
  2. Build cache key: normalized_query
  3. Map.get(key) — O(1) lookup
  4. Validate TTL: (now - storedAt) < 60s
  5. Validate fingerprint: MD5 hash of sorted path:mtime pairs must match current context tree state
  6. HIT — return cached response immediately
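
The steps above can be sketched in TypeScript; the entry shape and names are assumptions for illustration, not the actual implementation:

```typescript
// Minimal sketch of the Tier 0 lookup (entry shape and names are
// assumptions, not the actual implementation).
interface CacheEntry {
  response: string;
  storedAt: number;    // epoch ms at store time
  fingerprint: string; // context-tree fingerprint at store time
}

const TTL_MS = 60_000; // 60s TTL

function normalize(query: string): string {
  return query.toLowerCase().trim().replace(/\s+/g, " ");
}

function exactLookup(
  cache: Map<string, CacheEntry>,
  query: string,
  currentFingerprint: string,
  now = Date.now(),
): string | null {
  const entry = cache.get(normalize(query)); // O(1)
  if (!entry) return null;
  if (now - entry.storedAt >= TTL_MS) return null;           // TTL expired
  if (entry.fingerprint !== currentFingerprint) return null; // tree changed
  return entry.response; // HIT
}
```

Note that normalization makes "  How   does AUTH work " and "how does auth work" hit the same key.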

Context Tree Fingerprint

The cache uses a fingerprint of the context tree to detect changes:
  • Glob all .md files in .brv/context-tree/
  • Sort by path, concat path:mtime joined by |, MD5 hash, take first 16 hex chars
  • The fingerprint itself is cached for 30s to avoid repeated glob I/O
  • If any file is added, removed, or modified, the fingerprint changes and all cache entries are invalidated
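
The hashing step can be sketched as follows; the function takes pre-collected path/mtime pairs so the logic is shown without the glob I/O:

```typescript
import { createHash } from "node:crypto";

// Sketch of the fingerprint computation described above. The caller is
// assumed to have globbed .brv/context-tree/ and collected (path, mtime).
function fingerprint(files: Array<{ path: string; mtimeMs: number }>): string {
  const joined = files
    .slice()
    .sort((a, b) => a.path.localeCompare(b.path)) // sort by path
    .map((f) => `${f.path}:${f.mtimeMs}`)         // path:mtime pairs
    .join("|");                                   // joined by |
  return createHash("md5").update(joined).digest("hex").slice(0, 16);
}
```

Because input order is normalized by the sort, only an actual add, remove, or mtime change alters the fingerprint.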

Parameters

| Parameter | Value |
|-----------|-------|
| Max cache size | 50 entries |
| TTL | 60 seconds |
| Fingerprint cache TTL | 30 seconds |
| Eviction policy | LRU (oldest insertion order) |
Cost: O(1) map lookup. Zero LLM, zero search, zero I/O.
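
A sketch of the eviction policy above, using a Map's insertion-order iteration (a common way to approximate LRU; illustrative, not the actual implementation):

```typescript
const MAX_ENTRIES = 50;

// Insert into the cache, evicting the oldest entry once the cap is hit.
// A Map iterates keys in insertion order, so the first key is the oldest.
function put<K, V>(cache: Map<K, V>, key: K, value: V): void {
  if (cache.has(key)) cache.delete(key); // re-insert to refresh position
  cache.set(key, value);
  if (cache.size > MAX_ENTRIES) {
    const oldest = cache.keys().next().value as K;
    cache.delete(oldest);
  }
}
```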

Tier 1: Fuzzy Cache Match (~50ms)

When the exact lookup misses, the cache scans all entries for lexical overlap, computing Jaccard similarity on the tokenized queries.

Algorithm

  1. Tokenize query: split on whitespace, remove stopwords, filter tokens shorter than 2 chars
  2. Skip fuzzy matching if query has fewer than 2 meaningful tokens
  3. For each cache entry:
    • Skip if fingerprint mismatch (cheap string compare)
    • Skip if TTL expired (cheap timestamp compare)
    • Compute Jaccard similarity: |A ∩ B| / |A ∪ B|
  4. Return the highest-similarity match if similarity >= 0.6

Jaccard Similarity

Optimization: always iterate the smaller set, check membership in the larger set.
intersection = count of shared tokens
union = |A| + |B| - intersection
similarity = intersection / union
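
The optimized comparison can be sketched as:

```typescript
// Jaccard similarity over token sets, iterating the smaller set and
// checking membership in the larger, as the optimization above describes.
function jaccard(a: Set<string>, b: Set<string>): number {
  const [small, large] = a.size <= b.size ? [a, b] : [b, a];
  let intersection = 0;
  for (const token of small) if (large.has(token)) intersection++;
  const union = a.size + b.size - intersection;
  return union === 0 ? 0 : intersection / union;
}
```

For example, {jwt, refresh, auth} vs {jwt, auth, module} shares 2 of 4 distinct tokens, giving 0.5 — below the 0.6 threshold, so it would not match.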

Parameters

| Parameter | Value |
|-----------|-------|
| Similarity threshold | 0.6 |
| Min tokens for fuzzy | 2 |
Cost: O(n) over cache entries x O(min(|A|, |B|)) per comparison. No LLM, no I/O.

Out-of-Domain Short-Circuit

Between Tier 1 and Tier 2, if all searches return zero results, the query is classified as out-of-domain:
  • Returns: “This topic is not covered in the knowledge base.”
  • The OOD response is cached to prevent repeated misses from hitting the LLM
Cost: Zero. Saves an entire LLM call for irrelevant queries.

Entity Search

When the initial local search returns fewer than 3 results, the executor extracts key entities and runs additional searches to improve recall.

Algorithm

  1. Split query into words, filter stopwords, keep words with length >= 3
  2. Take top 3 entities
  3. Run searches in parallel via Promise.allSettled()
  4. Deduplicate results by path
  5. Merge into original result set
For example, the query “How does JWT refresh work in the auth module?” extracts the entities ["jwt", "refresh", "auth"] and runs 3 parallel MiniSearch lookups, merging any new unique results into the original set.

Cost: Up to 3 additional MiniSearch lookups (in parallel). No LLM.
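
The extraction-and-merge steps can be sketched as follows; the MiniSearch lookup is injected as a function, and the stopword list is illustrative:

```typescript
// Illustrative stopword list — the real list is assumed to be larger.
const STOPWORDS = new Set(["the", "how", "does", "in", "a", "an", "of", "is", "work"]);

type Result = { path: string; score: number };

// Step 1–2: split, filter stopwords and short words, take top 3 entities.
function extractEntities(query: string, max = 3): string[] {
  return query
    .toLowerCase()
    .split(/\W+/)
    .filter((w) => w.length >= 3 && !STOPWORDS.has(w))
    .slice(0, max);
}

// Steps 3–5: parallel searches, dedupe by path, merge into original set.
async function supplementarySearch(
  query: string,
  initial: Result[],
  search: (term: string) => Promise<Result[]>,
): Promise<Result[]> {
  if (initial.length >= 3) return initial; // enough recall already
  const settled = await Promise.allSettled(extractEntities(query).map(search));
  const seen = new Set(initial.map((r) => r.path));
  const merged = [...initial];
  for (const s of settled) {
    if (s.status !== "fulfilled") continue;
    for (const r of s.value) {
      if (!seen.has(r.path)) { seen.add(r.path); merged.push(r); }
    }
  }
  return merged;
}
```

Promise.allSettled (rather than Promise.all) keeps one failed lookup from discarding the others.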

Tier 2: Direct Search Response (~100-200ms)

When search results are highly confident, the system skips the LLM entirely and returns a formatted markdown response assembled from raw document content.

Algorithm

  1. Filter search results with score >= 0.7, take top 5
  2. Read full document content from .brv/context-tree/ in parallel
  3. Run canRespondDirectly() decision:
    • Gate 1: topResult.score >= 0.85 (minimum threshold). If not, fail.
    • Gate 2a: topResult.score >= 0.93 — score is so strong that dominance check is skipped. Pass.
    • Gate 2b: gap = topScore - secondScore >= 0.08 — clear separation between top and runner-up. Pass.
  4. Format response: Summary + Details (max 5000 chars/doc, max 5 docs) + Sources + Gaps
  5. Cache result and return
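
The gate sequence in step 3 can be expressed as a predicate over the sorted result scores (constant names are illustrative):

```typescript
const MIN_SCORE = 0.85;       // Gate 1: minimum threshold
const HIGH_CONFIDENCE = 0.93; // Gate 2a: skip dominance check
const MIN_GAP = 0.08;         // Gate 2b: top must clearly beat runner-up

// scores: result scores sorted descending.
function canRespondDirectly(scores: number[]): boolean {
  if (scores.length === 0) return false;
  const top = scores[0];
  const second = scores.length > 1 ? scores[1] : 0;
  if (top < MIN_SCORE) return false;       // Gate 1 fails
  if (top >= HIGH_CONFIDENCE) return true; // Gate 2a passes
  return top - second >= MIN_GAP;          // Gate 2b: dominance gap
}
```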

Why Gap-Based, Not Ratio-Based?

BM25 normalized scores cluster in the [0.8, 0.95] range. A ratio check like “2x the second result” is mathematically impossible in that range. A fixed gap of 0.08 correctly identifies dominant matches.

Parameters

| Parameter | Value |
|-----------|-------|
| Min score threshold | 0.85 |
| High confidence (skip dominance) | 0.93 |
| Min gap for dominance | 0.08 |
| Max content per doc | 5000 chars |
| Max docs in response | 5 |
Cost: File reads only (no LLM). ~100-200ms for disk I/O.

Tier 3: Optimized Single LLM Call (<5s)

When search found good results (score >= 0.7) but not confident enough for Tier 2, the system makes a single constrained LLM call with pre-fetched context embedded in the prompt.

Algorithm

  1. Filter results with score >= 0.7, build a pre-fetched context string (excerpts formatted as markdown sections)
  2. Inject search data into sandbox as variables:
    • __query_results_{taskId} = search results array
    • __query_meta_{taskId} = {resultCount, topScore, hasPreFetched}
    • Build prompt with pre-fetched context embedded directly
  3. Execute with tight LLM overrides: maxTokens=1024, temperature=0.3
  4. LLM answers from the embedded context. If insufficient, it can use code_exec with silent: true to read additional documents from the sandbox variables
  5. Cache result and return
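
An illustrative sketch of step 2; the sandbox's set API is an assumption, and only the variable-naming scheme comes from the text above:

```typescript
// Inject search data into the sandbox under the per-task variable names
// described in step 2. The sandbox interface here is hypothetical.
function injectQueryContext(
  sandbox: { set(name: string, value: unknown): void },
  taskId: string,
  results: Array<{ path: string; score: number; excerpt: string }>,
): void {
  sandbox.set(`__query_results_${taskId}`, results);
  sandbox.set(`__query_meta_${taskId}`, {
    resultCount: results.length,
    topScore: results[0]?.score ?? 0,
    hasPreFetched: results.length > 0,
  });
}
```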

Parameters

| Parameter | Value |
|-----------|-------|
| Max tokens | 1024 |
| Temperature | 0.3 |
| Max iterations | 50 |
| Pre-fetch score threshold | 0.7 |
| Max pre-fetched docs | 5 |
Cost: One LLM call with tight token and temperature constraints.

Tier 4: Full Agentic Loop (8-15s)

When no pre-fetched context is available (search returned nothing above the 0.7 threshold), the system falls back to the full agentic loop.

Algorithm

  1. Same sandbox variable injection as Tier 3
  2. Build prompt WITHOUT pre-fetched context — LLM must discover answers via tool use
  3. Execute with relaxed LLM overrides: maxTokens=2048, temperature=0.5
  4. LLM reads search results via code_exec, may call tools.readFile() to load documents, may loop through multiple tool calls
  5. Protected by doom-loop detection (max iterations limit)
  6. Cache result and return

Parameters

| Parameter | Value |
|-----------|-------|
| Max tokens | 2048 |
| Temperature | 0.5 |
| Max iterations | 50 |
Cost: Multiple LLM calls with tool use. Full agentic loop with loop detection.

Knowledge Scoring

Search results that feed into Tiers 2–4 are ranked by a compound scoring algorithm:
compoundScore = (W_RELEVANCE x BM25 + W_IMPORTANCE x importance/100 + W_RECENCY x recency) x tier_boost
Currently, only BM25 relevance is active. Importance and recency weights are reserved for future use:
| Signal | Weight | Source |
|--------|--------|--------|
| BM25 relevance | 100% (1.0) | MiniSearch full-text search |
| Importance | 0% (disabled) | Access hits (+3/search) + curate updates (+5/update) |
| Recency | 0% (disabled) | Exponential decay: e^(-days/30) |
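
With the weights above, the compound score currently reduces to the BM25 term; a sketch of the full formula so the disabled signals are visible:

```typescript
// Weights from the table above: only BM25 relevance is active today.
const W_RELEVANCE = 1.0;
const W_IMPORTANCE = 0.0; // reserved
const W_RECENCY = 0.0;    // reserved

function compoundScore(
  bm25: number,
  importance: number, // 0–100
  daysSinceAccess: number,
  tierBoost = 1.0,    // x1.00 for all maturities today
): number {
  const recency = Math.exp(-daysSinceAccess / 30);
  return (
    (W_RELEVANCE * bm25 +
      W_IMPORTANCE * (importance / 100) +
      W_RECENCY * recency) * tierBoost
  );
}
```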

Tier Boost Multipliers

All tiers currently use the same boost (reserved for future differentiation):
| Maturity | Boost |
|----------|-------|
| core | x1.00 |
| validated | x1.00 |
| draft | x1.00 |

Maturity Lifecycle (Hysteresis)

draft --(importance >= 65)--> validated --(importance >= 85)--> core
draft <--(importance < 35)-- validated <--(importance < 60)-- core
The hysteresis gap (e.g., promote at 65, demote at 35) prevents rapid oscillation between tiers.
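
The transitions can be sketched as a state step function, one evaluation per call, using the thresholds from the diagram above:

```typescript
type Maturity = "draft" | "validated" | "core";

// One hysteresis step: promote/demote at the asymmetric thresholds.
function step(current: Maturity, importance: number): Maturity {
  switch (current) {
    case "draft":
      return importance >= 65 ? "validated" : "draft";
    case "validated":
      if (importance >= 85) return "core";
      if (importance < 35) return "draft";
      return "validated";
    case "core":
      return importance < 60 ? "validated" : "core";
  }
}
```

Because the demotion threshold (35) sits well below the promotion threshold (65), an entry hovering near 65 stays validated instead of flapping.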

Decay Functions

  • Importance decay: importance x 0.995^days (~78% remaining after 50 days of non-use)
  • Recency decay: e^(-days/30) (half-life of ~21 days)
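
Both curves as one-liners, confirming the figures quoted above:

```typescript
// Importance decays multiplicatively per day of non-use.
const importanceAfter = (importance: number, days: number) =>
  importance * Math.pow(0.995, days);

// Recency decays exponentially with a ~21-day half-life (30 * ln 2 ≈ 20.8).
const recency = (days: number) => Math.exp(-days / 30);
```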

Complete Query Flow

User Query
    |
    |--- Fire parallel searches (local)
    |
    v
[Tier 0] Exact Cache Lookup
    |--- HIT --> Return (0ms)
    |
   MISS
    v
[Tier 1] Fuzzy Cache (Jaccard >= 0.6)
    |--- HIT --> Return (~50ms)
    |
   MISS
    v
[OOD] All searches returned 0 results?
    |--- YES --> "Not covered" response (0ms, cached)
    |
    NO
    v
[Entity Search] Initial results < 3?
    |--- YES --> Run supplementary entity searches (parallel)
    |
    v
[Tier 2] Direct Response (score >= 0.85 + dominant)
    |--- PASS --> Return formatted markdown (100-200ms)
    |
   NOT DOMINANT
    v
[Tier 3] Single LLM + Pre-fetched Context (score >= 0.7 exists)
    |--- PASS --> Return LLM response (<5s)
    |
   NO CONTEXT
    v
[Tier 4] Full Agentic Loop
    |--- Return LLM response (8-15s)
All tier results are cached, so subsequent similar queries resolve at Tier 0 or 1.