| Tier | Name | Latency | LLM Calls |
|---|---|---|---|
| 0 | Exact cache | ~0ms | 0 |
| 1 | Fuzzy cache | ~50ms | 0 |
| 2 | Direct search | ~100–200ms | 0 |
| 3 | Single LLM | <5s | 1 |
| 4 | Agentic loop | 8–15s | Multiple |
Tier 0: Exact Cache Hit (0ms)
The fastest path. A normalized version of your query is looked up in an in-memory map.
Algorithm
- Normalize query: lowercase, trim, collapse whitespace
- Build cache key: `normalized_query`
- `Map.get(key)` — O(1) lookup
- Validate TTL: `(now - storedAt) < 60s`
- Validate fingerprint: MD5 hash of sorted `path:mtime` pairs must match current context tree state
- HIT — return cached response immediately
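The lookup above can be sketched as follows. This is a minimal illustration, not the actual implementation; the entry shape and function names are assumptions.

```typescript
// Hypothetical sketch of the Tier 0 exact-cache lookup.
interface CacheEntry {
  response: string;
  storedAt: number;     // epoch ms at insertion
  fingerprint: string;  // context tree fingerprint at insertion
}

const TTL_MS = 60_000; // 60-second TTL from the parameter table
const cache = new Map<string, CacheEntry>();

// Normalize: lowercase, trim, collapse whitespace.
function normalize(query: string): string {
  return query.toLowerCase().trim().replace(/\s+/g, " ");
}

function exactCacheHit(query: string, currentFingerprint: string): string | null {
  const entry = cache.get(normalize(query));                  // O(1) lookup
  if (!entry) return null;
  if (Date.now() - entry.storedAt >= TTL_MS) return null;     // TTL expired
  if (entry.fingerprint !== currentFingerprint) return null;  // context tree changed
  return entry.response;                                      // HIT
}
```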
Context Tree Fingerprint
The cache uses a fingerprint of the context tree to detect changes:
- Glob all `.md` files in `.brv/context-tree/`
- Sort by path, concatenate `path:mtime` pairs joined by `|`, MD5-hash, take the first 16 hex chars
- The fingerprint itself is cached for 30s to avoid repeated glob I/O
- If any file is added, removed, or modified, the fingerprint changes and all cache entries are invalidated
Parameters
| Parameter | Value |
|---|---|
| Max cache size | 50 entries |
| TTL | 60 seconds |
| Fingerprint cache TTL | 30 seconds |
| Eviction policy | LRU (oldest insertion order) |
Tier 1: Fuzzy Cache Match (~50ms)
When the exact match fails, the cache scans all entries for semantic similarity using Jaccard similarity on tokenized queries.
Algorithm
- Tokenize query: split on whitespace, remove stopwords, filter tokens shorter than 2 chars
- Skip fuzzy matching if query has fewer than 2 meaningful tokens
- For each cache entry:
  - Skip if fingerprint mismatch (cheap string compare)
  - Skip if TTL expired (cheap timestamp compare)
  - Compute Jaccard similarity: `|A ∩ B| / |A ∪ B|`
- Return the highest-similarity match if `similarity >= 0.6`
Jaccard Similarity
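A sketch of the measure on token sets; the stopword list and tokenizer details here are illustrative assumptions, not the actual implementation:

```typescript
// Illustrative stopword list (assumption; the real list is not shown here).
const STOPWORDS = new Set(["the", "a", "an", "is", "of", "to", "in", "how"]);

// Tokenize: split on whitespace, drop stopwords, drop tokens shorter than 2 chars.
function tokenize(query: string): Set<string> {
  return new Set(
    query.toLowerCase().split(/\s+/)
      .filter((t) => t.length >= 2 && !STOPWORDS.has(t))
  );
}

// Jaccard similarity: |A ∩ B| / |A ∪ B|, in [0, 1].
function jaccard(a: Set<string>, b: Set<string>): number {
  if (a.size === 0 || b.size === 0) return 0;
  let intersection = 0;
  for (const t of a) if (b.has(t)) intersection++;
  const union = a.size + b.size - intersection;
  return intersection / union;
}
```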
Parameters
| Parameter | Value |
|---|---|
| Similarity threshold | 0.6 |
| Min tokens for fuzzy | 2 |
Out-of-Domain Short-Circuit
Between Tier 1 and Tier 2, if all searches return zero results, the query is classified as out-of-domain:
- Returns: “This topic is not covered in the knowledge base.”
- The OOD response is cached to prevent repeated misses from hitting the LLM
Supplementary Entity Search
When the initial local search returns fewer than 3 results, the executor extracts key entities and runs additional searches to improve recall.
Algorithm
- Split query into words, filter stopwords, keep words with length >= 3
- Take top 3 entities
- Run searches in parallel via `Promise.allSettled()`
- Deduplicate results by path
- Merge into original result set
For example, a query about refreshing JWT auth tokens might yield the entities `["jwt", "refresh", "auth"]`; the executor then runs 3 parallel MiniSearch lookups, merging any new unique results into the original set.
Cost: Up to 3 additional MiniSearch lookups (in parallel). No LLM.
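The supplementary search can be sketched as below. The `search` callback stands in for the real MiniSearch-backed lookup, and the result shape and stopword list are assumptions:

```typescript
type Hit = { path: string; score: number };

// Illustrative stopword list (assumption).
const SEARCH_STOPWORDS = new Set(["how", "the", "to", "do", "a", "an", "i"]);

async function supplementResults(
  query: string,
  initial: Hit[],
  search: (term: string) => Promise<Hit[]>,
): Promise<Hit[]> {
  if (initial.length >= 3) return initial; // enough recall already

  // Extract up to 3 key entities: stopwords removed, words shorter than 3 chars dropped.
  const entities = query.toLowerCase().split(/\s+/)
    .filter((w) => w.length >= 3 && !SEARCH_STOPWORDS.has(w))
    .slice(0, 3);

  // Run the extra lookups in parallel; individual failures are ignored.
  const settled = await Promise.allSettled(entities.map(search));
  const merged = [...initial];
  const seen = new Set(initial.map((h) => h.path));
  for (const r of settled) {
    if (r.status !== "fulfilled") continue;
    for (const hit of r.value) {
      if (!seen.has(hit.path)) { seen.add(hit.path); merged.push(hit); } // dedupe by path
    }
  }
  return merged;
}
```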
Tier 2: Direct Search Response (~100-200ms)
When search results are highly confident, the system skips the LLM entirely and returns a formatted markdown response assembled from raw document content.
Algorithm
- Filter search results with score >= 0.7, take top 5
- Read full document content from `.brv/context-tree/` in parallel
- Run the `canRespondDirectly()` decision:
  - Gate 1: `topResult.score >= 0.85` (minimum threshold). If not, fail.
  - Gate 2a: `topResult.score >= 0.93` — score is so strong that the dominance check is skipped. Pass.
  - Gate 2b: `gap = topScore - secondScore >= 0.08` — clear separation between top and runner-up. Pass.
- Format response: Summary + Details (max 5000 chars/doc, max 5 docs) + Sources + Gaps
- Cache result and return
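The two-gate decision can be sketched as a small function; the thresholds come from the parameter table below, while the function signature is an assumption:

```typescript
// Sketch of the canRespondDirectly() gates, taking scores sorted descending.
function canRespondDirectly(scores: number[]): boolean {
  if (scores.length === 0) return false;
  const top = scores[0];
  const second = scores.length > 1 ? scores[1] : 0;
  if (top < 0.85) return false;  // Gate 1: minimum threshold
  if (top >= 0.93) return true;  // Gate 2a: high confidence, skip dominance check
  return top - second >= 0.08;   // Gate 2b: gap-based dominance
}
```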
Why Gap-Based, Not Ratio-Based?
BM25 normalized scores cluster in the [0.8, 0.95] range. A ratio check like “2x the second result” is mathematically impossible there: even the extreme case yields only 0.95 / 0.8 ≈ 1.19x. A fixed gap of 0.08 correctly identifies dominant matches.
Parameters
| Parameter | Value |
|---|---|
| Min score threshold | 0.85 |
| High confidence (skip dominance) | 0.93 |
| Min gap for dominance | 0.08 |
| Max content per doc | 5000 chars |
| Max docs in response | 5 |
Tier 3: Optimized Single LLM Call (<5s)
When search found good results (score >= 0.7) but not confident enough for Tier 2, the system makes a single constrained LLM call with pre-fetched context embedded in the prompt.
Algorithm
- Filter results with score >= 0.7, build a pre-fetched context string (excerpts formatted as markdown sections)
- Inject search data into sandbox as variables:
  - `__query_results_{taskId}` = search results array
  - `__query_meta_{taskId}` = `{resultCount, topScore, hasPreFetched}`
- Build prompt with pre-fetched context embedded directly
- Execute with tight LLM overrides: `maxTokens=1024`, `temperature=0.3`
- LLM answers from the embedded context. If insufficient, it can use `code_exec` with `silent: true` to read additional documents from the sandbox variables
- Cache result and return
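Building the pre-fetched context string might look like the following sketch; the document shape and markdown layout are assumptions, while the 0.7 threshold, 5-doc cap, and 5000-char cap come from the parameter table:

```typescript
interface Doc { path: string; score: number; content: string }

// Sketch: format qualifying docs as markdown sections for prompt embedding.
function buildPreFetchedContext(results: Doc[], maxDocs = 5, maxChars = 5000): string {
  return results
    .filter((d) => d.score >= 0.7)   // pre-fetch score threshold
    .slice(0, maxDocs)               // max pre-fetched docs
    .map((d) => `## ${d.path} (score: ${d.score.toFixed(2)})\n\n${d.content.slice(0, maxChars)}`)
    .join("\n\n");
}
```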
Parameters
| Parameter | Value |
|---|---|
| Max tokens | 1024 |
| Temperature | 0.3 |
| Max iterations | 50 |
| Pre-fetch score threshold | 0.7 |
| Max pre-fetched docs | 5 |
Tier 4: Full Agentic Loop (8-15s)
When no pre-fetched context is available (search returned nothing above the 0.7 threshold), the system falls back to the full agentic loop.
Algorithm
- Same sandbox variable injection as Tier 3
- Build prompt WITHOUT pre-fetched context — LLM must discover answers via tool use
- Execute with relaxed LLM overrides: `maxTokens=2048`, `temperature=0.5`
- LLM reads search results via `code_exec`, may call `tools.readFile()` to load documents, may loop through multiple tool calls
code_exec, may calltools.readFile()to load documents, may loop through multiple tool calls - Protected by doom-loop detection (max iterations limit)
- Cache result and return
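The doom-loop guard reduces to a bounded iteration loop; this sketch assumes a `step` callback representing one LLM turn plus its tool calls, with the 50-iteration cap from the parameter table:

```typescript
// Sketch: run agentic steps until done, or abort at the iteration cap.
async function runAgenticLoop(
  step: () => Promise<{ done: boolean; answer?: string }>,
  maxIterations = 50,
): Promise<string> {
  for (let i = 0; i < maxIterations; i++) {
    const result = await step();            // one LLM turn + tool calls
    if (result.done) return result.answer ?? "";
  }
  throw new Error(`Doom loop: exceeded ${maxIterations} iterations`);
}
```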
Parameters
| Parameter | Value |
|---|---|
| Max tokens | 2048 |
| Temperature | 0.5 |
| Max iterations | 50 |
Knowledge Scoring
Search results that feed into Tiers 2–4 are ranked by a compound scoring algorithm:

| Signal | Weight | Source |
|---|---|---|
| BM25 relevance | 100% (1.0) | MiniSearch full-text search |
| Importance | 0% (disabled) | Access hits (+3/search) + curate updates (+5/update) |
| Recency | 0% (disabled) | Exponential decay: e^(-days/30) |
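With the current weights, the compound score reduces to the BM25 signal alone; a sketch (function shape is an assumption, weights and decay are from the tables above):

```typescript
// Current configuration: BM25 at full weight, importance and recency disabled.
const WEIGHTS = { bm25: 1.0, importance: 0.0, recency: 0.0 };

function compoundScore(bm25: number, importance: number, ageDays: number): number {
  const recency = Math.exp(-ageDays / 30); // exponential recency decay
  return WEIGHTS.bm25 * bm25
    + WEIGHTS.importance * importance
    + WEIGHTS.recency * recency;
}
```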
Tier Boost Multipliers
All tiers currently use the same boost (reserved for future differentiation):

| Maturity | Boost |
|---|---|
| core | x1.00 |
| validated | x1.00 |
| draft | x1.00 |
Maturity Lifecycle (Hysteresis)
Decay Functions
- Importance decay: `importance × 0.995^days` (~78% remaining after 50 days of non-use)
- Recency decay: `e^(-days/30)` (half-life of ~21 days)
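The two decay formulas above, sketched directly (function names are illustrative):

```typescript
// Importance decay: multiply by 0.995 per day of non-use.
const importanceAfter = (importance: number, days: number): number =>
  importance * Math.pow(0.995, days);

// Recency decay: exponential with a 30-day time constant,
// giving a half-life of 30·ln(2) ≈ 21 days.
const recencyAfter = (days: number): number => Math.exp(-days / 30);
```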