Every query passes through a 5-tier routing system, ordered from fastest to slowest. Each tier acts as a short-circuit: if it can answer, it returns immediately and skips all slower tiers. Every result is cached, so repeated or similar queries resolve at Tier 0 or 1 while the cache entry is still valid.
| Tier | Name | Latency | LLM Calls |
|---|---|---|---|
| 0 | Exact cache | ~0ms | 0 |
| 1 | Fuzzy cache | ~50ms | 0 |
| 2 | Direct search | ~100–200ms | 0 |
| 3 | Single LLM | <5s | 1 |
| 4 | Agentic loop | 8–15s | Multiple |
Tier 0: Exact Cache Hit (0ms)
The fastest path. A normalized version of your query is looked up in an in-memory map.
Algorithm
- Normalize query: lowercase, trim, collapse whitespace
- Build cache key: `normalized_query`
- `Map.get(key)`: O(1) lookup
- Validate TTL: `(now - storedAt) < 60s`
- Validate fingerprint: MD5 hash of sorted `path:mtime` pairs must match current context tree state
- HIT: return cached response immediately
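The lookup steps above can be sketched roughly as follows. This is a minimal illustration, not the product's implementation: the `CacheEntry` shape, the `exactLookup` name, and passing `now` and the current fingerprint as arguments are all assumptions made for the example.

```typescript
// Hypothetical entry shape; field names are illustrative only.
interface CacheEntry {
  response: string;
  storedAt: number;    // epoch milliseconds
  fingerprint: string; // context tree fingerprint at store time
}

const TTL_MS = 60_000; // 60-second TTL from the parameters table

// Lowercase, trim, and collapse runs of whitespace into single spaces.
function normalizeQuery(query: string): string {
  return query.toLowerCase().trim().replace(/\s+/g, " ");
}

// Returns the cached response on a valid hit, otherwise undefined.
function exactLookup(
  cache: Map<string, CacheEntry>,
  query: string,
  currentFingerprint: string,
  now: number,
): string | undefined {
  const entry = cache.get(normalizeQuery(query));
  if (entry === undefined) return undefined;
  if (now - entry.storedAt >= TTL_MS) return undefined;           // TTL expired
  if (entry.fingerprint !== currentFingerprint) return undefined; // tree changed
  return entry.response;
}
```

Because normalization happens before the key is built, `"  How  does AUTH work "` and `"how does auth work"` map to the same entry.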
Context Tree Fingerprint
The cache uses a fingerprint of the context tree to detect changes:
- Glob all `.md` files in `.brv/context-tree/`
- Sort by path, concat `path:mtime` joined by `|`, MD5 hash, take first 16 hex chars
- The fingerprint itself is cached for 30s to avoid repeated glob I/O
- If any file is added, removed, or modified, the fingerprint changes and all cache entries are invalidated
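A minimal sketch of the fingerprint computation, assuming the glob step has already produced a list of paths and modification times. The `FileStat` shape and the function name are hypothetical, and the exact string format of the mtime is an assumption.

```typescript
import { createHash } from "node:crypto";

// Hypothetical input shape: one record per .md file found under .brv/context-tree/.
interface FileStat {
  path: string;
  mtimeMs: number;
}

// Sort by path, join "path:mtime" pairs with "|", MD5 the result, keep 16 hex chars.
function contextTreeFingerprint(files: FileStat[]): string {
  const joined = [...files]
    .sort((a, b) => (a.path < b.path ? -1 : a.path > b.path ? 1 : 0))
    .map((f) => `${f.path}:${f.mtimeMs}`)
    .join("|");
  return createHash("md5").update(joined).digest("hex").slice(0, 16);
}
```

Sorting first makes the fingerprint independent of glob ordering, so only a real add, remove, or modification changes it.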
Parameters
| Parameter | Value |
|---|---|
| Max cache size | 50 entries |
| TTL | 60 seconds |
| Fingerprint cache TTL | 30 seconds |
| Eviction policy | LRU (oldest insertion order) |
Cost: O(1) map lookup. Zero LLM, zero search, and no I/O while the fingerprint cached for 30s is still fresh.
Tier 1: Fuzzy Cache Match (~50ms)
When the exact match fails, the cache scans all entries for a similar query, using Jaccard similarity on tokenized queries. The comparison is lexical (shared tokens), not embedding-based.
Algorithm
- Tokenize query: split on whitespace, remove stopwords, drop tokens shorter than 2 chars
- Skip fuzzy matching if query has fewer than 2 meaningful tokens
- For each cache entry:
  - Skip if fingerprint mismatch (cheap string compare)
  - Skip if TTL expired (cheap timestamp compare)
  - Compute Jaccard similarity: `|A ∩ B| / |A ∪ B|`
- Return the highest-similarity match if `similarity >= 0.6`
Jaccard Similarity
Optimization: always iterate the smaller set, check membership in the larger set.
intersection = count of shared tokens
union = |A| + |B| - intersection
similarity = intersection / union
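The tokenization and similarity steps might look like the sketch below. The stopword list is a placeholder (the real list is not given here), and the function names are illustrative.

```typescript
// Hypothetical stopword list; the actual list is not documented on this page.
const STOPWORDS = new Set(["a", "an", "the", "is", "in", "of", "to", "how", "does"]);

// Split on whitespace, drop stopwords and tokens shorter than 2 chars.
function tokenize(query: string): Set<string> {
  const tokens = query
    .toLowerCase()
    .split(/\s+/)
    .filter((t) => t.length >= 2 && !STOPWORDS.has(t));
  return new Set(tokens);
}

// Jaccard similarity: iterate the smaller set, test membership in the larger one.
function jaccard(a: Set<string>, b: Set<string>): number {
  const small = a.size <= b.size ? a : b;
  const large = small === a ? b : a;
  let intersection = 0;
  small.forEach((token) => {
    if (large.has(token)) intersection++;
  });
  const union = a.size + b.size - intersection;
  return union === 0 ? 0 : intersection / union;
}
```

With this placeholder list, "How does JWT refresh work" and "jwt refresh work flow" share 3 of 4 distinct tokens, giving 0.75, which clears the 0.6 threshold.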
Parameters
| Parameter | Value |
|---|---|
| Similarity threshold | 0.6 |
| Min tokens for fuzzy | 2 |
Cost: O(n) over cache entries x O(min(|A|, |B|)) per comparison. No LLM, no I/O.
Out-of-Domain Short-Circuit
Between Tier 1 and Tier 2, if all searches return zero results, the query is classified as out-of-domain:
- Returns: “This topic is not covered in the knowledge base.”
- The OOD response is cached to prevent repeated misses from hitting the LLM
Cost: Zero. Saves an entire LLM call for irrelevant queries.
Supplementary Entity Search
When the initial local search returns fewer than 3 results, the executor extracts key entities and runs additional searches to improve recall.
Algorithm
- Split query into words, filter stopwords, keep words with length >= 3
- Take top 3 entities
- Run searches in parallel via `Promise.allSettled()`
- Deduplicate results by path
- Merge into original result set
For example, the query “How does JWT refresh work in the auth module?” extracts entities ["jwt", "refresh", "auth"] and runs 3 parallel MiniSearch lookups, merging any new unique results into the original set.
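A rough sketch of the supplementary search, under several assumptions: the stopword list is a placeholder (it treats "work" as a stopword so the example query above yields the same three entities), "top 3" is interpreted as the first three surviving words, and `SearchFn` stands in for whatever MiniSearch wrapper the executor actually calls.

```typescript
// Hypothetical stopword list and search signature; neither is documented here.
const ENTITY_STOPWORDS = new Set(["how", "does", "the", "in", "work", "what", "is"]);

interface SearchResult {
  path: string;
  score: number;
}

type SearchFn = (term: string) => Promise<SearchResult[]>;

// Split into words, drop stopwords and words shorter than 3 chars, keep the first few.
function extractEntities(query: string, limit = 3): string[] {
  return query
    .toLowerCase()
    .split(/\W+/)
    .filter((w) => w.length >= 3 && !ENTITY_STOPWORDS.has(w))
    .slice(0, limit);
}

async function supplementResults(
  query: string,
  initial: SearchResult[],
  search: SearchFn,
): Promise<SearchResult[]> {
  if (initial.length >= 3) return initial; // enough results already

  // allSettled lets one failed lookup be ignored without losing the others.
  const settled = await Promise.allSettled(extractEntities(query).map((e) => search(e)));

  const merged = [...initial];
  const seen = new Set(initial.map((r) => r.path));
  for (const outcome of settled) {
    if (outcome.status !== "fulfilled") continue;
    for (const result of outcome.value) {
      if (!seen.has(result.path)) {
        seen.add(result.path); // deduplicate by path
        merged.push(result);
      }
    }
  }
  return merged;
}
```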
Cost: Up to 3 additional MiniSearch lookups (in parallel). No LLM.
Tier 2: Direct Search Response (~100-200ms)
When search results are highly confident, the system skips the LLM entirely and returns a formatted markdown response assembled from raw document content.
Algorithm
- Filter search results with score >= 0.7, take top 5
- Read full document content from `.brv/context-tree/` in parallel
- Run `canRespondDirectly()` decision:
  - Gate 1: `topResult.score >= 0.85` (minimum threshold). If not, fail.
  - Gate 2a: `topResult.score >= 0.93`: score is so strong that the dominance check is skipped. Pass.
  - Gate 2b: `gap = topScore - secondScore >= 0.08`: clear separation between top and runner-up. Pass.
- Format response: Summary + Details (max 5000 chars/doc, max 5 docs) + Sources + Gaps
- Cache result and return
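The gate logic can be sketched as below. The signature is an assumption (the real `canRespondDirectly()` presumably takes result objects), and so is treating a missing runner-up as a score of 0.

```typescript
const MIN_SCORE = 0.85;       // Gate 1
const HIGH_CONFIDENCE = 0.93; // Gate 2a
const MIN_GAP = 0.08;         // Gate 2b

// Assumption: when there is only one result, the runner-up score defaults to 0.
function canRespondDirectly(topScore: number, secondScore = 0): boolean {
  if (topScore < MIN_SCORE) return false;       // Gate 1: below minimum, fail
  if (topScore >= HIGH_CONFIDENCE) return true; // Gate 2a: skip dominance check
  return topScore - secondScore >= MIN_GAP;     // Gate 2b: clear separation
}
```

For example, a top score of 0.90 passes against a runner-up of 0.80 but fails against 0.88, and then falls through to Tier 3.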
Why Gap-Based, Not Ratio-Based?
BM25 normalized scores cluster in the [0.8, 0.95] range, so the largest possible ratio between two results is 0.95 / 0.8 ≈ 1.19. A ratio check like “2x the second result” can therefore never pass in that range, while a fixed gap of 0.08 correctly identifies dominant matches.
Parameters
| Parameter | Value |
|---|---|
| Min score threshold | 0.85 |
| High confidence (skip dominance) | 0.93 |
| Min gap for dominance | 0.08 |
| Max content per doc | 5000 chars |
| Max docs in response | 5 |
Cost: File reads only (no LLM). ~100-200ms for disk I/O.
Tier 3: Optimized Single LLM Call (<5s)
When the search finds good results (score >= 0.7) that are not confident enough for Tier 2, the system makes a single constrained LLM call with pre-fetched context embedded in the prompt.
Algorithm
- Filter results with score >= 0.7, build a pre-fetched context string (excerpts formatted as markdown sections)
- Inject search data into sandbox as variables:
  - `__query_results_{taskId}` = search results array
  - `__query_meta_{taskId}` = `{resultCount, topScore, hasPreFetched}`
- Build prompt with pre-fetched context embedded directly
- Execute with tight LLM overrides: `maxTokens=1024, temperature=0.3`
- LLM answers from the embedded context. If insufficient, it can use `code_exec` with `silent: true` to read additional documents from the sandbox variables
- Cache result and return
Parameters
| Parameter | Value |
|---|---|
| Max tokens | 1024 |
| Temperature | 0.3 |
| Max iterations | 50 |
| Pre-fetch score threshold | 0.7 |
| Max pre-fetched docs | 5 |
Cost: One LLM call with tight token and temperature constraints.
Tier 4: Full Agentic Loop (8-15s)
When no pre-fetched context is available (search returned nothing above the 0.7 threshold), the system falls back to the full agentic loop.
Algorithm
- Same sandbox variable injection as Tier 3
- Build prompt WITHOUT pre-fetched context — LLM must discover answers via tool use
- Execute with relaxed LLM overrides: `maxTokens=2048, temperature=0.5`
- LLM reads search results via `code_exec`, may call `tools.readFile()` to load documents, may loop through multiple tool calls
- Protected by doom-loop detection (max iterations limit)
- Cache result and return
Parameters
| Parameter | Value |
|---|---|
| Max tokens | 2048 |
| Temperature | 0.5 |
| Max iterations | 50 |
Cost: Multiple LLM calls with tool use. Full agentic loop with loop detection.
Knowledge Scoring
Search results that feed into Tiers 2–4 are ranked by a compound scoring algorithm:
compoundScore = (0.6 x BM25 + 0.2 x importance/100 + 0.2 x recency) x tier_boost
All three signals are active — relevance, accumulated importance, and freshness together determine result ranking:
| Signal | Weight | Source |
|---|---|---|
| BM25 relevance | 60% | MiniSearch full-text search, normalized via score / (1 + score) |
| Importance | 20% | Access hits (+3/search) + curate updates (+5/update), decays 0.995^days |
| Recency | 20% | Exponential decay: e^(-days/30) |
Tier Boost Multipliers
Maturity tiers amplify or penalize the compound score:
| Maturity | Boost |
|---|---|
| core | x1.15 |
| validated | x1.00 |
| draft | x0.85 |
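Putting the formula, the signal table, and the boost multipliers together, a scoring function might be sketched like this. The function and parameter names are illustrative, and `days` is assumed to be the age used for the recency signal.

```typescript
type Maturity = "core" | "validated" | "draft";

const TIER_BOOST: Record<Maturity, number> = { core: 1.15, validated: 1.0, draft: 0.85 };

// Normalize a raw BM25 score into [0, 1) via score / (1 + score).
function normalizeBm25(raw: number): number {
  return raw / (1 + raw);
}

// Recency signal: exponential decay e^(-days/30).
function recency(days: number): number {
  return Math.exp(-days / 30);
}

// importance is on a 0-100 scale; days is the age used for the recency signal.
function compoundScore(rawBm25: number, importance: number, days: number, maturity: Maturity): number {
  const base = 0.6 * normalizeBm25(rawBm25) + 0.2 * (importance / 100) + 0.2 * recency(days);
  return base * TIER_BOOST[maturity];
}
```

As a worked case, a raw BM25 score of 9 normalizes to 0.9; with importance 50 and zero age, the base is 0.54 + 0.10 + 0.20 = 0.84, which becomes about 0.966 for a `core` document and 0.714 for a `draft`.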
Maturity Lifecycle (Hysteresis)
draft --(importance >= 65)--> validated --(importance >= 85)--> core
draft <--(importance < 35)-- validated <--(importance < 60)-- core
The hysteresis gap (e.g., promote at 65, demote at 35) prevents rapid oscillation between tiers.
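The lifecycle above can be read as a small state machine. The sketch below assumes one transition per evaluation (a draft does not jump straight to core); that stepping behavior and the function name are assumptions.

```typescript
type Maturity = "draft" | "validated" | "core";

// One evaluation step: promote or demote by at most one level.
function nextMaturity(current: Maturity, importance: number): Maturity {
  switch (current) {
    case "draft":
      return importance >= 65 ? "validated" : "draft";
    case "validated":
      if (importance >= 85) return "core";
      if (importance < 35) return "draft";
      return "validated";
    case "core":
      return importance < 60 ? "validated" : "core";
  }
}
```

A document at importance 50 stays wherever it already is: too low to be promoted from draft (needs 65), too high to be demoted from validated (needs under 35).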
Decay Functions
- Importance decay: `importance x 0.995^days` (~78% remaining after 50 days of non-use)
- Recency decay: `e^(-days/30)` (half-life of ~21 days)
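The two quoted figures can be checked with a few lines of arithmetic. The helper names below are illustrative only.

```typescript
// Importance remaining after `days` without access or curation.
function importanceAfter(importance: number, days: number): number {
  return importance * Math.pow(0.995, days);
}

// Recency signal after `days`.
function recencyAfter(days: number): number {
  return Math.exp(-days / 30);
}

// Half-life of e^(-days/30): solve e^(-t/30) = 0.5, so t = 30 * ln 2 (about 20.8 days).
const recencyHalfLifeDays = 30 * Math.LN2;

// Half-life of 0.995^days: t = ln 2 / -ln 0.995 (about 138 days).
const importanceHalfLifeDays = Math.LN2 / -Math.log(0.995);
```

So importance fades much more slowly than recency: after 50 idle days a document keeps roughly 78% of its importance but only about 19% of its recency signal.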
Complete Query Flow
User Query
|
|--- Fire parallel searches (local)
|
v
[Tier 0] Exact Cache Lookup
|--- HIT --> Return (0ms)
|
MISS
v
[Tier 1] Fuzzy Cache (Jaccard >= 0.6)
|--- HIT --> Return (~50ms)
|
MISS
v
[OOD] All searches returned 0 results?
|--- YES --> "Not covered" response (0ms, cached)
|
NO
v
[Entity Search] Initial results < 3?
|--- YES --> Run supplementary entity searches (parallel)
|
v
[Tier 2] Direct Response (score >= 0.85 + dominant)
|--- PASS --> Return formatted markdown (100-200ms)
|
NOT DOMINANT
v
[Tier 3] Single LLM + Pre-fetched Context (score >= 0.7 exists)
|--- PASS --> Return LLM response (<5s)
|
NO CONTEXT
v
[Tier 4] Full Agentic Loop
|--- Return LLM response (8-15s)
All tier results are cached, so subsequent similar queries resolve at Tier 0 or 1.
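As a compact summary of the post-cache branches in the flow, the routing decision might be sketched as a single function over result scores. The `Route` labels and function name are hypothetical, the entity-search step is omitted, and a missing runner-up is assumed to count as 0.

```typescript
type Route = "out-of-domain" | "tier2-direct" | "tier3-single-llm" | "tier4-agentic";

// resultScores: scores of the search results, in any order.
function routeAfterCacheMiss(resultScores: number[]): Route {
  if (resultScores.length === 0) return "out-of-domain";

  const sorted = [...resultScores].sort((a, b) => b - a);
  const top = sorted[0] ?? 0;
  const second = sorted[1] ?? 0; // assumption: no runner-up counts as 0

  const dominant = top >= 0.93 || top - second >= 0.08;
  if (top >= 0.85 && dominant) return "tier2-direct";
  if (top >= 0.7) return "tier3-single-llm"; // pre-fetched context exists
  return "tier4-agentic";                    // nothing above 0.7
}
```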