- Prevents context overflow and truncation errors.
- Automatically retains important (system/critical) and recent messages.
- Optimizes prompt size for better LLM response quality.
- Requires no manual intervention — compression is seamless and automatic.
How it works
- Cipher tracks token usage for every message and the entire context in real time.
- When token usage approaches the warning threshold, a warning is logged.
- If token usage exceeds the compression threshold, Cipher triggers compression:
  - The hybrid strategy analyzes the conversation and chooses middle-removal or oldest-removal for the best efficiency.
  - Middle-removal keeps the start and end of the conversation and removes less relevant middle messages.
  - Oldest-removal removes the oldest non-critical, non-system messages first.
- After compression, the prompt is rebuilt to fit within the model’s token window.
- Compression history is tracked for monitoring/debugging.
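The threshold checks described above can be sketched as follows. This is an illustrative TypeScript sketch, not Cipher's actual implementation: the `Message` shape, the character-based token estimate, and the function names are assumptions.

```typescript
// Hypothetical message shape; Cipher's real types may differ.
interface Message {
  role: "system" | "user" | "assistant";
  content: string;
  critical?: boolean;
}

// Rough token estimate (~4 characters per token); a real tracker
// would use the model's tokenizer for accurate counts.
function estimateTokens(text: string): number {
  return Math.ceil(text.length / 4);
}

// Total token usage across the whole context.
function contextTokens(messages: Message[]): number {
  return messages.reduce((sum, m) => sum + estimateTokens(m.content), 0);
}

// Compare usage against the warning and compression thresholds
// (defaults 0.8 and 0.9, as described below).
function checkThresholds(
  messages: Message[],
  maxTokens: number,
  warn = 0.8,
  compress = 0.9
): "ok" | "warn" | "compress" {
  const used = contextTokens(messages) / maxTokens;
  if (used > compress) return "compress";
  if (used > warn) return "warn";
  return "ok";
}
```

In this sketch, crossing the warning threshold would only log, while crossing the compression threshold would trigger one of the strategies described below.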
Compression Flow
Warning and Compression Thresholds
Cipher uses configurable thresholds for context utilization:
- Warning Threshold: Default 0.8 (80% of maxTokens). When exceeded, a warning is logged.
- Compression Threshold: Default 0.9 (90% of maxTokens). When exceeded, compression is triggered.
- Each LLM model (e.g., GPT-4, Claude 3, Gemini) has its own context window and recommended compression config (thresholds, strategy, preservation counts).
| Model | Context Window | Default Strategy | Warning Threshold | Compression Threshold |
|---|---|---|---|---|
| GPT-4 | 8,192 | Hybrid | 0.85 | 0.9 |
| GPT-3.5 Turbo | 4,096 | Hybrid | 0.8 | 0.9 |
| Claude 3 Sonnet | 200,000 | Oldest-Removal | 0.85 | 0.92 |
| Gemini Pro | 32,760 | Middle-Removal | 0.9 | 0.95 |
| OpenAI o1-series | 16,384 | Middle-Removal | 0.9 | 0.95 |
| Default/Unknown | 8,192 | Hybrid | 0.8 | 0.9 |
- You can override these defaults in `cipher/src/core/brain/llm/compression/factory.ts`.
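The per-model defaults in the table above could be represented as a simple lookup with a fallback, roughly as sketched below. The type and function names here are illustrative and mirror the table, not the internals of `factory.ts`.

```typescript
type Strategy = "hybrid" | "middle-removal" | "oldest-removal";

interface ModelDefaults {
  contextWindow: number;
  strategy: Strategy;
  warningThreshold: number;
  compressionThreshold: number;
}

// Values taken from the defaults table above.
const MODEL_DEFAULTS: Record<string, ModelDefaults> = {
  "gpt-4":           { contextWindow: 8_192,   strategy: "hybrid",         warningThreshold: 0.85, compressionThreshold: 0.9 },
  "gpt-3.5-turbo":   { contextWindow: 4_096,   strategy: "hybrid",         warningThreshold: 0.8,  compressionThreshold: 0.9 },
  "claude-3-sonnet": { contextWindow: 200_000, strategy: "oldest-removal", warningThreshold: 0.85, compressionThreshold: 0.92 },
  "gemini-pro":      { contextWindow: 32_760,  strategy: "middle-removal", warningThreshold: 0.9,  compressionThreshold: 0.95 },
};

// Unknown models fall back to the conservative defaults.
const FALLBACK: ModelDefaults = {
  contextWindow: 8_192,
  strategy: "hybrid",
  warningThreshold: 0.8,
  compressionThreshold: 0.9,
};

function defaultsFor(model: string): ModelDefaults {
  return MODEL_DEFAULTS[model.toLowerCase()] ?? FALLBACK;
}
```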
Compression Strategies
Hybrid Strategy
- Automatically selects the better compression method (middle-removal or oldest-removal) based on conversation length and token distribution.
- Always preserves system/critical messages and the minimum message count.
Middle-Removal Strategy
- Preserves the first N and last M messages (configurable).
- Always keeps system and critical messages.
- Removes less relevant messages from the middle until the token target and minimum message count are satisfied.
Oldest-Removal Strategy
- Preserves the last M messages (configurable).
- Always keeps system and critical messages.
- Iteratively removes the oldest non-critical, non-system messages until the token target and minimum message count are satisfied.
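The oldest-removal steps above can be sketched in TypeScript. This is a minimal illustration of the removal loop, not Cipher's actual code; the `Msg` shape and parameter names are assumptions.

```typescript
interface Msg {
  role: "system" | "user" | "assistant";
  content: string;
  tokens: number;
  critical?: boolean;
}

// Remove the oldest unprotected messages until the context fits
// targetTokens, while respecting minMessages and the recent tail.
function oldestRemoval(
  messages: Msg[],
  targetTokens: number,
  minMessages: number,
  preserveEnd: number
): Msg[] {
  const kept = [...messages];
  let total = kept.reduce((sum, m) => sum + m.tokens, 0);

  // Scan from the oldest message forward, skipping protected ones.
  for (let i = 0; i < kept.length && total > targetTokens; ) {
    const m = kept[i];
    const isProtected =
      m.role === "system" ||
      m.critical === true ||
      i >= kept.length - preserveEnd; // inside the preserved recent tail
    if (isProtected || kept.length <= minMessages) {
      i++; // leave this message in place and move on
      continue;
    }
    kept.splice(i, 1); // drop the oldest removable message
    total -= m.tokens;
  }
  return kept;
}
```

Note that the loop exits either when the token target is met or when every remaining message is protected, so system/critical messages and the recent tail always survive.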
CompressionConfigSchema
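The fields a compression config schema would need to cover can be approximated as below. This is a plain-TypeScript sketch with hypothetical field names and defaults; the real `CompressionConfigSchema` may differ in both shape and validation rules.

```typescript
// Hypothetical config shape; field names are illustrative.
interface CompressionConfig {
  maxTokens: number;
  warningThreshold: number;     // fraction of maxTokens, in (0, 1)
  compressionThreshold: number; // fraction of maxTokens, must exceed warningThreshold
  strategy: "hybrid" | "middle-removal" | "oldest-removal";
  minMessages: number;   // never compress below this many messages
  preserveStart: number; // first N messages kept by middle-removal
  preserveEnd: number;   // last M messages always kept
}

// Defaults matching the thresholds described earlier (0.8 / 0.9);
// the preservation counts here are placeholder values.
const DEFAULT_CONFIG: CompressionConfig = {
  maxTokens: 8_192,
  warningThreshold: 0.8,
  compressionThreshold: 0.9,
  strategy: "hybrid",
  minMessages: 4,
  preserveStart: 2,
  preserveEnd: 4,
};

// Minimal runtime validation of the threshold invariants.
function validateConfig(c: CompressionConfig): void {
  if (c.warningThreshold <= 0 || c.warningThreshold >= 1) {
    throw new Error("warningThreshold must be in (0, 1)");
  }
  if (c.compressionThreshold <= c.warningThreshold || c.compressionThreshold >= 1) {
    throw new Error("compressionThreshold must be in (warningThreshold, 1)");
  }
}
```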
Example (CLI Mode)
- When running in CLI mode, Cipher displays token usage per message and overall context.
- Example:
Scenario:
System Log Output
Response
- Response text:
- Response log: