Cipher ensures efficient use of LLM context windows and prevents token overflow:
  • Prevents context overflow and truncation errors.
  • Automatically retains important (system/critical) and recent messages.
  • Optimizes prompt size for better LLM response quality.
  • Requires no manual intervention — compression is seamless and automatic.

How it works

  • Cipher tracks token usage for every message and the entire context in real time.
  • When token usage approaches the warning threshold, a warning is logged.
  • If token usage exceeds the compression threshold, Cipher triggers compression:
    • Hybrid strategy analyzes the conversation and chooses middle-removal or oldest-removal for best efficiency.
    • Middle-removal keeps the start/end, removes less relevant middle messages.
    • Oldest-removal removes oldest non-critical, non-system messages first.
  • After compression, the prompt is rebuilt to fit within the model’s token window.
  • Compression history is tracked for monitoring/debugging.
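The threshold checks described above can be sketched in a few lines. This is a simplified illustration only; the names (`ContextState`, `checkUtilization`) are hypothetical and not Cipher's actual internals:

```typescript
// Simplified sketch of the token-tracking decision described above.
// All names here are illustrative, not Cipher's real API.
interface ContextState {
  currentTokens: number;
  maxTokens: number;
  warningThreshold: number;      // e.g. 0.8
  compressionThreshold: number;  // e.g. 0.9
}

function checkUtilization(state: ContextState): 'ok' | 'warn' | 'compress' {
  const utilization = state.currentTokens / state.maxTokens;
  if (utilization >= state.compressionThreshold) return 'compress';
  if (utilization >= state.warningThreshold) return 'warn';
  return 'ok';
}

// Example: 8,277 tokens against an 8,192-token window is ~101% utilization,
// so compression is triggered.
console.log(checkUtilization({
  currentTokens: 8277,
  maxTokens: 8192,
  warningThreshold: 0.8,
  compressionThreshold: 0.9,
})); // → "compress"
```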

Compression Flow

Warning and Compression Thresholds

Cipher uses configurable thresholds for context utilization:
  • Warning Threshold: Default 0.8 (80% of maxTokens). When exceeded, a warning is logged.
  • Compression Threshold: Default 0.9 (90% of maxTokens). When exceeded, compression is triggered.
Thresholds and strategy parameters are model- and provider-specific and can be tuned for each deployment:
  • Each LLM model (e.g., GPT-4, Claude 3, Gemini) has its own context window and recommended compression config (thresholds, strategy, preservation counts).
  Model            | Context Window | Default Strategy | Warning Threshold | Compression Threshold
  GPT-4            | 8,192          | Hybrid           | 0.85              | 0.9
  GPT-3.5 Turbo    | 4,096          | Hybrid           | 0.8               | 0.9
  Claude 3 Sonnet  | 200,000        | Oldest-Removal   | 0.85              | 0.92
  Gemini Pro       | 32,760         | Middle-Removal   | 0.9               | 0.95
  OpenAI o1-series | 16,384         | Middle-Removal   | 0.9               | 0.95
  Default/Unknown  | 8,192          | Hybrid           | 0.8               | 0.9
  • You can override these defaults in cipher/src/core/brain/llm/compression/factory.ts.
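The per-model defaults in the table above can be thought of as a simple lookup map with a fallback entry. The sketch below mirrors the table's values for illustration; the actual structure and key names in factory.ts may differ:

```typescript
// Illustrative per-model compression defaults, mirroring the table above.
// The real structure in Cipher's factory.ts may differ.
interface ModelCompressionConfig {
  contextWindow: number;
  strategy: 'hybrid' | 'middle-removal' | 'oldest-removal';
  warningThreshold: number;
  compressionThreshold: number;
}

const MODEL_DEFAULTS: Record<string, ModelCompressionConfig> = {
  'gpt-4':           { contextWindow: 8192,   strategy: 'hybrid',         warningThreshold: 0.85, compressionThreshold: 0.9 },
  'gpt-3.5-turbo':   { contextWindow: 4096,   strategy: 'hybrid',         warningThreshold: 0.8,  compressionThreshold: 0.9 },
  'claude-3-sonnet': { contextWindow: 200000, strategy: 'oldest-removal', warningThreshold: 0.85, compressionThreshold: 0.92 },
  'gemini-pro':      { contextWindow: 32760,  strategy: 'middle-removal', warningThreshold: 0.9,  compressionThreshold: 0.95 },
  'o1':              { contextWindow: 16384,  strategy: 'middle-removal', warningThreshold: 0.9,  compressionThreshold: 0.95 },
};

// Unknown models fall back to the Default/Unknown row of the table.
const FALLBACK: ModelCompressionConfig = {
  contextWindow: 8192, strategy: 'hybrid', warningThreshold: 0.8, compressionThreshold: 0.9,
};

function configFor(model: string): ModelCompressionConfig {
  return MODEL_DEFAULTS[model] ?? FALLBACK;
}
```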

Compression Strategies

Hybrid Strategy
  • Automatically selects the best compression method (middle-removal or oldest-removal) based on conversation length and token distribution.
  • Always preserves system/critical messages and minimum message count.
Middle-Removal Strategy
  • Preserves the first N and last M messages (configurable).
  • Always keeps system and critical messages.
  • Removes less relevant messages from the middle until the token target and minimum message count are satisfied.
Oldest-Removal Strategy
  • Preserves the last M messages (configurable).
  • Always keeps system and critical messages.
  • Iteratively removes the oldest non-critical, non-system messages until the token target and minimum message count are satisfied.
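The two removal strategies can be sketched roughly as follows. This is an illustrative simplification (message typing and token accounting are reduced to the minimum, and the middle-removal sketch assumes the message count exceeds preserveStart + preserveEnd); it is not Cipher's actual implementation:

```typescript
// Minimal message shape for illustration; Cipher's real message type differs.
interface Msg {
  role: 'system' | 'user' | 'assistant';
  critical?: boolean;
  tokens: number;
  text: string;
}

const totalTokens = (msgs: Msg[]) => msgs.reduce((n, m) => n + m.tokens, 0);
// System and critical messages are never removed.
const removable = (m: Msg) => m.role !== 'system' && !m.critical;

// Middle-removal: keep the first `preserveStart` and last `preserveEnd`
// messages, and drop removable middle messages until under the token target
// (assumes msgs.length >= preserveStart + preserveEnd).
function middleRemoval(msgs: Msg[], target: number, preserveStart: number, preserveEnd: number, minKeep: number): Msg[] {
  const head = msgs.slice(0, preserveStart);
  const tail = msgs.slice(msgs.length - preserveEnd);
  const middle = msgs.slice(preserveStart, msgs.length - preserveEnd);
  while (totalTokens([...head, ...middle, ...tail]) > target &&
         head.length + middle.length + tail.length > minKeep) {
    const i = middle.findIndex(removable);
    if (i === -1) break;          // nothing left that may be removed
    middle.splice(i, 1);
  }
  return [...head, ...middle, ...tail];
}

// Oldest-removal: drop the oldest removable messages first.
function oldestRemoval(msgs: Msg[], target: number, minKeep: number): Msg[] {
  const out = [...msgs];
  while (totalTokens(out) > target && out.length > minKeep) {
    const i = out.findIndex(removable);
    if (i === -1) break;
    out.splice(i, 1);
  }
  return out;
}
```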

CompressionConfigSchema

import { z } from 'zod';

export const CompressionConfigSchema = z.object({
    strategy: z.enum(['middle-removal', 'oldest-removal', 'hybrid']),
    maxTokens: z.number().positive(),
    warningThreshold: z.number().min(0).max(1).default(0.8),
    compressionThreshold: z.number().min(0).max(1).default(0.9),
    preserveStart: z.number().min(1).default(4),
    preserveEnd: z.number().min(1).default(5),
    minMessagesToKeep: z.number().min(1).default(4),
});
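For reference, the schema's defaults expand as follows when only the required fields are supplied. This is a plain-TypeScript sketch of the schema's behavior; the `applyDefaults` helper is hypothetical and not part of Cipher:

```typescript
// Plain-TypeScript mirror of CompressionConfigSchema's fields and defaults.
interface CompressionConfig {
  strategy: 'middle-removal' | 'oldest-removal' | 'hybrid';
  maxTokens: number;
  warningThreshold: number;
  compressionThreshold: number;
  preserveStart: number;
  preserveEnd: number;
  minMessagesToKeep: number;
}

// Hypothetical helper: fills in the .default(...) values from the schema above.
function applyDefaults(
  partial: Pick<CompressionConfig, 'strategy' | 'maxTokens'> & Partial<CompressionConfig>
): CompressionConfig {
  return {
    warningThreshold: 0.8,
    compressionThreshold: 0.9,
    preserveStart: 4,
    preserveEnd: 5,
    minMessagesToKeep: 4,
    ...partial,
  };
}

const cfg = applyDefaults({ strategy: 'hybrid', maxTokens: 8192 });
console.log(cfg.warningThreshold); // 0.8
```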

Example (CLI Mode)

  • When running in CLI mode, Cipher displays token usage per message and overall context.
  • Example scenario:
    cipher> Create a mock array of 15 chat messages with the following configuration:
    - `preserveStart = 10`
    - `preserveEnd = 10`
    
    System Log Output
    ...
    20:38:16 INFO: [TokenAware] Token count updated: 8277 tokens for 20 messages
    20:38:16 INFO: [TokenAware] Compression threshold reached (101%), starting compression...
    ...
    20:38:16 INFO: [TokenAware] Token count updated: 5417 tokens for 12 messages
    20:38:16 INFO: [TokenAware] Compression completed: 9696.5 → 6080.5 tokens
    ...
    
    Response
    • Response text:
    Certainly! Below is a TypeScript example that:
    - Creates 15 mock chat messages, 
    - Sets `preserveStart = 10` and `preserveEnd = 10` (which sum to 20, greater than total messages),
    - Adjusts these preservation counts to avoid overlaps, 
    - Extracts and logs the `preservedStart`, `middleMessages`, and `preservedEnd` slices, 
    ...
    
    • Response log:
    ⚡ Context has been compressed.