Cipher ensures efficient use of LLM context windows and prevents token overflow:
  • Prevents context overflow and truncation errors.
  • Automatically retains important (system/critical) and recent messages.
  • Optimizes prompt size for better LLM response quality.
  • Requires no manual intervention — compression is seamless and automatic.

How it works

  • Cipher tracks token usage for every message and the entire context in real time.
  • When token usage approaches the warning threshold, a warning is logged.
  • If token usage exceeds the compression threshold, Cipher triggers compression:
    • Hybrid strategy analyzes the conversation and chooses middle-removal or oldest-removal for best efficiency.
    • Middle-removal keeps the start/end, removes less relevant middle messages.
    • Oldest-removal removes oldest non-critical, non-system messages first.
  • After compression, the prompt is rebuilt to fit within the model’s token window.
  • Compression history is tracked for monitoring/debugging.
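The threshold checks described above can be sketched in a few lines. This is a simplified illustration only; the names (`ContextState`, `checkUtilization`) are hypothetical and not Cipher's actual internals:

```typescript
// Simplified sketch of the token-tracking decision described above.
// All names here are illustrative, not Cipher's real API.
interface ContextState {
  currentTokens: number;
  maxTokens: number;
  warningThreshold: number;      // e.g. 0.8
  compressionThreshold: number;  // e.g. 0.9
}

function checkUtilization(state: ContextState): 'ok' | 'warn' | 'compress' {
  const utilization = state.currentTokens / state.maxTokens;
  if (utilization >= state.compressionThreshold) return 'compress';
  if (utilization >= state.warningThreshold) return 'warn';
  return 'ok';
}

// Example: 8,277 tokens against an 8,192-token window is ~101% utilization,
// so compression is triggered.
console.log(checkUtilization({
  currentTokens: 8277,
  maxTokens: 8192,
  warningThreshold: 0.8,
  compressionThreshold: 0.9,
})); // → "compress"
```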

Compression Flow

Warning and Compression Thresholds

Cipher uses configurable thresholds for context utilization:
  • Warning Threshold: Default 0.8 (80% of maxTokens). When exceeded, a warning is logged.
  • Compression Threshold: Default 0.9 (90% of maxTokens). When exceeded, compression is triggered.
Thresholds and strategy parameters are model- and provider-specific and can be tuned for each deployment:
  • Each LLM model (e.g., GPT-4, Claude 3, Gemini) has its own context window and recommended compression config (thresholds, strategy, preservation counts).
  Model            | Context Window | Default Strategy | Warning Threshold | Compression Threshold
  GPT-4            | 8,192          | Hybrid           | 0.85              | 0.9
  GPT-3.5 Turbo    | 4,096          | Hybrid           | 0.8               | 0.9
  Claude 3 Sonnet  | 200,000        | Oldest-Removal   | 0.85              | 0.92
  Gemini Pro       | 32,760         | Middle-Removal   | 0.9               | 0.95
  OpenAI o1-series | 16,384         | Middle-Removal   | 0.9               | 0.95
  Default/Unknown  | 8,192          | Hybrid           | 0.8               | 0.9
  • You can override these defaults in cipher/src/core/brain/llm/compression/factory.ts.
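The per-model defaults in the table above can be thought of as a simple lookup map with a fallback entry. The sketch below mirrors the table's values for illustration; the actual structure and key names in factory.ts may differ:

```typescript
// Illustrative per-model compression defaults, mirroring the table above.
// The real structure in Cipher's factory.ts may differ.
interface ModelCompressionConfig {
  contextWindow: number;
  strategy: 'hybrid' | 'middle-removal' | 'oldest-removal';
  warningThreshold: number;
  compressionThreshold: number;
}

const MODEL_DEFAULTS: Record<string, ModelCompressionConfig> = {
  'gpt-4':           { contextWindow: 8192,   strategy: 'hybrid',         warningThreshold: 0.85, compressionThreshold: 0.9 },
  'gpt-3.5-turbo':   { contextWindow: 4096,   strategy: 'hybrid',         warningThreshold: 0.8,  compressionThreshold: 0.9 },
  'claude-3-sonnet': { contextWindow: 200000, strategy: 'oldest-removal', warningThreshold: 0.85, compressionThreshold: 0.92 },
  'gemini-pro':      { contextWindow: 32760,  strategy: 'middle-removal', warningThreshold: 0.9,  compressionThreshold: 0.95 },
  'o1':              { contextWindow: 16384,  strategy: 'middle-removal', warningThreshold: 0.9,  compressionThreshold: 0.95 },
};

// Unknown models fall back to the Default/Unknown row of the table.
const FALLBACK: ModelCompressionConfig = {
  contextWindow: 8192, strategy: 'hybrid', warningThreshold: 0.8, compressionThreshold: 0.9,
};

function configFor(model: string): ModelCompressionConfig {
  return MODEL_DEFAULTS[model] ?? FALLBACK;
}
```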

Compression Strategies

Hybrid Strategy
  • Automatically selects the best compression method (middle-removal or oldest-removal) based on conversation length and token distribution.
  • Always preserves system/critical messages and minimum message count.
Middle-Removal Strategy
  • Preserves the first N and last M messages (configurable).
  • Always keeps system and critical messages.
  • Removes less relevant messages from the middle until the token target and minimum message count are satisfied.
Oldest-Removal Strategy
  • Preserves the last M messages (configurable).
  • Always keeps system and critical messages.
  • Iteratively removes the oldest non-critical, non-system messages until the token target and minimum message count are satisfied.
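The two removal strategies can be sketched roughly as follows. This is an illustrative simplification (message typing and token accounting are reduced to the minimum, and the middle-removal sketch assumes the message count exceeds preserveStart + preserveEnd); it is not Cipher's actual implementation:

```typescript
// Minimal message shape for illustration; Cipher's real message type differs.
interface Msg {
  role: 'system' | 'user' | 'assistant';
  critical?: boolean;
  tokens: number;
  text: string;
}

const totalTokens = (msgs: Msg[]) => msgs.reduce((n, m) => n + m.tokens, 0);
// System and critical messages are never removed.
const removable = (m: Msg) => m.role !== 'system' && !m.critical;

// Middle-removal: keep the first `preserveStart` and last `preserveEnd`
// messages, and drop removable middle messages until under the token target
// (assumes msgs.length >= preserveStart + preserveEnd).
function middleRemoval(msgs: Msg[], target: number, preserveStart: number, preserveEnd: number, minKeep: number): Msg[] {
  const head = msgs.slice(0, preserveStart);
  const tail = msgs.slice(msgs.length - preserveEnd);
  const middle = msgs.slice(preserveStart, msgs.length - preserveEnd);
  while (totalTokens([...head, ...middle, ...tail]) > target &&
         head.length + middle.length + tail.length > minKeep) {
    const i = middle.findIndex(removable);
    if (i === -1) break;          // nothing left that may be removed
    middle.splice(i, 1);
  }
  return [...head, ...middle, ...tail];
}

// Oldest-removal: drop the oldest removable messages first.
function oldestRemoval(msgs: Msg[], target: number, minKeep: number): Msg[] {
  const out = [...msgs];
  while (totalTokens(out) > target && out.length > minKeep) {
    const i = out.findIndex(removable);
    if (i === -1) break;
    out.splice(i, 1);
  }
  return out;
}
```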

CompressionConfigSchema

import { z } from 'zod';

export const CompressionConfigSchema = z.object({
    strategy: z.enum(['middle-removal', 'oldest-removal', 'hybrid']),
    maxTokens: z.number().positive(),
    warningThreshold: z.number().min(0).max(1).default(0.8),
    compressionThreshold: z.number().min(0).max(1).default(0.9),
    preserveStart: z.number().min(1).default(4),
    preserveEnd: z.number().min(1).default(5),
    minMessagesToKeep: z.number().min(1).default(4),
});
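For reference, the schema's defaults expand as follows when only the required fields are supplied. This is a plain-TypeScript sketch of the schema's behavior; the `applyDefaults` helper is hypothetical and not part of Cipher:

```typescript
// Plain-TypeScript mirror of CompressionConfigSchema's fields and defaults.
interface CompressionConfig {
  strategy: 'middle-removal' | 'oldest-removal' | 'hybrid';
  maxTokens: number;
  warningThreshold: number;
  compressionThreshold: number;
  preserveStart: number;
  preserveEnd: number;
  minMessagesToKeep: number;
}

// Hypothetical helper: fills in the .default(...) values from the schema above.
function applyDefaults(
  partial: Pick<CompressionConfig, 'strategy' | 'maxTokens'> & Partial<CompressionConfig>
): CompressionConfig {
  return {
    warningThreshold: 0.8,
    compressionThreshold: 0.9,
    preserveStart: 4,
    preserveEnd: 5,
    minMessagesToKeep: 4,
    ...partial,
  };
}

const cfg = applyDefaults({ strategy: 'hybrid', maxTokens: 8192 });
console.log(cfg.warningThreshold); // 0.8
```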

Example (CLI Mode)

  • When running in CLI mode, Cipher displays token usage per message and overall context.
  • Example scenario:
    cipher> Create a mock array of 15 chat messages with the following configuration:
    - `preserveStart = 10`
    - `preserveEnd = 10`
    
    System Log Output
    ...
    20:38:16 INFO: [TokenAware] Token count updated: 8277 tokens for 20 messages
    20:38:16 INFO: [TokenAware] Compression threshold reached (101%), starting compression...
    ...
    20:38:16 INFO: [TokenAware] Token count updated: 5417 tokens for 12 messages
    20:38:16 INFO: [TokenAware] Compression completed: 9696.5 → 6080.5 tokens
    ...
    
    Response
    • Response text:
    Certainly! Below is a TypeScript example that:
    - Creates 15 mock chat messages, 
    - Sets `preserveStart = 10` and `preserveEnd = 10` (which sum to 20, greater than total messages),
    - Adjusts these preservation counts to avoid overlaps, 
    - Extracts and logs the `preservedStart`, `middleMessages`, and `preservedEnd` slices, 
    ...
    
    • Response log:
    ⚡ Context has been compressed.