From memstack
Explains TokenStack, the built-in compression proxy that reduces Claude Code tool output before it reaches the Anthropic API, saving tokens and extending context window capacity.
Slash command
/memstack:token-optimization
*One built-in compression proxy that shrinks Claude Code tool output before it reaches the Anthropic API.*
When this skill activates, output:
TokenStack - enabling compression & reading your savings...
Then execute the protocol below.
| Context | Status |
|---|---|
| User asks about token savings or context optimization | ACTIVE - full guide |
| User says "TokenStack", "token stack", "reduce tokens" | ACTIVE - relevant section |
| User wants to enable or confirm the proxy | ACTIVE - enable steps |
| User asks how to read their savings | ACTIVE - dashboard section |
| Proxy crash, health check, or live status | DORMANT - use Compress skill |
| User is actively coding (no optimization discussion) | DORMANT - do not activate |
TokenStack is a single transparent proxy that sits between Claude Code and the Anthropic API. It intercepts each request, compresses the bulky tool output inside it, and forwards the smaller payload upstream. Less text per turn means more usable context and lower token cost.
It is built into the memstack-skill-loader package. There is nothing extra to install: if you have MemStack, you have TokenStack.
Earlier versions documented a 3-layer manual setup (Serena MCP, RTK CLI, and the Headroom API proxy). That stack is retired. TokenStack supersedes all three. There is no pip install, no Rust binary, no MCP server, and no command prefixing.
Start the dashboard with the proxy flag:
python -m memstack_skill_loader dashboard --with-proxy
This starts the TokenStack proxy on 127.0.0.1:8787 and sets ANTHROPIC_BASE_URL for you, so Claude Code traffic routes through it automatically. No manual environment configuration is needed.
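To confirm the routing described above from a script, one option is to inspect the environment variable directly. This is a minimal sketch, not part of MemStack: the helper name `points_at_proxy` is hypothetical, and it assumes `ANTHROPIC_BASE_URL` is visible in the process that runs the check.

```python
import os
from urllib.parse import urlparse


def points_at_proxy(base_url, host="127.0.0.1", port=8787):
    """Return True if base_url targets the given host and port.

    Hypothetical helper for illustration; the defaults mirror the
    address and port documented for the TokenStack proxy.
    """
    if not base_url:
        return False
    try:
        parsed = urlparse(base_url)
        return parsed.hostname == host and parsed.port == port
    except ValueError:
        # urlparse raises ValueError for a malformed host or port.
        return False


def current_base_url():
    """Read ANTHROPIC_BASE_URL from this process's environment, if set."""
    return os.environ.get("ANTHROPIC_BASE_URL")
```

Calling `points_at_proxy(current_base_url())` returns `True` only when the variable names the local proxy address. If you started the dashboard with a different `--proxy-port`, pass that port instead.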
Options:
- --proxy-port N changes the proxy port (default 8787).
- python -m memstack_skill_loader proxy.

Free-tier transforms run on every request and are lossless (they remove only redundant formatting):
| Transform | What it removes |
|---|---|
| Strip ANSI codes | terminal color and escape sequences |
| Strip trailing whitespace | end-of-line padding |
| Collapse blank lines | runs of empty lines |
| Dedup consecutive identical lines | repeated identical lines |
| Strip preambles | "Here is the contents of file..." lead-ins |
| Collapse inline whitespace (Python) | redundant intra-line spacing |
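To make the first four rows of the table concrete, here is a rough sketch of what such transforms could look like. It is an illustration under assumptions, not TokenStack's actual code: the function names are hypothetical, the ANSI pattern covers only CSI sequences, and preamble stripping and inline-whitespace collapsing are left out because their exact rules are not documented here.

```python
import re

# Matches CSI escape sequences such as "\x1b[31m" (colors, cursor moves).
ANSI_RE = re.compile(r"\x1b\[[0-?]*[ -/]*[@-~]")


def strip_ansi(text):
    """Remove terminal color and escape sequences."""
    return ANSI_RE.sub("", text)


def strip_trailing_whitespace(text):
    """Remove end-of-line padding from every line."""
    return "\n".join(line.rstrip() for line in text.split("\n"))


def collapse_blank_lines(text):
    """Reduce runs of empty lines to a single empty line."""
    return re.sub(r"\n{3,}", "\n\n", text)


def dedup_consecutive_lines(text):
    """Drop a non-empty line when it repeats the line directly before it."""
    out = []
    for line in text.split("\n"):
        if line and out and line == out[-1]:
            continue
        out.append(line)
    return "\n".join(out)


def free_tier_sketch(text):
    """Apply the four sketched transforms in sequence."""
    for step in (strip_ansi, strip_trailing_whitespace,
                 collapse_blank_lines, dedup_consecutive_lines):
        text = step(text)
    return text
```

For example, `free_tier_sketch("\x1b[32mok\x1b[0m   \nok\nok\n\n\n\nfail  \n")` returns `"ok\n\nfail\n"`: the color codes and padding are gone, the repeated `ok` lines fold into one, and the run of blank lines collapses.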
Pro tier (active with a valid Pro license) adds seven more transforms on top:
| Transform | Effect |
|---|---|
| AST truncation | Shortens Python function bodies while keeping signatures and type annotations. Largest single saving (around 78% on line-numbered Python). Lossy by design: Python code blocks are not preserved byte-for-byte. |
| JSON compression | Minifies verbose JSON output |
| Log deduplication | Folds repeated log lines |
| Path compression | Shortens long repeated file paths |
| Markdown stripping | Removes decorative markdown |
| System-prompt compression | Compresses system-prompt boilerplate |
| Conversation-history dedup | Drops duplicated earlier message blocks |
Only AST truncation is lossy. Every other transform reduces tokens without changing meaning.
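The idea behind AST truncation can be sketched with Python's standard `ast` module: parse the source, replace each function body with `...`, and print the tree back out. This is a simplified illustration, not TokenStack's implementation. The names `_BodyTruncator` and `truncate_function_bodies` are hypothetical, the sketch needs Python 3.9+ for `ast.unparse`, and it ignores details such as line-numbered input.

```python
import ast


class _BodyTruncator(ast.NodeTransformer):
    """Replace every function body with a single `...` expression."""

    def _truncate(self, node):
        # The signature (name, arguments, annotations, decorators) lives
        # on the node itself, so replacing only `body` keeps it intact.
        node.body = [ast.Expr(value=ast.Constant(value=...))]
        return node

    visit_FunctionDef = _truncate
    visit_AsyncFunctionDef = _truncate


def truncate_function_bodies(source):
    """Return source with function bodies elided (lossy by design)."""
    tree = _BodyTruncator().visit(ast.parse(source))
    ast.fix_missing_locations(tree)
    return ast.unparse(tree)


if __name__ == "__main__":
    sample = (
        "def add(a: int, b: int) -> int:\n"
        "    total = a + b\n"
        "    return total\n"
    )
    print(truncate_function_bodies(sample))
```

Because the output is regenerated from the syntax tree, comments and original formatting are lost along with the bodies, which is the sense in which this kind of transform is not byte-for-byte preserving.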
The dashboard shows a proxy indicator with a live PRO or FREE tier badge plus your session and 30-day savings percentages. If the badge is present, traffic is routing through TokenStack.
A quick health check from a terminal:
curl http://127.0.0.1:8787/health
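The same check can be scripted. The sketch below uses only the standard library and treats an HTTP 200 response as healthy. That is an assumption: the format of the `/health` response body is not documented here, so the body is ignored. The function name `proxy_is_healthy` is hypothetical.

```python
import http.client
import urllib.request


def proxy_is_healthy(url="http://127.0.0.1:8787/health", timeout=2.0):
    """Return True if the health endpoint answers with HTTP 200.

    Assumption: any 200 response means the proxy is up; the response
    body is not inspected.
    """
    # An empty ProxyHandler bypasses any system-wide HTTP proxy, which
    # should not sit in front of a loopback address.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    try:
        with opener.open(url, timeout=timeout) as resp:
            return resp.status == 200
    except (OSError, http.client.HTTPException):
        # Covers refused connections, timeouts, non-2xx HTTP errors
        # (urllib's error classes derive from OSError), and malformed
        # responses.
        return False
```

If you changed the port with `--proxy-port`, pass the matching URL, for example `proxy_is_healthy("http://127.0.0.1:9000/health")`.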
The dashboard reports savings in three places:
python -m memstack_skill_loader dashboard --with-proxy

No installs, no MCP servers, no command prefixing.
| Skill | Scope | When to Use |
|---|---|---|
| Token Optimization (this) | What TokenStack is, how to enable it, free vs Pro, reading savings | Understanding or turning on compression |
| Compress | Proxy health and live status troubleshooting | Proxy not routing, health checks |
| Context DB | SQLite fact store | Reducing repeated reads of project context |
npx claudepluginhub cwinvestments/memstack --plugin memstack