Two layers of compression: why Slipstream added RTK alongside headroom

Where the tokens go

An AI coding agent burns tokens two different ways, and they have almost nothing to do with each other.

The first is the conversation. Every turn re-sends a growing pile of context: the system prompt, earlier turns, the plan, files the agent has already read. The model bills you for all of it, again, on every request.

The second is command output. The moment an agent runs git status, npm test, grep or kubectl get, the full, verbose result gets pasted straight back into the context — borders, timestamps, stack traces, hundreds of lines of log. That output wasn't in the conversation a second ago; the agent's own tools just put it there.

headroom was built for the first problem. RTK is built for the second. Slipstream now runs both.

headroom — compressing the conversation

headroom is the engine Slipstream launched with, and it stays the core. It runs as a local proxy: your agent points its model endpoint at Slipstream instead of straight at the provider, and every request passes through headroom first. It rewrites the context — dropping redundancy, trimming what the model has already seen — before forwarding a much smaller request upstream. Same answer, 60–95% fewer tokens.

It works at the API layer: it sees the whole conversation on its way to the model and compresses it as one. That's the right place to catch re-sent context, but by the time a request reaches headroom, a noisy command result is already part of it.
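
To make "dropping redundancy" concrete, here is a minimal sketch of one possible conversation-level trick: replacing a large message body that has already appeared with a short back-reference. This is an illustration only, not headroom's actual algorithm, which this post doesn't spell out. The function name `dedupe_context`, the `min_chars` threshold and the message shape (a list of dicts with `role` and `content`) are all assumptions.

```python
import hashlib


def dedupe_context(messages, min_chars=200):
    """Replace repeated large message bodies with a short back-reference.

    Hypothetical illustration: `messages` is assumed to be a list of
    {"role": ..., "content": ...} dicts. Bodies shorter than `min_chars`
    are left alone, since replacing them would save little.
    """
    seen = {}  # content hash -> index of the first message with that content
    out = []
    for i, msg in enumerate(messages):
        text = msg["content"]
        digest = hashlib.sha256(text.encode("utf-8")).hexdigest()
        if len(text) >= min_chars and digest in seen:
            # Same large body was already sent: point back at it instead.
            out.append({
                "role": msg["role"],
                "content": f"[same content as message {seen[digest]}]",
            })
        else:
            seen.setdefault(digest, i)
            out.append(dict(msg))
    return out
```

A proxy sitting between the agent and the provider could apply a step like this to each outgoing request. Note that a noisy command result that appears only once would pass through such a step untouched.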

RTK — compressing the commands

RTK works one step earlier, at the shell layer. When your agent is about to run a Bash command, RTK rewrites it — git status becomes rtk git status — so the command runs through RTK. RTK knows the shape of 100+ common dev commands and returns a compact, structured version of their output: the same information, with the borders, repetition and dead weight stripped. Often 60–90% smaller, in under 10 ms.
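
The rewrite step itself is simple to picture. The sketch below is a rough illustration, not RTK's real hook: `KNOWN_COMMANDS` is a made-up four-entry stand-in for RTK's table of supported commands, and `rewrite_command` is a hypothetical name. It only looks at the first word of the command and ignores pipelines and shell operators.

```python
import shlex

# Hypothetical subset; the real tool is described as knowing 100+ commands.
KNOWN_COMMANDS = {"git", "npm", "grep", "kubectl"}


def rewrite_command(command: str) -> str:
    """Prefix a recognised command with `rtk`; leave everything else alone."""
    try:
        parts = shlex.split(command)
    except ValueError:
        # Unbalanced quotes and similar: don't touch what we can't parse.
        return command
    if not parts or parts[0] == "rtk":
        # Empty input, or already routed through rtk.
        return command
    if parts[0] in KNOWN_COMMANDS:
        return "rtk " + command.lstrip()
    return command
```

With this sketch, `rewrite_command("git status")` returns `"rtk git status"`, while an unrecognised command such as `ls -la` passes through unchanged.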

The key point: RTK trims that output before it ever enters the context. It's not compressing the conversation — it's stopping noise from becoming part of the conversation in the first place.
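
As a rough picture of what trimming at the source can mean, the sketch below drops blank lines and pure border lines and collapses consecutive repeats into a count. It is a generic filter written for this post, not RTK's per-command logic. The names `BORDER` and `compact_output` are hypothetical.

```python
import re

# Lines made up only of whitespace and table-border punctuation.
BORDER = re.compile(r"^[\s\-=+|*_#]+$")


def compact_output(text: str) -> str:
    """Drop blank and border lines, and collapse consecutive duplicate lines."""
    out = []
    last = None
    repeats = 0
    for raw in text.splitlines():
        line = raw.rstrip()
        if not line or BORDER.match(line):
            continue  # blank line or decorative border: carries no information
        if line == last:
            repeats += 1  # same line again: count it instead of keeping it
            continue
        if repeats:
            out.append(f"  [repeated {repeats} more time(s)]")
        out.append(line)
        last, repeats = line, 0
    if repeats:
        out.append(f"  [repeated {repeats} more time(s)]")
    return "\n".join(out)
```

Only the compacted string would be handed back to the agent, so the borders and repeated lines never become part of the context in the first place.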

Why run both

Because they're not the same tool pointed at the same problem. They sit at different layers and save tokens from different sources, so their savings add up rather than overlap.

	headroom	RTK
Layer	API · the conversation	Shell · command output
Compresses	context sent to the model	output of commands the agent runs
Mechanism	local proxy rewrites each request	hook rewrites Bash commands
Best at	redundant, re-sent context	noisy dev output (git, tests, logs)
When it acts	on the way to the model	before output enters context

With both on, RTK trims command noise at the source and headroom compresses whatever conversation remains. One reduces what gets created; the other reduces what gets re-sent. The dashboard now splits your savings by source — Compression, Prefix cache, and Commands — so you can see each layer's contribution.

A first implementation

We want to be straight about what this is: a first pass. RTK support landed in v0.1.5, it's off by default, and you turn it on per machine.

RTK reports its savings in tokens, not dollars, so the figure you see on the dashboard under "Commands" is an estimate — we convert those tokens to a dollar value using a representative input-token price. The real combined effect of running both engines depends heavily on how command-heavy your agents are, which models you use, and how your provider prices cached input.
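
The conversion described above is plain arithmetic. In this sketch the function name `estimate_dollars` and the price in the usage note are placeholders chosen for illustration. They are not the values Slipstream uses.

```python
def estimate_dollars(tokens_saved: int, price_per_million_usd: float) -> float:
    """Convert a token count into an estimated dollar figure.

    `price_per_million_usd` is an assumed, representative input-token price.
    Real prices vary by model and by how the provider bills cached input.
    """
    return tokens_saved / 1_000_000 * price_per_million_usd
```

For example, at a made-up price of 3.00 USD per million input tokens, 2,000,000 saved tokens would be shown as about 6.00 USD. Change the assumed price and the dollar figure moves with it, which is why the number is best read as directional.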

What we're still learning

We don't yet have enough real-world data to put a confident number on total combined savings. As more people run both engines, we'll sharpen the estimate and share what we find here. Treat the Commands figure as directional for now, not exact.

Turning it on

RTK is bundled with Slipstream, so there's nothing extra to install. In Settings → Command compression, turn on Trim command output (RTK). Slipstream wires it into Claude Code and makes the bundled rtk available on your PATH.

A few things worth knowing before you do:

Restart your agent. The hook loads when a Claude Code session starts, so restart any running sessions after toggling.
Bash commands only. RTK rewrites commands the agent runs through Bash; Claude Code's built-in Read/Grep/Glob tools bypass it.
Claude Code first. This release wires RTK for Claude Code; other agents follow.
macOS & Linux. On native Windows, RTK falls back to a CLAUDE.md-based mode without the automatic rewrite (WSL is recommended).

The full walkthrough — with every caveat — is in the manual. New to Slipstream? Download it and the conversation layer (headroom) is on out of the box; add the command layer whenever you like.