Summarizer

MCP Response Interception Limitations

Clarification that context-mode cannot intercept MCP tool responses because Claude Code has no PostToolUse hook; only built-in tools and CLI wrappers benefit from compression.


While context-mode offers significant token savings for built-in tools and CLI wrappers, users have confirmed that it cannot currently intercept or compress MCP tool responses due to the lack of a "PostToolUse" hook in Claude Code. This architectural limitation means MCP data flows directly into the context, leading to "un-eatable" token consumption unless MCP authors manually implement server-side summarization and drill-down tools. Beyond these technical hurdles, some participants express concern that aggressive compression might inadvertently degrade the model's reasoning quality or negate the economic benefits provided by prompt caching. Ultimately, the discussion establishes a clear boundary where context-mode remains highly valuable for subprocesses but requires a more intentional, design-heavy approach for third-party MCP integrations.

10 comments tagged with this topic

View on HN · Topics
Very interesting. One big wrinkle with OP's approach is exactly that: the structured responses are untouched, which many tools return. The solution in OP, as I understand it, is the "execute" method. However, I'm building an MCP gateway, and such sandboxed execution isn't available (...yet), so your approach to this sounds very clever. I'll spend today trying that out.
View on HN · Topics
Really intrigued and definitely will try, thanks for this. In connecting the dots (and help me make sure I'm connecting them correctly): context-mode _does not address MCP context usage at all_, correct? You are instead suggesting we refactor or eliminate MCP tools, or apply concepts similar to context-mode in our MCPs where possible? Context-mode is still very high value even if the answer is "no," I just want to make sure I understand. Also interested in your thoughts about the above. I write a number of MCPs that work across all Claude surfaces, so the usual "use the CLI!" isn't as viable an answer (though with code execution it sometimes can be) ... Edit: typo
View on HN · Topics
Right, context-mode doesn't change how MCP tool definitions get loaded into context. That's the "input side" problem that Cloudflare's Code Mode tackles by compressing tool schemas. Context-mode handles the "output side," the data that comes back from tool calls. That said, if you're writing your own MCPs, you could apply the same pattern directly. Instead of returning raw payloads, have your MCP server return a compact summary and store the full output somewhere queryable. Context-mode just generalizes that so you don't have to rebuild it per server.
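The "compact summary plus queryable storage" pattern described above can be sketched in a few lines. This is a hypothetical illustration, not part of any real MCP SDK: the names `store_and_summarize` and `drill_down` are invented, and the FTS5 store stands in for whatever queryable backend a server would actually use.

```python
import json
import sqlite3

# In-memory FTS5 store standing in for the server's queryable backend.
db = sqlite3.connect(":memory:")
db.execute("CREATE VIRTUAL TABLE payloads USING fts5(tool, body)")

def store_and_summarize(tool_name, payload, preview_items=3):
    """Return a compact summary for the model; keep the full payload queryable."""
    db.execute("INSERT INTO payloads(tool, body) VALUES (?, ?)",
               (tool_name, json.dumps(payload)))
    db.commit()
    items = payload if isinstance(payload, list) else [payload]
    return {
        "tool": tool_name,
        "total_items": len(items),
        "preview": items[:preview_items],
        "hint": "use drill_down(query) to search the full result",
    }

def drill_down(query):
    """Full-text search over stored payloads instead of re-reading them."""
    return db.execute(
        "SELECT tool, snippet(payloads, 1, '[', ']', '…', 10) "
        "FROM payloads WHERE payloads MATCH ?", (query,)
    ).fetchall()

# A tool that would normally dump 500 records into context
# instead returns a 3-item preview and a drill-down hint.
records = [{"id": i, "name": f"note-{i}"} for i in range(500)]
summary = store_and_summarize("obsidian_list", records)
print(summary["total_items"], len(summary["preview"]))  # 500 3
```

The model only ever sees the summary dict; the 500 records stay server-side until a drill-down tool call asks for a specific slice.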
View on HN · Topics
Hmmm. I was talking about the output side. When data comes back from an MCP tool call, context-mode is still not in the loop, not able to help, is it? Edit: clarify "MCP tool"
View on HN · Topics
I dug into this further. Tested empirically and read the code. Confirmed: context-mode cannot intercept MCP tool responses. The PreToolUse hook (hooks/pretooluse.sh) matches only Bash|Read|Grep|Glob|WebFetch|WebSearch|Task. When I called my obsidian MCP's obsidian_list via MCP, the response went straight into context — zero entries in context-mode's FTS5 database. The web fetches from the same session were all indexed.

The context-mode skill (SKILL.md) actually acknowledges this at lines 71-77 with an "after-the-fact" decision tree for MCP output: if it's already in context, use it directly; if you need to search it again, save to file then index. But that's damage control — the context is already consumed. You can't un-eat those tokens.

The architectural reason: MCP tool responses flow via JSON-RPC directly to the model. There's no PostToolUse hook in Claude Code that could modify or compress a response before it enters context. And you can't call MCP tools from inside a subprocess, so the "run it in a sandbox" pattern doesn't apply.

So the 98% savings are real but scoped to built-in tools and CLI wrappers (curl, gh, kubectl, etc.) — anything replicable in a subprocess. For third-party MCP tools with unique capabilities (Excalidraw rendering, calendar APIs, Obsidian vault access), the MCP author has to apply context-mode's concepts server-side: return compact summaries, store full output queryably, expose drill-down tools. Which is essentially what you suggested above.

Still very high value for the built-in tool side. Just want the boundary to be clear. Correct any misconceptions please!
View on HN · Topics
The compression numbers look great but I keep wondering: does the model actually produce equivalent output with compressed context vs full context? Extending sessions from 30min to 3hrs only matters if reasoning quality holds up in hour 2. esafak's cache economics point is underrated. With prompt caching, verbose context that gets reused is basically free. If compression breaks cache continuity you might save tokens while spending more money. The deeper issue is that most MCP tools do SELECT * when they should return summaries with drill-down. That's a protocol design problem, not a compression problem.
View on HN · Topics
AFAIK Claude Code doesn't inject all the MCP output into the context. It caps MCP output at 25k tokens and uses bash pipe operators to read the full output. That's at least what I see in the latest version.
View on HN · Topics
Does context mode only work with MCPs? Or does it work with bash/git/npm commands as well?
View on HN · Topics
I'm not sure it actually works with MCPs *at all*, trying to get that clarified. How can context-mode get "into the MCP loop"?
View on HN · Topics
See my comment above: context-mode has no way to inject itself into the MCP tool-call/response loop. Still high value outside MCPs.