Summarizer

Subagent Architecture Benefits

Discussion of spawning subprocesses for work-oriented calls that don't pollute parent context, returning only summarized results to main thread

← Back to MCP server that reduces Claude Code context consumption by 98%

The subagent architecture focuses on preserving the "gold" of the main context window by isolating data-heavy tasks, such as tool calls and extensive log outputs, into independent subprocesses. Users advocate for returning only distilled summaries to the main thread—often covering the final answer, methodology, and lessons learned—so the primary model can reach conclusions without wading through raw data bloat. While this approach significantly enhances reasoning quality and enables better parallelization, some contributors caution that it may introduce a "slowdown penalty" for simple tasks like linting that might be faster to handle directly. Ultimately, the consensus highlights a shift toward "context purity," where sophisticated subagent routing and even small local models are used to filter information before it ever reaches the more expensive parent LLM.
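The isolation pattern described above can be sketched in a few lines: run the data-heavy work in a subprocess and let only a bounded slice of its stdout (the distilled result) reach the parent. The command and truncation limit here are illustrative assumptions, not the project's actual implementation.

```python
import subprocess

def run_isolated(cmd: list[str], max_chars: int = 2000) -> str:
    """Run a data-heavy task in a subprocess; only a bounded slice of
    stdout -- never the raw tool output -- reaches the parent context."""
    result = subprocess.run(cmd, capture_output=True, text=True, timeout=60)
    out = result.stdout.strip()
    # Keep the tail: final answers and failure messages usually come last.
    return out[-max_chars:]

print(run_isolated(["echo", "distilled summary for the parent"]))
```

The parent never sees the subprocess's intermediate tool calls or logs, only the returned string.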

9 comments tagged with this topic

View on HN · Topics
Author here. I shared the GitHub repo a few days ago ( https://news.ycombinator.com/item?id=47148025 ) and got great feedback. This is the writeup explaining the architecture. The core idea: every MCP tool call dumps raw data into your 200K context window. Context Mode spawns isolated subprocesses — only stdout enters context. No LLM calls, purely algorithmic: SQLite FTS5 with BM25 ranking and Porter stemming. Since the last post we've seen 228 stars and some real-world usage data. The biggest surprise was how much subagent routing matters — auto-upgrading Bash subagents to general-purpose so they can use batch_execute instead of flooding context with raw output. Source: https://github.com/mksglu/claude-context-mode Happy to answer any architecture questions.
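As a rough illustration of the purely algorithmic search the author describes (not the project's actual schema, which I haven't verified), SQLite's FTS5 extension with the Porter tokenizer and `bm25()` ranking can be driven entirely from Python's standard library:

```python
import sqlite3

# In-memory index approximating the described approach: an FTS5 virtual
# table using the Porter stemmer, queried with bm25() ranking.
con = sqlite3.connect(":memory:")
con.execute("CREATE VIRTUAL TABLE docs USING fts5(body, tokenize='porter')")
con.executemany(
    "INSERT INTO docs(body) VALUES (?)",
    [("connection pooling reduces latency",),
     ("the pool of database connections was exhausted",),
     ("unrelated log line about disk usage",)],
)
# bm25(docs) returns a rank (lower is better); Porter stemming lets the
# query term "pooling" also match "pool".
rows = con.execute(
    "SELECT body FROM docs WHERE docs MATCH ? ORDER BY bm25(docs)",
    ("pooling",),
).fetchall()
for (body,) in rows:
    print(body)
```

No LLM is involved at any point: ranking and stemming are deterministic, which is what keeps this step cheap enough to run on every lookup.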
View on HN · Topics
I think telling it to run those in a subagent should accomplish the same thing and ensure only the answer makes it to the main context. Otherwise you will still have some bloat from reading the exact output, although in some cases that could be good if you’re debugging or something
View on HN · Topics
Not really, because it reliably greps or searches the file for relevant info. So far I haven't seen it ever load the whole file. It might be more efficient for the main thread to have a subagent do it, but probably at a significant slowdown penalty when all I'm doing is linting or running tests. So this is probably a judgement call depending on the situation.
View on HN · Topics
I think something kinda easy for that could be to pretend that pruned output was actually done by a subagent: copy the detailed logs out, and replace them with a compacted summary.
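A minimal sketch of that idea, assuming a hypothetical list-of-dicts message format: the verbose tool output is rewritten in place with a compact stand-in, as if a subagent had returned only the summary.

```python
def compact_tool_output(messages: list[dict], max_chars: int = 200) -> list[dict]:
    """After the fact, swap verbose tool-output messages for compact
    stand-ins. The 'role'/'content' message shape here is hypothetical."""
    pruned = []
    for msg in messages:
        text = msg.get("content", "")
        if msg.get("role") == "tool" and len(text) > max_chars:
            # In a real system the full log would be archived elsewhere;
            # here we keep only the head and tail as the summary.
            summary = text[:80] + " ...[log compacted]... " + text[-80:]
            pruned.append({**msg, "content": summary})
        else:
            pruned.append(msg)
    return pruned

msgs = [{"role": "tool", "content": "x" * 1000},
        {"role": "user", "content": "short"}]
out = compact_tool_output(msgs)
```

The parent's transcript now looks as though the heavy lifting happened elsewhere, which is exactly the effect a real subagent would have had.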
View on HN · Topics
Maybe the right answer is “why not both”, but subagents can also be used for that problem. That is, when something isn’t going as expected, fork a subagent to solve the problem and return with the answer. It’s interesting to imagine a single model deciding to wipe its own memory though, and roll back in time to a past version of itself (only, with the answer to a vexing problem)
View on HN · Topics
I do this with my agents. Basically, every "work"-oriented call spawns a subprocess which does not add anything to the parent context window. When the subprocess completes the task, I ask it to 1) provide a complete answer, 2) provide a succinct explanation of how the answer was arrived at, 3) provide a succinct explanation of any attempts which did not work, and 4) note anything learned during the process which may be useful in the future. Then I feed those four answers back to the parent as if they were magically arrived at. Another thing I do for managing the context window: any tool/MCP call has its output piped into a file, and the LLM reads only parts of the file, adding them to its context only if they are sufficient. For example, when a command produces a lot of output and ultimately ends in "Success!", the LLM can just tail the last line to see if it succeeded. If it did, the rest of the output doesn't need to be read; if it failed, the failure message is usually at the end of the log. Something I'm working on now is having a smaller local model summarize the log output and feed that summarization to the more powerful LLM (I can run my local model for ~free, but it is nowhere near as capable as the cloud models). I don't keep up with SOTA, so I have no idea whether what I'm doing is well known, but it works for me and my setup.
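The pipe-to-file and tail tricks in this comment can be sketched as follows (the shell command is a stand-in for any noisy tool call):

```python
import subprocess
import tempfile

def run_tool_to_file(cmd: list[str]) -> str:
    """Pipe a tool's full output to a file instead of the context window."""
    with tempfile.NamedTemporaryFile("w", suffix=".log", delete=False) as f:
        subprocess.run(cmd, stdout=f, stderr=subprocess.STDOUT)
        return f.name

def tail(path: str, n: int = 1) -> list[str]:
    """Read only the last n lines -- enough to spot 'Success!' or the
    failure message without loading the whole log into context."""
    with open(path) as f:
        return f.readlines()[-n:]

# 1000 lines of noise end in a single status line; only that line is read.
log = run_tool_to_file(["sh", "-c", "seq 1 1000; echo Success!"])
print(tail(log))
```

If the last line isn't the expected success marker, the model can progressively tail more lines, paying for context only when something actually went wrong.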
View on HN · Topics
Do you need 80+ tools in context? Even if reduced, why not use sub agents for areas of focus? Context is gold and the more you put into it unrelated to the problem at hand the worse your outcome is. Even if you don't hit the limit of the window. Would be like compressing data to read into a string limit rather than just chunking the data
View on HN · Topics
Thanks for this. I do most of my work in subagents for better parallelization. Is it possible to have it work there? Currently the stats say subagents didn't benefit from it.
View on HN · Topics
Does the skill run in a subagent, saving context?