Summarizer

Agentic Context Management

Ideas about models managing their own context, pruning irrelevant information, backtracking failed attempts, and treating context like git branches with cherry-picking and rebasing

Comments on: MCP server that reduces Claude Code context consumption by 98%

Commenters advocate for a paradigm shift from treating context as an immutable stack to viewing it as a malleable workspace that models can proactively prune, summarize, and reorganize. By applying git-like concepts such as branching, rebasing, and "passing notes to past selves," agents can discard failed attempts or verbose logs to maintain focus and prevent the performance degradation caused by context bloat. This agentic approach to memory management often involves replacing raw data dumps with compacted summaries or searchable databases, which significantly extends session longevity while reducing token costs. Ultimately, participants see this move toward self-improving context as a vital step in making long-running agents more efficient, cost-effective, and capable of complex, multi-step reasoning.

21 comments tagged with this topic

View on HN · Topics
Nice work. It strikes me there's more low-hanging fruit to pluck re. context window management. Backtracking seems another promising direction to avoid context bloat and compaction (i.e. when a model takes a few attempts to do the right thing, once it's done the right thing, prune the failed attempts out of the context).
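The backtracking idea above can be sketched in a few lines. Everything here is hypothetical: real harnesses don't expose this schema, but assume each turn is tagged with an `attempt_id` and that turns belonging to the successful attempt carry `succeeded=True`.

```python
# Sketch of backtracking-by-pruning, assuming a hypothetical message
# schema with attempt_id tags and succeeded flags on winning attempts.

def prune_failed_attempts(messages):
    """Drop every turn belonging to an attempt that never succeeded."""
    succeeded = {m["attempt_id"] for m in messages if m.get("succeeded")}
    return [m for m in messages
            if m.get("attempt_id") is None       # ordinary turns
            or m["attempt_id"] in succeeded]     # the winning attempt

history = [
    {"role": "user", "content": "fix the bug", "attempt_id": None},
    {"role": "assistant", "content": "try A", "attempt_id": 1},
    {"role": "tool", "content": "error: tests fail", "attempt_id": 1},
    {"role": "assistant", "content": "try B", "attempt_id": 2, "succeeded": True},
    {"role": "tool", "content": "tests pass", "attempt_id": 2, "succeeded": True},
]
pruned = prune_failed_attempts(history)
# Both attempt-1 turns are gone; the user turn and attempt 2 remain.
```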
View on HN · Topics
Agree. I’d like more fine-grained control of context and compaction. If you spend time debugging in the middle of a session, once you’ve fixed the bugs you ought to be able to remove everything related to fixing them from context and continue as you had before you encountered them. (Right now, depending on your IDE, this can be quite annoying to do manually. And I’m not aware of any that allow you to snip it out if you’ve worked with the agent on other tasks afterwards.) I think agents should manage their own context too. For example, if you’re working with a tool that dumps a lot of logged information into context, those logs should get pruned out after one or two more prompts. Context should be thought of as something that can be freely manipulated, rather than a stack that can only have things appended or removed from the end.
View on HN · Topics
Oh that's quite a nice idea - agentic context management (riffing on agentic memory management). There are some challenges around the LLM having enough output tokens to easily specify what it wants its next input tokens to be, but "snips" should be expressible concisely (i.e. the next input should include everything sent previously except the chunk that starts XXX and ends YYY). The upside is tighter context, the downside is it'll bust the prompt cache (perhaps the optimal trade-off is to batch the snips).
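The marker-addressed "snip" described above (remove the chunk that starts XXX and ends YYY) can be sketched as plain string surgery; the function name and behavior here are invented for illustration, not any tool's real API.

```python
# Sketch of a "snip" op: remove the span of context bracketed by a
# start marker and an end marker. Marker-based addressing keeps the
# model's own output short -- it names the span instead of rewriting
# the whole context.

def apply_snip(context: str, start: str, end: str) -> str:
    i = context.find(start)
    j = context.find(end, i)
    if i == -1 or j == -1:
        return context  # markers not found; leave context untouched
    return context[:i] + context[j + len(end):]

ctx = "plan step 1\n<log>500 lines of noisy output</log>\nplan step 2"
trimmed = apply_snip(ctx, "<log>", "</log>")
```

Batching several snips into one rewrite, as the comment suggests, means the cached prompt prefix is invalidated once rather than per snip.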
View on HN · Topics
So I built that in my chat harness. I just gave the agent a “prune” tool and it can remove shit it doesn’t need any more from its own context. But chat is last gen.
View on HN · Topics
Yeah, the fact that we have treated context as immutable baffles me; it’s not like human working memory keeps a perfect history of everything done over the last hour. It shouldn’t be that complicated to train a secondary model that just runs online compaction, e.g.: it runs a tool call, the model determines what’s germane to the conversation and prunes the rest; or some task gets completed, so just leave a stub in the context that says "completed x", with a tool available to see the details of x if it becomes relevant again.
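The stub-plus-retrieval move described above can be sketched as follows. The names (`detail_store`, `compact`, `expand_stub`) and the message schema are all invented for illustration.

```python
# Sketch: replace a completed task's turns with a one-line stub,
# parking the full details in a side store the model can query
# through a tool if the task becomes relevant again.

detail_store: dict[str, str] = {}

def compact(messages: list[dict], task_id: str, summary: str) -> list[dict]:
    """Swap all turns tagged with task_id for a single stub message."""
    kept, removed = [], []
    for m in messages:
        (removed if m.get("task") == task_id else kept).append(m)
    detail_store[task_id] = "\n".join(m["content"] for m in removed)
    kept.append({"role": "system", "task": task_id,
                 "content": f"[completed {task_id}: {summary}; "
                            f"call expand_stub('{task_id}') for details]"})
    return kept

def expand_stub(task_id: str) -> str:
    """Tool the model can call to recover the pruned detail."""
    return detail_store.get(task_id, "no details recorded")

msgs = [{"role": "user", "content": "ship feature Y"},
        {"role": "tool", "content": "verbose log A", "task": "dbg1"},
        {"role": "tool", "content": "verbose log B", "task": "dbg1"}]
msgs = compact(msgs, "dbg1", "fixed the flaky test")
```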
View on HN · Topics
This is a partial realization of the idea, but for a long-running agent the proportion of noise increases linearly with session length; unless you take an appropriately large machete to the problem, you’re still going to wind up with suboptimal results.
View on HN · Topics
Yeah, I'd definitely like to be able to edit my context a lot more. And once you consider that, you start seeing things in your head like "select this big chunk of context and ask the model to simplify that part", or do things like fix the model trying to ingest too many tokens because it dumped in a whole file that it didn't realize was going to be as large as it was. There's about a half-dozen things like that that are immediately obviously useful.
View on HN · Topics
> I think agents should manage their own context too.

My intuition is that this should be almost trivial. If I copy/paste your long coding session into an LLM and ask it which parts can be removed from context without losing much, I'm confident that it will know to remove the debugging bits.
View on HN · Topics
I generally do this when the agent gets stuck in a test loop or whatever after I've injected some later requirement and tweaked. Once I hit a decent place, I have the agent summarize, discard the branch (it’s part of the context too!), and start with the new prompt.
View on HN · Topics
I’ve been wondering about this and just found this paper[1]: "Agentic Context Engineering: Evolving Contexts for Self-Improving Language Models". Looks interesting. [1] https://arxiv.org/html/2510.04618v1
View on HN · Topics
what i want is for the agent to initially get the full data and make the right decision based on it; then later it doesn't need to know as much about how it got there. isn't that how thinking works? intermediate tokens that then get replaced with the result?
View on HN · Topics
i think something kinda easy for that could be to pretend that pruned output was actually done by a subagent. copy the detailed logs out, and replace them with a compacted summary.
View on HN · Topics
Treat context like git SHAs. Yes, there is a specific order within a 'branch', but you should be able to do the equivalent of cherry-picking and rebasing it.
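The git analogy above can be made concrete by content-addressing context entries, so a "branch" is just an ordered list of hashes and cherry-picking is list surgery. This is purely illustrative, not any tool's real data model.

```python
# Sketch: content-address context entries like git objects. A branch
# is an ordered list of hashes; cherry-pick and rebase become cheap
# operations on those lists.
import hashlib

objects: dict[str, str] = {}

def store(text: str) -> str:
    """Hash an entry into the object store and return its short SHA."""
    sha = hashlib.sha1(text.encode()).hexdigest()[:8]
    objects[sha] = text
    return sha

main = [store("system prompt"), store("user: add feature"),
        store("assistant: failed attempt")]
fix_branch = main[:2] + [store("assistant: working fix")]

# "Rebase": rebuild main on the shared prefix, cherry-picking the
# fix commit and dropping the failed attempt.
main = main[:2] + [fix_branch[-1]]
context = "\n".join(objects[sha] for sha in main)
```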
View on HN · Topics
Maybe the right answer is “why not both”, but subagents can also be used for that problem. That is, when something isn’t going as expected, fork a subagent to solve the problem and return with the answer. It’s interesting to imagine a single model deciding to wipe its own memory though, and roll back in time to a past version of itself (only, with the answer to a vexing problem)
View on HN · Topics
I forget where now but I'm sure I read an article from one of the coding harness companies talking about how they'd done just that. Effectively it could pass a note to its past self saying "Path X doesn't work", and otherwise reset the context to any previous point. I could see this working like some sort of undo tree, with multiple branches you can jump back and forth between.
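The undo tree with "notes to a past self" described above might look like this; the `Node`/`checkpoint` structure is invented for illustration, not whatever the harness in that article actually did.

```python
# Sketch of an undo tree over context checkpoints: roll back to an
# earlier node, carrying forward only a short note about the dead end.
# Abandoned branches stay in the tree, so you can jump between them.

class Node:
    def __init__(self, messages, parent=None):
        self.messages = list(messages)
        self.parent = parent
        self.children = []

def checkpoint(node, new_messages):
    """Extend the tree with a child node containing the new turns."""
    child = Node(node.messages + new_messages, parent=node)
    node.children.append(child)
    return child

def reset_with_note(ancestor, note):
    """Reset to a past node, passing a note to the 'past self'."""
    return checkpoint(ancestor, [{"role": "system", "content": note}])

root = Node([{"role": "user", "content": "solve X"}])
attempt = checkpoint(root, [{"role": "assistant", "content": "long failed path"}])
retry = reset_with_note(root, "note to self: path A doesn't work")
```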
View on HN · Topics
I've been running https://github.com/rtk-ai/rtk for a week; it seems to be a good balance between culling out of context and not just killing everything. I've been running https://github.com/Opencode-DCP/opencode-dynamic-context-pru... in opencode as well. It seems more aggressive.
View on HN · Topics
This sounds a little bit like rtk? Which trims output from other CLI applications like git, find, and the most common tools used by Claude. This looks like it goes a little further, which is interesting. I see some of these AI companies adopting some of these ideas sooner or later. Trim the tokens locally to save on token usage. https://github.com/rtk-ai/rtk
View on HN · Topics
Haven't looked at rtk closely but from the description it sounds like it works at the CLI output level, trimming stdout before it reaches the model. Context-mode goes a bit further since it also indexes the full output into a searchable FTS5 database, so the model can query specific parts later instead of just losing them. It's less about trimming and more about replacing a raw dump with a summary plus on-demand retrieval.
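The summary-plus-retrieval pattern described above can be sketched with SQLite's built-in FTS5 extension (present in standard Python builds, though not guaranteed everywhere); the table layout and summary string here are made up, not context-mode's actual schema.

```python
# Sketch: index raw tool output into an FTS5 table, keep only a short
# summary in context, and let the model search the full output later.
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE VIRTUAL TABLE output USING fts5(tool, line)")

raw = ["INFO build started", "WARN deprecated flag", "ERROR link failed: libfoo"]
db.executemany("INSERT INTO output VALUES (?, ?)", [("make", l) for l in raw])

# Only this line stays in the model's context:
in_context = "[make: 3 lines, 1 error; search the output table for details]"

# Later, the model retrieves just the relevant part on demand:
hits = db.execute(
    "SELECT line FROM output WHERE output MATCH ?", ("error OR failed",)
).fetchall()
```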
View on HN · Topics
I’m also trying to see which one makes more sense. Discussion about rtk started today: https://news.ycombinator.com/item?id=47189599
View on HN · Topics
Would be interested to know if this architecture facilitates dynamic context injection from external knowledge sources without inflating the payload again.
View on HN · Topics
That's the theory and it does hold up in practice. When context is 70% raw logs and snapshots, the model starts losing track of the actual task. We haven't run formal benchmarks on answer quality yet, mostly focused on measuring token savings. But anecdotally the biggest win is sessions lasting longer before compaction kicks in, which means the model keeps its full conversation history and makes fewer mistakes from lost context.