Summarizer

LLM Input

llm/9b2efe03-4d9e-4db2-a79a-13cee83b17d6/topic-3-f231aa39-8e02-4991-aeff-d2bc880ed112-input.json

Pretty-print

prompt

The following is content for you to summarize. Do not respond to the comments—summarize them.

<topic>
Agentic Context Management # Ideas about models managing their own context, pruning irrelevant information, backtracking failed attempts, and treating context like git branches with cherry-picking and rebasing
</topic>

<comments_about_topic>
1. Nice work.

It strikes me there's more low hanging fruit to pluck re. context window management. Backtracking strikes me as another promising direction to avoid context bloat and compaction (i.e. when a model takes a few attempts to do the right thing, once it's done the right thing, prune the failed attempts out of the context).

2. Agree. I’d like more fine grained control of context and compaction. If you spend time debugging in the middle of a session, once you’ve fixed the bugs you ought to be able to remove everything related to fixing them out of context and continue as you had before you encountered them. (Right now depending on your IDE this can be quite annoying to do manually. And I’m not aware of any that allow you to snip it out if you’ve worked with the agent on other tasks afterwards.)

I think agents should manage their own context too. For example, if you’re working with a tool that dumps a lot of logged information into context, those logs should get pruned out after one or two more prompts.

Context should be thought of something that can be freely manipulated, rather than a stack that can only have things appended or removed from the end.

3. Oh that's quite a nice idea - agentic context management (riffing on agentic memory management).

There's some challenges around the LLM having enough output tokens to easily specify what it wants its next input tokens to be, but "snips" should be able to be expressed concisely (i.e. the next input should include everything sent previously except the chunk that starts XXX and ends YYY). The upside is tighter context, the downside is it'll bust the prompt cache (perhaps the optimal trade-off is to batch the snips).

4. So I built that in my chat harness. I just gave the agent a “prune” tool and it can remove shit it doesn’t need any more from its own context. But chat is last gen.

5. Yeah, the fact that we have treated context as immutable baffles me, it’s not like humans working memory keeps a perfect history of everything they’ve done over the last hour, it shouldn’t be that complicated to train a secondary model that just runs online compaction, eg: it runs a tool call, the model determines what’s Germaine to the conversion and prunes the rest, or some task gets completed, ok just leave a stub in the context that says completed x, with a tool available to see the details of x if it becomes relevant again.

6. This is a partial realization of the idea, but, for a long running agent the proportion of noise increases linearly with the session length, unless you take an appropriately large machete to the problem you’re still going to wind up with sub optimal results.

7. Yeah, I'd definitely like to be able to edit my context a lot more. And once you consider that you start seeing things in your head like "select this big chunk of context and ask the model to simply that part", or do things like fix the model trying to ingest too many tokens because it dumped a whole file in that it didn't realize was going to be as large as it was. There's about a half-dozen things like that that are immediately obviously useful.

8. > I think agents should manage their own context too.

My intuition is that this should be almost trivial. If I copy/paste your long coding session into an LLM and ask it which parts can be removed from context without losing much, I'm confident that it will know to remove the debugging bits.

9. I generally do this when I arrive at the agent getting stuck at a test loop or whatever after injecting some later requirement in and tweaking. Once I hit a decent place I have the agent summarize, discard the branch (it’s part of the context too!) and start with the new prompt

10. I’ve been wondering about this and just found this paper[1]: Agentic Context Engineering: Evolving Contexts for Self-Improving Language Models

Looks interesting.

[1] https://arxiv.org/html/2510.04618v1

11. what i want is for the agent to initially get the full data and make the right decision based on it, then later it doesnt need to know as much about how it got there.

isnt that how thinking works? intermediate tokens that then get replaced with the reuslt?

12. i think something kinda easy for that could be to pretend that pruned output was actually done by a subagent. copy the detailed logs out, and replace it with a compacted summary.

13. Treat context like git shas. Yes, there is a specific order within a 'branch' but you should be able to do the equivalent of cherry-picking and rebasing it

14. Maybe the right answer is “why not both”, but subagents can also be used for that problem. That is, when something isn’t going as expected, fork a subagent to solve the problem and return with the answer.

It’s interesting to imagine a single model deciding to wipe its own memory though, and roll back in time to a past version of itself (only, with the answer to a vexing problem)

15. I forget where now but I'm sure I read an article from one of the coding harness companies talking about how they'd done just that. Effectively it could pass a note to its past self saying "Path X doesn't work", and otherwise reset the context to any previous point.

I could see this working like some sort of undo tree, with multiple branches you can jump back and forth between.

16. I've been running https://github.com/rtk-ai/rtk for a week seems to be a good balance between culling out of context and not just killing everything. I've been running https://github.com/Opencode-DCP/opencode-dynamic-context-pru... in opencode as well. It seems more aggressive.

17. This sounds a little bit like rkt? Which trims output from other CLI applications like git, find and the most common tools used by Claude. This looks like it goes a little further which is interesting.

I see some of these AI companies adopting some of these ideas sooner or later. Trim the tokens locally to save on token usage.

https://github.com/rtk-ai/rtk

18. Haven't looked at rtk closely but from the description it sounds like it works at the CLI output level, trimming stdout before it reaches the model. Context-mode goes a bit further since it also indexes the full output into a searchable FTS5 database, so the model can query specific parts later instead of just losing them. It's less about trimming and more about replacing a raw dump with a summary plus on-demand retrieval.

19. I’m also trying to see which one makes more sense. Discussion about rtk started today: https://news.ycombinator.com/item?id=47189599

20. Would be interested to know if this architecture facilitates dynamic context injection from external knowledge sources without inflating the payload again.

21. That's the theory and it does hold up in practice. When context is 70% raw logs and snapshots, the model starts losing track of the actual task. We haven't run formal benchmarks on answer quality yet, mostly focused on measuring token savings. But anecdotally the biggest win is sessions lasting longer before compaction kicks in, which means the model keeps its full conversation history and makes fewer mistakes from lost context.
</comments_about_topic>

Write a concise, engaging paragraph (3-5 sentences) summarizing the key points and perspectives in these comments about the topic. Focus on the most interesting viewpoints. Do not use bullet points—write flowing prose.

topic

Agentic Context Management # Ideas about models managing their own context, pruning irrelevant information, backtracking failed attempts, and treating context like git branches with cherry-picking and rebasing

commentCount

← Back to job