Context Management Complexity

Discussion of Claude Code as primarily a context management tool, challenges of maintaining session quality over time, compaction strategies, and frustration when context is silently degraded

Users are increasingly frustrated by Claude Code’s "silent context degradation," where reasoning history is pruned after idle periods to save tokens, often leaving sessions feeling "lobotomized" without adequate warning. While some argue that developers should take responsibility for their own project memory through external files, others contend that invisible pruning violates the tool's core contract, forcing users to choose between expensive cache misses and unpredictable quality drops. The discussion highlights a growing rift between power users who demand granular control over context variables and those who want more sophisticated, transparent automation that doesn't compromise the "fine-tuned" feel of a long session. Ultimately, the community is calling for improved UI feedback and documentation, as many find that recent updates have prioritized token efficiency over the stable, long-horizon reasoning required for complex engineering tasks.

View on HN · Topics

I tried to hack the statusline to show this but when i tried, i don't think the api gave that info. I'd love if they let us have more variables to access in the statusline.

View on HN · Topics

CC can explain it clearly, which how I learned about how the inference stack works.

View on HN · Topics

> 99.99% of users won't even understand the words that are being used.

That's a bad estimate. Claude Code is explicitly a developer shaped tool, we're not talking generically ChatGPT here, so my guess is probably closer to 75% of those users do understand what caching is, with maybe 30% being able to explain prompt caching actually is. Of course, those users that don't understand have access to Claude and can have it explain what caching is to them if they're interested.

View on HN · Topics

I somewhat disagree that this is due diligence. Claude Code abstracts the API, so it should abstract this behavior as well, or educate the user about it.

View on HN · Topics

I would say this is abstracting the behavior.

View on HN · Topics

There are general users of the average SaaS, and there are claude code users. There's no doubt in my mind that our expectations should be somewhat higher for CC users re: memory. I'm personally not completely convinced that cache eviction should be part of their thought process while using CC, but it's not _that_ much of a stretch.

View on HN · Topics

Personally I've never thought about cache eviction as it pertains to CC. It's just not something that I ever needed to think about. Maybe I'm just not a power user but I just use the product the way I want to and it just works.

View on HN · Topics

It is more useful to read posts and threads like this exact thread IMO. We can't know everything, and the currently addressed market for Claude Code is far from people who would even think about caching to begin with.

View on HN · Topics

Instead of just dropping all the context, the system could also run a compaction (summarizing the entire convo) before dropping it. Better to continue with a summary than to lose everything.

View on HN · Topics

There's problems with this approach as well I've found

I'm really beginning to feel the lack of control when it's comes to context if I'm being honest

View on HN · Topics

OpenAI does this for all API calls

> Our systems will smartly ignore any reasoning items that aren’t relevant to your functions, and only retain those in context that are relevant. You can pass reasoning items from previous responses either using the previous_response_id parameter, or by manually passing in all the output items from a past response into the input of a new one.

https://developers.openai.com/api/docs/guides/reasoning

Disclosure - work on AI@msft

View on HN · Topics

The trace goes back fine, that's not the issue.

The issue is that if they send the full trace back, it will have to be processed from the start if the cache expired, and doing that will cause a huge one-time hit against your token limit if the session has grown large.

So what Boris talked about is stripping things out of the trace that goes back to regenerate the session if the cache expires. Doing this would help avert burning up the token limit, but it is technically a different conversation, so if CC chooses poorly on stripping parts of the context then it would lead to Claude getting all scatter-brained.

View on HN · Topics

The blog post says they prune them now not to charge you. That’s the change they implemented.

View on HN · Topics

right. they were charging you for it, now they aren't because they are just dropping your conversation history.

View on HN · Topics

> There would be nothing lost if they said „If you click yes, we will prune your old thinking making Claude faster and saving you tons of tokens“. Most people would say yes probably so why not ask them

The irony is that Claude Design does this. I did a big test building a design system, and when I came back to it, it had in the chat window "Do you need all this history for your next block of work? Save 120K tokens and start a new chat. Claude will still be able to use the design system." Or words to that effect.

View on HN · Topics

This is exactly what also confused me. I had the exact same prompt in Claude code as well, and the no option implies you can also keep the whole history. But clicking keep apparently only ever kept the user and assistant messages not the whole actual thinking parts of the conversation

View on HN · Topics

Why cant you just build a project document that outlines that prompt that you want to do? Or have claude save your progress in memory so you can pick it up later? Thats what I do. It seems abhorrent to expect to have a running prompt that left idle for long periods of time just so you can pick up at a moments whim...

View on HN · Topics

You know that memory goes back into a prompt as context that wasn't cached, so... that just adds work.

Granted, the "memory" can be available across session, as can docs...

View on HN · Topics

recursive-mode does just that: https://recursive-mode.dev/introduction

View on HN · Topics

Not at the moment apparently. They remove the thinking messages when you continue after 1 hour. That was the whole idea of that change. So the LLM gets all your messages, its responses etc but not the thinking parts, why it generated that responses. You get a lobotomised session.

View on HN · Topics

This violates the principle of least surprise, with nothing to indicate Claude got lobotomized while it napped when so many use prior sessions as "primed context" (even if people don't know that's what they were doing or know why it works).

The purpose of spending 10 to 50 prompts getting Claude to fill the context for you is it effectively "fine tunes" that session into a place your work product or questions are handled well.

// If this notion of sufficient context as fine tune seems surprising, the research is out there.)

Approaches tried need to deal with both of these:

1) Silent context degradation breaks the Pro-tool contract. I pay compute so I don't pay in my time; if you want to surface the cost, surface it (UI + price tag or choice), don't silently erode quality of outcomes.

2) The workaround (external context files re-primed on return) eats the exact same cache miss, so the "savings" are illusory — you just pushed the cost onto the user's time. If my own time's cheap enough that's the right trade off, I shouldn't be using your machine.

View on HN · Topics

I got exactly this warning message yesterday, saying that it could use up a significant amount of my token budget if I resumed the conversation without compaction.

View on HN · Topics

Compaction wont save you, in fact calling compaction will eat about 3-5x the cold cache cost in usage ive found.

View on HN · Topics

Wouldn't it help if the system did compaction before the eviction happens? But the problem is that Claude probably don't want to automatically compact all sessions that have been left idle for one hour (and very likely abandoned already), that would probably introduce even more additional costs.

Maybe the UI could do that for sessions that the user hasn't left yet, when the deadline comes near.

View on HN · Topics

I saw that too, but that's actually even worse on cache - the entire conversation is then a cache miss and needs to be loaded in in order to do the compaction. Then the resulting compacted conversation is also a cache miss.

You ideally want to compact before the conversation is evicted from cache. If you knew you were going to use the conversation again later after cache expiry, you might do this deliberately before leaving a session.

Anthropic could do this automatically before cache expiry, though it would be hard to get right - they'd be wasting a lot of compute compacting conversations that were never going to be resumed anyway.

View on HN · Topics

> I think the best option would be tell a user who is about to resurrect a conversation that has been evicted from cache that the session is not cached anymore and the user will have to face a full cost of replaying a session

This feature has been live for a few days/weeks now, and with that knowledge I try remember to a least get a process report written when I'm for example close to the quota limit and the context is reasonably large. Or continue with a /compact, but that tends to lead to be having to repeat some things that didn't get included in the summary. Context management is just hard.

View on HN · Topics

Have the tool maintain a doc, and use either the built-in memory or (I prefer it this way) your own. I've been pretty critical of some other aspects of how Claude Code works but on this one I think they're doing roughly the right thing given how the underlying completion machinery works.

Edit: If you message me I can share some of my toolchain, it's probably similar to what a lot of other people here use but I've done some polishing recently.

View on HN · Topics

> Adding an in-product tip to recommend running /clear when re-visiting old conversations (we shipped a few iterations of this)

I feel like I'm missing something here. Why would I revisit an old conversation only to clear it?

To me it sounds like a prompt-cache miss for a big context absolutely needs to be a per-instance warning and confirmation. Or even better a live status indicating what sending a message will cost you in terms of input tokens.

View on HN · Topics

Yep, agree. We added a little "/clear to save XXX tokens" notice in the bottom right, and will keep iterating on this. Thanks for being an early user!

View on HN · Topics

Then you need to update your documentation and teach claude to read the new documentation because here is what claude code answered:

Question: Hey claude, if we have a conversation, and then i take a break. Does it change the expected output of my next answer, if there are 2 hours between the previous message end the next one?

Answer: No. A 2-hour gap doesn't change my output. I have no internal clock between messages — I only see the conversation content plus the currentDate context injected each turn. The prompt cache may expire (5 min TTL), which affects
cost/latency but not the response itself.

The only things that can change output across a break: new context injected (like updated date), memory files being modified, or files on disk changing.

-- This answer directly contradict your post. It seems like the biggest problem is a total lack of documentation for expected behavior.

A similar thing happens if I ask claude code for the difference between plan mode, and accept edits on.

Then Claude told me the only difference was that with plan mode it would ask for permission before doing edits. But I really don't think this is true. It seems like plan mode does a lot more work, and present it in a total different way. It is not just a "I will ask before applying changes" mode.

View on HN · Topics

I think thats a bad idea. It seems like expecting to have a prompt open like this, accumulating context puts a load on the back end. Its one of those things that is a bad habit. Like trying to maintain open tabs in a browser as a way to keep your work flow up to date when what you really should be doing is taking notes of your process and working from there.

I have project folders/files and memory stored for each session, when I come back to my projects the context is drawn from the memory files and the status that were saved in my project md files.

Create a better workflow for your self and your teams and do it the right way. Quick expect the prompt to store everything for you.

For the Claude team. If you havent already, I'd recommend you create some best practices for people that don't know any better, otherwise people are going to expect things to be a certain way and its going to cause a lot of friction when people cant do what the expect to be able to do.

View on HN · Topics

How does the Claude team recommend devs use Claude Code?

1) Is it okay to leave Claude Code CLI open for days?

2) Should we be using /clear more generously? e.g., on every single branch change, on every new convo?

View on HN · Topics

reasonably, if i'm in an interactive session, its going to have breaks for an hour or more.

whats driving the hour cache? shouldnt people be able to have lunch, then come back and continue?

are you expecting claude code users to not attend meetings?

I think product-wise you might need a better story on who uses claude-code, when and why.

Same thing with session logs actually - i know folks who are definitely going to try to write a yearly RnD report and monthly timesheets based on text analysis of their claude code session files, and they're going to be incredibly unhappy when they find out its all been silently deleted

View on HN · Topics

Isn't that exactly what people had been accusing Anthropic of doing, silently making Claude dumber on purpose to cut costs? There should be, at minimum, a warning on the UI saying that parts of the context were removed due to inactivity.

View on HN · Topics

Why not automatically run a compaction close to the 1-hour mark? Then the cache miss won’t have such a bad impact.

View on HN · Topics

That is understandable, but the issue is the sudden drop in quality and the silent surge in token usage.

It also seems like the warning should be in channel and not on X. If I wanted to find out how broken things are on X, I'd be a Grok user.

View on HN · Topics

Wow so that's why you did #2? The explanation in the CLI is really not clear. I thought it was just a suggestion to compact, no idea it was way more expensive than if I hadn't left it idle for an hour.

You guys really need to communicate that better in the CLI for people not on social

View on HN · Topics

So you made this change completely invisible to the user, without the user being able to choose between the two behaviors, and without even documenting it in the (extremely verbose) changelog [1]? I can't find it, the Docs Assistant can't find it (well, it "I found it!" three times being fed your reply with a non-matching item).

I frequently debug issues while keeping my carefully curated but long context active for days. Losing potentially very important context while in the middle of a debugging session resulting in less optimal answers, is costing me a lot more money than the cache misses would.

In my eyes, Claude Code is mainly a context management tool . I build a foundation of apparent understanding of the problem domain, and then try to work towards a solution in a dialogue. Now you tell me Anthrophic has been silently breaking down that foundation without telling me, wasting potentially hours of my time.

It's a clear reminder that these closed-source harnesses cannot be trusted (now or in the future), and I should find proper alternatives for Claude Code as soon as possible.

[1] https://code.claude.com/docs/en/changelog

View on HN · Topics

2. could you bring back the _compact and accept plan_? even if it is not the default option.

View on HN · Topics

Wow, I always thought the context is always stored locally and this is something I have control over.

Glad I use kiro-cli which doesn't do this.

View on HN · Topics

Same here. 4.6 was a downgrade in thinking quality, but I appreciated the extend context at first.

Over time, I realized the extended context became randomly unreliable. That was worse to me than having to compact and know where I was picking up.

View on HN · Topics

I frequently see it reference points that it made and then added to its memory as if they were my own assertions. This creates a sort of self-reinforcing loop where it asserts something, “remembers” it, sees the memory, builds on that assertion, etc., even if I’ve explicitly told it to stop.

View on HN · Topics

I often have Claude commit and pr; on the last week I've seen several instances of it deciding to do extra work as part of the commit. It falls over when it tries to 'git add', but it got past me when I was trying auto mode once

View on HN · Topics

I also think some of this stems from the default 1m context window. Performance starts to degrade when context size increases, and each token over (i think the level is) 400k counts more towards your usage limit. Defaulting to 1m context size, if people arent carefully managing context (which they shouldnt ever have to in an ideal world), they would notice somewhat degraded performance and increased token usage regardless.

View on HN · Topics

I have to agree with OP, in my experience it is usually more productive to start over than to try correcting output early on. deeper into a project and it gets a bit harder to pull off a switch. I sometimes fork my chats before attempting to make a correction so that I can resume the original just in case (yes, I know you can double-tap Esc but the restoration has failed for me a few times in the past and now I generally avoid it)

View on HN · Topics

I can't remember what the technique is called, but back in the GPT 4 days there was a paper published about having a number of attempts at responding to a prompt and then having a final pass where it picks the best one. I believe this is part of how the "Pro" GPT variant works, and Cursor also supports this in a way (though I'm not sure if the auto pick best one at the end is part of it - never tried)

View on HN · Topics

Having a "Recovery Mode"/"Safe Boot" flag to disable our configurations (or progressively enable) to see how claude code responds would be nice. Sometimes I get worried some old flag I set is breaking things. Maybe the flag already exists? I tried Claude doctor but it wasn't quite the solution.

For instance:

Is Haiku supposed to hit a warm system-prompt cache in a default Claude code setup?

I had `DISABLE_TELEMETRY=1` in my env and found the haiku requests would not hit a warm-cached system prompt. E.g. on first request just now w/ most recent version (v2.1.118, but happened on others):

w/ telemetry off - input_tokens:10 cache_read:0 cache_write:28897 out:249

w/ telemetry on - input_tokens:10 cache_read:24344 cache_write:7237 out:243

I used to think having so many users was leading to people hitting a lot of edge cases, 3 million users is 3 million different problems. Everyone can't be on the happy path. But then I started hitting weird edge cases and started thinking the permutations might not be under control.

View on HN · Topics

Off topic, but I'm hoping you'll maybe see this. There's been an issue with the VS code extension that makes it pretty much impossible to use (PreToolUse can't intercept permission requests anymore, using PermissionRequest hooks always open the diff viewer and steals focus):

https://github.com/anthropics/claude-code/issues/36286
https://github.com/anthropics/claude-code/issues/25018

View on HN · Topics

I see some anthropic claude code people are reading the comments. A day or two ago I watched a video by theo t3.gg on whether claude got dumber. Even though he was really harsh on anthropic and said some mean stuff. I thought some of the points he was raising about claude code was quite apt. Especially when it comes to the harness bloat. I really hope the new features now stop and there is a real hard push for polish and optimization. Otherwise I think a lot of people will start exploring less bloated more optimized alternatives. Focus on making the harness better and less token consuming.

https://youtu.be/KFisvc-AMII?is=NskPZ21BAe6eyGTh

View on HN · Topics

I find OpenCode vastly superior. Only thing missing is Vim mode but I saw a fork that someone implemented it. I really like being able to click on a previous message I sent to revert to that point in the conversation. You can revert in CC by pressing Escape twice but the “menu” it takes you to for picking the message is terrible because it only shows your messages. Also, expanding subagent/tools/thinking/etc. blocks is super intuitive in OpenCode whereas CC’s view when you press CTRL+O is also terrible and hard to understand at first glance.

View on HN · Topics

I've got to do some cleanup before sharing (yay vibe coding) but the big things I've changed so far:

1) Curated a set of models I like and heavily optimized all possible settings, per agent role and even per skill (had to really replumb a lot of stuff to get it as granular as I liked)

2) Ported from sqlite to postgresql, with heavily extended schema. I generate embeddings for everything, so every aspect of my stack is a knowledge graph that can be vector searched. Integrated with a memory MCP server and auditing tools so I can trace anything that happens in the stack/cluster back to an agent action and even thinking that was related to the action. It really helps refine stuff.

3) Tight integration of Gitea server, k3s with RBAC (agents get their own permissions in the cluster), every user workspace is a pod running opencode web UI behind Gitea oauth2.

4) Codified structure of `/projects/<monorepo>/<subrepos>` with simpler browserso non-technical family members can manage their work easier (agents handle all the management and there are sidecars handling all gitops transparent to the user)

5) Transparent failover across providers with cooldown by making model definitions linked lists in the config, so I can use a handful of subscriptions that offer my favorite models, and fail over from one to the next as I hit quota/rate limits. This has really cut my bill down lately, along with skipping OpenRouter for my favorite models and going direct to Alibaba and Xiaomi so I can tailor caching and stuff exactly how I want.

6) Integrated filebrowser, a fork of the Milkdown Crepe markdown editor, and codemirror editor so I don't even need an IDE anymore. I just work entirely from OpenCode web UI on whatever device is nearest at the moment. I added support for using Gemma 4 local on CPU from my phone yesterday while waiting in line at a store yesterday.

Those are the big ones off the top of my head. Im sure there's more. I've probably made a few hundred other changes, it just evolves as I go.

View on HN · Topics

The solution IMO is to switch to an agent harness wrapper solution that uses CLI-wrapping or ACP to connect to different coding agents. This is the only way that works across OpenAI, Claude and Gemini.

There are a few out there (latest example is Zed's new multi-agent UI), but they still rely on the underlying agent's skill and plugin system. I'm experimenting with my own approach that integrates a plugin system that can dynamically change the agent skillset & prompts supplied via an integrated MCP server, allowing you to define skills and workflows that work regardless of the underlying agent harness.

View on HN · Topics

And the reason why Claude Code is so buggy ...

https://techtrenches.dev/p/the-snake-that-ate-itself-what-cl...

View on HN · Topics

I apologize for the potato quality of these links, however, I have been working tirelessly to wrap my head how to reason about how agents and LLM models work. They are more than just a black box.

The first tries to answer what happens when I give the models harder and harder arithmetic problems to the point Sonnet will burn 200k tokens for 20minutes. [0]

The other is a very deep dive into the math of a reasoning model in the only way I could think to approach it, with data visualizations, seeing the computation of the model in real time in relation to all the parts.[1]

Two things I've learned are that the behavior of an agent that will reverse engineer any website and the behavior of an agent that does arithmetic are the same. Which means the probability that either will solve their intended task is the same for the given agent and task -- it is a distribution. The other, is that models have a blind spot, therefore creating a red team adversary bug hunter agent will not surface a bug if the same model originally wrote the code.

Understanding that, knowing that I can verify at the end or use majority of votes (MoV), using the agents to automate extremely complicated tasks can be very reliable with an amount of certainty.

[0] https://adamsohn.com/reliably-incorrect/

[1] https://adamsohn.com/grpo/

View on HN · Topics

My biggest problem with CC as a harness is that I can't trust "Plan" mode. Long running sessions frequently start bypassing plan mode and executing, updating files and stuff, without permission, while still in plan mode. And the only recovery seems to be to quit and reload CC.

Right now my solution is to run CC in tmux and keep a 2nd CC pane with /loop watching the first pane and killing CC if it detects plan mode being bypassed. Burning tokens to work around a bug.

View on HN · Topics

Here's one person's feedback. After the release of 4.7, Claude became unusable for me in two ways: frequent API timeouts when using exactly the same prompts in Claude Code that I had run problem-free many times previously, and absurdly slow interface response in Claude Cowork. I found a solution to the first after a few days (add "CLAUDE_STREAM_IDLE_TIMEOUT_MS": "600000" to settings.json), but as of a few hours ago Cowork--which I had thought was fantastic, by the way--was still unusable despite various attempts to fix it with cache clearing and other hacks I found on the web.

View on HN · Topics

Opus 4.7 is very rough to work with. Specifically for long-horizon (we were told it was trained specifically for this and less handholding).

I don't have trust in it right now. More regressions, more oversights, it's pedantic and weird ways. Ironically, requires more handholding.

Not saying it's a bad model; it's just not simple to work with.

for now: `/model claude-opus-4-6[1m]` (youll get different behavior around compaction without [1m])

View on HN · Topics

I’ve stuck to the non-1M context Opus 4.6 and it works really well for me, even with on-going context compression. I honestly couldn’t deal with the 1M context change and then the compounding token devouring nonsense of 4.7
I sincerely hope Anthropic is seeing all of this and taking note. They have their work cut out for them.

View on HN · Topics

As an end-user, I feel like they're kind of over-cooking and under-describing the features and behavior of what is a tool at the end of the day. Today the models are in a place where the context management, reasoning effort, etc. all needs to be very stable to work well.

The thing about session resumption changing the context of a session by truncating thinking is a surprise to me, I don't think that's even documented behavior anywhere?

It's interesting to look at how many bugs are filed on the various coding agent repos. Hard to say how many are real / unique, but quantities feel very high and not hard to run into real bugs rapidly as a user as you use various features and slash commands.

View on HN · Topics

It's incredibly frustrating when I've spelled out in CLAUDE.md that it should SSH to my dev server to investigate things I ask it to and it regularly stops working with a message of something like:

> Next steps are to run `cat /path/to/file` to see what the contents are

Makes me want to pull my hair out. I've specifically told you to go do all the read-only operations you want out on this dev server yet it keeps forgetting and asking me to do something it can do just fine (proven by it doing it after I "remind" it).

That and "Auto" mode really are grinding my gears recently. Now, after a Planing session my only option is to use Auto mode and I have to manually change it back to "Dangerously skip permissions". I think these are related since the times I've let it run on "Auto" mode is when it gives up/gets stuck more often.

Just the other day it was in Auto mode (by accident) and I told it:

> SSH out to this dev server, run `service my_service_name restart` and make sure there are no orphans (I was working on a new service and the start/stop scripts). If there are orphans, clean them up, make more changes to the start/stop scripts, and try again.

And it got stuck in some loop/dead-end with telling I should do it and it didn't want to run commands out on a "Shared Dev server" (which I had specifically told it that this was not a shared server).

The fact that Auto mode burns more tokens _and_ is so dumb is really a kick in the pants.

View on HN · Topics

for me at least, yes. just wrote it to coworkers this afternoon. Behaves way more "stable" in terms of quality and i don't have the feeling of the model getting way worse after 100k tokens of context or so.

What i notice: after 300k there's some slight quality drop, but i just make sure to compact before that threshold.

Summarizer