Cache Expiration Concerns

Users frustrated by one-hour cache timeout clearing thinking tokens, affecting workflows where sessions sit idle during lunch, meetings, or multi-day projects. Many argue this should be user-configurable with clear UI indicators showing cache state

Users are pushing back against the one-hour cache timeout, arguing that such a brief window ignores standard human workflows like lunch breaks and meetings while effectively "lobotomizing" the model by stripping critical reasoning context. While developers defend the policy as a necessary optimization to prevent massive, unexpected token hits during session resumes, frustrated contributors describe the "silent" degradation of intelligence as a breach of trust that forces them to choose between wasted time or exhausted usage quotas. To resolve this friction, many suggest implementing clear UI indicators—such as expiration timers or cache-state widgets—or providing an "ultra-resume" option that allows power users to pay a premium to keep their long-term context "warm." Ultimately, the consensus highlights a growing demand for transparency, with users insisting that technical optimizations should never come at the expense of documented, predictable model behavior.

View on HN · Topics

"On March 26, we shipped a change to clear Claude's older thinking from sessions that had been idle for over an hour, to reduce latency when users resumed those sessions. A bug caused this to keep happening every turn for the rest of the session instead of just once, which made Claude seem forgetful and repetitive. We fixed it on April 10. This affected Sonnet 4.6 and Opus 4.6"

This makes no sense to me. I often leave sessions idle for hours or days and use the capability to pick it back up with full context and power.

The default thinking level seems more forgivable, but the churn in system prompts is something I'll need to figure out how to intentionally choose a refresh cycle.

View on HN · Topics

Hey, Boris from the Claude Code team here.

Normally, when you have a conversation with Claude Code, if your convo has N messages, then (N-1) messages hit prompt cache -- everything but the latest message.

The challenge is: when you let a session idle for >1 hour, when you come back to it and send a prompt, it will be a full cache miss, all N messages. We noticed that this corner case led to outsized token costs for users. In an extreme case, if you had 900k tokens in your context window, then idled for an hour, then sent a message, that would be >900k tokens written to cache all at once, which would eat up a significant % of your rate limits, especially for Pro users.

We tried a few different approaches to improve this UX:

1. Educating users on X/social

2. Adding an in-product tip to recommend running /clear when re-visiting old conversations (we shipped a few iterations of this)

3. Eliding parts of the context after idle: old tool results, old messages, thinking. Of these, thinking performed the best, and when we shipped it, that's when we unintentionally introduced the bug in the blog post.

Hope this is helpful. Happy to answer any questions if you have.

View on HN · Topics

I appreciate the reply, but I was never under the impression that gaps in conversations would increase costs nor reduce quality. Both are surprising and disappointing.

I feel like that is a choice best left up to users.

i.e. "Resuming this conversation with full context will consume X% of your 5-hour usage bucket, but that can be reduced by Y% by dropping old thinking logs"

View on HN · Topics

That doesn’t make sense to pay more for cache warming. Your session for the most part is already persisted. Why would it be reasonable to pay again to continue where you left off at any time in the future?

View on HN · Topics

I’m coming at this as a complete Claude amateur, but caching for any other service is an optimisation for the company and transparent for the user. I don’t think I’ve ever used a service and thought “oh there’s a cache miss. Gotta be careful”.

I completely agree that it’s infeasible for them to cache for long periods of time, but they need to surface that information in the tools so that we can make informed decisions.

View on HN · Topics

> I was never under the impression that gaps in conversations would increase costs

The UI could indicate this by showing a timer before context is dumped.

View on HN · Topics

a countdown clock telling you that you should talk to the model again before your streak expires? that's the kind of UX i'd expect from an F2P mobile game or an abandoned shopping cart nag notification

View on HN · Topics

Well sure if you put it that way, they're similar. But it's either you don't see it and you get surprised by increased quota usage, or you do see it and you know what it means. Bonus points if they let you turn it off.

No need to gamify it. It's just UI.

View on HN · Topics

Plenty of room for a middle ground, like a static timestamp per session that shows expiration time, without the distraction of a constantly changing UI element.

View on HN · Topics

Why not an automated ping message that's cheap for the model to respond to?

View on HN · Topics

Because the cache is held on anthropics side, and they aren't going to hold your context in cache indefinitely.

View on HN · Topics

Yes!!
A UI widget that shows how far along on the prompt cache eviction timelines we are would be great.

View on HN · Topics

That sounds stressful.

But perhaps Claude Code could detect that you're actively working on this stuff (like typing a prompt or accessing the files modified by the session), and send keep-cache-alive pings based on that? Presumably these pings could be pretty cheap, as the kv-cache wouldn't need to be loaded back into VRAM for this. If that would work reliably, cache expiry timeouts could be more aggressive (5 min instead of an hour).

View on HN · Topics

I tried to hack the statusline to show this but when i tried, i don't think the api gave that info. I'd love if they let us have more variables to access in the statusline.

View on HN · Topics

I too would far rather bear a token cost than have my sessions rot silently beneath my feet. I usually have ~5 running CC sessions, some of which I may leave for a week or two of inactivity at a time.

View on HN · Topics

How do you do "due diligence" on an API that frequently makes undocumented changes and only publishes acknowledgement of change after users complain?

You're also talking about internal technical implementations of a chat bot. 99.99% of users won't even understand the words that are being used.

View on HN · Topics

I use CC, and I understand what caching means.

I have no idea how that works with a LLM implementation nor do I actually know what they are caching in this context.

View on HN · Topics

I somewhat disagree that this is due diligence. Claude Code abstracts the API, so it should abstract this behavior as well, or educate the user about it.

View on HN · Topics

>If you were being charged per character, or running down character limits, and printing on printers that were shared and had economic costs for stalled and started print runs,

and the system was being run by some of the planet’s brightest people whose famous creation is well known to disseminate complex information succinctly,

>then:

You would expect to be led to understand, like… a 1997 Prius.

“This feature showed the vehicle operation regarding the interplay between gasoline engine, battery pack, and electric motors and could also show a bar-graph of fuel economy results.” https://en.wikipedia.org/wiki/Toyota_Prius_(XW10)

View on HN · Topics

Yes. It’s perfectly reasonable to expect the user to know the intricacies of the caching strategy of their llm. Totally reasonable expectation.

View on HN · Topics

To some extent I'd say it is indeed reasonable. I had observed the effect for a while: if I walked away from a session I noticed that my next prompt would chew up a bunch of context. And that led me to do some digging, at which point I discovered their prompt caching.

So while I'd agree with your sarcasm that expecting users to be experts of the system is a big ask, where I disagree with you is that users should be curious and actively attempting to understand how it works around them. Given that the tooling changes often, this is an endless job.

View on HN · Topics

Personally I've never thought about cache eviction as it pertains to CC. It's just not something that I ever needed to think about. Maybe I'm just not a power user but I just use the product the way I want to and it just works.

View on HN · Topics

I do think having some insight into the current state of the cache and a realistic estimate for prompt token use is something we should demand.

View on HN · Topics

If there was an affordance on the TUI that made this visible and encouraged users to learn more - that would go a long way.

View on HN · Topics

Okay, sure. There's a dollar/intelligence tradeoff. Let me decide to make it, don't silently make Claude dumber because I forgot about a terminal tab for an hour. Just because a project isn't urgent doesn't mean it's not important. If I thought it didn't need intelligence I would use Sonnet or Haiku.

View on HN · Topics

It'd probably be helpful for power users and transparency to actually show how the cache is being used. If you run local models with llamacpp-server, you can watch how the cache slots fill up with every turn; when subagents spawn, you see another process id spin up and it takes up a cache slot; when the model starts slowing down is when the context grows (amd 395+ around 80-90k) and the cache loads are bigger because you've got all that.

So yeah, it doesn't take much to surface to the user that the speed/value of their session is ephemeral because to keep all that cache active is computationally expensive because ...

You're still just running text through a extremely complex process, and adding to that text and to avoid re-calculation of the entire chain, you need the cache.

View on HN · Topics

Is there a way to say: I am happy to pay a premium (in tokens or extra usage) to make sure that my resumed 1h+ session has all the old thinking?

I understand you wouldn't want this to be the default, particularly for people who have one giant running session for many topics - and I can only imagine the load involved in full cache misses at scale. But there are other use cases where this thinking is critical - for instance, a session for a large refactor or a devops/operations use case consolidating numerous issue reports and external findings over time, where the periodic thinking was actually critical to how the session evolved.

For example, if N-4 was a massive dump of some relevant, some irrelevant material (say, investigating for patterns in a massive set of data, but prompted to be concise in output), then N-4's thinking might have been critical to N-2 not getting over-fixated on that dump from N-4. I'd consider it mission-critical, and pay a premium, when resuming an N some hours later to avoid pitfalls just as N-2 avoided those pitfalls.

Could we have an "ultraresume" that, similar to ultrathink, would let a user indicate they want to watch Return of the (Thin)king: Extended Edition?

View on HN · Topics

I think it’s crazy that they do this, especially without any notice. I would not have renewed my subscription if I knew that they started doing this.

Especially in the analysis part of my work I don‘t care about the actual text output itself most of the time but try to make the model „understand“ the topic.

In the first phase the actual text output itself is worthless it just serves as an indicator that the context was processed correctly and the future actual analysis work can depend on it.
And they‘re… just throwing most the relevant stuff out all out without any notice when I resume my session after a few days?

This is insane, Claude literally became useless to me and I didn’t even know it until now, wasting a lot of my time building up good session context.

There would be nothing lost if they said „If you click yes, we will prune your old thinking making Claude faster and saving you tons of tokens“. Most people would say yes probably so why not ask them… make it an env variable (that is announced not a secretly introduced one to opt out of something new!) or at least write it in a change log if they really don’t want to allow people to use it like before, so there‘d be chance to cancel the subscription in time instead of wasting tons of time on work patterns that not longer work

View on HN · Topics

So to defend a litte, its a Cache, it has to go somewhere, its a save state of the model's inner workings at the time of the last message. so if it expires, it has to process the whole thing again. most people don't understand that every message the ENTIRE history of the conversion is processed again and again without that cache. That conversion might of hit several gigs worth of model weights and are you expecting them to keep that around for /all/ of your conversions you have had with it in separate sessions?

View on HN · Topics

No? It's not because it's a cache, it's because they're scared of letting you see the thinking trace. If you got the trace you could just send it back in full when it got evicted from the cache. This is how open weight models work.

View on HN · Topics

The trace goes back fine, that's not the issue.

The issue is that if they send the full trace back, it will have to be processed from the start if the cache expired, and doing that will cause a huge one-time hit against your token limit if the session has grown large.

So what Boris talked about is stripping things out of the trace that goes back to regenerate the session if the cache expires. Doing this would help avert burning up the token limit, but it is technically a different conversation, so if CC chooses poorly on stripping parts of the context then it would lead to Claude getting all scatter-brained.

View on HN · Topics

It seems like an opportunity for a hierarchical cache. Instead of just nuking all context on eviction, couldn’t there be an L2 cache with a longer eviction time so task switching for an hour doesn’t require a full session replay?

View on HN · Topics

No of course it’s unrealistic for them to hold the cache indefinitely and that’s not the point. You are keeping the session data yourself so you can continue even after cache expiry. The point I‘m making is that it made me very angry that without any announcement they changed behavior to strip the old thinking even when you have it in your session file. There is absolutely no reason to not ask the user about if they want this

And it’s part of a larger problem of unannounced changes it‘s just like when they introduced adaptive thinking to 4.6 a few weeks ago without notice.

Also they seem to be completely unaware that some users might only use Claude code because they are used to it not stripping thinking in contrast to codex.

Anyway I‘m happy that they saw it as a valid refund reason

View on HN · Topics

This is exactly what also confused me. I had the exact same prompt in Claude code as well, and the no option implies you can also keep the whole history. But clicking keep apparently only ever kept the user and assistant messages not the whole actual thinking parts of the conversation

View on HN · Topics

Don't you have that by just resuming old convo?

The only issue is that it didn't hit the cache so it was expensive if you resume later.

View on HN · Topics

Not at the moment apparently. They remove the thinking messages when you continue after 1 hour. That was the whole idea of that change. So the LLM gets all your messages, its responses etc but not the thinking parts, why it generated that responses. You get a lobotomised session.

View on HN · Topics

Or generate tiny filler messages every hour until you come back to it.

View on HN · Topics

As some others have mentioned.

I think the best option would be tell a user who is about to resurrect a conversation that has been evicted from cache that the session is not cached anymore and the user will have to face a full cost of replaying a session, not only the incremental question and answer.

(In understand under the hood that llms are n^2 by default but it's very counter intuitive - and given how popular cc is becoming outside of nerd circles, probably smaller and smaller fraction of users is aware of it)

I would like to decide on it case by case. Sometimes the session has some really deep insight I want to preserve, sometimes it's discardable.

View on HN · Topics

Wouldn't it help if the system did compaction before the eviction happens? But the problem is that Claude probably don't want to automatically compact all sessions that have been left idle for one hour (and very likely abandoned already), that would probably introduce even more additional costs.

Maybe the UI could do that for sessions that the user hasn't left yet, when the deadline comes near.

View on HN · Topics

Is having massive sessions which sit idle for hours (or days) at a time considered unusual? That's a really, really common scenario for me.

Two questions if you see this:

1) if this isn't best practice, what is the best way to preserve highly specific contexts?

2) does this issue just affect idle sessions or would the cache miss also apply to /resume ?

View on HN · Topics

I leave sessions idle for hours constantly - that's my primary workflow. If resuming a 900k context session eats my rate limit, fine, show me the cost and let me decide whether to /clear or push through. You already show a banner suggesting /clear at high context - just do the same thing here instead of silently lobotomizing the model.

View on HN · Topics

I'm also a Claude Code user from day 1 here, back from when it wasn't included in the Pro/Max subscriptions yet, and I was absolutely not aware of this either. Your explanation makes sense, but I naively was also under the impression that re-using older existing conversations that I had open would just continue the conversation as is and not be a treated as a full cache miss.

My biggest learning here is the 1 hour cache window. I often have multiple Claudes open and it happens frequently that they're idle for 1+ hours.

This cache information should probably get displayed somewhere within Claude Code

View on HN · Topics

But.. that doesn't solve the problem of having no indication in-session when it'll lose the cache. A nudge to /clear does nothing to indicate "or else face significant cost" nor does it indicate "your cache is stale".

Love the product. <3

View on HN · Topics

To add to this. The new indicator is "New task? /clear to save <X> tokens" even though it affects all tasks, not just new ones.

Mislead, gaslight, misdirect is the name of the game

View on HN · Topics

Then you need to update your documentation and teach claude to read the new documentation because here is what claude code answered:

Question: Hey claude, if we have a conversation, and then i take a break. Does it change the expected output of my next answer, if there are 2 hours between the previous message end the next one?

Answer: No. A 2-hour gap doesn't change my output. I have no internal clock between messages — I only see the conversation content plus the currentDate context injected each turn. The prompt cache may expire (5 min TTL), which affects
cost/latency but not the response itself.

The only things that can change output across a break: new context injected (like updated date), memory files being modified, or files on disk changing.

-- This answer directly contradict your post. It seems like the biggest problem is a total lack of documentation for expected behavior.

A similar thing happens if I ask claude code for the difference between plan mode, and accept edits on.

Then Claude told me the only difference was that with plan mode it would ask for permission before doing edits. But I really don't think this is true. It seems like plan mode does a lot more work, and present it in a total different way. It is not just a "I will ask before applying changes" mode.

View on HN · Topics

Resuming sessions after more than 1 hour is a very common workflow that many teams are following. It will be great if this is considered as an expected behaviour and design the UX around it. Perhaps you are not realising the fact that Claude code has replaced the shells people were using (ie now bash is replaced with a Claude code session).

View on HN · Topics

I think thats a bad idea. It seems like expecting to have a prompt open like this, accumulating context puts a load on the back end. Its one of those things that is a bad habit. Like trying to maintain open tabs in a browser as a way to keep your work flow up to date when what you really should be doing is taking notes of your process and working from there.

I have project folders/files and memory stored for each session, when I come back to my projects the context is drawn from the memory files and the status that were saved in my project md files.

Create a better workflow for your self and your teams and do it the right way. Quick expect the prompt to store everything for you.

For the Claude team. If you havent already, I'd recommend you create some best practices for people that don't know any better, otherwise people are going to expect things to be a certain way and its going to cause a lot of friction when people cant do what the expect to be able to do.

View on HN · Topics

> Resuming sessions after more than 1 hour is a very common workflow that many teams are following

Yeah it's called lunch!

View on HN · Topics

This just does not match my workflow when I work on low-priority projects, especially personal projects when I do them for fun instead of being paid to do them. With life getting busy, I may only have half an hour each night with Claude to make some progress on it before having to pause and come back the next day. It’s just the nature of doing personal projects as a middle-aged person.

The above workflow basically doesn’t hit the rate limit. So I’d appreciate a way to turn off this feature.

View on HN · Topics

Hi Boris

I'm curious why 1 hour was chosen?

Is increasing it a significant expense?

Ever since I heard about this behaviour I've been trying to figure out how to handle long running Claude sessions and so far every approach I've tried is suboptimal

It takes time to create a good context which can then trigger a decent amount of work in my experience, so I've been wondering how much this is a carefully tuned choice that's unlikely to change vs something adjustable

View on HN · Topics

reasonably, if i'm in an interactive session, its going to have breaks for an hour or more.

whats driving the hour cache? shouldnt people be able to have lunch, then come back and continue?

are you expecting claude code users to not attend meetings?

I think product-wise you might need a better story on who uses claude-code, when and why.

Same thing with session logs actually - i know folks who are definitely going to try to write a yearly RnD report and monthly timesheets based on text analysis of their claude code session files, and they're going to be incredibly unhappy when they find out its all been silently deleted

View on HN · Topics

Isn't that exactly what people had been accusing Anthropic of doing, silently making Claude dumber on purpose to cut costs? There should be, at minimum, a warning on the UI saying that parts of the context were removed due to inactivity.

View on HN · Topics

maybe you could surface an expected cache miss to the user

View on HN · Topics

You created this issue by setting a timer for cache clearing. Time is really not a dimension that plays any role in how coding agent context is used.

View on HN · Topics

It is too suprising. Time passed should not matter for using AI.

Either swallow the cost or be transparent to the user and offer both options each time.

View on HN · Topics

Wow so that's why you did #2? The explanation in the CLI is really not clear. I thought it was just a suggestion to compact, no idea it was way more expensive than if I hadn't left it idle for an hour.

You guys really need to communicate that better in the CLI for people not on social

View on HN · Topics

> The challenge is: when you let a session idle for >1 hour, when you come back to it and send a prompt, it will be a full cache miss, all N messages. We noticed that this corner case led to outsized token costs for users.

I dont agree with this being characterized as a "corner case".

Isn't this how most long running work will happen across all serious users?

I am not at my desk babysitting a single CC chat session all day. I have other things to attend to -- and that was the whole point of agentic engineering.

Dont CC users take lunch breaks?

How are all these utterly common scenarios being named as corner cases -- as something that is wildly out of the norm, and UX can be sacrificed for those cases?

View on HN · Topics

> that would be >900k tokens written to cache all at once

Probably that's why I hit my weekly limits 3-4 days ago, and was scheduled to reset later today. I just checked, and they are already reset.

Not sure if it's already done, shouldn't there be a check somewhere to alert on if an outrageous number of tokens are getting written, then it's not right ?

View on HN · Topics

what about selling long term cache space to users?

or even, let the user control the cache expiry on a per request basis. with a /cache command

that way they decide if they want to drop the cache right away , or extend it for 20 hours etc

it would cost tokens even if the underlying resource is memory/SSD space, not compute

View on HN · Topics

Why is time the variable you're solving for? Why can't I keep that cache warm by keeping the session open?

View on HN · Topics

So this explains why resuming a session after a 5-hour timeout basically eats most of the next session. How then to avoid this?

View on HN · Topics

I drop sessions very frequently to resume later - that's my main workflow with how slow Claude is. Is there anything I can do to not encounter this cache problem?

View on HN · Topics

Wasn’t cache time reduced to 5 minutes? Or is that just some users interpretation of the bug?

View on HN · Topics

What about:

/loop 5m say "ok".

Will that keep the cache fresh?

View on HN · Topics

Sorry but I think this should be left up to the user to decide how it works and how they want to burn their tokens. Also a countdown timer is better than all of these other options you mention.

View on HN · Topics

The entire reason I keep a long-lived session around is because the context is hard-won — in term of tokens and my time.

Silently degrading intelligence ought to be something you never do, but especially not for use-cases like this.

I’m looking back at my past few weeks of work and realizing that these few regressions literally wasted 10s of hours of my time, and hundreds of dollars in extra usage fees. I ran out of my entire weekly quota four days ago, and had to pause the personal project I was working on.

I was running the exact same pipeline I’ve run repeatedly before, on the same models, and yet this time I somehow ate a week’s worth of quota in less than 24h. I spent $400 just to finish the pipeline pass that got stuck halfway through.

I’m sorry to be harsh, but your engineering culture must change. There are some types of software you can yolo. This isn’t one of them. The downstream cost of stupid mistakes is way, way too high, and far too many entirely avoidable bugs — and poor design choices — are shipping to customers way too often.

View on HN · Topics

as a variation:

how does this help me as a customer? if i have to redo the context from scratch, i will pay both the high token cost again, but also pay my own time to fill it.

the cost of reloading the window didnt go away, it just went up even more

View on HN · Topics

Yeah this is actually quite shocking. In my earlier uses of CC I might noodle on a problem for a while, come back and update the plan, go shower, think, give CC a new piece of advice, etc. Basically treating it like a coworker. And I thought that it was a static conversation (at least on the order of a day or so). An hour is absurd IMO and makes me want to rethink whether I want to keep my anthropic plan.

View on HN · Topics

They moved it to 5m around the same timeframe though: https://www.reddit.com/r/ClaudeAI/comments/1sk3m12/followup_...

View on HN · Topics

Seems like it would interact very badly with the time based usage reset. If lots of people are hitting their limit and then letting the session idle until they can come back, this wouldn't be an exception. It would almost be the default behaviour.

View on HN · Topics

Yeah, this is so silly.

Anthropic: removes thinking output

Users: see long pauses, complain

Anthropic: better reduce thinking time

Users: wtf

To me it really, really seems like Anthropic is trying to undo the transparency they always had around reasoning chains, and a lot of issues are due to that.

Removing thinking blocks from the convo after 1 hour of being inactive without any notice is just the icing on the cake, whoever thought that was a good idea? How about making “the cache is hot” vs “the cache is cold” a clear visual indicator instead, so you slowly shape user behavior, rather than doing these types of drastic things.

View on HN · Topics

" Combined with this only happening in a corner case (stale sessions) and the difficulty of reproducing the issue, it took us over a week to discover and confirm the root cause"

I don't know about others, but sessions that are idle > 1h are definitely not a corner case for me.
I use Claude code for personal work and most of the time, I'm making it do a task which could say take ~10 to 15mins. Note that I spend a lot of time back and forth with the model planning this task first before I ask it to execute it.
Once the execution starts, I usually step away for a coffee break (or) switch to Codex to work on some other project - follow similar planning and execution with it.
There are very high chances that it takes me > 1h to come back to Claude.

View on HN · Topics

I refuse to believe that caching tiers for longer than 1 hour would be impossible to transparently build and use to avoid all this complexity to begin with, nor that it would be that expensive to maintain in 2026 when the bulk costs are on inference anyways which would even be reduced by occasional longer time cache hits.

View on HN · Topics

Some of these changes and effects seriously affect my flow. I'm a very interactive Claude user, preferring to provide detailed guidance for my more serious projects instead of just letting them run. And I have multiple projects active at once, with some being untouched for days at a time. Along with the session limits this feels like compounding penalties as I'm hit when I have to wait for session reset (worse in the middle of a long task), when I take time to properly review output and provide detailed feedback, when I'm switching among currently active projects, when I go back to a project after a couple days or so,... This is honestly starting to feel untenable.

View on HN · Topics

Here's one person's feedback. After the release of 4.7, Claude became unusable for me in two ways: frequent API timeouts when using exactly the same prompts in Claude Code that I had run problem-free many times previously, and absurdly slow interface response in Claude Cowork. I found a solution to the first after a few days (add "CLAUDE_STREAM_IDLE_TIMEOUT_MS": "600000" to settings.json), but as of a few hours ago Cowork--which I had thought was fantastic, by the way--was still unusable despite various attempts to fix it with cache clearing and other hacks I found on the web.

View on HN · Topics

Right a very simple UI thing that they should have that would have prevented so much misunderstanding. Is a simple counter. How much usage do a have i used and how much is left.

If a message will do a cache recreation the cost for that should be viewable.

View on HN · Topics

> 2. Old sessions had the thinking tokens stripped, resuming the session made Claude stupid (took 15 days to notice and remediate)

This one was egregious: after a one hour user pause, apparently they cleared the cache and then continued to apply “forgetting” for the rest of the session after the resume!

Seems like a very basic software engineering error that would be caught by normal unit testing.

View on HN · Topics

Are you saying dropping cache after 1 hour is not intentionally degrading performance?

View on HN · Topics

But it still degrades performance.

View on HN · Topics

Model performance at inference in a data center v.s. stripping thinking tokens are effectively the same.

Sure they didn't change the GPUs their running, or the quantization, but if valuable information is removed leading to models performing worse, performance was degraded.

In the same way uptime doesn't care about the incident cause... if you're down you're down no one cares that it was 'technically DNS'.

View on HN · Topics

Just as a note to CC fans/users here since I had an opportunity to do so... I tested resuming a session that was stale at 950k tokens after returning from a full day or so of being idle, thus a fully empty quota/session.

Resuming it cost 5% of the current session and 1% of the weekly session on a max subscription.

View on HN · Topics

Useful update. Would be useful to me to switch to a nightly / release cycle but I can see why they don't: they want to be able to move fast and it's not like I'm going to churn over these errors. I can only imagine that the benchmark runs are prohibitively expensive or slow or not using their standard harness because that would be a good smoke test on a weekly cadence. At the least, they'd know the trade-offs they're making.

Many of these things have bitten me too. Firing off a request that is slow because it's kicked out of cache and having zero cache hits (causes everything to be way more expensive) so it makes sense they would do this. I tried skipping tool calls and thinking as well and it made the agent much stupider. These all seem like natural things to try. Pity.

View on HN · Topics

broken cache does not breakthrough make.

View on HN · Topics

The third bug is the one worth dwelling on. Dropping thinking blocks every turn instead of just once is the kind of regression that only shows up in production traffic. A unit test for "idle-threshold clearing" would assert "was thinking cleared after an hour of idle" (yes) without asserting "is thinking preserved on subsequent turns" (no). The invariant is negative space.

The real lesson is that an internal message-queuing experiment masked the symptoms in their own dogfooding. Dogfooding only works when the eaten food is the shipped food.

View on HN · Topics

ngl lost alot of trust in cc after reading this, specially point 1

how do you just do that to millions of users building prod code with your shit

View on HN · Topics

So now the solution is to input a “ping” message every hour so that it keeps the cache warm?

View on HN · Topics

ugh, caching based on idle time is horrible for my usage anyway; since claude is both fairly slow and doesn't really have much of a daily quota anyway I often tell it to do something and then wander off and come back to check on it when I next think about it. I always vaguely assumed that my session would not "detect" the intervening time anyway since it was all async. I guess from a global perspective time-based cache eviction makes sense.

View on HN · Topics

had this happen to me mid-refactor and spent 20 min wondering if I'd gone crazy. honestly the one hour threshold feels pretty arbitrary, sometimes you just step away to think

View on HN · Topics

I think that would also have busted cache all the time, and uncached requests consume usage limits rapidly.

Summarizer