Adaptive Thinking Concerns

Users frustrated that adaptive thinking reduces reasoning without user control, especially in Claude.ai interface lacking effort parameters, and speculation this is compute-saving measure

Users are increasingly frustrated by Anthropic’s "adaptive thinking" and unannounced adjustments, which many perceive as a silent degradation of model intelligence aimed at saving compute costs. Critics argue that stripping thinking tokens from idle sessions and lowering default reasoning levels effectively "lobotomizes" the experience, depriving professional users of the high-tier reasoning they specifically paid for. Although Anthropic representatives claim these changes were intended to fix UI latency, the lack of explicit user control in the web interface has sparked accusations of "gaslighting" and a significant loss of trust. This perceived unreliability is driving some power users toward competitors, with many insisting they would rather pay higher fees for transparent, consistent performance than endure a "lazy" model.

View on HN · Topics

Okay, sure. There's a dollar/intelligence tradeoff. Let me decide to make it, don't silently make Claude dumber because I forgot about a terminal tab for an hour. Just because a project isn't urgent doesn't mean it's not important. If I thought it didn't need intelligence I would use Sonnet or Haiku.

View on HN · Topics

I’m not familiar with the Claude API but OpenAI has an encrypted thking messages option. You get something that you can send back but it is encrypted. Not available on Anthropic?

View on HN · Topics

No of course it’s unrealistic for them to hold the cache indefinitely and that’s not the point. You are keeping the session data yourself so you can continue even after cache expiry. The point I‘m making is that it made me very angry that without any announcement they changed behavior to strip the old thinking even when you have it in your session file. There is absolutely no reason to not ask the user about if they want this

And it’s part of a larger problem of unannounced changes it‘s just like when they introduced adaptive thinking to 4.6 a few weeks ago without notice.

Also they seem to be completely unaware that some users might only use Claude code because they are used to it not stripping thinking in contrast to codex.

Anyway I‘m happy that they saw it as a valid refund reason

View on HN · Topics

Not at the moment apparently. They remove the thinking messages when you continue after 1 hour. That was the whole idea of that change. So the LLM gets all your messages, its responses etc but not the thinking parts, why it generated that responses. You get a lobotomised session.

View on HN · Topics

OK didn't know that. I also resume fairly old sessions with 100-200k of context, and I sometimes keep them active for a while (but with large breaks in between).

Still on Opus 4.6 with no adaptive thinking, so didn't really notice anything worse in the past weeks, but who knows.

View on HN · Topics

I don't think you can store the cache on client given the thinking is server side and you only get summaries in your client (even those are disabled by default).

View on HN · Topics

If they really need to guard the thinking output, they could encrypt it and store it client side. Later it'd be sent back and decrypted on their server.

But they used to return thinking output directly in the API, and that was _the_ reason I liked Claude over OpenAI's reasoning models.

View on HN · Topics

And still are gaslighting:

We take reports about degradation very seriously. We never intentionally degrade our models [...] On March 4, we changed Claude Code's default reasoning effort from high to medium

Anthropic is the best company of its kind, but that is badly worded PR.

View on HN · Topics

Is adding JPEG compression to your software “intentional degradation” of the software? I wouldn't say providing a selectable option to use a faster, cheaper version of something qualifies as “degradation”.

It is certainly true that they did a poor job communicating this change to users (I did not know that the default was “high” before they introduced it, I assumed they had added an effort level both above and below whatever the only effort choice was there before). On the other hand, I was using Claude Code a fair bit on “medium” during that time period and it seemed to be performing just fine for me (and saving usage/time over “high”), so it doesn't seem clear that that was the wrong default, if only it had been explained better.

View on HN · Topics

They didn’t say “your experience is not worse” but they did frequently say “just turn reasoning effort back up and it will be fine”. And that pretty explicitly invalidates all the (correct) feedback which said it’s not just reasoning effort.

They knew they had deliberately made their system worse, despite their lame promise published today that they would never do such a thing. And so they incorrectly assumed that their ham fisted policy blunder was the only problem.

Still plenty I prefer about Claude over GPT but this really stings.

View on HN · Topics

They lost me at Opus 4.7

Anecdotally OpenAI is trying to get into our enterprise tooth and nail, and have offered unlimited tokens until summer.

Gave GPT5.4 a try because of this and honestly I don’t know if we are getting some extra treatment, but running it at extra high effort the last 30 days I’ve barely see it make any mistakes.

At some points even the reasoning traces brought a smile to my face as it preemptively followed things that I had forgotten to instruct it about but were critical to get a specific part of our data integrity 100% correct.

View on HN · Topics

extra high burns tokens i find. ( run 5.4 on medium for 90% of the tasks and high if i see medium struggling and its very focused and make minimum changes.

View on HN · Topics

Yeah but it also then strikes the perfect balance between being meticulous and pragmatic. Also it pushes back much more often than other models in that mode.

View on HN · Topics

Note mini-high is similar perf/latency to medium, but much cheaper

View on HN · Topics

> Why does it need to say things to itself like “great I have a plan now!”

How else would it know whether it has a plan now?

View on HN · Topics

Curious what effort level you have it set to and the prompt itself. Just a guess but this seems like it could be a potential smell of an excessively high effort level and may just need to dial back the reasoning a bit for that particular prompt.

View on HN · Topics

Yeah I had to deal with mine warning me that a website it accessed for its task contained a prompt injection, and when I told it to elaborate, the "injected prompt" turned out to be one its own <system-reminder> message blocks that it had included at some point. Opus 4.7 on xhigh

View on HN · Topics

>On March 4, we changed Claude Code's default reasoning effort from high to medium to reduce the very long latency—enough to make the UI appear frozen—some users were seeing in high mode

Instead of fixing the UI they lowered the default reasoning effort parameter from high to medium? And they "traced this back" because they "take reports about degradation very seriously"? Extremely hard to give them the benefit of doubt here.

View on HN · Topics

Hey, Boris from the team here.

We did both -- we did a number of UI iterations (eg. improving thinking loading states, making it more clear how many tokens are being downloaded, etc.). But we also reduced the default effort level after evals and dogfooding. The latter was not the right decision, so we rolled it back after finding that UX iterations were insufficient (people didn't understand to use /effort to increase intelligence, and often stuck with the default -- we should have anticipated this).

View on HN · Topics

> people didn't understand to use /effort to increase intelligence, and often stuck with the default -- we should have anticipated this

UI is UI. It is naive to expect that you build some UI but users will "just magically" find out that they should use it as a terminal in the first place.

View on HN · Topics

Yeah, this is so silly.

Anthropic: removes thinking output

Users: see long pauses, complain

Anthropic: better reduce thinking time

Users: wtf

To me it really, really seems like Anthropic is trying to undo the transparency they always had around reasoning chains, and a lot of issues are due to that.

Removing thinking blocks from the convo after 1 hour of being inactive without any notice is just the icing on the cake, whoever thought that was a good idea? How about making “the cache is hot” vs “the cache is cold” a clear visual indicator instead, so you slowly shape user behavior, rather than doing these types of drastic things.

View on HN · Topics

Wow, bad enough for them to actually publish something and not cryptic tweets from employees.

Damage is done for me though. Even just one of these things (messing with adaptive thinking) is enough for me to not trust them anymore. And then their A/B testing this week on pricing.

View on HN · Topics

I went with MiniMax. The token plans are over what I currently need, 4500 messages per 5h, 45000 messages per week for 40$. I can run multiple agents and they don't think for 5-10 minutes like Sonnet did. Also I can finally see the thinking process while Anthropic chose to hide it all from me.

I'm using Zed and Claude Code as my harnesses.

View on HN · Topics

A suggestion to Anthropic, just start charging the real price for your software. Of course you have to dumb it down, when the $200 tier in reality produces 5-10 thousand dollars in monthly costs when used by people who know how to max it out.
So then you come up with creative nonsense like "adaptive thinking" when your tool is sometimes working and sometimes outright not - the irony of "intelligent tools" not "thinking" aside. Of course this would kind of ruin your current value proposition as charging the actual price would make your core idea of making large swaths of skilled population un-employed, unfeasible but I am sure if you feed it into the Claude, it will find some points for and against, just like how Karpathy uses his LLM of choice to excrement his blog posts.

View on HN · Topics

The Claude UI still only has "adaptive" reasoning for Opus 4.7, making it functionally useless for scientific/coding work compared to older models (as Opus 4.7 will randomly stop reasoning after a few turns, even when prompted otherwise). There's no way this is just a bug and not a choice to save tokens.

View on HN · Topics

It was odd that there was no mention of the forced adaptive reasoning in the article. My guess is they don't have enough compute to do anything else here.

View on HN · Topics

They are forcing users to use adaptive thinking now and deprecating thinking.type: "enabled" and budget_tokens . But the web interface (claude.ai), does not support specifying the effort parameter.

View on HN · Topics

Just add this, it works better than Opus 4.7

vim ~/.claude/settings.json

{
"model": "claude-opus-4-6",
"fastMode": false,
"effortLevel": "high",
"alwaysThinkingEnabled": true,
"autoCompactWindow": 700000
}

View on HN · Topics

Wouldn't xhigh or max work better

View on HN · Topics

By imposing the use of their harness, they control the system prompt:

> On April 16, we added a system prompt instruction to reduce verbosity. In combination with other prompt changes, it hurt coding quality, and was reverted on April 20. This impacted Sonnet 4.6, Opus 4.6, and Opus 4.7

They can pick the default reasoning effort:

> On March 4, we changed Claude Code's default reasoning effort from high to medium to reduce the very long latency—enough to make the UI appear frozen—some users were seeing in high mode

They can decide what to keep and what to throw out (beyond simple token caching):

> On March 26, we shipped a change to clear Claude's older thinking from sessions that had been idle for over an hour, to reduce latency when users resumed those sessions. A bug caused this to keep happening every turn for the rest of the session instead of just once, which made Claude seem forgetful and repetitive. We fixed it on April 10. This affected Sonnet 4.6 and Opus 4.6

It literally is all in the post.

I don't worry about anything though. It's not my product. I don't work for Anthropic, so I really couldn't care less about anyone else's degraded (or not) experience.

View on HN · Topics

> On March 4, we changed Claude Code's default reasoning effort from high to medium to reduce the very long latency—enough to make the UI appear frozen—some users were seeing in high mode.

This sounds fishy. It's easy to show users that Claude is making progress by either printing the reasoning tokens or printing some kind of progress report. Besides, "very long" is such a weasel phrase.

View on HN · Topics

Sure, but it gives the impression of degraded model performance. Especially when the interface is still saying the model is operating on "high", the same as it did yesterday, yet it is in "medium" -- it just looks like the model got hobbled.

View on HN · Topics

Model performance at inference in a data center v.s. stripping thinking tokens are effectively the same.

Sure they didn't change the GPUs their running, or the quantization, but if valuable information is removed leading to models performing worse, performance was degraded.

In the same way uptime doesn't care about the incident cause... if you're down you're down no one cares that it was 'technically DNS'.

View on HN · Topics

I thought these days thinking tokens sent my the model (as opposed to used internally) were just for the users benefit. When you send the convo back you have to strip the thinking stuff for next turn. Or is that just local models?

View on HN · Topics

Claude code is not infra, the model is the infra. They changed settings to make their models faster and probably cheaper to run too. Honestly with adaptive thinking it no longer matters what model it is if you can dynamically make it do less or more work.

View on HN · Topics

Did they not address how adaptive thinking has played in to all of this?

View on HN · Topics

As an end-user, I feel like they're kind of over-cooking and under-describing the features and behavior of what is a tool at the end of the day. Today the models are in a place where the context management, reasoning effort, etc. all needs to be very stable to work well.

The thing about session resumption changing the context of a session by truncating thinking is a surprise to me, I don't think that's even documented behavior anywhere?

It's interesting to look at how many bugs are filed on the various coding agent repos. Hard to say how many are real / unique, but quantities feel very high and not hard to run into real bugs rapidly as a user as you use various features and slash commands.

View on HN · Topics

If anthropic is doing this as a result of "optimizations" they need to stop doing that and raise the price.
The other thing, there should be a way to test a model and validate that the model is answering exactly the same each time.
I have experienced twice... when a new model is going to come out... the quality of the top dog one starts going down... and bam.. the new model is so good.... like the previous one 3 months ago.

The other thing, when anthropic turns on lazy claude... (I want to coin here the term Claudez for the version of claude that's lazy.. Claude zzZZzz = Claudez) that thing is terrible... you ask the model for something... and it's like... oh yes, that will probably depend on memory bandwith... do you want me to search that?...

YES... DO IT... FRICKING MACHINE..

View on HN · Topics

not the first time. Still not showing thinking are we?

View on HN · Topics

The issue making Claude just not do any work was infuriating to say the least. I already ran at medium thinking level so was never impacted, but having to constantly go "okay now do X like you said" was annoying.

Again goes back to the "intern" analogy people like to make.

View on HN · Topics

They feel they're in a position to make important trade-off decisions on behalf of the user. "It's just slightly worse, I'll sneak this change in" is not something to be tolerated, whether it actually turns out to be much worse or not. Their adaptive thinking mess has caused a ton of work for me. I know a lot of people are saying Codex is actually better now. I don't agree but I'm switching to it because it's much more reliable.

View on HN · Topics

Effort should not be configurable for Opus, it should be set to a single default that provides the highest level of capability. There are zero instances in which I am willing to accept a lesser result in exchange for a slightly faster response from Opus. If that were the case I would be using Flash or Haiku.

View on HN · Topics

Interesting. All 3 seems like they’re obviously going to impact quality. e.g, reducing the effort from high to medium.

So then, there must have been an explicit internal guidance/policy that allowed this tradeoff to happen.

Did they fix just the bug or the deeper policy issue?

Summarizer