Adaptive Thinking Confusion

Technical difficulties with new adaptive thinking replacing extended thinking, confusion about effort levels, inability to disable adaptive thinking, and concerns that models choose not to think when they should

The transition to "adaptive thinking" has sparked widespread frustration among users who feel the system has traded transparent control for an unreliable "trust us" approach that frequently fails to engage deep reasoning on complex tasks. While manually forcing effort levels to "max" or "xhigh" often restores model quality, these settings—paired with a new, less efficient tokenizer—are reportedly draining subscription credits in as little as fifteen minutes. Technical friction is further aggravated by confusing API changes and "irritating" status messages that many perceive as deceptive placeholders for background latency rather than genuine indicators of thought. Ultimately, power users are increasingly resorting to environment variables and custom scripts to bypass what they view as a restrictive and expensive "black box" in order to regain baseline performance.

View on HN · Topics

I'm finding the "adaptive thinking" thing very confusing, especially having written code against the previous thinking budget / thinking effort / etc modes: https://platform.claude.com/docs/en/build-with-claude/adapti...

Also notable: 4.7 now defaults to NOT including a human-readable reasoning token summary in the output, you have to add "display": "summarized" to get that: https://platform.claude.com/docs/en/build-with-claude/adapti...

(Still trying to get a decent pelican out of this one but the new thinking stuff is tripping me up.)

View on HN · Topics

Its especially concerning / frustrating because boris’s reply to my bug report on opus being dumber was “we think adaptive thinking isnt working” and then thats the last I heard of it: https://news.ycombinator.com/item?id=47668520

Now disabling adaptive thinking plus increasing effort seem to be what has gotten me back to baseline performance but “our internal evals look good“ is not good enough right now for what many others have corroborated seeing

View on HN · Topics

This matches my experience as well, "adaptive thinking" chooses to not think when it should.

View on HN · Topics

My settings are pretty standard:

% claude
Claude Code v2.1.111
Opus 4.7 (1M context) with xhigh effort · Claude Max
~/...
Welcome to Opus 4.7 xhigh! · /effort to tune speed vs. intelligence

I want to wash my car. The car wash is 50 meters away. Should I walk or drive?

Walk. 50 meters is shorter than most parking lots — you'd spend more time starting the car and parking than walking there. Plus, driving to a car wash you're about to use defeats the purpose if traffic or weather dirties it en route.

View on HN · Topics

Maybe there was no thinking.

View on HN · Topics

yeah they took "i pick the budget" and turned it into "trust us".

View on HN · Topics

CLAUDE_CODE_DISABLE_ADAPTIVE_THINKING=1 claude…

View on HN · Topics

What does that actually do? Force the "effort" to be static to what I set?

View on HN · Topics

I used Opus 4.7 for about 15 minutes on the auto effort setting.

It nicely implemented two smallish features, and already consumed 100% of my session limit on the $20 plan.

See you again in five hours.

View on HN · Topics

I've been using it with `/effort max` all the time, and it's been working better than ever.

I think here's part of the problem, it's hard to measure this, and you also don't know in which AB test cohorts you may currently be and how they are affecting results.

View on HN · Topics

Agree. I keep effort max on Claude and xhigh on GPT for all tasks and keep tasks as scoped units of work instead of boil the ocean type prompts. It is hard to measure but ultimately the tasks are getting completed and I'm validating so I consider it "working as expected".

View on HN · Topics

It works better, until you run out of tokens. Running out of tokens is something that used to never happen to me, but this month now regularly happens.

Maybe I could avoid running out of tokens by turning off 1M tokens and max effort, but that's a cure worse than the disease IMO.

View on HN · Topics

so even with a new tokenizer that can map to more tokens than before, their answer is still just "you're not managing your context well enough"

"Opus 4.7 uses an updated tokenizer that [...] can map to more tokens—roughly 1.0–1.35× depending on the content type.

[...]

Users can control token usage in various ways: by using the effort parameter, adjusting their task budgets, or prompting the model to be more concise."

View on HN · Topics

For me, making it high effort just fixed all the quality problems, and even cut down on token use somehow

View on HN · Topics

This. They kind of snuck this into the release notes: switching the default effort level to Medium. High is significantly slower, but that’s somewhat mitigated by the fact that you don’t have to constantly act like a helicopter parent for it.

View on HN · Topics

This comment thread is a good learner for founders; look at how much anguish can be put to bed with just a little honest communication.

1. Oops, we're oversubscribed.

2. Oops, adaptive reasoning landed poorly / we have to do it for capacity reasons.

3. Here's how subscriptions work. Am I really writing this bullet point?

As someone with a production application pinned on Opus 4.5, it is extremely difficult to tell apart what is code harness drama and what is a problem with the underlying model. It's all just meshed together now without any further details on what's affected.

View on HN · Topics

I mean they literally said on their own end that adaptive thinking isn't working as it should. They rolled it out silently, enabled by default, and haven't rolled it back.

View on HN · Topics

Working on some research projects to test Opus 4.7.

The first thing I notice is that it never dives straight into research after the first prompt. It insists on asking follow-up questions. "I'd love to dive into researching this for you. Before I start..." The questions are usually silly, like, "What's your angle on this analysis?" It asks some form of this question as the first follow-up every time.

The second observation is "Adaptive thinking" replaces "Extended thinking" that I had with Opus 4.6. I turned this off, but I wish I had some confidence that the model was working as hard as possible (I don't want it to mysteriously limit its thinking capabilities based on what it assumes requires less thought. I'd rather control the thinking level). I always ran research prompts with extended thinking enabled on Opus 4.6, and it gave me confidence that it was taking time to get the details right.

The third observation is it'll sit in a silent state of "Creating my research plan" for several minutes without starting to burn tokens. At first I thought this was because I had 2 tabs running a research prompt at the same time, but it later happened again when nothing else was running beside it. Perhaps this is due to high demand from several people trying to test the new model.

And fourth, the research output is significantly shorter and less detailed than Opus 4.6. Where before I would get several pages of research findings, now I get a short 2-3 pager.

Overall, I feel a bit confused. It doesn't seem better than 4.6, and from a research standpoint it might be worse. It seems like it got several different "features" that I'm supposed to learn now.

View on HN · Topics

Perhaps on the 10x plan.

It went through my $20 plan's session limit in 15 minutes, implementing two smallish features in an iOS app.

That was with the effort on auto.

It looks like full time work would require the 20x plan.

View on HN · Topics

This is a CC harness thing than a model thing but the "new" thinking messages ('hmm...', 'this one needs a moment...') are extraordinarily irritating. They're both entirely uninformative and strictly worse than a spinner. On my workflows CC often spends up to an hour thinking (which is fine if the result is good) and seeing these messages does not build confidence.

View on HN · Topics

Sounds really minor, but was actually a big contributor to me canceling and switching. The VS Code extension has a morphing spinner thing that rapidly switches between these little catch phrases. It drives me crazy, and I end up covering it up with my right click menu so I can read the actual thinking tokens without that attention vampire distracting me.

And of course they recently turned off all third party harness support for the subscription, so you're just forced to watch it and any other stuff they randomly decide to add, or pay thousands of dollars.

View on HN · Topics

There’s one that’s like “Considering 17 theories” that had me wondering what those 17 things would be, I wanted to see them! Turns out it’s just a static message. Very confusing.

View on HN · Topics

Maybe there are literally 17 models in an initial MoE pass. Seems excessive though.

View on HN · Topics

It wouldn't be so irritating if thinking didn't start to take a lot longer for tasks of similar complexity (or maybe it's taking longer to even start to think behind the scenes due to queueing).

View on HN · Topics

Agreed. I actually have thought those were “waiting to get a response from the API” rather than “the model is still thinking” messages

View on HN · Topics

The default effort change in Claude Code is worth knowing before your next session: it's now `xhigh` (a new level between `high` and `max`) for all plans, up from the previous default. Combined with the 1.0–1.35× tokenizer overhead on the same prompts, actual token spend per agentic session will likely exceed naive estimates from 4.6 baselines.

Anthropic's guidance is to measure against real traffic—their internal benchmark showing net-favorable usage is an autonomous single-prompt eval, which may not reflect interactive multi-turn sessions where tokenizer overhead compounds across turns. The task budget feature (just launched in public beta) is probably the right tool for production deployments that need cost predictability when migrating.

View on HN · Topics

That depends a bit on token efficiency. From their "Agentic coding performance by effort level" graph, it looks like they get similar outcome for 4.7 medium at half the token usage as 4.6 at high.

Granted that is, as you say, a single prompt, but it is using the agentic process where the model self prompts until completion. It's conceivable the model uses fewer tokens for the same result with appropriate effort settings.

View on HN · Topics

You're not, it wasn't released yet. Update to 111 and you'll see it (i'm on Max20, i do)

Heck, mine just automatically set it to 4.7 and xhigh effort (also a new feature?)

View on HN · Topics

Thanks, I was already on the latest claude code, I just restarted it and now it's showing 4.7 and xhigh.

xhigh was mentioned in the release post, it's the new default and between high and max.

View on HN · Topics

I've noticed it getting dumber in certain situations , can't point to it directly as of now , but seems like its hallucinating a bit more .. and ditto on the Adaptive thinking being confusing

View on HN · Topics

Assuming /effort max still gets the best performance out of the model (meaning "ULTRATHINK" is still a step below /effort max, and equivalent to /effort high), here is what I landed on when trying to get Opus 4.7 to be at peak performance all the time in ~/.claude/settings.json:

{
"env": {
"CLAUDE_CODE_EFFORT_LEVEL": "max",
"CLAUDE_CODE_DISABLE_BACKGROUND_TASKS": "1"
}
}

The env field in settings.json persists across sessions without needing /effort max every time.

I don't like how unpredictable and low quality sub agents are, so I like to disable them entirely with disable_background_tasks.

View on HN · Topics

I'd recommend anyone to ask Claude to show used context and thinking effort on its status line, something like:

```
#!/bin/bash
input=$(cat)
DIR=$(echo "$input" | jq -r '.workspace.current_dir // empty')
PCT=$(echo "$input" | jq -r '.context_window.used_percentage // 0' | cut -d. -f1)
EFFORT=$(jq -r '.effortLevel // "default"' ~/.claude/settings.json 2>/dev/null)
echo "${DIR/#$HOME/~} | ${PCT}% | ${EFFORT}"
```

Because the TUI it is not consistent when showing this and sometimes they ship updates that change the default.

View on HN · Topics

Here you go folks:

https://www.svgviewer.dev/s/odDIA7FR

"create a svg of a pelican riding on a bicycle" - Opus 4.7 (adaptive thinking)

View on HN · Topics

> Opus 4.7 uses an updated tokenizer that improves how the model processes text. The tradeoff is that the same input can map to more tokens—roughly 1.0–1.35× depending on the content type. Second, Opus 4.7 thinks more at higher effort levels, particularly on later turns in agentic settings. This improves its reliability on hard problems, but it does mean it produces more output tokens.

I guess that means bad news for our subscription usage.

View on HN · Topics

These stuck out as promising things to try. It looks like xhigh on 4.7 scores significantly higher on the internal coding benchmark (71% vs 54%, though unclear what that is exactly)

> More effort control: Opus 4.7 introduces a new xhigh (“extra high”) effort level between high and max, giving users finer control over the tradeoff between reasoning and latency on hard problems. In Claude Code, we’ve raised the default effort level to xhigh for all plans. When testing Opus 4.7 for coding and agentic use cases, we recommend starting with high or xhigh effort.

The new /ultrareview command looks like something I've been trying to invoke myself with looping, happy that it's free to test out.

> The new /ultrareview slash command produces a dedicated review session that reads through changes and flags bugs and design issues that a careful reviewer would catch. We’re giving Pro and Max Claude Code users three free ultrareviews to try it out.

View on HN · Topics

It's alot better for me especially on xhigh

View on HN · Topics

I have not seen any comment from the early tests of 4.7 claiming that it does not work better than the previous version.

However, there have been some valuable warnings about problems that have been hit in the first minutes after switching to 4.7.

For instance that the new guardrails can block working at projects where the previous version could be used without problems and that if you are not careful the changed default settings can make you reach the subscription limits much faster than with the previous version.

View on HN · Topics

API Error: 400 {"type":"error","error":{"type":"invalid_request_error","message":"\"thinking.type.enabled\" is not supported for this model. Use \"thinking.type.adaptive\" and \"output_config.effort\" to control
thinking behavior."},"request_id":"req_011Ca7enRv4CPAEqrigcRNvd"}

Eep. AFAIK the issues most people have been complaining about with Opus 4.6 recently is due to adaptive thinking. Looks like that is not only sticking around but mandatory for this newer model.

edit: I still can't get it to work. Opus 4.6 can't even figure out what is wrong with my config. Speaking of which, claude configuration is so confusing there are .claude/ (in project) setting.json + a settings.local.json file, then a global ~/.claude/ dir with the same configuration files. None of them have anything defined for adaptive thinking or thinking type enable. None of these strings exist on my machine. Running latest version, 2.1.110

View on HN · Topics

> In Claude Code, we’ve raised the default effort level to xhigh for all plans.

Does it also mean faster to getting our of credits?

Summarizer