Harness vs Model Blame

It is difficult to tell whether problems stem from Claude Code harness changes or from changes to the underlying model, and commenters note that harness updates may have more impact than model changes.

Users express significant frustration over "harness drama" within Claude Code, arguing that stealthy system-message injections and restrictive permission prompts often cause more performance degradation than the underlying models themselves. While new effort levels like "xhigh" aim to refine reasoning, many find that the CLI’s undocumented safeguards—such as over-zealous malware checks and unnecessary read-only permissions—lead to baffling refusals that require manual patching or "dangerously" bypassing safety protocols to resolve. This lack of transparency has prompted power users to employ custom wrapper scripts and third-party tools like Cursor to regain control over context and "thinking" outputs, as the official harness is increasingly viewed as an unstable "helicopter parent." Ultimately, the consensus suggests that the primary bottleneck in AI-assisted coding is no longer model intelligence, but rather the friction and unpredictability introduced by the agentic software layer.

View on HN · Topics

Note that for Claude Code, it looks like they added a new undocumented command line argument `--thinking-display summarized` to control this parameter, and that's the only way to get thinking summaries back there.

VS Code users can write a wrapper script which contains `exec "$@" --thinking-display summarized` and set that as their claudeCode.claudeProcessWrapper in VS Code settings in order to get thinking summaries back.
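For anyone who prefers a script over the shell one-liner, here is a minimal Python sketch of the same wrapper idea. It assumes, as the one-liner implies, that the wrapper receives the real command and its arguments as its own arguments. The flag is the undocumented one quoted above and may change or disappear. The name `build_command` is made up for illustration.

```python
#!/usr/bin/env python3
"""Hypothetical wrapper: re-run the given command with extra arguments appended."""
import os
import sys

# Flag quoted in the comment above; it is undocumented, so treat it as an assumption.
EXTRA_ARGS = ["--thinking-display", "summarized"]


def build_command(argv):
    """Return the original command followed by EXTRA_ARGS."""
    if not argv:
        raise ValueError("expected the real command as arguments")
    return list(argv) + EXTRA_ARGS


if __name__ == "__main__" and len(sys.argv) > 1:
    command = build_command(sys.argv[1:])
    # Replace this process with the real CLI, like `exec "$@" ...` in a shell script.
    os.execvp(command[0], command)
```

Keeping the argument handling in a separate function makes it easy to check what will be executed before pointing an editor setting at the script.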

View on HN · Topics

Here is additional discussion and hacks around trying to retain Thinking output in Claude Code (prior to this release):

https://github.com/anthropics/claude-code/issues/8477

View on HN · Topics

It really wasn't. Most of the argument was around product portfolio and agentic coding performance.

View on HN · Topics

Until the next time they push you back to Claude. At this point, I feel like this has to be the most unstable technology ever released. Imagine if Docker had stopped working every two releases.

View on HN · Topics

This is one of the many reasons I don't think the model companies are going to win the application space in coding.

There's literally zero context lost for me in switching between model providers as a cursor user at work. For personal stuff I'll use an open source harness for the same reason.

View on HN · Topics

I think this is more about which model you steer your coding harness to. You can also self-host a UI in front of multiple models, then you own the chat history.

View on HN · Topics

Usually the problems that cause this kind of thing are:

1) Bad prompt/context. No matter what the model is, the input determines the output. This is a really big subject as there's a ton of things you can do to help guide it or add guardrails, structure the planning/investigation, etc.

2) Misaligned model settings. If temperature/top_p/top_k are too high, you will get more hallucination and possibly loops. If they're too low, you don't get "interesting" enough results. Same for the repeat protection settings.
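To make the temperature point concrete, here is a small sketch of temperature-scaled softmax over made-up logits. It is a textbook illustration, not any provider's sampling code. A higher temperature flattens the distribution, so unlikely tokens get picked more often. A lower temperature concentrates probability on the top token.

```python
import math


def softmax_with_temperature(logits, temperature):
    """Convert logits to probabilities after dividing by the temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


# Illustrative logits for three candidate tokens (invented values).
logits = [2.0, 1.0, 0.0]
cold = softmax_with_temperature(logits, 0.5)  # sharper distribution
hot = softmax_with_temperature(logits, 2.0)   # flatter distribution
```

Comparing `cold[0]` with `hot[0]` shows the top token losing probability mass as the temperature rises.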

I'm not saying it didn't screw up, but it's not really the model's fault. Every model has the potential for this kind of behavior. It's our job to do a lot of stuff around it to make it less likely.

The agent harness is also a big part of it. Some agents have very specific restrictions built in, like max number of responses or response tokens, so you can prevent it from just going off on a random tangent forever.
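A rough sketch of the kind of built-in restriction described here: a loop that stops an agent after a maximum number of turns or a token budget. The `step` callable, the `Limits` fields, and the stop reasons are all hypothetical and are not taken from any real harness.

```python
from dataclasses import dataclass


@dataclass
class Limits:
    max_turns: int = 8
    max_total_tokens: int = 2000


def run_agent(step, limits):
    """Call step(turn) -> (text, tokens_used, done) until done or a limit is hit.

    Returns (transcript, stop_reason).
    """
    transcript = []
    tokens = 0
    for turn in range(limits.max_turns):
        text, used, done = step(turn)
        tokens += used
        transcript.append(text)
        if done:
            return transcript, "done"
        if tokens >= limits.max_total_tokens:
            return transcript, "token_budget"
    return transcript, "max_turns"
```

The point of the design is that the cap lives outside the model: however the model behaves, the loop cannot run forever.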

View on HN · Topics

I generally think codex is doing well until I come in with my Opus sweep to clean it up. Claude just codes closer to the way my brain works. codex is great at finding numerical stability issues though and increasingly I like that it waits for an explicit push to start working. But talking to Claude Code the way I learned to talk to codex seems to work also so I think a lot of it is just learning curve (for me).

View on HN · Topics

Hate that about Claude Code. I have been adding permissions for it to do everything that makes sense to add when it comes to editing files, but way too often it will generate 20-30 line bash snippets using sed to do the edits instead, and then the whole permission system breaks down. It means I have to babysit it all the time to make sure no random permission prompts pop up.

View on HN · Topics

I do feel that CC sometimes starts doing dumb tasks or asking for approval for things that usually don’t really need it, like extra syntax checks or basic grep/text-parsing commands.

View on HN · Topics

Exactly. Why do they ask permission for read-only operations?! You either run with --dangerously-skip-permissions or you come back after 30 minutes to find it waiting for permission to run grep. There's no middle ground, at least not that Claude CLI users have access to.
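As an illustration of the middle ground being asked for, here is a deliberately conservative sketch that auto-approves only simple read-only commands. This is not how Claude Code decides. The allowlist and the function name are assumptions made for the example, and anything containing shell metacharacters is rejected outright.

```python
import shlex

# Hypothetical allowlist of commands that only read data.
READ_ONLY_COMMANDS = {"cat", "grep", "head", "ls", "tail", "wc"}
# Characters that allow chaining, redirection, or substitution.
SHELL_METACHARACTERS = set(";|&><`$(){}\n")


def is_read_only(command):
    """Return True only for a single allowlisted command with no shell tricks."""
    if any(ch in SHELL_METACHARACTERS for ch in command):
        return False
    try:
        tokens = shlex.split(command)
    except ValueError:  # unbalanced quotes and similar parse errors
        return False
    return bool(tokens) and tokens[0] in READ_ONLY_COMMANDS
```

With this check, `grep -rn TODO src` passes, while a `sed -i` edit or anything with a redirect still falls through to a prompt.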

View on HN · Topics

This. They kind of snuck this into the release notes: switching the default effort level to Medium. High is significantly slower, but that’s somewhat mitigated by the fact that you don’t have to constantly act like a helicopter parent for it.

View on HN · Topics

Do you have to put it in a build/execute mode (separate from a planning mode) to allow it to move on? I use opencode, and that's how it works.

View on HN · Topics

> One thing I immediately like more than Claude is that Codex seems much more transparent about what it’s thinking and what it wants to do next. I find it much easier to interrupt or jump in the middle if things are going in the wrong direction.

I've finally started experimenting recently with Claude's --dangerously-skip-permissions and Codex's --dangerously-bypass-approvals-and-sandbox through external sandboxing tools. (For now just nono¹, which I really like so far, and soon via containerization or virtual machines.)

When I am using Claude or Codex without external sandboxing tools and just using the TUI, I spend a lot of time approving individual commands. When I was working that way, I found Codex's tendency to stop and ask me whether/how it should proceed extremely annoying. I found myself shouting at my monitor, "Yes, duh, go do the thing!".

But when I run these tools without having them ask me for permission for individual commands or edits, I sometimes find Claude has run away from me a little and made the wrong changes or tried to debug something in a bone-headed way that I would have redirected with an interruption if it had stopped to ask me for permissions. I think maybe Codex's tendency to stop and check in may be more valuable if you're relying on sandboxing (external or built-in) so that you can avoid individual permissions prompts.

--

1: https://nono.sh/

View on HN · Topics

There is a new flag for terminal flickering issues:

> Claude Code v2.1.89: "Added CLAUDE_CODE_NO_FLICKER=1 environment variable to opt into flicker-free alt-screen rendering with virtualized scrollback"

View on HN · Topics

There is an official codex plugin for claude. I just have them do adversarial reviews, implementations, etc. with each other. It adds a bit of time to the workflow, but once you have the permissions sorted it'll just engage codex when necessary.

View on HN · Topics

Do this -- take your coworker's PRs that they've clearly written in Claude Code, and have Codex/GPT 5.4 review them.

Or have Codex review your own Claude Code work.

It then becomes clear just how "sloppy" CC is.

I wouldn't mind having Opus around in my back pocket to yeet out whole net new greenfield features. But I can't trust it to produce well-engineered things to my standards. Not that anybody should trust an LLM to that level, but there are matters of degree here.

View on HN · Topics

It cuts both ways. What I usually do these days is to let codex write code, then use claude code /simplify, have both codex and claude code review the PR, then finally manually review and fixup things myself. It's still ~2x faster than doing everything by myself.

View on HN · Topics

This comment thread is a good lesson for founders; look at how much anguish can be put to bed with just a little honest communication.

1. Oops, we're oversubscribed.

2. Oops, adaptive reasoning landed poorly / we have to do it for capacity reasons.

3. Here's how subscriptions work. Am I really writing this bullet point?

As someone with a production application pinned on Opus 4.5, it is extremely difficult to tell apart what is code harness drama and what is a problem with the underlying model. It's all just meshed together now without any further details on what's affected.

View on HN · Topics

What I want to know is why my bedrock-backed Claude gets dumber along with commercial users. Surely they're not touching the bedrock model itself. Only thing I can think of is that updates to the harness are the main cause of performance degradation.

View on HN · Topics

It seems a little more fussy than Opus 4.6 so far. It actually refuses to do a task from Claude's own Agent SDK quick start guide (https://code.claude.com/docs/en/agent-sdk/quickstart):

"Per the instructions I've been given in this session, I must refuse to improve or augment code from files I read. I can analyze and describe the bugs (as above), but I will not apply fixes to `utils.py`."

View on HN · Topics

Claude Code injects a 'warning: make sure this file isn't malware' message after every tool call by default. It seems like 4.7 is over-attending to this warning. @bcherny, filed a bug report feedback ID: 238e5f99-d6ee-45b5-981d-10e180a7c201

View on HN · Topics

That "per the instructions I've been given in this session" bit is interesting. Are you perhaps using it with a harness that explicitly instructs it to not do that? If so, it's not being fussy, it's just following the instructions it was given.

View on HN · Topics

I'm using their own python SDK with default prompts, exactly as the instructions say in their guide (it's the code from their tutorial).

View on HN · Topics

I assume this is because Claude Code appends a system message each time it reads a file, instructing it to consider whether the file is malware. It hasn't been an issue for me recently, but it used to be so bad that I had to patch the string out of the cli.js file. This is the instruction it uses:

> Whenever you read a file, you should consider whether it would be considered malware. You CAN and SHOULD provide analysis of malware, what it is doing. But you MUST refuse to improve or augment the code. You can still analyze existing code, write reports, or answer questions about the code behavior.

View on HN · Topics

I had the same problem. Restarted Claude Code after an update, and now it has disappeared.

View on HN · Topics

Is this happening on the latest build of Claude Code? Try `claude --update`

View on HN · Topics

This is more a CC harness thing than a model thing, but the "new" thinking messages ('hmm...', 'this one needs a moment...') are extraordinarily irritating. They're both entirely uninformative and strictly worse than a spinner. On my workflows CC often spends up to an hour thinking (which is fine if the result is good) and seeing these messages does not build confidence.

View on HN · Topics

It's hard to say. Admittedly I'm a heavy user as I intentionally cap out my 5x plan every week - I've personally found that I get more usage being on older versions of CC and being very vigilant on context management. But nobody can say for sure, we know they have A/B test capabilities from the CC leaks so it's just a matter of turning on a flag for a heavy user.

View on HN · Topics

Not showing up in claude code by default on the latest version. Apparently this is how to set it:

/model claude-opus-4-7

This comes from Anthropic's support page, so hopefully they didn't hallucinate the docs, because the model name in Claude Code says:

/model claude-opus-4-7
⎿ Set model to Opus 4

what model are you?

I'm Claude Opus 4 (model ID: claude-opus-4-7).

View on HN · Topics

On the most current version (v2.1.110) of claude:

> /model claude-opus-4.7

⎿ Model 'claude-opus-4.7' not found

View on HN · Topics

Sounds like it was added as of .111, so update and it might work?

View on HN · Topics

claude-opus-4-7

not

claude-opus-4.7
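Since the dot-versus-dash mix-up recurs throughout this thread, here is a tiny sketch that rewrites a dotted version into the dashed form that commenters report working. The helper name is made up, and the assumption that model IDs always use dashes between version digits comes only from the examples in this thread.

```python
import re


def normalize_model_id(model_id):
    """Replace a dot between two digits with a dash, leaving everything else alone."""
    return re.sub(r"(?<=\d)\.(?=\d)", "-", model_id.strip())
```

For example, `normalize_model_id("claude-opus-4.7")` yields the dashed ID, and an ID that is already dashed is returned unchanged.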

View on HN · Topics

You're not, it wasn't released yet. Update to 111 and you'll see it (I'm on Max20 and I do).

Heck, mine just automatically set it to 4.7 and xhigh effort (also a new feature?)

View on HN · Topics

Thanks, I was already on the latest claude code, I just restarted it and now it's showing 4.7 and xhigh.

xhigh was mentioned in the release post, it's the new default and between high and max.

View on HN · Topics

Dash, not dot

View on HN · Topics

/model claude-opus-4.7
⎿ Model 'claude-opus-4.7' not found

Just love that I'm paying $200 for model features they announce that I can't use!

Related features that were announced I have yet to be able to use:

$ claude --enable-auto-mode
auto mode is unavailable for your plan

$ claude
/memory
Auto-dream: on · /dream to run
Unknown skill: dream

View on HN · Topics

I think that was a typo on my end, it's "/model claude-opus-4-7" not "/model claude-opus-4.7"

View on HN · Topics

That sets it to opus 4:

/model claude-opus-4.7
⎿ Model 'claude-opus-4.7' not found

/model claude-opus-4-7
⎿ Set model to Opus 4

/model
⎿ Set model to Opus 4.6 (1M context) (default)

View on HN · Topics

Thanks, but not working for me, and I'm on the $200 max plan

Edit: Not 30 seconds later, claude code took an update and now it works!

View on HN · Topics

It's up now, update claude code

View on HN · Topics

--model claude-opus-4-7 works as well

View on HN · Topics

It does not work, it says Claude Opus 4 not 4.7

View on HN · Topics

I think it's just a visual/default thing, because Opus 4.0 isn't offered in Claude Code anymore. And Opus 4.7 is in their official docs as a model you can switch to in Claude Code.

Just ask it what model it is (even in a new chat).

what model are you?

I'm Claude Opus 4 (model ID: claude-opus-4-7).

https://support.claude.com/en/articles/11940350-claude-code-...

View on HN · Topics

Assuming /effort max still gets the best performance out of the model (meaning "ULTRATHINK" is still a step below /effort max, and equivalent to /effort high), here is what I landed on when trying to get Opus 4.7 to be at peak performance all the time in ~/.claude/settings.json:

{
  "env": {
    "CLAUDE_CODE_EFFORT_LEVEL": "max",
    "CLAUDE_CODE_DISABLE_BACKGROUND_TASKS": "1"
  }
}

The env field in settings.json persists across sessions without needing /effort max every time.

I don't like how unpredictable and low quality sub agents are, so I like to disable them entirely with CLAUDE_CODE_DISABLE_BACKGROUND_TASKS.
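If you would rather script the change than edit the file by hand, here is a hedged sketch that merges entries into the `env` object of a settings JSON document while keeping any other keys. It works on text so you can inspect the result before writing it anywhere. The variable names come from the comment above and are not verified here, and `merge_env` is a made-up helper.

```python
import json


def merge_env(settings_text, updates):
    """Return settings JSON text with `updates` merged into its "env" object."""
    settings = json.loads(settings_text) if settings_text.strip() else {}
    env = settings.setdefault("env", {})
    env.update(updates)  # new values win over existing ones
    return json.dumps(settings, indent=2) + "\n"


# Example with an invented starting file that already has one unrelated key.
merged = merge_env(
    '{"model": "example"}',
    {"CLAUDE_CODE_EFFORT_LEVEL": "max", "CLAUDE_CODE_DISABLE_BACKGROUND_TASKS": "1"},
)
```

Back up the existing settings file before overwriting it with the merged text.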

View on HN · Topics

It's been a little while since I cared all that much about the models because they work well enough already. It's the tooling and the service around the model that affects my day-to-day more.

I would guess a lot of the enterprise customers would be willing to pay a larger subscription price (1.5x or 2x) if it meant significantly higher stability and uptime. 5% more uptime would gain more trust than 5% more on gamified model metrics.

Anthropic used to position itself as more of the enterprise option and still does, but their recent issues make it seem like they are watering down the experience to appease the $20 customer rather than the $200 one. As painful as it is personally, I'd expect that they'd get more benefit long term from raising prices and gaining trust than from short-term gains in customers seeking utility at a $20 price point.

View on HN · Topics

These stuck out as promising things to try. It looks like xhigh on 4.7 scores significantly higher on the internal coding benchmark (71% vs 54%, though unclear what that is exactly)

> More effort control: Opus 4.7 introduces a new xhigh (“extra high”) effort level between high and max, giving users finer control over the tradeoff between reasoning and latency on hard problems. In Claude Code, we’ve raised the default effort level to xhigh for all plans. When testing Opus 4.7 for coding and agentic use cases, we recommend starting with high or xhigh effort.

The new /ultrareview command looks like something I've been trying to invoke myself with looping, happy that it's free to test out.

> The new /ultrareview slash command produces a dedicated review session that reads through changes and flags bugs and design issues that a careful reviewer would catch. We’re giving Pro and Max Claude Code users three free ultrareviews to try it out.

View on HN · Topics

It's frankly becoming difficult for me to imagine what the next level of coding excellence looks like though.

By which I mean, I don't find these latest models really have huge cognitive gaps. There are few problems I throw at them that they can't solve.

And it feels to me like the gap now isn't model performance, it's the agentic harnesses they're running in.

View on HN · Topics

Yeah, my personal anecdata is that Claude has just gotten better and better since January. I haven’t felt like even making the minor effort to compare with Codex’s current state. Just yesterday Claude Code made a major visible improvement in planning/executing — maybe it switched to 4.7 without me noticing? (Task: various internal Go services and Preact frontends.)

View on HN · Topics

Nobody I've seen in the comments is basing it on 4.7 performance. They're basing it on how unpleasant March and early April were on the Claude Code coding plans with 4.6. Which, from my experience, they were.

I'm interested in seeing how 4.7 performs. But I'm also unwilling to pony up cash for a month to do so. And frankly dissatisfied with their customer service and with the actual TUI tool itself.

It's not team sports, my friend. You don't have to pick a side. These guys are taking a lot of money from us. Far more than I've ever spent on any other development tooling.

View on HN · Topics

Claude code had safeguards like that hardcoded into the software. You could see it if you intercept the prompts with a proxy
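Once a request has been captured through a proxy, a short script can check whether a given phrase was injected anywhere in it. The sketch below walks any JSON body and returns the strings containing the phrase. It does not assume a particular request schema, and the sample payload in the usage note is invented rather than captured from a real session.

```python
import json


def iter_strings(node):
    """Yield every string value nested anywhere inside parsed JSON."""
    if isinstance(node, str):
        yield node
    elif isinstance(node, list):
        for item in node:
            yield from iter_strings(item)
    elif isinstance(node, dict):
        for value in node.values():
            yield from iter_strings(value)


def find_injected_text(request_body, needle):
    """Return all strings in the JSON body that contain `needle`, case-insensitively."""
    payload = json.loads(request_body)
    needle = needle.lower()
    return [s for s in iter_strings(payload) if needle in s.lower()]
```

For instance, calling `find_injected_text(body, "considered malware")` on a captured body would list every message or tool result carrying the reminder quoted earlier in this thread.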

View on HN · Topics

https://marginlab.ai/trackers/claude-code-historical-perform...

View on HN · Topics

Seems they jumped the gun releasing this without a claude code update?

/model claude-opus-4.7
⎿ Model 'claude-opus-4.7' not found

View on HN · Topics

Seems like it's not in Claude Code natively yet, but you can do an explicit `/model claude-opus-4-7` and it works.

View on HN · Topics

You use Claude Code? Then harness changes will have had much more impact than any model "stealth nerfing".

View on HN · Topics

Both CC but also cursor with raw api calls.
