Model Quality Comparisons

Debates about GPT-5.4 versus Claude Opus performance, observations about model improvements and regressions, and strategies for switching between providers based on current quality

While users debate whether GPT-5.4 or Claude Opus currently holds the crown, many developers are adopting a pragmatic, "loyalty-free" strategy: they keep repository configuration agent-agnostic so they can jump between providers whenever quality drops or usage limits tighten. Some contributors praise GPT-5.x models for "one-shotting" contrived Linux server failures and cite a more stable, polished Mac app as a reason to switch, while others find Claude far better for data-centric work, where Codex needed much more human intervention. Beyond raw benchmarks, one recurring view is that these models work best as guided tools rather than autonomous architects, particularly in explicit languages like Go, where the developer fixes the structure and the agent fills in the details. Ultimately, the choice often comes down to current stability and specific workflows, with users rotating subscriptions to get around "nerfed" models or restrictive daily and weekly caps.

View on HN · Topics

This is the way to do it if you're a serious developer: you use the AI coding agent as a tool, guiding it with your experience. Telling a coding agent "build me an app" sounds great, but you get garbage. Telling an agent "I've stubbed out the data model and flow in the provided files, fill in the TODOs for me" gives you the control over structure that AI lacks. The code in the functions can usually be tweaked yourself to suit your style. Agents are also helpful for processing 20 different specs, docs, and RFCs together to help you design certain code flows, but you still have to understand how things work to get something decent.

Note that I program in Go, so there is only really 1 way to do anything, and it's super explicit how to do things, so AI is a true help there. If I were using Python, I might have a different opinion, since there are 27 ways to do anything. The AI is good at Go, but I haven't explored outside of that ecosystem yet with coding assistance.
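
As a rough illustration of the "stub the structure, let the agent fill in the TODOs" workflow described in this comment, here is a minimal Go sketch. The `Order` type, `ErrInvalidOrder`, and both functions are hypothetical names invented for this example, not anything from the thread. The developer is assumed to write the types, signatures, and TODO comments; the function bodies stand in for what an agent might plausibly fill in.

```go
package main

import (
	"errors"
	"fmt"
)

// Order is a hypothetical data model, written by hand by the developer.
type Order struct {
	ID        string
	Quantity  int
	UnitCents int
}

// ErrInvalidOrder is a hypothetical sentinel error, also defined by the developer.
var ErrInvalidOrder = errors.New("invalid order")

// Validate has a developer-written signature and TODO.
// The body below is the kind of code the agent would be asked to supply.
func (o Order) Validate() error {
	// TODO(agent): reject empty IDs and non-positive quantities or prices.
	if o.ID == "" || o.Quantity <= 0 || o.UnitCents <= 0 {
		return ErrInvalidOrder
	}
	return nil
}

// TotalCents fixes the flow (validate, then sum); the agent fills in the loop.
func TotalCents(orders []Order) (int, error) {
	// TODO(agent): validate each order, then sum Quantity * UnitCents.
	total := 0
	for _, o := range orders {
		if err := o.Validate(); err != nil {
			return 0, fmt.Errorf("order %q: %w", o.ID, err)
		}
		total += o.Quantity * o.UnitCents
	}
	return total, nil
}

func main() {
	total, err := TotalCents([]Order{{ID: "a1", Quantity: 2, UnitCents: 250}})
	fmt.Println(total, err)
}
```

The point of the split is that the types, error values, and function boundaries stay under the developer's control, so the agent's output can be reviewed one small body at a time.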

View on HN · Topics

I did some work on an agent that was supposed to demonstrate a learning pipeline. I figured having it fix broken Linux servers with some contrived failures would make for a good example of it getting stuck, having to get some assistance to progress, and then having a better capability for handling that class of failure in the future.

I couldn't come up with a single failure mode the agent with a gpt5.x model behind it couldn't one shot. I created socket overruns.. dangling file descriptors.. badly configured systemd units.. busted route tables.. "failed" volume mounts..

Had to start creating failures of internal services the models couldn't have been trained on and it was still hard to have scenarios it couldn't one shot.

View on HN · Topics

Yeah but has that really happened? Anthropic doesn't have the compute so everyone can switch to Claude for a couple months, get nerfed, switch back. Gemini has horrible UX.

View on HN · Topics

> Anthropic doesn't have the compute so everyone can switch to Claude for a couple months, get nerfed, switch back.

This seems to be the new narrative around here but it's not jibing with what I'm experiencing. Obviously Anthropic's uptime stats are terrible but when it's up, it's excellent (and I personally haven't had any issues with uptime this week, although my earlier-in-the-week usage was lighter than usual).

I'm loving 4.7. I was loving 4.6 too. I use Codex to get code reviews done on Claude-generated code but have no interest in using it as my daily driver.

View on HN · Topics

really struggling to understand where this is coming from, agents haven't really improved much over using the existing models. anything an agent can do, is mostly the model itself. maybe the technology itself isn't mature yet.

View on HN · Topics

Codex is my favorite UX for anything, as it edits the files and I can use the proper tooling to adjust and test stuff, so in my experience it was already able to do everything. However, lately the limits seem to have gotten extremely tight, and I keep using up the daily limits way too quickly. The weekly limits are also often used up early, so I switch to Claude or Gemini or something.

View on HN · Topics

if it doesn't complain about everything being malware maybe i will come back to openai from my adventures with anthropic

View on HN · Topics

My monthly subscription for Claude is up in a week, is there any compelling reason to switch to Codex (for coding/bug fixing of low/medium difficulty apps)? Or is it pretty much a wash at this point?

View on HN · Topics

FWIW, I've found Codex with GPT-5.4 to be better than Opus-4.6; I would say it's at least worth checking out for your use case.

View on HN · Topics

I'm switching because of the higher usage limits, 2x speed mode that isn't billed as extra usage, and much more stable and polished Mac app.

View on HN · Topics

at least for our scope of work (data, interfacing with data, building things to extract data quickly and dump to warehouse, resuming) claude is performing night and day better than codex. we're still continuing tinkering with codex here to see if we're happy with it but it's taking a lot more human-in-the-loop to keep it from going down the wrong path and we're finding that we're constantly prompt-nudging it to the end result. for the most part after ~3 days we're not super happy with it. kinda feels like claude did last year idk. it's worth checking out and seeing if it's succeeding at the stuff you want it to do.

View on HN · Topics

Wait for new GPT release this/next week and then decide based on benchmarks. That is what I will do.

One main thing is to de-couple the repos from specific agents e.g. use .mcp.json instead of "claude plugins", use AGENTS.md (and symlink to CLAUDE.md) and so on.

I love this because I have absolutely 0 loyalty to any of these companies and once Anthropic nerfs I just switch to OpenAI, then I can switch to Google and so on. Whichever works best.

View on HN · Topics

Honestly, just try it. I've used both, and there's no reason not to switch depending on which model is superior at a given point. I've found 5.4 to be better atm (subject to change any time) even though Claude Code had a slicker UI for a while.
