Vibe Coding Culture Critique

Critics blame Anthropic's internal vibe coding practices for quality issues, citing a bug that drove peak memory usage to 68GB and the irony of an AI coding tool maker not properly testing its own code.

Critics contend that Anthropic has pivoted from a "premium" engineering culture to a reckless "vibe coding" approach, where an over-reliance on AI-generated solutions has led to embarrassing technical failures like 68GB peak memory usage and broken session logic. This shift highlights a biting irony: while management suggests AI can largely replace human engineers, the resulting products often feel like unpolished "slop" that lacks basic QA, rigorous edge-case testing, and senior product oversight. Commentators argue that internal developers operating with effectively infinite tokens are increasingly disconnected from real-world user constraints, creating a "complexity trap" where the software becomes too convoluted for its own creators to reason about. Ultimately, there is a growing concern that by prioritizing rapid, "vibes-based" feature development over core reliability, Anthropic is undermining its own reputation and proving that traditional engineering rigor remains indispensable.

View on HN · Topics

Anthropic literally advertises long sessions, 1M context, high reasoning etc.

And then their vibe-coders tell us that we are to blame for using the product exactly as advertised: https://x.com/lydiahallie/status/2039800718371307603 while silently changing how the product works.

Please stop defending hapless innocent corporations.

View on HN · Topics

clarification though: the cache that's important to the GPU/NPU is loaded directly in the memory of the cards; it's not saved anywhere else. They could technically create cold storage of the tokens (vectors) and load that, but given how ephemeral all these vibe coders are, it's unlikely there's any value in saving those vectors to load in.

So then it comes to what you're talking about, which is processing the entire text chain, which is a different kind of cache, and generating the equivalent tokens is what's being charged for.

But once you realize the efficiency of the product in extended sessions is cached in the immediate GPU hardware, then it's obvious that the oversold product can't just idle the GPU when sessions idle.

View on HN · Topics

> I’m sorry to be harsh, but your engineering culture must change. There are some types of software you can yolo. This isn’t one of them. The downstream cost of stupid mistakes is way, way too high, and far too many entirely avoidable bugs — and poor design choices — are shipping to customers way too often.

I have to imagine this isn't helped by working somewhere where you effectively have infinite tokens and usage of the product that people are paying for, sometimes a lot.

View on HN · Topics

It’s certainly #2. They have shown over dozens of decisions they move very quickly, break stuff, then have to both figure out what broke and how to explain it.

View on HN · Topics

what's even more amazing is it took them two weeks to fix what must have been a pretty obvious bug, especially given who they are and what they are selling.

View on HN · Topics

they just vibecoded a fix and didnt think about the tradeoff they were making and their always yes-man of a model just went with it

View on HN · Topics

A simpler explanation (esp. given the code we've seen from claude), is that they are vibecoding their own tools and moving fast and breaking things with predictably sloppy results.

View on HN · Topics

Yeah and that statement also speaks to their test rigor if they make a change that big without thoroughly testing the edge case they're modifying.

View on HN · Topics

IMO this is the consequence of a relentless focus on feature development over core product refinement. I often have the impression that Anthropic would benefit from a few senior product people. Someone needs to lend them a copy of “Escaping the Build Trap.” Just because we _can_ rapidly add features now doesn’t mean we should.

PS I’m not referencing a well-known book to suggest the solution is trite product group think, but good product thinking is a talent separate from good engineering, and Anthropic seems short on the former recently.

View on HN · Topics

Essentially they should hire a few of the old school product guys from Apple. Beat me to it, but the obsession with UX and quality from earlier Apple is exactly what they urgently need instead of tech folks trying to engineer themselves into complicated rabbit holes and shenanigans.

View on HN · Topics

I think they've dug themselves into a complexity trap. Beyond the stochastic nature of the models themselves, I don't think they're able to reason about their software anymore. Too many levers, too many dials, and code that likely nobody understands.

But worse, based on the pronouncements of Dario et al I suspect management is entirely unsympathetic because they believe we (SWEs) are on the chopping block to be replaced. And any intimation about putting guard rails around these tools for quality concerns ... I suspect is being ignored or discouraged.

In the end, I feel like Claude Code itself started as a bit of a science experiment and it doesn't smell to me like it's adopted mature best practices coming out of that.

View on HN · Topics

They had like 100 devs making 600k at one point. The issue is certainly not lack of talent. More like, they insist on forcing the vibe coding narrative. Some candidates are refusing interview requests accordingly.

View on HN · Topics

Even for all of us plan users, where we got barely any use from our plan because we'd destroy our 5h and 1w usage limits, also unlikely, after all they have an out of "your usage limits are guaranteed to be 5x of Pro users" (who are also being screwed).

Of course, all their vibe coding is being done with effectively infinite tokens, so...

View on HN · Topics

And the reason why Claude Code is so buggy ...

https://techtrenches.dev/p/the-snake-that-ate-itself-what-cl...

View on HN · Topics

It seems like there is no concept of deployment, or even A/B testing: whatever works on (presumably) a Claude employee's laptop for the hour they spent testing it will ship immediately to everyone.

I mean, yes, even testing in production with some of your customers is better than... testing with ALL of your customers?

View on HN · Topics

If Anthropic couldn't catch these issues before people started screaming at them, do we really believe 50% of software engineering jobs are going away?

View on HN · Topics

Anthropic releases used to feel thorough and well done, with the models feeling immaculately polished. It felt like using a premium product, and it never felt like they were racing to keep up with the news cycle, or reply to competitors.

Recently that immaculately polished feel is harder to find. It coincides with the daily releases of CC, Desktop App, unknown/undocumented changes to the various harnesses used in CC/Cowork. I find it an unwelcome shift.

I still think they're the best option on the market, but the delta isn't as high as it was. Sometimes slowing down is the way to move faster.

View on HN · Topics

hm. ml people love static evals and such, but have you considered approaches that typically appear in saas? (slow rollouts, org/user-constrained testing pools with staged rollouts, real-world feedback from actual usage data, where privacy policy permits)
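As an illustration of the staged-rollout idea mentioned in this comment, here is a minimal sketch in Python. It assumes a hash-based percentage gate; the names `rollout_bucket` and `is_enabled` and the feature key `idle-prune` are hypothetical and do not describe how Anthropic actually ships changes.

```python
import hashlib


def rollout_bucket(user_id: str, feature: str) -> int:
    """Map a user to a stable bucket in [0, 100) for a given feature.

    Hashing the feature name together with the user ID means each feature
    gets an independent, repeatable assignment of users to buckets.
    """
    digest = hashlib.sha256(f"{feature}:{user_id}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % 100


def is_enabled(user_id: str, feature: str, percent: int) -> bool:
    """Return True if the user falls inside the current rollout percentage."""
    return rollout_bucket(user_id, feature) < percent


if __name__ == "__main__":
    # Hypothetical usage: expose a change to roughly 5% of users first.
    users = [f"user-{i}" for i in range(1000)]
    exposed = [u for u in users if is_enabled(u, "idle-prune", 5)]
    print(f"{len(exposed)} of {len(users)} users see the change")
```

Because each user's bucket is fixed, raising the percentage only adds users to the exposed group, so feedback from the early cohort stays comparable as the rollout widens.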

View on HN · Topics

And you didn't invest anything in polish, quality and reliability before... why? Because for any questions people have you reply something like "I have Claude working on this right now" and have no idea what's happening in the code?

A reminder: your vibe-coded slop required peak 68GB of RAM, and you had to hire actual engineers to fix it.

View on HN · Topics

Here's Jarred Sumner of Bun saying they reduced peak consumption from 68GB to 1.7GB: https://x.com/jarredsumner/status/2026497606575398987 Anthropic had acquired Bun just 3 months prior.

A month prior their vibe-coders were unironically telling the world how their TUI wrapper for their own API is a "tiny game engine" as they were (and still are) struggling to output a couple of hundred characters on screen: https://x.com/trq212/status/2014051501786931427

Meanwhile Boris: "Claude fixes most bugs by itself." while breaking the most trivial functionality all the time: https://x.com/bcherny/status/2030035457179013235 https://x.com/bcherny/status/2021710137170481431 https://x.com/bcherny/status/2046671919261569477 https://x.com/bcherny/status/2040210209411678369 while claiming they "test carefully": https://x.com/bcherny/status/2024152178273989085

View on HN · Topics

I've noticed the same thing in my own AI assisted work. Feels like I'm moving too fast and it's easy to implement decisions quickly but they really have to be the right f--ing decisions. In the past dev was so slow so you had a lot of time to vet the hard decisions and now you don't.

View on HN · Topics

I agree. It all feels so AI-sloppy now.

View on HN · Topics

I guess it's a bit of desperation to find a sustainable business model.

The AI hype is dying, at least outside the silicon valley bubble which hackernews is very much a part of.

That and all the dogfooding by slop coding their user facing application(s).

View on HN · Topics

To be fair to Anthropic, they did not intentionally degrade performance.

To take the opposite side, this is the quality of software you get atm when your org is all in on vibe coding everything.

View on HN · Topics

Experienced engineers that know the codebase and system well, and with enough time to consider the problem properly would likely consider this case.

But if we're vibing... This is the kind of bug that should make it back into a review agent/skill's instructions in a more generic format. Essentially, if something is done to the message history, check that there are tests verifying that subsequent turns work as expected.

But yeah, you'd have to piss off a bunch of users in prod first to discover the blind spot.
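The kind of regression check this comment asks for can be sketched roughly as follows. This is a toy model written in Python under stated assumptions: the `Session` class, `prune_if_idle`, `take_turn`, and the one-hour threshold constant are hypothetical stand-ins based only on the post-mortem wording quoted in this thread, not Claude Code's actual implementation.

```python
from dataclasses import dataclass, field

IDLE_THRESHOLD_S = 3600  # assumed: "idle for over an hour"


@dataclass
class Session:
    # Each message is a dict with a "kind" ("user", "thinking", "assistant")
    # and a "text". This structure is a hypothetical simplification.
    messages: list = field(default_factory=list)
    last_active: float = 0.0


def prune_if_idle(session: Session, now: float) -> bool:
    """Drop older thinking only when the session resumes after a long idle gap."""
    was_idle = now - session.last_active > IDLE_THRESHOLD_S
    if was_idle:
        session.messages = [m for m in session.messages if m["kind"] != "thinking"]
    # Refreshing the timestamp every turn is what keeps pruning a one-off event.
    session.last_active = now
    return was_idle


def take_turn(session: Session, now: float, user_text: str) -> bool:
    """Run one simulated turn and report whether pruning happened."""
    pruned = prune_if_idle(session, now)
    session.messages.append({"kind": "user", "text": user_text})
    session.messages.append({"kind": "thinking", "text": f"notes on {user_text}"})
    session.messages.append({"kind": "assistant", "text": f"reply to {user_text}"})
    return pruned


def thinking_count(session: Session) -> int:
    return sum(1 for m in session.messages if m["kind"] == "thinking")


def test_pruning_happens_once_after_idle() -> None:
    s = Session()
    assert take_turn(s, 10.0, "a") is False    # fresh session, nothing pruned
    assert take_turn(s, 4010.0, "b") is True   # resumed after more than an hour
    assert thinking_count(s) == 1              # only the new turn's thinking remains
    assert take_turn(s, 4015.0, "c") is False  # the next turn must not prune again
    assert thinking_count(s) == 2              # thinking from turn "b" survives


if __name__ == "__main__":
    test_pruning_happens_once_after_idle()
    print("ok")
```

The last two assertions are the ones that matter here: they exercise the turn after the history was modified, which is where a "prune on every turn" bug would show up.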

View on HN · Topics

Those are exactly the kind of issues you run into when your app is AI-coded: you build one thing and kill something else.

You have too many benchmarks, and the wrong ones.

View on HN · Topics

To think we'd have known about this in advance if they'd just have open sourced Claude Code, rather than them being forced into this embarrassing post mortem. Sunlight is the best disinfectant.

View on HN · Topics

A heavily vibe coded CLI would have tons of issues, regularly.

LLMs over edit and it's a known problem.

View on HN · Topics

I had a similar experience just before 4.5 and before 4.6 were released.

Somehow, three times makes me not feel confident in this response.

Also, if this is all true and correct, how the heck do they validate quality before shipping anything?

Shipping software without quality is a pretty easy job even without AI. Just saying....

View on HN · Topics

The issue making Claude just not do any work was infuriating to say the least. I already ran at medium thinking level so was never impacted, but having to constantly go "okay now do X like you said" was annoying.

Again goes back to the "intern" analogy people like to make.

View on HN · Topics

Zero QA basically.

View on HN · Topics

I'd go more along the lines of "don't know what to QA for"

View on HN · Topics

or you can use a non vibe designed efficient Rust TUI coding agent made by yours truly, all my coworkers use it too :) called https://maki.sh !

lua plugins WIP

View on HN · Topics

Maybe he didn't know, or they were still figuring it out, which is fine: they're still engineers who can get things wrong sometimes. But the communication felt lackluster, and being on the receiving end sucks when you had a reliable setup which then degrades. There is a reason people don't upgrade software and why people say "if it works, don't fix it," but obviously that's not an option for Anthropic when you want to keep improving the product, so they need good measurement tools and quick rollbacks, even if properly "benchmarking" LLMs could prove difficult.

View on HN · Topics

> On March 26, we shipped a change to clear Claude's older thinking from sessions that had been idle for over an hour, to reduce latency when users resumed those sessions. A bug caused this to keep happening every turn for the rest of the session instead of just once, which made Claude seem forgetful and repetitive. We fixed it on April 10. This affected Sonnet 4.6 and Opus 4.6.

Is it just me or does this seem kind of shocking? Such a severe bug affecting millions of users with a non-trivial effect on the context window that should be readily evident to anyone looking at the analytics. Makes me wonder if this is the result of Anthropic's vibe-coding culture. No one's actually looking at the product, its code, or its outputs?

View on HN · Topics

It's really hard to understand. There needs to be really loud batman sign in the sky type signals from some hero third party calling out objective product degradation. Do they use cc internally? If so do they use a different version? This should've been almost as loud a break as service just going down altogether, yet it took 2 weeks to fix?!

View on HN · Topics

> they were challenging to distinguish from normal variation in user feedback at first

translation: we ignored this and our various vibe coders were busy gaslighting everyone saying this could not be happening

View on HN · Topics

Resuming sessions has been broken since Feb (I had to get Claude to write a hook to fix that itself), the monitoring tool doesn't work and blocks usage of what does (simple sleep - except it doesn't even block correctly so you just sidestep in more ridiculous ways), and yet there seem to be more annoying activity proxies/spinner wheels (staring into middle distance)... Like, I don't know how in a span of a few months you lose such focus on your product goals. Has Anthropic reached that point in their lifecycle already where their product team is no longer staffed by engineers and they have more and more non-technical MBAs joining trying to ride the hype train?
