Summarizer

Vibe Coding and Code Quality

The polarization around building apps without reading the code: critics warn of unmaintainable "slop" and technical debt, while proponents value the speed and the ability to bypass syntax.

← Back to Opus 4.5 is not the normal AI agent experience that I have had thus far

87 comments tagged with this topic

View on HN · Topics
> I still believe those who rave about it are not writing anything I would consider "engineering". Correct. In fact, this is the entire reason for the disconnect, where it seems like half the people here think LLMs are the best thing ever and the other half are confused about where the value is in these slop generators. The key difference is that (despite everyone calling themselves an SWE nowadays) there's a difference between a "programmer" and an "engineer". Looking at OP, exactly zero of his screenshotted apps are what I would consider "engineering". Literally everything in there has been done over and over to death. Engineering is... novel, for lack of a better word. See also: https://www.seangoedecke.com/pure-and-impure-engineering/
View on HN · Topics
Respectfully, it's absolutely important to "gatekeep" a title that has an established definition and certain expectations attached to the title. OP says, "BUT YOU DON’T KNOW HOW THE CODE WORKS.. No I don’t. I have a vague idea, but you are right - I do not know how the applications are actually assembled." This is not what I would call an engineer. Or a programmer. "Prompter", at best. And yes, this is absolutely "lesser than", just like a middleman who subcontracts his work to Fiverr (and has no understanding of the actual work) is "lesser than" an actual developer.
View on HN · Topics
I just uninstalled Zed today when I realized the reason I couldn't delete a file on Windows was that it was open in Zed. So I wouldn't speak too highly of the LLM's ability to write code. I have never seen another editor on Windows make the mistake of opening files without enabling all 3 share modes.
View on HN · Topics
I'll second this. I'm making a fairly basic iOS/Swift app with an accompanying React-based site. I was able to vibe-code the React site (it isn't pretty, but it works and the code is fairly decent). But I've struggled to get the Swift code to be reliable. Which makes sense. I'm sure there's lots of training data for React/HTML/CSS/etc. but much less with Swift, especially the newer versions.
View on HN · Topics
I had surprising success vibe coding a swift iOS app a while back. Just for fun, since I have a bluetooth OBD2 dongle and an electric truck, I told Claude to make me an app that could connect to the truck using the dongle, read me the VIN, odometer, and state of charge. This was middle of 2025, so before Opus 4.5. It took Claude a few attempts and some feedback on what was failing, but it did eventually make a working app after a couple hours. Now, was the code quality any good? Beats me, I am not a swift developer. I did it partly as an experiment to see what Claude was currently capable of and partly because I wanted to test the feasibility of setting up a simple passive data logger for my truck. I'm tempted to take another swing with Opus 4.5 for the science.
View on HN · Topics
I hate "vibe code" as a verb. May I suggest "prompt" instead? "I was able to prompt the React site…."
View on HN · Topics
This was me. I was a huge AI coding detractor on here for a while (you can check my comment history). But, in order to stay informed and not just be that grouchy curmudgeon all the time, I kept up with the models and regularly tried them out. Opus 4.5 is so much better than anything I've tried before, I'm ready to change my mind about AI assistance. I even gave -True Vibe Coding- a whirl. Yesterday, from a blank directory and text file list of requirements, I had Opus 4.5 build an Android TV video player that could read a directory over NFS, show a grid view of movie poster thumbnails, and play the selected video file on the TV. The result wasn't exactly full-featured Kodi, but it works in the emulator and actual device, it has no memory leaks, crashes, ANRs, no performance problems, no network latency bugs or anything. It was pretty astounding. Oh, and I did this all without ever opening a single source file or even looking at the proposed code changes while Opus was doing its thing. I don't even know Kotlin and still don't know it.
View on HN · Topics
How do you know “it has no memory leaks, crashes, ANRs, no performance problems, no network latency bugs or anything” if you built it just yesterday? Isn’t it a bit too early for claims like this? I get it’s easy to bring ideas to life but aren’t we overly optimistic?
View on HN · Topics
By tomorrow the app will be replaced with a new version from the other competitor, by that time the memory leak will not reveal itself
View on HN · Topics
Part of the "one day" development time was exhaustively testing it. Since the tool's scope is so small, getting good test coverage was pretty easy. Of course, I'm not guaranteeing through formal verification methods that the code is bug free. I did find bugs, but they were all areas that were poorly specified by me in the requirements.
View on HN · Topics
Hah! I actually initiated the project because I'm a long time XBMC/Kodi user. I started using it when it was called XBMC, on an actual Xbox 1. I am sick and tired of its crashing, poor playback performance, and increasingly bloated feature set. It's embarrassing when I have friends or family over for movie night, and I have to explain "Sorry folks, Kodi froze midway through the movie again" while I frantically try to re-launch/reboot my way back to watching the movie. VLC's playback engine is much better but the VLC app's TV UX is ass. This application actually uses the libVLC playback engine under the hood.
View on HN · Topics
I think anecdotes like this may prove very relevant the next few years. AI might make bad code, but a project of bad code that's still way smaller than a bloated alternative, and has a UX tailored to your exact requirements could be compelling. A big part of the problem with existing software is that humans seem to be pretty much incapable of deciding a project is done and stop adding to it. We treat creating code like a job or hobby instead of a tool. Nothing wrong with that, unless you're advertising it as a tool.
View on HN · Topics
I decided to vibe code something myself last week at work. I've been wanting to create a PoC that involves a coding agent creating custom bokeh plots that a user can interact with and ask follow-up questions about. All this had to be served using a holoview panel library. At work I only have access to Claude through the GitHub Copilot integration, so this could be the cause of my problems. Claude was able to get the first iteration up pretty quickly. At that stage the app could create a plot and you could interact with it and ask follow-up questions. Then I asked it to extend the app so that it could generate multiple plots and the user could interact with all of them one at a time. It made a bunch of changes but the feature was never implemented. I asked it to do it again but got the same outcome. I completely accept that it could all be because I am using VS Code Copilot or because my prompting skills are not good, but the LLM got 70% of the way there and then completely failed.
View on HN · Topics
> Oh, and I did this all without ever opening a single source file or even looking at the proposed code changes while Opus was doing its thing. I don't even know Kotlin and still don't know it. ... says it all.
View on HN · Topics
I recently replaced my monitor with one that could be vertically oriented, because I'm just using Claude Code in the terminal and not looking at file trees at all, but I do want a better way to glance at and keep up with what it's doing in longer conversations, for my own mental context window.
View on HN · Topics
This was me. I have done a full 180 over the last 12 months or so, from "they're an interesting idea, and technically impressive, but not practically useful" to "holy shit I can have entire days/weeks where I don't write a single line of code".
View on HN · Topics
The value I'm getting from this stuff is so large that I'll take those risks, personally.
View on HN · Topics
Many people - simonw is the most visible of them, but there are countless others - have given up trying to convince folks who are determined not to be convinced, and are simply enjoying their increased productivity. This is not a competition or an argument.
View on HN · Topics
Maybe they are struggling to convince others because they are unable to produce evidence that is able to convince people? My experience scrolling X and HN is a bunch of people going "omg opus omg Claude Code I'm 10x more productive" and that's it. Just hand wavy anecdotes based on their own perceived productivity. I'm open to being convinced but just saying stuff is not convincing. It's the opposite, it feels like people have been put under a spell. I'm following The Primeagen, he's doing a series where he is trying these tools on stream and following peoples advice on how to use them the best. He's actually quite a good programmer so I'm eager to see how it goes. So far he isn't impressed and thus neither am I. If he cracks it and unlocks significant productivity then I will be convinced.
View on HN · Topics
It's true that some people will just continually move the goalposts because they are invested in their beliefs. But that doesn't mean that the skepticism around certain claims isn't relevant. Nobody serious is disputing that LLMs can generate working code. They dispute claims like "Agentic workflows will replace software developers in the short to medium term", or "Agentic workflows lead to 2-100x improvements in productivity across the board". This is what people are looking for in terms of evidence and there just isn't any. Thus far, we do have evidence that AI (at least in OSS) produces a 19% decrease in productivity [0]. We also have evidence that it harms our cognitive abilities [1]. Anecdotally, I have found myself lazily reaching for LLM assistance when encountering a difficult problem instead of thinking deeply about the problem. Anecdotally, I also struggle to be more productive using AI-centric agent workflows in areas of expertise. We want evidence that "vibe engineering" is actually more productive across the entire lifespan of a software project. We want evidence that it produces better outcomes. Nobody has yet shown that. It's just people claiming that because they vibe coded some trivial project, all of software development can benefit from this approach. Recently a principal engineer at Google claimed that Claude Code wrote their team's entire year's worth of work in a single afternoon. They later walked that claim back, but most do not. I'm more than happy to be convinced, but it's becoming extremely tiring to hear the same claims being parroted without evidence and then get called a luddite when you question them. It's also tiring when you push them on it and they blame it on the model you use, and then the agent, and then the way you handle context, and then the prompts, and then "skill issue". Meanwhile all they have to show is some slop that could be hand coded in a couple of hours by someone familiar with the domain.
I use AI, I was pretty bullish on it for the last two years, and the combination of it simply not living up to expectations + the constant barrage of what feels like a stealth marketing campaign parroting the same thing over and over (the new model is way better, unlike the other times we said that) + the amount of absolute slop code that seems to continue to increase + companies like Microsoft producing worse and worse software as they shoehorn AI into every single product (Office was renamed to Copilot 365). I've become very sensitive to it, much in the same way I was very sensitive to the claims being made by certain VC backed webdev companies regarding their product + framework in the last few years. I'm not even going to bring up the economic, social, and environmental issues because I don't think they're relevant, but they do contribute to my annoyance with this stuff. [0] https://metr.org/blog/2025-07-10-early-2025-ai-experienced-o... [1] https://news.harvard.edu/gazette/story/2025/11/is-ai-dulling...
View on HN · Topics
IDEs vs vim makes a lot of sense. AI really does feel like using an IDE in a certain way. Using AI for me absolutely makes it feel like I'm more productive. When I look back on my work at the end of the day and look at what I got done, it would be ludicrous to say it was multiple times the amount of my output pre-AI. Despite all the people replying to me saying "you're holding it wrong", I know the fix for it doing the wrong thing: specify in more detail what I want. The problem with that is twofold: 1. How much to specify? As little as possible is the ideal, if we want to maximize how much it can help us. A balance here is key. If I need to detail every minute thing I may as well write the code myself. 2. If I get this step wrong, I still have to review everything, rethink it, go back and re-prompt, costing time. When I'm working on production code, I have to understand it all to confidently commit. It costs time for me to go over everything, sometimes over multiple iterations. Sometimes the AI uses things I don't know about and I need to dig into them to understand them. AI is currently writing 90% of my code. Quality is fine. It's fun! It's magical when it nails something one-shot. I'm just not confident it's faster overall.
View on HN · Topics
I think this is an extremely honest perspective. It's actually kind of cool that it's gotten to the point it can write most code - albeit with a lot of handholding.
View on HN · Topics
The "tools" in this context are literally a few hundred lines of Python or Github CI build pipeline, we're not talking about 500kLOC massive applications. I'm building tools, not complete factories :) The AI builds me a better hammer specifically for the nails I'm nailing 90% of the time. Even if the AI goes away, I still know how the custom hammer works.
View on HN · Topics
That's because Opus has been out for almost 5 months now lol. It's the same model, so I think people have been vibe coding with a heavy dose of wine this holiday and are now convinced it's the future.
View on HN · Topics
Note how nothing in your comment addresses anything I said. Except the last sentence that basically confirms what I said. This perfectly illustrates the discourse around AI. As for the snide and patronizing "it's in your interest to stay relevant": 1. I use these tools daily. That's why I don't subscribe to willful wide-eyed gullibility. I know exactly what these tools can and cannot do. The vast majority of "AI skeptics" are the same. 2. In a few years when the world is awash in barely working incomprehensible AI slop my skills will be in great demand. Not because I'm an amazing developer (I'm not), but because I have experience separating wheat from the chaff
View on HN · Topics
I know someone who is using a vibe coded or at least heavily assisted text editor, praising it daily, while also saying llms will never be productive. There is a lot of dissonance right now.
View on HN · Topics
I'm trying to determine what programming tasks are not in this list. :) I think it is trying to exclude adding new features and fixing bugs in existing code. I've done enough of that with LLMs, though not in large codebases. I should say I'm hardly ever vibe-coding, unlike the original article. If I think I want code that will last, I'll steer the models in ways that lean on years of non-LLM experience. E.g., I'll reject results that might work if they violate my taste in code. It also helps that I can read code very fast. I estimate I can read code 100x faster than most students. I'm not sure there is any way to teach that other than the old-fashioned way, which involves reading (and writing) a lot of code.
View on HN · Topics
All of these things work very well IMO in a professional context. Especially if you're in a place where a lot of time was spent previously revising PRs for best practices, etc, even for human-submitted code, then having the LLM do that for you that saves a bunch of time. Most humans are bad at following those super-well. There's a lot of stuff where I'm pretty sure I'm up to at least 2x speed now. And for things like making CLI tools or bash scripts, 10x-20x. But in terms of "the overall output of my day job in total", probably more like 1.5x. But I think we will need a couple major leaps in tooling - probably deterministic tooling, not LLM tooling - before anyone could responsibly ship code nobody has ever read in situations with millions of dollars on the line (which is different from vibe-coding something that ends up making millions - that's a low-risk-high-reward situation, where big bets on doing things fast make sense. if you're already making millions, dramatic changes like that can become high-risk-low-reward very quickly. In those companies, "I know that only touching these files is 99.99% likely to be completely safe for security-critical functionality" and similar "obvious" intuition makes up for the lack of ability to exhaustively test software in a practical way (even with fuzzers and things), and "i didn't even look at the code" is conceding responsibility to a dangerous degree there.)
View on HN · Topics
This sounds nice, except for the fact that almost everyone else can do this, too. Or at least try to, resulting in a fast race to the bottom. Do you really want to be a middle manager to a bunch of text boxes, churning out slop, while they drive up our power bills and slowly terraform the planet?
View on HN · Topics
> except for the fact that almost everyone else can do this, too. Or at least try to, resulting in a fast race to the bottom. Ironically, that race to the bottom is no different than what we already have. Have you worked for a company before? A lot of software is developed BADLY. I dare say that a lot of the software Opus 4.5 generates is of higher quality than what I have seen in my 25-year career. The number of companies that cheap out, hiring juniors fresh from school to work as code monkeys, is insane. Then projects have bugs and security issues, with tons of copy/pasted code, or people not knowing a darn thing. Is that any different from your feared future? I dare say that LLMs like Opus are frankly better than most juniors. Ask a junior to do a code review for security issues. Opus literally creates extensive tests and points out issues that you would expect from a mid-level or higher dev. Of course, you need to know to ask! You are the manager. > Do you really want to be a middle manager to a bunch of text boxes, churning out slop, while they drive up our power bills and slowly terraform the planet? Frankly, yes ... If you are a real developer, do you still think development is fun after 10 years, 20 years? Doing the exact same boring work. Reimplementing the 1001st login page, the 101st contact form ... A ton of our work is in reality repeating the same crap over and over again. And if we try to bypass it, we end up tied to those systems / frameworks that often become a block around our necks. Our industry has a lot of burnout because most tasks may start small but then grow beyond our scope. Today it's Ruby on Rails programming, then it's Angular, no wait, React, no wait, Vue, no wait, the new hotness is whatever again. > slowly terraform the planet? Well, I am actually making something. Can you say the same for all the power / GPU draw of Bitcoin, Ethereum, or whatever crap mining?
One is productive, a tool with insane potential and usage; the other is a virtual currency where only one is ever popular, with limited usage. Yet it burns just as much for a far more limited return in usability. Those LLMs that you are so against make me a ton more productive. You want to try something out but never wanted to commit because it meant weeks of programming? Well, now you, as manager, can get projects done fast. Learn from them way faster than your little fingers ever did.
View on HN · Topics
Why don't I see any streams building apps as quickly as they say? Just hype.
View on HN · Topics
>So my verdict is that it's great for code analysis, and it's fantastic for injecting some book knowledge on complex topics into your programming, but it can't tackle those complex problems by itself. I don't think you've seen the full potential. I'm currently #1 on 5 different very complex computer engineering problems, and I can't even write a "hello world" in rust or cpp. You no longer need to know how to write code, you just need to understand the task at a high level and nudge the agents in the right direction. The game has changed. - https://highload.fun/tasks/3/leaderboard - https://highload.fun/tasks/12/leaderboard - https://highload.fun/tasks/15/leaderboard - https://highload.fun/tasks/18/leaderboard - https://highload.fun/tasks/24/leaderboard
View on HN · Topics
How are you qualified to judge its performance on real code if you don't know how to write a hello world? Yes, LLMs are very good at writing code, they are so good at writing code that they often generate reams of unmaintainable spaghetti. When you submit to an informatics contest you don't have paying customers who depend on your code working every day. You can just throw away yesterday's code and start afresh. Claude is very useful but it's not yet anywhere near as good as a human software developer. Like an excitable puppy it needs to be kept on a short leash.
View on HN · Topics
> How are you qualified to judge its performance on real code if you don't know how to write a hello world? The ultimate test of all software is "run it and see if it's useful for you." You do not need to be a programmer at all to be qualified to test this.
View on HN · Topics
> The above is a software engineering problem. Reimplementing a JSON parser using Opus is neither fun nor useful, so that should not be used as a metric. I've also built a BitTorrent implementation from the specs in Rust where I'm keeping the binary under 1MB. It supports all active and accepted BEPs: https://www.bittorrent.org/beps/bep_0000.html Again, I literally don't know how to write a hello world in Rust. I also vibe coded a trading system that is connected to 6 trading venues. This was a fun weekend project but it ended up making +20k of pure arbitrage with just 10k of working capital. I'm not sure this proves my point, because while I don't consider myself a programmer, I did use Python, a language that I'm somewhat familiar with. So yeah, I get what you are saying, but I don't agree. I used highload as an example because it is an objective way of showing that a combination of LLMs/agents with some guidance (from someone with no prior experience in this type of high-performance architecture) was able to beat all human software developers who have taken these challenges.
View on HN · Topics
>I'm currently #1 on 5 different very complex computer engineering problems Ah yes, well known very complex computer engineering problems such as: * Parsing JSON objects, summing a single field * Matrix multiplication * Parsing and evaluating integer basic arithmetic expressions And you're telling me all you needed to do to get the best solution in the world to these problems was talk to an LLM?
View on HN · Topics
What bothers me about posts like this is: mid-level engineers are not tasked with atomic, greenfield projects. If all an engineer did all day was build apps from scratch, with no expectation that others may come along and extend, build on top of, or depend on them, then sure, Opus 4.5 could replace them. The hard thing about engineering is not "building a thing that works", it's building it the right way, in an easily understood way, in a way that's easily extensible. No doubt I could give Opus 4.5 "build me an XYZ app" and it would do well. But day to day, when I ask it "build me this feature", it uses strange abstractions, and often requires several attempts on my part to do it in the way I consider "right". Any non-technical person might read that and go "if it works it works", but any reasonable engineer will know that that's not enough.
View on HN · Topics
That sounds like an insane way to do anything that matters. Sure, create a one-off app to post things to your Facebook page. But a one-off app for the OS it's running on? Freshly generating the code for your bank transaction rules? Generating an authorization service that gates access to your email? The only reason it's quick to create green-field projects is because of all these complex, large, long-lived codebases that it's gluing together. There's ample training data out there for how to use the Firebase API, the Facebook API, OS calls, etc. Without those long-lived abstraction layers, you can't vibe out anything that matters.
View on HN · Topics
Sure, and the buildings are built to a slowly-evolving code, using standard construction techniques, operating as a predictable building in a larger ecosystem. The problem with "all software" being AI-generated is that, to use your analogy, the electrical standards, foundation, and building materials have all been recently vibe-coded into existence, and none of your construction workers are certified in any of it.
View on HN · Topics
I know what you are talking about, but there is more to life than just product-market fit. Hardly any of us are working on Postgres, Photoshop, Blender, etc., but it's not just cope to wish we were. It's good to think about the needs of business and the needs of society separately. Yes, the thing needs users, or no one is benefiting. But it also needs to do good for those users, and ultimately, at the highest caliber, craftsmanship starts to matter again. There are legitimate reasons for the startup ecosystem to focus firstly and primarily on getting the users/customers. I'm not arguing against that. What I am asking is why the industry needs to be dominated by startups in terms of the bulk of the products (not the bulk of the users). It raises the question of how much societally meaningful programming is waiting to be done. I'm hoping for a world where more end users code (vibe or otherwise) and solve their own problems with their own software. I think that will make for a smaller, more elite software industry that is more focused on infrastructure than last-mile value capture. The question is how to fund the infrastructure. I don't know, except for the most elite projects, which is not good enough for the industry (even this hypothetical smaller one) on the whole.
View on HN · Topics
Users will not care about the quality of your code, or the backend architecture, or your perfectly strongly typed language. They only care about their problems and treat their computers like an appliance. They don't care if it takes 10 seconds or 20 seconds. They don't even care if it has ads, popups, and junk. They are used to bloatware and will gladly open their wallets if the tool is helping them get by. It's an unfortunate reality, but there it is: software is about money and solving problems. Unless you are working on a mission-critical system that affects people's health or financial data, none of those matter much.
View on HN · Topics
I know the customers couldn't care about the quality of the code they see. But the idea that they don't care about software being bad/laggy/bloated ever, because it "still solves problems", doesn't stand up to scrutiny as an immutable fact of the universe. Market conditions can change. I'm banking on a future where, if users feel they can (perhaps vibe) code their own solutions, they are far less likely to open their wallets for our bloatware solutions. Why pay exorbitant rents for shitty SaaS if you can make your own thing ad-free, exactly to your own mental spec? I want the "computers are new, programmers are in short supply, customer is desperate" era we've had in my lifetime so far to come to a close.
View on HN · Topics
It’s kind of funny - there’s another thread up where a dev claimed a 20-50x speed up. To their credit they posted videos and links to the repo of their work. And when you check the work, a large portion of it was hand rolling an ORM (via an LLM). Relatively solved problem that an LLM would excel at, but also not meaningfully moving the needle when you could use an existing library. And likely just creating more debt down the road.
View on HN · Topics
I've hand-rolled my own ultra-light ORM because the off-the-shelf ones always do 100 things you don't need.* And of course the open source ones get abandoned pretty regularly. Type ORM, which a 3rd party vendor used on an app we farmed out to them, mutates/garbles your input array on a multi-line insert. That was a fun one to debug. The issue has been open forever and no one cares. https://github.com/typeorm/typeorm/issues/9058 So yeah, if I ever need an ORM again, I'm probably rolling my own. *(I know you weren't complaining about the idea of rolling your own ORM, I just wanted to vent about Type ORM. Thanks for listening.)
View on HN · Topics
Reminds me of a post I read a few days ago of someone crowing about an LLM writing an email format validator for them. They did not have the LLM code up an accompanying send-an-email-validation loop, and were blithely kept uninformed by the LLM of the scar tissue built up by experience in the industry on how curiously deep a rabbit hole email validation becomes. If you’ve been around the block and are judicious in how you use them, LLMs are a really amazing productivity boost. For those without that judgement and taste, I’m seeing footguns proliferate, and the LLMs are not warning them when someone steps on the pressure plate that’s about to blow off their foot. I’m hopeful we will this year create better context window-based or recursive guardrails for the coding agents to solve for this.
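To make the contrast in the comment above concrete, here is a minimal Python sketch of a loose format check paired with the token half of a confirmation loop. The helper names (`looks_like_email`, `make_token`, `check_token`) and the `SECRET` value are hypothetical, introduced only for illustration; actually sending the mail, token expiry, and storage are left out. It is a sketch under those assumptions, not what the poster or any particular library does.

```python
import hashlib
import hmac

# Hypothetical secret for illustration only; a real service would load it from configuration.
SECRET = b"example-secret"


def looks_like_email(address: str) -> bool:
    """Deliberately loose syntactic check: something before the last '@'
    and a dot strictly inside the domain. It says nothing about deliverability."""
    local, sep, domain = address.rpartition("@")
    return (
        bool(sep)
        and bool(local)
        and "." in domain
        and not domain.startswith(".")
        and not domain.endswith(".")
    )


def make_token(address: str) -> str:
    """Token to embed in a confirmation link that is mailed to the address."""
    return hmac.new(SECRET, address.encode("utf-8"), hashlib.sha256).hexdigest()


def check_token(address: str, token: str) -> bool:
    """True only if the token that came back matches the one issued for this address."""
    expected = make_token(address).encode("utf-8")
    return hmac.compare_digest(expected, token.encode("utf-8"))
```

The format check only filters obvious typos; the address counts as verified only once the owner returns the token, which is the part the comment says was missing.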
View on HN · Topics
This is a typical problem you see in autodidacts. They will recreate solutions to solved problems, trip over issues that could have been avoided, and generally do all of the things you would expect someone to do if they are working with skill but no experience. LLMs accelerate this and make it more visible, but they are not the cause. It is almost always a person trying to solve a problem and just not knowing what they don't know because they are learning as they go.
View on HN · Topics
With the right prompt the LLM will solve it in the first place. But this is an issue of not knowing what you don't know, so it makes it difficult to write the right prompt. One way around this is to spawn more agents with specific tasks, or to have an agent that is ONLY focused on finding patterns/code where you're reinventing the wheel. I often have one agent/prompt where I build things but then I have another agent/prompt where their only job is to find codesmells, bad patterns, outdated libraries, and make issues or fix these problems.
View on HN · Topics
"And likely just creating more debt down the road" In the most inflationary era of capabilities we've seen yet, it could be the right move. What's debt when in a matter of months you'll be able to clear it in one shot?
View on HN · Topics
I'd quickly trash your application if I saw you had just vibe coded some bullshit app. Developing is about working smart, and it's not smart to ask AI to code stuff that already exists; it's in fact wasteful.
View on HN · Topics
> The hard thing about engineering is not "building a thing that works", its building it the right way, in an easily understood way, in a way that's easily extensible. You’re talking like in the year 2026 we’re still writing code for future humans to understand and improve. I fear we are not doing that. Right now, Opus 4.5 is writing code that later Opus 5.0 will refactor and extend. And so on.
View on HN · Topics
This sounds like magical thinking. For one, there are objectively detrimental ways to organize code: tight coupling, lots of mutable shared state, etc. No matter who or what reads or writes the code, such code is more error-prone and more brittle to handle. Then, abstractions are tools to lower the cognitive load. Good abstractions reduce the total amount of code written, allow you to reason about the code in terms of these abstractions, and do not leak in the area of their applicability. Say, Sequence, or Future, or, well, function are examples of good abstractions. No matter what kind of cognitive process handles the code, it benefits from having to keep a smaller amount of context per task. "Code structure does not matter, LLMs will handle it" sounds a bit like "Computer architectures don't matter, the Turing Machine is proved to be able to handle anything computable at all". No, these things matter if you care about resource consumption (aka cost) at the very least.
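A toy illustration of the "mutable shared state" point, with made-up names (`_cart`, `Cart`, `add_item`) and prices in whole cents; it is only a sketch of the contrast, not a recommendation for any particular codebase. In the first version every caller has to know about a module-level dict that anything else may have changed; in the second, all the context needed to reason about `total` is in its argument.

```python
from dataclasses import dataclass

# Coupled version: every caller reads and mutates the same module-level dict,
# so the result of total_shared() depends on everything that ran before it.
_cart = {"items": [], "discount": 0.0}


def add_item_shared(price_cents: int) -> None:
    _cart["items"].append(price_cents)


def total_shared() -> float:
    return sum(_cart["items"]) * (1 - _cart["discount"])


# Decoupled version: the state is an explicit, immutable value that is passed in
# and returned, so each function can be understood on its own.
@dataclass(frozen=True)
class Cart:
    items: tuple[int, ...] = ()
    discount: float = 0.0


def add_item(cart: Cart, price_cents: int) -> Cart:
    return Cart(items=cart.items + (price_cents,), discount=cart.discount)


def total(cart: Cart) -> float:
    return sum(cart.items) * (1 - cart.discount)
```

Whether the reader is a person or a model, the second version needs less surrounding context per task, which is the cost argument made above.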
View on HN · Topics
> For one, there are objectively detrimental ways to organize code: tight coupling, lots of mutable shared state, etc. No matter who or what reads or writes the code, such code is more error-prone, and more brittle to handle. Guess what, AIs don't like that either, because it makes it harder for them to achieve the goal. So with minimal guidance, which at this point could probably be provided by AI as well, the output of an AI agent is not that.
View on HN · Topics
Yes, LLMs aren't very good at architecture. I suspect it's because the average project online has pretty bad architecture. The training set is poisoned. It's kind of bittersweet for me because I was dreaming of becoming a software architect when I graduated university and the role started disappearing so I never actually became one! But the upside of this is that now LLMs suck at software architecture... Maybe companies will bring back the software architect role? The training set has been totally poisoned from the architecture PoV. I don't think LLMs (as they are) will be able to learn software architecture now because the more time passes, the more poorly architected slop gets added online and finds its way into the training set. Good software architecture tends to be additive, as opposed to subtractive. You start with a clean slate then build up from there. It's almost impossible to start with a complete mess of spaghetti code and end up with a clean architecture... Spaghetti code abstractions tend to mislead you and lead you astray... It's like understanding spaghetti code tends to soil your understanding of the problem domain. You start to think of everything in terms of terrible leaky abstractions and can't think of the problem clearly. It's hard even for humans to look at a problem through fresh eyes; it's likely even harder for LLMs to do it. For example, if you use a word in a prompt, the LLM tends to try to incorporate that word into the solution... So if the AI sees a bunch of leaky abstractions in the code, it will tend to try to work with them as opposed to removing them and finding better abstractions. I see this all the time with hacks; if the code is full of hacks, then an LLM tends to produce hacks all the time and it's almost impossible to make it address root causes... Also hacks tend to beget more hacks.
View on HN · Topics
Refactoring is a very mechanistic way of turning bad code into good. I don’t see a world in which our tools (LLMs or otherwise) don’t learn this.
View on HN · Topics
I also love how AI enthusiasts just ignore the issue of exhausted training data... You can't just magically create more training data. Also, synthetic training data reduces the quality of models.
View on HN · Topics
That's been my main argument for why LLMs might be at their zenith. But I recently started wondering whether all those codebases we expose to them are maybe good enough training data for the next generation. It's not high quality like accepted Stack Overflow answers, but it's working software for the most part.
View on HN · Topics
We don't know what Opus 5.0 will be able to refactor. If the argument is "humans and Opus 4.5 cannot maintain this, but if requirements change we can vibe-code a new one from scratch", that's a coherent thesis, but people need to be explicit about this. (Instead this feels like the motte that is retreated to, and the bailey is essentially "who cares, we'll figure out what to do with our fresh slop later".) Ironically, I've found Claude to be really good at refactors, but these are refactors I choose very explicitly. (Such as I start the thing manually, then let it finish.) (For an example of it, see me force-pushing to https://github.com/NixOS/nix/pull/14863 implementing my own code review.) But I suspect this is not what people want. To actually fire devs and not rely on from-scratch vibe-coding, we need to figure out which refactors to attempt in order to implement a given feature well. That's a very creative open-ended question that I haven't even tried to let the LLMs take a crack at, because why would I? I'm plenty fast being the "ideas guy". If the LLM had better ideas than me, how would I even know? I'm either very arrogant or very good because I cannot recall regretting one of my refactors, at least not one I didn't back out of immediately.
View on HN · Topics
Yeah, I might be early to this. And certainly, I still read a lot of code in my day to day right now. But I sure write a lot less of it, and the percentage I write continues to go down with every new model release. And if I'm no longer writing it, and the person who works on it after me isn't writing it either, it changes the whole art of software engineering. I used to spend a great deal of time with already working code that I had written thinking about how to rewrite it better, so that the person after me would have a good clean idea of what is going on. But humans aren't working in the repos as much now. I think it's just a matter of time before the models are writing code essentially for their eyes, their affordances -- not ours.
View on HN · Topics
Yeah we're not too far from agreement here. Something I think though (which, again, I could very well be wrong about; uncertainty is the only certainty right now) is that "so the person after me would have a good clean idea of what is going on" is also going to continue mattering even when that "person" is often an AI. It might be different, clarity might mean something totally different for AIs than for humans, but right now I think a good expectation is that clarity to humans is also useful to AIs. So at the moment I still spend time coaxing the AI to write things clearly. That could turn out to be wasted time, but who knows. I also think of it as a hedge against the risk that we hit some point where the AIs turn out to be bad at maintaining their own crap, at which point it would be good for me to be able to understand and work with what has been written!
View on HN · Topics
Yeah I think it's a mistake to focus on writing "readable" or even "maintainable" code. We need to let go of these aging paradigms and be open to adopting a new one.
View on HN · Topics
In my experience, LLMs perform significantly better on readable, maintainable code. It's what they were trained on, after all. However, what they produce is often highly readable but not very maintainable due to the verbosity and obvious comments. This seems to pollute codebases over time and you see AI coding efficiency slowly decline.
View on HN · Topics
Do readability and maintainability not matter when AI "reads" and maintains the code? I'm pretty sure they do.
View on HN · Topics
If that were true, you could surely ask an LLM to write apps of the same complexity in brainfuck, right?
View on HN · Topics
I had Opus write a whole app for me in 30 seconds the other night. I use a very extensive AGENTS.md to guide AI in how I like my code chiseled. I've been happily running the app without looking at a line of it, but I was discussing the app with someone today, so I popped the code open to see what it looked like. Perfect. 10/10 in every way. I would not have written it that good. It came up with at least one idea I would not have thought of. I'm very lucky that I rarely have to deal with other devs and I'm writing a lot of code from scratch using whatever is the latest version of the frameworks. I understand that gives me a lot of privileges others don't have.
View on HN · Topics
Their thesis is that code quality does not matter as it is now a cheap commodity. As long as it passes the tests today it's great. If we need to refactor the whole goddamn app tomorrow, no problem, we will just pay up the credits and do it in a few hours.
View on HN · Topics
I trust my offshore engineers way more than the slop I get from the "AI"s. My team makes my life a lot easier, because I know they know what they are doing. The LLMs, not so much.
View on HN · Topics
> Their thesis is that code quality does not matter as it is now a cheap commodity. That's not how I read it. I would say that it's more like "If a human no longer needs to read the code, is it important for it to be readable?" That is, of course, based on the premise that AI is now capable of both generating and maintaining software projects of this size. Oh, and it begs another question: are human-readable and AI-readable the same thing? If they're not, it very well could make sense to instruct the model to generate code that prioritizes what matters to LLMs over what matters to humans.
View on HN · Topics
I would be much more impressed with implementing new, long-requested features in existing software (projects that are open to maintaining LLM-generated code later).
View on HN · Topics
When an LLM can rewrite it in 24 hours and fill the missing parts in minutes, that argument is hard to defend. I can vibe code what a dev shop would charge 500k to build and I can solo it in 1-2 weeks. This is the reality today. The code will pass quality checks, the code doesn’t need to be perfect, it doesn’t need to be clever, it needs to be. It’s not difficult to see this, right? If an LLM can write English it can write Chinese or Python. Then it can run itself, review itself and fix itself. The cat is out of the bag, what it will do to the economy… I don’t see anything positive for regular people. Write some code has turned into prompt some LLM. My phone can outplay the best chess player in the world, are you telling me you think that whatever unbound model Anthropic has sitting in their data center can’t out-code you?
View on HN · Topics
What mainstream software product do I use on a day to day basis besides Claude? The ones that continue to survive all build around a platform of services: MSO, Adobe, etc. Most enterprise product offerings, platform solutions, proprietary data access, proprietary / well accepted implementation. But let's not confuse it with the ability to clone it; it doesn't seem far-fetched to get 10 people together and vibe out a full Slack replacement in a few weeks.
View on HN · Topics
The whole point of good engineering was not just about hitting the hard specs, but also about having extendable, readable, maintainable code. But if today it’s so cheap to generate new code that meets updated specs, why care about the quality of the code itself? Maybe the engineering work today is to review specs and tests and let LLMs do whatever behind the scenes to hit the specs. If the specs change, just start from scratch.
View on HN · Topics
"Write the specs and let the outsourced labor hit them" is not a new tale. Let's assume the LLM agents can write tests for, and hit, specs better and cheaper than the outsourced offshore teams could. So let's assume now you can have a working product that hits your spec without understanding the code. How many bugs and security vulnerabilities have slipped through "well tested" code because of edge cases of certain input/state combinations? Ok, throw an LLM at the codebase to scan for vulnerabilities; ok, throw another one at it to ensure no nasty side effects of the changes that one made; ok, add some functionality and a new set of tests and let it churn through a bunch of gross code changes needed to bolt that functionality into the pile of spaghetti... How long do you want your critical business logic relying on not-understood code with "100% coverage" (of lines of code and spec'd features) but super-low coverage of actual possible combinations of input+machine+system state? How big can that codebase get before "rewrite the entire world to pass all the existing specs and tests" starts getting very very very slow? We've learned MANY hard lessons about security, extensibility, and maintainability of multi-million-LOC-or-larger long-lived business systems and those don't go away just because you're no longer reading the code that's making you the money. They might even get more urgent. Is there perhaps a reason Google and Amazon didn't just hire 10x the number of people at 1/10th the salary to replace the vast majority of their engineering teams year ago?
View on HN · Topics
> let LLMs do whatever behind the scenes to hit the specs assuming for the sake of argument that's completely true, then what happens to "competitive advantage" in this scenario? it gets me thinking: if anyone can vibe from spec, what's stopping company a (or even user a) from telling an llm agent "duplicate every aspect of this service in python and deploy it to my aws account xyz"... in that scenario, why even have companies?
View on HN · Topics
It’s all fun and games vibecoding until A) you have customers who depend on your product and B) it breaks, or the one person who does the prompting and has access to the servers and API keys gets incapacitated (or just bored). Sure, we can vibecode one-off projects that do something useful (my fav is browser extensions), but as soon as we ask others to use our code on a regular basis the technical debt clock starts running. And we all know how fast dependencies in a project break.
View on HN · Topics
It's all fine till money starts being involved and whoopsies cost more than a few hours of fixing.
View on HN · Topics
In my personal experience, Claude is better at greenfield, Codex is better at fitting in. Claude is the perfect tool for a "vibe coder", Codex is for the serious engineer who wants to get great and real work done. Codex will regularly give me 1000+ line diffs where all my comments (I review every single line of what agents write) are basically nitpicks. "Make this shallow w/ early return, use | None instead of Optional", that sort of thing. I do prompt it in detail though. It feels like I'm the person coming in with the architecture most of the time, AI "draws the rest of the owl."
View on HN · Topics
Exactly. The main issue IMO is that "software that seems to work" and "software that works" can be very hard to tell apart without validating the code, yet these are drastically different in terms of long-term outcomes. Especially when there's a lot of money, or even lives, riding on these outcomes. Just because LLMs can write software to run the Therac-25 doesn't mean it's acceptable for them to do so. Your hobby project, though, knock yourself out.
View on HN · Topics
On the contrary, Opus 4.5 is the best agent I’ve ever used for making cohesive changes across many files in a large, existing codebase. It maintains our patterns and looks like all the other code. Sometimes it hiccups for sure.
View on HN · Topics
But time I spend asking is time I could have been writing exactly what I wanted in the first place, if I already did the planning to understand what I wanted. Once I know what I want, it doesn't take that long, usually. Which is why it's so great for prototyping, because it can create something during the planning, when you haven't planned out quite what you want yet.
View on HN · Topics
I totally agree. And welcome to disposable software age.
View on HN · Topics
It just one-shots bug fixes in complex codebases. Copy-paste the bug report and watch it go.
View on HN · Topics
If you have a microservices architecture in your project, you are set for AI. You can swap out any lacking, legacy microservice in your system with a "greenfield" vibecoded one.
View on HN · Topics
I think there is a subjective difference. When a human builds dogshit at least you know they put some effort and the hours in. When I'm reading piles of LLM slop, I know that just reading it is already more effort than it took to write. It feels like I'm being played. This is entirely subjective and emotional. But when someone writes something with an LLM in 5 seconds and asks me to spend hours reviewing...fuck off.
View on HN · Topics
> You have to experience it yourself on your own real problems and over the course of days or weeks. How do you stop it from over-engineering everything?
View on HN · Topics
This has always been my problem, whether it's Gemini, OpenAI or Claude. Unless you hand-hold it to an extreme degree, it is going to build a mountain next to a molehill. It may end up working, but the thing is going to convolute APIs and abstractions and mix patterns basically everywhere.
View on HN · Topics
Sure, I can tell it not to do that, but it doesn't know what that is. It's a je ne sais quoi. I can't teach it taste.
View on HN · Topics
Difficult and it really depends on the complexity. I definitely work in a spec-driven way, with a step-by-step implementation phase. If it goes the wrong way I prefer to rewrite the spec and throw away the code.