Summarizer

Engineering vs. Coding

A recurring distinction between "coding" (boilerplate, standard patterns) which AI conquers, and "engineering" (novel logic, complex systems, 3D graphics) where AI supposedly still fails.

← Back to Opus 4.5 is not the normal AI agent experience that I have had thus far

58 comments tagged with this topic

View on HN · Topics
I made a similar comment on a different thread, but I think it also fits here: I think the disconnect between engineers is due to their own context. If you work with frontend applications, specially React/React Native/HTML/Mobile, your experience with LLMs is completely different than the experience of someone working with OpenGL, io_uring, libev and other lower level stuff. Sure, Opus 4.5 can one shot Windows utilities and full stack apps, but can't implement a simple shadowing algorithm from a 2003 paper in C++, GLFW, GLAD: https://www.cse.chalmers.se/~uffe/soft_gfxhw2003.pdf Codex/Claude Code are terrible with C++. It also can't do Rust really well, once you get to the meat of it. Not sure why that is, but they just spit out nonsense that creates more work than it helps me. It also can't one shot anything complete, even though I might feed him the entire paper that explains what the algorithm is supposed to do. Try to do some OpenGL or Vulkan with it, without using WebGPU or three.js. Try it with real code, that all of us have to deal with every day. SDL, Vulkan RHI, NVRHI. Very frustrating. Try it with boost, or cmake, or taskflow. It loses itself constantly, hallucinates which version it is working on and ignores you when you provide actual pointers to documentation on the repo. I've also recently tried to get Opus 4.5 to move the Job system from Doom 3 BFG to the original codebase. Clean clone of dhewm3, pointed Opus to the BFG Job system codebase, and explained how it works. I have also fed it the Fabien Sanglard code review of the job system: https://fabiensanglard.net/doom3_bfg/threading.php We are not sleeping on it, we are actually waiting for it to get actually useful. Sure, it can generate a full stack admin control panel in JS for my PostgreSQL tables, but is that really "not normal"? That's basic.
View on HN · Topics
We have an in-house, Rust-based proxy server. Claude is unable to contribute to it meaningfully outside of grunt work like minor refactors across many files. It doesn't seem to understand proxying and how it works on both a protocol level and business logic level. With some entirely novel work we're doing, it's actually a hindrance as it consistently tells us the approach isn't valid/won't work (it will) and then enters "absolutely right" loops when corrected. I still believe those who rave about it are not writing anything I would consider "engineering". Or perhaps it's a skill issue and I'm using it wrong, but I haven't yet met someone I respect who tells me it's the future in the way those running AI-based companies tell me.
View on HN · Topics
> I still believe those who rave about it are not writing anything I would consider "engineering". Correct. In fact, this is the entire reason for the disconnect, where it seems like half the people here think LLMs are the best thing ever and the other half are confused about where the value is in these slop generators. The key difference is (despite everyone calling themselves an SWE nowadays) there's a difference between a "programmer" and an "engineer". Looking at OP, exactly zero of his screenshotted apps are what I would consider "engineering". Literally everything in there has been done over and over to the death. Engineering is.. novel, for lack of a better word. See also: https://www.seangoedecke.com/pure-and-impure-engineering/
View on HN · Topics
> Engineering is.. novel, for lack of a better word. Tell that to the guys drawing up the world's 10 millionth cable suspension bridge
View on HN · Topics
Actually, 10000th https://www.bridgemeister.com/fulllist.htm
View on HN · Topics
I don't think it's that helpful to try to gatekeep the "engineering" term or try to separate it into "pure" and "impure" buckets, implying that one is lesser than the other. It should be enough to just say that AI assisted development is much better at non-novel tasks than it is at novel tasks. Which makes sense: LLMs are trained on existing work, and can't do anything novel because if it was trained on a task, that task is by definition not novel.
View on HN · Topics
Respectfully, it's absolutely important to "gatekeep" a title that has an established definition and certain expectations attached to the title. OP says, "BUT YOU DON’T KNOW HOW THE CODE WORKS.. No I don’t. I have a vague idea, but you are right - I do not know how the applications are actually assembled." This is not what I would call an engineer. Or a programmer. "Prompter", at best. And yes, this is absolutely "lesser than", just like a middleman who subcontracts his work to Fiverr (and has no understanding of the actual work) is "lesser than" an actual developer.
View on HN · Topics
That's not the point being made to you. The point is that most people in the "software engineering" space are applying known tools and techniques to problems that are not groundbreaking. Very few are doing theoretical computer science, algorithm design, or whatever you think it is that should be called "engineering."
View on HN · Topics
Coding agents as of Jan 2026 are great at what 95% of software engineers do. For remaining 5% that do really novel stuff -- the agents will get there in a few years.
View on HN · Topics
It's how you use the tool that matters. Some people get bitter and try to compare it to top engineers' work on novel things as a strawman so they can go "Hah! Look how it failed!" as they swing a hammer to demonstrate it cannot chop down a tree. Because the tool is so novel and it's use us a lot more abstract than that of an axe, it is taking awhile for some to see its potential, especially if they are remembering models from even six months ago. Engineering is just problem solving, nobody judges structural engineers for designing structures with another Simpson Strong Tie/No.2 Pine 2x4 combo because that is just another easy (and therefore cheap) way to rapidly get to the desired state. If your client/company want to pay for art, that's great! Most just want the thing done fast and robustly.
View on HN · Topics
I built an open to "game engine" entirely in Lua a many years ago, but relying on many third party libraries that I would bind to with FFI. I thought I'd revive it, but this time with Vulkan and no third-party dependencies (except for Vulkan) 4.5 Sonet, Opus and Gemini 3.5 flash has helped me write image decoders for dds, png jpg, exr, a wayland window implementation, macOS window implementation, etc. I find that Gemini 3.5 flash is really good at understanding 3d in general while sonnet might be lacking a little. All these sota models seem to understand my bespoke Lua framework and the right level of abstraction. For example at the low level you have the generated Vulkan bindings, then after that you have objects around Vulkan types, then finally a high level pipeline builder and whatnot which does not mention Vulkan anywhere. However with a larger C# codebase at work, they really struggle. My theory is that there are too many files and abstractions so that they cannot understand where to begin looking.
View on HN · Topics
I've found it to be pretty hit-or-miss with C++ in general, but it's really, REALLY bad at 3D graphics code. I've tried to use it to port an OpenGL project to SDL3_GPU, and it really struggled. It would confidently insist that the code it wrote worked, when all you had to do was run it and look at the output to see a blank screen.
View on HN · Topics
I hope I’m not committing a faux pas by saying this—and please feel free to tell me that I’m wrong—but I imagine a human who has been blind since birth would also struggle to build 3D graphics code. The Claude models are technically multi-modal, but IME the vision side of the equation is really lacking. As a result, Claude is quite good at reasoning about logic , and it can build e.g. simpler web pages where the underlying html structure is enough to work with, but it’s much worse at tasks that inherently require seeing .
View on HN · Topics
Yea, for obvious reasons, it seems to be best at code that transforms data: text/binary input to text/binary output. And where the logic can be tracked and verified at runtime with sufficient (text) logging. In other words, it's much better close loop than open loop. I tried to help it by prompting it to please take a screen capture of its output to verify functionality, but it seems LLMs aren't quite ready for that yet.
View on HN · Topics
> It also can't do Rust really well, once you get to the meat of it. Not sure why that is Because types are proofs and require global correctness, you can't just iterate, fix things locally, and wait until it breaks somewhere else that you also have to fix locally.
View on HN · Topics
I thought that initially, but I don't think the skills AI weakens in me are particularly valuable Let's say AI becomes too expensive - I more or less only have to sharpen up being able to write the language. My active recall of the syntax, common methods and libraries. That's not hard or much of a setback Maybe this would be a problem if you're purely vibe coding, but I haven't seen that work long term
View on HN · Topics
The human still has to manage complexity. A properly modularized and maintainable code base is much easier for the LLM to operate on — but the LLM has difficulty keeping the code base in that state without strong guidance. Putting “Make minimal changes” in my standard prompt helped a lot with the tendency of basically all agents to make too many changes at once. With that addition it became possible to direct the LLM to make something similar to the logical progression of commits I would have made anyway, but now don’t have to work as hard at crafting. Most of the hype merchants avoid the topic of maintainability because they’re playing to non-technical management skeptical of the importance of engineering fundamentals. But everything I’ve experienced so far working with LLMs screams that the fundamentals are more important than ever.
View on HN · Topics
I use CC for so much more than just writing code that I cannot imagine being constrained within an IDE. Why would I want to launch an IDE to have CC update the *arr stack on my NAS to the latest versions for example? Last week I pointed CC at some media files that weren't playing correctly on my Apple TV. It detected what the problem formats were and updated my *arr download rules to prefer other releases and then configured tdarr to re-encode problem files in my existing library.
View on HN · Topics
> I'm trying to determine what programming tasks are not in this list. :) I think it is trying to exclude adding new features and fixing bugs in existing code Yes indeed, these are the things on the other hand which aren't working well in my opinion: - large codebase - complex domain knowledge - creating any feature where you need product insights - tasks requiring choices (again, complexity doesn't matter here, the task may be simple but require some choices) - anything unclear where you don't know where you are going first While you don't experience any of these when teaching or side projects, these are very common in any enterprise context.
View on HN · Topics
We've been living in that world since the invention of the compiler ("automatic programming"). Few people write machine code any more. If you think of LLMs as a new variety of compiler, a lot of their shortcomings are easier to describe.
View on HN · Topics
One benefit of conventional code is that it expresses logic in an unambiguous way. Much of "the minutiae" is deciding what happens in edge cases. It's even harder to express that in a human language than in computer languages. For some domains it probably doesn't matter.
View on HN · Topics
Isn't this a meaningless example? Formatters already exist. Generating code that doesn't need to be formatted is exactly the same as generating code and then formatting it. I care about the norms in my codebase that can't be automatically enforced by machine. How is state managed? How are end-to-end tests written to minimize change detectors? When is it appropriate to log something?
View on HN · Topics
Lints are typically well suited for syntactic properties or some local semantic properties. Almost all interesting challenges in software design and evolution involve nonlocal semantic properties.
View on HN · Topics
>So my verdict is that it's great for code analysis, and it's fantastic for injecting some book knowledge on complex topics into your programming, but it can't tackle those complex problems by itself. I don't think you've seen the full potential. I'm currently #1 on 5 different very complex computer engineering problems, and I can't even write a "hello world" in rust or cpp. You no longer need to know how to write code, you just need to understand the task at a high level and nudge the agents in the right direction. The game has changed. - https://highload.fun/tasks/3/leaderboard - https://highload.fun/tasks/12/leaderboard - https://highload.fun/tasks/15/leaderboard - https://highload.fun/tasks/18/leaderboard - https://highload.fun/tasks/24/leaderboard
View on HN · Topics
How are you qualified to judge its performance on real code if you don't know how to write a hello world? Yes, LLMs are very good at writing code, they are so good at writing code that they often generate reams of unmaintainable spaghetti. When you submit to an informatics contest you don't have paying customers who depend on your code working every day. You can just throw away yesterday's code and start afresh. Claude is very useful but it's not yet anywhere near as good as a human software developer. Like an excitable puppy it needs to be kept on a short leash.
View on HN · Topics
I know what's like running a business, and building complex systems. That's not the point. I used highload as an example because it seems like an objective rebuttal to the claim that "but it can't tackle those complex problems by itself." And regarding this: "Claude is very useful but it's not yet anywhere near as good as a human software developer. Like an excitable puppy it needs to be kept on a short leash" Again, a combination of LLM/agents with some guidance (from someone with no prior experience in this type of high performing architecture) was able to beat all human software developers that have taken these challenges.
View on HN · Topics
What I think people get wrong (especially non-coders) is that they believe the limitation of LLMs is to build a complex algorithm. That issue in reality was fixed a long time ago. The real issue is to build a product. Think about microservices in different projects, using APIs that are not perfectly documented or whose documentation is massive, etc. Honestly I don't know what commenters on hackernews are building, but a few months back I was hoping to use AI to build the interaction layer with Stripe to handle multiple products and delayed cancellations via subscription schedules. Everything is documented, the documentation is a bit scattered across pages, but the information is out there. At the time there was Opus 4.1, so I used that. It wrote 1000 lines of non-functional code with 0 reusability after several prompts. I then asked something to Chat gpt to see if it was possible without using schedules, it told me yes (even if there is not) and when I told Claude to recode it, it started coding random stuff that doesn't exist. I built everything to be functional and reusable myself, in approximately 300 lines of code. The above is a software engineering problem. Reimplementing a JSON parser using Opus is not fun nor useful, so that should not be used as a metric
View on HN · Topics
> The above is a software engineering problem. Reimplementing a JSON parser using Opus is not fun nor useful, so that should not be used as a metric. I've also built a bitorrent implementation from the specs in rust where I'm keeping the binary under 1MB. It supports all active and accepted BEPs: https://www.bittorrent.org/beps/bep_0000.html Again, I literally don't know how to write a hello world in rust. I also vibe coded a trading system that is connected to 6 trading venues. This was a fun weekend project but it ended up making +20k of pure arbitrage with just 10k of working capital. I'm not sure this proves my point, because while I don't consider myself a programmer, I did use Python, a language that I'm somewhat familiar with. So yeah, I get what you are saying, but I don't agree. I used highload as an example, because it is an objective way of showing that a combination of LLM/agents with some guidance (from someone with no prior experience in this type of high performing architecture) was able to beat all human software developers that have taken these challenges.
View on HN · Topics
This hits the nail on the head. There's a marked difference between a JSON parser and a real world feature in a product. Real world features are complex because they have opaque dependencies, or ones that are unknown altogether. Creating a good solution requires building a mental model of the actual complex system you're working with, which an LLM can't do. A JSON parser is effectively a book problem with no dependencies.
View on HN · Topics
You are looking at this wrong. Creating a json parser is trivial. The thing is that my one-shot attempt was 10x slower than my final solution. Creating a parser for this challenge that is 10x more efficient than a simple approach does require deep understanding of what you are doing. It requires optimizing the hot loop (among other things) that 90-95% of software developers wouldn't know how to do. It requires deep understanding of the AVX2 architecture. Here you can read more about these challenges: https://blog.mattstuchlik.com/2024/07/12/summing-integers-fa...
View on HN · Topics
>I'm currently #1 on 5 different very complex computer engineering problems Ah yes, well known very complex computer engineering problems such as: * Parsing JSON objects, summing a single field * Matrix multiplication * Parsing and evaluating integer basic arithmetic expressions And you're telling me all you needed to do to get the best solution in the world to these problems was talk to an LLM?
View on HN · Topics
Lol, the problem is not finding a solution, the problem is solving it in the most efficient way. If you think you can beat an LLM, the leaderboard is right there.
View on HN · Topics
What bothers me about posts like this is: mid-level engineers are not tasked with atomic, greenfield projects. If all an engineer did all day was build apps from scratch, with no expectation that others may come along and extend, build on top of, or depend on, then sure, Opus 4.5 could replace them. The hard thing about engineering is not "building a thing that works", its building it the right way, in an easily understood way, in a way that's easily extensible. No doubt I could give Opus 4.5 "build be a XYZ app" and it will do well. But day to day, when I ask it "build me this feature" it uses strange abstractions, and often requires several attempts on my part to do it in the way I consider "right". Any non-technical person might read that and go "if it works it works" but any reasonable engineer will know that thats not enough.
View on HN · Topics
Well, the first 90% is easy, the hard part is the second 90%. Case in point: Self driving cars. Also, consider that we need to pirate the whole internet to be able to do this, so these models are not creative. They are just directed blenders.
View on HN · Topics
No, I don't need to. Self driving cars is the most recent and biggest example sans LLMs. The saying I have quoted (which has different forms) is valid for programming, construction and even cooking. So it's a simple, well understood baseline. Knowledge engineering has a notion called "covered/invisible knowledge" which points to the small things we do unknowingly but changes the whole outcome. None of the models (even AI in general) can capture this. We can say it's the essence of being human or the tribal knowledge which makes experienced worker who they are or makes mom's rice taste that good. Considering these are highly individualized and unique behaviors, a model based on averaging everything can't capture this essence easily if it can ever without extensive fine-tuning for/with that particular person.
View on HN · Topics
I haven't seen an AI successfully write a full feature to an existing codebase without substantial help, I don't think we are there yet. > The only question is how long it takes to get there. This is the question and I would temper expectations with the fact that we are likely to hit diminishing returns from real gains in intelligence as task difficulty increases. Real world tasks probably fit into a complexity hierarchy similar to computational complexity. One of the reasons that the AI predictions made in the 1950s for the 1960s did not come to be was because we assumed problem difficulty scaled linearly. Double the computing speed, get twice as good at chess or get twice as good at planning an economy. P, NP separation planed these predictions. It is likely that current predictions will run into similar separations. It is probably the case that if you made a human 10x as smart they would only be 1.25x more productive at software engineering. The reason we have 10x engineers is less about raw intelligence, they are not 10x more intelligent, rather they have more knowledge and wisdom.
View on HN · Topics
Sure, eventually we'll have AGI, then no worries, but in the meantime you can only use the tools that exist today, and dreaming about what should be available in the future doesn't help. I suspect that the timeline from autocomplete-one-line to autocomplete-one-app, which was basically a matter of scaling and RL, may in retrospect turn out to have been a lot faster that the next LLM to AGI step where it becomes capable of using human level judgement and reasoning, etc, to become a developer, not just a coding tool.
View on HN · Topics
Users will not care about the quality of your code, or the backed architecture, or your perfectly strongly typed language. They only care about their problems and treat their computers like an appliance. They don't care if it takes 10 seconds or 20 seconds. They don't even care if it has ads, popups, and junk. They are used to bloatware and will gladly open their wallets if the tool is helping them get by. It's an unfortunately reality but there it is, software is about money and solving problems. Unless you are working on a mission critical system that affects people's health or financial data, none of those matter much.
View on HN · Topics
I've hand-rolled my own ultra-light ORM because the off-the-shelf ones always do 100 things you don't need.* And of course the open source ones get abandoned pretty regularly. Type ORM, which a 3rd party vendor used on an app we farmed out to them, mutates/garbles your input array on a multi-line insert. That was a fun one to debug. The issue has been open forever and no one cares. https://github.com/typeorm/typeorm/issues/9058 So yeah, if I ever need an ORM again, I'm probably rolling my own. *(I know you weren't complaining about the idea of rolling your own ORM, I just wanted to vent about Type ORM. Thanks for listening.)
View on HN · Topics
It seems to me these days, any code I want to write tries to solve problems that LLMs already excel at. Thankfully my job is perhaps just 10% about coding, and I hope people like you still have some coding tasks that cannot be easily solved by LLMs. We should not exeggarate the capabilities of LLMs, sure, but let's also not play "don't look up".
View on HN · Topics
> The hard thing about engineering is not "building a thing that works", its building it the right way, in an easily understood way, in a way that's easily extensible. You’re talking like in the year 2026 we’re still writing code for future humans to understand and improve. I fear we are not doing that. Right now, Opus 4.5 is writing code that later Opus 5.0 will refactor and extend. And so on.
View on HN · Topics
This sounds like magical thinking. For one, there are objectively detrimental ways to organize code: tight coupling, lots of mutable shared state, etc. No matter who or what reads or writes the code, such code is more error-prone, and more brittle to handle. Then, abstractions are tools to lower the cognitive load. Good abstractions reduce the total amount of code written, allow to reason about the code in terms of these abstractions, and do not leak in the area of their applicability. Say Sequence, or Future, or, well, function are examples of good abstractions. No matter what kind of cognitive process handles the code, it benefits from having to keep a smaller amount of context per task. "Code structure does not matter, LLMs will handle it" sounds a bit like "Computer architectures don't matter, the Turing Machine is proved to be able to handle anything computable at all". No, these things matter if you care about resource consumption (aka cost) at the very least.
View on HN · Topics
Yes LLMs aren't very good at architecture. I suspect because the average project online has pretty bad architecture. The training set is poisoned. It's kind of bittersweet for me because I was dreaming of becoming a software architect when I graduated university and the role started disappearing so I never actually became one! But the upside of this is that now LLMs suck at software architecture... Maybe companies will bring back the software architect role? The training set has been totally poisoned from the architecture PoV. I don't think LLMs (as they are) will be able to learn software architecture now because the more time passes, the more poorly architected slop gets added online and finds its way into the training set. Good software architecture tends to be additive, as opposed to subtractive. You start with a clean slate then build up from there. It's almost impossible to start with a complete mess of spaghetti code and end up with a clean architecture... Spaghetti code abstractions tend to mislead you and lead you astray... It's like; understanding spaghetti code tends to soil your understanding of the problem domain. You start to think of everything in terms of terrible leaky abstraction and can't think of the problem clearly. It's hard even for humans to look at a problem through fresh eyes; it's likely even harder for LLMs to do it. For example, if you use a word in a prompt, the LLM tends to try to incorporate that word into the solution... So if the AI sees a bunch of leaky abstractions in the code; it will tend to try to work with them as opposed to removing them and finding better abstractions. I see this all the time with hacks; if the code is full of hacks, then an LLM tends to produce hacks all the time and it's almost impossible to make it address root causes... Also hacks tend to beget more hacks.
View on HN · Topics
Well from cars or CPUs its not expected for them to eventually reach AGI, they also don't eat a trillion dollar hole into us peasants pockets. Sure, improvements can be made. But on a fundamental level, agents/LLMs can not reason (even though they love to act like they can). They are parrots learning words, these parrots wont ever invent new words once the list of words is exhausted though.
View on HN · Topics
In ~25 years or so of dealing with large, existing codebases, I've seen time and time again that there's a ton of business value and domain knowledge locked up inside all of that "messy" code. Weird edge cases that weren't well covered in the design, defensive checks and data validations, bolted-on extensions and integrations, etc., etc. "Just rewrite it" is usually -- not always, but _usually_ -- a sure path to a long, painful migration that usually ends up not quite reproducing the old features/capabilities and adding new bugs and edge cases along the way.
View on HN · Topics
The whole point of good engineering was not about just hitting the hard specs, but also have extendable, readable, maintainable code. But if today it’s so cheap to generate new code that meets updated specs, why care about the quality of the code itself? Maybe the engineering work today is to review specs and tests and let LLMs do whatever behind the scenes to hit the specs. If the specs change, just start from scratch.
View on HN · Topics
Even if you are going green field, you need to build it the way it is likely to be used based a having a deep familiarity with what that customer's problems are and how their current workflow is done. As much as we imagine everything is on the internet, a bunch of this stuff is not documented anywhere. An LLM could ask the customer requirement questions but that familiarity is often needed to know the right questions to ask. It is hard to bootstrap. Even if it could build the perfect greenfield app, as it updates the app it is needs to consider backwards compatibility and breaking changes. LLMs seem very far as growing apps. I think this is because LLMs are trained on the final outcome of the engineering process, but not on the incremental sub-commit work of first getting a faked out outline of the code running and then slowly building up that code until you have something that works. This isn't to say that LLMs or other AI approaches couldn't replace software engineering some day, but they clear aren't good enough yet and the training sets they have currently have access to are unlikely to provide the needed examples.
View on HN · Topics
> its building it the right way, in an easily understood way, in a way that's easily extensible. When I worked at Google, people rarely got promoted for doing that. They got promoted for delivering features or sometimes from rescuing a failing project because everyone was doing the former until promotion velocity dropped and your good people left to other projects not yet bogged down too far.
View on HN · Topics
I used to agree with this stance, but lately I'm more in the "LLMs are just fancy autocomplete" camp. They can just autocomplete increasingly more things, and when they can't, they fail in ways that an intelligent being just wouldn't. Rather that just output a wrong or useless autocompletion.
View on HN · Topics
They're not an equivalent intelligence as human's and thus have noticeably different failure modes. But human's fail in ways that they don't (eg. being unable to match llm's breadth and depth of knowledge) But the question i'm really asking is... isn't it more than a sheer statistical "trick" if an LLM can actually be instructed to "read surrounding code", understand the request, and demonstrably include it in its operation? You can't do that unless you actually understand what "surrounding code" is, and more importantly have a way to comply with the request...
View on HN · Topics
But... you can ask! Ask claude to use encapsulation, or to write the equivalent of interfaces in the language you using, and to map out dependencies and duplicate features, or to maintain a dictionary of component responsibilities. AI coding is a multiplier of writing speed but doesn't excuse planning out and mapping out features. You can have reasonably engineered code if you get models to stick to well designed modules but you need to tell them.
View on HN · Topics
Except it doesn't work the same way it won't work for LLMs. If you use too many microserviced, you will get global state, race conditions, much more complex failure models again and no human/LLM can effectively reason about those. We somewhat have tools to do that in case of monoliths, but if one gets to this point with microservices, it's game over.
View on HN · Topics
Yeah, all of those applications he shows do not really expose any complex business logic. With all the due respect: a file converter for windows is glueing few windows APIs with the relevant codec. Now, good luck working on a complex warehouse management application where you need extremely complex logic to sort the order of picking, assembling, packing on an infinite number of variables: weight, amazon prime priority, distribution centers, number and type of carts available, number and type of assembly stations available, different delivery systems and requirements for different delivery operators (such as GLE, DHL, etc) that has to work with N customers all requiring slightly different capabilities and flows, all having different printers and operations, etc, etc. And I ain't even scratching the surface of the business logic complexity (not even mentioning functional requirements) to avoid boring the reader. Mind you, AI is still tremendously useful in the analysis phase, and can sort of help in some steps of the implementation one, but the number of times you can avoid looking thoroughly at the code for any minor issue or discrepancy is absolutely close to 0.
View on HN · Topics
It might scale. So far, Im not convinced, but lets take a look at fundmentally whats happening and why humans > agents > LLMs. At its heart, programming is a constraint satisfaction problem. The more constraints (requirements, syntax, standards, etc) you have, the harder it is to solve them all simultaneously. New projects with few contributors have fewer constraints. The process of “any change” is therefore simpler. Now, undeniably 1) agents have improved the ability to solve constraints by iterating ; eg. Generate, test, modify, etc. over raw LLm output. 2) There is an upper bound (context size, model capability) to solve simultaneous constraints. 3) Most people have a better ability to do this than agents (including claude code using opus 4.5). So, if youre seeing good results from agents, you probably have a smaller set of constraints than other people. Similarly, if youre getting bad results, you can probably improve them by relaxing some of the constraints (consistent ui, number of contributors, requirements, standards, security requirements, split code into well defined packages). This will make both agents and humans more productive. The open question is: will models continue to improve enough to approach or exceed human level ability in this? Are humans willing to relax the constraints enough for it to be plausible? I would say currently people clambering about the end of human developers are cluelessly deceived by the “appearance of complexity” which does not match the “reality of constraints” in larger applications. Opus 4.5 cannot do the work of a human on code bases Ive worked on. Hell, talented humans struggle to work on some of them. …but that doesnt mean it doesnt work. Just that, right now, the constraint set it can solve is not large enough to be useful in those situations . …and increasingly we see low quality software where people care only about speed of delivery; again, lowering the bar in terms of requirements. So… you know. Watch this space. Im not counting on having a dev job in 10 years. If I do, it might be making a pile of barely working garbage. …but I have one now, and anyone who thinks that this year people will be largely replaced by AI is probably poorly informed and has misunderstood the capabilities on these models. Theres only so low you can go in terms of quality.
View on HN · Topics
Based on my experience using these LLMs regularly I strongly doubt it could even build any application with realistic complexity without screwing things up in major ways everywhere, and even on top of that still not meeting all the requirements.
View on HN · Topics
> The hard thing about engineering is not "building a thing that works", its building it the right way, in an easily understood way, in a way that's easily extensible. The number of production applications that achieve this rounds to zero I’ve probably managed 300 brownfield web, mobile, edge, datacenter, data processing and ML applications/products across DoD, B2B, consumer and literally zero of them were built in this way
View on HN · Topics
This has always been my problem whether it's Gemini, openai or Claude. Unless you hand-hold it to an extreme degree, it is going to build a mountain next to a molehill. It may end up working, but the thing is going to convolute apis and abstractions and mix patterns basically everywhere
View on HN · Topics
Sure, I can tell it not to do that, but it doesn't know what that is. It's a je ne sais quoi . I can't teach it taste .