Productivity vs. Inefficiency

Debates over whether AI actually saves time or just feels productive. Some cite studies suggesting productivity drops (e.g., 19%), while others argue that the efficiency comes from parallelizing tasks or handling boilerplate. Users critique the lack of hard metrics in the article and the reliance on 'feeling' more efficient.

The debate over AI productivity centers on whether these tools offer high-leverage mechanical power or merely flood development pipelines with low-quality "chaff" that necessitates a massive, often hidden, scale of human validation. While proponents argue that LLMs enable a "power coding" workflow through parallelization and the offloading of tedious boilerplate, seasoned skeptics warn of a "productivity tax" where time gained in generation is lost to firefighting architectural flaws and vetting subtle hallucinations. This tension reveals a sharp cultural divide between those who view programming as an artisanal craft and those who see it as a business-driven engineering problem, suggesting that AI's true value depends entirely on an operator's ability to maintain discipline and domain expertise. Ultimately, many contributors believe the current inefficiency is a necessary learning phase, as developers shift from being solo creators to becoming high-level directors of probabilistic agents.

View on HN · Topics

The challenge not addressed with this line of reasoning is the required sheer scale of output validation on the backend of LLM-generated code. Human hand-developed code was no great shakes at the validation front either, but the scale difference hid this problem.

I’m hopeful what used to be tedious about the software development process (like correctness proving or documentation) becomes tractable enough with LLM’s to make the scale more manageable for us. That’s exciting to contemplate; think of the complexity categories we can feasibly challenge now!

View on HN · Topics

Recent experiments with LLM recipes (ChatGPT): missed salt in a recipe to make rice, then flubbed whether that type of rice was recommended to be washed in the recipe it was supposedly summarizing (and lied about it, too)…

Probabilistic generation will be weighted towards the means in the training data. Do I want my code looking like most code most of the time in a world full of Node.js and PHP? Am I better served by rapid delivery from a non-learning algorithm that requires eternal vigilance and critical re-evaluation or with slower delivery with a single review filtered through an meatspace actor who will build out trustable modules in a linear fashion with known failure modes already addressed by process (ie TDD, specs, integration & acceptance tests)?

I’m using LLMs a lot, but can’t shake the feeling that the TCO and total time shakes out worse than it feels as you go.

View on HN · Topics

I think the word predictable is doing a bit of heavy lifting there.

Lets say you shovel some dirt, you’ve got a lot of control over where you get it from and where you put it..

Now get in your big digger’s cabin and try to have the same precision. At the level of a shovel-user, you are unpredictable even if you’re skilled. Some of your work might be out a decent fraction of the width of a shovel. That’d never happen if you did it the precise way!

But you have a ton more leverage. And that’s the game-changer.

View on HN · Topics

So you have a hobby.

I have a profession. Therefore I evaluate new tools. Agents coding I've introduced into my auxiliary tool forgings (one-off bash scripts) and personal projects, and I'm just now comfortable to introduce into my professional work. But I still evaluate every line.

View on HN · Topics

IME Gemini is pretty slow in comparison to Claude - but hey, it's super cheap at least.

But that speed makes a pretty significant difference in experience.

If you wait a couple minutes and then give the model a bunch of feedback about what you want done differently, and then have to wait again, it gets annoying fast.

If the feedback loop is much tighter things feel much more engaging. Cursor is also good at this (investigate and plan using slower/pricier models, implement using fast+cheap ones).

View on HN · Topics

> It's a shame that AI coding tools have become such a polarizing issue among developers.

Frankly I'm so tired of the usual "I don't find myself more productive", "It writes soup". Especially when some of the best software developers (and engineers) find many utility in those tools, there should be some doubt growing in that crowd.

I have come to the conclusion that software developers , those only focusing on the craft of writing code are the naysayers.

Software engineers immediately recognize the many automation/exploration/etc boosts, recognize the tools limits and work on improving them.

Hell, AI is an insane boost to productivity, even if you don't have it write a single line of code ever .

But people that focus on the craft (the kind of crowd that doesn't even process the concept of throwaway code or budgets or money) will keep laying in their "I don't see the benefits because X" forever, nonsensically confusing any tool use with vibe coding.

I'm also convinced that since this crowd never had any notion of what engineering is (there is very little of it in our industry sadly, technology and code is the focus and rarely the business, budget and problems to solve) and confused it with architectural, technological or best practices they are genuinely insecure about their jobs because once their very valued craft and skills are diminished they pay the price of never having invested in understanding the business, the domain, processes or soft skills.

View on HN · Topics

I've spent 2+ decades producing software across a number of domains and orgs and can fully agree that _disciplined use_ of LLM systems can significantly boost productivity, but the rules and guidance around their use within our industry writ large are still in flux and causing as many problems as they're solving today.

As the most senior IC within my org, since the advent of (enforced) LLM adoption my code contribution/output has stalled as my focus has shifted to the reactionary work of sifting through the AI generated chaff following post mortems of projects that should have never have shipped in the first place. On a good day I end up rejecting several PRs that most certainly would have taken down our critical systems in production due to poor vetting and architectural flaws, and on the worst I'm in full on fire fighting mode to "fix" the same issues already taking down production (already too late.)

These are not inherent technical problems in LLMs, these are organizational/processes problems induced by AI pushers promising 10x output without the necessary 10x requirements gathering and validation efforts that come with that. "Everyone with GenAI access is now a 10x SDE" is the expectation, when the reality is much more nuanced.

The result I see today is massive incoming changesets that no one can properly vet given the new shortened delivery timelines and reduced human resourcing given to projects. We get test suite coverage inflation where "all tests pass" but undermine core businesses requirements and no one is being given the time or resources to properly confirm the business requirements are actually being met. Shit hits the fan, repeat ad nauseum. The focus within our industry needs to shift to education on the proper application and use of these tools, or we'll inevitably crash into the next AI winter; an increasingly likely future that would have been totally avoidable if everyone drinking the Koolaid stopped to observe what is actually happening.

As you implied, code is cheap and most code is "throwaway" given even modest time horizons, but all new code comes with hidden costs not readily apparent to all the stakeholders attempting to create a new normal with GenAI. As you correctly point out, the biggest problems within our industry aren't strictly technical ones, they're interpersonal, communication and domain expertise problems, and AI use is simply exacerbating those issues. Maybe all the orgs "doing it wrong" (of which there are MANY) simply fail and the ones with actual engineering discipline "make it," but it'll be a reckoning we should not wish for.

I have heard from a number of different industry players and they see the same patterns. Just look at the average linked in post about AI adoption to confirm. Maybe you observe different patterns and the issues aren't as systemic as I fear. I honestly hope so.

Your implication that seniors like myself are "insecure about our jobs" is somewhat ironically correct, but not for the reasons you think.

View on HN · Topics

>management values speed and quantity of delivery above all else

I don't know about you but this has been the case for my entire career.
Mgmt never gave a shit about beautiful code or tech debt or maintainability or how enlightened I felt writing code.

View on HN · Topics

It reminds me a lot of 3D Printing tbh. Watching all these cool DIY 3d printing kits evolve over years, I remember a few times I'd checked on costs to build a DIY one. They kept coming down, and down, and then around the same time as "Build a 3d printer for $200 (some assembly required)!" The Bambu X1C was announced/released, for a bit over a grand iirc? And its whole selling point was that it was fast and worked, out of the box. And so I bought one and made a bunch of random one-off-things that solved _my_ specific problem, the way I wanted it solved. Mostly in the form of very specific adapter plates that I could quickly iterate on and random house 'wouldn't it be nice if' things.

That's kind of where AI-agent-coding is now too, though... software is more flexible.

View on HN · Topics

I think the web/system dichotomy is also a major conflating factor for LLM discussions.

A “few hundred lines of code” in Rust or Haskell can be bumping into multiple issues LLM assisted coding struggles with. Moving a few buttons on a website with animations and stuff through multiple front end frameworks may reasonably generate 5-10x that much “code”, but of an entirely different calibre.

3,000 lines a day of well-formatted HTML template edits, paired with a reloadable website for rapid validation, is super digestible, while 300 lines of code per day into curl could be seen as reckless.

View on HN · Topics

There's a point at which these things become Good Enough though, and don't bottleneck your capacity to get things done.

To your point, React, while it has new updates, hasn't changed the fundamentals since 16.8.0 (introduction of hooks) and that was 7 years ago. Yes there are new hooks, but they typically build on older concepts. AWS hasn't deprecated any of our existing services at work (besides maybe a MySQL version becoming EOL) in the last 4 years that I've worked at my current company.

While I prefer pnpm (to not take up my MacBook's inadequate SSD space), you can still use npm and get things done.

I don't need to keep obsessing over whether Codex or Claude have a 1 point lead in a gamed benchmark test so long as I'm still able to ship features without a lot of churn.

View on HN · Topics

It sounds like you're talking more about "vibe coding" i.e. just using LLMs without inspecting the output. That's neither what the article nor the people to whom you're replying are saying. You can (and should) heavily review and edit LLM generated code. You have the full ability to change it yourself, because the code is just there and can be edited!

View on HN · Topics

Whether it's interesting or not is irrelevant to whether it produces usable output that could be economically valuable.

View on HN · Topics

Yeah, still waiting for something to ship before I form a judgement on that

View on HN · Topics

> the scope is so small there's not much point in using an LLM

Actually that's how I did most of my work last year. I was annoyed by existing tools so I made one that can be used interactively.

It has full context (I usually work on small codebases), and can make an arbitrary number of edits to an arbitrary number of files in a single LLM round trip.

For such "mechanical" changes, you can use the cheapest/fastest model available. This allows you to work interactively and stay in flow.

(In contrast to my previous obsession with the biggest, slowest, most expensive models! You actually want the dumbest one that can do the job.)

I call it "power coding", akin to power armor, or perhaps "coding at the speed of thought". I found that staying actively involved in this way (letting LLM only handle the function level) helped keep my mental model synchronized, whereas if I let it work independently, I'd have to spend more time catching up on what it had done.

I do use both approaches though, just depends on the project, task or mood!

View on HN · Topics

I don't understand how Agents make you feel productive. Single/Multiple agents reading specs, specs often produced with agents itself and iterated over time with human in the loop, a lot of reviewing of giant gibberish specs. Never had a clear spec in my life. Then all the dancing for this apperantly new paradigm, of not reviewing code but verifying behaviour, and so many other things. All of this to me is a total UNproductive mess. I use Cursor autocomplete from day one till to this day, I was super productive before LLMs, I'm more productive now, I'm capable, I have experience, product is hard to maintain but customers are happy, management is happy. So I can't really relate anymore to many of the programmers out there, that's sad, I can count on my hands devs that I can talk to that have hard skills and know-how to share instead of astroturfing about AI Agents

View on HN · Topics

How could the author write all of that and not talk about actual time savings versus the prior method?

I mean, what is the point of change if not to improve? I don't mean "I felt I was more efficient." Feelings aren't measurements. Numbers!

View on HN · Topics

If you make 10k/mo -- which is not that much!, $500 is 5% of revenue. All else held equal, if that helps you go 20% faster, it's an absolute no brainer.

The question is.. does it actually help you do that, or do you go 0% faster? Or 5% slower?

Inquiring minds want to know.

View on HN · Topics

> Context switching is very expensive. In order to remain efficient, I found that it was my job as a human to be in control of when I interrupt the agent, not the other way around. Don't let the agent notify you.

This I have found to be important too.

View on HN · Topics

AI adoption is being heavily pushed at my work and personally I do use it, but only for the really "boilerplate-y" kinds of code I've already written hundreds of times before. I see it as a way to offload the more "typing-intensive" parts of coding (where the bottleneck is literally just my WPM on the keyboard) so I have more time to spend on the trickier "thinking-intensive" parts.

View on HN · Topics

AI is getting to the game-changing point. We need more hand-written reflections on how individuals are managing to get productivity gains for real (not a vibe coded app) software engineering.

View on HN · Topics

> I'm not [yet?] running multiple agents, and currently don't really want to

This is the main reason to use AI agents, though: multitasking. If I'm working on some Terraform changes and I fire off an agent loop, I know it's going to take a while for it to produce something working. In the meantime I'm waiting for it to come back and pretend it's finished (really I'll have to fix it), so I start another agent on something else. I flip back and forth between the finished runs as they notify me. At the end of the day I have 5 things finished rather than two.

The "agent" doesn't have to be anything special either. Anything you can run in a VM or container (vscode w/copilot chat, any cli tool, etc) so you can enable YOLO mode.

View on HN · Topics

I find it interesting that this thread is full of pragmatic posts that seem to honestly reflect the real limits of current Gen-Ai.

Versus other threads (here on HN, and especially on places like LinkedIn) where it's "I set up a pipeline and some agents and now I type two sentences and amazing technology comes out in 5 minutes that would have taken 3 devs 6 months to do".

View on HN · Topics

I quickly run out of the JetBrains AI 35 monthly credits for $300/yr and spending an additional $5-10/day on top of that, mostly for Claude.

I just recently added in Codex, since it comes with my $20/mo subscription to GPT and that's lowering my Claude credit usage significantly... until I hit those limits at some point.

20 12 + 300 + 5 ~200... so about $1500-$1600/year.

It is 100% worth it for what I'm building right now, but my fear is that I'll take a break from coding and then I'm paying for something I'm not using with the subscriptions.

I'd prefer to move to a model where I'm paying for compute time as I use it, instead of worrying about tokens/credits.

View on HN · Topics

> If an agent isn't running, I ask myself "is there something an agent could be doing for me right now?"

Solution-looking-for-a-problem mentality is a curse.

View on HN · Topics

For the AI skeptics reading this, there is an overwhelming probability that Mitchell is a better developer than you. If he gets value out of these tools you should think about why you can't.

View on HN · Topics

The AI skeptics instead stick to hard data, which so far shows a 19% reduction in productivity when using AI.

View on HN · Topics

https://metr.org/blog/2025-07-10-early-2025-ai-experienced-o...

> 1) We do NOT provide evidence that AI systems do not currently speed up many or most software developers. Clarification: We do not claim that our developers or repositories represent a majority or plurality of software development work.

> 2) We do NOT provide evidence that AI systems do not speed up individuals or groups in domains other than software development. Clarification: We only study software development.

> 3) We do NOT provide evidence that AI systems in the near future will not speed up developers in our exact setting. Clarification: Progress is difficult to predict, and there has been substantial AI progress over the past five years [3].

> 4) We do NOT provide evidence that there are not ways of using existing AI systems more effectively to achieve positive speedup in our exact setting. Clarification: Cursor does not sample many tokens from LLMs, it may not use optimal prompting/scaffolding, and domain/repository-specific training/finetuning/few-shot learning could yield positive speedup.

View on HN · Topics

Points 2 and 3 are irrelevant.

Point 1 is saying results may not generalise, which is not a counter claim. It’s just saying “we cannot speak for everyone”.

Point 4 is saying there may be other techniques that work better, which again is not a counter claim. It’s just saying “you may find bette methods.”

Those are standard scientific statements giving scope to the research. They are in no way contradicting their findings. To contradict their findings, you would need similarly rigorous work that perhaps fell into those scenarios.

Not pushing an opinion here, but if we’re talking about research then we should be rigorous and rationale by posting counter evidence. Anyone who has done serious research in software engineering knows the difficulties involved and that this study represents one set of data. But it is at least a rigorous set and not anecdata or marketing.

I for one would love a rigorous study that showed a reliable methodology for gaining generalised productivity gains with the same or better code quality.

View on HN · Topics

There is no such hard data. It's just research done on 16 developers using Cursor and Sonnet 3.5 .

View on HN · Topics

Perhaps that's the reason. Maybe I'm just not a good enough developer. But that's still not actionable. It's not like I never considered being a better developer.

View on HN · Topics

The value Mitchell describes aligns well with the lack of value I'm getting. He feels that guiding an agent through a task is neither faster nor slower than doing it himself, and there's some tasks he doesn't even try to do with an agent because he knows it won't work, but it's easier to parallelize reviewing agentic work than it is to parallelize direct coding work. That's just not a usage pattern that's valuable to me personally - I rarely find myself in a situation where I have large number of well-scoped programming tasks I need to complete, and it's a fun treat to do myself when I do.

View on HN · Topics

Don't get it. What's the relation between Mitchell being a "better" developer than most of us (and better is always relative, but that's another story) and getting value out of AI? That's like saying Bezos is a way better businessman than you, so you should really hear his tips about becoming a billionaire. No sense (because what works for him probably doesn't work for you)

Tons of respect for Mitchell. I think you are doing him a disservice with these kinds of comments.

View on HN · Topics

Maybe you disagree with it, but it seems like a pretty straightforward argument: A lot of us dismiss AI because "it can't be trusted to do as good a job as me". The OP is arguing that someone, who can do better than most of us, disagrees with this line of thinking. And if we have respect for his abilities, and recognize them as better than our own, we should perhaps re-assess our own rationale in dismissing the utility of AI assistance. If he can get value out of it, surely we can too if we don't argue ourselves out of giving it a fair shake. The flip side of that argument might be that you have to be a much better programmer than most of us are, to properly extract value out of the AI... maybe it's only useful in the hands of a real expert.

View on HN · Topics

No, it doesn't work that way. I don't know if Mitchell is a better programmer than me, but let's say he is for the sake of argument. That doesn't make him a god to whom I must listen. He's just a guy, and he can be wrong about things. I'm glad he's apparently finding value here, but the cold hard reality is that I have tried the tools and they don't provide value to me. And between another practicioner's opinion and my own, I value my own more.

View on HN · Topics

There is no dichotomy of craft and AI.

I consider myself a craftsman as well. AI gives me the ability to focus on the parts I both enjoy working on and that demand the most craftsmanship. A lot of what I use AI for and show in the blog isn’t coding at all, but a way to allow me to spend more time coding.

This reads like you maybe didn’t read the blog post, so I’ll mention there many examples there.

View on HN · Topics

I enjoy Japanese joinery, but for some reason the housing market doesn't.

View on HN · Topics

Nobody is trying to talk anyone out of their hobby or artisanal creativeness. A lot of people enjoy walking, even after the invention of the automobile. There's nothing wrong with that, there are even times when it's the much more efficient choice. But in the context of say transporting packages across the country... it's not really relevant how much you enjoy one or the other; only one of them can get the job done in a reasonable amount of time. And we can assume that's the context and spirit of the OP's argument.

View on HN · Topics

>But in the context of say transporting packages across the country... it's not really relevant how much you enjoy one or the other; only one of them can get the job done in a reasonable amount of time.

I think one of the more frustrating aspects of this whole debate is this idea that software development pre-AI was too "slow", despite the fact that no other kind of engineering has nearly the same turn around time as software engineering does (nor does they have the same return on investment!).

I just end up rolling my eyes when people use this argument. To me it feels like favoring productivity over everything else.

View on HN · Topics

> a period of inefficiency

I think this is something people ignore, and is significant. The only way to get good at coding with LLMs is actually trying to do it. Even if it's inefficient or slower at first. It's just another skill to develop [0].

And it's not really about using all the plugins and features available. In fact, many plugins and features are counter-productive. Just learn how to prompt and steer the LLM better.

[0]: https://ricardoanderegg.com/posts/getting-better-coding-llms...

Summarizer