Summarizer

LLM Input

llm/e6f7e516-f0a0-4424-8f8f-157aae85c74e/topic-13-ba159c03-4142-428c-b933-ae73ed232ed4-input.json

prompt

The following is content for you to summarize. Do not respond to the comments—summarize them.

<topic>
Productivity Metrics and Paradoxes # Skepticism regarding "2x productivity" claims. Commenters argue that generating more code doesn't equal value, noting that debugging, communicating, and context-gathering are the real bottlenecks, and that AI might simply be increasing the volume of low-quality output or "slop."
</topic>

<comments_about_topic>
1. > If one "doesn't know Kubernetes", what exactly are they supposed to do now, having an LLM at hand, in a professional setting? They still "can't" assess the quality of the output, after all. They can't just ask the model, as they can't know whether the answer is misleading.

This is the fundamental problem that all these cowboy devs do not even consider. They talk about churning out huge amounts of code as if it were an intrinsically good thing. Reminds me of those awful VB6 desktop apps people kept churning out. VB6 sure made tons of people nx productive, but it also led to loads of legacy systems that no one wanted to touch because they were built by people who didn't know what they were doing. LLMs-for-code are another tool in the same category.

2. This project and its website were both originally working 1 shot prototypes:

The website https://pxehost.com - via codex CLI

The actual project itself (a pxe server written in go that works on macOS) - https://github.com/pxehost/pxehost - ChatGPT put the working v1 of this in 1 message.

There was much tweaking, testing, refactoring (often manually) before releasing it.

Where AI helps is the fact that it’s possible to try 10-20 different such prototypes per day.

The end result is 1) much more handwritten code gets produced, because when I get a working prototype I usually want to go over every detail personally; 2) I can write code across much more diverse technologies; 3) the code is better, because each of its components is the best of many attempts, since attempts are so cheap.

I can give more if you like, but hope that is what you are looking for.

3. I work in insurance - regulated, human capital heavy, etc.

Three examples for you:
- our policy agent extracts all coverage limits and policy details into a data ontology. This saves 10-20 mins per policy. It is more accurate and consistent than our humans
- our email drafting agent will pull all relevant context on an account whenever an email comes in. It will draft a reply or an email to someone else based on context and workflow. Over half of our emails are now sent without meaningfully modifying the draft, up from 20% two months ago. Hundreds of hours saved per week, now spent on more valuable work for clients.
- our certificates agent will note when a certificate of insurance is requested over email and automatically handle the necessary checks and follow up options or resolution. Will likely save us around $500k this year.

We also now increasingly share prototypes as a way to discuss ideas. The cost to vibe-code something illustrative is very low, and it's often much higher fidelity to have the conversation around something visual than around a written document.

4. > the whole point was to free the human from reading all the details and relevant context about the case

That's your assumption.

My read of that comment is that it's much easier to verify and approve (or modify) the message than it is to write it from scratch. The second sentence does confirm a person then modifies it in half the cases, so there is some manual work remaining.

It doesn't need to be all or nothing.

5. The “double checking” is a step to make sure there’s someone low-level to blame. Everyone knows the “double-checking” in most of these systems will be cursory at best, for most double-checkers. It’s a miserable job to do much of, and with AI, it’s a lot of what a person would be doing. It’ll be half-assed. People will go batshit crazy otherwise.

On the off chance it’s not for that reason, productivity requirements will be increased until you must half-ass it.

6. This kind of take I find genuinely baffling. I can't see how anybody working with current frontier models isn't finding them a massive performance boost. No they can't replace a competent developer yet, but they can easily at least double your productivity.

Careful code review and a good pull request flow are important, just as they were before LLMs.

7. > double your productivity

Churning out 2x as much code is not doubling productivity. Can you perform at the same level as a dev who is considered 2x as productive as you? That's the real metric. Comparing quality to quantity of code ratios, bugs caused by your PRs, actual understanding of the code in your PR, ability to think slow, ability to deal with fires, ability to quickly deal with breaking changes accidentally caused by your changes.

Churning out more code per day is not the goal. There's no point merging code that doesn't fully work, isn't properly tested, or that other humans (or you) cannot understand.

8. Why is that the real metric? If you can turn a 1x dev into a 2x dev that's a huge deal, especially if you can also turn the original 2x dev into a 4x dev.

And far from "churning out code" my work is better with LLMs. Better tested, better documented, and better organized because now I can do refactors that just would have taken too much time before. And more performant too because I can explore more optimization paths than I had time to before.

Refusing to use LLMs now is like refusing to use compilers 20 years ago. It might be justified in some specific cases but it's a bad default stance.

9. > Why is that the real metric?

The answer to "Can you perform at the same level as a dev who is considered 2x as productive as you?" is self-explanatory. If your answer is negative, you are not 2x as productive.

10. People thought they were doubling their productivity and then real, actual studies showed they were actually slower. These types of claims have to be taken with entire quarries of salt at this point.

11. My bigger point was that not everyone who is skeptical about supposed productivity gains and their veracity is in competition with you. I think any inference you made beyond that is a mistake on your part.

(I did do web development and distributed systems for quite some time, though, and I suspect while LLMs are probably good at tutorial-level stuff for those areas it falls apart quite fast once you leave the kiddy pool.)

P.S.:

I think it's very ironic that you say that you should be careful to not speak in general terms about things that might depend much more on context, when you clearly somehow were under the belief that all developers must see the same kind of (perceived) productivity gains you have.

12. I would also take those studies with a grain of salt at this point, or at least take into consideration that results from a model from even a few months ago might differ significantly from those of the current frontier models.

And in my personal experience it definitely helps in some tasks, and as someone who doesn't actually enjoy the actual coding part that much, it also adds some joy to the job.

Recently I've also been using it to write design docs, which is another aspect of the job that I somewhat dreaded.

13. I think the bigger part of those studies was actually that they were a clear sign that whatever productivity coefficient people were imagining back then was clearly a figment of their imagination, so it's useful to take that lesson with you forward. If people are saying they're 2 times productive with LLMs, it's still likely the case that a large part of that is hyperbole, whatever model they're working with.

It's the psychology of it that's important, not the tool itself; people are very bad at understanding where they're spending their time and cannot accurately assess the rate at which they work because of it.

14. I like coming up with the system design and the low level pseudo code, but actually translating it to the specific programming language and remembering the exact syntax or whatnot I find pretty uninspiring.

Same with design docs more or less, translating my thoughts into proper and professional English adds a layer I don't really enjoy (since I'm not exactly great at it), or stuff like formatting, generating a nice looking diagram, etc.

Just today I wrote a pretty decent design doc that took me two hours instead of the usual week+ slog/procrastination, and it was actually fairly enjoyable.

15. > human employees using AI are doing way more than they could before, both in depth and scale

Funny how that doesn't show up in any productivity or economic metrics...

16. Bit too soon to tell, no? Claude Code wasn't released until the latter half of Q2, offering little time for it to show up in those figures, and Q3 data is only preliminary right now. Moreover, it seems to be the pairing with Opus 4.5 that lends some credence to the claims. However, it was released in Q4. We won't have that data for quite a while. And like Claude Code, it came late in the quarter, so realistically we really need to wait on Q1 2026 figures, which haven't been published yet and won't really start to appear until summertime and beyond.

That said, I expect you are right that we won't see it show up. Even if we assume the claim is true in every way for some people, it only works for exceptional visionaries who were previously constrained by typing speed, which is a very, very, very small segment of the developer population. Any gains that small group realize will be an unrecognizable blip amid everything else. The vast majority of developers need all that typing time and more to have someone come up with their next steps. Reducing the typing time for them doesn't make them any more productive. They were never limited by typing speed in the first place.

17. Humans are doing a bit more, specifically around 20% more.

AI generates output that must be thoroughly checked for most software engineering purposes. If you're not checking the output, then quality and accuracy must not matter. For quick prototyping that's mostly true. Not for real engineering.

18. I think a practical measure still useful right now, which does capture a lot of the "non-performance" capabilities of an employee, is as follows:

"Why has my job not been outsourced yet, since it is far cheaper?" Those are probably the same reasons why AI won't take your job this year.

Raw coding metrics are a very small part of being a cog in a company. That's not me saying it will never happen, just that this focus on coding performance kind of misses the forest for the trees.

19. The adoption of AI tools for software development will probably not result in sudden layoffs, but rather in harder-to-measure changes, like smaller teams being able to tackle significantly more ambitious projects than before.

I suspect that another kind of impact is already happening in organisations where AI adoption is uneven: suddenly some employees appear to be having a lot more leisure time while apparently keeping the same productivity as before.

20. > you would be spinning up new projects and offshoots

If the engineers can 10x their output, this actually exposes the product leadership since I find it unlikely that they can 10x the number of revenue generating projects or 10x their product spec development.

21. > Allegedly every dollar you spent on an engineer is potentially worth 10x(?) what it was a couple years ago. Meaning your profit per engineer could soar, but tech companies decided they don't want more profit?

Exactly, so many of these claims are complete nonsense. I'm supposed to believe that boards/investors would be fine with companies doing massive layoffs to maintain flat/minuscule growth, when they could keep or expand their current staffing and massively expand their market share and profits with all this increased productivity?

It's ridiculous. If this stuff had truly increased productivity at the levels claimed we would see firms pouring money into technical staff to capitalize on this newfound leverage.

22. But it isn’t joining the workforce. Your perspective is that it could, but the point that it hasn’t is the one that’s salient. Codex might be able to do a substantial portion of what a freelancer can do, but even you fell short of saying it can replace the freelancer. As long as every AI agent needs its hand held, the effect on the labor force is an increase in costs and an increase in outputs where quality doesn’t matter. It’s not a reduction of the labor force.

23. Terrible productivity loss vs. signing up for a hosted Wordpress site.

24. It was from Altman's blog:

> We are now confident we know how to build AGI as we have traditionally understood it. We believe that, in 2025, we may see the first AI agents “join the workforce” and materially change the output of companies...

"materially change the output of companies" seems fairly defined and didn't happen in most cases. I guess some kicked out more slop but I don't think that's what he meant.

25. TikTok, Youtube, news, blogs, … are getting flooded with AI generated content, I'd call that a pretty substantial "change in output".

I think the mistake here is expecting that AI is just making workers in older jobs faster, when the reality is, more often than not, that it changes the nature of the task itself.

Whenever AI reaches the "good enough" point, it doesn't do so in a way that nicely aligns with human abilities. Quite the opposite: it might be worse at performing a task, but be able to perform it 1000x faster. That allows you to do things that weren't previously possible, but it also means that professionals might not want to rely on AI for the old tasks.

A professional translator isn't going to switch over to using AI, since the quality isn't there yet, but somebody like Amazon could offer an "OCR & translate all the books" service, and AI would be good enough for it, since it could handle all the books that nobody has the time and money to translate manually. Which in turn will eventually put the professional translator out of a job when it gets better than good enough. We aren't quite there yet, but getting pretty close.

In 2025 a lot of AI went from "useless, but promising" to "good enough".

26. I've seen organizations where 300 of 500 people could effectively be replaced by AI, just by having some of the remaining 200 orchestrate and manage automation workflows that are trivially within the capabilities of current frontier models.

There's a whole lot of bullshit jobs and work that will get increasingly and opaquely automated by AI. You won't see jobs go away unless or until organizations deliberately set out to reduce staff. People will use AI throughout the course of their days to get a couple of "hours" of tasks done in a few minutes, here and there, throughout the week. I've already seen reports and projects and writing that clearly comes from AI in my own workplace. Right now, very few people know how to recognize and assess the difference between human and AI output, and even fewer how to calibrate work assignments.

Spreadsheet AIs are fantastic, reports and charting have just hit their stride, and a whole lot of people are going to appear to be very productive without putting a whole lot of effort into it. And then one day, when sufficiently knowledgeable and aware people make it into management, all sorts of jobs are going to go quietly away, until everything is automated, because it doesn't make sense to pay a human 6 figures for what an AI can do for 3 figures in a year.

I'd love to see every manager in the world start charting the Pareto curves for their workplaces, alongside actual hours worked per employee - work output is going to be very wonky, and the lazy, clever, and ambitious people are all going to be using AI very heavily.

Similar to this guy: https://news.ycombinator.com/item?id=11850241

https://www.reddit.com/r/BestofRedditorUpdates/comments/tm8m...

Part of the problem is that people don't know how to measure work effectively to begin with, let alone in the context of AI chatbots that can effectively do better work than a significant portion of the adult population of the planet.

The teams that fully embrace it, use the tools openly and transparently, and are able to effectively contrast good and poor use of the tools, will take off.

27. It seems like we are using AI to automate the unimportant parts of jobs that we shouldn’t have been doing anyway. Things like endless status reports or emails.

But from what I’ve seen it just makes that work output even less meaningful—who wants to read AI generated 10 pages that could have been two bullet points?

And it doesn’t actually improve productivity because that was never the bottleneck of those jobs anyway. If anything, having some easy rote work is a nice way to break up the pace.

28. Employee has a few bullet-points of updates, they feed it through an LLM to fluff it out into an email to their manager, and then the manager puts the received email through an LLM to summarize it down to a few bullet points... Probably making some mistakes.

There are all these things in writing we used as signals for intelligence, attention to detail, engagement, willingness to accept feedback, etc... but they're now easy to counterfeit at scale.

Hopefully everyone realizes what's going on and cuts out the middleman.

29. AI doing a bullshit job isn't a productivity increase though; it's at best a cost cut. It would be an even bigger cost cut to remove the bullshit job

30. A brief history of programming:

1. Punch cards -> Assembly languages

2. Assembly languages -> Compiled languages

3. Compiled languages -> Interpreted languages

4. Interpreted languages -> Agentic LLM prompting

I've tried the latest and greatest agentic CLI and toolings with the public SOTA models.

I think this is a productivity jump equivalent to maybe punch cards -> compiled languages, and that's it. Something like a 40% increase, but nowhere close to exponential.

31. That's a jump if you are a junior. It falls down hard for seniors doing more complex stuff.

I'm also reminded that we tried the whole "make it look like human language" approach with COBOL, and it turned out that language wasn't the bottleneck; the ability of people to specify exactly what they want was the bottleneck. Once you have an exact spec, even writing the code on your own isn't all that hard, but extracting that spec from stakeholders has always been the harder part of programming.

32. > the unexpected overperformance of GDP isn’t directly attributed AI but it is very much in the “how did that happen?” conversation.

We spent an amount of money on data centers that was so large that it managed to overcome a self-imposed kick in the nuts from tariffs and then some. The amount of money involved rivals the creation of the railroad system in the United States. Of course GDP overperformed in that scenario.

Where did AI tool use show up in the productivity numbers?

33. Is that a useful thought experiment? Claude benefits you as an individual more than a coworker does, but I find it hard to believe your use of Claude is more of a value add to the business than an additional coworker. Especially since that coworker will also have access to Claude.

In the past we also just raised the floor on productivity, do you think this will be different?

34. There’s often the question of communication overhead between people; Claude would remove that.

35. People talk as if communication overhead is bad. That overhead makes someone else able to substitute for you (or for another person) when the need arises, and can sometimes surface concerns earlier.

36. > There’s often the question of communication overhead between people; Claude would remove that.

... and replace that with communication overhead with claude ?

37. If you’re already communicating with Claude, it’s not additional overhead.

38. For your answer to be correct for your employer, the added productivity from your use of LLMs must be at least as much as the productivity from whichever coworker you're having fired. No study I've seen claims much above a 20% increase in productivity, so either a) your productivity without LLMs was ~5x that of your coworkers, or b) you're making a mistake in your analysis (likely some combination of thinking about it from your perspective instead of your employer's and overestimating how helpful LLMs are to you).

39. It makes him (presumably) 20% more effective than his coworker makes him. Overall effectiveness of the team is not being considered, but that's why his manager isn't asking him :)

40. What's most useful to you is not necessarily most useful to the business. The bar for critical thinking to get staff at this company I've surely heard of must not be very high.
</comments_about_topic>

Write a concise, engaging paragraph (3-5 sentences) summarizing the key points and perspectives in these comments about the topic. Focus on the most interesting viewpoints. Do not use bullet points—write flowing prose.

topic

Productivity Metrics and Paradoxes # Skepticism regarding "2x productivity" claims. Commenters argue that generating more code doesn't equal value, noting that debugging, communicating, and context-gathering are the real bottlenecks, and that AI might simply be increasing the volume of low-quality output or "slop."

commentCount

40
