Summarizer

AI-Assisted Coding Reality

Divergent experiences with tools like Claude Code and Codex. While some report massive productivity boosts and shipping entire features solo, others describe "lazy" AI, subtle logic bugs in generated tests (e.g., SQL query validation), and the danger of unverified code bloat.


The current landscape of AI-assisted coding is defined by a stark divide between skeptics who dismiss these tools as "lazy" stochastic parrots prone to logic bugs and enthusiasts who leverage them as high-velocity "intelligence engines" capable of doubling productivity. While proponents celebrate the ability to rapidly prototype features and compress specialized roles like DevOps into single-developer tasks, critics caution that "vibe coding" often results in unverified code bloat and a dangerous erosion of institutional knowledge. Success with these agents appears to depend less on the model’s inherent reasoning and more on a robust ecosystem of compilers and automated validators, shifting the developer’s primary value from manual syntax mastery to high-level architectural oversight. Ultimately, while the technology can drastically lower the barrier to entry for complex projects, it remains a "mediocre machine" without the rigorous verification and deep domain expertise of a human operator.

61 comments tagged with this topic

View on HN · Topics
Yesterday I got AI (a SOTA model) to write some tests for a backend I'm working on. One set of tests was for a function that runs a somewhat complex SQL query that should return multiple rows. In the test setup, the AI added a single database row, ran the query, and then asserted that the single added row was returned. Clearly this doesn't show that the query works as intended. Is this what people are referring to when they say AI writes their tests? I don't know what to call this kind of thinking. Any intelligent, reasoning human would immediately see that it's not even close to enough; you barely even need a coding background to see the issues. AI just doesn't have it, and it hasn't improved in this area for years. This kind of thing happens over and over again. I look at the stuff it outputs, and it's clear to me that no reasoning thing would act this way.
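For contrast, here's a sketch (using Python's stdlib sqlite3, with a hypothetical `orders` schema and `recent_orders` function, not the commenter's actual code) of the kind of test setup that would actually exercise a multi-row query: several rows, some of which must be excluded, so the assertion can tell a correct query from a broken one.

```python
import sqlite3

def recent_orders(conn, min_total):
    # Hypothetical stand-in for the "somewhat complex" query under test:
    # it should return every matching row, not just one.
    return conn.execute(
        "SELECT id, total FROM orders WHERE total >= ? ORDER BY id",
        (min_total,),
    ).fetchall()

def test_recent_orders_returns_all_matching_rows():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, total REAL)")
    # Insert several rows, including ones that must be excluded, so the
    # assertion distinguishes a correct query from a broken one. A single
    # inserted row (as in the AI-written test) could never do that.
    conn.executemany("INSERT INTO orders (id, total) VALUES (?, ?)",
                     [(1, 50.0), (2, 150.0), (3, 200.0), (4, 99.9)])
    assert recent_orders(conn, 100.0) == [(2, 150.0), (3, 200.0)]

test_recent_orders_returns_all_matching_rows()
```

The point is the fixture shape, not the query: with one row in the table, the test passes for almost any query that returns anything.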
View on HN · Topics
As a counter, I've had OpenAI Codex and Claude Code both catch logic cases I'd missed in both tests and code. The tooling in the coding tools is key to usable LLM coding: those tools prompt the models to "reason" about whether they've caught edge cases or met the logic. Without that external support they're just fancy autocompletes. In some ways it's no different than working with some interns. You have to prompt them: "did you consider whether your code matched all of the requirements?". LLMs are different in that they're sorta lobotomized: they won't learn from the tutoring, so the "did you consider" still needs to be encoded manually.
View on HN · Topics
> As a counter I’ve had OpenAI Codex and Claude Code both catch logic cases I’d missed in both tests and codes

That has other explanations than that it reasoned its way to the correct answers. Maybe it had very similar code in its training data. This specific example was with Codex; I didn't mention that because I didn't want it to sound like I think Codex is worse than Claude Code. I do realize my prompt wasn't optimal to get the best out of the AI here, and I improved it on the second pass, mainly to give it more explicit instructions on what to do. My point, though, is that these situations are heavily indicative of it not having true reasoning and understanding of the goals presented to it. Why can it sometimes catch the logic cases you miss, as in your case, and then utterly fail at something that a simple understanding of the problem, thought through, would solve? The only explanation I have is that it's not using actual reasoning to solve the problems.
View on HN · Topics
Sounds like the AI was not dumb but lazy. I do the same when I don't feel like doing the work.
View on HN · Topics
> Is this what people are referring to when they say AI writes their tests?

Yes.

> Any intelligent, reasoning human would immediately see that it's not even close to enough. You barely even need a coding background to see the issues.

[nods]

> This kind of thing happens over and over again. I look at the stuff it outputs and it's clear to me that no reasoning thing would act this way

And yet there are so many people who are convinced it's fantastic. Oh, I made myself sad. The larger observation, that it's statistical inference rather than reason yet looks to so many like reason, is quite an interesting test case for the "fuzzing" of humans. In the same vein as "why do so many engineers store passwords in clear text?": why do so many people believe AI can reason?
View on HN · Topics
> connecting all the data sources for agents to run

Copilot can't jump to definition in Visual Studio. Anthropic got a lot of mileage out of teaching Claude to grep, but LLM agents are a complete dead end for my codebase until they can use the semantic search tools that actually work on our codebase and hook into the docs for our expensive proprietary dependencies.
View on HN · Topics
Any topic with little coverage in the training data. LLMs will keep circling around the small bits in the training data, unable to synthesize new connections. This is very obvious when trying to use LLMs to modify scripts in vendor-specific languages that are not widely documented and don't have many examples available. A seasoned programmer will easily recognize common patterns like if-else blocks and loops, but LLMs will get stuck and output gibberish.
View on HN · Topics
I recall someone comparing stories of LLMs doing something useful to "I have a Canadian girlfriend" stories. Not trying to discredit anyone or be a pessimist, but can anyone elaborate on how exactly they use these agents while working on interdependent projects in multi-team settings, e.g. in regulated industries?
View on HN · Topics
I’m strictly talking about “agentic” coding here: these are not silver bullets or truly “you don’t need to know how to code anymore” tools.

I’ve done a ton of work with Claude Code this year. I’ve gone from a “maybe one ticket a week” tier React developer to someone who’s shipped entire new frontend feature sets, while also managing a team. I’ve used LLMs to prototype these features rapidly, tear down the barrier to entry on a lot of simple problems that have historically been too big to be a single-dev item, and clear out the backlog of “nice to haves” that compete with the real bread and butter of my business. This prototyping and “good enough” development has been massively impactful in my small org, where the hard problems come from complex interactions between distributed systems, monitoring across services, and lots of low-level machine traffic. LLMs let me solve easy problems and spend my most productive hours working with people to break down the hard problems into easy problems that I can solve later or pass off to someone on my team.

I’ve also used LLMs to get into other people’s codebases, refactor ancient tech debt, and shore up years-old test suites filled with garbage and copy/paste. On testing alone, LLMs are super valuable for throwing edge cases at your code and seeing what you assumed versus what an entropy machine would throw at it.

LLMs absolutely are not a 10x improvement in productivity on their own. They 100% cannot solve some problems in a sensible, tractable way, and they frequently do stupid things that waste time and would ruin a poor developer’s attempts at software engineering. However, they absolutely do lower the barrier to entry and dethrone “pure single tech” (i.e. backend only, frontend only, “I don’t know Kubernetes”, or other limited scope) software engineers who’ve previously benefited from super specialized knowledge guarding their place in the business.
Software as a discipline has shifted so far from “build functional, safe systems that solve problems” to “I make 200k bike-shedding JIRA tickets that require an army of product people to come up with and manage” that LLMs can be valuable if only for their capacity to compress roles and give people with a sense of ownership the tools they need to operate like a whole team would have 10 years ago.
View on HN · Topics
> If one "doesn't know Kubernetes", what exactly are they supposed to do now, having an LLM at hand, in a professional setting? They still "can't" assess the quality of the output, after all. They can't just ask the model, since they can't know whether the answer is misleading.

This is the fundamental problem that all these cowboy devs do not even consider. They talk about churning out huge amounts of code as if it were an intrinsically good thing. Reminds me of those awful VB6 desktop apps people kept churning out. VB6 sure made tons of people N× more productive, but it also led to loads of legacy systems that no one wanted to touch because they were built by people who didn't know what they were doing. LLMs-for-code are another tool in the same category.
View on HN · Topics
I don’t think the conclusion is right. Your org might still require enough React knowledge to keep you gainfully employed as a pure React dev, but if all you did was change some forms, that's now something pretty much anyone can do. The value of good FE architecture has increased, if anything, since you will be adding code quicker. Making sure the LLM doesn’t stupidly couple stuff together is quite important for long-term success.
View on HN · Topics
> someone who’s shipped entire new frontend feature sets, while also managing a team. I’ve used LLM to prototype these features rapidly and tear down the barrier to entry on a lot of simple problems that are historically too big to be a single-dev item, and clear out the backlog of “nice to haves” that compete with the real meat and bread of my business. This prototyping and “good enough” development has been massively impactful in my small org

Has any senior React dev code-reviewed your work? I would be very interested to see what they have to say about the quality of your code. It's a bit like using LLMs to medically self-diagnose and claiming it works because you are healthy. Ironically enough, it does seem that the only workforce AIs will be shrinking is devs themselves. I guess in 2025, everyone can finally code.
View on HN · Topics
I follow at least one GitHub repo (a well-respected one that's made the HN front page) where everything is now Claude-coded. Things do move fast, but I'm seriously unimpressed with the quality. I've raised a few concerns; some were taken in, others seem to have been shut down with a Claude-produced explanation that IMO makes no sense, but which is taken at face value.

This matches my personal experience. I was asked to help with a large Swift iOS app without knowing Swift. Had access to a frontier agent. I was able to consistently knock out a couple of tickets per week for about a month, until the fire was out and the actual team could take over. Code review by the owners means the result isn't terrible, but it's not great either. I left the experience none the wiser: I gained very little knowledge of Swift, iOS development, or the project. Management was happy with the productivity boost. I think it's fleeting, and I dread a time when most code is produced that way, with humans accumulating very little institutional knowledge and not knowing enough to properly review things.
View on HN · Topics
I'm just one data point. Me being unimpressed should not be used to judge their entire work. I feel like I have a pretty decent understanding of a few small corners of what they're doing, and find it a bad omen that they've brushed aside some of my concerns. But I'm definitely not knowledgeable enough about the rest of it all. What concerns me is, generally, if the experts (and I do consider them experts) can use frontier AI to look very productive, but upon close inspection of something you (in this case I) happen to be knowledgeable about, it's not that great (built on shaky foundations), what about all the vibe coded stuff built by non-experts?
View on HN · Topics
This project and its website were both originally working one-shot prototypes:

- The website, https://pxehost.com, via Codex CLI.
- The actual project itself (a PXE server written in Go that works on macOS), https://github.com/pxehost/pxehost. ChatGPT produced the working v1 of this in one message.

There was much tweaking, testing, and refactoring (often manual) before release. Where AI helps is that it's possible to try 10-20 different such prototypes per day. The end result: 1) much more handwritten code gets produced, because when I get a working prototype I usually want to go over every detail personally; 2) I can write code across much more diverse technologies; 3) the code is better, because each of its components is the best of many attempts, since attempts are so cheap. I can give more examples if you like, but I hope that is what you are looking for.
View on HN · Topics
I appreciate the effort, and that's a nice-looking project. That's similar to the gains I've gotten as well with greenfield projects (I use Codex too!). However, it's not as grandiose as the claims in the "Canadian girlfriend" category of posts.
View on HN · Topics
This looks awesome, well done. I find it remarkable there are people that look at useful, living projects like that and still manage to dismiss AI coding as a fad or gimmick.
View on HN · Topics
I had some .csproj files that only worked with msbuild/vsbuild that I wanted to make compatible with dotnet. Copilot does a pretty good job of updating these and identifying the ones more likely to break (say, web projects compared to plain DLLs). It isn't simple fire-and-forget, but it did make the conversion possible without me needing to do as much research into what was changing.

Is that a net benefit? Without AI, if I really wanted to do that conversion, I would have had to become much more familiar with the inner workings of csproj files. That is a benefit I've lost, but it would've also taken longer, so much longer that I might not have decided to do the conversion at all. My job doesn't really have a need for someone that deeply specialized in csproj, and it isn't a particular interest of mine, so letting AI handle it while answering a few questions to sate my curiosity seemed a great compromise.

A second example: it works great as a better option than a rubber duck. I noticed some messy programming where, basically, OOP had been abandoned in favor of one massive class doing far too much work. I needed to break it down, and talking with AI about it helped come up with some design patterns that worked well. AI wasn't good enough to do the refactoring in one go, but it helped talk through the pros and cons of a few design patterns and was able to create test examples so I could get a feel for what it would look like when done. Also, when I finished, I had AI review it, and it caught a few typos that weren't compile errors before I even got to the point of testing.

None of these were things AI could do on its own, and these definitely aren't areas where I would have blindly trusted some vibe-coded output, but overall it was a productivity increase well worth the $20 or so cost. (Now, one may argue that is the subsidized cost, and the unsubsidized cost would not have been worthwhile. To that, I can only say I'm not versed enough in the costs to be sure, but the argument does seem like a possibility.)
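The decomposition described above, breaking one massive class into collaborators, might look like this minimal Python sketch. All class and method names here are hypothetical illustrations, not taken from the commenter's codebase.

```python
# Before: one "god class" parsed, validated, and persisted records.
# After: each responsibility lives in its own small collaborator,
# with a thin coordinator wiring them together.

class ReportParser:
    def parse(self, raw: str) -> dict:
        key, _, value = raw.partition("=")
        return {key.strip(): value.strip()}

class ReportValidator:
    def validate(self, record: dict) -> bool:
        # Reject records with empty keys or values.
        return all(k and v for k, v in record.items())

class ReportStore:
    def __init__(self):
        self._records = []
    def save(self, record: dict) -> None:
        self._records.append(record)

class ReportPipeline:
    """Thin coordinator replacing the original massive class."""
    def __init__(self, parser, validator, store):
        self.parser, self.validator, self.store = parser, validator, store
    def ingest(self, raw: str) -> bool:
        record = self.parser.parse(raw)
        if not self.validator.validate(record):
            return False
        self.store.save(record)
        return True

pipeline = ReportPipeline(ReportParser(), ReportValidator(), ReportStore())
assert pipeline.ingest("status = ok") is True   # valid line is stored
assert pipeline.ingest("broken-line") is False  # no "=" -> rejected
```

Each collaborator can now be tested (and AI-reviewed) in isolation, which is the property that makes incremental refactors like this tractable.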
View on HN · Topics
Here's some anecdata from the B2B SaaS company I work at:

- The product team is generating some code with LLMs, but everything has to go through human review and developers are expected to "know" what they committed, so it hasn't been a major time saver; we can spin up quicker and explore more edge cases before getting into the real work.
- The marketing team is using LLMs to generate initial outlines and drafts, but even low-stakes, quick-turnaround content (like LinkedIn posts and paid ads) still needs to be reviewed for accuracy, brand voice, etc. Projects get started quicker but still go through various human reviews before customers/the public see them.
- Similarly, the sales team can generate outreach messaging slightly faster, but they still have to review for accuracy, targeting, personalization, etc. Meeting/call summaries are pretty much "magic" and accurate enough when you need to analyze transcripts; you can still fall back on the actual recording for clarification.
- We're able to spin up demos much faster with "synthetic" content/sites/visuals that are good enough for a sales call but would never hold up in production.

All that being said, the value seems to be speeding up discovery of the actual work, but someone still needs to do the work. We have customers, we built a brand, and we're subject to SLAs and other regulatory frameworks, so we can't just let some automated workflow do whatever it wants without a ton of guardrails. We're seeing similar feedback from our customers regarding the LLM features (RAG) that we've added to the product, if that helps.
View on HN · Topics
This kind of take I find genuinely baffling. I can't see how anybody working with current frontier models isn't finding them a massive performance boost. No they can't replace a competent developer yet, but they can easily at least double your productivity. Careful code review and a good pull request flow are important, just as they were before LLMs.
View on HN · Topics
> double your productivity

Churning out 2x as much code is not doubling productivity. Can you perform at the same level as a dev who is considered 2x as productive as you? That's the real metric: the quality-to-quantity ratio of your code, bugs caused by your PRs, actual understanding of the code in your PRs, ability to think slowly, ability to deal with fires, ability to quickly deal with breaking changes accidentally caused by your changes. Churning out more code per day is not the goal. There's no point merging code that doesn't fully work, is not properly tested, or that other humans (or you) cannot understand.
View on HN · Topics
Why is that the real metric? If you can turn a 1x dev into a 2x dev that's a huge deal, especially if you can also turn the original 2x dev into a 4x dev. And far from "churning out code" my work is better with LLMs. Better tested, better documented, and better organized because now I can do refactors that just would have taken too much time before. And more performant too because I can explore more optimization paths than I had time to before. Refusing to use LLMs now is like refusing to use compilers 20 years ago. It might be justified in some specific cases but it's a bad default stance.
View on HN · Topics
The denial on this topic is genuinely surreal. I've knocked out entire features in a single prompt that took me days in the past. I guess I should be happy that so many of my colleagues are willing to remove themselves from the competitive job pool with these kinds of attitudes.
View on HN · Topics
C, Swift, Typescript, audio dsp, robotics etc. People always want to claim what they’re doing is so complex and esoteric that AI can’t touch it. This is dangerous hubris.
View on HN · Topics
You discount the value of being intimately familiar with each line of code, the design decisions and tradeoffs because one wrote the bloody thing. It is negative value for me to have a mediocre machine do that job for me, that I will still have to maintain, yet I will have learned absolutely nothing from the experience of building it.
View on HN · Topics
No, I wouldn't say it's super complex. I make custom 3D engines. It's just that you and I were probably never in any real competition anyway, because it's not super common to do what I do. I will add that LLMs are very mediocre, bordering on bad, at any challenging or interesting 3D engine stuff. They're pretty decent at answering questions about surface API stuff (though, inexplicably, they're really shit at OpenGL which is odd because it has way more code out there written in it than any other API) and a bit about the APIs' structure, though.
View on HN · Topics
My bigger point was that not everyone who is skeptical about supposed productivity gains and their veracity is in competition with you. I think any inference you made beyond that is a mistake on your part. (I did do web development and distributed systems for quite some time, though, and I suspect while LLMs are probably good at tutorial-level stuff for those areas it falls apart quite fast once you leave the kiddy pool.) P.S.: I think it's very ironic that you say that you should be careful to not speak in general terms about things that might depend much more on context, when you clearly somehow were under the belief that all developers must see the same kind of (perceived) productivity gains you have.
View on HN · Topics
I would also take those studies with a grain of salt at this point, or at least take into consideration that a model from even a few months ago might produce significantly different results than the current frontier models. In my personal experience it definitely helps with some tasks, and as someone who doesn't actually enjoy the coding part that much, it also adds some joy to the job. Recently I've also been using it to write design docs, another aspect of the job that I somewhat dreaded.
View on HN · Topics
I like coming up with the system design and the low level pseudo code, but actually translating it to the specific programming language and remembering the exact syntax or whatnot I find pretty uninspiring. Same with design docs more or less, translating my thoughts into proper and professional English adds a layer I don't really enjoy (since I'm not exactly great at it), or stuff like formatting, generating a nice looking diagram, etc. Just today I wrote a pretty decent design doc that took me two hours instead of the usual week+ slog/procrastination, and it was actually fairly enjoyable.
View on HN · Topics
> The industry had reason to be optimistic that 2025 would prove pivotal. In previous years, AI agents like Claude Code and OpenAI’s Codex had become impressively adept at tackling multi-step computer programming problems. Both of these agents launched mid-2025.
View on HN · Topics
don't forget Aider from 2023
View on HN · Topics
Still working hard and now we also have Aider-ce.
View on HN · Topics
The parent comment specifically referenced Claude Code, which launched in Feb 2025 [1] and went GA May 2025 [2]. Codex also launched May 2025 [3]. [1] https://www.anthropic.com/news/claude-3-7-sonnet [2] https://www.anthropic.com/news/claude-4 [3] https://openai.com/index/introducing-codex/
View on HN · Topics
but not Claude Code. it was released just this summer (I guess?)
View on HN · Topics
a stellar piece, Cal, as always. short and straight to the point. I believe that Codex and the likes took off (in comparison to e.g. "AI" browsers) because the bottleneck there was not reasoning about code, it was about typing and processing walls of text. for a human, the interface of e.g. Google Calendar is ± intuitive. for a LLM, any graphical experience is an absolute hellscape from performance standpoint. CLI tools, which LLMs love to use, output text and only text, not images, not audio, not videos. LLMs excel at text, hence they are confined to what text can do. yes, multimodal is a thing, but you lose a lot of information and/or context window space + speed. LLMs are a flawed technology for general, true agents. 99% of the time, outside code, you need eyes and ears. we have only created a self-writing paper yet.
View on HN · Topics
Codex and the like took off because there existed a "validator" for their work: a collection of pre-existing non-LLM software such as compilers, linters, and code analyzers. The second factor is the very limited and well-defined grammar of programming languages. Under such constraints it was much easier to build a text generator that validates itself using external tools in a loop, until the generated stream makes sense. The other "successfully" disrupted industry is the one where there is no need to validate output, because errors are OK or irrelevant: text not containing much factual data, like fiction, business lingo, or spam. Or pictures, where the exact color of a specific pixel doesn't matter; a rough match will do just fine. But outside of those two options, not many other industries can use an imprecise word or media generator at scale. Circular writing and parsing of business emails with no substance? Sure. Not much else.
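The "validate itself using external tools in a loop" idea can be sketched in a few lines of Python. Here the Python compiler stands in for the external validator, and a fixed list of strings stands in for successive model outputs; a real agent would regenerate with the validator's error message fed back as context.

```python
def syntax_ok(source: str) -> bool:
    # The validator: a non-LLM tool with a hard yes/no answer.
    try:
        compile(source, "<candidate>", "exec")
        return True
    except SyntaxError:
        return False

def first_valid(candidates):
    # Keep consuming regenerated candidates until one passes.
    for attempt, source in enumerate(candidates, start=1):
        if syntax_ok(source):
            return attempt, source
    return None, None

attempt, source = first_valid([
    "def f(:",                 # malformed output, rejected
    "def f() return 1",        # still malformed, rejected
    "def f():\n    return 1",  # passes, loop stops here
])
assert attempt == 3
```

The hard yes/no signal is what makes the loop converge; in domains without such a validator, the loop has nothing reliable to push against.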
View on HN · Topics
Besides the ability to deal with text, I think there are several reasons why coding is an exceptionally good fit for LLMs.

Once LLMs gained access to tools like compilers, they started being able to iterate on code based on fast, precise, and repeatable feedback on what works and what doesn't, be it failed tests or compiler errors. Compare this with tasks like composing a PowerPoint deck, where feedback to the LLM (when there is any) is slower and much less precise, and what's "good" is subjective at best.

Another example is how LLMs got very adept at reading and explaining existing code. That is an impressive and very useful ability, but code is one of the most precise ways we, as humans, can express our intent in instructions that can be followed millions of times in a nearly deterministic way (bugs aside). Our code is written in thoroughly documented languages with a very small vocabulary and much easier grammar than human languages. Compare this to taking notes in a Zoom call in German and trying to make sense of inside jokes, interruptions, and missing context.

But maybe most importantly, a developer must be the friendliest kind of human for an LLM. Breaking down tasks into smaller chunks, carefully managing and curating context to fit in "memory", orchestrating smaller agents with more specialized tasks, creating new protocols for them to talk to each other and to our tools... if it sounds like programming, it's because it is.
View on HN · Topics
LLMs are good at coding (well, kinda, sometimes) because programmers gave their work away for free and created vast training data.
View on HN · Topics
I don’t think “giving away” has much to do with it. I mean we did give away code as training data but we also know that AI companies just took pirated books and media too. So I don’t think gifting has much to do with it. Next all the Copilot users will be “giving away” all their business processes and secrets to Microsoft to clone.
View on HN · Topics
I agree with that. For code, most of it was in a "public space" similar to driving down a street and training the model on trees and signs etc. The property is not yours but looking at it doesn't require ownership.
View on HN · Topics
It was not a well-thought-out piece, and it discounts the agentic progress that has happened.

> The industry had reason to be optimistic that 2025 would prove pivotal. In previous years, AI agents like Claude Code and OpenAI’s Codex had become impressively adept at tackling multi-step computer programming problems.

It is easy to forget that Claude Code CAME OUT in 2025. The models and agents released in 2025 really DID prove how powerful and capable they are. The predictions were not really wrong. I AM using code agents in a literal fire-and-forget way.

Claude Code is a hugely capable agentic interface for solving almost any kind of problem or project you want to solve for personal use. I literally use it as the UX for many problems. It is essentially software that can modify itself on the fly. Most people haven't really grasped the dramatic paradigm shift this creates. I haven't come up with a great analogy for it yet, but the term that I think best captures how it feels to work with Claude Code as a primary interface is "intelligence engine".

I'll use an example. I've created several systems harnessed around Claude Code, but the latest one I built is for stock portfolio management (primarily because it is a fun problem space and something I know a bit about). Essentially, you just use Claude Code to build tools for itself in a domain. Here's how this played out in this example. Claude and I brainstorm a general flow for the process and the roles, then figure out what data each role would need and research which providers have that data at a reasonable price. I purchase the API keys, and Claude wires up tools (in this case, Python scripts and documentation for the agents covering about 140 API endpoints), then builds the agents and creates an initial version of the "skill" that invokes the process, which looks something like this:

Macro Economist/Strategist -> Fact Checker -> Securities Sourcers -> Analysts (about 4 kinds) -> Fact Checker/Consolidator -> Portfolio Manager

Obviously it isn't 100% great on the first pass, and I have to lean on the expertise I have in building LLM applications, but now I have a Claude Code instance that can orchestrate this whole research process and also handle ad-hoc changes on the fly. I have since evolved this system through about 5 significant iterations, but I can do it "in the app": if I don't like how part of it is working, I just have the main agent rewire things on the fly. This is a completely new way of working on problems.
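As a rough sketch of the role pipeline the commenter describes, here is the orchestration shape reduced to stub functions passing a shared state dict along the chain. Every role and every value here is a hypothetical placeholder; the real versions would be LLM-backed agents wired to paid data APIs.

```python
# Each role reads the shared state, adds its contribution, and passes it on.
def macro_strategist(state):
    state["thesis"] = "rates stay high"            # hypothetical output
    return state

def fact_checker(state):
    state["checked"] = bool(state.get("thesis"))   # would verify claims
    return state

def securities_sourcer(state):
    state["tickers"] = ["AAA", "BBB"]              # hypothetical candidates
    return state

def portfolio_manager(state):
    # Trivial equal-weight allocation standing in for real analysis.
    state["portfolio"] = {t: 0.5 for t in state["tickers"]}
    return state

PIPELINE = [macro_strategist, fact_checker, securities_sourcer,
            portfolio_manager]

def run(pipeline, state=None):
    state = state if state is not None else {}
    for role in pipeline:
        state = role(state)
    return state

result = run(PIPELINE)
assert result["checked"] and set(result["portfolio"]) == {"AAA", "BBB"}
```

The appeal of running this under an agent like Claude Code is that the pipeline itself is just code the main agent can rewrite: "rewiring on the fly" means editing this list and these functions.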
View on HN · Topics
Bit too soon to tell, no? Claude Code wasn't released until the latter half of Q2, offering little time for it to show up in those figures, and Q3 data is only preliminary right now. Moreover, it seems to be the pairing with Opus 4.5 that lends some credence to the claims. However, it was released in Q4. We won't have that data for quite a while. And like Claude Code, it came late in the quarter, so realistically we really need to wait on Q1 2026 figures, which hasn't happened yet and won't really start to appear until summertime and beyond. That said, I expect you are right that we won't see it show up. Even if we assume the claim is true in every way for some people, it only works for exceptional visionaries who were previously constrained by typing speed, which is a very, very, very small segment of the developer population. Any gains that small group realize will be an unrecognizable blip amid everything else. The vast majority of developers need all that typing time and more to have someone come up with their next steps. Reducing the typing time for them doesn't make them any more productive. They were never limited by typing speed in the first place.
View on HN · Topics
Can you elaborate on this more? What would be a task you would use claude code for, and what would accomplishing the task look like?
View on HN · Topics
Agents as staff replacements that can tackle tasks you would normally assign to a human employee didn't happen in 2025. Agents as LLMs calling tools in a loop to perform tasks that can be handled by typing commands into a computer absolutely did. Claude Code turns out to be misnamed: it's useful for way more than just writing code, once you figure out how to give it access to tools for other purposes. I think the browser agents (like the horribly named "ChatGPT Agent" - way to burn a key namespace on a tech demo!) have acted as a distraction from this. Clicking links is still pretty hard. Running Bash commands on the other hand is practically a solved problem.
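The "LLMs calling tools in a loop" pattern above can be sketched in a few lines. `scripted_model` is a hypothetical stand-in for a real model API: it either requests a shell command or returns a final answer, reacting to the previous tool output.

```python
import subprocess

def scripted_model(history):
    # Stand-in for an LLM: first ask for a command, then answer
    # based on the tool output that came back.
    if not history:
        return {"tool": "shell", "cmd": "echo 42"}
    return {"answer": history[-1].strip()}

def agent_loop(model, max_steps=5):
    history = []
    for _ in range(max_steps):
        action = model(history)
        if "answer" in action:
            return action["answer"]
        # Run the requested command and feed its output back to the model.
        out = subprocess.run(action["cmd"], shell=True,
                             capture_output=True, text=True).stdout
        history.append(out)
    return None  # gave up after max_steps tool calls

assert agent_loop(scripted_model) == "42"
```

Swapping `scripted_model` for a real model call is essentially what tools like Claude Code do, plus prompting, sandboxing, and permission checks around the `subprocess.run`; running Bash really is the easy half of the problem.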
View on HN · Topics
> Agents as LLMs calling tools in a loop to perform tasks that can be handled by typing commands into a computer absolutely did. I think that this still isn't true for even very mundane tasks like "read CSV file and translate column B in column C" for files with more than ~200 lines. The LLM will simply refuse to do the work and you'll have to stitch the badly formatted answer excerpts together yourself.
View on HN · Topics
Try it. It will work fine, because the coding agent will write a little Python script (or sed or similar) and run that against the file - it won't attempt to rewrite the file by reading it and then outputting the transformed version via the LLM itself.
View on HN · Topics
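The script an agent would typically write for that CSV task looks roughly like the sketch below. The `translate` function is a placeholder (a real agent would call a translation service or translate each cell in-model), so treat the whole thing as an illustrative assumption rather than actual agent output:

```python
import csv
import io

def translate(text):
    # Placeholder: a real script would call a translation API here.
    return text.upper()

def fill_column_c(src_csv):
    """Read CSV text, write the translation of column B into a new
    column C, and return the result. Because the transformation runs
    as code, it scales to any number of rows -- which is why the agent
    writes a script instead of rewriting the file token-by-token."""
    reader = csv.reader(io.StringIO(src_csv))
    out = io.StringIO()
    writer = csv.writer(out)
    for i, row in enumerate(reader):
        if i == 0:
            row = row[:2] + ["translated"]  # header row
        else:
            row = row[:2] + [translate(row[1])]
        writer.writerow(row)
    return out.getvalue()

sample = "id,greeting\n1,hallo\n2,bonjour\n"
print(fill_column_c(sample))
```

The ~200-line ceiling the earlier comment hits comes from asking the LLM to emit the transformed data itself; routing through a script sidesteps it entirely.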
I really don’t agree with the author here. Perplexity has, for me, largely replaced Cal Newport’s job (read other journalists work and synthesize celebrity and pundit takes on topic X). I think the take that Claude isn’t literally a human so agents failed is silly and a sign of motivated reasoning. Business processes are going to lag the cutting edge by years in any conditions and by generations if there is no market pressure. But Codex isn’t capable of doing a substantial portion of what I would have had to pay a freelancer/consultant to do? Any LLM can’t replace a writer for a content mill? Nonsense. Newport needs to open his eyes and think harder about how a journalist can deliver value in the emerging market.
View on HN · Topics
Have y’all tried Claude Code with Opus 4.5? I believe it has fully joined the workforce. I had my grandma build and deploy her own blog with a built-in CMS, an admin portal, and a post editor, integrate uploads with GitHub, and add CI/CD. It took about 2 hours, mostly because she types slowly.
View on HN · Topics
I use an agent in all my day to day coding now. It's a lot of small tasks that speed me up, but it's definitely in use.
View on HN · Topics
Claude Code became a critical part of my workflow in March 2025. It is now the primary tool.
View on HN · Topics
> Spreadsheet AI If you don't mind, could you please write a few examples of what LLMs do in spreadsheets? Because that's probably the last place where I would allow LLMs, since they tend to generate random data, and spreadsheets are notoriously hard to debug due to all the hidden formulas and complex dependencies. Say you have an accounting workbook with 50 or so sheets with tables depending on each other, and they contain very important info like inventory and finances. Just a typical small to medium business setup (big corporations also do it). Now what? Do you allow LLMs to edit files like that directly? Do you verify changes afterwards, and how?
View on HN · Topics
Do LLMs generate "random data"? If you give them source data, there is virtually no room for hallucination in my experience. Spreadsheets are no different than coding. You can put tests in place to verify results.
View on HN · Topics
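The "tests in place" point above translates directly from code to spreadsheets: recompute a derived column independently from the source data and flag any row where the sheet's value drifts, whether from a bad formula or a bad LLM edit. A minimal sketch, using plain Python tuples in place of a real workbook (reading actual `.xlsx` files would need a library such as openpyxl, which is assumed, not shown):

```python
def check_totals(rows):
    """Each row is (quantity, unit_price, reported_total).
    Recompute the total from source columns and return every row
    whose reported value doesn't match, as (row_number, got, want)."""
    bad = []
    # Spreadsheet rows conventionally start at 2 (row 1 is the header).
    for i, (qty, price, reported) in enumerate(rows, start=2):
        expected = round(qty * price, 2)
        if abs(expected - reported) > 0.005:  # tolerate float rounding
            bad.append((i, reported, expected))
    return bad

sheet = [
    (3, 9.99, 29.97),
    (2, 4.50, 9.00),
    (5, 1.25, 6.00),   # wrong: should be 6.25
]
print(check_totals(sheet))  # → [(4, 6.0, 6.25)]
```

Run after every LLM edit, a check like this catches silent corruption in exactly the hidden-formula scenarios the parent comment worries about.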
In December of 2025, I took five tickets I was assigned in Jira and threw them at Codex, which just did them. With the help of MCPs, Codex was able to read each ticket, generate some code, test the code, update GitLab, create a merge request on GitLab, and update the Jira ticket with the MR. CodeRabbit then reviewed the MR before a human had to look at it. It didn't happen in 2025, but I see it happening for 2026.
View on HN · Topics
> He’s merely conflating an adoption curve with capabilities. Sure, programmers would still adopt LLMs faster than the rest of the work-force whether or not the LLMs were good at writing code. But you have to at credit at least some of that adoption rate to the fact that LLMs are significantly better at text (e.g. code) generation than they are at most other white-collar tasks (e.g. using a web browser)
View on HN · Topics
A brief history of programming:

1. Punch cards -> Assembly languages
2. Assembly languages -> Compiled languages
3. Compiled languages -> Interpreted languages
4. Interpreted languages -> Agentic LLM prompting

I've tried the latest and greatest agentic CLIs and toolings with the public SOTA models. I think this is a productivity jump equivalent to maybe punch cards -> compiled languages, and that's it. Something like a 40% increase, but nowhere close to exponential.
View on HN · Topics
I'm a staff level SWE at a company that you've all heard of (not a flex, just providing context). If my manager said to me tomorrow: "I have to either get rid of one of your coworkers or your use of AI tools, which is it?" I would, without any hesitation, ask that he fire one of my coworkers. Gemini / Claude is way more useful to me than any particular coworker. And now I'm preparing for my post-software career because that coworker is going to be me in a few years. Obviously I hope that I'm wrong, but I don't think I am.
View on HN · Topics
No that’s not true at all. Humans can deal with ambiguity and operate independently. Claude can’t do that. You’re trading one “problem” for an entirely different one in this hypothetical.
View on HN · Topics
> There’s often the question of communication overhead between people; Claude would remove that. ... and replace that with communication overhead with claude ?
View on HN · Topics
"I have to either get rid of one of your coworkers or your laptop, which is it?"
View on HN · Topics
I'm there with you. At the govt contracting company I work for, we lost a contract we'd had for ten years. Our team was 10 to 15 employees, and we lost the contract to a company that is now doing the work with 5 employees and AI. My company said we are now going to be bidding with smaller teams and promoting our use of AI. One example of them promoting the company's use of AI was creating a prototype using ChatGPT and Antigravity. Someone took a demo video of a govt agency app off of YouTube and fed the video to ChatGPT, which spit out all the requirements for the ten-page application; he then fed those requirements to Antigravity and boom, it replicated the working app/prototype in 15 minutes. Previously it would have taken a team of 3 to 5 a week or a few to complete such a prototype.
View on HN · Topics
Here's the thing: my manager won't need to do that. Windsurf SWE-1 is good enough for my use case, and SWE-1.5 is even better. Combined with free quotas across OpenAI, Gemini, and Claude, I don't really need to pay anything. In fact, I don't want to pay too much, to hedge against the incoming enshittification.