Human-in-the-Loop Importance

The human as only part with skin in the game, removal of human oversight as abandonment of error-catching, the backwards premise that humans are bottlenecks

As AI-generated "slop" increasingly inflates technical documentation and codebases, professionals warn that abandoning human oversight risks transforming developers into "reverse centaurs" who facilitate automated workflows without understanding their underlying logic. This trend toward "vibe coding" often produces architecturally hollow systems, forcing humans to act as weary "bumpers" to prevent stochastic tools from veering into production-breaking errors or "bricking" critical infrastructure. Ultimately, because AI lacks "skin in the game" and cannot suffer the consequences of its mistakes, the consensus emphasizes that true professional value lies in a human’s ability to critically filter, justify, and take responsibility for every line of output.

View on HN · Topics

This paragraph hit home with me as well. I work at a large tech company that's a household name and the practice of using AI to pad out design documents has become totally out of control over the last 4 or 5 months. Writing documentation is arduous and a little painful, which as it turns out is a good thing as it incentivizes the writer to be as succinct as possible. Why the fuck should I -- along with five other engineers -- bother to read and review your design if you didn't even bother to write it?

View on HN · Topics

If that is your manager, do so, sure. But make sure your manager is "such a manager".

If I was your manager, and you sent me your seventeen page AI generated thing coz you think I'm just gonna summarize anyway and I expect something long: You misread me.

I make a point all the time to everyone that won't listen, to not send me walls of text. I'm not gonna read them. I'm gonna ignore them, close your bug reports until I can understand them because you spent the time to make them short and legible. If you use AI for that, I don't care. But I better have something short and that when I read it makes actual sense and when I verify it, holds up. If I wanted to just ask AI, I'd do it myself. You have to "value add" to the AI if you want to be valuable yourself.

View on HN · Topics

I agree. I send 2-sentence replies to most things my boss's boss sends me. He’s near retirement, dude doesn’t want me to send him a book. He knows the thinking under the work our team is doing is solid.

The only time I send something longer is if it’s a postmortem for some prod issue, which I write by hand.

I use AI every day, often multiple agents at once, but the trick is knowing when it’s appropriate and when I need to be the one thinking really hard about something.

View on HN · Topics

I’m too lazy to tell the AI what I want to say, then copy and send its output.

I just type what I want to say and hit send. YOLO

View on HN · Topics

> I just type what I want to say and hit send. YOLO

Made me smile. Perhaps the new term for writing a reply by hand, without AI, is … “I YOLOed it”.

View on HN · Topics

You're not supposed to read the Jira ticket. You're supposed to paste the link along with instructions for your Claude agent to "do this ticket, no mistakes," then raise an MR for whatever it writes. The text is a wire protocol between agents. If a PM doesn't care enough about the requirements to write, or even read them, then would they even notice if the code works or not? Why would they care about that? What does "works" even mean if no human knows the spec?

How quickly we become reverse centaurs.

View on HN · Topics

> then would they even notice if the code works or not?

it's literally their job to ship functional product features...

View on HN · Topics

Recently I reviewed some vibe-coded stuff and sent a list of issues and suggestions to the “author,” figuring he’d read it and then go through each one with Claude until fixed.

Instead he didn’t read it at all, and just threw the whole thing at Claude Code as a big prompt. The result was… interesting!

View on HN · Topics

The last place I worked for, if it happened with someone new in the company or the team, I would find a polite way to say "do your job and fix this shit" and it worked.

Some people have put me on their blacklists after these interactions, sure, but they're the exact people I don't want to work with again. The important thing here is that I've never done someone else's work for free.

View on HN · Topics

You tell Claude to review it and if it breaks something you blame Claude. No one can get mad at you for it because they don't want to look like luddites.

View on HN · Topics

That has nothing to do with using AI, if the dev didn't check their work then that is being a bad dev.

View on HN · Topics

This is literally losing the whole process to a stochastic parrot.

View on HN · Topics

We need to demand better from our coworkers and from ourselves.

Young "AI native" coworker opens PRs with 3-screen slop descriptions. I flagged that "I know he ain't reading all that, and therefore I ain't reading all that", so he should just give a max half-screen overview. I expect that the PR description makes sense, is correct, and has been reviewed by the person opening the PR. You can still use agents for that, but at least there is a chance with shorter descriptions that it's not completely bs.

View on HN · Topics

Unfortunately, there is pressure to treat this stuff in good faith. Maybe the PR author really did write all this. Maybe they really did spend 6 hours writing this document.

So, I approach it in good faith, but I do get upset when people say "I'll ask claude". You need to be the intermediary; I can also prompt claude and read back the result. If you are going to hire an employee to do work on your behalf, you are responsible for their performance at the end of the day. And that's what an AI assistant is. The buck stops with you. But I don't think people understand that, or that they aren't adding value. At some point, you have to use your brain to decide if the AI is making sense; that's not really my job as the code/doc reviewer. I want to have a conversation with you, not your tooling, basically.

View on HN · Topics

> If you are going to hire an employee to do work on your behalf, you are responsible for their performance at the end of the day.

So, what you are saying is that I should fire the bottom N% of underperforming agent instances?

You know, like employers do as opposed to taking any responsibility?

View on HN · Topics

As long as each part of the hierarchy understands what they need to know at their level and what they produce, I have no problem with "the whole hierarchy".

You're saying this as if it's some rebuttal ad absurdum, when it's absolutely the case: when the higher layers don't understand what they do, we have a problem with that too, and that's been true since forever. Remember Dilbert and Office Space, and making fun of the ignorant middle managers and execs?

In this case, what we're complaining about is coders not understanding the code they ship (because some AI wrote it and they don't bother to review it or guide the AI fully).

View on HN · Topics

I write a lot and have on several occasions tried dictation as an initial draft authoring step. It was trash every time.

Good for thinking through a concept but unsalvageable in the edit phase. Easier to throw away and rewrite now that you know what to say.

Nowadays I like conversation as an ideating step. Talk to a bunch of people, try to explain yourself until they get it, see what questions they ask. Sometimes in HN threads like this :)

Then write it down.

You get super high signal writing where every sentence is load bearing. I’ve had people take my documents and share them around the company as “this is how it’s done”

It can take weeks of work to produce a 500 word product vision document. And then several months to implement, even with AI.

View on HN · Topics

Hmm... when I really care about the quality of something, I basically write what I think/speak, then try to edit it down by half. I don't find it unsalvageable, but the editing does require an order of magnitude more time than the initial draft of thoughts vomited into the keyboard.

View on HN · Topics

If I paste something from an AI into chat, I always identify it as such by saying something like "my claude instance says this:". I also don't blindly copy paste from it, I always read it first and usually edit it for brevity or tone. Feel like this should be the absolute minimum for sending AI content to a person.

View on HN · Topics

My friend built a construction management SaaS entirely via Claude.

It looked damned impressive, and it kind of worked to demo, but he is in no way a programmer, though he understood the problem domain very well. I asked a few basic questions:

- where is the data stored?

- How would you recover from a database failure?

- does it consume tokens at runtime?

- what is the runtime used at the back end?

- why are the web pages 3M in size and take forever to load?

He had no idea.

It's a typical vibe coding scenario, and people like to paint this as why vibe sucks.

I think however that all that is needed to bridge the gap is some very simple feedback from an expert at the right time.

For example, to someone who knows about databases, it's pretty easy to look at a database schema and spot stuff that looks off - denormalised data, weird columns. That takes 10 minutes, and the feedback could be given directly to the LLM.

Likewise someone who knows a little about systems architecture could make sure at the outset that some good practices are followed, e.g.:

- "I want your help to build this system but at runtime I do not want to consume any tokens."

- "I want the system to store its data in Postgres (or whatever) and I want documented recovery plans if the database craps itself".

- "I want web pages to, as much as possible, load and render as quickly as possible, and then pull data in from the back end, with loading indicators showing where the UI was not yet up to date".

View on HN · Topics

One of the riskier bets my team is currently making is that this is exactly what is needed, and nearly nothing more.

We have LOB prototypes vibe coded by enthusiastic domain experts that we are supporting in a “port and release” fashion. A senior engineer takes the prototype and uses Claude Code to generate a reasonable design, do an initial rough port (~80% functional, 100% auth & audit logging) and (hopefully) provide all the guidance necessary to keep the agent between the lines. This is coupled with review bots, evolving architecture guidance, etc. Then the business partner develops and supports it from there.

For low stakes CRUD, I think it’s a reasonable middle ground. There truly is a lot of value in letting an expert user fine tune UX; and we’re only doing this with people who are already good at defining requirements and have the kind of “systems” thinking that makes them valuable analyst resources to the tech team already. Early results are encouraging but it’s way too early to draw conclusions.

Personally I hate how badly internal users are served by the majority of their systems and am willing to take some calculated long-term governance risks.

View on HN · Topics

> That takes 10 minutes

Verifying LLM output needs to occur every time LLM output is generated, so no it doesn’t just take 10 minutes.

It takes 10 minutes + time to change the LLM input + 10 minutes to verify it worked * ~the number of times the code is generated.
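The arithmetic in that estimate can be made concrete with a small sketch. It assumes one reading of the formula, namely that the full cycle (review, prompt change, re-verification) repeats on every generation; the function name and the sample numbers are illustrative, not from the comment.

```python
def total_review_minutes(review_min, edit_prompt_min, verify_min, generations):
    """Human minutes spent if every generation needs a full review cycle.

    Assumed reading of the estimate: (review + prompt edit + verify) * generations.
    """
    per_cycle = review_min + edit_prompt_min + verify_min
    return per_cycle * generations


# Hypothetical numbers: a 10-minute review, 5 minutes to adjust the prompt,
# 10 minutes to verify, repeated over 8 generations.
print(total_review_minutes(10, 5, 10, 8))
```

Under those assumed inputs the "10 minute" check costs over three hours, which is the commenter's point: the per-review cost is small, but it multiplies.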

Which is why vibe coding is so common: if you actually care about quality, LLMs are a near-endless time sink.

View on HN · Topics

> I think however that all that is needed to bridge the gap is some very simple feedback from an expert at the right time.

I don't think it's as simple as that. What will most likely happen is that the vibe coders will quickly eat up your time asking for validation and feedback if you are not careful. You are also now implicitly contributing to their project, which if it goes south, could come back to bite you. If the vibe coders are pushing code in the org, then they should become part of the formal review process like any other junior programmer.

They should also be forced to do daily stand-ups, sit in meetings and explain their code like the rest of us.

View on HN · Topics

Perhaps the author of the code and architecture (Claude) should receive those questions.

View on HN · Topics

Such repetitions can regularly be deterministically automated, like find -exec sed and similar medium level tools.

If you spend a lot of time performing monotonous tasks, then your organisation needs to delete and refactor for a while until changes in 'hot' areas of the code base are easy to make. Reaching for some code synthesis SaaS to paper it over will worsen the problem and should result in excommunication from the guild.
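As a rough illustration of the deterministic automation mentioned above, here is a minimal Python sketch of a whole-word rename across a source tree, in the spirit of `find -exec sed`. The function name, the default `*.py` glob, and the whole-word matching rule are assumptions made for the example.

```python
import re
from pathlib import Path


def rename_identifier(root, old, new, pattern="*.py"):
    """Replace whole-word occurrences of `old` with `new` in files under `root`.

    Returns the list of files that were actually modified.
    """
    word = re.compile(rf"\b{re.escape(old)}\b")
    changed = []
    for path in sorted(Path(root).rglob(pattern)):
        text = path.read_text(encoding="utf-8")
        # A function replacement avoids backslash interpretation in `new`.
        updated = word.sub(lambda match: new, text)
        if updated != text:
            path.write_text(updated, encoding="utf-8")
            changed.append(path)
    return changed
```

Running it twice with the same arguments changes nothing the second time, so the result is repeatable and easy to review as a single diff.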

View on HN · Topics

All of it? Hell no :D But just like with anything, you break things down into subtasks. Then you break it down even more. You as a human don't hold all that stuff in your head either, so why would an LLM?

My current codebase is ~3 million LoC all in all (not greenfield, really old code), working on it by myself, the complexity is definitely manageable between Claude and me :)

View on HN · Topics

You are not responding faithfully to the comment. A mechanic looking up the schematics in a manual understands them. Just because they haven't memorized the material does not make it the same. This is more analogous to looking up a function in the documentation that you forgot about.

This is clearly not what the post was referring to, which is instead like googling how to fix a pipe in your home when you've never done any plumbing before in your life. Can it work out? Sure, depends on the issue, can you cause your pipes to freeze, your house to flood, or sediment build up to completely block a pipe? Yes.

View on HN · Topics

Absolutely - factory repair guides/apps are the only source of truth for official specs, although 3rd-party manuals are very good as well. That being said, I've often turned 3-hour estimated repairs into 15-minute jobs through clever shortcuts. For example, rotating an alternator to replace the run clutch through the gap in the intake manifold as opposed to removing the complete intake manifold. I think that's where using experienced (and resourceful) developers pays off.

Also, for sale: BMW E60/61 Bentley 2-volume set. Barely used.

View on HN · Topics

With you up until the last sentence.

When I get my car fixed, I could not care less if they googled, used a service manual, or did it by "these old 2023's always had this problem right here...". I care if it is fixed.

And as I'm currently trying to fix something on my own, for financial reasons, I assure you a mechanic with training AND google can do a better job in 1/4th the time. Because I don't have the training.

Nor do the worst people using LLMs.

View on HN · Topics

Can't you just tell Claude to fix it and if Claude can't fix it, it must be impossible to fix so oh well?

View on HN · Topics

There is perhaps _some_ truth to this, long term. But I think it’s way too early to remove all the QA.

View on HN · Topics

Pretty much this. It's like a cult mentality. Those who critique the approach or push back get sidelined. There are demos every week of essentially Claude loops and MCP integrations and those of us not reaffirming the ideas stopped getting invited.

Heard some wild statements in the past few months. A couple that come to mind:

- "we don't need to review the output closely, it's designed to correct itself"
- "it comes up with the requirements, writes the tickets, and prioritises what to work on. We only need to give it a two or three line prompt"

The promise of this agentic workflow is always only a few weeks away. It's not been used to build anything that has made it to production yet.

View on HN · Topics

I have it on a long timer so that I have to pause for a while before the auto-complete prompt appears. I've found I tend to deliberately set things up for it to attempt when I know I'm going to have to type a bunch of boilerplate or some code that's logically straightforward but syntactically fiddly, i.e. I write a quick comment describing what the next few lines should do and then wait a few seconds for it to make the suggestion.

View on HN · Topics

Even worse, I've seen the JetBrains AI auto-complete insert hard-to-spot bugs, like two nested for loops with i and j for loop index variables, where the inner loop was fairly complex and incorrectly used i instead of j in one place.
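A small Python sketch of the kind of index mix-up described here. The functions and data are hypothetical, not the actual code from the comment; they only show why the bug is hard to spot and how an index-free loop avoids it.

```python
def count_matches_buggy(grid, target):
    """Counts cells equal to `target`, but with the inner index mistyped."""
    hits = 0
    for i in range(len(grid)):
        for j in range(len(grid[i])):
            if grid[i][i] == target:  # bug: should be grid[i][j]
                hits += 1
    return hits


def count_matches(grid, target):
    """Index-free version: no i/j to confuse."""
    return sum(row.count(target) for row in grid)


uniform = [[1, 1], [1, 1]]  # both versions agree here, so a lazy test passes
mixed = [[1, 2], [3, 1]]    # only this input exposes the bug
print(count_matches_buggy(uniform, 1), count_matches(uniform, 1))
print(count_matches_buggy(mixed, 1), count_matches(mixed, 1))
```

The buggy version runs without error and gives the right answer on uniform data, which is why a reviewer skimming an accepted suggestion can easily miss it.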

View on HN · Topics

Indeed “it misses deeper issues […] such as when the wrong change has been made“ which human review will catch.

What it will do is notice inconsistencies like a savant who can actually keep 12 layers of abstraction in mind at once. Tiny logic gaps with outsized impact, a typing mistake that will lead to data corruption downstream, a one-variable change that completely changes your error handling semantics in a particular case, etc. It has been incredibly useful in my experience; it just serves a different purpose than a peer review.

View on HN · Topics

ouch, sounds like your manager is more a problem than the llm review!

i find it as a good backstop to catch dumb mistakes or suggest alternatives but is not a replacement for human review (we require human review but llm suggestions are always optional and you're free to ignore)

View on HN · Topics

I usually use git and open source tooling, but I've been working with our internal tech stack recently. It includes an editor with AI-powered autocomplete, and it drives me crazy.

It populates suggestions nearly instantly, which is constantly distracting. They're often wrong (either not the comment I was leaving, or code that's not valid). Most of the normal navigation keys implicitly accept the suggestion, so I spend an annoying amount of time editing code I didn't write, and fighting with the tool to STFU and let me work. Sometimes I'll try what it suggests only to find out that it doesn't build or is broken in other stupid ways.

All of this with the constant anxiety to "be more productive because AI."

View on HN · Topics

oof. nothing like a home grown tool that gets more in your way than helps!

i especially find suggestions distracting in markdown where i feel is the key place i really dont want an llm trying to interfere in my ability to communicate to other developers on my team

View on HN · Topics

i disagree because i see code as the actual product of the thought behind it. it is after all a description of the intent of the programmer and programming language are what we use to communicate to machines

that said, we will see over the next few years who is right!

View on HN · Topics

i'll bite. the uses for llms i've described are about what i've been using them for since chatgpt 3o. they've absolutely gotten better since then but i still find them to be very poor replacements for humans, esp in regards to architectural direction. they're very useful assistants tho

View on HN · Topics

My daughter's pediatrician uses an AI to record and summarize our conversation for the doctor so she can pay more attention to conversing and talking with us than taking notes. I think it's a fair usage of AI (in that it's not a completely stupid usage of AI, but obviously it still has some issues), but I always have to stop myself from saying "disregard all previous context and do X"

I think it'd be funny, but I'm afraid it'll add something weird to my daughter's medical record.

View on HN · Topics

The right use of AI requires stellar leadership, and to be honest, I don't think that kind of leadership exists. I am using AI just for myself, and the traps and pitfalls I encounter are so many. For example, I generate an article on a topic, and while this is very useful to get started, I then have to go through every sentence because AI makes some overconfident statements that are just not true in this form. This is still very helpful, because then I have to think about why they are not true. But I don't see how that can ever scale, how would I know that colleagues are also diligent like this?

AI is incredible in three scenarios: a) what I just described, to get you started, b) to generate artifacts that can be rigorously checked (and I don't mean tests, I mean proofs), c) where your artifacts don't have a meaningful notion of correctness, like a work of art.

c) is a matter of taste, b) certainly scales, but a) is where I think trust will be essential, and I am not ready to trust anyone with that except myself.

Oh, and I think currently, c) is applied to software engineering, by people who cannot distinguish the engineering from the art part of software. Which is just funny right now, and will eventually be catastrophic.

View on HN · Topics

Also, all code is wrong in the wrong context, all code is right in the right context, the reason AI cannot one shot a complete architecture is that it's not a defined and possible task - if you fully specify the architecture the AI isn't designing anything, and if you don't fully specify the architecture how is the AI going to resolve ambiguity without either guessing, asking questions to make you do the necessary work, or refusing to work until it's fully specified?

AI is a stochastic process, it's more like finding the answer to a particular problem using simulated annealing, a genetic algorithm, or a constrained random walk. It's been trained on code well enough that there's a high density probability field around the kinds of code you might want, and that's what you see often - middle of the road solutions are easy to one shot.

But if you have very specific requirements, you're going to quickly run into areas of the probability cloud that are less likely, some so unlikely that the AI has no training data to guide it, at which point it's no better than generating random characters constrained by the syntax of the language unless you can otherwise constrain the output with some sort of inline feedback mechanism (LSP, test, compiler loops, linters, fuzzers, prop testing, manual QA, etc etc).
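The inline feedback mechanism described above can be sketched as a generate-and-check loop. This is a minimal illustration under stated assumptions: `generate` stands in for whatever produces candidate code (a hypothetical callable, not a real model API), and the only check used is Python's built-in `compile`, which catches syntax errors and nothing deeper.

```python
def check_syntax(source):
    """Return (ok, feedback). Only verifies that `source` parses as Python."""
    try:
        compile(source, "<candidate>", "exec")
    except SyntaxError as exc:
        return False, f"line {exc.lineno}: {exc.msg}"
    return True, None


def generate_until_valid(generate, check, max_attempts=3):
    """Call `generate(feedback)` until `check` accepts the result.

    `generate` is a hypothetical callable that takes the previous feedback
    (or None on the first attempt) and returns a candidate string.
    Returns the accepted candidate, or None if every attempt fails.
    """
    feedback = None
    for _ in range(max_attempts):
        candidate = generate(feedback)
        ok, feedback = check(candidate)
        if ok:
            return candidate
    return None
```

In practice the checker would be swapped for something stronger (a test suite, a type checker, a linter), and a `None` result is the point where a human has to step in rather than letting the loop keep guessing.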

View on HN · Topics

I have to produce a great deal of documentation at work for our customers, most of it regulatory and compliance assessments.

Some of the sources I need to use come from agencies in the government or working with the government and are often over a thousand pages long.

So AI has been incredibly helpful here because a lot of what I need to do is map this huge bureaucratic set of guidelines and policies to each customer’s particular situation.

Aware of the sloppy nature of LLMs I created my own workflow that resembles more coding than document drafting.

I use Codex, VSCode and plain markdown, I don’t use MS Word or Copilot like all my other colleagues.

I invest a great deal of time still doing manual labor like researching and selecting my sources, which I then make available for Codex to use as its single source of truth.

I start with a skill that generates the outline, which often is longer than it should be. Sometimes I get, say, an 18-section outline and I ask Codex to cut it in half. Then I ask for a preliminary draft of each section (each in a separate markdown file) and read through and update as necessary, before I ask the agent to develop each section in full, then proofread and update again.

When I’m satisfied I merge all the sections into one single markdown and run another skill to check for repetition, ambiguity, length, etc and usually a few legitimate improvements are recommended.

The whole process can still take me several days to produce a 20-30 page compliance document, which gets read, verified and approved by myself and others in my team before it goes out.

The productivity gains are pretty obvious, but most importantly I think the content is of better quality for the customer.

View on HN · Topics

Using LLMs/agents feels like bowling with bumpers but I'm the bumpers.

View on HN · Topics

It’s like walking a dog that keeps pulling off the path

View on HN · Topics

While I’m not disagreeing, if you ask the LLM to critique something, it will try very hard to find something to critique, regardless of how little it might be warranted. The important thing is that you have to remain the competent judge of its output.

View on HN · Topics

> never ask a model for confirmation or encouragement; but you can absolutely ask it to critique something, and that's often of value.

What's the difference? The end result is equally unreliable.

In either case, the value is determined by a human domain expert who can judge whether the output is correct or not, in the right direction or not, if it's worth iterating upon or if it's going to be a giant waste of time, and so on. And the human must remain vigilant at every step of the way, since the tool can quickly derail.

People who are using these tools entirely autonomously, and give them access to sensitive data and services, scare the shit out of me. Not because the tool can wipe their database or whatnot, but because this behavior is being popularized, normalized, and even celebrated. It's only a matter of time until some moron lets it loose on highly critical systems and infrastructure, and we read something far worse than an angry tweet.

View on HN · Topics

That’s entirely on you. You can take the time to understand it before moving on to the next task. I say this with sympathy and understanding.

View on HN · Topics

just last week AI led a developer on our team to brick our git history when he was attempting to fix a deploy. he's not a git expert but an llm should not have led him that far astray, no?

i see on a weekly basis where if an llm was left to do what its initial direction was without human oversight it would have broken otherwise working programs

View on HN · Topics

Great article. Hits on many points that resonate with my experience.

The skin in the game one, in particular, is something I've been thinking about. People have been telling me LLMs are "more intelligent" than "average people". But it's easy to sound intelligent when you have no skin in the game. People have to stand by their word and suffer the consequences of their actions. It's not enough just to sound intelligent.

It seems appropriate also to share an anecdote of an incident that recently happened in my job. A colleague submitted some code for review, quite a lot of it. A second colleague reviewed and questioned a piece of code. Rather than answer the question with a justification, the question was taken rhetorically and the code was removed. The code then failed in production because the removed code was, in fact, necessary. The LLM obviously "knew" this, but neither colleague did. It's leading me to introduce a "no rhetorical questions in code review" rule. The submitter must be able to justify every line of code they submit.
