Code Review Challenges

AI-generated PRs being too large to review, colleagues not reading their own code, rubber-stamping becoming the norm, and quality assurance breaking down

The rise of AI-generated pull requests has sparked a "slop" cycle where overengineered code and bloated documentation discourage human oversight, leading to massive, unreadable changesets that degrade institutional knowledge. This trend is eroding professional accountability as "vibe coders" offload the burden of comprehension onto reviewers, who frequently resort to rubber-stamping or using their own LLMs to skim the noise. Consequently, the traditional social contract of peer mentorship is breaking down, replaced by a "write-only" culture where neither the author nor the reviewer truly understands the logic being shipped. Driven by misaligned management incentives and AI providers that benefit from complexity, this shift threatens to turn software development into a hazard-prone process where "sounding intelligent" replaces actual technical justification.

View on HN · Topics

> Importantly, I think AI companies are motivated towards the overengineered solutions as they increase the buyer's token spend.

Yes that, and also, the more complicated the solution, the more likely no one reads or reviews it too carefully, and will instead depend on an LLM to ‘read’ and ‘review it’

Even ignoring token costs, there’s a high incentive for LLMs to generate complex solutions, because those solutions generate demand for further LLM use. (You don’t really want to review that 30,000-line pull request by hand, do you?)

View on HN · Topics

I'm sorry but "extensive documentation, scalable, high test coverage, perfect code style" seems to me to be the opposite of throwaway.

It sounds like the kind of thing people will think surely must be very important and in use, because why go through all those hoops instead of doing a quick hack?

But I guess we can just throw AI at the maintenance burden anyways..

View on HN · Topics

This paragraph hit home with me as well. I work at a large tech company that's a household name and the practice of using AI to pad out design documents has become totally out of control over the last 4 or 5 months. Writing documentation is arduous and a little painful, which as it turns out is a good thing as it incentivizes the writer to be as succinct as possible. Why the fuck should I -- along with five other engineers -- bother to read and review your design if you didn't even bother to write it?

View on HN · Topics

I see it even on my GitHub project: issue and pull request comments get longer, responses get longer, all generated by AI and read by AI. This text is no longer for human consumption, but to provide context to AI.

View on HN · Topics

I like them. It tells very clearly how much effort went into someone's work.

I like them even more on code comments. It tells _precisely_ how much effort went into the pull request, so I don't spend time reviewing lazy work.

View on HN · Topics

I think you’re missing the sarcasm in their comment.

They’re saying that the emoji usage is telling them that very little effort was put into the PR and that they’ll treat it accordingly.

View on HN · Topics

So you just rubber-stamp the lazy work? What else can you do when this PR is assigned to you specifically for reviewing?

View on HN · Topics

Recently I reviewed some vibe-coded stuff and sent a list of issues and suggestions to the “author,” figuring he’d read it and then go through each one with Claude until fixed.

Instead he didn’t read it at all, and just threw the whole thing at Claude Code as a big prompt. The result was… interesting!

View on HN · Topics

This is happening with coworkers now. It’s honestly insulting.

They put up a PR with all the obvious tells, the markdown table of files that changed, the description that basically parrots back things the human obviously wanted them to stress in the task (“this implements a secure, tested (no regressions) implementation of a Foo…”), and the code is an absolute mess of one-off functions placed in any random file with no thought to the way the codebase is actually organized.

Then I give feedback after spending like an hour going through their 2000-line change, and then here comes back an update with a very literal interpretation of my feedback that clearly doesn’t really understand what I was even saying. Complete with code comments that parrot back what I said (“// Use the expected platform abstractions for conversion (not bespoke methods)”).

Reviewing coworkers’ PRs feels like I’m just talking to the LLM directly at this point, but with more steps and less control over the output.

View on HN · Topics

I guess they just close the PR.

View on HN · Topics

I wonder if we humans are already checking out of reviews of PRs that were human effort we've misjudged as AI. We are in so much trouble! lol

View on HN · Topics

Because the reviewer ends up doing the real work of actually checking that it works.

The laziness is offloading work down the line.

View on HN · Topics

> The "elongation" of workplace artifacts resonated with me on such deep level

Well put. I generally skip AI-generated PR descriptions for this reason as they tend to miss the forest for the trees. Sometimes a large change can be explained by a short yet information-rich description ("migrate to use X instead of Y", "Implement F using pattern P") that only a human could and should write.

View on HN · Topics

We need to demand better from our coworkers and from ourselves.

A young "AI native" coworker opens PRs with three-screen slop descriptions. I flagged that "I know he ain't reading all that, and therefore I ain't reading all that", so he should just give an overview of half a screen at most. I expect that the PR description makes sense, is correct, and has been reviewed by the person opening the PR. You can still use agents for that, but at least with shorter descriptions there is a chance that it's not completely bs.

View on HN · Topics

As long as each part of the hierarchy understands what they need to know at their level and what they produce, I have no problem with "the whole hierarchy".

You're saying this as if it's some rebuttal ad absurdum, when it's absolutely the case: when the higher layers don't understand what they do, we have a problem with that too, and that's been true since forever. Remember Dilbert and Office Space, and making fun of the ignorant middle managers and execs?

In this case, what we're complaining about is coders not understanding the code they ship (because some AI wrote it and they don't bother to review it or guide the AI fully).

View on HN · Topics

They likely haven’t read it either, so they’ll never know you didn’t as well.

View on HN · Topics

This had me crack up!

I used to have a colleague (senior engineer) who never cared to write a single line in Pull Request descriptions, as if other people had to magically know what he meant to achieve with such changes.

Now? His PRs have a full page description with "bulleted summaries of bulleted summaries"!

View on HN · Topics

Whenever I see AI-generated content put forward for my attention, I extract myself from the situation with the minimum possible time expenditure from my side.

It's a sort of leverage: "I spend 5 minutes prompting, so that you can spend 30 minutes reviewing". Not gonna happen, LLM buddies.

View on HN · Topics

> I think however that all that is needed to bridge the gap is some very simple feedback from an expert at the right time.

I don't think it's as simple as that. What will most likely happen is that the vibe coders will quickly eat up your time asking for validation and feedback if you are not careful. You are also now implicitly contributing to their project, which if it goes south, could come back to bite you. If the vibe coders are pushing code in the org, then they should become part of the formal review process like any other junior programmer.

They should also be forced to do daily stand-ups, sit in meetings and explain their code like the rest of us.

View on HN · Topics

If you have a codebase that big, can you even fit enough of it into a context window for the LLM to make correct and meaningful changes across all of it? Admittedly I've only used LLM-based coding for smaller projects.

View on HN · Topics

The forcing of competent engineers to vibe code is something I’ll never understand. Also, I’ve heard that rewriting people’s vibe-coded efforts is a substantial issue; everything that engineers do nowadays seems to be code review.

View on HN · Topics

It would be horrible to rewrite. Not the first commit or whatever. But after a few weeks of people not reading the code, it looks more like a write-only codebase. I refused to go full vibe/agentic coding, so I got to see what was happening. This was only over a short period of time, mind you.

There were a lot of duplicate and triplicate methods. A lot of the classes were is-a related without inheritance, not the biggest deal but it was becoming a mess.

Code I used to know well was more or less gone. It was rewritten with a different approach and had lost the lessons learned. Some of it had real battle wounds baked into it. Things QA passed the week before were broken in places no one thought they had touched. A good deal of the tests were useless or didn't mean anything for production.

Code review is more or less impossible for me. I can read maybe a 1k line change. 20-30k changes all the time? You end up saying "sure buddy lgtm". We had someone put up a 200kloc change for a new feature using a 3rd party tool no one had used before. No clue, but it was not my business apparently, because we needed to work more as individuals now that we were using AI.
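One way to make the "I can read maybe 1k lines" limit explicit is a size gate that runs before a human is asked to review. The following is only a minimal sketch, not anything the commenter describes using: the function names and the 1,000-line budget are assumptions for illustration, and the input is assumed to be the text printed by `git diff --numstat` (tab-separated added count, deleted count, and path, with `-` in place of counts for binary files).

```python
def count_changed_lines(numstat_text: str) -> int:
    """Sum added and deleted lines from `git diff --numstat` style output."""
    total = 0
    for line in numstat_text.splitlines():
        parts = line.split("\t")
        if len(parts) < 3:
            continue  # skip blank or malformed lines
        added, deleted = parts[0], parts[1]
        # Binary files are reported as "-<TAB>-<TAB>path" and carry no line counts.
        if added.isdigit() and deleted.isdigit():
            total += int(added) + int(deleted)
    return total


def is_reviewable(numstat_text: str, max_lines: int = 1000) -> bool:
    """True when the changeset fits the (hypothetical) human review budget."""
    return count_changed_lines(numstat_text) <= max_lines


# Hypothetical sample: two source files and one binary file.
sample = (
    "120\t30\tsrc/etl/pipeline.py\n"
    "-\t-\tdocs/diagram.png\n"
    "900\t0\tsrc/etl/render.py\n"
)
print(count_changed_lines(sample))  # 1050
print(is_reviewable(sample))        # False
```

A gate like this only measures size. It says nothing about whether the author understands the change, which is the deeper complaint in this thread.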

View on HN · Topics

How can you read a 1k line change?

What are you doing where 200kloc is even remotely acceptable? That’s like half a percent of Linux.

View on HN · Topics

How do I do that? It takes a while.

Don't ask me. It wasn't 200k, it was like 170 something. I can't say too much, but it was some big weird ETL pipeline using some weird database. Tons of weird algorithms for displaying data, by storing it all in memory? I don't know man, I wasn't allowed to talk to whoever had swarms of agents create it. From what I understand of it, it was a complete hazard.

Linux kernel has I think tens of millions of lines of code for reference.
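For scale, the "half a percent" figure can be checked with quick arithmetic. This sketch assumes a kernel size of roughly 40 million lines, which is only an order-of-magnitude figure consistent with "tens of millions" and not a number given in the thread; the function name is likewise made up for illustration.

```python
# Assumption: rough order-of-magnitude size of the Linux kernel source tree.
LINUX_KERNEL_LOC = 40_000_000


def share_of_kernel(change_loc: int, kernel_loc: int = LINUX_KERNEL_LOC) -> float:
    """Return the size of a change as a percentage of the assumed kernel size."""
    return 100.0 * change_loc / kernel_loc


for loc in (1_000, 170_000, 200_000):
    print(f"{loc:>7} lines = {share_of_kernel(loc):.4f}% of the assumed kernel size")
```

Under that assumption, a 200k-line change is about 0.5% of the kernel and a 170k-line change about 0.43%, while the 1k-line change the commenter says they can read is about 0.0025%.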

View on HN · Topics

My boss told me enforcing code quality wasn’t important because in 6 months we won’t even read code anymore.

View on HN · Topics

I'm with you on all apart from code review.

Our team has tried a couple tools. Most of the issues highlighted are either very surface level or non-issues. When it reviews code from the less competent team members, it misses deeper issues which human review has caught, such as when the wrong change has been made to solve a problem which could be solved a better way.

Our manager uses it as evidence to affirm his bias that we don't know what we're doing. It got to the point that he was using a code review tool and pasting the emoji-littered output into the PR comments. When we addressed some of the minor issues (extra whitespace, for example) he'd post "code review round 2". Very demoralising, and some members of the team ended up giving up on reviewing altogether and just approving PRs.

I think it's ok to review your own code but I don't think it should be an enforced constraint in a process, because the entire point of code review from the start was to invest time in helping one another improve. When that is outsourced to a machine, it breaks down the social contract within the team.

View on HN · Topics

ouch, sounds like your manager is more a problem than the llm review!

i find it a good backstop to catch dumb mistakes or suggest alternatives, but it is not a replacement for human review (we require human review but llm suggestions are always optional and you're free to ignore them)

View on HN · Topics

Don't give up on the automated code review entirely though, the models and prompts are getting better every day.

View on HN · Topics

That explains the spaghetti ball that is CC

View on HN · Topics

On troubleshooting, either LLMs used to be better, or I'm on a huge bad luck streak. All of the last few times I tried to ask one, I got a perfectly believable and completely wrong answer that wasn't even on the right subject.

On code review, the number of false positives is absolutely overwhelming. And I see no reason for that to improve.

But yes, LLMs can probably help on those lines.

View on HN · Topics

appreciate the compliment!

i don't see llm code review as any kind of code review replacement; more as a backstop to catch things a human might miss (like today an llm caught an unimplemented feature in a POC that would have otherwise been easy for a human to miss)

View on HN · Topics

What would you say the disconnect was? Was it simply that your team's not comfortable with merging AI code?

View on HN · Topics

One important part is not expanded on - incentives. If you really think about it that is the crux of the problem. If I am recognized for creating documents, PRs, features, decks, token use, and NOT for doc/PR/deck reviews or feedback or fixing features, then the outcome is what we see now.

An example of a new feature in the company goes the following way:

- some request is raised by person1

- PR is generated with an "agent" by person2

- PR is reviewed using an "agent" by person3

- feature is merged and shipped

- person1 is happy and records a video of the feature to be shown to the clients

- in the next call with the leadership this feature is declared a success

It all looks good until you look at the implementation. Not only that, there is very little time to intervene. Recently I find myself trying to quickly review PRs before they get merged, just to be on the safe side, as people do not even look at the code.

View on HN · Topics

You already realised that you aren't paid to review code manually. Why waste the time? And maybe even get the wrath of your management by "wasting" time?

View on HN · Topics

One of the best uses of AI I've found is code reviewing stuff I've written either entirely myself, or even code generated in a previous session.

View on HN · Topics

And the added horror of PRs that keep on coming. Correct-looking code with no thought behind it.

View on HN · Topics

I disagree on both fronts. Unguided AI can be a very efficient tech debt generator.

View on HN · Topics

Who cares? I obviously didn't like the article.

> Schemes were all wrong

Why'd you let him run wild for two months? What software org would let anyone, even a principal, do that? Wouldn't the very first thing you'd do be to review the guy's schema? This reads like all the other snarky posts on HN about how everyone is punching above their pay grade and people who are much more advanced in some space just watch like two trains colliding.

I'll tell you what is productive in the workplace. Communication. That is it. Communicate and lift the guy up, give the guy a running start instead of chilling in the break room snarking with all your snarky co-workers.

View on HN · Topics

Great article. Hits on many points that resonate with my experience.

The skin in the game one, in particular, is something I've been thinking about. People have been telling me LLMs are "more intelligent" than "average people". But it's easy to sound intelligent when you have no skin in the game. People have to stand by their word and suffer the consequences of their actions. It's not enough just to sound intelligent.

It seems appropriate also to share an anecdote of an incident that recently happened in my job. A colleague submitted some code for review, quite a lot of it. A second colleague reviewed and questioned a piece of code. Rather than answer the question with a justification, the question was taken rhetorically and the code was removed. The code then failed in production because the removed code was, in fact, necessary. The LLM obviously "knew" this, but neither colleague did. It's leading me to introduce a "no rhetorical questions in code review" rule. The submitter must be able to justify every line of code they submit.
