Summarizer

Code Quality Without Review


The adoption of AI coding agents is fundamentally shifting the developer's role from manual implementation to high-level orchestration, where the focus moves toward reviewing automated Pull Requests and managing parallel tasks. While some enthusiasts embrace a future where direct source code inspection becomes the exception, skeptics remain wary of "driving blind," arguing that "vibe coding" without local verification risks creating a mountain of sloppy, unmaintainable code. To reconcile these perspectives, many developers are adopting a hybrid workflow, using agents to generate initial PRs while maintaining quality through persistent markdown plans and local checkouts for manual testing.

15 comments tagged with this topic

View on HN · Topics
I delayed adopting Conductor because I had my own worktree + PR wrappers around Claude Code, but I tried it over the holidays and wow. The combination of Claude + Codex + Conductor + Claude Code on the web and Claude in GitHub can be so insanely productive. I spend most of my time updating the memory files and reviewing code, just letting a ton of tasks run in parallel.
View on HN · Topics
I haven't missed planning mode myself. I tend to tell it "write a detailed plan first in a file called spec.md for me to review", then use that as the ongoing plan. I like that it ends up in the repo as it means it survives compaction or lets me start a fresh session entirely.
View on HN · Topics
I was doing the same, but recently I noticed that Claude now writes its plans to a markdown file somewhere nested in the ~/.claude/plans directory. It will carry a reference to it through compaction. Basically mimicking my own workflow! This can be customized via a shell env variable that I cannot remember ATM. The downside (upside?) is that the plan will not end up in your repo. Which sometimes I want. I love the native plan mode though.
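The comment above notes that native plan-mode files live somewhere under ~/.claude/plans rather than in the repo, which is sometimes not what you want. As a hedged sketch of a workaround (the directory layout and file naming here are assumptions, not a documented API, and `copy_latest_plan` is a made-up helper name):

```python
# Sketch: copy the most recently modified plan file from Claude's plans
# directory into the repo, so the plan is versioned alongside the code.
# Assumes plans are markdown files nested anywhere under the plans dir.
import shutil
from pathlib import Path
from typing import Optional

def copy_latest_plan(plans_dir: Path, repo_dir: Path, name: str = "spec.md") -> Optional[Path]:
    """Copy the newest *.md plan into the repo; return its path, or None if no plan exists."""
    candidates = sorted(
        plans_dir.rglob("*.md"),              # plans may sit in nested subdirectories
        key=lambda p: p.stat().st_mtime,      # newest modification time first
        reverse=True,
    )
    if not candidates:
        return None
    dest = repo_dir / name
    shutil.copy(candidates[0], dest)
    return dest
```

From there the copied spec.md can be committed like any other file, which also gives you the "survives compaction, survives a fresh session" property the earlier comment wanted.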
View on HN · Topics
I'm surprised to see people getting value from "web sandbox"-type setups, where you don't actually have access to the source code. Are folks really _that_ confident in LLMs as to entirely give up the ability to inspect the source code, or to interact with a running local instance of the service? Certainly that would be the ideal, but I'm surprised that confidence is currently running that high.
View on HN · Topics
I still get the full source code back at the end; I tell it to include the code it wrote in the PR. I also wrote my own tool to extract and format the complete transcript. It gives me back things like this, where I can see everything it did, including files and scripts it didn't commit. Here's an example: https://gistpreview.github.io/?3a76a868095c989d159c226b7622b...
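The commenter's actual transcript tool isn't shown, but as a minimal sketch of the idea: Claude Code keeps session logs as JSONL, and rendering each turn's text plus every tool call is what surfaces work that never made it into a commit. The field names below (`type`, `message`, `content`, `tool_use`) reflect the log format as I understand it, and `format_transcript` is a hypothetical name:

```python
# Sketch: flatten a JSONL session log into readable text.
# Each line is assumed to be a JSON object with a "type" field and a
# "message" whose "content" is either a plain string (user turns) or a
# list mixing text blocks with {"type": "tool_use", ...} entries.
import json

def format_transcript(jsonl_text: str) -> str:
    """Render a session log: prose turns plus every tool invocation."""
    out = []
    for line in jsonl_text.splitlines():
        if not line.strip():
            continue
        entry = json.loads(line)
        content = entry.get("message", {}).get("content", [])
        if isinstance(content, str):  # user turns may be plain strings
            out.append(f"[{entry.get('type', '?')}] {content}")
            continue
        for block in content:
            if block.get("type") == "text":
                out.append(f"[{entry.get('type', '?')}] {block['text']}")
            elif block.get("type") == "tool_use":
                # Listing tool calls is what exposes uncommitted files/scripts.
                args = json.dumps(block.get("input", {}))
                out.append(f"  -> tool: {block['name']}({args})")
    return "\n".join(out)
```

Run over a full session file, this yields the kind of everything-it-did view the linked gist shows, which makes in-PR review of agent work much less blind.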
View on HN · Topics
Oh fascinating - so you're reviewing "your own" code in-PR, rather than reviewing it before PR submission? I can see that working! Feels weird, but I can see it being a reasonable adaptation to these tools - thanks! What about running services locally for manual testing/poking? Do you open ports on the Anthropic VM to serve the endpoints, or is manual testing not part of your workflow?
View on HN · Topics
Yeah, I generally use PRs for anything a coding agent writes for me. If something is too fiddly to test within the boundaries of a cloud coding agent I switch to my laptop. Claude Code for web has a "claude --teleport" command for this, or I'll sometimes just do a "gh pr checkout X" to get the branch locally.
View on HN · Topics
The output from Jules is a PR. And then it's a toss-up between "spot on, let's merge" and "nah, needs more work, I will check out the branch and fix it properly when I am at the keyboard". And you see the current diff on the webpage while the agent is working.
View on HN · Topics
Right, yes, that was precisely my point - it was weird to me that people were comfortable operating on a codebase that they don't have locally, that they can't directly interact with.
View on HN · Topics
> it was weird to me that people were comfortable operating on a codebase that they don't have locally, that they can't directly interact with. I have a project where I've made a rule that no code is written by humans. It's been fun! It's a good experience to learn how far even pre-Opus 4.5 agents can be pushed. It's pretty clear to me that in 12 months time looking at the code will be the exception, not the rule.
View on HN · Topics
When the agent pushes the PR in a branch, you can switch to that branch locally on your machine and do whatever you like: review it, change it, ask for extra modifications on top, squash it, rebase it.
View on HN · Topics
I wonder when/how to test and review the code though? I mean, how do you know Claude Code hasn't entered a completely different path than you had imagined?
View on HN · Topics
That sounds nice, but what happens when Claude messes something up or doesn't know how to do something, or when you have to review the thousand lines it added to your project? Unless it's a totally vibe-coded side project without any tests or quality control of some sort. I'm just curious what you can build with this setup. It just seems to be the way to create a mountain of sloppy, unmaintainable code.
View on HN · Topics
I'm kind of confused too. I already spend way more time testing and reviewing code than writing it; I couldn't possibly keep up with 4 agents.
View on HN · Topics
This sounds cool, but I feel like I often need to run the code in one way or another when verifying what Claude does. Otherwise it feels like driving blind. Claude Code already has the web version, which I could use from my phone, and to be fair it can't run scripts etc., which limits the output quality. But if I can't verify what it did, that also limits how long I can let it run before I eventually need my laptop. Of course, if you have demo deployments on branches that you could open on mobile, it works for longer.

Another issue is that I often need to sit down and think about the next prompt: going back and forth with the agent on a plan, trying out other product features, and doing other research before I even know what exactly to build. These days that often means doing some sample implementations with Claude Code and clicking around. Doing this on a phone feels... limiting.

I also can't stand the constant context switching. Doing multiple features in parallel already feels dumb, because every time I come from feature B back to A, or worse from feature G to E, it takes me some time to adjust to where I was, what Claude last did, and how to proceed from there. Doing more than 2, max 3, tasks in parallel often ends up slowing me down. Now add ordering coffee and small talk to the mix, and I definitely can't prompt effectively without rereading all the history for minutes before sending the next prompt. At which point I might as well have opened my laptop.

Of course, if you truly vibe code and just add feature on feature and pray nothing breaks, the validation overhead and bar for quality go down a lot, so it works a lot better, but the output is also just slop by then.

I typed this on my phone and it took 20 minutes; a laptop might have been faster.