Verification and Code Review

A critical point raised is the danger of 'fire-and-forget' coding with LLMs on mobile. Users note that verifying AI-generated code is difficult on a phone because of the small screen and the lack of syntax highlighting. The conversation touches on the risks of deploying code or merging pull requests without being able to properly audit the logic or run tests, and suggests that mobile workflows are better suited to prototyping than to production engineering.

Developers are increasingly bypassing the limitations of small screens by treating pull requests and automated CI/CD pipelines as the primary safeguards for AI-generated code. Innovative workflows now feature sandboxed mobile terminals and custom Telegram bots that allow users to resolve AI design dilemmas with a single tap, effectively bridging the gap between automated agents and human oversight. While skeptics argue that the lack of syntax highlighting and deep focus makes mobile engineering impractical, proponents find it ideal for "thumb-driven development" of personal micro-utilities during commutes or exercise. Ultimately, this approach shifts the verification burden from tedious manual auditing to a results-oriented model where live previews and success indicators validate the logic.

View on HN · Topics

I read the Readme. So this is all just stuff you can do with Claude's cli interface? It edits files and runs utilities? And it does this with few enough errors that you can be productive by just chatting with it over ssh? Is Claude the only one that can do this?

View on HN · Topics

Claude Code and Codex are probably the best ones currently: Claude Code is a bit faster, Codex a lot more precise and "engineering" focused.

As long as you figure out how to verify that the built thing actually does what it's supposed to, ideally with automated tests, it's almost fire-and-forget if you're good at explaining what you want and need.

View on HN · Topics

How did you make sure Claude wasn't doing anything unintended while allowing it to run scripts it wrote on your network?

View on HN · Topics

I still manually approve tool use requests at the start of a run. As it gets deeper in I might allow it to run safer commands without that oversight (e.g. writing to local text files), but potentially destructive execution still requires approval.

As for the local env, I'm treating the Android terminal as a sandbox. Anything gets trashed I just reset and reinstall my toolchain.

I won't pretend I'd use this workflow for anything high-stakes. But for simple things like "I wonder how my Hue lights actually work?", it's viable.
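The approval policy described in this comment (let low-risk commands through, require sign-off for anything else) could be sketched roughly as below. The command list, the metacharacter set, and the function name `needs_approval` are illustrative assumptions, not the permission system of Claude Code or any other agent CLI.

```python
import shlex

# Hypothetical allowlist: read-only commands the user is willing to auto-approve.
SAFE_COMMANDS = {"ls", "cat", "grep", "pwd", "head", "tail"}

# Characters that let one command line chain, redirect, or substitute commands.
SHELL_METACHARS = set(";|&><`$")


def needs_approval(command_line: str) -> bool:
    """Return True when a human should confirm the command before it runs."""
    # Chaining or redirection could smuggle a risky command behind a safe one.
    if any(ch in SHELL_METACHARS for ch in command_line):
        return True
    try:
        tokens = shlex.split(command_line)
    except ValueError:
        # Unbalanced quotes and similar parse errors: do not guess, ask.
        return True
    if not tokens:
        return True
    # Default-deny: anything not explicitly allowlisted requires approval.
    return tokens[0] not in SAFE_COMMANDS
```

The default-deny shape matters more than the specific list: an unknown command falls through to "ask" rather than "allow".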

View on HN · Topics

I've replied with this in another comment, but this seems more pertinent ;)

That's exactly the approach I took with https://github.com/cloud-atlas-ai/miranda, a Telegram bot: PR is the human review point, tests + CodeRabbit catch most issues.

Bot intercepts Claude's AskUserQuestion calls via a hook, sends me an inline keyboard, injects my answer back into the session. Claude keeps working, PR still happens—but I can unblock it from my phone in 5 seconds instead of rejecting a PR based on a wrong guess.

View on HN · Topics

I have read of people doing remote coding with Claude by having Claude create pull requests. The user then looks through the requests, and either approves them or sends them back with edits. Seems like a good way to interact with Claude Code, especially once one sets up a test suite and the proposed pull requests have been shown not to regress.
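One way to make "proven not to regress" concrete is to compare test outcomes on the base branch with those on the PR branch. The sketch below assumes the results have already been collected into dictionaries mapping test name to pass/fail; `find_regressions` is a hypothetical helper, not part of any test runner.

```python
def find_regressions(baseline: dict[str, bool], candidate: dict[str, bool]) -> list[str]:
    """List tests that passed on the base branch but fail or vanish on the PR branch."""
    return sorted(
        name
        for name, passed in baseline.items()
        # A test missing from the candidate run counts as a regression too,
        # since deleting a test is an easy way to make a suite "pass".
        if passed and not candidate.get(name, False)
    )


base_results = {"test_login": True, "test_logout": True, "test_flaky": False}
pr_results = {"test_login": True, "test_logout": False, "test_new": True}
print(find_regressions(base_results, pr_results))
```

Treating removed tests as regressions is a design choice; it guards against an agent "fixing" a failure by deleting the test.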

View on HN · Topics

Same approach here. PR is the human review point, tests + CodeRabbit catch most issues: https://github.com/cloud-atlas-ai/miranda

The gap I wanted to fill: when Claude is genuinely uncertain ("JWT or sessions?" "Breaking change or not?"), it either guesses wrong or punts to the PR description where you can't easily respond.

Built a Telegram bot that intercepts Claude's AskUserQuestion calls via a hook, sends me an inline keyboard, injects my answer back into the session. Claude keeps working, PR still happens—but I can unblock it from my phone in 5 seconds instead of rejecting a PR based on a wrong guess.
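The "inline keyboard" step can be sketched as building a payload for the Telegram Bot API `sendMessage` method, with one button per option. This is only an illustration of the payload shape; how miranda actually wires the hook and injects the answer is not shown in the thread, and the helper names and the `answer:<index>` convention are assumptions.

```python
def build_question_message(chat_id: int, question: str, options: list[str]) -> dict:
    """Build a sendMessage payload that shows one tappable button per option."""
    keyboard = []
    for index, option in enumerate(options):
        # callback_data is limited to 64 bytes, so send a short index
        # instead of the full option text.
        keyboard.append([{"text": option, "callback_data": f"answer:{index}"}])
    return {
        "chat_id": chat_id,
        "text": question,
        "reply_markup": {"inline_keyboard": keyboard},
    }


def resolve_answer(callback_data: str, options: list[str]) -> str:
    """Map the callback_data from a button tap back to the chosen option."""
    prefix, _, raw_index = callback_data.partition(":")
    if prefix != "answer" or not raw_index.isdigit():
        raise ValueError(f"unexpected callback data: {callback_data!r}")
    index = int(raw_index)
    if index >= len(options):
        raise ValueError(f"option index out of range: {index}")
    return options[index]
```

In a real bot the payload would be posted as JSON to the `sendMessage` endpoint, and the tap would arrive later as a callback query carrying the same `callback_data`.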

Works in tandem with a bunch of other LLM enhancers I've built; they're linked in the README of that repo.

View on HN · Topics

This is how I do mobile device coding. Android terminal w. git and gh installed and authenticated. Claude manages the feature branching and PR process; I review the PR in the GitHub mobile app.
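For readers unfamiliar with the branching and PR steps the agent is handling here, a rough sketch of the underlying `git` and `gh` invocations follows. The branch name, base branch, and the `pr_commands` / `run_all` helpers are assumptions for illustration; the commenter does not say which exact commands Claude runs.

```python
import shlex
import subprocess


def pr_commands(branch: str, title: str, body: str, base: str = "main") -> list[list[str]]:
    """Return the command sequence for a feature branch plus pull request."""
    return [
        ["git", "switch", "-c", branch],                      # create the feature branch
        ["git", "add", "--all"],                              # stage the agent's changes
        ["git", "commit", "-m", title],
        ["git", "push", "--set-upstream", "origin", branch],
        ["gh", "pr", "create", "--base", base, "--head", branch,
         "--title", title, "--body", body],                   # open the PR for review
    ]


def run_all(commands: list[list[str]], dry_run: bool = True) -> None:
    """Print the commands, or execute them and stop at the first failure."""
    for command in commands:
        if dry_run:
            print(shlex.join(command))
        else:
            subprocess.run(command, check=True)


run_all(pr_commands("feature/hue-probe", "Probe Hue bridge API", "Exploratory script."))
```

With `dry_run=True` nothing is executed, which is a convenient way to inspect what would happen before handing control to an agent.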

View on HN · Topics

No syntax highlighting, I do like to review snippets of code. Also the interactive questions / answers during planning would be a pain over email. And what about text wrapping? Headache.

Edit: also setting up an email interface API to Claude Code seems like a lot more work than just setting up a VPN.

View on HN · Topics

You don't need a VPN to vibe code on your phone. I've been happily doing thumb-driven development for the last 4 months now using GitHub Copilot on github.com from my phone. It even has real-time chat with Copilot as it works. Having your PRs deploy to an environment allows you to check the result. I also have Playwright tests that record screenshots and traces, which get uploaded as artifacts I can check too.

View on HN · Topics

My flow is GitHub Issues + GitHub Copilot + web deployments from GitHub Actions.

I can just ask GitHub to fix something from the mobile app, and then set it to build on PR merge. It works most of the time, but you'd have to be absolutely wacky to do it in production or with any code you actually care about.

View on HN · Topics

I've been using a similar workflow for the past couple of months. Heavily inspired by Simon Willison’s approach of building micro tools, I’ve started building micro-utilities. I do this mostly while I'm commuting, outside, or waiting for something at work.

Instead of just jotting down an idea in a notes app (and it sitting there for eternity), I’ll open up Jules, describe the tool, and have it scaffold the HTML. I have Cloudflare Pages hooked up to the repo—once Jules submits the PR, the preview branch builds automatically and I can verify the result on my phone immediately.

View on HN · Topics

I’ve only used the web version of Codex, but everything about it was slower than what’s described here: broken flows, more rate limiting, and no way to put a “human in the loop” before a PR.

View on HN · Topics

Let me know how it goes! From the comments above, seems like you can use tmux to keep persistent sessions when you lose Internet connection - but I haven't tried myself.
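The tmux suggestion can be sketched as a single SSH invocation that attaches to a named session if it exists and creates it otherwise, so a dropped connection does not kill the agent. The host name, the session name, and the helper `resumable_ssh_command` are assumptions; like the commenter, treat this as untested against any particular setup.

```python
import shlex


def resumable_ssh_command(host: str, session: str = "agent") -> list[str]:
    """Build an ssh command that attaches to (or creates) a named tmux session."""
    # `tmux new-session -A -s NAME` attaches when NAME already exists,
    # otherwise it creates the session.
    remote = f"tmux new-session -A -s {shlex.quote(session)}"
    # -t forces a terminal allocation, which tmux needs to draw its UI.
    return ["ssh", "-t", host, remote]


print(shlex.join(resumable_ssh_command("devbox")))
```

Rerunning the same command after a disconnect lands back in the same session, with whatever the agent was doing still on screen.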

Diff review is alright. I'm an amateur programmer. Sometimes I don't look at the code Claude generates, but when I'm troubleshooting a bug, I'll ask Claude to output all recent changes, which satisfies my untrained eye.

View on HN · Topics

This looks neat.
How do you handle code verification in this workflow, especially if you want to be confident about what actually ran?

View on HN · Topics

I might just be old-fashioned, but at a party with a couple of drinks in me I don't trust my ability to even vibe code well.

View on HN · Topics

Just keep your eyes on the pretty CI/CD lights.

View on HN · Topics

Does this approach work for anyone? In my own life, I've found that if I'm not at a computer then I'm not in a productive situation anyway, even with AI access. I don't have a setting where I can concentrate for a long time and think clearly. For example, when watching children, buying groceries, or in transit (probably having to change trains in 20 minutes, or walking to the next destination). No convenient access to a notepad and pen. On a phone it's also inconvenient to do research.

For me personally I've found two better uses of in-between time:

1. Micro exercises. Really important for health and longevity, especially when it's hard to find dedicated time for exercise.

2. Resting. This means no phone. Yeah, it's hard to resist doom scrolling. Just relaxing muscles and breathing exercises, calming down the nervous system. Increases long-term resilience and reduces stress.

So I'm a bit puzzled. If you are in a situation where you can concentrate, why not just pull out a laptop? Typing on phone is really annoying. Even complex conversations with AI I prefer doing on a laptop.

Perhaps there are coding tasks where the prompt is not too complex and it's more about writing code. But you still have to review the result. That's even more annoying on a phone than writing text.

View on HN · Topics

In between sets!? I've found that if I do any activity in between sets (like watching Twitter) I'll just end up spending way too much time and then make the exercise session super long. Also I can't focus and write a serious prompt or review serious results in just 2 or 3 minutes. But maybe it works if the app is something you've recently worked on and you already have very clearly in your mind what you want, it just needs to be done.

View on HN · Topics

For incremental changes 1-2 sentences are usually enough. Also, since the program itself is a workout app with live reload, I can actually fix bugs while I’m using it.

As for too long of a wait I agree, it makes the sessions longer. Ideal window is after a heavy superset where waiting for 3-5 minutes is not a waste.

(Note that I’m not doing this for my real job, just for my personal project)

View on HN · Topics

Does anyone have any good advice or resources on a good workflow to do this with web apps? There's some stuff I'd really like to solve, for myself/family, that would require a front and back-end with persistent storage.

I would love to be able to set this up easily when a new idea pops into my mind and then have something running (locally or securely in some cloud) within a few hours or days. I wouldn't want to spend a ton of money on this though, nor have a lot of overhead to manage.

Edit: In addition, I'd like some safeguards where I can't have the LLM of choice accidentally delete stuff or do other unintended things on my network.

View on HN · Topics

I feel it depends whether you inspect and edit the code as part of the workflow, or just test what the AI produced and give feedback without participating in the coding yourself.
