The following is content for you to summarize. Do not respond to the comments—summarize them.
<topic>
Technical Workflow Configurations # Specific details on managing AI agents, including the use of git worktrees for isolation, planning modes, 'teleporting' sessions between local CLI and web interfaces, and using markdown files to define agent behaviors.
</topic>
<comments_about_topic>
1. I hope self-promotion isn't frowned upon, but I've been spending the past months figuring out a workflow [1] that helps tackle the "more complicated problems" and ensure long-term maintainability of projects when done purely through Claude Code.
Effectively, I try to:
- Not allow the LLM to make any implicit decisions, but instead confirm with the user;
- Ensure code is written in such a way that it's easy to understand for LLMs;
- Capture all "invisible knowledge" around decisions and architecture that's difficult to infer from code alone.
It's based entirely on Claude Code sub-agents + skills. The skills almost all invoke a Python script that guides the agents through workflows.
It's not a fast workflow: it frequently takes more than 1 hour just for the planning phase. Execution is significantly faster, as (typically) most issues have been discovered during the planning phase already (otherwise it would be considered a bug and I'd improve the workflow based on that).
I'm under the impression that the post by Claude Code's creator is also intended to raise awareness of certain features of Claude Code, such as hand-offs to the cloud and back. Their workflow only works for small features. It reads a bit like someone took a "best practices" guide and turned it into a twitter post. Nice, but not nearly detailed enough for an actual workflow.
[1] https://github.com/solatis/claude-config/
2. Thanks for sharing and taking the time to document your repo. I’m also sometimes unsure of “self-promotion” — especially when you don’t have anything to sell, including yourself.
I sometimes don’t share links, due to this and then sometimes overshare or miss the mark on relevance.
But sometimes when I do share people are excited about it, so I’ve leaned more to sharing. Worst is you get some downvotes or negative comments, so why not if there is some lurker who might get benefit.
When you don’t blog or influence, how else but in related HN comment threads are like-minded people gonna know about some random GitHub repo?
My second level hope is that it gets picked up by AI crawlers and get aligned somewhere in the latent space to help prompters find it.
ETA: “The [Prompt Engineer] skill was optimized using itself.” That is a whole other self-promotional writeup possibility right there.
3. hah thanks for the compliment.
yeah last time I shared it, I got a whole lot of hate for vibe coder self promotional BS so I decided to tread a bit more carefully this time.
I encourage you to try the Prompt Engineer skill! It's one of the easiest to use, and you can literally use it on anything, and you'll also immediately see how the "dynamic prompt workflow" works.
4. Yes thank you! I find I get more than enough done (and more than enough code to review) by prompting the agent step by step. I want to see what kind of projects are getting done with multiple async autonomous agents. Was hoping to find youtube videos of someone setting up a project for multiple agents so I could see the cadence of the human stepping in and making directions
5. I run 3-5 on distinct projects often. (20x plan) I quite enjoy the context switching and always have. I have a vanilla setup too, and I don't use plugins/skills/commands; sometimes I enable an MCP server for different things, and I definitely list out CLI tools in my claude.md files. I keep a Google doc open where I list out all the projects I'm working on and write notes as I'm jumping through the Claude tabs; I also start drafting more complex prompts in the Google doc. I've been using Turborepo a lot so I don't have to context switch the architecture in my head. (But projects still use multiple types of DevOps setups.)
Often these days I vibe code a feedback loop for each project, a way to validate itself as OP said. This adds time to how long Claude takes to complete giving me time to switch context for another active project.
I also use light mode which might help others... jks
6. I have not used Claude. But my experience with Gemini and aider is that multiple instances of agents will absolutely stomp over each other. Even in a single session, the agent will often clobber my changes, even after I've told it that I made modifications.
7. See the agent as a coworker ssh-ing into your machine: how would you work efficiently? By working in the same directory? No.
You give each agent a git worktree and if you want to check, you checkout their branch.
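The worktree-per-agent setup described above takes only a few commands; a sketch (branch and directory names are illustrative, and the demo runs in a throwaway repo rather than a real checkout):

```shell
# demo in a throwaway repo; in practice you'd run these commands from your real checkout
cd "$(mktemp -d)"
git init -q main
cd main
git -c user.name=demo -c user.email=demo@example.com commit -q --allow-empty -m "init"

# one worktree (and branch) per agent; they all share a single object store
git worktree add ../agent-1 -b agent-1
git worktree add ../agent-2 -b agent-2

# inspect an agent's work from the main checkout
git worktree list
git log agent-1 --oneline

# clean up once an agent's branch is merged
git worktree remove ../agent-1
git branch -D agent-1
```

Each agent then gets pointed at its own directory, so concurrent edits never touch the same working tree.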
8. You should try Claude opus 4.5 then. I haven’t had that issue. The key is you need to have well defined specs and detailed instructions for each agent.
9. Proper sandboxing can fix this. But I didn't see OP mention it, which I thought was weird.
10. Op mentions in the follow up comments that he does a separate git checkout, one for each of the 5 Claude Code agents he runs. So each is independent and when PRs get submitted that's where the merging happens.
11. Personally I just use /resume to switch back to other states when I need to.
12. Yep.
For one of the things I am doing, I am the solo developer on a web application. At any given point, there are 4-5 large features I want and I instruct Claude to heavily test those features, so it is not unusual for each to run for 30-45 minutes and for overall conversations to span several hours. People are correct that it often makes mistakes, so that testing phase usually uncovers a bunch of issues it has to fix.
I usually have 1-2 mop-up terminal windows open for small things I notice as I go along that I want to fix. Claude can be bad about things like putting white text on a white button, and I want a free terminal to just drop every little nitpick into. They exist for me to just throw small tasks into. Yes, you really should start a new convo for every need, but these are small things and I do not want to disrupt my flow.
There are another 2-3 for smaller features that I am regularly reviewing and resetting. And then another one dedicated to just running the tests already built over and over again and solving any failures or investigating things. Another one is for research to tell me things about the codebase.
13. Where is Claude's checkout? Do you have them all share the same local files or does each use its own copy?
14. People are doing this lots of different ways. Some run it in its own containers or in instances on the web. Some are using git worktrees. I use a worktree for anything large, but smaller stuff is just done in the local files.
Sloppy? Perhaps, but Claude has never made such a big mess that it has needed its work wiped.
15. > Sloppy? Perhaps, but Claude has never made such a big mess that it has needed its work wiped.
I think a key thing to point out to people here is that Claude's built in editing tools won't generally allow it to write to a file that has changed since last time it read it, so if it tries to write and gets an error it will tend to re-read the file, adjust its changes accordingly before trying again. I don't know how foolproof those tests are, because Claude can get creative with sed and cat to edit files, and of course if a change crosses file boundaries this might not avoid broken changes entirely. But generally - as you said - it seems good at avoiding big messes.
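The read-before-write guard described above can be sketched roughly like this; note this is only an illustration of the idea (invented names, mtime-based), not Claude Code's actual implementation:

```python
import os

class StaleFileError(Exception):
    """Raised when a file changed on disk since this editor last read it."""

class TrackedEditor:
    """Toy optimistic-concurrency edit tool: a write is rejected if the
    file's mtime changed since the last read, forcing a re-read before
    the edit is retried."""

    def __init__(self):
        self._seen = {}  # path -> mtime (ns) recorded at last read

    def read(self, path):
        with open(path) as f:
            text = f.read()
        self._seen[path] = os.stat(path).st_mtime_ns
        return text

    def write(self, path, text):
        known = self._seen.get(path)
        if known is not None and os.stat(path).st_mtime_ns != known:
            raise StaleFileError(f"{path} changed since last read; re-read it first")
        with open(path, "w") as f:
            f.write(text)
        self._seen[path] = os.stat(path).st_mtime_ns
```

On a conflict the agent would catch the error, re-read the file, and re-apply its change: the same retry loop described in the comment.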
16. I use Beads, which makes it easier to grasp since it's "tickets" for the agent. I tell it what I want, it creates a bead (or "ticket"), and then I ask it to do research, brain dump on it, and even ask me clarifying questions, and it updates the tasks. By the end, once I have a few tasks with essentially a well-defined prompt, I tell Claude to run x tasks in parallel; sometimes I dump a bunch of different tasks and ask it to research them all in parallel, and it fills them in, and I review. When it's all over, I test the code, look at the code, and mention any follow-ups.
I guess it comes down to, how much do you trust the agent? If you don't trust it fully you want to inspect everything, which you still can, but you can choose to do it after it runs wild instead of every second it works.
17. It depends on the specifics of the tasks; I routinely work on 3-5 projects at once (sometimes completely different stuff), and having a tool like Claude Code fits great in my workflow.
Also, the feedback doesn't have to be immediate: sometimes I have sessions that run over a week, because of casual iterations; in my case it's quite common to do this to test concepts, micro-benchmarking and library design.
18. I see you haven’t tried BMAD-METHOD or spec-kit yet.
19. Exactly. And if that problem is complex, your first step should be to plan how to sub-divide it anyway. So just ask Claude to map out interdependencies for tasks to look for opportunities to parallelise.
20. Even having Opus review code written by Opus works very well as a first pass. I typically have it run a sub-agent to review its own code using a separate prompt. The sub-agent gets fresh context, so it won't get "poisoned" by the top-level context's justifications for the questionable choices it might have made. The prompts then direct the top-level instance to repeat the verification step until the sub-agent gives the code a "pass", and fix any issues flagged.
The result is change sets that still need review - and fixes - but are vastly cleaner than if you review the first output.
Doing runs with other models entirely is also good - they will often identify different issues - but you can get far with sub-agents and different personas (and you can, if you like, have Claude Code use a sub-agent to run codex to prompt it for a review, or vice versa - a number of the CLI tools seem to have "standardized" on "-p <prompt>" to ask a question on the command line).
Basically, reviewing output from Claude (or Codex, or any model) that hasn't been through multiple automated review passes by a model first is a waste of time - it's like reviewing the first draft from a slightly sloppy and overly self-confident developer who hasn't bothered checking if their own work even compiles first.
21. Thanks, that sounds all very reasonable!
> Basically, reviewing output from Claude (or Codex, or any model) that hasn't been through multiple automated review passes by a model first is a waste of time - it's like reviewing the first draft from a slightly sloppy and overly self-confident developer who hasn't bothered checking if their own work even compiles first.
Well, that's what the CI is for. :)
In any case, it seems like a good idea to also feed the output of compiler errors and warnings and the linter back to your coding agent.
22. > Well, that's what the CI is for. :)
Sure, but I'd prefer to catch it before that, not least because it's a simpler feedback loop to ensure Claude fixes its own messes.
> In any case, it seems like a good idea to also feed the output of compiler errors and warnings and the linter back to your coding agent.
Claude seems to "love" to use linters and error messages if it's given the chance and/or the project structure hints at an ecosystem where certain tools are usually available. But just e.g. listing by name a set of commands it can use to check things in CLAUDE.md will often be enough to have it run it aggressively.
If not enough, you can use hooks to either force it, or sternly remind it after every file edit, or e.g. before it attempts to git commit.
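A hook of that kind is configured under the `hooks` key in Claude Code's settings file; a minimal sketch roughly following the documented shape (the matcher and the command are placeholders you'd adapt to your toolchain):

```json
{
  "hooks": {
    "PostToolUse": [
      {
        "matcher": "Edit|Write",
        "hooks": [
          {
            "type": "command",
            "command": "npm run lint"
          }
        ]
      }
    ]
  }
}
```

The matcher restricts the hook to file-editing tools, so the command fires after every edit rather than on every tool call.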
23. At the beginning of the project, the runs are fast, but as the project gets bigger, the runs are slower:
- there are bigger contexts
- the test suite is much longer and slower
- you need to split worktree, resources (like db, ports) and sometimes containers to work in isolation
So having 10 workers will run for a long time, which gives plenty of time to write good specs.
You need a good spec so the LLM produces good tests, so it can write good code to match these tests.
Having a very strong spec + test suite + quality gates (linter, type checkers, etc.) is the only way to get good results from an LLM as the project becomes more complex.
Unlike a human, it's not very good at isolating complexity by itself, nor at stopping and asking questions in the face of ambiguity. So the guardrails are the only thing that keeps it on track.
And running a lot of guardrail takes time.
E.g.: yesterday I had a big migration to do from HTMX to viewjs. I asked the LLM to produce screenshots of each state, and then do the migration in steps in a way that kept the screenshots 90% identical.
This way I knew it would not break the design.
But it takes a very long time to run e2e tests + screenshot comparison every time you make a modification. Still faster than a human, but it gives plenty of time to talk to another LLM.
Plus you can assign them very different task:
- One works on adding a new feature
- One improves the design
- One refactors part of the code (it's something you should do regularly; LLMs produce tech debt quickly)
- One adds more tests to your test suite
- One is deploying on a new server
- One is analyzing the logs of your dev/test/prod server and telling you what's up
- One is cooking up a new logo for you and generating x versions at different resolutions.
Etc.
It's basically a small team at your disposal.
24. > I don't understand how you can generate requirements quicky enough to have 10 parallel agents chewing away at meaningful work.
You use agents to expand the requirements as well, either in plan mode (as OP does) or with a custom scaffold (rules in CLAUDE.md about how to handle requirements; personally I prefer giving Claude the latitude to start when Claude is ready rather than wait for my go-ahead)
> I don't understand how you can have any meaningful supervising role over 10 things at once given the limits of human working memory.
[this got long; TL;DR: This is what works for me: Stop worrying about individual steps; use sub-agents and slash-commands to encapsulate units of work to make Claude run longer; use permissions to allow as much as you dare (and/or run in a VM) to allow Claude to run longer; give Claude tools to verify its work (linters, test suites, sub-agents double-checking the work against the spec) and make it use them; don't sit and wait and read individual parts of the conversation - it will only infuriate you to see Claude make stupid mistakes, but if well scaffolded it will fix them before it returns the code to you, so stop reading, breathe, and let it work; only verify when Claude has worked for a long time and checked its own work -- that way you review far less code and far more complete and coherent changes]
You don't. You wait until each agent is done, and you review the PRs. To make this kind of thing work well you need agents and slash-commands, like OP does - sub-agents in particular help prevent the top-level agents from "context anxiety": Claude Code appears to have knowledge of context use, and will be prone to stopping before context runs out; sub-agents use their own context and the top-level agent only uses context to manage the input to and output from them, so the more is farmed out to sub-agents, the longer Claude Code is willing to run. When I got up this morning, Claude Code had run all night and produced about 110k words of output.
This also requires extensive permissions to use safe tools without asking (what OP does), or --dangerously-skip-permissions (I usually do this; you might want to put this in a container/VM as it will happily do things like "killall -9 python" or similar without "thinking through" consequences - I've had it kill the terminal it itself ran in before), or it'll stop far too quickly.
You'll also want to explicitly tell it to do things in parallel when possible. E.g. if you want to use it as a "smarter linter" (DO NOT rely on it as the only linter, use a regular one too, but using claude to apply more complex rules that requires some reasoning works great), you can ask it to "run the linter agent in parallel on all typescript files" for example, and it will tend to spawn multiple sub-agents running in parallel, and metaphorically twiddle its thumbs waiting for them to finish (it's fun seeing it get "bored" and decide to do other things in the meantime, or get impatient and check on progress obsessively).
You'll also want to make Claude use sub-agents to review, verify, test its work, with instructions to repeat until all the verification sub-agents give its changes a PASS (see 12/ and 13/ in the thread) - there is no reason for you to waste your time reviewing code that Claude itself can tell isn't ready.
[E.g. concrete example: "Vanilla" Claude "loves" using instance_variable_get() in Ruby if facing a class that is missing an accessor for an instance variable. Whether you know Ruby or not, that should stand out like a sore thumb - it's a horrifically gross code smell, as it's basically bypassing encapsulation entirely. But you shouldn't worry about that - if you write Ruby with Claude, you'd want a rule in CLAUDE.md telling it how to address missing accessors, and sub-agent, and possibly a hook, making sure that Claude is told to fix it immediately if it ever uses it.]
Farming it off to sub-agents both makes it willing to work longer, especially on "boring" tasks, and avoids the problem that it'll look at past work and decide it already "knows" this code is ready and start skipping steps.
The key thing is to stop obsessing over every step Claude takes, and treat that as a developer experimenting with something they're not clear on how to do yet. If you let it work, and its instructions are good, and it has ways of checking its work, it will figure out its first attempts are broken, fix them, and leave you with output that takes far less of your time to review.
When Claude tells you it's done with a change, if you spot egregious problems, fix your CLAUDE.md, fix your planning steps, fix your agents.
None of the above will absolve you of reviewing code, and you will need to kick things back and have it fix them, and sometimes that will be tedious, but Claude is good enough that the problems you have it fix should be complex, not simple code smells or logic errors, and 9 out of 10 times they should signal that your scaffold is lacking.
25. This was extremely useful to read for many reasons, but my favorite thing I learned is that you can “teleport” a task FROM the local Claude Code to Claude Code on the web by prepending your request with “&”. That makes it a “background” task, which I initially erroneously thought was a local background task. Turns out it sends the task and conversation history up to the web version. This allows you to do work in other branches on Claude Code web, (and then teleport those sessions back down to local later if you wish)
26. OpenCode actually has a client-server architecture. Typically one either runs the TUI or the web interface. I wonder if it would cope ok with running multiple interfaces at once?
Neovim has a decade old feature request for multiple clients to be able to connect to it. No traction alas. Always a great superpower to have, if you can hack it. https://github.com/neovim/neovim/issues/2161
Chrome DevTools Protocol added multiple client support maybe 5 years ago? It's super handy there because automation tools also use the same port. So you couldn't automate and debug at the same time!
That is a really cool ability, to move work between different executors. OpenCode is also super good at letting you open an old session & carry on, so you can switch between them. I appreciate the mention; I love the mobile ambient aspect of how Claude Code can teleport this all!!
27. I implemented some of his setup and have been loving it so far.
My current workflow is typically 3-5 Claude Codes in parallel
- Shallow clone, plan mode back and forth until I get the spec down, hand off to subagent to write a plan.md
- Ralph Wiggum Claude using plan.md and skills until PR passes tests, CI/CD, auto-responds to greptile reviews, prepares the PR for me to review
- Back and forth with Claude for any incremental changes or fixes
- Playwright MCP for Claude to view the browser for frontend
I still always comb through the PRs and double check everything including local testing, which is definitely the bottleneck in my dev cycles, but I'll typically have 2-4 PRs lined up ready for me at any moment.
28. We have a giant monorepo, hence the shallow clones. Each Claude works on its own feature / bug / ticket though, sometimes in the same part of the codebase but usually in different parts (my ralph loop has them resolve any merge conflicts automatically). I also have one Claude running just for spelunking through K8s, doing research, or asking questions about the codebase I'm unfamiliar with.
29. Do you prefer Playwright or the Chrome MCP?
30. What I find surprising is how much human intervention the creator of Claude Code uses. Every time Claude does something bad, we write it in claude.md so it learns from it... Why not create an agent to handle this and learn automatically from previous implementations?
B: Outcome Weighting
# memory/store.py
OUTCOME_WEIGHTS = {
    RunOutcome.SUCCESS: 1.0,    # Full weight
    RunOutcome.PARTIAL: 0.7,    # Some issues but shipped
    RunOutcome.FAILED: 0.3,     # Downweighted but still findable
    RunOutcome.CANCELLED: 0.2,  # Minimal weight
}

# Applied during scoring:
final_score = score * decay_factor * outcome_weight
C: Anti-Pattern Retrieval
# Similar features → SUCCESS/PARTIAL only
similar_features = store.search(..., outcome_filter=[SUCCESS, PARTIAL])
# Anti-patterns → FAILED only (separate section)
anti_patterns = store.search(..., outcome_filter=[FAILED])
Injected into agent prompt:
## Similar Past Features (Successful)
1. "Add rate limiting with Redis..." (Outcome: success, Score: 0.87)
## Anti-Patterns (What NOT to Do)
_These similar attempts failed - avoid these approaches:_
1. "Add rate limiting with in-memory..." (FAILED, Score: 0.72)
## Watch Out For
- **Redis connection timeout**: Set connection pool size
The flow now:
Query: "Add rate limiting"
│
├──► Similar successful features (ranked by outcome × decay × similarity)
│
├──► Failed attempts (shown as warnings)
│
└──► Agent sees both "what worked" AND "what didn't"
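Pulling the excerpts above together, a self-contained sketch of the scoring logic; the `RunOutcome` enum values, the half-life decay, and the result shape are reconstructed/assumed here, and the real store's API will differ:

```python
import math
from enum import Enum

class RunOutcome(Enum):
    SUCCESS = "success"
    PARTIAL = "partial"
    FAILED = "failed"
    CANCELLED = "cancelled"

# weights from the excerpt: failures stay findable, just downweighted
OUTCOME_WEIGHTS = {
    RunOutcome.SUCCESS: 1.0,
    RunOutcome.PARTIAL: 0.7,
    RunOutcome.FAILED: 0.3,
    RunOutcome.CANCELLED: 0.2,
}

HALF_LIFE_DAYS = 90  # illustrative recency half-life, not from the repo

def final_score(similarity: float, age_days: float, outcome: RunOutcome) -> float:
    """score * decay_factor * outcome_weight, as in the excerpt."""
    decay_factor = 0.5 ** (age_days / HALF_LIFE_DAYS)
    return similarity * decay_factor * OUTCOME_WEIGHTS[outcome]

def partition(results):
    """Split retrieved runs into positive examples and anti-patterns,
    mirroring the two outcome-filtered searches above."""
    good = [r for r in results if r["outcome"] in (RunOutcome.SUCCESS, RunOutcome.PARTIAL)]
    bad = [r for r in results if r["outcome"] is RunOutcome.FAILED]
    return good, bad
```

The agent prompt would then render `good` under "Similar Past Features" and `bad` under "Anti-Patterns", so both what worked and what failed reach the model.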
31. Great list of useful tips.
It's interesting that Boris doesn't mention "Agent Skills" at all. I'm still a bit confused at the difference between slash commands and Agent Skills.
https://code.claude.com/docs/en/skills
32. The main difference is that slash commands are invoked by humans, whereas skills can only be invoked by the agent itself. They work kinda like conditional instructions.
As an example, I have skills that aid in adding more detail to plans/specs, debugging, and spinning up/partitioning subagents to execute tasks. I don't need to invoke a slash command each time; the agent can contextually know, from the instructions I give it, which skills to use.
33. In the reddit thread Boris says they’re adding the ability to call skills via slash commands in an upcoming release and that he uses the term skill and slash commands interchangeably.
34. "Boris: Skills = slash commands, I use them interchangeably"
https://www.reddit.com/r/ClaudeAI/comments/1q2c0ne/comment/n...
35. I believe slash commands are all loaded into the initial context and executed when invoked by the user. Skills on the other hand only load the name and description into initial context, and the agent (not user) determines when to invoke them, and only then is the whole skill loaded into context. So skills shift decision making to the agent and use progressive disclosure for context efficiency.
36. > manually fixing crap it produces
> it tends to produce so many errors
I get some of the skepticism in this thread, but I don't get takes like this. How are you using cc that the output you look at is "full of errors"? By the time I look at the output of a session the agent has already run linting, formatting, testing and so on. The things I look at are adherence to the conventions, files touched, libraries used, and so on. And the "error rate" on those has been steadily coming down. Especially if you also use a review loop (w/ codex since it has been the best at review lately).
You have to set these things up for success. You need loops with clear feedback. You need a project that has lots of clear things to adhere to. You need tight integrations. But once you have these things, if you're looking at "errors", you're doing something wrong IMO.
37. Boris is a power user's power user.
I would highly recommend every project maintain a simple, well-written AGENTS.md file. At first it may seem like a more nitpicky README, but you will quickly see how much coding agents benefit from this added context. Imo, the two most important things to include in AGENTS.md are frequent commands and verification methods.
A third thing I've started adding to my projects is a list of related documentation and libraries that may not be immediately obvious. Things like confluence pages and other repos associated with the project.
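A skeleton of what such a file can look like, covering the three things mentioned (all contents are illustrative, not from any real project):

```markdown
# AGENTS.md

## Frequent commands
- `make dev` — run the app locally
- `make test` — full test suite

## Verification
Run `make lint && make test` and confirm both pass before declaring any change done.

## Related context (not obvious from this repo)
- `../billing-service` — consumes the events this repo publishes
- Confluence: "Payments architecture overview"
```

Keeping it short matters: the file is loaded into the agent's context on every session.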
38. I had this ShowHN yesterday, which didn't grab much attention, so I'm using this opportunity as it seems relevant (it is a solution for running CC in parallel).
If you folks like to run parallel claude-code sessions, and like a native terminal like Ghostty, I have a solution for using Git worktrees natively with Ghostty. It is called agentastic.dev, and it has a built-in worktree/IDE/diff/code-review workflow around Ghostty (macOS only for now).
Would be happy to answer any questions
ShowHN post: https://news.ycombinator.com/item?id=46501758
39. >Most sessions start in Plan mode
I don't get why they added a Plan mode. Even without it, you can just ask claude to "make a plan" from where you can iterate on it.
Also is it really that much faster to type "/commit-push-pr" than it is to type "commit, push and make a pr" ?
40. It’s really a convenience to force the model to use the planning tool and it prevents edit/write tools until the user approves the plan, like an inverse of “auto accept edits” mode.
41. One thing that’s helped me is creating a bake-off. I’ll do it between Claude and codex. Same prompt but separate environments. They’ll both do their thing and then I’ll score them at the end. I find it helps me because frequently only one of them makes a mistake, or one of them finds an interesting solution. Then once I declare a winner I have scripts to reset the bake-off environments.
42. I wonder what sort of problems you must have to get this upset about the creator of a particular software telling people how they personally use that software
Personally I keep open several tabs of CC but it's not often that more than one or two of them would be running at the same time. It's just to keep particular context around for different parts of the same application since it's quite big (I don't use CC for creating new projects). For example if I had it work on a feature and then I realized there was a bug or an adjustment in the same files that needed to be made then I can just go back to that tab hours or maybe even days later without digging through history
43. Needlessly condescending post of someone sharing their self-proclaimed vanilla setup of iterm with a handful of tabs.
But hey, if it makes you happy.
44. So this guy is personally responsible for the RAM shortage, it seems. Jokes aside, I have a similar setup, but with a mix of Claude and a local model. Claude can access the local model for simple and repetitive tasks, and it actually does a good job on testing UI. Great way to save tokens.
45. I actually use dozens of claude codes "in parallel" myself (most are sitting idle for a lot of the time though). I set up a web interface and then made it usable by others at clodhost.com if anybody wants to try it (free)!
46. The PostToolUse hook tip for formatting Claude's code is the only actual tip here. Everything else reads like marketing copy.
47. That’s in the docs though
48. Does anyone know if it’s possible to have “ultrathink” be the default instead of saying it in every prompt?
49. https://x.com/bcherny/status/2007892431031988385?s=20
Seems to be moved to the default now. PSA for anyone who didn't see
50. Put it in CLAUDE.md because that just gets added to the prompt
51. Ah, thank you! Now I feel like an idiot. I guess I was thinking “ultrathink” was a specially interpreted command within claude code (sort of like a slash command).
52. "My setup might be surprisingly vanilla! Claude Code works great out of the box, so I personally don't customize it much. "
Well, of course he doesn't need to customize it. It's already working the way he wants it, seeing as how he created it
53. > [I'm] the creator of Claude Code.
but also
> Claude Code works great out of the box, so I personally don't customize it much.
Am I the only one to notice the irony of this juxtaposition?
54. For lots of software, unless you really know what you are doing, it's best to just leave the default settings alone and not dig too deep into features that aren't immediately intended for you. For my application, lots of bug reports come from people using our advanced settings without reading any of the instructions at all and screwing things up.
So in the case of him being the creator obviously he built it for his needs
55. What’s ironic? He made a good product that works well without needing to configure it?
56. He doesn't need to configure it because he made his preferences the default.
57. Absolutely shocking... Boris uses a light themed terminal?! Kidding aside, these were great tips. I am quite intrigued by the handing off of local Claude sessions to the web version. I wonder if this feature exists for the other Coding CLI agents.
58. I doubt he’d use Claude code as it is. I’m sure he’d upgrade to think harder and do more iterations and go deeper. Codex for example already does that but could go deeper a bit longer to figure out more.
</comments_about_topic>
Write a concise, engaging paragraph (3-5 sentences) summarizing the key points and perspectives in these comments about the topic. Focus on the most interesting viewpoints. Do not use bullet points—write flowing prose.
Technical Workflow Configurations # Specific details on managing AI agents, including the use of git worktrees for isolation, planning modes, 'teleporting' sessions between local CLI and web interfaces, and using markdown files to define agent behaviors.