llm/9db4e77f-8dd5-46da-972e-40d33f3399ef/batch-2-7bb311fc-2dac-4bfd-9068-390bb76131f0-input.json
The following is content for you to classify. Do not respond to the comments—classify them.
<topics>
1. Feasibility of Parallel Agent Workflows
Related: Skepticism regarding the human capacity to supervise multiple AI agents simultaneously, utilizing analogies like washing dishes vs. laundry, and debating the cognitive load required for context switching between 10 active coding streams.
2. Code Quantity versus Quality
Related: Discussions on whether generating 50-100 Pull Requests a week represents true productivity or merely 'token-maxxing', with concerns about code churn, technical debt, and the inability of humans to properly review such high volumes of generated code.
3. The One-Person Unicorn Startup
Related: Debates on whether AI enables solo founders to build billion-dollar companies, arguing that while coding is easier, business bottlenecks like sales, marketing, and product-market fit remain unsolved by LLMs, despite rumors of stealth successes.
4. Claude Code Product Feedback
Related: User feedback on the Claude Code CLI tool, mentioning specific bugs like terminal flickering and context loss, comparisons to tools like Codex and Cursor, and complaints about reliability and lack of basic features.
5. Cost and Access Disparities
Related: Analysis of the financial feasibility of running Opus 4.5 agents in parallel, noting that while Anthropic employees may have unlimited access, the cost for average users would be prohibitive due to token limits and API pricing.
6. Marketing Hype and Astroturfing
Related: Accusations that the original post and similar recent content represent a coordinated marketing campaign by Anthropic, with users expressing distrust of 'influencer' style posts and potential conflicts of interest from the tool's creator.
7. Future of Software Engineering
Related: Existential concerns about the devaluation of coding skills, the shift from creative building to managerial reviewing of AI output, and fears that junior developers will lose the opportunity to learn through doing.
8. Technical Workflow Configurations
Related: Specific details on managing AI agents, including the use of git worktrees for isolation, planning modes, 'teleporting' sessions between local CLI and web interfaces, and using markdown files to define agent behaviors.
9. AI Code Review Strategies
Related: Approaches for handling AI-generated code, such as using separate AI instances to review PRs, the necessity of rigorous CI/CD guardrails, and the danger of blindly trusting 'green' tests without human oversight.
10. The Light Mode Terminal Debate
Related: A humorous yet contentious side discussion sparked by the creator's use of a light-themed terminal, leading to arguments about eye strain, readability, astigmatism, and developer cultural norms regarding dark mode.
11. SaaS Commoditization and Moats
Related: Predictions that AI will drive the marginal cost of software to zero, eroding traditional SaaS business models, and that future business value will rely on proprietary data, domain expertise, and distribution rather than code.
12. Agentic Limitations and Reliability
Related: Criticisms of current AI agents acting like 'slot machines' requiring constant steering, their struggle with complex concurrency bugs, and the observation that they often produce boilerplate rather than solving deep architectural problems.
13. Corporate Adoption and Budgeting
Related: Anecdotes about colleagues burning through massive amounts of API credits with varying degrees of success, and the disconnect between management's desire for AI productivity and the reality of review bottlenecks.
14. Context Management Techniques
Related: Discussions on how to optimize context for AI agents, including the use of CLAUDE.md or AGENTS.md to establish rules, and the technical challenges of context limits and pruning during long sessions.
15. Vibe Coding vs. Engineering
Related: The distinction between 'vibe coding' (iterating until it feels right without deep understanding) and traditional engineering, with experienced developers using AI as a force multiplier rather than a replacement for understanding.
0. Does not fit well in any category
</topics>
<comments_to_classify>
[
{
"id": "46526721",
"text": "Exactly my opinion. Im pretty pragmatic and open minded, though seasoned enough that I dont stay on the bleeding edge. I became a convert in October, and I think the most recent Sonnet/Opus models truly changed the calculus of \"viable/useable\" so that we have now crossed into the age of AI.\n\nWe are going to see the rest of the industry come along kicking and screaming over the next calendar year, and thats when the ball is going to start truly rolling."
}
,
{
"id": "46524707",
"text": "> I don’t think it’s industry-wide yet, but it will be relatively soon.\n\n> Check back in on your assessment in a year.\n\nWe’ve all read that, and claims grander than that, multiple times over the past few years. And next year someone will say it again."
}
,
{
"id": "46525276",
"text": "I think the Deepseek moment that everyone started trying Deepseek and chain of thought was the weekend of 1/25/25 and 1/26/25.\n\nThe progress lived up to the hype the past year. To say otherwise is to be either intellectually dishonest or you just didn't bother using the tools in order to feel how much progress was made.\n\nI just went back to a project that I remember the models struggled with. It felt like years ago but it was from July. Even July to now is night and day different."
}
,
{
"id": "46527045",
"text": "> To say otherwise is to be either intellectually dishonest or you just didn't bother using the tools\n\nWe can’t have a proper discussion if you start by making wrong and uninformed statements about a stranger and promptly assert that you believe anyone who disagrees with you is either malicious or wilfully ignorant. People can experience the same things and still reach different conclusions or have different opinions.\n\nWhen the same revolutionary messaging is touted over and over with revised dates whenever the previous prediction hasn’t panned out, anyone is justified in not buying that “this time is different” when that has been said multiple times before.\n\nIt’s the boy who cried wolf. Sure, maybe someday it will be true, but save it for when it is instead of repeatedly saying “next year”, “in the next five years”.\n\nhttps://en.wikipedia.org/wiki/The_Boy_Who_Cried_Wolf\n\nhttps://en.wikipedia.org/wiki/List_of_predictions_for_autono..."
}
,
{
"id": "46530174",
"text": "are there really startups (in the US) pushing 996?"
}
,
{
"id": "46524326",
"text": "If all of this really worked, Claude Code would not be a buggy, slow, frustratingly limited, and overall poorly written application. It can't even reload a \"plugin\" at runtime. Something that native code plugin hosts have been doing since plugins existed, where it's actually hard to do.\n\nClaude Plugins are a couple `.md` file references, some `/command` handler registrations, and a few other pieces of trivial state. There's not a lot there, but you have to restart the whole damn app to install or update one.\n\nPlus, there's the **ing terminal refresh bug they haven't managed to fix over the past year. Maybe put a team of 30 code agents on that. If I sound bitter, it's because the model itself is genuinely very good. I've just been stuck for a very long time working with it through Claude Code."
}
,
{
"id": "46526691",
"text": "Yes, anthropics product design is truly bad, as is their product strategy (hey, I know you just subscribed to Claude, but that isnt Claude Code which you need to separately subscribe to, but you get access to Claude Code if you subscribe to a certain tier of Claude, but not the other way around. Also, you need to subscribe to Claude Code with api key and not usage based pricing or else you cant use Claude Code in certain ways. And I know you have Claude and Claude Code access, but actually you cant use Claude Code in Claude, sorry)"
}
,
{
"id": "46527800",
"text": "I'm a 1-person startup doing pretty well.\n\nI got laid off in the first half of 2025 and decided to use my severance to see if I could go full-time with my side project. Over the last six months I've gone from zero to about $200k in ARR, and 75% of that was in the last three months. My average customer is paying about $250 / month.\n\nI have zero help, I do everything myself: coding, design, marketing, sales, etc. The product uses AI to replace humans in a niche industry, so the core of the product is AI, but I also increasingly build it with AI. I rarely code manually these days, I'm just riding herd on agents, often in between sales calls, dealing with customer support, etc. I may eventually hire a VA-type person to help with admin and customer support stuff where it changes often enough that it's not worth it to build an AI workflow for, but even there...I don't know. If we get reliable computer use models in 2026 or 2027, I probably won't ever hire anyone.\n\nI've never talked openly in tech circles about this product, nor will I. The technical challenges are non-trivial, so I don't think it'd be easy to replicate for another engineer, but my competitors are all dinosaurs and getting customers to switch to me is incredibly easy. The last thing I need is another engineer spinning up a competitor."
}
,
{
"id": "46524265",
"text": "> shouldn't we be seeing a ton of 1 person startups?\n\nToo early. Wait a year. People are just coming to grips how to really make these agents make good changes and large enough changes to really start accelerating.\n\nAlso, expect a number of those startups to be entirely stealth and wait longer to raise, as well as maybe in many cases be more fleeting and/or far more fast moving (having to totally re-invent what they're doing at a pace you wouldn't expect to before).\n\nI've been full in on this for 2 years now, and I'm only just at the stage where I feel my setups and model capabilities are intersecting to produce results good enough that I've started testing if one project I'm working on will actually manage to generate revenue.\n\nI'm not going to tell you what it is, because if I did there's too little moat and HN is crawling with great people who could probably replicate it and execute on it faster than me, and Claude is capable of doing all the heavy lifting entirely by itself - that in itself is what makes it potentially viable -, so sorry for being vauge.\n\nIf it shows signs of generating revenue, it'll be so cheap to scale because of Claude, that I'll be able scale it far before I need to raise any capital.\n\nBut other people will figure it out, most likely other people are already doing the same thing.\n\nAs a result I have a short window, and it likely will close as model improvements will make it more and more trivial to do what I'm trying to do, so my approach is to try to extract as much return as I can in as little time as I can, hoping there isn't yet too much competition, and then move on.\n\nThis last part will also limit - a lot of people just won't be able to move fast enough (I might not have), and so a lot of these \"one person startups\" won't ever become visible because they won't even get to a stage where people are ready to talk about it.\n\nIn this case, it is easily measurable how much time Claude has saved me, because I've done the same thing before, manually, and made money from it, and the fastest turnaround I've achieved before was 21 days. So far, my first test run with Claude + me in the loop produced the same quality in 3 days, my second in 2 days, my third 12 hours, and I think I can drive it down towards 1-2 hours of my time, with me being the blocker to speeding it up beyond that.\n\nAt 21 days it wasn't really profitable. At 1-2 days it \"should be\" wildly profitable unless I'm already too late. If I can get it down to an hour or two of my time, then I'd also be able to hire to scale it further with good margin, and the question is just finding the sweet spot.\n\nThis opportunity will never be a unicorn, but there's a lot of money there if you don't need to raise, and the cost of scaling it to the sweet spot where I maximise my returns is something I should be able to finance without outside money the moment I validate that the unit economics are right.\n\nYou might not hear about this \"one person startup\" again until it either has failed and I decide to tell the story, or it's succeeded but the opportunity has closed and I've made what I can make from it. I suspect there will be many cases like mine that you'll never hear about at all.\n\n(and yes, I realise a lot of people will just dismiss this as bullshit because I won't give details; that's fine)"
}
,
{
"id": "46524450",
"text": "I'm not dismissing it. I've been working on something secret-squirrel for over 5 years. It wasn't until November that I made a major breakthrough, resulting in four computer science revelations. At first, I wrote about it in a blog post; people didn't even believe me. Some researchers I wrote to validated it.\n\nI hadn't really used Claude before, but if nobody cares ... then commercialize it, delete the blog post and code from the open source world. In the last month, Claude has helped turn it from a <700 line algorithm into nearly a full-blown product in its own right.\n\nBut yeah, the moat is small. The core of everything is less than 5k LoC; and it'd be easy af for my soon-to-be competitors to reproduce. The only thing I've got going for me is a non-technical cofounder believing in me and pounding on doors to find our first customer, while I finish up the technical side.\n\nWith the computer science revelations, we can basically keep us 6-8 months ahead for the next couple of years. This is the result of years of hard work, but AI has let me take it to market at an astounding speed."
}
,
{
"id": "46523719",
"text": "I hope self-promotion isn't frowned upon, but I've been spending the past months figuring out a workflow [1] that helps tackle the \"more complicated problems\" and ensure long-term maintainability of projects when done purely through Claude Code.\n\nEffectively, I try to:\n\n- Do not allow the LLM to make any implicit decisions, but instead confirm with the user.\n\n- Ensure code is written in such a way that it's easy to understand for LLMs;\n\n- Capture all \"invisible knowledge\" around decisions and architecture that's difficult to infer from code alone.\n\nIt's based entirely on Claude Code sub-agents + skills. The skills almost all invoke a Python script that guides the agents through workflows.\n\nIt's not a fast workflow: it frequently takes more than 1 hour just for the planning phase. Execution is significantly faster, as (typically) most issues have been discovered during the planning phase already (otherwise it would be considered a bug and I'd improve the workflow based on that).\n\nI'm under the impression that the creator of Claude Code's post is also intended to raise awareness of certain features of Claude Code, such as hand-offs to the cloud and back. Their workflow only works for small features. It reads a bit like someone took a “best practices” guide and turned it into a twitter post. Nice, but not nearly detailed enough for an actual workflow.\n\n[1] https://github.com/solatis/claude-config/"
}
,
{
"id": "46524109",
"text": "> Ensure code is written in such a way that it's easy to understand for LLMs;\n\n> Capture all \"invisible knowledge\" around decisions and architecture that's difficult to infer from code alone.\n\nI work on projects where people love to create all sorts of complex abstractions but also hate writing ADRs (so they don’t) or often any sorts of comments and when they do they’re not very well written. Like the expectation is that you should call and ask the person who wrote something or have a multi-hour meeting where you make decisions and write nothing down.\n\nThat sort of environment is only conductive to manual work, dear reader, avoid those. Heed the advice above about documenting stuff."
}
,
{
"id": "46526689",
"text": "Whether or not we work at the same place, we work at the same place."
}
,
{
"id": "46526508",
"text": "Thanks for sharing and taking the time to document your repo. I’m also sometimes unsure of “self-promotion” — especially when you don’t have anything to sell, including yourself.\n\nI sometimes don’t share links, due to this and then sometimes overshare or miss the mark on relevance.\n\nBut sometimes when I do share people are excited about it, so I’ve leaned more to sharing. Worst is you get some downvotes or negative comments, so why not if there is some lurker who might get benefit.\n\nWhen you don’t blog or influence, how else but in related HN comment threads are like-minded people gonna know about some random GitHub repo?\n\nMy second level hope is that it gets picked up by AI crawlers and get aligned somewhere in the latent space to help prompters find it.\n\nETA: “The [Prompt Engineer] skill was optimized using itself.” That is a whole other self-promotional writeup possibility right there."
}
,
{
"id": "46529217",
"text": "hah thanks for the compliment.\n\nyeah last time I shared it, I got a whole lot of hate for vibe coder self promotional BS so I decided to tread a bit more carefully this time.\n\nI encourage you to try to prompt engineer skill! It’s one of the easiest to use, and you can literally use it on anything, and you’ll also immediately see how the “dynamic prompt workflow” works."
}
,
{
"id": "46524083",
"text": "> Ensure code is written in such a way that it's easy to understand for LLMs\n\nOver the summer last year, I had the AI (Gemini Pro 2.5) write base libraries from scratch that area easy for itself to write code against. Now GPro3 can one-shot (with, at most, a single debug loop at the REPL) 100% of the normal code I need developed (back office/business-type code).\n\nHuge productivity booster, there are a few things that are very easy for humans to do that AI struggles with. By removing them, the AI has been just fantastic to work with."
}
,
{
"id": "46524157",
"text": "How would you characterize code is easy for AI to write code against. - and wouldn't that also be true for humans?"
}
,
{
"id": "46524653",
"text": "AI is greatly aided by clear usage examples and trigger calls, such as \"Use when [xyz]\" types of standard comments."
}
,
{
"id": "46525621",
"text": "All relevant code fits in context. Functional APIs. Standard data structures. Design documents for everything.\n\nI'm doing this in a Clojure context, so that helps—the core language/libraries are unusually stable and widely used and so feature-complete there's basically no hallucinations."
}
,
{
"id": "46523386",
"text": "Yes thank you! I find I get more than enough done (and more than enough code to review) by prompting the agent step by step. I want to see what kind of projects are getting done with multiple async autonomous agents. Was hoping to find youtube videos of someone setting up a project for multiple agents so I could see the cadence of the human stepping in and making directions"
}
,
{
"id": "46523823",
"text": "I run 3-5 on distinct projects often. (20x plan) I quite enjoy the context switching and always have. I have a vanilla setup too, and I don't use plugins/skills/commands, sometimes I enable a MCP server for different things and definitely list out cli tools in my claude.md files. I keep a Google doc open where I list out all the projects I'm working on and write notes as I'm jumping thought the Claude tabs, I also start drafting more complex prompts in the Google doc. I've been using turbo repo a lot so I don't have to context switch the architecture in my head. (But projects still using multiple types of DevOps set ups)\n\nOften these days I vibe code a feedback loop for each project, a way to validate itself as OP said. This adds time to how long Claude takes to complete giving me time to switch context for another active project.\n\nI also use light mode which might help others... jks"
}
,
{
"id": "46523599",
"text": "Multiple instances of agents are an equivalent to tabs in other applications - primarily holders of state, rather than means for extreme parallelism."
}
,
{
"id": "46524024",
"text": "I have not used Claude. But my experience with Gemini and aider is that multiple instances of agents will absolutely stomp over each other. Even in a single sessions overwriting my changes after telling the agent that I did modifications will often result in clobbering."
}
,
{
"id": "46524130",
"text": "See the agent as a coworker ssh-ing on your machine, how would you work efficiently ? By working on the same directory ? No\n\nYou give each agent a git worktree and if you want to check, you checkout their branch."
}
,
{
"id": "46533964",
"text": "You should try Claude opus 4.5 then. I haven’t had that issue. The key is you need to have well defined specs and detailed instructions for each agent."
}
,
{
"id": "46526837",
"text": "Proper sandboxing can fix this. But I didn’t see op mention it which I thought was weird"
}
,
{
"id": "46530167",
"text": "Op mentions in the follow up comments that he does a separate git checkout, one for each of the 5 Claude Code agents he runs. So each is independent and when PRs get submitted that's where the merging happens."
}
,
{
"id": "46523806",
"text": "Personally I just use /resume to switch back to other states when I need to."
}
,
{
"id": "46523709",
"text": "I agree. I'm imagining a large software team with hundreds of tickets \"ready to be worked on\" might support this workflow - but even then, surely you're going to start running into unnecessary conflicts.\n\nThe max Claude instances I've run is 2 because beyond that, I'm - as you say - unable to actually determine the next best course during the processing time. I could spend the entire day planning / designing prompts - and perhaps that will be the most efficient software development practise in the future. And/or perhaps there it is a sign I'm doing insufficient design up front."
}
,
{
"id": "46528012",
"text": "I suppose he may have a list of feature requests and bug reports to work on, but it does seem a bit odd from a human perspective to want to work on 5 or more things literally in parallel, unless they are all so simple that there is no cognitive load and context switching required to mentally juggle them.\n\nWashing dishes in parallel with laundry and cleaning is of course easily possible, but precisely because there is no cognitive load involved. When the washing machine stops you can interrupt what you are doing to load clothes into the drier, then go back to cleaning/whatever. Software development for anything non-trivial obviously has a much higher task-switching overhead. Optimal flow for a purely human developer is to \"load context\" at the beginning of the day, then remain in flow-state without interruptions.\n\nThe cynical part of me can't also help but wonder if Cherny/Anthopic aren't just advocating token-maxxing!"
}
,
{
"id": "46523357",
"text": "Yeah I don’t understand these posts recently with people running 10 at once\n\nCan someone give an example of what each of them would be doing?\n\nAre they just really slow, is that the problem?"
}
,
{
"id": "46523595",
"text": "For me it's their speed, yes. I only run 0-3 at a time, and often the problem at hand is very much not complex. For example \"Take this component out of the file into its own file, including its styles.\" The agent may take 5 minutes for that and what do I do in the meantime? I can start another agent for the next task at hand.\n\nCould also be a bug hunt \"Sometimes we get an error message about XYZ, please investigate how that might happen.\" or \"Please move setting XY from localstorage to cookies\"."
}
,
{
"id": "46524052",
"text": "I rarely run 10 top-level sessions, but I often run multiple.\n\nHere is one case, though:\n\nI have a prototype Ruby compiler that long languished because I didn't have time. I recently picked up work on it again with Claude Code.\n\nThere are literally thousands of unimplemented methods in the standard library. While that has not been my focus so far, my next step for it is to make Claude work on implementing missing methods in 10+ sessions in parallel, because why not? While there are some inter-dependencies (e.g. code that would at least be better with more of the methods of the lowest level core classes already in place), a huge proportion are mostly independent.\n\nIn this case the rubyspec test suite is there to verify compliance. On top of that I have my own tests (does the compiler still compile itself, and does the selftests still run when compiled with self-compiled compiler?) so having 10+ sessions \"pick off\" missing pieces, make an attempt see if it can make it pass, and move on, works well.\n\nMy main limitation is that I have already once run into the weekly limits of my (most expensive) Claude Max subscription, and I need it for other things too for client work and I'm not willing to pay-per-token for the API use for that project since it's not immediately giving me a return.\n\n(And yes, they're \"slow\" - but faster than me; if they were fast enough, then sure, it'd be nicer to have them run serially, the same way if you had time it's easier to get cohesive work if a single developer does all the work on a project instead of having a team try to coordinate)"
}
,
{
"id": "46524719",
"text": "It just happens automatically. Once you set it running and it's chugging away there's nothing for you to do for a while. So of course you start working on something else. Then that is running ... before you know it, 5 of them are going and you have forgotten which is what and this is your new problem."
}
,
{
"id": "46523559",
"text": "Yep.\n\nFor one of the things I am doing, I am the solo developer on a web application. At any given point, there are 4-5 large features I want and I instruct Claude to heavily test those features, so it is not unusual for each to run for 30-45 minutes and for overall conversations to span several hours. People are correct that it often makes mistakes, so that testing phase usually uncovers a bunch of issues it has to fix.\n\nI usually have 1-2 mop up terminal windows open for small things I notice as I go along that I want to fix. Claude can be bad about things like putting white text on a white button and I want a free terminal to just drop every little nitpick into it. They exist for me to just throw small tasks into. Yes, you really should start a new convo every need, but these are small things and I do not want to disrupt my flow.\n\nThere are another 2-3 for smaller features that I am regularly reviewing and resetting. And then another one dedicated to just running the tests already built over and over again and solving any failures or investigating things. Another one is for research to tell me things about the codebase."
}
,
{
"id": "46523876",
"text": "Where is Claude's checkout? Do you have them all share the same local files or does each use its own copy?"
}
,
{
"id": "46523934",
"text": "People are doing this lots of different ways. Some run it in its own containers or in instances on the web. Some are using git worktrees. I use a worktree for anything large, but smaller stuff is just done in the local files.\n\nSloppy? Perhaps, but Claude has never made such a big mess that it has needed its work wiped."
}
,
{
"id": "46524077",
"text": "> Sloppy? Perhaps, but Claude has never made such a big mess that it has needed its work wiped.\n\nI think a key thing to point out to people here is that Claude's built in editing tools won't generally allow it to write to a file that has changed since last time it read it, so if it tries to write and gets an error it will tend to re-read the file, adjust its changes accordingly before trying again. I don't know how foolproof those tests are, because Claude can get creative with sed and cat to edit files, and of course if a change crosses file boundaries this might not avoid broken changes entirely. But generally - as you said - it seems good at avoiding big messes."
}
,
{
"id": "46523483",
"text": "Depends on the project you are working on. Solo on a web app? You probably have 100s of small things to fix. Some more padding there, add a small new feature here, etc."
}
,
{
"id": "46523447",
"text": "> don't need 10 parallel agents making 50-100 PRs a week\n\nI don't like to be mean, but I few weeks ago the guy bragged about Claude helping him do +50k loc and -48k loc(netting a 2k loc), I thought he was joking because I know plenty of programmers who do exactly that without AI, they just commit 10 huge json test files or re-format code.\n\nI almost never open a PR without a thorough cleanup whereas some people seem to love opening huge PRs."
}
,
{
"id": "46533403",
"text": "Agree. People are stuck applying the \"agent\" = \"employee\" analogy and think they are more productive by having a team/company of agents. Unless you've perfectly spec'ed and detailed multiple projects up front, the speed of a single agent shouldn't be the bottleneck."
}
,
{
"id": "46533955",
"text": "That’s how it works though. You create a detailed spec up front. That’s the workflow."
}
,
{
"id": "46528074",
"text": "Potentially, a lot of that isn't just code generation, it *is* requirements gathering, design iteration, analysis, debugging, etc.\n\nI've been using CC for non-programming tasks and its been pretty successful so far, at least for personal projects (bordering on the edge of non-trivial). For instance, I'll get a 'designer' agent coming up with spec, and a 'design-critic' to challenge the design and make the original agent defend their choices. They can ask open questions after each round and I'll provide human feedback. After a few rounds of this, we whittle it down to a decent spec and try it out after handing it off to a coding agent.\n\nAnother example from work: I fired off some code analysis to an agent with the goal of creating integration tests, and then ran a set of spec reviewers in parallel to check its work before creating the actual tickets.\n\nMy point is there are a lot of steps involved in the whole product development process and isn't just \"ship production code\". And we can reduce the ambiguity/hallucinations/sycophancy by creating validation/checkpoints (either tests, 'critic' agents to challenge designs/spec, or human QA/validation when appropriate)\n\nThe end game of this approach is you have dozens or hundreds of agents running via some kind of orchestrator churning through a backlog that is combination human + AI generated, and the system posts questions to the human user(s) to gather feedback. The human spends most of the time doing high-level design/validation and answering open questions.\n\nYou definitely incur some cognitive debt and risk it doing something you don't want, but thats part of the fun for me (assuming it doesn't kill my AI bill)."
}
,
{
"id": "46527266",
"text": "My impression is that people who are exploring coordinated multi-agent-coding systems are working towards replacing full teams, not augmenting individuals. \"Meaningful supervising role\" becomes \"automated quality and process control\"; \"generate requirements quickly\" -> we already do this for large human software teams.\n\nIf that's the goal, then we shouldn't interpret the current experiment as the destination."
}
,
{
"id": "46523612",
"text": "I use Beads which makes it more easy to grasp since its \"tickets\" for the agent, and I tell it what I want, it creates a bead (or \"ticket\") and then I ask it to do research, brain dump on it, and even ask it to ask me clarifying questions, and it updates the tasks, by the end once I have a few tasks with essentially a well defined prompt, I tell Claude to run x tasks in parallel, sometimes I dump a bunch of different tasks and ask it to research them all in parallel, and it fills them in, and I review. When it's all over, I test the code, look at the code, and mention any follow ups.\n\nI guess it comes down to, how much do you trust the agent? If you don't trust it fully you want to inspect everything, which you still can, but you can choose to do it after it runs wild instead of every second it works."
}
,
{
"id": "46523434",
"text": "LLM agents can be a bit like slot machines. The more the merrier.\n\nAnd at least two generate continuous shitposts for their companies Slack.\n\nThat said, having one write code and a clean context review it is helpful."
}
,
{
"id": "46524526",
"text": "I would do the same thing if I would justifing paying 200$ per Month for my hobby. But even with that, you will run into throttling / API / Resource limits.\n\nBut AI Agents need time. They need a little bit of reading the sourcecode, proposing the change, making the change, running the verification loop, creating the git commit etc. Can be a minute, can be 10 and potentially a lot longer too.\n\nSo if your code base is big enough that you can work o different topics, you just do that:\n\n- Fix this small bug in the UI when xy happens\n- Add a new field to this form\n- Cleanup the README with content x\n- . . .\n\nI'm an architect at work and have done product management on the side as its a very technical project. I have very little problem coming up with things to fix, enhnace, cleanup etc. I have hard limits on my headcount.\n\nI could easily do a handful of things in parallel and keeping that in my head. Working memory might be limited but working memory means something different than following 10 topics. Especially if there are a few tpics inbetween which just take time with the whole feedback loop.\n\nBut regarding your example of house cleaning: I have ADHD, i sometimes work like this. Working on something, waiting for a build and cleaning soming in parallel.\n\nWhat you are missing is the practical experience with Agents. Taking the time and energy of setting up something for you, perhaps accessability too?\n\nWe only got access at work to claude code since end of last year."
}
,
{
"id": "46524980",
"text": "> It's like someone is claiming they unlocked ultimate productivity by washing dishes, in parallel with doing laundry, and cleaning their house.\n\nIn this case you have to take a leap of faith and assume that Claude or Codex will get each task done correctly enough that your house won't burn down."
}
,
{
"id": "46523346",
"text": "This is it! “I don't need 10 parallel agents making 50-100 PRs a week, I need 1 agent that successfully solves the most important problem.”"
}
,
{
"id": "46523554",
"text": "maybe more like throw shits to the wall and see what sticks?"
}
]
</comments_to_classify>
Based on the comments above, assign each to up to 3 relevant topics.
Return ONLY a JSON array with this exact structure (no other text):
[
{
"id": "comment_id_1",
"topics": [
1,
3,
5
]
}
,
{
"id": "comment_id_2",
"topics": [
2
]
}
,
{
"id": "comment_id_3",
"topics": [
0
]
}
,
...
]
Rules:
- Each comment can have 0 to 3 topics
- Use 1-based topic indices for matches
- Use index 0 if the comment does not fit well in any category
- Only assign topics that are genuinely relevant to the comment
Remember: Output ONLY the JSON array, no other text.