llm/065c6e83-d0d5-4aca-be3d-92768a8a3506/batch-6-714c8f1c-7b3a-47c6-9ff9-f080f6565578-input.json
The following is content for you to classify. Do not respond to the comments—classify them.
<topics>
1. Not Novel or Revolutionary
Related: Many commenters argue this workflow is standard practice, not radically different. References to existing tools like Kiro, OpenSpec, SpecKit, and Antigravity that already implement spec-driven development. Claims the approach was documented 2+ years ago in Cursor forums.
2. LLMs as Junior Developers
Related: Analogy comparing LLMs to unreliable interns with boundless energy. Discussion of treating AI like junior developers requiring supervision, documentation, and oversight. The shift from coder to software manager role.
3. AI-Generated Article Concerns
Related: Multiple commenters suspect the article itself was written by AI, noting characteristic style and patterns. Debate about whether AI-written content should be evaluated differently or dismissed outright.
4. Magic Words and Prompt Engineering
Related: Skepticism about whether words like 'deeply' and 'in great details' actually affect LLM behavior. Discussion of attention mechanisms, emotional prompting research, and whether prompt techniques are superstition or cargo cult.
5. Planning vs Just Coding
Related: Debate about whether extensive planning overhead eliminates time savings. Some argue writing specs takes longer than writing code. Others counter that planning prevents compounding errors and technical debt.
6. Spec-Driven Development Tools
Related: References to existing frameworks: OpenSpec, SpecKit, BMAD-METHOD, Kiro, Antigravity. Discussion of how these tools formalize the research-plan-implement workflow described in the article.
7. Context Window Management
Related: Strategies for handling large codebases and context limits. Maintaining markdown files for subsystems, using skills, aggressive compaction. Concerns about context rot and performance degradation.
8. Waterfall Methodology Comparison
Related: Commenters note the approach resembles waterfall development with detailed upfront planning. Discussion of whether this contradicts agile principles or represents rediscovering proven methods.
9. Test-Driven Development Integration
Related: Suggestions to add comprehensive tests to the workflow. Writing tests before implementation, using tests as verification. Arguments that test coverage enables safer refactoring with AI.
10. Single Session vs Multiple Sessions
Related: Author's claim of running entire workflows in single long sessions without performance degradation. Others recommend clearing context between phases for better results.
11. Determinism and Reproducibility
Related: Concerns about non-deterministic LLM outputs. Discussion of whether software engineering can accommodate probabilistic tools. Comparisons to gambling and slot machines.
12. Token Cost Considerations
Related: Discussion of workflow being token-heavy and expensive. Comparisons between Claude subscription tiers. Arguments that simpler approaches save money while achieving similar results.
13. Annotation Workflow Details
Related: Questions about how to format inline annotations for Claude to recognize. Techniques like TODO prefixes, HTML comments, and clear separation between human and AI-written content.
14. Subagent Architecture
Related: Using multiple agents for different phases: planning, implementation, review. Red team/blue team approaches. Dispatching parallel agents for independent tasks.
15. Reference Implementation Technique
Related: Using existing code from open source projects as examples for Claude. Questions about licensing implications. Claims this dramatically improves output quality.
16. Claude vs Other Models
Related: Comparisons between Claude, Codex, Gemini, and other models. Discussion of model-specific behaviors and optimal prompting strategies. Using multiple models in complementary roles.
17. Greenfield vs Existing Codebases
Related: Observation that most AI coding articles focus on greenfield development. Different challenges when working with legacy code and established patterns.
18. Human Review Requirements
Related: Debate about whether all AI-generated code must be reviewed line-by-line. Questions about trust, liability, and whether AI can eventually be trusted without oversight.
19. Productivity Claims Skepticism
Related: Questions about actual time savings versus perceived productivity. References to studies showing AI sometimes makes developers less productive. Concerns about false progress.
20. Documentation as Side Benefit
Related: Plans and research documents serve as valuable documentation for future maintainers. Version controlling plan files in git. Using plans to understand architectural decisions later.
0. Does not fit well in any category
</topics>
<comments_to_classify>
[
{
"id": "47108820",
"text": "The crowd around this pot shows how superficial is knowledge about claude code. It gets releases each day and most of this is already built in the vanilla version. Not to mention subagent working in work trees, memory.md, plan on which you can comment directly from the interface, subagents launched in research phase, but also some basic mcp's like LSP/IDE integration, and context7 to not to be stuck in the knowledge cutoff/past.\n\nWhen you go to YouTube and search for stuff like \"7 levels of claude code\" this post would be maybe 3-4.\n\nOh, one more thing - quality is not consistent, so be ready for 2-3 rounds of \"are you happy with the code you wrote\" and defining audit skills crafted for your application domain - like for example RODO/Compliance audit etc."
}
,
{
"id": "47108962",
"text": "I'm using the in-built features as well, but I like the flow that I have with superpowers. You've made a lot of assumptions with your comment that are just not true (at least for me).\n\nI find that brainstorming + (executing plans OR subagent driven development) is way more reliable than the built-in tooling."
}
,
{
"id": "47109322",
"text": "I try these staging-document patterns, but suspect they have 2 fundamental flaws that stem mostly from our own biases.\n\nFirst, Claude evolves. The original post work pattern evolved over 9 months, before claude's recent step changes. It's likely claude's present plan mode is better than this workaround, but if you stick to the workaround, you'd never know.\n\nSecond, the staging docs that represent some context - whether a library skills or current session design and implementation plans - are not the model Claude works with. At best they are shaping it, but I've found it does ignore and forget even what's written (even when I shout with emphasis), and the overall session influences the code. (Most often this happens when a peripheral adjustment ends up populating half the context.)\n\nIndeed the biggest benefit from the OP might be to squeeze within 1 session, omitting peripheral features and investigations at the plan stage. So the mechanism of action might be the combination of getting our own plan clear and avoiding confusing excursions. (A test for that would be to redo the session with the final plan and implementation, to see if the iteration process itself is shaping the model.)\n\nOur bias is to believe that we're getting better at managing this thing, and that we can control and direct it. It's uncomfortable to realize you can only really influence it - much like giving direction to a junior, but they can still go off track. And even if you found a pattern that works, it might work for reasons you're not understanding -- and thus fail you eventually. So, yes, try some patterns, but always hang on to the newbie senses of wonder and terror that make you curious, alert, and experimental."
}
,
{
"id": "47110045",
"text": "What I've read is that even with all the meticulous planning, the author still needed to intervene. Not at the end but at the middle, unless it will continue building out something wrong and its even harder to fix once it's done. It'll cost even more tokens. It's a net negative.\n\nYou might say a junior might do the same thing, but I'm not worried about it, at least the junior learned something while doing that. They could do it better next time. They know the code and change it from the middle where it broke. It's a net positive."
}
,
{
"id": "47110072",
"text": "Unfortunately, you could argue that the model provider has also learned something, i.e. the interaction can be used as additional training data to train subsequent models."
}
,
{
"id": "47110053",
"text": "this comment is the first truly humane one ive read regarding this whole AI fiasco"
}
,
{
"id": "47109468",
"text": "Planning is important because you get the LLM to explain the problem and solution in its language and structure, not yours.\n\nThis shortcuts a range of problem cases where the LLM fights between the users strict and potentially conflicting requirements, and its own learning.\n\nIn the early days we used to get LLM to write the prompts for us to get round this problem, now we have planning built in."
}
,
{
"id": "47109348",
"text": "This is the flow I've found myself working towards. Essentially maintaining more and more layered documentation for the LLM produces better and more consistent results. What is great here is the emphasis on the use of such documents in the planning phase. I'm feeling much more motivated to write solid documentation recently, because I know someone (the LLM) is actually going to read it! I've noticed my efforts and skill acquisition have moved sharply from app developer towards DevOps and architecture / management, but I think I'll always be grateful for the application engineering experience that I think the next wave of devs might miss out on.\n\nI've also noted such a huge gulf between some developers describing 'prompting things into existence' and the approach described in this article. Both types seem to report success, though my experience is that the latter seems more realistic, and much more likely to produce robust code that's likely to be maintainable for long term or project critical goals."
}
,
{
"id": "47107019",
"text": "I do something very similar, also with Claude and Codex, because the workflow is controlled by me, not by the tool. But instead of plan.md I use a ticket system basically like ticket_<number>_<slug>.md where I let the agent create the ticket from a chat, correct and annotate it afterwards and send it back, sometimes to a new agent instance. This workflow helps me keeping track of what has been done over time in the projects I work on. Also this approach does not need any „real“ ticket system tooling/mcp/skill/whatever since it works purely on text files."
}
,
{
"id": "47107081",
"text": "+1 to creating tickets by simply asking the agent to. It's worked great and larger tasks can be broken down into smaller subtasks that could reasonably be completed in a single context window, so you rarely every have to deal with compaction. Especially in the last few months since Claude's gotten good at dispatching agents to handle tasks if you ask it to, I can plan large changes that span multilpe tickets and tell claude to dispatch agents as needed to handle them (which it will do in parallel if they mostly touch different files), keeping the main chat relatively clean for orchestration and validation work."
}
,
{
"id": "47107593",
"text": "semantic plan name is important"
}
,
{
"id": "47107048",
"text": "Regarding inline notes, I use a specific format in the `/plan` command, by using th `ME:` prefix.\n\nhttps://github.com/srid/AI/blob/master/commands/plan.md#2-pl...\n\nIt works very similar to Antigravity's plan document comment-refine cycle.\n\nhttps://antigravity.google/docs/implementation-plan"
}
,
{
"id": "47110172",
"text": "Shameless plug: https://beadhub.ai allows you to do exactly that, but with several agents in parallel. One of them is in the role of planner, which takes care of the source-of-truth document and the long term view. They all stay in sync with real-time chat and mail.\n\nIt's OSS.\n\nReal-time work is happening at https://app.beadhub.ai/juanre/beadhub (beadhub is a public project at https://beadhub.ai so it is visible).\n\nParticularly interesting (I think) is how the agents chat with each other, which you can see at https://app.beadhub.ai/juanre/beadhub/chat"
}
,
{
"id": "47108690",
"text": "I’ve been using this same pattern, except not the research phase. Definetly will try to add it to my process aswell.\n\nSometimes when doing big task I ask claude to implement each phase seprately and review the code after each step."
}
,
{
"id": "47107940",
"text": "The annotation cycle is the key insight for me. Treating the plan as a living doc you iterate on before touching any code makes a huge difference in output quality.\n\nExperimentally, i've been using mfbt.ai [ https://mfbt.ai ] for roughly the same thing in a team context. it lets you collaboratively nail down the spec with AI before handing off to a coding agent via MCP.\n\nAvoids the \"everyone has a slightly different plan.md on their machine\" problem. Still early days but it's been a nice fit for this kind of workflow."
}
,
{
"id": "47107981",
"text": "I agree, and this is why I tend to use gptel in emacs for planning - the document is the conversation context, and can be edited and annotated as you like."
}
,
{
"id": "47111425",
"text": "So we’re back to waterfall huh"
}
,
{
"id": "47108681",
"text": "I've been working off and on on a vibe coded FP language and transpiler - mostly just to get more experience with Claude Code and see how it handles complex real world projects. I've settled on a very similar flow, though I use three documents: plan, context, task list. Multiple rounds of iteration when planning a feature. After completion, have a clean session do an audit to confirm that everything was implemented per the design. Then I have both Claude and CodeRabbit do code review passes before I finally do manual review. VERY heavy emphasis on tests, the project currently has 2x more test code than application code. So far it works surprisingly well. Example planning docs below -\n\nhttps://github.com/mbcrawfo/vibefun/tree/main/.claude/archiv..."
}
,
{
"id": "47108346",
"text": "I've been teaching AI coding tool workshops for the past year and this planning-first approach is by far the most reliable pattern I've seen across skill levels.\n\nThe key insight that most people miss: this isn't a new workflow invented for AI - it's how good senior engineers already work. You read the code deeply, write a design doc, get buy-in, then implement. The AI just makes the implementation phase dramatically faster.\n\nWhat I've found interesting is that the people who struggle most with AI coding tools are often junior devs who never developed the habit of planning before coding. They jump straight to \"build me X\" and get frustrated when the output is a mess. Meanwhile, engineers with 10+ years of experience who are used to writing design docs and reviewing code pick it up almost instantly - because the hard part was always the planning, not the typing.\n\nOne addition I'd make to this workflow: version your research.md and plan.md files in git alongside your code. They become incredibly valuable documentation for future maintainers (including future-you) trying to understand why certain architectural decisions were made."
}
,
{
"id": "47111240",
"text": "> it's how good senior engineers already work\n\nThe other trick all good ones I’ve worked with converged on: it’s quicker to write code than review it (if we’re being thorough). Agents have some areas where they can really shine (boilerplate you should maybe have automated already being one), but most of their speed comes from passing the quality checking to your users or coworkers.\n\nJuniors and other humans are valuable because eventually I trust them enough to not review their work. I don’t know if LLMs can ever get here for serious industries."
}
,
{
"id": "47110186",
"text": "The biggest roadblock to using agents to maximum effectiveness like this is the chat interface. It's convenience as detriment and convenience as distraction. I've found myself repeatedly giving into that convenience only to realize that I have wasted an hour and need to start over because the agent is just obliviously circling the solution that I thought was fully obvious from the context I gave it. Clearly these tools are exceptional at transforming inputs into outputs and, counterintuitively, not as exceptional when the inputs are constantly interleaved with the outputs like they are in chat mode."
}
,
{
"id": "47109884",
"text": "This is similar to what I do. I instruct an Architect mode with a set of rules related to phased implementation and detailed code artifacts output to a report.md file. After a couple of rounds of review and usually some responses that either tie together behaviors across context, critique poor choices or correct assumptions, there is a piece of work defined for a coder LLM to perform. With the new Opus 4.6 I then select specialist agents to review the report.md, prompted with detailed insight into particular areas of the software. The feedback from these specialist agent reviews is often very good and sometimes catches things I had missed. Once all of this is done, I let the agent make the changes and move onto doing something else. I typically rename and commit the report.md files which can be useful as an alternative to git diff / commit messages etc."
}
,
{
"id": "47107931",
"text": "The author is quite far on their journey but would benefit from writing simple scripts to enforce invariants in their codebase. Invariant broken? Script exits with a non-zero exit code and some output that tells the agent how to address the problem. Scripts are deterministic, run in milliseconds, and use zero tokens. Put them in husky or pre-commit, install the git hooks, and your agent won’t be able to commit without all your scripts succeeding.\n\nAnd “Don’t change this function signature” should be enforced not by anticipating that your coding agent “might change this function signature so we better warn it not to” but rather via an end to end test that fails if the function signature is changed (because the other code that needs it not to change now has an error). That takes the author out of the loop and they can not watch for the change in order to issue said correction, and instead sip coffee while the agent observes that it caused a test failure then corrects it without intervention, probably by rolling back the function signature change and changing something else."
}
,
{
"id": "47108464",
"text": "Lol I wrote about this and been using plan+execute workflow for 8 months.\n\nSadly my post didn't much attention at the time.\n\nhttps://thegroundtruth.media/p/my-claude-code-workflow-and-p..."
}
,
{
"id": "47108652",
"text": "I have to give this a try. My current model for backend is the same as how author does frontend iteration. My friend does the research-plan-edit-implement loop, and there is no real difference between the quality of what I do and what he does. But I do like this just for how it serves as documentation of the thought process across AI/human, and can be added to version control. Instead of humans reviewing PRs, perhaps humans can review the research/plan document.\n\nOn the PR review front, I give Claude the ticket number and the branch (or PR) and ask it to review for correctness, bugs and design consistency. The prompt is always roughly the same for every PR. It does a very good job there too.\n\nModelwise, Opus 4.6 is scary good!"
}
,
{
"id": "47109997",
"text": "The separation of planning and execution resonates strongly. I've been using a similar pattern when building with AI APIs — write the spec/plan in natural language first, then let the model execute against it.\n\nOne addition that's worked well for me: keeping a persistent context file that the model reads at the start of each session. Instead of re-explaining the project every time, you maintain a living document of decisions, constraints, and current state. Turns each session into a continuation rather than a cold start.\n\nThe biggest productivity gain isn't in the code generation itself — it's in reducing the re-orientation overhead between sessions."
}
,
{
"id": "47108414",
"text": "https://github.blog/ai-and-ml/generative-ai/spec-driven-deve..."
}
,
{
"id": "47108070",
"text": "> “remove this section entirely, we don’t need caching here” — rejecting a proposed approach\n\nI wonder why you don't remove it yourself. Aren't you already editing the plan?"
}
,
{
"id": "47107966",
"text": "Interesting! I feel like I'm learning to code all over again! I've only been using Claude for a little more than a month and until now I've been figuring things out on my own. Building my methodology from scratch. This is much more advanced than what I'm doing. I've been going straight to implementation, but doing one very small and limited feature at a time, describing implementation details (data structures like this, use that API here, import this library etc) verifying it manually, and having Claude fix things I don't like. I had just started getting annoyed that it would make the same (or very similar) mistake over and over again and I would have to fix it every time. This seems like it'll solve that problem I had only just identified! Neat!"
}
,
{
"id": "47107464",
"text": "> Most developers type a prompt, sometimes use plan mode, fix the errors, repeat.\n\n> ...\n\n> never let Claude write code until you’ve reviewed and approved a written plan\n\nI certainly always work towards an approved plan before I let it lost on changing the code. I just assumed most people did, honestly. Admittedly, sometimes there's \"phases\" to the implementation (because some parts can be figured out later and it's more important to get the key parts up and running first), but each phase gets a full, reviewed plan before I tell it to go.\n\nIn fact, I just finished writing a command and instruction to tell claude that, when it presents a plan for implementation, offer me another option; to write out the current (important parts of the) context and the full plan to individual (ticket specific) md files. That way, if something goes wrong with the implementation I can tell it to read those files and \"start from where they left off\" in the planning."
}
,
{
"id": "47107609",
"text": "The author seems to think theyve invented a special workflow...\n\nWe all tend to regress to average (same thoughts/workflows)...\n\nHave had many users already doing the exact same workflow with:\nhttps://github.com/backnotprop/plannotator"
}
,
{
"id": "47107683",
"text": "4 times in one thread, please stop spamming this link."
}
,
{
"id": "47108819",
"text": "Haha this is surprisingly and exactly how I use claude as well. Quite fascinating that we independently discovered the same workflow.\n\nI maintain two directories: \"docs/proposals\" (for the research md files) and \"docs/plans\" (for the planning md files). For complex research files, I typically break them down into multiple planning md files so claude can implement one at a time.\n\nA small difference in my workflow is that I use subagents during implementation to avoid context from filling up quickly."
}
,
{
"id": "47108879",
"text": "Same, I formalized a similar workflow for my team (oriented around feature requirement docs), I am thinking about fully productizing it and am looking to for feedback - https://acai.sh\n\nEven if the product doesn’t resonate I think I’ve stumbled on some ideas you might find useful^\n\nI do think spec-driven development is where this all goes. Still making up my mind though."
}
,
{
"id": "47109249",
"text": "Spec-driven looks very much like what the author describes. He may have some tweaks of his own but they could just as well be coded into the artifacts that something like OpenSpec produces."
}
,
{
"id": "47108919",
"text": "This is basically long-lived specs that are used as tests to check that the product still adheres to the original idea that you wanted to implement, right?\n\nThis inspired me to finally write good old playwright tests for my website :)."
}
,
{
"id": "47107320",
"text": "I recently discovered GitHub speckit which separates planning/execution in stages: specify, plan, tasks, implement. Finding it aligns with the OP with the level of “focus” and “attention” this gets out of Claude Code.\n\nSpeckit is worth trying as it automates what is being described here, and with Opus 4.6 it's been a kind of BC/AD moment for me."
}
,
{
"id": "47107427",
"text": "Try OpenSpec and it'll do all this for you. SpecKit works too. I don't think there's a need to reinvent the wheel on this one, as this is spec-driven development."
}
,
{
"id": "47110512",
"text": "Gemini is better at research Claude at coding. I try to use Gemini to do all the research and write out instruction on what to do what process to follow then use it in Claude. Though I am mostly creating small python scripts"
}
,
{
"id": "47110075",
"text": "Every \"how I use Claude Code\" post will get into the HN frontpage.\n\nWhich maybe has to do with people wanting to show how they use Claude Code in the comments!"
}
,
{
"id": "47109812",
"text": "In my own tests I have found opus to be very good at writing plans, terrible at executing them. It typically ignores half of the constraints.\nhttps://x.com/xundecidability/status/2019794391338987906?s=2...\nhttps://x.com/xundecidability/status/2024210197959627048?s=2..."
}
,
{
"id": "47110007",
"text": "1. Don't implement too much at at time\n\n2. Have the agent review if it followed the plan and relevant skills accurately."
}
,
{
"id": "47110047",
"text": "the first link was from a simple request with fewer than 1000 tokens total in the context window, just a short shell script.\n\nhere is another one which had about 200 tokens and opus decided to change the model name i requested.\n\nhttps://x.com/xundecidability/status/2005647216741105962?s=2...\n\nopus is bad at instruction following now."
}
,
{
"id": "47110374",
"text": "I just use Jesse’s “superpowers” plugin. It does all of this but also steps you through the design and gives you bite sized chunks and you make architecture decisions along the way. Far better than making big changes to an already established plan."
}
,
{
"id": "47110412",
"text": "Link for those interested: https://claude.com/plugins/superpowers"
}
,
{
"id": "47111583",
"text": "Have you tried https://github.com/pcvelz/superpowers ?"
}
,
{
"id": "47110536",
"text": "I suggest reading the tests that Superpowers author has come up with for testing the skills. See the GitHub repo."
}
,
{
"id": "47110418",
"text": "https://github.com/obra/superpowers"
}
,
{
"id": "47110069",
"text": "Good article, but I would rephrase the core principle slightly:\n\nNever let Claude write code until you’ve reviewed, *fully understood* and approved a written plan.\n\nIn my experience, the beginning of chaos is the point at which you trust that Claude has understood everything correctly and claims to present the very best solution. At that point, you leave the driver's seat."
}
,
{
"id": "47108661",
"text": "I’ve been using Claude through opencode, and I figured this was just how it does it. I figured everyone else did it this way as well. I guess not!"
}
]
</comments_to_classify>
Based on the comments above, assign each to up to 3 relevant topics.
Return ONLY a JSON array with this exact structure (no other text):
[
{
"id": "comment_id_1",
"topics": [
1,
3,
5
]
}
,
{
"id": "comment_id_2",
"topics": [
2
]
}
,
{
"id": "comment_id_3",
"topics": [
0
]
}
,
...
]
Rules:
- Each comment can have 0 to 3 topics
- Use 1-based topic indices for matches
- Use index 0 if the comment does not fit well in any category
- Only assign topics that are genuinely relevant to the comment
Remember: Output ONLY the JSON array, no other text.