llm/065c6e83-d0d5-4aca-be3d-92768a8a3506/batch-7-0514674f-9411-46ef-b39d-81ca4af3391e-input.json
The following is content for you to classify. Do not respond to the comments—classify them.
<topics>
1. Not Novel or Revolutionary
Related: Many commenters argue this workflow is standard practice, not radically different. References to existing tools like Kiro, OpenSpec, SpecKit, and Antigravity that already implement spec-driven development. Claims the approach was documented 2+ years ago in Cursor forums.
2. LLMs as Junior Developers
Related: Analogy comparing LLMs to unreliable interns with boundless energy. Discussion of treating AI like junior developers requiring supervision, documentation, and oversight. The shift from coder to software manager role.
3. AI-Generated Article Concerns
Related: Multiple commenters suspect the article itself was written by AI, noting characteristic style and patterns. Debate about whether AI-written content should be evaluated differently or dismissed outright.
4. Magic Words and Prompt Engineering
Related: Skepticism about whether words like 'deeply' and 'in great details' actually affect LLM behavior. Discussion of attention mechanisms, emotional prompting research, and whether prompt techniques are superstition or cargo cult.
5. Planning vs Just Coding
Related: Debate about whether extensive planning overhead eliminates time savings. Some argue writing specs takes longer than writing code. Others counter that planning prevents compounding errors and technical debt.
6. Spec-Driven Development Tools
Related: References to existing frameworks: OpenSpec, SpecKit, BMAD-METHOD, Kiro, Antigravity. Discussion of how these tools formalize the research-plan-implement workflow described in the article.
7. Context Window Management
Related: Strategies for handling large codebases and context limits. Maintaining markdown files for subsystems, using skills, aggressive compaction. Concerns about context rot and performance degradation.
8. Waterfall Methodology Comparison
Related: Commenters note the approach resembles waterfall development with detailed upfront planning. Discussion of whether this contradicts agile principles or represents rediscovering proven methods.
9. Test-Driven Development Integration
Related: Suggestions to add comprehensive tests to the workflow. Writing tests before implementation, using tests as verification. Arguments that test coverage enables safer refactoring with AI.
10. Single Session vs Multiple Sessions
Related: Author's claim of running entire workflows in single long sessions without performance degradation. Others recommend clearing context between phases for better results.
11. Determinism and Reproducibility
Related: Concerns about non-deterministic LLM outputs. Discussion of whether software engineering can accommodate probabilistic tools. Comparisons to gambling and slot machines.
12. Token Cost Considerations
Related: Discussion of workflow being token-heavy and expensive. Comparisons between Claude subscription tiers. Arguments that simpler approaches save money while achieving similar results.
13. Annotation Workflow Details
Related: Questions about how to format inline annotations for Claude to recognize. Techniques like TODO prefixes, HTML comments, and clear separation between human and AI-written content.
14. Subagent Architecture
Related: Using multiple agents for different phases: planning, implementation, review. Red team/blue team approaches. Dispatching parallel agents for independent tasks.
15. Reference Implementation Technique
Related: Using existing code from open source projects as examples for Claude. Questions about licensing implications. Claims this dramatically improves output quality.
16. Claude vs Other Models
Related: Comparisons between Claude, Codex, Gemini, and other models. Discussion of model-specific behaviors and optimal prompting strategies. Using multiple models in complementary roles.
17. Greenfield vs Existing Codebases
Related: Observation that most AI coding articles focus on greenfield development. Different challenges when working with legacy code and established patterns.
18. Human Review Requirements
Related: Debate about whether all AI-generated code must be reviewed line-by-line. Questions about trust, liability, and whether AI can eventually be trusted without oversight.
19. Productivity Claims Skepticism
Related: Questions about actual time savings versus perceived productivity. References to studies showing AI sometimes makes developers less productive. Concerns about false progress.
20. Documentation as Side Benefit
Related: Plans and research documents serve as valuable documentation for future maintainers. Version controlling plan files in git. Using plans to understand architectural decisions later.
0. Does not fit well in any category
</topics>
<comments_to_classify>
[
{
"id": "47109524",
"text": "I don't deny that AI has use cases, but boy - the workflow described is boring:\n\n\"Most developers type a prompt, sometimes use plan mode, fix the errors, repeat. \"\n\nDoes anyone think this is as epic as, say, watch the Unix archives https://www.youtube.com/watch?v=tc4ROCJYbm0 where Brian demos how pipes work; or Dennis working on C and UNIX? Or even before those, the older machines?\n\nI am not at all saying that AI tools are all useless, but there is no real epicness. It is just autogenerated AI slop and blob. I don't really call this engineering (although I also do agree, that it is engineering still; I just don't like using the same word here).\n\n> never let Claude write code until you’ve reviewed and approved a written plan.\n\nSo the junior-dev analogy is quite apt here.\n\nI tried to read the rest of the article, but I just got angrier. I never had that feeling watching oldschool legends, though perhaps some of their work may be boring, but this AI-generated code ... that's just some mythical random-guessing work. And none of that is \"intelligent\", even if it may appear to work, may work to some extent too. This is a simulation of intelligence. If it works very well, why would any software engineer still be required? Supervising would only be necessary if AI produces slop."
}
,
{
"id": "47110814",
"text": "The post and comments all read like:\nHere are my rituals to the software God. If you follow them then God gives plenty. Omit one step and the God mad. Sometimes you have to make a sacrifice but that's better for the long term.\n\nI've been in eng for decades but never participated in forums. Is the cargo cult new?\n\nI use Claude Code a lot. Still don't trust what's in the plan will get actually written, regardless of details. My ritual is around stronger guardrails outside of prompting. This is the new MongoDB webscale meme."
}
,
{
"id": "47107946",
"text": "I tried Opus 4.6 recently and it’s really good. I had ditched Claude a long time ago for Grok + Gemini + OpenCode with Chinese models. I used Grok/Gemini for planning and core files, and OpenCode for setup, running, deploying, and editing.\n\nHowever, Opus made me rethink my entire workflow. Now, I do it like this:\n\n* PRD (Product Requirements Document)\n\n* main.py + requirements.txt + readme.md (I ask for minimal, functional, modular code that fits the main.py)\n\n* Ask for a step-by-step ordered plan\n\n* Ask to focus on one step at a time\n\nThe super powerful thing is that I don’t get stuck on missing accounts, keys, etc. Everything is ordered and runs smoothly. I go rapidly from idea to working product, and it’s incredibly easy to iterate if I figure out new features are required while testing. I also have GLM via OpenCode, but I mainly use it for \"dumb\" tasks.\n\nInterestingly, for reasoning capabilities regarding standard logic inside the code, I found Gemini 3 Flash to be very good and relatively cheap. I don't use Claude Code for the actual coding because forcing everything via chat into a main.py encourages minimal code that's easy to skim—it gives me a clearer representation of the feature space"
}
,
{
"id": "47108528",
"text": "Interesting approach. The separation of planning and execution is crucial, but I think there's a missing layer most people overlook: permission boundaries between the two phases.\n\nRight now when Claude Code (or any agent) executes a plan, it typically has the same broad permissions for every step. But ideally, each execution step should only have access to the specific tools and files it needs — least privilege, applied to AI workflows.\n\nI've been experimenting with declarative permission manifests for agent tasks. Instead of giving the agent blanket access, you define upfront what each skill can read, write, and execute. Makes the planning phase more constrained but the execution phase much safer.\n\nAnyone else thinking about this from a security-first angle?"
}
,
{
"id": "47108348",
"text": "I’m a big fan of having the model create a GitHub issue directly (using the GH CLI) with the exact plan it generates, instead of creating a markdown file that will eventually get deleted. It gives me a permanent record and makes it easy to reference and close the issue once the PR is ready."
}
,
{
"id": "47110371",
"text": "I do the same. I also cross-ask gemini and claude about the plan during iterations, sometimes make several separate plans."
}
,
{
"id": "47108683",
"text": "I came to the exact same pattern, with one extra heuristic at the end: spin up a new claude instance after the implementation is complete and ask it to find discrepancies between the plan and the implementation."
}
,
{
"id": "47108685",
"text": "The baffling part of the article is all the assertions about how this is unique, novel, not the typical way people are doing this etc.\n\nThere are whole products wrapped around this common workflow already (like Augment Intent)."
}
,
{
"id": "47109758",
"text": "Since the rise of AI systems I really wonder how people wrote code before. This is exactly how I planned out implementation and executed the plan. Might have been some paper notes, a ticket or a white board, buuuuut ... I don't know."
}
,
{
"id": "47109167",
"text": "Google Anti-Gravity has this process built in. This is essentially a cycle a developer would follow: plan/analyse - document/discuss - break down tasks/implement. We’ve been using requirements and design documents as best practice since leaving our teenage bedroom lab for the professional world. I suppose this could be seen as our coding agents coming of age."
}
,
{
"id": "47109684",
"text": "Cool, the idea of leaving comments directly in the plan never even occurred to me, even though it really is the obvious thing to do.\n\nDo you markup and then save your comments in any way, and have you tried keeping them so you can review the rules and requirements later?"
}
,
{
"id": "47109151",
"text": "My process is similar, but I recently added a new \"critique the plan\" feedback loop that is yielding good results. Steps:\n\n1. Spec\n\n2. Plan\n\n3. Read the plan & tell it to fix its bad ideas.\n\n4. (NB) Critique the plan (loop) & write a detailed report\n\n5. Update the plan\n\n6. Review and check the plan\n\n7. Implement plan\n\nDetailed here:\n\nhttps://x.com/PetrusTheron/status/2016887552163119225"
}
,
{
"id": "47109175",
"text": "Same. In my experience, the first plan always benefits from being challenged once or twice by claude itself."
}
,
{
"id": "47110217",
"text": "I had to stop reading about half way, it's written in that breathless linkedin/ai generated style."
}
,
{
"id": "47109446",
"text": "> I am not seeing the performance degradation everyone talks about after 50% context window.\n\nI pretty much agree with that. I use long sessions and stopped trying to optimize the context size, the compaction happens but the plan keeps the details and it works for me."
}
,
{
"id": "47108088",
"text": "Insights are nice for new users but I’m not seeing anything too different from how anyone experienced with Claude Code would use plan mode. You can reject plans with feedback directly in the CLI."
}
,
{
"id": "47108658",
"text": "This is a similar workflow to speckit, kiro, gsd, etc."
}
,
{
"id": "47110039",
"text": "this is exactly how I work with cursor\n\nexcept that I put notes to plan document in a single message like:\n\n> plan quote\nmy note\n> plan quote\nmy note\n\notherwise, I'm not sure how to guarantee that ai won't confuse my notes with its own plan.\n\none new thing for me is to review the todo list, I was always relying on auto generated todo list"
}
,
{
"id": "47109374",
"text": "I agree with most of this, though I'm not sure it's radically different. I think most people who've been using CC in earnest for a while probably have a similar workflow? Prior to Claude 4 it was pretty much mandatory to define requirements and track implementation manually to manage context. It's still good, but since 4.5 release, it feels less important. CC basically works like this by default now, so unless you value the spec docs (still a good reference for Claude, but need to be maintained), you don't have to think too hard about it anymore.\n\nThe important thing is to have a conversation with Claude during the planning phase and don't just say \"add this feature\" and take what you get. Have a back and forth, ask questions about common patterns, best practices, performance implications, security requirements, project alignment, etc. This is a learning opportunity for you and Claude. When you think you're done, request a final review to analyze for gaps or areas of improvement. Claude will always find something, but starts to get into the weeds after a couple passes.\n\nIf you're greenfield and you have preferences about structure and style, you need to be explicit about that. Once the scaffolding is there, modern Claude will typically follow whatever examples it finds in the existing code base.\n\nI'm not sure I agree with the \"implement it all without stopping\" approach and let auto-compact do its thing. I still see Claude get lazy when nearing compaction, though has gotten drastically better over the last year. Even so, I still think it's better to work in a tight loop on each stage of the implementation and preemptively compacting or restarting for the highest quality.\n\nNot sure that the language is that important anymore either. 
Claude will explore existing codebase on its own at unknown resolution, but if you say \"read the file\" it works pretty well these days.\n\nMy suggestions to enhance this workflow:\n\n- If you use a numbered phase/stage/task approach with checkboxes, it makes it easy to stop/resume as-needed, and discuss particular sections. Each phase should be working/testable software.\n\n- Define a clear numbered list workflow in CLAUDE.md that loops on each task (run checks, fix issues, provide summary, etc).\n\n- Use hooks to ensure the loop is followed.\n\n- Update spec docs at the end of the cycle if you're keeping them. It's not uncommon for there to be some divergence during implementation and testing."
}
,
{
"id": "47109641",
"text": "All sounds like a bespoke way of remaking https://github.com/Fission-AI/OpenSpec"
}
,
{
"id": "47109539",
"text": "Doesn’t Claude code do this by switching between edit mode and plan mode?\n\nFWIW I have had significant improvements by clearing context then implementing the plan. Seems like it stops Claude getting hung up on something."
}
,
{
"id": "47108569",
"text": "How are the annotations put into the markdown? Claude needs to be able to identify them as annotations and not parts of the plan."
}
,
{
"id": "47107951",
"text": "I use amazon kiro.\n\nThe AI first works with you to write requirements, then it produces a design, then a task list.\n\nThe helps the AI to make smaller chunks to work on, it will work on one task at a time.\n\nI can let it run for an hour or more in this mode. Then there is lots of stuff to fix, but it is mostly correct.\n\nKiro also supports steering files, they are files that try to lock the AI in for common design decisions.\n\nthe price is that a lot of the context is used up with these files and kiro constantly pauses to reset the context."
}
,
{
"id": "47109419",
"text": "I don't really get what is different about this from how almost everyone else uses Claude Code? This is an incredibly common, if not the most common way of using it (and many other tools)."
}
,
{
"id": "47109012",
"text": "It seems like the annotation of plan files is the key step.\n\nClaude Code now creates persistent markdown plan files in ~/.claude/plans/ and you can open them with Ctrl-G to annotate them in your default editor.\n\nSo plan mode is not ephemeral any more."
}
,
{
"id": "47109659",
"text": "Sounds a bit like what Claude Plan Mode or Amazon's Kiro were built for. I agree it's a useful flow, but you can also overdo it."
}
,
{
"id": "47111173",
"text": "There is not a lot of explanation WHY is this better than doing the opposite: start coding and see how it goes and how this would apply to Codex models.\n\nI do exactly the same, I even developed my own workflows wit Pi agent, which works really well. Here is the reason:\n\n- Claude needs a lot more steering than other models, it's too eager to do stuff and does stupid things and write terrible code without feedback.\n\n- Claude is very good at following the plan, you can even use a much cheaper model if you have a good plan. For example I list every single file which needs edits with a short explanation.\n\n- At the end of the plan, I have a clear picture in my head how the feature will exactly look like and I can be pretty sure the end result will be good enough (given that the model is good at following the plan).\n\nA lot of things don't need planning at all. Simple fixes, refactoring, simple scripts, packaging, etc. Just keep it simple."
}
,
{
"id": "47108823",
"text": "Funny how I came up with something loosely similar. Asking Codex to write a detailed plan in a markdown document, reviewing it, and asking it to implement it step by step. It works exquisitely well when it can build and test itself."
}
,
{
"id": "47107451",
"text": "I have tried using this and other workflows for a long time and had never been able to get them to work (see chat history for details).\n\nThis has changed in the last week, for 3 reasons:\n\n1. Claude opus. It’s the first model where I haven’t had to spend more time correcting things than it would’ve taken me to just do it myself. The problem is that opus chews through tokens, which led to..\n\n2. I upgraded my Claude plan. Previously on the regular plan I’d get about 20 mins of time before running out of tokens for the session and then needing to wait a few hours to use again. It was fine for little scripts or toy apps but not feasible for the regular dev work I do. So I upgraded to 5x. This now got me 1-2 hours per session before tokens expired. Which was better but still a frustration. Wincing at the price, I upgraded again to the 20x plan and this was the next game changer. I had plenty of spare tokens per session and at that price it felt like they were being wasted - so I ramped up my usage. Following a similar process as OP but with a plans directory with subdirectories for backlog, active and complete plans, and skills with strict rules for planning, implementing and completing plans, I now have 5-6 projects on the go. While I’m planning a feature on one the others are implementing. The strict plans and controls keep them on track and I have follow up skills for auditing quality and performance. I still haven’t hit token limits for a session but I’ve almost hit my token limit for the week so I feel like I’m getting my money’s worth. In that sense spending more has forced me to figure out how to use more.\n\n3. The final piece of the puzzle is using opencode over claude code. I’m not sure why but I just don’t gel with Claude code. Maybe it’s all the sautéing and flibertygibbering, maybe it’s all the permission asking, maybe it’s that it doesn’t show what it’s doing as much as opencode. Whatever it is it just doesn’t work well for me. 
Opencode on the other hand is great. It’s shows what it’s doing and how it’s thinking which makes it easy for me to spot when it’s going off track\nand correct early.\n\nHaving a detailed plan, and correcting and iterating on the plan is essential. Making clause follow the plan is also essential - but there’s a line. Too fine grained and it’s not as creative at solving problems. Too loose/high level and it makes bad choices and goes in the wrong direction.\n\nIs it actually making me more productive? I think it is but I’m only a week in. I’ve decided to give myself a month to see how it all works out.\n\nI don’t intend to keep paying for the 20x plan unless I can see a path to using it to earn me at least as much back."
}
,
{
"id": "47107463",
"text": "Just don’t use Claude Code. I can use the Codex CLI with just my $20 subscription and never come close to any usage limits"
}
,
{
"id": "47107497",
"text": "What if it's just slower so that your daily work fits within the paid tier they want?"
}
,
{
"id": "47107564",
"text": "It isn’t slower. I use my personal ChatGPT subscriptions with Codex for almost everything at work and use my $800/month company Claude allowance only for the tricky stuff that Codex can’t figure out. It’s never application code. It’s usually some combination of app code + Docker + AWS issue with my underlying infrastructure - created with whatever IAC that I’m using for a client - Terraform/CloudFormation or the CDK.\n\nI burned through $10 on Claude in less than an hour. I only have $36 a day at $800 a month (800/22 working days)"
}
,
{
"id": "47107640",
"text": "> and use my $800/month company Claude allowance only for the tricky stuff that Codex can’t figure out.\n\nIt doesn’t seem controversial that the model that can solve more complex problems (that you admit the cheaper model can’t solve) costs more.\n\nFor the things I use it for, I’ve not found any other model to be worth it."
}
,
{
"id": "47107806",
"text": "You’re assuming rational behavior from a company that doesn’t care about losing billions of dollar.\n\nHave you tried Codex with OpenAi’s latest models?"
}
,
{
"id": "47108382",
"text": "Not in the last 2 months.\n\nCurrent clause subscription is a sunk cost for the next month. Maybe I’ll try codex if Claude doesn’t lead anywhere."
}
,
{
"id": "47108560",
"text": "I use both. As I’m working, I tell each of them to update a common document with the conversation. I don’t just tell Claude the what. I tell it the why and have it document it.\n\nI can switch back and forth and use the MD file as shared context."
}
,
{
"id": "47108131",
"text": "Curious: what are some cases where it'd make sense to not pay for the 20x plan (which is $200/month), and provide a whopping $800/month pay-per-token allowance instead?"
}
,
{
"id": "47108547",
"text": "Who knows? It’s part of an enterprise plan. I work for a consulting company. There are a number of fallbacks, the first fallback if we are working on an internal project is just to use our internal AWS account and use Claude code with the Anthropic hosted on Bedrock.\n\nhttps://code.claude.com/docs/en/amazon-bedrock\n\nThe second fallback if it is for a customer project is to use their AWS account for development for them.\n\nThe rate my company charges for me - my level as an American based staff consultant (highest bill rate at the company) they are happy to let us use Claude Code using their AWS credentials. Besides, if we are using AWS Bedrock hosted Anthropic models, they know none of their secrets are going to Anthropic. They already have the required legal confidentiality/compliancd agreements with AWS."
}
,
{
"id": "47108073",
"text": "this is literally reinventing claude's planning mode, but with more steps. I think Boris doesn't realize that planning mode is actually stored in a file.\n\nhttps://x.com/boristane/status/2021628652136673282"
}
,
{
"id": "47109399",
"text": "It is really fun to watch how a baby makes its first steps and also how experienced professionals rediscover what standards were telling us for 80+ years."
}
,
{
"id": "47107717",
"text": "There are a few prompt frameworks that essentially codify these types of workflows by adding skills and prompts\n\nhttps://github.com/obra/superpowers\nhttps://github.com/jlevy/tbd"
}
,
{
"id": "47108881",
"text": "Hub and spoke documentation in planning has been absolutely essential for the way my planning was before, and it's pretty cool seeing it work so well for planning mode to build scaffolds and routing."
}
,
{
"id": "47109824",
"text": "this sounds... really slow. for large changes for sure i'm investing time into planning. but such a rigid system can't possible be as good as a flexible approach with variable amounts of planning based on complexity"
}
,
{
"id": "47108956",
"text": "It’s worrying to me that nobody really knows how LLMs work. We create prompts with or without certain words and hope it works. That’s my perspective anyway"
}
,
{
"id": "47108979",
"text": "It's actually no different from how real software is made. Requirements come from the business side, and through an odd game of telephone get down to developers.\n\nThe team that has developers closest to the customer usually makes the better product...or has the better product/market fit.\n\nThen it's iteration."
}
,
{
"id": "47108972",
"text": "It's the same as dealing with a human. You convey a spec for a problem and the language you use matters. You can convey the problem in (from your perspective) a clear way and you will get mixed results nonetheless. You will have to continue to refine the solution with them.\n\nGenuinely: no one really knows how humans work either."
}
,
{
"id": "47108540",
"text": "Is it required to tell Claude to re-read the code folder again when you come back some day later or should we ask Claude to just pickup from research.md file thus saving some tokens?"
}
,
{
"id": "47108880",
"text": "The “inline comments on a plan” is one of the best features of Antigravity, and I’m surprised others haven’t started copycatting."
}
,
{
"id": "47107221",
"text": "I do something broadly similar. I ask for a design doc that contains an embedded todo list, broken down into phases. Looping on the design doc asking for suggestions seems to help. I'm up to about 40 design docs so far on my current project."
}
,
{
"id": "47109669",
"text": "Why don't you make Claude give feedback and iterate by itself?"
}
]
</comments_to_classify>
Based on the comments above, assign each comment to up to 3 relevant topics.
Return ONLY a JSON array with this exact structure (no other text):
[
{
"id": "comment_id_1",
"topics": [
1,
3,
5
]
}
,
{
"id": "comment_id_2",
"topics": [
2
]
}
,
{
"id": "comment_id_3",
"topics": [
0
]
}
,
...
]
Rules:
- Each comment can have 0 to 3 topics
- Use 1-based topic indices for matches
- Use index 0 if the comment does not fit well in any category
- Only assign topics that are genuinely relevant to the comment
Remember: Output ONLY the JSON array, no other text.