The following is content for you to classify. Do not respond to the comments—classify them.
<topics>
1. Not Novel or Revolutionary
Related: Many commenters argue this workflow is standard practice, not radically different. References to existing tools like Kiro, OpenSpec, SpecKit, and Antigravity that already implement spec-driven development. Claims the approach was documented 2+ years ago in Cursor forums.
2. LLMs as Junior Developers
Related: Analogy comparing LLMs to unreliable interns with boundless energy. Discussion of treating AI like junior developers requiring supervision, documentation, and oversight. The shift from coder to software manager role.
3. AI-Generated Article Concerns
Related: Multiple commenters suspect the article itself was written by AI, noting characteristic style and patterns. Debate about whether AI-written content should be evaluated differently or dismissed outright.
4. Magic Words and Prompt Engineering
Related: Skepticism about whether words like 'deeply' and 'in great details' actually affect LLM behavior. Discussion of attention mechanisms, emotional prompting research, and whether prompt techniques are superstition or cargo cult.
5. Planning vs Just Coding
Related: Debate about whether extensive planning overhead eliminates time savings. Some argue writing specs takes longer than writing code. Others counter that planning prevents compounding errors and technical debt.
6. Spec-Driven Development Tools
Related: References to existing frameworks: OpenSpec, SpecKit, BMAD-METHOD, Kiro, Antigravity. Discussion of how these tools formalize the research-plan-implement workflow described in the article.
7. Context Window Management
Related: Strategies for handling large codebases and context limits. Maintaining markdown files for subsystems, using skills, aggressive compaction. Concerns about context rot and performance degradation.
8. Waterfall Methodology Comparison
Related: Commenters note the approach resembles waterfall development with detailed upfront planning. Discussion of whether this contradicts agile principles or represents rediscovering proven methods.
9. Test-Driven Development Integration
Related: Suggestions to add comprehensive tests to the workflow. Writing tests before implementation, using tests as verification. Arguments that test coverage enables safer refactoring with AI.
10. Single Session vs Multiple Sessions
Related: Author's claim of running entire workflows in single long sessions without performance degradation. Others recommend clearing context between phases for better results.
11. Determinism and Reproducibility
Related: Concerns about non-deterministic LLM outputs. Discussion of whether software engineering can accommodate probabilistic tools. Comparisons to gambling and slot machines.
12. Token Cost Considerations
Related: Discussion of workflow being token-heavy and expensive. Comparisons between Claude subscription tiers. Arguments that simpler approaches save money while achieving similar results.
13. Annotation Workflow Details
Related: Questions about how to format inline annotations for Claude to recognize. Techniques like TODO prefixes, HTML comments, and clear separation between human and AI-written content.
14. Subagent Architecture
Related: Using multiple agents for different phases: planning, implementation, review. Red team/blue team approaches. Dispatching parallel agents for independent tasks.
15. Reference Implementation Technique
Related: Using existing code from open source projects as examples for Claude. Questions about licensing implications. Claims this dramatically improves output quality.
16. Claude vs Other Models
Related: Comparisons between Claude, Codex, Gemini, and other models. Discussion of model-specific behaviors and optimal prompting strategies. Using multiple models in complementary roles.
17. Greenfield vs Existing Codebases
Related: Observation that most AI coding articles focus on greenfield development. Different challenges when working with legacy code and established patterns.
18. Human Review Requirements
Related: Debate about whether all AI-generated code must be reviewed line-by-line. Questions about trust, liability, and whether AI can eventually be trusted without oversight.
19. Productivity Claims Skepticism
Related: Questions about actual time savings versus perceived productivity. References to studies showing AI sometimes makes developers less productive. Concerns about false progress.
20. Documentation as Side Benefit
Related: Plans and research documents serve as valuable documentation for future maintainers. Version controlling plan files in git. Using plans to understand architectural decisions later.
0. Does not fit well in any category
</topics>
<comments_to_classify>
[
{
"id": "47109958",
"text": "Reproducing experimental results across models and vendors is trivial and cheap nowadays."
}
,
{
"id": "47110037",
"text": "Not if anthropic goes further in obfuscating the output of claude code."
}
,
{
"id": "47110383",
"text": "Why would you test implementation details? Test what's delivered, not how it's delivered. The thinking portion, synthesized or not, is merely implementation.\n\nThe resulting artefact, that's what is worth testing."
}
,
{
"id": "47110953",
"text": "> Why would you test implementation details\n\nBecause this has never been sufficient. From things like various hard to test cases to things like readability and long term maintenance. Reading and understanding the code is more efficient and necessary for any code worth keeping around."
}
,
{
"id": "47110595",
"text": "I think the real value here isn’t “planning vs not planning,” it’s forcing the model to surface its assumptions before they harden into code.\n\nLLMs don’t usually fail at syntax. They fail at invisible assumptions about architecture, constraints, invariants, etc. A written plan becomes a debugging surface for those assumptions."
}
,
{
"id": "47111468",
"text": "It's also great to describe the full use case flow in the instructions, so you can clearly understand that LLM won't do some stupid thing on its own"
}
,
{
"id": "47111056",
"text": "Sub agents also help a lot in that regard. Have an agent do the planning, have an implementation agent do the code and have another one do the review. Clear responsibilities help a lot.\n\nThere's also blue team / red team that works.\n\nThe idea is always the same: help the LLM to reason properly with fewer and clearer instructions."
}
,
{
"id": "47111501",
"text": "This sounds very promising. Any link to more details?"
}
,
{
"id": "47111426",
"text": "Did you just write this with ChatGPT?"
}
,
{
"id": "47110921",
"text": "> LLMs don’t usually fail at syntax?\n\nReally? My experience has been that it’s incredibly easy to get them stuck in a loop on a hallucinated API and burn through credits before I’ve even noticed what it’s done. I have a small rust project that stores stuff on disk that I wanted to add an s3 backend too - Claude code burned through my $20 in a loop in about 30 minutes without any awareness of what it was doing on a very simple syntax issue."
}
,
{
"id": "47110900",
"text": "Except that merely surfacing them changes their behavior, like how you add that one printf() call and now your heisenbug is suddenly nonexistent"
}
,
{
"id": "47108872",
"text": "> the workflow I’ve settled into is radically different from what most people do with AI coding tools\n\nThis looks exactly like what anthropic recommends as the best practice for using Claude Code. Textbook.\n\nIt also exposes a major downside of this approach: if you don't plan perfectly, you'll have to start over from scratch if anything goes wrong.\n\nI've found a much better approach in doing a design -> plan -> execute in batches, where the plan is no more than 1,500 lines, used as a proxy for complexity.\n\nMy 30,000 LOC app has about 100,000 lines of plan behind it. Can't build something that big as a one-shot."
}
,
{
"id": "47109032",
"text": "if you don't plan perfectly, you'll have to start over from scratch if anything goes wrong\n\nThis is my experience too, but it's pushed me to make much smaller plans and to commit things to a feature branch far more atomically so I can revert a step to the previous commit, or bin the entire feature by going back to main. I do this far more now than I ever did when I was writing the code by hand.\n\nThis is how developers should work regardless of how the code is being developed. I think this is a small but very real way AI has actually made me a better developer (unless I stop doing it when I don't use AI... not tried that yet.)"
}
,
{
"id": "47111305",
"text": "I do this too. Relatively small changes, atomic commits with extensive reasoning in the message (keeps important context around). This is a best practice anyway, but used to be excruciatingly much effort. Now it’s easy!\n\nExcept that I’m still struggling with the LLM understanding its audience/context of its utterances. Very often, after a correction, it will focus a lot on the correction itself making for weird-sounding/confusing statements in commit messages and comments."
}
,
{
"id": "47110514",
"text": "We're learning the lessons of Agile all over again."
}
,
{
"id": "47110718",
"text": "We're learning how to be an engineer all over again.\n\nThe authors process is super-close what we were taught in engineering 101 40 years ago."
}
,
{
"id": "47111554",
"text": "It's after we come down from the Vibe coding high that we realize we still need to ship working, high-quality code. The lessons are the same, but our muscle memory has to be re-oriented. How do we create estimates when AI is involved? In what ways do we redefine the information flow between Product and Engineering?"
}
,
{
"id": "47110962",
"text": "I always feel like I'm in a fever dream when I hear about AI workflows. A lot of this is what I've read from software engineering books and articles."
}
,
{
"id": "47109417",
"text": "LLMs are really eager to start coding (as interns are eager to start working), so the sentence “don’t implement yet” has to be used very often at the beginning of any project."
}
,
{
"id": "47110220",
"text": "Most LLM apps have a 'plan' or 'ask' mode for that."
}
,
{
"id": "47109387",
"text": "Developers should work by wasting lots of time making the wrong thing?\n\nI bet if they did a work and motion study on this approach they'd find the classic:\n\n\"Thinks they're more productive, AI has actually made them less productive\"\n\nBut lots of lovely dopamine from this false progress that gets thrown away!"
}
,
{
"id": "47110258",
"text": "Developers should work by wasting lots of time making the wrong thing?\n\nYes. In fact, that's not emphatic enough: HELL YES!\n\nMore specifically, developers should experiment. They should test their hypothesis. They should try out ideas by designing a solution and creating a proof of concept, then throw that away and build a proper version based on what they learned.\n\nIf your approach to building something is to implement the first idea you have and move on then you are going to waste so much more time later refactoring things to fix architecture that paints you into corners, reimplementing things that didn't work for future use cases, fixing edge cases than you hadn't considered, and just paying off a mountain of tech debt.\n\nI'd actually go so far as to say that if you aren't experimenting and throwing away solutions that don't quite work then you're only amassing tech debt and you're not really building anything that will last. If it does it's through luck rather than skill.\n\nAlso, this has nothing to do with AI. Developers should be working this way even if they handcraft their artisanal code carefully in vi."
}
,
{
"id": "47111021",
"text": ">> Developers should work by wasting lots of time making the wrong thing?\n\n> Yes. In fact, that's not emphatic enough: HELL YES!\n\nYou do realize there are prior research and well tested solutions for a lot of things. Instead of wasting time making the wrong thing, it is faster to do some research if the problem has already been solved. Experimentation is fine only after checking that the problem space is truly novel or there's not enough information around.\n\nIt is faster to iterate in your mental space and in front of a whiteboard than in code."
}
,
{
"id": "47110292",
"text": "> Developers should work by wasting lots of time making the wrong thing?\n\nYes? I can't even count how many times I worked on something my company deemed was valuable only for it to be deprecated or thrown away soon after. Or, how many times I solved a problem but apparently misunderstood the specs slightly and had to redo it. Or how many times we've had to refactor our code because scope increased. In fact, the very existence of the concepts of refactoring and tech debt proves that devs often spend a lot of time making the \"wrong\" thing.\n\nIs it a waste? No, it solved the problem as understood at the time. And we learned stuff along the way."
}
,
{
"id": "47109751",
"text": "Classic\n\nhttps://metr.org/blog/2025-07-10-early-2025-ai-experienced-o..."
}
,
{
"id": "47109426",
"text": "> design -> plan -> execute in batches\n\nThis is the way for me as well. Have a high-level master design and plan, but break it apart into phases that are manageable. One-shotting anything beyond a todo list and expecting decent quality is still a pipe dream."
}
,
{
"id": "47109596",
"text": "> if you don't plan perfectly, you'll have to start over from scratch if anything goes wrong.\n\nYou just revert what the AI agent changed and revise/iterate on the previous step - no need to start over. This can of course involve restricting the work to a smaller change so that the agent isn't overwhelmed by complexity."
}
,
{
"id": "47110513",
"text": "How can you know that 100k lines plan is not just slop?\n\nJust because plan is elaborate doesn’t mean it makes sense."
}
,
{
"id": "47109087",
"text": "wtf, why would you write 100k lines of plan to produce 30k loc.. JUST WRITE THE CODE!!!"
}
,
{
"id": "47109133",
"text": "They didn't write 100k plan lines. The llm did (99.9% of it at least or more). Writing 30k by hand would take weeks if not months. Llms do it in an afternoon."
}
,
{
"id": "47109223",
"text": "Just reading that plan would take weeks or months"
}
,
{
"id": "47109460",
"text": "You don't start with 100k lines, you work in batches that are digestible. You read it once, then move on. The lines add up pretty quickly considering how fast Claude works. If you think about the difference in how many characters it takes to describe what code is doing in English, it's pretty reasonable."
}
,
{
"id": "47109375",
"text": "And my weeks or months of work beats an LLMs 10/10 times. There are no shortcuts in life."
}
,
{
"id": "47109921",
"text": "I have no doubts that it does for many people. But the time/cost tradeoff is still unquestionable. I know I could create what LLMs do for me in the frontend/backend in most cases as good or better - I know that, because I've done it at work for years. But to create a somewhat complex app with lots of pages/features/apis etc. would take me months if not a year++ since I'd be working on it only on the weekends for a few hours. Claude code helps me out by getting me to my goal in a fraction of the time. Its superpower lies not only in doing what I know but faster, but in doing what I don't know as well.\n\nI yield similar benefits at work. I can wow management with LLM assisted/vibe coded apps. What previously would've taken a multi-man team weeks of planning and executing, stand ups, jour fixes, architecture diagrams, etc. can now be done within a single week by myself. For the type of work I do, managers do not care whether I could do it better if I'd code it myself. They are amazed however that what has taken months previously, can be done in hours nowadays. And I for sure will try to reap benefits of LLMs for as long as they don't replace me rather than being idealistic and fighting against them."
}
,
{
"id": "47110882",
"text": "> What previously would've taken a multi-man team weeks of planning and executing, stand ups, jour fixes, architecture diagrams, etc. can now be done within a single week by myself.\n\nThis has been my experience. We use Miro at work for diagramming. Lots of visual people on the team, myself included. Using Miro's MCP I draft a solution to a problem and have Miro diagram it. Once we talk it through as a team, I have Claude or codex implement it from the diagram.\n\nIt works surprisingly well.\n\n> They are amazed however that what has taken months previously, can be done in hours nowadays.\n\nOf course they're amazed. They don't have to pay you for time saved ;)\n\n> reap benefits of LLMs for as long as they don't replace me\n> What previously would've taken a multi-man team\n\nI think this is the part that people are worried about. Every engineer who uses LLMs says this. By definition it means that people are being replaced.\n\nI think I justify it in that no one on my team has been replaced. But management has explicitly said \"we don't want to hire more because we can already 20x ourselves with our current team +LLM.\" But I do acknowledge that many people ARE being replaced; not necessarily by LLMs, but certainly by other engineers using LLMs."
}
,
{
"id": "47111101",
"text": "I'm still waiting for the multi-years success stories. Greenfield solutions are always easy (which is why we have frameworks that automate them). But maintaining solutions over years is always the true test of any technologies.\n\nIt's already telling that nothing has staying power in the LLMs world (other than the chat box). Once the limitations can no longer be hidden by the hype and the true cost is revealed, there's always a next thing to pivot to."
}
,
{
"id": "47111064",
"text": "> but in doing what I don't know as well.\n\nComments like these really help ground what I read online about LLMs. This matches how low performing devs at my work use AI, and their PRs are a net negative on the team. They take on tasks they aren’t equipped to handle and use LLMs to fill the gaps quickly instead of taking time to learn (which LLMs speed up!)."
}
,
{
"id": "47109406",
"text": "Might be true for you. But there are plenty of top tier engineers who love LLMs. So it works for some. Not for others.\n\nAnd of course there are shortcuts in life. Any form of progress whether its cars, medicine, computers or the internet are all shortcuts in life. It makes life easier for a lot of people."
}
,
{
"id": "47109781",
"text": "That's not (or should not be what's happening).\n\nThey write a short high level plan (let's say 200 words). The plan asks the agent to write a more detailed implementation plan (written by the LLM, let's say 2000-5000 words).\n\nThey read this plan and adjust as needed, even sending it to the agent for re-dos.\n\nOnce the implementation plan is done, they ask the agent to write the actual code changes.\n\nThen they review that and ask for fixes, adjustments, etc.\n\nThis can be comparable to writing the code yourself but also leaves a detailed trail of what was done and why, which I basically NEVER see in human generated code.\n\nThat alone is worth gold, by itself.\n\nAnd on top of that, if you're using an unknown platform or stack, it's basically a rocket ship. You bootstrap much faster. Of course, stay on top of the architecture, do controlled changes, learn about the platform as you go, etc."
}
,
{
"id": "47110361",
"text": "I take this concept and I meta-prompt it even more.\n\nI have a road map (AI generated, of course) for a side project I'm toying around with to experiment with LLM-driven development. I read the road map and I understand and approve it. Then, using some skills I found on skills.sh and slightly modified, my workflow is as such:\n\n1. Brainstorm the next slice\n\nIt suggests a few items from the road map that should be worked on, with some high level methodology to implement. It asks me what the scope ought to be and what invariants ought to be considered. I ask it what tradeoffs could be, why, and what it recommends, given the product constraints. I approve a given slice of work.\n\nNB: this is the part I learn the most from. I ask it why X process would be better than Y process given the constraints and it either corrects itself or it explains why. \"Why use an outbox pattern? What other patterns could we use and why aren't they the right fit?\"\n\n2. Generate slice\n\nAfter I approve what to work on next, it generates a high level overview of the slice, including files touched, saved in a MD file that is persisted. I read through the slice, ensure that it is indeed working on what I expect it to be working on, and that it's not scope creeping or undermining scope, and I approve it. It then makes a plan based off of this.\n\n3. Generate plan\n\nIt writes a rather lengthy plan, with discrete task bullets at the top. Beneath, each step has to-dos for the llm to follow, such as generating tests, running migrations, etc, with commit messages for each step. I glance through this for any potential red flags.\n\n4. Execute\n\nThis part is self explanatory. It reads the plan and does its thing.\n\nI've been extremely happy with this workflow. I'll probably write a blog post about it at some point."
}
,
{
"id": "47111568",
"text": "This is a super helpful and productive comment. I look forward to a blog post describing your process in more detail."
}
,
{
"id": "47109185",
"text": "100,000 lines is approx. one million words. The average person reads at 250wpm. The entire thing would take 66 hours just to read, assuming you were approaching it like a fiction book, not thinking anything over"
}
,
{
"id": "47108936",
"text": "Dunno. My 80k+ LOC personal life planner, with a native android app, eink display view still one shots most features/bugs I encounter. I just open a new instance let it know what I want and 5min later it's done."
}
,
{
"id": "47109941",
"text": "If you wouldn't mind sharing more about this in the future I'd love to read about it.\n\nI've been thinking about doing something like that myself because I'm one of those people who have tried countless apps but there's always a couple deal breakers that cause me to drop the app.\n\nI figured trying to agentically develop a planner app with the exact feature set I need would be an interesting and fun experiment."
}
,
{
"id": "47109079",
"text": "In 5 min you are one shotting smaller changes to the larger code base right? Not the entire 80k lines, which was the other comment's point afaict."
}
,
{
"id": "47109172",
"text": "Yeah, then I guess I misunderstood the post. Its smaller features one by one ofc."
}
,
{
"id": "47108955",
"text": "Both can be true. I have personally experienced both.\n\nSome problems AI surprised me immensely with fast, elegant efficient solutions and problem solving. I've also experienced AI doing totally absurd things that ended up taking multiple times longer than if I did it manually. Sometimes in the same project."
}
,
{
"id": "47109005",
"text": "What is a personal life planner?"
}
,
{
"id": "47109039",
"text": "Todos, habits, goals, calendar, meals, notes, bookmarks, shopping lists, finances. More or less that with Google cal integration, garmin Integration (Auto updates workout habits, weight goals) family sharing/gamification, daily/weekly reviews, ai summaries and more. All built by just prompting Claude for feature after feature, with me writing 0 lines."
}
,
{
"id": "47109308",
"text": "Ah, I imagined actual life planning as in asking AI what to do, I was morbidly curious.\n\nPrompting basic notes apps is not as exciting but I can see how people who care about that also care about it being exactly a certain way, so I think get your excitement."
}
]
</comments_to_classify>
Based on the comments above, assign each comment to up to 3 relevant topics.
Return ONLY a JSON array with this exact structure (no other text):
[
{
"id": "comment_id_1",
"topics": [
1,
3,
5
]
}
,
{
"id": "comment_id_2",
"topics": [
2
]
}
,
{
"id": "comment_id_3",
"topics": [
0
]
}
,
...
]
Rules:
- Each comment can have 0 to 3 topics
- Use 1-based topic indices for matches
- Use index 0 if the comment does not fit well in any category
- Only assign topics that are genuinely relevant to the comment
Remember: Output ONLY the JSON array, no other text.