Summarizer

LLM Input

llm/065c6e83-d0d5-4aca-be3d-92768a8a3506/topic-4-6b92e3b2-0fff-4be1-8f0a-f38727c93c7e-input.json

prompt

The following is content for you to summarize. Do not respond to the comments—summarize them.

<topic>
Planning vs Just Coding # Debate about whether extensive planning overhead eliminates time savings. Some argue writing specs takes longer than writing code. Others counter that planning prevents compounding errors and technical debt.
</topic>

<comments_about_topic>
1. But the author's workflow is actually very different from Boris'.

#6 is about using plan mode whereas the author says "The built-in plan mode sucks".

The author's post is much more than just "planning with clarity".

2. Agreed. The process described is much more elaborate than what I do but quite similar. I start by discussing in great detail what I want to do, sometimes asking the same question to different LLMs. Then a todo list, then manual review of the code, esp. each function signature, checking if the instructions have been followed and if there are no obvious refactoring opportunities (there almost always are).

The LLM does most of the coding, yet I wouldn't call it "vibe coding" at all.

"Tele coding" would be more appropriate.

3. I use AWS Kiro, and its spec-driven development is exactly this. I find it really works well, as it makes me slow down and think about what I want it to do.

Requirements, design, task list, coding.

4. It feels like retracing the history of software project management. The post is quite waterfall-like: writing a lot of docs and specs upfront, then implementing. Another approach is to just YOLO it (on a new branch), make it write up the lessons afterwards, then start a new, more informed try and throw away the first. Or any other combo.

For me what works well is to ask it to write some code upfront to verify its assumptions against actual reality, not just telling it to review the sources "in detail". It gains much more from real output from the code, and it clears up wrong assumptions. Do some smaller jobs, write up md files, then plan the big thing, then execute.

5. It's nice to have it written down in a concise form. I shared it with my team as some engineers have been struggling with AI, and I think this (just trying to one-shot without planning) could be why.

6. if only there was another simpler way to use your knowledge to write code...

7. I think the real value here isn’t “planning vs not planning,” it’s forcing the model to surface its assumptions before they harden into code.

LLMs don’t usually fail at syntax. They fail at invisible assumptions about architecture, constraints, invariants, etc. A written plan becomes a debugging surface for those assumptions.

8. Except that merely surfacing them changes their behavior, like how you add that one printf() call and now your heisenbug is suddenly nonexistent

9. > the workflow I’ve settled into is radically different from what most people do with AI coding tools

This looks exactly like what Anthropic recommends as the best practice for using Claude Code. Textbook.

It also exposes a major downside of this approach: if you don't plan perfectly, you'll have to start over from scratch if anything goes wrong.

I've found a much better approach in doing a design -> plan -> execute in batches, where the plan is no more than 1,500 lines, used as a proxy for complexity.

My 30,000 LOC app has about 100,000 lines of plan behind it. Can't build something that big as a one-shot.

10. if you don't plan perfectly, you'll have to start over from scratch if anything goes wrong

This is my experience too, but it's pushed me to make much smaller plans and to commit things to a feature branch far more atomically so I can revert a step to the previous commit, or bin the entire feature by going back to main. I do this far more now than I ever did when I was writing the code by hand.

This is how developers should work regardless of how the code is being developed. I think this is a small but very real way AI has actually made me a better developer (unless I stop doing it when I don't use AI... not tried that yet.)

11. I do this too. Relatively small changes, atomic commits with extensive reasoning in the message (keeps important context around). This is a best practice anyway, but used to be excruciatingly much effort. Now it’s easy!

Except that I'm still struggling with the LLM understanding the audience/context of its utterances. Very often, after a correction, it will focus a lot on the correction itself, making for weird-sounding/confusing statements in commit messages and comments.

12. It's after we come down from the Vibe coding high that we realize we still need to ship working, high-quality code. The lessons are the same, but our muscle memory has to be re-oriented. How do we create estimates when AI is involved? In what ways do we redefine the information flow between Product and Engineering?

13. LLMs are really eager to start coding (as interns are eager to start working), so the sentence “don’t implement yet” has to be used very often at the beginning of any project.

14. Developers should work by wasting lots of time making the wrong thing?

Yes. In fact, that's not emphatic enough: HELL YES!

More specifically, developers should experiment. They should test their hypothesis. They should try out ideas by designing a solution and creating a proof of concept, then throw that away and build a proper version based on what they learned.

If your approach to building something is to implement the first idea you have and move on, then you are going to waste so much more time later refactoring things to fix architecture that paints you into corners, reimplementing things that didn't work for future use cases, fixing edge cases that you hadn't considered, and just paying off a mountain of tech debt.

I'd actually go so far as to say that if you aren't experimenting and throwing away solutions that don't quite work then you're only amassing tech debt and you're not really building anything that will last. If it does it's through luck rather than skill.

Also, this has nothing to do with AI. Developers should be working this way even if they handcraft their artisanal code carefully in vi.

15. >> Developers should work by wasting lots of time making the wrong thing?

> Yes. In fact, that's not emphatic enough: HELL YES!

You do realize there is prior research and there are well-tested solutions for a lot of things. Instead of wasting time making the wrong thing, it is faster to do some research on whether the problem has already been solved. Experimentation is fine only after checking that the problem space is truly novel or there's not enough information around.

It is faster to iterate in your mental space and in front of a whiteboard than in code.

16. > Developers should work by wasting lots of time making the wrong thing?

Yes? I can't even count how many times I worked on something my company deemed was valuable only for it to be deprecated or thrown away soon after. Or, how many times I solved a problem but apparently misunderstood the specs slightly and had to redo it. Or how many times we've had to refactor our code because scope increased. In fact, the very existence of the concepts of refactoring and tech debt proves that devs often spend a lot of time making the "wrong" thing.

Is it a waste? No, it solved the problem as understood at the time. And we learned stuff along the way.

17. > design -> plan -> execute in batches

This is the way for me as well. Have a high-level master design and plan, but break it apart into phases that are manageable. One-shotting anything beyond a todo list and expecting decent quality is still a pipe dream.

18. > if you don't plan perfectly, you'll have to start over from scratch if anything goes wrong.

You just revert what the AI agent changed and revise/iterate on the previous step - no need to start over. This can of course involve restricting the work to a smaller change so that the agent isn't overwhelmed by complexity.

19. How can you know that a 100k-line plan is not just slop?

Just because a plan is elaborate doesn't mean it makes sense.

20. wtf, why would you write 100k lines of plan to produce 30k loc.. JUST WRITE THE CODE!!!

21. They didn't write 100k plan lines. The LLM did (at least 99.9% of it). Writing 30k lines by hand would take weeks if not months. LLMs do it in an afternoon.

22. Just reading that plan would take weeks or months

23. You don't start with 100k lines, you work in batches that are digestible. You read it once, then move on. The lines add up pretty quickly considering how fast Claude works. If you think about the difference in how many characters it takes to describe what code is doing in English, it's pretty reasonable.

24. That's not (or should not be) what's happening.

They write a short high level plan (let's say 200 words). The plan asks the agent to write a more detailed implementation plan (written by the LLM, let's say 2000-5000 words).

They read this plan and adjust as needed, even sending it to the agent for re-dos.

Once the implementation plan is done, they ask the agent to write the actual code changes.

Then they review that and ask for fixes, adjustments, etc.

This can be comparable to writing the code yourself but also leaves a detailed trail of what was done and why, which I basically NEVER see in human generated code.

That alone is worth gold, by itself.

And on top of that, if you're using an unknown platform or stack, it's basically a rocket ship. You bootstrap much faster. Of course, stay on top of the architecture, do controlled changes, learn about the platform as you go, etc.

25. I take this concept and I meta-prompt it even more.

I have a road map (AI generated, of course) for a side project I'm toying around with to experiment with LLM-driven development. I read the road map and I understand and approve it. Then, using some skills I found on skills.sh and slightly modified, my workflow is as follows:

1. Brainstorm the next slice

It suggests a few items from the road map that should be worked on, with some high level methodology to implement. It asks me what the scope ought to be and what invariants ought to be considered. I ask it what tradeoffs could be, why, and what it recommends, given the product constraints. I approve a given slice of work.

NB: this is the part I learn the most from. I ask it why X process would be better than Y process given the constraints and it either corrects itself or it explains why. "Why use an outbox pattern? What other patterns could we use and why aren't they the right fit?"

2. Generate slice

After I approve what to work on next, it generates a high level overview of the slice, including files touched, saved in a MD file that is persisted. I read through the slice, ensure that it is indeed working on what I expect it to be working on, and that it's not scope creeping or undermining scope, and I approve it. It then makes a plan based off of this.

3. Generate plan

It writes a rather lengthy plan, with discrete task bullets at the top. Beneath, each step has to-dos for the llm to follow, such as generating tests, running migrations, etc, with commit messages for each step. I glance through this for any potential red flags.

4. Execute

This part is self explanatory. It reads the plan and does its thing.

I've been extremely happy with this workflow. I'll probably write a blog post about it at some point.

26. 100,000 lines is approx. one million words. The average person reads at 250wpm. The entire thing would take 66 hours just to read, assuming you were approaching it like a fiction book, not thinking anything over
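The arithmetic in that comment can be sanity-checked in a few lines. The only assumption added here (it is not stated in the comment) is roughly 10 words per plan line, which is what makes 100,000 lines come out to about a million words:

```python
# Back-of-the-envelope check of the reading-time claim.
# Assumption (not from the comment): ~10 words per line of plan text.
plan_lines = 100_000
words = plan_lines * 10      # about one million words
minutes = words / 250        # average reading speed of 250 wpm
hours = minutes / 60
print(hours)                 # 66.66..., i.e. roughly 66 hours of nonstop reading
```

So the "66 hours just to read" figure checks out under that per-line word count.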

27. Do you think that Anthropic don't include things like this in their harness / system prompts? I feel like this kind of prompt is unnecessary with Opus 4.5 onwards, obviously based on my own experience (I used to do this; on switching to Opus I stopped, and have implemented more complex problems, more successfully).

I am having the most success describing what I want as humanly as possible, describing outcomes clearly, making sure the plan is good and clearing context before implementing.

28. I've been running AI coding workshops for engineers transitioning from traditional development, and the research phase is consistently the part people skip — and the part that makes or breaks everything.

The failure mode the author describes (implementations that work in isolation but break the surrounding system) is exactly what I see in workshop after workshop. Engineers prompt the LLM with "add pagination to the list endpoint" and get working code that ignores the existing query builder patterns, duplicates filtering logic, or misses the caching layer entirely.

What I tell people: the research.md isn't busywork, it's your verification that the LLM actually understands the system it's about to modify. If you can't confirm the research is accurate, you have no business trusting the plan.

One thing I'd add to the author's workflow: I've found it helpful to have the LLM explicitly list what it does NOT know or is uncertain about after the research phase. This surfaces blind spots before they become bugs buried three abstraction layers deep.

29. I don’t use plan.md docs either, but I recognise the underlying idea: you need a way to keep agent output constrained by reality.

My workflow is more like scaffold -> thin vertical slices -> machine-checkable semantics -> repeat.

Concrete example: I built and shipped a live ticketing system for my club (Kolibri Tickets). It’s not a toy: real payments (Stripe), email delivery, ticket verification at the door, frontend + backend, migrations, idempotency edges, etc. It’s running and taking money.

The reason this works with AI isn’t that the model “codes fast”. It’s that the workflow moves the bottleneck from “typing” to “verification”, and then engineers the verification loop:

- keep the spine runnable early (end-to-end scaffold)

- add one thin slice at a time (don't let it touch 15 files speculatively)

- force checkable artifacts (tests/fixtures/types/state-machine semantics where it matters)

- treat refactors as normal, because the harness makes them safe

If you run it open-loop (prompt -> giant diff -> read/debug), you get the “illusion of velocity” people complain about. If you run it closed-loop (scaffold + constraints + verifiers), you can actually ship faster because you’re not paying the integration cost repeatedly.

Plan docs are one way to create shared state and prevent drift. A runnable scaffold + verification harness is another.

30. I actually don't really like a few things about this approach.

First, the "big bang" write it all at once. You are going to end up with thousands of lines of code that were monolithically produced. I think it is much better to have it write the plan and formulate it as sensible technical steps that can be completed one at a time. Then you can work through them. I get that this is not very "vibe"ish but that is kind of the point. I want the AI to help me get to the same point I would be at with produced code AND understanding of it, just accelerate that process. I'm not really interested in just generating thousands of lines of code that nobody understands.

Second, the author keeps referring to adjusting the behaviour, but never to incorporating that into long-lived guidance. To me, integral to the planning process is building an overarching knowledge base. Every time you tell it something is wrong, you need to tell it to update the knowledge base about why, so it doesn't do it again.

Finally, no mention of tests? Just quick checks? To me, you have to end up with comprehensive tests. Maybe to the author it goes without saying, but I find it is integral to build this into the planning. Certain stages will want certain types of tests. Sometimes in advance of the code (TDD style), other times built alongside it or after.

It's definitely going to be interesting to see how software methodology evolves to incorporate AI support and where it ultimately lands.

31. The article's approach matches mine, but I've learned from exactly the things you're pointing out.

I get the PLAN.md (or equivalent) separated into "phases" or stages, then carefully prompt it (because Claude and Codex both love to "keep going") to implement only that stage and update the PLAN.md.

Tests are crucial too, and form another part of the plan really. Though my current workflow begins to build them later in the process than I would prefer...

32. This all looks fine for someone who can't code, but for anyone with even a moderate amount of experience as a developer all this planning and checking and prompting and orchestrating is far more work than just writing the code yourself.

There's no winner for "least amount of code written regardless of productivity outcomes", except maybe Anthropic's bank account.

33. Trust me I'm very impressed at the progress AI has made, and maybe we'll get to the point where everything is 100% correct all the time and better than any human could write. I'm skeptical we can get there with the LLM approach though.

The problem is LLMs are great at simple implementation, even large amounts of simple implementation, but I've never seen it develop something more than trivial correctly. The larger problem is it's very often subtly but hugely wrong. It makes bad architecture decisions, it breaks things in pursuit of fixing or implementing other things. You can tell it has no concept of the "right" way to implement something. It very obviously lacks the "senior developer insight".

Maybe you can resolve some of these with large amounts of planning or specs, but that's the point of my original comment - at what point is it easier/faster/better to just write the code yourself? You don't get a prize for writing the least amount of code when you're just writing specs instead.

34. This is exactly what the article is about. The tradeoff is that you have to thoroughly review the plans and iterate on them, which is tiring. But the LLM will write good code faster than you, if you tell it what good code is.

35. My experience has so far been similar to the root commenter's: at the stage where you need a long planning cycle, it's just slower than doing the writing + theory building on my own.

It's an okay mental energy saver for simpler things, but for me the self review in an actual production code context is much more draining than writing is.

I guess we're seeing the split of people for whom reviewing is easy and writing is difficult and vice versa.

36. The key part of my comment is "correctly".

Does it write maintainable code? Does it write extensible code? Does it write secure code? Does it write performant code?

My experience has been it failing most of these. The code might "work", but it's not good for anything more than trivial, well-defined functions (that probably appeared in its training data, written by humans). LLMs have a fundamental lack of understanding of what they're doing, and it's obvious when you look at the finer points of the outcomes.

That said, I'm sure you could write detailed enough specs and provide enough examples to resolve these issues, but that's the point of my original comment - if you're just writing specs instead of code you're not gaining anything.

37. I find “maintainable code” the hardest bias to let go of. 15+ years of coding and design patterns are hard to let go.

But the aha moment for me was what’s maintainable by AI vs by me by hand are on different realms. So maintainable has to evolve from good human design patterns to good AI patterns.

Specs are worth it IMO. Not because if I can spec, I could’ve coded anyway. But because I gain all the insight and capabilities of AI, while minimizing the gotchas and edge failures.

38. To answer all of your questions:

yes, if I steer it properly.

It's very good at spotting design patterns, and implementing them. It doesn't always know where or how to implement them, but that's my job.

The specs and syntactic sugar are just nice quality of life benefits.

39. > Yesterday I had Claude write an audit logging feature to track all changes made to entities in my app. Yeah you get this for free with many frameworks, but my company's custom setup doesn't have it.

But did you truly think through such a feature? Like the guarantees it should follow (how it should cope with entity migrations, like adding a new field) or the cost of maintaining it further down the line. This looks suspiciously like a drive-by PR made on an open-source project.

> That would've taken me at least a day, maybe two.

I think those two days would have been filled with research, comparing alternatives, questions like "can we extract this feature from framework X?", discussing ownership, sharing knowledge, etc. Jumping straight to coding happened before LLMs too, but it usually hurts the long-term viability of the project.

Adding code to a project can be done quite fast (hackathons, ...); ensuring quality is what slows things down in any well-functioning team.

40. It's not particularly interesting.

I wanted to add audit logging for all endpoints we call, all places we call the DB, etc. across areas I haven't touched before. It would have taken me a while to track down all of the touchpoints.

Granted, I am not 100% certain that Claude didn't miss anything. I feel fairly confident that it is correct given that I had it research upfront, had multiple agents review, and it made the correct changes in the areas that I knew.

Also I'm realizing I didn't mention it included an API + UI for viewing events w/ pretty deltas

41. I'd find it deeply funny if the optimal vibe coding workflow continues to evolve to include more and more human oversight, and less and less agent autonomy, to the point where eventually someone makes a final breakthrough that they can save time by bypassing the LLM entirely and writing the code themselves. (Finally coming full circle.)

42. Researching and planning a project is a generally useful thing. This is something I've been doing for years, and it has always produced great results compared to just jumping in and coding. It makes perfect sense that this transfers to LLM use.

43. Since Opus 4.5, things have changed quite a lot. I find LLMs very useful for discussing new features or ideas, and Sonnet is great for executing your plan while you grab a coffee.

44. Most of these AI coding articles seem to be about greenfield development.

That said, if you're on a serious team writing professional software there is still tons of value in always telling AI to plan first, unless it's a small quick task. This post just takes it a few steps further and formalizes it.

I find Cursor works much more reliably using plan mode, reviewing/revising output in markdown, then pressing build. Which isn't a ton of overhead but often leads to lots of context switching as it definitely adds more time.

45. I want to be clear, I'm not against any use of AI. It's hugely useful to save a couple of minutes of "write this specific function to do this specific thing that I could write and know exactly what it would look like". That's a great use, and I use it all the time! It's better autocomplete. Anything beyond that is pushing it - at the moment! We'll see, but spending all day writing specs and double-checking AI output is not more productive than just writing correct code yourself the first time, even if you're AI-autocompleting some of it.

46. Surely Addy Osmani can code. Even he suggests plan first.

https://news.ycombinator.com/item?id=46489061

47. > planning and checking and prompting and orchestrating is far more work than just writing the code yourself.

This! Once I'm familiar with the codebase (which I strive to do very quickly), for most tickets I usually have a plan by the time I've read the description. I can have a couple of implementation questions, but I know where the info is located in the codebase. For things I only have a vague idea about, the whiteboard is where I go.

The nice thing with such a mental plan is that you can start with a rougher version (like a drawing sketch). If I'm starting a new UI screen, I can put placeholder text like "Hello, world", then work on navigation. Once that's done, I can start to pull data, then add mapping functions to have a view model, and so on.

Each step is a verifiable milestone. Describing them is more mentally taxing than just writing the code (which is a flow state for me). Why? Because English is not fit to describe how a computer works (try describing a finite-state machine like a navigation flow in natural language). My mental model is already aligned to code; writing the solution in natural language is asking me to be ambiguous and unclear on purpose.

48. “The workflow I’m going to describe has one core principle: never let Claude write code until you’ve reviewed and approved a written plan.”

I’m not sure we need to be this black and white about things. Speaking from the perspective of leading a dev team, I regularly have Claude Code take a chance at code without reviewing a plan. For example, small issues that I’ve written clear details about, Claude can go to town on those. I’ve never been on a team that didn’t have too many of these types of issues to address.

And a team should have other guardrails in place that validate code before it gets merged somewhere important.

I don’t have to review every single decision one of my teammates is going to make, even those less experienced teammates, but I do prepare teammates with the proper tools (specs, documentation, etc) so they can make a best effort first attempt. This is how I treat Claude Code in a lot of scenarios.

49. Radically different? Sounds to me like the standard spec driven approach that plenty of people use.

I prefer iterative approach. LLMs give you incredible speed to try different approaches and inform your decisions. I don’t think you can ever have a perfect spec upfront, at least that’s my experience.

50. This is quite close to what I've arrived at, but with two modifications

1) anything larger I work on in layers of docs. Architecture and requirements -> design -> implementation plan -> code. Partly it helps me think and nail the larger things first, and partly helps claude. Iterate on each level until I'm satisfied.

2) when doing reviews of each doc I sometimes restart the session and clear context, it often finds new issues and things to clear up before starting the next phase.

51. The multi-pass approach works outside of code too. I run a fairly complex automation pipeline (prompt -> script -> images -> audio -> video assembly) and the single biggest quality improvement was splitting generation into discrete planning and execution phases. One-shotting a 10-step pipeline means errors compound. Having the LLM first produce a structured plan, then executing each step against that plan with validation gates between them, cut my failure rate from maybe 40% to under 10%. The planning doc also becomes a reusable artifact you can iterate on without re-running everything.
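A minimal sketch of the split that comment describes, assuming nothing about the commenter's actual stack: `plan()` and `run_stage()` are stubs standing in for LLM/generation calls, and the gate between stages is an explicit validation function. All names here are hypothetical.

```python
# Hedged sketch of a plan-then-execute pipeline with validation gates.
# plan() and run_stage() are stubs standing in for generation calls;
# the stage names are illustrative, not from the comment.

def plan(prompt: str) -> list[str]:
    # A real version would ask the model for a structured plan first.
    return ["script", "images", "audio", "video"]

def run_stage(stage: str, context: dict) -> dict:
    # A real version would generate the stage's artifact here.
    context[stage] = f"{stage}-output"
    return context

def gate(stage: str, context: dict) -> bool:
    # Validation gate: refuse to continue if the stage produced nothing.
    return bool(context.get(stage))

def run_pipeline(prompt: str) -> dict:
    context: dict = {}
    for stage in plan(prompt):
        context = run_stage(stage, context)
        if not gate(stage, context):
            raise RuntimeError(f"validation failed after stage {stage!r}")
    return context
```

The point is structural: a failure in the `images` step stops the run at that gate instead of compounding into the `audio` and `video` steps.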

52. This looks like an important post. What makes it special is that it operationalizes Polya's classic problem-solving recipe for the age of AI-assisted coding.

1. Understand the problem (research.md)

2. Make a plan (plan.md)

3. Execute the plan

4. Look back

53. Yeah, OODA loop for programmers, basically. It’s a good approach.

54. Since everyone is showing their flow, here's mine:

* create a feature-name.md file in a gitignored folder

* start the file by giving the business context

* describe a high-level implementation and user flows

* describe database structure changes (I find it important not to leave it for interpretation)

* ask Claude to inspect the feature and review it for coherence; while answering its questions, I ask it to augment the feature-name.md file with the answers

* enter Claude's plan mode and provide that feature-name.md file

* at this point it's detailed enough that corrections from me are rarely needed

55. This is the way.

The practice is:

- simple

- effective

- retains control and quality

Certainly the “unsupervised agent” workflows are getting a lot of attention right now, but they require a specific set of circumstances to be effective:

- clear validation loop (eg. Compile the kernel, here is gcc that does so correctly)

- AI-enabled tooling (an MCP / CLI tool that will lint, test, and provide feedback immediately)

- oversight to prevent agents going off the rails (an open area of research)

- an unlimited token budget

That means that most people can't use unsupervised agents.

Not that they don't work; most people simply don't have an environment and a task that are appropriate.

By comparison, anyone with Cursor or Claude can immediately start using this approach, or their own variant on it.

It does not require fancy tooling.

It does not require an arcane agent framework.

It works generally well across models.

This is one of those few genuine pieces of good practical advice for people getting into AI coding.

Simple. Obviously works once you start using it. No external dependencies. BYO tools to help with it; no "buy my AI startup xxx to help". No "star my GitHub so I can get a job at $AI corp too".

Great stuff.

56. Honestly, this is just language models in general at the moment, and not just coding.

It’s the same reason adding a thinking step works.

You want to write a paper, you have it form a thesis and structure first. (In this one you might be better off asking for 20 and seeing if any of them are any good.) You want to research something, first you add gathering and filtering steps before synthesis.

Adding smarter words or telling it to be deeper does work by slightly repositioning where your query ends up in space.

Asking for the final product first right off the bat leads to repetitive verbose word salad. It just starts to loop back in on itself. Which is why temperature was a thing in the first place, and leads me to believe they’ve turned the temp down a bit to try and be more accurate. Add some randomness and variability to your prompts to compensate.

57. Absolutely. And you can also always let the agent look back at the plan to check if it is still on track and aligned.

One step I added, that works great for me, is letting it write (api-level) tests after planning and before implementation. Then I’ll do a deep review and annotation of these tests and tweak them until everything is just right.
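One way to picture that test-first step, with an entirely hypothetical order API (none of the names below come from the comment): the API-level tests are what gets written and reviewed before implementation; the implementation then only has to satisfy the reviewed contract.

```python
# Hypothetical example of the "tests before implementation" step.
# The tests encode the reviewed contract; create_order() is written afterwards.

def create_order(items):
    # Minimal implementation, written only after the tests below were reviewed.
    if not items:
        raise ValueError("order must contain at least one item")
    return {"items": list(items), "status": "pending"}

def test_rejects_empty_cart():
    try:
        create_order([])
    except ValueError:
        pass
    else:
        raise AssertionError("empty cart should have been rejected")

def test_new_order_is_pending():
    order = create_order(["ticket"])
    assert order["status"] == "pending"
    assert order["items"] == ["ticket"]

test_rejects_empty_cart()
test_new_order_is_pending()
```

Reviewing and tweaking tests like these before any implementation exists is much cheaper than reviewing a diff after the fact.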

58. Huge +1. This loop consistently delivers great results for my vibe coding.

The “easy” path of “short prompt declaring what I want” works OK for simple tasks but consistently breaks down for medium to high complexity tasks.

59. Can you help me understand the difference between "short prompt for what I want (next)" vs medium to high complexity tasks?

What I mean is, in practice, how does one even get to a high-complexity task? What does that look like? Because isn't it more common that one sees only so far ahead?

60. It's more or less what comes out of the box with plan mode, plus a few extra bits?

61. Planning is important because you get the LLM to explain the problem and solution in its language and structure, not yours.

This shortcuts a range of problem cases where the LLM fights between the user's strict and potentially conflicting requirements and its own learning.

In the early days we used to get the LLM to write the prompts for us to get around this problem; now we have planning built in.

62. This is the flow I've found myself working towards. Essentially maintaining more and more layered documentation for the LLM produces better and more consistent results. What is great here is the emphasis on the use of such documents in the planning phase. I'm feeling much more motivated to write solid documentation recently, because I know someone (the LLM) is actually going to read it! I've noticed my efforts and skill acquisition have moved sharply from app developer towards DevOps and architecture / management, but I think I'll always be grateful for the application engineering experience that I think the next wave of devs might miss out on.

I've also noted such a huge gulf between some developers describing 'prompting things into existence' and the approach described in this article. Both types seem to report success, though my experience is that the latter seems more realistic, and much more likely to produce robust code that's likely to be maintainable for long term or project critical goals.

63. I do something very similar, also with Claude and Codex, because the workflow is controlled by me, not by the tool. But instead of plan.md I use a ticket system, basically ticket_<number>_<slug>.md, where I let the agent create the ticket from a chat, correct and annotate it afterwards, and send it back, sometimes to a new agent instance. This workflow helps me keep track of what has been done over time in the projects I work on. Also, this approach does not need any „real“ ticket system tooling/mcp/skill/whatever since it works purely on text files.
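
The file-naming convention above can be sketched in a few lines. This is purely illustrative: the `tickets/` directory name, the front-matter text, and the helper itself are assumptions, not anything the commenter describes beyond the `ticket_<number>_<slug>.md` pattern.

```python
# Hypothetical sketch of a file-based ticket convention:
# ticket_<number>_<slug>.md, numbered sequentially in one directory.
from pathlib import Path
import re

def next_ticket_path(tickets_dir: Path, title: str) -> Path:
    """Build the next ticket_<number>_<slug>.md path in tickets_dir."""
    tickets_dir.mkdir(parents=True, exist_ok=True)
    # Find the highest existing ticket number among ticket_*_*.md files.
    numbers = [
        int(m.group(1))
        for p in tickets_dir.glob("ticket_*_*.md")
        if (m := re.match(r"ticket_(\d+)_", p.name))
    ]
    number = max(numbers, default=0) + 1
    # Slugify the title: lowercase, runs of non-alphanumerics become hyphens.
    slug = re.sub(r"[^a-z0-9]+", "-", title.lower()).strip("-")
    return tickets_dir / f"ticket_{number}_{slug}.md"

path = next_ticket_path(Path("tickets"), "Add caching layer")
path.write_text("# Add caching layer\n\nStatus: draft\n")
print(path.name)  # e.g. ticket_1_add-caching-layer.md
```

Because the tickets are plain text files, the agent can read, create, and annotate them with no extra tooling, which is the point the commenter makes.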

64. +1 to creating tickets by simply asking the agent to. It's worked great, and larger tasks can be broken down into smaller subtasks that could reasonably be completed in a single context window, so you rarely ever have to deal with compaction. Especially in the last few months, since Claude's gotten good at dispatching agents to handle tasks if you ask it to, I can plan large changes that span multiple tickets and tell claude to dispatch agents as needed to handle them (which it will do in parallel if they mostly touch different files), keeping the main chat relatively clean for orchestration and validation work.

65. semantic plan name is important

66. Regarding inline notes, I use a specific format in the `/plan` command, by using the `ME:` prefix.

https://github.com/srid/AI/blob/master/commands/plan.md#2-pl...

It works very similar to Antigravity's plan document comment-refine cycle.

https://antigravity.google/docs/implementation-plan

67. I’ve been using this same pattern, except not the research phase. Definitely will try to add it to my process as well.

Sometimes when doing a big task I ask claude to implement each phase separately and review the code after each step.

68. The annotation cycle is the key insight for me. Treating the plan as a living doc you iterate on before touching any code makes a huge difference in output quality.

Experimentally, i've been using mfbt.ai [ https://mfbt.ai ] for roughly the same thing in a team context. it lets you collaboratively nail down the spec with AI before handing off to a coding agent via MCP.

Avoids the "everyone has a slightly different plan.md on their machine" problem. Still early days but it's been a nice fit for this kind of workflow.

69. I agree, and this is why I tend to use gptel in emacs for planning - the document is the conversation context, and can be edited and annotated as you like.

70. I've been working off and on on a vibe coded FP language and transpiler - mostly just to get more experience with Claude Code and see how it handles complex real world projects. I've settled on a very similar flow, though I use three documents: plan, context, task list. Multiple rounds of iteration when planning a feature. After completion, have a clean session do an audit to confirm that everything was implemented per the design. Then I have both Claude and CodeRabbit do code review passes before I finally do manual review. VERY heavy emphasis on tests, the project currently has 2x more test code than application code. So far it works surprisingly well. Example planning docs below -

https://github.com/mbcrawfo/vibefun/tree/main/.claude/archiv...

71. I've been teaching AI coding tool workshops for the past year and this planning-first approach is by far the most reliable pattern I've seen across skill levels.

The key insight that most people miss: this isn't a new workflow invented for AI - it's how good senior engineers already work. You read the code deeply, write a design doc, get buy-in, then implement. The AI just makes the implementation phase dramatically faster.

What I've found interesting is that the people who struggle most with AI coding tools are often junior devs who never developed the habit of planning before coding. They jump straight to "build me X" and get frustrated when the output is a mess. Meanwhile, engineers with 10+ years of experience who are used to writing design docs and reviewing code pick it up almost instantly - because the hard part was always the planning, not the typing.

One addition I'd make to this workflow: version your research.md and plan.md files in git alongside your code. They become incredibly valuable documentation for future maintainers (including future-you) trying to understand why certain architectural decisions were made.

72. The biggest roadblock to using agents to maximum effectiveness like this is the chat interface. Its convenience is a detriment and a distraction. I've found myself repeatedly giving into that convenience only to realize that I have wasted an hour and need to start over because the agent is just obliviously circling the solution that I thought was fully obvious from the context I gave it. Clearly these tools are exceptional at transforming inputs into outputs and, counterintuitively, not as exceptional when the inputs are constantly interleaved with the outputs like they are in chat mode.

73. Lol I wrote about this and been using plan+execute workflow for 8 months.

Sadly my post didn't get much attention at the time.

https://thegroundtruth.media/p/my-claude-code-workflow-and-p...

74. I have to give this a try. My current model for backend is the same as how the author does frontend iteration. My friend does the research-plan-edit-implement loop, and there is no real difference between the quality of what I do and what he does. But I do like this just for how it serves as documentation of the thought process across AI/human, and can be added to version control. Instead of humans reviewing PRs, perhaps humans can review the research/plan document.

On the PR review front, I give Claude the ticket number and the branch (or PR) and ask it to review for correctness, bugs and design consistency. The prompt is always roughly the same for every PR. It does a very good job there too.

Modelwise, Opus 4.6 is scary good!

75. The separation of planning and execution resonates strongly. I've been using a similar pattern when building with AI APIs — write the spec/plan in natural language first, then let the model execute against it.

One addition that's worked well for me: keeping a persistent context file that the model reads at the start of each session. Instead of re-explaining the project every time, you maintain a living document of decisions, constraints, and current state. Turns each session into a continuation rather than a cold start.

The biggest productivity gain isn't in the code generation itself — it's in reducing the re-orientation overhead between sessions.

76. > “remove this section entirely, we don’t need caching here” — rejecting a proposed approach

I wonder why you don't remove it yourself. Aren't you already editing the plan?

77. Interesting! I feel like I'm learning to code all over again! I've only been using Claude for a little more than a month and until now I've been figuring things out on my own. Building my methodology from scratch. This is much more advanced than what I'm doing. I've been going straight to implementation, but doing one very small and limited feature at a time, describing implementation details (data structures like this, use that API here, import this library etc) verifying it manually, and having Claude fix things I don't like. I had just started getting annoyed that it would make the same (or very similar) mistake over and over again and I would have to fix it every time. This seems like it'll solve that problem I had only just identified! Neat!

78. > Most developers type a prompt, sometimes use plan mode, fix the errors, repeat.

> ...

> never let Claude write code until you’ve reviewed and approved a written plan

I certainly always work towards an approved plan before I let it loose on changing the code. I just assumed most people did, honestly. Admittedly, sometimes there are "phases" to the implementation (because some parts can be figured out later and it's more important to get the key parts up and running first), but each phase gets a full, reviewed plan before I tell it to go.

In fact, I just finished writing a command and instruction to tell claude that, when it presents a plan for implementation, offer me another option; to write out the current (important parts of the) context and the full plan to individual (ticket specific) md files. That way, if something goes wrong with the implementation I can tell it to read those files and "start from where they left off" in the planning.

79. Haha this is surprisingly and exactly how I use claude as well. Quite fascinating that we independently discovered the same workflow.

I maintain two directories: "docs/proposals" (for the research md files) and "docs/plans" (for the planning md files). For complex research files, I typically break them down into multiple planning md files so claude can implement one at a time.

A small difference in my workflow is that I use subagents during implementation to avoid context from filling up quickly.

80. Same, I formalized a similar workflow for my team (oriented around feature requirement docs). I am thinking about fully productizing it and am looking for feedback - https://acai.sh

Even if the product doesn’t resonate I think I’ve stumbled on some ideas you might find useful^

I do think spec-driven development is where this all goes. Still making up my mind though.

81. I recently discovered GitHub speckit, which separates planning/execution into stages: specify, plan, tasks, implement. I'm finding it aligns with the OP in the level of “focus” and “attention” this gets out of Claude Code.

Speckit is worth trying as it automates what is being described here, and with Opus 4.6 it's been a kind of BC/AD moment for me.

82. In my own tests I have found opus to be very good at writing plans, terrible at executing them. It typically ignores half of the constraints.
https://x.com/xundecidability/status/2019794391338987906?s=2...
https://x.com/xundecidability/status/2024210197959627048?s=2...

83. 1. Don't implement too much at a time

2. Have the agent review if it followed the plan and relevant skills accurately.

84. I just use Jesse’s “superpowers” plugin. It does all of this but also steps you through the design and gives you bite sized chunks and you make architecture decisions along the way. Far better than making big changes to an already established plan.

85. Good article, but I would rephrase the core principle slightly:

Never let Claude write code until you’ve reviewed, *fully understood* and approved a written plan.

In my experience, the beginning of chaos is the point at which you trust that Claude has understood everything correctly and claims to present the very best solution. At that point, you leave the driver's seat.

86. I tried Opus 4.6 recently and it’s really good. I had ditched Claude a long time ago for Grok + Gemini + OpenCode with Chinese models. I used Grok/Gemini for planning and core files, and OpenCode for setup, running, deploying, and editing.

However, Opus made me rethink my entire workflow. Now, I do it like this:

* PRD (Product Requirements Document)

* main.py + requirements.txt + readme.md (I ask for minimal, functional, modular code that fits the main.py)

* Ask for a step-by-step ordered plan

* Ask to focus on one step at a time

The super powerful thing is that I don’t get stuck on missing accounts, keys, etc. Everything is ordered and runs smoothly. I go rapidly from idea to working product, and it’s incredibly easy to iterate if I figure out new features are required while testing. I also have GLM via OpenCode, but I mainly use it for "dumb" tasks.

Interestingly, for reasoning capabilities regarding standard logic inside the code, I found Gemini 3 Flash to be very good and relatively cheap. I don't use Claude Code for the actual coding because forcing everything via chat into a main.py encourages minimal code that's easy to skim—it gives me a clearer representation of the feature space.

87. My process is similar, but I recently added a new "critique the plan" feedback loop that is yielding good results. Steps:

1. Spec

2. Plan

3. Read the plan & tell it to fix its bad ideas.

4. (NB) Critique the plan (loop) & write a detailed report

5. Update the plan

6. Review and check the plan

7. Implement plan

Detailed here:

https://x.com/PetrusTheron/status/2016887552163119225

88. Same. In my experience, the first plan always benefits from being challenged once or twice by claude itself.

89. this is exactly how I work with cursor

except that I put notes to plan document in a single message like:

> plan quote
my note
> plan quote
my note

otherwise, I'm not sure how to guarantee that ai won't confuse my notes with its own plan.

one new thing for me is to review the todo list; I was always relying on the auto-generated todo list
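
The quote/note convention above (quoted plan lines prefixed with `>`, the reviewer's note on the following lines) is simple enough to parse mechanically. A toy sketch, assuming this exact layout; no tool in the thread is claimed to work this way, and the function name is hypothetical:

```python
# Split an annotated message into (quoted_plan_text, reviewer_note) pairs.
# Lines starting with ">" are quoted plan text; other non-empty lines are
# the note attached to the preceding quote.
def split_annotations(message: str) -> list[tuple[str, str]]:
    pairs: list[tuple[str, str]] = []
    quote_lines: list[str] = []
    note_lines: list[str] = []

    def flush() -> None:
        if quote_lines or note_lines:
            pairs.append((" ".join(quote_lines), " ".join(note_lines)))
            quote_lines.clear()
            note_lines.clear()

    for line in message.splitlines():
        line = line.strip()
        if line.startswith(">"):
            if note_lines:  # a fresh quote starts a new pair
                flush()
            quote_lines.append(line.lstrip("> ").strip())
        elif line:
            note_lines.append(line)
    flush()
    return pairs

msg = ("> use Redis for caching\n"
       "too heavy, use an in-memory dict\n"
       "> add retry logic\n"
       "agreed")
for quote, note in split_annotations(msg):
    print(f"PLAN: {quote} | NOTE: {note}")
```

Keeping the quote/note distinction explicit is exactly what addresses the commenter's worry about the model confusing reviewer notes with its own plan text.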

90. I use amazon kiro.

The AI first works with you to write requirements, then it produces a design, then a task list.

This helps the AI to make smaller chunks to work on; it will work on one task at a time.

I can let it run for an hour or more in this mode. Then there is lots of stuff to fix, but it is mostly correct.

Kiro also supports steering files, they are files that try to lock the AI in for common design decisions.

The price is that a lot of the context is used up with these files, and kiro constantly pauses to reset the context.

91. There is not a lot of explanation of WHY this is better than doing the opposite (start coding and see how it goes), or of how this would apply to Codex models.

I do exactly the same; I even developed my own workflows with Pi agent, which works really well. Here is the reason:

- Claude needs a lot more steering than other models; it's too eager to do stuff, does stupid things, and writes terrible code without feedback.

- Claude is very good at following the plan, you can even use a much cheaper model if you have a good plan. For example I list every single file which needs edits with a short explanation.

- At the end of the plan, I have a clear picture in my head how the feature will exactly look like and I can be pretty sure the end result will be good enough (given that the model is good at following the plan).

A lot of things don't need planning at all. Simple fixes, refactoring, simple scripts, packaging, etc. Just keep it simple.

92. Funny how I came up with something loosely similar. Asking Codex to write a detailed plan in a markdown document, reviewing it, and asking it to implement it step by step. It works exquisitely well when it can build and test itself.

93. I have tried using this and other workflows for a long time and had never been able to get them to work (see chat history for details).

This has changed in the last week, for 3 reasons:

1. Claude opus. It’s the first model where I haven’t had to spend more time correcting things than it would’ve taken me to just do it myself. The problem is that opus chews through tokens, which led to..

2. I upgraded my Claude plan. Previously, on the regular plan, I’d get about 20 mins before running out of tokens for the session and then needing to wait a few hours to use it again. It was fine for little scripts or toy apps but not feasible for the regular dev work I do. So I upgraded to 5x. This got me 1-2 hours per session before tokens expired, which was better but still a frustration.

Wincing at the price, I upgraded again to the 20x plan, and this was the next game changer. I had plenty of spare tokens per session, and at that price it felt like they were being wasted, so I ramped up my usage. Following a similar process as OP, but with a plans directory with subdirectories for backlog, active and complete plans, and skills with strict rules for planning, implementing and completing plans, I now have 5-6 projects on the go. While I’m planning a feature on one, the others are implementing. The strict plans and controls keep them on track, and I have follow-up skills for auditing quality and performance.

I still haven’t hit token limits for a session, but I’ve almost hit my token limit for the week, so I feel like I’m getting my money’s worth. In that sense, spending more has forced me to figure out how to use more.

3. The final piece of the puzzle is using opencode over claude code. I’m not sure why, but I just don’t gel with Claude Code. Maybe it’s all the sautéing and flibertygibbering, maybe it’s all the permission asking, maybe it’s that it doesn’t show what it’s doing as much as opencode. Whatever it is, it just doesn’t work well for me. Opencode, on the other hand, is great. It shows what it’s doing and how it’s thinking, which makes it easy for me to spot when it’s going off track and correct early.

Having a detailed plan, and correcting and iterating on the plan, is essential. Making Claude follow the plan is also essential - but there’s a line. Too fine-grained and it’s not as creative at solving problems. Too loose/high-level and it makes bad choices and goes in the wrong direction.

Is it actually making me more productive? I think it is but I’m only a week in. I’ve decided to give myself a month to see how it all works out.

I don’t intend to keep paying for the 20x plan unless I can see a path to using it to earn me at least as much back.
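
The backlog/active/complete plans directory this commenter describes amounts to a tiny state machine over files. A minimal sketch, assuming those three directory names from the comment; the `promote` helper and the `plans/` root are illustrative inventions:

```python
# Move a plan file through a backlog -> active -> complete lifecycle,
# where each stage is just a subdirectory of a plans root.
from pathlib import Path
import shutil

STAGES = ("backlog", "active", "complete")

def promote(plans_root: Path, name: str) -> Path:
    """Move plans_root/<stage>/<name> to the next lifecycle stage."""
    for stage, nxt in zip(STAGES, STAGES[1:]):
        src = plans_root / stage / name
        if src.exists():
            dst = plans_root / nxt / name
            dst.parent.mkdir(parents=True, exist_ok=True)
            shutil.move(str(src), str(dst))
            return dst
    raise FileNotFoundError(f"{name} not found in backlog or active")

root = Path("plans")
(root / "backlog").mkdir(parents=True, exist_ok=True)
(root / "backlog" / "feature-x.md").write_text("# Feature X plan\n")
print(promote(root, "feature-x.md"))  # e.g. plans/active/feature-x.md
```

As with the ticket-file approach earlier in the thread, the appeal is that the whole lifecycle lives in version-controllable text files rather than in any dedicated tooling.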

94. Hub and spoke documentation in planning has been absolutely essential for the way my planning was before, and it's pretty cool seeing it work so well for planning mode to build scaffolds and routing.

95. this sounds... really slow. for large changes for sure i'm investing time into planning. but such a rigid system can't possibly be as good as a flexible approach with variable amounts of planning based on complexity

96. I do something broadly similar. I ask for a design doc that contains an embedded todo list, broken down into phases. Looping on the design doc asking for suggestions seems to help. I'm up to about 40 design docs so far on my current project.

97. This is just Waterfall for LLMs. What happens when you explore the problem space and need to change up the plan?

98. I feel like if I have to do all this, I might as well write the code myself.

99. This is great. My workflow is also heading in that direction, so this is a great roadmap. I've already learned that just naively telling Claude what to do and letting it work, is a recipe for disaster and wasted time.

I'm not this structured yet, but I often start with having it analyse and explain a piece of code, so I can correct it before we move on. I also often switch to an LLM that's separate from my IDE because it tends to get confused by sprawling context.

100. I don't know. I tried various methods, and this one fails quite a bit of the time. The problem is that the plan naturally always skips some important details, or assumes some library function, which then gets taken as a hard instruction in the next section. And claude can't handle ambiguity if the instruction is very detailed (e.g. if the plan asks to use a certain library, even if it is a bad fit, claude won't know that decision is flexible). If the instruction is less detailed, I've seen claude willing to try multiple things, and if it keeps failing it doesn't hesitate to revert almost everything.

In my experience, the best scenario is that instruction and plan should be human written, and be detailed.

101. Another pattern is:

1. First vibecode software to figure out what you want

2. Then throw it out and engineer it

102. This separation of planning and execution resonates deeply with how I approach task management in general, not just coding.

The key insight here - that planning and execution should be distinct phases - applies to productivity tools too. I've been using www.dozy.site which takes a similar philosophy: it has smart calendar scheduling that automatically fills your empty time slots with planned tasks. The planning happens first (you define your tasks and projects), then the execution is automated (tasks get scheduled into your calendar gaps).

The parallel is interesting: just like you don't want Claude writing code before the plan is solid, you don't want to manually schedule tasks before you've properly planned what needs to be done. The separation prevents wasted effort and context switching.

The annotation cycle you describe (plan -> review -> annotate -> refine) is exactly how I work with my task lists too. Define the work, review it, adjust priorities and dependencies, then let the system handle the scheduling.

103. Wow, I've been needing this! The one issue I’ve had with terminals is reviewing plans and wanting the ability to provide feedback on specific plan sections in a more organized way.

Really nice ui based on the demo.
</comments_about_topic>

Write a concise, engaging paragraph (3-5 sentences) summarizing the key points and perspectives in these comments about the topic. Focus on the most interesting viewpoints. Do not use bullet points—write flowing prose.
