Summarizer

LLM Input

llm/065c6e83-d0d5-4aca-be3d-92768a8a3506/topic-8-a7d84eab-0c79-4a4b-82ac-7e2ab6d996fd-input.json

prompt

The following is content for you to summarize. Do not respond to the comments—summarize them.

<topic>
Test-Driven Development Integration # Suggestions to add comprehensive tests to the workflow. Writing tests before implementation, using tests as verification. Arguments that test coverage enables safer refactoring with AI.
</topic>

<comments_about_topic>
1. Why would you test implementation details? Test what's delivered, not how it's delivered. The thinking portion, synthesized or not, is merely implementation.

The resulting artefact, that's what is worth testing.

2. > Why would you test implementation details

Because this has never been sufficient. From things like hard-to-test cases to things like readability and long-term maintenance. Reading and understanding the code is more efficient, and necessary for any code worth keeping around.

3. > If it did work, well, the oldest trick in computer science is writing compilers, i suppose we will just have to write an English to pedantry compiler.

"Add tests to this function" for GPT-3.5-era models was much less effective than "you are a senior engineer. add tests for this function. as a good engineer, you should follow the patterns used in these other three function+test examples, using this framework and mocking lib." In today's tools, "add tests to this function" results in a bunch of initial steps to look in common places to see if that additional context already exists, and then pull it in based on what it finds. You can see it in the output the tools spit out while "thinking."

So I'm 90% sure this is already happening on some level.

4. I don’t use plan.md docs either, but I recognise the underlying idea: you need a way to keep agent output constrained by reality.

My workflow is more like scaffold -> thin vertical slices -> machine-checkable semantics -> repeat.

Concrete example: I built and shipped a live ticketing system for my club (Kolibri Tickets). It’s not a toy: real payments (Stripe), email delivery, ticket verification at the door, frontend + backend, migrations, idempotency edges, etc. It’s running and taking money.

The reason this works with AI isn’t that the model “codes fast”. It’s that the workflow moves the bottleneck from “typing” to “verification”, and then engineers the verification loop:

- keep the spine runnable early (end-to-end scaffold)

- add one thin slice at a time (don't let it touch 15 files speculatively)

- force checkable artifacts (tests/fixtures/types/state-machine semantics where it matters)

- treat refactors as normal, because the harness makes them safe

If you run it open-loop (prompt -> giant diff -> read/debug), you get the “illusion of velocity” people complain about. If you run it closed-loop (scaffold + constraints + verifiers), you can actually ship faster because you’re not paying the integration cost repeatedly.

Plan docs are one way to create shared state and prevent drift. A runnable scaffold + verification harness is another.

5. Now that code is cheap, I ensured my side project has unit/integration tests (will enforce 100% coverage), Playwright tests, static typing (it's in Python), scripts for all tasks. Will learn mutation testing too (yes, it's overkill). Now my agent works up to 1 hour in loops and emits concise code I don't have to edit much.

6. I actually don't really like a few things about this approach.

First, the "big bang" write it all at once. You are going to end up with thousands of lines of code that were monolithically produced. I think it is much better to have it write the plan and formulate it as sensible technical steps that can be completed one at a time. Then you can work through them. I get that this is not very "vibe"ish but that is kind of the point. I want the AI to help me get to the same point I would be at with produced code AND understanding of it, just accelerate that process. I'm not really interested in just generating thousands of lines of code that nobody understands.

Second, the author keeps referring to adjusting the behaviour, but never to incorporating that into long-lived guidance. To me, integral to the planning process is building an overarching knowledge base. Every time you're telling it there's something wrong, you need to tell it to update the knowledge base about why, so it doesn't do it again.

Finally, no mention of tests? Just quick checks? To me, you have to end up with comprehensive tests. Maybe to the author it goes without saying, but I find it is integral to build this into the planning. At certain stages you will want certain types of tests: sometimes in advance of the code (so TDD style), other times built alongside it or after.

It's definitely going to be interesting to see how software methodology evolves to incorporate AI support, and where it ultimately lands.

7. The article's approach matches mine, but I've learned from exactly the things you're pointing out.

I get the PLAN.md (or equivalent) to be separated into "phases" or stages, then carefully prompt it (because Claude and Codex both love to "keep going") to only implement that stage, and update the PLAN.md.

Tests are crucial too, and form another part of the plan really. Though my current workflow begins to build them later in the process than I would prefer...

8. Exactly; the original commenter seems determined to write off AI as "just not as good as me".

The original article is, to me, seemingly not that novel. Not because it's a trite example, but because I've begun to experience massive gains from following the same basic premise as the article. And I can't believe there are others who aren't using it like this.

I iterate the plan until it's seemingly deterministic, then I strip the plan of implementation, and re-write it following a TDD approach. Then I read all specs, and generate all the code to red->green the tests.

If this commenter is too good for that, then it's that attitude that'll keep him stuck. I already feel like my project's backlog is achievable, this year.

9. For some software, sure but not most.

And you don’t blame humans anyways lol. Everywhere I’ve worked has had “blameless” postmortems. You don’t remove human review unless you have reasonable alternatives like high test coverage and other automated reviews.

10. That sounds like the recommended approach. However, there's one more thing I often do: whenever Claude Code and I complete a task that didn't go well at first, I ask CC what it learned, and then I tell it to write down what it learned for the future. It's hard to believe how much better CC has become since I started doing that. I ask it to write dozens of unit tests and it just does. Nearly perfectly. It's insane.

11. Absolutely. And you can also always let the agent look back at the plan to check if it is still on track and aligned.

One step I added, that works great for me, is letting it write (api-level) tests after planning and before implementation. Then I’ll do a deep review and annotation of these tests and tweak them until everything is just right.

12. I've been working off and on on a vibe coded FP language and transpiler - mostly just to get more experience with Claude Code and see how it handles complex real world projects. I've settled on a very similar flow, though I use three documents: plan, context, task list. Multiple rounds of iteration when planning a feature. After completion, have a clean session do an audit to confirm that everything was implemented per the design. Then I have both Claude and CodeRabbit do code review passes before I finally do manual review. VERY heavy emphasis on tests, the project currently has 2x more test code than application code. So far it works surprisingly well. Example planning docs below -

https://github.com/mbcrawfo/vibefun/tree/main/.claude/archiv...

13. The author is quite far on their journey but would benefit from writing simple scripts to enforce invariants in their codebase. Invariant broken? Script exits with a non-zero exit code and some output that tells the agent how to address the problem. Scripts are deterministic, run in milliseconds, and use zero tokens. Put them in husky or pre-commit, install the git hooks, and your agent won’t be able to commit without all your scripts succeeding.
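A minimal sketch of the kind of invariant script the commenter describes, assuming a Python codebase and an entirely made-up project rule (the `app/` layout and the forbidden-import invariant are hypothetical):

```python
#!/usr/bin/env python3
"""Invariant check: fail the commit if any module under app/ imports
directly from app.db (a hypothetical layering rule for this sketch)."""
import pathlib
import re
import sys

FORBIDDEN = re.compile(r"^\s*(from|import)\s+app\.db\b", re.MULTILINE)

violations = []
for path in pathlib.Path("app").rglob("*.py"):
    if path.parts[:2] == ("app", "db"):
        continue  # the db layer itself is allowed to import app.db
    if FORBIDDEN.search(path.read_text()):
        violations.append(str(path))

if violations:
    # Non-zero exit plus actionable output, so the agent can self-correct.
    print("Invariant broken: import app.db only via the repository layer:")
    print("\n".join(f"  {v}" for v in violations))
    sys.exit(1)
```

Wired into a pre-commit hook, this is exactly the deterministic, millisecond-scale, zero-token gate the commenter describes: the agent's commit fails, the output says why, and the loop continues without you.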

And “Don’t change this function signature” should be enforced not by anticipating that your coding agent “might change this function signature so we better warn it not to,” but via an end-to-end test that fails if the signature is changed (because the other code that needs it unchanged now has an error). That takes the author out of the loop: they need not watch for the change in order to issue said correction, and can instead sip coffee while the agent observes that it caused a test failure, then corrects it without intervention, probably by rolling back the signature change and changing something else.
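The commenter's preferred form is an end-to-end test that breaks naturally when dependent code stops compiling or passing. A more direct variant, pinning the signature explicitly, is also possible; this sketch uses a made-up `charge` function, not anything from the article:

```python
import inspect

# Hypothetical function whose signature other code depends on.
def charge(customer_id: str, amount_cents: int, *, idempotency_key: str) -> bool:
    return True  # stand-in body for the sketch

def test_charge_signature_is_stable():
    """Fails loudly if an agent 'helpfully' changes the signature."""
    params = list(inspect.signature(charge).parameters)
    assert params == ["customer_id", "amount_cents", "idempotency_key"]

test_charge_signature_is_stable()
```

Either way, the rule lives in the test suite rather than in a prompt the agent may ignore.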

14. This is basically long-lived specs that are used as tests to check that the product still adheres to the original idea that you wanted to implement, right?

This inspired me to finally write good old playwright tests for my website :).

15. I suggest reading the tests that the Superpowers author has come up with for testing the skills. See the GitHub repo.

16. Another approach is to spec functionality using comments and interfaces, then tell the LLM to first implement tests and finally make the tests pass. This way you also get regression safety and can inspect that it works as it should via the tests.
</comments_about_topic>

Write a concise, engaging paragraph (3-5 sentences) summarizing the key points and perspectives in these comments about the topic. Focus on the most interesting viewpoints. Do not use bullet points—write flowing prose.

topic

Test-Driven Development Integration # Suggestions to add comprehensive tests to the workflow. Writing tests before implementation, using tests as verification. Arguments that test coverage enables safer refactoring with AI.

commentCount

16
