Summarizer

Spec Writing as the New Coding

The idea that working with agents shifts the primary task from writing syntax to writing detailed specifications and prompts. Users note that AI forces developers to be more explicit about requirements, effectively turning English specs into the source code, though some argue this is just a verbose and nondeterministic programming language.


While some developers find joy in shifting from syntax to high-level architectural "editing," critics argue that treating English specifications as source code merely creates a verbose and non-deterministic programming language. Success in this new paradigm requires finding a "sweet spot" in task scoping, where humans maintain rigorous editorial control over the system’s trunk while delegating "code inpainting" for the smaller branches. However, this transition remains contentious; many feel that the constant need to verify AI output and manage repetitive errors creates a "supervision cost" that threatens to outweigh the efficiency gains of automation. Ultimately, the shift forces a formalization of the developer's role as a translator, requiring them to build complex "immune systems" of rules to guide agents through the inherent ambiguity of natural language.

23 comments tagged with this topic

View on HN · Topics
If natural language is used to specify work to the LLM, how can the output ever be trusted? You'll always need to make sure the program does what you want, rather than what you said.
View on HN · Topics
Just write a prompt so specific and detailed that it starts including implementation instructions, and you've come up with the most expensive programming language ever made.
View on HN · Topics
Absolutely this. I am tired of that trope, and of the argument that "well, at some point we can come up with a prompt language that does exactly what you want and you just give it a detailed spec." A detailed spec is called code. It's the most roundabout way to make a programming language, and even then it still isn't deterministic.
View on HN · Topics
> Break down sessions into separate clear, actionable tasks. Don't try to "draw the owl" in one mega session. This is the key one, I think. At one extreme you can tell an agent "write a for loop that iterates over the variable `numbers` and computes the sum" and it'll do this successfully, but the scope is so small there's not much point in using an LLM. At the other extreme you can tell an agent "make me an app that's Facebook for dogs" and it'll make so many assumptions about the architecture, code and product that there's no chance it produces anything useful beyond a cool prototype to show mom and dad. A lot of successful LLM adoption for code is finding this sweet spot. With overly specific instructions you don't gain much productivity, and with overly broad instructions you end up redoing too much of the work.
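The trivially-scoped extreme the commenter describes really is just one line of ordinary code, which is the point; a minimal sketch of the task as stated:

```python
# The "write a for loop that iterates over `numbers` and computes the sum"
# task from the comment, done by hand:
numbers = [3, 1, 4, 1, 5]

total = 0
for n in numbers:
    total += n

print(total)  # 14, same as the built-in sum(numbers)
```

At this scope, typing the prompt costs about as much as typing the code.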
View on HN · Topics
This is actually an aspect of using AI tools I really enjoy: Forming an educated intuition about what the tool is good at, and tastefully framing and scoping the tasks I give it to get better results. It cognitively feels very similar to other classic programming activities, like modularization at any level from architecture to code units/functions, thoughtfully choosing how to lay out and chunk things. It's always been one of the things that make programming pleasurable for me, and some of that feeling returns when slicing up tasks for agents.
View on HN · Topics
I agree that framing and scoping tasks is becoming a real joy. The great thing about this strategy is there's a point at which you can scope something small enough that it's hard for the AI to get it wrong and it's easy enough for you as a human to comprehend what it's done and verify that it's correct. I'm starting to think of projects now as a tree structure where the overall architecture of the system is the main trunk and from there you have the sub-modules, and eventually you get to implementations of functions and classes. The goal of the human in working with the coding agent is to have full editorial control of the main trunk and main sub-modules and delegate as much of the smaller branches as possible. Sometimes you're still working out the higher-level architecture, too, and you can use the agent to prototype the smaller bits and pieces which will inform the decisions you make about how the higher-level stuff should operate.
View on HN · Topics
> it just polls you when it needs clarification I think anyone who has worked on a serious software project would say this means it would be polling you constantly. Even if we posit that an LLM is equivalent to a human, humans constantly clarify requirements and architecture. IMO, on both of those fronts the correct path often reveals itself over time rather than being knowable from the start. So in this scenario it seems like you'd be dealing with constant pings and would need to make sure your understanding of the project grows along with the LLM's development efforts. To me this seems like the best case of the current technology: the models have been getting better and better at doing what you tell them in small chunks, but you still need to be deciding what they should be doing. These chunks don't feel as though they're getting bigger unless you're willing to accept slop.
View on HN · Topics
It sounds to me like the goal there is to spell out everything you don't want the agent to make assumptions about. If you let the agent make the plan, it'll still make those assumptions for you.
View on HN · Topics
You joke, but the more I iterate on a plan before writing any code, the more successful the first pass is. 1) Tell Claude my idea with as much as I know, and ask it to ask me questions. This can go on for a few rounds. (Opus) 2) Run a validate skill on the plan, with a reviewer using a different prompt. (Opus) 3) Codex reviews the plan; it always finds a few small items after the above two. 4) Claude Opus implements in one shot, usually 99% accurate, then I manually test. If I stay on target with those steps I always get good outcomes, but it is time consuming.
View on HN · Topics
> On the other extreme you can tell an agent "make me an app that's Facebook for dogs" and it'll make so many assumptions about the architecture, code and product that there's no chance it produces anything useful beyond a cool prototype to show mom and dad. Amusingly, this was my experience in giving Lovable a shot. The onboarding process was literally just setting me up for failure by asking me to describe the detailed app I was attempting to build. Taking it piece by piece in Claude Code has been significantly more successful.
View on HN · Topics
So many times I catch myself asking a coding agent, e.g., "please print the output", and it updates the file with `print(output)`. Maybe there's something about not having to context switch between natural language and code that just makes it _feel_ easier sometimes.
View on HN · Topics
I actually enjoy writing specifications. So much so that I made it a large part of my consulting work for a huge part of my career. So it makes sense that working with gen-AI that way is enjoyable for me. The more detailed I am in breaking down chunks, the easier it is for me to verify, and the more likely I am to get output that isn't 30% wrong.
View on HN · Topics
Exactly. The LLMs are quite good at "code inpainting", e.g. "give me the outline/constraints/rules and I'll fill in the blanks", but not so good at making (robust) new features out of the blue.
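That "inpainting" pattern, where the human writes the outline and constraints and the model fills in the body, might look like this in practice. The function name, constraints, and filled-in body here are all illustrative, a minimal Python sketch:

```python
# What the human hands the agent: a typed skeleton with constraints spelled out.
def normalize_scores(scores: list[float]) -> list[float]:
    """Scale scores linearly into [0, 1].

    Constraints (spelled out so the model can't assume):
    - Empty input returns an empty list.
    - If all scores are equal, return 0.0 for each (avoid divide-by-zero).
    """
    # --- everything below is the "inpainted" part the agent fills in ---
    if not scores:
        return []
    lo, hi = min(scores), max(scores)
    if hi == lo:
        return [0.0] * len(scores)
    return [(s - lo) / (hi - lo) for s in scores]

print(normalize_scores([2.0, 4.0, 6.0]))  # [0.0, 0.5, 1.0]
```

The docstring carries the contract, so the fill-in is both easy for the model to get right and easy for the human to verify against it.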
View on HN · Topics
This is what I experienced as well. These are some tricks I use now. 1. Write a generic prompt about the project and software versions and keep it in the folder. (I think this is getting pushed as SKILLS.md now.) 2. In the prompt, add instructions to add comments on changes; since our main job is to validate and fix any issues, it makes that easier. 3. Find the best model for the specific workflow. For example, these days I find that Gemini Pro is good for HTML UI stuff, while Claude Sonnet is good for Python code. (This is why subagents are getting popular.)
View on HN · Topics
This is the most common answer from people who are rocking and rolling with AI tools, but I cannot help but wonder how this is different from how we should have been building software all along. I know I have been (after 10+ years…)
View on HN · Topics
I still use the chatbot but like to work outside-in. I provide what I need and instruct it not to write any code except the API (signatures of classes, interfaces, hierarchy, essential methods, etc.). We keep iterating on this until it looks good, still with no real code. Then I ask it to do a fresh review of the broad outline and note any issues it foresees. Then I ask it to write some demonstrator test cases to see how ergonomic and testable the code is, and we fine-tune the APIs, though nothing is fleshed out yet. Once this is done, we are through the most time-consuming phase. After that it's basically just asking it to flesh out the layers, starting from zero dependencies and working up to the top of the castle. Even if there are complexities within the pieces or the implementation is not exactly to my liking, the issues are localised: I can dive in and handle them myself (most of the time, I don't need to). This approach works very well for giving me a mental model of how things are connected, because most of my time was spent on that model.
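The phases the commenter describes can be sketched in Python. All the names here (`Document`, `Indexer`, the search example) are invented for illustration; the point is the ordering, with the API shape and a demonstrator test agreed on before any body is written:

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass

# Phase 1: only the API shape is negotiated with the model; no real code yet.
@dataclass
class Document:
    doc_id: str
    text: str

class Indexer(ABC):
    @abstractmethod
    def add(self, doc: Document) -> None: ...
    @abstractmethod
    def search(self, query: str) -> list[Document]: ...

# Phase 2: a demonstrator test case, written against the interface alone,
# to check that the API is ergonomic before anything is implemented.
def demo(indexer: Indexer) -> list[str]:
    indexer.add(Document("1", "spec writing as the new coding"))
    return [d.doc_id for d in indexer.search("spec")]

# Phase 3: bodies are fleshed out last, starting from zero dependencies.
class InMemoryIndexer(Indexer):
    def __init__(self) -> None:
        self.docs: list[Document] = []
    def add(self, doc: Document) -> None:
        self.docs.append(doc)
    def search(self, query: str) -> list[Document]:
        return [d for d in self.docs if query in d.text]

print(demo(InMemoryIndexer()))  # ['1']
```

Because `demo` depends only on the abstract `Indexer`, any implementation the agent later produces can be dropped in and checked against the same demonstrator.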
View on HN · Topics
I've been thinking about this as three maturity levels. Level 1 is what Mitchell describes — AGENTS.md, a static harness. Prevents known mistakes. But it rots. Nobody updates the checklist when the environment changes. Level 2 is treating each agent failure as an inoculation. Agent duplicates a util function? Don't just fix it — write a rule file: "grep existing helpers before writing new ones." Agent tries to build a feature while the build is broken? Rule: "fix blockers first." After a few months you have 30+ of these. Each one is an antibody against a specific failure class. The harness becomes an immune system that compounds. Level 3 is what I haven't seen discussed much: specs need to push, not just be read. If a requirement in auth-spec.md changes, every linked in-progress task should get flagged automatically. The spec shouldn't wait to be consulted. The real bottleneck isn't agent capability — it's supervision cost. Every type of drift (requirements change, environments diverge, docs rot) inflates the cost of checking the agent's work. Crush that cost and adoption follows.
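The "specs need to push" idea at level 3 could be as simple as fingerprinting each spec and flagging any in-progress task whose recorded fingerprint has gone stale. The file name `auth-spec.md` comes from the comment; the task format and function names below are hypothetical, a minimal sketch:

```python
import hashlib

def spec_fingerprint(spec_text: str) -> str:
    """Hash a spec so tasks can record which version they were written against."""
    return hashlib.sha256(spec_text.encode("utf-8")).hexdigest()[:12]

def stale_tasks(tasks: list[dict], specs: dict[str, str]) -> list[str]:
    """Return the ids of tasks whose linked spec changed after they recorded it.

    Each task is a dict like {"id": ..., "spec": "auth-spec.md", "seen": <fingerprint>}.
    """
    return [
        t["id"]
        for t in tasks
        if spec_fingerprint(specs[t["spec"]]) != t["seen"]
    ]

# Usage: the spec changed after task "T2" recorded its fingerprint.
specs = {"auth-spec.md": "tokens expire after 15 minutes"}
tasks = [
    {"id": "T1", "spec": "auth-spec.md",
     "seen": spec_fingerprint(specs["auth-spec.md"])},
    {"id": "T2", "spec": "auth-spec.md", "seen": "deadbeef0000"},
]
print(stale_tasks(tasks, specs))  # ['T2']
```

Run from a file watcher or CI, this inverts the consultation model: the spec notifies its dependents instead of waiting to be read, which is exactly the supervision cost the comment wants crushed.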
View on HN · Topics
> level 2 - becomes an immune system I'd bet that above some number of rules there will be contradictions: things that apply to different semantic contexts but look the same at the syntax level (and maybe at various levels of "syntax" and "semantic"). And debugging those is going to be a nightmare, same as debugging a requirements spec / verification of one.
View on HN · Topics
I don't understand how agents make you feel productive. Single or multiple agents reading specs, specs often produced with agents themselves and iterated over time with a human in the loop, a lot of reviewing of giant gibberish specs. I've never had a clear spec in my life. Then all the dancing for this apparently new paradigm of not reviewing code but verifying behaviour, and so many other things. All of this, to me, is a totally unproductive mess. I've used Cursor autocomplete from day one to this day; I was super productive before LLMs and I'm more productive now. I'm capable, I have experience, the product is hard to maintain but customers are happy and management is happy. So I can't really relate anymore to many of the programmers out there, and that's sad. I can count on my hands the devs I can talk to who have hard skills and know-how to share instead of astroturfing about AI agents.
View on HN · Topics
> Never had a clear spec in my life. To me, part of our job has always been about translating garbage or missing specs into something actionable. Working with agents doesn't change this, and that's why, until PM/business people are able to come up with actual specs, they'll still need their translators. Furthermore, it's not because the global spec is garbage that you, as a dev, won't come up with clear specs to solve technical issues related to the overall feature asked for by stakeholders. One funny thing I see, though, in AI presentations done for non-technical people, is the advice "be as thorough as possible when describing what you expect the agent to solve!" And I'm like: "yeah, that's what devs have been asking for since forever…".
View on HN · Topics
With "Never had a clear spec in my life" what I also mean is that I don't know how something should come out until I'm actually doing it. Writing code, for me, leads to discovery: I don't know what to produce until I see it in the wrapping context, like what a function should accept, for example a ref or a copy. Only at that point do I have the proper intuition to make a decision that has to be supported long term. I don't want cheap code now; I want a solid feature working tomorrow that I hopefully won't have to touch for a long time.
View on HN · Topics
It's so sad that we're the ones who have to tell the agent how to improve by extending agent.md or whatever. I constantly have to tell it what I don't like or what can be improved, or request clarifications or alternative solutions. This is what's so annoying about it. It's like a child that makes the same errors again and again. But couldn't it adjust itself, with the goal of reducing the errors bit by bit? Wouldn't this lead to the ultimate agent that can read your mind? That would be awesome.
View on HN · Topics
> a period of inefficiency I think this is something people ignore, and it's significant. The only way to get good at coding with LLMs is actually trying to do it, even if it's inefficient or slower at first. It's just another skill to develop [0]. And it's not really about using all the plugins and features available. In fact, many plugins and features are counterproductive. Just learn how to prompt and steer the LLM better. [0]: https://ricardoanderegg.com/posts/getting-better-coding-llms...