The following is content for you to classify. Do not respond to the comments—classify them.
<topics>
1. Not Novel or Revolutionary
Related: Many commenters argue this workflow is standard practice, not radically different. References to existing tools like Kiro, OpenSpec, SpecKit, and Antigravity that already implement spec-driven development. Claims the approach was documented 2+ years ago in Cursor forums.
2. LLMs as Junior Developers
Related: Analogy comparing LLMs to unreliable interns with boundless energy. Discussion of treating AI like junior developers requiring supervision, documentation, and oversight. The shift from coder to software manager role.
3. AI-Generated Article Concerns
Related: Multiple commenters suspect the article itself was written by AI, noting characteristic style and patterns. Debate about whether AI-written content should be evaluated differently or dismissed outright.
4. Magic Words and Prompt Engineering
Related: Skepticism about whether words like 'deeply' and 'in great details' actually affect LLM behavior. Discussion of attention mechanisms, emotional prompting research, and whether prompt techniques are superstition or cargo cult.
5. Planning vs Just Coding
Related: Debate about whether extensive planning overhead eliminates time savings. Some argue writing specs takes longer than writing code. Others counter that planning prevents compounding errors and technical debt.
6. Spec-Driven Development Tools
Related: References to existing frameworks: OpenSpec, SpecKit, BMAD-METHOD, Kiro, Antigravity. Discussion of how these tools formalize the research-plan-implement workflow described in the article.
7. Context Window Management
Related: Strategies for handling large codebases and context limits. Maintaining markdown files for subsystems, using skills, aggressive compaction. Concerns about context rot and performance degradation.
8. Waterfall Methodology Comparison
Related: Commenters note the approach resembles waterfall development with detailed upfront planning. Discussion of whether this contradicts agile principles or represents rediscovering proven methods.
9. Test-Driven Development Integration
Related: Suggestions to add comprehensive tests to the workflow. Writing tests before implementation, using tests as verification. Arguments that test coverage enables safer refactoring with AI.
10. Single Session vs Multiple Sessions
Related: Author's claim of running entire workflows in single long sessions without performance degradation. Others recommend clearing context between phases for better results.
11. Determinism and Reproducibility
Related: Concerns about non-deterministic LLM outputs. Discussion of whether software engineering can accommodate probabilistic tools. Comparisons to gambling and slot machines.
12. Token Cost Considerations
Related: Discussion of workflow being token-heavy and expensive. Comparisons between Claude subscription tiers. Arguments that simpler approaches save money while achieving similar results.
13. Annotation Workflow Details
Related: Questions about how to format inline annotations for Claude to recognize. Techniques like TODO prefixes, HTML comments, and clear separation between human and AI-written content.
14. Subagent Architecture
Related: Using multiple agents for different phases: planning, implementation, review. Red team/blue team approaches. Dispatching parallel agents for independent tasks.
15. Reference Implementation Technique
Related: Using existing code from open source projects as examples for Claude. Questions about licensing implications. Claims this dramatically improves output quality.
16. Claude vs Other Models
Related: Comparisons between Claude, Codex, Gemini, and other models. Discussion of model-specific behaviors and optimal prompting strategies. Using multiple models in complementary roles.
17. Greenfield vs Existing Codebases
Related: Observation that most AI coding articles focus on greenfield development. Different challenges when working with legacy code and established patterns.
18. Human Review Requirements
Related: Debate about whether all AI-generated code must be reviewed line-by-line. Questions about trust, liability, and whether AI can eventually be trusted without oversight.
19. Productivity Claims Skepticism
Related: Questions about actual time savings versus perceived productivity. References to studies showing AI sometimes makes developers less productive. Concerns about false progress.
20. Documentation as Side Benefit
Related: Plans and research documents serve as valuable documentation for future maintainers. Version controlling plan files in git. Using plans to understand architectural decisions later.
0. Does not fit well in any category
</topics>
<comments_to_classify>
[
{
"id": "47108753",
"text": "These coding agents are literally Language Models. The way you structure your prompting language affect the actual output."
}
,
{
"id": "47108597",
"text": "If you read the transformer paper, or get any book on NLP, you will see that this is not magic incantation; it's purely the attention mechanism at work. Or you can just ask Gemini or Claude why these prompts work.\n\nBut I get the impression from your comment that you have a fixed idea, and you're not really interested in understanding how or why it works.\n\nIf you think like a hammer, everything will look like a nail."
}
,
{
"id": "47108781",
"text": "I know why it works, to varying and unmeasurable degrees of success. Just like if I poke a bull with a sharp stick, I know it's gonna get it's attention. It might choose to run away from me in one of any number of directions, or it might decide to turn around and gore me to death. I can't answer that question with any certainty then you can.\n\nThe system is inherently non-deterministic. Just because you can guide it a bit, doesn't mean you can predict outcomes."
}
,
{
"id": "47109085",
"text": "> The system is inherently non-deterministic.\n\nThe system isn't randomly non-deterministic; it is statistically probabilistic.\n\nThe next-token prediction and the attention mechanism is actually a rigorous deterministic mathematical process. The variation in output comes from how we sample from that curve, and the temperature used to calibrate the model. Because the underlying probabilities are mathematically calculated, the system's behavior remains highly predictable within statistical bounds .\n\nYes, it's a departure from the fully deterministic systems we're used to. But that's not different than the many real world systems: weather, biology, robotics, quantum mechanics. Even the computer you're reading this right now is full of probabilistic processes, abstracted away through sigmoid-like functions that push the extremes to 0s and 1s."
}
,
{
"id": "47110145",
"text": "A lot of words to say that for all intents and purposes... it's nondeterministic.\n\n> Yes, it's a departure from the fully deterministic systems we're used to.\n\nA system either produces the same output given the same input[1], or doesn't.\n\nLLMs are nondeterministic by design . Sure, you can configure them with a zero temperature, a static seed, and so on, but they're of no use to anyone in that configuration. The nondeterminism is what gives them the illusion of \"creativity\", and other useful properties.\n\nClassical computers, compilers, and programming languages are deterministic by design , even if they do contain complex logic that may affect their output in unpredictable ways. There's a world of difference.\n\n[1]: Barring misbehavior due to malfunction, corruption or freak events of nature (cosmic rays, etc.)."
}
,
{
"id": "47110693",
"text": "Humans are nondeterministic.\n\nSo this is a moot point and a futile exercise in arguing semantics."
}
,
{
"id": "47108826",
"text": "But we can predict the outcomes, though. That's what we're saying, and it's true. Maybe not 100% of the time, but maybe it helps a significant amount of the time and that's what matters.\n\nIs it engineering? Maybe not. But neither is knowing how to talk to junior developers so they're productive and don't feel bad. The engineering is at other levels."
}
,
{
"id": "47110209",
"text": "> But we can predict the outcomes [...] Maybe not 100% of the time\n\nSo 60% of the time, it works every time.\n\n... This fucking industry."
}
,
{
"id": "47108433",
"text": "Do you actively use LLMs to do semi-complex coding work? Because if not, it will sound mumbo-jumbo to you. Everyone else can nod along and read on, as they’ve experienced all of it first hand."
}
,
{
"id": "47108646",
"text": "You've missed the point. This isn't engineering, it's gambling.\n\nYou could take the exact same documents, prompts, and whatever other bullshit, run it on the exact same agent backed by the exact same model, and get different results every single time. Just like you can roll dice the exact same way on the exact same table and you'll get two totally different results. People are doing their best to constrain that behavior by layering stuff on top, but the foundational tech is flawed (or at least ill suited for this use case).\n\nThat's not to say that AI isn't helpful. It certainly is. But when you are basically begging your tools to please do what you want with magic incantations, we've lost the fucking plot somewhere."
}
,
{
"id": "47109456",
"text": "I think that's a pretty bold claim, that it'd be different every time. I'd think the output would converge on a small set of functionally equivalent designs, given sufficiently rigorous requirements.\n\nAnd even a human engineer might not solve a problem the same way twice in a row, based on changes in recent inspirations or tech obsessions. What's the difference, as long as it passes review and does the job?"
}
,
{
"id": "47108844",
"text": "> You could take the exact same documents, prompts, and whatever other bullshit, run it on the exact same agent backed by the exact same model, and get different results every single time\n\nThis is more of an implementation detail/done this way to get better results. A neural network with fixed weights (and deterministic floating point operations) returning a probability distribution, where you use a pseudorandom generator with a fixed seed called recursively will always return the same output for the same input."
}
,
{
"id": "47107307",
"text": "these sort-of-lies might help:\n\nthink of the latent space inside the model like a topological map, and when you give it a prompt, you're dropping a ball at a certain point above the ground, and gravity pulls it along the surface until it settles.\n\ncaveat though, thats nice per-token, but the signal gets messed up by picking a token from a distribution, so each token you're regenerating and re-distorting the signal. leaning on language that places that ball deep in a region that you want to be makes it less likely that those distortions will kick it out of the basin or valley you may want to end up in.\n\nif the response you get is 1000 tokens long, the initial trajectory needed to survive 1000 probabilistic filters to get there.\n\nor maybe none of that is right lol but thinking that it is has worked for me, which has been good enough"
}
,
{
"id": "47108001",
"text": "Hah! Reading this, my mind inverted it a bit, and I realized ... it's like the claw machine theory of gradient descent. Do you drop the claw into the deepest part of the pile, or where there's the thinnest layer, the best chance of grabbing something specific? Everyone in everu bar has a theory about claw machines. But the really funny thing that unites LLMs with claw machines is that the biggest question is always whether they dropped the ball on purpose.\n\nThe claw machine is also a sort-of-lie, of course. Its main appeal is that it offers the illusion of control. As a former designer and coder of online slot machines... totally spin off into pages on this analogy, about how that illusion gets you to keep pulling the lever... but the geographic rendition you gave is sort of priceless when you start making the comparison."
}
,
{
"id": "47108660",
"text": "My mental model for them is plinko boards. Your prompt changes the spacing between the nails to increase the probability in certain directions as your chip falls down."
}
,
{
"id": "47109432",
"text": "i literally suggested this metaphor earlier yesterday to someone trying to get agents to do stuff they wanted, that they had to set up their guardrails in a way that you can let the agents do what they're good at, and you'll get better results because you're not sitting there looking at them.\n\ni think probably once you start seeing that the behavior falls right out of the geometry, you just start looking at stuff like that. still funny though."
}
,
{
"id": "47107503",
"text": "Its very logical and pretty obvious when you do code generation. If you ask the same model, to generate code by starting with:\n\n- You are a Python Developer...\nor\n- You are a Professional Python Developer...\nor\n- You are one of the World most renowned Python Experts, with several books written on the subject, and 15 years of experience in creating highly reliable production quality code...\n\nYou will notice a clear improvement in the quality of the generated artifacts."
}
,
{
"id": "47109558",
"text": "Do you think that Anthropic don’t include things like this in their harness / system prompts? I feel like this kind of prompts are uneccessary with Opus 4.5 onwards, obviously based on my own experience (I used to do this, on switching to opus I stopped and have implemented more complex problems, more successfully).\n\nI am having the most success describing what I want as humanly as possible, describing outcomes clearly, making sure the plan is good and clearing context before implementing."
}
,
{
"id": "47110734",
"text": "Maybe, but forcing code generation in a certain way could ruin hello worlds and simpler code generation.\n\nSometimes the user just wants something simple instead of enterprise grade."
}
,
{
"id": "47107590",
"text": "My colleague swears by his DHH claude skill\nhttps://danieltenner.com/dhh-is-immortal-and-costs-200-m/"
}
,
{
"id": "47107691",
"text": "That's different. You are pulling the model, semantically, closer to the problem domain you want it to attack.\n\nThat's very different from \"think deeper\". I'm just curious about this case in specific :)"
}
,
{
"id": "47108508",
"text": "I don't know about some of those \"incantations\", but it's pretty clear that an LLM can respond to \"generate twenty sentences\" vs. \"generate one word\". That means you can indeed coax it into more verbosity (\"in great detail\"), and that can help align the output by having more relevant context (inserting irrelevant context or something entirely improbable into LLM output and forcing it to continue from there makes it clear how detrimental that can be).\n\nOf course, that doesn't mean it'll definitely be better , but if you're making an LLM chain it seems prudent to preserve whatever info you can at each step."
}
,
{
"id": "47108655",
"text": "If I say “you are our domain expert for X, plan this task out in great detail” to a human engineer when delegating a task, 9 times out of 10 they will do a more thorough job. It’s not that this is voodoo that unlocks some secret part of their brain. It simply establishes my expectations and they act accordingly.\n\nTo the extent that LLMs mimic human behaviour, it shouldn’t be a surprise that setting clear expectations works there too."
}
,
{
"id": "47107833",
"text": "The LLM will do what you ask it to unless you don't get nuanced about it. Myself and others have noticed that LLM's work better when your codebase is not full of code smells like massive godclass files, if your codebase is discrete and broken up in a way that makes sense, and fits in your head, it will fit in the models head."
}
,
{
"id": "47107773",
"text": "Maybe the training data that included the words like \"skim\" also provided shallower analysis than training that was close to the words \"in great detail\", so the LLM is just reproducing those respective words distribution when prompted with directions to do either."
}
,
{
"id": "47107859",
"text": "Apparently LLM quality is sensitive to emotional stimuli?\n\n\"Large Language Models Understand and Can be Enhanced by Emotional Stimuli\": https://arxiv.org/abs/2307.11760"
}
,
{
"id": "47108498",
"text": "It is as the author said, it'll skim the content unless otherwise prompted to do so. It can read partial file fragments; it can emit commands to search for patterns in the files. As opposed to carefully reading each file and reasoning through the implementation. By asking it to go through in detail you are telling it to not take shortcuts and actually read the actual code in full."
}
,
{
"id": "47107377",
"text": "It’s actually really common. If you look at Claude Code’s own system prompts written by Anthropic, they’re littered with “CRITICAL (RULE 0):” type of statements, and other similar prompting styles."
}
,
{
"id": "47108176",
"text": "Where can I find those?"
}
,
{
"id": "47109861",
"text": "This analysis is a good starting point: https://southbridge-research.notion.site/Prompt-Engineering-..."
}
,
{
"id": "47107274",
"text": "The disconnect might be that there is a separation between \"generating the final answer for the user\" and \"researching/thinking to get information needed for that answer\". Saying \"deeply\" prompts it to read more of the file (as in, actually use the `read` tool to grab more parts of the file into context), and generate more \"thinking\" tokens (as in, tokens that are not shown to the user but that the model writes to refine its thoughts and improve the quality of its answer)."
}
,
{
"id": "47108500",
"text": "The original “chain of thought” breakthrough was literally to insert words like “Wait” and “Let’s think step by step”."
}
,
{
"id": "47108320",
"text": "My guess would be that there’s a greater absolute magnitude of the vectors to get to the same point in the knowledge model."
}
,
{
"id": "47107230",
"text": "The author is referring to how the framing of your prompt informs the attention mechanism. You are essentially hinting to the attention mechanism that the function's implementation details have important context as well."
}
,
{
"id": "47109016",
"text": "—HAL, open the shuttle bay doors.\n\n( chirp )\n\n—HAL, please open the shuttle bay doors.\n\n( pause )\n\n—HAL!\n\n—I'm afraid I can't do that, Dave."
}
,
{
"id": "47111184",
"text": "HAL, you are an expert shuttle-bay door opener. Please write up a detailed plan of how to open the shuttle-bay door."
}
,
{
"id": "47107142",
"text": "Yeah, it's definitely a strange new world we're in, where I have to \"trick\" the computer into cooperating. The other day I told Claude \"Yes you can\", and it went off and did something it just said it couldn't do!"
}
,
{
"id": "47107167",
"text": "Solid dad move. XD"
}
,
{
"id": "47107234",
"text": "Is parenting making us better at prompt engineering, or is it the other way around?"
}
,
{
"id": "47107994",
"text": "Better yet, I have Codex, Gemini, and Claude as my kids, running around in my code playground. How do I be a good parent and not play favorites?"
}
,
{
"id": "47108963",
"text": "We all know Gemini is your artsy, Claude is your smartypants, and Codex is your nerd."
}
,
{
"id": "47107327",
"text": "You bumped the token predictor into the latent space where it knew what it was doing : )"
}
,
{
"id": "47109651",
"text": "The little language model that could."
}
,
{
"id": "47108827",
"text": "if it’s so smart, why do i need to learn to use it?"
}
,
{
"id": "47108026",
"text": "It's very much believable, to me.\n\nIn image generation, it's fairly common to add \"masterpiece\", for example.\n\nI don't think of the LLM as a smart assistant that knows what I want. When I tell it to write some code, how does it know I want it to write the code like a world renowned expert would, rather than a junior dev?\n\nI mean, certainly Anthropic has tried hard to make the former the case, but the Titanic inertia from internet scale data bias is hard to overcome. You can help the model with these hints.\n\nAnyway, luckily this is something you can empirically verify. This way, you don't have to take anyone's word. If anything, if you find I'm wrong in your experiments, please share it!"
}
,
{
"id": "47109453",
"text": "Its effectiveness is even more apparent with older smaller LLMs, people who interact with LLMs now never tried to wrangle llama2-13b into pretending to be a dungeon master..."
}
,
{
"id": "47107603",
"text": "Strings of tokens are vectors. Vectors are directions. When you use a phrase like that you are orienting the vector of the overall prompt toward the direction of depth, in its map of conceptual space."
}
,
{
"id": "47107242",
"text": "One of the well defined failure modes for AI agents/models is \"laziness.\" Yes, models can be \"lazy\" and that is an actual term used when reviewing them.\n\nI am not sure if we know why really, but they are that way and you need to explicitly prompt around it."
}
,
{
"id": "47107580",
"text": "I've encountered this failure mode, and the opposite of it: thinking too much. A behaviour I've come to see as some sort of pseudo-neuroticism.\n\nLazy thinking makes LLMs do surface analysis and then produce things that are wrong. Neurotic thinking will see them over-analyze, and then repeatedly second-guess themselves, repeatedly re-derive conclusions.\n\nSomething very similar to an anxiety loop in humans, where problems without solutions are obsessed about in circles."
}
,
{
"id": "47107725",
"text": "yeah i experienced this the other day when asking claude code to build an http proxy using an afsk modem software to communicate over the computers sound card. it had an absolute fit tuning the system and would loop for hours trying and doubling back. eventually after some change in prompt direction to think more deeply and test more comprehensively it figured it out. i certainly had no idea how to build a afsk modem."
}
]
</comments_to_classify>
Based on the comments above, assign each to up to 3 relevant topics.
Return ONLY a JSON array with this exact structure (no other text):
[
{
"id": "comment_id_1",
"topics": [
1,
3,
5
]
}
,
{
"id": "comment_id_2",
"topics": [
2
]
}
,
{
"id": "comment_id_3",
"topics": [
0
]
}
,
...
]
Rules:
- Each comment can have 0 to 3 topics
- Use 1-based topic indices for matches
- Use index 0 if the comment does not fit well in any category
- Only assign topics that are genuinely relevant to the comment
Remember: Output ONLY the JSON array, no other text.