Summarizer

LLM Input

llm/8632d754-c7a3-4ec2-977a-2733719992fa/batch-3-a1e163c5-f09b-40c2-872e-8a09d5ca8e42-input.json

prompt

The following is content for you to classify. Do not respond to the comments—classify them.

<topics>
1. Determinism vs. Probabilistic Output
   Related: Comparisons between compilers (deterministic, reliable) and LLMs (probabilistic, 'fuzzy'). Users debate whether 100% correctness is required for tools, with some arguing that LLMs are fundamentally different from traditional automation because they lack a 'ground truth' logic, while others argue that error rates are acceptable if the utility is high enough.
2. The Code Review Bottleneck
   Related: Concerns that generating code faster merely shifts the bottleneck to reviewing code, which is often harder and more time-consuming than writing it. Users discuss the cognitive load of verifying 'vibe code' and the risks of blindly trusting output that looks correct but contains subtle bugs or security flaws.
3. Erosion of Programming Skills
   Related: Fears that relying on AI causes developers to lose fundamental skills ('use it or lose it'), such as forgetting syntax for frameworks like RSpec. Users discuss the value of the 'Stare'—deep mental simulation of problems—and whether outsourcing thinking to machines degrades human expertise and the ability to solve novel problems without assistance.
4. Financial Barriers and Costs
   Related: Discussions about the high cost of running continuous agents (potentially hundreds of dollars a month), with some noting that the author's wealth (as a billionaire/founder) biases his perspective on affordability. Users question whether the productivity gains justify the expense for average developers or if this creates a divide based on access to compute.
5. Agentic Workflows and Harnessing
   Related: Technical strategies for controlling AI behavior, such as 'harness engineering,' using AGENTS.md files to document rules and prevent regressions, and setting up feedback loops where agents run tests to verify their own work. This includes moving beyond simple chatbots to autonomous background processes that triage issues or perform research.
6. Safety and Sandboxing
   Related: Practical concerns about giving AI agents shell access or file system permissions. Users discuss the risks of agents accidentally 'nuking' systems, installing unwanted dependencies, or running dangerous commands, and recommend solutions like running agents in containers, VMs, or using specific sandboxing tools like Leash to limit blast radius.
7. Environmental Impact
   Related: Reactions to the author's suggestion to 'always have an agent running,' with users expressing alarm at the potential energy consumption and environmental cost of millions of developers running constant background inference tasks for marginal productivity gains, described by some as 'cooking the planet.'
8. Architects vs. Builders Analogy
   Related: Extensive debate using construction analogies to describe the shift in the developer's role. Comparisons are made between architects (who design and delegate) and builders, with arguments about whether AI users are 'vibe architects' who don't understand the materials, or professional engineers utilizing modern equivalents of CAD software and heavy machinery.
9. AI as Junior Developers
   Related: The characterization of AI agents as an infinite supply of 'slightly drunken new college grads' or interns who are fast and cheap but require constant supervision. Users discuss the ratio of senior engineer time needed to review AI output and the lack of a path for these 'AI juniors' to ever become seniors.
10. Trust and Hallucination Risks
   Related: Skepticism regarding the reliability of AI, highlighted by examples like 'wind-powered cars' or bad recipes. Users argue that because LLMs predict tokens rather than understanding physics or logic, they are 'confidently stupid' and require expert humans to filter out hallucinations, making them dangerous for those lacking deep domain knowledge.
11. Productivity vs. Inefficiency
   Related: Debates over whether AI actually saves time or just feels productive. Some cite studies suggesting productivity drops (e.g., 19%), while others argue that the efficiency comes from parallelizing tasks or handling boilerplate. Users critique the lack of hard metrics in the article and the reliance on 'feeling' more efficient.
12. Corporate Process vs. Individual Flow
   Related: The distinction between individual productivity gains (solopreneurs, solo projects) and organizational reality. Users note that while AI speeds up coding, it doesn't solve organizational bottlenecks like meetings, cross-team coordination, or gathering requirements, limiting its revolutionary impact on large enterprises compared to solo work.
13. Spec Writing as the New Coding
   Related: The idea that working with agents shifts the primary task from writing syntax to writing detailed specifications and prompts. Users note that AI forces developers to be more explicit about requirements, effectively turning English specs into the source code, though some argue this is just a verbose and nondeterministic programming language.
14. Hype Cycles and Model Churn
   Related: Frustration with the rapid pace of change in the AI landscape ('honeymoon phase'). Users complain about building workflows around a specific model only for it to change or degrade ('drift') in the next update, leading to a constant need to relearn prompt engineering and tooling idiosyncrasies.
15. Local Models vs. Cloud Privacy
   Related: Concerns about uploading proprietary source code to cloud providers like Anthropic or OpenAI. Users discuss the trade-offs between using superior cloud models (Claude Code) versus privacy-preserving local models (OpenCode) or self-hosted solutions, and the difficulty of trusting AI companies with sensitive intellectual property.
0. Does not fit well in any category
</topics>

<comments_to_classify>
[
  
{
  "id": "46909744",
  "text": "> the scope is so small there's not much point in using an LLM\n\nActually that's how I did most of my work last year. I was annoyed by existing tools so I made one that can be used interactively.\n\nIt has full context (I usually work on small codebases), and can make an arbitrary number of edits to an arbitrary number of files in a single LLM round trip.\n\nFor such \"mechanical\" changes, you can use the cheapest/fastest model available. This allows you to work interactively and stay in flow.\n\n(In contrast to my previous obsession with the biggest, slowest, most expensive models! You actually want the dumbest one that can do the job.)\n\nI call it \"power coding\", akin to power armor, or perhaps \"coding at the speed of thought\". I found that staying actively involved in this way (letting LLM only handle the function level) helped keep my mental model synchronized, whereas if I let it work independently, I'd have to spend more time catching up on what it had done.\n\nI do use both approaches though, just depends on the project, task or mood!"
}
,
  
{
  "id": "46910448",
  "text": "Do you have the tool open sourced somewhere? I have been thinking of using something similar"
}
,
  
{
  "id": "46905912",
  "text": "I actually enjoy writing specifications. So much so that I made it a large part of my consulting work for much of my career. So it makes sense that working with Gen-AI that way is enjoyable for me.\n\nThe more detailed I am in breaking down chunks, the easier it is for me to verify, and the more likely I am to get output that isn't 30% wrong."
}
,
  
{
  "id": "46906564",
  "text": "And lately, the sweet spot has been moving upwards every 6-8 weeks with the model release cycle."
}
,
  
{
  "id": "46905965",
  "text": "Exactly. The LLMs are quite good at \"code inpainting\", eg \"give me the outline/constraints/rules and I'll fill-in the blanks\"\n\nBut not so good at making (robust) new features out of the blue"
}
,
  
{
  "id": "46905344",
  "text": "This matches my experience, especially \"don’t draw the owl\" and the harness-engineering idea.\n\nThe failure mode I kept hitting wasn’t just \"it makes mistakes\", it was drift: it can stay locally plausible while slowly walking away from the real constraints of the repo. The output still sounds confident, so you don’t notice until you run into reality (tests, runtime behaviour, perf, ops, UX).\n\nWhat ended up working for me was treating chat as where I shape the plan (tradeoffs, invariants, failure modes) and treating the agent as something that does narrow, reviewable diffs against that plan. The human job stays very boring: run it, verify it, and decide what’s actually acceptable. That separation is what made it click for me.\n\nOnce I got that loop stable, it stopped being a toy and started being a lever. I’ve shipped real features this way across a few projects (a git like tool for heavy media projects, a ticketing/payment flow with real users, a local-first genealogy tool, and a small CMS/publishing pipeline). The common thread is the same: small diffs, fast verification, and continuously tightening the harness so the agent can’t drift unnoticed."
}
,
  
{
  "id": "46907190",
  "text": "No harm meant, but your writing is very reminiscent of an LLM. It is great actually, there is just something about it - \"it wasn't.. it was\", \"it stopped being.. and started\". Claude and ChatGPT seem to love these juxtapositions. The triplets on every other sentence. I think you are a couple em-dashes away from being accused of being a bot.\n\nThese patterns seem to be picking up speed in the general population; makes the human race seem quite easily hackable."
}
,
  
{
  "id": "46908036",
  "text": ">makes the human race seem quite easily hackable.\n\nIf the human race were not hackable then society would not exist; we'd be the unchanging crocodiles of the last few hundred million years.\n\nHave you ever found yourself speaking a meme? Had a catchy tune repeating in your head? Started spouting nation-state-level propaganda? Found yourself in a crowd trying to burn a witch at the stake?\n\nHacking the flow of human thought isn't that hard, especially across populations. Hacking any one particular human's thoughts is harder unless you have a lot of information on them."
}
,
  
{
  "id": "46912362",
  "text": "How do I hack the human population to give me money, and simultaneously, hack law enforcement to not arrest me?"
}
,
  
{
  "id": "46913493",
  "text": "> How do I hack the human population to give me money\n\nMake something popular or become famous.\n\n> hack law enforcement to not arrest me\n\nDon't become famous with illegal stuff.\n\nThe hack is that we live in a society that makes people think they need a lot of money and at the same time allows individuals to accumulate obscene amounts of wealth and influence, with many people being ok with that."
}
,
  
{
  "id": "46907574",
  "text": ">The failure mode I kept hitting wasn’t just \"it makes mistakes\", it was drift: it can stay locally plausible while slowly walking away from the real constraints of the repo. The output still sounds confident, so you don’t notice until you run into reality (tests, runtime behaviour, perf, ops, UX).\n\nYeah I would get patterns where, initial prototypes were promising, then we developed something that was 90% close to design goals, and then as we try to push in the last 10%, drift would start breaking down, or even just forgetting, the 90%.\n\nSo I would start getting to 90% and basically starting a new project with that as the baseline to add to."
}
,
  
{
  "id": "46911289",
  "text": "This is what I experienced as well.\n\nThese are some tricks I use now.\n\n1. Write a generic prompt about the project and software versions and keep it in the folder. (I think this is getting pushed as SKILLS.md now)\n\n2. In the prompt, add instructions to add comments on changes; since our main job is to validate and fix any issues, this makes it easier.\n\n3. Find the best model for the specific workflow. For example, these days I find that Gemini Pro is good for HTML UI stuff, while Claude Sonnet is good for Python code. (This is why subagents are getting popular)"
}
,
  
{
  "id": "46909771",
  "text": "Would love to hear more about your genealogy app."
}
,
  
{
  "id": "46905507",
  "text": "This is the most common answer from people that are rocking and rolling with AI tools, but I cannot help but wonder how this is different from how we should have built software all along. I know I have been (after 10+ years…)"
}
,
  
{
  "id": "46905852",
  "text": "I think you are right; the secret is that there is no secret. The projects I have been involved with that were most successful used these techniques. I also think experience helps, because you develop a sense that very quickly knows if the model wants to go in a wonky direction and what a good spec looks like.\n\nWith where the models are right now, you still need a human in the loop to make sure you end up with code you (and your organisation) actually understand. The bottleneck has gone from writing code to reading code."
}
,
  
{
  "id": "46906838",
  "text": "> The bottleneck has gone from writing code to reading code.\n\nThis has always been the bottleneck. Reviewing code is much harder and gets worse results than writing it, which is why reviewing AI code is not very efficient. The time required to understand code far outstrips the time to type it.\n\nMost devs don’t do thorough reviews. Check the variable names seem ok, make sure there are no obvious typos, ask for a comment and call it good. For a trusted teammate this is actually ok and why they’re so valuable! For an AI, it’s a slot machine, and trusting it is equivalent to letting your coworkers/users do your job so you can personally move faster."
}
,
  
{
  "id": "46913469",
  "text": "I still use the chatbot but like to do it outside-in. Provide what I need, and instruct it to not write any code except the API (signatures of classes, interfaces, hierarchy, essential methods etc). We keep iterating on this until it looks good - still no real code. Then I ask it to do a fresh review of the broad outline, any issues it foresees etc. Then I ask it to write some demonstrator test cases to see how ergonomic and testable the code is - we fine-tune the APIs but nothing is fleshed out yet. Once this is done, we are done with the most time-consuming phase.\n\nAfter that it is basically just asking it to flesh out the layers, starting from zero dependencies and arriving at the top of the castle. Even if we have any complexities within the pieces or the implementation is not exactly to my liking, the issues are localised - I can dive in and handle it myself (most of the time, I don't need to).\n\nI feel like this approach works very well for keeping a mental model of how things are connected, because most of my time was spent on that model."
}
,
  
{
  "id": "46910621",
  "text": "I've been thinking about this as three maturity levels.\n\nLevel 1 is what Mitchell describes — AGENTS.md, a static harness. Prevents known mistakes. But it rots. Nobody updates the checklist when the environment changes.\n\nLevel 2 is treating each agent failure as an inoculation. Agent duplicates a util function? Don't just fix it — write a rule file: \"grep existing helpers before writing new ones.\" Agent tries to build a feature while the build is broken? Rule: \"fix blockers first.\" After a few months you have 30+ of these. Each one is an antibody against a specific failure class. The harness becomes an immune system that compounds.\n\nLevel 3 is what I haven't seen discussed much: specs need to push, not just be read. If a requirement in auth-spec.md changes, every linked in-progress task should get flagged automatically. The spec shouldn't wait to be consulted.\n\nThe real bottleneck isn't agent capability — it's supervision cost. Every type of drift (requirements change, environments diverge, docs rot) inflates the cost of checking the agent's work.\n\nCrush that cost and adoption follows."
}
,
  
{
  "id": "46910858",
  "text": "> level 2 - becomes an immune system\n\nI'd bet that above some number there will be contradictions: things that apply to different semantic contexts but look the same at the syntax level (and maybe with various levels of \"syntax\" and \"semantic\"). And debugging those is going to be a nightmare - same as debugging a requirements spec / verification of that"
}
,
  
{
  "id": "46911346",
  "text": "I don't understand how Agents make you feel productive. Single/multiple agents reading specs, specs often produced with agents themselves and iterated over time with a human in the loop, a lot of reviewing of giant gibberish specs. Never had a clear spec in my life. Then all the dancing for this apparently new paradigm of not reviewing code but verifying behaviour, and so many other things. All of this to me is a total UNproductive mess. I have used Cursor autocomplete from day one to this day, I was super productive before LLMs, I'm more productive now, I'm capable, I have experience, the product is hard to maintain but customers are happy, management is happy. So I can't really relate anymore to many of the programmers out there, and that's sad. I can count on my hands the devs I can talk to that have hard skills and know-how to share instead of astroturfing about AI Agents"
}
,
  
{
  "id": "46911928",
  "text": "> Never had a clear spec in my life.\n\nTo me, part of our job has always been about translating garbage/missing specs into something actionable.\n\nWorking with agents doesn't change this, and that's why, until PM/business people are able to come up with actual specs, they'll still need their translators.\n\nFurthermore, it's not because the global spec is garbage that you, as a dev, won't come up with clear specs to solve technical issues related to the overall feature asked for by stakeholders.\n\nOne funny thing I see, though, in the AI presentations done for non-technical people, is the advice: \"be as thorough as possible when describing what you expect the agent to solve!\".\nAnd I'm like: \"yeah, that's what devs have been asking for since forever...\"."
}
,
  
{
  "id": "46912828",
  "text": "With \"Never had a clear spec in my life\" what I also mean is that I don't know how something should come out till I'm actually doing it. Writing code, for me, leads to discovery; I don't know what to produce till I see it in the wrapping context, like what a function should accept, for example a ref or a copy. Only at that point do I have the proper intuition to make a decision that has to be supported long term. I don't want cheap code now; I want a solid feature working tomorrow and not to touch it for a long time, hopefully"
}
,
  
{
  "id": "46911606",
  "text": "> Never had a clear spec in my life.\n\nJust because you haven't, or you work in a particular way, doesn't mean everyone does things the same way.\n\nLikewise, on your last point, just because someone is using AI in their work doesn't mean they don't have hard skills and know-how. The author of this article, Mitchell, is a great example of that - someone who proved able to produce great software and who, when talking about individuals who made a dent in the industry, definitely had/has an impactful career."
}
,
  
{
  "id": "46911634",
  "text": "Never mentioned Mitchell; I'm speaking generally. 95% of the industry is not Mitchell"
}
,
  
{
  "id": "46905872",
  "text": "For those wondering how that looks in practice, here's one of OP's past blog posts describing a coding session to implement a non-trivial feature: https://mitchellh.com/writing/non-trivial-vibing (covered on HN here: https://news.ycombinator.com/item?id=45549434 )"
}
,
  
{
  "id": "46910652",
  "text": "Very much the same experience. But it does not talk much about the project setup and its influence on the session's success. In narrowly scoped projects it works really well, especially when tests are easy to execute. I found that this approach melts down when facing enterprise software with large repositories and unconventional layouts. Then you need to do a bunch of context management upfront, and write verbose instructions for evaluations. But we know what it needs is a refactor, that's all.\n\nAnd the post touches on the next type of problem: how to plan far ahead of time to utilise agents when you are away. It is a difficult problem, but IMO we’re going in the direction of having some sort of shared “templated plans”/workflows and budgeted/throttled task execution to achieve that. It is like you want to give it a little world to explore so that it does not stop early, like a little game to play; then you come back in the morning and check how far it went."
}
,
  
{
  "id": "46913510",
  "text": "How much does it cost per day to have all these agents running on your computer?\n\nIs your company paying for it, or you?\n\nWhat is your process if the agent writes a piece of code, let's say a really complex recursive function, and you aren't confident you could have come up with the same solution? Do you still submit it?"
}
,
  
{
  "id": "46913562",
  "text": "The guy who wrote the post is a billionaire"
}
,
  
{
  "id": "46913667",
  "text": "I thought this was a joke, i.e. that you need to be a billionaire to be able to use agents like this, but you are correct.\n\nI think we need to stop listening to billionaires. The article is well thought out and well written, but his perspective is entirely biased by never having to think about money at all... all of this stuff is incredibly expensive."
}
,
  
{
  "id": "46913775",
  "text": "Billionaires also tend to have a vested interest in the tech being hyped and adopted; after all, one doesn't become a billionaire without investments."
}
,
  
{
  "id": "46913718",
  "text": "Oh, never heard of him!"
}
,
  
{
  "id": "46904966",
  "text": "Much more pragmatic and less performative than other posts hitting frontpage. Good article."
}
,
  
{
  "id": "46905071",
  "text": "Finally, a step-by-step guide for even the skeptics to try, to see what spot the LLM tools have in their workflows, without hype or magic like \"I vibe-coded an entire OS, and you can too!\"."
}
,
  
{
  "id": "46907474",
  "text": "With so much noise in the AI world and constant model updates (just today GPT-5.3-Codex and Claude Opus 4.6 were announced), this was a really refreshing read. It’s easy to relate to his phased approach to finding real value in tooling and not just hype. There are solid insights and practical tips here. I’m increasingly convinced that the best way not to get overwhelmed is to set clear expectations for what you want to achieve with AI and tailor how you use it to work for you, rather than trying to chase every new headline. Very refreshing."
}
,
  
{
  "id": "46914171",
  "text": "How much electricity (and associated materials like water) must this use?\n\nIt makes me profoundly sad to think of the huge number of AI agents running endlessly to produce vibe-coded slop. The environmental impact must be massive."
}
,
  
{
  "id": "46906155",
  "text": "It's amusing how everyone seems to be going through the same journey.\n\nI do run multiple models at once now, on different parts of the code base.\n\nI focus solely on the less boring tasks myself and outsource all of the slam dunks, then review. I often use another model to validate the previous model's work while doing so myself.\n\nI still git reset quite often, but I find more ways to not get to that point by knowing the tools better and better.\n\nAutocompleting our brains! What a crazy time."
}
,
  
{
  "id": "46910273",
  "text": "Very nice. As a consequence of this new way of working I'm using `git worktree` and diffview all the time.\n\nFor more on \"harness engineering\", see what Armin Ronacher and Mario Zechner are doing with pi: https://lucumr.pocoo.org/2026/1/31/pi/ https://mariozechner.at/posts/2025-11-30-pi-coding-agent/\n\n> I really don't care one way or the other if AI is here to stay, I'm a software craftsman that just wants to build stuff for the love of the game.\n\nI suspect having three commas in one's bank account helps one stay very relaxed about the outcome ;)"
}
,
  
{
  "id": "46913808",
  "text": "How could the author write all of that and not talk about actual time savings versus the prior method?\n\nI mean, what is the point of change if not to improve? I don't mean \"I felt I was more efficient.\" Feelings aren't measurements. Numbers!"
}
,
  
{
  "id": "46905784",
  "text": "> At a bare minimum, the agent must have the ability to: read files, execute programs, and make HTTP requests.\n\nThat's one very short step removed from Simon Willison's lethal trifecta."
}
,
  
{
  "id": "46911405",
  "text": "This is why I won't run Claude without additional sandboxing. I'm currently using (and quite pleased with) https://github.com/strongdm/leash"
}
,
  
{
  "id": "46908314",
  "text": "I will say one thing Claude does is it doesn't run a command until you approve it, and you can choose between a one-time approval and always allowing a command's pattern. I usually approve the simple commands like `zig build test`, since I'm not particularly worried about the test harness. I believe it also scopes file reading by default to the current directory."
}
,
  
{
  "id": "46909308",
  "text": "A lot of people run Claude with --dangerously-skip-permissions"
}
,
  
{
  "id": "46905926",
  "text": "I'm definitely not running that on my machine."
}
,
  
{
  "id": "46906766",
  "text": "The way this is generally implemented is that agents have the ability to request a tool use. Then you confirm \"yes, you may run this grep\"."
}
,
  
{
  "id": "46910211",
  "text": "Same, but I felt okay sticking my code base in a VM and then letting an agent run there. I’d say it worked well"
}
,
  
{
  "id": "46910348",
  "text": "> This blog post was fully written by hand, in my own words.\n\nThis reminded me of back when wysiwyg web editors started becoming a thing, and coders started adding those \"Created in notepad\" stickers to their webpages, to point out they were 'real' web developers. Fun times."
}
,
  
{
  "id": "46906443",
  "text": "It's so sad that we're the ones who have to tell the agent how to improve by extending agent.md or whatever. I constantly have to tell it what I don't like or what can be improved, or need to request clarifications or alternative solutions.\n\nThis is what's so annoying about it. It's like a child that makes the same errors again and again.\n\nBut couldn't it adjust itself with the goal of reducing the errors bit by bit? Wouldn't this lead to the ultimate agent who can read your mind? That would be awesome."
}
,
  
{
  "id": "46906732",
  "text": "> It's so sad that we're the ones who have to tell the agent how to improve by extending agent.md or whatever.\n\nYour improvement is someone else's code smell. There's no absolute right or wrong way to write code, and that's coming from someone who definitely thinks there's a right way. But it's my right way.\n\nAnyway, I don't know why you'd expect it to write code the way you like after it's been trained on the whole of the Internet & the RLHF labelers' preferences and the reward model.\n\nPutting some words in AGENTS.md hardly seems like the most annoying thing.\n\ntip: Add a /fix command that tells it to fix $1 and then update AGENTS.md with the text that'd stop it from making that mistake in the future. Use your nearest LLM to tweak that prompt. It's a good timesaver."
}
,
  
{
  "id": "46908119",
  "text": "While this may be the end goal, I do think humanity needs to take the trip along with AI to this point.\n\nA mind-reading ultimate agent sounds more like a deity, and there are more than enough fables warning one not to create gods, because things tend to go bad. Pumping out ASI too quickly will cause massive destabilization and horrific war. Not sure who against, really, either. Could be us humans against the ASI; could be the rich humans with ASI against us. Either way, it would represent a massive change in the world order."
}
,
  
{
  "id": "46906500",
  "text": "It is not a mind reader. I enjoy giving it feedback because it shows I am in charge of the engineering.\n\nI also love using it for research for upcoming features. Research + pick a solution + implement. It happens so fast."
}

]
</comments_to_classify>

Based on the comments above, assign each to up to 3 relevant topics.

Return ONLY a JSON array with this exact structure (no other text):
[
  
{
  "id": "comment_id_1",
  "topics": [
    1,
    3,
    5
  ]
}
,
  
{
  "id": "comment_id_2",
  "topics": [
    2
  ]
}
,
  
{
  "id": "comment_id_3",
  "topics": [
    0
  ]
}
,
  ...
]

Rules:
- Each comment can have 0 to 3 topics
- Use 1-based topic indices for matches
- Use index 0 if the comment does not fit well in any category
- Only assign topics that are genuinely relevant to the comment

Remember: Output ONLY the JSON array, no other text.

commentCount

50
