The following is content for you to classify. Do not respond to the comments—classify them.
<topics>
1. Determinism vs. Probabilistic Output
Related: Comparisons between compilers (deterministic, reliable) and LLMs (probabilistic, 'fuzzy'). Users debate whether 100% correctness is required for tools, with some arguing that LLMs are fundamentally different from traditional automation because they lack a 'ground truth' logic, while others argue that error rates are acceptable if the utility is high enough.
2. The Code Review Bottleneck
Related: Concerns that generating code faster merely shifts the bottleneck to reviewing code, which is often harder and more time-consuming than writing it. Users discuss the cognitive load of verifying 'vibe code' and the risks of blindly trusting output that looks correct but contains subtle bugs or security flaws.
3. Erosion of Programming Skills
Related: Fears that relying on AI causes developers to lose fundamental skills ('use it or lose it'), such as forgetting syntax for frameworks like RSpec. Users discuss the value of the 'Stare'—deep mental simulation of problems—and whether outsourcing thinking to machines degrades human expertise and the ability to solve novel problems without assistance.
4. Financial Barriers and Costs
Related: Discussions about the high cost of running continuous agents (potentially hundreds of dollars a month), with some noting that the author's wealth (as a billionaire/founder) biases his perspective on affordability. Users question whether the productivity gains justify the expense for average developers or if this creates a divide based on access to compute.
5. Agentic Workflows and Harnessing
Related: Technical strategies for controlling AI behavior, such as 'harness engineering,' using AGENTS.md files to document rules and prevent regressions, and setting up feedback loops where agents run tests to verify their own work. This includes moving beyond simple chatbots to autonomous background processes that triage issues or perform research.
6. Safety and Sandboxing
Related: Practical concerns about giving AI agents shell access or file system permissions. Users discuss the risks of agents accidentally 'nuking' systems, installing unwanted dependencies, or running dangerous commands, and recommend solutions like running agents in containers, VMs, or using specific sandboxing tools like Leash to limit blast radius.
7. Environmental Impact
Related: Reactions to the author's suggestion to 'always have an agent running,' with users expressing alarm at the potential energy consumption and environmental cost of millions of developers running constant background inference tasks for marginal productivity gains, described by some as 'cooking the planet.'
8. Architects vs. Builders Analogy
Related: Extensive debate using construction analogies to describe the shift in the developer's role. Comparisons are made between architects (who design and delegate) and builders, with arguments about whether AI users are 'vibe architects' who don't understand the materials, or professional engineers utilizing modern equivalents of CAD software and heavy machinery.
9. AI as Junior Developers
Related: The characterization of AI agents as an infinite supply of 'slightly drunken new college grads' or interns who are fast and cheap but require constant supervision. Users discuss the ratio of senior engineer time needed to review AI output and the lack of a path for these 'AI juniors' to ever become seniors.
10. Trust and Hallucination Risks
Related: Skepticism regarding the reliability of AI, highlighted by examples like 'wind-powered cars' or bad recipes. Users argue that because LLMs predict tokens rather than understanding physics or logic, they are 'confidently stupid' and require expert humans to filter out hallucinations, making them dangerous for those lacking deep domain knowledge.
11. Productivity vs. Inefficiency
Related: Debates over whether AI actually saves time or just feels productive. Some cite studies suggesting productivity drops (e.g., 19%), while others argue that the efficiency comes from parallelizing tasks or handling boilerplate. Users critique the lack of hard metrics in the article and the reliance on 'feeling' more efficient.
12. Corporate Process vs. Individual Flow
Related: The distinction between individual productivity gains (solopreneurs, solo projects) and organizational reality. Users note that while AI speeds up coding, it doesn't solve organizational bottlenecks like meetings, cross-team coordination, or gathering requirements, limiting its revolutionary impact on large enterprises compared to solo work.
13. Spec Writing as the New Coding
Related: The idea that working with agents shifts the primary task from writing syntax to writing detailed specifications and prompts. Users note that AI forces developers to be more explicit about requirements, effectively turning English specs into the source code, though some argue this is just a verbose and nondeterministic programming language.
14. Hype Cycles and Model Churn
Related: Frustration with the rapid pace of change in the AI landscape ('honeymoon phase'). Users complain about building workflows around a specific model only for it to change or degrade ('drift') in the next update, leading to a constant need to relearn prompt engineering and tooling idiosyncrasies.
15. Local Models vs. Cloud Privacy
Related: Concerns about uploading proprietary source code to cloud providers like Anthropic or OpenAI. Users discuss the trade-offs between using superior cloud models (Claude Code) versus privacy-preserving local models (OpenCode) or self-hosted solutions, and the difficulty of trusting AI companies with sensitive intellectual property.
0. Does not fit well in any category
</topics>
<comments_to_classify>
[
{
"id": "46910793",
"text": "I'm kind of on the same journey, a bit less far along. One thing I have observed is that I am constantly running out of tokens in claude. I guess this is not an issue for a wealthy person like Mitchell but it does significantly hamper my ability to experiment."
}
,
{
"id": "46905849",
"text": "This seems like a pretty reasonable approach that charts a course between skepticism and \"it's a miracle\".\n\nI wonder how much all this costs on a monthly basis?"
}
,
{
"id": "46913156",
"text": "The comment by user senko [1] links to a post from this same author with an example for a specific coding session that costs $15.98 for 8 hours of work. The example in this post talks about leaving agents running overnight, in which case I'd guess \"twice that amount\" would be a reasonable approximation.\n\nOr if we assume that the OP can only do 4 hours per sitting (mentioned in the other post) and 8 hours of overnight agents then it would come down to $15.98 * 1.5 * 20 = $497,40 a month (without weekends).\n\n[1] https://news.ycombinator.com/item?id=46905872"
}
,
{
"id": "46913316",
"text": ">$15.98 * 1.5 * 20 = $497,40 a month\n\nAre people seriously dropping hundreds of dollars a month on these products to get their work done?"
}
,
{
"id": "46914035",
"text": "If you make 10k/mo -- which is not that much!, $500 is 5% of revenue. All else held equal, if that helps you go 20% faster, it's an absolute no brainer.\n\nThe question is.. does it actually help you do that, or do you go 0% faster? Or 5% slower?\n\nInquiring minds want to know."
}
,
{
"id": "46905857",
"text": "As long as we're on the same page that what he's describing is itself a miracle ."
}
,
{
"id": "46912025",
"text": "It’s not. A miracle is “an event that is inexplicable by natural or scientific laws and accordingly gets attributed to some supernatural or preternatural cause”. Could we please stop trivialising and ignoring the meaning of words?"
}
,
{
"id": "46909807",
"text": "Take your religion somewhere else please."
}
,
{
"id": "46911224",
"text": "not quite as technically rich as i came to expect from previous posts from op, but very insightful regardless.\n\nnot ashamed to say that i am between steps 2 and 3 in my personal workflow.\n\n>Adopting a tool feels like work, and I do not want to put in the effort\n\nall the different approaches floating online feel ephemeral to me. this, just like for different tools for the op, seem like a chore to adopt. i like the fomo mongering from the community does not help here, but in the end it is a matter of personal discovery to stick with what works for you."
}
,
{
"id": "46908694",
"text": "I've been building systems like what the OP is using since gpt3 came out.\n\nThis is the honeymoon phase. You're learning the ins and outs of the specific model you're using and becoming more productive. It's magical. Nothing can stop you. Then you might not be improving as fast as you did at the start, but things are getting better every day. Or maybe every week. But it's heaps better than doing it by hand because you have so much mental capacity left.\n\nThen a new release comes up. An arbitrary fraction of your hard earned intuition is not only useless but actively harmful to getting good results with the new models. Worse you will never know which part it is without unlearning everything you learned and starting over again.\n\nI've had to learn the quirks of three generations of frontier families now. It's not worth the hassle. I've gone back to managing the context window in Emacs because I can't be bothered to learn how to deal with another model family that will be thrown out in six months. Copy and paste is the universal interface and being able to do surgery on the chat history is still better than whatever tooling is out there.\n\nUnironically learning vim or Emacs and the standard Unix code tools is still the best thing you can do to level up your llm usage."
}
,
{
"id": "46908813",
"text": "First off, appreciate you sharing your perspective. I just have a few questions.\n\n> I've gone back to managing the context window in Emacs because I can't be bothered to learn how to deal with another model family that will be thrown out in six months.\n\nCan you expand more on what you mean by that? I'm a bit of a noob on llm enabled dev work. Do you mean that you will kick off new sessions and provide a context that you manage yourself instead of relying on a longer running session to keep relevant information?\n\n> Unironically learning vim or Emacs and the standard Unix code tools is still the best thing you can do to level up your llm usage.\n\nI appreciate your insight but I'm failing to understand how exactly knowing these tools increases performance of llms. Is it because you can more precisely direct them via prompts?"
}
,
{
"id": "46910030",
"text": "I can't speak for parent, but I use gptel, and it sounds like they do as well. It has a number of features, but primarily it just gives you a chat buffer you can freely edit at any time. That gives you 100% control over the context, you just quickly remove the parts of the conversation where the LLM went off the rails and keep it clean. You can replace or compress the context so far any way you like.\n\nWhile I also use LLMs in other ways, this is my core workflow. I quickly get frustrated when I can't _quickly_ modify the context.\n\nIf you have some mastery over your editor, you can just run commands and post relevant output and make suggested changes to get an agent like experience, at a speed not too different from having the agent call tools. But you retain 100% control over the context, and use a tiny fraction of the tokens OpenCode and other agents systems would use.\n\nIt's not the only or best way to use LLMs, but I find it incredibly powerful, and it certainly has it's place.\n\nA very nice positive effect I noticed personally is that as opposed to using agents, I actually retain an understanding of the code automatically, I don't have to go in and review the work, I review and adjust on the fly."
}
,
{
"id": "46911879",
"text": "One thing to keep in mind is that the core of an LLM is basically a (non-deterministic) stateless function that takes text as input, and gives text as output.\n\nThe chat and session interfaces obscure this, making it look more stateful than it is. But they mainly just send the whole chat so far back to the LLM to get the next response. That's why the context window grows as a chat/session continues. It's also why the answers tend to get worse with longer context windows – you're giving the LLM a lot more to sift through.\n\nYou can manage the context window manually instead. You'll potentially lose some efficiencies from prompt caching, but you can also keep your requests much smaller and more relevant, likely spending fewer tokens."
}
,
{
"id": "46909059",
"text": "LLMs work on text and nothing else. There isn't any magic there. Just a limited context window on which the model will keep predicting the next token until it decides that it's predicted enough and stop.\n\nAll the tooling is there to manage that context for you. It works, to a degree, then stops working. Your intuition is there to decide when it stops working. This intuition gets outdated with each new release of the frontier model and changes in the tooling.\n\nThe stateless API with a human deciding what to feed it is much more efficient in both cost and time as long as you're only running a single agent. I've yet to see anyone use multiple agents to generate code successfully (but I have used agent swarms for unstructured knowledge retrieval).\n\nThe Unix tools are there for you to progra-manually search and edit the code base copy/paste into the context that you will send. Outside of Emacs (and possibly vim) with the ability to have dozens of ephemeral buffers open to modify their output I don't imagine they will be very useful.\n\nOr to quote the SICP lectures: The magic is that there is no magic."
}
,
{
"id": "46909000",
"text": "> I've been building systems like what the OP is using since pgt3 came out.\n\nOP is also a founder of Hashicorp, so.. lol.\n\n> This is the honeymoon phase.\n\nNo offense but you come across as if you didn’t read the article."
}
,
{
"id": "46909075",
"text": "You come across as if you didn't read my post.\n\nI'll wait for OP to move their workflow to Claude 7.0 and see if they still feel as bullish on AI tools.\n\nPeople who are learning a new AI tool for the first time don't realzie that they are just learning quirks of the tool and underlying and not skills that generalize. It's not until you've done it a few times that you realzie you've wasted more than 80% of your time on a model that is completely useless and will be sunset in 6 months."
}
,
{
"id": "46908487",
"text": "> Immediately cease trying to perform meaningful work via a chatbot.\n\nThat depends on your budget. To work within my pro plan's codex limits, I attach the codebase as a single file to various chat windows (GPT 5.2 Thinking - Heavy) and ask it to find bugs/plan a feature/etc. Then I copy the dense tasklist from chat to codex for implementation. This reduces the tokens that codex burns.\n\nAlso don't sleep on GPT 5.2 Pro. That model is a beast for planning."
}
,
{
"id": "46905043",
"text": "I recently also reflected on the evolution of my use of ai in programming. Same evolution, other path. If anyone is interested: https://www.asfaload.com/blog/ai_use/"
}
,
{
"id": "46906298",
"text": "Just wanted to say that was a nice and very grounded write up; and as a result very informative. Thank you. More stuff like this is a breath of fresh air in a landscape that has veered into hyperbole territory both in the for and against ai sides"
}
,
{
"id": "46910437",
"text": "OT but, the style. The journey. What is it? What does this remind me of?\n\nFlowers for Algernon.\n\nOr at least the first half. I don't wanna see what it looks like when AI capabilities start going in reverse.\n\nBut I want to know."
}
,
{
"id": "46909386",
"text": "So does everyone just run with giving full permissions on Claude code these days? It seems like I’m constantly coming back to CC to validate that it’s not running some bash that’s going to nuke my system. I would love to be able to fully step away but it feels like I can’t."
}
,
{
"id": "46911409",
"text": "I sandbox everything inside https://github.com/strongdm/leash\n\nThat way the blast radius is vastly reduced."
}
,
{
"id": "46909791",
"text": "I run my agents with full permissions in containers. Feels like a reasonable tradeoff. Bonus is I can set up each container with exactly the stack needed."
}
,
{
"id": "46909402",
"text": "Honest question, when was the last time you caught it trying to use a command that was going to \"nuke your system\"?"
}
,
{
"id": "46909434",
"text": "“Nuke” is maybe too strong of a word, but it has not been uncommon for me to see it trying to install specific versions of languages on my machine, or services I intentionally don’t have configured, or sometimes trying to force npm when I’m using bun, etc."
}
,
{
"id": "46909556",
"text": "Maybe once a month"
}
,
{
"id": "46908280",
"text": "What a lovely read. Thank you for sharing your experience.\n\nThe human-agent relationship described in the article made me wonder: are natural, or experienced, managers having more success with AI as subordinates than people without managerial skill? Are AI agents enormously different than arbitrary contractors half a world away where the only communication is daily text exchanges?"
}
,
{
"id": "46909577",
"text": "> Context switching is very expensive. In order to remain efficient, I found that it was my job as a human to be in control of when I interrupt the agent, not the other way around. Don't let the agent notify you.\n\nThis I have found to be important too."
}
,
{
"id": "46907386",
"text": "LLMs are not for me. My position is that the advantage we humans have over the\nrest of the natural world, is our minds. Our ability to think, create and express ideas\nis what separates us from the rest of the animal kingdom. Once we give that over to\n\"thinking\" machines, we weaken ourselves, both individually and as a species.\n\nThat said, I've given it a go. I used zed, which I think is a pretty great tool. I\nbought a pro subscription and used the built in agent with Claude Sonnet 4.x and Opus.\nI'm a Rails developer in my day job, and, like MitchellH and many others, found out\nfairly quickly that tasks for the LLM need to be quite specific and discrete. The agent\nis great a renames and minor refactors, but my preferred use of the agent was to get it\nto write RSpec tests once I'd written something like a controller or service object.\n\nAnd generally, the LLM agent does a pretty great job of this.\n\nBut here's the rub: I found that I was losing the ability to write rspec.\n\nI went to do it manually and found myself trying to remember API calls and approaches\nrequired to write some specs. The feeling of skill leaving me was quite sobering and\nmarked my abandonment of LLMs and Zed, and my return to neovim, agent-free.\n\nThe thing is, this is a common experience generally. If you don't use it, you lose it.\nIt applies to all things: fitness, language (natural or otherwise), skills of all kinds.\nWhy should it not apply to thinking itself.\n\nNow you may write me and my experience off as that of a lesser mind, and that you won't\nhave such a problem. You've been doing it so long that it's \"hard-wired in\" by now.\nPerhaps.\n\nIt's in our nature to take the path of least resistance, to seek ease and convenience at\nevery turn. We've certainly given away our privacy and anonymity so that we can pay for\nthings with our phones and send email for \"free\".\n\nLLMs are the ultimate convenience. A peer or slave mind that we can use to do our\nthinking and our work for us. Some believe that the LLM represents a local maxima, that\nthe approach can't get much better. I dunno, but as AI improves, we will hand over more\nand more thinking and work to it. To do otherwise would be to go against our very nature\nand every other choice we've made so far.\n\nBut it's not for me. I'm no MitchellH, and I'm probably better off performing the\nmundane activities of my work, as well as the creative ones, so as to preserve my\nhard-won knowledge and skills.\n\nYMMV\n\nI'll leave off with the quote that resonates the most with me as I contemplate AI:-\n\n\"I say your civilization, because as soon as we started thinking for you,\nit really became our civilization, which is, of course, what this is all about.\"\n-- Agent Smith \"The Matrix\""
}
,
{
"id": "46908005",
"text": "AI adoption is being heavily pushed at my work and personally I do use it, but only for the really \"boilerplate-y\" kinds of code I've already written hundreds of times before. I see it as a way to offload the more \"typing-intensive\" parts of coding (where the bottleneck is literally just my WPM on the keyboard) so I have more time to spend on the trickier \"thinking-intensive\" parts."
}
,
{
"id": "46909303",
"text": "I was using it the same way you just described but for C# and Angular and you're spot on. It feels amazing not having to memorize APIs and just let the AI even do code coverage near to 100%, however at some point I began noticing 2 things:\n\n- When tests didn't work I had to check what was going on and the LLMs do cheat a lot with Volkswagen tests, so that began to make me skeptic even of what is being written by the agents\n\n- When things were broken, spaghetti and awful code tends to be written in an obnoxius way it's beyond repairable and made me wish I had done it from scratch.\n\nThankfully I just tried using agents for tests and not for the actual code, but it makes me think a lot if \"vibe coding\" really produces quality work."
}
,
{
"id": "46913290",
"text": "I don't understand why you were letting your code get into such a state just because an agent wrote it? I won't approve such code from a human, and will ask them to change it with suggestions on how. I do the same for code written by claude.\n\nAnd then I raise the PR and other humans review it, and they won't let me merge crap code.\n\nIs it that a lot of you are working with much lighter weight processes and you're not as strict about what gets merged to main?"
}
,
{
"id": "46905275",
"text": "I'd be interested to know what agents you're using. You mentioned Claude and GPT in passing, but don't actually talk about which you're using or for which tasks."
}
,
{
"id": "46904820",
"text": "Good article! I especially liked the approach to replicate manual commits with the agent. I did not do that when learning but I suspect I'd have been much better off if I had."
}
,
{
"id": "46912459",
"text": "> having an agent running at all times\n\nThis gave me a physical flinch. Perhaps this is unfounded, but all this makes me think of is this becoming the norm, millions of people doing this, and us cooking our planet out much faster than predicted."
}
,
{
"id": "46907333",
"text": "This is yet one more indication to me that the winds have shifted with regards to the utility of the “agent” paradigm of coding with an LLM. With all the talk around Opus 4.5 I decided to finally make the jump there myself and haven’t yet been disappointed (though admittedly I’m starting it on some pretty straightforward stuff)."
}
,
{
"id": "46905118",
"text": "Thanks for sharing your experiences :)\n\nYou mentioned \"harness engineering\". How do you approach building \"actual programmed tools\" (like screenshot scripts) specifically for an LLM's consumption rather than a human's? Are there specific output formats or constraints you’ve found most effective?"
}
,
{
"id": "46907696",
"text": "For those of working on large proprietary, in fringe languages as well, what can we do? Upload all the source code to the cloud model? I am really wary of giving it a million lines of code it’s never seen."
}
,
{
"id": "46911097",
"text": "I've found mostly for context reasons its better to just have a grand overview of the systems and how they work together and feed that to the agent as context, it will use the additional files it touches to expand its understanding if you prompt well."
}
,
{
"id": "46907978",
"text": "AI is getting to the game-changing point. We need more hand-written reflections on how individuals are managing to get productivity gains for real (not a vibe coded app) software engineering."
}
,
{
"id": "46906513",
"text": "Do you have any ideas on how to harness AI to only change specific parts of a system or workpiece? Like \"I consider this part 80/100 done and only make 'meaningful' or 'new contributions' here\" ...?"
}
,
{
"id": "46911503",
"text": "This are all valid points and a hype-free pragmatic take, I've been wondering about the same things even when I'm still in the skeptics side. I think there are other things that should be added since Mitchell's reality won't apply to everyone:\n\n- What about non opensource work that's not on Github?\n\n- Costs! I would think \"an agent always running\" would add up quickly\n\n- In open source work, how does it amplify others. Are you seeing AI Slop as PRs? Can you tell the difference?"
}
,
{
"id": "46908082",
"text": "Now that the Nasdaq crashes, people switch from the stick to the carrot:\n\n\"Please let us sit down and have a reasonable conversation! I was a skeptic, too, but if all skeptics did what I did, they would come to Jesus as well! Oh, and pay the monthly Anthropic tithe!\""
}
,
{
"id": "46910155",
"text": "If the author is here, please could you also confirm you’ve never been paid by any AI company, marketing representative, community programme, in any shape or form?"
}
,
{
"id": "46910553",
"text": "I don't think you appreciate how un-bribeable this particular author is, and I don't just mean in a moral sense."
}
,
{
"id": "46910489",
"text": "He explicitly said \"I don't work for, invest in, or advise any AI companies.\" in the article.\n\nBut yes, Hashimoto is a high profile CEO/CTO who may well have an indirect, or near-future interest in talking up AI. HN articles extoling the productivity gains of Claude on HN do generally tend to be from older, managerial types (make of that what you will)."
}
,
{
"id": "46912010",
"text": "What made me feel old today: seeing a 36-year-old referred to as an older type"
}
,
{
"id": "46910169",
"text": "Bit strange that you are skeptical by default."
}
,
{
"id": "46910197",
"text": "Isn't skeptical by default quite reasonable?"
}
,
{
"id": "46910244",
"text": "Probably exhausting to be that way. The author is well respected and well known and has a good track record. My immediate reaction wasn’t to question that he spoke in good faith."
}
]
</comments_to_classify>
Based on the comments above, assign each comment to up to 3 relevant topics.
Return ONLY a JSON array with this exact structure (no other text):
[
  {"id": "comment_id_1", "topics": [1, 3, 5]},
  {"id": "comment_id_2", "topics": [2]},
  {"id": "comment_id_3", "topics": [0]},
  ...
]
Rules:
- Each comment's "topics" array must contain 1 to 3 entries
- Use 1-based topic indices (1-15) for matches
- Use [0] on its own if the comment does not fit well in any category
- Only assign topics that are genuinely relevant to the comment
Remember: Output ONLY the JSON array, no other text.