The following is content for you to classify. Do not respond to the comments—classify them.
<topics>
1. AI Performance on Greenfield vs. Legacy
Related: Users debate whether agents excel primarily at starting new projects from scratch while struggling to maintain large, complex, or legacy codebases without breaking existing conventions.
2. Context Window Limitations and Management
Related: Discussions focus on token limits (200k), performance degradation as context fills, and strategies like compacting history, using sub-agents, or maintaining summary files to preserve long-term memory.
3. Vibe Coding and Code Quality
Related: The polarization around building apps without reading the code; critics warn of unmaintainable "slop" and technical debt, while proponents value the speed and ability to bypass syntax.
4. Claude Code and Tooling
Related: Specific praise and critique for the Claude Code CLI, its integration with VS Code and Cursor, the use of slash commands, and comparisons to GitHub Copilot's agent mode.
5. Economic Impact on Software Jobs
Related: Existential anxiety regarding the obsolescence of mid-level engineers, the potential "hollowing out" of the middle class, and the shift toward one-person unicorn teams.
6. Prompt Engineering and Configuration
Related: Strategies involving `CLAUDE.md`, `AGENTS.md`, and custom system prompts to teach the AI coding conventions, architecture, and specific skills for better output.
7. Specific Language Capabilities
Related: Anecdotal evidence regarding proficiency in React, Python, and Go versus struggles in C++, Rust, and mobile development (Swift/Kotlin), often tied to training data availability.
8. Engineering vs. Coding
Related: A recurring distinction between "coding" (boilerplate, standard patterns) which AI conquers, and "engineering" (novel logic, complex systems, 3D graphics) where AI supposedly still fails.
9. Security and Trust
Related: Concerns about deploying unaudited AI code, the introduction of vulnerabilities, the risks of giving agents shell access, and the difficulty of verifying AI output.
10. The Skill Issue Argument
Related: Proponents dismiss failures as "skill issues," suggesting frustration stems from poor prompting or adaptability, while skeptics argue the tools are genuinely inconsistent.
11. Cost of AI Development
Related: Analysis of the financial viability of AI coding, including hitting API rate limits, the high cost of Opus 4.5 tokens, and the potential unsustainability of VC-subsidized pricing.
12. Future of Software Products
Related: Predictions that software creation costs will drop to zero, leading to a flood of bespoke personal apps replacing commercial SaaS, but potentially creating a maintenance nightmare.
13. Human-in-the-Loop Workflows
Related: The consensus that AI requires constant human oversight, "tools in a loop," and code review to prevent hallucination loops and ensure functional software.
14. Opus 4.5 vs. Previous Models
Related: Users describe the specific model as a "step change" or "inflection point" compared to Sonnet 3.5 or GPT-4, citing better reasoning and autonomous behavior.
15. Documentation and Specification
Related: The shift from writing code to writing specs; users find that detailed markdown documentation or "plan mode" yields significantly better AI results than vague prompts.
16. AI Hallucinations and Errors
Related: Reports of AI inventing non-existent CLI tools, getting stuck in logical loops, failing at visual UI tasks, and making simple indexing errors.
17. Shift in Developer Role
Related: The idea that developers are evolving into "product managers" or "architects" who direct agents, requiring less syntax proficiency and more systems thinking.
18. Testing and Verification
Related: The reliance on test-driven development (TDD), linters, and compilers to constrain non-deterministic AI output, ensuring generated code actually runs and meets requirements.
19. Local Models vs. Cloud APIs
Related: Discussions on the viability of local models for privacy and cost savings versus the necessity of massive cloud models like Opus for complex reasoning tasks.
20. Societal Implications
Related: Broader philosophical concerns about wealth concentration, the "class war" of automation, environmental impact, and the future of work in a post-code world.
0. Does not fit well in any category
</topics>
<comments_to_classify>
[
{
"id": "46523089",
"text": "Why do we all of a sudden hold these agents to some unrealistic high bar? Engineers write bugs all the time and write incorrect validations. But we iterate. We read the stacktrace in Sentry and realise what the hell I was thinking when I wrote that, and we fix things. If you're going to benefit from these agents, you'd need to be a bit more patient and point them correctly to your codebase.\n\nMy rule of thumb is that if you can clearly describe exactly what you want to another engineer, then you can instruct the agent to do it too."
}
,
{
"id": "46524689",
"text": "> Engineers write bugs all the time\n\nWhy do we hold calculators to such high bars? Humans make calculation mistakes all the time.\n\nWhy do we hold banking software to such high bars? People forget where they put their change all the time.\n\nEtc etc."
}
,
{
"id": "46528921",
"text": "I don't hold calculators to high bars. They think 0.1 + 0.2 = 0.30000000000000004:\n\nhttps://qntm.org/notpointthree"
}
,
{
"id": "46530473",
"text": "Some of them. The good ones don't."
}
,
{
"id": "46523214",
"text": "my unrealistic bar lies somewhere above \"pick a new library\" bug resolution"
}
,
{
"id": "46530351",
"text": "Have you experimented with all of these things on the latest models (e.g. Opus 4.5) since Nov 2025? They are significantly better at coding than earlier models."
}
,
{
"id": "46535800",
"text": "Yes, December 2025 and January 2026."
}
,
{
"id": "46530138",
"text": "I've found it to be pretty hit-or-miss with C++ in general, but it's really, REALLY bad at 3D graphics code. I've tried to use it to port an OpenGL project to SDL3_GPU, and it really struggled. It would confidently insist that the code it wrote worked, when all you had to do was run it and look at the output to see a blank screen."
}
,
{
"id": "46530278",
"text": "I hope I’m not committing a faux pas by saying this, and please feel free to tell me that I’m wrong, but I imagine a human who has been blind since birth would also struggle to build 3D graphics code.\n\nThe Claude models are technically multi-modal, but IME the vision side of the equation is really lacking. As a result, Claude is quite good at reasoning about logic, and it can build e.g. simpler web pages where the underlying html structure is enough to work with, but it’s much worse at tasks that inherently require seeing."
}
,
{
"id": "46530370",
"text": "Yea, for obvious reasons, it seems to be best at code that transforms data: text/binary input to text/binary output. And where the logic can be tracked and verified at runtime with sufficient (text) logging. In other words, it's much better closed loop than open loop. I tried to help it by prompting it to please take a screen capture of its output to verify functionality, but it seems LLMs aren't quite ready for that yet."
}
,
{
"id": "46523072",
"text": "I've had pretty good luck with LLM agents coding C. In this case a C compiler that supports a subset of C and targets a customizable microcoded state machine/processor. Then I had Gemini code up a simulator/debugger for the target machine in C++ and it did it in short order and quite successfully - lets you single step through the microcode and examine inputs (and set inputs), outputs & current state - did that in an afternoon and the resulting C++ code looks pretty decent."
}
,
{
"id": "46527677",
"text": "That's remarkably similar to something I've just started on - I want to create a self-compiling C compiler targeting (and to run on) an 8-bit micro via a custom VM. This is basically a retro-computing hobby project.\n\nI've worked with Gemini Fast on the web to help design the VM ISA, then next steps will be to have some AI (maybe Gemini CLI - currently free) write an assembler, disassembler and interpreter for the ISA, and then the recursive descent compiler (written in C) too.\n\nI already had Gemini 3.0 Fast write me a precedence climbing expression parser as a more efficient drop-in replacement for a recursive descent one, although I had it do that in C++ as a proof-of-concept since I don't know yet what C libraries I want to build and use (arena allocator, etc). This involved a lot of copy-paste between Gemini output and an online C++ dev environment (OnlineGDB), but that was not too bad, although Gemini CLI would have avoided that. Too bad that Gemini web only has \"code interpreter\" support for Python, not C and/or C++.\n\nUsing Gemini to help define the ISA was an interesting process. It had useful input in a \"pair-design\" process, working on various parts of the ISA, but then failed to bring all the ideas together into a single ISA document, repeatedly missing parts of what had been previously discussed until I gave up and did that manually. The default persona of Gemini seems not very well suited to this type of work flow where you want to direct what to do next, since it seems they've RL'd the heck out of it to want to suggest next step and ask questions rather than do what is asked and wait for further instruction. I eventually had to keep asking it to \"please answer then stop\", and interestingly quality of the \"conversation\" seemed to fall apart after that (perhaps because Gemini was now predicting/generating a more adversarial conversation than a collaborative one?).\n\nI'm wondering/hoping that Gemini CLI might be better at working on documentation than Gemini web, since then the doc can be an actual file it is editing, and it can use its edit tool for that, as opposed to hoping that Gemini web can assemble chunks of context (various parts of the ISA discussion) into a single document."
}
,
{
"id": "46523315",
"text": "I have not tried C++, but Codex did a good job with low-level C code, shaders as well as porting 32 bit to 64 bit assembly drawing routines.\nI have also tried it with retro-computing programming with relative success."
}
,
{
"id": "46521199",
"text": "> Mobile\n\nFrom what I've seen, CC has troubles with the latest Swift too, partially because of it being latest and partially because it's so convoluted nowadays.\n\nBut it's übercharged™ for C#"
}
,
{
"id": "46521418",
"text": "> It also can't do Rust really well, once you get to the meat of it. Not sure why that is\n\nBecause types are proofs and require global correctness, you can't just iterate, fix things locally, and wait until it breaks somewhere else that you also have to fix locally."
}
,
{
"id": "46516290",
"text": "I really think a lot of people tried AI coding earlier, got frustrated at the errors and gave up. That's where the rejection of all these doomer predictions comes from.\n\nAnd I get it. Coding with Claude Code really was prompting something, getting errors, and asking it to fix it. Which was still useful but I could see why a skilled coder adding a feature to a complex codebase would just give up\n\nOpus 4.5 really is at a new tier however. It just...works. The errors are far fewer and often very minor - \"careless\" errors, not fundamental issues (like forgetting to add \"use client\" to a nextjs client component)."
}
,
{
"id": "46519587",
"text": "This was me. I was a huge AI coding detractor on here for a while (you can check my comment history). But, in order to stay informed and not just be that grouchy curmudgeon all the time, I kept up with the models and regularly tried them out. Opus 4.5 is so much better than anything I've tried before, I'm ready to change my mind about AI assistance.\n\nI even gave -True Vibe Coding- a whirl. Yesterday, from a blank directory and text file list of requirements, I had Opus 4.5 build an Android TV video player that could read a directory over NFS, show a grid view of movie poster thumbnails, and play the selected video file on the TV. The result wasn't exactly full-featured Kodi, but it works in the emulator and actual device, it has no memory leaks, crashes, ANRs, no performance problems, no network latency bugs or anything. It was pretty astounding.\n\nOh, and I did this all without ever opening a single source file or even looking at the proposed code changes while Opus was doing its thing. I don't even know Kotlin and still don't know it."
}
,
{
"id": "46519827",
"text": "I have a few Go projects now and I speak Go as well as you speak Kotlin. I predict that we'll see some languages really pull ahead of others in the next few years based on their advantages for AI-powered development.\n\nFor instance, I always respected types, but I'm too lazy to go spend hours working on types when I can just do ruby-style duck typing and get a long ways before the inevitable problems rear their head. Now, I can use a strongly typed language and get the advantages for \"free\"."
}
,
{
"id": "46527497",
"text": "> I predict that we'll see some languages really pull ahead of others in the next few years based on their advantages for AI-powered development.\n\nOh absolutely. I've been using Python for past 15 or so years for everything.\n\nI've never written a single line of Rust in my life, and all my new projects are Rust now, even the quick-script-throwaway things, because it's so much better at instantly screaming at claude when it goes off track. It may take it longer to finish what I asked it to do, but requires so much less involvement from me.\n\nI will likely never start another new project in python ever.\n\nEDIT: Forgot to add that paired with a good linter, this is even more impressive. I told Claude to come up with the most masochistic clippy configuration possible, where even a tiny mistake is instantly punished and exceptions have to be truly exceptional (I have another agent that verifies this each run).\n\nI just wish there was cargo-clippy for enforcing architectural patterns."
}
,
{
"id": "46520728",
"text": "and with types, it makes it easier for rounds of agents to pick up mistakes at compile time, statically. linting and sanity checking untyped languages only goes so far.\nI've not seen LLM's one shot perl style regexes. and javascript can still have ugly runtime WTFs"
}
,
{
"id": "46521917",
"text": "I've found this too.\n\nI find I'm doing more Typescript projects than Python because of the superior typing, despite the fact I prefer Python."
}
,
{
"id": "46523848",
"text": "How do you know “it has no memory leaks, crashes, ANRs, no performance problems, no network latency bugs or anything” if you built it just yesterday? Isn’t it a bit too early for claims like this? I get it’s easy to bring ideas to life but aren’t we overly optimistic?"
}
,
{
"id": "46524441",
"text": "By tomorrow the app will be replaced with a new version from the other competitor, by that time the memory leak will not reveal itself"
}
,
{
"id": "46528856",
"text": "Part of the \"one day\" development time was exhaustively testing it. Since the tool's scope is so small, getting good test coverage was pretty easy. Of course, I'm not guaranteeing through formal verification methods that the code is bug free. I did find bugs, but they were all areas that were poorly specified by me in the requirements."
}
,
{
"id": "46521053",
"text": "Oh, wow, that's impressive, thanks for sharing!\n\nGoing to one-up you though -- here's a literal one-liner that gets me a polished media center with beautiful interface and powerful skinning engine. It supports Android, BSD, Linux, macOS, iOS, tvOS and Windows.\n\n`git clone https://github.com/xbmc/xbmc.git `"
}
,
{
"id": "46521148",
"text": "Hah! I actually initiated the project because I'm a long time XBMC/Kodi user. I started using it when it was called XBMC, on an actual Xbox 1. I am sick and tired of its crashing, poor playback performance, and increasingly bloated feature set. It's embarrassing when I have friends or family over for movie night, and I have to explain \"Sorry folks, Kodi froze midway through the movie again\" while I frantically try to re-launch/reboot my way back to watching the movie. VLC's playback engine is much better but the VLC app's TV UX is ass. This application actually uses the libVLC playback engine under the hood."
}
,
{
"id": "46522405",
"text": "I think anecdotes like this may prove very relevant the next few years. AI might make bad code, but a project of bad code that's still way smaller than a bloated alternative, and has a UX tailored to your exact requirements could be compelling.\n\nA big part of the problem with existing software is that humans seem to be pretty much incapable of deciding a project is done and stop adding to it. We treat creating code like a job or hobby instead of a tool. Nothing wrong with that, unless you're advertising it as a tool."
}
,
{
"id": "46522723",
"text": "Yea, after this little experiment, I feel like I can just go through every big, bloated, slow, tech-debt-ridden software I use and replace it with a tiny, bespoke version that does only what I need and no more.\n\nThe old adage about how \"users use 10% of your software's features, but they each use a different 10%\" can now be solved by each user just building that 10% for themselves."
}
,
{
"id": "46522168",
"text": "Have you tried VidHub? Works nicely against almost anything. Plex, jellyfin, smb/webdav folder etc"
}
,
{
"id": "46521154",
"text": "I decided to vibe code something myself last week at work. I've been wanting to create a poc that involves a coding agent creating custom bokeh plots that a user can interact with and ask follow up questions. All this had to be served using a holoview panel library\n\nAt work I only have access to Claude using the GitHub copilot integration so this could be the cause of my problems. Claude was able to get the first iteration up pretty quick. At that stage the app could create a plot and you could interact with it and ask follow up questions.\n\nThen I asked it to extend the app so that it could generate multiple plots and the user could interact with all of them one at a time. It made a bunch of changes but the feature was never implemented. I asked it to do again but got the same outcome. I completely accept the fact that it could just be all because I am using vscode copilot or my prompting skills are not good but the LLM got 70% of the way there and then completely failed"
}
,
{
"id": "46521909",
"text": "> At work I only have access to Claude using the GitHub copilot integration so this could be the cause of my problems.\n\nYou really need to at least try Claude Code directly instead of using CoPilot. My work gives us access to CoPilot, Claude Code, and Codex. CoPilot isn’t close to the other more agentic products."
}
,
{
"id": "46522328",
"text": "Vs code copilot extension the harness is not great, but Opus 4.5 with Copilot CLI works quite well."
}
,
{
"id": "46529517",
"text": "Do they manage context differently or have different system prompts? I would assume a lot of that would be the same between them. I think GH Copilots biggest shortcoming is that they are too token cheap. Aggressively managing context to the detriment of the results. Watching Claude read a 500 line file in 100 line chunks just makes me sad."
}
,
{
"id": "46520865",
"text": "Thanks for posting this. It's a nice reminder that despite all the noise from hype-mongers and skeptics in the past few years, most of us here are just trying to figure this all out with an open mind and are ready to change our opinions when the facts change. And a lot of people in the industry that I respect on HN or elsewhere have changed their minds about this stuff in the last year, having previously been quite justifiably skeptical. We're not in 2023 anymore.\n\nIf you were someone saying at the start of 2025 \"this is a flash in the pan and a bunch of hype, it's not going to fundamentally change how we write code\", that was still a reasonable belief to hold back then. At the start of 2026 that position is basically untenable: it's just burying your head in the sand and wishing for AI to go away. If you're someone who still holds it you really really need to download Claude Code and set it to Opus and start trying it with an open mind: I don't know what else to tell you. So now the question has shifted from whether this is going to transform our profession (it is), to how exactly it's going to play out. I personally don't think we will be replacing human engineers anytime soon (\"coders\", maybe!), but I'm prepared to change my mind on that too if the facts change. We'll see.\n\nI was a fellow mind-changer, although it was back around the first half of last year when Claude Code was good enough to do things for me in a mature codebase under supervision. It clearly still had a long way to go but it was at that tipping point from \"not really useful\" to \"useful\". But Opus 4.5 is something different - I don't feel I have to keep pulling it back on track in quite the way I used to with Sonnet 3.7, 4, even Sonnet 4.5.\n\nFor the record, I still think we're in a bubble. AI companies are overvalued. But that's a separate question from whether this is going to change the software development profession."
}
,
{
"id": "46521435",
"text": "The AI bubble is kind of like the dot-com bubble in that it's a revolutionary technology that will certainly be a huge part of the future, but it's still overhyped (i.e. people are investing without regard for logic)."
}
,
{
"id": "46521658",
"text": "We were enjoying cheap second hand rack mount servers, RAM, hard drives, printers, office chairs and so on for a decade after the original dot com crash. Every company that went out of business liquidated their good shit for pennies.\n\nI'm hoping after AI comes back down to earth there will be a new glut of cheap second hand GPUs and RAM to get snapped up."
}
,
{
"id": "46521613",
"text": "Right. And same for railways, which had a huge bubble early on. Over-hyped on the short time horizon. Long term, they were transformative in the end, although most of the early companies and early investors didn’t reap the eventual profits."
}
,
{
"id": "46521978",
"text": "But the dot-com bubble wasn't overhyped in retrospect. It was under-hyped."
}
,
{
"id": "46522111",
"text": "At the time it was overhyped because just by adding .com to your company's name you could increase your valuation regardless of whether or not you had anything to do with the internet. Is that not stupid?\n\nI think my comparison is apt; being a bubble and a truly society-altering technology are not mutually exclusive, and by virtue of it being a bubble, it is overhyped."
}
,
{
"id": "46522583",
"text": "There was definitely a lot of stupid stuff happening. IMO the clearest accurate way to put it is that it was overhyped for the short term (hence the crazy high valuations for obvious bullshit), and underhyped for the long term (in the sense that we didn't really foresee how broadly and deeply it would change the world). Of course, there's more nuance to it, because some people had wild long-term predictions too. But I think the overall, mainstream vibe was to underappreciate how big a deal it was."
}
,
{
"id": "46519996",
"text": "> Oh, and I did this all without ever opening a single source file or even looking at the proposed code changes while Opus was doing its thing. I don't even know Kotlin and still don't know it.\n\n... says it all."
}
,
{
"id": "46519757",
"text": "I recently replaced my monitor with one that could be vertically oriented, because I'm just using Claude Code in the terminal and not looking at file trees at all\n\nbut I do want a better way to glance and keep up with what its doing in longer conversations, for my own mental context window"
}
,
{
"id": "46520540",
"text": "Ah, but you’re at the beginning stage young grasshopper. Soon you will be missing that horizontal ultra wide monitor as you spin up 8 different Claude agents in parallel sessions."
}
,
{
"id": "46520769",
"text": "oh I noticed! I've begun doing that on my laptop. I just started going down all my list of sideprojects one by one, then two by two, a Claude Code instance in a terminal window for each folder. It's a bit mental\n\nI'm finding that branding and graphic design is the most arduous part, that I'm hoping to accelerate soon. I'm heavily AI assisted there too and I'm evaluating MCP servers to help, but so far I do actually have to focus on just that part as opposed to babysit"
}
,
{
"id": "46521145",
"text": "> \"asking it to fix it.\"\n\nThis is what people are still doing wrong. Tools in a loop people, tools in a loop.\n\nThe agent has to have the tools to detect whatever it just created is producing errors during linting/testing/running. When it can do that, I can loop again, fix the error and again - use the tools to see whether it worked.\n\nI _still_ encounter people who think \"AI programming\" is pasting stuff into ChatGPT on the browser and they complain it hallucinates functions and produces invalid code.\n\nWell, d'oh."
}
,
{
"id": "46526033",
"text": "Last weekend I was debugging some blocking issue on a microcontroller with embassy-rs, where the whole microcontroller would lock up as soon as I started trying to connect to an MQTT server.\n\nI was having Opus investigate it and I kept building and deploying the firmware for testing.. then I just figured I'd explain how it could do the same and pull the logs.\n\nOff it went, for the next ~15 minutes it would flash the firmware multiple times until it figured out the issue and fixed it.\n\nThere was something so interesting about seeing a microcontroller on the desk being flashed by Claude Code, with LEDs blinking indicating failure states. There's something about it not being just code on your laptop that felt so interesting to me.\n\nBut I agree, absolutely, red/green test or have a way of validating (linting, testing, whatever it is) and explain the end-to-end loop, then the agent is able to work much faster without being blocked by you multiple times along the way."
}
,
{
"id": "46527606",
"text": "This is kind of why I'm not really scared of losing my job.\n\nWhile Claude is amazing at writing code, it still requires human operators. And even experienced human operators are bad at operating this machinery.\n\nTell your average joe - the one who thinks they can create software without engineers - what \"tools-in-a-loop\" means, and they'll make the same face they made when you tried explaining iterators to them, before LLMs.\n\nExplain to them how typing system, E2E or integration test helps the agent, and suddenly, they now have to learn all the things they would be required to learn to be able to write on their own."
}
,
{
"id": "46526204",
"text": "Jules is slow incompetent shit and that uses tools in a loop, so no..."
}
,
{
"id": "46520795",
"text": "I have been out of the loop for a couple of months (vacation). I tried Claude Opus 4.5 at the end of November 2025 with the corporate Github Copilot subscription in Agent mode and it was awful: basically ignoring code and hallucinating.\n\nMy team is using it with Claude Code and say it works brilliantly, so I'll be giving it another go.\n\nHow much of the value comes from Opus 4.5, how much comes from Claude Code, and how much comes from the combination?"
}
,
{
"id": "46521285",
"text": "As someone coming from GitHub copilot in vscode and recently trying Claude Code plugin for vscode I don't get the fuss about Claude.\n\nCopilot has by far the best and most intuitive agent UI. Just make sure you're in agent mode and choose Sonnet or Opus models.\n\nI've just cancelled my Claude sub and gone back and will upgrade to the GH Pro+ to get more sonnet/opus."
}
]
</comments_to_classify>
Based on the comments above, assign each to up to 3 relevant topics.
Return ONLY a JSON array with this exact structure (no other text):
[
{
"id": "comment_id_1",
"topics": [
1,
3,
5
]
}
,
{
"id": "comment_id_2",
"topics": [
2
]
}
,
{
"id": "comment_id_3",
"topics": [
0
]
}
,
...
]
Rules:
- Each comment can have 0 to 3 topics
- Use 1-based topic indices for matches
- Use index 0 if the comment does not fit well in any category
- Only assign topics that are genuinely relevant to the comment
Remember: Output ONLY the JSON array, no other text.