Summarizer

LLM Input

llm/0c6097e3-bc76-4fbe-ab4f-ceafa2484e5f/batch-11-6b46d2bf-adee-4e78-bd16-1e4e696ca763-input.json

prompt

The following is content for you to classify. Do not respond to the comments—classify them.

<topics>
1. AI Performance on Greenfield vs. Legacy
   Related: Users debate whether agents excel primarily at starting new projects from scratch while struggling to maintain large, complex, or legacy codebases without breaking existing conventions.
2. Context Window Limitations and Management
   Related: Discussions focus on token limits (200k), performance degradation as context fills, and strategies like compacting history, using sub-agents, or maintaining summary files to preserve long-term memory.
3. Vibe Coding and Code Quality
   Related: The polarization around building apps without reading the code; critics warn of unmaintainable "slop" and technical debt, while proponents value the speed and ability to bypass syntax.
4. Claude Code and Tooling
   Related: Specific praise and critique for the Claude Code CLI, its integration with VS Code and Cursor, the use of slash commands, and comparisons to GitHub Copilot's agent mode.
5. Economic Impact on Software Jobs
   Related: Existential anxiety regarding the obsolescence of mid-level engineers, the potential "hollowing out" of the middle class, and the shift toward one-person unicorn teams.
6. Prompt Engineering and Configuration
   Related: Strategies involving `CLAUDE.md`, `AGENTS.md`, and custom system prompts to teach the AI coding conventions, architecture, and specific skills for better output.
7. Specific Language Capabilities
   Related: Anecdotal evidence regarding proficiency in React, Python, and Go versus struggles in C++, Rust, and mobile development (Swift/Kotlin), often tied to training data availability.
8. Engineering vs. Coding
   Related: A recurring distinction between "coding" (boilerplate, standard patterns) which AI conquers, and "engineering" (novel logic, complex systems, 3D graphics) where AI supposedly still fails.
9. Security and Trust
   Related: Concerns about deploying unaudited AI code, the introduction of vulnerabilities, the risks of giving agents shell access, and the difficulty of verifying AI output.
10. The Skill Issue Argument
   Related: Proponents dismiss failures as "skill issues," suggesting frustration stems from poor prompting or adaptability, while skeptics argue the tools are genuinely inconsistent.
11. Cost of AI Development
   Related: Analysis of the financial viability of AI coding, including hitting API rate limits, the high cost of Opus 4.5 tokens, and the potential unsustainability of VC-subsidized pricing.
12. Future of Software Products
   Related: Predictions that software creation costs will drop to zero, leading to a flood of bespoke personal apps replacing commercial SaaS, but potentially creating a maintenance nightmare.
13. Human-in-the-Loop Workflows
   Related: The consensus that AI requires constant human oversight, "tools in a loop," and code review to prevent hallucination loops and ensure functional software.
14. Opus 4.5 vs. Previous Models
   Related: Users describe the specific model as a "step change" or "inflection point" compared to Sonnet 3.5 or GPT-4, citing better reasoning and autonomous behavior.
15. Documentation and Specification
   Related: The shift from writing code to writing specs; users find that detailed markdown documentation or "plan mode" yields significantly better AI results than vague prompts.
16. AI Hallucinations and Errors
   Related: Reports of AI inventing non-existent CLI tools, getting stuck in logical loops, failing at visual UI tasks, and making simple indexing errors.
17. Shift in Developer Role
   Related: The idea that developers are evolving into "product managers" or "architects" who direct agents, requiring less syntax proficiency and more systems thinking.
18. Testing and Verification
   Related: The reliance on test-driven development (TDD), linters, and compilers to constrain non-deterministic AI output, ensuring generated code actually runs and meets requirements.
19. Local Models vs. Cloud APIs
   Related: Discussions on the viability of local models for privacy and cost savings versus the necessity of massive cloud models like Opus for complex reasoning tasks.
20. Societal Implications
   Related: Broader philosophical concerns about wealth concentration, the "class war" of automation, environmental impact, and the future of work in a post-code world.
0. Does not fit well in any category
</topics>

<comments_to_classify>
[
  
{
  "id": "46521986",
  "text": "To be fair, you're not supposed to be doing the \"one shot\" thing with LLMs in a mature codebase.\n\nYou have to supply it the right context with a well formed prompt, get a plan, then execute and do some cleanup.\n\nLLMs are only as good as the engineers using them, you need to master the tool first before you can be productive with it."
}
,
  
{
  "id": "46529866",
  "text": "I’m well aware; as I said, I am regularly using CC/Codex/OC in a variety of projects, and I certainly didn’t claim that they can’t be used productively in a large code base.\n\nBut different challenges become apparent that aren’t addressed by examples like this article, which tend to focus on narrow, greenfield applications that can be readily rebuilt in one shot.\n\nI already get plenty of value in small side projects that Claude can create in minutes. And while extremely cool, these examples aren’t the kind of “step change” improvement I’d like to see in the area where agentic tools are currently weakest in my daily usage."
}
,
  
{
  "id": "46523439",
  "text": "I would be much more impressed with implementing new, long-requested features into existing software (and being open to later maintaining the LLM-generated code)."
}
,
  
{
  "id": "46529270",
  "text": "Fully agreed! That’s the exact kind of thing I was hoping to find when I read the article title, but unfortunately it was really just another “normal AI agent experience” I’ve seen (and built) many examples of before."
}
,
  
{
  "id": "46521451",
  "text": "Adding capacity to software engineering through LLMs is like adding lanes to a highway — all the new capacity will be utilized.\n\nBy getting the LLM to keep changes minimal I’m able to keep quality high while increasing velocity to the point where productivity is limited by my review bandwidth.\n\nI do not fear competition from junior engineers or non-technical people wielding poorly-guided LLMs for sustained development. Nor for prototyping or one offs, for that matter — I’m confident about knowing what to ask for from the LLM and how to ask."
}
,
  
{
  "id": "46528098",
  "text": "No, that has certainly been my experience, but what is going to be the forcing function, after a company decides it needs fewer engineers, to go back to hiring?"
}
,
  
{
  "id": "46523249",
  "text": "This is relatively easily fixed with increasing test coverage to near 100% and lifting critical components into model checker space; both approaches were prohibitively expensive before November. They’ll be accepted best practices by the summer."
}
,
  
{
  "id": "46521181",
  "text": "Why not have the LLM rewrite the entire codebase?"
}
,
  
{
  "id": "46521276",
  "text": "In ~25 years or so of dealing with large, existing codebases, I've seen time and time again that there's a ton of business value and domain knowledge locked up inside all of that \"messy\" code. Weird edge cases that weren't well covered in the design, defensive checks and data validations, bolted-on extensions and integrations, etc., etc.\n\n\"Just rewrite it\" is usually -- not always, but _usually_ -- a sure path to a long, painful migration that usually ends up not quite reproducing the old features/capabilities and adding new bugs and edge cases along the way."
}
,
  
{
  "id": "46521322",
  "text": "Classic Joel Spolsky:\n\nhttps://www.joelonsoftware.com/2000/04/06/things-you-should-...\n\n> the single worst strategic mistake that any software company can make:\n\n> rewrite the code from scratch."
}
,
  
{
  "id": "46521807",
  "text": "Steve Yegge talks about this exact post a lot - how it stayed correct advice for over 25 years - up until October 2025."
}
,
  
{
  "id": "46522218",
  "text": "Time will tell. I’d bet on Spolsky, because of Hyrum’s Law.\n\nhttps://www.hyrumslaw.com/\n\n> With a sufficient number of users of an API,\nit does not matter what you promise in the contract:\nall observable behaviors of your system\nwill be depended on by somebody.\n\nAn LLM rewriting a codebase from scratch is only as good as the spec. If “all observable behaviors” are fair game, the LLM is not going to know which of those behaviors are important.\n\nFurthermore, Spolsky talks about how to do incremental rewrites of legacy code in his post. I’ve done many of these and I expect LLMs will make the next one much easier."
}
,
  
{
  "id": "46522422",
  "text": ">An LLM rewriting a codebase from scratch is only as good as the spec. If “all observable behaviors” are fair game, the LLM is not going to know which of those behaviors are important.\n\nI've been using LLMs to write docs and specs and they are very very good at it."
}
,
  
{
  "id": "46522559",
  "text": "That’s a fair point — I agree that LLMs do a good job predicting the documentation that might accompany some code. I feel relieved when I can rely on the LLM to write docs that I only need to edit and review.\n\nBut I’m using LLMs regularly and, I feel, pretty effectively — including Opus 4.5 — and these “they can rewrite your entire codebase” assertions just seem crazily incongruous with my lived experience guiding LLMs to write even individual features bug-free."
}
,
  
{
  "id": "46521771",
  "text": "When an LLM can rewrite it in 24 hours and fill the missing parts in minutes that argument is hard to defend.\n\nI can vibe code what a dev shop would charge 500k to build and I can solo it in 1-2 weeks. This is the reality today. The code will pass quality checks, the code doesn’t need to be perfect, it doesn’t need to be clever, it needs to be.\n\nIt’s not difficult to see this right? If an LLM can write English it can write Chinese or python.\n\nThen it can run itself, review itself and fix itself.\n\nThe cat is out of the bag, what it will do to the economy… I don’t see anything positive for regular people. Write some code has turned into prompt some LLM. My phone can outplay the best chess player in the world, are you telling me you think that whatever unbound model anthropic has sitting in their data center can’t out code you?"
}
,
  
{
  "id": "46523444",
  "text": "Well, where is your competitor to mainstream software products?"
}
,
  
{
  "id": "46533840",
  "text": "What mainstream software product do I use on a day to day basis besides Claude?\n\nThe ones that continue to survive all build around a platform of services, MSO, Adobe, etc.\n\nMost enterprise product offerings, platform solutions, proprietary data access, proprietary / well accepted implementation. But let's not confuse it with the ability to clone it, it doesn't seem far-fetched to get 10 people together and vibe out a full slack replacement in a few weeks."
}
,
  
{
  "id": "46521427",
  "text": "If the LLM just wrote the whole thing last week, surely it can write it again."
}
,
  
{
  "id": "46521747",
  "text": "If an LLM wrote the whole project last week and it already requires a full rewrite, what makes you think that the quality of that rewrite will be significantly higher, and that it will address all of the issues? Sure, it's all probabilistic so there's probably a nonzero chance for it to stumble into something where all the moving parts are moving correctly, but to me it feels like with our current tech, these odds continue shrinking as you toss on more requirements and features, like any mature project. It's like really early LLMs where if they just couldn't parse what you wanted, past a certain point you could've regenerated the output a million times and nothing would change."
}
,
  
{
  "id": "46521645",
  "text": "* With a slightly different set of assumptions, which may or may not matter. UAT is cheap.\n\nAnd data migration is lossy, because nobody cares about data fidelity anyway."
}
,
  
{
  "id": "46521608",
  "text": "Broken though"
}
,
  
{
  "id": "46521041",
  "text": "The whole point of good engineering was not just about hitting the hard specs, but also having extendable, readable, maintainable code.\n\nBut if today it’s so cheap to generate new code that meets updated specs, why care about the quality of the code itself?\n\nMaybe the engineering work today is to review specs and tests and let LLMs do whatever behind the scenes to hit the specs. If the specs change, just start from scratch."
}
,
  
{
  "id": "46521473",
  "text": "\"Write the specs and let the outsourced labor hit them\" is not a new tale.\n\nLet's assume the LLM agents can write tests for, and hit, specs better and cheaper than the outsourced offshore teams could.\n\nSo let's assume now you can have a working product that hits your spec without understanding the code. How many bugs and security vulnerabilities have slipped through \"well tested\" code because of edge cases of certain input/state combinations? Ok, throw an LLM at the codebase to scan for vulnerabilities; ok, throw another one at it to ensure no nasty side effects of the changes that one made; ok, add some functionality and a new set of tests and let it churn through a bunch of gross code changes needed to bolt that functionality into the pile of spaghetti...\n\nHow long do you want your critical business logic relying on not-understood code with \"100% coverage\" (of lines of code and spec'd features) but super-low coverage of actual possible combinations of input+machine+system state? How big can that codebase get before \"rewrite the entire world to pass all the existing specs and tests\" starts getting very very very slow?\n\nWe've learned MANY hard lessons about security, extensibility, and maintainability of multi-million-LOC-or-larger long-lived business systems and those don't go away just because you're no longer reading the code that's making you the money. They might even get more urgent. Is there perhaps a reason Google and Amazon didn't just hire 10x the number of people at 1/10th the salary to replace the vast majority of their engineering teams years ago?"
}
,
  
{
  "id": "46521338",
  "text": "> let LLMs do whatever behind the scenes to hit the specs\n\nassuming for the sake of argument that's completely true, then what happens to \"competitive advantage\" in this scenario?\n\nit gets me thinking: if anyone can vibe from spec, whats stopping company a (or even user a) from telling an llm agent \"duplicate every aspect of this service in python and deploy it to my aws account xyz\"...\n\nin that scenario, why even have companies?"
}
,
  
{
  "id": "46523441",
  "text": "It’s all fun and games vibecoding until you\nA) have customers who depend on your product\nB) it breaks, or the one person prompting, who has access to the servers and api keys, gets incapacitated (or just bored).\n\nSure we can vibecode oneoff projects that do something useful (my fav is browser extensions) but as soon as we ask others to use our code on a regular basis the technical debt clock starts running. And we all know how fast dependencies in a project break."
}
,
  
{
  "id": "46521856",
  "text": "You can do this for many things now.\n\nWalmart, McDonalds, Nike - none really have any secrets about what they do. There is nothing stopping someone from copying them - except that businesses are big, unwieldy things.\n\nWhen software becomes cheap companies compete on their support. We see this for Open Source software now."
}
,
  
{
  "id": "46523473",
  "text": "These are businesses with extra-large capital requirements. You ain't replicating them, because you don't have the money, and they can easily strangle you with their money as you start out.\n\nSoftware is different, you need very very little to start, historically just your own skills and time. These latter two may see some changes with LLMs."
}
,
  
{
  "id": "46525143",
  "text": "How conveniently you forgot about the most important things for a product to make money - marketing and the network effect...."
}
,
  
{
  "id": "46525415",
  "text": "I don't see the relevance to the discussion. Marketing is not significantly different for a shop and an online-only business.\n\nHaving to buy a large property, fulfilling every law, etc is materially different than buying a laptop and renting a cloud instance. Almost everyone has the material capacity to do the latter, but almost no one has the privilege for the former."
}
,
  
{
  "id": "46521399",
  "text": "The business is identifying the correct specs and filtering the customer needs/requests so that the product does not become irrelevant."
}
,
  
{
  "id": "46521497",
  "text": "Okay, we will copy that version of the product too.\n\nThere is more to it than the code and software provided in most cases I feel."
}
,
  
{
  "id": "46521541",
  "text": "I think `andrekandre is right in this hypothetical.\n\nWho'd pay for brand new Photoshop with a couple new features and improvements if LLM-cloned Photoshop-from-three-months-ago is free?\n\nThe first few iterations of this could be massively consumer friendly for anything without serious cloud infra costs. Cheap clones all around. Like generic drugs but without the cartel-like control of manufacturing.\n\nBusiness after that would be dramatically different, though. Differentiating yourself from the willing-to-do-it-for-near-zero-margin competitors to produce something new to bring in money starts to get very hard. Can you provide better customer support? That could be hard, everyone's gonna have a pretty high baseline LLM-support-agent already... and hiring real people instead could dramatically increase the price difference you're trying to justify... Similarly for marketing or outreach etc; how are you going to cut through the AI-agent-generated copycat spam that's gonna be pounding everyone when everyone and their dog has a clone of popular software and services?\n\nPhotoshop type things are probably a really good candidate for disruption like that because to a large extent every feature is independent. The noise reduction tool doesn't need API or SDK deps on the layer-opacity tool, for instance. If all your features are LLM balls of shit that doesn't necessarily reduce your ability to add new ones next to them, unlike in a more relational-database-based web app with cross-table/model dependencies, etc.\n\nAnd in this \"try out any new idea cheaply and throw crap against the wall and see what sticks\" world \"product managers\" and \"idea people\" etc are all pretty fucked. Some of the infinite monkeys are going to periodically hit to gain temporary advantage, but good luck finding someone to pay you to be a \"product visionary\" in a world where any feature can be rolled out and tested in the market by a random dev in hours or days."
}
,
  
{
  "id": "46522482",
  "text": "OK, so what do people do? What do people need? People still need to eat, people get married and die, and all of the things surrounding that, all sorts of health related stuff. Nightlife events. Insurance. actuaries. Raising babies. What do you spend your fun money on?\n\nPeople pay for things they use. If bespoke software is a thing you pick up at the mall at a kiosk next to Target we gotta figure something out."
}
,
  
{
  "id": "46526261",
  "text": "It's all fine till money starts being involved and whoopsies cost more than few hours of fixing."
}
,
  
{
  "id": "46523730",
  "text": "> What bothers me about posts like this is: mid-level engineers are not tasked with atomic, greenfield projects\n\nThey get those occasionally too, though. Depends on the company. In some software houses it's constant \"greenfield projects\", one after another. And even in companies with 1-2 pieces of main established software to maintain, there are all kinds of smaller utilities or pipelines needed.\n\n> But day to day, when I ask it \"build me this feature\" it uses strange abstractions, and often requires several attempts on my part to do it in the way I consider \"right\".\n\nIn some cases that's legit. In other cases it's just \"it did it well, but not how I'd have done it\", which is often needless stickiness to some particular style (often a contention between 2 human programmers too).\n\nBasically, what FloorEgg says in this thread: \"There are two types of right/wrong ways to build: the context specific right/wrong way to build something and an overly generalized engineer specific right/wrong way to build things.\"\n\nAnd you can always not just tell it \"build me this feature\", but tell it (in a high level way) how to do it, and give it a generic context about such preferences too."
}
,
  
{
  "id": "46527481",
  "text": "Even if you are going green field, you need to build it the way it is likely to be used, based on having a deep familiarity with what that customer's problems are and how their current workflow is done. As much as we imagine everything is on the internet, a bunch of this stuff is not documented anywhere. An LLM could ask the customer requirement questions but that familiarity is often needed to know the right questions to ask. It is hard to bootstrap.\n\nEven if it could build the perfect greenfield app, as it updates the app it needs to consider backwards compatibility and breaking changes. LLMs seem very far from growing apps. I think this is because LLMs are trained on the final outcome of the engineering process, but not on the incremental sub-commit work of first getting a faked out outline of the code running and then slowly building up that code until you have something that works.\n\nThis isn't to say that LLMs or other AI approaches couldn't replace software engineering some day, but they clearly aren't good enough yet and the training sets they currently have access to are unlikely to provide the needed examples."
}
,
  
{
  "id": "46524815",
  "text": "Yeah. Just like another engineer. When you tell another engineer to build you a feature, it's improbable they'll do it the way that you consider \"right.\"\n\nThis sounds a lot like the old arguments around using compilers vs hand-writing asm. But now you can tell the LLM how you want to implement the changes you want. This will become more and more relevant as we try to maintain the code it generates.\n\nBut, for right now, another thing Claude's great at is answering questions about the codebase. It'll do the analysis and bring up reports for you. You can use that information to guide the instructions for changes, or just to help you be more productive."
}
,
  
{
  "id": "46522481",
  "text": "> its building it the right way, in an easily understood way, in a way that's easily extensible.\n\nWhen I worked at Google, people rarely got promoted for doing that. They got promoted for delivering features or sometimes from rescuing a failing project because everyone was doing the former until promotion velocity dropped and your good people left to other projects not yet bogged down too far."
}
,
  
{
  "id": "46523600",
  "text": "You can look at my comment history to see the evidence to how hostile I was to agentic coding. Opus 4.5 completely changed my opinion.\n\nThis thing jumped into a giant JSF (yes, JSF) codebase and started fixing things with nearly zero guidance."
}
,
  
{
  "id": "46531240",
  "text": "After recently applying Codex to a gigantic old and hairy project that is as far from greenfield as it can be, I can assure you this assertion is false. It’s bonkers seeing 5.2 churn through the complexity and understanding dependencies that would take me days or weeks to wrap my head around."
}
,
  
{
  "id": "46526335",
  "text": "In my personal experience, Claude is better at greenfield, Codex is better at fitting in. Claude is the perfect tool for a \"vibe coder\", Codex is for the serious engineer who wants to get great and real work done.\n\nCodex will regularly give me 1000+ line diffs where all my comments (I review every single line of what agents write) are basically nitpicks. \"Make this shallow w/ early return, use | None instead of Optional\", that sort of thing.\n\nI do prompt it in detail though. It feels like I'm the person coming in with the architecture most of the time, AI \"draws the rest of the owl.\""
}
,
  
{
  "id": "46521731",
  "text": "My favorite benchmark for LLMs and agents is to have it port a medium-complexity library to another programming language. If it can do that well, it's pretty capable of doing real tasks. So far, I always have to spend a lot of time fixing errors. There are also often deep issues that aren't obvious until you start using it."
}
,
  
{
  "id": "46521803",
  "text": "Comments on here often criticise ports as easy for LLMs to do because there's a lot of training data and the tests are all there, which is not as complex as real-world tasks"
}
,
  
{
  "id": "46524687",
  "text": "Exactly. The main issue IMO is that \"software that seems to work\" and \"software that works\" can be very hard to tell apart without validating the code, yet these are drastically different in terms of long-term outcomes. Especially when there's a lot of money, or even lives, riding on these outcomes. Just because LLMs can write software to run the Therac-25 doesn't mean it's acceptable for them to do so.\n\nYour hobby project, though, knock yourself out."
}
,
  
{
  "id": "46526129",
  "text": "Another thing these posts assume is a single developer keeps working on the product with a number of AI agents, not a large team. I think we need to rethink how teams work with AI. It's probably not gonna be a single developer typing a prompt but a team somehow collaborating on a prompt or equivalent. XP on steroids? Programming by committee?"
}
,
  
{
  "id": "46521273",
  "text": "I find Opus 4.5 very, very strong at matching the prevailing conventions/idioms/abstractions in a large, established codebase. But I guess I'm quite sensitive to this kind of thing so I explicitly ask Opus 4.5 to read adjacent code which is perhaps why it does it so well. All it takes is a sentence or two, though."
}
,
  
{
  "id": "46521873",
  "text": "I don’t know what I’m doing wrong. Today I tried to get it to upgrade Nx, yarn and some resolutions in a typescript monorepo with about 20 apps at work (Opus 4.5 through Kiro) and it just…couldn’t do it. It hit some snags with some of the configuration changes required by the upgrade and resorted to trying to make unwanted changes to get it to build correctly. I would have thought that’s something it could hit out of the park. I finally gave up and just looked at the docs and some stack overflow and fixed it myself. I had to correct it a few times about correct config params too. It kept imagining config options that weren’t valid."
}
,
  
{
  "id": "46522833",
  "text": "> ask Opus 4.5 to read adjacent code which is perhaps why it does it so well. All it takes is a sentence or two, though.\n\nPeople keep telling me that an LLM is not intelligence, it's simply spitting out statistically relevant tokens. But surely it takes intelligence to understand (and actually execute!) the request to \"read adjacent code\"."
}
,
  
{
  "id": "46522959",
  "text": "I used to agree with this stance, but lately I'm more in the \"LLMs are just fancy autocomplete\" camp. They can just autocomplete increasingly more things, and when they can't, they fail in ways that an intelligent being just wouldn't, rather than just outputting a wrong or useless autocompletion."
}
,
  
{
  "id": "46523026",
  "text": "They're not an equivalent intelligence to humans and thus have noticeably different failure modes. But humans fail in ways that they don't (eg. being unable to match LLMs' breadth and depth of knowledge).\n\nBut the question I'm really asking is... isn't it more than a sheer statistical \"trick\" if an LLM can actually be instructed to \"read surrounding code\", understand the request, and demonstrably include it in its operation? You can't do that unless you actually understand what \"surrounding code\" is, and more importantly have a way to comply with the request..."
}

]
</comments_to_classify>

Based on the comments above, assign each to up to 3 relevant topics.

Return ONLY a JSON array with this exact structure (no other text):
[
  
{
  "id": "comment_id_1",
  "topics": [
    1,
    3,
    5
  ]
}
,
  
{
  "id": "comment_id_2",
  "topics": [
    2
  ]
}
,
  
{
  "id": "comment_id_3",
  "topics": [
    0
  ]
}
,
  ...
]

Rules:
- Each comment can have 0 to 3 topics
- Use 1-based topic indices for matches
- Use index 0 if the comment does not fit well in any category
- Only assign topics that are genuinely relevant to the comment

Remember: Output ONLY the JSON array, no other text.
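To make the output contract above concrete, here is a minimal sketch of how a response could be checked before it is accepted. Python is an assumption, as is the `validate` helper name; the rule that index 0 stands alone (never mixed with real topic indices) is my reading of the rules rather than something stated verbatim.

```python
import json

NUM_TOPICS = 20  # topics are numbered 1..20; 0 means "does not fit any category"

def validate(response_text, expected_ids):
    """Hypothetical checker for a model response against the rules above."""
    results = json.loads(response_text)  # the response must be ONLY a JSON array
    assert isinstance(results, list), "response must be a JSON array"
    seen = set()
    for item in results:
        topics = item["topics"]
        # Rule: each comment can have 0 to 3 topics
        assert len(topics) <= 3, f"{item['id']}: more than 3 topics"
        # Rule: 1-based indices for matches, 0 for "no good fit"
        assert all(isinstance(t, int) and 0 <= t <= NUM_TOPICS for t in topics), \
            f"{item['id']}: topic index out of range"
        # Reading of the rules: 0 stands alone, never mixed with real topics
        if 0 in topics:
            assert topics == [0], f"{item['id']}: 0 mixed with other topics"
        seen.add(item["id"])
    # Every input comment should be classified exactly once
    assert seen == set(expected_ids) and len(results) == len(expected_ids), \
        "every input comment must appear exactly once"
    return results
```

For example, `validate('[{"id": "46521986", "topics": [1, 10]}]', ["46521986"])` would return the parsed array, while a response with four topics or a stray index 21 would raise an assertion.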

commentCount

50

← Back to job