Summarizer

LLM Input

llm/0c6097e3-bc76-4fbe-ab4f-ceafa2484e5f/batch-7-5fc8df9a-2e61-402b-ac8e-5ad2372a7039-input.json

prompt

The following is content for you to classify. Do not respond to the comments—classify them.

<topics>
1. AI Performance on Greenfield vs. Legacy
   Related: Users debate whether agents excel primarily at starting new projects from scratch while struggling to maintain large, complex, or legacy codebases without breaking existing conventions.
2. Context Window Limitations and Management
   Related: Discussions focus on token limits (200k), performance degradation as context fills, and strategies like compacting history, using sub-agents, or maintaining summary files to preserve long-term memory.
3. Vibe Coding and Code Quality
   Related: The polarization around building apps without reading the code; critics warn of unmaintainable "slop" and technical debt, while proponents value the speed and ability to bypass syntax.
4. Claude Code and Tooling
   Related: Specific praise and critique for the Claude Code CLI, its integration with VS Code and Cursor, the use of slash commands, and comparisons to GitHub Copilot's agent mode.
5. Economic Impact on Software Jobs
   Related: Existential anxiety regarding the obsolescence of mid-level engineers, the potential "hollowing out" of the middle class, and the shift toward one-person unicorn teams.
6. Prompt Engineering and Configuration
   Related: Strategies involving `CLAUDE.md`, `AGENTS.md`, and custom system prompts to teach the AI coding conventions, architecture, and specific skills for better output.
7. Specific Language Capabilities
   Related: Anecdotal evidence regarding proficiency in React, Python, and Go versus struggles in C++, Rust, and mobile development (Swift/Kotlin), often tied to training data availability.
8. Engineering vs. Coding
   Related: A recurring distinction between "coding" (boilerplate, standard patterns) which AI conquers, and "engineering" (novel logic, complex systems, 3D graphics) where AI supposedly still fails.
9. Security and Trust
   Related: Concerns about deploying unaudited AI code, the introduction of vulnerabilities, the risks of giving agents shell access, and the difficulty of verifying AI output.
10. The Skill Issue Argument
   Related: Proponents dismiss failures as "skill issues," suggesting frustration stems from poor prompting or adaptability, while skeptics argue the tools are genuinely inconsistent.
11. Cost of AI Development
   Related: Analysis of the financial viability of AI coding, including hitting API rate limits, the high cost of Opus 4.5 tokens, and the potential unsustainability of VC-subsidized pricing.
12. Future of Software Products
   Related: Predictions that software creation costs will drop to zero, leading to a flood of bespoke personal apps replacing commercial SaaS, but potentially creating a maintenance nightmare.
13. Human-in-the-Loop Workflows
   Related: The consensus that AI requires constant human oversight, "tools in a loop," and code review to prevent hallucination loops and ensure functional software.
14. Opus 4.5 vs. Previous Models
   Related: Users describe the specific model as a "step change" or "inflection point" compared to Sonnet 3.5 or GPT-4, citing better reasoning and autonomous behavior.
15. Documentation and Specification
   Related: The shift from writing code to writing specs; users find that detailed markdown documentation or "plan mode" yields significantly better AI results than vague prompts.
16. AI Hallucinations and Errors
   Related: Reports of AI inventing non-existent CLI tools, getting stuck in logical loops, failing at visual UI tasks, and making simple indexing errors.
17. Shift in Developer Role
   Related: The idea that developers are evolving into "product managers" or "architects" who direct agents, requiring less syntax proficiency and more systems thinking.
18. Testing and Verification
   Related: The reliance on test-driven development (TDD), linters, and compilers to constrain non-deterministic AI output, ensuring generated code actually runs and meets requirements.
19. Local Models vs. Cloud APIs
   Related: Discussions on the viability of local models for privacy and cost savings versus the necessity of massive cloud models like Opus for complex reasoning tasks.
20. Societal Implications
   Related: Broader philosophical concerns about wealth concentration, the "class war" of automation, environmental impact, and the future of work in a post-code world.
0. Does not fit well in any category
</topics>

<comments_to_classify>
[
  
{
  "id": "46525707",
  "text": "Not necessarily responding to you directly, but I find this take to be interesting, and I see it every time an article like this makes the rounds.\n\nStarting back in 2022/2023:\n\n- (~2022) It can auto-complete one line, but it can't write a full function.\n\n- (~2023) Ok, it can write a full function, but it can't write a full feature.\n\n- (~2024) Ok, it can write a full feature, but it can't write a simple application.\n\n- (~2025) Ok, it can write a simple application, but it can't create a full application that is actually a valuable product.\n\n- (~2025+) Ok, it can write a full application that is actually a valuable product, but it can't create a long-lived complex codebase for a product that is extensible and scalable over the long term.\n\nIt's pretty clear to me where this is going. The only question is how long it takes to get there."
}
,
  
{
  "id": "46526032",
  "text": "> It's pretty clear to me where this is going. The only question is how long it takes to get there.\n\nI don't think it's a guarantee. All of the things it can do from that list are greenfield, they just have increasing complexity. The problem comes because even in agentic mode, these models do not (and I would argue, can not) understand code or how it works, they just see patterns and generate a plausible sounding explanation or solution. Agentic mode means they can try/fail/try/fail/try/fail until something works, but without understanding the code, especially of a large, complex, long-lived codebase, they can unwittingly break something without realising - just like an intern or newbie on the project, which is the most common analogy for LLMs, with good reason."
}
,
  
{
  "id": "46526845",
  "text": "While I do agree with you. To play the counterpoint advocate though.\n\nWhat if we get to the point where all software is basically created 'on the fly' as greenfield projects as needed? And you never need to have complex large long lived codebase?\n\nIt is probably incredibly wasteful, but ignoring that, could it work?"
}
,
  
{
  "id": "46531602",
  "text": "That sounds like an insane way to do anything that matters.\n\nSure, create a one-off app to post things to your Facebook page. But a one-off app for the OS it's running on? Freshly generating the code for your bank transaction rules? Generating an authorization service that gates access to your email?\n\nThe only reason it's quick to create green-field projects is because of all these complex, large, long-lived codebases that it's gluing together. There's ample training data out there for how to use the Firebase API, the Facebook API, OS calls, etc. Without those long-lived abstraction layers, you can't vibe out anything that matters."
}
,
  
{
  "id": "46533484",
  "text": "In Japan buildings (apartments) aren't built to last forever. They are built with a specific age in mind. They acknowledge the fact that houses are depreciating assets which have a value lim->0.\n\nThe only reason we don't do that with code (or didn't use to do it) was because rewriting from scratch NEVER worked[0]. And large scale refactors take massive amounts of time and resources, so much so that there are whole books written about how to do it.\n\nBut today trivial to simple applications can be rewritten from spec or scratch in an afternoon with an LLM. And even pretty complex parsers can be ported provided that the tests are robust enough[1]. It's just a matter of time before someone rewrites a small to medium size application from one language to another using the previous app as the \"spec\".\n\n[0] https://www.joelonsoftware.com/2000/04/06/things-you-should-...\n\n[1] https://simonwillison.net/2025/Dec/15/porting-justhtml/"
}
,
  
{
  "id": "46535512",
  "text": "Sure, and the buildings are built to a slowly-evolving code, using standard construction techniques, operating as a predictable building in a larger ecosystem.\n\nThe problem with \"all software\" being AI-generated is that, to use your analogy, the electrical standards, foundation, and building materials have all been recently vibe-coded into existence, and none of your construction workers are certified in any of it."
}
,
  
{
  "id": "46531207",
  "text": "I have the same questions in my head lately."
}
,
  
{
  "id": "46525799",
  "text": "Well, the first 90% is easy, the hard part is the second 90%.\n\nCase in point: Self driving cars.\n\nAlso, consider that we need to pirate the whole internet to be able to do this, so these models are not creative. They are just directed blenders."
}
,
  
{
  "id": "46526021",
  "text": "Even if Opus 4.5 is the limit it’s still a massively useful tool. I don’t believe it’s the limit though for the simple fact that a lot could be done by creating more specialized models for each subdomain i.e. they’ve focused mostly on web based development but could do the same for any other paradigm."
}
,
  
{
  "id": "46526668",
  "text": "That's a massive shift in the claim though... I don't think anyone is disputing that it's a useful tool; just the implication that because it's a useful tool and has seen rapid improvement that implies they're going to \"get all the way there,\" so to speak."
}
,
  
{
  "id": "46526054",
  "text": "Personally I'm not against LLMs or AI itself, but considering how these models are built and trained, I personally refuse to use tools built on others' work without or against their consent (esp. GPL/LGPL/AGPL, Non Commercial / No Derivatives CC licenses and Source Available licenses).\n\nOf course the tech will be useful and ethical if these problems are solved or decided to be solved the right way."
}
,
  
{
  "id": "46526115",
  "text": "We just need to tax the hell out of the AI companies (assuming they are ever profitable) since all their gains are built on plundering the collective wisdom of humanity."
}
,
  
{
  "id": "46526238",
  "text": "I don’t think waiting for profitability makes sense. They can be massively disruptive without much profit as long as they spend enough money."
}
,
  
{
  "id": "46527312",
  "text": "AI companies and corporations in general control your politicians so taxing isn't going to happen."
}
,
  
{
  "id": "46526120",
  "text": "They're not blenders.\n\nThis is clear from the fact that you can distill the logic ability from a 700b parameter model into a 14b model and maintain almost all of it.\n\nYou just lose knowledge, which can be provided externally, and which is the actual \"pirated\" part.\n\nThe logic is _learned_"
}
,
  
{
  "id": "46527348",
  "text": "It hasn't learned any LOGIC. It has 'learned' patterns from the input."
}
,
  
{
  "id": "46533502",
  "text": "What is logic other than applying patterns?"
}
,
  
{
  "id": "46534328",
  "text": "The definition is broad; for now this will do: Logic is the study of correct reasoning."
}
,
  
{
  "id": "46526162",
  "text": "Are there any recent publications about it so I can refresh myself on the matter?"
}
,
  
{
  "id": "46527401",
  "text": "You won't find any trustworthy papers on the topic because GP is simply wrong here.\n\nThat models can be distilled has no bearing whatsoever on whether a model has learned actual knowledge or understanding (\"logic\"). Models have always learned sparse/approximately-sparse and/or redundant weights, but they are still all doing manifold-fitting.\n\nThe resulting embeddings from such fitting reflect semantics and semantic patterns. For LLMs trained on the internet, the semantic patterns learned are linguistic, which are not just strictly logical, but also reflect emotional, connotational, conventional, and frequent patterns, all of which can be illogical or just wrong. While linguistic semantic patterns are correlated with logical patterns in some cases, this is simply not true in general."
}
,
  
{
  "id": "46526664",
  "text": "I like to think of LLMs as random number generators with a filter"
}
,
  
{
  "id": "46525876",
  "text": "> Well, the first 90% is easy, the hard part is the second 90%.\n\nYou'd need to prove that this assertion applies here. I understand that you can't deduce the future gains rate from the past, but you also can't state this as universal truth."
}
,
  
{
  "id": "46526156",
  "text": "No, I don't need to. Self driving cars is the most recent and biggest example sans LLMs. The saying I have quoted (which has different forms) is valid for programming, construction and even cooking. So it's a simple, well understood baseline.\n\nKnowledge engineering has a notion called \"covered/invisible knowledge\" which points to the small things we do unknowingly but which change the whole outcome. None of the models (even AI in general) can capture this. We can say it's the essence of being human or the tribal knowledge which makes experienced workers who they are or makes mom's rice taste that good.\n\nConsidering these are highly individualized and unique behaviors, a model based on averaging everything can't capture this essence easily, if ever, without extensive fine-tuning for/with that particular person."
}
,
  
{
  "id": "46531347",
  "text": "\"covered/invisible knowledge\" aka tacit knowledge"
}
,
  
{
  "id": "46531441",
  "text": "Yeah, I failed to remember the term while writing the comment. Thanks!"
}
,
  
{
  "id": "46529890",
  "text": "Self driving cars is not a proof. It only proves that having quick gains doesn't necessarily mean you'll get to 100% fast. It doesn't prove it will necessarily happen."
}
,
  
{
  "id": "46526506",
  "text": ">> No, I don't need to. Self driving cars is the most recent and biggest example sans LLMs.\n\nSelf-driving cars don't use LLMs, so I don't know how any rational analysis can claim that the analogy is valid.\n\n>> The saying I have quoted (which has different forms) is valid for programming, construction and even cooking. So it's a simple, well understood baseline.\n\nSure, but the question is not \"how long does it take for LLMs to get to 100%\". The question is, how long does it take for them to become as good as, or better than, humans. And that threshold happens way before 100%."
}
,
  
{
  "id": "46526868",
  "text": ">> Self-driving cars don't use LLMs, so I don't know how any rational analysis can claim that the analogy is valid.\n\nDoesn't matter, because if we're talking about AI models, no (type of) model reaches 100% linearly, or 100% ever. For example, recognition models run with probabilities. Like Tesla's Autopilot (TM), which loves to hit rolled-over vehicles because it has not seen enough vehicle underbodies to classify them.\n\nSame for scientific classification models. They emit probabilities, not certain results.\n\n>> Sure, but the question is not \"how long does it take for LLMs to get to 100%\"\n\nI never claimed that a model needs to reach a proverbial 100%.\n\n>> The question is, how long does it take for them to become as good as, or better than, humans.\n\nThey can be better than humans for certain tasks. They have actually been better than humans at some tasks since the '70s, but we like to disregard them to romanticize current improvements. Still, I don't believe current or any generation of AIs can be better than humans in anything and everything, at once.\n\nRemember: No machine can construct something more complex than itself.\n\n>> And that threshold happens way before 100%.\n\nYes, and I consider that \"threshold\" as \"complete\", if they can ever reach it for certain tasks, not \"any\" task."
}
,
  
{
  "id": "46526260",
  "text": ">None of the models (even AI in general) can capture this\n\nNone of the current models maybe, but not AI in general? There’s nothing magical about brains. In fact, they’re pretty shit in many ways."
}
,
  
{
  "id": "46526315",
  "text": "A model trained on a very large corpus can't, because these behaviors are different or specialized enough they cancel each other most of the cases. You can forcefully fine-tune a model with a singular person's behavior up to a certain point, but I'm not sure that even that can capture the subtlest of behaviors or decision mechanisms which are generally the most important ones (the ones we call gut feeling or instinct).\n\nOTOH, while I won't call human brain perfect, the things we label \"shit\" generally turn out to be very clever and useful optimizations to workaround its own limitations, so I regard human brain higher than most AI proponents do. Also we shouldn't forget that we don't know much about how that thing works. We only guess and try to model it.\n\nLastly, searching perfection in numbers and charts or in engineering sense is misunderstanding nature and doing a great disservice to it, but this is a subject for another day."
}
,
  
{
  "id": "46526717",
  "text": "The understanding of the brain is far from complete whether they're \"magical\" or \"shit.\""
}
,
  
{
  "id": "46527530",
  "text": "Also obviously brains are both!"
}
,
  
{
  "id": "46525972",
  "text": "I read the comment more as \"based on past experience, it is usually the case that the first 90% is easier than the last 10%\", which is the right base case expectation, I think. That doesn't mean it will definitely play out that way, but you don't have to \"prove\" things like this. You can just say that they tend to be true, so it's a good expectation to think it will probably be true again."
}
,
  
{
  "id": "46526472",
  "text": "The saying is more or less treated as a truism at this point. OP isn't claiming something original and the onus of proving it isn't on them imo.\n\nI've heard this same thing repeated dozens of times, and for different domains/industries.\n\nIt's really just a variation of the 80/20 rule."
}
,
  
{
  "id": "46526224",
  "text": "Note that blog posts rarely show the 20 other times it failed to build something and only that time that it happened to work.\n\nWe've been having the same progression with self driving cars and they are also stuck on the last 10% for the last 5 years"
}
,
  
{
  "id": "46532757",
  "text": "I agree with your observation, but not your conclusion. The 20 times it failed basically don't matter -- they are branches that can just be thrown away, and all that was lost is a few dollars on tokens (ignoring the environmental impact, which is a different conversation).\n\nAs long as it can do the thing on a faster overall timeline and with less human attention than a human doing it fully manually, it's going to win. And it will only continue to get better.\n\nAnd I don't know why people always jump to self-driving cars as the analogy as a negative. We already have self-driving cars. Try a Waymo if you're in a city that has them. Yes, there are still long-tail problems being solved there, and limitations. But they basically work and they're amazing. I feel similarly about agentic development, plus in most cases the failure modes of SWE agents don't involve sudden life and death, so they can be more readily worked around."
}
,
  
{
  "id": "46533615",
  "text": "With \"art\" we're now at a situation where I can get 50 variations of an image prompt within seconds from an LLM.\n\nDoes it matter that 49 of them \"failed\"? It cost me fractions of a cent, so not really.\n\nIf every one of the 50 variants was drawn by a human and iterated over days, there would've been a major cost attached to every image and I most likely wouldn't have asked for 50 variations anyway.\n\nIt's the same with code. The agent can iterate over dozens of possible solutions in minutes or a few hours. Codex Web even has a 4x mode that gives you 4 alternate solutions to the same issue. Complete waste of time and money with humans, but with LLMs you can just do it."
}
,
  
{
  "id": "46527610",
  "text": "I haven't seen an AI successfully write a full feature into an existing codebase without substantial help; I don't think we are there yet.\n\n> The only question is how long it takes to get there.\n\nThis is the question, and I would temper expectations with the fact that we are likely to hit diminishing returns from real gains in intelligence as task difficulty increases. Real world tasks probably fit into a complexity hierarchy similar to computational complexity. One of the reasons that the AI predictions made in the 1950s for the 1960s did not come to be was because we assumed problem difficulty scaled linearly. Double the computing speed, get twice as good at chess or get twice as good at planning an economy. The P/NP separation upended these predictions. It is likely that current predictions will run into similar separations.\n\nIt is probably the case that if you made a human 10x as smart they would only be 1.25x more productive at software engineering. The reason we have 10x engineers is less about raw intelligence, they are not 10x more intelligent, rather they have more knowledge and wisdom."
}
,
  
{
  "id": "46526449",
  "text": "> - (~2023) Ok, it can write a full function, but it can't write a full feature.\n\nThe trend is definitely here, but even today, heavily depends on the feature.\n\nWhile extra useful, it requires intense iteration and human insight for > 90% of our backlog. We develop a cybersecurity product."
}
,
  
{
  "id": "46525956",
  "text": "Yeah maybe, but personally it feels more like a plateau to me than an exponential takeoff, at the moment.\n\nAnd this isn't a pessimistic take! I love this period of time where the models themselves are unbelievably useful, and people are also focusing on the user experience of using those amazing models to do useful things. It's an exciting time!\n\nBut I'm still pretty skeptical of \"these things are about to not require human operators in the loop at all!\"."
}
,
  
{
  "id": "46525987",
  "text": "I can agree that it doesn’t seem exponential yet but this is at least linear progression not a plateau."
}
,
  
{
  "id": "46527301",
  "text": "Linear progression feels slower (and thus more like a plateau) to me than the end of 2022 through end of 2024 period.\n\nThe question in my mind is where we are on the s-curve. Are we just now entering hyper-growth? Or are we starting to level out toward maturity?\n\nIt seems like it must still be hyper-growth, but it feels less that way to me than it did a year ago. I think in large part my sense is that there are two curves happening simultaneously, but at different rates. There is the growth in capabilities, and then there is the growth in adoption. I think it's the first curve that seems to me to have slowed a bit. Model improvements seem both amazing and also less revolutionary to me than they did a year or two ago.\n\nBut the other curve is adoption, and I think that one is way further from maturity. The providers are focusing more on the tooling now that the models are good enough. I'm seeing \"normies\" (that is, non-programmers) starting to realize the power of Claude Code in their own workflows. I think that's gonna be huge and is just getting started."
}
,
  
{
  "id": "46526299",
  "text": "Each of these years we’ve had a claim that it’s about to replace all engineers.\n\nBy your logic, does it mean that engineers will never get replaced?"
}
,
  
{
  "id": "46526598",
  "text": "Sure, eventually we'll have AGI, then no worries, but in the meantime you can only use the tools that exist today, and dreaming about what should be available in the future doesn't help.\n\nI suspect that the timeline from autocomplete-one-line to autocomplete-one-app, which was basically a matter of scaling and RL, may in retrospect turn out to have been a lot faster than the next LLM-to-AGI step, where it becomes capable of using human level judgement and reasoning, etc, to become a developer, not just a coding tool."
}
,
  
{
  "id": "46527595",
  "text": "Ok, it can create a long-lived complex codebase for a product that is extensible and scalable over the long term, but it doesn't have cool tattoos and can't fancy a matcha"
}
,
  
{
  "id": "46526432",
  "text": "This is disingenuous because LLMs were already writing full, simple applications in 2023.[0]\n\nThey're definitely better now, but it's not like ChatGPT 3.5 couldn't write a full simple todo list app in 2023. There were a billion blog posts talking about that and how it meant the death of the software industry.\n\nPlus I'd actually argue more of the improvements have come from tooling around the models rather than what's in the models themselves.\n\n[0] eg https://www.youtube.com/watch?v=GizsSo-EevA"
}
,
  
{
  "id": "46526450",
  "text": "What LLM were you using to build full applications in 2023? That certainly wasn’t my experience."
}
,
  
{
  "id": "46526487",
  "text": "Just from googling, here's a video \"Use ChatGPT to Code a Full Stack App\" from May 18, 2023.[0]\n\nThere's a lot of non-ergonomic copy and pasting but it's definitely using an LLM to build a full application.\n\n[0] https://www.youtube.com/watch?v=GizsSo-EevA"
}
,
  
{
  "id": "46526828",
  "text": "That's not at all what's being discussed in this article. We copy-pasted from SO before this. This article is talking about 99% fully autonomous coding with agents, not copy-pasting 400 times from a chat bot."
}
,
  
{
  "id": "46527110",
  "text": "Hi, please re-read the parent comment again, which was claiming\n\n> Starting back in 2022/2023:\n\n> - (~2022) It can auto-complete one line, but it can't write a full function.\n\n> - (~2023) Ok, it can write a full function, but it can't write a full feature.\n\nThis was a direct refutation, with evidence, that in 2023 people were not claiming that LLMs \"can't write a full feature\", because, as demonstrated, people were already building full applications with it at the time.\n\nThis obviously is not talking exclusively about agents, because agents did not exist in 2022."
}

]
</comments_to_classify>

Based on the comments above, assign each to up to 3 relevant topics.

Return ONLY a JSON array with this exact structure (no other text):
[
  
{
  "id": "comment_id_1",
  "topics": [
    1,
    3,
    5
  ]
}
,
  
{
  "id": "comment_id_2",
  "topics": [
    2
  ]
}
,
  
{
  "id": "comment_id_3",
  "topics": [
    0
  ]
}
,
  ...
]

Rules:
- Each comment can have 0 to 3 topics
- Use 1-based topic indices for matches
- Use index 0 if the comment does not fit well in any category
- Only assign topics that are genuinely relevant to the comment

Remember: Output ONLY the JSON array, no other text.

commentCount

50
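The rules in the prompt above imply a simple schema check on each batch response: a JSON array of objects, each with a known comment `id` and at most 3 topic indices in the range 0-20, where 0 ("does not fit") only makes sense on its own. A minimal stdlib-only sketch of such a check — the function name and error strings are illustrative, not part of the actual pipeline:

```python
import json


def validate_classifications(raw, valid_ids, topic_count=20):
    """Validate a classification response against the prompt's rules:
    JSON array of objects, known ids, <= 3 topics each, indices in
    0..topic_count, and index 0 ("no fit") used alone."""
    errors = []
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as e:
        return [f"invalid JSON: {e}"]
    if not isinstance(data, list):
        return ["response is not a JSON array"]
    for item in data:
        if not isinstance(item, dict):
            errors.append("array item is not an object")
            continue
        cid = item.get("id")
        topics = item.get("topics", [])
        if cid not in valid_ids:
            errors.append(f"unknown comment id: {cid!r}")
        if len(topics) > 3:
            errors.append(f"{cid}: more than 3 topics")
        if any(t not in range(0, topic_count + 1) for t in topics):
            errors.append(f"{cid}: topic index out of range")
        if 0 in topics and len(topics) > 1:
            errors.append(f"{cid}: index 0 must stand alone")
    return errors
```

A check like this also explains the `commentCount` field: comparing `len(data)` against the expected 50 would catch responses where the model silently dropped comments.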
