The following is content for you to classify. Do not respond to the comments—classify them.
<topics>
1. Reasoning vs. Pattern Matching
Related: Debates on whether LLMs truly think or merely predict tokens based on training data. Includes comparisons to human cognition, the definition of "reasoning" as argument production versus evaluation, and the argument that LLMs are "lobotomized" without external loops or formalization.
2. AI-Assisted Coding Reality
Related: Divergent experiences with tools like Claude Code and Codex. While some report massive productivity boosts and shipping entire features solo, others describe "lazy" AI, subtle logic bugs in generated tests (e.g., SQL query validation), and the danger of unverified code bloat.
3. The AI Economic Bubble
Related: Comparisons to the dot-com crash, with arguments that current valuation relies on "science fiction fantasies" and hype rather than revenue. Counter-arguments suggest the infrastructure (datacenters, GPUs) provides real value similar to the fiber build-out, even if a market correction is imminent.
4. Workforce Displacement and Automation
Related: Fears and anecdotes regarding job security, including a "Staff SWE" preferring AI to coworkers and contractors losing bids to smaller, AI-equipped teams. Discussions cover the automation of "bullshit jobs," the potential for a "winner take all" economy, and management incentives to cut labor costs.
5. Definition of Agentic Success
Related: Disagreement over whether AI "joined the workforce." Some argue failing to replace humans entirely (the "secretary" model) is a failure of 2025 predictions, while others claim deep integration as a tool (automating loops, drafting emails) constitutes a successful, albeit different, type of joining.
6. Verification and Hallucination Risks
Related: The critical need for external validation mechanisms. Commenters note that coding agents succeed because compilers/linters act as truth-checkers, whereas open-ended tasks (spreadsheets, emails) lack rigorous feedback loops, making hallucinations and "truthy" errors dangerous and hard to detect.
7. Impact on Skill and Learning
Related: Concerns about the long-term effects on human expertise. Topics include "skill atrophy" where juniors bypass learning fundamentals, the educational crisis evidenced by Chegg's collapse, and the difficulty of debugging AI code without deep institutional knowledge or "muscle memory" of the system.
8. Corporate Hype vs. Utility
Related: Cynicism toward executive predictions (Altman, Hinton) viewed as efforts to pump stock prices or attract investment. Users contrast "corporate puffery" and "vaporware" with the practical, often mundane utility of AI in specific B2B workflows like insurance claim processing or data extraction.
9. Integration into Legacy Systems
Related: The challenge of applying AI to real-world, messy environments versus greenfield demos. Discussion includes the difficulty of getting agents to work with proprietary codebases, expensive dependencies, lack of documentation for obscure vendor tools, and the failure of browser agents on standard web forms.
10. Formalization of Natural Language
Related: Theoretical discussions on overcoming LLM limitations by mapping natural language to formal logic or proof systems (like Lean). Skeptics argue human language is too "mushy" or context-dependent for this to be a silver bullet for AGI or perfect reasoning.
11. Medical and Specialized Fields
Related: Debates on AI in radiology and medicine. While some see potential in automated reporting and "second opinions" to catch errors, professionals argue that current models struggle with complex cases, over-report issues, and lack the nuance required for high-stakes diagnostics.
12. The Secretary vs. Replacement Model
Related: The shift in expectations from AI as an autonomous employee to AI as a productivity-enhancing assistant. Users describe workflows where humans act as orchestrators or managers of AI output rather than performing the rote work, effectively reviving the role of the personal secretary.
13. Software Engineering Evolution
Related: Predictions that the discipline is shifting from "writing code" to "managing entropy" and system design. Some view this as empowering "cowboy devs" to move fast, while others fear a future of unmaintainable "vibe coded" software that no human fully understands.
14. Productivity Metrics and Paradoxes
Related: Skepticism regarding "2x productivity" claims. Commenters argue that generating more code doesn't equal value, noting that debugging, communicating, and context-gathering are the real bottlenecks, and that AI might simply be increasing the volume of low-quality output or "slop."
0. Does not fit well in any category
</topics>
<comments_to_classify>
[
{
"id": "46511206",
"text": "The answer is reasoning. It is obvious now that whatever quality LLM have, they don't think and reason, they are just statistical machine outputing whatever they training set as most probable. They are useful, and they can mimic thinking to a certain level, mainly because they have been trained on a inhumane amount of data that no person could learn in one life. But they do not think, and the current algorithms are clearly a dead end for thinking machines."
}
,
{
"id": "46511403",
"text": "> the current algorithms are clearly a dead end for thinking machines.\n\nThese discussions often get derailed into debates about what \"thinking\" means. If we define thinking as the capacity to produce and evaluate arguments, as the cognitive scientists Sperber and Mercier do, then we can see LLMs are certainly producing arguments, but they're weak at the evaluation.\n\nIn some cases, arguments can be formalised, and then evaluating them is a solved problem, as in the examples of using the Lean proofchecker in combination with LLMs to write mathematical proofs.\n\nThat suggests a way forward will come from formalising natural language arguments. So LLMs by themselves might be a dead end but in combination with formalisation they could be very powerful. That might not be \"thinking\" in the sense of the full suite of human abilities that we group with that word but it seems an important component of it."
}
,
{
"id": "46513763",
"text": "Yesterday I got AI (a sota model) to write some tests for a backend I'm working on. One set of tests was for a function that does a somewhat complex SQL query that should return multiple rows\n\nIn the test setup, the AI added a single database row, ran the query and then asserted the single added row was returned. Clearly this doesn't show that the query works as intended. Is this what people are referring to when they say AI writes their tests?\n\nI don't know what to call this kind of thinking. Any intelligent, reasoning human would immediately see that it's not even close to enough. You barely even need a coding background to see the issues. AI just doesn't have it, and it hasn't improved in this area for years\n\nThis kind of thing happens over and over again. I look at the stuff it outputs and it's clear to me that no reasoning thing would act this way"
}
,
{
"id": "46514269",
"text": "As a counter I’ve had OpenAI Codex and Claude Code both catch logic cases I’d missed in both tests and codes.\n\nThe tooling in the Code tools is key to useable LLM coding. Those tools prompt the models to “reason” whether they’ve caught edge cases or met the logic. Without that external support they’re just fancy autocompletes.\n\nIn some ways it’s no different than working with some interns. You have to prompt them to “did you consider if your code matched all of the requirements?”.\n\nLLMs are different in that they’re sorta lobotomized. They won’t learn from tutoring “did you consider” which needs to essentially be encoded manually still."
}
,
{
"id": "46515134",
"text": "> In some ways it’s no different than working with some interns. You have to prompt them to “did you consider if your code matched all of the requirements?”.\n\nI really hate this description, but I can't quite filly articulate why yet. It's distinctly different because interns can form new observations independently. AIs can not. They can make another guess at the next token, but if it could have predicted it the 2nd time, it must have been able to predict it the first, so it's not a new observation. The way I think through a novel problem results in drastically different paths and outputs from an LLM. They guess and check repeatedly, they don't converge on an answer. Which you've already identified\n\n> LLMs are different in that they’re sorta lobotomized. They won’t learn from tutoring “did you consider” which needs to essentially be encoded manually still.\n\nThis isn't how you work with an intern (unless the intern is unable to learn)."
}
,
{
"id": "46514571",
"text": "> As a counter I’ve had OpenAI Codex and Claude Code both catch logic cases I’d missed in both tests and codes\n\nThat has other explanations than that it reasoned its way to the correct answers. Maybe it had very similar code in its training data\n\nThis specific example was with Codex. I didn't mention it because I didn't want it to sound like I think codex is worse than claude code\n\nI do realize my prompt wasn't optimal to get the best out of AI here, and I improved it on the second pass, mainly to give it more explicit instruction on what to do\n\nMy point though is that I feel these situations are heavily indicative of it not having true reasoning and understanding of the goals presented to it\n\nWhy can it sometimes catch the logic cases you miss, such as in your case, and then utterly fail at something that a simple understanding of the problem and thinking it through would solve? The only explanation I have is that it's not using actual reasoning to solve the problems"
}
,
{
"id": "46515380",
"text": "Sounds like the AI was not dumb but lazy. I do it similarly when I don't feel like doing it."
}
,
{
"id": "46514990",
"text": "> Is this what people are referring to when they say AI writes their tests?\n\nyes\n\n> Any intelligent, reasoning human would immediately see that it's not even close to enough. You barely even need a coding background to see the issues.\n\n[nods]\n\n> This kind of thing happens over and over again. I look at the stuff it outputs and it's clear to me that no reasoning thing would act this way\n\nand yet there're so many people who are convinced it's fantastic. Oh, I made myself sad.\n\nThe larger observation about it being statistical inference, rather than reason... but looks to so many to be reason is quite an interesting test case for the \"fuzzing\" of humans. In line with why do so many engineers store passwords in clear text? Why do so many people believe AI can reason?"
}
,
{
"id": "46511496",
"text": "> suggests a way forward will come from formalising natural language arguments\n\nIf by this you mean \"reliably convert expressions made in human natural language to unambiguous, formally parseable expressions that a machine can evaluate the same way every time\"... isn't that essentially an unreachable holy grail? I mean, everyone from Plato to Russell and Wittgenstein struggled with the meaning of human statements. And the best solution we have today is to ask the human to restrict the set of statement primitives and combinations that they can use to a small subset of words like \"const\", \"let foo = bar\", and so on."
}
,
{
"id": "46512678",
"text": "Whether the Holy Grail is unreachable or not is the question. Of course, the problem in full generality is hard, but that doesn't mean it can't be approached in various partial ways, either by restricting the inputs as you suggest or by coming up with some kind of evaluation procedures that are less strict than formal verifiability. I don't have any detailed proposals tbh"
}
,
{
"id": "46511572",
"text": "> That suggests a way forward will come from formalising natural language arguments.\n\nHot take (and continue with the derailment), but I'd argue that analytic philosophy from the last 100 years suggests this is a dead end. The idea that belief systems could be formalized was huge in the early 20th century (movements like Logical Positivism, or Russell's principia mathematica being good examples of this).\n\nThose approaches haven't really yielded many results, and by far the more fruitful form of analysis has been to conceptually \"reframe\" different problems (folks like Hillary Putnam, Wittgenstein, Quine being good examples).\n\nWe've stacked up a lot of evidence that human language is much too loose and mushy to be formalised in a meaningful way."
}
,
{
"id": "46511937",
"text": "We've stacked up a lot of evidence that human language is much too loose and mushy to be formalized in a meaningful way.\n\nLossy might also be a way of putting it, like a bad compression algorithm. Written language carries far less information than spoken and nonverbal cues."
}
,
{
"id": "46512791",
"text": "True, maybe full formalisation is too strict and the evaluation should be fuzzier"
}
,
{
"id": "46511706",
"text": "I think you may mean Sperber and Mercier define \"reasoning\" as the capacity to produce and evaluate arguments?"
}
,
{
"id": "46512701",
"text": "True, they use the word \"reasoning\". Part of my point was just to focus on the more concrete concept: the capacity to produce and evaluate arguments."
}
,
{
"id": "46513008",
"text": "> If we define thinking as the capacity to produce and evaluate arguments\n\nThat bar is so low that even a political pundit on TV can clear it."
}
,
{
"id": "46513818",
"text": "I know a lot of people with access to Claude Code and the like will say that 'No, it sure seems to reason to me!'\n\nGreat. But most (?) of the business out there aren't paying for the big boy models.\n\nI know of a F100 that got snookered into a deal with GPT 4 for 5 years, max of 40 responses per session, max of 10 sessions of memory, no backend integration.\n\nThose folks rightly think that AI is a bad idea."
}
,
{
"id": "46516192",
"text": "> It is obvious now that whatever quality LLM have, they don't think and reason, they are just statistical machine outputing whatever they training set as most probable\n\nI have kids, and you could say the same about toddlers. Terrific mimics, they don't understand the whys."
}
,
{
"id": "46516856",
"text": "IMHO when toddlers say mama they really understand that to a much much bigger degree than any LLM. They might not be able to articulate it but the deep understanding is there.\n\nSo I think younger kids have purpose and associate meaning to a lot of things and they do try to get to a specific path toward an outcome.\n\nOf course (depending on the age) their \"reasoning\" is in a different system than hours where the survival instincts are much more powerful than any custom defined outcome so most of the time that is the driving force of the meaning.\n\nWhy I talk about meaning? Because, of course, the kids cannot talk about the why, as that is very abstract. But meaning is a big part of the Why and it continues to be so in adult life it is just that the relation is reversed: we start talking about the why to get to a meaning.\n\nI also think that kids starts to have more complex thoughts than the language very early. If you got through the \"Why?\" phase you might have noticed that when they ask \"Why?\" they could mean very different questions. But they don't know the words to describe it. Sometimes \"Why?\" means \"Where?\" sometimes means \"How?\" sometimes means \"How long?\" .... That series of questioning is, for me, a kind of proof that a lot of things are happening in kids brain much more than they can verbalise."
}
,
{
"id": "46512598",
"text": "What constitutes real \"thinking\" or \"reasoning\" is beside the point. What matters is what results we getting.\n\nAnd the challenge is rethinking how we do work, connecting all the data sources for agents to run and perform work over the various sources that we perform work. That will take ages. Not to mention having the controls in place to make that the \"thinking\" was correct in the end."
}
,
{
"id": "46512692",
"text": "> connecting all the data sources for agents to run\n\nCopilot can't jump to definition in Visual Studio.\n\nAnthropic got a lot of mileage out of teaching Claude to grep, but LLM agents are a complete dead-end for my code-base until they can use the semantic search tools that actually work on our code-base and hook into the docs for our expensive proprietary dependencies."
}
,
{
"id": "46513112",
"text": "Thinking is not besides the point, it is the entire point.\n\nYou seem to be defining \"thinking\" as an interchangeable black box, and as long as something fits that slot and \"gets results\", it's fine.\n\nBut it's the code-writing that's the interchangeable black box, not the thinking. The actual work of software development is not writing code, it's solving problems.\n\nWith a problem-space-navigation model, I'd agree that there are different strategies that can find a path from A to B, and what we call cognition is one way (more like a collection of techniques) to find a path. I mean, you can in principle brute-force this until you get the desired result.\n\nBut that's not the only thing that thinking does. Thinking responds to changing constraints, unexpected effects, new information, and shifting requirements. Thinking observes its own outputs and its own actions. Thinking uses underlying models to reason from first principles. These strategies are domain-independent, too.\n\nAnd that's not even addressing all the other work involved in reality: deciding what the product should do when the design is underspecified. Asking the client/manager/etc what they want it to do in cases X, Y and Z. Offering suggestions and proposals and explaining tradeoffs.\n\nNow I imagine there could be some other processes we haven't conceived of that can do these things but do them differently than human brains do. But if there were we'd probably just still call it 'thinking.'"
}
,
{
"id": "46514424",
"text": "Do you think reasoning models don't count? there is a lot of work around those and things like RAGs."
}
,
{
"id": "46511294",
"text": "Reasoning keeps improving, but they still have a ways to go\n\nhttps://arcprize.org/leaderboard"
}
,
{
"id": "46512506",
"text": "What we need is reasoning as in \"drawing logical conclusions based on logic\". LLMs do reasoning by recursively adding more words to the context window. That's not logical reasoning."
}
,
{
"id": "46512665",
"text": "It's debatable that humans do \"drawing logical conclusions based on logic\". Look at politics and what people vote for. They seem to do something more like pattern matching."
}
,
{
"id": "46513100",
"text": "Humans are far from logical. We make decisions within the context of our existence. This includes emotions, friends, family, goals, dreams, fears, feelings, mood, etc.\n\nit’s one of the challenges when LLMs are being anthropomorphised, reasoning/logic for bots is not the same as that for humans."
}
,
{
"id": "46513347",
"text": "And yet, when we make bad calls or do illogical things, because of hormones, emotions, energy levels, etc we still calling it reasoning.\n\nBut, to LLMs we don't afford the same leniency. If they flip some bits and the logic doesn't add up we're quick to point that \"it's not reasoning at all\".\n\nFunny throne we've built for ourselves."
}
,
{
"id": "46515108",
"text": "Yes, because different things are different."
}
,
{
"id": "46513038",
"text": "Maybe we say that when we don't like those conclusions?\n\nAfter all I can guarantee the other side (whatever it is) will say the same thing for your \"logical\" conclusions.\n\nIt is logic, we just don't share the same predicates or world model..."
}
,
{
"id": "46513137",
"text": "Just because all humans don't use reason all the time doesn't mean reasoning isn't a good and desirable strategy."
}
,
{
"id": "46512690",
"text": "I don't know why you were downvoted. It is a bit more complicated, but that's the gist of it. LLMs don't actually reason."
}
,
{
"id": "46516492",
"text": "Whether LLM is reasoning or not is an independent question to whether it works by generating text.\n\nBy the standard in the parent post, humans certainly do not \"reason\". But that is then just choosing a very high bar for \"reasoning\" that neither humans nor AI meets...what is the point then?\n\nIt is a bit like saying: \"Humans don't reason, they just let neurons fire off one another, and think the next thought that enters their mind\"\n\nYes, LLMs need to spew out text to move their state forward. As a human I actually sometimes need to do that too: Talk to myself in my head to make progress. And when things get just a tiny bit complicated I need to offload my brain using pen and paper.\n\nMost arguments used to show that LLMs do not \"reason\" can be used to show that humans do not reason either.\n\nTo show that LLMs do not reason you have to point to something else than how it works."
}
,
{
"id": "46515150",
"text": "LLMs"
}
,
{
"id": "46511433",
"text": "they can think just not in the same abstract platonic way that a human mind can"
}
,
{
"id": "46516556",
"text": "Your mind must work differently than mine. I have programmed for 20 years, I have a PhD in astrophysics..\n\nAnd my \"reasoning\" is pretty much like a long ChatGPT verbal and sometimes not-so-verbal (visual) conversation with myself.\n\nIf my mind really did abstract platonic thinking I think answers to hard problems would just instantly appear to me, without flaws. But only problems I hve solved before and can pattern match do that.\n\nAnd if I have to think any new thoughts I feel that process is rather similar to how LLMs work.\n\nIt is the same for history of science really -- only thoughts that build small steps on previous thoughts and participate in a conversation actually are thought by humans.\n\nTotally new leaps, which a \"platonic thinking machines\" should easily do, do not seem to happen..\n\nHumans are, IMO, conversation machines too..."
}
,
{
"id": "46511786",
"text": "I rather approach it from a Cartesian perspective. A context window is just that, it's not \"existence\". And because they do not exist in the world the same way as a human does, they do not think in the same way a human does (reversal of \"I think therefore I am\")"
}
,
{
"id": "46511924",
"text": "I have a context matrix, therefore I transform?"
}
,
{
"id": "46511348",
"text": "> But they do not think\n\nI see this argument made a lot. I'm not sure if the distinction really holds weight once we start to unravel though.\n\nWhat's a topic you're able to think about that an LLM is not able to think about?"
}
,
{
"id": "46512069",
"text": "I asked GPT for rules on 101-level French grammar. That should be well documented for someone learning from English, no? The answers were so consistently wrong that it seemed intentional. Absolutely nothing novel asked of it. It could have quoted verbatim if it wanted to be lazy. I can't think of an easier question to give an LLM. If it's possible to \"prompt wrong\" a simple task that my six-year old nephew could easily do, the burden of proof is not on the people denying LLM intelligence, it's on the boosters."
}
,
{
"id": "46514697",
"text": "> the burden of proof is not on the people denying LLM intelligence, it's on the boosters\n\nIt's an impossible burden to prove. We can't even prove that any other human has sentience or is reasoning, we just evaluate the outcomes.\n\nOne day the argument you're putting forward will be irrelevant, or good for theoretical discussion only. In practice I'm certain that machines will achieve human level output at some point."
}
,
{
"id": "46516378",
"text": "> machines will achieve human level output at some point\n\nWould you care to put some sort of time scale to \"at some point?\" Are we talking about months, years, decades, centuries?"
}
,
{
"id": "46512030",
"text": "I can solve a mystery novel based on the evidence alone. Assuming an LLM doesn’t already have the answer it will offer solutions based on meta-information like how similar mysteries conclude or are structured. While this can be effective, it’s not really solving the mystery and will fail with anything truly novel."
}
,
{
"id": "46511622",
"text": "Any topic with little coverage in the training data. LLMs will keep circling around the small bits in the training data, unable synthesize new connections.\n\nThis is very obvious when trying to use LLMs to modify scripts in vendor-specific languages that have not been widely documented and don't have many examples available. A seasoned programmer will easily recognize common patterns like if-else blocks and loops, but LLMs will get stuck and output gibberish."
}
,
{
"id": "46513325",
"text": "Edit: HN filters emojis in comments. TLDR: the bots reply with various sea-related emojis and can't explain that there's no seahorse emoji.\n\non Meta (whatsapp) chatbot:\n\nme: what is the seahorse emoji?\n\nbot:\n\nme: this is not a seahorse\n\nbot: You got me! The actual seahorse emoji is . Want me to use it in a fun sentence?\n\non Bing/Copilot:\n\nme: show me the seahorse emoji\n\nbot:\n\n------------\n\nHere you go: — actually, that’s a fun combo, but the real seahorse emoji is … wait, not quite! The correct one is for jellyfish and for unicorn, but the seahorse emoji itself is ? Let me clarify:\n\nThe official seahorse emoji is: ? No — correction! It’s for unicorn. The actual seahorse emoji is: ?\n\nLet’s be precise: the seahorse emoji is ?\n\nOops — I misspoke earlier. The true seahorse emoji is: ?\n\nWait — let’s settle this cleanly:\nThe seahorse emoji is: ?\n\nActually, the correct one is: ?\n\n---------------\n\nIn some instances, Copilot answers that \"there's no seahorse emoji _and that's why I keep running in circles_\". This is just pasting answers, as this is a popular problem discussed on the internet. If the bot were reasoning, it would just answer \"there's no seahorse emoji, please stop asking\"."
}
,
{
"id": "46514480",
"text": "Copilot is the absolute worst. Yesterday I had tried to have it create a printable calendar for January 2026 but no matter how I instructed it, it kept showing that the first was on a Wednesday, not Thursday. I even fed it back its own incorrect PDF in a new conversation, which clearly showed the 1st on a Wednesday and asked it to tell me what day the calendar showed the first on. It said the calendar showed the 1st as a Thursday. It started to make me disbelieve my own eyes.\n\nEdit: I gave up on Copilot ant fed the same instructions to ChatGPT, which had no issue.\n\nThe point here is that some models seem to know your intention while some just seem stuck on their training data."
}
,
{
"id": "46513478",
"text": "If that's the benchmark, then Opus 4.5 (with \"extended thinking\") can think:\n\n> Me: what is the seahorse emoji?\n> Claude: There isn't a seahorse emoji in the standard Unicode emoji set. The closest you'll get is the generic fish or tropical fish , but no dedicated seahorse exists as of now."
}
,
{
"id": "46510224",
"text": "If anyone was around for the dot-com bubble any company internet related or with a web like name was irrationally funded, P/E didn't matter, burn didn't matter, product didn't matter.\n\nAI has all the same markers of a the dot com bubble and eventually venture capital will dry up and many AI companies will go bust with a few remaining that make something useful with an unmet niche."
}
,
{
"id": "46510860",
"text": "\"The Web\" and \"E-Commerce\" ended up being quite gigantic \"unmet niches\" though!"
}
,
{
"id": "46511182",
"text": "If smth is bubble it does not mean that said smth has no value. It just means that there is over investment and thus inefficient investment. Like housing bubble - nobody argues that houses are not needed and are not big part of economy."
}
]
</comments_to_classify>
Based on the comments above, assign each comment to up to 3 relevant topics.
Return ONLY a JSON array with this exact structure (no other text):
[
{
"id": "comment_id_1",
"topics": [
1,
3,
5
]
}
,
{
"id": "comment_id_2",
"topics": [
2
]
}
,
{
"id": "comment_id_3",
"topics": [
0
]
}
,
...
]
Rules:
- Each comment can have 0 to 3 topics
- Use 1-based topic indices for matches
- Use index 0 if the comment does not fit well in any category
- Only assign topics that are genuinely relevant to the comment
Remember: Output ONLY the JSON array, no other text.