Summarizer

LLM Input

llm/e6f7e516-f0a0-4424-8f8f-157aae85c74e/topic-0-cebc059c-7064-4663-b89f-23ffd1832bb5-input.json

prompt

The following is content for you to summarize. Do not respond to the comments—summarize them.

<topic>
Reasoning vs. Pattern Matching # Debates on whether LLMs truly think or merely predict tokens based on training data. Includes comparisons to human cognition, the definition of "reasoning" as argument production versus evaluation, and the argument that LLMs are "lobotomized" without external loops or formalization.
</topic>

<comments_about_topic>
1. The answer is reasoning. It is obvious now that whatever qualities LLMs have, they don't think and reason; they are just statistical machines outputting whatever their training set makes most probable. They are useful, and they can mimic thinking to a certain level, mainly because they have been trained on an inhuman amount of data that no person could learn in one lifetime. But they do not think, and the current algorithms are clearly a dead end for thinking machines.

2. > the current algorithms are clearly a dead end for thinking machines.

These discussions often get derailed into debates about what "thinking" means. If we define thinking as the capacity to produce and evaluate arguments, as the cognitive scientists Sperber and Mercier do, then we can see LLMs are certainly producing arguments, but they're weak at the evaluation.

In some cases, arguments can be formalised, and then evaluating them is a solved problem, as in the examples of using the Lean proof checker in combination with LLMs to write mathematical proofs.
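To make that concrete: a statement written in Lean is machine-verifiable no matter how the proof was found. A toy example (not from any of the LLM+Lean projects; `Nat.add_comm` is a standard library lemma):

```lean
-- A trivial machine-checkable proof: once stated in Lean,
-- the kernel verifies it regardless of who (or what) wrote it.
theorem add_comm_example (a b : Nat) : a + b = b + a :=
  Nat.add_comm a b
```

An LLM can propose proof terms like this and the checker accepts or rejects them; evaluation is outsourced to the kernel.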

That suggests a way forward will come from formalising natural language arguments. So LLMs by themselves might be a dead end but in combination with formalisation they could be very powerful. That might not be "thinking" in the sense of the full suite of human abilities that we group with that word but it seems an important component of it.

3. Yesterday I got AI (a SOTA model) to write some tests for a backend I'm working on. One set of tests was for a function that does a somewhat complex SQL query that should return multiple rows.

In the test setup, the AI added a single database row, ran the query and then asserted the single added row was returned. Clearly this doesn't show that the query works as intended. Is this what people are referring to when they say AI writes their tests?
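Something like this minimal sketch (hypothetical table and query, not the actual code) shows why the single-row setup proves nothing:

```python
import sqlite3

# Hypothetical schema standing in for the real backend's tables.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, status TEXT)")

# The AI's test setup: exactly one matching row.
conn.execute("INSERT INTO orders VALUES (1, 'open')")

# A broken query (say, an accidental LIMIT 1) passes the same
# assertion as the intended multi-row query would.
rows = conn.execute(
    "SELECT id FROM orders WHERE status = 'open' LIMIT 1"
).fetchall()
assert rows == [(1,)]  # green, yet says nothing about multi-row behavior
```

With two or more matching rows in the fixture, the `LIMIT 1` bug would fail immediately; with one row, the test can't distinguish correct from broken.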

I don't know what to call this kind of thinking. Any intelligent, reasoning human would immediately see that it's not even close to enough. You barely even need a coding background to see the issues. AI just doesn't have it, and it hasn't improved in this area for years.

This kind of thing happens over and over again. I look at the stuff it outputs and it's clear to me that no reasoning thing would act this way

4. As a counter, I’ve had OpenAI Codex and Claude Code both catch logic cases I’d missed in both tests and code.

The tooling in the Code tools is key to usable LLM coding. Those tools prompt the models to “reason” about whether they’ve caught edge cases or met the logic. Without that external support they’re just fancy autocompletes.

In some ways it’s no different than working with some interns. You have to prompt them to “did you consider if your code matched all of the requirements?”.

LLMs are different in that they’re sorta lobotomized. They won’t learn from tutoring “did you consider” which needs to essentially be encoded manually still.

5. > In some ways it’s no different than working with some interns. You have to prompt them to “did you consider if your code matched all of the requirements?”.

I really hate this description, but I can't quite fully articulate why yet. It's distinctly different because interns can form new observations independently; AIs cannot. They can make another guess at the next token, but if it could have predicted it the second time, it must have been able to predict it the first, so it's not a new observation. The way I think through a novel problem results in drastically different paths and outputs from an LLM. They guess and check repeatedly; they don't converge on an answer. Which you've already identified.

> LLMs are different in that they’re sorta lobotomized. They won’t learn from tutoring “did you consider” which needs to essentially be encoded manually still.

This isn't how you work with an intern (unless the intern is unable to learn).

6. > As a counter, I’ve had OpenAI Codex and Claude Code both catch logic cases I’d missed in both tests and code

That has other explanations than that it reasoned its way to the correct answers. Maybe it had very similar code in its training data

This specific example was with Codex. I didn't mention it because I didn't want it to sound like I think codex is worse than claude code

I do realize my prompt wasn't optimal to get the best out of AI here, and I improved it on the second pass, mainly to give it more explicit instruction on what to do

My point though is that I feel these situations are heavily indicative of it not having true reasoning and understanding of the goals presented to it

Why can it sometimes catch the logic cases you miss, such as in your case, and then utterly fail at something that a simple understanding of the problem and thinking it through would solve? The only explanation I have is that it's not using actual reasoning to solve the problems

7. > Is this what people are referring to when they say AI writes their tests?

yes

> Any intelligent, reasoning human would immediately see that it's not even close to enough. You barely even need a coding background to see the issues.

[nods]

> This kind of thing happens over and over again. I look at the stuff it outputs and it's clear to me that no reasoning thing would act this way

and yet there're so many people who are convinced it's fantastic. Oh, I made myself sad.

The larger observation, that it is statistical inference rather than reason yet looks like reason to so many, is quite an interesting test case for the "fuzzing" of humans. In the same vein: why do so many engineers store passwords in clear text? Why do so many people believe AI can reason?

8. I think you may mean Sperber and Mercier define "reasoning" as the capacity to produce and evaluate arguments?

9. True, they use the word "reasoning". Part of my point was just to focus on the more concrete concept: the capacity to produce and evaluate arguments.

10. > If we define thinking as the capacity to produce and evaluate arguments

That bar is so low that even a political pundit on TV can clear it.

11. I know a lot of people with access to Claude Code and the like will say that 'No, it sure seems to reason to me!'

Great. But most (?) of the businesses out there aren't paying for the big-boy models.

I know of an F100 that got snookered into a deal with GPT-4 for 5 years: max of 40 responses per session, max of 10 sessions of memory, no backend integration.

Those folks rightly think that AI is a bad idea.

12. > It is obvious now that whatever quality LLM have, they don't think and reason, they are just statistical machine outputing whatever they training set as most probable

I have kids, and you could say the same about toddlers. Terrific mimics, they don't understand the whys.

13. IMHO when toddlers say "mama" they really understand it to a much, much bigger degree than any LLM. They might not be able to articulate it, but the deep understanding is there.

So I think younger kids have purpose and attach meaning to a lot of things, and they do try to follow a specific path toward an outcome.

Of course (depending on the age) their "reasoning" is in a different system than ours, where survival instincts are much more powerful than any custom-defined outcome, so most of the time that is the driving force of the meaning.

Why do I talk about meaning? Because, of course, kids cannot talk about the why, as that is very abstract. But meaning is a big part of the why, and it continues to be so in adult life; it is just that the relation is reversed: we start talking about the why to get to a meaning.

I also think that kids start to have thoughts more complex than their language very early. If you have been through the "Why?" phase, you might have noticed that when they ask "Why?" they can mean very different questions. But they don't know the words to describe it. Sometimes "Why?" means "Where?", sometimes "How?", sometimes "How long?"... That series of questioning is, for me, a kind of proof that a lot more is happening in kids' brains than they can verbalise.

14. Thinking is not besides the point, it is the entire point.

You seem to be defining "thinking" as an interchangeable black box, and as long as something fits that slot and "gets results", it's fine.

But it's the code-writing that's the interchangeable black box, not the thinking. The actual work of software development is not writing code, it's solving problems.

With a problem-space-navigation model, I'd agree that there are different strategies that can find a path from A to B, and what we call cognition is one way (more like a collection of techniques) to find a path. I mean, you can in principle brute-force this until you get the desired result.

But that's not the only thing that thinking does. Thinking responds to changing constraints, unexpected effects, new information, and shifting requirements. Thinking observes its own outputs and its own actions. Thinking uses underlying models to reason from first principles. These strategies are domain-independent, too.

And that's not even addressing all the other work involved in reality: deciding what the product should do when the design is underspecified. Asking the client/manager/etc what they want it to do in cases X, Y and Z. Offering suggestions and proposals and explaining tradeoffs.

Now I imagine there could be some other processes we haven't conceived of that can do these things but do them differently than human brains do. But if there were we'd probably just still call it 'thinking.'

15. Do you think reasoning models don't count? there is a lot of work around those and things like RAGs.

16. Reasoning keeps improving, but they still have a ways to go

https://arcprize.org/leaderboard

17. What we need is reasoning as in "drawing logical conclusions based on logic". LLMs do reasoning by recursively adding more words to the context window. That's not logical reasoning.

18. It's debatable whether humans do "drawing logical conclusions based on logic". Look at politics and what people vote for. They seem to do something more like pattern matching.

19. Humans are far from logical. We make decisions within the context of our existence. This includes emotions, friends, family, goals, dreams, fears, feelings, mood, etc.

it’s one of the challenges when LLMs are being anthropomorphised, reasoning/logic for bots is not the same as that for humans.

20. And yet, when we make bad calls or do illogical things because of hormones, emotions, energy levels, etc., we still call it reasoning.

But to LLMs we don't afford the same leniency. If they flip some bits and the logic doesn't add up, we're quick to point out that "it's not reasoning at all".

Funny throne we've built for ourselves.

21. Yes, because different things are different.

22. Maybe we say that when we don't like those conclusions?

After all I can guarantee the other side (whatever it is) will say the same thing for your "logical" conclusions.

It is logic, we just don't share the same predicates or world model...

23. Just because all humans don't use reason all the time doesn't mean reasoning isn't a good and desirable strategy.

24. I don't know why you were downvoted. It is a bit more complicated, but that's the gist of it. LLMs don't actually reason.

25. Whether LLM is reasoning or not is an independent question to whether it works by generating text.

By the standard in the parent post, humans certainly do not "reason". But that is then just choosing a very high bar for "reasoning" that neither humans nor AI meet... what is the point then?

It is a bit like saying: "Humans don't reason, they just let neurons fire off one another, and think the next thought that enters their mind"

Yes, LLMs need to spew out text to move their state forward. As a human I actually sometimes need to do that too: Talk to myself in my head to make progress. And when things get just a tiny bit complicated I need to offload my brain using pen and paper.

Most arguments used to show that LLMs do not "reason" can be used to show that humans do not reason either.

To show that LLMs do not reason you have to point to something else than how it works.

26. They can think, just not in the same abstract, platonic way that a human mind can.

27. Your mind must work differently than mine. I have programmed for 20 years, I have a PhD in astrophysics..

And my "reasoning" is pretty much like a long ChatGPT verbal and sometimes not-so-verbal (visual) conversation with myself.

If my mind really did abstract platonic thinking, I think answers to hard problems would just instantly appear to me, without flaws. But only problems I have solved before and can pattern-match do that.

And if I have to think any new thoughts I feel that process is rather similar to how LLMs work.

It is the same for the history of science, really: only thoughts that build in small steps on previous thoughts and participate in a conversation actually get thought by humans.

Totally new leaps, which a "platonic thinking machine" should easily make, do not seem to happen.

Humans are, IMO, conversation machines too...

28. I rather approach it from a Cartesian perspective. A context window is just that, it's not "existence". And because they do not exist in the world the same way as a human does, they do not think in the same way a human does (reversal of "I think therefore I am")

29. I have a context matrix, therefore I transform?

30. > But they do not think

I see this argument made a lot. I'm not sure if the distinction really holds weight once we start to unravel though.

What's a topic you're able to think about that an LLM is not able to think about?

31. I asked GPT for rules on 101-level French grammar. That should be well documented for someone learning from English, no? The answers were so consistently wrong that it seemed intentional. Absolutely nothing novel asked of it. It could have quoted verbatim if it wanted to be lazy. I can't think of an easier question to give an LLM. If it's possible to "prompt wrong" a simple task that my six-year old nephew could easily do, the burden of proof is not on the people denying LLM intelligence, it's on the boosters.

32. > the burden of proof is not on the people denying LLM intelligence, it's on the boosters

It's an impossible burden to prove. We can't even prove that any other human has sentience or is reasoning, we just evaluate the outcomes.

One day the argument you're putting forward will be irrelevant, or good for theoretical discussion only. In practice I'm certain that machines will achieve human level output at some point.

33. > machines will achieve human level output at some point

Would you care to put some sort of time scale to "at some point?" Are we talking about months, years, decades, centuries?

34. I can solve a mystery novel based on the evidence alone. Assuming an LLM doesn’t already have the answer it will offer solutions based on meta-information like how similar mysteries conclude or are structured. While this can be effective, it’s not really solving the mystery and will fail with anything truly novel.

35. Any topic with little coverage in the training data. LLMs will keep circling around the small bits in the training data, unable to synthesize new connections.

This is very obvious when trying to use LLMs to modify scripts in vendor-specific languages that have not been widely documented and don't have many examples available. A seasoned programmer will easily recognize common patterns like if-else blocks and loops, but LLMs will get stuck and output gibberish.

36. Edit: HN filters emojis in comments. TLDR: the bots reply with various sea-related emojis and can't explain that there's no seahorse emoji.

on Meta (whatsapp) chatbot:

me: what is the seahorse emoji?

bot:

me: this is not a seahorse

bot: You got me! The actual seahorse emoji is . Want me to use it in a fun sentence?

on Bing/Copilot:

me: show me the seahorse emoji

bot:

------------

Here you go: — actually, that’s a fun combo, but the real seahorse emoji is … wait, not quite! The correct one is for jellyfish and for unicorn, but the seahorse emoji itself is ? Let me clarify:

The official seahorse emoji is: ? No — correction! It’s for unicorn. The actual seahorse emoji is: ?

Let’s be precise: the seahorse emoji is ?

Oops — I misspoke earlier. The true seahorse emoji is: ?

Wait — let’s settle this cleanly:
The seahorse emoji is: ?

Actually, the correct one is: ?

---------------

In some instances, Copilot answers that "there's no seahorse emoji _and that's why I keep running in circles_". This is just pasting answers, as this is a popular problem discussed on the internet. If the bot were reasoning, it would just answer "there's no seahorse emoji, please stop asking".

37. Copilot is the absolute worst. Yesterday I tried to have it create a printable calendar for January 2026, but no matter how I instructed it, it kept showing that the 1st was on a Wednesday, not a Thursday. I even fed its own incorrect PDF back to it in a new conversation, which clearly showed the 1st on a Wednesday, and asked it what day the calendar showed the 1st on. It said the calendar showed the 1st as a Thursday. It started to make me disbelieve my own eyes.

Edit: I gave up on Copilot and fed the same instructions to ChatGPT, which had no issue.

The point here is that some models seem to know your intention while some just seem stuck on their training data.

38. If that's the benchmark, then Opus 4.5 (with "extended thinking") can think:

> Me: what is the seahorse emoji?
> Claude: There isn't a seahorse emoji in the standard Unicode emoji set. The closest you'll get is the generic fish or tropical fish , but no dedicated seahorse exists as of now.

39. This makes a lot of sense and is consistent with the lens that LLMs are essentially better autocomplete

40. I'm mostly a fan of AI coding tools, but I think you're basically right about this.

I think we'll see more specialized models for narrow tasks (think AlphaFold for other challenges in drug discovery, for example) as well, but those will feel like individual, costly, high impact discoveries rather than just generic "AI".

Our world is human-shaped and ultimately people who talk of "AGI" secretly mean an artificial human.

I believe that "intelligence", the way the word is actually used by people, really just means "skillful information processing in pursuit of individual human desires".

As such, it will never be "solved" in any other way than to build an artificial human.

41. No; when you bring in the genetic algorithm (something LLM AI can be adjacent to, given the scale of information it deals in) you can go beyond human intelligence. I work with GA coding tools pretty regularly. Instead of prompting, it becomes all about devising ingenious fitness functions, while not having to care if they're contradictory.
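A minimal sketch of that workflow on a toy bitstring problem (everything here is illustrative; the point is that a fitness function, not a prompt, steers the search):

```python
import random

random.seed(0)
TARGET = [1] * 12  # toy goal: an all-ones bitstring

def fitness(genome):
    # The "ingenious fitness function" slot: here, just count matches.
    return sum(g == t for g, t in zip(genome, TARGET))

def mutate(genome, rate=0.1):
    return [1 - g if random.random() < rate else g for g in genome]

# Evolve: keep the fittest half (elitism), refill with mutated copies.
pop = [[random.randint(0, 1) for _ in range(12)] for _ in range(20)]
for _ in range(50):
    pop.sort(key=fitness, reverse=True)
    pop = pop[:10] + [mutate(random.choice(pop[:10])) for _ in range(10)]

best = max(pop, key=fitness)
```

In practice the genome would encode code or policies and fitness would score real behavior; contradictory objectives just get folded into one multi-objective score.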

If superhuman intelligence is solved it'll be in the form of building a more healthy society (or, if you like, a society that can outcompete other societies). We've already seen this sort of thing by accident and we're currently seeing extensive efforts to attack and undermine societies through exploiting human intelligence.

To a genetic algorithm techie that is actually one way to spur the algorithm to making better societies, not worse ones: challenge it harder. I guess we'll see if that translates to life out here in the wild, because the challenge is real.

42. > If superhuman intelligence is solved it'll be in the form of building a more healthy society (or, if you like, a society that can outcompete other societies).

Maybe so, but the point I'm trying to make is this needs to look nothing like sci-fi ASI fantasies, or rather, it won't look and feel like that before we get the humanoid AI robots that the GP mentioned.

You can have humans or human institutions using more or less specialized tools that together enable the system to act much more intelligently.

There doesn't need to be a single system that individually behaves like a god - that's a misconception that comes from believing that intelligence is something like a computational soul, where if you just have more of it you'll eventually end up with a demigod.

43. a stellar piece, Cal, as always. short and straight to the point.

I believe that Codex and the likes took off (in comparison to e.g. "AI" browsers) because the bottleneck there was not reasoning about code, it was typing and processing walls of text. for a human, the interface of e.g. Google Calendar is ± intuitive. for an LLM, any graphical experience is an absolute hellscape from a performance standpoint.

CLI tools, which LLMs love to use, output text and only text, not images, not audio, not videos. LLMs excel at text, hence they are confined to what text can do. yes, multimodal is a thing, but you lose a lot of information and/or context window space + speed.

LLMs are a flawed technology for general, true agents. 99% of the time, outside code, you need eyes and ears. so far we have only created self-writing paper.

44. This is the reasoning deficit. Models are very good at generating large quantities of truthy outputs, but are still too stupid to know when they've made a serious mistake. Or, when they are informed about a mistake they sometimes don't "get it" and keep saying "you're absolutely right!" while doing nothing to fix the problem.

It's a matter of degree, not a qualitative difference. Humans have the exact same flaws, but amateur humans grow into expert humans with low error rates (or lose their job and go to work in KFC), whereas LLMs are yet to produce a true expert in anything because their error rates are unacceptably high.

45. If you think about the real-world and the key bottleneck with most creative work projects (this includes software), it's usually context (in the broadest sense of the word).

Humans are good at this because they are truly multi-modal and can interact through many different channels to gather additional context to do the requisite task at hand. Given incomplete requirements or specs, they can talk to co-workers, look up old documents from a previous release, send a Slack or Teams message, setup a Zoom meeting with stakeholders, call customers, research competitors, buy a competitors product and try it out while taking notes of where it falls short, make a physical site visit to see the context in which the software is to be used and environmental considerations for operation.

Point is that humans doing work have all sorts of ways to gather and compile more context before acting or while they are acting that an LLM does not and in some cases cannot have without the assistance of a human. This process in the real world can unfurl over days or weeks or in response to new inputs and our expectation of how LLMs work doesn't align with this.

LLMs can sort of do this, but more often than not, the failure of LLMs is that we are still very bad at providing proper and sufficient context to the LLM and the LLMs are not very good at requesting more context or reacting to new context, changing plans, changing directions, etc. We also have different expectations of LLMs and we don't expect the LLM to ask "Can you provide a layout and photo of where the machine will be set up and the typical operating conditions?" and then wait a few days for us to gather that context for it before continuing.

46. Once again, more evidence mounts that AI is massively overhyped and limited in usefulness, and once again we will see people making grandiose claims (without evidence of course) and predictions that will inevitably fall flat in the future. We are, of course, perpetually just 3-6 months away from when everything changes.

I think Carmack is right: LLMs are not the route to AGI.

47. What has he been wrong about? He was way ahead in predicting the scaling limitations and LLMs not making it to AGI.

48. What scaling limitations? Gemini 3 shows us it is not over yet, and little brother Flash is a hyper-sparse 1T-parameter model (AIUI) that is both fast and good.

I agree with GP; Marcus has not been an accurate or significant voice, and I could care less what he has to say about AI. He's not a practitioner anymore, in my mind.

49. Well clearly LLMs are not AGI, and all such calls of them being 'AGI' have been a pump and dump scam. So he got that dead right for years.

50. Probably any sales and marketing departments of companies with an "AI" product (based on an LLM) which is presented as having AGI-like capabilities.

I doubt the parent poster was referring to anyone phrasing it in those literal terms. Kind of like how "some people claim flavored water can cure cancer" doesn't mean that's the literal pitch being given for the snake oil.

51. No, that’s not true at all. Humans can deal with ambiguity and operate independently. Claude can’t do that. You’re trading one “problem” for an entirely different one in this hypothetical.

52. Isn't that what polishing 'the prompt' does? Refine the communication like an editor does for a publication? Only in this case it's instructions for how to get a transformer to mine an existing set of code to produce some sort of vaguely useful output.

The human factor adds knowledge of the why that refines the results. Not just any algorithm or a standard pattern that fits, but the correct solution for the correct question.
</comments_about_topic>

Write a concise, engaging paragraph (3-5 sentences) summarizing the key points and perspectives in these comments about the topic. Focus on the most interesting viewpoints. Do not use bullet points—write flowing prose.

topic

Reasoning vs. Pattern Matching # Debates on whether LLMs truly think or merely predict tokens based on training data. Includes comparisons to human cognition, the definition of "reasoning" as argument production versus evaluation, and the argument that LLMs are "lobotomized" without external loops or formalization.

commentCount

52
