Spatial Reasoning Limitations

Discussion of LLMs struggling with spatial tasks, image orientation affecting OCR accuracy, and whether ARC-AGI improvements indicate genuine spatial reasoning advances or benchmark-specific solutions

The discussion highlights the persistent "Moravec’s paradox" in AI, where effortless human spatial tasks remain significant hurdles for models that struggle with basic image orientation and complex 3D geometry. While some view rising ARC-AGI scores as a breakthrough in "graphical" reasoning, others dismiss the benchmark as an overhyped visual puzzle that fails to represent genuine general intelligence. Practical limitations in OCR and CAD engineering suggest that even advanced LLMs still require human-in-the-loop feedback or specialized spatial sub-models to overcome frequent hallucinations. Ultimately, the consensus indicates that while visual AI is advancing, achieving robust world-modeling remains a significant challenge that may require new training paradigms beyond traditional text and token generation.

View on HN · Topics

I suspect the non-spikey part is the more interesting comparison

Why is it so easy for me to open the car door, get in, close the door, and buckle up? You can do this in the dark and without looking.

There are an infinite number of little things like this that you think zero about and that take near-zero energy, yet are extremely hard for AI.

View on HN · Topics

> There's a term for this, but I can't think of it at the moment.

Moravec's paradox: https://epoch.ai/gradient-updates/moravec-s-paradox

View on HN · Topics

I'm excited for the big jump in ARC-AGI scores from recent models, but no one should think for a second this is some leap in "general intelligence".

I joke to myself that the G in ARC-AGI is "graphical". I think what's held back models on ARC-AGI is their terrible spatial reasoning, and I'm guessing that's what the recent models have cracked.

Looking forward to ARC-AGI 3, which focuses on trial and error and exploring a set of constraints via games.

View on HN · Topics

Agreed. I love the elegance of ARC, but it always felt like a gotcha to give spatial reasoning challenges to token generators, and the fact that the token generators are somehow beating it anyway really says something.
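To make the "spatial challenge for a token generator" point concrete, here is a minimal sketch of what happens when an ARC-style grid of small integers is flattened into text. The helper names `serialize_grid` and `text_offset` are hypothetical, and character offsets are only a rough proxy for token positions, since real tokenizers split text differently.

```python
from typing import List

Grid = List[List[int]]


def serialize_grid(grid: Grid) -> str:
    """Flatten a grid of single-digit cells into text, one row per line."""
    return "\n".join("".join(str(cell) for cell in row) for row in grid)


def text_offset(row: int, col: int, width: int) -> int:
    """Character offset of cell (row, col) in the serialized text.

    Each row contributes `width` digits plus one newline separator.
    """
    return row * (width + 1) + col


# A vertical line of 3s in column 2 (illustrative data, not a real ARC task).
grid: Grid = [
    [0, 0, 3, 0],
    [0, 0, 3, 0],
    [0, 0, 3, 0],
]
width = len(grid[0])
text = serialize_grid(grid)

# Horizontally adjacent cells are 1 character apart in the text.
horizontal_gap = text_offset(0, 3, width) - text_offset(0, 2, width)

# Vertically adjacent cells are width + 1 characters apart in the text.
vertical_gap = text_offset(1, 2, width) - text_offset(0, 2, width)
```

In this sketch the vertical line that a human sees at a glance becomes three digits separated by unrelated characters, and the gap grows with the grid width.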

View on HN · Topics

Wouldn't you deal with spatial reasoning by giving it access to a tool that structures the space in a way it can understand, or that is simply a sub-model that can do spatial reasoning? These "general" models would serve as the frontal cortex while other models do specialized work. What is missing?

View on HN · Topics

That's a bit like saying just give blind people cameras so they can see.

View on HN · Topics

I mean, no, not really. These models can see; you're giving them eyes to connect to that part of their brain.

View on HN · Topics

They should train more on sports commentary, perhaps that could give spatial reasoning a boost.

View on HN · Topics

ARC-AGI (and ARC-AGI-2) is the most overhyped benchmark around though.

It's completely misnamed. It should be called useless visual puzzle benchmark 2.

First, it's a visual puzzle, which makes it way easier for humans than for models trained on text. Second, it's not really that obvious or easy for humans to solve either!

So the idea that if an AI can solve "ARC-AGI" or "ARC-AGI-2" it's super smart or even "AGI" is frankly ridiculous. It's a puzzle that basically means nothing, other than that the models can now solve "ARC-AGI".

View on HN · Topics

One discovery I've made with Gemini is that OCR accuracy is much higher when the document is perfectly aligned at 0 degrees. When we gave Gemini images of handwritten text that were rotated (90 or 180 degrees), it had lots of issues reading dates, names, etc. Once we used the PaddleOCR image orientation model to detect the orientation and rotate the image, most of our OCR issues were solved.
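The "detect orientation, then rotate before OCR" step described above can be sketched roughly as follows. This is a minimal illustration, not the commenter's pipeline: it operates on a plain nested-list pixel grid rather than a real image, the function names `rotate_90_clockwise` and `rotate_upright` are hypothetical, and `detected_angle` is assumed to come from a separate orientation model (no detector is called here).

```python
from typing import List

Grid = List[List[int]]


def rotate_90_clockwise(pixels: Grid) -> Grid:
    """Rotate a row-major pixel grid by 90 degrees clockwise."""
    # Reversing the row order and then transposing gives a clockwise quarter turn.
    return [list(row) for row in zip(*pixels[::-1])]


def rotate_upright(pixels: Grid, detected_angle: int) -> Grid:
    """Undo a detected clockwise rotation of 0, 90, 180, or 270 degrees.

    `detected_angle` is assumed to be the clockwise rotation reported by an
    orientation model; that model is outside the scope of this sketch.
    """
    if detected_angle not in (0, 90, 180, 270):
        raise ValueError("detected_angle must be 0, 90, 180, or 270")
    # A page already turned clockwise by `detected_angle` needs the remaining
    # (360 - detected_angle) degrees of clockwise rotation to be upright again.
    quarter_turns = ((360 - detected_angle) % 360) // 90
    result = [list(row) for row in pixels]
    for _ in range(quarter_turns):
        result = rotate_90_clockwise(result)
    return result


# Illustrative "page": two rows of three pixels.
upright: Grid = [[1, 2, 3], [4, 5, 6]]
scanned_sideways = rotate_90_clockwise(upright)
scanned_upside_down = rotate_90_clockwise(scanned_sideways)

fixed_sideways = rotate_upright(scanned_sideways, 90)
fixed_upside_down = rotate_upright(scanned_upside_down, 180)
```

In a real pipeline the rotation would be done with an image library on the actual scan, and only the corrected image would be passed to the OCR model.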

View on HN · Topics

I just tested it on a very difficult Raven matrix that the old version of Deep Think, as well as GPT 5.2 Pro, Claude Opus 4.6, and pretty much every other model, failed at.

This version of Deep Think got it first try. Thinking time was 2 or 3 minutes.

The visual reasoning of this class of Gemini models is incredibly impressive.

View on HN · Topics

It is interesting that the video demo is generating an .stl model.
I run a lot of tests of LLMs generating OpenSCAD code (as I recently launched https://modelrift.com, a text-to-CAD AI editor), and Gemini 3 family LLMs are actually giving the best price-to-performance ratio now. But they are very, VERY far from being able to spit out a complex OpenSCAD model in one shot. So I had to implement a full-fledged "screenshot-vibe-coding" workflow where you draw arrows on a 3D model snapshot to explain to the LLM what is wrong with the geometry. Without a human in the loop, all top-tier LLMs hallucinate when debugging 3D geometry in agentic mode and fail spectacularly.

View on HN · Topics

Yes, I've been waiting for a real breakthrough with regard to 3D parametric models and I don't think this is it. The proprietary nature of the major players (Creo, SolidWorks, NX, etc.) is a major drag. Sure, there's STP, but too much design intent and feature information is lost there. I don't think OpenSCAD has the critical mass of mindshare or training data at this point, but maybe it's the best chance to force a change.

View on HN · Topics

Yes, I had the same experience. As good as LLMs are now at coding, it seems they are still far from being useful in vision-dominated engineering tasks like CAD/design. I guess it is a training data problem. Maybe world models / artificial data can help here?

View on HN · Topics

If you want that to get better, you need to produce a 3D model benchmark and popularize it. You can start with a pelican riding a bicycle, with a working bicycle.

View on HN · Topics

Google is way ahead in visual AI and world modelling. They're lagging hard in agentic AI and autonomous behavior.

View on HN · Topics

Not very likely?

ARC-AGI-3 has a nasty combo of spatial reasoning + explore/exploit. It's basically adversarial vs current AIs.

View on HN · Topics

How is spatial reasoning useless??
