
Skepticism About Theory

Arguments that a theory may be impossible because of data complexity, model size requirements, and an analogy to human consciousness, which might take something larger than the brain to understand


While some commenters believe we are close to explaining why neural networks outperform other models, skeptics argue that a unified theory may be impossible because the true complexity resides in massive, unstructured datasets rather than in the architectures themselves. On this view, making sense of such data takes models of enormous size, and understanding a system that large may in turn require something an order of magnitude larger, much as a human brain may be too small to comprehend its own consciousness. Others view deep learning less as a product of human engineering and more as an exploitation of principles we did not invent, where empirical experimentation currently outpaces our ability to predict silent failure modes or derive design rules from theory. Ultimately, the discussion highlights a tension between the hope that simple rules can define complex phenomena and the fear that the sheer scale of modern data is a fundamental barrier to human understanding.

14 comments tagged with this topic

View on HN · Topics
As someone who works in the area, this provides a decent summary of the most popular research items. The most useful and impressive part is the set of open problems at the end, which just about covers all of the main research directions in the field. The skepticism I'm seeing in the comments really highlights how little of this work is trickling down to the public, which is very sad to see. While it can offer few mathematical mechanisms to infer optimal network design yet (mostly because just trying stuff empirically is often faster than going through the theory, so it is more common to retroactively infer things), the question "why do neural networks work better than other models?" is getting pretty close to a solid answer. Problem is, that was never the question people seem to have ever really been interested in, so the field now has to figure out what questions we ask next.
View on HN · Topics
Theory becomes critical when you need to predict failure modes. A decision support system that 'just works' most of the time but fails silently on edge cases is worse than a simpler system with known limitations. Understanding the bias mechanisms would help us know when a model is confident vs when it's just pattern matching. That distinction matters when the stakes are high.
View on HN · Topics
Deep learning works at a very high level because 'it can keep learning from more data' better than any other approach. But without the 'stupid amount of data' that is available now, the architecture would be kind of irrelevant. Unless you are going some way to explain both sides of the model-data equation, I don't feel you have a solid basis to build a scientific theory, e.g. 'why reasoning models can reason'. The model is the product of both the architecture and the training data. My fear is that this is as hopeless right now as explaining why humans or other animals can learn certain things from their huge amount of input data. We'll gain better empirical understanding, but it won't ever be fundamental computer science again, because the giga-datasets are the fundamental complexity, not the architecture.
View on HN · Topics
> We argue complexity conceals underlying regularity, and that deep learning will indeed admit a scientific theory

That would be amazing, but personally I’m skeptical.
View on HN · Topics
"Yeah, about that" - Gödel
View on HN · Topics
tbf, we've learned (ha!) more from smashing teeny tiny particles and "looking" at what comes out than from say 40 years of string theory. Sometimes doing stuff works, and the theory (hopefully) follows.
View on HN · Topics
Is there not some Rice's Theorem equivalent for deep nets? After all they are machines that are randomly generated, so from classical computer science I would not presume a theory of "what do all deep nets do" to be prima facie logically possible. Nor do I see this explained in the objections section.
View on HN · Topics
Well, "There Will Be a Scientific Theory of Deep Learning" looks like flag planting - an academic variant of "I told you so!", but one that is a citation magnet.
View on HN · Topics
It's actually really fascinating that there isn't a scientific theory of deep learning, especially as it's a product of human engineering as opposed to e.g. biology or particle physics.
View on HN · Topics
Calling it “a product of human engineering” is misleading. Deep learning exploits principles we don’t fully understand. We didn’t engineer those principles. It’s not fundamentally any different than particle physics or biology, which are both similarly consequences of rules that we didn’t invent and can’t control.
View on HN · Topics
There are very good reasons why it took this long, but they can be summed up as: everyone was looking in the wrong place. Deep learning breaks a hundred years of statistical intuition, and you don't move a ship that large quickly.
View on HN · Topics
There is, but it is fractured. I would equate this effort more to a standardization of terms and language.
View on HN · Topics
I’m in the skeptical camp. Whatever theory that will eventually emerge will not be as solid as:

1. Theory of pattern recognition (as developed in 80s and 90s)
2. Theory of thermodynamics
3. Theory of gravity
4. Theory of electromagnetism
5. Theory of relativity

Etc. because of two reasons:

1. While half of deep learning is how humans construct the architecture of networks, the more important half relies on data. This data is a hodgepodge of scraped internet data (text and videos), books, user interactions etc., which really has no coherent structure
2. To extract meaningful insights from this much data, it takes models of enormous size like 10B+. The thing about random systems (in the mathematical sense) is that it takes “something” of order of magnitude bigger size to “understand” it, unless there is some concentration of measure type mathematical niceties (as in thermodynamics), which I don’t think is there in these models and data.

This is the same reason I don’t think humans will ever be able to “understand” human consciousness. It will take something of an order of magnitude bigger than our own brains to do that.

Here is Terence Tao explaining this concentration stuff in another context: https://mathstodon.xyz/@tao/113873092369347147

I would love to be proven wrong though.
View on HN · Topics
The whole point about theory, though, is that simple rules can define complex phenomena. I don’t think anything you wrote fundamentally rules out the idea that we could find a theory of deep learning.