Discussion of why the universal approximation theorem is necessary but not sufficient to explain neural network performance: SVMs and other models share this property, so it cannot account for any advantage neural networks have over them
The universal approximation theorem establishes a necessary foundation for neural networks, but it cannot explain their practical superiority, because many other models, such as SVMs and gradient boosting, share the same theoretical capability. Commenters argue that the true "secret sauce" lies elsewhere, pointing to implicit regularization, biases introduced by the optimization process, and massive parameter scaling as the real drivers of modern performance. The debate exposes a gap between academic theories of scaling laws and a more pragmatic view that attributes success to sheer computational power and empirical refinement. Ultimately, universal approximation is seen as a mere baseline of representational capacity that offers little insight into why specific architectures excel where traditional models falter.
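To make the "shared capability" point concrete, here is a minimal sketch, not taken from the thread, of a model that is plainly not a neural network yet still approximates a continuous function as closely as desired: a piecewise-constant lookup table. The function names (`fit_lookup_table`, `predict`, `max_error`) and the choice of `sin` on `[0, 2π]` as the target are assumptions made purely for illustration.

```python
import math

def fit_lookup_table(f, lo, hi, n_bins):
    """Store f at the midpoint of each of n_bins equal-width bins on [lo, hi]."""
    width = (hi - lo) / n_bins
    return [f(lo + (i + 0.5) * width) for i in range(n_bins)]

def predict(table, lo, hi, x):
    """Return the stored value for the bin that contains x."""
    n_bins = len(table)
    i = int((x - lo) / (hi - lo) * n_bins)
    # Clamp so that x == hi falls into the last bin.
    return table[min(max(i, 0), n_bins - 1)]

def max_error(f, lo, hi, n_bins, n_test=2000):
    """Largest absolute error of the table over an evenly spaced test grid."""
    table = fit_lookup_table(f, lo, hi, n_bins)
    xs = [lo + (hi - lo) * k / (n_test - 1) for k in range(n_test)]
    return max(abs(f(x) - predict(table, lo, hi, x)) for x in xs)

# Hypothetical target: sin on [0, 2*pi]. More bins means a smaller worst-case error.
errors = {n: max_error(math.sin, 0.0, 2 * math.pi, n) for n in (4, 16, 64, 256)}
for n, err in errors.items():
    print(f"{n:4d} bins: max error {err:.4f}")
```

The worst-case error shrinks as the number of bins grows, which is all that universal approximation guarantees. The table says nothing about generalizing from few samples, scaling to high-dimensional inputs, or being trainable efficiently, which is where commenters locate the real explanation.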
10 comments tagged with this topic