In case I ever want to revisit this topic, e.g. to respond to pushback…

Inside Views, Impostor Syndrome, and the Great LARP – dives a bit deeper into “insight” (though it doesn’t use that term, and doesn’t discuss it as an AI capability).

https://sideways-view.com/2018/02/24/takeoff-speeds/ – good 2018 analysis of takeoff speeds.

The Generative AI Paradox. As I said to Sam: This was interesting (note, should link to the middle section of a longer post). A study claiming to demonstrate that LLMs are better at generating text than at understanding it. I always question the details on these things, but the high-level concept is interesting, plausible, and may provide another perspective on the challenge of long-form thinking: if an LLM indeed has trouble understanding what it has done, then it will also struggle to make good choices about how to proceed.

Faith and Fate: Limits of Transformers on Compositionality provides evidence that merely scaling transformers won’t get us to AGI. I’ve only skimmed the first couple of pages. From the paper:

We propose two hypotheses. First, Transformers solve compositional tasks by reducing multi-step compositional reasoning into linearized path matching. This contrasts with the systematic multi-step reasoning approach that learns to apply underlying computational rules required for building correct answers [54, 32, 24]. Shortcut learning [26] via pattern matching may yield fast correct answers when similar compositional patterns are available during training, but does not allow for robust generalization to uncommon or complex examples. Second, due to error propagation, Transformers may have inherent limitations on solving high-complexity compositional tasks that exhibit novel patterns. Errors in the early stages of the computational process can lead to substantial compounding errors in subsequent steps, preventing models from finding correct solutions.
Empirical results show that training on task-specific data leads to near-perfect performance on in-domain instances and under low compositional complexity, but fails drastically on instances outside of this region. This substantial gap suggests that systematic problem-solving capabilities do not emerge from maximum likelihood training [5] on input-output sequences, even when prompted or trained with human-like reasoning steps (i.e., a linearization of computation graphs; §3.1). Our careful study based on the computation graph and analyses demonstrates that Transformers can often solve multi-step compositional problems by collapsing the depth of the compositional operations via analogical pattern matching.

More broadly, our findings suggest that the strong performance of Transformers should be taken with a certain grain of salt: Despite initially appearing challenging, certain tasks may not possess the inherent compositionality they seem to have. This is due to the fact that desired solutions could be readily derived from input-output sequences present in the training data, allowing for shortcut pattern matching to produce acceptable solutions.
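The error-propagation hypothesis has a simple back-of-the-envelope form worth keeping in mind: if a model gets each step of a compositional task right with probability p, and step errors are roughly independent (a simplifying assumption, not something the paper quantifies this way), then the chance of a fully correct n-step solution decays exponentially in n. A minimal sketch:

```python
# Toy model of compounding errors in multi-step compositional tasks.
# Assumes independent, identical per-step accuracy -- a deliberate
# simplification for intuition, not the paper's actual analysis.

def chain_success_prob(per_step_acc: float, n_steps: int) -> float:
    """Probability that all n_steps succeed, assuming independence."""
    return per_step_acc ** n_steps

for n in [1, 5, 10, 20, 40]:
    print(f"{n:>2} steps at 95% per-step accuracy: "
          f"{chain_success_prob(0.95, n):.1%} end-to-end")
```

Even a 95%-reliable step compounds to under 60% end-to-end accuracy by ten steps and under 15% by forty, which is one way to read “errors in the early stages… lead to substantial compounding errors in subsequent steps.”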