# A Bear Case: My Predictions Regarding AI Progress

*by Thane Ruthenis, 5th Mar 2025*

This isn't really a "timeline", as such – I don't know the timings – but this is my current, fairly optimistic take on where we're heading. I'm not fully committed to this model yet: I'm still on the lookout for what agents and inference-time scaling deliver later this year. But Deep Research, Claude 3.7, Claude Code, Grok 3, and GPT-4.5 have turned out largely in line with these expectations[1], and this is my current baseline prediction.

## The Current Paradigm: I'm Tucking In to Sleep

I expect that none of the currently known avenues of capability advancement are sufficient to get us to AGI[2].

I don't want to say that pretraining will "plateau", as such; I do expect continued progress. But the dimensions along which the progress happens are going to decouple from the intuitive "getting generally smarter" metric, and will face steep diminishing returns.

- Grok 3 and GPT-4.5 seem to confirm this. Grok 3's main claim to fame was "pretty good: it managed to dethrone Claude Sonnet 3.5.1 for some people!". That was damning with faint praise.
- GPT-4.5 is subtly better than GPT-4, particularly at writing/EQ. That's likewise a faint-praise damnation: it's not much better. Indeed, it reportedly came out below expectations for OpenAI as well, and they certainly weren't in a rush to release it. (It was intended as a new flashy frontier model, not the delayed, half-embarrassed "here it is I guess, hope you'll find something you like here".)
- GPT-5 will be even less of an improvement on GPT-4.5 than GPT-4.5 was on GPT-4. The pattern will continue for GPT-5.5 and GPT-6, the ~1000x and 10000x models they may train by 2029 (if they still have the money by then). Subtle quality-of-life improvements and meaningless benchmark jumps, but nothing paradigm-shifting.
- (Not to be a scaling-law denier. I believe in them, I do! But they measure *perplexity*, not general intelligence/real-world usefulness, and Goodhart's Law is no-one's ally.)
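*(As a reference point for the perplexity claim above: a minimal sketch of the quantity pretraining scaling laws actually govern, in the standard Chinchilla-style form; the constants are fit empirically and none are asserted here.)*

$$L(N, D) \;\approx\; E + \frac{A}{N^{\alpha}} + \frac{B}{D^{\beta}}$$

Here $N$ is the parameter count, $D$ the number of training tokens, and $E, A, B, \alpha, \beta$ are fitted constants. The predicted quantity is pretraining cross-entropy loss (perplexity is $e^{L}$), not any direct measure of downstream usefulness.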
OpenAI seem to expect this, what with them apparently planning to slap the "GPT-5" label on the Frankenstein's monster made out of their current offerings, instead of on, well, a 100x'd GPT-4. They know they can't cause another hype moment without this kind of trickery.

**Test-time compute/RL on LLMs:** it will not meaningfully generalize beyond domains with easy verification.

- Some trickery like RLAIF and longer CoTs might provide some benefits, but they would be a fixed-size improvement. It will not cause a hard-takeoff self-improvement loop in "soft" domains.
- RL will be good enough to turn LLMs into reliable tools for some fixed environments/tasks. They will reliably fall flat on their faces if moved outside those environments/tasks.
- Scaling CoTs to e.g. millions of tokens or effectively-indefinite-size context windows (if that even works) may or may not lead to math being solved. I expect it won't. It may not work at all: the real-world returns on investment may end up linear while the costs of pretraining grow exponentially. I mostly expect FrontierMath to be beaten by EOY 2025 (it's not that difficult), but maybe it won't be beaten for years.[3]
- Even if it "technically" works to speed up conjecture verification, I'm skeptical of this producing paradigm shifts even in "hard" domains. That task is not actually an easily verifiable one.
- (If math is solved, though, I don't know how to estimate the consequences, and it might invalidate the rest of my predictions.)

**"But the models feel increasingly smarter!":** it seems to me that "vibe checks" for how smart a model feels are easily gameable by making it have a better personality.

- My guess is that this is most of the reason Sonnet 3.5.1 was so beloved. Its personality was made much more appealing, compared to e.g. OpenAI's corporate drones.
- The recent upgrade to GPT-4o seems to confirm this. They seem to have merely given it a better personality, and people were reporting that it "feels much smarter".
- Deep Research was this for me, at first. Some of its summaries were just pleasant to read, they felt so information-dense and intelligent! Not like typical AI slop at all! But then it turned out most of it was just AI slop underneath anyway, and now my slop-recognition function has adjusted and the effect is gone.

**What LLMs are good at:** eisegesis-friendly problems and in-distribution problems.

- Eisegesis is "the process of interpreting text in such a way as to introduce one's own presuppositions, agendas or biases". LLMs feel very smart when you do the work of making them sound smart on your own end: when the interpretation of their output has a free parameter which you can mentally set to some value that makes it sensible/useful to you.
- This includes e.g. philosophical babbling or brainstorming. You do the work of picking good interpretations/directions to explore; you impute the coherent personality to the LLM. And you inject very few bits of steering by doing so, but those bits are load-bearing. If left to their own devices, LLMs won't pick those obviously correct ideas any more often than chance. See R1's CoTs, where it often does... that.
- This also covers stuff like Deep Research's outputs. They're great specifically as high-level overviews of a field, when you're not relying on them to be comprehensive or precisely on-target, or on any given detail to be correct.
- It feels like this issue is easy to fix. LLMs already have ~all of the needed pieces, they just need to learn to recognize good ideas! Very few steering-bits to inject! This issue has felt easy to fix since GPT-3.5, or perhaps GPT-2. This issue is not easy to fix.

**In-distribution problems:** one of the core features of the current AIs is the "jagged frontier" of capabilities. This jaggedness is often defended by "ha, as if humans don't have domains in which they're laughably bad / as if humans don't have consistent cognitive errors!". I believe that counterargument is invalid.

- LLMs are not good in some domains and bad in others. Rather, they are incredibly good at some specific tasks and bad at other tasks – even if both tasks are in the same domain, even if tasks A and B are very similar, even if any human who can do A will be able to do B. This is consistent with the constant complaints about LLMs and LLM-based agents being unreliable and their competencies being impossible to predict (example).
- That is: it seems the space of LLM competence shouldn't be thought of as some short-description-length connected manifold or slice through the space of problems, whose shape we're simply too ignorant to understand yet. (In which case "LLMs are genuinely intelligent in a way orthogonal to how humans are genuinely intelligent" would be valid.) Rather, it seems to be a set of individual points in the problem-space, plus those points' immediate neighbourhoods...
Which is to say: the set of problems whose solutions are present in their training data.[4] The impression that they generalize outside it is based on us having a very poor grasp of which problems' solutions are present in their training data. And yes, there's some generalization. But it's dramatically less than the impression people have of it.

**Agency:** genuine agency, by contrast, requires remaining on-target across long inferential distances: even after your task's representation becomes very complex in terms of the templates which you had memorized at the start.

- LLMs still seem as terrible at this as they'd been in the GPT-3.5 age. Software agents break down once the codebase becomes complex enough, game-playing agents get stuck in loops out of which they break out only by accident, etc.
- They just have bigger sets of templates now, which lets them fool people for longer and makes them useful for marginally more tasks. But the scaling on that seems pretty bad, and it certainly won't suffice for autonomously crossing the astronomical inferential distances required to usher in the Singularity.

**"But the benchmarks!"** I dunno, I think they're just not measuring what people think they're measuring. See the point about in-distribution problems above, plus the possibility of undetected performance-gaming, plus some subtly-but-crucially (unintentionally) misleading reporting.

- Case study: prior to looking at METR's benchmark, I'd expected that it's also (unintentionally!) doing some shenanigans that mean it's not actually measuring LLMs' real-world problem-solving skills. Maybe the problems were secretly in the training data, or there was a selection effect towards simplicity, or the prompts strongly hinted at what the models are supposed to do, or the environment was set up in an unrealistically "clean" way that minimizes room for error and makes solving the problem correctly the path of least resistance (in contrast to messy real-world realities), et cetera.
- As it turned out, yes, it's that last one: see the "systematic differences from the real world" here. Consider what this means in light of the previous discussion about inferential distances/complexity-from-messiness.

As I'd said, I'm not 100% sure of that model. Further advancements might surprise me, there's an explicit carve-out for ??? consequences if math is solved, etc. But the above is my baseline prediction at this point, and I expect the probability mass on other models to evaporate by this year's end.

## Real-World Predictions

I dare not make the prediction that the LLM bubble will burst in 2025, or 2026, or in any given year in the near future. The AGI labs have a lot of money nowadays, they're managed by smart people, they have some real products, they're willing to produce propaganda, and they're buying their own propaganda (therefore it will appear authentic). They can keep the hype up for a very long time, if they want.

And they do want to. They need it, so as to keep the investments going. Oceans of compute is the only way to collect on the LLM bet they've made, in the worlds where that bet can pay off, so they will keep maximizing for investment no matter how dubious the bet's odds start looking. Because what else are they to do? If they admit to themselves they're not closing their fingers around godhood after all, what will they have left?

There will be news of various important-looking breakthroughs and advancements, at a glance looking very solid even to us/experts.
Digging deeper, or waiting until the practical consequences of these breakthroughs materialize, will reveal that they're 80% hot air/hype-generation.[5]

At some point there might be massive layoffs due to ostensibly competent AI labor coming onto the scene, perhaps because OpenAI will start heavily propagandizing that these mass layoffs must happen. It will be an overreaction/mistake. The companies that act on that will crash and burn, and will be outcompeted by companies that didn't do the stupid.

Inasmuch as LLMs boost productivity, it will mostly be as tools. There's a subtle but crucial difference between "junior dev = an AI model" and "senior dev + AI models = senior dev + team of junior devs". Both decrease the demand for junior devs (as they exist today, before they re-specialize into LLM whisperers or whatever). But the latter doesn't really require LLMs to be capable of end-to-end autonomous task execution, which is the property required for actual transformative consequences. (And even then, all the rumors about LLMs 10x'ing programmer productivity seem greatly overstated.)

Inasmuch as human-worker replacements do come, they will be surprisingly limited in scope. I dare not make a prediction regarding the exact scope and nature, only regarding the directionality compared to current expectations.

There will be a ton of innovative applications of Deep Learning, perhaps chiefly in the field of biotech; see GPT-4b and Evo 2. Those are, I must stress, human-made innovative applications of the paradigm of automated continuous program search. Not AI models autonomously producing innovations.

There will be various disparate reports about AI models autonomously producing innovations, in the vein of this or that or that. They will turn out to be misleading or cherry-picked. E.g., examining those examples:

- In the first case, most of the improvements turned out to be reward-hacking (and not even intentional on the models' part).
- In the second case, the scientists pre-selected the problem on which the LLM was supposed to produce the innovation on the basis of already knowing that there's a low-hanging fruit to be picked there. That's like 90% of the work. And then they further picked the correct hypothesis from the set it generated, i.e. did eisegesis. And there might also be any amount of data contamination from these scientists or different groups speaking about their research in public, in the years they spent working on it.
- In the third case, the AI produces useless slop with steps like "..., Step N: invent the Theory of Everything (left as an exercise for the reader), ...", lacking the recognition function for promising research. GPT-3-level stuff. (The whole setup can also likely be outperformed by taking the adjacency matrix of Wikipedia pages and randomly sampling paths from the corresponding graph, or something like this; a toy version of that baseline is sketched below.)
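*(To make the baseline gestured at above concrete, here is a minimal sketch, assuming nothing about the actual setup: a made-up adjacency structure standing in for Wikipedia's link graph, from which random paths are sampled as "research directions". The topics and links are invented for illustration only.)*

```python
import random

# Hypothetical stand-in for the Wikipedia link graph: page -> linked pages.
# A real version would be built from a Wikipedia dump; these entries are invented.
adjacency = {
    "Protein folding": ["Molecular dynamics", "Machine learning"],
    "Machine learning": ["Graph theory", "Optimization", "Protein folding"],
    "Graph theory": ["Optimization", "Network science"],
    "Optimization": ["Molecular dynamics", "Machine learning"],
    "Molecular dynamics": ["Protein folding", "Network science"],
    "Network science": ["Graph theory"],
}

def sample_path(graph, length=4, seed=None):
    """Take a random walk over the link graph and return the visited pages."""
    rng = random.Random(seed)
    node = rng.choice(list(graph))
    path = [node]
    for _ in range(length - 1):
        node = rng.choice(graph[node])
        path.append(node)
    return path

if __name__ == "__main__":
    # Each sampled path is read as a (probably useless) "novel research direction".
    for i in range(3):
        print(" -> ".join(sample_path(adjacency, seed=i)))
```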
I expect that by the 2030s, LLMs will be heavily integrated into the economy and software, and will serve as very useful tools that have found their niches. But just that: tools. Perhaps some narrow jobs will be greatly transformed or annihilated (by being folded into the job of an LLM nanny). But there will not be AGI or broad-scope agents arising from the current paradigm, nor autonomous 10x engineers.

At some unknown point – probably in the 2030s, possibly tomorrow (but likely not tomorrow) – someone will figure out a different approach to AI. Maybe a slight tweak to the LLM architecture, maybe a completely novel neurosymbolic approach. Maybe it will happen in a major AGI lab, maybe in some new startup. By default, everyone will die in <1 year after that.

## Closing Thoughts

This might seem like a ton of annoying nitpicking. Here's a simple generator of all of the above observations: some people desperately, desperately want LLMs to be a bigger deal than what they are. They are not evaluating the empirical evidence in front of their eyes with proper precision.[6] Instead, they're vibing, and spending 24/7 inventing contrived ways to fool themselves and/or others. They often succeed. They will continue doing this for a long time to come.

We, on the other hand, desperately do not want LLMs to be AGI-complete. Since we try to avoid motivated thinking, to avoid deluding ourselves into believing in happier realities, we err on the side of pessimistic interpretations. In this hostile epistemic environment, that effectively leads to us being overly gullible and prone to buying into hype. Indeed, this environment is essentially optimized for exploiting the virtue of lightness.

LLMs are masters at creating the vibe of being generally intelligent. Tons of people are cooperating, playing this vibe up, making tons of subtly-yet-crucially flawed demonstrations. Trying to see through this immense storm of bullshit very much feels like "fighting a rearguard retreat against the evidence".[7]

But this isn't what's happening, in my opinion. On the contrary: it's the LLM believers who are sailing against the winds of evidence.

If LLMs were actually as powerful as they're hyped up to be, there wouldn't be the need for all of these attempts at handholding. Ever more contrived agency scaffolds that yield ~no improvement. Increasingly more costly RL training procedures that fail to generalize. Hail-mary ideas regarding how to fix that generalization issue. Galaxy-brained ways to elicit knowledge out of LLMs that produce nothing of value.

The need for all of this is strong evidence that there's no seed of true autonomy/agency/generality within LLMs. If there were, the most naïve AutoGPT setup circa early 2023 would've elicited it. People are extending LLMs a hand, hoping to pull them up to our level. But there's nothing reaching back.

And none of the current incremental-scaling approaches will fix the issue. They will increasingly mask it, and some of this masking may be powerful enough to have real-world consequences. But any attempts at the Singularity based on LLMs will stumble well before takeoff.

Thus, I expect AGI labs' AGI timelines have ~nothing to do with what will actually happen. On average, we likely have more time than the AGI labs say. Pretty likely that we have until 2030, maybe well into the 2030s. By default, we likely don't have much longer than that. Incremental scaling of known LLM-based stuff won't get us there, but I don't think the remaining qualitative insights are many. 5-15 years, at a rough guess.

---

[1] For prudence's sake: GPT-4.5 has slightly overshot these expectations.

[2] If you are really insistent on calling the current crop of SOTA models "AGI", replace this with "autonomous AI" or "transformative AI" or "innovative AI" or "the transcendental trajectory" or something.

[3] Will o4 really come out on schedule in ~2 weeks, showcasing yet another dramatic jump in mathematical capabilities, just in time to rescue OpenAI from the GPT-4.5 semi-flop? I'll be waiting.

[4] This metaphor/toy model has been adapted from @Cole Wyeth.
[5] Pretty sure Deep Research could not in fact "do a single-digit percentage of all economically valuable tasks in the world", except in the caveat-laden sense where you still have a human expert double-checking and rewriting its outputs. And in my personal experience, on the topics at which I am an expert, it would be easier to write the report from scratch than to rewrite DR's output. It's a useful way to get a high-level overview of some topics, yes. It blows Google out of the water at being Google, and then some. But I don't think it's a 1-to-1 replacement for any extant form of human labor. Rather, it's a useful zero-to-one thing.

[6] See all the superficially promising "AI innovators" from the previous section, which turn out to be false advertising on a closer look. Or the whole "10x'd programmer productivity" debacle.

[7] Indeed, even now, having written all of this, I have nagging doubts that this might be what I'm actually doing here. I will probably keep having those doubts until this whole thing ends, one way or another. It's not pleasant.
## Comments

**Daniel Kokotajlo:**

> some people desperately, desperately want LLMs to be a bigger deal than what they are.

A larger number of people, I think, desperately, desperately want LLMs to be a smaller deal than what they are.

**ACCount:** The more mainstream you go, the larger this effect gets. A lot of people seemingly want AI to be a nothingburger. When LLMs emerged, in mainstream circles you'd see people go "it's not important, it's not actually intelligent, you can see it make the kind of reasoning mistakes a 3 year old would". Meanwhile, on LessWrong: "holy shit, this is a big fucking deal, because it's already making the same kind of reasoning mistakes a human three year old would!" I'd say that LessWrong is far better calibrated. People who weren't familiar with programming or AI didn't have a grasp of how hard natural language processing or commonsense reasoning used to be for machines. Nor do they grasp the implications of scaling laws.

**Thane Ruthenis:**

> Meanwhile, on LessWrong: "holy shit, this is a big fucking deal, because it's already making the same kind of reasoning mistakes a human three year old would!"

FWIW, that was me in 2022, looking at GPT-3.5 and being unable to imagine how capabilities could progress from there without immediately hitting ASI. (I don't think I ever cared about benchmarks. Brilliant humans can't necessarily ace math exams, so why would I gatekeep the AGI term behind that?) Now it's two-and-a-half years later and I no longer see it. As far as I'm concerned, this paradigm harnessed most of its general-reasoning potential at 3.5 and is now asymptoting out around *something*.
I don't know what this *something* is, but it doesn't seem to be "AGI". All "improvement" since then has just been window dressing: the models learning to convincingly babble about ever-more-sophisticated abstractions and solve ever-more-complicated math/coding puzzles that make their capabilities legible to ever-broader categories of people. But it's not anything GPT-3.5 wasn't already fundamentally capable of; and GPT-3.5 was not capable of taking off, and there have been no new fundamental capability advances since then. (I remember dreading …)

**Kabir Kumar:** What observations would change your mind?

**Thane Ruthenis:** See here.

**StopAI:** Your observations are basically "at the point where LLMs are AGI, I will change my mind". If it solves Pokémon one-shot, solves coding, or human beings are superfluous for decision-making, it's already practically AGI. These are bad examples! All you have shown me is that you can't think of any serious intermediate steps LLMs have to go through before they reach AGI.

**Cole Wyeth:** No, it's possible for LLMs to solve a subset of those problems without being AGI (even conceivable, as the history of AI research shows we often assume tasks are AI-complete when they are not, e.g. Hofstadter with chess, Turing with the Turing test). I agree that the tests which are still standing are pretty close to AGI; this is not a problem with Thane's list, though. He is correctly avoiding the failure mode I just pointed out. Unfortunately, this does mean that we may not be able to predict AGI is imminent until the last moment. That is a consequence of the black-box nature of LLMs and our general confusion about intelligence.

**ErickBall:** Why on earth would Pokémon be AGI-complete?

**Dylan Richardson:** Some people here seem to think that motivated reasoning is only something that people who want an outcome do, meaning that people concerned about doom and catastrophe can't possibly be susceptible. This is a mistake. Everyone desires vindication. No one wants to be the guy who was so cautious that he fails to be praised for his insight. This drives people towards favoring extreme outcomes, because extreme views are much more attention-grabbing, and a chance to be seen as right feels a lot better than being wrong feels bad (it's easy to avoid fault for false predictions and claim credit for true ones). Obviously, this is just one possible bias; maybe Daniel and others with super-short timelines are still very well calibrated. But it bears consideration.

**Daniel Kokotajlo:** Not only is that just one possible bias, it's a less common bias than its opposite. Generally speaking, more people are afraid to stick their necks out and say something extreme than are actively biased towards doing so. Generally speaking, being wrong feels more bad than being right feels good. There are exceptions; some people are contrarians, for example (and so it's plausible I'm one of them), but again, talking about people in general, the bias goes in the opposite direction from what you say.

**Seth Herd:** Definitely. Excellent point. See my short bit on motivated reasoning, in lieu of the full post I have on the stack that will address its effects on alignment research. I frequently check how to correct my timelines and takes based on potential motivated reasoning effects for myself.
The result is usually to broaden my estimates and add uncertainty, because it's difficult to identify which direction MR might've been pushing me during all of the mini-decisions that led to forming my beliefs and models. My motivations are many, and which of them happened to be contextually relevant at key decision points is hard to guess. On the whole, I'd have to guess that MR effects are on average larger for long timelines and low p(doom)s. They both allow us to imagine a sunny near future, and to work on our preferred projects instead of panicking and having to shift to work that can help with alignment if AGI happens soon. Sorry, this is worth a much more careful discussion; that's just my guess in the absence of pushback.

**Thane Ruthenis:** Yup, the situation is somewhat symmetrical here; see also the discussion regarding which side is doing the sailing-against-the-winds-of-evidence. My "tiebreaker" there is direct empirical evidence from working with LLMs, including attempts to replicate the most impressive and concerning claims about them. So far, this source of evidence has left me thoroughly underwhelmed.

**Rafael Harth:** Can confirm that I'm one of these people (and yes, I worry a lot about this clouding my judgment).

**yo-cuddles:** Definitely! However, there is more money and "hype" in the direction of wanting these to scale into AGI. Hype and anti-hype don't cancel each other out: if someone invests a billion dollars into LLMs, someone else can't spend negative one billion and have it cancel out. The billion-dollar spender is the one moving markets and getting a lot of press attention. We have Yudkowsky going on Destiny, I guess?

**johnkclark:** I agree. I think some people are whistling past the graveyard.

**johnswentworth:** Noting for the sake of later evaluation: this rough picture matches my current median expectations. Not very high confidence; I'd give it roughly 60%.

**Cole Wyeth:** I give it ~70%, with caveats:

> "Maybe a slight tweak to the LLM architecture, maybe a completely novel neurosymbolic approach."

It won't be neurosymbolic. Also, I don't see where the 2030 number is coming from. At this point my uncertainty is almost in the exponent again. Seems like decades is plausible (maybe <50% though). It's not clear that only one breakthrough is necessary.

**Vladimir_Nesov:** Without an intelligence explosion, it's around 2030 that scaling through increasing funding runs out of steam and slows down to the speed of chip improvement. This slowdown happens around the same time (maybe 2028-2034) even with a lot more commercial success (if that success precedes the slowdown), because scaling faster takes exponentially more money. So there's more probability density of transformative advances before ~2030 than after, to the extent that scaling contributes to this probability. That's my reason to see 2030 as a meaningful threshold; Thane Ruthenis might be pointing to it for different reasons. It seems like it should certainly be salient for AGI companies, so a long-timelines argument might want to address their narrative up to 2030 as a distinct case.

**Auspicious:** I also found that take very unusual, especially when combined with this: the last sentence seems extremely overconfident, especially combined with the otherwise bearish conclusions in this post. I'm surprised no one else has mentioned it.
**Cole Wyeth:** Yeah, I agree. Overall I agree pretty closely with Thane about LLMs, but his final conclusions don't seem to follow from the model presented here.

**Kaj_Sotala:** I think I'm also around 60-70% for the rough overall picture in the OP being correct.

**Thane Ruthenis:** I'm at ~80%, for comparison.

**p.b.:** Same here.

**Stephen McAleese:**

> "Maybe a slight tweak to the LLM architecture, maybe a completely novel neurosymbolic approach."

I think you might be underestimating the power of incremental, evolutionary improvements over time, where near-term problems are constantly solved and this leads to gradual improvement. After all, human intelligence is the result of gradual evolutionary change and increasing capabilities over time. It's hard to point to a specific period in history where humans achieved general intelligence. Currently LLMs are undoubtedly capable at many tasks (e.g. coding, general knowledge) and much more capable than their predecessors. But it's hard to point at any particular algorithmic improvement or model and say that it was key to the success of modern LLMs. So I think it's possible that we'll see more gradual progress and tweaks on LLMs that lead towards increasingly capable models and eventually yield AGI. Eventually you could call this progress a new architecture, even though all the progress is gradual.

**Thane Ruthenis:** I don't think that's how it works. Local change accumulating into qualitative improvements over time is a property of continuous(-ish) search processes, such as gradient descent and, indeed, evolution. Human technological progress is instead a discrete-search process. We didn't invent the airplane by incrementally iterating on carriages; we didn't invent the nuclear bomb by tinkering with TNT.

The core difference between discrete and continuous search is that... for continuous search, there must be some sort of "general-purpose substrate" such that (1) any given object in the search-space can be defined as some parametrization of this substrate, and (2) this substrate then allows a way to plot a continuous path between any two objects such that all intermediate objects are also useful. For example:

- For evolution, it's the genome: you could move from any organism to any other organism by doing incremental DNA adjustments, and the in-between organisms must be competitive.
- For ML, it's the model's parameters: for any two programs that can be implemented on a given architecture, you can plot the path from one of them to the other, and this path is followed if it's the path of gradually…

**Thomas Kwa:** A continuous manifold of possible technologies is not required for continuous progress. All that is needed is for there to be many possible sources of improvements that can accumulate, and for these improvements to be small once low-hanging fruit is exhausted.

Case in point: the nanogpt speedrun, where the training time of a small LLM was reduced by 15x using 21 distinct innovations which touched basically every part of the model, including the optimizer, embeddings, attention, other architectural details, quantization, hyperparameters, code optimizations, and PyTorch version. Most technologies are like this, and frontier AI has even more sources of improvement than the nanogpt speedrun, because you can also change the training data and hardware.
It's not impossible that there's a moment in AI like the invention of lasers or the telegraph, but this doesn't happen with most technologies, and the fact that we have scaling laws somewhat points towards continuity, even as other things, like small differences being amplified in downstream metrics, point to discontinuity. Also see my comment here on a similar topic. If you think generalization is limited in the current regime, try to create AGI…

**p.b.:** Because these benchmarks are all in the LLM paradigm: single input, single output from a single distribution. Or they are multi-step problems on rails. Easy verification makes for benchmarks that can quickly be cracked by LLMs. Hard verification makes for benchmarks that aren't used.

One could let models play new board/computer games against average humans: video/image input, action output. One could let models offer and complete tasks autonomously on freelancer platforms. One could enrol models in remote universities and see whether they autonomously reach graduation.

It's not difficult to come up with hard benchmarks for current models (and these are not close to AGI-complete). I think people don't do this because they know that current models would be hopeless at benchmarks that actually aim for their shortcomings (agency, knowledge integration + integration of sensory information, continuous learning, reliability, ...).

**Thomas Kwa:** Agree, this is one big limitation of the paper I'm working on at METR. The first two ideas you listed are things I would very much like to measure, and the third is something I would like to measure but is much harder than any current benchmark, given that university takes humans years rather than hours. If we measure it right, we could tell whether generalization is steadily improving or plateauing.

**johnswentworth:** I think you should address Thane's concrete example: that seems to me a pretty damn solid knock-down counterargument. There were no continuous language model scaling laws before the transformer architecture, and not for lack of people trying to make language nets.

**Erik Jenner:**

> There were no continuous language model scaling laws before the transformer architecture

https://arxiv.org/abs/1712.00409 was technically published half a year after transformers, but it shows power-law language model scaling laws for LSTMs (several years before the Kaplan et al. paper, and without citing the transformer paper). It's possible that transformer scaling laws are much better, I haven't checked (and perhaps more importantly, transformer training lets you parallelize across tokens); just mentioning this because it seems relevant for the overall discussion of continuity in research.

I also agree with Thomas Kwa's sibling comment that transformers weren't a single huge step. Fully-connected neural networks seem like a very strange comparison to make; I think the interesting question is whether transformers were a sudden single step relative to LSTMs. But I'd disagree even with that: attention was introduced three years before transformers and was a big deal for machine translation. Self-attention was introduced somewhere between the first attention papers and transformers. And the transformer paper itself isn't atomic, it consists of multiple ideas – replacing RNNs/LSTMs with …

**Thomas Kwa:** Though the fully-connected -> transformers transition wasn't infinitely many small steps, it definitely wasn't a single step.
We had to invent various sub-innovations like skip connections separately, progressing from RNNs to LSTMs to GPT/BERT-style transformers to today's transformer++. The most you could claim as a single step is LSTM -> transformer. Also, if you graph perplexity over time, there's basically no discontinuity from introducing transformers, just a possible change in slope that might be an artifact of switching from the purple to the green measurement method.

The story looks more like transformers being better able to utilize the exponentially increasing amounts of compute that people started using just before their introduction, which caused people to invest more in compute and other improvements over the next 8 years. We could get another single big architectural innovation that gives better returns to more compute, but I'd give a 50-50 chance that it would be only a slope change, not a discontinuity. Even conditional on a discontinuity, it might be pretty small. Personally, my timelines are also short enough that there is limited time for this to happen before we get AGI.

**Thane Ruthenis:** This argument still seems to postdict that cars were invented by tinkering with carriages and horse-breeding, spacecraft by tinkering with planes, refrigerators by tinkering with cold cellars, et cetera.

If you take a snapshot of the best technology that does X at some time T, and trace its lineage, sure, you'll often see a procession of iterative improvements on some concepts and techniques. But that line won't necessarily pass through the best-at-X technologies at times from 0 to T - 1. The best personal transportation method was horses, then cars. Cars were invented by iterating on preceding technologies and putting them together; but horses weren't involved. Similarly for the best technology at lifting a human being into the sky, the best technology for keeping food cold, etc.

I expect that's the default way significant technological advances happen. They don't come from tinkering with the current-best-at-X tech. They come from putting together a bunch of insights from different or non-mainstream tech trees, and leveraging them for X in a novel way. And this is what I expect for AGI: it won't come from tinkering with LLMs, it'll come from a continuous-in-retrospect, surprising-in-advance contribution from some currently-disfavored line(s) of research.

(Edit: I think what I would retract, though, is the point about there not being a continuous manifold of possible technological artefacts. I think something like "the space of ideas the human mind is capable of conceiving" is essentially it.)

**Thomas Kwa:** I think we have two separate claims here:

1. Do technologies that have lots of resources put into their development generally improve discontinuously or by huge slope changes?
2. Do technologies often get displaced by technologies with a different lineage?

I agree with your position on (2) here. But it seems like the claim in the post that sometime in the 2030s someone will make a single important architectural innovation that leads to takeover within a year mostly depends on (1), as it would require progress within that year to be comparable to all the progress from now until that year. Also, you said the architectural innovation might be a slight tweak to the LLM architecture, which would mean it shares the same lineage. The history of machine learning seems pretty continuous with respect to advance prediction.
In the Epoch graph, the line fit on the loss of the best LSTM up to 2016 sees a slope change of less than 2x, whereas a hypothetical innovation that causes takeover within a year, with not much progress in the intervening 8 years, would be ~8x. So it seems more likely to me (conditional on 2033 timelines and a big innovation) that we get some architectural innovation which has a moderately different l…

**Thane Ruthenis:** Indeed, and I'm glad we've converged on (2). But... on second thought, how did we get there? The initial disagreement was how plausible it was for incremental changes to the LLM architecture to transform it into a qualitatively different type of architecture. It's not about continuity-in-performance, it's about continuity-in-design-space.

Whether finding an AGI-complete architecture would lead to a discontinuous advancement in capabilities, to FOOM/RSI/a sharp left turn, is a completely different topic from how smoothly we should expect AI architectures' designs to change. And on that topic, (a) I'm not very interested in reference-class comparisons as opposed to direct gears-level modeling of this specific problem, and (b) this is a bottomless rabbit hole/long-standing disagreement which I'm not interested in going into at this time.

That's an interesting general pattern, if it checks out. Any guesses why that might be the case? My instinctive guess is that new-paradigm approaches tend to start out promising-in-theory but initially very bad; people then tinker with prototypes, and the technology becomes commercially viable the moment it's at least marginally better than the previous-paradigm SOTA. Which is why there's an apparent performance-continuity despite a lineage/paradigm-discontinuity.

**dysangel:**

> There's no equivalent in technology. There isn't some "general-purpose technological substrate" such that you can start with any technological artefact, slightly perturb it, iterate, and continuously reach any other technological artefact. Discontinuous/discrete changes are needed.

It sounds like you're almost exactly describing neural nets and backpropagation: a general-purpose substrate that you slightly perturb to continuously and gradually move towards the desired output. I believe that as we get better ideas for self-play, focusing on quality of thought processes over general knowledge, we'll see some impressive results. I think we're already seeing signs of this in the increasing quality of smaller models.

**Gunnar_Zarncke:** Evolution also deals with discrete units. Either the molecule replicates or it doesn't. Granted, physical evolution is more massively parallel, and the search space is smaller in biology, but the analogy should hold as long as the search space is large enough to hide the discreteness. And if tens of thousands of developers try hundreds of small alternatives, some few of them might hit the transformer.

**Thane Ruthenis:** I actually looked into that recently. My initial guess was that this was about "the context window" as a concept: it allows the model to keep vast volumes of task-relevant information around, including the outputs of its own past computations, without lossily compressing that information into a small representation (as with RNNs). I asked OpenAI's DR about it, and its output seems to support that guess. In retrospect, it makes sense that this would work better.
If you don't know what challenges you're going to face in the future, you don't necessarily know what past information to keep around, so a fixed-size internal state was a bad idea.

**Raphael Roche:** Exactly. The future is hard to predict, and the author's strong confidence seems suspicious to me. Improvements came fast in recent years: 2013-2014, word2vec and seq2seq; 2017, the transformer and GPT-1; 2022, CoT prompting; 2023, multimodal LLMs; 2024, reasoning models. Are they linear improvements or revolutionary breakthroughs? Time will tell, but to me there is no sharp frontier between increment and breakthrough. It might happen that AGI results from such improvements, or not. We just don't know. But it's a fact that human general intelligence resulted from a long chain of tiny increments, and I also observe that results on the ARC-AGI benchmark exploded with CoT/reasoning models (not just math or coding benchmarks). So, while 2025 could be a relative plateau, I wouldn't be so sure that the next years will be as well. To me, a confidence far from 50% is hard to justify.

**Seth Herd:** I agree with almost everything you've said about LLMs. I still think we're getting human-level AGI soonish. The LLM part doesn't need to be any better than it is.

A human genius with no one-shot memory (severe anterograde amnesia) and very poor executive function (ability to stay on task and organize their thinking) would be almost useless – just like LLMs are. LLMs replicate only part of humans' general intelligence. It's the biggest part, but it just wouldn't work very well without the other contributing brain systems. Human intelligence, and its generality (in particular our ability to solve truly novel problems), is an emergent property of interactions among multiple brain systems (or a complex property, if you don't like that term). See Capabilities and alignment of LLM cognitive architectures.

In brief, LLMs are like a human posterior cortex. A human with only a posterior cortex would be about as little use as an LLM (of course this analogy is imperfect, but it's close). We need a prefrontal cortex (for staying on task, "executive function"), a medial temporal cortex and hippocampus for one-shot learning, and a basal ganglia for making better decisions than just whatever first comes t…

**orangecelsius32:** This is an interesting model, and I know you acknowledged that progress could take years, but my impression is that this would be even more difficult than you're implying. Here are the problems I see, and I apologize in advance if this doesn't all make sense, as I am a non-technical newb.

Wouldn't it take insane amounts of compute to process all of this? LLM + CoT already uses a lot of compute (see: o3 solving ARC puzzles for $1 million). Combining this with processing images/screenshots/video/audio, plus tokens for incorporating saved episodic memories into working memory, plus tokens for the decision-making (basal ganglia) module, equals a lot of tokens. Can this all fit into a context window and be processed with the amount of compute that will be available? Even if one extremely expensive system could run this, could you have millions of agents running this system for long periods of time?

How do you train this? LLMs are superhuman at language processing due to training on billions of pieces of text. How do you train an agent similarly? We don't have billions of examples of a system like this being used to achieve goals. I don't think …
**Seth Herd:** I don't think this path is easy; I think immense effort and money will be directed at it by default, since there's so much money to be made by replacing human labor with agents. And I think no breakthroughs are necessary, just work in fairly obvious directions. That's why I think this is likely to lead to human-level agents.

1. I don't think it would take insane amounts of compute, but compute costs will be substantial. They'll be roughly like the costs for OpenAI's Operator; it runs autonomously, making calls to frontier LLMs and vision models essentially continuously. Costs are low enough that $200/month covers unlimited use (although that thing is so useless people probably aren't using it much, so the compute costs of o1 pro thinking away continuously are probably a better indicator; Altman said $200/mo doesn't quite cover the average, driven by some users keeping as many instances going constantly as they can). It can't all be fit into a context window for complex tasks, and it's costly even when the whole task would fit. That's why additional memory systems are needed. There are already context-window management techniques in play for existing limited agents. And RAG systems seem to already be adequate to serve as episodic memory; humans use far fewer memory "tokens" to accomplish complex tasks than the large amount of documentation stored in current RAG systems used for non-agentic, retrieval-assisted generation of answers to questions that rely on documented information. So I'd estimate something like $20-30 for an agent to run all day. This could come down a lot if you managed to have many of its calls use smaller/cheaper LLMs than whatever is the current latest and greatest.
2. Humans train themselves to act agentically by assembling small skills (pick up the food and put it in your mouth, run forward, look for tracks) into long-time-horizon tasks (hunting). We do not learn by performing RL on long sequences and applying the learning to everything w…

**Vladimir_Nesov:** But AI speed advantage? It's 100x-1000x faster, so years become days to weeks. Compute for experiments is plausibly a bottleneck that makes it take longer, but at genius human level, decades of human theory and software development progress (things not bottlenecked on experiments) will be made by AIs in months. That should help a lot in making years of physical time unlikely to be necessary to unlock more compute-efficient and scalable ways of creating smarter AIs.

**Seth Herd:** Yes, probably. The progression thus far is that the same level of intelligence gets more efficient – faster or cheaper. I actually think current systems don't really think much faster than humans – they're just faster at putting words to thoughts, since their thinking is more closely tied to text. But if they don't keep getting smarter, they will still likely keep getting faster and cheaper.

**p.b.:** I kinda agree with this as well. Except that it seems completely unclear to me whether recreating the missing human capabilities/brain systems takes two years or two decades or even longer. It doesn't seem to me to be a single missing thing, and for each separate step it holds that: that it hasn't been done yet is evidence that it's not that easy.

**Vladimir_Nesov:** I'm not sure raw-compute (as opposed to effective-compute) GPT-6 (10,000x GPT-4) by 2029 is plausible (without new commercial breakthroughs).
Nvidia Rubin is 2026-2027 (models trained on it 2027-2029), so a 2029 model plausibly uses the next architecture after that (though it's then more likely to come out in early 2030, not 2029). Let's say it's 1e16 FLOP/s per chip (BF16, 4x B200) with a time cost of $4/hour (2x H100); that is $55bn to train for 2e29 FLOPs, and 3M chips in the training system if it needs 6 months at 40% utilization (reinforcing the point that 2030 is a more plausible timing – 3M chips is a lot to manufacture). Training systems with H100s cost $50K per chip all-in to build (~BOM, not TCO), so assuming it's 2x more for the after-Rubin chips, the training system costs $300B to build. Also, a Blackwell chip needs 2 KW all-in (a per-chip fraction of the whole datacenter), so the after-Rubin chip might need 4 KW, and 3M chips need 12 GW.

These numbers need to match the scale of the largest AI companies. A training system ($300bn in capital, 3M of the newest chips) needs to be concentrated in the hands of a single company, probably purpose-built. And then at least $55bn of its time ne…
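*(A quick sketch for readers who want to check the arithmetic: this just re-derives the ~$55bn / ~3M-chip / ~$300bn / ~12 GW figures from the per-chip assumptions stated in the comment above; all inputs are that comment's assumptions, not independent estimates.)*

```python
# Back-of-envelope re-derivation of the figures in the comment above.
target_flops = 2e29          # assumed training compute for a "GPT-6"-scale run
flops_per_chip = 1e16        # assumed FLOP/s per after-Rubin chip (BF16)
utilization = 0.40           # assumed average utilization
cost_per_chip_hour = 4.0     # assumed $/hour per chip
train_seconds = 0.5 * 365 * 24 * 3600   # ~6 months of training

chip_seconds = target_flops / (flops_per_chip * utilization)
train_cost = chip_seconds / 3600 * cost_per_chip_hour
num_chips = chip_seconds / train_seconds
capex = num_chips * 2 * 50_000          # 2x the ~$50K/chip all-in H100 build cost
power_gw = num_chips * 4_000 / 1e9      # 4 KW per chip, all-in

print(f"training cost ~${train_cost / 1e9:.0f}bn")   # ~$56bn
print(f"chips needed  ~{num_chips / 1e6:.1f}M")       # ~3.2M
print(f"build cost    ~${capex / 1e9:.0f}bn")         # ~$317bn
print(f"power         ~{power_gw:.0f} GW")            # ~13 GW
```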
**Paragox:** For funding timelines, I think the main question increasingly becomes: how much of the economic pie could be eaten by narrowly superhuman AI tooling? It doesn't take hitting an infinity/singularity/fast takeoff for plausible scenarios under this bearish reality to nevertheless squirm through the economy at Cowen-approved diffusion rates and gradually eat insane $$$ worth of value, and therefore prop up 100b+ buildouts. OAI's latest sponsored psyop leak today seems right in line with bullet point numero uno under the real-world predictions: they are going to try and push 100-billion market eaters on us whether we, ahem, high-taste commentators like it or not. Perhaps I am biased by years of seeing big-numbers-detached-from-reality in FAANG, but I see the centaurized Senior SWE Thane alluded to easily eating up a 100-billion chunk[1] worldwide (at current demand, not even adjusting for the marginal-cost-of-software -> size-of-software-market relation!).

Did anyone pay attention to the sharp RLable improvements in the o3-in-disguise Deep Research model card, vs o1? We aren't getting the singularity, yes, but scaling RL on every verifiable code PR in existence (plus 10^? synthetic copies) seems increasingly likely to get us the junior/mid-level API (I hesitate to call it an agent) that will write superhuman commits for the ~90% of PRs that have well-defined and/or explicitly testable objectives. Perhaps then we will finally start seeing some of that productivity 10x'ing that Thane is presently and correctly skeptical of; only Senior+ need apply, of course. (Side note: in the vein of documenting predictions, I currently predict that in the big-tech market, at-scale Junior hiring is on its waning and perhaps penultimate cycle, with senior and especially staff compensation likewise soon skyrocketing, as every ~$1 mil/year quartet of supporting Juniors is replaced with a $300k/year Claude Pioneer subscription straight into an L6's hands.) I think the main danger…

**Vladimir_Nesov:** That's why I used the "no new commercial breakthroughs" clause; $300bn training systems by 2029 seem in principle possible both technically and financially without an intelligence explosion, just not with the capabilities legibly demonstrated so far. On the other hand, pre-training as we know it will end[1] in any case soon thereafter, because at ~the current pace a 2034 training system would need to cost $15 trillion (it's unclear if manufacturing can be scaled at this pace, and also what to do with that much compute, because there isn't nearly enough text data – but maybe pre-training on all the video will be important for robotics). How far RL scales remains unclear, and even at the very first step of scaling, o3 doesn't work as clear evidence, because it's still unknown whether it's based on GPT-4o or GPT-4.5 (it'll become clearer once there's an API price and more apples-to-apples speed measurements).

[1] This is of course a quote from Sutskever's talk. It was widely interpreted as saying it has just ended, in 2024-2025, but he never put a date on it. I don't think it will end before 2027-2028.

**Thane Ruthenis:** I did mean effective compute, yeah. Noted, though. (Always appreciate your analyses, by the way. They're consistently thorough and informative.)

**Raemon:** It seems good for me to list my predictions here. I don't feel very confident. I feel an overall sense of "I don't really see why major conceptual breakthroughs are necessary." (I agree we haven't seen, like, an AI do something like "discover actually significant novel insights".) This doesn't translate into me being confident in very short timelines, because the remaining engineering work (and "non-major" conceptual progress) might take a while, or require a commitment of resources that won't materialize before a hype bubble pops. But:

a) I don't see why novel insights or agency wouldn't eventually fall out of relatively straightforward pieces of: "make better training sets" (and training-set-generating processes); "do RL training on a wide variety of tasks"; "find some algorithmic efficiency advances that, sure, require 'conceptual advances' from humans, but of a sort of straightforward kind that doesn't seem like it requires deep genius".

b) Even if (a) doesn't work, I think "make AIs that are hyperspecialized at augmenting humans doing AI research" is pretty likely to work, and that + just a lot of money/attention generally going into the space seems to increase the likelihood of it…

**abramdemski:** This fits my bear-picture fairly well. Here are some details of my bull-picture:

GPT-4.5 is still a small fraction of the human brain, when we try to compare sizes. It makes some sense to think of it as a long-lived parrot that's heard the whole internet and then been meticulously reinforced to act like a helpful assistant. From this perspective, it makes a lot of sense that its ability to generalize datapoints is worse than human, and plausible (at least naively) that one to four additional orders of magnitude will close the gap.

Even if the pretraining paradigm can't close the gap like that due to fundamental limitations in the architecture, CoT is approximately Turing-complete. This means that the RL training of reasoning models is doing program search, but with a pretty decent prior (i.e. representing a lot of patterns in human reasoning). Therefore, scaling reasoning models can achieve all the sorts of generalization which scaling pretraining is failing at, in principle; the key question is just how much it needs to scale in order for that to happen. While I agree that RL on reasoning models is in some sense limited to tasks we can provide good feedback on, it seems like …
**RyanCarey:** Is GPT-4.5's ~10T parameters really a "small fraction" of the human brain's 80B neurons and 100T synapses?

**Vladimir_Nesov:** The human brain holds 200-300 trillion synapses. A 1:32 sparse MoE at high compute will need about 350 tokens/parameter to be compute-optimal.[1] This gives 8T active parameters (at 250T total), 2,700T training tokens, and 2e29 FLOPs (a raw-compute GPT-6 that needs a $300bn training system with 2029 hardware). There won't be enough natural text data to train it with, even when training for many epochs. The human brain clearly doesn't train primarily on external data (humans blind from birth still gain human intelligence), so there exists some kind of method for generating much more synthetic data from a little bit of external data.

[1] I'm combining the 6x lower-than-dense data efficiency of a 1:32 sparse MoE from the Jan 2025 paper with the 1.5x-per-1000x-compute decrease in data efficiency from Llama 3's compute-optimal scaling experiments, anchoring to Llama 3's 40 tokens/parameter for a dense model at 4e25 FLOPs. Thus 40 x 6 x 1.5, about 350. It's tokens per active parameter, not total.

**GoteNoSente:** Isn't it fairly obvious that the human brain starts with a lot of pretraining just built in by evolution? I know that some people make the argument that the human genome does not contain nearly enough data to make up for the lack of subsequent training data, but I do not have a good intuition for how apparently data-efficient an LLM would be that could train on a limited amount of real-world training data plus synthetic reasoning traces of a tiny teacher model that has been heavily optimised with massive data and compute (like the genome has). I also don't think that we could actually reconstruct a human just from the genome (I expect transferring the nucleus of a fertilised human egg into, say, a chimpanzee ovum and trying to gestate it in the womb of some suitable mammal would already fail for incompatibility reasons), so the cellular machinery that runs the genome probably carries a large amount of information beyond just the genome as well, in the sense that we need that exact machinery to run the genome. In many other species it is certainly the case that much of the intelligence of the animal seems hardwired genetically. The speed at which some animal acquires certain skills therefore does not tell us too much about the existence of efficient algorithms to learn the same behaviours from little data starting from scratch.

**Steven Byrnes:** I think parts of the brain are non-pretrained learning algorithms, and parts of the brain are not learning algorithms at all, but rather innate reflexes and such. See my post Learning from scratch in…