@Séb Krier: (I know I'm a stuck record) An important assumption in AI discourse is that sufficiently capable generalist *models* are the main event. Get the model smart enough, and it more or less does everything. Value creation, competitive advantage, and risk would all concentrate at the… pic.twitter.com/b5HXqxcAZv https://x.com/i/status/1999994684203045003 When I talk about where time horizons and scaling curves come from, reference the recent AXRP podcast where Daniel talks about the boiling point of water being a more contingent phenomenon than we generally recognize: a function of the complex dynamics between liquid water, water vapor, and dust particles in the air. Other scaling laws are probably aggregates of multiple factors, including feedback loops that we may not be recognizing. For example, the time-horizons curve may be a function of it taking longer and longer to find the rarer and rarer stumbling blocks for models. I like this model much more than the fixed hazard rate model as an explanation of the distribution of model success rates at tasks of different lengths. “Drop-in remote worker” is a frustrated response to the argument that it will take time for AI to diffuse into the economy. It’s ultimately a bet that AI will zoom past the need for adaptation faster than we will adapt to it. For now, that’s a bad bet. Maybe even more so in robotics. (In robotics, we have two related questions: will robots be humanoid, and will we need to refactor work to leverage them?) Reference https://blog.cosmos-institute.org/p/faster-horses. Intelligence is a concept that exists only in practice, not in theory. https://www.patreon.com/posts/january-3-2025-147255956 Intelligence is compression. But there is no perfect compression algorithm, nor even any universal compression algorithm. In some sense, there is no such thing as a compression algorithm: it is a fundamental law of information that there does not exist an algorithm which can compress all files. In fact, any compression algorithm must fall into one of two categories: either it never compresses anything at all, or there are some inputs which it makes larger rather than smaller (see the counting sketch below). And yet, both compression algorithms and intelligence clearly exist in some sense in the world. The resolution to this paradox is that both compression and intelligence exist only within some universe, some distribution of inputs. When you know something about the set of input files or world situations that you might encounter, you can create a compression algorithm, or a world-understanding and action-selection algorithm, that is more likely to do well on that specific distribution. How well you can do this depends in large part on how much information you have about that distribution, and of course on how well you're able to understand and leverage that information. But we never have perfect understanding; in some sense, we probably have only very little understanding, and we have only limited ability to make use of what information we do have, so everything we do is approximations and heuristics. And so the messy complexity of intelligence comes in as you try to select those heuristics.
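A minimal sketch of the counting argument behind the no-universal-compressor law above (my illustration, not from the source):

```python
# Pigeonhole argument: a lossless compressor must be injective, but there
# are 2**n bit-strings of length n and only 2**n - 1 bit-strings of any
# shorter length. So some length-n input must map to an output of length
# >= n: it either fails to compress, or grows.
n = 20
inputs_of_length_n = 2 ** n
strictly_shorter_outputs = sum(2 ** k for k in range(n))  # = 2**n - 1
print(inputs_of_length_n, strictly_shorter_outputs)       # 1048576 1048575
assert strictly_shorter_outputs < inputs_of_length_n
```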
All of this has a very meta nature. You can act, you can spend time thinking about what action to take, or you can work on improving your algorithms for thinking. You can collect more information; you can review which sorts of actions have worked better or worse; you can review which heuristics for thinking have been more or less valuable; you can work on your criteria for performing those evaluations. There is no ceiling to the potential meta stack, no known formula for how much energy you should expend at each level of the stack, and no known formula for how to direct the effort that you do expend at each level. Practices such as rationality, Buddhism, “listen to your gut”, etc. are all heuristics; even rationality is a heuristic. Perhaps in some theoretically ideal world where you could be perfectly rational, rationality would be more than just a heuristic, it would be the correct approach. However, we do not live in that world, and so even rationalists can only apply rationalism to a finite extent. At the beginning of the year, I was listening to Ezra Klein interview someone about Buddhism and meditation and rising above your feelings and emotions, and I got to thinking that when I encounter a new practice like this, I find it threatening or exhausting, because I seize upon the idea that the practice should be applied exhaustively and thoroughly. That is both an enormous amount of work, too much to contemplate, and inappropriate, because any of these practices is just another heuristic, to be incorporated into our heuristic toolkit. This has extensive implications both for how we build AIs, and for how we evaluate and assess them and understand their implications. Because intelligence is not something that exists in an ideal Platonic world, but can only exist in a particular universe with a particular distribution of situations and goals, there is no bedrock theory upon which we can construct, or even evaluate, any particular AI system or approach to building AI systems. Even to the extent that rigorous analysis might be possible, we are far away from having actually discovered how to carry it out. Our understanding of the universe that we live in, and our tools for working with that knowledge, are far too limited and clumsy. As a result, the development and assessment of artificial intelligence will be an inherently heuristic, experimental, messy, fumbling project. No particular new technique (having AIs evaluate their own work to look for errors they can self-correct, introducing a system of multiple agents critiquing one another, recording lessons and memories, etc.) will be a silver bullet. For any new technique or tool in the toolkit, there will be the question of when to apply it, how to apply it, how extensively to apply it, when to keep applying it, and when to give up and switch to another tool, or to combine two tools. The AI can never know for certain what to do, or how to figure out what to do, or how hard to work on figuring out what to do. In light of all this messiness and uncertainty and intractability, it is fascinating to contemplate why scaling laws exist. For an endeavor as messy and ill-defined and pre-paradigmatic as the development of AI, why should smooth curves exist anywhere on any graph? One thought I have is to question the extent to which this is actually true. We've all read repeatedly about learning curves and Wright's Law, which basically says that Moore's Law has played out in many, many technologies, not just transistor fabrication.
But a recent paper showed that this is not particularly true, or is only very loosely true: learning curves tend to wiggle all over the place and are not nearly as straight a line as Moore's Law. And in the specific example of Moore's Law, there are various factors that made it into a self-fulfilling prophecy; it exists because we believed it would exist and used it as our planning yardstick, rather than it being some law of nature. It would be interesting to go look at the various scaling law graphs and see how straight they really are (cf. the toy simulation below). Various forms of philosophical ethics, whether to treat principles as absolute, etc. Stereotypes, snap judgments, surface impressions: these are all limited and fallible, but they're also essential for getting through everyday life. It requires a judgment call to decide when to spend the effort to look past your snap impression, and a judgment call to know how much effort to spend on that judgment call, etc. System 1 versus System 2 thinking. Which of several conceptual frameworks to bring to a situation. Heuristics: think before you act, measure twice / cut once, etc. N Thoughts on AGI; Offcuts https://news.ycombinator.com/item?id=46408921 https://x.com/snewmanpv/status/2005329341845303612 https://www.lesswrong.com/posts/u6Lacc7wx4yYkBQ3r/insights-into-claude-opus-4-5-from-pokemon Anil Seth (@anilkseth) posted at 6:45 AM on Mon, Nov 10, 2025:"Cognition all the way down". Great to see this fine new paper from @robertchisciure & @drmichaellevin out now in Synthese - it introduces a new metric to quantify biological intelligence as search efficiency in multidimensional problem spaces https://t.co/YH7W3F9FtQ https://t.co/UKIULAD5wU(https://x.com/anilkseth/status/1987894412584566876?t=0JSx8bvFJGpJy6NR5Xh7kw&s=03) @Ethan Mollick: In general, all the chatbots seem to struggle with files in a way that CLI versions do not. Gemini will frequently confuse which nano banana image you are referring to in a conversation (the chain of thought shows it loses track) and ChatGPT often misplaces files that it makes https://x.com/i/status/2000413758729162889 Thought about AGI: for biological anchors, perhaps we should be looking at communications bandwidth rather than computation? It's easy to add more compute. https://x.com/i/status/1998778005129224412 From https://jasmi.news/p/42-notes-on-ai-and-work: “Diffusion lag” reflects a lack of product-market fit. Even AI optimists are still hitting practical roadblocks. That’s why detailed case studies are so much fun: physics, code security, running a restaurant at a small independent hotel. Our friendly hotel purveyor describes one such long-horizon task: “To replicate [chef] Hagai’s context, you’d need entire recipes, or maybe video of him preparing the foods; Toast sales data, or maybe video of the dining room; our hours; his calendar, featuring private events; communications among staff about what’s getting used for what; the CSVs for Baldor; the paper receipts for quick runs to Loeb’s; and maybe surveillance footage to capture exceptions.” https://gwern.net/ai-daydreaming https://x.com/1a3orn/status/1997056050403725373 https://www.dwarkesh.com/p/thoughts-on-ai-progress-dec-2025 “AI Agenda: AI’s ‘Split-Brain’ Problem” https://x.com/Jack_W_Lindsey/status/1993389056932339721 From https://www.strangeloopcanon.com/p/epicycles-all-the-way-down – “Epicycles” is a great description for some of the behavior these models are learning (and the way we’re layering on new training techniques).
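On the time-horizons curve from the top of this section, a toy simulation (my construction, not METR's model) contrasting a fixed hazard rate with heterogeneous hazards across tasks. With one constant hazard, success falls off exponentially with task length; when the per-task hazard varies (most tasks easy, a few containing rare stumbling blocks), the aggregate curve is much flatter at long horizons, i.e. the effective hazard rate declines with length.

```python
import random

random.seed(0)
N = 200_000
horizons = [1, 2, 4, 8, 16, 32, 64]

def survival(sample_lifetime):
    """Fraction of simulated tasks still succeeding beyond each horizon."""
    lifetimes = [sample_lifetime() for _ in range(N)]
    return {t: sum(l > t for l in lifetimes) / N for t in horizons}

# Fixed hazard: every task fails at the same constant rate (0.1 per step).
fixed = survival(lambda: random.expovariate(0.1))
# Mixture of hazards with the same mean 0.1: rare hard blockers dominate
# the tail, so the aggregate hazard declines as tasks get longer.
mixed = survival(lambda: random.expovariate(random.gammavariate(0.5, 0.2)))

for t in horizons:
    print(f"t={t:2d}  fixed={fixed[t]:.3f}  mixed={mixed[t]:.3f}")
```

The two curves agree at short horizons but diverge badly in the tail; checking which shape better fits the METR success-rate data would be one way to test the rare-stumbling-blocks story against the fixed-hazard story.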
Mike Knoop (@mikeknoop) posted at 11:32 AM on Tue, Nov 25, 2025:"scaling sucked out all the oxygen in the room, everyone converged to the same ideas" --> new ideas still needed!(https://x.com/mikeknoop/status/1993402375944679673?t=LWizqAJMjEshtTbDlsNbgg&s=03) Quintin Pope (@QuintinPope5) posted at 3:42 AM on Thu, Nov 20, 2025:It is genuinely weird how many complex dynamical feedback loops end up producing smooth exponentials as their aggregate outputs.(https://x.com/QuintinPope5/status/1991472282422899184?t=DqRlOyRAVVMmxdd9X6Go_Q&s=03) Machine Learning Street Talk (@MLStreetTalk) posted at 11:38 PM on Sat, Nov 15, 2025:Intelligence is not just compression.The future is much harder to know than the past, especially for the "objects of interest" (complex/adaptive systems)Creation, construction and composition are better words to use than compression.Compression is a great proxy for the(https://x.com/MLStreetTalk/status/1989961238839693788?t=Fu6VNj83BJCK7qwDh3hKNw&s=03) Why are centaurs a big win in coding but not in chess? The human brain, significantly, develops new circuits on the fly. We are not seeing AI being able to do that, but [my Superproductivity post] we are starting to see AI being able to capture new learnings in the form of written knowledge, new prompts, and new code. https://x.com/fchollet/status/1989340153114976598 https://x.com/HjalmarWijk/status/1985529956890530217 From my comments on Timothy Lee’s draft post on long context: [Timothy] Anthropic CEO Dario Amodei said that “there's no reason we can't make the context length 100 million words today, which is roughly what a human hears in their lifetime.” With any sort of back-of-the-envelope math, 100M for a human lifetime seems off. Average reading speed is around 250 words per minute; let's conservatively call that 300 tokens per minute. That's 36M tokens per 2000-hour work year (see the quick check below). We don't spend our whole lives reading, but we do spend a lot of time reading in one fashion or another, and quite a bit of the rest consuming audio information at not-drastically-lower rates. Especially as LLMs (at least currently) are a lot more verbose than people are. Critically, this includes being verbose in their chain-of-thought. Also, software engineering involves a lot of skimming – through code windows, terminal output, etc. A person might be presented with 1000 tokens of terminal output and get away with only reading 100 tokens of it, whereas an LLM would probably need to process all of it. [Timothy] In February 2025, a team of researchers at Adobe published research on a more difficult variant of the needle-in-a-haystack test. Here the “needle” was a sentence like “Yuki lives next to the Semper Opera House,” and the model would be asked “Which character has been to Dresden?” To answer this question, you need to know that the Semper Opera House is in Dresden. Leading language models do know this, so if you give them this challenge in a short prompt (a small “haystack”) they tend to get it right more than 90% of the time. But if you give them the same challenge in a larger “haystack” — for example, a 32,000-token prompt — performance drops dramatically. GPT-4o goes from 99% accuracy to 70%; Claude 3.5 Sonnet drops from 88% to 30%. 1. I wonder how well most people would do on this test. 2. A person given this task would likely skim back through the passage. LLMs are handicapped by not being able to do this (my point being that this handicap supports your thesis).
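A quick check of the reading-rate arithmetic above (the rates are this note's assumptions, not measurements):

```python
tokens_per_minute = 300         # ~250 words/min reading, rounded up to tokens
work_year_minutes = 2000 * 60   # one 2000-hour work year
per_year = tokens_per_minute * work_year_minutes
print(f"{per_year:,} tokens per work year")        # 36,000,000
print(f"{per_year * 40:,} over a 40-year career")  # 1,440,000,000
```

Even counting only reading-equivalent intake, a few decades lands an order of magnitude above 100M, which is why the lifetime figure seems off.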
https://x.com/emollick/status/1985507610963968447 How will we measure AI capabilities for multi-day tasks? Karnofsky on 80,000 Hours notes that AIs so far don't really seem to exhibit power-seeking in any non-trivial way, and he attributes this to the fact that we are not training them on very long tasks in environments where power-seeking pays off: they're not being given any training tasks where accumulating resources, hacking into computers to expand your computing capacity, or suborning the global or national dialogue to enact more AI-favorable policies would help; that's just far away from anything that shows up in their training. This suggests to me that so long as AIs continue to be poor at generalization, we're probably in good shape, because it will probably continue to be the case for quite a while that there's no training regime where they would specifically get to hone their techniques for, or even experience reward for, that kind of strategic power-seeking. So as long as they're not good at generalization, and so long as we're not incorporating into their training the kind of real-world feedback that would provide the opportunity for them to learn from attempts at power-seeking, then we're probably safe with regard to power-seeking. Andrej Karpathy (@karpathy) posted at 10:09 AM on Wed, Oct 01, 2025:Finally had a chance to listen through this pod with Sutton, which was interesting and amusing.As background, Sutton's "The Bitter Lesson" has become a bit of biblical text in frontier LLM circles. Researchers routinely talk about and ask whether this or that approach or idea(https://x.com/karpathy/status/1973435013875314729?t=l8ZRYSohCQ_g9t18XW18pA&s=03) https://x.com/dwarkesh_sp/status/1979259041013731752?t=_ICAkvN3N3W24PVdwS8kmA&s=03 https://x.com/karpathy/status/1979644538185752935?t=H0LxJvX5xMtiIqHt69rLpA&s=03 Statistics often make AI sound more impactful than it is. Ryan Greenblatt pointing out that “AI writing 90% of the code at Anthropic” may be only true of some teams, and in any case may not be 90% of the value of the code. Fin interview on the Cognitive Revolution: their agent is handling 60% of support tickets but not 60% of the workload. An example of someone actually leaning in to asking a model to do a thing it can't do, then analyzing the failures and what they say about missing capabilities: https://pca.st/episode/d9d8178e-9f7e-4102-bf2c-aa3e747bc188 From https://www.dwarkesh.com/p/andrej-karpathy: His experience writing nanochat is that models are bad at writing new or unusual code. They very strongly veer back toward standard libraries and standard coding styles; they're not great at new things. This causes him to be less optimistic about the potential for automation of AI R&D, and for the models to be discovering and developing new techniques and pushing the state of the art forward. It's interesting to contrast this with the claim from Anthropic that 90% of all their code internally is being written by LLMs. I guess we could interpret this as: maybe, even at a place like Anthropic, there's a lot of boilerplate code being written. Maybe “written by LLMs” is a little bit loose, and some of that includes autocomplete, which Andrej describes as actually working well for him.
It could be that the issue he experienced was not that he was doing something new in terms of the technical approach, but more just his code style: he was trying to do something lean and streamlined, with few dependencies. Maybe the models are fine at using the standard techniques in different combinations, and his problem was just that he was using non-standard techniques; and maybe it's really the different combinations that will be important in pushing the state of the art forward. By the same token, that suggests that even as the models are writing 90% of the code at Anthropic, they're not really supplying all that large a percentage of the cognitive labor involved. Cf. Intercom's Fin handling 60% of tickets but a smaller percentage of support workload. "They're not very good at code that has never been written before... which is what we're trying to achieve when we're building these models." One of the next big advances might be getting more than one bit of data per RL rollout: introspecting over where the model did well / poorly. ("Process supervision") Andrej talks about LLMs not having cultural knowledge. There's no sense in which an LLM can write a book for other LLMs to consume. This is in the context of the takeoff in human capabilities about 10,000 years ago that was based on the ability to accumulate cultural knowledge and cultural capital. I might argue that the process of accumulating cultural knowledge can be thought of as a multi-decade project, tying back to the METR task-time-horizon concept: we arguably won't have AGI until AI time horizons are measured in decades, because humans can pursue decades-long projects, and the interesting scenarios for AGI or ASI (where it's rapidly advancing its own development, developing novel energy sources, etc.) are things that constitute multi-decade projects. In particular, the AI R&D explosion as described in AI 2027 is, I think, fairly explicitly a century-long project. For that to ignite, AI maybe wouldn't have to be at the century-long horizon level, but it would have to be at a long enough horizon level that it can advance its own horizon faster than the scope of the project increases. We should always keep in mind that just as AI development may encounter unexpected barriers, it may also encounter unexpected leaps forward, and I should reiterate what I've said on a number of occasions: AI's superhuman strengths may compensate for its weaknesses. Jaggedness runs both ways. Maybe the labor market collapses when AI can adapt into new opportunities faster than people can. (Modulo demand for human labor specifically because it is human.) I think I already wrote about this in conversation with someone, maybe it’s already in the More Thoughts doc? Or maybe that was verbal. Trying out a thought, on the topic of the endless debate as to whether AI will lead to permanent mass unemployment (because it will eventually do ~everything), or people will find new, plausibly-better jobs (because this is what has ~always happened before). Here's my thought: as the world has evolved over the centuries, new opportunities continually emerge, and those niches are initially filled by people. Eventually, some of those new jobs go away as automation / mechanization catches up, but new niches keep opening up and people – being far more adaptable than machines – get there first.
For instance, it took many decades from the invention of the telephone to the invention of the automated switching systems, meaning that there were several generations in which someone could make a career as a switchboard operator. In this framing, *the tipping point will occur if/when AI becomes more adaptable than people*. If that happens, AI will fill the new niches first, leaving no window for people to occupy new "jobs" as they emerge. I could note, in contrast, that the web created an opportunity for Internet research librarians, but it was filled by Google, not people. [assuming AI isn’t supply constrained, and setting aside demand for human labor specifically because it is human – which needs to be distinguished from objective quality issues] Chris Barber (GG discussion on WhatsApp): Kind of like stocks and flows in systems Stocks = who does it better today Flow = who adapts faster/better My naive expectation is that humans and ai will drift and specialize in the direction of wherever they adapt better I expect there’ll be many things where humans adapt better for a long time, though I’m unsure on the quantity of roles in those areas It’s also a way to assess impacts on people and points to things that would help society adapt i.e. people will do well proportional to how well they adapt to new supply constrained high demand things and also a way of measuring the kind of economic challenge level of ai is something like to what extent it adapts faster than people in what portion of valuable areas (i can imagine some kind of chart for this) when ai systems gets better at new things slowly, less adaptation needed from humans, less intense if it’s adapting to most of the obvious new reskilling directions quickly, feels much more intense this also intersects with the jaggedness debate the more jagged agi is, the more areas where humans will be better and/or adapt faster Daniel Rock (GG discussion on WhatsApp): Some great papers on this: - Autor, Levy, and Murnane (2003) - Autor and Thompson (2024) - Acemoglu and Restrepo (2016) - the race between man and machine (they have a lot of papers) - Ide and Talamás (2025) “Danie”: I think that’s really compelling, and I agree — if AI becomes more adaptable, that’ll generate new ideas about how to implement itself faster, as you describe. But I’d build on that and say adaptability, when paired with scale and access, is where it really takes off. AI isn’t just innovating faster; it’s also diffusing across industries and geographies far more quickly than previous technologies. In other words, it’s accelerating both the creation and the spread of its own applications. For example, historically, technological change was sectoral and staggered — the telephone or mechanisation hit discrete industries and localities over decades. Labour markets had temporal slack: time for the reallocation, retraining, and institutional adjustment you mention. What’s distinctive about AI is that it’s a multi-sectoral adoption shock with widening but uneven productivity impacts — unfolding across legal, creative, administrative, analytical, etc domains at once. So adaptability, when coupled with that simultaneity, collapses the temporal and spatial buffers that usually cushion technological shocks.
Even if the long-run adaptive logic still holds, the adjustment process is no longer sequential but synchronous (or at the very least, in rapid cyclical waves) — and that compression of time, rather than simply the scale of change, is what makes this transition historically different. I could note that personal computers displaced typists, but not teachers; factory automation displaced assembly line workers, but not construction workers. https://www.henrikkarlsson.xyz/p/wordless-thought. Relates to “neuralese”. Also, I think it suggests that there are important things going on in the human brain that do not fit into the linear structure of an LLM. And I wonder whether current LLMs operate at a significant handicap by not being able to specialize. https://ai-frontiers.org/articles/agis-last-bottlenecks keys off of a list of components of AGI. I think the list is incomplete. Review my post on the Case of the Missing Agent; this doesn’t cover higher-level skills such as prioritization and maintaining coherence over long time periods. You could argue that this falls under the memory and reasoning bullets, but I would argue that this under-weighs the depth / breadth / complexity of the remaining areas to be covered. More generally, I think this ties back to my idea that we lack vocabulary (or at least have only a sparse vocabulary) for the things that current models are missing. https://www.dwarkesh.com/p/andrej-karpathy He suggests that pre-training an LLM infuses it with both knowledge and intelligence (core cognitive structures that allow it to then learn new things through reinforcement learning), and that the models might actually be better at learning if they didn't have all that knowledge, because they use it as a crutch. I wonder whether it's really true that they're acquiring core intelligence separate from facts, or if they just have billions of little shards of cognitive concepts entangled with the detailed knowledge, and this is part of the reason that they struggle to generalize. Ethan Mollick (@emollick) posted at 11:36 AM on Tue, Oct 21, 2025:Papers like this show that are a lot of potential pathways forward on some of the hardest outstanding problems in AI. The amount of low-hanging fruit suggests that AI lab R&D might continue to find ways around barriers to continual improvement of AI models.(https://x.com/emollick/status/1980704687377486182?t=85fVurBCtsQddzOHOlqOtA&s=03) Ethan Mollick (@emollick) posted at 4:47 AM on Tue, Oct 21, 2025:Looking back at an exponentially improving technology & you will see how momentum led to R&D which overcame tech barriersThe fact that reasoners were developed at exactly the moment AI pre-training slowed is how Moore’s Law works, too: new technique appear to maintain the trend https://t.co/9rAySHO5L2(https://x.com/emollick/status/1980601776710525103?t=tLJFmxj8M3wZV_-RlRTtmA&s=03) Toby Ord (@tobyordoxford) posted at 0:12 PM on Mon, Oct 20, 2025: New post on RL scaling: Careful analysis of OpenAI’s public benchmarks reveals RL scales far worse than inference: to match each 10x scale-up of inference compute, you need 100x the RL-training compute. The only reason it has been cost-effective is starting from a tiny base.
🧵 https://t.co/ZwhDegc4NO (https://x.com/tobyordoxford/status/1980351353227768109?t=KevNIGnuT-Kq5Lt87kx1Kg&s=03) Toby Ord (@tobyordoxford) posted at 0:49 PM on Fri, Oct 03, 2025:So it looks like most of the gains are coming from the ability to spend more compute on each answer rather than from better ability to reason for the same token budget.This shift has big implications for AI business, governance, and risk.https://t.co/X2EocaaZjQ13/(https://x.com/tobyordoxford/status/1974200193504719049?t=fKg6dK3nB5WEYbOHr052sg&s=03) From https://thezvi.substack.com/p/bending-the-curve: A fun suggestion was to imagine LLMs talking about how jagged human capabilities are. Look how dumb we are in some ways while being smart in others. I do think in a meaningful sense LLMs and other current AIs are ‘more jagged’ than humans in practice, because humans have continual learning and the ability to patch the situation and also route the physical world around our idiocy where they’re being importantly dumb. So we’re super dumb, but we try to not let it get in the way. Reactions to https://www.dwarkesh.com/p/thoughts-on-sutton: The bitter lesson says that you want to come up with techniques which most effectively and scalably leverage compute. → Clearly, the wins come from leveraging compute. But “effectively” is as important here as “scalably”. Do we understand what the dividing line is between effective and ineffective uses of compute? (Not sure whether this directly relates to anything Dwarkesh said) Someone, I think a speaker on MLST, talked about how LLMs / deep learning models develop horrible, “fractured”, spaghetti-code representations. Perhaps that’s because we throw them into the deep end when training: they (I presume) come out of the gate learning obscure facts and other things that small children don’t try to learn (?), whereas children master broad basic concepts and then build on them. Could be a combination of what they encounter (e.g. school curriculum, “baby talk”) and natural learning instinct (babies and children tune out things that are over their heads). The agent is in no substantial way learning from organic and self-directed engagement with the world. Having to learn only from human data (an inelastic, hard-to-scale resource) is not a scalable use of compute. What these LLMs learn from training is not a true world model (which tells you how the environment changes in response to different actions). Rather, they are building a model of what a human would say next. And this leads them to rely on human-derived concepts. To that last paragraph: people talk about how in the limit, an ASI should basically be able to derive anything from anything, a perfect predictor would need a model of the world in which the humans it is predicting reside, etc. But I don’t see anyone engage with the computational efficiency of that. It might be a valid point in the limit, but it might also be an inefficient, perhaps insupportably inefficient, way of building true world models. LLMs aren’t capable of learning on-the-job, so we’ll need some new architecture to enable continual learning. And once we have it, we won’t need a special training phase — the agent will just learn on-the-fly, like all humans, and indeed, like all animals. This new paradigm will render our current approach with LLMs obsolete. I like to talk about how advancing technology erodes the gray areas that allow society to function, such as digital media removing the gray area around recording shows off of TV.
Continuous-learning AIs will erode the gray area between employees developing their expertise on the job, vs. stealing trade secrets when they move to another employer. I tried to ask Richard a couple of times whether pretrained LLMs can serve as a good prior on which to accumulate the experiential learning (aka do the RL) which will lead to AGI. On the one hand, it seems to be generally accepted that pretraining allows LLMs to achieve nonzero scores on various RL challenges and that this is a necessary precondition to further hill climbing. On the other hand, per some of my thoughts above, perhaps pretraining sets them down a dead-end trail where they’re burdened by a mess of advanced concepts instead of a cleaner foundation of childhood concepts. The accumulation of knowledge over tens of thousands of years has clearly been essential to humanity’s success. In any field of knowledge, thousands (and likely millions) of previous people were involved in building up our understanding and passing it on to the next generation. We didn’t invent the language we speak, nor the legal system we use, nor even most of the knowledge relevant to the technologies in our phones. This process is more analogous to imitation learning than to RL from scratch. This is an important point. However, we don’t dump this accumulated mass of intellectual heritage on two-year-olds; we are very thoughtful about the order in which we present the information (and we do a lot of work to digest and organize it, likely supplemented by childhood instincts about what information to focus on). Are we literally predicting the next token (like an LLM) in order to do this cultural learning? No, and so even imitation learning for humans is not like supervised learning for AI. But neither are we running around trying to collect some well-defined scalar reward. No ML learning regime perfectly describes human learning. Aren’t we? The reward might be an instinctual mix of things like “satisfying curiosity” and “mastering new achievements”, but it may exist as a fairly coherent thing? Being able to continuously learn from the environment in a high throughput way is obviously necessary for true AGI. And it clearly doesn’t exist with LLMs trained on RLVR. But there might be some relatively straightforward ways to shoehorn continual learning atop LLMs. For example, one could imagine making SFT a tool call for the model. So the outer loop RL is incentivizing the model to teach itself effectively using supervised learning, in order to solve problems that don’t fit in the context window. I’m genuinely agnostic about how well techniques like this will work—I’m not an AI researcher. But I wouldn’t be surprised if they basically replicate continual learning. Models already demonstrate something resembling human continual learning within their context windows. The fact that in-context learning emerged spontaneously from the training incentive to process long sequences suggests that if information could flow across windows longer than the current context limit, models could meta-learn the same flexibility they already show in-context. This could be a path to AGI using an LLM architecture. But I’m somewhat dubious. LLMs might be too sample-inefficient. It would be very interesting to see whether models can learn to do a good job of deciding what information is valuable enough to fine-tune on (is it feasible for a gradient descent training process to figure this out?).
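A toy sketch of the “SFT as a tool call” idea above. Everything here is invented for illustration: the agent is just a lookup table, and the “fine-tune” tool call is reduced to writing into a persistent store that stands in for a weight update. The point is only the control flow, where teaching yourself is an action that an outer RL loop could learn to reward.

```python
import random

class ToyAgent:
    def __init__(self):
        self.memory = {}              # stand-in for model weights

    def sft(self, examples):          # "SFT as a tool call"
        self.memory.update(examples)  # stand-in for a gradient update

    def answer(self, question):
        return self.memory.get(question)

def episode(agent, corpus, quiz, teach_first):
    # teach_first is the policy choice an outer RL loop would learn:
    # spend a step teaching yourself before attempting the task.
    if teach_first:
        agent.sft(corpus)
    return sum(agent.answer(q) == a for q, a in quiz) / len(quiz)

random.seed(0)
corpus = {f"q{i}": f"a{i}" for i in range(10)}
quiz = [(f"q{i}", f"a{i}") for i in random.sample(range(10), 5)]
print("reward without self-teaching:", episode(ToyAgent(), corpus, quiz, False))
print("reward with self-teaching:   ", episode(ToyAgent(), corpus, quiz, True))
```

In a real system the quiz would be tasks too big for the context window, and the interesting question (flagged above) is whether gradient descent can learn what is worth distilling into weights.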
Is in-context learning robust enough to supply everything that LLMs are missing, if it could be extended across sessions? https://www.interconnects.ai/p/thoughts-on-the-curve At The Curve, Ted Chiang said that the thing that separates human art from machine slop is a person making decisions. I didn’t get the chance to ask how that would apply to someone putting in laborious effort to, for instance, craft a cohesive movie out of 8-second Sora videos. At The Curve, Gary Marcus said that broad AI never beats narrow AI. I asked about coding models, and he said that they’re an interesting middle ground involving lots of “data augmentation”, which might not be applicable to other areas. I can probably think of lots of other examples of things that LLMs can do but have never been addressed using narrow AI. Teortaxes▶️ (DeepSeek 推特🐋铁粉 2023 – ∞) (@teortaxesTex) posted at 1:29 AM on Sat, Sep 27, 2025: I think @RichardSSutton is spiritually right, but offers non-constructively framed arguments (much like Yann). Here are some of my incompetent thoughts on the matter. 0) almost none of this is relevant to mundane economic and strategic questions. However ineptly and wastefully https://t.co/qyMUb0c2Oh (https://x.com/teortaxesTex/status/1971854577755377748?t=V7JV-OHm5wSQE_mpk6SlGw&s=03) From the video model section of https://importai.substack.com/p/import-ai-430-emergence-in-video: video models exhibit generalization, but my suspicion is that it will be as shallow and narrow and unsatisfying as we see for LLMs. Might add some more thoughts on the unsatisfying nature of powerful LLM capabilities. Over and over and over and over again, we mistake benchmark scores for utility, and rapid progress for short timelines. Steve Newman (@snewmanpv) posted at 9:40 AM on Tue, Sep 23, 2025:Why do coding agents cheat on unit tests (e.g. by modifying the test to always return true)?The obvious answer, "because this would be rewarded during RL training", only makes sense if the RL environment is stupid enough to be fooled by hacked tests. Do we know whether this is(https://x.com/snewmanpv/status/1970528715403854327?t=Mj_RiH2oeiOblZ7Xn1QrSA&s=03) https://x.com/snewmanpv/status/1970528715403854327 I cannot wrap my head around the idea that context lengths have been growing at 30x per year. How does this fit into cost and the way that models are being used in practice? (somewhat duplicative of things I said last time, but it’s worth coming at this from multiple framings) AI already has all of the cognitive skills that laypeople have words for. It’s difficult to even pick out what’s missing, let alone describe it. CGI also had the property that there was a point where we were still in the uncanny valley, but it was difficult to say exactly what was missing. Many other software challenges have not had that property. > Am also interested in how much folks here buy into the sandbagging theory that several of the cos are holding back on public announcements & using access to ever more powerful models/compute for internal competitive advantage? I haven't heard any hints that the big labs are sandbagging. It's certainly possible that I wouldn't get a hint, but I'm inclined to think that's not happening. I do think that to some extent, we're not seeing the most powerful models the labs could create today, for one or more of the following reasons: The models are still being tuned / tested. They're not releasing their biggest models (and plausibly not even developing them) because they wouldn't have enough compute to serve them.
They're limiting how "Pro" the models are (how long / how much in parallel they think) because it's too expensive / too much compute. And maybe they give themselves higher limits internally. > Is it possible that they're also holding models bc they're not yet safe/aligned enough to not be a legal/brand/general risk? Hmm, possibly? I haven't heard that either, but it does seem conceivable. Very plausibly, a substantially-better-than-GPT5 model would trigger a level of required safeguards (under the big labs' published safety frameworks) that they're not ready to deliver, and they might not want to bend those frameworks far enough to skirt the issue. From https://pca.st/episode/27144be7-bcae-4e16-b1e4-40302c956a1a: AI labs are paying Surge AI to build RL environments. Seems like there’s a ton of this going on everywhere (Mechanize + many many other companies). AI labs are paying Surge AI to build ARC AGI style problems. To what extent does that represent benchmaxxing (the Surge CEO says there’s a ton of this, also over-optimization for LMArena, which is a very low-quality signal), vs. building genuinely useful capabilities? Ethan Mollick (@emollick) posted at 8:42 AM on Wed, Sep 17, 2025:A big issue with today’s agent implementation is that they don’t ask questions, even when the thinking trace says the AI believes more information is required.Many disappointing results would be solved by just asking for clarification when needed, especially as task time grows.(https://x.com/emollick/status/1968339804975948274?t=jgv5dYIjriq2bhDdCzLajQ&s=03) → they should know when to ask the user for information, also when to seek it from the environment – which deep research models have already learned to do? (check whether I already wrote about this) When I succeed or fail at a task, I record much more than one bit of information. I use reasoning to identify the specific reason(s) that I succeeded (at a difficult task where failure was likely) or failed, and update heavily on those. (Don’t touch a hot stove twice.) Watching John squeeze the water bag from the side, realizing this is a much better way to fill it, updating strongly from one sample. LLM training doesn't distinguish these especially good examples. If you asked GPT-5 whether it should spend a fifth hour wrestling with the spreadsheet, I have to think it would have said no. LLMs have weird disconnects in what information they can apply when. They are deeply non-reflective about their own state and actions. (Kind of like the aliens in Footfall, scaffolded into action before they were ready?) Also, they’re non-introspective in the way they learn: they don't lean into understanding their own successes and failures, curiosity, etc. They learn very entangled concepts, perhaps because they don’t go through the early childhood stage of learning simple concepts? (How much of that is a function of the environment an early child is exposed to, and how much is a function of what they choose to explore and attend to?) There's some deep concept my brain keeps wanting to explore, about the nature of intelligence, intelligence as compression, other models for intelligence, and how to quantify it. Something about the efficiency with which a system can find insights and ways of compressing a data set,
as a function also of the nature of the data set; and something about the fractal structure of this problem, which probably has something to do with scaling laws. And there's a nearby timeline where I start having extended conversations with an LLM about this, and that's my personal route into LLM-induced psychosis. Intelligence is compression, which is about finding just the right way to factor or frame a problem or situation. To decide whether a boiler is at risk of exploding, don't sample the trajectories of ten gas molecules; measure the pressure. Finding the right invariant for an inductive proof or iterative algorithm. Finding good low-level features. https://x.com/mjbukow/status/1962186888103747892 See “What to make of those METR evals” with Toby Ord. Mechanistic interpretability tools such as sparse autoencoders appear to be able to explain like 80% of the activations in an LLM. Could the remaining 20% be where much of the interesting / advanced thought is taking place? Is 80% “pretty good”, or more like “barely scratching the surface”? Compare to performance vs. loss function. Tie this to my note about LLMs learning far more facts per parameter / neuron: Liv (@livgorton) posted at 9:27 AM on Tue, Aug 26, 2025:17/ If this is right, it reframes the robustness problem: adversarial vulnerability might be the price of neural networks' incredible efficiency. They're vulnerable precisely because they're doing something clever with their representations.(https://x.com/livgorton/status/1960378468807447026?t=NKCW413z2VZixyL0jSXutw&s=03) Discussions of AI utility always need to consider the baseline alternative (read) My comment on Ryan’s August 2025 timelines update on LessWrong: Nice analysis. I can't add anything substantive, but this writeup crystallized for me just how much we're all focusing on METR's horizon lengths work. On the one hand, it's the best data set we have at the moment for quantitative extrapolation, so of course we should focus on it. On the other hand, it's only one data set, and could easily turn out to not imply what we think it implies. My only points are (a) we shouldn't weight the horizon length trends too heavily, and (b) boy do we need additional metrics that are both extrapolatable, and plausibly linked to actual outcomes of interest. See this discussion of TextQuests: When things go wrong: Mostly, models fail because they end up getting confused about what they've already done - this suggests that as model context lengths improve as well as their ability to effectively use their memory, performance will grow. → I should read the details: is it that they can't fit the entire game in a single context window, or is it that they get confused anyway? The latter would suggest that context windows, even if scaled further, are not a full solution to memory / continuous learning. Nathan Lambert thinks “we already have AGI” and that “continual learning” will be achieved through context engineering: https://www.interconnects.ai/p/contra-dwarkesh-on-continual-learning With GPT-5, OpenAI is optimizing for cost and usability, not raw intelligence. https://www.reddit.com/r/mlscaling/comments/1mrm0di/we_had_this_big_gpu_crunch_we_could_go_make/ The labs may not be maxing out capabilities, because they’re inference capacity constrained and so there’s no point in developing a large smart model. See “Applied AI: The Math Doesn’t Work for Flat AI Agent Pricing”, from The Information.
See Sam Altman’s remarks around the time of the GPT-5 launch, that they’re focusing on cost & speed rather than capability. See my notes about this in the 36 Thoughts post. Siméon (@Simeon_Cps) posted at 5:52 PM on Fri, Aug 15, 2025:I have this theory that we are in a period of increasing marginal utility of capabilities. GPT-2 to GPT-3 jump was a bigger jump than 3 to 4, which was bigger than 4 to 5. But the utility jumps have been increasing.My core thesis for why is that most use cases are bottlenecked(https://x.com/Simeon_Cps/status/1956519485277684219?t=Y9sBafUnmKmEhxpCqQurMw&s=03) https://peterwildeford.substack.com/p/gpt-5-a-small-step-for-intelligence Inference keeps getting more efficient, but inference costs aren’t falling, because we’re taking the gains in more intelligence – but not much more intelligence. A reminder that scaling laws are logarithmic. https://ethanding.substack.com/p/ai-subscriptions-get-short-squeezed Google’s Genie 3 is touted as “providing a training playground for robotics and agents” but this strikes me as an absolute recipe for reward hacking? Related to the fuss regarding Genie 3, I am entirely unimpressed by “look at this one-off funny cool thing an AI did”. I have a lot of respect for Ethan Mollick and he’s often very insightful but he also posts a lot of this stuff – one-shot prompts to the latest AI to create a starship simulator or something – and I have absolutely no time for this. Generative AI is very good at creating an example of a thing, and much much worse at creating the specific thing you want/need, satisfying specific constraints. These one-shot demos are reward hacking against the Twitter algorithm. Everyone talks about agents. Maybe because agents need context, and the best hope is to let them handle a larger task so they can assemble their own context? Application design is still unimaginative and shallow, so this is all model developers have to aim for. Regarding https://x.com/ben_j_todd/status/1934284189928501482: [me] Error rate may be a good fit for the data, but is it convincing as a model of the actual phenomena at play? It seems to me that models are lacking some fundamental capabilities that will be needed before they can independently manage large, complex projects, and that "reducing the error rate" is not a good way to highlight those missing capabilities. I can't resist quoting something I wrote last year: Lumping all this under “reliability” is like saying that all I’d need to start playing in the NBA is to be “better at basketball”: in reality, I would have to acquire a wide range of skills, many of which are not feasible given my current architecture. I _really_ hope we don't head toward advertising as a primary revenue source for the tools people mostly use to interact with AI. It creates such awful incentives, and burns vast resources in zero-sum games (competitions between advertisers, and against consumers trying to cling to whatever shreds of attention span they still retain). See August notes from Beth Barnes, she had a lot of relevant thoughts (credit her) Sure, for some reasonable definitions of "error", everything becomes possible if you can push the error rate close enough to 0.
My intuition is that LLMs go wrong in all sorts of very different ways – citing incorrect facts, committing logic errors, failing to focus on a relevant piece of information in their context window, making poor high-level choices about how to approach a problem, failing to come up with a key insight that makes a problem tractable – and lumping them all under the term "error" obscures a lot of important details and makes it harder to predict the future. Like saying that all hunter-gatherer tribes needed to do to create modern civilization was "grow their economy". In his Latent Space interview, Greg Brockman said that GPT5 is the first model they've trained on messy real usage coding tasks, as opposed to benchmarks. I wonder if they get the most detailed data internally, and thus might do best there? From http://henrikkarlsson.xyz/p/attention: Michael Nielsen writes about this in an essay where he describes the experience of pushing himself to go deeper than usual in understanding a mathematical proof: I gradually internalize the mathematical objects I’m dealing with [using spaced repetition]. It becomes easier and easier to conduct (most of) my work in my head. [. . .] Furthermore, as my understanding of the objects change – as I learn more about their nature, and correct my own misconceptions – my sense of what I can do with the objects changes as well. It’s as though they sprout new affordances, in the language of user interface design, and I get much practice in learning to fluidly apply those affordances in multiple ways. [. . .] After going through the [time-consuming process of deeply understanding a proof,] I had a rather curious experience. I went for a multi-hour walk along the San Francisco Embarcadero. I found that my mind simply and naturally began discovering other facts related to the result. In particular, I found a handful (perhaps half a dozen) of different proofs of the basic theorem, as well as noticing many related ideas. This wasn’t done especially consciously – rather, my mind simply wanted to find these proofs. Chris Olah writes: Research intimacy is different from theoretical knowledge. It involves internalizing information that hasn’t become part of the “scientific canon” yet. Observations we don’t (yet) see as important, or haven’t (yet) digested. The ideas are raw. (A personal example: I’ve memorized hundreds of neurons in InceptionV1. I know how they behave, and I know how that behavior is built from earlier neurons. These seem like obscure facts, but they give me powerful, concrete examples to test ideas against.) Research intimacy is also different from research taste. But it does feed into it, and I suspect it’s one of the key ingredients in beating the “research taste market.” As your intimacy with a research topic grows, your random thoughts about it become more interesting. Your thoughts in the shower or on a hike bounce against richer context. Your unconscious has more to work with. Your intuition deepens. I suspect that a lot of “brilliant insights” are natural next steps from someone who has deep intimacy with a research topic. And that actually seems more profound. Notes from Prof. Cris Moore of the Santa Fe Institute on Machine Learning Street Talk: I'm listening to Machine Learning Street Talk, and the presenter is talking about how the real world contains a lot of structure that intelligence can take advantage of; real-world problems are not usually adversarially hard, or even random.
They have a lot of structure, and a lot of problems which can be proven mathematically to be very difficult in theory are often fairly tractable in practice. I might encapsulate this as a converse to “the models just want to learn”: we might say that the world just wants to be learned. Human puzzle solvers find partial knowledge in Sudoku: you might note that a certain cell must be either a two or a seven, or that the three in this box must be in one of these two locations. Similarly, I could talk about the forms of partial knowledge that arise when doing a Battleship puzzle. The host used the nice phrase “epistemic foraging”. People can come up with different frameworks for looking at a problem: when doing a pentominoes puzzle, do I think about which piece can fit here, or do I think about where a particular piece can go? We come up with new strategies on the fly. We come up with heuristics for deciding which strategy to apply next, and where. I wonder whether AI theorem provers are doing any of that. We see a lot of examples of puzzle solving where AIs can't do it at all, although many of those are confounded by being very two-dimensional / visual; I wonder if people have found examples of puzzles that are strictly one-dimensional text that still exercise this kind of skill? We generate new mental toolkits and notations and representations and techniques for ourselves on the fly. I don't think AIs do anything like that right now, so there's a whole array of skills involved there, with maybe not much training data. Today's models, to the extent that they can solve problems (IMO problems, for example), really don't seem to be doing much of that. I should look at some of the proofs they've generated, but my understanding is that they're not doing anything like that. Or maybe we can't tell, because it would be in the reasoning traces, which I don't think we have access to; but I haven't seen examples reported of them doing things like that. Instead, they just seem to have very good intuition, presumably grounded in having absorbed such a breadth and depth of training data. To some extent, that means they're coasting off of past human work. On the other hand, clearly to some extent it means they're succeeding. But that raises the question: are they going to be unable to move many (most?) fields forward in any interesting way, because you can't do that by coasting? Or maybe they can do it by mixing and matching the existing training data in a way that humans almost never can. There was that example of a man uncovering something about networks of cell regulation by asking a model (it might have been o3) to think about it as an analogy to a Battle Royale game, making a connection to a very different field. Or maybe, through a process similar to training a large model and then distilling, the models will be able to kind of flail around and inefficiently generate new insights, but then make very good use of those insights by incorporating them into their training data going forward. Can we measure and assess any of this, to judge where things will go from here? Maybe relevant? https://www.interconnects.ai/p/brakes-on-an-intelligence-explosion Argues that rapid progress on benchmarks is very much a reflection of explicit hill-climbing: In fact it is a common job at AI laboratories to make new data that looks very close to popular evaluations.
These laboratories can’t train on the test set directly for basic reasons of scientific integrity, but they can pay thousands to millions of dollars for new training data that looks practically identical. This is a very common practice and makes the hillclimbing on evaluations far less extraordinary. AI capabilities in domains we are measuring aren't accelerating, they’re continuing. At the same time, AI’s abilities are expanding outwards into new domains. AI researchers solve domains when we focus on them, not really by accident. Generalization happens sometimes, but it is messy to track and argue for. There are many explanations for why this will be the case. All of them rely on the complexity of the environment we are operating modern AI in being too high relative to the signal for improvement. The AI systems that furthest exceeded human performance in one domain were trained in environments where those domains were the entire world. AlphaGo is the perfect rendition of this. Notes from MLST, The Mathematical Foundations of Intelligence https://podcasts.apple.com/us/podcast/the-mathematical-foundations-of-intelligence/id1510472996?i=1000741165541 A mix of ideas from the talk and my own ideas: Muse about the fundamental concept or metric for intelligence: some combination of compression, parsimony, predictive ability, efficiency at both training and inference time, and the ability to successfully pursue goals in the world. Intelligence is compression – parsimonious and self-consistent representations (why the latter?). Insight and abstraction… which are just forms of compression? Lossy implies a fitness metric: compression is only well-defined in the context of a metric for how good a fit the reconstructed artifact is to the original; does it retain the attributes that we care about? For example, if I'm trying to predict in my head what will happen in a certain situation, I don't care about every detail of the signals that will reach my retina, or the precise state of every atom in the room; there are higher-level system state parameters that are what I actually care about predicting. Ability to successfully pursue a goal. Domain-specific memorization. Generalization, overfitting; domain specificity. Efficiency of usage of various resources at both training and inference time; curiosity, guided exploration, etc. Interviewer: efficient search over Turing machine algorithms. Inductive bias. [Child Page: Will LLMs Generalize?] My model of what is going on with LLMs (LW) Have LLMs Generated Novel Insights? (LW) 2 Big Questions for AI Progress in 2025-2026 (Helen Toner) [Child Page: How Deep are The Remaining Rabbit Holes?] [theme is: we don’t know how deep AI intellectual capacity is, and we don’t know how many tasks need that? all these things that our dinner identified as missing from current AIs, how hard will they be, might some of it be amenable to prompting + scaffolding + a touch of RL?] Build on https://lemmata.substack.com/p/what-i-wish-i-knew-about-frontiermath to write about the need to measure creativity, and fuzzy capabilities more generally (also build on my initial writeup from our first dinner). Write a post questioning how real-world tasks break down into background, creativity, execution. Highlight the questions about creativity required for the FM problems. My suspicion is that a significant chunk of FrontierMath problems can be solved by applying advanced mathematical techniques in relatively straightforward ways.
If anything, this might obscure their difficulty to humans: most people don’t have the right knowledge, and without the right knowledge the problems seem impossible; but with the right knowledge, they aren’t so bad. https://amistrongeryet.substack.com/p/were-finding-out-what-humans-are/comment/95456981 Ethan Mollick (@emollick) posted at 11:06 PM on Tue, Feb 18, 2025: As I have written many times, AI is not naturally a great tutor, it offers explanations but, without proper prompting, tends to tell your answers rather than engaging you in the process of understanding. I find explanations on demand very promising, but they aren't there yet.(https://x.com/emollick/status/1892108434171949321?t=JRIExSHeGTJitIUwXGVCDQ&s=03) Taren: feels like he does say that this is basically a prompting/product problem? Steve: I read this as "the student has to find the right questions to ask". Other parts of his tweet sound discouraging ("not naturally a great tutor", "they aren't there yet"). But I guess it is ambiguous. From https://erictopol.substack.com/p/when-doctors-with-ai-are-outperformed: When A.I. systems attempted to gather patient information through direct interviews, their diagnostic accuracy plummeted — in one case from 82 percent to 63 percent. The study revealed that A.I. still struggles with guiding natural conversations and knowing which follow-up questions will yield crucial diagnostic information. Taren: this is a great example, and/but i wonder how much of this is a product/prompting problem vs a capabilities problem... feels like a naive user of AI setting up the interview process, vs an expert user, could have a very different outcome here -- and hard to say which type it was in this case? The difficulty our first-dinner participants had in deciding whether a capability gap can be met using prompting, data/scale, or architectural changes. Taren notes that all three routes could be viable, on different time scales. Sigal Samuel (@SigalSamuel) posted at 10:00 AM on Fri, Feb 21, 2025: The big AI story of the past 6 months is: Companies now claim that their AI models are capable of genuine reasoning. Is that true? I found that the best answer lies in between hype and skepticism. https://t.co/b3ZuMjO0ZJ Thanks to @ajeya_cotra @RyanPGreenblatt @MelMitchell1 (https://x.com/SigalSamuel/status/1892997861886820474?t=x29SrzUJR8dq9mwmmnaiDQ&s=03) https://arxiv.org/abs/2410.06992 Our analysis reveals some critical issues with the SWE-bench dataset: 1) 32.67% of the successful patches involve cheating as the solutions were directly provided in the issue report or the comments. We refer to this as the solution leakage problem. 2) 31.08% of the passed patches are suspicious patches due to weak test cases, i.e., the tests were not adequate to verify the correctness of a patch. When we filtered out these problematic issues, the resolution rate of SWE-Agent+GPT-4 dropped from 12.47% to 3.97%. We also observed that the same data quality issues also exist in the two variants of SWE-bench, i.e., SWE-bench Lite and SWE-Bench Verified. In addition, over 94% of the issues were created before LLM's knowledge cutoff dates, posing potential data leakage issues. Review https://epochai.substack.com/p/ai-progress-is-about-to-speed-up. Note reference to Moravec's paradox. Also note expectation that capabilities which are weak today will continue to be weak.
https://aidanmclaughlin.notion.site/reasoners-problem https://x.com/AndrewCritchPhD/status/1891887600102932629?t=0UjiKsyU97miKXKTPaKllg&s=03 [Zvi] Have o1-Pro give you a prompt to have Deep Research do Deep Research on Deep Research prompting, use that to create prompt templates for Deep Research. The results are here in case you want to try the final form. https://news.ycombinator.com/item?id=43169586: Most of the time, most of the devs I know, including myself, are not really creating novelty with the code itself, but with the product. A recent Matt Levine column talks about how humans do better than current AIs and traditional ML in out-of-distribution situations. See the section that ends with: The stereotype about algorithmic trading and investing is something like “algorithms tend to learn on historical data and are poorly suited to dealing with regime changes, while humans are more flexible and have better gut instincts to handle sharp breaks with the past.” I have often been skeptical of that stereotype. Humans also learn on historical data, and less of it: If you’ve been trading for 10 years, in some sense you only really have access to 10 years of market history, while a computer can hold the last 200 years of data in its mind. But Sasha Gill makes me rethink that. She has roughly zero years of market history, she barely knows what a yard of cable is, but she’s keeping an eye on Truth Social. She’s handling the regime change. If you are a computer trained on recent historical data, a sharp increase in FX volatility might catch you flat-footed. If you’re a human trader straight out of university, you’ll be like “ah yes time to fire up Truth Social.” The algorithm has never even heard of Truth Social! Good time to be a human FX trader. https://x.com/littmath/status/1898461323391815820 First section of https://www.theintrinsicperspective.com/p/ai-plays-pokemon-but-so-does-teslas https://x.com/slatestarcodex/status/1896457193215742274 Ethan Mollick (@emollick) posted at 11:24 AM on Sun, Mar 09, 2025:If it turns out LLMs are only capable of recombinatory innovation (finding novel connections among existing knowledge), that would still be very