AGI's Last Bottlenecks

A new framework suggests we're already halfway to AGI. The rest of the way will mostly require business-as-usual research and engineering.

Oct 22, 2025 | Adam Khoja, Laura Hiscott | Guest Commentary

Adam Khoja is a co-author of the recent study, "A Definition of AGI." The opinions expressed in this article are his own and do not necessarily represent those of the study's other authors. Laura Hiscott is a core contributor at AI Frontiers and collaborated on the development and writing of this article. Dan Hendrycks, lead author of "A Definition of AGI," provided substantial input throughout this article's drafting.
In a recent interview on the "Dwarkesh Podcast," OpenAI co-founder Andrej Karpathy claimed that artificial general intelligence (AGI) is around a decade away, expressing doubt about "over-predictions in the industry." Coming amid growing discussion of an "AI bubble," Karpathy's comment throws cold water on some of the more bullish predictions from leading tech figures. Yet those figures don't seem to be reconsidering their positions. Following Anthropic CEO Dario Amodei's prediction last year that we might have "a country of geniuses in a datacenter" as early as 2026, Anthropic co-founder Jack Clark said this September that AI will be smarter than a Nobel Prize winner across many disciplines by the end of 2026 or 2027.

A testable AGI definition is needed for apples-to-apples comparisons. There may be as many estimates of when AGI will arrive as there are people working in the field. And to complicate matters further, there is disagreement on what AGI even is. This imprecision hampers attempts to compare forecasts.

To provide clarity to the debate, we, alongside thirty-one co-authors, recently released a paper that develops a detailed definition of AGI, allowing us to quantify how well models "can match or exceed the cognitive versatility and proficiency of a well-educated human adult." We don't claim our definition represents exactly what Karpathy or Amodei imagine when they discuss future AI systems, but a precise specification of AGI does provide the starting point for an apples-to-apples debate.

The ten components of our AGI definition cover the breadth of human cognitive abilities. The detailed scores of GPT-4 and GPT-5 demonstrate the progress between the models, as well as unaddressed issues. Source.

Our framework scores ten broad abilities and finds GPT-5 roughly halfway to AGI.
Inspired by the Cattell-Horn-Carroll (CHC) theory of human intelligence, our definition formulates AGI as a system that possesses 10 broad abilities found in humans, from knowledge and reasoning to memory and writing. Just as with the study of human intelligence, we have a battery of diverse tasks that can rigorously assess AI models' performance in each of these areas. We have tested GPT-4 and GPT-5 with targeted benchmarks in each of the 10 capabilities, weighting them equally to calculate an overall "AGI score." Based on our definition, GPT-4 achieved a score of 27%, while GPT-5 reached 57%, with GPT-5's main improvements being image support, audio support, a much larger context window, and mathematical skills.

In order to predict when AGI, as defined by this framework, will arrive, we can systematically analyze each of the areas where the models fall short of well-educated humans, quantify how quickly systems are progressing, and estimate how difficult each barrier will be to resolve. As we will discuss, the meatiest remaining challenges appear to be visual processing and continual learning, but they ultimately appear tractable.

Missing Capabilities and the Path to Solving Them

To judge proximity to AGI, focus on where models still fall short, not where they excel. For the purposes of this analysis, we won't look at areas where current models are already performing at or above human baselines. We are now well used to LLMs that broadly match or exceed most well-educated humans at reading, writing, and math. Having been trained on the internet, these models also have far greater reserves of knowledge than any human. To understand how close current models are to AGI, we must focus on where they lose points in the definitional framework, and how tractable those areas are.
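The scoring rule itself is simple: ten ability sub-scores, averaged with equal weight. A minimal sketch, using the framework's ten ability labels (the example sub-scores are invented, not the paper's actual measurements):

```python
# The ten ability axes from the framework: General Knowledge (K), Reading and
# Writing (RW), Math (M), Reasoning (R), Working Memory (WM), Memory Storage
# (MS), Memory Retrieval (MR), Visual (V), Auditory (A), Speed (S).
ABILITIES = ["K", "RW", "M", "R", "WM", "MS", "MR", "V", "A", "S"]

def agi_score(sub_scores: dict) -> float:
    """Equal-weight average of the ten ability scores (each 0-100)."""
    missing = [a for a in ABILITIES if a not in sub_scores]
    if missing:
        raise ValueError(f"missing abilities: {missing}")
    return sum(sub_scores[a] for a in ABILITIES) / len(ABILITIES)

# Hypothetical model: perfect on five axes, zero on the other five -> 50%.
example = {a: 100.0 for a in ABILITIES[:5]}
example.update({a: 0.0 for a in ABILITIES[5:]})
print(agi_score(example))  # 50.0
```

One consequence of equal weighting is visible here: a model can be superhuman on half the axes and still sit at 50%, which is why a single zeroed-out ability (like continual learning, discussed below) caps the overall score.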
AI advances can generally be placed in one of three categories: (1) "business-as-usual" research and engineering that is incremental; (2) "standard breakthroughs" on a similar scale to OpenAI's advancement that delivered the first reasoning models in 2024; and (3) "paradigm shifts" that reshape the field, on the scale of pretrained Transformers. We will now look at the areas of our definition where GPT-4 and GPT-5 lose many points (reasoning, visual processing, auditory processing, speed, working memory, memory retrieval, and long-term memory storage) and assess the scale of advance required to reach our definition of AGI.

Visual Processing

Overview. A reasonable (but imperfect) way to describe the previous generation of vision capabilities is by analogy to a human who is shown an image for a fraction of a second before they must answer questions about it: enough time to recognize natural objects and describe scenes, but not enough to count objects or perform mental rotations. Current models are more capable, but still struggle with visual reasoning and world modeling.

The SPACE benchmark assesses spatial reasoning. Models do not yet match average human scores on these tasks, but they are improving rapidly. Source.

Visual reasoning. While models can readily understand simple natural images, unnatural images such as schematics and screenshots are more challenging. That said, state-of-the-art models are rapidly improving their understanding of unnatural images and are becoming more capable visual reasoners. For example, on a subset of the SPACE benchmark developed by Apple, GPT-4o (May 2024) scored only 43.8%; in internal tests we have run at the Center for AI Safety, GPT-5 (August 2025) scores 70.8%, while humans average 88.9% on average. We might therefore expect that business-as-usual AI development will continue to drive rapid progress on visual reasoning.
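To put those SPACE numbers in perspective, a rough way to track progress is the fraction of the human-model gap that a newer model closes. This metric is our own illustrative arithmetic, not one the paper defines:

```python
# Fraction of the gap to human performance closed by a newer model,
# applied to the SPACE subset scores quoted above (illustrative only).

def gap_closed(old: float, new: float, human: float) -> float:
    """Fraction of the (human - old) gap covered by the newer model."""
    return (new - old) / (human - old)

# GPT-4o: 43.8%, GPT-5: 70.8%, human average: 88.9%
print(round(gap_closed(43.8, 70.8, 88.9), 2))  # 0.6
```

By this crude measure, roughly 60% of the gap to average human spatial reasoning closed in about 15 months, which is what motivates the "business-as-usual" classification for this capability.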
The IntPhys 2 benchmark tests intuitive physics understanding by asking whether a video is physically plausible. The best existing models perform only slightly better than chance. Source.

World modeling. Another visual processing task that models struggle with is world modeling, or intuitively understanding how the physical world behaves. Researchers such as Yann LeCun have argued that more fundamental advances may be needed to achieve this capability. A recent benchmark from Meta called IntPhys 2 tests world modeling by presenting AIs with videos and asking them how physically plausible the scenarios are. It shows that the best current models perform only slightly better than chance. However, upstream capabilities progress is a tide that lifts many boats, so it would not surprise us to see significant improvements on this benchmark with business-as-usual engineering.

On-the-Spot Reasoning

Progress in on-the-spot reasoning in the past two years has been substantial. While GPT-4 struggled with simple logical puzzles, reasoning models such as GPT-5 now approach the fluidity and precision of humans, especially in the text modality. By thinking about complex problems for hours on end, the best language models now score well enough on various olympiads, including the International Olympiad in Informatics (IOI) and the International Math Olympiad (IMO), to earn gold medals.

Results of a visual reasoning IQ test. Models score much higher when given a text version of the problems than when given the raw question images. Source.

Models still struggle with visual induction. Despite the ongoing improvements in on-the-spot reasoning generally, models still lose points on tasks that demand visual induction. For example, they perform worse than most humans on a visual reasoning IQ test called Raven's Progressive Matrices.
Yet, when presented with text descriptions of the same problems, top models score between 15 and 40 points better than when given the raw question images, exceeding most humans. This suggests the modality is what is making the difference, rather than a deficiency in the model's logical reasoning itself.

Changing the size of a visual logic puzzle can degrade a model's reasoning performance, suggesting that the failure may lie in perception rather than reasoning. Source.

The remaining bottleneck is likely perception, not reasoning. Issues with visual inductive reasoning are the main barrier to perfect reasoning scores. In a benchmark measuring this called ARC-AGI, models have made impressive strides but remain below human level. However, this may be due to a difficulty in perceiving visual data. An engineer demonstrated that logically identical but enlarged ARC-AGI puzzles produced a large drop in models' performance. Humans, with their extremely fluent visual processing abilities, would hardly notice the difference, while models see it as a significantly longer and more difficult problem. This suggests that remaining deficiencies in visual reasoning may be more a failure of perception than of the underlying capacity to reason. Improvements in multimodal perception are therefore plausibly much of what is required for a human-level on-the-spot reasoning score, and these improvements could be delivered by business-as-usual engineering.

Auditory Processing

Audio capabilities appear tractable if given greater prioritization. Historically, audio capabilities tend to be easier for models to learn than visual capabilities. Current deficiencies in audio processing may simply be because this domain is not the highest priority at large AI companies, not because researchers don't know how to make progress here.
This is borne out by Sesame AI, a startup making voice companions, whose voice models from this past winter still far outperform the state of the art from frontier AI corporations. Putting in more effort to train models using known techniques on better auditory data (e.g., clearly labeled emotive interjections and accents) and reducing latency may therefore be sufficient to fill much of the gap. We therefore expect that business-as-usual engineering will saturate this domain.

Speed

Speed is superhuman in text and math, but lags where perception or tool use is required. When it comes to speed, a component of intelligence that considers how quickly a model can complete tasks, the scores vary depending on modality. GPT-5 is much faster than humans at reading, writing, and math, but slower at certain auditory, visual, and computer-use tasks. In some cases, GPT-5 also seems to use reasoning mode to complete fairly simple tasks that should not require much reasoning, meaning it takes an unnecessarily long, convoluted approach that slows it down. Nonetheless, at fixed performance levels, costs and speed have improved dramatically year on year. Improving speed across many areas is a business-as-usual activity at frontier AI corporations, where known methods are yielding steady progress.

Working Memory

Another area where both models dropped points was working memory: the ability to maintain and manipulate information in active attention. There are multiple facets to this capability, with information being presented in textual, auditory, and visual modalities. When working with text, current models already demonstrate a working memory comparable to that of humans, if not far superior. Meanwhile, tasks that assess working memory in the visual and auditory modalities are where models struggle. For example, one task that falls within visual working memory is spatial navigation memory.
On a benchmark called MindCube, which measures this, GPT-4o scores 38.8%, far below human level. GPT-5 shows considerable improvement, achieving 59.7%, though this is still below average human scores. Improving auditory working memory is likely to be even more tractable than visual working memory. Since models already have a human-level working memory for text, it seems likely that business-as-usual engineering will bring the visual and auditory modalities along too.

Newer models show improved scores on tests of visual working memory, such as the MindCube spatial navigation memory benchmark. Source.

Long-Term Memory Retrieval (Hallucinations)

Long-term memory retrieval is another core component of general intelligence, and models already have an impressive ability to fluently access information in their vast stores of general knowledge. Where GPT-4 and GPT-5 lose points, however, is in their tendency to hallucinate: to utter inaccurate information without hesitation.

One of the best available measures of hallucination is SimpleQA, a benchmark created by OpenAI that gives models highly specific questions to answer without using the internet. To score well, a model must not just be good at retrieving information, but also be able to recognize when it is uncertain; in many cases, it should refuse to answer rather than make something up. Both GPT-4 and GPT-5 perform poorly in this area, with the latter hallucinating in response to more than 30% of questions. However, Anthropic's Claude models hallucinate far less often, approaching but not yet matching human-level confabulation rates. This suggests the problem is tractable with business-as-usual engineering.

A benchmark called SimpleQA measures how often models hallucinate. OpenAI's current models perform poorly, but Anthropic's models hallucinate far less often, suggesting that this is a tractable problem. Source.
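The key design choice in this kind of evaluation is that abstaining and answering wrongly are scored differently. The sketch below is a simplified illustration of that idea, not OpenAI's actual SimpleQA grader (which uses an exact question set and a separate judging step); the answer data is made up:

```python
# Simplified grading with abstentions: each response is correct, incorrect,
# or not attempted (the model abstained, represented here as ""). A model
# that declines when uncertain hallucinates less than one that always answers.

def grade(responses: list) -> dict:
    """responses: list of (model_answer, gold_answer) pairs; "" = abstained."""
    correct = sum(1 for ans, gold in responses if ans and ans == gold)
    attempted = sum(1 for ans, _ in responses if ans)
    total = len(responses)
    return {
        "accuracy": correct / total,
        # wrong answers given with confidence, as a share of all questions
        "hallucination_rate": (attempted - correct) / total,
        "abstention_rate": (total - attempted) / total,
    }

# Hypothetical models: one answers everything, one abstains when unsure.
always = [("Paris", "Paris"), ("1912", "1911"), ("Bohr", "Pauli")]
cautious = [("Paris", "Paris"), ("", "1911"), ("", "Pauli")]
print(grade(always)["hallucination_rate"])    # about 0.67
print(grade(cautious)["hallucination_rate"])  # 0.0
```

Both models here have the same accuracy, yet very different hallucination rates, which is exactly the distinction the text above draws between retrieval skill and knowing when to refuse.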
Long-Term Memory Storage (Continual Learning)

The only broad domain in which GPT-4 and GPT-5 both score zero is long-term memory storage, or continual learning — the capacity to keep learning from new experiences and adapting behavior over the long term. Current models are “frozen” after training, preventing them from meaningfully learning anything new in deployment. Although models can perform a “capability contortion,” leaning on their strong working memories over long context windows to give a false impression of long-term memory, this is not practical over weeks or months. They still have a kind of “amnesia,” resetting with every new session. To mimic a human’s capacity for continual learning, a dedicated long-term memory solution is essential, perhaps in the form of durable weight updates.

Of all the gaps between today’s models and AGI, this is the most uncertain in terms of timeline and resolution. Every missing capability we have discussed so far can probably be achieved by business-as-usual engineering, but for continual long-term memory storage, we need a breakthrough. Nonetheless, the problem is not completely opaque, and probably won’t require a paradigm shift. It is now receiving substantial attention and resources from frontier AI corporations. In August, Demis Hassabis highlighted memory as a key missing capability, while Sam Altman, talking about GPT-6, hinted, “People want memory. People want product features that require us to be able to understand them.” Dario Amodei represented the sentiment of much of the industry when he said in an August interview: “Models learn within the context… Maybe we'll train the model in such a way that it is specialized for learning over the context.
You could, even during the context, update the model's weights… So, there are lots of ideas that are very close to the ideas we have now that could perhaps do [continual learning].” Though continual learning is arguably the most nebulous remaining obstacle to AGI, we may need only an “o1-preview moment,” that is, a single standard breakthrough, to unlock continual learning and long-term memory storage.

Conclusion

Drawing together all the capabilities detailed in our framework, we can think of general intelligence as a cognitive “engine” that transforms inputs into outputs. We think our definition is useful because it provides a framework for assessing AIs against the breadth of human cognitive capabilities, so we can pinpoint which specific skills are still missing; the capability of the full system is arguably only as strong as its weakest link.

AGI can be analogized to an engine that converts inputs to outputs using a collection of cognitive abilities: General Knowledge (K); Reading and Writing Ability (RW); Mathematical Ability (M); On-the-Spot Reasoning (R); Working Memory (WM); Long-Term Memory Storage (MS); Long-Term Memory Retrieval (MR); Visual Processing (V); Auditory Processing (A); and Speed (S). Source.

Considering the gaps we have highlighted, what will it take to get to AGI? According to our analysis, all that may be needed is a breakthrough in continual learning, plus regular research and engineering on visual reasoning, world modeling, hallucinations, and spatial navigation memory. Looking at how research is advancing in each of these areas, when might we expect a company to publicly release a model with an AGI Score above 95% according to our definition? One of the authors, Adam, estimates a 50% chance of reaching this threshold by the end of 2028, and an 80% chance by the end of 2030. Ultimately, our framework allows us to replace vague speculation with a quantitative diagnostic.
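To make the "quantitative diagnostic" idea concrete, here is a toy aggregation over the ten cognitive components named in our framework. The individual scores below are made-up placeholders, not measured values; the point is that a simple average can mask a zeroed-out component such as continual learning, which is why a weakest-link readout is also worth reporting:

```python
# Toy aggregation over the ten cognitive components (K, RW, M, R, WM,
# MS, MR, V, A, S). The scores are illustrative placeholders, not
# measured values from any evaluation.

scores = {
    "K": 0.95, "RW": 0.95, "M": 0.9, "R": 0.8, "WM": 0.6,
    "MS": 0.0,   # long-term memory storage (continual learning): zero today
    "MR": 0.7, "V": 0.5, "A": 0.6, "S": 0.8,
}

agi_score = sum(scores.values()) / len(scores)  # simple unweighted average
weakest = min(scores, key=scores.get)           # the bottleneck component

print(f"AGI score: {agi_score:.0%}, weakest link: {weakest}")
# AGI score: 68%, weakest link: MS
```

Even with nine components scoring well, the average sits far below the 95% threshold, and the minimum immediately flags continual learning (MS) as the capability holding the whole system back.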
Given the industry’s focused efforts, it seems highly plausible that researchers will fill in all the puzzle pieces of our definition of AGI in the next few years, much sooner than the decade-long timelines suggested by some in the field. We are a standard breakthrough and business-as-usual research away from AGI.

Written by

Adam Khoja
Adam Khoja does technical and policy research at the Center for AI Safety. He studied math and computer science at UC Berkeley.

Laura Hiscott
Laura Hiscott is a staff writer for AI Frontiers. She has worked in science communication for over six years, both in press offices and at magazines. She studied physics at Imperial College London and trained in science communication at the European Southern Observatory.