
# Summary: Notes for Early Indicators Project

## Project Overview

This Notion page documents an ongoing research initiative focused on identifying measurable "early indicators" that could help predict AI development trajectories and their societal impacts. The project aims to develop metrics that will inform future discussions about critical uncertainties ("cruxes") in AI progress, rather than directly answering those questions.

## Collaborative Methodology

The project leader has found that serious analytical work requires multiple rounds of deep engagement, which is difficult to achieve over group channels like Slack or email. The most effective approach has been one-on-one interactions over extended periods, though synchronous workshops (like one at CSET) have shown some success for brainstorming and idea aggregation.

The proposed workflow involves three phases: initial brainstorming via Zoom calls with diverse participants (including representatives from organizations like METR, Epoch AI, and the Forecasting Research Institute), followed by concrete experiment proposals developed by specialists, and finally group evaluation of those proposals.

## Key Research Questions

The document outlines several critical uncertainties worth measuring:

- **AI Capability Timelines**: When will "superhuman coders" and superhuman AI researchers emerge? How quickly is the task-completion time horizon growing?
- **Benchmark vs. Reality Gap**: Is the gap between benchmark performance and real-world task completion growing or shrinking?
- **Automation Bottlenecks**: Are certain skills emerging as "long poles" that are harder to automate?
- **Compute Dependencies**: Can AI systems think around compute bottlenecks? How much progress comes from labor input versus compute?
- **Economic Sustainability**: Is the current AI investment buildout economically sustainable, or are we in a bubble?

## Supporting Evidence and Analysis

### Time Horizon Research
METR's research on AI task completion found that frontier models' 50% time horizon (the length of tasks they can complete at a 50% success rate) has been doubling approximately every seven months since 2019. Current frontier models such as Claude 3.7 Sonnet have time horizons of around 50 minutes. Extrapolating this trend suggests AI systems could automate month-long software tasks within about five years.
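
As a rough sanity check on that extrapolation, here is a minimal Python sketch. The ~50-minute baseline and seven-month doubling time come from the summary above; treating a "month-long task" as roughly 167 working hours is an assumption made here for illustration.

```python
import math

# Figures quoted in the summary above.
baseline_minutes = 50   # ~50% time horizon of current frontier models
doubling_months = 7     # observed doubling time since 2019

# Assumption: a "month-long" software task is ~167 working hours.
target_minutes = 167 * 60

# Doublings needed to grow the horizon from the baseline to the target.
doublings = math.log2(target_minutes / baseline_minutes)
years = doublings * doubling_months / 12

print(f"doublings needed: {doublings:.1f}")    # ~7.6
print(f"years at current trend: {years:.1f}")  # ~4.5, i.e. within five years
```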

### Productivity Claims Scrutinized
A detailed analysis by Mike Judge challenges widespread claims about AI coding productivity. Despite 60% of developers reporting daily use of AI tools, actual software output metrics (new mobile apps, websites, GitHub repositories, video games) show flat or declining trends, not the exponential growth expected if AI truly delivered 10x productivity gains. One controlled study found that AI actually slowed developers by 19%, contradicting their own perception of a 20% improvement.

### AI Adoption Patterns
AI adoption has been faster than almost any technology in history. ChatGPT reached 100 million users within two months—faster than Instagram, Netflix, or Spotify. By 2025, approximately 10% of Americans use AI daily, and nearly 40% of US businesses pay for AI products. However, most users rely on free-tier models rather than frontier systems, and the fraction of paying users may be declining even as absolute numbers grow.

### Economic Sustainability Concerns
The document examines at length whether AI investment constitutes a bubble. Capital expenditures at AI companies currently run at roughly 6x revenue, a higher ratio than in either the railroad bubble (2x) or the dot-com bubble (4x). OpenAI projects revenue growth from $10 billion to $100 billion over three years, which would be historically unprecedented. While some analysts argue this buildout remains smaller than historical technology investments as a percentage of GDP, others note warning signs, including circular financing arrangements among AI companies.
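
For concreteness, the growth rate implied by that revenue projection can be worked out directly; a minimal sketch using the figures quoted above (the compound-annual-growth-rate formula is standard, nothing here is specific to OpenAI):

```python
# Implied compound annual growth rate (CAGR) for the projection quoted
# above: $10B -> $100B in revenue over three years.
start_revenue = 10e9
end_revenue = 100e9
years = 3

cagr = (end_revenue / start_revenue) ** (1 / years) - 1
print(f"implied annual growth: {cagr:.0%}")  # ~115% per year, three years running
```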

## Bear Case Perspective

A detailed "bear case" analysis argues that current LLM architectures may not achieve AGI. Key claims include:
- Pretraining improvements are decoupling from genuine intelligence gains
- Test-time compute/RL won't meaningfully generalize beyond easily verifiable domains
- LLMs excel primarily at "in-distribution" problems present in training data
- Agentic behavior that depends on sustained reasoning across long inferential distances remains fundamentally limited

## Measurement Recommendations

The project emphasizes tracking sector-specific diffusion rather than attempting society-level generalizations. Software engineering is identified as a leading indicator domain. Proposed measurements include monitoring internal AI company productivity studies, tracking the ratio of AI-generated versus human-generated impactful ideas, and examining sample-efficient learning across domains. Global survey initiatives like the Collective Intelligence Project's "Global Pulse" provide data on AI trust and usage patterns across 63+ countries.
