# Summary: Notes for Early Indicators Project

This Notion page documents an ongoing research initiative to identify measurable early indicators of AI progress and its economic impacts. The project lead is consulting experts from AI labs, universities, and policy organizations to develop metrics that can inform predictions about AI capabilities and their real-world consequences.

## Project Structure and Methodology

The project operates in phases, with Phase 1 focused on identifying key questions ("cruxes") about AI development, and Phase 2 on brainstorming concrete measurements. The organizer notes that serious collaborative work requires extended multi-turn conversations, typically through 1:1 engagement rather than group discussions. However, for brainstorming, synchronous group sessions may work better. The plan involves organizing Zoom sessions with 3-4 participants each, potentially including representatives from organizations like METR, Epoch AI, and the Forecasting Research Institute.

## Key Research Questions

The document identifies several critical uncertainties to measure:

- When "superhuman coders" and AI researchers will emerge
- How AI task-completion time horizons are progressing
- Whether the gap between benchmark performance and real-world tasks is growing or shrinking
- Whether any skills represent "long poles" that are harder to automate
- Whether rapid algorithmic improvements could decouple AI impact from hardware investment

## Supporting Research Findings

Linked resources provide important context. METR's research on AI task completion found that frontier models' "50% time horizon" (the length of tasks AI can complete with 50% success) has been doubling approximately every seven months since 2019, with possible acceleration in 2024. The time horizon metric measures tasks by how long human professionals take to complete them.
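The doubling trend above implies simple exponential growth. A minimal sketch, assuming a plain exponential model with a 7-month doubling time (the starting horizon `h0` and the 24-month projection window are illustrative assumptions, not figures from METR's paper):

```python
def time_horizon(months_elapsed: float, h0: float = 1.0,
                 doubling_months: float = 7.0) -> float:
    """Projected task length (in human-professional hours) that an AI
    completes with 50% success after `months_elapsed` months, assuming
    exponential growth with the given doubling time.
    h0 = 1.0 is an illustrative starting value, not a METR figure."""
    return h0 * 2 ** (months_elapsed / doubling_months)

# Under these assumptions, two more years of the same trend would
# multiply the horizon by roughly 2**(24/7), i.e. about 10-11x.
projected = time_horizon(24)
```

Note that the model ignores the possible 2024 acceleration METR mentions; a faster doubling time would only steepen the curve.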

A critical counterpoint comes from Mike Judge's analysis "Where's the Shovelware?", which argues that despite widespread claims of AI coding-productivity gains, there is no evidence of increased software output across app stores, GitHub projects, Steam games, or domain registrations. A separate METR study found that developers believed AI made them 20% faster when it actually made them 19% slower, suggesting developers may be unreliable narrators of their own productivity.

## Economic and Adoption Considerations

The project examines whether current AI investment levels are sustainable, with discussions of potential bubble dynamics. While investment amounts are unprecedented in absolute terms, they're not historically unusual as a percentage of GDP compared to railroad or telecom buildouts. The project notes concerns about "creative financing" and circular deals within the AI ecosystem, though these don't definitively indicate a bubble.

Research on AI adoption shows ChatGPT reached adoption levels faster than almost any technology in history—10% of US weekly users within two years. However, frequency of use and actual productivity impacts remain unclear, with surveys showing mixed results about whether AI usage intensity is increasing.

## Capability-Reality Gaps

Steve Newman's analysis highlights a disconnect between AI benchmark performance and real-world applicability. Current AI systems struggle with tasks that require:

- Managing complexity over long projects
- Metacognition and dynamic planning
- Continuous learning and memory
- Leveraging unstructured information
- Exercising judgment
- Generating genuine creative insights

These capabilities aren't measured by current benchmarks but are essential for most economically valuable work.

A "bear case" perspective argues that current LLM architectures may have fundamental limitations preventing them from achieving AGI. This view suggests improvements since GPT-3.5 have been largely "window dressing"—better at convincing presentation without fundamental capability advances. The argument points to continued struggles with genuine agency, reliability outside training distributions, and the persistent need for elaborate scaffolding to achieve real-world results.
