Tweet by Eli Lifland:

Takeaways re: AI R&D performance:
1. Claude 3.5 Sonnet reaches roughly the 50th-percentile human baseline's 8-hour performance.
2. Sonnet old -> new is a 0.2 jump in 4 months. We're 0.6 away from the 90th-percentile baselines.

I think this significantly shortens my timelines. (caveats in reply) https://t.co/6svf0cu8EF pic.twitter.com/VzldSvcMCU