Tweet by Toby Ord: New post on RL scaling: Careful analysis of OpenAI’s public benchmarks reveals RL scales far worse than inference: to match each 10x scale-up of inference compute, you need 100x the RL-training compute. The only reason it has been cost-effective is starting from a tiny base. 🧵
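The claimed tradeoff can be illustrated with a minimal sketch. This assumes a log-linear relationship between compute and benchmark score, with a hypothetical coefficient `B_INF`; only the 2:1 ratio of the two slopes comes from the tweet's claim, everything else is illustrative:

```python
import math

# Hypothetical log-linear score models illustrating the claimed tradeoff:
# inference compute improves score twice as fast per decade (10x) as
# RL-training compute does.
B_INF = 2.0          # score points gained per 10x of inference compute (assumed value)
B_RL = B_INF / 2     # score points gained per 10x of RL compute (the claimed 2:1 ratio)

def score_from_inference(c: float) -> float:
    """Score as a function of inference compute c (arbitrary units)."""
    return B_INF * math.log10(c)

def score_from_rl(c: float) -> float:
    """Score as a function of RL-training compute c (arbitrary units)."""
    return B_RL * math.log10(c)

# A 10x inference scale-up gains B_INF points; matching that gain with RL
# training alone requires 10 ** (B_INF / B_RL) = 100x the RL compute.
gain = score_from_inference(10) - score_from_inference(1)
rl_multiplier = 10 ** (gain / B_RL)
print(rl_multiplier)  # → 100.0
```

Under any such pair of log-linear fits with a 2:1 slope ratio, each 10x of inference compute is worth 100x of RL-training compute, regardless of the absolute value chosen for `B_INF`.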