Tweet by Epoch AI: Most AI benchmarks share a common flaw: they saturate too quickly to study long-run trends. Our solution: “stitch” many benchmarks together. This lets us compare models across a wide range of capabilities on a single unified scale. Here’s how this works. 🧵 pic.twitter.com/d6Gvr6Ip1B