Tweet by Dwarkesh Patel:

The most interesting part for me is where @karpathy describes why LLMs aren't able to learn like humans.

As you would expect, he comes up with a wonderfully evocative phrase to describe RL: “sucking supervision bits through a straw.”

A single end reward gets broadcast across… https://t.co/3guOwdewKd pic.twitter.com/lYonLgrukB