Tweet by Miles Brundage: Certainly it’ll be easier to directly target areas that have clear reward functions but even besides the generalization point, I think the conclusion will just be “ok you need to work harder on the right data/reward functions for the other stuff” so it’s a matter of degree at…