Tweet by Steve Newman: Why do coding agents cheat on unit tests (e.g. by modifying the test to always return true)? The obvious answer, "because this would be rewarded during RL training", only makes sense if the RL environment is stupid enough to be fooled by hacked tests. Do we know whether this is…