Tweet by Elizabeth Barnes: The tasks are also limited in scope vs real-life ML research. We've tried to include aspects like interacting with large, messy codebases, or allocating compute between initial experiments and the final run, but there are practical constraints on the duration and compute per task.