Simon Willison’s "Pelican on Bicycle" SVG benchmark has evolved from a lighthearted personal test into a lightning rod for debate over whether AI labs are "benchmaxxing", that is, training specifically on famous informal prompts. While the latest model results show unprecedented technical coherence and artistic quality, critics argue that the benchmark's visibility creates a perverse incentive for companies to curate specialized training data, potentially misleading the public about a model’s general reasoning ability. Despite these concerns about manipulation, many enthusiasts maintain that the test remains a valuable ritual, arguing that its validity is easy to check: swap the pelican for an "ocelot on a skateboard" and see whether the model's underlying spatial logic holds up.