Simon Willison's informal SVG generation test, discussion of whether AI labs are training on it specifically, quality improvements in the latest models, and debate over its validity as a casual benchmark
Simon Willison's "Pelican on Bicycle" SVG benchmark has grown from a lighthearted personal test into a lightning rod for debate over whether AI labs are "benchmaxxing," that is, training specifically on well-known informal prompts. The latest model results show markedly better technical coherence and artistic quality than earlier attempts, but critics argue that the benchmark's visibility gives companies a perverse incentive to curate specialized training data, which could mislead the public about a model's general reasoning ability. Despite these concerns about manipulation, many enthusiasts maintain that the test remains a valuable ritual and that its validity is easy to check: swap the pelican for an "ocelot on a skateboard" and see whether the model's underlying spatial logic holds up.
45 comments tagged with this topic