First Proof Mathematical Challenge

Discussion of newly released unsolved math problems designed to test frontier models, predictions about whether current models can solve genuine research-level mathematics

The First Proof Mathematical Challenge is viewed as a high-stakes benchmark for testing whether frontier models like Gemini Deep Think or GPT-5.2 can move beyond pattern matching to solve genuine, research-level mathematics. Commenters emphasize the immense coordination required to curate these expert-level problems, noting that the content is so specialized it often appears like a foreign language to non-mathematicians. A central debate has emerged regarding the five-day submission window; some argue it is too brief for broad social propagation, while others believe it is the "sweet spot" for intense AI inference that prevents human-assisted cheating. Ultimately, while expectations for a total breakthrough remain cautious, there is a strong consensus that even negative results are vital for understanding the true limits of AI in scientific discovery.

12 comments tagged with this topic

GPT-5.2 can answer "I don't know" when it fails to solve a math question.
Would be cool to have a benchmark with actually unsolved math and science questions, although I suspect models are still quite a long way from that level.
And mathematics?
The comparison should be with GPT 5.2 pro which has been used successfully to solve open math problems.
I'm pretty certain that DeepMind (and all other labs) will try their frontier (and even private) models on First Proof [1], and I wonder how Gemini Deep Think will fare. My guess is that it will get halfway on some problems. But we will have to treat the absence of a result as a failure, because nobody wants to publish a negative result, even though negative results are so important for scientific research. [1] https://1stproof.org/
As a non-mathematician, reading these problems feels like reading a completely foreign language. https://arxiv.org/html/2602.05192v1
The First Proof original solutions are due to be published in about 24 hours, AIUI.
Feels like an unforced error to make the time window so short after going to so much effort and coming up with something so useful.
Five days is by no means short for an AI! If it can solve a problem, it would need perhaps 1-2 hours; if it cannot, five days of continuous running would produce only gibberish. We can safely assume that such private models will run inference entirely on dedicated hardware, shared with nobody. So if they cannot solve the problems, it's not due to any artificial constraint or lack of resources, far from it. The five-day window is, however, a sweet spot, because it likely prevents cheating by hiring a math PhD to feed the AI hints and ideas.
Five days is short for memetic propagation on social media to reach everyone who has their own harness and agentic setup and wants to have a go.
Really surprised that 1stproof.org was submitted three times and never made the front page at HN. https://hn.algolia.com/?q=1stproof This is exactly the kind of challenge I would want to judge AI systems on. It required ten bleeding-edge research mathematicians to publish problems they've solved while holding back the answers. I appreciate the huge amount of social capital and coordination that must have taken. I'm really glad they did it.
Of course it didn't make the front page. If something is promising they hunt it down, and when conquered they post about it. A lot of the time the "new" category has much better results than the default HN view.