GPT-5.2 Cracks Legendary Erdős Problems, Stuns Mathematicians

14 January, 2026
2 sources compared
Technology and Science

Key Points from 2 News Sources

  1. OpenAI's new model produced a complete solution to a prominent Erdős problem

  2. Neel Somani left GPT-5.2 working on the problem for 15 minutes and returned to find a full solution

  3. Somani formalized and verified the proof using Harmonic, which validated the solution

Full Analysis Summary

LLM formal proof milestone

GPT‑5.2 stunned parts of the mathematical community by producing a correct, fully formalizable proof of a version of a Paul Erdős problem.

Software engineer Neel Somani tested OpenAI’s GPT‑5.2, found that it had produced the proof, and verified it using the formalization tool Harmonic, TechCrunch reports.

Mezha.net similarly notes that Somani used Harmonic’s Aristotle formalization tool to confirm the machine-produced proof and to establish a baseline for when large language models can solve open math problems.

These accounts describe a striking case in which a modern large language model produced a proof that could be checked and formalized using current proof-assistant infrastructure.

The episode signals a new level of practical utility for language models in formal mathematics.

Coverage Differences

Tone and emphasis

TechCrunch (Western Mainstream) frames Somani’s find as a notable demonstration that large models “returned to find a correct, fully formalizable proof” and highlights verification with Harmonic as part of the story; mezha.net (Other) emphasizes verification plus the broader claim that this establishes “a baseline for when large language models (LLMs) can solve open math problems,” presenting the event as a milestone for AI capability rather than a one-off success. In short, TechCrunch foregrounds the verification process and its impact, whereas mezha.net foregrounds a conceptual shift toward LLMs solving open problems.

GPT-5.2 mathematical reasoning

The solution process leaned on classical number‑theory results and external sources.

TechCrunch reports the model’s chain-of-thought invoked classical results such as Legendre’s formula, Bertrand’s postulate, and the “Star of David” theorem, and ultimately located a related 2013 MathOverflow post by Harvard’s Noam Elkies.
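As background, the first two classical results named above can be stated compactly (these are standard number-theory facts, not drawn from either outlet's coverage):

```latex
% Legendre's formula: the exponent of a prime p in the factorization of n!
\nu_p(n!) \;=\; \sum_{i \ge 1} \left\lfloor \frac{n}{p^i} \right\rfloor

% Bertrand's postulate: for every integer n > 1,
% there exists a prime p with  n < p < 2n
```

Both are routine tools for analyzing the prime factorizations of factorials and binomial coefficients, the kind of objects that appear in many Erdős problems.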

Mezha.net likewise reports that GPT-5.2 demonstrated improved mathematical reasoning by citing relevant theorems, finding the 2013 Elkies post, and producing a final proof that differed from and in some ways completed Elkies’s approach to an Erdős problem.

Both outlets therefore describe GPT-5.2 as not only retrieving relevant literature but also recombining classical results into a novel, verifiable argument.

Coverage Differences

Narrative detail

Both sources report that the model used classical results and located Noam Elkies’s 2013 MathOverflow post, but TechCrunch focuses on enumerating the specific invoked results (Legendre’s formula, Bertrand’s postulate, “Star of David” theorem) and on the model’s chain-of-thought, while mezha.net highlights that the final proof “differed from and in some ways completed Elkies’s approach,” emphasizing completion and extension of prior human work rather than the list of intermediate theorems.

AI and Erdős problems

The GPT‑5.2 episode is presented as part of a larger wave of AI‑assisted progress on Erdős problems.

Both outlets note that roughly 15 problems moved from “open” to “solved” since late December, with about 11 of those credited in some way to AI.

TechCrunch reports a “surge of solved Erdős problems,” saying 15 problems moved from “open” to “solved” since Christmas, with 11 crediting AI and earlier autonomous work coming from a Gemini‑powered model called AlphaEvolve.

Mezha.net similarly states that about 15 Erdős problems have moved to “solved” status, 11 of them with AI involvement, crediting GPT‑5.2 and earlier systems including the Gemini‑based AlphaEvolve.

These parallel accounts frame the Somani/GPT‑5.2 result as one point in a rapid, AI‑driven sprint on many smaller, previously obscure combinatorial and number‑theory cases.

Coverage Differences

Narrative and emphasis

Both sources report similar numeric tallies and mention AlphaEvolve, but TechCrunch frames the trend as evidence that “large models can meaningfully push mathematical frontiers and change how open problems are attacked,” emphasizing the methodological change; mezha.net frames the trend as part of AI’s growing role in formalization and discovery and highlights academic uptake and systemic shifts in verification. The difference is one of emphasis—TechCrunch foregrounds impact on the field’s frontier, while mezha.net foregrounds AI’s role in moving many easier or more obscure instances to solved status.

Formal Proof Tooling

Both sources highlight the rising importance of formal proof systems and specialized tools for checking and organizing machine-produced mathematics. Mezha.net emphasizes the growing role of systems like Lean and Aristotle and of AI tools for organizing, verifying, and refining proofs, while TechCrunch attributes progress to increased use of formalization tools and specialized LLMs (for example, Harmonic's Aristotle and OpenAI's research tools) that make machine-generated proofs easier to verify and extend.

Harmonic's Aristotle is explicitly named in both accounts as the verification pathway used in Somani's case, indicating a maturing toolchain that couples LLM reasoning with mechanized checking.

Coverage Differences

Focus and specificity

TechCrunch highlights specialized LLMs and research tools in addition to formalizers, framing the trend as driven by both model advances and tooling; mezha.net foregrounds formal proof systems and rising academic adoption, quoting Harmonic’s Tudor Achim on adoption. In short, both describe a verification stack, but TechCrunch emphasizes the broader ecosystem (specialized LLMs plus formalizers) while mezha.net stresses formal proof systems’ organizational role.

AI reshaping mathematical research

Leading mathematicians are already weighing what this means for research practice.

Both outlets cite Terence Tao’s view that scalable AI may make it practical to attack the “long tail” of obscure but straightforward Erdős problems.

TechCrunch cautions that while AI is not yet replacing human mathematicians, large models can meaningfully push mathematical frontiers and change how open problems are attacked.

Mezha.net frames the development as AI becoming an emergent partner in formalization and mathematical discovery, suggesting a partnership model where humans direct and verify machine work.

The combined picture is one of rapidly improving machine assistance that is reshaping which problems are worth pursuing and how verification workflows are organized, even as sources differ slightly on tone—cautionary about replacement versus enthusiastic about partnership.

Coverage Differences

Tone (caution vs. optimism)

TechCrunch (Western Mainstream) expresses cautious optimism: it explicitly states “While AI is not yet replacing human mathematicians,” emphasizing limits; mezha.net (Other) uses more forward-looking language to present AI as an “emergent partner in formalization and mathematical discovery,” stressing adoption and partnership. Both report Terence Tao’s argument about the long tail, but they draw different implications about the near‑term role of humans.

All 2 Sources Compared

mezha.net

OpenAI GPT 5.2 Advances AI in Solving Erdős Math Problems

TechCrunch

AI models are starting to crack high-level math problems
