https://mathstodon.xyz/@tao/115855840223258103 “Erdos problem #728 was solved more or less autonomously by AI”
Recently, the application of AI tools to Erdos problems passed a milestone: an Erdos problem (#728, https://www.erdosproblems.com/728 ) was solved more or less autonomously by AI (after some feedback from an initial attempt), in the spirit of the problem (as reconstructed by the Erdos problem website community), with the result (to the best of our knowledge) not replicated in existing literature (although similar results proven by similar methods were located).
This is a demonstration of the genuine increase in capability of these tools in recent months, and is largely consistent with other recent demonstrations of AI using existing methods to resolve Erdos problems, although in most previous cases a solution to these problems was later located in the literature, as discussed in https://mathstodon.xyz/deck/@tao/115788262274999408 . This particular case was unusual in that the problem as stated by Erdos was misformulated, with a reconstruction of the problem in the intended spirit only obtained in the last few months, which helps explain the lack of prior literature on the problem. However, I would like to talk here about another aspect of the story which I find more interesting than the solution itself, which is the emerging AI-powered capability to rapidly write and rewrite expositions of the solution. (1/5)
Let me begin by quickly recapping the history of this problem. In 1975, Erdos, Graham, Ruzsa, and Straus studied the prime factorization of binomial coefficients such as $\binom{2n}{n}$, and posed many related questions. One such question (which was an offhand spinoff of a different question) asked whether one could find infinitely many 𝑎,𝑏,𝑛 with 𝑎,𝑏≥ε𝑛 obeying the divisibility condition 𝑎!𝑏!|𝑛!(𝑎+𝑏−𝑛)! and 𝑎+𝑏>𝑛+𝐶log𝑛. However, the question was vaguely worded in a number of respects; for instance, it was initially unclear whether 𝐶 was intended to be small or large.
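As an aside, the divisibility condition is easy to experiment with numerically. The following short Python sketch is my own illustration rather than anything from the discussion, and the parameter values ε = 1/4 and 𝐶 = 1 are arbitrary demo choices, not values from the problem; it enumerates small triples satisfying the original constraints:

```python
from math import factorial, log

def divides(a: int, b: int, n: int) -> bool:
    """Check the divisibility condition a! b! | n! (a+b-n)!."""
    if a + b < n:
        return False  # (a+b-n)! would have a negative argument
    return (factorial(n) * factorial(a + b - n)) % (factorial(a) * factorial(b)) == 0

eps, C = 0.25, 1.0  # arbitrary demo values, not from the problem
for n in range(10, 31):
    for a in range(1, n + 1):
        for b in range(a, n + 1):  # take b >= a, since the condition is symmetric in a, b
            if a >= eps * n and b >= eps * n and a + b > n + C * log(n) and divides(a, b, n):
                print(f"n={n}, a={a}, b={b}")
```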
A few months ago, a team associated with the AI tool AlphaProof observed that the problem admitted several trivial solutions, if 𝑎 or 𝑏 were allowed to be large compared with 𝑛. This technically solved the problem, but was deemed not in the spirit of the question, and an additional constraint 𝑎,𝑏≤(1−ε)𝑛 was imposed to rule out these solutions. Further AI-assisted literature search did not turn up significant work on this problem. (2/5)
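One such trivial family (my example, to illustrate the phenomenon): taking 𝑎=𝑛 makes the two sides of the divisibility identical, since then 𝑎!𝑏!=𝑛!(𝑎+𝑏−𝑛)! exactly, so every sufficiently large 𝑏 gives a solution; the added constraint 𝑎,𝑏≤(1−ε)𝑛 excludes precisely such degenerate choices. A quick check using the divides helper from the sketch above:

```python
# Trivial family: a = n gives a! b! = n! (a+b-n)! exactly, so the
# divisibility holds for every b, and a + b > n + C log n once b is large.
# The added constraint a, b <= (1-eps) n rules these solutions out.
assert all(divides(n, b, n) for n in range(1, 30) for b in range(1, 30))
```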
On Jan 4, ChatGPT was able to produce a proof even with the adjusted constraint, but with 𝐶 taken to be a small constant: https://chatgpt.com/s/t_695bdbf3047c8191af842d03db356b1a ; this was then formalized in Lean by Aristotle. However, it was determined on closer reading of the source paper that 𝐶 was intended to be large; indeed, the results in the original paper already established the small-𝐶 claim, although this was not evident to us until a few days later.
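For readers unfamiliar with Lean, a formalized statement of the small-𝐶 claim might look something like the following sketch; the theorem name and exact phrasing here are invented for illustration and are not taken from the actual Aristotle formalization:

```lean
import Mathlib

/-- Hypothetical phrasing of the small-C claim (illustrative only; not the
actual Aristotle output): for some small C > 0, there are infinitely many
triples (a, b, n) with εn ≤ a, b ≤ (1-ε)n, a! b! ∣ n! (a+b-n)!, and
a + b > n + C log n. -/
theorem erdos728_small_C (ε : ℝ) (hε : 0 < ε) (hε' : ε < 1 / 2) :
    ∃ C : ℝ, 0 < C ∧ ∀ N : ℕ, ∃ a b n : ℕ, N < n ∧
      ε * (n : ℝ) ≤ (a : ℝ) ∧ ε * (n : ℝ) ≤ (b : ℝ) ∧
      (a : ℝ) ≤ (1 - ε) * (n : ℝ) ∧ (b : ℝ) ≤ (1 - ε) * (n : ℝ) ∧
      a.factorial * b.factorial ∣ n.factorial * (a + b - n).factorial ∧
      (n : ℝ) + C * Real.log n < (a : ℝ) + (b : ℝ) := by
  sorry
```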
Shortly afterwards, a different website participant ran the Lean proof through ChatGPT to rewrite it in natural language ( https://drive.google.com/file/d/1ejHqEddpD52SYubZlK8eYx2aoanFmMs8/view?usp=sharing ), and then after further conversation obtained an improved writeup ( https://drive.google.com/file/d/1HAoRuYiUTN0PNOhjY96-dWKNqJ1X4Ucy/view?usp=sharing ), in which several gaps in the original proof were filled in. The exposition was still somewhat clunky and "AI" in feel, and lacked many of the remarks and references to the literature that would organically accompany a human-written proof, but it was readable enough that the general ideas of the proof could be readily extracted. (3/5)
Meanwhile, with further prompting, ChatGPT was also able to adapt the argument to handle large 𝐶 as well as small 𝐶, thus finally producing a new result in the spirit of the intended question: https://drive.google.com/file/d/1xRw8_o2C8HwmxMDnBR5OJlxXaW7jlYbz/view?usp=sharing . Interestingly, the proof contained some minor errors, but the AI tool Aristotle was able to automatically repair these gaps and produce a Lean-verified proof.
At this point, a third participant ran Aristotle again on the existing Lean proof to produce a shorter version, which a different participant then fed into a lengthy back-and-forth ChatGPT session ( https://chatgpt.com/share/695e7cbd-605c-8010-809b-ccba75560c76 ) to turn it into a much more fully fleshed-out article with a tighter narrative structure, one that described not just the proof itself but also its connections to the prior literature. This resulted in a new writeup of the proof ( https://drive.google.com/file/d/1MRQfcHhrYMfMTvlZcMC3zEK7aOrUyHiQ/view?usp=sharing ) that had less of the feel of a generic AI-produced document, and which I judge to be at a level of writing within the ballpark of an acceptable standard for a research paper, although there is still room for further improvement. (I review this text at https://www.erdosproblems.com/forum/thread/728#post-2852 .) (4/5)
My preference would still be for the final writeup of this result to be primarily human-generated in the most essential portions of the paper, though I can see a case for delegating routine proofs to some combination of AI-generated text and Lean code. But to me, the more interesting capability revealed by these events is the ability to rapidly write and rewrite new versions of a text as needed, even if one was not the original author of the argument.
This is in sharp contrast to existing practice, where producing even one readable manuscript is quite time-consuming, and subsequent revisions (in response to referee reports, for instance) are largely confined to local changes (e.g., modifying the proof of a single lemma), with large-scale reworking of the paper often avoided due both to the work required and the significant risk of introducing new errors. However, the combination of reasonably competent AI text generation and modification capabilities, paired with the ability of formal proof assistants to verify the informal arguments thus generated, allows for a much more dynamic and high-multiplicity conception of what a writeup of an argument is, with the ability for individual participants to rapidly create tailored expositions of the argument at whatever level of rigor and precision is desired.
Presumably one would still want to have a singular "official" paper artefact that is held to the highest standards of writing; but this primary paper could now be accompanied by a large number of secondary alternate versions of the paper that may be somewhat looser and AI-generated in nature, but could hold additional value beyond the primary document. (5/5)