November 22, 2024
AI-human collaboration could achieve superhuman results in mathematics
Mathematicians explore ideas by proposing conjectures and working to prove them as theorems. For centuries, they built these proofs line by careful line, and most math researchers still work like that today. But artificial intelligence is poised to fundamentally change this process. AI assistants nicknamed “co-pilots” are beginning to help mathematicians develop proofs—with a real possibility this will one day let humans answer some problems that are currently beyond the reach of the human mind.
One promising type of AI co-pilot is being developed at the California Institute of Technology. It can automatically propose next steps in a proof and complete intermediate mathematical goals, building the logical connective tissue between larger steps. “If I’m developing a proof, this new co-pilot gives me multiple suggestions for going forward,” says Animashree Anandkumar, a computing and mathematical sciences professor at Caltech. Anandkumar and her team presented the AI co-pilot in a recent preprint paper, which has not yet been peer-reviewed. Crucially, she says, those suggestions “will all be correct.”
The co-pilot is a large language model (LLM), the same kind of machine-learning system behind OpenAI’s ChatGPT and Google’s Gemini. Though trained differently, it is similar to the technology powering Google’s AlphaProof and AlphaGeometry 2—both of which generated complex mathematical proofs to a silver medal standard at this year’s International Mathematical Olympiad (IMO), a competition for the world’s best high school students. And although LLMs can generate what is, in a technical sense, “bullshit,” a co-pilot’s erroneous suggestions are checked and rejected. In the case of the Caltech co-pilot, that’s thanks to the software in which the AI operates, called Lean, which uses rigorous mathematical logic to screen for valid statements.
Proof by Code
Over the past few years Lean has become increasingly popular with a small but growing user base. The open-source software allows mathematicians to enter their mathematics as code, a process known as formalizing. What’s the advantage? It’s never wrong. In Lean and other proof-assistant applications, software automatically checks mathematical statements for accuracy. That’s a world away from traditional, so-called informal mathematics, in which reviewers and colleagues painstakingly audit pages of such statements. That process is prone to human error, and mistakes sometimes slip through.
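To make this concrete, here is a small illustrative sketch of what formalizing looks like, assuming Lean 4 with the Mathlib library (the theorem name and phrasing are my own, not taken from the Caltech paper). Lean’s kernel checks every step; a flawed proof simply fails to compile.

```lean
-- A toy formalization: the sum of two even numbers is even.
-- Each hypothesis and each step must be stated explicitly so the
-- kernel can verify it; there is no "left to the reader."
theorem even_add_even (a b : Nat)
    (ha : ∃ k, a = 2 * k) (hb : ∃ k, b = 2 * k) :
    ∃ k, a + b = 2 * k := by
  obtain ⟨m, hm⟩ := ha   -- unpack the witness for a
  obtain ⟨n, hn⟩ := hb   -- unpack the witness for b
  exact ⟨m + n, by rw [hm, hn, Nat.mul_add]⟩
```

If any step were wrong (say, offering `m * n` as the witness instead of `m + n`), Lean would reject the proof rather than let the error pass.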
If you are writing a proof with the help of the Caltech co-pilot, you can click on a button to request new lines of Lean’s programming language to represent the mathematics you are working with. Several options, which Anandkumar calls “tactic suggestions,” will appear on the right-hand side of the screen; you then simply choose whichever option looks most appropriate. If your proof is headed in a direction that has obvious or well-known intermediate conclusions, the co-pilot can also suggest how to complete that trajectory.
“There’s no trust issue” with Lean because the software checks the work, says Martin Hairer, a professor of pure mathematics at the Swiss Federal Institute of Technology in Lausanne and at Imperial College London. Still, many academics haven’t adopted it yet. “It’s hard to use because you have to enter all the mathematics as code,” Hairer says. Coding in Lean requires entering details that would be omitted when writing up a paper, he notes, so it might take multiple pages of code to show what’s self-evidently true or obvious.
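Hairer’s point about verbosity shows up even in tiny cases. Below is an illustrative sketch, assuming Lean 4’s core library: commutativity of addition on the natural numbers, self-evident to any mathematician, still has to be justified step by step (or discharged by citing an existing library lemma such as `Nat.add_comm`).

```lean
-- Proving the "obvious" fact a + b = b + a from scratch
-- requires an explicit induction, not a shrug.
theorem add_comm' (a b : Nat) : a + b = b + a := by
  induction b with
  | zero =>                      -- base case: a + 0 = 0 + a
    rw [Nat.add_zero, Nat.zero_add]
  | succ n ih =>                 -- inductive step, using ih : a + n = n + a
    rw [Nat.add_succ, Nat.succ_add, ih]
```

Multiply this overhead across a research-level argument and the multiple pages of code Hairer describes accumulate quickly.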
But Hairer, who is not involved in the Caltech project, believes AI co-pilots will eventually take all that grunt work away. “Once you present a statement which is obvious to most mathematicians, an LLM should then be able to generate the code for it,” he says, adding that this faster process could “possibly lure a new generation of mathematicians to Lean.”
Anandkumar, too, predicts more researchers will embrace formal AI-assisted mathematics. “Today, when I talk to young mathematicians or even undergrads early in their careers, I see that they are familiar with these AI systems,” she says. “They’ll do whatever it takes to make the job quicker and easier to gain a competitive edge.”
Mathematical Transformations
Before the international mathematics community adopts AI tools in a meaningful way, these platforms will need to get much more powerful. With their silver medal standard at this year’s IMO, Google’s AlphaProof and AlphaGeometry 2 have shown remarkable results. But they haven’t yet reached the level that research mathematicians need for assistance with complex proofs; humans, in that regard, are still the more capable mathematicians.
Yet “soon there will be systems which approach that level,” says David Silver, vice president of reinforcement learning at Google DeepMind. “I think this will essentially lift human mathematicians to a place where they are able to operate, and ponder ideas, at a much higher level.” Mathematics is beginning to transform, he says, just like it did when the electronic calculator was invented. “In the time before the calculator, there was a wide range of mathematics that was extremely laborious and took a lot of effort,” he says. “I think we’re now at that stage with proving, and in the future we’ll see very complex proofs being automatically solved by AI.”
Collaboration through AI
Adopting AI co-pilots will shake up how mathematicians work with one another, too. Typically, they operate alone or in small groups, with trusted colleagues assessing their proofs piece by piece. But formal AI assistants could empower larger groups of human collaborators to tackle the biggest problems by breaking them down into subproblems, each parceled out to a different team of specialist AI-human partnerships. “Mathematics is seen as such a lonely endeavour, especially in popular media, but now it looks like AI will become the enabler of collaboration amongst mathematicians,” Anandkumar says.
Mathematicians are generally optimistic that AI co-pilots will soon give human experts a boost, allowing them to invest their time in more complex and difficult problems. For example, in the coming years AI-human partnerships might try their hand at notoriously difficult Millennium Prize Problems—perhaps even something as challenging as P versus NP, a long-standing question in theoretical computer science that asks whether every problem whose solution can be quickly verified can also be quickly solved.
“That’s where we’re going to find ourselves going quite soon,” says Silver, as he considers the notion of solving such questions. “Problems as complex as ‘P versus NP’—or something that difficult—are vastly beyond where we are today in terms of even knowing how to start,” he says. “But I can imagine that, maybe in around three years, we might be on course for something like that.”