OpenAI Podcast · 2026-04-28

OpenAI Podcast Ep17: AI's Breakthrough in Mathematics and Its Impact on Science

Host: Andrew Mayne

Guests: Sébastien Bubeck, Ernest Ryu

AI in mathematics · AGI progress · Automated research · Mathematical reasoning · Scientific discovery · Human-AI collaboration · Proof verification · AI benchmarks


Why it matters

AI models have progressed from basic arithmetic to solving international math Olympiad problems and open research problems within a few years.

Key claims

AI models have progressed from basic arithmetic to solving international math Olympiad problems and open research problems within a few years.
Mathematics serves as a clear, verifiable benchmark for AI reasoning capabilities, helping track progress toward AGI.
AI-assisted research compresses timelines dramatically, enabling mathematicians to solve problems in hours that previously took months.
The concept of 'AGI time' describes AI's ability to sustain reasoning from seconds to potentially weeks or months, crucial for deep scientific breakthroughs.

Briefing memo

Summary

In this episode of the OpenAI Podcast, researchers Sébastien Bubeck and Ernest Ryu discuss the remarkable progress AI has made in mathematics, evolving from basic problem-solving to reaching Olympiad-level and even research-level capabilities. They highlight how AI models like ChatGPT have transitioned from struggling with simple math tasks to solving complex open problems, accelerating mathematical research and enabling new discoveries. The conversation emphasizes the importance of mathematics as a benchmark for AI reasoning and its broader implications for advancing scientific fields such as biology and material science.

The guests also explore the concept of 'automated researchers'—AI systems capable of sustained, autonomous problem-solving over extended periods, which could revolutionize scientific discovery. They caution about the risks of over-reliance on AI leading to shallower human understanding and stress the continued need for expert human oversight. The episode concludes with reflections on how AI will transform the practice of mathematics, making it more interconnected, accelerating verification processes, and ultimately enhancing human creativity and productivity in science.

AI can perform deep literature searches and connect disparate mathematical fields, accelerating discovery and surfacing novel solutions.
There is a risk of human expertise atrophy if researchers over-rely on AI without maintaining deep understanding and verification skills.
AI will enhance verification of mathematical proofs, speeding up the validation process and reducing errors in published research.
The future of science involves human-AI collaboration where AI handles complex computations and humans guide research priorities and interpret results.

Source material

Transcript

Hello, I'm Andrew Mayne, and this is the OpenAI Podcast.

Today, our guests are researchers Sébastien Bubeck and Ernest Ryu, and we're going to talk about math: how AI went from almost laughable to Olympiad level, and why you need math to reach AGI.

The progress of the last few years has been nothing short of miraculous.

We will be able to have LLMs solve problems that require more than 50 pages of thinking.

Mathematics was just a perfect benchmark to see the model making progress during the last four years.

Sébastien, Ernest, I'd love to know more about you.

So how would you explain your roles?

Yeah, sure.

So I have been working in mathematics for almost 20 years now.

I used to work in optimization and the theory of machine learning.

I was a professor at Princeton for a few years before moving to Microsoft, and now I am a researcher at OpenAI. In the last few years, I've really been trying to understand how AGI can help mathematics and to evaluate the progress we're making in solving difficult math problems with AI.

Ernest, how about you?

Yeah, so I recently joined OpenAI as a researcher, but before that I was an applied mathematician working on optimization and machine learning theory, and in my previous job I was a professor of mathematics in the UCLA math department.

So I think a lot of people have this perception that these models aren't good at math; they're literally called language models. How has that changed? What's gone on?

Yeah, I think the progress of the last few years has been nothing short of miraculous.

It's important to remember that two years ago, we didn't even have reasoning models, let alone models that could prove difficult mathematical theorems.

Today, two years later, the models are able to help Fields Medalists in their day-to-day work.

So really, the jump is just simply astounding.

And maybe if I can build a little bit more on that, something which is important to understand is that everybody has been surprised by this progress, including us.

So to tell you a story, a year and a half ago, I was at a workshop at a conference with other fellow mathematicians, and there was a debate that I participated in on whether scaling LLMs will help us resolve major open problems.

So this was a debate a year and a half ago, and the room was very divided.

In fact, they did a poll at the beginning, and I think about 80% said no, it's impossible that this would happen.

So then the debate unfolded, and by the end of it, it was more like 50-50.

So pretty good progress during that hour.

This obviously was just so wrong in hindsight.

Just eight months later, the models were starting to be able to do research-level mathematics.

What was the breakthrough moment for you realizing that there was a really good intersection between AI and mathematics?

So in the summer of 2025, the big news was that ChatGPT achieved top human-level performance at the International Mathematical Olympiad: gold medal performance.

So that was amazing news, and it demonstrated that, at least for competition-level mathematics, the models are very highly capable,

on par with the top human high school contestants.

But well, competition problems are canned problems.

They have relatively short solutions because they are meant to be solved within a few hours, and they're not novel, because somebody came up with the problem and already has a solution.

So it's not research level math.

So then I got curious, and a lot of people got curious: can ChatGPT do research-level mathematics?

And there was a lot of debate online, and then I thought to myself, I should try it on my own problems.

Maybe I'll try it for myself and make up my own mind, as opposed to listening to what other people say, because I'm a mathematician myself.

So I took a classical open problem in optimization theory, which is the branch of applied mathematics that I work in.

And the specific question is this: there's a famous algorithm called Nesterov's accelerated gradient method. Does it always exhibit convergent behavior, or is it possible that, in certain bad cases, it exhibits divergent behavior?

This question was genuinely open in the sense that people knew that in most cases the algorithm behaves well and converges, but nobody really knew whether there is a bad instance where, in the worst case, it could diverge.

The answer turned out to be yes, and I remember distinctly how I discovered it. My son's bedtime is 8 p.m.

And I try not to stay up past midnight.

So I usually had four evening hours to myself if I wanted to focus on something, so I decided, okay, I'm going to spend a few days working on this.

So over the course of three days, 12 hours in total, I interacted with ChatGPT on this question.

It wasn't as simple as me just putting in the prompt and getting a solution.

I played the role of the verifier.

Whenever the model made a mistake, I corrected it.

I also tried to steer the conversation toward approaches that I felt were novel.

And after a while, there was a proof.

And I checked it.

I also asked ChatGPT to double-check it, and it was correct.

And that's how this 42-year-old open problem got resolved.
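For readers who haven't met the algorithm, here is a minimal Python sketch of one textbook form of Nesterov's accelerated gradient method: the variant with the t_k momentum schedule, applied to a convex function with an L-Lipschitz gradient and a fixed step size 1/L. The function name `nesterov_agm` and the quadratic test function are illustrative assumptions, not anything from the episode. Running it on a well-behaved example shows the typical convergent behavior; it says nothing about the worst-case question discussed above.

```python
import math
from typing import Callable, List

Vector = List[float]


def nesterov_agm(grad: Callable[[Vector], Vector], x0: Vector, L: float, iters: int) -> Vector:
    """Nesterov's accelerated gradient method (t_k momentum schedule).

    grad  -- gradient of a convex function whose gradient is L-Lipschitz
    x0    -- starting point
    L     -- Lipschitz constant; each gradient step uses step size 1/L
    iters -- number of iterations
    Returns the last iterate x_k.
    """
    x_prev = list(x0)
    y = list(x0)
    t = 1.0
    for _ in range(iters):
        g = grad(y)
        # Gradient step taken from the extrapolated point y_k.
        x = [yi - gi / L for yi, gi in zip(y, g)]
        # Momentum weight (t_k - 1) / t_{k+1}, which grows toward 1.
        t_next = (1.0 + math.sqrt(1.0 + 4.0 * t * t)) / 2.0
        beta = (t - 1.0) / t_next
        # Extrapolate along the direction of the last step.
        y = [xi + beta * (xi - xpi) for xi, xpi in zip(x, x_prev)]
        x_prev, t = x, t_next
    return x_prev


if __name__ == "__main__":
    # Hypothetical example: f(v) = 0.5 * (v0^2 + 10 * v1^2), so L = 10.
    grad_f = lambda v: [v[0], 10.0 * v[1]]
    print(nesterov_agm(grad_f, [5.0, -3.0], L=10.0, iters=200))
```

The t_k schedule is what makes the method "accelerated": for convex problems the function value gap shrinks like O(1/k^2), versus O(1/k) for plain gradient descent with the same step size.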

And once I had the solution, I thought to myself, what would be the most fun way for me to publicize this?

Because I could just write a paper, but that would be less fun.

So I decided, let me go to Twitter and talk about this.

And dangerous, but yeah.

Yeah.

But well, I had a lot of fun.

Yes, it was, I think, one of the earliest instances of a genuinely open problem being solved by AI.

And yeah, people liked it, and it was a lot of fun.

It is an issue.

You brought that up: we've seen people say, hey, I found something cool or novel, and sometimes it gets torn apart, sometimes it stands up.

And going to social media can be kind of scary, but it sounds like we do need these kinds of feedback cycles.

I think part of the challenge for a lot of us is that we hear terms like the International Math Olympiad, and we're trying to figure out, okay, what does that mean in terms of the scale of a problem?

I can understand addition, subtraction, multiplication. Could you give me an example to help me understand

where we went from the first ChatGPT, which could sort of do it, to a model that could use a tool, and then to models that implicitly understand the math?

When ChatGPT first entered the scene in early 2023, I started testing it; I was very curious about how the model would fare on common math problems.

These would include math problems that you would see in high school, but also day-to-day, math-ish problems.

So for example, imagine a scenario where the three of us went camping together, and I paid for this, Seb paid for that, and Andrew paid for whatever, and then we want to clear the ledger and split things evenly at the end.

Can ChatGPT do the calculations for us?

And this is moderately complicated if we've purchased, say, 17 items.

In 2023, 2024, and even early 2025, I remember, the models couldn't do this.
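The camping example is essentially a ledger-settlement calculation. Here is a minimal Python sketch under assumed inputs: each purchase is recorded as a (payer, amount) pair, the total is split evenly among everyone, and a greedy pass matches people who owe money with people who are owed. The function name, names, and amounts are hypothetical, and greedy matching is just one simple strategy; it does not necessarily minimize the number of transfers. Exact `Fraction` arithmetic avoids rounding drift when the total doesn't divide evenly.

```python
from collections import defaultdict
from fractions import Fraction
from typing import Dict, List, Tuple


def settle_even_split(
    purchases: List[Tuple[str, Fraction]], people: List[str]
) -> List[Tuple[str, str, Fraction]]:
    """Return (debtor, creditor, amount) transfers so that everyone pays an equal share."""
    paid: Dict[str, Fraction] = defaultdict(Fraction)
    for payer, amount in purchases:
        paid[payer] += amount
    share = sum(paid.values(), Fraction(0)) / len(people)
    # Positive balance: this person is owed money. Negative: this person owes money.
    balance = {p: paid[p] - share for p in people}
    creditors = sorted((p for p in people if balance[p] > 0), key=lambda p: -balance[p])
    debtors = sorted((p for p in people if balance[p] < 0), key=lambda p: balance[p])
    transfers = []
    i = j = 0
    # Greedy matching: settle the largest debts against the largest credits first.
    while i < len(debtors) and j < len(creditors):
        d, c = debtors[i], creditors[j]
        amount = min(-balance[d], balance[c])
        transfers.append((d, c, amount))
        balance[d] += amount
        balance[c] -= amount
        if balance[d] == 0:
            i += 1
        if balance[c] == 0:
            j += 1
    return transfers


if __name__ == "__main__":
    # Hypothetical ledger for three campers.
    ledger = [("Ernest", Fraction(60)), ("Sebastien", Fraction(30)), ("Ernest", Fraction(30))]
    for debtor, creditor, amount in settle_even_split(ledger, ["Ernest", "Sebastien", "Andrew"]):
        print(f"{debtor} pays {creditor} {amount}")
```

With 17 items the bookkeeping is the same; only the `purchases` list gets longer, which is why the difficulty for a model lies in tracking every entry without slipping, not in the arithmetic itself.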

Another example: let's say I'm in Korea, Seb's in Paris, and Andrew, you're in California, and we want to set up a Zoom meeting. What would be a good hour to do so?

Again, in early 2025, the models couldn't do this.
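The scheduling question reduces to converting everyone to a common clock and intersecting their acceptable local hours. Here is a minimal Python sketch, assuming fixed UTC offsets for a date in late April (Seoul UTC+9, Paris UTC+2 on summer time, California UTC-7 on daylight time) and an assumed window of acceptable local hours that doesn't cross midnight. The names `OFFSETS` and `common_meeting_hours` are illustrative. A real tool should look offsets up with the standard `zoneinfo` module instead of hard-coding them, because daylight saving time shifts them during the year.

```python
from typing import Dict, List, Tuple

# Assumed UTC offsets in hours for a date in late April (daylight saving time
# in effect in Paris and California; Korea does not observe it).
OFFSETS: Dict[str, int] = {"Seoul": 9, "Paris": 2, "California": -7}


def common_meeting_hours(
    offsets: Dict[str, int], window: Tuple[int, int], duration: int = 1
) -> List[int]:
    """Return UTC start hours where a meeting of `duration` hours starts no earlier
    than window[0] and ends no later than window[1] in every local time zone."""
    earliest, latest = window
    slots = []
    for utc_hour in range(24):
        local_starts = [(utc_hour + off) % 24 for off in offsets.values()]
        if all(earliest <= h and h + duration <= latest for h in local_starts):
            slots.append(utc_hour)
    return slots


if __name__ == "__main__":
    for window in [(8, 22), (7, 23)]:
        for utc in common_meeting_hours(OFFSETS, window):
            local = {city: (utc + off) % 24 for city, off in OFFSETS.items()}
            print(window, f"{utc:02d}:00 UTC", local)
```

Under these assumptions, a window of 8:00 to 22:00 local time leaves no common hour at all for the three cities, and widening it to 7:00 to 23:00 leaves a single slot at 05:00 UTC (14:00 in Seoul, 7:00 in Paris, 22:00 in California). That helps explain why the question makes a good test of careful reasoning.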

But then suddenly things changed.

I wasn't at OpenAI at the time, so I'm not privy to exactly what you did, but suddenly the models started solving IMO problems, and then, furthermore, they started solving research problems.

And the way I calibrate this right now is: unless you are a professional mathematician trying to discover new mathematics, say you're a physicist or a chemist who uses relatively complicated mathematics, like differential equations or differential geometry, but you're not inventing new math,

then ChatGPT can do all of the math that you would need.

So basically any user of high-level mathematics in STEM can now use ChatGPT to have their math taken care of.

You would still want to exercise some degree of caution: check whether things are right, run simulations, just to double-check, because the models can make mistakes.

But now, for 99% of the population, the models can do any math problem they would want to solve.

When I worked on the release of GPT-4, scheduling was one of those examples.

I could put three people into a schedule and have it figure out time slots, but pushing it beyond that was really hard.

Why was there a change?

So Ernest just talked about noticing that it all got better.

Now, we know one thing was tool use.

You could let the model use a calculator, but something else happened with the models themselves.

So going back to the debate I just told you about, the framing was really: can scaling LLMs alone bring you to research breakthroughs in mathematics?

And this is the wrong framing.

What we do at OpenAI is a lot of research, innovative research.

It's not just about scaling the model.

So when you ask what happened in the middle of last year, when suddenly the models were able to solve math problems, well, a lot of things happened.

We do a lot of research.

And all of these have to progress at the same time.

So I can't really point to a single element.

But it was able to do it itself without the tool, so.

Yeah, so I think it's really important to double down on what Ernest was saying about the progress, and the scheduling problems that the model wasn't able to do back then.

I said that two years ago, we didn't have reasoning models.

Well, think about four years ago; this is pre-ChatGPT.

And I remember Google came out with a mathematics model, called Minerva at the time.

And I fell off my chair.

I was so impressed.

What was I impressed by?

That I could give the model the coordinates of points in the plane, and it would give me a line that goes through those points.

Now, when I say that, it's almost hard to understand what I'm talking about; obviously a model can do that.
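To show what that task involves, here is a minimal Python sketch that fits a line y = m*x + b through points in the plane by ordinary least squares. The fitted line passes exactly through the points when they are collinear. This is the standard textbook formula, not a description of how Minerva worked internally, and the function name `fit_line` is illustrative.

```python
from typing import List, Tuple


def fit_line(points: List[Tuple[float, float]]) -> Tuple[float, float]:
    """Least-squares fit of y = m * x + b; exact when the points are collinear."""
    n = len(points)
    if n < 2:
        raise ValueError("need at least two points")
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    # Slope = sum of (x - mean_x)(y - mean_y) divided by sum of (x - mean_x)^2.
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    if sxx == 0:
        raise ValueError("all x-coordinates are equal: the line x = const is vertical")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    m = sxy / sxx
    # The least-squares line always passes through the centroid (mean_x, mean_y).
    b = mean_y - m * mean_x
    return m, b


if __name__ == "__main__":
    print(fit_line([(0.0, 1.0), (1.0, 3.0), (2.0, 5.0)]))  # collinear points on y = 2x + 1
```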

So I think we have kind of forgotten how quickly things have happened.

And now, as Ernest was saying, it's basically at the point where, unless you're trying to invent new mathematics, it's already at the right level.

I would say we're already seeing glimmers that it's getting there even for inventing new mathematics.

Can you break down, though, aside from somebody who's interested in developing new fields of mathematics or just proving new results, how does this affect everything else?

What is the impact of this going to be on science?

What is the impact on the rest of what you're working on?

Why is this really important?

Not just, oh, cool, it does math.

So I think the "oh, cool, it does math" part did matter as we were developing those models, as a good way to benchmark progress.

The nice thing about mathematics is that the questions are very clear and non-ambiguous; everybody agrees on what the question is asking.

So that's point number one.

Point number two, you can verify the answer.

So once a model gives an answer, everybody will agree on whether it was correct or not.

Although you can put a pin in that, because we will talk about how, at research level, it's not that simple to evaluate.

But before research level, it's very easy to evaluate.

So mathematics was just a perfect benchmark to see the model making progress during the last four years.
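The verifiability point can be made concrete. For problems with a single numeric final answer, grading can be fully automatic. Here is a minimal Python sketch, assuming answers arrive as plain strings such as "3/4" or "0.75". Comparing them as exact rationals means equivalent forms are accepted, with a trimmed string comparison as the fallback for non-numeric answers. The function name `grade` is illustrative, and a real grader would likely need more careful normalization. As noted above, research-level proofs can't be checked this way.

```python
from fractions import Fraction


def grade(model_answer: str, reference: str) -> bool:
    """Return True if the final answers agree.

    Numeric answers are compared as exact rationals, so "0.75" matches "3/4".
    Anything Fraction cannot parse falls back to a whitespace-trimmed string match.
    """
    a, r = model_answer.strip(), reference.strip()
    try:
        return Fraction(a) == Fraction(r)
    except (ValueError, ZeroDivisionError):
        return a == r


if __name__ == "__main__":
    print(grade("0.75", "3/4"), grade("2", "3"))
```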

Now, I'd say we have kind of saturated that aspect.

And you can ask, okay, now we have models that are good at mathematics; we have understood that.

What about the next steps?

And for the next step, I would say that having our models be good at mathematics is going to be good for many, many other things, and let me explain why.

A key feature of mathematics is that to resolve a problem, you have to think for a long time.

Be it days, weeks, sometimes years.

So with this long thinking, not only do you have to think for a long time, but you also have to think consistently for a long time.

If at some point in your chain of reasoning, there is a mistake, this will kill the entire argument.

It doesn't matter if everything after that is correct.

If there is one single error, the entire argument is destroyed.

So this is exactly what you want out of reasoning models: if they make mistakes, they should be able to correct themselves.

So we're hoping that the properties they acquire through mathematics will generalize to other domains, which, by the way, is exactly the same with human beings.

Why do we train human beings in mathematics?

I mean, it's a very fun topic.

I love it.

We did it professionally.

Maybe we still do some of it a little bit.

But why do we train humans in mathematics? Exactly for the same reason.

It gives you this kind of very logical thinking.

Do we need to think about new ways to talk about these discoveries?

Yeah, so I personally view it a little bit as part of my role to try to educate the research community about the recent advances, because I have this dual background of both being a former mathematician and now working on the frontier of AI.

And indeed, Twitter and social media are a great place to try to explain the progress, in particular because it is so fast.

So, for example, maybe we can talk a little bit about the Erdős problems and some of the controversies around them.

So first there was Ernest's example, and then there were a few of the problems that were historically part of those...

Oh, I'd love to know who he is.

Why are his problems so important?

Yeah, of course.

So Paul Erdős is one of the most prolific mathematicians of the last century.

He wrote, I think, 1,500 research papers.

He was a very iconoclastic figure.

You know, he didn't have a house or an apartment.

He was just traveling from one university to the next, trying to find new collaborators.

And every time he went to a place, he would basically ask questions.

He was very, very, very gifted at asking questions.

Not all the questions that he asked were interesting.

Let me just say that right away.

But still, it was very productive.

And the research community wrote a lot of papers with him.

There is even this concept of an Erdős number, which is how far away you are, in the chain of collaborators, from having coauthored a paper with Erdős.

My Erdős number is two.

I coauthored a paper with someone who coauthored a paper with Erdős.

Yeah, I'm pretty happy about that.

My number is three.

Oh yeah, the joke was that you could be on a train ride with him,

and by the end of the train ride you'd have worked on a paper with him and have your name on it.

Absolutely.

I think our two numbers basically say something about our respective ages.

Yeah, that's essentially what it says.
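An Erdős number is a shortest-path distance in the coauthorship graph, so breadth-first search computes it. Here is a minimal Python sketch over a small, made-up list of papers (all author names other than Erdős are hypothetical). Coauthoring any paper links two people, and anyone with no path to Erdős has no Erdős number. The function name `erdos_number` is illustrative.

```python
from collections import defaultdict, deque
from typing import Dict, List, Optional, Set

ERDOS = "Erdős"


def erdos_number(papers: List[List[str]], person: str) -> Optional[int]:
    """Shortest coauthorship distance from Erdős to `person`, or None if unreachable."""
    # Undirected graph: every pair of coauthors on the same paper is linked.
    graph: Dict[str, Set[str]] = defaultdict(set)
    for authors in papers:
        for a in authors:
            graph[a].update(b for b in authors if b != a)
    # Breadth-first search visits people in order of distance from Erdős.
    dist = {ERDOS: 0}
    queue = deque([ERDOS])
    while queue:
        current = queue.popleft()
        if current == person:
            return dist[current]
        for coauthor in graph[current]:
            if coauthor not in dist:
                dist[coauthor] = dist[current] + 1
                queue.append(coauthor)
    return None


if __name__ == "__main__":
    # Hypothetical papers; each inner list is one paper's author list.
    papers = [[ERDOS, "Alice"], ["Alice", "Bob"], ["Bob", "Carol", "Dave"], ["Eve", "Frank"]]
    for name in ["Alice", "Carol", "Eve"]:
        print(name, erdos_number(papers, name))
```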

So anyway, Erdős had all of these problems.

And there is a very nice website by Thomas Bloom that keeps track of all the Erdős problems that are still open.

Hmm.

So I think there are about a thousand problems on that website.

And Thomas himself, who is an expert in combinatorics, has done the work of trying to determine, okay, this one is open, this one is resolved,

this one has some complicated status, for every problem.

Of course, he doesn't necessarily know the answer to all of them.

So if a problem is marked open, it is not necessarily true that nobody knows how to solve it.

But it is also a very interactive website where people can add comments to every problem and explain whether there is a solution, etc.

So it's a very dynamic, great website.

So of course, once GPT was able to solve research math problems, this looked like a treasure trove of problems to try our models on.

And we tried a couple, and to our great surprise, the model came back with answers to some of the ones marked as open.

So we got really excited about this.

The first one that I tweeted about, I don't remember exactly when, maybe October or so last year,

was a deep literature search result.

So let me explain what that means.

It means that GPT did a vast literature search, scanning thousands of papers.

And it found the answer to the question in some unrelated field.

Now, it's really important to understand that it's not like the person in that unrelated field said, okay, I'm solving an Erdős problem.

It was written in a completely different language.

It was different mathematics; you had to do work to connect the two pieces, and GPT did that.

So that was kind of amazing.

And this was very ad hoc.

We just tried it by hand, basically, in the ChatGPT interface.

Once we saw that, Mark Sellke, who is also on our team, decided to take a more systematic approach and try all of the problems.

He did, and the model came back with solutions to 10 Erdős problems.

And you have to remember, at that point there was still, I think, a very dynamic discussion about whether those models could go beyond the state of the art and discover new mathematics.

So I got very excited about this result and tweeted about it, and it's kind of an infamous tweet, because people misunderstood it as saying that it had found solutions to 10 very hard open problems,

and that the solutions were completely new and did not exist in the literature.

But that's not what happened.

It was connected, of course, to the previous case: it was a deep literature search.

So there was some back-and-forth with people at Google about whether this is the right way to talk about search results.

But now the punchline is kind of amazing, which is what happened a few months later.

So again, I said 10 solutions to open problems, and these were solutions already in the literature.

And then the question is, can you find solutions that are not in the literature?

By now, we have more than 10 actual solutions that are completely new, publishable in top combinatorics journals, and obtained entirely by AI, some by ChatGPT and some by our internal models.

So again, this really speaks to the acceleration.

In the span of just a few months, we went from it being kind of a ridiculous statement to say there were 10 solutions to Erdős problems,

to it actually happening for real, and it's accelerating.

Yeah, it's interesting, because it seems like step one is having models do really good literature search.

And major papers and awards have gone to people who just did literature searches and found that a solution from one place actually applies elsewhere.

So it's needed as a first step, but now it's actually doing the real thing.

I mean, you know, one thing that I really like about AI research is that it forces us to confront big questions about intelligence and about, you know, research and progress and how do we discover new things.

In particular, there is this question of whether the progress that we're seeing in science is just putting together different pieces and doing a little bit of reasoning on top of it,

or whether it takes those brilliant sparks of insight; everybody, of course, points to Einstein's relativity.

I'm not even sure that really counts, to be honest.

So I think the jury is still out on whether this process of recombination plus a little bit of thinking can increase human knowledge, or whether you really need sparks of genius that would somehow be only human.

Well, even he credited the people who came up with the analogy, the visualization method.

He said it wasn't his; he pointed to who did it, and he obviously took it a step further. And I think sometimes we love these tiny little stories when it's a lot more complex than that.

Yeah, absolutely.

What will it mean for scientists in general if we have better mathematical tools in AI?

How does it affect other things, biology, material science?

Yeah, so again, how it affects the rest of science?

Well, the point that I think is really important for everybody to understand is this:

it's not like we're doing something very, very special for mathematics.

Our techniques, our training techniques are very general.

They are applied to everything.

So if we are seeing more progress in mathematics, well, one reason is that it's very easy to benchmark.

It's very easy to see that progress.

But we fully expect that this is going to happen in all sciences.

It's not going to be limited to mathematics.

Yeah, it seems like something that's very good at going through this kind of structure, this is true, then this is true, through a long sequence of those statements, has a lot of applications elsewhere.

We've heard the term "automated researcher." Can you unpack that a bit?

Right now, the way we work is exactly what Ernest described, which is really an interaction.

It's kind of a professor-student interaction, where ChatGPT is the student: the professor gives it a first problem, the student comes back, and then they talk a little bit.

The student goes away for another week and comes back.

One point, of course, is that it's compressing those timelines greatly.

Take Ernest's own story of solving this problem in 12 hours.

I mean, without ChatGPT, how long would it have taken you?

Well, I had spent more than 40 hours failing without AI, and I don't know, maybe a month.

Right.

Yeah.

So, so exactly.

So, you know, there is this thing of just compressing timelines.

Now, when we talk about the automated researcher, that's a slightly different vision, where the model, or maybe a collection of models, would work autonomously for a long period of time.

This is kind of needed if we want to go beyond the current level.

With the current mode of interaction, the professor-student interaction where the student comes back after a week,

it's going to be very hard to make real breakthroughs, to solve long-standing research problems, or to make progress in very difficult fields in biology where you need to interact with the wet lab and do all kinds of experiments.

So once you want to go after the real breakthroughs, we will need to work over longer timelines.

And this is where the automated researcher comes in.

Maybe let me say it in a slightly different way.

One concept that I'm a big fan of is this concept of AGI time.

So, you can have AGI seconds, minutes, hours, days, and so on.

So that really means: you have an AGI that can mimic human thinking, but for how long?

So, as Ernest was saying, two years ago maybe models were mimicking a high school student thinking for a few minutes on a problem.

Now, we can mimic a researcher who can think for hours, maybe a few days.

This progress has been going on very consistently for four years now, where we went literally from seconds to minutes to hours to days.

And now, we are roughly at days, or one week.

We want to go to weeks, if not months.

This is open research, you know, I don't think anyone on the planet knows exactly how to do it.

But this goes back to the fact that we are doing a lot of research, a lot of innovation, and I think once everything is put together, we'll see this arc of progress continue, where we keep making progress in AGI time.

But this is the direction of the automated researcher.

So the other mathematicians I've talked to, their mode of using AI is that they open up ChatGPT and talk to it within that context window.

And you can have multiple sessions, but each session has a finite context length.

It's roughly on the order of 50 pages of a math paper.

And that's not long enough to make truly deep, groundbreaking math breakthroughs.

Because a lot of math papers are longer than 50 pages.

And also, the human thought that went into producing, let's say, a 10- or 30-page paper is usually orders of magnitude longer than the final output.

So, there's a limitation with the limited context window.

But people who use Codex will know that you can actually have very long work sessions with Codex.

You just keep giving instructions as to what kind of code you want to write.

And the code itself that you're working on, the code repository, which in the math analogy would be like the math notes that you write down,

can be very, very long.

But Codex is pretty good at dealing with that.

Once in a while, it compacts its conversations.

And that's how it becomes this really amazing agent that can do complex jobs over huge code repositories and over a really long conversation context.

And I believe this is going to happen with mathematics research as well.

So we will be able to have LLMs solve problems that require more than 50 pages of thinking.

And that's what humans do.

That's what human mathematicians do.

We think for a day on a certain problem.

And then we summarize our ideas and put them into notes, and the next day or the next week,

we come back to it.

And then over several months, we've thought for so, so long.

But it's summarized.

It's organized in a way that becomes manageable.

And in the end, the final output is a 30-page paper that summarizes the thoughts of many, many months or even years.

So, yeah, I think that's going to happen.

I was working on a problem over the weekend that would look very laughable to you guys: using an LLM to figure out how to get a really small LLM to do math.

In the middle of it, I needed a benchmark.

And I came across easy math, which is a benchmark for small LLMs.

And it was basically just a paper.

There wasn't really a lot of data.

So in the middle of Codex, I said, can you create my own benchmark here and just generate the data for it? Five minutes later,

I had it.

And that was magical to me, because I was in the middle of working on the tool, and otherwise, all of a sudden, I would have had to spend a few hours writing a generator and producing this sort of stuff.

And it runs in the background.

I can't imagine what it's like for you guys doing grown-up problems.
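Andrew's five-minute benchmark is the sort of generator a coding agent can write quickly. As an illustration only (the episode doesn't describe the format of the benchmark he was replicating), here is a minimal Python sketch that generates seeded, reproducible two-operand arithmetic questions with exact answers and serializes them as JSON Lines. The function names and the `id`/`question`/`answer` fields are assumptions.

```python
import json
import random
from typing import Dict, List, Union

# Hypothetical item schema: {"id": int, "question": str, "answer": int}.
Item = Dict[str, Union[int, str]]


def make_arithmetic_benchmark(n: int, seed: int = 0, max_operand: int = 99) -> List[Item]:
    """Generate n reproducible two-operand arithmetic questions with exact answers."""
    rng = random.Random(seed)  # a private, seeded generator makes the set reproducible
    ops = {"+": lambda a, b: a + b, "-": lambda a, b: a - b, "*": lambda a, b: a * b}
    items: List[Item] = []
    for i in range(n):
        a = rng.randint(0, max_operand)
        b = rng.randint(0, max_operand)
        op = rng.choice(sorted(ops))
        items.append({"id": i, "question": f"What is {a} {op} {b}?", "answer": ops[op](a, b)})
    return items


def to_jsonl(items: List[Item]) -> str:
    """Serialize one JSON object per line (the JSON Lines convention)."""
    return "\n".join(json.dumps(item) for item in items)


if __name__ == "__main__":
    print(to_jsonl(make_arithmetic_benchmark(3, seed=42)))
```

Computing each answer in the same step that writes the question keeps every label correct by construction, which is what makes a synthetic set like this usable as a benchmark at all.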

Yeah, I mean, what you described is really what we were going after when we published our paper on early science acceleration experiments with GPT-5.

What you experienced is literal acceleration.

This is something that would have taken you, before,

I don't know, maybe a few days of work.

I would have given up.

Yeah, so that's actually a great point.

I would have given up.

This really enables scientists everywhere, mathematicians for example, to use code.

Most of our friends, they don't code, you know.

And now suddenly, they have codex.

They can do all the experiments that, before, they had to find a poor graduate student to do for them.

Now, they can do all of these experiments very easily.

The flip side, of course, is that scientists in all disciplines can now also use more advanced mathematics,

thanks to ChatGPT.

I sat down with Bob Metcalfe and showed him how to use Codex to write R.

He's working on a project, and R was new to him, and he wanted to learn it.

Yeah, and it was kind of a fun experience to take somebody who's got a great mind and say, oh, instead of spending a lot of time having to figure this out,

here's the tool for you.

But of course, now, as you alluded to before, we should talk about the role of the human in all of this.

What is the place for the human?

Especially if we start to think about, you know, let's think a little bit about the future.

I'm not a big fan of trying to predict the future.

I like to explain what happens.

What do you think will happen?

I think, you know, there is what my heart tells me, and there is the rational aspect.

So what my head tells me is, look, progress has been happening very consistently for the last four years, from being able to solve math problems that would take you seconds, to minutes, to hours, to days.

Anybody looking at the situation would have no reason to doubt that a year from now you will have systems that can think for weeks,

and two years from now, systems that can think for, you know, years.

Not only that, but already today, we're finding that our models are able to really surpass humans, in the sense that they can find mistakes in papers.

You know, we've had agents internally that have been able to go through papers and say, hey, actually this is wrong.

Here is a correct answer.

Not only that, but people tend to think that AI is only good at answering questions.

Actually, no, it's also pretty good at asking questions.

Of course, you need to be, you know, again, you need some research innovation there, which we had.

And now our models are very good at asking questions.

So good, in fact, that humans are looking at those questions and saying, hey, maybe I should write a paper based on this question.

So this is, you know, really, really already happening now.

So I think what I'm trying to say is that, you know, in a year or two, yes, models could do basically more or less everything that human researchers do.

So, now what?

What is the role of humans?

But why is it that we're doing science?

What's the point?

You know, the point is not, I mean, I think it shouldn't be, to just solve problems for the sake of solving problems.

We're solving problems because we're trying to understand something.

The understanding piece is key.

We're not solving problems to write papers, to say that we can write, you know, 10 times more papers than our neighbor.

That's not the point.

You know, you can do competitive chess if that's your kind of deal.

We're trying to really understand deeper things.

And why are we trying to understand deeper things?

Because we want to have better control over our environment.

We want to be able to cure diseases.

We want to be able to build things, you know, better, faster, more robust, more solid, all of those things.

So I think there is a chance that we're looking at a very, very bright future using those tools, as long as the human stays in control and guides which problems matter.

You know, the AI doesn't care about curing diseases.

I mean, you know, they will not suffer from the same diseases we do.

But we do care.

So we have to control them and to guide them towards those problems.

At the advent of the first computers, when "computer" went from meaning a person who did the math to an actual machine that did it, you saw some people saying maybe we all have to move from math to physics, because that's where the hard problems are going to be, and there won't be any more hard problems in mathematics because computers will solve them.

That was the 1940s and 1950s, and it turned out that that's not the case.

Computation opened up a whole new branch of mathematics, and that's going to continue: the mathematician who's in high school today is going to have a very exciting future 30 years from now because of what's happening here.

I think math is going to be so much fun.

Mathematicians enjoy solving problems.

But, you know, pre-AI, we would think for months to solve a problem, and there's enjoyment in that, but it's quite grueling and takes a toll.

There is a lot of pain, and there's a huge surge of dopamine when you actually find the solution.

That's going to be accelerated.

So, you know, more solutions, more fun.

But also, I think math is going to become much richer, because it's going to be much more interconnected. At the research level, a lot of math is hyper-niche. When you write a paper, you know that there are only five living humans right now who will care about it, but you like the result, so you put it out, and the five other people appreciate it and read it.

But then, 20 years later, it's going to be in the archive somewhere, and nobody will read it.

But now that we have AI, the AI will have read it, and if there is a useful connection, as Sébastien mentioned, it will surface it, and people, you know, 100 years down the line, will discover it and use it for whatever they want.

So I now have much more confidence that results I just put out there will be used in the future, if there is a use for them.

And also, I'm now able to access mathematics in a much broader way.

There are fields that I've not studied. If a result from one of them is relevant, I would still have to study that field to be able to use that particular result in my research, but there is no way I could have found that result without the assistance of AI. Now it's accessible.

The model tells me, hey, you can use this to solve your problem, and then, well, okay, I'll go and try to use it.

So math is going to be a much more interconnected enterprise. Also, verifying the correctness of mathematics is actually quite nontrivial. Imagine there's a proof that's 300 pages long, it claims to solve a really important problem, the author is a very reputable person, and on the surface the paper looks plausible.

How do you know?

Well, I mean, this is a process that takes years, and it's also not enough that one person reads it. Many people need to read it, then try to extend it, and look into the details.

This is a process that takes years, and sometimes fatally incorrect proofs are published.

So that's also a very slow process, where the field initially accepts a result but later discovers that it's unsalvageable, so it then needs to get filtered out.

This is going to be so much more accelerated with AI.

Right now, ChatGPT and our AI models are not perfect at verifying mathematics, but they're very good, and they have much more patience than humans.

The truth is that a lot of published mathematics has minor mistakes, and a good amount of it has major mistakes; we know because we have tested these things with our models. I think the richer future of mathematics is one where this happens through AI verification.

We will have much more certainty as to which results are correct and which are incorrect, and we'll have much faster feedback on this.

A paper put out a week ago, we could get verification on, and then trust and build on it, as opposed to waiting five years to really ascertain its correctness.

So overall, math is going to be much more fun, and it's going to be much more interconnected.

We'll be able to trust it more, we'll be able to move faster, and mathematicians will solve harder and more interesting problems.

Maybe one thing I want to add: I totally agree with everything you just said, it's going to be a lot of fun, but I also want to talk about one potential danger of the current progress. That would be that we kind of hand the keys to the castle to the AI, and humans just start to trust the system a lot more and don't do the hard work that we did to hone our skills: to be able to verify, and to sit patiently for hours, many days or many weeks in a row, trying to deeply understand a result. Instead, they just ask ChatGPT to explain it to us in simple terms.

So basically, I'm worried about potentially having a shallower understanding of things because we rely too much on the tool.

So I think it's really important for everyone listening to understand that expertise is even more valuable than it ever was.

The reason we are able to squeeze those results out of GPT is all of those years of training and our deep understanding of the subject.

If it weren't for that, we would not be able to push the state of the art, and we're seeing it.

It's not like we're seeing thousands of non-mathematicians suddenly being able to prove new results.

In fact, if anything, we have seen recent examples on social media where non-mathematicians have tried to use those tools to prove theorems and come up with many tens of pages of proof, and then it turns out to be just wrong.

So this is a danger that we have to grapple with.

It seems like that's going to be a problem in a lot of areas.

You see people using current models that often just reinforce things they want to hear, and that can be kind of harmful.

"I'm going to come up with some sort of unified theory" or whatever. With math, I guess, that's going to be a lot harder.

Yeah, I mean, this issue of mental atrophy, if you will, is also very prominent in coding.

I wasn't a computer science major, but I took some computer science courses, and I coded myself and wrestled with the debugger, as most people my age did. Nowadays you don't have to do that in your university curriculum, and I think that's very dangerous.

I've heard some people in the sciences look at the progress and, being very optimistic, say, "Well, we're not going to need scientists anymore." And, oh no, wow, this is terrible.

So I really want to make sure that anybody listening does not say that. This is the opposite of what we need.

We need more scientists than ever.

Those scientists are going to be more productive, more powerful.

They will do better things, but we need them to be really, really good at their craft. And this is where, obviously, OpenAI cannot do everything, just to say it out loud, and this is where existing institutions have a very big role to play.

So academia needs both to understand the rate of progress, how fast this is going, and to reclaim its role in that process.

Yeah, my hope and expectation is that we're going to see more people going into the sciences, because if you decide later on that you want to get into this, it's easier to catch up if you're dedicated: you have the greatest tutor in the world. OpenAI just added a visual explanation tool to ChatGPT that helps explain things. And just because an AI model is able to completely top out a benchmark doesn't mean you go, "Okay, we're done.

We solved grade school math.

Congratulations, everybody.

AI is done." It's like, no, there's a next level, and a next level, and you need people.

No, I think it will help the young generation get up to speed in science so much more quickly. I cannot imagine if I'd had GPT as a teenager. I remember looking at Maxwell's equations and thinking, what do they really mean? How did they come up with this stuff? Now you can just ask, and it will explain it to you so beautifully. It's a big deal, but you still need to do the hard work on top of it.

With a lot more people trying to create mathematical proofs who don't know what they're doing, and maybe aren't putting in the right scholarship, much like we've seen in code repos with people contributing fixes that aren't real fixes, how do you solve for that? If I'm somebody involved in mathematics or running a journal right now, I'm a little bit terrified.

Yeah, so I think, as Ernest said, AI can help with that too. On the other side, we can have AI agents that also go over everything, trying to verify as much as possible. Again, we do not want to fully trust the AI to verify and accept a paper or a commit, but we can have AI agents flag specific potential issues, bringing them to the front: "Hey, maybe this, but I'm not totally sure about it." That will accelerate things and will help the human have less to verify, basically.

And I think the social structure of mathematics, or of code, has to change a little bit, so that the human making the commit or controlling the agent takes responsibility. In mathematics there already is a culture of this: if you put out an incorrect proof, it hurts your reputation, and you're putting your reputation on the line when you put out a paper with your name on it. I think we need more of that.

If somebody watching or listening is mathematically curious, maybe has an interest in math but never felt they were a "math person," and wants to get started, what would you tell them?

Go chat with ChatGPT. If you are interested in learning, it's so helpful. Even at the research level, when I need to learn a new concept, I would habitually go to Wikipedia, and it's just very dense. After about 30 seconds I go, okay, let me ask ChatGPT. Then I ask it, and I also ask follow-up questions, and when I do so, it gives me so much more helpful information, tailored to the parts of my knowledge that are missing, because I'm asking questions tailored toward that. You could imagine explaining to ChatGPT your mathematical background, the books you've read, the material you've learned, and then asking it to come up with a question that would be open and also understandable at your level of expertise. Sébastien mentioned this: I don't think people appreciate that these LLMs are able to come up with good questions, but I think they can. So you have this companion you can talk about math with and talk about questions with. You could ask the model to help you solve one, and once you have a solution, you can keep talking and come up with the next question, variations of it. Even though you're still alone in your room, it feels much less like a solitary process, and that's what really makes mathematics fun, because I think math really is a social endeavor.

I think toy problems can be fun. I tell people we can start with, like, how many M&Ms can you fit in your bathtub? Yeah, it sounds silly, but you start to ask, and then you go, how many words did you read last year? How would you figure this out? Then you can start to have this real, wonderful conversation and start asking these questions, and the next thing you know, you're doing more complex mathematics and realizing how it can affect you. Gentlemen, this is great. Sébastien, Ernest, thank you very much.

Thank you for having us.