OpenAI Podcast · 2025-11-20

How AI Is Accelerating Scientific Discovery

Hosts: Andrew Mayne

Guests: Kevin Weil, Alex Lupsasca

AI for science · GPT-5 · OpenAI for Science · physics research · mathematical proof · literature search · benchmarks (GPQA, GDP-Val) · reasoning / test-time compute · fusion energy · drug discovery · human-AI collaboration

Why it matters

Frontier problems currently have low non-zero pass rates.

Key claims

  • OpenAI for Science's stated goal: compress ~25 years of scientific progress into 5 by giving top scientists access to frontier models.
  • Alex Lupsasca described himself as an AI skeptic entering 2025, converted after GPT o3-pro solved a pulsar magnetosphere sum using a 1950s Norwegian math identity.
  • GPT-5 Pro correctly derived conformal symmetries of black hole equations after a warmup prompt—a calculation Lupsasca places at the edge of human ability.
  • Upcoming OpenAI paper (~12 sections, 8–9 outside academics across math, physics, astronomy, CS, biology, materials) documents working uses of GPT-5 and includes 4–5 new non-trivial mathematical results.

Episode summary

Summary

Kevin Weil (head of OpenAI for Science) and Alex Lupsasca (OpenAI research scientist and Vanderbilt physicist) discuss how AI is beginning to push the frontier of scientific research. OpenAI for Science's stated mission is to compress roughly 25 years of scientific progress into 5 by putting frontier models in the hands of leading researchers. Weil argues GPT-5 is crossing into territory where it can produce novel results—small breakthroughs and existence proofs that mark the transition from 'the model can't do this' to 'I can't imagine doing this without AI.'

Lupsasca, a self-described former AI skeptic, recounts concrete breakthroughs: GPT o3-pro deriving a closed form for an infinite sum involving Gegenbauer polynomials tied to pulsar magnetospheres (drawing on a 1950s Norwegian math paper), and GPT-5 Pro finding conformal symmetries of black hole equations after a warmup prompt—an answer at the edge of his own abilities. Both guests emphasize that current frontier use requires back-and-forth iteration, low pass rates, and even 'warmup' prompts, and that a major OpenAI for Science goal is reducing this cognitive load for researchers.

They preview an upcoming paper (12 sections, ~8–9 external academic collaborators across math, physics, astronomy, CS, biology, materials science) that catalogues working use cases and includes 4–5 new non-trivial mathematical results. Weil repeatedly stresses that today's models are the worst we'll ever use, that pass rates climb sharply with more thinking time, and that giving expert scientists large amounts of compute is itself an underleveraged lever. Looking ahead 5 years, they highlight personalized medicine, scalable fusion, and AI-driven experimental design—while framing OpenAI for Science as a general-purpose push rather than a top-down agenda. Weil acknowledges parallel work at Google (Demis Hassabis, AlphaFold) and notes GPQA has gone from GPT-4 at ~39% to recent models near 90%, requiring harder frontier-science evaluations.

  • Frontier problems currently have low non-zero pass rates; models often need warmup questions and iterative prompting—OpenAI is researching how to reduce that cognitive load.
  • Pass rates continue to improve with longer thinking time; Weil says models can already think for 40+ minutes and could be allowed much longer for expert scientists.
  • GPQA benchmark: GPT-4 at ~39% in 2023, recent OpenAI models near 90% (above the ~70% human expert baseline), driving need for harder frontier-science evals.
  • Weil acknowledged parallel work at Google/Demis Hassabis on AlphaFold and stated the broader goal as enabling '100 scientists to win Nobel prizes using AI.'

Source material

Transcript

Hello, I'm Andrew Mayne and this is the OpenAI Podcast.

Today, my guests are Kevin Weil, head of OpenAI for Science, and Alex Lupsasca, who is an OpenAI research scientist and professor of physics at Vanderbilt University.

We're going to be discussing how AI is impacting science, an upcoming research paper, and where science might be headed in the next five years.

Maybe the most profound way that people are going to feel AGI in their lives is through science.

With ChatGPT, I can just launch it in that direction, in that direction, in that direction.

The acceleration that is going to come from these tools is going to change science.

So you're running the OpenAI for Science initiative.

Could you explain what that's about?

Yeah, the mission of OpenAI for Science is to accelerate science.

So the question is, can we help scientists do the next, say, 25 years of scientific research and scientific discovery in five years instead?

Science underpins so much of what we do and how we live.

If we can make progress go faster by putting our most advanced models into the hands of the best scientists in the world, we should do that.

That's what we're trying to do.

You could ask, why now?

Why didn't we do this a year ago?

Why aren't we doing this a year from now?

One of the big reasons is we're just starting to see our frontier AI models being able to do novel science.

So we're starting to see examples where GPT-5 can actually prove new things.

Maybe not yet things that humans could not do, but things that humans have not done.

So these little existence proofs of GPT-5 being able to break out past the frontier of human knowledge and into the unknown.

And if there's one thing that I've learned from now, a year and a half or so at OpenAI, it's that you go very quickly from the model can't do something to the model can just barely do something.

And it's not great at it yet, but you see these early examples.

And then six months later, 12 months later, all of a sudden you couldn't imagine doing this thing without AI.

And I think science is in that initial phase where we're seeing real acceleration for scientists that are using AI.

Sometimes novel, not yet maybe large breakthroughs, call them small breakthroughs.

And that just says that there's so much potential in this space.

We've seen examples of, let's say, AI helping with mathematical proofs.

Could you give me an example of how it might do things in some other areas like physics or whatever kind of things we might see in the short term?

Yeah, I mean, we're seeing examples every day and they're across the range of sort of the scientific frontier.

You see examples in mathematics, in physics, astronomy, life sciences, like biology.

Alex, I mean, you've worked on some of these.

Maybe it's a good time to talk about some of the physics stuff that you've seen.

Yeah, I think coming back to Kevin's point about how this is a special time, that's very much how I feel as well, because I started the year 2025 thinking, yeah, ChatGPT is cool.

Like everybody, I used it when it came out and I thought it's a great chatbot, but I was sure it would take a very long time before it would become really relevant for my own work.

So I started the year, I would say, as an AI skeptic, because I like to see evidence before I'm convinced of something.

And I saw people using it to help in the writing and I started to use it for that as well.

It's very useful for proofreading, but I thought, oh, it's going to be a while before it gets to do the special stuff that I'm really a specialist at.

Black holes.

Like black hole physics, exactly.

And I had this experience early this year where I was trying to find this magnetic field solution that describes what happens around a pulsar, which is a rotating star with very powerful magnetic fields.

And I was going for this very particular solution.

I had to solve a partial differential equation.

I was able to identify that solution as an infinite sum over products of special functions called the Gegenbauer polynomials.

And if you go to physics grad school, this is the kind of thing that you spend a lot of time getting familiar with.

And I also like these puzzles and I was playing around with the sum and I felt like there should be a simple formula that it evaluates to.

And I thought, okay, I have this friend who has ChatGPT o3-pro, which I didn't have access to at the time.

And I thought, okay, I'm just going to send it to him and see what comes out of it.

And he sends me back this output.

It thought for 11 minutes, which at the time I'd never seen it do because I was using the free version, which doesn't think for as long.

And it gave this beautiful answer where it was able to understand what the sum was.

And break it down into pieces that it could tackle.

And then it had to go and find this special identity that was published in one paper from the 1950s in the Norwegian Journal of Mathematics.

And so it understood what the problem was.

And it knew about this random identity that was just the thing for the job.

And it used them and it gave this beautiful output.

And at the end, the answer was wrong because it made this silly typo.

It added an extra factor in front of it.

It was almost kind of like a human making a silly typo at the end, but it was very easy to check the derivation.

And I went through it and I realized, okay, there's this extra factor.

But aside from that, it did the work.

And that really sent me reeling because I thought, okay, I would say that's a uniquely human ability.

I thought that's something that makes theoretical physicists special.

You know, now in 2025, clearly they're capable of doing things that I would consider amazing.

Yeah.

One of the cool things.

So you've got examples like Alex's, where he could have done it himself eventually, but GPT was able to do it faster.

That's acceleration on its own.

And there's something qualitative about that, even as well, because if you can explore, instead of exploring two paths over the course of a week, if you can explore 10 paths in parallel in, you know, an hour, all of a sudden there's a lot more ideas that you can try.

And that's also acceleration.

We also see examples in like literature search, which you don't think of as maybe like deep scientific innovation, but it's really important to be able to understand, you know, has somebody worked on this problem before?

And if so, is there something I can learn to speed up my own work?

So, and we've seen interesting examples where there was one, I might get the details of this wrong, but we were talking to this researcher and he was saying, he was exploring this particular idea in like high dimensional optimization.

And he was like, man, you know, this thing I'm working on, it's interesting, but somebody must have worked on this before.

I can't be the first person to have had this idea.

I just can't, but I can't find any examples.

And then he had given it, he sort of gave a description of what he was working on to GPT-5, and GPT-5 found an example from, I think it was like economics or something, a completely different field that used completely different terminology.

So no keyword lookup would have ever worked.

GPT-5 did sort of a conceptual-level literature search, found somebody's PhD thesis in German.

So also a completely different language, you know, it was like basically lost to time, but this person had done really interesting sort of related work that helped him in his research.

And so, you know, that's another area.

So you can talk about the acceleration that comes from just like novel proofs and GPT-5 being able to do something on its own or guided by an expert.

But there's also these examples of acceleration and calculations and literature search and, and all of them contribute to accelerating science.

Yeah.

And the exact same thing happened to me.

I was trying to derive this property of black holes and I got this equation that described this phenomenon I was after and it had a three derivative term, which is pretty unusual.

And I looked at it and I recognized it's something called the Schwarzian derivative, which is a special thing that appears in math.

And I thought, hmm, wow, this is really strange.

This would show up.

And I just copy-pasted the equation into ChatGPT and said, have you seen this before?

And it said, oh yes, this is the conformal bridge equation.

I had no idea what a conformal bridge was at the time.

And it said, oh, just look up this paper.

And that was amazing because it turns out that this equation that showed up in my work had already been studied in some other works.

And I've heard from a lot of colleagues doing research in physics that there's a lot of that going on.

And at the forefront of knowledge, everything becomes so niche that it's very hard to know the latest details in neighboring fields.

And GPT is an amazing help with that.

Yeah.

That's another thing that we've heard from professors, researchers that we've talked to is there's so much, you have to be so specialized today.

And so sometimes it gets hard to explore an area outside of your main area.

There's one particular, um, mathematician we were talking to said, you know, one of my last papers, I knew there was an area that I wanted to go follow it off in this direction, but it wasn't my specialty and it would have taken me a long time.

And I just kind of ended up feeling like, you know, maybe that's not the most efficient place for me to spend my time.

Now with GPT-5, I'm going to go back and explore that because I've got a coworker, effectively a collaborator, who has read just about every scientific paper that's out there and is, you know, a pretty meaningful expert on just about any topic you want.

And I think I'm going to be able to go explore these adjacencies in a far better way with ChatGPT than I could have on my own.

And so that's also a fascinating new take, right?

It helps every, it can help you go deeper.

Like you were saying, it can also help you go more broad.

Literature search is pretty interesting.

It's like one of my weird hobbies is I like to go back and look at when was some early scientific discovery made that didn't get utilized so much later on.

Yeah.

Famous one was carbon filaments.

You know, Thomas Edison spent all that effort trying to find it, and it had been published like 20 years before. The Dewey Decimal System was invented that year.

So you can't blame him.

Other things like silicon as a semiconductor, you know, if somebody had read the literature.

We might have had that five to 10 years earlier. The ability to replicate DNA, that had been published like 10 or 12 years earlier before somebody figured that out.

And then the shotgun technique we use for DNA, you know, understanding, you know, figuring out like the DNA sequencing that was first published like 1982, but at that time there weren't supercomputers that could run it.

Right.

And that's exciting just to think of just having a really good tool that can search through all of this stuff and pull up these answers you have.

Yeah.

And I think especially some of the most interesting research now happens at the, at the intersections of, of two fields.

And again, it's, it's hard for one person to be an expert in two fields, let alone three or four or five.

And sometimes it's tough for humans to collaborate.

You don't necessarily find the right person.

The person doesn't have infinite patience.

And here with GPT, you have now the option to have a collaborator that will work 24/7, has infinite patience, you know, has read substantially every scientific paper written in the last however many years.

And so it's just, it's, it's a new kind of collaboration that is its own form of acceleration.

Think about how Claude Shannon's wife was a mathematician and how much that helped what he was able to do.

And I think we forget how much collaboration really is a factor of that.

But I would say some people hearing this might go, yeah, but it couldn't spell strawberry last year.

Yeah.

It couldn't do math.

So why are we going to have it do, you know, science?

Yeah.

So actually, I don't even know if I've told you this, my own sort of origin story of appreciating what GPT-5 could do.

Or in this case, it was, I think, oh, this was almost a year ago.

So it was o1-preview maybe.

But I was meeting with this guy named Brian Spears, who's a physicist at Lawrence Livermore.

He was in DC and we'd never met before.

So I didn't know sort of what to expect.

I thought maybe I was going to go in and be talking to him about what was new and what he could do with o1-preview and why he should give it a try.

Little did I know I sat down and, and he immediately took control of the conversation and said, let me tell you what I can do with your models.

And like, these are the most amazing things for science and this is going to change the world.

And he was like, okay, let me take you through this.

And he opened up his laptop.

Uh, and you know, he works on fusion, right?

Lawrence Livermore was the first to do large-scale fusion with positive energy, like super exciting.

So he's like, all right, we're going to take a fusion example.

And first I'm going to start with the undergrad version of this problem.

And so he, he shows me this conversation and he's like, all right, so you've got, you know, uh, uh, a copper rod and we're going to bombard it with super high pressure waves.

What happens?

And you know, so he enters it and o1-preview gives a good answer.

It's like, okay, cool.

So it got the, it got the, uh, got the undergrad problem.

Right.

And then now let's, let's ask the graduate version of this.

Now what happens inside the rod itself as you're doing this and you know what, what needs to be true in order for it to generate these certain kinds of shockwaves.

And he goes through and he's like, okay, so got that right.

All right.

Now let's ask the postdoc level question.

All right.

Now let's ask the, and at this point I'm like, you know, despite having a physics background, I'm just following along for the ride because he's beyond anything I can do.

Like, all right.

Now let's ask the "you just joined Lawrence Livermore" kind of question: you've gone through your postdoc, you're a nuclear physicist. And he keeps going and o1-preview keeps getting the answer.

Right.

And then he's like, all right.

Now let me ask you that you've worked at Lawrence Livermore for 20 years question and it goes and it gets it right.

And then not only that, but it like suggests that the only way to go forward is to use this set of simulation tools that are like partially classified or that only Lawrence Livermore has.

And it's like, you know, I don't have access to these, but if you did, you would want to use these tools.

And he's like, look, nothing in here that nothing that I just showed you was something that I couldn't do, but it would have taken me days.

And certainly not everybody at the lab can do this.

Like the acceleration that comes, that is going to come from these tools is going to change science.

And so I went from like sitting down with this guy who I thought maybe I was going to be sort of talking to him about the value of AI to him just completely blowing my mind about the, the potential of AI.

And this is a year ago.

This is o1-preview.

You know, we've come leaps and bounds since then.

And the thing that I always try and like remind everybody: the AI models that we're using today, as good as GPT-5.1 Pro is, these are the worst AI models that we will ever use for the rest of our lives.

And when you think about that, the fact that we're here just implies that the future is very bright.

How have your colleagues been using these tools?

Yeah, there's a lot of different usages, I think literature search, here's what I'm working on.

Does it connect to any other thing?

And this is something that we spend a lot of time on as scientists just understanding when something new shows up in our work, how it connects to other things.

And okay.

My own experience that made me become AI-pilled, I think is just the reason you came to OpenAI.

Yeah.

And when GPT-5 Pro came out, I met Mark Chen who works here at OpenAI.

He's chief research officer.

And he gave me a challenge.

He was very proud.

He said, you know, why don't you just give it a hard problem?

And I thought, he wants a hard problem.

Okay.

And so I gave it this question of gravity.

Right.

So I had just found these new symmetries of black holes, which is something that doesn't happen that often.

And I'd written up a paper that came out in June on the arXiv.

And I was very happy about that.

And I thought, okay, well, let's see how GPT pro handles this new question.

And so I gave it the equation.

And I didn't say that it has some symmetries.

I didn't give it a leading question.

I just said, what are the symmetries?

And it thought for five minutes and said, no symmetries.

And I go, it's not there yet.

Still better than the AI.

And Mark, visibly crestfallen, said:

Okay, well, just give it an easier question then.

And so I think, okay, I'm going to give it the warm up baby version of the problem, which is find the symmetries of this equation, not in the full black hole spacetime, which is complicated, but in the flat space limit where the spacetime is empty and hit enter.

It thinks for, you know, nine minutes.

It comes back with this beautiful answer.

Oh, this equation has conformal symmetry, which is the correct thing.

And here are the three generators.

And it was very beautiful.

And you know, this version of the equation probably has been studied, I'm sure has been studied many times over the decades.

So I don't know what it did exactly, but it came up with the answer.

And I thought, okay, this is very good.

You know, this is a great outcome.

And then Mark said, okay, well, but now that it's been primed on the warm up example, try again in this instance of chat, the harder problem.

And I thought, okay, let's go.

And so we give it the hard problem again, hit enter, and it thinks and it thinks.

And that was the first time I saw it think for so long. I think it took 18 minutes.

And it comes out with this beautiful answer that was completely correct.

And that blew my mind because I had been working on this for a very long time.

And I would say that that calculation is at the edge of my abilities.

I think it's something that, you know, very few people could have done the way I did it.

And so I was really shocked because, you know, you spend years of your life training to be best in class at something, and finding symmetries of black holes and these kinds of equations,

That's my jam.

And I thought, okay, so I guess that just happened.

And it really sent my mind reeling.

And I was a little bit shell shocked for a few days and I just couldn't stop thinking about it.

And after that, I realized, okay, I have to become involved in this, because to see this capability emerge into the world right now and to not be involved with this just seemed crazy to me.

I actually think you made a really important point in the middle of that around the fact that you gave it the hard question.

It didn't get it right.

You gave it an easier question.

It got that right.

And then you're able to give it a harder question.

There is still, you know, as excited as we clearly are about the future here.

There's also a very real set like when you're giving GPT-5 or any of these AI models a problem that's on the frontier, that's at the limit of their capabilities, they tend to still be wrong a lot.

Kind of like any human would be operating at the level of at the frontier of their capabilities.

And it takes, you know, it isn't just automatic yet.

Hopefully in the future, it will be, you know, enter in any hard question and the model answers it.

But today, there's a lot of back and forth and the people that are best, the researchers that are best at getting the most out of the models, have a sort of patience to go back and forth with them.

I think that's natural.

It's probably the way that you would work with any, you know, any two people operating at about the limit of their capabilities.

But I think it's important, especially for folks listening to this who are doing research with the models, to know that it's not, it isn't just one shot and it always works.

There really is a back and forth and sort of a patience that it takes.

And one of the interesting research problems that we're spending a lot of time thinking about is how we help people with, how we sort of help reduce that cognitive load.

Because when you're working on a problem, say the model has a 5% pass rate on some problem.

So technically, the model can get it right once out of 20 times, but it's really at the frontier.

So it's not going to get it right nearly, you know, even close to every time.

If you're sitting inside ChatGPT and just entering in this question, you're going to have to enter it in, you know, what, 10 times before you have the odds that it's going to get the right answer.

And that's, most people aren't going to do that.

And so there's a whole host of problems that the model can solve that people probably try and are like, Oh, after three tries, it didn't get it right.

So let's all move on.

The model's not good enough yet.

And actually it is, but it's just very hard to tell apart low pass rate problems from problems that are too hard.

And I think that's actually a really important thing for us to help researchers and mathematicians get past because the most interesting problems right now are going to be the ones where the model has a very low, but nonzero pass rate.

Those are going to be the hardest problems that the model can solve, the best ways that it can, that it can help accelerate science.

And so that's a really interesting research problem that we're taking on to try and make that a little more automatic, a little less ground work.

But for now, like putting in the time and really going back and forth with the model does yield results.
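The pass-rate arithmetic in this passage can be made concrete with a short calculation. The sketch below is not from the episode: it assumes every attempt is an independent try with the same fixed pass rate, which is a simplification, and the function names are hypothetical. Under that assumption, the chance of at least one correct answer in n tries is 1 - (1 - p)^n.

```python
import math


def p_at_least_one_success(pass_rate: float, attempts: int) -> float:
    """Chance that at least one of `attempts` tries succeeds.

    Assumption (not from the transcript): tries are independent and
    share the same per-try pass rate.
    """
    if not 0.0 < pass_rate < 1.0:
        raise ValueError("pass_rate must be strictly between 0 and 1")
    if attempts < 0:
        raise ValueError("attempts must be non-negative")
    return 1.0 - (1.0 - pass_rate) ** attempts


def attempts_needed(pass_rate: float, target: float) -> int:
    """Smallest number of independent tries whose combined success
    chance reaches `target`."""
    if not 0.0 < pass_rate < 1.0 or not 0.0 < target < 1.0:
        raise ValueError("pass_rate and target must be strictly between 0 and 1")
    return math.ceil(math.log(1.0 - target) / math.log(1.0 - pass_rate))


if __name__ == "__main__":
    p = 0.05  # the 5% pass rate used as the example in the conversation
    for n in (3, 10, 20):
        print(f"{n:>2} tries: {p_at_least_one_success(p, n):.1%}")
    print("tries for even odds:", attempts_needed(p, 0.5))
```

With a 5% pass rate, three tries succeed at least once only about 14% of the time, ten tries about 40%, and twenty tries about 64%; even odds take 14 tries. That is why giving up after three attempts says little about whether a problem is actually out of reach.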

Well, it feels like we're at a moment kind of like when we went from GPT-3.5 to ChatGPT. GPT-3.5 was a model, an extremely capable model, but it was still effectively a base model.

And I was a prompt engineer at the time and knowing how to prompt that I could get great results for it, but it took all those little tricks to sort of understand the context.

Then we went to ChatGPT and we understood, okay, we know the kind of problems people are trying to solve.

Let's make it a little bit easier for them to get there without having to do that.

It feels like that's kind of where we're heading into a science though, that now that you have people like Alex explaining the problems you're trying to solve and what you're doing that we may see like a big acceleration with this.

I think it's probably just a characteristic of any question that's on the frontier of, or sort of at the limit of what the models can do.

And back with GPT-3.5 and early versions of GPT-4, the questions that were at the limit of what the model can do were much more basic.

Now they're questions of scientific research, but you still, when you're operating at the frontier, the pass rate will be low.

And so you got to kind of like, there's value in sticking with it and trying a few different things and taking the parts that it gets right and refining them while telling the model where it got other things wrong.

In this example, I mentioned it needed a warmup, but the warmup was the obvious warmup that you would do as a human.

Because actually when I was attacking this problem, I wasn't thinking about the black hole case first, this flat space limit was the obvious place to start.

And that is where I began.

And so I think the models are actually really good, but we could get better at making them think of the warmup problem themselves so they can go there directly.

But more generally, I think there's this thing we have to bear in mind, which is that as scientists, our role is to push the edge of knowledge.

There are things that are just beyond the edge.

And our goal is to bring them before the edge of knowledge by understanding them.

But this edge is very jagged.

So there are very basic questions about the universe, like why are there three dimensions of space or what happened at the Big Bang?

These are things that everybody wants to know the answer to.

And yet, even though they're simple questions, there's really nothing intelligent to say about this.

We just don't know.

They're very hard problems actually.

And then meanwhile, there are these very hard questions that you would think we wouldn't be able to answer at all, to which we have extremely detailed answers.

We can predict the electron dipole moment to, I don't know, 12 decimal places, something crazy.

So the edge of human knowledge itself is very jagged.

And it takes many years of graduate school to learn where the edge is.

And I think what we're finding with these AI models is that the edge of their knowledge is also very jagged.

So you mentioned there's some basic questions that the models can't answer.

That's true.

At the same time, there are some very hard questions that they're very well suited for already today.

And I think what's exciting is that their edge of knowledge is very jagged in a way that's different from ours.

So obviously, as time goes on, I think the edge of ability for these models is going to keep expanding.

But as long as it expands in a way that is slightly different from our edge, that's also really interesting because at the intersection where it can go farther than us or we can get ahead of it, that's where a lot of interesting things are going to happen, I think.

Yeah, human and AI together are much more powerful than human alone or AI alone.

I want to explore that a little bit more.

But first, tell me about the research paper.

Yeah, so we've talked a bunch about these anecdotal examples that Alex has gotten from the time that he spent with his colleagues that we see coming in across Twitter on a semi-daily basis at this point.

And we wanted to bring them together and just write something, publish something that lays out the current state of GPT-5 with respect to science.

And so what we've got, it's a handful of collaborators from inside OpenAI and I think eight or nine academics from beyond our walls across a bunch of different fields: math, physics, astronomy, computer science, biology, materials science.

And the paper is something on the order of 12 sections, each one highlighting a different way that GPT-5 is accelerating their work.

The goal was not to be, you know, hypey and say everything is solved.

It's really to say hoverboards for everybody.

Yeah.

Like, this is what works.

This is what doesn't work.

Here's what I tried.

In many cases, we're sharing the ChatGPT, you know, the full share link to the conversation.

So you can see the back and forth that the scientist has with the model.

And it's meant to be kind of a moment in time to say this is where we are today.

And I think we'll look back in six months, 12 months, and we'll probably be much further and that'll be exciting.

But even where we are today, we've got a section in the paper on a bunch of different examples around literature search, a section in the paper with a bunch of different examples around acceleration, whether it's calculations and other things like that.

And then a section where we actually contribute four or five new non-trivial results in mathematics.

And a couple of these are small.

A couple of them probably could have been papers on their own.

And so you go from kind of the mundane, but very pragmatic and real bits of acceleration to the more sort of profound GPT-5 actually pushing past the current frontier of human knowledge.

And so we're super excited about this paper.

It's, you know, I think there'll be a lot more to come.

We're not the only lab doing great work, by the way.

Google has been doing this for a while and I have a ton of respect for what Demis and the team have done with AlphaFold and more.

I just think we're at a really exciting time.

You know, ideas in science often have their moment when you have multiple people coming with the same idea, whether it's quantum mechanics, like Alex was talking about, or the light bulb.

Right now, it's very clear that AI is just beginning to change science and it's going to be an exciting few years.

What advice do you have for students and grad students in the sciences?

Because I hear people talk about like, "Oh, we're not going to need scientists anymore," which sounds absolutely crazy.

It's not like the telescope got rid of the astronomer.

It actually created the astronomer.

How do you feel about that and what advice do you have?

Okay, I think first of all, it's important to acknowledge there's a lot of anxiety in academia right now that is unrelated to AI.

It has to do with lots of changes in the way that science is organized in this country and we're still going through these changes.

I think that talking to young people, there's a lot of anxiety surrounding this.

I actually think AI is a really exciting new tool that's becoming available that is going to help a lot because it's just going to make everybody just so much more efficient.

As Kevin was mentioning earlier, when you work on a research project, oftentimes you don't know which way exactly to go.

You know you're here, you want to get there, but there are different possible paths, different lines of attack.

The whole point of research is that from the get-go, you don't know which way to go.

One of the things that's really fun, actually fun with GPT is that you can just say, "Hey, I'm trying to solve this.

Here are some ideas I have."

You can upload some notes that you have or just describe it in a few sentences.

It's very good at getting what you're trying to do.

Then you can just say, "What if I approached it this way?"

or "What if I were to do it this way?"

It can immediately go off and chart a path through the unknown, just signposting different potential avenues.

That actually saves so much time because I'm a human, I have a little bit of time, energy, and when I'm going to put in the effort to do a calculation, I spend a lot of time trying to prototype it and think ahead where it's going to take me.

With ChatGPT, I can just launch it in that direction, in that direction, in that direction.

It doesn't completely get everything right, but just having these signposts along the way is so helpful because then when you do go down the path yourself, you have somebody helping you along, it feels like.

I think that's just going to make everybody faster, more productive.

Already, the young people that I meet are spending a lot of time experimenting with ChatGPT and figuring out its capabilities.

I think it's going to be a boon for everyone.

You mentioned part of the idea of the paper was to say, "Okay, this is where we are now.

Let's go look in six months.

Let's talk.

We're five years since GPT-3.

We're five years from now, we're sitting down here."

What are we going to see?

Oh, man.

The five-year question is so hard.

I mean- It's a great question.

Here's a crystal ball.

Yeah.

I mean, the exciting thing about this field in general is you look back 12 months and you're completely embarrassed by where you were 12 months ago.

When GPT-3 launched, it was unbelievable.

I'll speak for myself.

It blew my mind, the idea that AI could do any of these things.

Then somewhere around GPT-3.5 and 4, the Turing test, which we had held up for like, what, 75 years as the pinnacle of artificial intelligence research.

Oh, man, the world will be different when an AI can pass the Turing test.

We just went whooshing by and now we just don't talk about the Turing test anymore.

It's totally forgotten.

Even you look back to the beginning of this year, of 2025, and most people were writing code themselves.

Most engineers were writing all of their own code.

That's gross.

Writing it yourself.

Now, fast forward, and the idea that you would do really much of anything without leveraging Codex, Claude Code, GitHub Copilot, any of these tools (they're all incredible) is crazy.

You're so much more productive with it.

Just in 12 months, and in 12 months, software engineering has fundamentally changed.

I think over the next 12 months, we're going to see profound changes in the way that science is done, both in the stuff that we can do in silico, in theoretical physics, in mathematics, and computer science.

I think we're going to begin to see it in the life sciences, in the physical sciences.

That's over the next 12 months.

I mean, five years.

That's a question I think about a lot because when it comes to mathematical proof, I can go into a computer and I can test that, I can verify that, or at least test it to some extent, and the same with some equation for physics.

But when you get into talking about the life sciences or material sciences and stuff, are we going to have a bottleneck of way more predictions than ways to test them?

I think one of the valuable, there's so many areas where models can help with life sciences.

If you take biology, drug discovery, for example, you have a huge search space.

The more that models can learn how to prune that search space, the more even if you're going to end up with a bunch of physical, real world experiments to run at the end of the day, if you can intelligently prune the search space, then you can more rapidly converge on the drugs that are likely to work in particular scenarios.

And then you can think about the impact.

For that to have real world impact, you need to make it all the way through the regulatory process.

That is its own process that AI can help speed up because you end up needing to write these huge papers that bring together tons of different findings and so on.

So you can take each step of the process and AI can help upfront as you prune the search space and try and find candidates that are more likely to meet your needs and meet the goals that you have.

And then as you go through the process to getting this thing out to consumers and making a real world impact, AI can contribute there.

And we have pilots with a number of the companies in the space doing that.

So it really is fairly broad based.

You started off with an interest in particle physics.

You were studying that, and then you found other things.

And now you find yourself back in the sciences.

Do you think other people are going to follow that pattern?

I mean, it is an absolute privilege for me to get to come back and work on science.

And I am nowhere near the scientists that folks like Alex and other people here at OpenAI are, but I don't know of something.

I think we talk a lot about AGI at OpenAI, artificial general intelligence.

I think maybe the most profound way that people are going to feel AGI in their lives is through science.

ChatGPT is an incredible tool.

I use it tons of times every single day.

But AGI inside ChatGPT will be able to do lots of things.

But when I can have personalized medicine, if AI models can contribute to science, finding a way to do scalable fusion more quickly, those kinds of things will change all of our lives.

And I think these are very real possibilities at the pace that we're going.

So that's why this is the most exciting thing in the world to me to get to work on.

I don't know what AGI will look like, but sometimes the experience you have of giving ChatGPT a really hard equation you're working on, and it just puts out the answer.

To me, that feels certainly like something approaching that.

And I also don't have a crystal ball.

I also clearly have a bad track record of predicting where AI is going, given that at the start of the year, I didn't think I'd be here.

But there's two things that are simultaneously clear to me.

One is the models are definitely going to keep getting better.

And sometimes my colleagues ask me, "Oh, are we reaching a plateau?"

And that is actually something I was wondering about too.

And then I joined OpenAI and I got to play with some internal models that we have.

They're even stronger.

And I was like, "Okay, this is definitely going to keep getting really, really good."

And then the second thing is, I think already with GPT-5 Pro, which is I think our best 5.1 Pro today, our best model that's available on the outside, I think there's a big gap between what the models can do and what the science community uses them for.

And one of our goals here at OpenAI for Science is to start bridging that gap.

Because I think the models move so fast that unless you're really paying attention, you may not realize how much has changed in just the last few months.

And so I think these two facts are true and are going to, over the next year, really lead to big changes in science.

The models just keep getting better and people are starting to catch on.

And that's why we're seeing all this chatter on Twitter and social media.

And that's only going to accelerate.

So where that takes us, I don't know, but I'm excited to find out.

I think you both made a very good point in that is that these models improve at such a rapid pace that sometimes people have a very firm idea of what they are because they tried something six months ago.

And I've encountered people who I really respect and the scientists are like, "Oh, I tried it."

And I'm like, "You tried it 18 months ago."

And they're not used to a tool evolving that quickly.

Yeah.

Or they're using the free version because of course that's how everyone starts and the free version doesn't think for as long and so it can't solve problems that are as challenging.

Yeah, I think that's really real.

It's one of the reasons that I think the best advice is to just keep trying the problems.

Even if you're working on problems and, as you try them on GPT-5, it isn't super helpful, I wouldn't give up.

I would keep trying it every few months.

And I think at some point, it's going to start being valuable if it's not already there today.

We talked about thinking time.

That's another area that we're really excited to see that with GPT-5 Pro, you can get the model.

I've seen it think for what, maybe 40 minutes on some of the hardest problems?

Yeah, we have an example of that in the paper.

But it has a certain amount of compute allowance because we have to serve it to many, many, many people.

40 minutes is certainly not a limit on thinking.

The models can think for two hours, six hours, 12 hours, 24 hours.

And one thing we continue to see is that pass rate on hard problems continues to improve as you give the models more time to think.

Which is like, it's surprising actually the number of times there's a totally reasonable human, like intuitive human analogy to these things.

There are a lot of problems that I can't solve in 20 minutes, but that I might be able to solve if you gave me two hours.

System one and system two thinking.

Yeah, and some that I can't solve in two hours.

But if I had a day to really think about it and try different things, I might get there.

And the models are the same way.

So being able to give a much smaller, there aren't as many scientists in the world as there are users of ChatGPT.

If we could find ways to give scientists that really know how to use the models well, just a huge amount of compute.

I think that is yet another way that we can accelerate science.

Yeah, it's a very good point because you'll hear people talk about it.

We hit a wall or whatever.

And one of the things that was really an amazing discovery, which a year ago we found out about the whole reasoning paradigm and the fact that you can just take the model of today and let it think longer.

And we think about, you know, people go, what would we do with all this compute we're building, all this hyperscaling?

It's like even using today's models and letting them think for a long time, we could probably have some amazing discoveries.

Yeah.

100%.

I think if model progress stopped today, just the process of driving awareness within the scientific community and giving people more of the best that the models can deliver, I think we would see a large amount of scientific acceleration.

But of course, progress is not going to stop, as Alex was saying.

And so it, when you think about models, being able to think for a longer time, being able to train them to do harder and harder scientific tasks.

And actually also just, you know, getting out in the scientific community and helping people see what the frontier really is and how they can use the models better to do the work that they're doing.

I just like, I'm excited to see where this goes over the course of the next six months, 12 months, 24 months.

Yeah.

I think this is a really unique time in history.

It feels like a special moment.

And to be clear, we're not telling people drop whatever you're doing and come do AI.

That's not the message.

I think what we want to say is keep doing what you're doing, but also there's this great new collaborator, this new tool you get to use that's going to make it even more fun.

And it's going to bring new life into a lot of different fields.

One of the challenges right now with benchmarks is that, one, we talk about terms like saturation, and it seems like models have done that.

Also, a lot of them just don't seem that impressive anymore.

Now it looks like we're moving to the scientific frontier.

What do scientific benchmarks look like?

Yeah.

Like with many things, there's sort of an intuitive way to understand this as the models get smarter.

Benchmarks are just a way of testing the model in some sense.

And as the models get smarter, you need to give them harder and harder tests because they learn how to ace the earlier tests.

So if you take GPQA, which stands for Google-proof Q&A, it's a scientific benchmark that asks basically PhD level questions across a range of scientific fields.

We thought for a long time, that was a very hard benchmark to beat.

I think it came out in 2023 and GPT-4 originally was like at 39% on this benchmark.

Humans, by the way, are at about 70%.

But now you fast forward two years and our latest models are nearly at 90%.

So they're surpassing the capability of most humans in their field of scientific study across every field at once, which is kind of amazing when you think about it.

But those aren't the hardest questions in the world.

And that's one of the reasons that we're focused on new evaluations that ask frontier science and mathematics questions.

Also, we released something called GDPval recently, which is an eval that tests the model's ability to do economically valuable tasks.

So the smarter the models get, the harder the tests that we want to keep giving them.

Because every gap that we see, every place where the model can't answer a certain question, that's feedback for us and gives us a way to improve the model further.

Curing disease, great.

What area though beyond that would you really like to see?

And it could be crazy or weird or odd you'd like to see scientific acceleration.

You want to go first?

Well, I'm very selfish.

So I have my own interests.

I really like black holes.

That's my passion.

You want to build a black hole.

I think there's a lot of potential for how AI can accelerate black hole research.

And of course, I want to see it help with cancer and drug discovery and all these good things.

But my first priority is, yeah, I want to see more AI helping with black holes.

So there's a lot of ideas on the table and so much potential.

One thing is there are a lot of theoretical questions that are very thorny.

And I think if you just sat down and you could understand everything that is known and you could integrate it, integrate that knowledge, I think a lot of things would fall out of that.

And that's one of the things that we're exploring.

Dark matter, for instance, is something that we've been talking about because there's a lot of data on dark matter from various experiments, but we still have no idea really what it is.

There's a bunch of theories out there.

I think a really interesting idea is, could it be that by feeding ChatGPT all the experimental data that is known about dark matter and all the theories, it could rule some of them out already by combining bits of knowledge that are just so disparate that it's hard for our human minds to hold them together.

I think that's kind of an exciting frontier.

And then I think also, since we were talking about the far future, experimental work is totally not out of the question.

Right now we're focused on more theoretical fields because they can be done in silico.

But you could totally imagine using AI to design better experiments and maybe run very hard complicated experiments, including maybe for black hole physics and other fields.

I think there's a lot of ground to explore here and very exciting possibilities.

Yeah.

And I'll say fusion, just because if we can actually, we have again, large scale, but small existence proofs of it.

So clearly it can work.

And the challenge now is to do it at bigger scale, more reliably.

Clearly it's possible.

We will figure this out, but if we can accelerate it, then the world with fusion is a significantly better place than the world without, we solve a lot of problems if we solve fusion.

And I'm excited to see if maybe we can contribute in some way.

I think it's easily overlooked by people how much we're dependent upon energy.

And if we had the same orders of magnitude improvement on energy production that we had in the last 200 years, what that unlocks.

And you think about things that are energy intensive like desalinization or construction and other things.

And when you have really, really, really unbound energy, it's incredible.

I mean, some groups might be looking to build lots of infrastructure for lots of GPUs, for example.

Who might want to do that?

But even beyond that, I think that we're going to probably see from the infrastructure build out a lot more energy devoted to energy.

And much like mobile phones and laptops made electric cars a lot more efficient because of all this money being thrown into battery technology, I think we'll probably see that offshoot.

Yeah.

And I think anytime you change something by an order of magnitude, the world changes.

I think what we've seen over the past year with the way that software engineering has changed, you now don't need to be trained as a software engineer to write meaningful amounts of code.

That means you can bring, there are like, 30 million software engineers in the world.

I think now 300 million, maybe 3 billion people can write software.

And that's going to fundamentally change things.

If we can move, if we can make energy 10 times more prevalent, 10 times cheaper, it will change the world.

And I think it's a really high potential place for us to apply the intelligence of our models.

If I can add something, we have ideas that we're excited about in terms of the potential of AI to change science.

But this is very much not supposed to be a top down effort where we dictate what AI is going to do in the world.

We're actually very excited about building the best general purpose AI.

And if we release that into the world, then everybody will take it and use it for their own purposes.

And for me, I'm a black hole physicist.

I want to use AI to further black hole science.

But for a scientist in another field, I think it's natural to use it for that.

And the nature of research is such that it's very hard to know where the next breakthrough is going to come from, really.

And so I think our vision is to push this out into the world.

I think we could see a lot more adoption than we have today.

And once that happens, who knows where the next biggest discovery will come?

But that's how we give ourselves the best chance to accelerate scientific discovery.

Yeah, it's such an important point that the frontier or the surface area of science is massive.

And this is not about what we can do within OpenAI individually to accelerate science or to accelerate specific scientific projects.

It's about giving scientists all around the world AI so that they can accelerate their work.

That's how we move science forward faster.

So there are pieces I think that we will try and do because it'll help us learn.

But the vast majority, like what we really want is to see 100 scientists win Nobel Prizes using AI.

Yeah, it feels like it's not the end of science.

It's really the start.

Exactly.

Certainly, it's sort of a, there's a science 2.0 moment happening, I think.