Google DeepMind: The Podcast · 2026-03-10

10 Years of AlphaGo: AI's Turning Point and Beyond

Hosts: Hannah Fry

Guests: Thore Graepel, Pushmeet Kohli

AlphaGo · Reinforcement Learning · Deep Learning · AlphaZero · Scientific AI · Algorithm Discovery · Interpretability · Protein Folding

Why it matters

AlphaGo's 2016 victory over Lee Sedol was a landmark event demonstrating AI's ability to combine intuition and calculation to surpass human expertise in Go.

Key claims

AlphaGo's 2016 victory over Lee Sedol was a landmark event demonstrating AI's ability to combine intuition and calculation to surpass human expertise in Go.
Move 37 exemplified AI's capacity to produce novel, counterintuitive strategies that expanded human understanding of the game.
AlphaZero advanced the approach by learning solely from game rules without human data, discovering new strategies beyond human knowledge.
Techniques pioneered in AlphaGo have been adapted to scientific domains, such as AlphaTensor for discovering faster matrix multiplication algorithms.

Episode summary

This episode of Google DeepMind: The Podcast reflects on the 10-year anniversary of AlphaGo's historic 2016 victory over Go champion Lee Sedol, marking a pivotal moment in AI development. Guests Thore Graepel and Pushmeet Kohli discuss how AlphaGo combined deep learning and reinforcement learning to master the complex game of Go, surpassing human intuition and calculation. The episode highlights the significance of AlphaGo's novel moves, such as move 37, which challenged human understanding and demonstrated AI's potential to discover insights beyond human knowledge.

The conversation extends to the evolution from AlphaGo to AlphaZero, which learned to play without human data, further pushing AI beyond human-derived strategies. The guests also explore how AlphaGo's innovations in search algorithms and reinforcement learning have been adapted to tackle real-world scientific challenges, including protein folding and algorithm discovery. They emphasize the importance of verifiable domains, interpretability, and human-AI collaboration in advancing scientific knowledge. The episode concludes by reflecting on how AlphaGo's legacy continues to inspire breakthroughs across AI research and applications.

Search algorithms and reinforcement learning are now applied to complex real-world problems like protein folding, job scheduling, and logistics optimization.
Interpretability and verifiability remain critical challenges for AI-generated insights, especially in scientific discovery where human understanding is essential.
Large language models represent a shortcut leveraging vast human-generated data but face limitations in generating novel knowledge without reinforcement learning and exploration.
AlphaGo's success catalyzed a shift in AI research, proving that AI can achieve superhuman performance and generate new insights in complex domains.

Source material

Transcript

Welcome back to Google DeepMind: The Podcast. I'm Professor Hannah Fry.

Picture the scene: it's March 2016. Inside a hotel suite in Seoul, South Korea, two players are playing the ancient game of Go, a game of unimaginable complexity, long thought impossible for a machine to master.

On one side is Lee Sedol, a legendary 18-time Go world champion. On the other, AlphaGo, a neural network-based AI system built on a powerful technique called reinforcement learning.

Welcome to the DeepMind Challenge, live in Seoul, Korea.

Oh, that's a very surprising move.

Not a single human player would have chosen move 37.

After hours of intense gameplay spread over seven days... Yeah, that's an exciting move.

Lee Sedol placed two stones on the board to signal his final resignation.

And in the blink of an eye, the world changed.

The final result is 4-1. Congratulations to AlphaGo and to the entire team.

That was exactly one decade ago, and the field of AI has changed unimaginably since then.

We've seen the rise of large language models, the growing sophistication of AI agents, and the solving of scientific grand challenges like protein folding.

But in many ways, the modern AI revolution arguably began right there on that wooden board in South Korea.

So in this episode, we wanted to look backwards and forwards to how a bold experiment in teaching machines to play games became the foundation stone for the AI breakthroughs of today.

And with me are the perfect guests to tell that story.

Thore Graepel is a distinguished research scientist at Google DeepMind who was right there in Seoul as a key architect of the AlphaGo project.

And Pushmeet Kohli, who leads Google DeepMind's science work, is the person to tell us how those early techniques pioneered in Go can tackle crucial problems today.

Welcome to the podcast, both of you.

Thore, I know you're an accomplished Go player yourself.

Just explain to us why Go was seen as a good challenge for AI.

Yes, the game of Go seemed like the perfect challenge for AI because the game has such simple rules, yet it leads to such complex gameplay, with tactics and strategies and complex patterns.

And once the game of chess had been solved, as it were, or at least, you know, Deep Blue had won against the world champion, then Go was this open challenge.

It's much more complex than chess by many orders of magnitude and nobody was expecting it to be solved anytime soon.

Yet it looks so elegant and simple for computer scientists and so it was the perfect game to tackle at the time.

I mean, the idea of nobody thinking it would be solved anytime soon, that sort of hits the nail on the head, right, Pushmeet?

I know you were working at Microsoft at the time, but just how complex was this problem considered to be?

I think it was considered extremely complex, not only because of the breadth of the search space, the number of moves you can make, but also the depth: how long you have to reason and how long the games are.

In the game of chess, you might think about reasoning about 60 to 70 sort of moves. In the game of Go, it's much, much longer.

And that leads to the challenge of the problem.

Thore, I know when you first started at DeepMind, being a Go player, didn't you play against AlphaGo on your first day?

Yeah, yeah, exactly.

So imagine, I come in for my first day at work at DeepMind.

I know a couple of people, including David Silver, and he asks me, Thore, you're a Go player, right?

Couldn't you do us a favor and test our baby version of something that wasn't even called AlphaGo at the time, of course?

It was an internship project and they had just about taken a few thousand games from the internet and had trained a system or a few hundred thousand games maybe.

And I had the opportunity to be one of the first people to play against it.

But you can imagine I was excited, but I was also nervous.

It was my first day at work and there I was being dragged to a centrally located table.

On the other side, I think it was Aja Huang, who would later be known as the hand of AlphaGo, with his poker face.

And I got to play against this baby version of AlphaGo with people watching, with a lot of people watching all around, and, you know, there was no escape.

Later, Demis showed up and of course David was there the whole time.

And so what does one do? Play conservatively, right?

So I just thought, just don't make a mistake.

Surely this can't be so hard.

But of course that was exactly what that version of the program was good at.

It was trained on human professional games.

So it knew exactly what to do against conventional play.

And so as this little test match proceeded, my position became worse and worse and I ended up losing by a small margin.

But I took the crown of the first person who officially lost to AlphaGo.

It was quite the experience.

And of course afterwards, everyone knew me.

It was a wonderful way of introducing myself.

Humbling way.

Humbling way exactly.

Absolutely.

Pushmeet, just from my desk.

I mean, okay, so I know that the algorithm has grown quite substantially from that early point where it was an internship project.

But just broadly, explain to us how it works.

And this idea about cracking these kinds of combinatorial spaces.

Yeah, so I think if you look at the game of Go, at any given time there is a finite number of moves that you can make.

But if you look at and reason about the overall game state, it's exponential.

And that exponential growth in the number of states that you have to reason about is what makes the game extremely complicated.
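To put rough numbers on that exponential growth, here is a minimal sketch. It uses the common approximation that a game tree holds about b^d move sequences, where b is the number of moves available per turn and d is the number of turns. The function name and the specific values of b and d are assumptions (commonly quoted ballpark figures), not numbers from the episode.

```python
import math


def log10_tree_size(branching: float, depth: int) -> float:
    """Return log10 of branching**depth, the approximate number of move sequences."""
    return depth * math.log10(branching)


# Assumed, illustrative (branching factor, game length) pairs.
ASSUMED = {"chess": (35, 80), "go": (250, 150)}

for game, (b, d) in ASSUMED.items():
    print(f"{game}: roughly 10^{log10_tree_size(b, d):.0f} move sequences")
```

Working in logarithms avoids building enormous integers and makes the gap easy to read: under these assumptions the exponents differ by hundreds, so no amount of faster hardware closes it by brute force.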

So how did they crack it then?

Just remind us of the solution that they discovered.

The beauty of AlphaGo was there is this element of thinking fast and thinking slow.

And AlphaGo, in some sense, was the perfect combination of those thinking fast and thinking slow processes coming together to take on this extremely large search space.

It matches quite well to how humans play the game.

I think, you know, if you imagine how a human would play a game of chess or a game of Go, we also have the capacity to look at a position and pretty quickly appreciate if that's good for black or good for white.

And we can also look at a position and already see moves that seem promising.

We never look at all the possible moves which would be maybe 20 or 30 in chess or 200 or 300 in Go.

We are immediately drawn to certain, maybe even aesthetically pleasing, moves that seem like just the right ones, guided by our intuition.

And that element is complemented by planning, where we explicitly reason through the possibilities.

If I make this move, my opponent might make that move and then I have to counter with this move.

And these two different ways of thinking come together in how humans play these games and they also come together in how AlphaGo plays.

The intuition and the calculation, as it were.

Exactly.

So was that the inspiration? Did you sort of think about how you were playing the game, how other Go players were playing the game, and draw that direct inspiration from neuroscience, effectively, in a way?

Yeah, I think that is definitely one direction because a lot of team members were actually game players who were able to introspect and see how we tackle the game.

And then of course that comes together with deep learning, which at the time, you know, since 2012, had grown as a direction and now, for the first time, gave us the tools to learn these approximate functions.

For example, the value function that takes the board and tells us how good it is for either black or white or the policy network that takes the board and effectively ranks the available moves according to how likely it would be that a professional player would take them.

And so deep learning was just ripe at the time to tackle this problem and gave us the opportunity to implement the fast thinking.

The slow thinking is not unlike what happened in Deep Blue. You know, it's the search of the game tree, which was already known and which we might now call good old-fashioned AI.
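The division of labour described here (a policy that proposes a few promising moves, a value function that scores positions, and a search that reasons through replies) can be sketched on a toy game. This is a simplified illustration, not AlphaGo's method: AlphaGo used Monte Carlo tree search with learned networks, whereas this sketch swaps in a plain depth-limited search and hand-written heuristics. The game (one pile of stones, take 1 to 3, whoever takes the last stone wins) and every function name are assumptions made for the example.

```python
def legal_moves(pile):
    """Moves available in the toy game: remove 1, 2 or 3 stones."""
    return [m for m in (1, 2, 3) if m <= pile]


def policy_prior(pile):
    """Stand-in for a policy network: a score per legal move (higher = more promising)."""
    return {m: (1.0 if (pile - m) % 4 == 0 else 0.1) for m in legal_moves(pile)}


def value_estimate(pile):
    """Stand-in for a value network: estimated outcome in [-1, 1] for the player to move."""
    return -1.0 if pile % 4 == 0 else 1.0


def search(pile, depth, top_k=2):
    """Depth-limited search. Returns (value, best_move) for the player to move."""
    if pile == 0:
        return -1.0, None  # the opponent just took the last stone
    if depth == 0:
        return value_estimate(pile), None  # "fast thinking" at the leaf
    prior = policy_prior(pile)
    # Only expand the top_k moves the policy likes, instead of every legal move.
    candidates = sorted(prior, key=prior.get, reverse=True)[:top_k]
    best_value, best_move = -2.0, None
    for move in candidates:
        child_value, _ = search(pile - move, depth - 1, top_k)
        value = -child_value  # what is good for the opponent is bad for us
        if value > best_value:
            best_value, best_move = value, move
    return best_value, best_move


value, move = search(10, depth=4)
print(f"from a pile of 10: take {move} (estimated value {value})")
```

The policy narrows the breadth of the tree and the value estimate cuts its depth, which is the sense in which the two kinds of thinking complement each other.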

Yeah.

Okay, well, I mean, you lost to this thing quite early, but once it had gone through a lot of the people on the team, let's say, I know that you tested it with a professional Go player, because you had Fan Hui come into the office.

Yeah, exactly.

How confident were you at that point that it was going to beat him?

Yeah, we had different levels of confidence, which was really interesting.

So we had been really lucky to find him. You know, he was the European Go champion at the time.

He lived in Bordeaux and came over.

We lured him into playing this game with us.

And the setup was that he would play 10 test games against the version of AlphaGo at that point.

And I personally thought that AlphaGo could not possibly be at the point already where it beats the European champion, a professional player.

And so I had a bet with David Silver.

David Silver was confident.

He said, I think AlphaGo is going to nail it 10-0, and I said, no, I think AlphaGo will lose at least one game.

And the bet was that whoever lost would have to show up at the office dressed as an ancient Japanese Go master and be in the office for one day like that.

Well, who showed up like that?

It was me, because it was in fact 10-0.

But it did give us confidence, and gave Demis confidence, that we would be able to tackle even harder opponents in the near future.

Which you did, of course. On a plane you got, in 2016, to Seoul in Korea to play against Lee Sedol.

I mean, just tell us, give us a sense of how phenomenal a player he actually is.

Yeah.

So, Lee Sedol was really one of the best, or maybe the best, players at the time, with an incredible track record of winning tournaments.

He was compared to Roger Federer at the time for his success and intellectual brilliance.

And so for us, it was a tremendous honor that he accepted our challenge to play against him.

And it was a tremendous challenge because we had to set a date, right?

You can't just say, you know, we'll tell you when we're ready.

A date was set and we had to work towards that date to actually make AlphaGo strong enough.

And what added tension and excitement to it was that Lee Sedol was convinced that he would win.

He thought it highly unlikely at the time that AlphaGo would win.

And of course, he was basing his assessment on the game records that he had seen against Fan Hui, and he assessed that he was better.

But of course, what he wasn't so aware of is that AlphaGo was constantly improving through the training and the algorithmic refinements and so on that we made.

And so the entire team basically went to South Korea and you wouldn't believe the excitement of people there.

You know, the truth is in England, Go is a bit of a niche activity, right?

Very few people would be able to play it or even know about it.

But in South Korea, people were so excited.

The best Go players are celebrities.

And you know, we came there and there were hordes of photographers that took pictures.

We had a documentary film crew with us.

And so imagine typical computer geeks as it were.

Suddenly in the limelight of the world for this match, that was quite the adventure.

Yeah.

I mean, were you nervous about the performance of AlphaGo?

Yes, we were definitely nervous.

So of course, we had a very sophisticated evaluation pipeline.

You can test against players that you have access to, like Fan Hui; that was super helpful.

You can also test against previous versions of the program.

And you can calculate what we call the Elo score of the system, which basically takes the outcomes of all the games that you play against other versions, maybe earlier versions of your program, and calculates what the rating of the new version is.

And you can calibrate these things quite well.
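As a rough illustration of how an Elo rating is calculated from game outcomes, here is a minimal sketch of the standard Elo formulas. The function names and the K-factor of 32 are assumptions for the example; the episode does not say which variant or constants the team's evaluation pipeline used.

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    """Probability-like expected score for A against B under the Elo model."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def update(rating_a: float, rating_b: float, score_a: float, k: float = 32.0):
    """Return new (rating_a, rating_b) after one game.

    score_a is 1.0 for a win by A, 0.5 for a draw, 0.0 for a loss.
    """
    delta = k * (score_a - expected_score(rating_a, rating_b))
    return rating_a + delta, rating_b - delta


# A new version beats an equally rated older version once.
new_version, old_version = update(1500.0, 1500.0, score_a=1.0)
print(new_version, old_version)
```

A 400-point gap corresponds to an expected score of about 0.91 for the stronger side, which is why a chain of versions that each reliably beat the previous one can be placed on a single scale, while an opponent who has never played any of them cannot be placed without playing.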

But of course, we didn't know where on that scale Lee Sedol would be.

And of course, we wanted a cushion as well.

You know, it would be nice to be quite a bit better to have some certainty.

Because this is the world's stage, right?

If you lose this, that's a bit of a hit to the reputation.

And so yeah, we were nervous.

We worked up to the last minute.

We also needed to make sure that the system is really stable.

You know, you don't want to make last-minute changes to make it that little bit better but risk that it now becomes unstable.

But in the end, we were quite happy with it.

And so we entered that now, kind of famous hotel floor where all the action happened, where all the press was waiting and so on and embarked on the match.

And people were watching from around the world, including Pushmeet.

Yeah.

So, I mean, where were you at this point?

You were watching on?

Yeah, I was in Seattle.

I mean, I really started getting into it in the middle of the first game.

It became so clear that AlphaGo had reached that specific milestone.

And you could even see the reaction from the press and the commentators and Lee Sedol himself.

Well, essentially, you said the middle of that game. In the early stages of that day, was it clear who had the upper hand?

I think, from the perspective of a person who was just watching it, I felt that in the early stages everyone felt quite confident that Lee Sedol would win.

In fact, it was only as the game progressed and came closer to the final outcome that they realized that, as you count the territory, AlphaGo had an advantage.

And it came as a surprise to people.

What did you think?

Yeah.

So, I had this interesting interaction on site with a professional Go player, an American professional Go player, who was sitting next to me while we were watching.

And there was some sequence unfolding in a corner, and he approached me and said, you know, I always tell my students not to play that stupid move that AlphaGo just played.

So, I mean, it's pretty hopeless.

And I was like, I'm not as much of an expert. Let's just wait and see, was my reaction.

And then after that first game, this gentleman came to me and said, this is the most phenomenal thing I've ever experienced.

I'm so grateful that I'm allowed to be here to witness that a machine can play Go at this level, and there's going to be so much we can learn from it. And he was already embracing this.

I mean, you have to imagine, these people dedicate their lives to the study of this game, and they've often trained from being young children to their current age just to master this game.

And so, of course, it comes as a shock to them that a machine might match or even exceed a human Go player.

So that was the first game, which AlphaGo won. In the second game, AlphaGo did something that, I mean, really surprised everybody.

Oh, that's a very surprising move.

Yeah, so this was a remarkable scene that I was sitting in the international English speaking commentating room.

And Michael Redmond, our American commentator, he had this big demo board on the wall, and he would put all the stones up there on the board to show people what was being played and comment on different variations.

And so he took the stone corresponding to move 37 on the board, and then he stepped back and said, ah, this must be wrong.

And he took it back, and then he looked at the screen again and said, no, no, that is actually what AlphaGo played and he put it back.

He was puzzled.

You could see that that was such a counterintuitive move for a human player.

It was a shoulder hit on the fifth line.

And this is typically something that human Go players avoid.

So often in Go, there is some kind of pushing going on along the edges.

And one of the players builds territory along the wall of the board, and the other side builds influence towards the center of the board.

And if that happens on the third and fourth line, this is considered to be roughly equitable, you know, both sides get something out of it.

But what AlphaGo was effectively suggesting is that it's still profitable if you do it on the fifth line, and you give that much more territory to the other party.

And that's what was surprising to people: that there would be situations in which that would be correct.

And so not only was it a very special move, but it in a way it represented a new way of weighing these two factors of immediate territory versus influence towards the center of the board against each other.

Something that went beyond what a human Go player would normally do, right?

Yeah, absolutely.

I mean, there are moments like this where you see the true potential of an AI system expanding human knowledge, where people have regarded, in this particular case, the game, of course, as a thing to be studied for many, many years.

And there comes this particular point where that knowledge is expanded.

And people are at first skeptical, which was the case in the game as well: when the move was played, it was considered a hallucination or a mistake, right, for quite a bit of time before its implications became clear.

Later on in the game.

Exactly.

Because it proved to be pivotal to the second win.

Yeah.

It was not just a moment in that game, but it was also a moment, I think, in the whole sort of history of AI, where that particular moment showed us that there will be times when these systems will produce insights where we might not even be able to discern whether they are mistakes or amazing breakthroughs, and yet they will have a lot of influence on how we look at whole areas of study in a completely new light.

Well, I also want to talk about move 78.

This is a move that was played by Lee Sedol that confused AlphaGo, causing it to resign the game.

What is Lee Sedol up to here?

He's just burned, like, seven, eight minutes just on this move already.

It's too big a lot to do.

Oh, look at that move.

That's an exciting thing.

Oh, you know, I'm not actually sure what AlphaGo is trying to do here.

So, by this point, AlphaGo has won three games in a row.

And now Lee Sedol plays a move that confuses the system.

Is that fair to say?

Yeah, that's absolutely fair to say.

So, move 78 was an unusual wedge move that Lee Sedol played.

There had been a very interesting battle as it were at the center of the board.

And Lee Sedol found this move, and it was also surprising to people, similar to move 37.

And from then on, we observed that AlphaGo didn't have a good grasp of the position anymore.

We saw that the moves that it made didn't really make sense to us in a bad way.

You know, 37 also didn't make sense to us, maybe, but these moves, even to amateurs like us, seemed strange.

And so it had been confused by the move.

And just to zoom out to give you a sense of why this still mattered so much.

So, you might say, okay, it's a match of five games.

And AlphaGo has won the first three.

What more is there?

Cool.

But then we were thinking, well, if now Lee Sedol were to win the last two, what would you conclude?

He's got it figured out, right?

He's found the fragility.

Exactly.

So it would have been the human triumph.

And so, that's why that game and the last one were still very exciting to us.

But it wasn't entirely the case that we were disappointed.

We were certainly disappointed, but we also had so much admiration for Lee Sedol, you know, as a human, to be able to find this move. You just have to imagine this master, who has dedicated his life to playing this game, in this battle.

That must have been so hard on him, right?

To see this machine play so perfectly and him struggling to find a way.

And then in game four, he finds a way.

And as he put it in the press conference, I think later he said that he was so happy and proud that he was able, maybe for the last time on behalf of humanity to find a way to overcome the machine.

Because some people called it the divine move, didn't they?

Yeah, yeah.

And I think given the tension at that point in time, and him really outgrowing himself at that moment and finding that, I think it's a good name for it.

So the final score was 4-1 to AlphaGo in total.

What was the reaction from the Go community?

Yeah, so the Go community followed the match very closely and of course the outcome was dramatic and for many people unexpected.

And so people showed very different reactions, you know, some people were absolutely amazed and surprised about the outcome.

Some people couldn't believe it.

Others, of course, also thought that some era had come to an end, because now maybe the strongest Go player was no longer a human but a machine.

But overall, what we found amazing is that there was an uptick in interest in the game of Go.

I think more people play Go now than did before and the Go community really embraced the learning from AlphaGo.

So there are now many programs that work essentially the same way that AlphaGo does and people use it for teaching purposes.

They analyze their games through it.

And overall, I think it has provided a lift to the whole Go community.

Let me ask you about the reaction from the AI world to this match.

What was the buzz?

What was the conversation like?

The Lee Sedol match, the AlphaGo-Lee Sedol match, was a key pivot point where a lot of people, especially in the machine learning community, who had been sort of working on these models and techniques as a mathematical and applied project, started to see evidence that these systems can self-learn and go beyond human knowledge.

And that is a very important point because in machine learning, you train with training data which has been collected and your natural sort of expectation is that the model is going to just be consistent with that distribution.

And to show that you can go beyond that distribution, and that that insight can then be utilized by the world, I think is an amazing sort of insight that comes out of the Seoul experience.

And it really points to what is possible with artificial intelligence, not just in the game of Go, but in the understanding of the world. In chemistry, in biology, in mathematics, in computer science, what are these amazing analogs of move 37 that these systems will be able to discover and reveal to us?

I think that point that you made there about going beyond human intelligence is just so fascinating.

But one of the things that I find most intriguing about that AlphaGo story, even after the 4-1 victory, is that you then built AlphaZero, where you took away all of the human data, all of the games of Go that it had been trained on, and discovered that once you take out the human intelligence, the thing actually improved, which is astonishing to me.

Yeah, from a scientific perspective, one could argue that that is an even bigger step than the original AlphaGo.

Because, as you were saying, the AlphaZero system doesn't have access to any human game records, how humans play, and didn't have access to prior knowledge about how the game is played, but really only had access to the rules of the game and a means of representing and learning these functions that we talked about, the policy net and the value net.

So basically, it starts playing entirely randomly at the beginning because it has no notion of what good or bad moves are.

But it gathers its experience from playing these games, and it learns which moves are more likely to lead to a win, which moves are more likely to lead to a loss, which positions look promising, and which positions are not promising.

And eventually, it starts playing better and better moves.
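The loop described here (start with no notion of good or bad moves, play games against yourself, record which moves led to wins and losses, then prefer the moves that won) can be sketched on a toy game. This is not AlphaZero: there is no neural network and no tree search, only a lookup table of average outcomes. The game (one pile of stones, take 1 to 3, whoever takes the last stone wins), the exploration rate and all names are assumptions chosen to keep the example small.

```python
import random


def legal_moves(pile):
    """Moves available in the toy game: remove 1, 2 or 3 stones."""
    return [m for m in (1, 2, 3) if m <= pile]


def self_play(episodes=20000, max_pile=10, epsilon=0.2, seed=0):
    """Learn average outcomes per (pile, move) purely from self-play."""
    rng = random.Random(seed)
    total, count = {}, {}

    def average(pile, move):
        n = count.get((pile, move), 0)
        return total[(pile, move)] / n if n else 0.0  # untried moves look neutral

    for _ in range(episodes):
        pile = rng.randint(1, max_pile)
        history = []
        while pile > 0:
            moves = legal_moves(pile)
            if rng.random() < epsilon:
                move = rng.choice(moves)  # explore
            else:
                move = max(moves, key=lambda m: average(pile, m))  # exploit
            history.append((pile, move))
            pile -= move
        outcome = 1.0  # whoever moved last took the final stone and won
        for key in reversed(history):
            total[key] = total.get(key, 0.0) + outcome
            count[key] = count.get(key, 0) + 1
            outcome = -outcome  # players alternate, so the sign flips each step
    return {key: total[key] / count[key] for key in count}


def greedy_move(table, pile):
    """Pick the move with the best learned average outcome."""
    return max(legal_moves(pile), key=lambda m: table.get((pile, m), 0.0))


table = self_play()
print({pile: greedy_move(table, pile) for pile in range(1, 11)})
```

In this toy game the known winning strategy is to leave the opponent a multiple of four stones, so the printed table can be checked against it; nothing in the code states that rule, it can only emerge from the recorded outcomes.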

And now, of course, it's not limited by human knowledge.

And what it discovered was amazing.

So first of all, it rediscovered ways of how humans play.

And that was totally reassuring. You know, there are certain patterns in the corner in Go that we call joseki, and in chess there are certain opening moves.

The system was now more general.

It could play chess, Go and shogi, and could have played any number of other board games if we trained it that way.

And so at first, it rediscovers human knowledge.

And we think, wow, this is so cool.

It finds the same openings and so on.

And then we look at some of these openings and it stops playing them.

We think, what's going on?

It has found the refutation.

So it rediscovered human knowledge, and then it discards it because it has now gone beyond it and has found there are actually better ways of playing.

I'm not going to continue playing in this human way.

Stuff that humans haven't found yet, effectively.

Exactly.

For Alpha Zero, when it played Go, the way it played Go looked alien to me in the end.

So this wasn't the kind of Go that I had learned from my Go teacher, you know, which is structured maybe in a way that enables humans to understand. These moves looked very free and didn't make much sense at the time.

But 30 moves later, everything would fall into place.

And you see, oh, yeah, oh, wow, that makes sense now.

And so on, as if it had the foresight, in a way, which it did.

So that discovery from nothing to that level of play was very impressive.

Okay, so there's something I want to show you, something that happened actually when you guys were in Seoul.

Because as you mentioned before, you were being filmed for this documentary for AlphaGo.

And there's some footage that didn't make it into the film.

But it was captured by the cameras as they were sort of packing up, with the mics still running.

I don't know if you've heard this little clip, let me play it for you.

This is Demis and David, having a sort of private conversation.

Just amazing seeing how quickly a problem that is seen as being impossible changes to being...

I'm telling you, we can solve protein folding.

That's like, I mean, this is huge.

I'm sure we can do that.

I was, I thought we could do that before.

Yeah.

Now, when now we definitely can do it.

Yeah, that thing, okay.

Beautiful.

It's not great.

Yeah.

Thore, do you think that captured the mood at the time?

Yeah, that was the kind of door that AlphaGo opened at the time, right?

If we can do this, then what else could we do?

Because this is a game with 10 to the power of 170 different positions.

This is super complex.

And if we have principled ways of navigating that kind of combinatorial search space, then it seems plausible that we would also be able to handle other large combinatorial search spaces and at the time one of the favorites was protein folding.
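The figure of 10 to the power of 170 can be sanity-checked with a crude upper bound: each of the 19 x 19 points on the board is empty, black or white, so there are at most 3^361 ways to colour the board. The sketch below only counts colourings and does not check whether a position is legal, so it overestimates the true count; the names are illustrative.

```python
BOARD_SIZE = 19
points = BOARD_SIZE * BOARD_SIZE  # 361 intersections


def colouring_upper_bound(n_points: int) -> int:
    """Each point is empty, black or white: at most 3**n_points board colourings."""
    return 3 ** n_points


bound = colouring_upper_bound(points)
digits = len(str(bound))
print(f"3^{points} has {digits} digits, i.e. it is on the order of 10^{digits - 1}")
```

The bound lands a couple of orders of magnitude above the quoted figure, which is what one would expect given that many colourings are not legal positions.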

Absolutely.

And this is now the point really where you come aboard with the DeepMind team for me, because when it came to AlphaFold, I mean, you're an integral part of that story.

Did AlphaGo, did that project directly influence what you guys went on to do, or was it sort of like the confidence of a victory that made Demis say things like that?

No, I think Demis, from very early on, has had a very strong notion of what AI is being developed for.

He really sees AI as a tool that will help us understand the world better.

In fact, at the time when the AlphaGo matches were happening, I was at Microsoft working on AI for programming.

Now AI for coding is everywhere, but at that time not many people were working on program synthesis and AI for coding.

And Demis wanted me to join DeepMind.

And my question to him was, I am really interested in having AI systems, machine learning systems for solving the most challenging problems in the world and to make sense of what's happening.

And I think his reaction was, if you want to understand the world and if you want to solve the most important problems in the world, then you have to join DeepMind, because we will need AI to really understand the world deeply and to tackle these problems.

So, if you are interested in sort of learning to program, if you are interested in cybersecurity, if you are interested in dealing with climate change, if you are interested in understanding how to deal with impossible-to-treat sort of diseases, you have to come and really lead the charge on how AI can be used for these applications.

I want to ask you about some of the innovations that you guys have made in AlphaGo and how they ended up finding their way into the science projects that you guys were doing.

One of the big things that AlphaGo did was to make that gigantic search space more tractable.

So, how have search algorithms changed since then and how are they being used in science?

I mean, search is such an integral part of many problems that you encounter in the real world.

We just spoke about protein folding, which could be considered a search over the space of all possible structures, but just to give a simpler example, you can also think of search as the search over algorithms for solving a particular problem.

So, everything around us that computers do has some form of matrix multiplication underlying it.

So, even the fact that we have these machine learning systems and neural networks that are changing the world today, these neural networks are based on matrix multiplication, essentially taking large matrices of numbers and multiplying them together.

And even the very simplest operation of matrix multiplication, which is just taking two matrices and multiplying them, is the simplest thing that you sort of learn in school and college, and yet we don't know, as a whole research community, what the fastest way of multiplying two matrices is.

So, if you think about that problem, you can reason about it as a search problem.

You can say there is a space of possible algorithms, and now search over that space of algorithms and try to find me the best algorithm. The issue is that the search space for that problem is even larger than the search space for Go.

So, one of the first things that we needed to do is we came up with this agent called AlphaTensor, which framed matrix multiplication as a search problem, as a game.

So, instead of did you win or lose the game of Go, you're saying, did you multiply these two matrices together quickly or not?

Yeah, did you multiply these matrices completely accurately in the smallest number of moves?

Right.

And that was the game, and there was an algorithm that Strassen had come up with in 1969, and since then, for 50 years, there was no progress. And then AlphaTensor found a better way of multiplying these two matrices, and that was a key sort of proof point of what is possible with the same sort of techniques.
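
To make "the smallest number of moves" concrete, here is a minimal sketch of Strassen's 1969 scheme for 2×2 matrices, which needs seven scalar multiplications where the schoolbook method needs eight. The function names are assumptions made for this illustration; this is not AlphaTensor's code, only the kind of multiplication-saving recipe that such a search looks for.

```python
def naive_2x2(a, b):
    """Schoolbook product of two 2x2 matrices: 8 scalar multiplications."""
    (a11, a12), (a21, a22) = a
    (b11, b12), (b21, b22) = b
    return [
        [a11 * b11 + a12 * b21, a11 * b12 + a12 * b22],
        [a21 * b11 + a22 * b21, a21 * b12 + a22 * b22],
    ]


def strassen_2x2(a, b):
    """Strassen's scheme: the same product with 7 scalar multiplications."""
    (a11, a12), (a21, a22) = a
    (b11, b12), (b21, b22) = b
    # Seven products of sums and differences of the entries.
    m1 = (a11 + a22) * (b11 + b22)
    m2 = (a21 + a22) * b11
    m3 = a11 * (b12 - b22)
    m4 = a22 * (b21 - b11)
    m5 = (a11 + a12) * b22
    m6 = (a21 - a11) * (b11 + b12)
    m7 = (a12 - a22) * (b21 + b22)
    # Recombine using additions and subtractions only.
    return [
        [m1 + m4 - m5 + m7, m3 + m5],
        [m2 + m4, m1 - m2 + m3 + m6],
    ]


a = [[1, 2], [3, 4]]
b = [[5, 6], [7, 8]]
print(strassen_2x2(a, b) == naive_2x2(a, b))
```

Saving one multiplication looks small, but applied recursively to large matrices split into blocks, the saving compounds at every level.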

In case there's anyone watching who's, I don't know, maybe not that familiar with the things we're talking about, matrix multiplication for example.

I mean, we need to be really clear on the potential of this thing.

I mean every single large language model in the world is essentially at its heart just a massive matrix multiplication problem.

Right.

Yes.

All of the fuss about different chips that are being made is because some of them can multiply matrices faster than others.

Yeah.

And what you're describing here is like turning that into a game, and even small gains that you might make in how quickly you can do something, once you scale them up to the size of how much everybody in the world is using AI, we're talking about gigantic differences.

Yeah.

Absolutely.

And since then what we have done is we have said, let's not just tackle matrix multiplication.

Let's tackle all the possible algorithms that you can think of.

So our new agents like AlphaEvolve search in the space of all possible programs, trying to find the best algorithms that can solve these important problems, whether it's how you schedule jobs in a data center, which is an extremely important problem and has implications in terms of energy, compute utilization and so on, or how you tackle these logistics problems where you are trying to move packets around in a network.

So the same basic methodology of tackling the search problems now has been expanded in terms of what you can do with it.

Okay, but I'm thinking here about the policy network, the intuition as you described it, where, you know, a Go player might look at the board and say, I think this is a fruitful direction in which to search.

If, instead of a board, instead of a game of Go, you've got all possible algorithms of everything in the entire world and beyond, let's say.

How on earth do you create intuition in that sort of a situation?

How do you know how to narrow down the search space?

Yeah, so I think this is a very interesting sort of research topic that we are now starting to think about when we apply agents like AlphaEvolve to discover these new algorithms.

Sometimes those algorithms are not very intuitive to us.

In fact, they could be counterintuitive.

So sometimes you can see the patterns. You can see that there are certain symmetries in the problem that we did not understand, mathematicians did not understand, computer scientists did not understand, but somehow there were those symmetries.

The agent somehow discovered those symmetries, and then it exploited and utilized those symmetries to make the solution much more efficient.

In some cases, we just don't understand how it made things faster, but they are faster.

And then our challenge is that, when you think about collaboration where humans and these AI agents are working together, how do we make sure that the systems that are produced and the algorithms that are produced are interpretable by the human computer scientists and engineers?

It reminds me a little bit of this situation in AlphaGo where people in the endgame were observing AlphaGo and found that it didn't quite play optimally, and they were really surprised and said, look, this is a better move than what AlphaGo played.

You know, is it not playing well? Is it making mistakes?

And the solution was that AlphaGo was optimizing the objective we had given it, which is to maximize the probability of winning the game.

Humans tend to use a heuristic, which is they want to have more territory than the opponent by some margin.

And they think the larger the margin is, the better it is for them.

That is often true, but AlphaGo doesn't care about the margin.

For AlphaGo, it was enough to win by half a point.

And so often in the endgame, it almost seemed to be toying with the opponent, giving up points just up until the point where it was sure it could win by half a point.

And so sometimes you get these counterintuitive behaviours, but if you then drill deeper, you can see why they come about.

Because the algorithm and the humans are ultimately optimizing for slightly different things.
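
The difference between the two objectives can be shown with a toy example. The moves and numbers below are invented purely for illustration and are not taken from any AlphaGo game; the sketch only shows that the same candidates can rank differently depending on whether you maximize win probability or expected margin.

```python
# Hypothetical candidate moves:
# (name, estimated win probability, expected margin in points).
candidates = [
    ("aggressive", 0.90, 12.5),  # bigger expected win, but more risk
    ("solid", 0.99, 0.5),        # wins by half a point almost every time
]


def pick_by_win_probability(moves):
    """The objective described for AlphaGo: maximize the chance of winning."""
    return max(moves, key=lambda move: move[1])[0]


def pick_by_margin(moves):
    """The human heuristic described above: maximize the expected margin."""
    return max(moves, key=lambda move: move[2])[0]


print(pick_by_win_probability(candidates))
print(pick_by_margin(candidates))
```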

Exactly.

Yeah.

Okay, but then that does make me wonder.

So move 37 is an example of where, you know, it went beyond what humans are able to do.

At the same time, when move 37 first came through, people thought it was a mistake, right?

So how can you tell the difference?

I mean, if the algorithm comes up with something that is original, can you be sure it's not a hallucination?

Yeah.

And I think this is an important point, right?

Like with the large language models, especially when they were being developed initially, the first versions of them, they would hallucinate.

They would come up with solutions, which were not correct or come up with responses, which were completely invalid.

And this is where the importance of the agent harness comes into play, where you couple the large language model with a verifier, which is able to sort of prove out what is being hallucinated and what is actually something that might be remarkable that we need to investigate further.

But then, if these large language models are based on human data, is there a danger of limiting yourself to what humans have already discovered?

I'm thinking of the, you know, what's already in the textbook as it were.

When we build the agents, we deliberately increase the amount of things that they have to explore.

So we tell the models that you have to go beyond the distribution that you were trained on and you should feel free to explore more.

And in fact, you might sort of produce new things which might not be appropriate or not be correct.

But we have that verifier and evaluation function to prove out those insights.

I think this is really how Karl Popper would also characterize the whole scientific process.

Conjectures and Refutations is the famous essay.

And you know, conjecture is maybe hallucination.

It's this production capability of producing plausible hypotheses.

And then refutation is the step by which you filter out the things that are wrong, that don't work.

And I think it also makes clear why the current AI capability landscape looks like it does.

Namely, it is very good in verifiable domains.

Code is a verifiable domain.

You define the objective.

You can write down tests for the code.

The first test is that it compiles, you know, and then you test it on those tests.

But you have hard criteria to reject failure.

Which is super important for these kinds of tasks.
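
A minimal sketch of that conjecture-and-refutation loop for code might look like the following. The candidate programs, the tests, and the helper name `verify` are all assumptions made up for this illustration; a real agent harness would generate candidates with a model and run them in a sandbox rather than calling `exec` directly.

```python
def verify(source, function_name, tests):
    """Return True only if the candidate compiles and passes every test."""
    try:
        # First hard criterion: the candidate must compile.
        code = compile(source, "<candidate>", "exec")
    except SyntaxError:
        return False
    namespace = {}
    try:
        exec(code, namespace)
        candidate = namespace[function_name]
        # Second hard criterion: every test case must pass.
        return all(candidate(*args) == expected for args, expected in tests)
    except Exception:
        return False


# Hypothetical "conjectures": some plausible, some wrong.
candidates = [
    "def add(a, b) return a + b",         # does not compile
    "def add(a, b):\n    return a - b",   # compiles but fails the tests
    "def add(a, b):\n    return a + b",   # compiles and passes
]
tests = [((1, 2), 3), ((0, 0), 0), ((-1, 1), 0)]

# "Refutation": keep only the candidates the verifier cannot reject.
survivors = [src for src in candidates if verify(src, "add", tests)]
print(len(survivors))
```

The verifier never needs to know how a candidate was produced, which is why the generator can be pushed to explore beyond what it was trained on.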

If you don't have it, things become much trickier.

For example, if you work on open scientific problems, you might not have a verifier who can tell you that this is right or this is wrong.

Ultimately, often experiment, physical experiment will be the verification that you need.

Right, but that's quite a long way down the road, isn't it?

I guess the experimental part of it.

Because I'm just wondering here about interpretability coming back to the point that you made earlier.

Does it matter that you might end up with results that are not easily interpretable here, given that the stakes are so much higher than they are on the board of a Go game?

Yeah, I think it does matter.

Science is also about communication.

If you can come up with this new insight, but you are not able to communicate it and people are not able to build on top of it, then there are limits to the impact that will be achieved.

So interpretability plays a very important role, but it's not the only thing.

Take the example of AlphaFold.

AlphaFold is able to solve this amazing problem of protein structure prediction.

Do we understand completely the conceptual sort of operations that it does?

Like at the mechanistic level, yes, but we don't know completely the underlying theory that can be used to recreate a human level reasoning process to make the same predictions.

And we will somehow need to convert them to a human digestible form that the bounded rational human mind will be able to comprehend.

I think there's a really interesting point there, which is that an explanation not only needs to account for the phenomenon that you're explaining.

It also needs to account for the intellectual level of the recipient of the explanation.

So sometimes on YouTube, you can see these things like "explained at the level of a six-year-old, an eight-year-old, a 10-year-old, a 12-year-old."

I quite like the explanations for the 12-year-old, I have to say.

And that reflects this fact, right?

You know, an explanation really is a bridge between the phenomenon and our capacity to understand it.

So it may very well be the case that future AI systems come up with explanations that might seem simplistic to them, but that are just about right for us to keep up with the AI system, right?

Exactly.

I mean, if you look at our agents like AlphaProof, what they are able to do is you give them open math problems and they will give you a proof, and that proof is verifiable.

You can tell whether it's correct or not.

Exactly.

Even if you don't understand.

Yeah, you might not understand it, but you know it's correct, right?

The uncertainty about whether the original theorem was correct or not is now resolved, but do we completely understand it?

In fact, for the results that we have had until now, we have spent the effort and converted those results into a form that mathematicians have been able to see and say, yes, it makes sense.

I can actually translate it into English and it all works.

But there are two key phenomena that come out of it.

One is that the importance of framing the problem now rises.

Because one of the challenges when we are trying to solve these very hard maths problems, when we are giving the agent these hard problems, is to specify the problem accurately so that the agent can understand what reward function it needs to optimize for.

And then once it finds the solution, then there's the challenge of actually converting the solution back to a human readable form.

If we do get to a point there where an algorithm could just come up with its own proof, where's the role for mathematicians in all of this, speaking selfishly?

No, I think mathematicians are even more important today.

Because what these agents are able to do is they are able to solve these incredible problems.

But what are the problems that need solving?

How do you specify that problem?

That's where mathematicians and scientists come in.

I do like the idea that one day there might be, I don't know, the Riemann hypothesis, and it comes back and says, yes, there's a proof, but unfortunately it's beyond any human's ability to understand it, so, you know, sorry about that.

But actually, I'm joking, slightly, but if we are talking here about advancing scientific knowledge and understanding beyond what humans have done, do you think you've seen examples of move 37 in science already?

Yeah, I think absolutely.

I think just the example of the matrix multiplication algorithm, it is something that people had studied for many, many years, and yet we have been able to come up with a new algorithm.

So that is genuinely a move 37 moment in algorithmic discovery.

And I think we are now seeing the same thing in many other areas of science: in mathematics, in materials science, coming up with new structures that we now think are stable.

So there are a number of these things, but the original move 37 moment is still very relevant because it was, in some sense, the first, and it brought about that concept of going beyond human understanding.

I am thinking here about AlphaZero again and how that really moved away from human data and showed these profound results.

Large language models on the other hand ended up being almost a shortcut to intelligence, I guess, that was based very much on human data.

Was that a sort of surprising turn of events for you?

Yes, I think that is an interesting thing that we observed.

DeepMind was based on this idea that we use games as a microcosm of the real world, and the philosophy of DeepMind had been to place agents within these environments and let them learn how to master them and thereby grow their intelligence.

And then what happened with large language models was really this discovery that there is a shortcut, that somehow there is this huge amount of crystallized intelligence, if you like, stored in the form of data on the internet: first text data, maybe images, maybe videos and so on.

And that the shortcut is really to first mine all of that data and train systems based on that.

And that's basically the first and second generation of large language models that are based on that.

But then of course, you come to the point where first of all, that doesn't lead you to novelty.

You're now within this corpus of existing human knowledge and we know how competent these models are within that.

But it's very difficult to get out of that.

Now, how do we go beyond what we already know?

And that's, I think, where the community now, for the past few years, has been exploring again the methods that DeepMind pioneered early on, along with others, of course: reinforcement learning in environments.

Part of post-training now routinely involves forms of reinforcement learning, either on human-generated data or on problems, on environments like coding environments and so on.

And so now we're in a period where we're going again, beyond human knowledge.

Pushmeet, do you think that we would be here at this moment in the AI revolution if it hadn't been for AlphaGo?

I think AlphaGo was that transition point where it became very, very clear that the moment of transition where we go beyond human-level intelligence in particular areas is not science fiction or many decades away; it is happening now.

And if it could happen in a game of Go, there was no reason why it couldn't happen in protein structure prediction, in fusion, in materials science.

The legacy of that match and move 37 and that experience is what we are all living in now.

I think that's a great point to end the episode actually.

To be honest with you, first of all, thank you so much for joining me.

Amazing.

Yep, pleasure.

These big paradigm shifting moments in the story of humans and machines have happened before, but the thing about chess is that it was always just a question of calculation, can a machine brute force its way to a victory?

AlphaGo was different.

It was the first time that a machine had demonstrated something deeper, a genuine intelligence that combined intuition with calculation and took us beyond human capability.

Now 10 years on from the AlphaGo match, the field has moved at an incredible pace, but many of the questions that preoccupied researchers then are more relevant now than ever.

How do you create AI systems that go beyond human knowledge and are capable of new insights?

And how do you separate the genuinely new insights from hallucinations?

You have been listening to Google DeepMind: The Podcast with me, Hannah Fry.

We have got plenty more episodes to come out this year, so please make sure you subscribe to our YouTube channel.

I'll see you soon.