No Priors · 2026-06-10

Biohub's Open-Source AI-Driven Biology Revolution with Zuckerberg, Chan & Rives

Hosts: Sarah Guo, Elad Gil

Guests: Mark Zuckerberg, Priscilla Chan, Alex Rives

Biohub · Open Source · Protein Folding · AI in Biology · ESM Fold · Personalized Medicine · Frontier Biology · Scientific Tooling

Why it matters

Biohub integrates frontier AI and frontier biology to build hierarchical world models from proteins to cells and systems.

Key claims

  • Biohub integrates frontier AI and frontier biology to build hierarchical world models from proteins to cells and systems.
  • The recent ESM Fold model folded over 1.1 billion proteins, predicting structures rapidly and enabling digital protein and antibody design.
  • Open-source approach aims to democratize tools and data, accelerating scientific progress across academia and biotech.
  • Focus on generating novel biological data through advanced imaging, cellular engineering, and sensors to feed AI models.

Briefing memo

Summary

In this episode of No Priors, Mark Zuckerberg, Priscilla Chan, and Alex Rives discuss the ambitious mission of Biohub to accelerate biological science through open-source AI and frontier biology. They emphasize building hierarchical world models starting from proteins to cells and whole biological systems, integrating AI with novel biological data collection methods. The team highlights their recent breakthrough with ESM Fold, an open protein language model that predicts protein structures at scale and enables digital protein design, including therapeutic antibodies. They stress the importance of open ecosystems to empower scientists globally and the long-term philanthropic commitment to this 100-year mission to cure, prevent, and manage all diseases.

  • Biohub integrates frontier AI and frontier biology to build hierarchical world models from proteins to cells and systems.
  • The recent ESM Fold model folded over 1.1 billion proteins, predicting structures rapidly and enabling digital protein and antibody design.
  • Open-source approach aims to democratize tools and data, accelerating scientific progress across academia and biotech.
  • Focus on generating novel biological data through advanced imaging, cellular engineering, and sensors to feed AI models.
  • Goal is personalized medicine by understanding individual genetics, disease mechanisms, and designing bespoke interventions.
  • Long-term philanthropic effort, with $500M committed to the virtual biology initiative, emphasizing tool development over commercial profit motives.
  • Challenges include data scarcity in biology compared to language models and the need for multi-scale biological modeling.
  • Future vision includes digital experimental platforms, improved clinical translation, and enabling rare disease research.

Source material

Transcript

We just want to give tools to the whole scientific community.

We want to understand how biology works.

I want to understand the genetics of this person.

I want to understand the risks they have for different illnesses.

My goal is to be able to treat the individual as an individual.

Understand the mechanisms and be able to intervene.

We will have a bigger impact by getting this to more scientists, and quicker, by doing it as open source projects instead.

It's not just like there's some factory somewhere that you can pay to produce the data.

You actually need to invent new novel scientific approaches.

The theory isn't that we're going to cure the diseases.

We're not.

It's that we want to help accelerate the pace of progress for the whole scientific field.

We folded over 1.1 billion proteins and predicted their structures.

And we didn't design a model for antibodies.

We didn't design a model to be able to bind one particular target.

We just designed a model that could understand proteins.

We could design a protein to actually change the physiology.

Then we can actually cure someone.

Today on No Priors, we're joined by Mark Zuckerberg, Priscilla Chan and Alex Rives.

We'll be talking about Biohub and all their various efforts to start applying AI at scale to build world models of cells and of the different levels of interaction across biology.

Mark, Priscilla, thank you for doing this.

Yeah, thanks for having us here.

Alex, congratulations on the new mission.

Thank you.

You guys made Biohub your primary philanthropic effort.

And then committed $500 million to this virtual biology initiative.

Can you tell us a little bit about why you did that?

And how did you go from we should fund this to this is who we are?

So Biohub, in its current form, we're super excited about it.

We feel like it's a really good fit for who we are and what we bring to the table and what we can achieve together.

But this work started 10 years ago when we were thinking about how can we give back?

And Mark wanted to build an organization that could cure, prevent, and manage all disease by the end of the century.

And we had a series of hilarious meetings where famous, Nobel Prize-winning scientists were just laughing at us.

Was that your starting line?

We're just going to cure all disease.

No, no, and to be clear, we don't think that we're going to be the ones curing the diseases.

Our goal was always to build tools that could accelerate the whole scientific field, so that the field collectively could cure all the diseases.

But still, people were laughing at it.

By the end of the century, it was a stretch.

Now I think it's like too conservative.

And so we kept having these funny, awkward, educational conversations where we were like, okay, but why?

Why do you think it's impossible?

And, you know, being the person in the room who's like, oh, I don't know why, you tell me.

Finally, we got people to like, they're like, fine, if you really must know.

And we're like, you know, we do.

It's in some more.

It's, you know, they're like, well, we work in silos.

And when you publish, information doesn't get shared; it gets locked up for long periods of time.

And we don't have tooling. They gave the example of a great tool built by one postdoc in the lab that lives on their computer.

And when they graduate, the tool is gone.

What we heard was that it was very hard to build shared tools to move science faster, and to build the shared knowledge base to move science faster.

And that's sort of where we began thinking about, okay, if those are the problems, what can we contribute?

Yeah, so the original Biohub model was basically focused on long-term tool development, bringing together engineers and scientists across multiple universities.

And it basically, it, like, worked.

And, you know, we started off with CZI doing a number of different things.

And I think over time, we just felt like, okay, the science piece is really working and we just kept on investing more and more and more in it until now it is basically the primary and main thing that we're doing.

And we've expanded from the original San Francisco Biohub to a handful.

Now at this point, there's New York, there's Chicago.

The real focus and the unifying theme at this point is the virtual biology initiative: taking the unique data sets we're able to generate in order to model biology effectively, starting with the smallest pieces, proteins, but then eventually cells and whole biological systems.

But that's kind of how we've evolved: this idea we talk about, that some of this is an AI problem and you want to build a frontier AI lab, but you need to couple that with a frontier biology effort that can do the work of understanding and getting the data you need to actually build these models.

Because with language models, there's just a lot of data out there on the internet.

That's not really the case with biology.

I mean, there are, obviously a bunch of different data sets that exist that academia and scientists have generated over the decades.

But a lot of the stuff that I think we want to put into this doesn't exist, right?

You want to be able to visualize things that people haven't been able to see before, which is why we're doing the imaging work.

You want to be able to record things that are going on inside the body, which is why we're doing the kind of cellular engineering work.

You want to be able to measure things like inflammation in ways that haven't been possible, which is why the Chicago Biohub is focused on building those kind of devices and being able to do that.

And that will fundamentally create new types of data sets that will allow new types of models.

And I think it's just a very exciting thing, going back to what you were saying.

If what the scientific field primarily needs is tool development that empowers scientists across the field to do their work faster, that's what we think we can provide through this kind of long-term focus on tool development.

But I think there's a fun through line from where we started to the work Alex is driving now: our very first request for applications (RFA) was around single-cell sequencing.

We wanted to look at the RNA being transcribed in individual cells.

And that was possible, but it was still pretty early on in understanding how different cells were expressing their DNA.

To the point where at the beginning we were just funding methods like getting people to describe how to do it so that others could share that methodology.

And then that became us funding the Human Cell Atlas, which is now one of the largest databases of single-cell transcriptomes.

It was getting hard for scientists to annotate the data.

So we built CELLxGENE, which was a very simple annotation tool that scientists could use to make use of that data.

Then the community built around CELLxGENE and started contributing more and more data that we had nothing to do with creating, funding, or making happen.

And now CELLxGENE is a corpus of knowledge that a lot of the transcriptomic foundation models are based on, and it's used regularly by the scientific community.

But still there are always critiques.

This is just stamp collecting.

You're just gathering bits of knowledge, sorry, bits of data.

You're not going to be able to pull scientific knowledge, wisdom, and insights out of it.

And we didn't have any answer for a while.

And then imagine our delight when large language models, which could make sense of large amounts of data, became a huge topic of conversation.

And for me it was like, what if we could actually understand how biology works?

Move it from a discovery-based science to an engineering-based science, where we could systematically understand how living beings and living cells work and understand why things go wrong.

And so when we saw that moment, we're like, this is it.

Something really big could happen here.

Alex, you started at Meta FAIR, but you were on the path: you'd assembled the team at EvolutionaryScale, raised venture funding, and you were making progress on your models.

What was the pitch from Mark and Priscilla where you said, that's actually the right way to go after the mission?

Well, I think for me, it was really kind of the moment when I understood that, you know, they really saw this as an integration of frontier AI and frontier biology.

And I think I had developed conviction that, you know, this is really a new era of science that's just beginning kind of what's going to be possible with artificial intelligence and, you know, we're in the age of information theory at scale.

And we have these systems that can basically predict the next token, and they can, you know, learn world models from that.

They can learn biology from the data.

And so it was really clear that to build that next kind of institution for the next era,

You would really need to have frontier artificial intelligence.

You would have to have frontier biology.

You would need to start to put those things in feedback and really have models that are learning from the biology.

And you need the right scale and the right people.

And so this, this just really felt, I think, like, the way to do that.

There's a variety of models that you all have been working on.

And I think it's kind of interesting because some of the earliest AI breakthroughs in biology were things like AlphaFold, where, you know, it was a Google model that showed you could do protein folding at scale in a really interesting way that people didn't realize was very tractable.

And this was sort of before the really big transformer wave that came later.

And now you're working on a variety of different things at different scales, right?

You're doing incremental molecular modeling and protein folding.

You're doing cell-based stuff, and you're thinking about interrogating larger-scale systems in biology.

How well do you think that extends from the micro to the macro?

You mentioned almost starting with building blocks and building up, but modeling cellular behavior is very different from modeling protein folding.

The data is very different.

The modeling is different.

I'm just curious, do you think it's all similar, in the sense that you get data and you train stuff?

Or do you think it's actually, there's some differences in terms of how you actually have to deal with these systems?

I mean, there are probably some differences.

I mean, you can try to talk more to the specifics around this.

But like, I mean, I think each layer is going to end up being somewhat qualitatively different, right?

But you need to be able to understand the protein interactions in order to be able to understand how cells work.

So you can't just go straight to cells in a way without understanding the protein modeling.

And then if you're trying to understand something like the, you know, the way the immune system works or a bunch of cells interact together, then it's tough to do that without first understanding cells.

I mean, you might be able to simulate a system at a very high level of abstraction.

But if you really want to like understand how it's going to work, you kind of want to build the simulations at each level hierarchically.

So that's basically the approach we're taking, starting with the building blocks, the proteins.

But yeah, I think there are going to be different types of data that you want to collect for each. The modeling techniques, I think, we'll see.

I mean, that'll all keep on advancing across the board.

But I do think that a big part of the strategy is this view that you need to build it up hierarchically.

And, you know, one of the things that's unique about us in the space is that we are very intentional that the AI efforts and the wet lab efforts are a single effort.

And we've done a lot of work to bring them together.

And the really neat thing that we can do is really try to pull and gather data that helps us connect, um, across sort of the hierarchy.

You know, you can look at transcriptomics with spatial information within a cell and see where it's localizing.

We can look at translucent zebrafish and follow development across different cells.

And as their brains develop, we have sensors that allow us to look at cell-cell communication and different molecules.

And so we can be strategic about the types of experiments and data we want to collect to help us bridge across these levels, so that there's some connective tissue that helps drive the modeling, the modeling magic that happens.

Yeah, I asked that question, by the way, as someone who used to be a biologist.

I have a PhD in biology and I worked in wet labs for almost a decade and everything else.

Are you looking for a job?

We can talk about that one.

I'm like Danny Glover in Lethal Weapon, you know. I'm almost at retirement.

But I think one of the things that was always lacking was this integrative view across the different layers of biology. The developmental biologists would work on their own, and the molecular biologists would do totally different experiments.

And so that's what I was curious about.

Yeah.

Typically, there's a reductionist view of biology and there's a systems view and those people didn't really work together deeply.

And so one of the exciting things about what you're doing actually is how you're bridging that.

And so that's, that was kind of the basis for the question as well.

Yeah, and if I could add something there, I think we're in the age of this kind of information theory in biology.

And so there are levels of complexity and hierarchy in biology, and each level is made up of, and constituted by, the lower levels.

And so, as you want to have that kind of more complete description and you want to have systems that can really generalize and begin to actually answer, you know, experimental questions digitally that you could ask in the lab, you know, you need to have kind of the right basis for modeling at every level.

And so I think what's really unique about what we can do is, as Priscilla and Mark were saying, to really build information at each of these different layers, collect those connection points, and then also do it at the scale that will reveal that underlying information and its context.

And that's going to be really critical to actually build digital representations that can answer new experimental questions.

One of the things that inspires me most about this effort is what Priscilla said, which is just like, well, there's so much we don't understand about biology, and what if we could? I think that's very different from lots of other incredibly interesting and useful AI problems we attack, where we're trying to replicate human behavior.

A lot of that data is, you know, on the internet or already captured, and without pretending to understand all human behavior, you can predict a lot of it.

I thought one of the most interesting things in your release was actually, you know, the mechanistic interpretability stuff you alluded to, which is, can we actually extract new knowledge from, you know, what the model believes is happening, right?

Can you talk a little bit about that?

Yeah, I'm really excited about that.

So I think, you know, in mechanistic interpretability kind of traditionally it's been applied to large language models with the goal of understanding, you know, kind of what is the representation space of a large language model?

How does it compute things, and does that really connect to our intuitive understanding of the world?

And I think a really rich toolkit has been developed to start to be able to ask those questions.

So kind of what does that mean for biology?

One of the classes of models that we train are these protein language models.

So they're really, you know, trained on the codes of proteins.

And so anything they learn about biology is kind of emergent.

And we've seen that they can learn things like biological structure and biological function.

And that's just kind of emergent from this, you know, token prediction, training task.

So, as we think about mechanistic interpretability in those models, we're really looking into the unknown, because the models have been trained on billions of protein sequences.

They've been trained on, you know, both known and unknown biology.

And yet they're developing these representations that start to kind of capture things that we can really see correspond to that reductive picture of biology that's been built up over the centuries.

So you can start to connect the dots between proteins we really don't know anything about and proteins we do know something about, because there's an underlying structural grammar linking them in the model's representation space.

And at the extreme, it could be, you know, we're going to understand systems in the body that we didn't before or the mechanism of action for a new treatment because we can ask the model and interrogate that representation.

That's right.

The hope is that you kind of really learn the underlying basis for how it's making the predictions.

And so you open up the black box and you can actually understand kind of the biology that the model is representing.

So, asking for a friend: you all believe in venture-backed companies as a way to have impact on the world.

Is it collecting data on zebrafish, or the span of the data, or the wet lab work, or just the scale?

Like, what makes this a better fit for this big non-profit ecosystem effort versus a venture-backed company?

Well, I think we just want to give tools to the whole scientific community.

And part of it is about having the biggest impact. It's not actually clear that we couldn't run it as a business if we wanted to.

I just think that we will have a bigger impact by getting this to more scientists, and quicker, by doing it as open source projects instead.

So, yeah, I mean, I think that that's kind of the approach.

But, I don't know, it's an interesting question.

I'm not sure that... I mean, obviously you were doing a bunch of the modeling before as a for-profit company.

Then you run into certain issues.

I mean, you have to raise a large amount of money in order to build compute clusters.

You know, I think in a lot of ways the data is actually even more of a constraint.

Because if you look at the scale of these models compared to language models, they're smaller, but they're smaller because there's less data.

In order to get the data, it's not just like there's some factory somewhere that you can pay to produce the data. You actually need to invent novel scientific approaches, for example the type of cellular engineering we're doing in New York or the types of devices in Chicago. That's why, when we talk about this concept of frontier biology and frontier AI, the frontier biology part is that you need to do real science to advance biological methods in order to observe the things that create the data that go into the model.

So, it's not just like an off the shelf thing that you can create.

Now, that's a pretty big effort.

I don't know that there are like that many things like that that are done as biotechs.

I think it's just the scale of the ambition of what we're doing, the horizon over which we're committed to doing it.

I think part of the theory is that if you're building tools this complicated, you want a 10 to 15 year time horizon for building out these efforts.

And then the scale of capital required.

I mean, I guess there's no rule that said that you couldn't do it as like an incredibly well-funded startup, but I think that this just made more sense.

And then it also is simplifying strategically to not have to think about how you're going to make money with the different things.

I mean, we just want to get the models into people's hands.

We release them as open source.

I think that that's a very valuable thing to do.

And again, I mean, the theory isn't that we're going to cure the diseases.

We're not.

It's that we want to help accelerate the pace of progress for the whole scientific field.

As the person least experienced with making money here, I would say that the neutral, non-profit nature of our work actually helps bring more people into this effort.

And to actually achieve the mission of understanding the totality of human biology and to cure, prevent, and manage all diseases, you actually need the entire academic and biotech community to come together and work on this in a unified way.

In part, because there's a lot of talent out there.

And it's not helpful to exclude any talent from the effort.

And there's a super long tail of diseases.

There are the common ones, and even among the common ones, if you unbundle heart disease, cancer, neurodegenerative diseases, even if you unbundle dementia or depression, there are many, many subcategories that become more and more niche.

And that's not even looking at the long, long tail of rare diseases.

Those often get orphaned and left behind when we're looking for the most efficient way to impact the lives of many.

But if you decentralize the effort and put the tools in many people's hands, you start getting people who are like, you know what?

I am super interested in spinal muscular atrophy.

And that's something I care deeply about.

And if you put the tools in that person's hands, they're going to be able to make progress.

In a way, if you had to focus your efforts and make big bets, you probably wouldn't pick it, because it's a niche disease affecting a small group of individuals. But if we can understand that disease process, it will in turn help us unlock knowledge about a lot more of how the human body works.

Do you have any thoughts or predictions in terms of what disease areas this work will impact first?

I know it's very hard to be predictive about these things.

But just given the nature of the work and the nature of the models, are there areas you're most optimistic about in the short to medium term?

That's actually not how I think about it, at least.

The way I think about it is like we want to understand how biology works.

In the ideal world, as you would say, I understand the genetics of this person.

So I want to think about people at the individual level.

I want to understand the genetics of this person.

I want to understand the risks they have for different illnesses.

I want to understand the mechanistic connection between, say, a gene variant, a protein, and a disease process.

Because if you understand that through line, then you can design a protein or a drug bespoke to them and actually make an intervention.

And right now, I'm sure we've all had experiences being sick.

And if you have something that's even remotely non-standard, you go into PubMed, you look up a paper, you look up the supplement, and then you start going through the methods.

And you're like, am I represented in this paper?

And we're just making guesses.

We really have no mechanistic understanding. We're saying, okay, you're kind of like these people that we studied.

And this drug kind of impacts the pathway that we think is implicated.

Let's try and see if anything happens.

And time passes.

And sometimes it works and sometimes it doesn't.

So my goal is to be able to treat the individual as an individual, understand the mechanisms and be able to intervene.

And different diseases are at different stages of filling out that whole through line.

And so for some diseases, you just want to understand which gene variants actually cause disease and which don't.

And that in itself can be super empowering to patients.

Beyond that, there are some diseases where we understand the chain but just can't intervene and change a specific protein's function.

That's super exciting too.

Like if we could design a protein to actually change the physiology, then we can actually cure someone.

But to me, that is just as exciting as contributing to our understanding of how someone gets sick in the first place.

Yeah, and that's a very exciting vision because you're basically saying you can bring generalizable tools to provide very personalized things for each individual person.

Yes.

And that's the power of the approach.

You have these big models that you build that can then apply anywhere.

I know you mentioned earlier that you're going to try to cure, prevent, and manage all diseases within 100 years.

And you mentioned it could actually be sooner now, given all the advances we've had.

Do you have a sense of when we'll be closer to that goal?

I mean, I'm optimistic it'll be sooner.

I mean, I think the thing that's complicated is that it's a dynamic system, right?

So if you fix something, there will obviously be future things that you need to work on.

So I don't think that the current set of things that we're aware of are going to be the only things that need to get worked out.

But I think the progress with AI is obviously very exciting on this.

The other thing I'd add to what you were saying a second ago is that we really look more at systems than at specific diseases.

So for example, one area that seems really important to understand is inflammation.

We talked about this a bunch.

This is a big focus of the Chicago Biohub.

There's a lot of data on that.

And it seems quite clear that it's connected to a bunch of different diseases.

But rather than studying the specific diseases, we think that by trying to understand inflammation more broadly, we'll make it so that other companies can then use these tools to work on specific therapies.

Another example is the immune system, which I think is a very good case to study for some of the work we're doing in cellular engineering.

And as we ladder up from proteins to cells to whole dynamic systems within the body, I think that one makes sense.

I mean, it's sort of privileged in that the cells can travel around through the body, all that.

Obviously, that has a big part in addressing different diseases.

How do you make the immune system function better?

But exactly how do you connect that last mile?

I think that's going to be something biotechs or individual academics studying specific things will be better suited to do.

So this is kind of how we think about building up the tool set that just helps accelerate all these other folks.

Whether the timeline is 10 years or 100, or less than 100 now, I think it's useful for your average doctor or patient, and everybody's a patient, to think about what's externally visible in the progress here.

You worked with patients for a long time at UCSF, like what should doctors look out for, what should people look out for if you're actually accelerating progress?

This is the part, you know, I'm super excited about the progress, especially with this launch that Alex and his team have put forward.

And I think it's very clear that science is going to start moving pretty quickly.

And I think the thing that's less clear to me is exactly how we translate to the clinic and what that looks like.

And I think what has to change is actually the way we do clinical research.

And my hope is that we're really shortening the distance between bench research and patient impact.

But there's a lot of steps there that we need people who actually take care of patients to think creatively and think about how to deploy safely.

And that's a gap that we have some work in.

We partner with Jennifer Downon at CRISPR Cures program at UCSF.

So we were dipping our toe and understanding how the deployment of research needs to change, given how quickly research will be progressing.

But I think that one is still shaping up.

Maybe I could say something about our most recent launch, and I think it also kind of...

Yeah, I guess it was about a week ago now.

So we announced the new ESM Fold.

And so this is basically an open system for scientific discovery in protein biology.

It's a world model of protein biology, and it's language-model based.

So it's been trained on billions of protein sequences.

It kind of learns these emergent representations of protein biology.

And then we can use it to make atomic-resolution predictions of protein structure.

And we can use it to...

And it's really fast.

So it's blazing fast.

It sits on a kind of Pareto-optimal frontier of speed and accuracy in structure prediction.

And so this allows us to characterize really vast stretches of the protein universe.

So we folded over 1.1 billion proteins, predicting their structures, and identified features connecting all of them through mechanistic interpretability.

But the thing I found most exciting about this model is that it's a really general model of protein biology.

And so you can use it as a world model.

You can actually start to search the space of the world model to design new proteins.

And it's hitting state of the art across pretty much every structure prediction benchmark.

Especially on protein-protein and protein-antibody interactions, which are really critical for therapeutic design.

And so what we found is that you can now use the model to design proteins, and actually to design single-chain antibodies.

So you can do all of this digitally and then, in a small number of experimental trials (basically a single 96-well plate), select from hundreds of thousands of trajectories digitally, actually synthesize 96 proteins, and test them in the lab in a really short, easy experimental cycle.

And we found nanomolar binders there.

And that's really the level needed for therapeutic activity.
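
The design-then-test funnel described here (rank a very large pool of candidates digitally, then synthesize only enough to fill one 96-well plate) boils down to a top-k selection. Here is a minimal Python sketch under loud assumptions: `Candidate`, `select_for_plate`, and the random scores are hypothetical placeholders for whatever model-based ranking the team actually uses; nothing here is part of an ESM Fold API.

```python
import heapq
import random
from dataclasses import dataclass

PLATE_WELLS = 96  # one standard 96-well plate, as described in the conversation


@dataclass(frozen=True)
class Candidate:
    """A hypothetical digital design with a model-derived score (higher is better)."""
    name: str
    score: float


def select_for_plate(candidates, k=PLATE_WELLS):
    """Return the k highest-scoring candidates, best first.

    heapq.nlargest runs in roughly O(n log k) time, so narrowing hundreds of
    thousands of digital designs down to one plate is cheap compared with
    synthesizing and testing them in the lab.
    """
    return heapq.nlargest(k, candidates, key=lambda c: c.score)


if __name__ == "__main__":
    rng = random.Random(0)
    # Placeholder pool: random scores stand in for real model-based ranking.
    pool = [Candidate(f"design_{i}", rng.random()) for i in range(200_000)]
    plate = select_for_plate(pool)
    print(f"synthesize {len(plate)} designs; best score {plate[0].score:.3f}")
```

Top-k on a single score is the simplest possible policy; a real campaign might also enforce diversity across the plate so that 96 wells aren't spent on near-duplicates.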

So I think it really shows that you can have these general-purpose models that can... we didn't design a model for antibodies.

We didn't design a model to bind one particular target; we just designed a model that could understand proteins.

And you kind of get protein design as an emergent property.

And I also think it illustrates the power of open science and open source, because we released this as basically an open discovery engine.

And so really, anyone can build on it.

So it takes what are really intensive laboratory experiments, where you have to screen through hundreds of thousands or millions of antibodies in high-throughput screens in the lab, and you can now just spin up a compute instance and be able to generate antibodies.

You should say more about how we took that data when we looked at an antibody screen.

And then we validated it: we looked at PD-L1 in cells.

And then we looked at it under cryo-EM, and how all that complemented and validated what you were seeing in the models.

That's right.

Yeah.

So I think it's really critical to actually go and characterize these molecules in the lab.

We have a structural biology center here.

We have incredibly powerful cryo-EM microscopes.

So we're really able to look at these proteins biophysically and functionally.

We designed proteins for several therapeutically relevant targets and were able to confirm their function.

And so they light up when it works, the way they're supposed to.

Yeah, it's amazing.

We also looked at the structure.

Yeah.

So you can see at atomic resolution that the binding interface is correct.

I know a lot of your work is really focused on basic research and building out the fundamentals.

If I look at actual translation into drugs, drug development will often take 15 years and cost something like $1.5 billion.

About $50 million of that often is the molecule and preclinical work.

And it's a few years of work.

And then the other $1.45 billion and decade-plus is the drug development side of it.

A lot of that seems to be gated on some regulatory issues.

Some of it's recruitment; it's a variety of things.

But a lot of it also has to do with drugs failing in trials because of absorption, toxicity, and things like that.

Have you considered tackling that other part of the chain at all, or is the primary focus more on the basic biology and the initial molecules?

My hope, at least, in building this comprehensive model of how cells work is to also be able to predict off-target effects.

I think you can actually do some of that with biological models, because right now some off-target effects happen because we just didn't know your kidney cells also expressed this receptor, and then we test in humans and we see renal toxicity.

And if you have a single-cell atlas that looks at all the different cell types, some of which were not predicted before we modeled them, you can start looking at which cells actually have receptors for the target you thought you were exclusively targeting, and predict some of these downstream effects before getting into human trials.

And I think that's one of the more exciting applications of a transcriptomic model: understanding how the different cells will react when you intervene.
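
The off-target reasoning above amounts, at its simplest, to a lookup in a cell-type expression atlas: for the gene encoding your target, which cell types besides the intended one express it above some threshold? A minimal Python sketch, assuming a tiny made-up atlas: the gene name `TARGET_RECEPTOR`, the cell types, the expression values, and the threshold are all hypothetical placeholders, not real measurements, and a real single-cell atlas would need normalization and statistical care.

```python
from typing import Dict, List, Tuple

# Hypothetical atlas: mean normalized expression per cell type for each gene.
# All names and numbers are illustrative placeholders, not real data.
Atlas = Dict[str, Dict[str, float]]

TOY_ATLAS: Atlas = {
    "TARGET_RECEPTOR": {
        "T cell": 5.2,  # the intended cell type
        "kidney proximal tubule": 3.1,
        "hepatocyte": 0.2,
        "cardiomyocyte": 1.4,
    },
}


def off_target_cell_types(
    atlas: Atlas, gene: str, intended: str, threshold: float = 1.0
) -> List[Tuple[str, float]]:
    """List non-intended cell types expressing `gene` at or above `threshold`,
    sorted from highest to lowest expression."""
    expression = atlas.get(gene, {})
    hits = [
        (cell_type, level)
        for cell_type, level in expression.items()
        if cell_type != intended and level >= threshold
    ]
    return sorted(hits, key=lambda hit: hit[1], reverse=True)


if __name__ == "__main__":
    for cell_type, level in off_target_cell_types(TOY_ATLAS, "TARGET_RECEPTOR", "T cell"):
        print(f"possible off-target: {cell_type} (expression {level})")
```

This only flags where a target is expressed; whether engaging it there actually causes toxicity is the harder question a predictive cell model would aim to answer.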

But when you think about delivery mechanisms and patient care, that's where you have to start being creative, asking which disease you want to cure first.

There are certain diseases where it will be easier to deliver a therapeutic, or where the risk-reward makes more sense.

I think we were all inspired by baby KJ last year.

The team at CHOP was able to deliver a CRISPR therapeutic to edit a mutation that would inevitably have led to significant neurotoxicity and altered his life. That disease was very carefully chosen, because the therapy needed to target his liver cells, and we could readily deliver a product that would work in his liver. I think that's where creativity, the wherewithal to choose the right applications, can help us unlock the first applications.

Maybe just to add to that: you described the conventional drug development process, and I think these kinds of tools have the potential to have a lot of impact on that process. But what's interesting is to really start to think about the new paradigms that can open up. What does it mean if the barrier to developing a drug, to designing a molecule and getting it through all of those stages, is so much lower? Then you have programmable biology, and you can really start to create a medicine for every individual patient. I think that has enormous implications for how we do drug development and what the future of medicine looks like.

That'll be an exciting day, when the FDA accepts a virtual clinical trial for phase one or something, based on some personal model of that person.

Yeah. Even short of that, thinking about the specific mechanisms where you see this acceleration: I imagine if people feel like they can predict the impact on kidney cells, or have a stronger perspective on tox because they have this broader understanding, they'll be willing to try many more programs, right?

Yeah, recruitment could also change. We had this program, Rare As One, and the basic idea is that a lot of people focus on the most common diseases, but there's this long tail, and the economics don't quite work out for companies to focus on those diseases. But if you can make it so that groups of patients can come together, organize, and say, "Hey, we would take an experimental drug for this," then, because of the costs you're talking about and how much of the overall cost that is, you can flip that, and the economics make a lot more sense. That's even more true if you can generate something more easily and pair it with a group of people. I think one of the interesting things in science and engineering is that you can often hit your head against the wall on the common problems, in this case the common diseases, but a lot of times you learn a lot more about a system from some rare or weird side case. So I think that's always been an interesting part of this, and it connects pretty well here, because now you're going to be able to enable a long tail of new ideas to get tried and potentially get tested more easily.

Yeah, that's a really good point on rare disease. Our rare disease cohorts, first of all, are incredibly inspiring and powerful. Patient groups are self-organizing patient registries, natural history registries, biobanks. They're organizing their own clinical trials. There's a gene therapy that one disease group has moved forward over the course of, I want to say, three to five years rather than decades, and the speed is so fast because the patients themselves have organized the resources that a scientist or a clinician might need. It's incredible.

But I think to some degree you're going to need something like this, because there are going to be many more new things that can get created. That doesn't mean you won't want the same level of vetting we've had historically for the general population, but making it so that people who want to be more on the frontier have the ability to do that is also going to be pretty helpful.

Mm-hmm. Yeah, letting people opt in to be part of trials is one of the big shifts that is starting to happen, and it could really help accelerate biology in general.

All three of you have mentioned at different points the power of open ecosystems in such a large space. I think some of that logic around open source, and the breadth or diversity of data collection as we're describing it, should also apply in the language model world and the multimodal AI world. Do you think that's right? Does any of the work you're doing here change how you think about AI at Meta?

I think it's a similar philosophy overall. A lot of our focus is building tools that empower individuals to do things, and that's a common theme across a lot of the things I work on: putting the technology in individuals' hands. We don't believe in a very centralized future where a small number of institutions are advancing all of this stuff. Our vision is not that there's going to be some central superintelligence that solves all of science. I think people are really important, and they'll be more important in the future, and giving people more tools to be more productive is going to be a critical part of any positive future.

And that's how progress has always been made historically, right? It's not through centralization. It's through empowering individuals to try things that are somewhat out of the mainstream, that other people didn't think were good ideas, or thought were good ideas that had already been done. So I think that's very central to the whole ethos. To some degree it's why you create something like social media, right? To give people a voice. A lot of the stuff I care about is empowering people with individual AI, and open source is one instantiation of that. It's not the only way to do it, but it certainly is one way of saying, "We're going to take this technology and put it in everyone's hands."

In terms of science, I think it really makes sense, and we're deeply committed to open source. There are obviously important considerations here too, because there are a lot of considerations around biosafety and things like that, which we're going to need to balance and think through how to handle. But overall, this is very deep in the ethos of the work we're doing at Biohub, and probably a theme across a lot of what I do: we believe a positive future is one where you build technology as a tool, you put it in individuals' hands, and that's how society makes progress.

You have this incredibly ambitious mission at Biohub, and yet the AI scientists who work here could also go work at commercial enterprises. How do you think about talent, and how to bring people to Biohub?

Yeah, it's a very hot market for AI researchers, but part of what that means is that there's a lot of demand, and if you're very in demand, you can work on the things you want to work on. I think this gets back to the point about frontier AI and frontier biology. The AI researchers who work here could go work on language models or other things at any of the main labs, but those labs don't have the frontier biology part attached. So there's also a very large mission component to this: there's an ability to do unique work here that you just can't really do at other places. If that's what your focus is, I don't actually think there's any other organization in the world that's been doing both the frontier biology and the frontier AI.

Yeah. Why are you here, Alex?

I think it's really simple. Our mission is to cure all diseases, and I think it's just such a...

And you say it with a straight face, on a 100-year timeline. It's very serious now.

Yeah, it's a really powerful mission. It's something scientists, and people in general, are deeply motivated by, and I think we're at a moment in time where it actually seems like something that can be achieved. We're building a really unique place where we're tackling that problem, and we have the resources and, I think, the right things to really go after it.

That resonates with me as somebody who talks to and hires a lot of research scientists. They want to know if you have the data, the tools, the compute, the talent, and then what the mission is. So I actually think that's super competitive.

The other thing is that you don't need a very large team, right? I think an interesting thing about the world is that people care about different missions, and that's good. Part of why building these tools and giving people the ability to explore what they care about, whether across science or across everything, is such a powerful way to make progress in society is that people care about different things. And to make progress in AI, you don't need many hundreds or thousands of AI researchers. I think you can really make progress with a very strong group of a dozen or a couple dozen people. Finding people who care about this mission is not a particularly hard thing; this is a super important thing in the world. It's just kind of a cool thing about the world that people are drawn to different missions.

I think the simplest mental models that folks have, even if they're paying attention to the space, are essentially: okay, structure prediction models for proteins and protein-protein interaction models. So there's this one piece, which is fundamental understanding, and then there's this theory that someday we're just going to be able to zero-shot things into the clinic, or at least get there with a much, much better hit rate. What needs to happen for us to go from ESM Fold to this other piece? Is that feasible?

I think that's a great question, and I'm really optimistic on that. On the one hand, these are problems that historically people could spend an entire career working on: how do you effectively optimize a drug, how do you get it through preclinical, how do you do the early safety work? But when you have a new scientific paradigm, questions that were once hard become simplified through the new paradigm. So I'm very optimistic that many of these core problems will be solved in an emergent way through these models. One great example is toxicity: if you can really simulate everything digitally and predict where a drug is going to distribute and bind across the human body, you have the beginning of a solution to that kind of problem. So I think once you have these accurate representations at the molecular level, we're going to start to see really rapid progress on a lot of these core problems.

What's the most exciting user experimentation with the models you've seen in the week since release?

Yeah, it's just been great to see it get integrated into all kinds of things. One of the really interesting things we've been seeing is people connecting it with agentic systems to do automated design and automate that whole process. So I think it's another example of bringing together agentic and frontier AI with a world model for biology, so you can actually reason about biology and really start to automate the entire design process.

How do you decide what the next step in the research agenda is? You have a world model for biology, and then (I'm going to be very coarse here) I could scale it up, I could add more data, and adding data is a non-trivial thing in terms of new methods and domains. Do you take input from the larger ecosystem about how people are using it and what would make it more useful, or is it really that you understand the next step of structures or coverage you're looking for?

I think there are two things. We have a view on the next big challenge, which I think is the virtual cell: really being able to ladder up the hierarchy of biological complexity to the cell, and...

Sorry, very basic question: for a virtual cell model, what is the input and output I should expect?

Yeah, I think there are different views on that. But what you ultimately want is a system that can model each of the levels of complexity (the proteomic layer, the genetic layer, the transcriptomic layer) and connect that to the phenotype. And you need enough generality that you can ask the model questions about a new intervention in a context it hasn't been trained on and get an answer from it. The gap we need to close as a field is being able to make predictions that generalize, and that's going to require an enormous effort to generate data.
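
To make the input/output question concrete, here is a minimal Python sketch under loud assumptions: the input is taken to be a cell context (cell type), a baseline transcriptomic state (a gene-to-expression map), and an intervention restricted to gene knockouts; the output is a predicted post-intervention expression profile. `Query` and `AdditiveBaseline` are hypothetical names, the additive-effects rule is a deliberately naive stand-in for a learned model, the effect values are made up, and none of it describes Biohub's actual architecture.

```python
from dataclasses import dataclass, field
from typing import Dict, FrozenSet

Expression = Dict[str, float]  # gene -> expression level (one transcriptomic layer only)


@dataclass
class Query:
    """Input: a cell context, its current state, and an intervention."""
    cell_type: str
    baseline: Expression
    knockouts: FrozenSet[str]


@dataclass
class AdditiveBaseline:
    """Hypothetical stand-in for a learned model.

    effects[cell_type][knocked_out_gene][affected_gene] is an assumed shift in
    expression; effects of multiple knockouts are simply summed.
    """
    effects: Dict[str, Dict[str, Expression]] = field(default_factory=dict)

    def predict(self, query: Query) -> Expression:
        """Output: the predicted post-intervention expression profile."""
        predicted = dict(query.baseline)
        for gene in query.knockouts:
            predicted[gene] = 0.0  # a knocked-out gene is not expressed
        context_effects = self.effects.get(query.cell_type, {})
        for gene in query.knockouts:
            for affected, delta in context_effects.get(gene, {}).items():
                if affected in query.knockouts:
                    continue  # knocked-out genes stay at zero
                # Clamp at zero: expression cannot go negative.
                predicted[affected] = max(0.0, predicted.get(affected, 0.0) + delta)
        return predicted


if __name__ == "__main__":
    model = AdditiveBaseline(
        effects={"T cell": {"GENE_A": {"GENE_B": -1.5, "GENE_C": 0.5}}}
    )
    seen = Query("T cell", {"GENE_A": 4.0, "GENE_B": 2.0, "GENE_C": 1.0}, frozenset({"GENE_A"}))
    unseen = Query("neuron", {"GENE_A": 4.0, "GENE_B": 2.0}, frozenset({"GENE_A"}))
    print(model.predict(seen))    # stored (made-up) effects applied
    print(model.predict(unseen))  # no effects stored for this context
```

The `unseen` query illustrates the generalization gap described above: a lookup-style baseline has nothing to say about a context it has no stored effects for and just applies the knockout, whereas a virtual cell model is meant to give a real answer in contexts it was not trained on.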

Yeah, and in terms of what you decide to do next, I think this is a pretty normal process of constraint management. I think every lab in every field across the world probably feels compute-constrained, and that's probably true here too. So there are always questions like: should we double down on advancing the protein piece, or should we do more of the cellular stuff? Those are ongoing debates in terms of how you sequence things. And within that, there's being at the Pareto frontier of how much you want to train the different models, and the size of the models also depends on the scale of the data you have, for obvious reasons. So some of that is just where you want to be on the curves, given normal constraints. But I think this is probably the same process any research organization goes through: you want to go in all these different directions, and you're trying to optimize within constraints and make enough progress to do world-class work on one thing at a time, while planting some seeds that can blossom over the next couple of years as well.

Yeah, this has been the most dynamic period of technology I've seen, at least that I remember. I'm so excited about everything that's happening with AI, and every week there's something new that's changed. Are you tired, or...?

I'm both.

Yeah, I'm both. It's a combination of invigorated and exhausted.

Yeah, it's wonderful. And so I guess things are very unpredictable right now; it's really hard to know what's coming. We have these early signs of experimentation on the model side, with the agentic flows we're starting to see in really interesting ways, and models starting to help more and more with building models. It's still very early days for that. If you're looking back five years from now and you were to define what success was for your efforts (and I know things are very dynamic and change a lot), you have this common thread of tooling for the bio lab and of empowering scientists at scale. Looking back five years from now, is there a specific thing you really want to make sure you've accomplished, or a primary goal?

Well, I think we have a pretty clear view of the hierarchical set of world models we want to build around biology. The other part is that we want to do the highest-quality work in the world, and I think we're basically set up to do that, between having a world-class AI research team and this collection of Biohubs that are world-class life science research organizations. I think that's fundamentally a setup that no other organization in the world has. But you can have a lot of great ingredients, and that doesn't guarantee you succeed. So to me, looking back five years from now: I'm sure other labs or efforts will try to produce things that approximate what we're trying to do, and I just think we should be able to do something that is meaningfully better, a unique intellectual contribution to the world. I think that's what you're trying to do whenever you do any kind of research. So if we do that, I think we'll all feel very good. I would also expect that at some point we'll start seeing a lot more idea generation from the people using the models, but I have enough faith that that part will materialize that, for me, it's more about making sure we do world-class work. If we do, the rest will almost take care of itself.

Very last question for you. Snapshot: it's mid-2026. What was the biggest update in your own thinking about Biohub or the domain from the last year?

From the last year... I mean, you joined in the last year. I think the biggest thing is that we basically pivoted: in the last year we formalized Biohub as the main focus of our philanthropy, so that's been a very big shift. But Alex and the team coming in has been interesting, not only because it's a world-class group. You guys have worked together for a while. You talked about how stuff is changing so much in the field, and I think one thing that's underrated is that this is an extremely talented group of people who also know each other, work well together, and are stable and good. I think that's also underestimated in terms of the compounding benefit of people being able to work well in a stable environment over time. So I think that's a really important piece.

But part of what we wanted to do: prior to Alex leading the effort, the previous leaders of Biohub were primarily biologists who were interested in technology. And this is the point where we really flipped that. Obviously you have a background in biology as well, but you are primarily an AI researcher, with a background in both AI and biology, and that's a deep reflection of the way we expect this is going to drive more value in the future. So those are probably the biggest updates in the last year in terms of the work we're doing: a new leader, and not just a leader but a team, that I think is really good.

And then on the rest of the industry, it's on track. It's kind of this crazy thing: when you have an exponentially growing curve, it's growing so quickly that the emotional feeling is that it can't possibly keep going. But the nature of an exponential curve is that it doesn't just keep going; it keeps accelerating. Exponential growth is accelerating. So that has all these emotions and psychology attached to it, but fundamentally, when you look at the curve and the industry, it is on track. It has remained on that curve, which has very profound implications for all of these domains. It certainly validates, and makes one feel very good about, making a very big investment in the things that will play out if you stay on that track. And it seems like we are, so I think that's very good news.

I think the most important aspect of what you're doing there is that you're actually closing the loop with the actual biology. Code and research are closed-loop systems, so they're very fast to iterate. This is an open-loop system, so you're closing the loop, and that's really crucial to progress.

Yeah, for me one of the biggest changes with the strategy we're driving now, with Alex at the helm, is that before, we had amazing teams moving generally in the same direction and understanding the potential collaborations and interconnectedness of our work, but now we have arms linked, moving together.

It feels very direct.

It's very directed, and it's very exciting. It's a little bit scary, but it's truly a team playing off each other and trying to make progress toward this goal. That has taken a lot of work, and also maturity: our teams having their work at a level of maturation where it actually makes sense to interlock.

Amazing. Well, here's to teams being on the curve.

Thank you guys for doing that.

Thank you.

Thank you.