The a16z Show · 2025-11-06

Zuckerberg & Chan: AI, Virtual Cells, and CZI's Decade-Long Bet to Cure Disease

Hosts: Unknown

Guests: Mark Zuckerberg, Priscilla Chan

AI for biology · Virtual cell models · Open source science infrastructure · Chan Zuckerberg Initiative / Biohub · Precision medicine · Single-cell genomics · Frontier AI + biology integration · Academic compute clusters

Why it matters

Zuckerberg and Chan on CZI's Biohub, virtual cell models, and decade-long disease mission

Key claims

  • CZI's Biohub is now the main thrust of their philanthropy, operating as a unified team under Alex Rives (ex-EvolutionaryScale/Meta protein folding), combining frontier biology with frontier AI
  • CZ Cell×Gene began as an annotation tool to unblock a workflow bottleneck and accidentally became the standardized data format for single-cell biology—75% of cataloged cells came from the community, not CZI funding
  • Virtual cell models are framed as the next "microscope"—a hierarchical system from proteins to cells to virtual immune systems that lets scientists test high-risk hypotheses in silico before committing to wet-lab work
  • They've built one of the first large-scale academic compute clusters (~1,000 GPUs) and are expanding to ~10,000 GPUs, which they describe as the new lab space; they invite outside scientists to apply for compute access

Episode summary

Mark Zuckerberg and Priscilla Chan join a16z to discuss the Chan Zuckerberg Initiative's (CZI) 10-year journey building infrastructure for modern biological research, and why they are now consolidating their efforts into the Biohub as the central pillar of their philanthropy. They explain the origin story of the Cell Atlas project—what began as a simple annotation tool (CZ Cell×Gene) inadvertently became the de facto data standard for single-cell biology, with ~75% of its millions of cells contributed by the broader scientific community. Zuckerberg argues that breakthroughs in science are typically preceded by new observational tools, and that biology is still missing a "periodic table of elements"-equivalent foundation.

The core of the conversation centers on virtual cell models—a grand challenge they believe could compress the disease-curing timeline from a century to something significantly sooner. They describe a hierarchical approach: state-of-the-art protein models (built with the EvolutionaryScale team joining from Meta) feeding into cellular models, eventually enabling virtual immune systems and personalized simulations. The goal isn't 100% accuracy but enough directional signal to de-risk hypotheses in silico before expensive wet-lab work.
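The de-risking argument above can be made concrete with a little arithmetic. The sketch below is an illustration, not something from the episode: the prior hit rate, the screen's sensitivity and false-positive rate, and the per-experiment cost are all made-up numbers, and the function names are hypothetical. It applies Bayes' rule to show how an in-silico screen with only directional signal can raise the share of wet-lab experiments that pan out.

```python
def precision_after_screen(prior: float, sensitivity: float,
                           false_positive_rate: float) -> float:
    """Fraction of screen-passing hypotheses that are actually true (Bayes' rule)."""
    passed_true = prior * sensitivity                    # true ideas the screen keeps
    passed_false = (1.0 - prior) * false_positive_rate   # false ideas the screen keeps
    return passed_true / (passed_true + passed_false)


def wet_lab_cost_per_hit(hit_rate: float, cost_per_experiment: float) -> float:
    """Expected wet-lab spend per confirmed hypothesis at a given hit rate."""
    return cost_per_experiment / hit_rate


# All inputs below are assumed values chosen only for illustration.
prior = 0.05                  # share of risky hypotheses that are true
cost = 100_000.0              # cost of one wet-lab experiment
baseline_cost = wet_lab_cost_per_hit(prior, cost)

# An imperfect screen: keeps 70% of true ideas, wrongly keeps 30% of false ones.
screened_rate = precision_after_screen(prior, sensitivity=0.7,
                                       false_positive_rate=0.3)
screened_cost = wet_lab_cost_per_hit(screened_rate, cost)

print(f"hit rate without screen: {prior:.3f}, with screen: {screened_rate:.3f}")
print(f"wet-lab cost per hit without screen: {baseline_cost:,.0f}")
print(f"wet-lab cost per hit with screen:    {screened_cost:,.0f}")
```

With these assumed inputs, the screen roughly doubles the hit rate among hypotheses sent to the wet lab. A screen with no signal (sensitivity equal to the false-positive rate) leaves the hit rate unchanged. The cost of running the screen itself is ignored here.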

Zuckerberg announces that the Biohub will now operate as a unified entity under Alex Rives (formerly of EvolutionaryScale and Meta's protein-folding work), combining frontier biology teams in SF, Chicago, and New York with a central AI team. They highlight their compute cluster expansion from ~1,000 to ~10,000 GPUs as the new "lab space," and emphasize their open-source philosophy: giving away tools that others won't build because the credit accrues to users, not builders. Chan adds that the user interfaces are intentionally designed for a low barrier to entry so immunologists can contribute to neurodegeneration research, biologists can engage with engineering, and interdisciplinary collaboration can flourish organically.

  • Zuckerberg argues AI is simultaneously the most over- and underestimated technology, and that domain-specific models paired with curated data outperform general approaches—a lesson they say carries over from industry video and language models
  • Three Biohub locations have distinct focuses: New York on cell engineering, Chicago on tissue/cell-cell communication, and San Francisco on deep imaging and transcriptomics, each co-located with partner universities
  • CZI's model is explicitly philanthropic and open-source—the credit is seeing scientists and startups actually use the tools, including pharma and a16z portfolio companies working on diseases like idiopathic pulmonary fibrosis
  • Zuckerberg believes AI advances could compress the end-of-century timeline for curing/preventing disease "significantly sooner," though he emphasizes the strategy is to empower the scientific community rather than CZI doing it alone

Source material

Transcript

This is a space that there's just going to be a huge amount of leverage with AI.

It still seems like there could be a lot more effort in this space around building tools, and it's kind of this crazy thing that we're here in 2025, and there's not the kind of periodic table of elements equivalent for biology.

We think that this is probably one of the most important sets of tools that you need to build.

When we first set out the goal to cure and prevent disease by the end of the century, people like, honestly, most scientists couldn't look at us with a straight face.

And because- They're like, "You're crazy!"

Yes!

And it was true because if you just decided to spend the money funding the next best grant for every single lab in the country, like, there was no pathway to that being true.

The biology folks, I think, looked at it as if it were crazy ambitious, and then the AI folks were like, "Well, that's kind of boring.

That's just automatically going to happen."

Yeah, I know.

There's something in between there that needs to be bridged.

The scientific community needs fundamentally new tools to cure disease, not just more funding.

For decades, biological research has been constrained by the same limitations.

Small grants that fund incremental progress, isolated labs working on narrow questions, and a lack of shared infrastructure to tackle the biggest challenges in medicine.

But what if we could change that?

Today, you'll hear from Priscilla Chan and Mark Zuckerberg on their 10-year journey building the infrastructure for modern biological research.

We discuss how they accidentally created the standard for biology data with their Cell Atlas project, cataloging millions of cells in an open source format.

We explore why they're betting on virtual cells that let scientists test high-risk hypotheses in silico before investing in extensive wet lab work.

And we dive into Biohub, their play to accelerate discovery by pairing frontier biology with frontier AI.

Hope you enjoy.

Mark, Priscilla, welcome to the a16z podcast.

Thanks for having us.

Yeah, great to be here.

Excited.

All right.

So excited to have you.

You're doing exciting stuff.

Yeah.

To that end, almost a decade ago, you guys started the Chan Zuckerberg Initiative with the mission and intent to cure, prevent, manage all disease by the end of this century.

There's a lot of missions that you guys could have poured your time and resources into.

Take us behind the conversations of why you guys picked this one.

Maybe Priscilla, why don't we start with you and your side of the story.

It always surprises people when I talk about how we work in basic science research.

I trained as a pediatrician and people always think, "Oh, it must be about medicine."

And for me, I went into medicine because I wanted to improve people's lives.

I wanted to make a difference.

I wanted to be able to help others.

And I think training as a pediatrician at UCSF, I met a lot of patients and frankly, like little kids and families, for which we just had no idea what the problem was.

And they might have a specific gene that they could name if they were lucky, or they could be grouped into a bunch of other diseases and there'd be a general sort of PDF they'd print out, like, "This is what we know."

And then it was my job as an intern or resident to try to translate a few lines of information into how we were supposed to take care of the patient.

And for me, that's when I really realized the power of basic science and how we need to work on basic science to advance the forefront of what's possible.

I think of it as the pipeline of hope.

Yeah.

And why did you think you could cure all disease?

Because that's like a very aggressive goal.

Do you want to answer that one?

Yeah.

Well, I mean, we're not going to cure all diseases, to be clear.

I mean, the strategy is to help scientists and the scientific community cure all diseases.

So the strategy is really one of accelerating the pace of basic science.

And the theory that we had was if you look at the history of science, most major breakthroughs are basically preceded by the invention of a new tool to observe phenomena in a new way.

So it's about things like the microscope, being able to observe bacteria or other fields, the telescope or, you know, but it's just to use an engineering example, without those kind of tools, it's kind of like you're coding without being able to step through the code and debug things, right?

That's like the old days.

Yeah.

So our whole approach on this is basically let's help build tools that will accelerate the pace of the whole field.

And I think that there's a niche that I think fits that because if you look at how funding works in science, the vast majority of funding comes from the government and NIH grants.

It's parceled out into these relatively small grants that allow individual investigators to investigate usually pretty near term things.

And the development of these kind of new types of tools, whether it's imaging or building now a lot of AI things like virtual cell models are longer term, oftentimes more expensive to develop.

So think about like, right on the order of maybe a hundred million to a billion dollars over a 10 to 15 year period, and then you try to unlock those tools and give them to the scientific community to accelerate the pace.

So that's kind of the theory.

Right.

And it seems like there's also something that is you don't really get credit for the tools in a lot of ways.

I mean, we have companies that use your tools and they're very happy about it, but I didn't even know that that was the case.

And so that's philanthropy.

Yeah.

Well, it is.

But most people do philanthropy to get credit too.

I mean, that's kind of a part of it.

So I guess did you think about that or were you just like, no, like this is going to work.

And if it works, that's all we need.

We're super focused on like actually making every scientist better and beyond science, like startups, startup founders, because the point is we can't do this alone.

And when we first set up the goal to cure and prevent disease by the end of the century, people like honestly, most scientists couldn't look at us with a straight face.

And they're like, you're crazy.

Yes.

And it was true because if you just decided to spend the money funding the next best grant for every single lab in the country, like you, there was no pathway to that being true.

But if you forced people to really think about this and like, okay, what is the most credible pathway to doing this and what are the barriers to that credible pathway, then we sort of got somewhere, right?

They were like, well, like, there's no shared tools or we're not working on big projects and building the right data sets.

And we're like, okay, well, then we can start doing something about that.

And so that's where the idea of building shared tools, because no one right now in the same.

That's so interesting.

So basically you're like, we're going to cure all disease.

And they're like, can't be done.

Why can't it be done?

Well, because we don't have the tools.

Okay.

That's pretty, that's a pretty cool sequence.

Yeah.

I mean, there's also this funny thing where the biology folks, I think, looked at it as if it were crazy ambitious.

And then the AI folks are like, well, that's kind of boring.

That's just automatically going to happen.

So I know that's like, there's something in between there that needs to be bridged.

And if you can like kind of use the kind of modern AI tools in order to build the types of tools that biologists need.

So that's a big part of how we think about our work is AI has got to be the most overestimated and underestimated technology ever.

Like simultaneously.

I mean, yeah, well, probably like the internet early on, but we kind of think about ourselves and the work that we're doing at the Biohub as frontier biology paired with frontier AI.

So there are labs that do frontier AI that basically, you know, are building the most advanced models.

And then there are lots of biological research organizations that effectively do very leading edge research to either build new datasets or look into certain challenges.

But so far there hasn't been anyone who's tried to do both of those at once.

And when you look at, I mean, even something like AlphaFold, which is amazing, right?

It was built off of this dataset that was a public dataset that had been produced decades ago.

And what I think you have the opportunity to do if you do both of those together is produce specific datasets for the purpose of training AI models to build virtual cells that can do specific things.

So I think that that's like a pretty interesting zone to be in.

And of all the things that we've worked on, you know, actually, when we started CZI, we kind of actually focused on a number of areas.

And what we found is just that the science research has had by far the biggest returns, so we just doubled down on it over and over and over until now we're at the point that we're 10 years in and Biohub is really the main focus of our philanthropy at this point.

But yeah, I mean, that's kind of, that's basically the focus.

Maybe you're not giving yourselves enough credit because you're sort of saying, well, there's bite-sized science.

We don't want to do that.

There's century-scale science.

And that seemed like a long time horizon, but achievable, ambitious.

But you've actually identified, which I think is really fantastic, grand scientific challenges that are right in between.

They're 10 to 15 year horizons, at least per kind of the way you communicate about them and the way you energize the scientific community about them.

10 to 15 is kind of an interesting time horizon, sort of like similar to the time horizon of a venture backed company, similar to the time horizon on which a team can work together for that period of time.

How did you get to that number?

And then how are you thinking about the challenges that you take on in each 10 to 15 year wave?

Because that's concrete, achievable.

You build a lot of credibility around it, the way that you've announced those challenges.

Well, I'm curious how you guys think about it.

But for us, when we looked at the grand challenges on the 10 to 15 year time horizon, it needs to be like, when you look at it, you're like, I see a path.

Not everything needs to be solved for us to take it on.

In fact, if everything's solved, then that feels like that should just go.

That wasn't ambitious enough.

Yeah.

Like we have some risk appetite, right?

So we want things where we're like, there's a credible pathway, someone who is at the helm who can do this.

And there's enough ambiguity where we feel like we could take on that risk.

And if we do it, the returns could be higher than even expected.

And the way we modeled that in the Biohubs is we have three Biohubs.

We have one in San Francisco, one in Chicago, one in New York.

The one in New York works on cell engineering.

Can we engineer cells to go in and detect signals, read it out, or to take certain actions?

In Chicago, we're building tissues and looking at cell communications within tissues.

And then in San Francisco, we're looking at deep imaging and transcriptomics.

And that work, the locations are not by accident.

We also look at the partner universities, because we have folks who come to the Biohubs to do this work, collaborative, interdisciplinary, and sort of unconstrained by the traditional lab.

But we also build off of the labs at these academic institutes that support the work.

And so that's how we sort of choose the grand challenge and the locations.

And then the sort of layering in the large language models and AI coming into the picture has been so interesting, because we were already building tools to measure interesting data, building the data sets.

But we didn't really know what to do with them yet.

And large language models coming onto the scene were like, wow, we can make sense of all of this now.

I'm curious what you view success as in the therapeutic realm.

So we think a lot about understanding biology, and sometimes we bet on startups that want to unlock completely new biological areas, diseases where we don't know what's going wrong.

And then there's another group of folks who kind of say, hey, okay, now that we understand what's going wrong, let's fix it.

Let's come in with a drug.

Let's come in with a new type of chemistry, a new type of antibody.

What do you think success for the CZ Biohub looks like 10, 20, 50 years from now in terms of the new medicines that you've enabled?

We want there to be like an explosion of a community who are building these, just the new wave of what it means to be deploying precision medicine.

I think for rare diseases and common diseases alike, you're really talking about individual biology that we sort of lump together.

We often don't know how it happens, right?

We know that you have this mutation or, the worst nightmare, you have a variant of unknown significance.

What does that even mean?

The horrible VUS.

Yes, horrible.

And you're like, you tell someone you kind of know something, but we don't know what it means.

But if you look at the way we've been able to look at variants and look at single cell transcriptomics, we're starting to be able to say, okay, this variant actually impacts this set of downstream cells.

And then we start looking at the proteins that get expressed and how it looks similar or different to what a healthy cell would look like.

Then you can start targeting, okay, like let's look at that as a target.

And you both know the specificity of the target you want to build based on the ability to connect mutation to protein expression, as well as being able to predict off-target effects.

What are the side effects?

Because you also know where else that drug will be able to interact with the body.

And so those are rare, but I really think most diseases should be thought of as rare diseases because each one of our biology is different.

And right now, we just get lumped, right?

We get lumped based on age, demographics, ancestry, if we're lucky to have that level of understanding.

But truly, each one of our biology is different and say, like, if you look at hypertension or depression, like we kind of just go by trial and error and saying, like, let's just try that drug and see what happens.

But what should really happen is being able to precisely and accurately and quickly treat people by looking at individuals' biology.

We want to enable the basic science, and we would be thrilled if people picked up the models that we build to be able to build the diagnostics, the therapeutics that need to come.

You've built amazing datasets.

I have to say, like, I mean, you may not hear the feedback from the startup community and the pharma community and the R&D community, but it's there because you've committed to open source.

And so people may not be -- they may not all be writing papers, but they are using those tools.

There's a startup in our portfolio working on idiopathic pulmonary fibrosis.

The name tells you how vexing the disease is.

It's idiopathic.

We don't know why it happens.

The IPF is named that way.

And so, you know, he was telling me that he used your Cell×Gene atlases to look at millions of single cells in patients with disease, without disease, try to pinpoint the fibroblasts, double-click on the fibroblasts and their gene expression and try to use that to inform, hey, where could I go after a new drug target in this disease that's fundamentally a strange clump of idiopathic origin.

So I think there's a huge group of innovators who love the tools, the visualizations, the query systems, and really the software approach that you built to making that data incredibly accessible.

Cell×Gene is like almost an accident, though.

Tell us more.

So do you want to share a little bit about Cell×Gene or do you want me to start?

Well, I mean, I don't know which part you want to get into, but I mean, but the cell atlas work overall, I mean, it's kind of this crazy thing that we're here in 2025 and there's not the kind of periodic table of elements equivalent for biology.

So that was sort of a lot of the inspiration of it was, all right, how do we both through work that we're going to do in the Biohub and through other grants be able to pull together and standardize a format where you can have all this data.

And when we were starting off, we didn't even necessarily have in mind that we were going to use that to build virtual cell models.

I think that that has sort of just come into focus as the AI work has advanced.

But that's a very exciting thing.

We should definitely spend a bunch of time on the virtual cell models, but I'm not sure what you wanted to get into on the cell atlas.

Well, the single cell work was one of our first RFAs when we started 10 years ago, and we were like, okay, we think this is possible.

We actually funded the methodology for it to standardize how it was going to be done.

So that was 10 years ago.

And then we seeded a few labs to start building out that dataset, but we were like, there are like millions or billions of different cell types and different permutations.

Like, how are we going to do this?

And especially with like a burgeoning technique.

And so we ended up seeding a few groups and they started doing work and then they told us they had a problem.

There was a bottleneck in their workflow because they couldn't annotate the data fast enough.

And so we built Cell×Gene, which was an annotation tool.

That's the original source of this.

So we built the annotation tool to make it easy for people who are doing single cell science to be able to annotate the data.

And then we put the data that we collected publicly so people could share.

But because everyone started using the same annotation tool, everyone was standardized then on the same data formats.

And then there started being a community around the tool and they wanted to share back and build the Atlas.

So now after 10 years, there are millions of cells that have been built into this shared resource for the entire scientific community.

We only funded about 75% of it.

Sorry, that's wrong.

We've only funded 25% of it.

75% came from the broader community saying this is useful and there's an easy way for us to standardize and build this problem.

They had the same metadata.

Yeah.

That's right.

It's like an interesting, what you'd call a network effect.

Yeah, I was going to say it sounds like the internet.

Yeah.

Come for the annotation, stay for the virtual cell model.

Well, it was very important when we were getting started with the work to have everyone who was doing it have a consistent format.

So that way it could be used and portable.

And then once that kind of took off as the way that would get done, then other people just found it valuable.

Yeah, and even relative to prior databases like GEO and whatnot, they're just simply not as standardized or QC'd.

Yeah.

Let's get into virtual cells.

One of the great challenges, the grand challenges you would focus on, maybe talk about what is the promise or the hope and maybe some of the challenges or where we're at with it.

Yeah.

I mean, we think that this is going to be one of the most important tools at this point is basically building up the kind of hierarchy from proteins to just different structures from the cell to a whole virtual immune system or different levels of hierarchy.

And we think this is going to end up being a very important set of tools for people to effectively generate hypotheses for different science work.

Even before you get to the point where you're really running full experiments in it, you can come up with some estimate of how that might run.

It will be useful for some of the precision medicine type examples that Priscilla was talking about a few minutes ago.

But we think that this is probably one of the most important sets of tools that you need to build.

And it's not a single thing.

So there's different angles to come at this from.

The cell atlas data is helpful for understanding things on a cellular level.

One of the most important things that we're doing right now: this great company, EvolutionaryScale, which actually had a bunch of researchers who had formerly worked at Meta on protein folding models, is joining the Biohub.

And Alex Rives, the leader of it, is actually going to be the head of the whole science program, which is actually interesting when you think about it, where it's like you have AI and biology coming together.

And really, it's like an AI person who understands biology is running it rather than a biologist who has some understanding of AI, I think just speaks a little bit to where we think the relative weight of these things is.

But we basically view, like Priscilla was saying with the different Biohubs, New York doing cellular engineering will basically make it so that you can have cells that can record different things that are going on around the body and share that data.

And then you can build that into models.

The Chicago Biohub being able to record inflammation and basically study that in order to help understand...

That's a different dataset.

We have the Imaging Institute, which is...

We just trained our first set of models around that, which are the first spatial models around understanding the way that cells look in different states.

And eventually, just like you have this analogy on the industry side or on language models where you have different capabilities, and then over time you train them into models and it gets more and more general.

That's the idea here.

So we'll build the Biohubs around grand biological challenges.

The Biohubs will build tools that will generate novel datasets.

We will build models based on those and then eventually combine the models into an increasingly general view of a virtual cell that will be useful both for scientists and hopefully startups and companies that are working on finding drugs, which is not our part of the whole thing, but I think is obviously a really important part of what needs to happen.

Yeah.

And you guys think about risk all the time in terms of when you make investments.

I think the promise of being able to do virtual biology using a virtual cell model is you can actually take on riskier ideas.

Right now, grant funding can be hard to come by and the wet lab work is expensive and slow and it's not just money, it's also time.

And so you have to choose something that you think is going to have some likelihood of success to keep your lab career going.

And so it naturally lends people to take on some risk, but not a lot of risk because they need to make sure that they are hitting a certain percentage of the time to make tenure or publish or whatever they need to do.

But if you had a virtual cell model where you could simulate really high quality biology, you could actually then start testing and tinkering on the computational side and ask riskier questions, things that would have been expensive and costly in terms of time and resources to do in the lab and actually see if there is promise doing the experiments in silico before you make the time and money investment in the wet lab.

Do you think of it kind of like a model organism?

Yeah.

Like it's the new fruit fly.

Yeah, I was going to ask, given the complexity of a cell, like how close, like how accurate do you think you'll get the model to?

I mean, just assuming, I mean, maybe you get it to like a perfectly accurate representation of a cell, but like how accurate does the virtual cell need to be to be useful?

I think it will obviously iterate and get better and better because right now we're still just talking about transcriptomics.

We're expanding into different ways of looking at the cell, but you get more and more accuracy.

But I don't think it needs to be 100% accurate to be useful because you just want to be able to de-risk the idea on the front end a little bit.

And the more and more you de-risk it, the more efficient it gets, obviously, but it'll be useful if you even get directional signal.

And yes, we do think about it like as a model organism, but in a way that's like has fidelity to the human body.

Like, you know, like I don't want to...

All models are wrong.

Some are useful.

Yeah.

Yes.

Hopefully has utility on certain axes.

Exactly.

And just like the language models, you build in specific capabilities.

So it's not just...

So for example, one of the models that we're publishing is VariantFormer, right?

It basically makes it so that it's trained on a bunch of effectively pairs.

If you have a cell, you apply CRISPR to it in a place, you see what comes out at the other side.

So it basically is able to make that kind of a prediction.

Like, okay, if you have this edit that you're doing to a cell, what is likely going to happen?

Another one of the models is it's this diffusion model.

Basically, you can describe a type of cell that you would like it to simulate, then it will just produce a kind of synthetic model of the cell.

Again, I mean, it's kind of interesting because to Priscilla's point before about how everyone is different and different cells have kind of...

You want to be able to simulate these kind of rare configurations.

Having at least a synthetic version of what that could look like is interesting.

And then you can test against that.

The cryo model, I think is interesting because it's spatial.

So it kind of gives you a sense of there are all these different models that you can have that allow you to basically look at different kinds of things.

Then you just train them in to be increasingly general over time.

Oh, yeah, they're very interesting.

And is the modeling technology basically LLMs or is there a reasoning model?

Is it like a just...

Oh, that's actually...

Yeah, no, that's a fascinating one too.

Because one of the new models...

I think this one is very early, but it's basically the first reasoning model over biology.

So the idea is that, yeah, you effectively have these models that kind of simulate world models in different ways and then you want it to be able to not just be able to spit out correlations in terms of what it's found, but actually be able to kind of reason through how things would evolve and why things would happen.

I know it's quite early, but it is interesting conceptually as what I think is clearly going to be an important direction in terms of how these models evolve.

Because that's what I was thinking, that if it doesn't work, the next question you have is why.

But I think what you find in reasoning...

Because I'm married to your hypothesis.

Sure, sure.

I mean, the...

Yeah, I thought you were saying if the reasoning model doesn't work, why?

I mean, I think the...

Yeah, well, yeah.

That's kind of...

Well, you're way in the making of this.

I mean, the language model analogy for that would be you need better kind of world models or better pre-trained models in order to get the reasoning to be good.

But it's...

Yeah, you just...

You build more capabilities into it.

And I think that there's probably an order too.

So a lot of the work that Alex and the EvolutionaryScale folks did is on proteins, which is interesting because that's at a kind of smaller resolution, obviously, than the cellular data, the cellophase.

But part of the hypothesis is that you can look at all these different cells and you can kind of simulate how they might behave, but you're going to have a somewhat shallow understanding, unless you actually have this hierarchical understanding of what...

How the sub-components of the cells are going to interact.

So our view is that you basically want to build up a state of the art protein model, and then have that be a part of the state of the art cellular model.

And then once you have that, you build things like the virtual immune system, which allows you to simulate much more complicated systems.

But it's sort of this hierarchical approach to building up these virtual models.

That makes a lot of sense because also as you get into personalization, you've got common proteins combining into a unique cell.

So that makes it like from a system standpoint, that makes it much more manageable.

That makes a lot of sense.

Interesting.

Yeah.

Yeah.

No, it's very fascinating stuff.

Yeah.

So you guys are announcing some big news this week.

Do you want to give us a sneak preview?

Well, the big news is thinking about how we are going to be coming together as one team.

And in the past, we've run Biohubs, and we've built software.

We've done some AI research.

But all of it has been a little bit decentralized.

But now, under Alex's leadership, we are going to come together as the Biohub, an operating philanthropy where we are doing the science in service of a singular goal together.

And how do we actually advance the state of biology and research at the intersection of AI and biology?

Amazing.

Alex is amazing.

Yeah, he's great.

And then the other thing is that the piece that I mentioned earlier, which is just, yeah, I mean, CZI has focused on a number of different things.

We've really just found over time that we feel like we've been able to make the biggest difference in science.

So we've just kept on doubling down on it.

And we're going to continue doing work in education.

We're going to continue supporting local communities and those different pieces.

But going forward, the Biohub is really going to be the main thrust of our philanthropy.

And we're very excited about that.

Because when we started, the mission was to see if we could help the scientific community cure and prevent diseases by the end of the century, and I do think, with the advances in AI, that should be possible to do significantly sooner.

And that is a very worthy and important and very exciting goal that we think we kind of have a unique place in the ecosystem that we can help empower others to make fast progress on that.

So there's obviously like plenty of advantages to decentralization from a management communication overhead and so forth.

And so like, what are you trying to add by adding this kind of new layer slash unification on top?

Like what are the outputs?

And then I guess what are the complexities to that?

Because that's, I'm sorry to ask a CEO question.

No, no, I mean, I'm like, for a friend.

We think about it.

You want to go for it?

And I can jump in.

Yeah, so there are obviously amazing groups doing frontier AI and a lot of groups doing great frontier biology.

And what we think we can do uniquely is actually tie these two together.

We've funded data sets, we've built data sets, and we're building the instrumentation now to be able to look at the cell, whether it's, you know, cell-to-cell communication at the tissue level, or our cryo-EM, where we can look at the cell at a nearly atomic level.

So we have the ability to not only build the data sets, but actually shape and form them the way we want, based on what we see as necessary to complement the existing body of knowledge.

And so we have amazing teams doing that work.

And we're building these AI models.

And so the reason to do it together is that then we can actually complete the flywheel: like, you know, the model is looking like it has some gaps and blind spots in this area.

Okay, who do we talk to?

How do we build the next data set?

And, you know, we're seeing this in the lab, like, the metadata is going to be so rich that we can feed back into the way that we do this modeling.

And so if we can close that loop, which is our goal in bringing everyone together, I think it's going to be incredibly powerful.

And it's more than just like, you know, writing down a spec and saying, like, please deliver this, like, these people need to be sort of working shoulder to shoulder and shaping each other's work for this to actually be the more and more accurate model of how the human cell works.

Well, you know, it's so interesting, because that's been the biggest surprise in the industry for us in the AI world, like, forget biology for one second: the domain-specific models have been super interesting.

Like, the original thesis was really that some AIs are just going to get so smart, they're going to be smarter than everybody at everything.

But like on video models, like every video model is best at something but not everything.

And so knowing what problem you're solving actually turns out to be sort of ironically very important in AI.

Because you can actually get to a way better result.

Yes, if you put the two together, like, yeah, we're seeing that over and over again, in a way that is very counterintuitive to the whole narrative kind of going into it.

And in biology, at least, you know, one assumption used to be that all the data sets aren't on the internet.

And so part of the reason you need a domain specific model is that the data sets are not public.

You guys are kind of bucking that trend too, by creating a lot of open source access to the data.

And then even then, it sounds like you're betting, you know, on the trend that we're seeing in other industries.

But still there will be nuance in how you annotate that data, curate that data.

Well, how you talk to a scientist, right?

Like, so you have to not only know the data and the model and so forth, but like the conversation is what we can find out ends up being very, very important.

So rich and so important how you actually...

A scientist isn't going to talk to it like, you know, I talk to ChatGPT or whatever.

This is the fruit fly you can talk to.

That's really, that's super exciting.

And then user interface is actually really important.

You talked about how you guys have a founder who's using Cell×Gene.

That user interface was intentionally designed to not need to have a computational or really a very deep biological background to be able to use because you want people coming from different fields to look at the problem.

It's like, look here, help us solve problems here.

And so building that user interface in a way where it's not a very high barrier to entry to be able to poke around and learn something and bring knowledge back to your work, that's intentional.

And we're really hoping, when we build these virtual models, that we get to a place where we can allow a lower and lower barrier to entry for people to say, like, you know, I have some knowledge about this, maybe I can contribute.

A very pertinent example is that it turns out, I think, immunology has a ton to do with neurodegeneration.

But it was like immunology is behind all this.

Everything.

So it might be part of your sensory vision.

So you need to be able to allow the immunologist to come in and understand neurodegeneration and understand how their world fits in.

And so the more you lower the barrier to entry, the more it allows people to actually think in a truly collaborative and interdisciplinary way.

So will the Biohub grow as a team?

Like, will you employ more people at the Biohub proper, or are you moving towards more of a network model with more sites, more labs, more community-driven data sets? Which is the thrust?

Maybe it's both.

Probably a little of both.

And we've added new Biohubs over time.

And then we're also building up more of this like central AI team.

But, I don't know, I think that these organizational questions of how you set this up are fascinating.

And a lot of our approach is sort of informed by what the rest of the field is doing.

Because you kind of think about science as it's this portfolio, right?

Society has a portfolio of stuff that it's trying to do.

And in terms of philanthropy, you want to be the most additive that you can be by trying to figure out what else is underrepresented.

So science by default is very decentralized, right?

It's like kind of the way that granting has worked, the way that I think scientists by default want to work.

So I think a lot of what we've found is that figuring out ways to encourage collaboration in ways that otherwise seem very simple, but weren't happening before, can unlock a lot of value.

So the very first Biohub, what we did, there were two kind of interesting things.

One was it was this collaboration between UCSF, Stanford and Berkeley.

And there are all these really smart people at all these different places who previously, I guess in theory, they could have figured out a way to work together, but there was not really a formal construct for them to do that.

And this just allowed a lot more collaboration.

The other one is cross discipline, basically having biologists sit next to engineers.

And this view that like these two disciplines are things that need to...

And I don't know, I'm sure you've seen this in a lot of the companies, but there's so many interesting...

The companies always set them apart.

Well, it's interesting how many organizational questions or problems you can fix just by having two teams sit together, right?

It doesn't matter what the org chart is or whatever.

It's like, you guys need to sit next to each other until you get this thing to work.

And that's something I really believe in.

And you have 10 to 15 years.

Well, no, it's all like communication is such an underrated problem in general, in all kinds of...

In building anything, or solving anything.

So that's pretty neat.

Yeah.

And it's just really kind of simple stuff, but I think it's sort of novel as a model.

And one of the things that's...

So we've now copied this from the first Biohub to the Biohub network and expanded it to other models, but it's also just been neat to see other folks who are working in the field also adopt similar models because it's a pretty intuitive thing.

But at some point, you'll reach the point where actually it's really good to have decentralized work too, right?

So it shouldn't be that...

We're not saying that this is the way that all science should work.

We're just saying that there's a space for this that can unlock a lot of value because for whatever reason hasn't been the default.

Yeah.

And we still rely on...

There's famous stories in the MIT lab about that.

That's how they invented lasers and so forth.

They put a bunch of people from different departments in the same space.

The media lab.

Yeah.

Well, actually physics is where we got a lot of the inspiration.

Physics has just historically been...

Labs have just rallied around big projects and big shared resources.

And we are relatively centralized, but we still depend on a lot of labs who are doing sort of exact frontier work or complementary work to come together to support those.

There's that.

But one more thought on your expansion question is like, and maybe this is like the modern AI lab.

We are not expanding a lot of square footage per se, but we're expanding our compute.

The researchers, they don't want employees working for them.

They don't want space.

Yeah.

They just want GPUs.

Yeah.

So it's just like in a sense, that's new lab space.

It's much more expensive than wet lab space.

And you guys have always been creative on that, even in the last few years, you've created ways to share access to compute, you've enabled academic labs to...

I forgot the name of your program.

Yeah.

Scientists in residence or something like that.

Yeah.

But your...

Yeah.

I think we...

The core of it is, if you look at individual labs, they'll have like a large lab would have tens of GPUs.

And we were the first to really build a large scale compute cluster.

A thousand GPUs now, and we have plans to move to the 10,000 range.

And that one requires a different type of project, obviously.

They're able to ask different types of questions.

And it's a resource that we use, but also we've invited scientists to apply and say, like, what question do you have that could use this amount of resource and be able to sort of see collaborations that way?

And so if a scientist is out there listening, like who's not employed by the Biohub or working at the Biohub but wants to collaborate with the Biohub, that you're going to create really interesting doors to utilize the resources.

That's awesome.

Yeah.

I mean, the GPUs were somewhat zero sum, right?

So the data isn't.

So yeah.

Yeah.

Fair enough.

Yeah.

So you're about to celebrate 10 years doing this.

As you look out in the years to come, what else can you tell us about either things that you're thinking about for the future or maybe even principles or a North Star that's going to guide how you guys grow and evolve going forward?

You know, it's been really interesting in the past 10 years, because I actually spent the first few years completely envious of people working for for-profit companies because there's so much clarity.

Like the market, whether it's private or public, will tell you if you're doing a good job.

If they think you're doing it.

If they think you're doing it.

They're not always real.

They're not always real.

No, it's a big difference.

But I was still envious because that was, I was like, I craved that feedback like, am I doing a good job?

And, you know, 10 years in, the reason why we're doubling down on biology is that not only did we achieve what we said we were going to do when we set out on these projects, they actually delivered more than we thought they were going to.

And I was like, okay, that's a signal I can latch on to.

And like, that's a signal like we can really continue doubling down and doing more of that.

And so I think it's continuing to tolerate the early ambiguity when you're like, okay, I'm going to do more of this.

And being patient, being willing to have a long time horizon but be impatient at the same time, because it's all those iterations along the way that have allowed us to get to this place where, you know, we got lucky and were ready, having built data sets to take advantage of AI and large language models.

That's because of all the work that we have been doing.

And so being able to continue moving forward in this ambiguity and sometimes lack of signal on a big goal.

Like, I think we sort of set the DNA for that.

Amazing.

Oh, no pun intended.

But we get to see how many people use the tools and the feedback.

Yeah.

Yeah.

Yeah.

You have customers, which is pretty cool.

Yeah.

Yeah.

For philanthropy.

Like that's awesome.

Yeah.

No, it's, it's one of the fun things about building tools is like, you kind of get to see how valuable do people find the tools?

Do people use the tools in order to publish important work?

Right.

Right.

Right.

Right.

Yeah.

And I probably, I mean, our feedback is, they're awesome.

And completely unique by the way.

So like, the other thing is like, what would you use if you didn't have this?

It's like, there's nothing.

Not yet.

It's a real kind of void.

I mean, there's this whole pipeline that needs to exist from accelerating basic science to funding a lot of people to use it to then you can get into the biotechs that basically can start to work on, on basically coming up with novel therapies.

And then you get the pharma companies that do them at scale.

And then there's a space for philanthropy on the other side of public health of basically taking the therapies and kind of bringing them out to everyone in the world.

But this is a space that, I mean, that there's just going to be a huge amount of leverage with AI.

And it is, yeah, it still seems like there could be a lot more effort in the space around building tools and just accelerate the whole thing a lot better.

Yeah.

And I do think it is the place where you are completely unique.

Right.

The other things, there are other people who can do that, but there's nobody doing this. It's got good founder-market fit.

Yes.

If we didn't exist, would it be a problem?

Yes.

Like those questions really land.

Yeah.

As a VC, one of us is an engineer.

The other one, the scientist doctor.

Yeah.

Very happy in this direction.

Yeah.

We thank you very much not only for our companies, but for us as humans for working on this.

It's amazing work.

Oh, thank you.

Thank you guys.

Thank you guys so much.

Thanks for listening to this episode of the a16z Podcast.

If you liked this episode, be sure to like, comment, subscribe, leave us a rating or review and share it with your friends and family.

For more episodes, go to YouTube, Apple Podcasts, and Spotify.

Follow us on X at @a16z, and subscribe to our Substack at a16z.substack.com.

Thanks again for listening and I'll see you in the next episode.

As a reminder, the content here is for informational purposes only, should not be taken as legal, business, tax, or investment advice, or be used to evaluate any investment or security, and is not directed at any investors or potential investors in any a16z fund.

Please note that a16z and its affiliates may also maintain investments in the companies discussed in this podcast.

For more details, including a link to our investments, please see a16z.com/disclosures.