
OpenAI Podcast · 2026-04-16
OpenAI Podcast Episode 16: Building AI for Life Sciences
Hosts: Andrew Mayne
Guests: Joy Jiao, Yunyun Wang
Why it matters
OpenAI has developed a new series of life sciences models focused on genomics, protein understanding, and early discovery use cases.
Key claims
- OpenAI has developed a new series of life sciences models focused on genomics, protein understanding, and early discovery use cases.
- Models assist with automating lab workflows, literature synthesis, data analysis, and experimental design, improving research speed and accuracy.
- Collaboration with Ginkgo Bioworks demonstrated AI's ability to design viable biological experiments, marking a milestone in AI-driven biology.
- OpenAI implements strict safeguards and differentiated access to mitigate dual-use risks, balancing capability with safety.
Briefing memo
Summary
In this episode, OpenAI's research lead Joy Jiao and product lead Yunyun Wang discuss the development and deployment of AI models tailored for life sciences. They highlight the creation of specialized biochemistry-focused models that assist with complex workflows in genomics, protein understanding, and early drug discovery. The conversation emphasizes the potential of AI to accelerate scientific research by automating repetitive tasks, enhancing data analysis, and enabling long-term, complex problem-solving through scalable compute and model orchestration.
The guests also address the dual-use risks associated with powerful AI in biology, such as the potential for misuse in creating harmful biological agents. OpenAI's approach includes rigorous safeguards, risk-averse deployment, and differentiated access controls to ensure responsible use by verified professionals. They share insights on current capabilities, including collaborations with labs like Ginkgo Bioworks, and envision a future where AI-driven autonomous labs accelerate drug discovery, personalized medicine, and biosecurity measures. The episode concludes with advice for researchers and students on adopting AI tools to enhance creativity and productivity in life sciences.
- AI models can predict chemical reaction outcomes and are progressing toward more complex biological predictions like toxicity and personalized medicine.
- The future vision includes autonomous AI-powered labs running continuous experiments to accelerate drug discovery and medical countermeasures.
- Adoption challenges include skepticism in parts of the scientific community, addressed by demonstrating utility through accessible platforms like ChatGPT and Codex.
- OpenAI emphasizes the importance of human oversight, collaboration, and incremental deployment to responsibly integrate AI into life sciences.
Source material
Transcript
Hello, I'm Andrew Mayne, and this is the OpenAI Podcast.
On today's episode, we're talking with research lead Joy Jiao and product lead Yunyun Wang about OpenAI for life sciences.
We'll explore what new models are making possible in biology and medicine, and what it takes to deploy the most advanced capabilities responsibly.
This allows it to reach new levels of difficulty and discovery that we didn't think were even possible before.
Putting really capable, expert-level knowledge in the hands of more people. One of the taglines was "scale test-time compute to cure all disease," so that is our team tagline.
We started off with just a basic API, and then we had ChatGPT, which is more conversational and was really good for text. As code became a capability, we went through code models, and then Codex.
Now that more scientists in the life sciences are working with these systems, do things have to evolve to fit the way researchers work with these tools?
Yeah, we're really excited to build and deploy the life sciences model series.
So this is a new, biochemistry-focused model series.
It's anchored on very complex life science research workflows, and we're focused on adding new mechanistic understanding, starting with genomics and protein understanding, and on early discovery use cases, because we feel that's one of the core bottlenecks where greater thinking time, greater compute, and more capable AI models can meaningfully help scale some of these research barriers.
There's also a model orchestration piece: how to actually embed this into workflows. It's been really great having all these different product surfaces to deploy to. We're seeing a lot of really great literature synthesis workflows happening on ChatGPT, and these models push the frontier of long-trajectory agentic workflows, which we're able to power on Codex.
More on the model orchestration piece: for enterprise use cases, there's a reproducibility and repeatability element, and we're trying to address it with the life sciences research plugins we're shipping for very specific translational bio users.
The life sciences research plugin has over 50 skills, which are essentially templatized, repeatable workflows. Whether you need to do some sort of cross-evidence match and search across many different papers, or pathway analysis, or anything repeatable that you do often, we can offer almost a one-click deploy option by using our life sciences plugin on top. That's also how we're balancing scaling for very specialized purposes, something we're hoping to get into, maybe clinical purposes, while keeping it very general-use for all of foundational biology.
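The episode describes the plugin's skills only as templatized, repeatable workflows. As a rough illustration of that idea (not the plugin's actual format or API, which the episode does not describe), a skill can be modeled as a named template plus its required parameters, so every run of, say, a pathway analysis issues the same request. `Skill`, `register`, `run`, and the `pathway_analysis` template are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Skill:
    """Hypothetical skill: a named, templated workflow with required parameters."""
    name: str
    template: str                 # str.format-style template for the request
    required: tuple[str, ...]     # parameter names the template needs

    def render(self, **params: str) -> str:
        missing = [k for k in self.required if k not in params]
        if missing:
            raise ValueError(f"missing parameters for {self.name}: {missing}")
        return self.template.format(**params)


REGISTRY: dict[str, Skill] = {}


def register(skill: Skill) -> None:
    REGISTRY[skill.name] = skill


def run(name: str, **params: str) -> str:
    # "One-click" invocation: look up the skill and render the same request every time.
    return REGISTRY[name].render(**params)


register(Skill(
    name="pathway_analysis",
    template="Run pathway enrichment on genes {genes} against {database}; report adjusted p-values.",
    required=("genes", "database"),
))

print(run("pathway_analysis", genes="TP53,BRCA1", database="KEGG"))
```

Keeping the template fixed and validating parameters up front is what makes repeated runs comparable; only the inputs change between invocations.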
I think the models can get quite far by using tools.
For example, we can use open-source protein structure prediction algorithms and set up a research stack, and in this case the model is acting like a regular computational biologist: you would go run these tools on a computer, look at the output, and tweak the input a little bit.
So I think that is something our models can already do.
I do think what will make the models even more powerful is to turn them into more of a biochemistry expert. With that kind of intuition and expertise, they can use these tools even more intelligently and get to the right answer more quickly.
How did you get interested in life sciences?
My original background was actually in life sciences.
I've been interested in biology since I was a kid, and I got my PhD in systems biology from Harvard around a decade ago.
I found academia very interesting, but the pace was a little slower than I would have liked.
I think it was the experience of having to physically be in the lab, transferring small amounts of liquid from one tube to another.
I wanted something a little faster-paced, where I felt more in direct control of my own velocity.
So I went from that to software and ended up here at OpenAI.
And so this is kind of a full-circle moment for me, where I'm starting to look at biology again and looking at how to accelerate my previous self with AI.
So yeah, I'm really excited to see what progress AI can make in this space.
So yeah: this is too slow, let me go off into AI and speed it up so I can get back into it.
Yeah, except I don't really ever want to touch a pipette again. So I would prefer for a robot to still do it for me.
Yeah, we joke about that a lot. A lot of our motivation for this is that we can automate pipetting and never have to do that again.
Well, that's what's interesting.
I was looking at what you all did with Ginkgo Bioworks: the idea of taking GPT-5, an AI system, working with a robotic lab, and how it was able to speed things up.
Could you tell us a little about that?
Yeah, the Ginkgo work is interesting because when it started, I think it was July of last year, 2025,
GPT-5 had just been trained.
We were really not sure if the models could do any kind of biology.
We didn't really have that much biology and our training data was mostly math and computer science, which I think makes sense because these things have verifiable solutions.
And this is usually not the case in biology.
Unless you go and do the experiment in the lab, right?
So when we started the collaboration with Ginkgo, it was really: can the model do any biology at all?
Can it design experiments that actually make the product that we want?
So I think it was actually quite surprising when GPT-5 designed the first set of experiments with Ginkgo and the results came back.
We made a non-zero amount of protein, and that was a real surprise.
And then progressing from that point in time, roughly six months ago, to now, where it feels quite obvious that our models can accelerate science, is really surprising.
And I think it really shows what's possible.
I think before the bio team conducted that experiment with Ginkgo, we really didn't know for ourselves.
And I always say we learn that for ourselves when we engage in these experiments, and we have a few more in the works with others.
And I think that is the type of acceleration that we're looking for.
Ingesting high-throughput experimental data is really difficult.
It's very compute-intensive.
And I think for a lot of these scientific workflows, the true bottleneck on the speed of scientific progress is almost a human bottleneck.
And I think the future that Joy and I see is one where it's no longer a human bottleneck, but rather a compute bottleneck.
And we're able to deploy many agents, doing parallel orchestration to divide and conquer all these tasks.
And the researcher can now spend their time analyzing and interpreting the most meaningful insights coming out of that.
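A minimal divide-and-conquer sketch of the parallel orchestration described above, assuming each agent can be treated as an independent function call. `run_agent` is a hypothetical stand-in for a model-backed agent, and dealing a list of papers round-robin into shards is just one simple way to split the work; threads are used because agent calls would typically be I/O-bound requests.

```python
from concurrent.futures import ThreadPoolExecutor


def split_task(items: list[str], n_shards: int) -> list[list[str]]:
    # Divide: deal work items round-robin into n_shards roughly equal shards.
    return [items[i::n_shards] for i in range(n_shards)]


def run_agent(shard: list[str]) -> dict[str, str]:
    # Hypothetical stand-in for one agent; a real system would call a model here.
    return {item: f"summary of {item}" for item in shard}


def orchestrate(items: list[str], n_agents: int = 4) -> dict[str, str]:
    shards = [s for s in split_task(items, n_agents) if s]  # drop empty shards
    merged: dict[str, str] = {}
    with ThreadPoolExecutor(max_workers=max(1, len(shards))) as pool:
        # Conquer: run shards concurrently, then merge partial results for review.
        for partial in pool.map(run_agent, shards):
            merged.update(partial)
    return merged


papers = [f"paper_{i}" for i in range(10)]
results = orchestrate(papers, n_agents=3)
print(len(results))  # 10
```

The merged dictionary is the hand-off point where, in the workflow described, the researcher takes over to interpret results.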
So Yunyun, how did you get into this?
Yeah, reflecting back, I've actually been working on biology research in some shape or form for the majority of my time here at OpenAI.
I first started out working on biorisk communications and a lot of our biodefense initiatives.
So coming to work on the life sciences research side now gives me an appreciation for how difficult this problem is, having tackled it from both sides.
And my initial entry point into wet-lab research was actually through a lot of infectious disease and biology work.
So I've always had an interest in biosecurity in that way.
So this feels like a really great moment to work on it, especially now that our models are getting more capable at beneficial uses and general life sciences.
How long has OpenAI been focused on life sciences?
Yeah, I would say it was really the way we designed our capability evals that showed us this is possible.
So it's been at least two years now that we've worked on a lot of our early research experiments, and now on the agentic, autonomous wet-lab and model-in-the-loop experiments.
I think we have a few more research partners in this space that we're really excited about.
I can't name everyone right now, but there's a lot of work in the chemical design and protein design space that I think is very AI-native, and a lot of people are interested in it.
So: understanding how the world works, understanding how chemicals react, understanding how cells interact, how pathways inside cells interact.
All the way to: can we accelerate drug discovery?
Given a disease, can we model how scientists understand the mechanism? Once we have a target, can we actually design a drug against that target? Can we even accelerate the FDA approval process?
So I think there's a role for AI to play at every step of this pipeline.
And yeah, I think AI makes a lot possible across all of it.
I've been to some of those cutting-edge labs, and from the outside you have this impression of them; then you walk in there and you literally see somebody with a row of Petri dishes and rows of samples, and just some grad student going click, click, click, and I'm like, oh, this is the pace of science.
This is fascinating.
But yeah, you get exactly that feeling: enough of this.
I've got to go speed this up.
But we forget that the pace of science is often just how fast human hands can move.
So a tool like that is kind of exciting.
When you start using these tools to think about new pathways for treatments, or just to evaluate them, you also introduce the idea that they could be used for things that are less desirable.
Bioweapons is something that comes up a lot.
The fact is, if an AI can figure out how to do a code exploit,
it might figure out how to do a gene exploit.
How are you addressing that?
Yeah, that's a great question.
And I think it is probably one of the most severe risks that we're currently tracking as AI capabilities rise.
Our first approach to that was really thinking about how we assess information hazards.
At what point does a model give the final step toward synthesizing a dangerous pathogen?
And what we found is that the precursor steps to that look very benign.
And they're really hard to distinguish.
Another way to put it: the steps that a beneficial, legitimate actor might take look very similar to the ones a malicious actor would start with.
So I still think we made the right call in taking a very risk-averse approach to that.
But now I'm really excited about differentiated access and responsible deployment
as a core pillar of all of our safeguards work,
and about really understanding that there are different user segments.
I almost feel like the future we're heading toward is something like models as professions, similar to how models have different personalities.
And sometimes you want to invoke the right one depending on the type of workflow you're looking at.
I think this translates to how biologists work on their therapeutics and their research.
They require access to datasets that are often very tightly controlled, or they require expert-level knowledge.
They all have PhDs and expert-level biology knowledge.
How does that translate over to models?
I think that's why we have to take a similar training approach,
but also a similar security approach, deploying in a way where we can have those heightened, enterprise-grade controls in place.
So you just mentioned safeguards. Can you explain how that applies here?
Where would you need them, and why would you need them?
Yeah, so we very thoughtfully designed and deployed new safeguards for pretty much all of our models across very different risk areas.
But when it came to bio, this was the first dual-use risk that is also a capability risk.
So as capabilities improve, the risk grows with them.
There was no precedent for a lot of this work, and we were the first to really activate these high safeguards
when we saw a significant reasoning jump in our model capabilities.
We really wanted to make sure that we did it right, and I think the best way to get it right is to deploy incrementally.
Yeah, I think it's really a fine line between having a very capable model that can accelerate benign, beneficial science, and a model that could be used by a bad actor.
And I think the safest model here would be a model that just had no capability, right?
And it's not very good.
Yeah, it's not very good, but it's very safe.
And on the other hand, if you had a model that is basically an oracle of the physical world, one that knows everything about every experiment, that model could fall into the wrong hands and do potentially very bad things, because someone could say, "Hey, design a novel pathogen with pandemic potential," and the model could just go and do that autonomously.
So I think we need to figure out where we draw the line between the two, and think about who gets access to a potentially very capable model and who doesn't.
And what we found in what we call general-access traffic is that it's very difficult to figure out a user's actual intentions just from reading a prompt.
As an example, let's say someone asks, "Hey, how do I clone a gene?"
The model might not even be told what the gene is, but it can come up with a protocol for it.
And this gene could be something like green fluorescent protein, or it could be a toxin.
And there's basically no way to figure that out from the context of the conversation.
And so this becomes a very difficult problem in production.
And basically, I think like you said, we decided to kind of err on the side of safety here.
And basically we say: okay, if we think there is a potential for misuse, we have the model refuse the user itself, where it tends to say things like, "Sorry, I can't really help you with that, but I can give you a high-level overview of this protocol instead."
And this unfortunately very much annoys our professional scientists, understandably.
And then we also have multiple layers of mitigation on top of that.
But to really unlock the full capabilities of our model, what we need is differentiated access.
What this means is that we know who the user actually is.
They are a professional working at a legitimate research institution or a pharma company.
And because of the regulations around these institutions, we know that, for example, all the reagents are being tracked and all the cells they're using are being tracked.
And so this gives us confidence that this is a legitimate user and not a random person in a basement doing who knows what.
And that allows us to give them more capabilities than we are able to provide to general-access traffic.
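Differentiated access, as described here, gates capability on who the user is rather than on the prompt alone. The sketch below is a hypothetical illustration of that policy shape, not OpenAI's implementation: the tier names, the `institution_verified` flag, and the request categories are all assumptions, and unknown categories fall back to the stricter tier.

```python
from dataclasses import dataclass
from enum import IntEnum


class Tier(IntEnum):
    GENERAL = 0
    VERIFIED_PROFESSIONAL = 1


@dataclass(frozen=True)
class User:
    user_id: str
    institution_verified: bool  # hypothetical: identity and institution checks passed


# Hypothetical policy: minimum tier allowed to receive a full, detailed answer.
MIN_TIER = {
    "literature_synthesis": Tier.GENERAL,
    "detailed_wet_lab_protocol": Tier.VERIFIED_PROFESSIONAL,
}


def tier_for(user: User) -> Tier:
    return Tier.VERIFIED_PROFESSIONAL if user.institution_verified else Tier.GENERAL


def response_mode(user: User, category: str) -> str:
    # Unknown request categories default to the stricter requirement.
    required = MIN_TIER.get(category, Tier.VERIFIED_PROFESSIONAL)
    if tier_for(user) >= required:
        return "full"
    # Mirrors the fallback described in the episode: a high-level overview instead.
    return "high_level_overview"


print(response_mode(User("u1", institution_verified=False), "detailed_wet_lab_protocol"))
```

The design choice worth noting is that the same request gets different treatment depending on verified identity, which is exactly the signal the guests say a prompt alone cannot provide.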
What can you do right now if you're working with the models in a laboratory? What would you say the capabilities are at this moment?
So I think people use the models for very different things.
I've talked to some lab scientists recently about how they've been using our models in Codex.
And sometimes it's as simple as, "Hey, can you write a spreadsheet for me?
I just want to minimize the number of pipetting steps that I have to make."
And this hits me very hard because I had done the same thing by hand in grad school.
So that's like a very simple mathematical software operation.
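The pipetting spreadsheet is a good example of that kind of simple mathematical operation. One common approach, assumed here since the episode doesn't say what the spreadsheet computed, is a master mix: instead of adding every reagent to every well, you pool the reagents once and dispense the mix. The transfer count below ignores tip changes, and the 10% overage default is a typical but arbitrary margin for pipetting loss.

```python
def pipetting_steps(n_samples: int, n_reagents: int) -> dict[str, int]:
    """Count liquid transfers for two ways of setting up identical reactions."""
    # Per-well: every reagent is pipetted separately into every sample well.
    per_well = n_samples * n_reagents
    # Master mix: each reagent once into a shared tube, then one transfer per well.
    master_mix = n_reagents + n_samples
    return {"per_well": per_well, "master_mix": master_mix}


def master_mix_volumes(per_reaction_ul: dict[str, float], n_samples: int,
                       overage: float = 0.10) -> dict[str, float]:
    """Scale per-reaction volumes (in microliters) up to a master mix with extra margin."""
    factor = n_samples * (1 + overage)
    return {reagent: round(vol * factor, 2) for reagent, vol in per_reaction_ul.items()}


print(pipetting_steps(96, 3))  # {'per_well': 288, 'master_mix': 99}
print(master_mix_volumes({"buffer": 5.0, "enzyme": 0.5}, n_samples=10))  # {'buffer': 55.0, 'enzyme': 5.5}
```

For a 96-well plate with three reagents, pooling cuts the transfers from 288 to 99.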
And then there's much harder tasks.
"Can you design an enzyme for me with all of these biological design tools?"
So I think there's a huge range of sophistication.
And something I'm very excited about is how we can use our models as a more powerful discriminator, really testing and assessing new, novel ideas.
And something I've been noticing as a trend with a lot of our research partners, and also the users of our models, is that scientific research tasks almost require a different persona or a different prompting style.
I often feel that a model that is much more scrutinizing, and adept at judging ideas, is very similar to how human scientists assess originality and feasibility.
It's really about helping understand, out of all the new papers and publications out there pushing the frontier on all these hypotheses, which ones are really feasible and valid for testing, and which ones are going to lead to new breakthroughs.
Then translate this to something like disease target screening and selection: the potential drug targets are endless, but it's really about narrowing down the aperture, and that's where the assistance comes in. This is extremely difficult work to do at scale, and having a model that can empower and accelerate that process
is, I think, one of the immediate impacts we're hoping to see by responsibly deploying this model to those users.
It seems like it's a very interesting trajectory.
You went from GPT-3 on the API, to GPT-3.5, then ChatGPT, and now we have ChatGPT apps and Codex, and it sounds like the number of things you can do with this just continues to grow.
How do you see this building? Do you see it becoming a complete infrastructure for every kind of inquiry you might want to pursue?
Yeah, I think the dream is to have a lot of the basic foundations of scientific workflows happen on Codex, and the goal is for Codex to be able to do pretty much everything that is possible to do on a computer.
Of course, we also want to extend beyond that by hooking it up to robotics and so forth.
But right now we can already do things like this: if we have a bunch of different datasets, whether on a remote machine or on our laptop, we can say, "Hey Codex, go and run this code on all of these different datasets," and Codex can do that.
I can say, "Monitor this for me," go away and do something else, and Codex is there watching all the logs for you.
It can build a lot of fit-for-purpose software for analyzing specific pieces of data or for visualizing data, for example experimental biology data that we're sending each other on the team.
What I've noticed recently is that instead of sending raw data, we've started sending HTML files, these beautiful UIs Codex has built with spinning proteins, and it actually changes the way we share with each other and collaborate.
Yeah, when we first started mapping out how users and organizations might adopt this, we envisioned that each scientist would get their own personal assistant or coworker, as a way to help scale their collective output. The next paradigm would be standing up whole research institutions, where a whole program team can deploy a workforce of agents doing parallel task delegation, mimicking a lot of existing patterns, and we can figure out how they can all work together to solve larger and larger tasks.
It's interesting, because OpenAI has talked about the need for compute, and sometimes we just think of that as "so I can have more conversations," but when you're talking about building these tools into entire platforms for scientific exploration, it sounds like the compute advantage is really critical.
Yeah, I think there are two axes we can think about for how we're scaling compute.
The one that everyone's familiar with is just getting bigger models. As we went from GPT-2 to GPT-3, there was a huge size increase, and there were these amazing emergent properties from the model. When GPT-2 was released, we were all collectively amazed that it was able to write a coherent article about unicorns, and now we're in a completely different world. A lot of that is driven by model architecture, yes, but the number of parameters in the model also allows it to achieve this incredible intelligence that we never thought was possible before. On the other axis, we have what we call test-time compute scaling: this is when you are running inference on a model and it's generating tokens. This happened fairly recently with what we call reasoning models, which can think for a scalable amount of time that varies depending on how difficult the model thinks a problem is. We can have the model think for days, and we're really looking for ways to have it think forever about a problem. This allows it to reach new levels of difficulty and discovery that we didn't think were even possible before.
When we think about data centers, we often just think about them generating cat pictures or doing text conversations, but I think the really helpful framework is that these are going to be systems doing extremely long-term, big, complex processes of thinking. To me it makes a lot more sense when you hear about projects like Stargate: we're going to be building a lot of compute, and it's not just for what we're doing now, it's going to be for things like that.
When we first announced the team's formation on Slack, one of the taglines was "scale test-time compute to cure all disease," so that is our team tagline, our team motto. It's ambitious. Yeah.
I had a friend whose child was born with one of those orphan diseases, and she would do fundraisers and everything she could to support researchers trying to find a cure, but there's just not enough time and not enough people. I'm hopeful that we're now in an age where these kinds of tools are going to make that a thing of the past.
Yeah, I think we're already seeing the model help a lot in these cases, in things like drug repurposing. For example, take a drug that's already been cleared by the FDA for use in a different indication: from a mechanistic understanding of how that drug works, the model has suggested in many cases that maybe you can use this drug to temporarily ameliorate symptoms.
We're also seeing a lot of advances in personalized medicine. For example, the design of ASOs, which are RNA-based treatments, is very common, and I think we are actually very close to being able to scale this up in a really vast way with AI. In the next year or two, I think we'll see very big changes here.
Every researcher I know, when you ask them what they could use in their lab, always says more hands, more people doing this kind of work. You hear some people ask whether AI is going to displace that, and I think no: it sounds like it's just a big accelerator for all the things that could be done.
Yeah, I completely agree. When you think about lab automation, for example, a lot of the bottleneck comes from actually being able to translate a protocol into something that can run on the platform, and we have partners telling us how Codex has been helping them do this. This is fundamentally a coding problem plus an understanding of how the wet lab works. Then, thinking about the data analysis piece, our models can walk a user who maybe doesn't have the deepest understanding of statistics through the analysis, so they can still rigorously analyze the data that's coming in. The model can help them probe different hypotheses, suggest different statistical tests, and point out potential issues and biases in the data. These are all ways of uplifting individual scientists and helping them do better science, but I don't think we can ever fully replace the scientist in the loop.
So you've been putting it into the lab, and you've figured out how to help with automation.
Where do you think we're going to be six months from now, 12 months from now?
Well, I would really love to get to the point where we can say that AI has designed a new drug or cured a disease.
I don't know if that can happen in the next six months, but I hope it happens in the next few years. I think we're seeing signs of this happening all over different stages of the pipeline.
Obviously, earlier in the drug discovery process, where you're looking at literature synthesis or the model is discovering new biology, for that to become a new drug on the market is going to be a very long process, possibly a decade. But I think there are ways we can really speed up this process, starting at maybe the clinical trial stage, a little bit before then in the safety reviews, or in the drug design phase.
So that's what I'm most excited about coming up in the next few years.
Yeah, for me, I'm most excited about all the things our users and scientists can do on our platforms.
For one, I think a huge win would be if a researcher could patent a new finding or a new discovery made on our platform using our models, and that's why we really focused on early discovery, starting with teaching the models mechanistic understanding.
So again, this is about providing the most powerful tools through our life sciences models to these scientists so they can really accelerate the speed of their research.
Do you think we'll get to a point where the models are really good at predicting the cell, or predicting the outcome?
I think definitely yes, though it depends on the complexity of the system.
For example, one thing our models are already very good at is predicting the outcome of a chemical reaction. As you increase in biochemical and biological complexity, some of the hardest things to predict are things like: given a drug, will it be toxic to a specific person or a specific system? We want to slowly work our way up to that, but it's definitely on our roadmap as something we want to do eventually.
When we're looking at models that do things like language or math, it's pretty easy to put together an eval for it.
Did it get the problem right or get it wrong?
What do evals look like for models that are doing biology?
Yeah, we have various different ways of evaluating model performance.
A really nice way to do this is with experimental data: someone has already done the experiment, and then you ask the model, can it predict the outcome of these experiments?
A lot of the virtual cell work basically looks like this: someone has done single-cell RNA-seq on millions of different cells, you feed this to a model, and then you try to get it to predict the effects of unseen perturbations.
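A held-out-perturbation evaluation of this kind can be sketched as follows, assuming each perturbation is summarized as a vector of mean expression changes per gene and predictions are scored by Pearson correlation. Real virtual-cell benchmarks use a variety of metrics; the function names, the toy data, and the trivial mean baseline here are hypothetical.

```python
import math
import random
from typing import Callable

Effects = dict[str, list[float]]  # perturbation name -> mean expression change per gene


def pearson(x: list[float], y: list[float]) -> float:
    # Assumes neither vector is constant (otherwise the denominator is zero).
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def split_by_perturbation(effects: Effects, test_frac: float, seed: int = 0):
    # Hold out whole perturbations so none of their data is seen during fitting.
    names = sorted(effects)
    random.Random(seed).shuffle(names)
    n_test = max(1, int(len(names) * test_frac))
    return names[n_test:], names[:n_test]  # (train, test)


def mean_baseline(effects: Effects, train: list[str]) -> list[float]:
    # Trivial baseline: predict the average training effect for every gene.
    n_genes = len(effects[train[0]])
    return [sum(effects[p][g] for p in train) / len(train) for g in range(n_genes)]


def evaluate(effects: Effects, predict: Callable[[str], list[float]],
             test: list[str]) -> dict[str, float]:
    # Score each held-out perturbation by correlating predicted and observed effects.
    return {p: pearson(predict(p), effects[p]) for p in test}


effects = {"KO_A": [1.0, 2.0, 3.0], "KO_B": [2.0, 4.0, 6.0], "KO_C": [0.5, 1.0, 1.5]}
train, test = split_by_perturbation(effects, test_frac=0.34)
baseline = mean_baseline(effects, train)
print(evaluate(effects, lambda p: baseline, test))
```

Splitting by perturbation rather than by cell is the point of the design: it tests generalization to interventions the model has never seen, which is what "unseen perturbation" means above.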
We can also do a lot with synthetic data. This means you generate a set of data and put very specific characteristics in it that could be a focus for the model, things a typical computational biologist might encounter day to day: some weird bias in the data, some QC step you have to do, or a statistical correction. Because we generated the data ourselves, we can actually test the model's capability as a computational biologist:
can it catch all of these different mistakes?
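The synthetic-data idea can be illustrated by generating data with a known, injected flaw and checking whether an analysis catches it. This sketch assumes a simple additive batch effect and uses a plain batch-mean comparison as the check a careful analyst should make; `make_synthetic`, `detects_batch_effect`, and the threshold are hypothetical, and a real eval would score the model's own analysis rather than this fixed heuristic.

```python
import random
import statistics


def make_synthetic(n_per_group: int = 50, batch_shift: float = 2.0, seed: int = 0) -> list[dict]:
    # Balanced design: two batches x two conditions. Batch "B" gets a known
    # additive shift, so the ground truth (a batch effect exists) is known in advance.
    rng = random.Random(seed)
    rows = []
    for batch in ("A", "B"):
        shift = batch_shift if batch == "B" else 0.0
        for condition in ("control", "treated"):
            for _ in range(n_per_group):
                rows.append({"batch": batch, "condition": condition,
                             "value": rng.gauss(0.0, 1.0) + shift})
    return rows


def detects_batch_effect(rows: list[dict], threshold: float = 1.0) -> bool:
    # The check a careful analyst should make before comparing conditions:
    # do samples differ systematically by batch?
    mean_a = statistics.mean(r["value"] for r in rows if r["batch"] == "A")
    mean_b = statistics.mean(r["value"] for r in rows if r["batch"] == "B")
    return abs(mean_b - mean_a) > threshold


print(detects_batch_effect(make_synthetic()))
print(detects_batch_effect(make_synthetic(batch_shift=0.0)))
```

Because the injected shift is known, the same generator can produce both a "flawed" and a "clean" dataset, giving a positive and a negative control for the evaluation.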
So there's a lot of different ways to be creative with evaluation but that being said I think voila this is still kind of the final like real evaluation of the model right and as you like to say nothing in biology is really real and how you can prove it in the real world and so we do have a lot of research collaborations where we try to do just that.
Yeah, evals have really become more complex and sophisticated over time, and I think that's especially true for designing evals that capture both value creation and solving complex problems in the life sciences. We really try to focus on examples that are not toy problems but that capture, for example, the messiness of pre-processed data. When we design these new evaluations, a starting point is often just trying to recreate an existing experiment, something that already has a baseline, so we already know what the current state of the art or the current ground truth looks like. One evaluation I'm really excited about is looking at whether our models can assess antibody binding predictions, and how that's been done for existing virus variants. Once we've established that baseline, we can push forward and ask: can we do this for something that hasn't been done before? I think those are some of the precursor steps to de novo antibody design, maybe expanding neutralization to new viral variants, and that's also on the path to new treatments and potentially new vaccines.
What has been the reception in the life sciences, particularly at conferences and in the community? Have you seen a lot of willingness to embrace this, or skepticism, or people who just don't think it's helpful?
I think it probably depends on what part of the country you're in.
I feel like, being on the West Coast, everyone is pretty AI-pilled, and so they really embrace these AI scientists and agentic workflows, and they really see the future for AI.
When I'm at a conference on the East Coast, this changes a lot. People are generally a bit more skeptical; maybe there's a little more doubt around AI capabilities. I think it's maybe just a cultural difference: most of the big AI labs are here, so we have firsthand experience of what the models are capable of, and that changes our perspective a little bit.
How do you bridge that gap? How do you get more scientists to understand? It sounds like the more people contributing, the better, because there are weaknesses or areas that need to be improved upon, and the more you get people who are maybe skeptical to figure out how to participate, the better.
Yeah, I think there are a few different ways. The easiest is by launching our models through our platforms like ChatGPT or Codex, and just by showing individual scientists how useful this can be, maybe just making a serial dilution spreadsheet for someone who's pipetting. That has real value, right? And you can build up from there.
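To make the serial dilution example concrete, here is a minimal Python sketch of the kind of table such a spreadsheet would hold, assuming a constant dilution factor and a fixed transfer volume per step. The function and field names are hypothetical illustrations, not output from ChatGPT or Codex. The key relation is that each tube receives a transfer volume T plus diluent D, with (T + D) / T equal to the dilution factor, so D = T × (factor − 1).

```python
import csv
import sys
from dataclasses import dataclass


@dataclass
class DilutionStep:
    tube: int
    concentration: float  # same units as the stock concentration
    transfer_ul: float    # volume carried in from the previous tube (or stock)
    diluent_ul: float     # diluent pre-loaded in this tube


def serial_dilution(stock_conc, factor, n_tubes, transfer_ul):
    """Return one row per tube; each tube is `factor`-fold more dilute than the last."""
    if factor <= 1:
        raise ValueError("dilution factor must be > 1")
    diluent_ul = transfer_ul * (factor - 1)  # ensures (T + D) / T == factor
    steps = []
    conc = stock_conc
    for tube in range(1, n_tubes + 1):
        conc = conc / factor
        steps.append(DilutionStep(tube, conc, transfer_ul, diluent_ul))
    return steps


table = serial_dilution(stock_conc=100.0, factor=2, n_tubes=5, transfer_ul=50.0)

# Write the table as CSV so it can be opened directly in a spreadsheet.
writer = csv.writer(sys.stdout)
writer.writerow(["tube", "concentration", "transfer_ul", "diluent_ul"])
for s in table:
    writer.writerow([s.tube, s.concentration, s.transfer_ul, s.diluent_ul])
```

With this scheme every tube except the last ends up holding `diluent_ul` after its transfer out, while the last tube keeps T + D; adjust the volumes if equal final volumes are required.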
Coming from the other end, we do have these deeper research collaborations with labs, for example on antibody design or enzyme design, and these result in publications. People read them and say, okay, an AI system did a lot of the work, it has biological novelty, and it has been proven out in the wet lab, and I think that also lends credibility to the system.
Yeah, I think the simple answer is you show by doing, and you show by publishing and engaging with the scientific community. I think the skepticism is really healthy and should be welcomed. It's really great to see people get excited, and also try to disprove it, because the potential for this technology is so great if we get it right and can really leverage its full capabilities. So the carefulness about how we actually make this work for real problems is very much warranted. When we publish, I think that also shows the need for more rigorous evaluations that represent these life science workflows and research problems, so people can look at them and say, yes, now I have a hundred different ideas for how I can implement this in my lab and solve some of the current bottlenecks I'm facing.
I actually think there's a certain amount of stress I've been hearing from people who are worried that AI is really powerful but they don't know how to use it the right way. So there is this general feeling of "I need more AI in my workflow, in my life," but they don't know where AI should come in. A key part of the product vision is to make it so simple that it just works: you can go to something like Codex and say, "hey, I want to do whatever I'm doing today," and Codex can figure out all the different pieces, the multi-agent workflows, the tool calling, all of that. So basically you don't have to stress about how to get AI into your workflow; it just happens naturally.
We do see those step changes: every time these models become smarter and understand users better, you get more utility, because people go, "I don't have to spend a lot of time trying to prompt it or figure out all the tricks to it."
If you're talking to somebody who is considering getting into the life sciences, maybe a high schooler right now, what advice would you give them?
When I was in high school, I did the USA Biology Olympiad, and out of all the different Olympiads, biology was seen as the most memorization-heavy one, versus math, where it's more like test-time compute scaling, whereas biology is more memory and retrieval.
My hope is that, with AI having learned all the relationships between all the different pieces of research, it can really uplift human creativity, make the process less about memorization and more about helping people connect different fields of research, and push forward the frontiers of what people are able to explore in biology.
So my advice to a high school student would be that maybe you don't have to go and memorize all the biology books; you should just do more exploration with AI. You can read papers and ask questions, and you can do both deeper dives and broader overviews this way. The way of learning really changes.
I found that when I was in the lab, there was a real solo, individual aspect to doing biology research, compared to, for example, my first CS hackathon, where there was excitement about the collaborative nature of building our app together.
So I feel like that's really the future I hope to see for early adopters and students using our models, because there is a collaborative nature to it too.
For example, sharing your scripts or your conversations, or maybe one day we'll all have our own co-scientist or agent, and we can deploy our agent to work with a teammate. In that way there are new interactions and new modalities for us. I would just encourage students to adopt early and to pioneer their own path for how they would like to use it.
For me personally, I always tell people I felt like I got into the wet lab a little too early, and, like we mentioned earlier, I did not enjoy pipetting.
That's a little theme here.
There are a lot of very intense manual tasks involved, so I hope that when our AI models can connect with physical devices, we can make the learning curve more fun for students, so they can learn with the models and maximize their time on the really interesting interactions.
So I've been working with a student; I like to help students come up with projects. In one of them, he's taken Codex, connected it to a greenhouse, and is basically using it to get photos back, look at them, and evaluate them.
It's been fun to see how he's taking both AI technology and something traditional like a greenhouse, combining them, and basically building up the skill set of learning how to use the two together.
When you talk to your peers, people who are running labs or running experiments, researchers, what advice do you have for them? The problem I see is that a lot of them go, "That's great, I just don't have the time," but what we're trying to do is save them time. So do you have any quick advice you give them, or any ways you try to inspire them?
Most people I know in academia use AI, and I've seen two main ways. One is to talk to AI about an existing research paper, just to make sure you're understanding things the right way, or for fact-checking. This is personally what I really like to use AI for, because you can ask really dumb questions without feeling any judgment; it's just really wonderful for learning. The other is that people use it a lot for analyzing experimental results, and this comes back to the statistics piece I mentioned before: sometimes you don't know what the right way to analyze your data is, or there are so many interdisciplinary fields that your data might touch on, something in chemistry or in some random niche field of protein biology. The really nice thing is that the model can pull in those different ways of analyzing data for you and explore all of these different paths.
I feel like both of those are pretty low-lift ways to try things out: you could just throw a PDF at AI and say, "hey, help me understand this paper," and have a natural conversation, or you can boot up Codex and do some data analysis right there on your laptop.
Yeah, I would say you have to start by making sure it doesn't feel like work right away. So maybe it's easier, when you're focusing on adoption, to work on a hobby project or a passion project. For me, for example, before I started working on more literature synthesis tasks, I was doing creative writing projects, which were not at all related to our day-to-day. Even though biology is a creative space, I was exploring that through a different medium, and I think that's actually when I started unlocking a lot of different ways to prompt the model or to access different data sources. That gave me a lot of pattern-matching ability for when I was trying to apply it, because we're not going to get it right on the first try, and it is really hard. The progress and pace of this field move so fast that every week or month there's a pretty exciting new development that might change how we engage with models or AI systems, so I think it's just important to get started somewhere. Another theme is the collaboration element: I feel like it's more powerful when you have a recommendation from somebody on your direct team who is doing the same day-to-day tasks as you. That happens a lot on our team as well, where somebody will say, "oh, I got Codex to touch these three different internal databases that we weren't able to connect before." The latent capabilities are just so vast that there's a lot we don't know until we actually try it. So having conversations with your friends, your lab mates, and your teammates will spark a lot of those ideas and creative juices and help you with your own adoption.
What does science look like 10 years from now?
When we started this team, we had really ambitious targets, and one of them is that we want to make meaningful strides toward, or even assist with, curing a disease. There are so many rare and orphan diseases that don't get the attention and resources they warrant, because it's such a difficult field; clinical research, for example, makes it so difficult to actually bring treatments to patients and to market. So while 10 years feels like a really long time, I'm really excited about the progress we can make, and I'm cautiously optimistic that we're going to see some of those breakthroughs pretty soon.
Yeah, maybe this is a bit of a sci-fi vision of the world that I really hope becomes reality: you have these autonomous labs that are mostly robots, all hooked up to AI, and you have autonomous research institutes that are constantly running, curing human diseases, maybe making new materials, making new drugs, maybe solving personalized medicine.
There are a lot of n-of-1 or ultra-rare diseases that people without vast financial and scientific resources can't even begin to think about, but we can solve that with AI, and I think the system can almost break through those financial and regulatory constraints.
So I think that's kind of the dream, and separately, I also think a lot about the biosecurity side of things.
These systems can be constantly sampling our environment: sampling wastewater, sampling the air, constantly detecting potential threats, or even just making better predictions for the flu and getting better flu vaccines.
More generally, I think these different medical countermeasures should be happening autonomously in 10 years, and that's something I'm really excited about.
The AI lab is exciting because, if people really understand what it means, it's not that there aren't scientists; there are more scientists, but they sit at home, go into Codex, and say, "can you go run this experiment for me?" Like you have a data center, you have a science center doing that.
Right, exactly. I didn't talk about the scientists in the vision I was just describing, but obviously there are people involved, and I think it's really high-level direction setting from the humans, saying: here's a patient with this disease, here are some potential solutions or things you could look at. The AI can then go off, explore the possibilities, design experiments, and come back to the humans and say, here's what I found, what do you think we should do next? This can be a kind of academic discussion. It's a little similar to the way people interact with Codex today, where you say, go write a function or go write a piece of code, it writes it and says, here's the code, and then the person tells it the next thing to do. So it's a similar kind of interaction, but on a much grander scale and a much longer time horizon.
I think it's really the democratizing-science aspect: putting really capable, expert-level knowledge in the hands of a greater number of people, and what that can mean for personalized medicine and for bolstering our societal defenses. There are so many naturally occurring new variants every year, new influenza strains, so I think it's really about securing our defenses and feeling like we actually have more agency to counter all that. I'm really excited about a lot of the medical countermeasure acceleration work as well.
Well, it's very exciting. Thank you for sharing this with us.
Thank you for having us.
Yeah thank you so much.