OpenAI Podcast · 2026-03-16

OpenAI on Building AI for Better Healthcare

Hosts: Andrew Mayne

Guests: Nate Gross, Karan Singhal

healthcare AI · model training · data privacy · clinical deployment · patient empowerment · electronic health records · AI safety · multimodal data integration


Why it matters

OpenAI collaborates with 250+ physicians to create and evaluate healthcare AI models using 48,500 rubric criteria.

Key claims

OpenAI collaborates with 250+ physicians to create and evaluate healthcare AI models using 48,500 rubric criteria.
ChatGPT for Health ensures data security by not training on user health data and supports personalized, context-aware interactions.
Models are trained to recognize uncertainty, escalate appropriately, and tailor responses based on user literacy and professional background.
Real-world clinical deployment in Nairobi showed statistically significant reductions in diagnostic and treatment errors.

Episode summary

Summary

In this episode of the OpenAI Podcast, Dr. Nate Gross and Karan Singhal discuss OpenAI's focused efforts on integrating AI into healthcare to improve outcomes for patients and clinicians. They highlight the collaborative approach with over 250 physicians to develop and rigorously evaluate AI models tailored for healthcare, emphasizing safety, context-awareness, and personalized responses. The conversation covers the deployment of ChatGPT for Health, designed to securely handle sensitive medical data while empowering users with contextualized AI interactions.

The guests also address the challenges of healthcare fragmentation, the importance of trust and up-to-date medical knowledge in AI responses, and the integration of diverse health data sources including wearables and electronic health records. They share insights from real-world deployments, such as a clinical copilot study in Nairobi that demonstrated significant reductions in diagnostic and treatment errors. The episode underscores OpenAI's mission to raise the floor, sweep the floor, and raise the ceiling in healthcare AI, aiming to make AI a protective and indispensable tool for both patients and healthcare professionals.

OpenAI collaborates with 250+ physicians to create and evaluate healthcare AI models using 48,500 rubric criteria.
ChatGPT for Health ensures data security by not training on user health data and supports personalized, context-aware interactions.
Models are trained to recognize uncertainty, escalate appropriately, and tailor responses based on user literacy and professional background.
Real-world clinical deployment in Nairobi showed statistically significant reductions in diagnostic and treatment errors.
Integration with electronic health records, wearables, and national standards is key to providing comprehensive, personalized care.
OpenAI focuses on raising accessibility, reducing clinician administrative burden, and enhancing the impact of AI in healthcare.
Collaborative partnerships with healthcare systems and regulatory bodies are essential for adoption and trust.
Future AI capabilities include leveraging multimodal data and extended context to provide novel predictive insights.

Source material

Transcript

Hello, I'm Andrew Mayne, and this is the OpenAI Podcast.

Today, we're talking to Dr. Nate Gross, head of health, and Karan Singhal, who leads health AI research at OpenAI.

We'll cover what went into training models to handle sensitive questions and how it's helping clinicians, patients, and health care systems.

We actually worked really closely with a group, a cohort of around 250 physicians across every stage of generation of this data.

And we're starting to see medications that have been sitting on a shelf that all of a sudden AI has found ways for them to have direct value in patient lives.

How did you find your way into health care?

So, what drew me to health care initially was health policy.

I was very interested; this was before the first Obama election.

Value-based care was first becoming a thing.

I started studying different ways to make health care more accessible to more people.

And then eventually went to Emory for medical school, and what drew me to that was a large public hospital, Grady Hospital, to make sure that you're taking advantage of every clinical hour you have.

So, what kind of things were you doing?

So, I was mostly pissing off the IT department.

When I was in medical school, the newsfeed came out, the iPhone came out, Twitter came out, the app store came out.

And so, comparing the technology that we had as doctors, which was fax machine, clipboard, paper binder, the beginnings of electronic health records to like what my friends had or what the patients had in the waiting room was pretty profound.

So, you come at it from the point of view of an AI researcher. Where did your interest in applying this to health care come from?

So, I nerded out a lot when I was younger about things like philosophy of mind, and I thought a lot about intelligence and how far could intelligence go, and could machines be intelligent.

And a lot of those explorations took me towards, as I was learning about AI, I started to work on my first AI projects, thinking a lot about the ways in which AI could have a lot of impact on humanity in the future.

And I didn't predict the future or how fast it would happen, but I thought something like AI would happen within our lifetimes.

So, then once I had that conviction, I thought a lot about, you know, what are the ways in which I can either have positive impact, and hopefully make that a really large upside for humanity, or think about the ways in which we could avoid downside.

So, since then in my career, I've been thinking a lot about both sides of that coin, thinking about that from the perspective of a safety researcher, which is part of my background, and then really some of that work on safety and privacy that I was working on previously.

I started applying it in health care, and then I started being like, whoa, there's a really massive opportunity to think about the application of the technology, especially large language models, in health care.

And that's what took me to transitioning to it full time: just the size of that opportunity and the fact that I felt like the health care and clinical AI world was kind of not fully aware of that gap.

And so, I just thought it was kind of a really amazing opportunity and responsibility to bring us there.

I want to understand both the vision and actually how this is going to be implemented.

So, our mission at OpenAI is to ensure that AGI benefits all of humanity, and health is one of the places where I think that is not only most achievable, but is the clearest.

So, health care today, as everyone knows, is fragmented, care is missed, left and right.

Patients are often left, 364 days per year, without the opportunity to engage with the organizations that have the information centralized.

And doctors have extremely limited time when they do get that chance to engage with the patient, to actually have a meaningful impact beyond a simple surgery or a simple reactive prescription.

The system is more reactive than it is proactive today.

And that leads to tremendous challenges in the system that leads to tremendous gaps in care.

It leads to leaving people behind in situations when they could be thriving.

And one of the reasons that I joined OpenAI is because access has always been a through line in my life: access to knowledge, first in medicine, then in building a product for doctors to access the latest medical literature, and then in supporting entrepreneurs as they were building health care tools.

But OpenAI has the type of technology that can do that at scale for the entire ecosystem, all at once, help patients, help healthcare professionals, and help incredible entrepreneurs who are building for all of the corners and edge cases and tough challenges that exist in each area of the health market.

What is the strategy here?

We know that people use chatbots all the time now for medical questions, but it seems like you're building and working towards something bigger and more comprehensive, not just for the patient side, but the clinician side, because you talked about what your goals are.

Patients are increasingly turning to tools like ChatGPT throughout the year.

In fact, 900 million people now use ChatGPT per week.

And if you look at how many are doing health-related queries, it's about one in four in a given week.

So that's 40 million people per day.

And so our strategy in health is as much proactive as it is reactive, and stepping up to the responsibility and the opportunity to do good that comes with that strong consumer demand.

And so with ChatGPT Health, we have created a space to keep these conversations not just secure, but empowered.

So when I say secure, of course, encrypted, with this essentially one-way valve protecting your conversations.

So these extra security layers, these protections to make sure that we will never train on users' health care data, combined with empowerment, really.

You know, search engines that people have used before to navigate health have amnesia.

You know, they're one size fits all.

And I think context really matters in health care.

And so building a series of features and technology hooks to help patients bring in their own context that they choose to, so that each time they choose to engage with AI it's grounded in their own context, is a key reason why we've built this ChatGPT for Health foundation.

So I understand the safeguards you put in place to keep the data separate and to make sure that you don't get a leak between there and to be able to undergo a very rigorous method of making sure that your data is secure.

But when it comes to the model itself, what goes into training models that are capable of working with something like health care?

It's kind of like the most important thing in the world.

For sure, it's a high-stakes domain, and because of the use that people are making of it, it's super important that we get it right.

So we think a lot about a few things when we think about evaluation and training for health care.

And this is actually the foundation for the work on health at OpenAI.

When we were first starting to work on the health effort at OpenAI, we were thinking a lot about the safety and grounding motivation as an important part of what we were doing.

And so part of the thesis actually for starting work on health at OpenAI was thinking, this is an excellent way to ground our work in safety and alignment and provide kind of concrete incentives and feedback loop for researchers to think about this problem.

The model improvement and the safety thinking here is not just an afterthought; it's actually the beginning of our work here.

And so where we started really was thinking about evaluation.

So can we think about the ways in which you know models were already starting to become useful to people then?

And there's already starting to be this capability overhang between what the models could do and what people were using them for.

And so we started to navigate that problem and think about where the models still have gaps today.

And so that's where our work on evaluation comes in.

And so we've taken a pretty methodologically interesting approach to that.

And a lot of that is reflected in our work on HealthBench, which is this kind of realistic evaluation of conversations between users, who are either health professionals or consumers, talking to models.

And measuring the performance and safety of the models in these situations, which are these kind of multi-turn conversations.

And the way we worked on this is we actually worked really closely with a group, a cohort of around 250 physicians, across every stage of generation of this data, from thinking about the areas that we would focus on for the evaluation, the areas that we thought were going to be clinically relevant or impactful, to the specifics, you know, what are the specific things that are being graded in this evaluation.

So that's like a range of things from are you tailoring your response to a layperson versus a more technical health professional?

Are you thinking about the ways in which you should seek context first before providing an initial response?

The models are significantly better today at kind of seeking context when needed, because users are typing in much less than the models often need, and you want to provide information that's most helpful.

"It burns," exactly.

If the user types in "it burns," how do you think about the right way to provide information?

You can provide some initial information potentially based on an impression you might have of what the user might be saying but the most helpful thing to do in that situation and the safest thing to do in that situation is actually to ask for more context.

So that's just one example of the many ways that we kind of measured performance in HealthBench.

HealthBench in particular actually measured around 49,000 different dimensions of performance, and that's an example of one possible dimension of performance.

So it's a very multifaceted evaluation that we built kind of in concert with this cohort of 250 physicians over a long period of time, and it took us about a year, actually, to work on that evaluation and then release it.

In the kind of the model development cycle it seems like sometimes some company gets a bit ahead and somebody comes up and catches up and whatnot.

I've noticed a pattern with the OpenAI health models: they've consistently been far ahead in HealthBench and other evals, like by a big margin.

Why is that?

I think we have a pretty dedicated effort here and a pretty serious effort that is cross-functional and kind of across the stack, for everything from pre-deployment evals like HealthBench to monitoring production traffic and thinking about the ways in which we are ensuring safety in production traffic in a privacy-preserving way, and working with physicians across every step of that process.

And so to my knowledge, OpenAI's models are the only major models where every phase of model training, from pre-training to mid-training to post-training and every step in between, really integrates health into every major stage.

And I think the result is that our models are pretty good, not just on our own benchmark but also on the benchmarks that other people put together.

I'd like to add a little to what Karan said about the model training, because I think when we spend time with the health care ecosystem, that's one of the things that is most important to them.

So not only were these models trained in development with hundreds of physicians who created over 5,000 conversations and 48,500 rubric criteria through which to evaluate AI responses and score them and identify ways that we could improve the model, do additional data acquisition, do additional post-training, hone in on a particular subspecialty or particular area of the world where users were telling us we could improve health or health care in that specific topic.

But in addition, I think that close proximity to physicians really leads to calling out the most important parts that should be focused on in model development.

So, you know, in other places I sometimes see how a model fared on a medical school exam or a board exam.

And health care is not multiple choice, you know, patients are coming in with a tremendous amount of complexities and their own stories and nuance and context.

And that's presented in many different ways and part of the job of working in health care is being able to draw from those disparate sources, draw from experience, balance all that in your head.

And so having a training mechanism that thinks about things like when to escalate and how to escalate, and keeps that always as the top priority, or adaptive literacy.

I mean, compare the one-size-fits-all handouts that people get when they visit the doctor today to a model that can respond differently when it knows you're an oncologist versus a primary care doctor versus a pharmacist in Kenya versus a patient at the 12th grade literacy level or the third grade literacy level. That is extremely important, not only for making sure that accuracy and impact are maximized, but also just to make sure that everyone can maximally participate in their own care on the patient's side.

And then finally uncertainty. You know, if you go back a year and a half ago, many of the mistakes people would call out about AI models were overconfident hallucinations, and I think in such a high-stakes field like health care, one of the most important things is that the model can be trained to better know when it doesn't know, and say that.

And in addition, suggest follow-up that can be dug into either by the patient, in a referral to the health care system, or by the doctor, if the doctor is using the model: a test that they might run, additional pathways they may go down, to make sure that the patient can be led to the best possible outcome.

We've seen the cost of intelligence drop every year and it's exciting because every year you're able to get better answers, medicine, everything health care across the board.

But what are the challenges?

What are going to be the blockers or what are you looking at ahead to say that okay, we have to solve for this?

The drop in the cost of intelligence has been super exciting here, because so much of what we think about and care about here is actually about access.

And so the more people have access to the technology, the more people will benefit, and that's why we're working on rolling out ChatGPT Health more widely to all free users.

And so that's incredibly exciting. And what we think about as researchers is, like, where will the marginal gains in intelligence compound the most?

And so I think Nate mentioned the exciting thing which is like there is more and more data that is being collected that is across different modalities.

How do you think about integrating that data across all the different ways that people use ChatGPT and all the different modalities, the wearables and things like this that people are collecting, lab tests, things like this?

And that's one place where I think a lot of the intelligence will compound, and we'll start to see kind of new zero-to-one capabilities.

Like a model looks at my entire history over a decade and tells me a prediction that even a human couldn't have, just because the model has a larger context size.

So thinking about those zero-to-one capabilities, I think, is going to be really cool.

The other thing we keep in mind is just, like, how are people thinking about and using ChatGPT today?

Can we even measure that?

Can we improve that?

And I think we're kind of at this interesting point right now.

I call this, on our team, the transition. For context, I bike to work, and when I bike to work I wear my helmet.

I worry about cars and things like this next to me.

Here in SF, you know, we have a bunch of self-driving cars, including Waymos.

I just reached the point where, you know, when I'm biking next to a Waymo I actually feel safer than if I was biking next to a human driver.

I don't worry about whether I'm in their blind spot or not, or anything like this.

So I feel this protective effect by being next to this Waymo, and I want everybody to have this protective effect, right?

I want everybody to have this protective effect with health AI.

There are these studies showing that you know if you have a doctor in your family that adds a protective effect to your health as well.

And I want everybody, whether they're a patient or a health professional, to think about it that way. As a patient, you want to feel safer having this.

As a health professional, you want this to be a safety net for the decisions that you're making.

That's another frontier that I think we're going to cross in the next six months or so which is really exciting.

It's kind of an inflection point.

Another thing that we're thinking about is kind of the right ways to think around post-deployment monitoring of certain workflows.

And I think a good example here to talk about is our AI clinical copilot study that we did with Penda Health.

This was a study where we worked with these 20 or so clinics in Nairobi and actually thought about the ways in which we can deploy a safety net for clinicians in that context which is basically monitoring things that they type into their electronic health record and only interrupting their flow when there's something potentially concerning that's going on or a potential error or things like this.

What we found is that when we deployed this to clinicians in this setting, there was actually a statistically significant reduction in diagnostic and treatment errors for the clinicians who were using this tool versus not.

And I think this is a step in the direction of moving beyond kind of model evaluations, and even monitoring of the ways in which people are using ChatGPT today, to actually think about workflows in which these technologies can be deployed and the right ways to evaluate those workflows after deployment.

I think that's another frontier that we are really excited about and would love to see more from our partners.

What do you think the challenges are going to be?

I'll start with talking through some of the challenges that exist on the professional side.

So each day when healthcare professionals use AI, they're looking for the ability to trust what they're seeing in the answer.

And so a lot of our recent work has been making sure that answers that the AI is providing are not just grounded in what the model was trained on, but are grounded in the latest medical literature, the latest guidelines.

And sometimes the latest guidance from their own institution or their own region.

Some conditions are treated differently in different areas of the country.

Other times different care settings have different levels of resources, different levels of specialists and additional services on hand, and it can be helpful as a healthcare professional to be able to quickly navigate that and come up with completely personalized care plans.

And so building connectivity within ChatGPT to not only be HIPAA-aligned and be used in these secure environments, but also be able to combine sensitive information with the latest medical knowledge, I think, is a great path that we've started down and something that will continue to keep trust as the top priority in how healthcare professionals engage with AI.

So I think one of the other challenges is that the systems themselves in healthcare are quite siloed, both at an organization level, but also at the tools that have to be used within each organization.

AI thus far has been deployed on really a point-solution basis in the technology industry, but increasingly the connectivity is becoming available to connect the dots between the hundreds of different systems, some analog, some digital, some structured, some unstructured, many decentralized, many not on the cloud. Being able to connect all of those through unified AI layers can actually make sure that patients and information aren't falling through the cracks and that the connectivity can be maximized to bring the greatest amount of impact.

That's hard in healthcare and it's certainly not something that we can say is solved, but with many of our recent products, ranging from ChatGPT for Healthcare and its connectivity to apps and connectors, to the OpenAI API for healthcare, to our frontier foundation for models and agents, we think increasingly there's going to be an opportunity to really accelerate what is possible within the healthcare system and what agents can achieve.

Part of this seems like it's very collaborative, working with the healthcare industry, and I noticed when using the ChatGPT Health app, the first thing I was able to do was put in my records and get all of that, and it looked like there was a lot of cooperation working across the ecosystem to do this.

How has that come to be?

Where is this headed?

It's extremely important that all of the healthcare system has an equal chance to contribute and engage, nationally and internationally, with providing the context that will help empower patients to receive the best possible answers from ChatGPT. And so on the electronic health record side, this means working with the government and the Centers for Medicare & Medicaid Services, adopting national standards for electronic health record syncing, so that patients in just a few taps are able to bring in their context in consented ways.

It's being able to tap into existing standards like mobile phones and the most popular consumer health products and the most popular biosensors and wearables, to make sure, again, in just one or two taps patients are able to not only bring in that information but leverage it in thoughtful ways, and ways that may not have been possible without the combined set of data that can exist in this sort of ecosystem.

So for instance, being able to reference your recent exercise activity when making a plan of how to spend your evening, or being able to even do things as simple as, you know, reference your overnight sleep and stress when your agent is helping you set your calendar for the next day and what tasks you may take on first.

It's very exciting. You know, I wear a smart ring, a watch, whatever, but I get this data and all I kind of have in my apps are rings to look at and go, like, okay, I guess it's doing something.

Being able to plug it into ChatGPT has been fantastic, because now I'm able to ask those kinds of questions. But what's very exciting in what you talk about, too, is if you get a plan from your doctor or suggestions, you can literally say, hey, I didn't walk enough yesterday, what should I do today?

I've had it be really good at menu planning.

I literally go, on this menu, tell me what to order and whatnot.

And so you're saying we're just going to get more of that, and much better.

Yeah, and that's why our partnerships, I believe, are so important, because in these instances ChatGPT doesn't replace the incredible technology that our partners are building to go deep on health insights for a particular wearable.

But our surface area, our opportunity to bring in that health information, can now extend to the many different ways people use ChatGPT, such as what they're going to cook for dinner or how they're going to plan their afternoon.

Sometimes I think of two patients: one patient has to navigate the health care system by themselves, and the other patient maybe has a spouse come with them, and that spouse has a clipboard and used to work as a health care professional and is very attentive, if not neurotic, and can follow up on details and is connected to your personal calendar.

And the best aspects of that, with consent, for the patients that want to, I think represents a future where we can make it easier and easier for patients to follow care plans, to play active, captain-like roles in their own health in partnership with their care teams and their physicians.

And I think if we can remove a lot of the friction that historically exists between those processes, whether it's just information not flowing, or there's a lot to keep track of, or a lot of old information to parse and bring in, we can do a tremendous amount of good, or we can help patients themselves be empowered to do a tremendous amount of good in their own care plans.

And, you know, as a physician, it's hard to give as much time as you would like, because you always have more patients you have to deal with than you have hours in the day, and it's interesting to see a kind of technology that has infinite time, infinite patience, to be able to do that as a complement to that.

I mean if there's one thing that health care professionals are short on it's time.

So when we think about our role internally at OpenAI, we often break down the work that we're doing into three buckets. Raise the floor: make sure that AI and the benefits of AI are accessible to everyone, and that could be patients, that could be healthcare professionals and others working in health-related industries. Sweep the floor, which means help doctors and other health professionals save time from the tremendous administrative and bureaucratic burdens that they have every day, so that they can spend more time with their patients. And then thirdly, raise the ceiling: the impact that AI can have in healthcare, I think, will allow us to look back on this space in a few years and say, wow, we have all accelerated together in a way that medicine is still in the driver's seat but is also far more empowered than ever before.

Yeah I don't think anybody feels like their doctor spent too much time with them so it looks like this is going to be helpful to solve for that.

What has been your favorite aha, or wow, or "this is really cool" moment in the intersection of AI and healthcare?

I'll answer your question in a non-standard way, which is, I think the most amazing thing for me to see in the last year has been the rate of adoption of health, actually even beyond the ChatGPT Health product, before we announced the ChatGPT Health product.

It's been one of our fastest growing use cases, health and wellness questions, and we shared that hundreds of millions of people a week are starting to use ChatGPT for health and wellness.

I think seeing that rapid growth, especially, you know, coming from a background of being motivated to work on this problem because I felt like the healthcare and clinical AI world was not super aware of the potential of LLMs in healthcare, and seeing how far we had come, I think has been a really special moment for me.

There's no doubt that the adoption of this technology, and the fact that it is increasingly collaborative with the healthcare system, that it is increasingly driving feedback loops back to us to improve the models, is the most meaningful thing and the most mission-aligned thing. But what I also get excited about is what our research team is increasingly able to give back to them using that feedback. And not only is it the capabilities of the models, but it's what can be unlocked once those models are allowed to run longer and have more context. We're starting to see discoveries of medications that have been sitting on a shelf that all of a sudden AI has found ways for them to have meaningful and direct value in patient lives. It is starting to scale experiments that we as individuals wouldn't have been able to juggle on our own. And that partnership, combined with that increased capability to finally move from being interesting to being useful, and increasingly to being transformative, is I think the most exciting thing for us heading into this year.

Now that you've been working on this for some time, you've been engaging with clinicians and talking to people, helping deploy this, what has been some of the feedback you've seen?

I think the experience of flying to Nairobi and seeing the clinicians using the tool, and the ways in which we did this thing which we call active change management, where we worked really closely with these clinicians and flew to Kenya a couple of times to think about the ways that we could deepen their workflows using the AI tool and make it something that not only made sense to them but actually became something that was indispensable for them. And so as we were concluding the study, the team at Penda Health was thinking about potentially running another study, and they actually had a lot of hesitance around running another study, because that would have involved having some group of clinicians using AI and some group of clinicians not using AI. They actually felt it was dangerous to have a group of clinicians not using the AI, and so that's the point at which I was like, wow, we have done something major here.

I think the stories that we get back from our members every day are one of the most meaningful parts of the job. These are from caregivers that are, you know, increasingly under strain, taking care of family members, trying to navigate their own health at the same time. This is from doctors and nurses who are truly overloaded every day, and we can help them extend their expertise and, you know, compress the tough parts of their day a little bit more. And then sometimes, and this is more rare but increasing, it's the miracle cases: it's the patient who had been bouncing around the system for years, the unsolved diagnosis, the emergency where information wasn't present. And suddenly being able to step in and assist and accelerate and bring people into the care that could really help is truly a privilege.

It's exciting. It's an amplifier, and every doctor I know wants to be able to do more for their patients. Thank you very much; this has been very interesting, guys. Thank you.