Training Data · 2025-02-25

OpenAI on Deep Research: End-to-End RL as the Path to Capable Agents

Hosts: Unknown

Guests: Isa Fulford, Josh Tobin

Deep Research · Reinforcement Learning · AI Agents · o3 / reasoning models · End-to-end training · Agent architectures · Web browsing agents · Knowledge work automation

Why it matters

Hard rules (e.g., forbidden databases) should stay in human-written logic.

Key claims

Deep Research is a fine-tuned o3 trained end-to-end with reinforcement learning on hard browsing and reasoning tasks, with browser and Python tool access.
OpenAI explicitly bets against hand-built agent graphs, arguing end-to-end RL produces more flexible, creative strategies than scripted LLM pipelines.
Core thesis: 'you get what you optimize for'—the most powerful agents will be built by RL-tuning reasoning models directly for the outcome, not gluing together non-end-to-end pieces.
Tobin invokes Yann LeCun's cake analogy to explain RL's return: pretrained LLMs are now the cake, so RL 'cherries on top' finally work.

Episode summary

Summary

Isa Fulford and Josh Tobin from OpenAI's Deep Research team join Training Data to discuss the product, launched three weeks prior: an agent in ChatGPT that spends 5 to 30 minutes synthesizing web sources into a detailed report, positioned as OpenAI's second agent release after Operator. They explain that Deep Research is a fine-tuned version of o3 trained end-to-end with reinforcement learning on hard browsing and reasoning tasks, with access to a browser tool and a Python tool. The hosts press them on how it works under the hood, and the team emphasizes that the model's flexibility comes from end-to-end training rather than from a hand-authored graph of LLM calls: the model reacts to live web content and adjusts its own strategy mid-task, which they argue is hard to replicate with scripted agent frameworks.

A central argument of the conversation is that the dominant lesson of machine learning, "you get what you optimize for," now favors end-to-end RL tuning on top of pretrained reasoning models. Tobin uses Yann LeCun's cake analogy to explain why RL is "so back": pretrained language models are the cake and supervised fine-tuning is the frosting, so RL cherries can finally work. The guests add that hand-written logic should only encode hard rules (e.g., data the model must not touch), while everything else is better learned. They argue this recipe (a state-of-the-art reasoning model, human-equivalent tools, and direct optimization for the outcome) will scale to increasingly complex tasks, framing AGI as increasingly an operational rather than research problem.

The discussion also covers use cases (knowledge work, medical research, shopping, travel, coding, and personalized education), design choices like the upfront clarification flow, the importance of high-quality training data, future plans to expand into private data sources and fuse browsing/computer-use capabilities, and reactions to Sam Altman's claim that Deep Research will take over a single-digit percentage of economically valuable tasks. In a lightning round they predict agents will be the breakout application category of the year and reaffirm reinforcement learning's resurgence.

Hard rules (e.g., forbidden databases) should stay in human-written logic; everything else should be left to the model to learn.
Data quality—credited to team members Edward Sun and others—was a hidden key to making the system work.
Roadmap includes expanding to private data sources, better browsing and analysis, and fusing Deep Research with Operator-style computer use into a unified agent.
Lightning round predictions: agents are the breakout application category of the year, and reinforcement learning is 'so back.'

Source material

Transcript

A lesson that I've seen people learn over and over again in this field is like, you know, we think that we can do things that are smarter than what the models do by writing it ourselves.

But as the field progresses, the models come up with better solutions to things than humans do.

The, like, probably number one lesson on machine learning is, like, you get what you optimize for.

And so if you're able to set up the system such that you can optimize directly for the outcome that you're looking for, the results are going to be much, much better than if you sort of try to glue together models that are not optimized end-to-end for the task that you're trying to have them do.

So my, like, long-term guidance is that, you know, I think, like, reinforcement learning tuning on top of models is probably going to be a critical part of how the most powerful agents get built.

We're excited to welcome Isa Fulford and Josh Tobin, who lead the Deep Research product at OpenAI.

Deep Research launched three weeks ago and has quickly become a hit product used by many tech luminaries like the Collisons for everything from industry analysis to medical research to birthday party planning.

Deep Research was trained using end-to-end reinforcement learning on hard browsing and reasoning tasks and is the second product in a series of agent launches from OpenAI, with the first being Operator.

We talked to Isa and Josh about everything from Deep Research's use cases to how the technology works under the hood to what we should expect in future agent launches from OpenAI.

Isa and Josh, welcome to the show.

Thank you.

Thank you so much for joining us.

It's great to be here.

Thank you for having us.

So maybe let's start with, like, what is Deep Research?

Tell us about the origin stories and what this product is doing.

So Deep Research is an agent that is able to search many online websites and it can create very comprehensive reports.

It can do tasks that would take humans many hours to complete. It's in ChatGPT, and it takes like five to 30 minutes to answer you, so it's able to do much more in-depth research and answer your questions with much more detail and specific sources than a regular ChatGPT response would be able to.

It's one of the first agents that we've released.

We released Operator pretty recently as well, so Deep Research is the second agent, and we'll release many more in the future.

What's the origin story behind Deep Research?

When did you choose to do this?

What was the inspiration and how many people work on it?

What did it take to bring this to fruition?

Good question.

This is before my time.

So I'm curious to hear that.

So I think maybe around a year ago we were seeing a lot of success internally with this new reasoning paradigm and training models to think before responding. We were focusing a lot on math and science domains, but I think the other thing that this new reasoning model regime unlocks is the ability to do longer-horizon tasks that involve agentic abilities. We thought: a lot of people do tasks that require a lot of online research or a lot of external context, and that involves a lot of reasoning and discriminating between sources, and you have to be quite creative to do those kinds of things. I think we finally had models, or a way of training models, that would allow us to tackle some of those tasks.

So we decided to start training models to do browsing tasks first, using the same methods that we use to train reasoning models but on more real-world tasks.

Was it your idea? And Josh, how did you get involved?

At first it was me and Yash Patil, who is at OpenAI working on a similar project that will be released at some point, which we're very excited about, and we built an original demo. And then also Thomas Dimson, who's one of those people who is just an amazing engineer, will dive into anything and just get loads of things done. So it was very fun.

Yeah, and I joined more recently. I rejoined OpenAI about six months ago from my startup.

I was at OpenAI in the early days and was looking around at projects when I rejoined, and got very interested in some of our agentic efforts, including this one, and got involved with that.

Amazing.

Well tell us a little bit about who you built it for.

Yeah it's really for anyone who does knowledge work as part of their day to day job or really as part of their life.

So we're seeing a lot of the usage come from people using it for work, doing things like research as part of their jobs, for understanding markets, companies, real estate.

A lot of scientific research, medical.

I think we've seen a lot of medical examples as well.

And one of the things we're really excited about as well is that this style of task, where I just need to go out and spend many hours doing a bunch of web searches and collating a bunch of information, is not just a work thing; it's also useful for shopping and travel.

So we're excited for the Plus launch so that more people will be able to try deep research, and maybe we'll see some new use cases as well.

It's definitely one of the products I've used the most over the last couple weeks.

It's been amazing.

Using it for work?

For work definitely.

Also for fun.

What are you using it for?

Oh for me?

Oh my goodness.

So I was thinking about buying a new car, and I was trying to figure out when the next model was going to be released for the car. There's all these speculative blog posts, there's patents from the manufacturer, and so I asked deep research: can you break down all the gossip about this car, and then all of the facts about what they've done and what this automaker said before? It put together an amazing report that told me maybe wait a couple months, but this year, like in the next few months, it should come out.

Yeah like one of the things that's really cool about it is it's like it's not just for going broad and gathering all of the information about a source but it's also really good at finding like very obscure like weird facts on the internet.

Like if you have something very specific you want to know that might not just turn up in the first page of search results, it's good at that kind of thing too.

So that's cool.

What are some of the surprising use cases that you've seen?

I think the thing I've been most surprised by is how many people are using it for coding.

Yeah.

Which wasn't really a use case I'd considered, but I've seen a lot of people on Twitter and in various places where we get feedback using it for coding and code search, and also for finding the latest documentation for a certain package or something and helping them write a script or something.

So yeah, I'm kind of embarrassed that we didn't think of that as a use case, because for ChatGPT users it seems so obvious, but, I don't know, it's impressive how well it works.

How do you think the balance of business versus individual use case will evolve over time?

Like, you mentioned the Plus launch that's happening. In a year's time or two years' time, would you guess this is mostly a business tool or mostly a consumer tool?

I would say hopefully both.

I think it's a pretty general capability, and I think it's something that we do both in work and in personal life.

Yeah I'm excited about both.

I think the magic of it is like it just saves people a lot of time.

If there's something that might have taken you hours, or in some cases we've heard like days, people can just put it in here and get, you know, 90% of what they would have come up with on their own.

And so yeah, I tend to think there's more tasks like that in business than there are in personal life, but I think for sure it's gonna be part of people's lives in both.

It's really become the majority of my ChatGPT usage. I always pick deep research rather than normal.

So what are you seeing in terms of consumer use cases and what are you excited about?

I think a lot of shopping, travel recommendations, I personally used the model a lot.

I've been using it for months to do these kinds of things.

We were in Japan for the launch of deep research so it was very helpful in finding restaurants with very specific requirements and finding things that I wouldn't have necessarily found.

Yeah and I found it like when you have something it's like the kind of thing where you know if you're shopping maybe for something expensive or you're planning a trip that is special or you want to spend a lot of time thinking about.

It's like, for me, you know, I might go and spend hours and hours trying to read everything on the internet about this one product that I'm interested in buying, like scouring all the reviews and the forums and stuff like that, and deep research can put together something like that very quickly, so it's really useful for that kind of thing.

The model is also very good at instruction following so if you have a query with many different parts or many different questions so if you want the information about the product but you also want comparisons to all other products and you also want information about reviews from you know Reddit or something like that you can give loads of different requirements and it will do all of them for you.

Yeah another tip is like just ask it to format it in a table.

It'll usually do that anyway, but it's really helpful to have a table with a bunch of citations and things like that for all the categories of things that you want to research.

Yeah, there are also some features that hopefully we'll get into the product at some point. The underlying model is able to embed images, so it can find images of the products, and, this is not a consumer use case, but it's able to create graphs as well and then embed those in its response. So hopefully that will come to ChatGPT soon as well.

Nerdy consumer use case.

Yeah.

Although, speaking of nerdy consumer use cases, personalized education is also a really interesting use case.

It's like, if there's a topic that you've been meaning to learn about, you know, if you need to brush up on your biology or you want to learn about some world event, it's really good. You put in all the information about what you feel like you don't understand and what aspects of it you want it to go do research on, and it'll put together a nice report for you.

One of my friends is considering starting a CPG company, and he's been using it so much to find similar products, to see if specific names are already taken, you know, whether the domain's already taken, market sizing, all of these different things. So that's been fun too: he'll share the reports with me and I'll read them, so it's been pretty fun to see.

Another like fun use case is it's really good at finding like a single like obscure fact on the internet like if there's like a you know like an obscure TV show or something that you want to you know to like find like one particular episode of or something like that it'll go and it'll go deep and find like one reference to it on the web.

Oh yeah, my brother's friend's dad had this very specific fact. It was about some Austrian general who was in power during a certain death of someone during a battle, like a very niche question. Apparently ChatGPT had previously answered it wrong, and he was very sure that it was wrong, so he went to the public library and found a record and found that it was wrong. Then deep research was able to get it right, so we sent it to him and he was excited.

What is the rough mental model for, you know, what deep research is excellent at today? Where should people be using the o-series of models, and where should they be using deep research?

What deep research really excels at is if you have a sort of detailed description of what you want, and getting the best possible answer requires reading a lot of the internet.

If you have kind of like more of a big question, it'll help you clarify what you want, but I mean, it's really at its best when there's a specific set of information that you're looking for.

I think it's very good at synthesizing information it encounters, and it's very good at finding specific, hard-to-find information. It can make some new insights, I guess, from what it encounters, but it's not making new scientific discoveries yet. And then, on using the o-series models: for me, if I'm asking for something to do with coding, usually it doesn't require knowledge outside of what the model already knows from its pre-training, so you would usually use o1 pro or o1 for coding, or o3-mini-high.

And so Deep Research is a great example of where some of the new product directions for OpenAI are going.

I'm curious, to the extent you can share, how does it work?

The model that powers deep research is a fine-tuned version of o3, which is our most advanced reasoning model, and we specifically trained it on hard browsing tasks that we collected as well as other reasoning tasks. It also has access to a browsing tool and a Python tool, so through training end-to-end on those tasks it learned strategies to solve them, and the resulting model is good at online search and analysis.

Yeah, intuitively the way you can think about it is: you make this request, ideally a detailed request about what you want. The model thinks hard about that, it searches for information, it pulls that information and reads it, it understands how it relates to that request, and then it decides what to search for next in order to get closer to the final answer that you want. And it's trained to do a good job of pulling all of that information together into a nice tidy report, with citations that point back to the original information that it found.

Yeah, I think what's new about deep research as an agentic capability is that we have the ability to train end-to-end. There are a lot of things that you have to do in the process of doing research that you couldn't really predict beforehand, so I don't think it's possible to write some kind of language model program or script that would be as flexible as what the model is able to learn through training, where it's actually reacting to live web information and, based on something it sees, it has to change its strategy and things like that. So we actually see it doing pretty creative searches. You can read the chain of thought summary, and I'm sure you can see that sometimes it is very smart about how it comes up with the next thing to look for.
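The loop described above (search, read, relate what was read to the request, decide what to search for next, then write a cited report) can be sketched in a few lines. This is only an illustrative toy under stated assumptions, not how Deep Research is implemented: `FAKE_WEB`, `search`, `choose_next_query`, and `run_research` are hypothetical names, the "web" is an in-memory dictionary, and the hand-written rule in `choose_next_query` stands in for a decision that, in the system the guests describe, is made by the trained model itself.

```python
from dataclasses import dataclass, field

# Hypothetical stand-in for the live web: query -> list of (url, text) results.
FAKE_WEB = {
    "car model release date": [
        ("https://example.com/rumors", "Rumor: refresh expected; see patent filings."),
    ],
    "car model patent filings": [
        ("https://example.com/patents", "Patent filed; launch planned within months."),
    ],
}


@dataclass
class ResearchState:
    request: str
    notes: list = field(default_factory=list)    # (url, text) pairs read so far
    queries: list = field(default_factory=list)  # searches issued so far


def search(query):
    """Toy browsing tool: look the query up in the fake web."""
    return FAKE_WEB.get(query, [])


def choose_next_query(state):
    """Placeholder policy. In the system described in the transcript this
    decision comes from the trained model; this rule is only a stand-in."""
    if not state.queries:
        return state.request
    last_text = state.notes[-1][1] if state.notes else ""
    follow_up = "car model patent filings"
    if "patent" in last_text and follow_up not in state.queries:
        return follow_up  # react to what was just read
    return None  # nothing more to look up


def run_research(request, max_steps=5):
    state = ResearchState(request)
    for _ in range(max_steps):
        query = choose_next_query(state)
        if query is None:
            break
        state.queries.append(query)
        state.notes.extend(search(query))
    # Report: one line per finding, each with a citation back to its source.
    report = [f"{text} [{url}]" for url, text in state.notes]
    return state.queries, report


queries, report = run_research("car model release date")
```

The second search exists only because of what the first page said, which is the "react to live web information" behavior in miniature. The point made in the conversation is that a learned policy can do this for situations nobody wrote a rule for.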

So John Collison had a tweet that went somewhat viral: you know, how much of the magic of deep research is real-time access to web content, and how much of the magic is in chain of thought? Can you maybe shed some light on that?

I think it's definitely a combination. I think you can see that because there are other such products that weren't trained end-to-end, so they won't be as flexible in responding to information they encounter and won't be as creative about how to solve specific problems, because they weren't specifically trained for that purpose. So it's definitely a combination. I mean, it's a fine-tuned version of o3; o3 is a very smart and powerful model, and a lot of the analysis capability is also from the underlying o3 model training. So I think it's definitely a combination.

Before OpenAI I was working at a startup, and we were dabbling in building agents kind of the way that I see most people describe building agents on the internet, which is essentially: you construct this graph of operations, and some of the nodes in that graph are language models, so the language model can decide what to do next, but the overarching logic of the sequence of steps that happen is defined by a human. What we found is that it's a powerful way of building things to get quickly to a prototype, but it falls down pretty quickly in the real world, because it's very hard to anticipate all the scenarios that the model might face and think about all the different branches of the path that you might want to take.

In addition to that, the models often are not the best decision makers at nodes in that graph, because they weren't trained to make those decisions; they were trained to do things that look similar to that. So I think the thing that's really powerful about this model is that it's trained directly end to end to solve the kinds of tasks that users are using it to solve.

So you don't have to set up a graph or make those node-level decisions about the architecture on the back end.

It's all driven by the model itself.
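For contrast, here is a minimal sketch of the "graph of operations" style described above, where some nodes are language model calls but a human fixes the order of steps. Everything in it is a hypothetical illustration: `classify`, `search_reviews`, and `summarize` are plain functions standing in for LLM or tool calls, and `GRAPH` is the hand-written routing table.

```python
def classify(request):
    """Node 1: stand-in for an LLM call that labels the request."""
    return "shopping" if "buy" in request.lower() else "unknown"


def search_reviews(request):
    """Node 2: stand-in for a search step the author planned for."""
    return [f"review results for: {request}"]


def summarize(results):
    """Node 3: stand-in for an LLM call that writes the answer."""
    return " | ".join(results)


# The human-written part: which nodes run, and in what order, per known label.
GRAPH = {"shopping": [search_reviews, summarize]}


def run_graph(request):
    label = classify(request)
    steps = GRAPH.get(label)
    if steps is None:
        # A branch nobody anticipated: the fixed graph has no path for it.
        return f"no path defined for label '{label}'"
    value = request
    for step in steps:
        value = step(value)
    return value


planned = run_graph("Which car should I buy?")
unplanned = run_graph("Find an obscure TV episode")
```

The planned request flows through the graph, while the unplanned one dead-ends, which is the failure mode described in the conversation: the author has to anticipate every branch in advance.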

Yeah.

Can you say more about this? Because, you know, it seems like that's one of the very opinionated decisions that you've made, and clearly it's worked.

There's so many companies that are building on your API, kind of prompting to solve specific tasks for specific users.

Do you think a lot of those applications would be better served by having models trained end to end for their specific workflows?

I think if you have a very specific workflow that is quite predictable it makes a lot of sense to do something like Josh described.

But if you have something that has a lot of edge cases or it needs to be quite flexible then I think something similar to deep research is probably a better approach.

Yeah I think like the guidance I give people is the one thing that you don't want to bake into the model is like kind of hard and fast rules.

Like if you have you know a database that you don't want the model to touch or something like that it's better to encode that in human written logic.
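One way to read "encode that in human-written logic" is a plain-code check that sits between the model and its tools, so the rule holds no matter what the model decides. The sketch below is an assumption-laden illustration, not an OpenAI API: `FORBIDDEN_DATABASES`, `ForbiddenToolCall`, `query_database`, `TOOLS`, and `guarded_tool_call` are all hypothetical names.

```python
FORBIDDEN_DATABASES = {"payroll"}  # hypothetical hard rule, set by a human


class ForbiddenToolCall(Exception):
    """Raised when a requested tool call violates a hard rule."""


def query_database(name, sql):
    """Toy tool: pretend to run a query and describe what happened."""
    return f"ran {sql!r} on {name}"


TOOLS = {"query_database": query_database}


def guarded_tool_call(tool_name, args):
    """Check hard rules in plain code before running what the model asked for."""
    if tool_name not in TOOLS:
        raise ForbiddenToolCall(f"unknown tool: {tool_name}")
    if tool_name == "query_database" and args.get("name") in FORBIDDEN_DATABASES:
        raise ForbiddenToolCall(f"database {args['name']!r} is off limits")
    return TOOLS[tool_name](**args)


allowed = guarded_tool_call("query_database", {"name": "products", "sql": "SELECT 1"})
```

The design choice is that the check never depends on the model's judgment: a call naming the `payroll` database raises `ForbiddenToolCall` before any tool runs, while everything else is left for the model to decide.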

But I think it's kind of like a lesson that I've seen people learn over and over again in this field is like you know we think that we can do things that are smarter than what the models do by writing it ourselves.

But in reality, like, usually, as the field progresses, the models come up with better solutions to things than humans do.

And also like you know the like probably number one lesson in machine learning is like you get what you optimize for.

And so if you're able to set up the system such that you can optimize directly for the outcome that you're looking for, the results are going to be much, much better than if you sort of try to glue together models that are not optimized end to end for the tasks that you're trying to have them do.

So my like long term guidance is that you know I think like reinforcement learning tuning on top of models is probably going to be a critical part of how the most powerful agents get built.

What were the biggest technical challenges along the way to making this work?

Well I mean maybe I can say as like an observer rather than someone who was involved in this from the beginning.

But it seems like one of the things that Isa and the rest of the team worked really, really hard on, and that was kind of one of the hidden keys to success, was making really high-quality data sets.

That's, you know, another one of those age-old lessons in machine learning that people keep relearning: the quality of the data that you put into the model is probably the biggest determining factor in the quality of the model that you get on the other side.

And then have someone like Edward Sun, who's another person who works on the project, who will just optimize any data set.

So that's a secret to success.

Find your Edward.

Yes great great machine learning model training.

How do you make sure that it's right?

Yeah, so that's obviously a core part of this model and product: we want users to be able to trust the outputs.

So part of that is we have citations and so users are able to see where the model is citing its information from.

And during training that's something that we actually try to make sure is correct, but it's still possible for the model to make mistakes or hallucinate or trust a source that maybe isn't the most trustworthy source of information.

So that's definitely an active area where we want to continue improving the model.

How should we think about this together with, you know, o3 and Operator and other different releases? Does this use Operator? Do these all build on top of each other, or are they all kind of a series of different applications of o3?

Today these are pretty disconnected but you can kind of you can imagine kind of where we're going with this right which is like the ultimate agent that people have access to at some point in the future should be able to do you know not just web search or using a computer or any of the other types of actions that you'd want like kind of a human assistant to do but should be able to fuse all these things in a more natural way.

Any other design decisions that you know you've taken that are maybe not obvious at first glance?

I think one of them is the clarification flow. If you've used deep research, the model will ask you questions before starting its research. Usually ChatGPT maybe will ask you a question at the end of its response, but it usually doesn't have that kind of behavior upfront. That was intentional, because you will get the best response from the deep research model if the prompt is very well specified and detailed, and I think that it's not the natural user behavior to give all of the information in the first prompt. We wanted to make sure that if you're going to wait five minutes, 30 minutes, your response is as detailed and satisfactory as possible, so we added this additional step to make sure that the user provides all the detail that we would need. I've actually seen a bunch of people on Twitter saying that they have this flow where they will talk to o1 or o1 pro to help make their prompt more detailed, and then once they're happy with the prompt they'll send it to deep research, which is interesting: people are finding their own workflows for how to use this.
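The clarification step can be pictured as a gate in front of the long-running job: ask for missing details first, then start the expensive research with a fully specified prompt. The sketch below is a rough illustration under loud assumptions. In the product the questions are asked by the model; here a keyword check over a hypothetical `REQUIRED_DETAILS` table stands in for that, and `clarifying_questions` and `build_research_prompt` are made-up helper names.

```python
# Hypothetical details a shopping-style request should pin down before research.
REQUIRED_DETAILS = {
    "budget": "What is your budget?",
    "timeframe": "What timeframe matters to you?",
}


def clarifying_questions(prompt, answers):
    """Return questions for details that neither the prompt nor earlier
    answers cover. A keyword check stands in for the model's judgment."""
    text = prompt.lower()
    return [
        question
        for key, question in REQUIRED_DETAILS.items()
        if key not in text and key not in answers
    ]


def build_research_prompt(prompt, answers):
    """Fold the user's answers into the prompt that starts the long job."""
    details = "; ".join(f"{key}: {value}" for key, value in sorted(answers.items()))
    return f"{prompt} ({details})" if details else prompt


first_round = clarifying_questions("Compare laptops", {})
second_round = clarifying_questions("Compare laptops", {"budget": "$1500"})
final_prompt = build_research_prompt(
    "Compare laptops", {"budget": "$1500", "timeframe": "2025"}
)
```

Only once `clarifying_questions` returns an empty list would the 5 to 30 minute research run begin, which mirrors the stated goal of not spending that time on an underspecified request.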

So there's been three different deep research products launched in the last few months.

Tell us a little about what makes you guys special and how we should think about it.

And they're all called deep research right?

They're all called deep research, not a lot of naming creativity in this field.

I think people should trial them for themselves and get a feel.

I think the difference in quality, I think they all have pros and cons but I think the difference will be clear but what that comes down to is just the way that this model was built and the effort that went into constructing the data sets and then the engine that we have with the O-series models which allows us to just optimize models to make things that are really smart and really high quality.

We had the o1 team on the podcast last year and we were joking that OpenAI is not that good at naming things.

I will say this is your best named product.

Deep research is.

At least it describes what it does I guess.

Yeah.

So I'm curious to hear a little about where you want to go from here.

You have deep research today.

What do you think it looks like a year from now, and what are maybe complementary things you want to build along the way?

We're excited to expand the data sources that the model has access to.

We've trained a model that's generally very good at browsing public information but it should also be able to search private data as well.

And then I think just pushing the capabilities further so it could be better at browsing it could be better at analysis.

And then thinking about how this fits into our agent roadmap more broadly.

I think that the recipe here is something that's going to scale to a pretty wide range of use cases.

Things that are going to surprise people how well they work.

But this idea of you take a state of the art reasoning model you give it access to the same tools that humans can use to do their jobs or to go about their daily lives.

And then you optimize directly for the kinds of outcomes that you want the agent to be able to achieve.

That recipe there's like really nothing stopping that recipe from scaling to more and more complex tasks.

So I feel like AGI is like an operational problem now.

And I think yeah a lot of things to come in that general formula.

So Sam had a pretty striking quote that deep research will take over a single-digit percentage of all economically valuable tasks in the world.

How should we think about that?

I think of it as, like, deep research is not capable of doing all of what you do.

But it is capable of saving you like hours, or in some cases days, at a time.

And so I think like what we're hopefully relatively close to is deep research and the agents that we build next and the agents that we build on top of it giving you one five ten twenty five percent of your time back depending on the type of work that you do.

I mean I think you've already automated 80 percent of what I do.

It's definitely on the higher end for me.

We just need to start writing checks I guess.

Are there entire job categories that you think are kind of more, at risk is the wrong word, but more in the strike zone for what deep research is exceptional at?

So for example I'm thinking consulting.

But like are there specific categories that you think are more in the strike zone.

Yeah, I used to be a consultant.

I don't think any jobs are at risk.

Like I don't really think of this as like a labor replacement kind of thing at all.

But for these types of knowledge work jobs, where you are spending a lot of your time looking through information and making conclusions, I think it's going to give people superpowers.

Yeah I'm very excited about a lot of the medical use cases just the ability to find all of the literature or all of the recent cases for a certain condition.

I think I've already seen a lot of doctors posting about this or they've reached out to us and said oh we used it for this thing we used it to help find a clinical trial for this patient or something like that.

So just people who are already so busy just saving some time or it's maybe something that they wouldn't have had time to do.

And now they are able to have that information for them.

And I think the like the impact of that is like maybe a little bit more profound than it sounds on the surface right.

It's not just like it's not just like you know getting five percent of your time back but it's the type of thing that might have taken you four hours or eight hours to do.

Now you can do it for, you know, a ChatGPT subscription and five minutes.

And so like what types of things would you do if you had infinite time that now maybe you can do like many many copies of.

So like you know should you do research on every single possible startup that you could invest in instead of just the ones that you have time to meet with things like that.

Or on the consumer side one thing that I'm thinking of is you know the working mom that's too busy to plan a birthday party for her toddler like now it's now it's doable.

So I agree with you, it's way more important than five percent of your time.

It's all the things you couldn't do before.

Exactly.

What does this change about education and the way we should learn?

And, you know, what will you be teaching your kids now that we're in a world of agents and Deep Research?

Education's been, like, one of the top few things that people use it for.

I mean, this is true for ChatGPT generally: learning things by talking to an AI system that is able to personalize the information that it gives you, based on what you tell it or maybe in the future what it knows about you, feels like a much more efficient way to learn and a much more engaging way to learn than reading textbooks.

We have some lightning round questions.

All right.

Okay.

Your favorite Deep Research use case?

I'll say, yeah, personalized education, just learning about anything I want to.

I've already mentioned this but I think a lot of the personal stories that people have shared about finding information about a diagnosis that they received or someone in their family received have been really great to see.

Okay.

We saw a few application categories break out last year.

So, for example, coding being an obvious one. What application categories do you think will break out this year?

I mean, clearly agents.

I was going to say that too.

Okay.

2025, year of the agent.

I think so.

And then what piece of content would you recommend people read to learn more about agents or where the state of AI is going?

Could be an author, too. Training Data.

This podcast.

I think it's so hard to keep up with the state of the art in AI.

I think the general advice I have for people is, like, pick one or two subtopics that you're really interested in and go curate a list of people who you think are saying interesting things about it. And as for how to find those one or two things you're interested in,

maybe actually that's a good Deep Research use case. Like, you know, go use it to go deep on things that you want to learn more about.

This is a bit old now, but I think a few years ago I watched the, I think it's called like Foundations of RL or something like this, from Pieter Abbeel, and it's a few years old but I think that it was a good introduction to reinforcement learning.

So yeah, I would definitely second any content by Pieter Abbeel, my grad school advisor.

Oh yeah.

Okay, reinforcement learning. You know, it kind of went through a peak, and then it felt like it was a little bit in the doldrums, and now it's peaking again. Is that the right read on what's happening with RL?

So back.

Yeah.

So back.

Yeah.

Why now?

Because everything else is working.

Like, I think maybe people who have been following the field for a while will remember the Yann LeCun cake analogy. If you're building a cake, then most of the cake is the cake, and then there's a little bit of frosting, and then there's a few cherries on top. And the analogy was that unsupervised learning is the cake, supervised learning is the frosting, and reinforcement learning is the cherries on top.

When we in the field were working on reinforcement learning back in 2015, 2016, I think Yann LeCun's analogy, which I think in retrospect is probably correct, is that we were trying to add the cherries before we had the cake.

But now we have language models that are pretrained on massive amounts of data and are incredibly capable.

We know how to do supervised fine-tuning on those language models to make them good at instruction following and generally doing the things that people want them to do.

And so now that that works really well, it's very ripe to tune those models for any kind of use case that you can define a reward function for.

Great.

Okay.

So from this lightning round we got: agents will be, you know, the breakout category in 2025, and reinforcement learning is so back.

I love it.

Thank you guys so much for joining us.

We loved this conversation.

Congratulations on the incredible product and we can't wait to see what comes of it.

Thank you.