
Latent Space · 2025-03-11
OpenAI Launches Agents Platform: Responses API, Agents SDK, and New Built-in Tools
Hosts: Alessio (Decibel), swyx (Smol AI)
Guests: Romain Huet, Nikunj Kothari
Why it matters
Three new built-in tools launched: web search, improved file search, and computer use (Operator's underlying model)
Key claims
- Three new built-in tools launched: web search, improved file search, and computer use (Operator's underlying model)
- Responses API introduced as a unified primitive for agentic, multi-turn workflows, framed as a superset of Chat Completions at launch and of the Assistants API over time
- Chat Completions will continue to be supported; Assistants API targeted for sunset in H1 2026 with full-year migration assistance
- Responses API stateful storage is free for 30 days, enabling built-in observability via the OpenAI dashboard
Episode summary
Romain Huet and Nikunj Kothari from OpenAI join Latent Space to announce a suite of developer-facing launches positioned around OpenAI's thesis that 2025 is the "year of agents." The core releases include three new built-in tools—web search, improved file search, and computer use—alongside a new Responses API and an Agents SDK that replaces the experimental Swarm framework.
The Responses API is framed as a unified superset of Chat Completions and the Assistants API, designed for multi-turn agentic workflows: it matches Chat Completions at launch, with Assistants features such as code interpreter and async mode to follow. OpenAI confirmed Chat Completions will remain supported for years, while the Assistants API is targeted for sunset in the first half of 2026, with a full year of migration support. Notably, stateful storage on the Responses API is free for 30 days, which the hosts flagged as a surprisingly developer-friendly decision.
On the tools, the web search offering builds on the same fine-tuned model that powers ChatGPT search (the hosts cite a jump from 30% to 90% accuracy on the SimpleQA benchmark) and exposes citations with sub-paragraph precision. The computer use tool is the same model behind Operator, described as still in early stages ("GPT-1 or GPT-2 of computer use"). File search gains metadata filtering, expanded file types, query optimization, and custom re-ranking. The Agents SDK retains Swarm's handoff pattern but adds type support, guardrails, and tracing in the OpenAI dashboard, and it can plug into third-party providers that expose the Chat Completions API format as well as other tracing providers. The team hinted at eventually tying agent traces into the Reinforcement Fine-Tuning (RFT) workflow for iterative agent improvement.
- GPT-4o search preview (fine-tuned for search) exposed in Chat Completions; web search also available as a tool in Responses API, with sub-paragraph citations
- File search adds metadata filtering, broader file type support, query optimization, and custom re-ranking; positioned as a managed RAG service
- Agents SDK replaces Swarm, preserving the handoff pattern while adding guardrails, type support, tracing, and multi-provider compatibility
- Long-term vision: link agent traces from the SDK to the RFT (Reinforcement Fine-Tuning) pipeline for self-improving agents
Source material
Transcript
[MUSIC PLAYING] Hey, everyone.
Welcome back to another Latent Space lightning episode.
This is Alessio, partner and CTO at Decibel.
And I'm joined by swyx, founder of Smol AI.
Hi, and today we have a super special episode, because we're talking with our old friend Romain. Hi, welcome.
Thank you.
Thank you for having me.
And Nikunj, who most famously... well, if anyone has ever tried to get access to anything on the API, Nikunj is the guy.
So I know your emails because I look forward to them.
Yeah, nice to meet all of you.
I think that we're basically convening today to talk about the new API.
So perhaps you guys want to just kick off, what is OpenAI launching today?
Yeah, so I can kick it off.
We're launching a bunch of new things today.
We're going to do three new built-in tools.
So we're launching a web search tool.
This is basically ChatGPT search, but available in the API.
We're launching an improved file search tool.
So this is you bringing your data to OpenAI.
You upload it.
We take care of parsing it, chunking it, embedding it, making it searchable, give you this ready vector store that you can use.
So that's the file search tool.
And then we're also launching our computer use tool.
So this is the tool behind the Operator product in ChatGPT.
So that's coming to developers today.
And to support all of these tools, we're going to have a new API.
So we launched Chat Completions, I think, March 2023 or so.
It's been a while.
So we're looking for an update over here to support all the new things that the models can do.
And so we're launching this new API called the Responses API.
It works with tools.
We think it will be a great option for all the future agentic products that we build.
And so that is also launching today.
Actually, the last thing we're launching is the Agents SDK.
We launched this thing called Swarm last year, where it was an experimental SDK for people to do multi-agent orchestration and stuff like that.
It was supposed to be educational, experimental, but people really loved it.
They ate it up.
And so we were like, all right, let's upgrade this thing.
Let's give it a new name.
And so we're calling it the Agents SDK.
It's going to have built-in tracing in the OpenAI dashboard.
So lots of cool stuff going out.
So yeah, sorry about that.
That's a lot.
But we said 2025 was the year of agents.
So there you have it, a lot of new tools to build these agents for developers.
OK, I guess we'll just kind of go one by one.
We'll leave the Agents SDK towards the end.
So, the Responses API.
I think the primary concern that people have, and something I voiced to you guys when I was talking with you in the planning process, was: is Chat Completions going away?
So I just wanted to let you guys respond to the concerns that people might have.
Chat Completions is definitely here to stay.
It's a bare metal API we've had for quite some time.
Lots of tools built around it.
So we want to make sure that it's maintained and people can confidently keep on building on it.
At the same time, it was kind of optimized for a different world.
It was optimized for a pre-multimodality world.
It was also optimized for kind of single-turn use: text prompt in, text response out.
And now with these agentic workflows, we noticed that developers and companies want to build longer horizon tasks, like things that require multiple turns to get the task accomplished.
And computer use is one of those, for instance.
So that's why the responses API came to life to kind of support these new agentic workflows.
But Chat Completions is definitely here to stay.
And for the Assistants API, we have a target sunset date of first half of 2026.
So this is kind of like-- in my mind, there was a very poetic mirroring of the API with the models.
I kind of view this as the merging of the Assistants API and Chat Completions, right, into one unified Responses API.
So it's kind of like how the GPT and the o-series models are also unifying.
Yeah, that's exactly the right framing, right?
Like, I think we took the best of what we learned from the Assistants API, especially being able to access tools very conveniently.
But at the same time, simplifying the way you have to integrate.
You no longer have to think about six different objects to kind of get access to these tools.
With the Responses API, you just get one API request.
And suddenly, you can weave in those tools, right?
Yeah, absolutely.
And I think we're going to make it really easy and straightforward for Assistants API users to migrate over to the Responses API without any loss of functionality or data.
So our plan is absolutely to add assistant-like objects and thread-like objects that work really well with the Responses API.
We'll also add the code interpreter tool, which is not launching today, but it'll come soon.
And we'll add async mode to the Responses API, because that's another difference with Assistants.
We'll have webhooks and stuff like that.
But I think it's going to be a pretty smooth transition once we have all of that in place.
And we'll give folks a full year to migrate and help them through any issues they face.
So overall, I feel like Assistants users are really going to benefit from this longer term with this more flexible primitive.
How should people think about when to use each type of API?
So I know that in the past, Assistants was maybe more stateful, kind of like long-running.
Many tool uses, kind of like file-based things.
And Chat Completions is more stateless, kind of like a traditional completion API.
Is that still the mental model that people should have?
Or should you, by default, always try and use the Responses API?
So the Responses API is going to support, at launch, everything that Chat Completions supports.
And then over time, it's going to support everything that Assistants supports.
So it's going to be a pretty good fit for anyone starting out with OpenAI.
They should be able to go to Responses.
Responses, by the way, also has a stateless mode.
So you can pass in store: false, and that'll make the whole API stateless, just like Chat Completions.
We're really trying to get this unification story in, so that people don't have to juggle multiple endpoints.
That being said, Chat Completions is our most widely adopted API.
It's so popular.
So we're still going to support it for years with new models and features.
But if you're a new user, or if you're an existing user and you want to tap into some of these built-in tools or something, you should feel totally fine migrating to Responses.
And you'll have more capabilities and performance than with Chat Completions.
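A minimal sketch of the stateful default versus the stateless mode described above. It only builds the JSON body for a Responses API request and makes no network call. The `store` flag comes from the conversation; the `model` and `input` field names and the `gpt-4o` model string are assumptions based on the public API reference, so check them against the current docs before relying on them.

```python
import json


def build_responses_request(prompt: str, *, stateless: bool = False) -> dict:
    """Build a request body for the Responses API (nothing is sent)."""
    # "model" and "input" are assumed field names, not quoted in the episode.
    body = {"model": "gpt-4o", "input": prompt}
    if stateless:
        # Opt out of the default 30-day storage mentioned in the episode.
        body["store"] = False
    return body


stateful = build_responses_request("Summarize today's launch.")
stateless = build_responses_request("Summarize today's launch.", stateless=True)
print(json.dumps(stateless, indent=2))
```

Leaving `store` out keeps the default behavior, which is what makes the response visible in the dashboard for debugging.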
I think the messaging that resonated the most when I talked to you was that it is a strict superset.
Like, you should be able to do everything that you could do in Chat Completions and with Assistants.
And the thing is, I just assumed that because you're now stateful by default, you're actually storing the chat logs or the chat state.
I thought you'd be charging me for it.
So to me, it was very surprising that you figured out how to make it free.
Yeah, it's free.
We store your state for 30 days.
You can turn it off.
But yeah, it's free.
Interesting thing on state is that it just makes-- particularly for me, it makes debugging things and building things so much simpler, where I can create a response object that's pretty complicated and part of this more complex application that I've built.
And I can just go into my dashboard and see exactly what happened.
Did I mess up my prompt?
Did it not call one of these tools?
Did I misconfigure one of the tools?
The visual observability of everything that you're doing is so, so helpful.
So I'm excited about people trying that out and getting benefits from it too.
Yeah, it's really, I think, really nice to have.
But all I'll say is that my friend Corey Quinn says, anything that can be used as a database will be used as a database.
So be prepared for some abuse.
[LAUGHTER] All right.
Yeah, that's a good one.
We have some of that with the metadata.
People are very, very creative at stuff that you need to have an object.
We do have metadata with responses.
Exactly.
Let's get through all of these.
So web search-- I think when I first saw web search, I thought you were going to just expose an API that then returned kind of like a nice list of things.
But the way it's named is, like, GPT-4o search preview.
So I'm guessing you're using basically the same model that is in ChatGPT search, which is fine-tuned for search.
I'm guessing it's a different model than the base one.
And the jump in performance is impressive.
So just to give an example, on SimpleQA, GPT-4o is 30% accuracy.
4o search is 90%.
We always talk about how the model is not everything you need.
Like, tools around it are just as important.
So yeah, maybe give people a quick review on the work that went into making this special.
Should I take that?
Yeah.
So first thing, we're launching web search in two ways.
One, in the Responses API, which is our API for tools, it's going to be available as a web search tool itself.
So you'll be able to go tools, turn on web search, and you're ready to go.
We still wanted to give Chat Completions people access to real-time information.
So in the Chat Completions API, which does not support built-in tools, we're launching direct access to the fine-tuned model that ChatGPT search uses.
And we call it GPT-4o search preview.
And how is this model built?
Basically, our search research team has been working on this for a while.
Their main goal is to get information, get a bunch of information from all of our data sources that we use to gather information for search, and then pick the right things and then cite them as accurately as possible.
And that's what the search team has really focused on.
They've done some pretty cool stuff.
They use synthetic data techniques.
They've done o-series model distillation to make these 4o fine-tunes really good.
But yeah, the main thing is can it remain factual?
Can it answer questions based on what it retrieves?
And can it cite it accurately?
And that's what this fine-tuned model really excels at.
And so yeah, we're excited that it's going to be directly available in Chat Completions along with being available as a tool.
Yeah, just to clarify, if I'm using the Responses API, this is a tool.
But if I'm using Chat Completions, I have to switch models.
I cannot use o1 and call search as a tool.
Yeah, that's right.
Exactly.
I think what's really compelling, at least for me and my own uses of it so far, is that when you use web search as a tool, it combines nicely with every other tool and every other feature of the platform.
So think about this for a second, for instance.
Imagine you have a Responses API call with the web search tool.
But suddenly you turn on function calling.
You also turn on, let's say, structured outputs.
Now you have the ability to structure any data from the web in real time in the JSON schema that you need for your application.
So it's quite powerful when you start combining those features and tools together.
It's kind of like an API for the internet almost.
You get access to the precise schema you need for your app.
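The combination described here, web search plus structured outputs in one request, can be sketched as a request body. This is an illustration rather than the exact launch API: the `web_search_preview` tool type and the `text.format` layout are assumptions taken from the public API reference, and the `news_item` schema is a hypothetical example.

```python
def build_structured_search_request(question: str) -> dict:
    """Request body asking for web-grounded output in a fixed JSON shape."""
    # Hypothetical schema: one headline plus the URL it was found at.
    schema = {
        "type": "object",
        "properties": {
            "headline": {"type": "string"},
            "source_url": {"type": "string"},
        },
        "required": ["headline", "source_url"],
        "additionalProperties": False,
    }
    return {
        "model": "gpt-4o",
        "input": question,
        # Assumed tool type name for the built-in web search tool.
        "tools": [{"type": "web_search_preview"}],
        # Assumed location of the structured-output setting in Responses.
        "text": {
            "format": {
                "type": "json_schema",
                "name": "news_item",
                "schema": schema,
                "strict": True,
            }
        },
    }


request = build_structured_search_request("What did OpenAI launch today?")
```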
Yeah.
And then just to wrap up on the infrastructure side of it, I read in the post that publishers can choose to appear in the web search.
So are people in it by default?
How can we get Latent Space in the web search API?
Yeah.
Yeah, I think we have some documentation around how website publishers can control what shows up in our web search tool.
And I think you should be able to read that.
I think we should be able to get Latent Space in for sure.
Yeah.
So I compare this to a broader trend that I started covering last year of online LLMs.
Actually, Perplexity, I think, was the first to offer an API that is connected to search.
And then Gemini had the search grounding API.
And I think you guys-- I actually didn't-- I missed this in the original reading of the docs.
But you even give citations with the exact sub-paragraph that is matching, which I think is the standard nowadays.
I think my question is, how do we think about what a knowledge cutoff is for something like this?
Because now, basically, there's no knowledge cutoff.
It's always live.
But then there's a difference between what the model has internalized in its backpropagation and what it's looking up via RAG.
I think it kind of depends on the use case and what you want to showcase as the source.
Like, for instance, take a company like Hebbia that has used this web search tool.
For credit firms or law firms, they can find public information from the internet with the live sources and citations that sometimes you do want to have access to, as opposed to the internal knowledge.
But if you're building something different, where you just want to have an assistant that relies on the deep knowledge that the model has, you may not need to have these direct citations.
So I think it kind of depends on the use case a little bit.
But there are many companies like Hebbia that will need that access to these citations to precisely know where the information comes from.
Yeah, yeah, for sure.
And then one thing on the breadth, I think a lot of the open deep research implementations have this sort of hyperparameter about how deep they're searching and how wide they're searching.
I don't see that in the docs, but is that something you can tune?
Is that something you recommend thinking about?
Super interesting.
It's definitely not a parameter today, but we should explore that.
It's very interesting.
I imagine how you would do it with the web search tool and the Responses API is you would have some form of agent orchestration over here where you have a planning step.
And then each web search call that you do explicitly goes a layer deeper and deeper and deeper.
But it's not a parameter that's available out of the box.
That's a cool thing to think about.
The only guidance I'll offer there is a lot of these implementations offer top K, which is top 10, top 20.
But actually, you don't really want that.
You want some kind of similarity cutoff, some matching score cutoff.
Because if there's only five documents that match, fine. If there's 500 that match, maybe that's what I want.
But also, that might make my costs very unpredictable because the costs are something like $30 per 1,000 queries.
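To make the top-K versus cutoff distinction concrete, here is a small sketch over made-up search hits. The `Hit` type, the scores, and the 0.8 threshold are all hypothetical; neither function is a parameter of the web search tool.

```python
from typing import NamedTuple


class Hit(NamedTuple):
    doc_id: str
    score: float  # similarity in [0, 1]; higher means a closer match


def top_k(hits: list[Hit], k: int) -> list[Hit]:
    """Always return k results, however weak the tail is."""
    return sorted(hits, key=lambda h: h.score, reverse=True)[:k]


def above_cutoff(hits: list[Hit], min_score: float) -> list[Hit]:
    """Return every result that clears the threshold, whether 2 or 500."""
    kept = [h for h in hits if h.score >= min_score]
    return sorted(kept, key=lambda h: h.score, reverse=True)


hits = [Hit("a", 0.91), Hit("b", 0.88), Hit("c", 0.35), Hit("d", 0.12)]
print([h.doc_id for h in top_k(hits, 3)])
print([h.doc_id for h in above_cutoff(hits, 0.8)])
```

With top-K the weak match "c" is returned just to fill the quota, while the cutoff version returns only the two strong matches. The trade-off raised in the conversation still applies: the size of the result set, and so the cost, is no longer fixed in advance.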
Yeah.
Yeah, I guess you could have some form of a context budget.
And then you're like, go as deep as you can and pick the best stuff and put it into x number of tokens.
There could be some creative ways of managing costs.
But yeah, this is a super interesting thing to explore.
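One way to picture the orchestration idea above (a planning step, searches that go a layer deeper each round, and a context budget) is the sketch below. Everything in it is hypothetical: `search` and `plan_followups` are stand-ins you would back with real web search calls and a planning model, and counting whitespace-separated words is only a crude proxy for tokens.

```python
from typing import Callable


def deepen_search(
    topic: str,
    search: Callable[[str], list[str]],
    plan_followups: Callable[[str], list[str]],
    max_depth: int = 3,
    token_budget: int = 200,
) -> list[str]:
    """Search layer by layer until the depth limit or the budget is hit."""
    collected: list[str] = []
    used = 0
    frontier = [topic]
    for _ in range(max_depth):
        next_frontier: list[str] = []
        for query in frontier:
            for snippet in search(query):
                cost = len(snippet.split())  # crude token estimate
                if used + cost > token_budget:
                    return collected  # budget exhausted: stop early
                collected.append(snippet)
                used += cost
            # The "planning step": decide what to look up one layer deeper.
            next_frontier.extend(plan_followups(query))
        frontier = next_frontier
    return collected


def fake_search(query: str) -> list[str]:
    return [f"result about {query}"]


def fake_plan(query: str) -> list[str]:
    return [f"{query} details"]


shallow = deepen_search("agents", fake_search, fake_plan, token_budget=10)
full = deepen_search("agents", fake_search, fake_plan, token_budget=200)
```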
Do you see people using file search and web search together, where you can kind of search and then store everything in a file so the next time I'm not paying for the search again?
And yeah, how should people balance that?
That's actually a very interesting question.
Let me first tell you about a really cool way I've seen people use file search and web search together: they put their user preferences or memories in the vector store.
And so a query comes in.
You use the file search tool to get someone's reading preferences or fashion preferences and stuff like that.
And then you search the web for information or products that they can buy related to those preferences.
And you then render something beautiful to show them, here are five things that you might be interested in.
So that's how I've seen file search, web search work together.
And by the way, that's a single Responses API call, which is really cool.
So you just configure these things, go boom, and everything just happens.
But yeah, that's how I've seen files and web work together.
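The single-call pattern just described, where file search pulls stored preferences and web search finds matching products, might look like the request body below. It is a sketch: the tool type names and the `vector_store_ids` field are assumptions from the public API reference, and `vs_example_123` is a placeholder ID.

```python
def build_personalized_request(user_query: str, vector_store_id: str) -> dict:
    """One request body that enables both built-in tools at once."""
    return {
        "model": "gpt-4o",
        "input": user_query,
        "tools": [
            # Preferences or memories previously uploaded to a vector store.
            {"type": "file_search", "vector_store_ids": [vector_store_id]},
            # Live product or news lookups on the open web.
            {"type": "web_search_preview"},
        ],
    }


request = build_personalized_request(
    "Suggest five books I might like that came out this month.",
    "vs_example_123",  # placeholder vector store ID
)
```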
But I think that what you're pointing out is interesting.
And I'm sure developers will surprise us, as they always do, in terms of how they combine these tools and how they might use file search as a way to have memory and preferences, like Nikunj says.
But I think zooming out, what I found very compelling and powerful here is when you have these neural nets that have all of the knowledge that they have today, plus real time access to the internet for any kind of real time information that you might need for your app, and file search, where you can have a lot of company private documents, private details, you combine those three, and you have very, very compelling and precise answers for any kind of use case that your company or your product might want to enable.
It's a difference between internal documents versus the open web, right?
Like you're going to need both.
Exactly.
Exactly.
I never thought about it doing memory as well.
I guess, again, anything that can store things, people will use it as a database.
That sounds awesome.
But I think also you've been expanding the file search.
You have more file types.
You have query optimization, custom re-ranking.
So it really seems like it's been fleshed out.
Obviously, I haven't been paying a ton of attention to the file search capability.
But it sounds like your team has added a lot of features.
Yeah, metadata filtering was the main thing people were asking us for for a while.
And that's the one I'm super excited about.
I mean, it's just so critical.
Once your vector store size goes over 5,000 or 10,000 records, you kind of need that.
So yeah, metadata filtering is coming too.
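A sketch of what metadata filtering on the file search tool could look like, together with a local function that spells out the filter semantics. The filter shape (`type`, `key`, `value`, and nested `and`/`or`) is an assumption based on the public API reference; the `department` and `year` attributes are hypothetical, and `matches` is an illustration, not how the service evaluates filters.

```python
import operator

# Comparison names assumed to mirror the API's filter types.
_OPS = {
    "eq": operator.eq, "ne": operator.ne,
    "gt": operator.gt, "gte": operator.ge,
    "lt": operator.lt, "lte": operator.le,
}


def file_search_tool(vector_store_id: str, *, department: str, min_year: int) -> dict:
    """File search tool config restricted to one department and recent files."""
    return {
        "type": "file_search",
        "vector_store_ids": [vector_store_id],
        "filters": {
            "type": "and",
            "filters": [
                {"type": "eq", "key": "department", "value": department},
                {"type": "gte", "key": "year", "value": min_year},
            ],
        },
    }


def matches(attributes: dict, flt: dict) -> bool:
    """Local illustration of how such a filter selects records."""
    kind = flt["type"]
    if kind == "and":
        return all(matches(attributes, f) for f in flt["filters"])
    if kind == "or":
        return any(matches(attributes, f) for f in flt["filters"])
    key = flt["key"]
    return key in attributes and _OPS[kind](attributes[key], flt["value"])


tool = file_search_tool("vs_example_123", department="travel", min_year=2024)
```

Filtering before similarity search is what keeps retrieval precise once a store holds thousands of records with overlapping content.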
Yeah, for most companies, it's also not a competency that you want to rebuild in-house necessarily.
Thinking about embeddings and chunking and all of that, it sounds very complex for something very obvious to ship for your users.
Companies like Navan, for instance, they were able to build with the file search.
Take all of the FAQ and travel policies, for instance, that you have.
You put that in the file search tool.
And then you don't have to think about anything.
Now your assistant becomes naturally much more aware of all of these policies from the files.
The question is, there's a very, very vibrant RAG industry already, as you all know.
So there's many other vector databases, many other frameworks.
Probably if it's an open source stack-- I'll say a lot of the AI engineers that I talked to want to own this part of the stack.
And it feels like, when should we DIY?
And when should we just use whatever OpenAI offers?
Yeah, I mean, if you're doing something completely from scratch, you're going to have more control, right?
So super supportive of people trying to roll up their sleeves, build their super custom chunking strategy and super custom retrieval strategy and all of that.
And those are things that will be harder to do with OpenAI's tools.
With OpenAI's tool, we have an out-of-the-box solution.
We give you some knobs to customize things.
But it's more of a managed RAG service.
So my recommendation would be start with the OpenAI thing, see if it meets your needs.
And over time, we're going to be adding more and more knobs to make it even more customizable.
But if you want the completely custom thing, you want control over every single thing, then you'd probably want to go and hand roll it using other solutions.
So we're supportive of both.
Engineers should pick.
Yeah.
And then we've got computer use. I think Operator was obviously one of the hot releases of the year.
And we're only 2 on 10.
Let's talk about that.
And that's also, it seems, a separate model that has been fine-tuned for Operator that has Projectus.
Yeah, absolutely.
I mean, the computer use model is exciting.
The cool thing about computer use is that we're just so, so early.
It's like the GPT-2 of computer use, or maybe GPT-1 of computer use right now.
But it is a separate model that the computer use team has been working on.
You send it screenshots, and it tells you what action to take.
So the outputs of it are almost always tool calls.
And you're inputting screenshots based on whatever computer you're trying to operate.
Maybe zooming out for a second, because I'm sure your audience is super, super like AI native, obviously.
But what is computer use as a tool, right?
And what's Operator?
So the idea for computer use is how do we let developers also build agents that can complete tasks for the users but using a computer or a browser instead?
And so how do you get that done?
And so that's why we have this custom model optimized for computer use that we use for Operator ourselves.
But the idea behind putting it as an API is that imagine now you want to automate some tasks for your product or your own customers, then now you can have the ability to spin up one of these agents that will look at the screen and act on the screen.
So that means the ability to click, the ability to scroll, the ability to type, and to report back on the action.
So that's what we mean by computer use and wrapping it as a tool also in the Responses API.
So now that gives a hint also at the multi-turn thing that we were hinting at earlier, the idea that, yeah, maybe one of these actions can take a couple of minutes to complete because there's maybe 20 steps to complete that task, but now you can.
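The screenshot-in, action-out loop described above can be sketched with stand-ins for both sides. `FakeBrowser` and `scripted_model` are hypothetical stubs so the loop runs offline; a real integration would send each screenshot to the computer use model and execute the returned action in a real browser or VM. The action dictionaries are illustrative and not the API's exact schema.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class FakeBrowser:
    """Stub environment that records actions instead of performing them."""
    log: list = field(default_factory=list)

    def screenshot(self) -> bytes:
        return f"screen-after-{len(self.log)}-actions".encode()

    def perform(self, action: dict) -> None:
        self.log.append(action)


def scripted_model(screenshot: bytes, step: int) -> Optional[dict]:
    """Stand-in for the model: returns the next action, or None when done."""
    script = [
        {"type": "click", "x": 120, "y": 48},
        {"type": "type", "text": "flights to Paris"},
        {"type": "scroll", "scroll_y": 400},
    ]
    return script[step] if step < len(script) else None


def run_task(
    computer: FakeBrowser,
    model: Callable[[bytes, int], Optional[dict]],
    max_steps: int = 20,
) -> int:
    """Multi-turn loop: look at the screen, act, repeat. Returns steps taken."""
    for step in range(max_steps):
        action = model(computer.screenshot(), step)
        if action is None:  # the model reports the task is complete
            return step
        computer.perform(action)
    return max_steps  # safety cap so a confused agent cannot loop forever


browser = FakeBrowser()
steps_taken = run_task(browser, scripted_model)
```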
Do you think computer use can play Pokemon?
Oh, interesting.
I guess we should try it.
There's a lot of interest.
I think Pokemon really is a good agent benchmark, to be honest.
It seems like Claude is running into a lot of trouble.
Sounds like we should make it a new eval, it looks like.
Yeah, yeah, yeah.
Oh, and then one more thing before we move on to the Agents SDK, I know you have a hard stop.
There's obviously blah, blah, dash preview, right?
Search preview, computer use preview, right?
And they're all fine-tunes of 4o.
I think the question is, are they all going to be merged into the main branch, or are we basically always going to have subsets of these models?
Yeah, I think in the early days, research teams that are over there operate with fine-tuned models.
And then once the thing gets more stable, we sort of merge it into the main line.
So that's definitely the vision, going out of preview as we get more comfortable with and learn about all the developer use cases.
And we're doing a good job at them.
We'll sort of make them part of the core models so that you don't have to deal with the bifurcation.
You should think of it this way as exactly what happened last year when we introduced vision capabilities.
Vision capabilities were in a vision preview model based off of GPT-4.
And then vision capabilities now are obviously built into GPT-4o.
You can think about it the same way for the other modalities like audio and those kind of models, like optimized for search and computer use.
Agents SDK, we have a few minutes left.
So let's just assume that everyone has looked at Swarm.
I think that Swarm has really popularized the handoff technique, which I thought was really, really interesting for sort of a multi-agent world.
What is new with the SDK?
Yeah, for sure.
So we've basically added support for types.
We've added support for guardrails, which is a very common pattern.
So in the guardrail example, you basically have two things happen in parallel.
The guardrail can sort of block the execution.
It's a type of optimistic generation that happens.
And I think we've added support for tracing.
So you can basically look at the traces that the Agents SDK creates in the OpenAI dashboard.
We also made this pretty flexible.
So you can pick any API from any provider that supports the Chat Completions API format.
So it supports Responses by default, but you can easily plug it into anyone that uses the Chat Completions API.
And similarly, on the tracing side, you can support multiple tracing providers.
By default, it sort of points to the OpenAI dashboard.
But there's so many tracing companies out there, and we'll announce some partnerships on that front, too.
So just like adding lots of core features and making it more usable, but still centered around handoffs is the main, main concept.
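The guardrail pattern described above, where the check and the generation run in parallel and the guardrail can block the result, can be sketched with plain `asyncio`. This is not the Agents SDK API: `guardrail`, `generate`, and `guarded_run` are hypothetical names, and the sleeps stand in for a fast check and a slower model call.

```python
import asyncio


class GuardrailTripped(Exception):
    """Raised when the input fails the safety check."""


async def guardrail(user_input: str) -> None:
    await asyncio.sleep(0.01)  # stands in for a fast, cheap check
    if "password" in user_input.lower():
        raise GuardrailTripped("input rejected")


async def generate(user_input: str) -> str:
    await asyncio.sleep(0.05)  # stands in for a slower model call
    return f"answer to: {user_input}"


async def guarded_run(user_input: str) -> str:
    # Start generating optimistically, before the guardrail has finished.
    gen_task = asyncio.create_task(generate(user_input))
    try:
        await guardrail(user_input)
    except GuardrailTripped:
        # Block the execution: cancel the in-flight generation and discard it.
        gen_task.cancel()
        await asyncio.gather(gen_task, return_exceptions=True)
        return "blocked"
    return await gen_task


print(asyncio.run(guarded_run("What is the Agents SDK?")))
print(asyncio.run(guarded_run("Tell me the admin password")))
```

Starting generation before the check finishes hides the guardrail's latency on the common path, at the cost of some wasted work when the check fails.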
And by the way, it's interesting, right?
Because Swarm just came to life out of learning from customers directly that orchestrating agents in production was pretty hard.
You know, simple ideas could quickly turn very complex, like one of those guardrails, one of those handoffs, et cetera.
So that came out of learning from customers and was initially shipped as a low-key experiment, I'd say.
But we were kind of taken by surprise at how much momentum there was around this concept.
And so we decided to learn from that and embrace it.
To be like, OK, maybe we should just embrace that as a core primitive of the OpenAI platform.
And that's kind of what led to the Agents SDK.
And I think now, as Nikunj mentioned, it's like adding all of these new capabilities to it, like leveraging the handoffs that we had, but tracing also.
And I think what's very compelling for developers is instead of having one agent to rule them all and you stuff a lot of tool calls in there that can be hard to monitor, now you have the tools you need to separate the logic.
You can have a triage agent that, based on an intent, goes to different kind of agents.
And then on the OpenAI dashboard, we're releasing a lot of new user interface logs as well.
So you can see all of the tracing UIs.
Essentially, you'll be able to troubleshoot what exactly happened in that workflow when the triage agent did a handoff to a secondary agent and the third and see the tool calls, et cetera.
So we think that the Agents SDK combined with the tracing UIs will definitely help users and developers build better agentic workflows.
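The triage-and-handoff idea can be illustrated without the SDK itself. The sketch below is a plain-Python stand-in, not the Agents SDK API: the `Agent` dataclass, the keyword-based `route` function, and the `trace` list are all hypothetical, and a real triage agent would use a model to pick the intent.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class Agent:
    name: str
    handle: Callable[[str], str]
    handoffs: dict = field(default_factory=dict)  # intent -> Agent
    route: Optional[Callable[[str], Optional[str]]] = None


def run(agent: Agent, message: str, trace: list) -> str:
    """Run an agent, following handoffs and recording each hop in the trace."""
    trace.append(agent.name)
    intent = agent.route(message) if agent.route else None
    if intent in agent.handoffs:
        # Hand the conversation to the specialist for this intent.
        return run(agent.handoffs[intent], message, trace)
    return agent.handle(message)


def keyword_route(message: str) -> Optional[str]:
    """Toy intent detection: first known keyword found in the message."""
    return next((k for k in ("refund", "invoice") if k in message.lower()), None)


refunds = Agent("refunds", lambda m: "refund started")
billing = Agent("billing", lambda m: "invoice sent")
triage = Agent(
    "triage",
    lambda m: "how can I help?",
    handoffs={"refund": refunds, "invoice": billing},
    route=keyword_route,
)

trace: list = []
reply = run(triage, "I want a refund for my order", trace)
```

The `trace` list plays the role the dashboard tracing UI plays in the conversation: it shows which agent handled the message and where the handoff happened.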
And just before we wrap, are you thinking of connecting this with also the RFT API?
Because I know you already have-- you kind of store my chat completions, and then I can do fine-tuning on that.
Is that going to be similar for agents where you're storing kind of like my traces and then help me improve the agents?
Yeah, absolutely.
Like, you've got to tie the traces to the evals product so that you can generate good evals.
Once you have good evals and graders and tasks, you can use that to do reinforcement fine-tuning.
And lots of details to be figured out over here, but that's the vision.
And I think we're going to go after it pretty hard and hope we can make this whole workflow a lot easier for developers.
Awesome.
Thank you so much for the time.
I'm sure you'll be busy on Twitter tomorrow with all the developer feedback.
Yeah, thank you so much for having us.
And as always, we can't wait to see what developers will build with these tools and how we can learn as quickly as we can from them to make them even better over time.
Awesome.
Thank you, guys.
Thank you.
Thank you, both.
Awesome.
[MUSIC PLAYING]