Lenny's Podcast · 2025-04-10

OpenAI CPO Kevin Weil on AI Product Strategy, Evals, and the Future of Building

Hosts: Lenny Rachitsky

Guests: Kevin Weil

AI product strategyevals and model evaluationstartup opportunitiesfine-tuning modelschat interface designreasoning modelsvibe codingfuture of workAI and educationmodel maximalismLibra cryptocurrency

Why it matters

Writing evals is becoming a core skill for PMs and builders.

Key claims

  • Today's AI models are the worst you'll ever use—OpenAI operates on a 2-month improvement cadence that fundamentally changes product strategy
  • Writing evals is becoming a core skill for PMs and builders; products must be designed around whether a model hits 60%, 95%, or 99%+ accuracy on the use case
  • OpenAI deliberately won't pursue most vertical/industry-specific applications—3M+ developers using their API represents massive startup opportunity for fine-tuned, domain-specific products
  • Model maximalism: build products right at the edge of current model capabilities because better models will arrive shortly and make them sing

Episode summary

Summary

Kevin Weil, OpenAI's Chief Product Officer, joins Lenny's Podcast for a wide-ranging discussion on how AI is reshaping product development and what it means to build inside the world's most-watched AI lab. Weil explains that OpenAI operates on the principle of "model maximalism"—the current AI is the worst you'll ever use, with capabilities expanding every few months—which pushes teams to build for where models are headed rather than where they are today. He describes a bottoms-up culture with intentionally PM-light teams (~25 PMs), iterative deployment, and a strong working partnership between research and product functions.

  • Today's AI models are the worst you'll ever use—OpenAI operates on a 2-month improvement cadence that fundamentally changes product strategy
  • Writing evals is becoming a core skill for PMs and builders; products must be designed around whether a model hits 60%, 95%, or 99%+ accuracy on the use case
  • OpenAI deliberately won't pursue most vertical/industry-specific applications—3M+ developers using their API represents massive startup opportunity for fine-tuned, domain-specific products
  • Model maximalism: build products right at the edge of current model capabilities because better models will arrive shortly and make them sing
  • Chat is the ideal interface for AI because it's how humans already communicate with full flexibility—rigid interfaces would limit the bandwidth of interaction
  • Fine-tuned models and researcher/ML engineer roles will become embedded in nearly every product team, similar to how Cursor and Windsurf use ensembles of specialized models
  • Weil is PM-light on purpose (~25 PMs); looks for high-agency, ambiguity-tolerant PMs who can build rapport with research teams and lead through influence
  • On kids: prioritize curiosity, independence, and thinking skills over specific technical knowledge since the future is unknowable; AI tutoring is the most underexplored world-changing opportunity

Source material

Transcript

important thing about OpenAI is that it is the worst AI model you will ever use for the rest of your life.

And when you actually get that in your head, it's kind of wild.

Everywhere I've ever worked before this, you kind of know what technology you're building on.

But that's not true at all with AI.

Every two months, computers can do something they've never been able to do before, and you need to completely think differently about what you're doing.

You're chief product officer of maybe the most important company in the world right now.

I want to chat about what it's just like to be inside the center of the storm.

Our general mindset is in two months there's going to be a better model and it's going to blow away whatever the current set of limitations are.

And we say this to developers too.

If you're building and the product that you're building is kind of right on the edge of the capabilities of the models, keep going because you're doing something right.

Give it another couple months and the models are going to be great.

And suddenly the product that you have that just barely worked is really going to sing.

Famously, you led this project at Facebook called Libra.

Libra is probably the biggest disappointment of my career.

It fundamentally disappoints me that this doesn't exist in the world today because the world would be a better place if we'd been able to ship that product.

We tried to launch a new blockchain.

It was a basket of currencies originally.

It was integration into WhatsApp and Messenger.

I would be able to send you 50 cents in WhatsApp for free.

It should exist.

To be honest, the current administration is super friendly to crypto.

Facebook's reputation is in a very different place.

Maybe they should go build it now.

Today, my guest is Kevin Wheel.

Kevin is chief product officer at OpenAI, which is maybe the most important and most impactful company in the world right now being at the forefront of AI and AGI and maybe someday super intelligence.

He was previously head of product at Instagram and Twitter.

He was co-creator of the Libra cryptocurrency at Facebook, which we chat about.

He's also on the boards of Planet and Strava and the Black Product Managers Network and The Nature Conservancy.

He's also just a really good guy and he has so much wisdom to share.

We chat about how OpenAI operates, implications of AI and how we will all work and build product, which markets within the AI ecosystem companies like OpenAI won't likely go after and thus are good places for startups to own.

Also, why learning the craft of writing evals is quickly becoming a core skill for product builders, what skills will matter most in an AI era and what he's teaching his kids to focus on and so much more.

This is a very special episode and I'm so excited to bring it to you.

If you enjoy this podcast, don't forget to subscribe and follow it in your favorite podcasting app or YouTube.

If you become an annual subscriber of my newsletter, you get a year free of Proplexity Pro, Linear, Notion, Superhuman and Renola.

Check it out at Lenny's newsletter.com and click bundle.

With that, I bring you Kevin Wheel.

This episode is brought to you by EPPO.

EPPO is a next-generation A/B testing and feature management platform built by alums of Airbnb and Snowflake for modern growth teams.

Companies like Twitch, Miro, ClickUp and DraftKings rely on EPPO to power their experiments.

Experimentation is increasingly essential for driving growth and for understanding the performance of new features.

And EPPO helps you increase experimentation velocity while unlocking rigorous deep analysis in a way that no other commercial tool does.

When I was at Airbnb, one of the things that I loved most was our experimentation platform where I could set up experiments easily, troubleshoot issues and analyze performance all on my own.

EPPO does all that and more with advanced statistical methods that can help you shave weeks off experiment time and accessible UI for diving deeper into performance and out of the box reporting that helps you avoid annoying prolonged analytic cycles.

EPPO also makes it easy for you to share experiment insights with your team, sparking new ideas for the A/B testing flywheel.

EPPO powers experimentation across every use case, including product, growth, machine learning, monetization and email marketing.

Check out EPPO at getepo.com/leni and 10X your experiment velocity.

That's geteppo.com/leni.

This episode is brought to you by Persona, the adaptable identity platform that helps businesses fight fraud, meet compliance requirements and build trust.

While you're listening to this right now, how do you know that you're really listening to me, Lenny?

These days, it's easier than ever for fraudsters to steal PII, faces and identities.

That's where Persona comes in.

Persona helps leading companies like LinkedIn, Etsy and Twilio securely verify individuals and businesses across the world.

What sets Persona apart is its configurability.

Every company has different needs, depending on its industry, use cases, risk tolerance and user demographics.

That's why Persona offers flexible building blocks that allow you to build tailored collection and verification flows that maximize conversion while minimizing risk.

Plus, Persona's orchestration tools automate your identity process so that you can fight rapidly shifting fraud and meet new waves of regulation.

Whether you're a startup or an enterprise business, Persona has a plan for you.

Learn more at www.withpersona.com/leni.

Again, that's www.withpersona.com/leni.

Kevin, thank you so much for being here and welcome to the podcast.

Thank you so much for having me.

We've been talking about doing this forever.

We made it happen.

We did it.

I can't imagine how insane your life is.

I really appreciate you that you made time for this.

And we're actually recording this the week that you guys launched your new image model, which is a happy coincidence.

My entire social feed is filled with skibbly vacations of everyone's life and family photos and everything.

So good job.

Yep, mine too.

My wife, Elizabeth, sent me one of hers.

So I'm right there with you.

Let me just ask, did you guys expect this kind of reaction?

It feels like this is the most viral thing that's happened in AI, which is a high bar since I don't know, Jet GPT launched.

Just like, did you guys expect it to go this well?

What does it feel like internally?

You know, there have been a handful of times in my career when you're working on a project or product internally, and the internal usage just explodes.

This was true, by the way, when we were building stories at Instagram, more than anything else in my career, we could feel it was going to work because we were all using it internally.

And we'd go away for a weekend, you know, before it launched, we were all using it and we come back after a weekend and we would know what was going on and be like, Oh, hey, I saw you were at that camping trip.

How was that?

You were like, man, this thing really works.

Image Gen was definitely one of those.

So we've been playing with it for I don't know, couple months.

And when it first went live internally to the company, there was kind of a little gallery where you could generate your own, you could also see what everyone else was generating.

And it was just like nonstop buzz.

So yeah, we had a sense that this was going to be a lot of fun for people to play with.

That's a really cool, like, that should be a measure of just like confidence in something going well that you're launching is internally, everyone's going crazy for it.

Yeah, especially social things, because yeah, you have a very tight network as a company, socially, so you know each other and your experts in your product, hopefully.

And so there's some sense in which if you're doing something social, and it's not taking off internally, you might you might question what you're doing.

Yeah.

And by the way, the Ghibli thing, is that something you're seated?

Or how did that even start?

Was that like an intentional example?

I think it's just the style people love.

And yeah, model is is really capable at, at emulating style or understanding what you know, it's very good at instruction following, that's actually something that I think people, I'm starting to see people discover with it.

But you do very complex things, you can give it two images, you know, what is your living room, and the other is a whole bunch of photos or memorabilia things you want.

And you say like, tell me how you would arrange these things.

Or you can say, I'd like you to show me what this will look like if you put this over here and this thing to the right of that and this one to the left of this, but under that one.

And the model actually will understand all of that and do it.

It's incredibly powerful.

So I'm, I'm, I'm just excited about all the different things people are going to figure out.

Yeah.

All right.

Well, good job.

Good job, team open AI.

Let's get serious here.

And let's kind of zoom out a little bit.

The way I see it is your chief product officer of maybe the most important company in the world right now.

Just not to set the bar too high, but you guys are ushering in AI, AGI, at some point super intelligence at some point, no big deal.

I've had I have more questions for you than I've had for any other guest actually put out a call out on Twitter and LinkedIn and my community just like what would you want to ask Kevin and 300 over 300 well formed questions and we're going to go through every single one.

So let's just get started.

I'm just joking.

I picked out the best and there's a lot of stuff I'm really curious about.

It's it's 1pm here.

It doesn't get dark for a while.

So let's do it.

Okay, here we go.

Okay.

So first of all, I'm just going to take notes here.

When is AGI launching?

When does this I mean, we just launched a good image gen model.

Does that count?

It's, it's getting there.

It's getting there.

There's this.

There's this quote I love, which is AI is whatever hasn't been done yet.

Because once it's been done when it kind of works, then you call it machine learning.

And once it's kind of ubiquitous, and it's everywhere, then it's just an algorithm.

So I've always loved that, that we call things AI when they still don't quite work.

And then, you know, by the time it's like, an AI algorithm that's recommending you follow, you know, oh, that's just an algorithm.

But this new thing like self driving cars, that's some degree, we're always going to be there.

And the next thing is always going to be AI.

And the current thing that we, you know, use every day and is just a part of our lives.

That's an algorithm.

It's so interesting.

Because yeah, like, in the Bay Area, you see self driving cars driving around.

And so normal now, when like, four years ago, and I know three years ago, you would have thought you would have seen this and you'd be like, holy shit, what is how we're in the future.

And now we're just so taken for granted.

It's I mean, there's something like that with everything if I showed you it when GPT three launched, right, I wasn't at open AI, then I was just I was just a user.

But it was mind blowing.

And if I gave you GPT three, now I just plug that into chat GPT for you and you started using it, you'd be like, what is this thing?

I like this like mess.

There's I had the same experience when I when I first got into a Waymo, right, your your very first ride, at least my very first ride, my first like 10 seconds in a Waymo, it starts driving and you're like, oh my god, watch out for that bike.

You're holding on to whatever you can.

And then like, five minutes in, you've calmed down.

And you realize that you're getting driven around the city without a driver.

And it's working.

You're just like, oh my god, I am living in the future right now.

And then like another 10 minutes, you're bored, you're doing email on your phone, answering slack messages.

And you know, suddenly, this miracle of human invention is just an expected part of your life from then on.

And there is really something in the way that we all are adapting to AI that's kind of like that these miraculous things happen and computers can do something they've never been able to do before.

And it blows our mind collectively for like a week.

And then we're like, oh, yeah, like, oh, yeah, now it's just machine learning on its way to being an algorithm.

The craziest thing about what you just shared actually is like, I don't know chat GPT, which is like now feels terrible.

3.5 was like a couple years ago.

And imagine what life will be like in a couple years from now.

We're gonna get to that where things are going, what you think is going to be the next big leap.

But I want to start with the beginning of your journey at OpenAI.

So you worked at Twitter, you worked at Facebook, you worked at planet Instagram.

At some point, you got recruited to go and come work at OpenAI.

I'm curious just what that story was like of the recruiting process of joining open open AI as CPO.

Is there any, are there any fun stories there?

If I'm running remembering the timeline, right, we communicated planet I was leaving.

And I was planning to just go take some time, you know, like I wasn't going to stop working.

But but I was also happy to take the summer.

This is like maybe April or something.

It was like, cool, I'm gonna have the summer with my kids, we're gonna, you know, go to Tahoe or something.

And I'll actually get to hang out rather than what I usually do going up and down and all that.

And then Sam and I had known each other lightly for a bunch of years.

And he's, he's always involved in so many interesting things, you know, like, companies building fusion and all these things.

So he'd always been somebody that I would like call occasionally, if I was starting to think about my next thing.

Because I like working on big like tech forward sort of, you know, next, next wave kind of things.

And, and so I called him, I think Vinod also helped put us in touch again.

And, and this time, it wasn't like, oh, you should go talk to like these guys working on fusion.

It, he said, actually, yeah, we're thinking about something, you should come talk to us.

I was like, Okay, that sounds amazing.

Let's do it.

And it goes really fast, really, really fast.

Like I met, you know, most of the management team in a brief period of time, a few days.

And they were telling me, look, we're gonna, we're basically gonna move as fast as we, as we want to move.

And it kind of if ever, if you talk to everyone, everyone likes you ready to go.

Sam came over for dinner.

And we had we had a great evening together, just like talking about open AI in the future and getting to know each other better.

And at the end, I was like, I was going to go in the next day for like a bigger round of interviews.

And Sam was saying, you know, hey, it's going really well, we're really excited.

And I said, Cool, so how do I think about tomorrow?

And he said, Oh, you'll be fine.

Don't worry about it.

And if it goes well, like we're basically there.

And so I go in the next day, meet a bunch of people, have a great time.

Like I really enjoyed everybody I met with.

In any interview, you can always second guess yourself, you know, like, oh, I shouldn't have said that thing.

Or I that thing I gave a bad answer on, I wish I could redo.

But I came away feeling like I think that went pretty well.

And I was expecting to hear like that weekend, basically, because they'd sort of set expectations, so as you know, if this goes well, we're ready to go.

And I didn't hear anything.

And then it was like Monday, Tuesday, Wednesday, I still didn't hear anything.

And I reached out to to folks on the open AI side a couple of times still nothing.

And I was like, Oh, my God, I screwed it up.

Like, I don't know where I screwed it up.

But I totally screwed it up.

I can't believe it.

And I was going back to Elizabeth, my wife and being like, What did I do?

Like, where Where do you think I, you know, getting all crazy about it.

And then it's still nothing.

And finally, it was like, it was like nine days later, they finally got back to me.

And it turned out, you know, there was like a bunch of stuff happening internally, and this, that and the other thing.

And, you know, there's just a million things happening.

And they finally were like, Oh, yeah, that went well, let's do this.

I was like, Oh, okay, cool.

Let's do it.

But it was like nine days of agony.

And they were just super busy on some internal stuff.

And there I was like fretting every single day and re re going over every line of our interview process.

It makes me think about when you're like dating someone and you've texted them and they just you're not hearing anything back and all like you assume something is wrong.

Yeah, totally.

They might just be busy.

I give them a hard time about it still.

So that's wild.

I love it.

I love that it worked out.

And I guess I guess the lesson there is don't don't jump to conclusions.

Yeah, have a little bit of chill.

Speaking of that, I want to chat about what it's just like to be inside the center of the storm.

Again, you worked at a lot of let's say traditional companies, even though they're not that traditional Twitter and Instagram and Facebook and planet.

And now you work at OpenAI.

I'm curious what is most different about how things work in your day to day life at OpenAI.

I think it's probably the pace.

Maybe it's two things.

One is it's the pace.

The second is, you know, everywhere I've ever worked before this, you kind of know what technology you're building on.

So you spend your time thinking about what what problems are you solving?

Who are you building for?

You know, how are you going to make their lives better?

How are you going to is this a big enough problem that you're going to be able to to change habits?

You know, do people care about this problem being solved?

All those like good product things.

But the stuff that you're building on is like kind of fixed, you know, you're talking about databases and things.

And I bet the database you use this year is probably 5% better than the database you used two years ago.

But that's not true at all with AI.

It's like every two months, computers can do something they've never been able to do before.

And you need to completely think differently about what you're doing.

There's like something fundamentally interesting about that makes life fun here.

There's also something you know, we'll maybe like talk about eviles later.

But it also really in this world of everything we're used to with computers is about giving a computer very defined inputs.

You know, if you look at Instagram, for example, there are buttons that do specific things and you know what they do.

And then when you give a computer defined inputs, you get very defined outputs.

You're confident that if you do the same thing three times, you're going to get the same output three times.

LLMs are completely different than that, right?

They're good at fuzzy, subtle inputs, then all the nuances of human language and communication, they're pretty good at.

And also, they don't really give you the same answer, you probably get spiritually the same answer for the same question, but it's certainly not the same set of words every time.

And so you're much more it's fuzzier inputs and fuzzier outputs.

And it when you're building products, it really matters with whether you know, there's some use case that you're trying to build around.

If the model gets it right 60% of the time, you build a very different product than if the model gets it right 95% of the time versus if the model gets it right 99.5% of the time.

And so there's also something you have to get really into the weeds on your use case and the evals and things like that, in order to understand the right kind of product to build.

So that is just fundamentally different.

You know, if your database works once, it works every time.

And that's not true in this world.

Let's actually follow this thread on evals.

I definitely wanted to talk about this.

So we had this legendary panel at the Lenny and Friends Summit was you and Mike Krieger and Sir Guo moderating.

So fun.

And the thing that I heard that kind of stuck with people from that panel was a comment you made, where you said that writing evals is going to become a core skill for product managers.

Yeah, and I feel like that probably applies further than just product managers.

A lot of people know what evals are a lot of people have no idea what I'm talking about.

So could you just briefly explain what is an eval?

And then just why do you think this is going to be so important for people building products in the future?

Yeah, sure.

I think the easiest way to think about it is almost like a quiz for a model a test to gauge how much it how well it knows a certain set of subject material or how good it is at responding to a certain set of questions.

So in the same way, you know, you take a calculus class, and then you have calculus tests that see if you're, you've learned what you're supposed to learn, you have evals that test, how good is the model at at creative writing?

How good is the model at that, you know, graduate level science?

How good is the model at competitive coding?

And so you have these set of evals that basically, you know, perform as benchmarks for how smart or capable the model is.

Is like a simple way to think about like unit tests for unit tests, tests in general for models totally.

Great, great.

Okay.

And then, why is this so important for people that don't totally understand what the hell is going on here with evals?

Why is it so so key to building AI products?

Well, it gets back to what I was saying, you need to know whether your model is going to there are certain things that models will get right, 99.95% of the time, and you can just be confident.

There are things that they're going to be 95% right on and things they're going to be 60% right on.

If the model 60% right on something, you're going to need to build your product totally differently.

And by the way, these things aren't static either.

So a big part of evals is, if you know, you're you're building for some use case.

So let's say let's take our deep research product, which is one of my favorite things that we've released, maybe ever.

Right, the idea is, with deep research for people who haven't used it, you can give chat GPT now a an arbitrarily complex query, like, it's not about returning you an answer from, you know, a search query, which we can also do, it's, it's, here's a thing that if you were going to answer it yourself, you'd go off and do, you know, two hours of reading on the web, and then you might need to read some papers, and then you would come back and start writing up your thoughts and realize you had some gaps in your thinking.

So you go out and do more research, you might, it might take you a week to write some like 20 page answer to this question, you can let chat GPT just like chug for you for 2530 minutes, you know, it's not the immediate answers you're used to, but it might go work for 2530 minutes and do work that would have taken you a week.

So as we were building that product, we were designing evals, sort of it at the same time as we were thinking about how this product was going to work, and we were trying to go through, like hero use cases, you know, here's a question you want to be able to ask, here's an amazing answer for that question.

And, and then turning those into evals, and, and then hill climbing on those evals.

So it's not just that the model is static, and we hope it does okay on a certain set of things, you can teach the model, you can make this a continuous learning process.

And so as we were fine tuning our model for deep research to be able to answer these things, we were able to test is it getting better on these evals that we said were important measures of how the product was working.

And it's when you start seeing that and you start seeing performance on evals going up, you start saying, Okay, I think we have a product here.

You made a kind of a comment along these same lines around evals that that AI is almost like capped in how amazing it can be by that how good we are at evals.

Does that resonate any more thoughts along those lines?

These I mean, these models are, are their intelligences and intelligence is so fundamentally multidimensional.

So you can talk about a model being amazing at competitive coding, which may not be the same as that model being great at front end coding or back end coding or taking a whole bunch of code that's written in cobalt and turning it into Python, you know, like, and that's just within the software engineering world.

And so I think there's a sense in which you can think of these models as incredibly smart, very like factually aware intelligences, but still most of the world's data knowledge process is, is not public.

It's behind the walls of companies or governments or other things.

And same way, if you were going to join a company, you would spend your first two weeks onboarding, you'd be learning the company specific processes, you'd get access to company specific data.

It's you can teach these models, the models are smart enough, you can teach them anything, but they need to have the sort of the raw data to, to learn from.

And so there's a, there's a sense in which, yeah, I think the future is really going to be incredibly smart, broad based models that are fine tuned and, and, and tailored with company specific or use case specific data so that they perform really well on company specific or use case specific things.

And you're going to measure that with custom evals.

And so, you know, what I, what I was referring to is just like these models are really smart, you need to still teach them things if the data is not in their training set.

And there's a huge amount of use cases that are not going to be in their training set, because they're relevant to one industry or one company.

I'm just going to keep following the thread that you're leading us down.

But I'm going to come back because I have more questions around some of these things.

So you came to a space that I think a lot of AI founders are thinking about is just where's opening I not going to come squash me in the future, or one of the other foundational models.

And so it's unclear to a lot of people just like should I build a startup in the space or not?

Is there any advice you have or any guidance for where you think opening I, or just foundational models in general likely won't go and where you have an opportunity to build a company?

Well, one of my so this is something that Ev Williams used to say, back at Twitter, that's always stuck with me, which is, no matter, no matter how big your company gets, no matter how like incredible the people are, there are way more smart people outside your walls than there are inside your walls.

And that's why we are so focused on building a great API.

We have 3 million developers using our API.

No matter how ambitious we are, how big we grow, by the way, we don't want to grow super big.

There are going to be there are so many use cases places in the world where AI can fundamentally make our lives better.

We're not going to have the people we're not going to have the, you know, the know how to build most of these things.

And I think like I was saying, the data is, is industry specific use case specific, you know, behind certain company walls, things like that.

And there are immense opportunities in every industry in every vertical in the world to go build AI based products that improve upon the state of the art.

And there's just no way we could ever do that ourselves.

We don't want to we couldn't if we did want to.

And we're really excited to power that for 3 million plus developers and way more in the future.

Coming back to your earlier point about the tech changing constantly and getting faster, not exactly knowing what you'll have by the time you launch something in terms of the power of the model.

I was I'm curious what allows you to ship quickly and consistently and such great stuff.

And it sounds like one answer is bottoms up empowered teams versus a very top down roadmap that's you know, planned out for a quarter.

What what are some of those things that allow you to ship such great stuff so often so quickly?

Yeah, I mean, we try and we try and have a sense of where we're trying to go, you know, point ourselves in a direction so that we have some rough sense of alignment.

Like thematically, I don't for a second and we do quarterly road mapping, you know, we laid out sort of a year long strategy, I don't for a second believe that what we write down in these documents is what we're going to actually ship, you know, three months from now, let alone six or nine.

But that's okay.

There's a I think it's like an Eisenhower quote, plans are useless planning is helpful, which I totally subscribe to, especially in this world.

It's really valuable if you think about quarterly road mapping, for example, it's really valuable to have a moment where you stop and go, okay, what did we do?

What worked?

What went well?

What didn't go well?

What did we learn?

And now what do we think we're going to do next?

And by the way, everybody has some dependencies, you know, you need the infrastructure team to do the following things, partnership with research here.

And so you want to have a second to kind of check your dependencies, make sure you're good to go and then start executing.

We try and keep that really lightweight, because it's not gonna be right.

You know, we're gonna throw it out halfway because we will have learned new things.

So the moment of planning is helpful, even if you're only gonna, you know, it's only partially right.

So that's, I think, be just expecting that you're going to be super agile.

And there's no sense writing a three month roadmap, let alone a year long roadmap, because the technology is changing underneath you so quickly.

We really do try and go like very strongly bottoms up, kind of subject to our overall directional alignment.

We have great people.

We have engineers and PMs and designers and researchers who are passionate about the products they're building, and have strong opinions about them, and are also the ones building them.

And so they're they have a, they have a real sense of what the capabilities are, too, which is super important.

And so I think you want to be more bottoms up in this way.

And so we operate that way.

We are happy making mistakes.

We make mistakes all the time.

It's one of the things I really appreciate about Sam, he pushes us really hard to move fast.

But he also understands that with moving fast comes, we didn't quite get this right, or, you know, we launched this thing, it didn't work, we'll roll it back.

You know, look at our naming, our naming is horrible.

There's a lot of questions people had for you.

Yeah, model names.

Yeah, it's absolutely atrocious.

And we know it.

And we'll get around to fixing it at some point.

But it's not the most important thing.

And so we don't spend a lot of time on it.

But it also shows you how it doesn't matter.

Again, chat, GPT, the most popular, fastest growing product in history, models are it's the number one AI, API and model.

So clearly, it doesn't matter that much.

And we name things like 03 mini high.

Oh, man, I love it.

Okay, so you talked about road mapping, and bottoms up.

And I'm really curious how you is there like a cadence or ritual of aligning with you or Sam or he or you review everything that's going out?

Like, is there a meeting every week or every month where you guys see what's happening on key projects?

So we do product reviews and things like that, like you would expect.

There isn't a ritual because there isn't we, I would never want us to be blocked on launching something, you know, waiting for a review with me or Sam if we can't get there, if I'm traveling or Sam's, you know, busy or whatever, that's a bad reason for us not to ship.

So obviously, for the biggest, most high priority stuff, we have a pretty close beat on it.

But we really try not to, frankly, like we want to empower teams to move quickly.

And I think it's more important to ship and iterate.

So we have this philosophy that we call iterative deployment.

And the idea is like, we're all learning about these models together.

So there's a real sense in which it's way better to like ship something, even when you don't know the full set of capabilities and iterate together like in public.

And we kind of co evolved together with the rest of society as we learn about these things and where they're different and where they're good and bad and weird.

I really like that philosophy.

There's also a bit of I think the other thing that like ends up being a part of our our product philosophy is the sense of like model maximalism.

The models are not perfect.

They're going to make mistakes, you could spend a lot of time building all kinds of different scaffolding around them.

And by the way, sometimes we do because sometimes there are things, you know, kinds of errors that you just don't want to make.

But we don't spend that much time building scaffolding around the parts that don't match that.

Because our general mindset is in two months, there's going to be a better model and it's going to blow away whatever, you know, the current set of limitations are.

And so if you're building and we say this to developers to if you're building and the product that you're building is kind of right on the edge of the capabilities of the models, keep going because you're doing something right.

Because you give it another couple months and the models are going to be great.

And suddenly the product that you have that just barely worked is really going to sing.

And, you know, that's that's kind of how you make sure that you're really pushing the envelope and building new things.

I had the founder of Bolt on the podcast, Stack Blitz is the company name.

And he he shared the story that they've been working on this product for seven years behind the scenes and it was failing, nothing was happening.

And then all of a sudden, it was sorry to mention a competitor, but Claude came out or Sonnet 3.5 came out.

And all of a sudden everything worked.

And they've been building all this time and finally worked.

And I hear that a lot with YC just like things are that never were possible now are just becoming possible every few months with the updates and the models.

Yeah, absolutely.

Let me actually ask this.

I wasn't planning to ask this, but I'm curious if you have any quick thoughts.

Just why why is Sonnet so good at coding and kind of thoughts on your stuff getting as good and better at actual coding?

Yeah, I mean, kudos to anthropic, they built very good coding models.

No doubt.

We think that we can do the same.

Maybe by the time this podcast is shipped, we'll have more to say.

But either way, all credit to them.

I think this intelligence is really multidimensional.

And so I think there's the model providers, it used to be that OpenAI had this like massive model lead, you know, 12 months or something ahead of everybody else.

That's not true anymore.

You know, I like to think we still have a lead, I'd argue that we do.

But it's certainly not a massive one.

And that means that there are going to be different places where you know, the Google models are really good, or where anthropic models are really good, or where we're really good.

And our competitors are like, Ah, we got to get better at that.

And it actually is easier to get better at a certain thing once someone's proved it possible than it is to, you know, forge a path through the the jungle, and doing something brand new.

So I just think, yeah, as an example, it was like, nobody, nobody could break four minutes in the mile.

And then finally, somebody did in the next year, 12 more people did it.

I think there's that all over the place.

And it just means that competition is really intense.

And consumers are going to win and developers are going to win in businesses are going to win in a big way from that.

It's part of why the industry moves so fast.

But you know, all respect to the other big model providers, models are getting really good.

We're going to move as fast as we can.

And I think we've got some good stuff coming.

Exciting.

This makes me also think about in many ways, other models are better at certain things.

But somehow chat GPT is like the, like, if you look at all the awareness numbers and usage numbers, it's like, no matter where you guys are in the rankings, people seem to just like, think of AI and chat GPT almost as, as the same.

What do you think you did right to kind of win and consumer mindset, at least at this point in awareness in the world?

I think being first helps, which is one of the reasons why we're so focused on moving quickly.

You know, we like being the first to launch new capabilities, things like deep research.

We've also our models are very, they can do a lot of things, right?

So they can, they can take real time video input, they can you have speech to speech, you can use speech to text and text to speech.

They can do deep research, they can operate on a canvas, they can write code.

And so chat GPT can kind of be this one stop shop where all the things that you want to do are possible.

And as we, as we go forward in it, you know, we have more agentic tools like operator where it's browsing for you and doing things for you on the web.

More and more, you're going to be able to come to this one place to chat GPT, give it instructions, and have it accomplish real things for you in the world.

There's like something fundamentally valuable in that.

And so, you know, we think a lot about that we think, and it we move, we try to move really fast so that we are always the most useful place for people to come to.

What would you say is the most counterintuitive thing that you've learned after building AI products or working at open AI something was just like, I did not expect that?

I don't know, maybe I should have expected this.

But one of the things that's been funny for me is the extent to which you can kind of reason when you're trying to figure out how some product should work with AI, you can often, or even why some AI thing happens to be true, you can often reason about it the way you would reason about another human.

And it kind of works.

Yeah, so maybe a couple examples, when we were first launching our, our reasoning model, right, we were the first to build a model that could reason that could that could instead of giving you just a quick, you know, system one answer right away at every question you asked, it was the third emperor of the Holy Roman Empire, like, you know, here's an answer, you could ask it hard questions, and it would reason the same way that if I asked you to do crossword puzzle, you couldn't just like snap fill in everything, you would be well, okay, on this one across, I think it could be one of these two, but that means there's an A here.

So that one has to be this a way, you know, like backtrack kind of step by step build up from where you are, same way you answer any, any difficult logistical problem, any scientific problem.

So this reasoning breakthrough was big, but it was also the first time that a model needed to sit and think and that's a weird paradigm for a consumer product, you don't normally have something where you might need to hang out for 25 seconds after you ask a question.

And, and so we were trying to figure out, you know, what's the UI for this, because it's also not like with deep research, where the model is going to go and think for 25 minutes, sometimes, it's actually not that hard, because you're not going to sit and watch it for 25 minutes, you're going to go do something else, you're going to go to another tab or go get lunch or whatever.

And then you'll come back and it's done.

When it's like 2025 seconds or 10 seconds, it's a long experience, it's a long time to wait, but it's not long enough to go do something else.

And so you actually need and, you know, so you can think like, if you asked me something that I needed to think for 20 seconds to answer, what would I do?

I wouldn't just like, go mute, and not say anything and kind of, you know, shut down for 20 seconds and then come back.

So we shouldn't do that, we shouldn't just like have a slider sitting there.

That's annoying.

But I also wouldn't just start like babbling every single thought that I had.

So we probably shouldn't just like expose the whole chain of thought as the model's thinking.

But you know, I might go like, huh, that's a good question.

All right, I might approach it like that.

And then think, you know, you're sort of like maybe giving you little updates.

And that's actually what we ended up shipping.

You have similar things where you can like you can find situations where you get better thinking sometimes out of a group of models that all try and attack the same problem.

And then you have a model that's looking at all their outputs and integrating it and then giving you a single answer at the end.

I mean, sounds a little bit like brainstorming.

Right?

Like I certainly have better ideas when I get in a room and brainstorm with other people because they think differently than me.

And so anyways, there's just like all these situations where you can actually kind of reason about it like a group of humans or an individual human, it sort of works, which, I don't know, maybe maybe I shouldn't have been surprised, but I was.

That is so interesting.

Because when I see these models operate, I like I never even thought about you guys designing that experience.

Like to me, just feels like this is what the LLM does.

It just sits there and tells me what it's thinking.

And I love this point you're making of like, let's make it feel like a human operating.

And how does human operate?

Well, they just talk out loud, they think here's the thing I should explore.

And I love that deep sequined, like to the extreme of that, right?

Where they're just like, here's everything I'm doing and thinking and people actually like that too.

I guess was that surprising to you?

Like, oh, maybe that could work too.

People seem to like everything.

Yeah, we learned from that actually.

Because we, when we first launched it, we kind of gave you like the subheadings of what the model is looking about, but not much more.

And then deep seek launched and they were it was a lot.

And we kind of went, you know, I don't know if everyone wants like that.

There's some novelty effect to seeing what the model is really about.

We felt that too.

And we were looking at it internally.

It's interesting to see the models chain of thought.

But it's not, you know, I think at the scale of like 400 million people, you don't want to see the model kind of like babble a bunch of things.

And so what we ended up doing was summarizing it in interesting ways.

So instead of just getting the subheadings, you're kind of getting like one or two sentences about how it's thinking about it.

And you can learn from that.

So we kind of tried to find a middle ground that that we thought was an experience that would be meaningful for most people.

But you know, showing everybody like three paragraphs is probably not the right answer.

This reminds me of something else you said at the summit that has really stuck with me this idea that chat people always make fun of like chat is not like the future interface for how we interact with AI.

But you made this really interesting point that may argue the other side, which is like as humans, we interface by talking and the IQ of a human can span from really low to really high.

And it all works because we're talking to them and chat is the same thing.

And it can work on all kinds of intelligence levels.

Maybe just share maybe I just shared it.

But I guess anything there about just why chat actually ends up being such an interesting interface for all.

Yeah, I don't know if maybe I'm maybe this is one of those things I believe that most people don't believe.

But I actually think chat is an amazing interface because it's so versatile.

People tend to go, oh, chat.

Yeah, well, that's just like, you know, we'll figure out something better.

And I kind of think I kind of think this is it's it's it's incredibly universal because it is the way we talk like, I can talk to you verbally, like we're talking now, I can, you know, we can see each other and interact.

We can talk on WhatsApp and, you know, be texting each other.

But all of these things is this sort of like unstructured, you know, method of communication.

And that's how we operate.

If I had to, and if I had some more rigid interface that I was allowed to use when we spoke, I would be able to speak to you about, you know, far fewer things.

And it would actually get in the way of us having like maximum communication bandwidth.

So there's something magical.

And by the way, in the past, it never worked because models, there wasn't a model that was good at understanding all of the complexity and nuances of human speech.

And that's the magic of LLMs.

So to me, it's like an interface that's exactly fit to the power of these things.

And that doesn't mean that it always has to be just like, I don't necessarily always want to type.

But if you do want that very open ended, flexible communication medium, it may be that we're speaking and the model speaking back to me, but you still want that like that, that very sort of lowest common denominator, no restrictions way of interacting.

That is so interesting.

That's really changed the way I think about the stuff is that point that chat is just so good for this very specific problem of talking to super intelligence, basically, by the way, I think there are like, it's not that it's only chat either, like there are, if you have high volume use cases where they're more prescribed, and the you don't actually need the full generality.

There are there are many use cases where it's better to have something that's less flexible, more prescribed, faster at a specific task.

And those are great, too.

And you know, you can build all sorts of those.

And but you still want chat as like this baseline for anything that falls out of whatever, you know, vertical you happen to be building for.

It's like a catch all for like every possible thing you'd ever want to express to a model.

I'm excited to chat with Christina Gilbert, the founder of one schema, one of our longtime podcast sponsors.

Hi, Christina.

Yes, thank you for having me on, Lenny.

What is the latest with one schema?

I know you now work with some of my favorite companies like ramp Vanta scale and watershed.

I heard that you just launched a new product to help product teams import CSV's from especially tricky systems like ERPs.

Yes.

So we just launched one scheme of file feeds, which allows you to build an integration with any system in 15 minutes, as long as you can export a CSV to an SFTP folder.

We see our customers all the time getting stuck with hacks and workarounds.

And the product teams that we work with don't have to turn down prospects because their systems are too hard to integrate with.

We allow our customers to offer thousands of integrations without involving their engineering team at all.

I can tell you that if my team had to build integrations like this, how nice would it be to be able to take this off my roadmap, and instead use something like one schema, and not just to build it, but also to maintain it forever.

Absolutely Lenny, we've heard so many four stories of multi day outages from even just a handful of bad records.

We have laser focused on integration reliability to help teams end all of those distractions that come up with integrations.

We have a built in validation layer that stops any bad data from entering your system and one schema will notify your team immediately of any data that looks incorrect.

I know that importing incorrect data can cause all kinds of pain for your customers and quickly lose their trust.

Christina, thank you for joining us.

And if you want to learn more, head on over to one schema.co.

That's one schema.co.

I want to come back to the you talked about researchers and their relationship with product teams.

I imagine a lot of innovation comes from researchers just like I having an inkling and then building something amazing and then releasing it.

And some ideas come from PMs and engineers.

How did how did those teams collaborate?

Like does every team have a PM?

Is it a lot of research led stuff?

Just like what give us a sense of just where ideas and products come from mostly?

It's an area where we're evolving a lot.

I'm really excited about it, frankly, I think if you go back, you know, a couple years when chat GPT was just getting started.

Obviously, I wasn't an open AI.

So but it we were more we were more of a pure research company at the time.

Chat GPT, if you remember, was a low key research preview.

For many years.

Yeah, it wasn't a thing that the team launched thinking it was going to be this massive product.

Oh, chat GPT.

And it was just a way that we were going to let people you know, play with and iterate on the models.

And so we were we were primarily a research company, a world class research company.

And as chat GPT has grown, and as we built our B2B products and our API is and other things, it now we're more of a product company than we were.

I still think we can't work.

Open AI should never be a pure product company, we need to be both a world class research company and a world class product company.

And the two need to really work together.

And that's the thing that's that I think we've been getting much better at over the last like, six months.

If you if you treat those things separately, and you know, the researchers go do amazing things and build models, and then they get to some state and then the product and engineering teams go take them and do something with them.

We're effectively just an API consumer of our own models.

The best products though are going to be is like I was talking about with deep research, it's a lot of iterative feedback, it's understanding the products you're trying to solve or the problems you're trying to solve, building evals for them, using those evals to go gather data and fine tune models to get them to be better at the these use cases that you're looking to solve.

It's a huge amount of back and forth to do it well.

And I think the best products are going to be eng product design and research, working together as a single team to build novel things.

So that's that's actually how we're trying to operate with basically anything that we build.

It's a new muscle for us because we're kind of new as a product company.

But but it's one that people are really excited about because we've seen every time we do it, we build something awesome.

And so you know, now every product starts like that.

How many product managers do you have at OpenAI?

I don't know if you share that number, but if you do, not that many, actually, I don't know, 25.

Maybe it's a little more than that.

But my personal belief is that you want to be pretty pm light as an organization just in general.

I say this with love because I am a pm but too many pm causes problems, you know, will like fill the world with decks and ideas versus execution.

So I think that the I think it's a good thing when you have a pm that has that is working with maybe slightly too many engineers, because it means that they're not going to get in and micromanage, you're going to leave a lot of, you know, influence and responsibility with the engineers to make decisions.

It means you want to have really product focused engineers, which we're fortunate to have, we have an amazingly product focused, like high agency engineering team.

But when you have something like that, you have a team that feels super empowered.

You have a pm that's, you know, trying to really understand the problems and kind of gently guide the team a little bit, but has too much going on to get too far into the details.

And you end up being able to move really fast.

So that's kind of the philosophy we take.

We want we want product the engine leads and product the engineers all the way through.

We want not too many PMs, but really awesome high quality ones.

And so far, that seems to be working pretty well.

I imagine being a PM at open AI is like a dream come true for a lot of people.

At the same time, I imagine it's not a fit for a lot of people.

There's researchers involved, very product minded engineers.

What do you what do you look for in the PMs that you hire there for folks that are like maybe a problem, I shouldn't go work there, I shouldn't even think about that.

I think I've said this a few times, but like high agency is something that we really look for people that are not going to come in and kind of wait for everyone else to allow them to do something, they're just going to see a problem and go do it.

That's it's just a core part of how we work.

I think people that that are happy with ambiguity, because there is a massive amount of ambiguity here is not the kind of place and we have we have trouble sometimes with with more junior PMs because of this because it's just not the place where someone is going to come in and say, Okay, you know, here's here's the landscape, here is your area, I want you to go do this thing.

And that's that's what you want is a as an early career PM.

We just I mean, no one here has time.

And the nobody the problems are too ill formed, and we're figuring them all out as we go.

And so high agency very comfortable with ambiguity, ready to come in and help execute and move really quickly.

That that's kind of our recipe.

And I think also happy leading through influence.

Because I mean, it's usual as a PM people don't report to you, your team doesn't report to you, etc.

But you also have the the complexity of a research function, which is even more sort of self directed.

And it's really important to build a good rapport with the research team.

And so you know that I think the EQ side of things is also super important for us.

I know in most companies, a PM comes in and they're just like, why do we need you?

And as a PM, you have to earn trust and help people see the value.

And I feel like it open eyes probably a very extreme version of that where they're like, why do we need this person?

We researchers, engineers, what are you going to do here?

Yeah, I think people appreciate it done right.

But you got you bring people along.

I think one of the most important things a PM can do well is be decisive.

So it's, it's, there's a real fine line you don't want to be making it.

I mean, it's kind of like, I don't love the PM is the CEO of the product illusion all the time.

But, but just like Sam and his role would be making mistakes if he made every single decision in every meeting that he was in.

And he would also be making mistakes if he made no decisions in any meetings that he was in, right?

It's a, it's the, it's understanding when to defer to your team and to like, let, let people innovate.

And when there is like a decision to be made that people either don't feel comfortable with or don't feel empowered to make, or a decision that, that, you know, has too many different, like disparate pros and cons that are spread out across a big group and someone needs to be decisive and make a call.

It's a really important trait of a CEO.

It's something Sam does well.

And it's, it's also a really important trait of a PM kind of at a, at a more microscopic level.

And so because there's so much ambiguity, it's not obvious what the answer is in a lot of cases.

And so having a PM, they can come in and like, and by the way, this doesn't need to be a PM, I'm perfectly happy if it's anybody else.

But I kind of looked at the PM to say like, if there's ambiguity and no one's making a call, you better make sure that we get a call made and we move forward.

This touches on a few posts I've done of just where is AI gonna take over work that we do versus help us with various work.

So let me come at this question from a few different direction of just how AI impacts product teams and hiring things like that.

So first of all, there's all this talk of LM's doing our coding for us.

And 90% of code is going to be written by AI in a year.

Dario Denthropic said that, at the same time, you guys are all hiring engineers like crazy, PMs like crazy, you know, every function is dead, but you're still hiring every single one.

I guess just first of all, let me just ask this, how do you how do you and the team, like say engineers, PMs use AI in your work?

Is there anything that's like, really interesting or things that you think people are sleeping on and, and how you use AI in your day to day work?

We use it a lot.

I mean, every one of us is in chat GPT all the time, summarizing docs using it to help write docs with GPTs that you know, write product specs and things like that, all the stuff that you would imagine.

I mean, talk about writing evals, like you can actually use models to help you write evals and they're pretty good at it.

That all said, I still don't, I'm still sort of disappointed by by us and despite, I really mean me.

In if I were to, if I were to just like teleport my five year old self leading product at some other company into my day job, I would recognize it still.

And I think we should be in a world certainly a year from now, probably even more now that where I almost wouldn't recognize it because the workflows are so different and I'm using AI so heavily and I'd still recognize it today.

So I think in some sense, I'm not doing a good enough job of that.

You know, just to give an example, like, why shouldn't we be like vibe coding, demos, right, left and center, like, instead of showing stuff in like Figma, we should be showing prototypes that people are vibe coding, you know, over the course of 30 minutes to illustrate proofs of concept and to explore ideas.

That's totally possible today.

And we're not doing it enough.

Our actually, our chief people officer, Julia, was telling me the other day, she vibe coded an internal tool that she had at a previous job that she really wanted to have here at open AI.

And she opened I don't know, windsurf or something and vibe coded it.

Like, how cool is that?

And if our chief people officer is doing it, we have no excuse to not be doing it more.

That's an awesome story.

Okay.

And some people may not have heard this term, vibe coding.

Can you describe what that means?

Yeah, I think this was, I think this was Andre's term, Carpathi, Andre Carpathi.

Yeah.

Where it's just so you have these tools like cursor and windsurf that and get a copilot that are very good at suggesting what code you might want to write.

So you can give them a prompt and it'll write code.

And then as you go to edit it, it's suggesting what you might want to do.

And the way that everyone started using that stuff was give it a prompt, have it do stuff, you go edit it, give it a prompt, you know, and you're kind of like really going back and forth with the model the whole time.

As the models are getting better, and as people are getting more used to it, you can kind of just like, let go of the wheel a little bit.

And when the model is suggesting stuff, it's just like, tap, tap, tap, tap, tap, like keep going.

Yes, yes, yes, yes, yes.

And of course, the model makes mistakes, or it does something that doesn't compile.

But when it doesn't compile, you paste the error in and you say go, go, go, go, go.

And then you test it out.

And it like does one thing that you don't want it to do.

So you enter in an instruction and say, go, go, go, go, go.

And you just kind of like let the model do its thing.

And it's not that you would do that for production code that needed to be super tight today yet.

But for so many things, you're trying to get to a proof of concept, you're getting to a demo.

And you can really take your hands off the wheel and the model will do an amazing job.

And that's what that's that's vibe coding.

That's an awesome explanation.

I think like the pro version of that, which is I think the way Andre even described it as you talk, you do like, there's a step, like whisper, super whisper, something like that, where you're like talking to the model, not not even typing.

Yeah, totally.

Oh, man.

So let me let me just ask, I guess, when you look at product teams in the future, you talked about how you guys should be doing this more, instead of designs, having prototypes, what do you think might be the biggest changes in how product teams are structured or built?

Where do you think they're going in the next few years?

I think you're definitely going to live in a world where you have more, where you have researchers built into every product team.

And I don't even mean just at, at like foundation model companies.

Because I think the future, actually, frankly, one thing that I'm sort of surprised about, about our industry in general, is that there's not a greater use of fine tuned bottles.

Like a lot of people, you know, these models are very good.

So our API does a lot of things really well.

But when you have particular use cases, you can always make the model perform better on a particular use case by fine tuning it, it's probably just a matter of time, you know, folks aren't like quite comfortable yet with doing that in every case.

But to me, there's no question that that's the future.

Every models are going to be everywhere, just like transistors are everywhere.

AI is going to be just a part of the fabric of everything we do.

But I think there are going to be a lot of fine tuned models, because why would you not want to more specifically customize a model against a particular use case.

And so I think you're going to want sort of quasi researcher machine learning engineer types as part of pretty much every team, because fine tuning a model is just going to be part of the core workflow for building most products.

So that's that's one change that maybe you know, you're starting to see a foundation model companies that will propagate out to more teams over time.

I'm curious if there's a concrete example that makes that real.

And I'll share one that comes to mind as you talk, which is when you look at cursor and windsurf on something I learned from those founders, is that they they use like a sonnet.

But then they also have a bunch of custom models that help along the edges that make the specific experience.

That's not just generating code even better, like autocomplete and looking ahead to where things are going.

So is that one or any other examples of what you put?

What is a fine tune model that you think teams will be building with these researchers on their teams?

Yeah, I mean, so when you're fine tuning a model, one of the you're basically giving the model a bunch of, of examples of the kinds of things you want it to be better at.

So it's, it's, here's the problem, here's a good answer, here's a problem, here's a good answer.

Or here's a question, here's a good answer, you know, times 1000 or or 10,000.

And suddenly, you're you're teaching the model to be much better than, than it was out of the gate at that particular thing.

We use it everywhere internally.

We also, we use ensembles of models much more internally than people might think.

So it's not here is I have 10 different problems, I'll just ask, you know, baseline GPT 40 about a bunch of these things.

If we have 10 different problems, we might, we might solve them using, you know, 20 different model calls, some of which are using specialized fine tune models, they're using models of different sizes, because maybe you have different latency requirements or cost requirements at different for different questions.

They are probably using custom prompts for each one, like basically, you want the to teach the model to be really good, you want to break the problem down into more specific tasks, versus some broader set of high level tasks.

And then you can use models very specifically to get very good at each individual thing.

And then you know, you have an ensemble that sort of tackles the whole thing.

I think a lot of good companies are doing that today, I still see a lot of companies kind of giving the model single generic broad problems versus breaking the problem down.

And I think there will be more breaking the problem down using specific models for specific things, including fine tuning.

And so in your case, because this is really interesting, is that you're using different levels of chat GPT, like a 103.

And yeah, that's really cheaper.

There'll be parts of our internal stack.

So we do if you give you an example, customer support with 400 plus weekly, 400 plus million weekly active users, we get you know, a lot of inbound tickets, right?

I don't know how many customer support folks we have, but it's not very many.

3040 I'm not sure, way smaller than you would have at any comparable company.

And it's because we've automated a lot of our flows, we've got, you know, most questions, using our internal resources, knowledge base, you know, guidelines for how we answer questions, what kind of personality, etc.

You can teach the model those things, and then have it do a lot of its answers automatically, or where it doesn't have, you know, the full confidence to answer a particular question, it can still suggest an answer, request a human to look at it.

And then that humans answer actually is its own sort of fine tuning data for the model, you're telling it the right answer in a particular case.

And we're using it various places, you know, some of these places you want a little bit more reasoning is not super latency sensitive.

So you want a little more reasoning, and we'll use one of our series models.

In other places, you want a quick check on something.

And so you're fine to use like for a mini, which is super fast and super cheap.

And in general, it's like specific models for specific purposes.

And then you you you ensemble them together to solve problems.

By the way, again, not unlike how we as humans solve problems.

A company is arguably an ensemble of models that have all been, you know, fine tuned in based on what we studied in college and what we have like learned over the course of our careers, we've all been fine tuned to have different sets of skills.

And you like group them together in different configurations.

And the output of the ensemble is much better than the output of any one individual.

Kevin, you're blowing my mind.

That sounds exactly correct.

And also different people are you pay them less, they cost less to talk to some people take a long time to answer.

Some people hallucinating.

This is like this is a mental model, but really does work in thinking.

This is great.

Some people are visual, they want to draw out their thinking.

Some people want to talk word cell.

Wow, this is a really good metaphor.

So again, coming back to your advice here, because I love that we circled back to it.

It's you're finding a really good way to think about how to design great AI experiences, LMs, I guess, specifically think about how a person would do this.

Well, it's maybe not always the answer is to think about how a person would do it.

But sometimes to gain intuition for how you might solve a problem, you think about what an equivalent human would do in those situations, and use that to, you know, at least gain a different perspective on the problem.

Well, this is great.

There's just like, you know, because some of this really is talking to a model.

There's a lot of prior art, because we talk to other humans all the time and encounter them in all sorts of different situations.

And, and so like, there's a lot to learn from that.

Okay, so speaking of humans, I want to chat about the future a little bit.

So you have three kids, and someone, a community member asked me this hilarious question that I think it's something a lot of people are thinking about.

So this is Patrick Strail, I worked at him with a mid Airbnb, he asks, she says, ask what he's encouraging his kids to learn to prepare for the future.

I'm worried my six year old by the year 2036 will face a lot of competition trying to get into the top roofing or plumbing programs and get a backup plan.

That's funny.

So our kids are we have a 10 year old and eight year old twins.

So they're they're still pretty young.

They're kind of I mean, it's amazing how AI native they are.

Like, they just it's completely normal to them that there are self driving cars that they can talk to AI all day long.

They have full conversations with chat GPT and Alexa and everything else.

I think who knows what the future holds.

I think you know, things like coding skills are going to be relevant for a long time.

Who knows?

But I think if you teach your kids to be curious, to be independent, to be self confident, you teach them how to think, I don't know what the future holds.

But I think that those are going to be skills that are going to be important in any configuration of the future.

And so, you know, it's not like we have all the answers.

But that's how Elizabeth and I think about our kids.

And do you find that AI, there's a lot of talk about AI tutoring, is that something you guys are doing anything you're I know they're using chat GPT, I love the level of the photos you post, they're playing with prompts and stuff.

But I guess is there anything there you're you're experimenting with, or you think is going to become really important?

This is something that it's maybe the most important thing that that AI could do.

Maybe that's a maybe that's a grand statement.

There are lots of important things that I can do, including like, speeding up the pace of fundamental science research and discovery, which maybe is actually the most important thing AI can do.

But but one of the most important things would be personalized tutoring.

And it kind of blows my mind that there is still, I know there are there are a bunch of good products out there, like, you know, Khan Academy does great things.

They're a wonderful partner of ours.

Vinod Khosla has a nonprofit that has that's doing some really interesting stuff in this space and is making an impact.

But I kind of want like, I'm kind of surprised that there isn't like a two billion kid, you know, AI personalized tutoring thing, because the models are good enough to do it now.

And every, every study out there that's ever been done seems to show that when you have, you know, classrooms is still classroom, like education is still important.

But when you combine that with personalized tutoring, you get like, multiple standard deviation improvements in learning speed.

And so it's just, it's uncontroversial.

It's good for kids.

It's free, chat, GPT is free, you don't need to pay for it.

And the models are good enough, like, it still just kind of blows my mind that there isn't something amazing out there that, you know, our kids are using and your future kids are using and like, people in all sorts of places around the world that aren't as lucky as our kids to be able to like have this sort of built in solid education.

Again, chat GPT is free, people have Android devices everywhere like this could, I really just think this could change the world.

And I'm surprised it doesn't exist.

And I want it to exist.

This kind of touches on something I want to spend a little time on, which is a lot of people also worry a lot about AI, where it's going, they worry about jobs, it's going to take their worry about, you know, the super intelligence, squashing humanity in the future.

What's kind of your perspective on the on that and just kind of the optimistic case that I think people need to hear?

I mean, I'm a big technology optimist, I think if you look over the last 200 years, maybe maybe more, technology has driven a lot of the advancements that have made us the the world in the society that we are today.

It drives economic advancements, it drives geopolitical advancements, quality of life, longevity advancement, I mean, technology is at the root of of just about everything.

So I think there are very few examples where where this is anything but a great, a great thing over the longer term.

That doesn't mean that there aren't like temporary dislocations or where there aren't individuals that are impacted.

And that's like that matters too.

So it can't just be that the average is good.

You've got to also think about how you take care of each individual person as best you can.

So it's something that we think a lot about.

And as we, you know, work with the administration, as we work with policy, like, we try and help where wherever we can, we do a lot with education.

You know, one of the one of the benefits here is that chat GPT is also perhaps the best like reskilling app you could possibly want.

It knows a lot of things that can teach you a lot of things if you're interested in learning new things.

So but these are these are very real issues.

I'm super optimistic about the long run.

And we're going to need to do everything we can as a society to ensure that we like make this transition, you know, as graceful and as well supported as we can.

To give people a sense of where things might be going.

That's a big question.

A lot of people minds.

So someone asked this question that I love, which is AI is already changing creative work in a lot of different ways, writing and design and coding.

What do you what do you think is the next big leap?

What should we be thinking is the next big leap in AI assisted creativity specifically?

And then just broadly, like, where do you think things are going to be going in the next few years?

Yeah, this is also an area where I'm, I'm a big optimist.

Like, if you if you look at Sora, for example, I mean, we talked about image gen earlier and the the absolute like fount of creativity that people are putting across Twitter and Instagram and other places.

I'm I am the world's worst artist, like the worst, maybe the only thing I'm worse at than then, then art is singing.

And I, you know, I like give me a pencil and a pad of paper, and I can't draw better than my five than our eight year old, you know, it's just like it's, but give me give me image, Jen.

And, you know, I can think some creative thoughts and put something into the model and suddenly have output that I couldn't have possibly done myself.

That's pretty cool.

Even even you look at at folks that are really talented.

I was talking to a director recently about Sora, someone who's directed films that that that we would all know.

And, and he was saying, you know, for for a film that he's doing, like, say, say, take the example of some sort of sci fi ish, you know, think of like Star Wars.

And you've got some scene where there's a there's a plane zooming into some Death Star like thing.

And so you've got the plane looking at the whole planet, and then you want to cut to a scene where the planes like, you know, kind of at the ground level, and all of a sudden you see the city and everything else, right?

How are you going to manage that cut scene?

And, and that transition?

And he was saying, you know, in in the world of two years ago, I would have paid, you know, a 3d effects company, 100 grand, and they would have taken a month, and they would have produced two versions of this cut scene for me.

And I would have evaluated them, we would have chosen one, because what are you going to do, like pay another 50 grand, and we had another month?

And, and we would have just gone with it.

And you know, it would be fine, like, movies are great, I love them.

And there've been, obviously, we can do great things with the technology that we've had.

But you now look at what you can do with Sora.

And his point was now I can use Sora, our video model, and I can get 50 different variations of this cut scene, just, you know, me brainstorming into a prompt in the model brainstorming a little bit with me, I've got 50 different versions.

And and then of course, I can like iterate off of those and refine them and take different ideas.

And now I'm still going to go to that, that 3d effects studio to produce the final one.

But I'm going to go having brainstormed and like I had this much more creative approach with a with an outcome that's much better.

And and like I did that assisted by AI.

So my personal view on on creativity in general is that it's no one's gonna you don't type into Sora, like make me a great movie, it requires creativity and ingenuity and all these things.

But it can help you explore more, it can help you get to a better final result.

So, you know, again, I tend to be an optimist in most things.

But I'm actually I think, I think there's a very good story here.

I know Sam Altman, I think it was him who tweeted recently the creative writing piece that you guys are working on words.

Yeah, he's very bad at writing creative stuff.

And he shared example was actually really good.

Imagine that's another area then investment.

Yeah, there's, there's some exciting stuff happening internally with some new research techniques.

So we'll have more to say about that at some point.

But yeah, Sam, Sam sometimes likes to show off some of the stuff that's coming.

By the way, it's like very sort of indicative of this iterative deployment philosophy.

We don't have some breakthrough and keep it to ourselves forever.

And then you know, bestow it upon the world someday.

We kind of just talk about the things we're working on and share when we can, and launch early and often and then iterate in public.

And I really like that philosophy.

I love all these hints that a few things coming.

I know you can't say too much.

You talked about how there might be a coding leap coming in the near future, maybe by the time this comes out.

Is there anything else people should be thinking about might be coming in the near future, any things you can tease that are interesting, exciting?

Man, this hasn't been enough for you.

Oh, yeah, only everything is getting better every day.

Yeah, I'm like, man, I hope I hope we get some of the stuff out before the episode launches.

This is your new time box.

I don't piss people off.

Now, it's the amazing thing to me is we, we were talking earlier about how far models have come in just a couple years.

If you went back to GPT-3, you'd be like disgusted by how bad it was, even though Lenny of two years ago was mind blown by how good these were.

And for a long time, we were iterating every six to nine months on a new GPT model.

It was like GPT-3, GPT-3.5, 4.

And now with this O series of reasoning models, we're moving even faster.

We're like every roughly, you know, three months, maybe four months, there's a new O series model and each of them is a step up in, in capability.

And so the capabilities of these models are increasing at a massive pace.

They're also getting cheaper as, as they scale.

You look at, at where we were even like a couple years ago, the original, I think the original, I don't know, what was it GPT-3.5 or something was like a hundred X the cost of GPT-4 O mini today in, in the API.

So a couple of years, you've gone down two orders of magnitude in, in cost for much more intelligence.

And so I don't know where there's another series of trends like that in the world.

Models are getting smarter, they're getting faster, they're getting cheaper, and they're getting safer too.

You know, they hallucinate less every, every iteration.

And so there's just, you know, the, the Moore's law and, and, and transistors becoming ubiquitous.

That was a, that was a law around doubling the number of transistors on a chip every 18 months.

If you're talking about something where you're getting 10 X every year, that's a massively steeper exponential.

And it just, you know, it, it tells us that the future is going to be very different than today.

I still, the thing I try and remind myself is the AI models that you're using today is the worst AI model you will ever use for the rest of your life.

And when you actually get that in your head, it's kind of wild.

I was going to actually say the same thing.

And that's, that's the thing that always sticks with me when I watch this thing.

Like you're talking about Sora, and I imagine many people hearing that are like, no, no, it's, it's not actually ready.

It's not good enough.

It's not going to be as good as a movie I see in the theater, but the point is what you just made, but this is the worst it's going to be.

It will only get better.

Yeah.

Model maximalism, just like keep, you know, building, building for the capabilities that are almost there.

And the model is going to catch up and be amazing.

State to where the puck's going to be.

Yeah.

This reminds me, I was just using, I was ghibli flying everything the other day and I was just like, why is it taking so long?

What was that?

I said, as one does as one does these days.

I was just like, it has taken a minute to generate this image of my family in this amazing way.

Like, come on, let's take it so long.

You just get so used to magic happening in front of you.

Yeah, totally.

Okay.

Final question.

This is going to go in a completely different direction.

A lot of people asked about this.

So famously, you led this project at Facebook called Libra, which is now called Novy.

A lot of people always wondered what happened there.

That was a really cool idea.

I know some people have a sense there's regulation challenges, things like that.

I don't know if you've talked about this much.

So I guess just, can you just give people a brief summary of just like, what is Libra made this project to work on and just what happened and how you feel about it?

Yeah, I mean, David Marcus led it and I happily work for him and with him.

I think he's a visionary and also a mentor and a friend.

You know, honestly, Libra is probably the biggest disappointment of my career.

When I think about the problems we were solving, which are very real problems, if you look at, for example, the remittance space, people sending money to family members in other countries, it is maybe, I mean, it's incredibly regressive, right?

People that don't have the money to spend or having to pay 20% to send money home to their family.

So outrageous fees.

It takes multiple days.

You have to go then pick up cash from, it's just, it's all bad.

And here we are with like 3 billion people using WhatsApp all over the world, talking to each other every day, especially friends and family, exactly the kind of people who'd send money to each other.

Why can't you send money as immediately, as cheaply, as simply as you send a text message?

It's one of those things when you sit back and think about it, that should just exist.

And that was what we set out to try and do.

Now, I don't think we played all of our cards perfectly.

If I could go back and do things, there are a bunch of things I would do differently.

We tried to kind of get it all at once.

We tried to launch a new blockchain.

It was a basket of currencies originally.

It was integration into WhatsApp and Messenger.

And I think the whole world kind of went like, "Oh my God, that's a lot of change at once."

And it happened also to be at the time that Facebook was at the absolute nadir of its reputation.

And so that didn't help, right?

It was also not the messenger that people wanted for this kind of change.

We knew all that going in, but we went for it.

I think there are a bunch of ways that we could do that that would have introduced the change a little bit more gently, maybe still gotten to that same outcome, but fewer new things at once and introduced the new things one at a time.

Who knows?

Those were decisions we made together.

So we all own them.

Certainly I own them.

But it fundamentally disappoints me that this doesn't exist in the world today, because the world would be a better place if we'd been able to ship that product.

I would be able to send you 50 cents in WhatsApp for free.

It would settle instantly.

Everybody would have a balance in their WhatsApp account.

We'd be transact.

I mean, it was just, it should exist.

I don't know, to be honest, the current administration is super friendly to crypto.

Facebook's reputation, Meta's reputation is in a very different place.

Maybe they should go build it now.

I was looking at the history of it and apparently they sold the tech to some private equity company for 200 million bucks.

Yeah.

Yeah.

Yeah.

So, and then head of my back.

There are a couple of current blockchains that are built on the tech because the tech was open sourced in Aptos and Mistin are two companies that are built off of this tech.

So you know, at least the, all of the work that we did did not die, but, and lives on in these two companies and they're both doing really well, but still, you know, we should be able to send each other money in WhatsApp and we can't today.

Here, here.

Well, thanks for sharing that story.

Kevin, is there anything else you want to share or maybe a last negative advice or insight before we get to our very exciting lightning round?

Ooh, the lightning round.

Let's just go do that.

Let's do it.

With that, Kevin, we reached our very exciting lightning round.

Are you ready?

Yeah.

Let's do it.

Okay.

What are two or three books that you find yourself recommending most other people?

Co intelligence by Ethan Mollick, a really good book about AI and how to use it in your daily life as a student, as a teacher.

He's super thoughtful.

Also, by the way, a very good follow on Twitter.

The accidental superpower by Peter Zion.

Very good if you're interested in geopolitics and the forces that sort of shape the dynamics happening.

And then I really enjoyed Cable Cowboy.

I don't know who the author is, but the biography of John Malone.

Just fascinating if you like business, especially if you want to get into like, I mean, the man was an incredible deal maker and shaped a lot of the modern cable industry.

So that was a good biography.

These are all first time mentions, which is always a great.

Oh, good.

Next question.

Do you have a favorite recent movie or TV show that you really enjoyed?

Um, I wish I had time to watch a TV show.

So I'm just Sora videos.

Yeah, right.

I don't know.

I read when I was a kid, I read the Wheel of Time series.

And now Amazon has it as they're like the third season of it.

So I want to watch that I haven't yet.

Top Gun 2 was an awesome movie.

I think that's no longer new.

But you know, that shows my last movie was, but I like the idea like I want.

I want more like Americana.

I want more like being proud of being strong.

And I thought Top Gun 2 did a really good job of that.

Like, you know, pride and patriotism.

I think I think the US could use more of that.

Is there a favorite product that you recently discovered that you really love other than your super intelligence internal tool that you all have access to them?

I'm just joking.

That's right.

Internal HDR.

Well, I think I think like vibe coding with with products like windsurf is just super fun.

And I'm, I'm having a great time doing that.

I still just love that our chief people officer vibe coded some tools.

Maybe the other one is Waymo.

Every chance I get I'll take a Waymo, it's just a better way of writing and it still feels like the future.

So they've done an amazing job.

That's awesome.

By the way, I had the founder of windsurf on the podcast, they might come out before this or after this.

And also Cursors CEO is coming on the podcast either before or after this.

Oh, cool.

I have a ton of respect for what those guys are doing there.

Those are awesome products.

Just changing the way everyone builds product.

No big deal.

Yeah.

A couple more questions.

Do you have a favorite life motto that you often repeat yourself, find really useful and work your life?

Yeah.

So actually, this is interestingly enough, it's more of a philosophy.

But then I thought Zuck encapsulated it one time on a Facebook earnings call.

So I actually had this made into a poster.

It sits in my room.

But somebody was asking Mark, this is literally on an earnings call.

So it's like an analyst on an earnings call asking him, you know, it was some quarter when Facebook had grown a lot.

This was back in the 20 teens sometime, I think.

But it's like, you know, so what did you do?

What, you know, what was it that you launched?

That was the one thing that drove all this growth for you.

And he said something to the effect of, you know, sometimes it's not any one thing.

It's just good work consistently over a long period of time.

And that's always stuck with me.

And I think it is, I mean, you know, I run ultra marathons.

It's like, it's just about grinding.

I think people too often look for like the silver bullet when a lot of life is and a lot of like excellence is actually showing up day in and day out, doing good work, getting a little bit better every single day.

And you know, you may not notice it over a week, or even a month.

And a lot of people then you know, kind of get like dismayed and stop.

But actually you keep doing it, the gains keep compounding.

And over the course of a year, two years, five years, it adds up like crazy.

So good work consistently over a long period of time.

Damn, I love that.

I gotta make a poster of this now.

That is resonate with that.

Okay, that is so good.

Okay, final question.

I'm gonna ask if you have any prompting tricks, and I'm gonna set it up first.

But think about if you have a trick that you could recommend to people for prompting LMs better.

There's this I had a guest Alex Komarowski come on the podcast, he's from Stripe and writes us weekly reflections on what's happening in the world.

A lot of them are AI related.

And he once described an LM as a zip file of all human knowledge.

And all the answers are in there.

And you just need to figure out the right question to ask to get the answer to every problem basically.

And so just reminded me how important prompt engineering is and knowing how to prompt well, you're constantly prompting chat GPT.

What's one tip one trick that you found to be helpful in helping you get what you want?

Well, I'll say first of all, I want to kill the idea that you have to be a good prompt engineer.

I think if we do our jobs, that stops being true.

It's just one of those like sharp edges of models that experts can learn.

But then you just over time, you shouldn't need to know all that.

The same way you used to have to get deep into like, you know, what's your storage engine in MySQL?

Are you using in ODB 4.1?

Or like, and, you know, there's still use cases for that if you're at the at the sort of deep edge of MySQL performance, but most people don't need to care.

And you shouldn't need to care about minute details of prompting if AI is really going to become, you know, broadly adopted.

But, you know, today, we're not totally there.

I think I think by the way we are making progress there, I think there is less prompt engineering than there had to be before.

But in line with some of the fine tuning stuff I was talking about and the importance of giving examples, you can do like, you know, effectively poor man's fine tuning by including examples in your prompt of the kinds of things that you might want and a good answer.

So like, here's an example, and here's a good answer.

Here's an example, here's a good answer.

Now go solve this problem for me.

And the model really will listen and learn from that.

Not as well as if you do a full fine tune, but much more than if you don't provide any examples.

And I think people don't do that often enough.

That's awesome.

One tip that I heard, I'm curious if this works is you tell it this is very, very important to my career.

Make it like really understand like someone will die if you don't answer the correct thing.

Does that work?

It you know, it's really weird.

I there's probably a good explanation for this, but you can also say things.

So yes, I think there is some validity to that.

You can also say things like, I want you to be Einstein.

Now answer this physics problem for me, or you are the world's greatest marketer, the world's greatest brand marketer.

Now here's a naming question.

And it's there is something where it sort of shifts the model into a certain mindset.

And it can actually be really positive.

I use that tip all the time.

Actually, I always when I'm coming up with questions for interviews, and I use it occasionally to like come up with things I haven't thought of, I actually type you're the world's best podcast interviewer, right?

I have Kevin, Kevin wheel coming on the pot.

Yeah, and actually works.

Yeah.

By the way, back to our other point that we made a few times like you do do that sometimes with people, right?

You sort of put them you frame things, you get them into a certain mindset and the end of the difference.

So I think there are like human analogues of this one more time.

Kevin, this was incredible.

I was thinking about a way to end this the way I feel like I feel like not only are you at the cutting edge of the future, like you're you and the team are kind of like actually the edge that is creating the future.

And so it's a real honor to have you on here and to talk to you and to hear how you think things are, where you think things are going, and what we need to be thinking about.

So thank you for being here, Kevin.

Oh, thank you so much for having me.

I feel real I get to work with the world's best team.

And you know, all credit to them, but really appreciate you having me on.

It's been it's been super fun.

I forgot to ask you the two final questions working folks finding online if they want to reach out.

And well, how can listeners be useful to you?

I am at Kevin wheel, K, E, V, I, N, W, E, I, L on pretty much every platform, you know, I'm, I'm still a Twitter DAU after all these years, I guess an ex DAU, LinkedIn, wherever.

And I think the thing I would love from people, give me feedback.

People are using chat GPT, tell us where tell me where it can be where it's working really well for you and where you want us to double down.

Tell me where it's failing.

I'm I'm very active and engaged on Twitter.

I love hearing from people what's working and what's not.

So don't be shy.

And I learned following you helps you figure out all the stuff that you're launching, like you share all the things that are going out every day or week, month.

So that's also benefit.

And by the way, 400 million weekly active users all emailing you feedback.

Here we go.

Yes, let's do it.

Okay.

Well, thank you, Kevin.

Thanks for being here.

All right, man.

Thanks so much.

See you soon.

Bye, everyone.

Thank you so much for listening.

If you found this valuable, you can subscribe to the show on Apple Podcasts, Spotify or your favorite podcast app.

Also, please consider giving us a rating or leaving a review as that really helps other listeners find the podcast.

You can find all past episodes or learn more about the show at Lenny's podcast.com.

See you in the next episode.