The Cognitive Revolution · 2025-06-12

Google DeepMind's 50X Growth & AI Lab Strategy with Logan Kilpatrick

Hosts: Nathan (Cognitive Revolution)

Guests: Logan Kilpatrick

Gemini 2.5 Pro · long context · diffusion language models · agents and MCP · AI lab convergence/divergence · startup opportunities · AGI as product experience · human value in AI era · AI token economics

Why it matters

Google AI usage grew 50x in one year, from 10T to 500T tokens per month (~50K tokens per person on Earth)

Key claims

  • Google AI usage grew 50x in one year, from 10T to 500T tokens per month (~50K tokens per person on Earth)
  • Mid-2023 merger of Google Brain, Research, and DeepMind under Demis was the inflection point that transformed Google from 'sleeping giant' into a coordinated AI powerhouse
  • Kilpatrick expects more divergence among AI labs going forward as low-hanging fruit is captured and structural advantages (compute, distribution) matter more
  • Anthropic cutting off Windsurf after its OpenAI deal framed as defensible business logic about compute allocation to long-term partners

Episode summary

Summary

Logan Kilpatrick (Google DeepMind) returns to The Cognitive Revolution for a wide-ranging discussion of the May 2025 AI release avalanche and Google's transformation into a top-tier AI powerhouse. He reveals that Google AI usage has grown 50x year-over-year, from 10 trillion to 500 trillion tokens per month (roughly 50,000 tokens per person on Earth monthly), and credits this to the mid-2023 merger of Google Brain, parts of Google Research, and DeepMind under Demis Hassabis's leadership. The organizational unification, paired with Google's structural incentive to embed AI across billion-user products (Search, Workspace, YouTube, Cloud) and its TPU/compute advantage, has closed a gap with competitors that he says was visible from the outside.

On competitive dynamics, Kilpatrick expects more divergence among labs going forward because "low-hanging fruit has been captured" and structural advantages (infrastructure, distribution) will increasingly differentiate winners; he sees room for startups that pick a focused wedge but warns against competing head-on in foundation models without serious capitalization. He is skeptical of scenarios where labs hoard their best models (e.g., AI 2027), citing multiple game-theoretic incentives to ship externally, and Google's Cloud mandate to serve external developers. He also discusses Anthropic's Windsurf cutoff as defensible business logic, Google's diffusion language model demo as a potential new paradigm enabling personal generative UI, Gemini 2.5 Pro's exceptional long-context performance on OpenAI's MRCR benchmark, and the ongoing simplification of agent scaffolding as models absorb reasoning.

On AGI and the future of work, Kilpatrick predicts the AGI "moment" will be a product experience that weaves memory, long context, and reasoning together rather than a single model release. He expresses a strongly human-centric philosophy: he writes his own emails and tweets without AI assistance, believes people will continue to value authentic human perspective (including podcasts), and views NotebookLM-style on-demand content as net-additive rather than substitutive for human-created work. He closes with an open invitation to email him directly (lkilpatrick@google.com) for early-access program consideration.

  • Google's diffusion language model demo showed speeds that could enable personal generative UI; Kilpatrick open to non-autoregressive paradigms winning in some use cases
  • Gemini 2.5 Pro shows ~20% gap over peers on OpenAI's MRCR long-context eval (8 needles), with reasoning fused to long-context unlocking much larger effective context use
  • Agent scaffolding is collapsing: NotebookLM's audio overviews went from a 14-step to a 4-step pipeline as model capabilities absorbed intermediate steps
  • Kilpatrick predicts the AGI 'moment' will be a product experience combining long context, reasoning, and memory rather than a single model release
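
The token figures quoted above can be sanity-checked with a few lines of arithmetic. The sketch below assumes a world population of about 8.1 billion (an assumption, not a figure from the episode) and treats the 10T and 500T numbers as monthly token counts exactly twelve months apart. The compound monthly rate is only an illustrative smoothing of the 50x growth, not something stated in the conversation.

```python
# Back-of-envelope check of the token figures quoted in the episode.
# ASSUMPTION: world population of ~8.1 billion (not stated in the source).
WORLD_POPULATION = 8.1e9

TOKENS_PER_MONTH_START = 10e12   # 10 trillion tokens/month, one year earlier
TOKENS_PER_MONTH_NOW = 500e12    # 500 trillion tokens/month, as cited at I/O
MONTHS_ELAPSED = 12              # ASSUMPTION: the two figures are 12 months apart

# Year-over-year multiple, per-capita usage, and the constant monthly
# multiplier that would compound to the same yearly growth.
growth_multiple = TOKENS_PER_MONTH_NOW / TOKENS_PER_MONTH_START
tokens_per_person = TOKENS_PER_MONTH_NOW / WORLD_POPULATION
implied_monthly_growth = growth_multiple ** (1 / MONTHS_ELAPSED)

print(f"Year-over-year growth: {growth_multiple:.0f}x")
print(f"Tokens per person per month: {tokens_per_person:,.0f}")
print(f"Implied compound monthly growth: {implied_monthly_growth:.2f}x")
```

Under these assumptions the per-person figure lands above 50,000, which is consistent with the "more than 50,000 tokens per month" phrasing used in the episode.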

Source material

Transcript

Hello, and welcome back to the Cognitive Revolution.

Today's guest, Logan Kilpatrick, needs no introduction.

This is his fifth appearance on the show, and his tireless work in support of AI application developers, previously at OpenAI, and now for the last year and change at Google, is legendary.

In this conversation, with the benefit of at least a little time to process, we're looking back and digesting the overwhelming volume of major new AI models and products that Google and others have recently released.

We discuss Logan's experience at Google as their AI usage has grown some 50x, from 10 trillion tokens per month a year ago, just after he started, to 500 trillion tokens per month today.

That is notably more than 50,000 tokens per month for every person living on planet Earth.

Logan also shares his perspective on Google's incredible organizational transformation, from what once was described as a sleeping giant, to now an indisputable top-tier AI powerhouse.

Google has assets headlined by the strongest overall compute infrastructure of any company, top-tier and Pareto frontier models, including Gemini 2.5 Pro, highly original and viral products like NotebookLM, game-changing applications in medicine and science that are starting to ship to trusted users, and what I have always and still consider to be the deepest bench of AI research talent and the most diversified, well-rounded research agenda to be found anywhere in the world.

He also offers thoughtful analysis on whether we'll continue to see convergence among leading AI companies, or more divergence as the low-hanging fruit gets picked.

Why he believes that startups still have unprecedented opportunities despite Big Tech's advantages.

The implications of Anthropic cutting off Windsurf after they partnered with OpenAI.

How the blinding speed of Google's latest "diffusion" language model could bring about yet another revolution in software.

And why, despite all the AI capabilities advances he's seen and helped to popularize, he's still betting that humans will continue to matter and taking a relationship-centric approach to his work.

Speaking of relationships, perhaps the highest alpha part of this episode was when I asked Logan for advice for those who want to break into the early access programs and other support structures that he and people in similar positions can provide.

I won't spoil his response here, but it did include his personal email and an invitation to reach out.

As always, if you're finding value in the show, we'd appreciate it if you'd share it with friends or leave us a review.

We always welcome your feedback too, either via our website, cognitiverevolution.ai, or by DMing me on your favorite social network.

With that, I hope you once again enjoy hearing from Logan Kilpatrick of Google DeepMind.

We're going to be talking at some regular cadence, because I feel like we get to do this a lot and it's wonderful to be back.

It's only your calendar that would prevent that from happening, so be careful what you wish for.

So we were joking about titling this podcast the decade of the week of May 15th to May 22nd, 2025.

Holy moly, we've got just an absolute avalanche.

I've been kind of saying for a long time my grip on like all AI news is slipping.

And with this moment, I think it's officially slipped for everybody.

We've come a real long way since GPT-4 and a ton of stuff is happening.

Kind of want to run it down, but I also realize at this point we can't even be comprehensive.

So I also kind of want to take some strategic opportunities to zoom out a little bit and just get your bigger picture perspective on some things.

First one in that vein is over the last however many months, we've seen like several waves of leading AI companies launching very similar things in pretty short periods of time.

This has happened with reasoning models.

Most recently, it has happened again with the coding agents.

And on a feature level, we're also seeing quite a bit of like connect into your Gmail, connect into your Google Docs, different kind of context retrieval type things.

Some of that's obvious, but some of it is like pretty core research driven, right?

Like getting the models to reason.

How do you understand why the different leading companies seem to have such a similar development trajectory and like also launch timelines?

Yeah, that's a great question.

I think there's a couple of dimensions to this.

One, I think on the research side, there are true innovations where, when people go out and talk about them, it becomes clear in hindsight why something should be done.

And I think like maybe like reasoning is like kind of that story.

Obviously DeepMind has been working on that reasoning stuff for a long time.

I think it was like some of the particular techniques just became clear that let's make this level of order of magnitude of investment and things end up working out pretty well.

And then you bake in all the other stuff that we had been trying that was sort of independent and different, and you start to see some really interesting things.

So I think some of this is people light the path and this is what's awesome about an ecosystem is like other people light a path.

You get to benefit from the path that they've lit and then you go in and sort of, you know, can bake in those innovations into what you're doing.

Plus benefit from the bets that you were making that were sort of independent of that.

So I think that's exciting on the research side.

I think on the product side, I think there's a lot of people.

AI, on both the model side and the product side, is the most competitive ecosystem in the entire world right now.

There is not a more competitive ecosystem, with the amount of money, talent, intellectual capital, speed of execution, etc., than there is in this AI ecosystem right now.

So I think there's just a lot of like competitive people who are actually really good at what they do and they don't want to be pushed behind by their competitors.

And there's this feeling that you have to like stay on par with what everyone else is doing.

And that actually ends up being like, I feel that tension as somebody who builds products in the AI ecosystem, which is there's this tension between like doing what you think the long term future is versus like not trying to look like you're behind in the present moment and like finding that balance point between like the long term bets that are distinct that are actually going to give you a differentiated perspective over time.

And what the short term like we just need to have parity is a tough challenge.

I think like again, feel this on the developer side as well, because we obviously like provide compatibility layers for like people who use other model providers to come and use Gemini.

So there's always thoughts about like, how much do we invest in that to like, get feature parity while also developing sort of next generation API capabilities and model capabilities.

So it's a tough trade off on the dates and the timing stuff.

I think some of it is intentional.

I think a lot of it actually just ends up being like relatively happenstance that things launch around the same times.

But it is always fun to see the conspiracies of like, x, y and z companies just like sitting on something and then like with a one hour notice, they just like, all right, we're gonna all put like anyone who's worked in any company that's larger than like 20 people, it's like not actually possible to do that.

Like the amount of operational overhead and burden to do something like that is like companies are not that nimble, like even choose your favorite AI lab is not nimble enough to like have that level of reaction.

Well, I do have to say Google and DeepMind and you know, your team specifically, you guys have been pretty nimble, right?

I mean, one of the things I wanted to ask is just like, what has the experience been like what sort of culture I mean, a year ago, and definitely two years ago, right?

The sort of outside view of Google was sleeping giant turned sclerotic, like everybody's kind of managing their own little fiefdom.

I don't know to what degree that was really true.

Now the narrative is totally flipped, like the giant is wide awake and really remarkably keeping pace with even much like younger and smaller companies.

One of the most interesting data points shared at the I/O event was 500 trillion tokens per month now being processed across Google services, and we're now into the next month.

So at the rate of that curve, we might be literally like 2x more already.

I don't know if you are watching the dials that closely to know, but how has that happened at Google culturally?

What has shifted, and what has your experience been internally, not just going through that curve, but rallying everybody to actually support all the different work that has gone into supporting that curve?

Yeah, one of the interesting parts about this story is, I think at the core, it's like a people organizational story.

And I think like the challenge is like, it's just not like, that's not selling, you know, front page New York Times articles or whatever, choose your favorite analogy.

But historically, like if you look at how Google was set up to do a bunch of this work, I think the reality was, is it wasn't set up for this moment, I think like Google was obviously structurally had like many different teams doing AI stuff and for a lot of the right reasons, actually, because they were pursuing like fundamentally different goals in some sense, like Google Brain as an example historically had like had a very wide breadth of like, truly different research.

And that's where like a lot of like the transformer and other things like came out of that, like very varied breadth of research.

And then at the same time, you had Google Research, which was doing sort of like more applied things in some cases, and like, trying to upstream a lot of that into other parts of Google.

And of course, Brain also did that sometimes.

And then you had DeepMind, which actually Demis and the team had a had like a strong opinion about like how they thought, and how they think we'll get to AGI.

So like they were pursuing like a very specific research direction.

And you've seen a lot of that in the things that have come out of DeepMind over the last six or seven years.

So, and I think in a world where like nothing had sort of, there, there wasn't a clear winner as far as like what the technology, at least in the short term could be to help scale us closer to like systems that get closer to AGI.

I think it made sense for Google to have those bets across the board, very different structural organizations and different approaches to doing this.

But I think it became clear at a certain point that like one of those things was working in the short term and we should go and like rally resources and sort of get everyone on the same page.

And I think that happened at the end of, or like the middle of, 2023, when Brain and part of Research and then DeepMind actually merged together.

And I think that was like the start of the story of like Google putting itself in the position to be successful for the next 10 years with this technology.

The challenging part is like large human systems are extremely complex.

So like it is not like there's just like a lot of humans involved.

And I think it's just like very easy to forget that like the level of complexity and chaos in human systems, regardless of like how, what you want the outcome to be.

And so I think the DeepMind team and Demis and the folks on the leadership team there have done a great job of reinventing the culture and all that stuff in order to take two very different organizations, bring them together under a single roof, and actually set up the team structure so that Google could be successful building these models.

And then actually going through the process, like it's not just like the large scale training runs don't take like a minute.

So like your iteration cycle is like not super quick.

Your iteration cycle of releasing models and like doing all the end to end work is not super quick.

So like it took time to I think like actually get the iteration cycle going.

And obviously OpenAI and others, or OpenAI specifically, had been doing that iteration cycle, I think, a little bit more leading up to some of these moments than I think we had been doing at the time.

So I think as you sort of do that iteration cycle, and then at the same time, like all of a sudden, everyone wants AI: the 500 trillion tokens a month is like a 50x increase over the previous year.

So at the same time that you're sort of setting up the right organizational structure, doing the iteration loop to make sure that you're actually training the world's best models.

You also need to scale up hardware.

And like that doesn't happen instantaneously either: we need TPUs to do research, we need TPUs in order to do inference.

And the amount of TPUs that you need isn't just like sitting there idly waiting.

So there's like work involved, and there's timelines involved in all that.

So I think like all things considered, given the constraints, I think we're in an incredibly good position.

And the thing that gets me most excited is like the slope of improvement across all of those dimensions is like, how can we work better as a team and like get everyone on the same page?

How do we keep making sure actually that like breadth of research upstreams back into the main Gemini models?

How do we make sure we have the world's best infrastructure?

How do we make sure that we have that iteration cycle, as far as releasing new models down and we sort of learn the hard lessons and sort of develop rigor on that?

I think we're doing all those which has been incredibly exciting to see.

I think DeepMind also the last comment I'll make is like DeepMind has also transitioned.

And I think we talked about this before from being an organization that did foundational research to now actually building products.

So I think that's the last sort of step of this organizational journey is how do we build the Gemini app?

How do we think about what we do for developers?

How does that actually influence how we train models, and what does that sort of iteration cycle look like?

So all that stuff has now happened.

And the work has been done.

And I think we're like putting the pieces in the right places.

And I think now for the next three to five years, we get to see the outcome of making good and hard decisions to put things in the right place.

Yeah, if I had to summarize that and maybe contrast and I won't ask you to contrast, but there's another big AI research organization out there with sort of fragmented structure and tons of compute and researchers that have been pursuing lots of different directions for a number of years.

And that, if anybody hasn't already identified it, is Meta.

And the contrast has been pretty strong.

One possible explanation would just be that like leadership at DeepMind sort of saw that like, yeah, we're kind of getting close.

Like this actually seems like it might be kind of a thing now.

And so it's worth going through all that trouble of reorganizing, and I'm sure there are many things Demis would rather do than reorganize an organization and redraw lines of reporting and who reports to who and whatever.

But if you're close and it's like you're kind of getting on that wartime footing, so to speak, not that I ever wanted to see an AI war, it's worth it.

And you haven't seen that same kind of thing at Meta and maybe a few other companies.

And you also haven't seen the sort of integration of the work.

I mean, they do have like Meta AI in their apps, but clearly it's kind of not on the same level.

And they also haven't done the reasoning thing.

Like, I'm sure they had Meta researchers at the same San Francisco parties getting those same very-few-bit hints that like, hey, this seems to be sort of working, that everybody else seemed to kind of say, okay, we better make sure we're on that train.

And thus far they kind of haven't.

So I guess for me, takeaway there was maybe just the importance of conviction in leadership to kind of do whatever it takes and push hard on on small hints that seem credible seems to maybe matter a lot right now.

Yeah, the other piece that I'll add is like how I think it I think the incentives matter a lot.

And I think for Google, the incentives in the DNA matter a lot.

And Sundar has said this many times.

And I think he's spot on about this.

Like, Google has been an AI company since Sundar took over in 2016 or whatever it was, and people joke that Google was sitting on the transformer and didn't use it.

The transformer was powering Google search at multibillion user scale in a bunch of different ways.

So like the technology was being used, it wasn't in the same incarnation as like the current generative AI stuff, but like it was being used at that level of scale.

So I think like building models, deploying it, building that infrastructure, that iteration process had been in Google's DNA, it obviously needed to like be reformulated a little bit for current team that's doing that like across Google.

The other piece of this is just the incentives as well.

Like if you look across Google's products, Google organizationally, and from what the future of our products looks like, is so incentivized to make great models, because the great models that we make, and this is an interesting thread that we should talk about, are present across all of our products: you're writing in Docs, you're doing things in Sheets, you're in a Waymo, you're doing stuff on YouTube with video, you're a Cloud enterprise customer and you're doing something. All of those use cases end up benefiting from this.

Like it's not just like a add on thing, it is fundamental to the success of those products.

So I think there's an interesting angle to this around like how incentivized Google is to be successful.

I think we're highly incentivized and it's in the DNA of what the company's been doing for the last 10 years.

So I think those two things as well, if you don't have them, I think it just it makes this moment a lot, probably a lot more painful than it would have to be otherwise.

Hey, we'll continue our interview in a moment after a word from our sponsors.

In business, they say you can have better, cheaper or faster, but you only get to pick two.

But what if you could have all three at the same time?

That's exactly what Cohere, Thomson Reuters, and Specialized Bikes have since they upgraded to the next generation of the cloud, Oracle Cloud Infrastructure.

OCI is the blazing fast platform for your infrastructure, database, application development and AI needs where you can run any workload in a high availability, consistently high performance environment and spend less than you would with other clouds.

How is it faster?

OCI's block storage gives you more operations per second.

Cheaper?

OCI costs up to 50% less for compute, 70% less for storage, and 80% less for networking.

And better?

In test after test, OCI customers report lower latency and higher bandwidth versus other clouds.

This is the cloud built for AI and all of your biggest workloads.

Right now, with zero commitment, try OCI for free.

Head to Oracle.com/cognitive.

That's Oracle.com/cognitive.

That 50 or I should say 500 trillion tokens per month is 50,000 tokens per month for every human being on the face of the earth, which is a pretty crazy number.

That is crazy.

It's grown a lot faster than I expected that it might.

And obviously, there's other providers out there processing a lot of tokens too.

We're now getting into the regime (again, of course, not everybody's using it) of starting to get some significant inference numbers on a per capita basis.

As you look ahead, do you think that we're going to continue to see this convergence where the leaders will mostly be doing stuff that is measurable on the same bar charts or do you think we will start to see more divergence, which could mean different form factors, significantly different strengths and weaknesses?

Who knows what divergence might look like, but what's your expectation there?

I would guess we see more divergence to be honest.

I was actually just at a dinner a couple of nights ago and was talking to some founders and a bunch of them are clearly betting on the fact that there'll be model convergence through this.

I think it depends on what level of abstraction you want as far as what are things going to converge.

My general sense is the low hanging fruit has been captured.

Now it's like what structural advantages do you have as a business to train LLMs?

I think Google has a really important infrastructure advantage in the ecosystem and a bunch of other things like that where I think you'll actually see those things shine through.

I think getting to this point was not unexpected.

I think now getting to the next level is not going to be easy for a lot of people to do.

That's where the world class teams and folks who are really making the order of magnitude of bet on this are going to see the advantages.

I would expect also intuitively through that lens of it only gets harder from this point.

The AI innovation does not become easier after this point to make these models better.

I would guess a bunch of teams start to, and it'll be interesting to see what the size of the labs, maybe all the labs will keep doing everything, but I think there really will be opportunities to go and focus on a specific area.

I'm just like conjecturing here, but you could imagine like Anthropic decides, "Hey, we actually just want to be the world's best coding model, and that's the only thing we care about, and that's what success looks like."

I don't think this is going to be true because they have a very broad mission of what they want to do, and it doesn't seem like it's specifically code, but you could imagine that there are companies that are doing some angle of this where they end up deciding, "Hey, there's actually value in diverging from this path of something super, super general because we could build a really great business and do that."

I think it's, again, it's at odds with some of the broad missions that some of these companies have, but I do think there's actually real value in that, and you can really go deep and start to understand how to build a long-term business and company around some of those things.

Maybe the big labs won't do that, but I think at least for other ...

There's obviously more people who are training foundation models than just the big labs, and I think a lot of those companies are going down the path of, "Let's find something that we're really good at.

Let's build a differentiated perspective on how to solve this problem from a model perspective or from an infrastructure perspective," whatever it is, and I think that makes a lot of sense, honestly.

Yeah, that's interesting.

I don't know.

I should be at least somewhat deferential.

You probably know better than I do, but I just look at just how fast these core foundation models from the leaders are getting better, and I'm like, "I would not want to be a tier two foundation model trainer in today's world."

Especially if it's only getting harder from here, and you're saying that from the DeepMind position, I'm like, "Boy, that sounds really hard from any other position."

I mean, I guess my full ...

It won't be for you to sign on to this in this moment, I don't think, but my, transparently, my position is like, "I just think the big tech companies are going to win anything and everything that they want to win," and you're seeing this bleeding into the application layer as well.

I mean, interested to hear, I know you've been very focused on supporting developers directly via the AI Studio and the APIs.

You also, I'm sure, are in regular dialogue with folks like Cursor and Windsurf and anybody who might use Gemini 2.5 Pro as a coding model, but now we've also seen all the big three frontier developers in this last wave.

They've all put out a coding agent too, right?

How are they feeling and how are you talking to them about the fact that they're using, you want them to use the model, but you also now do have a competing product in market against them, right?

Yeah.

A couple of things.

One, I think the product that we do have that you're referencing is Jules, right?

Yeah.

At least for us, Jules is definitely super early.

I think I'm super excited.

It's a great team inside of Google that's working on it, but obviously the level of adoption that some of these other AI coding products have is very much on a different level.

500 million ARR they just said today.

Yeah, yeah.

I saw that tweet, which is exciting for them.

I think there is, I was talking to someone last night and the comment that I made, which is continues to be true and I don't know if I've said it on another episode that you and I have talked about, but there's no better time in human history than right now to be building a startup.

If you're building a startup to build language models and compete against all the big labs, you better be very well capitalized to do that because that's a very difficult problem.

If you're building out the application layer, it's never been easier.

The time to build software, the opportunity to explore new ideas, the pace at which AI enables you to potentially, this current AI moment enables you to scale monetization and build really retentive user products and build a profitable business.

All of these things have never been, the barriers have never been lower than it is right now to do all those things, which is just as somebody who fundamentally believes in developers changing the world.

I think that's the coolest opportunity ever.

Sure, the big tech companies will hopefully be successful as well and we'll sell infrastructure and do some things at the application layer, but the real opportunity is there's a million and one different problems to be solved and some of these large billion user products solve things in a really general way.

The cool thing is you can really go deep for some specific user segment and solve their problem in a unique way and the cost to do that from building a startup and writing the software to do it has never been lower.

And the outside world, the startup world, there's a thousand and one different AI tools that you get to leverage in order to get to that place.

Larger companies don't, just because of the level of security and the privacy requirements and the enterprise requirements, oftentimes don't use a lot of those tools.

It's a different bespoke set of tools, which the pace of tooling innovation for large companies often happens a little bit slower than it does for the startup ecosystem.

So you have all of these entrenched speed advantages and then you couple in the idea that everyone's going to have a bunch of agents building stuff for them in the future.

I continue to be super, super excited for people, even in the coding space at the application layer who are building stuff.

There's so many cool things to be built.

The importance of speed for startups has always been real, I suppose, but it seems like it's taken on an extreme importance now.

Not too long ago, I talked to Andrew Lee from Shortwave, who is building a product on top of Gmail that is also a Gmail competitor.

And he said after really soul searching deeply, we came to the conclusion that our only advantage is speed.

We have nothing else.

Focus is the other piece.

And I think this goes to big companies and I feel this as well.

There's lots of tension for me, because the cool thing about Google is there are a million and one innovative things happening.

The challenge is how do you actually balance that and take action based on the million and one innovative things that are happening?

It's a real burden.

And the nice thing for startups is you don't have a million and one innovative things happening.

You can just go and do one thing.

And I have a lot of envy for folks who have that because you just don't need to make a lot of decisions.

You can just really focus on solving that problem at hand.

And I think that speed of execution, the ability to focus on just a single thing, is a blessing, so take advantage of that as much as possible.

What do you make of this Windsurf news lately?

The brief story is they agreed to a deal with OpenAI.

They'd been using Claude as their primary model, and then Anthropic cut them off by virtue of their having agreed to this deal with OpenAI.

On the face of it, that's all pretty reasonable.

But if I am a coding agent company or whatever that's thinking like, what's my long-term prospect here?

Right now, the speed advantage goes to the startups because the big tech companies have been so friendly, I guess, to the rest of the ecosystem as to put the models out before they've implemented them in their own products in many cases.

But it's not too hard to imagine that flipping.

If Google wanted to say, "Okay, Gemini 3, we're going to deploy in our own coding agent and Gmail and Docs, and then a few months later we'll put it in the API," that would definitely really flip the speed advantage on its head.

Do you think startup founders should be worried about that?

Yeah, that's an interesting question.

I think, one, on the Anthropic piece: the line about Anthropic wanting to invest in who they think will be long-term partners and getting compute to those customers is one I have empathy for, as somebody who's spent a bunch of time thinking about how we get compute to the right teams that are building products.

I think that could make sense.

It's totally defensible from any number.

A simple business strategy, don't support your competitors, is I think totally defensible, certainly in a normal business context.

Yeah.

I think as far as our strategy, the great thing for builders is Google Cloud is the fifth largest enterprise business in the entire world.

The mandate of Cloud is to bring this infrastructure to the rest of the world, to bring Google's infrastructure to the rest of the world so that people can build world-class startups and not need to rebuild the level of infrastructure that Google's built in order to scale the internet to where it is today.

It's such a core and foundational part of the business that I would find it hard to believe the strategy would shift away from shipping across our own services and also shipping across Cloud services.

Interestingly, today it's often even more extreme than the picture that you painted: oftentimes the external developers have an even larger speed advantage from a model perspective.

Because if you think about who is the customer of models inside of Google, it's teams that are building billion user products.

The teams that are building billion-user or 150-million-user products. I was just talking to someone about some of the features and products that exist inside of Google Workspace, and some of the ones that I've never even thought about have 150 million monthly active users, which is crazy.

That user persona, if you go and talk to enterprise users of LLMs, doesn't move LLMs quickly.

They don't switch models as quickly because even though they're inside of Google, there's still all of the normal constraints of building a large user product, which is you don't want the behavior to shift.

You have to do a ton of evals.

You have to do all these things and all that requires time.

I think when you have a small product, it's very easy to just quickly switch models and you don't really have to think about it that much.

But I think for teams inside of Google, they do have to think about that.

That's the responsibility to the users that we have is to be really thoughtful about that.

Again, I think the time horizon for getting these models out the door would shift dramatically if the strategy became, we have to force internal teams to use these and deploy them before we give them to external developers.

We'd be so far off from where we are today that, again, I have a hard time imagining that would make sense.

Also, again, from the business perspective, it's important for us to give LLMs to developers because it's a core part of the Google Cloud business, which is, again, a huge business for Google.

I know you know Daniel Kokotajlo.

I don't know how well, necessarily, but you guys overlapped at OpenAI.

I'm sure you're familiar with his AI 2027 scenario.

One of the interesting things in that is that he projects basically over these next two years that the model developers are going to start widening the gap.

I saw Roon not too long ago on Twitter; somebody asked, like, how far ahead is what you have internally versus what we see externally?

He said, "You guys have no idea how good you have it.

It's like two months.

You're on the bleeding edge just behind where we are internally."

Daniel's projection is that that will change, for multiple reasons, including wanting to use the models intensively for their own ML research automation, dreams of recursive self-improvement and takeoff, and who knows what.

He's got pretty aggressive scenarios in mind there.

He thinks that basically that gap is going to widen, that the developers are going to start to hold the best models back for themselves.

The public will kind of satisfy more often and the really insane stuff will be held very closely and known to few people.

It sounds like you don't buy that scenario basically or at least don't see any signs of that happening at Google.

I think there's two dimensions.

One, it's hard to get signal on how good models are.

Evals are just such a difficult problem.

You often don't really have an intuition as to whether this thing could be the right model long-term if you don't release it to the world.

That is just one of n pressures pushing you to get the model out the door.

I also wouldn't discount this quasi-momentum war that happens: it's really important to project what the external momentum looks like from an AI perspective, because if you go and talk to developers and people who are building companies, that's actually a really large influence on who they end up building on.

I also think there's a bunch of other threads of this, which is switching stuff is hard.

There's so many layers that I have a hard time again buying that we're not going to deliver models to the world in the same way that we're doing it now.

I think there are just many levels of motivation and game theory which tell me that that won't be true.

It's also interesting to think about how do you actually capture the most from an economic perspective, how do you capture the most value from this technology?

Maybe it's not us.

The cool thing about developers is obviously Google has large distribution, but you get this really wide aperture of distribution across so many different things.

Maybe the economic model actually looks slightly different, assuming the models are way better and can do all this technically productive stuff.

The unit of intelligence being a token, and charging people on a per-million-token basis: I could buy that that changes in the future and the economic model looks different from how developers do it today.

But I still think that fundamentally there's a great business to be built giving that to other people, because how you're going to use this model looks very different from how other people are potentially going to use it.

You could build a great business doing that by releasing it to the world.

The importance of feedback definitely is not to be missed.

That does also connect back to what's going on at Meta, the line of conversation that, for the record, you're not commenting on, but which I'm just tangentially mentioning: they're not getting nearly as much of that feedback as the companies that currently have the best models in market.

Yeah, that's interesting.

How much do you use other companies' models?

Do you go and do your own like different vibe checks across different providers?

What's your model diet?

Yeah, I play around with a bunch of stuff.

I think it's interesting.

It's fun to just look around; independent of my job, I'm somebody who loves technology, and I love seeing cool AI products.

So I spend a lot of time playing around with all the coding models.

I think that's probably the thing that I experiment with the most, but there's also tons of cool stuff happening in the audio space right now.

We launched our native audio model at I/O, which was one of the threads, and it's available in NotebookLM and a bunch of other stuff.

It's available for developers too, and it's been really interesting to see that as an emergent space where people are building products and services.

So I've been spending a bunch of time playing around with the products and the models that people have.

ElevenLabs just landed a new model to do something similar: native, really robust, natural-sounding audio.

So it's been super cool.

Across whatever dimension you want, there's tons of fun stuff to play with.

I still try ChatGPT every once in a while to play around with it and see what that experience has evolved into.

But yeah, it's been, it's fun.

It's fun to be somebody who likes using this technology.

It is interesting though.

I'm not one of those people, though I have lots of conversations with folks who talk about how they will send the same query to three different models and then go and examine the differences between the answers.

And they do it in like three different tabs.

And these are people who I have a lot of respect for who are like running companies and all this stuff.

And I'm like, that seems like a pretty cryptic way to be doing that type of experimentation, which has led me to think there's probably some interesting product to build there.

Like people are really trying to understand the nuances and differences between models and engage with multiple answers.

And I think the like multiple pieces of content thing is like a really interesting thread to pull on.

You can imagine that in the future and in product experiences, but yeah, I'm not at that level where I have Claude, Grok, ChatGPT, and Gemini open at all times.

And I'm asking my question in three, four places.
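
As an illustration of the "same query to several models" workflow described here, the following is a minimal sketch in Python rather than anything the speakers built. The model callables are hypothetical stand-ins (in practice each would wrap a provider's SDK), and the similarity score is only a rough character-level comparison from the standard library, not a measure of answer quality.

```python
from difflib import SequenceMatcher
from itertools import combinations
from typing import Callable, Dict, Tuple


def fan_out(prompt: str, models: Dict[str, Callable[[str], str]]) -> Dict[str, str]:
    """Send one prompt to every model callable and collect the answers by name."""
    return {name: ask(prompt) for name, ask in models.items()}


def pairwise_similarity(answers: Dict[str, str]) -> Dict[Tuple[str, str], float]:
    """Rough textual similarity (0.0 to 1.0) for each pair of answers."""
    return {
        (a, b): SequenceMatcher(None, answers[a], answers[b]).ratio()
        for a, b in combinations(sorted(answers), 2)
    }


# Hypothetical stand-ins for real model clients (assumption, not a real API).
models = {
    "model_a": lambda prompt: "Paris is the capital of France.",
    "model_b": lambda prompt: "The capital of France is Paris.",
    "model_c": lambda prompt: "Paris is the capital of France.",
}

answers = fan_out("What is the capital of France?", models)
scores = pairwise_similarity(answers)
for pair, score in scores.items():
    print(pair, round(score, 2))
```

Pairs with low similarity are the ones worth reading side by side, which is the part people currently do by hand across browser tabs.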

I don't do that all the time by any means, but I do it on occasion.

My general philosophy is always to try to be doing two things at once.

One being whatever the object level task is, and then two being learning about AI's ability to help me with that object level task.

Contract review would be a great example.

If I'm going to take on some advisory agreement or whatever, they'll send me the contract.

I'll send it to at least three AIs.

If none of them have an issue with it, I'll just sign it without even reading it myself.

And usually they are pretty consistent, but it is also kind of an interesting opportunity to see just how they're presenting things a little differently.

You know, Claude is like typically the shortest and the least formatted.

I don't know.

It's been, it is hard to characterize.

How would you characterize, I mean, Gemini 2.5, of which you just launched a new version on stage, right, at the AI Engineer World's Fair?

I can't claim any deep familiarity with the new one because it just came out yesterday, but with the Gemini 2.5 Pro class of models, I think we're on the third date-stamped version, right?

It was one of those like kind of hair raising moments for me because the command of the context window that it has was just so incredible.

I've talked about this on a couple of different episodes, so briefly: I dumped a research codebase into it that had something like 400,000 to 500,000 tokens.

No other provider, at least with the level of access that I have, even supports that length of context, and to see the command that it had of it was something.

And I was literally going back and forth debugging problems in AI Studio, and don't tell anybody, because they might cut me off for this behavior, but it's rewriting whole files for me.

And then I'm saying, oh, I got this bug, please fix it, and it rewrites another version of these long scripts for me with half a million tokens of context.

That was like, wow, you can feel that step change.

I wonder what other step changes you would highlight that people might not be fully aware of or more kind of subtle vibes and tone sort of differences that you think distinguish Gemini from other options.

Yeah, this month, the whole model behavior piece has been really interesting to see like folks have a strong reaction to like default personalities.

And I think we're very early in coming up with a rigorous point of view on how to make a default personality that works well. Through a certain lens, if you look at the requirements for building the Gemini model, the baseline Gemini model is used across so many different products, even inside of Google, that have such a varied point of view and such varied users they're building for, that it actually becomes really difficult to come up with one.

I know Claude and Anthropic have done a ton of stuff trying to make the model personality feel distinct, and they have a point of view on what that should look like.

I think this goes back to, what are the advantages for startups?

I think Anthropic gets to do that just because, you know, the consumer product is relatively small compared to billion-user products.

So it has been interesting to see us take a more middle-of-the-road approach: not try to have too much of a personality, but also make sure the model can have one if that's what the product people want to build calls for.

And the best example of this is the Gemini app: the Gemini app probably wants to actually have a personality and do some of that stuff.

I think the challenge becomes how you like maintain that.

And I think I've seen that time and time again of like as you change the models, the personality dramatically changes.

And if you're like intimately conversing with these models, it feels like the person or the model you were talking to before is now gone and it's replaced by something else.

And I think that's actually a pretty jarring experience for today's like model iteration process.

I think there's some interesting stuff to happen to make that not the case.

But on what you described, I actually have a tweet queued up that I need to put out about long-context capabilities, because far and away 2.5 Pro, this current iteration, has a really large gap, and I'll tweet this.

I'm in my era of tweeting things live right now just because it's top of mind, but I'll put this out and then I'll send you in the chat this tweet that I just put out.

To what you're talking about, gaps in model performance: long context is clearly one of those right now.

I'll put it in the chat for us.

I just opened Twitter and there it was eight seconds ago.

This is showing OpenAI's MRCR, which is a long-context eval that OpenAI built, and you can see the delta between the models: far and away, the latest version of 2.5 Pro is on the order of 20% better.

And this is eight needles, which is the hardest version, not just single needle. Single needle is retrieving one thing from the context window, and even the Gemini 1.5 Pro model from over a year ago was close to 100% accurate on that.

That was basically a solved problem.

The problem is exponential decay as soon as you start adding more needles, so to see this level of progress, with a model being able to process and find eight distinct items, is actually pretty remarkable, especially given there wasn't a whole lot of long-context innovation in the last year.
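
To make the "exponential decay" remark concrete, here is a toy calculation in Python. It rests on a simplifying assumption that is not from the conversation: each needle is retrieved independently with the same per-needle accuracy. It is not how MRCR is actually scored; it only shows why near-perfect single-needle retrieval does not imply strong eight-needle retrieval.

```python
def all_needles_recall(per_needle_accuracy: float, needles: int) -> float:
    """Chance of retrieving every needle, assuming independent retrievals."""
    return per_needle_accuracy ** needles


# Compare one needle against eight for a few illustrative accuracies.
for accuracy in (0.99, 0.95, 0.90):
    one = all_needles_recall(accuracy, 1)
    eight = all_needles_recall(accuracy, 8)
    print(f"per-needle {accuracy:.2f}: 1 needle {one:.3f}, 8 needles {eight:.3f}")
```

Under this toy model, a 90% per-needle accuracy falls to roughly 43% for eight needles, so small per-item gains compound into large multi-needle gains.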

And I think it's this combination; I've had this conversation.

I was just with Jack Rae yesterday.

He's awesome.

He leads our reasoning efforts and originally worked on long context, and I've talked to him a lot about this fusion of long context and reasoning; really, it's reasoning that enables you to use the full context window that's available.

So it's cool to see that happen finally.

Yeah, you can feel it.

I mean, I would say, you know, of course benchmarks and practical use are not the same thing.

I don't know what I would have said about Gemini 1.5 in terms of did it feel like it had the depth of command.

I did a few things where we put like whole books through it and ask it to find relevant quotes.

It did, you know, could do that pretty well.

But with this new thing, if people haven't tried it recently, one of the things that's challenging is that it's so much context that it's hard in many cases for you to know if it's doing a good job.

Video is one of the best ones.

I've actually been doing this. The challenge is that you perhaps have to have already watched the video, but it's a fun experiment: take a long video you've watched, then go and ask a bunch of questions. That use case tends to shine, which is really interesting.

And interestingly, because of how good the context window has gotten, we've actually seen a shift in what the distribution of request sizes looks like.

Historically, we'd been like, ah, why does no one really use long context that much?

We were definitely ahead of the technology and the curve from that perspective.

And now what we've seen with 2.5 Pro is that long-context usage is dramatically higher than it's historically been, which has been awesome to see: people coming around and building on it. I think it gets closer to this future of, you know, the whole long context versus RAG discussion.

I think historically you could have brushed it off a little bit, because the model wasn't really that good and no one was really doing it.

But I think the future is going to look more like people putting more and more stuff into the context window.

And of course there are still cases that need RAG, but it's awesome to see that corner turning from a long-context perspective.

Yeah, turn your hyperparameters up.

It's one of my current mantras.

Another good one, if you want to get a qualitative sense of the command that the new models have of long context:

I have a simple Colab notebook for extracting emails from the Gmail API, where I just search, you know, "from me", basically just what I've sent, as a simple filter to cut out all the crap I'm not engaging with and pull only the threads that I have sent a message to.

You can go back, you know, depending on your volume of email, pretty far and get a pretty robust picture of who you are that still fits into a million tokens.

And then you can start to get a sense for what the model understands of you from that million tokens.

And it is pretty impressive.

And I can share that Colab notebook if anybody wants to mess around with that.

I put it in that format so that you can do it without your data ever having to leave Google.

It's just going from your Gmail to your own Google Drive via the Colab notebook.

So it was the most secure way I could think to make it that I could share with you.
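
The notebook itself is not shown in the conversation, so the following is only a rough sketch of the approach as described, not the host's actual code. It assumes `service` is an already-authenticated Gmail API client built elsewhere (for example with the Google API Python client), uses the `from:me` search operator as the filter, and estimates tokens with a crude four-characters-per-token heuristic, which is an assumption for illustration.

```python
from typing import Any, Iterator, List


def iter_sent_thread_ids(service: Any, query: str = "from:me") -> Iterator[str]:
    """Yield IDs of threads matching `query`, following pagination."""
    page_token = None
    while True:
        response = (
            service.users()
            .threads()
            .list(userId="me", q=query, pageToken=page_token)
            .execute()
        )
        for thread in response.get("threads", []):
            yield thread["id"]
        page_token = response.get("nextPageToken")
        if not page_token:
            break


def pack_within_budget(
    texts: List[str], max_tokens: int = 1_000_000, chars_per_token: int = 4
) -> List[str]:
    """Keep texts in order until a rough token estimate would exceed the budget."""
    kept: List[str] = []
    used = 0
    for text in texts:
        cost = len(text) // chars_per_token + 1  # crude estimate, not a tokenizer
        if used + cost > max_tokens:
            break
        kept.append(text)
        used += cost
    return kept
```

Fetching each thread's text and writing the packed result to Drive are left out; the point is only the two steps mentioned above, filtering to threads you have sent a message in and stopping at roughly a million tokens.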

I don't want to have your email.

It's the last thing I need.

So, okay.

You've mentioned a few times being with people, dinners, talking to founders, I guess two kind of angles on that.

One is what are you looking for in those groups?

I mean, everybody who's building wants to be in the inner circle of early access programs, the trusted tester rosters and all that kind of stuff.

How do people get into those programs?

How do they get into the trusted tester sets?

Also, can you enable the Veo 3 API for me, please?

How much is your network how you're keeping up, versus other ways of keeping up?

Yeah, that's a good question.

I think this looks different depending on like what you're doing.

I think for me, maybe I'm not sure how well this will track across people, but for folks who are listening this far into the conversation, I assume you're an AI enthusiast.

It's really strong after the first like five minutes.

I love that.

So send me an email.

Honestly, like we have a super robust early access program.

We'd love feedback from people building interesting things.

So if you're building something interesting, email me. My email is L, the first letter of my name, and then Kilpatrick, my last name, at google.com.

Send me an email.

I would love to hear about what you're building.

We'd love to get you into the early access program.

It's not some big thing; some stuff is more secret.

Some stuff is less secret.

And we really just love to work closely with developers and get feedback and be as like open and building with people as possible.

So email us.

Yeah.

So, the Veo 3 API is a work in progress.

We don't have one available at the moment that we can onboard people to externally.

We're setting up a bunch of stuff and also working on how we can make it work, because the order of magnitude of scale for the API product, relative to putting the model into a consumer product with a high price point, is just different.

It's different along a lot of dimensions.

So we're working on ways to make sure we have the infrastructure to support the model at the scale of demand we're going to see from an API perspective.

It's incredible to see audio really bringing video to life.

Historically, I'd truthfully been pretty skeptical of a lot of the video models. It's cool to see the video generated, but the practical use cases were hard for me to see, because the amount of work it would take to do something meaningful with that video is pretty substantial.

And I'm curious, for Waymark (I've used the product, but not recently), how much audio has been an important part of that story.

But I think it really brings the video to life for me now that it has audio, and the audio feels like it's actually native to what the video is meant to be saying.

Yeah, it's incredible.

First of all, I mean, I've been thinking recently that we're quite lucky.

I don't think anybody really planned this.

But you know, there was a lot of hand wringing about deep fakes and fake voices, cloned voices, making calls, including a little bit from yours truly around the election that didn't really come to pass.

I think maybe mostly because the models weren't quite there yet.

And we're fortunate that it's sort of landing like early in a cycle where we hopefully by the time the next election comes around, we'll actually have enough reps and people have sort of built up cultural immunity to it.

And there'll be more guardrails and whatever, such that hopefully we'll be able to deal with it.

But it is getting to the point now where it's like, I genuinely don't always know if something is AI video or real video.

And for Waymark in particular, audio has been really important.

We have traditionally taken, and still do take, the approach of just having a voiceover track. You know, we mostly make TV commercials, we mostly partner with big media companies, and YouTube ads are a natural part of that.

And these are all sound-on environments.

So anytime you see a TV commercial, there's usually a person talking to you.

And then there's visuals and there might be on screen text and images and what have you.

That's kind of our usual approach.

We use a mix of providers, but ElevenLabs has certainly been a very important provider for us, and their voice quality just continues to climb.

With Veo 3, it kind of opens up a new dimension.

In the past, we mostly used images that the businesses have.

And then the next step is, well, if we can bring those images to life by doing image-to-video with even a Veo 2, then that just makes the whole thing more dynamic.

Quality is really important there. I would say Veo 2 mostly hits the mark, but sometimes has a little bit of weird stuff.

Honestly, it is pretty, pretty damn good.

But with three, now it's like, oh, you could even kind of rethink the form factor a little bit.

You could sort of imagine having the voiceover talk a bit, but then also like flipping over to a clip and having that thing sort of present in a different voice.

And so it definitely opens up the space of possibilities for us in terms of the sorts of stories we can try to tell.

We're mostly telling small local business stories, but there's a lot of different ways to tell them, right?

And we've been relatively narrow in that space over time, just because the technology could only do so much.

This is the kind of thing that our creative team sees and it's up to them to kind of figure out like, what exactly would you want to make out of this new thing now that you can have all kinds of different voices showing up in a real context like that?

And honestly, I think we're still kind of wrapping our heads around it.

And also somewhat limited by the fact that we're still just testing it in the actual top tier Gemini app.

My personal AI spend is up to about a thousand dollars a month, which is also an interesting thing.

I've been saying that for a while, but I hadn't actually got there.

And now I'm pretty much there between OpenAI, Claude, and Gemini, all at the top level, plus like 20 other things that I've accumulated.

It's amazing to be spending a thousand dollars a month on AI subscriptions, but some of these things you got to have.

Yeah.

Do you think you're getting that level of value out of them?

And I assume, just given the position that you're in, that some of them are duplicative because you want to test all the different stuff.

But if you were to remove the duplicative ones and just had whatever the best was across a bunch of different categories, do you feel like you're getting that level of productivity boost relative to what you're spending today?

No question.

If I wasn't committed to testing everything and having the earliest point of view on things that I can get, I think I could get a very similar productivity boost for much less, but the productivity boost is still like dramatically higher than what I'm paying.

No doubt the acceleration of just all sorts of different work.

I mean, last week I was traveling a bit and ended up coding two different apps on Replit just with the agent doing almost everything for me.

And it's starting to feel like delegating work to other humans, you know, much more so than a few years ago when we talked about prompt engineering, the original prompt engineering, right?

It's like setting things up so that a natural completion of what you provided would be what you wanted.

And now it's just like, I literally don't think that much about the fact that this is even AI and it's more just like, here's some product notes.

And if it messes up, then I'm like, well, why did you mess up?

Like, did I mislead you or something?

I have to think a little harder, but I'm really struck by how the communication to the AI is now feeling much more natural, much more high level.

And no doubt the boost is tremendous.

This is one of the evals that I think is most exciting: relative to the amount of money you're spending, how much value are you getting? It's hard to measure that, I know.

So it is somewhat theoretical, but that is what I think matters long-term as we move away from all the regular academic benchmarks, which are saturated, et cetera, et cetera.

What is the economic productivity that's created by some of these systems? You need a broad sort of mandate in order to do something like that, just because there are infinite possibilities, but it is really interesting to think about.

And I do think it's a cool North Star to drive up the amount of value you can create in the world, in a very positive way, through a $20 subscription. The value you get today from a $20 subscription relative to what it's going to be in five years, I think, is really materially different, and, yeah, it'll be cool to see that play out.

Yeah.

Another dynamic that's going to be interesting to watch is what, if any, stable equilibrium do we ever arrive at?

I think right now one of the reasons that there's so much surplus for me is that we're not yet in equilibrium.

And so to a certain degree, I have like superpowers that other people don't have, which they could have, but they don't because they're not aware of it or they just haven't developed even just the habits.

Right.

I mean, a lot of it is honestly just thinking to do it in the moment, or to go use the AI instead of doing it manually, for whatever version of it you might be considering.

Back in the holidays, late last year, there was a project, and I wasn't involved in the business side of this at all, but somebody basically came to me and said, hey, I've got an audio production project for this company; they've got a big network and they want to do a ton of local radio ads.

And I thought of you, maybe you could do it.

And they're like, they usually would pay like a couple hundred bucks per location, per version of the ad.

This would end up being into the six figures, but we're wondering if you could do it for less.

And the discount that we were able to provide to this company relative to what they are used to spending was probably like 75%, but still the revenue per hour that I actually spent on it was probably like $3,000 an hour, not all of which came to me, by the way.

But that sort of disequilibrium, I think, doesn't last forever.

A lot of people over time will figure that stuff out.

And so I do wonder kind of, you know, in that project in particular, and in general, I feel like I'm sort of still being compensated based on assumptions that like have not fully taken on board the fact that like productivity can and in some places has like significantly jumped.

Will that still happen in the future?

I don't know.

Yeah, I think there's still going to be those edges in the future, which is interesting.

If anything, I think the pace of innovation is going to go up.

Back to my comment from before, that doesn't mean that it's not going to be more difficult, but I do think the pace of innovation is going to continue going up and to the right.

And because of that, I think there are going to be a lot of discontinuities and opportunities.

So like being on the frontier is like likely to be disproportionately rewarded because you are using all the tools and stuff like that, which is super interesting to see play out.

But there are so many edges and so many opportunities left as the frontier keeps going forward that even if you're just showing up today and you're like, I'm not on the frontier, there's probably 50 things that you could go and explore that end up being super, super interesting.

Forced transition.

Speaking of things that are on the frontier and super interesting.

Let's talk a little bit about agents.

Obviously everybody's talking about agents in all sorts of different ways.

Here's a horseshoe theory of agents.

I've found that the latest things, whether it's a Claude Code or a Jules or any of these more agentic models that take multiple steps and do bigger mini projects for you, actually feel to me much more like the original ChatGPT, in that the mode of interacting with it is very turn based.

What's happening is the turns are getting bigger.

The output is getting bigger and hopefully more valuable, hopefully more accurate in order to be able to do all that stuff and succeed.

But you're sort of on a one off basis.

You're still like kind of on the hook as a human for like figuring out like, did it do what I wanted?

Did I ask it the right thing?

Is this actually working for me at all or not?

And how do I proceed based on what it did?

I have to sort of evaluate that on a step by step basis.

And then in the middle is where I think people are actually getting like scalable automation value where they're not letting the AI like choose its own adventure.

They're not just saying like, here's 50 tools and a goal like go, which can sometimes create these magic moments, but often like doesn't do what you want.

In the middle, it's much more structured, whether it's LangChain or whatever kind of paradigm, where it's like, we're going to break this thing down into its constituent parts.

We're going to have eight different prompts for the eight different steps.

There might be a couple little forks or whatever double back points in there.

So we'll give the AI maybe some discretion to choose exactly what route it's going to follow, but it's a pretty on-rails sort of system.

And those seem to be the things, from what I've seen, where people are actually getting to the point where the reliability is high enough that they no longer have to look at the output on a task by task basis.

So how would you coach people as they think about the range from the original chatbots that are now familiar to, you know, coded flows, agents, agentic, autonomous?
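As an illustrative aside, the "on rails" middle of the spectrum described above can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not any particular framework's API: `run_pipeline`, `echo_model`, and the step names are hypothetical, and the model is a toy function so the example runs without any service.

```python
from typing import Callable, Dict, List

# Hypothetical stand-in for a model call: prompt text in, completion text out.
ModelFn = Callable[[str], str]


def run_pipeline(task: str, steps: List[str], model: ModelFn,
                 max_retries: int = 1) -> Dict[str, str]:
    """Run one fixed prompt per step, feeding each step the previous output."""
    outputs: Dict[str, str] = {}
    previous = task
    for name in steps:
        prompt = f"Step: {name}\nInput: {previous}"
        result = model(prompt)
        # The only "double back" point: retry a step that returned nothing.
        attempts = 0
        while not result.strip() and attempts < max_retries:
            result = model(prompt)
            attempts += 1
        if not result.strip():
            raise RuntimeError(f"step {name!r} produced no output")
        outputs[name] = result
        previous = result
    return outputs


def echo_model(prompt: str) -> str:
    """Toy model (assumption) that just acknowledges the step it was given."""
    step_line = prompt.splitlines()[0]
    return "done(" + step_line[len("Step: "):] + ")"


steps = ["outline", "draft", "review"]
results = run_pipeline("write a local radio ad", steps, echo_model)
print(list(results))
```

The route is fixed by the `steps` list rather than chosen by the model, which is the trade described in the conversation: less autonomy in exchange for output that does not need checking task by task.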

How do you see that spectrum and where should people be?

I'm sure you've got lots of thoughts.

Yeah.

But my take right now is that with reasoning, it's become very clear that a lot of the scaffolding will sort of move into that layer, like the model, like you'll send a request and you'll provide a bunch of scaffolding to the model and the reasoning step.

Like today, it'll have access to search and code execution and a code sandbox and tools and function calling.

But actually, models in general are on this trajectory to become agents out of the box, which is actually really interesting.

They'll have all these capabilities baked in, and the model will be able to use them.

And of course it'll be, there will be limits to what it does because you don't build everything into it, but it will by default have access to do a lot of things like that.

And then you could imagine you have a bunch of other hosted tools and things like that, which then gets the data flywheel spinning as far as actually being able to train the models to go and do that.

And then you can imagine some of those like trajectories that you're describing with like the flows that the model goes through and the way that it tries to solve problems.

Like all that ends up also being sort of upstream to the model.

So I do think the models are sort of on that path to be systems and agents out of the box, but the, like the practical reality is like there will still always be the need for scaffolding.

And I think it's this balance of, how do you make the current version of the product that you want work, in a way that likely needs to use scaffolding, but without building it in a way that ends up being a one-way door?

Because as soon as the model can do that thing, you'd have to fundamentally rewrite it. And maybe the coding models are good enough that rewriting everything from scratch actually won't be that difficult and it'll all be fine.

But I think historically, if you have a larger product, it ends up being really hard.

And I think this is actually the transition: I've talked to a lot of companies and products that are in the middle of this sort of AI 2.0, LLM 2.0 transition moment, where they'd actually built a lot of the original tooling around the fact that models weren't good at a lot of things.

So they had all this additional scaffolding.

They had all these additional layers and systems.

And like it actually was a pretty complex system to make LLMs work in production at scale.

And now that the models have become so good and can do a lot of these things natively, you can actually like remove a lot of that complexity, which again, depending on the complexity of what you built can actually be really, really difficult.

So I think it's like this.

And again, I think the folks who built the scaffolding and the complex system did the right thing because they wanted to make that product experience work.

And they probably benefited by the fact that they were AI native and powered by it from the beginning and hopefully won a bunch of customers in business.

But I think you also need to make sure that you can like continue to adapt because I think the models will be able to do more and more and hopefully take on more and more of that burden.

So it has been interesting to see that. I've had an increasing number of conversations with people who are in that boat, going through that transition right now.

And it's been specifically because of reasoning that this has like made it possible for a lot of people.

Yeah.

And long context, obviously goes hand in hand with that.

Yeah, we've experienced that at Waymark, especially in image processing.

I mean, I've told the story repeatedly too.

So just again, briefly: how hard I had to work at one point in time just to have any minimal understanding of what a random user-uploaded image was, was kind of crazy.

And now it's like, feed 100 images into Gemini Flash and it'll just tell you which ones to use.

It's really simple.

So what once was a highly scaffolded workflow and had to be in order to get to the reliability point now for us is basically just a prompt.

And that does seem like that sort of cycle will repeat.

That's basically what you're describing, and just making sure you're ready to rip out scaffolding and convert it to a prompt as that moment starts to hit for whatever you're building is the recommendation.
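One way to keep scaffolding from becoming a one-way door is to hide it behind a small interface, so the multi-step version can later be swapped for a single prompt. The sketch below is only an illustration of that idea: `ImagePicker`, both picker classes, and the toy model functions are hypothetical names, not code from any product discussed here.

```python
from typing import Callable, List, Protocol

# Hypothetical stand-in for a model call: prompt text in, completion text out.
ModelFn = Callable[[str], str]


class ImagePicker(Protocol):
    def pick(self, captions: List[str]) -> str: ...


class ScaffoldedPicker:
    """Older style: one model call per image to score it, then take the best."""

    def __init__(self, model: ModelFn) -> None:
        self.model = model

    def pick(self, captions: List[str]) -> str:
        scores = [float(self.model(f"Score 0-1: {c}")) for c in captions]
        return captions[scores.index(max(scores))]


class SinglePromptPicker:
    """Newer style: hand the whole list to the model in one prompt."""

    def __init__(self, model: ModelFn) -> None:
        self.model = model

    def pick(self, captions: List[str]) -> str:
        listing = "\n".join(captions)
        return self.model(f"Pick the best image:\n{listing}").strip()


def choose_hero_image(picker: ImagePicker, captions: List[str]) -> str:
    # Product code depends only on the interface, so swapping is a one-line change.
    return picker.pick(captions)


# Toy models (assumptions) so the sketch runs without any API.
def toy_scorer(prompt: str) -> str:
    return "0.9" if "storefront" in prompt else "0.1"


def toy_chooser(prompt: str) -> str:
    candidates = prompt.splitlines()[1:]
    return next(line for line in candidates if "storefront" in line)


captions = ["blurry logo", "storefront at noon", "parking lot"]
old_choice = choose_hero_image(ScaffoldedPicker(toy_scorer), captions)
new_choice = choose_hero_image(SinglePromptPicker(toy_chooser), captions)
print(old_choice, "|", new_choice)
```

Because `choose_hero_image` never sees how the pick is made, ripping out the scaffolded path means deleting one class rather than reworking the callers.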

Yeah.

I had a conversation with Josh Woodward, who runs the Gemini app and Google Labs.

And he was saying how this played out for them in NotebookLM as well.

Like originally, to make those NotebookLM audio overviews happen, it was like a 14 step process.

And there was all this, like all these different handoffs and different steps in the loop, most of them powered by Gemini.

And today it's like a four step process.

And it's dramatically simplified the level of complexity, because the models are just so good at doing a lot of those things now that they don't need to have an entire bespoke system built around writing the transcripts for the audio overviews, which is really cool.

And you actually feel that in the product experience in some ways too, where the product experience has become a lot faster.

And there's a lot of other things like that, because you don't need 14 different independent LLM calls that all have to happen in a sequence.

So it's been cool to see the like product experience actually benefit in a lot of ways from this like level of simplicity that's come as the models have gotten better.

Any other things you think are really interesting, hidden gems, underappreciated, or just strong trends in the agent space?

A2A is something I've been kind of looking into and honestly haven't really been able to wrap my head around yet.

Yeah, I've got a non-agent thing that's interesting, and I'm happy to talk about it, but on the agent side, at least for A2A, I think the quick mental model is that there are just parts of the building-agents ecosystem that MCP doesn't solve.

And I think it is trying to solve some of those.

And like one of the examples is like the auth model and things like that.

So there are like parts of the story from putting agents into production at scale that need to be solved still.

And I think it's an open question where MCP is going to go long term: is it going to do a bunch of those things, or is it going to leave space for other frameworks or standards to go and solve some of those problems?

I don't think we know yet.

So I'm watching closely and interested to see what happens.

Okay, what else is on your mind?

Diffusion model.

Did you see the demo of Gemini Diffusion?

I did have it in there.

I didn't get to it.

But yes, did you get to play around with it yet?

I haven't used it.

If you could, I'd love to get on that list as well.

I'll get you on the list.

Unbelievably.

First of all, it does, in some intuitive sense, make a lot more sense to me than the autoregressive model.

Like when I just reflect on my own pattern of thinking, I feel like what I'm doing is much more sort of fuzzy and high level first and then sort of gets kind of segmented down into parts.

And then I like try to do those parts.

And then at some level, I'm writing sentences token by token.

But you know, that resonates far more than trying to sit down and write the whole thing linearly from the first token to the last, even with a reasoning model's scratch pad as a place to kind of mess around.

So yeah, would it surprise me if in two years the diffusion paradigm has kind of won because this sort of coarse-to-fine structure turns out to be better?

Not really.

And damn, is it fast?

It's unbelievably fast.

Unbelievably fast.

Yeah, I'm really excited.

I think even if there's a world where the next token prediction paradigm continues, there's still something here for people who want to build products that have that level of speed.

I think it's unclear at this point what the performance trade-off characteristics will be, like would the cost be the same, all those things.

So there's a bunch of open questions, but assuming you could build product experiences for similar costs to what they are today, with similar model quality, there's a lot of really interesting product experiences to be built if you have that level of speed. I think that's actually the thing that could enable this personal generative UI experience to really happen: if the tokens can be generated that quickly, rendering on a screen in the blink of a human eye would be really, really cool to see.

So I'm super excited and I'll get you on the list for access to that.

If anything, even if it doesn't end up working out, I think it's just a good reminder that we need to be pushing in different directions, because there are other paradigms that I think could work, and maybe it's not next token prediction.

And there's a bunch of properties of some of these things, like editing, which is becoming more and more common for a lot of these use cases, that the diffusion model seems to be really well suited for.

Yeah, I suspect in the end, it could be quite a bit better for a lot of use cases too.

I mean, it just seems so natural.

I always say the transformer is not the end of history.

Obviously, there is an attention mechanism in a lot of these diffusion models too.

So attention is still part of what we need.

What do you think we're missing right now from AGI?

Memory is one often cited candidate.

What's on your list?

Memory is definitely one of those.

I think AGI is going to end up being much more of a product experience.

If I have a hypothesis about how people are going to end up having the AGI moment, my assumption right now, and we'll see if this plays out, is that someone's going to release a model that ends up being really good, and it's not going to be this thing where everyone says, we've clearly built whatever your definition of AGI is. Which is also the problem: everyone now has a different definition of AGI.

It'd be easy if we all had the same definition, but we don't.

So that's the other reason it's not going to happen that way.

But I think it is going to be like a product experience.

Someone is going to, because there's just the model piece of this, and I kind of almost believe you could do a lot of these things today.

And obviously there are certain constraints, but I think someone's going to weave together the right components at the product level with a model that's really smart.

And I don't know the delta in how smart the model needs to be relative to today to actually have this experience work, but it could be just that long context is 50% better and reasoning is 50% better.

And then you somehow figure out this way for memory to work.

And like the memory piece is like actually a completely different engineering, like neuroscience, human psychology problem of like, how do you surface the right things at the right time?

Like, I think someone's going to build that experience.

And people are going to say that this thing feels like AGI.

And it's, again, it's really a product experience, like enabled by a model, but like the model itself isn't able to do all those things.

It's what happens when you take the model and you build everything around it, you do it in a really thoughtful way that people are going to say is sort of the AGI moment for a lot of folks.

So that's my guess right now.

And again, I think the models are like doing more and more of this stuff.

And you could imagine maybe the models are doing the memory stuff themselves and that gets trained into the model.

I think that's very far out there, but in the short term, it's definitely going to be a product experience that gets us to AGI, which is not the story right now; the AGI narrative is so model driven.

And I just don't think that's actually how people are going to feel and experience and what's going to end up happening.

Good memory work coming out of Google, as you might expect.

We recently did an episode on the Titans architecture and there's already been a follow up to that.

Now up to 10 million tokens of memory and probably can go beyond, but they've demonstrated up to 10 million tokens with pretty strong memory performance.

So yeah, it could be coming sooner rather than later, but it is kind of a distinct module, right?

And obviously our brains have like many modules too.

Okay.

Maybe last question, then I'll let you hit on anything else you want, but one of the striking moments from I/O was when Sergey was asked in a little fireside chat what he thinks the future of the web is going to look like in five to 10 years.

And he almost spit out his coffee.

I liked that moment where he was like, future of the web?

He's like, I don't think we know what the future of the world is going to look like in five to 10 years.

And that's a striking reminder that even the people pushing the different frontiers of this technology don't have a crystal ball and don't really know what exactly we're getting into.

So I wonder what your expectation for the future of your life and your job is in the next, even let's say, two to five years. Are we going to get a drop-in Logan replacement?

I mean, NotebookLM is going to replace me.

Do you think you're on the chopping block in the next two to five years for AI replacement as well?

Or like, how do you see this shaping up?

Yeah, I was sitting in the front row of that fireside chat next to Koray, who's our CTO in DeepMind, and Emanuel Taropa, who drives a bunch of our infrastructure stuff.

And it was fun to see their reactions as well to some of the conversation.

I have such a fundamentally like human centric view of the world.

Even today, as somebody who builds AI and thinks all the AI products are cool, I write everything I do personally: all of the work that I do, every email that I write, every tweet that I write is written through my head.

And I have in probably 95% of cases, zero AI assistance involved in that process.

And it's because I have conviction in my worldview.

And it's because I have conviction in my sort of tone.

And I think like maybe you can make a loose approximation of someone, but like the reality is like, I want to be the entity which has agency over the things that like come out around who I am.

Because I think people will end up having this fundamental question for themselves of, who do they want?

Even if I have this digital twin, which knows all the things and like maybe it can make loose approximations and I could say, oh yeah, that seems reasonable.

I could potentially see myself saying something like that.

Like do I actually want that thing going and saying those things on my behalf?

Probably not.

Like that is a very foreign concept to what humans do today.

I think maybe the one notable, not exception to this, but example against this, is people who run companies.

Like I can imagine you have a large company.

And some team or some person is representing Google, as an example.

And you think, oh, maybe I wouldn't have said it that way, or I wouldn't have phrased it that way.

And they're still representing Google as a whole, but they're sort of an independent agent on its behalf.

But unless you've had that experience, that's still fundamentally different from, as a human, wanting some other entity representing me.

And I, it's not clear to me that people are really going to want that experience.

I personally, like right now don't want that experience.

And that's just my personal opinion.

That's the decision that I'm making, but I do think it'll be interesting to see where the balance is as far as how much people do that.

But this also goes back to, and I have a bunch of these random convictions on my personal website, the value of humanity. For you, actually, as an example, Nathan, I think this podcast is exponentially more valuable in a world where AI can generate human sounding things and can analyze content and put together research reports.

Like the reality is the next token prediction coming out of all of those systems isn't the next token prediction that's coming out of your brain.

And like the thing that I care about is like, what's the next token prediction or the diffusion, if you want to use that example, instead, the diffusion thoughts that are coming out of your brain, that's what I care about.

And like, I care about your perspective because you're another human and we have shared, you know, lived experiences and we've done stuff together in person, all that stuff.

I think there's places where like you won't care about that because of like whatever, like maybe the type of content or like, you know, there's certain dimensions where that won't matter as much, but I really do fundamentally believe like humans are interested in what other humans have to say.

Like when I think about someone sending me content that was written or generated by AI, I just care a little bit less.

I'm not really that interested.

I can kind of tell they're not willing to put in the craft and the time to do something.

So why would I be that interested?

And again, there's exceptions to this.

Like software is a great example.

If someone builds great software, do I care whether or not a human wrote that?

Like kind of not really, it would be, yeah, maybe in some cases I'd appreciate it more if a human did it, maybe not in some cases.

So it is interesting.

There'll be a, there'll be a spectrum.

I'm not worried.

I think the way that people do work will shift in some capacity.

And I think the value of like having a differentiated perspective is also just going to be incredibly beneficial in a world where intelligence isn't the sort of limiting factor in a lot of ways.

So all that's to say, I'm excited for another five to 600 podcast episodes from you over the next two to five years.

Yeah, thank you.

We'll see how many I can tick off yet.

I think there's a lot to appreciate in your thoughts there.

And I'm with you on some portion of it, where, you know, there's this whole notion of, if people don't have jobs, they'll have no meaning.

And I'm not on that train.

And I definitely am on the idea that like, one of the great things about AI could be that it could allow people to make much more of an, or put much more of an emphasis on making connections with each other.

And then at the same time, I'm like, I don't know, NotebookLM is getting awfully good.

And it can handle any topic on demand, you know, and I notice that in myself: NotebookLM, not in a huge fraction yet, is starting to eat away at my listening to other podcasts, just because there are times when I'm like, I want to know about this thing.

And nobody's done a podcast on it yet.

And NotebookLM will, you know, and it has that background knowledge.

And so even if it's a little worse in some ways, and of course the expressiveness of the voice and all that is getting pretty good too, it hits the mark in some other ways that really matter.

Yeah, I was just listening to Sundar talk about this.

And I think it was a good example that he gave, which is around search versus AI chat product experiences: everyone two years ago was like, oh, now ChatGPT has however many hundreds of millions of monthly active users, and that has remained true.

And across other products, I'm sure there's more hundreds of millions of users.

Yet Google searches are growing: the number of queries on Search is growing, the Search business is still growing.

And it's because like, they're actually, in some sense, solving like fundamentally different problems.

And I think that NotebookLM example you gave, of wanting on-demand entertainment about this very specific topic, where maybe no one has created that type of content before, is kind of a different use case.

And like, I think and maybe this doesn't fully track across like all podcasts, but like there's lots of podcasts that I listen to where I'm like, I just want to hear what this person has to say about this.

Like that's the thing that I actually care about.

And I'm kind of willing to listen to them talk about kind of whatever.

But if I'm trying to learn about some very discrete topic that I know nothing about, the chance that one of my top five favorite podcasts has talked about it is actually probably slim.

So you need some other mechanism in order to do that.

And maybe that's not fully true, so I'm curious what your reaction is to that.

But I think that is going to play out across a lot of other domains and dimensions, where this thing actually is net additive, and sure, it puts pressure in some capacity because there's a limited amount of time in the human day, but it doesn't end up being as disruptive as it would look on paper.

Yeah, I hope not. That time limit is a very hard constraint as it stands right now.

I've recently started to up my listening speed; it had been 2x as the default, and actually YouTube just increased its max speed, and these are the kind of things that really move the needle for me.

The mobile max speed has now gone from 2x to, I don't even know what the max is now, but you can go well beyond two.

And so I can listen to things at like 2.5x by default, and that saves me another six minutes on a one hour piece of content.
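The six-minute figure checks out with simple arithmetic. A quick sketch (the function name `listening_minutes` is just for illustration):

```python
def listening_minutes(content_minutes: float, speed: float) -> float:
    """Wall-clock minutes needed to hear content played at a given speed."""
    return content_minutes / speed


at_2x = listening_minutes(60, 2.0)     # one hour of content at 2x
at_2_5x = listening_minutes(60, 2.5)   # the same hour at 2.5x
saved = at_2x - at_2_5x
print(at_2x, at_2_5x, saved)
```

An hour of content takes 30 minutes at 2x and 24 minutes at 2.5x, so the step from 2x to 2.5x saves six minutes. Each further increase saves less, which is the fundamental limit on this approach that comes up next.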

So this is the way I'm trying to pack more and more in.

I don't think I can go too much farther down that path.

So yeah, at some point it does become just in terms of competition for time and attention.

It does become like you hit some sort of fundamental limits.

I mean, maybe we get Neuralink working and then that's the next big unlock, where the bandwidth increases so dramatically that all bets are off.

I do think there's a credible line of thought honestly that like upgrading human cognition in very like deeply integrated ways is going to be necessary.

I mean, that's sort of Elon's brief pitch for why he founded Neuralink: to be able to go along for the ride with the AIs. And especially seeing these diffusion models, man.

I mean, already the autoregressive ones are fast.

They can write Gemini can write much faster than I can read.

And then the diffusion models are like another order of magnitude faster.

So the speed of it all is just going to be another wild thing to contend with.

And now you've got these like video models.

I just saw one that is like generating video dynamically in real time and you like interact with it.

And yeah, boy, I share a lot of the excitement and enthusiasm, and I'm also like, I don't know that I can actually compete with all this stuff.

It just seems like it's going to get really, really good at everything and be ubiquitous and be so personalized to each individual user and kind of know what they know and what they don't need to know.

And how do I compete as somebody who's one size fits all, with not a huge audience, but, you know, there's enough people out there that I certainly can't customize the podcast for each one?

And the AI can do that.

I think the point, though, is that people want the Nathan experience.

At least that's my fundamental bet and conviction: long-term, in a world where I could spin up a thousand podcasts that look something similar to yours, people want your perspective, and there's value in that even though it's not the highest order optimization of the delivery of the content or whatever it is.

That's my bet.

So we'll see if that ends up being true, but I have conviction in that bet.

So yeah, hopefully it'll turn out right.

Yeah, I hope you're right.

Well, I know many people want the Logan experience and I know we're over time so I appreciate you sharing so much time and info with us today.

Look forward to doing it again in the not too distant future.

This is great, Nathan.

Thank you for having me.

It's fun to chat and hopefully I'll see you in person again soon.

Cool.

Veo 3, diffusion model, put me on your list.

I will.

And with that, Logan Kilpatrick, thank you for being part of The Cognitive Revolution.

It is both energizing and enlightening to hear why people listen and learn what they value about the show.

So please don't hesitate to reach out via email at tcr@turpentine.co, or you can DM me on the social media platform of your choice.