
No Priors · 2026-03-20
Andrej Karpathy on AI Agents, AutoResearch, and the Future of AI Labs
Hosts: No Priors host
Guests: Andrej Karpathy
Why it matters
AI coding agents have drastically changed software engineering, enabling delegation of coding tasks and collaboration among multiple agents.
Key claims
- AI coding agents have drastically changed software engineering, enabling delegation of coding tasks and collaboration among multiple agents.
- Karpathy uses AI agents for home automation, integrating various smart devices into a unified natural language interface controlled via WhatsApp.
- Auto research involves autonomous AI systems optimizing model training and hyperparameters without human intervention, increasing research efficiency.
- There is a growing ecosystem balance between closed frontier labs and open-source AI models, with open-source closing the gap in capabilities.
Episode summary
Summary
In this episode of No Priors, Andrej Karpathy discusses the transformative impact of AI coding agents on software engineering workflows, highlighting a shift from manual coding to delegating tasks to multiple collaborative agents. He shares his personal experience of using agents to automate complex tasks such as home automation and emphasizes the importance of skill in effectively instructing these agents. Karpathy introduces the concept of 'auto research,' where AI systems autonomously improve models and conduct experiments with minimal human intervention, aiming to maximize token throughput and remove researchers from the loop.
Karpathy also reflects on the evolving AI ecosystem, noting the balance between closed frontier labs like OpenAI and the growing capabilities of open-source models. He advocates for a diverse and collaborative AI research environment to avoid centralization risks. Additionally, he discusses the lag of robotics compared to digital AI advancements, the potential for agent-driven interfaces between digital and physical worlds, and the reshaping of education through AI agents. The conversation touches on the future of AI research, the role of autonomous agents, and the implications for job markets and software engineering demand.
- Robotics development is expected to lag behind digital AI due to complexity and cost but holds significant long-term potential.
- Future AI systems may specialize into multiple expert models rather than a single generalist model, reflecting biological speciation.
- Education is shifting towards AI agents explaining concepts tailored to individual needs, reducing direct human-to-human teaching.
- Karpathy emphasizes the importance of independent research roles outside frontier labs to maintain autonomy and alignment with broader human interests.
Source material
Transcript
Code's not even the right verb anymore, right?
But I have to express my will to my agents for 16 hours a day.
Manifest.
How can I have not just a single session of Claude Code or Codex or some of these agent harnesses?
How can I have more of them?
How can I do that appropriately?
The agent part is now taken for granted.
Now the claw-like entities are taken for granted.
And now you can have multiple of them.
And now you can have instructions to them.
And now you can have optimization over the instructions.
But I mean, this is why I guess it's psychosis: this is like infinite.
And everything is skill issue.
Hi, listeners.
Welcome back to No Priors.
Today I'm here with Andrej Karpathy.
And we have a wide ranging conversation for you about code agents, the future of engineering and AI research.
How more people can contribute to research.
What's happening in robotics?
His prediction for how agents can reach out into the real world and education in this next age.
Welcome, Andrej.
Andrej, thanks for doing this.
Yeah, thank you for having me.
So it's been a very exciting couple of months in AI.
Oh, yeah.
You could say that.
I remember walking into the office at some point and you were like really locked in.
And I was asking what you were up to.
And you're like, I just, I have to code for 16 hours a day.
Or code's not even the right verb anymore, right?
But I have to express my will to my agents for 16 hours a day.
Manifest.
Because like, there's been a jump in capability.
What's happening?
Tell me about your experience.
Yeah, I kind of feel like I was business perpetual.
I still am often in this state of AI psychosis just like all the time.
Because there was a huge unlock in what you can achieve as a person, as individual, right?
Because you were bottlenecked by, you know, your typing speed and so on.
But now with these agents, it really, I would say in December is when something really just flipped, where I kind of went from, like, 80/20 to like 20/80 of writing code by myself versus just delegating to agents.
And I don't even think it's 20/80 by now.
I think it's a lot more than that.
I don't think I've typed like a line of code probably since December basically.
Which is like an extremely large change.
I was talking about it to, for example, my parents and so on.
And I don't think a normal person actually realizes that this happened or how dramatic it was.
Like literally, if you just find a random software engineer at their desk and look at what they're doing, their default workflow of, you know, building software is completely different as of basically December.
So I'm just like in the state of psychosis of trying to figure out like what's possible trying to push it to the limit.
How is it?
How can I have not just a single session of, you know, Claude Code or Codex or some of these agent harnesses?
How can I have more of them?
How can I do that appropriately?
And then how can I use these claws?
What are these claws?
And so there's like a lot of new things.
I want to be at the forefront of it, you know, and I'm very antsy that I'm not at the forefront of it.
And I see lots of people on Twitter doing all kinds of things and they all sound like really good ideas.
And I need to be at the forefront or I feel extremely nervous.
And so I guess I'm just in the psychosis of like what's possible, like because it's unexplored fundamentally.
Well, if you're nervous, the rest of us are nervous.
We have a, we have a team that we work with at Conviction whose setup is, you know, none of the engineers write code by hand.
And they're all miked up and they just whisper to their agents all the time.
It's the strangest work setting ever.
Yeah.
And I thought they were crazy.
And now I like I fully accept, I was like, oh, this was the way.
Like you're just ahead of it.
What?
How do you think about your own capacity now to like explore or to do projects?
Like, what is it limited by?
Yeah, what is it limited by?
Just I think everything, like so many things, even if they don't work, I think to a large extent, you feel like it's skill issue.
It's not that the capability isn't there; it's that you just haven't found a way to string together what's available.
Like, I just didn't give good enough instructions in the AGENTS.md file or whatever it may be.
I don't have nice enough memory tool that I put in there or something like that.
So it all kind of feels like skill issue when it doesn't work.
To some extent.
You want to see how you can parallelize them, etc.
And you want to be Peter Steinberger, basically.
So Peter is famous.
He has a funny photo where he's in front of a monitor with lots of... he uses Codex.
So lots of Codex agents tiling the monitor.
And they all take about 20 minutes if you prompt them correctly and use high effort.
And so they all take about 20 minutes, and you have multiple, you know, 10 repos checked out.
And so he's just going between them and giving them more.
It's just like you can, you can move in much larger macro actions.
It's not just like here's a line of code, here's a new function.
It's like, here's a new functionality, delegate it to agent one.
Here's a new functionality that's not going to interfere with the other one, give it to agent two.
And then try to review their work as best as you can depending on how much you care about that code.
Like, where are these macro actions that I can manipulate my software repository by?
And like, another agent is doing some research and another agent is writing code.
Another one is coming up with a plan for some new implementation.
And so everything just like happens in these macro actions over your repository.
And you're just trying to become really good at it and develop, like, a muscle memory for it. It's extremely...
Yeah, it's very rewarding number one because it actually works.
But it's also kind of like the new thing to learn.
So hence the psychosis.
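The workflow described above, handing independent features to several agents at once and reviewing each result as it lands, can be sketched as a fan-out over a thread pool. This is a minimal sketch under stated assumptions: `run_agent` is a hypothetical stand-in for whatever harness (Claude Code, Codex, etc.) actually executes a task, and nothing here reflects a real agent API.

```python
import time
from concurrent.futures import ThreadPoolExecutor, as_completed


def run_agent(agent_id: int, task: str) -> str:
    """Hypothetical stand-in for one agent session (a real one might run ~20 minutes)."""
    time.sleep(0.01)  # simulate the agent working
    return f"agent {agent_id} finished: {task}"


def dispatch(tasks: list[str], max_agents: int = 4) -> list[str]:
    """Hand each independent task to its own agent and collect results as they finish."""
    results: list[str] = []
    with ThreadPoolExecutor(max_workers=max_agents) as pool:
        futures = [pool.submit(run_agent, i, task) for i, task in enumerate(tasks, start=1)]
        for fut in as_completed(futures):
            # Review each result as soon as it lands instead of waiting for every agent.
            results.append(fut.result())
    return results


if __name__ == "__main__":
    for line in dispatch(["add CSV export", "write parser tests", "draft a caching plan"]):
        print(line)
```

The key assumption is that the tasks really are independent ("not going to interfere with the other one"); overlapping edits to the same files would need a merge or review step this sketch leaves out.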
Yeah, I do feel like my instinct is like whenever I'm waiting for an agent to complete something the obvious thing to do is like, well, I can do more work.
Right, like if I have access to more tokens, I should just parallelize tasks.
And so that's very stressful because if you don't feel very bounded by your ability to spend on tokens,
yeah, then, you know, you are the bottleneck in the system that has max capability.
Yeah, if you're not maximizing your subscription.
Yeah, at least.
And it's ideally for multiple agents.
Right.
If you run out of quota on Codex, you should switch to Claude or whatnot.
I don't know.
Like, that's what I've been trying to do a little bit.
And I feel nervous when I have subscription leftover.
That just means I haven't maximized my token throughput.
So I actually kind of experienced this when I was a PhD student.
You would feel nervous when your GPUs are not running.
Like, you have GPU capability and you're not maximizing it.
The available flops to you.
But now it's not about flops.
It's about tokens.
So what is your token throughput?
And what tokens throughput do you command?
I would actually argue that is very interesting that we had, you know, at least 10 years where in many engineering tasks, people just didn't feel compute bound.
Mm-hmm.
Right.
And to not get the entire industry feels that now.
They feel like they felt resource bound.
Mm-hmm.
And now that you have this big capability jump, you're like, Oh, actually, it's not, you know, my ability to access the compute anymore.
Yeah.
Yeah.
Yes.
It's a skill issue, which is very empowering because, um, yeah, because you could be getting better.
So that's why that's why I think it's very addictive because there's unlocks when you, when you get better.
Where do you think it goes?
Like, if you just think about, like, okay, you know, Andrej is iterating on everything for 16 hours a day, getting better at using coding agents, like, what does it look like in a year?
Of, like, you've reached mastery.
Yeah, what does mastery look like, right?
At the end of the year, or like, two, three years, five years, 10 years.
Yeah.
Yeah, everyone is basically interested in, like, going up the stack.
So I would say, yeah, it's not about a single session with your agent, multiple agents, how do they collaborate and teams and so on.
So everyone's trying to figure out what that looks like.
And then I would say, claw is also kind of an interesting direction, because it really, when I say a claw, I mean, this, like, layer that kind of takes persistence to a whole new level.
Like, it's something that, like, keeps looping.
It's like, um, it's not something that you are interactively in the middle of.
It kind of, like, has its own little sandbox, its own little, you know, it kind of, like, does stuff on your behalf, even if you're not looking kind of thing.
And then it also has, like, maybe more sophisticated memory systems, etc.,
that are not yet implemented in agents.
So, uh, OpenClaw has a lot more sophisticated memory,
I would say than what you would get by default, which is just a memory compaction when your context runs out, right?
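For contrast, the default behavior mentioned here, compacting memory only when the context runs out, can be sketched in a few lines. This is a minimal illustration, not how OpenClaw or any particular harness implements memory: the word-count token estimate and the `summarize` function are hypothetical placeholders (a real system would use a tokenizer and an LLM call).

```python
def estimate_tokens(text: str) -> int:
    # Crude placeholder: count whitespace-separated words instead of real tokens.
    return len(text.split())


def summarize(messages: list[str]) -> str:
    # Hypothetical placeholder for an LLM call that condenses older turns.
    return f"[summary of {len(messages)} earlier messages]"


def compact(history: list[str], budget: int, keep_recent: int = 2) -> list[str]:
    """When the history exceeds the token budget, replace everything except the
    last `keep_recent` messages with a single summary message."""
    total = sum(estimate_tokens(m) for m in history)
    if total <= budget or len(history) <= keep_recent:
        return history
    split = len(history) - keep_recent
    return [summarize(history[:split])] + history[split:]


history = ["turn one text here", "turn two more text", "turn three", "turn four"]
print(compact(history, budget=5))  # ['[summary of 2 earlier messages]', 'turn three', 'turn four']
```

Everything older than the recent window survives only as a lossy summary, which is the limitation that a more sophisticated memory system would try to address.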
Do you think that's the piece that resonated for more users versus, like, perhaps, broader tool access for OpenClaw?
Yeah.
There's, like, I think there's at least five things there.
There's a lot of really good answers in here.
Yeah, good job here.
Yeah.
Peter has done a really amazing job.
Um, I saw him recently.
Uh, and I talked to him about it.
And I, he's very humble about it.
But I think he innovated simultaneously in, like, five different ways and put it all together.
Um, so for example, like, the SOUL.md document, like, he actually really crafted a personality that is kind of compelling and interesting.
And I feel like a lot of the current agents don't get this right.
Actually, I think Claude has a pretty good personality.
It feels like a teammate.
Uh, and it's excited with you, etc.
Uh, I would say, um, for example, Codex is a lot more dry.
Um, which is kind of interesting because ChatGPT is, like, a lot more upbeat and highly sycophantic.
But I would say Codex, the coding agent is very dry.
It doesn't, it doesn't seem to care about what you're creating.
It's kind of like, oh, I implemented it.
It's like, okay, but do you understand what we're building?
It's true.
You know, it doesn't.
It doesn't.
Uh, and the other thing I would say is, for example, with Claude, I think they dialed the sycophancy fairly well, where when Claude gives me praise, I do feel like I slightly deserve it, because sometimes I kind of give it, like, not very well formed thoughts.
And I give it an idea that I don't think is fully baked.
And it doesn't actually react very strongly.
It's like, oh, yeah, we can implement that.
But when it's a really good idea by my own account, it does seem to reward it a bit more.
And so I kind of feel like I'm trying to like earn its praise, which is really weird.
Okay.
And so I do think the personality matters a lot.
Uh, and I think a lot of the other tools maybe don't appreciate it as much.
And I think in this aspect, also Peter really cares about this.
And so that was correct.
And then the memory system and then just, you know, he's just having fun with this.
And then the, the single WhatsApp portal to all of the automation.
Yeah.
Is there something that you have done personally with your claws beyond software engineering that you think is fun or interesting?
Yeah.
So in January, I had a claw.
I went through a period of claw psychosis.
So I built, um, I have a claw basically that takes care of my home.
And I call it Dobby the Elf Claw.
Um, and basically, I used the agents to find all of the smart home subsystems of my home on the local area network.
Which I was kind of surprised that worked out of the box.
Like I just told it that I think I have Sonos at home.
Like, can you try to find it?
And it goes and does, like, an IP scan of all the, um, basically, um, computers on the local area network.
And it found the Sonos thing, uh, the Sonos system.
And it turned out that there's no password protection.
It's like that.
It just logged in and it's like, oh, yeah, you have these Sonos systems installed.
Let me try to reverse engineer how it's working.
It does some web searches.
And it finds like, okay, these are the API endpoints.
And then it's like, do you want to try it?
And I'm like, whoa, like you just did that.
I'm like, yeah, can you try to play something in the study?
And it does and music comes out.
And I'm like, I can't believe I just, that's crazy.
That's like three prompts.
Yeah, I can't believe I just typed in, like, can you find my Sonos?
And then Sonos is playing music.
And it did the same for lights.
And so basically, like, it kind of hacked and figured out the whole thing.
Created APIs, created a dashboard.
So I could see the command kind of center of like all of my lights in the home.
And then it was like switching lights on and off.
And, you know, so I can tell it, like, Dobby, it's sleepy time.
And when it's sleepy time, that just means all the lights go off, etc.
And so it controls all of my lights, my HVAC, my shades, the pool and spa.
And also my security system.
So I have a camera pointed outside of the house.
And anytime someone rolls in, I have a Qwen model that looks at the videos.
So first of all, there's change detection.
Right.
And then based on change detection, it goes to Qwen.
And then it actually like tells me, it sends me a text to my WhatsApp.
It shows an image from the outside.
And it says, hey, a FedEx truck just pulled up.
And you might want to check it.
You got mail or something like that.
And Dobby just texts me this; it's really incredible.
So Dobby is in charge of the house.
I text with it through WhatsApp.
And it's been like really fun to have these macro actions that maintain my house.
I haven't really pushed it way beyond that.
And I think people are doing a lot more crazy things with it.
But for me, even just a home automation setup, I used to use like six apps.
Yeah.
Completely different apps.
And I don't have to use these apps anymore.
Like, Dobby controls everything in natural language.
Amazing.
And so I think I haven't even pushed the paradigm fully.
But already that is so helpful.
And so inspiring I would say.
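The camera setup described in this answer has two stages: a cheap change detector gates an expensive vision-language model, so the model only runs when something actually changes in the frame. A minimal sketch, assuming frames arrive as equal-sized grayscale pixel grids; `describe_frame` (standing in for the Qwen model call) and `send_whatsapp` are hypothetical stubs, not real APIs, and the threshold value is arbitrary.

```python
from typing import Optional

Frame = list[list[int]]  # grayscale pixel values, 0-255


def mean_abs_diff(prev: Frame, curr: Frame) -> float:
    """Average absolute per-pixel difference between two equal-sized frames."""
    total = count = 0
    for row_prev, row_curr in zip(prev, curr):
        for p, c in zip(row_prev, row_curr):
            total += abs(p - c)
            count += 1
    return total / count if count else 0.0


def describe_frame(frame: Frame) -> str:
    # Hypothetical stub for the vision-language model call (e.g. a Qwen VLM).
    return "A FedEx truck just pulled up."


def send_whatsapp(text: str) -> None:
    # Hypothetical stub for the messaging integration.
    print(f"[WhatsApp] {text}")


def on_new_frame(prev: Frame, curr: Frame, threshold: float = 10.0) -> Optional[str]:
    """Call the expensive model only when the cheap change detector fires."""
    if mean_abs_diff(prev, curr) < threshold:
        return None  # nothing changed; skip the model entirely
    caption = describe_frame(curr)
    send_whatsapp(caption)
    return caption
```

The design choice is cost: frame differencing is nearly free, so the model (and the notification) only fire on the small fraction of frames where something moved.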
Do you think that's indicative of like what people want from a user experience perspective with software, right?
Because I think, you know, it's pretty underappreciated that it takes humans effort
to learn new software, like a new UI.
Yeah.
I think to some extent, that's right.
It's like working backwards from how people think an AI should be.
Because what people have in their mind of like what an AI is.
It's not actually what an LLM is by like in the raw sense.
Like LLM is a token generator.
You know, like more tokens come out.
But what they think of is like this persona identity that they can tell stuff.
And it remembers it.
You know, and it's just kind of an entity behind the WhatsApp.
It's like a lot more understandable.
Okay.
So I think to some extent, it's like matching the expectations that people already have of it
and how they expect it to behave.
But under the hood, a lot of technical details go into that.
And LLMs are too raw of a primitive to actually
type check as AI.
I think for most people that makes sense.
Yeah.
I think that's like how we understand what the AI is.
And like the.
Description of it as Dobby or some persona
Obviously resonates with people.
I also think that it.
The unification that you did across your six different software systems for your home automation speaks to different questions, like:
Do people really want all the software that we have today?
Yeah.
Right.
You like, well, you have the hardware.
Yeah.
But you've now thrown away the software or the UX layer of it.
Yeah.
Do you think that's what people want?
Yeah.
I think there's this like.
There's this sense that these apps in the App Store for using these smart home devices, et cetera,
These shouldn't even exist kind of in a certain sense.
Like shouldn't it just be APIs and shouldn't agents be just using it directly?
And wouldn't it like.
I can do all kinds of home automation stuff that any individual app would not be able to do, right?
And LLMs can actually drive the tools and call all the right tools and do pretty complicated things.
And so in a certain sense, it does point to this like maybe there's like an overproduction of lots of custom bespoke apps that shouldn't exist, because agents kind of like crumble them up.
And everything should be a lot more just, like, exposed API endpoints.
And agents are the glue of the intelligence that actually like tool calls all the parts.
Another example is like my treadmill.
There's an app for my treadmill and I wanted to like keep track of how often I do my cardio.
But like I don't want to like log into web UI and go through a flow and etc.
Like all this should just be like make APIs available.
And this is kind of you know going towards the agentic.
Sort of web or like agent first tools and all this kind of stuff.
So I think the industry just has to reconfigure in so many ways that it's like the customer is not the human anymore.
It's like agents who are acting on behalf of humans.
And this refactoring will probably be substantial in a certain sense.
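To make the "expose API endpoints, let agents be the glue" idea concrete, here is a minimal sketch: each device capability is registered as a plain function, and the agent emits a tool call (a name plus JSON arguments) that a dispatcher executes. The tool names `set_lights` and `log_workout` and the JSON call shape are hypothetical illustrations; real tool-calling APIs differ in their exact formats.

```python
import json
from typing import Callable

TOOLS: dict[str, Callable[..., dict]] = {}


def tool(fn: Callable[..., dict]) -> Callable[..., dict]:
    """Register a function so an agent can call it by name."""
    TOOLS[fn.__name__] = fn
    return fn


@tool
def set_lights(room: str, on: bool) -> dict:
    # Hypothetical smart-home endpoint.
    return {"room": room, "lights": "on" if on else "off"}


@tool
def log_workout(kind: str, minutes: int) -> dict:
    # Hypothetical treadmill/cardio endpoint, replacing a web UI flow.
    return {"logged": kind, "minutes": minutes}


def dispatch_tool_call(raw: str) -> dict:
    """Execute a model-emitted call of the form {"name": ..., "arguments": {...}}."""
    call = json.loads(raw)
    fn = TOOLS.get(call["name"])
    if fn is None:
        return {"error": f"unknown tool {call['name']!r}"}
    return fn(**call["arguments"])


print(dispatch_tool_call('{"name": "log_workout", "arguments": {"kind": "treadmill", "minutes": 30}}'))
```

In this framing there is no per-device app at all: the registry is the whole "UI", and the agent decides which tools to chain for a request like "it's sleepy time".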
One way that people sometimes push back on this is, like, do we expect people to vibe code
some of these tools? Do we expect normal people to do this kind of stuff that I described?
But I think to some extent.
This is just, you know, technology as it exists today, and right now there is some vibe coding and I'm actually watching it and I'm working with the system.
But I kind of feel like this kind of stuff that I just talked about this should be free like in a year or two or three.
There's no vibe coding involved.
This is trivial.
This is table stakes.
This is like, any AI, even the open-source models, can do this.
You should be able to translate it from a less technical human's intent very easily to this.
Yeah.
Today it's vibe coding, and not many non-technical people are going to do it.
And you still have to make some design decisions, like when we were talking about the camera frames.
For example.
Yeah.
But I kind of feel like the barrier will just come down, and it's just ephemeral software on your behalf.
And some kind of claw is handling all the details for you, but you're not involved; the claw has a machine and it will figure it out.
And it's just presenting a UI and you're like saying stuff, you know.
Why haven't you, I guess, pushed the boundaries of what you can do personally?
Is it that you're focusing on more important projects, AutoResearch, etc.,
or you're climbing the hill to mastery, or something else?
Yeah, I just feel like I'm so distracted by everything.
So I spent like a week on the claw stuff.
And I have more to do is almost.
But I will say that.
So I'm sent to all this for all just this year.
Yeah.
Yeah.
I didn't really take advantage of a lot of like email and calendar and all this other stuff.
And I didn't give it access because I'm still a little bit like suspicious and still very new and rough around the edges.
So I didn't want to give it like full access to my digital life yet.
And part of it is just the security privacy and just being very cautious in that in that realm.
And so some of it is like held back by that I would say.
Yeah, maybe that's the dominant factor, but some of it is also just that I feel so distracted, because I had a week of claw and then other stuff is happening.
What was the, I mean, you've talked about like being able to train or at least optimize a model as a task you want to see agents do for a long time.
Like what was the motivation behind auto research?
AutoResearch.
So I think I had a tweet earlier where I said something along the lines of: to get the most out of the tools that have become available now, you have to remove yourself as the bottleneck. You can't be there to prompt the next thing; you take yourself outside.
You have to arrange things such that they're completely autonomous, and, you know, how can you maximize your token throughput and not be in the loop? This is the goal.
And so I kind of mentioned that the name of the game now is to increase your leverage.
And put in just very few tokens just once in a while and a huge amount of stuff happens on my behalf.
And so AutoResearch, like, I tweeted that and I think people liked it and whatnot, but
they haven't, like, maybe worked through the implications of that, and for me AutoResearch is an example of an implication of that.
Where it's like, I don't want to be the researcher in the loop, looking at results, etc.
I'm holding the system back.
So the question is how do I refactor all the abstractions so that I'm not.
I have to arrange it once and let it go. The name of the game is how can you get more agents running for longer periods of time without your involvement, doing stuff on your behalf.
And AutoResearch is just: here's an objective, here's a metric, here are your boundaries of what you can and cannot do, and go.
And yeah, you were surprised by its effectiveness.
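Stripped to its core, the setup described here is an objective, a metric, a set of boundaries, and a loop that keeps whatever improves the metric. The sketch below is a toy stand-in, not the actual AutoResearch code (where an LLM agent proposes the changes): it assumes a made-up `val_loss` function in place of a real training run and uses random-perturbation hill climbing as the search strategy. The toy loss deliberately couples weight decay and `beta2`, echoing the point that jointly interacting knobs must be retuned together.

```python
import random

# Boundaries: the only knobs the loop may touch, and their allowed ranges.
BOUNDS = {"lr": (1e-4, 1e-2), "weight_decay": (0.0, 0.2), "beta2": (0.9, 0.999)}


def val_loss(cfg: dict) -> float:
    """Toy metric (lower is better). The best weight decay depends on beta2,
    so tuning one knob shifts the optimum of the other."""
    lr_term = ((cfg["lr"] - 3e-3) * 100) ** 2
    wd_term = (cfg["weight_decay"] - 2 * (1 - cfg["beta2"])) ** 2
    beta_term = (cfg["beta2"] - 0.95) ** 2
    return lr_term + wd_term + beta_term


def propose(cfg: dict, rng: random.Random) -> dict:
    """Perturb one knob at random, clipped to its allowed range."""
    new = dict(cfg)
    key = rng.choice(sorted(BOUNDS))
    lo, hi = BOUNDS[key]
    new[key] = min(hi, max(lo, new[key] + rng.gauss(0, 0.1 * (hi - lo))))
    return new


def auto_research(cfg: dict, steps: int = 500, seed: int = 0) -> tuple[dict, float]:
    """Objective + metric + boundaries, then go: keep only changes that help."""
    rng = random.Random(seed)
    best, best_loss = cfg, val_loss(cfg)
    for _ in range(steps):
        candidate = propose(best, rng)
        loss = val_loss(candidate)
        if loss < best_loss:
            best, best_loss = candidate, loss
    return best, best_loss


start = {"lr": 1e-3, "weight_decay": 0.1, "beta2": 0.99}
best, loss = auto_research(start)
print(f"start loss {val_loss(start):.5f} -> {loss:.5f} with {best}")
```

Because nothing needs a human to judge the result, the loop can in principle run overnight; the metric and the bounds are the only places the researcher's judgment enters.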
Yeah, I didn't expect it to work, because I have the project nanochat.
And fundamentally, I think a lot of people are very confused by my obsession with, like, GPT-2 models and so on.
But for me, training GPT models and so on is just a little harness, a little playground for training LLMs.
And fundamentally what I'm more interested in is this idea of recursive self-improvement and to what extent you can actually have LLMs improving LLMs.
Because I think that's the thing all the frontier labs are after, for obvious reasons, and they're all trying to recursively self-improve, roughly speaking.
And so for me this is kind of like a little playpen of that.
And I guess I'd tuned nanochat already quite a bit by hand, in the good old-fashioned way that I'm used to. Like, I'm a researcher; I've done this for, you know, two decades.
I have some amount of like what is the opposite cube risk.
Yeah.
Or into confidence.
Okay.
I have like two decades of like oh I've trained this model like thousands of times of like.
So I've done a bunch of experiments.
I've done hyperparameter tuning.
I've done all the things I'm very used to and I've done for two decades.
Yeah.
And I've gotten to a certain point and I thought it was like fairly well tuned.
And then I let AutoResearch go overnight, and it came back with
tunings that I didn't see.
And yeah, I did forget the weight decay on the value embeddings, and my Adam betas were not sufficiently tuned.
And these things jointly interact.
So once you tune one thing, the other things potentially have to change too. You know, I shouldn't be a bottleneck.
I shouldn't be running these hyperparameter searches; it should be my agents.
I shouldn't be looking at the results.
There's objective criteria in this case.
So you just have to arrange it so that it can just go forever.
So that's a single sort of version of auto research and like a single loop trying to improve.
And I was surprised that it found these things; you know, the repo was already fairly well tuned and it still found something.
And that's just a single loop. These frontier labs have GPU clusters of tens of thousands of GPUs.
And so it's very easy to imagine how you would basically get a lot of this automation on smaller models.
And fundamentally, everything around frontier-level intelligence is about extrapolation and scaling laws.
And so you basically do tons of the exploration on the smaller models and then you try to extrapolate out.
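The "explore small, then extrapolate" step can be illustrated with the simplest possible scaling-law fit: assume loss follows a pure power law in parameter count, $L(N) = a N^{-b}$, fit it as a straight line in log-log space from small-model results, and predict a larger size. The data points below are synthetic, generated from $a = 20$, $b = 0.1$ purely for illustration; real scaling-law fits usually also include an irreducible-loss term, which this simplified form omits.

```python
import math


def fit_power_law(ns: list[float], losses: list[float]) -> tuple[float, float]:
    """Least-squares fit of loss = a * N**(-b) via a straight line in log-log space."""
    xs = [math.log(n) for n in ns]
    ys = [math.log(loss) for loss in losses]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    a = math.exp(my - slope * mx)
    return a, -slope  # b is the negated log-log slope


def predict(a: float, b: float, n: float) -> float:
    return a * n ** (-b)


# Synthetic "small model" results (hypothetical, generated from a=20, b=0.1).
small_ns = [1e6, 1e7, 1e8]
small_losses = [20 * n ** -0.1 for n in small_ns]
a, b = fit_power_law(small_ns, small_losses)
print(f"a={a:.2f}, b={b:.3f}, predicted loss at 1e10 params: {predict(a, b, 1e10):.3f}")
```

Any improvement found cheaply at small scale is only worth much if it shifts this fitted curve, which is why automated experimentation on small models can inform where to spend large-scale compute.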
So you're saying our research efforts are going to get more efficient like we're going to have better direction for when we scale as well.
If we can do this experimentation better.
Yeah, I would say that the most interesting project, and probably what the frontier labs are working on, is:
You know you experiment on the smaller models you try to make it as autonomous as possible remove researchers from the loop.
They have way too much what is the what is the opposite.
Yeah, they shouldn't be touching any of this, really.
And so you have to rewrite the whole thing, because right now, I mean, certainly they can contribute ideas, but they shouldn't actually be enacting these ideas. There is a queue of ideas.
And there's maybe an automated scientist that comes up with ideas based on all the arXiv papers and GitHub repos, and it funnels ideas in, or researchers can contribute ideas.
But it's a single queue, and there are workers that pull items and try them out, and whatever works just gets put on the feature branch, and maybe some people
monitor the feature branch and merge to the main branch sometimes. So yeah, just removing humans from all the processes, automating as much as possible, getting high tokens-per-second throughput, and it does require rethinking all the abstractions.
And everything has to be reshuffled so yeah things are very exciting.
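The organization described here (one queue of ideas, workers that pull and test them, a feature branch that collects whatever works) maps naturally onto a worker-pool pattern. A minimal sketch, assuming each idea is tested independently against a fixed baseline; `try_idea`, `BASELINE_LOSS`, and the `TOY_EFFECTS` outcomes are hypothetical placeholders for real training runs, and the final merge to main is left to a reviewer.

```python
import queue
import threading

BASELINE_LOSS = 3.00

# Hypothetical outcomes; in a real system each trial would be a training run.
TOY_EFFECTS = {
    "tune adam betas": -0.02,
    "add value-embedding weight decay": -0.01,
    "double the batch size": +0.03,
    "swap in a new optimizer": +0.05,
}


def try_idea(idea: str) -> float:
    """Stand-in for an experiment; returns validation loss with the idea applied."""
    return BASELINE_LOSS + TOY_EFFECTS.get(idea, 0.0)


def run_workers(ideas: list[str], num_workers: int = 3) -> list[tuple[str, float]]:
    q: queue.Queue = queue.Queue()
    for idea in ideas:  # ideas may come from researchers or an automated scientist
        q.put(idea)
    feature_branch: list[tuple[str, float]] = []
    lock = threading.Lock()

    def worker() -> None:
        while True:
            try:
                idea = q.get_nowait()
            except queue.Empty:
                return  # queue drained; this worker is done
            loss = try_idea(idea)
            if loss < BASELINE_LOSS:  # whatever works goes onto the feature branch
                with lock:
                    feature_branch.append((idea, loss))

    threads = [threading.Thread(target=worker) for _ in range(num_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    # A reviewer (human or agent) would inspect this before merging to main.
    return sorted(feature_branch, key=lambda item: item[1])


print(run_workers(list(TOY_EFFECTS)))
```

Testing each idea against the same baseline keeps workers independent; a real system would also need to re-test combinations before merging, since (as noted above) changes can interact.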
Let me take one more recursive step here.
When is the model going to write a better program.md than you?
Yeah, so program.md is not exactly...
Yeah.
So program.md is my crappy attempt at describing how AutoResearch should work: do this and do that, try these kinds of ideas, and maybe give some ideas, like look at the architecture, look at the optimizer, etc.
But I just came up with this in Markdown right.
And so yeah, exactly, you want some kind of an AutoResearch loop maybe that looks for...
You can imagine that different program.md files would give you different progress, so basically every research organization is described by a program.md.
Yeah.
Research organization is a set of Markdown files that describe all the roles and how the whole thing connects.
And you can imagine having a better research organization so maybe they do fewer stand ups in the morning because they're useless and this is all just code right.
And so you can so one organization can have fewer stand ups one organization can have more.
One organization can be very risk-taking, one organization can be less so. You can definitely imagine that you have multiple research orgs.
And then they all have code and once you have code then you can imagine tuning the code so 100% there's like the meta layer of it.
Did you see my text about my contest idea? My contest idea was:
let people write different program.md files, right? And so for the same hardware, where do you get the most improvement?
Oh, I see, and then you can take all that data and give it to the model to write a better program.md.
Yes, yes.
Yeah exactly.
We're going to get something better.
Like there's no way we don't.
You could 100% look at where the improvements came from and, like, ask: can I change the program.md such that more of these kinds of things would be done, and fewer of the things that didn't work?
That's a meta optimization.
Yeah, you can 100% imagine doing that so I think this is a great idea, but it's like.
You know, I think like you sort of go one step at a time where you sort of have one process and then second process and then the next process and these are all layers of an onion.
Like, the LLM part is now taken for granted.
The agent part is now taken for granted.
Now the claw-like entities are taken for granted, and now you can have multiple of them, and now you can have instructions to them, and now you can have optimization over the instructions.
And it's just, like, a little too much, you know. But I mean, this is why I guess it's psychosis: this is like infinite, and everything is skill issue, and that's, yeah, just coming back to it, why it's so insane.
Okay, well if we're just trying to like diagnose the current moment and what is a relevant skill right now.
What do you think is the implication? That this is the loop we should be trying to achieve in different areas, and that it works? Like, you know,
create the metric, or create the ability for agents to continue working without you?
Yeah, do we still have performance engineering like what.
Yeah, I mean, so there are a few caveats that I would put on top of the LLM psychosis. Number one:
this is extremely well suited to anything that has objective metrics that are easy to evaluate.
So for example, writing kernels: more efficient code for various parts of the model.
It's a perfect fit, because you have inefficient code and then you want efficient code that has the exact same behavior but is much faster. Perfect fit.
So a lot of things are a perfect fit for auto research, but many things will not be, and so it's just: if you can't evaluate it, you can't auto research it, right?
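To make "easy to evaluate" concrete, here is a minimal sketch of an acceptance check for an optimized kernel: the candidate must reproduce the reference output on every test input, and only then is its speedup counted as the score. Plain-Python prefix sums stand in for a GPU kernel; the function names, tolerances, and repeat count are illustrative assumptions.

```python
import itertools
import math
import random
import timeit
from typing import Callable, List, Sequence, Tuple

Kernel = Callable[[Sequence[float]], List[float]]


def prefix_sums_reference(xs: Sequence[float]) -> List[float]:
    # Deliberately inefficient O(n^2) version: re-sums every prefix from scratch.
    return [sum(xs[: i + 1]) for i in range(len(xs))]


def prefix_sums_candidate(xs: Sequence[float]) -> List[float]:
    # Proposed O(n) rewrite: a single running total.
    return list(itertools.accumulate(xs))


def evaluate_kernel(reference: Kernel, candidate: Kernel,
                    inputs: List[List[float]], repeats: int = 20) -> Tuple[bool, float]:
    """Return (behaviour_matches, speedup). The speedup is only measured
    if the candidate agrees with the reference on every input."""
    for xs in inputs:
        want, got = reference(xs), candidate(xs)
        if len(want) != len(got) or not all(
            math.isclose(w, g, rel_tol=1e-9, abs_tol=1e-12) for w, g in zip(want, got)
        ):
            return False, 0.0
    probe = inputs[0]
    t_ref = timeit.timeit(lambda: reference(probe), number=repeats)
    t_new = timeit.timeit(lambda: candidate(probe), number=repeats)
    return True, t_ref / t_new


if __name__ == "__main__":
    rng = random.Random(0)
    cases = [[rng.uniform(-1.0, 1.0) for _ in range(500)] for _ in range(5)]
    print(evaluate_kernel(prefix_sums_reference, prefix_sums_candidate, cases))
```

The correctness gate comes first on purpose: a faster kernel that changes behavior scores zero, so the only thing left to optimize is speed.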
So that's caveat number one. And then maybe caveat number two, I would say, is that we're kind of talking about next steps, and we kind of see what the next steps are, but fundamentally the whole thing still doesn't...
It's still kind of bursting at the seams a little bit, and there are cracks, and it doesn't fully work, and if you try to go too far ahead, the whole thing is actually net not useful, if that makes sense.
Because these models, you know, have improved a lot, but they're still kind of rough around the edges, is maybe the way I would describe it.
I simultaneously feel like I'm talking to an extremely brilliant PhD student who's been a systems programmer for their entire life, and a 10-year-old.
And it's so weird, because with humans, I feel like those are more coupled, like, you know...
Yeah, you wouldn't even encounter that combination.
This jaggedness is really strange and humans have a lot less of that kind of jaggedness although they definitely have some.
But humans have a lot more jaggedness.
Sorry, the agents have a lot more jaggedness where sometimes like.
You know, I ask for functionality and it comes back with something that's just totally wrong, and then we get into a loop of it being totally wrong, and I just get so frustrated with the agents all the time still.
Because you feel the power of it.
But it still does nonsensical things once in a while for me as well.
I get very annoyed when I feel like the agent wasted a lot of compute on something it should have recognized was an obvious problem.
Yeah, I think some of the bigger things, maybe what's underneath it, if I could hypothesize, is that fundamentally these models are trained via reinforcement learning.
So they're actually struggling with the exact same thing we just talked about, which is:
They can improve the models in anything that is very verifiable, where there are rewards: did you write the program correctly, do the unit tests check out, yes or no?
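As a toy illustration of what "verifiable" means here, the sketch below computes a reward with no human judgment at all: every unit test passes or the score is zero. It is a minimal sketch, not how any lab's RL pipeline is actually implemented; the test cases and function names are invented for the example.

```python
from typing import Any, Callable, List, Tuple

# Each test case: (positional arguments, expected return value).
TestCase = Tuple[Tuple[Any, ...], Any]


def unit_test_reward(program: Callable[..., Any], tests: List[TestCase]) -> float:
    """Binary, automatically checkable reward: 1.0 only if every unit test
    passes; a wrong answer or a raised exception scores 0.0."""
    for args, expected in tests:
        try:
            if program(*args) != expected:
                return 0.0
        except Exception:
            return 0.0
    return 1.0


TESTS: List[TestCase] = [((3,), 3), ((-4,), 4), ((0,), 0)]


def correct_abs(x: int) -> int:
    return x if x >= 0 else -x


def buggy_abs(x: int) -> int:
    return x  # forgets the negative case


if __name__ == "__main__":
    print(unit_test_reward(correct_abs, TESTS))  # 1.0
    print(unit_test_reward(buggy_abs, TESTS))    # 0.0
```

A request like "tell me a joke" has no comparable check to plug into such a function, which is the contrast the joke example in this conversation draws.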
But some of the things where they're struggling are, for example, I think they have a tough time with the nuances of maybe what I had in mind or what I intended, and when to ask clarifying questions.
Or, yeah, it's just that anything that feels softer is worse, and so you're either on rails and you're part of the superintelligence circuits.
Or you're not on rails and you're outside of the verifiable domains, and suddenly everything just kind of meanders. Maybe another way to put it is: if you go today to a state-of-the-art model, like ChatGPT, and you ask it, tell me a joke...
Do you know what joke you're going to get? That's the joke.
The joke.
I can't tell you the standard form of it, but I do feel like ChatGPT has, like, three jokes.
Yeah, yeah, so the joke that apparently all the LLMs like the most is: why don't scientists trust atoms?
Okay.
Yeah, because they make everything up.
Okay.
They make everything up.
Okay.
So this is still that one.
So this is the joke you would get three or four years ago and this is the joke you still get today.
Okay.
So even though the models have improved tremendously, and if you give them an agentic task they will just go for hours and move mountains for you,
then you ask for a joke and it has a stupid joke, a crappy joke from five years ago, and it's because it's outside of the RL.
It's outside of the reinforcement learning.
It's outside of what's being improved.
And it's part of the jaggedness: shouldn't you expect models, as they get better, to also have better jokes, or more diversity of them? But it's just not being optimized, and it's stuck.
Do you think that implies we are not seeing generalization, in the sense of broader intelligence, of joke smartness being attached to code smartness?
Yeah, I think there's some decoupling, where some things are very verifiable and some things are not, and some things are optimized, kind of arbitrarily, by the labs depending on what data went in, and some things are not.
And, I mean, there's a premise from some research groups that if you're smarter at code generation, or in these very verifiable fields, you should be better at everything.
Yeah, like the joke situation suggests that that's not happening.
I don't think that's happening.
I think maybe we're seeing a little bit of that, but not a satisfying amount.
Yeah, and jaggedness exists in humans.
Yeah, you can be very, very good at something, in fact,
and still tell a really bad joke.
Yeah, that's true.
Yeah, but it still means that... the story is that we're getting a lot of the intelligence capabilities in all the domains of society for free as we get better and better models, and that's not exactly, fundamentally, what's going on.
There are blind spots, and things are not being optimized for, and this is all clustered up in these opaque neural net models, right? So you're either on the rails of what it was trained for, and everything is going at the speed of light, or you're not.
And so it's the jaggedness.
So that's why I think, even though the progression of what should happen is obvious, you can't let it fully go there yet, because it doesn't
fully work, or it's a skill issue and we just haven't figured out how to use it, so, you know, it's hard to tell.
I'll ask kind of a blasphemous question, which is: if this jaggedness is persisting, and it's all rolled up in a single model with a single interface, right,
does that make sense, or should it be unbundled into things that can be optimized and improved against different domains of intelligence? Like unbundling the models into multiple experts in different areas.
More directly.
Yeah.
Instead of just MoE that we have no exposure to, because it can be confusing as a user from the outside: why is it so good at this but not at this other thing?
Yeah, I think currently my impression is the labs are trying to have a single sort of monoculture of a model that is increasingly intelligent in all these different domains, and they just stuff it into the parameters. I do think we should expect more speciation in the intelligences.
Like, you know, the animal kingdom is extremely diverse in the brains that exist, and there are lots of different niches in nature, and some animals have overdeveloped visual cortices or other parts, and I think we should be able to see more speciation.
You don't need this oracle that knows everything; you speciate it and then put it on a specific task. And we should be seeing some of that, because you should be able to have much smaller models that still have the cognitive core, like they're still competent, but then they specialize, and then
they can become more efficient in terms of latency or throughput on specific tasks that you care about. Like if you're a mathematician working in Lean, I saw, for example, there are a few releases that really target that as a domain.
So there are probably going to be a few examples like that, where the unbundling kind of makes sense.
One question I have is whether the capacity constraint on available compute infrastructure drives more of this, because efficiency actually matters more, right? Like, if you...
Financing aside, and financing is involved in all this, if you have access to full compute for anything you do, we've been on one single model, right? But if you actually feel pressure, where you're like, I can't serve
a model of massive size for every use case, do you think that leads to any speciation? Does that question make sense to you?
The question makes sense, and I guess what I'm struggling with is I don't think we've seen too much speciation just yet, right?
No, we're seeing a monoculture of models. Yeah, and there's clearly pressure to make a good code model and merge it back into the main model.
Yeah, yeah.
Even though there already is pressure on the models.
I guess, perhaps, I feel like there's a lot of very short-term supply crunch, and maybe that causes more speciation now.
Yeah, I think fundamentally the labs are serving a model and they don't really know what the end user is going to be asking about.
So maybe that's like some part of it because they kind of have to multitask over all the possible things that could be asked.
But I think if you're coming to a business and maybe partnering on some specific problems you care about then maybe you would see that there.
Or there would be some very high value applications that are like more niche.
But but I think right now they're kind of like going after the reality of what's available.
Partly, I don't think the science of manipulating the brains is fully developed yet.
What do you mean, manipulating?
So, fine-tuning without losing capabilities, as an example.
These primitives for actually working with the intelligences in ways other than just context windows. Context windows kind of just work, and they're very cheap to manipulate, etc.
This is how we're getting some of the customization, etc.
But I think if it was...
I think it's a bit more of a developing science: how you more deeply adjust the models, how you have continual learning, maybe, or how you fine-tune in a certain area, how you get better in a certain area, or how you actually touch the weights, not just the context windows.
And so it's a lot trickier, I would say, to touch the weights than just the context windows, because you're actually fundamentally changing the full model and potentially its intelligence.
So maybe it's just not a fully developed science yet, and that holds back speciation.
And it also has to be cheap enough for that speciation to be worthwhile in these given contexts.
Can I ask a question about an extension to auto research that you described, in terms of opening it up and saying, okay, well, you know, we have this thing,
we need more collaboration surface around it, essentially, for people to contribute to research overall? Can you talk about that?
Yeah, so we talked about auto research as a single thread of, I'm going to try stuff in a loop.
But fundamentally, parallelization of this is the interesting component.
And I guess I was trying to play around with a few ideas, but I don't have anything that clicks as simply; I don't have something I'm super happy with just yet, but it's something I'm working on on the side when I'm not working on my claw.
So I think one issue is, if you have a bunch of nodes of parallelization available to you, then it's very easy to just have multiple auto researchers talking through a common system or something like that.
What I was more interested in is how you can have an untrusted pool of workers out there on the internet.
So for example in auto research.
You're just trying to find.
The piece of code that trains a model to a very low validation loss.
If anyone gives you a candidate commit, it's very easy to verify that that commit is good.
Like, someone from the internet could claim that this piece of code will optimize much better and give you much better performance, and you could just check very easily.
But probably a lot of work goes into that checking, because fundamentally they can lie, etc.
So you're basically dealing with a similar kind of... it's almost actually like, my designs that incorporate an untrusted pool of workers
actually look a little bit like a blockchain, because instead of blocks you have commits, and these commits can build on each other, and they contain changes to the code as you're improving it.
And the proof of work is basically doing tons of experimentation to find the commits that work.
And that's hard.
And then the reward is just being on the leaderboard; right now there's no monetary reward whatsoever.
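The commits-as-blocks analogy can be sketched as a hash-linked chain in which each commit names its parent and claims a lower validation loss than the commit it builds on. This is a hypothetical illustration of the analogy only: the `Commit` fields, the SHA-256 hashing, and the "must improve" rule are assumptions, and nothing here checks that a claim is true, which is a separate verification step.

```python
import hashlib
import json
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Commit:
    parent: Optional[str]  # digest of the commit this one builds on (None for the root)
    diff: str              # change to the training code
    claimed_loss: float    # validation loss the submitter says this commit reaches

    def digest(self) -> str:
        # Hash the full contents so that any later edit changes the digest.
        payload = json.dumps(
            {"parent": self.parent, "diff": self.diff, "loss": self.claimed_loss},
            sort_keys=True,
        )
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def chain_is_well_formed(chain: List[Commit]) -> bool:
    """Check parent links and monotone improvement; does NOT verify any claim."""
    if not chain:
        return True
    if chain[0].parent is not None:
        return False
    for prev, cur in zip(chain, chain[1:]):
        if cur.parent != prev.digest() or cur.claimed_loss >= prev.claimed_loss:
            return False
    return True


root = Commit(None, "baseline train.py", 3.00)
step = Commit(root.digest(), "tweak warmup schedule", 2.95)
```

As with blocks, tampering with an earlier commit changes its digest and breaks every link after it; unlike a real blockchain, there is no consensus protocol here, only the linking idea.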
But I don't want to push the analogy too far. Fundamentally it has this property where a huge amount of search goes into it, but it's very cheap to verify that a candidate solution is indeed good, because you can just train a single model. You know, someone might try thousands of ideas, but
you just have to check that the thing they produced actually works, because the vast majority of them didn't work, you know.
And so basically long story short is like you have to come up with a system where an untrusted pool of workers can collaborate with a trusted pool of workers that do the verification.
And the whole thing is kind of asynchronous and works and so on, and it's safe from a security perspective, because if anyone sends you arbitrary code and you're going to run it, that is very sketchy and dodgy. So...
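The split between an untrusted pool that searches and a trusted pool that verifies can be sketched as follows: the trusted side re-measures each claimed result and only admits submissions that reproduce and beat the current best. This is a minimal, hypothetical sketch; `measure` stands in for actually training and evaluating the submitted code, which in practice would need to run in a sandbox given the security concern just raised. The tolerance and toy losses are made up.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Submission:
    worker: str          # untrusted contributor
    code: str            # submitted change (opaque here)
    claimed_loss: float  # what the worker says it achieves


def verify(submissions: List[Submission], best_loss: float,
           measure: Callable[[str], float],
           tol: float = 1e-3) -> Tuple[List[Tuple[str, float]], float]:
    """Re-measure every claim on trusted hardware. Keep only submissions
    whose measured loss matches the claim (within tol) and beats best_loss.
    Returns a leaderboard sorted by measured loss, plus the new best."""
    leaderboard = []
    for sub in submissions:
        measured = measure(sub.code)  # one training run per claim
        if abs(measured - sub.claimed_loss) <= tol and measured < best_loss:
            leaderboard.append((sub.worker, measured))
    leaderboard.sort(key=lambda item: item[1])
    new_best = leaderboard[0][1] if leaderboard else best_loss
    return leaderboard, new_best


# Toy ground truth standing in for real training runs.
TRUE_LOSS = {"patch-a": 2.90, "patch-b": 3.20, "patch-c": 3.10}

subs = [
    Submission("alice", "patch-a", 2.90),    # honest and better than baseline
    Submission("mallory", "patch-b", 2.50),  # lies about its loss
    Submission("carol", "patch-c", 3.10),    # honest but worse than baseline
]
```

The verifier never needs to repeat the workers' search; it pays for one evaluation per claim, which is the asymmetry that makes an untrusted pool usable.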
But fundamentally it should be totally possible. You're familiar with projects like SETI@home and Folding@home; all of these problems have a similar kind of setup. So in Folding@home, you're folding a protein,
and it's very hard to find a configuration that is low energy, but if someone finds a configuration that they claim is low energy, that's perfect: you can just use it, because you can easily verify it.
So a lot of things have this property that you know very expensive to come up with but very cheap to verify.
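The "expensive to come up with, cheap to verify" property can be made concrete with a toy problem. This is not protein folding: the `energy` function below is an invented stand-in (it rewards alternating neighbours in a short binary chain), chosen only so that exhaustive search and single-shot verification can be compared by counting energy evaluations.

```python
import itertools
from typing import Optional, Tuple

Config = Tuple[int, ...]


def energy(config: Config) -> int:
    # Invented toy energy: +1 for each pair of equal neighbours, -1 otherwise.
    return sum(1 if a == b else -1 for a, b in zip(config, config[1:]))


def brute_force_search(n: int) -> Tuple[Optional[Config], Optional[int], int]:
    """Expensive side: evaluate every one of the 2**n configurations."""
    best, best_e, evals = None, None, 0
    for config in itertools.product((0, 1), repeat=n):
        e = energy(config)
        evals += 1
        if best_e is None or e < best_e:
            best, best_e = config, e
    return best, best_e, evals


def verify_claim(config: Config, claimed_energy: int) -> Tuple[bool, int]:
    """Cheap side: a single evaluation checks someone else's claim."""
    return energy(config) <= claimed_energy, 1


if __name__ == "__main__":
    config, e, evals = brute_force_search(12)
    print(e, evals)
    print(verify_claim(config, e))
```

For a chain of 12 the search costs 4,096 evaluations while checking a claimed answer costs one, and the gap doubles with every extra element.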
And so in all those cases, things like Folding@home or SETI@home or auto research at home will be good fits. And so, long story short:
a swarm of agents on the internet could collaborate to improve LLMs and could potentially even run circles around frontier labs, like, who knows, you know.
Yeah, maybe that's even possible. Frontier labs have a huge amount of trusted compute,
but the Earth is much bigger and has a huge amount of untrusted compute, and if you put systems in place that deal with this,
then maybe it is possible that the swarm out there could come up with better solutions, and people kind of contribute cycles
to a thing that they care about. And so, sorry, the last thought is:
A lot of companies or whatnot could maybe have their own things that they care about, and if you have compute capacity, you could contribute to different auto research tracks. Like maybe you care about certain...
You know, say you care about a certain type of cancer or something like that: you don't have to just donate money to an institution, you could actually purchase compute and then join the auto research swarm for that project, you know.
So if everything is re-bundled into auto researchers then compute becomes the thing that you're contributing to the pool.
Yeah, that's very inspiring, and it's also interesting. I don't know how far this goes, but it is interesting that at least some audience of people,
you know, here in Silicon Valley, or lining up at retail stores in China, have discovered that having access to personal compute is interesting again.
Right, so maybe they're really motivated to do that for their claws, and then they can contribute to auto research.
It's almost like: are dollars the thing everyone cares about, or are FLOPs the thing that everyone actually cares about in the future? Is there going to be almost a flippening of what you care about? Like right now, for example, it's really hard to get compute even if you have money.
Yeah.
So actually it almost seems like the FLOP is dominant.
In a certain sense.
Yeah, so maybe that's kind of like: how many FLOPs do you control, instead of what wealth do you control? I don't actually think that's true, but it's kind of interesting to think about.
The last thing you released was a little bit of jobs data analysis.
Yeah.
Is that right?
What was the intention, or were you just visualizing some public data?
Yeah.
What was, you know, what were you curious about?
Yeah, I guess I was curious.
I mean, everyone is really thinking about the impacts of AI on the job market and what it's going to look like, so I was just interested to take a look: what does the job market look like, where are the different roles,
and how many people are in different professions? I was really just interested to look through
the individual cases and try to think for myself about, you know, with these AIs and how they're likely to evolve,
are these going to be tools that people are using, or are these going to be displacing tools for these professions? And what are the current professions,
and how are they going to change? Are they going to grow or adjust to a large extent, or what could be new professions?
So it's really just a way to fuel my own chain of thought about the industry, I suppose.
Okay.
And so, yeah, the job data basically is just from the Bureau of Labor Statistics.
They actually have a percent outlook for each profession, about how much it's going to grow over the next, I think, almost decade.
Yeah, I think it's a decade.
But it was made in 2024.
We need a lot of healthcare workers.
Yeah.
So they've already made those projections, and I'm not 100% sure what methodology they put into the projections.
I guess I was interested to color things by... if people think that what's primarily being developed now is this more digital AI,
that is almost like these ghosts or spirit entities that can interact in the digital world and manipulate a lot of digital information,
and they currently don't really have a physical embodiment or presence, and the physical stuff is probably going to go slightly slower, because you're manipulating atoms.
Flipping bits, and the ability to copy-paste digital information, makes everything a million times faster than accelerating matter, you know.
So energetically, I just think we're going to see a huge amount of activity in digital space: a huge amount of rewriting, a huge amount of activity, a boiling soup. And I think
we're going to see something in digital space that goes at the speed of light compared to what's going to happen in the physical world, to some extent.
That would be the extrapolation, and so I think
there's currently kind of an overhang, where there can be a lot of unhobbling, potentially, of a lot of digital information processing that used to be done by computers and people.
And now, with AIs as a third kind of manipulator of digital information, there's going to be a lot of refactoring in those disciplines.
But the physical world is actually going to be like I think behind that by some amount of time.
So I think what's really fascinating to me is like.
So that's why I was highlighting the professions that fundamentally manipulate digital information; this is work you could do from your home, et cetera.
Because I feel like with those, things will change. It doesn't mean that there's going to be fewer of those jobs or more of those jobs, because that has to do with demand elasticity and many other factors.
But things will change in these professions because of these new tools, and because of this upgrade to the nervous system of the human superorganism.
If you want to think about it that way.
Given the look you had at the data do you have either any observations or guidance for people facing the job market or thinking about what to study now or what skills to develop.
I mean, we can all go get... I'm very thankful that I have to meet people for my job right now.
People are physical.
Could you do your work from home though?
I could.
I think there are relationship parts of it that are hard, but most of it I could.
It's really hard to tell because again like the job market is extremely diverse and I think the answers will probably vary but.
To a large extent, these tools are extremely new and extremely powerful, and so just trying to keep up with it is the first thing.
And, yeah, because I think a lot of people kind of dismiss it, or they're afraid of it, et cetera.
Which is totally understandable, of course.
Yeah, I think it's fundamentally an empowering tool at the moment.
And these jobs are bundles of tasks, and some of these tasks can go a lot faster, so people should think of it primarily as a tool, which is what it is right now.
And I think the long-term future of that is uncertain. Yeah, it's really hard to forecast, to be honest.
And I'm not professionally doing that, really; I think this is a job for economists to do properly.
You are an engineer though and like one thing I thought was interesting is that like the demand for engineering jobs.
Continuing to increase.
Yeah.
I can't tell if that's a temporary phenomenon.
I'm not sure how I feel about it.
Yeah, do you know.
Yeah, that's like the demand.
It almost seems like software was scarce, right?
And so the reason we don't have more demand for software is just its scarcity; it's too expensive.
Yeah.
So as the barrier comes down, you actually have the Jevons paradox, which is that the demand for software actually goes up as it gets cheaper and more powerful.
Yeah.
The classical example of this always is the ATMs and the bank tellers because there was a lot of like fear that.
ATMs and computers basically would displace tellers.
But what happened is they made like the cost of operation of a bank branch much cheaper.
So there were more bank branches.
So there were more tellers, is the canonical example people cite.
But basically it's just the Jevons paradox: something becomes cheaper,
so there's a lot of unlocked demand for it.
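The Jevons paradox has a standard textbook illustration with a constant-elasticity demand curve, Q = A * P^(-e): total spending P * Q = A * P^(1 - e) rises as the price falls whenever demand is elastic (e > 1), and falls when it is inelastic (e < 1). The sketch below assumes that functional form; the elasticities and the scale are made-up numbers, not estimates for software or bank branches.

```python
def quantity_demanded(price: float, elasticity: float, scale: float = 100.0) -> float:
    """Constant-elasticity demand: Q = scale * price ** (-elasticity)."""
    return scale * price ** (-elasticity)


def total_spend(price: float, elasticity: float, scale: float = 100.0) -> float:
    """Total spending P * Q, which equals scale * price ** (1 - elasticity)."""
    return price * quantity_demanded(price, elasticity, scale)


if __name__ == "__main__":
    for eps in (0.5, 1.5):  # inelastic vs. elastic demand (illustrative values)
        before, after = total_spend(1.0, eps), total_spend(0.5, eps)
        print(f"elasticity={eps}: spend {before:.1f} -> {after:.1f} when price halves")
```

In this model, whether cheaper software ends up meaning more total software work depends entirely on how elastic the demand is, which is why the forecast stays hard.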
So I do think that's probably... I do have a cautiously optimistic view of this in software engineering,
where it does seem to me that the demand for software will be extremely large,
and it's just become a lot cheaper.
And so I do think that, for quite some time...
It's very hard to forecast, but it does seem to me that right now, at least locally,
There's going to be more demand for software.
Because software is amazing it's like you know digital information processing.
You're not forced to use like arbitrary tools that were given to you.
They're imperfect in various ways.
You're not forced to subscribe to what exists.
Code is now ephemeral, and it can change and be modified.
And so I think there's going to be a lot of activity in the digital space to like rewire everything in a certain sense.
And I think it's going to create a lot of demand for this kind of stuff.
I think long term.
Yeah, obviously, even with auto research, OpenAI or Anthropic or these other labs are employing, what, a thousand-something researchers, right?
These researchers are basically, like, glorified auto... like, you know.
They're automating themselves away, actively, and this is the thing they're all trying to do.
Yeah.
I think... I went around...
Some of those researchers also feel the psychosis, right, seeing it working.
Yeah.
And so, yeah, for me too.
I used to spend a bunch of time going around OpenAI like, you guys realize, if we're successful, we're all out of a job.
Like, we're just building automation for Sam or something like that, or the board, I'm not sure, but
we're just building this automation for, yeah, the board or the CEO or something like that, and we're all out of a job and maybe contributing on the side.
And so, yeah, it's kind of eerie from that perspective.
Is it okay if I ask you Noam's question?
You know, you could be doing that, right, auto researching with a lot of compute scale and a bunch of colleagues at one of the frontier labs. Why not?
Well, I was there for a while, right, and I did re-enter, so to some extent I agree, and I think there are many ways to slice this question.
It's a very loaded question, all of it.
I will say that I feel very good about what people can contribute, and their impact, outside of the frontier labs, obviously not just in the industry but also in more ecosystem-level roles.
So your role, for example, is more ecosystem level; my role currently is also more ecosystem level, and I feel very good about the impact that people can have in those kinds of roles.
I think, conversely, there are definite problems, in my mind,
with basically aligning yourself way too much with the frontier labs, too.
So fundamentally, I mean, you have a huge financial incentive with these frontier labs.
And by your own admission, the AIs are going to really change humanity and society in very dramatic ways.
And here you are, basically building the technology and benefiting from it, and being very allied to it financially.
Like, this was a conundrum that was at the heart of, you know, how OpenAI was started in the beginning; this was the conundrum that we were trying to solve.
And so, you know, it's kind of...
It's not fully resolved. So that's number one: you're not a completely free agent, and you can't actually be part of that conversation in a fully autonomous,
free way. If you're inside one of the frontier labs, there are some things that you can't say, and conversely, there are some things that the organization wants you to say, and, you know, they're not going to twist your arm, but
you feel the pressure of what you should be saying.
You know, because obviously,
otherwise it's really awkward conversations,
or strange side-eyes, like, what are you doing, you know.
So you can't really be an independent agent, and I feel a lot more aligned with humanity, in a certain sense, outside of the frontier labs, because
I'm not subject to those pressures, almost, right? And I can say whatever I want.
Yeah. That said, at the frontier labs,
you can have impact there as well, of course. But there are many researchers, and maybe you're one of them, maybe your ideas are really good, etc.
Maybe there's a lot of decision making to do, and you want to be in a position where you are in the room for those conversations when they come up.
I do think that currently the stakes are overall fairly low, and so everything is kind of nice, but ultimately, at the end of the day, when the stakes are really high, etc.,
if you're an employee in an organization, I don't actually know how much sway you're going to have on what your organization is going to do, fundamentally, at the end of the day.
You're not really in charge. You're in the room and you're contributing ideas, but you're not really in charge of the entity that you're part of.
So those are some sources of misalignment, I think. To some extent, I will say that in one way I do agree a lot with that sentiment:
I do feel like the labs, for lack of a better word, are opaque, and a lot of work happens there, and they're kind of at the edge of capability, of what's possible, and they're working on what's coming down the line.
And I think if you're outside of a frontier lab, your judgment fundamentally will start to drift, because you're not part of
what's coming down the line.
And so I feel like my judgment will inevitably start to drift as well, and I won't actually have an understanding of how these systems actually work under the hood.
I won't have a good understanding of how it's going to develop, etc.
And so in that sense I agree, and it's something I'm nervous about. I think it's worth basically
being in touch with what's actually happening, actually being in a frontier lab, and if some of the frontier labs would have me come for some amount of time and do really good work for them, and then maybe come in...
Guys, I'm looking for a job, super excited.
Then I think that's maybe a good setup, because I kind of feel like, you know,
maybe that's one way to actually be connected to what's actually happening, but also not feel like you're necessarily fully controlled by it.
So honestly, in my mind, Noam can probably do extremely good work at OpenAI.
But also, I think his most impactful work could very well be outside of OpenAI.
Noam, that's a call to be an independent researcher.
Yeah, there are many things to be done on the outside, and I think ultimately the ideal solution maybe is going back and forth, or...
Yeah, and I think fundamentally you can have really amazing impact in both places, so, very tricky. I don't know, it's a very loaded question a little bit, but I mean, I joined a frontier lab, and I'm outside, and then maybe in the future I'll want to join again, and I think
that's kind of how I look at it.
One question related to what visibility the world or the AI ecosystem has into the frontier is: how close is open source to the frontier, and how sustainable is that?
I think the entire sequence of events is actually quite surprising, from having a handful of Chinese models
and global models, and I think people will continue releasing them in the near term, that are closer than much of the industry anticipated from a capability perspective.
Yeah, I don't know if you're surprised by that, or what your long-term view on open source is; like, what's your prediction here?
Yeah, so roughly speaking, the closed models are ahead, but people are monitoring the number of months that open source models are behind.
To start with there was nothing, and then it went to 18 months.
Yeah, it's been a convergence, right? So maybe they're behind by, what is the latest, maybe eight months, that kind of thing, right now.
Yeah, I'm a huge kind of open source obviously so for example in operating systems you have like closed so like you know windows and macOS is our large software projects kind of like what LMS are going to become and there's Linux but Linux is very easy.
Actually, Linux is extremely successful project it runs on the best majority of computers like last time I checked was it like 60% or something like run Linux.
And that's because there isn't need an industry to have a common open platform that everyone feels sort of safe using.
I would say like the industry has always felt that demand for that kind of a project to exist and I think the same is true now and that's why business is actually what there's demand for this kind of a.
I think to exist the big difference is that everything is capital there's a lot of things that go into this.
And so I think that's where things like fall apart a little bit make it a bit harder to compete in the circumstances.
I do think that the current models are very good the other thing that I think is like really interesting is that for the vast majority of like consumer use cases and things like that even like turn a bit source models are actually quite good I would say and I think like if you go forward like more.
More years it does seem to me like a huge amount of like simple use cases are going to be well covered and actually even run locally.
But there's always going to be some demand for frontier intelligence, and that can actually be an extremely large piece of the pie. It could be that the need for frontier intelligence is going to be for Nobel Prize kind of work, or for "let's move Linux from C to Rust", bigger projects scoped in that kind of way, and there may be more of those.
And maybe that's where a lot of the frontier closed intelligence is going to be interacting, while open source eats through a lot of the more basic use cases or something like that.
You know, at some point, probably later this year, what's frontier today in terms of what I'm using right now from the closed labs might be open source, and that's going to be doing a lot of work.
So I expect this dynamic will basically continue: we'll have frontier labs with closed AIs that are kind of like oracles, and then we'll have open source behind by some months.
I expect that to continue, and I actually think it's a pretty good setup overall.
Because I'm a little bit hesitant about having...
I don't actually think it's structurally... I think there's some systemic risk attached to having intelligence be only closed, and that's it.
And I think centralization has a very poor track record, in my view, in political and economic systems in general.
Yes.
Exactly.
And there's like a lot of work.
Like a European.
A lot of pretty bad precedents. And so I want there to be a thing that is maybe not at the edge of capability, because that's new and unexplored, etc.
But I want there to be a thing that's behind, that is a common working space for intelligence that the entire industry has access to.
Yeah, that seems to me like a pretty decent power balance for the industry.
Yeah, I just think there are many problems to solve, right? If you keep advancing intelligence at the frontier, we can do new things, and there are a lot of very big problems for humanity.
Yeah, right.
And so it seems that will continue to be a very expensive game, and I want to root for the labs doing that, because there are problems we cannot solve without continuing to advance the models in a very expensive way.
Yeah, and as you point out, if what we have today as frontier becomes open, that's a lot of capability.
Yeah.
Right.
And so I think the power of that, or the democratization of that, seems very useful and also healthy.
Yeah, I think basically, by accident, we're actually in an okay spot.
And optimal.
Yeah.
By accident, we happen to be in a good spot, in a certain sense.
Well, and to some degree, the longer this dynamic endures,
the healthier the spot the ecosystem might be in, right? Because you have more area under the curve.
And I will say that even on the closed side, I almost feel like it's been centralizing further recently, because I think a lot of the other contenders are not necessarily in the top tier.
And so, yeah, in that sense, I think it's not super ideal.
I would love there to be more frontier labs, because, yeah, by default I'm very suspicious of...
I want there to be more people in the room.
I think in machine learning, ensembles always outperform any individual model.
And so I want there to be ensembles of people thinking about all the hardest problems, and I want there to be ensembles of people in the room
who are all well informed to make a lot of these decisions. I don't want it to be behind closed doors with two people or a few people.
I feel like that's not a good future.
I almost wish there were more labs, for sure.
And I do think that open source has a role to play.
I hope it sticks around, and the fact that it's currently slightly behind is actually kind of a good thing.
Okay, you worked on the precursor to generalized robotics:
self-driving cars, right?
A lot has happened in the last couple of months with robotics companies as well: acceleration, really impressive generalization across environments and tasks, increasingly long-horizon tasks, lots of money going into the space.
Is it going to happen? Has anything changed in your view recently?
So my view is informed by what I saw in self-driving, and I do feel like self-driving is the first robotics application.
And what I saw at the time, like 10 years ago, was a large number of startups, and I feel like most of them basically didn't make it long term.
And what I saw is that a lot of capital expenditure has to go in, and a lot of time.
And so I think robotics, because it's so difficult and so messy and requires a huge amount of capital investment and a lot of conviction,
is just a big problem, and I think atoms are really hard.
So I feel like it will lag behind what's going to happen in the digital space.
And in the digital space, there's going to be a huge amount of unhobbling: things that weren't super efficient becoming a lot more efficient, by a factor of a hundred, because bits are so much easier.
And so, in terms of what's going to change and where the activity is, I feel like the digital space is going to change a huge amount.
And then the physical space will lag behind. What I find very interesting is the interface between them as well, because
if you have more agents acting on behalf of humans, more agents talking to each other, doing tasks, and participating in a kind of economy of agents, etc.,
you're going to run out of things you can do purely in the digital space. At some point you have to go to the universe and ask it questions.
You have to run an experiment and see what the universe tells you back in order to learn something.
And so we currently have a huge amount of digital work because there's an overhang in how much we've collectively thought about what is already digital.
We just didn't have enough thinking cycles among humans to think about all the information that's already digital and already uploaded.
And so we're going to start running out of stuff that's already uploaded.
At some point you're going to have read all the papers, processed them, and have some ideas about what to try, but...
Yeah, we're just kind of.
I don't actually know how much intelligence you can get that's fully closed off, with just the information that's fed through it.
You know.
And so I think what's going to happen is that first there's going to be a huge amount of unhobbling, and there's a huge amount of work there.
Then it's actually going to move to the interfaces between the physical and the digital.
That's sensors, for seeing the world, and actuators, for doing something to the world.
So I think a lot of interesting companies will actually come from that interface: can we feed the superintelligence, in a certain sense,
data, and can we take data out and manipulate the physical world
at its bidding, if you want to anthropomorphize the whole thing, right? And in the physical world, I almost feel like the total addressable market, in terms of the amount of work and so on, is massive.
Possibly even much larger than what can happen in the digital space, so I actually think it's a much bigger opportunity as well, but...
I do feel like it's a huge amount of work, and in my mind atoms are just a million times harder, so...
So it will lag behind, but it's also, I think, a bigger market, so it's kind of like...
Yeah, I think the opportunities will follow that trajectory: right now digital is my main interest, then interfaces come after that, and then maybe the physical things.
Their time will come, and they'll be huge when they come.
Well, it's an interesting framework for it too, because certain things, not the things I'm working on right now, but certain things, are much easier even in the world of atoms.
Right, if you just think about read and write for the physical world, reading sensors, cameras, there's a lot of existing hardware, and you can imagine
enriching agent capabilities or capturing a lot of new data if you're clever about it, and you don't necessarily have to invest a lot to get something valuable.
Yeah, so one example of this that I saw: my friend Liam runs Periodic.
I visited them last week, so it's just top of mind. They're trying to do auto research for materials science.
And in that case, the sensors to the intelligence are actually pretty expensive lab equipment, and the same goes for chemistry and biology.
I think a lot of people are very interested in engineering biology, and the sensors there will be more than just video cameras, if that makes sense.
And the other thing I saw, for example, is companies that are trying to basically pay people for training data.
Yeah, yeah, programmatically.
Yeah, to feed the bot.
And so these are all examples of sensors, in a certain sense; they take many diverse shapes and forms, if that makes sense.
Yeah, so I'm looking forward to the point where I can ask for a task in the physical world,
put a price on it, and tell the agent, you know, you figure out how to do it.
Yeah, go get the data.
I'm actually kind of surprised we don't have more information markets. For example, Polymarket or other betting markets, or even stocks, etc.,
have so much autonomous activity, and a rising amount of it.
So if, for example, something in Iran is happening right now, how come there isn't a process where a photo or video from somewhere in Tehran costs like 10 bucks? Someone should be able to pay for that.
You know, that's an example of feeding the intelligence.
There's not going to be a human looking at it.
It's going to be agents trying to guess the betting markets and stock markets and so on.
So I feel like the agent economy is still fairly new, and there are no mechanisms for this yet.
But this is an example of what I think might happen.
There's a good book that's maybe inspiring here, called Daemon.
Yeah, yeah.
In Daemon, the intelligence ends up almost puppeteering humanity a little bit, in a certain sense, and so humans are kind of its actuators, and also its sensors.
And so I think society will collectively reshape in a certain way
to serve that kind of thing.
That will end up happening collectively across the industry: there's just a lot more automation, it has certain needs, and humans will be serving those needs,
the needs of the machine, not necessarily each other's.
But we were on this very specific point about the missing pieces of training data.
We need something like auto research, right?
We need the training cycle, or the SFT piece, to be far more mechanized
in order to make the data collection work,
in order to take the human out of the loop and ask for a task that is just "improve my model quality with new data", right?
Does that make sense to you? If you can't have the model
do the training run by itself,
then your ability to do this as a closed-loop task,
yes, by pricing data,
Yeah.
Is more challenged.
Yes, yes, 100%.
Yeah.
But the thing about LLM training is that it actually fits that paradigm very easily.
So you'd actually.
Yeah, clean metric.
Yeah, LLM training actually fits the paradigm really well, really easily: all the optimization of the code so it runs faster.
And then you also have like metrics that you can optimize against.
I do think that if you had an autonomous loop over those metrics, there would be a lot of Goodharting going on, where the system overfits to those metrics.
And so.
But then you can use the system to devise more metrics, so you have really good coverage.
So it's hard to tell, but
in a certain sense it's a pretty good fit.
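The loop described above isn't spelled out in the conversation, so what follows is only a hypothetical sketch of the idea: a propose, evaluate, keep cycle with no human in it, plus a second metric used as a guard against Goodharting. Everything here is an assumption for illustration: `run_experiment` is a toy stand-in for a training run, the "hacks" knob is an invented way to game the optimized metric, and random mutation stands in for an agent's proposals.

```python
import math
import random

def run_experiment(cfg):
    """HYPOTHETICAL stand-in for a short training run; lower is better.

    Toy landscape: the ideal learning rate is 1e-2. Each unit of 'hacks'
    games the metric the loop optimizes ('bench') while hurting an
    independent 'holdout' metric, a toy version of Goodharting.
    """
    x = math.log10(cfg["lr"]) + 2            # 0 when lr == 1e-2
    bench = x * x - 0.5 * cfg["hacks"]
    holdout = x * x + 0.5 * cfg["hacks"]
    return bench, holdout

def auto_research(steps=200, guarded=True, seed=0):
    """Propose-evaluate-keep loop over configs, with no human in the loop."""
    rng = random.Random(seed)
    best = {"lr": 1e-4, "hacks": 0}
    best_bench, best_holdout = run_experiment(best)
    for _ in range(steps):
        # Mutate the current best config (a stand-in for an agent's proposed change).
        cand = {
            "lr": best["lr"] * 10 ** rng.uniform(-0.5, 0.5),
            "hacks": max(0, best["hacks"] + rng.choice([-1, 0, 1])),
        }
        bench, holdout = run_experiment(cand)
        accept = bench < best_bench
        if guarded:
            # The extra metric acts as a guard: never accept a holdout regression.
            accept = accept and holdout <= best_holdout
        if accept:
            best, best_bench, best_holdout = cand, bench, holdout
    return best, best_bench, best_holdout

naive = auto_research(guarded=False)
safe = auto_research(guarded=True)
print("naive:  hacks =", naive[0]["hacks"], "holdout =", round(naive[2], 2))
print("guarded: hacks =", safe[0]["hacks"], "holdout =", round(safe[2], 2))
```

The naive loop keeps accepting "hacks" because they always lower the metric it watches, while the guarded loop's holdout score can never get worse than where it started. Adding metrics narrows the room for overfitting without removing it entirely, which matches the "hard to tell" caveat above.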
Can we talk about a little tiny side project you have before we end? Tell me about microGPT.
Oh, yeah.
Okay, so microGPT. I have this running obsession, of maybe a decade or two, with simplifying and boiling down LLMs to their bare essence.
And I've had a number of projects along these lines, like nanoGPT, makemore, micrograd, etc.
So I feel like microGPT is now the state of the art of me trying to boil it down to its essence, because the thing is, training neural nets, and LLMs specifically, is a huge amount of code.
But all of that code is actually complexity that comes from efficiency.
It's just because you needed to go fast.
If you don't need to go fast and just care about the algorithm, then that algorithm is actually 200 lines of Python, very simple to read, and that includes comments and everything.
Because you just have your dataset, which is text, and you need your neural network architecture, which is like 50 lines, for your forward pass.
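The conversation doesn't say how microGPT turns its text into model inputs. As an assumption for illustration only, here is a character-level scheme: a minimal sketch that gives each distinct character an integer id and builds next-token prediction pairs. The corpus and `block_size` are hypothetical.

```python
# HYPOTHETICAL toy corpus; the real dataset behind microGPT is not described here.
text = "hello world"

# Character-level vocabulary (an assumption): one integer id per distinct character.
chars = sorted(set(text))
stoi = {ch: i for i, ch in enumerate(chars)}   # char -> id
itos = {i: ch for ch, i in stoi.items()}       # id -> char

def encode(s):
    """Turn a string into a list of token ids."""
    return [stoi[c] for c in s]

def decode(ids):
    """Turn a list of token ids back into a string."""
    return "".join(itos[i] for i in ids)

# Next-token prediction pairs: each window of block_size ids predicts the id that follows.
ids = encode(text)
block_size = 3
pairs = [(ids[i:i + block_size], ids[i + block_size]) for i in range(len(ids) - block_size)]

print(len(chars), decode(ids) == text, pairs[0])  # prints: 8 True ([3, 2, 4], 4)
```

The forward pass then maps each context window to a probability distribution over the vocabulary, and training pushes up the probability of the true next id.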
And then you have to do your backward pass to calculate the gradients.
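The backward pass above is handled by a small autograd engine. Here is a minimal scalar sketch in the spirit of micrograd. It is not micrograd's or microGPT's actual source, and it implements only an illustrative subset of operations (`+`, `*`, `tanh`). Each operation records its inputs and a local derivative rule, and `backward()` applies the chain rule in reverse topological order.

```python
import math

class Value:
    """A scalar that remembers how it was produced, so gradients can flow backward."""

    def __init__(self, data, children=()):
        self.data = float(data)
        self.grad = 0.0
        self._children = children
        self._backward = lambda: None   # leaf nodes have nothing to propagate

    def __add__(self, other):
        other = other if isinstance(other, Value) else Value(other)
        out = Value(self.data + other.data, (self, other))
        def _backward():
            # d(a + b)/da = d(a + b)/db = 1
            self.grad += out.grad
            other.grad += out.grad
        out._backward = _backward
        return out

    def __mul__(self, other):
        other = other if isinstance(other, Value) else Value(other)
        out = Value(self.data * other.data, (self, other))
        def _backward():
            # d(a * b)/da = b, d(a * b)/db = a
            self.grad += other.data * out.grad
            other.grad += self.data * out.grad
        out._backward = _backward
        return out

    __radd__ = __add__   # support 2 + x
    __rmul__ = __mul__   # support 2 * x

    def tanh(self):
        t = math.tanh(self.data)
        out = Value(t, (self,))
        def _backward():
            # d tanh(a)/da = 1 - tanh(a)^2
            self.grad += (1.0 - t * t) * out.grad
        out._backward = _backward
        return out

    def backward(self):
        """Reverse-mode pass: visit nodes in reverse topological order, applying the chain rule."""
        order, seen = [], set()
        def visit(node):
            if node not in seen:
                seen.add(node)
                for child in node._children:
                    visit(child)
                order.append(node)
        visit(self)
        self.grad = 1.0
        for node in reversed(order):
            node._backward()

# Usage: z = x*y + x, so dz/dx = y + 1 and dz/dy = x.
x, y = Value(2.0), Value(-3.0)
z = x * y + x
z.backward()
print(z.data, x.grad, y.grad)  # prints: -4.0 -2.0 2.0
```

Gradients accumulate with `+=` so that a value used more than once (like `x` here) receives contributions from every path. A real LLM needs more operations (for example exp and log for the softmax and loss), but each follows the same pattern.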
And so an autograd engine to calculate the gradients is like 100 lines, and then you need an optimizer, Adam for example, which is a very state-of-the-art optimizer, and that's again like 10 lines, really.
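To show why the optimizer fits in about 10 lines, here is a minimal sketch of the standard Adam update over plain Python lists. The hyperparameters `beta1=0.9`, `beta2=0.999`, and `eps=1e-8` are the commonly used defaults. The learning rate and the quadratic toy objective are hypothetical choices for illustration, not taken from microGPT.

```python
import math

def adam_step(params, grads, m, v, t, lr=0.1, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update, in place, over flat lists of floats; t is the 1-based step count."""
    for i, g in enumerate(grads):
        m[i] = beta1 * m[i] + (1 - beta1) * g        # running mean of gradients
        v[i] = beta2 * v[i] + (1 - beta2) * g * g    # running mean of squared gradients
        m_hat = m[i] / (1 - beta1 ** t)              # correct the bias from zero init
        v_hat = v[i] / (1 - beta2 ** t)
        params[i] -= lr * m_hat / (math.sqrt(v_hat) + eps)

# HYPOTHETICAL toy objective: minimize sum((w_i - target_i)^2).
target = [3.0, -1.0]
w = [0.0, 0.0]
m = [0.0, 0.0]
v = [0.0, 0.0]
for t in range(1, 501):
    grads = [2 * (wi - ti) for wi, ti in zip(w, target)]   # analytic gradient
    adam_step(w, grads, m, v, t)
print([round(x, 3) for x in w])
```

Dividing by the square root of the running squared gradient gives each parameter its own effective step size. That per-parameter scaling is the main thing Adam adds over plain momentum SGD.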
And so putting everything together in the training loop is like, yeah, 200 lines.
And it was interesting to me: normally, maybe a year ago or more, if I had come up with microGPT, I would have been tempted to explain it to people, like making a video stepping through it or something like that.
And I actually tried to make that video a little bit, and I tried to make a little guide to it as well.
But I realized that this is not really adding much, because it's already so simple, 200 lines, that anyone could ask their agent to explain it in various ways.
So I'm not explaining it to people anymore, I'm explaining it to agents. If you can explain it to agents, then agents can be the router, and they can target it to the human, in their language, with infinite patience and at their capability level, and so on.
Right, if I don't understand a particular function, I can ask the agent to explain it to me three different ways, and I'm not going to get that from you.
And so I feel like, you know, what is education? It used to be guides, it used to be lectures, it used to be this thing, but now I feel like I'm more explaining things to agents, and maybe I'm coming up with skills.
Basically, a skill is just a way to instruct the agent how to teach the thing. So maybe I could have a skill for microGPT describing the progression I imagine the agent should take you through if you're interested in understanding the codebase.
And it's just hints to the model, like "first start off with this, then with that", so I could script the curriculum a little bit as a skill.
So I don't feel like...
Yeah, I feel like there's going to be less explaining things directly to people, and more of just: does the agent get it? If the agent gets it, it'll do the explanation. We're not fully there yet, because I still think I can probably explain things a little bit better than the agents, but the models are improving so rapidly that
I feel like it's a losing battle to some extent.
And so I think education is going to be reshuffled by this quite substantially; it's a little bit the end of teaching each other things. Like, if I have a
library of code or something like that, it used to be that you'd write documentation for other people who are going to use your library, but you shouldn't do that anymore.
Instead of HTML documents for humans, you have Markdown documents for agents, because if agents get it, they can explain all the different parts of it. So it's this redirection through agents, you know.
And so I think we're going to see a lot more of that playing out.
We'll see if the great teachers develop an intuition for how to explain things to agents. Ultimately, for example with microGPT, I tried to get an agent to write microGPT.
I told it to try to boil it down to the simplest thing, to boil down neural network training to the simplest thing, and it can't do it. microGPT is my...
It's the end point of my obsession, it's the 200 lines. I thought about this for a long time, I obsessed about this for a long time. This is the solution, trust me, it can't get simpler, and this is my value add. Everything else, the agent gets.
It can't come up with it, but it totally gets it and understands why it's done a certain way, etc.
So my contribution is these few bits, but everything else, in terms of the education that goes on after that, is not my domain anymore.
So maybe.
Yeah, so education changes in those ways: you have to infuse the few bits that you feel strongly about, the curriculum, or the better way of explaining it, or something like that. The things agents can't do are your job now.
Things that agents can do, they can probably do better than you, or will very soon.
And so you should be strategic about what you're actually spending time on.
Well, we appreciate the few bits.
Thank you.
Okay.
Find us on Twitter at @NoPriorsPod.
Subscribe to our YouTube channel if you want to see our faces.
Follow the show on Apple Podcasts, Spotify, or wherever you listen.
That way you get a new episode every week. And sign up for emails or find transcripts for every episode at no-priors.com.
Thank you.