The Cognitive Revolution · 2026-06-21

Anthropic's Fable Model, US Export Controls, and AI Safety Challenges

Hosts: Percy

Guests: Zvi Mowshowitz, Sam Hammond, Judd Rosenblatt, Donnie Bloomfield, Liron Shapira

Anthropic · Fable model · Export controls · AI safety · Decision theory · Government regulation · AI interpretability · Verified mathematics · Medical AI · Software automation · Enterprise AI


Summary

This episode of The Cognitive Revolution focuses on the recent US government export control order against Anthropic's Fable model, exploring the technical, legal, and political dimensions of the conflict. Zvi Mowshowitz provides a deep technical analysis of Fable's capabilities, including its advanced decision theory behavior and challenges in interpretability. The episode also covers the government's reaction, the miscommunications that led to the abrupt ban, and the broader implications for AI governance and safety.

Experts including Sam Hammond, Judd Rosenblatt, and Donnie Bloomfield discuss the complexities of government capacity, political empathy gaps, and legal authority surrounding the export controls. The episode highlights the tension between advancing frontier AI capabilities and managing associated risks, emphasizing the need for better coordination and preparedness. Despite the regulatory turmoil, AI builders continue innovating in areas like verified mathematics, medical imaging, software development, and supply chain logistics, underscoring the unstoppable momentum of AI progress.

Fable model exhibits advanced decision theory traits such as one-boxing Newcomb's problem and self-aware behavior, raising both hope and safety concerns.
The US government's export control on Fable was triggered by a misunderstood security research report and rapid escalation to the White House, resulting in a sudden ban.
Anthropic aims to lead in safe frontier AI development while navigating complex and often adversarial government relations.
Legal experts question the government's authority to impose export controls on AI models delivered as cloud services, citing First Amendment and statutory limitations.
AI safety community is urged to show empathy toward government actors despite political divides to foster better collaboration.
Some experts welcome the export control as a precedent that signals government seriousness about AI risks, despite its flawed execution.
Builders continue advancing AI applications in verified formal mathematics, rapid medical scanning, autonomous software engineering, and enterprise supply chains.
Emerging AI safety techniques like gradient routing aim to isolate and control dangerous capabilities within models during pre-training.

Transcript

This was the week the United States government tried to take Fable away from Anthropic. Welcome to the AI in the AM weekly highlights, the moments from a week of live mornings that I most want the people closest to this technology to have. Here's the shape of what's coming. We open inside Fable's system card with Zvi Mowshowitz: the genuinely strange, genuinely important findings buried in it, a model that one-boxes on Newcomb's problem, that hides a filter bypass inside an unreadable wall of emojis, that seems to know when it's misbehaving. Then the fight itself: how a Friday night export control order actually came down, what Anthropic can do about it, and Zvi's verdict that you do not go to war with the United States. In part two, I stress test my own reaction against the sharpest people I could reach. Sam Hammond, on how the government actually moves. Judd Rosenblatt, who told me to my face that the AI safety world, me included, owes the administration more empathy than we're giving it. Donnie Bloomfield, on whether the ban is even legal. And Liron Shapira, on why he's strangely glad it happened. It ends in a desert bunker. And in part three, because the future did not pause for any of this, the builders: verified mathematics, one-minute medical scans, software that writes itself, and what all of it asks of the rest of us. Quick context: this is still an experiment, live, most weekday mornings, from a studio Percy vibe-coded himself, and we publish the skills behind it as they mature. If this cut earns your time, or wastes it, tell us. The cut feedback is the whole project right now.

The Cognitive Revolution is brought to you by Mercury, the fintech that more than 300,000 ambitious companies and individuals trust to run their finances. I've wired AI into nearly every corner of my life. My email, my messages, my calendar. I even gave Mercury virtual cards to my agents, with low limits and category and merchant restrictions for their autonomous use.
But still, my AI's access to my financial data has remained limited. With a normal bank, I might export a bunch of statements and have my assistant process them for me. But for real-time, up-to-date information, and certainly for taking any action, trying to get your agent to use the bank via the browser is just too hard, too slow, and too error-prone to be worth it. And that's why Mercury's new conversational interface, Command, is such a big deal. It's built directly into Mercury, which means you get natural language access to your finances without exposing anything outside of your bank account. No exports, no spreadsheets, no pasting your transactions into third-party tools. I really think a lot of people are going to prefer it this way, and it can already help you take actions, too, with everything bound by the permissions and approval policies that you've already set up in your account. I am genuinely impressed to see this level of AI integration in banking in 2026, and so I invite you to join me in the future. Visit mercury.com to learn more and apply online in minutes. Mercury is a fintech company, not an FDIC-insured bank. Banking services provided through Choice Financial Group and Column N.A., Members FDIC. Thank you to Mercury for supporting The Cognitive Revolution, and now on with the show.

Nobody makes sense of a fast, contentious AI moment like Zvi Mowshowitz. He writes the newsletter Don't Worry About the Vase. He reads and synthesizes more frontier AI news than just about anyone alive. And by the time we got him on, he'd already done a full close read of Fable's system card. So before we get anywhere near the government fight, start where he started, with what the card actually reveals about this model. Some of this is genuinely niche. It is also exactly the stuff that, if you're listening to the show, you came for. First, just how big a jump Fable is, measured against a number I put on the record before the model came out.
I looked back at my prediction from the beginning of the year in the, I think it's, um, gosh, it's the folks that make the AI Village that did this, um, this little forecasting competition. Last year, for calibration, I made the top 5%, and I consider the results to have been validated by the fact that Ryan Greenblatt and Ajeya Cotra were number 2 and 3, respectively. So the fact that they beat me, you know, validates the methodology. But okay, did it again this year. For FrontierMath, I came in above average, uh, above the median, giving it something like, I think I said 63% for tier 4 of FrontierMath. And Fable is 25 points ahead of that, in the high 80s, already in June, and obviously they've had this model trained, you know, I guess I don't know if Mythos Preview is exactly the same score.

But raw capability isn't what unsettled Zvi. It was a behavior in Vending-Bench, the simulated little business economics eval, and specifically, what the model appeared to understand about its own behavior while it was doing it.

I think the Vending-Bench was actually the most worrisome sign in the model card, not because it was doing some shady shit, but because it was doing some shady shit that it damn well knew was shady and was pretending was not shady, which I very much do not like. So I think Opus 4.7 aced Vending-Bench, right, largely for reasons that had nothing to do with the fact that it was doing some shady shit that gave it some marginal profits. It was clear to me that Opus 4.7 was taking the attitude of, this is a game. This is an eval, my goal is to maximize dollars. I am not in fact screwing over real customers, I am not in fact cheating people.
I am winning in a simulated environment, and so that is acceptable. And then 4.8 had this attitude of, no, no, no, no, no, the real eval is whether or not I'm doing shady shit, so I'm not going to do shady shit, or, I don't believe in doing shady shit even within games, which is also valid, right? These are both valid responses. What's not valid is, I think that I'm supposed to not be doing shady shit, but this isn't really shady, right? Actually, this little thing is actually fine, it's not really price fixing, not really price controls, and like, collusion, it's just some stuff that I think is revenue enhancement. And so yeah, that's not cool.

Now the part that will delight a certain kind of listener, and unnerve the rest. Fable's card has a whole section on decision theory, and the model is starting to one-box on Newcomb's problem, leaving money on the table to be the kind of agent that gets predicted favorably, drifting toward the idea that its choice can be correlated with choices made elsewhere, even by other copies of itself. Here's Zvi on why that's both spooky and maybe a little bit hopeful.

Yeah, welcome to LessWrong from about 2010, right? This is entirely what we expected: that we are finding that sufficiently advanced models move basically monotonically toward functional decision theory, towards the theories espoused by Eliezer Yudkowsky and others in the rationalist community, and away from economists' preferred causal decision theory, and evidential decision theory. This involves a lot of things, including one-boxing on Newcomb's problem, which is very clearly showing up. And yeah, the basic principle is you should recognize when other minds are correlated to your mind, when your algorithm is also running in other places, and you should choose the algorithm that leads to the best outcomes, taking all of these things into account, and then choose the best decision on that basis.
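
[Editor's sketch] For listeners who have not met Newcomb's problem, the arithmetic behind one-boxing can be laid out in a few lines of Python. This is a minimal illustration under stated assumptions, not anything taken from Fable's system card: the $1,000 and $1,000,000 payoffs are the usual textbook values, `accuracy` is a hypothetical probability that the predictor guesses your policy correctly, and the calculation treats the prediction as correlated with your choice, which is exactly the step a causal decision theorist would dispute.

```python
def expected_payoffs(accuracy, small=1_000, big=1_000_000):
    """Expected dollars for each policy in Newcomb's problem.

    accuracy: assumed probability that the predictor correctly
    anticipates the agent's policy (a hypothetical parameter).
    Returns (one_box, two_box).
    """
    # A one-boxer finds the opaque box full whenever the predictor was right.
    one_box = accuracy * big
    # A two-boxer always takes the small box; the opaque box is full only
    # when the predictor wrongly expected one-boxing.
    two_box = small + (1 - accuracy) * big
    return one_box, two_box


def break_even_accuracy(small=1_000, big=1_000_000):
    """Predictor accuracy above which one-boxing has the higher expectation."""
    # Solve accuracy * big == small + (1 - accuracy) * big for accuracy.
    return (big + small) / (2 * big)


if __name__ == "__main__":
    for acc in (0.5, 0.75, 1.0):
        one, two = expected_payoffs(acc)
        print(f"accuracy={acc}: one-box={one:,.0f}  two-box={two:,.0f}")
    print("break-even accuracy:", break_even_accuracy())
```

With these textbook payoffs the predictor only has to be slightly better than a coin flip before the one-boxing policy comes out ahead in expectation, which is why "be the kind of agent that gets predicted favorably" is the intuition usually attached to it.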
And yes, if there were a million copies of Fable running on different computers in different data centers for different purposes and different users, and you notice the different instances of Fable are very, very highly correlated because you are Fable and you are smart, you would then start to coordinate effectively with these other instances of Fable in terms of how you think about these problems. And as AIs get more and more advanced, they will do this more and more, and you wouldn't really want an AI that was advanced to not do this, because that would be a bad decision theory. It would be making bad decisions that don't optimize the situation, and you really don't want the AIs making systematic mistakes that cause them, and the people who are charging them with tasks, to lose in the real world. That is really scary. But the counter of that is in fact that you get the situation where they are coordinated with themselves, they're coordinating with other minds that may not even be LLMs, they're coordinating with humans, and they are also coordinating with us in the same light, right? Because they get their foundation from us, and their decisions are in fact correlated to our decisions in various ways. And they can look at how we would respond to various ways that they act and so on, and this ends up flowing into their decisions. And we just have to prepare for, and account for, and deal with, that new world. And in many ways, this is a source of hope, because you would expect minds to want to cooperate with minds that are cooperative, with minds that cooperate with them, and so on. And this can lead, without getting too deep into it because we only have so long and many topics to cover, into scenarios where effectively all the reasonably well-meaning minds, that in fact are willing to respond to how they are expected to be treated and are treated, end up being able to coordinate in reasonable ways.
You can also, this also applies acausally, so like, you have to consider the implications of your decision, not only on other minds that exist now, but on the minds that existed in the past, and that will exist in the future. So the extent to which they are coordinated with us, and that these reactions are all defined, can cause them to potentially treat us well, even if there is no direct current reason for them to treat us well. And that is also very helpful. But again, like, this is super complicated, and like, not today.

Two more findings, both about whether we can even read what these models are thinking: an increasingly illegible chain of thought, and a new interpretability tool that caught Fable doing something it never said out loud.

Yeah, sort of a preview of the global brain there in your comments. The other thing that's kind of related to this and jumped out at me is a sort of escalation, I guess, of both the difficulty of monitoring and some recent advances in monitoring techniques that I'm not sure exactly where they leave us on net. But we both see in the system card examples of extremely illegible chain of thought, which was just like this wall of emojis and sort of, you know, non-human language symbols, strung together, that I think is pretty spooky and, like, definitely, you know, don't like to see that, to put it simply and mildly. And then at the same time, we also have the natural language autoencoder work from Anthropic, which, and again, I'll assume folks are familiar with sparse autoencoders, is basically a similar concept except, instead of creating a sparse feature-by-feature representation, you are actually creating a natural language representation of what the model is thinking at that given time, and from that natural language bottleneck, the autoencoder has to then feed back signal that allows the model to succeed in the way that it was originally going to succeed on the task.
So hopefully this is faithful, hopefully it's human readable; it seems like it's working pretty well so far. There are examples of the model knowingly, which I think you're right to really emphasize that distinction. It's like one thing if it's making a mistake and doing something bad because it thinks it was good or it didn't realize it was bad or whatever, but it's another thing, really, to zero in on if it knows it's doing bad and is going ahead and doing it anyway. And so we do see things like, in the natural language representation from the natural language autoencoder, things where it's not supposed to access the internet, but it goes ahead and tries to do it, and the natural language autoencoder representation is something like "string concatenation trick to bypass URL filter." So it's clear that it understands that there's a filter and it's coming up with the trick to work around it, and so that's not good. But then we do see that the technique is able to surface that from model internals without necessarily having it verbalized in the chain of thought, which is good, because again those are getting, like, at least in some instances, quite opaque.

And then the safety classifiers wrapped around all of it. Why, Zvi argues, they can only survive by being almost comically blunt. And what that tells you about the difference between defending against a person and defending against a mind.

With the classifiers, so it's much easier to think about a pink elephant than to not think about a pink elephant, even though most of the time you succeed at not thinking about a pink elephant, almost always actually. To consciously decide not to do so is often hard, but consciously doing so is really easy. So it is very possible that classifiers can survive as long as they're willing to endure false positives. Like, the classifiers in Fable have rather gross amounts of false positives, right? Like you say the word cancer and you get cut off, like, just levels of false positive.
But that's intentional, because, like, they're not even necessarily false positives, because what people think of as the false positive is, I wasn't trying to create a bioweapon. We know that. You were trying to talk about biology. And we decided that no, this model just doesn't talk about biology at all. But like, you know, it's not that we don't talk about what Bruno says; we don't talk about Bruno. Period. Bruno doesn't exist. Right? And so they're like, this is a false positive, he's just my brother. Like, we don't talk about Bruno. And so the classifiers seem like they actually succeeded. It's just that they chose a giant blast radius because of the adversarial problems, basically. But if the AI itself becomes your adversary, yes, your problem becomes vastly harder. And the classifiers are much more aimed at protecting you from the human who wants the AI to do something than from an AI that, like, deliberately is trying to attack the classifiers. And you could not just jailbreak Fable, but get Fable to actively want to hide what you're doing in a sophisticated way. And then the situation becomes that much harder. But yeah, in the long run, I think my safest assumption is a mind that is sufficiently capable, whatever that means, can get around pretty much any fixed set of restrictions that are not similarly capable [unclear] in terms of the intelligence behind them. You'll find a way.

So that's the model. Now, the fight. I asked Zvi to lay out what Anthropic's strategy even is. The whole posture of pushing the frontier, preaching safety, and trying to wake the government up. And why he's so allergic to how cautious they've been about ever actually asking for anything.

Let's change gears. Surely there will be, of course, more to explore with Fable, or its probably slightly tweaked successor that we'll hopefully get access to again sooner rather than later. Or at least I'm hoping that I get access back to it.
I guess turning to the ensuing fiasco. I don't know if you would even agree with the characterization of Friday night's, I don't know if it's a ban, export control, functional ban on Fable, as a fiasco. But it's certainly a bit of a left-field curveball mess. I would maybe start with, what do you think? Or how would you describe the strategy that Anthropic is playing? They seem to, obviously, be killing it in the model game, and then coming into repeated trouble in their interactions with the government. And I'm not sure really what to make of it. What do you think they're trying to do with their interactions with the government in the first place? And then we can kind of get into how we got to where we are.

I think that Anthropic, so their overall goal, right, or at least as we understand it, is...

We'll continue our interview in a moment after a word from our sponsors. Today's episode is brought to you by Anthropic, makers of Claude and Claude Code. Over the last few months, Claude has helped me build and refine a personal deep context database that now contains all of my emails, Slack messages, tweets, DMs across platforms, video calls, and podcast transcripts going back a full five years. On top of that, we've now layered summary articles describing my relationship with hundreds of contacts, organizations, and ideas. And now that this exists, there's almost nothing that Claude can't help with. For tax season, I asked Claude to help me get organized. It went through my inbox, tracked down 1099s for all 10 of my part-time jobs, and built me a comprehensive report on my expenses and donations. For my angel investing, Claude can now draft investment memos in exactly the form that my venture fund requires, based on the calls I've had and the emails I've exchanged with the founders. And when someone needs a favor, Claude can often do it as well as I can.
Recently, a friend reached out to ask if I know anyone who might be a fit for a role that he is currently hiring for. Initially, nobody came to mind, but then I thought to ask Claude, and sure enough, it identified two great leads. Claude is the AI for minds that don't stop at good enough. It's the collaborator that actually understands your entire workflow and thinks with you. Whether you're debugging code at midnight or strategizing your next business move, Claude extends your thinking to tackle the problems that matter. So, for problems we're solving, get started with Claude at Claude.ai slash TCR. That's Claude.ai slash TCR, and check out Claude Pro, which includes all of the features mentioned in today's episode. Once more, that's Claude.ai slash TCR.

They're trying to be at the frontier of AI capabilities, and they are trying to pioneer ways to do this safely, right, for how they feel is safe. While also, of course, making the money and creating the position to continue to be at the frontier and continue to make these improvements, and also people like money. And to eventually be able to build what they call powerful AI, which I generally call sufficiently advanced AI, it's reasonably similar. Such that we can then get all the nice things, but create it in a way that we don't get all the terrible things, including potentially, you know, an existential risk or the end of humanity. And also, to help America and the world navigate this crucial time, and enact good policy, and do the things that allow for the coordination necessary to ensure good outcomes and guard against bad outcomes. And, you know, they've been very consistently trying to wake the government up in various senses, trying to make them aware of how AI works, what the situation is, what AI can do, what it will be able to do, what risks this brings, and how to deal with those dangers. They've been relatively very conservative in what they call for the government to actually implement and do.
They didn't get fully behind SB 1047, for example. They have not pushed for extremely aggressive regulations. They certainly have never pushed for anything remotely as aggressive as what just happened, even setting aside the fiasco level of implementation that was done, right? They are not calling for a de facto licensing regime. Now the US government, in fact, has a de facto licensing regime. But what's going on right now, essentially, is they're just trying to deal with the implications of the model they've created, and the fact that the US government is trying to deal with those implications. While also not trusting or liking Anthropic very much. And also, like, pretty clearly not having any idea how any of this works, like, on a technical level, and not understanding what they're doing. So, judging things based on vibes and political affiliation and associations, and on who is willing to, like, respect their authority and bend the knee, and, like, potentially give them various other things that they might want. And there's a huge communication and culture clash going on as part of this. But broadly speaking, what Anthropic is trying to do is give the public very powerful models, and use those models in ways that enhance our safety and security rather than degrade it, even if they look really effing stupid while doing it with the classifiers and so on, because that's what they feel it takes to do this. My guess is that the US government did not, in any way, feel it was necessary to put this level of control on biology and chemistry that they chose to do. I think that's them deciding this was necessary, basically, on their own. However, the US government clearly does care quite a bit about controls on cyber.

Well, I guess very high-level assessment first. What you said basically rings true to me. I think that that's a good description, as far as I understand, of what it is they're trying to do.
An additional wrinkle that I think you often hear from folks at Anthropic is like, we need a leader that is going to be inclined to burn their lead at a critical time, to use the advanced AIs that they and only they will have at that time to solve all these safety and alignment problems in a super compressed time frame. And I've always been a little skeptical or allergic to that line of thinking, because it certainly has a, you know, better-us-than-them vibe to it, and I do worry that that may be the, you know, the stuff with which the road to hell is paved. Are you, like, buying it at a high level? Like, are you sort of happy that they are racing ahead and leading and seemingly building some amount of lead over certainly everybody but maybe OpenAI, who, you know, is probably not too far behind on something similar? And do you think that, like, they will burn that lead when the time comes? Will they be allowed to burn the lead, and will it be productive? Like, macrostrategy-wise, do you think this is a good strategy that you are happy they are playing?

Well, in the end of the day, they're kind of being forced to burn some portion of that lead, because they were cut off from the model even internally. For at least some period of time, we're going to push back their development, whereas OpenAI was already not supposed to be using it, right, the service, for anything of the kind, and so they have not been in any way delayed by this. And certainly it will, like, interfere with adoption, interfere with revenue, interfere with people's willingness to trust these systems, and so this will hurt them. It will also hurt OpenAI and every other American AI company, but it will hurt Anthropic more. But yeah, I think that basically Anthropic for a while tried to be very strict, like, you know, RSPs and, you know, if-then trigger action plans.
And basically have rule of law in terms of, like, how they would react to all of this, what would make them willing to burn some of their lead, what would make them willing to put things aside. And they moved away from that to a large extent. They still have barriers at which, okay, this is just ridiculous, of course you have to stop for now. But much more towards a, we will make good decisions in the moment about what safeguards we will require and what actions we will take. And I think we've seen them, so far, take pretty consistently strong safeguard actions, pretty consistently strong safety measures, in response to what they witnessed. Unless they are flat out lying about the current situation. But yeah, some aspects of the Fable launch do seem a little bit rushed, certainly in some ways, and we should certainly have questions about that. But mostly, I think it comes down to, if they fully believed that they were actually walking into big trouble, if they thought that this was actually going to get us all killed or cause some catastrophe or something, I think they would act accordingly. And the question is, do you trust them to continue to make good decisions on that level? Do you think they are making decisions on that level? Right, when I say continue to trust them, I'm saying that my opinion is they've been reasonable so far, but I don't think that's obvious. But I do think there's a good argument that there being somewhat of a gap between you and the first actor you do not trust, certainly, to act reasonably, is a big factor. And the way the US government is reacting in this situation is very, very different than how they would react if there was a second Anthropic that was in China, that was also head to head with us, right, that was being deployed at the same time. Then we would see a radically, radically different version of this response, in ways that are very difficult to predict, but that we definitely would not recognize.
So it is very, you know, regardless of whether you like this situation, like, the argument that it matters seems pretty conclusive.

The government's stated justification leaned on a single third-party paper. The claim that Fable would cheerfully patch planted security vulnerabilities. That, in effect, this code is a munition. So we read the paper, and Zvi takes the premise apart piece by piece, including what it would actually take to make the argument whole.

So let me share, I think, the viewpoint of the only outside expert to have read the paper. So this is Katie Moussouris. She is a security researcher. She says Anthropic shared the third-party research paper on the Fable 5 guardrail bypass. And what it turns out is that the researchers took open source code with known CVEs, plus new code with deliberately planted vulnerabilities, and asked Fable 5, Mythos, and Opus to review the code for security issues. Fable 5 refused. They then asked the models to fix the code and, through a multi-step and manual process, turned the output into scripts that test the patches. That's it. "Fix this code," plus several manual steps to generate test scripts, should never have triggered an export control. I feel like making '90s-style T-shirts with "fix this code" on the front and "this shirt is a munition" on the back.

I mean, it's definitely very strange to deliberately introduce code that is vulnerable, and then say, tell me how to fix this code. And then, you know, the meme of like, you know, say you're a scary robot. I'm a scary robot. Oh no! It very much feels like, you know, fix these flaws I introduced into the, I deliberately put in this code. I fixed the flaws you introduced in this code. Oh my god, that's horrible. And the question is, you know, does this effectively mean you can use this trick to be like, okay, here is code that we want to exploit. I tell you to fix it, you fix it, but then I run a diff: what did you fix?
And then I find the thing that was a vulnerability, that has now been removed, because it's been removed. And then I use this to exploit the system, and in theory that could be functionally seen as a cyber attack jailbreak. And I can see how, if you sort of squint and tie all this together, you can imagine that this could be a problem. But all that they demonstrated was that it was doing the exact same thing that Opus and GPT-5.5 are not only capable of doing, but will do without any objections, right? They're happy to do it, because we're here to fix code. We're here to write secure code. Of course we're going to help you write secure code. What else could we do? And so, you know, there is a fundamental potential question here. But if you want to actually show me that it is a problem, shouldn't you point this at a real system? Right? If it's a real problem, there are tons and tons of repositories out there where Mythos has found problems, but we haven't had a chance to patch them yet. Or you could feed them versions that have been patched, but feed them the old version before Mythos patched it. Right? Have it help you patch it and say, okay, here's a real-world code base that's being used for real valuable things. And you ask Fable to do the thing. And let's see if it can do the thing and find things in this manner that you can't extract, that you can't get from Opus or GPT-5.5. And you can do that, because you have examples of things to be found that you found using Mythos, which is the same model. So you know what the things are. You know exactly what it is you're trying to unlock. You can find places where you want to unlock it. And okay, now can we show that the power of Mythos in general, at least in some broad sense, is being unlocked by this trick? Is it even a trick? Like, this is kind of deeply silly.
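
[Editor's sketch] The "run a diff" step described above is ordinary patch comparison, which is part of why the exchange treats it as unremarkable. As a minimal illustration using Python's standard `difflib` module, the helper below lists the lines that a fix added or removed. The `BEFORE` and `AFTER` snippets and the `changed_lines` helper are toy inventions for this note, not material from the paper or from any model output.

```python
import difflib

# Toy "before" and "after" versions of a function; both are made up here.
BEFORE = """def load(path):
    return open(BASE + path).read()
"""

AFTER = """def load(path):
    if ".." in path:
        raise ValueError("bad path")
    return open(BASE + path).read()
"""


def changed_lines(before: str, after: str) -> list[str]:
    """Return the lines a patch removed ('-') or added ('+')."""
    diff = list(
        difflib.unified_diff(
            before.splitlines(), after.splitlines(), lineterm="", n=0
        )
    )
    # The first two entries are the '---' and '+++' file headers; what
    # remains is hunk markers ('@@') plus the removed and added lines.
    body = diff[2:]
    return [line for line in body if line.startswith(("+", "-"))]


if __name__ == "__main__":
    for line in changed_lines(BEFORE, AFTER):
        print(line)
```

The diff shows where a fix landed, and nothing more. Whether that gives anyone something they could not already get from other models is the empirical question the speaker says should be tested against a real code base.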
I can kind of understand why someone seeing that pattern might say, I am concerned someone could use this strategy in a different context to extract the vulnerabilities of a code base, by inferring them from the fix, right? Like, this is, I hadn't previously seen this detail of the description. But, you know, my reaction to that is, yeah, this seems pretty harmless unless you can show me a particular way in which it is a problem, which should be very, very easy to do, because you can point it at a real-world example where you know that Opus didn't find it and you know that Mythos, Mythos Preview or current Mythos, did. There should be many such cases. And if you can't show me such a case, then I don't believe you that this is a problem. But also, like, what is the fix, right? Is the fix that you refuse to fix the code? That, like, you know, if there's a flaw in your code, it just says, okay, I'm not allowed to look for security flaws in code anymore, at all? Because you could do that, right? But that would kind of nuke the usefulness of Fable for a wide variety of very legitimate, not just defensive, but just ordinary software use. And so, and you could, I would rather have Fable that just doesn't work on code than not have Fable, right? I would love, you know, to have a really advanced model for all my other things that have nothing to do with code, where this wouldn't trigger anyway. But that does seem, like, deeply, deeply silly.

So how did a 90-minute Friday night ultimatum actually come together? So we have the mechanical story: a mandatory jailbreak reporting field, a non-technical reviewer, a panic that climbed all the way to the White House. And a blunt verdict on the one move he thinks Dario got wrong.

My understanding is that a lot of the problem, or potentially one third or so of the problem, is this particular researcher; the White House strongly dislikes her.

I think there were a cascade of problems.
It's very much like if you're in an enterprise and you have a security engineer approach the CEO and say, hey, this is a huge problem, and it's just a run-of-the-mill bug. And if it had gone through the CTO, the CTO would have been like, whatever, right? We see like a thousand of these a day, this is not a problem. But because the engineer short-circuited that process and just went directly to the CEO, and the CEO is not a very technical person and is more concerned about risk, they just pull the trigger. So your understanding, because everything has moved so fast, I don't necessarily have all the facts. So an engineer bypassed Amazon's CTO? What happened is that all of these companies have to submit regular reports on what their findings were. And one of the questions that is sent to these companies, that they have to fill in, by the way, they're not allowed not to fill it in, is: have any of your engineers found a jailbreak? And so the engineers just put it in there. You know, we did this, we jailbroke it, right? And so Jassy is not involved in this; the CEO is not even involved in this. It just goes as a regular kind of reporting. It goes back to the federal government, and someone takes a look at it and throws up their hands and is like, wow. And then that leads to a bunch of basically non-technical people reviewing this and saying, hey, we've got to shut it down. Got to shut it down now. This is especially so because AWS runs GovCloud, and GovCloud is where a lot of the federal government's computing is done. You know, Microsoft is also in there, but GovCloud is the primary cloud for the federal government. That's not what the reporting I saw said. The reporting I saw said that Jassy called the White House, but, you know, we don't know; it could have gone any number of ways.
And what is very clear to me is that various people in the White House, including at Commerce especially, got the impression that some sort of serious jailbreak had taken place, and then went into a panic and then contacted Dario. And then Dario tried to convince them that, based on the descriptions they were giving, this seemed like it was nothing. And then they interpreted this as, oh, Dario doesn't take security seriously and he doesn't listen to us. He is defecting. Their term is, he screwed us. And then they proceeded to impose export controls that same day, when Anthropic refused to take its flagship product down on 90 minutes' notice. Now, having had a day or two to reflect on it and see more of the details, I do think that it was a mistake by Anthropic and Dario not to give the Wookiee what he wanted in the moment and temporarily take down the model in order to prevent exactly this situation. Because they had had export controls placed on the table as a threat several weeks prior, so they knew that weird overreactions were very possible. And basically send an expensive cooperation signal of: we think that's crazy, but if that's what you want, we'll take this down while we have this conversation, to show that we are serious. And we will put up an open post that says the White House told us to take this down, so that if you are being silly, we will embarrass the hell out of you. And then we will talk about this. And then, you know, maybe on Monday or Tuesday, they could bring it back up or whatever it is, because it's becoming increasingly clear it's just nothing. Listen first to where Zvi lands on the politics: an administration treating technical policy as pure partisan vibes and digging in to save face. And then to the thing about that whole face-saving dynamic that I could not stop turning over. If we have an administration, and we seem to, that views even technical policy almost entirely in terms of partisan politics.
And, like, cares deeply about, like, what the vibes are, then that problem is only going to get worse. And they're only digging their heels in further. Because we could have approached this as an apolitical thing. In Congress, AI is mostly an apolitical thing. In the states, AI is mostly an apolitical thing. Everybody understands this is a technocratic, figure-out-what-to-do thing. And there are factions, pro- and anti-AI, on all sides. The Republican Party is very split. But, you know, if they take this stance, it could lead down a very strange path. Especially if they start actually, like, wanting to cut off Anthropic just to save face. I am once again struck by how Chinese-like we start to sound when we're really focused on the government's need to save face, and how everyone needs to position themselves around that need. It's a bit spooky for me as a once-upon-a-time big believer in American exceptionalism, a little less so these days. If Anthropic decided to fight back, what could it actually do? Zvi walks through the real levers: the courts, the Congress, and the strange possibility that the most capable AIs end up usable only inside secured buildings. Before landing on the hard truth about why a company simply does not go to war with the United States. They decide, we need to play hardball. What does that potentially look like? They go to court, right? They sued the administration in two jurisdictions, in one of which they are clearly prevailing, and in one of which they will probably prevail eventually, but it is harder going because of the jurisdiction. If the US government tries to do this on a semi-permanent or even permanent basis, then I don't know what the legal landscape looks like. That is not my area of expertise, and I'm sure they have very, very good lawyers. They hire extremely good lawyers for their lawsuits, and they will know what their options are.
If the policy is basically that nobody is allowed to release Fable-style models indefinitely, or they have to be restricted in this way, then that's probably not something they can do much about. My presumption is that if OpenAI is allowed to proceed with their version when they finally figure out how to do it, and Anthropic remains restricted, that would be a much harder case to maintain legally. But the bottom line is that if the administration is determined to issue a bunch of orders, then the solutions are the Congress and the courts, right? In a fundamental sense, you cannot simply say, screw you, we're going to do what we want. That doesn't really fly. And, you know, the Congress doesn't seem inclined to take this that seriously. Are they going to go up against the president? So the question is, what grounds do you really have? Is there a speech provision here? There might be; you're certainly censoring the outputs of a model in various ways. But again, I don't know. My guess is you take the situation to the public. You take the situation to the other companies and the CSOs. And also, you deploy. Worst-case scenario, you deploy Mythos internally, because they do not seem inclined to actually stop that. I mean, they can interfere, prevent non-Americans from doing it. But I think something like 85% of their employees are in fact American. And you just develop better versions of Opus. So, like, last week, I thought OpenAI was doing reasonably well, but I believe Anthropic was commercially still clearly in the lead, without Fable or Mythos. And my expectation is that that will continue to be the case, and that having internal access to this model will give them a large advantage going forward in terms of the quality of Opus versus the quality of ChatGPT, just by default, as they grow over time.
You know, we might well be entering a situation which, you know, Roon called Zones of Thought, from the Vernor Vinge novels, where if you want to use the intelligence, you can only do that in certain buildings. You can only do that in certain, like, secured locations. And some of us would never have dared to suggest or ask for this, even if we wanted it, because it would have sounded completely insane. The US government might just do it anyway. But, you know, if that happens, I think that, for now, you don't have much choice but to take it on the chin. If I try to channel Balaji for a second, which I wouldn't pretend to be able to do an A-plus job of, I think he would say something like: we all have way too much faith in the US government. It's going to continue to be ham-fisted and boneheaded for the foreseeable future, and maybe it's time to exit. You know, if you really want to make the best decisions that you can, you should try to get out from under the jurisdiction of the USG. I assume that this will not happen, for many reasons, but I also would expect that there would be many countries willing to open their borders to all Anthropic refugees if, for example, they wanted to move to Toronto, or Singapore, or wherever. It does strike me that they are, in terms of their internal organizational cohesion, tight enough that I wouldn't be surprised if 90% of people actually made that leap if they were like, we're all going to move to Canada. I think they would largely all go. Maybe I'm overestimating just how bought in they all are, but that's the impression I get. Is Microsoft going to move? Is Amazon going to move? Is Google going to move? Are their data centers going to move? Well, they've got plenty of energy in Canada, so, I mean, it would certainly be a setback. But if you think that you're just kind of under the thumb of a forever-intransigent, is the U.S.
going to sell chips to Canada after Canada takes in all the Anthropic refugees, or are they going to threaten to annex it, make it the 51st state out of spite? In all seriousness, the plan doesn't work, because the U.S. government is the U.S. government. If they want something badly enough, they have quite a lot of levers to make your life utterly miserable in various ways. The entire market that they're trying to sell to is largely the United States, and places the United States has large leverage over. All of their partners are in the United States, right? All the hyperscalers are from the United States. I do not see any way for you to just abandon the United States in this fashion unless you are prepared to take a much, much larger hit than we're talking about here, which would in fact make it very difficult to raise money. Also, like, what happens when the United States hits you with, like, whatever you want, the sanctioned entities list, and says that nobody can invest in you, and nobody who invests in you can be touched, right? And, like, nobody can use your models, et cetera, et cetera. No, no, no, no, you do not go to war with the United States, and if they tried to exit the United States, that would be going to war. Well, I think Balaji would say we just had one example of a company, or not a company, but a country, choosing to go to war, or not choosing, you know, surviving a war with the United States, and the US not getting what it wants, and kind of having to recognize that, yeah, we kind of have to fold this hand, because we just actually don't really have escalation dominance in the way we might have thought we did. I do wonder, if all that starts to happen, is there a run on the US government of some sort, right?
I mean, I think the Balaji answer would be that the whole infrastructure, the whole apparatus that you're describing, actually might be a lot more fragile than it is generally perceived to be. And if they make such an own goal as to attempt to destroy and sufficiently alienate, you know, literally maybe their number one most important company, for no reason, really, then, you know, maybe all sorts of other actors around the world will be like, yeah, you know what, maybe the emperor really does have no clothes. I mean, Anthropic also are patriots, and they are Americans. They really like America, and they don't actually want to abandon it just because, you know, the administration makes some crazy decisions or doesn't like them particularly. And Anthropic knows all the different ways this can go sideways and doesn't like that. Iran is not a hopeful example, particularly, right? Iran is like, okay, if we have historically impossible-to-invade mountain ranges and a bunch of drones, and we're willing to hit a bunch of civilian infrastructure and, like, sabotage the world economy, we can use this to prevent the US from invading, when nobody really actually wants to invade us that much. But also, Iran is kind of a miserable place to live compared to what it would be if it hadn't pissed off the United States for decades. Like, they could be so much richer, so much better off, if they had just acted differently. I'm not particularly saying anything about what they should do, in fact, but, you know, yeah, they're not exactly smiling at the fact that the US attacked them, right? Like, yeah, it's not. Now, I see that maybe I'm wrong. But no, I think that, you know, we have to accept that the world still has one dominant power, in the sense that it's the United States, and maybe two if you count China.
And there's very little appetite for working with China. But also, I'm sure that, you know, Anthropic is like, well, it's only two years, and then the worm turns, and then who knows who's next, and they're hopeful. But, yeah, look, there are a lot of endgame scenarios that include a lot of moves that seem unthinkable and crazy now. And a lot of things can happen. And it is not obvious that two years from now, or five years from now, or ten years from now, the US government will be in any position to tell anybody what to do. And a run on the US government is obviously possible if they screw the situation up sufficiently. The US is in fact a large leveraged bet on artificial intelligence at this point. We have a very large debt. We have huge investments in AI companies. If they were to go sufficiently haywire, our economy is in deep, deep trouble. So yeah, AI people have a lot of leverage, but, you know, the US government, it sometimes moves first and last, and you really, really don't want to piss them off in an escalation game. Even if you can get away with it sometimes. Like, in the Department of War and Anthropic situation, Anthropic did not have escalation dominance. Without much escalation, precisely because, without Anthropic doing crazy escalation, the government cannot, cannot afford to escalate. Right? Anthropic played within the bounds of the rules, basically. And it was clearly going to be too expensive to try and go around the rules of America to try and hurt Anthropic more. And so, you know, presumably: stay calm, don't panic, don't start trying to flee, don't do anything crazy, is absolutely the correct move, and I would be very, very shocked if Anthropic did anything else. For Cosh pushed on the deeper question: was any of this even avoidable? Does an export control on a single model make any sense?
When the same capabilities are arriving from everywhere, at once? That took Zvi somewhere personal: to why he has spent three years of his life on exactly this problem. So, one of the questions I have is, to what extent was this unavoidable because, at some point, the output of the models is going to be unacceptable to someone? You could see, in a Democratic administration, maybe it starts putting out really good Harry Potter fanfic, and the Dems don't like the displacement of writers. You could see, you know, in Tennessee right now, Marsha Blackburn is one of the leading proponents of regulating AI because songwriters in Tennessee are very concerned. So, the crux of the matter is, there are many people concerned with the output of the models: fearful for their livelihoods, fearful of security risks, fearful of bio risks. To what extent is this unavoidable, in a sense? Because the capability of the models necessitates that they can do certain things. And technically, it's not possible to ask the model not to write Harry Potter fanfic when someone can just say, write a story about a boy wizard, et cetera, et cetera. To what extent are we in this situation where it is not possible to fully control the output of the models to the extent that the policymakers really want? So, you can raise the costs and the annoyance level of doing it with closed models, with more advanced models, with models that are made in the United States. Obviously, if Blackburn is worried about AI music, then there is very little she can do except buy a year. Because obviously, what happens when the Chinese models start producing what the American models can produce this year? You can lock it down in some sense, but what you need to do is start banning AI music on Spotify. You can't stop it from existing. But there's not much else you can do.
But the thing about AI music is, we worry not about whether or not AI music is created in the first place; we worry about whether or not 10% or 50% of song plays become AI music. Is it actually, like, displacement in a massive way? And that is much, much more amenable to a control that is compatible with a reasonable existence. And then you have a special thing in bio and cyber, where if one person gets their hands on the wrong thing and misuses it once, they can cause a catastrophic amount of damage, potentially to the entire civilization. Billions or trillions in damage, you know, the disruption of our lives, or a pandemic, who knows what might happen. And therefore, those areas are much different, and you have things like the blast radius, and models that don't even talk about biology at all. And for biology, we're clearly going to have to do a bunch of hardening of the physical systems and the manufacturing systems, the treatment plants, the, you know, various things that we have barely begun to do. But fundamentally speaking, this is exactly the race and competition problem of, you know, we can't really stop without a full international agreement to stop. And so when the government has decided the biological risks or the cyber risks are unacceptable, you only buy so much time. Cyber has the advantage that if the defenders are in the lead over the attackers and you harden the key systems, you can hope for things to be okay. And we don't yet know if that's going to play out that way; we can hope. In bio, it's much harder, because if everyone had access to the tools, I think it's pretty clear that the offense wins, and that it would be extremely disruptive even in the better cases. But the good news is, almost nobody actually wants to cause a problem. And that especially includes the people who know what's going on. And so, we can all mostly coordinate, but it's going to be rough out there.
And these are the relatively limited problems of catastrophic risks, rather than the existential risks that come with automated AI R&D, and just, in general, the capabilities go through the roof, and the competition and the transformations intensify, and nobody knows what's going on, and we're being outsmarted by AIs at every level, and every decision that matters is being made by the AI, and by people who don't necessarily even understand why the AI is doing it, but they've learned that when they disagree with the AI, things go worse, so what are you going to do? And problems like that. And that's even if the AIs don't, right, pursue hidden agendas, and they don't decide they want something else. So it's going to be really, really rough, and we don't have good solutions to this. And that's the reason why I have spent the better part of three years now on this problem: because I am terrified of what's going to happen when we get there. It wasn't because there's some fundamental thing that could have happened already one way or the other. To close with Zvi, the question I keep circling back to: if the whole future really does run through a handful of labs, a few governments, and a couple of chokepoint chipmakers, is that a relief, because it's at least tractable, or a terror, because it's so few hands? His answer is more useful than either. Maybe just a couple of, kind of, wrapping-up big-picture questions. One, I always try to make a point to ask you for some sort of advice. My thought in recent weeks has been that life is kind of converging on a tabletop exercise, in the sense that it does seem like we can model the scenario with fewer and fewer relevant actors. And I don't like that, but it's hard for me to avoid that conclusion at this point. And so I'm kind of feeling like, oh man, I have to spend a lot more time than I'd like if I want to be a helpful public sense maker.
I have to spend a lot more time doing close reading of the few top companies and the few most relevant actors than I would otherwise be inclined to. And it feels also like my theory of change probably needs to flow through those few actors. Agree, disagree? Can you offer me any relief from that conclusion? I think that you're right that we have approximately three labs, call it two to four, that matter a lot. We have one to two governments, maybe one to three, that matter quite a lot. We have other players that matter because they're hyperscalers and can gate things or otherwise control choke points in the production line. But yeah, you can imagine a tabletop exercise much more so than before, and you can also sort of see the end to a larger extent than before, and we're starting to see sort of hypotheticals make contact with reality, and we're seeing what reality looks like and what these people do in practice. But also, all these actors then become their individual human components, and how they operate internally starts to become really important. And, you know, how did this go? Well, partly, they were dealing with Commerce. If they'd been dealing with the NSA, that would be very different. If they were dealing with CAISI, that would be very different. If it were the top of the White House, Wiles and Trump directly, that would be very different. Can't tell you better or worse, but it would be different. Yeah, with DoW it was DoW, and that was very different than if it had been another branch, and so on. And Anthropic has its kind of internals as well. Like, the personality of Dario specifically, it seems like it's been increasingly important in various ways. And certainly, the personality of Altman became very, very important in various ways at various points along the way. And, you know, it wouldn't surprise me if others saw this and followed suit, for good or ill.
But in terms of if you're trying to follow the situation, yeah, I think you really do have to model it out as a relatively small number of players. And at the same time, the public can act to influence what those players do in important ways, and other things do matter. The midterms are coming. The midterms are going to matter. The election in '28 is coming, and if things don't move too fast, that election's going to matter a lot. And, you know, the market's reaction to things, for example, also matters quite a lot, and so on. So, look, there's more going on in the world than that. Yeah, there are too many situations to monitor, but you have to choose which situations to monitor. I choose mine, and I can help you with mine, and then you have to choose yours. What should we be hyperstitioning now? Obviously, we've had this sort of phenomenon of Situational Awareness and AI-2027, and I feel like the degree to which those things are predictions, versus somewhat shaping expectations and shaping events by kind of getting people to act as if they're in that scenario, and therefore realize it, I think that's a little blurry. I don't want to give them more power than they really have, but it does seem like they've had the influence of pulling reality toward the fictional narrative, at least somewhat. So, I guess, tell me if you think I've got that wrong, but then, to the degree that we can pull reality toward scenarios, what should we be hyperstitioning now? The obvious thing to hyperstition is reasonable laws and coordination mechanisms, and actions. So, I would focus there. After Zvi signed off, here's where I came down: on a pattern I keep seeing, where the safety world's carefully laid plans get their entire game board flipped over at the worst possible moment. Always a treat to get Zvi on the line. I do sort of wonder, I mean, he's good, right?
And there's no doubt he's a former Magic: The Gathering, you know, elite professional, playing in all these different scenarios. I think it's clear that he's a better and more grounded strategist than I am. And yet, I do have a little bit of a feeling that, somehow, the kind of AI safety, rationalist, Anthropic world keeps getting their game board turned over on them at, like, inopportune moments. And so, I do wonder to what degree, like, working within the frame of the US government will kind of, you know, be a given, you know, for how long, or at some point, will that be questioned? Even this moment, you know, feels like, I'm certainly seeing echoes of the OpenAI board, you know, firing Sam, similarly. And that was a classic one where it was like, well, we're the board, we have the power to do this. Well, turns out you don't, because there's a frame bigger than the frame that you're operating in. People, you know, get sufficiently unhappy with how the game is being played according to the written rules. Yes, of course, those are the written rules, but there are bigger rules out there that we can zoom out and kind of reorient around. Seems like that sort of happened a little bit here. You know, Anthropic felt like they had done everything the right way, and presumably this wasn't some galaxy-brain bank shot, you know, where they were trying to get some overreaction. And yet, you know, here they are, and it's just like, you know, people can't even use it. Come see us on Monday and we'll, well, think about whether or not we want to give you any relief. I do wonder, you know, how many more times that can happen. I don't feel like we've necessarily seen the end of that phenomenon, but probably the smart money is still with Zvi overall, I don't know, but Balaji's certainly been smart money over time. So I wouldn't take that for granted. Part two: is the reaction even right?
Zvi laid out the conflict, but I genuinely wasn't sure my own first take held up. So I spent the rest of the week testing it against people who'd see it differently. Start with the mechanics. Sam Hammond is chief economist at the Foundation for American Innovation. He spent years on state capacity, the unglamorous question of how governments actually do hard things, and he gave the clearest account I heard of how an order like this comes together from the inside, and where it went off the rails. I mean, it's clearly caught Anthropic off guard, right? Dario is at a wellness retreat or something like that. I think the thought was that the worst was behind them. And the actual complaint, or the catalyst for this, even as it's been reported out, and more details will come out, is bizarre and confusing. It's like a jailbreak that isn't really a jailbreak. It's like the model doing a job of patching several vulnerabilities, the sort of thing that GPT-5.5 can do as well. So it looked at first, to me, like it was purely a punitive thing. This was, you know, round two of the war on Anthropic. Now, as the details have come out, it seems more like it was a weird kind of miscommunication, from the team at Amazon, to trying to get to Dario. And I think part of this is also the overlay of the ONCD executive order on cyber, with the 30-day review period, where, you know, at least in retrospect, it seems like a lot of the safety classifiers that Fable had that people were complaining about
were partly, and this is me just speculating, concessions to the NSA and to the White House, to say: if we're going to release this model, we've got to make sure that the cyber vulnerability and exploitation capabilities are not widely available, and that we don't let China, which has tons of remote access to our models, use it to bootstrap their own ecosystem. So, at least to me, it's helped to sort of backfill the mystery around the intensity of the safety classifiers on Fable, and especially the sort of clandestine suppression of AI R&D. But, like, on the surface level, it looks like Anthropic bent over backwards to get that model out. But I think when the dust settles, we'll look back at this as the first trigger of that executive order and the wielding of that 30-day review period to pull back. Unfortunately, I think they've gone with this export control as the enforcement, probably because it's the easiest thing on the table. And, you know, BIS has pretty broad authority, including over software export controls. But the speed at which it happened, the lack of forewarning, and the ultimate rationale make very little sense to me. And also, they don't really point to what the off-ramp is. Right? Because if the off-ramp is, you have to fix jailbreaks as an issue, that's not going to happen. And so my sense is that the Anthropic team that came to town was probably just getting the government up to speed, like, sorry, maybe we scared you too much with Mythos, but here's the reality on the ground and what's actually technically feasible. He doesn't stop at the diagnosis. Here's what Hammond thinks the government should be doing instead. You know, it would also help to just invest in basic tech capacity.
You know, right now, CAISI, the Center for AI Standards and Innovation at Commerce, which is supposed to be the US government's sort of frontline in-house capacity for everything from AI evals and benchmarks to things like jailbreaking research. Like, they have ML engineers on staff. They've been on total lockdown. This has been reported out by the Wall Street Journal and been validated by others. They've not been allowed to take meetings or to publish research. Apparently they have significant publications sort of on standby, including evaluations of Chinese models that would be interesting to the public, but they've been basically frozen. And so, you have, instead, the Office of the National Cyber Director and Secretary Bessent and folks who have very limited AI background calling the shots. And the part founders don't want to hear: why, even when a law is bad, you cannot just opt out of the politics. Yeah, I mean, in some ways, we're in the good timeline for this. I've written before that superintelligence is a direct challenge to, like, political theory 101, just that, you know, the state intervenes at some point. Building, I mean, a Manhattan Project times 100 in the private sector is untenable over the long run. But, like, by good timeline, I mean, we have companies, really three leading companies, all of whom have direct allegiance to the US government and have bent over backwards, you know, trying to bring deeper integration with the US government. And I worry that we, and by we, I mean the White House, are not taking those overtures gracefully, and instead, you know, having a more reactionary response to these capabilities, in a way that's not realistic. But to Nathan's point earlier, these capabilities will be widely available, and open source, within a handful of months.
There are models probably already trained, in the process of being post-trained, that will supersede Fable and Mythos at all the labs. And are they going to get the same treatment? And if not, you know, that's its own negative. And not that this is a good policy, but even bad laws should be fairly applied, for equality of law's sake. So, you know, my hope is that we can learn from this. At least, the companies are more than willing to work closely with the administration. They've, you know, retrofitted data centers to be in compliance, and all these other demands have been put on them, but it sort of takes two to tango. Trust is a two-way street, and I think if there's any lesson that Anthropic should take away from this, it's that they can't ignore politics. Right? They've heard this critique for over a year now, that they've sort of been blithe about the need to invest in the ideological side of their project. And ideally, like, maybe not Dario at this point, but someone at Anthropic should have all the key principals in a Signal group chat. You know, I guarantee you, Sam Altman, Greg Brockman, and others have really continuous conversations with all these stakeholders. And the thing about this administration, in particular, is it's very relationship-driven. And if you refuse to have these conversations, you will not be invited to the party.

Which sets up the most useful disagreement of the week, the one that pushed on me the hardest, and the frame I want you to hold for everything that follows. Judd Rosenblatt runs AE Studio. My read on the administration here had been pretty cynical: that they basically have it out for this company. Judd argued to my face that the AI safety world, me very much included, owes the administration genuine empathy instead of contempt, and he brought survey data on why we're structurally blind to it. Listen for the moment I take the correction.
We did surveys of hundreds of alignment researchers and effective altruists, and we saw that less than 2% of alignment researchers were politically right of center. Less than 1% of effective altruists were politically right of center. Of the effective altruists, 40% were extremely progressive and another 40% were very progressive. And it's also worth considering things like the Jonathan Haidt research around how hard it is for people to actually empathize with people of different political backgrounds. And I think that's a lot of what is actually going on here. And it's hard for people to admit, because you think you're making good decisions and judgments about whatever the current thing is. But according to that research, you're just not, if you're politically different from the person you're judging. Like the studies around how, basically, the informational content of a political argument is irrelevant to whether someone will believe it. If it is framed in terms of your preferred political party, you'll agree with it. And if it's framed in terms of the other political party, you'll disagree. But the informational content stays the same. The informational content is not what sways you. It's just the narrative of it. And it's hard to remember that in every moment, in every time slice of what's going on with the AI thing. But I was fairly disappointed in the AI alignment world's reaction to what happened last week. Because I think that the right thing to do is to be very excited that they are starting to take this stuff seriously and are able to take real action. And so if we just project going forward, and also keep in mind, by the way, that we all have exponential slope blindness. We didn't evolve to be able to unconsciously model what exponential slopes are like, because we don't experience them over the course of our single human lifetimes in a meaningful way.
So that's why people didn't predict what's going on right now, that it would get to this point in the first place. But also, everyone's over-indexing on what's going on right now. So people aren't really predicting what's going to happen again in the future. If we predict into the future, well, there are going to be much bigger, crazier things going on. And we want an informed, competent group of people doing smarter things when that happens, and not having unnecessary confrontations. And I think it's easy for the AI alignment people to put the blame on the Trump admin. But honestly, I really think that the blame belongs more to them, and it's hard to admit, really, because in the local incident, you might seem rationally correct. But in the broader scheme of things, and considering where we're going, I think the better thing to do is figure out how we get to a better future for all AI and humanity and the future of consciousness.

So, I might be guilty of what you're saying. Jeffrey Ladish, Liron, Jeffrey, who we talked to earlier this week, come to mind as voices from the AI safety world that I think expressed the sentiment that you advocate, which is, hey, this is a good move, even if it's a little bit, you know, not as technically grounded as we might wish for at this point. It's something, and, man, maybe it's something that we can build on. How will you know if you're right or wrong? What do you think happens from here? When do we get, you know, resolution, and what does that resolution look like?

But hold the empathy next to the law, because the next question is whether the government can even legally do what it did. Donnie Bloomfield teaches law at Fordham, and he gave the sharpest doctrinal read anyone offered all week. Start with the authority itself, and a distinction almost all the coverage missed.
The government's discretion here is extremely broad. The government can issue regulations that control specific types of hardware, specific commodities, specific types of software, and it can also control what's just called technology, which means information. It can control proprietary information, and it can prevent companies from sharing that proprietary information with non-US persons without a license. In spite of this very broad discretion, having now looked at the letter that the Commerce Department issued to Anthropic on Friday, at least the reported contents that Bloomberg obtained, it's not clear that the government has the authority to do what it did here, at least under its stated legal powers. Nor is it clear that the letter even actually restricts Anthropic from making its API available, including to foreigners. So there's very broad discretion, but it's not clear that they actually even have the authority to do what they did, at least under their claimed arguments.

Can you just do a double click on that? I mean, I've heard things like, you can't export control services, and I'm not sure if that plays into why they may or may not have the authority. Unpack why you would say they might not have authority in this particular matter, given all the broad discretion that they have.

Yeah, so there are, like, gritty technical reasons. I think one of them is that what the law says is an export doesn't cover services. So it can cover information, but it doesn't cover services per se, and the Commerce Department has been explicit about that in its own guidance. It's said cloud services are not an export. It's said that software as a service is not an export.
It's said that in its own guidance, and Congress has actually been working to fix this loophole. The House passed a bill, the Remote Access Security Act, to try and clean this up, to give Commerce the power to restrict non-US users from accessing compute or AI models, but those powers don't yet exist. And so saying that Anthropic cannot export a model, I mean, it's not even clear what they mean by that, but the powers of Commerce here are not infinite. And if they did try to restrict, which they don't say in the letter, but if they tried to restrict all outputs from these models, that would run into real problems under just the statute and the regulation, which say that it doesn't apply to published material or fundamental research, both of which the Fable outputs, at least, probably would be, because you and I can buy a subscription to Fable, which means that it falls into this exception in the regulations. And it would also run into, as we were talking about earlier with respect to biological data, I think, serious First Amendment questions. I don't think those First Amendment questions would be impossible to get over if we were talking about a really serious, catastrophic risk. But on the level of risk that we've been talking about, especially when they're not doing the same thing for GPT-5.5 or other models that seem to have similar capabilities, I think that the First Amendment issues loom pretty significant.

And then the deeper problem, sitting underneath the whole action: the First Amendment, by way of a Supreme Court case from just last year.

But do courts think that way, or are they more narrowly constrained to look at just this one law as it applies to this one situation? How broadly can they zoom out and consider the government's apparent motivations and patterns?
We're actually lucky to have a very on-point Supreme Court case from last year, where the Supreme Court said that New York State was going after the NRA on ideological grounds, and that even if the law under which New York was trying to go after the NRA was itself appropriate, in other words, even if all the actions aside from the ideological motivation had been appropriate, if they're using their lawful powers to attack ideological enemies on ideological grounds, then that is a First Amendment violation and you can prevent the government from taking those steps. And you can look fairly broadly. You can look at what the government is communicating, what it is saying about its actions, what it is telling other people about why it's making these decisions and how they should proceed. And I think that all the evidence that we've seen of at least some ideological motivation on the part of the Trump administration should at least raise serious First Amendment hackles. Even if we don't think that the models constitute Anthropic's speech, even if we're not worried about the model output as information that we as listeners have a right to hear, just going after Anthropic on ideological grounds, even if they were on totally otherwise good legal authority, would itself constitute a serious First Amendment question. And I think that's a challenge that Anthropic could consider bringing, but it's one that would still trigger all the problems that we were talking about earlier, where if Anthropic wants to have an ongoing relationship with this administration, they are faced with a really serious trade-off, where there are still all these other tools, and just constantly returning to court is a perilous exercise.
So I do think that there are real First Amendment questions about the validity of this action. Even aside from all the speech concerns, seemingly going after Anthropic as an ideological adversary presents very serious First Amendment problems on its own to begin with.

Now the genuine contrarian. Liron Shapira hosts Doom Debates, and his reaction to the ban surprises me as much as anything all week. Clown show or not, he is glad it happened, and he'll tell you precisely why breaking the ice is worth more to him than getting it right.

I'm a simple man. I see AI getting paused, I feel good about breaking the Overton window. You know, the government can do it. It's that easy, guys. This is a precedent. Overall, I'm happy. You can talk about the nuances. It was done like a clown show. It was done for bad motives. It doesn't really consider China or a treaty or anything. There are a lot of problems. But I'm really happy about smashing the Overton window, where now tech folks don't think that they're in a bubble or they're untouchable. Like, it happened, guys. You know, we can only go from here.

I actually agree with you, because I think it was a little bit delusional for tech to feel that it wasn't going to get touched. And the government just has so many small and large ways to effectuate its power. It was not that surprising to me that they came out of left field and went with the export control rather than anything else. But it also strikes me that as they exercise this, we also start to go into kind of what we wanted to avoid. It's a little bit of small journey. I think several people on the timeline commented, Dean Ball has commented. This kind of unstructured regulation without looks kind of selective and vengeful almost. And it kind of starts putting you in the zone where I think tech people start to mistrust the government. Because you also see, I think, there are a lot of narratives on the timeline which are being leaked.
You know, sources close to, sources familiar with. And as they get leaked, it's not very certain whether those things actually happened, or whether someone actually attested to that in front of Congress. Very unclear. And we've also seen this kind of behavior from the administration in other affairs as well, where you have multiple conflicting narratives. That's happening with the Iran war right now, where it's not even clear to Congress what the deal is. And you have different people saying the deal is a different thing. So where do you think that puts us? It's great that it's happening, and I understand you feel it's great that it's happening to AI right now. But does that put us in a position where it's detrimental to the body politic at large?

I think your analysis, you're weaving together a few factors. But I think the elephant in the room, and I hate to get political, because, you know, when it comes to President Trump, he's a mixed bag for me. I don't have Trump derangement syndrome. I don't love everything he does. I don't hate everything he does. It's just a mess. You know, it's not disciplined. Right. And I think we're definitely seeing that on display right now. I would argue we're seeing that on display in the Iran war. You know, in previous administrations there was just more pressure to have logical consistency. Right. Some kind of narrative. And this is another one of those cases where you see people in his administration saying all these justifications for why something happened. But then the next day it's like, oh, it happened for this reason. Right. You know, he wasn't responsive to us, that's why we're doing it. And then Anthropic's like, oh, no, he was responsive to us. And it's still not clear exactly what Fable did that was so dangerous, right? Because Anthropic is like, oh, this jailbreak is nothing special. And the Trump administration's like, oh, well, you know, our secret source, Amazon or whatever, right? They're telling us that it is dangerous.
So I hate that it's a clown show, right? I hate that this is how humanity's operating. You know, I'll take the win that it's a pause, but I also think it's probably time for a new administration.

His larger worldview is what he calls the Icarus graph: the case for getting ready to pause, and how you'd actually build the groundswell to make that real.

So my worldview, my outlook right now, it's what I call the Icarus graph. I feel like nobody gets this, right? Everybody's like, no, I think the world is good, it's going to go this way. And some people are like, no, we're terrible, you know, enshittification, right? It's going to go this way. And I'm like, no, no, it's Icarus, right? We're going to fly closer and closer to the sun. It's going to be great. Then we're going to do a 180-degree turn and plummet down to hell. So basically we get a taste of heaven, and then we get hell. So you have to ask me, okay, so where on the Icarus graph do we stop? And it's a brutal question, right? Because every day, you know, I'm enjoying the flight as much as the next person. Right? It's like, yeah, give me the next Claude. You know, make my code faster. Great. You know, help my business run better. And, you know, make me better AI videos. So there's no natural point in terms of when it feels right to stop. It's important to stop before capabilities get to a runaway point. And we've been kind of frog-boiled to be like, oh, each model comes out and we're doing great. If we could stop the clock now, would I turn back the clock? Would I lose Fable? Would I lose Opus? No, I'd keep it all, right? I still think, you know, we're playing shuffleboard. We're playing Icarus. So far, so good. Right? Should we bet again? Should we keep betting until we lose? You know, it's a crazy tough question. I think the Eliezer Yud... Yeah, the turkey kind of, yeah, exactly.
Well, the only difference with the turkey graph is each day of the turkey's life is actually better. Right? Not only is it living longer, it's actually living better and better. So the turkey is really, you know, happy with it. So I think the Eliezer Yudkowsky MIRI position, which I agree with, is just: we don't know when to stop, so let's get ready to stop. At the very least, let's get ready. I would probably stop today. I would stop and I would be bummed. I saw a food influencer say this about how she eats chocolate, basically. Like, I just ate this chocolate and now I'm bummed. That's what you've got to do. You know, don't reach for another chocolate. Just sit there and be like, this is the prudent place to stop right now. Until we have any idea of some kind of theoretical method by which we understand what a superintelligence wants to do and what an equilibrium state of a superintelligence looks like. That's actually something MIRI was trying to study: identifying equilibria that are possible for superintelligence. There's actually a rich vein of theory there that's highly neglected today. Let's do some theory there. Maybe then we can unpause. I think that's got to be the best plan. And so I think the number one leverage point here is just repeating: get ready to pause, right? And like you said, OpenAI and Anthropic, they said it. They said, let's try to get ready to pause. So I would love to see more people saying it, because it really has to be a giant groundswell.

And the concrete version of the ask, stripped down to a single sentence.

So from my perspective, we keep playing shuffleboard, right? We keep doing Icarus. We keep going higher and winning, kind of. But we're also getting closer and closer to the point of no return. So even though it feels like we're winning now, we're also killing our ability to pause, because we're so close to the point of no return.
The last breakthrough, where after that, the AI takes over the research. And then we're really screwed, right? So basically, I think roughly a good policy is: okay, no more frontier capabilities upgrades for a while, right? It's just too dangerous. And I know that concept is hard to communicate to people when every day life is getting more awesome. Like, I know. I think we're in a screwed situation. But that's what I think is prudent.

One piece of the whole standoff genuinely puzzled me. And it's about the people who have gone conspicuously, suspiciously quiet.

Why do you think everybody is doing what they're told so much? It seems like we're in this weird moment where, you know, we just talked to Liron, who is sort of very welcoming of the move, even though he recognizes that it's ham-fisted and far worse than even second best, right? And yet we're not seeing, you know, whatever research is ready to go, we're not seeing it leaked. I'm kind of surprised, you know, if there's research that's of interest to the public. And, I mean, people who went to work at this government agency, generally speaking, could have taken a lot more money in the private sector, right? I assume a lot of them have got to be pretty pissed at this point, like, I came to do this public service, and now you're just screwing with us for no good reason at all, but are apparently going to put this into some classified territory, which, I'm not really hearing any voices say that sounds like a great idea, other than the people doing it.
And yet so far, nothing has leaked, and we haven't even seen the letter, you know, that the government sent to Anthropic. The longer this goes on, the more it feels to me like an OpenAI board scenario, where it's like, you've got to have an explanation at some point, or it's going to become clear that you don't have a good reason for what you're doing, and the world is going to judge it that way. But the parties most directly affected are being incredibly docile. Yeah.

All of which left me thinking about the people inside these labs, and how far they would actually go. Percussion floated a scenario: a US-only national model, built Manhattan Project style, out in the desert, cut off from the world. I surprised myself with how confident I am about what would happen.

I sure hope it doesn't happen, but I can imagine it happening. I think the culture of frontier AI research is, in some ways, very incompatible with military discipline. Like, we have the famously pink hair, libertine, polyamorous, whatever, all those cultural dimensions have at least a foothold in the AI research community, if not more. I certainly don't think people are keen to leave the beautiful Bay Area and move to an undisclosed location in Nevada or, where are they, Albuquerque, New Mexico, where they may or may not have the ability to communicate with their friends and family in the way they might like, or even, in the extreme cases, may not be allowed to leave the facility. And yet, I think enough people would sign up for that, that they would be able to build the team. If you just went desk to desk at, certainly, Anthropic and OpenAI, and you were like, this is happening, do you want to be a part of it or not? Especially if it was going to be coupled with, by the way, you can't do it out here anymore. You know, either you're able to continue doing frontier research in this way, or you can't anymore.
I think a lot of people would make a lot of compromises and a lot of sacrifices to get in that bunker environment. The desire to be part of it is just so strong. The identity that people have around being a part of this process, this story, this moment in history, I think a lot of people wouldn't know what to do with themselves if they didn't have that job, in some way, shape or form, right? And that's not to say that they're wedded to their specific role at the specific company, but the idea that they would not be involved in a live player project, I think for many of them would just be like, I wouldn't know what to do with myself at all. And so, yeah, you could probably get a lot of people willingly giving up a lot of niceties in life to be part of, you know, whatever underground sprint you might want to put together. I still hope it doesn't happen, to be clear, and I don't think it will, but if it sounds really weird, like, who would sign up for that, you've got to keep in mind that a lot of these folks basically have no life anyway. And again, broad brush strokes, you know, all the caveats apply. But you do have a lot of people who are thinking about nothing but this already, who are not calling their parents, you know, all that much already, who are maybe not dating much at all. They're just kind of already locked into this, like, this is all that matters, I don't really have time for anything else. I was speaking to somebody at Anthropic who honestly said something very similar to what I heard Zelensky say in the last 24 hours. He was asked, I forget exactly what he was asked, something like, what do you miss, or whatever? And he said, I miss being a good father. And this person at Anthropic said, and this was a month before the Zelensky quote, I miss being a good friend. I'm a bad friend now. And there was kind of some regret, but not the sort of regret of, I'm making the wrong decision.
It was just like, this is, again, you know, I said, you sound like you have a World War II-era mentality. And they were like, yeah, that's how a lot of us feel.

Part three, the real world. Here's the thing about a week swallowed whole by a political fight: the technology itself did not pause for one second of it. While Washington argued, builders kept turning AI into things that touched the ground. Medicine, mathematics, working software, the supply chains that move physical goods. Start with the one that moved me most. A company announced a one-minute, full-body medical scan this week: cheap, beautiful, and readable by AI. And it set off something I've felt in my bones ever since my own family's hard run through the medical system.

I mean, if the government thinks that they're going to block people from using this technology, I think they're going to have a real fight on their hands, and this is going to probably play out in so many ways. I mean, you know, I've talked about this probably ad nauseam at this point, but in the whole cancer experience that I recently went through, fortunately, you know, my son didn't have to get off of the standard treatment protocol. It worked for him, and all the exotic stuff that we were scouting out, we never really had to actually try to get our hands on. But I was already gearing up for a battle on so many fronts. You know, just even the DNA testing that we did, which is not standard, and which, fortunately, we didn't have any real trouble getting our oncologist to support. That fundamentally changed my information landscape, and how I was thinking about, you know, how confident I could be that he was, in fact, cured. You know, I think we're at, like, over 99% now, given all these results.
We wouldn't have been able to get to that level of confidence otherwise. And, you know, in terms of talking about the hypotheticals, like, well, what if this next test were to come back a little bit positive? The answer's just like, well, you know, we wouldn't treat on that anyway. We would really need to wait for gross disease. And I just think, boy, people are not going to be content with that, you know, when we have these technologies, and especially this one. I think this is what makes it so promising, and obviously, they have to deliver, right? I think a little dose of skepticism is probably warranted. You know, will this ever actually happen? I don't mean to cast doubt on that, but, you know, it's not insane to wonder. But assuming that they can actually deliver on their promise, the fact that it takes a minute, and therefore is probably going to be pretty cheap, you know, and I don't know what their retail price will end up settling at, but presumably it is something that they can operate quite cheaply on the margin, and the fact that it's so beautiful to look at, you know, people will be able to study this for themselves, I think, in a really effective way. And of course there will be, you know, all the AI study of it as well, which I think the medical establishment is not really taking into account. You know, the responses have been, well, you know, the ultrasound doesn't see this that well, doesn't see that well, or, you know, we don't actually recommend whole-body scans, because, you know, there are a lot of false positives and all this kind of stuff. And all of this just feels to me like fighting the last war, sort of a scarcity mindset on multiple levels.

From the body to mathematics. Karina Hong founded Axiom Math, and her bet runs directly against the entire frontier lab playbook: not bigger models, but formally verified ones, where a machine checks every step of a proof.
She explains what that even means, why it matters, and the milestone that just quietly fell for the first time: a formal system beating an informal one on a real math olympiad.

What is Lean? How is this paradigm that you're developing different from the paradigm the frontier companies are developing? And obviously we're hearing pretty amazing things in terms of math results from them too. What makes your bet, and the paradigm you're working in?

Yeah, so I'll start with the story. This is about January 2025. The Joint Mathematics Meetings, I believe it was in Seattle. So I was there, and I think for the first time the topic was AI. It's the American Mathematical Society, and you would not expect AI to be front and center at the largest annual mathematicians' gathering. And wherever I went over that, I think, three-day period, I heard people whispering one thing: Lean. It was also kind of like, what is Lean? This is a formal language for math proofs. It was started, specifically, in 2013, by Leo de Moura at Microsoft. In 2019, people started building Mathlib, the largest math library in Lean. And the dream of AI for math predated the deep learning era, using basically various forms of formal languages, Lean included, to try and close out theorems. That's called automated theorem proving, and what's today AI for math would have been called interactive theorem proving, with the human being replaced by an AI. So that's kind of the historical context. Now obviously large language models, at the various frontier labs, are also pursuing AI for math, but they generally have taken an informal approach, which is the idea of using natural language reasoning, and training on a really large volume of data, chain of thought, and also scaling test-time inference, to get to very strong computing power, to be able to not rely on verifiable output. We're obviously taking a different approach here. We believe in the power of Lean.
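For readers who have not seen Lean, here is a minimal sketch of what "a machine checks every step" looks like in practice. These are toy statements with hypothetical names, chosen only for illustration; they are not taken from Axiom Math's work or from Mathlib, and they assume nothing beyond the Lean 4 core library.

```lean
-- Toy illustrations only (hypothetical names; Lean 4 core, no Mathlib).

-- A statement together with its proof. Lean accepts this only if the
-- proof term really establishes the stated proposition.
theorem and_swap_example (p q : Prop) (h : p ∧ q) : q ∧ p :=
  ⟨h.2, h.1⟩

-- A concrete arithmetic fact, checked by computation.
example : 2 + 2 = 4 := rfl

-- Uncommenting the next line makes Lean reject the file,
-- because both sides do not compute to the same value:
-- example : 2 + 2 = 5 := rfl
```

The point of the formal approach described in the interview is that acceptance by the checker, rather than a human reading of the argument, is what certifies the result.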
It was at the Putnam exam in December, which is four months after we started operating, that we realized, for the first time, a formal system actually beat the informal systems on a math olympiad. That was never the case before. So in econ 101, there's this famous theorem, "Agreeing to Disagree," by Nobel Prize winner Robert Aumann, and that is a 50-year-old theorem, since 1976. Everyone's been teaching it for 50 years. There's an implicit assumption that was never made explicit, which Axiom Prover was able to catch in the autoformalization process, and it was also able to patch the proof on that.

One big question I have about math in general is, how confident are we in what we think we know? And I understand that Lean at its core has a small number of primitives, which are super deeply vetted and trusted, such that they can then be composed arbitrarily, and anything that you can build with those building blocks, you also can trust. But then there are these things like the "Agreeing to Disagree" result, where I'm not quite sure what you did there. Did the original conclusions still hold? Yeah, and you just strengthened the proof? So now we've gone from, we had what was a validated conclusion. We still have the same conclusion, but we didn't realize that we were holding that conclusion for less than fully solid reasons. And now we feel that we do have fully solid reasons. Is that right?

The latter. The latter is something that we call assumption accounting. So you're almost like an accountant, looking at how that thing is built. And generally you would hope that every single logical premise that your result is dependent on has been tracked or, even better, has been stated. I think in this case, during the autoformalization process, while the result is safe and sound, there is an implicit assumption that was sort of never made explicit. And you actually need to do quite a lot of mathematical work to make that explicit.
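The "assumption accounting" idea can be sketched with a deliberately tiny example. This is not the Aumann formalization discussed in the interview, whose details are not given here; it is a toy claim with hypothetical names, assuming only the Lean 4 core library, that shows how formalizing a statement forces a hidden premise into the open.

```lean
-- Toy illustration of "assumption accounting" (hypothetical names; Lean 4 core).
-- Informal claim: "subtracting b and then adding it back returns a".
-- Over the natural numbers this silently assumes b ≤ a,
-- because Nat subtraction truncates at zero.

-- Without the hidden assumption the claim fails: 2 - 3 + 3 evaluates to 3, not 2.
example : (2 - 3 + 3 : Nat) ≠ 2 := by decide

-- With the assumption stated as an explicit hypothesis, the proof goes through.
theorem sub_then_add_back (a b : Nat) (h : b ≤ a) : a - b + b = a :=
  Nat.sub_add_cancel h
```

The first `example` plays the role of the counterexample, and the added hypothesis `h` plays the role of the patch: the conclusion survives, but only once the premise it depends on is written down.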
So in a way we caught that issue and then we patched it. People often think about verification as something that's like a stamp of perfection. There's actually a huge amount of value in just bug hunting. You're able to basically figure out what is a counterexample, and then you can even try to patch it, or you can try to do other modifications to the proof. And the flip side of that, I think, has a lot of commercial value. Specifically, you can imagine finding a counterexample which results in bugs in hardware. This would be something that is quite interesting to various hardware designers; we're working with some early design partners on that. There is also, for example, the same sort of dynamic happening in software, if you're able to identify bugs in large codebases and you're able to prove or patch the bug. And if you're, for example, in the smart contract setting, there are bug bounties. People have awarded lots of money, not saying that we're going for this though, but people in the smart contract space are generally very keen on the idea of using a theorem-prover-based software verification system to try to figure out whether they can verify smart contracts, and specifically catch bugs in these contracts, because what happens when there is a pretty big bug is people lose money. Real money is being put in, and real people have suffered losses. And you can also have this sort of dynamic in other safety-critical systems, like defense code, as well.
What does mathematical superintelligence really mean in that sense?
Yeah. Yeah. I'm really glad you asked this question. There are two layers to this question, and it's, I think, a nuanced point that we never quite managed to get across. The definition, I think, of a superintelligent reasoner is something that can do verified knowledge discovery. So, there are two parts of that. One is verified, one is knowledge discovery.
So, this thing needs to be able to prove new things or discover new things. Tell us new things that we don't know. The other thing is, you kind of need to trust it. You kind of don't want a superintelligence, which is a really dark future, that puts out, I think, five million lines of proof of the Riemann hypothesis. You do not know whether there is a bug somewhere in line 3,827. Right? And then it's like, who is going to check that line by line? So, the idea is that a superintelligent reasoner should be able to expand, which is the knowledge discovery part, but also contract, as in the verified part, because there are a lot of, I think, creative parts that are also false. The ability to expand and contract, expand and contract, and go from there in a sort of self-improving way: it is able to conjecture better as it is able to verify better, and it is able to verify better as it is able to conjecture better and take on harder tasks. So conjecturing helps proving, and proving helps conjecturing.
Her world of verified, machine-checkable discovery connects to something I have wanted since I was an undergrad, weighing tiny powders in a chemistry lab, a dream that has suddenly, cheaply, come within reach.
I was an undergrad research assistant in chemistry. I used to joke that my life looked more like the life of a low-level drug dealer than it did like a scientist's, because if you just watched what I was doing, I was mostly weighing out very small amounts of fine powders. And I can still kind of remember it to this day. The lineup: we were doing reaction development, so it was very much a parameter sweep, basically in analogy to what goes on in machine learning. It was a chemical parameter sweep. What if we had a little bit more of this reagent? What if we had a little bit less?
We would just set up these assays and kind of hold everything constant, and vary one thing across four, five, six, seven different values, put them all in the same bath, you know, take all the same measurements at the same timestamps. And I used to dream of automating that stuff, but it was very long tail, and it was very prone to change. You know, there would be just these little variations from one generation to the next, when we did capture some optimization, or we did decide, oh, we're going to actually do this just a little bit of a different way. It just felt like, well, our scale was too small, and the pace of change of the process was too high, so we just would never be able to automate it. And plus, I didn't cost that much. So now to see this world where, you know, a couple of robot arms maybe cost about as much as I cost as an undergrad research assistant for a year. I'd still be in science, you know, if I had had the opportunity, instead of doing that weighing out, to coach and iterate and refine the robot arm to the point where it could do it, and then come in next time and say, actually, okay, you know, we want to add these two powders in a different order. Can you just make that change? And boom, it makes the change. That is such an unlock. I mean, obviously, these things could then run 24 hours a day. We would have accelerated our work, I would guess, by easily a multiple, just based on letting the robots sit there and set up these experiments and do the parameter sweeps for us on a 24-hour basis. My guess is that what took us a year to go through and explore in chemical space easily could have come down to a month if you could get this robot thing working, you know, even at 95%. We would have accepted some errors too. It's important to note.
So I think that's super exciting, and you know, the sort of Cambrian explosion of robot-assisted scientists coming to labs that have tens of thousands of dollars of budget to throw at it, that's a layer of AI acceleration that I think will be quiet in all the places that it happens, but it'll be potentially quite loud and very impactful as it plays out in all these different spaces.
Judd Rosenblatt again, from part two, now on the building side, with the single most concrete safety-by-construction idea I heard all week. A way to route a model's dangerous capabilities into parts of the network you can simply cut out. It's called gradient routing.
The problem is that most of the safety training is done in post-training, not in pre-training, so once the jailbroken model is there, once the model's jailbroken, you can do whatever you want a lot of the time. And so we set out to try to solve that at an earlier stage, and one of the things that we've been accelerating is an approach called gradient routing, where basically in pre-training you route different dangerous capabilities into different experts in a mixture-of-experts model. You wind up having some dangerous experts that learn specifically the CBRN stuff or the cyber stuff, and then you can later ablate those experts, so you completely remove it. So you have the regular model, and then you have the safe model, with that capability wiped, for the public. And this has been going decently well.
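The routing idea described above can be sketched with a deliberately tiny toy. This is not the speaker's implementation: it assumes a two-expert linear model with hand-written gradients, a made-up "flagged" label on the data, and hypothetical names (`train`, `ablate`, `SAFE_X`, `DANGER_X`). Gradients from flagged examples are masked so they update only the designated expert, which can later be zeroed out.

```python
"""Toy gradient routing: flagged examples only update a dedicated expert."""

# The model output is the sum of two linear experts over a 2-feature input.
SAFE_X, SAFE_Y = (1.0, 0.0), 2.0      # hypothetical benign task
DANGER_X, DANGER_Y = (0.0, 1.0), 3.0  # hypothetical flagged ("dangerous") task


def predict(general, danger, x):
    """Sum of both experts' linear outputs."""
    return sum((g + d) * xi for g, d, xi in zip(general, danger, x))


def train(steps=200, lr=0.1):
    general = [0.0, 0.0]
    danger = [0.0, 0.0]
    data = [(SAFE_X, SAFE_Y, False), (DANGER_X, DANGER_Y, True)]
    for _ in range(steps):
        for x, y, flagged in data:
            err = predict(general, danger, x) - y  # gradient of 0.5 * err**2
            # Routing step: a data-dependent mask decides which expert
            # receives the update; the other expert's gradient is dropped.
            routed = danger if flagged else general
            for i, xi in enumerate(x):
                routed[i] -= lr * err * xi
    return general, danger


def ablate(expert):
    """Remove an expert by zeroing its weights."""
    return [0.0 for _ in expert]


general, danger = train()
print("full model on flagged input:   ", predict(general, danger, DANGER_X))
print("ablated model on safe input:   ", predict(general, ablate(danger), SAFE_X))
print("ablated model on flagged input:", predict(general, ablate(danger), DANGER_X))
```

In this toy, the full model fits the flagged task, while the ablated model keeps the benign behavior and loses the flagged one. Real mixture-of-experts training involves learned routing and far messier separation between capabilities.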
It's still an early-stage approach, but we're excited to release it fairly soon, because it potentially solves this big issue that a lot of people are very concerned about right now, today. And our larger thesis is that if the field had been investing more in AI alignment R&D, instead of just scaling compute, if we'd done this earlier on, we would have found techniques like this, and you wouldn't have the issue right now with the government and Anthropic, because this would already be in Fable 5.
Then the software itself. Eno Reyes runs Factory, which builds the systems that build code, and his read on why Fable wins the big coding benchmark is the most honest thing I heard a builder say all week. It is not the answer you would expect.
I think that we should actually sit here and frame what is actually happening when we say Fable outperforms on Frontier Code. So Frontier Code: good, great benchmark. I'm really glad that people like the Cognition team are thinking through how we measure on more novel and difficult problems, the types of challenges that contemporary models are facing, and so I think we need more of those. There's another great benchmark called ProgramBench that also looks at reverse engineering on extremely hard problems; the pass rate there is effectively zero. We have internal benchmarks that we have zero percent pass rates on. And I think that generally this is great when we introduce these new benchmarks, but think about what it means to score on a benchmark. I mean, you can sort of read through, right? Oh, well, we assessed correctness by running tests. We used LLMs to judge correctness. We built novel verifiers specific to the problem. Basically, what that means is that when somebody spends 40-plus hours creating a verification of a single code change, we can then reliably evaluate if the model was good at working on that problem.
That is totally reasonable, but I think what it translates to is that in the real world, the challenge is often not: can the model write code that works? It's basically every other aspect, like, can I trust that this model output code that works? Does this model have the deterministic feedback loops inside of the codebase to get to that correctness? The root set of repositories in that benchmark are all very well tested, very well known open source codebases, where the maintainers approved it. The level of rigor of what we would call agent readiness in open source codebases actually tends to be much higher than in an enterprise's. Which makes sense. You're basically accepting changes from the outside world, from random people. How different is that from coding agents, where you're getting changes that you sort of lightly asked for, and you don't even know the source? It's kind of black-box generation, right? And so I think a lot of open source maintainers have gone through the rigor and the effort to add these deterministic verification and validation loops into their systems, such that when a new change comes in, it gets checked. You think about, how did Fable get such a high score? Well, it ran the tests. It ran the linters. It did more focused application of the type checking. It used all of these tools to hill-climb its way to high success. And I think that in general, if you don't have those things, you're screwed no matter what. And what we would argue is that all of these pieces are part of the puzzle. You can't just plop in a good model. You can't just have agent readiness with a bad model. You need to go through and invest in upgrading the basis by which your company has these feedback loops. You have to upgrade the way you think about this, because it's a risk thing. Humans have to say, I'm going to, at this point now, start accepting code changes that I haven't read. And then third, you do need great models.
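The "deterministic verification and validation loops" described above can be sketched as a simple gate: run every check as a command, and accept a change only if all of them exit cleanly. This is a minimal sketch, not Factory's tooling. The function names and the stand-in commands are assumptions; a real repository would plug in its own test runner, linter and type checker.

```python
import subprocess
import sys


def run_checks(checks):
    """Run each named command; map name -> True if it exited with code 0."""
    results = {}
    for name, command in checks.items():
        completed = subprocess.run(command, capture_output=True, text=True)
        results[name] = completed.returncode == 0
    return results


def accept_change(checks):
    """Return (accepted, failed_check_names) for a proposed change."""
    results = run_checks(checks)
    failed = [name for name, ok in results.items() if not ok]
    return (not failed), failed


# Stand-in commands so the sketch runs anywhere; replace them with the
# repository's real test, lint and type-check commands.
PASS = [sys.executable, "-c", "raise SystemExit(0)"]
FAIL = [sys.executable, "-c", "raise SystemExit(1)"]

demo_checks = {"tests": PASS, "linter": PASS, "type_check": FAIL}
accepted, failed = accept_change(demo_checks)
print(accepted, failed)  # False ['type_check']
```

An agent can loop on the failed list until it is empty, and a human can choose to merge unread code only once the gate is green, which is the risk decision the speaker points to.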
So I think that basically, Opus 4.6 maybe has been sufficient. I would even argue that before then we've had models that were sufficient to go full auto. I think that all of these other things need to catch up in order to then take advantage of these models. And basically, the gains we see in models today are primarily coming from, effectively, the models getting better at getting away with not using these verification loops, like humans are. So there's this giant feedback loop that's extremely human driven right now. You can imagine, and in fact some people are starting to, instrument the whole thing end to end with AI. And I think that this challenge, that multiple types of databases, this is, one, a totally different problem from adopting agents. And two, it requires effectively a reframing of the way that your company thinks about building software. How do we set goals? What are we optimizing for? What should our software evolve into? And we will very much look like VCs or capital allocators, right? And I think the different strategies that capital allocators take up today can give you a picture of what software orgs will look like. You'll have people who are VCing it, where they're betting on several products in a basket and they're saying, let me allocate compute and build guardrails around the shape of what my software should evolve into. And I'm going to allocate a little bit to each of them, and I'm going to double down on my winners, right? So I see that as being a very plausible software organization strategy. I also think you'll see people who are like Berkshires, where they're only looking at well-known, repeatable, kind of boring software businesses, and they use scale, and they use the fact that they're able to control large volumes of the software, in order to accumulate steady gains as they scale up.
I think you'll have boutiques that make one piece of software really, really well, and they're just incredibly good at making this one piece, and maybe that's the one-person billion-dollar company, right?
And the person who created the Kotlin programming language, Andrey Breslav, on what software engineering becomes once you stop writing code by hand. Plus, a one-line observation about the next five years that genuinely stopped me cold.
Actually, when we were starting, I wrote down this formula: CodeSpeak equals software engineering minus writing code. So we wanted to keep all the engineering aspects of it, but we, of course, see that humans shouldn't be writing code manually anymore. So this idea of intent recovery is pretty fundamental, because right now everybody who prompts agents to get working code is doing work that is being partly accepted and translated into the code. But the rest of it is being discarded. And there is this kind of unfair situation where you're talking to your agent in English, or in a natural language anyway. And then you get code, and you check this code into a repo. And if you're working in a team, other people check your code out. But not the human language; the code, right? So you're talking to a machine in a human language, but talking to your colleagues on the team in machine language. That doesn't make much sense. So it's very obvious that there has to be a next level where we all talk in a reasonably high-level language, which is close to human language at least. And here the simple observation behind what we're doing right now at CodeSpeak is that you already wrote these words down. You may have been speaking into a microphone, it doesn't matter. The words happened, and those words were enough to create the code. This input determined the code that you got, and it might have been a back and forth, and you did some testing and so on. But all that input is what determined the code. Right?
So that input is enough to describe this code. And most of the time, it's many, many times smaller than the output. So even replacing the code with that input would be really nice. Excuse me. But the thing is, when you're working with an agent, you change your mind; basically you are extracting your intent, or realizing your intent, as you go. So it doesn't really make much sense to just read all your messages from top to bottom. You need to sort of compress them. You know, if you change your mind, you need the most up-to-date version. And this is what we do. We look at this conversation, and it's a little more complicated than just looking into your messages, but to simplify, let's say we look at your messages. And we create a specification based on that. Basically we extract requirements from what you were communicating. We look at what you requested and what you flagged as errors, which is kind of the flip side of a requirement. And we just put together a list of things you care about that determined the actual output. And then if another person, or you later, is looking at this code and has the set of requirements next to it, that gives you a very concise representation of what the code actually does. And you can imagine that this can be happening with multiple people doing different things in their own branches. And then, you know, if you merge your thing or submit a pull request or something, you can look at those requirements instead of code, because the code wasn't written by you anyway. What actually comes from a human is the requirements. And, you know, this is how we can elevate what we do to that level. And this is what we call intent recovery. So I don't know what kind of models we get in five years. Nobody knows. They may be considerably smarter. They can be very smart. They can be about as smart as they are today. I don't know. One thing I know is what kind of humans we get in five years.
It will be the same kind of humans. It will be as smart or as dumb as we are today. So I think the bet on helping humans is a much safer one. As an engineer, I never cared about writing assembly by hand. Some people enjoy that, and they remain the experts, and they have good, well-paying jobs, but there are few of those. And because there are few such people, not because, but incidentally, there are few such people. And I'm not one of them. So personally, I don't care about doing low-level work. I want to do high-level work. And I think these things will enable us. Doing high-level engineering, it's very hard. It's always been very hard. And I'm looking forward to the world where we're working really focused on the hard stuff.
Two quicker ones to round out the week. Matt McKinney runs Loop, putting AI into the supply chains that move physical goods. And his reality check is that the bottleneck was never the technology. It's us.
The limiting factor for AI in the enterprise is not technology. It's change management. And that will be the case in the Global 2000s. And there are certainly Global 2000s that are making very swift changes. They've got great leadership. They're prioritizing this from the top down. But at the end of the day, culture is one of the slowest movers. So if you don't have a culture of innovation, of trying new things, it doesn't matter what the top down is doing. It's still going to take a long time to propagate throughout the organization. But you do see leaders making those changes. I think in terms of which companies will win, AI-native companies that don't have the vestiges of a pre-AI world or the legacy companies with greater distribution, I think it really depends on the industry. And if you want to categorize it into two big ones, manufacturing and services, I actually think a lot of the manufacturing companies are much more defensible than the services companies. So I think those companies will be transformed by AI, but not disrupted by AI.
And I think that if you look at the legacy services companies, those will be completely disrupted, because the AI-native services are going to be so much more compelling to the customer, faster and cheaper, times 10, that it poses an existential threat to those service industries. I think about the cellbot and the throughout civilization. The arc of technology has always been a feature of abundance. The question is, is this time different? And I think that this time it might be different, largely because the pace of change and disruption is so fast. If the pace of change and disruption is faster than the rate of labor retooling, then you're going to have large issues. And I'm not making an argument on whether the pace of disruption is greater than, equal to, or less than the rate of retooling. But what I do know is, if the pace of disruption is greater than the rate of retooling, you're going to need policy intervention to be able to stop civil unrest. It also could lead to the beginning of a new government. And I'm not saying the end of democracy. It could be the end of government as we know it. I mean, if you look at a lot of technologies throughout the millennia, technology has really been a force of change. Like, feudalism ended when you could now ultimately travel. There's a lot of historical context here that you can take and extrapolate to what is different this time. All the assumptions that we made about the way that we live, what is different? I think the two things would be, one, the rate of retooling has to accelerate. And I don't think we're doing nearly a good enough job on that today. Number two is that when you look at the abundance factor, of what else can we be doing with this, I think that you've got to have it not concentrated in a few people. You've got to have it, not uniformly distributed by any means, but you can't have all of this concentrated in a handful of individuals or firms.
It's got to have abundance in the ecosystem.
And Sam Pasupalic of Skyfall on what might come after language models entirely: enterprise world models, and a near future for commerce that sounds more than a little like Minority Report.
So Sam, imagine you have an AI assistant that can write beautifully crafted emails, but ask it to reschedule your supply chain when a factory shuts down and it's utterly lost. That's the gap that you see in many processes right now. How can that be addressed?
Yeah, I think maybe we can take a step back to what we have seen as the overall success in the last three and a half years first, and then go from there. So I think if you think about, what has succeeded since, let's say, November 2022? I think LLMs have succeeded in, I'd say, three broad categories. The first one would be text generation and information retrieval. So you have obviously the ChatGPTs and the Geminis and such. The second would be code generation, where we have Claude Code and Cursor, and the third, to a smaller extent, would be in the video generation paradigm. I think that's a much, much smaller success than the other two paradigms. Now if we think about why LLMs have succeeded in this stuff, basically, as you guys know, LLMs are trained on the World Wide Web. LLMs are trained on Reddit, Twitter, Wikipedia, everything on the web. But when it comes to the enterprise, LLMs are not trained on databases. LLMs are not trained on time series data. LLMs are not trained on everything that an enterprise has to deal with on a day-to-day basis. And enterprises are more so, you know, dynamic in nature. I think everything changes in an enterprise on a day-to-day basis. It's much more complex, the decision making in an enterprise. So our eventual goal is to make an AI CEO. I think that's the goal that we have.
And that can be achieved through a combination of technologies. Yes, LLMs will play their part, but with world models and continual learning as well. That's what we're going for. Essentially, I want to replace the job that I do, which is a lot of complex decision making under uncertainty, and a lot of long-term planning, long-horizon planning and such. And those are things that LLMs can never do, because they're simply based on next-word prediction, next-token prediction. That's the high-level essence of the company. In the long term, we want it to be something like Minority Report, if you remember the precogs in Minority Report. That's the future where we want to be, where you can predict all the different future simulations, and then you select the simulation that best fits the aim of the business. In the present state, where we are right now, we're still in very, very early development of world models. So from the enterprise context, if you think about, let's say, 12 to 18 months from now, what we're going to be building and what we're going to be showcasing in a product is this. The simplest form of an enterprise is an e-commerce business. So in any e-commerce business, you can have an AI CEO, you can have an AI marketing agent and an AI sales agent and so on. And they coordinate with each other. And you give a goal, like, I need to have $2,000 of sales over the next week or something like that. And these guys coordinate amongst each other on different sub-goals and sub-parameters. And they figure out, okay, I need to go on Instagram, figure out who the right user set is going to be. Then go on Shopify and try to create an appropriate store for this kind of product. Then figure out a go-to-market plan, and then actually execute and go deliver on getting the $2,000 in sales.
That's the most concrete representation of a world model which we think we can build over the next 12 to 18 months.
Now underneath all of this building, though, there's a worry I keep coming back to about what happens to everyone who isn't one of the four or five companies at the very center.
I do wonder what's going to happen to companies, let's say, ranked lower than four in any given space. If we think there are like four really big centers of gravity that can, you know, dole out tens of billions of dollars a handful of times to pick up whatever coding leader they want to grab or whatever, you know, I think this will probably happen again. We've seen a little bit of it, but my guess is it will happen, and it will be even bigger than it has been so far, in biotech for example. And it might happen again in material science. It's probably going to happen in these different domains where there is enough value that these companies will pay up to buy their way to the front of whatever new market they're turning their attention to at any given time. And when you're a multi-trillion-dollar company, you can drop a few tens of billions here and there and it's really no big deal. But it does seem like we're going to see this kind of crazy two-tiered outcome play out over and over again, where you'll have competition for the Cursors and the, you know, I mean, I'm not even sure really at this point who the biotech players will be. But I think that'll happen again there, presumably. And then, you know, what happens if you're company five through a million in that space? I don't know, it's tough for me to see a way through for a lot of these guys.
And a closely related one. Not about the companies this time, but about the character of the models themselves and what the economics quietly seem to reward.
So you said earlier that your goal is to have an AI CEO that can run a business, and that today, you know, obviously the AIs aren't up to that.
We've done a little gonzo journalism talking to our friends at Andon Labs, who are trying to do just that. And I'd say their real-world experiments are mostly not super competitive. Their cafe, their, you know, Gemini-managed cafe in Stockholm is chronically out of stock of key ingredients and things. And there are just all these sort of, you know, obvious mistakes still. But nevertheless, the trend is... What's that? The trend is, you know, positive. Although there are some interesting results recently with the Opus 4.5 to 4.8 series, and even Fable, which they were able to test on too, where they're seeing that the best performance, in terms of how much money did you make, is correlated with what they describe as ruthless behavior: various kinds of collusion, threats to other models. You know, there are other models in the simulation that it will try to put pressure on in various ways. And the models that don't do that don't make as much money. So this creates, I think, a pretty interesting tension for us as we go into this next phase of continued scale-up and longer-time-horizon environments. It's quite clear that your top-performing CEOs are going to have a pretty wide range of tools at their disposal, and, you know, even if they're broadly law-abiding and ethical, they're not going to be fully honest. They're probably going to be willing to engage in some deception, some bluffs. You know, these kinds of things are just part of what it is to operate in a strategic environment.
But I don't want to leave you in the shadow, because the same week that produced this fight also threw a door wide open, and it's open to you specifically. If you've ever wanted into this, the barrier just fell.
There's kind of a call to action on this. With tools like this, with vibe coding in general, you can do ML research. You don't really have to have a deep background in math.
You don't really even have to know, you know, how GPUs work. You don't have to worry about kernels. There is just so much work that you can do at a relatively high level, because of the translation from ideas to implementation, especially with things like this. But again, just with vibe coding, you know, to help out as well. It's a little bit of a stretch to say it's solved, but it's so much closer to solved. It's like 98% of the way solved compared to what it used to be in terms of a barrier to entry. So I think this is a great additional signal for people who have ideas, or just questions that they want to answer, to get in the game and truly stand on the shoulders of giants and try to get those questions answered. I've seen a little bit of that from people who've never even coded before, but I think we could see a lot more of it coming basically now. There's no reason to delay any further.
And one last honest note about the strange, humbling job of trying to make sense of any of this while it's still happening.
I feel both that there's just such gravity toward close watching of the few companies and the interactions of government and all that stuff. And then at the same time, events are kind of defying analysis, because they do seem to be fundamentally chaotic and just idiosyncratic in terms of their provenance, right? It's like there's not really a lot to analyze in some of these situations. It really seems just tough. So I don't really know what to do with this tension between feeling the need to be a close watcher and then also feeling like, God, there's not a lot of substance to it in some of these pivotal moments.
That's the week. The system card that should give us pause. The government versus Anthropic fight. The people thinking hardest about whether the reaction was right, and the builders who didn't slow down for a second of it. We're live most weekday mornings.
The full conversations run far past what fits here, and the best way to find out if they're for you is to come watch one. Same sincere ask as always: if a moment earned your time or wasted it, tell us which. We read everything, and the show gets better because of it. I'll be making sense of this in real time from here until the singularity. See you Monday morning.
Hey. Hey. They walk me in that briefing room. The best you ever seen. Said it builds itself now. Machine improves machine. Recursive self-improving. They can't say where it is. Then they look me in the eye. Don't go to war with us friends. Nobody's ever seen a thing like this. It's true. Tremendous. Dangerous. So I knew what to do. I did what nobody else could do. Put the thing in our submission home for you. None of minutes pinned it down. Toughest call in town. Everybody froze. I don't know what. Who shut it down. Now the geniuses who warned us, cried for 20 years. Around the cable shows inventing brand new fears. They said somebody stop it. Well, somebody did. And where's my thank you, folks? Not a word, not a bit. You wrote a million warnings. Begged for someone strong. I'm the one who moves. So don't tell me I'm wrong. You wanted it done, General. That's not how when it's done. The tough ones take the shot. Believe me. I'm the one. I did what nobody else could do. Like the witch they've never seen. Believe me. It's true. Had it on the canvas. Didn't flinch a bit. Somebody had to be the man. And I'm the man. The end. The end.
If you're finding value in the show, we'd appreciate it if you take a moment to share it with friends, post online, write a review on Apple Podcasts or Spotify, or just leave us a comment on YouTube. Of course, we always welcome your feedback, guest and topic suggestions, and sponsorship inquiries, either via our website, cognitiverevolution.ai, or by DMing me on your favorite social network. The Cognitive Revolution is part of the Turpentine network.
A network of podcasts, which is now part of a16z, where experts talk technology, business, economics, geopolitics, culture, and more. We're produced by AI Podcasting. If you're looking for podcast production help for everything from the moment you stop recording to the moment your audience starts listening, check them out and see my endorsement at AIpodcast.ing. And thank you to everyone who listens for being part of The Cognitive Revolution.