Leviathan Waking
On Anthropic/USG, and a new era in AI governance
Introduction
Imagine that there were no Food and Drug Administration (FDA), but there remained a large pharmaceutical sector, similar in size and scope to the one the United States enjoys today. In this alternate world, imagine that drugs were not licensed or otherwise formally approved by regulators; there were even officials in the executive branch who boasted that the U.S., unlike other countries, would not get into the regulatory morass of licensing drugs.
One day, a pharmaceutical developer warns that they think they have made a drug that cures a major cancer at one dosage but is lethal at a slightly higher dosage. The company says, for this reason, that they are going to restrict release only to pre-approved patients and monitor their usage of the drug carefully, a sharp break from prior industry practice but one that the company insists, controversially, is necessary. This particular company had been advocating for years for stricter drug regulation, much to the chagrin of the government.
This causes a stir, and the government, not quite knowing what to do, announces that it will give drug developers the helpful option to show their drugs’ safety profiles to government officials before they are released. They are adamant that this is a voluntary program. The pharmaceutical company, being hopelessly literal nerds, and if we are being honest, more than a little bit obstinate, decides to release their drug without going through the voluntary program. “We already paused general availability of the drug while we did our own safety study, so we don’t need the government’s testing, and besides it is voluntary, isn’t it?” the company seems to be saying.
But then a handful of patients get side effects severe enough to hospitalize them, but not severe enough to be lethal. The government gets understandably upset, particularly considering their lack of experience in regulating drugs. “You talked up your own safety practices so much, and now we have people in the hospital. You are telling us that you are comfortable releasing chemicals that can put people into the hospital?” the government argues to the company.
The company’s literal and obstinate nerds say, “well, we’ve thought about drug safety regulation quite a bit, and given how common hospitalization of a small number of patients is with a new drug, compared to the lifesaving benefits of our drug for millions, yes, we think the benefits outweigh the risks in this case.” But trust has already broken down, and this abstract, technocratic defense falls on deaf ears. “People are being hospitalized,” the government says.
And so the government bans the drug, indefinitely. It is not clear what the government wants more: a remedy for this specific side effect, a solution to all side effects from drugs, or, really, an apology from the company, as well as the sensation of domination over these disobedient, obstinate, and literal nerds.
In a matter of weeks, in our alternative world, the United States went from a system that was implausibly laissez-faire for the level of risk involved in this industry, to a system that was, in the eyes of essentially all expert onlookers, incomprehensibly strict and risk averse.
Fable, Jailbreaks, and Export Controls: What Happened
This, of course, is my read of what happened in the Trump Administration’s latest dispute with the AI company Anthropic. For those not following the blow-by-blow, what happened, in a few sentences, is:
Anthropic released Fable, a commercial version of their very powerful Mythos model, with severe guardrails to prevent misuse.
People liked it, though broadly speaking thought the guardrails were far too strict.
A few days later, officials in the Trump Administration (it is not clear who) became aware of a jailbreak that got around some of Fable’s safeguards (it is not clear how severely), and demanded that Anthropic de-deploy the model (it is not clear with how much specificity the government expressed the concern).
Anthropic did not de-deploy the model (it is not clear why), so the government imposed worldwide export controls against all non-U.S. persons on Fable and Mythos.
Because Anthropic lacks the ability to validate U.S. personhood for end users, this meant they had to pull down the models globally, for everyone. In fact, by some accounts, Anthropic has had to suspend internal usage of their model because of the risk that their own non-U.S. person employees might use the model.
You’ll notice the clause “it is not clear” repeated frequently above. The sheer opacity of everything that is unfolding makes it hard to analyze. There is no text for me to draw on, and no actual policy to criticize. There is simply a game of he-said, she-said played between two actors whose animosity toward one another is only growing and who both, if we are honest, seem to be making things worse for themselves and for the whole industry.
It is worth dwelling for a moment on how unclear the Trump Administration has been. In the weeks after Anthropic first announced Mythos Preview, David Sacks, the former White House AI Czar and current Vice Chair of the President’s Council of Advisors on Science and Technology, sought to downplay the capabilities of the model by suggesting that Anthropic had a “boy who cried wolf” problem with AI safety claims. Emil Michael, Undersecretary of War for Research and Engineering, argued—correctly—that AI has a “cyber” problem rather than a “Mythos” problem, meaning that the risks and capabilities of Mythos are not intrinsic to Anthropic models, but something we should expect to see broadly throughout the AI field soon.
A few weeks later, Sacks argued that OpenAI’s GPT 5.5 was of a similar capability level to Mythos and applauded OpenAI for making it broadly available. He contrasted OpenAI’s openness—and again I think Sacks is right here—with Anthropic’s more cautious and restrictive approach to releasing Mythos, saying on X, “[GPT 5.5] may be the first cyber model that defenders actually get to use.”
About five weeks after that tweet, on June 2, President Trump signed Executive Order 14409, “Promoting Advanced Artificial Intelligence Innovation and Security,” section 3 of which describes a voluntary, 30-day pre-deployment testing program for frontier AI, and section 3(c) of which reads, in its entirety:
(c) Nothing in this section shall be construed to authorize the creation of a mandatory governmental licensing, preclearance, or permitting requirement for the development, publication, release, or distribution of new AI models, including frontier models.
Anthropic had announced the highly restricted release of Mythos Preview nearly 60 days prior to the date this Executive Order was signed, and had surely made the U.S. government aware of Mythos well before even that date. And besides, nothing in the Executive Order was operative yet—the deadline for the creation of the voluntary testing program is not until July.
On paper then, given the text of the Administration’s policy and the statements of senior Administration and Administration-adjacent officials, Anthropic should have felt in the clear to release Fable without getting an explicit thumbs up from the U.S. government. Everything the U.S. government was communicating, in policy and in rhetoric, seemed to suggest “go ahead, release your model!”
And yet common sense would dictate otherwise. Anthropic is still in the midst of a heated dispute with the Department of War about that agency’s decision to label Anthropic a supply-chain risk. Bitter disputes about policy and politics between the Administration and Anthropic remain unresolved, among them export controls, federal preemption, and the general reality that Anthropic supports Democratic candidates for office while Republicans occupy the seat of power.
Of course they needed to tread carefully. What the law says does not matter. What Administration officials argue on one day does not matter. Anthropic is a political enemy of this Administration, in part because they have explicitly chosen to make themselves one. It is simply naïve to think that your company can operate under such circumstances without an extreme degree of regulatory caution. And given this context, Anthropic’s actions are viewed by many within Washington as not simply unwise, but actively antagonistic.
And it is not just about Anthropic and political grudge matches with the Trump Administration. Everyone at the frontier should understand that in practice, you do need an explicit green light from the government now. I can’t pretend to be mad about this, even though it does contradict both the rhetoric and the policy of the Administration; after all, my own analysis of the EO at the time it was signed was that it created a de facto licensing regime.
The stark reality is that making superintelligence is a profoundly political act even in the healthiest of societies, to say nothing of the filthily political world we Americans currently inhabit. A model like Mythos goes beyond being a mere political act and implicates the sovereignty of the state itself. No company gets to shake the foundation of state sovereignty while staying blithely above the raw reality of politics.
In D.C., Anthropic’s rapid release of Mythos after the supply-chain risk controversy with the Department of War was not just seen as another step in the development of AI, even if that is what it was. It was seen by many as a move against the United States Government: a private company developing a weapon. What else, really, could one have expected? All actors in this industry, and all concerned citizens observing the AI field, must steel themselves for a profoundly more political future.
What Is To Be Done?
The near-term solution to the local dispute between Anthropic and the government is that thing you hear often in D.C. these days: “a deal.” That is not a matter of policy; it is a matter of Anthropic and the U.S. government coming to mutually agreeable terms by which they can live with one another. The medium-term solution to the broader problem—the lack of a coherent frontier AI governance framework—is a technocratic law, passed by Congress and not by Executive fiat, that puts real guardrails on both industry and government.
Consider the FDA example we started with. In the real world, when the FDA denies a company’s drug, the FDA itself is bound by laws and procedural rules that constrain its actions. The FDA cannot simply deny a drug application for no reason, with no notice, and with no public transparency. The FDA’s authorizing statute from Congress outlines the specific reasons the FDA may give for denying a drug; the FDA has to explain, in detail and in writing, what is wrong with the company’s drug; there are numerous appeals processes, first within the agency itself and ultimately extending to the judiciary.
Now, I am not saying we need an “FDA for AI,” and I am also very much not saying “the FDA is a perfect institution.” Far from it. My argument is instead that technocratic institutions mediate between the raw impulses of political actors and private enterprise. They provide procedure, structure, predictable rules—all things that create “rules of the game” which go beyond the brazenly political. Does politics enter the picture? Of course, often in significant ways. One hope of mine is that there are ways to design institutions that minimize political interference in technocratic matters, which has been the focus of my writing on private governance and independent verification organizations.
Politics and Technocracy
I return, however, to the point that politics is not a thing to be avoided in frontier AI. It is a problem to be managed, and a force, ultimately, to be channeled healthily. One should not hope so much to eliminate politics as to put political forces toward the ends towards which they can most productively be applied.
To see what I mean, consider last week’s Anthropic controversy: the strictness of the guardrails the company imposed on Fable, and in particular the company’s initial decision (quickly walked back) to create system-level “safeguards” that would silently degrade Fable’s performance on tasks related to “frontier LLM” research and engineering. All the company’s other safeguards involved degradations of performance that were explicit to the user: users who asked about biology, for example, were frequently downgraded to the previous generation model, Claude Opus 4.8. Other times, users would get the same kinds of refusals from Fable that have become familiar.
The machine-learning safeguards, however, were different in that a user would think they were getting a helpful, earnest response from Fable, while in fact, at the system level, Anthropic was mangling both the user’s input prompt and the model’s output to create an invisibly degraded answer. This struck, well, almost everyone on X as unfair, myself included.
As I mentioned, the company quickly backtracked. But the whole incident caused me to reflect on the role of AI in politics. Unlike the issue of evaluating the severity of a jailbreak, I felt no need, with the silent-degradation issue, for a private governance institution to mediate between political will and private corporations. And the reason, I realized, is that the silent-degradation controversy is intrinsically about what is fair, while evaluating the severity of a jailbreak is a technical judgment.
Politics is well-adapted to channel popular intuitions about what is fair (intuitions that, to be clear, I do not always agree with) into the law. One needn’t conduct any evaluations or audits to conclude that silently rewriting a user’s prompt to sabotage their work is not fair; it is transparently poor corporate conduct. Political processes are at their best when they are channeled toward making decisions of this type. To be clear, it is trivially easy for human intuitions about what is fair to create deeply perverse and ultimately unfair outcomes. The founders believed in representative democracy mediated by republican checks and balances for a reason: they feared the raw will of the majority. But within the structure they laid down, political processes aimed at determining what is fair have, on the whole and over the course of centuries, done a decent if still highly imperfect job.
Political processes are not well adapted, however, to make information-dense technical judgments. “What kind of jailbreak is this, what threat models does it enable, and how does it compare to the broader universe of jailbreaks?” are a series of technical questions that White House and other Trump Administration officials asked themselves in the past few days, both implicitly and explicitly. But they asked themselves these questions without a lot of time (political leaders, being generalists stretched across the entire Federal government, are always pressed for time), without much technical context, and, as the coup de grâce, with a set of political prejudices about the company to whose model the jailbreak applied. It should not be surprising, given this set of facts, that Administration officials arrived at a decision many observers outside the Administration (including people like me, who are predisposed to be sympathetic to its decisions) found perplexing and frustrating.
This notion dovetails with the work of Gillian Hadfield, whose writing on regulatory markets (both in general and with respect to AI in particular) is the primary inspiration for my own work on private governance. Hadfield argues that governance of any complex technology implicates two distinct types of question. The first are democratic questions: what kinds of deception by AI developers count as unfair? What tradeoffs between utility, safety, and competition does the polity wish to make? What level of catastrophic risk are we willing to tolerate?
Once those broad democratic questions are answered comes the second set of questions: how do we implement these public directives? And here, things rapidly become technical. It is here that the notion of private governance bodies, overseen by public authorities but allowed considerable latitude to develop their own answers to these technical questions, comes into play. Key to Hadfield’s idea is that these private bodies would exist in competition with one another, allowing regulation itself to evolve with technology, societal attitudes, and the like.
It feels that we are a long way from any kind of outcome like that, but I am reminded that in AI, the political Overton Window moves quickly. What I can predict with confidence, however, is that if we continue to govern frontier AI without a serious overarching framework, we will continue to get chaotic, unpredictable, and value-destructive outcomes. In the absence of clear rules to mediate political impulses, the American effort will not be about how to achieve U.S. global dominance in AI, as the President aspires to, but instead about whether the U.S. government can achieve dominance over U.S. AI. This is a fight that benefits no one.


After reading this essay, I’m left wondering: What do you think Anthropic should have done with Mythos?
Like, I agree that “tread carefully” is certainly good advice here, but it’s not clear *how* they could’ve tread more carefully. Outside of vibes-y actions like verbally appeasing the Trump admin or hiring a few Republicans, I don’t know what you would have recommended policy-wise. Sit on frontier models until after OpenAI releases a better one? That would cost them the lead in the AI race forever. Run the release by the Trump admin first? What else were Glasswing, the NSA and Pentagon briefings, etc? Add more safety measures? To shut down all GPT-5.5 levels of cyber capabilities *with near-perfect reliability* probably isn’t technically feasible.
The only thing I can think of that might have realistically worked would be waiting for the government to set up their EO cybersecurity clearinghouse, going through it, and then releasing Fable. But I’m not convinced that would’ve been enough. There’s a high chance that the jailbreak would’ve gone unnoticed (I’m not sure the clearinghouse would be Amazon-level right off the bat), and a high chance that the admin would do their inspection, approve it, and then jump on Anthropic anyway as soon as someone finds a partial jailbreak—because they wanted to.
I agree that Dario et al need to take a few levels in realpolitik, but the amount of blame directed at Anthropic feels a bit like “What was she wearing?”
The most alarming detail isn't any single decision. It's the velocity of the swing. The same administration went from "we will not create a licensing regime" to "worldwide export controls on a single model" in a matter of weeks, and the trigger wasn't a change in the technology or a change in the assessed risk. It was a breakdown in a relationship.
That's the structural problem your FDA analogy is really pointing at. When governance runs on relationships rather than institutions, the rules change at the speed of the relationship, not at the speed of the technology. Every company in the field is now operating under a regime that could shift overnight depending on who falls out with whom next. And the damage isn't to any one company. It's to the predictability that every other company, investor, and allied government needs in order to plan at all. The irony is that the stated goal is American dominance in AI, and the one thing that reliably kills a country's lead in a technology race is making the rules unpredictable enough that capital and talent start looking for somewhere more stable to build.