Introduction
On the first day of the Trump Administration, the Office of Personnel Management (OPM), the federal government’s central human resources agency, issued a memo that suggested federal agencies consider firing so-called probationary employees. Despite the name, this is not a designation for employees who are in some kind of trouble. Instead, it refers to a “probation” period that applies to newly hired career civil servants, employees who have been transferred between agencies, and sometimes even employees who have been promoted into management roles. These employees are much easier to fire than most federal employees, so they were a natural target for the Trump Administration’s cost-cutting initiatives.
Because probationary employees are disproportionately likely to be young and focused on more recent government priorities (like AI), the move had unintended consequences. The Trump Administration has since updated the OPM memo to add a paragraph clarifying that they are not directing agencies to fire probationary staff (the first link in this article is the original memo, if you would like to compare).
While the memo was a disruption for many federal agencies, it would have been an existential threat to the US AI Safety Institute, virtually all of whose staff are probationary employees. The threat did not come to fruition, but the whole affair gave me, and I suspect others in Washington, an opportunity to ponder the future of the US AI Safety Institute (AISI) under the Trump Administration.
I believe that the federal government should possess a robust capacity to evaluate the capabilities of frontier AI systems to cause catastrophic harms in the hands of determined malicious actors. Given that this is what the US AI Safety Institute does, I believe it should be preserved. Indeed, I believe its funding should be increased. If it is not preserved, the government must rapidly find other ways to maintain this capability. That seems like an awful lot of trouble to go through to replicate an existing governmental function, and I do not see the point of doing so. In the longer term, I believe AISI can play a critical nonregulatory role in the diffusion of advanced AI—not just in catastrophic risk assessment but in capability evaluations more broadly.
Why AISI is Worth Preserving
Shortly after the 2024 election, I wrote a piece titled “AI Safety Under Republican Leadership.” I argued that the Biden-era definition of “AI safety” was hopelessly broad, incorporating everything from algorithmic bias and misinformation to catastrophic and even existential risks. In addition to making it impossible to execute an effective policy agenda, this capacious conception of AI safety also made the topic deeply polarizing. When farcical examples of progressive racial neuroses manifested themselves in Google Gemini’s black Founding Fathers and Asian Nazis, critics could—correctly—say that such things stemmed directly from “AI safety policy.”
The irony is that, for the most part, algorithmic bias and misinformation (real or alleged) are not major priorities for the people who worry about catastrophic AI risks. And it is mostly people of that persuasion, rather than the “tech ethics” persuasion, who staff the US AI Safety Institute. Confusingly, many in DC seem to believe that AISI is somehow involved in efforts to make AI woke or to otherwise censor AI outputs. But US AISI has not, to my knowledge, published a single document that I would describe as “woke.” Indeed, if anything, it is AISI’s parent agency, the National Institute of Standards and Technology (NIST), that is preoccupied with typical center-left causes like bias, misinformation, and environmental sustainability (see, for example, the NIST AI Risk Management Framework).
Measuring and understanding AI risks that could create catastrophic harms is obviously a government function. There would be little point in going through the trouble of having a government in the first place if its function did not involve assessing and mitigating things like biorisks, wide-scale cyberattacks, and other clear prospective harms to physical safety.
Some argue that these risks are overblown, and others even assert that they are “science fiction.” While I do believe that the AI safety community focuses overmuch on specific kinds of risks to the near-total exclusion of others, I find the “science fiction” critique in particular to be among the laziest and most irresponsible motifs in AI policy discourse. Potentially dangerous capabilities are beginning to emerge in frontier AI systems. Here, for example, is OpenAI’s System Card for Deep Research:
Several of our biology evaluations indicate our models are on the cusp of being able to meaningfully help novices create known biological threats, which would cross our high risk threshold. We expect current trends of rapidly increasing capability to continue, and for models to cross this threshold in the near future.
Here’s Anthropic’s Claude 3.7 System Card, noting that “ASL-3” refers to “AI Safety Level 3,” a heretofore unbreached threshold of novel capabilities that Anthropic believes will require more serious safeguards:
Further, based on what we observed in our recent CBRN testing, we believe there is a substantial probability that our next model may require ASL-3 safeguards. We’ve already made significant progress towards ASL-3 readiness and the implementation of relevant safeguards.
We’re sharing these insights because we believe that most frontier models may soon face similar challenges in capability assessment.
If you believe that these risks are science fiction, you are implicitly asserting that OpenAI and Anthropic are either wrong or lying in these documents, both published within the last month. You may be right, but the burden of proof is on you to explain why you believe this. Reiterating your distaste for AI safety policy proposals (a separate issue from the assessment of risk), mocking AI safety advocates, and the like are miles away from appropriately rigorous arguments.
We still don’t know what it will mean for an AI system to possess the potential to create catastrophic harms. Perhaps not much—or at least, not too much too rapidly. Technology takes time to diffuse, and things like bioterrorism are surprisingly rare. Nor do we know what it will mean for AI policy; we have many ways of combatting catastrophic harm other than placing onerous restrictions on AI development.
But should the United States federal government possess a robust understanding of these risks, including in frontier models before they are released to the public? Should there be serious discussions going on within the federal government about what these risks mean? Should someone be thinking about the fact that China’s leading AI company, DeepSeek, is on track to open source models with potentially catastrophic capabilities before the end of this year? Is it possible a Chinese science and technology effort with lower-than-Western safety standards might inadvertently release a dangerous and infinitely replicable thing into the world, and then deny all culpability? Should the federal government be cultivating expertise in all these questions?
Obviously.
Risks of this kind are what the US AI Safety Institute has been studying for a year. They have outstanding technical talent. They have no regulatory powers, making most (though not all) of my political economy concerns moot. They already have agreements in place with frontier labs to do pre-deployment testing of models for major risks. They have, as far as I can tell, published nothing that suggests a progressive social agenda.
Should their work be destroyed because the Biden Administration polluted the notion of AI safety with a variety of divisive and unrelated topics? My vote is no. I believe AISI should be preserved. But if the Trump Administration decides otherwise and chooses to eliminate AISI, they will need to find a way to maintain the capability with minimal to no loss of continuity. My guess is that tearing down AISI only to rebuild the same capability elsewhere is not worth the effort.
A Longer-Term Vision for AISI
The capabilities I have described are the most urgent functions of AISI. But with a bit more time and more funding, I believe it could come to occupy a broader role—even without assuming any regulatory powers.
Say that you are Chief Technology Officer for a hospital and interested in acquiring an AI system to automate the process of updating health records and other routine paperwork. Perhaps the system you’re considering has agentic capabilities as well, with the ability to use a computer in a human-like way.
You are likely to have many questions about the reliability and capabilities of this system. Hospitals are heavily regulated, and involve matters of life and death, so your questions will be especially numerous and high stakes. Many of your questions will be common across all hospitals; others will be specific to your enterprise and its workflows.
You’ve seen some recent regulatory guidance on AI use from your state’s hospital regulator, but it basically amounts to telling you to do a good job and to write a report explaining how you are going to do a good job. But you do not know what “good job” means, and evidently, neither does your regulator.
This is a stylized description, but it is not far from the reality of frontier AI policy today. Laws and regulations rarely specify substantive, measurable requirements and instead rely on subjective (and hence ever-shifting) standards. This is because, by and large, policymakers themselves do not yet know what it is, exactly, they wish to achieve. They simply know which harms they wish to avoid, and for the most part, these harms are already unlawful under existing law.
AI policy thus often finds itself with a kind of chicken-and-egg problem: businesses do not know how to comply, and policymakers do not know how to describe what compliance looks like in an objective way. Policies like Colorado’s SB 205, as they apply to businesses seeking to adopt AI, primarily require those businesses to write lengthy reports to the government about the steps they took to mitigate every conceivable harm. There is no way to know what successful compliance is, so the longer and more detailed these compliance documents are, the better—ad infinitum. In the European Union, process-based compliance of this kind has been estimated to function almost like a tax on AI spending, adding 5-17% in compliance costs to any corporate use of AI.
The solution to the chicken-and-egg problem faced by regulators and businesses is not yet more paperwork—it is a technical solution. Specifically, this is a problem of measurement and evaluation—the precise specialties of AISI and NIST.
Today, evaluations are used for a small fraction of what they could be; they focus overmuch, in my view, on a small set of catastrophic risks, or on broad capabilities like math, coding, and knowledge of arcane facts (note that this doesn’t mean I find those things unimportant, just that they are currently a disproportionate focus). And even in these relatively limited uses, evaluations of frontier systems are commonly said to be in a state of crisis due to how quickly they tend to be beaten by the best AIs.
Many of these evaluations are structured as exams, with fact-based questions for the AI to answer. For example, Epoch AI has a widely cited benchmark called FrontierMath, which was developed in collaboration with world-leading mathematicians to be an extremely difficult math test. Knowing that an AI can successfully perform the most advanced mathematics is valuable, but this is, clearly, an academic exercise: the direct economic value of AI will come from automating much more mundane knowledge work, and the math scores are, at best, a loose proxy for broader capabilities.
The capabilities frontier today is not especially about how much models know—they know a lot. Instead, it is about what they can do. What tasks can AI systems meaningfully automate, and how reliably do they do so? Increasingly, then, evaluations are shifting to simulations of agentic workflows. In my view, the most meaningful measures of AI progress are not evaluations like FrontierMath or Humanity’s Last Exam, interesting though those are. Instead, they are task-based evaluations like OpenAI’s SWE-Bench Verified, which measures performance on software engineering tasks, or METR’s RE-Bench, which does the same thing for machine learning research.
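To make the distinction concrete, here is a minimal sketch of the two evaluation styles. Everything in it is hypothetical: the item contents, and the `run_model` and `run_agent` callables, are stand-ins I have invented for illustration, not code from any of the benchmarks named above. An exam-style item checks a model's answer against a known string; a task-based item lets an agent act in an environment and then checks whether the resulting state passes a test.

```python
# Illustrative sketch only: contrasts exam-style and task-based evaluations.
# All names and example items are hypothetical.

from dataclasses import dataclass
from typing import Callable


@dataclass
class ExamItem:
    """Exam-style evaluation: a question with a single known answer."""
    question: str
    answer: str

    def score(self, run_model: Callable[[str], str]) -> bool:
        # The model "passes" if its text output matches the reference answer.
        return run_model(self.question).strip() == self.answer


@dataclass
class TaskItem:
    """Task-based evaluation: an instruction plus a check on the end state."""
    instruction: str
    setup: Callable[[], dict]           # builds a sandboxed environment
    success: Callable[[dict], bool]     # inspects the environment afterward

    def score(self, run_agent: Callable[[str, dict], dict]) -> bool:
        env = self.setup()
        # The agent may browse files, run code, edit records, and so on.
        final_env = run_agent(self.instruction, env)
        # Success depends on what the agent did, not on what it said.
        return self.success(final_env)


# Exam-style item: tests knowledge.
math_item = ExamItem(question="What is 17 * 23?", answer="391")

# Task-based item: tests whether work actually got done.
records_item = TaskItem(
    instruction="Merge the duplicate patient records in this database.",
    setup=lambda: {"records": ["A", "A", "B"]},
    success=lambda env: env["records"] == ["A", "B"],
)
```

The structural difference is the point: grading the second kind of item requires building and inspecting a realistic environment, which is a large part of why task-based evaluations are so much more expensive to create.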
Task-based evaluations are difficult to create, and there is no way that AISI, even working in partnership with private actors, can create evaluations for all uses of AI throughout the economy. It is possible that an entire ecosystem of private evaluation organizations will need to be created. In the case of healthcare, one such organization—the Coalition for Health AI—is already tackling this problem (though I have not thoroughly vetted their work).
What AISI can do, however, is accelerate the creation and diffusion of more sophisticated evaluations, and thereby help to seed a broader ecosystem of evaluation tools. They can do this by developing and open sourcing the fundamental tools needed to build agentic evaluations, publishing guidelines for evaluation creators, and potentially even funding the creation of new task-based evaluations for targeted industries or use cases.
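As one illustration of what such fundamental tools might look like, here is a sketch of a shared task-specification format that a standards body could publish and that industry groups could fill in with their own workflows. Every field name and file path in it is an assumption made for the sake of the example; nothing here comes from AISI, NIST, or any existing framework.

```python
# Illustrative only: a minimal task-specification format of the kind a
# standards body could publish so that industry groups can contribute
# task-based evaluations to a shared harness. All field names and paths
# are hypothetical.

import json

TASK_SPEC = {
    "id": "hospital.records.merge_duplicates.v1",
    "domain": "healthcare-administration",
    "instruction": "Merge duplicate patient records without losing data.",
    "environment": {
        "type": "sandboxed-database",
        "fixture": "fixtures/duplicate_records.sql",   # hypothetical path
    },
    "success_criteria": [
        "no duplicate patient identifiers remain",
        "no field values from the original records are lost",
    ],
    "scoring": {"method": "automated-check", "script": "checks/merge_check.py"},
}


def validate_spec(spec: dict) -> list[str]:
    """Return the list of missing required fields (empty means valid)."""
    required = [
        "id", "domain", "instruction",
        "environment", "success_criteria", "scoring",
    ]
    return [field for field in required if field not in spec]


if __name__ == "__main__":
    problems = validate_spec(TASK_SPEC)
    print(json.dumps({"spec": TASK_SPEC["id"], "missing_fields": problems}, indent=2))
```

The value of something this mundane is that it would let many small, domain-specific organizations contribute evaluations to a common harness without each of them reinventing the plumbing.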
As AI systems become capable of building evaluations themselves (which already happens in limited ways, and which I expect will grow considerably over the coming year), the cost of developing these evaluations could very well drop. Perhaps one day evaluations will be built “on the fly,” tailored to the precise needs of each business.
Armed with these tools, businesses might adopt AI with more confidence and speed. Regulators can write more objective and specific rules instead of relying on nebulous pronouncements (if they wish; some regulators like nebulous pronouncements). The world would be a better, more richly informed place if these tools existed, and AISI can make it happen—at minimal taxpayer expense and with no regulatory powers.
Conclusion
I have always been torn between pushing back on the overreach of Biden-era “safety” policies and grappling with genuine risks created by advanced AI. In fact, this was the topic of the first article I ever wrote publicly about AI, way back in October 2023:
How we contend with these newfound tools, whether and to what extent we trust them, and how we incorporate them into our society will determine whether they will disempower or enrich humanity. Being insulated from even the possibility of offensive content on the internet is quite different from being protected against, say, AI-created novel pathogens; speech is not a form of violence, but AI-powered drones may well be. It is time to set aside the petty squabbles of the past decade and address the serious legal, public-policy, moral, and philosophical questions that the prospect of highly capable AI has prompted.
AI systems with potentially dangerous capabilities are a real, near-term prospect. This is a cause for attention and concern, but not a cause for panic. The world need not end because of this; almost all major threats to society will require much more than just a new software capability to bring to fruition (even cyberattacks are not necessarily bottlenecked by the quality of the attackers’ code). Onerous restrictions on AI models themselves could still backfire even in a world with genuine catastrophic AI risk.
It often seems as though simply acknowledging facts about current frontier AI system capabilities and their likely near-future trajectory is grounds for being labeled a “doomer.” This is an untenable dynamic. I do not know when it will shift, but shift it will. Whenever that shift occurs, techno-optimists will need a coherent agenda for how to grapple with the new world that is rapidly being built around us.
Yet at the same time, I feel that a near-term picture of what governance of sophisticated AI could look like is coming into view. This is exciting, but the candid reality is that we are not on track to implement any of the ideas I—and many others—have put forth.
That first article of mine ended on a hopeful note:
This rapidly advancing technology, and the new era it may portend, can raise our individual and collective capabilities. With any luck, perhaps it can also raise the tenor of our discourse.
This has not happened yet, but I have not lost hope.