Welcome, new subscribers! I am glad you are here. This is a new project, and I welcome your feedback. I’d also appreciate your sharing my newsletter with others. This is, by far, the best way to support this project. Thanks to April Pawluk of the Arc Institute and Ruxandra Tesloianu of the Wellcome Sanger Institute for their insight and guidance (neither of them saw this piece in advance nor necessarily endorse the ideas herein).
Introduction
Among the most worrisome speculative risks related to AI is the ability of small groups, and perhaps even individuals, to manufacture chemical and biological weapons. Dario Amodei, CEO of the AI lab Anthropic, claimed in recent Senate testimony that AI systems in 2-3 years could “greatly widen the range of actors with the technical capability to conduct a large-scale biological attack.” To contend with this challenge, Amodei proposes that we regulate AI models much as we regulate “cars” or “airplanes.”
Similarly, the Center for the Governance of AI, a prominent AI policy think tank, has used the threat of bioweapons, among other things, to assert that open source AI in particular poses “extreme risks” that require government to “enforce safety measures” (what those specifically might be is mostly left unsaid) using “liability law and regulation, licensing requirements, fines, or penalties.” How exactly this is to be accomplished is similarly left to the reader’s imagination, though the report admits that “further research is needed to understand the risks, benefits, and legal feasibility of different policy options,” and notes that enforcement may also be a challenge.
This threat has captured the minds of federal policymakers. President Biden’s Executive Order on AI focuses heavily on the biological and chemical weapons risk, directing the National Institute of Standards and Technology, the Department of Homeland Security, the Department of Energy, and other agencies to explore the issue further. The Order’s most consequential provision is a reporting requirement for AI models trained above a certain computing power threshold; that threshold is three orders of magnitude lower for models primarily using “biological sequence data.”
On a superficial level, it seems to make sense that regulating AI models directly is the best way to mitigate AI-enabled biorisk. However, advocates of this approach often fail to explain how, exactly, AI per se would simplify the creation of novel bioweapons so radically that untrained individuals or groups would suddenly have the ability to synthesize one.
Furthermore, AI is precisely the wrong layer of the production chain to target, for reasons of practicality, legality, and the principles of open scientific inquiry. Exploring why this is the case will lead us to a better way to mitigate biorisk, and will reveal fundamental flaws in the way AI policy is currently conceived. Let’s dig in.
AI Biorisk: Examining the Evidence
First, it’s worth observing that neither Amodei’s testimony nor the Center for the Governance of AI proposal cite much evidence to support their claims about the risks. As helpfully illustrated here, the Center for the Governance of AI paper in particular, and many other prominent ones on this subject, end up citing evidence-free claims in other papers to back up their own assertions, rather than citing concrete evidence. Amodei says that Anthropic has done deeper research, but can’t share it with the public. Given the dramatic effects these policy proposals would have on the crucial open source AI community, the AI field more broadly, and science, the public deserves to know more.
It is doubtful that large language models (LLMs) like ChatGPT alone would meaningfully aid in a biological or chemical attack beyond what already exists on the Internet. After all, LLMs are trained on Internet data, so we shouldn’t expect an LLM on its own to generate scientific insights that were not already known. A RAND report released just last week compared the efforts of different test groups to conduct a hypothetical biological weapon attack: some that only had access to the Internet, and some that had access to the Internet plus LLMs (alas, they do not say which; it would be helpful to know). The authors found “no statistically significant difference in the viability of plans generated with or without LLM assistance.”
Little is known about what, concretely, constitutes a dangerous AI model in the context of biorisk. The production of a bioweapon is a complex process, requiring specialized equipment, substantial expertise and know-how, and extensive experimentation. There is no doubt that AI, including but not limited to LLMs, can accelerate steps in this process, but it is far from clear that it can radically simplify the process itself (this is interesting to think about in the context of AI diffusion more broadly, and one of several reasons I doubt it will transform the world quite as quickly as others do—but I digress).
Even a breakthrough such as DeepMind’s AlphaFold, which largely solved the longstanding protein folding problem, did not eliminate the need for real-world experimentation; predicting the structure of a protein is one thing, but understanding how that protein will interact with the complex environment of a cell, including the millions of other proteins within that cell, is a different challenge altogether. This understanding can only be attained through experimentation, at least currently. It is unclear how an AI model per se could enable a person or group that lacks substantial expertise, capacity to carry out experiments, and specialized equipment to create a novel bioweapon—and anyone with such capabilities already likely wouldn’t need AI to do so.
This alone should cause us to question whether biorisk is best addressed at the level of the AI model, rather than at the physical steps of the process, which depend on specialized and hence more controllable supply chains. Regulating software, by contrast, is far more challenging and carries unique and high costs.
The Right and Wrong Way To Regulate AI
That brings us to the deeper problem with these proposals, and most proposals to impose mandatory regulation on AI models: enforceability. AI is software, and software is a manifestation of thought—a form of knowledge that has become a crucial part of how human beings express themselves and conduct their lives. Because software is a form of knowledge, it spreads in a way that is not easily susceptible to government regulation. It is thus fiendishly difficult to manage the diffusion of knowledge and software alike. This is not, at its core, a problem to be mitigated; it is a fact to be reckoned with.
AI safety advocates often skirt this issue by insisting that their proposals for mandatory standards and government approval apply only to ‘frontier’ model makers like OpenAI, Google, etc., who presumably will comply. This ignores the fact that today’s frontier is tomorrow’s old news. Computing power increases rapidly; data, architectural, and other software improvements boost performance even faster. The French startup Mistral, for example, released an open source model late last year that matches OpenAI’s GPT-3.5 (released in 2022) on performance benchmarks at roughly a quarter of GPT-3.5’s size (46.7 billion versus 175 billion parameters). And remember: these improvements happen as the cost of computing hardware also decreases.
The fact that software improvements can enhance performance makes setting a “compute threshold” over which regulation is triggered difficult: What if an architectural improvement yields, say, a 10% reduction in the compute necessary for training a model? Would we need to lower the threshold by 10%? What if it only works for some models in some circumstances? Furthermore, given the almost total lack of concrete knowledge about what substantively defines an AI model with ‘dangerous’ capabilities, especially in the case of bioweapons, how should we even set a computing threshold in the first place? Compute thresholds are an early attempt by policymakers to make AI ‘legible’ to government, as described by James C. Scott in his book Seeing Like a State. Legibility is how states make reality intelligible and actionable; this attempt at legibility, however, is not especially useful.
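To make the problem concrete, here is a minimal sketch in Python. The threshold and the compute figures are illustrative numbers of my own choosing, not anything drawn from the Executive Order; the point is simply that a rule keyed to raw training compute stops seeing a model the moment training gets a bit cheaper, even if the model’s capabilities are unchanged.

```python
# Illustrative only: toy numbers, not the Executive Order's actual thresholds.
REPORTING_THRESHOLD_FLOPS = 1e26  # hypothetical reporting trigger

def must_report(training_flops: float) -> bool:
    """A compute-threshold rule sees nothing but raw training compute."""
    return training_flops >= REPORTING_THRESHOLD_FLOPS

# A model trained just above the line today triggers the rule...
baseline_flops = 1.05e26
print(must_report(baseline_flops))   # True

# ...but retrain it with a 10% efficiency improvement and it slips under,
# even though the resulting model is just as capable.
improved_flops = baseline_flops * 0.9
print(must_report(improved_flops))   # False
```

A regulator could chase the threshold downward in response, but only at the cost of sweeping in ever more ordinary models.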
Finally, these proposals assume that model makers will seek to comply with the regulations. Legitimate businesses will mostly do so, but any person or group planning to launch a biological or chemical attack is presumably unconcerned with informing the Department of Commerce whether it has complied with NIST’s AI Risk Management Framework and unworried about a government fine. Such individuals are by definition outside the law, so policing their activity requires proactive enforcement rather than standards. Instead of side-stepping the issue or pretending that it can be dismissed with things like compute thresholds, let’s consider what enforcing AI model regulation on everyone would look like in practice.
An AI model requires, in very basic terms, three things to train: data, a model architecture and related software tools, and compute (most commonly graphics processing units, or GPUs). Compute is not the focus of this piece, so let’s set it aside (I will return to so-called ‘compute governance’ some day).
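To make those ingredients concrete, here is a toy PyTorch sketch. The data, the architecture, and the training loop are all stand-ins I made up for illustration, but they map onto the same three components that any real training run, however large, requires.

```python
# A toy illustration of the three ingredients of model training.
import torch
from torch import nn

# 1. Data: random tensors standing in for a real dataset.
inputs = torch.randn(256, 32)
targets = torch.randn(256, 1)

# 2. Architecture and software tools: a small network defined with PyTorch.
model = nn.Sequential(nn.Linear(32, 64), nn.ReLU(), nn.Linear(64, 1))
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()

# 3. Compute: the same loop runs on a laptop CPU or a cluster of GPUs;
#    only the scale changes.
device = "cuda" if torch.cuda.is_available() else "cpu"
model = model.to(device)
inputs, targets = inputs.to(device), targets.to(device)

for step in range(100):
    optimizer.zero_grad()
    loss = loss_fn(model(inputs), targets)
    loss.backward()
    optimizer.step()
```

Scale up the data, swap in a transformer, and add more GPUs, and you have the recipe for essentially any modern model.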
Data, including genomics data, while a critical source of competitive differentiation in the AI field, is available broadly. Here, for example, is an open dataset containing genomics data for 2,504 humans, available from Microsoft. Similar datasets for plants and animals exist freely online.
Model architectures, pre-trained models, and the related software (such as PyTorch) needed to create and fine-tune an AI model are similarly available for free online, primarily hosted by the open source AI platform Hugging Face. An LLM called CodonBERT, designed to generate mRNA sequences, was released in the past few months (and trained, by the way, on 4 Nvidia A10 GPUs, costing around $15,000 to $20,000 in total, or about $1,000 on a cloud computing service—not all potentially useful AI models require millions of dollars to train). DeepMind has openly released hundreds of millions of protein structures predicted by its AlphaFold model, as well as the model itself.
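To give a sense of how low the barrier is, here is a minimal sketch of what pulling an openly published model for fine-tuning looks like with the Hugging Face transformers library. The repository name below is a hypothetical placeholder rather than a real model, and a real workflow involves more steps, but none of those steps pass through a gatekeeper.

```python
# A sketch of pulling an openly published model to prepare it for fine-tuning.
# "some-org/open-sequence-model" is a hypothetical placeholder, not a real repo.
from transformers import AutoModelForMaskedLM, AutoTokenizer

model_id = "some-org/open-sequence-model"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForMaskedLM.from_pretrained(model_id)

# From here, fine-tuning on a new dataset is a standard training loop (or the
# library's Trainer class); no approval or license check is involved at any step.
```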
If our goal is to ensure that a bad actor cannot make use of tools like this, we’d have to eliminate them from the Internet. None of the scientific datasets deemed dangerous could be allowed, because a malicious non-state actor could use them to train a model. Open source AI models would need to be banned and removed from the Internet. Even if they complied with the government’s standards, they could be modified by the malicious actor to subvert those standards. Presumably, this would also be required for all of the open source tools that are currently used to build AI models.
Beyond the obvious blow this would deal to the AI industry and the scientific community, which thrive on open access to tools and information, it would be unenforceable without something like China’s Great Firewall. How else could government keep the undesirable things off of the Internet? Universally applicable and genuinely enforceable regulation for software would therefore require the United States to jettison its status as the “leader of the free world.”
None of this is to say that safety and reliability standards for AI models are undesirable. Mandatory standards for AI models used in various fields, such as medicine, cybersecurity, or financial services, can be effective, particularly if they are developed collaboratively with industry and targeted appropriately. The FDA, for example, could require that all medical devices using AI only employ models that meet certain minimum standards. Government procurement rules could forbid agencies from using models below a given standard. Such standards could be tailored for each industry and evolve over time. They may very well help the AI field, because assessing model quality is currently quite challenging for both AI researchers and businesses interested in using AI.
Regulation of this kind is quite different from a mandate that no AI model below a given standard be permitted to exist at all. The former can be enforced through regulatory and procurement pathways that are already in place; the latter would require the invention of an entirely new, and quite invasive, enforcement mechanism. Similarly, anyone is free to write a biology textbook and publish it on the Internet, but public schools choose which textbooks they will give students based on mandatory curricular standards. It is one thing to have rules about which textbooks schools can use; it is quite a different thing to forbid the publication of unapproved textbooks on the Internet.
The standards approach, useful though it may be, does little to prevent chemical or biological attacks made using AI, unless accompanied by a draconian enforcement regime. Software and the use of general-purpose computing devices are simply not the right layer of the production chain to target for mitigation of this particular risk. So what is?
A Mitigation Strategy That Works
Fortunately, creating a novel biological or chemical weapon requires more than just the exact recipe needed to make it. Of course, specialized scientific know-how, as opposed to mere knowledge, is essential. ChatGPT may be able to tell me how to make hare à la royale, but that doesn’t mean I’d be remotely capable of actually making it. But more importantly for these purposes, it also requires specialized tools, such as RNA/DNA synthesis machines in the case of a virus or bacterium, and substantial laboratories in almost all cases.
Experimentation will always play a crucial role in science, and AI is making progress in automating the process of experimentation itself. This is particularly true in automated cloud labs, which allow scientists to conduct experiments remotely by issuing commands in code. LLMs like GPT-4 have successfully carried out autonomous experiments using such facilities; because these labs are largely automated, it is possible for an AI model to order and complete an experiment with minimal, or even no, human intervention. While this is a promising development that could rapidly accelerate scientific progress, it is crucial to ensure robust security at such cloud labs.
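What might “robust security” look like in practice? Here is one purely hypothetical sketch; the client class and its methods are invented for illustration and do not correspond to any real provider’s API. The idea is simply that a cloud lab can refuse to execute a machine-generated protocol without a verified account and human sign-off.

```python
# Hypothetical sketch of a human-in-the-loop gate at a cloud lab.
# CloudLabClient and its methods are invented for illustration only.
from dataclasses import dataclass

@dataclass
class ExperimentOrder:
    submitter_id: str   # tied to a verified (KYC'd) account
    protocol: str       # e.g., code emitted by an LLM agent
    automated: bool     # True if no human wrote the protocol

class CloudLabClient:
    def submit(self, order: ExperimentOrder) -> str:
        if order.automated and not self._human_approved(order):
            raise PermissionError("Automated orders require human review.")
        self._log(order)  # retained for auditing
        return "queued"

    def _human_approved(self, order: ExperimentOrder) -> bool:
        return False  # stand-in for a real review queue

    def _log(self, order: ExperimentOrder) -> None:
        pass  # stand-in for an audit log
```

The important point is that this control lives at the lab, not inside the model.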
Strangely enough, even though it is AI that makes this risk more salient, the risk is best addressed by focusing on almost every part of the production chain other than AI. That is because the rest of the chain involves specialized, physical equipment whose use can be reasonably regulated, even strictly, without impinging overmuch on freedom of thought, speech, or scientific inquiry.
Just a few examples of these controls include:
Regulating the export of DNA/RNA synthesis equipment;
Mandatory screening mechanisms for DNA/RNA synthesis, as described here and as put forward in President Biden’s Executive Order on AI (see the sketch after this list);
Mandatory KYC (Know Your Customer) for data centers and cloud labs (good news: this appears to be in progress for data centers);
Diplomatic efforts to encourage as many countries as possible to implement all of the measures above.
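To make the screening idea concrete, here is a deliberately cartoonish sketch. Real screening frameworks match orders against curated databases of sequences of concern and catch near-matches, not just exact substrings, so treat the watchlist and the matching logic below as placeholders for the control point, not as a design.

```python
# A cartoon of sequence-of-concern screening at a DNA synthesis provider.
# Real systems use curated databases and fuzzy/homology matching; this toy
# version only checks exact k-mer overlap against a placeholder watchlist.
WATCHLIST = {"ATGCGTACCTGA", "GGCTTAACGTCA"}  # placeholder sequences, not real ones
K = 12  # window size for comparison

def flags_order(order_sequence: str) -> bool:
    """Return True if any window of the order matches a watchlisted k-mer."""
    seq = order_sequence.upper()
    windows = {seq[i:i + K] for i in range(len(seq) - K + 1)}
    return bool(windows & WATCHLIST)

# An order embedding a watchlisted fragment would be held for human review.
print(flags_order("TTTATGCGTACCTGATTT"))  # True
print(flags_order("TTTTTTTTTTTTTTTTTT"))  # False
```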
These measures do not eliminate all risk, but they reduce it meaningfully, and they do so well within the limits of our Constitution.
As ever, conversations about AI remain overly focused on the risks rather than the benefits. Imagine what our finest scientific enterprises could accomplish in a world where even poorly trained non-state actors could create novel bioweapons; imagine, too, the new capabilities we would have to curtail those actors. Imagine the diseases that could be cured, the suffering that could be eliminated, by such a suffusion of intelligence throughout the world. Yet these achievements are hard, and the technology to achieve them is still nascent; the future is always fragile. Perhaps the risk that should concern us most is whether we will let our fear get the best of us.