Introduction
In my most recent commentary on SB 1047, and in several Twitter interactions, I have argued that one of the biggest marks against the bill is that it is simply premature. It would be better, I believe, to wait until the potential for catastrophic risks from AI models is more clearly demonstrated. If that happens—and to be clear, I won’t necessarily be shocked if this happens soon—we will have the opportunity to craft a bill that is both grounded in empirical evidence and, at least conceivably, more precisely tailored to the kinds of risks that seem the likeliest (rather than stabbing in the dark, as we currently are).
If evidence emerges that future models are indeed dangerous, that evidence is likely to drive the federal government toward action. The federal government is better suited than California's to handle frontier AI regulation, so waiting for evidence could yield not only a better bill, but a bill from the appropriate jurisdiction for such things.
More broadly, SB 1047 sets a precedent of regulating emerging technologies based on almost entirely speculative risks. Given the number of promising technologies I expect to emerge in the coming decade, I do not think it would be wise to set this precedent.
Many SB 1047 proponents disagree with me about this; they argue that if we wait until the risk potential is more evidence-based, we will have waited too long, and that things will “get out of control.” I think it’s worth explaining why I don’t find this line of reasoning especially persuasive. Let’s think about this, as they say, step-by-step.
The Trajectory of AI
AI history is replete with “AI summers,” where recent breakthroughs caused researchers to believe that AGI was just around the corner, and “AI winters,” where it became clear that those breakthroughs were far from enough. One should not discount the possibility that a similar “AI winter” is coming once again. Perhaps we will discover that mastery of language, image generation, and video creation is not enough; perhaps the magic of human “general intelligence” (which, for the record, I doubt is really all that general) will elude us once again, and perhaps one day we will all just think of language models as another form of “machine learning,” rather than true “AI.”
This is a live academic dispute, and there are eminently credible figures on both sides of it. I may have my own hunches, but hunches—my own included—are insufficient justification for public policy. Public policy is expensive; we need to have, to borrow a favorite phrase of SB 1047 supporters, "reasonable assurance" that the cost is worth the benefit. In my view, the evidence we currently have is insufficient.
Most SB 1047 proponents basically agree that current frontier AI models do not pose the kind of threats they have in mind. Indeed, SB 1047 wouldn’t apply to any publicly available AI model (at the time of writing). It’s future models they argue we should be worried about. And they argue that waiting until those future models are here, and have the ability to cause catastrophic harm, would be irresponsible. We’ll have already lost control of the plot by then, they’d argue.
But this criticism has never made much sense to me. For the rebuttal to hold, it seems to me you'd need to believe several things:
Future AI models will leap from “no ability to cause much harm at all” to “able to cause catastrophic harms as envisioned by SB 1047” ($500 million in damage or similar);
Those models will be uncontrolled, implying one of four sub-beliefs:
They begin life as open-source;
They will be controlled poorly by their closed-source creators;
Open-source developers like Meta and Mistral will eagerly follow the closed-source providers and release models with catastrophic harm capabilities as open source;
A malicious actor will themselves make a model capable of catastrophic harms.
Policymakers, including California’s legislature, will remain indifferent to all of this as it is happening, likely over the course of 6-18 months.
You don't just need to believe one of these things; you need to believe them all (though under (2), only one of the sub-points is required). So let's take them one at a time.
1. Future AI models will leap from “no harm at all” to “catastrophic harm”
This point is the most believable to me. If a model goes from not being able to meaningfully assist in cyberattacks to being able to do so, then in principle that could mean quite a profound potential for damage. The damage done by a cyberattack isn’t necessarily proportional to the skill of the cyberattacker; it’s proportional to the size and value of the target. As a general matter, larger and more valuable targets have more security, but this is not necessarily a uniform fact about the world. And automated systems that search for vulnerabilities in a sophisticated manner may be able to uncover targets in the sweet spot: valuable and vulnerable.
The same goes for things like biorisk; once you can “assist in the creation of a bioweapon,” you can presumably make some potent things. You may not be able to engineer a supervirus, as some fear future systems will do, but conceivably you could do something like iterate on the COVID genome and create some nastier form of the virus than nature has produced. I have written before that biorisk, even with transformatively capable AI, is at least a bit overrated because of the many other real-world steps and pieces of equipment required to manufacture a bioweapon. But surely, it could provide some motivated group meaningful uplift.
2. Future models with catastrophic harms will be uncontrolled
There are four sub-beliefs here, so I’ll tackle them one-by-one.
2a. Future frontier models will begin life as open-source
Thus far, the frontier of AI has been decidedly led by closed-source providers like OpenAI, Anthropic, and DeepMind (ignoring DeepMind’s good, but not frontier, Gemma line of open-source language models). This could change, though; Mark Zuckerberg has indicated that he expects Meta’s open-source Llama series to lead the frontier of AI by the end of next year.
Still, it is widely expected that truly next-generation models like GPT-5 will be released by early 2025, whatever the model ends up being called (I have a sneaking suspicion that GPT-5 exists, but is not the whole story, and that OpenAI's next generation is instead a broader system of some sort, of which GPT-5 is just one part). It seems likely that this model (or system) will have been tested not just internally within OpenAI, but also by the federal government's AI Safety Institute. There are also rumors that it has already been shown to officials within the national security apparatus.
If next-generation models possess catastrophic harm potential and begin life as open source, one could indeed see there being chaos. But this would require someone with the financial resources (billions of dollars, at this point), the physical infrastructure, and the human capital necessary to produce such a model to then decide to release it into the world. That person or company would need to be something of an open source jihadi, or maybe just an honest-to-goodness jihadi. That seems highly unlikely to me.
2b. Future closed-source models will be controlled poorly by their developers
Part of the appeal of closed-source AI, from a security perspective, is that usage of the models can be closely monitored by the developer. This allows them to shut down uses that conflict with their terms of service (as when OpenAI removed ByteDance's API access for using GPT-4 outputs to train ByteDance's own language model) or violate the law (as when OpenAI announced the discovery of malicious model use by Chinese, Russian, and Iranian actors).
We don't have any good statistics on how quickly malicious use is noticed, investigated, and stopped by frontier closed-source AI providers. But surely the announcements by OpenAI, which have notably picked up in the past six months, indicate that at least that company is getting more serious about such things. It would be surprising if Anthropic, with its deep institutional focus on safety and security, were less fastidious than OpenAI. And Google (DeepMind) has been running worldwide web services for decades; one imagines they know how to monitor usage of their products for malicious behavior.
Keep in mind, also, that future closed-source (and open-source) models are unlikely to help with cyberattacks or bioweapon creation out of the box. They will need to be jailbroken, which, while possible (often easy) for any model, requires a pattern of user activity which is itself detectable by closed-source model developers.
2c. Open-source developers will "fast follow" closed-source providers and release models with catastrophic harm potential
In most circumstances—particularly if a closed-source model demonstrates the potential to enable catastrophic harms—this scenario would require companies like Mistral and Meta, which have investors and employees and the eyes of the world on them, to be—again—jihadis about releasing open-source models. I don't think these companies or their leaders are jihadis. Perhaps I am wrong.
It is conceivable, I suppose, that an open-source model possesses some latent catastrophic capability that no closed-source competitor does, or that is only demonstrable with full access to the model weights. Based on Meta's excellent Llama 3.1 paper, though, it seems as though the company takes testing for such capabilities seriously. Indeed, Meta provided as much or more detail about its testing of Llama 3.1 than Anthropic or OpenAI has with their recent frontier models. Furthermore, I will be entirely unsurprised if Meta (and even Mistral, if they can keep up) ends up agreeing to give their next-generation models to the federal government for pre-deployment testing, so it seems likely that the next generation of open frontier models will be subject to just as much testing as the closed models.
2d. A malicious actor will make a model with catastrophic capabilities themselves
This is possible, though it would obviously require non-trivial resources that a non-state actor, at least, would struggle to muster. It's conceivable that a nation-state could make such a model, though it is far from obvious that, for example, the Chinese government would want a model of this kind to exist in the world. It would be just as threatening to them as it is to anyone else.
Furthermore, once we are talking about actors of this kind, we are beyond the scope of harms that US law, especially a law passed by a state government, can successfully mitigate. Hamas, the CCP, or the Iranian government probably do not care about being taken to court by the California Attorney General.
Finally, moving on from item (2) in my list, let's turn to the last belief: that governments would remain indifferent to models with catastrophic harm potential.
3. Governments would remain indifferent
This is perhaps the most contentious argument. SB 1047 supporters seem convinced that governments would remain indifferent even as models reach catastrophic or near-catastrophic harm potential. I simply don’t believe this.
The federal government has shown a willingness to turn on a dime in the recent past: witness how quickly radical levels of government control over the economy and daily life were asserted in a matter of weeks during the pandemic (let alone the trillions spent). It is true that government has struggled to respond to the big, slow-moving crises of our time, such as the federal debt or immigration or military readiness. But when it feels threatened, it tends to lash out aggressively.
The federal government already has a variety of tools at its disposal to control the distribution of AI if it deems it to be a matter of national security. At its most extreme, the federal government could seize all of the compute, intellectual property, and even the office furniture of every frontier lab, in a heartbeat, under the Defense Production Act (and in fact, the Biden Executive Order on AI uses DPA authorities as the basis for its reporting requirements for frontier model developers).
Setting aside the federal government, I needn’t mention what state governments—including California, including Scott Wiener himself—would be able to get done in the event that models showed compelling evidence of possessing catastrophic harm potential. Look how far California’s legislature got with SB 1047, and numerous other bills, in the absence of such evidence.
In short, I’ve never seen America’s governments more primed to regulate anything in my career of observing public policy, and I would be shocked if new, scary AI capabilities did not meaningfully change the policy conversation.
Conclusion
Currently available AI models are not dangerous, and it is unlikely that declining to pass SB 1047 now would leave policymakers with no time to act if future models prove dangerous. If future models pose catastrophic risks, there will be time to update public policy. The combination of beliefs one must hold to think otherwise defies reason. It is not literally impossible, but it is unlikely.
If you believe we are on an exponential trajectory to AGI, you should probably expect next-generation models to be the ones that begin demonstrating catastrophic potential. If models do not demonstrate those harms, it is not obvious to me that we need to make many changes at all to our regulatory regime. Sure, there are some holes we need to plug regarding deepfakes (I’ll have a report out on this soon, by the way). But as a general matter, better, faster, cheaper, and more fact-grounded LLMs will not shake the foundations of our civilization.
If, on the other hand, we are indeed on a short timeline to AGI, and radically new model capabilities are coming soon, we may well need a novel policy regime. But that is a story for another day.