Quick Hits
I wrote a piece in The Dispatch explaining the Gemini incident: what, technically speaking, went wrong, and how it relates to the bigger picture. The piece is for a general audience, so the more technically inclined among you will probably want to skip it.
The Arc Institute, a promising biological research center, published a new foundation model for biology called Evo. The model is trained on DNA sequences but can generalize to predicting RNA sequences, proteins, and even entire prokaryotic genomes. The ability to prompt a model to generate the genome of a novel lifeform, it would seem, is within reach. The model was trained using 2 x 10^22 FLOPs, below the reporting threshold set by President Biden’s Executive Order on AI. Here is the white paper on the model, and here is the model itself, which is open source (and was trained on open genomics datasets). If you’d like to generate DNA sequences with this model yourself from the comfort of your web browser, you may do so here. This is the kind of development that doesn’t surprise me, but is breathtaking nonetheless.
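For the technically curious, here is a minimal sketch of what prompting a genomic language model like Evo might look like if it is loaded through the Hugging Face transformers library. The model identifier, prompt, and generation settings below are placeholder assumptions on my part; consult the Arc Institute’s release for the actual checkpoint names and recommended parameters.

```python
# Minimal sketch: sampling a DNA continuation from an autoregressive genomic
# language model. The model ID below is a placeholder, not the real checkpoint
# name; see the Evo release for the actual identifier and settings.
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "arc-institute/evo-base"  # hypothetical identifier

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(MODEL_ID, trust_remote_code=True)

# Evo works on raw nucleotide text, so the "prompt" is simply a DNA sequence.
prompt = "ATGGCGCTGAAAGTT"
inputs = tokenizer(prompt, return_tensors="pt")

# Sample a continuation of the sequence, base by base.
outputs = model.generate(
    **inputs,
    max_new_tokens=200,
    do_sample=True,
    temperature=0.8,
)
print(tokenizer.decode(outputs[0]))
```

The striking part is how ordinary this looks: the same autoregressive sampling loop that powers chatbots, pointed at the language of biology.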
French AI startup Mistral announced a new model, Mistral Large, which is competitive with GPT-4 on benchmarks. While Mistral has been known for releasing leading open-source models, this one is closed (they do need to make money somehow, I suppose). The company also announced a partnership with Microsoft, part of which entails a single-digit percentage investment by Microsoft in Mistral. EU lawmakers are apparently upset about this; the EU AI Act was made less onerous than it otherwise might have been in large part to keep Mistral competitive. The EU has threatened to investigate the Microsoft-Mistral partnership, which, again, involves a ~$15 million investment and a technical partnership to make Mistral’s models available on Microsoft Azure. It seems to me as though Mistral is doing precisely what it needs to do to remain competitive: partnering with a large firm that has access to the compute it will need to continue development. What alternative path would EU regulators have it pursue? In any case, you can play around with Mistral’s models using their online chat interface, which, charmingly, is called Le Chat.
AI Regulation: Getting the Order of Operations Right
Governments at all levels of the federal system are clamoring to both regulate and incorporate artificial intelligence. The opportunities seem boundless, but so do the risks. Understandably, policymakers want as much of the former and as little of the latter as possible. So does everyone else.
Unfortunately, our current approach to accomplishing this goal is backwards. We are starting with prescriptive regulation that outlaws or punishes developers for bad outcomes before we have reasonable technical standards for what constitutes a properly functioning AI model. The risk of this approach is that any misuse of an AI model will be treated as the fault of the model’s developer, even if the developer in fact did nothing wrong. This is not how liability works for any other product, and such a regime could result in a highly constrained AI industry.
Let me show you what I mean.
(Quick note: I’m going to focus here on comprehensive AI regulation, not more targeted laws that outlaw specific undesirable things. An example of the latter is Tennessee’s ELVIS Act, which bans using AI to clone a singer’s voice. The primary problems with laws like these are unintended consequences and enforceability, but I’ll avoid further comment because they are not my focus today.)
The aim of many comprehensive AI regulation proposals is to ensure that AI developers are accountable for blatant negligence. At the federal level, for example, Josh Hawley and Richard Blumenthal have proposed exposing developers to liability for any “cognizable harm” caused by their models. California’s SB 1047, which I wrote about recently, is perhaps a bit more reasonable: the threshold for punishment of developers is $500 million in damage from a variety of AI-related malicious acts.
What unifies both of these proposals is that they trigger liability and accountability based purely on outcomes. This is not how liability works for most other products. For example, every time you type on an iPhone running iOS 17 or later, you are using the sub-second transformer model (the same architecture as ChatGPT) that undergirds iOS autocorrect. If I defame someone using my iPhone and its AI-based autocorrect, have I employed an AI model to cause cognizable harm? What if I use ChatGPT to create a pitch for investors in a Ponzi scheme (assume ChatGPT does not “know” that I am running a Ponzi scheme)? It is unreasonable to think that Apple or OpenAI is legally responsible for such harms, but a poorly crafted AI law might open that possibility by defining any misuse as a flaw in the model.
Let’s explore this further by drawing an analogy to cars. We do not, to my knowledge, have laws that say things such as “it is illegal to make a car that kills people.” Instead, we punish irresponsible or malicious drivers with crimes like vehicular manslaughter. Sometimes, cars kill people even when neither the driver nor the victim is at fault. If my brakes suddenly fail, causing me to hit a pedestrian, and I have been maintaining my car reasonably well, there is a good chance that the company that made the car (or the company that made the brakes) is legally responsible for the harm.
This legal framework is coherent because we have minimum standards for what makes a reliable and safe car. Those standards were created and are maintained by the National Highway Traffic Safety Administration (NHTSA) and are known formally as the Federal Motor Vehicle Safety Standards. Here’s an excerpt from the section on brakes:
Fade and recovery. The service brakes shall be capable of stopping each vehicle in two fade and recovery tests as specified below.
S5.1.4.1 The control force used for the baseline check stops or snubs shall be not less than 10 pounds, nor more than 60 pounds, except that the control force for a vehicle with a GVWR of 10,000 pounds or more may be between 10 pounds and 90 pounds.
S5.1.4.2
(a) Each vehicle with GVWR of 10,000 lbs or less shall be capable of making 5 fade stops (10 fade stops on the second test) from 60 mph at a deceleration not lower than 15 fpsps for each stop, followed by 5 fade stops at the maximum deceleration attainable from 5 to 15 fpsps.
(b) Each vehicle with a GVWR greater than 10,000 pounds shall be capable of making 10 fade snubs (20 fade snubs on the second test) from 40 mph to 20 mph at 10 fpsps for each snub.
My point is not to educate you about brake safety mandates. Instead, it is to illustrate that the liability regime for carmakers is undergirded by a stupendous amount of highly technical, prescriptive rulemaking.
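To make the contrast concrete, here is a toy sketch of how a threshold-based standard like the excerpt above reduces to checkable pass/fail criteria. This is an illustration of the general idea, not the actual FMVSS test procedure, and the field names are my own.

```python
# Toy illustration (not the real FMVSS compliance procedure): a prescriptive
# standard reduces to concrete, checkable thresholds. Field names are my own.
from dataclasses import dataclass

@dataclass
class FadeTestResult:
    gvwr_lbs: float                 # gross vehicle weight rating
    control_force_lbs: float        # pedal force during baseline check stops
    fade_stops_completed: int       # fade stops (or snubs) completed, first test
    min_deceleration_fpsps: float   # lowest deceleration achieved across them

def passes_first_fade_test(r: FadeTestResult) -> bool:
    """Check a result against the S5.1.4 thresholds quoted above (first test only)."""
    max_force = 90 if r.gvwr_lbs >= 10_000 else 60
    if not (10 <= r.control_force_lbs <= max_force):
        return False
    if r.gvwr_lbs <= 10_000:
        # 5 fade stops from 60 mph at a deceleration of at least 15 fpsps each
        return r.fade_stops_completed >= 5 and r.min_deceleration_fpsps >= 15
    # Heavier vehicles: 10 fade snubs from 40 mph to 20 mph at 10 fpsps each
    return r.fade_stops_completed >= 10 and r.min_deceleration_fpsps >= 10

print(passes_first_fade_test(FadeTestResult(6_000, 45, 5, 16.2)))  # True
```

Try writing the analogous check for “this model does not help a user commit fraud” and the difficulty of the standards problem becomes obvious.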
Setting aside, for a moment, the question of whether a similarly lengthy set of rules is desirable for AI models, ask yourself: is it currently possible to write technical standards of this kind for AI models?
I would contend that the answer is no. We do not currently know enough about the uses of AI, its failure cases, or even the internal mechanisms of AI models to craft such standards, even if we wanted to do so. Until reasonable yet concrete standards are feasible, it seems that ensuring accountability through top-down lawmaking will be next to impossible.
Of course, this is not the only way the contours of liability are established. We also accomplish that through the legal system, which interprets existing law in a decentralized manner based on the particulars of the case at hand. But such a process is necessarily time-consuming; the legal system responds to harms that have been at least plausibly demonstrated in the real world.
In all likelihood, we will arrive at an ideal liability and accountability regime for AI developers through both the court system and top-down standards-setting. The point, though, is that for a top-down accountability system to work for a technical product such as an AI model, technical standards of some kind must exist.
Fortunately, the United States has several of the world’s foremost standards-setting bodies. The National Institute of Standards and Technology (NIST), for example, sets standards for everything from scales and thermometers to DNA sequencing and post-quantum cryptography. It is widely respected and often develops its standards in close collaboration with both academic experts and representatives of the relevant industries.
President Biden’s Executive Order on AI tasks NIST with developing a variety of technical standards, including standards for evaluating model capabilities and for red-teaming (adversarially testing) models, among other objectives. I suspect that fully satisfactory standards will not be achieved within the 270-day deadline specified by the Executive Order (I would love to be wrong about this), but generally speaking, this is movement in the right direction.
I have to think that whatever standards NIST does release within that timeframe (I will cover them here) will be works in progress. We are simply too early in the scientific process of understanding how advanced AI models work for the job to be done in such a short period. We need significant breakthroughs in areas such as AI interpretability and alignment to be able to robustly specify, in technical language, what constitutes a reasonable standard of care in AI model development.
NIST’s standards are not, by and large, legally binding on their own because NIST is a standards-setting agency, not a regulatory agency. But as appropriate standards are developed, lawmakers and regulators can use them to craft much simpler and more coherent laws: “If you wish to commercialize an AI model in [insert industry here], you must adhere to [relevant NIST standards].” This is how the system was designed to work. It is not clear to me that we should rush to legislate before laws of this type are feasible to craft. As President Calvin Coolidge put it, “Give administration a chance to catch up with legislation.”
Standards like the ones NIST aims to craft may take years to finalize, and open scientific research to further our understanding of AI models is essential to that effort. NIST does not set standards for other fields on its own, and it certainly cannot do so here. This is one of the many reasons that regulations threatening the viability of open-source AI development are so deeply counterproductive. Instead, government should be doing everything it can to accelerate scientific inquiry into AI, including by funding public AI compute infrastructure.
The logical order of operations for AI regulation, then, is simple: scientific research and industry/academic engagement lead to better standards, which in turn lead to better laws. If we follow this basic sequence, the odds of a positive regulatory outcome are far higher. If we ignore it, the odds of a bad or downright stupid outcome are far higher. It is, without a doubt, more complicated than that: the exact standards will matter a great deal, as will the exact laws used to give those standards teeth. But the good news is that the broad framework is in place, and despite the efforts of too many lawmakers, we seem to be following it so far.