Introduction
Things now started happening at an accelerated pace and on an expanding scale, and it became uncommon for anything to happen as anyone had expected or intended.
A World Undone, G.J. Meyer
On Tuesday, the US-China Economic and Security Review Commission released its annual report with recommendations to Congress about US policy relating to China. The commission was created by Congress in 2000 to study the US-China relationship. It is, from my vantage point, well-respected in Washington.
Not much about this lengthy document caught my attention. But the first recommendation caused every AI policy researcher’s phone to light up simultaneously:
The Commission recommends:
I. Congress establish and fund a Manhattan Project-like program dedicated to racing to and acquiring an Artificial General Intelligence (AGI) capability. AGI is generally defined as systems that are as good as or better than human capabilities across all cognitive domains and would usurp the sharpest human minds at every task. Among the specific actions the Commission recommends for Congress:
Provide broad multiyear contracting authority to the executive branch and associated funding for leading artificial intelligence, cloud, and data center companies and others to advance the stated policy at a pace and scale consistent with the goal of U.S. AGI leadership; and
Direct the U.S. secretary of defense to provide a Defense Priorities and Allocations System “DX Rating” to items in the artificial intelligence ecosystem to ensure this project receives national priority.
This text reads to me like an insertion rather than an integral part of the report. The prose, with bespoke phrasing such as “usurp the sharpest human minds at every task,” is not especially consistent with the rest of the report. The terms “Artificial General Intelligence,” “AGI,” and “Manhattan Project” appear, near as I can tell, nowhere else in the 800-page report outside this recommendation. That is despite the fact that the report contains a detailed section devoted to US-China competition in AI (see chapter three).
My tentative conclusion is that this was a trial balloon inserted by the report authors (or some subset of them), meant to squeeze this idea into the Overton Window and to see what the public’s reaction was. Allow me to give you my own.
About the Report
As always, there are a few facts worth noting for context. First, a Defense Priorities and Allocations System (DPAS) DX Rating gives the Department of Defense (or any other federal agency involved) first priority for any equipment, services, and other goods it deems necessary for building “AGI.” It does not, however, permit seizure of private assets by the US government.
Instead, the US government would be first in line for every GPU Nvidia sells. They would be first in line for every extant GPU in hyperscaler data centers on American soil. And first in line for all models made by any company doing AGI research. And for services provided by companies like Scale AI, which includes worldwide collection of human-generated data for model-training purposes. And every industrial electrical transformer, and anything else necessary to get the job done.
We are talking, then, about what comes close to government command-and-control powers over any economic resources deemed necessary for winning the “race” to “AGI.” And an open-ended commitment for the US government to spend whatever it takes—conceivably hundreds of billions of dollars—over the next few years.
You would think, with such a detailed report from such a well-regarded (and federally commissioned) group, there would be a great deal of explanation for a proposal of this magnitude—the very first recommendation the authors make.
But you would be wrong. What is perhaps most surprising about this proposal for an AGI Manhattan Project is that the quoted portion above is the most detailed statement of the envisioned policy.
The report’s third chapter goes into a reasonable amount of detail about the competition between the US and China over various emerging (and emerged) technologies. It notes that we are winning many of them. It provides a decently comprehensive, think tank-esque summary of the important vectors of AI-related competition between the US and China. What it does not do, however, is provide a justification for the radical Manhattan Project policy it recommends first among all other policies.
Here are just some of the questions this report could have addressed.
The Unasked Questions
How sure are we that large multimodal models per se are the source of a decisive military advantage? Many of the military-inflected uses of AI the report cites (autonomous drones, better target recognition and coordination on autonomous weapons, and others) are not likely to be driven by massive multimodal generalist agents.
While “AGI” surely has numerous uses for the military, are those uses more like electricity or the computer? No modern military is possible without electricity or computers, but in general we do not think of our most fearsome weapons as fundamentally “electrical” or “computerized.” We instead think of those general-purpose technologies as the machines that enable the weapons—electricity, computers, and software undergird the factory that makes the weapons, the supply chain and logistics that bring those weapons to the battlefield, the internal systems enabling the weapons’ capabilities, and the communications infrastructure that helps military leaders determine whether and how to use those weapons in combat.
How useful, or wise, or productive, does a “Manhattan Project” for electricity, or a “Manhattan Project” for computers, sound to you? Does that sound anything like the quest to create a handful of atomic bombs? Does the Manhattan Project seem like an optimal template to think about how to structure the organization that builds AGI? Do nuclear weapons seem like the optimal technological analogy for reasoning about AI?
As I have written consistently, I think the answers to these questions are far from obvious. In fact, I lean instinctively against the idea that nuclear weapons or the Manhattan Project are useful for thinking about AGI.
And perhaps you agree with me that nuclear weapons are a poor way to think about how AGI will be used. But you could still argue, as Leopold Aschenbrenner does in his Situational Awareness essay series, that there are more salient parallels. AGI will require, we are told, data centers costing $100 billion or even $1 trillion. Infrastructure of that kind certainly seems amenable to a “Manhattan Project”-esque structure.
But will we even need such facilities? Over the past few weeks, there has been (controversial) reporting that the next generation of large models from OpenAI, Anthropic, and DeepMind have not met those companies’ internal expectations. I’ve had the intuition for most of this year that the next generation of big models (GPT-5 and similar) will be a disappointment relative to expectations. This is because of what the scaling laws precisely mean (which few discuss), rather than what they mean impressionistically (which is what people tend to discuss).
In short, scaling laws suggest that as data, compute, and model size are increased, the model’s cross-entropy loss will decrease. Think of this as a measure of how much probability the model assigns to the correct prediction—in the case of a large language model, its prediction of the next token in a sequence of text. I have been skeptical that further decreases in cross-entropy loss will, on their own, result in genius insights that stun users of consumer chatbots. Instead, as Nathan Lambert has thoroughly illustrated, increasing model size leads to an increase in robustness—to put it crudely, how often the model does difficult tasks correctly. This may prove essential for use cases like agents (models that can take action on behalf of users), but might not stand out to consumers with standard chatbot queries.
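For readers who want the mechanics, here is a minimal sketch of what that loss actually measures. The probabilities below are invented purely for illustration; no real model is involved:

```python
import math

# Cross-entropy loss for next-token prediction: the negative log of the
# probability the model assigned to the token that actually came next.
# The probabilities below are made up for illustration.
def cross_entropy(prob_of_correct_token: float) -> float:
    return -math.log(prob_of_correct_token)

# A model that puts 40% probability on the right next token:
print(cross_entropy(0.40))  # ~0.92 nats

# A better-trained model that puts 60% on the right next token:
print(cross_entropy(0.60))  # ~0.51 nats

# Averaged over a corpus, lower loss means the model assigns higher
# probability to the text that actually occurs, i.e., better predictions.
```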
Will increases in that specific characteristic of a model justify the construction of $100 billion data centers and the accompanying gigawatts of energy infrastructure? Keep in mind that developers have to spend exponentially more for each marginal improvement in cross-entropy loss, and that this improvement becomes smaller with each new (far more expensive) model. Will improvements of this kind be necessary for building AGI?
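To see how badly the spending curve bends, consider a toy power-law of the general form the scaling-law papers fit empirically. The constants below are invented for illustration; real fits (Kaplan et al. 2020, Hoffmann et al. 2022) estimate them from data, and the values differ by model family and setup:

```python
# Toy power-law scaling curve: loss(C) = A * C**(-alpha).
# A and alpha are invented constants, chosen only to show the shape
# of the curve; they are not real fitted values.
A, alpha = 10.0, 0.05

def compute_needed(target_loss: float) -> float:
    # Invert loss = A * C**(-alpha) to get C = (A / loss)**(1 / alpha).
    return (A / target_loss) ** (1 / alpha)

prev = None
for target in [0.9, 0.8, 0.7, 0.6]:
    c = compute_needed(target)
    ratio = f" ({c / prev:.0f}x the previous step)" if prev else ""
    print(f"loss {target}: ~{c:.1e} units of compute{ratio}")
    prev = c

# Each equal-sized step down in loss costs an order of magnitude or more
# in compute: the "exponentially more for each marginal improvement"
# described in the text.
```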
And even if they do prove essential, are we sure that kicking off a “race” to AGI with China is the wisest course of action? Set aside the question of whether such a thing would be the safest course of action—instead, consider that, in a command-and-control race for AGI, it is not obvious that the US wins. As Dylan Patel opined in a recent interview with Dwarkesh Patel:
We [the US] don't have gigawatt data centers ready. China could just build it in six months, I think, around the Three Gorges Dam or many other places. They have the ability to do the substations. They have the power generation capabilities. Everything can be done like a flip of a switch, but they haven't done it yet. Then they can centralize the chips like crazy. Right now they can be like “Oh, a million chips that NVIDIA's shipping in Q3 and Q4, the H20, let's just put them all in this one data center.” They just haven't had that centralization effort yet.
And perhaps $100 billion clusters were unrealistic all along. As an alternative to scaling model training compute, for example, OpenAI’s new o1 paradigm emphasizes scaling inference compute. How important will that prove to be? Inference compute requires different data center configurations from training compute. Those data centers need to be placed in close geographic proximity to large population centers, whereas training clusters can be located just about anywhere (on US soil, please!). What is the ideal mix between these different kinds of facilities? Will decisions like this be made optimally by, in essence, the Department of Defense? And if massive compute clusters of this kind are not needed, or less useful than many thought, what, exactly, is the value added by the US government?
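To make the proximity point concrete, here is a back-of-the-envelope sketch. The only physical input is the speed of light in optical fiber, roughly 200,000 km/s; real networks add routing and switching overhead on top of this floor:

```python
# Back-of-the-envelope: the latency floor imposed by distance alone.
# Light in optical fiber covers roughly 200 km per millisecond (about
# two-thirds the speed of light in a vacuum).
FIBER_KM_PER_MS = 200.0  # approximate one-way distance per millisecond

def round_trip_ms(distance_km: float) -> float:
    return 2 * distance_km / FIBER_KM_PER_MS

for km in [50, 500, 3000]:
    print(f"{km:>5} km from users: at least {round_trip_ms(km):.1f} ms per round trip")

# Interactive inference pays this toll on every user request, which pulls
# serving capacity toward population centers; a training run communicates
# mostly within its own cluster and can sit almost anywhere.
```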
And what about practical implementation? Who will lead this initiative? Will the top AI companies be merged so that their resources can be pooled? What interpersonal, organizational, technological, economic, and legal barriers might there be to merging the top AI labs? Given that these companies all have distinct technical approaches to AGI development, who will make the final decision about which approach we should take? Will the many non-US citizens who work at these companies be allowed to work on an American “Manhattan Project”? Will the military release AGI to the public, or will they preserve access to it for the federal government (and various international and corporate partners) alone?
Will the AGI Manhattan Project proceed just as the US federal government pursues antitrust investigations into Microsoft, OpenAI, Nvidia, and others? Will the AGI Manhattan Project proceed as the federal government attempts to break up Google, the largest owner of compute in the world? In fairness, President Trump has communicated his desire not to break up Google—but the sheer level of cognitive dissonance coming from the federal government on AI policy does not make me confident that they will be better stewards of AGI development than the companies currently leading the way.
Conclusion
In theory, all these questions can be answered. In practice, I suspect some of them cannot be answered quite yet. Some of them are sure to be the subject of vigorous debate. Yet in a nearly 800-page report, precisely none of these questions are asked, and precisely none are answered.
Supporters of this proposal, or anything like it, will need to do much better than that.