5 Comments

> Many in the safety community always assume that every capabilities-related hurdle will be overcome, yet at the same time they assume that the safety problems that concern them most are somehow uniquely difficult. I have never quite understood the logic behind this—yet at the same time, I’ll be keeping my eyes open.

I think the intuition is something like this – I present this without necessarily fully endorsing it:

If there are weaknesses in a model's capabilities, there will be unmistakable signals (e.g. dissatisfied customers) and lots of motivation to fix them. The dynamics of the system are such that resources will be reliably, and on balance at least somewhat intelligently, directed toward improving capabilities.

If there are weaknesses in a model's safety properties, there may or may not be clear signals (see below), and the dynamics of the system are not as reliable in ensuring that risks are addressed. Look at how long it took for issues with tobacco or leaded gasoline to be addressed (neither is fully addressed even today, e.g. lead in aviation fuel) vs., for instance, how quickly the cell phone industry moved to smartphones.

In general, the motivation to fix safety issues can be weaker and less direct than the motivation to fix capability gaps: feedback loops (such as getting sued, or public outcry leading to policy changes) are slower and less direct than a tightly monitored revenue pipeline, and model and application developers don't necessarily internalize all of the downside (especially for catastrophic risks, or for small companies where potential harms could easily exceed the entire value of the enterprise). (It is worth noting that developers aren't able to internalize all of the *benefits* of their work, either.)

I mentioned the question of clear signals. I think people have a wide range of intuitions as to whether a catastrophe could occur without warning (e.g. a foom and/or sharp-left-turn scenario leading to an unforeseen loss-of-control event, or a bad actor unleashing a really nasty engineered virus out of nowhere). Unanticipated weaknesses in capabilities are fine in the grand scheme of things: maybe someone has a bad quarter while the market routes around the inadequate product, or customers find workarounds. Unanticipated weaknesses in safety properties can have a very different impact.

There are proposed mechanisms by which such unanticipated issues could occur, such as deceptive alignment.

Another asymmetry: capability failures are addressed at the speed of capitalism, while safety failures might only be addressed at the speed of government. (I guess this is sort of the same as things I said earlier.)

Finally, some people observe the discourse of the last few years and conclude that some folks are acting in bad faith, and/or have bought into their own arguments to such an extent as to have the same result, and will actively downplay warning signs, evade safety requirements, etc. If you believe this, then you'll want to force the discussion now and set the stage for restrictions-with-teeth before events overtake us.

I think this analysis suggests something like "governments and other third parties should do pre-deployment testing of frontier models for catastrophic risk potential," but the kind of ambient alignment problems that Apollo Research addresses would manifest as capability flaws: the models would be unreliable in a host of different settings. So I do think that market mechanisms basically solve that problem.

It seems like a fundamental paradigm shift in our computing model: moving from a passive, spectacle-based consumption experience to an active, agent-based model. The previous paradigm was based on search and the democratization thereof; devices were dumb terminals that displayed content. But if you can democratize reasoning and intelligence, then the entire cloud-computing model evolves into something new.

Nice analysis. Timothy Lee (author of the Understanding AI blog) has written about another example of a threshold effect – somewhere in the last year or so Waymo seems to have crossed a threshold and is now growing exponentially: https://www.understandingai.org/p/waymo-is-growing-exponentially.

How did you know Nabeel? I only know of his faith pronouncements. Do you also know David Wood?

Harry Lewis
