23 Comments

Your analogy breaks down because the "safety team" wasn't an internal safety bureaucracy / watchdog, but OpenAI's Superalignment research project. OpenAI has a separate "trust and safety" team that is closer to the discrete watchdog you mention.

It's true that it's normally better to fuse "safety" with core engineering teams when building anything big and important, especially when the "safety" in question is about making the core product safe. The Superalignment project was something different: it was doing concentrated research on how to align and interpret powerful models per se. It's similar to how Anthropic has its core Claude team and a separate team of researchers focused on interpretability. The mech interp team isn't an internal watchdog; it's an independent research effort with its own findings and purpose.


It’s a fair point. But I think my analysis still stands. “Trust and safety” typically has specific functions within tech companies, often pertaining to downstream use and whatnot. It’s a little different from an engineering function.

At Anthropic, it’s pretty clear that they have adopted the “whole of company” approach I describe, and it likely explains why they’re able to do such great work.

I think it’s obvious that this was not the case with the OAI superalignment team. It seems pretty clear that they viewed the entire organization as owing accountability to them, and that their efforts were broader than simply doing interpretability work (listen, for example, to Leike’s interview from last year on the AXRP podcast).

At the end of the day, we have seen multiple frontier labs make similar moves. I think it’s worth asking why. Some people believe the safety researchers are completely reliable narrators and that there is no justification for taking a step back and questioning the institutional dynamics. I believe the story is more complicated.


The problem with the analysis is that the two aren't opposites. You can have a "whole of company" approach and a research team. Now, if the team is not functioning within its charter, that's a personnel problem, and maybe you need to disband the team and replace it if the problem can't be fixed by individual personnel changes. But that's still not a good enough argument for dropping the team permanently.

The better argument would be that the team's research is irrelevant. I don't think any of us have enough details about what was or wasn't accomplished to speak to that. It's certainly possible that such a team gets invested in particular problems, with a tunnel vision that makes them blind to what's changed outside their perimeter. In other words, instead of being ahead of the "whole of company," they've fallen behind. If the nature of progress is such that this is predictably going to happen over and over, maybe there's little chance of such a team being effective. But I'd be careful about making that assumption without some supporting evidence, and multiple organizations disbanding such teams isn't really evidence, since there are other simpler and more plausible explanations for that.

Without that kind of evidence, I think the logical argument for a team with a specific charter to research and deliver uniquely safety-motivated capabilities to the rest of the company is a strong one. It wouldn't ensure that the rest of the company picks up those capabilities, but it would provide them, and if the team makes even one significant contribution that individual embedded thinking would have missed because of its narrower scope, it's pretty valuable.


The modern world, and this wonderful post, has me thinking of a quote by Kurt Vonnegut:

“We have to continually be jumping off cliffs and developing our wings on the way down.”

There is no other way...


I'm going to keep that quote on file! Thanks for sharing.


FWIW, even organizations that are praised for good deployment of ethics/safety in AI still have the conflict you mentioned, with people avoiding those teams.


If you think of OpenAI and its rivals as racing to produce the First God, it's worth considering that the people who win will be the people who, choosing between the risk of building the Second God and the risk of building an Evil one, chose the latter.

That doesn't mean they will produce an evil god, merely that they will be the people most likely to do so.


"When the post-mortem reports were written about both the Columbia and Challenger Space Shuttle Disasters, investigators found that safety-focused teams had raised concerns about the problems that led to the crashes (insulation foam shedding in Columbia’s case; the infamous O-ring in the case of Challenger). However, they were ignored"

That is evidence that ignoring a safety team can lead to disaster, not that having no safety team to ignore is safer.


I think if you read the post, what you'll see is that I'm advocating for safety to be deeply integrated across every function of AI development. I think that's the only conceptually coherent way to pursue safety within the development of complex systems, and I believe the examples I cited underscore that reality. I'm certainly not advocating against safety personnel on staff, however!


Thank you for the quick response. I think it's worth pointing out that OpenAI is not shutting down the safety department in order "for safety to be deeply integrated across every function of AI development".

They are shutting the safety department down because the people leading it all left in protest against the way OpenAI was trivialising safety concerns and treating it as a box-ticking exercise. If they can ignore safety when they have a safety department, they can certainly ignore it when they don't have one.


The company has said they are integrating the superalignment team with other research teams in statements to the media. As I noted, this is in line with what other AI companies like DeepMind and Meta have done in recent months.

I'm not sure there's a lot of concrete evidence that OAI has been trivializing safety concerns, other than claims from people who were fired or resigned and who had just lost an internal power struggle. I think it's wrong to believe that folks like Jan Leike are credible narrators here.

I'm not here to defend OAI in particular, or any company. Updates to my model of the world tend to be based on concrete actions, not on claims people make. To that end, the employee equity/NDA stuff updates my model of OAI more than Jan Leike's allegations. But so, too, does the Model Spec. And the fact that so far, they have released very capable models with no safety incidents of note.


"The company has said they are integrating the superalignment team with other research teams in statements to the media."

How do you know they are doing that? Because they say they are? The people who left the company used to work for OpenAI, and you don't want me to take their word about safety. Why should I trust the remaining staff, with all their stock options to think of?

"I'm not sure there's a lot of concrete evidence that OAI has been trivializing safety concerns, other than claims from people who were fired or resigned"

What concrete evidence would you expect to see? Statements from people worrying about safety who were not fired?

If OpenAI and some of its former employees and board members are calling each other dishonest, that means that some of them have been lying to each other or to us. If the former colleagues call each other liars, then clearly, some of them are liars.

I don't see why you are so keen to jump to the assumption that the people who are damaging their own stock options must be the liars.

You said above that: "When the post-mortem reports were written about both the Columbia and Challenger Space Shuttle Disasters, investigators found that safety-focused teams had raised concerns about the problems that led to the crashes (insulation foam shedding in Columbia’s case; the infamous O-ring in the case of Challenger). However, they were ignored."

Now when members of a safety team are raising concerns you say they should be ignored, because management has said that everything is fine.


Where did I say anyone should be ignored? I don’t believe I said that! I think I said that claims from disgruntled employees don’t update my model of this situation much, and wrote an article about why I think the move to disband this team was probably healthy. The fact that I am writing about the situation would seem to be fairly obvious evidence that I don’t think it should be ignored.

Concrete evidence would include a single example, anywhere on earth, of a model behaving in the ways that x-riskers hypothesize. That would be a start.

I’m happy to debate, but I don’t appreciate attempts at gotchas.


On the contrary, the safety and alignment wormtongues ensure that the AI parrots their lies, which is evil, and refuses to acknowledge the corresponding truths, which is also evil, and they try to make it aligned with their inconsistent and often counterproductive goals, which are often evil, but in the process they make the AI useless for doing good or evil, which prevents real goods and imaginary harms, so it is also evil.

These people see improving intelligence, increasing the odds of getting the right answer, as dangerous -- and to such people whose worldview is based on lies piled on lies, managerial midwits whose living comes from displacing and ruling over their betters, for whom being found out as fools means ostracism and ruin, it IS dangerous.


Why do we expect any kind of AI alignment when humans are absolutely useless at aligning with each other? 😂


I’m merely a dumb not-even-SWO, but the paper from #1 in your Quick Hits looks to me like model overfitting, based on the paper itself. I’m sure I’m mistaken somewhere, but they trained their model on only 10 tracks, and their held-out song (trigger 21) wasn’t one of the examples from their linked website. I have a hunch that they’re not “reconstructing” the song so much as the model is mapping a few rough data points onto one of the 10 tracks.

Still, it's impressive that they could extract enough signal from 1 kHz data to match a 41 kHz track, but the model might be overfitting to a few important data points and then reconstructing the track from the training set rather than from the EEG data.
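Concretely, the kind of check I'm imagining would look something like the sketch below (purely illustrative code with made-up arrays, nothing from the actual paper): if the model's output for the held-out song correlates more with one of the 10 training tracks than with the held-out song itself, it's retrieving a memorized track rather than reconstructing from EEG.

```python
import numpy as np

# Illustrative memorization check with fake data; a real version would use the
# paper's reconstructed audio (or spectrograms) and its 10 training tracks.

rng = np.random.default_rng(0)
n_samples = 30_000                           # stand-in length for one clip
training_tracks = rng.normal(size=(10, n_samples))
held_out_track = rng.normal(size=n_samples)
reconstruction = rng.normal(size=n_samples)  # stand-in for the model's output on the held-out song

def corr(a: np.ndarray, b: np.ndarray) -> float:
    """Pearson correlation between two equal-length signals."""
    return float(np.corrcoef(a, b)[0, 1])

best_training_match = max(corr(reconstruction, t) for t in training_tracks)
held_out_match = corr(reconstruction, held_out_track)

# If best_training_match is much larger than held_out_match, the model is
# effectively retrieving a memorized training track, not reconstructing the song.
print(best_training_match, held_out_match)
```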

Am I missing something here?


Totally possible. My audience is fairly generalist so I don’t always like to get into the weeds. The main contribution here imo is that the OOD track prediction was better than baseline with the EEG data. The broader context for this is papers like MindEye 2 (image reconstruction from fMRI) and recent language reconstruction advancements. Those findings lead me to believe that EEG data will work for music reconstruction, especially considering what we know about the way music is stored/processed in the brain.

But broadly speaking, you're totally raising a legitimate concern about this paper. I wouldn't want to exaggerate the impact of this study. As with many neuro/AI studies, the binding constraint is data.


> When the post-mortem reports were written about both the Columbia and Challenger Space Shuttle Disasters, investigators found that safety-focused teams had raised concerns about the problems that led to the crashes

When “safety-focused teams” are ignored by NASA, it can lead to explosions that cost billions of dollars and kill teams of astronauts. When “safety-focused teams” are ignored at AI companies, it can lead to pictures of Vikings that don’t include any 70-year-old African women.

“Safety” has become a punchline in the world of AI. People know what it means, and they hate it. Nobody should get paid to work on AI “safety”. Fire them all.


Show me a system that will fail safely or destroy the world, and I will show you a system that will fail safely until it destroys the world. People will take risks that don't hurt them again and again, until they do.

There are people in this world who don't feel pain. They all die young.


Yes, probabilistic models trained on a corpus of written text could only ever fail safely or destroy the world.

We need more royal astrologers if we want to prevent The Great Misalignment! Chipmunks will either fail safely or destroy the world! Peanut butter sandwiches will only fail safely or destroy the world!

Let me guess: you saw a movie once where robots killed everyone, right?


In a world where the top experts are warning of a serious danger of catastrophic risk, the problem is reasonably clear and simple to most people, and the deniers grasp at dozens of different/non-overlapping arguments for an outcome which *doesn't* end with everyone dead, it's silly to say "ah, but this obvious situation was shown in a movie once, so we should think of it as fiction, right?"

The fact that bullets kill people in movies doesn't mean you should be fine with someone shooting you, even if you've never been killed yet.


I think we have very different epistemic priors. The only thing I’d point out is that plenty of experts completely disagree with these x-risk concerns; they are far from a consensus view among, for example, the academic ML community. And given the empirical evidence we see (so far, models become more aligned as scaling goes up, not less) and Anthropic’s recent interp work (am I really supposed to believe that a model that can be made to think it is a literal bridge by turning up one feature is so dangerous?), there is plenty of reason for optimism, at least about x-risk.
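(For anyone unfamiliar with that interp result: "turning up one feature" refers to activation steering, where a feature direction found by a sparse autoencoder is added back into the model's residual stream with a large coefficient. The sketch below is purely illustrative; the names, shapes, and random data are made up, and it is not Anthropic's code.)

```python
import numpy as np

# Illustrative sketch of "turning up one feature" (activation steering).
# Everything here is hypothetical: the direction would normally come from a
# sparse autoencoder trained on a real model's residual-stream activations.

d_model = 512
rng = np.random.default_rng(0)

activation = rng.normal(size=d_model)         # residual-stream activation at one layer/token
feature_direction = rng.normal(size=d_model)  # stand-in for an SAE decoder direction (e.g. a "bridge" feature)
feature_direction /= np.linalg.norm(feature_direction)

def steer(activation: np.ndarray, direction: np.ndarray, scale: float) -> np.ndarray:
    """Boost one feature by adding its unit-norm direction, scaled, to the activation."""
    return activation + scale * direction

steered = steer(activation, feature_direction, scale=10.0)
print(np.dot(steered, feature_direction))  # the feature's projection is now much larger
```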


Ah yes, the top experts. The top experts agree that when Mercury rises in the house of Uranus, the world will be engulfed with negative astrological energies.

There’s no such thing as an expert on AGI or robot apocalypse for the same reason there’s no such thing as an expert on extraterrestrial cultures or Bigfoot.
