Here's a narration of this post. Looking forward to the "monster of an essay"!

https://open.substack.com/pub/askwhocastsai/p/the-openai-pastiche-edition-by-dean

The other thing that just happened, of course, is the o1 release. If an exec is ready to take a break after several years (with the last 12 months being particularly stressful), it makes sense that they would see a major project through to completion before leaving. McGrew explicitly cited this in his departure note. Of course, it could also be a convenient excuse covering for other reasons, but that just reinforces your point: there are lots of things that could be going on, and we can't really know from the outside.
author

Yes! I should have mentioned this.

I do wonder how converting from a nonprofit to a for-profit is legal. But something being for-profit doesn't automatically make me more worried that it will be evil or reckless.

Agreed that hiding the chain of thought is probably commercially motivated. There's also something of a contradiction in OpenAI's claim that it's done for safety. On the one hand, they say they hide it because the CoT has no safety training, since they wanted to keep it "honest". But in the same blog post they say that "integrating our policies for model behavior into the chain of thought of a reasoning model is an effective way to robustly teach human values and principles".

It's not clear how both of these things can be true. My best guess is that every o1 system prompt includes a description of OpenAI's safety policies. If you look at the one example "Safety" CoT they've provided, it reads a lot more like safety through in-context reasoning ("oh hey, here's this thing you said earlier in the conversation") than like RLHF ("I do not answer these questions"). It's interesting insofar as RLHF as a safety tool has quickly taken a back seat and is presumably applied only to the summaries the model presents to the user. It gets good results, too.