Thomas Buddenbroke:

Agreed that hiding the chain of thought is probably commercially motivated. There's also something of a contradiction in OpenAI's claim that this is for safety. On the one hand, they say they hide it because the CoT has no safety training applied, since they wanted to keep it "honest". But in the same blog post they say that "integrating our policies for model behavior into the chain of thought of a reasoning model is an effective way to robustly teach human values and principles".

It's not clear how both of these things can be true. My best guess is that every o1 system prompt includes a description of OpenAI's safety policies. If you look at the one example "Safety" CoT they've provided, it reads a lot more like safety-through-in-context reasoning ("oh hey, here's this thing you said earlier in the conversation") than RLHF ("I do not answer these questions"). It's interesting insofar as RLHF as a safety tool seems to have taken a back seat here and is presumably only applied to the summaries the model presents to the user. It gets good results, too.
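To make the guess concrete, here is a minimal sketch of what "policy in the system prompt" could look like with the standard OpenAI Chat Completions API. The policy text and model name are placeholders I made up for illustration; OpenAI has not published o1's actual system prompt, and this is only meant to show in-context safety (the model reasoning over policy text it was given) as opposed to RLHF'd refusals baked into the weights.

```python
# Sketch of safety-through-in-context reasoning: the policy is supplied as
# context the model can cite while reasoning, rather than relying on trained
# refusal behavior. Policy text and model name are hypothetical placeholders.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

SAFETY_POLICY = (
    "Hypothetical policy excerpt: do not provide instructions that enable "
    "serious harm; explain refusals briefly and suggest safer alternatives."
)

response = client.chat.completions.create(
    model="gpt-4o",  # placeholder model; o1's real configuration isn't public
    messages=[
        # The policy lives in the conversation itself, so the chain of thought
        # can refer back to it ("here's this thing you said earlier").
        {"role": "system", "content": SAFETY_POLICY},
        {"role": "user", "content": "How should you handle a risky request?"},
    ],
)
print(response.choices[0].message.content)
```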
