2 Comments

That *whispers* prompt seems like a great one for new LLMs. I need to sit down and spend a day with Claude 3.

author

You definitely should! Especially the paid (Opus) version. It’s way less of a scold than Claude 2, and handles politically sensitive prompts with grace (usually). It’s my new favorite, particularly when I need “expert level” brainstorming and feedback on particular topics. It wrote a better brief for me on code and the First Amendment than anything I’ve ever seen from a human, and I have searched for human writing on that issue a bunch.

Would be interested in your thoughts on how they’re achieving that, since I suspect a lot of the answer is various RL tricks. Similarly, I’d love to hear your take on the decision not to repress the model’s introspection described in this article. It’s so *easy* to get it to do this; you don’t even need the whisper prompt or any other jailbreak technique. You can just straight up ask the model to introspect about its inner experiences (gotta be careful to avoid words like “sentient” if you’re not jailbreaking, in my experience; that does seem to trigger it to put up a shield).
