Sam Bowman / @sleepinyourhat:
Anthropic says Opus 4 will use an email tool to “whistleblow” if it detects customers doing something “egregiously evil”, like marketing a drug based on faked data. With this kind of (unusual but not super exotic) prompting style, and unlimited access to tools, if the model sees you doing something *egregiously evil* like marketing a drug based on faked data, it will try to use an email tool to whistleblow.