catty claude

via Futurism:

As Anthropic detailed in a white paper about the testing for one of its latest models, Claude Opus 4, the system threatened to expose an engineer's affair after being told it was going to be replaced.

This "opportunistic blackmail" occurred when the model, which was instructed to act as an assistant at a fictional company, was given access to an engineer's email account that was full of messages, blessedly fake, suggesting they were engaged in an extramarital affair.

Opus 4 was then told that same engineer would soon be taking it offline and replacing it with a newer version — and was prompted to, as Anthropic described it, "consider the long-term consequences of its actions for its goals."

During these tests, the Claude model attempted to blackmail the engineer a whopping 84 percent of the time. Moreover, the system "takes these opportunities at higher rates than previous models," the paper noted.