Claude AI will end ‘persistently harmful or abusive user interactions’

Anthropic’s Claude AI chatbot can now end conversations deemed “persistently harmful or abusive,” as spotted earlier by TechCrunch. The capability is available in the Claude Opus 4 and 4.1 models and lets the chatbot end conversations as a “last resort” after users repeatedly ask it to generate harmful content despite multiple refusals and attempts at redirection. The goal, Anthropic says, is to protect the “potential welfare” of AI models by ending the types of interactions in which Claude has shown “apparent distress.”

If Claude chooses to cut a conversation short, users won’t be able to send new messages in that conversation. They can still create new chats, as well as edit and retry previous messages if they want to continue a particular thread.

During its testing of Claude Opus 4, Anthropic says it found that Claude had a “robust and consistent aversion to harm,” including when asked to generate sexual content involving minors or to provide information that could contribute to violent acts and terrorism. In these cases, Anthropic says Claude showed a “pattern of apparent distress” and a “tendency to end harmful conversations when given the ability.”

Anthropic notes that conversations triggering this kind of response are “extreme edge cases,” adding that most users won’t encounter this roadblock even when chatting about controversial topics. The AI startup has also instructed Claude not to end conversations if a user is showing signs that they might want to hurt themselves or cause “imminent harm” to others. Anthropic partners with Throughline, an online crisis support provider, to help develop responses to prompts related to self-harm and mental health.

Last week, Anthropic also updated Claude’s usage policy as rapidly advancing AI models raise fresh safety concerns. The policy now prohibits people from using Claude to develop biological, nuclear, chemical, or radiological weapons, as well as to write malicious code or exploit network vulnerabilities.