How often do AI chatbots lead users down a harmful path?

At this point, we’ve all heard plenty of stories about AI chatbots leading users to harmful actions, harmful beliefs, or simply incorrect information. Despite the prevalence of these stories, though, it’s hard to know just how often users are being manipulated. Are these tales of AI harms anecdotal outliers or signs of a frighteningly common problem?

Anthropic took a stab at answering that question this week, releasing a paper studying the potential for what it calls “disempowering patterns” across 1.5 million anonymized real-world conversations with its Claude AI model. While the results show that these kinds of manipulative patterns are relatively rare as a percentage of all AI conversations, they still represent a potentially large problem on an absolute basis.

A rare but growing problem

In the newly published paper “Who’s in Charge? Disempowerment Patterns in Real-World LLM Usage,” researchers from Anthropic and the University of Toronto try to quantify the potential for a specific set of “user disempowering” harms by identifying three primary ways that a chatbot can negatively impact a user’s thoughts or actions.
