Hacking ChatGPT by Planting False Memories into Its Data

This vulnerability exploits a feature that gives ChatGPT long-term memory, where it uses information from past conversations to inform future conversations with the same user. A researcher found that he could use that feature to plant “false memories” into that long-term context, which could then subvert the model.

A month later, the researcher submitted a new disclosure statement. This time, he included a proof of concept (PoC) that caused the ChatGPT app for macOS to send a verbatim copy of all user input and ChatGPT output to a server of his choice. All a target needed to do was instruct the LLM to view a web link that hosted a malicious image. From then on, all input to and output from ChatGPT was sent to the attacker’s website.
