“We trained LLMs to act secretly malicious. We found that, despite our best efforts at alignment training, deception still slipped through.”
Evan Hubinger – Sleeper Agents: Training Deceptive LLMs that Persist Through Safety Training

Just like the plot of Netflix’s ‘Leave the World Behind’, we’ve welcomed artificial intelligence (AI) into our homes and workplaces. It’s […]
The post You’re Saying LLMs Can Turn Nasty? A Machine Learning Engineer’s View appeared first on Heimdal Security Blog.