Meta will use public EU user data to train its AI models

Meta announced that it will use public EU user data to train AI, resuming plans paused last year over Irish data protection concerns.

Meta will start training its AI models using public data from adults in the EU, after pausing the plan last year over data protection concerns raised by Irish regulators.

In June 2024, the social media giant announced that it was delaying the training of its large language models (LLMs) on public content shared by adults on Facebook and Instagram, following a request from the Irish Data Protection Commission (DPC).

Meta said it was disappointed by the DPC's request, calling it a step “backwards for European innovation, competition in AI development and further delays bringing the benefits of AI to people in Europe.”

“We’re disappointed by the request from the Irish Data Protection Commission (DPC), our lead regulator, on behalf of the European DPAs, to delay training our large language models (LLMs) using public content shared by adults on Facebook and Instagram — particularly since we incorporated regulatory feedback and the European DPAs have been informed since March,” said Meta in a statement. “This is a step backwards for European innovation, competition in AI development and further delays bringing the benefits of AI to people in Europe.”

The company explained that its AI, including its Llama LLMs, is already available in other parts of the world. To better serve its European communities, Meta said, it needs to train the models on relevant information that reflects the diverse languages, geography and cultural references of people in Europe. For this reason, the company originally planned to train its large language models on content that adult users in the EU have shared publicly on its products and services.

Meta has now confirmed that it will resume training its AI models on public data from users in the EU.

“In the EU, we will soon begin training our AI models on the interactions that people have with AI at Meta, as well as public content shared by adults on Meta Products,” reads a post published by the company. “This training will better support millions of people and businesses in Europe, by teaching our generative AI models to better understand and reflect their cultures, languages and history.”

The company pointed out that users in the EU can object to their public data being used for training. Starting this week, EU users will receive notifications explaining how their data will be used to improve AI, along with an option to easily object at any time.

Meta stressed that it does not use people’s private messages with friends and family to train its generative AI models. It added that public data from the accounts of EU users under the age of 18 will not be used for training either.

“We believe we have a responsibility to build AI that’s not just available to Europeans, but is actually built for them. That’s why it’s so important for our generative AI models to be trained on a variety of data so they can understand the incredible and diverse nuances and complexities that make up European communities,” concludes the post. “It’s important to note that the kind of AI training we’re doing is not unique to Meta, nor will it be unique to Europe. This is how we have been training our generative AI models for other regions since launch. We’re following the example set by others including Google and OpenAI, both of which have already used data from European users to train their AI models. We’re proud that our approach is more transparent than many of our industry counterparts.”

Pierluigi Paganini

(SecurityAffairs – hacking, artificial intelligence)