Meta knew it used pirated books to train AI, authors say

Published by Global Banking & Finance Review

Posted on January 10, 2025

Last updated: January 27, 2026

By Blake Brittain

(Reuters) - Meta Platforms used pirated versions of copyrighted books to train its artificial intelligence systems with approval from its CEO Mark Zuckerberg, a group of authors alleged in newly disclosed court papers.

Ta-Nehisi Coates, comedian Sarah Silverman and other authors suing Meta for copyright infringement made the accusations in court filings unsealed on Wednesday in California federal court. They said internal documents produced by Meta during discovery showed the company knew the works were pirated.

Spokespeople for Meta did not immediately respond to a request for comment.

The authors sued Meta in 2023, arguing that the tech giant misused their books to train its large language model Llama.

The case is one of several alleging that copyrighted works by authors, artists and others were used to develop AI products without permission. Defendants have argued that they made fair use of copyrighted material.

The authors asked the court on Wednesday for permission to file an updated complaint. They said new evidence showed Meta used the AI training dataset LibGen, which allegedly includes millions of pirated works, and distributed it through peer-to-peer torrents.

They said internal Meta communications showed Zuckerberg "approved Meta's use of the LibGen dataset notwithstanding concerns within Meta's AI executive team (and others at Meta) that LibGen is 'a dataset we know to be pirated.'"

U.S. District Judge Vince Chhabria last year dismissed claims that text generated by Meta's chatbots infringed the authors' copyrights and that Meta unlawfully stripped their books' copyright management information (CMI).

The writers argued Wednesday that the evidence bolstered their infringement claims and justified reviving their CMI claim and adding a new computer fraud claim.

Chhabria said during a hearing on Thursday that he would allow the writers to file an amended complaint but expressed skepticism about the merits of the fraud and CMI claims.

(Reporting by Blake Brittain in Washington; Editing by David Bario and Aurora Ellis)

Key Takeaways

  • Meta allegedly used pirated books to train AI systems.
  • Authors claim CEO Mark Zuckerberg approved the use.
  • The authors' copyright infringement lawsuit is pending in California federal court.
  • Authors allege Meta used the LibGen dataset for AI training.
  • The judge said he would allow the authors to file an amended complaint.

Frequently Asked Questions

What is the main topic?
The main topic is Meta's alleged use of pirated books to train its AI systems, leading to a lawsuit by authors.
What is the LibGen dataset?
LibGen is a dataset that allegedly contains millions of pirated works; the authors say Meta used it for AI training.
Who is involved in the lawsuit?
Authors like Ta-Nehisi Coates and Sarah Silverman are suing Meta for copyright infringement.
