
Meta, the tech giant founded by Mark Zuckerberg, trained its latest artificial intelligence model, LLaMA, using a massive dataset of books sourced from the notorious piracy website Library Genesis (LibGen).
LibGen is an illegal repository hosting millions of publications, ranging from literary classics and academic works to contemporary bestsellers. For Meta, it served as a goldmine of training data. However, none of this involved the consent of authors or publishers.
Zuckerberg: It’s Legal
Internal Meta documents reveal that Mark Zuckerberg approved using LibGen-sourced content for AI training. He argued that the move falls within the bounds of U.S. law, specifically under the “fair use” doctrine, which permits certain uses of copyrighted material for research or developmental purposes.
This interpretation is highly controversial and legally dubious—especially given the commercial intent and the sheer volume of work involved.
Legal Sources? Too Expensive!
The documents also show that Meta considered purchasing licenses from publishers, but staff reportedly deemed the proposed terms “unreasonably expensive.” Ultimately, they opted for the pirated material, not due to a lack of alternatives, but because it was the more cost-effective route.
It was not a necessity—it was a business decision.
Was Your Book Used to Train AI? The Atlantic Can Tell You
Although Meta has not released a complete list of the titles used to train its models, many authors and researchers are turning to The Atlantic’s searchable dataset—a resource based on pirated archives like LibGen.
The dataset includes about 190,000 titles and allows users to check whether their books were used to train AI language models by Meta and other major players like OpenAI and Anthropic.
Legal Battles Underway. Europe Is Responding—Is the English-Speaking World?
In the U.S., lawsuits have already been filed over the unauthorized use of books in AI training. Notable plaintiffs include comedian Sarah Silverman and authors Paul Tremblay and Michael Chabon. These cases touch on copyright infringement and the lack of transparency from companies like OpenAI and Meta.
France is also preparing collective legal actions to protect the rights of authors and publishers from exploitation by Big Tech.
In contrast, the response from institutions in English-speaking countries like the UK, Canada, or Australia has been muted. Organizations such as the Society of Authors in the UK and the Authors Guild in the U.S. have voiced concern, but legislative or governmental action has been minimal, despite evidence that works by prominent authors such as Margaret Atwood, George R. R. Martin, and Colson Whitehead appear in the datasets.
Time for New Rules?
The unauthorized use of creative work to train artificial intelligence systems has become widespread. As a result, creative communities around the world are calling for new legal safeguards, including:
- full transparency about AI training data sources,
- the ability for authors to opt out,
- and reasonable compensation for the use of their work.
Without decisive regulation, books may soon be treated as free raw material, and copyright law may be rendered obsolete.
Sources: