
Training AI on Books? U.S. Court Says It’s Fair Use

AI brought to court.

A federal court in San Francisco has issued the first U.S. ruling on the use of books to train artificial intelligence. The court found that Anthropic's use of millions of books to train its Claude model falls within the bounds of what is known as "fair use."

Judge William Alsup found that the use of the books was "transformative" in nature and did not harm the market value of the original works.

Why Is This Considered Fair Use?

The judge emphasised that training AI does not mean copying books, but rather analysing their content to create something new. According to the court, this process is comparable to other uses previously recognised as fair use, such as the indexing of content by search engines.

Moreover, the Claude model does not reproduce full versions of books or present them in a way that could replace the originals. According to the court, this kind of use does not harm authors in terms of sales or commercial value.

Behind the Scenes: The Pirated Books Controversy

Part of the lawsuit concerned the fact that Anthropic had previously downloaded around seven million books from illegal online sources. The company claims that it ultimately did not use these materials for training and instead purchased “millions” of legitimate copies.

However, the judge ruled that simply downloading and storing pirated copies constitutes copyright infringement and could lead to financial liability. That part of the case is scheduled to return to court in December.

What Does This Mean for the AI and Publishing Industries?

While the ruling may favour AI developers, it also highlights the fine line between lawful and unlawful data use. Key takeaways include:

  • Fair use may apply to model training, but only when the source materials are lawfully acquired.
  • The use must be transformative—it cannot result in direct replication of original works.
  • AI companies must carefully vet their data sources, as pirated content carries serious legal risks.

We’ve Written About This Before

In earlier articles, we examined the accusations raised by authors against companies such as Meta and Anthropic, and explored the wider debate over whether training AI models on copyrighted works without consent represents the future of the technology or a violation of the law.

This is only the first in a series of rulings that will define the relationship between artificial intelligence and copyright law. The stakes are high for authors, publishers, and the entire tech industry.

Sources:

  1. The Guardian
  2. Reuters