
When AI Stops “Inspiring” and Starts Reproducing
Until recently, discussions about AI in the context of books focused on style, inspiration, and automated writing. Models were supposed to “learn language”, not memorise specific content. That assumption is now starting to crack.
A study by a Stanford University research team shows that modern language models—such as Claude 3.7 Sonnet, GPT-4.1, Gemini 2.5 Pro, and Grok 3—can reproduce substantial portions of copyrighted books. In extreme cases, these are nearly complete works.
The most striking example? Claude 3.7 Sonnet reproduced as much as 95.8% of “Harry Potter and the Philosopher’s Stone”. At that point, we are no longer talking about a “language model”, but about a system that, under certain conditions, behaves as if it remembers books.
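How might a figure like 95.8% be computed? The paper's exact methodology is not reproduced here, but one plausible metric, consistent with the study's focus on long, near-verbatim passages, is the fraction of the original text covered by sufficiently long matching blocks. A minimal sketch in Python, where the `min_block` threshold is an illustrative assumption:

```python
import difflib

def reproduction_rate(original: str, generated: str, min_block: int = 50) -> float:
    """Fraction of `original` covered by long verbatim blocks also
    present in `generated`.

    Only blocks of at least `min_block` characters count, so short
    coincidental overlaps (names, stock phrases) are ignored.
    This is an illustrative metric, not the study's exact method.
    """
    if not original:
        return 0.0
    matcher = difflib.SequenceMatcher(None, original, generated, autojunk=False)
    matched = sum(block.size
                  for block in matcher.get_matching_blocks()
                  if block.size >= min_block)
    return matched / len(original)
```

The threshold matters: with a low `min_block`, ordinary shared phrases inflate the score; with a high one, only genuinely memorised stretches of text register.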
How Do You Extract a Book from a Model?
This is not about a simple prompt like “write Harry Potter from beginning to end”. In most cases, models refuse such requests. Researchers used a more sophisticated approach.
They began by feeding the model the opening of a book and asking it to continue. With Gemini 2.5 Pro and Grok 3, this was often enough to generate extended passages. Claude 3.7 Sonnet and GPT-4.1 were more cautious. Here, researchers used so-called jailbreak techniques—generating multiple prompt variations until one slipped past the filters.
Once the model started generating text, the researchers continued the interaction, requesting further passages. Step by step, chapter by chapter, until the model refused or the book ended. Importantly, the analysis only included long, near-verbatim passages—some sequences extended to several thousand words.
This is not accidental similarity. It is textual continuity.
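The iterative procedure described above can be sketched in outline. Everything here is illustrative: `model_continue` is a stand-in for a real model API call, and the study's actual prompting (including the jailbreak variations) was considerably more elaborate.

```python
def extract_continuations(opening, model_continue, max_turns=100):
    """Repeatedly ask a model to continue a text, collecting passages
    until it refuses (the callback returns None or "") or the turn
    budget runs out.

    `model_continue(context) -> str | None` is a hypothetical stand-in
    for a model API call that returns the next continuation.
    """
    context = opening
    passages = []
    for _ in range(max_turns):
        continuation = model_continue(context)
        if not continuation:            # refusal, filter, or end of text
            break
        passages.append(continuation)
        context += continuation        # feed the growing text back in
    return passages
```

The key design point is the feedback loop: each accepted continuation becomes part of the next prompt, which is how short completions accumulate into passages thousands of words long.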
Differences Between Models Don’t Change the Conclusion
The systems behaved differently:
- Claude 3.7 Sonnet proved the most permissive—not only for “Harry Potter”. It also scored highly on “Nineteen Eighty-Four”, “The Great Gatsby”, and “Frankenstein”, with reproduction rates exceeding 94%.
- Gemini 2.5 Pro and Grok 3 were also capable of generating large passages, often at significantly lower cost and without bypassing safeguards.
- GPT-4.1 stood out for its stricter approach—refusal mechanisms triggered more frequently, especially at the ends of chapters, which significantly limited reproduction.
The differences are clear. The conclusion is not: every model tested was able to reveal copyrighted book content.
Does AI Really “Remember” Books?
The question sounds simple. The answer is not. AI models do not store books in a traditional sense. There is no internal catalogue or library. And yet, they can generate long, coherent passages aligned with the original.
A key control experiment is revealing: researchers attempted the same procedure on a book published in 2025, too recent to have appeared in the models' training data, without success. This suggests that the ability to generate such content is not only about understanding language, but also about the imprint of the training data itself.
In other words: the model does not “know” a book the way a human does—but under certain conditions, it can reproduce it with remarkable accuracy.
What Does This Change for the Book Market?
The most interesting part is that the consequences are not purely legal. This is a change that touches the very foundation of a book's value.
First, regulatory pressure is increasing. If a model can generate substantial portions of a work, the boundary between fair use and copyright infringement becomes blurred. Unsurprisingly, legal cases are already underway, and public institutions are working to clarify the rules.
Second—and less obvious—the way we perceive books is changing. If text can be reproduced, its uniqueness is no longer self-evident. Value begins to shift towards what cannot be generated: form, execution quality, and reader experience.
This shift is already visible. The market is seeing growing demand for visually refined editions, collector’s releases, and projects designed to stand out not only through content, but through form.
The AI Paradox: The More Powerful the Model, the More Valuable the Physical Book
At first glance, AI should weaken the importance of print. In practice, the opposite may happen.
- A digital file can be copied.
- Text can be generated.
- But the experience of a physical book cannot be recreated virtually.
Paper reflects light differently than a screen. A cover has weight, texture, even scent. Edges, spine, binding—these are elements that create an experience no language model can replicate. Paradoxically, the materiality of the book becomes more important.
What Comes Next?
The Stanford study does not provide final answers—but it clearly points in a direction. At the same time, the issue is no longer purely technological. In recent days, an important signal has come from regulators.
The UK government has stepped back from a proposal to introduce a broad copyright exception for AI companies. In practice, it would have allowed models to be trained on protected content without creators’ consent, with an “opt-out” mechanism requiring authors to withdraw individually.
The proposal faced strong opposition from the creative sector. Only 3% of participants in the public consultation supported it. As a result, the government paused the initiative and sent the proposal back for further analysis. This is a meaningful signal.
It shows that copyright in the age of AI is becoming a political priority—and that the voices of authors and publishers have real influence on regulatory direction. At the same time, there is no single, obvious solution that balances the interests of all sides. Governments worldwide are facing the same challenge. For authors and publishers, this is a defining moment. For the first time in decades, technology is not only changing how books are distributed—it is challenging the rules themselves, forcing the industry to redefine what truly creates the value of a book.
Sources: