AI, Books, and the Call for Clearer Licensing
Artificial intelligence is now a major presence across creative fields, including book publishing. AI systems are trained on enormous amounts of written material, including books. That raises a core concern for authors: could your writing be part of that training data?
For indie authors, the conversation about licensing and AI is growing. While much of it is still unsettled, publishing groups, nonprofits, and legal experts are starting to map out how fair use, consent, and licensing might intersect. This post outlines the current state of things and what indie authors should be aware of.
How AI Uses Books to Train
To produce text that mimics human writing, tools like ChatGPT and Claude are trained on huge datasets scraped from a wide range of sources: news articles, websites, forums, and, in many cases, books. That training data is how these tools learn the patterns of language.
Authors have discovered that their books, sometimes entire catalogs, have shown up in public datasets used to train AI. These datasets have been compiled from sources like pirated book files, digital libraries, and archived internet content, and in many cases the books were included without the authors' permission.
When an AI tool is built and monetized using that data, the question becomes: what rights do the original creators have? And if your book helped train a product, shouldn’t you have a say in that process—or at least receive some form of compensation?
This isn’t a niche concern. The Authors Guild, publishing associations, and copyright offices are beginning to push back on the idea that this use of creative work automatically qualifies as fair use.
Publishing’s Response to Unlicensed AI Training
Several major developments in the past year show that the industry is starting to respond:
UK Licensing Initiative:
The Copyright Licensing Agency and other UK-based rights groups are working on a collective licensing model that would allow AI companies to pay for legal access to books. This mirrors the kinds of licenses already used for photocopying or digital coursepacks.
US Copyright Office Position:
In a 2024 report, the US Copyright Office questioned the assumption that training on copyrighted material always falls under fair use. It has indicated that commercial training uses may require licensing, especially when the source material was never made available for that purpose.
Fairly Trained Nonprofit:
Fairly Trained, a nonprofit that certifies AI models trained on licensed or permission-based content, is gaining support. Some companies now advertise that their models were trained only on approved datasets.
At the same time, legal cases are ongoing. Some high-profile lawsuits could determine how this issue unfolds. But regardless of the final rulings, the conversation is shifting toward transparency and fair licensing.
What Indie Authors Should Keep in Mind
For independent authors, it may be tempting to assume this issue only affects bestsellers or big publishing houses. But many AI datasets include books pulled from smaller platforms or scraped from open digital catalogs.
Here’s what to consider:
Your writing is protected:
Copyright applies the moment your book is fixed in a digital or physical format. That means you hold rights regardless of whether you’ve registered your copyright or published through a major house.
Distribution matters:
Books made available in open formats or on less secure platforms are more likely to be scraped. That doesn’t mean you should avoid wide distribution, but it’s worth understanding how your work is hosted and who can access it.
Metadata and language help:
While not foolproof, including clear copyright language in your book’s metadata, website, or publication info can help reinforce your ownership and signal usage restrictions.
Future opt-outs and tools:
There is ongoing pressure on AI firms and data compilers to offer opt-out options. Authors may soon see services that let them mark their work as off-limits for model training, and some projects are already experimenting with dataset registries. For writing you host yourself, one crawler-level opt-out already exists; see the sketch after this list.
You’re not alone in this:
Organizations like the Authors Guild, the Society of Authors, and PEN America are tracking these developments closely. They’re also lobbying for laws and standards that better reflect the interests of writers.
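One concrete step is already available for writing you host on your own site, such as sample chapters, serialized fiction, or blog posts. Several AI companies publish the user-agent tokens their crawlers use, and a robots.txt file on your site can ask those crawlers to stay away. The sketch below uses three tokens that are documented as of this writing (GPTBot for OpenAI's crawler, CCBot for Common Crawl, and Google-Extended, Google's token for AI training uses); the list is not exhaustive, and compliance is voluntary on the crawler's part.

    # Ask common AI training crawlers not to fetch any pages on this site
    User-agent: GPTBot
    Disallow: /

    User-agent: CCBot
    Disallow: /

    User-agent: Google-Extended
    Disallow: /

This only covers sites you control. It does not remove anything already collected, and it has no effect on retail platforms or third-party sites that host your books.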
What Licensing Might Look Like in the Future
While there isn’t a global framework yet, possible paths forward are being discussed:
Collective licensing:
Much as music and academic materials are licensed collectively, authors could eventually receive royalties or payouts when their works are used to train or fine-tune models.
Public databases of licensed works:
Some systems may emerge where authors can register works and choose to license them for specific types of training in exchange for payment or exposure.
Watermarking or content tagging:
Newer technologies may embed data in book files that identifies them as protected or as opted out of training. A simple example of what embedded rights metadata can look like today appears just below.
None of these are guaranteed, but all point toward a future where authors have more control and visibility.
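As a small illustration of the content-tagging idea, book files can already carry rights information. An EPUB's package file includes a metadata section, and the standard Dublin Core dc:rights element can hold a plain-language rights statement. The snippet below is only a sketch with placeholder title, author, and wording; it records your claim inside the file itself, though nothing currently obliges an AI system to read or honor it.

    <metadata xmlns:dc="http://purl.org/dc/elements/1.1/">
      <dc:title>Placeholder Title</dc:title>
      <dc:creator>Placeholder Author Name</dc:creator>
      <!-- A rights statement that travels with the book file -->
      <dc:rights>Copyright the author. All rights reserved. Not licensed for AI or machine-learning training without written permission.</dc:rights>
    </metadata>

The exact wording is up to you; the point is that the claim lives in the book file's own metadata rather than only on a copyright page.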
What DropCap Marketplace Is Doing
At DropCap Marketplace, we’re committed to protecting authors’ work. Only approved rights buyers can view submissions or access content on our platform. We require accounts and vet every buyer through a review process. We believe transparency matters, not just in rights sales but in how your content is seen, used, and valued across publishing.
We’re actively monitoring how AI-related licensing develops and will continue to support fair practices for independent authors.