
Meta allegedly used pirated books to train AI. Australian authors have objected, but US courts may decide if this is ‘fair use’

By Unknown Author | Source: The Conversation | Read Time: 3 mins

Meta's alleged use of LibGen to train its AI raises serious copyright concerns for Australian authors, whose work may have been used without consent or compensation. Whether that use was lawful may ultimately turn on the US doctrine of "fair use", and Meta already faces lawsuits from authors alleging infringement. The dispute underscores the tension between current AI training practices and authors' intellectual property rights.


Companies Training AI Models on Copyrighted Material

Companies developing AI models, such as OpenAI and Meta, train their systems on enormous datasets: text drawn from newspapers, books (often sourced from unauthorized repositories), academic publications, and other internet sources, much of it copyrighted. The Atlantic magazine recently alleged that Meta, parent company of Facebook and Instagram, had used LibGen, an illegal book repository, to train its generative AI tool. Created around 2008 by Russian scientists, LibGen hosts more than 7.5 million books and 81 million research papers, making it one of the largest online libraries of pirated work in the world.

Legal Debates and Concerns

The practice of training AI on copyrighted material has sparked intense legal debates and raised serious concerns among writers and publishers, who face the risk of their work being devalued or replaced. While some companies, such as OpenAI, have established formal partnerships with some content providers, many publishers and writers have objected to their intellectual property being used without consent or financial compensation.

Legal Battles and Implications

Author Tracey Spicer has described Meta’s use of copyrighted books as “peak technocapitalism”, while Sophie Cunningham, chair of the board of the Australian Society of Authors, has accused the company of “treating writers with contempt”. Meta is being sued in the United States for copyright infringement by a group of authors, including Michael Chabon, Ta-Nehisi Coates, and comedian Sarah Silverman. The legal battles center on a fundamental question: does mass data scraping for AI training constitute “fair use”?

The stakes are particularly high because AI companies not only train their models on publicly accessible data but also use that content to generate chatbot answers that may compete with the original creators' works.

Potential Responses and Compensations

Publishers and creators are increasingly concerned about losing control over their intellectual property. AI systems rarely cite sources, diminishing the value of attribution, and if these systems can generate content that substitutes for published works, demand for original content may fall. Lawmakers in various jurisdictions are considering updates to national copyright laws that specifically address AI, aiming both to promote innovation and to safeguard creators' rights.

In response to these challenges, various models are being developed globally to ensure creators and publishers are paid while still allowing AI companies to use their data. Since mid-2023, several academic publishers have established licensing agreements with AI companies, and other publishers are striking direct deals. A variety of licensing platforms have also emerged to facilitate the legal use of copyrighted materials for AI training, and to clearly indicate to readers when a book was written by humans rather than generated by AI.

Analysis

The use of copyrighted works to train AI systems remains contested legal territory. Both AI developers and creators have valid interests at stake. There is a clear need to balance technological innovation with sustainable models for original content creation. Finding the right balance between these interests will likely require a combination of legal precedent, new business models, and thoughtful policy development. As courts begin to rule on these cases, we may see clearer guidelines emerge about what constitutes fair use in AI training and AI-driven content creation, and what compensation models might be appropriate. Ultimately, the future of human creativity hangs in the balance.
