Google AI’s Training Controversy: Are Publishers’ Opt-Outs Being Ignored in 2025?
The rise of artificial intelligence has sparked a heated debate over how to balance innovation against intellectual property rights. Publishers are increasingly concerned that their work is being used to train AI models without their consent, even after they have opted out. Google's alleged disregard for those opt-outs has sharpened the dispute, particularly because generative AI models like Gemini require vast datasets.
A viral video featuring actor Babil Khan's poignant question, "If artists can't protect their work, then who can?" has brought the issue to the forefront. As tensions escalate, the question remains: do publishers have any legal or technical recourse to protect their work, or are they at the mercy of tech giants?
The Rise of AI Data Scraping and Publishers' Reaction
Google trains its AI models, including Gemini, on large datasets drawn from books, news websites, and blogs. Google insists that it respects robots.txt and publishers' wishes, but several reports in 2025 suggested otherwise.
- Wired Investigation (2025): Reported that publishers changed their robots.txt files to block AI crawlers, only for Google's systems to access archived copies of the content anyway.
- The NYT Lawsuit: Apparently still pending, this suit accuses Google of copyright infringement for using paywalled content to train AI without any form of payment.
- Babil Khan's Viral Statement: The actor criticized AI companies for profiting from creative works without consent, further escalating the issue.
- Google-Extended Complaints: Publishers claim that the Google-Extended control, which was supposed to let websites opt out of AI training, was never effectively implemented, and almost all of them argue that their content still appears in AI answers.
Is Google-Extended Serving Its Intended Purpose?
In 2023, Google-Extended was introduced as a solution for publishers who wanted to avoid AI training without sacrificing their visibility in search results. Investigations have since uncovered several shortcomings:
1. Delayed Compliance: A few publishers report a lag of several weeks before the opt-out takes effect.
2. Archive Loophole: Models may still draw on cached or third-party copies of the content.
3. Lack of Transparency: Google does not disclose how much data was already used before an opt-out.
According to a 2025 Reuters Institute study, 42% of top news sources that use Google-Extended still had their content appear in AI summaries.
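For publishers who want to confirm what their own robots.txt actually declares, the opt-out can be checked locally. The following is a minimal sketch, assuming a hypothetical robots.txt that disallows the Google-Extended token while leaving everything else open; the file contents, the example URL, and the helper name is_allowed are illustrative and not taken from any publisher. It uses Python's standard urllib.robotparser module. Note that it only reports what the file says. It cannot show whether a crawler honors the rule, and it does not cover cached or third-party copies.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: opt out of Google-Extended, allow everything else.
ROBOTS_TXT = """\
User-agent: Google-Extended
Disallow: /

User-agent: *
Allow: /
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() accepts an iterable of lines, so no network access is needed.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    url = "https://example.com/news/story.html"  # placeholder URL
    for agent in ("Google-Extended", "Googlebot"):
        print(agent, is_allowed(ROBOTS_TXT, agent, url))
```

With this sample file, the check reports that Google-Extended is disallowed while Googlebot remains allowed, which is the combination the opt-out is meant to achieve: no AI training use, but continued visibility in search.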
Legal and Ethical Implications
The controversy touches on three major issues:
1. Copyrighted Works: If AI training is deemed fair use, creators will have little protection under existing copyright law, and new legislation may need to be enacted.
2. Lost Revenue: When an AI answer does not drive traffic to the source, the publisher loses the advertising revenue those visits would have generated.
3. Trust in AI: Users may end up trusting whatever the AI shows them without verifying its accuracy or confirming that the cited sources exist, which could spread misinformation.
Under the EU AI Act, whose obligations for general-purpose AI models began applying in 2025, AI companies are expected to disclose the sources used to train their models, yet enforcement remains weak, and US lawmakers are still debating the issue.
What Publishers Can Do
While these court cases proceed, publishers are pursuing several options:
- Suing for Damages: Like The New York Times, a growing number of media organizations may sue for damages.
- Technical Deterrents: Some websites use paywalls aimed at AI crawlers or watermarks to discourage scraping.
- Licensing Deals: News organizations are negotiating licensing agreements with AI developers.
Conclusion: Who Controls Online Content in the AI Era?
Does Google truly "forget" to honor opt-out requests, or does this reflect how little power publishers actually hold in the web ecosystem? As AI technology sets new standards for content consumption, the conflict between technological advancement and the rights of creators grows more intense. Technical defenses and legal action work in tandem to protect those rights, but the struggle is far from settled. In the meantime, publishers, writers, and artists remain caught in a battle over revenue and the ownership of their work as AI continues to shape the future.