Photo by zhang kaiyv / Unsplash

Apple's AI Training Program Meets Resistance: Prominent Outlets Opt Out, Igniting Debate Over Web Data Usage

AI Aug 31, 2024

Leading Platforms Reject Apple's AI Scraping

Resisting Apple's AI Scraping: Major Websites Express Disapproval

Prominent platforms, including Facebook, Instagram, and The New York Times, among others, have opted out of Apple's AI training. The move follows Apple's introduction of Applebot-Extended, a tool for publishers who do not want their data used to train AI.

AI Bots: In the Midst of a Paradigm Shift

Attitudes toward web-crawling bots appear to have shifted as the bots' role has evolved from simply trawling the web for data to supplying the material used to train AI models. Training AI on data gathered this way has sparked a debate over intellectual property and the future of the web.

Diving into Applebot-Extended

Launched as an extension of Applebot, Apple's original web crawler, Applebot-Extended lets website owners tell Apple not to use their data for AI training. Blocking it does not stop the original Applebot from crawling the site, so the site's content can still appear in Apple's search products. The goal of Applebot-Extended is to give publishers a way to keep data that has already been crawled from feeding into generative AI programs.

Robots.txt: A Lingering Debate Over AI Training

A website's robots.txt file, which implements the Robots Exclusion Protocol, is the main way the site manages its interaction with Applebot-Extended and other AI bots. Long used to tell crawlers which parts of a site they may access, the file now determines whether these AI bots are blocked or permitted.
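As a minimal sketch of how such a rule might look, the Python snippet below embeds a hypothetical robots.txt that disallows Applebot-Extended while leaving every other crawler, including the original Applebot, unrestricted. It then checks both user agents with the standard library's urllib.robotparser. The file contents, the example.com URL, and the helper name is_allowed are assumptions made for illustration; they are not taken from any real site or from Apple's documentation.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: opt out of AI training via Applebot-Extended,
# while leaving all other crawlers (including Applebot) unrestricted.
ROBOTS_TXT = """\
User-agent: Applebot-Extended
Disallow: /

User-agent: *
Allow: /
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt text permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    page = "https://example.com/news/story.html"  # illustrative URL
    for agent in ("Applebot", "Applebot-Extended"):
        print(agent, "allowed:", is_allowed(ROBOTS_TXT, agent, page))
```

Under these assumptions, the check reports Applebot as allowed and Applebot-Extended as disallowed. Note that urllib.robotparser matches user-agent names loosely (by substring), so its answers for other robots.txt layouts may differ from how Apple itself interprets the file.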

An Underwhelming Reception for Applebot-Extended

So far, only a small percentage of high-traffic sites block Applebot-Extended. Analyses by Originality AI and Dark Visitors found that approximately 7% and 6% of the sites they examined, respectively, were blocking Applebot-Extended, leaving the large majority open to Apple's AI training.
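A count of this kind could be approximated as follows. This is only a sketch: the methodology used by Originality AI and Dark Visitors is not described here, and the site names and robots.txt contents below are made up for illustration. The helper names blocks_agent and share_blocking are likewise hypothetical.

```python
from urllib.robotparser import RobotFileParser


def blocks_agent(robots_txt: str, agent: str) -> bool:
    """Return True if robots_txt forbids agent from fetching the site root."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return not parser.can_fetch(agent, "/")


def share_blocking(samples: dict[str, str], agent: str) -> float:
    """Percentage of sampled sites whose robots.txt blocks agent."""
    if not samples:
        return 0.0
    blocked = sum(blocks_agent(text, agent) for text in samples.values())
    return 100.0 * blocked / len(samples)


# Made-up sample: site name -> robots.txt text (not real survey data).
SAMPLE = {
    "news.example": "User-agent: Applebot-Extended\nDisallow: /\n",
    "shop.example": "User-agent: *\nDisallow: /private/\n",
    "blog.example": "",
    "wiki.example": "User-agent: GPTBot\nDisallow: /\n",
}

if __name__ == "__main__":
    pct = share_blocking(SAMPLE, "Applebot-Extended")
    print(f"{pct:.1f}% of sampled sites block Applebot-Extended")
```

In this made-up sample, one of the four sites blocks the bot, so the share is 25%. A real survey would also need to download each site's robots.txt and decide how to treat partial blocks, which this sketch ignores by testing only the site root.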

AI Scraping Partnerships: A Growing Trend

An analysis by data journalist Ben Welsh found that mainstream news websites tend to block AI-specific bots unless they have a partnership with the bot's owner. This finding aligns with recent partnership announcements between Apple's competitors, such as OpenAI and Perplexity, and numerous news outlets, and it suggests that blocking has become a strategic response to AI data scraping.

Major Media Companies Chime In

Several large media companies, including Vox Media and Gannett, have stated their positions on AI scraping tools. Vox Media blocks AI scraping bots from companies with which it has no commercial agreement, while Gannett says that allowing AI scrapers does not benefit the company. The New York Times is critical of the opt-out nature of tools like Applebot-Extended and regards unauthorized AI scraping as copyright infringement.

Conclusion

The pushback against Applebot-Extended highlights the ongoing debate over data usage rights and privacy. As Apple works to strike deals with publishers, the effects of any data licensing arrangements may show up in robots.txt files, reflecting the growing concern over who controls information in the age of AI.

Suiradybedam Tobami

Software Automation Engineer