New York Times Moves to Block AI Training From Published Content, Adds Ban to Terms of Service

The news outlet’s terms of service were updated last week to prohibit using its content to train machine learning systems


The New York Times has instituted a ban on using its content to train artificial intelligence systems.

In its most recent update to the terms of service on its website, dated Aug. 3, the paper of record now bars users from using its content for “the development of any software program, including, but not limited to, training a machine learning or artificial intelligence (AI) system.”

The move may be a reaction to recent comments from Google that all digital content should be available to mine unless publishers opt out, the internet-focused news outlet Search Engine Land reported.

The large language models used by AI engines like OpenAI’s ChatGPT and Google’s Bard “mine” existing content on the internet to learn both facts and the cadence of human language, which then enables the systems to generate content.

The Times’ terms of service also prohibit the use of “robots, spiders, scripts, service, software or any manual or automatic device, tool, or process designed to data mine or scrape the content, data or information from the services, or otherwise use, access, or collect the content, data or information from the Services using automated means.”

Scraping data has become a major point of contention as the race to ramp up AI offerings accelerates. OpenAI used videos from Google-owned YouTube to train Whisper, its speech-to-text AI model, for instance, while Google used ChatGPT to help train Bard.

While The Times is resisting the use of its content, other news outlets are trying to strike deals to benefit from the rapid expansion of AI.

Last month, The Associated Press, the world’s largest news organization, reached a deal with OpenAI to share access to technology developments and content. The deal is intended to give the AP the opportunity to explore generative AI in news products, while licensing part of the AP’s text archive to OpenAI to further train its artificial intelligence products.

The Times signed an agreement with Google in February that will see the companies “work together on tools for content distribution and subscription.” The announcement mentioned marketing and ad product experimentation but said nothing about news content.

Representatives of The Times did not immediately respond to a request for comment.