Meta, Mark Zuckerberg Ripped Millions of Copyrighted Works to Train AI Systems, Major Book Publishers Claim

Meta and Mark Zuckerberg are defendants in a lawsuit from five major book publishers and author Scott Turow, who claim the tech giant infringed on copyrights by training AI systems on ripped and pirated works, according to a Tuesday filing in New York federal court.

Turow and the five publishers – Hachette, Macmillan, McGraw Hill, Elsevier and Cengage – allege that Zuckerberg directed that Meta’s AI programs be trained by copying millions of books, articles and other written works obtained through pirate sites and web scrapes.

“In their effort to win the AI ‘arms race’ and build a functional generative AI model, Defendants Meta and Zuckerberg followed their well-known motto: ‘move fast and break things,’” the plaintiffs’ lawsuit read. “They first illegally torrented millions of copyrighted books and journal articles from notorious pirate sites and downloaded unauthorized web scrapes of virtually the entire internet. They then copied those stolen fruits many times over to train Meta’s multibillion-dollar generative AI system called Llama. In doing so, Defendants engaged in one of the most massive infringements of copyrighted materials in history.”

The suit added: “Meta — at Zuckerberg’s direction — copied millions of books, journal articles, and other written works without authorization, including those owned or controlled by Plaintiffs and the Class, and then made additional copies of those works to train Llama. Zuckerberg himself personally authorized and actively encouraged the infringement. Meta also stripped [copyright management information] from the copyrighted works it stole. It did this to conceal its training sources and facilitate their unauthorized use.”

The plaintiffs are seeking unspecified damages in a jury trial.

The lawsuit went on to say that Meta briefly considered expanding its licensing deals with publishers following the release of Llama 1. According to the filing, a $200 million increase to the licensing budget was floated before the decision was referred to Zuckerberg.

“The question of whether to license or pirate moving forward was ‘escalated’ to Zuckerberg,” the suit read. “After this escalation to Zuckerberg, Meta’s business development team received verbal instructions to stop licensing efforts. One Meta employee presciently described the rationale: ‘If we license once [sic] single book, we won’t be able to lean into the fair-use strategy.’”

The lawsuit concludes by alleging that the system “readily generates, at speed and scale, substitutes for Plaintiffs’ and the Class’s works on which it was trained” and can even “mimic the expressive elements and creative choices of specific authors.”

“Users are touting AI’s ability to generate books with ease and Llama is flooding the market with AI-generated substitutes,” the suit said. “The scale and speed at which Llama can create written works and compete with human writers is unprecedented, and it can only do that because Defendants copied Plaintiffs’ and the Class’s works to train their LLM.”

A Meta spokesperson noted to Variety that similar lawsuits have been shot down in court. “AI is powering transformative innovations, productivity and creativity for individuals and companies, and courts have rightly found that training AI on copyrighted material can qualify as fair use,” the rep said. “We will fight this lawsuit aggressively.”