Reddit Sues Anthropic After 100K+ Unauthorized Data Scrapes

Key Points

Reddit accuses Anthropic of scraping user content without permission
Case focuses on breach of contract, not copyright
Reddit stock jumped nearly 7% after the lawsuit news

Reddit has filed a lawsuit against AI startup Anthropic, claiming the company scraped massive amounts of user data to train its Claude AI models without permission.

According to the filing in California state court, Anthropic made over 100,000 unauthorized requests to Reddit’s servers — even after publicly stating it would stop.

At the heart of the case is Reddit’s claim that Anthropic violated both its technical restrictions and terms of service.

The lawsuit states that Anthropic bypassed Reddit’s robots.txt restrictions, a standard file that tells automated crawlers which parts of a site they may not access, and ignored rules governing how user data can be accessed and used.
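For readers unfamiliar with robots.txt, here is a minimal sketch of how a well-behaved crawler might check the file before requesting a page, using Python's standard `urllib.robotparser` module. The robots.txt content, the bot names, and the URL are hypothetical examples for illustration; they are not Reddit's actual file or any real crawler's identity.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content (an assumption, not Reddit's real file):
# one named bot is allowed everywhere, every other crawler is blocked.
EXAMPLE_ROBOTS_TXT = """\
User-agent: ExampleLicensedBot
Allow: /

User-agent: *
Disallow: /
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the robots.txt rules permit user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes the file's lines; no network request is made here.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    page = "https://example.com/r/some_forum/"
    for bot in ("ExampleLicensedBot", "OtherBot"):
        verdict = "may fetch" if is_allowed(EXAMPLE_ROBOTS_TXT, bot, page) else "must skip"
        print(f"{bot} {verdict} {page}")
```

Note that robots.txt is advisory: nothing in the file technically stops a crawler that chooses not to run a check like this, which is why disputes over ignored rules tend to end up being argued under terms of service.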

🚨BREAKING: REDDIT SUES ANTHROPIC FOR UNLAWFUL BREACH OF CONTRACT

> Reddit CEO files that Anthropic used user data for commercial purposes without a licensing agreement

> Anthropic unlawfully scraped reddit 100K+ times even after saying they had stopped

> “Anthropic’s… pic.twitter.com/bub9Fsmcr4

— NIK (@ns123abc) June 4, 2025

Reddit further alleges that Anthropic collected personal posts, including content that users had later deleted, and used this material for commercial AI training. The platform says this violates both user privacy and Reddit’s own rules about content usage.

To access Reddit data in a legitimate way, companies can enter licensing agreements with the platform. Reddit notes that firms like OpenAI and Google have signed such deals, which include safeguards around content usage and privacy.

In contrast, Reddit claims that Anthropic declined to enter an agreement and instead scraped data directly to avoid licensing fees and user protections.

The lawsuit also references a 2021 research paper by Anthropic CEO Dario Amodei, which highlighted Reddit as a valuable source of training data for AI models.

Reddit presented examples of Claude generating near-exact reproductions of Reddit posts, even including posts that users had deleted. The company argues this proves Anthropic failed to put adequate safeguards in place.

Reddit is now seeking financial damages and a court order to stop Anthropic from using Reddit content in any future AI model development.

Reddit is suing Anthropic. They claim their data was used to train Claude without permission. The complaint was filed today in California. OpenAI and Google both have existing training agreements with reddit. pic.twitter.com/Gj5mcWxWOv

— Andrew Curran (@AndrewCurran_) June 4, 2025

Growing legal risks for AI companies using scraped content

Anthropic responded to the lawsuit by saying it disagrees with Reddit’s claims and plans to defend itself in court. This is not the first time the company has faced legal action over how it collects and uses training data.

In August 2024, a group of authors filed a class-action lawsuit against Anthropic, claiming the company used their copyrighted books without permission to train its AI models. The authors sought compensation for this alleged misuse of their work.

Similarly, in October 2023, Universal Music Group and other major music publishers sued Anthropic. Their complaint argued that Claude reproduced copyrighted song lyrics without authorization, infringing on intellectual property rights.

$2B+ Lawsuit Reddit vs. Anthropic Over Data Theft exposes misuse of data and #Surdatics fixes this.

Reddit’s suing Anthropic for scraping 100,000+ users data posts to train AI without consent, seeking $2B+ is a testament that your data is valuable. Don’t let it be exploited… pic.twitter.com/JkCKwzU6OD

— SURDATICS (@SURDATICS) June 5, 2025

However, Reddit’s case stands out because it does not focus on copyright. Instead, it revolves around breach of contract and unfair competition.

Reddit contends that while its user-generated data is publicly accessible, it is still governed by specific terms of service that Anthropic knowingly ignored. This argument could have broader implications for other platforms that host user content but want to control its commercial use.

The lawsuit also accuses Anthropic of misleading the public. Reddit points to public statements where Anthropic claimed it respects scraping rules and values user privacy. The complaint argues that Anthropic’s actual practices contradict these statements:

“For its part, despite what its marketing material says, Anthropic does not care about Reddit’s rules or users,” the lawsuit claims. “It believes it is entitled to take whatever content it wants and use that content however it desires, with impunity.”

Following news of the lawsuit, Reddit’s stock rose nearly 7%, reflecting investor support for the platform’s legal stance.

Reddit sued Anthropic, accusing the AI startup of illegally scraping Reddit content to train its AI models. #LawdotcomRadar https://t.co/IY38vtZ3rG pic.twitter.com/AXbWJukrxL

— Law.com (@lawdotcom) June 9, 2025

A pivotal test case for AI data practices on the open internet

The outcome of this case could set a major precedent in the ongoing debate around how AI companies source data from public platforms.

As AI developers, including Elon Musk’s xAI and Apple, race to build ever more powerful models and features, the temptation to scrape large quantities of user-generated content from platforms like Reddit, Twitter, and YouTube has grown.

However, as Reddit argues, just because data is accessible on the open web doesn’t mean it is free of contractual restrictions or privacy considerations.

The lawsuit draws attention to the growing tension between the principles of an open internet and the rights of platforms and users to control how their data is used — especially for commercial AI products.

Industry experts say Reddit’s case could influence how other companies structure data licensing deals going forward. If Reddit prevails, AI firms may face higher legal risks when scraping public content without a formal agreement. Conversely, a ruling in favor of Anthropic could embolden other companies to pursue similar data collection tactics.

The timing of this legal battle is critical. As AI becomes further integrated into daily life, from fitness tools such as the watchOS 26 AI workout buddy to national infrastructure, where Nvidia has warned that the UK’s AI buildout faces scaling challenges, the question of how training data is sourced will only grow more urgent.

As more AI firms depend on vast online datasets, cases like Reddit’s are likely to shape the next wave of AI development — and the legal frameworks that govern it.

Aishwarya Patole

Aishwarya is an experienced AI and tech content specialist with 5+ years of experience in turning intricate tech concepts into engaging, relatable stories. With expertise in AI applications, blockchain, and SaaS, she creates data-driven articles, explainer pieces, and trend reports that drive impact.