Reddit Blocks Internet Archive to Stop AI Companies from Scraping Data for Free

Do Son · August 12, 2025

The popular online forum Reddit recently disclosed that it had discovered artificial intelligence companies harvesting Reddit data via the Internet Archive’s Wayback Machine, a practice the company says violates its terms of service.

Reddit has already blocked most search engine crawlers and AI scrapers from accessing its content. Under current policy, any party wishing to scrape Reddit data for AI model training must first obtain a commercial license and pay a fee. Google, for example, reportedly pays Reddit up to $60 million annually for data access, which lets it collect vast numbers of posts and other content for training its models. Google evidently still considers the arrangement worthwhile.

Historically, Reddit has collaborated with the Internet Archive to index posts and preserve snapshots in the Wayback Machine for future reference. However, AI companies seeking to avoid licensing fees have begun redirecting their crawlers to the Internet Archive, using it as a proxy to obtain Reddit data.

Upon discovering this, Reddit announced it would immediately begin blocking the Internet Archive from crawling and indexing most of its pages. The Wayback Machine will no longer be able to capture post detail pages, comments, or user profiles. It will instead be limited to certain public-facing pages, such as the Reddit homepage and popular post listings, which in practice restricts it to titles and similar metadata.
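To make this kind of per-crawler restriction concrete, here is a minimal sketch using the Robots Exclusion Protocol and Python's standard `urllib.robotparser`. The article does not say how Reddit actually enforces the block, so everything here is an assumption made for illustration: the rules, the `example-archiver` user agent, the `forum.example` host, and the `/posts/` and `/user/` paths are all hypothetical and are not taken from Reddit's real configuration.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules for illustration only; NOT Reddit's actual robots.txt.
# Post pages and user profiles are disallowed for one named crawler,
# while the homepage and listing pages remain allowed.
HYPOTHETICAL_RULES = """\
User-agent: example-archiver
Disallow: /posts/
Disallow: /user/
Allow: /
"""


def build_parser(rules_text: str) -> RobotFileParser:
    """Parse robots.txt text from a string instead of fetching it over HTTP."""
    parser = RobotFileParser()
    parser.parse(rules_text.splitlines())
    return parser


def is_allowed(parser: RobotFileParser, agent: str, url: str) -> bool:
    """Return True if the rules permit `agent` to fetch `url`."""
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    parser = build_parser(HYPOTHETICAL_RULES)
    for path in ("/", "/popular", "/posts/abc123", "/user/someone"):
        url = "https://forum.example" + path
        print(path, is_allowed(parser, "example-archiver", url))
```

Rules like these are a voluntary convention: they tell a well-behaved crawler what it may fetch, but they do not technically prevent a crawler from ignoring them.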

Reddit’s CEO stated that the company would begin enforcing these restrictions as of today, having already notified the Internet Archive in advance. The Internet Archive has confirmed it is in active discussions with Reddit regarding the matter.

This move follows Reddit’s recent lawsuit against Anthropic, the developer of Claude, alleging that Anthropic scraped Reddit content without authorization. Reddit claims that Anthropic continued to harvest data even after Reddit explicitly blocked Anthropic’s crawlers, in direct violation of Reddit’s terms of service.

Written by

@DdoS · Security Researcher

Do Son

Do Son is the Founder and Editor of SecurityOnline.info. Working in cybersecurity since 2013, he reports on vulnerabilities, malware, and emerging threats, providing timely analysis to help organizations and individuals stay ahead of evolving risks.
