The well-known online forum Reddit has announced legal action against Perplexity, SerpApi, Oxylabs, and AWM Proxy, accusing these companies of unauthorized data scraping and illegal use of content from the Reddit platform.
This move follows Reddit’s earlier lawsuit against Anthropic, the developer of the Claude AI model, which Reddit alleged had used its data without authorization to train AI systems. Reddit’s vast repository of posts and user interactions has become its most valuable asset, a resource it now monetizes by licensing access to AI firms for model training.
Perplexity reportedly scraped data primarily to train its own AI models, while the other three companies focused on harvesting and reselling Reddit data to various AI developers. In essence, Reddit alleges, they acted as data brokers: using technical workarounds to bypass website restrictions, extracting content unlawfully, and then selling it to AI firms.
Perplexity has also been accused of violating the robots.txt protocol—a long-standing web convention used to specify which parts of a site may or may not be crawled. Despite Reddit explicitly disallowing such activity, Perplexity’s crawlers allegedly ignored these directives, continuing to harvest data in defiance of what is often described as an “honor system” among responsible web operators.
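To make the robots.txt convention concrete, here is a minimal sketch in Python of how a well-behaved crawler might check a site's rules before fetching a page. The robots.txt content, the crawler names (ExampleBot, OtherBot), the example.com URLs, and the helper is_allowed are all hypothetical and chosen for illustration; they do not reproduce Reddit's actual file or any company's real crawler.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, for illustration only.
# It blocks "ExampleBot" from the whole site and blocks
# every other crawler from the /private/ directory.
EXAMPLE_ROBOTS_TXT = """\
User-agent: ExampleBot
Disallow: /

User-agent: *
Disallow: /private/
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes the file's lines; no network request is made here.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for agent, url in [
        ("ExampleBot", "https://example.com/r/python/"),
        ("OtherBot", "https://example.com/r/python/"),
        ("OtherBot", "https://example.com/private/page"),
    ]:
        verdict = "allowed" if is_allowed(EXAMPLE_ROBOTS_TXT, agent, url) else "disallowed"
        print(f"{agent} -> {url}: {verdict}")
```

The check is purely advisory: the parser only reports what the site has asked for, and nothing in the protocol technically prevents a crawler from skipping the check or ignoring the answer. That is why compliance is described as an honor system.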
Reddit has since issued a cease-and-desist letter demanding that Perplexity immediately halt all unauthorized scraping of its content. However, although Perplexity has publicly claimed that it has not used Reddit data, independent testing has shown that its chatbot continues to reference Reddit content in its responses, which contradicts that claim.
In response, Perplexity stated: “Perplexity has not yet received the lawsuit, but we will always fight vigorously for users’ rights to freely and fairly access public knowledge. Our approach remains principled and responsible as we provide factual answers with accurate AI, and we will not tolerate threats against openness and the public interest.”