Cloudflare Unveils AI Crawler Leaderboard: ByteDance Ranks Last

Do Son · July 2, 2025

Cloudflare, the global internet services provider, has introduced an AI Crawler Leaderboard: a continuously updated list of good and bad actors that verifies, identifies, and rates web crawlers run by artificial intelligence companies across four key dimensions. The initial evaluation covers crawlers from OpenAI, Google, Meta, Anthropic, xAI, and ByteDance.

So far, only OpenAI’s ChatGPT crawlers have earned good ratings. xAI’s Grok crawler and ByteDance’s crawler sit at the bottom of the list, with ByteDance ranking last after failing every measured criterion.

The leaderboard will soon expand to rate RAG (retrieval-augmented generation) and search engine crawlers as well, with more companies added over time. Website administrators can use these ratings to decide whether to take more aggressive steps to block specific crawlers, especially since robots.txt on its own has become largely ineffective.

The four evaluation dimensions are as follows:

Verified crawler via IP:
Has the AI company publicly disclosed the IP ranges used by its crawlers? Publishing this information allows accurate identification and prevents malicious impersonation by rogue bots.
Verified crawler via WebBotAuth:
Does the crawler authenticate itself with WebBotAuth? This protocol proves a crawler’s identity with cryptographic signatures, which is more reliable than IP-based recognition alone.
Separate crawlers:
Does the company run separate crawlers for different purposes? Segmentation matters because it lets websites allow or block each crawler selectively, for instance blocking data-mining crawlers while allowing search-indexing crawlers that can drive valuable traffic.
Obeys robots.txt:
Does the crawler honor robots.txt, the standard industry convention that tells crawlers which parts of a site they may or may not access? Some crawlers ignore it entirely.
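To make the “separate crawlers” and robots.txt criteria concrete, here is a minimal Python sketch using the standard library’s urllib.robotparser. The robots.txt rules are a hypothetical example for example.com: they block GPTBot (which OpenAI describes as its training crawler) and ByteDance’s Bytespider, while allowing OpenAI’s search crawler OAI-SearchBot. The user-agent tokens are used purely for illustration. The check only shows what a compliant crawler would conclude; robots.txt is advisory, so a crawler that ignores it is not stopped by it.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for example.com (illustration only):
# block a training crawler and Bytespider, allow a search crawler,
# and keep every other agent out of /private/.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: OAI-SearchBot
Allow: /

User-agent: Bytespider
Disallow: /

User-agent: *
Disallow: /private/
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text held in memory instead of fetching it."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())  # parse() also marks the rules as loaded
    return parser


def is_allowed(parser: RobotFileParser, agent_token: str, url: str) -> bool:
    """What a robots.txt-compliant crawler named agent_token would decide."""
    return parser.can_fetch(agent_token, url)


if __name__ == "__main__":
    rp = build_parser(ROBOTS_TXT)
    for agent in ("GPTBot", "OAI-SearchBot", "Bytespider", "SomeOtherBot"):
        print(agent, is_allowed(rp, agent, "https://example.com/articles/1"))
```

Pass the crawler’s short product token (such as GPTBot), not the full User-Agent header: robotparser matches against the part of the agent string before the first “/”, so a header beginning with “Mozilla/5.0” would not match any named group.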

ByteDance’s crawlers reportedly scour the entire internet daily while ignoring robots.txt. ByteDance has also not published the IP ranges its bots use, so administrators cannot verify whether traffic claiming to come from “Bytespider” actually originates from ByteDance.
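The IP-based check that published ranges make possible can be sketched with Python’s ipaddress module. Everything in it is hypothetical: “ExampleBot” is an invented crawler, and its ranges are reserved documentation prefixes (192.0.2.0/24 and 2001:db8::/32), not any company’s real addresses. A request that claims to be a crawler with published ranges is either confirmed or flagged as a likely impersonator; a client such as Bytespider, for which no ranges are published, can only be marked unverifiable.

```python
import ipaddress

# Hypothetical published ranges for a hypothetical crawler "ExampleBot".
# These are documentation prefixes (RFC 5737 / RFC 3849), not real crawler IPs.
PUBLISHED_RANGES = {
    "ExampleBot": [
        ipaddress.ip_network("192.0.2.0/24"),
        ipaddress.ip_network("2001:db8::/32"),
    ],
}


def classify_request(user_agent: str, remote_ip: str) -> str:
    """Return 'verified', 'impersonator', or 'unverifiable' for one request."""
    addr = ipaddress.ip_address(remote_ip)
    for crawler, networks in PUBLISHED_RANGES.items():
        if crawler.lower() in user_agent.lower():
            # The request claims to be this crawler, so its source IP must
            # fall inside one of the published ranges. Membership is simply
            # False when the address and network are different IP versions.
            if any(addr in net for net in networks):
                return "verified"
            return "impersonator"
    # No published ranges exist for whatever this client claims to be.
    return "unverifiable"


if __name__ == "__main__":
    print(classify_request("Mozilla/5.0 (compatible; ExampleBot/1.0)", "192.0.2.10"))
    print(classify_request("Mozilla/5.0 (compatible; ExampleBot/1.0)", "203.0.113.5"))
    print(classify_request("Mozilla/5.0 (compatible; Bytespider)", "203.0.113.5"))
```

In practice an administrator would load the ranges from the list a crawler’s operator publishes and refresh it periodically; hardcoding them here is only for illustration.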

Other AI crawlers have fallen short as well. Anthropic’s crawlers and xAI’s Grok crawler, for example, may also fail to honor robots.txt. Because none of these companies have provided verifiable IP ranges, Cloudflare cannot yet determine with certainty whether they follow crawler best practices.

Written by

@DdoS · Security Researcher

Do Son

Do Son is the Founder and Editor of SecurityOnline.info. Working in cybersecurity since 2013, he reports on vulnerabilities, malware, and emerging threats, providing timely analysis to help organizations and individuals stay ahead of evolving risks.
