Anthropic’s bot crawlers create problems on web traffic
Isabella V, 1 August 2024

 


Entrepreneurs report problems with excessive AI bot traffic: Anthropic under fire

Key points:
- AI bot traffic: Websites such as iFixit and Read the Docs have seen a surge in traffic caused by Anthropic’s bots.
- Impact on resources: Companies report excessive bandwidth consumption and strain on DevOps resources.
- Compliance with robots.txt: Anthropic’s claims about honoring robots.txt files differ from what site operators report.
- Contact with Anthropic: The company invites affected sites to report problems via email.

Recently, many web entrepreneurs have expressed growing concerns about excessive traffic generated by bots from artificial intelligence companies. The operators of popular websites such as iFixit, Read the Docs and Freelancer.com have reported significant spikes in traffic, attributing them to bots created by AI startup Anthropic. These bots, used to collect training data, allegedly strained the bandwidth and online resources of the companies involved, often ignoring exclusion instructions defined in robots.txt files.

Kyle Wiens, co-founder and CEO of iFixit, publicly reported on X (formerly known as Twitter) that his site’s servers were hit more than a million times in less than 24 hours by Anthropic bots. Wiens pointed out that this unauthorized traffic not only violated iFixit’s terms of service, which prohibit the use of their content for AI model training, but also burdened the site’s DevOps resources.

Eric Holscher, co-founder of the Read the Docs platform, reported similar experiences. Holscher said that the traffic generated by Anthropic’s bots caused significant bandwidth costs and required intensive abuse management. The main concern is that these behaviors could lead to widespread blocking of all AI crawlers, not only for copyright reasons, but also to protect site resources.

Part of the dispute centers on robots.txt files, which website owners use to tell bots which pages may and may not be crawled. Although Anthropic has stated that its crawler, called ClaudeBot, complies with these directives, operators of sites such as iFixit report a different experience. Jennifer Martinez, a spokesperson for Anthropic, said that ClaudeBot is designed to respect robots.txt directives and that any problems could stem from outdated configurations of these files.
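For background, a crawler that honors robots.txt is expected to fetch the file and check its Allow/Disallow rules before requesting pages. The minimal sketch below, using Python’s standard-library urllib.robotparser, illustrates what such a check looks like; the site URL, page path, and the ClaudeBot user-agent string are purely illustrative and do not represent Anthropic’s actual crawler code.

```python
# Minimal sketch of a robots.txt compliance check (illustrative only).
# A site wishing to exclude a specific crawler might publish directives such as:
#   User-agent: ClaudeBot
#   Disallow: /
from urllib import robotparser

ROBOTS_URL = "https://www.example.com/robots.txt"   # hypothetical site
USER_AGENT = "ClaudeBot"                             # agent name cited by site operators

parser = robotparser.RobotFileParser()
parser.set_url(ROBOTS_URL)
parser.read()  # download and parse the site's robots.txt

page = "https://www.example.com/repair-guides/page-1"  # hypothetical page
if parser.can_fetch(USER_AGENT, page):
    print("robots.txt permits this user-agent to crawl the page")
else:
    print("robots.txt disallows this user-agent; a compliant crawler skips it")
```

Whether a crawler actually acts on the result of such a check is entirely up to the crawler, which is why site operators report mismatches between Anthropic’s stated policy and the traffic they observe.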

Anthropic also urged website operators to contact the company directly via email (claudebot@anthropic.com) to report any malfunction or abnormal behavior of its bots.

This case raises important questions about the responsible use of AI technologies and the need to strike a balance between collecting data for model training and respecting the resources and policies of websites. Collaboration between AI companies and website owners will be essential to mitigate these issues and ensure sustainable use of online resources.