The AI data gold rush meets its match: Cloudflare

TechCrunch:

Cloudflare, the publicly traded cloud service provider, has launched a new, free tool to prevent bots from scraping websites hosted on its platform for data to train AI models.

“Customers don’t want AI bots visiting their websites, and especially those that do so dishonestly,” the company writes on its official blog. “We fear that some AI companies intent on circumventing rules to access content will persistently adapt to evade bot detection.”

Cloudflare’s stepping into the AI scraping fray with a new tool to block sneaky bots. The tool uses machine learning (ironically) to spot AI bots trying to masquerade as regular users.

It’s a timely move, given the recent kerfuffle over AI companies like Perplexity playing fast and loose with web scraping ethics.

AI companies really need to start being more respectful of content creators, because I can feel the tide turning against them. More and more people and companies who publish on the web are becoming anti-AI.

After the story broke about Perplexity not respecting robots.txt, it felt like loads of people started thinking about how to block AI web crawlers for the first time. Before that, they hadn’t even thought about it.
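
For anyone who hasn’t looked at robots.txt before, here is a minimal sketch of what blocking AI crawlers with it might look like, checked with Python’s standard `urllib.robotparser`. The rules, the crawler names (`GPTBot`, `PerplexityBot`), the `is_allowed` helper and the example URL are illustrative assumptions, not anything taken from the TechCrunch piece; check each vendor’s documentation for the user agent tokens it currently uses.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: block two AI crawlers by name, allow everyone else.
# The user agent tokens here are examples only; verify them with each vendor.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: PerplexityBot
Disallow: /

User-agent: *
Allow: /
"""


def is_allowed(user_agent: str, url: str, robots_txt: str = ROBOTS_TXT) -> bool:
    """Return True if the given rules let `user_agent` fetch `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for agent in ("GPTBot", "PerplexityBot", "Mozilla/5.0"):
        print(agent, is_allowed(agent, "https://example.com/post"))
```

The catch is that robots.txt is only a request. It works when a crawler identifies itself honestly and chooses to obey the rules, which is exactly what was in question in the Perplexity story.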

Cloudflare’s tool might help. But the real solution needs to come from the AI industry taking a long, hard look at its data practices and, quite simply, not being dicks.