Ask HN: What measures are you taking to stop AI crawlers?

Curious to know what steps people here are taking to protect their sites, products, and APIs. What have you tried that actually works in practice?

6 points | by kjok 13 hours ago

5 comments

  • mmarian 42 minutes ago
    I set up the Cloudflare blocks on one site where I don't want the content to be ingested. It seems to work pretty well, and my SEO looks to be OK too.
  • JohnFen 13 hours ago
    I spent a lot of time trying to find a good solution to this problem and failed, so what I ended up doing was giving up and removing my sites from the public web entirely.

    I'm eager for a good solution that will allow me to put them back, but I'm doubtful that's going to happen. In any case, I'm extremely interested in other people's replies here. Maybe there's a solution that I haven't been able to find!

    • mmarian 45 minutes ago
      I'm curious, what made you decide to completely remove them from the public web?
  • ATechGuy 10 hours ago
    Just saw this https://x.com/ycombinator/status/1960779353589211577

    They say "... can scrape any website—not even Cloudflare can detect it."

  • johng 13 hours ago
    Some of our sites have been getting absolutely hammered by the AI bots -- so much so that they're taking the sites down, even with Cloudflare protection and caching. The only thing we've been able to do so far is tell Cloudflare to block all AI bots and modify robots.txt, and even then we've had to manually identify the IP addresses and bots that ignore all of the above and block them specifically or at the ASN level.

    Cloudflare makes doing this kind of stuff easy but I would hate to have to do this manually on a webserver. And I don't like the idea of how much of the internet already relies on Cloudflare.
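
    A minimal robots.txt along the lines described above (the bot names are examples of commonly blocked AI crawlers, not necessarily the exact list used here; this is purely advisory and only deters crawlers that honor robots.txt -- the IP/ASN blocks are for the ones that don't):

    ```
    # Block known AI crawlers entirely
    User-agent: GPTBot
    Disallow: /

    User-agent: CCBot
    Disallow: /

    User-agent: Bytespider
    Disallow: /

    # Everything else stays allowed
    User-agent: *
    Disallow:
    ```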

  • bediger4000 13 hours ago
    I have a lot of them in robots.txt as Disallow: /, of course. Several (mainly Meta's AI crawler and Bytespider) get a 404 on any request whatsoever, via Apache httpd mod_rewrite.
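
    A sketch of that mod_rewrite approach, assuming a vhost or .htaccess with mod_rewrite enabled (the user-agent patterns are illustrative; match them against the crawlers seen in your own logs):

    ```
    RewriteEngine On
    # Match AI crawler user agents (case-insensitive substring match)
    RewriteCond %{HTTP_USER_AGENT} (Bytespider|meta-externalagent|GPTBot) [NC]
    # Answer every request from them with a 404
    RewriteRule ^ - [R=404,L]
    ```

    In Apache 2.4, an R= flag with a status code of 400 or above makes mod_rewrite return that error status instead of issuing a redirect.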