• algernon@lemmy.ml
      1 day ago

      It can stop them nowadays, by firewalling some of the crawlers off. It doesn’t stop them by default because it serves them poisoned URLs, which it can later identify if the crawlers come back riding a headless Chrome. But once they do that and hit a poisoned URL, there’s little reason to let them wander further in an endless maze: serve one request, then block the IP.
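
To make the flow concrete, here is a rough Python sketch of the idea, not the tool's actual implementation. Everything in it is an assumption for illustration: the `/maze/` prefix, the HMAC-tagged URL scheme, the `looks_like_browser` flag, and the in-memory block list (a real setup would push the IP to a firewall instead).

```python
import hashlib
import hmac
import secrets

SECRET = secrets.token_bytes(32)  # hypothetical per-deployment key
POISON_PREFIX = "/maze/"          # hypothetical path prefix for poisoned links
blocked_ips: set[str] = set()     # stand-in for a real firewall block list


def _tag(seed: str) -> str:
    """Short keyed tag so only this server can mint valid poisoned URLs."""
    return hmac.new(SECRET, seed.encode(), hashlib.sha256).hexdigest()[:16]


def make_poisoned_url(seed: str) -> str:
    """Build a URL that is only ever handed out to known crawlers."""
    return f"{POISON_PREFIX}{seed}-{_tag(seed)}"


def is_poisoned(path: str) -> bool:
    """Recognise a URL that was previously served as poison."""
    if not path.startswith(POISON_PREFIX):
        return False
    seed, sep, tag = path[len(POISON_PREFIX):].rpartition("-")
    if not sep:
        return False
    return hmac.compare_digest(tag.encode(), _tag(seed).encode())


def handle(ip: str, path: str, looks_like_browser: bool) -> str:
    """Decide what to do with one request; returns a label for the action."""
    if ip in blocked_ips:
        return "drop"
    if is_poisoned(path):
        if looks_like_browser:
            # A "browser" asking for a URL only crawlers were ever given:
            # answer this one request, then block the IP.
            blocked_ips.add(ip)
            return "serve-once-then-block"
        return "serve-maze"  # known crawler: keep feeding it the maze
    return "serve-real"
```

A known crawler requesting a poisoned URL keeps getting the maze, while the same URL requested by something that presents as a browser gets one response and its IP is then dropped on every later request.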

      I’ve been running that on my own infra, and my daily number of requests went down from 50+ million to… 2 million.