cross-posted from: https://poptalk.scrubbles.tech/post/3263324

Sorry for the alarming title, but admins, for real: go set up Anubis.

For context, Anubis is essentially a gatekeeper/rate limiter for small services. From them:

(Anubis) is designed to help protect the small internet from the endless storm of requests that flood in from AI companies. Anubis is as lightweight as possible to ensure that everyone can afford to protect the communities closest to them.

It puts forward a challenge that must be solved to gain access, and judges how trustworthy a connection is. The vast majority of real users will never notice it, or will notice only a small delay the first time they access your site. Even smaller scrapers may get by relatively easily.

Big scrapers though, the AI crawlers and trainers, get hit with computational problems that waste their compute before they are let in. (Trust me, I worked for a company that did “scrape the internet”, and compute is expensive and a constant worry for them, so win-win for us!)
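
To give a feel for what "computational problems" means here, below is a minimal sketch of a generic hash-based proof-of-work challenge. It is an illustration of the idea only, not Anubis's actual code or challenge format: the function names, the challenge string, and the "leading zero hex digits" difficulty rule are all assumptions made for the example.

```python
import hashlib


def verify(challenge: str, nonce: int, difficulty: int) -> bool:
    """Cheap check on the server side: one hash, one prefix comparison."""
    digest = hashlib.sha256(f"{challenge}{nonce}".encode()).hexdigest()
    return digest.startswith("0" * difficulty)


def solve(challenge: str, difficulty: int) -> int:
    """Expensive search on the client side: try nonces until one verifies."""
    nonce = 0
    while not verify(challenge, nonce, difficulty):
        nonce += 1
    return nonce


if __name__ == "__main__":
    # Hypothetical challenge value; a real server would issue a random one.
    nonce = solve("example-challenge", 3)
    print("found nonce:", nonce, "valid:", verify("example-challenge", nonce, 3))
```

The asymmetry is the point: checking costs one hash, while solving costs about 16 times more work for every extra digit of difficulty. One visitor barely notices that, but a crawler pays it over and over across millions of requests.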

Anubis ended up taking maybe 10 minutes to set up. For Lemmy hosts, you literally just point your UI proxy at Anubis and point Anubis at Lemmy UI. It slots right in with minimal setup.

These graphs cover the time since I turned it on, less than an hour ago. I have a small instance with only a few people, and my CPU usage and requests per minute dropped immediately. I have already had thousands of requests challenged; I had no idea I was being scraped this much! You can see them backing off in the charts.

(FYI, this only stops the web requests, so it does nothing to the API or federation. Those are proxied elsewhere, so it really does only target web scrapers.)
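
To make that split concrete, here is a rough Python model of the routing decision, assuming the proxy picks an upstream from the request method and the `Accept` header, as in the nginx snippet further down this thread. The upstream addresses `anubis:8080` and `lemmy:8536` come from that snippet; the function and constant names are made up for illustration, and your own proxy rules may differ.

```python
import re

# Upstreams as they appear in the nginx snippet later in the thread.
ANUBIS = "http://anubis:8080/"    # browser traffic, challenged by Anubis
LEMMY_API = "http://lemmy:8536/"  # API and federation, never challenged

# Same pattern as the nginx rule: Accept header starting with "application/".
_API_ACCEPT = re.compile(r"^application/.*$")


def choose_upstream(method: str, accept: str = "") -> str:
    """Return the upstream a request would be proxied to (illustrative model)."""
    if method == "POST":
        return LEMMY_API
    if _API_ACCEPT.match(accept):
        return LEMMY_API
    return ANUBIS


if __name__ == "__main__":
    print(choose_upstream("GET", "text/html,application/xhtml+xml"))
    print(choose_upstream("GET", "application/activity+json"))
    print(choose_upstream("POST", "text/html"))
```

A browser sends an `Accept` header that starts with `text/html`, so it lands on Anubis. Federation and API clients that ask for `application/...` content, and anything that POSTs, skip it entirely.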

  • Rimu@piefed.social (M) · 15 upvotes · edited · 17 days ago

    I tried twice, for hours each time, to set this up in a way that does not break federation, the API, or the web UI. It’s not as easy as the OP thinks, and problems aren’t always obvious immediately.

    Any PieFed instance owners out there who are getting flooded with requests from bots - send me a PM and I’ll let you know how I solved it for piefed.social.

    I’d rather not post it publicly because it could be circumvented if ‘they’ knew how I’m doing it.

    • Scrubbles@poptalk.scrubbles.tech (OP) · 6 upvotes · 18 days ago

      It isn’t supposed to sit in front of everything, only the web UI. The API and federation endpoints should pass through your proxy as they always have. If you are following the standard Lemmy setup, it should be a one-line change in your nginx conf.

          • wjs018@piefed.wjs018.xyz · 6 upvotes · 17 days ago

            I’m confused why you are talking about Lemmy containers in a PieFed-focused community. PieFed doesn’t split the web UI out into a separate container, by design. So the solutions and the difficulty of implementing them are very different.

            • Scrubbles@poptalk.scrubbles.tech (OP) · 1 upvote · edited · 17 days ago

              Not terribly. I posted this because I think it would help any fediverse site. It’s just a proxy that sits in front of whatever traffic you choose to send to it. So whatever routes go to the web client (probably /), you just forward to Anubis, which forwards on to PieFed. Lemmy, PieFed, Mastodon, any web-based app: that’s how you would do it. You can be granular and go route by route, or do all of it. It’s not hard-coded for any site.

              Before I set this up, my puny site was getting hundreds of heavy requests per minute from bots. I can’t imagine what all fediverse sites are dealing with. I wanted to let fediverse admins know because I’m going to see a noticeable drop in the bills I pay to host my instance, and I believe that would help other admins, which in turn will make the fediverse stronger.

              • Rimu@piefed.social (M) · 6 upvotes · 17 days ago

                Yes, we know.

                This is a blog post about Anubis I wrote a few months back, when I thought I had it working: https://join.piefed.social/2025/07/09/an-anubis-config-for-piefed/

                After writing that post I took another approach and moved the logic into nginx instead, as the Anubis configuration language was impossible to debug. That seemed fine, but then I got hit by a different, weird Anubis bug that had been languishing in their issue queue for months with no solution, and I gave up.

                It’s just not ready. It’s bad software. Their documentation is very misleading. I hope no one else loses as many days as I did.

                • Scrubbles@poptalk.scrubbles.tech (OP) · 3 upvotes · 17 days ago

                  See, this message would have been better at the beginning of this thread; it could have been a much better dialogue between us.

                  I see in your script you’re doing the filtering at Anubis:

                  request.path.startsWith("/api/")
                  

                  I took the opposite approach: I filter at my proxy (nginx) and only send web traffic to Anubis. With Lemmy, since there are two containers for web and API, it looks like this:

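                                  # Default upstream: this used to be the web UI; now Anubis takes
                                  # web traffic and passes it on to the Lemmy UI downstream.
                                  set $proxpass "http://anubis:8080/";
                                  # Clients asking for application/* content go straight to the API.
                                  if ($http_accept ~ "^application/.*$") {
                                    set $proxpass "http://lemmy:8536/"; # api
                                  }
                                  # POST requests also go straight to the API.
                                  if ($request_method = POST) {
                                    set $proxpass "http://lemmy:8536/"; # api
                                  }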
                  

                  This way everything that goes to Anubis is 100% okay for it to handle. And if there are endpoints that may not work behind it (someone called out the OAuth flow), you can filter those out to go directly to the UI.

                  For PieFed, even if you don’t have a proxy in front now (which honestly would surprise me), I think it’d be better to add one and then filter at that level. Let Anubis do what it does best, and let Traefik/nginx/Caddy/whatever do what it does best: route traffic.

                  For safety you could do the reverse: allow everything and cut endpoints over one by one.

                • Blaze (he/him)@piefed.zip · 1 upvote · 17 days ago

                  It’s just not ready. It’s bad software. Their documentation is very misleading. I hope no one else loses as many days as I did.

                  That’s unfortunate

    • sga@piefed.social · 2 upvotes · 17 days ago

      If it helps, I think the slrpnk Lemmy community did set up Anubis and they still federate. Not sure if there are Lemmy-specific problems though.