I realise this is a known issue and that lemmy.world isn’t the only instance that does this. Also, I’m aware that there are other things affecting federation. But I’m seeing some things not federate, and can’t help thinking that things would be going smoother if all the output from the biggest lemmy instance wasn’t 50% spam.

Hopefully this doesn’t seem like I’m shit-stirring, or trying to make the Issue I’m interested in more important than other Issues. It’s something I mention occasionally, but it might be a bit abstract if you’re not the admin of another instance.

The red terminal is a tail -f of the nginx log on my server. The green terminal is outputting some details from the ActivityPub JSON containing the Announce. You should be able to see the correlation between the lines in the nginx log, and lines from the activity, and that everything is duplicated.

This was generated by me commenting on an old post, using content that spawns an answer from a couple of bots, and then me upvoting the response. So CREATE, CREATE, LIKE is being announced as CREATE, CREATE, CREATE, CREATE, LIKE, LIKE. If you scale that up to every activity by every user, you’ll appreciate that LW is creating a lot of work for everyone else in the Fediverse, just to filter out the duplicates.
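To make "filtering out the duplicates" concrete, here is a minimal sketch of what a receiving server might do. It assumes duplicate Announces wrap an inner activity with the same `id`, which matches what the green terminal shows; the function name and the in-memory set are illustrative only and not taken from Lemmy or any other implementation.

```python
import json


def filter_duplicate_announces(raw_bodies, seen_ids=None):
    """Yield each announced activity once, keyed on the inner activity's id.

    Assumption: a duplicate Announce carries the same object.id as the
    first copy. A real server would persist seen ids, not keep them in RAM.
    """
    seen_ids = set() if seen_ids is None else seen_ids
    for raw in raw_bodies:
        activity = json.loads(raw)
        inner = activity.get("object")
        # The Announce normally embeds the original activity as a dict;
        # if it is only a URL string, use that string as the key instead.
        key = inner.get("id") if isinstance(inner, dict) else inner
        if key is None:
            key = activity.get("id")
        if key in seen_ids:
            continue  # already processed: this is the wasted work
        seen_ids.add(key)
        yield activity
```

Even with a check like this in place, every duplicate still costs a full HTTP request, a signature check and a JSON parse before it can be thrown away.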

  • bamboo@lemmy.blahaj.zone · 3 months ago

    Are you able to include the HTTP method being called and the amount of data transferred per request? It’s possible that the first request is an OPTIONS request and then the second request is a POST.

    If you can see the amount of data transferred, then you’d have more of an indication that double the requests are being sent, and could at least quantify the bandwidth impact.

    • freamon@lemmy.world (OP) · 3 months ago

      They’re all POST requests. I trimmed it out of the log for space, but the first 6 requests on the video looked like this (the size column in nginx’s default log format counts the response body, so it shows 0 here rather than the size of the POSTed data):

      ip.address - - [07/Apr/2024:23:18:44 +0000] "POST /inbox HTTP/1.1" 200 0 "-" "Lemmy/0.19.3; +https://lemmy.world"
      ip.address - - [07/Apr/2024:23:18:44 +0000] "POST /inbox HTTP/1.1" 200 0 "-" "Lemmy/0.19.3; +https://lemmy.world"
      ip.address - - [07/Apr/2024:23:19:14 +0000] "POST /inbox HTTP/1.1" 200 0 "-" "Lemmy/0.19.3; +https://lemmy.world"
      ip.address - - [07/Apr/2024:23:19:14 +0000] "POST /inbox HTTP/1.1" 200 0 "-" "Lemmy/0.19.3; +https://lemmy.world"
      ip.address - - [07/Apr/2024:23:19:44 +0000] "POST /inbox HTTP/1.1" 200 0 "-" "Lemmy/0.19.3; +https://lemmy.world"
      ip.address - - [07/Apr/2024:23:19:44 +0000] "POST /inbox HTTP/1.1" 200 0 "-" "Lemmy/0.19.3; +https://lemmy.world"
      

      If I was running Lemmy, every second line would say 400, from it rejecting it as a duplicate. In terms of bandwidth, every line represents a full JSON payload, so I guess it’s about 2K minimum for the standard cruft, plus however much for the actual contents of the comment (the comment replying to this would’ve been 8K).
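For anyone wanting to check their own logs for the same pattern, a rough sketch along these lines pulls the method out of each line and counts how many identical requests share a timestamp. It assumes nginx's default "combined" log format, as in the sample above; the regular expression and the `summarise` helper are illustrative, not part of any tool mentioned in this thread.

```python
import re
from collections import Counter

# Matches the start of a line in nginx's default "combined" format.
LOG_LINE = re.compile(
    r'^(?P<addr>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>[A-Z]+) (?P<path>\S+) [^"]*" '
    r'(?P<status>\d{3}) (?P<bytes>\d+)'
)


def summarise(lines):
    """Count requests per (timestamp, method, path); unparseable lines are skipped."""
    counts = Counter()
    for line in lines:
        match = LOG_LINE.match(line.strip())
        if match is None:
            continue
        counts[(match["time"], match["method"], match["path"])] += 1
    return counts


if __name__ == "__main__":
    sample = [
        'ip.address - - [07/Apr/2024:23:18:44 +0000] "POST /inbox HTTP/1.1" 200 0 "-" "Lemmy/0.19.3; +https://lemmy.world"',
        'ip.address - - [07/Apr/2024:23:18:44 +0000] "POST /inbox HTTP/1.1" 200 0 "-" "Lemmy/0.19.3; +https://lemmy.world"',
    ]
    for (time, method, path), n in summarise(sample).items():
        print(time, method, path, "x", n)
```

Requests that merely land in the same second are not necessarily duplicates, so this only flags candidates; the bodies still have to be compared to be sure.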

      My server just took the requests and dumped the bodies out to a file, and then a script was outputting the object.id, object.type and object.actor into /tmp/demo.txt (which is another confirmation that they were POST requests, of course)
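The extraction step described above could look roughly like the following. This is a sketch of the idea, not the actual script: the function name is made up, and it assumes each dumped body is a single Announce whose `object` is the embedded activity.

```python
import json


def describe_announce(raw_body):
    """Return 'object.id object.type object.actor' for an Announce, else None."""
    activity = json.loads(raw_body)
    if activity.get("type") != "Announce":
        return None
    inner = activity.get("object")
    if not isinstance(inner, dict):
        # Some servers announce a bare URL instead of embedding the activity.
        return None
    return "{} {} {}".format(inner.get("id"), inner.get("type"), inner.get("actor"))


if __name__ == "__main__":
    # Hypothetical body with placeholder URLs, only to show the output shape.
    body = json.dumps({
        "type": "Announce",
        "object": {
            "id": "https://example.invalid/activities/create/1",
            "type": "Create",
            "actor": "https://example.invalid/u/someone",
        },
    })
    print(describe_announce(body))
```

Appending each returned line to a file and running `tail -f` on it would give something like the green terminal in the video.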

    • hoshikarakitaridia@lemmy.world · 3 months ago

      If the first one is OPTIONS, would that be a bug? Would the right design principle be to do it once per endpoint and then cache it for future requests?

      I’m really curious cause I don’t know how this usually works…

        • Lemzlez@lemmy.world · 3 months ago

          I’ve never really seen this in (Java/Rust/PHP) backends personally, only in client-side JS (the CORS preflight).

          It’s a security feature for browsers doing calls (checking the CORS headers before actually calling the endpoint), but for backends the only place it makes sense is if you’re implementing something like webhooks, to validate the (user submitted) endpoint.

          • ericjmorey@discuss.online · 3 months ago

            I wonder if the legacy webhooks implementation in Lemmy has left some artifacts that show up when the services that comprise Lemmy are split up as they are for larger instances.

            This is pure speculation.