LLM scrapers are taking down FOSS projects’ infrastructure, and it’s getting worse.

  • sudo@programming.dev
    3 days ago

    The assumption is correct. PoW has been proven to significantly reduce bot traffic.

    What you’re doing is filtering out bots that can’t be bothered to execute JavaScript. You don’t need a computationally heavy PoW task to do that.

    meanwhile the mere existence of residential proxies has exploded the availability of easy bot campaigns.

    Correct, and that’s why they are the number one expense for any scraping company. Any scraper that can’t be bothered to spin up a headless browser isn’t going to cough up the dough for residential proxies.

    Demonstrably false… people already do this with abysmal results. Need to visit a clownflare site? Endless captcha loops. No thanks

    That’s not what “demonstrably false” even means. Canvas fingerprinting filters out bots better than PoW. What you’re complaining about is overly strict settings that deny some users. Set your Anubis difficulty too high and you’ll have users waiting a long time while their batteries drain.
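
To make the difficulty trade-off concrete, here is a minimal sketch of a hashcash-style proof-of-work check. It is not Anubis's actual implementation; the challenge string, the SHA-256 leading-zero-bits rule, and every function name are assumptions chosen for illustration. The point it shows is that each extra difficulty bit doubles the expected work for the client, whether that client is a bot or a phone.

```python
import hashlib
import itertools


def leading_zero_bits(digest: bytes) -> int:
    """Count the zero bits at the start of a digest."""
    bits = 0
    for byte in digest:
        if byte == 0:
            bits += 8
            continue
        bits += 8 - byte.bit_length()
        break
    return bits


def solve(challenge: str, difficulty_bits: int) -> int:
    """Client side: brute-force a nonce (hypothetical scheme)."""
    for nonce in itertools.count():
        digest = hashlib.sha256(f"{challenge}:{nonce}".encode()).digest()
        if leading_zero_bits(digest) >= difficulty_bits:
            return nonce


def verify(challenge: str, nonce: int, difficulty_bits: int) -> bool:
    """Server side: a single hash, regardless of difficulty."""
    digest = hashlib.sha256(f"{challenge}:{nonce}".encode()).digest()
    return leading_zero_bits(digest) >= difficulty_bits


def expected_attempts(difficulty_bits: int) -> int:
    """Average number of hashes a client needs at this difficulty."""
    return 2 ** difficulty_bits


if __name__ == "__main__":
    nonce = solve("example-challenge", 12)
    print(nonce, verify("example-challenge", nonce, 12))
    print(expected_attempts(12), expected_attempts(16))
```

Verification stays at one hash for the server, so raising the difficulty only moves cost onto visitors, which is the battery-drain concern above.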

    • refalo@programming.dev
      3 days ago

      What you’re doing is filtering out bots that can’t be bothered to execute JavaScript. You don’t need a computationally heavy PoW task to do that.

      Most bots and scrapers from what I’ve seen already are using (headless) full browsers, and hence are executing JavaScript, so I think anything that slows them down or increases their cost can reduce the traffic they bring.

      Canvas fingerprinting filters out bots better than PoW

      Source? I strongly disagree. It’s not hard to change your browser characteristics to get a new canvas fingerprint every time; some browsers, like Firefox, even have built-in options for it.

      • sudo@programming.dev
        11 hours ago

        Most bots and scrapers from what I’ve seen already are using (headless) full browsers

        That’s not going to be the majority of your bot traffic by a long shot because it doesn’t scale like using basic HTTP requests.

        This is from personal experience. With PoW you just need any puppeted browser, maybe less. With canvas fingerprinting you need a heavily customized scraping browser, either one you made yourself or one you’re paying for. If that’s the case, the cost of PoW is negligible. If you still want actual stats, I’d have to ask where you’re getting any stats on PoW working.

      • YetiSkotch@ieji.de
        3 days ago

        @refalo @sudo If proof of work gets widely adopted, I foresee a future where bot-running data centers can out-compute humans to visit sites, while the old devices of users in poorer countries struggle for hours to compute the required task … Or is that fear misguided?
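
The arithmetic behind that worry can be sketched as follows. The hash rates are made-up placeholders, not measurements of any real device, and the function name is hypothetical. Under a hashcash-style scheme the expected solve time is roughly 2^bits divided by the solver's hash rate, so the gap between a slow device and a fast one stays the same at every difficulty setting.

```python
def expected_solve_seconds(difficulty_bits: int, hashes_per_second: float) -> float:
    """Expected time to find a valid nonce: 2^bits attempts on average."""
    return (2 ** difficulty_bits) / hashes_per_second


# Placeholder hash rates, assumed purely for illustration.
OLD_PHONE_HPS = 20_000.0       # hypothetical old handset running JavaScript
DATACENTER_HPS = 50_000_000.0  # hypothetical server running native code

if __name__ == "__main__":
    for bits in (16, 20, 24):
        phone = expected_solve_seconds(bits, OLD_PHONE_HPS)
        server = expected_solve_seconds(bits, DATACENTER_HPS)
        print(f"{bits} bits: phone {phone:.2f}s, server {server:.5f}s, "
              f"ratio {phone / server:.0f}x")
```

With these assumed rates the ratio is fixed by the hardware, so turning the difficulty up costs the slowest visitors the most in absolute waiting time.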

        • sudo@programming.dev
          11 hours ago

          Admins will always turn down the bot management when it starts blocking end users. At that point you cough up the money for the extra bandwidth and investigate different solutions.