• snowe@programming.dev
    arrow-up
    4
    ·
    6 days ago

    About 50% of traffic to programming.dev comes from bots that identify themselves as such in their user-agents. I’m pretty confident the actual number is higher, but I haven’t spent time validating that.
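A rough sketch of how a "share of self-declared bots" figure like the one above could be computed from a list of User-Agent strings. The substring heuristic (`bot`, `crawler`, `spider`) and the sample user-agents are assumptions for illustration, not what programming.dev actually uses. It only counts bots that announce themselves, which is why the real share could well be higher.

```python
import re

# Assumed heuristic: substrings that self-identifying crawlers commonly
# put in their User-Agent. Not the list programming.dev uses.
BOT_PATTERN = re.compile(r"bot|crawler|spider", re.IGNORECASE)


def self_declared_bot_share(user_agents):
    """Return the fraction of requests whose User-Agent self-identifies as a bot."""
    if not user_agents:
        return 0.0
    bots = sum(1 for ua in user_agents if BOT_PATTERN.search(ua))
    return bots / len(user_agents)


# Hypothetical sample: one User-Agent string per request.
sample = [
    "Mozilla/5.0 (X11; Linux x86_64) Firefox/128.0",
    "Mozilla/5.0 (compatible; ExampleBot/1.0)",
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) Chrome/126.0",
    "example-crawler/0.3",
]
print(self_declared_bot_share(sample))
```

A bot that sends a browser-like User-Agent is invisible to this kind of count.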

  • sudo@programming.dev
    arrow-up
    2
    ·
    6 days ago

    while others could be executing real-time searches when users ask AI assistants for information.

    WTF? Is this even considered AI anymore? Sounds more like a Just-In-Time search engine.

    The frequency of these crawls is particularly telling. Schubert observed that AI crawlers “don’t just crawl a page once and then move on. Oh, no, they come back every 6 hours because lol why not.” This pattern suggests ongoing data collection rather than one-time training exercises, potentially indicating that companies are using these crawls to keep their models’ knowledge current.

    What’s telling is that these scrapers aren’t just downloading the git repos and parsing those. They aren’t targeted in any way. They’re probably doing something primitive, like following every link they see and getting caught in loops. If the labyrinth solution works, then that confirms it.
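To illustrate the "following every link and getting caught in loops" guess, here is a minimal sketch under stated assumptions: `SITE` is a hypothetical three-page link graph held in memory, and `crawl` is an illustrative breadth-first crawler, not any real scraper's code. Without a visited set, the crawler only stops when its fetch budget runs out.

```python
from collections import deque

# Hypothetical three-page site whose links form a cycle.
SITE = {
    "/": ["/a", "/b"],
    "/a": ["/b"],
    "/b": ["/"],
}


def crawl(start, links, max_fetches, dedupe):
    """Breadth-first crawl; returns the list of pages fetched, in order."""
    queue = deque([start])
    seen = {start}
    fetched = []
    while queue and len(fetched) < max_fetches:
        page = queue.popleft()
        fetched.append(page)
        for target in links.get(page, []):
            if dedupe:
                # Skip pages that were already queued or fetched.
                if target in seen:
                    continue
                seen.add(target)
            queue.append(target)
    return fetched


naive = crawl("/", SITE, max_fetches=50, dedupe=False)
careful = crawl("/", SITE, max_fetches=50, dedupe=True)
print(len(naive), len(careful))
```

The naive run burns its whole budget on three pages, while the deduplicating run fetches each page once and stops. A labyrinth of endlessly generated pages would presumably keep even the deduplicating crawler busy, since every link it finds is new.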

  • onlinepersona@programming.dev
    arrow-up
    2
    arrow-down
    5
    ·
    edit-2
    7 days ago

    Maybe these open source sites should move off the public internet and use alternative DNS servers with signup and alternative TLDs. Something like OpenNIC, but with signup. Or go straight to darknets like Tor and I2P. Maybe I2P would be better, as it’s slower and crawlers would probably time out just trying to access content.

    Anti Commercial-AI license

    • Kissaki@programming.dev
      English
      arrow-up
      4
      ·
      edit-2
      6 days ago

      Unless you continuously change your IP, I don’t see how locking DNS resolution behind a signup would solve it. You only need to resolve once, and then you know the mapping of domain to IP and can use it elsewhere. That mapping doesn’t change often for hosted services.

      Any wall you build up will also apply to regular users you want to reach.
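A small sketch of the resolve-once point, under stated assumptions: `ResolveOnce`, `gated_dns`, the hostname `forge.example`, and the address `203.0.113.7` (a documentation-range IP) are all made up for illustration. The fake resolver stands in for a signup-gated DNS server so the example runs without network access; by default the class would fall back to Python's `socket.gethostbyname`.

```python
import socket


class ResolveOnce:
    """Cache hostname -> IP so the resolver is consulted only once per name."""

    def __init__(self, resolve=socket.gethostbyname):
        self._resolve = resolve
        self._cache = {}

    def lookup(self, hostname):
        if hostname not in self._cache:
            self._cache[hostname] = self._resolve(hostname)
        return self._cache[hostname]


# Stand-in for a signup-gated DNS server (hypothetical); records each query.
calls = []


def gated_dns(hostname):
    calls.append(hostname)
    return "203.0.113.7"  # documentation-range address, not a real host


client = ResolveOnce(resolve=gated_dns)
ips = [client.lookup("forge.example") for _ in range(1000)]
print(len(calls), ips[0])
```

A thousand lookups cost one query against the gated resolver. After that, the cached mapping can be copied to any other machine, which is why the gate only helps if the IP keeps changing.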