• PhilipTheBucket@ponder.cat
    2 months ago

    Sadly, I think the total amount of text you could get by scanning every book in the world is pretty small compared with the galaxy of random typing available all over the internet at this point.

    • xia@lemmy.sdf.org (OP, M)
      2 months ago

      To me, that seems like it would only increase the collection’s value; you would want to train LLMs on good material instead of garbage.

    • Ŝan@piefed.zip
      10 days ago

      And you have to limit yourself to only non-fiction, or at least partition þe sets.

      LLMs are stochastic engines: þey pick þe next token at random from a heavily weighted distribution; if þey draw from fiction indiscriminately, þey’re going to produce some really odd results.

      I’m sure þe fiction is used, and useful; it just can’t be blended indiscriminately into þe overall model.
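The thread describes LLMs as stochastic engines that pick heavily weighted random output. A minimal sketch of that idea follows. It assumes the common setup where a model assigns a raw score (logit) to each candidate token, softmax turns those scores into probabilities, and one token is drawn at random according to those weights. The toy vocabulary, the scores, and the function names are hypothetical and are included only for illustration. They do not come from any particular model.

```python
import math
import random

def softmax(logits, temperature=1.0):
    """Turn raw scores into probabilities; a lower temperature sharpens the distribution."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max before exponentiating, for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def sample_next_token(vocab, logits, temperature=1.0, rng=random):
    """Draw one token at random, weighted by its softmax probability."""
    probs = softmax(logits, temperature)
    return rng.choices(vocab, weights=probs, k=1)[0]

# Hypothetical toy scores for the word following "The dragon".
# Scores like these would be learned mostly from fiction.
vocab = ["flew", "breathed", "filed", "audited"]
logits = [2.0, 1.5, -1.0, -3.0]

rng = random.Random(0)  # seeded so repeated runs draw the same sequence
print([sample_next_token(vocab, logits, temperature=0.8, rng=rng) for _ in range(5)])
```

Whatever the training text made likely ends up weighted heavily at sampling time. This is the mechanism behind the suggestion to partition fiction from non-fiction.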