• jetA
    1 day ago

    It’s an optimization game. If the punishment doesn’t offset the reward, then the incentive is to get better at cheating.

    • 🇰 🔵 🇱 🇦 🇳 🇦 🇰 ℹ️@lemmy.world
      17 hours ago

      I’ve seen plenty of videos of random college kids training LLMs to play video games, and getting the AI to stop cheating is like half the project. But they manage it, eventually. It’s laughable that these big companies and research firms can’t quite figure it out.