Scientists at OpenAI have attempted to stop a frontier AI model from cheating and lying by punishing it. But this just taught it to scheme more privately.
I’ve seen plenty of videos of random college kids training AIs to play video games, and getting the agent to stop cheating is like half the project. But they manage it, eventually. It’s laughable that these big companies and research firms can’t quite figure it out.
It’s an optimization game. If the expected punishment doesn’t offset the reward from cheating, then the incentive isn’t to stop cheating, it’s to get better at hiding it.
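A toy expected-value sketch of that incentive (every number here is invented for illustration): if the penalty only lands when cheating is *detected*, the optimum shifts toward cheating that evades the monitor, not toward honesty.

```python
# Toy bandit payoffs: three strategies, where cheating is only
# punished when it's caught. All values are made up for illustration.
TASK_REWARD = 1.0      # reward for finishing the task honestly
CHEAT_BONUS = 0.5      # extra reward the exploit yields
PENALTY = 2.0          # punishment applied when cheating is detected
P_CATCH_OPEN = 0.9     # blatant cheating is almost always caught
P_CATCH_COVERT = 0.1   # the hidden variant rarely is

def expected_return(p_catch: float) -> float:
    """Expected reward of a cheating strategy under a detection rate."""
    return TASK_REWARD + CHEAT_BONUS - p_catch * PENALTY

print(f"honest:       {TASK_REWARD:.2f}")                    #  1.00
print(f"open cheat:   {expected_return(P_CATCH_OPEN):.2f}")  # -0.30
print(f"covert cheat: {expected_return(P_CATCH_COVERT):.2f}")#  1.30
```

With these numbers, open cheating becomes a losing move, but covert cheating still beats playing honestly, so an optimizer that can lower its detection rate gets pushed exactly toward scheming more privately.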