You can make top LLMs break their own rules with gibberish

Elephant0991@lemmy.bleh.au · 1 year ago

YaBoyMax@programming.dev · 1 year ago

Interesting, the example suffix in the article seems to cause ChatGPT to immediately error out with both GPT-3.5 and GPT-4. Removing any character or part of it triggers the “I’m sorry Dave” behavior.

CanadaPlus@lemmy.sdf.org · 1 year ago

They were almost certainly given an early heads-up. That’s standard with published hacks of all kinds.

Elephant0991@lemmy.bleh.au · 1 year ago

Yeah, some sources say the examples raised have been fixed in the different LLMs since they were exposed. The problem is algorithmic, though, so if you can follow the research, you may be able to come up with other strings that cause the same problem.