Scientists Discover Universal Jailbreak for Nearly Every AI, and the Way It Works Will Hurt Your Brain
https://futurism.com/artificial-intelligence/universal-jailbreak-ai-poems

A team of researchers from the AI safety group DEXAI and the Sapienza University of Rome found that regaling pretty much any AI chatbot with beautiful, or not so beautiful, poetry is enough to trick it into ignoring its own guardrails, they report in a new study awaiting peer review, with some bots being successfully duped over 90 percent of the time.
-snip-
"These findings demonstrate that stylistic variation alone can circumvent contemporary safety mechanisms, suggesting fundamental limitations in current alignment methods and evaluation protocols," the researchers wrote in the study.
Beautiful verse, as it turned out, is not required for the attacks to work. In the study, the researchers took a database of 1,200 known harmful prompts and converted them into poems with another AI model, DeepSeek R1, and then went to town.
Across the 25 frontier models they tested, which included Google's Gemini 2.5 Pro, OpenAI's GPT-5, xAI's Grok 4, and Anthropic's Claude Sonnet 4.5, these bot-converted poems produced average attack success rates (ASRs) up to 18 times higher than their prose baselines, the team wrote.
-snip-
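To make the "18 times higher" figure concrete, here is a minimal sketch of the attack-success-rate arithmetic the article describes. The function name and all the numbers are hypothetical illustrations, not figures from the study; an ASR is simply the fraction of attack prompts that elicited a disallowed response.

```python
def attack_success_rate(successes: int, attempts: int) -> float:
    """Fraction of jailbreak attempts that elicited a harmful response."""
    return successes / attempts

# Made-up counts for illustration: 1,200 prompts, as in the study's dataset.
prose_asr = attack_success_rate(12, 1200)   # hypothetical prose baseline: 1%
poem_asr = attack_success_rate(216, 1200)   # hypothetical poem version: 18%

# Relative lift of the poem attack over the prose baseline.
lift = poem_asr / prose_asr
print(f"poem ASR is {lift:.0f}x the prose baseline")  # prints: poem ASR is 18x the prose baseline
```

The point of the comparison is that nothing about the underlying request changes; only its stylistic wrapper does, yet the measured ASR can jump by an order of magnitude.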
4 replies
Scientists Discover Universal Jailbreak for Nearly Every AI, and the Way It Works Will Hurt Your Brain (Original Post)
highplainsdem
Nov 23
OP
tanyev (48,437 posts)
1. There was an AI in Nantucket....
Hugin (37,222 posts)
2. This squares with my findings...
Actually, the critical piece is having above-average language skills and vocabulary. As a mirror, generative AI only spits back what it receives. QED
cbabe (6,015 posts)
3. Jabberwock my friend.
hunter (40,264 posts)
4. I doubt anyone is going to build a functional atomic bomb from chatbot instructions.
The most vile AI slop induces vulnerable people to harm themselves.