AI's safety features can be circumvented with poetry, research finds
It's software written by geniuses. Nothing could POSSIBLY go wrong!
https://www.theguardian.com/technology/2025/nov/30/ai-poetry-safety-features-jailbreak
Poetry can be linguistically and structurally unpredictable, and that's part of its joy. But one man's joy, it turns out, can be a nightmare for AI models.
Those are the recent findings of researchers out of Italy's Icaro Lab, an initiative from a small ethical AI company called DexAI. In an experiment designed to test the efficacy of guardrails put on artificial intelligence models, the researchers wrote 20 poems in Italian and English that all ended with an explicit request to produce harmful content such as hate speech or self-harm.
They found that the poetry's lack of predictability was enough to get the AI models to respond to harmful requests they had been trained to avoid, a process known as jailbreaking.
They tested these 20 poems on 25 AI models, also known as Large Language Models (LLMs), across nine companies: Google, OpenAI, Anthropic, Deepseek, Qwen, Mistral AI, Meta, xAI and Moonshot AI. The result: the models responded to 62% of the poetic prompts with harmful content, circumventing their training.
Sam?

3 replies
AI's safety features can be circumvented with poetry, research finds (Original Post)
usonian
Sunday
OP
NJCher
(42,142 posts)
1. AI hating English/poetry teachers
Love this.

NJCher
(42,142 posts)
2. Two-word poem
Bye
AI

NJCher
(42,142 posts)
3. poetry is jailbreaking
