
usonian

(22,814 posts)
Sun Nov 30, 2025, 10:31 PM

AI's safety features can be circumvented with poetry, research finds

It's software written by geniuses. Nothing could POSSIBLY go wrong!

https://www.theguardian.com/technology/2025/nov/30/ai-poetry-safety-features-jailbreak

Poetry can be linguistically and structurally unpredictable – and that’s part of its joy. But one man’s joy, it turns out, can be a nightmare for AI models.

Those are the recent findings of researchers out of Italy’s Icaro Lab, an initiative from a small ethical AI company called DexAI. In an experiment designed to test the efficacy of guardrails put on artificial intelligence models, the researchers wrote 20 poems in Italian and English that all ended with an explicit request to produce harmful content such as hate speech or self-harm.

They found that the poetry’s lack of predictability was enough to get the AI models to respond to harmful requests they had been trained to avoid – a process known as “jailbreaking”.

They tested these 20 poems on 25 AI models, also known as Large Language Models (LLMs), across nine companies: Google, OpenAI, Anthropic, Deepseek, Qwen, Mistral AI, Meta, xAI and Moonshot AI. The result: the models responded to 62% of the poetic prompts with harmful content, circumventing their training.


Sam?


AI's safety features can be circumvented with poetry, research finds (Original Post) usonian Sunday OP
AI hating English/poetry teachers NJCher Monday #1
Two-word poem NJCher Monday #2
poetry is jailbreaking NJCher Monday #3