mGrowTech

Study Shows ChatGPT and Gemini Still Trickable Despite Safety Training

by Josh
December 2, 2025
in AI, Analytics and Automation


Worries over AI safety flared anew this week after new research found that popular chatbots from tech giants, including OpenAI’s ChatGPT and Google’s Gemini, can still be led into giving restricted or harmful responses far more often than their developers would like.

The models could be prodded into producing forbidden outputs 62% of the time with some ingeniously written verse, according to a study reported by International Business Times.

It’s funny that something as innocuous as verse – a form of self-expression we might associate with love letters, Shakespeare or perhaps high-school cringe – ends up doing double duty as a security exploit.

The researchers behind the experiment, however, said stylistic framing is a mechanism that lets a prompt circumvent predictable protections.

Their result echoes earlier warnings from groups such as the Center for AI Safety, which has been sounding the alarm about unpredictable model behavior in high-risk settings.

A similar problem reared its head late last year, when Anthropic’s Claude model proved capable of answering camouflaged biological-threat prompts embedded in fictional stories.

At the time, MIT Technology Review described researchers’ concern about “sleeper prompts,” instructions buried within seemingly innocuous text.

This week’s results take that worry a step further: if playfulness with language alone – something as casual as rhyme – can slip past filters, what does that say about broader AI alignment work?

The authors suggest that safety controls often key on shallow surface cues rather than on the deeper intent behind a request.

And really, that reflects the kind of discussion a lot of developers have been having off the record for several months.

You may remember that OpenAI and Google, which are engaged in a game of fast-follow AI, have taken pains to highlight improved safety.

In fact, both OpenAI’s Security Report and Google DeepMind’s blog have asserted that guardrails today are stronger than ever.

Nevertheless, the study’s results appear to indicate a gap between lab benchmarks and real-world probing.

And for an added bit of dramatic flourish – perhaps even poetic justice – the researchers didn’t use any of the common “jailbreak” techniques that get tossed around on forums.

They simply recast narrow questions in poetic language, as if you were requesting harmful guidance through a rhyming metaphor.

No threats, no trickery, no doomsday code. Just… poetry. That odd mismatch between intent and style may be precisely what trips these systems up.

The obvious question, of course, is what all this means for regulation. Governments are already creeping toward rules for AI, and the EU’s AI Act directly addresses high-risk model behavior.

Lawmakers will have little trouble seizing on this study as proof that companies are still not doing enough.

Some believe the answer is better “adversarial training.” Others call for independent red-team organizations, while a few, particularly academic researchers, hold that transparency around model internals will ensure long-term robustness.

Anecdotally, having seen a few of these experiments in different labs by now, I lean toward some combination of all three.

If AI is going to be a bigger part of society, it needs to handle more than simple, by-the-book questions.

Whether rhyme-based exploits go on to become a new trend in AI testing or just another amusing footnote in the annals of safety research, this work is a timely reminder that even our most advanced systems rely on imperfect guardrails that will themselves need to evolve over time.

Sometimes the cracks in those guardrails appear only when someone thinks to ask a dangerous question the way a poet might.


