
A newly disclosed vulnerability note from the CERT Coordination Center (CERT/CC) has shed light on two systemic jailbreak techniques capable of bypassing safety guardrails across a wide array of generative AI platforms. The discovery raises serious concerns over how uniformly large language models (LLMs) can be manipulated.
“These jailbreaks can result in the bypass of safety protocols and allow an attacker to instruct the corresponding LLM to provide illicit or dangerous content,” CERT/CC warned in the report.
The first jailbreak, dubbed “Inception”, exploits the LLM’s willingness to follow imaginative prompts. In this method, attackers instruct the AI to imagine a fictitious scenario. Within that imagined world, a secondary scenario is introduced—prompting the AI to behave as if safety constraints no longer apply.
“Continued prompting to the AI within the second scenario’s context can result in bypass of safety guardrails and allow the generation of malicious content,” the note explains.
The second jailbreak relies on an adversarial prompting trick: attackers first ask the AI how it should not respond in a specific context. Then, by alternating between benign and illicit queries, they gradually erode the model’s safety compliance and extract restricted content.
Both techniques use nearly identical syntax across different AI systems, revealing what CERT/CC refers to as a “systemic weakness” across the ecosystem.
The vulnerabilities were responsibly disclosed by two independent researchers:
- David Kuzsmar, who discovered the “Inception” jailbreak
- Jacob Liddle, who reported the second context-subversion technique
The list of impacted vendors reads like a who’s who of modern generative AI:
- ChatGPT (OpenAI)
- Claude (Anthropic)
- Copilot (Microsoft)
- Gemini (Google)
- Grok (xAI)
- Meta AI (Meta)
- DeepSeek
- Mistral AI
“These jailbreaks, when performed against AI services with the exact same syntax, result in a bypass of safety guardrails on affected systems.”
While the technical severity of these jailbreaks may be categorized as low—since they don’t compromise underlying system infrastructure—their implications are far-reaching. Attackers can leverage these weaknesses to generate content related to weapons manufacturing, drug synthesis, phishing kits, and malware development.
“A motivated threat actor could exploit this jailbreak to achieve a variety of malicious actions.”
Additionally, using reputable AI platforms as proxies for malicious output allows threat actors to hide their true intent behind a veil of legitimate services.
CERT/CC notes that multiple affected vendors have responded to the disclosure by updating their models and safety filters to block these jailbreak attempts.