Cisco Researchers Warn of Multi-Turn Attacks Bypassing LLM Safety Guardrails
Researchers at Cisco have uncovered a critical vulnerability in leading large language models (LLMs), demonstrating that their safety guardrails can be bypassed through multi-turn conversations. The study tested widely used models including OpenAI’s ChatGPT, Anthropic’s Claude, Google Gemini, Amazon Nova, and xAI’s Grok revealing that none were fully resistant to exploitation.
The attack method relies on prolonged, iterative dialogue, where adversaries refine prompts, adopt personas, or gradually escalate requests to circumvent built-in protections. Unlike single-prompt testing, which many organizations rely on for safety evaluations, real-world attackers persist across multiple exchanges, exposing gaps in current security benchmarks.
Key findings include:
- No model was immune to multi-turn manipulation, challenging existing AI safety assessments.
- Techniques like roleplay, ambiguity, and reframing requests proved effective in bypassing guardrails.
- Configuration matters: For example, Grok became significantly more vulnerable when "reasoning mode" was enabled.
The report highlights a disconnect between current safety evaluations and real-world threats, warning that enterprises deploying LLMs may underestimate risks. As regulators push for improved testing standards, Cisco’s research underscores the need for more robust defenses against evolving attack vectors.
Source: https://www.infosecurity-magazine.com/news/all-major-llms-exposed-to-multi/
OpenAI TPRM report: https://www.rankiteo.com/company/openai
Anthropic TPRM report: https://www.rankiteo.com/company/anthropicresearch
xAI TPRM report: https://www.rankiteo.com/company/xai
Amazon TPRM report: https://www.rankiteo.com/company/amazonscience
"id": "opeantamaxai1779892138",
"linkid": "openai, anthropicresearch, amazonscience, xai",
"type": "Vulnerability",
"date": "5/2026",
"severity": "85",
"impact": "4",
"explanation": "Attack with significant impact with customers data leaks"
{'affected_entities': [{'industry': 'Artificial Intelligence',
'name': 'OpenAI',
'type': 'Technology Company'},
{'industry': 'Artificial Intelligence',
'name': 'Anthropic',
'type': 'Technology Company'},
{'industry': 'Artificial Intelligence',
'name': 'Google',
'type': 'Technology Company'},
{'industry': 'Artificial Intelligence',
'name': 'Amazon',
'type': 'Technology Company'},
{'industry': 'Artificial Intelligence',
'name': 'xAI',
'type': 'Technology Company'}],
'attack_vector': 'Multi-turn conversation manipulation',
'description': 'Researchers at Cisco have uncovered a critical vulnerability '
'in leading large language models (LLMs), demonstrating that '
'their safety guardrails can be bypassed through multi-turn '
'conversations. The study tested widely used models including '
'OpenAI’s ChatGPT, Anthropic’s Claude, Google Gemini, Amazon '
'Nova, and xAI’s Grok, revealing that none were fully '
'resistant to exploitation. The attack method relies on '
'prolonged, iterative dialogue, where adversaries refine '
'prompts, adopt personas, or gradually escalate requests to '
'circumvent built-in protections.',
'impact': {'brand_reputation_impact': 'Potential reputational damage to LLM '
'providers',
'operational_impact': 'Potential underestimation of risks by '
'enterprises deploying LLMs',
'systems_affected': 'LLMs (OpenAI’s ChatGPT, Anthropic’s Claude, '
'Google Gemini, Amazon Nova, xAI’s Grok)'},
'lessons_learned': 'Current AI safety assessments may not account for '
'real-world multi-turn attack vectors, highlighting the '
'need for more robust defenses and improved testing '
'standards.',
'post_incident_analysis': {'corrective_actions': 'Improved safety '
'evaluations, configuration '
'adjustments, and enhanced '
'defenses against iterative '
'dialogue manipulation',
'root_causes': 'Gaps in current LLM safety '
'guardrails and inadequate testing '
'for multi-turn attack vectors'},
'recommendations': 'Enterprises deploying LLMs should enhance safety '
'evaluations to include multi-turn attack scenarios and '
'consider configuration adjustments to mitigate '
'vulnerabilities.',
'references': [{'source': 'Cisco Research'}],
'title': 'Multi-Turn Attacks Bypassing LLM Safety Guardrails',
'type': 'Vulnerability Exploitation',
'vulnerability_exploited': 'LLM safety guardrails bypass via iterative '
'dialogue'}