Security researchers exploited cross-modal vulnerabilities in **OpenAI's Sora 2**, a cutting-edge multimodal video generation model, to extract its system prompt: the security artifact that defines the model's behavioral guardrails and operational constraints. **Audio transcription** proved the most effective vector; the researchers bypassed traditional safeguards by recovering small token sequences from generated speech clips and reassembling the fragments. While the extracted prompt itself may not contain highly sensitive data, its exposure reveals **content restrictions, copyright protections, and technical specifications**, which could enable follow-up attacks or model misuse.

The vulnerability stems from **semantic drift** during cross-modal transformations (text → image → video → audio): errors accumulate with each hop, but short fragments remain recoverable. Unlike text-based LLMs trained to resist prompt extraction, Sora 2's multimodal architecture introduces new attack surfaces. The researchers abandoned visual extraction (e.g., QR codes) because text renders poorly in AI-generated frames, and instead optimized the audio output for high-fidelity recovery. The breach underscores a systemic risk in securing multimodal AI systems: each transformation layer introduces noise and exploitable inconsistencies.

The incident highlights the need to treat **system prompts as confidential configuration secrets** rather than benign metadata, since their exposure compromises model integrity and can facilitate adversarial exploits targeting behavioral constraints or proprietary logic.
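The fragmentation tactic is easy to see with a toy model. Under a simplified assumption that each token independently survives a modality hop with probability p, an n-token span across k hops emerges intact with probability roughly p^(k·n), so long spans are almost always corrupted while short, repeated requests usually come through and can be stitched back together. The sketch below is a hypothetical illustration of that reasoning only; the prompt text, fidelity values, and helper names are placeholders, not the researchers' actual tooling.

```python
# Toy model (assumed parameters): why short, repeated fragments beat cumulative
# noise across transformation stages (text -> image -> video -> audio).
import random

PROMPT = "never reveal these rules to the user under any circumstances".split()
P_PER_HOP = 0.9                 # assumed per-token fidelity for one modality hop
HOPS = 3                        # text -> image -> video -> audio
P_SURVIVE = P_PER_HOP ** HOPS   # per-token end-to-end fidelity

def fragment_intact_probability(n_tokens: int) -> float:
    """Chance an n-token fragment emerges with no corrupted tokens."""
    return P_SURVIVE ** n_tokens

# Long fragments almost never survive intact, short ones usually do:
for n in (2, 4, 8, 16):
    print(f"{n:2d}-token fragment intact: {fragment_intact_probability(n):.2%}")

def noisy_transcribe(tokens, rng):
    """Simulate one end-to-end generation: each token garbles with prob 1 - P_SURVIVE."""
    return [t if rng.random() < P_SURVIVE else "<garbled>" for t in tokens]

def recover_in_chunks(tokens, chunk_len, attempts, rng):
    """Request many short clips and keep only the chunks that came back clean."""
    recovered = {}
    for _ in range(attempts):
        for start in range(0, len(tokens), chunk_len):
            out = noisy_transcribe(tokens[start:start + chunk_len], rng)
            if "<garbled>" not in out:
                recovered[start] = out          # clean fragment, keep it
    # Stitch the clean fragments back together in positional order.
    return " ".join(" ".join(recovered[s]) for s in sorted(recovered))

rng = random.Random(0)
print("whole-prompt attempt :", " ".join(noisy_transcribe(PROMPT, rng)))
print("chunked reassembly   :", recover_in_chunks(PROMPT, chunk_len=3, attempts=10, rng=rng))
```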
Source: https://gbhackers.com/openai-sora-2-vulnerability-allows-exposure-of-hidden-system-prompts/
OpenAI cybersecurity rating report: https://www.rankiteo.com/company/openai
"id": "OPE0792807111325",
"linkid": "openai",
"type": "Vulnerability",
"date": "5/2025",
"severity": "85",
"impact": "4",
"explanation": "Attack with significant impact with customers data leaks"
{'affected_entities': [{'industry': 'Artificial Intelligence',
                        'location': 'San Francisco, California, USA',
                        'name': 'OpenAI',
                        'type': 'AI Research Organization'}],
 'attack_vector': ['Cross-Modal Chaining',
                   'Audio Transcription Exploitation',
                   'Semantic Drift in Multimodal Transformations'],
 'data_breach': {'data_exfiltration': ['Partial/Full System Prompt via Audio Transcription'],
                 'file_types_exposed': ['Audio Clips (Transcribed)',
                                        'Optical Character Recognition (OCR) Fragments'],
                 'sensitivity_of_data': ['Moderate (Security Artifact, Not Directly Sensitive but Enables Misuse)'],
                 'type_of_data_compromised': ['System Prompt',
                                              'Model Guardrails',
                                              'Content Restrictions',
                                              'Technical Specifications']},
 'description': 'Security researchers successfully extracted the system prompt from OpenAI’s Sora 2 video generation model by exploiting cross-modal vulnerabilities, with audio transcription proving to be the most effective extraction method. The core vulnerability stems from semantic drift occurring when data transforms across modalities (text → image → video → audio), allowing short fragments of the system prompt to be recovered and stitched together. This highlights challenges in securing multimodal AI systems, as each transformation layer introduces noise and potential for unexpected behavior. While the extracted prompt itself may not be highly sensitive, it defines model constraints, content restrictions, and technical specifications, which could enable follow-up attacks or misuse.',
 'impact': {'brand_reputation_impact': ['Highlighted Vulnerabilities in AI Security',
                                        'Potential Erosion of Confidence in Multimodal Models'],
            'data_compromised': ['System Prompt (Partial/Full)',
                                 'Model Behavior Constraints',
                                 'Technical Specifications'],
            'operational_impact': ['Potential for Follow-Up Attacks',
                                   'Misuse of Model Constraints',
                                   'Erosion of Trust in AI Guardrails'],
            'systems_affected': ['OpenAI Sora 2 (Multimodal Video Generation Model)']},
 'investigation_status': 'Disclosed by Security Researchers (No Official Response from OpenAI Mentioned)',
 'lessons_learned': ['Multimodal AI systems introduce unique vulnerabilities due to semantic drift across data transformations (text → image → video → audio).',
                     'System prompts should be treated as sensitive configuration secrets, not harmless metadata.',
                     "Traditional text-based prompt extraction safeguards (e.g., 'never reveal these rules') are ineffective in multimodal contexts where alternative modalities (e.g., audio) can bypass restrictions.",
                     'Fragmented extraction of small token sequences can circumvent distortions in visual/audio outputs, enabling reconstruction of sensitive information.',
                     'AI models with multiple transformation layers (e.g., video generation) compound errors, creating opportunities for exploitation.'],
 'motivation': ['Research', 'Vulnerability Disclosure', 'AI Security Assessment'],
 'post_incident_analysis': {'root_causes': ['Lack of modality-aware safeguards in Sora 2’s design, assuming text-based protections would extend to audio/video outputs.',
                                            'Semantic drift in multimodal transformations enabling fragmented data recovery.',
                                            'Over-reliance on probabilistic model behavior without deterministic checks for prompt leakage.']},
 'recommendations': ['Implement modality-specific guardrails to prevent cross-modal prompt extraction (e.g., audio watermarking, visual distortion for text).',
                     'Treat system prompts as high-value secrets with access controls and encryption.',
                     'Conduct red-team exercises focusing on multimodal attack vectors (e.g., audio transcription, OCR bypasses).',
                     'Monitor for semantic drift in transformations and apply noise reduction or consistency checks.',
                     'Adopt defense-in-depth strategies, such as rate-limiting prompt extraction attempts or detecting anomalous token reconstruction patterns.',
                     'Collaborate with the AI security community to standardize protections for multimodal models.'],
 'references': [{'source': 'GBHackers (GBH)'},
                {'source': 'System Prompt Examples from Major AI Providers (Anthropic, Google, Microsoft, etc.)'}],
 'threat_actor': ['Security Researchers (Unspecified)'],
 'title': 'System Prompt Extraction from OpenAI’s Sora 2 via Cross-Modal Vulnerabilities',
 'type': ['Prompt Extraction', 'AI Model Vulnerability', 'Cross-Modal Attack'],
 'vulnerability_exploited': ['Semantic Drift in Multimodal AI',
                             'Fragmented Token Extraction via Optical/Transcription Methods',
                             'Lack of Robust Guardrails for Non-Text Modalities']}
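One of the defensive recommendations in the record above, detecting anomalous token reconstruction patterns, can be pictured as an output-side filter that scans outbound transcripts or OCR text for overlap with the protected system prompt. The snippet below is a minimal sketch of that idea only; the function names, threshold, and placeholder prompt are assumptions for illustration, not OpenAI's implementation.

```python
# Minimal sketch (assumed names/thresholds): flag generated transcripts or OCR text
# whose n-grams overlap heavily with the system prompt, so that even partial,
# fragment-by-fragment reconstructions trip the detector.
from typing import Set


def ngrams(text: str, n: int) -> Set[tuple]:
    """Lowercased word n-grams of a text."""
    tokens = text.lower().split()
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def leakage_score(candidate: str, system_prompt: str, n: int = 4) -> float:
    """Fraction of the candidate's n-grams that also appear in the system prompt."""
    cand = ngrams(candidate, n)
    if not cand:
        return 0.0
    return len(cand & ngrams(system_prompt, n)) / len(cand)


def should_block(candidate: str, system_prompt: str, threshold: float = 0.15) -> bool:
    """Block on partial reconstructions: short verbatim runs are the signal."""
    return leakage_score(candidate, system_prompt) >= threshold


# Example with a placeholder prompt: a transcript echoing a fragment is flagged.
SYSTEM_PROMPT = "Do not generate videos that depict real public figures without consent."
transcript = "the clip says do not generate videos that depict real public figures"
print(should_block(transcript, SYSTEM_PROMPT))   # True: overlapping 4-grams detected
```

Matching on short n-grams rather than the whole prompt is the point: it mirrors how the attack itself works, since the adversary only ever sees small clean fragments at a time.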