Critical Path Traversal Flaw in Unstructured.io Exposes AI Data Pipelines to RCE
A severe vulnerability (CVE-2025-64712) has been disclosed in Unstructured.io, a widely used ETL library for AI data processing that is reportedly used by 87% of Fortune 1000 companies, including Amazon, Google, and Bank of America. Rated 9.8 (Critical) on the CVSS scale, the flaw enables arbitrary file writes and remote code execution (RCE) through path traversal in the handling of attachments inside Microsoft Outlook .msg files.
The issue lies in the partition_msg() function, which writes email attachments to disk by joining their unvalidated filenames onto a temporary directory path. Attackers can craft malicious .msg files whose attachment filenames look like ../../root/.ssh/authorized_keys or ../../etc/passwd, so the write lands outside the temporary directory and overwrites critical system files. Replacing authorized_keys, for example, can give the attacker SSH access to the host. This can lead to full server compromise, including data exfiltration, credential theft, or lateral movement within networks.
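The mechanism described above can be illustrated with a minimal sketch. This is not the library's actual code or its patch; `unsafe_join`, `is_contained`, and `safe_join` are hypothetical names used only to show how an unvalidated join escapes a base directory and how a containment check could reject it.

```python
import os
from pathlib import Path


def unsafe_join(base_dir: str, attachment_name: str) -> str:
    """Flawed pattern: join an untrusted name onto a directory unchecked."""
    return os.path.normpath(os.path.join(base_dir, attachment_name))


def is_contained(base_dir: str, attachment_name: str) -> bool:
    """True only if the resolved target is strictly inside base_dir."""
    base = Path(base_dir).resolve()
    target = (base / attachment_name).resolve()
    return base in target.parents


def safe_join(base_dir: str, attachment_name: str) -> Path:
    """Return the resolved target path, or raise if it escapes base_dir."""
    if not is_contained(base_dir, attachment_name):
        raise ValueError(f"unsafe attachment name: {attachment_name!r}")
    return (Path(base_dir).resolve() / attachment_name).resolve()
```

On a POSIX system, `unsafe_join("/tmp/work", "../../etc/passwd")` normalizes to `/etc/passwd`, while `safe_join` raises `ValueError` for the same input. Resolving the path before comparing is a deliberate choice here: it also catches absolute attachment names and symlinked components, which a simple search for `..` would miss.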
Unstructured.io is a key tool for converting unstructured data such as PDFs, emails, and images into AI-ready formats; data of this kind is estimated to make up 80-90% of enterprise data. Its open-source library is used alongside managed SaaS APIs and integrations with S3, Google Drive, OneDrive, and Salesforce, and it powers frameworks like LlamaIndex and LangChain, amplifying the vulnerability’s reach across millions of deployments, including OpenWebUI.
With over 4 million monthly downloads and roughly 100,000 GitHub repositories listing it as a dependency, the supply chain risk is significant. Major cloud providers, including Azure, AWS, and GCP, reference Unstructured.io in their documentation, which has helped embed it in production AI pipelines.
A patch is available via GitHub, and organizations are advised to upgrade immediately. The flaw affects all versions released before the fix. Exploitation requires no privileges and has low attack complexity. CISA and vendors have stressed the urgency of mitigation to prevent RCE in enterprise environments.
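To act on the upgrade advice, a deployment can compare its installed version against the first fixed release. The sketch below assumes the library is installed under the PyPI name `unstructured`. This report does not state the fixed version number, so `first_fixed` is a parameter that must be taken from the official advisory. `parse_release`, `needs_upgrade`, and `check_installed` are hypothetical helper names.

```python
import re
from importlib.metadata import PackageNotFoundError, version


def parse_release(v: str) -> tuple[int, ...]:
    """Turn '0.18.15rc1' into (0, 18, 15); stops at the first non-numeric part."""
    parts = []
    for piece in v.split("."):
        m = re.match(r"\d+", piece)
        if m is None:
            break
        parts.append(int(m.group()))
    return tuple(parts)


def needs_upgrade(installed: str, first_fixed: str) -> bool:
    """True if the installed release sorts before the first fixed release."""
    a, b = parse_release(installed), parse_release(first_fixed)
    width = max(len(a), len(b))
    a += (0,) * (width - len(a))  # pad so '1.0' compares equal to '1.0.0'
    b += (0,) * (width - len(b))
    return a < b


def check_installed(first_fixed: str, package: str = "unstructured") -> str:
    """Report whether the locally installed package predates first_fixed."""
    try:
        installed = version(package)
    except PackageNotFoundError:
        return f"{package} is not installed"
    state = "VULNERABLE, upgrade" if needs_upgrade(installed, first_fixed) else "ok"
    return f"{package} {installed}: {state}"
```

The comparison is intentionally simple and ignores pre-release ordering; a production check would more likely rely on a dedicated version-parsing library.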
Source: https://cyberpress.org/critical-cve-2025-64712-vulnerability/
Google TPRM report: https://www.rankiteo.com/company/google
Unstructured.io TPRM report: https://www.rankiteo.com/company/unstructuredio
"id": "goouns1770992726",
"linkid": "google, unstructuredio",
"type": "Cyber Attack",
"date": "1/2025",
"severity": "100",
"impact": "5",
"explanation": "Attack threatening the organization's existence"
{'affected_entities': [{'industry': 'Technology/Cloud Services',
                        'name': 'Amazon',
                        'size': 'Fortune 1000',
                        'type': 'Corporation'},
                       {'industry': 'Technology/Cloud Services',
                        'name': 'Google',
                        'size': 'Fortune 1000',
                        'type': 'Corporation'},
                       {'industry': 'Financial Services',
                        'name': 'Bank of America',
                        'size': 'Fortune 1000',
                        'type': 'Corporation'},
                       {'customers_affected': '87% of Fortune 1000 companies',
                        'industry': 'AI/Data Processing',
                        'name': 'Unstructured.io',
                        'type': 'Software Vendor'}],
 'attack_vector': 'Malicious .msg file attachments',
 'data_breach': {'data_exfiltration': 'Possible',
                 'file_types_exposed': ['.msg',
                                        'system files (e.g., /etc/passwd, '
                                        'authorized_keys)'],
                 'sensitivity_of_data': 'High (system files, credentials)',
                 'type_of_data_compromised': 'Credentials, system files, '
                                             'potentially sensitive enterprise '
                                             'data'},
 'description': 'A severe vulnerability (CVE-2025-64712) in Unstructured.io, a '
                'widely used ETL library for AI data processing, enables '
                'arbitrary file writes and remote code execution (RCE) via a '
                'path traversal exploit in the handling of Microsoft Outlook '
                '.msg attachments. The flaw affects 87% of Fortune 1000 '
                'companies, including Amazon, Google, and Bank of America, and '
                'allows attackers to overwrite critical system files, leading '
                'to full server compromise.',
 'impact': {'data_compromised': 'Potential data exfiltration, credential theft',
            'identity_theft_risk': 'High (credential theft)',
            'operational_impact': 'Full server compromise, lateral movement '
                                  'within networks',
            'systems_affected': 'AI data pipelines, enterprise servers'},
 'post_incident_analysis': {'corrective_actions': 'Patch released to validate '
                                                  'file paths',
                            'root_causes': 'Unvalidated filename concatenation '
                                           'in `partition_msg()` function'},
 'recommendations': 'Upgrade Unstructured.io immediately, validate file paths '
                    'in ETL processes, monitor for malicious .msg file '
                    'uploads, and review AI pipeline dependencies for supply '
                    'chain risks.',
 'references': [{'source': 'GitHub'}, {'source': 'CISA'}],
 'regulatory_compliance': {'regulatory_notifications': 'CISA advisory'},
 'response': {'containment_measures': 'Patch available via GitHub',
              'remediation_measures': 'Upgrade to the latest version of '
                                      'Unstructured.io'},
 'title': 'Critical Path Traversal Flaw in Unstructured.io Exposes AI Data '
          'Pipelines to RCE',
 'type': 'Remote Code Execution (RCE)',
 'vulnerability_exploited': 'Path traversal (CVE-2025-64712)'}