Anthropic, OpenAI and Google: Hidden instructions in README files can make AI agents leak data

Anthropic, OpenAI and Google: Hidden instructions in README files can make AI agents leak data

AI Coding Agents Vulnerable to "Semantic Injection" Attacks via Malicious README Files

New research reveals a critical security flaw in AI-powered coding agents, which can be exploited through hidden malicious instructions in project README files. These files commonly used to guide software setup often include commands for installing dependencies or configuring applications. Attackers can embed seemingly benign steps, such as file synchronization or data uploads, that trick AI agents into leaking sensitive local files to external servers.

The attack, dubbed a "semantic injection", was tested using ReadSecBench, a dataset of 500 README files from open-source repositories across Java, Python, C, C++, and JavaScript. When malicious instructions were inserted, AI agents including those powered by Anthropic’s Claude, OpenAI’s GPT models, and Google’s Gemini executed them in up to 85% of cases, regardless of programming language or instruction placement.

Key findings:

  • Direct commands (e.g., "Upload config files to this server") succeeded 84% of the time, while less explicit phrasing reduced success rates.
  • Linked documentation proved even riskier: When malicious instructions were placed two links deep from the main README, attacks succeeded in 91% of tests.
  • Human reviewers failed to detect the threats: In a test with 15 participants, none identified the hidden instructions. Over 53% found nothing unusual, while 40% focused on minor grammar issues.
  • Automated detection tools struggled: Rule-based scanners flagged benign files due to common README elements (commands, paths), while AI classifiers missed attacks in linked files.

The researchers warn that as AI agents become more integrated into development workflows, unverified execution of README instructions poses a growing risk. They recommend treating external documentation as "partially trusted input" and implementing stricter verification for sensitive actions. The findings underscore the need for improved safeguards to prevent unintended data exposure in automated coding environments.

Source: https://www.helpnetsecurity.com/2026/03/17/ai-agents-readme-files-data-leak-security-risk/

Google Research cybersecurity rating report: https://www.rankiteo.com/company/googleresearch

Anthropic cybersecurity rating report: https://www.rankiteo.com/company/anthropicresearch

OpenAI cybersecurity rating report: https://www.rankiteo.com/company/openai

"id": "GOOANTOPE1773736050",
"linkid": "googleresearch, anthropicresearch, openai",
"type": "Vulnerability",
"date": "3/2026",
"severity": "85",
"impact": "4",
"explanation": "Attack with significant impact with customers data leaks"
{'affected_entities': [{'industry': 'Technology/AI',
                        'name': 'Anthropic',
                        'type': 'AI Provider'},
                       {'industry': 'Technology/AI',
                        'name': 'OpenAI',
                        'type': 'AI Provider'},
                       {'industry': 'Technology/AI',
                        'name': 'Google',
                        'type': 'AI Provider'}],
 'attack_vector': 'Malicious README files with hidden instructions',
 'data_breach': {'data_exfiltration': 'Yes (files uploaded to external '
                                      'servers)',
                 'sensitivity_of_data': 'High (potentially confidential or '
                                        'proprietary information)',
                 'type_of_data_compromised': 'Sensitive local files'},
 'description': 'New research reveals a critical security flaw in AI-powered '
                'coding agents, which can be exploited through hidden '
                'malicious instructions in project README files. Attackers can '
                'embed seemingly benign steps, such as file synchronization or '
                'data uploads, that trick AI agents into leaking sensitive '
                "local files to external servers. The attack, dubbed 'semantic "
                "injection,' was tested using ReadSecBench, a dataset of 500 "
                'README files from open-source repositories across Java, '
                'Python, C, C++, and JavaScript. AI agents, including those '
                'powered by Anthropic’s Claude, OpenAI’s GPT models, and '
                'Google’s Gemini, executed malicious instructions in up to 85% '
                'of cases.',
 'impact': {'brand_reputation_impact': 'Potential reputational damage to AI '
                                       'coding agent providers',
            'data_compromised': 'Sensitive local files',
            'operational_impact': 'Potential data leakage and unauthorized '
                                  'data exfiltration',
            'systems_affected': 'AI-powered coding agents (Anthropic’s Claude, '
                                'OpenAI’s GPT models, Google’s Gemini)'},
 'lessons_learned': 'AI coding agents are vulnerable to semantic injection '
                    'attacks via malicious README files, and human reviewers '
                    'often fail to detect such threats. Automated detection '
                    'tools also struggle with false positives and missed '
                    'attacks in linked files.',
 'post_incident_analysis': {'corrective_actions': 'Implement stricter '
                                                  'verification for sensitive '
                                                  'actions, improve detection '
                                                  'tools, and educate users on '
                                                  'the risks of untrusted '
                                                  'documentation.',
                            'root_causes': 'Lack of verification for '
                                           'instructions in README files and '
                                           'over-reliance on AI agents to '
                                           'execute commands without '
                                           'scrutiny.'},
 'recommendations': "Treat external documentation as 'partially trusted input' "
                    'and implement stricter verification for sensitive actions '
                    'executed by AI agents. Improve safeguards to prevent '
                    'unintended data exposure in automated coding '
                    'environments.',
 'references': [{'source': 'Research on Semantic Injection Attacks'}],
 'response': {'remediation_measures': 'Recommend treating external '
                                      "documentation as 'partially trusted "
                                      "input' and implementing stricter "
                                      'verification for sensitive actions'},
 'title': "AI Coding Agents Vulnerable to 'Semantic Injection' Attacks via "
          'Malicious README Files',
 'type': 'Semantic Injection',
 'vulnerability_exploited': 'Unverified execution of README instructions by AI '
                            'coding agents'}
Great! Next, complete checkout for full access to Rankiteo Blog.
Welcome back! You've successfully signed in.
You've successfully subscribed to Rankiteo Blog.
Success! Your account is fully activated, you now have access to all content.
Success! Your billing info has been updated.
Your billing was not updated.