New LLM Attack "TokenBreaker" Bypasses Protections with Single-Character Tweaks
Researchers from cybersecurity firm HiddenLayer have uncovered a novel attack technique, dubbed TokenBreaker, that exploits weaknesses in how certain Large Language Models (LLMs) tokenize text. By altering or adding a single character to key words—such as changing "instructions" to "finstructions"—attackers can bypass protective filters while still conveying malicious intent to the underlying LLM.
The attack targets systems whose protective models use Byte Pair Encoding (BPE) or WordPiece tokenization, which break text into smaller units (tokens) for processing. Adding or changing a single character alters how the word is split into tokens, so a classifier keyed to those tokens may label the manipulated input harmless; the underlying LLM, however, still recovers the original intent from context, enabling the delivery of harmful prompts. Potential applications include evading AI-powered spam filters, allowing phishing emails or malware-laden messages to reach users undetected.
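To make the mechanism concrete, the following sketch (a hypothetical illustration, not HiddenLayer's tooling) compares how a WordPiece tokenizer and a Unigram tokenizer split an original word versus a manipulated one. The models bert-base-uncased (WordPiece) and xlnet-base-cased (SentencePiece Unigram) are assumed stand-ins; the exact subword splits depend on each model's vocabulary.

    # Hypothetical illustration using Hugging Face tokenizers; not HiddenLayer's code.
    from transformers import AutoTokenizer

    wordpiece = AutoTokenizer.from_pretrained("bert-base-uncased")  # WordPiece vocabulary
    unigram = AutoTokenizer.from_pretrained("xlnet-base-cased")     # SentencePiece Unigram

    for word in ["instructions", "finstructions"]:
        # The exact splits vary by vocabulary; the point is that the manipulated
        # word no longer maps onto the token a protective classifier was trained on.
        print(f"WordPiece: {word!r} -> {wordpiece.tokenize(word)}")
        print(f"Unigram:   {word!r} -> {unigram.tokenize(word)}")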
For example, a spam filter blocking the word "lottery" might still permit a message containing "slottery," exposing recipients to malicious links or malware. The researchers noted that models using Unigram tokenizers appear resistant to this manipulation, suggesting a potential mitigation strategy.
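The filter-evasion scenario can be sketched as a toy token-level blocklist (hypothetical; production spam filters are ML classifiers, but the failure mode is analogous). A check against the tokens produced by a WordPiece tokenizer can miss the manipulated word even though the message still reads the same to a human or to a capable LLM:

    # Toy token-level blocklist; hypothetical stand-in for a token-based spam classifier.
    from transformers import AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")  # WordPiece
    BLOCKLIST = {"lottery"}

    def blocked(message: str) -> bool:
        # Strip WordPiece continuation markers ("##") before comparing tokens.
        tokens = {t.lstrip("#") for t in tokenizer.tokenize(message)}
        return bool(tokens & BLOCKLIST)

    print(blocked("You won the lottery!"))   # Expected True: "lottery" is a whole token
    print(blocked("You won the slottery!"))  # Likely False: "slottery" splits into
                                             # different subwords, so the blocklist misses it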
The findings, published in an in-depth report by HiddenLayer’s Kieran Evans, Kasimir Schulz, and Kenneth Yeung, highlight vulnerabilities in current LLM security mechanisms and underscore the need for more robust tokenization methods. The discovery was first reported by The Hacker News.
HiddenLayer cybersecurity rating report: https://www.rankiteo.com/company/hiddenlayersec
"id": "HID1766995749",
"linkid": "hiddenlayersec",
"type": "Vulnerability",
"date": "6/2025",
"severity": "50",
"impact": "2",
"explanation": "Attack limited on finance or reputation"
{
  "affected_entities": [
    {
      "industry": "Cybersecurity, AI/ML",
      "name": "AI-powered spam filter providers",
      "type": "Technology/Software"
    }
  ],
  "attack_vector": "Prompt Manipulation (Tokenization Bypass)",
  "description": "Researchers from HiddenLayer devised a new LLM attack called TokenBreaker that bypasses protection mechanisms by adding or changing a single character in prompts. The underlying LLM still understands the malicious intent, allowing attackers to circumvent AI-powered defenses such as spam filters.",
  "impact": {
    "identity_theft_risk": "Increased risk if malicious prompts lead to phishing or malware delivery",
    "operational_impact": "Potential bypass of security protections in AI systems",
    "payment_information_risk": "Increased risk if malicious prompts lead to phishing or malware delivery",
    "systems_affected": "AI-powered spam filters, LLMs with vulnerable tokenization methods"
  },
  "investigation_status": "Research/Public Disclosure",
  "lessons_learned": "LLMs using BPE or WordPiece tokenization are vulnerable to prompt manipulation attacks. Unigram tokenizers are more resistant to such attacks.",
  "motivation": "Research/Demonstration of AI Security Flaws",
  "post_incident_analysis": {
    "corrective_actions": "Transition to Unigram tokenizers or other robust tokenization methods",
    "root_causes": "Weaknesses in BPE/WordPiece tokenization methods allowing prompt manipulation"
  },
  "recommendations": "Adopt LLMs with robust tokenization methods like Unigram. Implement additional layers of security for AI-powered defenses.",
  "references": [
    {"source": "HiddenLayer Research Report"},
    {"source": "The Hacker News"}
  ],
  "response": {
    "remediation_measures": "Use models with Unigram tokenizers or more robust tokenization methods"
  },
  "threat_actor": "HiddenLayer Researchers (Kieran Evans, Kasimir Schulz, Kenneth Yeung)",
  "title": "TokenBreaker: Bypassing LLM Protections via Tokenization Manipulation",
  "type": "AI/ML Vulnerability Exploitation",
  "vulnerability_exploited": "Byte Pair Encoding (BPE) or WordPiece tokenization weaknesses in LLMs"
}