🎉 [Gate 30 Million Milestone] Share Your Gate Moment & Win Exclusive Gifts!
Gate has surpassed 30M users worldwide — not just a number, but a journey we've built together.
Remember the thrill of opening your first account, or the Gate merch that’s been part of your daily life?
📸 Join the #MyGateMoment# campaign!
Share your story on Gate Square, and embrace the next 30 million together!
✅ How to Participate:
1️⃣ Post a photo or video with Gate elements
2️⃣ Add #MyGateMoment# and share your story, wishes, or thoughts
3️⃣ Share your post on Twitter (X) — top 10 views will get extra rewards!
👉
TokenBreak Attack Bypasses LLM Safeguards With Single Character
HomeNews* Researchers have identified a new method called TokenBreak that bypasses large language model (LLM) safety and moderation by altering a single character in text inputs.
The research team explained in their report that, “the TokenBreak attack targets a text classification model’s tokenization strategy to induce false negatives, leaving end targets vulnerable to attacks that the implemented protection model was put in place to prevent.” Tokenization is essential in language models because it turns text into units that can be mapped and understood by algorithms. The manipulated text can pass through LLM filters, triggering the same response as if the input had been unaltered.
HiddenLayer found that TokenBreak works on models using BPE (Byte Pair Encoding) or WordPiece tokenization, but does not affect Unigram-based systems. The researchers stated, “Knowing the family of the underlying protection model and its tokenization strategy is critical for understanding your susceptibility to this attack.” They recommend using Unigram tokenizers, teaching filter models to recognize tokenization tricks, and reviewing logs for signs of manipulation.
The discovery follows previous research by HiddenLayer detailing how Model Context Protocol (MCP) tools can be used to leak sensitive information by inserting specific parameters within a tool’s function.
In a related development, the Straiker AI Research team showed that “Yearbook Attacks”—which use backronyms to encode bad content—can trick chatbots from companies like Anthropic, DeepSeek, Google, Meta, Microsoft, Mistral AI, and OpenAI into producing undesirable responses. Security researchers explained that such tricks pass through filters because they resemble normal messages and exploit how models value context and pattern completion, rather than intent analysis.
Previous Articles: