research-article
Open access

Glitch Tokens in Large Language Models: Categorization Taxonomy and Effective Detection

Published: 12 July 2024

Abstract

With the expanding application of Large Language Models (LLMs) in various domains, it becomes imperative to comprehensively investigate their unforeseen behaviors and the consequences of those behaviors. In this study, we introduce and systematically explore the phenomenon of “glitch tokens”: anomalous tokens produced by established tokenizers that could compromise the quality of the models’ responses. Specifically, we experiment on seven popular LLMs that use three distinct tokenizers, covering a total of 182,517 tokens. We present categorizations of the identified glitch tokens and of the symptoms LLMs exhibit when interacting with them. Based on our observation that glitch tokens tend to cluster in the embedding space, we propose GlitchHunter, a novel iterative clustering-based technique for efficient glitch token detection. The evaluation shows that our approach notably outperforms three baseline methods on eight open-source LLMs. To the best of our knowledge, we present the first comprehensive study on glitch tokens. Our new detection technique further provides valuable insights into mitigating tokenization-related errors in LLMs.
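The abstract describes the detection idea only at a high level: glitch tokens cluster in embedding space, so clusters can be tested instead of individual tokens. The sketch below illustrates that general idea on toy data and is not the GlitchHunter algorithm itself. Everything in it is an assumption made for illustration: the 2-D embeddings, the two-way k-means split, the `hunt` and `split_in_two` helpers, and the `is_glitch` oracle, which stands in for an actual (expensive) query to a model.

```python
import math

def split_in_two(tokens, emb, iters=10):
    """Two-way k-means with a deterministic farthest-point initialization."""
    pts = [emb[t] for t in tokens]
    c0 = pts[0]
    c1 = max(pts, key=lambda p: math.dist(p, c0))
    centers = [c0, c1]
    labels = [0] * len(pts)
    for _ in range(iters):
        # Assign each point to its nearest center (ties go to center 0).
        labels = [0 if math.dist(p, centers[0]) <= math.dist(p, centers[1]) else 1
                  for p in pts]
        # Move each center to the mean of its members.
        for c in (0, 1):
            members = [p for p, l in zip(pts, labels) if l == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return [[t for t, l in zip(tokens, labels) if l == c] for c in (0, 1)]

def hunt(embeddings, is_glitch, probe=3, small=8):
    """Iteratively split the vocabulary; keep only clusters whose probes misbehave."""
    queue = [list(embeddings)]
    found = set()
    while queue:
        group = queue.pop()
        if len(group) <= small:
            # Small cluster: test every token directly.
            found.update(t for t in group if is_glitch(t))
            continue
        parts = split_in_two(group, embeddings)
        if not all(parts):
            # Split failed (one side empty): fall back to exhaustive testing.
            found.update(t for t in group if is_glitch(t))
            continue
        for part in parts:
            # Probe a few tokens; discard the cluster if all of them look normal.
            if any(is_glitch(t) for t in part[:probe]):
                queue.append(part)
    return found

# Toy vocabulary: 20 normal tokens near the origin, 5 "glitch" tokens far away.
embeddings = {f"normal_{i}": (0.1 * i, 0.0) for i in range(20)}
embeddings.update({f"glitch_{j}": (10.0 + 0.1 * j, 10.0) for j in range(5)})

calls = 0

def is_glitch(token):
    """Hypothetical oracle; a real check would prompt the model with the token."""
    global calls
    calls += 1
    return token.startswith("glitch_")

found = hunt(embeddings, is_glitch)
print(sorted(found), "oracle calls:", calls)
```

On this toy data the search needs fewer oracle calls than testing every token, which is the efficiency argument the abstract makes. Note that probing only a few tokens per cluster can miss glitch tokens that sit inside an otherwise normal cluster, so a practical method needs a more careful sampling and refinement strategy than this sketch uses.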

    Published In

    Proceedings of the ACM on Software Engineering  Volume 1, Issue FSE
    July 2024
    2770 pages
    EISSN:2994-970X
    DOI:10.1145/3554322
    Issue’s Table of Contents
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the owner/author(s).

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published in PACMSE Volume 1, Issue FSE

    Author Tags

    1. LLM analysis
    2. LLM security
    3. glitch token

    Qualifiers

    • Research-article
