Computer Science, 2021, Vol. 48, Issue (1): 258-267. doi: 10.11896/jsjkx.200500078


Survey on Adversarial Sample of Deep Learning Towards Natural Language Processing

TONG Xin, WANG Bin-jun, WANG Run-zheng, PAN Xiao-qin   

  1. School of Information and Cyber Security, People's Public Security University of China, Beijing 100038, China
  • Received: 2020-05-18  Revised: 2020-08-25  Online: 2021-01-15  Published: 2021-01-15
  • About authors: TONG Xin, born in 1995, postgraduate, is a member of the China Computer Federation. His main research interests include adversarial examples and natural language processing.
    WANG Bin-jun, born in 1962, professor, Ph.D. supervisor, is a member of the China Computer Federation. His main research interests include natural language processing and information security.
  • Supported by:
    2020 CCF-NSFOCUS “Kunpeng” Research Fund (CCF-NSFOCUS 2020011), Science and Technology Strengthening Police Basic Program of the Ministry of Public Security (2018GABJC03), Key Program of the National Social Science Foundation of China (20AZD114), Top Talent Training Special Funding Graduate Research and Innovation Project of People's Public Security University of China (2020ssky005), and Scientific Research and Technological Innovation on Public Security Behavior of People's Public Security University of China.

Abstract: Deep learning models have been shown to be vulnerable to adversarial examples, yet current research on adversarial examples focuses mainly on computer vision and largely overlooks the security of natural language processing (NLP) models. Because NLP faces the same risk from adversarial examples, this paper first clarifies the related concepts as a basis for the discussion. It then analyzes the causes of vulnerability in deep-learning-based NLP models, namely their complex structure, their opaque training process, and their naive underlying principles; describes the characteristics, classification, and evaluation metrics of textual adversarial examples; and introduces the typical tasks and classic datasets used in NLP adversarial research. Next, it organizes mainstream techniques for generating textual adversarial examples by perturbation level: character-level, word-level, sentence-level, and multi-level. It also summarizes defense methods that operate on data, models, and inference, and compares their advantages and disadvantages. Finally, it discusses the open problems facing both attackers and defenders in current NLP adversarial example research and outlines future directions.

Key words: Adversarial examples, AI security, Deep learning, Natural language processing, Robustness
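As a rough illustration of what "perturbation level" means, the sketch below contrasts a character-level edit (swapping two adjacent interior letters of a word) with a word-level edit (synonym substitution), and computes a simple perturbation rate, one common kind of evaluation metric for how much an adversarial text differs from the original. It is not the method of any particular surveyed work: the synonym table `SYNONYMS` and the helpers `char_swap`, `word_substitute` and `perturbation_rate` are hypothetical, and a real attack would also query the victim model to check whether its prediction changes. Sentence-level and multi-level perturbations are not shown.

```python
import random

# Hypothetical toy synonym table. Real word-level attacks draw candidates
# from resources such as word embeddings or WordNet; this dict is illustrative only.
SYNONYMS = {"good": "fine", "movie": "film", "terrible": "awful"}


def char_swap(word: str, rng: random.Random) -> str:
    """Character-level perturbation: swap two adjacent interior characters,
    keeping the first and last characters in place."""
    if len(word) < 4:
        return word  # too short to swap without touching the first/last char
    i = rng.randrange(1, len(word) - 2)  # i and i + 1 are both interior positions
    chars = list(word)
    chars[i], chars[i + 1] = chars[i + 1], chars[i]
    return "".join(chars)


def word_substitute(tokens: list[str], table: dict[str, str]) -> list[str]:
    """Word-level perturbation: replace every token that has an entry in `table`."""
    return [table.get(t, t) for t in tokens]


def perturbation_rate(original: list[str], adversarial: list[str]) -> float:
    """Fraction of token positions that differ (assumes equal-length token lists)."""
    if len(original) != len(adversarial):
        raise ValueError("token lists must have the same length")
    if not original:
        return 0.0
    changed = sum(a != b for a, b in zip(original, adversarial))
    return changed / len(original)


if __name__ == "__main__":
    rng = random.Random(0)
    tokens = "the movie was good".split()

    # Character-level: perturb only the word "movie".
    char_adv = [char_swap(t, rng) if t == "movie" else t for t in tokens]
    # Word-level: synonym substitution over the whole sentence.
    word_adv = word_substitute(tokens, SYNONYMS)

    print(char_adv, perturbation_rate(tokens, char_adv))
    print(word_adv, perturbation_rate(tokens, word_adv))  # → ['the', 'film', 'was', 'fine'] 0.5
```

The character-level edit keeps the word readable to a human while producing a token the model may never have seen; the word-level edit keeps every token in-vocabulary but changes two of four positions, which is why such attacks are usually constrained by a maximum perturbation rate.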


  • CLC Number: TP301
