A unified detection framework for inference-stage backdoor defenses

Published: 30 May 2024

Abstract

Backdoor attacks insert poisoned samples during training, producing a model with a hidden backdoor that triggers specific behaviors without degrading performance on normal samples. Such attacks are hard to detect because the backdoored model appears normal until the trigger activates it, making them particularly stealthy. In this work, we devise a unified inference-stage detection framework to defend against backdoor attacks. We first rigorously formulate the inference-stage backdoor detection problem, encompassing various existing methods, and discuss several challenges and limitations. We then propose a framework with provable guarantees on the false positive rate, i.e., the probability of misclassifying a clean sample. Further, we derive the most powerful detection rule, which maximizes the detection power (the rate of correctly identifying a backdoor sample) at a given false positive rate under classical learning scenarios. Building on this theoretically optimal rule, we propose a practical and effective approach for real-world applications based on the latent representations of backdoored deep networks. We extensively evaluate our method on 14 different backdoor attacks using Computer Vision (CV) and Natural Language Processing (NLP) benchmark datasets. The experimental findings align with our theoretical results. Our method significantly surpasses state-of-the-art defenses, achieving, for example, up to a 300% improvement in detection power (as measured by AUCROC) against advanced adaptive backdoor attacks.
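The framework described above calibrates a score threshold so that the false positive rate on clean samples is provably bounded, then flags inputs whose latent representations score above the threshold. The following is a minimal illustrative sketch of that general recipe, not the paper's actual detection rule: it scores latents by squared Mahalanobis distance to a clean calibration set (with a small diagonal shrinkage term, a common regularization when the latent dimension is large) and picks the threshold as a conformal-style empirical quantile so that at most an alpha fraction of clean calibration samples are flagged.

```python
import numpy as np

def fit_detector(clean_latents, alpha=0.05):
    """Calibrate a detection threshold on clean latent representations
    so that the false positive rate is (approximately) bounded by alpha."""
    mu = clean_latents.mean(axis=0)
    cov = np.cov(clean_latents, rowvar=False)
    # Diagonal shrinkage keeps the covariance invertible when the
    # latent dimension is large relative to the calibration set size.
    cov += 1e-3 * np.eye(cov.shape[0])
    prec = np.linalg.inv(cov)
    d = clean_latents - mu
    # Squared Mahalanobis distance of each calibration sample.
    scores = np.einsum("ij,jk,ik->i", d, prec, d)
    # Conformal-style quantile: at most an alpha fraction of clean
    # calibration samples exceed this threshold.
    thresh = np.quantile(scores, 1 - alpha)
    return mu, prec, thresh

def is_backdoor(x_latent, mu, prec, thresh):
    """Flag a test input as backdoored if its latent lies far
    (in Mahalanobis distance) from the clean calibration cloud."""
    d = x_latent - mu
    return float(d @ prec @ d) > thresh
```

In practice the latents would come from an intermediate layer of the deployed network, and the score function would be replaced by whatever statistic the chosen detection rule prescribes; the quantile-calibration step is what delivers the distribution-free bound on the false positive rate.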



Published In

NIPS '23: Proceedings of the 37th International Conference on Neural Information Processing Systems
December 2023
80772 pages

Publisher

Curran Associates Inc.

Red Hook, NY, United States
