Abstract
Training from pre-trained models (PTMs) is a popular approach for fast machine learning (ML) service deployment. Recent studies on hardware security have revealed that ML systems can be compromised by flipping bits in model parameters (e.g., weights) through memory faults. In this paper, we introduce WBP (weight bit poisoning), a novel task-agnostic backdoor attack that manifests during the victim's training time (i.e., fine-tuning from a public and clean PTM) by inducing hardware-based weight bit flips. WBP employs a novel distance-aware algorithm that identifies bit flips maximizing the distance between the distribution of poisoned output representations (ORs) and that of clean ORs based on the public PTM. This single set of bit flips can be applied to backdoor any victim model during fine-tuning of the same public PTM, regardless of the downstream task. We evaluate WBP on state-of-the-art CNNs and Vision Transformer models with representative downstream tasks. The results show that WBP can compromise a wide range of PTMs and downstream tasks with an average 99.3% attack success rate by flipping as few as 11 model weight bits. WBP remains effective across various training configurations with respect to learning rate, optimizer, and fine-tuning duration. We investigate the limitations of existing backdoor protection techniques against WBP and discuss potential future mitigations. (Our code is available at: https://github.com/casrl/WBP.)
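To make the distance-aware bit-flip search described in the abstract concrete, the sketch below shows one plausible realization under stated assumptions: it greedily evaluates candidate (layer, weight, bit) triples on the public PTM and keeps only those flips that increase a kernel-MMD distance between the output representations (ORs) of triggered inputs and clean inputs. The choice of MMD as the distance, the float32 bit-flip model, the helper names (mmd_distance, flip_bit, select_bit_flips), and the externally supplied candidate list are all illustrative assumptions, not the paper's exact algorithm, which may, for example, operate on quantized weights and use a different distance or search order. The sketch assumes `ptm` is the public pre-trained encoder whose forward pass returns OR vectors of shape (batch, dim).

```python
# Minimal sketch of a distance-aware bit-flip search in the spirit of WBP.
# Hypothetical helper names and design choices; not the authors' implementation.
import copy
import struct
import torch


def mmd_distance(x, y, sigma=1.0):
    # Biased RBF-kernel MMD^2 between two batches of output representations.
    def k(a, b):
        d = torch.cdist(a, b) ** 2
        return torch.exp(-d / (2 * sigma ** 2))
    return k(x, x).mean() + k(y, y).mean() - 2 * k(x, y).mean()


def flip_bit(weight, flat_index, bit):
    # Flip one bit of a float32 weight in place (bit 0 = LSB of the IEEE-754 encoding).
    w = weight.view(-1)
    as_int = struct.unpack("<I", struct.pack("<f", float(w[flat_index])))[0]
    w[flat_index] = struct.unpack("<f", struct.pack("<I", as_int ^ (1 << bit)))[0]


@torch.no_grad()
def select_bit_flips(ptm, clean_x, triggered_x, candidates, budget=11):
    # Greedily keep bit flips that push the distribution of triggered ORs
    # away from the distribution of clean ORs on the public PTM.
    model = copy.deepcopy(ptm).eval()
    params = dict(model.named_parameters())
    chosen = []
    best = mmd_distance(model(triggered_x), model(clean_x))
    for layer_name, flat_index, bit in candidates:   # candidate (layer, weight, bit) triples
        w = params[layer_name].data
        flip_bit(w, flat_index, bit)                 # tentatively apply the flip
        score = mmd_distance(model(triggered_x), model(clean_x))
        if score > best:                             # flip increases the OR distance: keep it
            best = score
            chosen.append((layer_name, flat_index, bit))
        else:                                        # flip does not help: revert it
            flip_bit(w, flat_index, bit)
        if len(chosen) >= budget:
            break
    return chosen
```

The budget of 11 flips mirrors the number reported in the abstract; how the candidate set is generated and how the chosen flips are later induced in memory are outside this sketch.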
Notes
- 1. Note that this is a theoretical setup to study the applicability of prior methods; in practice, typically only one bit can be flipped at a time.
Acknowledgements
This work is supported in part by U.S. National Science Foundation under SaTC-2019536 and CNS-2147217.
Copyright information
© 2025 The Author(s), under exclusive license to Springer Nature Switzerland AG
Cite this paper
Cai, K., Zhang, Z., Lou, Q., Yao, F. (2025). WBP: Training-Time Backdoor Attacks Through Hardware-Based Weight Bit Poisoning. In: Leonardis, A., Ricci, E., Roth, S., Russakovsky, O., Sattler, T., Varol, G. (eds) Computer Vision – ECCV 2024. ECCV 2024. Lecture Notes in Computer Science, vol 15123. Springer, Cham. https://doi.org/10.1007/978-3-031-73650-6_11
DOI: https://doi.org/10.1007/978-3-031-73650-6_11
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-73649-0
Online ISBN: 978-3-031-73650-6
eBook Packages: Computer Science, Computer Science (R0)