Abstract
Speaker Recognition Systems (SRSs) increasingly adopt Deep Neural Networks (DNNs) as their core architecture, and attackers exploit the weaknesses of DNNs to launch adversarial attacks. Previous studies generate adversarial examples by injecting human-imperceptible noise computed from the gradients of the audio data, an approach termed a white-box attack. However, such attacks are impractical in real-world scenarios because they depend heavily on internal information about the target classifier. To address this constraint, this study proposes a method for the black-box setting, in which the attacker can only estimate internal information by interacting with the model through its inputs and outputs. We combine substitution-based and transfer-based ideas to train various surrogate models that imitate the target models, and pair these surrogates with white-box methods such as the Momentum Iterative Fast Gradient Sign Method (MI-FGSM) and the Enhanced Momentum Iterative Fast Gradient Sign Method (EMI-FGSM) to boost the performance of the adversarial attacks. Furthermore, we analyze transferability across multiple models under cross-architecture, cross-feature, and cross-architecture-feature conditions, and a frequency analysis yields useful guidance for tuning the parameters of the attack algorithms. Extensive experiments validate that our attack outperforms previous studies.
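The momentum-based white-box step the abstract builds on (MI-FGSM) can be sketched as follows. This is a minimal NumPy illustration on a toy differentiable loss, not the authors' implementation: the function names, parameters, and the toy gradient are illustrative assumptions. MI-FGSM accumulates a momentum term over L1-normalized gradients, steps by the sign of that accumulator, and clips the result to an epsilon-ball around the original input; against a surrogate model, the same loop would use the surrogate's gradient in place of `grad_fn`.

```python
import numpy as np

def mi_fgsm(x0, grad_fn, eps=0.1, alpha=0.02, mu=1.0, steps=10):
    """Momentum Iterative FGSM (sketch): ascend the loss via the sign of
    a momentum-accumulated, L1-normalized gradient, constrained to the
    L-infinity eps-ball around the clean input x0."""
    x = x0.copy()
    g = np.zeros_like(x0)          # momentum accumulator
    for _ in range(steps):
        grad = grad_fn(x)
        g = mu * g + grad / (np.abs(grad).sum() + 1e-12)  # normalize, accumulate
        x = x + alpha * np.sign(g)                        # signed ascent step
        x = np.clip(x, x0 - eps, x0 + eps)                # stay in the eps-ball
    return x

# Toy stand-in for a model's loss gradient: loss = -||x - t||^2,
# whose gradient -2(x - t) pushes x toward the target point t.
t = np.array([1.0, -1.0])
grad_fn = lambda x: -2.0 * (x - t)
x0 = np.zeros(2)
x_adv = mi_fgsm(x0, grad_fn)       # saturates at the eps boundary toward t
```

With these settings the perturbation saturates the epsilon constraint, so `x_adv` lands on the boundary of the ball at `[0.1, -0.1]`; in the paper's setting the input would be an audio waveform or feature tensor rather than a 2-vector.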
Acknowledgements
This research was funded by NSFC under Grant 61572170, the Natural Science Foundation of Hebei Province under Grant F2021205004, the Science and Technology Foundation Project of Hebei Normal University under Grant L2021K06, the Science Foundation of Returned Overseas of Hebei Province under Grant C2020342, and the Key Science Foundation of Hebei Education Department under Grant ZD2021062.
Copyright information
© 2024 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
Cite this paper
Wang, F., Song, R., Li, Q., Wang, C. (2024). Efficient Black-Box Adversarial Attacks with Training Surrogate Models Towards Speaker Recognition Systems. In: Tari, Z., Li, K., Wu, H. (eds) Algorithms and Architectures for Parallel Processing. ICA3PP 2023. Lecture Notes in Computer Science, vol 14491. Springer, Singapore. https://doi.org/10.1007/978-981-97-0808-6_15
Publisher Name: Springer, Singapore
Print ISBN: 978-981-97-0807-9
Online ISBN: 978-981-97-0808-6
eBook Packages: Computer Science, Computer Science (R0)