DOI: 10.1145/3372297.3423348
Public Access

AdvPulse: Universal, Synchronization-free, and Targeted Audio Adversarial Attacks via Subsecond Perturbations

Published: 02 November 2020

Abstract

Existing efforts on audio adversarial attacks focus only on scenarios where the adversary has prior knowledge of the entire speech input and can therefore generate an adversarial example by aligning and mixing the audio input with the corresponding adversarial perturbation. In this work, we consider a more practical and challenging attack scenario in which the intelligent audio system takes streaming audio inputs (e.g., live human speech) and the adversary deceives the system by playing adversarial perturbations simultaneously. This change in attack behavior introduces significant challenges and prevents existing adversarial perturbation generation methods from being applied directly: (1) the adversary cannot anticipate what the victim will say, and thus cannot rely on prior knowledge of the speech signal to guide the generation of adversarial perturbations; and (2) the adversary cannot control when the victim will speak, so synchronization between the adversarial perturbation and the speech cannot be guaranteed. To address these challenges, we propose AdvPulse, a systematic approach for generating subsecond audio adversarial perturbations that can alter the recognition results of streaming audio inputs in a targeted and synchronization-free manner. To circumvent the constraints on speech content and timing, we exploit a penalty-based universal adversarial perturbation generation algorithm and incorporate a varying time delay into the optimization process. We further tailor the adversarial perturbation to environmental sounds to make it inconspicuous to humans. Additionally, by accounting for the sources of distortion that occur during physical playback, we generate more robust audio adversarial perturbations that remain effective even under over-the-air propagation. Extensive experiments on two representative types of intelligent audio systems (i.e., speaker recognition and speech command recognition) are conducted in various realistic environments. The results show that our attack achieves an average success rate of over 89.6% in indoor environments and 76.0% in inside-vehicle scenarios, even in the presence of loud engine and road noise.
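To make the "synchronization-free" idea concrete, the sketch below illustrates the style of delay-randomized, penalty-based optimization the abstract describes: a single subsecond perturbation is optimized so that, when added to speech at an offset the attacker does not control, a classifier outputs the attacker-chosen label. This is a minimal illustration written for this summary, not the authors' implementation; the stand-in waveform classifier (tiny_classifier), the 16 kHz sampling rate, the 0.5 s pulse length, and the L2 penalty weight are all assumptions.

```python
# Illustrative sketch only (not the AdvPulse code). It mimics the idea in the
# abstract: train one subsecond "universal" pulse delta so that, inserted at a
# random offset into speech, the classifier predicts the attacker's target.
import tensorflow as tf

SR = 16000              # assumed sampling rate (16 kHz)
PULSE_LEN = SR // 2     # assumed subsecond perturbation length: 0.5 s
CLIP_LEN = 2 * SR       # assumed training clip length: 2 s


def tiny_classifier(num_classes=10):
    """Hypothetical stand-in raw-waveform classifier (not from the paper)."""
    return tf.keras.Sequential([
        tf.keras.layers.Reshape((CLIP_LEN, 1)),
        tf.keras.layers.Conv1D(16, 64, strides=16, activation="relu"),
        tf.keras.layers.GlobalAveragePooling1D(),
        tf.keras.layers.Dense(num_classes),
    ])


def embed_at_random_delay(clips, delta):
    """Add the pulse at a random offset, re-drawn on every optimization step.

    Randomizing the insertion point is what removes the need to synchronize
    the perturbation with the victim's speech at attack time.
    """
    offset = tf.random.uniform([], 0, CLIP_LEN - PULSE_LEN + 1, dtype=tf.int32)
    paddings = tf.reshape(tf.stack([offset, CLIP_LEN - PULSE_LEN - offset]), [1, 2])
    padded = tf.pad(delta, paddings)                   # shape [CLIP_LEN]
    return tf.clip_by_value(clips + padded[None, :], -1.0, 1.0)


def attack_step(model, clips, delta, target, alpha, opt):
    """One update of the penalty-based objective: targeted loss + L2 penalty."""
    with tf.GradientTape() as tape:
        adv = embed_at_random_delay(clips, delta)
        logits = model(adv, training=False)
        targets = tf.fill(tf.shape(clips)[:1], target)
        ce = tf.keras.losses.sparse_categorical_crossentropy(
            targets, logits, from_logits=True)
        loss = tf.reduce_mean(ce) + alpha * tf.reduce_sum(tf.square(delta))
    grads = tape.gradient(loss, delta)
    opt.apply_gradients([(grads, delta)])
    return loss


if __name__ == "__main__":
    model = tiny_classifier()
    delta = tf.Variable(tf.random.normal([PULSE_LEN], stddev=1e-3))
    opt = tf.keras.optimizers.Adam(learning_rate=1e-3)
    # Placeholder "speech"; a real attack would iterate over many different
    # utterances so the pulse generalizes across unknown speech content.
    clips = tf.random.normal([8, CLIP_LEN], stddev=0.1)
    for _ in range(200):
        attack_step(model, clips, delta, target=3, alpha=0.05, opt=opt)
```

In the attack the abstract describes, a comparable loop would additionally draw environmental sounds to shape the pulse and would model over-the-air distortions during optimization (for example, by convolving with room impulse responses); those steps are omitted here for brevity.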

Supplementary Material

MOV File (Copy of CCS2020_fpe202_Zhuohang Li - Pat Weeden.mov)
Presentation video


      Information & Contributors

      Published In

      CCS '20: Proceedings of the 2020 ACM SIGSAC Conference on Computer and Communications Security
      October 2020
      2180 pages
      ISBN: 9781450370899
      DOI: 10.1145/3372297

      Publisher

      Association for Computing Machinery

      New York, NY, United States



      Author Tags

      1. audio adversarial attack
      2. intelligent audio system
      3. synchronization-free

      Qualifiers

      • Research-article

      Funding Sources

      • National Science Foundation
      • Air Force Research Laboratory

      Conference

      CCS '20

      Acceptance Rates

      Overall Acceptance Rate: 1,261 of 6,999 submissions (18%)

      Bibliometrics & Citations

      Bibliometrics

      Article Metrics

      • Downloads (Last 12 months)603
      • Downloads (Last 6 weeks)96
      Reflects downloads up to 13 Nov 2024

      Cited By

      • (2024) Extending adversarial attacks to produce adversarial class probability distributions. The Journal of Machine Learning Research 24:1, 491-532. DOI: 10.5555/3648699.3648714. Online publication date: 6-Mar-2024.
      • (2024) CommanderUAP: a practical and transferable universal adversarial attacks on speech recognition models. Cybersecurity 7:1. DOI: 10.1186/s42400-024-00218-8. Online publication date: 5-Jun-2024.
      • (2024) Towards Evaluating the Robustness of Automatic Speech Recognition Systems via Audio Style Transfer. Proceedings of the 2nd ACM Workshop on Secure and Trustworthy Deep Learning Systems, 47-55. DOI: 10.1145/3665451.3665532. Online publication date: 2-Jul-2024.
      • (2024) Toward Robust ASR System against Audio Adversarial Examples using Agitated Logit. ACM Transactions on Privacy and Security 27:2, 1-26. DOI: 10.1145/3661822. Online publication date: 10-Jun-2024.
      • (2024) Inaudible Backdoor Attack via Stealthy Frequency Trigger Injection in Audio Spectrogram. Proceedings of the 30th Annual International Conference on Mobile Computing and Networking, 31-45. DOI: 10.1145/3636534.3649345. Online publication date: 29-May-2024.
      • (2024) Adversarial Perturbation Prediction for Real-Time Protection of Speech Privacy. IEEE Transactions on Information Forensics and Security 19, 8701-8716. DOI: 10.1109/TIFS.2024.3463538. Online publication date: 2024.
      • (2024) Indelible “Footprints” of Inaudible Command Injection. IEEE Transactions on Information Forensics and Security 19, 8485-8499. DOI: 10.1109/TIFS.2024.3459728. Online publication date: 1-Jan-2024.
      • (2024) AFPM: A Low-Cost and Universal Adversarial Defense for Speaker Recognition Systems. IEEE Transactions on Information Forensics and Security 19, 2273-2287. DOI: 10.1109/TIFS.2023.3348232. Online publication date: 1-Jan-2024.
      • (2024) AdvReverb: Rethinking the Stealthiness of Audio Adversarial Examples to Human Perception. IEEE Transactions on Information Forensics and Security 19, 1948-1962. DOI: 10.1109/TIFS.2023.3345639. Online publication date: 1-Jan-2024.
      • (2024) FlowMur: A Stealthy and Practical Audio Backdoor Attack with Limited Knowledge. 2024 IEEE Symposium on Security and Privacy (SP), 1646-1664. DOI: 10.1109/SP54263.2024.00148. Online publication date: 19-May-2024.
