DOI: 10.1145/3212480.3212505

Hello, Is It Me You're Looking For?: Differentiating Between Human and Electronic Speakers for Voice Interface Security

Published: 18 June 2018

Abstract

Voice interfaces are increasingly integrated into a variety of Internet of Things (IoT) devices. Such systems can dramatically simplify interactions between users and devices with limited displays. Unfortunately, voice interfaces also create new opportunities for exploitation. Specifically, any sound-emitting device within range of the system implementing the voice interface (e.g., a smart television or an Internet-connected appliance) can potentially cause these systems to perform operations against the desires of their owners (e.g., unlock doors or make unauthorized purchases). We address this problem by developing a technique to recognize fundamental differences in audio created by humans and electronic speakers. We identify sub-bass over-excitation, or the presence of significant low-frequency signals that are outside the range of human voices but inherent to the design of modern speakers, as a strong differentiator between these two sources. After identifying this phenomenon, we demonstrate its use in preventing adversarial requests, replayed audio, and hidden commands, with a true positive rate (TPR) of 100% and a false positive rate (FPR) of 1.72% in quiet environments. In so doing, we demonstrate that commands injected via nearby audio devices can be effectively removed by voice interfaces.
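
The idea of sub-bass over-excitation can be illustrated with a minimal sketch that measures how much of a recording's spectral energy falls in a low-frequency band. This is not the authors' detector: the band edges (20 Hz, 80 Hz, 4000 Hz), the decision threshold, and the function names `sub_bass_energy_ratio` and `looks_like_electronic_speaker` are all assumptions chosen for illustration, since the abstract gives no such parameters.

```python
import numpy as np


def sub_bass_energy_ratio(samples, sample_rate,
                          low_hz=20.0, cutoff_hz=80.0, band_max_hz=4000.0):
    """Fraction of spectral energy in [low_hz, band_max_hz] that lies below cutoff_hz.

    All band edges are illustrative assumptions, not values from the paper.
    """
    samples = np.asarray(samples, dtype=np.float64)
    # Power spectrum of the real-valued signal.
    power = np.abs(np.fft.rfft(samples)) ** 2
    freqs = np.fft.rfftfreq(len(samples), d=1.0 / sample_rate)

    # Ignore DC and very low bins (microphone offset) and anything above band_max_hz.
    in_band = (freqs >= low_hz) & (freqs <= band_max_hz)
    total = power[in_band].sum()
    if total == 0.0:
        return 0.0  # silent or empty input: no evidence either way

    sub_bass = power[(freqs >= low_hz) & (freqs < cutoff_hz)].sum()
    return float(sub_bass / total)


def looks_like_electronic_speaker(samples, sample_rate, threshold=0.2):
    """Flag audio whose sub-bass share exceeds a hypothetical threshold."""
    return sub_bass_energy_ratio(samples, sample_rate) > threshold


if __name__ == "__main__":
    rate = 16000
    t = np.arange(rate) / rate  # one second of audio
    voice_like = np.sin(2 * np.pi * 200 * t)                   # energy in the voice range only
    speaker_like = voice_like + np.sin(2 * np.pi * 50 * t)     # added sub-bass component
    print(looks_like_electronic_speaker(voice_like, rate))
    print(looks_like_electronic_speaker(speaker_like, rate))
```

In practice any threshold would have to be calibrated per microphone and environment; the synthetic tones here only show the mechanics of the energy comparison.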

      Published In

      WiSec '18: Proceedings of the 11th ACM Conference on Security & Privacy in Wireless and Mobile Networks
      June 2018
      317 pages
      ISBN: 9781450357319
      DOI: 10.1145/3212480

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Author Tags

      1. Internet of Things
      2. Voice interface

      Qualifiers

      • Research-article
      • Research
      • Refereed limited

      Acceptance Rates

      Overall Acceptance Rate 98 of 338 submissions, 29%

      Article Metrics

      • Downloads (last 12 months): 193
      • Downloads (last 6 weeks): 22
      Reflects downloads up to 30 Sep 2024.

      Cited By

      • (2024) AeroSense: Sensing Aerosol Emissions from Indoor Human Activities. Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies 8:2 (1-30). DOI: 10.1145/3659593. Online publication date: 15-May-2024
      • (2024) Turning Noises to Fingerprint-Free “Credentials”: Secure and Usable Drone Authentication. IEEE Transactions on Mobile Computing 23:10 (10161-10174). DOI: 10.1109/TMC.2024.3373503. Online publication date: Oct-2024
      • (2024) Indelible “Footprints” of Inaudible Command Injection. IEEE Transactions on Information Forensics and Security 19 (8485-8499). DOI: 10.1109/TIFS.2024.3459728. Online publication date: 2024
      • (2024) Room-scale Voice Liveness Detection for Smart Devices. IEEE Transactions on Dependable and Secure Computing (1-14). DOI: 10.1109/TDSC.2024.3367269. Online publication date: 2024
      • (2024) Enrollment-Stage Backdoor Attacks on Speaker Recognition Systems via Adversarial Ultrasound. IEEE Internet of Things Journal 11:8 (13108-13124). DOI: 10.1109/JIOT.2023.3328253. Online publication date: 15-Apr-2024
      • (2024) Speech emotion recognition systems and their security aspects. Artificial Intelligence Review 57:6. DOI: 10.1007/s10462-024-10760-z. Online publication date: 21-May-2024
      • (2024) Security and Privacy of Augmented Reality Systems. Network Security Empowered by Artificial Intelligence (305-330). DOI: 10.1007/978-3-031-53510-9_11. Online publication date: 24-Feb-2024
      • (2023) Is Someone There or Is That the TV? Detecting Social Presence Using Sound. ACM Transactions on Human-Robot Interaction 12:4 (1-33). DOI: 10.1145/3611658. Online publication date: 13-Dec-2023
      • (2023) Phantom-CSI Attacks against Wireless Liveness Detection. Proceedings of the 26th International Symposium on Research in Attacks, Intrusions and Defenses (440-454). DOI: 10.1145/3607199.3607245. Online publication date: 16-Oct-2023
      • (2023) BarrierBypass: Out-of-Sight Clean Voice Command Injection Attacks through Physical Barriers. Proceedings of the 16th ACM Conference on Security and Privacy in Wireless and Mobile Networks (203-214). DOI: 10.1145/3558482.3581772. Online publication date: 29-May-2023
