research-article

VoIPLoc: passive VoIP call provenance via acoustic side-channels

Published: 28 June 2021

Abstract

We propose VoIPLoc, a novel location fingerprinting technique, and apply it to the VoIP call provenance problem. It exploits echo-location information embedded within VoIP audio to support fine-grained location inference. We found consistent statistical features that the echo-reflection characteristics of a location induce in recorded speech. These features are discernible within traces received at the VoIP destination, enabling location inference. We evaluated VoIPLoc by developing a dataset of audio traces received through VoIP channels over the Tor network. We show that recording locations can be fingerprinted and detected remotely with a low false-positive rate, even when a majority of the audio samples are unlabelled. Finally, we note that the technique is fully passive and thus undetectable, unlike prior art. VoIPLoc is robust to the impact of environmental noise and background sounds, as well as the impact of compressive codecs and network jitter. The technique is also highly scalable and offers several degrees of freedom in terms of the fingerprintable space.
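To give a rough intuition for why echo-reflection characteristics can separate recording locations, the sketch below compares two synthetic "rooms" by their reverberation decay. It is not the VoIPLoc pipeline, which the abstract does not detail: the impulse responses are simulated as exponentially decaying noise, the decay time is estimated with Schroeder backward integration, and the function names (`synth_rir`, `energy_decay_db`, `estimate_rt60`), the 8 kHz sample rate, and the room decay times are all illustrative assumptions.

```python
import math
import random


def synth_rir(rt60, fs=8000, seed=0):
    """Exponentially decaying noise: a crude stand-in for a room impulse response."""
    rng = random.Random(seed)
    n = int(1.5 * rt60 * fs)
    # Amplitude falls by 60 dB (a factor of 1000) over rt60 seconds.
    decay = math.log(1000.0) / rt60
    return [rng.gauss(0.0, 1.0) * math.exp(-decay * i / fs) for i in range(n)]


def energy_decay_db(rir):
    """Schroeder backward integration, normalised to 0 dB at the first sample."""
    total = 0.0
    tail = []
    for x in reversed(rir):
        total += x * x
        tail.append(total)
    tail.reverse()
    return [10.0 * math.log10(e / tail[0]) for e in tail if e > 0.0]


def estimate_rt60(rir, fs=8000, hi=-5.0, lo=-25.0):
    """Fit a line to the -5..-25 dB part of the decay curve and extrapolate to -60 dB."""
    edc = energy_decay_db(rir)
    pts = [(i / fs, d) for i, d in enumerate(edc) if lo <= d <= hi]
    mean_t = sum(t for t, _ in pts) / len(pts)
    mean_d = sum(d for _, d in pts) / len(pts)
    num = sum((t - mean_t) * (d - mean_d) for t, d in pts)
    den = sum((t - mean_t) ** 2 for t, _ in pts)
    slope = num / den  # dB per second, negative for a decaying response
    return -60.0 / slope


if __name__ == "__main__":
    # Two hypothetical locations: a small damped room and a larger reverberant one.
    small_room = estimate_rt60(synth_rir(0.3, seed=1))
    large_room = estimate_rt60(synth_rir(0.8, seed=2))
    print(f"small room RT60 estimate: {small_room:.2f} s")
    print(f"large room RT60 estimate: {large_room:.2f} s")
```

In a real call the impulse response is not directly available and would have to be inferred from reverberant speech after codec compression and network jitter, which is the harder problem the paper addresses; the sketch only shows that the underlying decay signature differs between spaces.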


Published In

WiSec '21: Proceedings of the 14th ACM Conference on Security and Privacy in Wireless and Mobile Networks
June 2021
412 pages
ISBN: 9781450383493
DOI: 10.1145/3448300

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. VoIP security
  2. acoustic fingerprint
  3. call provenance
  4. location privacy
  5. source identification

Qualifiers

  • Research-article

Funding Sources

  • UKIERI
  • EPSRC

Conference

WiSec '21
Acceptance Rates

WiSec '21 Paper Acceptance Rate 34 of 121 submissions, 28%;
Overall Acceptance Rate 98 of 338 submissions, 29%
