Listening Between the Bits: Privacy Leaks in Audio Fingerprints

  • Conference paper
Detection of Intrusions and Malware, and Vulnerability Assessment (DIMVA 2024)

Abstract

Audio content recognition is an emerging technology that forms the basis for mobile services, such as automatic song recognition, second-screen synchronization, and broadcast monitoring. The technology utilizes audio fingerprints, short patterns that are extracted from audio recordings of a smartphone and enable the identification of specific content. These fingerprints are generally considered privacy-friendly, as they contain minimal information of the original signal. As a result, mobile applications have emerged in the past few years that silently monitor user habits by collecting such audio fingerprints in the background. In this paper, we systematically examine whether audio fingerprints leak sensitive information from the recording environment and potentially violate the privacy of smartphone users. To this end, we analyze three popular audio recognition solutions and develop attacks to infer sensitive information from their fingerprints. To the best of our knowledge, we are the first to show that the identification of speakers and words in the fingerprints is possible. Based on our analysis, we conclude that current audio fingerprints do not sufficiently protect privacy and should be used with great caution.

M. Pfister and R. Michael—Authors contributed equally to this work.

Notes

  1. https://github.com/acr-privacy/attacks.

  2. sha1: 3c0770204a5d769c1a22a4acb7f9d6a4dd12e55c.

  3. sha1: 5de8eb4098d2e35a2c3951a169bf9e19a680e2d4.

  4. https://github.com/acr-privacy/attacks.

Acknowledgements

This work was funded by the German Federal Ministry of Education and Research (BMBF) under the grants BIFOLD24B and 16KIS1142K.

Corresponding author

Correspondence to Daniel Arp.

Ethics declarations

Disclosure of Interests

The authors have no competing interests to declare that are relevant to the content of this article.

A Details on ACR Solutions

In this section, we provide information that we obtained through reverse engineering of the two commercial ACR solutions Zapr and ACRCloud.

Analysis Setup

We start by describing our experimental setup to reverse engineer the apps and discuss our findings of both solutions afterward.

Mobile Apps. For ACRCloud, we base our analysis on the Deezer app (version 6.1.14.99) and verify that our insights remain valid for more recent versions of the SDK (version 6.2.13.151). Similarly, we use the Android smartphone application ABP Live TV News (version 9.9.7) for Zapr.

Dynamic Analysis. Both solutions encapsulate the implementations of the ACR algorithms in a shared library, which is provided as a native binary object and accessed by the Android apps through the Java Native Interface (JNI). We treat the fingerprinting algorithms inside the shared object as a black box and observe their return values. To this end, we use the dynamic instrumentation toolkit Frida, which allows us to run the fingerprinting algorithms on controlled input signals and extract the resulting audio fingerprints. A static analysis shows that all algorithms expect the input signal to be sampled at a frequency of 8,000 Hz with an audio bit depth of 16 bits. Providing the ACR implementations with properly preprocessed audio samples yields the required audio fingerprints, which can then be utilized for further analysis.

To learn more about the underlying structure of the fingerprints, we perform controlled experiments using specifically crafted audio signals from which we derive audio fingerprints. For instance, we use audio signals that contain only one particular frequency or even pure silence.
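The crafted inputs described above can be approximated with a few lines of standard-library Python. The following is a minimal sketch, not the tooling used in the paper: the function names, the amplitude, and the little-endian byte order are assumptions made for illustration. Only the sampling rate of 8,000 Hz and the 16-bit depth are taken from the text.

```python
import math
import struct

SAMPLE_RATE_HZ = 8000  # sampling rate expected by the analyzed algorithms (from the text)
MAX_INT16 = 32767      # largest value of a signed 16-bit sample


def sine_tone(freq_hz, duration_s, amplitude=0.8):
    """Return signed 16-bit samples of a signal containing a single frequency."""
    if not 0.0 <= amplitude <= 1.0:
        raise ValueError("amplitude must lie in [0, 1]")
    n = int(SAMPLE_RATE_HZ * duration_s)
    return [
        int(round(amplitude * MAX_INT16
                  * math.sin(2 * math.pi * freq_hz * i / SAMPLE_RATE_HZ)))
        for i in range(n)
    ]


def silence(duration_s):
    """Return signed 16-bit samples of pure silence."""
    return [0] * int(SAMPLE_RATE_HZ * duration_s)


def to_pcm16le(samples):
    """Pack samples as raw 16-bit PCM (little-endian byte order is an assumption)."""
    return struct.pack("<%dh" % len(samples), *samples)


if __name__ == "__main__":
    tone = to_pcm16le(sine_tone(1000.0, 3.0))  # three-second 1 kHz tone
    quiet = to_pcm16le(silence(3.0))           # three seconds of silence
    print(len(tone), len(quiet))
```

Feeding a single-frequency buffer and a silent buffer to a fingerprinting routine and comparing the outputs makes it possible to see which bytes react to frequency and which to time.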

Table 2. Overview of final model parameters. The table shows the number of bits of the subfingerprints (l), sequence length (k), the embedding size (d), the number of encoder blocks (encoders), and the number of heads per encoder (heads) for each setting.

Fingerprint Structures

We find that the fingerprint structures differ widely not only between ACRCloud and Zapr but even between the two Zapr algorithms we selected for our analysis. In the following, we provide more details on our findings.

ACRCloud. For ACRCloud, we find that the generated audio fingerprints vary in length, although all audio snippets are three seconds long. In particular, the length of the generated fingerprints for our dataset varies between 344 and 752 bytes, with a median of 544 bytes. Each fingerprint consists of multiple subfingerprints (see Sect. 3.2) with a length of 8 bytes. The first two bytes of each subfingerprint encode the frequency of an identified peak. Here, ACRCloud seemingly segments the frequency band, which has a maximum frequency of 4,000 Hz, into 1024 distinct bins of equal size, leading to a frequency resolution of \(f_{res} = \frac{4000~\textrm{Hz}}{1024} \approx 3.906~\textrm{Hz}\). The third and fourth bytes of each subfingerprint encode the time offset \(\varDelta t\) with a granularity of roughly 20 ms. For the last four bytes, we are unable to derive a clear explanation. We notice, however, that the information stored in these bytes depends on the frequency bytes but not on the time offset.
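The layout described above can be illustrated with a small parser. This is a hedged sketch under several assumptions that the text does not confirm: the two-byte fields are read as little-endian unsigned integers, the frequency field is taken to hold a bin index directly, and the time field is taken to count steps of 20 ms. The names `Peak` and `parse_acrcloud_fingerprint` are hypothetical.

```python
import struct
from typing import List, NamedTuple

SUBFP_LEN = 8                          # bytes per subfingerprint (from the text)
NUM_BINS = 1024                        # frequency bins (from the text)
MAX_FREQ_HZ = 4000.0                   # upper edge of the frequency band (from the text)
FREQ_RES_HZ = MAX_FREQ_HZ / NUM_BINS   # 3.90625 Hz per bin
TIME_STEP_S = 0.020                    # roughly 20 ms per time step (from the text)


class Peak(NamedTuple):
    freq_hz: float   # lower edge of the peak's frequency bin
    time_s: float    # approximate time offset of the peak
    tail: bytes      # last four bytes, whose meaning is unexplained


def parse_acrcloud_fingerprint(fp: bytes) -> List[Peak]:
    """Split a fingerprint into 8-byte subfingerprints and decode the known fields."""
    if len(fp) % SUBFP_LEN != 0:
        raise ValueError("fingerprint length is not a multiple of 8 bytes")
    peaks = []
    for offset in range(0, len(fp), SUBFP_LEN):
        # Assumption: little-endian 16-bit fields; the text does not state the byte order.
        freq_bin, time_steps, tail = struct.unpack_from("<HH4s", fp, offset)
        peaks.append(Peak(freq_bin * FREQ_RES_HZ, time_steps * TIME_STEP_S, tail))
    return peaks


if __name__ == "__main__":
    # Synthetic example: bin 256 corresponds to 1000 Hz, 50 steps to about 1 s.
    demo = struct.pack("<HH4s", 256, 50, b"\x00" * 4)
    print(parse_acrcloud_fingerprint(demo))
```

Note that the reported lengths of 344, 544, and 752 bytes are all multiples of 8, which corresponds to 43, 68, and 94 subfingerprints per three-second snippet.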

Zapr. For Zapr Alg1, none of the observed fingerprints exceeds 340 bytes in length, which suggests a maximum length for the fingerprints. Additionally, each fingerprint’s length is divisible by 4 bytes, indicating that fingerprints are composed of multiple subfingerprints, each 4 bytes long. The only exception we find is for silent signals, for which the algorithm does not output any fingerprints. The first two bytes of each subfingerprint encode the time offset with a precision of 2 s. The last byte encodes frequency information, systematically partitioning the 4 kHz band into 256 distinct frequency bins. The purpose of the third byte remains unclear. Unfortunately, for Zapr Alg2, we have not been able to derive information about its structure.
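The Zapr Alg1 layout can be sketched in the same way. Partitioning the 4 kHz band into 256 bins gives a bin width of 4000 / 256 = 15.625 Hz. The parser below is an illustration only: the little-endian byte order and the direct mapping from the last byte to a bin index are assumptions, the time field is left as a raw integer because its unit is not decoded here, and `parse_zapr_alg1` is a hypothetical name.

```python
import struct
from typing import Dict, List

ZAPR_SUBFP_LEN = 4             # bytes per subfingerprint (from the text)
ZAPR_MAX_LEN = 340             # longest fingerprint observed (from the text)
ZAPR_BIN_HZ = 4000.0 / 256     # 15.625 Hz per frequency bin


def parse_zapr_alg1(fp: bytes) -> List[Dict[str, float]]:
    """Split a Zapr Alg1 fingerprint into 4-byte subfingerprints.

    An empty input yields an empty list, which matches the observation that
    silent signals produce no fingerprint.
    """
    if len(fp) > ZAPR_MAX_LEN:
        raise ValueError("longer than any observed fingerprint (340 bytes)")
    if len(fp) % ZAPR_SUBFP_LEN != 0:
        raise ValueError("fingerprint length is not a multiple of 4 bytes")
    entries = []
    for offset in range(0, len(fp), ZAPR_SUBFP_LEN):
        # Assumption: little-endian time field; the byte order is not stated in the text.
        raw_time, unknown, freq_bin = struct.unpack_from("<HBB", fp, offset)
        entries.append({
            "raw_time": raw_time,                        # undecoded time offset
            "unknown": unknown,                          # third byte, purpose unclear
            "freq_lo_hz": freq_bin * ZAPR_BIN_HZ,        # lower edge of the bin
            "freq_hi_hz": (freq_bin + 1) * ZAPR_BIN_HZ,  # upper edge of the bin
        })
    return entries


if __name__ == "__main__":
    demo = struct.pack("<HBB", 3, 7, 64)  # synthetic subfingerprint, bin 64
    print(parse_zapr_alg1(demo))
```

With a cap of 340 bytes and 4 bytes per subfingerprint, a fingerprint holds at most 85 subfingerprints.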

Copyright information

© 2024 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Pfister, M., Michael, R., Boll, M., Körfer, C., Rieck, K., Arp, D. (2024). Listening Between the Bits: Privacy Leaks in Audio Fingerprints. In: Maggi, F., Egele, M., Payer, M., Carminati, M. (eds) Detection of Intrusions and Malware, and Vulnerability Assessment. DIMVA 2024. Lecture Notes in Computer Science, vol 14828. Springer, Cham. https://doi.org/10.1007/978-3-031-64171-8_10

  • DOI: https://doi.org/10.1007/978-3-031-64171-8_10

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-64170-1

  • Online ISBN: 978-3-031-64171-8

  • eBook Packages: Computer Science (R0)
