Listening Between the Bits: Privacy Leaks in Audio Fingerprints

Moritz Pfister, Robert Michael, Max Boll, Cosima Körfer, Konrad Rieck & Daniel Arp

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 14828)

Included in the following conference series:

International Conference on Detection of Intrusions and Malware, and Vulnerability Assessment

Abstract

Audio content recognition is an emerging technology that forms the basis for mobile services, such as automatic song recognition, second-screen synchronization, and broadcast monitoring. The technology utilizes audio fingerprints, short patterns that are extracted from audio recordings of a smartphone and enable the identification of specific content. These fingerprints are generally considered privacy-friendly, as they contain minimal information of the original signal. As a result, mobile applications have emerged in the past few years that silently monitor user habits by collecting such audio fingerprints in the background. In this paper, we systematically examine whether audio fingerprints leak sensitive information from the recording environment and potentially violate the privacy of smartphone users. To this end, we analyze three popular audio recognition solutions and develop attacks to infer sensitive information from their fingerprints. To the best of our knowledge, we are the first to show that the identification of speakers and words in the fingerprints is possible. Based on our analysis, we conclude that current audio fingerprints do not sufficiently protect privacy and should be used with great caution.

M. Pfister and R. Michael contributed equally to this work.

Notes

1. https://github.com/acr-privacy/attacks.
2. sha1: 3c0770204a5d769c1a22a4acb7f9d6a4dd12e55c.
3. sha1: 5de8eb4098d2e35a2c3951a169bf9e19a680e2d4.
4. https://github.com/acr-privacy/attacks.

Acknowledgements

This work was funded by the German Federal Ministry of Education and Research (BMBF) under the grants BIFOLD24B and 16KIS1142K.

Author information

Authors and Affiliations

Technische Universität Braunschweig, Braunschweig, Germany
Moritz Pfister, Robert Michael, Max Boll & Cosima Körfer
Berlin Institute for the Foundations of Learning and Data (BIFOLD), Berlin, Germany
Konrad Rieck & Daniel Arp
Technische Universität Berlin, Berlin, Germany
Konrad Rieck & Daniel Arp

Corresponding author

Correspondence to Daniel Arp.

Editor information

Editors and Affiliations

AWS, San Diego, CA, USA
Federico Maggi
Boston University, Boston, MA, USA
Manuel Egele
EPFL, Lausanne, Switzerland
Mathias Payer
Politecnico di Milano, Milan, Italy
Michele Carminati

Ethics declarations

Disclosure of Interests

The authors have no competing interests to declare that are relevant to the content of this article.

A Details on ACR Solutions

In this section, we provide information that we obtained through reverse engineering of the two commercial ACR solutions Zapr and ACRCloud.

Analysis Setup

We first describe the experimental setup we use to reverse engineer the apps and then discuss our findings for both solutions.

Mobile Apps. For ACRCloud, we base our analysis on the Deezer app (version 6.1.14.99) and verify that our insights remain valid for a more recent version of the SDK (version 6.2.13.151). Similarly, for Zapr, we use the Android smartphone application ABP Live TV News (version 9.9.7).

Dynamic Analysis. Both solutions encapsulate the implementations of the ACR algorithms in a shared library, which is provided as a native binary object and accessed by the Android apps through the Java Native Interface (JNI). We treat the fingerprinting algorithms inside the shared object as black boxes and observe their return values. To this end, we use the dynamic instrumentation toolkit Frida, which allows us to run the fingerprinting algorithms on controlled input signals and extract the resulting audio fingerprints. A static analysis shows that all algorithms expect the input signal to be sampled at a frequency of 8,000 Hz with an audio bit depth of 16 bits. Providing the ACR implementations with properly preprocessed audio samples yields the required audio fingerprints, which can then be used for further analysis.
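The preprocessing step mentioned above can be illustrated with a minimal sketch. It assumes the input is a list of float samples in [-1, 1] and uses plain linear interpolation for resampling. The function name `to_pcm16_8khz`, the little-endian byte layout, and the interpolation method are illustrative assumptions, not details taken from the analyzed libraries.

```python
import struct

TARGET_RATE = 8000  # Hz, the sampling rate reported in the text


def to_pcm16_8khz(samples, source_rate):
    """Resample float samples in [-1, 1] to 8 kHz and quantize to 16-bit PCM.

    Linear interpolation keeps the sketch short. It applies no anti-aliasing
    filter, so a real pipeline would low-pass the signal before downsampling.
    """
    if not samples:
        return b""
    n_out = int(len(samples) * TARGET_RATE / source_rate)
    out = []
    for i in range(n_out):
        pos = i * source_rate / TARGET_RATE       # position in the source signal
        left = int(pos)
        right = min(left + 1, len(samples) - 1)
        frac = pos - left
        value = samples[left] * (1.0 - frac) + samples[right] * frac
        value = max(-1.0, min(1.0, value))        # clip before quantizing
        out.append(int(round(value * 32767)))
    # ASSUMPTION: little-endian signed 16-bit integers.
    return struct.pack("<%dh" % len(out), *out)
```

For example, one second of audio recorded at 16 kHz would be reduced to 8,000 samples, i.e. 16,000 bytes of PCM data.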

To learn more about the underlying structure of the fingerprints, we perform controlled experiments using specifically crafted audio signals from which we derive audio fingerprints. For instance, we use audio signals that contain only one particular frequency or even pure silence.
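Such crafted probe signals are simple to generate. The following sketch produces a single-frequency tone and pure silence at 8,000 Hz as 16-bit integer samples. The three-second duration matches the snippet length stated later in this appendix, while the helper names and the amplitude are illustrative assumptions.

```python
import math

SAMPLE_RATE = 8000   # Hz, as expected by the analyzed algorithms
DURATION_S = 3.0     # snippet length used for illustration


def sine_tone(freq_hz, duration_s=DURATION_S, amplitude=0.5):
    """Return 16-bit integer samples of a tone with a single frequency."""
    n = int(SAMPLE_RATE * duration_s)
    peak = int(amplitude * 32767)
    return [
        int(round(peak * math.sin(2 * math.pi * freq_hz * t / SAMPLE_RATE)))
        for t in range(n)
    ]


def silence(duration_s=DURATION_S):
    """Return all-zero samples, i.e. digital silence."""
    return [0] * int(SAMPLE_RATE * duration_s)
```

Feeding a tone of known frequency to a black-box fingerprinting routine and observing which bytes change is one way to attribute fields of the output to frequency or time information.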

Table 2. Overview of final model parameters. The table shows the number of bits of the subfingerprints (l), sequence length (k), the embedding size (d), the number of encoder blocks (encoders), and the number of heads per encoder (heads) for each setting.

Fingerprint Structures

We find that the fingerprint structures differ widely, not only between ACRCloud and Zapr but even between the two Zapr algorithms we selected for our analysis. In the following, we provide more details on our findings.

ACRCloud. For ACRCloud, we find that the generated audio fingerprints vary in length, although all audio snippets are three seconds long. In particular, the length of the generated fingerprints for our dataset varies between 344 and 752 bytes, with a median of 544 bytes. Each fingerprint consists of multiple subfingerprints (see Sect. 3.2) with a length of 8 bytes. The first two bytes of each subfingerprint encode the frequency of an identified peak. Here, ACRCloud seemingly segments the frequency band, which has a maximum frequency of 4,000 Hz, into 1,024 distinct bins of equal size, leading to a frequency resolution of $f_{res} = \frac{4000~\textrm{Hz}}{1024} \approx 3.906~\textrm{Hz}$. The third and fourth bytes of each subfingerprint encode the time offset $\varDelta t$ with a granularity of roughly 20 ms. For the last four bytes, we are unable to derive a clear explanation. However, we notice that the information stored in these bytes depends on the frequency bytes but not on the time offset.
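The layout described above can be sketched as a small parser. The field widths, the 1,024 frequency bins, and the 20 ms time granularity follow the text. The byte order (little-endian), the linear mapping from bin index to frequency, and the function name `parse_acrcloud_like` are assumptions made for illustration only.

```python
import struct

SUBFP_LEN = 8               # bytes per subfingerprint (from the text)
FREQ_RES_HZ = 4000 / 1024   # about 3.906 Hz per frequency bin
TIME_STEP_S = 0.020         # roughly 20 ms per time unit


def parse_acrcloud_like(fingerprint):
    """Split a fingerprint into (frequency_hz, time_s, opaque_bytes) tuples."""
    if len(fingerprint) % SUBFP_LEN != 0:
        raise ValueError("length is not a multiple of 8 bytes")
    result = []
    for off in range(0, len(fingerprint), SUBFP_LEN):
        # ASSUMPTION: two little-endian unsigned 16-bit fields.
        freq_bin, time_idx = struct.unpack_from("<HH", fingerprint, off)
        # The meaning of the last four bytes is unknown, so keep them raw.
        opaque = fingerprint[off + 4:off + SUBFP_LEN]
        result.append((freq_bin * FREQ_RES_HZ, time_idx * TIME_STEP_S, opaque))
    return result
```

Under this layout, the observed fingerprint lengths of 344, 544, and 752 bytes correspond to 43, 68, and 94 subfingerprints, respectively.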

Zapr. For Zapr Alg1, none of the observed fingerprints exceeds 340 bytes in length, which suggests a maximum length for the fingerprints. Additionally, each fingerprint's length is divisible by 4 bytes, indicating that fingerprints are composed of multiple subfingerprints, each 4 bytes long. The only exception we find is for silent signals, for which the algorithm does not output any fingerprint. The first two bytes of each subfingerprint encode the time offset with a precision of 2 s. The last byte encodes frequency information, systematically partitioning the 4 kHz band into 256 distinct frequency bins. The purpose of the third byte remains unclear. Unfortunately, for Zapr Alg2, we have not been able to derive information about its structure.
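A comparable sketch for the Zapr Alg1 layout is given below. The 4-byte subfingerprints, the 340-byte upper bound, and the 256 frequency bins follow the text. The byte order, the linear mapping from bin index to the lower bin edge in Hz, and the function name `parse_zapr_alg1_like` are assumptions. The time field is returned as a raw integer because its unit is not assumed here.

```python
import struct

SUBFP_LEN = 4               # bytes per subfingerprint (from the text)
MAX_LEN = 340               # largest fingerprint length observed in the text
BIN_WIDTH_HZ = 4000 / 256   # 15.625 Hz per frequency bin


def parse_zapr_alg1_like(fingerprint):
    """Split a fingerprint into dicts with raw time, unknown byte, frequency."""
    if len(fingerprint) > MAX_LEN:
        raise ValueError("fingerprint exceeds 340 bytes")
    if len(fingerprint) % SUBFP_LEN != 0:
        raise ValueError("length is not a multiple of 4 bytes")
    entries = []
    for off in range(0, len(fingerprint), SUBFP_LEN):
        # ASSUMPTION: little-endian 16-bit time field, then two single bytes.
        time_raw, unknown, freq_bin = struct.unpack_from("<HBB", fingerprint, off)
        entries.append({
            "time_raw": time_raw,                    # unit left uninterpreted
            "unknown": unknown,                      # purpose unclear in the text
            "freq_hz_low": freq_bin * BIN_WIDTH_HZ,  # lower edge of the bin
        })
    return entries
```

An empty input yields an empty list, which mirrors the observation that silent signals produce no fingerprint. The maximum length of 340 bytes corresponds to 85 subfingerprints.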

About this paper

Cite this paper

Pfister, M., Michael, R., Boll, M., Körfer, C., Rieck, K., Arp, D. (2024). Listening Between the Bits: Privacy Leaks in Audio Fingerprints. In: Maggi, F., Egele, M., Payer, M., Carminati, M. (eds) Detection of Intrusions and Malware, and Vulnerability Assessment. DIMVA 2024. Lecture Notes in Computer Science, vol 14828. Springer, Cham. https://doi.org/10.1007/978-3-031-64171-8_10

DOI: https://doi.org/10.1007/978-3-031-64171-8_10
Published: 09 July 2024
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-64170-1
Online ISBN: 978-3-031-64171-8
eBook Packages: Computer Science (R0)
