DOI: 10.1145/3555776.3578591

SOTERIA: Preserving Privacy in Distributed Machine Learning

Published: 07 June 2023

Abstract

We propose Soteria, a system for distributed privacy-preserving Machine Learning (ML) that leverages Trusted Execution Environments (e.g., Intel SGX) to run code in isolated containers (enclaves). Unlike previous work, where all ML-related computation is performed inside trusted enclaves, we introduce a hybrid scheme that combines computation done inside and outside these enclaves. Our experimental evaluation shows that this approach reduces the runtime of ML algorithms by up to 41% when compared to previous related work. The protocol is accompanied by a security proof, as well as a discussion of its resilience against a wide spectrum of ML attacks.
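
To make the hybrid scheme concrete, the sketch below splits one training step between a simulated enclave boundary and untrusted code. It is a minimal toy illustration, not Soteria's implementation: the enclave_call helper, the linear-model gradient, and the assumption that averaged gradients may leave the enclave are choices made for this example only.

# Illustrative sketch (not the Soteria implementation): computation that touches
# plaintext training data runs inside a (here simulated) enclave, while computation
# on values assumed safe to expose runs outside of it.

from typing import Callable, List

def enclave_call(fn: Callable, *args):
    """Hypothetical enclave dispatcher; a real system would enter an Intel SGX
    enclave via an SGX runtime. Here it is just a plain function call."""
    return fn(*args)

def sensitive_gradient(weights: List[float], record: List[float], label: float) -> List[float]:
    """Runs INSIDE the enclave: touches plaintext features and labels."""
    prediction = sum(w * x for w, x in zip(weights, record))
    error = prediction - label
    return [error * x for x in record]

def aggregate(gradients: List[List[float]]) -> List[float]:
    """Runs OUTSIDE the enclave: averages per-record gradients, which this toy
    example assumes may be exposed (or are protected by other means)."""
    n = len(gradients)
    return [sum(g[i] for g in gradients) / n for i in range(len(gradients[0]))]

def training_step(weights: List[float], batch: List[List[float]], labels: List[float], lr: float = 0.1) -> List[float]:
    # Sensitive per-record work is routed through the enclave boundary...
    grads = [enclave_call(sensitive_gradient, weights, x, y) for x, y in zip(batch, labels)]
    # ...while aggregation and the weight update stay outside of it.
    avg = aggregate(grads)
    return [w - lr * g for w, g in zip(weights, avg)]

if __name__ == "__main__":
    weights = [0.0, 0.0]
    batch = [[1.0, 2.0], [2.0, 1.0]]
    labels = [1.0, 0.0]
    print(training_step(weights, batch, labels))

In the paper's setting, which builds on Apache Spark (see the author tags below), such a split would presumably apply at the level of distributed data-processing operations rather than single function calls as above.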

Cited By

  • (2024) A Review on Privacy Enhanced Distributed ML Against Poisoning Attacks. In: AI Applications in Cyber Security and Communication Networks, pp. 173-186. https://doi.org/10.1007/978-981-97-3973-8_11. Online publication date: 18-Sep-2024.
  • (2023) Privacy-Preserving Machine Learning on Apache Spark. IEEE Access, vol. 11, pp. 127907-127930. https://doi.org/10.1109/ACCESS.2023.3332222. Online publication date: 2023.

    Published In

    SAC '23: Proceedings of the 38th ACM/SIGAPP Symposium on Applied Computing
    March 2023
    1932 pages
    ISBN: 9781450395175
    DOI: 10.1145/3555776

    Publisher

    Association for Computing Machinery, New York, NY, United States

    Author Tags

    1. apache spark
    2. machine learning
    3. Intel SGX
    4. privacy-preserving

    Qualifiers

    • Research-article

    Conference

    SAC '23

    Acceptance Rates

    Overall Acceptance Rate 1,650 of 6,669 submissions, 25%

    Article Metrics

    • Downloads (last 12 months): 66
    • Downloads (last 6 weeks): 6
    Reflects downloads up to 30 Sep 2024
