research-article

SecretFlow-SCQL: A Secure Collaborative Query Platform

Published: 08 November 2024

Abstract

In business scenarios at Ant Group, there is a rising demand for collaborative data analysis among multiple institutions, which benefits applications such as health insurance, financial services, and risk control. However, growing concern about privacy has led to data silos. Secure Multi-Party Computation (MPC) provides an effective solution for collaborative data analysis: it lets parties extract value from data while keeping the data secure. Nevertheless, the performance bottlenecks of MPC and the strong demand for scalability pose great challenges to secure collaborative data analysis frameworks.
In this paper, we build SCQL, a general-purpose secure collaborative data analysis system. We design more efficient MPC protocols and relational operators to meet the demand for scalability. In terms of system design, we aim for a system that is secure, usable, and efficient.
We conduct extensive experiments on SCQL to validate our optimizations: (1) Our optimized secure sort protocol sorts one million 64-bit values in only 4.5 minutes, 126× faster than EMP (9.4 hours). (2) The end-to-end execution time of the typical vertical scenario query is reduced by 1991× from the state-of-the-art semi-honest collaborative analysis framework Secrecy (rewritten with Additive Secret Sharing protocol), with appropriate security tradeoffs. (3) We test the system in the WAN setting with input size = 10^7 to demonstrate the scalability. We have successfully deployed SCQL to address problems in real-world business scenarios at Ant Group.
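The abstract refers to additive secret sharing without showing what it looks like. The following is a minimal illustrative sketch of the general technique, not SCQL's actual protocol or API: the ring size `MOD`, the two-party setting, and the helper names `share`, `reconstruct`, `add_shared`, and `secure_sum` are all assumptions made for this example.

```python
import secrets

# Assumption for this sketch: values live in the ring Z_{2^64}.
MOD = 2 ** 64


def share(value, n_parties=2):
    """Split `value` into n additive shares that sum to `value` mod MOD."""
    shares = [secrets.randbelow(MOD) for _ in range(n_parties - 1)]
    last = (value - sum(shares)) % MOD
    return shares + [last]


def reconstruct(shares):
    """Recombine shares; only the sum of all shares reveals the value."""
    return sum(shares) % MOD


def add_shared(a_shares, b_shares):
    """Add two shared values: each party adds its own shares locally."""
    return [(a + b) % MOD for a, b in zip(a_shares, b_shares)]


def secure_sum(column_a, column_b):
    """Hypothetical two-party SUM over the union of two private columns."""
    total = [0, 0]  # a trivial sharing of zero
    for v in list(column_a) + list(column_b):
        total = add_shared(total, share(v))
    return reconstruct(total)
```

Addition needs no communication because each party only touches its own shares. Comparison-heavy operators such as sort and join are where MPC cost concentrates, which is consistent with the abstract's focus on optimizing the secure sort protocol.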


Published In

Proceedings of the VLDB Endowment, Volume 17, Issue 12
August 2024
837 pages
Editors: Meihui Zhang, Cyrus Shahabi

Publisher

VLDB Endowment
