DOI: 10.1145/3634737.3657024

SiGBDT: Large-Scale Gradient Boosting Decision Tree Training via Function Secret Sharing

Published: 01 July 2024

Abstract

Gradient Boosting Decision Tree (GBDT) is a well-known machine learning model widely used in real-world applications such as online marketing, risk management, fraud detection, and recommendation systems. Because each party's data resources are limited, two data owners may collaborate to jointly train a higher-quality model. As privacy regulations such as HIPAA and GDPR come into force, Privacy-Preserving Machine Learning (PPML) has drawn increasing attention. Recently, a line of work [3--6] has studied function secret sharing (FSS) schemes in the preprocessing model, which significantly speeds up the online stage of secure two-party computation (2PC). Whereas recent privacy-preserving GBDT frameworks mainly focus on improving the performance of a single module (e.g., secure bucket aggregation), we propose SiGBDT, a globally silent two-party GBDT framework built on function secret sharing over a vertically partitioned dataset. During training, we apply FSS schemes to construct efficient modular protocols, including secure bucket aggregation, argmax computation, and a node split approach. In-depth experiments show that SiGBDT consistently outperforms state-of-the-art frameworks: it is at least 3.32× faster in LAN and at least 6.4× faster in WAN settings.
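As a toy illustration of why secret-shared bucket aggregation is cheap in this setting (a minimal sketch, not SiGBDT's actual protocol: the function names are invented here, and the bucket indices are treated as public for simplicity, whereas the paper hides the partition itself via FSS), note that summing additively shared gradients into histogram buckets is a purely linear operation, so each party can do it locally on its shares:

```python
import random

MOD = 1 << 64  # 64-bit ring, a common choice for arithmetic shares in 2PC

def share(x):
    """Split x into two additive shares modulo 2^64."""
    r = random.randrange(MOD)
    return r, (x - r) % MOD

def reconstruct(a, b):
    return (a + b) % MOD

# Per-sample gradients and (publicly known, for this sketch) bucket indices.
grads = [5, 3, 8, 2, 7]
buckets = [0, 1, 0, 1, 1]
NUM_BUCKETS = 2

shares0, shares1 = zip(*(share(g) for g in grads))

def local_bucket_sum(my_shares):
    """Each party runs this locally on its own shares -- no communication."""
    sums = [0] * NUM_BUCKETS
    for b, s in zip(buckets, my_shares):
        sums[b] = (sums[b] + s) % MOD
    return sums

agg0 = local_bucket_sum(shares0)  # party 0's shares of the bucket sums
agg1 = local_bucket_sum(shares1)  # party 1's shares of the bucket sums
result = [reconstruct(a, b) for a, b in zip(agg0, agg1)]
# result == [13, 12]: bucket 0 holds 5+8, bucket 1 holds 3+2+7
```

The expensive parts of the real protocol are exactly what this sketch assumes away: hiding which bucket each sample falls into and comparing the resulting split gains, which is where the FSS-based comparison and argmax protocols come in.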

References

[1]
Mark Abspoel, Daniel Escudero, and Nikolaj Volgushev. 2021. Secure training of decision trees with continuous attributes. Proceedings on Privacy Enhancing Technologies 1 (2021), 167--187.
[2]
Donald Beaver. 1992. Efficient multiparty protocols using circuit randomization. In Advances in Cryptology---CRYPTO'91: Proceedings 11. Springer, 420--432.
[3]
Elette Boyle, Nishanth Chandran, Niv Gilboa, Divya Gupta, Yuval Ishai, Nishant Kumar, and Mayank Rathee. 2021. Function secret sharing for mixed-mode and fixed-point secure computation. In Annual International Conference on the Theory and Applications of Cryptographic Techniques. Springer, 871--900.
[4]
Elette Boyle, Niv Gilboa, and Yuval Ishai. 2015. Function secret sharing. In Annual international conference on the theory and applications of cryptographic techniques. Springer, 337--367.
[5]
Elette Boyle, Niv Gilboa, and Yuval Ishai. 2016. Function secret sharing: Improvements and extensions. In Proceedings of the 2016 ACM SIGSAC Conference on Computer and Communications Security. 1292--1303.
[6]
Elette Boyle, Niv Gilboa, and Yuval Ishai. 2019. Secure computation with preprocessing via function secret sharing. In Theory of Cryptography: 17th International Conference, TCC 2019, Nuremberg, Germany, December 1--5, 2019, Proceedings, Part I 17. Springer, 341--371.
[7]
Leo Breiman, Jerome H. Friedman, Richard A. Olshen, and Charles J. Stone. 1984. CART: Classification and Regression Trees. https://api.semanticscholar.org/CorpusID:59814698
[8]
Ran Canetti. 2000. Security and composition of multiparty cryptographic protocols. Journal of CRYPTOLOGY 13 (2000), 143--202.
[9]
Octavian Catrina and Amitabh Saxena. 2010. Secure computation with fixed-point numbers. In Financial Cryptography and Data Security: 14th International Conference, FC 2010, Tenerife, Canary Islands, January 25--28, 2010, Revised Selected Papers 14. Springer, 35--50.
[10]
Tianqi Chen and Carlos Guestrin. 2016. XGBoost: A scalable tree boosting system. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. 785--794.
[11]
Weijing Chen, Guoqiang Ma, Tao Fan, Yan Kang, Qian Xu, and Qiang Yang. 2021. Secureboost+: A high performance gradient boosting tree framework for large scale vertical federated learning. arXiv preprint arXiv:2110.10927 (2021).
[12]
Kewei Cheng, Tao Fan, Yilun Jin, Yang Liu, Tianjian Chen, Dimitrios Papadopoulos, and Qiang Yang. 2021. Secureboost: A lossless federated learning framework. IEEE Intelligent Systems 36, 6 (2021), 87--98.
[13]
Tianxiang Dai, Yufan Jiang, Yong Li, and Fei Mei. 2024. NodeGuard: A Highly Efficient Two-Party Computation Framework for Training Large-Scale Gradient Boosting Decision Tree. In 2024 IEEE Security and Privacy Workshops (SPW).
[14]
Sebastiaan De Hoogh, Berry Schoenmakers, Ping Chen, and Harm op den Akker. 2014. Practical secure decision tree learning in a teletreatment application. In Financial Cryptography and Data Security: 18th International Conference, FC 2014, Christ Church, Barbados, March 3--7, 2014, Revised Selected Papers 18. Springer, 179--194.
[15]
Daniel Demmler, Thomas Schneider, and Michael Zohner. 2015. ABY---A framework for efficient mixed-protocol secure two-party computation. In NDSS.
[16]
Cynthia Dwork, Aaron Roth, et al. 2014. The algorithmic foundations of differential privacy. Foundations and Trends® in Theoretical Computer Science 9, 3--4 (2014), 211--407.
[17]
Cynthia Dwork, Guy N Rothblum, and Salil Vadhan. 2010. Boosting and differential privacy. In 2010 IEEE 51st Annual Symposium on Foundations of Computer Science. IEEE, 51--60.
[18]
Wenjing Fang, Derun Zhao, Jin Tan, Chaochao Chen, Chaofan Yu, Li Wang, Lei Wang, Jun Zhou, and Benyu Zhang. 2021. Large-scale secure XGB for vertical federated learning. In Proceedings of the 30th ACM International Conference on Information & Knowledge Management. 443--452.
[19]
Sam Fletcher and Md Zahidul Islam. 2017. Differentially private random decision forests using smooth sensitivity. Expert Systems with Applications 100, 78 (2017), 16--31.
[20]
Jerome H Friedman. 2001. Greedy function approximation: A gradient boosting machine. Annals of Statistics (2001), 1189--1232.
[21]
Chong Fu, Xuhong Zhang, Shouling Ji, Jinyin Chen, Jingzheng Wu, Shanqing Guo, Jun Zhou, Alex X Liu, and Ting Wang. 2022. Label inference attacks against vertical federated learning. In 31st USENIX Security Symposium (USENIX Security 22). 1397--1414.
[22]
Fangcheng Fu, Yingxia Shao, Lele Yu, Jiawei Jiang, Huanran Xue, Yangyu Tao, and Bin Cui. 2021. Vf2boost: Very fast vertical federated gradient boosting for cross-enterprise learning. In Proceedings of the 2021 International Conference on Management of Data. 563--576.
[23]
Jonas Geiping, Hartmut Bauermeister, Hannah Dröge, and Michael Moeller. 2020. Inverting gradients-how easy is it to break privacy in federated learning? Advances in Neural Information Processing Systems 33 (2020), 16937--16947.
[24]
Oded Goldreich. 2004. Foundations of cryptography: volume 2, basic applications. Cambridge university press.
[25]
Robert E Goldschmidt. 1964. Applications of division by convergence. Ph. D. Dissertation. Massachusetts Institute of Technology.
[26]
Meng Hao, Hongwei Li, Hanxiao Chen, Pengzhi Xing, and Tianwei Zhang. 2023. FastSecNet: An Efficient Cryptographic Framework for Private Neural Network Inference. IEEE Transactions on Information Forensics and Security 18 (2023), 2569--2582.
[27]
Yangsibo Huang, Samyak Gupta, Zhao Song, Kai Li, and Sanjeev Arora. 2021. Evaluating gradient inversion attacks and defenses in federated learning. Advances in Neural Information Processing Systems 34 (2021), 7232--7241.
[28]
Neha Jawalkar, Kanav Gupta, Arkaprava Basu, Nishanth Chandran, Divya Gupta, and Rahul Sharma. 2023. Orca: FSS-based Secure Training with GPUs. Cryptology ePrint Archive (2023).
[29]
Wen-jie Lu, Zhicong Huang, Qizhi Zhang, Yuchen Wang, and Cheng Hong. 2023. Squirrel: A Scalable Secure Two-Party Computation Framework for Training Gradient Boosting Decision Tree. In 32nd USENIX Security Symposium (USENIX Security 23). USENIX Association, Anaheim, CA, 6435--6451. https://www.usenix.org/conference/usenixsecurity23/presentation/lu
[30]
Xiao Jin, Pin-Yu Chen, Chia-Yi Hsu, Chia-Mu Yu, and Tianyi Chen. 2021. Cafe: Catastrophic data leakage in vertical federated learning. Advances in Neural Information Processing Systems 34 (2021), 994--1006.
[31]
Guolin Ke, Qi Meng, Thomas Finley, Taifeng Wang, Wei Chen, Weidong Ma, Qiwei Ye, and Tie-Yan Liu. 2017. Lightgbm: A highly efficient gradient boosting decision tree. Advances in neural information processing systems 30 (2017).
[32]
Marcel Keller, Valerio Pastro, and Dragos Rotaru. 2018. Overdrive: making SPDZ great again. In Annual International Conference on the Theory and Applications of Cryptographic Techniques. Springer, 158--189.
[33]
Andrew Law, Chester Leung, Rishabh Poddar, Raluca Ada Popa, Chenyu Shi, Octavian Sima, Chaofan Yu, Xingmeng Zhang, and Wenting Zheng. 2020. Secure collaborative training and inference for xgboost. In Proceedings of the 2020 workshop on privacy-preserving machine learning in practice. 21--26.
[34]
Chester Leung, Andrew Law, and Octavian Sima. 2019. Towards privacy-preserving collaborative gradient boosted decision trees. UC Berkeley (2019).
[35]
Yehuda Lindell. 2017. How to simulate it---a tutorial on the simulation proof technique. Tutorials on the Foundations of Cryptography: Dedicated to Oded Goldreich (2017), 277--346.
[36]
Yehuda Lindell and Benny Pinkas. 2000. Privacy preserving data mining. In Annual International Cryptology Conference. Springer, 36--54.
[37]
Xiaoliang Ling, Weiwei Deng, Chen Gu, Hucheng Zhou, Cui Li, and Feng Sun. 2017. Model ensemble for click prediction in bing search ads. In Proceedings of the 26th international conference on world wide web companion. 689--698.
[38]
Jing Liu, Jamie Cui, and Cen Chen. 2023. Online Efficient Secure Logistic Regression based on Function Secret Sharing. In Proceedings of the 32nd ACM International Conference on Information and Knowledge Management. 1597--1606.
[39]
Junming Ma, Yancheng Zheng, Jun Feng, Derun Zhao, Haoqi Wu, Wenjing Fang, Jin Tan, Chaofan Yu, Benyu Zhang, and Lei Wang. 2023. SecretFlow-SPU: A Performant and User-Friendly Framework for Privacy-Preserving Machine Learning. In 2023 USENIX Annual Technical Conference (USENIX ATC 23). 17--33.
[40]
Yuan Meng, Nianhua Yang, Zhilin Qian, and Gaoyu Zhang. 2020. What makes an online review more helpful: an interpretation framework using XGBoost and SHAP values. Journal of Theoretical and Applied Electronic Commerce Research 16, 3 (2020), 466--490.
[41]
Fan Mo, Hamed Haddadi, Kleomenis Katevas, Eduard Marin, Diego Perino, and Nicolas Kourtellis. 2021. PPFL: privacy-preserving federated learning with trusted execution environments. In Proceedings of the 19th annual international conference on mobile systems, applications, and services. 94--108.
[42]
Payman Mohassel and Yupeng Zhang. 2017. Secureml: A system for scalable privacy-preserving machine learning. In 2017 IEEE symposium on security and privacy (SP). IEEE, 19--38.
[43]
Ole-Edvard Ørebæk and Marius Geitle. 2021. Exploring the Hyperparameters of XGBoost Through 3D Visualizations. In AAAI Spring Symposium: Combining Machine Learning with Knowledge Engineering.
[44]
Benny Pinkas, Thomas Schneider, Nigel Smart, and Stephen Williams. 2009. Secure Two-Party Computation Is Practical. In Advances in Cryptology-ASIACRYPT 2009. Springer Berlin Heidelberg, 250--267.
[45]
Gabriel Rushin, Cody Stancil, Muyang Sun, Stephen Adams, and Peter Beling. 2017. Horse race analysis in credit card fraud---deep learning, logistic regression, and Gradient Boosted Tree. In 2017 systems and information engineering design symposium (SIEDS). IEEE, 117--121.
[46]
Théo Ryffel, Pierre Tholoniat, David Pointcheval, and Francis Bach. 2022. AriaNN: Low-Interaction Privacy-Preserving Deep Learning via Function Secret Sharing. Proceedings on Privacy Enhancing Technologies 1 (2022), 291--316.
[47]
Zeinab Shahbazi and Yung-Cheol Byun. 2019. Product recommendation based on content-based filtering using XGBoost classifier. Int. J. Adv. Sci. Technol 29 (2019), 6979--6988.
[48]
Zhenya Tian, Jialiang Xiao, Haonan Feng, and Yutian Wei. 2020. Credit risk assessment based on gradient boosting decision tree. Procedia Computer Science 174 (2020), 150--160.

Cited By

  • (2024) S-BDT: Distributed Differentially Private Boosted Decision Trees. Proceedings of the 2024 ACM SIGSAC Conference on Computer and Communications Security. DOI: 10.1145/3658644.3690301, 288--302. Online publication date: 2-Dec-2024.


Published In

ASIA CCS '24: Proceedings of the 19th ACM Asia Conference on Computer and Communications Security
July 2024
1987 pages
ISBN:9798400704826
DOI:10.1145/3634737

Publisher

Association for Computing Machinery, New York, NY, United States


Author Tags

  1. multi-party computation
  2. function secret sharing
  3. privacy-preserving machine learning
  4. gradient boosting decision tree

Qualifiers

  • Research-article

Conference

ASIA CCS '24
Acceptance Rates

Overall Acceptance Rate 418 of 2,322 submissions, 18%


Article Metrics

  • Downloads (last 12 months): 197
  • Downloads (last 6 weeks): 29

Reflects downloads up to 27 Dec 2024

