DOI: 10.1145/3634737.3657024

SiGBDT: Large-Scale Gradient Boosting Decision Tree Training via Function Secret Sharing

Published: 01 July 2024

Abstract

Gradient Boosting Decision Tree (GBDT) is a well-known machine learning model widely used in real-world applications such as online marketing, risk management, fraud detection, and recommendation systems. Because each party's data resources are limited, two data owners may collaborate to jointly train a higher-quality model. As privacy regulations such as HIPAA and GDPR come into force, Privacy-Preserving Machine Learning (PPML) has drawn increasing attention. Recently, a line of work [3--6] has studied function secret sharing (FSS) schemes in the preprocessing model, which significantly speeds up the online stage of secure two-party computation (2PC). Whereas recent privacy-preserving GBDT frameworks mainly focus on improving the performance of a single module (e.g., secure bucket aggregation), we propose SiGBDT, a globally silent two-party GBDT framework built on function secret sharing over a vertically partitioned dataset. During training, we apply FSS schemes to construct efficient modular protocols, including secure bucket aggregation, argmax computation, and a node split approach. In-depth experiments show that SiGBDT consistently outperforms state-of-the-art frameworks: it is at least 3.32× faster in LAN and at least 6.4× faster in WAN settings.
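As a toy illustration of why secret-shared bucket aggregation is cheap in this setting (a minimal sketch, not SiGBDT's actual protocol: the function names are invented here, and the bucket indices are treated as public for simplicity, whereas the paper hides the partition itself via FSS), note that summing additively shared gradients into histogram buckets is a purely linear operation, so each party can do it locally on its shares:

```python
import random

MOD = 1 << 64  # 64-bit ring, a common choice for arithmetic shares in 2PC

def share(x):
    """Split x into two additive shares modulo 2^64."""
    r = random.randrange(MOD)
    return r, (x - r) % MOD

def reconstruct(a, b):
    return (a + b) % MOD

# Per-sample gradients and (publicly known, for this sketch) bucket indices.
grads = [5, 3, 8, 2, 7]
buckets = [0, 1, 0, 1, 1]
NUM_BUCKETS = 2

shares0, shares1 = zip(*(share(g) for g in grads))

def local_bucket_sum(my_shares):
    """Each party runs this locally on its own shares -- no communication."""
    sums = [0] * NUM_BUCKETS
    for b, s in zip(buckets, my_shares):
        sums[b] = (sums[b] + s) % MOD
    return sums

agg0 = local_bucket_sum(shares0)  # party 0's shares of the bucket sums
agg1 = local_bucket_sum(shares1)  # party 1's shares of the bucket sums
result = [reconstruct(a, b) for a, b in zip(agg0, agg1)]
# result == [13, 12]: bucket 0 holds 5+8, bucket 1 holds 3+2+7
```

The expensive parts of the real protocol are exactly what this sketch assumes away: hiding which bucket each sample falls into and comparing the resulting split gains, which is where the FSS-based comparison and argmax protocols come in.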

References

[1]
Mark Abspoel, Daniel Escudero, and Nikolaj Volgushev. 2021. Secure training of decision trees with continuous attributes. Proceedings on Privacy Enhancing Technologies 1 (2021), 167--187.
[2]
Donald Beaver. 1992. Efficient multiparty protocols using circuit randomization. In Advances in Cryptology---CRYPTO'91: Proceedings 11. Springer, 420--432.
[3]
Elette Boyle, Nishanth Chandran, Niv Gilboa, Divya Gupta, Yuval Ishai, Nishant Kumar, and Mayank Rathee. 2021. Function secret sharing for mixed-mode and fixed-point secure computation. In Annual International Conference on the Theory and Applications of Cryptographic Techniques. Springer, 871--900.
[4]
Elette Boyle, Niv Gilboa, and Yuval Ishai. 2015. Function secret sharing. In Annual international conference on the theory and applications of cryptographic techniques. Springer, 337--367.
[5]
Elette Boyle, Niv Gilboa, and Yuval Ishai. 2016. Function secret sharing: Improvements and extensions. In Proceedings of the 2016 ACM SIGSAC Conference on Computer and Communications Security. 1292--1303.
[6]
Elette Boyle, Niv Gilboa, and Yuval Ishai. 2019. Secure computation with preprocessing via function secret sharing. In Theory of Cryptography: 17th International Conference, TCC 2019, Nuremberg, Germany, December 1--5, 2019, Proceedings, Part I 17. Springer, 341--371.
[7]
Leo Breiman, Jerome H. Friedman, Richard A. Olshen, and Charles J. Stone. 1984. CART: Classification and Regression Trees. https://api.semanticscholar.org/CorpusID:59814698
[8]
Ran Canetti. 2000. Security and composition of multiparty cryptographic protocols. Journal of CRYPTOLOGY 13 (2000), 143--202.
[9]
Octavian Catrina and Amitabh Saxena. 2010. Secure computation with fixed-point numbers. In Financial Cryptography and Data Security: 14th International Conference, FC 2010, Tenerife, Canary Islands, January 25--28, 2010, Revised Selected Papers 14. Springer, 35--50.
[10]
Tianqi Chen and Carlos Guestrin. 2016. XGBoost: A scalable tree boosting system. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. 785--794.
[11]
Weijing Chen, Guoqiang Ma, Tao Fan, Yan Kang, Qian Xu, and Qiang Yang. 2021. Secureboost+: A high performance gradient boosting tree framework for large scale vertical federated learning. arXiv preprint arXiv:2110.10927 (2021).
[12]
Kewei Cheng, Tao Fan, Yilun Jin, Yang Liu, Tianjian Chen, Dimitrios Papadopoulos, and Qiang Yang. 2021. Secureboost: A lossless federated learning framework. IEEE Intelligent Systems 36, 6 (2021), 87--98.
[13]
Tianxiang Dai, Yufan Jiang, Yong Li, and Fei Mei. 2024. NodeGuard: A Highly Efficient Two-Party Computation Framework for Training Large-Scale Gradient Boosting Decision Tree. In 2024 IEEE Security and Privacy Workshops (SPW).
[14]
Sebastiaan De Hoogh, Berry Schoenmakers, Ping Chen, and Harm op den Akker. 2014. Practical secure decision tree learning in a teletreatment application. In Financial Cryptography and Data Security: 18th International Conference, FC 2014, Christ Church, Barbados, March 3--7, 2014, Revised Selected Papers 18. Springer, 179--194.
[15]
Daniel Demmler, Thomas Schneider, and Michael Zohner. 2015. ABY---A framework for efficient mixed-protocol secure two-party computation. In NDSS.
[16]
Cynthia Dwork, Aaron Roth, et al. 2014. The algorithmic foundations of differential privacy. Foundations and Trends® in Theoretical Computer Science 9, 3--4 (2014), 211--407.
[17]
Cynthia Dwork, Guy N Rothblum, and Salil Vadhan. 2010. Boosting and differential privacy. In 2010 IEEE 51st Annual Symposium on Foundations of Computer Science. IEEE, 51--60.
[18]
Wenjing Fang, Derun Zhao, Jin Tan, Chaochao Chen, Chaofan Yu, Li Wang, Lei Wang, Jun Zhou, and Benyu Zhang. 2021. Large-scale secure XGB for vertical federated learning. In Proceedings of the 30th ACM International Conference on Information & Knowledge Management. 443--452.
[19]
Sam Fletcher and Md Zahidul Islam. 2017. Differentially private random decision forests using smooth sensitivity. Expert Systems with Applications 100, 78 (2017), 16--31.
[20]
Jerome H Friedman. 2001. Greedy function approximation: A gradient boosting machine. Annals of Statistics (2001), 1189--1232.
[21]
Chong Fu, Xuhong Zhang, Shouling Ji, Jinyin Chen, Jingzheng Wu, Shanqing Guo, Jun Zhou, Alex X Liu, and Ting Wang. 2022. Label inference attacks against vertical federated learning. In 31st USENIX Security Symposium (USENIX Security 22). 1397--1414.
[22]
Fangcheng Fu, Yingxia Shao, Lele Yu, Jiawei Jiang, Huanran Xue, Yangyu Tao, and Bin Cui. 2021. Vf2boost: Very fast vertical federated gradient boosting for cross-enterprise learning. In Proceedings of the 2021 International Conference on Management of Data. 563--576.
[23]
Jonas Geiping, Hartmut Bauermeister, Hannah Dröge, and Michael Moeller. 2020. Inverting gradients-how easy is it to break privacy in federated learning? Advances in Neural Information Processing Systems 33 (2020), 16937--16947.
[24]
Oded Goldreich. 2004. Foundations of cryptography: volume 2, basic applications. Cambridge university press.
[25]
Robert E Goldschmidt. 1964. Applications of division by convergence. Ph. D. Dissertation. Massachusetts Institute of Technology.
[26]
Meng Hao, Hongwei Li, Hanxiao Chen, Pengzhi Xing, and Tianwei Zhang. 2023. FastSecNet: An Efficient Cryptographic Framework for Private Neural Network Inference. IEEE Transactions on Information Forensics and Security 18 (2023), 2569--2582.
[27]
Yangsibo Huang, Samyak Gupta, Zhao Song, Kai Li, and Sanjeev Arora. 2021. Evaluating gradient inversion attacks and defenses in federated learning. Advances in Neural Information Processing Systems 34 (2021), 7232--7241.
[28]
Neha Jawalkar, Kanav Gupta, Arkaprava Basu, Nishanth Chandran, Divya Gupta, and Rahul Sharma. 2023. Orca: FSS-based Secure Training with GPUs. Cryptology ePrint Archive (2023).
[29]
Wen-jie Lu, Zhicong Huang, Qizhi Zhang, Yuchen Wang, and Cheng Hong. 2023. Squirrel: A Scalable Secure Two-Party Computation Framework for Training Gradient Boosting Decision Tree. In 32nd USENIX Security Symposium (USENIX Security 23). USENIX Association, Anaheim, CA, 6435--6451. https://www.usenix.org/conference/usenixsecurity23/presentation/lu
[30]
Xiao Jin, Pin-Yu Chen, Chia-Yi Hsu, Chia-Mu Yu, and Tianyi Chen. 2021. Cafe: Catastrophic data leakage in vertical federated learning. Advances in Neural Information Processing Systems 34 (2021), 994--1006.
[31]
Guolin Ke, Qi Meng, Thomas Finley, Taifeng Wang, Wei Chen, Weidong Ma, Qiwei Ye, and Tie-Yan Liu. 2017. Lightgbm: A highly efficient gradient boosting decision tree. Advances in neural information processing systems 30 (2017).
[32]
Marcel Keller, Valerio Pastro, and Dragos Rotaru. 2018. Overdrive: making SPDZ great again. In Annual International Conference on the Theory and Applications of Cryptographic Techniques. Springer, 158--189.
[33]
Andrew Law, Chester Leung, Rishabh Poddar, Raluca Ada Popa, Chenyu Shi, Octavian Sima, Chaofan Yu, Xingmeng Zhang, and Wenting Zheng. 2020. Secure collaborative training and inference for xgboost. In Proceedings of the 2020 workshop on privacy-preserving machine learning in practice. 21--26.
[34]
Chester Leung, Andrew Law, and Octavian Sima. 2019. Towards privacy-preserving collaborative gradient boosted decision trees. UC Berkeley (2019).
[35]
Yehuda Lindell. 2017. How to simulate it---a tutorial on the simulation proof technique. Tutorials on the Foundations of Cryptography: Dedicated to Oded Goldreich (2017), 277--346.
[36]
Yehuda Lindell and Benny Pinkas. 2000. Privacy preserving data mining. In Annual International Cryptology Conference. Springer, 36--54.
[37]
Xiaoliang Ling, Weiwei Deng, Chen Gu, Hucheng Zhou, Cui Li, and Feng Sun. 2017. Model ensemble for click prediction in bing search ads. In Proceedings of the 26th international conference on world wide web companion. 689--698.
[38]
Jing Liu, Jamie Cui, and Cen Chen. 2023. Online Efficient Secure Logistic Regression based on Function Secret Sharing. In Proceedings of the 32nd ACM International Conference on Information and Knowledge Management. 1597--1606.
[39]
Junming Ma, Yancheng Zheng, Jun Feng, Derun Zhao, Haoqi Wu, Wenjing Fang, Jin Tan, Chaofan Yu, Benyu Zhang, and Lei Wang. 2023. SecretFlow-SPU: A Performant and User-Friendly Framework for Privacy-Preserving Machine Learning. In 2023 USENIX Annual Technical Conference (USENIX ATC 23). 17--33.
[40]
Yuan Meng, Nianhua Yang, Zhilin Qian, and Gaoyu Zhang. 2020. What makes an online review more helpful: an interpretation framework using XGBoost and SHAP values. Journal of Theoretical and Applied Electronic Commerce Research 16, 3 (2020), 466--490.
[41]
Fan Mo, Hamed Haddadi, Kleomenis Katevas, Eduard Marin, Diego Perino, and Nicolas Kourtellis. 2021. PPFL: privacy-preserving federated learning with trusted execution environments. In Proceedings of the 19th annual international conference on mobile systems, applications, and services. 94--108.
[42]
Payman Mohassel and Yupeng Zhang. 2017. Secureml: A system for scalable privacy-preserving machine learning. In 2017 IEEE symposium on security and privacy (SP). IEEE, 19--38.
[43]
Ole-Edvard Ørebæk and Marius Geitle. 2021. Exploring the Hyperparameters of XGBoost Through 3D Visualizations. In AAAI Spring Symposium: Combining Machine Learning with Knowledge Engineering.
[44]
Benny Pinkas, Thomas Schneider, Nigel Smart, and Stephen Williams. 2009. Secure Two-Party Computation Is Practical. In Advances in Cryptology-ASIACRYPT 2009. Springer Berlin Heidelberg, 250--267.
[45]
Gabriel Rushin, Cody Stancil, Muyang Sun, Stephen Adams, and Peter Beling. 2017. Horse race analysis in credit card fraud---deep learning, logistic regression, and Gradient Boosted Tree. In 2017 systems and information engineering design symposium (SIEDS). IEEE, 117--121.
[46]
Théo Ryffel, Pierre Tholoniat, David Pointcheval, and Francis Bach. 2022. AriaNN: Low-Interaction Privacy-Preserving Deep Learning via Function Secret Sharing. Proceedings on Privacy Enhancing Technologies 1 (2022), 291--316.
[47]
Zeinab Shahbazi and Yung-Cheol Byun. 2019. Product recommendation based on content-based filtering using XGBoost classifier. Int. J. Adv. Sci. Technol 29 (2019), 6979--6988.
[48]
Zhenya Tian, Jialiang Xiao, Haonan Feng, and Yutian Wei. 2020. Credit risk assessment based on gradient boosting decision tree. Procedia Computer Science 174 (2020), 150--160.

Cited By

  • (2024) S-BDT: Distributed Differentially Private Boosted Decision Trees. Proceedings of the 2024 ACM SIGSAC Conference on Computer and Communications Security. DOI: 10.1145/3658644.3690301, 288--302. Online publication date: 2-Dec-2024.


Published In

ASIA CCS '24: Proceedings of the 19th ACM Asia Conference on Computer and Communications Security
July 2024
1987 pages
ISBN:9798400704826
DOI:10.1145/3634737

Publisher

Association for Computing Machinery, New York, NY, United States


Author Tags

  1. multi-party computation
  2. function secret sharing
  3. privacy-preserving machine learning
  4. gradient boosting decision tree

Qualifiers

  • Research-article

Conference

ASIA CCS '24
Acceptance Rates

Overall Acceptance Rate 418 of 2,322 submissions, 18%


Article Metrics

  • Downloads (last 12 months): 197
  • Downloads (last 6 weeks): 29

Reflects downloads up to 27 Dec 2024

