DeepGBM: A Deep Learning Framework Distilled by GBDT for Online Prediction Tasks

Authors:

Tie-Yan Liu

KDD '19: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining

Pages 384 - 394

https://doi.org/10.1145/3292500.3330858

Published: 25 July 2019

Abstract

Online prediction has become one of the most essential tasks in many real-world applications. Typical online prediction tasks have two main characteristics: a tabular input space and online data generation. Specifically, a tabular input space contains both sparse categorical features and dense numerical ones, while online data generation means that the task continuously produces data whose distribution may change over time. Consequently, learning effectively from tabular input and adapting quickly to online data generation are the two vital challenges in obtaining an online prediction model. Although Gradient Boosting Decision Tree (GBDT) and Neural Network (NN) have been widely used in practice, each has its own weaknesses. In particular, GBDT can hardly be adapted to dynamic online data generation, and it tends to be ineffective on sparse categorical features; NN, on the other hand, struggles to achieve satisfactory performance on dense numerical features. In this paper, we propose a new learning framework, DeepGBM, which integrates the advantages of both NN and GBDT by using two corresponding NN components: (1) CatNN, which focuses on handling sparse categorical features; and (2) GBDT2NN, which focuses on dense numerical features using knowledge distilled from GBDT. Powered by these two components, DeepGBM can leverage both categorical and numerical features while retaining the ability to be updated efficiently online. Comprehensive experiments on a variety of publicly available datasets demonstrate that DeepGBM can outperform other well-recognized baselines in various online prediction tasks.
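The abstract does not spell out how knowledge is distilled from GBDT, so the following is only a minimal sketch of the general idea, not the authors' GBDT2NN. It assumes generic output-matching distillation: a fixed tree-like "teacher" on one dense numerical feature is imitated by a small differentiable "student" trained with SGD. The names `teacher_tree`, `Student`, `sgd_step`, and `mse`, as well as the split threshold and learning rate, are hypothetical choices made for illustration.

```python
import math
import random


def teacher_tree(x):
    """Hypothetical stand-in for one trained tree: a single split on x."""
    return 1.0 if x > 0.5 else -1.0


class Student:
    """Tiny differentiable model: out = v * tanh(w * x + b) + c."""

    def __init__(self):
        self.w, self.b, self.v, self.c = 1.0, 0.0, 1.0, 0.0

    def predict(self, x):
        return self.v * math.tanh(self.w * x + self.b) + self.c

    def sgd_step(self, x, target, lr=0.05):
        # One SGD step on the squared error 0.5 * (out - target)^2.
        h = math.tanh(self.w * x + self.b)
        err = self.v * h + self.c - target
        dh = err * self.v * (1.0 - h * h)  # gradient w.r.t. the pre-activation
        self.v -= lr * err * h
        self.c -= lr * err
        self.w -= lr * dh * x
        self.b -= lr * dh


def mse(model, xs, target_fn):
    """Mean squared error between the model and a target function."""
    return sum((model.predict(x) - target_fn(x)) ** 2 for x in xs) / len(xs)


rng = random.Random(0)
xs = [rng.random() for _ in range(200)]  # one dense numerical feature in [0, 1)

student = Student()
mse_before = mse(student, xs, teacher_tree)

# Distillation: the student is trained to match the teacher's outputs.
for _ in range(200):
    for x in xs:
        student.sgd_step(x, teacher_tree(x))

mse_after = mse(student, xs, teacher_tree)
print(f"MSE vs. teacher before: {mse_before:.3f}, after: {mse_after:.3f}")
```

Because the student is an ordinary differentiable model, the same `sgd_step` could in principle be applied to newly arriving labeled examples, which is the kind of online update a fixed tree does not offer directly.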

Index Terms

DeepGBM: A Deep Learning Framework Distilled by GBDT for Online Prediction Tasks
1. Computing methodologies
  1. Machine learning
    1. Machine learning algorithms
      1. Ensemble methods
        Boosting
    2. Machine learning approaches
      1. Classification and regression trees
      2. Neural networks

Information & Contributors

Information

Published In

KDD '19: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining

July 2019

3305 pages

ISBN:9781450362016

DOI:10.1145/3292500

General Chairs: Ankur Teredesai (KenSci), Vipin Kumar (University of Minnesota)

Program Chairs: Ying Li (EV Analysis Corporation), Rómer Rosales (LinkedIn), Evimaria Terzi (Boston University), George Karypis (University of Minnesota)

Copyright © 2019 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article

Conference

KDD '19

KDD '19: The 25th ACM SIGKDD Conference on Knowledge Discovery and Data Mining

August 4 - 8, 2019

Anchorage, AK, USA

Acceptance Rates

KDD '19 Paper Acceptance Rate 110 of 1,200 submissions, 9%;

Overall Acceptance Rate 1,133 of 8,635 submissions, 13%

Bibliometrics & Citations

Article Metrics

Total Citations: 89
Total Downloads: 5,034
Downloads (Last 12 months): 222
Downloads (Last 6 weeks): 25

Reflects downloads up to 14 Dec 2024

Citations

Cited By

Tang Z, Lu T, Li T (2024). Metric-Independent Mitigation of Unpredefined Bias in Machine Classification. Intelligent Computing 3. Online publication date: 8-Apr-2024
https://doi.org/10.34133/icomputing.0083
Fang Z, Li Z, Li M, Yue Z, Li K (2024). Prediction of Protein-DNA Interface Hot Spots Based on Empirical Mode Decomposition and Machine Learning. Genes 15:6 (676). Online publication date: 23-May-2024
https://doi.org/10.3390/genes15060676
Ji J, Kim J, Kim Y (2024). Predicting Missing Values in Survey Data Using Prompt Engineering for Addressing Item Non-Response. Future Internet 16:10 (351). Online publication date: 27-Sep-2024
https://doi.org/10.3390/fi16100351
Du L, Song H, Xu Y, Dai S (2024). An Architecture as an Alternative to Gradient Boosted Decision Trees for Multiple Machine Learning Tasks. Electronics 13:12 (2291). Online publication date: 12-Jun-2024
https://doi.org/10.3390/electronics13122291
Mauer P, Paszkiel S (2024). Tabular Data Models for Predicting Art Auction Results. Applied Sciences 14:23 (11006). Online publication date: 26-Nov-2024
https://doi.org/10.3390/app142311006
Gao M, Chen S, Gao Y, Zhang Z, Chen Y, Li Y, Ye Q, Wang X, Chen Y (2024). Detecting compromised accounts caused by phone number recycling on e-commerce platforms: taking Meituan as an example (电子商务平台 “二次放号” 被盗账号检测研究: 以美团为例). Frontiers of Information Technology & Electronic Engineering 25:8 (1077-1095). Online publication date: 30-Aug-2024
https://doi.org/10.1631/FITEE.2300291
Li P, Chang Y, Wei M, Hsieh H (2024). Estimating Future Financial Development of Urban Areas for Deploying Bank Branches: A Local-Regional Interpretable Model. ACM Transactions on Management Information Systems 15:2 (1-26). Online publication date: 8-Apr-2024
https://dl.acm.org/doi/10.1145/3656479
Chen H, Eldardiry H (2024). Graph Time-series Modeling in Deep Learning: A Survey. ACM Transactions on Knowledge Discovery from Data 18:5 (1-35). Online publication date: 28-Feb-2024
https://dl.acm.org/doi/10.1145/3638534
Borisov V, Leemann T, Seßler K, Haug J, Pawelczyk M, Kasneci G (2024). Deep Neural Networks and Tabular Data: A Survey. IEEE Transactions on Neural Networks and Learning Systems 35:6 (7499-7519). Online publication date: Jun-2024
https://doi.org/10.1109/TNNLS.2022.3229161
Ye M, Yu Y, Shen Z, Yu W, Zeng Q (2024). Cross-Feature Interactive Tabular Data Modeling With Multiplex Graph Neural Networks. IEEE Transactions on Knowledge and Data Engineering 36:12 (7851-7864). Online publication date: Dec-2024
https://doi.org/10.1109/TKDE.2024.3440654
