Statistics > Machine Learning

arXiv:2304.13761 (stat)

[Submitted on 26 Apr 2023 (v1), last revised 11 May 2023 (this version, v3)]

Title: Enhancing Robustness of Gradient-Boosted Decision Trees through One-Hot Encoding and Regularization

Authors: Shijie Cui, Agus Sudjianto, Aijun Zhang, Runze Li

Abstract: Gradient-boosted decision trees (GBDT) are a widely used and highly effective machine learning approach for tabular data modeling. However, their complex structure may lead to low robustness against small covariate perturbations in unseen data. In this study, we apply one-hot encoding to convert a GBDT model into a linear framework by encoding each tree leaf as one dummy variable. This allows for the use of linear regression techniques, plus a novel risk decomposition for assessing the robustness of a GBDT model against covariate perturbations. We propose to enhance the robustness of GBDT models by refitting their linear regression forms with $L_1$ or $L_2$ regularization. Theoretical results are obtained on the effect of regularization on model performance and robustness. Numerical experiments demonstrate that the proposed regularization approach can enhance the robustness of the one-hot-encoded GBDT models.
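
The leaf-encoding idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the toy ensemble `STUMPS`, its thresholds and leaf values, the data `XS`/`YS`, and all function names are hypothetical, and only the $L_2$ (ridge) refit is shown. Each depth-1 tree contributes two dummy variables (one per leaf), so the ensemble prediction equals a linear model whose coefficients are the leaf values; those coefficients can then be refit with a penalty.

```python
# Hypothetical toy ensemble: each tree is a stump (threshold, left_value, right_value).
# In practice the trees would come from a fitted GBDT model.
STUMPS = [(0.5, -1.0, 1.0), (0.25, -0.5, 0.2), (0.75, -0.1, 0.6)]

# Hypothetical training data on a grid in [0, 1].
XS = [i / 10 for i in range(11)]
YS = [2 * x - 1 for x in XS]


def gbdt_predict(stumps, x):
    """Ensemble prediction: sum of the leaf value selected in every tree."""
    return sum(left if x <= t else right for t, left, right in stumps)


def one_hot_leaves(stumps, x):
    """Encode x as one dummy variable per leaf (two per stump)."""
    row = []
    for t, _, _ in stumps:
        row.extend([1.0, 0.0] if x <= t else [0.0, 1.0])
    return row


def leaf_coefficients(stumps):
    """Leaf values laid out in the same order as the dummy variables."""
    coefs = []
    for _, left, right in stumps:
        coefs.extend([left, right])
    return coefs


def linear_predict(row, coefs):
    """Linear form of the ensemble: inner product of dummies and coefficients."""
    return sum(z * b for z, b in zip(row, coefs))


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x


def ridge_refit(Z, y, lam):
    """Refit leaf coefficients by solving (Z^T Z + lam * I) beta = Z^T y."""
    p = len(Z[0])
    gram = [[sum(r[i] * r[j] for r in Z) + (lam if i == j else 0.0)
             for j in range(p)] for i in range(p)]
    rhs = [sum(r[i] * yi for r, yi in zip(Z, y)) for i in range(p)]
    return solve(gram, rhs)


if __name__ == "__main__":
    Z = [one_hot_leaves(STUMPS, x) for x in XS]
    for lam in (0.1, 1.0, 10.0):
        beta = ridge_refit(Z, YS, lam)
        print(lam, sum(b * b for b in beta))  # squared norm of refit coefficients
```

In this sketch the dummy variables of each tree sum to one, so the unpenalized normal equations are singular; a positive `lam` makes the system solvable, and a larger `lam` shrinks the refit leaf coefficients toward zero.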

Subjects:	Machine Learning (stat.ML); Machine Learning (cs.LG)
Cite as:	arXiv:2304.13761 [stat.ML]
	(or arXiv:2304.13761v3 [stat.ML] for this version)
	https://doi.org/10.48550/arXiv.2304.13761

Submission history

From: Shijie Cui
[v1] Wed, 26 Apr 2023 18:04:16 UTC (716 KB)
[v2] Fri, 5 May 2023 04:03:21 UTC (720 KB)
[v3] Thu, 11 May 2023 15:47:17 UTC (720 KB)
