Explainable Machine Learning for Financial Distress Prediction: Evidence from Vietnam
Figure 1. ROC of classifiers. Source: author’s calculation.
Figure 2. The feature importance of XGB and random forest. Source: author’s calculation.
Figure 3. The SHAP dependence plot of a single feature. Source: author’s calculation.
Abstract
1. Introduction
2. Literature Review
2.1. Literature Review on Financial Distress Prediction
2.2. Literature Review on Explanation
3. Methodology
3.1. Data and Data Processing
3.2. Machine Learning Methods to Predict Financial Distress
3.2.1. Logistic Regression
3.2.2. Support Vector Machine
3.2.3. Decision Tree
3.2.4. Random Forest
3.2.5. Extreme Gradient Boosting (XGB)
3.2.6. Artificial Neural Network
3.3. Explainability Methods
3.4. Evaluation of Model Performance
- Accuracy: the proportion of correct classifications in the evaluation data.
- Precision: the proportion of true positives among the predicted positives.
- Sensitivity (recall): the proportion of actual positives that are correctly predicted.
- F1 score: the harmonic mean of precision and recall.
- Receiver operating characteristic (ROC) curve: plots the true positive rate against the false positive rate as the decision threshold varies.
- Area under the ROC curve (AUC): aggregates the performance shown by the ROC curve into a single number. AUC also provides a criterion for evaluating and comparing models: it must exceed 0.5 for a model to be acceptable, and the closer it is to 1, the stronger the model’s predictive power.
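The metrics listed above can be sketched in a few lines of plain Python. This is a minimal illustration, not the authors' implementation: the function names `classification_metrics` and `roc_auc` are hypothetical, labels are assumed to be coded 1 for the positive class and 0 otherwise, and the toy labels and scores are invented for the example. AUC is computed here through its rank interpretation: the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative one, with ties counted as one half.

```python
from typing import Dict, Sequence


def classification_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Accuracy, precision, recall and F1 for binary labels (1 = positive class)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives

    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F1 is the harmonic mean of precision and recall.
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the share of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy data, invented for illustration only.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
scores = [0.9, 0.7, 0.4, 0.6, 0.3, 0.2, 0.2, 0.1]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # 0.5 decision threshold

print(classification_metrics(y_true, y_pred))
print(roc_auc(y_true, scores))
```

Accuracy, precision, recall and F1 depend on the chosen threshold (0.5 here), whereas AUC is computed from the scores alone and is therefore threshold-free.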
4. Results and Discussions
4.1. Prediction Results
4.2. Interpretation Results
5. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
Appendix A
Symbol | Input Features | Category |
---|---|---|
X1 | Cash Ratio | Liquidity risk |
X2 | Quick Ratio | Liquidity risk |
X3 | Current Ratio | Liquidity risk |
X4 | Long-term Debt to Equity | Financial risk |
X5 | Long-term Debt to Total Assets | Financial risk |
X6 | Total Liabilities to Equity | Financial risk |
X7 | Total Liabilities to Total Assets | Financial risk |
X8 | Short-term Debt to Equity | Financial risk |
X9 | Short-term Debt to Total Assets | Financial risk |
X10 | Accounts Payable to Equity | Business risk |
X11 | Accounts Payable to Total Assets | Business risk |
X12 | Total Assets to Total Liabilities | Business risk |
X13 | EBITDA to Short-term Debt and Interest | Business risk |
X14 | Price to Earnings | Market factor |
X15 | Diluted Price to Earnings | Market factor |
X16 | Price to Book Value | Market factor |
X17 | Price to Sales | Market factor |
X18 | Price to Tangible Book Value | Market factor |
X19 | Market Capital | Market factor |
X20 | Price to Cash Flow | Market factor |
X21 | Enterprise Value | Valuation |
X22 | Enterprise Value to Revenues | Valuation |
X23 | Enterprise Value to EBITDA | Valuation |
X24 | Enterprise Value to EBIT | Valuation |
X25 | Diluted EPS | Valuation |
No. | Algorithm | Hyperparameters | AUC | Accuracy | Precision | Recall | F1 Score |
---|---|---|---|---|---|---|---|
1 | Extreme Gradient Boosting | booster = “gbtree”, n_estimator = 100, max_depth = 1, random_state = 42 | 0.9702 | 0.9566 | 0.8726 | 0.8354 | 0.8536 |
2 | Random Forest | max_depth = 14, n_estimators = 100, random_state = 42 | 0.9788 | 0.9529 | 0.8535 | 0.8272 | 0.8401 |
3 | Logistic Regression | random_state = 42 | 0.9303 | 0.8623 | 0.8854 | 0.5148 | 0.6511 |
4 | Artificial Neural Network | n_hidden = 2, max_iter = 200, activations = relu, Optimizer = adam | 0.9034 | 0.9168 | 0.8025 | 0.6811 | 0.7368 |
5 | Decision Trees | Criterion = “gini”, max_depth = 14, random_state = 42 | 0.8848 | 0.9251 | 0.828 | 0.7065 | 0.7625 |
6 | Support Vector Machine | Kernel = “rbf”, probability = True, class_weight = “balanced”, random_state = 42 | 0.7889 | 0.8789 | 0.9427 | 0.4022 | 0.5815 |
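Because the table reports several metrics, the ordering of the models depends on which one is used. The sketch below ranks the six classifiers by a chosen metric. The figures are copied from the table above; the dictionary `RESULTS` and the helper `rank_models` are hypothetical names introduced only for this illustration.

```python
from typing import Dict, List

# Metric values copied from the results table above.
RESULTS: Dict[str, Dict[str, float]] = {
    "Extreme Gradient Boosting": {"auc": 0.9702, "accuracy": 0.9566, "precision": 0.8726, "recall": 0.8354, "f1": 0.8536},
    "Random Forest": {"auc": 0.9788, "accuracy": 0.9529, "precision": 0.8535, "recall": 0.8272, "f1": 0.8401},
    "Logistic Regression": {"auc": 0.9303, "accuracy": 0.8623, "precision": 0.8854, "recall": 0.5148, "f1": 0.6511},
    "Artificial Neural Network": {"auc": 0.9034, "accuracy": 0.9168, "precision": 0.8025, "recall": 0.6811, "f1": 0.7368},
    "Decision Trees": {"auc": 0.8848, "accuracy": 0.9251, "precision": 0.828, "recall": 0.7065, "f1": 0.7625},
    "Support Vector Machine": {"auc": 0.7889, "accuracy": 0.8789, "precision": 0.9427, "recall": 0.4022, "f1": 0.5815},
}


def rank_models(results: Dict[str, Dict[str, float]], metric: str) -> List[str]:
    """Return model names sorted from best to worst on the given metric."""
    return sorted(results, key=lambda name: results[name][metric], reverse=True)


for metric in ("auc", "accuracy", "precision", "recall", "f1"):
    ranking = rank_models(RESULTS, metric)
    print(f"{metric:>9}: best = {ranking[0]}, worst = {ranking[-1]}")
```

On these figures, Random Forest has the highest AUC, while Extreme Gradient Boosting leads on accuracy, recall and F1. The Support Vector Machine has the highest precision but the lowest recall.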
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
© 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Tran, K.L.; Le, H.A.; Nguyen, T.H.; Nguyen, D.T. Explainable Machine Learning for Financial Distress Prediction: Evidence from Vietnam. Data 2022, 7, 160. https://doi.org/10.3390/data7110160