Cost-sensitive ensemble methods for bankruptcy prediction in a highly imbalanced data distribution: a real case from the Spanish market

Nazeeh Ghatasheh¹,
Hossam Faris²,
Ruba Abukhurma³,
Pedro A. Castillo⁴ (ORCID: orcid.org/0000-0002-5258-0620),
Nailah Al-Madi⁵,
Antonio M. Mora⁶,
Ala’ M. Al-Zoubi⁷ &
…
Ahmad Hassanat⁸,⁹

Abstract

Bankruptcy has been an issue of interest in the business world for decades. Predicting it is crucial for survival in periods of economic turmoil and recession. Bankruptcy modeling is challenging, however, because of the complexity of the contributing factors and the highly imbalanced distribution of the available data sets. This work aims to improve the predictive power of bankruptcy models by applying cost-sensitive ensemble methods to a real-world Spanish bankruptcy data set. The performance of the resulting prediction models is highly competitive with related research in the field. Cost-sensitive random forests outperformed the other approaches in predicting bankruptcy, achieving a geometric mean of 90.7% with type I and type II errors of 0.094 and 0.088, respectively.
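
As a rough illustration of the metrics quoted in the abstract, the following Python sketch computes the type I error, the type II error, and the geometric mean from confusion-matrix counts. It assumes the common convention that the geometric mean is the square root of sensitivity times specificity, and that type I and type II denote the false-positive and false-negative rates; the abstract defines neither. The function name `error_rates` is hypothetical, and the counts are invented for illustration, not taken from the paper.

```python
import math


def error_rates(tp, fn, fp, tn):
    """Return (type_i, type_ii, g_mean) from confusion-matrix counts.

    Assumption: type I is the false-positive rate and type II is the
    false-negative rate. The geometric mean is symmetric in the two,
    so swapping the labels would not change it.
    """
    type_i = fp / (fp + tn)    # healthy firms flagged as bankrupt
    type_ii = fn / (fn + tp)   # bankrupt firms missed by the model
    g_mean = math.sqrt((1 - type_i) * (1 - type_ii))
    return type_i, type_ii, g_mean


# Hypothetical, heavily imbalanced test set: 50 bankrupt vs. 1000 healthy firms.
type_i, type_ii, g_mean = error_rates(tp=45, fn=5, fp=100, tn=900)
print(f"type I = {type_i:.3f}, type II = {type_ii:.3f}, G-mean = {g_mean:.3f}")
```

Because the geometric mean multiplies the two per-class rates, a model that never predicts the rare bankrupt class scores zero on it even when its overall accuracy is high, which makes it a natural summary measure for imbalanced data such as this.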

Notes

Bought from http://infotel.es.

Acknowledgements

This work has been supported in part by Ministerio español de Economía y Competitividad under Project TIN2017-85727-C4-2-P (UGR-DeepBio), SPIP2017-02116 and TEC2015-68752 (also funded by FEDER), as well as Project B-TIC-402-UGR18 (FEDER and Junta de Andalucía) and RTI2018-102002-A-I00 (Ministerio español de Ciencia, Innovación y Universidades).

Author information

Authors and Affiliations

Information Technology Department, Faculty of Information Technology and Systems, The University of Jordan, Aqaba, Jordan
Nazeeh Ghatasheh
Information Technology Department, King Abdullah II School for Information Technology, The University of Jordan, Amman, Jordan
Hossam Faris
King Abdullah II School for Information Technology, The University of Jordan, Amman, Jordan
Ruba Abukhurma
Department of Computer Architecture and Computer Technology, ETSIIT and CITIC, University of Granada, Granada, Spain
Pedro A. Castillo
Data Science Department, Princess Sumaya University for Technology (PSUT), Amman, Jordan
Nailah Al-Madi
Department of Signal Theory, Telematics and Communications, ETSIIT and CITIC, University of Granada, Granada, Spain
Antonio M. Mora
School of Sciences, Technology and Engineering, University of Granada, Granada, Spain
Ala’ M. Al-Zoubi
Department of Information Technology, Mutah University, Karak, Jordan
Ahmad Hassanat
Industrial Innovation and Robotics Center, University of Tabuk, Tabuk, 71491, Saudi Arabia
Ahmad Hassanat

Corresponding author

Correspondence to Pedro A. Castillo.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Ghatasheh, N., Faris, H., Abukhurma, R. et al. Cost-sensitive ensemble methods for bankruptcy prediction in a highly imbalanced data distribution: a real case from the Spanish market. Prog Artif Intell 9, 361–375 (2020). https://doi.org/10.1007/s13748-020-00219-x

Received: 11 November 2019
Accepted: 08 October 2020
Published: 24 October 2020
Issue Date: December 2020
DOI: https://doi.org/10.1007/s13748-020-00219-x
