An Interpretable Automated Machine Learning Credit Risk Model

Gabriel Patron, Diego Leon, Edwin Lopez & German Hernandez

Part of the book series: Communications in Computer and Information Science (CCIS, volume 1274)

Included in the following conference series:

Workshop on Engineering Applications

Abstract

Credit risk prediction is one of the most recurrent problems in the financial industry. While machine learning techniques such as neural networks can achieve high predictive accuracy when applied correctly, their results are not easily interpretable and are therefore difficult to explain and to integrate into financial regulation. Building strong, robust models requires considerable expertise, time, and testing, and as the list of available models grows, so does their complexity. This is why meta-heuristic search and optimization techniques are being developed to automate model building. However, the models they produce are often not easily interpretable either. This work proposes a fast, reproducible pipeline that targets these two needs: solid, comparable model building and reliable interpretability. An automated machine learning process is implemented via Genetic Algorithms to obtain a locally optimal model for our data that is comparable to top Kagglers' performance on the same classification problem; an interpretation engine is then added on top to perform sanity checks on the results and identify the most important drivers of each prediction. This process greatly reduces the time, cost, and barrier to entry of model building while providing the reasons for each prediction, which can be easily contrasted with expert knowledge to check for correctness and extract key insights.
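To make the genetic-algorithm search described in the abstract more concrete, the following is a minimal sketch and not the authors' pipeline. The search space, the function names, and the `toy_fitness` stand-in are all assumptions introduced for illustration; in a real run the fitness would be something like the cross-validated score of a classifier built from the candidate hyperparameters.

```python
import random
from typing import Callable, Dict, Sequence

# Hypothetical search space (an assumption, loosely inspired by the kind of
# tree-ensemble hyperparameters a credit risk model might tune).
SEARCH_SPACE: Dict[str, Sequence] = {
    "criterion": ["gini", "entropy"],
    "max_features": [0.1, 0.3, 0.5, 0.7, 0.9],
    "min_samples_leaf": list(range(1, 21)),
    "min_samples_split": list(range(2, 21)),
}

Genome = Dict[str, object]


def random_genome(rng: random.Random) -> Genome:
    """Sample one candidate configuration uniformly from the search space."""
    return {name: rng.choice(values) for name, values in SEARCH_SPACE.items()}


def crossover(a: Genome, b: Genome, rng: random.Random) -> Genome:
    """Uniform crossover: each gene is inherited from one parent at random."""
    return {name: rng.choice([a[name], b[name]]) for name in SEARCH_SPACE}


def mutate(genome: Genome, rate: float, rng: random.Random) -> Genome:
    """Resample each gene independently with probability `rate`."""
    return {
        name: rng.choice(SEARCH_SPACE[name]) if rng.random() < rate else value
        for name, value in genome.items()
    }


def genetic_search(
    fitness: Callable[[Genome], float],
    pop_size: int = 20,
    generations: int = 15,
    elite: int = 2,
    mutation_rate: float = 0.2,
    seed: int = 0,
) -> Genome:
    """Return the fittest configuration found; higher fitness is better."""
    rng = random.Random(seed)
    population = [random_genome(rng) for _ in range(pop_size)]
    for _ in range(generations):
        ranked = sorted(population, key=fitness, reverse=True)
        parents = ranked[: pop_size // 2]  # truncation selection
        children = ranked[:elite]          # elitism keeps the current best
        while len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            children.append(mutate(crossover(a, b, rng), mutation_rate, rng))
        population = children
    return max(population, key=fitness)


def toy_fitness(genome: Genome) -> float:
    """Stand-in score (assumption); replace with a real validation metric."""
    score = -abs(genome["min_samples_leaf"] - 8) / 20
    score -= abs(genome["max_features"] - 0.3)
    score += 0.1 if genome["criterion"] == "entropy" else 0.0
    return score


best = genetic_search(toy_fitness)
print(best)
```

Because the top `elite` candidates are copied unchanged into each new generation, the best score never decreases from one generation to the next. The result is still only a locally optimal configuration, which matches the hedged wording of the abstract.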

Author information

Authors and Affiliations

Universidad Nacional de Colombia, Bogotá, Colombia
Gabriel Patron, Edwin Lopez & German Hernandez
Universidad Externado de Colombia, Bogotá, Colombia
Diego Leon

Corresponding author

Correspondence to Diego Leon.

Editor information

Editors and Affiliations

Universidad Distrital Francisco José de Caldas, Bogotá, Colombia
Juan Carlos Figueroa-García
Infantry School of the National Colombian Army, Bogotá, Colombia
Fabián Steven Garay-Rairán
National University of Colombia, Bogotá, Colombia
Germán Jairo Hernández-Pérez
Corporación Unificada Nacional CUN, Bogotá, Colombia
Yesid Díaz-Gutierrez

4 Appendix: Hyperparameter Values

All these modeling hyperparameters come in the form of Python code for immediate reproducibility.

4.1 Random Forest with Boosting

from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier

# Random forest component
mod_rf = RandomForestClassifier(bootstrap=True, criterion='entropy',
                                max_features=0.1, min_samples_leaf=8,
                                min_samples_split=15, n_estimators=100)

# Gradient boosting component
mod_gb = GradientBoostingClassifier(learning_rate=0.01, max_depth=8,
                                    max_features=0.7500000000000001,
                                    min_samples_leaf=11, min_samples_split=15,
                                    n_estimators=100,
                                    subsample=0.9000000000000001)

4.2 Random Forest

from sklearn.ensemble import RandomForestClassifier

model_rf = RandomForestClassifier(bootstrap=False, criterion='gini',
                                  max_features=0.3, min_samples_leaf=3,
                                  min_samples_split=15, n_estimators=100)

4.3 XGB Classifier

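import xgboost as xgb

# nthread=1 restricts training to a single thread
# (newer xgboost releases name this parameter n_jobs)
model_xgb = xgb.XGBClassifier(learning_rate=0.001, max_depth=7,
                              min_child_weight=19, n_estimators=100,
                              nthread=1, subsample=0.6500000000000001)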

About this paper

Cite this paper

Patron, G., Leon, D., Lopez, E., Hernandez, G. (2020). An Interpretable Automated Machine Learning Credit Risk Model. In: Figueroa-García, J.C., Garay-Rairán, F.S., Hernández-Pérez, G.J., Díaz-Gutierrez, Y. (eds) Applied Computer Sciences in Engineering. WEA 2020. Communications in Computer and Information Science, vol 1274. Springer, Cham. https://doi.org/10.1007/978-3-030-61834-6_2

DOI: https://doi.org/10.1007/978-3-030-61834-6_2
Published: 08 October 2020
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-61833-9
Online ISBN: 978-3-030-61834-6
eBook Packages: Computer Science, Computer Science (R0)
