Abstract
Credit risk prediction is one of the most recurrent problems in the financial industry. Machine learning techniques such as neural networks can achieve high predictive accuracy when applied well, but their results are not easily interpretable, which makes them difficult to explain and to integrate into financial regulation. Building strong, robust models requires a high degree of expertise, time and testing, and as the list of available models grows, so does their complexity. Meta-heuristic search and optimization techniques are being developed to automate this task, but the models they produce are often just as hard to interpret. This work proposes a fast, reproducible pipeline that targets these two needs: solid, comparable model-building and reliable interpretability. An automated machine learning process is implemented via Genetic Algorithms to obtain a locally optimal model for our data, with performance comparable to that of top Kagglers on the same classification problem. An interpretation engine is then added on top to perform sanity checks on the results and to identify the most important drivers of prediction. This process greatly reduces the time, cost and barrier to entry of model-building while providing the reasons behind each prediction, which can be easily contrasted with expert knowledge to check for correctness and to extract key insights.
4 Appendix: Hyperparameter Values
All these modeling hyperparameters come in the form of Python code for immediate reproducibility.
4.1 Random Forest with Boosting
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier

# Random forest component
mod_rf = RandomForestClassifier(bootstrap=True, criterion='entropy',
    max_features=0.1, min_samples_leaf=8, min_samples_split=15,
    n_estimators=100)
# Gradient boosting component
mod_gb = GradientBoostingClassifier(learning_rate=0.01, max_depth=8,
    max_features=0.7500000000000001, min_samples_leaf=11,
    min_samples_split=15, n_estimators=100, subsample=0.9000000000000001)
4.2 Random Forest
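from sklearn.ensemble import RandomForestClassifier

# Random forest without bootstrap sampling, using Gini impurity for splits
model_rf = RandomForestClassifier(bootstrap=False, criterion='gini',
    max_features=0.3, min_samples_leaf=3, min_samples_split=15,
    n_estimators=100)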
4.3 XGB Classifier
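import xgboost as xgb

# nthread=1 keeps training single-threaded (newer xgboost releases name this n_jobs)
model_xgb = xgb.XGBClassifier(learning_rate=0.001, max_depth=7,
    min_child_weight=19, n_estimators=100, nthread=1,
    subsample=0.6500000000000001)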
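The configurations listed in this appendix are the kind of result an evolutionary search over hyperparameters returns. As a minimal sketch of that idea, and not the authors' actual pipeline, the following pure-Python genetic algorithm evolves random forest settings through selection, crossover and mutation. The search space, the `toy_fitness` function and every helper name are hypothetical; in a real run the fitness would be a cross-validated score on the credit data rather than the arbitrary stand-in used here.

```python
import random

# Hypothetical search space, loosely modeled on the random forest settings above.
SEARCH_SPACE = {
    "max_features": [round(0.05 * i, 2) for i in range(1, 21)],  # 0.05 .. 1.0
    "min_samples_leaf": list(range(1, 21)),
    "min_samples_split": list(range(2, 21)),
    "bootstrap": [True, False],
    "criterion": ["gini", "entropy"],
}


def toy_fitness(config):
    """Stand-in for a cross-validated score (ASSUMPTION: not a real model).

    It peaks at an arbitrary target configuration so the search has
    something to find; replace it with a real evaluation in practice.
    """
    score = 1.0
    score -= abs(config["max_features"] - 0.3)
    score -= abs(config["min_samples_leaf"] - 3) / 20
    score -= abs(config["min_samples_split"] - 15) / 20
    score -= 0.0 if config["criterion"] == "gini" else 0.1
    score -= 0.0 if not config["bootstrap"] else 0.1
    return score


def random_config(rng):
    # Draw one value per hyperparameter from its allowed values.
    return {name: rng.choice(values) for name, values in SEARCH_SPACE.items()}


def crossover(parent_a, parent_b, rng):
    # Uniform crossover: each gene comes from one parent at random.
    return {name: rng.choice([parent_a[name], parent_b[name]]) for name in SEARCH_SPACE}


def mutate(config, rng, rate=0.2):
    # Resample each gene from its allowed values with probability `rate`.
    return {
        name: rng.choice(SEARCH_SPACE[name]) if rng.random() < rate else value
        for name, value in config.items()
    }


def genetic_search(fitness, generations=30, population_size=20, elite=4, seed=0):
    rng = random.Random(seed)
    population = [random_config(rng) for _ in range(population_size)]
    history = []  # best fitness seen at the start of each generation
    for _ in range(generations):
        ranked = sorted(population, key=fitness, reverse=True)
        history.append(fitness(ranked[0]))
        parents = ranked[:elite]  # elitism: the best configs survive unchanged
        children = [
            mutate(crossover(*rng.sample(parents, 2), rng), rng)
            for _ in range(population_size - elite)
        ]
        population = parents + children
    best = max(population, key=fitness)
    return best, history


best_config, best_per_generation = genetic_search(toy_fitness)
print(best_config, round(toy_fitness(best_config), 3))
```

Because the elite configurations are carried over unchanged, the best fitness never decreases from one generation to the next. Like any genetic search, the sketch returns a locally good configuration and does not guarantee the global optimum.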
Copyright information
© 2020 Springer Nature Switzerland AG
Cite this paper
Patron, G., Leon, D., Lopez, E., Hernandez, G. (2020). An Interpretable Automated Machine Learning Credit Risk Model. In: Figueroa-García, J.C., Garay-Rairán, F.S., Hernández-Pérez, G.J., Díaz-Gutierrez, Y. (eds) Applied Computer Sciences in Engineering. WEA 2020. Communications in Computer and Information Science, vol 1274. Springer, Cham. https://doi.org/10.1007/978-3-030-61834-6_2
DOI: https://doi.org/10.1007/978-3-030-61834-6_2
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-61833-9
Online ISBN: 978-3-030-61834-6
eBook Packages: Computer Science, Computer Science (R0)