
An Interpretable Automated Machine Learning Credit Risk Model

  • Conference paper
Applied Computer Sciences in Engineering (WEA 2020)

Part of the book series: Communications in Computer and Information Science (CCIS, volume 1274)


Abstract

Credit risk prediction is one of the most recurrent problems in the financial industry. While machine learning techniques such as neural networks can achieve impressive predictive accuracy when properly built, their results are not easily interpretable and are therefore difficult to explain and to integrate into financial regulation. Building strong, robust models requires a high degree of expertise, time and testing, and as the list of available models grows, so does their complexity. Meta-heuristic search and optimization techniques are being developed to tackle this task, but the resulting models are often not easily interpretable either. This work proposes a fast, reproducible pipeline that targets these two salient needs: solid, comparable model building and reliable interpretability. An automated machine learning process is implemented via genetic algorithms to obtain a locally optimal model for our data, with performance comparable to that of top Kagglers on the same classification problem; an interpretation engine is then added on top to perform sanity checks on the results and to identify the most important drivers of each prediction. This process greatly reduces the time, cost and barrier to entry of model building while providing the reasons behind each prediction, which can be contrasted with expert knowledge to check for correctness and to extract key insights.
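
A minimal sketch of the kind of pipeline described above is shown below. It assumes TPOT as the genetic-algorithm AutoML engine and LIME as the interpretation layer, and it uses a synthetic, imbalanced data set in place of the actual credit data; these tool choices and all names in the snippet are illustrative assumptions, not the authors' exact implementation.

# Sketch of the described pipeline, not the authors' exact code:
# TPOT is assumed as the genetic-algorithm AutoML engine and LIME as the
# interpretation engine; a synthetic, imbalanced data set stands in for
# the real credit data.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from tpot import TPOTClassifier
from lime.lime_tabular import LimeTabularExplainer

X, y = make_classification(n_samples=2000, n_features=10,
                           weights=[0.9, 0.1], random_state=0)
feature_names = ["feature_%d" % i for i in range(X.shape[1])]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42)

# 1) Genetic-algorithm search over preprocessing + model pipelines,
#    optimising cross-validated ROC AUC.
automl = TPOTClassifier(generations=10, population_size=50, scoring="roc_auc",
                        cv=5, n_jobs=-1, random_state=42, verbosity=2)
automl.fit(X_train, y_train)
print("Hold-out AUC:", automl.score(X_test, y_test))
automl.export("best_pipeline.py")  # exports the winning pipeline as Python code

# 2) Local explanations for individual credit decisions
#    (assumes the winning pipeline exposes predict_proba).
explainer = LimeTabularExplainer(X_train, feature_names=feature_names,
                                 class_names=["good", "bad"],
                                 mode="classification")
exp = explainer.explain_instance(X_test[0], automl.predict_proba,
                                 num_features=5)
print(exp.as_list())  # top feature contributions for this applicant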



Author information

Corresponding author

Correspondence to Diego Leon.

4 Appendix: Hyperparameter Values

The selected modeling hyperparameters are given below as Python code for immediate reproducibility.
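
The snippets below assume the standard scikit-learn and XGBoost imports, which the original appendix leaves implicit:

# Imports assumed by the snippets below (not listed in the original appendix).
from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier
import xgboost as xgb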

4.1 Random Forest with Boosting

mod_rf = RandomForestClassifier(bootstrap=True, criterion='entropy',
                                max_features=0.1, min_samples_leaf=8,
                                min_samples_split=15, n_estimators=100)

mod_gb = GradientBoostingClassifier(learning_rate=0.01, max_depth=8,
                                    max_features=0.7500000000000001,
                                    min_samples_leaf=11, min_samples_split=15,
                                    n_estimators=100,
                                    subsample=0.9000000000000001)

4.2 Random Forest

model_rf = RandomForestClassifier(bootstrap=False, criterion='gini',
                                  max_features=0.3, min_samples_leaf=3,
                                  min_samples_split=15, n_estimators=100)

4.3 XGB Classifier

model_xgb = xgb.XGBClassifier(learning_rate=0.001, max_depth=7,
                              min_child_weight=19, n_estimators=100,
                              nthread=1, subsample=0.6500000000000001)
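
As a usage sketch (not part of the original appendix), any of these configured models can be fitted and scored in the usual scikit-learn fashion; a synthetic, imbalanced data set stands in for the credit data here, purely for illustration:

from sklearn.datasets import make_classification
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the credit data (illustration only).
X, y = make_classification(n_samples=2000, n_features=10,
                           weights=[0.9, 0.1], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42)

# Fit one of the configured models and evaluate hold-out ROC AUC.
model_xgb.fit(X_train, y_train)
proba = model_xgb.predict_proba(X_test)[:, 1]
print("XGBoost hold-out AUC:", round(roc_auc_score(y_test, proba), 3))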


Copyright information

© 2020 Springer Nature Switzerland AG

About this paper


Cite this paper

Patron, G., Leon, D., Lopez, E., Hernandez, G. (2020). An Interpretable Automated Machine Learning Credit Risk Model. In: Figueroa-García, J.C., Garay-Rairán, F.S., Hernández-Pérez, G.J., Díaz-Gutierrez, Y. (eds) Applied Computer Sciences in Engineering. WEA 2020. Communications in Computer and Information Science, vol 1274. Springer, Cham. https://doi.org/10.1007/978-3-030-61834-6_2


  • DOI: https://doi.org/10.1007/978-3-030-61834-6_2


  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-61833-9

  • Online ISBN: 978-3-030-61834-6

  • eBook Packages: Computer Science, Computer Science (R0)

Publish with us

Policies and ethics