Authors:
Hope Hodges 1; Carolyn Garrity 2 and James Pope 3
Affiliations:
1 Mississippi State University, Starkville, MS, U.S.A.; 2 Stephens College of Business, University of Montevallo, Montevallo, AL, U.S.A.; 3 Intelligent Systems Laboratory, University of Bristol, Bristol, U.K.
Keyword(s):
Deep Learning, Feature Selection, Home Mortgage Disclosure Act, Loan Classification, Financial Technology.
Abstract:
Analysis of home mortgage applications is critical to financial decision-making for commercial and government lending organisations. The Home Mortgage Disclosure Act (HMDA) requires financial organisations to provide data on loan applications, and the Consumer Financial Protection Bureau (CFPB) accordingly publishes loan application data by year. This data can be used to design regression and classification models. However, the volume of data is too large to train on with modest computational resources. To address this, we used reservoir sampling to draw suitable subsets for processing. A second issue is that the number of features is limited to the original 78 features in the HMDA records, although many other data sources and associated features might improve model accuracy. We augmented the HMDA data with ten economic indicator features from an external data source and found that these additional features do not improve the model's accuracy. We designed and compared several classical and recent classification approaches to predict the loan approval decision. We show that the Decision Tree, XGBoost, Random Forest, and Support Vector Machine classifiers achieve between 82% and 85% accuracy, while Naive Bayes yields the lowest accuracy at 79%. A Deep Neural Network classifier had the best classification performance, with an F1 score of almost 89% on the HMDA data. We performed feature selection to determine which features are most important for loan classification. As expected, loan amount and applicant income were important. However, when race and gender were left in the feature set, the machine learning methods also selected them as important features. This highlights the need for diligence in financial systems to ensure that the models are not biased.
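The abstract names reservoir sampling as the way subsets were drawn but does not say which variant was used. As an illustration only, the following minimal sketch assumes the classic single-pass form (often called Algorithm R), which keeps a uniform random sample of k records from a stream whose length is not known in advance. The function name `reservoir_sample` and its parameters are hypothetical and are not taken from the paper.

```python
import random
from typing import Iterable, List, Optional, TypeVar

T = TypeVar("T")


def reservoir_sample(records: Iterable[T], k: int, seed: Optional[int] = None) -> List[T]:
    """Draw up to k records uniformly at random from a stream in one pass.

    Hypothetical helper for illustration (Algorithm R); the paper's actual
    sampling code is not shown in the abstract.
    """
    if k < 0:
        raise ValueError("k must be non-negative")
    rng = random.Random(seed)  # local generator so a seed gives repeatable samples
    reservoir: List[T] = []
    for i, record in enumerate(records):
        if i < k:
            # Fill the reservoir with the first k records.
            reservoir.append(record)
        else:
            # Pick a slot in [0, i]; the new record is kept with probability k / (i + 1).
            j = rng.randint(0, i)
            if j < k:
                reservoir[j] = record
    return reservoir


if __name__ == "__main__":
    # Stand-in stream; in practice this could be rows read lazily from a large CSV file.
    sample = reservoir_sample(range(1_000_000), k=5, seed=0)
    print(sample)
```

Because the stream is consumed one record at a time, memory use is proportional to k rather than to the size of the full dataset, which is the property that makes the approach suitable for modest computational resources.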