
Ariza-Garzon (2021) Risk-Return Modelling in The p2p Lending Market - Trends, Gaps, Recommendations and Future Directions


Electronic Commerce Research and Applications 49 (2021) 101079

Contents lists available at ScienceDirect

Electronic Commerce Research and Applications


journal homepage: www.elsevier.com/locate/elerap

Risk-return modelling in the p2p lending market: Trends, gaps, recommendations and future directions
Miller-Janny Ariza-Garzón a, b, *, María-Del-Mar Camacho-Miñano c, María-Jesús Segovia-Vargas d, Javier Arroyo a, e
a Software Engineering and Artificial Intelligence Department, Complutense University of Madrid, 28040 Madrid, Spain
b Faculty of Statistical Studies, Complutense University of Madrid, 28040 Madrid, Spain
c Accounting and Finance Department, Faculty of Business Administration and Economics, Complutense University of Madrid, Campus de Somosaguas, 28223, Pozuelo de Alarcón, Madrid, Spain
d Department of Financial and Actuarial Economics and Statistics, Faculty of Business Administration and Economics, Complutense University of Madrid, 28223 Madrid, Spain
e Institute of Knowledge Technology, Complutense University of Madrid, 28040 Madrid, Spain

ARTICLE INFO

Keywords:
Risk management modeling
Credit risk modeling
Profit and investment modeling
P2P lending
Machine learning
Literature review
Bibliometric analysis

ABSTRACT

Peer-to-peer (P2P) lending is a market with significant growth in recent years. We review the academic literature published during the last decade on P2P lending to identify the main research trends and find potential gaps that limit stakeholders’ use of research proposals. We perform both a bibliometric and systematic analysis. The bibliometric analysis will identify the most influential papers and the relationship and evolution of the main topics. In the systematic analysis, we categorized the documents according to methodological elements and business aspects. Remarkably, many proposals include artificial intelligence or machine learning algorithms. However, many of them lack a proper understanding of the application context, the definition of potential variables in a business framework, explainability, etc. Such elements should be recognized as essential elements to exploit their benefits. In this respect, we provide some recommendations and show future research directions.

1. Introduction

The peer-to-peer (P2P) lending market is based on technological platforms that charge a fee for the service of connecting lenders and borrowers. Borrowers can obtain credit directly from lenders at lower rates and with greater accessibility than conventional credit alternatives, while lenders can obtain higher returns than with other financial products (Emekter et al., 2015). The P2P lending market emerges as an alternative for investment and financing that breaks down some of the intermediation barriers imposed by traditional banking. This market can be considered a collaborative economy agreement (Serrano-Cinca et al., 2015) that even allows the financial inclusion of people excluded from traditional systems (Claessens et al., 2018). It is considered a complementary and non-competitive market to conventional banking (Milne and Parboteeah, 2016), an approach that different governments and regulatory bodies worldwide are working on (ROFIEG, 2019).

The rise of this new market is also associated with factors such as the reduction of intermediation costs, the incipient regulation customized to the business, and the evolution of big data and artificial intelligence tools that complement traditional modeling, allowing the experience to be personalized and the service to be improved (Financial Stability Board, 2017; Giudici et al., 2020). Claessens et al. (2018) differentiate leverage factors by country, market, and regulatory framework and detail some challenges of the new credit markets. Some of the most important risks are information asymmetries (Serrano-Cinca and Gutierrez-Nieto, 2016), adverse selection, and moral hazard (Cummins and Lynn, 2019), and lenders bear credit risk on most platforms (Serrano-Cinca and Gutierrez-Nieto, 2016).

However, the P2P lending market also has inherent risks. For example, in China, the lack of regulation of the P2P market caused a surge in fraud and illegal activity. As a result, this market suffered a regression in China. As Gao et al. (2020) mentioned, even though the P2P market is based on technological innovation, it remains a financial problem that demands proper credit and financial risk management. The

* Corresponding author at: Software Engineering and Artificial Intelligence Department, Complutense University of Madrid, 28040 Madrid, Spain. Tel.: +91 394
2564; fax: +91 394 23 88.
E-mail address: millerar@ucm.es (M.-J. Ariza-Garzón).

https://doi.org/10.1016/j.elerap.2021.101079
Received 24 December 2020; Received in revised form 24 May 2021; Accepted 22 July 2021
Available online 26 July 2021
1567-4223/© 2021 Elsevier B.V. All rights reserved.
M.-J. Ariza-Garzón et al. Electronic Commerce Research and Applications 49 (2021) 101079

collapse of the Chinese P2P lending market contributed to making visible the need for efficient regulatory policies that are customized to the challenges of the new markets and grounded in competent technical expertise, aspects also highlighted in ROFIEG (2019). As it is a unique market, there is a need for customized risk management and regulation; however, it should be unified for the different countries involved, to truly protect investors and the entire market in a globalized context.

Quantitative modeling is considered an important activity for risk management. Nevertheless, it is also vital to maximize investors’ profits and provide quality of service in a technological and globalized environment. Its use can be improved by recognizing factors such as appropriate business knowledge and understanding the associated risk factors, risk stages, and different techniques customized to the specific problems in its financial management. It is also essential to establish a modeling structure that ensures unbiasedness and efficiency and that considers predictive accuracy measures and what they mean in a business context (Xia et al., 2017). It is equally important to have sufficient data, take advantage of various sources with validation and appropriate treatment (Giudici et al., 2020; ROFIEG, 2019), and know a range of powerful prediction techniques with explainability (Ariza-Garzon et al., 2020; Bussmann et al., 2020; ROFIEG, 2019), among other aspects. These provide elements to guarantee its use, control, and trust from regulators and users.

Bearing all these ideas in mind and given the topic’s relevance, we aim to find the main research trends in the literature associated with the different quantitative methodologies used in managing the risk-return pair in the P2P lending market, which includes several of the aspects described above. In other words, our goal is to know what topics have already been studied and addressed in the P2P lending market and what are the biggest challenges and limitations in need of further study. To address these questions, we carry out a bibliometric and traditional systematic analysis of 104 papers from the Web of Science (WoS) bibliographic database, which corresponds to papers on the topic published between 2010 and the first quarter of 2020 (April 15, 2020).

The bibliometric analysis will show the main references, trends, and different developments in the last decade. Our paper has a different approach from prior papers published, such as Bachmann et al. (2011), which is focused mainly on the description of existing platforms, the identification of the determinants associated with the proper management of financing sources, and obtaining returns on investment during the second half of the first decade of this century. In our case, a systematic analysis complements the bibliometric analysis by analyzing several aspects of risk and return management in this market in depth. The criteria used are the business focus of the proposals, the methodological contribution, the definition of the target variables, the type of modeling paradigm used, the quantitative contribution, and the description of the modeling stages. To conclude, we propose some recommendations and identify research gaps, which we recommend be investigated and deepened for adequate risk and profit management and the healthy development of the P2P lending market in our countries.

To achieve the purposes described, we organize the rest of the paper as follows. Section 2 presents the research methodology, with the definition of our research questions and the selection criteria of the papers studied. In section 3, we show the results. First, the bibliometric analysis includes general statistics of the worldwide P2P market, publications by region, indicators of cooperation among authors, and citation and co-occurrence analysis through graphs and cluster analysis of concepts and methodological elements, among other aspects. In the second part of the section, we show the results of a systematic analysis, where we include descriptive criteria, such as the datasets and software used. Furthermore, we characterize and classify the papers by some important aspects of the P2P business: the business problem involved, the data modeling paradigm used, methodological aspects highlighted in each study, and some modeling aspects required by market regulators and supervisors. In section 4, we discuss the results. Finally, in section 5, we summarize the main conclusions and offer some recommendations for future work.

2. Research Methodology

The goal of this study is to provide an overview of the current research on the P2P lending market. To conduct it, we propose three research questions:

1) Research question 1 (RQ1): How has the literature on the P2P lending market evolved in the last ten years? More specifically, what are the countries that are researching it? Which venues are publishing this research? What are the most influential papers?
2) Research question 2 (RQ2): What are the main contents developed in the P2P lending market in the last decade? Datasets? Business problems? Methodology? Data modeling paradigm used? Current major research trends?
3) Research question 3 (RQ3): What are the main research gaps and future research directions in the P2P lending market?

To answer the research questions, we use two tools: a bibliometric analysis and a systematic literature review. More precisely, we use bibliometric analysis to answer RQ1. Part of the bibliometric analysis also helps us to answer RQ2. However, to deepen this question, we carry out a systematic literature review, where we carefully analyze and categorize each paper. To answer RQ3, we integrate the results of the two types of analysis, looking for underdeveloped topics and emerging aspects.

2.1. Research tools

In the following subsections, we present methodological details that support the bibliometric analysis with VOSviewer and how we have used it in our research. We also detail the criteria used to review and classify the papers in the systematic analysis.

2.1.1. Bibliometric analysis

This methodology provides an overview of the research area, identifying the most relevant documents, topics, and trends associated with managing both risk and profitability in the P2P market. We explore citation and co-occurrence with network analysis using VOSviewer (Van Eck and Waltman, 2010), a software tool for analyzing and visualizing bibliographic data that is freely accessible at www.vosviewer.com.

In the first case, the network represents citations (links) between documents (nodes). We use the VOSviewer definition of citation link, where a link is created between two articles when one article cites the other. VOSviewer represents them as undirected links; however, we can generally infer the direction of the link from the year of publication of both nodes and the color of the nodes and links.

In the co-occurrence network, nodes represent topics, and there is a link between topics when they appear in the same document, or more precisely, in their titles, abstracts, or keywords. We only account for the presence of the terms, regardless of the number of times the terms may appear.

For this network, VOSviewer performs NLP to identify the most relevant noun phrases, known as terms. It starts by tagging verbs, nouns, adjectives, etc. Then, it uses a linguistic filter that identifies the most relevant noun phrases. The filter is based on the distribution of co-occurrences and the randomness of their presence in the text through the Kullback-Leibler distance. The distribution of each term is compared with the overall distribution of co-occurrences over noun phrases. VOSviewer, therefore, calculates a relevance score for each noun phrase. Noun phrases have a high relevance score if they co-occur mainly with a limited set of other noun phrases and a low relevance score if their co-occurrences are close to randomness. For more detail on the process, see Van Eck and Waltman (2011).

The visualization of the co-occurrence network and the cluster


analysis are based on the similarity matrix from the set of terms, where each cell represents the association strength or proximity index between two nodes (terms). We manually merged synonyms or closely related terms (e.g., decision tree also includes decision tree classifier, decision tree technique, or decision-tree approach; artificial neural network includes terms such as backpropagation algorithm, bp neural network, bp neural network method, bp neural network model, and mlp). We also discarded terms that were too general or could have an ambiguous meaning out of context (e.g., P2P lending, credit, market, efficiency, and evidence, among others).

For the final layout of the nodes in a two-dimensional space, in both the citation and co-occurrence networks, VOSviewer uses a technique similar to multidimensional scaling. Mapping and clustering, the latter based on the maximization of a modularity function, are unified in a single approach (Waltman et al., 2010). More precisely, node proximity implies a stronger relationship between the items, either by citation or co-occurrence. In addition, the node size is proportional to the number of citations in the citation network, and the number of occurrences in the co-occurrence network. For more details on VOSviewer, the interested reader can refer to its manual (Van Eck and Waltman, 2020).

Furthermore, we use two types of representations from VOSviewer:

• Clusters by color on a network built using measures of similarity associated with the connection or relationship between the observed units, which can be papers or terms. The links represent the connection or the relationship between two units.
• A temporal trend on a network. The same network described in the previous representation, but with a color gradient from green to yellow that represents how recent the unit’s appearance is, from oldest to most recent.

Finally, we use bibliometrix, an R library developed by Aria and Cuccurullo (2017), to analyze authorship and coauthorship statistics.

2.1.2. Systematic literature review

To complement the bibliometric analysis, we carry out a systematic review of the literature where we manually analyze several aspects of the proposed approaches, focusing on risk and profit management. In particular, we consider the following aspects related to the modeling process:

• Data sets: legal person (individuals or companies) considered and the country or region.
• Software tools.
• The business problem (e.g., default classification/default probability, profit scoring model, fraud, LGD, etc.)
• The data modeling paradigm.
• Methodological aspects (e.g., performance evaluation, the inclusion of new variables, class imbalance treatment, reject inference).
• Other modeling aspects specifically associated with classification models that we consider crucial to ensuring the correct applicability and replicability of the results. These include cross-validation strategy, hyperparameter tuning, the use of logistic regression as a reference model, the statistical comparison of classifiers’ performance, and explainability.

We reviewed the documents and classified them according to these criteria. The annex shows a table with some details related to the variables and the specific type of model used in each document.

2.2. Search and screening of the relevant papers

We used Web of Science (WoS), the main scientific journal database (Gong et al., 2019; Birkle et al., 2020), to search for papers relevant to our literature review. To construct the query, we identified the following key aspects that the retrieved papers should include:

1. The main topic: peer to peer lending market.
2. Modeling methods and algorithms used.
3. Terms associated with risk management.
4. Terms focusing on the default or profit generated from the loan obligation.

For each aspect, the authors agreed on potential keywords, including synonyms. The final keywords used are the logical conjunction of the following:

1. TOPIC: ((“peer to peer market” OR “P2P Platforms” OR “Online Peer-to-Peer credit”) OR (“peer-to-peer credit”) OR (“P2P credit”) OR (“peer to peer credit”) OR (“online Peer-to-Peer lending”) OR (“peer-to-peer lending”) OR (“P2P lending”) OR (“peer to peer lending”) OR (“social lending”))
2. TOPIC: (statistical analysis OR survival OR learning schemes OR supervised models OR classification models OR learning techniques OR model OR machine learning OR logistic OR cox OR method OR algorithm OR prediction)
3. TOPIC: (credit risk OR risk assessment OR risk OR risk assessment OR financial risk)
4. TOPIC: (loan evaluation OR loan default OR probability of default OR lenders’ profitability OR profit OR creditworthiness OR creditworthiness rates OR scoring OR pay loan OR pay back the loan OR default OR status OR borrower status)

We also refine by years of publication between 2010 and the first quarter of 2020 (April 15, 2020). The main reason for selecting this range is that the prior literature review on a similar topic was Bachmann et al. (2011), who considered the first decade of the century. Our study aims to provide an updated review focusing on the most recent research trends, and we consider more criteria for analysis. We manually ensured that all the retrieved papers were relevant to our aims.

We discard two papers from the 106 documents retrieved by the query since their object of study is not associated with the P2P lending market. One focuses on the real estate market, and the other focuses on credit risk in general.

As a result, for the bibliometric analysis, we consider 104 documents published in conferences and journals. All of them are supposed to follow high-quality standards because they have undergone blind peer review (Clarivate, 2021). We decide not to exclude conference papers because we consider that they may reflect emerging AI and ML trends, especially in computer science conferences (Yli-Huumo et al., 2016). The analysis uses authors, citations, titles, abstracts, and keywords in English.

For the systematic review, we considered the papers from the bibliometric analysis and removed non-English papers and those without full-text availability (Bae, 2018; Park and Choi, 2019; Soo and A, 2016; Sungbok, 2018; Wu et al., 2018; Liu et al., 2019). We obtain a final sample of 98 papers.

3. Results

In this section, we present the results of the bibliometric and systematic analyses.

3.1. Bibliometric analysis

3.1.1. Analysis of the publication year, authors, source, and geographic distribution

Fig. 1 shows the distribution of the selected papers by publication year. Given this trend, we can conclude that the P2P lending market is gaining attention (almost one-third of the papers were published in 2019). Since complete information has not been provided for 2020, it is difficult to assess whether it follows the trend from previous years.

Table 1 shows authoring statistics computed with the bibliometrix library in R (Aria


[Figure 1 plot: share of selected papers (vertical axis, 0% to 35%) by publication year (horizontal axis: 2010, 2011, 2014, 2015, 2016, 2017, 2018, 2019, 2020).]

Fig. 1. Distribution of primary papers selected from 2010 to 2019 and up until April 2020. (Source: Prepared by the authors using Clarivate Web of Science data).
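
The authorship indicators reported in Table 1 are simple ratios, and a small sketch may help to tell them apart. The example below is only an illustration under stated assumptions: the author lists are invented toy data, the function name `authorship_indicators` is hypothetical, and the formulas follow the ratios named in the text (documents/authors, authors/documents, author appearances/documents, and authors of multi-authored articles/multi-authored articles). It is not the bibliometrix implementation.

```python
# Toy sample (hypothetical): one author list per document.
documents = [
    ["Li"],
    ["Li", "Wang"],
    ["Wang", "Chen", "Zhao"],
]


def authorship_indicators(documents):
    """Compute simple authorship ratios from per-document author lists."""
    n_docs = len(documents)
    # Each author is counted once, however many documents they sign.
    distinct_authors = {a for doc in documents for a in doc}
    # Each signature is counted, so prolific authors count several times.
    appearances = sum(len(doc) for doc in documents)
    multi = [doc for doc in documents if len(doc) > 1]
    multi_authors = {a for doc in multi for a in doc}
    return {
        "single_authored_documents": n_docs - len(multi),
        "documents_per_author": n_docs / len(distinct_authors),
        "authors_per_document": len(distinct_authors) / n_docs,
        "co_authors_per_document": appearances / n_docs,
        # Assumed definition: distinct authors of multi-authored
        # documents divided by the number of multi-authored documents.
        "collaboration_index": len(multi_authors) / len(multi) if multi else 0.0,
    }


indicators = authorship_indicators(documents)
```

With this toy sample, authors per document (4/3) is lower than co-authors per document (6/3) because one author signs two documents, which is the same kind of gap as the one between 2.72 and 3.25 in Table 1.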

Table 1
Measures of contribution per author.

Index                        Value
Single-authored documents    12
Documents per author         0.367
Authors per document         2.72
Co-authors per document      3.25
Collaboration index          2.96

(Source: Prepared by the authors using bibliometrix and Web of Science data).

Table 2
Publication venues of the P2P lending market.

Documents classification  Journal information                                                      Number
Q1 journals*              IEEE Access                                                              6
                          Electronic Commerce Research and Applications                            5
                          Journal of Management Information Systems                                3
                          Mathematics                                                              2
                          Expert Systems with Applications                                         2
                          Decision Support Systems                                                 2
                          Journal of the Royal Statistical Society Series A-Statistics in Society  1
                          Journal of Marketing Research                                            1
                          European Journal of Operational Research                                 1
                          World Wide Web: Internet and Web Information Systems                     1
                          Journal of The Franklin Institute-Engineering and Applied Mathematics    1
                          Journal of Banking and Finance                                           1
                          Finance Research Letters                                                 1
                          Journal of Computational and Applied Mathematics                         1
                          Engineering Applications of Artificial Intelligence                      1
                          Information Systems Frontiers                                            1
Q2 journals*              Electronic Commerce Research                                             3
                          Physica A-Statistical Mechanics and its Applications                     3
                          North American Journal of Economics and Finance                          1
                          Sustainability                                                           1
                          Plos One                                                                 1
                          Expert Systems                                                           1
                          Quality Engineering                                                      1
                          International Journal of Electronic Commerce                             1
                          Annals of Operations Research                                            1
                          Journal of Forecasting                                                   1
Other journals            –                                                                        24
Proceedings                                                                                        36

*Quartiles according to the Web of Science 2019 Journal Impact Factor. We considered the best quartile if the journal was classified in more than one WoS category.
(Source: Prepared by the authors using data from Clarivate Web of Science).

and Cuccurullo, 2017). There are 12 papers with only one author. According to the data, on average, each author wrote 37% of a document (documents/authors), which is equivalent to 2.7 authors per document (authors/documents). In reality, there were 3.25 authors on average per document (author appearances/documents). The difference is that the first metric divides the total number of distinct authors by the total number of documents, whereas the second averages the number of authors listed on each document. The collaboration index is almost 3, which means three co-authors per article, calculated on the multiauthored article set (total authors of multi-authored articles/total multiauthored articles), according to the indicator presented in Koseoglu (2016). Thus, we can conclude that there is frequent collaboration in this research area.

Table 2 shows the publication venues of the retrieved papers. We can see that 44 papers (42% of our sample) are published in top journals, that is, Q1 and Q2 journals in their respective WoS categories. The journals that appear more frequently are IEEE Access, an open-access interdisciplinary journal that aims for fast dissemination, and Electronic Commerce Research and Applications, a journal whose scope perfectly fits the topic of P2P lending. In addition, we can find journals from mathematics (including statistics and operational research), economics (from finance and electronic commerce), computer science (from the areas of artificial intelligence and information systems), and multidisciplinary areas. This shows that the problem can be addressed from different perspectives.

Fig. 2 shows the geographical distribution of authorship and coauthorship. Interestingly, authors from China participated in most of the papers (approximately 94.2%), followed by the USA (14.4%), Italy, and other Eastern countries. It is also noteworthy that the most relevant intercountry collaborations by the number of published articles are between the USA and China, followed by the UK and China, with a smaller proportion of intercountry contributions between authors from other Eastern countries.

This result is unsurprising since P2P had tremendous growth in China over the last decade. However, at the time of this analysis, the


Fig. 2. Map of the final sample of paper participation by country, and contributions between countries according to coauthorship. (Source: Prepared by the authors
using Clarivate Web of Science data and bibliometrix).
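
A map such as Fig. 2 rests on two counts: the share of papers in which each country participates, and the number of papers shared by each pair of countries. The sketch below shows one way these counts could be obtained. The list of papers is invented toy data and both function names are hypothetical; it does not reproduce the bibliometrix routines behind the figure.

```python
from collections import Counter
from itertools import combinations

# Toy sample (hypothetical): the set of author countries for each paper.
papers = [
    {"China"},
    {"China", "USA"},
    {"China", "UK"},
    {"USA"},
    {"China", "USA"},
]


def participation_share(papers):
    """Fraction of papers with at least one author from each country."""
    counts = Counter(c for countries in papers for c in countries)
    return {country: n / len(papers) for country, n in counts.items()}


def collaboration_pairs(papers):
    """Number of papers co-authored by each unordered pair of countries."""
    pairs = Counter()
    for countries in papers:
        pairs.update(combinations(sorted(countries), 2))
    return pairs


share = participation_share(papers)
pairs = collaboration_pairs(papers)
```

Because a paper with authors from several countries counts once for each of them, the participation shares can add up to more than 100%, as they do for China and the USA in the text.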

market of this country has deteriorated due to the lack of an adequate regulatory framework. This market presented a significant increase in illegality and fraud, with few risk management practices. This overshadowed the entities shown to have a robust offer and efficient risk management, forcing the intervention in and closure of many platforms. Fig. 3 shows the number of P2P platforms worldwide, but the figure was not available for China. The top ten countries in order of the highest number of platforms are the USA, the UK, Indonesia, Germany, Mexico, Switzerland, Latvia, Estonia, Spain, and South Korea.

3.1.2. Citation analysis

In this section, we use citation analysis to identify the papers with the greatest number of links, which means the greatest number of citations. Fig. 4 shows the citation network. Each circle represents a paper. The larger circles represent papers that have more citations. Colors are associated with each article. This map allows us to visualize the citation structure and identify the documents most relevant to the scientific community. We identify the most relevant papers to answer RQ1.

Consequently, according to Fig. 4, the main articles are Greiner and

Fig. 3. Distribution of the number of lending platforms in the P2P market by country. (Source: Compiled by the authors based on data from https://P2Pmarketdata.
com/, accessed April 17, 2020).


Fig. 4. Main authors’ citation graph. (Source: Prepared by the authors using VOSviewer based on the Web of Science data).
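
As a rough illustration of how a citation map like Fig. 4 can be assembled, the sketch below builds undirected citation links between papers and then infers each link's direction from the publication years, as described in Section 2.1.1. The paper identifiers, years, and reference lists are invented toy data, the function names are hypothetical, and citations are counted only within the sample, which is an assumption of this sketch rather than a description of VOSviewer.

```python
from collections import Counter

# Toy data (hypothetical): paper id -> (publication year, ids it cites).
papers = {
    "A2010": (2010, []),
    "B2015": (2015, ["A2010"]),
    "C2016": (2016, ["A2010", "B2015"]),
    "D2017": (2017, ["B2015", "C2016"]),
}


def citation_links(papers):
    """Undirected links: one per pair in which either paper cites the other."""
    links = set()
    for pid, (_, refs) in papers.items():
        for ref in refs:
            if ref in papers and ref != pid:
                links.add(frozenset((pid, ref)))
    return links


def infer_direction(link, papers):
    """Guess (citing, cited) from the years: the newer paper cites the older."""
    older, newer = sorted(link, key=lambda p: papers[p][0])
    return newer, older


links = citation_links(papers)
# Citations received inside the sample, usable as a node-size attribute.
cited_counts = Counter(infer_direction(link, papers)[1] for link in links)
```

Two papers published in the same year would make the inferred direction ambiguous, which is why the direction can only generally be read from the map.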

Wang (2010), Herzenstein et al. (2011), Emekter et al. (2015), Serrano-Cinca et al. (2015), Malekipirbazari and Aksakalli (2015), Serrano-Cinca and Gutierrez-Nieto (2016), and Xia et al. (2017), with Emekter et al. (2015) being the most cited publication. However, the relevance of the papers has to be weighed against the time since their publication because recent papers have had fewer opportunities to be cited. Thus, we believe that papers such as Serrano-Cinca and Gutierrez-Nieto (2016) and Xia et al. (2017) will have a greater impact in the coming years. Both papers deal with risk management models with a business focus, making them very suitable for the industry.

The most cited papers are mostly devoted to identifying the key factors in credit risk models in P2P lending. They mainly use statistical

Fig. 5. Graph of the co-occurrence and clusters for the concepts present in titles and abstracts. (Source: Prepared by the authors using VOSviewer based on the Web of Science data).
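
The co-occurrence counts behind a map like Fig. 5 can be sketched in a few lines. The example below applies binary counting (a term counts once per document, however often it appears) and then normalizes each link with one common form of the association strength, c_ij / (o_i * o_j), where c_ij is the number of documents containing both terms and o_i is the number of documents containing term i. The corpus is invented toy data, the function names are hypothetical, and the exact normalization used by VOSviewer may differ from this assumed form.

```python
from collections import Counter
from itertools import combinations

# Toy corpus (hypothetical): terms found in each document's title and abstract.
docs = [
    ["default", "scoring", "default", "logistic regression"],
    ["default", "deep learning"],
    ["scoring", "logistic regression"],
]


def cooccurrence(docs):
    """Count term occurrences and pairwise co-occurrences per document."""
    occ, co = Counter(), Counter()
    for terms in docs:
        present = sorted(set(terms))         # binary counting: presence only
        occ.update(present)
        co.update(combinations(present, 2))  # one link per unordered pair
    return occ, co


def association_strength(occ, co):
    """Assumed normalization: c_ij / (o_i * o_j) for each linked pair."""
    return {pair: c / (occ[pair[0]] * occ[pair[1]]) for pair, c in co.items()}


occ, co = cooccurrence(docs)
strength = association_strength(occ, co)
```

In this toy corpus "default" appears twice in the first document but is counted once, and the pair ("logistic regression", "scoring") obtains the highest strength because the two terms always appear together.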


and econometric models. They propose cross-sectional models and some survival models. These models try to predict endogenous variables such as the credit contract’s approval, default prediction, performance, or profit on a credit obligation. However, works such as Malekipirbazari and Aksakalli (2015) and Xia et al. (2017) try to improve credit risk management prediction and obtain profits by incorporating machine learning techniques, mainly decision tree methods.

3.1.3. Co-occurrence analysis of abstract and titles

In this section, we identify the main topics associated with risk and profit management in P2P lending using a co-occurrence network constructed with the textual data in the abstracts and the titles of publications. We also carry out a cluster analysis that will identify communities on the networks. We use VOSviewer for such purposes, which also provides mapping and visualization using multivariate methods. The resulting plots show as many labels and nodes as possible, prioritizing the most relevant nodes over the less critical ones.

Figs. 5 and 6 show the co-occurrence networks. The nearness of the nodes represents how closely the terms are related, while the size represents the number of publications in which each term appears. The color of the nodes in Fig. 5 corresponds to the cluster to which each term has been assigned, while in Fig. 6, lighter colors are associated with more recent publications, and darker colors are associated with older publications.

In addition, Table 3 presents a descriptive summary of the resulting clusters. The publication year variable was not considered in the clustering process, so it is shown in the table to complement the description of the clusters. The first column (Name of Cluster) represents the name we assigned to each cluster: “Business and regulation” (green color), “Performance evaluation, ensemble models and new sources of information” (red color), “Features and neural networks models” (yellow), “Logistic regression and interpretability” (blue) and “Fraud, survival, and other models” (purple). The names try to summarize the terms that belong to each cluster. In addition, we ordered the clusters in the table by relevance according to the frequency and centrality defined by the number of links. The second column gives the labels of the terms grouped in each cluster. The third column (Occur.) represents the number of papers in which each term appears. The fourth one (Avg. Occur.) shows the average number of occurrences over the terms of the cluster. The fifth (Avg. pub. Year) indicates the average year of publication of the articles in which the term appears. The sixth column (Avg. Avg. pub. Year) indicates the average year of publication for all the terms of the cluster. The seventh represents the number of node links (the links represent co-occurrence). Finally, the eighth (Avg. Links) shows the average number of links for each term within a cluster.

Given Table 3 and Figs. 5 and 6, we will characterize the clusters below. The first cluster, “Business and regulation” (green color), includes terms associated with modeling objectives from the business point of view: “default” modeling (60 occurrences), “scoring” (32 occurrences), “investment” (25 occurrences), and “profits” (15 occurrences) models. These terms are also characterized by their high centrality according to the number of links, with 36, 33, 26, and 19 links, respectively (Table 3). As evidenced in some papers, the modeling of risk or profit management must include solid and comprehensive regulatory guidelines (“regulation,” “regulatory authority,” and “government”) to support the sustainability and development of the P2P lending market. The appearance of terms related to business and regulation highlights the need to study and consider this framework for action in developing modeling proposals for risk and profit management. Oddly enough, the term “network connectivity” appears in this cluster because some papers use statistics derived from network analysis as risk determinants in recent years (Avg. Pub. Year of 2018). These network statistics somehow consider latent factors determined by the relationship among users, and that may represent a structure of contagion, which aligns with the regulatory perspectives. Another term, “collateral,” is less central (7 links), less frequent (3 occurrences), and less recent (Avg. pub. Year 2016.3); more attention should be devoted to it as an element of hedging and risk management in P2P business.

A second cluster, both numerous and diverse, is the one we call “Performance evaluation, ensemble models, and soft data” (red color). This cluster is mainly represented by terms associated with classification as a modeling objective (the term “classifier” has 26 occurrences and 33

Fig. 6. Graph of the co-occurrence and time trend for the concepts present in titles and abstracts. (Source: Prepared by the authors using VOSviewer based on the
Web of Science data).


Table 3
Descriptive summary of clusters in the co-occurrence network based on abstracts and titles.
Name of Cluster | Label | Occur. | Avg. Occur. | Avg. Pub. Year | Avg. Avg. Pub. Year | Links | Avg. Links
Business and regulation (green) | default | 60 | 16.5 | 2017.7 | 2017.5 | 36 | 19.8
 | scoring | 32 | | 2017.7 | | 33 |
 | investment | 25 | | 2018 | | 26 |
 | profits | 15 | | 2017.8 | | 19 |
 | business knowledge | 11 | | 2017.5 | | 23 |
 | information asymmetry | 9 | | 2016.4 | | 18 |
 | regulatory authority | 9 | | 2017.9 | | 17 |
 | regulation | 9 | | 2017.4 | | 15 |
 | network connectivity | 6 | | 2018 | | 15 |
 | collateral | 3 | | 2016.3 | | 7 |
 | government | 3 | | 2017.7 | | 9 |
Performance evaluation, ensemble models, and soft data (red) | performance | 50 | 12.36 | 2017.8 | 2018.4 | 35 | 19.7
 | classifier | 26 | | 2018.1 | | 33 |
 | decision tree | 12 | | 2017.8 | | 21 |
 | ensemble | 8 | | 2018 | | 18 |
 | gradient boosting | 8 | | 2018.5 | | 17 |
 | regression | 7 | | 2018.3 | | 20 |
 | class imbalance | 7 | | 2018.4 | | 18 |
 | natural language processing | 6 | | 2018.7 | | 18 |
 | soft information | 5 | | 2019.2 | | 14 |
 | hard information | 4 | | 2018.5 | | 13 |
 | fuzzy | 3 | | 2019 | | 10 |
Features and neural network models (yellow) | features | 37 | 14.8 | 2017.6 | 2017.7 | 31 | 20
 | deep learning | 13 | | 2018.4 | | 24 |
 | artificial neural network | 13 | | 2017.2 | | 22 |
 | strategy for model evaluation | 6 | | 2017.8 | | 13 |
 | big data_complex data | 5 | | 2017.6 | | 10 |
Logistic regression and interpretability (blue) | logistic regression | 21 | 9.2 | 2018.2 | 2017.3 | 26 | 18.8
 | trust | 9 | | 2016.6 | | 19 |
 | credit grade | 7 | | 2016.7 | | 20 |
 | interpretability | 6 | | 2018.7 | | 19 |
 | bureau score | 3 | | 2016.3 | | 10 |
Fraud, survival and other models (purple) | Random forest | 7 | 4.8 | 2017.1 | 2017.2 | 20 | 15
 | survival analysis | 6 | | 2017.8 | | 16 |
 | fraud | 4 | | 2017.3 | | 15 |
 | support vector machine | 4 | | 2017.3 | | 11 |
 | social data | 3 | | 2016.3 | | 13 |

(Source: Prepared by the authors using VOSviewer based on the Web of Science data).
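
The cluster-level columns of Table 3 (Avg. Occur. and Avg. Links) can be read as plain averages of the term-level values within each cluster. The following minimal Python sketch recomputes them for the “Business and regulation” cluster only; the occurrence and link counts are copied from Table 3, while the variable and function names are illustrative assumptions, not part of the original analysis.

```python
from statistics import mean

# (occurrences, links) per term in the "Business and regulation" cluster,
# copied from Table 3. The dictionary name is an illustrative assumption.
business_and_regulation = {
    "default": (60, 36),
    "scoring": (32, 33),
    "investment": (25, 26),
    "profits": (15, 19),
    "business knowledge": (11, 23),
    "information asymmetry": (9, 18),
    "regulatory authority": (9, 17),
    "regulation": (9, 15),
    "network connectivity": (6, 15),
    "collateral": (3, 7),
    "government": (3, 9),
}


def cluster_averages(terms):
    """Return (average occurrences, average links), rounded to one decimal."""
    avg_occurrences = mean(occ for occ, _ in terms.values())
    avg_links = mean(links for _, links in terms.values())
    return round(avg_occurrences, 1), round(avg_links, 1)


avg_occurrences, avg_links = cluster_averages(business_and_regulation)
print(avg_occurrences, avg_links)  # 16.5 19.8, as reported in Table 3
```

The same check can be repeated for the other clusters by swapping in their rows.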

links) and with model evaluation focused on “performance” (50 occurrences and 35 links) through different metrics (see also Fig. 5). Classification is one of the main topics in the literature, and the evaluation criteria in these papers typically focus on performance metrics. However, these papers usually neglect other aspects that would boost their application by regulators and users, as mentioned later. Some of these papers use “ensemble” models (8 occurrences and 18 links) or “gradient boosting” models (8 occurrences and 17 links) to improve performance, and “decision tree” methods (12 occurrences and 21 links) are typically used as benchmarks. The term “class imbalance” also appears because it is one of the main problems in credit risk. Finally, we can find recently treated terms such as “natural language processing” (Avg. pub. Year of 2018.7), “soft information” (Avg. pub. Year of 2019.2), “hard information” (Avg. pub. Year of 2018.5) or “fuzzy” (Avg. pub. Year of 2019). They account for new sources of information or new ways of representing information in the models.
We named the third cluster “Features and Neural Networks models” (yellow color). We mainly found terms related to “features” (37 occurrences and 31 links) and “big data_complex data” (five occurrences and ten links), both related to the exploration of new sources of information or new ways to include the information. Such avenues seem to gain strength together with modeling alternatives such as “artificial neural networks” (13 occurrences and 22 links). In recent years, researchers have used “deep learning” methodologies (13 occurrences and 24 links; Avg. pub. Year of 2018.4) as a valuable alternative when considering the varied “features” and sources of complex information from “big data_complex data”. The studies proposed in this line have highlighted the “strategy for model evaluation” (six occurrences and 13 links) as an element to be considered when evaluating and comparing models.
The fourth cluster, “Logistic regression and interpretability” (blue), includes elements of classic risk management such as “logistic regression” (21 occurrences and 26 links). “Logistic regression” is one of the central topics in the connectivity network because it is typically used as a benchmark model in many publications (see Fig. 5). “Bureau score” (three occurrences and ten links) and “credit grade” (seven occurrences and 20 links) appear with minor occurrences. While “credit grade” is used as an alternative modeling variable, “bureau score” is one of the relevant variables in most credit-granting models. When there is financial experience, it describes the behavior of the credit history of loan applicants. This cluster is associated with “interpretability” (six occurrences and 19 links). It is a rare and recently studied term (Avg. pub. Year of 2018.7) but is deeply associated with the term “trust” (nine occurrences and 19 links). These features are likely to be greatly valued by regulators, who require “interpretability” and “trust”, which is why “logistic regression” continues to be a reference model even though other modeling proposals achieve better performance.
The last cluster (purple color) presents the lowest frequency (Avg. Occur. of 4.8) and centrality (Avg. Links of 15), and deals with terms not seen in recent years (Avg. Avg. pub. Year of 2017.2). We call this cluster “fraud, survival, and other models,” and it includes some well-known machine learning methods such as “random forest” and “support vector machine”. It also includes papers that consider “social data” information. They also propose “fraud” models (4 occurrences), which is a


topic that has barely been studied, given its importance in the P2P market (see the collapse of the Chinese P2P market already mentioned). “Survival models” also appeared a few times (six occurrences and 16 links). They are a classic alternative to default probability models and behavior models, which are widely used in expected loss estimation, thanks to their longitudinal evaluation. “Survival models” are worth studying in more depth from a machine learning perspective, given the possibility of aligning them with the requirements of the leading international regulatory frameworks, Basel or IFRS.

3.1.4. Co-occurrence analysis of keywords
This section briefly comments on the co-occurrence network created using keywords (see Fig. 7). The network is similar to that using the terms from the abstracts and titles. However, we can find new terms such as “microfinance” and “microcredit”. This indicates the association of the P2P market with the supply of alternative financial services for low-income people and small businesses, leveraging financial inclusion. We also find that the use of cost-sensitive models stands out in publications in mid-2017 as an alternative that includes business elements in credit risk management. One of the relevant components of the expected loss calculation, the “loss given default” (LGD), appeared less frequently around the year 2018, drawing attention to the need for more in-depth research on this topic for the P2P market. We can also see the evolution of the use of machine learning methods over time: support vector machine and random forest were mostly used around 2017. They were later followed by gradient boosting and, more recently, by deep learning. “Logistic regression” also appears, as it is typically used as a benchmark method. Finally, the issue of reject inference also appears in the last two years. It could be an important element to incorporate into the new modeling alternatives to strengthen their credibility.

3.2. Systematic analysis

In this section, we carry out a systematic review of the literature on P2P lending. We analyze several aspects of the papers to answer RQ2. We have considered 98 English-written papers, as explained in Section 2.2.

3.2.1. Dataset and software
We start our systematic review by analyzing the data set used to assess risk and profit management strategies in P2P lending. We identify the dataset by the name of the company and its country. Table 4 shows the frequency of the data sets in the literature. The Lending Club (LC) data set is the most popular. LC is one of the platforms with the highest volume of users in the United States. Its data were used in 35 of the research papers analyzed (approximately 35%). Most likely, this is due to the quality, variety, and quantity of the data. Kaggle, the popular data science website, also makes it possible for many people from different countries to work with the data. As a result, it is a data set studied from different perspectives and with standard and novel methodological strategies.
Apart from the Lending Club case, there is a strong tendency to use data from China. This may be caused by the growth of the Chinese market, the availability of the information, and the concern of an important group of researchers who seek to provide answers to the regulatory problems of this market. In the literature, the most commonly used datasets are from ppdai.com, renrendai.com, and we.com.
As we saw in Fig. 3, the data sets used are closely related to the development of P2P markets by country. The data were primarily from China and the United States, followed by continental Europe, the United Kingdom, Mexico, and, in a lesser proportion, even Southeast Asia.
It is important to mention that most of the studies have been carried out on loans to individuals. In a few cases, the P2P business lending market has been selected as the object of study, particularly for SMEs. In particular, the papers by Giudici et al. (2020), Ahelegbey et al. (2019), and Hadji-Misheva et al. (2018) use information from European External Credit Assessment Institutions (ECAIs), such as modeFinance, a fintech credit rating agency in Europe specializing in companies and banks.
Table 5 focuses on the software used. R and Python are the most popular programming environments used. In Python, Scikit-learn

Fig. 7. Graph of co-occurrence and time trend for concepts present in keywords. (Source: Prepared by the authors using VOSviewer based on the Web of Science data).
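
To make the two quantities behind the co-occurrence maps concrete, the sketch below counts, for a toy corpus, the number of papers in which each keyword appears (occurrences) and the number of distinct keywords it co-occurs with (links). The keyword lists are hypothetical and are not taken from the reviewed papers, and the function is only an illustration of the counting logic under these assumptions; it does not reproduce VOSviewer, which additionally applies thresholds, normalization, and clustering.

```python
from collections import Counter
from itertools import combinations

# Hypothetical keyword lists, one per paper (toy data for illustration only).
papers = [
    ["p2p lending", "credit risk", "logistic regression"],
    ["p2p lending", "deep learning", "credit risk"],
    ["p2p lending", "microfinance"],
]


def cooccurrence(keyword_lists):
    """Count occurrences, links, and pairwise co-occurrences of keywords."""
    occurrences = Counter()  # number of papers containing each keyword
    pair_counts = Counter()  # number of papers containing each keyword pair
    for keywords in keyword_lists:
        unique = sorted(set(keywords))  # ignore repeats within one paper
        occurrences.update(unique)
        pair_counts.update(combinations(unique, 2))

    links = Counter()  # number of distinct co-occurring keywords per keyword
    for first, second in pair_counts:
        links[first] += 1
        links[second] += 1
    return occurrences, links, pair_counts


occurrences, links, pair_counts = cooccurrence(papers)
print(occurrences["p2p lending"], links["p2p lending"])  # 3 4
```

In this toy corpus, “p2p lending” appears in all three papers and is linked to the four other keywords, which is the kind of high-frequency, high-centrality term that dominates the maps.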


(machine learning library) and Keras (neural networks library) stand out for their frequency of use. Other software such as TensorFlow for research involving deep learning, usually combined with Python, is also used. Software packages such as SPSS, STATA, and WEKA are also employed, but to a lesser extent (the first two are mainly for traditional statistical and econometric models). Interestingly, only 47 papers report the software used.

Table 4
Datasets.
Data sets (Observation Unit) | Country/Region | Number | Authors/year
Lending Club (Individuals) | United States | 30 | Bastani et al. (2019); Boiko Ferreira et al. (2017); Cai and Zhang (2020); Calabrese et al. (2019); Cho et al. (2019); Duan (2019); Durovic (2017); Emekter et al. (2015); Gourieroux and Lu (2019); Jin and Zhu (2015); Kim and Cho (2019b); Kim and Cho (2019a); Kim and Cho (2019b); Kumar et al. (2016); Ma et al. (2018b); Malekipirbazari and Aksakalli (2015); Namvar et al. (2018); Rodrigues et al. (2018); Serrano-Cinca and Gutierrez-Nieto (2016); Serrano-Cinca et al. (2015); Stofa (2017); Wang et al. (2018a); Wan et al. (2019); Wang et al. (2020); Wei et al. (2018); Xia et al. (2019); Ye et al. (2018); Zang et al. (2015); Zhou et al. (2018); Zhu et al. (2019)
Undefined platforms (Individuals and enterprises) | China | 16 | Chen et al. (2016); Guo et al. (2016); Jiang et al. (2019); Li et al. (2018a); Li et al. (2020); Niu et al. (2019); Ma et al. (2018a); Wang et al. (2018b); Wang et al. (2019); Xia and Li (2016); Xu and Zhang (2017); Yan et al. (2016); Yan et al. (2017); Yang et al. (2019); Yuan et al. (2018); Zhou et al. (2019)
Ppdai (Individuals) | China | 6 | Chen et al. (2019); Xu et al. (2015); Zhang et al. (2016b); Zhang et al. (2016a); Zhang et al. (2017a); Zhao (2015)
Renrendai (Individuals) | China | 5 | Gao et al. (2017); Li et al. (2018b); Liu et al. (2018); Tao et al. (2017); Yao et al. (2019)
Prosper (Individuals) | United States | 5 | Greiner and Wang (2010); Herzenstein et al. (2011); Ren and Malik (2019); Tan et al. (2017); Wang et al. (2016)
No use data sets | – | 5 | Liu and Yan (2016); Pokorná and Sponer (2016); Pur et al. (2014); Wang (2018); Xiong (2018)
Lending Club, We.com (Individuals) | China and United States | 3 | Xia et al. (2017); Xia et al. (2018); Xia (2019)
Jinan Hengxin Micro-Investment Advisory Co., Ltd. (Individuals) | China | 2 | Zhang et al. (2017b); Zhang et al. (2017c)
Paipaidai (Individuals) | China | 2 | Xinmin et al. (2019); Zhu (2018)
Two undefined institutions and Lending Club (Individuals) | Australia, Germany, and the United States | 2 | Nguyen Truong et al. (2019); Van-Sang et al. (2019)
Modefinance a ECAI (SMEs) | Europe | 2 | Giudici et al. (2020); Hadji-Misheva et al. (2018)
Bondora (Individuals) | Europe: Estonia, Finland, and Spain | 2 | Byanjankar et al. (2015); Byanjankar (2017)
Wangdaizhijia P2P online loan industry portal (Platforms) | China | 2 | Fu et al. (2019); Ge et al. (2017)
No Information | – | 2 | Ji et al. (2020); Li et al. (2016)
HNL-Home of Network Loan (p2p Lending Intermediaries) | China | 1 | Li et al. (2019)
LendingMarket (Individuals) | China | 1 | Xu and Chau (2018)
MyLending (Individuals) | China | 1 | Xu et al. (2016)
Paipai (Individuals) | China | 1 | Chen (2017)
Eloan (Individuals) | China | 1 | Jiang et al. (2018)
Three undefined institutions (Individuals) | Australia, Germany, and Brazil | 1 | Ding et al. (2017)
Yooli (Individuals) | China | 1 | Lin et al. (2017)
European External Credit Assessment Institution-ECAI (SMEs) | Europe | 1 | Ahelegbey et al. (2019)
Bandung (SMEs) | Indonesia | 1 | Rosavina et al. (2019)
Undefined platforms (Individuals) | Indonesia | 1 | Amalia et al. (2019)
Undefined platform (Individuals) | Mexico | 1 | Canfield (2018)
Funding Circle (Individuals) | United Kingdom | 1 | Pierrakis (2019)
KIVA non-profit organization (Individuals) | United States | 1 | Uddin et al. (2018)
Data simulated | – | 1 | Lee et al. (2017)
Source: Compiled by the authors

3.2.2. Business problem
This section proposes several categories associated with different problems with risk management, profit, and market knowledge. It is worth mentioning that a paper can be found in several categories because it can analyze and propose solutions made up of various models dealing with different aspects of the market. The defined categories are listed in Table 6.
The models that try to estimate the default and approximate the probability of this event, the “Default classification/default probability” category, are the most frequent, with 70 papers, more than two-thirds of the studies analyzed. Many of them propose new methodological approaches, new variables, or combinations of various types of information. They typically exhibit remarkable results in classification and performance indicators (see Annex at the end of the paper for more details about each proposal). However, we draw attention to some limitations that we have found in many of the studies. First, although many of them describe alternatives for managing credit risk and are even intended to be used in the lending process, few draw attention to selecting variables for this purpose. More precisely, many proposals include input variables that are indeed the output of the risk assessment (e.g., grade or levels of risk estimated by the platforms or interest rates,


among others). As a result, these proposals could overestimate or underestimate the performance indicators, and they cannot be used because they do not suit a real-life lending assessment process.

Table 5
Software used.
Software | Number | Authors/Year
Python | 10 | Boiko Ferreira et al. (2017); Cho et al. (2019); Ding et al. (2017); Li et al. (2019); Li et al. (2020); Li et al. (2018a); Namvar et al. (2018); Rodrigues et al. (2018); Xia (2019); Xia et al. (2017)
R | 9 | Byanjankar (2017); Byanjankar et al. (2015); Calabrese et al. (2019); Chen et al. (2019); Giudici et al. (2020); Jiang et al. (2019); Wan et al. (2019); Wang et al. (2018b); Xia et al. (2019)
SPSS | 7 | Emekter et al. (2015); Greiner and Wang (2010); Jin and Zhu (2015); Serrano-Cinca and Gutierrez-Nieto (2016); Yan et al. (2016); Yan et al. (2017); Xu and Chau (2018)
Python, TensorFlow | 5 | Bastani et al. (2019); Fu et al. (2019); Nguyen Truong et al. (2019); Van-Sang et al. (2019); Wang et al. (2019)
MATLAB | 2 | Yang et al. (2019); Zang et al. (2015)
STATA | 3 | Canfield (2018); Chen et al. (2016); Chen (2017)
Weka | 3 | Cai and Zhang (2020); Wang et al. (2016); Xu et al. (2016)
Python, SPSS, TensorFlow | 1 | Duan (2019)
Python, STATA | 1 | Niu et al. (2019)
R, Weka | 1 | Malekipirbazari and Aksakalli (2015)
R, Stanford CoreNLP, SentiStrength | 1 | Wang et al. (2020)
R, MATLAB, Portfolio Safeguard (PSG) | 1 | Wei et al. (2018)
R, Bazhuayu web-crawler software | 1 | Yao et al. (2019)
LISREL | 1 | Amalia et al. (2019)
Weka, Liblinear, LibSVM | 1 | Guo et al. (2016)
Source: Compiled by the authors

Table 6
Business problem.
Business problem | Number | Authors/Year
Default classification/Default probability | 70 | Ahelegbey et al. (2019); Bastani et al. (2019); Boiko Ferreira et al. (2017); Byanjankar et al. (2015); Byanjankar (2017); Cai and Zhang (2020); Calabrese et al. (2019); Canfield (2018); Chen (2017); Chen et al. (2019); Chen et al. (2016); Cho et al. (2019); Ding et al. (2017); Duan (2019); Durovic (2017); Emekter et al. (2015); Gao et al. (2017); Ge et al. (2017); Giudici et al. (2020); Gourieroux and Lu (2019); Guo et al. (2016); Herzenstein et al. (2011); Jiang et al. (2018); Jiang et al. (2019); Jin and Zhu (2015); Kim and Cho (2019a); Kim and Cho (2019a); Kim and Cho (2019b); Kumar et al. (2016); Li et al. (2016); Li et al. (2018a); Li et al. (2020); Lin et al. (2017); Liu et al. (2018); Ma et al. (2018b); Ma et al. (2018a); Malekipirbazari and Aksakalli (2015); Hadji-Misheva et al. (2018); Namvar et al. (2018); Nguyen Truong et al. (2019); Niu et al. (2019); Rodrigues et al. (2018); Serrano-Cinca et al. (2015); Serrano-Cinca and Gutierrez-Nieto (2016); Stofa (2017); Tan et al. (2017); Tao et al. (2017); Uddin et al. (2018); Van-Sang et al. (2019); Wang et al. (2016); Wang et al. (2019); Wang et al. (2020); Wang et al. (2018b); Wang et al. (2018a); Xia (2019); Xia et al. (2017); Xia et al. (2018); Xia et al. (2019); Xinmin et al. (2019); Xu and Chau (2018); Yang et al. (2019); Ye et al. (2018); Yuan et al. (2018); Zang et al. (2015); Zhang et al. (2016a); Zhang et al. (2017b); Zhang et al. (2017c); Zhou et al. (2019); Zhu (2018); Zhu et al. (2019)
Risk and investment/Lending decision/Asset allocation | 25 | Amalia et al. (2019); Chen et al. (2016); Fu et al. (2019); Gao et al. (2017); Greiner and Wang (2010); Herzenstein et al. (2011); Ji et al. (2020); Lee et al. (2017); Li et al. (2018b); Pierrakis (2019); Ren and Malik (2019); Rosavina et al. (2019); Tan et al. (2017); Tao et al. (2017); Wei et al. (2018); Xinmin et al. (2019); Xu and Chau (2018); Xu and Zhang (2017); Yan et al. (2016); Yan et al. (2017); Yao et al. (2019); Zhang et al. (2017a); Zhang et al. (2016b); Zhao (2015); Zhu (2018)
Profit Model/profit scoring model | 6 | Bastani et al. (2019); Cho et al. (2019); Serrano-Cinca and Gutierrez-Nieto (2016); Tan et al. (2017); Xia and Li (2016); Ye et al. (2018)
Market knowledge: Risk and regulation (Descriptive, theoretical, and qualitative analysis) | 5 | Liu and Yan (2016); Pokorná and Sponer (2016); Pur et al. (2014); Wang (2018); Xiong (2018)
Fraud | 3 | Li et al. (2019); Xu et al. (2015); Xu et al. (2016)
LGD | 3 | Gourieroux and Lu (2019); Xia et al. (2017); Zhou et al. (2018)
Prepayment | 1 | Wan et al. (2019)
Source: Compiled by the authors.

Second, few papers specify the type of models they propose for risk management (application model, behavioral model, collection model, among others). We believe that it is essential to make such a distinction since this determines the moment of application in risk management, the potential variables to be used, and the modeling methodology.
Third, there are many ways in which the default event is defined, but some of the studies do not even detail this aspect, although it is crucial for interpreting the results. For example, a model that flags loans more than 30 days past-due within one year is different from a model that defines default as an unpaid obligation during the life of the credit. This kind of distinction must be considered when designing proposals, and in turn, it must be accompanied by a good understanding of the business and regulatory needs. As an example, some papers define default as charged-off transactions analyzed against fully paid transactions at the end of the credit life, including Serrano-Cinca and Gutierrez-Nieto (2016), Kim and Cho (2019a), Kim and Cho (2019b), Kim and Cho (2019c), Ye et al. (2018), Cho et al. (2019), Rodrigues et al. (2018) or Stofa (2017). Other definitions can be found in the Annex.
All these facts limit the application due to a lack of uniformity and consistency. These two characteristics are objectives sought by regulatory bodies, especially for growing transnational markets such as P2P lending. Joint work between the academics who propose models for risk management, regulators, and platforms can produce tailored solutions that generate greater confidence and are more likely to be applied. This synergy would ensure a better understanding of the business’s nature and structure, which is reflected in better modeling, risk management, and service, enhancing the sustainability and development of the P2P lending market.
We also want to highlight that most papers are focused on classification, which is valuable for lending processes. However, the robust estimation of the probability of default (PD) needs to be studied in more depth. This aspect is crucial in the application and lending processes; it is a component in the classification process and assists in the rate allocation and provisioning processes to better manage the risk, investment,


and regulatory processes. The above implies calibration methods, incorporating forward-looking macroeconomic factors, and searching for other factors, which are recognized as entering the modeling processes, as shown in the list of independent variables in the Annex.
Another line of research seeks to model the investor’s rate of return, identify its determinants, or offer sets of better investment alternatives through the “Profit model/profit scoring model” alternatives (see Table 6). Within this category, there are studies, such as Serrano-Cinca and Gutierrez-Nieto (2016), Xia and Li (2016) or Ye et al. (2018), that propose a profit score combining random forest (RF) and genetic algorithms (GA) (see the Annex for details of the objective, independent variables, and the methodologies used). Other papers in this category that also propose default prediction models incorporate these results in profit scoring models. These are the cases of Bastani et al. (2019), who propose a two-stage model using deep learning, and Tan et al. (2017), who establish a recommendation model that maximizes the total profit of investors and incorporates a novel variable, the time value of money (TVM) prediction. Additionally, Cho et al. (2019), with their proposal for an instance-based entropy fuzzy support vector machine (IEFSVM) and several methods for treating imbalance, use the less-risky transactions’ return on credit operations as a final criterion for investment decisions.
Another important field of research is related to the investment, recommendation, or decision models, which we have grouped under the category of “Risk and investment/Lending decision/Asset allocation” (see Table 6). Here, papers such as those by Zhang et al. (2016b) and Zhang et al. (2017a) use different collaborative filtering algorithms. Ren and Malik (2019) frame the problem of investment decisions under Markowitz’s theory, taking into account variables such as the number of days required for an application to be fully funded. Xu and Zhang (2017) propose an investment index per platform based on an analytic hierarchy process. Ji et al. (2020) propose risk ranking through fuzzy methodologies and the interactive multicriteria decision-making method (TODIM). See the Annex for further details.
Moreover, we include Lee et al. (2017), who evaluate the impact of collateral on P2P lending as a tool to generate trust and a more efficient operation based on risk and investment. Tan et al. (2017) also propose a recommender system, allowing investors to choose the most profitable and least risky borrowers. In this category, we have also included the papers by Herzenstein et al. (2011), Greiner and Wang (2010), and Chen et al. (2016). They study the determinants of variables, such as the percentage of the requested amount that is funded by investors, along with the final interest rate, or the variation between the maximum rate payable versus the assigned rate (see the Annex for details). In this way, they shed light on the applicant factors used by investors to make reliable investment decisions. For this same purpose, Herzenstein et al. (2011) study credit performance, including narrative variables, combined with demographic variables and loan characteristics. Chen et al. (2016) propose mixed regression and survival models to investigate possible gender discrimination in China’s P2P credit loan market. They find empirical evidence that supports their hypothesis.
In the same category, the papers by Tao et al. (2017), Zhu (2018), Xu and Chau (2018), and Xinmin et al. (2019) model both the prediction of default and whether the credit requested can be funded under traditional statistical methodologies. They also propose new variables related to information disclosure, guarantees, and the safety promise offered by the platforms. Concretely, Xu and Chau’s (2018) paper includes soft information, among other inputs, while Xinmin et al. (2019) focus on the impact of the variable for successful borrowing times. Yao et al. (2019) evaluate the influence of the description of the loan’s purpose with text mining techniques. Finally, Wei et al. (2018) use alternative transformations on the variables and an alternative objective function (buffered AUC) to understand the loan approval processes. See the Annex for more details on each proposal.
In this broad category, we can find studies that try to characterize and explain the market. For example, Yan et al. (2016) and Yan et al. (2017) study the determinants of the number of investors per platform in China and identify variables that positively impact social capital, risk management, and operating duration. In addition to studying the determinants of the 30 days past-due default rate, Gao et al. (2017) study the determinants of the percentage of bad debt in the total amount of a loan, including variables from a forward-looking and backward-looking segment, the latter to identify variables from previous experience with the platforms. On the other hand, Li et al. (2018b), using connectivity networks and regression models, analyze the systemic risk from topological variables derived from the network generated by the P2P market. Zhao (2015) simulates, using neural network models, expert risk assessments on credit granted.
Pierrakis (2019) analyzes a survey of lenders from a platform in the United Kingdom using principal component analysis (PCA) to study their investment criteria and their motivation to invest. Rosavina et al. (2019) use qualitative methods to explore the determinants of borrowing for SMEs in Indonesia. Additionally, they evaluate some aspects, including collateral, requirements, process duration, interest rate, costs, and profits.
We also include works such as Fu et al. (2019), which help us understand the dynamics of the P2P market for efficient decision making. They characterize the sentiment of investors’ comments using convolutional neural networks for sentence classification (TextCNN) and a long short-term memory (LSTM) neural network. In addition, they predict an index for the daily trading volume of the Chinese platforms from a time series of changes that describes the dynamics of this variable and others.
Other lines of work that appear less frequently include relevant fields to guarantee risk management, sustainability, and market efficiency. They include “Fraud,” the estimation of Loss Given Default or “LGD,” and “Prepayment” as a target variable. These areas may be research opportunities to strengthen the development of the P2P market. Another category that has been explored very little is what we have called “Market knowledge: Risk and Regulation,” which presents descriptive, theoretical, contextual, and qualitative elements of the P2P markets, including benefits, actors, risks, and market failures. These works do not use data but focus on how P2P lending markets can ensure their development, continuity, and stability as an alternative for investment and financing. They suggest several elements of regulation and supervision, another essential and necessary line for this market to develop healthily.

3.2.3. Data modeling paradigm
In Table 7, we segment the methodological paradigms that underlie the quantitative tools used, bearing in mind again that a paper can use more than one paradigm depending on the objective of the research. Thus, it can appear in several categories.
The largest category is “Statistics and econometrics models,” with 47 papers, about half of the papers analyzed. These papers typically use explanatory models to identify the type of relationship and measure the impact of the determinants of the dependent variable (default, fraud, investment, return, etc.). We believe that it is the most widely used precisely because of its capacity for interpretation, the ease of comparison with business theory and knowledge, the ease of estimation and implementation, and the wider understanding of these models among researchers, users, and regulators.
The next category is “Tree-based models,” with 37 papers. Decision trees have been used throughout the last decades as an alternative for managing credit risk, mainly because of their easy interpretation and application. However, in recent years, their application has increased due to advances in tree-based ensemble methods, which obtain better classification performance than standard decision trees, although explainability is sacrificed. The main advances have come from homogeneous ensemble techniques, such as bagging and boosting, random forest (RF), and extreme gradient boosting (XGBoost), which is the most commonly used (see the Annex). The developments of the boosting technique have also brought the application of algorithms such as


adaptive boosting (AdaBoost) and light gradient boosting machine (LightGBM), among other alternatives. In the tree-based category, 22 papers use such tree-based ensemble methods.

Table 7
Paradigm used (93 papers; the five papers that did not use datasets were excluded).

Main paradigm used | Number | Authors/Year
Statistics and econometrics models | 47 | Ahelegbey et al. (2019); Amalia et al. (2019); Boiko Ferreira et al. (2017); Byanjankar (2017); Cai and Zhang (2020); Calabrese et al. (2019); Canfield (2018); Chen et al. (2016); Chen (2017); Chen et al. (2019); Cho et al. (2019); Ding et al. (2017); Durovic (2017); Emekter et al. (2015); Gao et al. (2017); Ge et al. (2017); Gourieroux and Lu (2019); Greiner and Wang (2010); Herzenstein et al. (2011); Jiang et al. (2018); Jiang et al. (2019); Li et al. (2016); Li et al. (2020); Li et al. (2018a); Li et al. (2018b); Lin et al. (2017); Liu et al. (2018); Hadji-Misheva et al. (2018); Namvar et al. (2018); Pierrakis (2019); Serrano-Cinca et al. (2015); Serrano-Cinca and Gutierrez-Nieto (2016); Stofa (2017); Tan et al. (2017); Tao et al. (2017); Wan et al. (2019); Wang et al. (2016); Wei et al. (2018); Xia and Li (2016); Xinmin et al. (2019); Xu and Chau (2018); Xu and Zhang (2017); Yan et al. (2016); Yan et al. (2017); Yao et al. (2019); Zhou et al. (2018); Zhu (2018)
Tree-based models | 37 | Bastani et al. (2019); Boiko Ferreira et al. (2017); Cai and Zhang (2020); Ding et al. (2017); Guo et al. (2016); Jin and Zhu (2015); Jiang et al. (2019); Kim and Cho (2019a); Kim and Cho (2019a); Kim and Cho (2019b); Kumar et al. (2016); Li et al. (2018a); Li et al. (2020); Ma et al. (2018a); Malekipirbazari and Aksakalli (2015); Ma et al. (2018b); Hadji-Misheva et al. (2018); Namvar et al. (2018); Nguyen Truong et al. (2019); Niu et al. (2019); Rodrigues et al. (2018); Tan et al. (2017); Van-Sang et al. (2019); Wang et al. (2016); Wang et al. (2020); Wang et al. (2018a); Wang et al. (2018b); Xia et al. (2017); Xia et al. (2018); Xia et al. (2019); Xia (2019); Xu et al. (2015); Xu et al. (2016); Ye et al. (2018); Zhang et al. (2016a); Zhou et al. (2019); Zhu et al. (2019)
Support Vector Machine | 25 | Bastani et al. (2019); Cho et al. (2019); Ding et al. (2017); Duan (2019); Fu et al. (2019); Guo et al. (2016); Jiang et al. (2018); Jin and Zhu (2015); Kim and Cho (2019a); Kim and Cho (2019a); Kim and Cho (2019b); Li et al. (2020); Malekipirbazari and Aksakalli (2015); Nguyen Truong et al. (2019); Rodrigues et al. (2018); Van-Sang et al. (2019); Wang et al. (2018a); Xia et al. (2018); Xia (2019); Xu et al. (2015); Xu et al. (2016); Yang et al. (2019); Ye et al. (2018); Zhou et al. (2019); Zhu et al. (2019)
Use ensemble models | 22 | Bastani et al. (2019); Boiko Ferreira et al. (2017); Cho et al. (2019); Jiang et al. (2019); Kim and Cho (2019a); Kim and Cho (2019b); Ma et al. (2018b); Ma et al. (2018a); Malekipirbazari and Aksakalli (2015); Nguyen Truong et al. (2019); Niu et al. (2019); Rodrigues et al. (2018); Tan et al. (2017); Van-Sang et al. (2019); Wang et al. (2018b); Wang et al. (2020); Xia et al. (2017); Xia et al. (2018); Xia et al. (2019); Xia (2019); Ye et al. (2018); Zhu et al. (2019)
Deep Learning | 11 | Bastani et al. (2019); Duan (2019); Fu et al. (2019); Kim and Cho (2019a); Kim and Cho (2019b); Li et al. (2018a); Li et al. (2020); Nguyen Truong et al. (2019); Van-Sang et al. (2019); Wang et al. (2019); Zhang et al. (2017b)
Artificial Neural Networks | 7 | Byanjankar et al. (2015); Jin and Zhu (2015); Xu et al. (2015); Zang et al. (2015); Yuan et al. (2018); Zhang et al. (2017c); Zhao (2015)
Bayesian-probabilistic models | 6 | Boiko Ferreira et al. (2017); Guo et al. (2016); Jiang et al. (2018); Rodrigues et al. (2018); Wang et al. (2016); Wang et al. (2018a)
Recommendation algorithms | 4 | Ren and Malik (2019); Tan et al. (2017); Zhang et al. (2016b); Zhang et al. (2017a)
Propose heterogeneous ensemble | 6 | Guo et al. (2016); Zhou et al. (2019); Kim and Cho (2019a); Li et al. (2018a); Li et al. (2018a); Namvar et al. (2018)
Fuzzy methods | 2 | Cho et al. (2019); Ji et al. (2020)
Qualitative models | 2 | Rosavina et al. (2019); Uddin et al. (2018)
Propose homogeneous ensemble | 1 | Ding et al. (2017)
Source: Compiled by the authors.

In the category of “Propose heterogeneous ensemble,” six papers propose their own heterogeneous ensemble. As with homogeneous ensemble techniques, they aim to increase the learning ability of algorithms, reduce the variability of classifiers or predictive models, generalize performance, and guarantee scalability. They are also presented as an alternative for dealing with class imbalance. In other words, these methods are offered as a way to reduce the overfitting of individual models and to produce superior predictive results and a better capacity for generalization, benefits that are visible in data sets with the class imbalance problem. On the other hand, only one paper falls into the category “Propose homogeneous ensemble”: Ding et al. (2017), who propose a homogeneous ensemble model based on clustering.
Another widely used method is the “Support vector machine (SVM),” with 25 papers. However, most papers present SVM as an alternative model for comparison (see Annex for more details of the models used in each paper). “Artificial neural networks (ANNs)” are used in seven papers, mainly with backpropagation algorithms. However, the more sophisticated “deep learning” methods are more commonly used (11 papers). This reflects the great development and applicability of these methods. The most popular techniques are long short-term memory (LSTM) networks, wide deep learning (WDP), deep neural networks based on multilayer perceptrons, and convolutional neural networks (CNNs), among other options.
Finally, smaller categories include methods such as “Bayesian-probabilistic models,” with six papers, mainly with the Naive Bayes (NB) algorithm, and “Recommendation algorithms,” with four papers focusing on the choice of profitable investment alternatives as a tool for platforms and investors in borrowing. “Fuzzy methods” (2) and “Qualitative models” (2) are less frequent.

3.2.4. Methodological aspects

In this section, we focus on the methodological aspects of the proposed models. We have identified ten nonexclusive categories that are shown in Table 8.
The largest category is “Performance/evaluation,” which 44 publications address. These articles focus on default classification and use performance measures as the main evaluation criterion. The most frequent measures are accuracy (ACC), even though the data are typically imbalanced, the area under the ROC curve (AUC), precision, and recall. On the other hand, measures such as the F1 score (the harmonic mean of recall and precision), the Kolmogorov-Smirnov (KS) statistic, or the H-measure are used less frequently.
Another category we want to draw attention to is “Reject inference,” a modeling component that is not very frequent in the literature. The articles in this category propose reject inference techniques to correct


the sampling bias generated by modeling only with the set of accepted loans, even though the models are applied to the whole set of loan applications. This bias can affect the predictive power of the models. The input feature distributions and the label proportion of the target variables could be very different if the data set included the rejected applications. For this purpose, Xia et al. (2018) and Xia (2019) combine gradient boosting decision tree (GBDT) classifiers with semi-supervised methods and outlier detection methods, respectively (see Annex for details).

Table 8
Methodological aspects (93 papers; the five papers that did not use datasets were excluded).

Methodological aspects | Number | Authors/Year
Performance/evaluation | 44 | Bastani et al. (2019); Boiko Ferreira et al. (2017); Byanjankar et al. (2015); Byanjankar (2017); Cai and Zhang (2020); Calabrese et al. (2019); Chen (2017); Chen et al. (2019); Cho et al. (2019); Duan (2019); Jiang et al. (2019); Jin and Zhu (2015); Kim and Cho (2019a); Kim and Cho (2019a); Kim and Cho (2019b); Kumar et al. (2016); Li et al. (2018a); Li et al. (2019); Li et al. (2020); Liu et al. (2018); Ma et al. (2018b); Malekipirbazari and Aksakalli (2015); Nguyen Truong et al. (2019); Niu et al. (2019); Rodrigues et al. (2018); Uddin et al. (2018); Van-Sang et al. (2019); Wang et al. (2019); Wang et al. (2020); Wang et al. (2018b); Wang et al. (2018a); Xia et al. (2017); Xia et al. (2018); Xia (2019); Xia et al. (2019); Xu et al. (2015); Yang et al. (2019); Ye et al. (2018); Yuan et al. (2018); Zang et al. (2015); Zhang et al. (2017b); Zhang et al. (2017c); Zhao (2015); Zhu et al. (2019)
Explanation | 37 | Amalia et al. (2019); Byanjankar (2017); Calabrese et al. (2019); Canfield (2018); Chen (2017); Chen et al. (2019); Chen et al. (2016); Durovic (2017); Emekter et al. (2015); Gao et al. (2017); Ge et al. (2017); Giudici et al. (2020); Gourieroux and Lu (2019); Greiner and Wang (2010); Herzenstein et al. (2011); Jiang et al. (2018); Lee et al. (2017); Li et al. (2018b); Lin et al. (2017); Liu et al. (2018); Hadji-Misheva et al. (2018); Niu et al. (2019); Serrano-Cinca et al. (2015); Serrano-Cinca and Gutierrez-Nieto (2016); Stofa (2017); Tao et al. (2017); Wan et al. (2019); Xia and Li (2016); Xinmin et al. (2019); Xu and Chau (2018); Yan et al. (2016); Yan et al. (2017); Yang et al. (2019); Yao et al. (2019); Zhang et al. (2016a); Zhou et al. (2018); Zhu (2018)
Inclusion of new variables | 23 | Ahelegbey et al. (2019); Fu et al. (2019); Gao et al. (2017); Ge et al. (2017); Giudici et al. (2020); Guo et al. (2016); Herzenstein et al. (2011); Jiang et al. (2018); Li et al. (2019); Li et al. (2018b); Ma et al. (2018a); Hadji-Misheva et al. (2018); Niu et al. (2019); Wang et al. (2016); Wang et al. (2019); Wang et al. (2020); Wei et al. (2018); Xia et al. (2019); Xu et al. (2015); Xu and Chau (2018); Yao et al. (2019); Zhang et al. (2016a); Zhu (2018)
Class imbalance treatment | 9 | Bastani et al. (2019); Boiko Ferreira et al. (2017); Cho et al. (2019); Ding et al. (2017); Malekipirbazari and Aksakalli (2015); Wang et al. (2018a); Xia et al. (2017); Ye et al. (2018); Zhu et al. (2019)
Feature selection | 9 | Cai and Zhang (2020); Jiang et al. (2019); Jin and Zhu (2015); Kim and Cho (2019a); Kim and Cho (2019b); Nguyen Truong et al. (2019); Van-Sang et al. (2019); Xu et al. (2015); Zhu et al. (2019)
Rank by risk or profitability | 6 | Ji et al. (2020); Ren and Malik (2019); Tan et al. (2017); Xu and Zhang (2017); Zhang et al. (2016b); Zhang et al. (2017a)
Descriptive analysis | 3 | Durovic (2017); Pierrakis (2019); Rosavina et al. (2019)
Reject inference | 2 | Xia et al. (2018); Xia (2019)
Outliers treatment | 1 | Li et al. (2016)
Source: Compiled by the authors.

The category of “Class imbalance treatment” includes the works that tackle the problem of unbalanced classes typical in P2P lending, where defaulted, fraudulent, or accepted applications form the minority class. This may be problematic since the algorithms mainly learn from the majority class. Additionally, many performance metrics assign the same cost to all classification errors, which is far from the market needs (Xia et al., 2017). Different strategies can alleviate this problem, such as resampling methods to balance classes, the adjustment of misclassification costs or the definition of cost functions to be minimized (either directly in the learning process or after it), the calibration of the cutoff value of the models employing cost-sensitive models, etc. More precisely, Bastani et al. (2019) and Zhu et al. (2019) use resampling techniques such as undersampling, oversampling, or the synthetic minority oversampling technique (SMOTE). Regarding cost-sensitive models, Xia et al. (2017) provide an excellent description of this line of research and propose to adjust the estimation process and learning of the extreme gradient boosting algorithm (XGBoost), taking into consideration the costs of misclassification. Similarly, Wang et al. (2018a), Ye et al. (2018), Boiko Ferreira et al. (2017), and Cho et al. (2019) use and compare cost-sensitive strategies with resampling strategies. However, Cho et al. (2019) additionally propose and contrast an instance-based entropy fuzzy support vector machine model (IEFSVM) for imbalanced datasets. They select the least risky records to compose a portfolio that seeks high expected returns based on an investment model. Ye et al. (2018), in turn, compare their cost-sensitive approach to standard methods based on profit indicators. Ding et al. (2017) propose an ensemble using a novel undersampling technique supported by clustering techniques. Likewise, Malekipirbazari and Aksakalli (2015), in their random forest (RF) proposal, incorporate a cost-weighted matrix that increases the cost of erroneous classification associated with borrowers with poor default behavior through a cost-sensitive meta-algorithm incorporated in the WEKA software (see the Annex for more details).
On the other hand, another group of papers is focused on identifying determinants, validating relationship hypotheses, and understanding the types of relationships of different potential variables with the target variable, for example, default, fraud, approval, interest rate decrease, or the number of investors per platform (see Annex for the list of target variables used). We have classified these studies in Table 8 under the “Explanation” category. This aspect is demanded by regulators, supervisors, and users of machine learning methods. It is strongly associated with transparency and confidence in credit market risk and profit management, particularly in P2P lending. We found that 31 of these 37 papers use classic statistical and econometric methods, such as linear regression models and logit, probit, or survival models. Such models include the explanatory component through the interpretation of coefficients and testing of inferential hypotheses about them.
Among the proposals that try to emphasize the explanation component, we highlight Jiang et al. (2018) and Yao et al. (2019) for their methodological innovations. The authors, using text mining tools such as latent Dirichlet allocation (LDA), among other techniques, seek to incorporate and validate new soft information features through logistic regression (LR). LR is the most commonly used model in papers that predict default through classification models (see Annex for more details). Another important paper is Ge et al. (2017), which analyses the effect of social media information on default prediction. In addition, Chen et al. (2019) make a proposal with logistic quantile regression (LQR) for the same objective variable and with explanatory elements. Amalia et al. (2019) use the structural equation model (SEM) to causally understand the aspects that generate confidence in the P2P market. Li et al. (2018b) and Hadji-Misheva et al. (2018) validate and incorporate the determinants of credit risk associated with the P2P market agents’ connectivity network


topology into regression models. In line with them, Calabrese et al. (2019) propose a flexible bivariate regression model that is suited to class imbalance events and recognizes the dependence on borrowers’ behavior on platforms and credit bureau background (see the Annex for details).
The category of “Feature selection” has nine papers that emphasize this modeling aspect (see Table 8). These studies use methodologies to select the best set of features, seeking to improve the classification performance. Nevertheless, they typically ignore the explanation of the causal relationship with the dependent variable. This aspect of machine learning is a primary subject of research today.
In the category of “Inclusion of new variables,” we can find works incorporating new sources and types of information that prove to be decisive in risk and profit management models in the P2P market. The main papers are written by Herzenstein et al. (2011), Wang et al. (2016), Jiang et al. (2018), and Yao et al. (2019), who use text mining algorithms within natural language processing (NLP) technologies. They include narrative features derived from aspects such as economic hardship, and qualities such as hardworking, successful, moral, and religious. Additionally, they add variables that are derived from aspects related to deception, subjectivity, sentiment, readability, personality, etc., or soft features related to assets, income, work, family, agriculture, length, capital turnover, investment and entrepreneurship, business expansion, and the ambiguity of purpose, among others (see the Annex). These variables are mainly taken from loan titles, textual data generated from statements describing the purpose of the loan, and all descriptive text paragraphs related to each application. Wang et al. (2020) propose a soft factor mining method in terms of the distribution of the kinds of semantics expressed in the descriptive loan text. They also evaluate the inclusion of these features by comparing their performance with that of linguistic and stylistic soft factors. In turn, Xia et al. (2019) complement demographic and financial information with narrative data and include soft information related to loan description, borrowers’ character, and variables obtained from a clustering procedure on soft information. Li et al. (2019), with advanced natural language processing (NLP) techniques, evaluate the risk of fraud associated with platforms in China. They use variables derived from management team members’ working experience, educational background, and composition. Text mining techniques are used on new information sources, and the explanation and prediction components in risk models are improved, benefiting management in the P2P market.
We can also find works that deal with users’ internet and telecommunication information. Wang et al. (2019) use information on borrowers’ online operation behavior on P2P lending websites, with variables such as registration records, login records, click records, browse records, authentication records, etc. Guo et al. (2016) focus on the evaluation of capacity, character, and conditions within the five Cs of credit risk management, excluding capital and collateral. They also include information obtained from the internet, heterogeneous social data, mainly demographic information derived from social networks, content generation, and the structure of each user’s social network through the demographic, tweet, and network feature components. Ge et al. (2017) and Zhang et al. (2016a) evaluate the role of social media information in the prediction of default. Similarly, Niu et al. (2019) and Xu et al. (2016) study the role of social network information. This last paper also evaluates information derived from herding manipulation to predict fraud. They find that herding behavior increases the likelihood that lenders will invest in listings that have already received bids from others. In addition, Ma et al. (2018a) include patterns derived from mobile phone use (phone calls, text messages, and data traffic), demographics, mobility patterns, telecommunication patterns, app usage patterns, and telecommunication records to predict loan default.
Another trend of the “Inclusion of new variables” consists of the use of network topology information, a field that has gained attention in recent years. Li et al. (2018b), Hadji-Misheva et al. (2018), and Giudici et al. (2020) include variables derived from the market connectivity network topology for systematic risk assessment in the first paper and credit risk in the latter two. Similarly, Ahelegbey et al. (2019), with inference derived from a latent factor model on financial indicators, distinguish between connected and unconnected business communities. This distinction improves the predictive performance of scoring models in P2P lending for SMEs.
Other works study the impact of particular features of P2P lending platforms. Zhu (2018) studies how a platform’s promise of security as guaranteed support to back a loan presents a level of negative relevance in the models, bringing problems of moral hazard and adverse selection and affecting the probability of loan financing and default. The study finds that loans with security promises have a higher quoted amount, a lower interest rate, higher ratings, more investors, and a higher funding success rate, but also a higher default rate than unsecured loans. Xu and Chau (2018) examine the impact of communication between lenders and borrowers on financing outcomes and loan performance, collecting variables related to information disclosure, social influence, and the quantity and quality of information exchanged, among other characteristics (see the Annex). Finally, in this category, Fu et al. (2019) characterize the sentiment of investors’ comments. From a time series of changes describing the dynamics of this variable, they predict an index of the daily trading volume of Chinese platforms. They take comments from the first authorized P2P information platform in China and one of the largest portals in the P2P industry.
On the other hand, Gao et al. (2017) empirically validate the use of forward-looking and backward-looking mechanisms, the first with credit indicators and the second with variables describing borrowers’ experiences on the platform. These variables (Annex) are complemented with information on loans and titles (e.g., length). On the other hand, Wei et al. (2018), trying to capture the nonlinearity of characteristics, use a cubic spline regression transformation, combining their proposal with a classification optimization procedure through buffered AUC (bAUC). However, we consider that these performance-enhancing transformations must also be evaluated in light of the explainability of the model. We include this article in the “Inclusion of new variables” group due to the transformation proposed to test the variables differently from the traditional way.

3.2.5. Other modeling aspects

This section highlights some elements associated with classification models, mainly for default and fraud estimation in P2P lending risk management. In particular, we focus on elements that provide confidence to users and, especially, to regulators and supervisors, who must understand them to regulate and supervise effectively (ROFIEG, 2019). Specifically, we highlight “Cross-validation,” “Hyperparameter tuning,” “Use of logistic regression,” “Statistical comparison of the performance of classifiers,” and “Explainability.” All these elements reveal how the models were estimated, how well they perform, and how they work.
In this line, Table 9 presents these elements. We analyzed the papers that proposed classification models, 74 out of 98 papers (75% of the total). It is worth mentioning that 49 of them included machine learning methods.
Several studies use mechanisms to evaluate the efficiency of the proposals in their evaluation strategy (55 papers), with “Cross-validation” being the most commonly used (36 papers). Other studies use other alternatives, including the mere separation of training, validation, and test sets (or only training and testing) or the replication and evaluation of these segmentations randomly, a given number of times.
The use of “hyperparameter tuning” is also analyzed. This process reflects the aim of estimating optimal, or at least better, models. Machine learning methods have different parameters that control the learning process, and tuning them is essential to ensure their performance and stability. However, only 20 papers mention “Hyperparameter tuning” processes. Among the most commonly used methods are grid search, Bayesian optimization methods, and genetic algorithms.
We considered it important to highlight the studies that “use logistic


regression,” either directly or as a benchmark. Remarkably, 49 papers “Use logistic regression” as the main model or as the benchmark methodology in the P2P lending market. Thus, logistic regression still seems to be the reference model for P2P lending loan evaluation proposals, as it is for regulators, banks, and financial institutions in traditional products.
In turn, in the category “Statistical comparison of the performance of classifiers,” we highlight the works that use statistical inference as a decision criterion when comparing the performance of modeling alternatives. Only ten papers use a statistical significance test to compare the performance of the competing alternatives. The tests used are paired-t, Friedman, DeLong, or Shapiro-Wilk.
Finally, in the “Explainability” category, we consider studies that include elements that provide explainability to their proposals. Such elements could be, for example, dependence analysis, sensitivity or statistical causality, analysis of coefficients or parameters, inference of parameters, analysis of cases, interpretation of signs, evaluation of monotonicity, nonlinearities, changes of structure, and feature importance analysis, among other aspects. In particular, we counted 38 papers that included the “Explainability” component. However, most of them (26 papers) do so for statistical or econometric models. More precisely, they do so through coefficient estimations and inferential tests. Few papers include explanatory elements in machine learning (14 papers), and most of those (8 papers) only study feature importance, ignoring aspects such as monotonicity, the nonlinearity of relationships, and structural changes, which could be considered through approaches such as those developed in recent years with techniques of local and global explainability and interpretability (Molnar, 2021). As a result, most machine learning models lack a critical aspect demanded for managing risk in the P2P lending market, one that would help to extend their adoption.

4. Discussion

The P2P market has been growing hand in hand with further technological developments, and risk and profit management have become more sophisticated. Although this has brought opportunities for investment and financing for lenders and borrowers, it has also generated difficulties in regulation and supervision. Governing bodies and P2P platforms must adapt quickly to new market demands for risk and profit management to ensure participants’ financial health and the market’s sustainability. For this purpose, they must cover financial elements as well as technological and methodological elements, where machine learning and data science tools are strategic if used efficiently.
Current technologies and, in particular, machine learning, offer the potential for broader, deeper, and faster analysis of large data sets, incorporating diverse sources of information that may be relevant for risk assessment and profitability management of companies and individuals, much more so in a developing market such as P2P lending (ROFIEG, 2019). The development of machine learning and new data science technologies helps provide more suitable and adapted products.

Table 9
Other modeling aspects.

Component | Number | Authors/Year
Use of logistic regression | 49 | Ahelegbey et al. (2019); Boiko Ferreira et al. (2017); Byanjankar et al. (2015); Byanjankar (2017); Cai and Zhang (2020); Calabrese et al. (2019); Canfield (2018); Chen (2017); Chen et al. (2019); Ding et al. (2017); Duan (2019); Jiang et al. (2019); Jiang et al. (2018); Ge et al. (2017); Giudici et al. (2020); Gourieroux and Lu (2019); Guo et al. (2016); Li et al. (2018a); Li et al. (2020); Lin et al. (2017); Liu et al. (2018); Ma et al. (2018a); Malekipirbazari and Aksakalli (2015); Hadji-Misheva et al. (2018); Namvar et al. (2018); Nguyen Truong et al. (2019); Niu et al. (2019); Rodrigues et al. (2018); Wang et al. (2016); Serrano-Cinca and Gutierrez-Nieto (2016); Serrano-Cinca et al. (2015); Stofa (2017); Tan et al. (2017); Tao et al. (2017); Van-Sang et al. (2019); Wang et al. (2018a); Wei et al. (2018); Xinmin et al. (2019); Xu and Chau (2018); Xia et al. (2017); Xia et al. (2018); Xia et al. (2019); Xia (2019); Yao et al. (2019); Ye et al. (2018); Wang et al. (2018b); Wang et al. (2020); Zhang et al. (2016a); Zhu et al. (2019)
Explainability | 38 | Ahelegbey et al. (2019); Byanjankar et al. (2015); Byanjankar (2017); Calabrese et al. (2019); Canfield (2018); Chen (2017); Chen et al. (2019); Durovic (2017); Emekter et al. (2015); Gao et al. (2017); Ge et al. (2017); Giudici et al. (2020); Gourieroux and Lu (2019); Guo et al. (2016); Herzenstein et al. (2011); Jin and Zhu (2015); Jiang et al. (2018); Li et al. (2018a); Lin et al. (2017); Li et al. (2020); Liu et al. (2018); Ma et al. (2018b); Ma et al. (2018a); Hadji-Misheva et al. (2018); Niu et al. (2019); Serrano-Cinca and Gutierrez-Nieto (2016); Serrano-Cinca et al. (2015); Stofa (2017); Tao et al. (2017); Uddin et al. (2018); Wan et al. (2019); Xinmin et al. (2019); Xu and Chau (2018); Xia et al. (2017); Yang et al. (2019); Yao et al. (2019); Zhu (2018); Zhang et al. (2016a)
Cross-validation | 36 | Bastani et al. (2019); Boiko Ferreira et al. (2017); Byanjankar (2017); Cai and Zhang (2020); Cho et al. (2019); Guo et al. (2016); Ding et al. (2017); Jiang et al. (2019); Jiang et al. (2018); Jin and Zhu (2015); Kim and Cho (2019a); Kim and Cho (2019b); Kumar et al. (2016); Ma et al. (2018a); Li et al. (2019); Li et al. (2020); Malekipirbazari and Aksakalli (2015); Namvar et al. (2018); Nguyen Truong et al. (2019); Niu et al. (2019); Rodrigues et al. (2018); Van-Sang et al. (2019); Wang et al. (2016); Wang et al. (2018b); Wang et al. (2020); Wei et al. (2018); Xia et al. (2019); Xia (2019); Xu et al. (2016); Yang et al. (2019); Ye et al. (2018); Zhang et al. (2016a); Zhang et al. (2017b); Zhang et al. (2017c); Zhou et al. (2019); Zhu (2018)
Hyperparameter tuning | 20 | Boiko Ferreira et al. (2017); Cho et al. (2019); Jiang et al. (2019); Kim and Cho (2019a); Kim and Cho (2019b); Li et al. (2018a); Li et al. (2020); Ma et al. (2018a); Malekipirbazari and Aksakalli (2015); Niu et al. (2019); Rodrigues et al. (2018); Wang et al. (2019); Wang et al. (2020); Xia et al. (2017); Xia et al. (2018); Xia et al. (2019); Xia (2019); Yang et al. (2019); Ye et al. (2018); Zhou et al. (2019)
Statistical comparison of the performance of classifiers | 10 | Cho et al. (2019); Jiang et al. (2019); Kim and Cho (2019a); Ma et al. (2018b); Niu et al. (2019); Wang et al. (2018b); Wang et al. (2020); Xia et al. (2017); Xia et al. (2018); Xia (2019)
Source: Compiled by the authors.
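Two of the components tallied in Table 9, cross-validation and hyperparameter tuning by grid search, can be illustrated with a short sketch. The example below is a toy under stated assumptions: the data are made up, the classifier is a simple one-feature k-nearest-neighbors rule chosen for brevity rather than taken from any reviewed paper, and the helper names (`k_fold_indices`, `knn_predict`, `cv_accuracy`, `grid_search`) are hypothetical.

```python
from statistics import mean


def k_fold_indices(n, k):
    """Split range(n) into k interleaved folds (deterministic, no shuffling)."""
    return [list(range(i, n, k)) for i in range(k)]


def knn_predict(train, x, n_neighbors):
    """Majority vote among the n_neighbors closest training points (1-D feature)."""
    nearest = sorted(train, key=lambda row: abs(row[0] - x))[:n_neighbors]
    votes = sum(label for _, label in nearest)
    return 1 if 2 * votes > n_neighbors else 0


def cv_accuracy(data, n_neighbors, k=5):
    """Mean held-out accuracy over k folds."""
    scores = []
    for fold in k_fold_indices(len(data), k):
        held_out = set(fold)
        train = [row for i, row in enumerate(data) if i not in held_out]
        test = [data[i] for i in fold]
        hits = sum(knn_predict(train, x, n_neighbors) == y for x, y in test)
        scores.append(hits / len(test))
    return mean(scores)


def grid_search(data, grid, k=5):
    """Score every candidate by cross-validated accuracy and return the best."""
    scores = {candidate: cv_accuracy(data, candidate, k) for candidate in grid}
    best = max(scores, key=scores.get)
    return best, scores


# Toy data (assumption): (debt-to-income ratio, default flag); values are made up.
loans = [(0.10, 0), (0.15, 0), (0.20, 0), (0.22, 0), (0.25, 0),
         (0.30, 0), (0.35, 1), (0.38, 0), (0.40, 1), (0.45, 1),
         (0.50, 1), (0.55, 1), (0.60, 1), (0.65, 1), (0.70, 1),
         (0.28, 1), (0.33, 0), (0.48, 0), (0.52, 1), (0.18, 0)]

best_k, scores = grid_search(loans, grid=[1, 3, 5])
print(best_k, scores)
```

The point of the design is that every candidate value is judged only on folds it was not fitted on, which is what distinguishes a tuned and cross-validated model from one evaluated on a single arbitrary split.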
For lenders, these solutions can help protect against credit risk and


fraud, reduce the cost of the credit evaluation process, encourage product development, distribution, and monitoring, and enable better consumer-customer interaction (ROFIEG, 2019).

Therefore, given the growth of P2P lending and the increasing adoption of new technologies such as machine learning for efficient risk management and profit assessment, we considered it relevant to review the academic literature. The review helped us identify the main research trends and the research and development opportunities that contribute to the P2P lending market's healthy development. We also found limitations in many of the studies, which we emphasize so that they can be avoided in future work.

Related to RQ1, we found that among the most relevant works by citation are Greiner and Wang (2010), Herzenstein et al. (2011), Emekter et al. (2015), Serrano-Cinca et al. (2015), Malekipirbazari and Aksakalli (2015), Serrano-Cinca and Gutierrez-Nieto (2016), and Xia et al. (2017), with Emekter et al. (2015) being the most cited publication. These papers are mostly devoted to identifying the key factors in credit risk models in P2P lending. They mainly use statistical and econometric models. Additionally, the papers by Serrano-Cinca and Gutierrez-Nieto (2016) and Xia et al. (2017) have had a great impact. They present proposals for risk management models with a business focus, which we believe makes them very suitable for the industry.

Additionally, we identify that China and the United States have more experience in the P2P market. However, the eastern market shows deterioration and contraction due to fraud and regulatory issues. The largest number of publications most likely comes from eastern countries, and from China in particular. The most frequent cooperation between researchers is between the two countries mentioned. Consequently, we can conclude that the countries with the largest markets have also researched the topic more intensely; however, we cannot establish a causal relationship.

In RQ2, we asked about the main trends and contents in the literature. The most used data set is that from Lending Club, a company from the US. This is probably because it provides comprehensive information, including variables for both the application and the behavior of the obligations. We can also see a strong tendency to use data from the Chinese market, which may be due to market growth and its subsequent problems, which make it suitable for research. Hence, by country, data mostly come from China and the United States, but we found other datasets from Mexico, Southeast Asia, and the United Kingdom, where P2P lending is also developing.

The software is mainly free, open-source Python or R libraries. Its use is frequently associated with free access and with the possibility of replicating and reusing other researchers' developments with few restrictions, if any.

We have identified that many publications focus on credit risk classification models using new machine learning methods at the business level. These models typically improve on the performance of traditional alternatives, which eventually favors investors, platforms, and consumers. In recent papers, we can find new trends in the use of machine learning, such as deep learning, decision tree ensembles (mainly random forest and gradient boosting algorithms), and heterogeneous ensembles. However, despite substantial methodological progress, few efforts have been made to add interpretability and explainability to the resulting models, which may hinder their adoption due to regulatory requirements and industry needs.

On the other hand, a significant part of the publications use statistical and econometric models, probably due to their interpretability, integration with business theory and knowledge, ease of estimation and implementation, and widespread understanding among researchers, users, and regulators. It is worth mentioning that logistic regression is typically used as a benchmark or base model.

We also found another trend: complementing traditional information with variables from other sources or using soft information and NLP for credit risk, investment, and profit models. For example, many papers explore the use of new input variables derived from new information sources such as social networks (connections, consultations, photographs, messages, and videos), connectivity networks, the use of telecommunication devices, georeferencing, the textual analysis of the documents that accompany the applications, etc. Such variables help to improve the performance indicators.

5. Research gaps, recommendations, and future directions

In this section, we answer RQ3, where we asked about research gaps and future research directions. As we have seen, new proposals based on machine learning and artificial intelligence have become more relevant. Some papers include aspects that others do not, but few cover most of the elements that we consider necessary to ensure their effectiveness, efficiency, and transparency.

First, we reflect on the sources of information, which are increasingly diverse and require complex handling. The potential predictors of risk and profit are more varied and complex in their treatment, including soft data, variables from NLP, connectivity networks, other sources of big data, etc. Such new data typically improve the levels of predictability and the performance of the models. However, in many cases in academic research, models exploit these sources without a conscious study of their use in the market context. As researchers, we should strive to generate proposals free of the discrimination and implicit biases associated with selecting and treating new sources of information. Possible biases should be reflected upon and investigated in light of the ethics, culture, and laws of the financial context in which the P2P market develops.

With respect to the population under study related to the treated business problem, it is interesting to note that very few studies focus on P2P loans for businesses and companies; most have focused on loans for individuals. Having businesses and companies as the object of study changes the treatment of the sources of information, and the evaluation criteria should be more financial. This line of research deserves a more in-depth analysis.

Additionally, we found that some other business problems are understudied, such as fraud, debt collection, credit provision (LGD), third-party collateral warranties, etc. All of them are necessary for the healthy development of the market.

Surprisingly, we found that many papers lack a proper specification of the business problem. For example, few papers clarify whether the proposed model is for granting or behavior. This omission prevents the use of these models by regulators and end-users. We have also found problems in the default definition; in some papers, there is not even a mention of how this event is defined, which could cause an underestimation of the risk or an overestimation of the profit.

We believe that it is essential to clearly define the model's objective in the business context since this establishes the evaluation criteria and meets the demands of interpretability and transparency. A granting model, where the main objective is classification, differs from a default probability model, where the interest is in provisioning the expected losses associated with the investment. Alternatively, if the aim is to construct a score, this will define the access and pricing rate assigned to a liability. We believe that this distinction is not made in detail in many cases, and making it would certainly allow researchers to provide solutions more in line with market demand.

In turn, papers should have an adequate selection of the independent variables according to the model's purpose. This aspect has been overlooked in some research proposals where the inclusion of variables was guided merely by improving the prediction performance. For example, for granting loans, it makes no sense to use decision variables such as interest rate or the level of risk estimated by the companies because they are closely linked to the event being analyzed, generating biases in the results and limiting their use. The variables included must consider the operability of the business. These facts limit the application due to a lack of uniformity and consistency, which are objectives sought by regulatory bodies, especially for transnational growing markets such as P2P
lending. The appropriate choice of variables for the models is a requirement that research proposals must comply with.

Regarding the software used, we believe that we must continue to take advantage of the benefits of open-access software, such as Python and R. They have proven effective in integrating different information sources, but more importantly, they enable access to other researchers' developments, with few limitations, contributing to the democratization of knowledge in general and the healthy development of the P2P lending market.

Regarding the methodological aspects, we highlight those present in some of the works reviewed that offer potential for the market's healthy development, for example, the proposal of cost-sensitive credit risk models that help mitigate the class imbalance problem and provide insightful error measures for the business. From our perspective, cost-sensitive models for P2P lending are a research line that deserves to be further developed.

We also call attention to the correction of some biases that researchers have considered in a few studies. One of them is addressed by reject inference, an alternative for dealing with the bias that arises from not observing the rejected applications. Many works are built on data already filtered by the data providers, affecting the models' actual performance and limiting their use. Reject inference is a problem of estimating a counterfactual that should be studied. Another bias, perhaps to a lesser extent, arises in models that aim to estimate the probability of the event, which deserve evaluation criteria different from those used in classification. They also deserve calibration methods for the estimated probability, which academics have rarely studied in light of the business contexts.

On the other hand, we would also like to draw attention to the use of variable transformations in the models, such as those applied to categorical variables, again with a business sense that guarantees explainability and transparency. Transformations using dummy variables, the weight of evidence (WOE), standardizations, and normalizations are used indistinctly in the works, improving performance but limiting their interpretation and application. We believe that variable transformation in future proposals needs more careful consideration to ensure the proposals' usefulness.

We would also like to mention that the evaluation process and strategy should not only seek to guarantee performance in terms of classification or prediction. Research proposals should also incorporate other aspects already considered by regulators to evaluate models for traditional product markets: representativeness, replicability, scalability, and temporal stability. We believe that there is no standard set of minimum evaluation criteria that could guarantee the suitability and application of the proposals and that recognizes the business context, the type of problem, the regulatory framework, and the sources of information. The evaluation process of risk and profit modeling proposals is one of the topics in the P2P lending market that merits further research.

The explainability component also deserves further attention since understanding the models' decisions should be implicit in the modeling processes. Given the extended use of machine learning methods and the great need for model explainability and transparency required by regulators, supervisors, financial institutions, and governments, we believe that more research on this topic is needed. Most machine learning proposals focus on performance and neglect interpretability. The models with explanatory purposes still rely on traditional methodologies, econometrics, and statistical tools. The papers that use machine learning models rarely include this aspect, and if they do, they only include an analysis of feature importance. Thus, the interpretability components are still a challenge and a necessity for machine learning proposals. As future research lines, there is a need to generate frameworks that ensure the ethical application of new solutions, particularly in credit, whose central elements rely on the explainability of decisions (ROFIEG, 2019). In P2P lending, we are starting to see some efforts along this research line, as shown in Ariza-Garzon et al. (2020) and Bussmann et al. (2020), both published after our literature review. At a broader level, there is great interest in reflecting on the component of explainability in modeling and offering alternative explanations for new machine learning models, as shown in Carvalho et al. (2019), Rudin (2019), and Barredo Arrieta et al. (2020).

Finally, we highlight another line of research that includes articles that help understand the market, deficiencies in regulation, and economic, political, and financial approaches. Most of these studies are based on qualitative research, with theoretical and contextual elements, seeking to identify the shortcomings and opportunities that help the development, continuity, and stability of the P2P market as an alternative for investment, inclusion, and financing. We believe this is a research line that deserves further attention and needs to be complemented with quantitative research.

Regarding the limitations of this study, we point out that although the review of articles in the main journals has been both extensive and intensive, the design established could have caused some studies on the subject in other publications to be overlooked, for example, due to the limitation of the database selected or the query used. Assertions on findings are, therefore, only based on articles retrieved.

In summary, P2P lending is a flourishing market, and its research is also buoyant and active. Nevertheless, it has essential shortcomings and opportunities that must be addressed and studied to ensure its service, development, and sustainability. We hope that this study will serve to draw attention to some of these.

Additionally, we consider it vital that academics, regulators, and fintech work together to develop a framework for the risk management and profitability of financial products in the P2P lending market. We believe that, through joint work, solutions could be presented that are much more tailored to the P2P context, generating a greater probability of use and confidence in these solutions. This synergy would ensure a better understanding of the business's nature and structure, which would lead to better modeling, risk management, and service, enhancing the sustainability and development of the P2P lending market. In this way, the research will help develop the market for the digitalization of finance, financial coverage and inclusion, and the offer of better products and services to benefit individuals, households, and companies.

Funding

This work was supported in part by the European Union's H2020 Coordination and Support Actions [Grant 825215], by COST (European Cooperation in Science and Technology) [COST Action 19130] and by the Santander-UCM Research Project [Grant PR87/19-22586].

CRediT authorship contribution statement

Miller-Janny Ariza-Garzón: Conceptualization, Methodology, Formal analysis, Writing - original draft, Writing - review & editing, Visualization. María-Del-Mar Camacho-Miñano: Supervision, Conceptualization, Methodology, Writing - review & editing, Funding acquisition. María-Jesús Segovia-Vargas: Methodology, Writing - review & editing, Funding acquisition. Javier Arroyo: Writing - review & editing, Formal analysis, Funding acquisition.

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
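
Illustrative note on Section 5. Section 5 argues that models meant to estimate a default probability need evaluation criteria different from those used in classification. The following minimal Python sketch, using only the standard library, compares two sets of estimated default probabilities on a toy portfolio. The data, the function names (`auc`, `brier_score`, `calibration_gap`), and the choice of halving the probabilities are assumptions made purely for illustration; none of them is taken from the reviewed papers.

```python
from typing import Sequence


def auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """Chance that a random defaulter is scored above a random non-defaulter.

    Ties count as half a win. This is a ranking (classification) criterion.
    """
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


def brier_score(y_true: Sequence[int], probs: Sequence[float]) -> float:
    """Mean squared error between predicted probability and observed outcome."""
    return sum((p - y) ** 2 for y, p in zip(y_true, probs)) / len(y_true)


def calibration_gap(y_true: Sequence[int], probs: Sequence[float]) -> float:
    """Mean predicted probability minus observed default rate."""
    return sum(probs) / len(probs) - sum(y_true) / len(y_true)


# Hypothetical toy portfolio: 10 loans, 2 defaults (1 = default).
y = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]

# Model A: probabilities whose average matches the observed default rate.
pd_a = [0.05, 0.05, 0.10, 0.10, 0.10, 0.15, 0.20, 0.25, 0.40, 0.60]

# Model B: same ranking of loans, but every probability is halved,
# so expected losses would be under-provisioned.
pd_b = [p / 2 for p in pd_a]

auc_a, auc_b = auc(y, pd_a), auc(y, pd_b)
brier_a, brier_b = brier_score(y, pd_a), brier_score(y, pd_b)
gap_a, gap_b = calibration_gap(y, pd_a), calibration_gap(y, pd_b)

print(f"AUC    A={auc_a:.3f}  B={auc_b:.3f}")
print(f"Brier  A={brier_a:.3f}  B={brier_b:.3f}")
print(f"Gap    A={gap_a:+.3f}  B={gap_b:+.3f}")
```

Because halving preserves the ordering of the loans, both models obtain the same AUC, so a purely classification-oriented evaluation cannot tell them apart. The Brier score (about 0.068 for A versus about 0.117 for B) and the calibration gap do separate them. A model used for provisioning would therefore need this second kind of check.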

Appendix. Variables and specific model of the selected papers

| Year | Dependent variables | Independent variables | Specific Model | Authors/Year |
| 2010 | (1) % funded; (2) % reduction interest rate | financial, loan, soft (social capital, listing quality: description-image) | MLR | Greiner and Wang (2010) |
| 2011 | (1) % funded; (2) % reduction interest rate; (3) performance (4 categories: paid ahead of schedule and in full, paid as scheduled, payments between one and four months late, Default) | demographic, loan, narrative variables (trustworthy, economic hardship, hardworking, successful, moral, and religious) | MR, MLR | Herzenstein et al. (2011) |
| 2015 | Default | demographic, loan, financial, employment, credit | ANN, LR | Byanjankar et al. (2015) |
| | | | S-CR, LR | Serrano-Cinca et al. (2015) |
| | | loan, financial, credit | S-CR, LR | Emekter et al. (2015) |
| | Default; fully paid | demographic, loan, financial, employment, credit | RF, KNN, SVM, LR, FICO, Lcgrade | Malekipirbazari and Aksakalli (2015) |
| | | loan, financial, credit | SVM, RBF, MLP, DT-CART, DT-CHAID | Jin and Zhu (2015) |
| | Default: with outstanding repayment records | loan, credit, employment, financial | BP-ANN | Zang et al. (2015) |
| | Expert assessment of the risk | demographic, loan behavior, credit | BP-ANN | Zhao (2015) |
| | Fraud | loan application, e-commerce, text description, social network | DT, SVM, ANN | Xu et al. (2015) |
| 2016 | (1) % funded; (2) final interest rate; (3) Default: 60+ days past due | demographic, loan, credit, experience lending website, interest rate offered, duration (list-expire or loan), role (borrower or borrower and lender) | HMR, LR, S-CR | Chen et al. (2016) |
| | (1) IRR; (2) Default: Charged off, fully paid | demographic, loan, financial, employment, credit | MLR, LR, DT | Serrano-Cinca and Gutierrez-Nieto (2016) |
| | # investors | financial, credit, social capital, risk management (mortgage collateral or third-party guarantee), listing information, operation duration | ELM-MLR | Yan et al. (2016) |
| | Default | heterogeneous social data: demographic, tweet, network, high-level (features derived from different classifiers) | GBDT, RF, BAG, NB, L1-LR, SVM; with LDA | Guo et al. (2016) |
| | | loan, credit, soft information (deception, subjectivity, sentiment, readability, personality and mode of thought) | LR, DT-C4.5, NB, MLP, RF | Wang et al. (2016) |
| | Default; fully paid | demographic, loan, credit historic, social media | DT, LR, ANN | Zhang et al. (2016a) |
| | Default: 16+ days past due, charged off; Current, Fully Paid, In grace period | loan | RF, DT, BAG | Kumar et al. (2016) |
| | Default: overdue | demographic, credit, loan, platform variables (successful loan number, failed loan number, membership score, prestige, forum currency, contribution, group) | OC | Li et al. (2016) |
| | Fraud | loan, learning, past performance, social networking, herding manipulation | RF, SVM | Xu et al. (2016) |
| | repayment rate | loan application, loan, demographic, platform | MLR | Xia and Li (2016) |
| | top-N recommendation | Invest or not, proportion express bid amount user invest in a project, categorical proportion amount variable | UCF, ICF, LDA | Zhang et al. (2016b) |
| 2017 | (1) Funded; (2) interest rate; (3) Default | demographic, loan, credit, financial, third-party credit guarantee, offline authentication | LR, TOBIT, PROBIT | Tao et al. (2017) |
| | # investors | financial, credit, social capital, risk management (mortgage collateral or third-party guarantee), listing information, operation duration | ELM-MLR | Yan et al. (2016) |
| | Default | demographic, financial, loan, credit | LR | Chen (2017) |
| | | NI | ECSC-L2-LR, ECSC-DT, ECSC-SVM, ECVWC-L2-LR, ECVWC-DT, ECVWC-SVM | Ding et al. (2017) |
| | | demographic, loan, financial, employment, credit | cs-XGBoost, LR, RF, cs-LR-T, cs-RF-T, cs-LR-sm, cs-RF-sm | Xia et al. (2017) |
| | Default: 120+ days past due | demographic, loan, campaign, image, social media | LR, PSM, IV, Dif-Dif | Ge et al. (2017) |
| | Default: 120+ days past due-charged off | credit, loan, financial, verification, number of derogatory public records | LR | Stofa (2017) |
| | Default: 150+ days past due-charged off | NI | cs-GNB, cs-DT, cs-LR, ru-GNB, sm-GNB, sm-DT, ru-DT, ru-LR, sm-LR, AdaBoost, BAG, RF | Boiko Ferreira et al. (2017) |
| | Default: 30+ days past due; Bad debt rate | Forward-looking credit indicators (demographic, financial, employment), Backward-looking credit indicators (Successful borrowing times, borrowing amount), loan, borrowing description and titles (lengthened) | PROBIT, TOBIT. Three-stage dynamic game to analyze the default behavior of borrowers | Gao et al. (2017) |
| | Default: three consecutive payments past due | demographic, loan, verification, employment, financial | S-DA(KM), S-CR, S-ANN, S-LR | Byanjankar (2017) |
| | Default: charged off | loan | S-DA(KM) | Durovic (2017) |
| | Default: overdue | demographic, employment, financial, credit | FNT, IBPNN, RBF, ANN | Zhang et al. (2017c) |
| | | | LSTM, IBPNN, RBF, ANN | Zhang et al. (2017b) |
| | | demographic, loan, financial, employment, credit | LR | Lin et al. (2017) |
| | Degree by node | profit (lenders and borrowers), collateral | BBN | Lee et al. (2017) |
| | Rank by risk or profitability | financial, management, operation, guarantee | AHP-Index | Xu and Zhang (2017) |
| | top-N recommendation | Default, time value of money, profits | TCVM (profit: term, r, default(RF, GBDT, KNN, NB, LR, DT); TVM) | Tan et al. (2017) |
| | | Invest or not, proportion express bid amount user's invest in a project, categorical proportion amount variable | UCF, ICF, LDA | Zhang et al. (2017a) |
| 2018 | (1) Funded; (2) Default | demographic, loan, credit historic, online seller, student, safety promise, number of bidders | L1-LR | Zhu (2018) |
| | (1) Funded; (2) interest rate; (3) % Funded; (4) # Bids; (5) Default | demographic, loan, Soft Information (communication, interaction, Lender Comment-Social Influence, Borrower Response-Info Quality) | MLR, LR, S-CR | Xu and Chau (2018) |
| | 5 credit grades | demographic, loan, financial | BP-ANN | Yuan et al. (2018) |
| | 7 credit grades | loan, financial, credit | csDT, csNB, csLR, csSVM with 3 cost matrices | Wang et al. (2018a) |
| | Approved; declined | financial, loan, credit historic, employment | LR | Wei et al. (2018) |
| | Default | demographic, loan, credit, network behavior, third-party, social network | E(XGBoost, DNN, LR), XGBoost, DNN, LR | Li et al. (2018a) |
| | | demographic, financial, loan | LR, S-DA | Canfield (2018) |
| | | demographic, loan, credit, financial | LR, PROBIT | Liu et al. (2018) |
| | | demographic, loan, financial, credit | CPLE-LightGBM, CPLE-RF, CPLE-LR, CPLE-SVM, E-SVM, E-RF, E-LightGBM, S3VM, Ex, Ag, LR, SVM, RF | Xia et al. (2018) |
| | | demographic, loan, reputational risk, Macro (sectorial risk, country risk) | QM-EBRR | Uddin et al. (2018) |
| | | demographics, mobility patterns, phone usage data: (phone calls, text messages, and data traffic), telecommunication patterns, App usage patterns and telecommunication records | AdaBoost, RF, LR | Ma et al. (2018a) |
| | | financial ratios + degree and closeness centrality measures | CN-LR, CN-DT, LR, DT | Hadji-Misheva et al. (2018) |
| | | loan, financial, loan behavior | LightGBM, XGBoost | Ma et al. (2018b) |
| | Default; fully paid | demographic, loan, financial, soft features (asset, income, work, family, agriculture, and length) | RF, LR, NB, SVM; with LDA | Jiang et al. (2018) |
| | Default: 120+ days past due-charged off | demographic, loan, financial, credit | RFoGAPS, RF, SVM, DT, KNN, LN, Actual profit, LR | Ye et al. (2018) |
| | Default: 150+ days past due-charged off | demographic, loan, financial | E-FIC(GBDT, AdaBoost, LR), E-MV(GBDT, AdaBoost, LR), E-OWAo(GBDT, AdaBoost, LR), E-OWAp(GBDT, AdaBoost, LR), GBDT, AdaBoost, LR | Namvar et al. (2018) |
| | Default: 90+ days past due | demographic, loan, financial, credit | S-EMRF, MCM, S-CR, LR | Wang et al. (2018b) |
| | Default: charged off, fully paid | loan, financial | GBDT, SVM, LR, KNN, NB, RF | Rodrigues et al. (2018) |
| | LGD | loan, credit, financial | MLR | Zhou et al. (2018) |
| | network systemic risk: DVtf | network variables | CN, MLR | Li et al. (2018b) |
| 2019 | Grade ≥ A: 1, Grade < A: −1 | demographic, employment, financial, loan, credit, institutional guarantee | GA-SVM | Yang et al. (2019) |
| | (1,3) Default; (2) Rejected | demographic, loan, employment, credit | LD, LR, SVM, ANN, KNN, RF with RBM-FS | Nguyen Truong et al. (2019) |
| | (1) % negative sentiment comments-(2) trading volume index-(3) Weekday | % negative sentiment comments, TVI (trading volume index-ND), Weekday | Text-CNN-LSTM, Text-CNN-VAR, Text-CNN-DNN, Text-CNN-MLR, Text-CNN-RF, Text-CNN-SVM | Fu et al. (2019) |
| | (1) default-charged off-(2) 1+ public record bankruptcies | Kendall's tau (copula), loan, financial, credit, spatial variables | BivGEV, BivProbit, LR | Calabrese et al. (2019) |
| | (1) Default; (2) IRR | demographic, loan, financial, credit | DP, WL, WDP, RF, GB, SVM with IHT, BC, ru, so, sm | Bastani et al. (2019) |
| | (1) Default; (2) Rejected | demographic, loan, employment, credit | LD, LR, SVM, ANN, KNN, RF with RBM-FS | Van-Sang et al. (2019) |
| | (1) Default; (2) Successful | demographic, loan, verification and proof variables, successful borrowing times | LR, Probit | Xinmin et al. (2019) |
| | (1) Potential defaulted; (2) Default | loan, credit, financial, employment, region information | LR, RF, SVM; S3VM, OD-LightGBM | Xia (2019) |
| | Default | demographic, financial, soft (loan description, character, clustering result based soft information) | CatBoost, LR, DT, BNN, RF, GBDT, XGBoost | Xia et al. (2019) |
| | | demographic, registration, loan, social network | LightGBM, RF, AdaBoost and LR | Niu et al. (2019) |
| | | financial ratios | CN(LF)-LR | Ahelegbey et al. (2019) |
| | | NI | E(GBDT, XGBoost, LightGBM), NN, LR, RF, SVM, KNN, AdaBoost | Zhou et al. (2019) |
| | | soft information (online operation behavior: data of borrower's online operation behavior on P2P lending website), credit | AM-LSTM, BOA-XGBoost, LSTM, BLSTM, BLSTM-Meanpool | Wang et al. (2019) |
| | Default: 16+ days past due, charged off, In Grace period; Current, Fully Paid, Issued | loan, verification, application type | RF, DT, SVM, LR with sm | Zhu et al. (2019) |
| | Default: 90+ days past due | demographic, loan, employment, financial, credit | Latency and Incidence with MCM: RF-TDH, RF-Cox, LR-Cox, LR-TDH and RF, LR, DT, B-LR, B-DT, BAG-LR, BAG-DT | Jiang et al. (2019) |
| | Default: charged off, fully paid | demographic, loan, credit | E(LP, TSVM), LP, TSVM, DT | Kim and Cho (2019a) |
| | | demographic, loan, employment, financial, credit | IEFSVM, cs-AdaBoost, cs-RF, EE, ru-B, w-ELM, cs-XGBoost, EFSVM | Cho et al. (2019) |
| | Default: 120+ days past due, 120- days past due, 0 safe loans | demographic, loan, financial | DNN, LR, LD, DT, SVM, RBF, MLP, AdaBoost | Duan (2019) |
| | Default: charged off, fully paid | demographic, loan, employment, financial, credit | CNN: (CNN, Inception, ResNet, DenseNet, Inception-ResNet); MLP, SVM, KNN, DT, RF | Kim and Cho (2019a) |
| | | | DP-CNN, CNN: (CNN, Inception, ResNet, DenseNet, Inception-ResNet); MLP, SVM, KNN, DT, RF | Kim and Cho (2019b) |
| | Default: with outstanding repayment records | demographic, loan, platform authentication, regulation change of the government | LQR | Chen et al. (2019) |
| | Hazard exposed (absconded with ill-gotten gains, difficult withdrawing, out of business and investigated by Economic Crime Investigation Police), Normal. | Text information: Management team members' working experience, educational background, and composition | MUN-LETCLA, Doc2vec, LDA, Dependency, Syntactic, Keyword | Li et al. (2019) |
| | fully repay the loan in advance | demographic, loan, verification, financial, employment, credit | S-CR | Wan et al. (2019) |
| | Funded | demographic, loan, credit, financial, employment, platform audition, soft information (capital turnover, Investment and entrepreneurship, expanding business, ambiguous purpose, house decoration, household expenses, Daily consumption, Car purchase) | LDA(VSM)-LR | Yao et al. (2019) |
| | Number of loans by grade. top-N recommendation | return-risk trade-off (probability of appearance of loans in the future and maximum number of days the investor is willing to wait to invest their funds) | MO | Ren and Malik (2019) |
| | Total loss ratio (LGD, balance at default) | demographic, loan, employment, financial | SPM-LIR | Gourieroux and Lu (2019) |
| | Lending Intention, Platform Trust | Protection Policies, Platform Trust | SEM | Amalia et al. (2019) |
| | Survey: demographic, investment background, investment behavior, investment criteria, motivation to lend variables | | DA, PCA | Pierrakis (2019) |
| | Semi structured interviews (nonprobability sampling): Collateral, requirement, procedure, online process, fast process, interest rate, credit scoring, profits, costs variables | | QM, DA | Rosavina et al. (2019) |
| 2020 | Default | demographic, loan, financial, employment, credit | DT-J48, LR | Cai and Zhang (2020) |
| | | demographic, network behavior, third party information, social network, loan transaction time, loan | E(XGBoost, DNN, LR), XGBoost, DNN, LR, AdaBoost, GBDT, RF, DT, KNN, SVM | Li et al. (2020) |
| | | financial ratios, degree (number of neighbors of a node), strength (average distance of a node for its neighbors) and PageRank (importance of a node in a network by assigning relative scores to all nodes in the network) | LR | Giudici et al. (2020) |
| | Default: 120+ days past due | demographic, loan, credit, soft information (semantic vs. linguistic and stylistic factors) | LR, L1-LR, RF, XGBoost with Semantic Soft Factor Mining Method, LDA | Wang et al. (2020) |
| | Rank by risk or profitability | demographic, loan, employment, financial, credit | DHPF-TODIM | Ji et al. (2020) |

Legend: AdaBoost: Adaptive Boosting, Ag: Augmentation, AHP: Analytic Hierarchy Process, AM: Attention Mechanism, ANN: Artificial Neural
Network, Ap: Actual profit, B: Boosting, BAG: Bagging method, BBN: Bianconi-Barabasi network model, BC: Balance Cascade, BivGEV: Bivariate
Generalized Extreme Value regression, BivProbit: Bivariate Probit regression, BLSTM: Bidirectional LSTM network, BNN: Bagging Neural Network,
BOA: Bayesian hyperparameter optimization, BP: Back Propagation, C4.5: C4.5 decision tree algorithm, CatBoost: Unbiased boosting with categorical
features, CN: Connective Network model, CNN: Convolutional Neural Network, CPLE: Contrastive Pessimistic Likelihood Estimation, CR: Cox
Regression, cs: cost sensitive, DA: Descriptive Analysis, DenseNet: Dense Convolutional Network, DHPF-TODIM: Dual Hesitant Pythagorean Fuzzy
number + TODIM approach (an acronym in Portuguese of interactive and multicriteria decision making), Dif-Dif: Difference-in-Differences, DNN:
Deep Neural Network, DP: Deep Learning, DST: Dempster–Shafer theory, DT: Decision Tree, E: Ensemble, EBRR: Expert Based Risk Rating, ECSC:
Ensemble Classification based on Supervised Clustering, ECVWC: Ensemble Classification based on Variable Weighting Clustering, EE: Easy Ensemble,
EFSVM: Entropy Fuzzy Support Vector Machine, ELM: Elaboration Likelihood Model, EMRF: survival Ensemble Mixture RF, Ex: Extrapolation, FIC:
Fuzzy Integral Combination, FICO: Fair Isaac Corporation Score, FNT: Flexible Neural Tree, FS: Feature selection, GA: Genetic Algorithm, GB: Gradient
Boosting regression, GBDT: Gradient Boosting Decision Tree, GNB: Gaussian Naive Bayes, HMR: Hierarchical Multiple Regression, IBPNN: Improved
BP Neural Network, ICF: Item based Collaborative Filtering, IEFSVM: Instance-based Entropy Fuzzy Support Vector Machine, IHT: Instance Hardness
Threshold, IV: Instrumental Variables, KM: Kaplan-Meier method, KNN: K-Nearest Neighbor, L1: Lasso (L1 norm) Regularization, L2: Ridge (L2 norm)
Regularization, LC: Lending Club, LD: Linear Discriminant Analysis, LDA: Latent Dirichlet Allocation, LF: Latent factor model, LightGBM: Light
Gradient Boosting Machine, LIR: semiparametric estimator Least Impulse Response, LP: Label Propagation, LQR: Logistic Quantile Regression, LR:
Logistic Regression, LSTM: Long Short-Term Memory, MCM: Mixture Cure Model, MLP: Multilayer Perceptron, MLR: Multiple Linear Regression, MO:
Markowitz optimization model, MR: Multinomial Regression, MUN-LETCLA: Multiple NLP (Natural Language Processing) Integrated Learning Text
Classifier Model, MV: Majority voting, NB: Naive Bayes, NR: Nuclear Regression model, OC: Outliers Cluster-based method, OD: Outlier Detection,
OWAo: Optimistic OWA (ordered weighting averaging), OWAp: Pessimistic OWA (ordered weighting averaging), PCA: Principal Component Analysis,
PSM: Propensity Score Matching, QM: Qualitative Model, RBF: Radial Basis Function, RBM: Restricted Boltzmann Machines, ResNet: Residual Neural
network, RF: Random Forest, RFoGAPS: Random Forest optimized using a genetic algorithm with profit score, ru: random under sampling, S: Survival,
S3VM: semi-supervised variants of SVM, SEM: Structural Equation Model, sm: SMOTE (Synthetic Minority Oversampling Technique), so: random
oversampling, SPM: Semi-Parametric Transformation model, SVM: Support Vector Machine, T: Threshold, TCVM: Total Capital Value Maximization,
TDH: Time-Dependent Hazards, TSVM: Transductive Support Vector Machine, TVM: Time Value of Money, UCF: User based Collaborative Filtering,
VAR: Vector Autoregressive Model, VSM: Vector Space Model, WDP: Wide and Deep Learning, WL: Wide Learning, XGBoost: Extreme Gradient
Boosting. NI: No Information.
Source: Compiled by the authors

References Clarivate, 2021. Web of Science Journal Evaluation Process and Selection Criteria.
https://clarivate.com/webofsciencegroup/journal-evaluation-process-and-selection-
criteria/.
Ahelegbey, D.F., Giudici, P., Hadji-Misheva, B., 2019. Latent factor models for credit
Cummins, M., Lynn, T., Mac an Bhaird, C., Rosati, P., 2019. Addressing information
scoring in P2P systems. Physica A-Statistical Mechanics and Its Applications 522,
asymmetries in online In: Lynn, T., Mooney, J.G. (Wds.);Peer-to-Peer Lending.
112–121. https://doi.org/10.1016/j.physa.2019.01.130.
Disrupting Finance, pp. 15-31 https://doi.org/10.1007/978-3-030-02330-0_2.
Amalia, N., Dalimunthe, Z., and Triono, R. A., 2019. The Effect of Lender’s Protection on
Ding, H., Zhang, P., Lu, T., Gu, H., Gu, N., 2017. Credit Scoring Using Ensemble
Online Peer-to-Peer Lending in Indonesia. In: Proceedings of the 33rd International
Classification Based on Variable Weighting Clustering. In: Shen, W., Antunes, P.,
Business Information Management Association Conference, IBIMA. Education
Thuan, N.H., Barthes, J.P., Luo, J., Yong, J. (Eds.), 2017 Ieee 21st International
Excellence and Innovation Management through Vision 2020.
Conference on Computer Supported Cooperative Work in Design, pp. 509–514.
Aria, M., Cuccurullo, C., 2017. Bibliometrix: An R-tool for comprehensive science
Duan, J., 2019. Financial system modeling using deep neural networks (DNNs) for
mapping analysis. Journal of Informetrics 11 (4), 959–975. https://doi.org/
effective risk assessment and prediction. Journal of the Franklin Institute-
10.1016/j.joi.2017.08.007.
Engineering and Applied Mathematics 356 (8), 4716–4731. https://doi.org/
Ariza-Garzon, M.J., Arroyo, J., Caparrini, A., Segovia-Vargas, M.-J., 2020. Explainability
10.1016/j.jfranklin.2019.01.046.
of a Machine Learning Granting Scoring Model in Peer-to-Peer Lending. IEEE Access
Durovic, A., 2017. Estimating probability of default on peer to peer market - survival
8, 64873–64890. https://doi.org/10.1109/Access.628763910.1109/
analysis approach. Journal of Central Banking Theory and Practice 6 (2), 149–167.
ACCESS.2020.2984412.
https://doi.org/10.1515/jcbtp-2017-0017.
Bachmann, A., Becker, A., Buerckner, D., Hilker, M., Kock, F., Lehmann, M., Tiburtius, P.,
Emekter, R., Tu, Y., Jirasakuldech, B., Lu, M., 2015. Evaluating credit risk and loan
Funk, B., 2011. Online peer-to-peer lending - A literature review. Journal of Internet
performance in online Peer-to-Peer (P2P) lending. Appl. Econ. 47 (1), 54–70.
Banking and Commerce 16 (2).
https://doi.org/10.1080/00036846.2014.962222.
Bae, J.K., 2018. A Study on the Determinant Factors of P2P Loans and Activation Factors
Financial Stability Board, 2017. Artificial intelligence and machine learning in financial
of P2P Lending Market - P2p. Logos Management Review 16 (2), 21–36.
services. Market developments and financial stability implications. FSB. Financial
Barredo Arrieta, A., Díaz-Rodríguez, N., Del Ser, J., Bennetot, A., Tabik, S., Barbado, A.,
Stability Board. internal-pdf://0180047707/P011117.pdf.
Garcia, S., Gil-Lopez, S., Molina, D., Benjamins, R., Chatila, R., Herrera, F., 2020.
Fu, X., Zhang, S., Chen, J., Ouyang, T., Wu, Ji, 2019. A sentiment-aware trading volume
Explainable Artificial Intelligence (XAI): Concepts, taxonomies, opportunities and
prediction model for P2P market using LSTM. IEEE Access 7, 81934–81944. https://
challenges toward responsible AI. Information Fusion 58, 82–115. https://doi.org
doi.org/10.1109/Access.628763910.1109/ACCESS.2019.2923637.
/10.1016/j.inffus.2019.12.012.
Gao, Y., Sun, J., Zhou, Q., 2017. Forward looking vs backward looking An empirical
Bastani, K., Asgari, E., Namavari, H., 2019. Wide and deep learning for peer-to-peer
study on the effectiveness of credit evaluation system in China’s online P2P lending
lending. Expert Syst. Appl. 134, 209–224. https://doi.org/10.1016/j.
market. China Finance Review International 7 (2), 228–248. https://doi.org/
eswa.2019.05.042.
10.1108/cfri-07-2016-0089.
Birkle, C., Pendlebury, D.A., Schnell, J., Adams, J., 2020. Web of Science as a data source
Gao, Y., Yu, S., Chen, M., Shiue, Y., 2020. A 2020 perspective on “The performance of the
for research on scientific and scholarly activity. Quantitative Science Studies 1 (1),
P2P finance industry in China”. Electron. Commer. Res. Appl. 40, 100940. https://
363–376. https://doi.org/10.1162/qss_a_00018.
doi.org/10.1016/j.elerap.2020.100940.
Boiko Ferreira, L.E., Barddal, J.P., Enembreck, F., Gomes, H.M., 2017. Improving Credit
Ge, R., Feng, J., Gu, B., Zhang, P., 2017. Predicting and Deterring Default with Social
Risk Prediction in Online Peer-to-Peer (P2P) Lending Using Imbalanced Learning
Media Information in Peer-to-Peer Lending. Journal of Management Information
Techniques. In: 2017 IEEE 29th International Conference on Tools with Artificial
Systems 34 (2), 401–424. https://doi.org/10.1080/07421222.2017.1334472.
Intelligence, pp. 175–181. https://doi.org/10.1109/ictai.2017.00037.
Giudici, P., Hadji-Misheva, B., Spelta, A., 2020. Network based credit risk models. Qual.
Bussmann, N., Giudici, P., Marinelli, D., Papenbrock, J., 2020. Explainable AI in Credit
Eng. 32 (2), 199–211. https://doi.org/10.1080/08982112.2019.1655159.
Risk Management. Frontiers in Artifical Intelligence. Artifical Intelligence in
Gong, R., Xue, J., Zhao, L., Zolotova, O., Ji, X., Xu, Y., 2019. A bibliometric analysis of
Finance. https://doi.org/10.3389/frai.2020.00026.
green supply chain management based on the Web of Science (WOS) platform.
Byanjankar, A., 2017. Predicting Credit Risk in Peer-to-Peer Lending with Survival
Sustainability 11 (12), 3459.
Analysis. In 2017 Ieee Symposium Series on Computational Intelligence (SSCI).
Gourieroux, C., Lu, Y., 2019. Least impulse response estimator for stress test exercises.
Byanjankar, A., Heikkila, M., Mezei, J., 2015. Predicting Credit Risk in Peer-to-Peer
J. Bank. Finance 103, 62–77. https://doi.org/10.1016/j.jbankfin.2019.03.021.
Lending: A Neural Network Approach. In: 2015 Ieee Symposium Series on
Greiner, M.E., Wang, H., 2010. Building consumer-to-consumer trust in E-finance
Computational Intelligence, pp. 719–725. https://doi.org/10.1109/ssci.2015.109.
marketplaces: an empirical analysis. International Journal of Electronic Commerce
Cai, S., Zhang, J., 2020. Exploration of credit risk of P2P platform based on data mining
15 (2), 105–136. https://doi.org/10.2753/jec1086-4415150204.
technology. J. Comput. Appl. Math. 372, 112718. https://doi.org/10.1016/j.
Guo, G., Zhu, F., Chen, E., Liu, Q., Wu, L., Guan, C., 2016. From footprint to evidence: an
cam.2020.112718.
exploratory study of mining social data for credit scoring. Acm Transactions on the
Calabrese, R., Osmetti, S.A., Zanin, L., 2019. A joint scoring model for peer-to-peer and
Web (TWEB) 10 (4), 1–38. https://doi.org/10.1145/2996465.
traditional lending: a bivariate model with copula dependence. Journal of the Royal
Hadji-Misheva, B. H., Giudici, P., Pediroda, V., Ieee., 2018. Network-based models to
Statistical Society Series A-Statistics in Society 182 (4), 1163–1188. https://doi.org/
improve credit scoring accuracy. In: 2018 Ieee 5th International Conference on Data
10.1111/rssa:v182.410.1111/rssa:12523.
Science and Advanced Analytics, pp. 623–630. https://doi.org/10.1109/
Canfield, R., C. E, 2018. Determinants of Default in P2P Lending: The Mexican Case.
dsaa.2018.00080.
Independent Journal of Management and Production 9 (1), 1–24. https://doi.
Herzenstein, M., Sonenshein, S., Dholakia, U.M., 2011. Tell me a good story and i may
org/10.14807/ijmp.v9i1.537.
lend you money: the role of narratives in peer-to-peer lending decisions. J. Mark.
Carvalho, D.V., Pereira, E.M., Cardoso, J.S., 2019. Machine Learning Interpretability: A
Res. 48 (SPL), S138–S149. https://doi.org/10.1509/jmkr.48.SPL.S138.
Survey on Methods and Metrics. Electronics 8(8), 832, 1–34. https://doi.org/
Ji, X., Yu, L., Fu, J., 2020. Evaluating personal default risk in P2P lending platform: based
10.3390/electronics8080832.
on dual hesitant pythagorean fuzzy TODIM approach. Mathematics 8 (1), 8. https://
Chen, C., Dong, M. C., Liu, N., Sriboonchitta, S., 2019. Inferences of default risk and
doi.org/10.3390/math8010008.
borrower characteristics on P2P lending. North Am. J. Econ. Fin., 50, 101013.
Jiang, C.Q., Wang, Z., Wang, R.Y., Ding, Y., 2018. Loan default prediction by combining
https://doi.org/10.1016/j.najef.2019.101013.
soft information extracted from descriptive text in online peer-to-peer lending. Ann.
Chen, D., Li, X., Lai, F., 2016. Gender discrimination in online peer-to-peer credit
Oper. Res. 266 (1–2), 511–529. https://doi.org/10.1007/s10479-017-2668-z.
lending: evidence from a lending platform in China. Electronic Commerce Research
Jiang, C., Wang, Z., Zhao, H., 2019. A prediction-driven mixture cure model and its
17 (4), 553–583. https://doi.org/10.1007/s10660-016-9247-2.
application in credit scoring. Eur. J. Oper. Res. 277 (1), 20–31. https://doi.org/
Chen, Y., 2017. Research on the credit risk assessment of chinese online peer-to-peer
10.1016/j.ejor.2019.01.072.
lending borrower on logistic regression model. In: In: 3rd Asian Pacific Conference
Jin, Y., Zhu, Y., 2015. A data-driven approach to predict default risk of loan for online
on Energy, Environment and Sustainable Development, pp. 216–221.
Peer-to-Peer (P2P) lending. In: Tomar, G. (Ed.), 2015 Fifth International Conference
Cho, P., Chang, W., Song, J.W., 2019. Application of instance-based entropy fuzzy
on Communication Systems and Network Technologies, pp. 609–613. https://doi.
support vector machine in peer-to-peer lending investment decision. IEEE Access 7,
org/10.1109/csnt.2015.25.
16925–16939. https://doi.org/10.1109/access.2019.2896474.
Kim, A., Cho, S.-B., 2019a. An ensemble semi-supervised learning method for predicting
Claessens, S., Frost, J., Turner, G., Zhu, F., 2018. Fintech credit markets around the
defaults in social lending. Eng. Appl. Artif. Intell. 81, 193–199. https://doi.org/
world: size, drivers and policy issues. BIS Quarterly Review September. https:
10.1016/j.engappai.2019.02.014.
//www.bis.org/publ/qtrpdf/r_qt1809e.pdf.

22
M.-J. Ariza-Garzón et al. Electronic Commerce Research and Applications 49 (2021) 101079

Kim, J.-Y., Cho, S.-B., 2019b. Predicting repayment of borrows in peer-to-peer social Ren, K., Malik, A., 2019. Investment Recommendation System for Low-Liquidity Online
lending with deep dense convolutional network. Expert Systems 36 (4), e12403. Peer to Peer Lending (P2PL) Marketplaces. In: In Proceedings of the Twelfth ACM
https://doi.org/10.1111/exsy.12403. International Conference on Web Search and Data Mining, pp. 510–518. https://doi.
Kim, J.-Y., Cho, S.-B., 2019c. Towards repayment prediction in peer-to-peer social org/10.1145/3289600.3290959.
lending using deep learning. Mathematics 7 (11). https://doi.org/10.3390/ Rodrigues, D. S., Brasil, A. R. A., Costa, M. B., Komati, K. S., Pinto, L. A., Acm, 2018. A
math7111041. comparative analysis of loan requests classification algorithms in a peer-to-peer
Koseoglu, M.A., 2016. Growth and structure of authorship and co-authorship network in lending platform. In: Proceedings of the 14th Brazilian Symposium on Information
the strategic management realm: Evidence from the Strategic Management Journal. Systems. https://doi.org/10.1145/3229345.3229390.
BRQ Business Research Quarterly 19 (3), 153–170. https://doi.org/10.1016/j. ROFIEG, Expert Group on Regulatory Obstacles to Financial Innovation, 2019. Thirty
brq.2016.02.001. recommendations on regulation, innovation and finance (Issue December). Final
Kumar, V. L., Natarajan, S., Keerthana, S., Chinmayi, K. M., Lakshmi, N., 2016. Credit Report to the European Commission. https://ec.europa.eu/info/files/191113-report-
Risk Analysis in Peer-to-Peer Lending System. In: 2016 IEEE International expert-group-regulatory-obstacles-financial-innovation_en.
Conference on Knowledge Engineering and Applications. pp. 193-196. https://doi: Rosavina, M., Rahadi, R.A., Kitri, M.L., Nuraeni, S., Mayangsari, L., 2019. P2P lending
10.1109/ICKEA.2016.7803017. adoption by SMEs in Indonesia. Qualitative Research in Financial Markets 11 (2),
Lee, Y.-W., Chen, S., Yu, T., Ieee, 2017. Analysis of the Impact of Collateral on Peer-to- 260–279. https://doi.org/10.1108/qrfm-09-2018-0103.
Peer Lending. In: 2017 IEEE/Sice International Symposium on System Integration, Rudin, C., 2019. Stop explaining black box machine learning models for high stakes
pp. 77–82. decisions and use interpretable models instead. Nature Machine Intelligence 1 (5),
Promoting Business Analytics and Quantitative Management of Technology Vol. 91, 206–215. https://doi.org/10.1038/s42256-019-0048-x.
2016, 357–361. https://doi.org/10.1016/j.procs.2016.07.095. Serrano-Cinca, C., Gutierrez-Nieto, B., 2016. The use of profit scoring as an alternative to
Li, L., Feng, Y., Lv, Y., Cong, X., Fu, X., Qi, J., 2019. Automatically detecting peer-to-peer credit scoring systems in peer-to-peer (P2P) lending. Decis. Support Syst. 89,
lending intermediary risk-top management team profile textual features perspective. 113–122. https://doi.org/10.1016/j.dss.2016.06.014.
IEEE Access 7, 72551–72560. https://doi.org/10.1109/Access.628763910.1109/ Serrano-Cinca, C., Gutierrez-Nieto, B., Lopez-Palacios, L., 2015. Determinants of Default
ACCESS.2019.2919727. in P2P Lending. Plos One, 10(10), e0139427. https://doi.org/10.1371/journal.
Li, W., Ding, S., Chen, Y., Yang, S., 2018a. Heterogeneous ensemble for default prediction pone.0139427.
of peer-to-peer lending in China. IEEE Access 6, 54396–54406. https://doi.org/ Soo, H., A, 2016. FinTech supporting Government’s Policy, its Implementing Measures
10.1109/access.2018.2810864. and Legal Institution in UK- focused on the Payment Service Industry. Kangwon Law
Li, W., Ding, S., Wang, H., Chen, Y., Yang, S., 2020. Heterogeneous ensemble learning Review 49, 179–219. https://doi.org/10.18215/kwlr.2016.49.179.
with feature engineering for default prediction in peer-to-peer lending in China. Stofa, T., 2017. Analysis of repayment failures in P2P Lending. In: Gavurova, B., Soltes,
World Wide Web-Internet and Web Information Systems 23 (1), 23–45. https://doi. M. (Eds.), Central European Conference in Finance and Economics CEFE 2017, pp.
org/10.1007/s11280-019-00676-y. 773-781.
Li, Y., Hao, A., Zhang, X., Xiong, X., 2018b. Network topology and systemic risk in Peer- Sungbok, L., 2018. Study on the Financial Intermediary Role of P2P Lending Platform -
to-Peer lending market. Physica A-Statistical Mechanics and Its Applications 508, P2p. Journal of Money and Finance 32 (2), 21–62. https://doi.org/10.21023/j
118–130. https://doi.org/10.1016/j.physa.2018.05.083. mf.32.2.2.
Lin, X.C., Li, X.L., Zheng, Z., 2017. Evaluating borrower’s default risk in peer-to-peer Tan, Y., Zheng, X., Zhu, M., Wang, C., Zhu, Z., Yu, L., 2017. Investment Recommendation
lending: evidence from a lending platform in China. Appl. Econ. 49 (35), 3538–3545. with Total Capital Value Maximization in Online P2P Lending. In: Hussain, O.,
https://doi.org/10.1080/00036846.2016.1262526. Jiang, L., Fei, X., Lan, C.W., Chao, K.M. (Eds.), 2017 Ieee 14th International
Liu, C., Yan, J., 2016. Researches on Risks and Precautions of Chinese P2P Lending. In: Conference on E-Business Engineering. https://doi.org/10.1109/icebe.2017.32.
Kuek, M., Zhao, R. (Eds.), Proceedings of the 23rd International Business Annual Tao, Q., Dong, Y., Lin, Z., 2017. Who can get money? Evidence from the Chinese peer-to-
Conference. peer lending platform. Information Systems Frontiers 19 (3), 425–441. https://doi.
Liu, H., Zhou, S., Yang, W., 2019. Research on Intelligent Inter net Financial Investment org/10.1007/s10796-017-9751-5.
Model. In: R. Su (Ed.), In 2019 International Conference on Image and Video Uddin, M.J., Vizzari, G., Bandini, S., Imam, M.O., 2018. A case-based reasoning approach
Processing, and Artificial Intelligence, vol. 11321. International Society for Optics to rate microcredit borrower risk in online Kiva P2P lending model. Data
and Photonics, , p. 113211P. https://doi.org/10.1117/12.2539006. Technologies and Applications 52 (1), 58–83. https://doi.org/10.1108/dta-02-2017-
Liu, Y., Zhou, Q., Zhao, X., Wang, Y., 2018. Can listing information indicate borrower 0009.
credit risk in online peer-to-peer lending? Emerging Markets Finance and Trade 54 Van-Sang, H., Dang-Nhac, L., Choi, G. S., Ha-Nam, N., Yoon, B., 2019. Improving Credit
(13), 2982–2994. https://doi.org/10.1080/1540496x.2018.1427061. Risk Prediction in Online Peer-to-Peer {(P2P)} Lending Using Feature selection with
Ma, L., Zhao, X., Zhou, Z., Liu, Y., 2018a. A new aspect on P2P online lending default Deep learning. In: 2019 21st International Conference on Advanced Communication
prediction using meta-level phone usage data in China. Decis. Support Syst. 111, Technology, 6(1), pp. 20–31. https://doi.org/10.23919/ICACT.2019.8701943.
60–71. https://doi.org/10.1016/j.dss.2018.05.001. van Eck, N.J., Waltman, L., 2010. Software survey: VOSviewer, a computer program for
Ma, X., Sha, J., Wang, D., Yu, Y., Yang, Q., Niu, X., 2018b. Study on a prediction of P2P bibliometric mapping. Scientometrics 84 (2), 523–538. https://doi.org/10.1007/
network loan default based on the machine learning LightGBM and XGboost s11192-009-0146-3.
algorithms according to different high dimensional data cleaning. Electron. Commer. Van Eck, N. J., & Waltman, L. (2011). Text mining and visualization using VOSviewer.
Res. Appl. 31, 24–39. https://doi.org/10.1016/j.elerap.2018.08.002. ArXiv Preprint ArXiv:1109.2058.
Malekipirbazari, M., Aksakalli, V., 2015. Risk assessment in social lending via random van Eck, N. J., & Waltman, L. (2020): VOSviewer Manual 1.6.16. Manual (version
forests. Expert Syst. Appl. 42 (10), 4621–4631. https://doi.org/10.1016/j. 1.6.16). Available at https://www.vosviewer.com/documentation/Manual_
eswa.2015.02.001. VOSviewer_1.6.16.pdf.
Milne, A., Parboteeah, P., 2016. The Business Models and Economics of Peer-to-Peer Waltman, L., van Eck, N.J., Noyons, E.C.M., 2010. A unified approach to mapping and
Lending. Centre for European Policy Studies, 17, 36. European Credit Research clustering of bibliometric networks. Journal of Informetrics 4 (4), 629–635. htt
Institute (ECRI) http://aei.pitt.edu/76108/1/ECRI_RR17_P2P_Lending.pdf - ps://doi.org/10.1016/j.joi.2010.07.002.
Technical Report. Wan, J., Zhang, H., Zhu, X., Sun, X., and Li, G. (2019). Research on Influencing Factors of
Molnar, C., 2021. Interpretable Machine Learning. A Guide for Making Black Box Models P2P Network Loan Prepayment Risk Based on Cox Proportional Hazards. In E.
Explainable. https://christophm. github.io/interpretable-ml-book/. HerreraViedma, Y. Shi, D. Berg, J. Tien, F. J. Cabrerizo, and J. Li (Eds.), 7th
Namvar, A., Naderpour, M., Ieee, 2018. Handling uncertainty in social lending credit risk International Conference on Information Technology and Quantitative Management
prediction with a Choquet fuzzy integral model. In: 2018 Ieee International (Vol. 162, pp. 842–848). https://doi.org/10.1016/j.procs.2019.12.058.
Conference on Fuzzy Systems. Wang, C., Han, D., Liu, Q., Luo, S., 2019. A Deep learning approach for credit scoring of
Nguyen Truong, T., Khuat Thanh, S., Ngo Thi Thu, T., Nguyen Ha, N., Tran Manh, D., peer-to-peer lending using attention mechanism LSTM. IEEE Access 7, 2161–2168.
2019. Improve Risk Prediction in Online Lending (P2P) Using Feature Selection and https://doi.org/10.1109/access.2018.2887138.
Deep Learning. Int. J. Comput. Sci. Network Security, 19(11), 216–222. Wang, H., Kou, G., Peng, Y., 2018a. Cost-sensitive Classifiers in Credit Rating A
Niu, B., Ren, J., Li, X., 2019. Credit scoring using machine learning by combing social Comparative Study on P2P Lending. In: Dzitac, I., Filip, F.G., Manolescu, M.J.,
network information: evidence from peer-to-peer lending. Information 10 (12), 397. Dzitac, S., Oros, H., Dzitac, D. (Eds.), 2018 7th International Conference on
https://doi.org/10.3390/info10120397. Computers Communications and Control.
Park, S., Choi, D., 2019. A study on P2P lending deadline prediction model based on Wang, L., 2018. Supervision of Peer-to-Peer Lending in China. In: Liu, J., Teves, K.L.
machine learning. Journal of KIISE 46 (2), 174–183. (Eds.), Proceedings of the 2018 2nd International Conference on Education,
Pierrakis, Y., 2019. Peer-to-peer lending to businesses: Investors’ characteristics, Economics and Management Research, vol. 182, pp. 291–293.
investment criteria and motivation. International Journal of Entrepreneurship and Wang, S., Qi, Y., Fu, B., Liu, H., 2016. Credit Risk evaluation based on text analysis. Int.
Innovation 20 (4), 239–251. https://doi.org/10.1177/1465750319842528. J. Cognit. Inf. Nat. Intell. 10(1), 1–11. https://doi.org/10.4018/ijcini.2016010101.
Pokorna, M., Sponer, M., 2016. Social lending and its risks. In: Kapounek, S., Krutilova, Wang, Z., Jiang, C., Ding, Y., Lyu, X., Liu, Y., 2018b. A Novel behavioral scoring model
V. (Eds.), 19th International Conference Enterprise and Competitive Environment for estimating probability of default over time in peer-to-peer lending. Electron.
2016, vol. 220, pp. 330–337. https://doi.org/10.1016/j.sbspro.2016.05.506. Commer. Res. Appl. 27, 74–82. https://doi.org/10.1016/j.elerap.2017.12.006.
Pur, S., Huesig, S., Mann, H.-G., Schmidhammer, C., 2014. How to Analyze the Wang, Z., Jiang, C., Zhao, H., Ding, Y., 2020. Mining semantic soft factors for credit risk
Disruptive Potential of Business Model Innovation in Two-Sided Markets?: The Case evaluation in peer-to-peer lending. Journal of Management Information Systems 37
of Peer to Peer Lending Marketplaces in Germany. In: Kocaoglu, D. F., Anderson, T. (1), 282–308. https://doi.org/10.1080/07421222.2019.1705513.
R., Daim, T. U., Kozanoglu, D. C., Niwa, K., Perman, G. (Eds.), 2014 Portland Wei, X., Gotoh, J., Uryasev, S., 2018. Peer-to-peer lending: classification in the loan
International Conference on Management of Engineering and Technology, pp. application process. Risks 6 (4), 129. https://doi.org/10.3390/risks6040129.
693–709.

23
M.-J. Ariza-Garzón et al. Electronic Commerce Research and Applications 49 (2021) 101079

Wu, C., Zhang, D., Wang, Y., 2018. Evaluating the risk performance of online peer-to- Yao, J., Chen, J., Wei, J., Chen, Y., Yang, S., 2019. The relationship between soft
peer lending platforms in China. Journal of Risk Model Validation 12 (2), 63–87. information in loan titles and online peer-to-peer lending: evidence from RenRenDai
https://doi.org/10.21314/jrmv.2018.187. platform. Electronic Commerce Research 19 (1), 111–129. https://doi.org/10.1007/
Xia, L., Li, J., 2016. Analysis on Credit Risk Assessment of P2P. In: Qi, E., Shen, J., s10660-018-9293-z.
Dou, R. (Eds.), Proceedings of the 22nd International Conference on Industrial Ye, X., Dong, L.-A., Ma, D., 2018. Loan evaluation in P2P lending based on Random
Engineering and Engineering Management: Core Theory and Applications of Forest optimized by genetic algorithm with profit score. Electron. Commer. Res.
Industrial Engineering. https://doi.org/10.2991/978-94-6239-180-2_86. Appl. 32, 23–36. https://doi.org/10.1016/j.elerap.2018.10.004.
Xia, Y., 2019. A novel reject inference model using outlier detection and gradient Yli-Huumo, J., Ko, D., Choi, S., Park, S., Smolander, K., 2016. Where is current research
boosting technique in peer-to-peer lending. IEEE Access 7, 92893–92907. https:// on blockchain technology?—a systematic review. PloS One, 11(10), e0163477.
doi.org/10.1109/Access.628763910.1109/ACCESS.2019.2927602. Yuan, Z. N., Wang, Z. H., Xu, H., 2018. Credit Risk Assessment of Peer-to-Peer Lending
Xia, Y., He, L., Li, Y., Liu, N., Ding, Y., 2019. Predicting loan default in peer-to-peer Borrower Utilizing {BP} Neural Network. In: Barolli, L., Zhang, M., Wang, X. A.
lending using narrative data. Journal of Forecasting 39 (2), 260–280. https://doi. (Eds.), Advances in Internetworking, Data and Web Technologies, Eidwt-2017, vol.
org/10.1002/for.v39.210.1002/for.2625. 6, pp. 22–33. https://doi.org/10.1007/978-3-319-59463-7_3.
Xia, Y., Liu, C., Liu, N., 2017. Cost-sensitive boosted tree for loan evaluation in peer-to- Zang, D., Qi, M., Fu, Y., 2015. The credit risk assessment of P2P lending based on BP
peer lending. Electron. Commer. Res. Appl. 24, 30–49. https://doi.org/10.1016/j. neural network. In: Lee, G. (Ed.), Industrial Engineering and Management Science
elerap.2017.06.004. (Vol. 2, p. 91).
Xia, Y., Yang, X., Zhang, Y., 2018. A rejection inference technique based on contrastive Zhang, Y., Geng, X., Jia, H., 2017. The Scoring Matrix Generation Method and
pessimistic likelihood estimation for P2P lending. Electron. Commer. Res. Appl. 30, Recommendation algorithm in P2P Lending. In: Bahsoon, R., Chen, Z. (Eds.), 2017
111–124. https://doi.org/10.1016/j.elerap.2018.05.011. 13th Ieee World Congress on Services, pp. 86–89. https://doi.org/10.1109/
Xinmin, W., Peng, H., Akram, U., Yan, M., Attiq, S., 2019. The effect of successful services.2017.22.
borrowing times on behavior of investors: An empirical investigation of the P2P Promoting Business Analytics and Quantitative Management of Technology Vol. 91,
online lending market. Hum. Syst. Manage. 38 (4), 385–393. https://doi.org/ 2016, 168–174. https://doi.org/10.1016/j.procs.2016.07.055.
10.3233/hsm-190517. Zhang, Y., Wang, D., Chen, Y., Shang, H., Tian, Q., 2017. Credit Risk Assessment Based
Xiong, J., 2018. Risk Identification and Monitoring Model of Online P2P Lending. In: Liu, on Long Short-Term Memory Model. In: Huang, D. S., Jo, K. H., FigueroaGarcia, J. C.
J., Teves, K.L. (Eds.), Proceedings of the 2018 2nd International Conference on (Eds.), Intelligent Computing Theories and Application, Icic 2017, Pt Ii, vol. 10362,
Education, Economics and Management Research, vol. 182, pp. 360–363. pp. 700–712. https://doi.org/10.1007/978-3-319-63312-1_62.
Xu, J., Chau, M., 2018. Cheap talk? The impact of lender-borrower communication on Zhang, Y., Wang, D., Chen, Y., Zhao, Y., Shao, P., Meng, Q., 2017. Credit Risk Assessment
peer-to-peer lending outcomes. Journal of Management Information Systems 35 (1), Based on Flexible Neural Tree Model. In: Cong, F., Leung, A., Wei, Q. (Eds.),
53–85. https://doi.org/10.1080/07421222.2018.1440776. Advances in Neural Networks, Pt I (Vol. 10261, pp. 215–222). https://doi.org/
Xu, J., Chen, D., Chau, M., 2016. Identifying features for detecting fraudulent loan 10.1007/978-3-319-59072-1_26.
requests on P2P platforms. In: Zhou, L., Kaati, L., Mao, W., Wang, G.A. (Eds.), Ieee Zhang, Y., Wang, X., Qian, Y., Jia, H., 2016b. The Research of Recommendation
International Conference on Intelligence and Security Informatics (ISI): Algorithms in P2P Lending. DEStech Transactions on Engineering and Technology
Cybersecurity and Big Data, pp. 79–84. Research.
Xu, J. J., Lu, Y., Chau, M., 2015. P2P Lending Fraud Detection: A Big Data Approach. In: Zhao, J., 2015. Research on Mathematical Model P2P Online Credit Risk Evaluation
Chau, M., Wang, G.A., Chen, H. (Eds.), Intelligence and Security Informatics, Paisi Based on Data Processing. In: Wang, J., Qin, Y. (Eds.), Proceedings of the 2015
2015, vol. 9074, pp. 71–81. https://doi.org/10.1007/978-3-319-18455-5_5. International Conference on Education Technology, Management and Humanities
Xu, L., Zhang, Y., 2017. A credit rating model for online P2P lending based on analytic Science, vol. 27, pp. 897–900.
hierarchy process. In: Xu, J., Hajiyev, A., Nickel, S., Gen, M. (Eds.), Proceedings of Zhou, G., Zhang, Y., Luo, S., 2018. P2P network lending, loss given default and credit
the Tenth International Conference on Management Science and Engineering risks. Sustainability 10 (4), 1010. https://doi.org/10.3390/su10041010.
Management, vol. 502, pp. 537–549. https://doi.org/10.1007/978-981-10-1837-4_ Zhou, J., Li, W., Wang, J., Ding, S., Xia, C., 2019. Default prediction in P2P lending from
46. high-dimensional data based on machine learning. Physica A-Statistical Mechanics
Yan, Y., Lv, Z., Hu, B., 2017. Building investor trust in the P2P lending platform with a and Its Applications 534, 122370. https://doi.org/10.1016/j.physa:2019.122370.
focus on Chinese P2P lending platforms. Electronic Commerce Research 18 (2), Zhu, L., Qiu, D., Ergu, D., Ying, C., Liu, K., 2019. A study on predicting loan default based
203–224. https://doi.org/10.1007/s10660-017-9255-x. on the random forest algorithm. In: HerreraViedma, E., Shi, Y., Berg, D., Tien, J.,
Yan, Y., Lv, Z., Hu, B., 2016. Building Investor Trust in the P2P Lending Platform with a Cabrerizo, F. J., Li, J. (Eds.), 7th International Conference on Information
Focus on Chinese P2P Lending Platforms. In: 2016 International Conference on Technology and Quantitative Management, vol. 162, pp. 503–513. https://doi.org/
Identification, Information and Knowledge in the Internet of Things, pp. 470–474. 10.1016/j.procs.2019.12.017.
https://doi.org/10.1109/iiki.2016.15. Zhu, Z., 2018. Safety promise, moral hazard and financial supervision: Evidence from
Yang, X., Fan, W., Wang, L., Yang, S., Wang, W., 2019. Risk Control of Online P2P peer-to-peer lending. Finance Research Letters 27, 1–5. https://doi.org/10.1016/j.
Lending in China Based on Health Investment. Ekoloji 28 (107), 2013–2022. frl.2018.07.002.
