Analyzing Textual Information from Financial Statements for Default Prediction

Chinesh Doshi,
Himani Shrotiya,
Rohit Bhiogade,
Himanshu S. Bhatt &
Abhishek Jha

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 14189)

Included in the following conference series:

International Conference on Document Analysis and Recognition

Abstract

Financial statements provide a view of a company's financial status at a specific point in time, covering both quantitative and qualitative information. Besides the quantitative information, the paper asserts that the qualitative information present in the form of textual disclosures has high discriminating power for predicting financial default. To this end, the paper presents a technique to capture comprehensive 360-degree features from qualitative textual data at multiple granularities. The paper proposes a new sentence embedding (SE) from large language models built specifically for the financial domain to encode the textual data, and presents three deep learning models built on SE for financial default prediction. To accommodate unstructured and non-standard financial statements from small and unlisted companies, the paper also presents a document processing pipeline so that such companies can be included in financial text modelling. Finally, the paper presents comprehensive experimental results on two datasets demonstrating the discriminating power of textual features for predicting financial defaults.
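
As a rough illustration of how sentence-level embeddings could be aggregated into features at more than one granularity, the sketch below mean-pools token vectors into a sentence embedding and then averages sentence embeddings into a document-level vector. The abstract does not specify how the paper's SE is constructed, so the pooling strategy, the function names (`mean_pool`, `document_embedding`), and the toy vectors are all assumptions made for illustration; in practice the token vectors would come from a financial-domain language model rather than hand-written lists.

```python
from typing import List, Sequence

Vector = List[float]


def mean_pool(token_vectors: Sequence[Sequence[float]],
              attention_mask: Sequence[int]) -> Vector:
    """Average the token vectors whose mask entry is 1 (padding is skipped).

    Assumption: mean pooling is used only as a common, simple choice;
    it is not confirmed to be the paper's method.
    """
    if len(token_vectors) != len(attention_mask):
        raise ValueError("token_vectors and attention_mask must have equal length")
    kept = [vec for vec, keep in zip(token_vectors, attention_mask) if keep]
    if not kept:
        raise ValueError("at least one token must be unmasked")
    dim = len(kept[0])
    return [sum(vec[i] for vec in kept) / len(kept) for i in range(dim)]


def document_embedding(sentence_embeddings: Sequence[Sequence[float]]) -> Vector:
    """Average sentence embeddings into one document-level vector."""
    return mean_pool(sentence_embeddings, [1] * len(sentence_embeddings))


# Toy 2-dimensional token vectors standing in for language-model outputs.
# The third token of the first sentence is padding and is masked out.
sentence_a = mean_pool([[1.0, 2.0], [3.0, 4.0], [0.0, 0.0]], [1, 1, 0])
sentence_b = mean_pool([[5.0, 6.0]], [1])
document = document_embedding([sentence_a, sentence_b])
```

Keeping both the per-sentence vectors and the pooled document vector is one simple way to expose features at two granularities to a downstream classifier.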

Author information

Authors and Affiliations

AI Labs, American Express, New York, USA
Chinesh Doshi, Himani Shrotiya, Rohit Bhiogade, Himanshu S. Bhatt & Abhishek Jha

Corresponding author

Correspondence to Rohit Bhiogade.

Editor information

Editors and Affiliations

TU Dortmund University, Dortmund, Germany
Gernot A. Fink
Adobe, College Park, MD, USA
Rajiv Jain
Osaka Metropolitan University, Osaka, Japan
Koichi Kise
Rochester Institute of Technology, Rochester, NY, USA
Richard Zanibbi

About this paper

Cite this paper

Doshi, C., Shrotiya, H., Bhiogade, R., Bhatt, H.S., Jha, A. (2023). Analyzing Textual Information from Financial Statements for Default Prediction. In: Fink, G.A., Jain, R., Kise, K., Zanibbi, R. (eds) Document Analysis and Recognition - ICDAR 2023. ICDAR 2023. Lecture Notes in Computer Science, vol 14189. Springer, Cham. https://doi.org/10.1007/978-3-031-41682-8_4

DOI: https://doi.org/10.1007/978-3-031-41682-8_4
Published: 19 August 2023
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-41681-1
Online ISBN: 978-3-031-41682-8
eBook Packages: Computer Science, Computer Science (R0)
