Abstract
Financial statements provide a view of a company's financial status at a specific point in time, covering both quantitative and qualitative information. Beyond the quantitative information, the paper asserts that the qualitative information present in textual disclosures has high discriminating power for predicting financial default. To this end, the paper presents a technique to capture comprehensive 360-degree features from qualitative textual data at multiple granularities. The paper proposes a new sentence embedding (SE), derived from large language models built specifically for the financial domain, to encode the textual data, and presents three deep learning models built on SE for financial default prediction. To accommodate unstructured and non-standard financial statements from small and unlisted companies, the paper also presents a document processing pipeline so that such companies can be included in financial text modelling. Finally, the paper presents comprehensive experimental results on two datasets demonstrating the discriminating power of textual features for predicting financial defaults.
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Doshi, C., Shrotiya, H., Bhiogade, R., Bhatt, H.S., Jha, A. (2023). Analyzing Textual Information from Financial Statements for Default Prediction. In: Fink, G.A., Jain, R., Kise, K., Zanibbi, R. (eds) Document Analysis and Recognition - ICDAR 2023. ICDAR 2023. Lecture Notes in Computer Science, vol 14189. Springer, Cham. https://doi.org/10.1007/978-3-031-41682-8_4
DOI: https://doi.org/10.1007/978-3-031-41682-8_4
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-41681-1
Online ISBN: 978-3-031-41682-8
eBook Packages: Computer Science (R0)