Image Text Extraction and Natural Language Processing of Unstructured Data from Medical Reports
Figure 1. Diagram illustrating the research methodology employed in the study.
Figure 2. (a,c) Processing of a blurry image with the Laplace and Sobel operators, respectively. (b,d) Processing of a sharp image with the Laplace and Sobel operators, respectively. In each panel, the original image is on the right and the image processed with the corresponding operator is on the left.
Figure 3. (a) The number of combinations of readable/dark reports among all available reports; (b) the same among printed reports only.
Figure 4. Correlation matrix of automatically detected parameters in medical report recognition.
Figure 5. Illustration of the proposed approach, which uses a genetic algorithm (GA) to optimize OCR parameters and maximize character recognition in document images, facilitating enhanced entity extraction through NLP techniques.
Figure 6. Illustration of the possible structure of the proposed neural network (NN).
Figure 7. Schema of the adaptive model, which adjusts image parameters and OCR settings to optimize information extraction (IE) from document images.
Figure 8. Comparison of recognized Taxpayer Identification Numbers (TINs) before and after the application of the adaptive model for (a) 12-digit TINs and (b) 10-digit organization TINs. The bar chart covers 1682 printed reports; each pair of adjacent bars shows the number of correctly recognized digits before (red) and after (blue) the application of the adaptive model. (c) An example of a successfully recognized TIN in which all 12 digits are correctly identified. (d) Another instance in which only 11 out of 12 digits are correctly recognized. One possible reason for this discrepancy is the tilt of the medical report image.
Figure 9. Comparison of recognized payment dates costs (a) before and (b) after the application of the adaptive model for 1382 documents. Distribution of recognized address entities (c) before and (d) after the application of the adaptive model to the OCR process. Red corresponds to the number of incorrectly determined entities, while blue indicates correct ones.
Abstract
1. Introduction
2. Materials and Methods
2.1. Manual Quality Check of Reports
2.2. Automated Processing of Reports
2.3. NLP-Based Text Structurization
3. Results
3.1. Manual Labeling Statistics
3.2. Correlation Analysis of Parameters
3.3. Adaptive Model for Document Recognition
3.4. Adaptive Model Outcomes
4. Discussion
5. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
Appendix A. Image Characteristics
- Brightness: Ensures adequate illumination for accurate text recognition.
- Image Sharpness (Laplace Operator): Detects areas with rapid intensity changes, crucial for identifying text clarity.
- Image Sharpness (Sobel Operator): Highlights brightness gradients to enhance edge detection and overall text sharpness.
- Image Length and Height: Provides dimensions for contextualizing image content.
- Standard Deviation of Pixel Values: Quantifies pixel value variability, indicative of image clarity.
- Dark Pixels: Identifies low-intensity areas, aiding in contrast enhancement for text extraction.
- Bright Pixels: Highlights high-intensity areas, assisting in text differentiation from backgrounds.
- Medium Brightness Pixels: Captures text within a balanced brightness range for optimal recognition.
- Total Number of Pixels: Determines image resolution and detail for accurate text extraction.
- Variation: Measures noise level, influencing the clarity of text recognition.
- Entropy: Indicates image complexity, affecting text extraction reliability.
- Percentage Metrics: Offer insights into brightness distribution, aiding in text segmentation and extraction.
- Angle: The skew angle of the document, i.e., the rotation required to align the text horizontally.
- Segment Count: The number of identified regions containing text.
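As a minimal sketch of how several of the characteristics listed above could be computed, the following Python code derives brightness, standard deviation, dark/medium/bright pixel percentages, entropy, and Laplace- and Sobel-based sharpness from a grayscale image given as a list of rows of 0–255 integers. It is an illustration under stated assumptions rather than the implementation used in the study: the function names, the thresholds `dark_max=63` and `bright_min=192`, and the choice of Laplacian variance and mean Sobel gradient magnitude as sharpness scores are all hypothetical. Only the standard library is used so that the example is self-contained; a production pipeline would more likely rely on an image-processing library.

```python
import math
from collections import Counter
from statistics import mean, pstdev, pvariance

# Standard 3x3 kernels for the Laplace and Sobel operators.
LAPLACE = [[0, 1, 0], [1, -4, 1], [0, 1, 0]]
SOBEL_X = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]
SOBEL_Y = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]


def filter3x3(img, kernel):
    """Apply a 3x3 kernel (as a correlation) to the interior pixels of img."""
    h, w = len(img), len(img[0])
    return [
        [
            sum(
                kernel[dy + 1][dx + 1] * img[y + dy][x + dx]
                for dy in (-1, 0, 1)
                for dx in (-1, 0, 1)
            )
            for x in range(1, w - 1)
        ]
        for y in range(1, h - 1)
    ]


def image_characteristics(img, dark_max=63, bright_min=192):
    """Return a dict of image statistics (thresholds are assumed values)."""
    pixels = [p for row in img for p in row]
    n = len(pixels)

    # Sharpness via Laplace: variance of the second-derivative response.
    lap = [v for row in filter3x3(img, LAPLACE) for v in row]
    # Sharpness via Sobel: mean magnitude of the brightness gradient.
    gx, gy = filter3x3(img, SOBEL_X), filter3x3(img, SOBEL_Y)
    grad = [math.hypot(a, b) for ra, rb in zip(gx, gy) for a, b in zip(ra, rb)]

    # Shannon entropy (in bits) of the intensity histogram.
    entropy = -sum((c / n) * math.log2(c / n) for c in Counter(pixels).values())

    dark = sum(p <= dark_max for p in pixels)
    bright = sum(p >= bright_min for p in pixels)
    return {
        "height": len(img),
        "width": len(img[0]),
        "total_pixels": n,
        "brightness": mean(pixels),
        "std": pstdev(pixels),
        "dark_pct": 100 * dark / n,
        "bright_pct": 100 * bright / n,
        "medium_pct": 100 * (n - dark - bright) / n,
        "entropy": entropy,
        "sharpness_laplace": pvariance(lap),
        "sharpness_sobel": mean(grad),
    }


# Synthetic test images: a hard vertical edge and a smooth horizontal ramp.
sharp = [[0] * 4 + [255] * 4 for _ in range(8)]
blurry = [[x * 255 // 7 for x in range(8)] for _ in range(8)]

for name, image in (("sharp", sharp), ("blurry", blurry)):
    stats = image_characteristics(image)
    print(name, round(stats["sharpness_laplace"], 2), round(stats["entropy"], 2))
```

On these two synthetic images, the Laplacian variance is much larger for the hard edge than for the smooth ramp, which is the behavior a blur check would rely on. The skew angle and segment count are omitted here because they require line or region detection beyond the scope of this sketch.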
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Share and Cite
Malashin, I.; Masich, I.; Tynchenko, V.; Gantimurov, A.; Nelyub, V.; Borodulin, A. Image Text Extraction and Natural Language Processing of Unstructured Data from Medical Reports. Mach. Learn. Knowl. Extr. 2024, 6, 1361-1377. https://doi.org/10.3390/make6020064