Abstract
The medical field has undergone a series of transformations with the adoption of new technologies. One aspect that has changed significantly is how a patient’s information is stored. Electronic health records have brought many advantages but still present many issues. One of them is the degree of structuring of the information they contain. More structuring brings a greater richness of information. On the other hand, the content becomes more challenging and complex to process when most of the information is stored in free text (unstructured information). For this reason, many studies focused on structuring the information contained in free text have emerged. This work reviews the studies focused on structuring unstructured health record information, seeking to answer key questions that can guide new studies in the field: the form in which information is structured, the main techniques used, and how data are acquired for development and evaluation. To answer these questions, a broad systematic review of the field was conducted, covering the period since the emergence of BERT networks. In addition to answering those questions, this systematic review identified the main challenges, such as difficulty in data acquisition, problems with natural language processing, and the specific challenges of studies that process non-English languages. It closes with a general view of the state of the art in the field and its future opportunities.
Boshnaka H, AbdelGaberb S, AmanyAbdoc EY. Ontology-based knowledge modelling for clinical data representation in electronic health records. Int J Comp Sci Info Sec (IJCSIS) 2018;16(10).
Tomanek K, Wermter J, Hahn U. Sentence and token splitting based on conditional random fields. In: Proceedings of the 10th Conference of the Pacific Association for Computational Linguistics, Citeseer, 2007;49(57).
Deshpande S, Palshikar GK, Athiappan G. An unsupervised approach to sentence classification. In: COMAD, 2020;88
Cameron S, Turtle-Song I. Learning to write case notes using the soap format. Journal of Counseling & Development. 2002;80(3):286–92.
Gallant SI, Gallant SI. Neural network learning and expert systems. MIT press, 1993.
Dash S, Dash S, Tripathy BK, Rahman A. Handbook of Research on Modeling, Analysis, and Application of Nature-Inspired Metaheuristic Algorithms. 1st ed. USA: IGI Global, 2017.
Lafferty J, McCallum A, Pereira FC. Conditional random fields: Probabilistic models for segmenting and labeling sequence data, 2001.
Lee J, Yoon W, Kim S, Kim D, Kim S, So CH, Kang J. BioBERT: a pre-trained biomedical language representation model for biomedical text mining. Bioinformatics, 2019;36(4):1234–1240, https://doi.org/10.1093/bioinformatics/btz682, https://academic.oup.com/bioinformatics/article-pdf/36/4/1234/32527770/btz682.pdf
Zhang H, Candido E, Wilton AS, Duchen R, Jaakkimainen L, Wodchis W, Morris Q (2019a) Identifying transitional high cost users from unstructured patient profiles written by primary care physicians. In: Biocomputing 2020, World Sci, https://doi.org/10.1142%2F9789811215636_0012
Bampa M, Dalianis H. Detecting adverse drug events from swedish electronic health records using text mining. In: Proceedings of the LREC 2020 Workshop on Multilingual Biomedical Text Processing (MultiligualBIO), 2020;1–8.
Lamy M, Pereira R, Ferreira JC, Vasconcelos JB, Melo F, Velez I. Extracting clinical information from electronic medical records. In: Int Symp on Amb Intel, Springer, 2018b;113–120.
Blinov P, Avetisian M, Kokh V, Umerenkov D, Tuzhilin A. Predicting clinical diagnosis from patients electronic health records using BERT-based neural networks. In: Artificial Intelligence in Medicine, Springer International Publishing, 2020;111–121. https://doi.org/10.1007%2F978-3-030-59137-3_11.
Feng J, Shaib C, Rudzicz F. Explainable clinical decision support from text. In: Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), 2020;1478–1489.
Singh AKB, Guntu M, Bhimireddy AR, Gichoya JW, Purkayastha S. Multi-label natural language processing to identify diagnosis and procedure codes from mimic-iii inpatient notes. 2020;2003.07507
Acknowledgements
The authors would like to thank the Coordination for the Improvement of Higher Education Personnel - CAPES (Financial Code 001), the National Council for Scientific and Technological Development - CNPq (Grant number 309537 / 2020-7) and the Instituto Federal de Educação, Ciência e Tecnologia do Rio Grande do Sul (IFRS) for their support in this work.
Funding
The article was partially funded by the Coordination for the Improvement of Higher Education Personnel - CAPES (Financial Code 001), the National Council for Scientific and Technological Development - CNPq (Grant number 309537 / 2020-7), and the Instituto Federal de Educação, Ciência e Tecnologia do Rio Grande do Sul (IFRS).
Ethics declarations
Ethical approval
This article does not contain any studies with human participants or animals performed by any of the authors.
Conflict of interest
The authors declare that they have no conflict of interest.
About this article
Cite this article
de Oliveira, J.M., da Costa, C.A. & Antunes, R.S. Data structuring of electronic health records: a systematic review. Health Technol. 11, 1219–1235 (2021). https://doi.org/10.1007/s12553-021-00607-w
DOI: https://doi.org/10.1007/s12553-021-00607-w