Abstract
Electronic Medical Records (EMRs) are largely unstructured and often written in natural language. Information Extraction (IE) can be used to acquire knowledge from such texts, including the automatic recognition of meaningful entities through Named Entity Recognition (NER) models. However, most previous work on this task has targeted English. This work tests different methods on Portuguese text, specifically in the domain of Neurology, and draws conclusions from the results. It compares Conditional Random Fields (CRF), bidirectional Long Short-Term Memory with Conditional Random Fields (BiLSTM-CRF), and a BiLSTM-CRF with residual learning connections, using not only Portuguese texts from medical journals but also texts from the Neurology Service of the Coimbra Hospital and Universitary Centre (CHUC). Furthermore, it compares the performance of BiLSTM-CRF models using word embeddings (WEs) trained on clinical text with that of models using WEs trained on general-language text. Deep learning models achieved F1-scores of nearly 83% and 75% for relaxed and strict evaluation, respectively, on texts extracted from the medical journal. On texts collected from the hospital, the same models achieved F1-scores of nearly 71% and 62%. This work concludes that deep learning models outperform shallow learning models and that in-domain WEs yield better results than general-language WEs, even when the latter are trained on much more text than the former. The results also show that it is possible to extract information from hospital clinical texts with models trained on clinical cases extracted from medical journals, which are openly available. Nevertheless, such results still require a healthcare technician to check whether the information was extracted correctly.
Ethics declarations
Conflict of Interest
The authors declare no conflict of interest.
Ethical Approval
This article does not contain any studies with human participants or animals performed by any of the authors.
Additional information
This article is part of the Topical Collection on Systems-Level Quality Improvement
About this article
Cite this article
Lopes, F., Teixeira, C. & Gonçalo Oliveira, H. Comparing Different Methods for Named Entity Recognition in Portuguese Neurology Text. J Med Syst 44, 77 (2020). https://doi.org/10.1007/s10916-020-1542-8
DOI: https://doi.org/10.1007/s10916-020-1542-8