research-article

Automatic Extraction of Nested Entities in Clinical Referrals in Spanish

Authors:

Felipe Bravo-Marquez,

Jocelyn Dunstan,

Fabián Villena

ACM Transactions on Computing for Healthcare (HEALTH), Volume 3, Issue 3

Article No.: 28, Pages 1 - 22

https://doi.org/10.1145/3498324

Published: 07 April 2022

Abstract

Here we describe a new clinical corpus rich in nested entities and a series of neural models to identify them. The corpus comprises de-identified referrals from the waiting list of Chilean public hospitals. A subset of 5,000 referrals (58.6% medical and 41.4% dental) was manually annotated with 10 entity types, six attributes, and pairs of relations with clinical relevance, for a total of 110,771 annotated tokens. A trained medical doctor or dentist annotated these referrals and then, together with three other researchers, consolidated each of the annotations. In the annotated corpus, 48.17% of the entities are either embedded in another entity or contain one. We use this corpus to build models for Named Entity Recognition (NER). The best results were achieved with a Multiple Single-entity architecture that stacks clinical word embeddings with character and Flair contextual embeddings. The entity type with the best performance is abbreviation, and the hardest to recognize is finding. NER models applied to this corpus can be used to derive statistics on diseases and pending procedures. This work presents the first annotated corpus of clinical narratives from Chile and one of the few in Spanish. The annotated corpus, clinical word embeddings, annotation guidelines, and neural models are freely released to the community.
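Two ideas in the abstract are easier to see with a small example: the share of entities that are embedded in or contain another entity, and how a "Multiple Single-entity" setup can represent nesting. The sketch below is a hedged illustration, not the authors' code. It assumes that "Multiple Single-entity" means one flat BIO tagger per entity type whose predicted spans are unioned afterwards, and it omits the neural tagger itself (the stacked clinical word, character, and Flair embeddings). The helper names (`contains`, `nested_fraction`, `to_bio`, `from_bio`, `merge_layers`), the example sentence, and the `Body_Part` label are hypothetical, chosen only for illustration.

```python
from typing import Iterable, List, Tuple

# An entity is (start, end, label) over token indices; `end` is exclusive.
Entity = Tuple[int, int, str]


def contains(outer: Entity, inner: Entity) -> bool:
    """True if `outer` covers every token of `inner` and they are different entities."""
    return outer != inner and outer[0] <= inner[0] and inner[1] <= outer[1]


def nested_fraction(entities: List[Entity]) -> float:
    """Share of entities that are embedded in, or contain, another entity."""
    if not entities:
        return 0.0
    nested = sum(
        1 for e in entities
        if any(contains(o, e) or contains(e, o) for o in entities)
    )
    return nested / len(entities)


def to_bio(entities: List[Entity], label: str, n_tokens: int) -> List[str]:
    """Flat BIO layer for one entity type (the target of one single-entity tagger).

    Assumes entities of the same type do not overlap; otherwise later spans overwrite earlier ones.
    """
    tags = ["O"] * n_tokens
    for start, end, lab in entities:
        if lab != label:
            continue
        tags[start] = "B-" + label
        for i in range(start + 1, end):
            tags[i] = "I-" + label
    return tags


def from_bio(tags: List[str]) -> List[Entity]:
    """Decode one BIO layer back into spans; stray I- tags without a preceding B- are ignored."""
    spans: List[Entity] = []
    start, label = None, None
    for i, tag in enumerate(tags + ["O"]):  # trailing "O" closes a span at the end
        continues = tag.startswith("I-") and start is not None and tag[2:] == label
        if not continues:
            if start is not None:
                spans.append((start, i, label))
            start, label = (i, tag[2:]) if tag.startswith("B-") else (None, None)
    return spans


def merge_layers(layers: Iterable[List[str]]) -> List[Entity]:
    """Union of the spans decoded from every single-entity layer, sorted by position."""
    return sorted({span for tags in layers for span in from_bio(tags)})


# Illustrative fragment ("right knee pain, COPD") with one entity nested in another.
tokens = ["dolor", "en", "rodilla", "derecha", ",", "EPOC"]
gold = [(0, 4, "Finding"), (2, 4, "Body_Part"), (5, 6, "Abbreviation")]

layers = {lab: to_bio(gold, lab, len(tokens)) for lab in ("Finding", "Body_Part", "Abbreviation")}
recovered = merge_layers(layers.values())

print(round(nested_fraction(gold), 2))  # 0.67
print(recovered == sorted(gold))        # True
```

Because each type gets its own tag layer, the `Body_Part` span can sit inside the `Finding` span without conflict. The limitation of this encoding is that two overlapping entities of the same type cannot both be represented in a single layer.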



Index Terms

Automatic Extraction of Nested Entities in Clinical Referrals in Spanish
1. Applied computing
  1. Life and medical sciences
    1. Health care information systems
2. Computing methodologies
  1. Artificial intelligence
    1. Natural language processing
      1. Information extraction
      2. Language resources


Information

Published In

ACM Transactions on Computing for Healthcare Volume 3, Issue 3

July 2022

251 pages

EISSN: 2637-8051

DOI: 10.1145/3514183

Editors: Insup Lee (University of Pennsylvania, USA) and John A. Stankovic (University of Virginia, USA)

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 07 April 2022

Accepted: 01 November 2021

Revised: 01 October 2021

Received: 01 November 2020

Published in HEALTH Volume 3, Issue 3


Qualifiers

Research-article
Refereed

Funding Sources

Centro de Modelamiento Matemático (CMM)
U-INICIA VID
FONDECYT
CIMT-CORFO
ICM
Postdoctoral FONDECYT
ANID - Millennium Science Initiative Program
Supercomputing infrastructure of the NLHPC

Bibliometrics & Citations

Total citations: 7

Total downloads: 364 (75 in the last 12 months; 2 in the last 6 weeks; downloads counted up to 16 Nov 2024)

Citations

Cited By

Ahumada R, Dunstan J, Rojas M, Peñafiel S, Paredes I, Báez P. (2024). Automatic Detection of Distant Metastasis Mentions in Radiology Reports in Spanish. JCO Clinical Cancer Informatics. https://doi.org/10.1200/CCI.23.00130. Online publication date: Mar-2024.

Dunstan J, Vakili T, Miranda L, Villena F, Aracena C, Quiroga T, Vera P, Viteri Valenzuela S, Rocco V. (2024). A pseudonymized corpus of occupational health narratives for clinical entity recognition in Spanish. BMC Medical Informatics and Decision Making 24, 1. https://doi.org/10.1186/s12911-024-02609-w. Online publication date: 24-Jul-2024.

Hua Z, Chen Y. (2024). Local Metric NER: A new paradigm for named entity recognition from a multi-label perspective. Knowledge-Based Systems 305, 112686. https://doi.org/10.1016/j.knosys.2024.112686. Online publication date: Dec-2024.

Báez P, Campillos-Llanos L, Núñez F, Dunstan J. (2024). Entity normalization in a Spanish medical corpus using a UMLS-based lexicon: findings and limitations. Language Resources and Evaluation. https://doi.org/10.1007/s10579-024-09755-7. Online publication date: 2-Jul-2024.

Campillos-Llanos L. (2023). MedLexSp – a medical lexicon for Spanish medical natural language processing. Journal of Biomedical Semantics 14, 1. https://doi.org/10.1186/s13326-022-00281-5. Online publication date: 2-Feb-2023.

Chiu C, Villena F, Martin K, Núñez F, Besa C, Dunstan J. (2022). Training and intrinsic evaluation of lightweight word embeddings for the clinical domain in Spanish. Frontiers in Artificial Intelligence 5. https://doi.org/10.3389/frai.2022.970517. Online publication date: 21-Sep-2022.

Báez P, Arancibia A, Chaparro M, Bucarey T, Núñez F, Dunstan J. (2022). Procesamiento de lenguaje natural para texto clínico en español: el caso de las listas de espera en Chile [Natural language processing for clinical text in Spanish: the case of the waiting lists in Chile]. Revista Médica Clínica Las Condes 33, 6, 576–582. https://doi.org/10.1016/j.rmclc.2022.10.002. Online publication date: Nov-2022.
