A machine-learning approach to negation and speculation detection in clinical texts

Authors: Noa P. Cruz Díaz, Manuel J. Maña López, Jacinto Mata Vázquez, Victoria Pachón Álvarez

Journal of the American Society for Information Science and Technology, Volume 63, Issue 7

Pages 1398 - 1410

https://doi.org/10.1002/asi.22679

Published: 01 July 2012

Abstract

Detecting negative and speculative information is essential in most biomedical text-mining tasks, where these language forms are used to express impressions, hypotheses, or explanations of experimental results. Our research focuses on developing a machine-learning system that identifies negation and speculation signals and their scope in clinical texts. The proposed system works in two consecutive phases: first, a classifier decides whether each token in a sentence is a negation/speculation signal. Then a second classifier determines, at sentence level, which tokens are affected by the signals previously identified. The system was trained and evaluated on the clinical texts of the BioScope corpus, a freely available resource consisting of medical and biological texts: full-length articles, scientific abstracts, and clinical reports. The results were compared with those of two other systems, one based on regular expressions and the other on machine learning, and our system outperformed both. In the signal detection task, the F-score was 97.3% for negation and 94.9% for speculation. In the scope-finding task, a token was counted as correctly classified if it had been properly identified as being inside or outside the scope of all the negation signals present in the sentence. Our proposal achieved an F-score of 93.2% for negation and 80.9% for speculation. Additionally, the percentage of correct scopes (those with all their tokens correctly classified) was evaluated, yielding F-scores of 90.9% for negation and 71.9% for speculation. © 2012 Wiley Periodicals, Inc.
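The two scope-level measures mentioned in the abstract can be illustrated with a minimal sketch. It assumes each sentence is represented as a list of booleans (True when a token lies inside the scope of a signal); the function names `token_f_score` and `correct_scope_rate` are hypothetical, and the exact scoring protocol used in the article may differ from what is shown here.

```python
from typing import List, Tuple

# Assumption: one list of booleans per sentence, True = token inside a scope.
Labels = List[bool]


def token_f_score(gold: List[Labels], pred: List[Labels]) -> Tuple[float, float, float]:
    """Return (precision, recall, F1) over tokens labelled as inside a scope."""
    tp = fp = fn = 0
    for gold_sent, pred_sent in zip(gold, pred):
        for g, p in zip(gold_sent, pred_sent):
            if g and p:
                tp += 1          # token correctly placed inside the scope
            elif p and not g:
                fp += 1          # token wrongly placed inside the scope
            elif g and not p:
                fn += 1          # in-scope token that was missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


def correct_scope_rate(gold: List[Labels], pred: List[Labels]) -> float:
    """Fraction of scopes whose tokens are all classified correctly."""
    if not gold:
        return 0.0
    exact = sum(1 for g, p in zip(gold, pred) if g == p)
    return exact / len(gold)
```

Under this representation, a single misplaced token lowers the token-level score only slightly but makes the whole scope count as incorrect, which is consistent with the scope-level figures in the abstract being lower than the token-level ones.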

Published In

Journal of the American Society for Information Science and Technology Volume 63, Issue 7

July 2012

191 pages

ISSN:1532-2882

Publisher

John Wiley & Sons, Inc.

United States

Cited By

Solarte-Pabón O, Menasalvas E, Rodriguez-González A (2020). Spa-neg: An Approach for Negation Detection in Clinical Text Written in Spanish. Bioinformatics and Biomedical Engineering, pp. 323-337. Online publication date: 6-May-2020.
https://dl.acm.org/doi/10.1007/978-3-030-45385-5_29
(2019). Negation scope detection with recurrent neural networks models in review texts. International Journal of High Performance Computing and Networking, 13:2, pp. 211-221. Online publication date: 1-Jan-2019.
https://dl.acm.org/doi/10.5555/3319261.3319269
Shi J, Hurdle J (2018). Trie-based rule processing for clinical NLP. Journal of Biomedical Informatics, 85:C, pp. 106-113. Online publication date: 1-Sep-2018.
https://dl.acm.org/doi/10.1016/j.jbi.2018.08.002
Kang T, Zhang S, Xu N, Wen D, Zhang X, Lei J (2017). Detecting negation and scope in Chinese clinical notes using character and word embedding. Computer Methods and Programs in Biomedicine, 140:C, pp. 53-59. Online publication date: 1-Mar-2017.
https://dl.acm.org/doi/10.1016/j.cmpb.2016.11.009
Jean P, Harispe S, Ranwez S, Bellot P, Montmain J (2016). Uncertainty detection in natural language. Proceedings of the 6th International Conference on Web Intelligence, Mining and Semantics, pp. 1-10. Online publication date: 13-Jun-2016.
https://dl.acm.org/doi/10.1145/2912845.2912873
Zhang S, Kang T, Zhang X, Wen D, Elhadad N, Lei J (2016). Speculation detection for Chinese clinical notes. Journal of Biomedical Informatics, 60:C, pp. 334-341. Online publication date: 1-Apr-2016.
https://dl.acm.org/doi/10.1016/j.jbi.2016.02.011
Cruz N, Taboada M, Mitkov R (2016). A machine-learning approach to negation and speculation detection for sentiment analysis. Journal of the Association for Information Science and Technology, 67:9, pp. 2118-2136. Online publication date: 1-Sep-2016.
https://dl.acm.org/doi/10.1002/asi.23533
Mehrabi S, Krishnan A, Sohn S, Roch A, Schmidt H, Kesterson J, Beesley C, Dexter P, Max Schmidt C, Liu H, Palakal M (2015). DEEPEN. Journal of Biomedical Informatics, 54:C, pp. 213-219. Online publication date: 1-Apr-2015.
https://dl.acm.org/doi/10.1016/j.jbi.2015.02.010
Karl A, Wisnowski J, Rushing W (2015). A practical guide to text mining with topic extraction. WIREs Computational Statistics, 7:5, pp. 326-340. Online publication date: 1-Sep-2015.
https://dl.acm.org/doi/10.1002/wics.1361
Fujikawa K, Seki K, Uehara K, Tao C, Bouamrane M (2012). A hybrid approach to finding negated and uncertain expressions in biomedical documents. Proceedings of the 2nd international workshop on Managing interoperability and compleXity in health systems, pp. 67-74. Online publication date: 29-Oct-2012.
https://dl.acm.org/doi/10.1145/2389672.2389685
