A machine-learning approach to negation and speculation detection in clinical texts

Published: 01 July 2012

Abstract

Detecting negative and speculative information is essential in most biomedical text-mining tasks, where these language forms are used to express impressions, hypotheses, or explanations of experimental results. Our research focuses on developing a machine-learning system that identifies negation and speculation signals and their scope in clinical texts. The proposed system works in two consecutive phases: first, a classifier decides whether each token in a sentence is a negation/speculation signal. Then a second classifier determines, at sentence level, which tokens are affected by the signals previously identified. The system was trained and evaluated on the clinical texts of the BioScope corpus, a freely available resource consisting of medical and biological texts: full-length articles, scientific abstracts, and clinical reports. The results were compared with those of two other systems, one based on regular expressions and the other on machine learning, and our system outperformed both. In the signal detection task, the F score was 97.3% in negation and 94.9% in speculation. In the scope-finding task, a token was counted as correctly classified if it had been properly identified as being inside or outside the scope of all the negation signals present in the sentence. Our proposal achieved an F score of 93.2% in negation and 80.9% in speculation. Additionally, the percentage of correct scopes (those with all their tokens correctly classified) was evaluated, obtaining F scores of 90.9% in negation and 71.9% in speculation. © 2012 Wiley Periodicals, Inc.
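The two scope-finding measures mentioned in the abstract can be illustrated with a minimal sketch. It assumes each sentence is represented as a list of booleans (True meaning the token lies inside a scope), treats "inside scope" as the positive class, and counts a scope as correct only when every token in the sentence matches the gold annotation. The function names and the toy data are hypothetical, and the paper's exact counting conventions may differ.

```python
from typing import List, Tuple

# Assumed representation: one boolean per token, True = inside a scope.
Labels = List[bool]


def token_f_score(gold: List[Labels], pred: List[Labels]) -> Tuple[float, float, float]:
    """Token-level precision, recall, and F1 with 'inside scope' as the positive class."""
    tp = fp = fn = 0
    for gold_sent, pred_sent in zip(gold, pred):
        for g, p in zip(gold_sent, pred_sent):
            if g and p:
                tp += 1
            elif p and not g:
                fp += 1
            elif g and not p:
                fn += 1
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


def fraction_correct_scopes(gold: List[Labels], pred: List[Labels]) -> float:
    """Share of sentences whose predicted labels match the gold labels on every token."""
    if not gold:
        return 0.0
    exact = sum(1 for g, p in zip(gold, pred) if g == p)
    return exact / len(gold)


# Toy data (invented for illustration): two sentences, one extra token
# wrongly placed inside the scope in the second sentence.
gold = [[False, True, True, False], [False, False, True]]
pred = [[False, True, True, False], [False, True, True]]

precision, recall, f1 = token_f_score(gold, pred)
correct = fraction_correct_scopes(gold, pred)
print(f"P={precision:.3f} R={recall:.3f} F1={f1:.3f} correct scopes={correct:.2f}")
```

A single misplaced token lowers the token-level F score only slightly but makes the whole scope count as incorrect, which is consistent with the scope-level figures in the abstract being lower than the token-level ones.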


Published In

Journal of the American Society for Information Science and Technology, Volume 63, Issue 7
July 2012
191 pages

Publisher

John Wiley & Sons, Inc.

United States

Author Tags

  1. biomedical information
  2. machine learning
  3. natural language processing
  4. semantic analysis
  5. signal boundary detection

Cited By

  • (2020) Spa-neg: An Approach for Negation Detection in Clinical Text Written in Spanish. Bioinformatics and Biomedical Engineering, 323-337. doi:10.1007/978-3-030-45385-5_29. Online publication date: 6-May-2020
  • (2019) Negation scope detection with recurrent neural networks models in review texts. International Journal of High Performance Computing and Networking, 13:2, 211-221. doi:10.5555/3319261.3319269. Online publication date: 1-Jan-2019
  • (2018) Trie-based rule processing for clinical NLP. Journal of Biomedical Informatics, 85:C, 106-113. doi:10.1016/j.jbi.2018.08.002. Online publication date: 1-Sep-2018
  • (2017) Detecting negation and scope in Chinese clinical notes using character and word embedding. Computer Methods and Programs in Biomedicine, 140:C, 53-59. doi:10.1016/j.cmpb.2016.11.009. Online publication date: 1-Mar-2017
  • (2016) Uncertainty detection in natural language. Proceedings of the 6th International Conference on Web Intelligence, Mining and Semantics, 1-10. doi:10.1145/2912845.2912873. Online publication date: 13-Jun-2016
  • (2016) Speculation detection for Chinese clinical notes. Journal of Biomedical Informatics, 60:C, 334-341. doi:10.1016/j.jbi.2016.02.011. Online publication date: 1-Apr-2016
  • (2016) A machine-learning approach to negation and speculation detection for sentiment analysis. Journal of the Association for Information Science and Technology, 67:9, 2118-2136. doi:10.1002/asi.23533. Online publication date: 1-Sep-2016
  • (2015) DEEPEN. Journal of Biomedical Informatics, 54:C, 213-219. doi:10.1016/j.jbi.2015.02.010. Online publication date: 1-Apr-2015
  • (2015) A practical guide to text mining with topic extraction. WIREs Computational Statistics, 7:5, 326-340. doi:10.1002/wics.1361. Online publication date: 1-Sep-2015
  • (2012) A hybrid approach to finding negated and uncertain expressions in biomedical documents. Proceedings of the 2nd international workshop on Managing interoperability and compleXity in health systems, 67-74. doi:10.1145/2389672.2389685. Online publication date: 29-Oct-2012
