Abstract
Introduction
Early detection of adverse drug events (ADEs) from electronic health records is an important but challenging task that supports pharmacovigilance and drug safety surveillance. A well-known challenge in using clinical text to detect ADEs is that much of the detailed information is documented in narrative form. Clinical natural language processing (NLP) is the key technology for extracting information from unstructured clinical text.
Objective
We present a machine learning-based clinical NLP system—MADEx—for detecting medications, ADEs, and their relations from clinical notes.
Methods
We developed a recurrent neural network (RNN) model using a long short-term memory (LSTM) strategy for clinical named entity recognition (NER) and compared it with baseline conditional random fields (CRFs). We also developed a modified training strategy for the RNN, which outperformed the widely used early stopping strategy. For relation extraction, we compared support vector machines (SVMs) and random forests on single-sentence relations and cross-sentence relations. In addition, we developed an integrated pipeline to extract entities and relations together by combining RNNs and SVMs.
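For readers unfamiliar with the early stopping baseline mentioned above, the following is a minimal sketch of patience-based early stopping on validation scores. The function name, the `patience` value, and the example scores are illustrative assumptions, not details from this study; the abstract does not describe the modified training strategy, so it is not reproduced here.

```python
def early_stopping_epoch(val_scores, patience=3):
    """Simulate patience-based early stopping over per-epoch validation scores.

    val_scores: validation F1 after each epoch (higher is better).
    patience:   hypothetical number of epochs without improvement that is
                tolerated before training halts (not a value from the paper).
    Returns (stop_epoch, best_epoch, best_score).
    """
    best_score = float("-inf")
    best_epoch = -1
    for epoch, score in enumerate(val_scores):
        if score > best_score:
            # New best checkpoint: remember it and keep training.
            best_score, best_epoch = score, epoch
        elif epoch - best_epoch >= patience:
            # No improvement for `patience` epochs: stop here.
            return epoch, best_epoch, best_score
    # Ran out of epochs without triggering the stop condition.
    return len(val_scores) - 1, best_epoch, best_score


# Made-up validation F1 scores, for illustration only.
scores = [0.70, 0.75, 0.78, 0.77, 0.78, 0.76, 0.80]
stop_epoch, best_epoch, best_score = early_stopping_epoch(scores, patience=3)
print(stop_epoch, best_epoch, best_score)
```

In this made-up run, training halts at epoch 5 and keeps the epoch 2 checkpoint, so the later score of 0.80 is never reached. That is a general property of patience-based stopping and is not presented here as the reason for the result reported in this study.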
Results
MADEx achieved one of the top three performances (F1 score of 0.8233) for clinical NER in the 2018 Medication and Adverse Drug Events (MADE1.0) challenge. The post-challenge evaluation showed that the relation extraction module and the integrated pipeline (which identifies entities and relations together) of MADEx are comparable with the best systems developed in this challenge.
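As a reminder of how an entity-level F1 score is computed, here is a minimal sketch using exact matching of (start, end, label) spans. The exact-match criterion, the function name, and the example spans are assumptions for illustration; the official MADE1.0 evaluation may differ in its matching rules and averaging.

```python
def entity_f1(gold, predicted):
    """Compute precision, recall, and F1 over sets of entity spans.

    Each entity is a (start, end, label) tuple; a prediction counts as
    correct only if all three fields match a gold entity exactly
    (an assumed matching rule, not taken from the challenge).
    """
    gold, predicted = set(gold), set(predicted)
    true_positives = len(gold & predicted)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return 0.0, 0.0, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Made-up character offsets and labels, for illustration only.
gold = {(0, 8, "Drug"), (20, 28, "ADE"), (40, 44, "ADE")}
pred = {(0, 8, "Drug"), (20, 28, "ADE"), (50, 56, "ADE")}
precision, recall, f1 = entity_f1(gold, pred)
print(f"P={precision:.4f} R={recall:.4f} F1={f1:.4f}")
```

Here two of three predictions match two of three gold entities, so precision, recall, and F1 all equal 2/3.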
Conclusion
This study demonstrated the efficiency of deep learning methods for automatic extraction of medications, ADEs, and their relations from clinical text to support pharmacovigilance and drug safety surveillance.
Acknowledgements
The authors would like to thank the organizers who provided the annotated corpus and word embeddings for this challenge, and gratefully acknowledge the support of the NVIDIA Corporation with the donation of the GPUs used for this research. The authors would also like to thank the anonymous reviewers for their helpful feedback.
Ethics declarations
Funding
This study was supported in part by the University of Florida Clinical and Translational Science Institute, which is funded by the National Institutes of Health (NIH) National Center for Advancing Translational Sciences under award number UL1TR001427, and the OneFlorida Clinical Research Consortium, which is funded by the Patient-Centered Outcomes Research Institute (PCORI) under award number CDRN-1501-26692. The content is solely the responsibility of the authors and does not necessarily represent the official views of the NIH.
Conflict of Interest
Xi Yang, Jiang Bian, Yan Gong, William R. Hogan, and Yonghui Wu have no conflicts of interest to declare that are directly relevant to the contents of this study.
Ethical Considerations
This study utilized de-identified clinical notes provided by the University of Massachusetts Medical School through the MADE1.0 challenge, and was approved by the University of Florida Institutional Review Board.
Additional information
Part of a theme issue on "NLP Challenge for Detecting Medication and Adverse Drug Events from Electronic Health Records (MADE 1.0)" guest edited by Feifan Liu, Abhyuday Jagannatha and Hong Yu.
About this article
Cite this article
Yang, X., Bian, J., Gong, Y. et al. MADEx: A System for Detecting Medications, Adverse Drug Events, and Their Relations from Clinical Notes. Drug Saf 42, 123–133 (2019). https://doi.org/10.1007/s40264-018-0761-0
DOI: https://doi.org/10.1007/s40264-018-0761-0