Abstract
Biomedical named entity recognition and normalization aim to recognize biomedical entity mentions in text and map them to their unique database entity identifiers (IDs). Together they form a primary task of biomedical text mining. However, name variation and entity ambiguity make this task challenging. In this paper, we leverage domain knowledge through a novel knowledge feature representation method to recognize more entity variants, and we model important local context through a dual attention mechanism and a gating mechanism to perform entity normalization. Experimental results on the BioCreative VI Bio-ID corpus show that our proposed system achieves new state-of-the-art performance (0.844 F1-score for protein/gene entity recognition and 0.408 F1-score for normalization).
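The abstract names a dual attention mechanism and a gating mechanism over local context but does not specify how they are built. The sketch below shows one common way to combine two attention views of the same context with a scalar gate. It is written in pure Python and rests on several assumptions: dot-product attention, the mention vector and the candidate-entity vector as the two attention queries, and a gate weight vector `gate_w` with bias `gate_b`. All function names and design choices here are hypothetical illustrations, not the authors' architecture.

```python
import math
from typing import List

Vector = List[float]


def dot(u: Vector, v: Vector) -> float:
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def softmax(xs: List[float]) -> List[float]:
    """Normalize scores into weights; subtracting the max avoids overflow."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def attend(query: Vector, context: List[Vector]) -> Vector:
    """Dot-product attention: a weighted average of the context word vectors."""
    weights = softmax([dot(query, c) for c in context])
    dim = len(context[0])
    return [sum(w * c[i] for w, c in zip(weights, context)) for i in range(dim)]


def gated_context(mention: Vector, candidate: Vector, context: List[Vector],
                  gate_w: Vector, gate_b: float) -> Vector:
    """Fuse two attention views of the local context with a scalar gate.

    Hypothetical design: one view is queried by the mention vector, the other
    by the candidate entity vector; gate_w has length 2 * dim.
    """
    a_mention = attend(mention, context)
    a_candidate = attend(candidate, context)
    # List concatenation forms the [a_mention; a_candidate] gate input.
    g = sigmoid(dot(gate_w, a_mention + a_candidate) + gate_b)
    return [g * x + (1.0 - g) * y for x, y in zip(a_mention, a_candidate)]


def score_candidate(mention: Vector, candidate: Vector, context: List[Vector],
                    gate_w: Vector, gate_b: float) -> float:
    """Compatibility of the gated local context with a candidate entity."""
    fused = gated_context(mention, candidate, context, gate_w, gate_b)
    return dot(fused, candidate)


if __name__ == "__main__":
    # Toy 2-d embeddings (hypothetical); real values would come from a trained model.
    context = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
    mention = [1.0, 0.2]
    candidates = {"ID_A": [1.0, 0.0], "ID_B": [0.0, 1.0]}
    gate_w, gate_b = [0.1, 0.1, 0.1, 0.1], 0.0
    best = max(candidates, key=lambda k: score_candidate(
        mention, candidates[k], context, gate_w, gate_b))
    print("highest-scoring candidate:", best)
```

In this sketch, a normalization decision for one mention comes from scoring every candidate ID and keeping the highest-scoring one. In a trained system the embeddings and gate parameters would be learned rather than hand-set as in the toy demo. The gate lets the model shift weight between the mention-driven and the candidate-driven view of the context.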
Acknowledgments
This work was supported by grants from the Humanities and Social Science Project of the Ministry of Education (No. 17YJA740076) and the National Natural Science Foundation of China (No. 61772109). We also thank the audience of CLSW2019 and the reviewers for their comments.
Copyright information
© 2020 Springer Nature Switzerland AG
About this paper
Cite this paper
Yao, W., Li, X., Li, Z., Liu, Z., Ning, S. (2020). Protein/Gene Entity Recognition and Normalization with Domain Knowledge and Local Context. In: Hong, JF., Zhang, Y., Liu, P. (eds) Chinese Lexical Semantics. CLSW 2019. Lecture Notes in Computer Science, vol 11831. Springer, Cham. https://doi.org/10.1007/978-3-030-38189-9_39
DOI: https://doi.org/10.1007/978-3-030-38189-9_39
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-38188-2
Online ISBN: 978-3-030-38189-9
eBook Packages: Computer Science, Computer Science (R0)