
Sentence Bag Graph Formulation for Biomedical Distant Supervision Relation Extraction

Published: 18 March 2024

Abstract

We introduce a novel graph-based framework for alleviating key challenges in distantly supervised relation extraction and demonstrate its effectiveness in the challenging and important domain of biomedical data. Specifically, we propose a graph view of the sentence bag associated with an entity pair, which enables message-passing-based aggregation of information about that entity pair across the bag. The proposed framework alleviates the common problem of noisy labeling in distantly supervised relation extraction and also effectively incorporates inter-dependencies between sentences within a bag. Extensive experiments on two large-scale biomedical relation datasets and the widely used NYT dataset demonstrate that our framework significantly outperforms state-of-the-art methods for biomedical distant supervision relation extraction while also performing strongly on relation extraction in the general text mining domain.
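
To make the abstract's approach concrete, the following is a minimal sketch, not the authors' implementation: it assumes per-sentence embeddings (e.g., from a PCNN or BERT encoder) have already been computed for one bag, treats the bag as a fully connected graph, and applies a single GAT-style message-passing layer followed by mean pooling and a linear relation classifier. The use of PyTorch, the class name BagGraphAggregator, and all layer choices are illustrative assumptions rather than details taken from the paper.

    # Hypothetical sketch: one message-passing layer over a fully connected
    # "sentence bag graph", then pooling and relation classification.
    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class BagGraphAggregator(nn.Module):
        """Hypothetical GAT-style aggregation over the sentences of one bag."""

        def __init__(self, dim: int, num_relations: int):
            super().__init__()
            self.proj = nn.Linear(dim, dim)        # per-sentence (node) transform
            self.attn = nn.Linear(2 * dim, 1)      # scores one (node, neighbor) edge
            self.classifier = nn.Linear(dim, num_relations)

        def forward(self, sent_emb: torch.Tensor) -> torch.Tensor:
            # sent_emb: (n, dim) embeddings of the n sentences mentioning one entity pair
            h = self.proj(sent_emb)                                   # (n, d)
            n = h.size(0)
            pairs = torch.cat([h.unsqueeze(1).expand(n, n, -1),       # node i being updated
                               h.unsqueeze(0).expand(n, n, -1)],      # neighbor j in the bag
                              dim=-1)                                 # (n, n, 2d)
            e = F.leaky_relu(self.attn(pairs)).squeeze(-1)            # raw edge scores (n, n)
            alpha = torch.softmax(e, dim=-1)                          # normalize over neighbors
            h = F.relu(alpha @ h)                                     # one message-passing step
            bag_repr = h.mean(dim=0)                                  # pool the bag to one vector
            return self.classifier(bag_repr)                          # relation logits

    # Usage: a bag of 4 sentences with 256-dim embeddings and 10 candidate relations.
    logits = BagGraphAggregator(dim=256, num_relations=10)(torch.randn(4, 256))

In a distant-supervision setting, attention weights learned over such a bag graph allow noisily labeled sentences to be down-weighted while evidence is still shared among sentences that mention the same entity pair, which is the intuition the abstract describes.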



Publisher: IEEE Educational Activities Department, United States
