
Sentence Bag Graph Formulation for Biomedical Distant Supervision Relation Extraction

Published: 18 March 2024

Abstract

We introduce a novel graph-based framework for alleviating key challenges in distantly supervised relation extraction and demonstrate its effectiveness in the challenging and important domain of biomedical data. Specifically, we propose a graph view of the sentence bag associated with an entity pair, which enables message-passing-based aggregation of information about that entity pair across the bag. The proposed framework alleviates the common problem of noisy labeling in distantly supervised relation extraction and also effectively incorporates inter-dependencies between sentences within a bag. Extensive experiments on two large-scale biomedical relation datasets and the widely used NYT dataset demonstrate that our framework significantly outperforms state-of-the-art methods for biomedical distant supervision relation extraction while also performing strongly on relation extraction in the general text mining domain.
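
To make the abstract's approach concrete, the following is a minimal sketch, not the authors' implementation: it assumes per-sentence embeddings (e.g., from a PCNN or BERT encoder) have already been computed for one bag, treats the bag as a fully connected graph, and applies a single GAT-style message-passing layer followed by mean pooling and a linear relation classifier. The use of PyTorch, the class name BagGraphAggregator, and all layer choices are illustrative assumptions rather than details taken from the paper.

    # Hypothetical sketch: one message-passing layer over a fully connected
    # "sentence bag graph", then pooling and relation classification.
    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class BagGraphAggregator(nn.Module):
        """Hypothetical GAT-style aggregation over the sentences of one bag."""

        def __init__(self, dim: int, num_relations: int):
            super().__init__()
            self.proj = nn.Linear(dim, dim)        # per-sentence (node) transform
            self.attn = nn.Linear(2 * dim, 1)      # scores one (node, neighbor) edge
            self.classifier = nn.Linear(dim, num_relations)

        def forward(self, sent_emb: torch.Tensor) -> torch.Tensor:
            # sent_emb: (n, dim) embeddings of the n sentences mentioning one entity pair
            h = self.proj(sent_emb)                                   # (n, d)
            n = h.size(0)
            pairs = torch.cat([h.unsqueeze(1).expand(n, n, -1),       # node i being updated
                               h.unsqueeze(0).expand(n, n, -1)],      # neighbor j in the bag
                              dim=-1)                                 # (n, n, 2d)
            e = F.leaky_relu(self.attn(pairs)).squeeze(-1)            # raw edge scores (n, n)
            alpha = torch.softmax(e, dim=-1)                          # normalize over neighbors
            h = F.relu(alpha @ h)                                     # one message-passing step
            bag_repr = h.mean(dim=0)                                  # pool the bag to one vector
            return self.classifier(bag_repr)                          # relation logits

    # Usage: a bag of 4 sentences with 256-dim embeddings and 10 candidate relations.
    logits = BagGraphAggregator(dim=256, num_relations=10)(torch.randn(4, 256))

In a distant-supervision setting, attention weights learned over such a bag graph allow noisily labeled sentences to be down-weighted while evidence is still shared among sentences that mention the same entity pair, which is the intuition the abstract describes.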



Publisher: IEEE Educational Activities Department, United States
