Open knowledge base canonicalization with multi-task learning

Bingchen Liu¹, Huang Peng², Weixin Zeng², Xiang Zhao², Shijun Liu¹,³, Li Pan¹ & Xin Li¹


Abstract

The construction of large open knowledge bases (OKBs) is integral to many knowledge-driven applications on the World Wide Web, such as web search. However, noun phrases in OKBs often suffer from redundancy and ambiguity, which calls for the investigation of OKB canonicalization. Current solutions address OKB canonicalization by devising advanced clustering algorithms and using knowledge graph embedding (KGE) to further facilitate the canonicalization process. Nevertheless, these works fail to fully exploit the synergy between clustering and KGE learning, and the methods designed for these sub-tasks are sub-optimal. To this end, we put forward a multi-task learning framework, namely MulCanon, to tackle OKB canonicalization. Specifically, a diffusion model is used in the soft clustering process to improve the noun phrase representations with neighboring information, which can lead to more accurate representations. MulCanon unifies the learning objectives of the diffusion model, the KGE model, side information, and cluster assignment, and adopts a two-stage multi-task learning paradigm for training. A thorough experimental study on popular OKB canonicalization benchmarks validates that MulCanon can achieve competitive canonicalization results.
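
As a rough illustration of the two ideas named in the abstract, soft clustering and a unified multi-task objective trained in two stages, a minimal Python sketch follows. It is not MulCanon's implementation: the softmax over negative squared distances, the function names `soft_assign` and `combined_loss`, the task names, the unit weights, and the choice of which task joins in the second stage are all assumptions made for illustration, since the abstract does not specify them.

```python
import math
from typing import Dict, List, Sequence


def soft_assign(embedding: Sequence[float],
                centroids: Sequence[Sequence[float]]) -> List[float]:
    """Return a probability distribution over clusters for one noun phrase.

    Assumption: a softmax over negative squared Euclidean distances to the
    cluster centroids. The paper may use a different assignment function.
    """
    logits = [-sum((e - c) ** 2 for e, c in zip(embedding, centroid))
              for centroid in centroids]
    peak = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(logit - peak) for logit in logits]
    total = sum(exps)
    return [x / total for x in exps]


def combined_loss(task_losses: Dict[str, float],
                  weights: Dict[str, float]) -> float:
    """Weighted sum of per-task losses; tasks absent from `weights` count as 0."""
    return sum(weights.get(name, 0.0) * value
               for name, value in task_losses.items())


# Hypothetical two-stage schedule: the cluster-assignment loss is switched on
# only in the second stage. Both the split and the weights are illustrative.
STAGE_WEIGHTS = {
    "stage1": {"diffusion": 1.0, "kge": 1.0, "side_info": 1.0},
    "stage2": {"diffusion": 1.0, "kge": 1.0, "side_info": 1.0, "cluster": 1.0},
}

if __name__ == "__main__":
    # A noun phrase embedding sitting exactly on the first centroid.
    print(soft_assign([0.0, 0.0], [[0.0, 0.0], [3.0, 4.0]]))
    # Placeholder loss values, not results from the paper.
    losses = {"diffusion": 0.5, "kge": 0.25, "side_info": 0.125, "cluster": 2.0}
    for stage, weights in STAGE_WEIGHTS.items():
        print(stage, combined_loss(losses, weights))
```

In this sketch a noun phrase keeps a probability for every cluster rather than a single hard label, and the same loss dictionary is reweighted per stage so that all tasks share one training objective.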

Availability of data and materials

All of the materials, including figures, are owned by the authors, and no permissions are required.

Funding

The authors would like to acknowledge the support provided by the Key R&D Program of Shandong Province, China (No. 2023CXGC010801), the National Natural Science Foundation of China (Nos. 62302513 and 62272469), the "New 20 Regulations for Universities" funding program of Jinan (No. 202228089) and the TaiShan Industrial Experts Programme (No. tscx202312128).

Author information

Authors and Affiliations

School of Software, Shandong University, 1500 Shunhua Road, Jinan, 250000, Shandong Province, China
Bingchen Liu, Shijun Liu, Li Pan & Xin Li
Laboratory for Big Data and Decision, National University of Defense Technology, 109 Deya Road, Changsha, 410073, Hunan Province, China
Huang Peng, Weixin Zeng & Xiang Zhao
QuanCheng Laboratory, Jinan, 250103, Shandong Province, China
Shijun Liu

Contributions

Bingchen Liu wrote the main manuscript text, prepared all figures and tables, and provided the methodology. Weixin Zeng, Xiang Zhao and Huang Peng reviewed and edited the manuscript. Li Pan, Xin Li and Shijun Liu reviewed and edited the manuscript and provided funding support.

Corresponding author

Correspondence to Weixin Zeng.

Ethics declarations

Competing interests

The authors declare that they have no competing interests as defined by Springer, or other interests that might be perceived to influence the results and discussion reported in this paper.

Ethical approval

This declaration is not applicable.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Liu, B., Peng, H., Zeng, W. et al. Open knowledge base canonicalization with multi-task learning. World Wide Web 27, 51 (2024). https://doi.org/10.1007/s11280-024-01288-x

Received: 21 March 2024
Revised: 28 May 2024
Accepted: 05 July 2024
Published: 18 July 2024
DOI: https://doi.org/10.1007/s11280-024-01288-x
