Open knowledge base canonicalization with multi-task learning

Published in: World Wide Web

Abstract

The construction of large open knowledge bases (OKBs) is integral to many knowledge-driven applications on the World Wide Web, such as web search. However, noun phrases in OKBs often suffer from redundancy and ambiguity, which motivates research on OKB canonicalization. Current solutions address OKB canonicalization by devising advanced clustering algorithms and by using knowledge graph embedding (KGE) to facilitate the canonicalization process. Nevertheless, these works fail to fully exploit the synergy between clustering and KGE learning, and the methods designed for these sub-tasks are sub-optimal. To this end, we put forward a multi-task learning framework, MulCanon, to tackle OKB canonicalization. Specifically, a diffusion model is used in the soft clustering process to enrich the noun phrase representations with neighboring information, making them more accurate. MulCanon unifies the learning objectives of the diffusion model, the KGE model, side information, and cluster assignment, and adopts a two-stage multi-task learning paradigm for training. A thorough experimental study on popular OKB canonicalization benchmarks validates that MulCanon achieves competitive canonicalization results.
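The idea of training cluster assignment and KGE learning under one objective can be illustrated with a toy example. The sketch below is not MulCanon's formulation, which this preview does not show. It assumes a softmax over negative squared distances as the soft cluster assignment and a TransE-style margin loss as a stand-in KGE objective, and it omits the diffusion model and side information entirely. All function names, the `weight` and `margin` parameters, and the toy data are hypothetical.

```python
import math

def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def soft_assign(vec, centers):
    """Soft cluster assignment: softmax over negative squared distances (assumed form)."""
    logits = [-sq_dist(vec, c) for c in centers]
    top = max(logits)  # subtract the max for numerical stability
    weights = [math.exp(l - top) for l in logits]
    total = sum(weights)
    return [w / total for w in weights]

def transe_distance(emb, rel, triple):
    """TransE-style distance ||h + r - t||; smaller means a more plausible triple."""
    h, r, t = triple
    return math.sqrt(sum((a + b - c) ** 2 for a, b, c in zip(emb[h], rel[r], emb[t])))

def joint_loss(emb, rel, centers, triples, corrupted, weight=0.5, margin=1.0):
    """Weighted sum of a margin-based KGE loss and a soft clustering loss."""
    kge = sum(
        max(0.0, margin + transe_distance(emb, rel, pos) - transe_distance(emb, rel, neg))
        for pos, neg in zip(triples, corrupted)
    ) / len(triples)
    cluster = 0.0
    for vec in emb:
        probs = soft_assign(vec, centers)
        # expected squared distance to the centers under the soft assignment
        cluster += sum(p * sq_dist(vec, c) for p, c in zip(probs, centers))
    cluster /= len(emb)
    return kge + weight * cluster

# Toy data (hypothetical): four noun-phrase embeddings forming two obvious groups.
emb = [[0.0, 0.0], [0.1, 0.0], [3.0, 3.0], [3.1, 3.0]]
rel = [[3.0, 3.0]]
centers = [[0.0, 0.0], [3.0, 3.0]]
triples = [(0, 0, 2), (1, 0, 3)]      # observed (head, relation, tail) index triples
corrupted = [(0, 0, 1), (1, 0, 0)]    # the same triples with the tail replaced

assignments = [soft_assign(v, centers) for v in emb]
labels = [max(range(len(p)), key=p.__getitem__) for p in assignments]
loss = joint_loss(emb, rel, centers, triples, corrupted)
```

Because both terms depend on the same embeddings, a gradient step on the combined loss would move the noun phrase vectors to satisfy the triples and tighten the clusters at the same time, which is the kind of synergy the abstract refers to.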

Availability of data and materials

All materials, including figures, are owned by the authors, and no permissions are required.

Funding

The authors would like to acknowledge the support provided by the Key R&D Program of Shandong Province, China (No. 2023CXGC010801), the National Natural Science Foundation of China (Nos. 62302513 and 62272469), the “New 20 Regulations for Universities” funding program of Jinan (No. 202228089) and the TaiShan Industrial Experts Programme (No. tscx202312128).

Author information

Contributions

Bingchen Liu wrote the main manuscript text, prepared all figures and tables, and provided the methodology. Weixin Zeng, Xiang Zhao and Huang Peng reviewed and edited the manuscript. Li Pan, Xin Li and Shijun Liu reviewed and edited the manuscript and provided funding support.

Corresponding author

Correspondence to Weixin Zeng.

Ethics declarations

Competing interests

The authors declare that they have no competing interests as defined by Springer, or other interests that might be perceived to influence the results and discussion reported in this paper.

Ethical approval

This declaration is not applicable.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Liu, B., Peng, H., Zeng, W. et al. Open knowledge base canonicalization with multi-task learning. World Wide Web 27, 51 (2024). https://doi.org/10.1007/s11280-024-01288-x

  • DOI: https://doi.org/10.1007/s11280-024-01288-x
