Abstract
Negative sampling, which samples negative triplets from those not observed in a knowledge graph (KG), is an essential step in KG embedding. Recently, generative adversarial networks (GANs) have been introduced into negative sampling. By sampling negative triplets with large gradients, these methods avoid the vanishing-gradient problem and thus obtain better performance. However, they make the original model more complex and harder to train. In this paper, motivated by the observation that negative triplets with large gradients are important but rare, we propose to keep track of them directly with a cache. In this way, our method acts as a “distilled” version of previous GAN-based methods, which does not waste training time on additional parameters to fit the full distribution of negative triplets. However, how to sample from and how to update the cache are two critical questions. We propose to solve them with automated machine learning techniques; the automated version also covers GAN-based methods as special cases. A theoretical explanation of NSCaching is also provided, justifying its superiority over the fixed sampling scheme. Besides, we further extend NSCaching with the skip-gram model for graph embedding. Finally, extensive experiments show that our method brings significant improvements to various KG embedding models and the skip-gram model, and outperforms state-of-the-art negative sampling methods.
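To make the cache-based idea concrete, below is a minimal Python sketch (not the paper's implementation): it assumes a toy TransE-style scorer, keys one cache of candidate negative tails per (head, relation) pair, samples uniformly from the cache, and refreshes the cache by keeping the highest-scoring candidates. The greedy top-k refresh here is a simplification; the actual NSCaching update balances exploration and exploitation.

```python
# Minimal sketch of cache-based negative sampling (illustrative only).
import numpy as np

rng = np.random.default_rng(0)

n_entities, n_relations, dim, cache_size = 1000, 10, 32, 50
entity_emb = rng.normal(scale=0.1, size=(n_entities, dim))
relation_emb = rng.normal(scale=0.1, size=(n_relations, dim))

def score(h, r, t):
    # TransE-style score: larger (less negative) means more plausible,
    # hence a "harder" negative with a larger gradient.
    return -np.linalg.norm(entity_emb[h] + relation_emb[r] - entity_emb[t])

# One cache of candidate negative tails per (head, relation) pair.
cache = {}

def sample_negative(h, r):
    """Uniformly pick a negative tail from the cache for (h, r)."""
    c = cache.setdefault((h, r), rng.integers(0, n_entities, size=cache_size))
    return int(rng.choice(c))

def update_cache(h, r, n_candidates=50):
    """Refresh the cache: mix it with random candidates and keep the
    highest-scoring (hardest) negatives -- the rare, large-gradient ones."""
    c = cache.setdefault((h, r), rng.integers(0, n_entities, size=cache_size))
    candidates = np.concatenate([c, rng.integers(0, n_entities, size=n_candidates)])
    scores = np.array([score(h, r, int(t)) for t in candidates])
    cache[(h, r)] = candidates[np.argsort(-scores)[:cache_size]]

# Usage inside a (sketched) training step:
h, r = 3, 1
neg_t = sample_negative(h, r)   # negative triplet (h, r, neg_t)
update_cache(h, r)              # keep the cache filled with hard negatives
```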
Notes
The mean rank (MR) metric, which is reported in the conference version [67], is removed in the journal version since (i) space is limited and (ii) the mean rank is easily influenced by low-ranking samples.
References
Arjovsky, M., Chintala, S., Bottou, L.: Wasserstein GAN. In: ICLR (2017)
Auer, S., Bizer, C., Kobilarov, G., Lehmann, J., Cyganiak, R., Ives, Z.: DBpedia: a nucleus for a web of open data. In: The Semantic Web, pp. 722–735. Springer (2007)
Bengio, Y., Louradour, J., Collobert, R., Weston, J.: Curriculum learning. In: ICML, pp. 41–48 (2009)
Bergstra, J., Bardenet, R., Bengio, Y., Kégl, B.: Algorithms for hyper-parameter optimization. In: NIPS, pp. 2546–2554 (2011)
Bergstra, J., Bengio, Y.: Random search for hyper-parameter optimization. JMLR 13, 281–305 (2012)
Bollacker, K., Evans, C., Paritosh, P., Sturge, T., Taylor, J.: Freebase: a collaboratively created graph database for structuring human knowledge. In: ACM SIGMOD, pp. 1247–1250 (2008)
Bordes, A., Chopra, S., Weston, J.: Question answering with subgraph embeddings. In: Conference on EMNLP, pp. 615–620 (2014)
Bordes, A., Usunier, N., Garcia-Duran, A., Weston, J., Yakhnenko, O.: Translating embeddings for modeling multi-relational data. In: NIPS, pp. 2787–2795 (2013)
Bose, A., Ling, H., Cao, Y.: Adversarial contrastive estimation. In: ACL (Volume 1: Long Papers), pp. 1021–1032 (2018)
Cai, H., Zheng, V., Chang, K.: A comprehensive survey of graph embedding: problems, techniques, and applications. IEEE TKDE 30(9), 1616–1637 (2018)
Cai, L., Wang, W.: KBGAN: adversarial learning for knowledge graph embeddings. In: NAACL, pp. 1470–1480 (2018)
Chen, L., Yuan, F., Jose, J., Zhang, W.: Improving negative sampling for word representation using self-embedded features. In: WSDM, pp. 99–107 (2018)
Dettmers, T., Minervini, P., Stenetorp, P., Riedel, S.: Convolutional 2D knowledge graph embeddings. In: AAAI (2018)
Ding, J., Quan, Y., He, X., Li, Y., Jin, D.: Reinforced negative sampling for recommendation with exposure data. In: IJCAI, pp. 2230–2236. AAAI Press (2019)
Dong, L., Gabrilovich, E., Heitz, G., Horn, W., Lao, N., Murphy, K., Strohmann, T., Sun, S., Zhang, W.: Knowledge vault: a web-scale approach to probabilistic knowledge fusion. In: ACM SIGKDD, pp. 601–610 (2014)
Dori, D.: ViSWeb—the visual semantic web: unifying human and machine knowledge representations with object-process methodology. VLDB J. 13(2), 120–147 (2004)
Fedus, W., Goodfellow, I., Dai, A.: MaskGAN: better text generation via filling in the ______. In: ICLR (2018)
Gao, H., Huang, H.: Self-paced network embedding. In: ACM SIGKDD, pp. 1406–1415 (2018)
Glorot, X., Bengio, Y.: Understanding the difficulty of training deep feedforward neural networks. In: International Conference on Artificial Intelligence and Statistics, pp. 249–256 (2010)
Goodfellow, I., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., Bengio, Y.: Generative adversarial nets. In: NIPS, pp. 2672–2680 (2014)
Grover, A., Leskovec, J.: node2vec: scalable feature learning for networks. In: ACM SIGKDD, pp. 855–864 (2016)
Gulrajani, I., Ahmed, F., Arjovsky, M., Dumoulin, V., Courville, A.: Improved training of Wasserstein GANs. In: NIPS, pp. 5767–5777 (2017)
Hutter, F., Hoos, H., Leyton-Brown, K.: Sequential model-based optimization for general algorithm configuration. In: LION, pp. 507–523 (2011)
Hutter, F., Kotthoff, L., Vanschoren, J. (eds.): Automated Machine Learning: Methods, Systems, Challenges. Springer (2018)
Ji, G., He, S., Xu, L., Liu, K., Zhao, J.: Knowledge graph embedding via dynamic mapping matrix. In: ACL, pp. 687–696 (2015)
Kadlec, R., Bajgar, O., Kleindienst, J.: Knowledge base completion: baselines strike back. In: The 2nd Workshop on Representation Learning for NLP, pp. 69–74 (2017)
Kazemi, S., Poole, D.: SimplE embedding for link prediction in knowledge graphs. In: NeurIPS, pp. 4289–4300 (2018)
Kingma, D., Ba, J.: Adam: a method for stochastic optimization. Technical report. arXiv:1412.6980 (2014)
Kok, S., Domingos, P.: Statistical predicate invention. In: ICML, pp. 433–440 (2007)
Koller, D., Friedman, N., Džeroski, S., Sutton, C., McCallum, A., Pfeffer, A., Abbeel, P., Wong, M., Heckerman, D., Meek, C., et al.: Introduction to Statistical Relational Learning. The MIT Press (2007)
Kumar, M., Packer, B., Koller, D.: Self-paced learning for latent variable models. In: NIPS, pp. 1189–1197 (2010)
Lao, N., Mitchell, T., Cohen, W.: Random walk inference and learning in a large scale knowledge base. In: Conference on EMNLP, pp. 529–539. ACL (2011)
Li, J., Tao, C., Feng, Y., Zhao, D., Yan, R. et al.: Sampling matters! an empirical study of negative sampling strategies for learning of matching models in retrieval-based dialogue systems. In: Proceedings of the 2019 EMNLP-IJCNLP, pp. 1291–1296 (2019)
March, J.: Exploration and exploitation in organizational learning. Organ. Sci. 2(1), 71–87 (1991)
McCallum, A., Nigam, K., Rennie, J., Seymore, K.: Automating the construction of internet portals with machine learning. Inf. Retrieval 3(2), 127–163 (2000)
Mikolov, T., Chen, K., Corrado, G., Dean, J.: Efficient estimation of word representations in vector space. In: ICLR (2013)
Mikolov, T., Sutskever, I., Chen, K., Corrado, G., Dean, J.: Distributed representations of words and phrases and their compositionality. In: NIPS, pp. 3111–3119 (2013)
Mikolov, T., Yih, W., Zweig, G.: Linguistic regularities in continuous space word representations. In: NAACL, pp. 746–751 (2013)
Mottin, D., Lissandrini, M., Velegrakis, Y., Palpanas, T.: Exemplar queries: a new way of searching. VLDB J. 25(6), 741–765 (2016)
Needell, D., Ward, R., Srebro, N.: Stochastic gradient descent, weighted sampling, and the randomized Kaczmarz algorithm. In: NIPS, pp. 1017–1025 (2014)
Nickel, M., Murphy, K., Tresp, V., Gabrilovich, E.: A review of relational machine learning for knowledge graphs. Proc. IEEE 104(1), 11–33 (2015)
Noy, N., Gao, Y., Jain, A., Narayanan, A., Patterson, A., Taylor, J.: Industry-scale knowledge graphs: lessons and challenges. Queue 17(2), 48–75 (2019)
Paszke, A., Gross, S., Massa, F., Lerer, A., Bradbury, J., Chanan, G., Killeen, T., Lin, Z., Gimelshein, N., Antiga, L., et al.: PyTorch: an imperative style, high-performance deep learning library. In: NeurIPS, pp. 8024–8035 (2019)
Perozzi, B., Al-Rfou, R., Skiena, S.: DeepWalk: online learning of social representations. In: ACM SIGKDD, pp. 701–710 (2014)
Rawat, A., Chen, J., Yu, F., Suresh, A., Kumar, S.: Sampled softmax with random Fourier features. In: NeurIPS, pp. 13857–13867 (2019)
Rendle, S., Freudenthaler, C., Gantner, Z., Schmidt-Thieme, L.: BPR: Bayesian personalized ranking from implicit feedback. In: Conference on UAI, pp. 452–461. AUAI Press (2009)
Suchanek, F., Kasneci, G., Weikum, G.: YAGO: a core of semantic knowledge. In: WWW, pp. 697–706 (2007)
Sun, Z., Deng, Z., Nie, J., Tang, J.: RotatE: knowledge graph embedding by relational rotation in complex space. In: ICLR (2019)
Tang, J., Qu, M., Wang, M., Zhang, M., Yan, J., Mei, Q.: LINE: large-scale information network embedding. In: WWW, pp. 1067–1077 (2015)
Toutanova, K., Chen, D.: Observed versus latent features for knowledge base and text inference. In: Workshop on CVSMC, pp. 57–66 (2015)
Trouillon, T., Dance, C., Gaussier, É., Welbl, J., Riedel, S., Bouchard, G.: Knowledge graph completion via complex tensor factorization. JMLR 18(1), 4735–4772 (2017)
Wang, H., Wang, J., Wang, J., Zhao, M., Zhang, W., Zhang, F., Xie, X., Guo, M.: GraphGAN: graph representation learning with generative adversarial nets. In: AAAI (2018)
Wang, J., Yu, L., Zhang, W., Gong, Y., Xu, Y., Wang, B., Zhang, P., Zhang, D.: IRGAN: a minimax game for unifying generative and discriminative information retrieval models. In: ACM SIGIR, pp. 515–524 (2017)
Wang, P., Li, S., Pan, R.: Incorporating GAN for negative sampling in knowledge representation learning. In: AAAI (2018)
Wang, Q., Mao, Z., Wang, B., Guo, L.: Knowledge graph embedding: a survey of approaches and applications. IEEE TKDE 29(12), 2724–2743 (2017)
Wang, Z., Zhang, J., Feng, J., Chen, Z.: Knowledge graph embedding by translating on hyperplanes. In: AAAI, pp. 1112–1119 (2014)
Welford, B.: Note on a method for calculating corrected sums of squares and products. Technometrics 4(3), 419–420 (1962)
Williams, R.: Simple statistical gradient-following algorithms for connectionist reinforcement learning. Mach. Learn. 8(3–4), 229–256 (1992)
Wu, C., Manmatha, R., Smola, A., Krahenbuhl, P.: Sampling matters in deep embedding learning. In: Proceedings of the ICCV, pp. 2840–2848 (2017)
Yang, B., Yih, W., He, X., Gao, J., Deng, L.: Embedding entities and relations for learning and inference in knowledge bases. In: ICLR (2015)
Yao, Q., Wang, N., Jair Escalante, H., Guyon, I., Hu, Y., Li, Y., Tu, W., Yang, Q., Yu, Y.: Taking human out of learning applications: a survey on automated machine learning. Technical report, arXiv preprint (2018)
Ying, R., He, R., Chen, K., Eksombatchai, P., Hamilton, W., Leskovec, J.: Graph convolutional neural networks for web-scale recommender systems. In: ACM SIGKDD, pp. 974–983 (2018)
Zhang, C., Li, Y., Du, N., Fan, W., Yu, P.: On the generative discovery of structured medical knowledge. In: SIGKDD, pp. 2720–2728 (2018)
Zhang, F., Yuan, N., Lian, D., Xie, X., Ma, W.: Collaborative knowledge base embedding for recommender systems. In: ACM SIGKDD, pp. 353–362 (2016)
Zhang, S., Yao, L., Sun, A., Tay, Y.: Deep learning based recommender system: a survey and new perspectives. ACM Comput. Surv. 52(1), 1–38 (2019)
Zhang, W., Chen, T., Wang, J., Yu, T.: Optimizing top-n collaborative filtering via dynamic negative item sampling. In: ACM SIGIR, pp. 785–788 (2013)
Zhang, Y., Yao, Q., Shao, Y., Chen, L.: NSCaching: simple and efficient negative sampling for knowledge graph embedding. In: ICDE, pp. 614–625 (2019)
Zhao, P., Zhang, T.: Stochastic optimization with importance sampling for regularized loss minimization. In: ICML, pp. 1–9 (2015)
Zou, L., Chen, L., Özsu, M., Zhao, D.: Answering pattern match queries in large graph databases via graph embedding. VLDB J. 21(1), 97–120 (2012)
A Appendix: Proof of Theorem 1
Proof
Following [68], we consider a more general optimization formulation as \(\min_{\mathbf{w}} \bar{F}(\mathbf{w}) \equiv \frac{1}{n} \sum\nolimits_{i=1}^n \phi_i(\mathbf{w}) + \lambda r(\mathbf{w})\), which covers (11) as a special case with \(\lambda = 0\). Then, the stochastic training gives \(\mathbf{w}^{t+1}\) as
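For reference, the importance-sampled proximal update studied in [68] takes the form \(\mathbf{w}^{t+1} = \arg\min_{\mathbf{w}} \; \eta_t \big\langle \frac{1}{n\, p^t_{i_t}} \nabla \phi_{i_t}(\mathbf{w}^t), \mathbf{w} \big\rangle + \eta_t \lambda\, r(\mathbf{w}) + \mathscr{B}_{\psi}(\mathbf{w}, \mathbf{w}^t)\), where \(i_t\) is the index drawn at step \(t\) with probability \(p^t_{i_t}\); update (18) follows this pattern (the exact weighting shown here is an assumption for illustration).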
where \(\mathscr {B}_{\psi }\) is a Bregman distance function measuring the difference between \(\mathbf {w}\) and \(\mathbf {w}^t\). Based on (18), we state Theorem 3 in [68] as
Theorem 2
([68]) Let \(\mathbf{w}^t\) be generated from (18). Assume \(\mathscr{B}_{\psi}\) is \(\sigma\)-strongly convex w.r.t. a norm \(\Vert \cdot \Vert\) and that \(\bar{F}\) and r are convex. If \(\eta_t = \eta\), the following inequality holds for any \(T \ge 1\)
where the expectation is taken with respect to the distribution \(\mathbf{p}^t\).
In our Algorithm 5, we do not have r, and thus \(\lambda = 0\) in (11). Besides, \(\mathscr{B}_{\psi}(\mathbf{w}, \mathbf{w}^t) = \frac{1}{2} \Vert \mathbf{w} - \mathbf{w}^t \Vert^2\) in our case, so \(\sigma = 1\). Combining these, Theorem 1 follows directly from Theorem 2. \(\square\)