
1 Introduction

Knowledge graphs such as WordNet [1], Freebase [2] and Yago [3] have become important resources in many AI applications, such as web/mobile search and question answering (Q&A). They usually contain extensive structured data in the form of triplets \((head\_entity, relation, tail\_entity)\) (denoted as (h, r, t)), where the relation represents the relationship between the two entities. Although a typical knowledge graph may contain a great many triplets representing numerous facts, it usually suffers from incompleteness. Knowledge graph completion aims at predicting relations between entities based on the existing triplets in a knowledge graph. In the past decade, much work based on formal logic has been done for knowledge graph completion [4,5,6,7], but such methods are neither tractable nor robust for a real large-scale knowledge graph.

Recently, a promising approach for this task is to encode a knowledge graph into a low-dimensional embedding vector space. Following this approach, many methods have been explored, which will be introduced in detail in Sect. 2. Among these methods, TransE [8, 9] is simple and effective, achieving state-of-the-art prediction performance. It learns vector embeddings for both entities and relations; these embeddings are denoted by the same letters in boldface. The basic idea behind TransE is that every relation is regarded as a translation between the embeddings of entities, that is, \(\mathbf{h} +\mathbf{r} \approx \mathbf{t} \) when (h, r, t) holds. TransE is suitable for 1-to-1 relations, but has issues when dealing with N-to-1, 1-to-N and N-to-N relations. TransH [10] is proposed to address these flaws: it enables an entity to have different representations when involved in different relations by utilizing a relation-specific hyperplane, which is characterized by a normal vector \(\mathbf{w} _r\) and a translation vector \(\mathbf{d} _r\). The embedding vectors \(\mathbf{h} \) and \(\mathbf{t} \) are projected onto the hyperplane of relation r to obtain \(\mathbf{h} _{\perp }=\mathbf{h} -\mathbf{w} _r^\mathrm {T}{} \mathbf{h} {} \mathbf{w} _r\) and \(\mathbf{t} _{\perp }=\mathbf{t} -\mathbf{w} _r^\mathrm {T}{} \mathbf{t} {} \mathbf{w} _r\), and then \(\mathbf{h} _{\perp }+\mathbf{d} _r\approx \mathbf{t} _{\perp }\). TransE and TransH share the common assumption that entities and relations lie in the same vector space. However, entities and relations are different types of objects, so they should arguably lie in different vector spaces. TransR [11] projects \(\mathbf{h} \) and \(\mathbf{t} \) onto the aspects that relation r focuses on through a mapping matrix \(\mathbf{M} _r\), so that \(\mathbf{M} _r\mathbf{h} +\mathbf{r} \approx \mathbf{M} _r\mathbf{t} \).

However, none of these methods considers the importance of different triplets. Intuitively, different facts and different entities have different levels of importance: some triplets matter more than others. Hence, when learning the information in a knowledge graph, we should pay different amounts of attention to different triplets. To address this issue, we propose a new approach, PRTransE, which incorporates importance information when learning knowledge graph triplets and obtains this importance mainly via PageRank, hence the name PRTransE. Our contributions in this paper are as follows:

  • We propose a novel model, PRTransE, which incorporates importance information into knowledge graph completion. It provides a flexible approach to deal with different triplets in a knowledge graph.

  • Compared with TransE, PRTransE pays different amounts of attention to different triplets while learning knowledge graph facts.

  • In experiments, our approach outperforms previous methods, including TransE, TransH and TransR, on the link prediction task.

2 Related Work

We first define our notation. (h, r, t) denotes a triplet and bold lower case \(\mathbf{h}, \mathbf{r}, \mathbf{t} \) denotes the corresponding column vectors; bold upper case letters denote matrices, such as \(\mathbf{M} \). \(f_{r}(\mathbf{h}, \mathbf{t} )\) represents the score function. The score is expected to be low for a golden triplet and high for an incorrect triplet. Other notations will be described in the appropriate sections.

2.1 Translation-Based Models

As mentioned in Sect. 1, TransE regards the relation r as the translation from h to t for a golden triplet, i.e. the pair of embedded entities in a triplet (h, r, t) can be linked by \(\mathbf{r} \) with only a slight error if (h, r, t) is a golden triplet. This indicates that \(\mathbf{t} \) should be a nearest neighbor of \(\mathbf{h} +\mathbf{r} \), while \(\mathbf{h} +\mathbf{r} \) should be far away from \(\mathbf{t} \) otherwise. The score function is

$$\begin{aligned} f _{r}(\mathbf{h} ,\mathbf{t} )=\left\| \mathbf{h} +\mathbf{r} -\mathbf{t} \right\| _{2}^{2} \end{aligned}$$
(1)
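For concreteness, the TransE score of Eq. (1) can be sketched in a few lines of NumPy. This is our own illustrative implementation; the function name and toy vectors are not from the paper:

```python
import numpy as np

def transe_score(h, r, t):
    """Squared L2 distance ||h + r - t||^2 of Eq. (1); lower means more plausible."""
    diff = h + r - t
    return float(np.dot(diff, diff))

h = np.array([1.0, 2.0])
r = np.array([3.0, 0.0])
t = np.array([4.0, 2.0])
# h + r equals t exactly here, so this "golden" triplet scores 0.0
score = transe_score(h, r, t)
```

A corrupted triplet, e.g. one with a random tail, would generally receive a larger score.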

TransE is suitable for 1-to-1 relations, but has issues with N-to-1, 1-to-N and N-to-N relations. To solve these problems, TransH enables an entity to have different representations on the hyperplanes of different relations. For a relation r, TransH models the relation as a relation-specific translation vector \(\mathbf{r} \) on a relation-specific hyperplane with \(\mathbf{w} _{r}\) as the normal vector, rather than in the same space as the entity embeddings. Concretely, for a triplet (h, r, t), the entity embeddings \(\mathbf{h} \) and \(\mathbf{t} \) are first projected onto the hyperplane defined by \(\mathbf{w} _{r}\). The projected vectors are denoted as \(\mathbf{h} _{\perp }\) and \(\mathbf{t} _{\perp }\), respectively. The score function is defined as

$$\begin{aligned} f _{r}(\mathbf{h} ,\mathbf{t} )=\left\| \mathbf{h} _{\perp }+\mathbf{r} -\mathbf{t} _{\perp } \right\| _{2}^{2} \end{aligned}$$
(2)

TransH ensures that \(\mathbf{h} _{\perp }\) and \(\mathbf{t} _{\perp }\) are on the hyperplane of r by restricting \(\left\| \mathbf{w} _{r}\right\| =1\). TransE and TransH share the common assumption that entities and relations lie in the same vector space. Although TransH extends modeling flexibility by exploiting relation hyperplanes, it does not fully break free of this assumption. However, entities and relations are different types of objects, so they should lie in different vector spaces.

Hence, TransR is proposed based on this observation; it models entities and relations in distinct spaces, i.e. an entity space and relation spaces. TransR uses a relation-specific projection matrix \(\mathbf{M} _{r}\) for each relation r to map entities into a relation-specific subspace. The score function of TransR is:

$$\begin{aligned} f _{r}(\mathbf{h} ,\mathbf{t} )=\left\| \mathbf{M} _{r}{} \mathbf{h} +\mathbf{r} -\mathbf{M} _{r}{} \mathbf{t} \right\| _{2}^{2} \end{aligned}$$
(3)

2.2 Other Models

In addition to translation-based models, semantic matching models utilize similarity-based scoring functions. They measure the plausibility of facts by matching the latent semantics of the entities and relations in the vector-space representation.

In DISTMULT [12], entities are represented by vectors and relations by diagonal matrices, which model pairwise interactions between latent factors; the scoring function is a bilinear function.

ComplEx [13] extends DISTMULT to better model asymmetric relations by introducing complex-valued embeddings. The entities and relations then lie in a complex space, and a fact with an asymmetric relation can receive different scores depending on the order of the involved entities.

3 Model

To take into account the importance of different triplets, we propose PRTransE, which introduces triplet importance in a natural way, based mainly on PageRank.

3.1 PRTransE

Triplet importance includes two parts: 1) entity importance and 2) relation importance. In addition, knowledge graph datasets include positive triplets and negative triplets, and PRTransE processes positive and negative triplets differently. The structure of PRTransE is shown in Fig. 1.

Fig. 1. The model structure of PRTransE

3.2 Entity Importance

For entity importance [14] in both positive and negative triplets, PageRank is used. PageRank was originally designed to measure the importance of web pages on the Internet, where different nodes of the web graph have different importance. Inspired by L. Page et al. [15], we use the idea of PageRank to compute entity importance in the knowledge graph. For entity importance in negative triplets, the same PageRank algorithm is adopted as for positive triplets.

$$\begin{aligned} E_{imp}(h_i)=\frac{1-d}{N}+d\sum \limits _{h_j\in M(h_i)}\frac{E_{imp}(h_j)}{L(h_j)} \end{aligned}$$
(4)

where \(h_i\) represents the entity whose importance is to be estimated, \(M(h_i)\) is the set of entities that link to \(h_i\), N is the total number of entities in the knowledge graph, d is the damping coefficient, and \(L(h_j)\) is the number of outlinks of entity \(h_j\).
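Eq. (4) can be computed by standard power iteration over the entity graph, treating each triplet as a directed edge from head to tail. The sketch below is our own minimal version, not the authors' implementation; the function name and toy graph are illustrative:

```python
import numpy as np

def entity_pagerank(edges, n, d=0.85, iters=50):
    """Iterative PageRank (Eq. 4) over n entities.
    edges: list of (h, t) pairs, i.e. entity h links to entity t."""
    out_deg = np.zeros(n)
    for h, t in edges:
        out_deg[h] += 1
    imp = np.full(n, 1.0 / n)          # uniform initial importance
    for _ in range(iters):
        new = np.full(n, (1 - d) / n)  # teleportation term (1-d)/N
        for h, t in edges:
            new[t] += d * imp[h] / out_deg[h]
        imp = new
    return imp

# Toy graph: 0 -> 1, 0 -> 2, 1 -> 2, 2 -> 0; entity 2 receives the most "votes"
scores = entity_pagerank([(0, 1), (0, 2), (1, 2), (2, 0)], n=3)
```

Note this sketch assumes every entity has at least one outlink; dangling nodes would need the usual redistribution fix.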

3.3 Relation Importance

For relation importance in positive triplets, the model assumes that the importance of a relation is affected not only by the first-order relation [16] but also by higher-order relations. To estimate the first-order relation importance, PRTransE adopts the cosine similarity between the head and tail entity embeddings.

$$\begin{aligned} R_{imp_1}(r_i)=\displaystyle \frac{h_i \cdot t_i}{||h_i||_{2}||t_i||_{2}} \end{aligned}$$
(5)

where \(R_{imp_1}\) represents the first-order importance of relation \(r_i\) in the triplet \((h_i,r_i,t_i)\), \(h_i\) and \(t_i\) are the embedding vectors of the head and tail entity respectively, and \(||\cdot ||_{2}\) is the L2 norm of the corresponding vector.
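Eq. (5) is the standard cosine similarity of the two entity embeddings; a minimal sketch (the function name is ours):

```python
import numpy as np

def relation_imp_first_order(h_vec, t_vec):
    """First-order relation importance of Eq. (5): cosine similarity
    between the head and tail entity embeddings."""
    denom = np.linalg.norm(h_vec) * np.linalg.norm(t_vec)
    return float(np.dot(h_vec, t_vec) / denom)
```

Embeddings pointing in the same direction give importance 1.0; orthogonal embeddings give 0.0.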

For the higher-order importance of a relation, the model regards the association of the head entity and the tail entity with the relation in a triplet as “voting” for that relation, and then combines this with entity importance to obtain the higher-order importance of the relation. For example, as shown in Fig. 2, assume that r is a 3-to-2 relation; in the formation of the triplet (h2, r, t2), the head entity’s association with the relation can be regarded as a “vote”, and the same holds for the tail entity. The importance scores of the entities at both ends are divided by the number of alternative paths and then combined to obtain the higher-order importance of the relation in the triplet.

Fig. 2. The diagram of higher-order importance of relation

The higher-order importance of a relation is calculated as follows:

$$\begin{aligned} R_{imp_2}(r_i)=\frac{E_{imp}(h_i)}{h_{r_i}pt_{r_i}}+\frac{E_{imp}(t_i)}{t_{r_i}ph_{r_i}} \end{aligned}$$
(6)

where \(R_{imp_2}(r_i)\) is the higher-order importance of relation \(r_i\), and \(E_{imp}(h_i)\) and \(E_{imp}(t_i)\) are the importance scores of entities \(h_i\) and \(t_i\) respectively. \(h_{r_i}pt_{r_i}=\frac{\#(\Delta _{r_i})}{\#distinct(t_{r_i})}\) [17], where \(t_{r_i}\) denotes the tail entities belonging to relation \(r_i\) and \(\Delta _{r_i}\) denotes the training triplets containing relation \(r_i\); \(h_{r_i}pt_{r_i}\) is the average number of triplets per distinct tail entity of relation \(r_i\). Similarly, \(t_{r_i}ph_{r_i}=\frac{\#(\Delta _{r_i})}{\#distinct(h_{r_i})}\) is the average number of triplets per distinct head entity of relation \(r_i\).
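Under our reading of Eq. (6), the per-relation statistics and the higher-order importance can be sketched as follows (function names and the toy triplets are ours, not the paper's):

```python
from collections import defaultdict

def hpt_tph(triplets):
    """Per-relation (hpt, tph): average number of training triplets per
    distinct tail entity and per distinct head entity, respectively."""
    by_rel = defaultdict(list)
    for h, r, t in triplets:
        by_rel[r].append((h, t))
    stats = {}
    for r, pairs in by_rel.items():
        heads = {h for h, _ in pairs}
        tails = {t for _, t in pairs}
        stats[r] = (len(pairs) / len(tails), len(pairs) / len(heads))
    return stats

def relation_imp_higher_order(e_imp, h, t, hpt, tph):
    """Higher-order relation importance of Eq. (6): entity importance
    scores divided by the relation's mapping statistics, then summed."""
    return e_imp[h] / hpt + e_imp[t] / tph

# Toy example: relation 'r' has 3 triplets, 2 distinct heads, 2 distinct tails
stats = hpt_tph([(0, 'r', 2), (1, 'r', 2), (0, 'r', 3)])
```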

Hence, the relation importance affected by higher-order relations is obtained. The comprehensive relation importance score \(R_{imp}(r_i)\) is then gained by combining the first-order importance with the higher-order importance:

$$\begin{aligned} R_{imp}(r_i)=\alpha *R_{imp_{1}}(r_i)+(1-\alpha )*R_{imp_{2}}(r_i) \end{aligned}$$
(7)

where \(\alpha \) is a hyperparameter, used to weigh the first-order importance and higher-order importance of the relationship.

In this manner, we obtain the first-order and higher-order importance scores of a relation and combine the two into an overall relation importance score.

3.4 Triplet Importance

After obtaining entity importance and relation importance, the importance score \(w_{+}(ptri_i)\) of the i-th positive triplet \(ptri_i=(h_i,r_i,t_i)\) can be calculated. The formula is defined as follows:

$$\begin{aligned} w_{+}(ptri_i)=\frac{E_{imp}(h_i)+E_{imp}(t_i)+R_{imp}(r_i)}{3} \end{aligned}$$
(8)
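Eq. (7) and Eq. (8) are simple weighted averages and can be sketched directly (function names are ours; the \(\alpha = 0.5\) default is illustrative, since the paper tunes it as a hyperparameter):

```python
def relation_imp(r_imp1, r_imp2, alpha=0.5):
    """Eq. (7): interpolate first-order and higher-order relation importance."""
    return alpha * r_imp1 + (1 - alpha) * r_imp2

def positive_triplet_weight(e_imp_h, e_imp_t, r_imp):
    """Eq. (8): average of the head, tail and relation importance scores."""
    return (e_imp_h + e_imp_t + r_imp) / 3
```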

However, relation importance in negative triplets differs from that in positive triplets in two respects. 1) Negative triplets are generated by replacing a correct entity in a positive triplet with another entity, and the replacement will in most cases invalidate the original triplet, resulting in incorrectly matched information. Since information in the knowledge graph flows through the entire network via relations, mismatched entities cause incorrect information to flow, destroying the higher-order importance of the relation. Therefore, for negative triplets, the model does not consider the higher-order importance of the relation. 2) Since the model learns from negative samples during training, it can improve its ability to recognize incorrect triplets when making link predictions. Therefore, the model introduces a mapping importance \(W_{r_i}\) for negative triplets, whose purpose is to strengthen the learning process on negative triplets; this factor is obtained by combining \(h_{r_i}pt_{r_i}\) and \(t_{r_i}ph_{r_i}\). The formula is as follows:

$$\begin{aligned} W_{r_i}=\frac{1}{\log (h_{r_i}pt_{r_i}+t_{r_i}ph_{r_i})} \end{aligned}$$
(9)

where \(W_{r_i}\) is a weight coefficient arising from \(h_{r_i}pt_{r_i}\) and \(t_{r_i}ph_{r_i}\).

Hence, for the i-th negative triplet \(ntri_i\), e.g. \((h_i,r_i,t'_i)\), its importance score \(w_{-}(ntri_i)\) is

$$\begin{aligned} w_{-}(ntri_i)=\frac{E_{imp}(h_i)+E_{imp}(t'_i)+R_{imp_1}(r_i)+W_{r_i}}{4} \end{aligned}$$
(10)

where \(E_{imp}(h_i)\) and \(E_{imp}(t'_i)\) are the importance scores of entities \(h_i\) and \(t'_i\), \(R_{imp_1}(r_i)\) is the first-order importance of the relation as in Eq. (5), and \(W_{r_i}\) is the mapping importance of the relation in the negative triplet.
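Eq. (9) and Eq. (10) can be sketched as below (function names are ours; note Eq. (9) implicitly requires \(h_{r_i}pt_{r_i}+t_{r_i}ph_{r_i}>1\) so that the logarithm is positive):

```python
import math

def mapping_importance(hpt, tph):
    """Eq. (9): weight coefficient from the relation's mapping statistics."""
    return 1.0 / math.log(hpt + tph)

def negative_triplet_weight(e_imp_h, e_imp_t_neg, r_imp1, w_r):
    """Eq. (10): average of the four importance terms for a negative triplet."""
    return (e_imp_h + e_imp_t_neg + r_imp1 + w_r) / 4
```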


4 Experiments and Analysis

4.1 Datasets and Evaluation Protocol

To evaluate our proposed method, we use two benchmark datasets from two popular knowledge graphs: WordNet (Miller 1995) and Freebase (Bollacker et al. 2008). WordNet is a large lexical knowledge graph whose entities are synsets representing distinct concepts and whose relations are conceptual-semantic and lexical relations. Freebase is a large collaborative knowledge base consisting of a great many real-world facts, such as the triplets (Peter Finch, location, London) and (Yoshinaga_Sayuri, profession, actor). Table 1 lists the statistics of the two datasets.

Table 1. The summary of datasets

The purpose of the link prediction task is to complete a triplet \((e_{i},r_{k},e_{j})\) that lacks \(e_{i}\) or \(e_{j}\), i.e. to predict \(e_{i}\) given \((r_{k},e_{j})\) or to predict \(e_{j}\) given \((e_{i},r_{k})\). The task emphasizes the rank of the correct entity rather than only finding the single best entity. Similarly to (Bordes et al. 2013), we use two metrics as our evaluation protocols: the average rank of all correct entities (Mean Rank) and the proportion of correct entities ranked in the top N (Hits@N) for N = 1, 3, and 10. A lower Mean Rank and a higher Hits@N indicate better model performance.
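The two metrics follow their standard definitions and can be computed from the per-query ranks of the correct entities; a short sketch (names and toy ranks are ours):

```python
def mean_rank_and_hits(ranks, n=10):
    """Mean Rank and Hits@N from the 1-based ranks of the correct entities."""
    mean_rank = sum(ranks) / len(ranks)
    hits_at_n = sum(1 for r in ranks if r <= n) / len(ranks)
    return mean_rank, hits_at_n

# Four queries whose correct entities were ranked 1, 3, 12 and 200
mr, h10 = mean_rank_and_hits([1, 3, 12, 200], n=10)  # -> (54.0, 0.5)
```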

4.2 Experiment Setup

We first describe the notation. We assume that there are \(n_t\) triplets in the training set, and \((h_i,r_{i},t_{i})(i=1,2,...,n_{t})\) denotes the i-th triplet. Each triplet has a label \(y_i\) indicating whether the triplet is positive \((y_i=1)\) or negative \((y_i=0)\). The positive triplet set is \(S=\{(h_j,r_j,t_j)|y_j=1\}\) and the negative triplet set is \(S^{'}=\{(h_j,r_j,t_j)|y_j=0\}\). However, the knowledge graph originally contains only positive training triplets, not negative ones. Therefore, we obtain the positive triplet set S from the knowledge graph and then generate a negative triplet set according to the rule \(S^{'}=\{(h_l,r_k,t_k)|h_l \ne h_k \wedge y_k=1\} \cup \{(h_k,r_k,t_l)|t_l \ne t_k \wedge y_k=1\}\), that is, we replace the head entity or tail entity of a positive triplet with a wrong entity to generate a negative triplet. According to previous work, there are two commonly used replacement strategies: “Unif” and “Bern”. “Unif” randomly replaces the head or tail entity of a positive triplet; however, since the knowledge graph itself is quite incomplete, this random sampling may introduce many false-negative triplets into training. The “Bern” strategy instead takes into account the mapping properties of the relation in a triplet and replaces the head or tail entity with different probabilities. For example, relations in the knowledge graph can be classified as 1-to-1, 1-to-N, N-to-1 and N-to-N. If a relation is 1-to-N, the “Bern” strategy tends to replace the head entity with a larger probability; if the relation is N-to-1, it tends to replace the tail entity with a larger probability.
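The “Bern” strategy introduced with TransH [10] sets the head-replacement probability from the relation's tph and hpt statistics. A minimal sketch of one corruption step (the helper name and toy entities are ours):

```python
import random

def bern_corrupt(h, r, t, entities, hpt, tph):
    """'Bern' negative sampling: replace the head with probability
    tph / (tph + hpt), otherwise replace the tail. For a 1-to-N relation
    (large tph) the head is replaced more often, which reduces the chance
    of generating a false-negative triplet."""
    if random.random() < tph / (tph + hpt):
        return random.choice(entities), r, t  # corrupt the head
    return h, r, random.choice(entities)      # corrupt the tail
```

For instance, with tph = 10 and hpt = 1 (a strongly 1-to-N relation), roughly 10 out of 11 corruptions replace the head entity.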

Here, we define the following margin-based training objective function:

$$\begin{aligned} L=\sum _{(h,r,t)\in S} \sum _{(h^{'},r,t^{'})\in S'} [\gamma +w_{+}(ptri_{i})d(h+r,t)-w_{-}(ntri_{i})d(h^{'}+r,t^{'})]_+ \end{aligned}$$
(11)

where \([x]_{+}\) denotes the positive part of x, \(\gamma >0\) is a margin hyperparameter, S is the set of positive triplets and \(S'\) is the set of negative triplets. \(d(h+r,t)\) is the distance score (or energy score) of a triplet: for a positive triplet \((h,r,t)\), \(h+r\approx t\), so its distance score \(d(h+r,t)\) should be small, while for a negative triplet it should be larger; the L1 or L2 norm can be used to compute the distance. The objective function is minimized using stochastic gradient descent.
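A minimal sketch of the weighted loss in Eq. (11) for a single (positive, negative) pair, using the L2 distance (the function name and example weights are ours, not the paper's):

```python
import numpy as np

def weighted_margin_loss(pos, neg, w_pos, w_neg, gamma=1.0):
    """One term of Eq. (11): [gamma + w+ * d(h+r,t) - w- * d(h'+r,t')]_+ ,
    where pos and neg are (h, r, t) tuples of embedding vectors."""
    d_pos = np.linalg.norm(pos[0] + pos[1] - pos[2])
    d_neg = np.linalg.norm(neg[0] + neg[1] - neg[2])
    return max(0.0, gamma + w_pos * d_pos - w_neg * d_neg)
```

A well-separated pair (small positive distance, large negative distance) contributes zero loss; otherwise the hinge term pushes the embeddings apart by at least the margin \(\gamma \).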

In this task, we use two datasets: FB15K and WN18. We select the margin \(\gamma \) among \(\{1, 1.5, 2, 2.5, 3\}\), the dimension of entity vectors m and the dimension of relation vectors n among \(\{50, 100, 150\}\), and the mini-batch size B among \(\{100, 500, 1000\}\). The best configuration, determined on the validation set, is \(\gamma =1\), \(m=n=50\), \(B=200\), with \(L_{2}\) as the dissimilarity measure. For all datasets, we train for 2000 rounds.

4.3 Link Prediction

The task of link prediction is to predict the missing h or t of a golden triplet (h, r, t). In this task, we remove the head/tail entity and replace it with each entity in the knowledge graph. For each position of the missing entity, the model is asked to rank the set of candidate entities from the knowledge graph, instead of finding only the single best result. We first compute the scores of these corrupted triplets and rank them in ascending order of score (lower is better); the rank of the correct entity is then recorded. Note that a corrupted triplet may itself exist in the knowledge graph, in which case it should be regarded as a correct triplet. Hence, we should remove corrupted triplets that appear in the training, validation or test set before ranking. This gives two evaluation settings: “Raw” (corrupted triplets are not removed) and “Filter” (corrupted triplets are removed). We report evaluation results under both settings.

Experimental results on all datasets are shown in Table 2. From Table 2, we can conclude that: (1) PRTransE outperforms the other baseline embedding models (TransE, TransH, TransR). (2) On both the FB15K and WN18 public datasets, PRTransE achieves a consistent improvement in Mean Rank, suggesting that considering triplet importance benefits knowledge graph completion. (3) On FB15K, PRTransE’s Hits@10 under the “Raw” setting is slightly weaker than that of ComplEx, but its Hits@10 under the “Filter” setting is better than all compared models; on the WN18 dataset, PRTransE’s Mean Rank and Hits@10 are better than those of the compared models, indicating that in entity link prediction PRTransE ranks the correct candidate entity higher and more accurately.

Table 2. Average results of link prediction

On the FB15K dataset, under the “Raw” setting, the Mean Rank of PRTransE improves by about 21.9% over the TransE model and Hits@10 improves by about 41.0% over TransE; under the “Filter” setting, Mean Rank improves by about 56.7% over TransE and Hits@10 by about 52.9%.

On the WN18 dataset, PRTransE shows a consistent improvement on all metrics over the compared models. Under the “Raw” setting, the Mean Rank of PRTransE improves by approximately 11.5% over TransE and Hits@10 by about 6.8%; under the “Filter” setting, Mean Rank improves by about 12.2% over TransE and Hits@10 by about 3.9%.

The TransE model and the other compared models do not consider the importance information of triplets; the results thus verify the effectiveness of PRTransE’s use of triplet importance information in knowledge graph completion.

5 Conclusion

In this paper, we proposed PRTransE, a knowledge graph completion model based on PageRank and triplet importance. Building on TransE, it incorporates the triplet importance information in the knowledge graph so as to pay an appropriate degree of attention to different triplets during learning, improving the predictive ability of the model. In addition, when considering triplet importance, the importance information hidden in the knowledge graph is fully explored, which makes the model more reasonable when evaluating triplet importance during training and further improves its performance. Experimental results show that, compared with TransE and the other knowledge representation models, PRTransE achieves the best overall performance, with consistent improvements on almost all metrics.