CN109522954A - Heterogeneous Information network linking prediction meanss - Google Patents
Heterogeneous Information network linking prediction meanss Download PDFInfo
- Publication number
- CN109522954A CN109522954A CN201811357907.8A CN201811357907A CN109522954A CN 109522954 A CN109522954 A CN 109522954A CN 201811357907 A CN201811357907 A CN 201811357907A CN 109522954 A CN109522954 A CN 109522954A
- Authority
- CN
- China
- Prior art keywords
- label
- training
- sample
- vote
- type
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Links
- 238000012360 testing method Methods 0.000 claims abstract description 50
- 239000013598 vector Substances 0.000 claims abstract description 8
- 238000010276 construction Methods 0.000 claims abstract description 4
- 238000005295 random walk Methods 0.000 claims description 6
- 238000004364 calculation method Methods 0.000 claims description 3
- 238000002372 labelling Methods 0.000 abstract description 4
- 238000000605 extraction Methods 0.000 abstract 1
- 238000000034 method Methods 0.000 description 19
- 230000015572 biosynthetic process Effects 0.000 description 5
- 238000004422 calculation algorithm Methods 0.000 description 2
- 238000010586 diagram Methods 0.000 description 2
- 101001006370 Actinobacillus suis Hemolysin Proteins 0.000 description 1
- 241000283690 Bos taurus Species 0.000 description 1
- 230000009286 beneficial effect Effects 0.000 description 1
- 238000000546 chi-square test Methods 0.000 description 1
- 238000010411 cooking Methods 0.000 description 1
- 238000007405 data analysis Methods 0.000 description 1
- 230000000694 effects Effects 0.000 description 1
- 239000000284 extract Substances 0.000 description 1
- 239000003517 fume Substances 0.000 description 1
- 230000003993 interaction Effects 0.000 description 1
- 230000007246 mechanism Effects 0.000 description 1
- 238000005065 mining Methods 0.000 description 1
- 238000012986 modification Methods 0.000 description 1
- 230000004048 modification Effects 0.000 description 1
- 230000001737 promoting effect Effects 0.000 description 1
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
Landscapes
- Engineering & Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Theoretical Computer Science (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Bioinformatics & Computational Biology (AREA)
- Artificial Intelligence (AREA)
- Evolutionary Biology (AREA)
- Evolutionary Computation (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Life Sciences & Earth Sciences (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
A kind of Heterogeneous Information network linking prediction meanss, described device include: setup unit, and corresponding type label is arranged suitable for setting first path between heterogeneous network interior joint pair to be predicted, the maximum length in first path and every kind of first path type;Construction unit forms sample set suitable for constructing sample vector based on the isomery topological characteristic between first path extraction node pair;The sample set includes training set and test set;Classification learning unit, suitable for obtaining corresponding multiple labeling classifier based on the training set and test set progress multiple labeling classification learning in the sample set;Predicting unit, the multiple labeling classifier suitable for being obtained using training predict the unknown relation between heterogeneous network interior joint to be predicted.The accuracy of Heterogeneous Information network linking prediction can be improved in above-mentioned scheme.
Description
Technical Field
The invention belongs to the technical field of data analysis, and particularly relates to a heterogeneous information network link prediction device.
Background
Many complex systems in the real world may be formatted as a network, with nodes representing objects and links representing interactions between objects. Most of these networks are heterogeneous networks, which contain various types of objects and relationships, and are usually composed of multiple sub-networks. For example, the online social network Twitter contains information about types such as user basic information, user location, and user Twitter operations, information of types such as post/reply/forward tweets, follow/follow, check-in, and the like.
As a key issue in link mining, link prediction aims to predict the formation of future links based on current or historical networks. It has wider application in the fields of book-to-eye networks, biological networks, social networks and the like. Most existing link prediction methods are designed for homogeneous information networks, where nodes and links are of the same type. Recently, there is a great interest in promoting link prediction in heterogeneous networks because it has a wider application prospect.
However, the link prediction in the heterogeneous network in the prior art has the problem of low prediction accuracy.
Disclosure of Invention
The technical problem solved by the invention is how to improve the accuracy of heterogeneous information network link prediction.
In order to achieve the above object, an embodiment of the present invention further provides a heterogeneous information network link prediction apparatus, where the apparatus includes:
the setting unit is suitable for setting meta-paths between node pairs in the heterogeneous network to be predicted, the maximum length of the meta-paths and a type label corresponding to each meta-path type;
the construction unit is suitable for extracting heterogeneous topological features between node pairs based on meta-paths, constructing sample vectors and forming a sample set; the sample set comprises a training set and a test set;
the classification learning unit is suitable for performing multi-label classification learning based on the training set and the test set in the sample set to obtain a corresponding multi-label classifier;
and the prediction unit is suitable for predicting the unknown relation between the nodes in the heterogeneous network to be predicted by adopting the multi-label classifier obtained by training.
Optionally, the classification learning unit is adapted to select a training subset corresponding to a label pair formed by every two set type labels from the training subset, and perform two-classification learning on the selected training subset, so as to obtain a plurality of two classifiers corresponding to each label pair one to one; respectively inputting the test set into a plurality of two classifiers obtained by training, and calculating first votes obtained by examples corresponding to samples in the test set on various types of labels; adding the corresponding virtual labels into each sample in the corresponding training subsets respectively to obtain training subsets corresponding to label pairs formed by the corresponding type labels and the virtual labels, and training the training subsets respectively by adopting the obtained training subsets to obtain a plurality of auxiliary two classifiers corresponding to the type labels one to one; the virtual labels are used for marking the division points of type labels related and unrelated to the samples in the corresponding training subsets; respectively inputting the test set into a plurality of auxiliary second classifiers obtained by training, and calculating a second vote obtained on each type label and a third vote obtained on a virtual label by a sample corresponding to the test set; and determining a final multi-label classifier based on the first vote and the second vote obtained on each type label by the corresponding example of the test sample and the third vote obtained on the virtual label.
Optionally, the heterogeneous topology features between the node pairs include a path number feature and a random walk feature.
Optionally, the classification learning unit is adapted to calculate a first vote obtained by an instance corresponding to the sample in the test set on each type label by using the following formula:
wherein, ζ (x)i,lj) Represents example xiOn the label ljVotes obtained above, ClfjkRepresents a label pair (l)j,lk) A corresponding two-classifier is used for classifying the data,indicating that the samples are correctly predicted as negative examples in the training subset whenIndicating that the sample was correctly predicted as a positive example in the training subset.
Optionally, the classification learning unit is adapted to calculate a second vote obtained by the corresponding instance of the sample in the test set on each type label by using the following formula:
wherein, ζ (x)i,lj) Represents example xiOn the label ljVotes obtained above, which have not been updated, ClfjkRepresents a label pair (l)j,lk) A corresponding two-classifier is used for classifying the data,indicating that the sample was correctly predicted as a positive example in the training subset.
Optionally, the classification learning unit is adapted to calculate a second vote obtained by the corresponding instance of the sample in the test set on each type label by using the following formula:
therein, ζ*(xi,ls) Represents example xiIn the virtual label lsThe votes obtained on the above-mentioned voting table,indicating that the sample was correctly predicted as a negative case in the training subset.
Optionally, the classification learning unit is adapted to determine a final multi-label classifier based on the first vote and the second vote obtained on each type label and the third vote obtained on the virtual label by the corresponding example of the test sample by using the following formula:
h(x)={lj|ζ*(x,lj)>ζ*(x,ls)};
wherein h (x) represents the multi-label classifier, ljIndicates the jth type tag, ζ*(x,lj) Represents example xiOn the label ljThe final vote, ζ obtained above*(x,ls) Represents example xiIn the virtual label lsThe votes obtained above.
Optionally, the apparatus may further include:
and the calculation output unit calculates and outputs the dependency scores between the label pairs.
Compared with the prior art, the invention has the beneficial effects that:
according to the scheme, the corresponding multi-label classifier is obtained by performing multi-label classification learning based on the training set and the test set in the sample set, and the unknown relation between the nodes in the heterogeneous network to be predicted is predicted by adopting the multi-label classifier obtained by training, so that not only can the direct link between the nodes be predicted, but also other relations between the nodes, namely the meta-path, can be predicted, the instance has a plurality of types of labels instead of the unique type labels, and the accuracy of heterogeneous network link prediction can be improved.
Further, by calculating and outputting the dependency scores between the label pairs, suggestions on how to form new links can be provided, and the method helps to explore the rules of link and relation formation in a complex network, and is convenient and practical.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present application, the drawings needed to be used in the description of the embodiments are briefly introduced below, and it is obvious that the drawings in the following description are only some embodiments of the present application, and it is obvious for those skilled in the art to obtain other drawings based on these drawings without inventive labor.
Fig. 1 is a flowchart illustrating a method for predicting links of a heterogeneous information network according to an embodiment of the present invention;
FIG. 2 is a flowchart illustrating a method for training a multi-labeled classifier according to an embodiment of the present invention;
fig. 3 is a schematic structural diagram illustrating a heterogeneous information network link prediction apparatus according to an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only a part of the embodiments of the present application, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application. The directional indications (such as up, down, left, right, front, back, etc.) in the embodiments of the present invention are only used to explain the relative positional relationship between the components, the movement, etc. in a specific posture (as shown in the drawings), and if the specific posture is changed, the directional indication is changed accordingly.
As described in the background, link prediction in heterogeneous networks has the following problems:
(1) representation of topological features: due to the heterogeneity of objects and links, the topological features used in homogeneous networks are not directly available. The neighbors of two nodes may be of different types, so the number of common neighbors cannot represent this heterogeneity.
(2) Interdependencies between different relationships: in the present invention, a relationship refers not only to an explicit link but also to an indirect connection. Modeling the correlation between different types of relationships is important because they may affect each other. For example, in a co-author network, two scholars may establish a co-author (one type of relationship) by attending the same meeting (another type of relationship).
A simple method of studying the isomorphic projection of the network alone avoids the first problem, but loses information because it ignores the dependency patterns between types. While heterogeneity presents many difficulties to link predictions, it also provides rich and valuable information, understanding the underlying mechanisms of link formation. The study of link prediction follows the common meaning that several links can be generated randomly, but most links are generated in a latent pattern. The information of the target link depends on the potential relationship between the nodes of the target link.
If not only the direct links between the predicted nodes are considered, but also other relationships between the predicted nodes, namely meta-paths, are considered, the traditional two-classification supervision method is not applicable any more, and multi-label learning is required.
Real-world objects tend not to have only unique semantics but may be ambiguous, for example, a picture may convey a variety of information such as "blue sky", "brook", "cattle", and "cooking fume", among others. The ambiguous objects no longer have unique semantics, so that the traditional supervised learning framework with single semantics cannot achieve good effect.
According to the technical scheme, the corresponding multi-label classifier is obtained by performing multi-label classification learning on the basis of the training set and the test set in the sample set, and the unknown relation between the nodes in the heterogeneous network to be predicted is predicted by adopting the multi-label classifier obtained by training, so that not only can the direct link between the nodes be predicted, but also other relations between the nodes, namely the meta-path, can be predicted, the instance has a plurality of types of labels instead of being unique, and the accuracy of heterogeneous network link prediction can be improved.
In order to make the aforementioned objects, features and advantages of the present invention comprehensible, embodiments accompanied with figures are described in detail below.
Fig. 1 is a flowchart illustrating a heterogeneous information network link prediction method according to an embodiment of the present invention. Referring to fig. 1, a heterogeneous information network link prediction method may specifically include the following steps:
step S101: setting a meta-path between node pairs in the heterogeneous network to be predicted, the maximum length of the meta-path and a corresponding type label for each meta-path type.
In a specific implementation, the meta-path is a path from one node to another node in a heterogeneous network through a series of type connections. For example, meta path Pk=(V1E1V2E2...En-1Vn) Represents a node V1Linking E by a series of typesi(i 1, 2.., n-1) to node VnCan be recorded as
For simplicity, the first capital letter of the english name of the corresponding node is used for representation in the embodiment of the present invention. For example, in a DBLP collaborator network, P denotes a paper, a denotes an author, T denotes a subject, V denotes a conference, and U denotes a user, a meta path for describing a reference relationship between authors may be abbreviated as "APPA".
In a specific implementation, the meta-path between the node pairs in the heterogeneous network to be predicted, the maximum length of the meta-path, and each meta-path type set a corresponding type tag, and those skilled in the art can set the type tag according to actual needs, which is not limited herein.
Step S102: extracting heterogeneous topological features between node pairs based on meta-paths, and constructing sample vectors to form a sample set; the sample set includes a training set and a test set.
In an embodiment of the present invention, the heterogeneous topology features between the node pairs include a path number feature and a random walk feature. The path number indicates the total number of meta-paths of corresponding types between node pairs, and the random walk characteristics between the nodes correspond to the meta-path types one to one, that is, each meta-path type has a corresponding random walk characteristic, and can be calculated by using the following formula:
wherein,denotes the starting point as viEnding point is node vjThe number of the k-th meta-paths of (c),indicates that the starting point is viThe total number of k-th element paths.
As can be seen from the above description, when the meta-path type in the heterogeneous network to be predicted is K, and each meta-path between node pairs can be quantized into a path number and randomly walk two heterogeneous topological features, each node pair can be described by 2K heterogeneous topological features. In other words, each node is described using a corresponding 2K-dimensional space vector, which is referred to as the instance to which the node corresponds.
Let X denote a 2K-dimensional instance space, L ═ L1,l2,...,lKDenotes a label space containing K kinds of meta paths, and extracts a sample set composed of sample vectors constructed based on heterogeneous topological features between node pairs based on the meta paths, D { (x)i,Li) I 1 ≦ i ≦ m represents the multi-labeled training set, m represents the number of samples in the multi-labeled training set, (x)i,Li) Is a multi-labeled sample, whereinTo describe node pairs (v)i,vj) 2K-dimensional feature vector of (x)iEach dimension in (a) may be a heterogeneous or homogeneous feature,is xiThe tag set of (2); t { (x)i,Li) I m +1 ≦ i ≦ n } is the multi-labeled test set, n represents the total number of samples in the sample set.
Step S103: and performing multi-label classification learning based on the training set and the test set in the sample set to obtain a corresponding multi-label classifier.
In a specific implementation, when the corresponding training set and test set are constructed, the obtained multi-labeled training set D may be used to perform multi-labeled classification learning training to obtain a multi-labeled classifier, and the multi-labeled classifier obtained by training is optimized by using the multi-labeled test set T, which is specifically described in detail with reference to fig. 2 and will not be repeated.
Step S104: and predicting unknown relations among nodes in the heterogeneous network to be predicted by adopting a multi-label classifier obtained by training.
In specific implementation, when the corresponding multi-label classifier is obtained through training, the obtained multi-label classifier can be used for predicting an unknown part of a network to be predicted, namely, an unknown connection relation between target node pairs which are not connected in the network to be predicted is predicted, and a new predicted network graph is obtained.
In an embodiment of the present invention, the method may further include:
step S105: and calculating and outputting a dependency score between the label pairs.
In an embodiment of the invention, the dependency scores between the label pairs can be calculated and output by adopting a Chi-Square Test tool, suggestions on how to form new links between node pairs can be provided for a user, and the method helps to explore the rules of link and relationship formation in a complex network, and is convenient and practical.
Fig. 2 is a flowchart illustrating a training method of a multi-label classifier according to an embodiment of the present invention. Referring to fig. 2, a training method of a multi-label classifier may specifically include the following steps:
step S201: and respectively selecting training subsets corresponding to label pairs formed by every two set type labels from the training subsets, and respectively performing two-class learning on the selected training subsets to obtain a plurality of two classifiers corresponding to each label pair one to one.
In a specific implementation, L ═ L for the label space1,l2,...,lKAnd selecting samples in a training subset corresponding to the label formed by every two types of labels from a multi-labeled training set D, wherein the samples can be specifically defined as:
Djk={(xi,ψ(Li,lj,lk))|φ(Li,lj)≠φ(Li,lk)} (2)
wherein D isjkRepresents a label pair (l)j,lk) A corresponding training subset.
In an implementation, when selecting a label pair (l) from the multi-labeled training set Dj,lk) Corresponding training subset DjkThen, a classification learning algorithm can be adopted to perform classification learning on the two labels to obtain a label pair (l)j,lk) And a plurality of two classifiers in one-to-one correspondence.
Step S202: and respectively inputting the test set into a plurality of two classifiers obtained by training, and calculating first votes obtained by the examples corresponding to the samples in the test set on the labels of all types.
In an embodiment of the present invention, the first votes obtained by the instances corresponding to the samples in the test set on the tags of the respective types are calculated by using the following formula:
wherein, ζ (x)i,lj) Represents example xiOn the label ljVotes obtained above, ClfjkRepresents a label pair (l)j,lk) A corresponding two-classifier is used for classifying the data,indicating that the samples are correctly predicted as negative examples in the training subset whenIndicating that the sample was correctly predicted as a positive example in the training subset.
Step S203: and respectively adding the corresponding virtual labels into each sample in the corresponding training subsets to obtain training subsets corresponding to label pairs formed by the corresponding type labels and the virtual labels, and respectively training by adopting the obtained training subsets to obtain a plurality of auxiliary two classifiers in one-to-one correspondence with each type label.
In the implementation, a virtual tag l is usedsAdded to each sample for labeling with xiSegmentation points of related and unrelated labels, for each new label pair (l)j,ls) Respectively obtained corresponding training sets DisAnd for each new label pair (l) using a corresponding two-class learning algorithmj,ls) Respectively obtained corresponding training sets DjsRespectively learning to obtain corresponding K auxiliary two classifiers Clfjs。
Step S204: and respectively inputting the test set into a plurality of auxiliary second classifiers obtained by training, and calculating a second vote obtained on each type label and a third vote obtained on the virtual label by the corresponding example of the sample in the test set.
In an embodiment of the present invention, the following formula is used to calculate the second vote obtained by the corresponding instance of the sample in the test set on each type label:
wherein, ζ (x)i,lj) Represents example xiOn the label ljVotes obtained above, which have not been updated, ClfjkRepresents a label pair (l)j,lk) A corresponding two-classifier is used for classifying the data,indicating that the sample was correctly predicted as a positive example in the training subset.
In an embodiment of the present invention, the following formula is used to calculate the third vote obtained by the example corresponding to the sample in the test set on each type label:
therein, ζ*(xi,ls) Represents example xiIn the virtual label lsThe votes obtained on the above-mentioned voting table,indicating that the sample was correctly predicted as a negative case in the training subset.
Step S205: and determining a final multi-label classifier based on the first vote and the second vote obtained on each type label by the corresponding example of the test sample and the third vote obtained on the virtual label.
In one embodiment of the present invention, the determined final multi-labeled classifier can be represented as:
h(x)={lj|ζ*(x,lj)>ζ*(x,ls)} (8)
wherein h (x) represents the multi-label classifier, ljIndicates the jth type tag, ζ*(x,lj) Represents example xiOn the label ljThe final vote, ζ obtained above*(x,ls) Represents example xiIn the virtual label lsThe votes obtained above.
The above-mentioned detailed description of the method for predicting heterogeneous information network links in the embodiment of the present invention, and a device corresponding to the above-mentioned method will be introduced below.
Fig. 3 is a schematic structural diagram illustrating a heterogeneous information network link prediction apparatus according to an embodiment of the present invention. Referring to fig. 3, a heterogeneous information network link prediction apparatus 30 may include a setting unit 301, a construction unit 302, a classification learning unit 303, and a prediction unit 304, wherein:
the setting unit 301 is adapted to set a meta-path between node pairs in the heterogeneous network to be predicted, a maximum length of the meta-path, and a type tag corresponding to each meta-path type.
The constructing unit 302 is adapted to extract heterogeneous topological features between node pairs based on meta-paths, construct sample vectors, and form a sample set; the sample set includes a training set and a test set.
The classification learning unit 303 is adapted to perform multi-label classification learning based on the training set and the test set in the sample set to obtain a corresponding multi-label classifier.
The prediction unit 304 is adapted to predict the unknown relationship between the nodes in the heterogeneous network to be predicted by using the trained multi-label classifier.
In a specific implementation, the classification learning unit 303 is adapted to select, from the training set, a training subset corresponding to a label pair formed by every two set type labels, and perform two-classification learning on the selected training subset, so as to obtain a plurality of two classifiers corresponding to each label pair one to one; respectively inputting the test set into a plurality of two classifiers obtained by training, and calculating first votes obtained by examples corresponding to samples in the test set on various types of labels; adding the corresponding virtual labels into each sample in the corresponding training subsets respectively to obtain training subsets corresponding to label pairs formed by the corresponding type labels and the virtual labels, and training the training subsets respectively by adopting the obtained training subsets to obtain a plurality of auxiliary two classifiers corresponding to the type labels one to one; the virtual labels are used for marking the division points of type labels related and unrelated to the samples in the corresponding training subsets; respectively inputting the test set into a plurality of auxiliary second classifiers obtained by training, and calculating a second vote obtained on each type label and a third vote obtained on a virtual label by a sample corresponding to the test set; and determining a final multi-label classifier based on the first vote and the second vote obtained on each type label by the corresponding example of the test sample and the third vote obtained on the virtual label.
In a specific implementation, the heterogeneous topology features between the node pairs include a path number feature and a random walk feature.
In an embodiment of the present invention, the classification learning unit 303 is adapted to calculate a first vote obtained by an instance corresponding to the sample in the test set on each type label by using the following formula:
wherein, ζ (x)i,lj) Represents example xiOn the label ljVotes obtained above, ClfjkRepresents a label pair (l)j,lk) A corresponding two-classifier is used for classifying the data,indicating that the samples are correctly predicted as negative examples in the training subset whenIndicating that the sample was correctly predicted as a positive example in the training subset.
In an embodiment of the present invention, the classification learning unit 303 is adapted to calculate a second vote obtained by the corresponding instance of the sample in the test set on each type label by using the following formula:
wherein, ζ (x)i,lj) Represents example xiOn the label ljVotes obtained above, which have not been updated, ClfjkRepresents a label pair (l)j,lk) A corresponding two-classifier is used for classifying the data,indicating that the sample was correctly predicted as a positive example in the training subset.
In an embodiment of the present invention, the classification learning unit 303 is adapted to calculate a second vote obtained by the corresponding instance of the sample in the test set on each type label by using the following formula:
therein, ζ*(xi,ls) Represents example xiIn the virtual label lsThe votes obtained on the above-mentioned voting table,indicating that the sample was correctly predicted as a negative case in the training subset.
In an embodiment of the present invention, the classification learning unit 303 is adapted to determine a final multi-label classifier based on the first vote and the second vote obtained on each type label and the third vote obtained on the virtual label by using the following formula:
h(x)={lj|ζ*(x,lj)>ζ*(x,ls) }; wherein h (x) represents the multi-label classifier, ljIndicates the jth type tag, ζ*(x,lj) Represents example xiOn the label ljThe final vote, ζ obtained above*(x,ls) Represents example xiIn the virtual label lsThe votes obtained above.
In an embodiment of the present invention, the apparatus 30 may further include 305, wherein:
the calculation output unit 305 is adapted to calculate and output a dependency score between the tag pairs.
The embodiment of the invention also provides a computer readable storage medium, which stores computer instructions, and the computer instructions execute the steps of the heterogeneous information network link prediction method when running. For the method for predicting heterogeneous information network links, reference is made to the descriptions of the related parts, which are not repeated herein.
The embodiment of the invention also provides a terminal, which comprises a memory and a processor, wherein the memory is stored with computer instructions capable of running on the processor, and the processor executes the steps of the heterogeneous information network link prediction method when running the computer instructions. For the method for predicting heterogeneous information network links, reference is made to the descriptions of the related parts, which are not repeated herein.
By adopting the scheme in the embodiment of the invention, the corresponding multi-label classifier is obtained by carrying out multi-label classification learning based on the training set and the test set in the sample set, and the unknown relation between the nodes in the heterogeneous network to be predicted is predicted by adopting the multi-label classifier obtained by training, so that not only can the direct link between the nodes be predicted, but also other relations between the nodes, namely a new type meta-path, can be predicted, and the accuracy of heterogeneous network link prediction can be improved.
Further, by calculating and outputting the dependency scores between the label pairs, suggestions on how to form new links can be provided, and the method helps to explore the rules of link and relation formation in a complex network, and is convenient and practical.
The foregoing shows and describes the general principles, essential features, and advantages of the invention. It will be understood by those skilled in the art that the present invention is not limited to the embodiments described above, which are described in the foregoing description only for the purpose of illustrating the principles of the present invention, but that various changes and modifications may be made therein without departing from the spirit and scope of the invention as defined by the appended claims, specification, and equivalents thereof.
Claims (8)
1. A heterogeneous information network link prediction apparatus, comprising:
the setting unit is suitable for setting meta-paths between node pairs in the heterogeneous network to be predicted, the maximum length of the meta-paths and a type label corresponding to each meta-path type;
the construction unit is suitable for extracting heterogeneous topological features between node pairs based on meta-paths, constructing sample vectors and forming a sample set; the sample set comprises a training set and a test set;
the classification learning unit is suitable for performing multi-label classification learning based on the training set and the test set in the sample set to obtain a corresponding multi-label classifier;
and the prediction unit is suitable for predicting the unknown relation between the nodes in the heterogeneous network to be predicted by adopting the multi-label classifier obtained by training.
2. The heterogeneous information network link prediction device according to claim 1, wherein the classification learning unit is adapted to select, from the training set, training subsets corresponding to a pair of labels formed by every two set type labels, respectively, and perform classification learning on the selected training subsets, respectively, to obtain a plurality of two classifiers corresponding to each label pair one to one; respectively inputting the test set into a plurality of two classifiers obtained by training, and calculating first votes obtained by examples corresponding to samples in the test set on various types of labels; adding the corresponding virtual labels into each sample in the corresponding training subsets respectively to obtain training subsets corresponding to label pairs formed by the corresponding type labels and the virtual labels, and training the training subsets respectively by adopting the obtained training subsets to obtain a plurality of auxiliary two classifiers corresponding to the type labels one to one; the virtual labels are used for marking the division points of type labels related and unrelated to the samples in the corresponding training subsets; respectively inputting the test set into a plurality of auxiliary second classifiers obtained by training, and calculating a second vote obtained on each type label and a third vote obtained on a virtual label by a sample corresponding to the test set; and determining a final multi-label classifier based on the first vote and the second vote obtained on each type label by the corresponding example of the test sample and the third vote obtained on the virtual label.
3. The heterogeneous information network link prediction device of claim 2, wherein the heterogeneous topology features between the node pairs comprise a path number feature and a random walk feature.
4. The heterogeneous information network link prediction device of claim 3, wherein the classification learning unit is adapted to calculate a first vote obtained by the corresponding instance of the sample in the test set on each type label by using the following formula:
wherein, ζ (x)i,lj) Represents example xiOn the label ljVotes obtained above, ClfjkRepresents a label pair (l)j,lk) A corresponding two-classifier is used for classifying the data,indicating that the samples are correctly predicted as negative examples in the training subset whenIndicating that the sample was correctly predicted as a positive example in the training subset.
5. The heterogeneous information network link prediction device of claim 4, wherein the classification learning unit is adapted to calculate the second vote obtained by the corresponding instance of the sample in the test set on each type label by using the following formula:
wherein, ζ (x)i,lj) Represents example xiOn the label ljVotes obtained above, which have not been updated, ClfjkRepresents a label pair (l)j,lk) A corresponding two-classifier is used for classifying the data,the representation indicates that the samples are correctly predicted as positive examples in the training subset.
6. The heterogeneous information network link prediction device of claim 5, wherein the classification learning unit is adapted to calculate the second vote obtained on each type label by the corresponding instance of the sample in the test set by using the following formula:
therein, ζ*(xi,ls) Represents example xiIn the virtual label lsThe votes obtained on the above-mentioned voting table,indicating that the sample was correctly predicted as a negative case in the training subset.
7. The heterogeneous information network link prediction device of any one of claims 3-6, wherein the classification learning unit is adapted to determine a final multi-label classifier based on the first vote and the second vote obtained on each type label and the third vote obtained on the virtual label by the corresponding instance of the test sample by using the following formula:
h(x)={lj|ζ*(x,lj)>ζ*(x,ls)}
wherein h (x) represents the multi-label classifier, ljIndicates the jth type tag, ζ*(x,lj) Represents example xiOn the label ljThe final vote, ζ obtained above*(x,ls) Represents example xiIn the virtual label lsThe votes obtained above.
8. The heterogeneous information network link prediction device of any one of claims 3-6, further comprising:
and the calculation output unit is suitable for calculating and outputting the dependency scores between the label pairs.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201811357907.8A CN109522954A (en) | 2018-11-14 | 2018-11-14 | Heterogeneous Information network linking prediction meanss |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201811357907.8A CN109522954A (en) | 2018-11-14 | 2018-11-14 | Heterogeneous Information network linking prediction meanss |
Publications (1)
Publication Number | Publication Date |
---|---|
CN109522954A true CN109522954A (en) | 2019-03-26 |
Family
ID=65777879
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201811357907.8A Pending CN109522954A (en) | 2018-11-14 | 2018-11-14 | Heterogeneous Information network linking prediction meanss |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN109522954A (en) |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110677284A (en) * | 2019-09-24 | 2020-01-10 | 北京工商大学 | Heterogeneous network link prediction method based on meta path |
CN111310822A (en) * | 2020-02-12 | 2020-06-19 | 山西大学 | PU learning and random walk based link prediction method and device |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN105893637A (en) * | 2016-06-24 | 2016-08-24 | 四川大学 | Link prediction method in large-scale microblog heterogeneous information network |
CN106778894A (en) * | 2016-12-29 | 2017-05-31 | 大连理工大学 | A kind of method of author's cooperative relationship prediction in academic Heterogeneous Information network |
CN107145527A (en) * | 2017-04-14 | 2017-09-08 | 东南大学 | Link prediction method based on first path in alignment isomery social networks |
-
2018
- 2018-11-14 CN CN201811357907.8A patent/CN109522954A/en active Pending
Patent Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN105893637A (en) * | 2016-06-24 | 2016-08-24 | 四川大学 | Link prediction method in large-scale microblog heterogeneous information network |
CN106778894A (en) * | 2016-12-29 | 2017-05-31 | 大连理工大学 | A kind of method of author's cooperative relationship prediction in academic Heterogeneous Information network |
CN107145527A (en) * | 2017-04-14 | 2017-09-08 | 东南大学 | Link prediction method based on first path in alignment isomery social networks |
Non-Patent Citations (2)
Title |
---|
KE-JIA CHEN等: "On Link Formation in Heterogeneous Information Networks: A View Based on Multi-Label Learning", 《ASONAM "17: PROCEEDINGS OF THE 2017 IEEE/ACM INTERNATIONAL CONFERENCE ON ADVANCES IN SOCIAL NETWORKS ANALYSIS AND MINING 2017》 * |
MIN-LING ZHANG等: "A Review on Multi-Label Learning Algorithms", 《IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING》 * |
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110677284A (en) * | 2019-09-24 | 2020-01-10 | 北京工商大学 | Heterogeneous network link prediction method based on meta path |
CN110677284B (en) * | 2019-09-24 | 2022-06-17 | 北京工商大学 | Heterogeneous network link prediction method based on meta path |
CN111310822A (en) * | 2020-02-12 | 2020-06-19 | 山西大学 | PU learning and random walk based link prediction method and device |
CN111310822B (en) * | 2020-02-12 | 2022-09-20 | 山西大学 | PU learning and random walk based link prediction method and device |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
You et al. | Cross-modality attention with semantic graph embedding for multi-label classification | |
Liang et al. | Interpretable structure-evolving LSTM | |
CN109359564B (en) | Image scene graph generation method and device | |
Leng et al. | Mining and matching relationships from interaction contexts in a social manufacturing paradigm | |
CN112434721A (en) | Image classification method, system, storage medium and terminal based on small sample learning | |
CN108073677A (en) | A kind of multistage text multi-tag sorting technique and system based on artificial intelligence | |
CN112396106B (en) | Content recognition method, content recognition model training method, and storage medium | |
CN112308115B (en) | Multi-label image deep learning classification method and equipment | |
Zhou et al. | Multi-label image classification via category prototype compositional learning | |
Omrani et al. | Multi-label class assignment in land-use modelling | |
CN113761250A (en) | Model training method, merchant classification method and device | |
CN117992805B (en) | Zero sample cross-modal retrieval method and system based on tensor product graph fusion diffusion | |
Moyano | Learning network representations | |
Zare et al. | Detection of community structures in networks with nodal features based on generative probabilistic approach | |
Wu et al. | Semi-supervised cross-modal hashing via modality-specific and cross-modal graph convolutional networks | |
CN109543114A (en) | Heterogeneous Information network linking prediction technique, readable storage medium storing program for executing and terminal | |
CN116129286A (en) | Method for classifying graphic neural network remote sensing images based on knowledge graph | |
CN114943017A (en) | Cross-modal retrieval method based on similarity zero sample hash | |
CN117370578A (en) | Method for supplementing food safety knowledge graph based on multi-mode information | |
CN115631008A (en) | Commodity recommendation method, commodity recommendation device, commodity recommendation equipment and commodity recommendation medium | |
CN110489613B (en) | Collaborative visual data recommendation method and device | |
Zhang et al. | End‐to‐end generation of structural topology for complex architectural layouts with graph neural networks | |
CN109522954A (en) | Heterogeneous Information network linking prediction meanss | |
Xu et al. | Adaptive feature selection with reinforcement learning for skeleton-based action recognition | |
Cai et al. | Semantic and correlation disentangled graph convolutions for multilabel image recognition |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication | ||
RJ01 | Rejection of invention patent application after publication |
Application publication date: 20190326 |