Nothing Special   »   [go: up one dir, main page]

CN114861766A - Dynamic link prediction method and system based on multi-granularity evolution - Google Patents

Dynamic link prediction method and system based on multi-granularity evolution Download PDF

Info

Publication number
CN114861766A
CN114861766A CN202210387628.6A CN202210387628A CN114861766A CN 114861766 A CN114861766 A CN 114861766A CN 202210387628 A CN202210387628 A CN 202210387628A CN 114861766 A CN114861766 A CN 114861766A
Authority
CN
China
Prior art keywords
dynamic
granularity
graph
nodes
evolution
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210387628.6A
Other languages
Chinese (zh)
Inventor
杨祎
古晓艳
樊海慧
李波
王伟平
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Institute of Information Engineering of CAS
Original Assignee
Institute of Information Engineering of CAS
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Institute of Information Engineering of CAS filed Critical Institute of Information Engineering of CAS
Priority to CN202210387628.6A priority Critical patent/CN114861766A/en
Publication of CN114861766A publication Critical patent/CN114861766A/en
Pending legal-status Critical Current

Links

Images

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/25Fusion techniques
    • G06F18/253Fusion techniques of extracted features
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • G06F18/241Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2415Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
    • G06Q50/01Social networking

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • General Engineering & Computer Science (AREA)
  • Computing Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Biophysics (AREA)
  • Mathematical Physics (AREA)
  • Evolutionary Biology (AREA)
  • Biomedical Technology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Computational Linguistics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Molecular Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Business, Economics & Management (AREA)
  • Software Systems (AREA)
  • Probability & Statistics with Applications (AREA)
  • Economics (AREA)
  • Human Resources & Organizations (AREA)
  • Marketing (AREA)
  • Primary Health Care (AREA)
  • Strategic Management (AREA)
  • Tourism & Hospitality (AREA)
  • General Business, Economics & Management (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention relates to a dynamic link prediction method and a dynamic link prediction system based on multi-granularity evolution. The method comprises the following steps: dividing the graph under each time slice of the dynamic graph to obtain a multi-granularity subgraph; extracting structural features of nodes from the multi-granularity subgraph; learning a dynamic evolution rule of the graph from the multi-granularity subgraphs to obtain dynamic evolution characteristics of the subgraphs with different granularities, and fusing the dynamic evolution characteristics of the subgraphs with different granularities to obtain dynamic evolution characteristics of the nodes; and fusing the structural characteristics of the nodes and the dynamic evolution characteristics of the nodes to obtain a node representation containing space-time characteristics, and predicting a future link according to the node representation containing the space-time characteristics. The invention can mine richer graph information, thereby enhancing the extraction capability of the structural characteristics and improving the accuracy of link prediction; the invention can fully realize the fusion of dynamic characteristics, enhance the learning ability of the dynamic characteristics and further improve the performance of link prediction.

Description

Dynamic link prediction method and system based on multi-granularity evolution
Technical Field
The invention belongs to the technical field of information, and particularly relates to a dynamic link prediction method and system based on multi-granularity evolution.
Background
Many real-world complex information systems can be modeled in graph form, and research on graph data is going into further in recent years. Because the relationship information between the entities is important for the correlation calculation on the graph, and the link prediction can predict the possibility that the relationship exists between the nodes which are not formed, effective information is mined, and the method is one of important tasks in the graph mining field. Meanwhile, graph data in the real world often dynamically evolves along with time, link prediction for a dynamic graph has a research value, the goal is to learn the structural evolution of the graph from past information to predict a future link, and wide attention is attracted. For example, chinese patent (application No. 201910440098.5, application publication No. CN 110413844a) proposes a dynamic link prediction method based on a spatio-temporal attention depth model, in an encoding stage, the method uses an adjacent matrix sequence of different time slices of a dynamic graph as input, uses a spatial attention mechanism to focus on neighboring nodes in a GCN-attention module to update a feature vector of each node under the corresponding time slice, inputs the feature vectors of T time slices into an LSTM-attention module to obtain a hidden layer vector, calculates a context vector for the hidden layer vectors of T times, inputs the context vector into a decoder as a spatio-temporal feature vector, decodes the input spatio-temporal feature vector by the decoder, and outputs a probability matrix obtained by decoding and indicating whether a link exists between a node and a node, that is, thereby realizing prediction of a dynamic link. Chinese patent (application No. 202110280461.9, application publication No. CN 113065974A) proposes a link prediction method based on dynamic network representation learning, which constructs a similarity matrix of a snapshot network by calculating similarity values among nodes of a dynamic network through a similarity-based aggregation strategy; applying the graph convolution neural network to a single snapshot network for feature aggregation, guiding a feature aggregation process by using an adjacency matrix and a similarity matrix, and determining low-dimensional feature representation of a node; and inputting the low-dimensional feature representation of the node into a logistic regression classifier to obtain a link prediction result of the dynamic network.
The existing dynamic link prediction method focuses on extracting the structural features and the dynamic features of a dynamic graph from the whole graph, but the method ignores that the graphs under different layers contain different structures and change rules, and the feature extraction is incomplete, so that the dynamic link prediction effect is not accurate enough. For example, in a social network, different groups are included, such as a group of students, a group of company associates, and so on. The interaction (sending messages and e-mails) among different groups in a period of time forms a dynamic graph, some groups have more interaction and dense structure, and some groups have less interaction and sparse structure. The subgraphs formed by different groups contain different structures and dynamic change modes, and the structural features and the dynamic features of different groups are learned distinctively, so that richer dynamic graph information is mined. If the characteristics are extracted from the whole graph, some noise may be introduced, so that the predicted edges in a dense structure are less than the actual edges, and the predicted edges in a sparse structure are more than the actual edges, so that the overall dynamic link prediction effect is poor.
Disclosure of Invention
Aiming at the defects of the existing link prediction method, the invention provides a dynamic link prediction method and a dynamic link prediction system based on multi-granularity evolution.
The method comprises the steps of dividing a graph under each time slice by a certain strategy to construct a multi-granularity subgraph. Then, a multi-granularity-based structural feature extraction module is designed, features are extracted from subgraphs under different granularities and are fused, and therefore the structural features of the subgraphs are captured. Then, a multi-granularity-based dynamic evolution fusion module is designed for learning the dynamic rules of the graph from the subgraphs under different granularities, and further, the dynamic characteristics are selectively fused by considering different importance of the characteristics under different granularities, and structural characteristics are simultaneously fused, so that node representation containing space-time characteristics is finally obtained, and the relation between future entities is more accurately predicted.
Specifically, the technical scheme adopted by the invention is as follows:
a dynamic link prediction method based on multi-granularity evolution comprises the following steps:
dividing the graph under each time slice of the dynamic graph to obtain a multi-granularity subgraph;
extracting structural features of nodes from the multi-granularity subgraph;
learning a dynamic evolution rule of the graph from the multi-granularity subgraphs to obtain dynamic evolution characteristics of the subgraphs with different granularities, and fusing the dynamic evolution characteristics of the subgraphs with different granularities to obtain dynamic evolution characteristics of the nodes;
and fusing the structural characteristics of the nodes and the dynamic evolution characteristics of the nodes to obtain a node representation containing space-time characteristics, and predicting a future link according to the node representation containing the space-time characteristics.
Further, dividing the graph under each time slice by adopting a k-tress subgraph decomposition algorithm to obtain the multi-granularity subgraph.
Further, a graph convolution network based on the disentanglement propagation and mapping operation is adopted and applied to the k-tress subgraph to extract the structural features of the nodes.
Further, the graph convolution network based on the disentanglement propagation and mapping operation has the following formula:
Figure BDA0003594296910000021
att l =σ(MLP(X l ))l=0,1,2,…,J
X out =softmax(sum(att 0 οX 0 ,…,att J οX J ))
wherein l is the number of GCN layers; j is the maximum depth of propagation;
Figure BDA0003594296910000022
is a node feature, N represents the number of nodes, d represents a feature dimension; MLP denotes full join operation; x l Representing depthA node characteristic when l;
Figure BDA0003594296910000031
the method is used for adaptively adjusting the information which needs to be reserved at different propagation depths of each node; att is represented by ° l Each element in (1) is represented by X with a corresponding d-dimensional node J Multiplying; softmax denotes normalization operation; sum denotes a summation operation; final output X out Is obtained by combining a plurality of propagation layers.
Further, the learning of the dynamic evolution law of the graph from the multi-granularity subgraph is to use RNN to learn the dynamic evolution law of different granularity subgraphs.
Further, the fusing the dynamic evolution characteristics of the subgraphs with different granularities to obtain the dynamic evolution characteristics of the nodes includes: and introducing a multi-head self-attention mechanism, learning the weight on each granularity through the self-attention mechanism, multiplying the weight by the node representation of the corresponding granularity, and summing to obtain the fused node dynamic evolution characteristic representation.
Further, the predicting the future link according to the node representation containing the space-time characteristics comprises: and performing Hadamard product operation on two node representations corresponding to the link, predicting the score of the link through the node representations, and training a logistic regression classifier to judge whether the link exists or not so as to predict the link on the graph at the next time.
A dynamic link prediction system based on multi-granularity evolution adopting the method comprises the following steps:
the multi-granularity graph dividing module is used for dividing the graph under each time slice of the dynamic graph to obtain a multi-granularity subgraph;
the multi-granularity structural feature extraction module is used for extracting structural features of the nodes from the multi-granularity subgraph;
the multi-granularity dynamic characteristic learning module is used for learning the dynamic evolution rule of the graph from the multi-granularity subgraph to obtain the dynamic evolution characteristics of the subgraphs with different granularities and fusing the dynamic evolution characteristics of the subgraphs with different granularities to obtain the dynamic evolution characteristics of the nodes;
and the dynamic link prediction module is used for fusing the structural characteristics of the nodes and the dynamic evolution characteristics of the nodes to obtain a node representation containing space-time characteristics, and predicting a future link according to the node representation containing the space-time characteristics.
The invention has the following beneficial effects:
1. aiming at the problems that the existing dynamic link prediction method focuses on extracting features from the whole graph and neglects the sub-graph structures with different granularities to have different structural features, the graph is divided into multi-granularity sub-graphs according to a certain strategy, then the features are extracted from the sub-graphs, the sub-graphs with different granularities in the same time slice are connected, the mode of the development from a local dense structure to a global sparse structure is learned, and richer graph information is mined, so that the extraction capability of the structural features is enhanced, and the accuracy of link prediction is improved.
2. Aiming at the problem that dynamic feature learning is not sufficient in a dynamic link prediction method, the invention provides a multi-granularity dynamic feature evolution fusion method, and the dynamic evolution features of the dynamic feature evolution fusion method are learned on a sub-graph time sequence under different granularities, so that the dynamic features of multiple granularities are obtained. Then, different weights are set for the features with different granularities by combining a self-attention mechanism, the contribution of important features is increased, the fusion of dynamic features is fully realized, the learning capability of the dynamic features is enhanced, and the performance of link prediction is further improved.
3. The method and the system can be used in the fields of social networks, recommendation systems and the like. For example, when used in a social network, the nodes in the present invention may represent users in the social network, and the predicted links may represent potential (future) friend relationships between the users in the social network. When the method is used for a recommendation system, the nodes in the method can represent users and commodities in the recommendation system, the predicted links connect the users and the commodities, and potential commodities (potential commodities represent commodities which are loved by the users but not bought yet) can be recommended to the users.
Drawings
FIG. 1 is an example of a k-tress diagram.
FIG. 2 is a schematic flow diagram of the process of the present invention.
Fig. 3 is a schematic diagram of a dynamic link prediction architecture proposed by the method of the present invention.
Detailed Description
The present invention will be described in further detail below with reference to specific examples and the accompanying drawings.
The invention provides a dynamic link prediction method based on multi-granularity evolution, which provides a complete algorithm framework and mainly comprises 4 parts of dynamic multi-granularity map construction, multi-granularity structure feature extraction, multi-granularity dynamic feature evolution fusion and dynamic link prediction, and specifically comprises the following steps:
step 1. construction of dynamic multi-granularity graph
The invention defines the dynamic graph G as a time series G 1 ,G 2 ,…,G T And (4) separating various structures of the graph at each time from 1 to T by a certain strategy, and excavating abundant structure information with distinguishing force to help improve the link prediction capability. According to requirements, the k-tress subgraph decomposition algorithm in the community mining field is introduced. First, the concepts of support and k-tress subgraphs are introduced, which are defined as follows:
definition 1.support (e), if an edge e in the graph G is in k triangles, the support of the edge is (e) k-2.
Given a graph G and an integer k, the k-tress subgraph is a maximum subgraph G in G, in which the support of all edges is (e) > < k-2.
FIG. 1 shows a decomposed subgraph according to the k-tress definition. Wherein, the left one is the original image and is not changed. By definition, the edge support in a 2-tress subgraph should be greater than 0, meaning that each edge should be in at least 0 triangles, and any edge satisfies this condition, i.e., the graph itself. By analogy, each edge in a 3-tress subgraph is in at least 1 triangle, as shown by the right two. The 4-tress diagram is shown on the right.
Thus, a multi-granularity subgraph divided by k-tress decomposition is obtained. The multi-granularity subgraphs obtained by the k-tress strategy have many excellent characteristics, and the k-tress subgraphs require all edges to be at least in (k-2) triangles. It is known that triangles are the fundamental modules of real-world networks because a link is caused by a triangle closure, which has a high local clustering coefficient. In a social network, a triangle indicates that two friends have a common friend, indicating that the relationship between the three friends is firm and stable. Further, if two people have more friends, their friendship is more stable, and it can be emphasized whether any two friends have a firm relationship. On the link prediction task, common neighbors of nodes are considered in the k-tress strategy, a subgraph with a higher k value has a denser structure and fewer nodes, and intuitively, the more the common neighbors of two nodes are, the more the nodes are likely to become friends or interact with each other, so that the link prediction task has strong advantages.
And applying a k-tress decomposition algorithm to the graph under each time slice to obtain a multi-granularity subgraph sequence corresponding to the time slice.
And 2, learning the multi-granularity structural features.
In order to extract enough features from a multi-granularity map obtained over multiple timeslices to reflect the diversity of the structure, we need to propagate the features in a larger domain. One of the most common ways is GCN (Graph convolution Network), which extracts features by two operations of propagation and mapping, as the most classical GCN formula:
Figure BDA0003594296910000051
wherein, X l For node representation matrices, adjacency matrices with self-loops
Figure BDA0003594296910000052
Is an adjacency matrix a to which an identity matrix I is added. Each element a in the adjacency matrix ij Indicating whether there is an edge between node i and node j, 0 representing no edge, and 1 representing an edge.
Figure BDA0003594296910000053
Figure BDA0003594296910000054
Is a degree matrix with self-loops, where diag () represents a diagonal matrix, the elements on the diagonal representing the degrees of the corresponding nodes,
Figure BDA0003594296910000055
to represent
Figure BDA0003594296910000056
The value in row i and column j represents whether there is an edge between node i and node j. Superscript l denotes the number of GCN layers, W l Represents the parameter of the l-th layer, σ () is a nonlinear activation function. The propagation operation is
Figure BDA0003594296910000057
For communicating the node characteristics to neighboring nodes. The mapping operation is σ (P) l-1 W l ) For mapping the node representation to a feature space.
However, the mode of the entanglement propagation and mapping operation can cause the propagation depth of the GCN to be limited, and the features cannot be propagated in a large field. Therefore, the invention designs a graph convolution network based on disentanglement propagation and mapping operation, named as Modified GCN, to extract features, which can alleviate the over-fitting problem of general GCN, thereby propagating information in a larger range. The formula is as follows:
Figure BDA0003594296910000058
att l =σ(MLP(X l ))l=0,1,2,…,J
X out =softmax(sum(att 0 οX 0 ,…,att J οX J ))
where l is the number of GCN layers, and also the depth. J is the maximum depth of propagation and,
Figure BDA0003594296910000059
is node feature, N represents the number of nodes, d represents the feature dimension, and MLP represents the full join operation. X l Representing the node characteristics at depth l.
Figure BDA00035942969100000510
The method is used for adaptively adjusting the information which each node needs to keep at different propagation depths. Att is represented by ° l Each element in (1) is represented by X with a corresponding d-dimensional node J Multiplication, softmax denotes normalization, sum denotes summation. Final output X out Is obtained by combining a plurality of propagation layers.
And applying the Modified GCN to the k-tress subgraph to acquire the characteristics. Because the feature propagation and the feature mapping in the Modified GCN are carried out separately, the overfitting problem can be effectively relieved, and therefore the Modified GCN has a better effect on capturing high-order structure information.
Applying the Modified GCN to the subgraph on each time slice, and expressing the formula as follows:
Figure BDA0003594296910000061
the sub-graph with the granularity of K is represented by K, the value range of K is 2-K, K is the sub-graph with the maximum granularity, which is manually set and divided by applying a K-tress algorithm, T represents a time slice index, the range is 1-T, and T is the latest time slice.
Figure BDA0003594296910000062
Representing the node representation of the k-tress subgraph in the t-time slice,
Figure BDA0003594296910000063
a adjacency matrix representing k-tress subgraphs in a t-time slice. Thus we get the node representation at all granularities for all time slices
Figure BDA0003594296910000064
In the figure G t The multi-granularity sequence of the node representation is as follows
Figure BDA0003594296910000065
In the particle sizeAt k, the nodes represent a time sequence of
Figure BDA0003594296910000066
To better learn structural features, RNNs are well suited to handle highly related sequences, assuming that there is a potential process of structural evolution in each temporally multi-granular subgraph sequence. In the figure G t In the above, the sequence is inverted to
Figure BDA0003594296910000067
And inputs it to the RNN to learn structural information. The mathematical representation is:
Figure BDA0003594296910000068
Figure BDA0003594296910000069
Figure BDA00035942969100000610
wherein
Figure BDA00035942969100000611
A hidden state is represented in the form of,
Figure BDA00035942969100000612
representing the input features. In the input sequence, subgraph structures with higher k values are denser and have fewer nodes. In this process, structural information at different granularities is consolidated. Final hidden state output
Figure BDA00035942969100000613
Containing the structural attributes of the graph under the entire time slice t. This is performed for each time slice and a structural feature representation is finally obtained
Figure BDA00035942969100000614
And 3, multi-granularity dynamic feature learning.
In the time dimension, subgraphs of different granularities contribute differently to the evolution of the whole graph. Processing subgraphs of different granularities differently helps to model the complete graph evolution process.
RNN is used to learn the dynamic evolution law of different granularity subgraphs. At each particle size k, will
Figure BDA00035942969100000615
Input RNN, formulated as follows:
Figure BDA0003594296910000071
Figure BDA0003594296910000072
Figure BDA0003594296910000073
where T is the length of the time series,
Figure BDA0003594296910000074
indicating a hidden state. The dynamic feature learning module is similar to the structure fusion module except for their inputs, where the input sequences are time sequences of different granularity, with the purpose of capturing the dynamic evolution features. Note that the structure embedding sequence
Figure BDA0003594296910000075
Also input into RNN and output structural feature S T
In order to obtain the final node representation, a plurality of granularity learned subgraph dynamic evolution features need to be fused. In order to obtain the importance degrees of different granularities, a multi-head self-attention mechanism is introduced, the weight on each granularity is learned through the self-attention mechanism, and then the weight is multiplied by the node representation of the corresponding granularity and summed to obtain the fused node dynamic evolution characteristic representation.
Node v i The attention coefficient formula at k particle size is as follows:
Figure BDA0003594296910000076
wherein w k And w k’ As the parameter(s) is (are),
Figure BDA00035942969100000712
is represented by nodes, and the attention coefficient can be obtained after normalization
Figure BDA0003594296910000078
The fused nodes are represented as:
Figure BDA0003594296910000079
sigma is an activation function, and finally the dynamic evolution characteristic expression H of all the nodes is obtained out . Final node representation H emb Including structural and dynamic evolution features, formulated as follows, wherein
Figure BDA00035942969100000710
Representing a matrix splicing operation:
Figure BDA00035942969100000711
and 4, predicting a future link.
Finally, updating network parameters according to the optimization function until the model converges and the effect on the test set is optimal, and training to obtain an optimal model; performing Hadamard product operation on the representation of two nodes corresponding to the link, and predicting the link through the representation of the nodesAnd training the logistic regression classifier to judge whether the link exists or not so as to predict the graph G at the next time T T+1 And (c) a link.
Dynamic graph { G) input below 1 ,G 2 ,...,G T The invention is further described by taking the example as an example.
Fig. 2 is a flow chart of dynamic link prediction, including input, multi-granularity graph partitioning, multi-granularity structural feature extraction, multi-granularity dynamic feature extraction fusion, dynamic link prediction, and output.
Fig. 3 is a schematic diagram of the dynamic link prediction architecture of the present invention. The framework mainly comprises 4 parts of multi-granularity graph division, multi-granularity structural feature extraction, multi-granularity dynamic feature extraction fusion and dynamic link prediction.
Step 1, dividing a multi-granularity graph.
In { G 1 ,G 2 ,…,G T Applying a k-tress decomposition algorithm on each time slice to obtain a corresponding multi-granularity map, as shown in the left side of fig. 3, taking a multi-granularity sub-map decomposed by Gt as
Figure BDA0003594296910000081
Where the superscript represents time and the subscript represents the number of the multi-particle size plot. As the definition of k-tress is known, the subgraph sequence starts from a 2-tress subgraph (original graph), and k _ max is a value set manually.
And 2, learning the multi-granularity structural features.
Firstly, applying Modified GCN to each granularity graph of each time to extract features to obtain
Figure BDA0003594296910000082
Then, a multi-granularity subgraph sequence of each time t is reversely written
Figure BDA0003594296910000083
The RNN is input to capture potential structural features. This is performed for each time slice and finally a structural feature representation is obtained
Figure BDA0003594296910000084
Step 3, multi-granularity dynamic feature extraction and fusion
First, RNNs are used to learn the dynamic evolution rules of different granularity subgraphs. At each particle size k, will
Figure BDA0003594296910000085
Input RNN to obtain dynamic feature representation
Figure BDA0003594296910000086
The dynamic characteristics of the multiple granularities are expressed as
Figure BDA0003594296910000087
In combination with the self-attention mechanism, the importance (weight) of each granularity is calculated, and the dynamic feature representation is multiplied and summed with the weight to obtain the total dynamic feature representation. Finally, combining the structural characteristics and the dynamic characteristics to obtain a final node representation H emb
And 4, predicting the dynamic link.
Finally, updating network parameters according to an optimization function until the model converges and the effect on the test set reaches the optimum, and training to obtain an optimum model; performing Hadamard product operation on the representation of two nodes corresponding to the link, predicting the score of the link through the representation of the nodes, and training a logistic regression classifier to judge whether the link exists or not so as to predict the graph G at the next time T T+1 And (c) a link.
Experimental data: in the experiment, AUC is used as an evaluation index, and the AUC is an index for measuring the learning capacity of the model, and represents the probability of a positive sample before a negative sample, wherein the range is 0-1, and the closer to 1, the better the model effect is. The applied data sets UCI, AS, MATH, FACEBOOK, ask and ENRON are public graph data sets, and the information is shown in table 1.
TABLE 1 data set information
Data set Number of nodes Number of edges Number of time slices
UCI 1899 59835 7
AS 6828 1947704 100
MATH 24740 323357 77
FACEBOOK 60730 607487 27
ASKU 74924 356822 21
ENRON 87036 530284 38
Comparative experiments GCRN, dynagem, DynAE, dynarnn, DynAERNN, EvloveGCN and CTGCN are all existing advanced dynamic link prediction methods, as explained below:
GCRN: the model combines a GCN model and an RNN model, finds dynamic patterns in the graph, and generates node embedding in the dynamic graph.
DynGEM: the work is a graph embedding method based on depth self-coding, SDNE is expanded into a dynamic graph, and the method can be used for generating the embedded representation of the graph snapshot at the current moment based on the embedded representation of the graph snapshot at the previous moment, namely the embedded representation of the graph snapshot at the previous moment is used as an initial value, performing gradient training, and incrementally processing the dynamic graph, and has very good performance.
Dyngraph2 vec: the main dynamic graph feature of the work proposes an embedding method, and the dynamic graph learns the evolved structure and predicts the invisible link. It utilizes multiple non-linear layers to learn structural patterns in each network while learning temporal transitions in the network using recursive layers. There are three variants of this model, DynAE, dynarnn, and DynAERNN.
EvolveGCN: the framework is a dynamic variant of the GCN, and the framework does not adopt a node embedding mode, but adopts the GCN to learn the weight of a single moment in a time dimension. Meanwhile, the correlation of GCN weight among different time instants is considered, and GCN parameters are evolved by using RNN, so that the sequence dynamics is captured.
CTGCN: the model proposes a time-series GCN, combining the GCN and RNN, so that it captures the connectivity and structural information of the graph, while capturing the dynamics.
The experimental results are shown in table 2, and some methods cannot be tested on a large graph due to memory limitations, and are indicated by "-".
TABLE 2 dynamic Link prediction experiment AUC results
Figure BDA0003594296910000091
Based on the same inventive concept, another embodiment of the present invention provides a dynamic link prediction system based on multi-granularity evolution using the above method, which includes:
the multi-granularity graph dividing module is used for dividing the graph under each time slice of the dynamic graph to obtain multi-granularity subgraphs;
the multi-granularity structural feature extraction module is used for extracting structural features of the nodes from the multi-granularity subgraph;
the multi-granularity dynamic characteristic learning module is used for learning the dynamic evolution rule of the graph from the multi-granularity subgraph to obtain the dynamic evolution characteristics of the subgraphs with different granularities and fusing the dynamic evolution characteristics of the subgraphs with different granularities to obtain the dynamic evolution characteristics of the nodes;
and the dynamic link prediction module is used for fusing the structural characteristics of the nodes and the dynamic evolution characteristics of the nodes to obtain a node representation containing space-time characteristics, and predicting a future link according to the node representation containing the space-time characteristics.
Based on the same inventive concept, another embodiment of the present invention provides an electronic device (computer, server, smartphone, etc.) comprising a memory storing a computer program configured to be executed by the processor, and a processor, the computer program comprising instructions for performing the steps of the inventive method.
Based on the same inventive concept, another embodiment of the present invention provides a computer-readable storage medium (e.g., ROM/RAM, magnetic disk, optical disk) storing a computer program, which when executed by a computer, performs the steps of the inventive method.
The particular embodiments of the present invention disclosed above are illustrative only and are not intended to be limiting, since various alternatives, modifications, and variations will be apparent to those skilled in the art without departing from the spirit and scope of the invention. The invention should not be limited to the disclosure of the embodiments in the present specification, but the scope of the invention is defined by the appended claims.

Claims (10)

1. A dynamic link prediction method based on multi-granularity evolution is characterized by comprising the following steps:
dividing the graph under each time slice of the dynamic graph to obtain a multi-granularity subgraph;
extracting structural features of nodes from the multi-granularity subgraph;
learning a dynamic evolution rule of the graph from the multi-granularity subgraphs to obtain dynamic evolution characteristics of the subgraphs with different granularities, and fusing the dynamic evolution characteristics of the subgraphs with different granularities to obtain dynamic evolution characteristics of the nodes;
and fusing the structural characteristics of the nodes and the dynamic evolution characteristics of the nodes to obtain a node representation containing space-time characteristics, and predicting a future link according to the node representation containing the space-time characteristics.
2. The method of claim 1, wherein the multi-granularity subgraph is obtained by partitioning the graph under each time slice by using a k-tress subgraph decomposition algorithm.
3. The method according to claim 2, wherein a graph convolution network based on de-entanglement propagation and mapping operations is adopted and applied to the k-tress subgraph to extract the structural features of the nodes.
4. The method according to claim 3, wherein the graph convolution network based on the disentanglement propagation and mapping operation has the following formula:
Figure FDA0003594296900000011
att l =σ(MLP(X l ))l=0,1,2,…,J
Figure FDA0003594296900000014
wherein l is the number of GCN layers; j is the maximum depth of propagation;
Figure FDA0003594296900000012
is a node feature, N represents the number of nodes, d represents a feature dimension; MLP denotes full join operation; x l Representing the node characteristics when the depth is l;
Figure FDA0003594296900000013
the method is used for adaptively adjusting the information which needs to be reserved at different propagation depths of each node;
Figure FDA0003594296900000015
stands for att l Each element in (1) is represented by X with a corresponding d-dimensional node J Multiplying; softmax denotes normalization operation; sum denotes a summation operation; final output X out Is obtained by combining a plurality of propagation layers.
5. The method of claim 1, wherein learning the dynamic evolution law of the graph from the multi-granularity subgraphs is by using RNN to learn the dynamic evolution law of different granularity subgraphs.
6. The method of claim 1, wherein the fusing the dynamic evolution features of the subgraphs of different granularities to obtain the dynamic evolution features of the nodes comprises: and introducing a multi-head self-attention mechanism, learning the weight on each granularity through the self-attention mechanism, multiplying the weight by the node representation of the corresponding granularity, and summing to obtain the fused node dynamic evolution characteristic representation.
7. The method of claim 1, wherein predicting future links from node representations containing spatio-temporal features comprises: and performing Hadamard product operation on two node representations corresponding to the link, predicting the score of the link through the node representations, and training a logistic regression classifier to judge whether the link exists or not so as to predict the link on the graph at the next time.
8. A dynamic link prediction system based on multi-granularity evolution adopting the method of any claim 1 to 7, which is characterized by comprising:
the multi-granularity graph dividing module is used for dividing the graph under each time slice of the dynamic graph to obtain a multi-granularity subgraph;
the multi-granularity structural feature extraction module is used for extracting structural features of the nodes from the multi-granularity subgraph;
the multi-granularity dynamic characteristic learning module is used for learning the dynamic evolution rule of the graph from the multi-granularity subgraph to obtain the dynamic evolution characteristics of the subgraphs with different granularities and fusing the dynamic evolution characteristics of the subgraphs with different granularities to obtain the dynamic evolution characteristics of the nodes;
and the dynamic link prediction module is used for fusing the structural characteristics of the nodes and the dynamic evolution characteristics of the nodes to obtain a node representation containing space-time characteristics, and predicting a future link according to the node representation containing the space-time characteristics.
9. An electronic apparatus, comprising a memory and a processor, the memory storing a computer program configured to be executed by the processor, the computer program comprising instructions for performing the method of any of claims 1 to 7.
10. A computer-readable storage medium, characterized in that the computer-readable storage medium stores a computer program which, when executed by a computer, implements the method of any one of claims 1 to 7.
CN202210387628.6A 2022-04-13 2022-04-13 Dynamic link prediction method and system based on multi-granularity evolution Pending CN114861766A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210387628.6A CN114861766A (en) 2022-04-13 2022-04-13 Dynamic link prediction method and system based on multi-granularity evolution

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210387628.6A CN114861766A (en) 2022-04-13 2022-04-13 Dynamic link prediction method and system based on multi-granularity evolution

Publications (1)

Publication Number Publication Date
CN114861766A true CN114861766A (en) 2022-08-05

Family

ID=82630912

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210387628.6A Pending CN114861766A (en) 2022-04-13 2022-04-13 Dynamic link prediction method and system based on multi-granularity evolution

Country Status (1)

Country Link
CN (1) CN114861766A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN118094020A (en) * 2024-01-30 2024-05-28 苏州科技大学 Topology adaptive graph layout method, system and storage medium for large graph visualization

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN118094020A (en) * 2024-01-30 2024-05-28 苏州科技大学 Topology adaptive graph layout method, system and storage medium for large graph visualization

Similar Documents

Publication Publication Date Title
US20230169140A1 (en) Graph convolutional networks with motif-based attention
CN114117220B (en) Deep reinforcement learning interactive recommendation system and method based on knowledge enhancement
CN110263280B (en) Multi-view-based dynamic link prediction depth model and application
CN112685504B (en) Production process-oriented distributed migration chart learning method
CN113065974A (en) Link prediction method based on dynamic network representation learning
CN111325340B (en) Information network relation prediction method and system
CN113378913A (en) Semi-supervised node classification method based on self-supervised learning
WO2016165058A1 (en) Social prediction
CN113761250A (en) Model training method, merchant classification method and device
CN113554100B (en) Web service classification method for enhancing attention network of special composition picture
CN115907001A (en) Knowledge distillation-based federal diagram learning method and automatic driving method
CN113240086B (en) Complex network link prediction method and system
CN118134017A (en) Method for predicting social network link by adopting impulse neural network
CN112819024B (en) Model processing method, user data processing method and device and computer equipment
CN116992151A (en) Online course recommendation method based on double-tower graph convolution neural network
CN109523012A (en) Based on Variational Solution Used coupled modes to the expression learning method of symbol directed networks
CN114880479B (en) Heterogeneous graph convolution rumor detection method based on multistage interaction and graph reconstruction
CN114861766A (en) Dynamic link prediction method and system based on multi-granularity evolution
CN115188022A (en) Human behavior identification method based on consistency semi-supervised deep learning
Hu et al. Data Customization-based Multiobjective Optimization Pruning Framework for Remote Sensing Scene Classification
CN115481215A (en) Partner prediction method and prediction system based on temporal partner knowledge graph
Zhang et al. Knowledge graph driven recommendation model of graph neural network
CN114842247B (en) Characteristic accumulation-based graph convolution network semi-supervised node classification method
Jiang et al. Dynamic adaptive and adversarial graph convolutional network for traffic forecasting
CN116975686A (en) Method for training student model, behavior prediction method and device

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination