
CN116049769B - Discrete object data relevance prediction method and system and storage medium - Google Patents

Discrete object data relevance prediction method and system and storage medium

Info

Publication number
CN116049769B
CN116049769B
Authority
CN
China
Prior art keywords
similarity
matrix
disease
data
layer
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202310339869.8A
Other languages
Chinese (zh)
Other versions
CN116049769A (en)
Inventor
金敏
张寒雪
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hunan University
Original Assignee
Hunan University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hunan University filed Critical Hunan University
Priority to CN202310339869.8A priority Critical patent/CN116049769B/en
Publication of CN116049769A publication Critical patent/CN116049769A/en
Application granted granted Critical
Publication of CN116049769B publication Critical patent/CN116049769B/en

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N5/00 Computing arrangements using knowledge-based models
    • G06N5/02 Knowledge representation; Symbolic representation
    • G06N5/022 Knowledge engineering; Knowledge acquisition
    • G06N5/025 Extracting rules from data
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00 Commerce
    • G06Q30/02 Marketing; Price estimation or determination; Fundraising
    • G06Q30/0201 Market modelling; Market analysis; Collecting market data
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H50/00 ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
    • G16H50/70 ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for mining of medical data, e.g. analysing previous cases of other patients
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02A TECHNOLOGIES FOR ADAPTATION TO CLIMATE CHANGE
    • Y02A90/00 Technologies having an indirect contribution to adaptation to climate change
    • Y02A90/10 Information and communication technologies [ICT] supporting adaptation to climate change, e.g. for weather forecasting or climate simulation

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • Business, Economics & Management (AREA)
  • General Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • Accounting & Taxation (AREA)
  • Development Economics (AREA)
  • Finance (AREA)
  • Strategic Management (AREA)
  • General Engineering & Computer Science (AREA)
  • Biomedical Technology (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Software Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Mathematical Physics (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Medical Informatics (AREA)
  • Computational Linguistics (AREA)
  • Public Health (AREA)
  • Computing Systems (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Primary Health Care (AREA)
  • Epidemiology (AREA)
  • Pathology (AREA)
  • Databases & Information Systems (AREA)
  • Molecular Biology (AREA)
  • Biophysics (AREA)
  • Game Theory and Decision Science (AREA)
  • Economics (AREA)
  • Marketing (AREA)
  • General Business, Economics & Management (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention discloses a discrete object data relevance prediction method, system and storage medium. The prediction method comprises the following steps: calculating the similarity of each kind of discrete object and constructing a heterogeneous network from these similarities and the known association relationships between different discrete objects; using a graph convolutional neural network encoder on the heterogeneous network to combine node similarity with node association information and encode the multiple discrete objects; learning the feature embedding of the discrete object nodes at each convolution layer with a multi-head attention mechanism to obtain the final embeddings of the various discrete objects; decoding the obtained discrete object features with a linear decoder to obtain the discrete object association prediction scores; and learning the parameters with the minimized weighted binary cross entropy as the loss function, which reduces the decision bias caused by the sparsity of the data set. The method can be applied to association prediction for many kinds of discrete object data, generalizes well and achieves high prediction accuracy.

Description

Discrete object data relevance prediction method and system and storage medium
Technical Field
The invention relates to the field of discrete object association relationship prediction, and more particularly to a discrete object data association prediction method, system and storage medium based on graph convolution and multi-head attention.
Background
With the development of computer technology and the internet, more and more discrete data have been accumulated, which lays a solid foundation and provides a broad platform for predicting association relationships between discrete objects. By searching multiple kinds of discrete information, a user can discover potential associations between discrete objects. For example, if commodity A and commodity B have been purchased together by a customer, then for a commodity C with a function similar to commodity A, a discrete object association prediction method can be used to predict whether the customer will also purchase commodity C together with commodity B. As another example, if drug A can treat disease B, then for a drug C whose chemical structure is similar to that of drug A, such a method can predict whether drug C also has a therapeutic effect on disease B. Similarly, if word A and word B frequently appear in the same sentence, then for a word C with a meaning similar to word A, the method can predict whether word C and word B will appear in the same sentence.
Existing methods for predicting association relationships between discrete objects construct a heterogeneous information network between the discrete objects and analyze it to obtain predictions of the associations among multiple discrete objects. For example, methods that analyze the heterogeneous information network by matrix factorization easily ignore nonlinear association relationships; random walk methods construct a transition matrix on the heterogeneous information network and iterate until the probability distribution converges, which easily falls into local optima; and methods that analyze the heterogeneous information network with ordinary neural networks easily ignore the topological information of the graph in the heterogeneous network.
Specifically, taking the prediction of associations between microorganisms (as discrete objects) and diseases as an example, the microbe-disease association prediction method and system disclosed in patent document CN112151191A mainly obtain multi-source representations of diseases and microorganisms through meta-path-based random walks, thereby fusing multi-source data and predicting microbe-disease associations from multiple aspects. A meta-path-based random walk algorithm can effectively extract microorganism and disease information from different data sources and, in particular, can effectively capture heterogeneous network information. However, the random walk algorithm only attends to adjacent nodes and easily falls into local optima, so the final prediction results are inaccurate.
Therefore, there remains considerable room to improve the accuracy of the prediction results obtained with existing discrete object association prediction methods.
Disclosure of Invention
The technical problem to be solved by the invention is to provide a discrete object data association prediction method, system and storage medium that can fully capture the potential association relationships between discrete objects, capture the heterogeneous network topology information of different convolution layers, reduce the decision bias caused by the sparsity of the known association data between discrete objects, and improve the prediction accuracy of discrete object association prediction.
In order to solve the technical problems, the invention adopts the following technical scheme:
in a first aspect, the present invention provides a method for predicting relevance of discrete object data, specifically including the following steps:
s1, respectively calculating the similarity of each discrete object, and constructing a heterogeneous network with the known association relation between different discrete objects;
s2, combining node similarity and node association information of discrete objects on a heterogeneous network by using a graph convolution neural network encoder, and encoding a plurality of discrete objects contained in the heterogeneous network;
s3, learning characteristic embedding of the discrete object nodes of each convolution layer by using a multi-head attention mechanism to obtain final embedding containing various discrete objects;
s4, decoding the obtained characteristics containing a plurality of discrete objects by using a linear decoder so that the output matrix and the input matrix have the same dimension, and obtaining the discrete object associated prediction score;
s5, adopting a minimized weighted binary cross entropy as a loss function learning parameter, and reducing decision deviation caused by sparse characteristics of the data set.
Further, the step S1 specifically includes:
According to the data characteristics of the discrete objects, the similarities of the discrete objects m and d are respectively obtained with similarity calculation models; the similarity of the discrete objects m is represented as a matrix S_m, and the similarity of the discrete objects d as a matrix S_d;
The known associations between the discrete objects m and the discrete objects d are described as a binary matrix A of size M×N, wherein M and N respectively denote the numbers of discrete objects m and d; when discrete object data m_i and discrete object data d_j have a known association, A_ij = 1, otherwise A_ij = 0; i is an integer between 1 and M (including 1 and M), and j is an integer between 1 and N (including 1 and N);
Based on the association matrix A of the discrete objects m and d, the similarity matrix S_m of the discrete objects m and the similarity matrix S_d of the discrete objects d, the heterogeneous network is constructed and represented by the adjacency matrix G of the following formula (1):
G = [[S_m', A], [A^T, S_d']]   (1)
wherein S_m' and S_d' are the normalized similarity matrices of the discrete objects m and d, respectively; S_m = [S_m(m_i, m_j)], where S_m(m_i, m_j) denotes the similarity between data m_i and m_j, and S_d = [S_d(d_i, d_j)], where S_d(d_i, d_j) denotes the similarity between data d_i and d_j; diag denotes the matrix operation that takes the main diagonal elements of a matrix.
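As an illustrative aid, the following Python/NumPy sketch shows how an adjacency matrix of the block form given in formula (1) (and, with the penalty factor μ, formula (2)) could be assembled from S_m, S_d and A. The symmetric normalization used for the similarity matrices and the toy values are assumptions, since the text above only states that S_m and S_d are normalized.

```python
import numpy as np

def sym_normalize(S):
    """Symmetrically normalize a similarity matrix (assumed normalization: D^-1/2 S D^-1/2)."""
    d = S.sum(axis=1)
    d_inv_sqrt = np.zeros_like(d)
    mask = d > 0
    d_inv_sqrt[mask] = 1.0 / np.sqrt(d[mask])
    return S * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

def build_heterogeneous_adjacency(S_m, S_d, A, mu=1.0):
    """Block adjacency matrix G = [[mu*S_m', A], [A.T, mu*S_d']], cf. formulas (1)/(2)."""
    Sm_n = sym_normalize(S_m)
    Sd_n = sym_normalize(S_d)
    top = np.hstack([mu * Sm_n, A])
    bottom = np.hstack([A.T, mu * Sd_n])
    return np.vstack([top, bottom])

# Toy example: 3 objects of type m, 2 objects of type d (made-up values).
S_m = np.array([[1.0, 0.4, 0.1], [0.4, 1.0, 0.3], [0.1, 0.3, 1.0]])
S_d = np.array([[1.0, 0.6], [0.6, 1.0]])
A = np.array([[1, 0], [0, 1], [0, 0]], dtype=float)
G = build_heterogeneous_adjacency(S_m, S_d, A, mu=6.0)
print(G.shape)  # (5, 5)
```

In this toy example, mu plays the role of the heterogeneous network penalty factor mentioned later in the embodiments.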
Further, step S2 specifically includes:
A graph convolutional neural network (GCN) encoder is deployed on the heterogeneous network G to combine the node similarity and node association information, and the input is set according to the following formula (2):
G = [[μ·S_m', A], [A^T, μ·S_d']]   (2)
wherein μ is a penalty factor that controls the contribution of the similarity during GCN propagation, and A^T denotes the transpose of the matrix A; the graph convolutional neural network propagates according to the following formula (3):
H^(l+1) = σ( D^(-1/2) G D^(-1/2) H^(l) W^(l) )   (3)
wherein H^(l) and H^(l+1) are respectively the node features of the l-th and (l+1)-th layers; D is the degree matrix of G, with D_ii = Σ_j G_ij, and G_ij denotes the element in the i-th row and j-th column of the matrix G; W^(l) is the weight matrix used in training from layer l to layer l+1; σ is a nonlinear activation function; D^(-1/2) G D^(-1/2) normalizes the adjacency matrix G, and the propagation formula is initialized with H^(0) = G.
According to the above arrangement, the first-layer GCN encoder is further described as the following formula (4):
H^(1) = σ( D^(-1/2) G D^(-1/2) G W^(0) )   (4)
wherein W^(0) ∈ R^((M+N)×k) is the training weight matrix from the input layer to the hidden layer; H^(1) ∈ R^((M+N)×k) is the feature matrix of the hidden layer, and k is the dimension of the features; G is the adjacency matrix defined in formula (2).
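For illustration only, a NumPy sketch of the propagation rule of formulas (3)-(4): the symmetrically normalized adjacency matrix is multiplied with the previous layer's features and a trainable weight matrix, with H^(0) taken as G itself. The use of ReLU as the activation σ and the toy shapes are assumptions.

```python
import numpy as np

def gcn_layer(G, H, W):
    """One propagation step: sigma(D^-1/2 G D^-1/2 H W), cf. formula (3)."""
    d = G.sum(axis=1)
    d_inv_sqrt = np.zeros_like(d)
    mask = d > 0
    d_inv_sqrt[mask] = 1.0 / np.sqrt(d[mask])
    G_hat = G * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]
    return np.maximum(G_hat @ H @ W, 0.0)  # ReLU as the assumed nonlinearity

rng = np.random.default_rng(0)
G = rng.random((5, 5)); G = (G + G.T) / 2           # toy symmetric adjacency matrix
H0 = G                                              # H(0) initialized to G
W0 = rng.normal(scale=0.1, size=(5, 16))            # input -> hidden weights (k = 16)
H1 = gcn_layer(G, H0, W0)                           # first-layer encoder output, cf. formula (4)
print(H1.shape)  # (5, 16)
```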
Further, the step S3 specifically includes: specific representations of the discrete objects m and the discrete objects d are captured by adding a multi-head attention score to each graph convolution layer, and the attention score of each layer is represented by the following formula (5):
e_ij^(l) = a( W^(l) h_i^(l), W^(l) h_j^(l) )   (5)
wherein a(·) is a parametric function, W^(l) is the training weight matrix of the l-th layer, and h_i^(l) and h_j^(l) respectively denote the node outputs of the discrete objects m and d at the l-th layer; all attention scores are normalized with a softmax function, namely the following formula (6):
α_ij^(l) = exp( e_ij^(l) ) / Σ_{k ∈ N_i} exp( e_ik^(l) )   (6)
wherein N_i and N_j respectively denote the neighbor node sets of nodes i and j, and exp is the exponential function; the embeddings of the different convolution layers are combined to capture the structural information of the heterogeneous network, and the final embedding of the graph convolutional neural network encoder with the attention mechanism is represented by the following formula (7):
H_m = Σ_{l=1}^{L} β^(l) H_m^(l),  H_d = Σ_{l=1}^{L} β^(l) H_d^(l)   (7)
wherein H_m is the encoded feature of the discrete objects m; H_d is the encoded feature of the discrete objects d; β^(l) is a parameter automatically learned by the l-th layer of the network; β^(l) is initialized to 1/L, and L is the number of iterations.
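A deliberately simplified sketch of the layer-combination idea behind formulas (5)-(7): each convolution layer's output gets a coefficient, the coefficients are softmax-normalized in the spirit of formula (6), and the final embedding is the weighted sum of the per-layer embeddings. Reducing the per-node, per-neighbor attention of formula (5) to per-layer weights is an assumption made purely for brevity.

```python
import numpy as np

def combine_layer_embeddings(layer_outputs, layer_scores):
    """Weighted sum of per-layer node embeddings; weights are softmax-normalized scores."""
    scores = np.asarray(layer_scores, dtype=float)
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()                          # softmax over layers, cf. formula (6)
    return sum(w * H for w, H in zip(weights, layer_outputs))

rng = np.random.default_rng(1)
L = 3                                                 # number of convolution layers
layer_outputs = [rng.normal(size=(5, 16)) for _ in range(L)]
layer_scores = np.full(L, 1.0 / L)                    # initialized to 1/L as described above
H_final = combine_layer_embeddings(layer_outputs, layer_scores)
print(H_final.shape)  # (5, 16)
```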
Further, the step S4 specifically includes: the result is decoded with a linear decoder, and the association prediction score P between the discrete objects m and the discrete objects d is represented by the following formula (8):
P = sigmoid( H_m W' H_d^T )   (8)
wherein W' is the training weight matrix from the hidden layer to the output layer; the sigmoid function is a nonlinear activation function that keeps all prediction scores within the range 0 to 1; H_d^T denotes the transpose of the matrix H_d.
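An illustrative sketch of a decoder of the kind described for formula (8): the encoded features of the two object types are combined through a hidden-to-output weight matrix and passed through a sigmoid so that every score lies between 0 and 1. The shapes below are assumptions.

```python
import numpy as np

def decode(H_m, H_d, W_out):
    """Association score matrix P = sigmoid(H_m @ W_out @ H_d.T), cf. formula (8)."""
    logits = H_m @ W_out @ H_d.T
    return 1.0 / (1.0 + np.exp(-logits))

rng = np.random.default_rng(2)
H_m = rng.normal(size=(3, 16))      # encoded features of the m-type objects
H_d = rng.normal(size=(2, 16))      # encoded features of the d-type objects
W_out = rng.normal(scale=0.1, size=(16, 16))
P = decode(H_m, H_d, W_out)
print(P.shape, float(P.min()) >= 0, float(P.max()) <= 1)  # (3, 2) True True
```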
Further, the step S5 specifically includes: the minimized weighted binary cross entropy used as the loss function is calculated according to the following formula (9):
Loss = - ( λ · Σ_{(i,j)∈P+} log P(i,j) + Σ_{(i,j)∈P-} log( 1 - P(i,j) ) )   (9)
wherein (i,j) denotes the pair formed by the discrete object data m_i and the discrete object data d_j; P(i,j) denotes the predicted association score between discrete object m_i and discrete object d_j; the influence factor λ is used to reduce the influence of the imbalance between the positive and negative data; |P+| denotes the number of known association pairs of all discrete objects m and d, and |P-| denotes the number of discrete object m and discrete object d pairs for which no association has been found (P+ denotes the positive instance set and P- denotes the negative instance set).
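A minimal sketch of a weighted binary cross entropy of the kind described for formula (9); taking the influence factor λ as the ratio of negative to positive pairs is an assumption shown only as one plausible choice.

```python
import numpy as np

def weighted_bce(P, A, lam=None):
    """Weighted binary cross entropy over all pairs; A[i, j] = 1 marks a known association."""
    eps = 1e-12
    pos = A == 1
    neg = ~pos
    if lam is None:                      # assumed choice: |P-| / |P+|
        lam = neg.sum() / max(pos.sum(), 1)
    loss_pos = -np.log(P[pos] + eps).sum()      # positive (known-association) terms
    loss_neg = -np.log(1.0 - P[neg] + eps).sum()
    return lam * loss_pos + loss_neg

A = np.array([[1, 0], [0, 1], [0, 0]])
P = np.array([[0.9, 0.2], [0.1, 0.8], [0.3, 0.4]])
print(round(float(weighted_bce(P, A)), 4))
```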
Further, the discrete object data associations include: microbe-human disease associations, known drug-disease associations, associations between different commodities, and the like.
Further, the similarity calculation model comprises a directed acyclic graph similarity calculation model and a cosine similarity calculation model.
For example, the semantic description of a disease has a hierarchical structure, so its similarity can be calculated with a directed acyclic graph (but is not limited to a directed acyclic graph model); a drug has multiple kinds of features, such as its structure and its action targets, so cosine similarity calculation can be selected (but is not limited to a cosine similarity calculation model).
In a second aspect, the present invention further provides a discrete object data relevance prediction system, which adopts the discrete object data relevance prediction method, and specifically includes:
the discrete object data similarity calculation module is used for calculating the similarity of each discrete object by using the similarity calculation model;
The heterogeneous network construction module is used for constructing a heterogeneous network by utilizing the similarity of the discrete objects and the known association relationship between the discrete objects;
the multi-head attention model building module, which comprises a graph convolutional neural network encoder module, a multi-head attention mechanism module and a linear decoder module, wherein: the graph convolutional neural network encoder module is used for encoding the discrete objects m and the discrete objects d on the heterogeneous network with a graph convolutional neural network encoder, combining the node similarity and node association information; the multi-head attention mechanism module is used for capturing the node features of each graph convolution layer with multi-head attention, calculating the attention scores, and combining the multi-head attention of each layer to obtain the final embeddings of the discrete objects m and d; and the linear decoder module is used for decoding the obtained features of the discrete objects m and d with a linear decoder so that the output matrix and the input matrix have the same dimensions, obtaining the association prediction scores between the discrete objects;
and the optimization module is used for adopting the minimized weighted binary cross entropy as a loss function learning parameter and reducing decision deviation caused by the sparse characteristic of the data set.
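Purely as an illustration of how the modules described above could fit together, the following Python sketch composes a hypothetical system object; all class names, method names and the wiring are assumptions for illustration and are not taken from the patent text.

```python
import numpy as np

class RelevancePredictionSystem:
    """Hypothetical composition of the similarity, network-construction, model and optimization modules."""

    def __init__(self, similarity_fn_m, similarity_fn_d):
        self.similarity_fn_m = similarity_fn_m   # e.g. cosine or DAG-based similarity (module 1)
        self.similarity_fn_d = similarity_fn_d

    def build_network(self, data_m, data_d, A, mu=1.0):
        # Module 2: heterogeneous network construction from similarities and known associations.
        S_m = self.similarity_fn_m(data_m)
        S_d = self.similarity_fn_d(data_d)
        top = np.hstack([mu * S_m, A])
        bottom = np.hstack([A.T, mu * S_d])
        return np.vstack([top, bottom])

    def predict(self, G, encoder, decoder):
        # Module 3: GCN + multi-head attention encoder followed by the linear decoder.
        H_m, H_d = encoder(G)
        return decoder(H_m, H_d)
```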
The invention also provides a computer storage medium on which a computer program is stored, wherein the computer program when executed by an executor implements the discrete object data relevance prediction method described above.
The invention provides a discrete object data association prediction method and system based on a graph convolution multi-head attention mechanism. In view of the strengths and weaknesses of existing discrete object data association prediction methods, a heterogeneous network is constructed from the similarities of multiple kinds of discrete objects and the known association relationships between them, and using the similarity data effectively improves the ability to discover potential association relationships between discrete objects. The graph convolutional neural network effectively captures nonlinear association relationships between discrete objects. Multi-head attention captures the node features of the discrete objects at each graph convolution layer, and the multi-head attention scores of the layers are computed and combined, so that more node features of the discrete objects are mined for the embedding, which effectively compensates for the influence of sparse associations on the graph convolutional neural network. Using the minimized weighted binary cross entropy as the loss function effectively compensates for the decision bias caused by the sparsity of the known association data between discrete objects. Evaluation of the discrete object association prediction results obtained with the graph convolution multi-head attention method shows high prediction accuracy. The prediction method of the invention can be applied to association prediction for many kinds of discrete object data and generalizes well.
Compared with the existing method, the discrete object data relevance prediction method and system provided by the invention have the following advantages:
(1) According to the invention, the heterogeneous network is constructed by using the similarity information of various discrete objects and the known association relation between the discrete objects, so that the data characteristics of each discrete object can be fully utilized.
(2) The present invention uses a graph convolutional neural network encoder and a linear decoder to accomplish association prediction between discrete objects. The graph convolutional neural network can capture nonlinear association relationships, and with a semi-supervised training method it performs well when the training data contain only a small number of known associations and a large number of unknown associations between discrete objects.
(3) A multi-head attention mechanism is introduced to capture more discrete object information. Multi-head attention can capture the node features of the discrete objects at each convolution layer, and an enhanced feature representation of the current node can be obtained from the weights of its neighbor nodes at each layer. The multi-head attention mechanism captures different structural information of the heterogeneous network, which effectively alleviates the problem of inconsistent contributions of node feature embeddings across different convolution layers, and the attention mechanism reduces the impact of sparse association relationships between discrete objects on propagation in the graph convolutional neural network.
(4) The invention uses the minimized weighted binary cross entropy as a loss function to reduce decision bias caused by the sparse characteristic of the known association relationship data between discrete objects, thereby strengthening the influence of positive samples.
(5) The prediction method is suitable for the association relation prediction of the discrete objects, and has stronger generalization.
Experiments prove that the method can significantly improve the accuracy of discrete object association prediction and can effectively reduce the decision bias caused by the sparsity of the known associations between discrete objects in the data sets.
Drawings
FIG. 1 is a flowchart of a discrete object data relevance prediction method according to embodiment 1 of the present invention;
FIG. 2 is a schematic structural diagram of a discrete object data relevance prediction system according to embodiment 2 of the present invention;
FIG. 3 is a statistical chart of the microbe-disease association prediction results obtained with the prediction method and system of the present invention in embodiment 3;
FIG. 4 is a statistical chart of the drug-disease association prediction results obtained with the prediction method and system of the present invention in embodiment 4;
Fig. 5 is a schematic diagram of the results of predicting association relationships between different commodities with the prediction method and system of the present invention in embodiment 5.
Detailed Description
In order that the above-recited objects, features and advantages of the present invention will be more clearly understood, a more particular description of the invention will be rendered by reference to the appended drawings and appended detailed description.
In the following description, numerous specific details are set forth in order to provide a thorough understanding of the present invention, however, the present invention may be practiced in other ways than those described herein, and therefore the scope of the present invention is not limited to the specific embodiments disclosed below.
Example 1
As shown in fig. 1, the present embodiment provides a discrete object data relevance prediction method, which is based on a graph convolution multi-head attention mechanism, and includes the following steps:
s1, respectively calculating the similarity of each discrete object, and constructing a heterogeneous network with the known association relation between the discrete objects; the method specifically comprises the following steps:
according to the data characteristics of the discrete objects, similarity of the discrete objects m and d is obtained by adopting a similarity calculation model through calculation; using the similarity of discrete objects m as matrix
Figure SMS_66
Matrix for representing similarity of discrete objects d
Figure SMS_67
And (3) representing. The discrete object data association includes: microorganism-human disease association, known drug-disease association, association with different commercial products, and the like. The similarity calculation model comprises a directed acyclic graph similarity calculation model and a cosine similarity calculation model.
Such as: the semantic description of the disease has a hierarchical structure, so that the similarity can be calculated by using the directed acyclic graph, and the method is not limited to the directed acyclic graph; the medicine contains various characteristics such as a structure, an action target point and the like, so cosine similarity calculation can be selected, and the medicine is not limited to a cosine similarity calculation model.
Describing the known association between the discrete object m and the discrete object d as a binary matrix
Figure SMS_68
Wherein N, M represents discrete objects d, m, respectivelyQuantity, when discrete object datam i And discrete object datad j There is a known association between them, thenA ij =1Otherwise, the device can be used to determine whether the current,A ij =0the method comprises the steps of carrying out a first treatment on the surface of the i is an integer between 1 and M (including 1 and M), j is an integer between 1 and N (including 1 and N);
correlation matrix A based on discrete object m and discrete object d, and similarity matrix of discrete object mS m Similarity matrix to discrete object dS d Constructing heterogeneous network by using adjacency matrix of the following formula (1)
Figure SMS_69
The representation is:
Figure SMS_70
(1)
wherein ,
Figure SMS_72
and
Figure SMS_79
Respectively, a similarity matrix for the discrete objects mS m Similarity matrix to discrete object dS d Normalizing;
Figure SMS_80
Figure SMS_73
, wherein
Figure SMS_74
Representing data +.>
Figure SMS_75
and
Figure SMS_77
Similarity of>
Figure SMS_71
Representing data +.>
Figure SMS_76
and
Figure SMS_78
Similarity of (2); diag is a matrix calculation formula, and the meaning is to take the main diagonal elements of the matrix.
S2, using a graph convolutional neural network encoder on the heterogeneous network to combine the node similarity and node association information of the discrete objects, and encoding the multiple discrete objects contained in the heterogeneous network; this specifically includes:
A graph convolutional neural network (GCN) encoder is deployed on the heterogeneous network G to combine the node similarity and node association information, and the input is set according to the following formula (2):
G = [[μ·S_m', A], [A^T, μ·S_d']]   (2)
wherein μ is a penalty factor that controls the contribution of the similarity during GCN propagation, and A^T denotes the transpose of the matrix A; the graph convolutional neural network propagates according to the following formula (3):
H^(l+1) = σ( D^(-1/2) G D^(-1/2) H^(l) W^(l) )   (3)
wherein H^(l) and H^(l+1) are respectively the node features of the l-th and (l+1)-th layers; D is the degree matrix of G, with D_ii = Σ_j G_ij, and G_ij denotes the element in the i-th row and j-th column of the matrix G; W^(l) is the weight matrix used in training from layer l to layer l+1; σ is a nonlinear activation function; D^(-1/2) G D^(-1/2) normalizes the adjacency matrix G, and the propagation formula is initialized with H^(0) = G.
According to the above arrangement, the first-layer GCN encoder is further described as the following formula (4):
H^(1) = σ( D^(-1/2) G D^(-1/2) G W^(0) )   (4)
wherein W^(0) ∈ R^((M+N)×k) is the training weight matrix from the input layer to the hidden layer; H^(1) ∈ R^((M+N)×k) is the feature matrix of the hidden layer, and k is the dimension of the features; G is the adjacency matrix defined in formula (2).
S3, learning the feature embeddings of the discrete object nodes of each convolution layer with a multi-head attention mechanism to obtain the final embeddings of the multiple discrete objects; this specifically includes: specific representations of the discrete objects m and the discrete objects d are captured by adding a multi-head attention score to each graph convolution layer, and the attention score of each layer is represented by the following formula (5):
e_ij^(l) = a( W^(l) h_i^(l), W^(l) h_j^(l) )   (5)
wherein a(·) is a parametric function, W^(l) is the training weight matrix of the l-th layer, and h_i^(l) and h_j^(l) respectively denote the node outputs of the discrete objects m and d at the l-th layer; all attention scores are normalized with a softmax function, namely the following formula (6):
α_ij^(l) = exp( e_ij^(l) ) / Σ_{k ∈ N_i} exp( e_ik^(l) )   (6)
wherein N_i and N_j respectively denote the neighbor node sets of nodes i and j, and exp is the exponential function; the embeddings of the different convolution layers are combined to capture the structural information of the heterogeneous network, and the final embedding of the graph convolutional neural network encoder with the attention mechanism is represented by the following formula (7):
H_m = Σ_{l=1}^{L} β^(l) H_m^(l),  H_d = Σ_{l=1}^{L} β^(l) H_d^(l)   (7)
wherein H_m is the encoded feature of the discrete objects m; H_d is the encoded feature of the discrete objects d; β^(l) is a parameter automatically learned by the l-th layer of the network; β^(l) is initialized to 1/L, and L is the number of iterations.
S4, decoding the obtained features of the multiple discrete objects with a linear decoder so that the output matrix and the input matrix have the same dimensions, and obtaining the discrete object association prediction scores; this specifically includes: the result is decoded with a linear decoder, and the association prediction score P between the discrete objects m and the discrete objects d is represented by the following formula (8):
P = sigmoid( H_m W' H_d^T )   (8)
wherein W' is the training weight matrix from the hidden layer to the output layer; the sigmoid function is a nonlinear activation function that keeps all prediction scores within the range 0 to 1; H_d^T denotes the transpose of the matrix H_d.
S5, using the minimized weighted binary cross entropy as the loss function to learn the parameters, thereby reducing the decision bias caused by the sparsity of the data set; this specifically includes: the minimized weighted binary cross entropy used as the loss function is calculated according to the following formula (9):
Loss = - ( λ · Σ_{(i,j)∈P+} log P(i,j) + Σ_{(i,j)∈P-} log( 1 - P(i,j) ) )   (9)
wherein (i,j) denotes the pair formed by the discrete object data m_i and the discrete object data d_j; P(i,j) denotes the predicted association score between discrete object m_i and discrete object d_j; the influence factor λ is used to reduce the influence of the imbalance between the positive and negative data; |P+| denotes the number of known association pairs of all discrete objects m and d, and |P-| denotes the number of discrete object m and discrete object d pairs for which no association has been found (P+ denotes the positive instance set and P- denotes the negative instance set).
Example 2
As shown in fig. 2, the present embodiment provides a discrete object data relevance prediction system 20 adopting the method described in embodiment 1, specifically including:
a discrete object data similarity calculation module 21, configured to calculate respective similarities of different discrete objects using the similarity calculation model;
A heterogeneous network construction module 22, configured to construct a heterogeneous network using similarity information of different discrete objects and known association relationships between the discrete objects;
a multi-head attention model building module 23, comprising a graph convolutional neural network encoder module 231, a multi-head attention mechanism module 232, and a linear decoder module 233, wherein: the graph convolutional neural network encoder module 231 is used for encoding the discrete objects m and the discrete objects d on the heterogeneous network with a graph convolutional neural network encoder, combining the node similarity and node association information; the multi-head attention mechanism module 232 is used for capturing the node features of each graph convolution layer with multi-head attention, calculating the attention scores, and combining the multi-head attention of each layer to obtain the final embeddings of the discrete objects m and d; and the linear decoder module 233 is used for decoding the obtained features of the discrete objects m and d with a linear decoder so that the output matrix and the input matrix have the same dimensions, obtaining the association prediction scores between the discrete objects;
and the optimization module 24 is used for taking the minimized weighted binary cross entropy as a loss function learning parameter and reducing decision deviation caused by sparse characteristics of the data set.
The embodiment of the invention also provides a computer storage medium, on which a computer program is stored, wherein the computer program, when being executed by an actuator, implements the method as shown in fig. 1.
The computer-readable storage medium may include, but is not limited to, floppy diskettes, optical disks, CD-ROMs (compact disk-read only memories), magneto-optical disks, ROMs (read-only memories), RAMs (random access memories), EPROMs (erasable programmable read only memories), EEPROMs (electrically erasable programmable read only memories), magnetic or optical cards, flash memory, or other type of media/machine-readable medium suitable for storing machine-executable instructions. The computer readable storage medium may be an article of manufacture that is not accessed by a computer device or may be a component used by an accessed computer device.
Example 3: application example 1, discrete object microorganism-human disease association prediction
In order to verify the effectiveness of the discrete object relevance prediction method based on the graph convolution multi-head attention, in the embodiment, microorganisms are taken as discrete objects m, human diseases are taken as discrete objects d, the known microorganism-human disease relevance is the known relevance among the discrete objects, and the potential relevance between the microorganisms and the human diseases is predicted.
The microbial similarity calculation in this embodiment differs from the prior art that uses a Gaussian kernel: it is computed from the gene sequences with a gene sequence similarity calculation model, and specifically includes: the similarity is calculated by measuring the identity (ID), i.e. the proportion of identically aligned positions between two sequences, with the Basic Local Alignment Search Tool BLAST (http://www.ncbi.nim.nih.gov/BLAST). The similarity between microorganisms A' and B' is calculated according to the following formula (10):
S_m(A', B') = ID( B(A', B') )   (10)
wherein B(A', B') denotes the alignment of the gene sequences of microorganism A' and microorganism B', and ID denotes the identity of the aligned microbial sequences, which is stored in matrix form.
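As a hedged illustration of how a microbial similarity matrix could be assembled in the spirit of formula (10): the pairwise identities below are made-up numbers standing in for values that would be obtained offline from BLAST alignments of the gene sequences.

```python
import numpy as np

# Hypothetical pairwise identity values (fraction of identically aligned positions),
# assumed to have been computed offline with BLAST.
identities = {
    ("microbe_1", "microbe_2"): 0.82,
    ("microbe_1", "microbe_3"): 0.45,
    ("microbe_2", "microbe_3"): 0.51,
}
microbes = ["microbe_1", "microbe_2", "microbe_3"]

S_m = np.eye(len(microbes))                       # self-similarity set to 1
for (a, b), ident in identities.items():
    i, j = microbes.index(a), microbes.index(b)
    S_m[i, j] = S_m[j, i] = ident                 # similarity taken as alignment identity
print(S_m)
```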
The similarity of human diseases is obtained from a directed acyclic graph similarity calculation model, and specifically includes: the disease data in the MeSH database are organized into a directed acyclic graph (DAG) for each disease, in which the nodes represent diseases and the edges represent the hierarchical relationships between diseases. In this embodiment the directed acyclic graph is represented by the following formula (11):
DAG_A = ( A, T_A, E_A )   (11)
wherein T_A denotes the node set of all diseases in the subgraph of disease A, and E_A denotes the set of edges, i.e. semantic relationships, between disease A and the other nodes. The semantic value DV(A) corresponding to disease A can then be represented by formula (12):
DV(A) = Σ_{t ∈ T_A} D_A(t)   (12)
D_A(t) denotes the semantic contribution of each ancestor node t in the DAG of disease A to disease A, and is calculated according to the following formula (13):
D_A(t) = 1, if t = A;  D_A(t) = max{ Δ · D_A(t') | t' ∈ children(t) }, if t ≠ A   (13)
wherein Δ is the semantic contribution factor with value range (0, 1), children(t) denotes the child nodes of node t, and ancestor nodes farther from disease A in the DAG contribute less to its semantics. Therefore, the semantic similarity between diseases A and B is calculated according to the following formula (14):
SS(A, B) = Σ_{t ∈ T_A ∩ T_B} ( D_A(t) + D_B(t) ) / ( DV(A) + DV(B) )   (14)
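The following sketch implements the DAG-based semantic similarity of formulas (11)-(14); the semantic contribution factor of 0.5 and the toy MeSH-style hierarchy are assumptions for illustration.

```python
def semantic_contributions(dag_parents, disease, delta=0.5):
    """D_disease(t) for every ancestor t (formula (13)); dag_parents maps node -> list of parents."""
    contrib = {disease: 1.0}
    frontier = [disease]
    while frontier:
        node = frontier.pop()
        for parent in dag_parents.get(node, []):
            value = delta * contrib[node]
            if value > contrib.get(parent, 0.0):   # keep the maximum contribution over paths
                contrib[parent] = value
                frontier.append(parent)
    return contrib

def semantic_similarity(dag_parents, d1, d2, delta=0.5):
    """SS(d1, d2) per formula (14): shared contributions over the total semantic values."""
    c1 = semantic_contributions(dag_parents, d1, delta)
    c2 = semantic_contributions(dag_parents, d2, delta)
    shared = set(c1) & set(c2)
    return sum(c1[t] + c2[t] for t in shared) / (sum(c1.values()) + sum(c2.values()))

# Hypothetical hierarchy: both diseases descend from "digestive_disease".
dag = {"colitis": ["digestive_disease"], "ibd": ["digestive_disease"], "digestive_disease": []}
print(round(semantic_similarity(dag, "colitis", "ibd"), 3))  # 0.333
```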
constructing heterogeneous network by using microorganism similarity, disease similarity and known microorganism-disease association relationship, and expressing the microorganism-disease association as binary matrix
Figure SMS_147
Wherein N and M are used to represent the number of diseases and microorganisms, respectively. When a microorganismm i And a diseased j There is a known association between them, thenA ij =1The method comprises the steps of carrying out a first treatment on the surface of the Otherwise the first set of parameters is selected,A ij =0. The similarity of the microorganisms is expressed as matrix +.>
Figure SMS_152
, wherein
Figure SMS_155
Representing microorganism->
Figure SMS_148
and
Figure SMS_151
Similarity of (2); similarity of diseases is expressed as matrix->
Figure SMS_154
, wherein
Figure SMS_157
Representing disease->
Figure SMS_146
and
Figure SMS_150
Is a similarity of (3). Based on a microbe-disease association matrixAMicroorganism similarity matrix>
Figure SMS_153
And disease similarity matrix->
Figure SMS_156
Establishing an adjacency matrix->
Figure SMS_149
. The specific construction method is shown in example 1.
The node similarity and node association information of the discrete objects are combined on the heterogeneous network by using a graph convolution neural network encoder, and microorganisms and diseases contained in the heterogeneous network are encoded; learning the characteristic embedding of the microorganisms and disease nodes of each convolution layer by using a multi-head attention mechanism to obtain the final embedding containing the microorganisms and the disease; decoding the obtained characteristics containing microorganisms and diseases by using a linear decoder so that the dimensions of an output matrix and an input matrix are the same, and obtaining a microorganism-disease associated prediction score; and the minimized weighted binary cross entropy is adopted as a loss function learning parameter, so that decision deviation caused by sparse characteristics of the data set is reduced. Specific methods are described in example 1.
In this example, the method was applied to HMDAD dataset (comprising 483 experimentally confirmed microbe-disease associations between 39 diseases and 292 microbes) with BRWMDA (predicting microbe-disease association based on random walk method) and WMGHDMA (predicting microbe-disease association based on network-based calculation method) as controls.
The statistics of the HMDAD dataset are shown in table 1.
Table 1 data set statistics
(table content rendered as an image in the original publication and not reproduced here)
There are many performance evaluation methods for association prediction, among which the area under the ROC curve (AUC) and the area under the PR curve (AUPR) are the most widely used. AUC and AUPR are chosen as the evaluation indicators in this example. TP denotes the number of known association pairs predicted to be associated, FP denotes the number of unknown pairs predicted to be associated, FN denotes the number of known association pairs predicted to have no association, and TN denotes the number of unknown pairs predicted to have no association. The true positive rate (TPR, True Positive Rate), false positive rate (FPR, False Positive Rate), Precision and Recall can then be expressed as:
TPR = TP / (TP + FN)   (15)
FPR = FP / (FP + TN)   (16)
Precision = TP / (TP + FP)   (17)
Recall = TP / (TP + FN)   (18)
With FPR as the abscissa and TPR as the ordinate, the ROC curve can be drawn, and the area under the ROC curve is the AUC; with Recall as the abscissa and Precision as the ordinate, the PR curve can be drawn, and the area under the PR curve is the AUPR.
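A short sketch of the evaluation measures of formulas (15)-(18) together with AUC and AUPR, using scikit-learn for the curve areas; the labels and scores below are toy values, and the 0.5 decision threshold is an assumption.

```python
import numpy as np
from sklearn.metrics import roc_auc_score, average_precision_score

y_true = np.array([1, 0, 1, 1, 0, 0, 1, 0])                   # 1 = known association pair
y_score = np.array([0.9, 0.4, 0.7, 0.35, 0.2, 0.6, 0.8, 0.1])  # predicted association scores
y_pred = (y_score >= 0.5).astype(int)

TP = int(((y_pred == 1) & (y_true == 1)).sum())
FP = int(((y_pred == 1) & (y_true == 0)).sum())
FN = int(((y_pred == 0) & (y_true == 1)).sum())
TN = int(((y_pred == 0) & (y_true == 0)).sum())

TPR = Recall = TP / (TP + FN)          # formulas (15)/(18)
FPR = FP / (FP + TN)                   # formula (16)
Precision = TP / (TP + FP)             # formula (17)
AUC = roc_auc_score(y_true, y_score)
AUPR = average_precision_score(y_true, y_score)
print(TPR, FPR, Precision, AUC, AUPR)
```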
To verify that the proposed method outperforms other standard methods in prediction performance, BRWMDA and WMGHDMA are compared with the discrete object data association prediction method provided in this embodiment (named MAGMDA) in terms of AUC and AUPR. In the experiments of this embodiment, 5-fold cross validation is used to evaluate the model performance: the known microbe-human disease association data set is randomly divided into 5 equal groups, one group is taken as the test set each time, and the other four groups are used as the training set. The model parameters of this embodiment are set as follows: the graph convolutional neural network uses a 3-layer structure with 16 hidden nodes per layer, the learning rate is 0.001, the node dropout probability is 0.7, the edge dropout probability is 0.3, and the heterogeneous network penalty factor is 6. Table 2 and FIG. 3 show the AUC, AUPR, Accuracy and Precision of each method on the HMDAD dataset. Compared with the second-best method, the MAGMDA method improves the AUC evaluation index on the HMDAD dataset by 3.23%, the Accuracy by 1.22%, and the Precision by 0.41%. Overall, the MAGMDA method offers some improvement in prediction performance over the other baseline methods.
TABLE 2 statistical evaluation of the prediction of the relationship between microbial and human disease
(table content rendered as an image in the original publication and not reproduced here)
Taken together, this shows that the graph convolution multi-head attention based discrete object association prediction method provided by the invention, using the microbial data and disease data, calculates the microbial similarity from gene sequences and the disease similarity from a directed acyclic graph so that more informative features of the discrete objects are captured, constructs a heterogeneous network from the microbial similarity, the disease similarity and the known microbe-disease associations, effectively captures the topological information of the heterogeneous network with a graph convolutional neural network equipped with a multi-head attention mechanism, and uses the minimized weighted binary cross entropy as the loss function to learn the parameters. The method can fully utilize multi-source bioinformatics data, capture nonlinear microbe-human disease association relationships, capture the heterogeneous network topology information of different convolution layers, reduce the decision bias caused by the sparsity of the known microbe-human disease association data, and improve the accuracy of microbe-human disease association prediction.
Example 4: application example 2, discrete object drug-disease association prediction
In order to verify the effectiveness of the prediction method provided by the invention, in the embodiment, the medicine is taken as a discrete object m, the disease is taken as a discrete object d, the known medicine-disease association relationship is the known association relationship between the discrete objects, and the potential association relationship between the medicine and the disease is predicted.
The drug features comprise the chemical molecular structure, drug-drug interactions and drug action targets, which are represented by the matrices F_1, F_2 and F_3 respectively, where n_i denotes the dimension of the i-th kind of feature and M denotes the number of drugs, so that F_i ∈ R^(M×n_i). The drug similarity is calculated with a cosine similarity calculation model, as shown in formula (19):
S^t(i, j) = ( F_t(i) · F_t(j) ) / ( ||F_t(i)|| · ||F_t(j)|| )   (19)
wherein S^t(i, j) denotes the similarity between the i-th and j-th drugs obtained from the t-th kind of feature, F_t(i) and F_t(j) respectively denote the i-th and j-th feature vectors of the t-th kind of feature, and ||·|| denotes the vector norm. The similarity of the diseases is calculated with the directed acyclic graph calculation model, see example 3, in particular formulas (11)-(14).
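An illustrative sketch of the per-feature cosine similarity of formula (19); combining the three feature-wise similarity matrices by simple averaging is an assumption, since the text only states that several drug features are used.

```python
import numpy as np

def cosine_similarity_matrix(F):
    """Pairwise cosine similarity between the rows of a drug-feature matrix F, cf. formula (19)."""
    norms = np.linalg.norm(F, axis=1, keepdims=True)
    F_unit = F / np.maximum(norms, 1e-12)     # guard against all-zero feature rows
    return F_unit @ F_unit.T

rng = np.random.default_rng(3)
# Hypothetical binary feature matrices for 4 drugs: substructures, interactions, targets.
features = [rng.integers(0, 2, size=(4, dim)).astype(float) for dim in (10, 6, 8)]
S_per_feature = [cosine_similarity_matrix(F) for F in features]
S_drug = np.mean(S_per_feature, axis=0)       # assumed combination: average over the three features
print(S_drug.shape)  # (4, 4)
```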
A heterogeneous network is constructed from the drug similarity, the disease similarity and the known drug-disease associations. The drug-disease associations are represented as a binary matrix A of size M×N, wherein M and N are used to denote the numbers of drugs and diseases, respectively. When there is a known association between drug m_i and disease d_j, A_ij = 1; otherwise, A_ij = 0.
The similarity of the drugs is represented as a matrix S_m = [S_m(m_i, m_j)], wherein S_m(m_i, m_j) denotes the similarity between drugs m_i and m_j; the similarity of the diseases is represented as a matrix S_d = [S_d(d_i, d_j)], wherein S_d(d_i, d_j) denotes the similarity between diseases d_i and d_j. Based on the drug-disease association matrix A, the drug similarity matrix S_m and the disease similarity matrix S_d, the adjacency matrix G is established. The specific construction method is given in example 1.
The node similarity and node association information of the discrete objects are combined on the heterogeneous network by using a graph convolution neural network encoder, and medicines and diseases contained in the heterogeneous network are encoded; learning the characteristic embedding of the medicine and disease nodes of each convolution layer by using a multi-head attention mechanism to obtain the final embedding containing the medicine and the disease; decoding the obtained characteristics containing the medicine and the disease by using a linear decoder so that the dimension of an output matrix is the same as that of an input matrix to obtain a medicine-disease associated prediction score; and the minimized weighted binary cross entropy is adopted as a loss function learning parameter, so that decision deviation caused by sparse characteristics of the data set is reduced. Specific methods are described in example 1.
This example uses SCMFDD (drug-disease association prediction based on a binary network) and NIMGCN (drug-disease association prediction based on a neural induction matrix and graph convolutional network) as reference methods, and applies the method to the Ldataset dataset (comprising 18416 experimentally verified drug-disease associations between 269 drugs and 598 diseases). Statistical information of the Ldataset dataset is shown in Table 3.
Table 3 Ldataset dataset statistics
(table content rendered as an image in the original publication and not reproduced here)
To verify that the method provided by the invention outperforms other baseline methods, its AUC and AUPR are compared with those of the discrete object data association prediction method provided in this example (named MAGGCN). In this example, 5-fold cross validation is used to evaluate the model performance: the known drug-human disease association data set is randomly and evenly divided into 5 groups, one group is taken as the test set each time, and the other four groups are used as the training set. The model parameters of this example are set as follows: the graph convolutional neural network uses a 2-layer structure with 64 hidden nodes per layer, the learning rate is 0.01, the node dropout probability is 0.7, the edge dropout probability is 0.3, and the heterogeneous network penalty factor is 6. Table 4 and FIG. 4 show the AUC, AUPR, Accuracy and Precision of each method on the Ldataset dataset. The MAGGCN method improves the AUC and AUPR evaluation indexes on the Ldataset dataset by 3.9% and 3% respectively, and improves the Precision by 1.4% compared with the second-best method. Overall, the MAGGCN method offers some improvement in prediction performance over the other baseline methods.
TABLE 4 statistics of predicted evaluation results of drug-disease associations
(table content rendered as an image in the original publication and not reproduced here)
Taken together, this shows that the discrete object data association prediction method provided by the invention calculates the drug similarity with cosine similarity and the disease similarity with a directed acyclic graph so that more informative features of the discrete objects are captured, constructs a heterogeneous network from the drug similarity, the disease similarity and the known drug-disease associations, effectively captures the topological information of the heterogeneous network with a graph convolutional neural network equipped with a multi-head attention mechanism, and uses the minimized weighted binary cross entropy as the loss function to learn the parameters. The method can fully utilize the multiple kinds of drug feature information and the disease semantic information, fully capture nonlinear drug-human disease association relationships, capture the heterogeneous network topology information of different convolution layers, reduce the decision bias caused by the sparsity of the known drug-disease association data, and improve the accuracy of drug-human disease association prediction.
Example 5: application example 3, association prediction of different commodities
In order to verify the effectiveness of the prediction method provided by the invention, this embodiment takes commodities as the discrete objects and the purchase of different commodities in the same order as the association relationship, and predicts the potential association relationships between different commodities. The commodity similarity is calculated from commodity features such as attributes, categories and functions, and is represented as a similarity matrix. A heterogeneous network is constructed from the commodity similarity and the association relationships between commodities in the known orders, with the commodity associations represented as a binary matrix. A graph convolutional network is deployed on the heterogeneous network to combine the similarity and association information of the commodities. Specific representations of the commodities are captured by adding a multi-head attention mechanism to each graph convolution layer, and the final feature embeddings of the commodities are obtained by combining the attention scores of the convolution layers. A linear decoder decodes the feature embeddings to obtain the association prediction scores of the commodities. The parameters are learned with the minimized weighted binary cross entropy as the loss function. For the specific methods, reference is made to the prediction method provided in example 1 and the discrete object data similarity calculation models provided in examples 3 and 4.
The method of this embodiment is applied to a Jingdong e-commerce dataset, JDdataset (comprising 11212 commodities, 141 commodity categories and 240332 orders). FIG. 5 shows the commodity association prediction results of the method of the invention, in which nodes represent commodities and the edges connecting nodes represent predicted commodities that have an association relationship.
Taken together, this shows that the graph convolution multi-head attention based discrete object association prediction method provided by the invention can make full use of the various feature information of commodities, capture nonlinear association relationships between commodities, capture the heterogeneous network topology information of different convolution layers, and provide reasonable commodity recommendations for the recommendation pages of e-commerce platforms so as to increase e-commerce sales.
The above description covers only a few preferred embodiments of the present invention and should not be taken as limiting the invention; all modifications, equivalent replacements and improvements made within the spirit and principles of the invention are intended to be included within the scope of the invention.

Claims (27)

1. A method for predicting the association of a microorganism-human disease, comprising the steps of:
s1, respectively calculating the similarity between the microorganisms of the discrete objects and the similarity between the human diseases, and constructing a heterogeneous network with the known association relationship between the microorganisms and the human diseases;
S2, combining node similarity and node association information of microorganisms and human diseases on a heterogeneous network by using a graph convolution neural network encoder, and encoding the microorganisms and the human diseases contained in the heterogeneous network;
s3, learning characteristic embedding of microorganisms and human disease nodes of each convolution layer by using a multi-head attention mechanism to obtain final embedding containing microorganisms and human diseases;
s4, decoding the obtained characteristics containing the microorganisms and the human diseases by using a linear decoder so that the dimensions of an output matrix and an input matrix are the same, and obtaining a microorganism-human disease associated prediction score;
s5, adopting a minimized weighted binary cross entropy as a loss function learning parameter, and reducing decision deviation caused by sparse characteristics of the data set.
2. The method for predicting microbial-human disease association according to claim 1, wherein step S1 specifically comprises:
according to the data characteristics of the microbial and human diseases, similarity calculation models are adopted to calculate the similarity of the microbial and human diseases respectively; the similarity of microorganisms is used as matrix
Figure QLYQS_1
Matrix for representing similarity of human diseases
Figure QLYQS_2
A representation;
describing a known association between microorganisms and human diseases as a binary matrix
Figure QLYQS_3
Wherein M, N respectively represents the number of microorganisms and human diseases, when the discrete object microorganism data m i And human disease data d j There is a known association between then A ij =1, otherwise, a ij =0;
Correlation matrix A based on microorganisms and human diseases and similarity matrix of microorganisms
Figure QLYQS_4
Similarity matrix with human diseases->
Figure QLYQS_5
Constructing heterogeneous network by using adjacency matrix of the following formula (1)>
Figure QLYQS_6
The representation is:
Figure QLYQS_7
(1)
wherein ,
Figure QLYQS_9
and
Figure QLYQS_12
Respectively is a similarity matrix S for microorganisms m Similarity matrix S with human diseases d Normalizing;
Figure QLYQS_16
Figure QLYQS_10
, wherein
Figure QLYQS_13
Representing data +.>
Figure QLYQS_15
and
Figure QLYQS_17
Similarity of>
Figure QLYQS_8
Representing data +.>
Figure QLYQS_11
and
Figure QLYQS_14
Similarity of (2); diag is a matrix calculation formula, and the meaning is to take the main diagonal elements of the matrix.
3. The method for predicting microbial-human disease association according to claim 2, wherein step S2 specifically comprises:
by being in heterogeneous networks
Figure QLYQS_18
The upper deployment graph convolution neural network encoder combines node similarity and node association information, and the input setting adopts the following formula (2):
Figure QLYQS_19
(2)
wherein ,
Figure QLYQS_20
as penalty factors, the similarity contribution in the GCN propagation process can be controlled>
Figure QLYQS_21
Representing a transpose of matrix a; the graph roll-up neural network propagation formula adopts the following formula (3):
Figure QLYQS_22
(3)
wherein :H(l) ,H (l+1) Respectively the first
Figure QLYQS_23
Figure QLYQS_24
Features of the tier nodes;
Figure QLYQS_25
Is the degree of the matrix G, gij represents the elements of the ith row and jth column of the matrix G; w (W) (l) Is->
Figure QLYQS_26
Layer to->
Figure QLYQS_27
Weight matrix used in layer training, +.>
Figure QLYQS_28
A nonlinear activation function;
Figure QLYQS_29
The adjacency matrix G is normalized, and the propagation formula is initialized as follows:
Figure QLYQS_30
according to the above propagation equation (3) and the setup for propagation equation initialization, the first layer GCN encoder is further described as the following equation (4):
Figure QLYQS_31
(4)
wherein :
Figure QLYQS_32
is a training weight matrix from the input layer to the hidden layer;
Figure QLYQS_33
Is the feature matrix of the hidden layer, +.>
Figure QLYQS_34
Is the number of dimensions of the feature; g is an adjacency matrix.
4. The method for predicting microbial-human disease association according to claim 3, wherein step S3 specifically comprises: capturing a specific representation of microbial and human disease by adding a multi-headed attention score to each of the layers of the graph, the attention score of each layer being represented by the following formula (5):
Figure QLYQS_35
(5)
wherein :
Figure QLYQS_36
is a parametric function->
Figure QLYQS_37
Is->
Figure QLYQS_38
Training weight matrix of layer, +.>
Figure QLYQS_39
and
Figure QLYQS_40
Respectively represent +.>
Figure QLYQS_41
The nodal output of the microbial, human disease of the layer, normalized to all attention scores using a softmax function, which is given by the following formula (6):
Figure QLYQS_42
(6)
wherein :
Figure QLYQS_43
Figure QLYQS_44
respectively representing neighbor node sets of nodes i and j, wherein exp is an exponential function; the final embedding of the graph convolutional neural network coding attention mechanism by combining the embedding of different convolutional layers to capture the structural information of the heterogeneous network is represented by the following formula (7):
Figure QLYQS_45
(7)
wherein :
Figure QLYQS_46
is the characteristic of microorganism data after being encoded;
Figure QLYQS_47
Is the characteristic of encoded human disease data +.>
Figure QLYQS_48
Parameters for automatic learning of neural networks, +.>
Figure QLYQS_49
Is a parameter for the layer 1 network to automatically learn; initializing to
Figure QLYQS_50
L is the number of iterations.
5. The method for predicting microbial-human disease association according to claim 4, wherein step S4 specifically comprises: decoding the result using a linear decoder, the associated predictive score P between the discrete object microorganism and the human disease is represented by the following formula (8):
Figure QLYQS_51
(8)
wherein :
Figure QLYQS_52
is the training right from the hidden layer to the output layerThe heavy matrix, the sigmoid function is a nonlinear activation function, so that the prediction results are all in the range of 0-1;
Figure QLYQS_53
Represents H d Is a transposed matrix of (a).
6. The method for predicting microbial-human disease association according to claim 5, wherein step S5 specifically comprises: the calculation formula for minimizing weighted binary cross entropy as a loss function is as follows (9):
Figure QLYQS_54
(9)
Wherein: (i, j) represents microorganism data
Figure QLYQS_56
And human disease data->
Figure QLYQS_58
The method comprises the steps of carrying out a first treatment on the surface of the P (i, j) represents microorganism data +.>
Figure QLYQS_61
And human disease data->
Figure QLYQS_57
A predicted relevance score between; influence factor->
Figure QLYQS_60
For reducing->
Figure QLYQS_62
and
Figure QLYQS_63
The effect of the data imbalance is that,
Figure QLYQS_55
representation houseNumber of sets of known association pairs with microbial and human diseases, < >>
Figure QLYQS_59
Represents the number of sets of undiscovered microbial and human disease-associated pairs.
7. The method for predicting microbial-human disease association according to claim 2, wherein,
the similarity calculation model comprises a directed acyclic graph similarity calculation model and a cosine similarity calculation model.
8. A system for predicting the association of a microorganism-human disease, comprising:
a data similarity calculation module for calculating the similarity between microorganisms and the similarity between human diseases using the similarity calculation model;
the heterogeneous network construction module is used for constructing a heterogeneous network by utilizing the similarity between microorganisms and the similarity between human diseases and the known association relationship between microorganisms and human diseases;
The multi-head attention model building module comprises a graph convolution neural network encoder module, a multi-head attention mechanism module and a linear decoder module, wherein: a graph roll-up neural network encoder module for encoding microbial and human diseases on a heterogeneous network using the graph roll-up neural network encoder to combine node similarity and node association information; the multi-head attention mechanism module is used for capturing the node characteristics of the convolution of each layer of graph by using multi-head attention, calculating attention scores, and combining the multi-head attention of each layer to obtain the final embedding of the microorganism and human diseases; the linear decoder module is used for decoding the obtained characteristics of the microbial and human diseases by using the linear decoder so that the dimension of the output matrix is the same as that of the input matrix, and obtaining the correlation prediction score between the microbial and human diseases;
and the optimization module is used for adopting the minimized weighted binary cross entropy as a loss function learning parameter and reducing decision deviation caused by the sparse characteristic of the data set.
9. A computer storage medium, characterized in that a computer program is stored thereon, wherein the computer program, when executed by an actuator, implements the method for predicting the association of a microorganism-human disease according to any one of claims 1-7.
10. The medicine-disease association prediction method is characterized by comprising the following steps:
s1, respectively calculating the similarity between the medicines of the discrete objects and the similarity between diseases, and constructing a heterogeneous network with the known association relationship between the medicines and the diseases;
s2, combining the medicine and the node similarity and node association information of the diseases on the heterogeneous network by using a graph convolution neural network encoder, and encoding the medicine and the diseases contained in the heterogeneous network;
s3, learning characteristic embedding of the medicine and disease nodes of each convolution layer by using a multi-head attention mechanism to obtain final embedding containing the medicine and the disease;
s4, decoding the obtained characteristics containing the medicine and the disease by using a linear decoder so that the dimension of an output matrix is the same as that of an input matrix, and obtaining a medicine-disease association prediction score;
s5, adopting a minimized weighted binary cross entropy as a loss function learning parameter, and reducing decision deviation caused by sparse characteristics of the data set.
11. The method of claim 10, wherein step S1 specifically comprises:
according to the data characteristics of the medicine and the disease, adopting a similarity calculation model to calculate and obtain the similarity of the medicine and the disease respectively; matrix similarity of drugs
Figure QLYQS_64
The similarity of the diseases is represented by the matrix +.>
Figure QLYQS_65
A representation;
describing a known association between a drug and a disease as a binary matrix
Figure QLYQS_66
Wherein M, N respectively represents the number of medicines and diseases, when the medicine data m of the discrete object is obtained i And disease data d j There is a known association between then A ij =1, otherwise, a ij =0;
Drug and disease-based incidence matrix A and drug similarity matrix
Figure QLYQS_67
Similarity matrix with diseases->
Figure QLYQS_68
Constructing heterogeneous network by using adjacency matrix of the following formula (1)>
Figure QLYQS_69
The representation is:
Figure QLYQS_70
(1)
wherein ,
Figure QLYQS_73
and
Figure QLYQS_74
Respectively is a similarity matrix S for medicines m Similarity matrix S with disease d Normalizing;
Figure QLYQS_77
Figure QLYQS_72
, wherein
Figure QLYQS_76
Representing drug data
Figure QLYQS_78
and
Figure QLYQS_80
Similarity of>
Figure QLYQS_71
Representing disease data->
Figure QLYQS_75
and
Figure QLYQS_79
Similarity of (2); diag is a matrix calculation formula, and the meaning is to take the main diagonal elements of the matrix.
12. The method of claim 11, wherein step S2 specifically comprises:
by being in heterogeneous networks
Figure QLYQS_81
The upper deployment graph convolution neural network encoder combines node similarity and node association information, and the input setting adopts the following formula (2):
Figure QLYQS_82
(2)
wherein ,
Figure QLYQS_83
as penalty factors, the similarity contribution in the GCN propagation process can be controlled >
Figure QLYQS_84
Representing a transpose of matrix a; the graph roll-up neural network propagation formula adopts the following formula (3):
Figure QLYQS_85
(3)
wherein :H(l) ,H (l+1) Respectively the first
Figure QLYQS_86
Figure QLYQS_87
Features of the tier nodes;
Figure QLYQS_88
Is the degree of the matrix G, gij represents the elements of the ith row and jth column of the matrix G; w (W) (l) Is->
Figure QLYQS_89
Layer to->
Figure QLYQS_90
Weight matrix used in layer training, +.>
Figure QLYQS_91
A nonlinear activation function;
Figure QLYQS_92
The adjacency matrix G is normalized, and the propagation formula is initialized as follows:
Figure QLYQS_93
according to the above propagation equation (3) and the setup for propagation equation initialization, the first layer GCN encoder is further described as the following equation (4):
Figure QLYQS_94
(4)
wherein :
Figure QLYQS_95
is a training weight matrix from the input layer to the hidden layer;
Figure QLYQS_96
Is the feature matrix of the hidden layer, +.>
Figure QLYQS_97
Is the number of dimensions of the feature; g is an adjacency matrix.
13. The method of claim 12, wherein step S3 specifically comprises: capturing a specific representation of the drug and disease by adding multiple attention scores to each of the graph convolution layers, the attention score of each layer being represented by the following formula (5):
Figure QLYQS_98
(5)
wherein :
Figure QLYQS_99
is a parametric function->
Figure QLYQS_100
Is->
Figure QLYQS_101
Training weight matrix of layer, +.>
Figure QLYQS_102
and
Figure QLYQS_103
Respectively represent +.>
Figure QLYQS_104
The nodal output of drug, disease of the layer, normalized all attention scores using a softmax function, which is given by the following formula (6):
Figure QLYQS_105
(6)
wherein :
Figure QLYQS_106
Figure QLYQS_107
respectively representing neighbor node sets of nodes i and j, wherein exp is an exponential function; the final embedding of the graph convolutional neural network coding attention mechanism by combining the embedding of different convolutional layers to capture the structural information of the heterogeneous network is represented by the following formula (7):
Figure QLYQS_108
(7)
wherein :
Figure QLYQS_109
is the characteristic of the coded drug data;
Figure QLYQS_110
Is the characteristic of encoded disease data +.>
Figure QLYQS_111
Parameters for automatic learning of neural networks, +.>
Figure QLYQS_112
Is a parameter for the layer 1 network to automatically learn; initializing to
Figure QLYQS_113
L is the number of iterations.
14. The method of claim 13, wherein step S4 specifically comprises: decoding the result using a linear decoder, the associated predictive score P between the discrete subject drug and the disease is represented by the following formula (8):
Figure QLYQS_114
(8)
wherein :
Figure QLYQS_115
the training weight matrix from the hidden layer to the output layer is adopted, and the sigmoid function is a nonlinear activation function, so that the prediction results are all in the range of 0-1;
Figure QLYQS_116
Represents H d Is a transposed matrix of (a).
15. The method of claim 14, wherein step S5 specifically comprises: the calculation formula for minimizing weighted binary cross entropy as a loss function is as follows (9):
Figure QLYQS_117
(9)
Wherein: (i, j) represents drug data
Figure QLYQS_119
And disease data->
Figure QLYQS_122
The method comprises the steps of carrying out a first treatment on the surface of the P (i, j) represents drug data +.>
Figure QLYQS_124
And disease data->
Figure QLYQS_120
A predicted relevance score between; influence factor->
Figure QLYQS_123
For reducing->
Figure QLYQS_125
and
Figure QLYQS_126
Influence of data imbalance, ++>
Figure QLYQS_118
Representing the number of sets of known association pairs for all drugs and diseases, +.>
Figure QLYQS_121
Representing the number of sets of undiscovered drug and disease association pairs.
16. The method of claim 15, wherein the method comprises the steps of,
the similarity calculation model comprises a directed acyclic graph similarity calculation model and a cosine similarity calculation model.
17. A drug-disease association prediction system, characterized in that it employs the drug-disease association prediction method according to any one of claims 10 to 16, and specifically comprises:
the data similarity calculation module is used for calculating the similarity between medicines and the similarity between diseases by using the similarity calculation model;
the heterogeneous network construction module is used for constructing a heterogeneous network by utilizing the similarity between medicines and the similarity between diseases and the known association relationship between medicines and diseases;
the multi-head attention model building module comprises a graph convolution neural network encoder module, a multi-head attention mechanism module and a linear decoder module, wherein: a graph roll-up neural network encoder module for encoding drugs and diseases on a heterogeneous network using the graph roll-up neural network encoder to combine node similarity and node association information; the multi-head attention mechanism module is used for capturing the node characteristics of the convolution of each layer of graph by using multi-head attention, calculating attention scores, and combining the multi-head attention of each layer to obtain the final embedding of the medicine and the disease; a linear decoder module for decoding the obtained characteristics of the drug and the disease using a linear decoder such that the output matrix is the same as the input matrix in dimension, obtaining a drug-disease correlation prediction score;
And the optimization module is used for adopting the minimized weighted binary cross entropy as a loss function learning parameter and reducing decision deviation caused by the sparse characteristic of the data set.
18. A computer storage medium having stored thereon a computer program, wherein the computer program when executed by an actuator implements the method of predicting drug-disease association according to any one of claims 10-16.
19. The method for predicting the relevance of different commodities is characterized by comprising the following steps of:
s1, respectively calculating the similarity between the first-class commodities and the similarity between the second-class commodities, and constructing a heterogeneous network with the association relationship between the first-class commodities and the second-class commodities in the known order;
s2, combining node similarity and node association information of the first-class commodity and the second-class commodity on the heterogeneous network by using a graph convolution neural network encoder, and encoding the first-class commodity and the second-class commodity contained in the heterogeneous network;
s3, learning characteristic embedding of the first-class commodity and the second-class commodity nodes of each convolution layer by using a multi-head attention mechanism to obtain final embedding containing the first-class commodity and the second-class commodity;
S4, decoding the obtained characteristics comprising the first type of commodity and the second type of commodity by using a linear decoder so that the dimension of an output matrix is the same as that of an input matrix, and obtaining a first type of commodity-second type of commodity association prediction score;
s5, adopting a minimized weighted binary cross entropy as a loss function learning parameter, and reducing decision deviation caused by sparse characteristics of the data set.
20. The method for predicting relevance of different types of commodities according to claim 19, wherein step S1 specifically includes:
according to the data characteristics of the first-class commodities and the second-class commodities, respectively calculating to obtain the similarity of the first-class commodities and the second-class commodities by adopting a similarity calculation model; using a matrix for similarity of first-class commodities
Figure QLYQS_127
Representing the similarity of the second type of goods with matrix +.>
Figure QLYQS_128
A representation;
describing a known association between a first type of commodity and a second type of commodity as a binary matrix
Figure QLYQS_129
Wherein M, N represents the number of first-class commodity and second-class commodity respectively, and the first-class commodity data m is the discrete object i And commodity data of the second class d j There is a known association between then A ij =1, otherwise, a ij =0;
Correlation matrix A based on first-class commodity and second-class commodity and similarity matrix of first-class commodity
Figure QLYQS_130
Similarity matrix with the second class of goods +.>
Figure QLYQS_131
Constructing heterogeneous network by using adjacency matrix of the following formula (1)>
Figure QLYQS_132
The representation is:
Figure QLYQS_133
(1)
wherein ,
Figure QLYQS_135
and
Figure QLYQS_138
Respectively is a similarity matrix S for the first type of commodity m Similarity matrix S with second class commodity d Normalizing;
Figure QLYQS_140
Figure QLYQS_136
, wherein
Figure QLYQS_139
Representing commodity data of the first kind->
Figure QLYQS_141
and
Figure QLYQS_143
Similarity of>
Figure QLYQS_134
Representing commodity data of the second class->
Figure QLYQS_137
and
Figure QLYQS_142
Similarity of (2); diag is a matrix calculation formula, and the meaning is to take the main diagonal elements of the matrix.
21. The method for predicting relevance of different types of commodities according to claim 20, wherein step S2 specifically includes:
by being in heterogeneous networks
Figure QLYQS_144
The upper deployment graph convolution neural network encoder combines node similarity and node association information, and the input setting adopts the following formula (2):
Figure QLYQS_145
(2)
wherein ,
Figure QLYQS_146
as penalty factors, the similarity contribution in the GCN propagation process can be controlled>
Figure QLYQS_147
Representing a transpose of matrix a; the graph roll-up neural network propagation formula adopts the following formula (3):
Figure QLYQS_148
(3)
wherein :H(l) ,H (l+1) Respectively the first
Figure QLYQS_149
Figure QLYQS_150
Features of the tier nodes;
Figure QLYQS_151
Is the degree of the matrix G, gij represents the elements of the ith row and jth column of the matrix G; w (W) (l) Is->
Figure QLYQS_152
Layer to->
Figure QLYQS_153
Weight matrix used in layer training, +.>
Figure QLYQS_154
A nonlinear activation function; / >
Figure QLYQS_155
The adjacency matrix G is normalized, and the propagation formula is initialized as follows:
Figure QLYQS_156
according to the above propagation equation (3) and the setup for propagation equation initialization, the first layer GCN encoder is further described as the following equation (4):
Figure QLYQS_157
(4)
wherein :
Figure QLYQS_158
is a training weight matrix from the input layer to the hidden layer;
Figure QLYQS_159
Is the feature matrix of the hidden layer, +.>
Figure QLYQS_160
Is the number of dimensions of the feature; g is an adjacency matrix.
22. The method for predicting relevance of different types of commodities according to claim 21, wherein step S3 specifically includes: capturing a specific representation of the first type of commodity and the second type of commodity by adding a multi-headed attention score to each of the layers of the graph, the attention score of each layer being represented by the following formula (5):
Figure QLYQS_161
(5)
wherein :
Figure QLYQS_162
is a parametric function->
Figure QLYQS_163
Is->
Figure QLYQS_164
Training weight matrix of layer, +.>
Figure QLYQS_165
and
Figure QLYQS_166
Respectively represent +.>
Figure QLYQS_167
The node outputs of the first class commodity and the second class commodity of the layer normalize all attention scores by using a softmax function, wherein the softmax function is as follows (6):
Figure QLYQS_168
(6)
wherein :
Figure QLYQS_169
Figure QLYQS_170
respectively representing neighbor node sets of nodes i and j, wherein exp is an exponential function; the final embedding of the graph convolutional neural network coding attention mechanism by combining the embedding of different convolutional layers to capture the structural information of the heterogeneous network is represented by the following formula (7):
Figure QLYQS_171
(7)
wherein :
Figure QLYQS_172
is the characteristic of the first commodity data after being encoded;
Figure QLYQS_173
Is the characteristic of the second commodity data after coding>
Figure QLYQS_174
Parameters for automatic learning of neural networks, +.>
Figure QLYQS_175
Is a parameter for the layer 1 network to automatically learn; initializing to
Figure QLYQS_176
L is the number of iterations.
23. The method for predicting relevance of different types of commodities according to claim 22, wherein step S4 specifically includes: decoding the result using a linear decoder, the associated prediction score P between the first type of commodity and the second type of commodity being represented by the following formula (8):
Figure QLYQS_177
(8)
wherein :
Figure QLYQS_178
the training weight matrix from the hidden layer to the output layer is adopted, and the sigmoid function is a nonlinear activation function, so that the prediction results are all in the range of 0-1;
Figure QLYQS_179
Represents H d Is a transposed matrix of (a).
24. The method for predicting relevance of different types of commodities according to claim 23, wherein step S5 specifically includes: the calculation formula for minimizing weighted binary cross entropy as a loss function is as follows (9):
Figure QLYQS_180
(9)
wherein: (i, j) represents first-class commodity data
Figure QLYQS_182
And second class commodity data->
Figure QLYQS_185
The method comprises the steps of carrying out a first treatment on the surface of the P (i, j) represents first-class commodity data
Figure QLYQS_187
And second class commodity data->
Figure QLYQS_183
A predicted relevance score between; influence factor->
Figure QLYQS_186
For reducing- >
Figure QLYQS_188
and
Figure QLYQS_189
Influence of data imbalance, ++>
Figure QLYQS_181
Representing the number of sets of known association pairs for all first and second type of goods, +.>
Figure QLYQS_184
Representing the number of sets of undiscovered drug and disease association pairs.
25. The method of claim 24, wherein the step of predicting relevance of different types of merchandise,
the similarity calculation model comprises a directed acyclic graph similarity calculation model and a cosine similarity calculation model.
26. A system for predicting relevance of different types of commodities, which is characterized by adopting the method for predicting relevance of different types of commodities according to any one of claims 19 to 25, and specifically comprising:
the data similarity calculation module is used for calculating the similarity between the first type of commodities and the similarity between the second type of commodities by using a similarity calculation model;
the heterogeneous network construction module is used for constructing a heterogeneous network by utilizing the similarity between the first type commodities, the similarity between the second type commodities and the known association relationship between the first type commodities and the second type commodities;
the multi-head attention model building module comprises a graph convolution neural network encoder module, a multi-head attention mechanism module and a linear decoder module, wherein: the graph roll neural network encoder module is used for encoding the first type commodity and the second type commodity by using the graph roll neural network encoder to combine node similarity and node association information on the heterogeneous network; the multi-head attention mechanism module is used for capturing the node characteristics of the convolution of each layer of graph by using multi-head attention, calculating attention scores, and combining the multi-head attention of each layer to obtain the final embedding of the first type commodity and the second type commodity; the linear decoder module is used for decoding the obtained characteristics of the first-class commodities and the second-class commodities by using the linear decoder so that the output matrix and the input matrix have the same dimension, and obtaining the correlation prediction score between the first-class commodities and the second-class commodities;
And the optimization module is used for adopting the minimized weighted binary cross entropy as a loss function learning parameter and reducing decision deviation caused by the sparse characteristic of the data set.
27. A computer storage medium having stored thereon a computer program, wherein the computer program when executed by an actuator implements the heterogeneous product relevance prediction method of any of claims 19-25.
CN202310339869.8A 2023-04-03 2023-04-03 Discrete object data relevance prediction method and system and storage medium Active CN116049769B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310339869.8A CN116049769B (en) 2023-04-03 2023-04-03 Discrete object data relevance prediction method and system and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310339869.8A CN116049769B (en) 2023-04-03 2023-04-03 Discrete object data relevance prediction method and system and storage medium

Publications (2)

Publication Number Publication Date
CN116049769A CN116049769A (en) 2023-05-02
CN116049769B true CN116049769B (en) 2023-06-20

Family

ID=86122132

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310339869.8A Active CN116049769B (en) 2023-04-03 2023-04-03 Discrete object data relevance prediction method and system and storage medium

Country Status (1)

Country Link
CN (1) CN116049769B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117075756B (en) * 2023-10-12 2024-03-19 深圳市麦沃宝科技有限公司 Real-time induction data processing method for intelligent touch keyboard

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
GB2605218A (en) * 2021-03-23 2022-09-28 Adobe Inc Graph Neural Networks for datasets with heterophily

Family Cites Families (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11481418B2 (en) * 2020-01-02 2022-10-25 International Business Machines Corporation Natural question generation via reinforcement learning based graph-to-sequence model
US20210374499A1 (en) * 2020-05-26 2021-12-02 International Business Machines Corporation Iterative deep graph learning for graph neural networks
US20220092413A1 (en) * 2020-09-23 2022-03-24 Beijing Wodong Tianjun Information Technology Co., Ltd. Method and system for relation learning by multi-hop attention graph neural network
CN113362160B (en) * 2021-06-08 2023-08-22 南京信息工程大学 Federal learning method and device for credit card anti-fraud
CN113807616B (en) * 2021-10-22 2022-11-04 重庆理工大学 Information diffusion prediction system based on space-time attention and heterogeneous graph convolution network
CN114496092B (en) * 2022-02-09 2024-05-03 中南林业科技大学 MiRNA and disease association relation prediction method based on graph rolling network
CN115527627A (en) * 2022-10-08 2022-12-27 湖州师范学院 Drug relocation method and system based on hypergraph convolutional neural network
CN115732079A (en) * 2022-11-17 2023-03-03 湖南电子科技职业学院 Microorganism and disease association relation prediction method and system based on graph convolution network
CN115798730A (en) * 2022-11-18 2023-03-14 中南大学 Method, apparatus and medium for circular RNA-disease association prediction based on weighted graph attention and heterogeneous graph neural networks
CN115828143A (en) * 2022-12-20 2023-03-21 南通大学 Node classification method for realizing heterogeneous primitive path aggregation based on graph convolution and self-attention mechanism

Patent Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
GB2605218A (en) * 2021-03-23 2022-09-28 Adobe Inc Graph Neural Networks for datasets with heterophily

Also Published As

Publication number Publication date
CN116049769A (en) 2023-05-02

Similar Documents

Publication Publication Date Title
Zhang et al. Hierarchical graph pooling with structure learning
Ma et al. Deep learning on graphs
Li et al. Deep convolutional computation model for feature learning on big data in internet of things
Jia et al. Feature dimensionality reduction: a review
Dong et al. A survey on deep learning and its applications
Law et al. Multi-label classification using a cascade of stacked autoencoder and extreme learning machines
Salaken et al. Seeded transfer learning for regression problems with deep learning
Hu et al. Transformation-gated LSTM: efficient capture of short-term mutation dependencies for multivariate time series prediction tasks
Song et al. Multi-layer discriminative dictionary learning with locality constraint for image classification
Tian et al. A neural architecture search based framework for liquid state machine design
Chen et al. AGNN: Alternating graph-regularized neural networks to alleviate over-smoothing
Ma et al. MIDIA: exploring denoising autoencoders for missing data imputation
Zhang et al. Application of convolutional neural network to traditional data
Fu et al. Adaptive graph convolutional collaboration networks for semi-supervised classification
Kinderkhedia Learning Representations of Graph Data--A Survey
Yuan et al. SRLF: a stance-aware reinforcement learning framework for content-based rumor detection on social media
CN116049769B (en) Discrete object data relevance prediction method and system and storage medium
CN117349494A (en) Graph classification method, system, medium and equipment for space graph convolution neural network
Jiang et al. An intelligent recommendation approach for online advertising based on hybrid deep neural network and parallel computing
Palmucci et al. Where is your field going? A machine learning approach to study the relative motion of the domains of physics
Zhang et al. Deep compression of probabilistic graphical networks
Bi et al. Improved network intrusion classification with attention-assisted bidirectional LSTM and optimized sparse contractive autoencoders
Zhao et al. Graph pooling via Dual-view Multi-level Infomax
Cao et al. Implicit user relationships across sessions enhanced graph for session-based recommendation
Zhang et al. Deep heterogeneous network embedding based on Siamese Neural Networks

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant