CN114639483B

CN114639483B - Electronic medical record retrieval method and device based on graph neural network

Info

Publication number: CN114639483B
Application number: CN202210291079.2A
Authority: CN
Inventors: 吕旭东; 李梦阳; 段会龙; 蔡海领
Original assignee: Zhejiang University ZJU
Current assignee: Zhejiang University ZJU
Priority date: 2022-03-23
Filing date: 2022-03-23
Publication date: 2024-10-18
Anticipated expiration: 2042-03-23
Also published as: CN114639483A

Abstract

The invention discloses an electronic medical record retrieval method based on a graph neural network, comprising the following steps: obtaining a co-occurrence matrix of the medical entities in the electronic medical record; adding co-occurrence information between the medical entities and their ancestor medical entities to the medical entity co-occurrence matrix to obtain an enhanced medical entity co-occurrence matrix; extracting each medical entity vector representation and each patient vector representation using a GloVe model; constructing an electronic medical record heterogeneous graph comprising medical entity nodes, patient nodes, true link relationships between medical entities, and true link relationships between patients and medical entities; inputting the heterogeneous graph into a graph neural network to obtain a patient node output vector representation and a medical entity node output vector representation, and from them the probability of a link relationship between a patient and a medical entity and the probability of a link relationship between medical entities; and training the graph neural network with a total loss function, updating its parameters to obtain the final graph neural network. The method thereby prepares for predicting the probability of association between a patient and a medical entity.

Description

Electronic medical record retrieval method and device based on graph neural network

Technical Field

The invention relates to the technical field of medical information data processing, and in particular to an electronic medical record retrieval method and device based on a graph neural network.

Background

Medical practice depends on large amounts of data: patient information must be acquired continuously for analysis and decision making. Electronic medical records are currently one of the main information sources and contain rich information. Using that information to support medical activities such as clinical decision support, clinical research, and clinical trials is of great importance, and such work requires querying electronic medical record data effectively. When medical staff carry out query tasks without the support of information technology personnel, they must formulate their queries relying only on their own knowledge. This makes the query task challenging: finding the target information takes a large amount of browsing and exploration, which greatly reduces work efficiency and increases the workload of medical professionals. An automated approach is therefore needed to reduce the time cost for clinical staff.

In this process, semantic association can effectively improve query performance. In current practice, query tasks associate the various medical entities in the electronic medical record through medical ontology knowledge and use these relationships to expand the target query entities. However, this approach depends excessively on a general medical ontology and tends to ignore the association information present in the electronic medical record itself. In addition, entities in the electronic medical record that do not appear in the medical ontology cannot be expanded, which limits the range of application of the approach.

The electronic medical record contains rich association information that can effectively help to optimize the query task. Based on this idea, link relationships need to be established between electronic medical record data using different kinds of information, and the query task can then be improved through the association relationships between the data. With the development of machine learning, and in particular the demonstrated practical effectiveness of deep learning in many fields, modeling electronic medical records with a neural network has become an effective approach. A graph neural network can represent complex topological structures, and an electronic medical record can be regarded as a complex heterogeneous graph structure, so a graph neural network can effectively represent the structure of the electronic medical record.

The graph neural network is a neural network structure developed from convolutional neural networks and graph representation learning. Compared with the data types targeted by earlier neural networks, it can extract and represent the features of graph-structured data; it is efficient and easily extended, and it has proven powerful for learning on graph data. In contrast to conventional deep learning methods, it can reflect entities and their associations through a constructed graph model. A graph neural network first initializes a description of each node, then repeatedly updates the node states so that they come to include neighbor node information and network topology, and finally outputs the nodes through a specific method to obtain results that can be used in subsequent tasks. This method is therefore well suited to modeling heterogeneous electronic medical records.

Because of its specialization and complexity, the medical field has a large body of medical ontology knowledge, such as ICD and SNOMED-CT. This knowledge can be used to build relationships between different medical entities and to construct association information that does not exist in the electronic medical record, thereby enriching the topological information in the network.

Disclosure of Invention

The invention discloses an electronic medical record retrieval method based on a graph neural network, which can expand the relation range between medical entities and between the medical entities and patients so as to prepare for predicting the association probability between the patients and the medical entities.

An electronic medical record retrieval method based on a graph neural network comprises the following steps:

(1) Obtaining a co-occurrence matrix of medical entities in the electronic medical record, traversing medical ontology ICD codes to obtain a plurality of ancestor medical entities corresponding to the medical entities, adding co-occurrence information of the medical entities and the ancestor medical entities into the medical entity co-occurrence matrix to obtain an enhanced medical entity co-occurrence matrix, extracting each medical entity vector representation by adopting a GloVe model based on the enhanced medical entity co-occurrence matrix, and taking an aggregate result of the plurality of medical entity vector representations associated with a patient as a patient vector representation;

(2) Constructing an electronic medical record heterogeneous graph, wherein the electronic medical record heterogeneous graph comprises medical entity nodes, patient nodes, true link relationships between medical entities, and true link relationships between patients and medical entities;

Each medical entity vector representation is used as the initial attribute of the corresponding medical entity node, and each patient vector representation is used as the initial attribute of the corresponding patient node; related medical entities are connected to obtain the true link relationships between medical entities, and associated medical entities are connected with the patient to obtain the true link relationships between patients and medical entities;

(3) Inputting the electronic medical record heterogeneous graph into a GraphSAGE graph neural network to obtain a patient node output vector representation and a medical entity node output vector representation, respectively; based on the patient node output vector representation and the medical entity node output vector representation, obtaining the probability of a link relationship between the patient and the medical entity using an activation function; based on the medical entity node output vector representations, obtaining the probability of a link relationship between medical entities using an activation function;

(4) Constructing a total loss function, wherein the total loss function comprises a first loss function, a second loss function and a multi-task weighted loss function;

Constructing a first loss function through the cross entropy of the patient and medical entity link true relationship and the patient and medical entity link relationship probability;

Constructing a second loss function through the cross entropy of the link true relationship between the medical entities and the probability of the link relationship between the medical entities;

Constructing a multi-task weighted loss function from the loss value of the first loss function and the loss value of the second loss function;

(5) Training the GraphSAGE graph neural network with the total loss function, and updating its parameters to obtain the final GraphSAGE graph neural network;

(6) In application, inputting the medical entity vector representation and the patient vector representation into the final GraphSAGE graph neural network to predict the association probability between the medical entity and the patient.

Obtaining the co-occurrence matrix of the medical entity in the electronic medical record comprises:

For each visit record of each patient, taking the product of the occurrence frequencies of every two medical entities as the co-occurrence information of those two medical entities; constructing a co-occurrence matrix for each visit record from this co-occurrence information; adding the co-occurrence matrices of the visit records of each patient to obtain the co-occurrence matrix of that patient's electronic medical record; and adding the co-occurrence matrices of the electronic medical records of a plurality of patients to obtain the co-occurrence matrix of the medical entities in the electronic medical record.

Traversing the ICD codes to obtain a plurality of ancestor medical entities corresponding to the medical entities, including:

Taking each medical entity as a leaf node and traversing the medical ontology ICD codes from bottom to top to obtain the ancestor nodes corresponding to the leaf node; extracting the medical entities corresponding to the ancestor nodes as ancestor medical entities; obtaining the co-occurrence information of each medical entity with its ancestor medical entities in the medical ontology ICD codes; and adding this co-occurrence information to the medical entity co-occurrence matrix to expand the matrix.
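The bottom-up traversal and matrix expansion described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the `parent` map is a hypothetical fragment of an ICD-style hierarchy, the function names are invented for the example, and adding a fixed `weight` of 1 for each leaf-ancestor pair is an assumption, since the text does not state how the ancestor co-occurrence value is computed.

```python
def ancestors(code, parent):
    """Walk bottom-up from a leaf code to the root, collecting ancestors."""
    chain = []
    while code in parent:
        code = parent[code]
        chain.append(code)
    return chain


def augment_with_ancestors(matrix, leaf_codes, parent, weight=1):
    """Return a copy of the matrix with leaf-ancestor co-occurrences added.

    The fixed weight per leaf-ancestor pair is an assumption of this sketch.
    """
    enhanced = dict(matrix)
    for leaf in leaf_codes:
        for anc in ancestors(leaf, parent):
            # Keep the matrix symmetric by adding both directions.
            for pair in ((leaf, anc), (anc, leaf)):
                enhanced[pair] = enhanced.get(pair, 0) + weight
    return enhanced


# Hypothetical fragment of an ICD-style hierarchy (child -> parent).
parent = {"E11.9": "E11", "E11": "E08-E13", "E08-E13": "E00-E89"}

# Sparse co-occurrence matrix keyed by (entity, entity) pairs.
M = {("E11.9", "I10"): 3, ("I10", "E11.9"): 3}
M_enh = augment_with_ancestors(M, ["E11.9"], parent)
```

After the call, `M_enh` keeps the original entries and gains one entry per direction for each of the three ancestors of `E11.9`, so entities that never co-occur in the records are still linked through the ontology.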

Extracting each medical entity vector representation using the GloVe model based on the enhanced medical entity co-occurrence matrix includes:

setting an initial vector representation of each medical entity, inputting the initial vector representation into the GloVe model, and obtaining the vector representation of each medical entity by training on an objective function, wherein the objective function J is as follows:

where M_ij is the co-occurrence value of the i-th and j-th medical entities in the enhanced medical entity co-occurrence matrix, |D| is the number of medical entities, e_i is the vector representation of the i-th medical entity, e_j is the vector representation of the j-th medical entity, b_i is the bias parameter of the i-th medical entity, and b_j is the bias parameter of the j-th medical entity.
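The formula for J is not reproduced in this text. As a hedged reconstruction only, the standard GloVe least-squares objective written with the symbols defined above would read as follows. Standard GloVe also multiplies each term by a weighting function f(M_ij); because no such function appears among the symbols listed, it is left out here, and whether the patent uses it is unknown.

```latex
J = \sum_{i,j=1}^{|D|} \left( e_i^{\top} e_j + b_i + b_j - \log M_{ij} \right)^2
```

Minimizing J pushes the inner product of two entity vectors, plus their biases, toward the logarithm of their co-occurrence value, so entities that co-occur often (directly or through a shared ancestor) end up close in the embedding space.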

The aggregate result of the plurality of medical entity vector representations associated with a patient is used as the patient vector representation, where the aggregation is a sum, an average, a maximum, or a minimum.
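A minimal sketch of this aggregation step is shown below. The function name, the `how` parameter, and the two-dimensional sample vectors are illustrative assumptions; real entity vectors would come from the trained GloVe model.

```python
def aggregate(vectors, how="mean"):
    """Combine equal-length vectors element-wise into one patient vector."""
    columns = list(zip(*vectors))
    if how == "sum":
        return [sum(c) for c in columns]
    if how == "mean":
        return [sum(c) / len(c) for c in columns]
    if how == "max":
        return [max(c) for c in columns]
    if how == "min":
        return [min(c) for c in columns]
    raise ValueError(f"unknown aggregation: {how}")


# Hypothetical entity vectors and one patient's associated entities.
entity_vecs = {"I10": [1.0, 2.0], "E11": [3.0, 0.0]}
patient_entities = ["I10", "E11"]

vectors = [entity_vecs[c] for c in patient_entities]
patient_vec = aggregate(vectors, "mean")
```

Because the patient vector is built only from entity vectors, patient nodes and medical entity nodes start in the same vector space when they are placed in the graph.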

Inputting the electronic medical record heterogeneous graph into the GraphSAGE graph neural network to obtain the patient node output vector representation and the medical entity node output vector representation, respectively, includes:

Performing Mean aggregator aggregation on the current-layer neighbor node vector representation and the previous-layer output vector representation of the patient node to obtain the patient node output vector representation, and performing Mean aggregator aggregation on the current-layer neighbor node vector representation and the previous-layer output vector representation of the medical entity node to obtain the medical entity node output vector representation;

wherein the current-layer neighbor node vector representation is as follows:

where r is a true link relationship between medical entities or between a patient and a medical entity, R is the set of true link relationships, u is a neighbor node, v is the current node, N^(r)(v) is the set of neighbor nodes of the current node v under link relationship r, the aggregated quantity is the vector representation of neighbor node u from the previous layer, l is the current layer, and AGGREGATE(·) is the aggregation operation that merges the neighbor information of the current node v;

the medical entity node output vector representation is as follows:

where the input quantity is the vector representation of medical entity node d from the previous layer, W_d is the weight parameter of medical entity node d, MEAN(·) is an averaging function, and σ(·) is an activation function;

the patient node output vector representation is as follows:

where the input quantity is the vector representation of patient node p from the previous layer, and W_p is the weight parameter of patient node p.
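One way to read the update described above is sketched below in plain Python. It is an interpretation under stated assumptions, not the patent's formulas: neighbors are mean-aggregated per relation and the per-relation results are summed, the sum is averaged with the node's own previous-layer vector, and a sigmoid stands in for the unspecified activation σ(·). The helper names and the toy graph are hypothetical; the caller would pass W_p for patient nodes and W_d for medical entity nodes.

```python
import math


def mean(vectors):
    """Element-wise mean of equal-length vectors."""
    return [sum(c) / len(c) for c in zip(*vectors)]


def add(a, b):
    return [x + y for x, y in zip(a, b)]


def matvec(W, x):
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def sage_layer(v, h_prev, neighbors_by_relation, W):
    """One mean-aggregator update for node v (assumed form)."""
    # Aggregate the neighbors of v under each relation, then sum over relations.
    h_neigh = None
    for rel_neighbors in neighbors_by_relation.values():
        nbrs = rel_neighbors.get(v, [])
        if not nbrs:
            continue
        agg = mean([h_prev[u] for u in nbrs])
        h_neigh = agg if h_neigh is None else add(h_neigh, agg)
    # Average the neighbor summary with the node's own previous-layer vector.
    combined = h_prev[v] if h_neigh is None else mean([h_prev[v], h_neigh])
    return [sigmoid(x) for x in matvec(W, combined)]


# Toy heterogeneous graph: one patient, two medical entities.
h_prev = {"p1": [0.0, 0.0], "I10": [2.0, 0.0], "E11": [0.0, 2.0]}
relations = {
    "patient-entity": {"p1": ["I10", "E11"], "I10": ["p1"], "E11": ["p1"]},
    "entity-entity": {"I10": ["E11"], "E11": ["I10"]},
}
identity = [[1.0, 0.0], [0.0, 1.0]]

z_p = sage_layer("p1", h_prev, relations, identity)   # would use W_p
z_d = sage_layer("I10", h_prev, relations, identity)  # would use W_d
```

In a trained model the weight matrices are learned and several such layers are stacked, so each output vector reflects information from multi-hop neighbors.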

The multi-task weighted loss function L is:

where the coefficient of the m-th loss is an exponential of its weight factor η_m, and L_m is the loss value of the m-th loss function.
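The formula for L is not reproduced in this text. One form consistent with the description, in which each loss is scaled by an exponential of a learned weight factor, is the uncertainty-style weighting sketched below. The negative sign in the exponent and the additive η_m term are assumptions borrowed from that common formulation, not details confirmed by the source.

```latex
L = \sum_{m=1}^{2} \left( e^{-\eta_m} L_m + \eta_m \right)
```

Under this assumed form, a larger η_m down-weights the m-th loss, while the additive η_m term keeps the weight factors from growing without bound during training.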

An electronic medical record prediction device based on a graph neural network comprises a computer memory, a computer processor, and a computer program stored in the computer memory and executable on the computer processor, wherein the computer memory adopts the final GraphSAGE graph neural network model;

the computer processor, when executing the computer program, performs the following step:

inputting the medical entity vector representation and the patient vector representation into the final GraphSAGE graph neural network to predict the association probability between the medical entity and the patient.

Compared with the prior art, the invention has the beneficial effects that:

(1) The invention expands the medical entity co-occurrence matrix of the electronic medical record by introducing the co-occurrence information of medical entities in the medical ontology ICD codes. This enriches the relationships between medical entities and between medical entities and patients, so that the subsequently trained graph neural network captures the association between patients and medical entities more accurately.

(2) The invention establishes the link relationships between medical entities through the heterogeneous graph and obtains the link relationships between medical entities and patients through training with the multi-task weighted loss function, so that the association between medical entities and patients can be determined accurately.

Drawings

FIG. 1 is a flowchart of the electronic medical record retrieval method based on a graph neural network according to an embodiment of the present invention.

FIG. 2 is a flowchart of optimizing the graph neural network model with the multi-task weighted loss function according to an embodiment of the present invention.

Detailed Description

The implementation of the knowledge-fused electronic medical record link prediction method based on a graph neural network is described clearly and completely below with reference to the accompanying drawings.

An electronic medical record prediction method based on a graph neural network, as shown in fig. 1, specifically comprises the following steps:

S1: For each visit record of each patient, the product of the occurrence frequencies of every two medical entities is taken as the co-occurrence information of those two medical entities. A co-occurrence matrix is constructed for each visit record from this co-occurrence information, the co-occurrence matrices of the visit records of each patient are added to obtain the co-occurrence matrix of that patient's electronic medical record, and the co-occurrence matrices of the electronic medical records of a plurality of patients are added to obtain the co-occurrence matrix of the medical entities in the electronic medical record.

The co-occurrence information co-occurrence(c_i, c_j, p) of every two medical entities is:

co-occurrence(c_i, c_j, p) = count(c_i, p) × count(c_j, p)

where count(c_i, p) is the number of occurrences of the i-th medical entity in a visit record of the p-th patient, and count(c_j, p) is the number of occurrences of the j-th medical entity in the same visit record.
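The construction in S1 can be sketched in Python as follows. This is a minimal illustration rather than the patented implementation: the function names, the dictionary-based sparse matrix, the sample codes, and the choice to skip pairs of an entity with itself are all assumptions of the example.

```python
from collections import Counter, defaultdict


def visit_cooccurrence(visit):
    """Co-occurrence values for one visit record (a list of entity codes).

    Each pair of distinct entities gets the product of their occurrence
    counts in the visit: count(c_i, p) * count(c_j, p).
    """
    counts = Counter(visit)
    matrix = defaultdict(int)
    for ci, ni in counts.items():
        for cj, nj in counts.items():
            if ci != cj:  # assumption: self-pairs are skipped
                matrix[(ci, cj)] += ni * nj
    return matrix


def corpus_cooccurrence(patients):
    """Sum the per-visit matrices over all visits of all patients."""
    total = defaultdict(int)
    for visits in patients.values():
        for visit in visits:
            for pair, value in visit_cooccurrence(visit).items():
                total[pair] += value
    return total


# Hypothetical records: patient id -> list of visits -> entity codes.
patients = {
    "p1": [["I10", "E11", "E11"], ["I10", "N18"]],
    "p2": [["I10", "E11"]],
}
M = corpus_cooccurrence(patients)
```

For the pair (I10, E11), the first visit of p1 contributes 1 × 2 and the visit of p2 contributes 1 × 1, so the summed entry is 3.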

After the co-occurrence matrix of medical entities in the electronic medical record is obtained, each medical entity is taken as a leaf node and the medical ontology ICD codes are traversed from bottom to top to obtain the ancestor nodes corresponding to the leaf node. The medical entities corresponding to the ancestor nodes are extracted as ancestor medical entities, the co-occurrence information of each medical entity with its ancestor medical entities in the medical ontology ICD codes is obtained, and this co-occurrence information is added to the medical entity co-occurrence matrix, expanding it into the enhanced medical entity co-occurrence matrix. An initial vector representation is set for each medical entity and input into the GloVe model, and each medical entity vector representation is obtained by training with the objective function J defined above.

The aggregate result of the plurality of medical entity vector representations associated with a patient is used as the patient vector representation, where the aggregation is a sum, an average, a maximum, or a minimum.

S2: To construct the electronic medical record heterogeneous graph, each medical entity vector representation is used as the initial input of the corresponding medical entity node and each patient vector representation is used as the initial input of the corresponding patient node; related medical entities are connected to obtain the true link relationships between medical entities, and related medical entities are connected with the patient to obtain the true link relationships between patients and medical entities.

S3: The electronic medical record heterogeneous graph is input into the GraphSAGE graph neural network. Using the true link relationships between medical entities and between patients and medical entities, the current-layer neighbor node vector representation is obtained by summation through GraphSAGE aggregator aggregation, as follows:

where r is a true link relationship between medical entities or between a patient and a medical entity, R is the set of true link relationships, u is a neighbor node, v is the current node, N^(r)(v) is the set of neighbor nodes of the current node v under link relationship r, the aggregated quantity is the vector representation of neighbor node u from the previous layer, l is the current layer, and AGGREGATE(·) is the aggregation operation that merges the neighbor information of the current node v.

Based on the patient node output vector representation and the medical entity node output vector representation, the probability of a link relationship between the medical entity and the patient is obtained with an activation function, as follows:

where z_d is the medical entity node output vector representation, z_p is the patient node output vector representation, and δ(·) is the activation function.

Based on the medical entity node output vector representations, the probability of a link relationship between medical entities is obtained with an activation function, as follows:

where z_d′ is the output vector representation of another medical entity node.
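The link probability formulas are not reproduced in this text. A common choice that matches the description, an activation δ(·) applied to a score computed from two node output vectors, is the sigmoid of their dot product. The sketch below assumes that form; both the dot-product score and the sigmoid are assumptions, and the sample vectors are invented for illustration.

```python
import math


def link_probability(z_a, z_b):
    """Sigmoid of the dot product of two node output vectors (assumed form)."""
    score = sum(x * y for x, y in zip(z_a, z_b))
    return 1.0 / (1.0 + math.exp(-score))


# Hypothetical output vectors for one patient and two medical entities.
z_p = [0.5, -1.0]
z_d = [2.0, 1.0]
z_d_other = [-2.0, -1.0]

p_patient_entity = link_probability(z_p, z_d)        # patient-entity link
p_entity_entity = link_probability(z_d, z_d_other)   # entity-entity link
```

Here the patient and entity vectors are orthogonal, giving a probability of 0.5, while the two entity vectors point in opposite directions, giving a probability close to 0.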

S4: Constructing the total loss function. As shown in FIG. 2, the node information of the heterogeneous graph G is computed with the graph neural network. A first loss function is constructed from the cross entropy between the true patient-medical entity link relationship (1 if a link relationship exists, 0 otherwise) and the patient-medical entity link relationship probability, to train the patient-medical entity link prediction task L₁.

A second loss function is constructed from the cross entropy between the true link relationship between medical entities and the probability of a link relationship between medical entities, to train the medical entity-medical entity link prediction task L₂.

A multi-task weighted loss function is constructed from the loss value of the first loss function and the loss value of the second loss function, and the weight factor η is learned.

Training with the multi-task weighted loss function combines the two loss functions so that they are optimized simultaneously. The multi-task weighted loss function L is:

where the coefficient of the m-th loss is an exponential of its weight factor η_m, and L_m is the loss value of the m-th loss function. If the weight factor η has converged, training ends and the weight factor η is obtained; if it has not converged, the node information of the heterogeneous graph G continues to be computed with the graph neural network.

The GraphSAGE graph neural network is trained with the total loss function, and its parameters are updated to obtain the final GraphSAGE graph neural network.
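The loss construction in S4 can be sketched as follows. The binary cross entropy matches the description of labels that are 1 for an existing link and 0 otherwise. The way the two losses are combined is an assumption: each loss is scaled by exp(-η_m) and η_m is added as a regularizer, a common form of learned multi-task weighting that the source does not confirm. Function names and sample values are hypothetical.

```python
import math


def binary_cross_entropy(labels, probs, eps=1e-12):
    """Mean cross entropy between 0/1 link labels and predicted probabilities."""
    total = 0.0
    for y, p in zip(labels, probs):
        p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / len(labels)


def weighted_total(losses, etas):
    """Assumed combination: sum of exp(-eta_m) * L_m + eta_m over the tasks."""
    return sum(math.exp(-eta) * loss + eta for loss, eta in zip(losses, etas))


# Task 1: patient-entity links; task 2: entity-entity links.
loss_1 = binary_cross_entropy([1, 0], [0.5, 0.5])
loss_2 = binary_cross_entropy([1], [0.5])

# With both weight factors at zero, the total is the plain sum of the losses.
total = weighted_total([loss_1, loss_2], [0.0, 0.0])
```

In training, the weight factors would be learnable parameters updated together with the network weights, which is how the balance between the two tasks is found without manual tuning.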

S5: In application, the medical entity vector representation and the patient vector representation are input into the final GraphSAGE graph neural network to predict the association probability between the medical entity and the patient.

This method expands the range of relationships between medical entities and patients, so that the degree of association between medical entities and patients can be predicted accurately.

Claims

1. The electronic medical record retrieval method based on the graph neural network is characterized by comprising the following steps of:

the method comprises the steps of using each medical entity vector as an initial attribute of each medical entity node, using each patient vector as an initial attribute of each patient node, connecting related medical entities to obtain a real link relationship between medical entities, and connecting the related medical entities with a patient to obtain a real link relationship between the patient and the medical entities;

(3) Inputting the electronic medical record heterogeneous graph into a GraphSAGE graph neural network to obtain a patient node output vector representation and a medical entity node output vector representation, respectively; obtaining the probability of a link relationship between the patient and the medical entity using an activation function based on the patient node output vector representation and the medical entity node output vector representation; obtaining the probability of a link relationship between medical entities using an activation function based on the medical entity node output vector representations;

(6) In application, inputting the medical entity vector representation and the patient vector representation into the final GraphSAGE graph neural network to predict the association probability between the medical entity and the patient;

taking the frequency product of each patient in each visit record of every two medical entities as the co-occurrence information of every two medical entities, constructing a co-occurrence matrix of each visit record based on the co-occurrence information of every two medical entities, adding the co-occurrence matrices of the multiple visit records of each patient to obtain a co-occurrence matrix of each patient electronic medical record, and adding the co-occurrence matrices of a plurality of patient electronic medical records to obtain a co-occurrence matrix of the medical entities in the electronic medical record;

taking each medical entity as a leaf node, traversing the medical ontology ICD codes from bottom to top to obtain a plurality of ancestor nodes corresponding to the leaf nodes, extracting the medical entities corresponding to the ancestor nodes to obtain ancestor medical entities, obtaining the co-occurrence information of each medical entity and the ancestor medical entities in the medical ontology ICD codes, and adding the co-occurrence information into a medical entity co-occurrence matrix to expand the medical entity co-occurrence matrix;

Wherein M_ij is the co-occurrence value of the i-th and j-th medical entities in the enhanced medical entity co-occurrence matrix, |D| is the number of medical entities, e_i is the vector representation of the i-th medical entity, e_j is the vector representation of the j-th medical entity, b_i is the bias parameter of the i-th medical entity, and b_j is the bias parameter of the j-th medical entity;

Wherein r is a true link relationship between medical entities or between a patient and a medical entity, R is the set of true link relationships, u is a neighbor node, v is the current node, N^(r)(v) is the set of neighbor nodes of the current node v under link relationship r, the aggregated quantity is the vector representation of neighbor node u from the previous layer, l is the current layer, and AGGREGATE(·) is the aggregation operation that merges the neighbor information of the current node v;

2. The electronic medical record retrieval method based on a graph neural network of claim 1, wherein the aggregate result of the plurality of medical entity vector representations associated with the patient is represented as a patient vector, and the aggregate operation includes summing, averaging, maximum or minimum.

3. The electronic medical record retrieval method based on a graph neural network according to claim 1, wherein the multitasking weighted loss function L is:

Wherein the coefficient of the m-th loss is an exponential of its weight factor η_m, and L_m is the loss value of the m-th loss function.

4. An electronic medical record retrieval device based on a graph neural network, comprising a computer memory, a computer processor and a computer program stored in the computer memory and executable on the computer processor, wherein the computer memory adopts the final GraphSAGE graph neural network model as claimed in any one of claims 1 to 3;