CN115841861A - Similar medical record recommendation method and system - Google Patents
Similar medical record recommendation method and system
- Publication number
- CN115841861A (application CN202211268066.XA)
- Authority
- CN
- China
- Prior art keywords
- medical record
- electronic medical
- entity
- unit
- disease
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02A—TECHNOLOGIES FOR ADAPTATION TO CLIMATE CHANGE
- Y02A90/00—Technologies having an indirect contribution to adaptation to climate change
- Y02A90/10—Information and communication technologies [ICT] supporting adaptation to climate change, e.g. for weather forecasting or climate simulation
Landscapes
- Medical Treatment And Welfare Office Work (AREA)
Abstract
The invention provides a similar medical record recommendation method and system. The system comprises a medical record data preprocessing module, a disease knowledge graph construction module, a knowledge representation module and a similar medical record recommendation module. The medical record data preprocessing module acquires and preprocesses electronic medical records and extracts their key contents in order to construct a knowledge graph of the electronic medical records. The disease knowledge graph construction module comprises a mode layer construction unit, a data pre-labeling unit, an entity relation extraction unit, a knowledge graph representation unit and a knowledge graph storage unit. The knowledge representation module comprises a knowledge representation learning unit and a medical record representation unit; it learns representations of the entities from the knowledge graph and then represents each whole electronic medical record by the entities it contains. The similar medical record recommendation module comprises a medical record similarity calculation unit and a similar medical record recommendation unit; it calculates the similarity of any two electronic medical records from their knowledge graph representations and ranks the records in the electronic medical record database by similarity so as to select and recommend the electronic medical records most similar to a given electronic medical record.
Description
Technical Field
The invention relates to the technical field of digital medical treatment, in particular to a method and a system for recommending similar medical records.
Background
With the development of medical informatization, electronic medical records have gradually replaced handwritten medical records and have accumulated a large amount of structured and unstructured data from the patient treatment process, making them an important medical information resource. An electronic medical record documents a patient's entire course of treatment, preserving the patient's basic information, onset of illness, treatment plan and so on, and has a complex language structure and rich semantic knowledge.
A knowledge graph can effectively organize data and the relationships between data. Through steps such as entity and relation extraction and knowledge representation, entities such as diseases and symptoms, together with their associations, are identified from complex data to construct a medical knowledge graph.
On this basis, similar medical records can be retrieved to assist doctors in diagnosis and decision making. This improves the utilization of medical data, mines the data value of electronic medical records, and supplies the diagnosis and treatment experience and related statistical characteristics of similar patients as auxiliary support for the doctor's diagnosis and treatment process.
Disclosure of Invention
Based on the above, the invention aims to provide a similar medical record recommendation method and system, which are used for improving the utilization of medical data, mining the data value of electronic medical records, providing diagnosis and treatment experiences and related characteristic statistical information of similar patients and providing auxiliary support for the diagnosis and treatment process of doctors.
In one aspect, the invention provides a similar medical record recommendation system, comprising:
the medical record data preprocessing module is used for acquiring the electronic medical record, preprocessing the electronic medical record, extracting key contents in the electronic medical record, representing the electronic medical record according to the preprocessed electronic medical record and the extracted key contents and constructing a knowledge graph of the electronic medical record;
the disease knowledge graph construction module comprises a mode layer construction unit, a data pre-labeling unit, an entity relation extraction unit, a knowledge graph representation unit and a knowledge graph storage unit; the mode layer construction unit is used for constructing the knowledge graph of diseases in a top-down manner, which requires defining the ontology and relations related to diseases; the data pre-labeling unit is used for manually labeling entities and relations in the data so that an entity and relation extraction model can be trained; the entity relation extraction unit is used for automatically extracting entities and relations from the medical record data using the model, the extracted contents comprising the entities and relations defined by the mode layer; the knowledge graph representation unit represents the extracted knowledge and entities as RDF triples; the knowledge graph storage unit stores the knowledge graph triples in a graph database, and a disease knowledge graph pattern library framework is obtained by defining and analyzing the ontology and relations;
the knowledge representation module comprises a knowledge representation learning unit and a medical record representation unit; learning the representation of the entity through the knowledge graph, and further representing the whole electronic medical record by using the entity in the medical record;
the similar medical record recommendation module comprises a medical record similarity calculation unit and a similar medical record recommendation unit; the similarity of any two electronic medical records is calculated using their knowledge graph representations, and the records in the electronic medical record database are ranked by similarity so as to select and recommend the electronic medical records most similar to a given electronic medical record, wherein the electronic medical record database comprises a hospital management information system.
In addition, the similar medical record recommendation system according to the present invention may further have the following additional technical features:
Further,
the mode layer construction unit is used for constructing an ontology relationship diagram, wherein the ontology relationship diagram classifies the entities in the electronic medical record into five types and the relations among the entities into fifteen types;
the data pre-labeling unit is used for labeling data for the entity extraction task using the BIO labeling method, wherein B is Begin and represents the beginning of an entity; I is Internal and represents the remainder of the entity; O is Other and represents a non-entity;
the entity relation extraction unit is used for extracting entities and relations using a BERT + BiLSTM + CRF neural network structure, wherein the BERT layer uses the BERT-Base-Chinese model; the intermediate layer uses a BiLSTM, which takes the vector sequence as input and computes and outputs its scores; the output layer uses a CRF model, and after obtaining the score matrix the CRF layer calculates the score of each label and outputs the label with the highest score as the prediction result;
the knowledge graph representation unit is used for describing the relations between entities in the extracted knowledge as RDF (Resource Description Framework) triples of the form <entity 1, relation, entity 2>; the graph database Neo4j is used for storing the knowledge graph, and Neo4j manages data by nodes and edges, wherein nodes represent entities and edges represent relations.
Further,
the knowledge representation learning unit represents the knowledge graph using the TransX series of models (TransE, TransH, TransR and TransD);
and the medical record representation unit is used for obtaining the vector representations of the entities and relations in the knowledge graph through knowledge representation learning, and representing the electronic medical record by the entity and relation vectors.
Further, the step of representing by using the TransX series of models (TransE, TransH, TransR, TransD) specifically comprises:
given a triplet (h, r, t), the relation r is defined as a translation vector; when the triplet (h, r, t) holds, the tail entity vector is close to the sum of the head entity vector and the relation vector, and when the triplet does not hold, the tail entity vector is far from that sum; the knowledge representation is learned by optimizing the scoring function $f_r(h, t)$.
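For orientation, the scoring function and training objective of TransE in their commonly published form are written out below. The text names $f_r(h, t)$ without giving its expression, so the choice of the L2 norm, the margin $\gamma$, the set $S$ of true triplets and the set $S'$ of corrupted triplets are assumptions taken from the standard TransE formulation rather than from this document.

```latex
f_r(h, t) = \lVert \mathbf{h} + \mathbf{r} - \mathbf{t} \rVert_{2},
\qquad
\mathcal{L} = \sum_{(h, r, t) \in S} \; \sum_{(h', r, t') \in S'}
\max\bigl(0,\; \gamma + f_r(h, t) - f_r(h', t')\bigr)
```

Under this form a true triplet receives a small score and a corrupted triplet a large one, which matches the "closer" and "farther" behaviour described above.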
Further, the step of representing the electronic medical record by the entity and relation vectors specifically comprises:
let electronic medical records A and B be $A = (a_1, a_2, \ldots, a_n)$ and $B = (b_1, b_2, \ldots, b_m)$, where each $a_i$ and $b_j$ is the vector of an entity or relation, and n and m are the total numbers of entities and relations in medical record A and medical record B respectively; the similarity of the terms is defined using weight coefficients $w_i$ and $w_j$ (the defining formula is missing from this text); each electronic medical record can then be represented as a vector of term similarities, $A = [s_{11}, s_{12}, \ldots, s_{nm}]$ and $B = [s_{11}, s_{12}, \ldots, s_{nm}]$, which gives the vectorized representations of electronic medical records A and B.
Further,
the medical record similarity calculation unit calculates the similarity of medical records by cosine similarity: given the document-level vectors of electronic medical records A and B represented by the entity and relation vectors, $A = [s_{11}, s_{12}, \ldots, s_{nm}]$ and $B = [s_{11}, s_{12}, \ldots, s_{nm}]$, their similarity is calculated as the cosine similarity of the two vectors;
the similar medical record recommendation unit is a Top-K electronic medical record recommendation unit: because the number of medical records in the electronic medical record set is large, when a new medical record is input to search for similar medical records it is not necessary to display all medical records and their similarities; only the K most similar electronic medical records, that is, those with the greatest similarity, are displayed, where K is a positive integer.
Further, the electronic medical records with the greatest similarity are selected from the electronic medical record database by a heap sort algorithm.
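The cosine similarity and Top-K selection described above can be sketched as follows. This is a minimal illustration, not the patented implementation: the record identifiers and vectors are invented, and `heapq.nlargest` (heap-based selection from the Python standard library) stands in for the heap sort mentioned in the text.

```python
import heapq
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors; 0.0 if either is all zeros."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k_similar(query, records, k):
    """Return the k (record_id, similarity) pairs most similar to query, best first.

    records maps a (hypothetical) record id to its vector. heapq.nlargest keeps
    a heap of at most k items, so the whole database is never fully sorted.
    """
    if k <= 0:
        raise ValueError("k must be a positive integer")
    scored = ((rid, cosine_similarity(query, vec)) for rid, vec in records.items())
    return heapq.nlargest(k, scored, key=lambda pair: pair[1])


# Invented example vectors standing in for medical record representations.
records = {
    "emr-001": [1.0, 0.0, 1.0],
    "emr-002": [0.0, 1.0, 0.0],
    "emr-003": [1.0, 0.1, 0.9],
}
top = top_k_similar([1.0, 0.0, 1.0], records, k=2)
```

Selecting with a bounded heap costs roughly O(N log K) for N records, which is the practical reason to prefer it over sorting every similarity when K is small.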
Further, five types of disease-related entities and fifteen types of relations among them are defined in the mode layer construction unit, wherein the entities are body part (BODY), symptom and sign (SIGNS), disease and diagnosis (DISEASE), examination and test (CHECK), and treatment (TREATMENT);
the relations are:
<symptom and sign, s_locate_b (symptom located at body part), body part>;
<symptom and sign, s_index_d (symptom indicates disease), disease and diagnosis>;
<symptom and sign, s_administration_s (multiple symptoms appear together), symptom and sign>;
<symptom and sign, c_need_s (symptom needs to be examined), examination and test>;
<disease and diagnosis, d_result_s (disease causes symptom), symptom and sign>;
<disease and diagnosis, d_locate_b (disease located at body part), body part>;
<disease and diagnosis, d_approval_d (multiple diseases occur concurrently), disease and diagnosis>;
<disease and diagnosis, d_for_c (examination to confirm disease), examination and test>;
<examination and test, c_find_s (examination finds symptom), symptom and sign>;
<examination and test, c_confirm_d (examination confirms disease), disease and diagnosis>;
<examination and test, c_locate_b (examination located at body part), body part>;
<examination and test, c_acompany_c (different examinations done simultaneously), examination and test>;
<treatment, t_acompany_t (multiple treatment modalities), treatment>;
<treatment, t_act_s (treatment acts on symptom), symptom and sign>;
<treatment, t_act_d (treatment acts on disease), disease and diagnosis>.
In another aspect, the present application provides a similar medical record recommendation method, which is applied to the similar medical record recommendation system described above, and the method includes:
acquiring a historical electronic medical record, extracting key contents of the historical electronic medical record, preprocessing the historical electronic medical record, and constructing a historical electronic medical record knowledge graph according to the preprocessed historical electronic medical record;
defining an ontology and relations related to diseases according to the historical electronic medical record knowledge graph, and labeling entities and relations according to the defined ontology and relations, wherein the entities include body part, symptom and sign, and disease and diagnosis, and the relations include <symptom and sign, s_locate_b (symptom located at body part), body part>; and learning the representation of the entities through the historical electronic medical record knowledge graph, so as to represent the whole electronic medical record by the entities in it;
acquiring a target electronic medical record, extracting the entities and relations in the target electronic medical record with reference to the labeled entities and relations to construct a target knowledge graph, representing the extracted knowledge and entities as RDF (Resource Description Framework) triples, storing the knowledge graph triples in a target knowledge graph database, obtaining a disease knowledge graph pattern library framework by defining and analyzing the ontology and relations, and vectorizing the knowledge graph through knowledge representation learning;
and obtaining a document-level vector representation of the target electronic medical record from the vector representations of the entities and relations, calculating the similarity between each electronic medical record in the electronic medical record database and the target electronic medical record from that document-level vector representation, and sorting the obtained similarities to obtain and recommend the most similar electronic medical record, wherein the electronic medical record database comprises a hospital management information system.
According to the similar medical record recommendation method and system, the medical record data preprocessing module preprocesses the electronic medical records, extracts their key contents, represents the electronic medical records according to the preprocessed records and the extracted key contents, and constructs a knowledge graph of the electronic medical records; the disease knowledge graph construction module extracts knowledge and entities and represents them, the knowledge graph storage unit stores the knowledge graph triples in a graph database, and a disease knowledge graph pattern library framework is obtained by defining and analyzing the ontology and relations; the knowledge representation learning unit in the knowledge representation module learns the representations of the entities, and the medical record representation unit then represents each whole electronic medical record; finally, the medical record similarity calculation unit in the similar medical record recommendation module calculates the similarity of any two electronic medical records, and the records in the electronic medical record database are ranked by similarity so as to select and recommend the electronic medical records most similar to a given electronic medical record.
Drawings
FIG. 1 is a flowchart illustrating a method for recommending similar medical records according to a first embodiment of the present invention;
FIG. 2 is a system block diagram of a similar medical record recommendation system according to a second embodiment of the present invention;
The following detailed description further illustrates the invention in conjunction with the above figures.
Detailed Description
To facilitate an understanding of the invention, the invention will now be described more fully with reference to the accompanying drawings.
Several embodiments of the invention are presented in the drawings. This invention may, however, be embodied in many different forms and should not be construed as limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete.
Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs. The terminology used in the description of the invention herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the invention. As used herein, the term "and/or" includes any and all combinations of one or more of the associated listed items.
Chinese electronic medical records have a complex language structure and rich semantic knowledge, which poses challenges for text mining and processing, and processing schemes for Chinese electronic medical records are an active research direction. A knowledge graph can effectively organize data and the relationships between data. Using knowledge representation learning, the knowledge graph is mapped into low-dimensional vectors, and the entities and relations in the text of an electronic medical record are vectorized to obtain the vector representation of a single electronic medical record. The similarity between electronic medical records is then calculated by cosine similarity and similar medical records are recommended for a given record, which facilitates querying similar patients and assists doctors in diagnosis.
Example one
Referring to fig. 1, a method for recommending similar medical records according to a first embodiment of the present invention is shown, and the method includes steps S101 to S104:
S101, acquiring a historical electronic medical record, extracting the key contents of the historical electronic medical record, preprocessing the historical electronic medical record, and constructing a historical electronic medical record knowledge graph according to the preprocessed historical electronic medical record.
S102, defining an ontology and relations related to diseases according to the historical electronic medical record knowledge graph, and labeling entities and relations according to the defined ontology and relations, wherein the entities include body part, symptom and sign, and disease and diagnosis, and the relations include <symptom and sign, s_locate_b (symptom located at body part), body part>; the representation of the entities is learned through the historical electronic medical record knowledge graph, so as to represent the whole electronic medical record by the entities in it.
S103, acquiring a target electronic medical record, extracting the entities and relations in the target electronic medical record with reference to the labeled entities and relations to construct a target knowledge graph, representing the extracted knowledge and entities as RDF (Resource Description Framework) triples, storing the knowledge graph triples in a target knowledge graph database, obtaining a disease knowledge graph pattern library framework by defining and analyzing the ontology and relations, and vectorizing the knowledge graph through knowledge representation learning.
S104, obtaining a document-level vector representation of the target electronic medical record from the vector representations of the entities and relations, calculating the similarity between each electronic medical record in the electronic medical record database and the target electronic medical record from that document-level vector representation, and sorting the obtained similarities to obtain and recommend the most similar electronic medical record, wherein the electronic medical record database comprises a hospital management information system.
In summary, in the similar medical record recommendation method of the above embodiment of the invention, the medical record data preprocessing module preprocesses the electronic medical records, extracts their key contents, represents the electronic medical records according to the preprocessed records and the extracted key contents, and constructs a knowledge graph of the electronic medical records; the disease knowledge graph construction module extracts knowledge and entities and represents them, the knowledge graph storage unit stores the knowledge graph triples in a graph database, and a disease knowledge graph pattern library framework is obtained by defining and analyzing the ontology and relations; the knowledge representation learning unit in the knowledge representation module learns the representations of the entities, and the medical record representation unit then represents each whole electronic medical record; finally, the medical record similarity calculation unit in the similar medical record recommendation module calculates the similarity of any two electronic medical records, and the records in the electronic medical record database are ranked by similarity so as to select and recommend the electronic medical records most similar to a given electronic medical record.
Example two
Referring to fig. 2, a similar medical record recommendation system in a second embodiment of the invention is shown, which includes:
the medical record data preprocessing module is used for acquiring the electronic medical record, preprocessing the electronic medical record, extracting key contents in the electronic medical record, representing the electronic medical record according to the preprocessed electronic medical record and the extracted key contents and constructing a knowledge graph of the electronic medical record.
The structure of an electronic medical record differs greatly from that of general text. A medical record is divided into modules with different contents, such as chief complaint, past history, examination and diagnosis; it is usually filled in chronologically from admission to discharge, so the data have a strict sequential relationship. A complete medical record mainly comprises the patient's basic information, admission record, disease course record, examinations and tests, medical orders and so on, most of which are stored in text form. The admission record describes the patient's condition at the time of admission, including personal history, past history, family history, history of present illness, physical examination and so on. The disease course record documents the examination results, treatment process and changes in condition throughout the hospitalization, and mainly comprises the first disease course record, daily disease course records, records of ward rounds by superior physicians and so on. The examination and test record documents the patient's examination and test items and their results. Data preprocessing is mainly used to eliminate noise, incompleteness, redundancy and inconsistency in the data. In this embodiment, preprocessing covers four aspects: data cleaning, data integration, data transformation and data reduction.
Specifically, some data in electronic medical records are incomplete, noisy or inconsistent. Data cleaning fills in missing values, identifies illegal values and corrects inconsistent data. Data integration merges data from different two-dimensional tables into a single two-dimensional table. Data transformation converts the type or value range of the data into a form suitable for mining; here it mainly converts the electronic medical record into text data suitable for extraction. Each table of the electronic medical record contains many attributes, many of which are irrelevant or redundant for the mining task; data reduction reduces the amount of data by deleting irrelevant attributes (or dimensions).
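The four preprocessing aspects can be sketched on plain Python dictionaries as below. This is only an illustration under stated assumptions: the field names (`patient_id`, `chief_complaint`, `age`, `bed`, `diagnosis`), the default values and the legal age range are all hypothetical and do not come from the text.

```python
def clean(row, defaults, valid_ranges):
    """Data cleaning: fill missing values and blank out values outside their legal range."""
    cleaned = dict(row)
    for field, default in defaults.items():
        if cleaned.get(field) in (None, ""):
            cleaned[field] = default
    for field, (low, high) in valid_ranges.items():
        value = cleaned.get(field)
        if isinstance(value, (int, float)) and not (low <= value <= high):
            cleaned[field] = None  # illegal value: left empty for later review
    return cleaned


def integrate(*tables, key="patient_id"):
    """Data integration: merge rows from several two-dimensional tables into one row per key."""
    merged = {}
    for table in tables:
        for row in table:
            merged.setdefault(row[key], {}).update(row)
    return list(merged.values())


def reduce_attributes(row, keep):
    """Data reduction: drop attributes that are irrelevant to the mining task."""
    return {field: row[field] for field in keep if field in row}


def to_text(row, order):
    """Data transformation: flatten a row into text suitable for extraction."""
    return "; ".join(
        f"{field}: {row[field]}" for field in order if row.get(field) is not None
    )


# Invented sample tables.
admissions = [{"patient_id": 1, "chief_complaint": "chest pain for 3 days", "age": 430, "bed": "12A"}]
diagnoses = [{"patient_id": 1, "diagnosis": ""}]

rows = integrate(admissions, diagnoses)
rows = [clean(r, {"diagnosis": "unknown"}, {"age": (0, 130)}) for r in rows]
rows = [reduce_attributes(r, ["patient_id", "chief_complaint", "age", "diagnosis"]) for r in rows]
texts = [to_text(r, ["chief_complaint", "age", "diagnosis"]) for r in rows]
```

In this sample the impossible age is blanked out, the empty diagnosis is filled with a default, the bed number is dropped as irrelevant, and the remaining fields are flattened to one line of text.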
The disease knowledge graph construction module comprises a mode layer construction unit, a data pre-labeling unit, an entity relation extraction unit, a knowledge graph representation unit and a knowledge graph storage unit. The mode layer construction unit constructs the knowledge graph of diseases in a top-down manner, which requires defining the ontology and relations related to diseases; the data pre-labeling unit manually labels entities and relations in the data so that an entity and relation extraction model can be trained; the entity relation extraction unit automatically extracts entities and relations from the medical record data using the model, the extracted contents comprising the entities and relations defined by the mode layer; the knowledge graph representation unit represents the extracted knowledge and entities as RDF triples; the knowledge graph storage unit stores the knowledge graph triples in the graph database, and a disease knowledge graph pattern library framework is obtained by defining and analyzing the ontology and relations.
Specifically, the mode layer construction unit constructs an ontology relationship diagram that classifies the entities in the electronic medical record into five types and the relations among those entities into fifteen types.
The entities are body part (BODY), symptom and sign (SIGNS), disease and diagnosis (DISEASE), examination and test (CHECK), and treatment (TREATMENT);
the relations are:
<symptom and sign, s_locate_b (symptom located at body part), body part>;
<symptom and sign, s_indicator_d (symptom indicates disease), disease and diagnosis>;
<symptom and sign, s_administration_s (multiple symptoms appear together), symptom and sign>;
<symptom and sign, c_need_s (symptom needs to be examined), examination and test>;
<disease and diagnosis, d_result_s (disease causes symptom), symptom and sign>;
<disease and diagnosis, d_locate_b (disease located at body part), body part>;
<disease and diagnosis, d_registration_d (multiple diseases occur concurrently), disease and diagnosis>;
<disease and diagnosis, d_for_c (examination to confirm disease), examination and test>;
<examination and test, c_find_s (examination finds symptom), symptom and sign>;
<examination and test, c_confirm_d (examination confirms disease), disease and diagnosis>;
<examination and test, c_locate_b (examination located at body part), body part>;
<examination and test, c_acompany_c (different examinations done simultaneously), examination and test>;
<treatment, t_acompany_t (multiple treatment modalities), treatment>;
<treatment, t_act_s (treatment acts on symptom), symptom and sign>;
<treatment, t_act_d (treatment acts on disease), disease and diagnosis>.
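One way to hold the ontology above in code is a lookup table from relation name to its permitted head and tail entity types, which can then reject extracted triples that do not fit. The sketch below is an assumption about how such a check might look: the relation names are copied from the list above with the extraction spacing removed, and the function name `is_valid_triple` is hypothetical.

```python
# The five entity types, using the abbreviations given in the text.
ENTITY_TYPES = {"BODY", "SIGNS", "DISEASE", "CHECK", "TREATMENT"}

# relation name -> (head entity type, tail entity type), in the order listed in the text.
RELATIONS = {
    "s_locate_b": ("SIGNS", "BODY"),
    "s_indicator_d": ("SIGNS", "DISEASE"),
    "s_administration_s": ("SIGNS", "SIGNS"),
    "c_need_s": ("SIGNS", "CHECK"),
    "d_result_s": ("DISEASE", "SIGNS"),
    "d_locate_b": ("DISEASE", "BODY"),
    "d_registration_d": ("DISEASE", "DISEASE"),
    "d_for_c": ("DISEASE", "CHECK"),
    "c_find_s": ("CHECK", "SIGNS"),
    "c_confirm_d": ("CHECK", "DISEASE"),
    "c_locate_b": ("CHECK", "BODY"),
    "c_acompany_c": ("CHECK", "CHECK"),
    "t_acompany_t": ("TREATMENT", "TREATMENT"),
    "t_act_s": ("TREATMENT", "SIGNS"),
    "t_act_d": ("TREATMENT", "DISEASE"),
}


def is_valid_triple(head_type, relation, tail_type):
    """True if the relation is defined and connects exactly these entity types."""
    return RELATIONS.get(relation) == (head_type, tail_type)


ok = is_valid_triple("SIGNS", "s_locate_b", "BODY")
reversed_ok = is_valid_triple("BODY", "s_locate_b", "SIGNS")
```

Checking extracted triples against the table keeps the stored graph consistent with the mode layer, since a relation with swapped head and tail types is rejected.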
The data pre-labeling unit labels data for the entity extraction task using the BIO labeling method, wherein B is Begin and represents the beginning of an entity; I is Internal and represents the remainder of the entity; O is Other and represents a non-entity. The labels take the form X-B, X-I and O, where X is one of BODY (body part), SIGNS (symptom and sign), DISEASE (disease and diagnosis), CHECK (examination and test) and TREATMENT (treatment).
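A short sketch of BIO labeling in the X-B / X-I / O format described above follows. The token list, the span format `(start, end, entity_type)` with an exclusive end index, and the function name `bio_tags` are assumptions made for illustration; real Chinese medical text would typically be labeled character by character.

```python
def bio_tags(tokens, spans):
    """Label each token with X-B, X-I or O.

    spans is a list of (start, end, entity_type) with end exclusive.
    """
    tags = ["O"] * len(tokens)
    for start, end, entity_type in spans:
        if not (0 <= start < end <= len(tokens)):
            raise ValueError(f"span out of range: {(start, end)}")
        if any(tag != "O" for tag in tags[start:end]):
            raise ValueError(f"overlapping span: {(start, end)}")
        tags[start] = f"{entity_type}-B"  # first token of the entity
        for i in range(start + 1, end):
            tags[i] = f"{entity_type}-I"  # remaining tokens of the entity
    return tags


# Invented example sentence.
tokens = ["chest", "pain", "caused", "by", "pneumonia"]
tags = bio_tags(tokens, [(0, 2, "SIGNS"), (4, 5, "DISEASE")])
```

Here "chest pain" is labeled as one two-token SIGNS entity, "pneumonia" as a one-token DISEASE entity, and the connecting words as O.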
The entity relation extraction unit extracts entities and relations using a BERT + BiLSTM + CRF neural network structure. The BERT layer uses the BERT-Base-Chinese model, which has 12 Transformer layers, a hidden size of 768, 12 self-attention heads and about 110M parameters in total. BERT is followed by a BiLSTM layer, which takes the vector sequence as input and computes and outputs its scores; the maximum sequence length is set to 300, the batch size to 64, the number of epochs to 100, dropout to 0.5 and the hidden layer dimension to 128. The BiLSTM is followed by a CRF layer, which takes the score matrix, calculates the score of each label and outputs the label with the highest score as the prediction result.
The automatic extraction model is trained on the manually pre-labeled data produced in the previous step, and the trained model is then used to automatically extract entities and relations from the data.
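The role of the CRF output layer can be illustrated with Viterbi decoding, the standard way to find the highest-scoring label sequence from a score matrix plus label transition scores. The sketch is pure Python and deliberately omits BERT and the BiLSTM: the emission scores that the BiLSTM would produce, the transition scores and the three-label set are all invented for the example.

```python
def viterbi_decode(emissions, transitions, labels):
    """Return the highest-scoring label sequence.

    emissions[t][j]: score of label j at position t (the score matrix).
    transitions[i][j]: score of moving from label i to label j.
    """
    n = len(labels)
    best = list(emissions[0])  # best path score ending in each label so far
    back = []                  # back[t-1][j]: best previous label for label j at position t
    for t in range(1, len(emissions)):
        pointers, scores = [], []
        for j in range(n):
            i_best = max(range(n), key=lambda i: best[i] + transitions[i][j])
            pointers.append(i_best)
            scores.append(best[i_best] + transitions[i_best][j] + emissions[t][j])
        back.append(pointers)
        best = scores
    last = max(range(n), key=lambda j: best[j])
    path = [last]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    path.reverse()
    return [labels[j] for j in path]


labels = ["O", "DISEASE-B", "DISEASE-I"]
# Invented scores: moving from O straight to DISEASE-I is heavily penalized.
transitions = [
    [0.0, 0.0, -10.0],  # from O
    [0.0, -1.0, 1.0],   # from DISEASE-B
    [0.0, 0.0, 0.5],    # from DISEASE-I
]
emissions = [
    [2.0, 0.5, 0.0],
    [0.2, 1.0, 1.2],
    [0.1, 0.2, 2.0],
]
decoded = viterbi_decode(emissions, transitions, labels)
```

Picking the best label independently at each position would give O, DISEASE-I, DISEASE-I, an entity with no beginning; the transition scores steer the decoder to O, DISEASE-B, DISEASE-I instead, which is the kind of constraint a CRF layer adds on top of per-position scores.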
The knowledge graph representation unit describes the relations between entities in the extracted knowledge as RDF (Resource Description Framework) triples of the form <entity 1, relation, entity 2>. The graph database Neo4j is used to store the knowledge graph; Neo4j manages data by nodes and edges, wherein nodes represent entities and edges represent relations.
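As a hedged sketch of how one <entity 1, relation, entity 2> triple might be turned into a Neo4j write, the function below builds a Cypher `MERGE` statement with query parameters. It only constructs the query text; executing it would require a Neo4j driver and a running database, which are not shown. The node property `name`, the use of entity types as node labels and the function name `cypher_for_triple` are assumptions, not details from the text.

```python
def cypher_for_triple(head, relation, tail):
    """Build a Cypher MERGE statement and its parameters for one triple.

    head and tail are (entity_type, name) pairs. Cypher does not allow labels
    or relationship types to be passed as parameters, so they are validated
    as plain identifiers before being placed into the query text.
    """
    (head_type, head_name), (tail_type, tail_name) = head, tail
    for identifier in (head_type, relation, tail_type):
        if not identifier.isidentifier():
            raise ValueError(f"unsafe identifier: {identifier!r}")
    query = (
        f"MERGE (h:{head_type} {{name: $head}}) "
        f"MERGE (t:{tail_type} {{name: $tail}}) "
        f"MERGE (h)-[:{relation}]->(t)"
    )
    return query, {"head": head_name, "tail": tail_name}


# Invented example triple.
query, params = cypher_for_triple(("SIGNS", "chest pain"), "s_locate_b", ("BODY", "chest"))
```

Using `MERGE` rather than `CREATE` means that an entity mentioned in many medical records ends up as a single node, so records that share entities become connected through the graph.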
The knowledge representation module comprises a knowledge representation learning unit and a medical record representation unit. Entity representations are learned from the knowledge graph, and the entities in a medical record are then used to represent the whole electronic medical record. The knowledge graph is vectorized with knowledge representation learning techniques to obtain vector representations of the entities and relations.
The knowledge representation learning unit represents the knowledge graph using the TransX series of models (TransE, TransH, TransR, and TransD). Specifically, the TransE model represents entities and relations in the same vector space. Given a triple (h, r, t), where h is the head entity, r is the relation, and t is the tail entity, the relation r is defined as a translation vector. When the triple (h, r, t) holds, h + r ≈ t, that is, the sum of the head entity vector and the relation vector is close to the tail entity vector; when the triple does not hold, the tail entity vector is far from the sum of the head entity vector and the relation vector. Knowledge representation learning is performed by optimizing the scoring function $f_r(h, t)$. The implementation has three parts: the first initializes the head entities, tail entities, and relations with random values and normalizes them; the second samples some of the triples and replaces the head or tail entity to form negative examples; the third optimizes the objective function to obtain the optimized vector representations. Through this knowledge representation learning model, vector representations of all entities and relations in the knowledge graph are obtained.
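The three parts described above can be sketched in plain Python as follows. The L2 distance used for the score and the margin-based ranking loss are the usual TransE formulation and are assumed here, since the text does not give the exact form of $f_r(h, t)$ or the objective. All function names are illustrative, and the gradient update itself is omitted.

```python
import math
import random
from typing import Dict, List, Sequence, Tuple

Vector = List[float]
Triple = Tuple[str, str, str]  # (head, relation, tail)


def normalize(v: Sequence[float]) -> Vector:
    """Scale a vector to unit L2 norm (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]


def init_embeddings(names: Sequence[str], dim: int,
                    rng: random.Random) -> Dict[str, Vector]:
    """Part 1: random initialisation followed by normalisation."""
    return {n: normalize([rng.uniform(-1.0, 1.0) for _ in range(dim)])
            for n in names}


def score(h: Vector, r: Vector, t: Vector) -> float:
    """Assumed f_r(h, t) = ||h + r - t||_2; smaller means more plausible."""
    return math.sqrt(sum((hi + ri - ti) ** 2 for hi, ri, ti in zip(h, r, t)))


def corrupt(triple: Triple, entities: Sequence[str],
            rng: random.Random) -> Triple:
    """Part 2: replace the head or the tail to form a negative example.

    Requires at least two distinct entities.
    """
    h, r, t = triple
    if rng.random() < 0.5:
        return (rng.choice([e for e in entities if e != h]), r, t)
    return (h, r, rng.choice([e for e in entities if e != t]))


def margin_loss(pos_score: float, neg_score: float,
                margin: float = 1.0) -> float:
    """Part 3 objective term: push true triples below corrupted ones."""
    return max(0.0, margin + pos_score - neg_score)


rng = random.Random(0)
entities = ["headache", "head", "migraine"]
emb = init_embeddings(entities + ["s_locate_b"], dim=4, rng=rng)
pos = ("headache", "s_locate_b", "head")
neg = corrupt(pos, entities, rng)
loss = margin_loss(score(emb[pos[0]], emb[pos[1]], emb[pos[2]]),
                   score(emb[neg[0]], emb[neg[1]], emb[neg[2]]))
print(neg, round(loss, 4))
```

A full trainer would repeat the sampling and loss computation over mini-batches and update the embeddings by gradient descent on the summed loss.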
The medical record representation unit takes the vector representations of the entities and relations in the knowledge graph obtained through knowledge representation learning and uses these entity and relation vectors to represent the electronic medical record. Specifically, let electronic medical records A and B be $A = (a_1, a_2, \dots, a_n)$ and $B = (b_1, b_2, \dots, b_m)$, where each $a_i$ and $b_j$ is the vector representation of an entity or relation, and n and m are the total numbers of entities and relations in medical record A and medical record B, respectively. The similarity of each pair of terms in the two medical records is calculated as follows:
where $a_i = [x_1, x_2, \dots, x_{128}]$, $b_j = [y_1, y_2, \dots, y_{128}]$, and $w_i$ and $w_j$ are weight coefficients. This yields the chapter (document-level) vector of each electronic medical record: the two records are expressed as vectors of term similarities, $A = [s_{11}, s_{12}, \dots, s_{nm}]$ and $B = [s_{11}, s_{12}, \dots, s_{nm}]$, which gives the vectorized representations of electronic medical records A and B.
The similarity of the medical records is then calculated using cosine similarity:
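For reference, cosine similarity between the two record vectors has the standard form below. The symbols $A_k$ and $B_k$, denoting the k-th components of A and B, and N, their common length, are notation introduced here for illustration.

```latex
\[
\operatorname{sim}(A, B) = \cos\theta
  = \frac{A \cdot B}{\lVert A \rVert \, \lVert B \rVert}
  = \frac{\sum_{k=1}^{N} A_k B_k}
         {\sqrt{\sum_{k=1}^{N} A_k^{2}} \, \sqrt{\sum_{k=1}^{N} B_k^{2}}}
\]
```

The value lies in [-1, 1], and a larger value indicates more similar medical records.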
Candidate set generation. When a new medical record enters the system and similar diseases need to be searched, a set of medical records containing the entities and relations of the new record is generated by matching, so that similar medical records can be found more accurately and the search becomes more efficient.
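One possible way to realise this matching step is an inverted index from entity or relation terms to the records that contain them, sketched below. The index structure, the function names, and the `min_shared` threshold are assumptions for illustration; the patent only states that the candidate set is produced by matching.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set


def build_index(records: Dict[str, Iterable[str]]) -> Dict[str, Set[str]]:
    """Map each entity/relation term to the ids of records containing it."""
    index: Dict[str, Set[str]] = defaultdict(set)
    for record_id, terms in records.items():
        for term in terms:
            index[term].add(record_id)
    return index


def candidate_set(new_terms: Iterable[str],
                  index: Dict[str, Set[str]],
                  min_shared: int = 1) -> Set[str]:
    """Return ids of records sharing at least `min_shared` terms
    with the new medical record."""
    counts: Dict[str, int] = defaultdict(int)
    for term in set(new_terms):
        for record_id in index.get(term, ()):
            counts[record_id] += 1
    return {rid for rid, c in counts.items() if c >= min_shared}


records = {
    "r1": ["headache", "fever"],
    "r2": ["cough"],
    "r3": ["fever", "cough"],
}
index = build_index(records)
print(sorted(candidate_set(["fever"], index)))  # ['r1', 'r3']
```

Only the records in the candidate set then need a full similarity computation, which is where the efficiency gain mentioned above would come from.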
The similar medical record recommendation module comprises a medical record similarity calculation unit and a similar medical record recommendation unit. It calculates the similarity of any two electronic medical records using the knowledge graph representation and ranks the records in the electronic medical record database to select and recommend the electronic medical records most similar to the given electronic medical record, wherein the electronic medical record database comprises a hospital management information system.
Specifically, the electronic medical records with the highest similarity are selected from the electronic medical record database using a heap sort algorithm.
Further, the similarity calculation unit calculates medical record similarity using cosine similarity. Given the electronic medical record chapter vectors $A = [s_{11}, s_{12}, \dots, s_{nm}]$ and $B = [s_{11}, s_{12}, \dots, s_{nm}]$, represented by the entity and relation vectors, the similarity is calculated with cosine similarity:
Medical record similarity calculation and ranking. Based on the chapter vector representation of the electronic medical records, medical record similarity is calculated using cosine similarity, and the results are ranked with a heap sort algorithm to obtain the similar medical records that meet the conditions. The electronic medical record database holds a large number of electronic medical records; when a new medical record is input to search for similar diseases, not all medical records need to be displayed, only the N records with the highest similarity, so a heap sort algorithm is chosen for ranking the medical records.
The similar medical record recommendation unit is a Top-K electronic medical record recommendation unit. The electronic medical record set contains a large number of medical records; when a new medical record is input to search for similar diseases, not all medical records and similarities need to be displayed, only the K most similar electronic medical records, that is, those with the highest similarity, where K is a positive integer.
The invention uses a heap sort algorithm to find the Top-K electronic medical records for a given electronic medical record.
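The similarity calculation and Top-K selection described above can be sketched with Python's standard `heapq` module. This is a heap-based Top-K selection rather than a full heap sort of the database, and the function names and the dictionary of record vectors are assumptions made for the example.

```python
import heapq
import math
from typing import Dict, List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k_similar(query: Sequence[float],
                  records: Dict[str, Sequence[float]],
                  k: int) -> List[Tuple[str, float]]:
    """Return the k records most similar to `query`, best first.

    heapq.nlargest keeps a heap of at most k items, so the whole
    database never has to be fully sorted.
    """
    scored = ((rid, cosine_similarity(query, vec))
              for rid, vec in records.items())
    return heapq.nlargest(k, scored, key=lambda item: item[1])


records = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [1.0, 1.0]}
for record_id, sim in top_k_similar([1.0, 0.0], records, k=2):
    print(record_id, round(sim, 4))
```

For a database of M records this costs roughly O(M log K) comparisons, which is why a heap is a natural fit when only the K best matches are displayed.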
The similar medical record recommendation system of this embodiment performs similar medical record retrieval based on knowledge graph representation learning and introduces chapter vector representation into knowledge representation learning. It uses four modules: a medical record data preprocessing module, a disease knowledge graph construction module, a knowledge representation module, and a similar medical record recommendation module. The main process is as follows: first, the entities and relations in the electronic medical records are extracted and a knowledge graph is constructed; the knowledge graph is then vectorized through knowledge representation learning; the chapter vector representation of each electronic medical record is obtained from the vector representations of its entities and relations; and medical record similarity is calculated to retrieve similar medical records, which is applied to diagnosis and treatment processes such as disease diagnosis and treatment plan suggestion.
Chinese electronic medical records have a complex structure, rich semantics, and a high degree of specialization. Word segmentation limits the accuracy of entity and relation recognition, so most models take single characters as input, but a single character vector cannot represent polysemy. In addition, labeling electronic medical records is costly, so little high-quality labeled corpus is available. To address these problems, the pre-trained model BERT is introduced for text vectorization and serves as the basis of the entity and relation extraction model. BERT enhances the semantic representation of characters, makes effective use of the semantic information in text data, and, owing to its unsupervised pre-training, performs well on small-scale labeled corpora.
The knowledge graph is mapped to low-dimensional vectors using knowledge representation learning, the entities and relations in the electronic medical record text are vectorized to obtain the vector representation of a single electronic medical record, and the similarity of electronic medical records is calculated by cosine similarity. The effectiveness of the method is verified by experiments; it facilitates querying similar patients, mining the treatment plans of similar patients, and providing decision support.
In summary, in the similar medical record recommendation system of the above embodiment of the present invention, the medical record data preprocessing module preprocesses the electronic medical record and extracts its key content, then represents the electronic medical record and constructs its knowledge graph from the preprocessed record and the extracted key content. The disease knowledge graph construction module extracts and represents knowledge and entities, stores the knowledge graph triples in a graph database through the knowledge graph storage unit, and obtains the disease knowledge graph pattern library framework by defining and analyzing the ontology and relations. The knowledge representation learning unit in the knowledge representation module learns the representations of the entities, and the medical record representation unit then represents the whole electronic medical record. Finally, the medical record similarity calculation unit in the similar medical record recommendation module calculates the similarity of any two electronic medical records, and the records in the electronic medical record database are ranked to select and recommend the electronic medical record most similar to the given electronic medical record.
The logic and/or steps represented in the flowcharts or otherwise described herein, e.g., an ordered listing of executable instructions that can be considered to implement logical functions, can be embodied in any computer-readable medium for use by or in connection with an instruction execution system, apparatus, or device, such as a computer-based system, processor-containing system, or other system that can fetch the instructions from the instruction execution system, apparatus, or device and execute the instructions. For the purposes of this description, a "computer-readable medium" can be any means that can contain, store, communicate, propagate, or transport the program for use by or in connection with the instruction execution system, apparatus, or device.
More specific examples (a non-exhaustive list) of the computer-readable medium would include the following: an electrical connection (electronic device) having one or more wires, a portable computer diskette (magnetic device), a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber device, and a portable compact disc read-only memory (CDROM). Additionally, the computer-readable medium could even be paper or another suitable medium upon which the program is printed, as the program can be electronically captured, via for instance optical scanning of the paper or other medium, then compiled, interpreted or otherwise processed in a suitable manner if necessary, and then stored in a computer memory.
It should be understood that portions of the present invention may be implemented in hardware, software, firmware, or a combination thereof. In the above embodiments, the various steps or methods may be implemented in software or firmware stored in memory and executed by a suitable instruction execution system. For example, if implemented in hardware, as in another embodiment, any one or combination of the following techniques, which are known in the art, may be used: a discrete logic circuit having a logic gate circuit for implementing a logic function on a data signal, an application specific integrated circuit having an appropriate combinational logic gate circuit, a Programmable Gate Array (PGA), a Field Programmable Gate Array (FPGA), or the like.
In the description herein, references to the description of the term "one embodiment," "some embodiments," "an example," "a specific example," or "some examples," etc., mean that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the invention. In this specification, the schematic representations of the terms used above do not necessarily refer to the same embodiment or example. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples.
While embodiments of the present invention have been shown and described, it will be understood by those of ordinary skill in the art that: various changes, modifications, substitutions and alterations can be made to the embodiments without departing from the principles and spirit of the invention, the scope of which is defined by the claims and their equivalents.
Claims (9)
1. A system for recommending similar medical records, comprising:
the medical record data preprocessing module is used for acquiring the electronic medical record, preprocessing the electronic medical record, extracting key contents in the electronic medical record, representing the electronic medical record according to the preprocessed electronic medical record and the extracted key contents and constructing a knowledge graph of the electronic medical record;
the disease knowledge graph construction module comprises a mode layer construction unit, a data pre-labeling unit, an entity relationship extraction unit, a knowledge graph representation unit and a knowledge graph storage unit; the mode layer construction unit is used for constructing the disease knowledge graph in a top-down manner, which requires defining the ontology and relations related to the disease; the data pre-labeling unit manually labels entities and relations in the data to facilitate training of the entity and relation extraction model; the entity relationship extraction unit uses a model to automatically extract entities and relations from the data in the medical record, the extracted content comprising the entities and relations defined by the mode layer; the knowledge graph representation unit represents the extracted knowledge and entities as RDF triples; the knowledge graph storage unit stores the knowledge graph triples in a graph database, and a disease knowledge graph pattern library framework is obtained by defining and analyzing the ontology and relations;
the knowledge representation module comprises a knowledge representation learning unit and a medical record representation unit, learns the representations of the entities through the knowledge graph, and then represents the whole electronic medical record using the entities in the medical record;
the similar medical record recommendation module comprises a medical record similarity calculation unit and a similar medical record recommendation unit, calculates the similarity of any two electronic medical records using the knowledge graph representation, and ranks the records in the electronic medical record database to select and recommend the electronic medical record most similar to the given electronic medical record, wherein the electronic medical record database comprises a hospital management information system.
2. The system for recommending similar medical records according to claim 1,
the mode layer construction unit is used for constructing an ontology relationship diagram comprising five types of entities and fifteen types of entity relationships, wherein the entities in the electronic medical record are defined as five types and the relationships among the entities are defined as fifteen types;
the data pre-labeling unit is used for labeling data for the entity extraction task using the BIO labeling method, wherein B (Begin) marks the beginning of an entity, I (Inside) marks the remainder of the entity, and O (Other) marks a non-entity;
the entity relationship extraction unit is used for extracting entities and relationships using a BERT + BiLSTM + CRF neural network structure, wherein the BERT layer uses the BERT-Base-Chinese model; the intermediate layer uses a BiLSTM, which takes the vector sequence as input and outputs the scores computed for it; the output layer uses a CRF model, and the CRF layer, after obtaining the score matrix, calculates the score of each label and outputs the label with the highest score as the prediction result;
the knowledge graph representation unit is used for describing the extracted knowledge as RDF (Resource Description Framework) triples of the form <entity 1, relation, entity 2>, which express the relations between entities; the knowledge graph is stored in the graph database Neo4j, which manages data as nodes and edges, wherein the nodes represent entities and the edges represent relations.
3. The system for recommending similar medical records according to claim 1,
the knowledge representation learning unit is used for representing the knowledge graph using the TransX series of models (TransE, TransH, TransR and TransD);
and the medical record representation unit is used for taking the vector representations of the entities and relations in the knowledge graph obtained through knowledge representation learning and representing the electronic medical record using the entity and relation vectors.
4. The system for recommending similar medical records according to claim 3, wherein the step of representing using the TransX series of models (TransE, TransH, TransR, TransD) specifically comprises:
given a triple (h, r, t), the relation r is defined as a translation vector; when the triple (h, r, t) holds, the sum of the head entity vector and the relation vector is close to the tail entity vector, and when the triple does not hold, the tail entity vector is far from that sum; knowledge representation learning is performed by optimizing the scoring function $f_r(h, t)$.
5. The system for recommending similar medical records according to claim 3, wherein the step of representing the electronic medical record with the entity and the relationship vector specifically comprises:
let electronic medical records A and B be $A = (a_1, a_2, \dots, a_n)$ and $B = (b_1, b_2, \dots, b_m)$, where each $a_i$ and $b_j$ is the vector representation of an entity or relation, and n and m are the total numbers of entities and relations in medical record A and medical record B, respectively; the similarity $s_{ij}$ of the terms is defined with weight coefficients $w_i$ and $w_j$; the electronic medical records can then be represented as vectors of term similarities $A = [s_{11}, s_{12}, \dots, s_{nm}]$ and $B = [s_{11}, s_{12}, \dots, s_{nm}]$, which gives the vectorized representations of electronic medical records A and B.
6. The system for recommending similar medical records according to claim 5,
the similarity calculation unit calculates medical record similarity using cosine similarity: given the electronic medical record chapter vectors $A = [s_{11}, s_{12}, \dots, s_{nm}]$ and $B = [s_{11}, s_{12}, \dots, s_{nm}]$, represented by the entity and relation vectors, the similarity is calculated with cosine similarity;
the similar medical record recommendation unit is a Top-K electronic medical record recommendation unit; the electronic medical record set contains a large number of medical records, and when a new medical record is input to search for similar diseases, not all medical records and similarities need to be displayed, only the K most similar electronic medical records, that is, those with the highest similarity, where K is a positive integer.
7. The system for recommending similar medical records according to claim 6, wherein the electronic medical records with the highest similarity are screened from the electronic medical record database by a heap sorting algorithm.
8. The system of claim 2, wherein the mode layer construction unit defines five types of disease-related entities and fifteen types of relationships among the entities, wherein the entities are body part (BODY), symptoms and signs (SIGNS), disease and diagnosis (DISEASE), examination and test (CHECK), and treatment (TREATMENT);
the relationships are, respectively: <symptom and sign, s_locate_b (symptom located in body part), body part>, <symptom and sign, s_indicator_d (symptom indicates disease), disease and diagnosis>, <symptom and sign, s_administration_s (multiple symptoms appear together), symptom and sign>, <symptom and sign, c_need_s (symptom needs to be checked), examination and test>, <disease and diagnosis, d_result_s (disease causes symptom), symptom and sign>, <disease and diagnosis, d_locate_b (disease located in body part), body part>, <disease and diagnosis, d_registration_d (multiple diseases occur concurrently), disease and diagnosis>, <disease and diagnosis, d_for_c (examination to confirm disease), examination and test>, <examination and test, c_find_s (examination finds symptom), symptom and sign>, <examination and test, c_confirm_d (examination confirms disease), disease and diagnosis>, <examination and test, c_locate_b (examination located on body part), body part>, <examination and test, c_acompany_c (different examinations done simultaneously), examination and test>, <treatment, t_acompany_t (multiple treatment modalities), treatment>, <treatment, t_act_s (treatment acts on symptom), symptom and sign>, <treatment, t_act_d (treatment acts on disease), disease and diagnosis>.
9. A similar medical record recommendation method applied to the similar medical record recommendation system of any one of claims 1-8, the method comprising:
acquiring a historical electronic medical record, extracting key contents of the historical electronic medical record, preprocessing the historical electronic medical record, and constructing a knowledge graph of the historical electronic medical record according to the preprocessed historical electronic medical record;
defining an ontology and relations related to diseases according to the historical electronic medical record knowledge graph, and labeling entities and relations according to the defined ontology and relations, wherein the entities comprise body parts, symptoms and signs, and diseases and diagnoses, and the relations comprise <symptom and sign, s_locate_b (symptom located in body part), body part>; and learning the representations of the entities through the historical electronic medical record knowledge graph, so as to represent the whole electronic medical record by the entities in the medical record;
acquiring a target electronic medical record, extracting the entities and relations in the target electronic medical record using the labeled entities and relations to construct a target knowledge graph, representing the extracted knowledge and entities as RDF (Resource Description Framework) triples, storing the knowledge graph triples in a target knowledge graph database, obtaining a disease knowledge graph pattern library framework by defining and analyzing the ontology and relations, and vectorizing the knowledge graph through knowledge representation learning;
and obtaining the chapter vector representation of the target electronic medical record from the vector representations of the entities and relations, calculating the similarity between each electronic medical record in the electronic medical record database and the target electronic medical record according to that chapter vector representation, and ranking the obtained similarities to identify and recommend the most similar electronic medical record, wherein the electronic medical record database comprises a hospital management information system.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202211268066.XA CN115841861A (en) | 2022-10-17 | 2022-10-17 | Similar medical record recommendation method and system |
Publications (1)
Publication Number | Publication Date |
---|---|
CN115841861A true CN115841861A (en) | 2023-03-24 |
Family
ID=85575841
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202211268066.XA Pending CN115841861A (en) | 2022-10-17 | 2022-10-17 | Similar medical record recommendation method and system |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN115841861A (en) |
Cited By (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN116682553A (en) * | 2023-08-02 | 2023-09-01 | 浙江大学 | Diagnosis recommendation system integrating knowledge and patient representation |
CN116682553B (en) * | 2023-08-02 | 2023-11-03 | 浙江大学 | Diagnosis recommendation system integrating knowledge and patient representation |
CN118398230A (en) * | 2024-05-13 | 2024-07-26 | 盐城市第三人民医院 | Automatic generation and auxiliary decision-making system of medical knowledge graph based on electronic medical record |
CN118299063A (en) * | 2024-06-03 | 2024-07-05 | 长春中医药大学 | Cardiovascular and cerebrovascular disease information management system for rehabilitation assistance |
CN118588225A (en) * | 2024-08-06 | 2024-09-03 | 成都知视界信息科技有限公司 | Disease analysis system based on biomedical big data |
CN118588225B (en) * | 2024-08-06 | 2024-10-15 | 成都知视界信息科技有限公司 | Disease analysis system based on biomedical big data |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN111414393B (en) | Semantic similar case retrieval method and equipment based on medical knowledge graph | |
CN111613339B (en) | Similar medical record searching method and system based on deep learning | |
CN111274806B (en) | Method and device for recognizing word segmentation and part of speech and method and device for analyzing electronic medical record | |
KR102153920B1 (en) | System and method for interpreting medical images through the generation of refined artificial intelligence reinforcement learning data | |
CN115841861A (en) | Similar medical record recommendation method and system | |
Ruan et al. | Representation learning for clinical time series prediction tasks in electronic health records | |
CN108091397B (en) | Bleeding event prediction method for patients with ischemic heart disease | |
Teng et al. | Automatic medical code assignment via deep learning approach for intelligent healthcare | |
CN107193919A (en) | The search method and system of a kind of electronic health record | |
CN108062978B (en) | Method for predicting main adverse cardiovascular events of patients with acute coronary syndrome | |
CN113707339B (en) | Method and system for concept alignment and content inter-translation among multi-source heterogeneous databases | |
CN113764112A (en) | Online medical question and answer method | |
Kaswan et al. | AI-based natural language processing for the generation of meaningful information electronic health record (EHR) data | |
Mujtaba et al. | Classification of forensic autopsy reports through conceptual graph-based document representation model | |
CN115293161A (en) | Reasonable medicine taking system and method based on natural language processing and medicine knowledge graph | |
CN115859914A (en) | Diagnosis ICD automatic coding method and system based on medical history semantic understanding | |
CN116992002A (en) | Intelligent care scheme response method and system | |
Hsu et al. | Multi-label classification of ICD coding using deep learning | |
Cai et al. | NE–LP: normalized entropy-and loss prediction-based sampling for active learning in Chinese word segmentation on EHRs | |
Khokhar et al. | Framework for mining and analysis of standardized nursing care plan data | |
CN118171653B (en) | Health physical examination text treatment method based on deep neural network | |
CN113343680A (en) | Structured information extraction method based on multi-type case history texts | |
CN117194604B (en) | Intelligent medical patient inquiry corpus construction method | |
CN118114675A (en) | Medical named entity recognition method and device based on large language model | |
Cohen et al. | Improving severity classification of Hebrew PET-CT pathology reports using test-time augmentation |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||