
CN118093736A - Acquisition system for corresponding entity and entity tag of medical record text - Google Patents


Info

Publication number
CN118093736A
Authority
CN
China
Prior art keywords
target
entity
medical record
list
sample
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202410488600.0A
Other languages
Chinese (zh)
Other versions
CN118093736B (en
Inventor
刘立宇
初乃强
赵瑞莹
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Singularity Of Life Beijing Technology Co ltd
Singularity Digital Beijing Technology Co ltd
Original Assignee
Singularity Of Life Beijing Technology Co ltd
Singularity Digital Beijing Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Singularity Of Life Beijing Technology Co ltd, Singularity Digital Beijing Technology Co ltd
Priority to CN202410488600.0A
Publication of CN118093736A
Application granted
Publication of CN118093736B
Legal status: Active
Anticipated expiration


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/28 Databases characterised by their database models, e.g. relational or object models
    • G06F16/284 Relational databases
    • G06F16/288 Entity relationship models
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35 Clustering; Classification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/30 Semantic analysis

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Databases & Information Systems (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Data Mining & Analysis (AREA)
  • Medical Treatment And Welfare Office Work (AREA)

Abstract

The invention relates to a system for acquiring the entities of a medical record text and their entity tags, in the technical field of text processing. The system comprises a processor and a memory storing a computer program which, when executed by the processor, performs the following steps: obtaining an initial feature vector list corresponding to a target medical record text; inputting the initial feature vector list into a preset CNN model to obtain an intermediate feature vector list; inputting the intermediate feature vector list into a preset Transformer model to obtain a target feature vector list; and obtaining, according to the target feature vector list, a target entity list and a target label list corresponding to the target entity list.

Description

Acquisition system for corresponding entity and entity tag of medical record text
Technical Field
The invention relates to the technical field of text processing, and in particular to a system for acquiring the entities of a medical record text and their entity tags.
Background
With the continuous development of internet technology and the explosive growth of data volume in the information age, electronic medical record texts written in natural language are characterized by large volume, rapid growth, varied forms, and high potential value. Against this background, information extraction, that is, automatically extracting structured information from unstructured electronic medical record texts, has attracted wide attention and has important application value, and extracting entities and their corresponding labels from medical record texts has become a popular research direction.
In the prior art, the method for acquiring the corresponding entity and the entity tag of the medical record text comprises the following steps: setting corresponding rules based on lexical, syntactic and semantic features in the medical record text, matching and identifying the entities in the medical record text based on the set rules, setting corresponding entity tag libraries, and matching the acquired entities with the entities in the entity tag libraries to acquire entity tags.
In summary, the prior-art method for acquiring the entities of a medical record text and their entity tags has the following problem: the entities involved in medical record texts are diverse, so the set rules cannot cover all of them; entities and entity tags then easily fail to match, which reduces the accuracy of acquiring the entities and entity tags in a medical record text.
Disclosure of Invention
To address the above technical problem, the invention adopts the following technical solution: a system for acquiring the entities of a medical record text and their entity tags, the system comprising a processor and a memory storing a computer program which, when executed by the processor, performs the following steps:
S100, obtaining an initial feature vector list $A = \{A_1, \dots, A_i, \dots, A_n\}$ corresponding to the target medical record text, with $A_i = (A_{i1}, \dots, A_{ij}, \dots, A_{im})$, where $A_i$ is the initial feature vector corresponding to the $i$-th character in the target medical record text, $A_{ij}$ is the value of the $j$-th component of $A_i$, $j = 1, \dots, m$, $m$ is the dimension of the initial feature vector, $i = 1, \dots, n$, and $n$ is the number of characters in the target medical record text.
S200, inputting $A$ into a preset CNN model to obtain an intermediate feature vector list $B = \{B_1, \dots, B_i, \dots, B_n\}$ corresponding to the target medical record text, with $B_i = (B_{i1}, \dots, B_{ir}, \dots, B_{is})$, where $B_i$ is the intermediate feature vector corresponding to the $i$-th character in the target medical record text, $B_{ir}$ is the value of the $r$-th component of $B_i$, $r = 1, \dots, s$, and $s$ is the dimension of the intermediate feature vector; $s$ is obtained in S200 by the following steps:
S201, obtaining a first target parameter $\eta_1$ corresponding to the preset CNN model, where $\eta_1$ is the number of convolution kernel types in the preset CNN model.
S203, according to $\eta_1$, obtaining a second target parameter $\eta_2$ corresponding to the preset CNN model, where $\eta_2$ satisfies the following conditions:
$m/\eta_1 < \eta_2 \le (m \times \mu)/\eta_1$ and $\eta_2 = a \times M$, where $\mu$ is a preset parameter, $a$ is the number of attention heads in the preset Transformer model, and $M$ is any positive integer.
S205, obtaining the dimension $s$ of the intermediate feature vector according to $\eta_1$ and $\eta_2$, where $s$ satisfies the following condition:
$s = 2 \times \eta_1 \times \eta_2$
S300, inputting $B$ into a preset Transformer model to obtain a target feature vector list $C = \{C_1, \dots, C_i, \dots, C_n\}$ corresponding to the target medical record text, with $C_i = (C_{i1}, \dots, C_{ih}, \dots, C_{ig})$, where $C_i$ is the target feature vector corresponding to the $i$-th character in the target medical record text, $C_{ih}$ is the value of the $h$-th component of $C_i$, $h = 1, \dots, g$, and $g$ is the dimension of the target feature vector, where $g$ satisfies the following condition:
$g = (N + 1) \times 4$, where $N$ is the number of preset entity tags.
S400, according to $C$, acquiring a target entity list corresponding to the target medical record text and a target label list corresponding to the target entity list, wherein the target entity list comprises a plurality of target entities, the target entities are the entities in the target medical record text identified by the combination of the preset CNN model and the preset Transformer model, the target label list comprises a plurality of target labels, and the target labels are the preset entity tags into which the target entities are classified.
Compared with the prior art, the system for acquiring the entities of a medical record text and their entity tags has clear beneficial effects, achieves considerable technical progress and practicality, has wide industrial application value, and has at least the following beneficial effects:
The invention relates to a system for acquiring the entities of a medical record text and their entity tags, which comprises: N preset entity tags, a processor, and a memory storing a computer program which, when executed by the processor, implements the following steps: obtaining an initial feature vector list corresponding to a target medical record text; inputting the initial feature vector list into a preset CNN model to obtain an intermediate feature vector list corresponding to the target medical record text; inputting the intermediate feature vector list into a preset Transformer model to obtain a target feature vector list corresponding to the target medical record text; and obtaining, according to the target feature vector list, a target entity list corresponding to the target medical record text and a target label list corresponding to the target entity list.
The foregoing is only an overview of the technical solution of the present invention. So that the technical means of the invention can be understood more clearly and implemented in accordance with the description, preferred embodiments are described in detail below with reference to the accompanying drawings.
Drawings
Fig. 1 is a flowchart of a process implemented when a processor of an acquisition system for medical record text corresponding entities and entity tags executes a computer program according to an embodiment of the present invention.
Detailed Description
The following description of the embodiments of the present invention will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present invention, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to fall within the scope of the invention.
It should be noted that the terms "first," "second," and the like in the description and the claims of the present invention and the above figures are used for distinguishing between similar objects and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used may be interchanged where appropriate such that the embodiments of the invention described herein may be implemented in sequences other than those illustrated or otherwise described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or server that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed or inherent to such process, method, article, or apparatus, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
Examples
The embodiment provides a system for acquiring an entity and an entity tag corresponding to medical record text, wherein the system comprises: a processor and a memory storing a computer program which, when executed by the processor, performs the steps of, as shown in fig. 1:
S100, obtaining an initial feature vector list $A = \{A_1, \dots, A_i, \dots, A_n\}$ corresponding to the target medical record text, with $A_i = (A_{i1}, \dots, A_{ij}, \dots, A_{im})$, where $A_i$ is the initial feature vector corresponding to the $i$-th character in the target medical record text, $A_{ij}$ is the value of the $j$-th component of $A_i$, $j = 1, \dots, m$, $m$ is the dimension of the initial feature vector, $i = 1, \dots, n$, and $n$ is the number of characters in the target medical record text.
Specifically, a preset entity tag is a tag corresponding to a preset entity, where the tag corresponding to an entity is the tag of a word that represents a physical state of the user; for example, inspection mode, observation result, and the like are preset entity tags.
Specifically, the initial feature vector is generated by concatenating, one after the other, the vector generated from the content information and the vector generated from the position information of a Chinese character in the target medical record text. Those skilled in the art know that any prior-art method for converting text into vectors based on content information and position information falls within the protection scope of the present invention, and it is not described here.
Further, $m = m_1 + m_2$, where $m_1$ is the dimension of the vector generated from the content information of a Chinese character in the target medical record text, and $m_2$ is the dimension of the vector generated from the position information of a Chinese character in the target medical record text.
Preferably, $m_1$ has a value of 128.
Preferably, $m_2$ has a value of 64.
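The concatenation of content and position vectors described above can be sketched as follows. This is a minimal illustration only: the functions `content_vector` and `position_vector` are hypothetical stand-ins for whatever embedding method is actually used, and only the dimensions 128 and 64 come from the text.

```python
from typing import List

M1 = 128  # content-vector dimension (preferred value given in the text)
M2 = 64   # position-vector dimension (preferred value given in the text)


def content_vector(ch: str, dim: int = M1) -> List[float]:
    """Hypothetical stand-in for a learned character embedding."""
    code = ord(ch)
    return [((code * (k + 1)) % 97) / 97.0 for k in range(dim)]


def position_vector(pos: int, dim: int = M2) -> List[float]:
    """Hypothetical stand-in for a position embedding."""
    return [(((pos + 1) * (k + 1)) % 89) / 89.0 for k in range(dim)]


def initial_feature_vectors(text: str) -> List[List[float]]:
    """Return A: one vector of dimension m = m1 + m2 per character."""
    return [content_vector(ch) + position_vector(i) for i, ch in enumerate(text)]


A = initial_feature_vectors("胸部皮肤")
print(len(A), len(A[0]))  # 4 characters, each of dimension 192
```

With the preferred values, every character is represented by a vector of dimension m = 128 + 64 = 192.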
Specifically, the target medical record text is the medical record text from which entities and their corresponding labels are to be acquired.
S200, inputting $A$ into a preset CNN model to obtain an intermediate feature vector list $B = \{B_1, \dots, B_i, \dots, B_n\}$ corresponding to the target medical record text, with $B_i = (B_{i1}, \dots, B_{ir}, \dots, B_{is})$, where $B_i$ is the intermediate feature vector corresponding to the $i$-th character in the target medical record text, $B_{ir}$ is the value of the $r$-th component of $B_i$, $r = 1, \dots, s$, and $s$ is the dimension of the intermediate feature vector; $s$ is obtained in S200 by the following steps:
S201, obtaining a first target parameter $\eta_1$ corresponding to the preset CNN model, where $\eta_1$ is the number of convolution kernel types in the preset CNN model.
Specifically, $\eta_1$ is obtained in S201 by the following steps:
S2011, obtaining a sample medical record text set, wherein the sample medical record text set comprises a plurality of sample medical record texts, and the sample medical record texts are medical record texts in which entities have been annotated.
S2012, a sample entity set is obtained according to the sample medical record text set, wherein the sample entity set comprises a plurality of sample entities, and the sample entities are marked entities in the sample medical record text.
S2013, according to the sample entity set, obtaining a first sample number list and a second sample number list $F = \{F_1, \dots, F_t, \dots, F_f\}$ corresponding to the sample entity type list, where $F_t$ is the second sample number corresponding to the $t$-th sample entity type, $t = 1, \dots, f$, and $f$ is the number of second sample numbers in the second sample number list.
Specifically, the sample entity type list includes a plurality of sample entity types, where the sample entity types are obtained by classifying the sample entities by the number of text characters they contain. For example, all sample entities consisting of 2 Chinese characters are classified into one type, such as (in translation) skin, chest tube, number, etc.
Specifically, the first sample number list includes a plurality of first sample numbers, where the first sample numbers are numbers of text characters corresponding to sample entities.
Specifically, the second sample number is the number of sample entities corresponding to each sample entity type in the sample entity set.
S2014, according to $F$, obtaining a target entity type list $E = \{E_1, \dots, E_v, \dots, E_b\}$ and a third sample number list $E^0 = \{E^0_1, \dots, E^0_v, \dots, E^0_b\}$ corresponding to $E$, where $E_v$ is the $v$-th target entity type, $E^0_v$ is the third sample number corresponding to $E_v$, $v = 1, \dots, b$, and $b$ is the number of target entity types, with $b = \lfloor \alpha \times f \rfloor$, where $\lfloor \alpha \times f \rfloor$ is the largest integer not exceeding $\alpha \times f$ and $\alpha$ is a preset percentage.
Specifically, the target entity types are the sample entity types, taken from the sample entity type list, whose second sample numbers fall within the top $\alpha$ proportion after the second sample numbers in $F$ are sorted in descending order.
Further, the third sample number is a first sample number corresponding to the target entity type obtained from the first sample number list.
Specifically, the value range of α is 80% -90%, where those skilled in the art know that α can be selected according to actual requirements, and all fall into the protection range of the present invention, which is not described herein.
S2015, according to $E^0$, obtaining a fourth sample number list $E^1 = \{E^1_1, \dots, E^1_u, \dots, E^1_w\}$, where $E^1_u$ is the $u$-th fourth sample number, $u = 1, \dots, w$, $w = b$, and the fourth sample numbers are the third sample numbers in $E^0$ re-sorted in ascending order.
S2016, when there exists $E^1_u - E^1_{u-1} \neq 1$ in $E^1$, setting $\alpha = \alpha - \alpha_0$ and repeating S2014 to S2015 until a preset loop termination condition is satisfied, the condition being $E^1_u - E^1_{u-1} = 1$, where $\alpha_0$ is a preset parameter threshold and $E^1_{u-1}$ is the $(u-1)$-th fourth sample number.
Specifically, the value range of $\alpha_0$ is 0.1-0.5, where those skilled in the art know that $\alpha_0$ can be selected according to actual requirements, which all fall into the protection scope of the present invention and are not described herein.
S2017, when $E^1_u - E^1_{u-1} = 1$, obtaining $\eta_1 = b$.
In the above steps, the parameters of the preset CNN model are adjusted step by step based on the sample data about the sample entities in the sample medical record text set, so that feature vectors of the corresponding dimension are obtained for the target medical record text. The number of convolution kernel types in the preset CNN model is determined from the number of text characters in the entities of the sample medical record texts, which keeps the convolution kernel sizes consistent with the entity lengths and improves the accuracy of acquiring the entities and entity tags in a medical record text.
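Steps S2011 to S2017 can be sketched as the loop below. It is an illustration under stated assumptions, not the patented implementation: the function name `select_eta1` is hypothetical, α and α_0 are treated as exact fractions, ties between equally frequent entity lengths are broken by the shorter length, and the placeholder strings stand in for annotated sample entities.

```python
from collections import Counter
from fractions import Fraction
from math import floor
from typing import List, Tuple


def select_eta1(sample_entities: List[str],
                alpha: Fraction,
                alpha0: Fraction) -> Tuple[int, List[int]]:
    """Return (eta1, sorted entity lengths kept), following S2013 to S2017."""
    if alpha0 <= 0:
        raise ValueError("alpha0 must be positive")
    # F: number of sample entities per entity type (type = character count).
    counts = Counter(len(entity) for entity in sample_entities)
    # Rank types by entity count, largest first (assumed tie-break: shorter first).
    ranked = sorted(counts, key=lambda length: (-counts[length], length))
    f = len(ranked)
    while alpha > 0:
        b = floor(alpha * f)          # b = largest integer not exceeding alpha * f
        lengths = sorted(ranked[:b])  # E^1: kept lengths in ascending order
        # Stop once every pair of neighbouring lengths differs by exactly 1.
        if b >= 1 and all(y - x == 1 for x, y in zip(lengths, lengths[1:])):
            return b, lengths
        alpha -= alpha0               # S2016: shrink alpha and try again
    raise ValueError("no consecutive set of entity lengths found")


# Placeholder entities: 5 of length 2, 4 of length 3, 3 of length 4,
# 2 of length 7 and 1 of length 1.
samples = ["x" * 2] * 5 + ["x" * 3] * 4 + ["x" * 4] * 3 + ["x" * 7] * 2 + ["x"]
eta1, kept = select_eta1(samples, Fraction("0.9"), Fraction("0.1"))
print(eta1, kept)  # 3 [2, 3, 4]
```

In this example α falls from 0.9 to 0.7 before the kept lengths 2, 3, 4 become consecutive, so three kernel types are used.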
S203, according to $\eta_1$, obtaining a second target parameter $\eta_2$ corresponding to the preset CNN model, where $\eta_2$ satisfies the following conditions:
$m/\eta_1 < \eta_2 \le (m \times \mu)/\eta_1$ and $\eta_2 = a \times M$, where $\mu$ is a preset parameter, $a$ is the number of attention heads in the preset Transformer model, and $M$ is any positive integer.
Specifically, the second target parameter $\eta_2$ is the number of convolution kernels of each type in the preset CNN model.
Specifically, the value range of $\mu$ is 5-10, where those skilled in the art know that $\mu$ can be selected according to actual requirements, and all fall into the protection range of the present invention, and are not described herein.
As described above, the second target parameter of the preset CNN model is kept within a corresponding range. This avoids the case where the parameter is too small, so that the feature vectors output by the model for the target medical record text have too few dimensions to capture the information comprehensively, making it hard to find a good solution during training and hard for the model to converge. It also avoids the case where the parameter is too large, so that the feature vectors output by the model have too many dimensions and the model runs less efficiently. As a result, the entities and entity tags in the medical record text are acquired more accurately.
S205, obtaining the dimension $s$ of the intermediate feature vector according to $\eta_1$ and $\eta_2$, where $s$ satisfies the following condition:
$s = 2 \times \eta_1 \times \eta_2$
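The constraints in S203 and S205 can be checked numerically with a short sketch. It assumes the condition is read as m/η1 < η2 ≤ (m × μ)/η1 with η2 a positive multiple of the number of attention heads a; the function names and the example values η1 = 3 and a = 8 are illustrative assumptions, while m = 192 follows from the preferred values 128 and 64 and μ = 5 lies in the stated range.

```python
from fractions import Fraction
from typing import List


def eta2_candidates(m: int, eta1: int, mu: int, a: int) -> List[int]:
    """List every eta2 = a * M (M a positive integer) inside the allowed range."""
    lower = Fraction(m, eta1)       # exclusive lower bound m / eta1
    upper = Fraction(m * mu, eta1)  # inclusive upper bound (m * mu) / eta1
    candidates = []
    multiple = 1
    while a * multiple <= upper:
        if a * multiple > lower:
            candidates.append(a * multiple)
        multiple += 1
    return candidates


def intermediate_dim(eta1: int, eta2: int) -> int:
    """s = 2 * eta1 * eta2 (S205)."""
    return 2 * eta1 * eta2


cands = eta2_candidates(m=192, eta1=3, mu=5, a=8)
print(cands[0], cands[-1], intermediate_dim(3, cands[0]))  # 72 320 432
```

Because η2 is a multiple of a, the resulting dimension s is also divisible by the number of attention heads.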
S300, inputting $B$ into a preset Transformer model to obtain a target feature vector list $C = \{C_1, \dots, C_i, \dots, C_n\}$ corresponding to the target medical record text, with $C_i = (C_{i1}, \dots, C_{ih}, \dots, C_{ig})$, where $C_i$ is the target feature vector corresponding to the $i$-th character in the target medical record text, $C_{ih}$ is the value of the $h$-th component of $C_i$, $h = 1, \dots, g$, and $g$ is the dimension of the target feature vector, where $g$ satisfies the following condition:
$g = (N + 1) \times 4$, where $N$ is the number of preset entity tags.
S400, according to $C$, acquiring a target entity list corresponding to the target medical record text and a target label list corresponding to the target entity list, wherein the target entity list comprises a plurality of target entities, the target entities are the entities in the target medical record text identified by the combination of the preset CNN model and the preset Transformer model, the target label list comprises a plurality of target labels, and the target labels are the preset entity tags into which the target entities are classified.
Specifically, those skilled in the art know that any prior-art method for obtaining the entities in a text and their corresponding labels by combining a CNN model and a Transformer model falls within the protection scope of the present invention, and it is not described herein.
Specifically, the following steps are further included after S400:
S500, according to the target entity list and the target label list, a first target entity relation diagram corresponding to the target entity list is obtained.
Specifically, in S500, the first target entity relationship diagram is obtained by:
S501, inputting the target entity list and the target label list into a preset neural network model to acquire an intermediate entity relation graph corresponding to the target medical record text, wherein the intermediate entity relation graph is a tree-structured graph of the relationships between target entities, constructed based on the target entity list, and comprises a target root node, a plurality of target child nodes, and a plurality of target edges; the intermediate entity relation graph is acquired in S501 through the following steps:
S5011, when a target entity in the target entity list does not depend on any other target entity in the target entity list, connecting the target child node corresponding to that target entity with the target root node, the connection being directed from the target root node to the target child node.
Specifically, the target root node is a user-defined node, where, as known by those skilled in the art, any prior-art method for representing a node by a symbol falls into the protection scope of the present invention, and is not described herein. For example, the target root node is represented by -1.
Specifically, the target child node includes a target entity and a target label corresponding to the target entity.
Specifically, a target edge represents the connection relationship between two target entities.
S5013, when the target entity in the target entity list depends on a certain target entity except the target entity in the target entity list, connecting the target sub-node corresponding to the target entity with the target sub-node corresponding to the dependent target entity.
S5015, a target edge is formed based on the connection between the target root node and the target child node and the connection between the target child node and the target child node, so as to obtain an intermediate entity relationship graph.
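Steps S5011 to S5015 can be illustrated with a small sketch. It assumes the dependency of each entity is already known: the `depends_on` map is a hypothetical stand-in for the output of the preset neural network model, and the example entities and labels are invented for illustration. Only the root symbol -1 comes from the text.

```python
from typing import Dict, List, Optional, Tuple

ROOT = -1  # the text gives -1 as an example symbol for the target root node


def build_intermediate_graph(
    entities: List[Tuple[str, str]],
    depends_on: Dict[int, Optional[int]],
):
    """Build nodes and edges of the tree from (entity, label) pairs.

    depends_on[i] is the index of the entity that entity i depends on,
    or None when entity i depends on no other entity.
    """
    nodes = {ROOT: None}
    nodes.update({i: pair for i, pair in enumerate(entities)})
    root_edges = []    # directed edges (ROOT, child), as in S5011
    entity_edges = []  # child-to-child connections from S5013, direction not yet fixed
    for i in range(len(entities)):
        head = depends_on.get(i)
        if head is None:
            root_edges.append((ROOT, i))
        else:
            entity_edges.append((i, head))
    return nodes, root_edges, entity_edges


entities = [
    ("chest CT", "inspection mode"),
    ("right lung", "position"),
    ("nodule", "pathological manifestation"),
]
nodes, root_edges, entity_edges = build_intermediate_graph(
    entities, {0: None, 1: 2, 2: 0}
)
print(root_edges, entity_edges)  # [(-1, 0)] [(1, 2), (2, 0)]
```

Each target child node keeps both the entity and its label, which later rule-based steps can use to decide edge directions.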
Specifically, in S501, a preset neural network model is obtained by:
S510, according to a key medical record text set, acquiring a key entity list set $U = \{U_1, \dots, U_d, \dots, U_z\}$ corresponding to the key medical record text set and a key label list set corresponding to the key entity list set, where $U_d$ is the $d$-th key entity list, $d = 1, \dots, z$, and $z$ is the number of key entity lists.
Specifically, the key medical record text set includes a plurality of key medical record texts, and the key medical record texts are medical record texts for training an initial neural network model.
Specifically, the key tag list set includes a plurality of key tag lists.
Further, each key medical record text corresponds to a key entity list, and each key entity list corresponds to a key label list.
Further, the acquiring manner of the key entity list is consistent with the acquiring manner of the target entity list, and the acquiring manner of the key tag list is consistent with the acquiring manner of the target tag list, which can refer to steps S100 to S400.
S520, inputting the U and the key label list set into the first neural network model, and obtaining a key entity relation diagram corresponding to the U.
Specifically, the key entity relation graph is a tree structure diagram of the relation between key entities constructed based on the key entity list.
Further, the obtaining manner of the key entity relationship diagram is consistent with the obtaining manner of the intermediate entity relationship diagram, and reference may be made to S5011 to S5015.
Specifically, those skilled in the art know that the neural network can be selected according to actual requirements, which fall within the protection scope of the present invention, and are not described herein.
S530, acquiring key parameters zeta corresponding to the first neural network model according to the key relation diagram, wherein zeta meets the following conditions:
where $\sigma^0_d$ is the number of relationships between two key entities in the key entity relation diagram of the $d$-th key medical record text for which both the two key entities and the relationship between them are the same as in the real entity relation diagram of the $d$-th key medical record text, and $\sigma_d$ is the number of relationships formed by two key entities in the real entity relation diagram of the $d$-th key medical record text.
S540, continuously adjusting the key parameter $\zeta$ until $\zeta \le \zeta_0$ to obtain the preset neural network model, where $\zeta_0$ is a preset key parameter threshold.
Specifically, $\zeta_0$ has a value ranging from 0.1 to 0.3, and those skilled in the art know that $\zeta_0$ can be selected according to actual requirements, which all fall into the protection scope of the present invention, and are not described herein.
S502, processing the intermediate entity relation graph based on a preset rule list to obtain the first target entity relation diagram, wherein the preset rule list comprises a plurality of preset rules, each preset rule is set based on the semantic information of a target entity and the information contained in the target label corresponding to that target entity, and the first target entity relation diagram is the intermediate entity relation graph after the preset rule list has been used to determine the connection direction between the child node of each entity to be selected and the child node of the entity it depends on.
Specifically, for example, the target labels are classified into types, and different types of target labels have different priorities, in the following order: attribute class < auxiliary class < position class < detection content class < pathological manifestation class < treatment class < diagnosis class < inspection mode class. A lower priority depends on a higher priority; that is, when the priority of one target label is lower than that of another target label, the target child node corresponding to the former points to the target child node corresponding to the latter in the target entity relation diagram.
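The priority rule in this example can be sketched as a lookup that orients an edge from the lower-priority node to the higher-priority node. The label strings, the `orient` function, and the handling of equal priorities (returning `None`, since the text gives no rule for that case) are assumptions made for illustration.

```python
from typing import Dict, Optional, Tuple

# Priority order from the example, lowest first.
PRIORITY = [
    "attribute",
    "auxiliary",
    "position",
    "detection content",
    "pathological manifestation",
    "treatment",
    "diagnosis",
    "inspection mode",
]
RANK = {label: rank for rank, label in enumerate(PRIORITY)}


def orient(edge: Tuple[int, int], labels: Dict[int, str]) -> Optional[Tuple[int, int]]:
    """Return (source, target) so that the lower-priority node points to the higher one."""
    u, v = edge
    if RANK[labels[u]] < RANK[labels[v]]:
        return (u, v)
    if RANK[labels[u]] > RANK[labels[v]]:
        return (v, u)
    return None  # equal priority: not covered by the example rule


labels = {0: "inspection mode", 1: "position", 2: "pathological manifestation"}
print(orient((1, 2), labels))  # (1, 2): position points to pathological manifestation
print(orient((0, 2), labels))  # (2, 0): pathological manifestation points to inspection mode
```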
In the above steps, the entities in the medical record text are acquired by combining a plurality of models whose parameters are adjusted based on the sample data, which improves the accuracy of acquiring the entities in the medical record text and, in turn, the accuracy of acquiring the entity relationships. Combining the models with the rules and presenting the result in the form of a graph improves the efficiency of acquiring the entity relationships in the medical record text. The entities in the medical record text are processed first by the models and then by the rules set based on feature information such as the semantic information of the entities, which further improves the accuracy of the acquired entity relation diagram of the medical record text.
S600, according to the target entity list, a second target entity relation list corresponding to the target entity list is obtained.
Specifically, the system further includes $\beta$ preset entity relationship tags, where a preset entity relationship tag is the tag of a preset relationship between entities, that is, the tag of a relationship between words that represent a physical state of the user; for example, the occurrence position relationship and the diagnosis relationship are entity relationship tags.
In a specific embodiment, the second intermediate entity relationship list is obtained in S600 by:
S601, acquiring a first candidate label list set corresponding to a candidate medical record text set according to the candidate medical record text set, wherein the candidate medical record text set comprises a plurality of candidate medical record texts.
Specifically, the candidate medical record text set includes a plurality of candidate medical record texts, wherein the candidate medical record texts are medical record texts including a preset entity relation label between entities.
Specifically, the first candidate tag list set includes a plurality of first candidate tag lists, where each candidate medical record text corresponds to one first candidate tag list, the first candidate tag list includes a plurality of first candidate tags, and the first candidate tags are preset entity relationship tags corresponding to two entities actually existing in each candidate medical record text.
S602, acquiring a candidate entity list set corresponding to the candidate medical record text set according to the candidate medical record text set, wherein the candidate entity list set comprises a plurality of candidate entity lists, and the candidate entity list comprises a plurality of candidate entities.
Specifically, the acquisition mode of the candidate entity is consistent with the acquisition mode of the target entity.
S603, inputting a candidate entity list set into an initial neural network model, and acquiring a second candidate tag list set, wherein the second candidate tag list set comprises a plurality of second candidate tag lists, the second candidate tag list comprises a plurality of second candidate tags, and the second candidate tags are preset entity relation tags corresponding to two candidate entities in each candidate entity list acquired based on the initial neural network model.
S604, acquiring a target parameter θ corresponding to the initial neural network model according to the first candidate tag list set and the second candidate tag list set.
Specifically, θ is acquired in S604 by:
S6041, according to the first candidate tag list set, obtaining a first target number list Y = {Y_1, …, Y_α, …, Y_β} corresponding to the preset entity relationship tag list, where Y_α is the first target number corresponding to the α-th preset entity relationship tag, α = 1, …, β, and the first target number is the number of occurrences of each preset entity relationship tag in the first candidate tag list set.
S6043, according to the second candidate tag list set, obtaining a second target number list R = {R_1, …, R_α, …, R_β} corresponding to the preset entity relationship tag list, where R_α is the second target number corresponding to the α-th preset entity relationship tag, and the second target number is the number of occurrences of each preset entity relationship tag in the second candidate tag list set.
S6045, according to a sample medical record text set, obtaining a third target number list P = {P_1, …, P_α, …, P_β} corresponding to the preset entity relationship tag list, where P_α is the third target number corresponding to the α-th preset entity relationship tag, the third target number is the number of occurrences of each preset entity relationship tag in the sample medical record texts, the sample medical record text set comprises a plurality of sample medical record texts, and a sample medical record text is a medical record text in which entities have been labeled.
S6047, acquiring θ according to Y, R and P, where θ satisfies the following condition:
S605, continuously adjusting the target parameter θ until a preset target condition is met, so as to obtain a target entity relationship model, where the preset target condition is θ ≤ θ_0 and θ_0 is a preset target parameter threshold.
Specifically, the value range of θ_0 is 0.1-0.3; those skilled in the art may select θ_0 according to actual requirements, and all such selections fall within the protection scope of the present invention and are not described again here.
Specifically, the window corresponding to the target entity relationship model is determined based on the number of text characters of the intermediate entities in the obtained intermediate entity list.
S606, inputting the intermediate entity list into a target entity relation model, and obtaining a second intermediate entity relation list corresponding to the intermediate medical record text, wherein the second intermediate entity relation list comprises a plurality of second intermediate entity relations, and the second intermediate entity relations comprise preset entity relation labels matched based on the target entity relation model and two intermediate entities corresponding to the preset entity labels.
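The counting steps S6041-S6043 and the stopping rule of S605 can be sketched as follows. This is a minimal illustration, not the patented implementation: the label names, the helper names `count_labels` and `train_until`, and the callables passed to `train_until` are all hypothetical. Since the condition that θ satisfies is not reproduced in this text, the sketch treats the computation of θ as an opaque callable and makes no assumption about how it combines Y, R and P.

```python
from collections import Counter

def count_labels(tag_lists, preset_labels):
    """Count how often each preset label occurs across a set of tag lists."""
    counts = Counter(tag for tags in tag_lists for tag in tags)
    return [counts.get(label, 0) for label in preset_labels]

def train_until(compute_theta, train_step, theta_0=0.2, max_rounds=1000):
    """Repeat train_step until compute_theta() <= theta_0 (S605).

    compute_theta and train_step are hypothetical callables supplied by
    the caller; max_rounds is a safety limit that is not in the source.
    """
    for _ in range(max_rounds):
        theta = compute_theta()
        if theta <= theta_0:
            return theta
        train_step()
    raise RuntimeError("theta did not reach the threshold")

# Hypothetical preset entity relationship labels (beta = 2).
PRESET_LABELS = ["occurrence_site", "diagnosis"]
# First candidate tag lists: labels actually present in each record.
first_candidate = [["occurrence_site", "diagnosis"], ["diagnosis"]]
# Second candidate tag lists: labels predicted by the initial model.
second_candidate = [["occurrence_site"], ["diagnosis"]]

Y = count_labels(first_candidate, PRESET_LABELS)
R = count_labels(second_candidate, PRESET_LABELS)

# Toy stand-in for training: each step halves a stored theta value.
state = {"theta": 1.0}

def halve_theta():
    state["theta"] /= 2

final_theta = train_until(lambda: state["theta"], halve_theta)
```

The default `theta_0=0.2` is simply a value inside the 0.1-0.3 range stated above.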
In another specific embodiment, the second intermediate entity relationship list is obtained in S600 by:
S61, acquiring a candidate entity list set corresponding to the candidate medical record text set according to the candidate medical record text set, wherein the candidate entity list set comprises a plurality of candidate entity lists, and the candidate entity list comprises a plurality of candidate entities.
Specifically, the obtaining mode of the candidate entity is consistent with the obtaining mode of the candidate entity in the above embodiment.
S62, inputting the candidate entity list set into the initial neural network model, and obtaining a second candidate tag list set and a second candidate priority list G = {G_1, …, G_x, …, G_p}, where G_x is the second candidate priority corresponding to the x-th second candidate tag, x = 1, …, p, and p is the number of second candidate priorities.
Specifically, the second candidate tag list set includes a plurality of second candidate tag lists, where the second candidate tag list includes a plurality of second candidate tags, and the second candidate tags are preset entity relationship tags corresponding to two candidate entities in each candidate entity list obtained based on an initial neural network model.
Further, the second candidate priority is the degree to which the relationship between two candidate entities, as obtained by the initial neural network model, matches a preset entity relationship label.
S63, when ε/p ≥ F_0, acquiring a key parameter γ corresponding to the initial neural network model by adopting a first processing mode, where ε is the number of second candidate priorities in G that are not equal to 1, and F_0 is a preset priority threshold.
Specifically, the value range of F_0 is 0.5-0.7; those skilled in the art may select F_0 according to actual requirements, and all such selections fall within the protection scope of the present invention and are not described again here.
Specifically, γ is obtained in S63 by:
S631, according to the candidate medical record text set, a first candidate label list set corresponding to the candidate medical record text set is obtained, wherein the candidate medical record text set comprises a plurality of candidate medical record texts.
Specifically, the candidate medical record text set includes a plurality of candidate medical record texts, wherein the candidate medical record texts are medical record texts including a preset entity relation label between entities.
Specifically, the first candidate tag list set includes a plurality of first candidate tag lists, where each candidate medical record text corresponds to one first candidate tag list, the first candidate tag list includes a plurality of first candidate tags, and the first candidate tags are preset entity relationship tags corresponding to two entities actually existing in each candidate medical record text.
Specifically, a preset entity relationship tag is a preset tag for the relationship between two entities, that is, a tag for the correspondence between words that characterize a physical state of a user, for example, entity relationship labels such as the relationship of the occurrence position and the diagnosis.
S632, inputting the candidate entity list set into an initial neural network model, and acquiring a second candidate tag list set, wherein the second candidate tag list set comprises a plurality of second candidate tag lists, the second candidate tag list comprises a plurality of second candidate tags, and the second candidate tags are preset entity relation tags corresponding to two candidate entities in each candidate entity list acquired based on the initial neural network model.
Specifically, those skilled in the art know that any prior-art method for obtaining labels by training a neural network model falls within the protection scope of the present invention and is not described again here; for example, a PCNN model is one such neural network model.
S633, according to the first candidate tag list set, obtaining a first target number list Y = {Y_1, …, Y_α, …, Y_β} corresponding to the preset entity relationship tag list, where Y_α is the first target number corresponding to the α-th preset entity relationship tag, α = 1, …, β, and the first target number is the number of occurrences of each preset entity relationship tag in the first candidate tag list set.
S634, according to the second candidate tag list set, obtaining a second target number list R = {R_1, …, R_α, …, R_β} corresponding to the preset entity relationship tag list, where R_α is the second target number corresponding to the α-th preset entity relationship tag, and the second target number is the number of occurrences of each preset entity relationship tag in the second candidate tag list set.
S635, acquiring γ according to Y and R, where γ satisfies the following condition:
φ is a preset weight coefficient.
Specifically, the value range of φ is 0-1; those skilled in the art may select φ according to actual requirements, and all such selections fall within the protection scope of the present invention and are not described again here.
S64, when ε/p < F_0, acquiring the key parameter γ corresponding to the initial neural network model by adopting a second processing mode.
Specifically, the γ acquisition method in step S64 is identical to that of θ in the above embodiment.
S65, continuously adjusting the key parameter γ until a preset target condition is met, so as to obtain a target entity relationship model, where the preset target condition is γ ≤ γ_0 and γ_0 is a preset key parameter threshold.
S66, inputting the intermediate entity list into a target entity relation model, and obtaining a second intermediate entity relation list corresponding to the intermediate medical record text, wherein the second intermediate entity relation list comprises a plurality of second intermediate entity relations, and the second intermediate entity relations comprise preset entity relation labels matched based on the target entity relation model and two intermediate entities corresponding to the preset entity labels.
In this way, the entities in the medical record text are acquired by combining a plurality of models whose parameters are continuously adjusted based on sample data, which improves the accuracy of acquiring the entities and, in turn, the accuracy of acquiring the entity relationships. In addition, the parameters of the model are adjusted in different modes depending on the matching condition observed during model training, which improves the accuracy of acquiring the entity relationships of the medical record text.
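The choice between the two processing modes in S63 and S64 reduces to a ratio test on the second candidate priority list G. The following sketch shows only that test; the function name and the returned mode names are hypothetical, and the default threshold is just a value inside the stated 0.5-0.7 range.

```python
def choose_processing_mode(priorities, f_0=0.6):
    """Pick the processing mode for the key parameter gamma (S63/S64).

    priorities: the second candidate priority list G.
    f_0: the preset priority threshold F_0.
    """
    p = len(priorities)
    if p == 0:
        raise ValueError("the priority list G must not be empty")
    # epsilon: number of second candidate priorities that are not 1.
    epsilon = sum(1 for g in priorities if g != 1)
    return "first" if epsilon / p >= f_0 else "second"

# Three of four priorities differ from 1, so epsilon / p = 0.75.
mode_a = choose_processing_mode([1, 0.4, 0.7, 0.9])
# One of four priorities differs from 1, so epsilon / p = 0.25.
mode_b = choose_processing_mode([1, 1, 1, 0.5])
```

Read this way, a large share of imperfect matches sends training to the first processing mode, while a mostly well-matched model reuses the procedure described for θ.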
S700, a target event list corresponding to the target medical record text is obtained according to the first target entity relation diagram and the second target entity relation list.
Specifically, the system further comprises a plurality of first medical record text labels and a plurality of second medical record text labels.
Further, the first medical record text label is a type of medical record text related to a process node in the process of handling the user when the body of the user is abnormal, for example: signs, examination, medication, etc.
Further, the second medical record text label is a type of medical record text that is not related to the process nodes involved in handling the user when the body of the user is abnormal and that corresponds to characteristics of the user, for example, smoking history, menstrual history and allergy history; the second medical record text label is provided by the user and differs from user to user.
Specifically, in S700, the following steps are further included:
S701, obtaining a target medical record text label corresponding to a target medical record text, wherein the target medical record text label is a text type corresponding to the target medical record text obtained based on the content and the structure corresponding to a plurality of titles in the target medical record text.
S702, when the target medical record text label is consistent with the first medical record text label, executing S100-S600 to obtain a target triplet list corresponding to the medical record text to be selected, wherein the target triplet list comprises a plurality of target triples, and the target triples comprise two connected to-be-selected entities obtained from the first to-be-selected entity relation diagram, a connection relation corresponding to the two to-be-selected entities, two to-be-selected entities obtained from the second to-be-selected entity relation list and entity relation labels corresponding to the two to-be-selected entities.
S703, acquiring a target event list corresponding to the medical record text to be selected based on the target triplet list, wherein the target event list comprises a plurality of target events, and the target events are target triples acquired from the target triplet list.
S704, when the target medical record text label is consistent with the second medical record text label, obtaining a target triplet list corresponding to the medical record text to be selected, wherein the target triplet list comprises a plurality of target triples, and each target triplet comprises a trigger word corresponding to an event obtained from the medical record text to be selected based on an event extraction model and two arguments corresponding to the trigger word.
Specifically, those skilled in the art know that any method for extracting an event from a text by using an event extraction model in the prior art falls within the protection scope of the present invention, and is not described herein.
Specifically, the argument is an element participating in the occurrence of an event and is composed of entities in the medical record text to be selected.
In this way, different methods are adopted to obtain the events in the medical record text depending on the text type of the medical record text, so that the events are acquired more accurately. Meanwhile, the entities in the medical record text are obtained by combining a plurality of models whose parameters are continuously adjusted based on sample data, which improves the accuracy of obtaining the entities and, in turn, the entity relationships. The trained models are used to obtain the entity relationships in the medical record text, and the parameters of the models are adjusted in different modes depending on the matching condition during model training, which improves the accuracy of obtaining the entity relationships and therefore the accuracy of obtaining the events in the medical record text.
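The type-dependent handling in S701-S704 amounts to a dispatch on the medical record text label. The sketch below assumes that records with a first medical record text label go through the entity relationship pipeline (S100-S600) and that records with a second medical record text label go through an event extraction model. Both pipelines are represented by hypothetical callables, and the label names are illustrative values modelled on the examples in the text.

```python
from typing import Callable, Iterable, List, Tuple

Triple = Tuple[str, str, str]

# Hypothetical label names; the real label sets are not listed in the text.
FIRST_LABELS = {"signs", "examination", "medication"}
SECOND_LABELS = {"smoking_history", "menstrual_history", "allergy_history"}

def get_target_events(
    text: str,
    label: str,
    relation_pipeline: Callable[[str], Iterable[Triple]],
    event_extractor: Callable[[str], Iterable[Triple]],
) -> List[Triple]:
    """Return the target event list for one medical record text."""
    if label in FIRST_LABELS:
        # Triples of (entity, relationship, entity).
        return list(relation_pipeline(text))
    if label in SECOND_LABELS:
        # Triples of (argument, trigger word, argument).
        return list(event_extractor(text))
    raise ValueError(f"unknown medical record text label: {label!r}")

# Stub pipelines standing in for the trained models.
def stub_relations(text: str) -> List[Triple]:
    return [("chest", "occurrence_site", "pain")]

def stub_events(text: str) -> List[Triple]:
    return [("patient", "smoked", "20 years")]

events_first = get_target_events("...", "examination", stub_relations, stub_events)
events_second = get_target_events("...", "smoking_history", stub_relations, stub_events)
```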
S800, acquiring an initial medical record text set, wherein the initial medical record text set comprises a plurality of initial medical record texts, and the initial medical record texts are medical record texts acquired from a preset database.
Specifically, those skilled in the art know that the database for obtaining the medical record text can be selected according to the actual requirement, which falls into the protection scope of the present invention, and will not be described herein.
S900, acquiring an intermediate medical record text set according to the initial medical record text set, wherein the intermediate medical record text set comprises a plurality of intermediate medical record texts, and the intermediate medical record texts are initial medical record texts corresponding to each user acquired from the initial medical record text set.
Specifically, it can be understood that the intermediate medical record text set is a multi-dimensional data set obtained by splitting the initial medical record text set by user, that is, by patient.
Specifically, the data format corresponding to the intermediate medical record text is a JSON format, and those skilled in the art know that any method for converting the text into the JSON data format in the prior art falls into the protection scope of the present invention, and is not described herein.
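Steps S800-S900 can be pictured as grouping the initial medical record texts by patient and serializing each group as JSON. The sketch below is one possible reading; the field names `patient_id`, `text` and `records` are hypothetical, since the text does not specify a schema.

```python
import json
from collections import defaultdict

def split_by_patient(initial_records):
    """Group initial medical record texts per patient and emit JSON strings."""
    grouped = defaultdict(list)
    for record in initial_records:
        grouped[record["patient_id"]].append(record["text"])
    # ensure_ascii=False keeps Chinese characters readable in the output.
    return [
        json.dumps({"patient_id": pid, "records": texts}, ensure_ascii=False)
        for pid, texts in grouped.items()
    ]

initial_records = [
    {"patient_id": "p1", "text": "admission note"},
    {"patient_id": "p2", "text": "examination report"},
    {"patient_id": "p1", "text": "discharge summary"},
]
intermediate = split_by_patient(initial_records)
```

Each element of `intermediate` is one intermediate medical record text holding all records of a single patient, in the order they were read.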
S1000, obtaining an intermediate event knowledge graph corresponding to the intermediate medical record text set according to the intermediate medical record text set, wherein the intermediate event knowledge graph comprises a plurality of intermediate triples.
Specifically, the obtaining mode of the intermediate triplet is consistent with the obtaining mode of the target triplet.
S1100, acquiring a disease type database according to a preset intermediate rule list and an intermediate event knowledge graph, wherein the disease type database comprises content data included in processing nodes related to disease types.
Specifically, the preset intermediate rule list includes a plurality of preset intermediate rules, where the preset intermediate rules are rules for judging entities in the intermediate event knowledge graph based on characteristics of disease types.
By combining the NLP system with the rules, the data in the medical record text can be understood and analyzed more accurately, which improves both the accuracy and the efficiency of acquiring the disease type database. The NLP system provides the understanding and analysis of the medical record text data and thereby supplies richer and more accurate information to the rule engine, so that the disease type database is acquired more accurately.
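One way to picture S1100 is a list of rules, each a predicate tied to a disease type, applied to every triple of the intermediate event knowledge graph. The sketch below is an assumption about the shape of such a rule engine; the rule contents, disease type names and triples are invented for illustration only.

```python
def build_disease_database(triples, rules):
    """Collect, per disease type, the triples accepted by that type's rules.

    triples: iterable of (head, relation, tail) from the knowledge graph.
    rules: list of (disease_type, predicate) pairs; each predicate takes
           (head, relation, tail) and returns True when the rule matches.
    """
    database = {}
    for head, relation, tail in triples:
        for disease_type, predicate in rules:
            if predicate(head, relation, tail):
                database.setdefault(disease_type, []).append((head, relation, tail))
    return database

# Hypothetical rules keyed on the entity at the tail of a diagnosis relation.
rules = [
    ("lung_disease", lambda h, r, t: r == "diagnosis" and "pneumonia" in t),
    ("heart_disease", lambda h, r, t: r == "diagnosis" and "angina" in t),
]
triples = [
    ("cough", "diagnosis", "pneumonia"),
    ("chest pain", "diagnosis", "angina"),
    ("chest", "occurrence_site", "pain"),
]
disease_db = build_disease_database(triples, rules)
```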
The system for acquiring the corresponding entity and the entity tag of the medical record text provided by this embodiment comprises: N preset entity tags, a processor and a memory storing a computer program which, when executed by the processor, implements the steps of: obtaining an initial feature vector list corresponding to a target medical record text; inputting the initial feature vector list into a preset CNN model to obtain an intermediate feature vector list corresponding to the target medical record text; inputting the intermediate feature vector list into a preset Transformer model to obtain a target feature vector list corresponding to the target medical record text; and obtaining, according to the target feature vector list, a target entity list corresponding to the target medical record text and a target label list corresponding to the target entity list.
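The dimension constraints that claim 1 places on this pipeline can be checked with simple arithmetic. The sketch below reads the condition on η_2 as m/η_1 < η_2 ≤ (m × μ)/η_1 with η_2 a positive multiple of the number of attention heads a; that reading, the choice of the smallest admissible η_2, the function names and the numeric values are all assumptions made for illustration.

```python
def choose_eta_2(m, eta_1, mu, a):
    """Smallest multiple of a with m/eta_1 < eta_2 <= (m * mu)/eta_1.

    m: dimension of the initial feature vector.
    eta_1: number of convolution kernel types.
    mu: preset parameter bounding eta_2 from above.
    a: number of attention heads.
    """
    # Smallest k such that a * k * eta_1 > m, using integer arithmetic.
    k = m // (eta_1 * a) + 1
    eta_2 = a * k
    if eta_2 * eta_1 > m * mu:
        raise ValueError("no multiple of a lies in the admissible range")
    return eta_2

def intermediate_dim(eta_1, eta_2):
    """s = 2 * eta_1 * eta_2, the dimension of the intermediate vectors."""
    return 2 * eta_1 * eta_2

def target_dim(num_entity_tags):
    """g = (N + 1) * 4, the dimension of the target feature vectors."""
    return (num_entity_tags + 1) * 4

# Illustrative values only: m = 128, four kernel types, mu = 2, eight heads.
eta_2 = choose_eta_2(m=128, eta_1=4, mu=2, a=8)
s = intermediate_dim(4, eta_2)
g = target_dim(10)
```

Because η_2 is a multiple of a, s is also divisible by a, which is the usual requirement for splitting a model dimension evenly across attention heads.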
While certain specific embodiments of the invention have been described in detail by way of example, it will be appreciated by those skilled in the art that the above examples are for illustration only and are not intended to limit the scope of the invention. Those skilled in the art will also appreciate that many modifications may be made to the embodiments without departing from the scope and spirit of the invention. The scope of the present disclosure is defined by the appended claims.

Claims (9)

1. An acquisition system for corresponding entities and entity tags of medical record texts, which is characterized by comprising: a processor and a memory storing a computer program which, when executed by the processor, performs the steps of:
S100, obtaining an initial feature vector list A = {A_1, …, A_i, …, A_n} corresponding to the target medical record text, A_i = (A_i1, …, A_ij, …, A_im), where A_i is the initial feature vector corresponding to the i-th character in the target medical record text, A_ij is the value of the j-th element of A_i, j = 1, …, m, m is the dimension of the initial feature vector, i = 1, …, n, and n is the number of characters in the target medical record text;
S200, inputting A into a preset CNN model, and obtaining an intermediate feature vector list B = {B_1, …, B_i, …, B_n} corresponding to the target medical record text, B_i = (B_i1, …, B_ir, …, B_is), where B_i is the intermediate feature vector corresponding to the i-th character in the target medical record text, B_ir is the value of the r-th element of B_i, r = 1, …, s, and s is the dimension of the intermediate feature vector, wherein s is obtained in S200 by the following steps:
S201, acquiring a first target parameter η_1 corresponding to the preset CNN model, where η_1 is the number of types of convolution kernels in the preset CNN model;
S203, according to η_1, obtaining a second target parameter η_2 corresponding to the preset CNN model, where η_2 satisfies the following conditions:
m/η_1 < η_2 ≤ (m × μ)/η_1 and η_2 = a × M, where μ is a preset parameter, a is the number of attention heads in the preset Transformer model, and M is any positive integer;
S205, obtaining the dimension s of the intermediate feature vector according to η_1 and η_2, where s satisfies the following condition:
s = 2 × η_1 × η_2;
S300, inputting B into a preset Transformer model, and obtaining a target feature vector list C = {C_1, …, C_i, …, C_n} corresponding to the target medical record text, C_i = (C_i1, …, C_ih, …, C_ig), where C_i is the target feature vector corresponding to the i-th character in the target medical record text, C_ih is the value of the h-th element of C_i, h = 1, …, g, and g is the dimension of the target feature vector, where g satisfies the following condition:
g = (N + 1) × 4, where N is the number of preset entity tags;
S400, acquiring, according to C, a target entity list corresponding to the target medical record text and a target label list corresponding to the target entity list, wherein the target entity list comprises a plurality of target entities, the target entities are the entities in the target medical record text identified by combining the preset CNN model and the preset Transformer model, the target label list comprises a plurality of target labels, and the target labels are the preset entity labels into which the target entities are classified.
2. The system according to claim 1, wherein the initial feature vector is a feature vector generated by concatenating a vector generated based on content information corresponding to the Chinese characters in the target medical record text with a vector generated based on position information corresponding to those Chinese characters.
3. The system of claim 1, wherein m = m_1 + m_2, where m_1 is the dimension of the vector generated based on content information corresponding to the Chinese characters in the target medical record text, and m_2 is the dimension of the vector generated based on position information corresponding to the Chinese characters in the target medical record text.
4. The system for obtaining medical record text corresponding to entities and entity tags according to claim 1, wherein the target medical record text is a medical record text from which entities and the tags corresponding to the entities are to be obtained.
5. The system for obtaining a corresponding entity and an entity tag of a medical record text according to claim 1, wherein η 1 is obtained in S201 by:
S2011, acquiring a sample medical record text set, wherein the sample medical record text set comprises a plurality of sample medical record texts, and the sample medical record texts are medical record texts in which entities have been labeled;
S2012, acquiring a sample entity set according to the sample medical record text set, wherein the sample entity set comprises a plurality of sample entities, and the sample entities are marked entities in the sample medical record text;
S2013, according to the sample entity set, acquiring a first sample number list and a second sample number list F = {F_1, …, F_t, …, F_f} corresponding to the sample entity type list, where F_t is the second sample number corresponding to the t-th sample entity type, t = 1, …, f, and f is the number of second sample numbers in the second sample number list;
S2014, according to F, obtaining a target entity type list E = {E_1, …, E_v, …, E_b} and a third sample number list E^0 = {E^0_1, …, E^0_v, …, E^0_b} corresponding to E, where E_v is the v-th target entity type, E^0_v is the third sample number corresponding to E_v, v = 1, …, b, and b is the number of target entity types, where b = ⌊α × f⌋, ⌊α × f⌋ is the largest integer not exceeding (α × f), and α is a preset percentage;
S2015, according to E^0, obtaining a fourth sample number list E^1 = {E^1_1, …, E^1_u, …, E^1_w}, where E^1_u is the u-th fourth sample number, u = 1, …, w, w = b, and the fourth sample numbers are the third sample numbers in E^0 reordered from small to large;
S2016, when E^1_u - E^1_(u-1) ≠ 1 exists in E^1, setting α = α - α_0 and repeatedly executing S2014-S2015 until a preset loop termination condition is satisfied, where the preset loop termination condition is E^1_u - E^1_(u-1) = 1, α_0 is a preset parameter threshold, and E^1_(u-1) is the (u-1)-th fourth sample number;
S2017, when E^1_u - E^1_(u-1) = 1, obtaining η_1 = b.
6. The system for obtaining medical record text corresponding entities and entity tags according to claim 5, wherein the first sample number list comprises a plurality of first sample numbers, wherein the first sample numbers are numbers of text characters corresponding to the sample entities.
7. The system for obtaining medical record text corresponding entities and entity tags according to claim 5, wherein the second number of samples is a number of sample entities corresponding to each sample entity type in the set of sample entities.
8. The system for obtaining an entity and an entity tag corresponding to medical record text according to claim 5, wherein the third sample number is a first sample number corresponding to a target entity type obtained from a first sample number list.
9. The system for obtaining the corresponding entity and the entity tag of the medical record text according to claim 5, wherein the value range of α is 80%-90%.
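The loop of claim 5 (S2013-S2017) can be sketched as follows. The claims leave two points open, so this sketch makes loudly labeled assumptions: the b target entity types are taken to be the b most frequent sample entity types, and the character count associated with a type is taken to be the most common character length among that type's sample entities. The function name, the default for α_0 and the sample data are hypothetical.

```python
import math
from collections import Counter

def select_eta_1(sample_entities, alpha=0.9, alpha_0=0.1):
    """Choose eta_1, the number of convolution kernel types.

    sample_entities: list of (entity_text, entity_type) pairs.
    alpha: preset percentage; alpha_0: decrement applied on each retry.
    """
    if alpha_0 <= 0:
        raise ValueError("alpha_0 must be positive so the loop terminates")
    # F: number of sample entities per sample entity type (S2013).
    type_counts = Counter(etype for _, etype in sample_entities)
    f = len(type_counts)
    # ASSUMPTION: one representative character length per type, taken
    # as the most common length among that type's entities.
    length_of = {}
    for etype in type_counts:
        lengths = Counter(len(text) for text, t in sample_entities if t == etype)
        length_of[etype] = lengths.most_common(1)[0][0]
    # ASSUMPTION: target entity types are the most frequent types.
    ranked = [etype for etype, _ in type_counts.most_common()]
    while True:
        b = math.floor(alpha * f)                          # S2014
        if b <= 1:
            return max(b, 0)  # no adjacent pair left to compare
        e1 = sorted(length_of[t] for t in ranked[:b])      # S2015
        if all(e1[u] - e1[u - 1] == 1 for u in range(1, b)):
            return b                                       # S2017
        alpha -= alpha_0                                   # S2016

# Toy data: types A-C have lengths 2, 3, 4; rare type D has length 9.
samples = (
    [("ab", "A")] * 4 + [("abc", "B")] * 3
    + [("abcd", "C")] * 2 + [("abcdefghi", "D")]
)
eta_1 = select_eta_1(samples, alpha=0.9, alpha_0=0.1)
```

With these toy samples, the three most frequent types have consecutive lengths 2, 3 and 4, so the loop stops on the first pass; including the rare type D would break the run of consecutive lengths and force α to be lowered.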
CN202410488600.0A 2024-04-23 2024-04-23 Acquisition system for corresponding entity and entity tag of medical record text Active CN118093736B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202410488600.0A CN118093736B (en) 2024-04-23 2024-04-23 Acquisition system for corresponding entity and entity tag of medical record text

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202410488600.0A CN118093736B (en) 2024-04-23 2024-04-23 Acquisition system for corresponding entity and entity tag of medical record text

Publications (2)

Publication Number Publication Date
CN118093736A true CN118093736A (en) 2024-05-28
CN118093736B CN118093736B (en) 2024-08-16

Family

ID=91155284

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202410488600.0A Active CN118093736B (en) 2024-04-23 2024-04-23 Acquisition system for corresponding entity and entity tag of medical record text

Country Status (1)

Country Link
CN (1) CN118093736B (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110032648A (en) * 2019-03-19 2019-07-19 微医云(杭州)控股有限公司 A kind of case history structuring analytic method based on medical domain entity
CN114239582A (en) * 2021-12-15 2022-03-25 天津健康医疗大数据有限公司 Electronic medical record detail extraction method and system based on semantic information
US20220301670A1 (en) * 2019-09-06 2022-09-22 Roche Molecular Systems, Inc. Automated information extraction and enrichment in pathology report using natural language processing
CN117556034A (en) * 2023-11-14 2024-02-13 生命奇点(北京)科技有限公司 Data processing system for standardizing output results of electronic medical record question-answering model

Also Published As

Publication number Publication date
CN118093736B (en) 2024-08-16

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant