CN117454989B - System for updating electronic medical record question-answer model based on parameter adjustment - Google Patents
- Publication number
- CN117454989B (application number CN202311514678.7A)
- Authority
- CN
- China
- Prior art keywords
- text
- medical record
- electronic medical
- preset
- question
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N5/00—Computing arrangements using knowledge-based models
- G06N5/04—Inference or reasoning models
- G06N5/041—Abduction
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/33—Querying
- G06F16/332—Query formulation
- G06F16/3329—Natural language query formulation or dialogue systems
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/36—Creation of semantic tools, e.g. ontology or thesauri
- G06F16/367—Ontology
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
- G06N3/0455—Auto-encoder networks; Encoder-decoder networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
- G06N3/0985—Hyperparameter optimisation; Meta-learning; Learning-to-learn
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N5/00—Computing arrangements using knowledge-based models
- G06N5/02—Knowledge representation; Symbolic representation
- G06N5/022—Knowledge engineering; Knowledge acquisition
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16H—HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
- G16H10/00—ICT specially adapted for the handling or processing of patient-related medical or healthcare data
- G16H10/60—ICT specially adapted for the handling or processing of patient-related medical or healthcare data for patient-specific data, e.g. for electronic patient records
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16H—HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
- G16H50/00—ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
- G16H50/20—ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for computer-aided diagnosis, e.g. based on medical expert systems
Abstract
The invention relates to a system for updating an electronic medical record question-answering model based on parameter adjustment. The system comprises a sample electronic medical record information set, a processor, and a memory storing a computer program. When the computer program is executed by the processor, the following steps are implemented: an initial electronic medical record question-answering model is obtained from the sample electronic medical record information set; when the data volume of the training set corresponding to the initial model is larger than a preset data volume threshold, a candidate parameter list is obtained; a first intermediate priority list is obtained from the candidate parameter list; a second intermediate priority set and a third intermediate priority set are respectively obtained based on a first preset text; and a target parameter is adjusted according to a final priority list to update the initial electronic medical record question-answering model.
Description
Technical Field
The invention relates to the technical field of large language models, in particular to a system for updating an electronic medical record question-answering model based on parameter adjustment.
Background
With the continuous growth of medical services and the continuous development of artificial intelligence technology, the digitization of medical records has become a trend, and more and more models in the medical field are designed on the basis of medical record documents. The performance of these models determines the reliability of the data processing results. With the wide application of large language models in natural language processing, improving their inference speed in actual deployment has become a critical problem, and adjusting the parameters of electronic medical record question-answering models has therefore become a popular research direction.
At present, the prior-art method for updating an electronic medical record question-answering model is as follows: rule matching is performed based on regular expressions, training and inference are performed based on a traditional machine learning model to obtain the electronic medical record question-answering model, and the parameters of the model are adjusted by a batch inference method.
In summary, the existing method for updating the electronic medical record question-answering model has the following problems: the model parameters are not adjusted according to the data volume, which reduces the fit of the model; when the data volume is too large, model training takes longer and wastes resources; and the inference and response capabilities of the model decline while its parameters are being adjusted, which lowers the accuracy of the output of the electronic medical record question-answering model.
Disclosure of Invention
The invention provides a system for updating an electronic medical record question-answering model based on parameter adjustment. The system comprises a sample electronic medical record information set, a processor, and a memory storing a computer program, wherein the sample electronic medical record information set comprises a plurality of pieces of sample electronic medical record information, each piece being abnormal state feature information from a medical record acquired from a database, and when the computer program is executed by the processor, the following steps are implemented:
S100, acquiring an initial electronic medical record question-answering model according to the sample electronic medical record information set.
S200, when the data volume of the training set corresponding to the initial electronic medical record question-answering model is larger than a preset data volume threshold, acquiring a candidate parameter list $\omega = \{\omega_1, \ldots, \omega_c, \ldots, \omega_w\}$ corresponding to the initial electronic medical record question-answering model, where $\omega_c$ is the c-th candidate parameter, $c = 1, \ldots, w$, $w$ is the number of candidate parameters, $\omega_c = 2c$, and $w = 6$.
S300, obtaining, according to $\omega$, a first intermediate priority list $T\omega = \{T\omega_1, \ldots, T\omega_c, \ldots, T\omega_w\}$ corresponding to $\omega$, where $T\omega_c$ is the first intermediate priority corresponding to $\omega_c$.
S400, when the first preset text is a first preset text of the first type, acquiring, based on the preset weight types, a second intermediate priority set $E\omega = \{E\omega_1, \ldots, E\omega_c, \ldots, E\omega_w\}$ corresponding to $\omega$, with $E\omega_c = \{E\omega_{c1}, \ldots, E\omega_{c\mu}, \ldots, E\omega_{c\tau}\}$, where $E\omega_{c\mu}$ is the $\mu$-th second intermediate priority in the second intermediate priority list corresponding to $\omega_c$, $\mu = 1, \ldots, \tau$, and $\tau$ is the number of preset weight types.
S500, when the first preset text is a first preset text of the second type, acquiring, based on the preset weight types, a third intermediate priority set $L\omega = \{L\omega_1, \ldots, L\omega_c, \ldots, L\omega_w\}$ corresponding to $\omega$, with $L\omega_c = \{L\omega_{c1}, \ldots, L\omega_{c\mu}, \ldots, L\omega_{c\tau}\}$, where $L\omega_{c\mu}$ is the $\mu$-th third intermediate priority in the third intermediate priority list corresponding to $\omega_c$.
S600, obtaining a final priority list $F\omega = \{F\omega_1, \ldots, F\omega_c, \ldots, F\omega_w\}$ corresponding to $\omega$ according to $T\omega$, $E\omega$ and $L\omega$, wherein $F\omega_c$ meets the following conditions:
S700, adjusting $\omega_c$ to be the target parameter of the initial electronic medical record question-answering model according to $F\omega$ so as to update the initial electronic medical record question-answering model, where $F\omega_c$ is the largest final priority in $F\omega$.
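The selection in S700 can be sketched as follows. This is only an illustration: the condition that defines each final priority is not reproduced in this text, so the priority values and candidate values below are hypothetical placeholders, and the function name `select_target_parameter` is an assumption. Only the step of picking the candidate parameter with the largest final priority is shown.

```python
def select_target_parameter(candidates, final_priorities):
    """Return the candidate parameter whose final priority is largest (S700)."""
    if not candidates or len(candidates) != len(final_priorities):
        raise ValueError("need two non-empty lists of equal length")
    # Index of the largest final priority; ties resolve to the first occurrence.
    best = max(range(len(candidates)), key=lambda c: final_priorities[c])
    return candidates[best]


# Hypothetical candidate parameters and final priorities (w = 6 entries).
omega = [10, 20, 30, 40, 50, 60]
f_omega = [0.41, 0.57, 0.73, 0.66, 0.52, 0.38]
target = select_target_parameter(omega, f_omega)
```

With these placeholder values the third candidate is selected because it carries the largest final priority.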
In the system of the invention for updating an electronic medical record question-answering model based on parameter adjustment, the system comprises a sample electronic medical record information set, a processor, and a memory storing a computer program, wherein the sample electronic medical record information set comprises a plurality of pieces of sample electronic medical record information, each piece being abnormal state feature information from a medical record acquired from a database. When the computer program is executed by the processor, the following steps are implemented: an initial electronic medical record question-answering model is obtained from the sample electronic medical record information set; when the data volume of the training set corresponding to the initial model is larger than a preset data volume threshold, a candidate parameter list corresponding to the initial model is obtained; a first intermediate priority list is obtained from the candidate parameter list; when the first preset text is of the first type, a second intermediate priority set corresponding to $\omega$ is obtained based on the preset weight types; when the first preset text is of the second type, a third intermediate priority set corresponding to $\omega$ is obtained based on the preset weight types; a final priority list is obtained from the first intermediate priority list, the second intermediate priority set and the third intermediate priority set; and the target parameter of the initial electronic medical record question-answering model is obtained from the final priority list to update the initial model.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings required for the description of the embodiments will be briefly described below, and it is apparent that the drawings in the following description are only some embodiments of the present invention, and other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
Fig. 1 is a flowchart of the computer program executed by a system for updating an electronic medical record question-answering model based on parameter adjustment according to an embodiment of the present invention.
Detailed Description
The following description of the embodiments of the present invention will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present invention, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to fall within the scope of the invention.
It should be noted that the terms "first," "second," and the like in the description and the claims of the present invention and the above figures are used for distinguishing between similar objects and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used may be interchanged where appropriate such that the embodiments of the invention described herein may be implemented in sequences other than those illustrated or otherwise described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or server that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed or inherent to such process, method, article, or apparatus, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
A system for updating an electronic medical record question-answering model based on parameter adjustment comprises a sample electronic medical record information set, a processor, and a memory storing a computer program which, when executed by the processor, implements the following steps, as shown in Fig. 1:
S100, acquiring an initial electronic medical record question-answering model according to the sample electronic medical record information set.
Specifically, in S100, an initial electronic medical record question-answer model is obtained by:
S10, acquiring a specified text vector set according to the sample electronic medical record information set.
Specifically, the sample electronic medical record information set includes a plurality of pieces of sample electronic medical record information. Each piece is abnormal state feature information from a medical record obtained from a database, that is, feature information associated with a disease, for example an abnormal glycoprotein (TAP) test result, or poorly differentiated squamous cell carcinoma of the nasopharynx.
Furthermore, those skilled in the art know that any public medical database from which medical records can be acquired may be selected according to actual requirements; all such choices fall within the protection scope of the present invention and are not described again here.
Further, the data format of the sample electronic medical record information comprises a text format and a table format.
Specifically, the system further comprises a target term knowledge graph stored as triples, wherein each triple in the target term knowledge graph comprises two entities related to an abnormal state and the relationship between those two entities.
Further, those skilled in the art know that any method for constructing a knowledge graph based on a target term in the prior art falls into the protection scope of the present invention, and is not described herein.
Specifically, in S10, the method further includes the following steps:
S1, acquiring, according to the sample electronic medical record information set, a candidate text set $A = \{A_1, \ldots, A_i, \ldots, A_n\}$, where $A_i$ is the i-th candidate text, $i = 1, \ldots, n$, and $n$ is the number of candidate texts.
Specifically, in S1, candidate texts are obtained by:
S11, when the data format of the sample electronic medical record information is a text format, splitting the sample electronic medical record information at delimiter symbols to generate candidate texts.
S13, when the data format of the sample electronic medical record information is a table format, combining each record in the sample electronic medical record information with the field names corresponding to that record to generate a candidate text. This can be understood as follows: when the field names in the sample electronic medical record information are, from left to right, ID, biopsy site and histological classification, and the contents of a given row are, from left to right, 008, nasopharynx and squamous cell carcinoma, the resulting candidate text is: "The biopsy site with ID 008 is the nasopharynx, and the histological classification is squamous cell carcinoma."
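Steps S11 and S13 can be illustrated with a minimal sketch. The delimiter set and the "field: value" sentence template are assumptions made for illustration; the text above only requires splitting at delimiter symbols and combining each record with its field names, and its own example uses a natural-language sentence rather than this template. The function names are hypothetical.

```python
import re


def split_text_record(text):
    """S11: split a free-text record into candidate texts at delimiter symbols.

    The delimiters (full stop "。", semicolons, newline) are an assumed choice.
    """
    parts = re.split(r"[。；;\n]+", text)
    return [part.strip() for part in parts if part.strip()]


def row_to_candidate_text(field_names, row):
    """S13: combine one table row with its field names into one candidate text."""
    if len(field_names) != len(row):
        raise ValueError("row and field_names must have the same length")
    return "; ".join(f"{name}: {value}" for name, value in zip(field_names, row))


text_candidates = split_text_record(
    "abnormal TAP result; poorly differentiated squamous cell carcinoma of the nasopharynx。"
)
table_candidate = row_to_candidate_text(
    ["ID", "biopsy site", "histological classification"],
    ["008", "nasopharynx", "squamous cell carcinoma"],
)
```

Keeping the field name next to each value is what lets a table row carry the same information as a sentence once it is treated as plain text.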
S3, acquiring, according to $A$ and the target term knowledge graph, a candidate keyword set $Q = \{Q_1, \ldots, Q_i, \ldots, Q_n\}$ corresponding to $A$, where $Q_i$ is the candidate keyword list corresponding to $A_i$.
Specifically, in S3, $Q_i$ is obtained by the following steps:
S31, acquiring, according to $A$, a first intermediate word set $B = \{B_1, \ldots, B_i, \ldots, B_n\}$ corresponding to $A$, with $B_i = \{B_{i1}, \ldots, B_{ij}, \ldots, B_{im(i)}\}$, where $B_{ij}$ is the j-th first intermediate word in the first intermediate word list corresponding to $A_i$, $j = 1, \ldots, m(i)$, and $m(i)$ is the number of first intermediate words in the first intermediate word list corresponding to $A_i$.
Specifically, the first intermediate word is a word extracted from the candidate text; those skilled in the art know that any prior-art method for extracting words from text falls within the protection scope of the present invention, and it is not described in detail here.
S33, acquiring, according to the target term knowledge graph, a target word list $D = \{D_1, \ldots, D_r, \ldots, D_s\}$, where $D_r$ is the r-th target word, $r = 1, \ldots, s$, and $s$ is the number of target words.
Specifically, the target word is an entity related to the abnormal state obtained from the target term knowledge graph.
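As a small illustration of S33, the sketch below stores a knowledge graph as (entity, relation, entity) triples and collects its entities into a target word list. The triples, relation names and the function name `target_words` are hypothetical; the text above does not prescribe a storage format beyond the triple form.

```python
# Hypothetical triples: (head entity, relation, tail entity).
TRIPLES = [
    ("nasopharyngeal carcinoma", "has_histology", "squamous cell carcinoma"),
    ("nasopharyngeal carcinoma", "located_in", "nasopharynx"),
    ("luteinizing hormone", "measured_by", "hormone test"),
]


def target_words(triples):
    """S33: collect every entity of the graph once, in first-seen order."""
    seen = {}
    for head, _relation, tail in triples:
        seen.setdefault(head, None)
        seen.setdefault(tail, None)
    return list(seen)


D = target_words(TRIPLES)
```

A dictionary is used instead of a set so that the order of the resulting list is deterministic.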
S35, acquiring, according to $B$ and $D$, a first intermediate similarity set $F = \{F_1, \ldots, F_i, \ldots, F_n\}$ corresponding to $B$, with $F_i = \{F_{i1}, \ldots, F_{ij}, \ldots, F_{im(i)}\}$ and $F_{ij} = \{F^1_{ij}, \ldots, F^r_{ij}, \ldots, F^s_{ij}\}$, where $F^r_{ij}$ is the first intermediate similarity between $B_{ij}$ and $D_r$.
Specifically, the first intermediate similarity is the similarity between the word vector corresponding to the first intermediate word and the word vector corresponding to the target word; those skilled in the art know that any prior-art method for calculating the similarity between vectors falls within the protection scope of the present invention, and it is not described here.
Further, the word vector corresponding to the first intermediate word is the vector obtained by inputting the first intermediate word into a natural language processing model; those skilled in the art know that any prior-art natural language processing model for converting text into vectors falls within the protection scope of the present invention, and it is not described again here.
S37, when $F^r_{ij} \geq F_0$, inserting $B_{ij}$ into $Q_i$, where $F_0$ is a preset first intermediate similarity threshold.
Specifically, the value range of $F_0$ is 0.8 to 0.9; those skilled in the art know that $F_0$ can be selected according to actual requirements, all such choices fall within the protection scope of the present invention, and they are not described here.
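The filtering in S35 and S37 can be sketched as below. Cosine similarity is an assumed choice, since the text leaves the similarity measure open, and the two-dimensional vectors are toy values standing in for the output of a natural language processing model. The sketch also assumes that a word is kept as soon as its similarity to at least one target word reaches the threshold.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def candidate_keywords(word_vectors, target_vectors, f0=0.85):
    """S37: keep each first intermediate word whose similarity to some target word is >= f0."""
    kept = []
    for word, vector in word_vectors.items():
        if any(cosine_similarity(vector, t) >= f0 for t in target_vectors.values()):
            kept.append(word)
    return kept


# Toy vectors for illustration only.
word_vectors = {"carcinoma": [1.0, 0.0], "patient": [0.0, 1.0]}
target_vectors = {"squamous cell carcinoma": [0.9, 0.1]}
Q_i = candidate_keywords(word_vectors, target_vectors, f0=0.85)
```

Here "carcinoma" is kept because its vector points almost in the same direction as the target word's vector, while "patient" falls well below the threshold.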
S5, acquiring, according to $A$ and $Q$, an initial text set $T = \{T_1, \ldots, T_i, \ldots, T_n\}$ with $T_i = \{A_i, Q_i\}$, where $T_i$ is the i-th initial text.
Specifically, the initial text is obtained by splicing the candidate keywords after the candidate text.
S7, acquiring, according to $T$, a specified text set $U = \{U_1, \ldots, U_i, \ldots, U_n\}$, where $U_i$ is the i-th specified text and is acquired in S7 through the following steps:
S71, acquiring, according to $T_i$, the text string $WT_i = (WT^0_{i1}, \ldots, WT^0_{ix}, \ldots, WT^0_{ip}, WT^1_{i1}, \ldots, WT^1_{iy}, \ldots, WT^1_{iq})$ of $T_i$, where $WT^0_{ix}$ is the x-th text character corresponding to $A_i$, $x = 1, \ldots, p$, $p$ is the number of text characters corresponding to $A_i$, $WT^1_{iy}$ is the y-th text character corresponding to $Q_i$, $y = 1, \ldots, q$, and $q$ is the number of text characters corresponding to $Q_i$.
S72, when $p + q = K$, acquiring $U_i = T_i$, where $K$ is a preset key priority threshold.
Specifically, in S72, $K$ is obtained by:
S721, acquiring, according to $T$, a key text set $C = \{C_1, \ldots, C_d, \ldots, C_z\}$, with $C_d = \{C_{d1}, \ldots, C_{dg}, \ldots, C_{dh(d)}\}$, where $C_{dg}$ is the g-th key text in the d-th type key text list, $g = 1, \ldots, h(d)$, $h(d)$ is the number of key texts in the d-th type key text list, $d = 1, \ldots, z$, and $z$ is the number of key text types.
Specifically, the key text is an initial text obtained from $T$ according to the text type of that initial text. Those skilled in the art know that any prior-art method for classifying text, such as classification by the keywords of the text, falls within the protection scope of the present invention and is not described here. The text types are the types to which the initial texts belong, such as a cardiac type and an eye, nose and throat type.
S723, acquiring, according to $C$, a first text string quantity set $C^0 = \{C^0_1, \ldots, C^0_d, \ldots, C^0_z\}$ corresponding to $C$, with $C^0_d = \{C^0_{d1}, \ldots, C^0_{dg}, \ldots, C^0_{dh(d)}\}$, where $C^0_{dg}$ is the first text string quantity corresponding to $C_{dg}$.
Specifically, the first text string quantity is the number of text strings corresponding to the key text.
S725, acquiring, according to $C^0$, a second text string quantity set $C^1 = \{C^1_1, \ldots, C^1_d, \ldots, C^1_z\}$ corresponding to $C$, with $C^1_d = \{C^1_{d1}, \ldots, C^1_{du}, \ldots, C^1_{dh(d)}\}$, where $C^1_{du}$ is the u-th second text string quantity in the second text string quantity list corresponding to the d-th type key text list, $u = 1, \ldots, h(d)$, and $C^1_{d1} \geq \ldots \geq C^1_{du} \geq \ldots \geq C^1_{dh(d)}$.
Specifically, the second text string quantities are the first text string quantities arranged in descending order.
Further, the text string quantity is the number of text strings corresponding to the text.
S727, acquiring $K$ according to $C^0$, wherein $K$ meets the following conditions:
Where $C^1_{d\alpha}$ is the number of text strings of the key text corresponding to the $\alpha$-th second text string quantity in the d-th type key text list, and $\varepsilon$ is a preset first quantity threshold.
Specifically, $\alpha$ is an integer not greater than $h(d) \times \varepsilon$.
Specifically, the value range of $\varepsilon$ is 0.85 to 1; those skilled in the art know that $\varepsilon$ can be selected according to actual requirements, all such choices fall within the protection scope of the present invention, and they are not repeated here.
As described above, the preset key priority threshold is obtained from the key text types and the text string quantities of the key texts of each type, so that the text string quantities of the initial texts are made uniform. Taking the text string quantities of each text type into account ensures the comprehensiveness of the texts behind the specified text vectors obtained later, and setting the threshold from the text string quantities of each type of key text improves the accuracy of the unified text string quantity. This avoids both the loss of text data caused by text strings that are too short and the reduced processing efficiency caused by text strings that are too long, and thereby improves the accuracy of the specified text vector set obtained later.
S73, when $p + q > K$, acquiring a candidate priority set $P = \{P_1, \ldots, P_i, \ldots, P_n\}$ corresponding to $Q$, with $P_i = \{P_{i1}, \ldots, P_{ie}, \ldots, P_{if(i)}\}$, where $P_{ie}$ is the candidate priority corresponding to the e-th candidate keyword in the candidate keyword list $Q_i$, $e = 1, \ldots, f(i)$, and $f(i)$ is the number of candidate keywords in $Q_i$.
Specifically, in S73, $P_{ie}$ is obtained by the following steps:
S731, acquiring the candidate keyword list $Q_i = \{Q_{i1}, \ldots, Q_{ie}, \ldots, Q_{if(i)}\}$, where $Q_{ie}$ is the e-th candidate keyword in $Q_i$.
S733, acquiring, according to the target term knowledge graph, a specified keyword list $R_{ie} = \{R^1_{ie}, \ldots, R^a_{ie}, \ldots, R^{b(e)}_{ie}\}$ corresponding to $Q_{ie}$ and a specified priority list $G_{ie} = \{G^1_{ie}, \ldots, G^a_{ie}, \ldots, G^{b(e)}_{ie}\}$ corresponding to $Q_{ie}$, where $R^a_{ie}$ is the a-th specified keyword corresponding to $Q_{ie}$, $a = 1, \ldots, b(e)$, $b(e)$ is the number of specified keywords corresponding to $Q_{ie}$, and $G^a_{ie}$ is the specified priority between $Q_{ie}$ and $R^a_{ie}$.
Specifically, the specified keyword is a target word associated with the candidate keyword, obtained from the target term knowledge graph.
Specifically, the specified priority is the degree of association between the candidate keyword and the specified keyword; those skilled in the art know that any prior-art method for obtaining the degree of association between two texts falls within the protection scope of the present invention, and it is not described in detail here.
S735, acquiring $P_{ie}$ according to $Q_{ie}$, $R_{ie}$ and $G_{ie}$, wherein $P_{ie}$ meets the following conditions:
Where $M_{ie}$ is the number of occurrences of $Q_{ie}$ in the candidate text set $A$, $N_{ie}$ is the number of first intermediate words corresponding to the candidate texts in $A$ that include $Q_{ie}$, $V_{ie}$ is the number of candidate texts in $A$ that include $Q_{ie}$, $E^a_{ie}$ is the number of occurrences in $A$ of the specified keyword $R^a_{ie}$ corresponding to $G^a_{ie}$, $L^a_{ie}$ is the number of first intermediate words corresponding to the candidate texts in $A$ that include that specified keyword, and $J^a_{ie}$ is the number of candidate texts in $A$ that include that specified keyword.
S74, processing $WT_i$ based on $P$ to obtain $U_i$.
Specifically, S74 further includes the following steps:
S741, acquiring, according to $P_i$, a first intermediate text $\beta^1_i = (A_i, Q_{i1}, \ldots, Q_{i(e-1)}, Q_{i(e+1)}, \ldots, Q_{if(i)})$ corresponding to $T_i$, where $P_{ie}$ is the smallest candidate priority in $P_i$.
S743, when the number of text strings corresponding to $\beta^1_i$ is not greater than $K$, acquiring $U_i = \beta^1_i$.
S745, when the number of text strings corresponding to $\beta^1_i$ is greater than $K$, obtaining the smallest candidate priority in $P_i$ other than $P_{ie}$ and deleting the corresponding candidate keyword from $Q_i$ to obtain the second intermediate text $\beta^2_i$ corresponding to $T_i$.
S747, repeating S743 to S745 until the number of text strings corresponding to the resulting intermediate text is not greater than $K$, thereby acquiring $U_i$.
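The loop in S741 to S747 can be sketched as follows, assuming that the number of text strings is counted as the number of characters of the candidate text plus its remaining keywords, and that the candidate priorities have already been computed (the condition defining them is not reproduced in this text, so the values below are hypothetical). The function name `trim_keywords` is an assumption.

```python
def trim_keywords(candidate_text, keywords, priorities, k):
    """Drop the lowest-priority keywords until the spliced text has at most k characters.

    If the candidate text alone is longer than k, all keywords are dropped and the
    candidate text is returned unchanged; the source text does not cover that case.
    """
    keep = list(range(len(keywords)))  # indices of keywords still attached

    def total_length():
        return len(candidate_text) + sum(len(keywords[i]) for i in keep)

    while keep and total_length() > k:
        lowest = min(keep, key=lambda i: priorities[i])
        keep.remove(lowest)
    # Surviving keywords keep their original order after the candidate text.
    return candidate_text + "".join(keywords[i] for i in keep)


# Hypothetical values: a 5-character text with three keywords and K = 9.
U_i = trim_keywords("abcde", ["xx", "yyy", "z"], [0.2, 0.9, 0.5], 9)
```

In this example only the keyword with priority 0.2 has to be removed before the length limit is met.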
S75, when $p + q < K$, acquiring a specified keyword set $R_i = \{R_{i1}, \ldots, R_{ie}, \ldots, R_{if(i)}\}$ corresponding to $Q_i$ and a specified priority set $G_i = \{G_{i1}, \ldots, G_{ie}, \ldots, G_{if(i)}\}$ corresponding to $Q_i$, where $R_{ie}$ is the specified keyword list corresponding to $Q_{ie}$ and $G_{ie}$ is the specified priority list corresponding to $Q_{ie}$.
S76, processing $WT_i$ according to $R_i$ and $G_i$ to obtain $U_i$.
Specifically, S76 further includes the following steps:
S761, when $G^a_{ie}$ is the largest specified priority in $G_{ie}$, acquiring a first candidate text set corresponding to $T_i$, wherein the first candidate text set comprises a plurality of first candidate texts, and a first candidate text is a candidate text acquired from $A$ that includes the specified keyword $R^a_{ie}$ corresponding to $G^a_{ie}$.
S763, acquiring, based on the first candidate text set corresponding to $T_i$, a second candidate text $H_i$ corresponding to $T_i$, where $H^0_i = K - p - q$ and $H^0_i$ is the number of text strings corresponding to $H_i$.
S765, acquiring $U_i = (A_i, Q_i, H_i)$ according to $H_i$.
When the number of text strings corresponding to the initial text is greater than the preset length threshold, the lowest-priority candidate keywords are deleted; when it is less than the preset length threshold, the initial text is supplemented with text associated with its candidate keywords. Applying different processing according to the number of text strings corresponding to the initial text unifies that number across the initial texts and improves the accuracy of the specified text vector set obtained later.
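The supplementing branch (S761 to S765) can be sketched as below. Several points are assumptions made for illustration: the specified keywords of all candidate keywords are flattened into one list of (keyword, priority) pairs, the supplementary text is built by concatenating the matching candidate texts and cutting the result to $K - p - q$ characters, and the text being padded is excluded from its own supplement. The source text does not state how the second candidate text is cut to length, and the function name `pad_initial_text` is hypothetical.

```python
def pad_initial_text(candidate_text, keywords, specified, corpus, k):
    """Pad candidate_text + keywords up to k characters with related corpus text.

    specified: list of (specified_keyword, specified_priority) pairs.
    corpus:    all candidate texts (the set A).
    """
    base = candidate_text + "".join(keywords)
    missing = k - len(base)  # corresponds to K - p - q
    if missing <= 0 or not specified:
        return base
    # S761: take the specified keyword with the largest specified priority.
    top_keyword, _priority = max(specified, key=lambda pair: pair[1])
    first_candidates = [t for t in corpus if top_keyword in t and t != candidate_text]
    # S763: assumed construction of the second candidate text (concatenate, then cut).
    filler = "".join(first_candidates)[:missing]
    # S765: splice candidate text, keywords and supplementary text.
    return base + filler


corpus = ["fever and cough", "cough lasting two weeks", "normal ECG"]
U_i = pad_initial_text(
    "dry cough", ["cough"], [("cough", 0.9), ("ECG", 0.3)], corpus, 19
)
```

If the matching texts together are shorter than the missing length, the result stays shorter than the threshold; how that case should be handled is not specified in the text.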
S9, acquiring a specified text vector set according to the U, wherein the specified text vector set comprises a plurality of specified text vectors, and the specified text vectors are acquired by inputting specified texts into a pre-training electronic medical record coding model.
Specifically, the pre-training electronic medical record coding model is a model which is obtained by training a medical record text training set based on the pre-training model and is used for converting texts into vectors.
Further, those skilled in the art know that the pre-training model, for example ERNIE, can be selected according to actual requirements; any such selection falls within the protection scope of the present invention and is not described herein.
Further, the medical record text training set is a medical record text set for model training, which is acquired based on different search engines, and the medical record text set comprises a plurality of medical record texts with different types and forms.
Further, those skilled in the art know that any prior-art method of obtaining text from multiple search engines, for example Baidu, falls within the protection scope of the present invention and is not described herein.
S20, acquiring a first target text set corresponding to the first preset text set based on the first preset text set and the appointed text vector set.
Specifically, the first preset text set includes a plurality of first preset texts, where the first preset texts are question texts related to abnormal states, which are acquired based on the abnormal states.
Further, the question text is a text that raises, in the form of a question, a request for an answer and an explanation, for example a question text asking what a luteinizing hormone value lower than 3 indicates.
Further, the first preset text is a question text obtained through a medical public database, where those skilled in the art know that any text of a question related to medicine obtained based on the medical public database in the prior art falls within the protection scope of the present invention, and is not described herein.
Specifically, in S20, the following steps are further included:
S21, a first preset text vector set $I=\{I_1,\dots,I_t,\dots,I_\theta\}$ is obtained, where $I_t$ is the first preset text vector corresponding to the t-th first preset text, $t=1,\dots,\theta$, and $\theta$ is the number of first preset texts.
Specifically, the first preset text vector is obtained by inputting the first preset text into the pre-training electronic medical record coding model.
S23, acquiring the specified text vector set, whose i-th element is the i-th specified text vector.
S25, according to I and the specified text vector set, obtaining a first target similarity set $ER=\{ER_1,\dots,ER_t,\dots,ER_\theta\}$ corresponding to I, where $ER_t=\{ER_{t1},\dots,ER_{ti},\dots,ER_{tn}\}$ and $ER_{ti}$ is the first target similarity between $I_t$ and the i-th specified text vector.
Specifically, those skilled in the art know that any method for obtaining the similarity between vectors in the prior art falls into the protection scope of the present invention, and the method for calculating the similarity between vectors, such as cosine similarity, is not described herein.
S27, when $ER_{ti}\ge ER_0$, the text $U_i$ corresponding to the i-th specified text vector is taken as a first target text corresponding to $I_t$, where $ER_0$ is a preset second priority threshold.
Specifically, the range of ER 0 is 0.8-0.85, and those skilled in the art know that ER 0 can be selected according to actual requirements, which fall within the protection scope of the present invention and are not described herein.
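Steps S21 to S27 can be illustrated with a minimal sketch, assuming cosine similarity as the similarity measure (the text names it only as one example) and plain Python lists as vectors. The function names and the default threshold of 0.8 (the lower end of the stated range for $ER_0$) are illustrative choices, not part of the described system.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def select_first_target_texts(question_vectors, text_vectors, texts, er0=0.8):
    """For each question vector I_t, keep every text U_i whose similarity ER_ti >= er0.

    question_vectors: vectors of the first preset (question) texts
    text_vectors:     vectors of the specified texts, aligned with `texts`
    er0:              assumed threshold, taken from the stated 0.8-0.85 range
    """
    selected = []
    for q in question_vectors:
        matches = [
            texts[i]
            for i, v in enumerate(text_vectors)
            if cosine_similarity(q, v) >= er0
        ]
        selected.append(matches)
    return selected
```

Each question may receive zero, one, or several first target texts, depending on how many specified texts clear the threshold.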
S30, acquiring a second target text set corresponding to the first preset text set based on the first preset text set and the first target text set.
Specifically, the second target text set includes a plurality of second target texts, where the second target texts are explanatory content texts associated with the first preset text generated based on the first preset text and the first target text set through a prompt instruction, for example, when the first preset text relates to the heart, the heart is simply explained by combining the first target text related to the first preset text and related knowledge in some abnormal state fields, and the first preset text and the explanatory content acquired based on the first preset text are regarded as the second target text.
Further, those skilled in the art know that any method for training by using a prompt instruction in the prior art to output a result falls within the protection scope of the present invention, and is not described herein.
According to the method, the second target text set corresponding to the first preset text set is generated based on the first preset text set and the first target text through the prompt instruction, the medical record text corresponding to each question text is obtained, and the prompt instruction is used for setting the instruction for the corresponding question text, so that understanding and replying of the electronic medical record question-answering system are facilitated, and accuracy of an output result of the electronic medical record question-answering system is improved.
S40, inputting the first preset text set and the second target text set into a preset first initial LLM model, and obtaining a third target text set corresponding to the first preset text set.
Specifically, the third target text set includes a plurality of third target texts, where the third target texts are answer texts and interpretation texts corresponding to a first preset text obtained based on the first preset text.
Further, the answer text is a text which answers based on the question text.
Further, the explanation text is text for obtaining explanation of the answer text based on the question text.
Further, in S40, a third target text is acquired by:
S41, acquiring ψ fourth target texts corresponding to the first preset text according to the first preset text and the second target text corresponding to the first preset text, wherein each fourth target text is an answer text and an explanation text corresponding to the first preset text, acquired from a plurality of LLM models based on the second target text.
Specifically, those skilled in the art know that any method of outputting a result through the LLM model in the prior art falls within the protection scope of the present invention, and will not be described herein, where LLM models such as Baichuan-13B model, LLaMA model, etc. are included.
Specifically, the value range of ψ is 30-50, where those skilled in the art know that selection of ψ can be performed according to actual requirements, which falls into the protection range of the present invention, and will not be described herein.
S43, acquiring a priority corresponding to the fourth target text according to the fourth target text, wherein the priority is a score value acquired based on a voting method, and any method for acquiring the score based on the voting method in the prior art is known by a person skilled in the art and falls into the protection scope of the present invention, and is not repeated herein.
Specifically, the value range of the priority is 0-1.
S45, acquiring a third target text corresponding to the first preset text according to the priority, wherein the third target text is a fourth target text corresponding to the maximum priority.
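Steps S41 to S45 can be sketched as follows. The text does not specify the voting method, so this example assumes a simple agreement vote: the priority of a candidate is the share of all candidates that give the same normalized answer, which keeps the score in the stated 0-1 range. The dictionary keys `answer` and `explanation` are hypothetical.

```python
from collections import Counter


def vote_priorities(candidates):
    """Score each candidate in [0, 1] as the share of candidates giving the same answer.

    Assumed voting rule: answers are compared after trimming and lower-casing.
    """
    keys = [c["answer"].strip().lower() for c in candidates]
    counts = Counter(keys)
    total = len(candidates)
    return [counts[k] / total for k in keys]


def pick_third_target_text(candidates):
    """Return the candidate with the maximum priority and that priority.

    Ties are broken in favour of the earliest candidate; an empty list raises ValueError.
    """
    if not candidates:
        raise ValueError("at least one fourth target text is required")
    priorities = vote_priorities(candidates)
    best = max(range(len(candidates)), key=lambda i: priorities[i])
    return candidates[best], priorities[best]
```

In practice the ψ candidates would come from several LLM models; any other scoring rule that yields a value between 0 and 1 could replace `vote_priorities` without changing the selection step.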
S50, the first target text set, the second target text set and the third target text set are used as training sets to be input into a preset second initial LLM model, and an initial electronic medical record question-answering model is generated.
S200, when the data volume of the training set corresponding to the initial electronic medical record question-answer model is larger than a preset data volume threshold, acquiring a candidate parameter list $\omega=\{\omega_1,\dots,\omega_c,\dots,\omega_w\}$ corresponding to the initial electronic medical record question-answer model, where $\omega_c$ is the c-th candidate parameter, $c=1,\dots,w$, w is the number of candidate parameters, $\omega_c=2c$ and $w=6$.
Specifically, the candidate parameter is a rank corresponding to a matrix set for reducing training time of a training set in an initial electronic medical record question-answering model, where the rank can be understood as: when the LLM model performs data processing, multiplication between the matrix and the matrix is involved, and when the data volume of the training set is too large, the training efficiency is reduced, so that a matrix with a slightly smaller rank needs to be set to help training in order to reduce the training time of the training set, and the candidate parameter is the set rank of the matrix.
Further, the value range of the preset data quantity threshold is 100 GB-1 TB, and those skilled in the art know that the selection of the preset data quantity threshold can be performed according to the actual requirement, which falls into the protection range of the present invention, and will not be described herein.
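The effect of choosing a small rank can be made concrete with a short calculation. The sketch below assumes a low-rank factorization of a weight update, in which a d_out x d_in update is represented by two factors of shapes d_out x r and r x d_in. This is one common technique matching the description; whether the described system uses exactly this scheme is an assumption, and the function names are illustrative.

```python
def full_update_params(d_out, d_in):
    """Number of entries in a full d_out x d_in weight update."""
    return d_out * d_in


def low_rank_update_params(d_out, d_in, rank):
    """Number of entries in two factors of shapes (d_out x rank) and (rank x d_in)."""
    return rank * (d_out + d_in)


def compare_ranks(d_out, d_in, ranks):
    """Map each candidate rank to its share of the full update's parameter count."""
    full = full_update_params(d_out, d_in)
    return {r: low_rank_update_params(d_out, d_in, r) / full for r in ranks}
```

For a 4096 x 4096 matrix, a rank of 8 needs 65,536 trainable entries instead of 16,777,216, that is 1/256 of the full update, which is why a smaller rank shortens training.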
S300, according to ω, obtaining a first intermediate priority list $T\omega=\{T\omega_1,\dots,T\omega_c,\dots,T\omega_w\}$ corresponding to ω, where $T\omega_c$ is the first intermediate priority corresponding to $\omega_c$.
Specifically, the first intermediate priority is the occupancy rate of the GPU in the running process of the initial electronic medical record question-answering model, where those skilled in the art know that any method for obtaining the occupancy rate of the GPU in the prior art falls into the protection scope of the present invention, and is not described herein.
S400, when the first preset text is a first preset text of the first type, acquiring, based on the preset weight types, a second intermediate priority set $E\omega=\{E\omega_1,\dots,E\omega_c,\dots,E\omega_w\}$ corresponding to ω, where $E\omega_c=\{E\omega_{c1},\dots,E\omega_{c\mu},\dots,E\omega_{c\tau}\}$, $E\omega_{c\mu}$ is the μ-th second intermediate priority in the second intermediate priority list corresponding to $\omega_c$, $\mu=1,\dots,\tau$, and τ is the number of preset weight types.
Specifically, the first preset text of the first type is a question text which is a single question and has no relevance with other questions.
Specifically, the second intermediate priority is a score value corresponding to the initial electronic medical record question-answering model, obtained under different preset weight types based on the candidate parameter and the first preset text of the first type, wherein those skilled in the art know that any prior-art method for obtaining a score value of a model under different conditions falls within the protection scope of the present invention and is not repeated herein.
Specifically, the preset weight type is the type of weight matrix involved in the computation, which can be understood as follows: in the Transformer architecture, the self-attention module has four weight matrices (Wq, Wk, Wv, Wo), where Wq (or Wk, Wv) is treated as a single square matrix.
Specifically, 4≤τ≤30.
Preferably, τ has a value of 6, where when τ takes 6, it is possible to avoid the problem of low efficiency caused by performing a large number of tests, and ensure the comprehensiveness of the tests.
S500, when the first preset text is a first preset text of the second type, acquiring, based on the preset weight types, a third intermediate priority set $L\omega=\{L\omega_1,\dots,L\omega_c,\dots,L\omega_w\}$ corresponding to ω, where $L\omega_c=\{L\omega_{c1},\dots,L\omega_{c\mu},\dots,L\omega_{c\tau}\}$ and $L\omega_{c\mu}$ is the μ-th third intermediate priority in the third intermediate priority list corresponding to $\omega_c$.
Specifically, the first preset text of the second type is a question text that includes a plurality of questions that are associated with one another.
Specifically, the third intermediate priority is a score value corresponding to an initial electronic medical record question-answering model obtained under different preset weight types based on the candidate parameter and the second type first preset text.
Further, the obtaining mode of the third intermediate priority is consistent with the obtaining mode of the second intermediate priority.
S600, obtaining a final priority list Fω= { Fω 1,……,Fωc,……,Fωw } corresponding to ω according to T ω, E ω and L ω, wherein Fω c meets the following conditions:
S700, according to $F\omega$, adjusting $\omega_c$ to be the target parameter of the initial electronic medical record question-answer model so as to update the initial electronic medical record question-answer model, wherein $F\omega_c$ is the largest final priority in $F\omega$.
According to the method, the performance of the initial electronic medical record question-answering model is evaluated for each candidate parameter. Setting the candidate parameters saves model training time and avoids wasting resources without affecting the reasoning capability and response capability of the model, and adjusting the parameters makes the output of the electronic medical record question-answering model more accurate.
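The selection in S700 amounts to taking the candidate parameter whose final priority is largest. A minimal sketch follows, assuming the final priorities have already been computed by whatever rule S600 prescribes (that rule is not reproduced here); the function name and the example values in the usage note are hypothetical.

```python
def select_target_parameter(candidates, final_priorities):
    """Return the candidate parameter with the largest final priority.

    candidates:       candidate parameters (for example, candidate ranks)
    final_priorities: one final priority per candidate, in the same order
    Ties are resolved in favour of the earliest candidate.
    """
    if not candidates or len(candidates) != len(final_priorities):
        raise ValueError("candidates and final_priorities must be non-empty and aligned")
    best_index = max(range(len(candidates)), key=lambda c: final_priorities[c])
    return candidates[best_index]
```

For instance, with candidates `[2, 4, 8]` and illustrative final priorities `[0.1, 0.7, 0.3]`, the function returns `4`.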
Specifically, the method further includes the following steps after S700:
S701, inputting a second preset text set into the initial electronic medical record question-answering model, and obtaining the priority to be selected corresponding to the initial electronic medical record question-answering model.
Specifically, the second preset text set includes a plurality of second preset texts, where the second preset texts are question texts related to abnormal states for testing the effect of the initial electronic medical record question-answering model.
Specifically, in S701, the priority to be selected is acquired by:
S7011, the second preset text set is input into the initial electronic medical record question-answer model to obtain a first key text set $EP=\{EP_1,\dots,EP_\delta,\dots,EP_\zeta\}$ corresponding to the second preset text set, where $EP_\delta$ is the first key text corresponding to the δ-th second preset text, $\delta=1,\dots,\zeta$, and ζ is the number of second preset texts.
Specifically, the first key text is an answer text and an explanation text corresponding to a second preset text obtained based on an initial electronic medical record question-answering model.
S7013, according to EP, acquiring a first key text vector set $EP^0=\{EP^0_1,\dots,EP^0_\delta,\dots,EP^0_\zeta\}$ corresponding to EP, where $EP^0_\delta=(EP^0_{\delta1},\dots,EP^0_{\delta\gamma},\dots,EP^0_{\delta\eta})$, $EP^0_{\delta\gamma}$ is the value of the γ-th component of the first key text vector corresponding to $EP_\delta$, $\gamma=1,\dots,\eta$, and η is the number of components of the first key text vector.
Specifically, the first key text vector is obtained by inputting the first key text into a pre-trained electronic medical record coding model.
S7015, a second key text set $FP=\{FP_1,\dots,FP_\delta,\dots,FP_\zeta\}$ corresponding to the second preset text set is obtained, where $FP_\delta$ is the second key text corresponding to the δ-th second preset text.
Specifically, the second key text is an accurate answer text and an interpretation text corresponding to the second preset text.
S7017, according to FP, obtaining a second key text vector set $FP^0=\{FP^0_1,\dots,FP^0_\delta,\dots,FP^0_\zeta\}$ corresponding to FP, where $FP^0_\delta=(FP^0_{\delta1},\dots,FP^0_{\delta\gamma},\dots,FP^0_{\delta\eta})$ and $FP^0_{\delta\gamma}$ is the value of the γ-th component of the second key text vector corresponding to $FP_\delta$.
Specifically, the obtaining mode of the second key text vector is consistent with the obtaining mode of the first key text vector.
S7019, according to EP 0 and FP 0, obtaining a priority KL to be selected corresponding to the initial electronic medical record question-answer model, wherein the KL meets the following conditions:
In another specific embodiment, the priority to be selected is acquired in S701 by:
S7001, inputting the second preset text set into the initial electronic medical record question-answering model, and obtaining a first initial text set $EW=\{EW_1,\dots,EW_\lambda,\dots,EW_\sigma\}$, where $EW_\lambda$ is the λ-th first initial text, $\lambda=1,\dots,\sigma$, and σ is the number of first initial texts.
Specifically, the first initial text is a first key text with a Chinese-English ratio in a preset ratio range, which is obtained from a first key text set.
Further, the first key text set comprises a plurality of first key texts, wherein the first key texts are answer texts and explanation texts corresponding to second preset texts obtained based on an initial electronic medical record question-answering model.
Further, the answer text is a text which answers based on the question text.
Further, the explanation text is text for obtaining explanation of the answer text based on the question text.
Further, the preset ratio range is $tr_1$ to $tr_2$, where $tr_1=tr-tr_0$, $tr_2=tr+tr_0$, tr is the average English-Chinese ratio of the sample texts, and $tr_0$ is a preset ratio threshold.
Further, tr 0 has a value ranging from 0.01 to 0.1, where a person skilled in the art knows that tr 0 can be selected according to actual requirements, and all fall into the protection scope of the present invention, which is not described herein.
Further, the sample text is a text which is output by inputting a preset sample text into an initial electronic medical record question-answering model, wherein the property of the preset sample text is consistent with that of a first preset text, and the acquisition mode of the preset sample text can refer to the acquisition mode of the first preset text.
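The ratio-based filtering of first key texts can be sketched as below. The exact definition of the ratio is not given in the text, so this example assumes it is the number of Latin letters divided by the number of CJK characters; the inclusive range endpoints, the function names and the default $tr_0$ of 0.05 (within the stated 0.01 to 0.1 range) are likewise assumptions.

```python
import re

CJK_CHAR = re.compile(r"[\u4e00-\u9fff]")   # common CJK unified ideographs
LATIN_LETTER = re.compile(r"[A-Za-z]")


def english_chinese_ratio(text):
    """Assumed definition: count of Latin letters divided by count of CJK characters."""
    chinese = len(CJK_CHAR.findall(text))
    english = len(LATIN_LETTER.findall(text))
    if chinese == 0:
        return float("inf") if english else 0.0
    return english / chinese


def filter_by_ratio(key_texts, sample_texts, tr0=0.05):
    """Keep the key texts whose ratio lies within [tr - tr0, tr + tr0].

    tr is the mean ratio over the sample texts; tr0 is the preset ratio threshold.
    """
    if not sample_texts:
        raise ValueError("sample_texts must not be empty")
    ratios = [english_chinese_ratio(s) for s in sample_texts]
    tr = sum(ratios) / len(ratios)
    low, high = tr - tr0, tr + tr0
    return [t for t in key_texts if low <= english_chinese_ratio(t) <= high]
```

Texts whose language mix deviates strongly from the sample average, for example an answer written almost entirely in English, are dropped before the similarity comparison.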
S7002, according to EW, obtaining a first initial text vector set $EW^0=\{EW^0_1,\dots,EW^0_\lambda,\dots,EW^0_\sigma\}$, where $EW^0_\lambda=(EW^0_{\lambda1},\dots,EW^0_{\lambda\gamma},\dots,EW^0_{\lambda\eta})$, $EW^0_{\lambda\gamma}$ is the value of the γ-th component of the first initial text vector corresponding to $EW_\lambda$, $\gamma=1,\dots,\eta$, and η is the number of components of the first initial text vector.
Specifically, the first initial text vector is obtained by inputting the first initial text into a pre-trained electronic medical record coding model.
S7003, according to the first initial text set, a second initial text set $FW=\{FW_1,\dots,FW_\lambda,\dots,FW_\sigma\}$ is obtained, where $FW_\lambda$ is the λ-th second initial text.
Specifically, the second initial text is an accurate answer text and an accurate interpretation text of a second preset text corresponding to the first initial text.
S7004, according to FW, obtaining a second initial text vector set $FW^0=\{FW^0_1,\dots,FW^0_\lambda,\dots,FW^0_\sigma\}$ corresponding to FW, where $FW^0_\lambda=(FW^0_{\lambda1},\dots,FW^0_{\lambda\gamma},\dots,FW^0_{\lambda\eta})$ and $FW^0_{\lambda\gamma}$ is the value of the γ-th component of the second initial text vector corresponding to $FW_\lambda$.
Specifically, the obtaining mode of the second initial text vector is consistent with the obtaining mode of the first initial text vector.
S7005, according to $EW^0$ and $FW^0$, a first similarity list $\Delta W=\{\Delta W_1,\dots,\Delta W_\lambda,\dots,\Delta W_\sigma\}$ is obtained, where $\Delta W_\lambda$ meets the following conditions:
S7006, a first initial keyword set corresponding to the EW is obtained according to the EW, wherein the first initial keyword set comprises a plurality of first initial keyword lists, each first initial keyword list comprises a first initial keyword, and the first initial keywords are keywords in a first initial text.
Specifically, the first initial keyword is a word, obtained from the first initial text, that is similar to a target word in a target term knowledge graph.
Specifically, the obtaining manner of the first initial keyword is consistent with the obtaining manner of the candidate keyword, and reference may be made to steps S31 to S37.
S7007, acquiring a second initial keyword set corresponding to FW according to the FW, wherein the second initial keyword set comprises a plurality of second initial keyword lists, each second initial keyword list comprises a second initial keyword, and the second initial keywords are keywords in a second initial text.
Specifically, the obtaining mode of the second initial keyword is consistent with the obtaining mode of the first initial keyword.
S7008, according to the first initial keyword set and the second initial keyword set, a second similarity list $\Delta V=\{\Delta V_1,\dots,\Delta V_\lambda,\dots,\Delta V_\sigma\}$ is obtained, where $\Delta V_\lambda$ is the similarity between the first initial keywords and the second initial keywords corresponding to the same second preset text.
Specifically, the acquisition mode of Δv λ is identical to the acquisition mode of Δw λ.
S7009, obtaining the priority KL to be selected corresponding to the initial electronic medical record question-answering model according to $\Delta W$ and $\Delta V$.
Specifically, KL is acquired in S7009 by:
S70091, when $\Delta W_\lambda\le ZM_0$, KL=0, where $ZM_0$ is a preset first similarity threshold.
Specifically, the value range of $ZM_0$ is 0.6-0.85; those skilled in the art can select the preset first similarity threshold according to actual requirements, and any such selection falls within the protection scope of the present invention and is not described herein.
S70093, when $\Delta W_\lambda\ge ZM_0$ and $\Delta V_\lambda\le ZM_1$, KL meets the following conditions:
Wherein, $ZM_1$ is a preset second similarity threshold.
Specifically, the value range of $ZM_1$ is 0.5-0.9; those skilled in the art can select the preset second similarity threshold according to actual requirements, and any such selection falls within the protection scope of the present invention and is not described herein.
S70095, when $\Delta W_\lambda\ge ZM_0$ and $\Delta V_\lambda\ge ZM_1$, KL meets the following conditions:
According to the method, different coefficients for calculating the priority are set according to the first similarity and the second similarity, so that the priority to be selected corresponding to the electronic medical record question-answering model is obtained from different dimensions and in different ways under different conditions. This makes the obtained priority to be selected more accurate and, in turn, makes the results output by the electronic medical record question-answering system more accurate.
S703, carrying out parameter adjustment on the initial electronic medical record question-answering model based on the priority to be selected until the priority to be selected is not smaller than a preset priority-to-be-selected threshold, so as to obtain the target electronic medical record question-answering model.
Specifically, the value range of the preset priority-to-be-selected threshold is 0.7-0.9; those skilled in the art can select this threshold according to actual needs, and any such selection falls within the protection scope of the present invention and is not described herein.
S705, acquiring a preset key text, and inputting the preset key text into the target electronic medical record question-answering model to acquire a target text, wherein the preset key text is a question text to be queried that is acquired based on an abnormal state and is related to the abnormal state, and the target text is the answer text and explanation text corresponding to the preset key text.
By applying the LLM model to electronic medical record question answering, large-scale data can be processed and the application limitations of the electronic medical record question-answering model are reduced; by setting instructions for the electronic medical record question-answering model through the prompt instruction, understanding and replying by the electronic medical record question-answering system are facilitated, and the accuracy of its output is improved.
Specifically, the step S705 further includes the following steps:
S7051, acquiring a key entity set according to the sample database, where the key entity set includes a plurality of key entities, and the key entities are entities related to abnormal states acquired based on the sample database.
Specifically, the sample database includes a plurality of information related to abnormal states, such as a drug data table, a human body part, an ICD-10 standard word stock, symptom signs, infectious diseases and the like.
Further, in S7051, the key entity is acquired by:
S70511, obtaining a sample entity set according to the sample data set, wherein the sample entity set comprises a plurality of sample entities, and the sample entities are entities related to abnormal states and obtained from the sample data set, which can be understood as: the sample dataset includes a plurality of text descriptions relating to abnormal conditions, from which terms associated with the medical field, namely, the acquired sample entities, are extracted.
In particular, the sample entity set includes millions of sample entities.
Further, it is known in the art that any method of extracting an entity from text in the prior art falls within the protection scope of the present invention, and is not described herein.
S70513, acquiring a first sample entity set according to the sample entity set, wherein the first sample entity set comprises a plurality of first sample entities, and the first sample entities are entities similar to the sample entities, acquired based on an LLM model.
Specifically, those skilled in the art know that any prior-art method for obtaining similar entities based on an LLM model, for example ChatGLM, falls within the protection scope of the present invention and is not described herein again.
S70515, obtaining a second sample entity set according to the first sample entity set, wherein the second sample entity set comprises a plurality of second sample entities, and the second sample entities are entities which have no similar characteristics with the first sample entities.
Specifically, those skilled in the art know that any method for acquiring an entity with no similar characteristics to an entity based on an entity characteristic in the prior art falls within the protection scope of the present invention, and is not described herein, for example, acquiring an entity with no similar characteristics to an entity through an FM model, an FFM model, or the like.
S70517, acquiring a key entity set based on the sample entity set, the first sample entity set and the second sample entity set, wherein the key entity set comprises the sample entity set, the first sample entity set and the second sample entity set.
Specifically, the number of the key entities in the key entity set is tens of millions, wherein, those skilled in the art know that the ratio of the first sample entity to the second sample entity can be selected according to the actual requirement, which falls into the protection scope of the present invention, and is not described herein.
S7052, the key entity set and the target entity set are input into the first intermediate model, and the key entity vector set and the target entity vector set are obtained.
Specifically, the target entity set includes a plurality of target entities, wherein the target entities are standard terms related to abnormal states.
Specifically, the first intermediate model is a model for converting text into a vector, where those skilled in the art know that any natural language processing model for converting text into a vector, for example a BERT model, can be selected according to actual requirements; any such selection falls within the protection scope of the present invention and is not described herein.
Specifically, the key entity vector set includes a plurality of key entity vectors, where the key entity vectors are vectors corresponding to key entities.
Further, the target entity vector set includes a plurality of target entity vectors, where the target entity vectors are vectors corresponding to target entities.
S7053, inputting the key entity vector set and the target entity vector set into a second intermediate model, and obtaining a final entity set corresponding to the key entity set, wherein the second intermediate model is a preset neural network model.
Specifically, in S7053, the final entity set is acquired by:
S70531, any key entity vector $XY=(XY_1,\dots,XY_{(ab)},\dots,XY_{(jk)})$ is obtained from the key entity vector set, where $XY_{(ab)}$ is the value of the ab-th component of the key entity vector, $ab=1,\dots,jk$, and jk is the number of components of the key entity vector.
S70532, the target entity vector set $ZH=\{ZH_1,\dots,ZH_{(cd)},\dots,ZH_{(ef)}\}$ is obtained, where $ZH_{(cd)}=(ZH^1_{(cd)},\dots,ZH^{(ab)}_{(cd)},\dots,ZH^{(jk)}_{(cd)})$, $ZH^{(ab)}_{(cd)}$ is the value of the ab-th component of the cd-th target entity vector, $cd=1,\dots,ef$, and ef is the number of target entity vectors.
S70533, according to XY and ZH, acquiring a first intermediate priority list $XH=\{XH_1,\dots,XH_{(cd)},\dots,XH_{(ef)}\}$ corresponding to XY, where $XH_{(cd)}$ is the first intermediate priority between XY and $ZH_{(cd)}$, and $XH_{(cd)}$ meets the following conditions:
When the priority corresponding to the entity is obtained, the method is not limited to one method, and the final priority corresponding to the entity is obtained by combining a plurality of methods, so that the accuracy of obtaining the priority corresponding to the entity is improved, and the standardized result corresponding to the output result based on the electronic medical record question-answer model is more accurate.
S70535, acquiring the final entity corresponding to XY according to XH, wherein when $XH_{(cd)}$ is the largest first intermediate priority in XH, the target entity corresponding to $ZH_{(cd)}$ is taken as the final entity corresponding to XY.
S7054, a target model is obtained based on the sample entity set and the final entity set, where the target model is a model trained in the process of obtaining the final entity set based on the sample entity set.
S7055, a first candidate entity set corresponding to the target text is obtained, where the first candidate entity set includes a plurality of first candidate entities, and the first candidate entities are entities obtained from the target text.
Specifically, those skilled in the art know that any method for obtaining an entity from a text in the prior art falls within the protection scope of the present invention, and is not described herein.
S7056, inputting the first candidate entity into a target model, and acquiring a second candidate entity set corresponding to the target text, wherein the second candidate entity set comprises a plurality of second candidate entities, and the second candidate entities are entities in the target entity corresponding to the first candidate entity acquired based on the first candidate entity and the target model.
S7057, the first candidate entity set in the target text is replaced with the corresponding second candidate entity set to implement the normalization process for the target text.
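The replacement in S7057 can be sketched as a single-pass substitution. The sketch assumes that the mapping from first candidate entities to second candidate entities has already been produced by the target model in S7056; here it is passed in as a plain dictionary, and the function name and example terms are hypothetical.

```python
import re


def normalize_text(target_text, entity_to_standard):
    """Replace each first candidate entity in the text with its standard term.

    entity_to_standard: hypothetical mapping {entity found in text: standard term}.
    Longer entities are matched first so that an entity containing a shorter one
    is replaced as a whole, and the single regex pass prevents text that was
    already substituted from being rewritten again.
    """
    if not entity_to_standard:
        return target_text
    entities = sorted(entity_to_standard, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(e) for e in entities))
    return pattern.sub(lambda m: entity_to_standard[m.group(0)], target_text)
```

With the illustrative mapping `{"heart attack": "myocardial infarction", "high blood pressure": "hypertension"}`, the text "History of heart attack and high blood pressure." becomes "History of myocardial infarction and hypertension."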
By means of the method, the results output by the electronic medical record question-answering model are subjected to standardized processing, and follow-up data query and statistics are facilitated.
The invention relates to a system for updating an electronic medical record question-answering model based on parameter adjustment, which comprises a sample electronic medical record information set, a processor and a memory storing a computer program, wherein the sample electronic medical record information set comprises a plurality of sample electronic medical record information, the sample electronic medical record information is corresponding abnormal state characteristic information in medical records acquired from a database, and when the computer program is executed by the processor, the following steps are realized: according to a sample electronic medical record text information set, an initial electronic medical record question-answering model is obtained, when the data volume of a training set corresponding to the initial electronic medical record question-answering model is larger than a preset data volume threshold, a candidate parameter list corresponding to the initial electronic medical record question-answering model is obtained, according to the candidate parameter list, a first intermediate priority list is obtained, when a first preset text is a first preset text of a first type, a second intermediate priority set corresponding to omega is obtained based on a preset weight type, when the first preset text is a first preset text of a second type, a third intermediate priority set corresponding to omega is obtained based on a preset weight type, a final priority list is obtained according to the first intermediate priority set, the second intermediate priority set and the third intermediate priority set, and according to the final priority list, target parameters of the initial electronic medical record question-answering model are obtained to update the initial electronic medical record question-answering model.
While certain specific embodiments of the invention have been described in detail by way of example, it will be appreciated by those skilled in the art that the above examples are for illustration only and are not intended to limit the scope of the invention. Those skilled in the art will also appreciate that many modifications may be made to the embodiments without departing from the scope and spirit of the invention. The scope of the invention is defined by the appended claims.
Claims (9)
1. A system for updating an electronic medical record question-answering model based on parameter adjustment, the system comprising a sample electronic medical record information set, a first preset text set, a processor and a memory storing a computer program, wherein the sample electronic medical record information set comprises a plurality of pieces of sample electronic medical record information, each piece of sample electronic medical record information being the abnormal state characteristic information corresponding to a medical record obtained from a database, and when the computer program is executed by the processor, the following steps are realized:
S100, acquiring an initial electronic medical record question-answering model according to the sample electronic medical record information set;
S200, when the data volume of the training set corresponding to the initial electronic medical record question-answering model is greater than a preset data volume threshold, obtaining a candidate parameter list ω = {ω_1, …, ω_c, …, ω_w} corresponding to the initial electronic medical record question-answering model, where ω_c is the c-th candidate parameter, c = 1, …, w, w is the number of candidate parameters, ω_c = 2c, and w = 6;
S300, according to ω, acquiring a first intermediate priority list Tω = {Tω_1, …, Tω_c, …, Tω_w}, where Tω_c is the first intermediate priority corresponding to ω_c;
S400, when the first preset text is a first preset text of the first type, acquiring, based on the preset weight types, a second intermediate priority set Eω = {Eω_1, …, Eω_c, …, Eω_w} corresponding to ω, where Eω_c = {Eω_c1, …, Eω_cμ, …, Eω_cτ}, Eω_cμ is the μ-th second intermediate priority in the second intermediate priority list corresponding to ω_c, μ = 1, …, τ, and τ is the number of preset weight types;
S500, when the first preset text is a first preset text of the second type, acquiring, based on the preset weight types, a third intermediate priority set Lω = {Lω_1, …, Lω_c, …, Lω_w} corresponding to ω, where Lω_c = {Lω_c1, …, Lω_cμ, …, Lω_cτ} and Lω_cμ is the μ-th third intermediate priority in the third intermediate priority list corresponding to ω_c;
S600, obtaining a final priority list Fω = {Fω_1, …, Fω_c, …, Fω_w} corresponding to ω according to Tω, Eω and Lω, wherein Fω_c meets the following conditions:
S700, according to Fω, adjusting ω_c to be the target parameter of the initial electronic medical record question-answering model so as to update the initial electronic medical record question-answering model, wherein Fω_c is the largest final priority in Fω.
2. The system for updating an electronic medical record question-answering model based on parameter adjustment according to claim 1, wherein the initial electronic medical record question-answering model is obtained in S100 by the following steps:
S10, acquiring a specified text vector set according to the sample electronic medical record information set;
S20, acquiring a first target text set corresponding to the first preset text set based on the first preset text set and the specified text vector set;
S30, acquiring a second target text set corresponding to the first preset text set based on the first preset text set and the first target text set;
S40, inputting the first preset text set and the second target text set into a preset first initial LLM model to acquire a third target text set corresponding to the first preset text set;
S50, inputting the first target text set, the second target text set and the third target text set as a training set into a preset second initial LLM model to generate the initial electronic medical record question-answering model.
3. The system for updating an electronic medical record question-answering model based on parameter adjustment according to claim 1, wherein the candidate parameter is the rank of a matrix, set so as to reduce the training time of the training set in the initial electronic medical record question-answering model.
4. The system for updating an electronic medical record question-answering model based on parameter adjustment according to claim 1, wherein the preset data volume threshold ranges from 100 GB to 1 TB.
5. The system for updating an electronic medical record question-answering model based on parameter adjustment according to claim 1, wherein the first intermediate priority is the GPU occupancy during operation of the initial electronic medical record question-answering model.
6. The system for updating an electronic medical record question-answering model based on parameter adjustment according to claim 1, wherein the first preset text of the first type is a question text consisting of a single question that is not related to other questions.
7. The system for updating an electronic medical record question-answering model based on parameter adjustment according to claim 1, wherein the second intermediate priority is a score value of the initial electronic medical record question-answering model obtained under the different preset weight types based on the candidate parameter and the first preset text.
8. The system for updating an electronic medical record question-answering model based on parameter adjustment according to claim 1, wherein the first preset text of the second type is a question text that contains a plurality of questions, each of which is related to the others.
9. The system for updating an electronic medical record question-answering model based on parameter adjustment according to claim 1, wherein the third intermediate priority is a score value of the initial electronic medical record question-answering model obtained under the different preset weight types based on the candidate parameter and the first preset text of the second type.
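Claim 3 describes the candidate parameter as a matrix rank chosen to shorten training. One common way a rank does this is low-rank factorization of a weight update: a d × k update is replaced by the product of a d × r matrix and an r × k matrix, so only r(d + k) values are trained instead of d·k. The source does not name the technique, so reading the claim this way is an assumption, and the function name `low_rank_trainable_params`, the 4096 × 4096 size and the ranks in the loop are illustrative choices rather than values taken from the patent.

```python
def low_rank_trainable_params(d: int, k: int, r: int) -> int:
    """Trainable entries when a d x k update is factored as (d x r) @ (r x k)."""
    if not 1 <= r <= min(d, k):
        raise ValueError("rank must lie between 1 and min(d, k)")
    return r * (d + k)


# Illustrative sizes only; the source does not state the matrix dimensions.
d, k = 4096, 4096
full_params = d * k

for rank in (2, 4, 8, 16, 32, 64):
    low = low_rank_trainable_params(d, k, rank)
    share = low / full_params
    print(f"rank={rank:>2}: {low:>7} trainable values, {share:.4%} of the full matrix")
```

Under this reading, a smaller rank trains fewer values and therefore runs faster, while a larger rank leaves more capacity for the update, which is the trade-off a priority-based choice among candidate ranks would be weighing.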
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202311514678.7A CN117454989B (en) | 2023-11-14 | 2023-11-14 | System for updating electronic medical record question-answer model based on parameter adjustment |
Publications (2)
Publication Number | Publication Date |
---|---|
CN117454989A (en) | 2024-01-26 |
CN117454989B (en) | 2024-10-22 |
Family ID=89589037
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202311514678.7A Active CN117454989B (en) | 2023-11-14 | 2023-11-14 | System for updating electronic medical record question-answer model based on parameter adjustment |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN117454989B (en) |
Families Citing this family (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN117995347B (en) * | 2024-04-07 | 2024-06-21 | 北京惠每云科技有限公司 | Medical record content quality control method and device, electronic equipment and storage medium |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113505243A (en) * | 2021-07-29 | 2021-10-15 | 深圳万海思数字医疗有限公司 | Intelligent question-answering method and device based on medical knowledge graph |
CN113934824A (en) * | 2021-12-15 | 2022-01-14 | 之江实验室 | Similar medical record matching system and method based on multi-round intelligent question answering |
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110334199A (en) * | 2019-07-09 | 2019-10-15 | 北京百度网讯科技有限公司 | Obtain method and apparatus, the electronic equipment, computer-readable medium of problem answers |
Also Published As
Publication number | Publication date |
---|---|
CN117454989A (en) | 2024-01-26 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN111538908B (en) | Search ranking method and device, computer equipment and storage medium | |
CN117454843B (en) | Data preprocessing system based on electronic medical record question-answering model | |
CN114925692B (en) | Data processing system for acquiring target event | |
CN117711600B (en) | LLM model-based electronic medical record question-answering system | |
CN117556034B (en) | Data processing system for standardizing output results of electronic medical record question-answering model | |
CN112380344B (en) | Text classification method, topic generation method, device, equipment and medium | |
CN110674252A (en) | High-precision semantic search system for judicial domain | |
CN112819023A (en) | Sample set acquisition method and device, computer equipment and storage medium | |
CN112559723B (en) | FAQ search type question-answering construction method and system based on deep learning | |
US12051017B2 (en) | Apparatus for determining role fitness while eliminating unwanted bias | |
CN111782826A (en) | Knowledge graph information processing method, device, equipment and storage medium | |
CN117454989B (en) | System for updating electronic medical record question-answer model based on parameter adjustment | |
CN117520503A (en) | Financial customer service dialogue generation method, device, equipment and medium based on LLM model | |
CN117454990B (en) | System for updating electronic medical record question-answer model based on feedback result | |
CN117520126B (en) | Scoring system of electronic medical record question-answering model | |
JP4143234B2 (en) | Document classification apparatus, document classification method, and storage medium | |
CN114373554A (en) | Drug interaction relation extraction method using drug knowledge and syntactic dependency relation | |
CN111415750B (en) | Rule-based user information structuring and quick retrieval method and system | |
CN110633363B (en) | Text entity recommendation method based on NLP and fuzzy multi-criterion decision | |
CN116680401A (en) | Document processing method, document processing device, apparatus and storage medium | |
CN114372478A (en) | Knowledge distillation-based question and answer method, terminal equipment and storage medium | |
CN114861625A (en) | Method for obtaining target training sample, electronic device and medium | |
CN117332768B (en) | Data processing system for acquiring text generation template | |
CN118093736B (en) | Acquisition system for corresponding entity and entity tag of medical record text | |
CN117493588B (en) | Search result determining method and device, storage medium and electronic device |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||