CN117454989B

CN117454989B - System for updating electronic medical record question-answer model based on parameter adjustment

Info

Publication number: CN117454989B
Application number: CN202311514678.7A
Authority: CN
Inventors: 刘立宇; 初乃强; 赵瑞莹
Original assignee: Singularity Digital Beijing Technology Co ltd; Singularity Of Life Beijing Technology Co ltd
Current assignee: Singularity Digital Beijing Technology Co ltd; Singularity Of Life Beijing Technology Co ltd
Priority date: 2023-11-14
Filing date: 2023-11-14
Publication date: 2024-10-22
Anticipated expiration: 2043-11-14
Also published as: CN117454989A

Abstract

The invention relates to a system for updating an electronic medical record question-answer model based on parameter adjustment, which comprises a sample electronic medical record information set, a processor and a memory storing a computer program, wherein when the computer program is executed by the processor, the following steps are realized: according to a sample electronic medical record text information set, an initial electronic medical record question-answering model is obtained, when the data volume of a training set corresponding to the initial electronic medical record question-answering model is larger than a preset data volume threshold value, a candidate parameter list is obtained, a first middle priority list is obtained, a second middle priority set and a third middle priority set are respectively obtained based on a first preset text, and a target parameter is adjusted according to a final priority list to update the initial electronic medical record question-answering model.

Description

System for updating electronic medical record question-answer model based on parameter adjustment

Technical Field

The invention relates to the technical field of large language models, in particular to a system for updating an electronic medical record question-answering model based on parameter adjustment.

Background

With the continuous growth of medical services and the continuous development of artificial intelligence technology, the digitization of medical records has become a trend, and more and more medical models are being designed on the basis of medical record documents. The performance of these models determines the reliability of their data processing results. As large language models are widely applied in natural language processing, improving their inference speed in actual deployment has become a critical problem, and adjusting the parameters of electronic medical record question-answering models has therefore become a popular research direction.

At present, in the prior art, the method for updating the electronic medical record question-answering model comprises the following steps: rule matching is carried out based on the regular expression, training reasoning is carried out based on the traditional machine learning model to obtain an electronic medical record question-answering model, and parameters of the electronic medical record question-answering model are adjusted by adopting a batch reasoning method.

In summary, the method for updating the electronic medical record question-answering model has the following problems: the model parameters are not adjusted based on the size of the data volume, so that the adaptation degree of the model is reduced, the problems of time increase and resource waste of model training are caused when the data volume is too large, and simultaneously the reasoning capacity and the response capacity of the model are reduced in the process of adjusting the model parameters, so that the accuracy of the output result of the electronic medical record question-answering model is reduced.

Disclosure of Invention

The invention provides a system for updating an electronic medical record question-answering model based on parameter adjustment. The system comprises a sample electronic medical record information set, a processor and a memory storing a computer program, wherein the sample electronic medical record information set comprises a plurality of pieces of sample electronic medical record information, each piece of sample electronic medical record information is abnormal state characteristic information obtained from the medical records in a database, and when the computer program is executed by the processor, the following steps are implemented:

S100, acquiring an initial electronic medical record question-answering model according to the sample electronic medical record information set.

S200, when the data volume of the training set corresponding to the initial electronic medical record question-answering model is larger than a preset data volume threshold, acquiring a candidate parameter list ω = {ω_1, ..., ω_c, ..., ω_w} corresponding to the initial electronic medical record question-answering model, where ω_c is the c-th candidate parameter, c = 1, ..., w, w is the number of candidate parameters, ω_c = 2^c, and w = 6.

S300, according to ω, acquiring a first intermediate priority list Tω = {Tω_1, ..., Tω_c, ..., Tω_w} corresponding to ω, where Tω_c is the first intermediate priority corresponding to ω_c.

S400, when the first preset text is a first preset text of the first type, acquiring, based on the preset weight types, a second intermediate priority set Eω = {Eω_1, ..., Eω_c, ..., Eω_w} corresponding to ω, where Eω_c = {Eω_c1, ..., Eω_cμ, ..., Eω_cτ}, Eω_cμ is the μ-th second intermediate priority in the second intermediate priority list corresponding to ω_c, μ = 1, ..., τ, and τ is the number of preset weight types.

S500, when the first preset text is a first preset text of the second type, acquiring, based on the preset weight types, a third intermediate priority set Lω = {Lω_1, ..., Lω_c, ..., Lω_w} corresponding to ω, where Lω_c = {Lω_c1, ..., Lω_cμ, ..., Lω_cτ} and Lω_cμ is the μ-th third intermediate priority in the third intermediate priority list corresponding to ω_c.

S600, according to Tω, Eω and Lω, acquiring a final priority list Fω = {Fω_1, ..., Fω_c, ..., Fω_w} corresponding to ω, where Fω_c meets the following condition:

S700, according to Fω, taking ω_c as the target parameter of the initial electronic medical record question-answering model so as to update the initial electronic medical record question-answering model, where Fω_c is the largest final priority in Fω.

The invention relates to a system for updating an electronic medical record question-answering model based on parameter adjustment, which comprises a sample electronic medical record information set, a processor and a memory storing a computer program, wherein the sample electronic medical record information set comprises a plurality of sample electronic medical record information, the sample electronic medical record information is corresponding abnormal state characteristic information in medical records acquired from a database, and when the computer program is executed by the processor, the following steps are realized: according to a sample electronic medical record text information set, an initial electronic medical record question-answering model is obtained, when the data volume of a training set corresponding to the initial electronic medical record question-answering model is larger than a preset data volume threshold, a candidate parameter list corresponding to the initial electronic medical record question-answering model is obtained, according to the candidate parameter list, a first intermediate priority list is obtained, when a first preset text is a first preset text of a first type, a second intermediate priority set corresponding to omega is obtained based on a preset weight type, when the first preset text is a first preset text of a second type, a third intermediate priority set corresponding to omega is obtained based on a preset weight type, a final priority list is obtained according to the first intermediate priority set, the second intermediate priority set and the third intermediate priority set, and according to the final priority list, target parameters of the initial electronic medical record question-answering model are obtained to update the initial electronic medical record question-answering model.

Drawings

In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings required for the description of the embodiments will be briefly described below, and it is apparent that the drawings in the following description are only some embodiments of the present invention, and other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.

Fig. 1 is a flowchart of the computer program executed by a system for updating an electronic medical record question-answering model based on parameter adjustment according to an embodiment of the present invention.

Detailed Description

The following description of the embodiments of the present invention will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present invention, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to fall within the scope of the invention.

It should be noted that the terms "first," "second," and the like in the description and the claims of the present invention and the above figures are used for distinguishing between similar objects and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used may be interchanged where appropriate such that the embodiments of the invention described herein may be implemented in sequences other than those illustrated or otherwise described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or server that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed or inherent to such process, method, article, or apparatus, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.

A system for updating an electronic medical record question-answering model based on parameter adjustment, the system comprising a sample electronic medical record information set, a processor and a memory storing a computer program which, when executed by the processor, implements the following steps, as shown in Fig. 1:

Specifically, in S100, an initial electronic medical record question-answer model is obtained by:

S10, acquiring a specified text vector set according to the sample electronic medical record information set.

Specifically, the sample electronic medical record information set includes a plurality of pieces of sample electronic medical record information, where the sample electronic medical record information is abnormal state feature information obtained from the medical records in a database, and the abnormal state feature information is feature information associated with a disease, for example, an abnormal glycoprotein test result, or poorly differentiated squamous cell carcinoma found in the nasopharynx.

Furthermore, those skilled in the art know that any public medical database from which medical records can be obtained may be selected according to actual requirements; all such choices fall within the protection scope of the present invention and are not described again here.

Further, the data format of the sample electronic medical record information comprises a text format and a table format.

Specifically, the system further comprises a target term knowledge graph, wherein the target term knowledge graph is represented as triples, and each triple comprises two entities related to an abnormal state and the relationship between those two entities.

Further, those skilled in the art know that any method for constructing a knowledge graph based on a target term in the prior art falls into the protection scope of the present invention, and is not described herein.

Specifically, in S10, the method further includes the following steps:

S1, according to the sample electronic medical record information set, acquiring a candidate text set A = {A_1, ..., A_i, ..., A_n}, where A_i is the i-th candidate text, i = 1, ..., n, and n is the number of candidate texts.

Specifically, in S1, candidate texts are obtained by:

S11, when the data format of the sample electronic medical record information is a text format, segmenting the sample electronic medical record information at segmentation symbols to generate candidate texts.

S13, when the data format of the sample electronic medical record information is a table format, integrating each record in the sample electronic medical record information with the field names corresponding to that record to generate a candidate text. This can be understood as follows: when the field names in the sample electronic medical record information are, from left to right, ID, biopsy site and histological classification, and the contents of a certain row are, from left to right, 008, nasopharynx and squamous cell carcinoma, the candidate text obtained is: "For ID 008, the biopsy site is the nasopharynx and the histological classification is squamous cell carcinoma."
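The two cases S11 and S13 can be illustrated with a minimal Python sketch. The function names, the default delimiter set and the "field: value" output template are assumptions made for illustration only; the text above does not specify which segmentation symbols are used or how field names and values are worded when they are merged.

```python
import re


def split_text_record(text, delimiters="。；;\n"):
    """S11 sketch: split a free-text record at segmentation symbols.

    The delimiter set is a hypothetical default, not taken from the source.
    """
    pattern = "[" + re.escape(delimiters) + "]+"
    return [seg.strip() for seg in re.split(pattern, text) if seg.strip()]


def row_to_candidate_text(field_names, row):
    """S13 sketch: merge one table row with its field names into one text.

    The "name: value" template is an assumption; any wording that keeps each
    value next to its field name would serve the same purpose.
    """
    return ", ".join(f"{name}: {value}" for name, value in zip(field_names, row))


if __name__ == "__main__":
    print(split_text_record("Nasopharynx biopsy taken; squamous cell carcinoma found"))
    print(row_to_candidate_text(
        ["ID", "biopsy site", "histological classification"],
        ["008", "nasopharynx", "squamous cell carcinoma"],
    ))
```

Keeping the field name next to each value means a table row carries the same context as a sentence from a free-text record once both are encoded.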

S3, according to A and the target term knowledge graph, acquiring a candidate keyword set Q = {Q_1, ..., Q_i, ..., Q_n} corresponding to A, where Q_i is the candidate keyword list corresponding to A_i.

Specifically, in S3, Q_i is obtained by the following steps:

S31, according to A, acquiring a first intermediate word set B = {B_1, ..., B_i, ..., B_n} corresponding to A, where B_i = {B_i1, ..., B_ij, ..., B_im(i)}, B_ij is the j-th first intermediate word in the first intermediate word list corresponding to A_i, j = 1, ..., m(i), and m(i) is the number of first intermediate words in the first intermediate word list corresponding to A_i.

Specifically, the first intermediate word is a word obtained from the candidate text, and those skilled in the art know that any method for extracting a word from the text in the prior art falls within the protection scope of the present invention, and is not described herein in detail.

S33, according to the target term knowledge graph, acquiring a target word list D = {D_1, ..., D_r, ..., D_s}, where D_r is the r-th target word, r = 1, ..., s, and s is the number of target words.

Specifically, the target word is an entity related to the abnormal state obtained from the target term knowledge graph.

S35, according to B and D, acquiring a first intermediate similarity set F = {F_1, ..., F_i, ..., F_n} corresponding to B, where F_i = {F_i1, ..., F_ij, ..., F_im(i)}, F_ij = {F^1_ij, ..., F^r_ij, ..., F^s_ij}, and F^r_ij is the first intermediate similarity between B_ij and D_r.

Specifically, the first intermediate similarity is a similarity between a word vector corresponding to the first intermediate word and a word vector corresponding to the target word, where one skilled in the art knows that any method for calculating the similarity between the vectors in the prior art falls within the protection scope of the present invention, and is not described herein.

Further, the word vector corresponding to the first intermediate word is obtained by inputting the first intermediate word into a natural language processing model, where those skilled in the art know that any prior-art natural language processing model for converting text into vectors falls within the protection scope of the present invention and is not described here.

S37, when F^r_ij ≥ F^0, inserting B_ij into Q_i, where F^0 is a preset first intermediate similarity threshold.

Specifically, the value range of F^0 is 0.8-0.9, where those skilled in the art know that F^0 can be selected according to actual requirements; all such choices fall within the protection scope of the present invention and are not described here.
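Steps S31 to S37 amount to a threshold filter over word-to-target similarities. The sketch below assumes cosine similarity (the text leaves the similarity measure open) and assumes the word vectors have already been produced by some embedding model; `cosine` and `select_keywords` are illustrative names, not part of the described system.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 for a zero vector)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def select_keywords(word_vectors, target_vectors, threshold=0.8):
    """Keep each first intermediate word whose similarity to at least one
    target word reaches the threshold (the role played by F^0 in S37).

    word_vectors:   dict mapping each word of one candidate text to its vector
    target_vectors: dict mapping each knowledge-graph target word to its vector
    """
    keywords = []
    for word, vec in word_vectors.items():
        if any(cosine(vec, tvec) >= threshold for tvec in target_vectors.values()):
            keywords.append(word)
    return keywords


if __name__ == "__main__":
    words = {"carcinoma": [1.0, 0.0], "the": [0.0, 1.0]}
    targets = {"squamous cell carcinoma": [0.9, 0.1]}
    print(select_keywords(words, targets))
```

With the toy two-dimensional vectors above, only "carcinoma" passes the 0.8 threshold, so it alone is added to the candidate keyword list.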

S5, according to A and Q, acquiring an initial text set T = {T_1, ..., T_i, ..., T_n}, where T_i = {A_i, Q_i} and T_i is the i-th initial text.

Specifically, the initial text is obtained by appending the candidate keywords after the candidate text.

S7, according to T, acquiring a specified text set U = {U_1, ..., U_i, ..., U_n}, where U_i is the i-th specified text and is acquired in S7 through the following steps:

S71, according to T_i, acquiring the text string WT_i = (WT^0_i1, ..., WT^0_ix, ..., WT^0_ip, WT^1_i1, ..., WT^1_iy, ..., WT^1_iq) of T_i, where WT^0_ix is the x-th text character corresponding to A_i, x = 1, ..., p, p is the number of text characters corresponding to A_i, WT^1_iy is the y-th text character corresponding to Q_i, y = 1, ..., q, and q is the number of text characters corresponding to Q_i.

S72, when p+q = K, acquiring U_i = T_i, where K is a preset key priority threshold.

Specifically, in S72, K is obtained by:

S721, according to T, acquiring a key text set C = {C_1, ..., C_d, ..., C_z}, where C_d = {C_d1, ..., C_dg, ..., C_dh(d)}, C_dg is the g-th key text in the d-th type key text list, g = 1, ..., h(d), h(d) is the number of key texts in the d-th type key text list, d = 1, ..., z, and z is the number of key text types.

Specifically, the key texts are initial texts obtained from T and grouped by the text type of each initial text. Those skilled in the art know that any prior-art method for classifying text, for example classification by the keywords of the text, falls within the protection scope of the present invention and is not described here. The text types are the types to which the initial texts belong, for example a cardiac type and an eye, nose and throat type.

S723, according to C, acquiring a first text string quantity set C^0 = {C^0_1, ..., C^0_d, ..., C^0_z} corresponding to C, where C^0_d = {C^0_d1, ..., C^0_dg, ..., C^0_dh(d)} and C^0_dg is the first text string quantity corresponding to C_dg.

Specifically, the first text string quantity is the number of text strings corresponding to the key text.

S725, according to C^0, acquiring a second text string quantity set C^1 = {C^1_1, ..., C^1_d, ..., C^1_z} corresponding to C, where C^1_d = {C^1_d1, ..., C^1_du, ..., C^1_dh(d)}, C^1_du is the u-th second text string quantity in the second text string quantity list corresponding to the d-th type key text list, u = 1, ..., h(d), and C^1_d1 ≥ ... ≥ C^1_du ≥ ... ≥ C^1_dh(d).

Specifically, the second text string quantities are the first text string quantities sorted in descending order.

Further, the number of text strings is the number of text strings corresponding to the text.

S727, according to C^1, acquiring K, where K meets the following condition:

where C^1_dα is the α-th second text string quantity in the second text string quantity list corresponding to the d-th type key text list, and ε is a preset first number threshold.

Specifically, α is an integer not greater than h(d) × ε.

Specifically, the value range of epsilon is 0.85-1, wherein, the person skilled in the art knows that epsilon can be selected according to the actual requirement, and the epsilon falls into the protection range of the invention, and is not repeated here.
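The per-type quantities that feed K can be sketched as follows. The sketch stops at the α-th sorted length of each text type, because the condition that combines these values into K is not reproduced in this text. It assumes that a "number of text strings" is a character count, that α is the largest integer not exceeding h(d) × ε, and that α is at least 1; `alpha_th_lengths` is an illustrative name.

```python
import math


def alpha_th_lengths(key_texts_by_type, epsilon=0.9):
    """For each text type d, sort the key-text lengths in descending order
    and return the alpha-th one, with alpha = floor(h(d) * epsilon).

    key_texts_by_type: dict mapping a type label to its list of key texts.
    The floor and the lower bound of 1 are assumptions of this sketch.
    """
    result = {}
    for text_type, texts in key_texts_by_type.items():
        if not texts:
            continue  # a type with no key texts contributes nothing
        lengths = sorted((len(t) for t in texts), reverse=True)
        alpha = max(1, math.floor(len(lengths) * epsilon))
        result[text_type] = lengths[alpha - 1]  # alpha is 1-based
    return result


if __name__ == "__main__":
    sample = {
        "cardiac": ["a" * 10, "a" * 8, "a" * 6, "a" * 4],
        "ent": ["b" * 5, "b" * 3],
    }
    print(alpha_th_lengths(sample, epsilon=0.85))
```

How the per-type values are aggregated into the single threshold K is left open here on purpose.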

In the above, the preset key priority threshold is obtained based on the types of the key texts and the number of text strings corresponding to the key texts of each type, so that the number of text strings corresponding to every initial text is made uniform. Taking the text types into account ensures the comprehensiveness of the texts corresponding to the specified text vectors obtained later, and setting the threshold from the number of text strings of each type of key text improves the accuracy of the unified text string number. This avoids both the loss of text data caused by text strings that are too short and the reduced processing efficiency caused by text strings that are too long, and thereby improves the accuracy of the specified text vector set obtained later.

S73, when p+q > K, acquiring a candidate priority set P = {P_1, ..., P_i, ..., P_n} corresponding to Q, where P_i = {P_i1, ..., P_ie, ..., P_if(i)}, P_ie is the candidate priority corresponding to the e-th candidate keyword in the candidate keyword list corresponding to Q_i, e = 1, ..., f(i), and f(i) is the number of candidate keywords in the candidate keyword list corresponding to Q_i.

Specifically, in S73, P_ie is obtained by the following steps:

S731, acquiring the candidate keyword list Q_i = {Q_i1, ..., Q_ie, ..., Q_if(i)}, where Q_ie is the e-th candidate keyword in Q_i.

S733, according to the target term knowledge graph, acquiring a specified keyword list R_ie = {R^1_ie, ..., R^a_ie, ..., R^b(e)_ie} corresponding to Q_ie and a specified priority list G_ie = {G^1_ie, ..., G^a_ie, ..., G^b(e)_ie} corresponding to Q_ie, where R^a_ie is the a-th specified keyword corresponding to Q_ie, a = 1, ..., b(e), b(e) is the number of specified keywords corresponding to Q_ie, and G^a_ie is the specified priority between Q_ie and R^a_ie.

Specifically, the specified keyword is a target word associated with the candidate keyword, which is obtained from the target term knowledge graph.

Specifically, the specified priority is the association degree between the candidate keyword and the specified keyword, where those skilled in the art know that any method for obtaining the association degree between two texts in the prior art falls into the protection scope of the present invention, and is not described herein in detail.

S735, according to Q_ie, R_ie and G_ie, acquiring P_ie, where P_ie meets the following condition:

where M_ie is the number of occurrences of Q_ie in candidate text set A, N_ie is the number of first intermediate words corresponding to the candidate texts in candidate text set A that include Q_ie, V_ie is the number of candidate texts in candidate text set A that include Q_ie, E^a_ie is the number of occurrences of G^a_ie in candidate text set A, L^a_ie is the number of first intermediate words corresponding to the candidate texts in candidate text set A that include G^a_ie, and J^a_ie is the number of candidate texts in candidate text set A that include G^a_ie.

S74, processing WT_i based on P to obtain U_i.

Specifically, in S74, the following steps are further included:

S741, according to P_i, acquiring a first intermediate text β^1_i = (A_i, Q_i1, ..., Q_i(e-1), Q_i(e+1), ..., Q_if(i)) corresponding to T_i, where P_ie is the smallest candidate priority in P_i.

S743, when the number of text strings corresponding to β^1_i is not greater than K, acquiring U_i = β^1_i.

S745, when the number of text strings corresponding to β^1_i is greater than K, acquiring the smallest candidate priority in P_i other than P_ie, and deleting the candidate keyword corresponding to it from Q_i to obtain the second intermediate text β^2_i corresponding to T_i.

S747, repeating S743 to S745 until the number of text strings corresponding to the resulting intermediate text is not greater than K, so as to acquire U_i.
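The loop formed by S741 to S747 can be sketched as repeated removal of the lowest-priority keyword. The sketch assumes that the "number of text strings" is a character count and takes the candidate priorities as given, since the condition defining P_ie is not reproduced in this text; `trim_keywords` is an illustrative name.

```python
def trim_keywords(candidate_text, keywords, priorities, max_len):
    """Drop the lowest-priority keyword until the candidate text plus the
    remaining keywords fits within max_len characters (the role of K).

    Returns the candidate text and the surviving keywords in original order.
    If the candidate text alone exceeds max_len, every keyword is removed
    and the text is returned unchanged; the source does not cover that case.
    """
    kept = list(zip(keywords, priorities))

    def total_len():
        return len(candidate_text) + sum(len(kw) for kw, _ in kept)

    while kept and total_len() > max_len:
        lowest = min(range(len(kept)), key=lambda idx: kept[idx][1])
        kept.pop(lowest)  # delete the keyword with the smallest priority
    return candidate_text, [kw for kw, _ in kept]


if __name__ == "__main__":
    print(trim_keywords("abcde", ["xx", "yyy", "z"], [0.5, 0.2, 0.9], max_len=8))
```

Removing one keyword per iteration mirrors the step-by-step description: the text is re-measured after every deletion, so no more keywords are discarded than the length limit requires.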

S75, when p+q < K, acquiring a specified keyword set R_i = {R_i1, ..., R_ie, ..., R_if(i)} corresponding to Q_i and a specified priority set G_i = {G_i1, ..., G_ie, ..., G_if(i)} corresponding to Q_i, where R_ie is the specified keyword list corresponding to Q_ie and G_ie is the specified priority list corresponding to Q_ie.

S76, according to R_i and G_i, processing WT_i to obtain U_i.

Specifically, in S76, the following steps are further included:

S761, when G^a_ie is the largest specified priority in G_ie, acquiring a first candidate text set corresponding to T_i, where the first candidate text set comprises a plurality of first candidate texts, and the first candidate texts are candidate texts obtained from A that include the specified keyword R^a_ie corresponding to G^a_ie.

S763, based on the first candidate text set corresponding to T_i, acquiring a second candidate text H_i corresponding to T_i, where H^0_i = K-p-q and H^0_i is the number of text strings corresponding to H_i.

S765, according to H_i, acquiring U_i = (A_i, Q_i, H_i).

In the above, when the number of text strings corresponding to the initial text is greater than the preset threshold K, candidate keywords are deleted in ascending order of candidate priority; when it is less than K, the initial text is supplemented with text associated with its candidate keywords. Adopting different processing modes for different numbers of text strings unifies the number of text strings corresponding to the initial texts and improves the accuracy of the specified text vector set obtained later.
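The supplementing branch (S75 to S765) can be sketched in the same style. How the second candidate text H_i is derived from the first candidate text set is not specified beyond its length K-p-q, so this sketch simply concatenates the matching candidate texts and cuts the result to the missing length; that choice, the character-count reading of "number of text strings", and the name `pad_text` are all assumptions.

```python
def pad_text(candidate_text, keywords, specified, all_candidate_texts, target_len):
    """Extend a short initial text towards target_len characters (the role of K).

    specified: list of (specified_keyword, specified_priority) pairs taken
               from the knowledge graph for this text's candidate keywords.
    The filler is cut from candidate texts that contain the specified
    keyword with the highest priority (a hypothetical choice of H_i).
    """
    current = candidate_text + "".join(keywords)
    missing = target_len - len(current)
    if missing <= 0 or not specified:
        return current
    best_keyword, _ = max(specified, key=lambda pair: pair[1])
    related = [t for t in all_candidate_texts if best_keyword in t]
    filler = "".join(related)[:missing]  # may be shorter if little text matches
    return current + filler


if __name__ == "__main__":
    print(pad_text(
        "fever", ["flu"],
        [("cough", 0.9), ("rash", 0.4)],
        ["fever", "dry cough at night", "skin rash"],
        target_len=14,
    ))
```

If too little related text exists, the result stays shorter than the target length; the source does not say how that case is handled.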

S9, acquiring a specified text vector set according to the U, wherein the specified text vector set comprises a plurality of specified text vectors, and the specified text vectors are acquired by inputting specified texts into a pre-training electronic medical record coding model.

Specifically, the pre-trained electronic medical record coding model is a model for converting text into vectors, obtained by training a pre-trained model on a medical record text training set.

Further, those skilled in the art know that the pre-trained model, for example ERNIE, can be selected according to actual requirements; all such choices fall within the protection scope of the present invention and are not described here.

Further, the medical record text training set is a set of medical record texts for model training, collected through different search engines, and the set comprises a plurality of medical record texts of different types and forms.

Further, those skilled in the art know that any prior-art method of obtaining text from multiple search engines, for example Baidu, falls within the protection scope of the present invention and is not described here.

S20, acquiring a first target text set corresponding to the first preset text set based on the first preset text set and the appointed text vector set.

Specifically, the first preset text set includes a plurality of first preset texts, where the first preset texts are question texts related to abnormal states, which are acquired based on the abnormal states.

Further, the question text is a text that asks for an answer and an explanation in the form of a question, for example, a question text about the manifestations of a luteinizing hormone level below 3.

Further, the first preset text is a question text obtained through a medical public database, where those skilled in the art know that any text of a question related to medicine obtained based on the medical public database in the prior art falls within the protection scope of the present invention, and is not described herein.

Specifically, in S20, the following steps are further included:

S21, acquiring a first preset text vector set I = {I_1, ..., I_t, ..., I_θ}, where I_t is the first preset text vector corresponding to the t-th first preset text, t = 1, ..., θ, and θ is the number of first preset texts.

Specifically, the first preset text vector is obtained by inputting the first preset text into the pre-trained electronic medical record coding model.

S23, acquiring the specified text vector set, whose i-th element is the i-th specified text vector.

S25, according to I and the specified text vector set, acquiring a first target similarity set ER = {ER_1, ..., ER_t, ..., ER_θ} corresponding to I, where ER_t = {ER_t1, ..., ER_ti, ..., ER_tn} and ER_ti is the first target similarity between I_t and the i-th specified text vector.

Specifically, those skilled in the art know that any prior-art method for obtaining the similarity between vectors, for example cosine similarity, falls within the protection scope of the present invention and is not described here.

S27, when ER_ti ≥ ER^0, taking the specified text U_i corresponding to the i-th specified text vector as a first target text corresponding to I_t, where ER^0 is a preset second priority threshold.

Specifically, the value range of ER^0 is 0.8-0.85, and those skilled in the art know that ER^0 can be selected according to actual requirements; all such choices fall within the protection scope of the present invention and are not described here.
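Steps S21 to S27 describe a thresholded similarity lookup from question vectors to specified text vectors. The sketch below uses cosine similarity, which the text names as one option, and assumes the vectors were already produced by the coding model; `retrieve_first_target_texts` is an illustrative name.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 for a zero vector)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def retrieve_first_target_texts(question_vectors, text_vectors, texts, threshold=0.8):
    """For each question vector, return every specified text whose vector is
    at least `threshold` similar to it (the role played by ER^0 in S27).

    text_vectors[i] must be the vector of texts[i].
    """
    return [
        [texts[i] for i, tvec in enumerate(text_vectors) if cosine(qvec, tvec) >= threshold]
        for qvec in question_vectors
    ]


if __name__ == "__main__":
    questions = [[1.0, 0.0]]
    vectors = [[0.9, 0.1], [0.0, 1.0]]
    print(retrieve_first_target_texts(questions, vectors, ["text 1", "text 2"]))
```

A question can match several texts or none; the result keeps one list per question so that later steps can tell these cases apart.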

S30, acquiring a second target text set corresponding to the first preset text set based on the first preset text set and the first target text set.

Specifically, the second target text set includes a plurality of second target texts, where a second target text is an explanatory content text associated with the first preset text, generated through a prompt instruction based on the first preset text and the first target text set. For example, when the first preset text relates to the heart, the heart is briefly explained by combining the first target texts related to the first preset text with relevant knowledge from the abnormal state field, and the first preset text together with the explanatory content obtained from it is taken as the second target text.

Further, those skilled in the art know that any method for training by using a prompt instruction in the prior art to output a result falls within the protection scope of the present invention, and is not described herein.

According to the method, the second target text set corresponding to the first preset text set is generated based on the first preset text set and the first target text through the prompt instruction, the medical record text corresponding to each question text is obtained, and the prompt instruction is used for setting the instruction for the corresponding question text, so that understanding and replying of the electronic medical record question-answering system are facilitated, and accuracy of an output result of the electronic medical record question-answering system is improved.

S40, inputting the first preset text set and the second target text set into a preset first initial LLM model, and obtaining a third target text set corresponding to the first preset text set.

Specifically, the third target text set includes a plurality of third target texts, where the third target texts are answer texts and interpretation texts corresponding to a first preset text obtained based on the first preset text.

Further, the answer text is a text which answers based on the question text.

Further, the explanation text is text for obtaining explanation of the answer text based on the question text.

Further, in S40, a third target text is acquired by:

S41, acquiring ψ fourth target texts corresponding to the first preset text according to the first preset text and the second target text corresponding to the first preset text, where the fourth target texts are answer texts and explanation texts corresponding to the first preset text, obtained from a plurality of LLM models based on the second target text.

Specifically, those skilled in the art know that any prior-art method of outputting a result through an LLM model falls within the protection scope of the present invention and is not described here; the LLM models include, for example, the Baichuan-13B model and the LLaMA model.

Specifically, the value range of ψ is 30-50, where those skilled in the art know that ψ can be selected according to actual requirements; all such choices fall within the protection scope of the present invention and are not described here.

S43, acquiring a priority corresponding to the fourth target text according to the fourth target text, wherein the priority is a score value acquired based on a voting method, and any method for acquiring the score based on the voting method in the prior art is known by a person skilled in the art and falls into the protection scope of the present invention, and is not repeated herein.

Specifically, the value range of the priority is 0-1.

S45, acquiring a third target text corresponding to the first preset text according to the priority, wherein the third target text is a fourth target text corresponding to the maximum priority.
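Steps S41 to S45 can be sketched as follows. The voting method is not specified in the text, so this minimal sketch assumes simple plurality voting: the priority of a candidate is the fraction of the ψ candidates that give the same answer text, which lies in the range 0-1. The function name `select_third_target_text` and the (answer, explanation) tuple layout are illustrative assumptions, not part of the described system.

```python
from collections import Counter

def select_third_target_text(candidates):
    """Pick the candidate whose answer receives the largest vote share.

    candidates: list of (answer_text, explanation_text) tuples, one per
    fourth target text. Returns (best_candidate, priority).
    """
    if not candidates:
        raise ValueError("at least one fourth target text is required")
    # Count how many candidates agree on each answer text.
    votes = Counter(answer for answer, _ in candidates)
    total = len(candidates)
    # Assumed priority: share of candidates with the same answer (0..1).
    priorities = [votes[answer] / total for answer, _ in candidates]
    # On ties, max() keeps the first candidate with the highest priority.
    best_index = max(range(total), key=lambda i: priorities[i])
    return candidates[best_index], priorities[best_index]
```

With real model outputs, answers would likely need normalising (whitespace, punctuation, synonyms) before they can be compared for agreement; that step is omitted here.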

S50, the first target text set, the second target text set and the third target text set are used as training sets to be input into a preset second initial LLM model, and an initial electronic medical record question-answering model is generated.

Specifically, the candidate parameter is the rank of a matrix that is set in order to reduce the training time of the training set in the initial electronic medical record question-answering model. The rank can be understood as follows: data processing in an LLM model involves matrix-matrix multiplication, and when the data volume of the training set is too large, training efficiency drops; a matrix with a smaller rank is therefore set to assist training and reduce the training time, and the candidate parameter is the rank set for that matrix.

Further, the value range of the preset data quantity threshold is 100 GB-1 TB, and those skilled in the art know that the selection of the preset data quantity threshold can be performed according to the actual requirement, which falls into the protection range of the present invention, and will not be described herein.
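As an illustration of the candidate parameters: claim 1 fixes ω_c = 2^c with w = 6, so the candidate ranks are 2, 4, ..., 64. The sketch below builds that list only when the training-set size exceeds the threshold. It also assumes a LoRA-style factorisation of a d×d weight matrix into a d×r and an r×d factor (an assumption; the text only says that a matrix with a smaller rank is set) to show how many trainable values each rank implies. The hidden size 4096, the 100 GB default and both function names are hypothetical.

```python
def candidate_ranks(train_bytes, threshold_bytes=100 * 10**9, w=6):
    """Return [2**1, ..., 2**w] if the training set exceeds the threshold, else []."""
    if train_bytes <= threshold_bytes:
        return []
    return [2 ** c for c in range(1, w + 1)]

def low_rank_param_count(d, r):
    """Trainable values in a rank-r factorisation (d x r plus r x d) of a d x d matrix."""
    return 2 * d * r

ranks = candidate_ranks(train_bytes=500 * 10**9)
for r in ranks:
    # Compare against the d*d values of the full square matrix (d = 4096 assumed).
    print(r, low_rank_param_count(4096, r), 4096 * 4096)
```

Under these assumptions even the largest candidate rank trains a small fraction of the values in the full matrix, which is the source of the training-time saving the passage describes.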

Specifically, the first intermediate priority is the occupancy rate of the GPU in the running process of the initial electronic medical record question-answering model, where those skilled in the art know that any method for obtaining the occupancy rate of the GPU in the prior art falls into the protection scope of the present invention, and is not described herein.

Specifically, the first preset text of the first type is a question text which is a single question and has no relevance with other questions.

Specifically, the second intermediate priority is a score value corresponding to the initial electronic medical record question-answering model obtained based on the candidate parameter and the first preset text under different preset weight types, wherein a person skilled in the art knows that any method for obtaining the model based on different conditions in the prior art falls into the protection scope of the present invention, and is not repeated herein.

Specifically, the preset weight type is the type of weight matrix being computed, which can be understood as follows: in the Transformer architecture, the self-attention module has four weight matrices (Wq, Wk, Wv, Wo), where Wq (or Wk, Wv) is treated as a single square matrix.

Specifically, 4 ≤ τ ≤ 30.

Preferably, τ = 6; taking τ = 6 avoids the inefficiency of running a large number of tests while keeping the tests comprehensive.

Specifically, the first preset text of the second type is a question text that contains a plurality of questions that are associated with one another.

Specifically, the third intermediate priority is a score value corresponding to an initial electronic medical record question-answering model obtained under different preset weight types based on the candidate parameter and the second type first preset text.

Further, the obtaining mode of the third intermediate priority is consistent with the obtaining mode of the second intermediate priority.

In this way, the performance of the initial electronic medical record question-answering model is evaluated for each of its candidate parameters. Setting the candidate parameters saves training time and avoids wasting resources without affecting the reasoning capability and related capabilities of the model, and adjusting the parameters at the same time makes the output of the electronic medical record question-answering model more accurate.

Specifically, the method further includes the following steps after S700:

S701, inputting a second preset text set into the initial electronic medical record question-answering model, and obtaining the priority to be selected corresponding to the initial electronic medical record question-answering model.

Specifically, the second preset text set includes a plurality of second preset texts, where the second preset texts are question texts related to abnormal states for testing the effect of the initial electronic medical record question-answering model.

Specifically, in S701, the priority to be selected is acquired by:

S7011, inputting the second preset text set into the initial electronic medical record question-answering model, and obtaining a first key text set EP = {EP₁, …, EP_δ, …, EP_ζ} corresponding to the second preset text set, where EP_δ is the first key text corresponding to the δ-th second preset text, δ = 1, …, ζ, and ζ is the number of second preset texts.

Specifically, the first key text is an answer text and an explanation text corresponding to a second preset text obtained based on an initial electronic medical record question-answering model.

S7013, according to EP, acquiring a first key text vector set EP⁰ = {EP⁰₁, …, EP⁰_δ, …, EP⁰_ζ} corresponding to EP, where EP⁰_δ = (EP⁰_δ1, …, EP⁰_δγ, …, EP⁰_δη), EP⁰_δγ is the value of the γ-th element of the first key text vector corresponding to EP_δ, γ = 1, …, η, and η is the dimension of the first key text vector.

Specifically, the first key text vector is obtained by inputting the first key text into a pre-trained electronic medical record coding model.

S7015, acquiring a second key text set FP = {FP₁, …, FP_δ, …, FP_ζ} corresponding to the second preset text set, where FP_δ is the second key text corresponding to the δ-th second preset text.

Specifically, the second key text is the accurate answer text and explanation text corresponding to the second preset text.

S7017, according to FP, acquiring a second key text vector set FP⁰ = {FP⁰₁, …, FP⁰_δ, …, FP⁰_ζ} corresponding to FP, where FP⁰_δ = (FP⁰_δ1, …, FP⁰_δγ, …, FP⁰_δη) and FP⁰_δγ is the value of the γ-th element of the second key text vector corresponding to FP_δ.

Specifically, the obtaining mode of the second key text vector is consistent with the obtaining mode of the first key text vector.

S7019, according to EP⁰ and FP⁰, obtaining the priority to be selected KL corresponding to the initial electronic medical record question-answering model, wherein KL meets the following conditions:

In another specific embodiment, the priority to be selected is acquired in S701 by:

S7001, inputting the second preset text set into the initial electronic medical record question-answering model, and obtaining a first initial text set EW = {EW₁, …, EW_λ, …, EW_σ}, where EW_λ is the λ-th first initial text, λ = 1, …, σ, and σ is the number of first initial texts.

Specifically, the first initial text is a first key text with a Chinese-English ratio in a preset ratio range, which is obtained from a first key text set.

Further, the first key text set comprises a plurality of first key texts, wherein the first key texts are answer texts and explanation texts corresponding to second preset texts obtained based on an initial electronic medical record question-answering model.

Further, the answer text is a text which answers based on the question text.

Further, the preset ratio range is tr¹ to tr², where tr¹ = tr - tr⁰, tr² = tr + tr⁰, tr is the average Chinese-English ratio of the texts in the sample text, and tr⁰ is a preset ratio threshold.

Further, tr⁰ has a value ranging from 0.01 to 0.1, where those skilled in the art know that tr⁰ can be selected according to actual requirements, all of which falls within the protection scope of the present invention and is not described herein.

Further, the sample text is a text which is output by inputting a preset sample text into an initial electronic medical record question-answering model, wherein the property of the preset sample text is consistent with that of a first preset text, and the acquisition mode of the preset sample text can refer to the acquisition mode of the first preset text.
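The ratio filter that yields the first initial texts can be sketched as follows. The text does not define how the Chinese-English ratio is computed, so this sketch assumes it is the share of Chinese characters among Chinese characters plus ASCII letters; that definition, the default tr⁰ = 0.05 and both function names are assumptions made for illustration only.

```python
def chinese_english_ratio(text):
    """Assumed definition: Chinese characters / (Chinese characters + ASCII letters)."""
    chinese = sum(1 for ch in text if "\u4e00" <= ch <= "\u9fff")
    english = sum(1 for ch in text if ch.isascii() and ch.isalpha())
    total = chinese + english
    return chinese / total if total else 0.0

def filter_first_initial_texts(key_texts, sample_texts, tr0=0.05):
    """Keep the key texts whose ratio lies in [tr - tr0, tr + tr0]."""
    if not sample_texts:
        raise ValueError("sample_texts must not be empty")
    # tr is the mean ratio over the sample texts.
    tr = sum(chinese_english_ratio(t) for t in sample_texts) / len(sample_texts)
    low, high = tr - tr0, tr + tr0
    return [t for t in key_texts if low <= chinese_english_ratio(t) <= high]
```

A model answer that drifts almost entirely into English, or contains no English abbreviations at all when the samples normally do, falls outside the band and is excluded before the similarity comparison.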

S7002, according to EW, obtaining a first initial text vector set EW⁰ = {EW⁰₁, …, EW⁰_λ, …, EW⁰_σ}, where EW⁰_λ = (EW⁰_λ1, …, EW⁰_λγ, …, EW⁰_λη), EW⁰_λγ is the value of the γ-th element of the first initial text vector corresponding to EW_λ, γ = 1, …, η, and η is the dimension of the first initial text vector.

Specifically, the first initial text vector is obtained by inputting the first initial text into a pre-trained electronic medical record coding model.

S7003, according to the first initial text set, acquiring a second initial text set FW = {FW₁, …, FW_λ, …, FW_σ}, where FW_λ is the λ-th second initial text.

Specifically, the second initial text is the accurate answer text and the accurate explanation text of the second preset text corresponding to the first initial text.

S7004, according to FW, obtaining a second initial text vector set FW⁰ = {FW⁰₁, …, FW⁰_λ, …, FW⁰_σ} corresponding to FW, where FW⁰_λ = (FW⁰_λ1, …, FW⁰_λγ, …, FW⁰_λη) and FW⁰_λγ is the value of the γ-th element of the second initial text vector corresponding to FW_λ.

Specifically, the obtaining mode of the second initial text vector is consistent with the obtaining mode of the first initial text vector.

S7005, according to EW⁰ and FW⁰, acquiring a first similarity list ΔW = {ΔW₁, …, ΔW_λ, …, ΔW_σ}, where ΔW_λ meets the following conditions:

S7006, a first initial keyword set corresponding to the EW is obtained according to the EW, wherein the first initial keyword set comprises a plurality of first initial keyword lists, each first initial keyword list comprises a first initial keyword, and the first initial keywords are keywords in a first initial text.

Specifically, the first initial keyword is a word, obtained from a first initial text, that is similar to a target word in the target term knowledge graph.

Specifically, the obtaining manner of the first initial keyword is consistent with the obtaining manner of the candidate keyword, and reference may be made to steps S31 to S37.

S7007, acquiring a second initial keyword set corresponding to FW according to the FW, wherein the second initial keyword set comprises a plurality of second initial keyword lists, each second initial keyword list comprises a second initial keyword, and the second initial keywords are keywords in a second initial text.

Specifically, the obtaining mode of the second initial keyword is consistent with the obtaining mode of the first initial keyword.

S7008, according to the first initial keyword set and the second initial keyword set, acquiring a second similarity list ΔV = {ΔV₁, …, ΔV_λ, …, ΔV_σ}, wherein ΔV_λ is the similarity between the first initial keywords and the second initial keywords corresponding to the same second preset text.

Specifically, ΔV_λ is acquired in the same way as ΔW_λ.

S7009, obtaining the priority to be selected KL corresponding to the initial electronic medical record question-answering model according to ΔW and ΔV.

Specifically, KL is acquired in S7009 by:

S70091, when ΔW_λ ≤ ZM⁰, KL = 0, where ZM⁰ is a preset first similarity threshold.

Specifically, the value range of ZM⁰ is 0.6-0.85, where those skilled in the art know that the preset first similarity threshold can be selected according to actual requirements, all of which falls within the protection scope of the present invention and is not described herein.

S70093, when ΔW_λ ≥ ZM⁰ and ΔV_λ ≤ ZM¹, KL meets the following conditions:

Wherein ZM¹ is a preset second similarity threshold.

Specifically, the value range of ZM¹ is 0.5-0.9, where those skilled in the art know that the preset second similarity threshold can be selected according to actual requirements, all of which falls within the protection scope of the present invention and is not described herein.

S70095, when ΔW_λ ≥ ZM⁰ and ΔV_λ ≥ ZM¹, KL meets the following conditions:

In this way, different coefficients for calculating the priority are set according to the first similarity and the second similarity, so that the priority to be selected corresponding to the electronic medical record question-answering model is obtained from several dimensions and is therefore more accurate. The priority to be selected is also obtained in different ways under different conditions, and setting the priority reasonably makes the results output by the electronic medical record question-answering system more accurate.

S703, adjusting the parameters of the initial electronic medical record question-answering model based on the priority to be selected until the priority to be selected is not smaller than a preset threshold for the priority to be selected, so as to obtain the target electronic medical record question-answering model.

Specifically, the value range of the preset threshold for the priority to be selected is 0.7-0.9, where those skilled in the art know that this threshold can be selected according to actual requirements, all of which falls within the protection scope of the present invention and is not described herein.
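The stopping rule of S703 can be sketched as a simple loop. How the parameters are adjusted in each round is not spelled out here, so `evaluate` and `adjust` are hypothetical callables supplied by the caller, and `max_rounds` is a safety limit added for the sketch rather than something the text requires.

```python
def tune_until_threshold(model, evaluate, adjust, threshold=0.8, max_rounds=20):
    """Adjust `model` until evaluate(model) >= threshold or max_rounds is reached.

    evaluate: callable returning the priority to be selected KL (0..1) of a model.
    adjust:   callable returning an updated model given (model, kl).
    Returns (model, kl, rounds_used).
    """
    kl = evaluate(model)
    rounds = 0
    while kl < threshold and rounds < max_rounds:
        model = adjust(model, kl)   # one round of parameter adjustment
        kl = evaluate(model)        # re-score on the second preset text set
        rounds += 1
    return model, kl, rounds
```

The default threshold of 0.8 is simply a value inside the stated 0.7-0.9 range.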

S705, acquiring a preset key text and inputting the preset key text into the target electronic medical record question-answering model to obtain a target text, wherein the preset key text is a question text to be queried that relates to an abnormal state, and the target text is the answer text and the explanation text corresponding to the preset key text.

Applying the LLM model to electronic medical record question answering makes it possible to process large-scale data and reduces the application limitations of the electronic medical record question-answering model. Setting instructions for the model through prompt instructions helps the electronic medical record question-answering system understand and reply, and improves the accuracy of its output.

Specifically, the step S705 further includes the following steps:

S7051, acquiring a key entity set according to the sample database, where the key entity set includes a plurality of key entities, and the key entities are entities related to abnormal states acquired based on the sample database.

Specifically, the sample database includes a plurality of information related to abnormal states, such as a drug data table, a human body part, an ICD-10 standard word stock, symptom signs, infectious diseases and the like.

Further, in S7051, the key entity is acquired by:

S70511, obtaining a sample entity set according to the sample data set, wherein the sample entity set comprises a plurality of sample entities, and the sample entities are entities related to abnormal states and obtained from the sample data set, which can be understood as: the sample dataset includes a plurality of text descriptions relating to abnormal conditions, from which terms associated with the medical field, namely, the acquired sample entities, are extracted.

In particular, the sample entity set includes millions of sample entities.

Further, it is known in the art that any method of extracting an entity from text in the prior art falls within the protection scope of the present invention, and is not described herein.

S70513, acquiring a first sample entity set according to the sample entity set, wherein the first sample entity set comprises a plurality of first sample entities, and a first sample entity is an entity similar to a sample entity, acquired based on an LLM model.

Specifically, those skilled in the art know that any method for obtaining similar entities based on an LLM model in the prior art falls within the protection scope of the present invention and is not described herein; an example of such an LLM model is ChatGLM.

S70515, obtaining a second sample entity set according to the first sample entity set, wherein the second sample entity set comprises a plurality of second sample entities, and the second sample entities are entities which have no similar characteristics with the first sample entities.

Specifically, those skilled in the art know that any method for acquiring an entity with no similar characteristics to an entity based on an entity characteristic in the prior art falls within the protection scope of the present invention, and is not described herein, for example, acquiring an entity with no similar characteristics to an entity through an FM model, an FFM model, or the like.

S70517, acquiring a key entity set based on the sample entity set, the first sample entity set and the second sample entity set, wherein the key entity set comprises the sample entity set, the first sample entity set and the second sample entity set.

Specifically, the number of the key entities in the key entity set is tens of millions, wherein, those skilled in the art know that the ratio of the first sample entity to the second sample entity can be selected according to the actual requirement, which falls into the protection scope of the present invention, and is not described herein.

S7052, the key entity set and the target entity set are input into the first intermediate model, and the key entity vector set and the target entity vector set are obtained.

Specifically, the target entity set includes a plurality of target entities, wherein the target entities are standard terms related to abnormal states.

Specifically, the first intermediate model is a model for converting text into vectors, where those skilled in the art know that any natural language processing model for converting text into vectors can be selected according to actual requirements, all of which falls within the protection scope of the present invention and is not described herein; an example is the BERT model.

Specifically, the key entity vector set includes a plurality of key entity vectors, where the key entity vectors are vectors corresponding to key entities.

Further, the target entity vector set includes a plurality of target entity vectors, where the target entity vectors are vectors corresponding to target entities.

S7053, inputting the key entity vector set and the target entity vector set into a second intermediate model, and obtaining a final entity set corresponding to the key entity set, wherein the second intermediate model is a preset neural network model.

Specifically, in S7053, the final entity set is acquired by:

S70531, acquiring any key entity vector XY = (XY₁, …, XY_(ab), …, XY_(jk)) from the key entity vector set, where XY_(ab) is the value of the ab-th element of the key entity vector, ab = 1, …, jk, and jk is the dimension of the key entity vector.

S70532, acquiring the target entity vector set ZH = {ZH₁, …, ZH_(cd), …, ZH_(ef)}, where ZH_(cd) = (ZH¹_(cd), …, ZH^(ab)_(cd), …, ZH^(jk)_(cd)), ZH^(ab)_(cd) is the value of the ab-th element of the cd-th target entity vector, cd = 1, …, ef, and ef is the number of target entity vectors.

S70533, according to XY and ZH, acquiring a first intermediate priority list XH = {XH₁, …, XH_(cd), …, XH_(ef)} corresponding to XY, where XH_(cd) is the first intermediate priority between XY and ZH_(cd), and XH_(cd) meets the following conditions:

When the priority corresponding to the entity is obtained, the method is not limited to one method, and the final priority corresponding to the entity is obtained by combining a plurality of methods, so that the accuracy of obtaining the priority corresponding to the entity is improved, and the standardized result corresponding to the output result based on the electronic medical record question-answer model is more accurate.

S70535, acquiring the final entity corresponding to XY according to XH, wherein when XH_(cd) is the largest first intermediate priority in XH, the target entity corresponding to ZH_(cd) is taken as the final entity corresponding to XY.
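The selection in S70531 to S70535 amounts to scoring one key entity vector against every target entity vector and keeping the best match. The condition that defines XH_(cd) is not reproduced in this text, so the sketch below substitutes cosine similarity as a stand-in score; that choice, the pluggable `score` argument and the function names are assumptions, and the described system may combine several measures.

```python
import math

def cosine_similarity(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)

def final_entity(xy, target_vectors, target_entities, score=cosine_similarity):
    """Return (entity, xh): the target entity with the largest score and all scores."""
    if not target_vectors or len(target_vectors) != len(target_entities):
        raise ValueError("target_vectors and target_entities must be non-empty and aligned")
    # xh[cd] plays the role of XH_(cd): the priority between XY and ZH_(cd).
    xh = [score(xy, zh) for zh in target_vectors]
    best = max(range(len(xh)), key=xh.__getitem__)
    return target_entities[best], xh
```

For tens of millions of key entities, a vectorised or approximate nearest-neighbour search would normally replace this per-vector loop.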

S7054, a target model is obtained based on the sample entity set and the final entity set, where the target model is a model trained in the process of obtaining the final entity set based on the sample entity set.

S7055, a first candidate entity set corresponding to the target text is obtained, where the first candidate entity set includes a plurality of first candidate entities, and the first candidate entities are entities obtained from the target text.

Specifically, those skilled in the art know that any method for obtaining an entity from a text in the prior art falls within the protection scope of the present invention, and is not described herein.

S7056, inputting the first candidate entity into a target model, and acquiring a second candidate entity set corresponding to the target text, wherein the second candidate entity set comprises a plurality of second candidate entities, and the second candidate entities are entities in the target entity corresponding to the first candidate entity acquired based on the first candidate entity and the target model.

S7057, the first candidate entity set in the target text is replaced with the corresponding second candidate entity set to implement the normalization process for the target text.

By means of the method, the results output by the electronic medical record question-answering model are subjected to standardized processing, and follow-up data query and statistics are facilitated.
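The replacement in S7057 can be sketched as a single-pass substitution. Here `mapping` stands for the pairs (first candidate entity, second candidate entity) that the target model would produce in S7056; the dictionary form and the function name `normalize_text` are assumptions for illustration.

```python
import re

def normalize_text(target_text, mapping):
    """Replace every first candidate entity in the text with its standard term.

    mapping: dict from an entity mention found in the text to the standard term.
    """
    if not mapping:
        return target_text
    # Longest mentions first, so the alternation prefers the longest match.
    mentions = sorted(mapping, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(m) for m in mentions))
    # A single pass avoids re-replacing text inside an already substituted term.
    return pattern.sub(lambda m: mapping[m.group(0)], target_text)
```

For example, with `{"heart attack": "myocardial infarction"}` the phrase "heart attack" in an answer text is rewritten to the standard term, which is what makes later querying and statistics consistent.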

While certain specific embodiments of the invention have been described in detail by way of example, it will be appreciated by those skilled in the art that the above examples are for illustration only and are not intended to limit the scope of the invention. Those skilled in the art will also appreciate that many modifications may be made to the embodiments without departing from the scope and spirit of the invention. The scope of the invention is defined by the appended claims.

Claims

1. A system for updating an electronic medical record question-answering model based on parameter adjustment, the system comprises a sample electronic medical record information set, a first preset text set, a processor and a memory storing a computer program, wherein the sample electronic medical record information set comprises a plurality of sample electronic medical record information, the sample electronic medical record information is abnormal state characteristic information corresponding to medical records obtained from a database, and when the computer program is executed by the processor, the following steps are realized:

S100, acquiring an initial electronic medical record question-answering model according to the sample electronic medical record information set;

S200, when the data volume of the training set corresponding to the initial electronic medical record question-answering model is greater than a preset data volume threshold, acquiring a candidate parameter list ω = {ω₁, …, ω_c, …, ω_w} corresponding to the initial electronic medical record question-answering model, where ω_c is the c-th candidate parameter, c = 1, …, w, w is the number of candidate parameters, ω_c = 2^c, and w = 6;

S300, according to ω, acquiring a first intermediate priority list Tω = {Tω₁, …, Tω_c, …, Tω_w}, where Tω_c is the first intermediate priority corresponding to ω_c;

S400, when the first preset text is a first preset text of the first type, acquiring, based on the preset weight types, a second intermediate priority set Eω = {Eω₁, …, Eω_c, …, Eω_w} corresponding to ω, where Eω_c = {Eω_c1, …, Eω_cμ, …, Eω_cτ}, Eω_cμ is the μ-th second intermediate priority in the second intermediate priority list corresponding to ω_c, μ = 1, …, τ, and τ is the number of preset weight types;

S500, when the first preset text is a first preset text of the second type, acquiring, based on the preset weight types, a third intermediate priority set Lω = {Lω₁, …, Lω_c, …, Lω_w} corresponding to ω, where Lω_c = {Lω_c1, …, Lω_cμ, …, Lω_cτ} and Lω_cμ is the μ-th third intermediate priority in the third intermediate priority list corresponding to ω_c;

2. The system for updating an electronic medical record question-answering model based on parameter adjustment according to claim 1, wherein the initial electronic medical record question-answering model is obtained in S100 by:

S10, acquiring a specified text vector set according to the sample electronic medical record information set;

S20, acquiring a first target text set corresponding to the first preset text set based on the first preset text set and the specified text vector set;

S30, acquiring a second target text set corresponding to the first preset text set based on the first preset text set and the first target text set;

S40, inputting the first preset text set and the second target text set into a preset first initial LLM model, and acquiring a third target text set corresponding to the first preset text set;

3. The system for updating an electronic medical record question-answering model based on parameter adjustment according to claim 1, wherein the candidate parameter is the rank of a matrix that is set to reduce the training time of the training set in the initial electronic medical record question-answering model.

4. The system for updating an electronic medical record question-answering model based on parameter adjustment according to claim 1, wherein the preset data volume threshold ranges from 100 GB to 1 TB.

5. The system for updating an electronic medical record question-answering model based on parameter adjustment according to claim 1, wherein the first intermediate priority is the occupancy rate of the GPU during operation of the initial electronic medical record question-answering model.

6. The system for updating an electronic medical record question-answering model based on parameter adjustment according to claim 1, wherein the first preset text of the first type is a first preset text that is a single question having no relevance to other questions.

7. The system for updating an electronic medical record question-answering model based on parameter adjustment according to claim 1, wherein the second intermediate priority is a score value corresponding to the initial electronic medical record question-answering model, obtained under different preset weight types based on the candidate parameters and the first preset text of the first type.

8. The system for updating an electronic medical record question-answering model based on parameter adjustment according to claim 1, wherein the first preset text of the second type is a first preset text that includes a plurality of questions that are associated with one another.

9. The system for updating an electronic medical record question-answering model based on parameter adjustment according to claim 1, wherein the third intermediate priority is a score value corresponding to the initial electronic medical record question-answering model, obtained under different preset weight types based on the candidate parameters and the first preset text of the second type.