CN109871538A - A named entity recognition method for Chinese electronic health records - Google Patents

Info

Publication number
CN109871538A
Authority
CN
China
Prior art keywords
speech
vector
entity
input
sequence
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201910119391.1A
Other languages
Chinese (zh)
Inventor
董守斌
蔡晓玲
胡金龙
袁华
董守玲
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
South China University of Technology SCUT
Original Assignee
South China University of Technology SCUT
Application filed by South China University of Technology SCUT
Priority to CN201910119391.1A
Publication of CN109871538A
Legal status: Pending

Landscapes

  • Machine Translation (AREA)

Abstract

The invention discloses a named entity recognition method for Chinese electronic health records, comprising the steps of: 1) constructing a common-word dictionary; 2) simplified part-of-speech tagging; 3) constructing text and part-of-speech vector mapping tables; 4) training a named entity prediction model; 5) predicting named entity labels. By adding part-of-speech features, the invention makes the boundary between named entities and common words easier to distinguish, which improves the accuracy of named entity boundaries. In addition, a self-attention mechanism is introduced into the bidirectional LSTM-CRF model to compute the relevance between the input at each time step and the other components of the sentence, which alleviates the long-range dependency problem and improves named entity recognition accuracy.

Description

A named entity recognition method for Chinese electronic health records
Technical field
The present invention relates to the technical field of named entity recognition in Chinese electronic health records, and in particular to a named entity recognition method for Chinese electronic health records.
Background
Named entity recognition (NER) in electronic health records finds clinical entities related to the patient in the descriptive text of the record, such as the site of the disease, symptoms, drugs used, and operations performed. Named entity recognition for Chinese electronic health records is the key to extracting information from them and lays the foundation for Chinese health information processing tasks such as medical record retrieval, disease prediction, and the construction of medical knowledge graphs. However, electronic health records contain many out-of-vocabulary words, and their number keeps growing; moreover, compared with English, recognizing Chinese named entities is more complex.
The main difficulties of named entity recognition in Chinese electronic health records are: 1) Chinese text has no boundary markers like the spaces in English text, so the first step of entity recognition is to determine the boundaries of named entities; 2) Chinese word segmentation and named entity recognition affect each other; 3) different categories of named entities have different characteristics, which are difficult to handle jointly; 4) unlike medical literature, electronic health records have no unified writing standard, and each author's personal style and the various forms of abbreviation add to the difficulty of entity recognition.
In the medical domain, the earliest named entity recognition for electronic health records generally combined dictionaries with rules. Such methods mostly use rule templates built jointly by linguists and medical domain experts, select features such as punctuation marks, direction words, position words, and head words, and rely mainly on pattern matching and string matching. Dictionary- and rule-based methods depend heavily on the construction of the rule base and the dictionary, and when the data set changes, the rules and dictionary may have to be rebuilt to fit the new data. Machine-learning-based methods compute relevant features from a sample data set, such as word features, label information, and part-of-speech information, to build a recognition model. They fall roughly into two classes. The first treats named entity recognition as a classification task and uses classification methods such as Bayesian models, support vector machines, and maximum entropy. The second treats named entity recognition as a sequence labeling task and uses models such as the hidden Markov model (HMM) and the conditional random field (CRF).
With the development of deep learning, neural networks have also been applied to named entity recognition. Deep learning methods can extract text features automatically, without feature engineering. Most current deep learning models for named entity recognition are recurrent neural network (RNN) + CRF models; using character vectors or word vectors, they achieve good results and have become the mainstream among deep-learning-based NER methods. In these models, the commonly used long short-term memory network (LSTM) automatically extracts contextual features from the text sequence, while the conditional random field (CRF) considers not only the features of the input but also label transition features, and through training the model outputs the optimal label sequence.
The vast majority of deep-learning-based methods use only word vectors or only character vectors as input. Because of the Chinese word segmentation problem, using word vectors as input may introduce segmentation errors that lead to entity recognition errors. Using character vectors as input, on the one hand, expresses semantic information less well and, on the other hand, increases the length of named entities in the sequence, which makes their boundaries harder to extract. In model construction, most methods process the input vectors with an LSTM, which handles long-range dependencies by selectively retaining history information; but as sentences grow longer and the time steps advance, the LSTM becomes less effective, struggles to learn feature information from distant positions, and cannot handle the boundary extraction of named entities well.
Summary of the invention
The purpose of the present invention is to overcome the shortcomings and deficiencies of the prior art by proposing a named entity recognition method for Chinese electronic health records that fuses part-of-speech features with a self-attention mechanism. Adding part-of-speech features makes the boundary between named entities and common words easier to distinguish, which improves the accuracy of named entity boundaries. In addition, a self-attention mechanism is introduced into the bidirectional LSTM-CRF model to compute the relevance between the input at each time step and the other components of the sentence, which alleviates the long-range dependency problem and improves named entity recognition accuracy.
In named entity recognition, machine-learning-based methods often use part-of-speech information as an important feature, but deep learning methods rarely do. One reason is that part-of-speech tagging of electronic health records is not yet mature: the tagging results contain many errors, and the propagation of these errors degrades entity recognition. To address this problem, the invention proposes a simplified part-of-speech tagging method. It removes the part-of-speech tags of vocabulary suspected of being named entities, to avoid mislabeling entities, while retaining the part-of-speech tags of common words, to introduce the contextual part-of-speech information and word boundary information around named entities.
To achieve the above object, the technical solution provided by the present invention is a named entity recognition method for Chinese electronic health records, comprising the following steps:
1) construct a common-word dictionary: segment the labeled data and build a common-word dictionary;
2) simplified part-of-speech tagging: according to the common-word dictionary built in step 1), retain the part-of-speech tags of common words and remove the part-of-speech tags of suspected named entity vocabulary;
3) construct text and part-of-speech vector mapping tables: use the word vector training tool word2vec to train on the text and on the parts of speech of the labeled data separately, obtaining a text vector mapping table and a part-of-speech vector mapping table;
4) train the named entity prediction model: using the mapping tables obtained in step 3), map the text and parts of speech of the labeled data to vectors, concatenate them, and input them into the model that fuses part of speech and self-attention; training yields the named entity prediction model;
5) predict named entity labels: according to the common-word dictionary built in step 1), apply simplified part-of-speech tagging to the Chinese electronic health record data from which entities are to be extracted; using the mapping tables obtained in step 3), map the text and parts of speech of the data to vectors; predict the named entity labels with the prediction model obtained in step 4).
In step 1), constructing the common-word dictionary comprises the following steps:
1.1) segment the labeled data with a Chinese word segmentation tool;
1.2) determine whether each segmented unit falls within the range of a named entity; if so, the unit is part of a named entity or contains part of one, and it is not processed; if not, the unit is a common word and is added to the dictionary, yielding the common-word dictionary.
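The dictionary-building rule of steps 1.1) and 1.2) can be sketched as follows. This is a minimal illustration under stated assumptions, not the patent's implementation: the function name `build_common_word_dict` is hypothetical, entities are assumed to be given as (start, end) character offsets, and the Chinese sentence with its hard-coded segmentation is an assumed stand-in for the output of a segmentation tool.

```python
def build_common_word_dict(sentences):
    """Collect segmented units that share no character with any entity span.

    sentences: iterable of (tokens, entity_spans), where entity_spans holds
    (start, end) character offsets (end exclusive) into the joined tokens.
    """
    common = set()
    for tokens, spans in sentences:
        pos = 0
        for tok in tokens:
            start, end = pos, pos + len(tok)
            pos = end
            # A unit that overlaps an entity span is treated as entity vocabulary.
            overlaps = any(start < e and s < end for s, e in spans)
            if not overlaps:
                common.add(tok)
    return common


# Assumed example: the entity "全子宫切除术" occupies offsets 4 to 10 (end exclusive).
tokens = ["于", "我院", "行全", "子宫", "切除术"]
entity_spans = [(4, 10)]
common_words = build_common_word_dict([(tokens, entity_spans)])
print(sorted(common_words))
```

Here the unit "行全" is excluded because its second character lies inside the entity span, which is the "contains part of an entity" case of step 1.2).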
In step 2), simplified part-of-speech tagging comprises the following steps:
2.1) tag the labeled data with parts of speech using a Chinese part-of-speech tagging tool;
2.2) determine whether each tagged unit appears in the common-word dictionary; if so, the unit is a common word and its part of speech is retained; if not, the unit may contain part of a named entity, so it is split into a character sequence to avoid segmentation errors, and each character is given the part of speech "s" to reduce part-of-speech tagging errors on named entities;
2.3) obtain the part-of-speech tagging result for the labeled data.
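The tagging rule of step 2.2) can be sketched in a few lines. The function name `simplify_pos` is hypothetical, and the tagged input is hard-coded as an assumed tagger output rather than produced by a real part-of-speech tool.

```python
def simplify_pos(tagged, common_words, entity_tag="s"):
    """Keep the tags of common words; split other units into characters tagged `entity_tag`."""
    out = []
    for word, pos in tagged:
        if word in common_words:
            out.append((word, pos))
        else:
            # Unknown units may contain entity text: fall back to characters.
            out.extend((ch, entity_tag) for ch in word)
    return out


# Assumed tagger output and common-word dictionary for one sentence.
tagged = [("于", "p"), ("我院", "n"), ("行全", "n"), ("子宫", "n"), ("切除术", "l")]
common_words = {"于", "我院"}
result = simplify_pos(tagged, common_words)
print(" ".join(f"{w}_{p}" for w, p in result))
```

Splitting unknown units into characters means a wrong segmentation around an entity cannot leak a wrong part of speech into the model input.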
In step 3), the text and part-of-speech vector mapping tables are constructed as follows: the part-of-speech tagging result from step 2) is separated into two files. One is the text sequence, containing word units and character units; the other is the corresponding part-of-speech sequence, containing the parts of speech of common words and the entity part of speech "s" of the units split into characters. The word vector tool word2vec is trained on each file separately, yielding the text vector mapping table and the part-of-speech vector mapping table.
In step 4), training the named entity prediction model comprises the following steps:
4.1) according to the mapping tables obtained in step 3), map the text and parts of speech of the labeled data to vectors, obtaining for each sentence the text vectors $X=\{x_1,x_2,x_3,\ldots,x_m\}$ and the corresponding part-of-speech vectors $P=\{p_1,p_2,p_3,\ldots,p_m\}$, where $m$ is the sentence length, $x_t\in\mathbb{R}^{l}$ is the $t$-th text vector with dimension $l$, and $p_t\in\mathbb{R}^{l}$ is the part-of-speech vector of $x_t$, also with dimension $l$; concatenate each text vector in the sentence with its corresponding part-of-speech vector to obtain the model input $V=\{X;P\}$, $V=\{v_1,v_2,v_3,\ldots,v_m\}$, where $v_t\in\mathbb{R}^{2l}$ is the $t$-th input vector with dimension $2l$;
4.2) in the self-attention layer, compute for each unit in the sentence a weight vector over the other components of the sentence, fuse the current input with its weight vector, and re-encode the result with an LSTM network, obtaining feature vectors that fuse sentence-level semantics and part of speech:
The weight of the input vector at time $t$ with respect to the entire input, $c_t=\mathrm{att}(V,v_t)$, is computed as follows:
$$a_{t,i}=\frac{\exp(s_{t,i})}{\sum_{j=1}^{m}\exp(s_{t,j})},\qquad c_t=\sum_{i=1}^{m}a_{t,i}\,v_i$$
where $v_t$ and $v_j$ are the inputs at times $t$ and $j$, and $s_{t,j}$ is the weight between the input at time $t$ and the input at time $j$, computed by a scoring function whose parameters are $W$, $W_v$, and a further weight matrix. The weights $s_{t,i}$ between the input at time $t$ and the input at each time $i$ are normalized to give $a_{t,i}$, the normalized weight of the input at time $t$ on the input at time $i$; $v_i$ is the input at time $i$, and summing over all time steps gives $c_t$, the normalized weighting of the input vector at time $t$ over the entire input. The weight and the current input are concatenated into $[v_t,c_t]$, and the original input is re-encoded with an LSTM network to incorporate the weight information:
$$h_t=\mathrm{LSTM}(h_{t-1},[v_t,c_t])$$
where $h_{t-1}$ is the output at the previous time step, $v_t$ is the input at time $t$, $c_t$ is the weight of the input at time $t$ with respect to the entire input, and $h_t\in\mathbb{R}^{k}$ is the re-encoded output at each time step, with $k$ the dimension of the LSTM hidden layer. This gives the output of the self-attention layer, $H=\{h_1,h_2,h_3,\ldots,h_m\}$, where $m$ is the output sequence length;
4.3) extract text contextual features and part-of-speech contextual features with a bidirectional LSTM network:
$$Q=\mathrm{BiLSTM}(H)$$
The output of the BiLSTM layer is $Q=\{q_1,q_2,q_3,\ldots,q_m\}$, where $m$ is the output sequence length, $q_t\in\mathbb{R}^{2k}$ is the output of the BiLSTM network at each time step, and $k$ is the dimension of the LSTM hidden layer; because the network is bidirectional, the output vectors have dimension $2k$;
4.4) apply a linear transformation to the BiLSTM output to obtain the emission probability matrix, input it to the CRF layer, and, together with the label transition probability matrix learned by the CRF layer, compute the optimal label sequence for the input sequence; the sequence with the highest probability is the final output sequence of named entity category labels:
The output sequence $Q$ of step 4.3) is passed through a linear transformation and input to the CRF layer:
$$P=QW_p+b_p$$
where $W_p\in\mathbb{R}^{2k\times n}$ and $b_p\in\mathbb{R}^{n}$ are parameters to be learned, and the linear transformation gives $P\in\mathbb{R}^{m\times n}$, where $k$ is the dimension of the BiLSTM hidden layer, $m$ is the length of the input sequence, and $n$ is the number of entity labels. $P$ is the emission probability matrix of the CRF, whose element $P_{i,j}$ is the probability that the $i$-th input is labeled with the $j$-th entity label. The label transition matrix $A\in\mathbb{R}^{n\times n}$ is a parameter matrix learned during model training, whose element $A_{i,j}$ is the probability of transitioning from the $i$-th entity label to the $j$-th entity label. From these two probability matrices, the probability of obtaining the optimal label sequence $y$ given the input sequence $V$ is computed as follows:
$$s(V,y)=\sum_{i=1}^{m-1}A_{y_i,y_{i+1}}+\sum_{i=1}^{m}P_{i,y_i}$$
$$p(y\mid V)=\frac{\exp(s(V,y))}{\sum_{\tilde{y}\in Y}\exp(s(V,\tilde{y}))}$$
where $V$ is the input sequence; $y$ is the optimal label sequence, i.e. the true label sequence of the current input sequence; $m$ is the length of the input; $A_{y_i,y_{i+1}}$ is the probability of transitioning from label $y_i$ to label $y_{i+1}$; $P_{i,y_i}$ is the probability that the $i$-th input unit is labeled $y_i$; and $s(V,y)$ is the score of label sequence $y$. $Y$ is the set of all label sequences; for each label sequence $\tilde{y}$ in $Y$ the score $s(V,\tilde{y})$ is computed, and summing gives the total score of all possible label sequences, which yields the normalized score $p(y\mid V)$ of the optimal label sequence $y$. The negative logarithm of the predicted probability is taken as the loss function of the model, and training yields the named entity prediction model. The loss function is:
$$L=-\log(p(y\mid V))$$
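To make the loss concrete, the sketch below computes it for a toy emission matrix and transition matrix in plain Python. It rests on two assumptions that the text does not spell out: the scores are normalized with a softmax over all label sequences, and there are no separate start or end labels. All function names and the toy numbers are hypothetical. The normalizer is computed twice, once with the forward algorithm and once by brute-force enumeration, so the two can be compared.

```python
import math
from itertools import product


def logsumexp(xs):
    """Numerically stable log(sum(exp(x) for x in xs))."""
    mx = max(xs)
    return mx + math.log(sum(math.exp(x - mx) for x in xs))


def sequence_score(P, A, y):
    """s(V, y): emission scores plus transition scores along label sequence y."""
    emit = sum(P[i][y[i]] for i in range(len(y)))
    trans = sum(A[y[i]][y[i + 1]] for i in range(len(y) - 1))
    return emit + trans


def log_partition(P, A):
    """Log of the summed exp-scores of all label sequences (forward algorithm)."""
    n = len(A)
    alpha = list(P[0])
    for i in range(1, len(P)):
        alpha = [P[i][j] + logsumexp([alpha[k] + A[k][j] for k in range(n)])
                 for j in range(n)]
    return logsumexp(alpha)


def brute_log_partition(P, A):
    """Same quantity by enumerating every label sequence (only for tiny inputs)."""
    n, m = len(A), len(P)
    return logsumexp([sequence_score(P, A, y) for y in product(range(n), repeat=m)])


def crf_loss(P, A, y):
    """L = -log p(y | V) = log Z - s(V, y)."""
    return log_partition(P, A) - sequence_score(P, A, y)


# Toy scores: m = 3 positions, n = 2 labels (assumed values).
P = [[2.0, 0.0], [0.0, 1.0], [1.5, 0.0]]
A = [[0.5, -1.0], [-1.0, 0.5]]
loss = crf_loss(P, A, [0, 0, 0])
print(loss)
```

The forward algorithm is what makes training feasible, since enumeration grows as $n^m$ while the recursion costs on the order of $m\,n^2$ operations.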
In step 5), predicting named entity labels comprises the following steps:
5.1) tag the Chinese electronic health record data from which entities are to be extracted with parts of speech using a Chinese part-of-speech tagging tool; according to the dictionary obtained in step 1), determine whether each tagged unit appears in the common-word dictionary; if so, the unit is a common word and its part of speech is retained; if not, the unit may contain part of an entity, so it is split into a character sequence and each character is given the part of speech "s";
5.2) according to the mapping tables obtained in step 3), map the text and parts of speech of the tagging result of step 5.1) to vectors, obtaining for each sentence the text vectors $X=\{x_1,x_2,x_3,\ldots,x_m\}$ and the corresponding part-of-speech vectors $P=\{p_1,p_2,p_3,\ldots,p_m\}$, where $m$ is the sentence length, $x_t$ is the $t$-th text unit, and $p_t$ is the part of speech of $x_t$; concatenate each text vector in the sentence with its corresponding part-of-speech vector to obtain the model input $V=\{X;P\}$, $V=\{v_1,v_2,v_3,\ldots,v_m\}$, where $v_t$ is the $t$-th input vector and $m$ is the input length;
5.3) input the vectors of step 5.2) into the prediction model obtained in step 4) to predict entity labels for the Chinese electronic health record data from which named entities are to be extracted; take the predicted sequence with the highest probability as the final labeling result:
$$y^{*}=\arg\max_{y\in Y}\,p(y\mid V)$$
where $Y$ is the set of all possible label sequences; for each label sequence $y$ in $Y$, the normalized score $p(y\mid V)$ given the current input $V$ is computed, and $y^{*}$ is the label sequence with the highest score.
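Selecting the highest-scoring label sequence does not require enumerating every sequence. A common way to do it for a linear-chain CRF is Viterbi decoding, sketched below in plain Python; the patent text does not name the decoding algorithm, so its use here is an assumption, as are the function name `viterbi` and the toy scores. Because the normalizer is the same for every sequence, maximizing the unnormalized score also maximizes the normalized one.

```python
def viterbi(P, A):
    """Return the label sequence maximizing emission plus transition scores.

    P: m x n emission scores, A: n x n transition scores (row = from, column = to).
    """
    n = len(A)
    score = list(P[0])  # best score of any path ending in each label
    back = []           # back[i-1][j] = best previous label for label j at position i
    for i in range(1, len(P)):
        prev, new = [], []
        for j in range(n):
            best_k = max(range(n), key=lambda k: score[k] + A[k][j])
            prev.append(best_k)
            new.append(score[best_k] + A[best_k][j] + P[i][j])
        back.append(prev)
        score = new
    # Follow the back-pointers from the best final label.
    path = [max(range(n), key=lambda j: score[j])]
    for prev in reversed(back):
        path.append(prev[path[-1]])
    path.reverse()
    return path


# Toy scores: m = 3 positions, n = 2 labels (assumed values).
P = [[2.0, 0.0], [0.0, 1.0], [1.5, 0.0]]
A = [[0.5, -1.0], [-1.0, 0.5]]
best_path = viterbi(P, A)
print(best_path)
```

In this toy case the second position on its own prefers label 1, but the transition penalties make staying on label 0 the better overall sequence.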
Compared with the prior art, the present invention has the following advantages and beneficial effects:
1. Part-of-speech features are added to the deep learning model for named entities, enriching the grammatical features of the input.
2. The simplified part-of-speech tagging method reduces segmentation and part-of-speech tagging errors on medical entities.
3. The method does not depend on a medical dictionary, reducing the work of building a domain dictionary.
4. A self-attention mechanism is added to the model, improving its ability to handle long-range dependencies.
Brief description of the drawings
Fig. 1 is a flow chart of the method of the present invention.
Fig. 2 is a flow chart of the construction of the common-word dictionary for electronic health records.
Fig. 3 is a flow chart of simplified part-of-speech tagging.
Fig. 4 is a diagram of the deep learning model fusing part of speech and self-attention.
Detailed description of the embodiments
The present invention is described in further detail below with reference to an embodiment and the accompanying drawings, but embodiments of the present invention are not limited thereto.
As shown in Figs. 1 to 4, the named entity recognition method for Chinese electronic health records provided by this embodiment mainly fuses part-of-speech features with a self-attention mechanism. In the data preprocessing stage, the simplified entity part-of-speech tagging method yields word vectors and part-of-speech tags for common words, and character vectors and a substitute part-of-speech tag for named entity vocabulary. The text vectors and the corresponding part-of-speech vectors are concatenated and input into the model fusing part of speech and self-attention. The self-attention layer computes the weight vector of the input vector at each time step relative to the entire sentence, yielding sentence-level semantic and part-of-speech features; these are input into a bidirectional LSTM network to obtain the text contextual features and part-of-speech contextual features of each input; finally, the CRF layer produces the labeling result.
The specific steps of the present invention are as follows:
Step 1, construct the common-word dictionary
1.1) segment the labeled data with a word segmentation tool;
For example, the segmentation result of "于我院行<entity>全子宫切除术</entity>" ("a <entity>total hysterectomy</entity> was performed at our hospital") is "于 我院 行全 子宫 切除术".
1.2) determine whether each segmented unit falls within the range of a named entity; if so, the unit is named entity vocabulary or contains part of a named entity, and it is not processed; if not, the unit is a common word and is added to the dictionary, yielding the common-word dictionary.
In the example above, "于" (at) and "我院" (our hospital) contain no part of a named entity and are added to the common-word dictionary. The unit "行全" contains part of a named entity, so it is classified as named entity vocabulary and is not added to the common-word dictionary; "子宫" (uterus) and "切除术" (resection) are handled likewise and are not common words.
Step 2, simplified part-of-speech tagging: according to the common-word dictionary built in step 1, retain the part-of-speech tags of common words and remove the part-of-speech tags of suspected named entity vocabulary, as follows:
2.1) tag all the data with parts of speech using a Chinese part-of-speech tagging tool such as jieba. The example above is tagged as:
"于_p 我院_n 行全_n 子宫_n 切除术_l", where "p", "n", and "l" are parts of speech denoting preposition, noun, and idiom respectively.
2.2) determine whether each tagged unit appears in the common-word dictionary; if so, the unit is a common word and its part of speech is retained; if not, the unit may contain part of a named entity, so it is split into a character sequence and each character is given the part of speech "s". The tagging result is:
"于_p 我院_n 行_s 全_s 子_s 宫_s 切_s 除_s 术_s"
Step 3, construct the text and part-of-speech vector mapping tables: the part-of-speech tagging result from step 2 is separated into two files. One is the text sequence, containing word units and character units; the other is the corresponding part-of-speech sequence, containing the parts of speech of common words and the entity part of speech "s" of the units split into characters. The word vector tool word2vec is trained on each file separately, yielding the text vector mapping table and the part-of-speech vector mapping table;
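The separation into two parallel corpora can be sketched as below. The function name `split_text_and_pos` and the one-sentence input are assumptions for illustration, and the word2vec training call is deliberately left out; the sketch only prepares the two line-oriented inputs that such a tool would be trained on.

```python
def split_text_and_pos(tagged_sentences):
    """Turn tagged sentences into two parallel corpora: text units and POS tags."""
    text_corpus, pos_corpus = [], []
    for sentence in tagged_sentences:
        text_corpus.append([unit for unit, _ in sentence])
        pos_corpus.append([tag for _, tag in sentence])
    return text_corpus, pos_corpus


# Assumed tagging result for one sentence after simplified POS tagging.
tagged = [[("于", "p"), ("我院", "n"), ("行", "s"), ("全", "s"), ("子", "s"),
           ("宫", "s"), ("切", "s"), ("除", "s"), ("术", "s")]]
text_corpus, pos_corpus = split_text_and_pos(tagged)

# One line per sentence with space-separated units, ready to be written to
# the two files on which the embeddings are trained separately.
text_file_lines = [" ".join(s) for s in text_corpus]
pos_file_lines = [" ".join(s) for s in pos_corpus]
print(text_file_lines[0])
print(pos_file_lines[0])
```

Keeping the two corpora aligned position by position is what later allows each text vector to be paired with the vector of its own part of speech.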
Step 4, train the entity prediction model: using the mapping tables obtained in step 3, map the text and parts of speech of the labeled data to vectors, concatenate them, and input them into the model fusing part of speech and self-attention; training yields the entity prediction model, as follows:
4.1) according to the mapping tables obtained in step 3, map the text and parts of speech of the labeled data to vectors, obtaining for each sentence the text vectors $X=\{x_1,x_2,x_3,\ldots,x_m\}$ and the corresponding part-of-speech vectors $P=\{p_1,p_2,p_3,\ldots,p_m\}$, where $m$ is the sentence length, $x_t\in\mathbb{R}^{l}$ is the $t$-th text vector with dimension $l$, and $p_t\in\mathbb{R}^{l}$ is the part-of-speech vector of $x_t$, also with dimension $l$; concatenate each text vector in the sentence with its corresponding part-of-speech vector to obtain the model input $V=\{X;P\}$, $V=\{v_1,v_2,v_3,\ldots,v_m\}$, where $v_t\in\mathbb{R}^{2l}$ is the $t$-th input vector with dimension $2l$.
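As a small illustration of step 4.1), the sketch below looks up each unit and its part of speech in two toy mapping tables and concatenates the vectors. The tables, their values, the dimension $l=2$, and the function name `concat_inputs` are all assumptions; in the method itself the tables come from the word2vec training of step 3.

```python
def concat_inputs(text_table, pos_table, units, tags):
    """Build v_t = [x_t ; p_t] for every position of one sentence."""
    return [text_table[u] + pos_table[t] for u, t in zip(units, tags)]


# Toy mapping tables with l = 2 (assumed values, not trained embeddings).
text_table = {"于": [0.1, 0.2], "我院": [0.3, 0.4], "行": [0.5, 0.6]}
pos_table = {"p": [1.0, 0.0], "n": [0.0, 1.0], "s": [0.5, 0.5]}

V = concat_inputs(text_table, pos_table, ["于", "我院", "行"], ["p", "n", "s"])
print(len(V), len(V[0]))
```

Each input vector has length $2l$, here 4, because list concatenation appends the part-of-speech vector to the text vector.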
4.2) the vector sequence of each sentence is input into the model, and the self-attention layer computes the weight of the input vector at time $t$ with respect to the entire input, $c_t=\mathrm{att}(V,v_t)$, as follows:
$$a_{t,i}=\frac{\exp(s_{t,i})}{\sum_{j=1}^{m}\exp(s_{t,j})},\qquad c_t=\sum_{i=1}^{m}a_{t,i}\,v_i$$
where $v_t$ and $v_j$ are the inputs at times $t$ and $j$, and $s_{t,j}$ is the weight between the input at time $t$ and the input at time $j$, computed by a scoring function whose parameters are $W$, $W_v$, and a further weight matrix. The weights $s_{t,i}$ between the input at time $t$ and the input at each time $i$ are normalized to give $a_{t,i}$, the normalized weight of the input at time $t$ on the input at time $i$; $v_i$ is the input at time $i$, and summing over all time steps gives $c_t$, the normalized weighting of the input vector at time $t$ over the entire input. The weight and the current input are concatenated into $[v_t,c_t]$, and the original input is re-encoded with an LSTM network to incorporate the weight information:
$$h_t=\mathrm{LSTM}(h_{t-1},[v_t,c_t])$$
where $h_{t-1}$ is the output at the previous time step, $v_t$ is the input at time $t$, $c_t$ is the weight of the input at time $t$ with respect to the entire input, and $h_t\in\mathbb{R}^{k}$ is the re-encoded output at each time step, with $k$ the dimension of the LSTM hidden layer. This gives the output of the self-attention layer, $H=\{h_1,h_2,h_3,\ldots,h_m\}$, where $m$ is the output sequence length;
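The weighting and concatenation of step 4.2) can be illustrated in plain Python. This is only a sketch under stated assumptions: the method's scoring function is parameterized and is not reproduced here, so a plain dot product is substituted as a stand-in, the normalization is assumed to be a softmax, and the helper names `softmax`, `dot`, and `attend` are hypothetical. The LSTM re-encoding is omitted; the sketch stops at the concatenated inputs $[v_t, c_t]$.

```python
import math


def softmax(scores):
    """Normalize raw scores into weights that sum to 1."""
    mx = max(scores)
    exps = [math.exp(s - mx) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    """Stand-in scoring function (an assumption, not the method's own scorer)."""
    return sum(x * y for x, y in zip(a, b))


def attend(V, t, score=dot):
    """Return c_t: the weighted sum of all inputs, weighted by their score against v_t."""
    weights = softmax([score(V[t], v_j) for v_j in V])
    dim = len(V[0])
    return [sum(w * v[d] for w, v in zip(weights, V)) for d in range(dim)]


# Toy input sequence of three 2-dimensional vectors (assumed values).
V = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
c0 = attend(V, 0)

# Inputs to the re-encoding LSTM: each v_t concatenated with its context c_t.
lstm_inputs = [V[t] + attend(V, t) for t in range(len(V))]
print(lstm_inputs[0])
```

Because every position attends to the whole sentence in one step, a distant unit can contribute to $c_t$ without its information having to survive many recurrent steps, which is the motivation given for adding the layer.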
4.3) extract text contextual features and part-of-speech contextual features with a bidirectional LSTM network:
$$Q=\mathrm{BiLSTM}(H)$$
The output of the BiLSTM layer is $Q=\{q_1,q_2,q_3,\ldots,q_m\}$, where $m$ is the output sequence length, $q_t\in\mathbb{R}^{2k}$ is the output of the BiLSTM network at each time step, and $k$ is the dimension of the LSTM hidden layer; because the network is bidirectional, the output vectors have dimension $2k$;
4.4) the output sequence of step 4.3) is passed through a linear transformation and input to the CRF layer:
$$P=QW_p+b_p$$
where $W_p\in\mathbb{R}^{2k\times n}$ and $b_p\in\mathbb{R}^{n}$ are parameters to be learned, and the linear transformation gives $P\in\mathbb{R}^{m\times n}$, where $k$ is the dimension of the BiLSTM hidden layer, $m$ is the length of the input sequence, and $n$ is the number of entity labels. $P$ is the emission probability matrix of the CRF, whose element $P_{i,j}$ is the probability that the $i$-th input is labeled with the $j$-th entity label. The label transition matrix $A\in\mathbb{R}^{n\times n}$ is a parameter matrix learned during model training, whose element $A_{i,j}$ is the probability of transitioning from the $i$-th entity label to the $j$-th entity label. From these two probability matrices, the probability of obtaining the optimal label sequence $y$ given the input sequence $V$ is computed as follows:
$$s(V,y)=\sum_{i=1}^{m-1}A_{y_i,y_{i+1}}+\sum_{i=1}^{m}P_{i,y_i}$$
$$p(y\mid V)=\frac{\exp(s(V,y))}{\sum_{\tilde{y}\in Y}\exp(s(V,\tilde{y}))}$$
where $V$ is the input sequence; $y$ is the optimal label sequence, i.e. the true label sequence of the current input sequence; $m$ is the length of the input; $A_{y_i,y_{i+1}}$ is the probability of transitioning from label $y_i$ to label $y_{i+1}$; $P_{i,y_i}$ is the probability that the $i$-th input unit is labeled $y_i$; and $s(V,y)$ is the score of label sequence $y$;
$Y$ is the set of all label sequences; for each label sequence $\tilde{y}$ in $Y$ the score $s(V,\tilde{y})$ is computed, and summing gives the total score of all possible label sequences, which yields the normalized score $p(y\mid V)$ of the optimal label sequence $y$. The negative logarithm of the predicted probability is taken as the loss function of the model, and training yields the named entity prediction model. The loss function is:
$$L=-\log(p(y\mid V))$$
Step 5, predict entity labels
5.1) tag the Chinese electronic health record data from which named entities are to be extracted with parts of speech using a part-of-speech tagging tool; according to the dictionary obtained in step 1, determine whether each tagged unit appears in the common-word dictionary; if so, the unit is a common word and its part of speech is retained; if not, the unit may contain part of an entity, so it is split into a character sequence and each character is given the part of speech "s";
5.2) according to the mapping tables obtained in step 3, map the text and parts of speech of the tagging result of step 5.1) to vectors, obtaining for each sentence the text vectors $X=\{x_1,x_2,x_3,\ldots,x_m\}$ and the corresponding part-of-speech vectors $P=\{p_1,p_2,p_3,\ldots,p_m\}$, where $m$ is the sentence length, $x_t$ is the $t$-th text unit, and $p_t$ is the part of speech of $x_t$; concatenate each text vector in the sentence with its corresponding part-of-speech vector to obtain the model input $V=\{X;P\}$, $V=\{v_1,v_2,v_3,\ldots,v_m\}$, where $v_t$ is the $t$-th input vector.
5.3) input the vectors of step 5.2) into the prediction model obtained in step 4 to predict entity labels for the Chinese electronic health record data from which named entities are to be extracted. Take the predicted sequence with the highest probability as the final labeling result:
$$y^{*}=\arg\max_{y\in Y}\,p(y\mid V)$$
where $Y$ is the set of all possible label sequences; for each label sequence $y$ in $Y$, the normalized score $p(y\mid V)$ given the current input $V$ is computed, and $y^{*}$ is the label sequence with the highest score.
The above embodiment is a preferred embodiment of the present invention, but embodiments of the present invention are not limited by it. Any other change, modification, substitution, combination, or simplification made without departing from the spirit and principle of the present invention shall be an equivalent replacement and is included within the scope of protection of the present invention.

Claims (6)

1. A named entity recognition method for Chinese electronic health records, characterized by comprising the following steps:
1) construct a common-word dictionary: segment the labeled data and build a common-word dictionary;
2) simplified part-of-speech tagging: according to the common-word dictionary built in step 1), retain the part-of-speech tags of common words and remove the part-of-speech tags of suspected named entity vocabulary;
3) construct text and part-of-speech vector mapping tables: use the word vector training tool word2vec to train on the text and on the parts of speech of the labeled data separately, obtaining a text vector mapping table and a part-of-speech vector mapping table;
4) train the named entity prediction model: using the mapping tables obtained in step 3), map the text and parts of speech of the labeled data to vectors, concatenate them, and input them into the model that fuses part of speech and self-attention; training yields the named entity prediction model;
5) predict named entity labels: according to the common-word dictionary built in step 1), apply simplified part-of-speech tagging to the Chinese electronic health record data from which entities are to be extracted; using the mapping tables obtained in step 3), map the text and parts of speech of the data to vectors; predict the named entity labels with the prediction model obtained in step 4).
2. The named entity recognition method for Chinese electronic health records according to claim 1, characterized in that in step 1), constructing the common-word dictionary comprises the following steps:
1.1) segment the labeled data with a Chinese word segmentation tool;
1.2) determine whether each segmented unit falls within the range of a named entity; if so, the unit is part of a named entity or contains part of one, and it is not processed; if not, the unit is a common word and is added to the dictionary, yielding the common-word dictionary.
3. The named entity recognition method for Chinese electronic health records according to claim 1, characterized in that the simplified part-of-speech tagging in step 2) comprises the following steps:
2.1) tagging the labeled data with a Chinese part-of-speech tagging tool;
2.2) judging whether each tagged unit appears in the common-word dictionary; if so, the unit is a common word and its part-of-speech tag is retained; if not, the unit may contain part of a named entity, so it is split into a character sequence to avoid segmentation errors, and each character is given the part-of-speech tag "s", which reduces part-of-speech tagging errors on named entities;
2.3) obtaining the part-of-speech tagging result for the labeled data.
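The rule in step 2.2) can be sketched as a single pass over the tagger output. The function name `simplify_pos`, the `(word, tag)` pair layout and the tags "n" and "v" are illustrative assumptions; only the tag "s" comes from the claim.

```python
def simplify_pos(tagged_units, common_words, entity_tag="s"):
    """Keep tags of common words; split every other unit into characters.

    tagged_units: list of (word, pos_tag) pairs from some POS tagger
    (the tagger and its tag set are assumptions of this sketch).
    """
    result = []
    for word, pos in tagged_units:
        if word in common_words:
            # Common word: keep the unit and its original tag.
            result.append((word, pos))
        else:
            # Possible entity fragment: one entry per character, tagged "s".
            result.extend((char, entity_tag) for char in word)
    return result


tagged = [("患者", "n"), ("出现", "v"), ("头痛", "n"), ("症状", "n")]
simplified = simplify_pos(tagged, {"患者", "出现", "症状"})
```

Here "头痛" is absent from the dictionary, so it is replaced by the two characters "头" and "痛", each tagged "s".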
4. The named entity recognition method for Chinese electronic health records according to claim 1, characterized in that, in step 3), the text and part-of-speech vector mapping tables are constructed as follows: the part-of-speech tagging result of step 2) is separated into two files; one is the text sequence, containing word units and character units; the other is the part-of-speech sequence corresponding to the text sequence, containing the part-of-speech tags of common words and the entity tag "s" of the units that were split into characters; the two files are trained separately with the word-vector tool word2vec, yielding the text vector mapping table and the part-of-speech vector mapping table.
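The separation into two parallel sequences might look like the sketch below. The helper names and the one-sentence-per-line, space-separated layout are assumptions; the claim only states that two files are produced and trained separately with word2vec, and the training call itself is deliberately omitted here.

```python
def split_text_and_pos(tagged_sentences):
    """Separate tagging results into parallel text and POS sequences."""
    text_seqs = [[unit for unit, _ in sent] for sent in tagged_sentences]
    pos_seqs = [[pos for _, pos in sent] for sent in tagged_sentences]
    return text_seqs, pos_seqs


def to_lines(sequences):
    """Render sequences one sentence per line, units separated by spaces
    (an assumed layout for the two training files)."""
    return "\n".join(" ".join(seq) for seq in sequences)


tagged_sentences = [
    [("患者", "n"), ("出现", "v"), ("头", "s"), ("痛", "s"), ("症状", "n")],
]
text_seqs, pos_seqs = split_text_and_pos(tagged_sentences)
text_file_content = to_lines(text_seqs)
pos_file_content = to_lines(pos_seqs)
```

Because both sequences are derived from the same list of pairs, position t in the text file always lines up with position t in the part-of-speech file, which is what the later concatenation step relies on.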
5. The named entity recognition method for Chinese electronic health records according to claim 1, characterized in that training the named entity prediction model in step 4) comprises the following steps:
4.1) according to the mapping tables obtained in step 3), mapping the text and part-of-speech tags of the labeled data to vectors, obtaining for each sentence the text vectors $X=\{x_1,x_2,x_3,\dots,x_m\}$ and the corresponding part-of-speech vectors $P=\{p_1,p_2,p_3,\dots,p_m\}$, where $m$ is the sentence length, $x_t\in\mathbb{R}^{l}$ is the $t$-th text vector, of dimension $l$, and $p_t\in\mathbb{R}^{l}$ is the part-of-speech vector of $x_t$, also of dimension $l$; each text vector in the sentence is concatenated with its part-of-speech vector to give the model input $V=\{X;P\}$, $V=\{v_1,v_2,v_3,\dots,v_m\}$, where $v_t\in\mathbb{R}^{2l}$ is the $t$-th input vector, of dimension $2l$;
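The lookup and concatenation of step 4.1) can be illustrated with plain Python lists. The tables below are toy values with $l=3$, not trained word2vec vectors, and `build_inputs` is a hypothetical helper; how units missing from a table are handled is not stated in the claim, so the sketch simply raises `KeyError` for them.

```python
def build_inputs(units, tags, text_table, pos_table):
    """Return V = [v_1, ..., v_m], with v_t = x_t concatenated with p_t."""
    # List addition concatenates the two l-dimensional vectors into 2l.
    return [text_table[u] + pos_table[p] for u, p in zip(units, tags)]


# Toy mapping tables with l = 3 (values are made up for illustration).
text_table = {"患者": [0.1, 0.2, 0.3], "头": [0.4, 0.5, 0.6]}
pos_table = {"n": [1.0, 0.0, 0.0], "s": [0.0, 1.0, 0.0]}

V = build_inputs(["患者", "头"], ["n", "s"], text_table, pos_table)
```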
4.2) self-attention layer: for each unit in the sentence, computing a weight vector over the other components of the sentence, fusing the current input with the corresponding weight vector, and re-encoding with an LSTM network to obtain feature vectors that fuse sentence semantics and part-of-speech information:
The weight of the input vector at time step $t$ with respect to the whole input is computed as $c_t=\mathrm{att}(V,v_t)$; the specific calculation is as follows:
where $v_t$ and $v_j$ denote the inputs at time steps $t$ and $j$; $W$, $W_v$ and one further parameter are the parameters of the scoring function, which gives the weight between the input at time step $t$ and the input at time step $j$. The weight between the input at time step $t$ and the input at time step $i$ is normalized to give the normalized weight of the input at time step $t$ with respect to the input at time step $i$; $v_i$ denotes the input at time step $i$; summing over all time steps gives $c_t$, which represents the input vector at time step $t$ in terms of its normalized weights over the whole input. The weight and the current input are concatenated as $[v_t,c_t]$, and the original input is re-encoded with an LSTM network so that the weight information is incorporated:
$h_t=\mathrm{LSTM}(h_{t-1},[v_t,c_t])$
where $h_{t-1}$ is the output at the previous time step, $v_t$ is the input at time step $t$, $c_t$ is the weight of the input at time step $t$ with respect to the whole input, $h_t\in\mathbb{R}^{k}$ is the re-encoded output at each time step, and $k$ is the dimension of the LSTM hidden layer; the output of the self-attention layer is therefore $H=\{h_1,h_2,h_3,\dots,h_m\}$, where $m$ is the output sequence length;
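The formulas for this step are not reproduced in the text, so the sketch below is only one form that is consistent with the symbol definitions: an additive score $w^\top\tanh(W_v v_j + W_u v_t)$, a softmax over positions, and a weighted sum of the inputs. The additive form, the name `W_u` for the second projection, and treating the first parameter as a vector `w` are all assumptions; the LSTM re-encoding is left out.

```python
import math


def matvec(M, x):
    """Multiply matrix M (list of rows) by vector x."""
    return [sum(m * xi for m, xi in zip(row, x)) for row in M]


def self_attention_context(V, t, w, W_v, W_u):
    """Return (normalized weights, c_t) for position t.

    Assumed score: s_j = w . tanh(W_v v_j + W_u v_t).
    """
    proj_t = matvec(W_u, V[t])
    scores = []
    for v_j in V:
        hidden = [math.tanh(a + b) for a, b in zip(matvec(W_v, v_j), proj_t)]
        scores.append(sum(wi * hi for wi, hi in zip(w, hidden)))
    # Softmax normalization, shifted by the max score for stability.
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    # c_t is the sum over all positions of the inputs scaled by their weights.
    dim = len(V[0])
    c_t = [sum(weights[i] * V[i][d] for i in range(len(V))) for d in range(dim)]
    return weights, c_t


# Toy inputs: three positions, input dimension 2, made-up parameters.
V = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
w = [1.0, -1.0]
W_v = [[0.5, 0.0], [0.0, 0.5]]
W_u = [[1.0, 0.0], [0.0, 1.0]]

weights, c_t = self_attention_context(V, 0, w, W_v, W_u)
lstm_input = V[0] + c_t  # the concatenation [v_t, c_t] fed to the LSTM
```

The concatenated vector has twice the input dimension, matching the $[v_t,c_t]$ argument of the LSTM in the step above.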
4.3) extracting text-context and part-of-speech-context feature information with a bidirectional LSTM neural network:
$Q=\mathrm{BiLSTM}(H)$
The output of the BiLSTM layer is $Q=\{q_1,q_2,q_3,\dots,q_m\}$, where $m$ is the output sequence length, $q_t\in\mathbb{R}^{2k}$ is the output of the BiLSTM network at each time step, and $k$ is the dimension of the LSTM hidden layer; because the network is bidirectional, the output vector dimension is $2k$;
4.4) applying a linear transformation to the BiLSTM output to obtain the emission probability matrix, feeding it into the CRF layer, and, using the label transition probability matrix learned by the CRF layer, computing the optimal label sequence for the input sequence; the sequence with the highest probability is taken as the final named entity label sequence:
The output sequence $Q$ of step 4.3) is linearly transformed and fed into the CRF layer:
$P=QW_p+b_p$
where $W_p\in\mathbb{R}^{2k\times n}$ and $b_p\in\mathbb{R}^{n}$ are parameters to be learned; the linear transformation yields $P\in\mathbb{R}^{m\times n}$, where $k$ is the hidden-layer dimension of the BiLSTM network, $m$ is the length of the input sequence and $n$ is the number of entity labels. $P$ is the emission probability matrix of the CRF, whose element $P_{i,j}$ is the probability that the $i$-th input is labeled with the $j$-th entity label. The label transition matrix $A\in\mathbb{R}^{n\times n}$ is a parameter matrix learned during model training, whose element $A_{i,j}$ is the probability of a transition from the $i$-th entity label to the $j$-th entity label. From these two probability matrices, the probability of obtaining the optimal label sequence $y$ given the input sequence $V$ is computed; the specific calculation is as follows:
where $V$ is the input sequence; $y$ is the optimal label sequence, i.e. the true label sequence of the current input; $m$ is the length of the input; $A_{y_i,y_{i+1}}$ is the probability of a transition from label $y_i$ to label $y_{i+1}$; $P_{i,y_i}$ is the probability that the $i$-th input unit is labeled $y_i$; $s(V,y)$ is the score of label sequence $y$; $Y$ is the set of all label sequences; the score of each label sequence in $Y$ is computed and the scores are summed to give the total score of all possible label sequences, from which the normalized score $p(y\mid V)$ of the optimal label sequence $y$ is obtained. The negative logarithm of the predicted probability is taken as the loss function of the model, and training yields the named entity prediction model; the loss function is:
$L=-\log(p(y\mid V))$.
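The scoring formulas are not reproduced in the text. The sketch below assumes the usual linear-chain CRF form, in which $s(V,y)$ is the sum of the emission entries $P_{i,y_i}$ and the transition entries $A_{y_i,y_{i+1}}$, and $p(y\mid V)$ is $\exp(s(V,y))$ divided by the sum of $\exp(s(V,\tilde y))$ over all label sequences. The exponentiation, the absence of start and end tags, and the brute-force enumeration are assumptions chosen for clarity; a practical implementation would use the forward algorithm.

```python
import itertools
import math


def sequence_score(P, A, y):
    """s(V, y): emission entries plus transition entries along y."""
    emit = sum(P[i][y[i]] for i in range(len(y)))
    trans = sum(A[y[i]][y[i + 1]] for i in range(len(y) - 1))
    return emit + trans


def neg_log_likelihood(P, A, y):
    """L = -log p(y | V), enumerating all n**m label sequences."""
    m, n = len(P), len(P[0])
    scores = [sequence_score(P, A, cand)
              for cand in itertools.product(range(n), repeat=m)]
    # log-sum-exp over all candidate sequences, shifted for stability.
    top = max(scores)
    log_total = top + math.log(sum(math.exp(s - top) for s in scores))
    return log_total - sequence_score(P, A, y)


# Toy matrices: m = 3 inputs, n = 2 labels (values are made up).
P = [[2.0, 0.5], [0.3, 1.5], [1.0, 1.0]]
A = [[0.5, -0.2], [-0.3, 0.8]]
loss = neg_log_likelihood(P, A, [0, 1, 1])
```

Enumeration costs $n^m$ score evaluations, so it is only usable on toy inputs, but it makes the normalization explicit: the probabilities of all candidate sequences sum to one.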
6. The named entity recognition method for Chinese electronic health records according to claim 1, characterized in that predicting named entity labels in step 5) comprises the following steps:
5.1) tagging the Chinese electronic health record data from which entities are to be extracted with a Chinese part-of-speech tagging tool; according to the dictionary obtained in step 1), judging whether each tagged unit appears in the common-word dictionary; if so, the unit is a common word and its part-of-speech tag is retained; if not, the unit may contain part of an entity, so it is split into a character sequence and each character is given the part-of-speech tag "s";
5.2) according to the mapping tables obtained in step 3), mapping the text and part-of-speech tags of the tagging result of step 5.1) to vectors, obtaining for each sentence the text vectors $X=\{x_1,x_2,x_3,\dots,x_m\}$ and the corresponding part-of-speech vectors $P=\{p_1,p_2,p_3,\dots,p_m\}$, where $m$ is the sentence length, $x_t$ is the $t$-th text unit and $p_t$ is the part of speech of $x_t$; each text vector in the sentence is concatenated with its part-of-speech vector to give the model input $V=\{X;P\}$, $V=\{v_1,v_2,v_3,\dots,v_m\}$, where $v_t$ is the $t$-th input vector;
5.3) feeding the vectors of step 5.2) into the prediction model obtained in step 4) to predict entity labels for the Chinese electronic health record data from which named entities are to be extracted; the predicted sequence with the highest probability is taken as the final labeling result: $y^{*}=\arg\max_{y\in Y}p(y\mid V)$, where $Y$ is the set of all label sequences; for each label sequence $y$ in $Y$, the normalized score $p(y\mid V)$ under the current input $V$ is computed, and $y^{*}$ is the label sequence with the highest score.
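Because normalization does not change which sequence ranks first, the highest-probability sequence in step 5.3) can be found from the unnormalized scores. The sketch below uses Viterbi decoding, which the claim does not name; it is a common choice for linear-chain CRFs and is assumed here, together with the additive emission-plus-transition score and the absence of start and end tags.

```python
def viterbi(P, A):
    """Return (best label sequence, its score) for emissions P, transitions A."""
    m, n = len(P), len(P[0])
    best = list(P[0])   # best[j]: top score of any path ending in label j
    back = []           # back[i - 1][j]: best previous label for j at step i
    for i in range(1, m):
        step, ptr = [], []
        for j in range(n):
            k = max(range(n), key=lambda prev: best[prev] + A[prev][j])
            step.append(best[k] + A[k][j] + P[i][j])
            ptr.append(k)
        best = step
        back.append(ptr)
    # Trace back from the best final label.
    last = max(range(n), key=lambda j: best[j])
    path = [last]
    for ptr in reversed(back):
        path.append(ptr[path[-1]])
    path.reverse()
    return path, best[last]


# Toy example: a strong 0 -> 0 transition outweighs the second emission.
P_toy = [[1.0, 0.0], [0.0, 1.0]]
A_toy = [[3.0, 0.0], [0.0, 0.0]]
path, score = viterbi(P_toy, A_toy)  # path == [0, 0], score == 4.0
```

In the toy example the emissions alone favor the sequence [0, 1], but the transition entry of 3.0 makes [0, 0] the top-scoring sequence, which shows how the transition matrix constrains neighboring labels.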
Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910119391.1A CN109871538A (en) 2019-02-18 2019-02-18 A kind of Chinese electronic health record name entity recognition method

Publications (1)

Publication Number Publication Date
CN109871538A true CN109871538A (en) 2019-06-11

Family

ID=66918762

Country Status (1)

Country Link
CN (1) CN109871538A (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20130297634A1 (en) * 2012-05-07 2013-11-07 Sap Ag Entity Name Variant Generator
CN107797992A (en) * 2017-11-10 2018-03-13 北京百分点信息科技有限公司 Name entity recognition method and device
CN109062893A (en) * 2018-07-13 2018-12-21 华南理工大学 A kind of product name recognition methods based on full text attention mechanism

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
XIAOLING CAI et al.: "A deep learning model incorporating part of speech and self-matching attention for named entity recognition of Chinese electronic medical records", 4TH CHINA HEALTH INFORMATION PROCESSING CONFERENCE *

Cited By (77)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110223742A (en) * 2019-06-14 2019-09-10 中南大学 The clinical manifestation information extraction method and equipment of Chinese electronic health record data
CN110347831A (en) * 2019-06-28 2019-10-18 西安理工大学 Based on the sensibility classification method from attention mechanism
CN110457682B (en) * 2019-07-11 2022-08-09 新华三大数据技术有限公司 Part-of-speech tagging method for electronic medical record, model training method and related device
CN110427493A (en) * 2019-07-11 2019-11-08 新华三大数据技术有限公司 Electronic health record processing method, model training method and relevant apparatus
CN110444261A (en) * 2019-07-11 2019-11-12 新华三大数据技术有限公司 Sequence labelling network training method, electronic health record processing method and relevant apparatus
CN110457682A (en) * 2019-07-11 2019-11-15 新华三大数据技术有限公司 Electronic health record part-of-speech tagging method, model training method and relevant apparatus
CN110427493B (en) * 2019-07-11 2022-04-08 新华三大数据技术有限公司 Electronic medical record processing method, model training method and related device
CN110598203A (en) * 2019-07-19 2019-12-20 中国人民解放军国防科技大学 Military imagination document entity information extraction method and device combined with dictionary
CN110674641A (en) * 2019-10-06 2020-01-10 武汉鸿名科技有限公司 GPT-2 model-based Chinese electronic medical record entity identification method
CN110674641B (en) * 2019-10-06 2024-02-02 湖北大学 Chinese electronic medical record entity identification method based on GPT-2 model
CN110866399A (en) * 2019-10-24 2020-03-06 同济大学 Chinese short text entity identification and disambiguation method based on enhanced character vector
CN110866399B (en) * 2019-10-24 2023-05-02 同济大学 Chinese short text entity recognition and disambiguation method based on enhanced character vector
CN110837736A (en) * 2019-11-01 2020-02-25 浙江大学 Character structure-based named entity recognition method for Chinese medical record of iterative expansion convolutional neural network-conditional random field
CN110765775B (en) * 2019-11-01 2020-08-04 北京邮电大学 Self-adaptive method for named entity recognition field fusing semantics and label differences
CN110765775A (en) * 2019-11-01 2020-02-07 北京邮电大学 Self-adaptive method for named entity recognition field fusing semantics and label differences
CN111079418A (en) * 2019-11-06 2020-04-28 科大讯飞股份有限公司 Named body recognition method and device, electronic equipment and storage medium
CN111079418B (en) * 2019-11-06 2023-12-05 科大讯飞股份有限公司 Named entity recognition method, device, electronic equipment and storage medium
CN110866401A (en) * 2019-11-18 2020-03-06 山东健康医疗大数据有限公司 Chinese electronic medical record named entity identification method and system based on attention mechanism
CN112861533A (en) * 2019-11-26 2021-05-28 阿里巴巴集团控股有限公司 Entity word recognition method and device
CN111079377B (en) * 2019-12-03 2022-12-13 哈尔滨工程大学 Method for recognizing named entities of Chinese medical texts
CN111079377A (en) * 2019-12-03 2020-04-28 哈尔滨工程大学 Method for recognizing named entities oriented to Chinese medical texts
CN112927806A (en) * 2019-12-05 2021-06-08 金色熊猫有限公司 Medical record structured network cross-disease migration training method, device, medium and equipment
CN112927806B (en) * 2019-12-05 2022-11-25 金色熊猫有限公司 Medical record structured network cross-disease migration training method, device, medium and equipment
CN113033192B (en) * 2019-12-09 2024-04-26 株式会社理光 Training method and device for sequence annotation and computer readable storage medium
CN113033192A (en) * 2019-12-09 2021-06-25 株式会社理光 Training method and device for sequence labels and computer readable storage medium
CN111046671A (en) * 2019-12-12 2020-04-21 中国科学院自动化研究所 Chinese named entity recognition method based on graph network and merged into dictionary
CN111144119B (en) * 2019-12-27 2024-03-29 北京联合大学 Entity identification method for improving knowledge migration
CN111144119A (en) * 2019-12-27 2020-05-12 北京联合大学 Entity identification method for improving knowledge migration
CN113051905A (en) * 2019-12-28 2021-06-29 中移(成都)信息通信科技有限公司 Medical named entity recognition training model and medical named entity recognition method
CN111145914B (en) * 2019-12-30 2023-08-04 四川大学华西医院 Method and device for determining text entity of lung cancer clinical disease seed bank
CN111145718A (en) * 2019-12-30 2020-05-12 中国科学院声学研究所 Chinese mandarin character-voice conversion method based on self-attention mechanism
CN111145914A (en) * 2019-12-30 2020-05-12 四川大学华西医院 Method and device for determining lung cancer clinical disease library text entity
CN111243699A (en) * 2020-01-14 2020-06-05 中南大学 Chinese electronic medical record entity extraction method based on word information fusion
CN111222340A (en) * 2020-01-15 2020-06-02 东华大学 Breast electronic medical record entity recognition system based on multi-standard active learning
CN111274788A (en) * 2020-01-16 2020-06-12 创新工场(广州)人工智能研究有限公司 Dual-channel joint processing method and device
CN111312354B (en) * 2020-02-10 2023-10-24 东华大学 Mammary gland medical record entity identification marking enhancement system based on multi-agent reinforcement learning
CN111312354A (en) * 2020-02-10 2020-06-19 东华大学 Breast medical record entity identification and annotation enhancement system based on multi-agent reinforcement learning
CN113496120B (en) * 2020-03-19 2022-07-29 复旦大学 Domain entity extraction method, computer device, computer readable medium and processor
CN113496120A (en) * 2020-03-19 2021-10-12 复旦大学 Domain entity extraction method, computer device, computer readable medium and processor
CN111428036B (en) * 2020-03-23 2022-05-27 浙江大学 Entity relationship mining method based on biomedical literature
CN111428036A (en) * 2020-03-23 2020-07-17 浙江大学 Entity relationship mining method based on biomedical literature
WO2021190236A1 (en) * 2020-03-23 2021-09-30 浙江大学 Entity relation mining method based on biomedical literature
CN111581972A (en) * 2020-03-27 2020-08-25 平安科技(深圳)有限公司 Method, device, equipment and medium for identifying corresponding relation between symptom and part in text
CN111444720A (en) * 2020-03-30 2020-07-24 华南理工大学 Named entity recognition method for English text
CN111651991A (en) * 2020-04-15 2020-09-11 天津科技大学 Medical named entity identification method utilizing multi-model fusion strategy
CN111651991B (en) * 2020-04-15 2022-08-26 天津科技大学 Medical named entity identification method utilizing multi-model fusion strategy
CN111523320A (en) * 2020-04-20 2020-08-11 电子科技大学 Chinese medical record word segmentation method based on deep learning
CN111581974A (en) * 2020-04-27 2020-08-25 天津大学 Biomedical entity identification method based on deep learning
CN111680512B (en) * 2020-05-11 2024-04-02 上海阿尔卡特网络支援系统有限公司 Named entity recognition model, telephone exchange extension switching method and system
CN111680512A (en) * 2020-05-11 2020-09-18 上海阿尔卡特网络支援系统有限公司 Named entity recognition model, telephone exchange switching extension method and system
CN111666754A (en) * 2020-05-28 2020-09-15 平安医疗健康管理股份有限公司 Entity identification method and system based on electronic disease text and computer equipment
CN113743116A (en) * 2020-05-28 2021-12-03 株式会社理光 Training method and device for named entity recognition and computer readable storage medium
CN111666754B (en) * 2020-05-28 2023-02-03 深圳平安医疗健康科技服务有限公司 Entity identification method and system based on electronic disease text and computer equipment
CN112329459A (en) * 2020-06-09 2021-02-05 北京沃东天骏信息技术有限公司 Text labeling method and neural network model construction method
CN113807094B (en) * 2020-06-11 2024-03-19 株式会社理光 Entity recognition method, entity recognition device and computer readable storage medium
CN113807094A (en) * 2020-06-11 2021-12-17 株式会社理光 Entity identification method, device and computer readable storage medium
CN111724897B (en) * 2020-06-12 2022-07-01 电子科技大学 Motion function data processing method and system
CN111724897A (en) * 2020-06-12 2020-09-29 电子科技大学 Motion function data processing method and system
CN111738006A (en) * 2020-06-22 2020-10-02 苏州大学 Commodity comment named entity recognition-based problem generation method
WO2021139239A1 (en) * 2020-07-28 2021-07-15 平安科技(深圳)有限公司 Mechanism entity extraction method, system and device based on multiple training targets
CN111950283A (en) * 2020-07-31 2020-11-17 合肥工业大学 Chinese word segmentation and named entity recognition system for large-scale medical text mining
WO2021139247A1 (en) * 2020-08-06 2021-07-15 平安科技(深圳)有限公司 Construction method, apparatus and device for medical domain knowledge map, and storage medium
CN111950287A (en) * 2020-08-20 2020-11-17 广东工业大学 Text-based entity identification method and related device
CN111950287B (en) * 2020-08-20 2024-04-23 广东工业大学 Entity identification method based on text and related device
CN112001177B (en) * 2020-08-24 2024-08-13 浪潮云信息技术股份公司 Electronic medical record named entity recognition method and system integrating deep learning and rules
CN112001177A (en) * 2020-08-24 2020-11-27 浪潮云信息技术股份公司 Electronic medical record named entity identification method and system integrating deep learning and rules
CN112149420A (en) * 2020-09-01 2020-12-29 中国科学院信息工程研究所 Entity recognition model training method, threat information entity extraction method and device
CN112183099A (en) * 2020-10-09 2021-01-05 上海明略人工智能(集团)有限公司 Named entity identification method and system based on semi-supervised small sample extension
CN112836046A (en) * 2021-01-13 2021-05-25 哈尔滨工程大学 Four-risk one-gold-field policy and regulation text entity identification method
CN113076751A (en) * 2021-02-26 2021-07-06 北京工业大学 Named entity recognition method and system, electronic device and storage medium
CN113177416A (en) * 2021-05-17 2021-07-27 同济大学 Event element detection method combining sequence labeling and pattern matching
WO2022242074A1 (en) * 2021-05-21 2022-11-24 山东省人工智能研究院 Multi-feature fusion-based method for named entity recognition in chinese medical text
CN113779992A (en) * 2021-07-19 2021-12-10 西安理工大学 Method for realizing BcBERT-SW-BilSTM-CRF model based on vocabulary enhancement and pre-training
CN115146628A (en) * 2021-11-21 2022-10-04 北京中科凡语科技有限公司 Method and device for determining real boundary of marked entity and electronic equipment
CN114328485A (en) * 2021-12-23 2022-04-12 中国科学院沈阳计算技术研究所有限公司 Electronic medical record named entity identification method for improving BilSTM-CRF
CN114970536A (en) * 2022-06-22 2022-08-30 昆明理工大学 Combined lexical analysis method for word segmentation, part of speech tagging and named entity recognition
CN116227483A (en) * 2023-02-10 2023-06-06 南京南瑞信息通信科技有限公司 Word boundary-based Chinese entity extraction method, device and storage medium

Similar Documents

Publication Publication Date Title
CN109871538A (en) A kind of Chinese electronic health record name entity recognition method
CN107977361B (en) Chinese clinical medical entity identification method based on deep semantic information representation
CN111444726B (en) Chinese semantic information extraction method and device based on long-short-term memory network of bidirectional lattice structure
WO2021139424A1 (en) Text content quality evaluation method, apparatus and device, and storage medium
CN110298037B (en) Convolutional neural network matching text recognition method based on enhanced attention mechanism
CN109657239B (en) Chinese named entity recognition method based on attention mechanism and language model learning
CN106980683B (en) Blog text abstract generating method based on deep learning
CN112002411A (en) Cardiovascular and cerebrovascular disease knowledge map question-answering method based on electronic medical record
CN109543181B (en) Named entity model and system based on combination of active learning and deep learning
CN110297908A (en) Diagnosis and treatment program prediction method and device
CN117076653B (en) Knowledge base question-answering method based on thinking chain and visual lifting context learning
CN110263325B (en) Chinese word segmentation system
CN107748757A (en) A kind of answering method of knowledge based collection of illustrative plates
CN108829719A (en) The non-true class quiz answers selection method of one kind and system
CN111914556B (en) Emotion guiding method and system based on emotion semantic transfer pattern
CN113724882B (en) Method, device, equipment and medium for constructing user portrait based on inquiry session
CN111400455A (en) Relation detection method of question-answering system based on knowledge graph
CN110096572B (en) Sample generation method, device and computer readable medium
CN111159345B (en) Chinese knowledge base answer acquisition method and device
WO2021082086A1 (en) Machine reading method, system, device, and storage medium
CN111145914B (en) Method and device for determining text entity of lung cancer clinical disease seed bank
CN113761890A (en) BERT context sensing-based multi-level semantic information retrieval method
CN115310448A (en) Chinese named entity recognition method based on combining bert and word vector
CN113657105A (en) Medical entity extraction method, device, equipment and medium based on vocabulary enhancement
CN116341546A (en) Medical natural language processing method based on pre-training model

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20190611
