CN109871538A - A named entity recognition method for Chinese electronic health records - Google Patents
- Publication number: CN109871538A (application CN201910119391.1A)
- Authority: CN (China)
- Prior art keywords: speech, vector, entity, input, sequence
- Legal status: Pending (as listed by Google Patents; this status is an assumption, not a legal conclusion)
- Landscapes: Machine Translation (AREA)
Abstract
The invention discloses a named entity recognition method for Chinese electronic health records, comprising the steps of: 1) constructing a common-word dictionary; 2) simplified part-of-speech tagging; 3) constructing text and part-of-speech vector mapping tables; 4) training a named entity prediction model; 5) predicting named entity labels. By adding part-of-speech features, the invention makes the boundaries between named entities and common words easier to distinguish, which improves the accuracy of named entity boundaries. In addition, a self-attention mechanism is introduced into the bidirectional LSTM-CRF model to compute the relevance between the input at each time step and the other components of the sentence, which alleviates the long-range dependency problem and improves named entity recognition accuracy.
Description
Technical field
The present invention relates to the technical field of named entity recognition in Chinese electronic health records, and in particular to a named entity recognition method for Chinese electronic health records.
Background art
Named entity recognition (NER) in electronic health records means finding clinical entities related to the patient in the descriptive text of the record, such as the patient's disease sites, symptoms, the drugs used and operations. NER in Chinese electronic health records is key to Chinese health record information extraction and lays the foundation for Chinese health information processing tasks such as medical record retrieval, disease prediction and the construction of medical knowledge graphs. However, electronic health records contain many out-of-vocabulary words, and their number keeps growing; moreover, recognizing named entities is more complex in Chinese than in English.
The main difficulties of NER in Chinese electronic health records are: 1) Chinese text has no boundary markers such as the spaces in English text, so the first step of entity recognition is to determine the boundaries of named entities; 2) Chinese word segmentation and named entity recognition influence each other; 3) different categories of named entities have different characteristics, which makes them hard to handle jointly; 4) unlike medical literature, electronic health records follow no unified writing standard and are written in a personal style, and the variety of abbreviations further increases the difficulty of entity recognition.
In the medical domain, the earliest named entity recognition for electronic health records generally combined dictionaries with rules. Such methods mostly use rule templates built jointly by linguists and medical domain experts, selecting features such as punctuation marks, direction words, position words and head words, with pattern matching and string matching as the main means. Dictionary- and rule-based methods depend heavily on building the rule base and dictionary, and when the dataset changes the rules and dictionary may have to be rebuilt to fit the new dataset. Machine-learning-based methods gather statistics of relevant features, such as word features, label information and part-of-speech information, from a sample dataset to build a recognition model. They fall roughly into two classes: one treats named entity recognition as a classification task and uses classification methods such as Bayesian models, support vector machines and maximum entropy; the other treats named entity recognition as a sequence labeling task and uses models such as hidden Markov models (HMM) and conditional random fields (CRF).
With the development of deep learning, neural networks have also been applied to named entity recognition. Deep learning methods extract text features automatically, without feature engineering. Most current deep learning NER models combine a recurrent neural network (RNN) with a CRF; using character vectors or word vectors, they achieve good results and have become the mainstream of deep-learning-based NER. Among them, the long short-term memory network (LSTM) is commonly used to extract contextual features from the text sequence automatically, while the conditional random field (CRF) considers not only the input features but also label transition features; through training, the model outputs the optimal label sequence.
Most deep-learning-based methods use only word vectors or only character vectors as input. Because of the Chinese word segmentation problem, using word vectors as input may introduce segmentation errors that lead to entity recognition errors. Using character vectors as input, on the one hand, expresses semantic information less well and, on the other hand, increases the length of named entities, making their boundaries harder to extract. In model construction, most methods process the input vectors with an LSTM, which handles long-range dependencies by selectively retaining historical information; but as sentences grow longer and the time step advances, it becomes increasingly inadequate: it struggles to learn features from distant positions and cannot handle named entity boundary extraction well.
Summary of the invention
The purpose of the present invention is to overcome the shortcomings and deficiencies of the prior art by proposing a Chinese electronic health record named entity recognition method that fuses part of speech and a self-attention mechanism. Adding part-of-speech features makes the boundaries between named entities and common words easier to distinguish, thereby improving the accuracy of named entity boundaries. Meanwhile, a self-attention mechanism is introduced into the bidirectional LSTM-CRF model to compute the relevance between the input at each time step and the other components of the sentence, alleviating the long-range dependency problem and improving named entity recognition accuracy.
In named entity recognition, machine-learning-based methods often use part-of-speech information as an important feature, but deep learning methods rarely do. One reason is that part-of-speech tagging of electronic health records is not yet mature: the tagging results contain many errors, and the propagation of these errors degrades entity recognition. To address this problem, the invention proposes a simplified part-of-speech tagging method. Removing the part-of-speech tags of suspected named-entity vocabulary avoids wrong tags on entities, while keeping the part-of-speech tags of common words still introduces the contextual part-of-speech information and word boundary information around named entities.
To achieve the above object, the technical solution provided by the present invention is a Chinese electronic health record named entity recognition method comprising the following steps:
1) Construct the common-word dictionary: segment the labeled data into words and build the common-word dictionary;
2) Simplified part-of-speech tagging: using the common-word dictionary built in step 1), keep the part-of-speech tags of common words and remove the part-of-speech tags of suspected named-entity vocabulary;
3) Construct the text and part-of-speech vector mapping tables: train the word-vector tool word2vec separately on the text and on the parts of speech of the labeled data to obtain a text vector mapping table and a part-of-speech vector mapping table;
4) Train the named entity prediction model: using the mapping tables obtained in step 3), map the text and parts of speech of the labeled data to vectors, concatenate them, feed them into the model that fuses part of speech and self-attention, and train it to obtain the named entity prediction model;
5) Predict named entity labels: using the common-word dictionary built in step 1), apply simplified part-of-speech tagging to the Chinese electronic health record data from which entities are to be extracted; map the text and parts of speech of the data to vectors with the mapping tables obtained in step 3); and predict the named entity labels with the prediction model obtained in step 4).
In step 1), constructing the common-word dictionary comprises the following steps:
1.1) segment the labeled data with a Chinese word segmentation tool;
1.2) check whether each segmentation unit falls within a named entity span; if so, the unit is part of a named entity or contains part of an entity word and is not processed; if not, the unit is a common word and is added to the dictionary, yielding the common-word dictionary.
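A minimal Python sketch of step 1.2) follows, using the example sentence 在我院行全子宫切除术 from the embodiment. It is an illustration under stated assumptions, not the patent's implementation: the segmenter output is assumed to be a token list, entity annotations are assumed to be character spans `(start, end)`, and the function names are hypothetical.

```python
def token_spans(tokens):
    """Yield (token, start, end) character offsets for a segmented sentence."""
    pos = 0
    for tok in tokens:
        yield tok, pos, pos + len(tok)
        pos += len(tok)


def overlaps(start, end, entity_spans):
    """True if the half-open span [start, end) intersects any entity span."""
    return any(start < e and s < end for s, e in entity_spans)


def build_common_word_dict(corpus):
    """Collect segmentation units that do not overlap any annotated entity.

    corpus: iterable of (tokens, entity_spans) pairs (hypothetical format).
    """
    common = set()
    for tokens, entity_spans in corpus:
        for tok, start, end in token_spans(tokens):
            if not overlaps(start, end, entity_spans):
                common.add(tok)  # outside every entity: a common word
    return common


# 在我院行<entity>全子宫切除术</entity>: the entity covers characters 4..9
tokens = ["在", "我院", "行全", "子宫", "切除术"]
print(sorted(build_common_word_dict([(tokens, [(4, 10)])])))  # ['在', '我院']
```

In this sketch a unit is added whenever one of its occurrences lies outside all entities; the text does not say how a word that is an entity fragment in one sentence and a common word in another should be treated.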
In step 2), simplified part-of-speech tagging comprises the following steps:
2.1) tag the labeled data with a Chinese part-of-speech tagging tool;
2.2) check whether each tagged unit appears in the common-word dictionary; if so, the unit is a common word and its part of speech is kept; if not, the unit may contain part of a named entity word, so it is split into a character sequence to avoid segmentation errors, and each character is tagged "s" to reduce part-of-speech tagging errors on named entities;
2.3) obtain the part-of-speech tagging result of the labeled data.
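Step 2.2) can be sketched the same way. The sketch assumes the tagger output is already a list of `(word, tag)` pairs; with jieba, `jieba.posseg.cut` yields pairs whose `.word` and `.flag` attributes could be collected into such a list, but the code below avoids the dependency. The function name and the toy dictionary `{"在", "我院"}` are illustrative.

```python
def simplify_pos(tagged, common_words, entity_tag="s"):
    """Simplified POS tagging: keep tags of common words, split the rest.

    tagged: list of (word, tag) pairs from a Chinese POS tagger.
    common_words: the common-word dictionary from step 1).
    """
    result = []
    for word, tag in tagged:
        if word in common_words:
            result.append((word, tag))  # common word: keep the tagger's tag
        else:
            # possible entity fragment: one "s"-tagged entry per character
            result.extend((ch, entity_tag) for ch in word)
    return result


tagged = [("在", "p"), ("我院", "n"), ("行全", "n"), ("子宫", "n"), ("切除术", "l")]
simplified = simplify_pos(tagged, {"在", "我院"})
print(" ".join(f"{w}_{t}" for w, t in simplified))
# 在_p 我院_n 行_s 全_s 子_s 宫_s 切_s 除_s 术_s
```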
In step 3), the text and part-of-speech vector mapping tables are constructed as follows: the part-of-speech tagging result of step 2) is split into two files. One is the text sequence, containing word units and character units; the other is the part-of-speech sequence corresponding to the text sequence, containing the parts of speech of common words and the entity tag "s" of the split characters. The word-vector tool word2vec is trained separately on the two files to obtain the text vector mapping table and the part-of-speech vector mapping table.
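The split in step 3) amounts to separating each tagged sentence into two parallel token streams. In the sketch below the file names are hypothetical, and the word2vec call appears only as a comment, assuming gensim 4.x (the patent names word2vec but no particular library). Both models would use the same `vector_size`, since the text and part-of-speech vectors share the dimension $l$.

```python
import os
import tempfile


def split_streams(tagged_sentences):
    """Split (unit, tag) sentences into parallel text and POS token lists."""
    texts = [[unit for unit, _ in sent] for sent in tagged_sentences]
    tags = [[tag for _, tag in sent] for sent in tagged_sentences]
    return texts, tags


def write_corpus(path, sentences):
    """Write one space-separated sentence per line, a common word2vec input format."""
    with open(path, "w", encoding="utf-8") as f:
        for sent in sentences:
            f.write(" ".join(sent) + "\n")


sentences = [[("在", "p"), ("我院", "n"), ("行", "s"), ("全", "s")]]
texts, tags = split_streams(sentences)
with tempfile.TemporaryDirectory() as d:
    write_corpus(os.path.join(d, "text.txt"), texts)  # hypothetical file names
    write_corpus(os.path.join(d, "pos.txt"), tags)
    with open(os.path.join(d, "pos.txt"), encoding="utf-8") as f:
        print(f.read().strip())  # p n s s

# Assuming gensim 4.x is installed, the two mapping tables could then be trained:
# from gensim.models import Word2Vec
# text_vectors = Word2Vec(sentences=texts, vector_size=100, min_count=1).wv
# pos_vectors = Word2Vec(sentences=tags, vector_size=100, min_count=1).wv
```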
In step 4), training the named entity prediction model comprises the following steps:
4.1) Using the mapping tables obtained in step 3), map the text and parts of speech of the labeled data to vectors, obtaining for each sentence the text vectors $X=\{x_1,x_2,x_3,\ldots,x_m\}$ and the corresponding part-of-speech vectors $P=\{p_1,p_2,p_3,\ldots,p_m\}$, where $m$ is the sentence length, $x_t\in\mathbb{R}^l$ is the $t$-th text vector, of dimension $l$, and $p_t\in\mathbb{R}^l$ is the part-of-speech vector of $x_t$, also of dimension $l$. Concatenate each text vector in the sentence with its part-of-speech vector to obtain the model input $V=\{X;P\}$, $V=\{v_1,v_2,v_3,\ldots,v_m\}$, where $v_t\in\mathbb{R}^{2l}$ is the $t$-th input vector, of dimension $2l$;
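The lookup and concatenation of step 4.1) reduce to a few lines. The toy tables below use made-up numbers with $l=2$ in place of trained word2vec vectors, and `embed_sentence` is a hypothetical name; an out-of-vocabulary unit would raise `KeyError` in this sketch, which a real system would need to handle.

```python
def embed_sentence(units, tags, text_table, pos_table):
    """Map units and tags to vectors and build v_t = [x_t; p_t] of length 2l."""
    V = []
    for unit, tag in zip(units, tags):
        x_t = text_table[unit]  # text vector, length l
        p_t = pos_table[tag]    # part-of-speech vector, length l
        if len(x_t) != len(p_t):
            raise ValueError("text and POS vectors must share the dimension l")
        V.append(list(x_t) + list(p_t))
    return V


# Toy mapping tables with l = 2 (made-up values, not trained vectors)
text_table = {"在": [0.1, 0.2], "我院": [0.3, 0.4], "行": [0.5, 0.6]}
pos_table = {"p": [1.0, 0.0], "n": [0.0, 1.0], "s": [0.5, 0.5]}
V = embed_sentence(["在", "我院", "行"], ["p", "n", "s"], text_table, pos_table)
print(V[2])  # [0.5, 0.6, 0.5, 0.5]
```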
4.2) weight vectors of each unit to ingredients other in sentence, fusion in sentence from attention layer, are being calculated
Current input and corresponding weight vectors, and recompiled using LSTM network, it obtains obtaining and merges sentence semantics and word
The feature vector of property:
The input vector of t moment is calculated to the weight entirely inputted: ct=att (V, vt), specific calculating process is as follows:
Wherein vt, vjRespectively indicate the input of t moment and jth moment, W and Wv,It is function parameter,Indicate t
The weighted value of moment input and the input of jth moment.The weighted value for indicating t moment input and the input of the i-th moment, carries out it
Normalization is calculated Indicate the normalized weight value that t moment input inputted for the i-th moment, viIndicate the defeated of i moment
Enter, c is calculated by the adduction to all momentt, ctIndicate the input vector of t moment to the normalized weight entirely inputted;
The weighted vector and the current input are concatenated into $[v_t,c_t]$, and the original input is re-encoded with an LSTM network so that it incorporates the weight information:
$$h_t=\mathrm{LSTM}(h_{t-1},[v_t,c_t])$$
where $h_{t-1}$ is the output at the previous time step, $v_t$ is the input at time step $t$, $c_t$ is the weighting of the input at step $t$ over the whole input, and $h_t\in\mathbb{R}^k$ is the re-encoded output at each time step, $k$ being the hidden-layer dimension of the LSTM network. The output of the self-attention layer is therefore $H=\{h_1,h_2,h_3,\ldots,h_m\}$, where $m$ is the output sequence length;
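The self-attention step can be sketched in plain Python. The scoring function is not reproduced in this copy of the text, so the sketch assumes an additive form $e_{t,j}=w^\top\tanh(Wv_t+W_vv_j)$ with a hypothetical vector $w$. Only the softmax normalization, the weighted sum $c_t=\sum_i\alpha_{t,i}v_i$ and the concatenation $[v_t,c_t]$ follow the description above.

```python
import math


def matvec(M, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * xi for m, xi in zip(row, x)) for row in M]


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(V, W, W_v, w):
    """Return (contexts, weights) with c_t = sum_i alpha_{t,i} v_i.

    Scores use an ASSUMED additive form e_{t,j} = w . tanh(W v_t + W_v v_j);
    the patent's own scoring formula is not reproduced here.
    """
    query_terms = [matvec(W, v) for v in V]   # W v_t for every t
    key_terms = [matvec(W_v, v) for v in V]   # W_v v_j for every j
    m, dim = len(V), len(V[0])
    contexts, weights = [], []
    for t in range(m):
        scores = [
            sum(wk * math.tanh(q + k) for wk, q, k in zip(w, query_terms[t], key_terms[j]))
            for j in range(m)
        ]
        alpha = softmax(scores)  # alpha_{t,i}, sums to 1 over i
        contexts.append([sum(alpha[i] * V[i][d] for i in range(m)) for d in range(dim)])
        weights.append(alpha)
    return contexts, weights


def lstm_inputs(V, contexts):
    """Concatenate [v_t, c_t], the per-step input of the re-encoding LSTM."""
    return [v + c for v, c in zip(V, contexts)]


V = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # m = 3 toy inputs with 2l = 2
I = [[1.0, 0.0], [0.0, 1.0]]
contexts, weights = self_attention(V, W=I, W_v=I, w=[1.0, 1.0])
print([round(sum(a), 6) for a in weights])  # [1.0, 1.0, 1.0]
print(len(lstm_inputs(V, contexts)[0]))     # 4
```

Because each $c_t$ is a convex combination of the inputs, it has the same dimension $2l$ as $v_t$, so the LSTM input $[v_t,c_t]$ has dimension $4l$.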
4.3) Extract textual and part-of-speech contextual features with a bidirectional LSTM network:
$$Q=\mathrm{BiLSTM}(H)$$
The BiLSTM layer outputs $Q=\{q_1,q_2,q_3,\ldots,q_m\}$, where $m$ is the output sequence length and $q_t\in\mathbb{R}^{2k}$ is the output of the BiLSTM network at each time step; $k$ is the hidden-layer dimension of each LSTM, and because the network is bidirectional, the output dimension is $2k$;
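A bidirectional LSTM runs one LSTM left to right and another right to left, then concatenates their outputs at each step, which is why $q_t$ has dimension $2k$. The pure-Python sketch below uses the standard LSTM gate equations with untrained random weights (an assumption for illustration; the patent does not specify initialization) and shows only the forward pass. The same `LSTMCell` could also serve as the re-encoding LSTM $h_t=\mathrm{LSTM}(h_{t-1},[v_t,c_t])$ of step 4.2).

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def affine(W, b, x):
    """Compute W x + b for a matrix given as a list of rows."""
    return [sum(w * xi for w, xi in zip(row, x)) + bi for row, bi in zip(W, b)]


class LSTMCell:
    """Standard LSTM cell; W stacks 4 gate blocks of k rows over [x_t; h_{t-1}]."""

    def __init__(self, input_dim, k, seed=0):
        rng = random.Random(seed)
        self.k = k
        self.W = [[rng.uniform(-0.1, 0.1) for _ in range(input_dim + k)] for _ in range(4 * k)]
        self.b = [0.0] * (4 * k)

    def step(self, x, h, c):
        z = affine(self.W, self.b, x + h)  # x + h concatenates the two lists
        k = self.k
        i = [sigmoid(v) for v in z[:k]]             # input gate
        f = [sigmoid(v) for v in z[k:2 * k]]        # forget gate
        o = [sigmoid(v) for v in z[2 * k:3 * k]]    # output gate
        g = [math.tanh(v) for v in z[3 * k:]]       # candidate cell state
        c = [fj * cj + ij * gj for fj, cj, ij, gj in zip(f, c, i, g)]
        h = [oj * math.tanh(cj) for oj, cj in zip(o, c)]
        return h, c

    def run(self, xs):
        h, c = [0.0] * self.k, [0.0] * self.k
        outputs = []
        for x in xs:
            h, c = self.step(x, h, c)
            outputs.append(h)
        return outputs


def bilstm(xs, fwd, bwd):
    """Q = BiLSTM(H): forward outputs concatenated with re-reversed backward outputs."""
    forward = fwd.run(xs)
    backward = bwd.run(xs[::-1])[::-1]
    return [f + b for f, b in zip(forward, backward)]


H = [[0.1, 0.2, 0.3], [0.0, -0.1, 0.4], [0.5, 0.5, 0.5]]  # m = 3 toy steps
Q = bilstm(H, LSTMCell(3, k=2, seed=1), LSTMCell(3, k=2, seed=2))
print(len(Q), len(Q[0]))  # 3 4
```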
4.4) Apply a linear transformation to the BiLSTM output to obtain the emission score matrix and feed it to the CRF layer; using the label transition matrix learned by the CRF layer, compute the optimal label sequence for the input sequence, and output the sequence with the highest probability as the final named entity label sequence.
The output sequence $Q$ of step 4.3) is linearly transformed and fed to the CRF layer:
$$P=QW_p+b_p$$
where $W_p\in\mathbb{R}^{2k\times n}$ and $b_p\in\mathbb{R}^n$ are parameters learned by the model, and the result is $P\in\mathbb{R}^{m\times n}$; $k$ is the hidden-layer dimension of the BiLSTM, $m$ is the length of the input sequence and $n$ is the number of entity labels. $P$ is the emission score matrix of the CRF, whose element $P_{i,j}$ is the (unnormalized) score of labeling the $i$-th input with the $j$-th entity label. The label transition matrix $A\in\mathbb{R}^{n\times n}$ is a parameter matrix learned during training, whose element $A_{i,j}$ is the score of a transition from the $i$-th entity label to the $j$-th entity label. From these two matrices, the probability of a label sequence $y$ given the input sequence $V$ is computed as follows:
$$s(V,y)=\sum_{i=1}^{m-1}A_{y_i,y_{i+1}}+\sum_{i=1}^{m}P_{i,y_i}$$
$$p(y\mid V)=\frac{\exp(s(V,y))}{\sum_{\tilde y\in Y}\exp(s(V,\tilde y))}$$
where $V$ is the input sequence; $y$ is the label sequence being scored, during training the true label sequence of the current input; $m$ is the input length; $A_{y_i,y_{i+1}}$ is the score of the transition from label $y_i$ to label $y_{i+1}$; $P_{i,y_i}$ is the score of labeling the $i$-th input unit with $y_i$; and $s(V,y)$ is the score of the label sequence $y$. $Y$ is the set of all label sequences: computing the score $s(V,\tilde y)$ of every $\tilde y\in Y$ and summing gives the total over all possible label sequences, from which the normalized score $p(y\mid V)$ of $y$ is obtained. The negative log of this probability is taken as the loss function of the model, and training yields the named entity prediction model:
$$L=-\log p(y\mid V)$$
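The emission matrix, the sequence score $s(V,y)$ and the loss $L=-\log p(y\mid V)$ can be computed as below. Enumerating all of $Y$ is exponential in $m$, so the sketch computes the denominator with the standard forward algorithm in log space, and the usage lines check it against brute-force enumeration on a toy example. No start or stop labels are used, matching the score as defined above; the toy numbers are made up.

```python
import itertools
import math


def emissions(Q, W_p, b_p):
    """P = Q W_p + b_p: (m x 2k) times (2k x n) plus bias, giving m x n."""
    n = len(b_p)
    return [[sum(q[d] * W_p[d][j] for d in range(len(q))) + b_p[j] for j in range(n)] for q in Q]


def sequence_score(P, A, y):
    """s(V, y) = sum_i A[y_i][y_{i+1}] + sum_i P[i][y_i] (no start/stop labels)."""
    emit = sum(P[i][y[i]] for i in range(len(y)))
    trans = sum(A[y[i]][y[i + 1]] for i in range(len(y) - 1))
    return emit + trans


def logsumexp(xs):
    top = max(xs)
    return top + math.log(sum(math.exp(x - top) for x in xs))


def log_partition(P, A):
    """log of sum over all label sequences of exp(s(V, y)), via the forward algorithm."""
    n = len(A)
    alpha = list(P[0])  # log-scores of length-1 prefixes ending in each label
    for i in range(1, len(P)):
        alpha = [logsumexp([alpha[a] + A[a][b] for a in range(n)]) + P[i][b] for b in range(n)]
    return logsumexp(alpha)


def crf_nll(P, A, y):
    """Loss L = -log p(y | V) = log Z - s(V, y)."""
    return log_partition(P, A) - sequence_score(P, A, y)


# Toy example: m = 3 inputs, n = 2 labels (made-up numbers)
P = [[1.0, 0.2], [0.3, 0.8], [0.5, 0.1]]
A = [[0.5, -0.2], [0.1, 0.4]]
all_paths = list(itertools.product(range(2), repeat=3))
brute = logsumexp([sequence_score(P, A, y) for y in all_paths])
print(abs(brute - log_partition(P, A)) < 1e-9)  # True
print(crf_nll(P, A, [0, 0, 0]) > 0)  # True
```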
In step 5), predicting named entity labels comprises the following steps:
5.1) Tag the Chinese electronic health record data from which entities are to be extracted with a Chinese part-of-speech tagging tool. Using the dictionary obtained in step 1), check whether each tagged unit appears in the common-word dictionary; if so, the unit is a common word and its part of speech is kept; if not, the unit may contain part of an entity word, so it is split into a character sequence and each character is tagged "s";
5.2) Using the mapping tables obtained in step 3), map the text and parts of speech of the tagging result of step 5.1) to vectors, obtaining the text vectors of each sentence $X=\{x_1,x_2,x_3,\ldots,x_m\}$ and the corresponding part-of-speech vectors $P=\{p_1,p_2,p_3,\ldots,p_m\}$, where $m$ is the sentence length, $x_t$ represents the $t$-th text unit and $p_t$ the part of speech of $x_t$. Concatenate the text vectors of each sentence with the corresponding part-of-speech vectors to obtain the model input $V=\{X;P\}$, $V=\{v_1,v_2,v_3,\ldots,v_m\}$, where $v_t$ is the $t$-th input vector and $m$ is the input length;
5.3) Feed the vectors of step 5.2) into the prediction model obtained in step 4) to predict entity labels for the Chinese electronic health record data from which entities are to be extracted, taking the predicted sequence with the highest probability as the final labeling result:
$$y^*=\arg\max_{y\in Y}p(y\mid V)$$
where $Y$ is the set of all possible label sequences; for each label sequence $y\in Y$, the normalized score $p(y\mid V)$ given the current input $V$ is computed, and $y^*$ is the label sequence with the highest score.
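Because the denominator of $p(y\mid V)$ is the same for every $y$, the $\arg\max$ in step 5.3) equals the label sequence with the highest score $s(V,y)$. This can be found without enumerating $Y$ using the Viterbi algorithm (the patent does not name a decoding algorithm). The sketch below is a standard Viterbi decoder under the same no-start/stop assumption; the toy matrices are made up.

```python
import itertools


def path_score(P, A, y):
    """s(V, y): emission scores plus transition scores along y."""
    return sum(P[i][y[i]] for i in range(len(y))) + sum(
        A[y[i]][y[i + 1]] for i in range(len(y) - 1)
    )


def viterbi(P, A):
    """Return (y*, s(V, y*)) for the highest-scoring label sequence."""
    n = len(A)
    score = list(P[0])  # best score of a prefix ending in each label
    backpointers = []
    for i in range(1, len(P)):
        prev_best, new_score = [], []
        for b in range(n):
            best_a = max(range(n), key=lambda a: score[a] + A[a][b])
            prev_best.append(best_a)
            new_score.append(score[best_a] + A[best_a][b] + P[i][b])
        backpointers.append(prev_best)
        score = new_score
    last = max(range(n), key=lambda b: score[b])
    path = [last]
    for prev_best in reversed(backpointers):
        path.append(prev_best[path[-1]])  # follow the stored predecessor
    return path[::-1], score[last]


P = [[1.0, 0.2], [0.3, 0.8], [0.5, 0.1]]  # toy emission scores, m = 3, n = 2
A = [[0.5, -0.2], [0.1, 0.4]]              # toy transition scores
best_path, best_score = viterbi(P, A)
print(best_path)  # [0, 0, 0]
```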
Compared with the prior art, the present invention has the following advantages and beneficial effects:
1. Part-of-speech features are added to the deep learning model for named entities, enriching the grammatical features of the input.
2. The simplified part-of-speech tagging method reduces word segmentation and part-of-speech tagging errors on medical entities.
3. The method does not depend on a medical dictionary, which reduces the work of building domain dictionaries.
4. The self-attention mechanism added to the model improves its ability to handle long-range dependencies.
Description of the drawings
Fig. 1 is a flow chart of the method of the present invention.
Fig. 2 is a flow chart of building the common-word dictionary for electronic health records.
Fig. 3 is a flow chart of simplified part-of-speech tagging.
Fig. 4 is a diagram of the deep learning model fusing part of speech and self-attention.
Detailed description of the embodiments
The present invention will now be described in further detail with reference to the embodiments and the accompanying drawings, but the embodiments of the present invention are not limited thereto.
As shown in Figs. 1 to 4, the Chinese electronic health record named entity recognition method provided by this embodiment mainly fuses part-of-speech features with a self-attention mechanism. In the data preprocessing stage, reducing the part-of-speech tagging of entities yields word vectors and part-of-speech tags for common words, and character vectors with a substitute part-of-speech tag for named-entity vocabulary. The text vectors and the corresponding part-of-speech tag vectors are concatenated and fed into the model fusing part of speech and self-attention. The self-attention layer computes a weight vector for each time step's input relative to the whole sentence, giving sentence-level semantic and part-of-speech features; these are fed into a bidirectional LSTM network to obtain the textual and part-of-speech contextual features of each input, and the CRF layer finally produces the labeling result.
The specific steps of the present invention are as follows:
Step 1: build the common-word dictionary
1.1) Segment the labeled data with a word segmentation tool.
For example, the segmentation result of "在我院行<entity>全子宫切除术</entity>" ("<entity>total hysterectomy</entity> performed at our hospital") is "在 我院 行全 子宫 切除术".
1.2) Check whether each segmentation unit falls within a named entity span; if so, the unit is named-entity vocabulary or contains part of a named entity and is not processed; if not, the unit is a common word and is added to the dictionary, yielding the common-word dictionary.
In the example above, "在" (at) and "我院" (our hospital) contain no part of a named entity and are added to the common-word dictionary. The segmentation unit "行全" contains part of the named entity, so it is classed as named-entity vocabulary and is not added; "子宫" (uterus) and "切除术" (resection) likewise are not common words.
Step 2: simplified part-of-speech tagging. Using the common-word dictionary built in step 1, keep the part-of-speech tags of common words and remove those of suspected named-entity vocabulary, as follows:
2.1) Tag all data with a Chinese part-of-speech tagging tool such as jieba. The example above is tagged as:
"在_p 我院_n 行全_n 子宫_n 切除术_l", where p, n and l are part-of-speech tags denoting preposition, noun and idiom respectively.
2.2) Check whether each tagged unit appears in the common-word dictionary; if so, the unit is a common word and its part of speech is kept; if not, the unit may contain part of a named entity, so it is split into a character sequence and each character is tagged "s". The tagging result is:
"在_p 我院_n 行_s 全_s 子_s 宫_s 切_s 除_s 术_s"
Step 3: build the text and part-of-speech vector mapping tables. Split the part-of-speech tagging result of step 2 into two files: one is the text sequence, containing word units and character units; the other is the corresponding part-of-speech sequence, containing the parts of speech of common words and the entity tag "s" of the split characters. Train word2vec separately on the two files to obtain the text vector mapping table and the part-of-speech vector mapping table;
Step 4: train the entity prediction model. Using the mapping tables obtained in step 3, map the text and parts of speech of the labeled data to vectors, concatenate them, feed them into the model fusing part of speech and self-attention, and train it to obtain the entity prediction model, as follows:
4.1) text for having labeled data and part of speech are mapped to vector, obtained each by the mapping table obtained according to step 3
The text vector of sentence: X={ x1,x2,x3,...,xmAnd corresponding part of speech vector: P={ p1,p2,p3,...,pm, wherein m is
Sentence length, xt∈RlIndicate t-th of text vector, vector dimension l;pt∈RlIndicate xtPart of speech vector, vector dimension is
l;The corresponding part of speech vector of text vector in each sentence is spliced, the input vector of model: V={ X is obtained;
P }, V={ v1,v2,v3,...,vm},vt∈R2lIndicate t-th of input vector, vector dimension 2l.
4.2) The vector sequence of each sentence is fed into the model, and the self-attention layer computes the weighting of the input vector at time step $t$ over the whole input, $c_t=\mathrm{att}(V,v_t)$, as follows. First a score $e_{t,j}$ is computed for each pair of inputs $v_t$ and $v_j$ (the inputs at time steps $t$ and $j$) by a scoring function whose learned parameters include $W$ and $W_v$; $e_{t,j}$ is the weight between the input at step $t$ and the input at step $j$. The scores are then normalized, giving $\alpha_{t,i}$, the normalized weight of the input at step $t$ with respect to the input $v_i$ at step $i$, and $c_t$ is obtained by summing over all time steps:
$$\alpha_{t,i}=\frac{\exp(e_{t,i})}{\sum_{j=1}^{m}\exp(e_{t,j})},\qquad c_t=\sum_{i=1}^{m}\alpha_{t,i}v_i$$
so $c_t$ is the attention-weighted summary of the whole input from the viewpoint of step $t$.
The weighted vector and the current input are concatenated into $[v_t,c_t]$, and the original input is re-encoded with an LSTM network so that it incorporates the weight information:
$$h_t=\mathrm{LSTM}(h_{t-1},[v_t,c_t])$$
where $h_{t-1}$ is the output at the previous time step, $v_t$ is the input at time step $t$, $c_t$ is the weighting of the input at step $t$ over the whole input, and $h_t\in\mathbb{R}^k$ is the re-encoded output at each time step, $k$ being the hidden-layer dimension of the LSTM network. The output of the self-attention layer is therefore $H=\{h_1,h_2,h_3,\ldots,h_m\}$, where $m$ is the output sequence length;
4.3) Extract textual and part-of-speech contextual features with a bidirectional LSTM network:
$$Q=\mathrm{BiLSTM}(H)$$
The BiLSTM layer outputs $Q=\{q_1,q_2,q_3,\ldots,q_m\}$, where $m$ is the output sequence length and $q_t\in\mathbb{R}^{2k}$ is the output of the BiLSTM network at each time step; $k$ is the hidden-layer dimension of each LSTM, and because the network is bidirectional, the output dimension is $2k$;
4.4) The output sequence of 4.3) is linearly transformed and fed to the CRF layer:
$$P=QW_p+b_p$$
where $W_p\in\mathbb{R}^{2k\times n}$ and $b_p\in\mathbb{R}^n$ are parameters learned by the model, and the result is $P\in\mathbb{R}^{m\times n}$; $k$ is the hidden-layer dimension of the BiLSTM, $m$ is the length of the input sequence and $n$ is the number of entity labels. $P$ is the emission score matrix of the CRF, whose element $P_{i,j}$ is the (unnormalized) score of labeling the $i$-th input with the $j$-th entity label. The label transition matrix $A\in\mathbb{R}^{n\times n}$ is a parameter matrix learned during training, whose element $A_{i,j}$ is the score of a transition from the $i$-th entity label to the $j$-th entity label. From these two matrices, the probability of a label sequence $y$ given the input sequence $V$ is computed as follows:
$$s(V,y)=\sum_{i=1}^{m-1}A_{y_i,y_{i+1}}+\sum_{i=1}^{m}P_{i,y_i}$$
$$p(y\mid V)=\frac{\exp(s(V,y))}{\sum_{\tilde y\in Y}\exp(s(V,\tilde y))}$$
where $V$ is the input sequence; $y$ is the label sequence being scored, during training the true label sequence of the current input; $m$ is the input length; $A_{y_i,y_{i+1}}$ is the score of the transition from label $y_i$ to label $y_{i+1}$; $P_{i,y_i}$ is the score of labeling the $i$-th input unit with $y_i$; and $s(V,y)$ is the score of the label sequence $y$.
$Y$ is the set of all label sequences: computing the score $s(V,\tilde y)$ of every $\tilde y\in Y$ and summing gives the total over all possible label sequences, from which the normalized score $p(y\mid V)$ of $y$ is obtained. The negative log of this probability is taken as the loss function of the model, and training yields the named entity prediction model:
$$L=-\log p(y\mid V)$$
Step 5: entity label prediction
5.1) Tag the Chinese electronic health record data from which named entities are to be extracted with a part-of-speech tagging tool. Using the dictionary obtained in step 1, check whether each tagged unit appears in the common-word dictionary; if so, the unit is a common word and its part of speech is kept; if not, the unit may contain part of an entity word, so it is split into a character sequence and each character is tagged "s";
5.2) Using the mapping tables obtained in step 3, map the text and parts of speech of the tagging result of 5.1) to vectors, obtaining the text vectors of each sentence $X=\{x_1,x_2,x_3,\ldots,x_m\}$ and the corresponding part-of-speech vectors $P=\{p_1,p_2,p_3,\ldots,p_m\}$, where $m$ is the sentence length, $x_t$ represents the $t$-th text unit and $p_t$ the part of speech of $x_t$. Concatenate the text vectors of each sentence with the corresponding part-of-speech vectors to obtain the model input $V=\{X;P\}$, $V=\{v_1,v_2,v_3,\ldots,v_m\}$, where $v_t$ is the $t$-th input vector.
5.3) Feed the vectors of 5.2) into the prediction model obtained in step 4 to predict entity labels for the Chinese electronic health record data from which named entities are to be extracted, taking the predicted sequence with the highest probability as the final labeling result:
$$y^*=\arg\max_{y\in Y}p(y\mid V)$$
where $Y$ is the set of all possible label sequences; for each label sequence $y\in Y$, the normalized score $p(y\mid V)$ given the current input $V$ is computed, and $y^*$ is the label sequence with the highest score.
The above embodiment is a preferred embodiment of the present invention, but the embodiments of the present invention are not limited by it; any other change, modification, substitution, combination or simplification made without departing from the spirit and principle of the present invention shall be an equivalent replacement and is included within the scope of protection of the present invention.
Claims (6)
1. A Chinese electronic health record named entity recognition method, characterized by comprising the following steps:
1) constructing the common-word dictionary: segmenting the labeled data into words and constructing the common-word dictionary;
2) simplified part-of-speech tagging: according to the common-word dictionary constructed in step 1), keeping the part-of-speech tags of common words and removing the part-of-speech tags of suspected named-entity vocabulary;
3) constructing the text and part-of-speech vector mapping tables: training the word-vector tool word2vec separately on the text and on the parts of speech of the labeled data to obtain a text vector mapping table and a part-of-speech vector mapping table;
4) training the named entity prediction model: using the mapping tables obtained in step 3), mapping the text and parts of speech of the labeled data to vectors, concatenating them, feeding them into the model fusing part of speech and self-attention, and training to obtain the named entity prediction model;
5) predicting named entity labels: according to the common-word dictionary constructed in step 1), applying simplified part-of-speech tagging to the Chinese electronic health record data from which entities are to be extracted; mapping the text and parts of speech of the data to vectors with the mapping tables obtained in step 3); and predicting the named entity labels with the prediction model obtained in step 4).
2. The Chinese electronic health record named entity recognition method according to claim 1, characterized in that in step 1), constructing the common-word dictionary comprises the following steps:
1.1) segmenting the labeled data with a Chinese word segmentation tool;
1.2) judging whether each segmentation unit falls within a named entity span; if so, the unit is part of a named entity or contains part of an entity word and is not processed; if not, the unit is a common word and is added to the dictionary, yielding the common-word dictionary.
3. The Chinese electronic health record named entity recognition method according to claim 1, characterized in that the simplified part-of-speech tagging in step 2) comprises the following steps:
2.1) tagging the labeled data with parts of speech using a Chinese part-of-speech tagging tool;
2.2) determining whether each tagged unit appears in the common-word dictionary; if so, the unit is a common word and its part of speech is retained; if not, the unit may contain part of a named entity word, so it is split into a character sequence to avoid segmentation errors, and each character is tagged with the part of speech "s", reducing part-of-speech tagging errors on named entities;
2.3) obtaining the part-of-speech tagging result of the labeled data.
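Step 2.2) amounts to a small rewrite of the tagger output, sketched below in Python. The sketch assumes the tagger returns `(unit, pos)` pairs. The tag names `n` and `v` in the example are illustrative only; the claims fix only the special tag "s". The function name is hypothetical.

```python
from typing import List, Set, Tuple

ENTITY_POS = "s"  # the tag the claims assign to every character of a split unit


def simplify_pos_tags(
    tagged: List[Tuple[str, str]], common_words: Set[str]
) -> List[Tuple[str, str]]:
    """Apply step 2.2) to one tagged sentence of (unit, pos) pairs."""
    result: List[Tuple[str, str]] = []
    for unit, pos in tagged:
        if unit in common_words:
            # Common word: keep the part of speech produced by the tagger.
            result.append((unit, pos))
        else:
            # Possibly (part of) an entity: split into characters, tag each "s".
            result.extend((ch, ENTITY_POS) for ch in unit)
    return result
```

With the dictionary {患者, 出现, 症状}, the unit 头痛 becomes the two pairs (头, s) and (痛, s), while the other units keep their original tags.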
4. The Chinese electronic health record named entity recognition method according to claim 1, characterized in that the text and part-of-speech vector mapping tables in step 3) are constructed as follows. The part-of-speech tagging result of step 2) is separated into two files. One is the text sequence, which contains word units and character units. The other is the part-of-speech sequence corresponding to the text sequence, which contains the parts of speech of common words and the entity part of speech "s" assigned to split characters. The two files are trained separately with the word-vector tool word2vec to obtain a text vector mapping table and a part-of-speech vector mapping table.
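Separating the tagged corpus into the two aligned files of claim 4 can be sketched in Python as follows. The function names are hypothetical. Each resulting corpus would then be trained separately with word2vec. With gensim 4, for instance, that could be `Word2Vec(sentences=texts, vector_size=l, min_count=1)`, where the dimension `l` and the other settings are assumptions the claims leave open.

```python
from typing import List, Tuple


def split_parallel_corpora(
    tagged_corpus: List[List[Tuple[str, str]]],
) -> Tuple[List[List[str]], List[List[str]]]:
    """Separate (unit, pos) sentences into a text corpus and a POS corpus.

    The outputs are aligned position by position, so the t-th text unit and
    the t-th POS tag of a sentence always describe the same token.
    """
    texts = [[unit for unit, _ in sent] for sent in tagged_corpus]
    tags = [[pos for _, pos in sent] for sent in tagged_corpus]
    return texts, tags


def to_lines(corpus: List[List[str]]) -> List[str]:
    """One sentence per line, tokens separated by spaces (word2vec input format)."""
    return [" ".join(sent) for sent in corpus]
```

Keeping the two corpora aligned means that the text vector and the part-of-speech vector looked up for position t always describe the same token.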
5. The Chinese electronic health record named entity recognition method according to claim 1, characterized in that training the named entity prediction model in step 4) comprises the following steps:
4.1) according to the mapping tables obtained in step 3), the text and parts of speech of the labeled data are mapped to vectors. This gives, for each sentence, the text vectors $X=\{x_1,x_2,x_3,\dots,x_m\}$ and the corresponding part-of-speech vectors $P=\{p_1,p_2,p_3,\dots,p_m\}$. Here $m$ is the sentence length, $x_t\in\mathbb{R}^l$ is the $t$-th text vector, of dimension $l$, and $p_t\in\mathbb{R}^l$ is the part-of-speech vector of $x_t$, also of dimension $l$. Each text vector in the sentence is concatenated with its part-of-speech vector to obtain the model input $V=\{X;P\}=\{v_1,v_2,v_3,\dots,v_m\}$, where $v_t\in\mathbb{R}^{2l}$ is the $t$-th input vector, of dimension $2l$;
4.2) in the self-attention layer, the weight of each unit in the sentence with respect to the other components of the sentence is computed. The current input is fused with its weight vector, and the result is re-encoded with an LSTM network, yielding feature vectors that combine sentence semantics and part-of-speech features:
The attention of the input vector at time $t$ over the entire input is computed as $c_t=\mathrm{att}(V,v_t)$, as follows. Let $v_t$ and $v_j$ denote the inputs at time $t$ and time $j$. Let $s_j^t$ denote the score between the input at time $t$ and the input at time $j$; it is computed by a scoring function whose parameters are $W$, $W_v$ and a further parameter matrix. The scores $s_i^t$ are normalized over $i$ to obtain $a_i^t$, the normalized weight of the input at time $i$ with respect to the input at time $t$. Summing over all time steps gives
$c_t=\sum_{i=1}^{m} a_i^t v_i$,
where $v_i$ is the input at time $i$. Thus $c_t$ is the normalized-weight summary of the entire input with respect to the input at time $t$. The weight vector and the current input are concatenated as $[v_t,c_t]$, and the original input is re-encoded with an LSTM network so that the weight information is incorporated:
$h_t=\mathrm{LSTM}(h_{t-1},[v_t,c_t])$
Here $h_{t-1}$ is the output at the previous time step, $v_t$ is the input at time $t$, and $c_t$ is the weight of the input at time $t$ over the entire input. $h_t\in\mathbb{R}^k$ is the re-encoded output at each time step, and $k$ is the hidden-layer dimension of the LSTM network. The output of the self-attention layer is therefore $H=\{h_1,h_2,h_3,\dots,h_m\}$, where $m$ is the output sequence length;
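The exact scoring function of step 4.2) is not recoverable from this text. The NumPy sketch below therefore assumes an additive, R-Net-style self-matching score, $s_j^t = w^\top\tanh(W_1 v_j + W_2 v_t)$, normalized with a softmax. The names `w`, `W1` and `W2` are hypothetical stand-ins for the patent's $W$, $W_v$ and third parameter matrix.

```python
import numpy as np


def self_matching_attention(V: np.ndarray, w: np.ndarray,
                            W1: np.ndarray, W2: np.ndarray) -> np.ndarray:
    """Return C of shape (m, d) whose row t is c_t = sum_i a_i^t v_i.

    Assumed score: s_j^t = w . tanh(W1 v_j + W2 v_t), softmax-normalized over j.
    V: (m, d) inputs; W1, W2: (h, d); w: (h,).
    """
    A = V @ W1.T                      # (m, h): W1 v_j for every position j
    B = V @ W2.T                      # (m, h): W2 v_t for every position t
    # scores[t, j] = w . tanh(W1 v_j + W2 v_t), built by broadcasting to (m, m, h)
    scores = np.tanh(A[None, :, :] + B[:, None, :]) @ w   # (m, m)
    scores -= scores.max(axis=1, keepdims=True)           # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=1, keepdims=True)         # a^t: softmax over j
    return weights @ V                                    # (m, d) context vectors c_t
```

The LSTM input $[v_t, c_t]$ for every $t$ is then `np.concatenate([V, C], axis=1)`. Because each $c_t$ is a convex combination of the inputs, it stays within the range of the input vectors.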
4.3) text context features and part-of-speech context features are extracted with a bidirectional LSTM network:
$Q=\mathrm{BiLSTM}(H)$
The output of the BiLSTM layer is $Q=\{q_1,q_2,q_3,\dots,q_m\}$, where $m$ is the output sequence length and $q_t\in\mathbb{R}^{2k}$ is the output of the BiLSTM network at each time step. $k$ is the hidden-layer dimension of the LSTM network; because the network is bidirectional, the output vector dimension is $2k$;
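The following forward-pass-only NumPy sketch shows step 4.3) and why the output dimension is $2k$. It uses a standard LSTM cell with gate order input, forget, cell, output. That gate order and the parameter shapes are assumptions, since the claims do not specify the cell. The same `lstm` function could also serve as the re-encoder $h_t=\mathrm{LSTM}(h_{t-1},[v_t,c_t])$ of step 4.2).

```python
import numpy as np


def sigmoid(z: np.ndarray) -> np.ndarray:
    return 1.0 / (1.0 + np.exp(-z))


def lstm(X: np.ndarray, W: np.ndarray, U: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Run a standard LSTM over X of shape (m, d); return hidden states (m, k).

    W: (4k, d), U: (4k, k), b: (4k,); gate order [input, forget, cell, output].
    """
    k = U.shape[1]
    h = np.zeros(k)
    c = np.zeros(k)
    outputs = []
    for x_t in X:
        z = W @ x_t + U @ h + b
        i = sigmoid(z[:k])
        f = sigmoid(z[k:2 * k])
        g = np.tanh(z[2 * k:3 * k])
        o = sigmoid(z[3 * k:])
        c = f * c + i * g          # cell state update
        h = o * np.tanh(c)         # hidden state, each entry in (-1, 1)
        outputs.append(h)
    return np.stack(outputs)


def bilstm(H: np.ndarray, fwd: tuple, bwd: tuple) -> np.ndarray:
    """Q = BiLSTM(H): forward states concatenated with time-aligned backward states."""
    forward = lstm(H, *fwd)
    backward = lstm(H[::-1], *bwd)[::-1]   # run right-to-left, then re-align to time t
    return np.concatenate([forward, backward], axis=1)  # (m, 2k)
```

A trained system would normally use a framework LSTM layer with learned weights. This sketch only makes the data flow and the $2k$ output width explicit.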
4.4) the output of the BiLSTM layer is linearly transformed into an emission matrix and input to the CRF layer. Using the label transition matrix learned by the CRF layer, the optimal label sequence for the input sequence is computed, and the sequence with the highest probability is taken as the final output named entity category label sequence:
The output sequence $Q$ of step 4.3) is linearly transformed and input to the CRF layer:
$P=QW_p+b_p$
Here $W_p\in\mathbb{R}^{2k\times n}$ and $b_p\in\mathbb{R}^n$ are learnable model parameters, and the transformation yields $P\in\mathbb{R}^{m\times n}$. $k$ is the hidden-layer dimension of the BiLSTM network, $m$ is the length of the input sequence, and $n$ is the number of entity labels. $P$ is the emission matrix of the CRF; its element $P_{i,j}$ is the score of the $i$-th input being labeled with the $j$-th entity label. The label transition matrix $A\in\mathbb{R}^{n\times n}$ is a parameter matrix learned during training; its element $A_{i,j}$ is the score of a transition from the $i$-th entity label to the $j$-th entity label. From these two matrices, the probability of the label sequence $y$ given the input sequence $V$ is computed as follows:
$s(V,y)=\sum_{i=1}^{m-1}A_{y_i,y_{i+1}}+\sum_{i=1}^{m}P_{i,y_i}$
$p(y\mid V)=\frac{\exp(s(V,y))}{\sum_{\tilde{y}\in Y}\exp(s(V,\tilde{y}))}$
The symbols are defined as follows:
- $V$ is the input sequence.
- $y$ is the label sequence; during training, it is the true label sequence of the current input sequence.
- $m$ is the length of the input.
- $A_{y_i,y_{i+1}}$ is the score of a transition from label $y_i$ to label $y_{i+1}$.
- $P_{i,y_i}$ is the score of the $i$-th input unit being labeled $y_i$.
- $s(V,y)$ is the score of label sequence $y$.
- $Y$ is the set of all label sequences.
The score $s(V,\tilde{y})$ is computed for every label sequence $\tilde{y}$ in $Y$. The exponentiated scores are summed to obtain the total over all possible label sequences, which gives the normalized score $p(y\mid V)$ of label sequence $y$. The negative logarithm of the prediction probability is taken as the loss function of the model, and the named entity prediction model is trained with the loss:
$L=-\log p(y\mid V)$.
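The normalizing sum over all label sequences in step 4.4) has $n^m$ terms. In practice it is computed with the forward algorithm rather than by enumeration. The NumPy sketch below assumes the score $s(V,y)$ written above, with no separate start or end labels, and computes the loss $L=-\log p(y\mid V)$ for given emission and transition matrices. The function names are hypothetical, and gradients, which a framework would supply during training, are omitted.

```python
import numpy as np


def logsumexp(a: np.ndarray, axis: int) -> np.ndarray:
    """Numerically stable log(sum(exp(a))) along one axis."""
    amax = a.max(axis=axis, keepdims=True)
    return np.squeeze(amax, axis=axis) + np.log(np.exp(a - amax).sum(axis=axis))


def sequence_score(P: np.ndarray, A: np.ndarray, y) -> float:
    """s(V, y) = sum_i A[y_i, y_{i+1}] + sum_i P[i, y_i]."""
    emit = sum(P[i, y[i]] for i in range(len(y)))
    trans = sum(A[y[i], y[i + 1]] for i in range(len(y) - 1))
    return float(emit + trans)


def log_partition(P: np.ndarray, A: np.ndarray) -> float:
    """log of the sum over all label sequences of exp(s(V, y)), in O(m n^2)."""
    alpha = P[0].copy()  # (n,) log-sum of scores of all length-1 prefixes per label
    for i in range(1, P.shape[0]):
        # alpha_new[j] = logsumexp_k(alpha[k] + A[k, j]) + P[i, j]
        alpha = logsumexp(alpha[:, None] + A, axis=0) + P[i]
    return float(logsumexp(alpha, axis=0))


def crf_nll(P: np.ndarray, A: np.ndarray, y) -> float:
    """L = -log p(y | V) = log Z - s(V, y)."""
    return log_partition(P, A) - sequence_score(P, A, y)
```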
6. The Chinese electronic health record named entity recognition method according to claim 1, characterized in that the named entity label prediction in step 5) comprises the following steps:
5.1) the Chinese electronic health record data from which entities are to be extracted are tagged with parts of speech using a Chinese part-of-speech tagging tool. According to the dictionary obtained in step 1), it is determined whether each tagged unit appears in the common-word dictionary. If so, the unit is a common word and its part of speech is retained. If not, the unit may contain part of an entity word, so it is split into a character sequence and each character is tagged with the part of speech "s";
5.2) text of step 5.1) part-of-speech tagging result and part of speech are mapped to vector by the mapping table obtained according to step 3),
Obtain the text vector of each sentence: X={ x1,x2,x3,...,xmAnd corresponding part of speech vector: P={ p1,p2,p3,...,
pm, wherein m is sentence length, xtIndicate t text unit, ptIndicate xthtPart of speech;By the text vector in each sentence,
Corresponding part of speech vector is spliced, and the input vector of model: V={ X is obtained;P }, V={ v1,v2,v3,...,vm},vt
Indicate t input vector;
5.3) the vectors of step 5.2) are input to the prediction model obtained in step 4) to predict entity labels for the Chinese electronic health record data from which named entities are to be extracted. The predicted sequence with the highest probability is taken as the final annotation result:
$y^*=\arg\max_{\tilde{y}\in Y}p(\tilde{y}\mid V)$
Here $Y$ is the set of all label sequences. For each label sequence $\tilde{y}$ in $Y$, its normalized score $p(\tilde{y}\mid V)$ under the current input $V$ is computed, and $y^*$ is the label sequence with the highest score.
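The argmax in step 5.3) is usually found with Viterbi decoding rather than by scoring every sequence in $Y$. This works because $p(y\mid V)$ has the same normalizer for every $y$, so maximizing it is the same as maximizing $s(V,y)$. The NumPy sketch below assumes the emission matrix `P` and transition matrix `A` of step 4.4) are already computed. The function name is hypothetical.

```python
import numpy as np


def viterbi_decode(P: np.ndarray, A: np.ndarray) -> list:
    """Return y* = argmax_y s(V, y), which is also argmax_y p(y | V)."""
    m, n = P.shape
    score = P[0].copy()                  # best prefix score ending in each label
    backptr = np.zeros((m, n), dtype=int)
    for i in range(1, m):
        cand = score[:, None] + A        # cand[k, j]: best prefix ending in k, then k -> j
        backptr[i] = cand.argmax(axis=0) # best previous label for each current label j
        score = cand.max(axis=0) + P[i]
    best = [int(score.argmax())]
    for i in range(m - 1, 0, -1):        # follow back-pointers from the last position
        best.append(int(backptr[i, best[-1]]))
    return best[::-1]
```

This runs in $O(mn^2)$ time instead of the $O(n^m)$ cost of enumerating all label sequences.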
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910119391.1A CN109871538A (en) | 2019-02-18 | 2019-02-18 | A kind of Chinese electronic health record name entity recognition method |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910119391.1A CN109871538A (en) | 2019-02-18 | 2019-02-18 | A kind of Chinese electronic health record name entity recognition method |
Publications (1)
Publication Number | Publication Date |
---|---|
CN109871538A (en) | 2019-06-11 |
Family
ID=66918762
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910119391.1A Pending CN109871538A (en) | 2019-02-18 | 2019-02-18 | A kind of Chinese electronic health record name entity recognition method |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN109871538A (en) |
Cited By (55)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110223742A (en) * | 2019-06-14 | 2019-09-10 | 中南大学 | The clinical manifestation information extraction method and equipment of Chinese electronic health record data |
CN110347831A (en) * | 2019-06-28 | 2019-10-18 | 西安理工大学 | Based on the sensibility classification method from attention mechanism |
CN110427493A (en) * | 2019-07-11 | 2019-11-08 | 新华三大数据技术有限公司 | Electronic health record processing method, model training method and relevant apparatus |
CN110444261A (en) * | 2019-07-11 | 2019-11-12 | 新华三大数据技术有限公司 | Sequence labelling network training method, electronic health record processing method and relevant apparatus |
CN110457682A (en) * | 2019-07-11 | 2019-11-15 | 新华三大数据技术有限公司 | Electronic health record part-of-speech tagging method, model training method and relevant apparatus |
CN110598203A (en) * | 2019-07-19 | 2019-12-20 | 中国人民解放军国防科技大学 | Military imagination document entity information extraction method and device combined with dictionary |
CN110674641A (en) * | 2019-10-06 | 2020-01-10 | 武汉鸿名科技有限公司 | GPT-2 model-based Chinese electronic medical record entity identification method |
CN110765775A (en) * | 2019-11-01 | 2020-02-07 | 北京邮电大学 | Self-adaptive method for named entity recognition field fusing semantics and label differences |
CN110837736A (en) * | 2019-11-01 | 2020-02-25 | 浙江大学 | Character structure-based named entity recognition method for Chinese medical record of iterative expansion convolutional neural network-conditional random field |
CN110866399A (en) * | 2019-10-24 | 2020-03-06 | 同济大学 | Chinese short text entity identification and disambiguation method based on enhanced character vector |
CN110866401A (en) * | 2019-11-18 | 2020-03-06 | 山东健康医疗大数据有限公司 | Chinese electronic medical record named entity identification method and system based on attention mechanism |
CN111046671A (en) * | 2019-12-12 | 2020-04-21 | 中国科学院自动化研究所 | Chinese named entity recognition method based on graph network and merged into dictionary |
CN111079418A (en) * | 2019-11-06 | 2020-04-28 | 科大讯飞股份有限公司 | Named body recognition method and device, electronic equipment and storage medium |
CN111079377A (en) * | 2019-12-03 | 2020-04-28 | 哈尔滨工程大学 | Method for recognizing named entities oriented to Chinese medical texts |
CN111145914A (en) * | 2019-12-30 | 2020-05-12 | 四川大学华西医院 | Method and device for determining lung cancer clinical disease library text entity |
CN111145718A (en) * | 2019-12-30 | 2020-05-12 | 中国科学院声学研究所 | Chinese mandarin character-voice conversion method based on self-attention mechanism |
CN111144119A (en) * | 2019-12-27 | 2020-05-12 | 北京联合大学 | Entity identification method for improving knowledge migration |
CN111222340A (en) * | 2020-01-15 | 2020-06-02 | 东华大学 | Breast electronic medical record entity recognition system based on multi-standard active learning |
CN111243699A (en) * | 2020-01-14 | 2020-06-05 | 中南大学 | Chinese electronic medical record entity extraction method based on word information fusion |
CN111274788A (en) * | 2020-01-16 | 2020-06-12 | 创新工场(广州)人工智能研究有限公司 | Dual-channel joint processing method and device |
CN111312354A (en) * | 2020-02-10 | 2020-06-19 | 东华大学 | Breast medical record entity identification and annotation enhancement system based on multi-agent reinforcement learning |
CN111428036A (en) * | 2020-03-23 | 2020-07-17 | 浙江大学 | Entity relationship mining method based on biomedical literature |
CN111444720A (en) * | 2020-03-30 | 2020-07-24 | 华南理工大学 | Named entity recognition method for English text |
CN111523320A (en) * | 2020-04-20 | 2020-08-11 | 电子科技大学 | Chinese medical record word segmentation method based on deep learning |
CN111581972A (en) * | 2020-03-27 | 2020-08-25 | 平安科技(深圳)有限公司 | Method, device, equipment and medium for identifying corresponding relation between symptom and part in text |
CN111581974A (en) * | 2020-04-27 | 2020-08-25 | 天津大学 | Biomedical entity identification method based on deep learning |
CN111651991A (en) * | 2020-04-15 | 2020-09-11 | 天津科技大学 | Medical named entity identification method utilizing multi-model fusion strategy |
CN111666754A (en) * | 2020-05-28 | 2020-09-15 | 平安医疗健康管理股份有限公司 | Entity identification method and system based on electronic disease text and computer equipment |
CN111680512A (en) * | 2020-05-11 | 2020-09-18 | 上海阿尔卡特网络支援系统有限公司 | Named entity recognition model, telephone exchange switching extension method and system |
CN111724897A (en) * | 2020-06-12 | 2020-09-29 | 电子科技大学 | Motion function data processing method and system |
CN111738006A (en) * | 2020-06-22 | 2020-10-02 | 苏州大学 | Commodity comment named entity recognition-based problem generation method |
CN111950287A (en) * | 2020-08-20 | 2020-11-17 | 广东工业大学 | Text-based entity identification method and related device |
CN111950283A (en) * | 2020-07-31 | 2020-11-17 | 合肥工业大学 | Chinese word segmentation and named entity recognition system for large-scale medical text mining |
CN112001177A (en) * | 2020-08-24 | 2020-11-27 | 浪潮云信息技术股份公司 | Electronic medical record named entity identification method and system integrating deep learning and rules |
CN112149420A (en) * | 2020-09-01 | 2020-12-29 | 中国科学院信息工程研究所 | Entity recognition model training method, threat information entity extraction method and device |
CN112183099A (en) * | 2020-10-09 | 2021-01-05 | 上海明略人工智能(集团)有限公司 | Named entity identification method and system based on semi-supervised small sample extension |
CN112329459A (en) * | 2020-06-09 | 2021-02-05 | 北京沃东天骏信息技术有限公司 | Text labeling method and neural network model construction method |
CN112836046A (en) * | 2021-01-13 | 2021-05-25 | 哈尔滨工程大学 | Four-risk one-gold-field policy and regulation text entity identification method |
CN112861533A (en) * | 2019-11-26 | 2021-05-28 | 阿里巴巴集团控股有限公司 | Entity word recognition method and device |
CN112927806A (en) * | 2019-12-05 | 2021-06-08 | 金色熊猫有限公司 | Medical record structured network cross-disease migration training method, device, medium and equipment |
CN113033192A (en) * | 2019-12-09 | 2021-06-25 | 株式会社理光 | Training method and device for sequence labels and computer readable storage medium |
CN113051905A (en) * | 2019-12-28 | 2021-06-29 | 中移(成都)信息通信科技有限公司 | Medical named entity recognition training model and medical named entity recognition method |
CN113076751A (en) * | 2021-02-26 | 2021-07-06 | 北京工业大学 | Named entity recognition method and system, electronic device and storage medium |
WO2021139247A1 (en) * | 2020-08-06 | 2021-07-15 | 平安科技(深圳)有限公司 | Construction method, apparatus and device for medical domain knowledge map, and storage medium |
WO2021139239A1 (en) * | 2020-07-28 | 2021-07-15 | 平安科技(深圳)有限公司 | Mechanism entity extraction method, system and device based on multiple training targets |
CN113177416A (en) * | 2021-05-17 | 2021-07-27 | 同济大学 | Event element detection method combining sequence labeling and pattern matching |
CN113496120A (en) * | 2020-03-19 | 2021-10-12 | 复旦大学 | Domain entity extraction method, computer device, computer readable medium and processor |
CN113743116A (en) * | 2020-05-28 | 2021-12-03 | 株式会社理光 | Training method and device for named entity recognition and computer readable storage medium |
CN113779992A (en) * | 2021-07-19 | 2021-12-10 | 西安理工大学 | Method for realizing BcBERT-SW-BilSTM-CRF model based on vocabulary enhancement and pre-training |
CN113807094A (en) * | 2020-06-11 | 2021-12-17 | 株式会社理光 | Entity identification method, device and computer readable storage medium |
CN114328485A (en) * | 2021-12-23 | 2022-04-12 | 中国科学院沈阳计算技术研究所有限公司 | Electronic medical record named entity identification method for improving BilSTM-CRF |
CN114970536A (en) * | 2022-06-22 | 2022-08-30 | 昆明理工大学 | Combined lexical analysis method for word segmentation, part of speech tagging and named entity recognition |
CN115146628A (en) * | 2021-11-21 | 2022-10-04 | 北京中科凡语科技有限公司 | Method and device for determining real boundary of marked entity and electronic equipment |
WO2022242074A1 (en) * | 2021-05-21 | 2022-11-24 | 山东省人工智能研究院 | Multi-feature fusion-based method for named entity recognition in chinese medical text |
CN116227483A (en) * | 2023-02-10 | 2023-06-06 | 南京南瑞信息通信科技有限公司 | Word boundary-based Chinese entity extraction method, device and storage medium |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20130297634A1 (en) * | 2012-05-07 | 2013-11-07 | Sap Ag | Entity Name Variant Generator |
CN107797992A (en) * | 2017-11-10 | 2018-03-13 | 北京百分点信息科技有限公司 | Name entity recognition method and device |
CN109062893A (en) * | 2018-07-13 | 2018-12-21 | 华南理工大学 | A kind of product name recognition methods based on full text attention mechanism |
- 2019-02-18 CN CN201910119391.1A patent/CN109871538A/en active Pending
Non-Patent Citations (1)
Title |
---|
XIAOLING CAI et al.: "A deep learning model incorporating part of speech and self-matching attention for named entity recognition of Chinese electronic medical records", 4th China Health Information Processing Conference * |
Cited By (77)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110223742A (en) * | 2019-06-14 | 2019-09-10 | 中南大学 | The clinical manifestation information extraction method and equipment of Chinese electronic health record data |
CN110347831A (en) * | 2019-06-28 | 2019-10-18 | 西安理工大学 | Based on the sensibility classification method from attention mechanism |
CN110457682B (en) * | 2019-07-11 | 2022-08-09 | 新华三大数据技术有限公司 | Part-of-speech tagging method for electronic medical record, model training method and related device |
CN110427493A (en) * | 2019-07-11 | 2019-11-08 | 新华三大数据技术有限公司 | Electronic health record processing method, model training method and relevant apparatus |
CN110444261A (en) * | 2019-07-11 | 2019-11-12 | 新华三大数据技术有限公司 | Sequence labelling network training method, electronic health record processing method and relevant apparatus |
CN110457682A (en) * | 2019-07-11 | 2019-11-15 | 新华三大数据技术有限公司 | Electronic health record part-of-speech tagging method, model training method and relevant apparatus |
CN110427493B (en) * | 2019-07-11 | 2022-04-08 | 新华三大数据技术有限公司 | Electronic medical record processing method, model training method and related device |
CN110598203A (en) * | 2019-07-19 | 2019-12-20 | 中国人民解放军国防科技大学 | Military imagination document entity information extraction method and device combined with dictionary |
CN110674641A (en) * | 2019-10-06 | 2020-01-10 | 武汉鸿名科技有限公司 | GPT-2 model-based Chinese electronic medical record entity identification method |
CN110674641B (en) * | 2019-10-06 | 2024-02-02 | 湖北大学 | Chinese electronic medical record entity identification method based on GPT-2 model |
CN110866399A (en) * | 2019-10-24 | 2020-03-06 | 同济大学 | Chinese short text entity identification and disambiguation method based on enhanced character vector |
CN110866399B (en) * | 2019-10-24 | 2023-05-02 | 同济大学 | Chinese short text entity recognition and disambiguation method based on enhanced character vector |
CN110837736A (en) * | 2019-11-01 | 2020-02-25 | 浙江大学 | Character structure-based named entity recognition method for Chinese medical record of iterative expansion convolutional neural network-conditional random field |
CN110765775B (en) * | 2019-11-01 | 2020-08-04 | 北京邮电大学 | Self-adaptive method for named entity recognition field fusing semantics and label differences |
CN110765775A (en) * | 2019-11-01 | 2020-02-07 | 北京邮电大学 | Self-adaptive method for named entity recognition field fusing semantics and label differences |
CN111079418A (en) * | 2019-11-06 | 2020-04-28 | 科大讯飞股份有限公司 | Named body recognition method and device, electronic equipment and storage medium |
CN111079418B (en) * | 2019-11-06 | 2023-12-05 | 科大讯飞股份有限公司 | Named entity recognition method, device, electronic equipment and storage medium |
CN110866401A (en) * | 2019-11-18 | 2020-03-06 | 山东健康医疗大数据有限公司 | Chinese electronic medical record named entity identification method and system based on attention mechanism |
CN112861533A (en) * | 2019-11-26 | 2021-05-28 | 阿里巴巴集团控股有限公司 | Entity word recognition method and device |
CN111079377B (en) * | 2019-12-03 | 2022-12-13 | 哈尔滨工程大学 | Method for recognizing named entities of Chinese medical texts |
CN111079377A (en) * | 2019-12-03 | 2020-04-28 | 哈尔滨工程大学 | Method for recognizing named entities oriented to Chinese medical texts |
CN112927806A (en) * | 2019-12-05 | 2021-06-08 | 金色熊猫有限公司 | Medical record structured network cross-disease migration training method, device, medium and equipment |
CN112927806B (en) * | 2019-12-05 | 2022-11-25 | 金色熊猫有限公司 | Medical record structured network cross-disease migration training method, device, medium and equipment |
CN113033192B (en) * | 2019-12-09 | 2024-04-26 | 株式会社理光 | Training method and device for sequence annotation and computer readable storage medium |
CN113033192A (en) * | 2019-12-09 | 2021-06-25 | 株式会社理光 | Training method and device for sequence labels and computer readable storage medium |
CN111046671A (en) * | 2019-12-12 | 2020-04-21 | 中国科学院自动化研究所 | Chinese named entity recognition method based on graph network and merged into dictionary |
CN111144119B (en) * | 2019-12-27 | 2024-03-29 | 北京联合大学 | Entity identification method for improving knowledge migration |
CN111144119A (en) * | 2019-12-27 | 2020-05-12 | 北京联合大学 | Entity identification method for improving knowledge migration |
CN113051905A (en) * | 2019-12-28 | 2021-06-29 | 中移(成都)信息通信科技有限公司 | Medical named entity recognition training model and medical named entity recognition method |
CN111145914B (en) * | 2019-12-30 | 2023-08-04 | 四川大学华西医院 | Method and device for determining text entity of lung cancer clinical disease seed bank |
CN111145718A (en) * | 2019-12-30 | 2020-05-12 | 中国科学院声学研究所 | Chinese mandarin character-voice conversion method based on self-attention mechanism |
CN111145914A (en) * | 2019-12-30 | 2020-05-12 | 四川大学华西医院 | Method and device for determining lung cancer clinical disease library text entity |
CN111243699A (en) * | 2020-01-14 | 2020-06-05 | 中南大学 | Chinese electronic medical record entity extraction method based on word information fusion |
CN111222340A (en) * | 2020-01-15 | 2020-06-02 | 东华大学 | Breast electronic medical record entity recognition system based on multi-standard active learning |
CN111274788A (en) * | 2020-01-16 | 2020-06-12 | 创新工场(广州)人工智能研究有限公司 | Dual-channel joint processing method and device |
CN111312354B (en) * | 2020-02-10 | 2023-10-24 | 东华大学 | Mammary gland medical record entity identification marking enhancement system based on multi-agent reinforcement learning |
CN111312354A (en) * | 2020-02-10 | 2020-06-19 | 东华大学 | Breast medical record entity identification and annotation enhancement system based on multi-agent reinforcement learning |
CN113496120B (en) * | 2020-03-19 | 2022-07-29 | 复旦大学 | Domain entity extraction method, computer device, computer readable medium and processor |
CN113496120A (en) * | 2020-03-19 | 2021-10-12 | 复旦大学 | Domain entity extraction method, computer device, computer readable medium and processor |
CN111428036B (en) * | 2020-03-23 | 2022-05-27 | 浙江大学 | Entity relationship mining method based on biomedical literature |
CN111428036A (en) * | 2020-03-23 | 2020-07-17 | 浙江大学 | Entity relationship mining method based on biomedical literature |
WO2021190236A1 (en) * | 2020-03-23 | 2021-09-30 | 浙江大学 | Entity relation mining method based on biomedical literature |
CN111581972A (en) * | 2020-03-27 | 2020-08-25 | 平安科技(深圳)有限公司 | Method, device, equipment and medium for identifying corresponding relation between symptom and part in text |
CN111444720A (en) * | 2020-03-30 | 2020-07-24 | 华南理工大学 | Named entity recognition method for English text |
CN111651991A (en) * | 2020-04-15 | 2020-09-11 | 天津科技大学 | Medical named entity identification method utilizing multi-model fusion strategy |
CN111651991B (en) * | 2020-04-15 | 2022-08-26 | 天津科技大学 | Medical named entity identification method utilizing multi-model fusion strategy |
CN111523320A (en) * | 2020-04-20 | 2020-08-11 | 电子科技大学 | Chinese medical record word segmentation method based on deep learning |
CN111581974A (en) * | 2020-04-27 | 2020-08-25 | 天津大学 | Biomedical entity identification method based on deep learning |
CN111680512B (en) * | 2020-05-11 | 2024-04-02 | 上海阿尔卡特网络支援系统有限公司 | Named entity recognition model, telephone exchange extension switching method and system |
CN111680512A (en) * | 2020-05-11 | 2020-09-18 | 上海阿尔卡特网络支援系统有限公司 | Named entity recognition model, telephone exchange switching extension method and system |
CN111666754A (en) * | 2020-05-28 | 2020-09-15 | 平安医疗健康管理股份有限公司 | Entity identification method and system based on electronic disease text and computer equipment |
CN113743116A (en) * | 2020-05-28 | 2021-12-03 | 株式会社理光 | Training method and device for named entity recognition and computer readable storage medium |
CN111666754B (en) * | 2020-05-28 | 2023-02-03 | 深圳平安医疗健康科技服务有限公司 | Entity identification method and system based on electronic disease text and computer equipment |
CN112329459A (en) * | 2020-06-09 | 2021-02-05 | 北京沃东天骏信息技术有限公司 | Text labeling method and neural network model construction method |
CN113807094B (en) * | 2020-06-11 | 2024-03-19 | 株式会社理光 | Entity recognition method, entity recognition device and computer readable storage medium |
CN113807094A (en) * | 2020-06-11 | 2021-12-17 | 株式会社理光 | Entity identification method, device and computer readable storage medium |
CN111724897B (en) * | 2020-06-12 | 2022-07-01 | 电子科技大学 | Motion function data processing method and system |
CN111724897A (en) * | 2020-06-12 | 2020-09-29 | 电子科技大学 | Motion function data processing method and system |
CN111738006A (en) * | 2020-06-22 | 2020-10-02 | 苏州大学 | Commodity comment named entity recognition-based problem generation method |
WO2021139239A1 (en) * | 2020-07-28 | 2021-07-15 | 平安科技(深圳)有限公司 | Mechanism entity extraction method, system and device based on multiple training targets |
CN111950283A (en) * | 2020-07-31 | 2020-11-17 | 合肥工业大学 | Chinese word segmentation and named entity recognition system for large-scale medical text mining |
WO2021139247A1 (en) * | 2020-08-06 | 2021-07-15 | 平安科技(深圳)有限公司 | Construction method, apparatus and device for medical domain knowledge map, and storage medium |
CN111950287A (en) * | 2020-08-20 | 2020-11-17 | 广东工业大学 | Text-based entity identification method and related device |
CN111950287B (en) * | 2020-08-20 | 2024-04-23 | 广东工业大学 | Entity identification method based on text and related device |
CN112001177B (en) * | 2020-08-24 | 2024-08-13 | 浪潮云信息技术股份公司 | Electronic medical record named entity recognition method and system integrating deep learning and rules |
CN112001177A (en) * | 2020-08-24 | 2020-11-27 | 浪潮云信息技术股份公司 | Electronic medical record named entity identification method and system integrating deep learning and rules |
CN112149420A (en) * | 2020-09-01 | 2020-12-29 | 中国科学院信息工程研究所 | Entity recognition model training method, threat information entity extraction method and device |
CN112183099A (en) * | 2020-10-09 | 2021-01-05 | 上海明略人工智能(集团)有限公司 | Named entity identification method and system based on semi-supervised small sample extension |
CN112836046A (en) * | 2021-01-13 | 2021-05-25 | 哈尔滨工程大学 | Four-risk one-gold-field policy and regulation text entity identification method |
CN113076751A (en) * | 2021-02-26 | 2021-07-06 | 北京工业大学 | Named entity recognition method and system, electronic device and storage medium |
CN113177416A (en) * | 2021-05-17 | 2021-07-27 | 同济大学 | Event element detection method combining sequence labeling and pattern matching |
WO2022242074A1 (en) * | 2021-05-21 | 2022-11-24 | 山东省人工智能研究院 | Multi-feature fusion-based method for named entity recognition in chinese medical text |
CN113779992A (en) * | 2021-07-19 | 2021-12-10 | 西安理工大学 | Method for realizing BcBERT-SW-BilSTM-CRF model based on vocabulary enhancement and pre-training |
CN115146628A (en) * | 2021-11-21 | 2022-10-04 | 北京中科凡语科技有限公司 | Method and device for determining real boundary of marked entity and electronic equipment |
CN114328485A (en) * | 2021-12-23 | 2022-04-12 | 中国科学院沈阳计算技术研究所有限公司 | Electronic medical record named entity identification method for improving BilSTM-CRF |
CN114970536A (en) * | 2022-06-22 | 2022-08-30 | 昆明理工大学 | Combined lexical analysis method for word segmentation, part of speech tagging and named entity recognition |
CN116227483A (en) * | 2023-02-10 | 2023-06-06 | 南京南瑞信息通信科技有限公司 | Word boundary-based Chinese entity extraction method, device and storage medium |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN109871538A (en) | A kind of Chinese electronic health record name entity recognition method | |
CN107977361B (en) | Chinese clinical medical entity identification method based on deep semantic information representation | |
CN111444726B (en) | Chinese semantic information extraction method and device based on long-short-term memory network of bidirectional lattice structure | |
WO2021139424A1 (en) | Text content quality evaluation method, apparatus and device, and storage medium | |
CN110298037B (en) | Convolutional neural network matching text recognition method based on enhanced attention mechanism | |
CN109657239B (en) | Chinese named entity recognition method based on attention mechanism and language model learning | |
CN106980683B (en) | Blog text abstract generating method based on deep learning | |
CN112002411A (en) | Cardiovascular and cerebrovascular disease knowledge map question-answering method based on electronic medical record | |
CN109543181B (en) | Named entity model and system based on combination of active learning and deep learning | |
CN110297908A (en) | Diagnosis and treatment program prediction method and device | |
CN117076653B (en) | Knowledge base question-answering method based on thinking chain and visual lifting context learning | |
CN110263325B (en) | Chinese word segmentation system | |
CN107748757A (en) | A kind of answering method of knowledge based collection of illustrative plates | |
CN108829719A (en) | The non-true class quiz answers selection method of one kind and system | |
CN111914556B (en) | Emotion guiding method and system based on emotion semantic transfer pattern | |
CN113724882B (en) | Method, device, equipment and medium for constructing user portrait based on inquiry session | |
CN111400455A (en) | Relation detection method of question-answering system based on knowledge graph | |
CN110096572B (en) | Sample generation method, device and computer readable medium | |
CN111159345B (en) | Chinese knowledge base answer acquisition method and device | |
WO2021082086A1 (en) | Machine reading method, system, device, and storage medium | |
CN111145914B (en) | Method and device for determining text entity of lung cancer clinical disease seed bank | |
CN113761890A (en) | BERT context sensing-based multi-level semantic information retrieval method | |
CN115310448A (en) | Chinese named entity recognition method based on combining bert and word vector | |
CN113657105A (en) | Medical entity extraction method, device, equipment and medium based on vocabulary enhancement | |
CN116341546A (en) | Medical natural language processing method based on pre-training model |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| PB01 | Publication | |
| SE01 | Entry into force of request for substantive examination | |
| WD01 | Invention patent application deemed withdrawn after publication | Application publication date: 20190611 |