CN110287494A - Short-text similarity matching method based on the deep-learning BERT algorithm - Google Patents
Short-text similarity matching method based on the deep-learning BERT algorithm
- Publication number
- CN110287494A CN110287494A CN201910583223.8A CN201910583223A CN110287494A CN 110287494 A CN110287494 A CN 110287494A CN 201910583223 A CN201910583223 A CN 201910583223A CN 110287494 A CN110287494 A CN 110287494A
- Authority
- CN
- China
- Prior art keywords
- short text
- text
- word
- bert
- similarity matching
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G06F18/22 — Pattern recognition; analysing; matching criteria, e.g. proximity measures
- G06F18/241 — Pattern recognition; classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
- G06F40/289 — Handling natural language data; phrasal analysis, e.g. finite state techniques or chunking
- G06F40/30 — Handling natural language data; semantic analysis
- G06N3/045 — Neural networks; combinations of networks
- G06N3/088 — Neural-network learning methods; non-supervised learning, e.g. competitive learning
Abstract
The invention discloses a short-text similarity matching method based on the deep-learning BERT algorithm, belonging to the field of artificial intelligence. The method is implemented as follows: 1) perform BERT training using a common data set together with short texts to obtain a trained BERT model; 2) apply word segmentation to the short texts to be matched; 3) input the segmented short texts from step 2) into the BERT model obtained in step 1) to obtain the feature vectors of the short texts; 4) obtain the matching short text using the cosine similarity algorithm. The invention performs short-text similarity matching with a pre-trained BERT model and performs better than previous text similarity matching methods.
Description
Technical field
The present invention relates to the field of artificial intelligence, and specifically to a short-text similarity matching method based on the deep-learning BERT algorithm.
Background art
Natural language processing frequently requires measuring the similarity between two short texts. Text is a high-dimensional semantic space; by decomposing it abstractly, we can quantify its similarity from a mathematical standpoint. Given a similarity metric between texts, we can cluster them with partitioning methods such as K-means, density-based methods such as DBSCAN, or model-based probabilistic methods. We can also use inter-text similarity to deduplicate a large-scale corpus, or to find related names of a given entity (fuzzy matching). Many methods exist for measuring the similarity of two strings: using a hash code directly, using classic topic models, or representing texts as word-embedding vectors and measuring the Euclidean distance or Pearson distance between the feature vectors. With the rapid development of artificial intelligence, new algorithms and models keep emerging to realize deep-learning computation better and more efficiently. Short-text similarity matching plays an important role in text analysis and corpus processing, so improving its efficiency under this fast-moving computing environment is of great significance.
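The vector-space baselines mentioned above (Euclidean distance and Pearson distance between feature vectors) can be sketched as follows; this is an illustrative toy example with made-up vectors, not part of the claimed method:

```python
import math

def euclidean_distance(u, v):
    # Straight-line distance between two feature vectors.
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def pearson_correlation(u, v):
    # Linear correlation between the components of the two vectors.
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    su = math.sqrt(sum((a - mu) ** 2 for a in u))
    sv = math.sqrt(sum((b - mv) ** 2 for b in v))
    return cov / (su * sv)

u = [1.0, 2.0, 3.0]
v = [2.0, 4.0, 6.0]
print(euclidean_distance(u, v))   # sqrt(14)
print(pearson_correlation(u, v))  # ≈ 1.0 (perfectly correlated)
```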
Summary of the invention
The technical task of the invention is to address the above deficiency by providing a short-text similarity matching method based on the deep-learning BERT algorithm that performs similarity matching on short texts with a pre-trained model and achieves a better application effect.
The technical solution adopted by the present invention to solve the technical problem is as follows:
A short-text similarity matching method based on the deep-learning BERT algorithm, characterized in that the method is implemented as follows:
1) perform BERT training using a common data set together with short texts to obtain a trained BERT model;
2) apply word segmentation to the short texts to be matched;
3) input the segmented short texts from step 2) into the BERT model obtained in step 1) to obtain the feature vectors of the short texts;
4) obtain the matching short text using the cosine similarity algorithm.
BERT is a pre-trained language-representation method: a general-purpose "language understanding" model is trained on a large text corpus (such as Wikipedia), and that model is then used to perform the desired NLP task. BERT performs better than earlier methods because it is the first unsupervised, deeply bidirectional system for pre-training NLP. We therefore use a pre-trained BERT model for short-text similarity matching, which performs better.
BERT randomly masks words of the input sequence, and the objective is to predict the masked words from their context (unlike a traditional left-to-right language model, the masked model can use both the left and right context of the masked word simultaneously). Using the left and right context of the masked word simultaneously is realized by the bidirectional Transformer encoder, which removes the limitation that a language model is unidirectional. In addition, the BERT model introduces a "next-sentence prediction" task to jointly learn the pre-trained representations.
Specifically, the BERT training with a common data set and short texts includes training on the relationship between words within a sentence and training on the relationship between sentences.
The training method for the relationship between words within a sentence is:
Randomly mask part of the words as training samples; of these, 80% are replaced with the [MASK] token, 10% are replaced with a random word, and 10% are kept unchanged.
Preferably, the masked part of the words is 15%, i.e. 15% of the words are randomly masked as training samples; the model then needs more training steps to converge.
Further, the training method for the relationship between sentences is:
Pre-train a binary classification model with a 1:1 ratio of positive to negative samples; a positive sample is a given sentence pair A and B in which B is the actual next sentence of A, and a negative sample uses a sentence randomly selected from the corpus as B.
Preferably, the word segmentation of the short texts to be matched includes removing stop words and removing links.
Specifically, a segmentation tool segments the short texts using a statistics-based segmentation method: while performing dictionary-based string-matching segmentation of the short text, a hidden Markov model is used to recognize new words, and the short text is split accordingly.
Specifically, the segmented short text is input into the trained BERT model. By looking up the word-vector table, the BERT model converts each word of the sentence into a one-dimensional vector as the model input; the model output is, for each input word, a vector representation fused with the full-text semantic information.
Further, the model input also includes a text vector and a position vector:
Text vector: the values of this vector are learned automatically during model training; it captures the global semantic information of the text and is blended with the semantic information of the individual characters/words.
Position vector: since the semantic information carried by characters/words differs with their position in the text, the BERT model adds a different vector to the characters/words at different positions to distinguish them.
The BERT model takes the sum of the word vector, text vector, and position vector as the model input, so the text vectors output by the model, converted from the character/word vectors, contain more accurate semantic information.
Preferably, the matching short text is obtained with the cosine similarity algorithm: the cosine similarity between the target short text and each of the other short texts is computed, and the text with the highest similarity is taken as the similarity match for the target text.
Compared with the prior art, the short-text similarity matching method based on the deep-learning BERT algorithm of the invention has the following advantages:
Earlier approaches measure the similarity of two strings directly with a hash code, with a classic topic model, or by abstracting texts into word-vector representations and measuring the Euclidean or Pearson distance between the feature vectors. BERT performs outstandingly by comparison because it is the first unsupervised, deeply bidirectional system for pre-training NLP. Performing short-text similarity matching with a pre-trained BERT model therefore performs better: it can greatly raise matching accuracy, improve the efficiency of short-text similarity matching, and better realize text analysis and corpus processing in natural language processing.
Brief description of the drawings
Fig. 1 is a flow diagram of the short-text similarity matching method based on the deep-learning BERT algorithm of the invention;
Fig. 2 is an example diagram of the relationship training of the BERT algorithm in the embodiment.
Specific embodiment
The present invention is further described below with reference to specific embodiments.
A short-text similarity matching method based on the deep-learning BERT algorithm, characterized in that the method is implemented as follows:
1) perform BERT training using a common data set together with short texts to obtain a trained BERT model;
2) apply word segmentation to the short texts to be matched;
3) input the segmented short texts from step 2) into the BERT model obtained in step 1) to obtain the feature vectors of the short texts;
4) obtain the matching short text using the cosine similarity algorithm.
BERT is a pre-trained language-representation method: a general-purpose "language understanding" model is trained on a large text corpus (such as Wikipedia), and that model is then used to perform the desired NLP task. BERT performs better than earlier methods because it is the first unsupervised, deeply bidirectional system for pre-training NLP. We therefore use a pre-trained BERT model for short-text similarity matching, which performs better.
BERT randomly masks words of the input sequence, and the objective is to predict the masked words from their context (unlike a traditional left-to-right language model, the masked model can use both the left and right context of the masked word simultaneously). Using the left and right context of the masked word simultaneously is realized by the bidirectional Transformer encoder, which removes the limitation that a language model is unidirectional. In addition, the BERT model introduces a "next-sentence prediction" task to jointly learn the pre-trained representations.
The BERT training with a common data set and short texts includes training on the relationship between words within a sentence and training on the relationship between sentences.
The training method for the relationship between words within a sentence is:
Randomly mask 15% of the words as training samples; of these, 80% are replaced with the [MASK] token, 10% are replaced with a random word, and 10% are kept unchanged.
Take the sentence "Wu Zetian was the first empress of China." as an example, as shown in Fig. 2:
The word "Zetian" is chosen and masked as the training sample.
For the masked word "Zetian", 80% of the time it is replaced with the [MASK] token:
"Wu [MASK] was the first empress of China.";
10% of the time the word is kept unchanged:
"Wu Zetian was the first empress of China.";
10% of the time it is replaced with a random word:
"Wu Zongtian was the first empress of China.".
The model then predicts the word at the position of "Zetian"; in the example of Fig. 2 the predicted probability is 0.999 for "Zetian" and 0.0 for every other candidate word.
Only 80% of the selected positions are replaced with the [MASK] token because, if 100% were replaced with [MASK], the model might encounter words it has never seen during fine-tuning; therefore 10% of the selected words are kept unchanged. A further 10% are replaced with a random word so that the Transformer is forced to keep a distributed representation of every input token; otherwise the Transformer might simply memorize that this [MASK] is "Zetian".
(The rest of the training proceeds in the same way; the model is brought to convergence by repeated training.)
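The 15% / 80-10-10 masking procedure just described can be sketched as follows; the vocabulary and tokenization here are simplified assumptions for illustration, not the embodiment's actual implementation:

```python
import random

MASK = "[MASK]"

def mask_tokens(tokens, vocab, mask_rate=0.15, rng=None):
    """Randomly pick ~15% of positions as training targets, then replace
    80% of them with [MASK], 10% with a random word, and keep 10% unchanged."""
    rng = rng or random.Random()
    tokens = list(tokens)
    targets = {}  # position -> original word the model must predict
    for i, tok in enumerate(tokens):
        if rng.random() >= mask_rate:
            continue  # position not selected for prediction
        targets[i] = tok
        r = rng.random()
        if r < 0.8:
            tokens[i] = MASK               # 80%: replace with [MASK]
        elif r < 0.9:
            tokens[i] = rng.choice(vocab)  # 10%: replace with a random word
        # remaining 10%: keep the original word unchanged
    return tokens, targets

sentence = ["Wu", "Zetian", "was", "the", "first", "empress", "of", "China"]
masked, targets = mask_tokens(sentence, sentence, rng=random.Random(0))
print(masked, targets)
```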
The training method for the relationship between sentences is:
Pre-train a binary classification model with a 1:1 ratio of positive to negative samples; a positive sample is a given sentence pair A and B in which B is the actual next sentence of A, and a negative sample uses a sentence randomly selected from the corpus as B.
Many important downstream tasks, such as question answering (QA) and natural language inference (NLI), depend on understanding the relationship between two sentences, and this relationship is not captured directly by language modeling. To train a model that understands sentence relationships, a binary next-sentence prediction task is pre-trained whose samples can easily be generated from any monolingual corpus. Specifically, sentences A and B are selected as a pre-training sample: with 50% probability B is the actual next sentence of A, and with 50% probability it is a random sentence from the corpus.
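Generating the 1:1 positive/negative samples for this next-sentence prediction task can be sketched as follows (an illustrative assumption; any monolingual corpus, represented here as a list of sentence lists, would do):

```python
import random

def make_nsp_samples(documents, rng=None):
    """For each adjacent sentence pair (A, B) in a document, emit (A, B, 1)
    with probability 0.5 (B really follows A) and (A, random sentence, 0)
    otherwise, giving a 1:1 positive/negative ratio on average."""
    rng = rng or random.Random()
    all_sentences = [s for doc in documents for s in doc]
    samples = []
    for doc in documents:
        for a, b in zip(doc, doc[1:]):
            if rng.random() < 0.5:
                samples.append((a, b, 1))  # positive: actual next sentence
            else:
                samples.append((a, rng.choice(all_sentences), 0))  # negative: random sentence
    return samples

docs = [["s1", "s2", "s3"], ["t1", "t2"]]
print(make_nsp_samples(docs, rng=random.Random(1)))
```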
The word segmentation of the short texts to be matched includes removing stop words and removing links.
Word segmentation is applied to the short texts to be matched: a segmentation tool segments the short text using a statistics-based segmentation method; while performing dictionary-based string-matching segmentation of the short text, a hidden Markov model is used to recognize new words, and the short text is split accordingly.
For example, the text library consists of paired texts (in the format "id", "text-a", "text-b"), where text-a is an approximate short text such as "What is private health insurance?" or "Private health insurance is what?", and text-b is the corresponding answer text; in this example, text-b is the text that tells the user what private health insurance is. The "jieba" tokenizer is used to segment every sample in the text library. During segmentation, stop words without specific meaning, such as punctuation marks, must first be removed to guarantee segmentation quality and speed; segmentation is then performed against the user dictionary to ensure that proper nouns can be extracted. In the example above, "private health insurance" should be treated as one proper noun rather than being split into the three words "private", "health", "insurance". Thus "What is private health insurance?" is converted to "What" "is" "private health insurance".
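The dictionary string-matching part of this segmentation step can be illustrated with a simplified forward maximum-matching segmenter; this is a sketch under the assumption of a tiny hand-made user dictionary, whereas the embodiment itself uses the jieba tool together with a hidden Markov model for new words:

```python
def forward_max_match(text, user_dict, max_len=6):
    """Greedy left-to-right segmentation: at each position take the longest
    word found in the user dictionary, falling back to a single character."""
    words, i = [], 0
    while i < len(text):
        for j in range(min(len(text), i + max_len), i, -1):
            if text[i:j] in user_dict or j == i + 1:
                words.append(text[i:j])
                i = j
                break
    return words

# Hypothetical user dictionary that keeps the proper noun "私人健康险"
# ("private health insurance") whole instead of splitting it.
user_dict = {"私人", "健康险", "私人健康险", "是", "什么"}
print(forward_max_match("什么是私人健康险", user_dict))  # ['什么', '是', '私人健康险']
```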
The segmented short text is input into the trained BERT model to obtain the feature vector of the short text:
By looking up the word-vector table, the BERT model converts each word of the sentence into a one-dimensional vector as the model input; the model output is, for each input word, a vector representation fused with the full-text semantic information.
In addition to the word vectors, the model input includes two other parts: 1) the text vector, whose values are learned automatically during model training, captures the global semantic information of the text and is blended with the semantic information of the individual characters/words; 2) the position vector: since the semantic information carried by a character/word differs with its position in the text (e.g. "I love you" versus "you love me"), the BERT model adds a different vector to the characters/words at different positions to distinguish them.
The BERT model takes the sum of the word vector, text vector, and position vector as the model input, so the text vectors output by the model, converted from the character/word vectors, contain more accurate semantic information.
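The element-wise sum of word, text, and position vectors described above can be sketched with toy two-dimensional embeddings (made-up numbers; real BERT embedding tables are learned and much larger):

```python
def bert_input_embeddings(token_ids, word_table, text_vector, position_table):
    """Model input = word vector + text vector + position vector,
    summed element-wise at every token position."""
    embeddings = []
    for pos, tid in enumerate(token_ids):
        w = word_table[tid]      # looked up from the word-vector table
        p = position_table[pos]  # distinguishes "I love you" from "you love me"
        embeddings.append([a + b + c for a, b, c in zip(w, text_vector, p)])
    return embeddings

word_table = {0: [0.1, 0.2], 1: [0.3, 0.4]}   # per-word vectors
text_vector = [0.01, 0.01]                    # global text vector (learned in training)
position_table = [[0.0, 0.0], [1.0, 1.0]]     # one vector per position
print(bert_input_embeddings([0, 1], word_table, text_vector, position_table))
```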
The matching short text is obtained with the cosine similarity algorithm: the cosine similarity between the target short text and each of the other short texts is computed, and the text with the highest similarity is taken as the similarity match for the target text.
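This final matching step can be sketched as follows; the feature vectors here are toy stand-ins for the BERT output vectors:

```python
import math

def cosine_similarity(u, v):
    # cos(u, v) = u·v / (|u| |v|)
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def best_match(target_vec, candidates):
    """Return the candidate text whose feature vector has the highest
    cosine similarity with the target vector."""
    return max(candidates, key=lambda item: cosine_similarity(target_vec, item[1]))[0]

target = [1.0, 0.0, 1.0]
candidates = [
    ("text-1", [1.0, 0.1, 0.9]),  # nearly parallel to the target
    ("text-2", [0.0, 1.0, 0.0]),  # orthogonal to the target
]
print(best_match(target, candidates))  # -> text-1
```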
Those skilled in the art can readily implement the present invention from the above specific embodiments. However, it should be understood that the present invention is not limited to the above specific embodiments. On the basis of the disclosed embodiments, those skilled in the art can combine different technical features arbitrarily to realize different technical solutions.
Except for the technical features described in the specification, everything is known to those skilled in the art.
Claims (10)
1. A short-text similarity matching method based on the deep-learning BERT algorithm, characterized in that the method is implemented as follows:
1) perform BERT training using a common data set together with short texts to obtain a trained BERT model;
2) apply word segmentation to the short texts to be matched;
3) input the segmented short texts from step 2) into the BERT model obtained in step 1) to obtain the feature vectors of the short texts;
4) obtain the matching short text using the cosine similarity algorithm.
2. The method according to claim 1, characterized in that performing BERT training with a common data set and short texts includes training on the relationship between words within a sentence and training on the relationship between sentences.
3. The method according to claim 2, characterized in that the training method for the relationship between words within a sentence is: randomly mask part of the words as training samples, of which 80% are replaced with the [MASK] token, 10% are replaced with a random word, and 10% are kept unchanged.
4. The method according to claim 3, characterized in that the masked part of the words is 15%.
5. The method according to claim 2, 3 or 4, characterized in that the training method for the relationship between sentences is: pre-train a binary classification model with a 1:1 ratio of positive to negative samples; a positive sample is a given sentence pair A and B in which B is the actual next sentence of A, and a negative sample uses a sentence randomly selected from the corpus as B.
6. The method according to claim 1, characterized in that the word segmentation of the short texts to be matched includes removing stop words and removing links.
7. The method according to claim 6, characterized in that a segmentation tool segments the short text using a statistics-based segmentation method: while performing dictionary-based string-matching segmentation of the short text, a hidden Markov model is used to recognize new words, and the short text is split accordingly.
8. The method according to claim 1, characterized in that the segmented short text is input into the trained BERT model; by looking up the word-vector table, the BERT model converts each word of the sentence into a one-dimensional vector as the model input, and the model output is, for each input word, a vector representation fused with the full-text semantic information.
9. The method according to claim 8, characterized in that the model input also includes a text vector and a position vector: the text vector is learned automatically during model training, captures the global semantic information of the text, and is blended with the semantic information of the individual characters/words; the position vector distinguishes the characters/words at different positions, since the semantic information they carry differs with position, by adding a different vector to each position.
10. The method according to claim 1, characterized in that the matching short text is obtained with the cosine similarity algorithm: the cosine similarity between the target short text and each of the other short texts is computed, and the text with the highest similarity is taken as the similarity match for the target text.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title
---|---|---|---
CN201910583223.8A | 2019-07-01 | 2019-07-01 | Short-text similarity matching method based on the deep-learning BERT algorithm
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title
---|---|---|---
CN201910583223.8A | 2019-07-01 | 2019-07-01 | Short-text similarity matching method based on the deep-learning BERT algorithm
Publications (1)
Publication Number | Publication Date
---|---
CN110287494A | 2019-09-27
Family
ID=68021471
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910583223.8A (Pending) | Short-text similarity matching method based on the deep-learning BERT algorithm | 2019-07-01 | 2019-07-01
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110287494A (en) |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20030004716A1 (en) * | 2001-06-29 | 2003-01-02 | Haigh Karen Z. | Method and apparatus for determining a measure of similarity between natural language sentences |
US20150095017A1 (en) * | 2013-09-27 | 2015-04-02 | Google Inc. | System and method for learning word embeddings using neural language models |
CN109710770A (en) * | 2019-01-31 | 2019-05-03 | 北京牡丹电子集团有限责任公司数字电视技术中心 | A text classification method and device based on transfer learning |
CN109815336A (en) * | 2019-01-28 | 2019-05-28 | 无码科技(杭州)有限公司 | A text aggregation method and system |
- 2019-07-01: Application CN201910583223.8A filed in China; published as CN110287494A; status: Pending
Non-Patent Citations (1)
Title |
---|
Liu Jiming et al.: "A cross-task dialogue system based on small-sample machine learning", Journal of Chongqing University of Posts and Telecommunications (Natural Science Edition) * |
Cited By (58)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110674772B (en) * | 2019-09-29 | 2022-08-05 | 国家电网有限公司技术学院分公司 | Intelligent safety control auxiliary system and method for electric power operation site |
CN110674772A (en) * | 2019-09-29 | 2020-01-10 | 国家电网有限公司技术学院分公司 | Intelligent safety control auxiliary system and method for electric power operation site |
CN110750616A (en) * | 2019-10-16 | 2020-02-04 | 网易(杭州)网络有限公司 | Retrieval type chatting method and device and computer equipment |
WO2021082842A1 (en) * | 2019-10-29 | 2021-05-06 | 平安科技(深圳)有限公司 | Quality perception-based text generation method and apparatus, device, and storage medium |
CN110929714A (en) * | 2019-11-22 | 2020-03-27 | 北京航空航天大学 | Information extraction method of intensive text pictures based on deep learning |
CN111090755B (en) * | 2019-11-29 | 2023-04-04 | 福建亿榕信息技术有限公司 | Text incidence relation judging method and storage medium |
CN111090755A (en) * | 2019-11-29 | 2020-05-01 | 福建亿榕信息技术有限公司 | Text incidence relation judging method and storage medium |
CN111222329A (en) * | 2019-12-10 | 2020-06-02 | 上海八斗智能技术有限公司 | Sentence vector training method and model, and sentence vector prediction method and system |
CN111222329B (en) * | 2019-12-10 | 2023-08-01 | 上海八斗智能技术有限公司 | Sentence vector training method, sentence vector model, sentence vector prediction method and sentence vector prediction system |
CN111026850A (en) * | 2019-12-23 | 2020-04-17 | 园宝科技(武汉)有限公司 | Intellectual property matching technology of bidirectional coding representation of self-attention mechanism |
CN111159340B (en) * | 2019-12-24 | 2023-11-03 | 重庆兆光科技股份有限公司 | Machine reading understanding answer matching method and system based on random optimization prediction |
CN111159340A (en) * | 2019-12-24 | 2020-05-15 | 重庆兆光科技股份有限公司 | Answer matching method and system for machine reading understanding based on random optimization prediction |
CN111126068A (en) * | 2019-12-25 | 2020-05-08 | 中电云脑(天津)科技有限公司 | Chinese named entity recognition method and device and electronic equipment |
CN111241275A (en) * | 2020-01-02 | 2020-06-05 | 厦门快商通科技股份有限公司 | Short text similarity evaluation method, device and equipment |
CN111241275B (en) * | 2020-01-02 | 2022-12-06 | 厦门快商通科技股份有限公司 | Short text similarity evaluation method, device and equipment |
CN111339766A (en) * | 2020-02-19 | 2020-06-26 | 云南电网有限责任公司昆明供电局 | Operation ticket compliance checking method and device |
CN111368037A (en) * | 2020-03-06 | 2020-07-03 | 平安科技(深圳)有限公司 | Text similarity calculation method and device based on Bert model |
CN111401076B (en) * | 2020-04-09 | 2023-04-25 | 支付宝(杭州)信息技术有限公司 | Text similarity determination method and device and electronic equipment |
CN111401076A (en) * | 2020-04-09 | 2020-07-10 | 支付宝(杭州)信息技术有限公司 | Text similarity determination method and device and electronic equipment |
CN111460162A (en) * | 2020-04-11 | 2020-07-28 | 科技日报社 | Text classification method and device, terminal equipment and computer readable storage medium |
CN111460162B (en) * | 2020-04-11 | 2021-11-02 | 科技日报社 | Text classification method and device, terminal equipment and computer readable storage medium |
CN111241851A (en) * | 2020-04-24 | 2020-06-05 | 支付宝(杭州)信息技术有限公司 | Semantic similarity determination method and device and processing equipment |
CN111666753A (en) * | 2020-05-11 | 2020-09-15 | 清华大学深圳国际研究生院 | Short text matching method and system based on global and local matching |
CN111553479A (en) * | 2020-05-13 | 2020-08-18 | 鼎富智能科技有限公司 | Model distillation method, text retrieval method and text retrieval device |
CN111553479B (en) * | 2020-05-13 | 2023-11-03 | 鼎富智能科技有限公司 | Model distillation method, text retrieval method and device |
CN111563143A (en) * | 2020-07-20 | 2020-08-21 | 上海二三四五网络科技有限公司 | Method and device for determining new words |
CN111881257A (en) * | 2020-07-24 | 2020-11-03 | 广州大学 | Automatic matching method, system and storage medium based on subject word and sentence subject matter |
CN111881257B (en) * | 2020-07-24 | 2022-06-03 | 广州大学 | Automatic matching method, system and storage medium based on subject word and sentence subject matter |
CN112329450A (en) * | 2020-07-29 | 2021-02-05 | 好人生(上海)健康科技有限公司 | Insurance medical code mapping dictionary table production method |
CN112101030B (en) * | 2020-08-24 | 2024-01-26 | 沈阳东软智能医疗科技研究院有限公司 | Method, device and equipment for establishing term mapping model and realizing standard word mapping |
CN112101030A (en) * | 2020-08-24 | 2020-12-18 | 沈阳东软智能医疗科技研究院有限公司 | Method, device and equipment for establishing term mapping model and realizing standard word mapping |
CN112100373A (en) * | 2020-08-25 | 2020-12-18 | 南方电网深圳数字电网研究院有限公司 | Contract text analysis method and system based on deep neural network |
CN112308743A (en) * | 2020-10-21 | 2021-02-02 | 上海交通大学 | Trial risk early warning method based on triple similar tasks |
CN112308743B (en) * | 2020-10-21 | 2022-11-11 | 上海交通大学 | Trial risk early warning method based on triple similar tasks |
CN112381099A (en) * | 2020-11-24 | 2021-02-19 | 中教云智数字科技有限公司 | Question recording system based on digital education resources |
CN112231449A (en) * | 2020-12-10 | 2021-01-15 | 杭州识度科技有限公司 | Vertical-domain entity linking system based on multi-path recall |
CN112580373A (en) * | 2020-12-26 | 2021-03-30 | 内蒙古工业大学 | High-quality Mongolian unsupervised neural machine translation method |
CN112580373B (en) * | 2020-12-26 | 2023-06-27 | 内蒙古工业大学 | High-quality unsupervised Mongolian neural machine translation method |
CN113590813A (en) * | 2021-01-20 | 2021-11-02 | 腾讯科技(深圳)有限公司 | Text classification method, recommendation device and electronic equipment |
CN113221530A (en) * | 2021-04-19 | 2021-08-06 | 杭州火石数智科技有限公司 | Text similarity matching method and device based on circle loss, computer equipment and storage medium |
CN113221530B (en) * | 2021-04-19 | 2024-02-13 | 杭州火石数智科技有限公司 | Text similarity matching method and device, computer equipment and storage medium |
WO2022252638A1 (en) * | 2021-05-31 | 2022-12-08 | 平安科技(深圳)有限公司 | Text matching method and apparatus, computer device and readable storage medium |
CN113221531B (en) * | 2021-06-04 | 2024-08-06 | 西安邮电大学 | Semantic matching method for multi-model dynamic collaboration |
CN113221531A (en) * | 2021-06-04 | 2021-08-06 | 西安邮电大学 | Multi-model dynamic collaborative semantic matching method |
CN113569011A (en) * | 2021-07-27 | 2021-10-29 | 马上消费金融股份有限公司 | Training method, device and equipment of text matching model and storage medium |
CN113569011B (en) * | 2021-07-27 | 2023-03-24 | 马上消费金融股份有限公司 | Training method, device and equipment of text matching model and storage medium |
CN113591475A (en) * | 2021-08-03 | 2021-11-02 | 美的集团(上海)有限公司 | Unsupervised interpretable word segmentation method and device and electronic equipment |
CN113536789A (en) * | 2021-09-16 | 2021-10-22 | 平安科技(深圳)有限公司 | Method, device, equipment and medium for predicting relevance of algorithm competition |
CN113590763A (en) * | 2021-09-27 | 2021-11-02 | 湖南大学 | Similar text retrieval method and device based on deep learning and storage medium |
CN114357109A (en) * | 2021-11-25 | 2022-04-15 | 达而观数据(成都)有限公司 | Investment audit doubtful point extraction method based on mixed semantic similarity model |
CN114003698B (en) * | 2021-12-27 | 2022-04-01 | 成都晓多科技有限公司 | Text retrieval method, system, equipment and storage medium |
CN114003698A (en) * | 2021-12-27 | 2022-02-01 | 成都晓多科技有限公司 | Text retrieval method, system, equipment and storage medium |
CN114358210A (en) * | 2022-01-14 | 2022-04-15 | 平安科技(深圳)有限公司 | Text similarity calculation method and device, computer equipment and storage medium |
CN114358210B (en) * | 2022-01-14 | 2024-07-02 | 平安科技(深圳)有限公司 | Text similarity calculation method, device, computer equipment and storage medium |
CN114154496A (en) * | 2022-02-08 | 2022-03-08 | 成都四方伟业软件股份有限公司 | Coal mine supervision classification scheme comparison method and device based on a deep learning BERT model |
CN115186660A (en) * | 2022-07-07 | 2022-10-14 | 东航技术应用研发中心有限公司 | Aviation safety report analysis and evaluation method based on text similarity model |
CN115309899A (en) * | 2022-08-09 | 2022-11-08 | 烟台中科网络技术研究所 | Method and system for identifying and storing specific content in text |
CN116127334A (en) * | 2023-02-22 | 2023-05-16 | 佛山科学技术学院 | Semi-structured text matching method and system |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN110287494A (en) | A short text similarity matching method based on the deep-learning BERT algorithm | |
White et al. | Inference is everything: Recasting semantic resources into a unified evaluation framework | |
CN107423284B (en) | Method and system for constructing sentence representation fusing internal structure information of Chinese words | |
CN110516245A (en) | Fine-grained sentiment analysis method, apparatus, computer equipment and storage medium | |
CN109960786A (en) | Measurement of Chinese word similarity based on a convergence strategy | |
CN110337645A (en) | Adaptable processing component | |
CN115357719B (en) | Power audit text classification method and device based on improved BERT model | |
CN107797987A (en) | A mixed-corpus named entity recognition method based on Bi-LSTM-CNN | |
CN111581953A (en) | Method for automatically analyzing grammar phenomenon of English text | |
Chen et al. | ADOL: a novel framework for automatic domain ontology learning | |
CN113971394A (en) | Text repeat rewriting system | |
CN116186422A (en) | Disease-related public opinion analysis system based on social media and artificial intelligence | |
CN110232121A (en) | A control command classification method based on semantic networks | |
Zheng et al. | Enhanced word embedding with multiple prototypes | |
Wu | English Vocabulary Learning Aid System Using Digital Twin Wasserstein Generative Adversarial Network Optimized With Jelly Fish Optimization Algorithm | |
Duan et al. | Automatically build corpora for chinese spelling check based on the input method | |
Ali et al. | Word embedding based new corpus for low-resourced language: Sindhi | |
Meng et al. | Design of Intelligent Recognition Model for English Translation Based on Deep Machine Learning | |
CN114417008A (en) | Construction engineering field-oriented knowledge graph construction method and system | |
Wu | A computational neural network model for college English grammar correction | |
Marfani et al. | Analysis of learners’ sentiments on MOOC forums using natural language processing techniques | |
Guo | RETRACTED: An automatic scoring method for Chinese-English spoken translation based on attention LSTM [EAI Endorsed Scal Inf Syst (2022), Online First] | |
CN114970557A (en) | Knowledge enhancement-based cross-language structured emotion analysis method | |
CN113569560A (en) | Automatic scoring method for Chinese bilingual composition | |
Charnine et al. | Optimal automated method for collaborative development of university curricula |
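The documents above share a common pattern: a BERT-style encoder maps each short text to a fixed-length sentence vector, after which similarity matching reduces to a vector comparison such as cosine similarity. The sketch below illustrates only that final comparison step with toy stand-in vectors (the embedding values are hypothetical, not real BERT outputs, and this is not the specific method claimed in CN110287494A):

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors: 1.0 means
    identical direction (high similarity), 0.0 means orthogonal."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


# Toy "sentence embeddings" for two short texts (hypothetical values;
# a real system would obtain these from a BERT encoder).
emb_a = [0.2, 0.8, 0.1]
emb_b = [0.25, 0.75, 0.05]

print(cosine_similarity(emb_a, emb_b))  # close to 1.0 for similar texts
```

In practice the two texts would be tokenized (after word segmentation, for Chinese), encoded by the fine-tuned BERT model, and the resulting vectors compared or fed to a classification head; the cosine score is then thresholded to decide a match.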
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication | ||
Application publication date: 2019-09-27 |