
CN110287494A - Method for short-text similarity matching based on the deep-learning BERT algorithm - Google Patents

Method for short-text similarity matching based on the deep-learning BERT algorithm (Download PDF)

Info

Publication number
CN110287494A
CN110287494A (application CN201910583223.8A)
Authority
CN
China
Prior art keywords
short text
text
word
bert
similarity matching
Prior art date
Legal status (assumed; not a legal conclusion)
Pending
Application number
CN201910583223.8A
Other languages
Chinese (zh)
Inventor
尹青山
李锐
于治楼
Current Assignee (the listed assignees may be inaccurate)
Jinan Inspur Hi Tech Investment and Development Co Ltd
Original Assignee
Jinan Inspur Hi Tech Investment and Development Co Ltd
Priority date (assumed; not a legal conclusion)
Filing date
Publication date
Application filed by Jinan Inspur Hi Tech Investment and Development Co Ltd
Priority to CN201910583223.8A
Publication of CN110287494A
Legal status: Pending (current)

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/22 Matching criteria, e.g. proximity measures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/289 Phrasal analysis, e.g. finite state techniques or chunking
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/30 Semantic analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G06N3/088 Non-supervised learning, e.g. competitive learning

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Computational Linguistics (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • Molecular Biology (AREA)
  • Biophysics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Biomedical Technology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Machine Translation (AREA)

Abstract

The invention discloses a method for short-text similarity matching based on the deep-learning BERT algorithm, belonging to the field of artificial intelligence. The method is implemented as follows: 1) train BERT on a common dataset together with the short texts, obtaining a trained BERT model; 2) perform word segmentation on the short texts to be matched; 3) feed the segmented short texts from step 2) into the BERT model obtained in step 1) to get the feature vectors of the short texts; 4) obtain the matching short text using the cosine-similarity algorithm. The invention performs short-text similarity matching with a pre-trained BERT model and achieves better performance than previous text-similarity-matching methods.

Description

Method for short-text similarity matching based on the deep-learning BERT algorithm
Technical field
The present invention relates to the field of artificial intelligence, and specifically to a method for short-text similarity matching based on the deep-learning BERT algorithm.
Background technique
In natural language processing, it is often necessary to measure the similarity between two short texts. Text is a high-dimensional semantic space; to quantify similarity from a mathematical standpoint, the text must be abstracted and decomposed. Once a similarity metric between texts is available, we can cluster texts with the partitioning method K-means, the density-based method DBSCAN, or model-based probabilistic methods; on the other hand, we can also use inter-text similarity to deduplicate a large-scale corpus as preprocessing, or to find aliases of a given entity name (fuzzy matching). There are many methods for measuring the similarity of two strings, for example directly using hash codes, using classical topic models, or abstracting the texts into vector representations via word vectors and then measuring the Euclidean distance or Pearson distance between the feature vectors. With the rapid development of artificial intelligence, new algorithms and models keep emerging to realize deep-learning computation better and more efficiently. Short-text similarity matching plays an important role in text analysis and corpus processing; in such a fast-moving computing environment, improving the efficiency of short-text similarity matching is of great significance.
Summary of the invention
The technical task of the invention, in view of the above deficiencies, is to provide a method for short-text similarity matching based on the deep-learning BERT algorithm that performs the matching of short texts with a pre-trained model and has a better application effect.
The technical solution adopted by the invention to solve the technical problem is as follows:
A method for short-text similarity matching based on the deep-learning BERT algorithm, characterized in that the method is implemented as follows (a minimal end-to-end sketch follows the steps):
1) Train BERT on a common dataset together with the short texts, obtaining a trained BERT model;
2) Perform word segmentation on the short texts to be matched;
3) Feed the segmented short texts from step 2) into the BERT model obtained in step 1) to get the feature vectors of the short texts;
4) Obtain the matching short text using the cosine-similarity algorithm.
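For concreteness, the following Python sketch strings the four steps together. It assumes the HuggingFace transformers library and the public bert-base-chinese checkpoint as stand-ins for the patent's pre-trained BERT model, and mean-pools the output vectors into one feature vector per text; none of these choices is fixed by the patent.

import torch
from transformers import BertTokenizer, BertModel

tokenizer = BertTokenizer.from_pretrained("bert-base-chinese")
model = BertModel.from_pretrained("bert-base-chinese")  # step 1: a pre-trained BERT
model.eval()

def embed(text: str) -> torch.Tensor:
    """Steps 2-3: tokenize the short text and pool BERT's output vectors."""
    inputs = tokenizer(text, return_tensors="pt", truncation=True)
    with torch.no_grad():
        out = model(**inputs)
    # Mean-pool the per-token vectors into one feature vector for the text.
    return out.last_hidden_state.mean(dim=1).squeeze(0)

def best_match(target: str, candidates: list) -> str:
    """Step 4: cosine similarity between the target and every candidate."""
    t = embed(target)
    sims = [float(torch.nn.functional.cosine_similarity(t, embed(c), dim=0))
            for c in candidates]
    return candidates[sims.index(max(sims))]

Here best_match returns the candidate text most similar to the target, which is exactly the selection rule of step 4).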
BERT is a method of pre-training language representations: a general-purpose "language understanding" model is trained on a large text corpus (such as Wikipedia), and that model is then used to perform the desired NLP tasks. BERT performs better than earlier methods because it is the first unsupervised, deeply bidirectional system for NLP pre-training. We use a pre-trained BERT model to perform short-text similarity matching, which gives better performance.
BERT randomly masks words of the input sequence, and the training objective is to predict the masked words from contextual information (unlike a traditional left-to-right language model, the masked model can use the left and right context of a masked word simultaneously). Using the context on both sides of a masked word simultaneously is realized by the bidirectional Transformer encoder, which resolves the limitation that "language models are unidirectional". In addition, the BERT model introduces a "predict whether this is the next sentence" task so as to jointly learn pre-trained representations.
Specifically, the BERT training on a common dataset and the short texts includes training on relations between words within a sentence and training on relations between sentences.
The training method for relations between words within a sentence is as follows:
A portion of the words is randomly masked as training samples; of these, 80% are replaced with the [MASK] token, 10% are replaced with a random word, and 10% keep the original word unchanged.
Preferably, the masked portion is 15%, i.e., 15% of the words are randomly masked as training samples; the model then needs more training steps to converge.
Further, the training method for relations between sentences is as follows:
A binary classification model is pre-trained with positive and negative samples at a 1:1 ratio. A positive sample is a pair of given sentences A and B in which B is the actual next sentence after A; a negative sample takes a randomly selected sentence from the corpus as B.
Preferably, the word segmentation of the short texts to be matched includes removing stop words and removing links from the text.
Specifically, the short texts to be matched are segmented with a segmentation tool using a statistics-based segmentation method: while dictionary-based string-matching segmentation is performed on the short texts, a Hidden Markov Model is used to recognize new words, and the short texts are thereby split.
Specifically, the segmented short texts are fed into the trained BERT model. By looking up the word-vector table, the BERT model converts each word of the sentence into a one-dimensional vector, which serves as the model input; the model output is, for each input word, a vector representation that fuses full-text semantic information.
Further, the model input also includes a text vector and a position vector:
Text vector: its values are learned automatically during model training; it captures the global semantic information of the text and is blended with the semantic information of the individual characters/words.
Position vector: since the semantic information carried by a character/word differs depending on where it appears in the text, the BERT model adds a distinct vector to the characters/words at different positions to distinguish them.
The BERT model takes the sum of the word vector, text vector, and position vector as the model input, so the text vectors the model outputs, converted from the character/word vectors, carry more accurate semantic information.
Preferably, the matching short text is obtained with the cosine-similarity algorithm: cosine similarity is computed between the target short text and the set of other short texts, and the text with the highest similarity is taken as the matching text of the target text.
Compared with the prior art, the method for short-text similarity matching based on the deep-learning BERT algorithm of the invention has the following advantages:
Compared with the earlier ways of measuring the similarity of two strings, namely directly using hash codes, using classical topic models, or abstracting the texts into vector representations via word vectors and then measuring the Euclidean distance or Pearson distance between the feature vectors, the BERT method performs outstandingly: it is the first unsupervised, deeply bidirectional system for NLP pre-training. Performing short-text similarity matching with a pre-trained BERT model therefore yields better performance: it can greatly improve matching accuracy and raise the efficiency of short-text similarity matching, better realizing the text analysis and corpus processing of natural language processing.
Detailed description of the invention
Fig. 1 is a flow diagram of the method for short-text similarity matching based on the deep-learning BERT algorithm of the invention;
Fig. 2 is an example diagram of the word-relationship (masked-word) training of the BERT algorithm in the embodiment.
Specific embodiment
The present invention is further explained below with reference to specific embodiments.
A method for short-text similarity matching based on the deep-learning BERT algorithm, characterized in that the method is implemented as follows:
1) Train BERT on a common dataset together with the short texts, obtaining a trained BERT model;
2) Perform word segmentation on the short texts to be matched;
3) Feed the segmented short texts from step 2) into the BERT model obtained in step 1) to get the feature vectors of the short texts;
4) Obtain the matching short text using the cosine-similarity algorithm.
BERT is a method of pre-training language representations: a general-purpose "language understanding" model is trained on a large text corpus (such as Wikipedia), and that model is then used to perform the desired NLP tasks. BERT performs better than earlier methods because it is the first unsupervised, deeply bidirectional system for NLP pre-training. We use a pre-trained BERT model to perform short-text similarity matching, which gives better performance.
BERT randomly masks words of the input sequence, and the training objective is to predict the masked words from contextual information (unlike a traditional left-to-right language model, the masked model can use the left and right context of a masked word simultaneously). Using the context on both sides of a masked word simultaneously is realized by the bidirectional Transformer encoder, which resolves the limitation that "language models are unidirectional". In addition, the BERT model introduces a "predict whether this is the next sentence" task so as to jointly learn pre-trained representations.
Here, the BERT training on a common dataset and the short texts includes training on relations between words within a sentence and training on relations between sentences.
The training method for relations between words within a sentence is as follows:
15% of the words are randomly masked as training samples; of these, 80% are replaced with the [MASK] token, 10% are replaced with a random word, and 10% keep the original word unchanged.
With " Wu Tse-tien is first empress of China." for the words, as shown in Fig. 2,
Position " then " is chosen, is covered " then " as training sample;
For " then " being occluded, wherein 80% is replaced with masked token:
" military day [mask] is first empress of China.";
Wherein 10% keep this word constant:
" Wu Tse-tien is first empress of China.";
Wherein 10% replaced with a random word:
" Wu Zongtian is first empress of China.";
Predict the position of " then ":
Then the positive Zhou Yingzong of king often takes
0.999 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
For the masked positions, 80% are replaced with the [MASK] token. Avoiding a 100% replacement of the chosen words with [MASK] matters because, during fine-tuning, the model might otherwise encounter words it has never seen; therefore 10% of the cases keep the original word unchanged. The remaining 10% are replaced with a random word in order to force the Transformer to keep a distributed representation of every input token; otherwise the Transformer might simply memorize that this [MASK] is "则".
(The training of the remaining parts proceeds in the same way; the model is brought to convergence through repeated training.)
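A minimal sketch of this 15% / 80-10-10 masking rule follows. The whitespace-token list, the literal "[MASK]" string, and the helper name mask_tokens are illustrative assumptions rather than details fixed by the patent.

import random

def mask_tokens(tokens, vocab):
    """Return (corrupted tokens, {position: original word}) per the 15%/80-10-10 rule."""
    out = list(tokens)
    targets = {}
    for i, tok in enumerate(tokens):
        if random.random() >= 0.15:        # only 15% of words become training samples
            continue
        targets[i] = tok                   # the model must predict the original word
        r = random.random()
        if r < 0.8:
            out[i] = "[MASK]"              # 80%: replace with the mask token
        elif r < 0.9:
            out[i] = random.choice(vocab)  # 10%: replace with a random word
        # remaining 10%: keep the word unchanged
    return out, targets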
The training method for relations between sentences is as follows:
A binary classification model is pre-trained with positive and negative samples at a 1:1 ratio. A positive sample is a pair of given sentences A and B in which B is the actual next sentence after A; a negative sample takes a randomly selected sentence from the corpus as B.
Many important downstream tasks, such as question answering (QA) and natural language inference (NLI), are based on understanding the relationship between two sentences, which is not directly captured by language modeling. To train a model that understands sentence relationships, a binarized next-sentence-prediction task is pre-trained; this task can easily be generated from any monolingual corpus. Specifically, sentences A and B are selected as a pre-training sample: with 50% probability B is the actual next sentence after A, and with 50% probability B is a random sentence from the corpus.
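The pair construction can be sketched as follows; the helper name make_nsp_pairs and the uniform random draw over the whole corpus for negatives are illustrative assumptions consistent with the 1:1 ratio described above.

import random

def make_nsp_pairs(sentences):
    """Return (A, B, label) triples: label 1 if B really follows A, 0 if B is random."""
    pairs = []
    for i in range(len(sentences) - 1):
        if random.random() < 0.5:
            pairs.append((sentences[i], sentences[i + 1], 1))          # true next sentence
        else:
            pairs.append((sentences[i], random.choice(sentences), 0))  # random sentence
    return pairs

In expectation, half of the produced pairs are positive and half negative, giving the 1:1 ratio.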
Here, the word segmentation of the short texts to be matched includes removing stop words and removing links from the text.
The short texts to be matched are segmented with a segmentation tool using a statistics-based segmentation method: while dictionary-based string-matching segmentation is performed on the short texts, a Hidden Markov Model is used to recognize new words, and the short texts are thereby split.
For example, the text library consists of paired texts (format: "id", "text-a", "text-b"), where text-a is a short question such as "What is private health insurance?", "What does private health insurance mean?" or a similar paraphrase, and text-b is the corresponding text; in this example, text-b is the text that tells the user what private health insurance is. Each sample in the text library is segmented with "jieba". During segmentation, stop words without concrete meaning, such as punctuation marks, must first be removed, which safeguards segmentation quality and speed; segmentation is then performed against a user dictionary to make sure proper nouns can be extracted. In this example, "private health insurance" should be treated as a single proper noun rather than split into the three words "private", "health", "insurance". For "What is private health insurance?", the result is thus "what" "is" "private health insurance".
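A minimal sketch of this segmentation step with jieba is shown below; the file name user_dict.txt and the particular stop-word set are assumptions for illustration.

import jieba

jieba.load_userdict("user_dict.txt")  # assumed file listing proper nouns such as "私人健康保险"
STOP_WORDS = {"?", "？", ",", "，", "。", "的", "了"}  # assumed stop-word set

def segment(text):
    """Split a short text into words, dropping punctuation and stop words."""
    return [w for w in jieba.lcut(text) if w.strip() and w not in STOP_WORDS]

# e.g. segment("什么是私人健康保险？") -> ['什么', '是', '私人健康保险'],
# provided "私人健康保险" is listed in the user dictionary.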
The segmented short texts are fed into the trained BERT model, and the feature vectors of the short texts are obtained:
The segmented short texts are fed into the trained BERT model; by looking up the word-vector table, the BERT model converts each word of the sentence into a one-dimensional vector, which serves as the model input; the model output is, for each input word, a vector representation that fuses full-text semantic information.
In addition to the word vectors, the model input includes two further parts: 1. the text vector, whose values are learned automatically during model training, captures the global semantic information of the text and is blended with the semantic information of the individual characters/words; 2. the position vector: the semantic information carried by a character/word differs depending on where it appears in the text (compare "I love you" and "you love me"), so the BERT model adds a distinct vector to the characters/words at different positions to distinguish them.
The BERT model takes the sum of the word vector, text vector, and position vector as the model input, so the text vectors the model outputs, converted from the character/word vectors, carry more accurate semantic information.
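The sum of the three input vectors can be illustrated with the internals of a HuggingFace BertModel, used here as an assumed stand-in for the patent's model (the attribute names word_embeddings, token_type_embeddings, and position_embeddings belong to that library, not to the patent):

import torch
from transformers import BertTokenizer, BertModel

tokenizer = BertTokenizer.from_pretrained("bert-base-chinese")
model = BertModel.from_pretrained("bert-base-chinese")
emb = model.embeddings  # holds the word, token-type (segment) and position tables

ids = tokenizer("什么是私人健康保险？", return_tensors="pt")["input_ids"]
positions = torch.arange(ids.size(1)).unsqueeze(0)
segments = torch.zeros_like(ids)             # a single-sentence input uses one segment

total = (emb.word_embeddings(ids)             # word (character/word) vector
         + emb.token_type_embeddings(segments)  # text (segment) vector
         + emb.position_embeddings(positions))  # position vector
# Inside the model, this sum (after LayerNorm and dropout) is what the encoder consumes.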
The matching short text is obtained with the cosine-similarity algorithm: cosine similarity is computed between the target short text and the set of other short texts, and the text with the highest similarity is taken as the matching text of the target text.
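A minimal numpy sketch of this final matching step, assuming the feature vectors of the text library have already been computed by the BERT step above (the names cosine and match are illustrative):

import numpy as np

def cosine(u, v):
    """cos(u, v) = u.v / (|u| * |v|)"""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

def match(target_vec, library):
    """library: {text: feature vector}; return the text most similar to the target."""
    return max(library, key=lambda text: cosine(target_vec, library[text]))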
Those skilled in the art can readily implement the present invention from the specific embodiments above. It should, however, be understood that the present invention is not limited to the above specific embodiments; on the basis of the disclosed embodiments, those skilled in the art may combine different technical features arbitrarily to realize different technical solutions.
Apart from the technical features described in this specification, all other features are known to those skilled in the art.

Claims (10)

1. A method for short-text similarity matching based on the deep-learning BERT algorithm, characterized in that the method is implemented as follows:
1) train BERT on a common dataset together with the short texts, obtaining a trained BERT model;
2) perform word segmentation on the short texts to be matched;
3) feed the segmented short texts from step 2) into the BERT model obtained in step 1) to get the feature vectors of the short texts;
4) obtain the matching short text using the cosine-similarity algorithm.
2. The method for short-text similarity matching based on the deep-learning BERT algorithm according to claim 1, characterized in that the BERT training on a common dataset and the short texts includes training on relations between words within a sentence and training on relations between sentences.
3. The method for short-text similarity matching based on the deep-learning BERT algorithm according to claim 2, characterized in that the training method for relations between words within a sentence is as follows:
a portion of the words is randomly masked as training samples; of these, 80% are replaced with the [MASK] token, 10% are replaced with a random word, and 10% keep the original word unchanged.
4. The method for short-text similarity matching based on the deep-learning BERT algorithm according to claim 3, characterized in that the masked portion of words is 15%.
5. The method for short-text similarity matching based on the deep-learning BERT algorithm according to claim 2, 3 or 4, characterized in that the training method for relations between sentences is as follows:
a binary classification model is pre-trained with positive and negative samples at a 1:1 ratio, where a positive sample is a pair of given sentences A and B in which B is the actual next sentence after A, and a negative sample takes a randomly selected sentence from the corpus as B.
6. The method for short-text similarity matching based on the deep-learning BERT algorithm according to claim 1, characterized in that the word segmentation of the short texts to be matched includes removing stop words and removing links from the text.
7. The method for short-text similarity matching based on the deep-learning BERT algorithm according to claim 6, characterized in that the short texts are segmented with a segmentation tool using a statistics-based segmentation method; while dictionary-based string-matching segmentation is performed on the short texts, a Hidden Markov Model is used to recognize new words, and the short texts are thereby split.
8. The method for short-text similarity matching based on the deep-learning BERT algorithm according to claim 1, characterized in that the segmented short texts are fed into the trained BERT model, which, by looking up the word-vector table, converts each word of the sentence into a one-dimensional vector serving as the model input; the model output is, for each input word, a vector representation that fuses full-text semantic information.
9. The method for short-text similarity matching based on the deep-learning BERT algorithm according to claim 8, characterized in that the model input also includes a text vector and a position vector:
text vector: its values are learned automatically during model training; it captures the global semantic information of the text and is blended with the semantic information of the individual characters/words;
position vector: since the semantic information carried by a character/word differs depending on where it appears in the text, the BERT model adds a distinct vector to the characters/words at different positions to distinguish them.
10. The method for short-text similarity matching based on the deep-learning BERT algorithm according to claim 1, characterized in that the matching short text is obtained with the cosine-similarity algorithm: cosine similarity is computed between the target short text and the set of other short texts, and the text with the highest similarity is taken as the matching text of the target text.
CN201910583223.8A 2019-07-01 2019-07-01 Method for short-text similarity matching based on the deep-learning BERT algorithm Pending CN110287494A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910583223.8A CN110287494A (en) 2019-07-01 2019-07-01 Method for short-text similarity matching based on the deep-learning BERT algorithm


Publications (1)

Publication Number Publication Date
CN110287494A true CN110287494A (en) 2019-09-27

Family

ID=68021471

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910583223.8A Pending CN110287494A (en) Method for short-text similarity matching based on the deep-learning BERT algorithm

Country Status (1)

Country Link
CN (1) CN110287494A (en)



Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20030004716A1 (en) * 2001-06-29 2003-01-02 Haigh Karen Z. Method and apparatus for determining a measure of similarity between natural language sentences
US20150095017A1 (en) * 2013-09-27 2015-04-02 Google Inc. System and method for learning word embeddings using neural language models
CN109815336A (en) * 2019-01-28 2019-05-28 无码科技(杭州)有限公司 A kind of text polymerization and system
CN109710770A (en) * 2019-01-31 2019-05-03 北京牡丹电子集团有限责任公司数字电视技术中心 A kind of file classification method and device based on transfer learning

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Liu Jiming et al., "Cross-task dialogue system based on few-shot machine learning", Journal of Chongqing University of Posts and Telecommunications (Natural Science Edition) *

Cited By (58)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110674772B (en) * 2019-09-29 2022-08-05 国家电网有限公司技术学院分公司 Intelligent safety control auxiliary system and method for electric power operation site
CN110674772A (en) * 2019-09-29 2020-01-10 国家电网有限公司技术学院分公司 Intelligent safety control auxiliary system and method for electric power operation site
CN110750616A (en) * 2019-10-16 2020-02-04 网易(杭州)网络有限公司 Retrieval type chatting method and device and computer equipment
WO2021082842A1 (en) * 2019-10-29 2021-05-06 平安科技(深圳)有限公司 Quality perception-based text generation method and apparatus, device, and storage medium
CN110929714A (en) * 2019-11-22 2020-03-27 北京航空航天大学 Information extraction method of intensive text pictures based on deep learning
CN111090755B (en) * 2019-11-29 2023-04-04 福建亿榕信息技术有限公司 Text incidence relation judging method and storage medium
CN111090755A (en) * 2019-11-29 2020-05-01 福建亿榕信息技术有限公司 Text incidence relation judging method and storage medium
CN111222329A (en) * 2019-12-10 2020-06-02 上海八斗智能技术有限公司 Sentence vector training method and model, and sentence vector prediction method and system
CN111222329B (en) * 2019-12-10 2023-08-01 上海八斗智能技术有限公司 Sentence vector training method, sentence vector model, sentence vector prediction method and sentence vector prediction system
CN111026850A (en) * 2019-12-23 2020-04-17 园宝科技(武汉)有限公司 Intellectual property matching technology of bidirectional coding representation of self-attention mechanism
CN111159340B (en) * 2019-12-24 2023-11-03 重庆兆光科技股份有限公司 Machine reading understanding answer matching method and system based on random optimization prediction
CN111159340A (en) * 2019-12-24 2020-05-15 重庆兆光科技股份有限公司 Answer matching method and system for machine reading understanding based on random optimization prediction
CN111126068A (en) * 2019-12-25 2020-05-08 中电云脑(天津)科技有限公司 Chinese named entity recognition method and device and electronic equipment
CN111241275A (en) * 2020-01-02 2020-06-05 厦门快商通科技股份有限公司 Short text similarity evaluation method, device and equipment
CN111241275B (en) * 2020-01-02 2022-12-06 厦门快商通科技股份有限公司 Short text similarity evaluation method, device and equipment
CN111339766A (en) * 2020-02-19 2020-06-26 云南电网有限责任公司昆明供电局 Operation ticket compliance checking method and device
CN111368037A (en) * 2020-03-06 2020-07-03 平安科技(深圳)有限公司 Text similarity calculation method and device based on Bert model
CN111401076B (en) * 2020-04-09 2023-04-25 支付宝(杭州)信息技术有限公司 Text similarity determination method and device and electronic equipment
CN111401076A (en) * 2020-04-09 2020-07-10 支付宝(杭州)信息技术有限公司 Text similarity determination method and device and electronic equipment
CN111460162A (en) * 2020-04-11 2020-07-28 科技日报社 Text classification method and device, terminal equipment and computer readable storage medium
CN111460162B (en) * 2020-04-11 2021-11-02 科技日报社 Text classification method and device, terminal equipment and computer readable storage medium
CN111241851A (en) * 2020-04-24 2020-06-05 支付宝(杭州)信息技术有限公司 Semantic similarity determination method and device and processing equipment
CN111666753A (en) * 2020-05-11 2020-09-15 清华大学深圳国际研究生院 Short text matching method and system based on global and local matching
CN111553479A (en) * 2020-05-13 2020-08-18 鼎富智能科技有限公司 Model distillation method, text retrieval method and text retrieval device
CN111553479B (en) * 2020-05-13 2023-11-03 鼎富智能科技有限公司 Model distillation method, text retrieval method and device
CN111563143A (en) * 2020-07-20 2020-08-21 上海二三四五网络科技有限公司 Method and device for determining new words
CN111881257A (en) * 2020-07-24 2020-11-03 广州大学 Automatic matching method, system and storage medium based on subject word and sentence subject matter
CN111881257B (en) * 2020-07-24 2022-06-03 广州大学 Automatic matching method, system and storage medium based on subject word and sentence subject matter
CN112329450A (en) * 2020-07-29 2021-02-05 好人生(上海)健康科技有限公司 Insurance medical code mapping dictionary table production method
CN112101030B (en) * 2020-08-24 2024-01-26 沈阳东软智能医疗科技研究院有限公司 Method, device and equipment for establishing term mapping model and realizing standard word mapping
CN112101030A (en) * 2020-08-24 2020-12-18 沈阳东软智能医疗科技研究院有限公司 Method, device and equipment for establishing term mapping model and realizing standard word mapping
CN112100373A (en) * 2020-08-25 2020-12-18 南方电网深圳数字电网研究院有限公司 Contract text analysis method and system based on deep neural network
CN112308743A (en) * 2020-10-21 2021-02-02 上海交通大学 Trial risk early warning method based on triple similar tasks
CN112308743B (en) * 2020-10-21 2022-11-11 上海交通大学 Trial risk early warning method based on triple similar tasks
CN112381099A (en) * 2020-11-24 2021-02-19 中教云智数字科技有限公司 Question recording system based on digital education resources
CN112231449A (en) * 2020-12-10 2021-01-15 杭州识度科技有限公司 Vertical field entity chain finger system based on multi-path recall
CN112580373A (en) * 2020-12-26 2021-03-30 内蒙古工业大学 High-quality Mongolian unsupervised neural machine translation method
CN112580373B (en) * 2020-12-26 2023-06-27 内蒙古工业大学 High-quality Mongolian non-supervision neural machine translation method
CN113590813A (en) * 2021-01-20 2021-11-02 腾讯科技(深圳)有限公司 Text classification method, recommendation device and electronic equipment
CN113221530A (en) * 2021-04-19 2021-08-06 杭州火石数智科技有限公司 Text similarity matching method and device based on circle loss, computer equipment and storage medium
CN113221530B (en) * 2021-04-19 2024-02-13 杭州火石数智科技有限公司 Text similarity matching method and device, computer equipment and storage medium
WO2022252638A1 (en) * 2021-05-31 2022-12-08 平安科技(深圳)有限公司 Text matching method and apparatus, computer device and readable storage medium
CN113221531B (en) * 2021-06-04 2024-08-06 西安邮电大学 Semantic matching method for multi-model dynamic collaboration
CN113221531A (en) * 2021-06-04 2021-08-06 西安邮电大学 Multi-model dynamic collaborative semantic matching method
CN113569011A (en) * 2021-07-27 2021-10-29 马上消费金融股份有限公司 Training method, device and equipment of text matching model and storage medium
CN113569011B (en) * 2021-07-27 2023-03-24 马上消费金融股份有限公司 Training method, device and equipment of text matching model and storage medium
CN113591475A (en) * 2021-08-03 2021-11-02 美的集团(上海)有限公司 Unsupervised interpretable word segmentation method and device and electronic equipment
CN113536789A (en) * 2021-09-16 2021-10-22 平安科技(深圳)有限公司 Method, device, equipment and medium for predicting relevance of algorithm competition
CN113590763A (en) * 2021-09-27 2021-11-02 湖南大学 Similar text retrieval method and device based on deep learning and storage medium
CN114357109A (en) * 2021-11-25 2022-04-15 达而观数据(成都)有限公司 Investment audit doubtful point extraction method based on mixed semantic similarity model
CN114003698B (en) * 2021-12-27 2022-04-01 成都晓多科技有限公司 Text retrieval method, system, equipment and storage medium
CN114003698A (en) * 2021-12-27 2022-02-01 成都晓多科技有限公司 Text retrieval method, system, equipment and storage medium
CN114358210A (en) * 2022-01-14 2022-04-15 平安科技(深圳)有限公司 Text similarity calculation method and device, computer equipment and storage medium
CN114358210B (en) * 2022-01-14 2024-07-02 平安科技(深圳)有限公司 Text similarity calculation method, device, computer equipment and storage medium
CN114154496A (en) * 2022-02-08 2022-03-08 成都四方伟业软件股份有限公司 Coal prison classification scheme comparison method and device based on deep learning BERT model
CN115186660A (en) * 2022-07-07 2022-10-14 东航技术应用研发中心有限公司 Aviation safety report analysis and evaluation method based on text similarity model
CN115309899A (en) * 2022-08-09 2022-11-08 烟台中科网络技术研究所 Method and system for identifying and storing specific content in text
CN116127334A (en) * 2023-02-22 2023-05-16 佛山科学技术学院 Semi-structured text matching method and system

Similar Documents

Publication Publication Date Title
CN110287494A (en) A method of the short text Similarity matching based on deep learning BERT algorithm
White et al. Inference is everything: Recasting semantic resources into a unified evaluation framework
CN107423284B (en) Method and system for constructing sentence representation fusing internal structure information of Chinese words
CN110516245A (en) Fine granularity sentiment analysis method, apparatus, computer equipment and storage medium
CN109960786A (en) Chinese Measurement of word similarity based on convergence strategy
CN110337645A (en) The processing component that can be adapted to
CN115357719B (en) Power audit text classification method and device based on improved BERT model
CN107797987A (en) A kind of mixing language material name entity recognition method based on Bi LSTM CNN
CN111581953A (en) Method for automatically analyzing grammar phenomenon of English text
Chen et al. ADOL: a novel framework for automatic domain ontology learning
CN113971394A (en) Text repeat rewriting system
CN116186422A (en) Disease-related public opinion analysis system based on social media and artificial intelligence
CN110232121A (en) A kind of control order classification method based on semantic net
Zheng et al. Enhanced word embedding with multiple prototypes
Wu English Vocabulary Learning Aid System Using Digital Twin Wasserstein Generative Adversarial Network Optimized With Jelly Fish Optimization Algorithm
Duan et al. Automatically build corpora for chinese spelling check based on the input method
Ali et al. Word embedding based new corpus for low-resourced language: Sindhi
Meng et al. Design of Intelligent Recognition Model for English Translation Based on Deep Machine Learning
CN114417008A (en) Construction engineering field-oriented knowledge graph construction method and system
Wu A computational neural network model for college English grammar correction
Marfani et al. Analysis of learners’ sentiments on MOOC forums using natural language processing techniques
Guo RETRACTED: An automatic scoring method for Chinese-English spoken translation based on attention LSTM [EAI Endorsed Scal Inf Syst (2022), Online First]
CN114970557A (en) Knowledge enhancement-based cross-language structured emotion analysis method
CN113569560A (en) Automatic scoring method for Chinese bilingual composition
Charnine et al. Optimal automated method for collaborative development of universiry curricula

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20190927