CN107564528A

CN107564528A - Method and device for matching speech recognition text with command word text

Info

Publication number: CN107564528A
Application number: CN201710849743.XA
Authority: CN
Inventors: 姚佳
Original assignee: Shenzhen Konggu Youlan Artificial Intelligence Technology Co., Ltd.
Current assignee: Guangdong Hui He Science and Technology Development Co., Ltd.
Priority date: 2017-09-20
Filing date: 2017-09-20
Publication date: 2018-01-09
Anticipated expiration: 2037-09-20
Also published as: CN107564528B

Abstract

The present invention proposes a method and a device for matching speech recognition text with command word text. The method includes: obtaining text produced by speech recognition; segmenting the text into words to generate multiple text words, and segmenting preset command word text to generate command words; determining, for each text word, the command word with the highest word similarity, and generating a first correspondence from these pairs; determining the word similarity between the text and the command word text based on each highest similarity; determining the pinyin of each character in the text and in the command word text; determining, for the pinyin of each character in the text, the most similar pinyin of a character in the command words, and generating a second correspondence from these pairs; determining the pinyin similarity between the text and the command word text based on each highest pinyin similarity; and determining the similarity between the text and the command word text based on the word similarity and the pinyin similarity. The scheme requires no labeled data, tolerates speech recognition errors well, and supports the subsequent matching against command words.

Description

Method and device for matching speech recognition text with command word text

Technical field

The present invention relates to the field of recognition, and in particular to a method and a device for matching speech recognition text with command word text.

Background

In current voice interaction, speech is first converted into text by speech recognition technology and then processed accordingly. In this process, the recognized text frequently has to be matched against function command words to determine the user's intent. In practice, however, some function command words are too short, ambiguous, or unlike conventional expressions, so the recognized text can differ considerably from what was actually said.

At present, general text matching algorithms typically consider the similarity between texts directly. Existing text matching schemes include:

A. String level, for example (weighted) edit distance, retrieval models, and so on;

B. Shallow semantic level, for example building a similar-word dictionary on top of A, or applying dependency parsing to the text so that word-order information enters the comparison;

C. Deep semantic level, comparing deep semantics with currently popular deep learning models such as RNN, Bi-LSTM, GRU, and CNN.

Objective shortcomings of the prior art:

Current text matching algorithms are not well suited to matching speech recognition text against command words. When speech recognition makes errors, essentially all existing methods run into problems. String-level methods cannot capture enough semantic information, and their failure rate becomes very high once speech recognition has a certain error rate. Shallow semantic methods that build dependency parses or similar analyses consume more processing time, which lowers the efficiency of the whole online environment and is not very practical; shallow semantic analysis also adapts unevenly to colloquial (or even misrecognized) expressions, and building a similar-word dictionary still amounts to a string-level comparison, so it likewise cannot solve the problem of speech recognition errors. Deep semantic methods require large amounts of labeled data, and in such an emerging field labeled data is very costly to obtain, or simply unobtainable in the short term.

Thus, when interacting with a robot, the user must say something that matches a function command word before the corresponding function can be entered. With the approaches above, when the recognized speech carries insufficient information, the result can deviate considerably.

Summary of the invention

In view of the defects in the prior art, the present invention proposes a method and a device for matching speech recognition text with command word text, which tolerate speech recognition errors to a large extent and support the subsequent matching against command words.

Specifically, the present invention proposes the following embodiments:

An embodiment of the present invention proposes a method for matching speech recognition text with command word text, applied to human-machine interaction scenarios. The method includes:

Obtaining text produced by speech recognition;

Segmenting the text into words to generate multiple text words, and segmenting preset command word text to generate command words;

Determining, for each text word, the command word with the highest word similarity, and generating a first correspondence from these pairs, wherein in the first correspondence each determined command word corresponds to only one text word;

Determining the word similarity between the text and the command word text based on each highest similarity;

Determining the pinyin of each character in the text and in the command word text;

Determining, for the pinyin of each character in the text, the most similar pinyin of a character in the command words, and generating a second correspondence from these pairs, wherein in the second correspondence the pinyin of each character in the text corresponds to the pinyin of only one character in the command words;

Determining the pinyin similarity between the text and the command word text based on each highest pinyin similarity;

Determining the similarity between the text and the command word text based on the word similarity and the pinyin similarity.

In a specific embodiment, the method further includes:

Obtaining text data whose volume exceeds a certain value;

Determining a first similarity between words in the text data based on word2vec;

Determining a second similarity between words in the text data based on hownet;

Obtaining, based on the first similarity and the second similarity, the similarity between any two words in the text data, and building a near-synonym similarity vocabulary;

Determining all pinyin readings of all characters in the text data;

Collecting pinyin information whose pronunciation meets a preset similarity condition;

Determining the similarity between any two pinyin syllables based on the obtained pinyin and the pinyin information, and building a pinyin similarity table covering the pinyin of all characters in the text data;

The word segmentation is performed based on the similarity between any two words.

In a specific embodiment, "determining, for each text word, the command word with the highest word similarity" includes:

For each text word, determining the similarity between that text word and each command word based on the near-synonym similarity vocabulary;

Sorting the similarities by magnitude to determine the command word with the highest similarity to the text word.

"Determining, for the pinyin of each character in the text, the most similar pinyin of a character in the command words" includes:

For each character of the text, determining, based on the pinyin similarity table, the pinyin similarity between the pinyin of that character and the pinyin of each character in the command words;

Sorting the pinyin similarities by magnitude to determine the character pinyin in the command words that has the highest pinyin similarity to the pinyin of the character in the text.

In a specific embodiment, "determining the word similarity between the text and the command word text based on each highest similarity" includes:

Obtaining each highest similarity and the number of characters of the word to which each highest similarity belongs;

Determining a word-length weighted average based on those character counts, the number of characters of the text, and each highest similarity;

Setting the word-length weighted average as the word similarity between the text and the command word text.

In a specific embodiment, "determining the pinyin similarity between the text and the command word text based on each highest pinyin similarity" includes:

Determining the average of the highest pinyin similarities;

Taking the average as the pinyin similarity between the text and the command word text.

In a specific embodiment, "determining the similarity between the text and the command word text based on the word similarity and the pinyin similarity" includes:

Computing a weighted sum of the word similarity and the pinyin similarity to obtain the similarity between the text and the command word text, wherein the weighted sum is given by the following formula:

sim_x_y = 0.6*sim_w_x_y + 0.4*pinyin_sim_x_y, where sim_x_y is the similarity between the text and the command word text, sim_w_x_y is the word similarity, and pinyin_sim_x_y is the pinyin similarity.

An embodiment of the present invention also proposes a device for matching speech recognition text with command word text, applied to human-machine interaction scenarios. The device includes:

An acquisition module, configured to obtain text produced by speech recognition;

A word segmentation module, configured to segment the text into words to generate multiple text words, and to segment preset command word text to generate command words;

A first generation module, configured to determine, for each text word, the command word with the highest word similarity, and to generate a first correspondence from these pairs, wherein in the first correspondence each determined command word corresponds to only one text word;

A word similarity determining module, configured to determine the word similarity between the text and the command word text based on each highest similarity;

A pinyin determining module, configured to determine the pinyin of each character in the text and in the command word text;

A second generation module, configured to determine, for the pinyin of each character in the text, the most similar pinyin of a character in the command words, and to generate a second correspondence from these pairs, wherein in the second correspondence the pinyin of each character in the text corresponds to the pinyin of only one character in the command words;

A pinyin similarity determining module, configured to determine the pinyin similarity between the text and the command word text based on each highest pinyin similarity;

A processing module, configured to determine the similarity between the text and the command word text based on the word similarity and the pinyin similarity.

In a specific embodiment, the device further includes a preprocessing module, configured to:

Obtain text data whose volume exceeds a certain value;

Determine all pinyin readings of all characters in the text data;

The word segmentation is performed based on the similarity between any two words;

The first generation module "determining, for each text word, the command word with the highest word similarity" includes:

The second generation module "determining, for the pinyin of each character in the text, the most similar pinyin of a character in the command words" includes:

In a specific embodiment, the word similarity determining module is configured to:

In a specific embodiment, the pinyin similarity determining module is configured to:

Determine the average of the highest pinyin similarities;

In a specific embodiment, the processing module is configured to:

In summary, the embodiments of the present invention propose a method and a device for matching speech recognition text with command word text, applied to human-machine interaction scenarios. The method includes: obtaining text produced by speech recognition; segmenting the text into words to generate multiple text words, and segmenting preset command word text to generate command words; determining, for each text word, the command word with the highest word similarity, and generating a first correspondence in which each determined command word corresponds to only one text word; determining the word similarity between the text and the command word text based on each highest similarity; determining the pinyin of each character in the text and in the command word text; determining, for the pinyin of each character in the text, the most similar pinyin of a character in the command words, and generating a second correspondence in which the pinyin of each character in the text corresponds to the pinyin of only one character in the command words; determining the pinyin similarity between the text and the command word text based on each highest pinyin similarity; and determining the similarity between the text and the command word text based on the word similarity and the pinyin similarity. The scheme requires no labeled data, tolerates speech recognition errors well, and supports the subsequent matching against command words.

Brief description of the drawings

To illustrate the technical solutions of the embodiments of the present invention more clearly, the drawings used in the embodiments are briefly described below. It should be understood that the following drawings show only certain embodiments of the present invention and should therefore not be regarded as limiting its scope; those of ordinary skill in the art can derive other related drawings from them without creative effort.

Fig. 1 is a schematic flowchart of a method for matching speech recognition text with command word text proposed by an embodiment of the present invention;

Fig. 2 is a schematic structural diagram of a device for matching speech recognition text with command word text proposed by an embodiment of the present invention.

Embodiment

Various embodiments of the disclosure are described more fully below. The disclosure can have various embodiments, and adjustments and changes can be made to them. It should be understood, however, that there is no intention to limit the various embodiments of the disclosure to the specific embodiments disclosed herein; rather, the disclosure should be interpreted as covering all adjustments, equivalents, and/or alternatives falling within the spirit and scope of its various embodiments.

Embodiment 1

This embodiment of the invention discloses a method for matching speech recognition text with command word text, applied to human-machine interaction scenarios. As shown in Fig. 1, the method includes:

Step 101: obtain text produced by speech recognition;

Specifically, speech data is recognized by speech recognition to generate the text.

Step 102: segment the text into words to generate multiple text words, and segment preset command word text to generate command words;

Specifically, word segmentation is performed on the text and on the preset command word text. Preferably, the segmentation is based on the similarity between any two words, which avoids excessive invalid segmentation; that is, both the text and the preset command word text are segmented based on the similarity between any two words.

The similarity between any two words can be obtained through the following procedure:

Obtain text data whose volume exceeds a certain value;

Determine all pinyin readings of all characters in the text data;

Collect pinyin information whose pronunciation meets a preset similarity condition;

The word segmentation is performed based on the similarity between any two words;

A specific example follows:

First, nearly 5 GB of text data can be collected by crawling the web, and word-to-word similarity is then built with word2vec. A near-synonym similarity vocabulary is built from word2vec and hownet, giving the similarity between any two words; this is defined as the function word_sim_fuc(w1, w2).

Specifically, the similarity between any two words can be obtained by averaging the two similarities (the one obtained from word2vec and the one obtained from hownet).
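As a minimal sketch of the averaging just described, the Python below builds a word_sim_fuc from two pairwise similarity functions. The names make_word_sim_fuc and table_sim, the toy scores, and the convention that identical words score 1.0 are assumptions made for illustration; the source only states that the word2vec-based and hownet-based similarities are averaged.

```python
from typing import Callable, Dict, Tuple

SimFn = Callable[[str, str], float]


def table_sim(table: Dict[Tuple[str, str], float]) -> SimFn:
    """Wrap a symmetric lookup table as a similarity function (hypothetical helper)."""
    def sim(w1: str, w2: str) -> float:
        return table.get((w1, w2), table.get((w2, w1), 0.0))
    return sim


def make_word_sim_fuc(w2v_sim: SimFn, hownet_sim: SimFn) -> SimFn:
    """Average a word2vec-based and a hownet-based similarity."""
    def word_sim_fuc(w1: str, w2: str) -> float:
        if w1 == w2:  # assumption: identical words are maximally similar
            return 1.0
        return (w2v_sim(w1, w2) + hownet_sim(w1, w2)) / 2.0
    return word_sim_fuc


# Made-up scores standing in for real word2vec / hownet lookups.
w2v = table_sim({("dakai", "kaiqi"): 0.8})
hownet = table_sim({("dakai", "kaiqi"): 1.0})
word_sim_fuc = make_word_sim_fuc(w2v, hownet)
print(round(word_sim_fuc("dakai", "kaiqi"), 2))  # average of 0.8 and 1.0
```

Passing the two sources in as plain callables keeps the sketch independent of any particular word2vec or hownet toolkit.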

Then, all possible pinyin readings of every character are obtained (to cover polyphonic characters).

Third, pinyin information with similar pronunciation is collected (for example, f and h are easily confused, r and l are easily confused, and back nasal and front nasal finals are easily confused). From this data, a similar-pinyin table covering the pinyin of all characters can be constructed, with a similarity value assigned per category of confusion, for example manually or by some other identification method. For instance, the similarity between ting and tin can be 0.9, between fu and hu 0.7, and between rou and lou 0.7. The similarity between two pinyin syllables can be defined as pinyin_sim_fuc(pinyin1, pinyin2).
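One way such a pinyin_sim_fuc could be realized is with a few confusion rules instead of a fully enumerated table. The sketch below is an assumption-laden illustration: the initial/final split, the rule that several confusions multiply, and the score 0.0 for unrelated syllables are not stated in the source; only the confusable pairs (f/h, r/l, front/back nasal) and the example values 0.9 and 0.7 are.

```python
# Confusable initials and their assumed scores (values taken from the example).
CONFUSABLE_INITIALS = {frozenset(("f", "h")): 0.7, frozenset(("r", "l")): 0.7}
NASAL_SCORE = 0.9  # front vs. back nasal final, e.g. in / ing

# Two-letter initials come first so that "zh" is not split as "z" + "h...".
INITIALS = ("zh", "ch", "sh", "b", "p", "m", "f", "d", "t", "n", "l",
            "g", "k", "h", "j", "q", "x", "r", "z", "c", "s", "y", "w")


def split_syllable(pinyin: str):
    """Split a toneless pinyin syllable into (initial, final)."""
    for initial in INITIALS:
        if pinyin.startswith(initial):
            return initial, pinyin[len(initial):]
    return "", pinyin  # zero-initial syllables such as "an"


def pinyin_sim_fuc(pinyin1: str, pinyin2: str) -> float:
    if pinyin1 == pinyin2:
        return 1.0
    i1, f1 = split_syllable(pinyin1)
    i2, f2 = split_syllable(pinyin2)
    score = 1.0
    if i1 != i2:
        score *= CONFUSABLE_INITIALS.get(frozenset((i1, i2)), 0.0)
    if f1 != f2:
        stem1, stem2 = f1.rstrip("g"), f2.rstrip("g")
        if stem1 == stem2 and stem1.endswith("n"):
            score *= NASAL_SCORE  # finals differ only by the trailing "g"
        else:
            score = 0.0
    return score


print(pinyin_sim_fuc("ting", "tin"), pinyin_sim_fuc("fu", "hu"), pinyin_sim_fuc("rou", "lou"))
```

A hand-assigned table, as the text suggests, would allow finer per-pair values; the rules here only reproduce the three quoted examples.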

Step 103: determine, for each text word, the command word with the highest word similarity, and generate a first correspondence from these pairs, wherein in the first correspondence each determined command word corresponds to only one text word;

Specifically, "determining, for each text word, the command word with the highest word similarity" in step 103 includes:

Step 104: determine the word similarity between the text and the command word text based on each highest similarity;

Specifically, "determining the word similarity between the text and the command word text based on each highest similarity" in step 104 includes:

Specifically, after the speech recognition text is obtained, it can be compared with the function command words. Denote the speech recognition text as x and the function command word text as y. As an example, assume x is the four-character phrase with pinyin "hu qiang da kai" (rendered in this translation as "mutual thwack opens") and y is the phrase with pinyin "kai qi fu jian" ("open annex").

Using the similarity between any two words obtained above, x and y are segmented in turn. For each word in x (w_x_i), the most similar word in y (w_y_j) is found using the word_sim_fuc function defined above; the similarity is recorded as sim_w_i_j, and each w_y_j may correspond to only one w_x_i. Once the most-similar-word similarity of every w_x_i has been obtained, their word-length weighted average is taken as the word similarity of x and y, recorded as sim_w_x_y.

Continuing the example, x is segmented into "hu" / "qiang" / "da kai", and y into "kai qi" / "fu jian". Here "da kai" corresponds to "kai qi"; of the remainder, "fu jian" corresponds to "hu", and "qiang" has no corresponding word.

Assume the similarity between "da kai" and "kai qi" is 0.9, the similarity between "fu jian" and "hu" is 0, and the similarity for "qiang" is 0. Then sim_w_x_y = 0.5*0.9 + 0.25*0 + 0.25*0 = 0.45.

Specifically, the word-length weighted average weights each word by its length relative to the length of the text. For example, "da kai" has 2 characters and x has 4 characters, so its weight is 0.5; "hu" and "qiang" are weighted in the same way (1 character each, weight 0.25).
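The word-length weighted average can be checked with a few lines of Python. This is a sketch only: the function name word_length_weighted_average and the (character count, best similarity) input format are assumptions, and the one-to-one pairing is taken as already done.

```python
from typing import List, Tuple


def word_length_weighted_average(matches: List[Tuple[int, float]]) -> float:
    """matches holds (character count of a text word, its best similarity)."""
    total_chars = sum(n for n, _ in matches)
    if total_chars == 0:
        return 0.0
    # Each word contributes in proportion to its share of the text length.
    return sum(n / total_chars * s for n, s in matches)


# Worked example from the text: "da kai" (2 characters, 0.9),
# "hu" and "qiang" (1 character each, similarity 0).
sim_w_x_y = word_length_weighted_average([(2, 0.9), (1, 0.0), (1, 0.0)])
print(round(sim_w_x_y, 2))  # 0.45
```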

Step 105: determine the pinyin of each character in the text and in the command word text;

Step 106: determine, for the pinyin of each character in the text, the most similar pinyin of a character in the command words, and generate a second correspondence from these pairs, wherein in the second correspondence the pinyin of each character in the text corresponds to the pinyin of only one character in the command words;

Here, "determining, for the pinyin of each character in the text, the most similar pinyin of a character in the command words" in step 106 includes:

Step 107: determine the pinyin similarity between the text and the command word text based on each highest pinyin similarity;

Specifically, "determining the pinyin similarity between the text and the command word text based on each highest pinyin similarity" includes:

Determining the average of the highest pinyin similarities;

As an example, the pinyin of each character in x and y is obtained from the full set of character pinyin readings obtained above.

For the pinyin of each character in x (pinyin_x_i), the most similar character pinyin in y (pinyin_y_j) is found using the pinyin_sim_fuc function defined above; the similarity is recorded as sim_pinyin_i_j, and each pinyin_x_i may correspond to only one pinyin_y_j. Once the most similar character pinyin of every pinyin_x_i has been obtained, the average is taken as the pinyin similarity of x and y, recorded as pinyin_sim_x_y.

Specifically, continuing the example above, the pinyin of x ("mutual thwack opens") is hu jiang/qiang da kai, and the pinyin of y ("open annex") is kai qi fu jian.

Traversing from x:

hu is most similar to fu, similarity 0.7;

jiang/qiang is most similar to jian, similarity 0.9;

kai is most similar to kai, similarity 1.0;

da and qi are left over and are paired with each other, similarity 0.

Averaging these gives a pinyin similarity of 2.6/4 = 0.65.
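The pinyin pairing above can be sketched as follows. Several points are assumptions rather than the patent's stated procedure: pairs are chosen globally, best score first, to enforce the one-to-one constraint; a polyphonic character (here jiang/qiang) takes the best score over its readings; and the small PINYIN_TABLE simply hard-codes the two values quoted in the example.

```python
from itertools import product
from typing import Dict, Sequence, Tuple

Readings = Tuple[str, ...]  # all possible pinyin readings of one character

# Toy table holding only the similarities quoted in the example.
PINYIN_TABLE: Dict[Tuple[str, str], float] = {("hu", "fu"): 0.7, ("jiang", "jian"): 0.9}


def syllable_sim(p1: str, p2: str) -> float:
    if p1 == p2:
        return 1.0
    return PINYIN_TABLE.get((p1, p2), PINYIN_TABLE.get((p2, p1), 0.0))


def char_sim(a: Readings, b: Readings) -> float:
    """Best score over all reading combinations, to cover polyphonic characters."""
    return max(syllable_sim(p, q) for p, q in product(a, b))


def pinyin_similarity(x: Sequence[Readings], y: Sequence[Readings]) -> float:
    """Pair characters one-to-one, best pairs first, then average over len(x)."""
    if not x:
        return 0.0
    scored = sorted(
        ((char_sim(a, b), i, j) for (i, a), (j, b) in product(enumerate(x), enumerate(y))),
        key=lambda t: -t[0],
    )
    best = [0.0] * len(x)
    used_x, used_y = set(), set()
    for score, i, j in scored:
        if i not in used_x and j not in used_y:
            used_x.add(i)
            used_y.add(j)
            best[i] = score
    return sum(best) / len(x)


x = [("hu",), ("jiang", "qiang"), ("da",), ("kai",)]
y = [("kai",), ("qi",), ("fu",), ("jian",)]
pinyin_sim_x_y = pinyin_similarity(x, y)
print(round(pinyin_sim_x_y, 2))  # 0.65
```

Taking the best pairs first means a zero-scoring character such as da cannot claim kai before the exact kai/kai match is made, which a strict left-to-right traversal with arbitrary tie-breaking could allow.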

Step 108: determine the similarity between the text and the command word text based on the word similarity and the pinyin similarity.

In one embodiment, "determining the similarity between the text and the command word text based on the word similarity and the pinyin similarity" includes:

Continuing the example above, the final similarity is 0.6*0.45 + 0.4*0.65 = 0.53.
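The final weighted sum is simple enough to verify directly. In the sketch below the function name combined_similarity and the keyword parameters are assumptions; the weights 0.6 and 0.4 and the inputs 0.45 and 0.65 come from the text.

```python
def combined_similarity(sim_w_x_y: float, pinyin_sim_x_y: float,
                        word_weight: float = 0.6, pinyin_weight: float = 0.4) -> float:
    """Weighted sum of word similarity and pinyin similarity."""
    return word_weight * sim_w_x_y + pinyin_weight * pinyin_sim_x_y


sim_x_y = combined_similarity(0.45, 0.65)
print(round(sim_x_y, 2))  # 0.53
```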

Embodiment 2

Embodiment 2 of the present invention also discloses a device for matching speech recognition text with command word text, applied to human-machine interaction scenarios. As shown in Fig. 2, the device includes:

An acquisition module 201, configured to obtain text produced by speech recognition;

A word segmentation module 202, configured to segment the text into words to generate multiple text words, and to segment preset command word text to generate command words;

A first generation module 203, configured to determine, for each text word, the command word with the highest word similarity, and to generate a first correspondence from these pairs, wherein in the first correspondence each determined command word corresponds to only one text word;

A word similarity determining module 204, configured to determine the word similarity between the text and the command word text based on each highest similarity;

A pinyin determining module 205, configured to determine the pinyin of each character in the text and in the command word text;

A second generation module 206, configured to determine, for the pinyin of each character in the text, the most similar pinyin of a character in the command words, and to generate a second correspondence from these pairs, wherein in the second correspondence the pinyin of each character in the text corresponds to the pinyin of only one character in the command words;

A pinyin similarity determining module 207, configured to determine the pinyin similarity between the text and the command word text based on each highest pinyin similarity;

A processing module 208, configured to determine the similarity between the text and the command word text based on the word similarity and the pinyin similarity.

In one embodiment, the device further includes a preprocessing module, configured to:

Obtain text data whose volume exceeds a certain value;

Determine all pinyin readings of all characters in the text data;

The word segmentation is performed based on the similarity between any two words;

In one embodiment, the word similarity determining module is configured to:

In one embodiment, the pinyin similarity determining module is configured to:

Determine the average of the highest pinyin similarities;

In one embodiment, the processing module is configured to:

Those skilled in the art will appreciate that the drawings are schematic diagrams of a preferred implementation scenario, and that the modules or flows in the drawings are not necessarily required to implement the present invention.

Those skilled in the art will appreciate that the modules of the device in an implementation scenario can be distributed in that device as described, or can be changed accordingly and placed in one or more devices different from the one in this implementation scenario. The modules of the above implementation scenario can be merged into one module, or further split into multiple submodules.

The sequence numbers above are for description only and do not indicate the relative merits of the implementation scenarios.

The above discloses only several specific implementation scenarios of the present invention. The present invention is not limited to these, however, and any variation that those skilled in the art can conceive of shall fall within the protection scope of the present invention.

Claims

1. A method for matching speech recognition text with command word text, characterized in that it is applied to human-machine interaction scenarios, the method including:

Obtaining text produced by speech recognition;

Segmenting the text into words to generate multiple text words, and segmenting preset command word text to generate command words;

Determining, for each text word, the command word with the highest word similarity, and generating a first correspondence from these pairs, wherein in the first correspondence each determined command word corresponds to only one text word;

Determining the pinyin of each character in the text and in the command word text;

Determining, for the pinyin of each character in the text, the most similar pinyin of a character in the command words, and generating a second correspondence from these pairs, wherein in the second correspondence the pinyin of each character in the text corresponds to the pinyin of only one character in the command words;

2. The method of claim 1, characterized in that it further includes:

Obtaining text data whose volume exceeds a certain value;

Obtaining, based on the first similarity and the second similarity, the similarity between any two words in the text data, and building a near-synonym similarity vocabulary;

Determining all pinyin readings of all characters in the text data;

Determining the similarity between any two pinyin syllables based on the obtained pinyin and the pinyin information, and building a pinyin similarity table covering the pinyin of all characters in the text data;

The word segmentation is performed based on the similarity between any two words;

"Determining, for each text word, the command word with the highest word similarity" includes:

For each text word, determining the similarity between that text word and each command word based on the near-synonym similarity vocabulary;

"Determining, for the pinyin of each character in the text, the most similar pinyin of a character in the command words" includes:

For each character of the text, determining, based on the pinyin similarity table, the pinyin similarity between the pinyin of that character and the pinyin of each character in the command words;

Sorting the pinyin similarities by magnitude to determine the character pinyin in the command words that has the highest pinyin similarity to the pinyin of the character in the text.

3. The method of claim 1, characterized in that "determining the word similarity between the text and the command word text based on each highest similarity" includes:

4. The method of claim 1, characterized in that "determining the pinyin similarity between the text and the command word text based on each highest pinyin similarity" includes:

Determining the average of the highest pinyin similarities;

5. The method of claim 1, characterized in that "determining the similarity between the text and the command word text based on the word similarity and the pinyin similarity" includes:

Computing a weighted sum of the word similarity and the pinyin similarity to obtain the similarity between the text and the command word text, wherein the weighted sum is given by the following formula:

sim_x_y = 0.6*sim_w_x_y + 0.4*pinyin_sim_x_y, where sim_x_y is the similarity between the text and the command word text, sim_w_x_y is the word similarity, and pinyin_sim_x_y is the pinyin similarity.

6. A device for matching speech recognition text with command word text, characterized in that it is applied to human-machine interaction scenarios, the device including:

An acquisition module, configured to obtain text produced by speech recognition;

A word segmentation module, configured to segment the text into words to generate multiple text words, and to segment preset command word text to generate command words;

A first generation module, configured to determine, for each text word, the command word with the highest word similarity, and to generate a first correspondence from these pairs, wherein in the first correspondence each determined command word corresponds to only one text word;

A word similarity determining module, configured to determine the word similarity between the text and the command word text based on each highest similarity;

A second generation module, configured to determine, for the pinyin of each character in the text, the most similar pinyin of a character in the command words, and to generate a second correspondence from these pairs, wherein in the second correspondence the pinyin of each character in the text corresponds to the pinyin of only one character in the command words;

A pinyin similarity determining module, configured to determine the pinyin similarity between the text and the command word text based on each highest pinyin similarity;

A processing module, configured to determine the similarity between the text and the command word text based on the word similarity and the pinyin similarity.

7. The device of claim 6, characterized in that it further includes a preprocessing module, configured to obtain text data whose volume exceeds a certain value;

Determine all pinyin readings of all characters in the text data;

The word segmentation is performed based on the similarity between any two words;

The second generation module "determining, for the pinyin of each character in the text, the most similar pinyin of a character in the command words" includes:

8. The device of claim 6, characterized in that the word similarity determining module is configured to:

9. The device of claim 6, characterized in that the pinyin similarity determining module is configured to:

Determine the average of the highest pinyin similarities;

10. The device of claim 6, characterized in that the processing module is configured to: