
CN1201286C - Speech recognizer with a lexical tree based N-gram language model - Google Patents

Speech recognizer with a lexical tree based N-gram language model Download PDF

Info

Publication number
CN1201286C
CN1201286C CN99817058.5A
Authority
CN
China
Prior art keywords
probability
word
gram
phoneme
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired - Fee Related
Application number
CN99817058.5A
Other languages
Chinese (zh)
Other versions
CN1406374A (en)
Inventor
Lin Zhiwei (transliterated)
Yan Yonghong (transliterated)
Zhao Qingwei (transliterated)
Yuan Baosheng (transliterated)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Intel Corp
Original Assignee
Intel Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Intel Corp filed Critical Intel Corp
Publication of CN1406374A
Application granted granted Critical
Publication of CN1201286C
Anticipated expiration legal-status Critical
Expired - Fee Related legal-status Critical Current

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/08 Speech classification or search
    • G10L15/18 Speech classification or search using natural language modelling
    • G10L15/183 Speech classification or search using natural language modelling using context dependencies, e.g. language models
    • G10L15/19 Grammatical context, e.g. disambiguation of the recognition hypotheses based on word sequence rules
    • G10L15/197 Probabilistic grammars, e.g. word n-grams
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/08 Speech classification or search
    • G10L15/18 Speech classification or search using natural language modelling
    • G10L15/183 Speech classification or search using natural language modelling using context dependencies, e.g. language models
    • G10L15/187 Phonemic context, e.g. pronunciation rules, phonotactical constraints or phoneme n-grams

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Probability & Statistics with Applications (AREA)
  • Artificial Intelligence (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Machine Translation (AREA)

Abstract

In some embodiments, the invention comprises a method of creating a lexical tree and identifying beginning phonemes in the lexical tree. The method further comprises estimating the probabilities of words in the lexical tree that have particular beginning phonemes and storing at least some of the estimated probabilities, wherein backoff weights are not stored together with the estimated probabilities. The estimated probabilities may be stored in a lookup table. In other embodiments, the invention comprises a method of receiving phonemes and identifying them in the lexical tree. That method further comprises estimating the probabilities of words that include those phonemes by using estimated probabilities retrieved from a storage area, wherein the retrieved probabilities do not include backoff weights stored with the estimated probabilities. Again, the estimated probabilities may be stored in a lookup table and may be used in establishing pruning thresholds. The methods may be implemented through instructions on a computer-readable medium.

Description

Method of performing speech recognition using a lexical-tree-based n-gram language model
Technical field
The present invention relates to speech recognition systems and, more particularly, to a lexical-tree-based n-gram language model.
Background technology
One component of a speech recognizer is the language model. A language model comprises the probabilities that words in a vocabulary occur and that a word follows another word or words. Indeed, a popular way to capture the syntactic structure of a given language is to use conditional probabilities to capture the sequential information embedded in the word strings of sentences. For example, if the current word is $w_1$, a language model can describe the probability that some other word $w_2, w_3, \dots, w_N$ will follow $w_1$. The conditional probabilities are typically computed from how often words neighbor one another in a training corpus (for example, newspapers). For example, the conditional probability $P_{21} = P(w_2 \mid w_1)$ is the probability that word $w_2$ follows word $w_1$. The probability $P_{21}$ is called a bigram (2-gram). A trigram (3-gram) is the conditional probability of a word following two other words in order. For example, $P_{210} = P(w_2 \mid w_1 w_0)$ is the probability that $w_2$ follows $w_1$, where $w_1$ in turn follows $w_0$. A unigram (1-gram) probability is the probability that a word occurs at all. For example, $p_1 = p(w_1)$ is the probability that word $w_1$ occurs at a given time without regard to the previous words.
The number of possible word combinations involved grows geometrically across unigrams, bigrams, trigrams, and so on. The terms "lower-order gram" and "higher-order gram" as used herein refer to the order of the gram: a unigram is lower order than a bigram, and a bigram is lower order than a trigram. For a large vocabulary, the total number of trigram combinations, and even of bigram combinations, is too large to manage. Moreover, many of these trigrams and bigrams have conditional probabilities so small (nearly zero) that they are not worth placing in the language model. Backoff weights have therefore been used to adjust the probabilities of lower-order grams: for example, when a trigram probability is not included in the language model, a bigram probability multiplied by a backoff weight (bowt) is used instead. If the backoff weight does not exist, the lower-order gram simply replaces the higher-order gram. Accordingly, a word-based n-gram language model can be expressed as equation (1), as follows:
$$P(w_n \mid w_{n-1} \dots w_1) = \begin{cases} p(w_n \mid w_{n-1} \dots w_1) & \text{if the } n\text{-gram exists} \\ \text{bowt}(w_{n-1} \dots w_1) \cdot P(w_n \mid w_{n-1} \dots w_2) & \text{if the backoff weight exists} \\ P(w_n \mid w_{n-1} \dots w_2) & \text{otherwise} \end{cases} \qquad (1)$$
As noted above, although equation (1) is expressed for a general n-gram, orders higher than trigrams are rarely considered.
A typical n-gram language model file is stored in the following format:
For 1-grams: $p(w_1) \; w_1 \; \text{bowt}(w_1)$
For i-grams (for $i = 1, \dots, n-1$): $p(w_i \mid w_{i-1} \dots w_1) \; w_1 \dots w_i \; \text{bowt}(w_1 \dots w_i)$
For n-grams: $p(w_n \mid w_{n-1} w_{n-2} \dots w_1) \; w_1 \dots w_n$
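To make this storage scheme concrete, the sketch below (hypothetical Python; the class name BackoffLM and the toy numbers are invented for illustration and do not come from the patent) holds probabilities and backoff weights keyed by word tuples, mirroring the file format above, and answers a query by backing off to lower-order grams as in equation (1). Log probabilities are used so that products become sums.

```python
import math

class BackoffLM:
    """Minimal backoff n-gram model (illustrative sketch, not the patent's code).

    probs maps a word tuple (w1, ..., wi) to log p(wi | w1 ... wi-1);
    bowts maps a context tuple (w1, ..., wi) to its log backoff weight.
    """
    def __init__(self, probs, bowts):
        self.probs = probs
        self.bowts = bowts

    def logprob(self, words):
        """Return log P(words[-1] | words[:-1]) using equation (1)-style backoff."""
        words = tuple(words)
        if words in self.probs:                 # explicit n-gram stored
            return self.probs[words]
        if len(words) == 1:                     # unseen unigram: floor value
            return -99.0
        context = words[:-1]
        bowt = self.bowts.get(context, 0.0)     # missing bowt acts as weight 1
        return bowt + self.logprob(words[1:])   # back off to the (n-1)-gram

# Toy bigram model over a small vocabulary.
lm = BackoffLM(
    probs={("fund",): math.log(0.2), ("funds",): math.log(0.1),
           ("mutual",): math.log(0.05), ("mutual", "fund"): math.log(0.5)},
    bowts={("mutual",): math.log(0.8)},
)
print(lm.logprob(("mutual", "fund")))    # stored bigram
print(lm.logprob(("mutual", "funds")))   # backs off: bowt(mutual) + log p(funds)
```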
A lexical tree is used to organize the possible words. For example, suppose that in a lexical tree any of the words $w_2, w_3, \dots, w_N$ may follow word $w_1$. Conditional probabilities can be computed to help decide which of $w_2, w_3, \dots, w_N$ follows $w_1$. For a large vocabulary, the number of possibilities is enormous. Various techniques have been developed to reduce the number of possibilities involved, for example by "pruning" with a beam: cutting paths whose conditional probability is low relative to a threshold tied to the maximum.
A word arrives as a series of phonemes to be detected; a phoneme herein means a digital electrical signal representing a sound. Until the last phoneme of a word has been detected, however, which word is being spoken is normally unknown. As a result, pruning of the incoming word is delayed, which slows the overall speed at which received words are decoded.
In S. Ortmanns et al., "Language-Model Look-Ahead for Large Vocabulary Speech Recognition," ICSLP '96 (1996), pp. 2095-98, a look-ahead technique is proposed that incorporates language model probabilities early in the pruning process of a beam search strategy. The authors of that article, however, fail to recognize how best to keep the stored estimated probabilities of the lexical tree at a manageable level. Indeed, the Ortmanns et al. article concludes that the size of the table storing the computed (estimated) probabilities would be exceptionally large. See p. 2097 of the article.
Accordingly, large-vocabulary continuous speech recognition (LVCSR) systems need a better lexical-tree n-gram language model format.
Summary of the invention
According to one aspect of the present invention, there is provided a method of performing speech recognition, comprising: creating a lexical tree; identifying a first phoneme in the lexical tree; estimating the probabilities of words in the lexical tree that have the first phoneme; and storing at least some of the estimated probabilities, wherein the stored estimated probabilities include only estimated probabilities derived directly from n-gram probabilities, and estimated probabilities that would be derived from backoff probabilities are approximated by the backed-off (n-1)-gram estimated probabilities.
According to another aspect of the present invention, there is also provided a method of performing speech recognition, comprising: receiving phonemes and identifying them in a lexical tree; and estimating the probabilities of words that include those phonemes by using estimated probabilities retrieved from a storage area, wherein the retrieved estimated probabilities include only estimated probabilities derived directly from n-gram probabilities, and estimated probabilities that would be derived from backoff probabilities are approximated by the backed-off (n-1)-gram estimated probabilities.
In some embodiments, the invention includes a method of creating a lexical tree and identifying beginning phonemes in that lexical tree. The method further includes estimating the probabilities of words in the lexical tree that have particular beginning phonemes, and storing at least some of the estimated probabilities, wherein backoff weights are not stored with the estimated probabilities. The estimated probabilities may be stored in a lookup table.
In other embodiments, the invention includes a method of receiving phonemes and identifying them in a lexical tree. The method further includes estimating the probabilities of words that include those phonemes by using estimated probabilities retrieved from a storage area, wherein the retrieved probabilities do not include backoff weights stored together with the estimated probabilities. Again, the estimated probabilities may be stored in a lookup table.
The estimated probabilities may be used in establishing pruning thresholds.
These methods may be implemented through instructions on a computer-readable medium.
Further embodiments are described herein and are summarized in the claims.
Description of drawings
The invention will be understood more fully from the following detailed description and from the accompanying drawings of embodiments of the invention which, however, should not be taken to limit the invention to the specific embodiments described; these embodiments are for explanation and understanding only.
Fig. 1 is a schematic representation of a lexical tree according to some embodiments of the present invention.
Fig. 2 is a high-level block diagram of a computer system that can be used in some embodiments of the present invention.
Fig. 3 is a high-level schematic representation of a handheld computer system that can be used in some embodiments of the present invention.
Embodiment
The present invention relates to a lexical-tree-based n-gram language model format for LVCSR. With the present invention, the probability of a word can be estimated as soon as a beginning phoneme is detected. Pruning of paths that fall below a threshold can then begin before the successor word is identified. The invention is used to speed up the search process in LVCSR. The language model plays a crucial role in decoding, both for accuracy and for performance; the performance of a speech recognition system is therefore tied to its language model.
The present invention relates to various ways of organizing a lexical tree. As an example, Fig. 1 shows a schematic of part of a lexical tree. The lexical tree of Fig. 1 links many words together through their phonemes, and different words may share the same phonemes. A predecessor word $w_0$ is represented by a rectangle; $w_0$ may or may not itself have preceding words. Some of the phonemes in the vocabulary can begin a successor word. These beginning phonemes, $Bph_1, Bph_2, \dots, Bph_x$, may be fewer than the total number of phonemes.
Several words can begin with the same phoneme. For ease of discussion, words sharing the same phonemes carry similar labels. For example, words $w_{11}$, $w_{12}$, and $w_{13}$ each begin with beginning phoneme $Bph_1$. More particularly, phonemes $Bph_1$, $ph_2$, $ph_3$, and $ph_4$ make up word $w_{11}$ (for example, the word "fund"); phonemes $Bph_1$, $ph_2$, $ph_3$, $ph_4$, and $ph_5$ make up word $w_{12}$ (for example, "funds"); and phonemes $Bph_1$, $ph_2$-$ph_4$, and $ph_6$-$ph_{10}$ make up word $w_{13}$ (for example, "fundamental"). (Note that the actual number of phonemes in these words may differ from what is shown here.) In practice, many more words typically begin with the same phoneme, but for ease of discussion only three words associated with $Bph_1$ are shown. In the example of Fig. 1, assume that word $w_{12}$ is the word finally detected. In that case, the path through $w_0$ and $w_{12}$ is the actual path, and the other paths are potential paths. A sketch of how such a tree can be built appears below.
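As a rough illustration of how such a tree can be built from a pronunciation dictionary (the patent prescribes no particular implementation; TreeNode, add_word, and the abbreviated phoneme strings are all illustrative assumptions), the sketch below shares phoneme prefixes among words, so that "fund", "funds", and "fundamental" hang off one common branch as in Fig. 1:

```python
class TreeNode:
    """One phoneme node (a state) of the lexical tree."""
    def __init__(self, phoneme=None):
        self.phoneme = phoneme
        self.children = {}   # phoneme -> TreeNode
        self.word = None     # set on the node that ends a word

def add_word(root, phonemes, word):
    """Insert a word's phoneme string, sharing any existing prefix."""
    node = root
    for ph in phonemes:
        node = node.children.setdefault(ph, TreeNode(ph))
    node.word = word

# Toy dictionary: three words sharing the beginning phoneme "f".
root = TreeNode()
add_word(root, ["f", "ah", "n", "d"], "fund")
add_word(root, ["f", "ah", "n", "d", "z"], "funds")
add_word(root, ["f", "ah", "n", "d", "ax", "m"], "fundamental")  # abbreviated
print(sorted(root.children))  # one shared branch: ['f']
```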
In some embodiments, as soon as the first phoneme following word $w_0$ is identified, the probabilities of the successor words can be estimated, so that pruning can begin before the successor word is known with certainty.
In some embodiments of the invention, a lexical-tree-based n-gram language model format can be used that applies efficiently to, for example, a tree-based Viterbi decoding algorithm with a language model look-ahead mechanism. For a tree-based Viterbi beam search algorithm, the estimated language model probability $\pi_v(s)$ for a tree state $s$ and predecessor word string $w_{n-1} w_{n-2} \dots w_1$ can typically be estimated by equation (2), as follows:
$$\pi_v(s) = \max_{w \in W(s)} \big( \lambda_w \cdot p(w \mid w_{n-1} w_{n-2} \dots w_1) \big) \qquad (2)$$
where $W(s)$ is the set of words that can be reached from lexical tree state $s$, $\lambda_w$ denotes a weight (expressed as a fraction), $v$ is the predecessor word, and $p(w \mid w_{n-1} w_{n-2} \dots w_1)$ denotes the n-gram word conditional probability. $\pi_v(s)$ may also be called the estimated probability $P_{estimated}$ used in establishing the pruning threshold, or the look-ahead probability. As a result of applying language model look-ahead, a tighter pruning beam can be obtained to speed up the decoding process. The fractional weight $\lambda_w$ can be set to 1 or can lie between 0 and 1; in some embodiments, $\lambda_w$ may be greater than 1. The fractional weight can be determined empirically, through trial and error, or by calculation, and may be the same or different for each beginning phoneme. Although the invention is expressed in terms of n-grams, in practice trigrams, bigrams, unigrams, and/or other gram orders may be used.
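A minimal sketch of the look-ahead computation of equation (2), assuming the conditional probabilities for the current predecessor context have already been fetched (all names and numbers here are illustrative, not the patent's):

```python
def lookahead_prob(cond_probs, reachable_words, weight=1.0):
    """Equation (2): pi_v(s) = max over w in W(s) of lambda_w * p(w | context).

    cond_probs maps each word to p(word | w_{n-1} ... w_1) for the current
    predecessor context; reachable_words is W(s), the words still reachable
    from tree state s. A single weight stands in for lambda_w here, though
    the patent allows a per-word weight. (Illustrative sketch.)
    """
    return weight * max(cond_probs.get(w, 0.0) for w in reachable_words)

# The three words below beginning phoneme Bph1 in Fig. 1 (toy probabilities):
cond_probs = {"fund": 0.5, "funds": 0.2, "fundamental": 0.05}
print(lookahead_prob(cond_probs, {"fund", "funds", "fundamental"}))  # 0.5
```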
Viewed from the tree, each phoneme node is a tree state. As more and more phonemes are detected while speech proceeds through the tree, the estimated probability may need to be recomputed so that pruning can continue.
Ordinarily, the estimated (computed) language model probabilities mentioned above must be computed and generated dynamically at run time. This process is very time-consuming, even when caching is introduced to reduce the overall computational cost. Computing the estimated probabilities in advance and storing them in a lookup table can accelerate this process significantly.
In the example of Fig. 1, suppose that $Bph_1$ is the first phoneme of the successor word. In that case, the bigram instance of equation (2) is given by equation (3), as follows:
$$P_{estimated} = \lambda_w \cdot \max\{P(w_{11} \mid w_0),\ P(w_{12} \mid w_0),\ P(w_{13} \mid w_0)\} \qquad (3)$$
Depending on the case, words whose probabilities or conditional probabilities fall below the threshold, or at or below the threshold, are pruned away. The threshold can be derived in various ways, for example by multiplying $P_{estimated}$ by a number or by subtracting a number from $P_{estimated}$.
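One plausible reading of that threshold rule, sketched in Python (the beam factor, the function names, and the toy numbers are assumptions, since the patent leaves the exact derivation open):

```python
def prune(hypotheses, p_estimated, beam=0.1):
    """Keep hypotheses whose probability clears a threshold derived from
    P_estimated; here threshold = beam * P_estimated, which is equivalent to
    subtracting a fixed margin in the log domain. Sketch only; the patent
    allows other derivations."""
    threshold = beam * p_estimated
    return {h: p for h, p in hypotheses.items() if p > threshold}

hyps = {"fund": 0.5, "funds": 0.2, "fundamental": 0.004}
print(prune(hyps, p_estimated=0.5))  # "fundamental" falls below 0.05 and is cut
```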
To speed up the decoding process, we define a lexical-tree-based n-gram language model format for storing the precomputed estimated probabilities, keeping the memory requirement within a controllable range by deploying a backoff mechanism. In general, the estimated probability $P_{estimated}$ can be obtained by equation (4), as follows:
$$P_{estimated}(s_j \mid w_{n-1} w_{n-2} \dots w_1) = \begin{cases} \max_{w \in W(s_j)} \big( \lambda_w \cdot p(w \mid w_{n-1} \dots w_1) \big) & \text{if an explicit } n\text{-gram exists} \\ \text{bowt}(w_{n-1} \dots w_1) \cdot P_{estimated}(s_j \mid w_{n-1} \dots w_2) & \text{if the backoff weight exists} \\ P_{estimated}(s_j \mid w_{n-1} \dots w_2) & \text{otherwise} \end{cases} \qquad (4)$$
where $s_j$ is the j-th state of the potential successor word. Equation (4) comprises the three rows within the brace; in general, the top row of equation (4) is simply equation (2). Naturally, equation (4) can also be applied at different gram orders, such as unigrams, bigrams, and trigrams. Equation (4) provides an approximation of equation (2). Only when the top row of equation (4) is met is $P_{estimated}$ stored in the storage area, for example in a lookup table; in this way the lookup table can be kept at a manageably small size.
In equation (4), we need not store the backoff weights, because they are identical to the weights stored in the standard word-based n-gram language model. During decoding, the backoff weights can be obtained from a conventional file. Thus, in decoding, if the first row of equation (4) is not met, the lower-order estimated probability with the backoff weight is used, if appropriate.
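The storage rule can be sketched as follows (hypothetical Python; explicit_ngrams, the state naming, and the toy data are assumptions): an entry for a tree state under a given context is written to the table only when the top row of equation (4) is met, that is, when at least one word below that state has an explicit n-gram in that context; backed-off values are never stored.

```python
def build_lookahead_table(tree_states, explicit_ngrams, weight=1.0):
    """Precompute P_estimated per equation (4), storing only entries whose
    top row is met (an explicit n-gram exists). Backed-off entries are NOT
    stored, which keeps the table small. (Sketch under stated assumptions.)

    tree_states: dict state_id -> set of words W(s) reachable from that state.
    explicit_ngrams: dict (context, word) -> p(word | context), explicit only.
    """
    table = {}
    contexts = {ctx for (ctx, _w) in explicit_ngrams}
    for ctx in contexts:
        for state, words in tree_states.items():
            probs = [explicit_ngrams[(ctx, w)] for w in words
                     if (ctx, w) in explicit_ngrams]
            if probs:                      # top row of equation (4) is met
                table[(ctx, state)] = weight * max(probs)
    return table

states = {"Bph1": {"fund", "funds", "fundamental"}}
ngrams = {(("w0",), "fund"): 0.5, (("w0",), "funds"): 0.2}
print(build_lookahead_table(states, ngrams))  # {(('w0',), 'Bph1'): 0.5}
```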
The probability used for pruning can be the estimated probability of the successor word, or the estimated probability added to the probability of the predecessor word (for example, in Fig. 1, $p(w_0) + P_{estimated}$).
In some embodiments, the lookup table stores the tree-based n-gram language model estimated probabilities as follows. Other formats, however, may also be used.
1-grams:
$p(s_1) \; s_1$
...
i-grams (for $i = 1, \dots, n-1$):
$p(s_i \mid w_{i-1} \dots w_1) \; w_1 \dots w_{i-1} \; s_i$
...
n-grams:
$p(s_n \mid w_{n-1} \dots w_1) \; w_1 \dots w_{n-1} \; s_n$
Because the total number of nodes in the compressed lexical tree is comparable to the total number of words in the dictionary, the total storage for the lexical-tree-based n-gram language model using equation (4) as the approximation is of the same order as that of a conventional word-based n-gram language model. The processing techniques used for ordinary n-gram language models can be applied to the new lexical-tree-based language model file of the present invention.
In some embodiments, the estimated probabilities are computed before recognition and stored in a lookup table. To shrink the size of the table, however, in some embodiments only those entries derived directly from n-gram probabilities (not through backoff) are stored. Entries that would be derived from backoff probabilities are approximated by the backed-off (n-1)-gram estimated probabilities (an n-gram backs off to an (n-1)-gram). Through this compression, the size of the table can be reduced to a manageable level.
The successor word can be identified when its last phoneme (or terminal node) is reached. For example, in Fig. 1, once phoneme $ph_5$ is reached, the word is known to be $w_{12}$. Once the word is known, the estimated probability can be replaced with the actual probability. This can be done by adding the full conditional probability (for example, $p(w_{12} \mid w_0)$ in Fig. 1) and subtracting the estimated probability. In some embodiments, probabilities can be accumulated from the first word of the search, for example, $p(w_1 w_2 w_3 \dots w_i) = p(w_1) + p(w_2 \mid w_1) + p(w_3 \mid w_2) + \dots + p(w_i \mid w_{i-1})$, using logarithms of the probabilities so that multiplication becomes addition: $\log(p_1 \cdot p_2) = \log(p_1) + \log(p_2)$.
The true probability can be determined once the last phoneme is identified, and can be expressed as $P_{true} = p(w_{predecessor}) + P_{estimated} + P(w_{actual} \mid w_{predecessor}) - P_{estimated}$. In the example of Fig. 1, if word $w_{12}$ is the actual word, the true probability is $P_{true} = p(w_0) + P_{estimated} + P(w_{12} \mid w_0) - P_{estimated}$, where $P_{estimated}$ can be obtained as described above.
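In the log domain the correction is just a pair of additions, as this small sketch shows (the numbers are illustrative only):

```python
import math

# Running log-probability of the path in Fig. 1, scored at beginning
# phoneme Bph1 with the look-ahead estimate (all numbers illustrative).
p_predecessor = math.log10(0.3)   # log p(w0)
p_estimated   = math.log10(0.5)   # look-ahead estimate made at Bph1
score = p_predecessor + p_estimated

# Last phoneme ph5 reached: the word is known to be w12, so add the full
# conditional probability p(w12 | w0) and subtract the estimate (P_true).
p_actual = math.log10(0.2)        # log p(w12 | w0)
score = score + p_actual - p_estimated
print(math.isclose(score, p_predecessor + p_actual))  # True: estimate replaced
```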
The nodes of the lexical tree can be folded, or compressed, by eliminating redundant nodes. For example, in Fig. 1, phonemes $Bph_1$, $ph_2$, $ph_3$, and $ph_4$ could be folded into one state (node). In practice, however, $Bph_1$ usually has other branch words, so the folding of $Bph_1$ with $ph_2$-$ph_4$ may not be possible. Phonemes $ph_6$-$ph_{10}$ can be folded into one state. In some embodiments there are two lexical trees: the original one is used by the speech recognizer, and the compressed lexical tree is used for the language model. The compressed lexical tree can be used during training to create the lookup table. In training, the lexical tree can be created from a dictionary according to known techniques.
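A rough sketch of the folding step (the fold function is hypothetical; the merge rule shown, collapsing any non-word-ending node that has exactly one child, is one straightforward way to realize the compression described above):

```python
class Node:
    """Minimal lexical-tree node for the folding sketch (illustrative)."""
    def __init__(self, label):
        self.label = label
        self.children = []   # list of Node
        self.is_word_end = False

def fold(node):
    """Collapse linear chains: merge each child that has exactly one child of
    its own and ends no word, so ph6 ... ph10 in Fig. 1 become one state."""
    for child in node.children:
        while len(child.children) == 1 and not child.is_word_end:
            only = child.children[0]
            child.label += "+" + only.label   # merged state label
            child.is_word_end = only.is_word_end
            child.children = only.children
        fold(child)

# A chain a-b-c-d collapses to a single state "a+b+c+d".
root = Node("root")
a, b, c, d = Node("a"), Node("b"), Node("c"), Node("d")
root.children = [a]; a.children = [b]; b.children = [c]; c.children = [d]
d.is_word_end = True
fold(root)
print(root.children[0].label)  # a+b+c+d
```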
Various computer systems can be used for training and for the speech recognition system. Merely as an example, Fig. 2 shows a schematic of a computer system 10 having a processor 14, memory 16, and an input/output and control block 18. A substantial amount of memory may reside in processor 14, and memory 16 may represent memory off the chip of processor 14, or memory that is partly on and partly off that chip. (Alternatively, memory 16 could be entirely on the chip of processor 14.) At least some of the input/output and control block 18 may be on the same chip as processor 14, or on a separate chip. A microphone 26, a monitor 30, additional memory 34, an input device (such as a keyboard and mouse 38), a network connection 42, and a speaker 44 can be coupled to the I/O and control block 18. Memory 34 represents various kinds of storage, such as a hard disk, a CD-ROM, or a DVD disc. The term lookup table is used broadly and does not restrict the form of storage. The stored estimated probabilities may be held together or distributed across different locations, and some or all of the table may be duplicated in different memories. The lookup table may reside in memory 16, memory 34, or elsewhere; lookup tables 22 and 24 represent all or part of the lookup table. It is emphasized that the system of Fig. 2 is merely illustrative, and the invention is not limited to use with such a computer system. Computer system 10 and other computer systems used to carry out the invention may take various forms, such as desktop, mainframe, and portable computers.
As an example, Fig. 3 shows a handheld device 60 with a display 62, which can implement some or all of the functions of Fig. 2. The handheld device may at times connect to another computer system, such as that of Fig. 2. The shapes and relative sizes of the objects in Figs. 2 and 3 are not intended to suggest actual shapes and relative sizes.
The various memories can be considered computer-readable media on which instructions can be stored that, when executed, implement some embodiments of the present invention.
Other information and embodiment
A bigram language model adopting the above lexical-tree-based format has been implemented. By using precomputed language model look-ahead, we not only saved the computation of the estimated probabilities, a saving of up to 15% of the total decoding computation time, but also saved the roughly 50 MB of memory that caching would otherwise require when these probabilities are generated dynamically. (These figures are merely examples, not requirements.) In addition, the new language model format lets us handle higher-order language model look-ahead in reasonable time and memory.
" embodiment ", " embodiment ", " some embodiment " or " other embodiment " mentioned in this explanation are meant at least in some embodiments of the invention, specific function, structure or the feature related with embodiment that not necessarily comprise in all embodiments.Said " embodiment ", " embodiment " or " some embodiment " differ to establish a capital and are meant identical embodiment.
If the specification states that a component, feature, structure, or characteristic "may," "might," or "could" be included, that particular component, feature, structure, or characteristic is not required to be included. If the specification or claims refer to "a" or "an" element, that does not mean there is only one of the element. If the specification or claims refer to an "additional" element, that does not preclude there being more than one of the additional element.
Those skilled in the art will appreciate that many variations may be made to the foregoing description and drawings within the scope of the present invention. Accordingly, it is the following claims, including any amendments thereto, that define the scope of the invention.

Claims (15)

1. A method of performing speech recognition, comprising:
creating a lexical tree;
identifying a first phoneme in the lexical tree;
estimating the probabilities of words in the lexical tree that have the first phoneme; and
storing at least some of the estimated probabilities, wherein the stored estimated probabilities include only estimated probabilities derived directly from n-gram probabilities, and estimated probabilities derived from backoff probabilities are approximated by the backed-off (n-1)-gram estimated probabilities.
2. The method according to claim 1, wherein an estimated probability is stored only if the corresponding n-gram exists.
3. The method according to claim 1, wherein the estimated probabilities are stored in a lookup table.
4. The method according to claim 3, wherein the lookup table includes the following information:
1-grams: $p(s_1) \; s_1$
i-grams (for $i = 1, \dots, n-1$): $p(s_i \mid w_{i-1} \dots w_1) \; w_1 \dots w_{i-1} \; s_i$
n-grams: $p(s_n \mid w_{n-1} \dots w_1) \; w_1 \dots w_{n-1} \; s_n$
5. The method according to claim 1, wherein the estimated probability $P_{estimated}$ is obtained according to the following equation:
$$P_{estimated}(s_j \mid w_{n-1} \dots w_1) = \begin{cases} \max_{w \in W(s_j)} \big( \lambda_w \cdot p(w \mid w_{n-1} \dots w_1) \big) & \text{if an explicit } n\text{-gram exists} \\ \text{bowt}(w_{n-1} \dots w_1) \cdot P_{estimated}(s_j \mid w_{n-1} \dots w_2) & \text{if the backoff weight exists} \\ P_{estimated}(s_j \mid w_{n-1} \dots w_2) & \text{otherwise} \end{cases}$$
where $s_j$ is the j-th state of the word associated with the first phoneme, $W(s)$ is the set of words derivable from lexical tree state $s$, and $\lambda_w$ denotes a fractional weight, and wherein the estimated probability is stored only if the first row of the above equation is satisfied.
6. The method according to claim 5, wherein $\lambda_w$ is 1.
7. The method according to claim 5, wherein $\lambda_w$ is between 0 and 1 and is selected for each first phoneme.
8. A method of performing speech recognition, comprising:
receiving phonemes and identifying them in a lexical tree; and
estimating the probabilities of words that include the phonemes by using estimated probabilities retrieved from a storage area, wherein the retrieved estimated probabilities include only estimated probabilities derived directly from n-gram probabilities, and estimated probabilities derived from backoff probabilities are approximated by the backed-off (n-1)-gram estimated probabilities.
9. The method according to claim 8, wherein the estimated probabilities are stored in a lookup table.
10. The method according to claim 9, wherein the lookup table includes the following information, where $s$ is a state of the lexical tree and $p$ is a probability:
1-grams: $p(s_1) \; s_1$
i-grams (for $i = 1, \dots, n-1$): $p(s_i \mid w_{i-1} \dots w_1) \; w_1 \dots w_{i-1} \; s_i$
n-grams: $p(s_n \mid w_{n-1} \dots w_1) \; w_1 \dots w_{n-1} \; s_n$
11. The method according to claim 8, wherein backoff weight information can be derived from the weights stored in a word-based n-gram language model.
12. The method according to claim 8, wherein the estimated probabilities are used in establishing a pruning threshold.
13. The method according to claim 8, wherein the estimated probabilities are determined according to the following equation:
$$P_{estimated}(s_j \mid w_{n-1} \dots w_1) = \begin{cases} \max_{w \in W(s_j)} \big( \lambda_w \cdot p(w \mid w_{n-1} \dots w_1) \big) & \text{if an explicit } n\text{-gram exists} \\ \text{bowt}(w_{n-1} \dots w_1) \cdot P_{estimated}(s_j \mid w_{n-1} \dots w_2) & \text{if the backoff weight exists} \\ P_{estimated}(s_j \mid w_{n-1} \dots w_2) & \text{otherwise} \end{cases}$$
where $s_j$ is the j-th state of the word associated with the first phoneme, $W(s)$ is the set of words derivable from lexical tree state $s$, and $\lambda_w$ denotes a fractional weight, and wherein only the result of the first row is stored.
CN99817058.5A 1999-12-23 1999-12-23 Speech recognizer with a lexical tree based N-gram language model Expired - Fee Related CN1201286C (en)

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN1999/000217 WO2001048737A2 (en) 1999-12-23 1999-12-23 Speech recognizer with a lexical tree based n-gram language model

Publications (2)

Publication Number Publication Date
CN1406374A CN1406374A (en) 2003-03-26
CN1201286C true CN1201286C (en) 2005-05-11

Family

ID=4575158

Family Applications (1)

Application Number Title Priority Date Filing Date
CN99817058.5A Expired - Fee Related CN1201286C (en) 1999-12-23 1999-12-23 Speech recognizer with a lexical tree based N-gram language model

Country Status (3)

Country Link
CN (1) CN1201286C (en)
AU (1) AU1767600A (en)
WO (1) WO2001048737A2 (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101271450B (en) * 2007-03-19 2010-09-29 株式会社东芝 Method and device for cutting language model

Families Citing this family (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
GB0420464D0 (en) 2004-09-14 2004-10-20 Zentian Ltd A speech recognition circuit and method
GB2453366B (en) * 2007-10-04 2011-04-06 Toshiba Res Europ Ltd Automatic speech recognition method and apparatus
CN102422245B (en) 2009-03-19 2016-05-04 谷歌公司 Input method editor
JP5362095B2 (en) * 2009-03-19 2013-12-11 グーグル・インコーポレーテッド Input method editor
US8655647B2 (en) 2010-03-11 2014-02-18 Microsoft Corporation N-gram selection for practical-sized language models
US8589164B1 (en) * 2012-10-18 2013-11-19 Google Inc. Methods and systems for speech recognition processing using search query information
CN111128172B (en) * 2019-12-31 2022-12-16 达闼机器人股份有限公司 Voice recognition method, electronic equipment and storage medium

Family Cites Families (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP3009709B2 (en) * 1990-07-13 2000-02-14 日本電信電話株式会社 Japanese speech recognition method
DE4130631A1 (en) * 1991-09-14 1993-03-18 Philips Patentverwaltung METHOD FOR RECOGNIZING THE SPOKEN WORDS IN A VOICE SIGNAL
JPH0772840B2 (en) * 1992-09-29 1995-08-02 日本アイ・ビー・エム株式会社 Speech model configuration method, speech recognition method, speech recognition device, and speech model training method
US5621859A (en) * 1994-01-19 1997-04-15 Bbn Corporation Single tree method for grammar directed, very large vocabulary speech recognizer
JPH08123479A (en) * 1994-10-26 1996-05-17 Atr Onsei Honyaku Tsushin Kenkyusho:Kk Continuous speech recognition device
JP3304665B2 (en) * 1995-02-17 2002-07-22 松下電器産業株式会社 Voice recognition device
JP4180110B2 (en) * 1995-03-07 2008-11-12 ブリティッシュ・テレコミュニケーションズ・パブリック・リミテッド・カンパニー Language recognition
US5832428A (en) * 1995-10-04 1998-11-03 Apple Computer, Inc. Search engine for phrase recognition based on prefix/body/suffix architecture
US5758024A (en) * 1996-06-25 1998-05-26 Microsoft Corporation Method and system for encoding pronunciation prefix trees
US5822730A (en) * 1996-08-22 1998-10-13 Dragon Systems, Inc. Lexical tree pre-filtering in speech recognition
KR100509797B1 (en) * 1998-04-29 2005-08-23 마쯔시다덴기산교 가부시키가이샤 Method and apparatus using decision trees to generate and score multiple pronunciations for a spelled word
WO1999059141A1 (en) * 1998-05-11 1999-11-18 Siemens Aktiengesellschaft Method and array for introducing temporal correlation in hidden markov models for speech recognition
JPH11344991A (en) * 1998-05-30 1999-12-14 Brother Ind Ltd Voice recognition device and storage medium

Also Published As

Publication number Publication date
WO2001048737A2 (en) 2001-07-05
WO2001048737A3 (en) 2002-11-14
AU1767600A (en) 2001-07-09
CN1406374A (en) 2003-03-26

Similar Documents

Publication Publication Date Title
US7120582B1 (en) Expanding an effective vocabulary of a speech recognition system
US9734823B2 (en) Method and system for efficient spoken term detection using confusion networks
US8311825B2 (en) Automatic speech recognition method and apparatus
US9292487B1 (en) Discriminative language model pruning
US7711561B2 (en) Speech recognition system and technique
US7831911B2 (en) Spell checking system including a phonetic speller
JP5214461B2 (en) Word clustering for input data
EP1484744A1 (en) Speech recognition language models
WO2003010754A1 (en) Speech input search system
KR20080069990A (en) Speech index pruning
EP0800158A1 (en) Word spotting
CN101271450B (en) Method and device for cutting language model
Chelba et al. Query language modeling for voice search
CN1201286C (en) Speech recognizer with a lexical tree based N-gram language model
KR20230156125A (en) Lookup table recursive language model
Wester et al. A comparison of data-derived and knowledge-based modeling of pronunciation variation
KR100480790B1 (en) Method and apparatus for continuous speech recognition using bi-directional n-gram language model
JP2000259645A (en) Speech processor and speech data retrieval device
Larson Sub-word-based language models for speech recognition: implications for spoken document retrieval
JP2938865B1 (en) Voice recognition device
Maskey et al. A phrase-level machine translation approach for disfluency detection using weighted finite state transducers
JP2000267693A (en) Voice processor and index preparation device
Shao et al. A fast fuzzy keyword spotting algorithm based on syllable confusion network
JP2005265967A (en) Recording medium where tree structure dictionary is recorded and language score table generating program for tree structure dictionary
Reichl Language model adaptation using minimum discrimination information.

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant
C17 Cessation of patent right
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20050511

Termination date: 20121223