CN101464896A

CN101464896A - Voice fuzzy retrieval method and apparatus

Info

Publication number: CN101464896A
Application number: CNA2009100011645A
Authority: CN
Inventors: 王智国; 吴及; 钱胜; 吕萍; 陈志刚; 胡国平; 胡郁; 刘庆峰; 吴晓如; 王仁华
Original assignee: iFlytek Co Ltd
Current assignee: Tsinghua University; iFlytek Co Ltd
Priority date: 2009-01-23
Filing date: 2009-01-23
Publication date: 2009-06-24
Anticipated expiration: 2029-01-23
Also published as: CN101464896B

Abstract

The invention discloses a speech fuzzy retrieval method and device. The method includes: performing speech recognition on an acquired speech signal using a preset acoustic model and language model to obtain a recognition result; searching a preset text entry library according to the recognition result using a preset index table to obtain preliminary entries; performing string fuzzy matching between the preliminary entries and the recognition result, selecting as selected entries those whose matching degree is within a preset matching-degree threshold range, and recording the matching position of each entry; and computing the posterior probability between the matched text portion of each selected entry and the speech signal, and finally selecting several entries as the retrieval results for the speech signal using the posterior probability together with the matching ratio obtained from the matching position. With the present invention, text entries matching a speech signal can be retrieved quickly and accurately from a massive text entry library.

Description

Speech fuzzy retrieval method and device

Technical Field

The invention relates to the fields of speech recognition and information retrieval, and in particular to a speech fuzzy retrieval method and device.

Background Art

Speech fuzzy retrieval is a branch of multimedia retrieval technology. Unlike traditional text retrieval and audio retrieval, it does not search a text library with text or an audio library with audio; it searches a text library with audio. That is, given a speech signal submitted by a user, it retrieves text in the text library whose content is related to that speech.

Speech recognition technology can convert a speech signal into text, and applying text retrieval methods to the converted text would in principle allow a text library to be searched with audio. However, speech recognition is not 100% accurate; for spontaneous spoken speech in particular, recognition accuracy is usually below 90%. Searching a massive text entry library with inaccurate text yields even less accurate retrieval results.

Summary of the Invention

The invention provides a speech fuzzy retrieval method and device to solve the problem of inaccurate retrieval when existing speech recognition technology is used for retrieval.

To this end, embodiments of the present invention adopt the following technical solutions:

A speech fuzzy retrieval method, comprising:

performing speech recognition on an acquired speech signal using a preset acoustic model and language model to obtain a recognition result;

searching a preset text entry library according to the recognition result using a preset index table to obtain preliminary entries;

performing string fuzzy matching between the preliminary entries and the recognition result, selecting as selected entries those whose matching degree is within a preset matching-degree threshold range, and recording the matching positions;

computing the posterior probability between the matched text portion of each selected entry and the speech signal, and selecting several entries as the retrieval results for the speech signal using the posterior probability and the matching ratio obtained from the matching positions.

The method further includes:

building the index table from the text entries to be retrieved, using syllables, characters or words as index units, for one-level or multi-level indexing.

The method further includes:

training the language model, in whole or in part, on the preset text entry library.

Wherein:

the recognition result may take the form of the single most probable text string for the speech signal, the several most probable text strings for the speech signal, or the word graph for the speech signal.

Searching the preset text entry library according to the recognition result using the preset index table to obtain the preliminary entries specifically comprises:

voting for each character/word in the recognition result using the preset index table, and selecting the entries whose vote count exceeds a preset vote-count threshold as the preliminary entries;

where voting means using a character/word in the recognition result to look up an index item in the index table and, once the index item is found, incrementing by 1 the vote count of every entry listed under that index item.

The fuzzy matching algorithm uses a confusion-matrix-based dynamic-programming computation of the edit distance between texts, where the confusion matrix is obtained by training or set in advance and is used to optimize the substitution, insertion and deletion costs.
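As a rough illustration of the confusion-matrix-based edit distance described above, the Python sketch below lowers the substitution cost for character pairs that the recognizer is assumed to confuse often. The function name `weighted_edit_distance`, the cost values and the sample pair 华/画 are hypothetical; the source states only that the confusion matrix is trained or preset and that it adjusts substitution, insertion and deletion costs. Only substitution costs vary per pair here; insertion and deletion use fixed costs.

```python
from typing import Dict, Tuple

# Hypothetical confusion-derived substitution costs, keyed by
# (recognized character, reference character).
SubCosts = Dict[Tuple[str, str], float]


def weighted_edit_distance(recognized: str, reference: str, sub_costs: SubCosts,
                           default_sub: float = 1.0,
                           ins_cost: float = 1.0,
                           del_cost: float = 1.0) -> float:
    """DP edit distance from recognized to reference with per-pair substitution costs."""
    n, m = len(recognized), len(reference)
    # d[i][j] = minimum cost of turning recognized[:i] into reference[:j]
    d = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        d[i][0] = d[i - 1][0] + del_cost
    for j in range(1, m + 1):
        d[0][j] = d[0][j - 1] + ins_cost
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            r, t = recognized[i - 1], reference[j - 1]
            # A correct match costs nothing; otherwise look up the confusion cost.
            sub = 0.0 if r == t else sub_costs.get((r, t), default_sub)
            d[i][j] = min(d[i - 1][j - 1] + sub,   # match or substitution
                          d[i - 1][j] + del_cost,  # drop a recognized character
                          d[i][j - 1] + ins_cost)  # add a reference character
    return d[n][m]


# Hypothetical confusion: the recognizer often outputs 华 where 画 was spoken.
costs = {("华", "画"): 0.2}
print(weighted_edit_distance("中华", "中画", costs))  # 0.2
print(weighted_edit_distance("中华", "中国", costs))  # 1.0
```

Because 华 and 画 are assumed to be confusable, that substitution costs 0.2 instead of 1.0, so an entry that differs from the recognition result only by such a confusion is penalized less than one with an unrelated character.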

A speech fuzzy retrieval device, comprising:

a speech signal acquisition unit, configured to acquire a speech signal;

a recognition unit, configured to perform speech recognition on the acquired speech signal using a preset acoustic model and language model to obtain a recognition result;

a retrieval unit, configured to search a preset text entry library according to the recognition result using a preset index table to obtain preliminary entries;

a fuzzy matching unit, configured to perform string fuzzy matching between the preliminary entries and the recognition result, select as selected entries those whose matching degree is within a preset matching-degree threshold range, and record the matching positions;

a result determination unit, configured to compute the posterior probability between the matched portion of each selected entry and the speech signal, and to select several entries as the retrieval results for the speech signal using the posterior probability and the matching ratio obtained from the matching positions.

The device further includes:

an index table building unit, configured to build the index table from the preset text entry library to be retrieved, using syllables, characters or words as index units; the index table is used for one-level or multi-level indexing.

The device further includes:

a language model building unit, configured to train part or all of the language model on the preset text entry library.

The retrieval unit includes:

an index voting subunit, configured to vote for each character/word in the recognition result using the preset index table, where voting means using a character/word in the recognition result to look up an index item in the index table and, once the index item is found, incrementing by 1 the vote count of every entry listed under that index item;

a preliminary entry selection subunit, configured to select the entries whose vote count exceeds a preset vote-count threshold as the preliminary entries.

Thus, the present invention proposes a new speech fuzzy retrieval approach. Through an application-specific language model, index voting, string fuzzy matching, and computation of the posterior probability of the selected entries given the speech signal, it overcomes the adverse effect of imperfect speech recognition results on text library retrieval and achieves fast and accurate retrieval of a massive text entry library by speech.

Brief Description of the Drawings

Fig. 1 is a flowchart of the speech fuzzy retrieval method of the present invention;

Fig. 2 is a flowchart of a method embodiment of the present invention;

Fig. 3 is a schematic structural diagram of the speech fuzzy retrieval device of the present invention.

Detailed Description of the Embodiments

The speech fuzzy retrieval scheme provided by the present invention adds a suitable language model during recognition to improve accuracy, performs string fuzzy matching when the recognition result is used as a text query to reduce the impact of recognition errors, and verifies candidates by computing the posterior probability that each candidate entry is the content of the audio, thereby substantially improving retrieval accuracy and reliability.

Referring to Fig. 1, a flowchart of the speech fuzzy retrieval method of the present invention, the method comprises the following steps:

S101: Perform speech recognition on the acquired speech signal using a preset acoustic model and language model to obtain a recognition result;

S102: Search the preset text entry library according to the recognition result using the preset index table to obtain preliminary entries;

Here, the preset text entry library is generally a massive library containing a large number of text entries.

S103: Perform string fuzzy matching between the preliminary entries and the recognition result, select as selected entries those whose matching degree is within a preset matching-degree threshold range, and record the matching positions;

S104: Compute the posterior probability between the matched portion of each selected entry and the speech signal, and select several entries as the retrieval results for the speech signal using the posterior probability and the matching ratio obtained from the matching positions.

The present invention is described in detail below with reference to a specific example.

Referring to Fig. 2, a flowchart of a specific embodiment in which speech fuzzy retrieval is used to search a massive text entry library by voice, the method comprises:

S201: Acquire the speech signal input by the user;

S202: Perform speech recognition on the acquired speech signal using the pre-built acoustic model and language model to obtain a recognition result;

S203: Use the preset index table to perform a fast search of the preset text entry library according to the recognition result to obtain preliminary entries;

Before the speech fuzzy retrieval system is built, a suitable language model and an index table for the massive text entry library must be established in advance.

Because the goal is to retrieve text containing the spoken content from the massive text entry library, the spoken content very likely exists in that library, either as an entire entry or as part of one. A language model trained with the massive text entry library as its corpus is therefore an application-specific language model, which is better suited to the retrieval task.
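The idea of training the language model on the entry library itself can be sketched with a character bigram model. This is an assumption for illustration: the source does not specify the model order, the modeling unit or the smoothing method, and add-one smoothing is used here only because it is short. The class name `CharBigramLM` and the two sample entries are hypothetical.

```python
import math
from collections import Counter
from typing import List

BOS, EOS = "<s>", "</s>"


class CharBigramLM:
    """Character bigram language model with add-one (Laplace) smoothing."""

    def __init__(self, entries: List[str]):
        self.unigrams: Counter = Counter()  # counts of each symbol used as history
        self.bigrams: Counter = Counter()   # counts of (history, next) pairs
        for entry in entries:
            chars = [BOS] + list(entry) + [EOS]
            self.unigrams.update(chars[:-1])
            self.bigrams.update(zip(chars[:-1], chars[1:]))
        # Symbols that can be predicted: every character plus the end marker.
        self.vocab = {c for e in entries for c in e} | {EOS}

    def logprob(self, text: str) -> float:
        """Natural-log probability of text under the smoothed bigram model."""
        chars = [BOS] + list(text) + [EOS]
        v = len(self.vocab)
        total = 0.0
        for prev, cur in zip(chars[:-1], chars[1:]):
            total += math.log((self.bigrams[(prev, cur)] + 1) / (self.unigrams[prev] + v))
        return total


lm = CharBigramLM(["中国共产党", "中华人民共和国"])
# A string taken from the library scores higher than the same characters scrambled.
print(lm.logprob("中华人民共和国") > lm.logprob("国和共民人华中"))  # True
```

Strings that occur in the library receive higher scores than unrelated character sequences, which is the kind of bias an application-specific model gives the recognizer.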

The preset index table consists of two parts: index items and the content corresponding to each index item. In the present invention, each index item is a character or a word, and its corresponding content is the set of texts in the massive text entry library that contain that character or word; usually one index item corresponds to multiple texts. For example, the content corresponding to the index item "中" includes "中国共产党" (Communist Party of China), "中国人民共和国" (People's Republic of China), "我们的大中国" (Our Great China), and so on.
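A minimal sketch of the index table described above, assuming characters as the index unit and integer entry IDs as the stored content. `build_index` and its `units` parameter are hypothetical names; passing a word segmenter or a character-to-syllable converter as `units` would give word- or syllable-level indexing, but neither is shown here.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, List, Set


def build_index(entries: List[str],
                units: Callable[[str], Iterable[str]] = list) -> Dict[str, Set[int]]:
    """Map each index unit (by default a character) to the IDs of entries containing it."""
    index: Dict[str, Set[int]] = defaultdict(set)
    for entry_id, text in enumerate(entries):
        for unit in units(text):
            index[unit].add(entry_id)
    return dict(index)


entries = ["中国共产党", "中国人民共和国", "我们的大中国", "人工智能"]
index = build_index(entries)
print(sorted(index["中"]))  # [0, 1, 2]
```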

Accordingly, adding the application-specific language model described under S203 above when recognizing the input speech in S202 improves recognition accuracy, so that S202 yields a highly accurate recognition result.

The recognition result is the character-level representation of the decoded speech signal. Common forms are: the single most probable text string for the input speech signal (i.e., only one recognition result, e.g. "中华人民共和国", People's Republic of China); the N most probable text strings (i.e., several recognition results, e.g. the three results "中国共产党", "中国人民共和国" and "我们的大中国"); and the word graph for the speech signal, which represents all possible text strings as a directed acyclic graph. The word graph is the most efficient representation of the recognition result and also carries the most information.
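The word graph form can be illustrated with a tiny directed acyclic graph whose arcs carry words. This sketch is an assumption for illustration: real word graphs also attach acoustic and language model scores and time information to each arc, and enumerating every path, as done here, grows exponentially on large graphs. The names `WordGraph`, `all_strings` and `lattice` are hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical word graph: node -> list of (next_node, word) arcs.
WordGraph = Dict[int, List[Tuple[int, str]]]


def all_strings(graph: WordGraph, start: int, end: int) -> List[str]:
    """Enumerate every text string spelled by a path from start to end."""
    if start == end:
        return [""]
    results = []
    for nxt, word in graph.get(start, []):
        for suffix in all_strings(graph, nxt, end):
            results.append(word + suffix)
    return results


lattice: WordGraph = {
    0: [(1, "中华"), (1, "中国")],  # two competing hypotheses for the first word
    1: [(2, "人民")],
    2: [(3, "共和国")],
}
print(all_strings(lattice, 0, 3))  # ['中华人民共和国', '中国人民共和国']
```

Three arcs encode two complete strings here; on longer utterances the saving over listing N-best strings grows quickly.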

In S203, index voting is performed with the preset index table for each character/word in the recognition result obtained in S202. Voting means using a character/word in the recognition result to look up an index item in the index table and, once the index item is found, incrementing the vote count of each corresponding text by 1. For example, if the recognition result contains the character "中", the vote counts of all texts containing "中", such as "中国共产党", "中国人民共和国" and "我们的大中国", are each increased by 1. The higher a text's vote count, the better it matches the recognition result. Texts whose vote count exceeds the threshold are kept as preliminary entries.
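The voting step can be sketched as follows, assuming a character-level index, a strict "higher than the threshold" comparison, and one vote per character occurrence in the recognition result (the source does not say whether a repeated character votes more than once). The function names and the threshold value 3 are hypothetical.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Set


def build_index(entries: List[str]) -> Dict[str, Set[int]]:
    """Character-level index: character -> IDs of entries containing it."""
    index: Dict[str, Set[int]] = defaultdict(set)
    for entry_id, text in enumerate(entries):
        for ch in text:
            index[ch].add(entry_id)
    return index


def preliminary_entries(recognized: str, index: Dict[str, Set[int]],
                        vote_threshold: int) -> List[int]:
    """Vote with each character of the recognition result; keep entries above threshold."""
    votes: Counter = Counter()
    for ch in recognized:
        # Look up the index item; every entry listed under it gets one vote.
        for entry_id in index.get(ch, ()):
            votes[entry_id] += 1
    # Keep entries whose vote count is strictly higher than the threshold,
    # highest vote count first.
    return [eid for eid, n in votes.most_common() if n > vote_threshold]


entries = ["中国共产党", "中国人民共和国", "我们的大中国", "人工智能"]
index = build_index(entries)
print(preliminary_entries("中华人民共和国", index, vote_threshold=3))  # [1]
```

Here entry 1 collects 6 votes, entry 0 collects 3 (from 中, 共 and 国), entry 2 collects 2 and entry 3 collects 1, so only entry 1 exceeds the threshold.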

S204: Perform string fuzzy matching between the preliminary entries and the recognition result, sort the matched entries by matching degree from high to low, and keep only the selected entries whose matching degree is within the matching-degree threshold range;

Because speech recognition cannot guarantee 100% accuracy, the recognition result contains some errors. Moreover, the index table records only which characters/words a text contains, not their positions. The preliminary entries obtained from the index therefore cannot be used directly as retrieval results.

String fuzzy matching is therefore used to obtain the matching degree between each preliminary entry and the recognition result. Unlike exact string matching, fuzzy matching allows the pattern (substring) to differ from the corresponding part of the text (main string). The two main families of fuzzy string matching methods today are bit-vector methods and filtering methods, and the present invention may use existing methods. The simplest fuzzy matching algorithm is the dynamic-programming edit distance, in which three kinds of error can occur: deletion, insertion and substitution. Each kind of error can be assigned a different cost according to the application, and correctly matched parts are usually assigned zero cost. In the present invention, both the recognition result and the texts in the massive text entry library can be regarded as character sequences; the substring is the recognition result and the main string is an entry in the library. The matching degree is inversely related to the error cost. Because the speech input by the user may be a fragment of a text in the library, string fuzzy matching gives not only the matching degree but also the most likely matching position.
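Because the recognition result may correspond to only part of an entry, the edit distance must let the match start and end anywhere in the entry. The sketch below uses the classic approximate substring matching recurrence (Sellers' algorithm) with unit costs and tracks where each alignment starts, so that the best matching position is returned along with the cost. The function name `fuzzy_find` and the unit costs are assumptions; a confusion-matrix cost table could replace the fixed substitution and gap costs.

```python
from typing import List, Tuple


def fuzzy_find(pattern: str, text: str) -> Tuple[float, int, int]:
    """Best approximate occurrence of pattern inside text, with unit costs.

    Returns (cost, start, end) such that text[start:end] is the matched span.
    Text characters before and after the match are free, so the pattern may
    align with any substring of the text.
    """
    m, n = len(pattern), len(text)
    # Each cell holds (cost, start of the matched span in text).
    # Row 0: the empty pattern matches for free at every text position.
    prev: List[Tuple[float, int]] = [(0.0, j) for j in range(n + 1)]
    for i in range(1, m + 1):
        # Column 0: pattern[:i] against no text costs i (all characters missing).
        cur: List[Tuple[float, int]] = [(float(i), 0)] + [(0.0, 0)] * n
        for j in range(1, n + 1):
            sub = 0.0 if pattern[i - 1] == text[j - 1] else 1.0
            cur[j] = min(
                (prev[j - 1][0] + sub, prev[j - 1][1]),  # match or substitution
                (prev[j][0] + 1.0, prev[j][1]),          # pattern char absent from text
                (cur[j - 1][0] + 1.0, cur[j - 1][1]),    # extra text char inside the match
            )
        prev = cur
    # The match may end anywhere; take the cheapest end position.
    end = min(range(n + 1), key=lambda j: prev[j][0])
    cost, start = prev[end]
    return cost, start, end


# Recognition result with one misrecognized character (合 instead of 和).
print(fuzzy_find("人民共合国", "中国人民共和国"))  # (1.0, 2, 7)
```

The best match is text[2:7], i.e. "人民共和国", at cost 1. One possible matching degree, assumed here rather than taken from the source, is 1 - cost / len(pattern), which decreases as the error cost grows.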

S205: For each qualifying selected entry, compute the posterior probability that it is the content of the input audio; at the same time, record the matching position;

Because the selected entries from S204 were obtained by comparison with the recognition result at the character level, and the recognition result itself contains errors, a high matching degree does not necessarily mean a high likelihood that the entry is the actual spoken content. Therefore, S205 computes the posterior probability of each selected entry given the speech signal. The posterior probability is a value between 0 and 1, and the posterior probabilities of all selected entries sum to 1. The larger the posterior probability, the more likely the corresponding entry is indeed the spoken content. A posterior probability is a probability revised after information about the "result" has been obtained, as in Bayes' formula; it is the probability of the "cause" in a problem of inferring the cause from an observed effect. Prior and posterior probabilities are inseparable: the posterior probability is computed from the prior probability. Methods for computing posterior probabilities are mature prior art and are not described further here.
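Since the source leaves the posterior computation to prior art, the following is only one hedged way to obtain values that lie between 0 and 1 and sum to 1 over the selected entries: apply Bayes' rule with the normalizing sum restricted to those entries. The per-entry log-likelihoods log p(X | e) are assumed to come from scoring the speech against each entry's matched text (for example by forced alignment with the acoustic model), and the log priors log P(e) from the language model; both inputs and the function name `entry_posteriors` are hypothetical.

```python
import math
from typing import Dict


def entry_posteriors(log_likelihood: Dict[str, float],
                     log_prior: Dict[str, float]) -> Dict[str, float]:
    """Bayes' rule over the selected entries only, computed in the log domain.

    P(e | X) = p(X | e) P(e) / sum over selected e' of p(X | e') P(e')
    """
    joint = {e: log_likelihood[e] + log_prior[e] for e in log_likelihood}
    top = max(joint.values())
    # log-sum-exp: subtract the maximum before exponentiating to avoid underflow
    log_norm = top + math.log(sum(math.exp(v - top) for v in joint.values()))
    return {e: math.exp(v - log_norm) for e, v in joint.items()}


post = entry_posteriors(
    {"A": -100.0, "B": -102.0, "C": -110.0},  # hypothetical log p(X | e)
    {"A": 0.0, "B": 0.0, "C": 0.0},           # equal log priors; the shared constant cancels
)
print({e: round(p, 3) for e, p in post.items()})  # {'A': 0.881, 'B': 0.119, 'C': 0.0}
```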

S206: Using the posterior probability and the matching ratio obtained from the matching position, select several entries as the retrieval results for the speech signal, and end the process.

Specifically, the posterior probability and the matching ratio can be combined by weighting, and the entries with relatively high posterior probability and matching ratio are finally selected as the retrieval results.
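The source does not define the matching ratio or the weights. The sketch below assumes that the matching ratio is the fraction of the entry covered by the matched span [start, end) returned by fuzzy matching, and that the two scores are combined linearly with a hypothetical weight of 0.7 on the posterior. `match_ratio`, `rank_entries`, `weight` and `top_k` are illustrative names.

```python
from typing import List, Tuple


def match_ratio(start: int, end: int, entry_len: int) -> float:
    """Assumed definition: fraction of the entry covered by the matched span."""
    return (end - start) / entry_len if entry_len else 0.0


def rank_entries(candidates: List[Tuple[str, float, int, int]],
                 weight: float = 0.7, top_k: int = 3) -> List[Tuple[str, float]]:
    """candidates: (entry_text, posterior, match_start, match_end).

    Score = weight * posterior + (1 - weight) * match_ratio; highest first.
    """
    scored = []
    for text, posterior, start, end in candidates:
        ratio = match_ratio(start, end, len(text))
        scored.append((text, weight * posterior + (1 - weight) * ratio))
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:top_k]


candidates = [
    ("中国人民共和国", 0.88, 2, 7),  # speech covers 5 of 7 characters
    ("人民共和国", 0.12, 0, 5),      # speech covers the whole entry
]
print([text for text, _ in rank_entries(candidates)])  # ['中国人民共和国', '人民共和国']
```

The first candidate ranks higher on posterior even though only 5 of its 7 characters are covered; lowering `weight` shifts the ranking toward entries that the speech covers completely.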

Corresponding to the above method, the present invention provides a speech fuzzy retrieval device, which may be implemented in software, hardware, or a combination of both.

Referring to Fig. 3, a schematic diagram of the internal structure of the device, the device comprises a speech signal acquisition unit 300, a recognition unit 301, a retrieval unit 302, a fuzzy matching unit 303 and a result determination unit 304, where:

the speech signal acquisition unit 300 is configured to acquire a speech signal;

the recognition unit 301 is configured to perform speech recognition on the speech signal acquired by the speech signal acquisition unit 300 using a preset acoustic model and language model to obtain a recognition result;

the retrieval unit 302 is configured to search the preset text entry library according to the recognition result obtained by the recognition unit 301 using the preset index table to obtain preliminary entries;

the fuzzy matching unit 303 is configured to perform string fuzzy matching between the preliminary entries obtained by the retrieval unit 302 and the recognition result obtained by the recognition unit 301, select as selected entries those whose matching degree is within a preset matching-degree threshold range, and record the matching positions;

the result determination unit 304 is configured to compute the posterior probability between the selected entries matched by the fuzzy matching unit 303 and the speech signal, and to select several entries as the retrieval results for the speech signal using the posterior probability and the matching ratio obtained from the matching positions.

Preferably, the device further includes:

an index table building unit 305, configured to build the index table from the preset text entries using syllables, characters or words as index units.

Preferably, the device further includes:

a language model building unit 306, configured to train the language model on the preset text entry library.

Preferably, the retrieval unit 302 further includes:

an index voting subunit (not shown), configured to vote for each character/word in the recognition result using the preset index table, where voting means using a character/word in the recognition result to look up an index item in the index table and, once the index item is found, incrementing by 1 the vote count of every entry listed under that index item;

a preliminary entry selection subunit (not shown), configured to select the entries whose vote count exceeds a preset vote-count threshold as the preliminary entries.

For implementation details of the device provided by the present invention, refer to the method embodiments; they are not repeated here.

Thus, the present invention proposes a new speech fuzzy retrieval scheme. Through an application-specific language model, index voting, string fuzzy matching, and computation of the posterior probability of candidate texts given the speech signal, it overcomes the adverse effect of imperfect speech recognition results on text library retrieval and achieves fast and accurate retrieval of a massive text entry library by speech.

Those of ordinary skill in the art will understand that the process of implementing the method of the above embodiments can be completed by a program instructing the relevant hardware. The program may be stored in a readable storage medium and, when executed, performs the corresponding steps of the above method. The storage medium may be, for example, a ROM/RAM, a magnetic disk, or an optical disc.

The above is only a preferred embodiment of the present invention. It should be noted that those of ordinary skill in the art can make a number of improvements and refinements without departing from the principle of the present invention, and such improvements and refinements should also be regarded as falling within the protection scope of the present invention.

Claims

1. A speech fuzzy retrieval method, characterized by comprising:

performing speech recognition on an acquired speech signal using a preset acoustic model and language model to obtain a recognition result;

searching a preset text entry library according to the recognition result using a preset index table to obtain preliminary entries;

performing string fuzzy matching between the preliminary entries and the recognition result, selecting as selected entries those whose matching degree is within a preset matching-degree threshold range, and recording the matching positions;

computing the posterior probability between the matched text portion of each selected entry and the speech signal, and selecting several entries as the retrieval results for the speech signal using the posterior probability and the matching ratio obtained from the matching positions.

2. The method according to claim 1, characterized by further comprising: building the index table from the text entries to be retrieved, using syllables, characters or words as index units, for one-level or multi-level indexing.

3. The method according to claim 2, characterized by further comprising: training the language model, in whole or in part, on the preset text entry library.

4. The method according to claim 1, characterized in that the forms of the recognition result include the most probable text string for the speech signal, the several most probable text strings for the speech signal, and the word graph for the speech signal.

5. The method according to claim 1, characterized in that searching the preset text entry library according to the recognition result using the preset index table to obtain the preliminary entries specifically comprises:

voting for each character/word in the recognition result using the preset index table, and selecting the entries whose vote count exceeds a preset vote-count threshold as the preliminary entries;

where voting means using a character/word in the recognition result to look up an index item in the index table and, once the index item is found, incrementing by 1 the vote count of every entry listed under that index item.

6. The method according to claim 1, characterized in that the fuzzy matching algorithm uses a confusion-matrix-based dynamic-programming computation of the edit distance between texts, where the confusion matrix is obtained by training or set in advance and is used to optimize the substitution, insertion and deletion costs.

7. A speech fuzzy retrieval device, characterized by comprising:

a speech signal acquisition unit, configured to acquire a speech signal;

a recognition unit, configured to perform speech recognition on the acquired speech signal using a preset acoustic model and language model to obtain a recognition result;

a retrieval unit, configured to search a preset text entry library according to the recognition result using a preset index table to obtain preliminary entries;

a fuzzy matching unit, configured to perform string fuzzy matching between the preliminary entries and the recognition result, select as selected entries those whose matching degree is within a preset matching-degree threshold range, and record the matching positions;

a result determination unit, configured to compute the posterior probability between the matched portion of each selected entry and the speech signal, and to select several entries as the retrieval results for the speech signal using the posterior probability and the matching ratio obtained from the matching positions.

8. The device according to claim 7, characterized by further comprising:

an index table building unit, configured to build the index table from the preset text entry library to be retrieved, using syllables, characters or words as index units, the index table being used for one-level or multi-level indexing.

9. The device according to claim 8, characterized by further comprising:

a language model building unit, configured to train part or all of the language model on the preset text entry library.

10. The device according to claim 7, 8 or 9, characterized in that the retrieval unit comprises:

an index voting subunit, configured to vote for each character/word in the recognition result using the preset index table, where voting means using a character/word in the recognition result to look up an index item in the index table and, once the index item is found, incrementing by 1 the vote count of every entry listed under that index item;

a preliminary entry selection subunit, configured to select the entries whose vote count exceeds a preset vote-count threshold as the preliminary entries.