CN106534548A - Voice error correction method and device - Google Patents
- Publication number: CN106534548A
- Application number: CN201611034174.5A
- Authority: CN (China)
- Legal status: Granted
Classifications
- G10L15/22 — Speech recognition; procedures used during a speech recognition process, e.g. man-machine dialogue
- H04M1/7243 — User interfaces specially adapted for cordless or mobile telephones, with means for local support of applications that increase the functionality, with interactive means for internal management of messages
- H04M1/72436 — ... for text messaging, e.g. short messaging services [SMS] or e-mails
- H04M1/72439 — ... for image or video messaging
Abstract
The invention provides a voice error correction method and device. The method comprises: receiving voice data of a user; determining a current error correction mode, the mode being a semantic error correction mode or an index error correction mode; correcting the content to be corrected according to the user's voice data and the current error correction mode; and feeding the corrected content back to the user. The method improves both the accuracy and the applicable range of error correction, so that user requirements are better met and user experience is improved.
Description
Technical Field
The present application relates to the field of natural language understanding, and in particular, to a method and an apparatus for speech error correction.
Background
With the increasing maturity of artificial intelligence technology, more and more intelligent devices have entered users' lives, and human-machine interaction has become common. Voice is the most frequently used interaction mode, for example voice input and voice dialogue, because it frees the user's hands. Accordingly, more and more intelligent devices provide a voice error correction function that lets a user modify displayed content by voice, which further frees the user's hands and greatly improves user experience.
In the related art, voice error correction generally operates only on text data, and the user must issue corrections according to a fixed pattern. The resulting limitations are considerable, the error correction accuracy is low, and user requirements cannot be met.
Disclosure of Invention
The present application is directed to solving, at least to some extent, one of the technical problems in the related art.
Therefore, an object of the present application is to provide a speech error correction method that improves error correction accuracy and widens the applicable range, thereby better satisfying user requirements and improving user experience.
Another object of the present application is to provide a speech error correction apparatus.
In order to achieve the above object, an embodiment of the first aspect of the present application provides a speech error correction method, including: receiving user voice data; determining a current error correction mode, the error correction mode comprising: a semantic error correction mode or an index error correction mode; correcting the content to be corrected according to the user voice data and the current error correction mode; and feeding back the corrected content to the user.
According to the voice error correction method of the embodiment of the first aspect, determining the error correction mode allows the mode suited to the current scene to be selected, which improves error correction accuracy; and because the content to be corrected is not limited to text data, the applicable range is widened. Improving accuracy and widening the applicable range together better satisfy user requirements and improve user experience.
In order to achieve the above object, an embodiment of the second aspect of the present application provides a speech error correction apparatus, including: the receiving module is used for receiving user voice data; a determining module, configured to determine a current error correction mode, where the error correction mode includes: a semantic error correction mode or an index error correction mode; the error correction module is used for correcting the error of the content to be corrected according to the user voice data and the current error correction mode; and the feedback module is used for feeding back the content after error correction to the user.
According to the voice error correction apparatus of the embodiment of the second aspect, determining the error correction mode likewise allows the mode suited to the current scene to be selected, improving error correction accuracy; and because the content to be corrected is not limited to text data, the applicable range is widened, so that user requirements are better satisfied and user experience is improved.
Additional aspects and advantages of the present application will be set forth in part in the description which follows and, in part, will be obvious from the description, or may be learned by practice of the present application.
Drawings
The foregoing and/or additional aspects and advantages of the present application will become apparent and readily appreciated from the following description of the embodiments, taken in conjunction with the accompanying drawings of which:
FIG. 1 is a schematic flow chart of a speech error correction method according to an embodiment of the present application;
FIG. 2 is a schematic flow chart of a speech error correction method according to another embodiment of the present application;
FIG. 3 is a schematic diagram of an index constructed for each word in text data to be corrected in the embodiment of the present application;
FIG. 4 is a schematic diagram of text data to be corrected and corresponding candidate words and candidate indexes in an embodiment of the present application;
FIG. 5 is a schematic structural diagram of a speech error correction apparatus according to an embodiment of the present application;
FIG. 6 is a schematic structural diagram of a speech error correction apparatus according to another embodiment of the present application;
FIG. 7 is a schematic structural diagram of a speech error correction apparatus according to yet another embodiment of the present application.
Detailed Description
Reference will now be made in detail to embodiments of the present application, examples of which are illustrated in the accompanying drawings, wherein like or similar reference numerals denote like or similar modules, or modules having the same or similar functions, throughout. The embodiments described below with reference to the drawings are exemplary, are intended only to explain the present application, and are not to be construed as limiting it. On the contrary, the embodiments of the application include all changes, modifications and equivalents falling within the spirit and scope of the appended claims.
Fig. 1 is a schematic flow chart of a speech error correction method according to an embodiment of the present application.
As shown in fig. 1, the method of the present embodiment includes:
S11: User voice data is received.
The user voice data is generally voice data with which the user corrects errors in displayed content, such as erroneous displayed text data; the displayed content may also be of other types, such as images.
In this embodiment, when performing voice error correction, the method may be divided into multiple error correction modes, for example, a semantic error correction mode and an index error correction mode, and correspondingly, the user voice data may be voice data in the semantic error correction mode or may also be voice data in the index error correction mode.
Assume the content to be corrected is the text data 'a train ticket from Hefei to Beijing', while the user actually wants to book a train ticket from Nanjing to Beijing. The user can correct the error by voice: in the semantic error correction mode, the user speaks voice data such as 'change Hefei to Nanjing'; in the index error correction mode, the voice data is an index, which is generally a number, for example the user says 'two point two'.
The user voice data is generally determined according to the content that the user needs to modify, and the specific content is not limited in the present application.
S12: determining a current error correction mode, the error correction mode comprising: a semantic error correction mode or an index error correction mode.
In some examples, the current error correction mode may be automatically determined by the system.
In some examples, a user-selected current error correction mode may be received by the system.
Further, when the system determines the current error correction mode automatically, it may analyze the user's historical or current pronunciation, or the environment the user is currently in. If the user's pronunciation is relatively standard and the environment is relatively quiet, the quality of the user voice data is high and semantic understanding is accurate, so the system may automatically set the current error correction mode to the semantic error correction mode. Conversely, if the user's pronunciation is not standard or the ambient noise is high, the quality of the voice data is low and semantic understanding is unreliable; since digits are generally recognized more reliably than Chinese characters, the system may automatically set the current error correction mode to the index error correction mode.
Alternatively, when determining the current error correction mode automatically, the system may rely on the error correction modes the user has selected historically. If the user has usually selected the semantic error correction mode, indicating a preference for it, the system may automatically set the current error correction mode to the semantic error correction mode.
When the current error correction mode is selected by the user, the system may, for example, present the two error correction modes as options, through display or voice playback, and the user selects the current error correction mode by gesture, voice, or key operation.
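By way of illustration only, the mode decision described above can be sketched as follows. The thresholds, the pronunciation-quality and noise-level inputs, and all names are assumptions made for the sketch; the embodiment does not prescribe concrete values or interfaces.

```python
from enum import Enum
from typing import List, Optional

class Mode(Enum):
    SEMANTIC = "semantic"
    INDEX = "index"

# Illustrative thresholds only; the embodiment does not specify values.
PRONUNCIATION_OK = 0.7
NOISE_OK = 0.5

def determine_mode(pronunciation_score: float,
                   noise_level: float,
                   user_choice: Optional[Mode] = None,
                   history: Optional[List[Mode]] = None) -> Mode:
    """Pick the current error correction mode: an explicit user selection
    wins, then the user's historical preference, then the signal-quality
    heuristic described above."""
    if user_choice is not None:
        return user_choice
    if history:
        # Reuse the mode the user has selected most often in the past.
        return max(set(history), key=history.count)
    # Standard pronunciation in a quiet environment makes semantic
    # understanding reliable; otherwise digits are easier to recognize.
    if pronunciation_score >= PRONUNCIATION_OK and noise_level <= NOISE_OK:
        return Mode.SEMANTIC
    return Mode.INDEX
```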
It should be noted that although S11 precedes S12 in fig. 1, this is only an example. In practice, determining the current error correction mode may depend on the received user voice data, for example by analyzing the received voice data to judge whether the pronunciation is standard and choosing the mode accordingly; in that case S12 follows from S11. Alternatively, the current error correction mode may be independent of the received user voice data, for example when the system analyzes the user's current environment or follows the user's explicit selection; in that case S11 and S12 are independent of each other.
S13: The content to be corrected is corrected according to the user voice data and the current error correction mode.
In the semantic error correction mode, semantic understanding is performed on the user voice data, and the content to be corrected is then corrected according to the semantic understanding result. In the index error correction mode, an index is established for the content to be corrected, and the user corrects errors by referring to the index of the content to be corrected.
The content to be corrected includes text data and non-text data; the non-text data includes, but is not limited to, images, video, audio, and application programs.
In the semantic error correction mode, the user voice data generally corresponds to a textual command. Taking images as an example, the user voice data may be 'delete the fifth image' or 'insert a specified image after the second image'; taking application programs as an example, the user voice data may be 'close the 360 browser and open the IE browser'. In the index error correction mode, indexes are established in advance for the displayed items (for example, displayed videos) and the candidate index corresponding to each item is displayed; the user voice data is then generally voice data corresponding to a candidate index.
S14: The corrected content is fed back to the user.
For example, if the text data to be corrected is 'a train ticket from Hefei to Beijing' and the user voice data is 'change Hefei to Nanjing', then after voice error correction 'a train ticket from Nanjing to Beijing' is fed back to the user. The feedback may be given by displaying the content or by voice playback.
In the embodiment, by determining the error correction mode, the error correction mode suitable for the current scene can be selected, so that the error correction accuracy is improved; by correcting the content to be corrected, the method is not limited to processing the text data, and the application range can be expanded; therefore, the user requirements can be better met and the user experience is improved by improving the error correction accuracy and expanding the application range.
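The overall flow of S11-S14 can be summarized in a minimal sketch that reuses the Mode enumeration from the sketch above; the two handler callables are assumed stand-ins for the mode-specific processing detailed in the next embodiment.

```python
from typing import Callable

def correct(recognized_text: str,
            content: str,
            mode: Mode,
            semantic_handler: Callable[[str, str], str],
            index_handler: Callable[[str, str], str]) -> str:
    """S13 as a dispatch: route the recognized utterance to the handler for
    the current error correction mode; the returned corrected content is
    then fed back to the user (S14) by display or voice playback."""
    if mode is Mode.SEMANTIC:
        return semantic_handler(recognized_text, content)
    return index_handler(recognized_text, content)
```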
Fig. 2 is a schematic flow chart of a speech error correction method according to another embodiment of the present application.
In this embodiment, the content to be corrected is, by way of example, text data to be corrected.
As shown in fig. 2, the method of the present embodiment includes:
S21: User voice data is received.
S22: determining a current error correction mode, the error correction mode comprising: a semantic error correction mode or an index error correction mode.
The details of S21-S22 can be found in S11-S12, and are not described in detail herein.
Under different error correction modes, corresponding error correction methods are adopted.
Specifically, in the semantic error correction mode, S23-S24 are executed, followed by S28; in the index error correction mode, S25-S27 are executed, followed by S28.
S23: Voice recognition is performed on the user voice data to obtain recognition text data corresponding to the user voice data.
Speech recognition may employ various existing or future technologies, which are not described in detail here.
S24: Error correction information is determined according to the recognition text data, and the text data to be corrected is corrected according to the error correction information to obtain corrected text data.
In some examples, error correction information may be determined according to the recognition text data and preset error correction rules, and error correction is then performed using the error correction information; this may be referred to as the rule-based approach.
In other examples, error correction features of the recognition text data and of the text data to be corrected may be extracted, error correction information determined according to these features and a pre-constructed voice error correction model, and error correction performed using the error correction information; this may be referred to as the model-based approach.
The error correction information may include: error words and error correction words; error correction words and error correction locations; error words and error correction locations; or, error words, error correction words, and error correction locations.
The following are descriptions of the above two methods, respectively.
Method one: the rule-based approach.
The rule-based approach predefines voice error correction rules and determines the error correction information directly from them. The error correction rules may be predetermined according to application requirements, and the present application does not limit them in detail.
Take three types of error correction rules as an example: replacement rules, insertion rules, and deletion rules. Replacement error correction replaces an error word in the text data with the corresponding error correction word; insertion error correction inserts an error correction word at the corresponding position in the text data; deletion error correction deletes an error word from the text data. An example of each rule type follows, where '*' denotes an error word or an error correction word and '/' separates alternative trigger words:
(1) Replacement rules: "change/modify/replace * to *"
The former '*' represents the error word, and the latter '*' represents the error correction word.
(2) Insertion rules: "add * after/before *"
'after/before *' indicates the error correction position, which under this rule is the position where the error correction word is inserted; the other '*' represents the error correction word.
(3) Deletion rules: "delete/remove the * after/before *"
'after/before *' indicates the error correction position, which under this rule is the position from which the error word is deleted; the other '*' represents the error word.
When determining error correction information according to the recognition text data and the error correction rules, the currently applicable error correction rule may first be determined according to the recognition text data, and the recognition text data is then matched against the currently applicable rule to determine the error correction information.
Specifically, the system first judges the currently applicable error correction rule from the recognition text data corresponding to the user voice data. In practice this can be decided from keywords in the recognition text data: if the recognition text data contains a keyword such as 'modify', 'replace', or 'change', the currently applicable rule can be determined to be a replacement rule. String matching is then performed between the error correction rules of that type (for example, the replacement rules) and the recognition text data to determine the error correction information.
Alternatively, the recognition text data may be matched against each error correction rule in turn, and the error correction information of the matching rule is taken as the error correction information finally adopted.
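A minimal sketch of the second strategy follows, with English regular expressions standing in for the rule templates above (the embodiment's actual rules target Chinese text); the patterns and the match_rules helper are illustrative assumptions, not the patent's implementation.

```python
import re
from typing import Optional

# English stand-ins for the rule templates; "*" becomes a named capture group.
RULES = [
    # Replacement: change/modify/replace <error word> to <error correction word>
    ("replace", re.compile(
        r"^(?:change|modify|replace)\s+'?(?P<err>.+?)'?\s+(?:to|into|with)\s+'?(?P<corr>.+?)'?$")),
    # Insertion: add/insert <error correction word> after/before <anchor>
    ("insert", re.compile(
        r"^(?:add|insert)\s+'?(?P<corr>.+?)'?\s+(?P<where>after|before)\s+'?(?P<anchor>.+?)'?$")),
    # Deletion: delete/remove <error word>
    ("delete", re.compile(r"^(?:delete|remove)\s+'?(?P<err>.+?)'?$")),
]

def match_rules(recognized_text: str) -> Optional[dict]:
    """Match the recognition text against each rule in turn and return the
    extracted error correction information, or None if no rule matches."""
    text = recognized_text.strip()
    for kind, pattern in RULES:
        m = pattern.match(text)
        if m:
            return {"kind": kind, **m.groupdict()}
    return None

print(match_rules("change 'Hefei' to 'Nanjing'"))
# {'kind': 'replace', 'err': 'Hefei', 'corr': 'Nanjing'}
```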
After the error correction information is determined, the text data to be corrected can be corrected according to the error correction information.
In the specific error correction, the error correction position may be determined first: when the error correction information includes an error word, the position of the error word is taken as the error correction position; otherwise, the error correction position contained in the error correction information is used directly. Corresponding processing is then performed at the error correction position, such as replacing the error word there with the error correction word, inserting the error correction word at that position, or deleting the error word from that position.
Some examples of rule-based error correction are as follows:
(1) Replacement error correction
Text data to be corrected: a train ticket from Hefei to Beijing
Recognition text data corresponding to the user voice data: change 'Hefei' to 'Nanjing'
Error-corrected text data: a train ticket from Nanjing to Beijing
(2) Insertion error correction
Text data to be corrected: I want to play basketball
Recognition text data corresponding to the user voice data: add 'go to the East Campus gymnasium' before 'play basketball'
Error-corrected text data: I want to go to the East Campus gymnasium to play basketball
(3) Deletion error correction
Text data to be corrected: my phone number is five one two one eight six eight
Recognition text data corresponding to the user voice data: delete 'one eight'
Error-corrected text data: my phone number is five one two six eight
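The examples above can be reproduced by a small companion sketch that applies the extracted error correction information; it builds on the hypothetical match_rules helper from the earlier sketch and is, again, an illustration rather than the embodiment's implementation.

```python
def apply_correction(text: str, info: dict) -> str:
    """Apply extracted error correction information to the text to be
    corrected: replace the error word, insert the error correction word at
    the anchor position, or delete the error word."""
    kind = info["kind"]
    if kind == "replace":
        return text.replace(info["err"], info["corr"], 1)
    if kind == "insert":
        pos = text.find(info["anchor"])
        if pos < 0:
            return text  # anchor not found; leave the text unchanged
        if info["where"] == "before":
            return text[:pos] + info["corr"] + " " + text[pos:]
        end = pos + len(info["anchor"])
        return text[:end] + " " + info["corr"] + text[end:]
    if kind == "delete":
        return " ".join(text.replace(info["err"], "", 1).split())
    return text

info = match_rules("change 'Hefei' to 'Nanjing'")
print(apply_correction("a train ticket from Hefei to Beijing", info))
# a train ticket from Nanjing to Beijing

info = match_rules("delete 'one eight'")
print(apply_correction("my phone number is five one two one eight six eight", info))
# my phone number is five one two six eight
```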
Method two: the model-based approach.
Since the preset error correction rules are limited, a model-based error correction method may be used to improve coverage.
In the model-based error correction method, error correction features of the recognition text data and of the content to be corrected are first extracted, and error correction information is then determined according to the extracted features and a pre-constructed voice error correction model.
Taking the content to be corrected as the text data to be corrected as an example, determining the correction information based on the model may include:
(1) Perform word segmentation on the text data to be corrected and on the recognition text data, respectively.
(2) Extract the error correction features of each word in the text data to be corrected and in the recognition text data.
The error correction features include: the position of each word in the text data to be corrected, the word vector of each word, the word vectors of each word's context words, the mutual information between each word and its context words, and the error correction probability of each word; and, for the recognition text data, the word vector of each word, the word vectors of each word's context words, and the mutual information between each word and its context words.
The context words of a word are the words before or after it; how many words are considered on each side can be determined by application requirements, for example two. The error correction probability of each word in the text data to be corrected can be obtained from the user's historical habits: if the user frequently corrects a word, its error correction probability can be set higher. The mutual information between each word and its context words can be calculated with existing techniques.
(3) Determine error correction information according to the extracted error correction features and the pre-constructed voice error correction model.
In the specific determination, the extracted error correction features are used directly as the input features of the voice error correction model, and the model outputs the corresponding error word and/or error correction word together with the error correction position. For insertion error correction, no error word needs to be output (the error-word output is empty); only the error correction word and the corresponding error correction position are output. For deletion error correction, no error correction word needs to be output (the error-correction-word output is empty); only the error word and the corresponding error correction position are output. For replacement error correction, the error word, the error correction word, and the error correction position are all output.
The voice error correction model is constructed in advance by collecting a large amount of text data to be corrected together with the corresponding corrected text data and applying a deep learning method. In the specific construction, the error correction positions in the text data to be corrected, and the error words and/or error correction words, are first labelled; the error correction features of each word in the text data to be corrected and in the corrected text data are then extracted; finally, the error correction features are used as the input of the error correction model, the error correction positions and the error words and/or error correction words are used as the output of the model, and the model parameters are trained against the labelled results. The error correction model may be a deep neural network model.
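Under the assumption that an embedding table, a corpus mutual-information statistic, and per-user error correction probabilities are available as lookup functions, the feature vector of step (2) for one word might be assembled as in the following sketch; the dimensions, padding scheme, and names are illustrative only.

```python
import numpy as np

EMB_DIM = 50   # illustrative embedding size
WINDOW = 2     # context words considered on each side, as in the example above

def word_features(words, idx, embed, mutual_info, err_prob):
    """Assemble the error correction features listed above for words[idx]:
    its position, its word vector, the word vectors of its context words,
    its mutual information with each context word, and its historical error
    correction probability. embed, mutual_info and err_prob are assumed
    lookups, not part of the patent."""
    word = words[idx]
    left = words[max(0, idx - WINDOW):idx]
    right = words[idx + 1:idx + 1 + WINDOW]
    # Pad the context to a fixed size so the feature vector length is constant.
    context = [None] * (WINDOW - len(left)) + left + right + [None] * (WINDOW - len(right))
    ctx_vecs = [embed(c) if c is not None else np.zeros(EMB_DIM) for c in context]
    ctx_mi = [mutual_info(word, c) if c is not None else 0.0 for c in context]
    return np.concatenate([
        np.array([float(idx)]),        # position of the word
        embed(word),                   # word vector of the word
        np.concatenate(ctx_vecs),      # word vectors of the context words
        np.array(ctx_mi),              # mutual information with context words
        np.array([err_prob(word)]),    # historical error correction probability
    ])
```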
S25: and establishing candidate words and candidate indexes for the text data to be corrected.
The method specifically comprises the following steps:
(1) Perform word segmentation on the text data to be corrected.
This may be implemented using a variety of existing or future technologies.
(2) Construct an index for the words obtained by word segmentation.
Specifically, each word may be numbered in order, with the number used as the index of the word.
If the text data to be corrected is 'this is a wonderful fairy tale', then after word segmentation an index as shown in fig. 3 may be constructed for each word, where the number above each word is the index of the corresponding word.
(3) Determine the word-pair words corresponding to each word, and determine the candidate score of each word-pair word.
For example, for each word, other words having a word-pair relationship with it may be found from a word bank and taken as its word-pair words, specifically including the word's near-synonyms, homophones, and easily confused (error) words.
The candidate score of a word-pair word can be calculated from the word-pair scores of the word pairs formed by the word and the word-pair word. The word-pair score of each word pair can be calculated according to the category of the pair: for a near-synonym pair, from the semantic similarity of the two words; for a homophone pair, from the pronunciation similarity of the two words; and for an easily confused (error) pair, from the frequency with which the confusion occurs. The specific word-pair scores may be calculated using a variety of existing or future technologies.
When calculating the candidate score of a word-pair word from the word-pair scores: if the word and the word-pair word form word pairs of multiple categories, the word-pair scores of those categories are accumulated, and the accumulated score is taken as the candidate score of the word-pair word; if they form a word pair of a single category, the word-pair score of that pair is taken as the candidate score of the word-pair word.
For example, suppose the word is 'wonderful'. Its near-synonyms include 'nice' and 'beautiful': the near-synonym pair formed by 'wonderful' and 'nice' scores 0.7, and the pair formed by 'wonderful' and 'beautiful' scores 0.5. Its homophones include 'every second' (whose pronunciation in the original Chinese matches that of 'wonderful'): the homophone pair scores 0.6. Its easily confused words include 'nice' and one other word: the confused pair formed by 'wonderful' and 'nice' scores 0.4, and the other confused pair scores 0.2. The word pairs formed by 'nice' and 'wonderful' thus span multiple categories (a near-synonym pair and a confused pair), so their scores are accumulated, and the candidate score of 'nice' is 0.7 + 0.4 = 1.1. 'Every second' forms only a homophone pair with 'wonderful', so its candidate score is the score of that pair, namely 0.6.
It can be understood that all word pairs may be found in the word bank in advance and the word-pair score of each pair pre-computed, so that when the current text data to be corrected is processed, the required word-pair scores are simply looked up among the pre-computed scores. Online calculation is of course not excluded, for example determining the word pairs required by the current text data to be corrected and computing their word-pair scores in real time.
(4) Determine the candidate words of each word according to the candidate scores of its word-pair words, and construct a candidate index for each candidate word according to the index of the word and the candidate score of the candidate word.
Specifically, the word-pair words having candidate scores higher than a predetermined threshold may be selected as the candidate words of the corresponding word.
When constructing indexes for the candidate words, the candidate words may be numbered in descending order of candidate score, with the numbers used as their candidate indexes; the numbering takes the number of the word to which the candidates correspond as its base. For example, if a word is numbered 3, its candidate words are numbered 3.1, 3.2, ... in descending order of candidate score.
For example, the candidate words and candidate indexes established for the text data to be corrected may be as shown in fig. 4.
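Steps (1)-(4) can be condensed into the following sketch. The threshold value is an assumption; the segmentation in the demonstration drops the English article so that 'wonderful' is word 3, matching the index used in this text, and since the English renderings of the Chinese example words vary across this text, the ranking simply follows the scores quoted above.

```python
from collections import defaultdict

def build_candidates(words, word_pairs, threshold=0.3):
    """Number each word 1..n as its index (fig. 3); then, for each word,
    accumulate candidate scores across word-pair categories, keep the
    word-pair words scoring above the threshold, and number them
    i.1, i.2, ... in descending score order (fig. 4). word_pairs maps a
    word to (word-pair word, category, word-pair score) tuples."""
    indexed = {str(i): w for i, w in enumerate(words, start=1)}
    candidates = {}
    for i, word in enumerate(words, start=1):
        scores = defaultdict(float)
        for cand, _category, score in word_pairs.get(word, []):
            scores[cand] += score  # accumulate scores across categories
        ranked = sorted(((s, c) for c, s in scores.items() if s > threshold),
                        reverse=True)
        for j, (_score, cand) in enumerate(ranked, start=1):
            candidates[f"{i}.{j}"] = cand
    return indexed, candidates

# Word-pair scores quoted in the example above for "wonderful" (index 3)
pairs = {"wonderful": [("nice", "synonym", 0.7), ("beautiful", "synonym", 0.5),
                       ("every second", "homophone", 0.6), ("nice", "confused", 0.4)]}
_, cands = build_candidates(["this", "is", "wonderful", "fairy", "tale"], pairs)
print(cands)  # {'3.1': 'nice', '3.2': 'every second', '3.3': 'beautiful'}
```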
It can be understood that the candidate words and candidate indexes corresponding to the text data to be corrected can be updated in real time as the text data to be corrected changes.
S26: Voice recognition is performed on the user voice data to obtain the recognition text data corresponding to the user voice data, the recognition text data comprising an error correction index.
Speech recognition may employ various existing or future technologies, which will not be described in detail herein.
The error correction index is the index corresponding to the correct word to be used; for example, the user says 'three point one'.
S27: In the content to be corrected, the corresponding erroneous content is replaced with the candidate content corresponding to the error correction index, to obtain the corrected text data.
If the user says 'three point one', then since the candidate content corresponding to index 3.1 is 'beautiful', 'beautiful' replaces the corresponding erroneous content, namely 'wonderful', and the corrected content 'this is a beautiful fairy tale' is obtained.
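A minimal sketch of this replacement step follows, using a candidate table in which index 3.1 maps to 'beautiful' as in the example above; joining the words with spaces is an artifact of the English illustration, since the embodiment operates on Chinese text.

```python
def index_correct(spoken_index: str, words, candidates):
    """Index-mode error correction: the user speaks a candidate index such
    as 'three point one' (recognized as '3.1'), and the word at position 3
    is replaced by the candidate stored under that index."""
    pos = int(spoken_index.split(".")[0]) - 1
    corrected = list(words)
    corrected[pos] = candidates[spoken_index]
    return " ".join(corrected)

words = ["this", "is", "wonderful", "fairy", "tale"]
print(index_correct("3.1", words, {"3.1": "beautiful"}))
# this is beautiful fairy tale
```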
It should be noted that when the candidate words do not contain the error correction word the user needs, the user may correct the error using the semantic error correction mode instead.
S28: The corrected text data is fed back to the user.
For example, if the text data to be corrected is 'a train ticket from Hefei to Beijing' and the user voice data is 'change Hefei to Nanjing', then after voice error correction 'a train ticket from Nanjing to Beijing' is fed back to the user. The feedback may be given by displaying the content or by voice playback.
In this embodiment, determining the error correction mode allows the mode suited to the current scene to be selected, improving error correction accuracy; because the content to be corrected is not limited to text data, the applicable range is widened; together, these better satisfy user requirements and improve user experience. Furthermore, in the semantic error correction mode, error correction can be performed based on rules or on a model, further widening the applicable range and improving accuracy. In the index error correction mode, the user corrects errors by speaking an index: since the user only needs to speak a number, this is more convenient than speaking text, and since numbers are relatively easy to recognize, implementation complexity is reduced.
Fig. 5 is a schematic structural diagram of a speech error correction apparatus according to an embodiment of the present application.
As shown in fig. 5, the apparatus 50 of the present embodiment includes: a receiving module 51, a determining module 52, an error correcting module 53 and a feedback module 54.
A receiving module 51, configured to receive user voice data;
a determining module 52, configured to determine a current error correction mode, where the error correction mode includes: a semantic error correction mode or an index error correction mode;
an error correction module 53, configured to perform error correction on content to be error corrected according to the user voice data and the current error correction mode;
and a feedback module 54, configured to feed back the error-corrected content to the user.
In some embodiments, referring to fig. 6, if the current error correction mode is the semantic error correction mode, the error correction module 53 includes:
the voice recognition sub-module 5301 is configured to perform voice recognition on the user voice data to obtain recognition text data corresponding to the user voice data;
the error correction sub-module 5302 is configured to determine error correction information according to the recognition text data, and to correct the content to be corrected according to the error correction information to obtain the corrected content.
In some embodiments, the error correction sub-module 5302 being configured to determine error correction information according to the recognition text data includes:
determining error correction information according to the recognition text data and preset error correction rules;
and/or,
extracting error correction features of the recognition text data and of the content to be corrected, and determining error correction information according to the error correction features and a pre-constructed voice error correction model.
In some embodiments, the error correction sub-module 5302 being configured to determine error correction information according to the recognition text data and the preset error correction rules includes:
determining a currently applicable error correction rule according to the recognition text data, and matching the recognition text data against the currently applicable error correction rule to determine the error correction information; or,
matching the recognition text data against each error correction rule to determine the error correction information.
In some embodiments, the error correction information comprises:
error words and error correction words; error correction words and error correction locations; error words and error correction locations; or, error words, error correction words, and error correction locations.
In some embodiments, referring to fig. 7, if the current error correction mode is the index error correction mode, the error correction module 53 includes:
the establishing submodule 5311 is configured to establish candidate content and a candidate index for the content to be corrected;
a speech recognition sub-module 5312, configured to perform speech recognition on the user speech data to obtain recognition text data corresponding to the user speech data, the recognition text data comprising an error correction index;
the error correction submodule 5313 is configured to, in the content to be error corrected, replace the corresponding error content with the candidate content corresponding to the error correction index, so as to obtain the content after error correction.
In some embodiments, if the content to be corrected is text data to be corrected, the candidate content is a candidate word, and the establishing sub-module 5311 is specifically configured to:
performing word segmentation on text data to be corrected;
constructing an index for the words obtained by word segmentation;
determining the word-pair words corresponding to each word, and determining the candidate scores of the word-pair words;
determining the candidate words of each word according to the candidate scores of its word-pair words, and constructing candidate indexes for the candidate words according to the index of the word and the candidate scores of the candidate words.
In some embodiments, the content to be error-corrected includes:
textual data and non-textual data.
It is understood that the apparatus of the present embodiment corresponds to the method embodiment described above, and specific contents may be referred to the related description of the method embodiment, and are not described in detail herein.
In the embodiment, by determining the error correction mode, the error correction mode suitable for the current scene can be selected, so that the error correction accuracy is improved; by correcting the content to be corrected, the method is not limited to processing the text data, and the application range can be expanded; therefore, the user requirements can be better met and the user experience is improved by improving the error correction accuracy and expanding the application range.
It is understood that the same or similar parts in the above embodiments may be mutually referred to, and the same or similar parts in other embodiments may be referred to for the content which is not described in detail in some embodiments.
It should be noted that, in the description of the present application, the terms "first", "second", etc. are used for descriptive purposes only and are not to be construed as indicating or implying relative importance. Further, in the description of the present application, the meaning of "a plurality" means at least two unless otherwise specified.
Any process or method descriptions in flow charts or otherwise described herein may be understood as representing modules, segments, or portions of code which include one or more executable instructions for implementing specific logical functions or steps of the process, and the scope of the preferred embodiments of the present application includes other implementations in which functions may be executed out of order from that shown or discussed, including substantially concurrently or in reverse order, depending on the functionality involved, as would be understood by those reasonably skilled in the art of the present application.
It should be understood that portions of the present application may be implemented in hardware, software, firmware, or a combination thereof. In the above embodiments, the various steps or methods may be implemented in software or firmware stored in memory and executed by a suitable instruction execution system. For example, if implemented in hardware, as in another embodiment, any one or combination of the following techniques, which are known in the art, may be used: a discrete logic circuit having a logic gate circuit for implementing a logic function on a data signal, an application specific integrated circuit having an appropriate combinational logic gate circuit, a Programmable Gate Array (PGA), a Field Programmable Gate Array (FPGA), or the like.
It will be understood by those skilled in the art that all or part of the steps carried by the method for implementing the above embodiments may be implemented by hardware related to instructions of a program, which may be stored in a computer readable storage medium, and when the program is executed, the program includes one or a combination of the steps of the method embodiments.
In addition, functional units in the embodiments of the present application may be integrated into one processing module, or each unit may exist alone physically, or two or more units are integrated into one module. The integrated module can be realized in a hardware mode, and can also be realized in a software functional module mode. The integrated module, if implemented in the form of a software functional module and sold or used as a stand-alone product, may also be stored in a computer readable storage medium.
The storage medium mentioned above may be a read-only memory, a magnetic or optical disk, etc.
In the description herein, reference to the description of the term "one embodiment," "some embodiments," "an example," "a specific example," or "some examples," etc., means that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the application. In this specification, the schematic representations of the terms used above do not necessarily refer to the same embodiment or example. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples.
Although embodiments of the present application have been shown and described above, it is understood that the above embodiments are exemplary and should not be construed as limiting the present application, and that variations, modifications, substitutions and alterations may be made to the above embodiments by those of ordinary skill in the art within the scope of the present application.
Claims (16)
1. A method of speech error correction, comprising:
receiving user voice data;
determining a current error correction mode, the error correction mode comprising: a semantic error correction mode or an index error correction mode;
correcting the content to be corrected according to the user voice data and the current error correction mode;
and feeding back the corrected content to the user.
2. The method of claim 1, wherein if the current error correction mode is a semantic error correction mode, said correcting the content to be corrected according to the user voice data and the current error correction mode comprises:
carrying out voice recognition on the user voice data to obtain recognition text data corresponding to the user voice data;
and determining error correction information according to the recognition text data, and correcting the content to be corrected according to the error correction information to obtain corrected content.
3. The method of claim 2, wherein determining error correction information based on the recognition text data comprises:
determining error correction information according to the recognition text data and a preset error correction rule;
and/or,
extracting error correction features of the recognition text data and the content to be corrected, and determining error correction information according to the error correction features and a pre-constructed voice error correction model.
4. The method according to claim 3, wherein determining error correction information according to the recognition text data and a preset error correction rule comprises:
determining a currently applicable error correction rule according to the recognition text data, and matching the recognition text data with the currently applicable error correction rule to determine error correction information; or,
matching the recognition text data with each error correction rule to determine error correction information.
5. The method according to any of claims 2-4, wherein the error correction information comprises:
error words and error correction words; error correction words and error correction locations; error words and error correction locations; or, error words, error correction words, and error correction locations.
6. The method of claim 1, wherein if the current error correction mode is an index error correction mode, said correcting the error of the content to be corrected according to the user speech data and the current error correction mode comprises:
establishing candidate contents and candidate indexes for the contents to be corrected;
performing voice recognition on the user voice data to obtain recognition text data corresponding to the user voice data, the recognition text data comprising an error correction index;
and in the content to be corrected, replacing the corresponding error content with the candidate content corresponding to the error correction index to obtain the corrected content.
7. The method according to claim 6, wherein if the content to be corrected is text data to be corrected, the candidate content is a candidate word, and the establishing of the candidate content and the candidate index for the content to be corrected comprises:
performing word segmentation on text data to be corrected;
constructing an index for the words obtained by word segmentation;
determining word-pair words corresponding to the words, and determining candidate scores of the word-pair words;
determining candidate words of the words according to the candidate scores of the corresponding word-pair words, and constructing candidate indexes for the candidate words according to the indexes of the words and the candidate scores of the candidate words.
8. The method according to claim 1, wherein the content to be error-corrected comprises:
textual data and non-textual data.
9. A speech error correction apparatus, comprising:
the receiving module is used for receiving user voice data;
a determining module, configured to determine a current error correction mode, where the error correction mode includes: a semantic error correction mode or an index error correction mode;
the error correction module is used for correcting the error of the content to be corrected according to the user voice data and the current error correction mode;
and the feedback module is used for feeding back the content after error correction to the user.
10. The apparatus of claim 9, wherein if the current error correction mode is the semantic error correction mode, the error correction module comprises:
the voice recognition submodule is used for carrying out voice recognition on the user voice data to obtain recognition text data corresponding to the user voice data;
and the error correction submodule is used for determining error correction information according to the recognition text data, and for correcting the content to be corrected according to the error correction information to obtain the corrected content.
11. The apparatus of claim 10, wherein the error correction sub-module is configured to determine error correction information based on the recognition text data, comprising:
determining error correction information according to the recognition text data and a preset error correction rule;
and/or,
extracting error correction features of the recognition text data and the content to be corrected, and determining error correction information according to the error correction features and a pre-constructed voice error correction model.
12. The apparatus of claim 10, wherein the error correction sub-module is configured to determine error correction information according to the recognition text data and a preset error correction rule, comprising:
determining a currently applicable error correction rule according to the recognition text data, and matching the recognition text data with the currently applicable error correction rule to determine error correction information; or,
matching the recognition text data with each error correction rule to determine error correction information.
13. The apparatus according to any of claims 10-12, wherein the error correction information comprises:
error words and error correction words; error correction words and error correction locations; error words and error correction locations; or, error words, error correction words, and error correction locations.
14. The apparatus of claim 9, wherein if the current error correction mode is the index error correction mode, the error correction module comprises:
the establishing submodule is used for establishing candidate contents and candidate indexes for the contents to be corrected;
a voice recognition submodule, configured to perform voice recognition on the user voice data to obtain recognition text data corresponding to the user voice data, the recognition text data comprising an error correction index;
and the error correction submodule is used for replacing the corresponding error content with the candidate content corresponding to the error correction index in the content to be corrected to obtain the corrected content.
15. The apparatus according to claim 14, wherein if the content to be corrected is text data to be corrected, the candidate content is a candidate word, and the creating sub-module is specifically configured to:
performing word segmentation on text data to be corrected;
constructing an index for the words obtained by word segmentation;
determining word-pair words corresponding to the words, and determining candidate scores of the word-pair words;
determining candidate words of the words according to the candidate scores of the corresponding word-pair words, and constructing candidate indexes for the candidate words according to the indexes of the words and the candidate scores of the candidate words.
16. The apparatus of claim 9, wherein the content to be corrected comprises:
textual data and non-textual data.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201611034174.5A CN106534548B (en) | 2016-11-17 | 2016-11-17 | Voice error correction method and device |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201611034174.5A CN106534548B (en) | 2016-11-17 | 2016-11-17 | Voice error correction method and device |
Publications (2)
Publication Number | Publication Date |
---|---|
CN106534548A (en) | 2017-03-22 |
CN106534548B CN106534548B (en) | 2020-06-12 |
Family
ID=58357791
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201611034174.5A Active CN106534548B (en) | 2016-11-17 | 2016-11-17 | Voice error correction method and device |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN106534548B (en) |
Cited By (19)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106992001A (en) * | 2017-03-29 | 2017-07-28 | 百度在线网络技术(北京)有限公司 | Voice instruction processing method, device and system |
CN107221328A (en) * | 2017-05-25 | 2017-09-29 | 百度在线网络技术(北京)有限公司 | Method and device for positioning modification source, computer equipment and computer-readable recording medium |
CN107463601A (en) * | 2017-06-13 | 2017-12-12 | 北京百度网讯科技有限公司 | Dialog understanding system construction method, device, equipment and computer-readable storage medium based on artificial intelligence |
CN107480118A (en) * | 2017-08-16 | 2017-12-15 | 科大讯飞股份有限公司 | Text editing method and device |
CN107544726A (en) * | 2017-07-04 | 2018-01-05 | 百度在线网络技术(北京)有限公司 | Artificial intelligence-based speech recognition result error correction method, device and storage medium |
CN107832447A (en) * | 2017-11-22 | 2018-03-23 | 北京百度网讯科技有限公司 | User feedback error correction method and device for mobile terminal, and equipment thereof |
CN109215660A (en) * | 2018-07-09 | 2019-01-15 | 维沃移动通信有限公司 | Post-speech-recognition text error correction method and mobile terminal |
CN109243433A (en) * | 2018-11-06 | 2019-01-18 | 北京百度网讯科技有限公司 | Speech recognition method and device |
CN109389977A (en) * | 2018-11-01 | 2019-02-26 | 腾讯大地通途(北京)科技有限公司 | Voice interaction method and device |
CN109994105A (en) * | 2017-12-29 | 2019-07-09 | 宝马股份公司 | Data input method, device, system, vehicle and readable storage medium |
CN110211577A (en) * | 2019-07-19 | 2019-09-06 | 宁波方太厨具有限公司 | Terminal equipment and voice interaction method thereof |
CN110413445A (en) * | 2018-04-28 | 2019-11-05 | 北京搜狗科技发展有限公司 | Input processing method, device, electronic equipment and storage medium |
CN110415679A (en) * | 2019-07-25 | 2019-11-05 | 北京百度网讯科技有限公司 | Voice error correction method, device, equipment and storage medium |
CN110717021A (en) * | 2019-09-17 | 2020-01-21 | 平安科技(深圳)有限公司 | Input text acquisition in artificial intelligence interview and related device |
CN110781665A (en) * | 2019-10-29 | 2020-02-11 | 腾讯科技(深圳)有限公司 | Method, device and equipment for evaluating quality of error correction pair and storage medium |
CN110929875A (en) * | 2019-10-12 | 2020-03-27 | 平安国际智慧城市科技股份有限公司 | Intelligent language learning method, system, device and medium based on machine learning |
CN111626049A (en) * | 2020-05-27 | 2020-09-04 | 腾讯科技(深圳)有限公司 | Title correction method and device for multimedia information, electronic equipment and storage medium |
CN112365892A (en) * | 2020-11-10 | 2021-02-12 | 杭州大搜车汽车服务有限公司 | Man-machine interaction method, device, electronic device and storage medium |
CN116229975A (en) * | 2023-03-17 | 2023-06-06 | 杭州盈禾嘉田科技有限公司 | System and method for voice reporting of field diseases and insect pests in intelligent interaction scene |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN1979638A (en) * | 2005-12-02 | 2007-06-13 | 中国科学院自动化研究所 | Method for correcting errors in speech recognition results |
CN101655837A (en) * | 2009-09-08 | 2010-02-24 | 北京邮电大学 | Method for detecting and correcting errors in text after speech recognition |
CN104464736A (en) * | 2014-12-15 | 2015-03-25 | 北京百度网讯科技有限公司 | Error correction method and device for speech recognition text |
CN104978964A (en) * | 2014-04-14 | 2015-10-14 | 美的集团股份有限公司 | Voice control instruction error correction method and system |
Patent Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN1979638A (en) * | 2005-12-02 | 2007-06-13 | 中国科学院自动化研究所 | Method for correcting errors in speech recognition results |
CN101655837A (en) * | 2009-09-08 | 2010-02-24 | 北京邮电大学 | Method for detecting and correcting errors in text after speech recognition |
CN104978964A (en) * | 2014-04-14 | 2015-10-14 | 美的集团股份有限公司 | Voice control instruction error correction method and system |
CN104464736A (en) * | 2014-12-15 | 2015-03-25 | 北京百度网讯科技有限公司 | Error correction method and device for speech recognition text |
Cited By (36)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106992001B (en) * | 2017-03-29 | 2020-05-22 | 百度在线网络技术(北京)有限公司 | Voice instruction processing method, device and system |
CN106992001A (en) * | 2017-03-29 | 2017-07-28 | 百度在线网络技术(北京)有限公司 | Voice instruction processing method, device and system |
US10528670B2 (en) | 2017-05-25 | 2020-01-07 | Baidu Online Network Technology (Beijing) Co., Ltd. | Amendment source-positioning method and apparatus, computer device and readable medium |
CN107221328A (en) * | 2017-05-25 | 2017-09-29 | 百度在线网络技术(北京)有限公司 | Method and device for positioning modification source, computer equipment and computer-readable recording medium |
CN107221328B (en) * | 2017-05-25 | 2021-02-19 | 百度在线网络技术(北京)有限公司 | Method and device for positioning modification source, computer equipment and readable medium |
CN107463601A (en) * | 2017-06-13 | 2017-12-12 | 北京百度网讯科技有限公司 | Dialog understanding system construction method, device, equipment and computer-readable storage medium based on artificial intelligence |
US11727302B2 (en) | 2017-06-13 | 2023-08-15 | Beijing Baidu Netcom Science And Technology Co., Ltd. | Method and apparatus for building a conversation understanding system based on artificial intelligence, device and computer-readable storage medium |
CN107463601B (en) * | 2017-06-13 | 2021-02-12 | 北京百度网讯科技有限公司 | Dialog understanding system construction method, device and equipment based on artificial intelligence and computer readable storage medium |
CN107544726A (en) * | 2017-07-04 | 2018-01-05 | 百度在线网络技术(北京)有限公司 | Artificial intelligence-based speech recognition result error correction method, device and storage medium |
CN107480118A (en) * | 2017-08-16 | 2017-12-15 | 科大讯飞股份有限公司 | Text editing method and device |
CN107480118B (en) * | 2017-08-16 | 2024-05-31 | 科大讯飞股份有限公司 | Text editing method and device |
CN107832447A (en) * | 2017-11-22 | 2018-03-23 | 北京百度网讯科技有限公司 | User feedback error correction method and device for mobile terminal, and equipment thereof |
CN109994105A (en) * | 2017-12-29 | 2019-07-09 | 宝马股份公司 | Data input method, device, system, vehicle and readable storage medium |
CN110413445B (en) * | 2018-04-28 | 2024-02-02 | 北京搜狗科技发展有限公司 | Input processing method, input processing device, electronic equipment and storage medium |
CN110413445A (en) * | 2018-04-28 | 2019-11-05 | 北京搜狗科技发展有限公司 | Input processing method, device, electronic equipment and storage medium |
CN109215660A (en) * | 2018-07-09 | 2019-01-15 | 维沃移动通信有限公司 | Post-speech-recognition text error correction method and mobile terminal |
CN109389977A (en) * | 2018-11-01 | 2019-02-26 | 腾讯大地通途(北京)科技有限公司 | Voice interaction method and device |
CN109389977B (en) * | 2018-11-01 | 2021-07-16 | 腾讯大地通途(北京)科技有限公司 | Voice interaction method and device |
CN109243433A (en) * | 2018-11-06 | 2019-01-18 | 北京百度网讯科技有限公司 | Speech recognition method and device |
CN109243433B (en) * | 2018-11-06 | 2021-07-09 | 北京百度网讯科技有限公司 | Speech recognition method and device |
CN110211577B (en) * | 2019-07-19 | 2021-06-04 | 宁波方太厨具有限公司 | Terminal equipment and voice interaction method thereof |
CN110211577A (en) * | 2019-07-19 | 2019-09-06 | 宁波方太厨具有限公司 | Terminal equipment and voice interaction method thereof |
CN110415679A (en) * | 2019-07-25 | 2019-11-05 | 北京百度网讯科技有限公司 | Voice error correction method, device, equipment and storage medium |
CN110415679B (en) * | 2019-07-25 | 2021-12-17 | 北京百度网讯科技有限公司 | Voice error correction method, device, equipment and storage medium |
US11328708B2 (en) | 2019-07-25 | 2022-05-10 | Beijing Baidu Netcom Science And Technology Co., Ltd. | Speech error-correction method, device and storage medium |
CN110717021A (en) * | 2019-09-17 | 2020-01-21 | 平安科技(深圳)有限公司 | Input text acquisition in artificial intelligence interview and related device |
CN110717021B (en) * | 2019-09-17 | 2023-08-29 | 平安科技(深圳)有限公司 | Input text acquisition and related device in artificial intelligence interview |
CN110929875A (en) * | 2019-10-12 | 2020-03-27 | 平安国际智慧城市科技股份有限公司 | Intelligent language learning method, system, device and medium based on machine learning |
CN110781665A (en) * | 2019-10-29 | 2020-02-11 | 腾讯科技(深圳)有限公司 | Method, device and equipment for evaluating quality of error correction pair and storage medium |
CN110781665B (en) * | 2019-10-29 | 2023-04-07 | 腾讯科技(深圳)有限公司 | Method, device and equipment for evaluating quality of error correction pair and storage medium |
CN111626049A (en) * | 2020-05-27 | 2020-09-04 | 腾讯科技(深圳)有限公司 | Title correction method and device for multimedia information, electronic equipment and storage medium |
CN111626049B (en) * | 2020-05-27 | 2022-12-16 | 深圳市雅阅科技有限公司 | Title correction method and device for multimedia information, electronic equipment and storage medium |
CN112365892A (en) * | 2020-11-10 | 2021-02-12 | 杭州大搜车汽车服务有限公司 | Man-machine interaction method, device, electronic device and storage medium |
CN112365892B (en) * | 2020-11-10 | 2024-07-16 | 杭州大搜车汽车服务有限公司 | Man-machine conversation method, device, electronic device and storage medium |
CN116229975B (en) * | 2023-03-17 | 2023-08-18 | 杭州盈禾嘉田科技有限公司 | System and method for voice reporting of field diseases and insect pests in intelligent interaction scene |
CN116229975A (en) * | 2023-03-17 | 2023-06-06 | 杭州盈禾嘉田科技有限公司 | System and method for voice reporting of field diseases and insect pests in intelligent interaction scene |
Also Published As
Publication number | Publication date |
---|---|
CN106534548B (en) | 2020-06-12 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN106534548B (en) | Voice error correction method and device | |
JP6799574B2 (en) | Method and device for determining satisfaction with voice dialogue | |
CN107622054B (en) | Text data error correction method and device | |
CN107797984B (en) | Intelligent interaction method, equipment and storage medium | |
CN110245259B (en) | Video labeling method and device based on knowledge graph and computer readable medium | |
CN108287858B (en) | Semantic extraction method and device for natural language | |
CN106570180B (en) | Voice search method and device based on artificial intelligence | |
CN110164435A (en) | Speech recognition method, device, equipment and computer-readable storage medium | |
CN104899190B (en) | Word segmentation dictionary generation method and device, and word segmentation processing method and device | |
CN111161739B (en) | Speech recognition method and related product | |
CN107678561A (en) | Voice input error correction method and device based on artificial intelligence | |
CN111090727B (en) | Language conversion processing method and device and dialect voice interaction system | |
CN106992001A (en) | Voice instruction processing method, device and system | |
CN109637537B (en) | Method for automatically acquiring annotated data to optimize user-defined awakening model | |
CN112541095B (en) | Video title generation method and device, electronic equipment and storage medium | |
JP2020004382A (en) | Method and device for voice interaction | |
CN111553138B (en) | Auxiliary writing method and device for standardizing content structure document | |
US20160217704A1 (en) | Information processing device, control method therefor, and computer program | |
CN109471955B (en) | Video clip positioning method, computing device and storage medium | |
CN112151019A (en) | Text processing method and device and computing equipment | |
CN112163560A (en) | Video information processing method and device, electronic equipment and storage medium | |
CN110751234A (en) | OCR recognition error correction method, device and equipment | |
CN111144101A (en) | Wrongly written character processing method and device | |
CN116821324A (en) | Model training method and device, electronic equipment and storage medium | |
CN115883878A (en) | Video editing method and device, electronic equipment and storage medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||