CN108595431B - Voice interaction text error correction method, device, terminal and storage medium
- Publication number: CN108595431B
- Application number: CN201810399789.0A
- Authority: CN (China)
- Legal status: Active
Classifications
- G06F40/30: Handling natural language data; semantic analysis
- G06F40/289: Handling natural language data; phrasal analysis, e.g. finite state techniques or chunking (recognition of textual entities)
- G10L15/26: Speech recognition; speech to text systems
Abstract
The application discloses a voice interaction text error correction method, apparatus, terminal and storage medium, belonging to the field of speech recognition. The method comprises: calculating the character string co-occurrence probability between every two adjacent character strings in the interactive text obtained by speech recognition, according to pre-stored semantic attributes of each character string, and determining the character string to be corrected among the plurality of character strings according to the calculated probabilities; determining, based on the semantic attributes of the non-to-be-corrected character strings adjacent to the character string to be corrected, the first semantic attribute with the highest semantic attribute co-occurrence probability, and selecting, among the character strings corresponding to the first semantic attribute, the target character string whose pronunciation audio has the highest similarity to the pronunciation audio corresponding to the character string to be corrected. By calculating the co-occurrence probability between character strings indirectly through their semantic attributes, the method can correct character strings on which a text error correction model has not been trained, improving the error correction efficiency of interactive text.
Description
Technical Field
The present application relates to the field of speech recognition, and in particular to a voice interaction text error correction method, apparatus, terminal and storage medium.
Background
With the development of speech recognition technology, its applications have broadened steadily, and more and more users rely on functions such as voice search and voice control.
Under the influence of various external environmental factors, a speech recognition system may misrecognize some character strings during recognition. In the prior art, the speech recognition system attempts to correct the recognized erroneous text according to a pre-trained text error correction model; if correction succeeds, the original text is replaced or the user is prompted to correct it. For example, after acquiring voice data for "I want to see Fanghua" (Fanghua being a movie title) input by a user, the speech recognition system may generate an interactive text in which "Fanghua" is misrecognized as a near-homophone; it then segments the text, corrects the resulting character strings "I", "want to see" and the misrecognized string according to the text error correction model, and finally replaces the erroneous text with the successfully corrected "I want to see Fanghua".
The construction of a text error correction model in the prior art is usually based on the character strings themselves, so the error correction process cannot be completed for new words and rare words that the model does not cover. With the proliferation of internet vocabulary and newly coined words, and especially in the application scenario of television voice assistants, character strings such as film and television titles and music titles in the entertainment field emerge endlessly, and a text error correction model is poorly suited to such scenarios.
Disclosure of Invention
In order to solve the problem that a speech recognition system in the related art cannot correct character strings on which a text error correction model has not been trained, the embodiments of the present application provide a voice interaction text error correction method, apparatus, terminal and storage medium. The technical solution is as follows:
in a first aspect, a method for correcting text errors in voice interaction is provided, where the method includes:
acquiring voice data to be recognized, and performing voice recognition to obtain an interactive text corresponding to the voice data;
segmenting the interactive text to obtain a plurality of character strings, calculating the character string co-occurrence probability between every two adjacent character strings according to pre-stored semantic attributes of each character string, and determining the character string to be corrected among the plurality of character strings according to the calculated character string co-occurrence probabilities;
determining, based on the semantic attributes of the non-to-be-corrected character strings adjacent to the character string to be corrected, the first semantic attribute with the highest semantic attribute co-occurrence probability with those semantic attributes; selecting, among the character strings corresponding to the first semantic attribute, the target character string whose pronunciation audio has the highest similarity to the pronunciation audio corresponding to the character string to be corrected; and replacing the character string to be corrected with the target character string.
Optionally, the calculating, according to the pre-stored semantic attributes of the character strings, a character string co-occurrence probability between two adjacent character strings includes:
determining the semantic attributes of the adjacent first character string and the second character string according to the pre-stored semantic attributes of each character string;
determining the semantic attribute co-occurrence probability between the semantic attribute of the first character string and the semantic attribute of the second character string in the pre-stored corresponding relation of the semantic attribute co-occurrence probability between the semantic attributes;
and calculating the character string co-occurrence probability between the first character string and the second character string according to the semantic attribute co-occurrence probability between the semantic attribute of the first character string and the semantic attribute of the second character string.
Optionally, the determining the character string to be corrected among the plurality of character strings includes:
determining, among the plurality of character strings, a third character string whose character string co-occurrence probabilities with its adjacent character strings are all lower than a preset first probability threshold as the character string to be corrected.
Optionally, the selecting, from the character strings corresponding to the first semantic attribute with the highest semantic attribute co-occurrence probability, a target character string with a pronunciation audio having the highest similarity to the pronunciation audio corresponding to the character string to be corrected, includes:
screening out character strings of which the difference value between the length of the character string and the length of the character string corresponding to the character string to be corrected is smaller than a preset length threshold value in each character string corresponding to a first semantic attribute with the highest co-occurrence probability of the pre-stored semantic attributes;
respectively calculating the editing distance between the pronunciation audio corresponding to each character string and the pronunciation audio corresponding to the character string to be corrected in the character string obtained by screening;
and determining the character string with the minimum editing distance between the corresponding pronunciation audio and the pronunciation audio corresponding to the character string to be corrected as the target character string with the highest pronunciation audio similarity corresponding to the character string to be corrected.
Optionally, after determining the character string to be corrected among the plurality of character strings, the method further includes:
calculating the semantic attribute comprehensive probability corresponding to the interactive text according to the semantic attribute co-occurrence probability between the semantic attributes respectively corresponding to all the two adjacent character strings in the interactive text;
and if the comprehensive probability of the semantic attributes corresponding to the interactive text is lower than a preset third probability threshold, executing the step of determining a first semantic attribute with the highest probability of co-occurrence with the semantic attributes of the non-to-be-corrected character strings based on the semantic attributes of the non-to-be-corrected character strings adjacent to the to-be-corrected character strings.
In a second aspect, there is provided a voice interaction text correction apparatus, the apparatus comprising:
the acquisition module is used for acquiring voice data to be recognized and performing voice recognition to obtain an interactive text corresponding to the voice data;
the determining module is used for segmenting the interactive text to obtain a plurality of character strings, calculating the character string co-occurrence probability between two adjacent character strings according to the pre-stored semantic attributes of the character strings, and determining the character string to be corrected in the character strings according to the calculated character string co-occurrence probability;
and the replacing module is used for determining, based on the semantic attributes of the non-to-be-corrected character strings adjacent to the character string to be corrected, the first semantic attribute with the highest semantic attribute co-occurrence probability with those attributes, selecting, among the character strings corresponding to the first semantic attribute, the target character string whose pronunciation audio has the highest similarity to the pronunciation audio corresponding to the character string to be corrected, and replacing the character string to be corrected with the target character string.
Optionally, the determining module includes:
the first determining unit is used for determining the semantic attribute of the adjacent first character string and the semantic attribute of the second character string according to the pre-stored semantic attribute of each character string;
a second determining unit, configured to determine, in a correspondence relationship between pre-stored semantic attribute co-occurrence probabilities among the semantic attributes, a semantic attribute co-occurrence probability between the semantic attribute of the first character string and the semantic attribute of the second character string;
a first calculating unit, configured to calculate a string co-occurrence probability between the first string and the second string according to a semantic attribute co-occurrence probability between semantic attributes of the first string and semantic attributes of the second string.
Optionally, the determining module further includes:
and the third determining unit is used for determining third character strings of which the character string co-occurrence probabilities with adjacent character strings are all lower than a preset first probability threshold value as character strings to be corrected in the plurality of character strings.
Optionally, the replacing module includes:
the screening unit is used for screening out character strings of which the difference value between the length of the character string and the length of the character string corresponding to the character string to be corrected is smaller than a preset length threshold in each character string corresponding to a first semantic attribute with the highest pre-stored semantic attribute co-occurrence probability;
the second calculation unit is used for respectively calculating the editing distance between the pronunciation audio corresponding to each character string and the pronunciation audio corresponding to the character string to be corrected in the character string obtained by screening;
and the fourth determining unit is used for determining the character string with the minimum editing distance between the corresponding pronunciation audio and the pronunciation audio corresponding to the character string to be corrected as the target character string with the highest pronunciation audio similarity corresponding to the character string to be corrected.
Optionally, the apparatus further includes:
the third calculating unit is used for calculating the semantic attribute comprehensive probability corresponding to the interactive text according to the semantic attribute co-occurrence probability between the semantic attributes respectively corresponding to all the two adjacent character strings in the interactive text;
and a fifth determining unit, configured to execute the step of determining, based on the semantic attribute of the non-to-be-corrected character string adjacent to the character string to be corrected, the first semantic attribute with the highest probability of co-occurrence with the semantic attribute of the non-to-be-corrected character string, if the comprehensive probability of the semantic attribute corresponding to the interactive text is lower than a preset third probability threshold.
In a third aspect, a terminal is provided, which includes a processor and a memory, where at least one instruction, at least one program, a set of codes, or a set of instructions is stored in the memory, and the at least one instruction, the at least one program, the set of codes, or the set of instructions is loaded and executed by the processor to implement the method for text error correction for voice interaction according to the first aspect.
In a fourth aspect, a computer-readable storage medium is provided, in which at least one instruction, at least one program, a set of codes, or a set of instructions is stored, the at least one instruction, the at least one program, the set of codes, or the set of instructions being loaded and executed by a processor to implement the voice interaction text error correction method according to the first aspect.
The technical scheme provided by the embodiment of the application has the following beneficial effects:
the method provided by the embodiments of the application calculates the character string co-occurrence probability between adjacent character strings based on the semantic attributes of the respective character strings in order to determine the character string to be corrected in the interactive text, and corrects that character string according to the semantic attributes of the adjacent non-to-be-corrected character strings. Because a newly added vocabulary item does not need to be fed into a model for training, and the error correction of the text containing it can be completed from its semantic attributes alone, this solves the problem that a speech recognition system in the related art has difficulty accurately correcting character strings on which the text error correction model has not been trained.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present application, the drawings needed in the description of the embodiments are briefly introduced below. The drawings described below are only some embodiments of the present application; those skilled in the art can obtain other drawings based on them without creative effort.
FIG. 1 is a flowchart illustrating a method for correcting text errors in a voice interaction according to an embodiment of the present application;
FIG. 2A is a flowchart illustrating a method for correcting text errors in a voice interaction according to another embodiment of the present application;
FIG. 2B is a table illustrating correspondence between strings and semantic attributes provided by one embodiment of the present application;
FIG. 2C is a table illustrating the correspondence of semantic attribute co-occurrence probabilities between semantic attributes provided by one embodiment of the present application;
FIG. 3 is a flowchart illustrating a method for correcting text errors in a voice interaction according to still another embodiment of the present application;
FIG. 4 is a flowchart illustrating a method for correcting text errors in a voice interaction according to another embodiment of the present application;
FIG. 5 is a block diagram illustrating the structure of a text error correction apparatus for voice interaction provided in an embodiment of the present application;
FIG. 6 is a block diagram of a terminal 600 according to an exemplary embodiment of the present application.
Detailed Description
To make the objects, technical solutions and advantages of the present application more clear, embodiments of the present application will be described in further detail below with reference to the accompanying drawings.
The embodiments of the application provide a voice interaction text error correction method. Because a newly added vocabulary item does not need to be fed into a model for training, and the error correction of the text containing it can be completed from its semantic attributes alone, the method solves the problem that a speech recognition system in the related art has difficulty accurately correcting character strings on which a text error correction model has not been trained. The embodiments of the present application are described in further detail below.
The terms and specific explanations related to the embodiments of the present application are as follows:
semantic attribute co-occurrence probability: probability that semantic attributes respectively corresponding to two character strings adjacent to each other appear at the same time.
Probability of co-occurrence of character strings: the probability that the two character strings are adjacent to each other and appear simultaneously is calculated by the semantic attributes respectively corresponding to the two character strings.
Semantic attribute comprehensive probability: the comprehensive probability of the co-occurrence probability of the semantic attributes between the character strings included in the interactive text is used for representing the probability that the interactive text is a text with correct semantics, and is obtained by calculating the co-occurrence probability of the semantic attributes between the semantic attributes respectively corresponding to all the character strings which are adjacent in pairs in the interactive text.
Example 1
Referring to fig. 1, a flowchart of a method for correcting text errors in voice interaction according to an embodiment of the present application is shown. The voice interaction text error correction method can comprise the following steps:
101, acquiring voice data to be recognized, and performing voice recognition to obtain an interactive text corresponding to the voice data.
Optionally, a large amount of voice data and the voice texts corresponding to that data are used to train an acoustic model (such as a GMM-HMM model, a DNN-HMM model, or an RNN+CTC model); once the acoustic model is trained, the voice data to be recognized is acquired and the trained acoustic model is used to perform voice recognition on it, obtaining the corresponding interactive text.
Optionally, the execution subject performing the voice recognition in this embodiment may be a terminal or a server. When the execution subject is a terminal, the terminal collects the user's voice data through a microphone and performs voice recognition on the collected voice data; when the execution subject is a server, the server receives the voice data sent by a terminal and performs voice recognition on the received voice data.
102, segmenting the interactive text to obtain a plurality of character strings, calculating the character string co-occurrence probability between two adjacent character strings according to the pre-stored semantic attributes of each character string, and determining the character string to be corrected in the plurality of character strings according to the calculated character string co-occurrence probability.
Optionally, calculating the character string co-occurrence probability between two adjacent character strings according to the pre-stored semantic attributes of each character string may be implemented as: determining the semantic attributes of an adjacent first character string and second character string according to the pre-stored semantic attributes of each character string; determining, in the pre-stored correspondence of semantic attribute co-occurrence probabilities between semantic attributes, the semantic attribute co-occurrence probability between the semantic attribute of the first character string and the semantic attribute of the second character string; and calculating the character string co-occurrence probability between the first character string and the second character string according to that semantic attribute co-occurrence probability.
Optionally, determining the character string to be corrected among the plurality of character strings may be implemented as: determining a third character string whose character string co-occurrence probabilities with its adjacent character strings are all lower than a preset first probability threshold as the character string to be corrected.
For example, suppose the preset first probability threshold is 0.8 and the interactive text obtained after speech recognition is "I want to see Chihua" (a misrecognition of the movie title "Fanghua"), where the character string co-occurrence probability between "I" and "want to see" is 0.95 and that between "want to see" and "Chihua" is 0.75. Since the co-occurrence probability between "I" and its adjacent string "want to see" is higher than 0.8, "I" is determined to be a non-to-be-corrected character string; since the co-occurrence probabilities between "want to see" and its adjacent strings ("I" and "Chihua") are not all lower than 0.8, "want to see" is also determined to be a non-to-be-corrected character string; and since the co-occurrence probability between "Chihua" and its adjacent string "want to see" is lower than 0.8, "Chihua" is determined to be the character string to be corrected.
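The following is a minimal Python sketch of this thresholding step; the function and variable names, the `cooccur_prob` helper, and the inlined example probabilities are illustrative assumptions rather than part of the original disclosure.

```python
def find_strings_to_correct(strings, cooccur_prob, threshold=0.8):
    """Flag strings whose co-occurrence probability with EVERY
    adjacent string falls below the threshold."""
    to_correct = []
    for i, s in enumerate(strings):
        neighbors = []
        if i > 0:
            neighbors.append(cooccur_prob(strings[i - 1], s))
        if i < len(strings) - 1:
            neighbors.append(cooccur_prob(s, strings[i + 1]))
        # A string is suspect only if all of its adjacent probabilities are low.
        if neighbors and all(p < threshold for p in neighbors):
            to_correct.append(s)
    return to_correct

# Values from the example: P("I","want to see")=0.95, P("want to see","Chihua")=0.75
probs = {("I", "want to see"): 0.95, ("want to see", "Chihua"): 0.75}
print(find_strings_to_correct(["I", "want to see", "Chihua"],
                              lambda a, b: probs.get((a, b), 0.0)))  # ['Chihua']
```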
The word segmentation method may be character-by-character segmentation, segmentation by sentence component (subject, predicate, object, etc.), or the like; this embodiment does not limit the specific method. For example, for the interactive text "I want to see Fanghua", character-level segmentation yields the five segments "I", "want", "see", "Fang" and "Hua", while word-level segmentation yields the three segments "I", "want to see" and "Fanghua".
It should be noted that the interactive text may be segmented only by characters, only by words, or by a combination of character and word segmentation; this embodiment does not limit how the two are combined. A sketch of both granularities is shown below.
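As an illustration of word-level versus character-level segmentation, the following sketch assumes the open-source jieba segmenter and an assumed Chinese rendering of the example sentence; the exact word-level output depends on jieba's dictionary.

```python
import jieba  # assumed segmenter; pip install jieba

text = "我想看芳华"  # assumed rendering of "I want to see Fanghua"
print(list(jieba.cut(text)))  # word-level segmentation; output varies with the dictionary
print(list(text))             # character-level segmentation: one segment per character
```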
103, determining, based on the semantic attributes of the non-to-be-corrected character strings adjacent to the character string to be corrected, the first semantic attribute with the highest semantic attribute co-occurrence probability with those attributes; selecting, among the character strings corresponding to the first semantic attribute, the target character string whose pronunciation audio has the highest similarity to the pronunciation audio corresponding to the character string to be corrected; and replacing the character string to be corrected with the target character string.
Because a character string to be corrected may be located at different positions in the interactive text, the number of non-to-be-corrected character strings adjacent to it varies. When the character string to be corrected is at the beginning or end of the interactive text, it has one adjacent non-to-be-corrected character string, and the first semantic attribute can be determined from the semantic attribute of that single string; when the character string to be corrected is in the middle of the interactive text, it has two adjacent non-to-be-corrected character strings, and the first semantic attribute must be determined from the semantic attributes of both. Accordingly, in one possible implementation, the process of determining the first semantic attribute with the highest co-occurrence probability with the semantic attributes of the adjacent non-to-be-corrected character strings covers at least the following two cases:
In the first case, when the character string to be corrected is adjacent to only a fourth character string, the first semantic attribute with the highest co-occurrence probability with the second semantic attribute of the fourth character string is determined based on that second semantic attribute, where the fourth character string is a non-to-be-corrected character string.
For example, the interactive text "I want to see Chihua" obtained after speech recognition includes the character strings "I", "want to see" and "Chihua", where "Chihua" is the character string to be corrected. Because "Chihua" is adjacent only to the non-to-be-corrected character string "want to see", the first semantic attribute "movie name", which has the highest co-occurrence probability with "visual action", is determined based on the semantic attribute "visual action" of "want to see".
In the second case, when the character string to be corrected is adjacent to a fifth character string and a sixth character string (both non-to-be-corrected character strings), a set of semantic attributes whose co-occurrence probability with the third semantic attribute of the fifth character string and whose co-occurrence probability with the fourth semantic attribute of the sixth character string both reach a preset second probability threshold is determined based on those two attributes; for each semantic attribute in the set, the average semantic attribute co-occurrence probability is calculated from its first semantic attribute co-occurrence probability (with the third semantic attribute) and its second semantic attribute co-occurrence probability (with the fourth semantic attribute); and the semantic attribute with the highest average is determined as the first semantic attribute with the highest co-occurrence probability with the semantic attributes of the non-to-be-corrected character strings.
For example, suppose the preset second probability threshold is 0.8 and the interactive text obtained after speech recognition is "The movie Fanghua is good-looking", which includes the character strings "movie", "Fanghua" and "good-looking", where "Fanghua" is the character string to be corrected. Because "Fanghua" is adjacent to the non-to-be-corrected character strings "movie" and "good-looking", the semantic attribute set {"movie name", "TV drama name", "video name"} is determined based on the third semantic attribute "movie classification" of "movie" and the fourth semantic attribute "movie evaluation" of "good-looking". For "movie name", the average semantic attribute co-occurrence probability is 0.55, calculated from its first co-occurrence probability 0.8 with "movie classification" and its second co-occurrence probability 0.3 with "movie evaluation"; for "TV drama name", the average is 0.2, from 0.1 with "movie classification" and 0.3 with "movie evaluation"; and for "video name", the average is likewise 0.2, from 0.1 and 0.3. "Movie name", having the highest average, is determined as the first semantic attribute.
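A small sketch of the averaging step for the two-neighbor case, using the probabilities from the example above; the data structures and names are assumptions for illustration.

```python
def pick_first_attribute(candidates, prob_with_third, prob_with_fourth):
    """Average each candidate attribute's co-occurrence probabilities with the
    third and fourth semantic attributes; return the best candidate and all averages."""
    averages = {a: (prob_with_third[a] + prob_with_fourth[a]) / 2 for a in candidates}
    return max(averages, key=averages.get), averages

best, averages = pick_first_attribute(
    ["movie name", "TV drama name", "video name"],
    {"movie name": 0.8, "TV drama name": 0.1, "video name": 0.1},  # vs "movie classification"
    {"movie name": 0.3, "TV drama name": 0.3, "video name": 0.3},  # vs "movie evaluation"
)
print(best, averages)
# movie name {'movie name': 0.55, 'TV drama name': 0.2, 'video name': 0.2}
```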
After the first semantic attribute with the highest co-occurrence probability with the semantic attributes of the non-to-be-corrected character strings is determined, the target character string whose pronunciation audio has the highest similarity to the pronunciation audio corresponding to the character string to be corrected is selected from the pre-stored character strings corresponding to the first semantic attribute, and the character string to be corrected in the interactive text is replaced with the target character string.
Optionally, the corresponding relationship between the semantic attribute and each character string is stored locally in a table form.
A character string is composed of characters, and each character corresponds to pronunciation audio. Pronunciation audio consists of phonemes, the smallest units of speech; computing the similarity between the pronunciation audios of two character strings is therefore, in effect, computing the similarity between the two character strings themselves.
When the characters are Chinese characters, the pronunciation audio is the Chinese pinyin. For example, the character string "Fanghua" is composed of the characters "Fang" and "Hua"; the pronunciation audio of "Fang" is "fang" and that of "Hua" is "hua", so the pronunciation audio string of "Fanghua" is "fang hua".
It should be noted that the pronunciation audio similarity may be computed by means of the longest common substring, the longest common subsequence, the minimum edit distance, the Hamming distance, cosine similarity, and the like; this embodiment places no limitation on how the similarity is computed.
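As one of the listed options, here is a minimal sketch of the minimum edit distance (Levenshtein distance) computed over pinyin strings; treating each letter or space of the pinyin string as one unit is an assumption, since the embodiment does not fix the granularity.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance between two pronunciation (pinyin) strings."""
    m, n = len(a), len(b)
    dp = list(range(n + 1))  # dp[j] = distance between a[:i] and b[:j]
    for i in range(1, m + 1):
        prev, dp[0] = dp[0], i
        for j in range(1, n + 1):
            cur = dp[j]
            dp[j] = min(dp[j] + 1,                        # deletion
                        dp[j - 1] + 1,                    # insertion
                        prev + (a[i - 1] != b[j - 1]))    # substitution
            prev = cur
    return dp[n]

print(edit_distance("fang hua", "fan hua"))  # 1
print(edit_distance("fang hua", "fen fa"))   # 4
```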
To sum up, the method provided by this embodiment calculates the character string co-occurrence probability between adjacent character strings based on the semantic attributes of the respective character strings in order to determine the character string to be corrected in the interactive text, and corrects that character string according to the semantic attributes of the adjacent non-to-be-corrected character strings. Because a newly added vocabulary item does not need to be fed into a model for training, and the error correction of the text containing it can be completed from its semantic attributes alone, the method solves the problem that a speech recognition system in the related art has difficulty accurately correcting character strings on which the text error correction model has not been trained, and achieves accurate error correction of the interactive text without depending on an error correction model.
Example 2
Please refer to fig. 2A, which shows a flowchart of a method for correcting text errors in a voice interaction according to another embodiment of the present application. The voice interaction text error correction method can comprise the following steps:
Step 201, acquiring voice data to be recognized, and performing voice recognition to obtain an interactive text corresponding to the voice data.
Step 202, performing word segmentation on the interactive text to obtain a plurality of character strings, and determining the semantic attributes of an adjacent first character string and second character string according to the pre-stored semantic attributes of each character string.
Optionally, the correspondence between character strings and semantic attributes is stored locally in the form of a table. FIG. 2B shows such a table provided by an embodiment of the present application. As shown in FIG. 2B, in the interactive text "I want to see Fanghua" obtained after speech recognition, the semantic attribute of the character string "I" is "subject", that of "want to see" is "visual action", and that of "Fanghua" is "person name".
Step 203, determining, in the pre-stored correspondence of semantic attribute co-occurrence probabilities between semantic attributes, the semantic attribute co-occurrence probability between the semantic attribute of the first character string and the semantic attribute of the second character string.
Optionally, the correspondence of semantic attribute co-occurrence probabilities between semantic attributes is stored locally in the form of a table. FIG. 2C shows such a table provided by an embodiment of the present application. As can be seen from FIGS. 2B and 2C, in the interactive text "I want to see Fanghua", the semantic attribute co-occurrence probability between "subject" (the attribute of "I") and "visual action" (the attribute of "want to see") is 0.2; and since no semantic attribute co-occurrence probability between "visual action" (the attribute of "want to see") and "person name" (the attribute of "Fanghua") is found in the correspondence, that probability is determined to be 0.
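A minimal sketch of steps 202 and 203 with the two tables of FIGS. 2B and 2C rendered as dictionaries; the dictionary contents are assumed from the example, and attribute pairs absent from the table default to 0 as described above.

```python
# Assumed dictionary renderings of FIG. 2B (string -> attribute)
# and FIG. 2C (attribute pair -> co-occurrence probability).
string_attrs = {"I": "subject", "want to see": "visual action", "Fanghua": "person name"}
attr_cooccur = {("subject", "visual action"): 0.2}

def attr_cooccur_prob(s1, s2):
    """Semantic attribute co-occurrence probability of two adjacent strings;
    pairs not found in the table are taken as 0, as in the example."""
    return attr_cooccur.get((string_attrs[s1], string_attrs[s2]), 0.0)

print(attr_cooccur_prob("I", "want to see"))        # 0.2
print(attr_cooccur_prob("want to see", "Fanghua"))  # 0.0
```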
Step 204, calculating the character string co-occurrence probability between the first character string and the second character string according to the semantic attribute co-occurrence probability between their semantic attributes, and determining the character string to be corrected among the plurality of character strings according to the calculated character string co-occurrence probabilities.
Optionally, the character string co-occurrence probability between two adjacent character strings is calculated according to the pre-stored semantic attributes of each character string and a preset character string co-occurrence probability formula. The preset formula is derived from the Bayes formula, shown as formula (1):

$P(w_i \mid w_{i-1}) = \dfrac{P(w_{i-1} w_i)}{P(w_{i-1})}$  (1)

where $P(w_i \mid w_{i-1})$ is the probability that the character string $w_i$ appears given that the character string $w_{i-1}$ appears (i.e., the probability that $w_{i-1}$ appears immediately before $w_i$ in the interactive text), $P(w_{i-1} w_i)$ is the probability that $w_{i-1}$ and $w_i$ co-occur (without regard to the order in which they appear), and $P(w_{i-1})$ is the probability that $w_{i-1}$ appears.
Expanding the co-occurrence probability over semantic attributes by the total probability formula gives formula (2):

$P(w_{i-1} w_i) = P(w_{i-1}) \sum_{j} \sum_{k} P(t_j \mid w_{i-1}) \, P(t_k \mid t_j) \, P(w_i \mid t_k)$  (2)

where $P(w_{i-1})$ is the probability that the character string $w_{i-1}$ appears, $P(t_j \mid w_{i-1})$ is the pre-defined probability that $w_{i-1}$ has the semantic attribute $t_j$, $P(t_k \mid t_j)$ is the semantic attribute co-occurrence probability between the semantic attributes $t_j$ and $t_k$, and $P(w_i \mid t_k)$ is the pre-defined probability that $w_i$ has the semantic attribute $t_k$.
Substituting formula (2) into formula (1), and estimating each probability by its count in the training corpus, yields the preset character string co-occurrence probability formula, shown as formula (3):

$P(w_i \mid w_{i-1}) = \sum_{j} \sum_{k} \dfrac{C(t_j w_{i-1})}{C(w_{i-1})} \cdot \dfrac{C(t_k t_j)}{C(t_j)} \cdot \dfrac{C(w_i t_k)}{C(t_k)}$  (3)

where $C(t_j w_{i-1})$ is the number of times the character string $w_{i-1}$ appears with the semantic attribute $t_j$, $C(w_{i-1})$ is the number of occurrences of $w_{i-1}$, $C(w_i t_k)$ is the number of times $w_i$ appears with the semantic attribute $t_k$, $C(t_k)$ is the number of occurrences of the semantic attribute $t_k$ in the training corpus, $C(t_k t_j)$ is the number of co-occurrences of the semantic attributes $t_j$ and $t_k$, and $C(t_j)$ is the number of occurrences of $t_j$ in the training corpus.
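A sketch of formula (3), assuming the count tables have already been collected from a training corpus; the names and the toy counts in the demonstration are illustrative assumptions.

```python
def string_cooccur_prob(w_prev, w, attrs_of, c_attr_string, c_string,
                        c_attr_pair, c_attr):
    """Class-based bigram estimate of P(w | w_prev) per formula (3): sum over
    attribute pairs (t_j, t_k) of
    C(t_j w_prev)/C(w_prev) * C(t_k t_j)/C(t_j) * C(w t_k)/C(t_k)."""
    total = 0.0
    for t_j in attrs_of[w_prev]:
        for t_k in attrs_of[w]:
            total += (c_attr_string[(t_j, w_prev)] / c_string[w_prev]
                      * c_attr_pair[(t_k, t_j)] / c_attr[t_j]
                      * c_attr_string[(t_k, w)] / c_attr[t_k])
    return total

# Toy counts (assumed): one semantic attribute per string for brevity.
attrs_of = {"want to see": ["visual action"], "Fanghua": ["movie name"]}
c_attr_string = {("visual action", "want to see"): 50, ("movie name", "Fanghua"): 10}
c_string = {"want to see": 50}
c_attr_pair = {("movie name", "visual action"): 40}
c_attr = {"visual action": 100, "movie name": 20}
print(string_cooccur_prob("want to see", "Fanghua", attrs_of,
                          c_attr_string, c_string, c_attr_pair, c_attr))
# 50/50 * 40/100 * 10/20 = 0.2
```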
Still referring to FIG. 2B, the correspondence between character strings and semantic attributes also records the probability that each character string belongs to each semantic attribute. For example, the semantic attributes of the character string "Three Lives Three Worlds, Ten Miles of Peach Blossoms" are "novel name", "movie name" and "TV drama name", and correspondingly the probability of each of these three attributes is 0.33.
It should be noted that, in this embodiment, the probabilities relating each character string to its semantic attributes may be pre-defined manually, or may be obtained through corpus training using maximum likelihood estimation on the basis of the pre-defined probabilities.
In one possible scenario, if three consecutive character strings to be corrected exist in an interactive text, the target character strings replacing the two outer ones, each adjacent to a non-to-be-corrected character string, are determined first based on the semantic attributes of those non-to-be-corrected character strings; the determined target character strings are then treated as non-to-be-corrected character strings, and the target character string replacing the middle one is determined.
For example, suppose the character strings "B", "C" and "D" in the interactive text "ABCDE" are to be corrected. The target character string "F" replacing "B" is determined based on the semantic attribute of the non-to-be-corrected character string "A" adjacent to "B", and the target character string "G" replacing "D" is determined based on the semantic attribute of the non-to-be-corrected character string "E" adjacent to "D". The determined "F" and "G" are then treated as non-to-be-corrected character strings, and the target character string "H" replacing "C", which is adjacent to both "F" and "G", is determined based on their semantic attributes.
It should be noted that, since step 201 is similar to step 101 in this embodiment, step 201 is not described in detail in this embodiment.
To sum up, the method provided by this embodiment calculates the character string co-occurrence probability between adjacent character strings based on the semantic attributes of the respective character strings in order to determine the character string to be corrected in the interactive text, and corrects that character string according to the semantic attributes of the adjacent non-to-be-corrected character strings. Because a newly added vocabulary item does not need to be fed into a model for training, and the error correction of the text containing it can be completed from its semantic attributes alone, the method solves the problem that a speech recognition system in the related art has difficulty accurately correcting character strings on which the text error correction model has not been trained, and achieves accurate error correction of the interactive text without depending on an error correction model.
Example 3
In one possible implementation, before the target character string is determined from the character strings corresponding to the first semantic attribute, those character strings are first screened by the length of the character string to be corrected, so as to reduce the number of character strings for which the processor must compute pronunciation audio similarity and thereby reduce its computational load. Referring to FIG. 3, a flowchart of a voice interaction text error correction method according to still another embodiment of the present application is shown. The method can comprise the following steps:
303, screening out, from the pre-stored character strings corresponding to the first semantic attribute with the highest semantic attribute co-occurrence probability, the character strings whose length differs from the length of the character string to be corrected by less than a preset length threshold.
Taking the edit distance as an example: since the edit distance between two audio strings is the minimum number of edit operations required to convert one into the other, a large length difference between two audio strings implies a large edit distance between the corresponding pronunciation audios. Therefore, to reduce the processor's computational load before the similarity between character strings is calculated, the character strings whose length differs from that of the character string to be corrected by at least the preset length threshold can be eliminated from the character strings corresponding to the pre-stored first semantic attribute.
It should be noted that the preset length threshold may be set manually or preset by the system, and may be 0, 1, 2, or the like; this embodiment does not limit how the preset length threshold is set or its specific value.
304, calculating, for each character string obtained by the screening, the edit distance between its corresponding pronunciation audio and the pronunciation audio corresponding to the character string to be corrected.
The edit distance between two audio strings is the minimum number of edit operations required to convert one audio string into the other, where the edit operations comprise audio replacement, audio insertion and audio deletion.
For example, suppose the character string to be corrected in the interactive text "I want to see Fanghua" is an exact homophone of "Fanghua" (pronunciation audio "fang hua"), and the character strings corresponding to the first semantic attribute "movie name" are "Fanghua", "Fanhua" and "Fenfa". The pronunciation audio of "Fanghua" is "fang hua", at edit distance 0 from that of the character string to be corrected; the pronunciation audio of "Fanhua" is "fan hua", at edit distance 1; and the pronunciation audio of "Fenfa" is "fen fa", at edit distance 4. Since the edit distance for "Fanghua" is the smallest, the character string to be corrected is replaced with "Fanghua", yielding the corrected interactive text "I want to see Fanghua".
305, determining the character string whose corresponding pronunciation audio has the smallest edit distance to the pronunciation audio corresponding to the character string to be corrected as the target character string with the highest pronunciation audio similarity, and replacing the character string to be corrected with the target character string.
Generally, the smaller the edit distance between two pronunciation audios, the higher their similarity, and the more similar the character strings corresponding to them; therefore the character string whose pronunciation audio is at the smallest edit distance from that of the character string to be corrected is determined as the target character string.
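A compact sketch of steps 303 to 305, reusing the `edit_distance` function from the earlier sketch; the candidate strings, pinyin table and length threshold are assumptions mirroring the example above.

```python
def select_target(to_correct, candidates, pinyin_of, char_len, len_threshold=2):
    """Steps 303-305 in miniature: keep candidates whose character length
    differs from that of the string to be corrected by less than the preset
    threshold, then return the candidate whose pronunciation audio is at the
    smallest edit distance (edit_distance as defined in the earlier sketch)."""
    screened = [c for c in candidates
                if abs(char_len[c] - char_len[to_correct]) < len_threshold]
    return min(screened,
               key=lambda c: edit_distance(pinyin_of[c], pinyin_of[to_correct]))

# Assumed data mirroring the example: the string to be corrected is an exact
# homophone of "Fanghua", and every string is two characters long.
pinyin_of = {"<misrecognized>": "fang hua", "Fanghua": "fang hua",
             "Fanhua": "fan hua", "Fenfa": "fen fa"}
char_len = {"<misrecognized>": 2, "Fanghua": 2, "Fanhua": 2, "Fenfa": 2}
print(select_target("<misrecognized>", ["Fanghua", "Fanhua", "Fenfa"],
                    pinyin_of, char_len))  # 'Fanghua' (edit distance 0)
```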
It should be noted that, since steps 301 to 302 in this embodiment are similar to steps 101 to 102, step 301 to step 302 are not described in detail in this embodiment.
To sum up, the method provided by this embodiment calculates the character string co-occurrence probability between adjacent character strings based on the semantic attributes of the respective character strings in order to determine the character string to be corrected in the interactive text, and corrects that character string according to the semantic attributes of the adjacent non-to-be-corrected character strings. Because a newly added vocabulary item does not need to be fed into a model for training, and the error correction of the text containing it can be completed from its semantic attributes alone, the method solves the problem that a speech recognition system in the related art has difficulty accurately correcting character strings on which the text error correction model has not been trained, and achieves accurate error correction of the interactive text without depending on an error correction model.
In this embodiment, before the target character string is determined from the character strings corresponding to the first semantic attribute, those character strings are first screened by the length of the character string to be corrected, which reduces the number of character strings for which the processor must compute pronunciation audio similarity and thereby reduces its computational load.
Example 4
In one possible implementation, in order to reduce the possibility of erroneous correction, after the character string to be corrected is determined from the interactive text, the semantic attribute comprehensive probability corresponding to the interactive text is calculated, and whether the interactive text actually needs to be corrected is decided according to how high that comprehensive probability is. Referring to FIG. 4, a flowchart of a voice interaction text error correction method according to another embodiment of the present application is shown. The method can comprise the following steps:
403, calculating the semantic attribute comprehensive probability corresponding to the interactive text according to the semantic attribute co-occurrence probabilities between the semantic attributes respectively corresponding to every pair of adjacent character strings in the interactive text.
Specifically, the semantic attribute comprehensive probability corresponding to the interactive text is calculated from those semantic attribute co-occurrence probabilities using a preset semantic attribute comprehensive probability formula.
The preset semantic attribute comprehensive probability formula is shown as formula (4):

$P(w_1, w_2, \ldots, w_m) = \prod_{i=2}^{m} \sum_{j} \sum_{k} P(t_j \mid w_{i-1}) \, P(t_k \mid t_j) \, P(w_i \mid t_k)$  (4)

where $P(w_1, w_2, \ldots, w_m)$ is the semantic attribute comprehensive probability of the interactive text, $P(t_j \mid w_{i-1})$ is the pre-defined probability that the character string $w_{i-1}$ has the semantic attribute $t_j$, $P(t_k \mid t_j)$ is the semantic attribute co-occurrence probability between the semantic attributes $t_j$ and $t_k$, and $P(w_i \mid t_k)$ is the pre-defined probability that $w_i$ has the semantic attribute $t_k$.
It should be noted that, when the character string co-occurrence probability between any two adjacent character strings in the interactive text is 0, the semantic attribute comprehensive probability obtained from formula (4) is 0, which would cause the processor to decide, in the subsequent step, that the interactive text needs correction merely because this probability is too low. To avoid this, optionally, when the semantic attribute co-occurrence probability between the semantic attribute of the first character string and that of the second character string is not stored in the correspondence of semantic attribute co-occurrence probabilities, it is determined to be a default value, where the default value is not zero.
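A sketch of formula (4) with the non-zero default applied, so that a single unseen attribute pair does not force the comprehensive probability to 0; the floor value and the threshold are illustrative assumptions.

```python
from math import prod

DEFAULT_PROB = 1e-4  # assumed non-zero default for attribute pairs absent from the table

def comprehensive_prob(strings, bigram_prob):
    """Formula (4): the product, over every pair of adjacent character strings,
    of the class-based bigram probability, floored at DEFAULT_PROB."""
    return prod(max(bigram_prob(strings[i - 1], strings[i]), DEFAULT_PROB)
                for i in range(1, len(strings)))

def needs_correction(strings, bigram_prob, third_threshold=0.4):
    """Trigger the correction step only when the comprehensive probability is low."""
    return comprehensive_prob(strings, bigram_prob) < third_threshold
```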
It should be noted that the third probability threshold may be set manually or preset by the system, and may be, for example, 0.3, 0.4, or 0.5; neither the setting manner nor the specific value of the third probability threshold is limited in this embodiment.
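As a concrete illustration of this gate, the following sketch computes formula (4) with a non-zero default for missing attribute pairs and compares the result against the third probability threshold. The lookup tables and all numeric values are invented for the example; only the shape of the computation follows the text.

```python
# Illustrative tables, invented for this example.
# P_ATTR_GIVEN_WORD[w][t] = P(t | w): probability that string w carries attribute t.
# P_WORD_GIVEN_ATTR[t][w] = P(w | t): probability of string w under attribute t.
# P_ATTR_COOCCUR[(t1, t2)] = P(t2 | t1): semantic attribute co-occurrence probability.
P_ATTR_GIVEN_WORD = {"play": {"verb_media": 0.9}, "song": {"media_object": 0.8}}
P_WORD_GIVEN_ATTR = {"media_object": {"song": 0.05}, "verb_media": {"play": 0.04}}
P_ATTR_COOCCUR = {("verb_media", "media_object"): 0.7}
DEFAULT_COOCCUR = 1e-4             # non-zero default for unstored attribute pairs
THIRD_PROBABILITY_THRESHOLD = 0.4  # e.g. 0.3, 0.4, or 0.5, as the text notes

def comprehensive_probability(strings):
    """Formula (4): product over adjacent pairs of
    sum_j sum_k P(t_j | w_{i-1}) * P(t_k | t_j) * P(w_i | t_k)."""
    prob = 1.0
    for prev, curr in zip(strings, strings[1:]):
        pair = 0.0
        for tj, p_tj in P_ATTR_GIVEN_WORD.get(prev, {}).items():
            for tk, words in P_WORD_GIVEN_ATTR.items():
                cooccur = P_ATTR_COOCCUR.get((tj, tk), DEFAULT_COOCCUR)
                pair += p_tj * cooccur * words.get(curr, 0.0)
        prob *= pair
    return prob

def needs_error_correction(strings):
    # Error correction proceeds only when the comprehensive probability is low.
    return comprehensive_probability(strings) < THIRD_PROBABILITY_THRESHOLD
```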
It should be noted that steps 401 to 402 in this embodiment are similar to steps 101 to 102, so their description is not repeated here.
To sum up, in the method provided by this embodiment of the present application, the character string co-occurrence probability between adjacent character strings is calculated based on the semantic attributes of the respective character strings, so as to determine the character string to be corrected in the interactive text, and the character string to be corrected is then corrected according to the semantic attributes of the adjacent character strings that are not themselves to be corrected. Because a newly added vocabulary item does not need to be fed into a model for training, and error correction of the text containing it can be completed from its semantic attributes alone, the method solves the problem in the related art that a speech recognition system has difficulty accurately correcting character strings on which the text error correction model has not been trained, and achieves accurate error correction of the interactive text without depending on an error correction model.
In this embodiment, in order to reduce the possibility of mistaken error correction, after the character string to be corrected is determined from the interactive text, the semantic attribute comprehensive probability corresponding to the interactive text is calculated, and whether the interactive text actually needs error correction is determined by how high that comprehensive probability is.
The following are apparatus embodiments of the present application. For details not described in the apparatus embodiments, reference may be made to the corresponding method embodiments above.
Fig. 5 is a block diagram illustrating the structure of a voice interaction text error correction apparatus provided in an embodiment of the present application. The apparatus comprises: an acquisition module 501, a determining module 502, and a replacing module 503.
The acquiring module 501 is configured to acquire voice data to be recognized, perform voice recognition, and obtain an interactive text corresponding to the voice data;
a determining module 502, configured to perform word segmentation on an interactive text to obtain a plurality of character strings, calculate a character string co-occurrence probability between two adjacent character strings according to a pre-stored semantic attribute of each character string, and determine a character string to be corrected in the plurality of character strings according to the calculated character string co-occurrence probability;
the replacing module 503 is configured to determine a semantic attribute co-occurrence probability of a non-to-be-corrected character string based on semantic attributes of non-to-be-corrected character strings adjacent to the to-be-corrected character string, select, in each character string corresponding to a first semantic attribute with the highest semantic attribute co-occurrence probability, a target character string with the highest pronunciation audio similarity between the pronunciation audio and the pronunciation audio corresponding to the to-be-corrected character string, and replace the to-be-corrected character string with the target character string.
In one possible implementation, the determining module 502 includes:
the first determining unit is used for determining the semantic attribute of the adjacent first character string and the semantic attribute of the second character string according to the pre-stored semantic attribute of each character string;
the second determining unit is used for determining the semantic attribute co-occurrence probability between the semantic attribute of the first character string and the semantic attribute of the second character string in the pre-stored corresponding relation of the semantic attribute co-occurrence probability between the semantic attributes;
and the first calculation unit is used for calculating the character string co-occurrence probability between the first character string and the second character string according to the semantic attribute co-occurrence probability between the semantic attribute of the first character string and the semantic attribute of the second character string.
In a possible implementation manner, the determining module 502 further includes:
and the third determining unit is configured to determine, among the plurality of character strings, a third character string whose character string co-occurrence probabilities with all adjacent character strings are below a preset first probability threshold as the character string to be corrected.
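A sketch of that rule follows, assuming, as one plausible reading of the calculation units above, that the string co-occurrence probability weights the attribute co-occurrence probability by each string's attribute probabilities; the tables, values, and threshold are illustrative, not taken from the patent.

```python
# Illustrative tables; all values are invented for the example.
ATTRS = {                       # P(t | w) per string
    "play":  {"verb_media": 0.9},
    "song":  {"media_object": 0.8},
    "peach": {"fruit": 0.8},
}
ATTR_COOCCUR = {                # P(t2 | t1) between semantic attributes
    ("verb_media", "media_object"): 0.7,
    ("verb_media", "fruit"): 0.001,
}
FIRST_PROBABILITY_THRESHOLD = 0.01  # preset first probability threshold (illustrative)

def string_cooccurrence(w1, w2):
    """String co-occurrence probability of two adjacent strings, read here as the
    attribute co-occurrence probability weighted by each string's attribute weights."""
    return sum(
        p1 * ATTR_COOCCUR.get((t1, t2), 0.0) * p2
        for t1, p1 in ATTRS.get(w1, {}).items()
        for t2, p2 in ATTRS.get(w2, {}).items()
    )

def strings_to_correct(strings):
    """Third determining unit: flag every string whose co-occurrence probability
    with each adjacent string falls below the first probability threshold."""
    flagged = []
    for i, w in enumerate(strings):
        probs = []
        if i > 0:
            probs.append(string_cooccurrence(strings[i - 1], w))
        if i + 1 < len(strings):
            probs.append(string_cooccurrence(w, strings[i + 1]))
        if probs and all(p < FIRST_PROBABILITY_THRESHOLD for p in probs):
            flagged.append(w)
    return flagged

# strings_to_correct(["play", "song"])  -> []                  (0.9 * 0.7 * 0.8 is about 0.50)
# strings_to_correct(["play", "peach"]) -> ["play", "peach"]   (0.9 * 0.001 * 0.8 is about 0.0007)
```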
In a possible implementation manner, the replacing module 503 further includes:
the screening unit is used for screening out character strings of which the difference value between the length of the character string and the length of the character string corresponding to the character string to be corrected is smaller than a preset length threshold in each character string corresponding to a first semantic attribute with the highest pre-stored semantic attribute co-occurrence probability;
the second calculation unit is used for respectively calculating the editing distance between the pronunciation audio corresponding to each character string and the pronunciation audio corresponding to the character string to be corrected in the character string obtained by screening;
and the fourth determining unit is used for determining the character string with the minimum editing distance between the corresponding pronunciation audio and the pronunciation audio corresponding to the character string to be corrected as the target character string with the highest pronunciation audio similarity corresponding to the character string to be corrected.
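The three units together amount to a length screen followed by a minimum-edit-distance ranking. The sketch below shows this flow; the patent compares pronunciation audio, so using a pinyin-style string as the stand-in transcription, the `pronounce` helper itself, and the length threshold value are all our assumptions.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance (insertions, deletions, substitutions) computed by
    dynamic programming over a single row."""
    row = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        prev, row[0] = row[0], i
        for j, cb in enumerate(b, 1):
            prev, row[j] = row[j], min(row[j] + 1,         # delete ca
                                       row[j - 1] + 1,     # insert cb
                                       prev + (ca != cb))  # substitute ca -> cb
    return row[-1]

LENGTH_THRESHOLD = 2  # preset length threshold (illustrative value)

def select_target(candidates, to_correct, pronounce):
    """Screening unit + second calculation unit + fourth determining unit:
    keep candidates whose length is close to the string to be corrected, then
    return the one whose pronunciation has the smallest edit distance to it.
    `pronounce(w)` is an assumed helper returning a phonetic string (e.g. pinyin)."""
    screened = [c for c in candidates
                if abs(len(c) - len(to_correct)) < LENGTH_THRESHOLD]
    return min(screened,
               key=lambda c: edit_distance(pronounce(c), pronounce(to_correct)),
               default=None)
```

For instance, with `pronounce=str` (treating the spelling itself as the transcription), `select_target(["song", "sing"], "son", str)` returns "song", whose transcription is one edit away from "son".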
In one possible implementation, the apparatus further includes:
the third calculation unit is used for calculating the semantic attribute comprehensive probability corresponding to the interactive text according to the semantic attribute co-occurrence probability between the semantic attributes respectively corresponding to all the two adjacent character strings in the interactive text after the character string to be corrected is determined in at least one character string;
and the fifth determining unit is used for executing the step of determining the first semantic attribute with the highest probability of co-occurrence of the semantic attributes of the character strings to be corrected based on the semantic attributes of the character strings to be corrected, which are adjacent to the character strings to be corrected, if the comprehensive probability of the semantic attributes corresponding to the interactive text is lower than a preset third probability threshold.
To sum up, in the apparatus provided by this embodiment of the present application, the character string co-occurrence probability between adjacent character strings is calculated based on the semantic attributes of the respective character strings, so as to determine the character string to be corrected in the interactive text, and the character string to be corrected is then corrected according to the semantic attributes of the adjacent character strings that are not themselves to be corrected. Because a newly added vocabulary item does not need to be fed into a model for training, and error correction of the text containing it can be completed from its semantic attributes alone, the apparatus solves the problem in the related art that a speech recognition system has difficulty accurately correcting character strings on which the text error correction model has not been trained, and achieves accurate error correction of the interactive text without depending on an error correction model.
In this embodiment, before the target character string is determined from the character strings corresponding to the first semantic attribute, those character strings are first screened according to the length of the character string to be corrected. This reduces the number of character strings for which the processor must calculate pronunciation audio similarity and thus reduces the computational load on the processor.
In this embodiment, in order to reduce the possibility of mistaken error correction, after the character string to be corrected is determined from the interactive text, the semantic attribute comprehensive probability corresponding to the interactive text is calculated, and whether the interactive text actually needs error correction is determined by how high that comprehensive probability is.
It should be noted that the voice interaction text error correction apparatus provided in the above embodiment is illustrated only by the division into the functional modules described above; in practical applications, these functions may be allocated to different functional modules as needed, that is, the internal structure of the terminal may be divided into different functional modules to complete all or part of the functions described above. In addition, the voice interaction text error correction apparatus provided in the above embodiment belongs to the same concept as the voice interaction text error correction method embodiments; its specific implementation process is described in detail in the method embodiments and is not repeated here.
An exemplary embodiment of the present application provides a terminal that can implement the voice interaction text error correction method provided by the present application. The terminal includes a memory, a processor, and a computer program stored on the memory and executable on the processor, and the processor implements the following steps when executing the computer program:
acquiring voice data to be recognized, and performing voice recognition to obtain an interactive text corresponding to the voice data;
segmenting the interactive text to obtain a plurality of character strings, calculating the character string co-occurrence probability between two adjacent character strings according to the pre-stored semantic attributes of each character string, and determining the character string to be corrected in the plurality of character strings according to the calculated character string co-occurrence probability;
determining the semantic attribute co-occurrence probability of the non-to-be-corrected character strings based on the semantic attributes of the non-to-be-corrected character strings adjacent to the character strings to be corrected, selecting a target character string with the highest pronunciation audio similarity to the pronunciation audio corresponding to the character string to be corrected in each character string corresponding to the first semantic attribute with the highest semantic attribute co-occurrence probability, and replacing the character string to be corrected with the target character string.
Fig. 6 shows a block diagram of a terminal 600 according to an exemplary embodiment of the present application. The terminal 600 may be referred to by other names such as user equipment, portable terminal, laptop terminal, desktop terminal, and the like.
In general, the terminal 600 includes: a processor 601 and a memory 602.
The processor 601 may include one or more processing cores, such as a 4-core processor, an 8-core processor, and so on. The processor 601 may be implemented in at least one hardware form of a DSP (Digital Signal Processor), an FPGA (Field-Programmable Gate Array), and a PLA (Programmable Logic Array). The processor 601 may also include a main processor and a coprocessor, where the main processor is a processor for processing data in an awake state, also called a Central Processing Unit (CPU), and the coprocessor is a low-power processor for processing data in a standby state. In some embodiments, the processor 601 may be integrated with a GPU (Graphics Processing Unit), which is responsible for rendering and drawing the content to be displayed on the display screen.
The memory 602 may include one or more computer-readable storage media, which may be non-transitory. The memory 602 may also include high-speed random access memory as well as non-volatile memory, such as one or more magnetic disk storage devices or flash memory storage devices. In some embodiments, a non-transitory computer-readable storage medium in the memory 602 is used to store at least one instruction for execution by the processor 601 to implement the voice interaction text error correction method provided by the method embodiments of the present application.
In some embodiments, the terminal 600 may further optionally include: a peripheral interface 603 and at least one peripheral. The processor 601, memory 602, and peripheral interface 603 may be connected by buses or signal lines. Various peripheral devices may be connected to the peripheral interface 603 via a bus, signal line, or circuit board. Specifically, the peripheral device includes: at least one of a radio frequency circuit 604, a touch screen display 605, a camera 606, an audio circuit 607, a positioning component 608, and a power supply 609.
The peripheral interface 603 may be used to connect at least one peripheral related to I/O (Input/Output) to the processor 601 and the memory 602. In some embodiments, the processor 601, memory 602, and peripheral interface 603 are integrated on the same chip or circuit board; in some other embodiments, any one or two of the processor 601, the memory 602, and the peripheral interface 603 may be implemented on a separate chip or circuit board, which is not limited in this embodiment.
The Radio Frequency circuit 604 is used for receiving and transmitting RF (Radio Frequency) signals, also called electromagnetic signals. The radio frequency circuitry 604 communicates with communication networks and other communication devices via electromagnetic signals. The rf circuit 604 converts an electrical signal into an electromagnetic signal to transmit, or converts a received electromagnetic signal into an electrical signal. Optionally, the radio frequency circuit 604 comprises: an antenna system, an RF transceiver, one or more amplifiers, a tuner, an oscillator, a digital signal processor, a codec chipset, a subscriber identity module card, and so forth. The radio frequency circuitry 604 may communicate with other terminals via at least one wireless communication protocol. The wireless communication protocols include, but are not limited to: the world wide web, metropolitan area networks, intranets, generations of mobile communication networks (2G, 3G, 4G, and 5G), Wireless local area networks, and/or WiFi (Wireless Fidelity) networks. In some embodiments, the rf circuit 604 may further include NFC (Near Field Communication) related circuits, which are not limited in this application.
The display 605 is used to display a UI (User Interface). The UI may include graphics, text, icons, video, and any combination thereof. When the display screen 605 is a touch display screen, the display screen 605 also has the ability to capture touch signals on or over its surface. The touch signal may be input to the processor 601 as a control signal for processing. At this point, the display 605 may also be used to provide virtual buttons and/or a virtual keyboard, also referred to as soft buttons and/or a soft keyboard. In some embodiments, there may be one display 605, providing the front panel of the terminal 600; in other embodiments, there may be at least two displays 605, respectively disposed on different surfaces of the terminal 600 or in a folded design; in still other embodiments, the display 605 may be a flexible display disposed on a curved surface or a folded surface of the terminal 600. The display 605 may even be arranged in a non-rectangular, irregular pattern, that is, a shaped screen. The display 605 may be made of LCD (Liquid Crystal Display), OLED (Organic Light-Emitting Diode), and the like.
The camera assembly 606 is used to capture images or video. Optionally, camera assembly 606 includes a front camera and a rear camera. Generally, a front camera is disposed at a front panel of the terminal, and a rear camera is disposed at a rear surface of the terminal.
The power supply 609 is used to supply power to the various components in the terminal 600. The power supply 609 may be alternating current, direct current, a disposable battery, or a rechargeable battery. When the power supply 609 includes a rechargeable battery, the rechargeable battery may be a wired rechargeable battery or a wireless rechargeable battery. A wired rechargeable battery is a battery charged through a wired line, and a wireless rechargeable battery is a battery charged through a wireless coil. The rechargeable battery may also be used to support fast charging technology.
Those skilled in the art will appreciate that the configuration shown in fig. 6 is not intended to be limiting of terminal 600 and may include more or fewer components than those shown, or some components may be combined, or a different arrangement of components may be used.
The above-mentioned serial numbers of the embodiments of the present application are merely for description and do not represent the merits of the embodiments.
It will be understood by those skilled in the art that all or part of the steps for implementing the above embodiments may be implemented by hardware, or may be implemented by a program instructing relevant hardware, where the program may be stored in a computer-readable storage medium, and the above-mentioned storage medium may be a read-only memory, a magnetic disk or an optical disk, etc.
The above description is only exemplary of the present application and should not be taken as limiting the present application, as any modification, equivalent replacement, or improvement made within the spirit and principle of the present application should be included in the protection scope of the present application.
Claims (8)
1. A method for text correction for voice interaction, the method comprising:
acquiring voice data to be recognized, and performing voice recognition to obtain an interactive text corresponding to the voice data;
segmenting the interactive text to obtain a plurality of character strings, determining semantic attributes of adjacent first character strings and semantic attributes of adjacent second character strings according to the pre-stored semantic attributes of the character strings, determining the semantic attribute co-occurrence probability between the semantic attributes of the first character strings and the semantic attributes of the second character strings in the corresponding relation of the semantic attribute co-occurrence probability between the pre-stored semantic attributes, calculating the character string co-occurrence probability between the first character strings and the second character strings according to the semantic attribute co-occurrence probability between the semantic attributes of the first character strings and the semantic attributes of the second character strings, and determining the character strings to be corrected in the character strings according to the calculated character string co-occurrence probability;
determining the semantic attribute co-occurrence probability of the non-to-be-corrected character strings based on the semantic attributes of the non-to-be-corrected character strings adjacent to the to-be-corrected character strings, selecting a target character string with the highest pronunciation audio similarity to the to-be-corrected character string in each character string corresponding to the first semantic attribute with the highest semantic attribute co-occurrence probability, and replacing the to-be-corrected character string with the target character string.
2. The method according to claim 1, wherein the determining a character string to be corrected among the plurality of character strings comprises:
and determining third character strings of which the character string co-occurrence probabilities with adjacent character strings are lower than a preset first probability threshold value as character strings to be corrected in the character strings.
3. The method according to claim 1, wherein the selecting, from the character strings corresponding to the first semantic attribute with the highest probability of semantic attribute co-occurrence, a target character string with a pronunciation audio having the highest similarity to the pronunciation audio corresponding to the character string to be modified, comprises:
screening out character strings of which the difference value between the length of the character string and the length of the character string corresponding to the character string to be corrected is smaller than a preset length threshold value in each character string corresponding to a first semantic attribute with the highest co-occurrence probability of the pre-stored semantic attributes;
respectively calculating the editing distance between the pronunciation audio corresponding to each character string and the pronunciation audio corresponding to the character string to be corrected in the character string obtained by screening;
and determining the character string with the minimum editing distance between the corresponding pronunciation audio and the pronunciation audio corresponding to the character string to be corrected as the target character string with the highest pronunciation audio similarity corresponding to the character string to be corrected.
4. The method according to any one of claims 1 to 3, wherein after determining a character string to be corrected among the plurality of character strings, the method further comprises:
calculating the semantic attribute comprehensive probability corresponding to the interactive text according to the semantic attribute co-occurrence probability between the semantic attributes respectively corresponding to all the two adjacent character strings in the interactive text;
and if the comprehensive probability of the semantic attributes corresponding to the interactive text is lower than a preset third probability threshold, executing the step of determining a first semantic attribute with the highest probability of co-occurrence with the semantic attributes of the non-to-be-corrected character strings based on the semantic attributes of the non-to-be-corrected character strings adjacent to the to-be-corrected character strings.
5. An apparatus for text correction for voice interaction, the apparatus comprising:
the acquisition module is used for acquiring voice data to be recognized and performing voice recognition to obtain an interactive text corresponding to the voice data;
the determining module is used for segmenting the interactive text to obtain a plurality of character strings; the determining module comprises a first determining unit, a second determining unit and a third determining unit, wherein the first determining unit is used for determining the semantic attribute of the adjacent first character string and the semantic attribute of the adjacent second character string according to the pre-stored semantic attribute of each character string; a second determining unit, configured to determine, in a correspondence relationship between pre-stored semantic attribute co-occurrence probabilities among the semantic attributes, a semantic attribute co-occurrence probability between the semantic attribute of the first character string and the semantic attribute of the second character string; a first calculation unit configured to calculate a string co-occurrence probability between the first string and the second string according to a semantic attribute co-occurrence probability between semantic attributes of the first string and semantic attributes of the second string; the determining module is further configured to determine a character string to be corrected in the plurality of character strings according to the calculated character string co-occurrence probability;
and the replacing module is used for determining the semantic attribute co-occurrence probability of the non-to-be-corrected character string based on the semantic attributes of the non-to-be-corrected character string adjacent to the to-be-corrected character string, selecting a target character string with the pronunciation audio frequency having the highest similarity with the pronunciation audio frequency corresponding to the to-be-corrected character string in each character string corresponding to the first semantic attribute with the highest semantic attribute co-occurrence probability, and replacing the to-be-corrected character string with the target character string.
6. The apparatus of claim 5, wherein the determining module further comprises:
and the third determining unit is used for determining third character strings of which the character string co-occurrence probabilities with adjacent character strings are all lower than a preset first probability threshold value as character strings to be corrected in the plurality of character strings.
7. A terminal, characterized in that the terminal comprises a processor and a memory, in which at least one instruction, at least one program, a set of codes, or a set of instructions is stored, which is loaded and executed by the processor to implement the method of text correction for voice interaction according to any of claims 1-4.
8. A computer readable storage medium having stored therein at least one instruction, at least one program, a set of codes, or a set of instructions, which is loaded and executed by a processor to implement the method of text correction for speech interaction according to any one of claims 1 to 4.