CN108595431B - Voice interaction text error correction method, device, terminal and storage medium
- Publication number: CN108595431B
- Application number: CN201810399789.0A
- Authority: CN (China)
- Legal status: Active
Classifications
- G06F40/30: Handling natural language data; semantic analysis
- G06F40/289: Handling natural language data; phrasal analysis, e.g. finite state techniques or chunking (recognition of textual entities)
- G10L15/26: Speech recognition; speech to text systems
Abstract
The application discloses a voice interaction text error correction method, apparatus, terminal and storage medium, belonging to the field of speech recognition. The method comprises: calculating the character string co-occurrence probability between every two adjacent character strings in the interactive text obtained by speech recognition, according to pre-stored semantic attributes of each character string, and determining the character string to be corrected among the plurality of character strings according to the calculated probabilities; determining, based on the semantic attributes of the non-to-be-corrected character strings adjacent to the character string to be corrected, the first semantic attribute with the highest semantic attribute co-occurrence probability, and selecting, among the character strings corresponding to the first semantic attribute, the target character string whose pronunciation audio has the highest similarity to the pronunciation audio corresponding to the character string to be corrected. By calculating the co-occurrence probability between character strings indirectly through their semantic attributes, the method can correct character strings on which a text error correction model has not been trained, improving the error correction efficiency of interactive text.
Description
Technical Field
The present application relates to the field of speech recognition, and in particular to a voice interaction text error correction method, apparatus, terminal and storage medium.
Background
With the development of speech recognition technology, its applications have broadened steadily, and more and more users rely on functions such as voice search and voice control.
Under the influence of various external environmental factors, a speech recognition system may misrecognize some character strings during recognition. In the prior art, the speech recognition system attempts to correct the recognized erroneous text according to a pre-trained text error correction model; if correction succeeds, the original text is replaced or the user is prompted to correct it. For example, after acquiring voice data for "I want to see Fanghua" (Fanghua being a movie title) input by a user, the speech recognition system may generate an interactive text in which "Fanghua" is misrecognized as a near-homophone; it then segments the text, corrects the resulting character strings "I", "want to see" and the misrecognized string according to the text error correction model, and finally replaces the erroneous text with the successfully corrected "I want to see Fanghua".
The construction of a text error correction model in the prior art is usually based on the character strings themselves, so the error correction process cannot be completed for new words and rare words that the model does not cover. With the proliferation of internet vocabulary and newly coined words, and especially in the application scenario of television voice assistants, character strings such as film and television titles and music titles in the entertainment field emerge endlessly, and a text error correction model is poorly suited to such scenarios.
Disclosure of Invention
In order to solve the problem that a speech recognition system in the related art cannot correct character strings on which a text error correction model has not been trained, the embodiments of the present application provide a voice interaction text error correction method, apparatus, terminal and storage medium. The technical solution is as follows:
in a first aspect, a method for correcting text errors in voice interaction is provided, where the method includes:
acquiring voice data to be recognized, and performing voice recognition to obtain an interactive text corresponding to the voice data;
segmenting the interactive text to obtain a plurality of character strings, calculating the character string co-occurrence probability between every two adjacent character strings according to pre-stored semantic attributes of each character string, and determining the character string to be corrected among the plurality of character strings according to the calculated character string co-occurrence probabilities;
determining, based on the semantic attributes of the non-to-be-corrected character strings adjacent to the character string to be corrected, the first semantic attribute with the highest semantic attribute co-occurrence probability with those semantic attributes; selecting, among the character strings corresponding to the first semantic attribute, the target character string whose pronunciation audio has the highest similarity to the pronunciation audio corresponding to the character string to be corrected; and replacing the character string to be corrected with the target character string.
Optionally, the calculating, according to the pre-stored semantic attributes of the character strings, a character string co-occurrence probability between two adjacent character strings includes:
determining the semantic attributes of the adjacent first character string and the second character string according to the pre-stored semantic attributes of each character string;
determining the semantic attribute co-occurrence probability between the semantic attribute of the first character string and the semantic attribute of the second character string in the pre-stored corresponding relation of the semantic attribute co-occurrence probability between the semantic attributes;
and calculating the character string co-occurrence probability between the first character string and the second character string according to the semantic attribute co-occurrence probability between the semantic attribute of the first character string and the semantic attribute of the second character string.
Optionally, the determining the character string to be corrected among the plurality of character strings includes:
determining, among the plurality of character strings, a third character string whose character string co-occurrence probabilities with its adjacent character strings are all lower than a preset first probability threshold as the character string to be corrected.
Optionally, the selecting, from the character strings corresponding to the first semantic attribute with the highest semantic attribute co-occurrence probability, a target character string with a pronunciation audio having the highest similarity to the pronunciation audio corresponding to the character string to be corrected, includes:
screening out character strings of which the difference value between the length of the character string and the length of the character string corresponding to the character string to be corrected is smaller than a preset length threshold value in each character string corresponding to a first semantic attribute with the highest co-occurrence probability of the pre-stored semantic attributes;
respectively calculating the editing distance between the pronunciation audio corresponding to each character string and the pronunciation audio corresponding to the character string to be corrected in the character string obtained by screening;
and determining the character string with the minimum editing distance between the corresponding pronunciation audio and the pronunciation audio corresponding to the character string to be corrected as the target character string with the highest pronunciation audio similarity corresponding to the character string to be corrected.
Optionally, after determining the character string to be corrected among the plurality of character strings, the method further includes:
calculating the semantic attribute comprehensive probability corresponding to the interactive text according to the semantic attribute co-occurrence probability between the semantic attributes respectively corresponding to all the two adjacent character strings in the interactive text;
and if the comprehensive probability of the semantic attributes corresponding to the interactive text is lower than a preset third probability threshold, executing the step of determining a first semantic attribute with the highest probability of co-occurrence with the semantic attributes of the non-to-be-corrected character strings based on the semantic attributes of the non-to-be-corrected character strings adjacent to the to-be-corrected character strings.
In a second aspect, there is provided a voice interaction text correction apparatus, the apparatus comprising:
the acquisition module is used for acquiring voice data to be recognized and performing voice recognition to obtain an interactive text corresponding to the voice data;
the determining module is used for segmenting the interactive text to obtain a plurality of character strings, calculating the character string co-occurrence probability between two adjacent character strings according to the pre-stored semantic attributes of the character strings, and determining the character string to be corrected in the character strings according to the calculated character string co-occurrence probability;
and the replacing module is used for determining, based on the semantic attributes of the non-to-be-corrected character strings adjacent to the character string to be corrected, the first semantic attribute with the highest semantic attribute co-occurrence probability with those attributes, selecting, among the character strings corresponding to the first semantic attribute, the target character string whose pronunciation audio has the highest similarity to the pronunciation audio corresponding to the character string to be corrected, and replacing the character string to be corrected with the target character string.
Optionally, the determining module includes:
the first determining unit is used for determining the semantic attribute of the adjacent first character string and the semantic attribute of the second character string according to the pre-stored semantic attribute of each character string;
a second determining unit, configured to determine, in a correspondence relationship between pre-stored semantic attribute co-occurrence probabilities among the semantic attributes, a semantic attribute co-occurrence probability between the semantic attribute of the first character string and the semantic attribute of the second character string;
a first calculating unit, configured to calculate a string co-occurrence probability between the first string and the second string according to a semantic attribute co-occurrence probability between semantic attributes of the first string and semantic attributes of the second string.
Optionally, the determining module further includes:
and the third determining unit is used for determining third character strings of which the character string co-occurrence probabilities with adjacent character strings are all lower than a preset first probability threshold value as character strings to be corrected in the plurality of character strings.
Optionally, the replacing module includes:
the screening unit is used for screening out character strings of which the difference value between the length of the character string and the length of the character string corresponding to the character string to be corrected is smaller than a preset length threshold in each character string corresponding to a first semantic attribute with the highest pre-stored semantic attribute co-occurrence probability;
the second calculation unit is used for respectively calculating the editing distance between the pronunciation audio corresponding to each character string and the pronunciation audio corresponding to the character string to be corrected in the character string obtained by screening;
and the fourth determining unit is used for determining the character string with the minimum editing distance between the corresponding pronunciation audio and the pronunciation audio corresponding to the character string to be corrected as the target character string with the highest pronunciation audio similarity corresponding to the character string to be corrected.
Optionally, the apparatus further includes:
the third calculating unit is used for calculating the semantic attribute comprehensive probability corresponding to the interactive text according to the semantic attribute co-occurrence probability between the semantic attributes respectively corresponding to all the two adjacent character strings in the interactive text;
and a fifth determining unit, configured to execute the step of determining, based on the semantic attribute of the non-to-be-corrected character string adjacent to the character string to be corrected, the first semantic attribute with the highest probability of co-occurrence with the semantic attribute of the non-to-be-corrected character string, if the comprehensive probability of the semantic attribute corresponding to the interactive text is lower than a preset third probability threshold.
In a third aspect, a terminal is provided, which includes a processor and a memory, where at least one instruction, at least one program, a set of codes, or a set of instructions is stored in the memory, and the at least one instruction, the at least one program, the set of codes, or the set of instructions is loaded and executed by the processor to implement the method for text error correction for voice interaction according to the first aspect.
In a fourth aspect, a computer-readable storage medium is provided, in which at least one instruction, at least one program, a set of codes, or a set of instructions is stored, the at least one instruction, the at least one program, the set of codes, or the set of instructions being loaded and executed by a processor to implement the voice interaction text error correction method according to the first aspect.
The technical scheme provided by the embodiment of the application has the following beneficial effects:
the method provided by the embodiments of the application calculates the character string co-occurrence probability between adjacent character strings based on the semantic attributes of the respective character strings in order to determine the character string to be corrected in the interactive text, and corrects that character string according to the semantic attributes of the adjacent non-to-be-corrected character strings. Because a newly added vocabulary item does not need to be fed into a model for training, and the error correction of the text containing it can be completed from its semantic attributes alone, this solves the problem that a speech recognition system in the related art has difficulty accurately correcting character strings on which the text error correction model has not been trained.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present application, the drawings needed in the description of the embodiments are briefly introduced below. The drawings described below are only some embodiments of the present application; those skilled in the art can obtain other drawings based on them without creative effort.
FIG. 1 is a flowchart illustrating a method for correcting text errors in a voice interaction according to an embodiment of the present application;
FIG. 2A is a flowchart illustrating a method for correcting text errors in a voice interaction according to another embodiment of the present application;
FIG. 2B is a table illustrating correspondence between strings and semantic attributes provided by one embodiment of the present application;
FIG. 2C is a table illustrating the correspondence of semantic attribute co-occurrence probabilities between semantic attributes provided by one embodiment of the present application;
FIG. 3 is a flowchart illustrating a method for correcting text errors in a voice interaction according to still another embodiment of the present application;
FIG. 4 is a flowchart illustrating a method for correcting text errors in a voice interaction according to another embodiment of the present application;
FIG. 5 is a block diagram illustrating the structure of a text error correction apparatus for voice interaction provided in an embodiment of the present application;
FIG. 6 is a block diagram of a terminal 600 according to an exemplary embodiment of the present application.
Detailed Description
To make the objects, technical solutions and advantages of the present application more clear, embodiments of the present application will be described in further detail below with reference to the accompanying drawings.
The embodiments of the application provide a voice interaction text error correction method. Because a newly added vocabulary item does not need to be fed into a model for training, and the error correction of the text containing it can be completed from its semantic attributes alone, the method solves the problem that a speech recognition system in the related art has difficulty accurately correcting character strings on which a text error correction model has not been trained. The embodiments of the present application are described in further detail below.
The terms and specific explanations related to the embodiments of the present application are as follows:
semantic attribute co-occurrence probability: probability that semantic attributes respectively corresponding to two character strings adjacent to each other appear at the same time.
Probability of co-occurrence of character strings: the probability that the two character strings are adjacent to each other and appear simultaneously is calculated by the semantic attributes respectively corresponding to the two character strings.
Semantic attribute comprehensive probability: the comprehensive probability of the co-occurrence probability of the semantic attributes between the character strings included in the interactive text is used for representing the probability that the interactive text is a text with correct semantics, and is obtained by calculating the co-occurrence probability of the semantic attributes between the semantic attributes respectively corresponding to all the character strings which are adjacent in pairs in the interactive text.
Example 1
Referring to fig. 1, a flowchart of a method for correcting text errors in voice interaction according to an embodiment of the present application is shown. The voice interaction text error correction method can comprise the following steps:
101, acquiring voice data to be recognized, and performing voice recognition to obtain an interactive text corresponding to the voice data.
Optionally, a large amount of voice data and the voice texts corresponding to that data are used to train an acoustic model (such as a GMM-HMM model, a DNN-HMM model, or an RNN+CTC model); once the acoustic model is trained, the voice data to be recognized is acquired and the trained acoustic model is used to perform voice recognition on it, obtaining the corresponding interactive text.
Optionally, the execution subject performing the voice recognition in this embodiment may be a terminal or a server. When the execution subject is a terminal, the terminal collects the user's voice data through a microphone and performs voice recognition on the collected voice data; when the execution subject is a server, the server receives the voice data sent by a terminal and performs voice recognition on the received voice data.
102, segmenting the interactive text to obtain a plurality of character strings, calculating the character string co-occurrence probability between two adjacent character strings according to the pre-stored semantic attributes of each character string, and determining the character string to be corrected in the plurality of character strings according to the calculated character string co-occurrence probability.
Optionally, calculating the character string co-occurrence probability between two adjacent character strings according to the pre-stored semantic attributes of each character string may be implemented as: determining the semantic attributes of an adjacent first character string and second character string according to the pre-stored semantic attributes of each character string; determining, in the pre-stored correspondence of semantic attribute co-occurrence probabilities between semantic attributes, the semantic attribute co-occurrence probability between the semantic attribute of the first character string and the semantic attribute of the second character string; and calculating the character string co-occurrence probability between the first character string and the second character string according to that semantic attribute co-occurrence probability.
Optionally, determining the character string to be corrected among the plurality of character strings may be implemented as: determining a third character string whose character string co-occurrence probabilities with its adjacent character strings are all lower than a preset first probability threshold as the character string to be corrected.
For example, suppose the preset first probability threshold is 0.8 and the interactive text obtained after speech recognition is "I want to see Chihua" (a misrecognition of the movie title "Fanghua"), where the character string co-occurrence probability between "I" and "want to see" is 0.95 and that between "want to see" and "Chihua" is 0.75. Since the co-occurrence probability between "I" and its adjacent string "want to see" is higher than 0.8, "I" is determined to be a non-to-be-corrected character string; since the co-occurrence probabilities between "want to see" and its adjacent strings ("I" and "Chihua") are not all lower than 0.8, "want to see" is also determined to be a non-to-be-corrected character string; and since the co-occurrence probability between "Chihua" and its adjacent string "want to see" is lower than 0.8, "Chihua" is determined to be the character string to be corrected.
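The following is a minimal Python sketch of this thresholding step; the function and variable names, the `cooccur_prob` helper, and the inlined example probabilities are illustrative assumptions rather than part of the original disclosure.

```python
def find_strings_to_correct(strings, cooccur_prob, threshold=0.8):
    """Flag strings whose co-occurrence probability with EVERY
    adjacent string falls below the threshold."""
    to_correct = []
    for i, s in enumerate(strings):
        neighbors = []
        if i > 0:
            neighbors.append(cooccur_prob(strings[i - 1], s))
        if i < len(strings) - 1:
            neighbors.append(cooccur_prob(s, strings[i + 1]))
        # A string is suspect only if all of its adjacent probabilities are low.
        if neighbors and all(p < threshold for p in neighbors):
            to_correct.append(s)
    return to_correct

# Values from the example: P("I","want to see")=0.95, P("want to see","Chihua")=0.75
probs = {("I", "want to see"): 0.95, ("want to see", "Chihua"): 0.75}
print(find_strings_to_correct(["I", "want to see", "Chihua"],
                              lambda a, b: probs.get((a, b), 0.0)))  # ['Chihua']
```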
The word segmentation method may be character-by-character segmentation, segmentation by sentence component (subject, predicate, object, etc.), or the like; this embodiment does not limit the specific method. For example, for the interactive text "I want to see Fanghua", character-level segmentation yields the five segments "I", "want", "see", "Fang" and "Hua", while word-level segmentation yields the three segments "I", "want to see" and "Fanghua".
It should be noted that the interactive text may be segmented only by characters, only by words, or by a combination of character and word segmentation; this embodiment does not limit how the two are combined. A sketch of both granularities is shown below.
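As an illustration of word-level versus character-level segmentation, the following sketch assumes the open-source jieba segmenter and an assumed Chinese rendering of the example sentence; the exact word-level output depends on jieba's dictionary.

```python
import jieba  # assumed segmenter; pip install jieba

text = "我想看芳华"  # assumed rendering of "I want to see Fanghua"
print(list(jieba.cut(text)))  # word-level segmentation; output varies with the dictionary
print(list(text))             # character-level segmentation: one segment per character
```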
103, determining, based on the semantic attributes of the non-to-be-corrected character strings adjacent to the character string to be corrected, the first semantic attribute with the highest semantic attribute co-occurrence probability with those attributes; selecting, among the character strings corresponding to the first semantic attribute, the target character string whose pronunciation audio has the highest similarity to the pronunciation audio corresponding to the character string to be corrected; and replacing the character string to be corrected with the target character string.
Because a character string to be corrected may be located at different positions in the interactive text, the number of non-to-be-corrected character strings adjacent to it varies. When the character string to be corrected is at the beginning or end of the interactive text, it has one adjacent non-to-be-corrected character string, and the first semantic attribute can be determined from the semantic attribute of that single string; when the character string to be corrected is in the middle of the interactive text, it has two adjacent non-to-be-corrected character strings, and the first semantic attribute must be determined from the semantic attributes of both. Accordingly, in one possible implementation, the process of determining the first semantic attribute with the highest co-occurrence probability with the semantic attributes of the adjacent non-to-be-corrected character strings covers at least the following two cases:
In the first case, when the character string to be corrected is adjacent to only a fourth character string, the first semantic attribute with the highest co-occurrence probability with the second semantic attribute of the fourth character string is determined based on that second semantic attribute, where the fourth character string is a non-to-be-corrected character string.
For example, the interactive text "I want to see Chihua" obtained after speech recognition includes the character strings "I", "want to see" and "Chihua", where "Chihua" is the character string to be corrected. Because "Chihua" is adjacent only to the non-to-be-corrected character string "want to see", the first semantic attribute "movie name", which has the highest co-occurrence probability with "visual action", is determined based on the semantic attribute "visual action" of "want to see".
In the second case, when the character string to be corrected is adjacent to a fifth character string and a sixth character string (both non-to-be-corrected character strings), a set of semantic attributes whose co-occurrence probability with the third semantic attribute of the fifth character string and whose co-occurrence probability with the fourth semantic attribute of the sixth character string both reach a preset second probability threshold is determined based on those two attributes; for each semantic attribute in the set, the average semantic attribute co-occurrence probability is calculated from its first semantic attribute co-occurrence probability (with the third semantic attribute) and its second semantic attribute co-occurrence probability (with the fourth semantic attribute); and the semantic attribute with the highest average is determined as the first semantic attribute with the highest co-occurrence probability with the semantic attributes of the non-to-be-corrected character strings.
For example, suppose the preset second probability threshold is 0.8 and the interactive text obtained after speech recognition is "The movie Fanghua is good-looking", which includes the character strings "movie", "Fanghua" and "good-looking", where "Fanghua" is the character string to be corrected. Because "Fanghua" is adjacent to the non-to-be-corrected character strings "movie" and "good-looking", the semantic attribute set {"movie name", "TV drama name", "video name"} is determined based on the third semantic attribute "movie classification" of "movie" and the fourth semantic attribute "movie evaluation" of "good-looking". For "movie name", the average semantic attribute co-occurrence probability is 0.55, calculated from its first co-occurrence probability 0.8 with "movie classification" and its second co-occurrence probability 0.3 with "movie evaluation"; for "TV drama name", the average is 0.2, from 0.1 with "movie classification" and 0.3 with "movie evaluation"; and for "video name", the average is likewise 0.2, from 0.1 and 0.3. "Movie name", having the highest average, is determined as the first semantic attribute.
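A small sketch of the averaging step for the two-neighbor case, using the probabilities from the example above; the data structures and names are assumptions for illustration.

```python
def pick_first_attribute(candidates, prob_with_third, prob_with_fourth):
    """Average each candidate attribute's co-occurrence probabilities with the
    third and fourth semantic attributes; return the best candidate and all averages."""
    averages = {a: (prob_with_third[a] + prob_with_fourth[a]) / 2 for a in candidates}
    return max(averages, key=averages.get), averages

best, averages = pick_first_attribute(
    ["movie name", "TV drama name", "video name"],
    {"movie name": 0.8, "TV drama name": 0.1, "video name": 0.1},  # vs "movie classification"
    {"movie name": 0.3, "TV drama name": 0.3, "video name": 0.3},  # vs "movie evaluation"
)
print(best, averages)
# movie name {'movie name': 0.55, 'TV drama name': 0.2, 'video name': 0.2}
```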
After the first semantic attribute with the highest co-occurrence probability with the semantic attributes of the non-to-be-corrected character strings is determined, the target character string whose pronunciation audio has the highest similarity to the pronunciation audio corresponding to the character string to be corrected is selected from the pre-stored character strings corresponding to the first semantic attribute, and the character string to be corrected in the interactive text is replaced with the target character string.
Optionally, the corresponding relationship between the semantic attribute and each character string is stored locally in a table form.
A character string is composed of characters, and each character corresponds to pronunciation audio. Pronunciation audio consists of phonemes, the smallest units of speech; computing the similarity between the pronunciation audios of two character strings is therefore, in effect, computing the similarity between the two character strings themselves.
When the characters are Chinese characters, the pronunciation audio is the Chinese pinyin. For example, the character string "Fanghua" is composed of the characters "Fang" and "Hua"; the pronunciation audio of "Fang" is "fang" and that of "Hua" is "hua", so the pronunciation audio string of "Fanghua" is "fang hua".
It should be noted that the pronunciation audio similarity may be computed by means of the longest common substring, the longest common subsequence, the minimum edit distance, the Hamming distance, cosine similarity, and the like; this embodiment places no limitation on how the similarity is computed.
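As one of the listed options, here is a minimal sketch of the minimum edit distance (Levenshtein distance) computed over pinyin strings; treating each letter or space of the pinyin string as one unit is an assumption, since the embodiment does not fix the granularity.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance between two pronunciation (pinyin) strings."""
    m, n = len(a), len(b)
    dp = list(range(n + 1))  # dp[j] = distance between a[:i] and b[:j]
    for i in range(1, m + 1):
        prev, dp[0] = dp[0], i
        for j in range(1, n + 1):
            cur = dp[j]
            dp[j] = min(dp[j] + 1,                        # deletion
                        dp[j - 1] + 1,                    # insertion
                        prev + (a[i - 1] != b[j - 1]))    # substitution
            prev = cur
    return dp[n]

print(edit_distance("fang hua", "fan hua"))  # 1
print(edit_distance("fang hua", "fen fa"))   # 4
```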
To sum up, the method provided by this embodiment calculates the character string co-occurrence probability between adjacent character strings based on the semantic attributes of the respective character strings in order to determine the character string to be corrected in the interactive text, and corrects that character string according to the semantic attributes of the adjacent non-to-be-corrected character strings. Because a newly added vocabulary item does not need to be fed into a model for training, and the error correction of the text containing it can be completed from its semantic attributes alone, the method solves the problem that a speech recognition system in the related art has difficulty accurately correcting character strings on which the text error correction model has not been trained, and achieves accurate error correction of the interactive text without depending on an error correction model.
Example 2
Please refer to fig. 2A, which shows a flowchart of a method for correcting text errors in a voice interaction according to another embodiment of the present application. The voice interaction text error correction method can comprise the following steps:
Step 201, acquiring voice data to be recognized, and performing voice recognition to obtain an interactive text corresponding to the voice data.
Step 202, performing word segmentation on the interactive text to obtain a plurality of character strings, and determining the semantic attributes of an adjacent first character string and second character string according to the pre-stored semantic attributes of each character string.
Optionally, the correspondence between character strings and semantic attributes is stored locally in the form of a table. FIG. 2B shows such a table provided by an embodiment of the present application. As shown in FIG. 2B, in the interactive text "I want to see Fanghua" obtained after speech recognition, the semantic attribute of the character string "I" is "subject", that of "want to see" is "visual action", and that of "Fanghua" is "person name".
Step 203, determining, in the pre-stored correspondence of semantic attribute co-occurrence probabilities between semantic attributes, the semantic attribute co-occurrence probability between the semantic attribute of the first character string and the semantic attribute of the second character string.
Optionally, the correspondence of semantic attribute co-occurrence probabilities between semantic attributes is stored locally in the form of a table. FIG. 2C shows such a table provided by an embodiment of the present application. As can be seen from FIGS. 2B and 2C, in the interactive text "I want to see Fanghua", the semantic attribute co-occurrence probability between "subject" (the attribute of "I") and "visual action" (the attribute of "want to see") is 0.2; and since no semantic attribute co-occurrence probability between "visual action" (the attribute of "want to see") and "person name" (the attribute of "Fanghua") is found in the correspondence, that probability is determined to be 0.
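A minimal sketch of steps 202 and 203 with the two tables of FIGS. 2B and 2C rendered as dictionaries; the dictionary contents are assumed from the example, and attribute pairs absent from the table default to 0 as described above.

```python
# Assumed dictionary renderings of FIG. 2B (string -> attribute)
# and FIG. 2C (attribute pair -> co-occurrence probability).
string_attrs = {"I": "subject", "want to see": "visual action", "Fanghua": "person name"}
attr_cooccur = {("subject", "visual action"): 0.2}

def attr_cooccur_prob(s1, s2):
    """Semantic attribute co-occurrence probability of two adjacent strings;
    pairs not found in the table are taken as 0, as in the example."""
    return attr_cooccur.get((string_attrs[s1], string_attrs[s2]), 0.0)

print(attr_cooccur_prob("I", "want to see"))        # 0.2
print(attr_cooccur_prob("want to see", "Fanghua"))  # 0.0
```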
Step 204, calculating the character string co-occurrence probability between the first character string and the second character string according to the semantic attribute co-occurrence probability between their semantic attributes, and determining the character string to be corrected among the plurality of character strings according to the calculated character string co-occurrence probabilities.
Optionally, the character string co-occurrence probability between two adjacent character strings is calculated according to the pre-stored semantic attributes of each character string and a preset character string co-occurrence probability formula. The preset formula is derived from the Bayes formula, shown as formula (1):

$P(w_i \mid w_{i-1}) = \dfrac{P(w_{i-1} w_i)}{P(w_{i-1})}$  (1)

where $P(w_i \mid w_{i-1})$ is the probability that the character string $w_i$ appears given that the character string $w_{i-1}$ appears (i.e., the probability that $w_{i-1}$ appears immediately before $w_i$ in the interactive text), $P(w_{i-1} w_i)$ is the probability that $w_{i-1}$ and $w_i$ co-occur (without regard to the order in which they appear), and $P(w_{i-1})$ is the probability that $w_{i-1}$ appears.
Expanding the co-occurrence probability over semantic attributes by the total probability formula gives formula (2):

$P(w_{i-1} w_i) = P(w_{i-1}) \sum_{j} \sum_{k} P(t_j \mid w_{i-1}) \, P(t_k \mid t_j) \, P(w_i \mid t_k)$  (2)

where $P(w_{i-1})$ is the probability that the character string $w_{i-1}$ appears, $P(t_j \mid w_{i-1})$ is the pre-defined probability that $w_{i-1}$ has the semantic attribute $t_j$, $P(t_k \mid t_j)$ is the semantic attribute co-occurrence probability between the semantic attributes $t_j$ and $t_k$, and $P(w_i \mid t_k)$ is the pre-defined probability that $w_i$ has the semantic attribute $t_k$.
Substituting formula (2) into formula (1), and estimating each probability by its count in the training corpus, yields the preset character string co-occurrence probability formula, shown as formula (3):

$P(w_i \mid w_{i-1}) = \sum_{j} \sum_{k} \dfrac{C(t_j w_{i-1})}{C(w_{i-1})} \cdot \dfrac{C(t_k t_j)}{C(t_j)} \cdot \dfrac{C(w_i t_k)}{C(t_k)}$  (3)

where $C(t_j w_{i-1})$ is the number of times the character string $w_{i-1}$ appears with the semantic attribute $t_j$, $C(w_{i-1})$ is the number of occurrences of $w_{i-1}$, $C(w_i t_k)$ is the number of times $w_i$ appears with the semantic attribute $t_k$, $C(t_k)$ is the number of occurrences of the semantic attribute $t_k$ in the training corpus, $C(t_k t_j)$ is the number of co-occurrences of the semantic attributes $t_j$ and $t_k$, and $C(t_j)$ is the number of occurrences of $t_j$ in the training corpus.
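A sketch of formula (3), assuming the count tables have already been collected from a training corpus; the names and the toy counts in the demonstration are illustrative assumptions.

```python
def string_cooccur_prob(w_prev, w, attrs_of, c_attr_string, c_string,
                        c_attr_pair, c_attr):
    """Class-based bigram estimate of P(w | w_prev) per formula (3): sum over
    attribute pairs (t_j, t_k) of
    C(t_j w_prev)/C(w_prev) * C(t_k t_j)/C(t_j) * C(w t_k)/C(t_k)."""
    total = 0.0
    for t_j in attrs_of[w_prev]:
        for t_k in attrs_of[w]:
            total += (c_attr_string[(t_j, w_prev)] / c_string[w_prev]
                      * c_attr_pair[(t_k, t_j)] / c_attr[t_j]
                      * c_attr_string[(t_k, w)] / c_attr[t_k])
    return total

# Toy counts (assumed): one semantic attribute per string for brevity.
attrs_of = {"want to see": ["visual action"], "Fanghua": ["movie name"]}
c_attr_string = {("visual action", "want to see"): 50, ("movie name", "Fanghua"): 10}
c_string = {"want to see": 50}
c_attr_pair = {("movie name", "visual action"): 40}
c_attr = {"visual action": 100, "movie name": 20}
print(string_cooccur_prob("want to see", "Fanghua", attrs_of,
                          c_attr_string, c_string, c_attr_pair, c_attr))
# 50/50 * 40/100 * 10/20 = 0.2
```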
Still referring to FIG. 2B, the correspondence between character strings and semantic attributes also records the probability that each character string belongs to each semantic attribute. For example, the semantic attributes of the character string "Three Lives Three Worlds, Ten Miles of Peach Blossoms" are "novel name", "movie name" and "TV drama name", and correspondingly the probability of each of these three attributes is 0.33.
It should be noted that, in this embodiment, the probabilities relating each character string to its semantic attributes may be pre-defined manually, or may be obtained through corpus training using maximum likelihood estimation on the basis of the pre-defined probabilities.
In one possible scenario, if three consecutive character strings to be corrected exist in an interactive text, the target character strings replacing the two outer ones, each adjacent to a non-to-be-corrected character string, are determined first based on the semantic attributes of those non-to-be-corrected character strings; the determined target character strings are then treated as non-to-be-corrected character strings, and the target character string replacing the middle one is determined.
For example, suppose the character strings "B", "C" and "D" in the interactive text "ABCDE" are to be corrected. The target character string "F" replacing "B" is determined based on the semantic attribute of the non-to-be-corrected character string "A" adjacent to "B", and the target character string "G" replacing "D" is determined based on the semantic attribute of the non-to-be-corrected character string "E" adjacent to "D". The determined "F" and "G" are then treated as non-to-be-corrected character strings, and the target character string "H" replacing "C", which is adjacent to both "F" and "G", is determined based on their semantic attributes.
It should be noted that, since step 201 is similar to step 101 in this embodiment, step 201 is not described in detail in this embodiment.
To sum up, the method provided by this embodiment calculates the character string co-occurrence probability between adjacent character strings based on the semantic attributes of the respective character strings in order to determine the character string to be corrected in the interactive text, and corrects that character string according to the semantic attributes of the adjacent non-to-be-corrected character strings. Because a newly added vocabulary item does not need to be fed into a model for training, and the error correction of the text containing it can be completed from its semantic attributes alone, the method solves the problem that a speech recognition system in the related art has difficulty accurately correcting character strings on which the text error correction model has not been trained, and achieves accurate error correction of the interactive text without depending on an error correction model.
Example 3
In one possible implementation, before the target character string is determined from the character strings corresponding to the first semantic attribute, those character strings are first screened by the length of the character string to be corrected, so as to reduce the number of character strings for which the processor must compute pronunciation audio similarity and thereby reduce its computational load. Referring to FIG. 3, a flowchart of a voice interaction text error correction method according to still another embodiment of the present application is shown. The method can comprise the following steps:
303, screening out, from the pre-stored character strings corresponding to the first semantic attribute with the highest semantic attribute co-occurrence probability, the character strings whose length differs from the length of the character string to be corrected by less than a preset length threshold.
Taking the edit distance as an example: since the edit distance between two audio strings is the minimum number of edit operations required to convert one into the other, a large length difference between two audio strings implies a large edit distance between the corresponding pronunciation audios. Therefore, to reduce the processor's computational load before the similarity between character strings is calculated, the character strings whose length differs from that of the character string to be corrected by at least the preset length threshold can be eliminated from the character strings corresponding to the pre-stored first semantic attribute.
It should be noted that the preset length threshold may be set manually or preset by the system, and may be 0, 1, 2, or the like; this embodiment does not limit how the preset length threshold is set or its specific value.
304, calculating, for each character string obtained by the screening, the edit distance between its corresponding pronunciation audio and the pronunciation audio corresponding to the character string to be corrected.
The edit distance between two audio strings is the minimum number of edit operations required to convert one audio string into the other, where the edit operations comprise audio replacement, audio insertion and audio deletion.
For example, suppose the character string to be corrected in the interactive text "I want to see Fanghua" is an exact homophone of "Fanghua" (pronunciation audio "fang hua"), and the character strings corresponding to the first semantic attribute "movie name" are "Fanghua", "Fanhua" and "Fenfa". The pronunciation audio of "Fanghua" is "fang hua", at edit distance 0 from that of the character string to be corrected; the pronunciation audio of "Fanhua" is "fan hua", at edit distance 1; and the pronunciation audio of "Fenfa" is "fen fa", at edit distance 4. Since the edit distance for "Fanghua" is the smallest, the character string to be corrected is replaced with "Fanghua", yielding the corrected interactive text "I want to see Fanghua".
305, determining the character string whose corresponding pronunciation audio has the smallest edit distance to the pronunciation audio corresponding to the character string to be corrected as the target character string with the highest pronunciation audio similarity, and replacing the character string to be corrected with the target character string.
Generally, the smaller the edit distance between two pronunciation audios, the higher their similarity, and the more similar the character strings corresponding to them; therefore the character string whose pronunciation audio is at the smallest edit distance from that of the character string to be corrected is determined as the target character string.
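A compact sketch of steps 303 to 305, reusing the `edit_distance` function from the earlier sketch; the candidate strings, pinyin table and length threshold are assumptions mirroring the example above.

```python
def select_target(to_correct, candidates, pinyin_of, char_len, len_threshold=2):
    """Steps 303-305 in miniature: keep candidates whose character length
    differs from that of the string to be corrected by less than the preset
    threshold, then return the candidate whose pronunciation audio is at the
    smallest edit distance (edit_distance as defined in the earlier sketch)."""
    screened = [c for c in candidates
                if abs(char_len[c] - char_len[to_correct]) < len_threshold]
    return min(screened,
               key=lambda c: edit_distance(pinyin_of[c], pinyin_of[to_correct]))

# Assumed data mirroring the example: the string to be corrected is an exact
# homophone of "Fanghua", and every string is two characters long.
pinyin_of = {"<misrecognized>": "fang hua", "Fanghua": "fang hua",
             "Fanhua": "fan hua", "Fenfa": "fen fa"}
char_len = {"<misrecognized>": 2, "Fanghua": 2, "Fanhua": 2, "Fenfa": 2}
print(select_target("<misrecognized>", ["Fanghua", "Fanhua", "Fenfa"],
                    pinyin_of, char_len))  # 'Fanghua' (edit distance 0)
```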
It should be noted that, since steps 301 to 302 in this embodiment are similar to steps 101 to 102, step 301 to step 302 are not described in detail in this embodiment.
To sum up, the method provided by this embodiment calculates the character string co-occurrence probability between adjacent character strings based on the semantic attributes of the respective character strings in order to determine the character string to be corrected in the interactive text, and corrects that character string according to the semantic attributes of the adjacent non-to-be-corrected character strings. Because a newly added vocabulary item does not need to be fed into a model for training, and the error correction of the text containing it can be completed from its semantic attributes alone, the method solves the problem that a speech recognition system in the related art has difficulty accurately correcting character strings on which the text error correction model has not been trained, and achieves accurate error correction of the interactive text without depending on an error correction model.
In this embodiment, before the target character string is determined from the character strings corresponding to the first semantic attribute, those character strings are first screened by the length of the character string to be corrected, which reduces the number of character strings for which the processor must compute pronunciation audio similarity and thereby reduces its computational load.
Example 4
In one possible implementation, in order to reduce the possibility of erroneous correction, after the character string to be corrected is determined from the interactive text, the semantic attribute comprehensive probability corresponding to the interactive text is calculated, and whether the interactive text actually needs to be corrected is decided according to how high that comprehensive probability is. Referring to FIG. 4, a flowchart of a voice interaction text error correction method according to another embodiment of the present application is shown. The method can comprise the following steps:
403, calculating the semantic attribute comprehensive probability corresponding to the interactive text according to the semantic attribute co-occurrence probabilities between the semantic attributes respectively corresponding to every pair of adjacent character strings in the interactive text.
Specifically, the semantic attribute comprehensive probability corresponding to the interactive text is calculated from those semantic attribute co-occurrence probabilities using a preset semantic attribute comprehensive probability formula.
The preset semantic attribute comprehensive probability formula is shown as formula (4):

$P(w_1, w_2, \ldots, w_m) = \prod_{i=2}^{m} \sum_{j} \sum_{k} P(t_j \mid w_{i-1}) \, P(t_k \mid t_j) \, P(w_i \mid t_k)$  (4)

where $P(w_1, w_2, \ldots, w_m)$ is the semantic attribute comprehensive probability of the interactive text, $P(t_j \mid w_{i-1})$ is the pre-defined probability that the character string $w_{i-1}$ has the semantic attribute $t_j$, $P(t_k \mid t_j)$ is the semantic attribute co-occurrence probability between the semantic attributes $t_j$ and $t_k$, and $P(w_i \mid t_k)$ is the pre-defined probability that $w_i$ has the semantic attribute $t_k$.
It should be noted that, when the character string co-occurrence probability between any two adjacent character strings in the interactive text is 0, the semantic attribute comprehensive probability obtained from formula (4) is 0, which would cause the processor to decide, in the subsequent step, that the interactive text needs correction merely because this probability is too low. To avoid this, optionally, when the semantic attribute co-occurrence probability between the semantic attribute of the first character string and that of the second character string is not stored in the correspondence of semantic attribute co-occurrence probabilities, it is determined to be a default value, where the default value is not zero.
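A sketch of formula (4) with the non-zero default applied, so that a single unseen attribute pair does not force the comprehensive probability to 0; the floor value and the threshold are illustrative assumptions.

```python
from math import prod

DEFAULT_PROB = 1e-4  # assumed non-zero default for attribute pairs absent from the table

def comprehensive_prob(strings, bigram_prob):
    """Formula (4): the product, over every pair of adjacent character strings,
    of the class-based bigram probability, floored at DEFAULT_PROB."""
    return prod(max(bigram_prob(strings[i - 1], strings[i]), DEFAULT_PROB)
                for i in range(1, len(strings)))

def needs_correction(strings, bigram_prob, third_threshold=0.4):
    """Trigger the correction step only when the comprehensive probability is low."""
    return comprehensive_prob(strings, bigram_prob) < third_threshold
```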
It should be noted that the third probability threshold may be set manually or preset by the system, and may be, for example, 0.3, 0.4, or 0.5; neither the setting manner nor the specific value of the third probability threshold is limited in this embodiment.
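As a concrete illustration of this gate, the following sketch computes formula (4) with a non-zero default for missing attribute pairs and compares the result against the third probability threshold. The lookup tables and all numeric values are invented for the example; only the shape of the computation follows the text.

```python
# Illustrative tables, invented for this example.
# P_ATTR_GIVEN_WORD[w][t] = P(t | w): probability that string w carries attribute t.
# P_WORD_GIVEN_ATTR[t][w] = P(w | t): probability of string w under attribute t.
# P_ATTR_COOCCUR[(t1, t2)] = P(t2 | t1): semantic attribute co-occurrence probability.
P_ATTR_GIVEN_WORD = {"play": {"verb_media": 0.9}, "song": {"media_object": 0.8}}
P_WORD_GIVEN_ATTR = {"media_object": {"song": 0.05}, "verb_media": {"play": 0.04}}
P_ATTR_COOCCUR = {("verb_media", "media_object"): 0.7}
DEFAULT_COOCCUR = 1e-4             # non-zero default for unstored attribute pairs
THIRD_PROBABILITY_THRESHOLD = 0.4  # e.g. 0.3, 0.4, or 0.5, as the text notes

def comprehensive_probability(strings):
    """Formula (4): product over adjacent pairs of
    sum_j sum_k P(t_j | w_{i-1}) * P(t_k | t_j) * P(w_i | t_k)."""
    prob = 1.0
    for prev, curr in zip(strings, strings[1:]):
        pair = 0.0
        for tj, p_tj in P_ATTR_GIVEN_WORD.get(prev, {}).items():
            for tk, words in P_WORD_GIVEN_ATTR.items():
                cooccur = P_ATTR_COOCCUR.get((tj, tk), DEFAULT_COOCCUR)
                pair += p_tj * cooccur * words.get(curr, 0.0)
        prob *= pair
    return prob

def needs_error_correction(strings):
    # Error correction proceeds only when the comprehensive probability is low.
    return comprehensive_probability(strings) < THIRD_PROBABILITY_THRESHOLD
```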
It should be noted that steps 401 to 402 in this embodiment are similar to steps 101 to 102, so their description is not repeated here.
To sum up, in the method provided by this embodiment of the present application, the character string co-occurrence probability between adjacent character strings is calculated based on the semantic attributes of the respective character strings, so as to determine the character string to be corrected in the interactive text, and the character string to be corrected is then corrected according to the semantic attributes of the adjacent character strings that are not themselves to be corrected. Because a newly added vocabulary item does not need to be fed into a model for training, and error correction of the text containing it can be completed from its semantic attributes alone, the method solves the problem in the related art that a speech recognition system has difficulty accurately correcting character strings on which the text error correction model has not been trained, and achieves accurate error correction of the interactive text without depending on an error correction model.
In this embodiment, in order to reduce the possibility of mistaken error correction, after the character string to be corrected is determined from the interactive text, the semantic attribute comprehensive probability corresponding to the interactive text is calculated, and whether the interactive text actually needs error correction is determined by how high that comprehensive probability is.
The following are apparatus embodiments of the present application. For details not described in the apparatus embodiments, reference may be made to the corresponding method embodiments above.
Fig. 5 is a block diagram illustrating the structure of a voice interaction text error correction apparatus provided in an embodiment of the present application. The apparatus comprises: an acquisition module 501, a determining module 502, and a replacing module 503.
The acquiring module 501 is configured to acquire voice data to be recognized, perform voice recognition, and obtain an interactive text corresponding to the voice data;
a determining module 502, configured to perform word segmentation on an interactive text to obtain a plurality of character strings, calculate a character string co-occurrence probability between two adjacent character strings according to a pre-stored semantic attribute of each character string, and determine a character string to be corrected in the plurality of character strings according to the calculated character string co-occurrence probability;
the replacing module 503 is configured to determine a semantic attribute co-occurrence probability of a non-to-be-corrected character string based on semantic attributes of non-to-be-corrected character strings adjacent to the to-be-corrected character string, select, in each character string corresponding to a first semantic attribute with the highest semantic attribute co-occurrence probability, a target character string with the highest pronunciation audio similarity between the pronunciation audio and the pronunciation audio corresponding to the to-be-corrected character string, and replace the to-be-corrected character string with the target character string.
In one possible implementation, the determining module 502 includes:
the first determining unit is used for determining the semantic attribute of the adjacent first character string and the semantic attribute of the second character string according to the pre-stored semantic attribute of each character string;
the second determining unit is used for determining the semantic attribute co-occurrence probability between the semantic attribute of the first character string and the semantic attribute of the second character string in the pre-stored corresponding relation of the semantic attribute co-occurrence probability between the semantic attributes;
and the first calculation unit is used for calculating the character string co-occurrence probability between the first character string and the second character string according to the semantic attribute co-occurrence probability between the semantic attribute of the first character string and the semantic attribute of the second character string.
In a possible implementation manner, the determining module 502 further includes:
and the third determining unit is configured to determine, among the plurality of character strings, a third character string whose character string co-occurrence probabilities with all adjacent character strings are below a preset first probability threshold as the character string to be corrected.
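A sketch of that rule follows, assuming, as one plausible reading of the calculation units above, that the string co-occurrence probability weights the attribute co-occurrence probability by each string's attribute probabilities; the tables, values, and threshold are illustrative, not taken from the patent.

```python
# Illustrative tables; all values are invented for the example.
ATTRS = {                       # P(t | w) per string
    "play":  {"verb_media": 0.9},
    "song":  {"media_object": 0.8},
    "peach": {"fruit": 0.8},
}
ATTR_COOCCUR = {                # P(t2 | t1) between semantic attributes
    ("verb_media", "media_object"): 0.7,
    ("verb_media", "fruit"): 0.001,
}
FIRST_PROBABILITY_THRESHOLD = 0.01  # preset first probability threshold (illustrative)

def string_cooccurrence(w1, w2):
    """String co-occurrence probability of two adjacent strings, read here as the
    attribute co-occurrence probability weighted by each string's attribute weights."""
    return sum(
        p1 * ATTR_COOCCUR.get((t1, t2), 0.0) * p2
        for t1, p1 in ATTRS.get(w1, {}).items()
        for t2, p2 in ATTRS.get(w2, {}).items()
    )

def strings_to_correct(strings):
    """Third determining unit: flag every string whose co-occurrence probability
    with each adjacent string falls below the first probability threshold."""
    flagged = []
    for i, w in enumerate(strings):
        probs = []
        if i > 0:
            probs.append(string_cooccurrence(strings[i - 1], w))
        if i + 1 < len(strings):
            probs.append(string_cooccurrence(w, strings[i + 1]))
        if probs and all(p < FIRST_PROBABILITY_THRESHOLD for p in probs):
            flagged.append(w)
    return flagged

# strings_to_correct(["play", "song"])  -> []                  (0.9 * 0.7 * 0.8 is about 0.50)
# strings_to_correct(["play", "peach"]) -> ["play", "peach"]   (0.9 * 0.001 * 0.8 is about 0.0007)
```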
In a possible implementation manner, the replacing module 503 further includes:
the screening unit is used for screening out character strings of which the difference value between the length of the character string and the length of the character string corresponding to the character string to be corrected is smaller than a preset length threshold in each character string corresponding to a first semantic attribute with the highest pre-stored semantic attribute co-occurrence probability;
the second calculation unit is used for respectively calculating the editing distance between the pronunciation audio corresponding to each character string and the pronunciation audio corresponding to the character string to be corrected in the character string obtained by screening;
and the fourth determining unit is used for determining the character string with the minimum editing distance between the corresponding pronunciation audio and the pronunciation audio corresponding to the character string to be corrected as the target character string with the highest pronunciation audio similarity corresponding to the character string to be corrected.
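The three units together amount to a length screen followed by a minimum-edit-distance ranking. The sketch below shows this flow; the patent compares pronunciation audio, so using a pinyin-style string as the stand-in transcription, the `pronounce` helper itself, and the length threshold value are all our assumptions.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance (insertions, deletions, substitutions) computed by
    dynamic programming over a single row."""
    row = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        prev, row[0] = row[0], i
        for j, cb in enumerate(b, 1):
            prev, row[j] = row[j], min(row[j] + 1,         # delete ca
                                       row[j - 1] + 1,     # insert cb
                                       prev + (ca != cb))  # substitute ca -> cb
    return row[-1]

LENGTH_THRESHOLD = 2  # preset length threshold (illustrative value)

def select_target(candidates, to_correct, pronounce):
    """Screening unit + second calculation unit + fourth determining unit:
    keep candidates whose length is close to the string to be corrected, then
    return the one whose pronunciation has the smallest edit distance to it.
    `pronounce(w)` is an assumed helper returning a phonetic string (e.g. pinyin)."""
    screened = [c for c in candidates
                if abs(len(c) - len(to_correct)) < LENGTH_THRESHOLD]
    return min(screened,
               key=lambda c: edit_distance(pronounce(c), pronounce(to_correct)),
               default=None)
```

For instance, with `pronounce=str` (treating the spelling itself as the transcription), `select_target(["song", "sing"], "son", str)` returns "song", whose transcription is one edit away from "son".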
In one possible implementation, the apparatus further includes:
the third calculation unit is used for calculating the semantic attribute comprehensive probability corresponding to the interactive text according to the semantic attribute co-occurrence probability between the semantic attributes respectively corresponding to all the two adjacent character strings in the interactive text after the character string to be corrected is determined in at least one character string;
and the fifth determining unit is used for executing the step of determining the first semantic attribute with the highest probability of co-occurrence of the semantic attributes of the character strings to be corrected based on the semantic attributes of the character strings to be corrected, which are adjacent to the character strings to be corrected, if the comprehensive probability of the semantic attributes corresponding to the interactive text is lower than a preset third probability threshold.
To sum up, in the apparatus provided by this embodiment of the present application, the character string co-occurrence probability between adjacent character strings is calculated based on the semantic attributes of the respective character strings, so as to determine the character string to be corrected in the interactive text, and the character string to be corrected is then corrected according to the semantic attributes of the adjacent character strings that are not themselves to be corrected. Because a newly added vocabulary item does not need to be fed into a model for training, and error correction of the text containing it can be completed from its semantic attributes alone, the apparatus solves the problem in the related art that a speech recognition system has difficulty accurately correcting character strings on which the text error correction model has not been trained, and achieves accurate error correction of the interactive text without depending on an error correction model.
In this embodiment, before the target character string is determined from the character strings corresponding to the first semantic attribute, those character strings are first screened according to the length of the character string to be corrected. This reduces the number of character strings for which the processor must calculate pronunciation audio similarity and thus reduces the computational load on the processor.
In this embodiment, in order to reduce the possibility of mistaken error correction, after the character string to be corrected is determined from the interactive text, the semantic attribute comprehensive probability corresponding to the interactive text is calculated, and whether the interactive text actually needs error correction is determined by how high that comprehensive probability is.
It should be noted that the voice interaction text error correction apparatus provided in the above embodiment is illustrated only by the division into the functional modules described above; in practical applications, these functions may be allocated to different functional modules as needed, that is, the internal structure of the terminal may be divided into different functional modules to complete all or part of the functions described above. In addition, the voice interaction text error correction apparatus provided in the above embodiment belongs to the same concept as the voice interaction text error correction method embodiments; its specific implementation process is described in detail in the method embodiments and is not repeated here.
An exemplary embodiment of the present application provides a terminal that can implement the voice interaction text error correction method provided by the present application. The terminal includes a memory, a processor, and a computer program stored on the memory and executable on the processor, and the processor implements the following steps when executing the computer program:
acquiring voice data to be recognized, and performing voice recognition to obtain an interactive text corresponding to the voice data;
segmenting the interactive text to obtain a plurality of character strings, calculating the character string co-occurrence probability between two adjacent character strings according to the pre-stored semantic attributes of each character string, and determining the character string to be corrected in the plurality of character strings according to the calculated character string co-occurrence probability;
determining the semantic attribute co-occurrence probability of the non-to-be-corrected character strings based on the semantic attributes of the non-to-be-corrected character strings adjacent to the character strings to be corrected, selecting a target character string with the highest pronunciation audio similarity to the pronunciation audio corresponding to the character string to be corrected in each character string corresponding to the first semantic attribute with the highest semantic attribute co-occurrence probability, and replacing the character string to be corrected with the target character string.
Fig. 6 shows a block diagram of a terminal 600 according to an exemplary embodiment of the present application. The terminal 600 may be referred to by other names such as user equipment, portable terminal, laptop terminal, desktop terminal, and the like.
In general, the terminal 600 includes: a processor 601 and a memory 602.
The processor 601 may include one or more processing cores, such as a 4-core processor, an 8-core processor, and so on. The processor 601 may be implemented in at least one hardware form of a DSP (Digital Signal Processor), an FPGA (Field-Programmable Gate Array), and a PLA (Programmable Logic Array). The processor 601 may also include a main processor and a coprocessor, where the main processor is a processor for processing data in an awake state, also called a Central Processing Unit (CPU), and the coprocessor is a low-power processor for processing data in a standby state. In some embodiments, the processor 601 may be integrated with a GPU (Graphics Processing Unit), which is responsible for rendering and drawing the content to be displayed on the display screen.
The memory 602 may include one or more computer-readable storage media, which may be non-transitory. The memory 602 may also include high-speed random access memory as well as non-volatile memory, such as one or more magnetic disk storage devices or flash memory storage devices. In some embodiments, a non-transitory computer-readable storage medium in the memory 602 is used to store at least one instruction for execution by the processor 601 to implement the voice interaction text error correction method provided by the method embodiments of the present application.
In some embodiments, the terminal 600 may further optionally include: a peripheral interface 603 and at least one peripheral. The processor 601, memory 602, and peripheral interface 603 may be connected by buses or signal lines. Various peripheral devices may be connected to the peripheral interface 603 via a bus, signal line, or circuit board. Specifically, the peripheral device includes: at least one of a radio frequency circuit 604, a touch screen display 605, a camera 606, an audio circuit 607, a positioning component 608, and a power supply 609.
The peripheral interface 603 may be used to connect at least one peripheral related to I/O (Input/Output) to the processor 601 and the memory 602. In some embodiments, the processor 601, memory 602, and peripheral interface 603 are integrated on the same chip or circuit board; in some other embodiments, any one or two of the processor 601, the memory 602, and the peripheral interface 603 may be implemented on a separate chip or circuit board, which is not limited in this embodiment.
The Radio Frequency circuit 604 is used for receiving and transmitting RF (Radio Frequency) signals, also called electromagnetic signals. The radio frequency circuitry 604 communicates with communication networks and other communication devices via electromagnetic signals. The rf circuit 604 converts an electrical signal into an electromagnetic signal to transmit, or converts a received electromagnetic signal into an electrical signal. Optionally, the radio frequency circuit 604 comprises: an antenna system, an RF transceiver, one or more amplifiers, a tuner, an oscillator, a digital signal processor, a codec chipset, a subscriber identity module card, and so forth. The radio frequency circuitry 604 may communicate with other terminals via at least one wireless communication protocol. The wireless communication protocols include, but are not limited to: the world wide web, metropolitan area networks, intranets, generations of mobile communication networks (2G, 3G, 4G, and 5G), Wireless local area networks, and/or WiFi (Wireless Fidelity) networks. In some embodiments, the rf circuit 604 may further include NFC (Near Field Communication) related circuits, which are not limited in this application.
The display 605 is used to display a UI (User Interface). The UI may include graphics, text, icons, video, and any combination thereof. When the display screen 605 is a touch display screen, the display screen 605 also has the ability to capture touch signals on or over its surface. The touch signal may be input to the processor 601 as a control signal for processing. At this point, the display 605 may also be used to provide virtual buttons and/or a virtual keyboard, also referred to as soft buttons and/or a soft keyboard. In some embodiments, there may be one display 605, providing the front panel of the terminal 600; in other embodiments, there may be at least two displays 605, respectively disposed on different surfaces of the terminal 600 or in a folded design; in still other embodiments, the display 605 may be a flexible display disposed on a curved surface or a folded surface of the terminal 600. The display 605 may even be arranged in a non-rectangular, irregular pattern, that is, a shaped screen. The display 605 may be made of LCD (Liquid Crystal Display), OLED (Organic Light-Emitting Diode), and the like.
The camera assembly 606 is used to capture images or video. Optionally, camera assembly 606 includes a front camera and a rear camera. Generally, a front camera is disposed at a front panel of the terminal, and a rear camera is disposed at a rear surface of the terminal.
The power supply 609 is used to supply power to the various components in the terminal 600. The power supply 609 may be alternating current, direct current, a disposable battery, or a rechargeable battery. When the power supply 609 includes a rechargeable battery, the rechargeable battery may be a wired rechargeable battery or a wireless rechargeable battery. A wired rechargeable battery is a battery charged through a wired line, and a wireless rechargeable battery is a battery charged through a wireless coil. The rechargeable battery may also be used to support fast charging technology.
Those skilled in the art will appreciate that the configuration shown in fig. 6 is not intended to be limiting of terminal 600 and may include more or fewer components than those shown, or some components may be combined, or a different arrangement of components may be used.
The above-mentioned serial numbers of the embodiments of the present application are merely for description and do not represent the merits of the embodiments.
It will be understood by those skilled in the art that all or part of the steps for implementing the above embodiments may be implemented by hardware, or may be implemented by a program instructing relevant hardware, where the program may be stored in a computer-readable storage medium, and the above-mentioned storage medium may be a read-only memory, a magnetic disk or an optical disk, etc.
The above description is only exemplary of the present application and should not be taken as limiting the present application, as any modification, equivalent replacement, or improvement made within the spirit and principle of the present application should be included in the protection scope of the present application.
Claims (8)
1. A method for text correction for voice interaction, the method comprising:
acquiring voice data to be recognized, and performing voice recognition to obtain an interactive text corresponding to the voice data;
segmenting the interactive text to obtain a plurality of character strings, determining semantic attributes of adjacent first character strings and semantic attributes of adjacent second character strings according to the pre-stored semantic attributes of the character strings, determining the semantic attribute co-occurrence probability between the semantic attributes of the first character strings and the semantic attributes of the second character strings in the corresponding relation of the semantic attribute co-occurrence probability between the pre-stored semantic attributes, calculating the character string co-occurrence probability between the first character strings and the second character strings according to the semantic attribute co-occurrence probability between the semantic attributes of the first character strings and the semantic attributes of the second character strings, and determining the character strings to be corrected in the character strings according to the calculated character string co-occurrence probability;
determining the semantic attribute co-occurrence probability of the non-to-be-corrected character strings based on the semantic attributes of the non-to-be-corrected character strings adjacent to the to-be-corrected character strings, selecting a target character string with the highest pronunciation audio similarity to the to-be-corrected character string in each character string corresponding to the first semantic attribute with the highest semantic attribute co-occurrence probability, and replacing the to-be-corrected character string with the target character string.
2. The method according to claim 1, wherein the determining a character string to be corrected among the plurality of character strings comprises:
and determining third character strings of which the character string co-occurrence probabilities with adjacent character strings are lower than a preset first probability threshold value as character strings to be corrected in the character strings.
3. The method according to claim 1, wherein the selecting, from the character strings corresponding to the first semantic attribute with the highest probability of semantic attribute co-occurrence, a target character string with a pronunciation audio having the highest similarity to the pronunciation audio corresponding to the character string to be modified, comprises:
screening out character strings of which the difference value between the length of the character string and the length of the character string corresponding to the character string to be corrected is smaller than a preset length threshold value in each character string corresponding to a first semantic attribute with the highest co-occurrence probability of the pre-stored semantic attributes;
respectively calculating the editing distance between the pronunciation audio corresponding to each character string and the pronunciation audio corresponding to the character string to be corrected in the character string obtained by screening;
and determining the character string with the minimum editing distance between the corresponding pronunciation audio and the pronunciation audio corresponding to the character string to be corrected as the target character string with the highest pronunciation audio similarity corresponding to the character string to be corrected.
4. The method according to any one of claims 1 to 3, wherein after determining a character string to be corrected among the plurality of character strings, the method further comprises:
calculating the semantic attribute comprehensive probability corresponding to the interactive text according to the semantic attribute co-occurrence probability between the semantic attributes respectively corresponding to all the two adjacent character strings in the interactive text;
and if the comprehensive probability of the semantic attributes corresponding to the interactive text is lower than a preset third probability threshold, executing the step of determining a first semantic attribute with the highest probability of co-occurrence with the semantic attributes of the non-to-be-corrected character strings based on the semantic attributes of the non-to-be-corrected character strings adjacent to the to-be-corrected character strings.
5. An apparatus for text correction for voice interaction, the apparatus comprising:
the acquisition module is used for acquiring voice data to be recognized and performing voice recognition to obtain an interactive text corresponding to the voice data;
the determining module is used for segmenting the interactive text to obtain a plurality of character strings; the determining module comprises a first determining unit, a second determining unit and a third determining unit, wherein the first determining unit is used for determining the semantic attribute of the adjacent first character string and the semantic attribute of the adjacent second character string according to the pre-stored semantic attribute of each character string; a second determining unit, configured to determine, in a correspondence relationship between pre-stored semantic attribute co-occurrence probabilities among the semantic attributes, a semantic attribute co-occurrence probability between the semantic attribute of the first character string and the semantic attribute of the second character string; a first calculation unit configured to calculate a string co-occurrence probability between the first string and the second string according to a semantic attribute co-occurrence probability between semantic attributes of the first string and semantic attributes of the second string; the determining module is further configured to determine a character string to be corrected in the plurality of character strings according to the calculated character string co-occurrence probability;
and the replacing module is used for determining the semantic attribute co-occurrence probability of the non-to-be-corrected character string based on the semantic attributes of the non-to-be-corrected character string adjacent to the to-be-corrected character string, selecting a target character string with the pronunciation audio frequency having the highest similarity with the pronunciation audio frequency corresponding to the to-be-corrected character string in each character string corresponding to the first semantic attribute with the highest semantic attribute co-occurrence probability, and replacing the to-be-corrected character string with the target character string.
6. The apparatus of claim 5, wherein the determining module further comprises:
and the third determining unit is used for determining third character strings of which the character string co-occurrence probabilities with adjacent character strings are all lower than a preset first probability threshold value as character strings to be corrected in the plurality of character strings.
7. A terminal, characterized in that the terminal comprises a processor and a memory, in which at least one instruction, at least one program, a set of codes, or a set of instructions is stored, which is loaded and executed by the processor to implement the method of text correction for voice interaction according to any of claims 1-4.
8. A computer readable storage medium having stored therein at least one instruction, at least one program, a set of codes, or a set of instructions, which is loaded and executed by a processor to implement the method of text correction for speech interaction according to any one of claims 1 to 4.