CN109065045A - Audio recognition method, device, electronic equipment and computer readable storage medium - Google Patents
Audio recognition method, device, electronic equipment and computer readable storage medium
- Publication number
- CN109065045A; CN201811004170.1A
- Authority
- CN
- China
- Prior art keywords
- voice
- information
- target
- recognition model
- voice recognition
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/22—Procedures used during a speech recognition process, e.g. man-machine dialogue
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/06—Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
Landscapes
- Engineering & Computer Science (AREA)
- Computational Linguistics (AREA)
- Health & Medical Sciences (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Human Computer Interaction (AREA)
- Physics & Mathematics (AREA)
- Acoustics & Sound (AREA)
- Multimedia (AREA)
- Artificial Intelligence (AREA)
- Telephonic Communication Services (AREA)
Abstract
The embodiments of the invention disclose a speech recognition method, a speech recognition apparatus, an electronic device, and a computer-readable storage medium. The method comprises: obtaining current speech information of a user and recognizing the current speech information; and, if a target keyword is recognized in the current speech information, determining, from at least two pre-configured speech recognition models, the speech recognition model corresponding to the target keyword as the target speech recognition model. Because each speech recognition model corresponds to a target keyword, the matching model can be found quickly from the target keyword in the user's current speech information. When that model is then used to recognize the current speech information, speech recognition accuracy can be improved, recognition time can be shortened, and recognition efficiency can be improved.
Description
Technical Field
The present invention relates to the field of speech recognition technologies, and in particular, to a speech recognition method, an apparatus, an electronic device, and a computer-readable storage medium.
Background
Speech recognition technology enables a machine to convert speech signals into corresponding text or commands through a process of recognition and understanding. Speech recognition has a very wide range of applications. Common application systems include: voice input systems, which match people's everyday habits better than keyboard input and are more natural and efficient; voice control systems, which control the operation of equipment by voice, are faster and more convenient than manual control, and can be used in fields such as industrial control, voice dialing systems, smart household appliances, and voice-controlled smart toys; and intelligent dialogue query systems, which operate according to the customer's voice and provide natural, friendly database retrieval services such as family services, hotel services, travel agency service systems, ticket booking systems, medical services, banking services, and stock inquiry services.
In the course of implementing the invention, the inventor found that speech recognition in the prior art is inefficient, which greatly degrades the user experience, so a method that can improve speech recognition efficiency is urgently needed.
Disclosure of Invention
In view of this, embodiments of the present invention provide a speech recognition method, apparatus, electronic device and computer-readable storage medium, which can effectively improve the efficiency of speech recognition.
In order to solve the above problems, embodiments of the present invention mainly provide the following technical solutions:
in a first aspect, an embodiment of the present invention provides a speech recognition method, where the method includes:
acquiring current voice information of a user, and identifying the current voice information;
and if the target keyword is identified in the current voice information, determining a voice recognition model corresponding to the target keyword as a target voice recognition model in at least two pre-configured voice recognition models.
In a second aspect, an embodiment of the present invention further provides a speech recognition apparatus, where the apparatus includes:
the voice information acquisition module is used for acquiring the current voice information of the user; recognizing the current voice information;
and the recognition model matching module is used for recognizing the current voice information, and if the target keyword is recognized in the current voice information, determining the voice recognition model corresponding to the target keyword as the target voice recognition model in at least two pre-configured voice recognition models.
In a third aspect, an embodiment of the present invention further provides an electronic device, where the electronic device includes:
at least one processor;
and at least one memory and a bus connected to the processor; wherein,
the processor and the memory complete mutual communication through the bus;
the processor is arranged to call program instructions in the memory to perform the method as shown in the embodiments of the first aspect of the invention.
In a fourth aspect, the embodiments of the present invention also provide a computer-readable storage medium, where the computer-readable storage medium stores computer instructions, and the computer instructions cause a computer to execute the method shown in the embodiments of the first aspect of the present invention.
The technical solutions provided by the embodiments of the present invention have at least the following advantages:
the voice recognition method, the voice recognition device, the electronic equipment and the computer readable storage medium provided by the embodiments of the invention can determine the voice recognition model corresponding to the target keyword based on the target keyword in the current voice information of the user. Because each voice recognition model corresponds to a target keyword, the corresponding model can be matched quickly from the target keyword; when the current voice information is then recognized with that model, voice recognition accuracy can be improved, voice recognition time can be shortened, and recognition efficiency can be improved.
The foregoing is only an overview of the technical solutions of the embodiments of the present invention. So that the technical means of the embodiments can be understood more clearly and implemented according to the content of the description, and so that the foregoing and other objects, features, and advantages of the embodiments are easier to understand, a detailed description of the embodiments is provided below.
Drawings
Various other advantages and benefits will become apparent to those of ordinary skill in the art upon reading the following detailed description of the preferred embodiments. The drawings are only for purposes of illustrating the preferred embodiments and are not to be construed as limiting the embodiments of the invention. Also, like reference numerals are used to refer to like parts throughout the drawings. In the drawings:
fig. 1 is a flow chart illustrating a speech recognition method according to an embodiment of the present invention;
fig. 2 is a schematic structural diagram of a speech recognition apparatus according to an embodiment of the present invention;
fig. 3 shows a schematic structural diagram of an electronic device according to an embodiment of the present invention.
Detailed Description
Exemplary embodiments of the present invention will be described in more detail below with reference to the accompanying drawings. While exemplary embodiments of the invention are shown in the drawings, it should be understood that the invention can be embodied in various forms and should not be limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the scope of the invention to those skilled in the art.
An embodiment of the present invention provides a speech recognition method, and as shown in fig. 1, the speech recognition method provided in the embodiment of the present invention may include:
step S110, acquiring current voice information of the user, and recognizing the current voice information.
The current voice information may be voice information that can be provided by a user through any device having a voice input function, for example, a microphone on the user terminal device, a voice input function key of an application program in the user terminal device, and the like. In practical applications, the current speech information of the user may be a word or a segment of speech spoken by the user, and the embodiment of the present invention does not limit the specific form of the current speech information.
Step S120, if the target keyword is recognized in the current voice information, determining the voice recognition model corresponding to the target keyword as the target voice recognition model in at least two pre-configured voice recognition models.
The target keyword may be a keyword that is pre-configured based on the requirements or experiences of the actual application, for example, for various application programs in the mobile phone terminal or various functions of the smart sound box, the target keyword may be configured as: open, make a call, give, close, etc. Similarly, the speech recognition models may be configured differently based on different requirements in practical applications, and different speech recognition models correspond to different target keywords.
It should be noted that a speech recognition model may correspond to one or more target keywords.
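Steps S110 and S120 can be illustrated with a minimal sketch. It assumes the current voice information has already been turned into a rough text transcript, and it uses a plain dictionary as the keyword-to-model registry; the names `KEYWORD_TO_MODEL`, `find_target_keyword`, `select_target_model`, and the model names are hypothetical and do not come from the patent.

```python
from typing import Dict, Optional

# Hypothetical registry: each target keyword maps to the name of a
# pre-configured speech recognition model. One model may serve several keywords.
KEYWORD_TO_MODEL: Dict[str, str] = {
    "make a call": "contacts_model",
    "call": "contacts_model",
    "open": "apps_model",
    "close": "apps_model",
}


def find_target_keyword(transcript: str) -> Optional[str]:
    """Return the longest configured keyword contained in the transcript, if any."""
    text = transcript.lower()
    matches = [kw for kw in KEYWORD_TO_MODEL if kw in text]
    return max(matches, key=len) if matches else None


def select_target_model(transcript: str) -> Optional[str]:
    """Step S120: pick the model that corresponds to the recognized keyword."""
    keyword = find_target_keyword(transcript)
    return KEYWORD_TO_MODEL[keyword] if keyword else None
```

Substring matching on text is only a stand-in for keyword spotting on audio; it is used here to keep the selection logic visible.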
According to the scheme of the embodiment of the invention, the voice recognition model corresponding to the target keyword can be determined based on the target keyword in the current voice information of the user, and the voice recognition model is corresponding to the target keyword, so that the corresponding voice recognition model can be quickly matched based on the target keyword, and further, when the current voice information is recognized by utilizing the voice recognition model, the voice recognition accuracy can be improved, meanwhile, the voice recognition time can be shortened, and the recognition efficiency can be improved.
In an alternative embodiment of the present invention, the target speech recognition model is a speech recognition model created from a corpus corresponding to the target keyword.
The preset speech recognition models are configured with corresponding target keywords, and the speech recognition models corresponding to the target keywords can be obtained by training based on the corpus corresponding to the target keywords. Since the corpus is usually established based on the user-related data or the historical dialogue context data of the user, the accuracy of speech recognition can be improved when the current speech information of the user is recognized based on the target speech recognition model.
In an example, the target keyword may be "make a call", the corpus corresponding to the target keyword may be related information in a telephone directory, the corpus may include information such as a name and a phone number in the directory, which is associated with the target keyword "make a call", and the speech recognition model corresponding to the target keyword "make a call" may be a speech recognition model trained based on the corpus.
In another example, if the target keyword may be "open", the corpus corresponding to the target keyword may be a database formed by various application names installed on the user terminal device, and the speech recognition model corresponding to the target keyword "open" may be a speech recognition model trained based on information in the database.
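The two examples above can be sketched as follows. The patent trains a speech recognition model on each corpus; this sketch substitutes a much simpler technique, fuzzy string matching with Python's `difflib`, purely to show how a keyword-specific corpus narrows the set of possible outputs. The class `CorpusModel` and the sample directory and app names are assumptions made for illustration.

```python
import difflib
from typing import Dict, List, Optional


class CorpusModel:
    """Toy stand-in for a keyword-specific recognizer.

    It can only output entries from its own corpus, which is what makes
    matching fast and targeted in this sketch.
    """

    def __init__(self, corpus: List[str]) -> None:
        self.corpus = list(corpus)

    def recognize(self, rough_text: str) -> Optional[str]:
        # Map lowercase forms back to the original corpus entries.
        lowered = {entry.lower(): entry for entry in self.corpus}
        hits = difflib.get_close_matches(
            rough_text.lower(), list(lowered), n=1, cutoff=0.6
        )
        return lowered[hits[0]] if hits else None


# Hypothetical user data: a telephone directory and installed application names.
phone_directory: Dict[str, str] = {"Alice Wang": "555-0100", "Bob Li": "555-0101"}
installed_apps: List[str] = ["Camera", "Calendar", "Music"]

# One model per target keyword, each built from its own corpus.
models: Dict[str, CorpusModel] = {
    "make a call": CorpusModel(list(phone_directory)),
    "open": CorpusModel(installed_apps),
}
```

With this setup, `models["make a call"].recognize(...)` can only return a contact name, and `models["open"].recognize(...)` can only return an application name.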
In practical applications, the models may be configured according to different requirements. The entries in a corpus may be pinyin, text (characters and/or words), numbers, or content in other forms, and the corpora corresponding to different speech recognition models may take different forms. The embodiments of the invention do not limit the specific form of the corpus entries.
In practical applications, the corpus corresponding to a speech recognition model may be configured by a server and stored on the server, or may be stored on the user terminal device. The user terminal device may provide the material for constructing the corpus to the server, and the server may create the speech recognition model from the material provided by the user terminal device.
It should be noted that the information in the corpus may be updated as needed, and an update may include, but is not limited to, addition, deletion, or modification. For example, for a corpus created from the information in a telephone directory, when the user adds or deletes contact information, the information in the corresponding corpus may change accordingly. In practical applications, if the corpus is stored on the user terminal device, the user terminal device may update the corpus from the telephone directory at a certain interval; if the corpus is stored on the server, the server may periodically obtain from the user terminal device either the telephone directory information or only the changed telephone directory information, and update the corpus accordingly.
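A corpus update of this kind can be sketched as a comparison of two directory snapshots. The sketch assumes the corpus is a simple name-to-number mapping; the function names `diff_directory` and `sync_corpus` are hypothetical.

```python
from typing import Dict, Set, Tuple


def diff_directory(
    old: Dict[str, str], new: Dict[str, str]
) -> Tuple[Set[str], Set[str], Set[str]]:
    """Return (added, deleted, modified) contact names between two snapshots."""
    added = set(new) - set(old)
    deleted = set(old) - set(new)
    modified = {name for name in set(old) & set(new) if old[name] != new[name]}
    return added, deleted, modified


def sync_corpus(corpus: Dict[str, str], new_directory: Dict[str, str]) -> Dict[str, str]:
    """Apply the directory changes to a copy of the corpus and return the copy."""
    added, deleted, modified = diff_directory(corpus, new_directory)
    updated = dict(corpus)
    for name in deleted:
        del updated[name]
    for name in added | modified:
        updated[name] = new_directory[name]
    return updated
```

Computing the difference first means a terminal device could send only the changed entries to a server instead of the whole directory, which matches the second option described above.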
In an optional embodiment of the present invention, after determining the speech recognition model corresponding to the target keyword as the target speech recognition model, the method provided in the embodiment of the present invention may further include:
according to the target voice recognition model, recognizing the current voice information to obtain a first voice recognition result;
and carrying out corresponding processing according to the first voice recognition result.
Because the target speech recognition model corresponds to the target keyword in the current speech information, recognizing the current speech information with this model greatly reduces the amount of corpus data that must be searched and makes recognition more targeted, which effectively improves recognition efficiency. Corresponding processing can then be performed based on the first speech recognition result. The specific way of recognizing speech information with a speech recognition model is a prior-art speech recognition method and is not described here.
It can be understood that the corresponding processing according to the speech recognition result may be performing speech interaction with the user based on the speech recognition result, or controlling the user terminal device to perform corresponding operations based on the speech recognition result, for example, making a call or playing music according to the recognition result.
In one example, the current speech information is "call A" (A being a person's name). Based on the target keyword "call" in the current speech information, the target speech recognition model that corresponds to "call" and was created from the information in the telephone address list is determined. The current speech information is then recognized with the target speech recognition model to obtain the first speech recognition result, "call A". Because the corpus underlying this model contains the names in the telephone address book, the corresponding telephone numbers, and related information, the result "call A" can be recognized quickly; based on that result, the name A and A's telephone number can be matched quickly in the corpus, and the operation of dialing A's telephone number can then be carried out. In this example, recognizing the current speech with the speech recognition model corresponding to the target keyword improves speech recognition efficiency. In practical applications, a user can quickly dial a person in the telephone address list by voice with this method, which improves the user's speech recognition experience.
In an optional embodiment of the present invention, recognizing the current speech information according to the target speech recognition model to obtain a first speech recognition result may include:
according to the target voice recognition model, recognizing information except recognized voice information in the current voice information to obtain a second voice recognition result, wherein the recognized voice information comprises target keywords;
and obtaining a first voice recognition result according to the second voice recognition result and the recognized voice information.
While the current voice information is being recognized to find the target keyword, part of it, including the target keyword, has already been recognized. When the current voice information is then recognized with the target voice recognition model, this already-recognized voice information need not be recognized again; only the remaining information is recognized, which yields the second voice recognition result. The first voice recognition result is obtained from the recognized voice information together with the second voice recognition result. Because the recognized voice information is not recognized a second time, device resource consumption is reduced and recognition efficiency is further improved. The amount of corpus data that must be searched is also greatly reduced, which can improve the accuracy of voice recognition.
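The idea of recognizing only the not-yet-recognized part can be sketched on text. The sketch assumes the utterance is available as a transcript and that the target model's corpus is a list of names; a real system would operate on audio segments. All function names are hypothetical.

```python
from typing import List, Optional, Tuple


def split_recognized(transcript: str, keyword: str) -> Tuple[str, str]:
    """Split the utterance into the already-recognized part (up to and
    including the keyword) and the remainder that still needs recognition."""
    idx = transcript.lower().find(keyword.lower())
    if idx < 0:
        raise ValueError("keyword not present in transcript")
    end = idx + len(keyword)
    return transcript[:end].strip(), transcript[end:].strip()


def recognize_remainder(remainder: str, corpus: List[str]) -> Optional[str]:
    """Second result: match only the remainder against the target corpus."""
    for entry in corpus:
        if entry.lower() in remainder.lower():
            return entry
    return None


def combine_results(
    transcript: str, keyword: str, corpus: List[str]
) -> Optional[Tuple[str, str]]:
    """First result: the recognized part plus the second result, or None."""
    recognized, remainder = split_recognized(transcript, keyword)
    second = recognize_remainder(remainder, corpus)
    return (recognized, second) if second is not None else None
```

The keyword portion is never passed to `recognize_remainder`, which mirrors the point that the recognized voice information is not recognized twice.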
In an optional embodiment of the present invention, after determining the speech recognition model corresponding to the target keyword as the target speech recognition model, the method provided in the embodiment of the present invention may further include:
providing first prompt information corresponding to the target keyword to a user according to the target keyword;
acquiring voice information of a user based on first prompt information;
recognizing the voice information of the user based on the first prompt information through the target voice recognition model to obtain a third voice recognition result;
and performing corresponding processing according to the third voice recognition result.
The first prompt information may be prompt information pre-configured for each pre-configured target keyword, and different target keywords may have different first prompt information. The first prompt information constrains the voice content that the user inputs after the current voice information, which can further improve the efficiency of voice recognition. The presentation form of the first prompt information is not limited to voice or text.
In one example, the target keyword is "call" and the corresponding prompt information is call-related, such as "who to call". The user can reply to the prompt, for example with "call XX". The prompt "who to call" constrains the voice content that the user inputs; the user's reply to the prompt is then recognized by the target voice recognition model, so the corresponding third voice recognition result can be obtained quickly and processed accordingly.
In an alternative embodiment of the invention, if any of the following occurs: the third speech recognition result does not correspond to the first prompt information; the user's voice information based on the first prompt information is not received within a preset time; or the target speech recognition model fails to recognize the user's voice information based on the first prompt information, the method provided by the embodiment of the invention further comprises:
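The prompt-then-recognize flow can be sketched as follows. The table `FIRST_PROMPTS`, the function `prompt_and_recognize`, and the `ask` callback (which stands in for playing the prompt and capturing the user's reply as text) are all assumptions for illustration.

```python
from typing import Callable, Dict, List, Optional

# Hypothetical first prompts, one per target keyword.
FIRST_PROMPTS: Dict[str, str] = {
    "call": "Who to call?",
    "open": "Which application should be opened?",
}


def prompt_and_recognize(
    keyword: str, corpus: List[str], ask: Callable[[str], str]
) -> Optional[str]:
    """Show the keyword's prompt, then match the reply against the target corpus.

    Returns the third recognition result, or None if nothing in the corpus matches.
    """
    reply = ask(FIRST_PROMPTS[keyword])
    for entry in corpus:
        if entry.lower() in reply.lower():
            return entry
    return None
```

For instance, `prompt_and_recognize("call", ["Alice", "Bob"], input)` would use the console in place of a voice channel.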
and providing second prompt information corresponding to the prompt strategy to the user according to the pre-configured prompt strategy.
The prompt strategy can be configured according to requirements in practical application, the configuration mode of the prompt strategy is not limited in the embodiment of the invention, different second prompt information can also be configured according to practical requirements, the expression form of the second prompt information is not limited in the embodiment of the invention, and the second prompt information can be any one of voice, characters, optical signals and specific sound.
In an example, if the second prompt information is voice, it may be a passage of speech output through a voice output device, such as "please say again"; if the second prompt information is text, it may be text displayed on a terminal device with a display function, such as "please say again"; if the second prompt information is an optical signal, it may be light of different colors emitted by an indicator light, for example, an indicator light that continuously flashes red to remind the user that the current voice information could not be recognized; if the second prompt information is a specific sound, it may be a specific sound, such as an alarm sound, output through a sound output device to remind the user that the current voice information could not be recognized.
The above three cases will be described separately with reference to specific examples.
For the first case: the third speech recognition result does not correspond to the first prompt information, which indicates that the user's voice information may be unrelated to the first prompt information, so an accurate third speech recognition result cannot be obtained from it. For example, in response to the first prompt information "who to call", the user replies "I want to eat". The third recognition result, "I want to eat", is unrelated to the first prompt information, so second prompt information corresponding to the pre-configured prompt strategy can be provided to the user, such as "please say again", "who to call", or "cancel the call?"; alternatively, the user can be prompted through an optical signal or a specific sound.
For the second case: if the voice message of the user based on the first prompt message is not received within the preset time length, which indicates that the user may not reply based on the first prompt message, a second prompt message corresponding to the prompt policy may be provided to the user according to the preconfigured prompt policy, where the second prompt message may be the same message as the first prompt message or different messages.
For the third case: the target voice recognition model may fail to recognize the user's voice information based on the first prompt information because that voice information does not correspond to any information in the corpus underlying the target voice recognition model. In this case, second prompt information corresponding to the pre-configured prompt strategy, such as "recognition failure", may also be provided to the user.
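The three cases can be sketched as a small classifier followed by a lookup in a prompt strategy table. How a real system distinguishes the first case from the third is not specified in the text; this sketch assumes that "recognition failed" means the model returned no result, and "unrelated reply" means it returned a result that is not in the target corpus. The enum `Failure`, the table `PROMPT_POLICY`, and both functions are hypothetical.

```python
from enum import Enum
from typing import Dict, List, Optional


class Failure(Enum):
    UNRELATED_REPLY = 1      # case 1: result does not correspond to the first prompt
    TIMEOUT = 2              # case 2: no reply within the preset time
    RECOGNITION_FAILED = 3   # case 3: the target model produced no result


# Hypothetical prompt strategy: one second prompt per failure case.
PROMPT_POLICY: Dict[Failure, str] = {
    Failure.UNRELATED_REPLY: "Please say again",
    Failure.TIMEOUT: "Who to call?",
    Failure.RECOGNITION_FAILED: "Recognition failure",
}


def classify_reply(
    reply_received: bool, result: Optional[str], corpus: List[str]
) -> Optional[Failure]:
    """Return the failure case, or None if the reply is usable."""
    if not reply_received:
        return Failure.TIMEOUT
    if result is None:
        return Failure.RECOGNITION_FAILED
    if result not in corpus:
        return Failure.UNRELATED_REPLY
    return None


def second_prompt(failure: Failure) -> str:
    """Look up the second prompt for a failure case in the prompt strategy."""
    return PROMPT_POLICY[failure]
```

Keeping the strategy in a table reflects the statement that the prompt strategy and the second prompt information are configurable; the entries could equally name an optical signal or a specific sound.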
In an optional embodiment of the present invention, if the target keyword is not identified in the current speech information, on the basis of the foregoing embodiment, the method provided in the embodiment of the present invention may further include:
recognizing the current voice information according to a pre-configured general recognition model to obtain a fourth voice recognition result;
and carrying out corresponding processing according to the fourth voice recognition result.
The general recognition model is a speech recognition model distinct from the at least two pre-configured speech recognition models. Its recognition scope is broader than that of a speech recognition model corresponding to a target keyword, and it may be a general-purpose speech recognition model from the prior art; that is, the general recognition model is not configured based on a target keyword. When the target keyword is not recognized in the current speech information, the current speech information may be recognized with the general recognition model to obtain the fourth speech recognition result, and corresponding processing may be performed based on that result.
It should be noted that, in practical applications, the target keyword may be recognized in the current speech information by a pre-configured target keyword recognition model, that is, a model dedicated to target keyword recognition and trained on the target keywords corresponding to the at least two pre-configured speech recognition models. When the target keyword cannot be recognized in the current speech information by this model, the current speech information may be recognized with the general recognition model.
Of course, in practical applications, the current speech information obtained from the user may instead be recognized directly by the pre-configured general recognition model. In that case, if the general recognition model does not recognize a target keyword, corresponding processing may be performed on the recognition result that the general recognition model produced for the current speech information.
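The fallback to the general recognition model can be sketched as a router. Recognizers are modeled as plain functions from a transcript to a result string, which is an assumption made to keep the example runnable; the names `Recognizer` and `route` are hypothetical.

```python
from typing import Callable, Dict, Tuple

# A recognizer is modeled as a function from input text to a result string.
Recognizer = Callable[[str], str]


def route(
    transcript: str,
    keyword_models: Dict[str, Recognizer],
    general_model: Recognizer,
) -> Tuple[str, str]:
    """Use the keyword-specific model when a target keyword is present,
    otherwise fall back to the general recognition model.

    Returns (name of the route taken, recognition result).
    """
    text = transcript.lower()
    for keyword, model in keyword_models.items():
        if keyword in text:
            return keyword, model(transcript)
    return "general", general_model(transcript)
```

The returned route name makes it easy for the caller to decide which kind of processing to apply to the result.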
In an optional embodiment of the present invention, on the basis of the foregoing embodiment, if the first speech recognition result is at least two speech recognition results, the method provided in the embodiment of the present invention may further include:
providing third prompt information corresponding to the first voice recognition result to the user according to the first voice recognition result;
acquiring voice information of the user based on the third prompt message;
recognizing the voice information of the user based on the third prompt information through the target voice recognition model to obtain a fifth voice recognition result;
and carrying out corresponding processing according to the fifth voice recognition result.
The third prompt information can be derived from the first voice recognition result and the entries in the corpus corresponding to the target voice recognition model. Because the first voice recognition result may not accurately reflect the user's real intention, third prompt information corresponding to the first voice recognition result can be provided to the user. The user's voice information in response to the third prompt information helps determine the user's real intention, and performing voice recognition on that response makes the voice recognition result more accurate.
In practical application, the expression form of the third prompt message is not limited to voice or text, and when the target voice recognition model recognizes the voice message of the user based on the third prompt message, the fifth voice recognition result can be obtained by combining the third prompt message.
In one example, the first speech recognition result comprises two results, "call A1" and "call A2", where A1 and A2 are names of people. Because two results were recognized from the user's current voice information, it cannot be determined which person the user intends to call, so a prompt can be generated from the two names A1 and A2 in the address book; for example, the third prompt information may be "Call A2?". If the user answers "yes", the fifth speech recognition result, "call A2", accurately identifies the person the user really wants to call and is more accurate than the first speech recognition result.
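The disambiguation step can be sketched as a loop of yes/no confirmations over the candidate names. The `ask` callback stands in for presenting the third prompt and capturing the user's reply as text; the function name `disambiguate`, the prompt wording, and the accepted answers are assumptions.

```python
from typing import Callable, List, Optional


def disambiguate(candidates: List[str], ask: Callable[[str], str]) -> Optional[str]:
    """Ask a yes/no question for each candidate until one is confirmed.

    Returns the confirmed candidate (the fifth result), or None if the
    user confirms none of them.
    """
    for name in candidates:
        reply = ask(f"Call {name}?").strip().lower()
        if reply in {"yes", "y"}:
            return name
    return None
```

Asking about one candidate at a time keeps each reply within a tiny vocabulary, which is in the spirit of using prompts to constrain what the user says.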
Based on the same principle as the speech recognition method shown in Fig. 1, an embodiment of the invention further provides a speech recognition apparatus 20. As shown in Fig. 2, the speech recognition apparatus 20 may include a voice information obtaining module 210 and a recognition model matching module 220, wherein:
a voice information obtaining module 210, configured to obtain current voice information of a user;
and the recognition model matching module 220 is configured to recognize the current voice information, and if a target keyword is recognized in the current voice information, determine, as the target voice recognition model, a voice recognition model corresponding to the target keyword from among the at least two pre-configured voice recognition models.
According to the scheme in this optional embodiment of the invention, the voice recognition model corresponding to the target keyword can be determined from the target keyword in the current voice information of the user. Because each voice recognition model corresponds to a target keyword, the matching model can be found quickly. When the current voice information is then recognized with that model, the recognition accuracy can be improved, the recognition time can be shortened, and the recognition efficiency can be improved.
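As a rough sketch of how the recognition model matching module might map a target keyword to one of several pre-configured models, consider the following. The registry `MODEL_BY_KEYWORD`, the model names, and the use of a text transcript in place of audio are hypothetical and chosen only for illustration.

```python
from typing import Dict, Optional

# Hypothetical registry of pre-configured models, keyed by target keyword.
MODEL_BY_KEYWORD: Dict[str, str] = {
    "call": "phone_model",
    "navigate": "navigation_model",
}


def select_target_model(transcript: str) -> Optional[str]:
    """Return the model registered for the first keyword found, else None."""
    for word in transcript.lower().split():
        if word in MODEL_BY_KEYWORD:
            return MODEL_BY_KEYWORD[word]
    # No target keyword was recognized in the input.
    return None
```

For instance, `select_target_model("Call A2")` selects the placeholder `phone_model`, while an input with no registered keyword yields `None`.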
In an alternative embodiment of the present invention, the target speech recognition model is a speech recognition model created from a corpus corresponding to the target keyword.
In an optional embodiment of the present invention, the apparatus provided in the embodiment of the present invention may further include:
and the first voice recognition processing module is used for recognizing the current voice information according to the target voice recognition model after determining the voice recognition model corresponding to the target keyword to obtain a first voice recognition result, and performing processing corresponding to the first voice recognition result according to the first voice recognition result.
In an optional embodiment of the present invention, on the basis of the foregoing embodiment, the first voice recognition processing module may be specifically configured to:
according to the target voice recognition model, recognizing the part of the current voice information other than the already recognized voice information to obtain a second voice recognition result, wherein the recognized voice information includes the target keyword;
and obtaining a first voice recognition result according to the second voice recognition result and the recognized voice information.
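The two steps above can be sketched as follows, under the assumption that text strings stand in for audio segments and that the target model is any callable taking a segment and returning its recognition result. The names `split_at_keyword` and `recognize_with_target` are hypothetical.

```python
from typing import Callable, Tuple


def split_at_keyword(text: str, keyword: str) -> Tuple[str, str]:
    """Split text into the already recognized part (ending with the
    keyword) and the remaining part still to be recognized."""
    idx = text.find(keyword)
    if idx < 0:
        raise ValueError("keyword not found")
    end = idx + len(keyword)
    return text[:end], text[end:]


def recognize_with_target(text: str, keyword: str,
                          target_model: Callable[[str], str]) -> str:
    recognized, remainder = split_at_keyword(text, keyword)
    # Second result: only the remainder is passed to the target model.
    second_result = target_model(remainder)
    # First result: recognized part combined with the second result.
    return f"{recognized} {second_result}".strip()
```

Passing only the remainder to the target model avoids recognizing the keyword portion a second time, which is consistent with the shortened recognition time the description mentions.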
In an optional embodiment of the present invention, on the basis of the foregoing embodiment, the apparatus provided in the embodiment of the present invention may further include:
the first prompt information providing module is used for providing first prompt information corresponding to the target keyword for the user according to the target keyword after the voice recognition model corresponding to the target keyword is determined as the target voice recognition model;
the first voice information receiving module is used for acquiring voice information of a user based on first prompt information;
and the second voice recognition processing module is used for recognizing the voice information of the user based on the first prompt information through the target voice recognition model to obtain a third voice recognition result, and performing corresponding processing according to the third voice recognition result.
In an optional embodiment of the invention, if any of the following conditions exists: the third voice recognition result does not correspond to the first prompt information; no voice information of the user based on the first prompt information is received within a preset time length; or recognition of the voice information of the user based on the first prompt information through the target voice recognition model fails; the apparatus provided by the embodiment of the invention may further include:
and the second prompt information providing module is used for providing second prompt information corresponding to the prompt strategy to the user according to the pre-configured prompt strategy.
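The three trigger conditions for the second prompt information can be expressed as a single check. This is a sketch under stated assumptions: a failed recognition is represented by `None`, and "corresponds to the first prompt information" is approximated by membership in a set of expected answers. Both conventions, and the function name `needs_second_prompt`, are hypothetical.

```python
from typing import Optional, Set


def needs_second_prompt(third_result: Optional[str],
                        expected_answers: Set[str],
                        timed_out: bool) -> bool:
    """Return True if second prompt information should be provided."""
    if timed_out:
        # No voice information received within the preset time length.
        return True
    if third_result is None:
        # Recognition through the target model failed.
        return True
    # The result does not correspond to the first prompt information.
    return third_result not in expected_answers
```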
In an optional embodiment of the present invention, on the basis of the foregoing embodiment, the apparatus provided in the embodiment of the present invention may further include:
and the third voice recognition processing module is used for recognizing the current voice information according to the pre-configured general recognition model when the target keyword is not recognized in the current voice information, obtaining a fourth voice recognition result and carrying out corresponding processing according to the fourth voice recognition result.
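The fallback to the general recognition model can be sketched as a simple dispatch. As before, text stands in for audio, and the names `recognize`, `models`, and `general` are assumptions introduced for illustration rather than parts of the described apparatus.

```python
from typing import Callable, Dict, Tuple

Recognizer = Callable[[str], str]


def recognize(text: str,
              keyword_models: Dict[str, Recognizer],
              general_model: Recognizer) -> Tuple[str, str]:
    """Return (route, result): the matched keyword and its model's result,
    or ('general', result) when no target keyword is present."""
    words = text.lower().split()
    for keyword, model in keyword_models.items():
        if keyword in words:
            return keyword, model(text)
    # No target keyword recognized: use the pre-configured general model.
    return "general", general_model(text)


# Placeholder recognizers that only tag their input.
models: Dict[str, Recognizer] = {"call": lambda t: f"phone:{t}"}
general: Recognizer = lambda t: f"general:{t}"
```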
In an optional embodiment of the present invention, on the basis of the foregoing embodiment, the apparatus provided in the embodiment of the present invention may further include:
and the prompt information acquisition module is used for acquiring prompt information corresponding to the prompt strategy according to the pre-configured prompt strategy.
In an optional embodiment of the present invention, on the basis of the foregoing embodiment, the apparatus provided in the embodiment of the present invention may further include:
the third prompt information providing module is used for providing third prompt information corresponding to the first voice recognition result for the user according to the first voice recognition result;
the second voice information receiving module is used for acquiring the voice information of the user based on the third prompt information;
and the third voice recognition processing module is used for recognizing the voice information of the user based on the third prompt information through the target voice recognition model to obtain a fifth voice recognition result, and performing corresponding processing according to the fifth voice recognition result.
It can be understood that, since the speech recognition apparatus described in this embodiment is an apparatus capable of executing the speech recognition method in the alternative embodiment of the present invention, based on the speech recognition method described in the alternative embodiment of the present invention, a person skilled in the art can understand the specific implementation manner of the speech recognition apparatus of this embodiment and various variations thereof, and therefore, a detailed description of how the speech recognition apparatus implements the speech recognition method in the alternative embodiment of the present invention is not provided here. The scope of the present invention is intended to encompass any device that can be used by those skilled in the art to implement the speech recognition method in the alternative embodiments of the present invention.
An embodiment of the present invention provides an electronic device. As shown in Fig. 3, the electronic device may include: at least one processor 31; and at least one memory 32 and a bus 33 connected to the processor 31; wherein,
the processor 31 and the memory 32 communicate with each other through the bus 33;
the processor 31 is arranged to call program instructions in the memory 32 to perform the steps in the above-described method embodiments.
The present embodiments provide a computer-readable storage medium storing computer instructions for causing a computer to perform a method as provided by any of the above method embodiments.
As will be appreciated by one skilled in the art, embodiments of the present invention may be provided as a method, system, or computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present invention may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present invention is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
In a typical configuration, a computing device includes one or more processors (CPUs), input/output interfaces, network interfaces, and memory.
The memory may include volatile memory in a computer-readable medium, such as Random Access Memory (RAM), and/or non-volatile memory, such as Read Only Memory (ROM) or flash memory (flash RAM). The memory is an example of a computer-readable medium.
Computer-readable media, including both permanent and non-permanent, removable and non-removable media, may implement information storage by any method or technology. The information may be computer-readable instructions, data structures, modules of a program, or other data. Examples of computer storage media include, but are not limited to, phase change memory (PRAM), Static Random Access Memory (SRAM), Dynamic Random Access Memory (DRAM), other types of Random Access Memory (RAM), Read Only Memory (ROM), Electrically Erasable Programmable Read Only Memory (EEPROM), flash memory or other memory technology, compact disc read only memory (CD-ROM), Digital Versatile Discs (DVD) or other optical storage, magnetic cassettes, magnetic tape or magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information that can be accessed by a computing device. As defined herein, computer-readable media do not include transitory computer-readable media such as modulated data signals and carrier waves.
It should also be noted that the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a …" does not exclude the presence of additional identical elements in the process, method, article, or apparatus that comprises the element.
The above are merely examples of the present invention, and are not intended to limit the present invention. Various modifications and alterations to this invention will become apparent to those skilled in the art. Any modification, equivalent replacement, improvement, etc. made within the spirit and principle of the present invention should be included in the scope of the claims of the present invention.
Claims (10)
1. A speech recognition method, comprising:
acquiring current voice information of a user, and identifying the current voice information;
and if the target keyword is identified in the current voice information, determining a voice recognition model corresponding to the target keyword as a target voice recognition model in at least two pre-configured voice recognition models.
2. The method of claim 1, wherein the target speech recognition model is a speech recognition model created from a corpus corresponding to the target keyword.
3. The method of claim 1 or 2, wherein after determining the speech recognition model corresponding to the target keyword as a target speech recognition model, the method further comprises:
recognizing the current voice information according to the target voice recognition model to obtain a first voice recognition result;
and carrying out corresponding processing according to the first voice recognition result.
4. The method of claim 3, wherein the recognizing the current speech information according to the target speech recognition model to obtain a first speech recognition result comprises:
according to the target voice recognition model, recognizing information except recognized voice information in the current voice information to obtain a second voice recognition result, wherein the recognized voice information comprises the target keyword;
and obtaining the first voice recognition result according to the second voice recognition result and the recognized voice information.
5. The method of claim 1 or 2, wherein after determining the speech recognition model corresponding to the target keyword as a target speech recognition model, the method further comprises:
providing first prompt information corresponding to the target keyword to the user according to the target keyword;
acquiring voice information of the user based on the first prompt message;
recognizing the voice information of the user based on the first prompt information through the target voice recognition model to obtain a third voice recognition result;
and carrying out corresponding processing according to the third voice recognition result.
6. The method of claim 5, wherein if any of the following conditions exists: the third voice recognition result does not correspond to the first prompt information; the voice information of the user based on the first prompt information is not received within a preset time length; or recognition of the voice information of the user based on the first prompt information through the target voice recognition model fails;
the method further comprises the following steps:
and providing second prompt information corresponding to the prompt strategy to the user according to the pre-configured prompt strategy.
7. The method of claim 1 or 2, wherein if the target keyword is not identified in the current voice information, the method further comprises:
recognizing the current voice information according to a pre-configured general recognition model to obtain a fourth voice recognition result;
and carrying out corresponding processing according to the fourth voice recognition result.
8. A speech recognition apparatus, comprising:
the voice information acquisition module is used for acquiring the current voice information of the user; identifying the current voice information;
and the recognition model matching module is used for recognizing the current voice information, and if a target keyword is recognized in the current voice information, determining a voice recognition model corresponding to the target keyword as a target voice recognition model in at least two pre-configured voice recognition models.
9. An electronic device, comprising:
at least one processor;
and at least one memory and a bus connected to the processor; wherein,
the processor and the memory complete mutual communication through the bus;
the processor is configured to invoke program instructions in the memory to perform the method of any one of claims 1 to 7.
10. A computer-readable storage medium storing computer instructions that cause a computer to perform the method of any one of claims 1 to 7.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201811004170.1A CN109065045A (en) | 2018-08-30 | 2018-08-30 | Audio recognition method, device, electronic equipment and computer readable storage medium |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201811004170.1A CN109065045A (en) | 2018-08-30 | 2018-08-30 | Audio recognition method, device, electronic equipment and computer readable storage medium |
Publications (1)
Publication Number | Publication Date |
---|---|
CN109065045A (en) | 2018-12-21 |
Family
ID=64758729
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201811004170.1A Pending CN109065045A (en) | 2018-08-30 | 2018-08-30 | Audio recognition method, device, electronic equipment and computer readable storage medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN109065045A (en) |
Cited By (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111048091A (en) * | 2019-12-30 | 2020-04-21 | 苏州思必驰信息科技有限公司 | Voice recognition method, voice recognition equipment and computer readable storage medium |
CN111539744A (en) * | 2019-01-21 | 2020-08-14 | 北京嘀嘀无限科技发展有限公司 | Data processing method and device, electronic equipment and storage medium |
CN112532691A (en) * | 2020-11-06 | 2021-03-19 | 问问智能信息科技有限公司 | Information processing method and device |
CN113468368A (en) * | 2020-04-28 | 2021-10-01 | 海信集团有限公司 | Voice recording method, device, equipment and medium |
CN113808582A (en) * | 2020-06-17 | 2021-12-17 | 北京字节跳动网络技术有限公司 | Voice recognition method, device, equipment and storage medium |
Citations (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102074231A (en) * | 2010-12-30 | 2011-05-25 | 万音达有限公司 | Voice recognition method and system |
CN104535071A (en) * | 2014-12-05 | 2015-04-22 | 百度在线网络技术(北京)有限公司 | Voice navigation method and device |
CN105632487A (en) * | 2015-12-31 | 2016-06-01 | 北京奇艺世纪科技有限公司 | Voice recognition method and device |
CN105654943A (en) * | 2015-10-26 | 2016-06-08 | 乐视致新电子科技(天津)有限公司 | Voice wakeup method, apparatus and system thereof |
CN105679314A (en) * | 2015-12-28 | 2016-06-15 | 百度在线网络技术(北京)有限公司 | Speech recognition method and device |
CN107644642A (en) * | 2017-09-20 | 2018-01-30 | 广东欧珀移动通信有限公司 | Method for recognizing semantics, device, storage medium and electronic equipment |
CN108304375A (en) * | 2017-11-13 | 2018-07-20 | 广州腾讯科技有限公司 | A kind of information identifying method and its equipment, storage medium, terminal |
- 2018-08-30 CN CN201811004170.1A patent/CN109065045A/en active Pending
Patent Citations (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102074231A (en) * | 2010-12-30 | 2011-05-25 | 万音达有限公司 | Voice recognition method and system |
CN104535071A (en) * | 2014-12-05 | 2015-04-22 | 百度在线网络技术(北京)有限公司 | Voice navigation method and device |
CN105654943A (en) * | 2015-10-26 | 2016-06-08 | 乐视致新电子科技(天津)有限公司 | Voice wakeup method, apparatus and system thereof |
CN105679314A (en) * | 2015-12-28 | 2016-06-15 | 百度在线网络技术(北京)有限公司 | Speech recognition method and device |
CN105632487A (en) * | 2015-12-31 | 2016-06-01 | 北京奇艺世纪科技有限公司 | Voice recognition method and device |
CN107644642A (en) * | 2017-09-20 | 2018-01-30 | 广东欧珀移动通信有限公司 | Method for recognizing semantics, device, storage medium and electronic equipment |
CN108304375A (en) * | 2017-11-13 | 2018-07-20 | 广州腾讯科技有限公司 | A kind of information identifying method and its equipment, storage medium, terminal |
Cited By (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111539744A (en) * | 2019-01-21 | 2020-08-14 | 北京嘀嘀无限科技发展有限公司 | Data processing method and device, electronic equipment and storage medium |
CN111539744B (en) * | 2019-01-21 | 2023-08-29 | 北京嘀嘀无限科技发展有限公司 | Data processing method, device, electronic equipment and storage medium |
CN111048091A (en) * | 2019-12-30 | 2020-04-21 | 苏州思必驰信息科技有限公司 | Voice recognition method, voice recognition equipment and computer readable storage medium |
CN113468368A (en) * | 2020-04-28 | 2021-10-01 | 海信集团有限公司 | Voice recording method, device, equipment and medium |
CN113808582A (en) * | 2020-06-17 | 2021-12-17 | 北京字节跳动网络技术有限公司 | Voice recognition method, device, equipment and storage medium |
CN113808582B (en) * | 2020-06-17 | 2024-04-09 | 抖音视界有限公司 | Speech recognition method, device, equipment and storage medium |
CN112532691A (en) * | 2020-11-06 | 2021-03-19 | 问问智能信息科技有限公司 | Information processing method and device |
CN112532691B (en) * | 2020-11-06 | 2024-09-24 | 问问智能信息科技有限公司 | Information processing method and device |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN109065045A (en) | Audio recognition method, device, electronic equipment and computer readable storage medium | |
CN109616108B (en) | Multi-turn dialogue interaction processing method and device, electronic equipment and storage medium | |
CN109036424A (en) | Audio recognition method, device, electronic equipment and computer readable storage medium | |
KR102112814B1 (en) | Parameter collection and automatic dialog generation in dialog systems | |
CN112527353B (en) | Online marketplace for enhancing plug-ins for dialog systems | |
CN106462565B (en) | Text is updated in document | |
CN103077714B (en) | Information identification method and apparatus | |
KR102046728B1 (en) | Method and device for identifying time information from voice information | |
US10249296B1 (en) | Application discovery and selection in language-based systems | |
WO2018213740A1 (en) | Action recipes for a crowdsourced digital assistant system | |
CN110459222A (en) | Sound control method, phonetic controller and terminal device | |
US20140278343A1 (en) | Assistive agent | |
CN105469789A (en) | Voice information processing method and voice information processing terminal | |
CN107733722B (en) | Method and apparatus for configuring voice service | |
CN103987130A (en) | Terminal access method, device and system based on WIFI equipment | |
CN115952272B (en) | Method, device and equipment for generating dialogue information and readable storage medium | |
CN104038630A (en) | Speech processing method and device | |
CN109979450B (en) | Information processing method and device and electronic equipment | |
US12093707B2 (en) | Action recipes for a crowdsourced digital assistant system | |
CN110930117A (en) | Artificial intelligence micro service system | |
CN111309857A (en) | Processing method and processing device | |
CN111144132A (en) | Semantic recognition method and device | |
CN109857450B (en) | Verification service arrangement method and device | |
WO2021112822A1 (en) | Intent addition for a chatbot | |
CN112784030B (en) | Method and device for generating sample, storage medium and electronic equipment |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication | Application publication date: 20181221 |