
CN111681655A - Voice control method and device, electronic equipment and storage medium - Google Patents


Info

Publication number
CN111681655A
Authority
CN
China
Prior art keywords
signal
information
electronic device
voice control
electronic equipment
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010437874.9A
Other languages
Chinese (zh)
Inventor
王杰
陈孝良
李智勇
常乐
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing SoundAI Technology Co Ltd
Original Assignee
Beijing SoundAI Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing SoundAI Technology Co Ltd
Priority to CN202010437874.9A
Publication of CN111681655A
Legal status: Pending (current)


Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/22 Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G10L2015/223 Execution procedure of a spoken command

Landscapes

  • Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • User Interface Of Digital Computer (AREA)

Abstract

The disclosure provides a voice control method, a voice control apparatus, an electronic device and a storage medium, and belongs to the technical field of artificial intelligence. The method includes: collecting sound while a voice control module of the electronic device is in a dormant state; in response to collecting a first human voice signal, acquiring environment information of the environment in which the electronic device is located; in response to determining, based on the environment information, that the first human voice signal is intended to trigger the electronic device to perform a target operation, waking up the voice control module; and executing, through the voice control module, the operation corresponding to the first human voice signal. Because the environment information is acquired when the first human voice signal is collected and the voice control module is woken up based on that information, the operation corresponding to the first human voice signal can be executed without a wake-up word, which simplifies the voice control process and improves voice control efficiency.

Description

Voice control method and device, electronic equipment and storage medium
Technical Field
The present disclosure relates to the field of artificial intelligence technologies, and in particular, to a voice control method and apparatus, an electronic device, and a storage medium.
Background
With the development of artificial intelligence, more and more electronic devices have a voice control function; that is, the user can control the electronic device to perform some operations through voice. For example, the user may control the electronic device to play music or query for weather, etc.
In the related art, when a user controls an electronic device to execute an operation, the user needs to wake up a voice control module of the electronic device through a target wake-up word, and then the electronic device detects a voice signal of the user and executes an operation corresponding to the voice signal.
However, the voice control module can be started only after the wake-up word is detected, before the electronic device can receive the voice signal. This does not match the natural way people interact, so the voice control process is cumbersome and voice control efficiency is low.
Disclosure of Invention
The embodiments of the disclosure provide a voice control method, a voice control apparatus, an electronic device and a storage medium, which can improve the efficiency of voice control. The technical solutions are as follows:
In one aspect, a voice control method is provided, the method including:
collecting sound while a voice control module of an electronic device is in a dormant state;
in response to collecting a first human voice signal, acquiring environment information of the environment in which the electronic device is located;
in response to determining, based on the environment information, that the first human voice signal is intended to trigger the electronic device to perform a target operation, waking up the voice control module; and
executing, through the voice control module, an operation corresponding to the first human voice signal based on the first human voice signal.
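The four steps above can be sketched as a small state machine. This is a minimal, non-authoritative illustration: `VoiceController`, `get_environment`, `is_directed_at_device` and `execute` are hypothetical names, and returning to the dormant state after execution is an assumption the text does not state.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class VoiceController:
    """Wake-word-free flow: dormant -> environment check -> wake -> execute.

    All names here are hypothetical; the patent does not define an API.
    """

    # Returns environment information (e.g. speaker count, noise level) for a signal.
    get_environment: Callable[[bytes], dict]
    # Decides from that information whether the signal is meant for the device.
    is_directed_at_device: Callable[[dict], bool]
    # Executes the operation for a signal once the voice control module is awake.
    execute: Callable[[bytes], str]
    dormant: bool = True
    log: List[str] = field(default_factory=list)

    def on_human_voice(self, signal: bytes) -> Optional[str]:
        # Step 1: this path only runs while the voice control module sleeps.
        if not self.dormant:
            return None
        # Step 2: acquire environment information for the collected voice.
        env = self.get_environment(signal)
        # Step 3: wake only if the environment suggests the device is addressed.
        if not self.is_directed_at_device(env):
            self.log.append("ignored")
            return None
        self.dormant = False
        self.log.append("woken")
        try:
            # Step 4: execute the operation for this same signal (no wake word).
            return self.execute(signal)
        finally:
            # Assumption: the module returns to the dormant state afterwards.
            self.dormant = True
```

The environment check is injected as a callable so that any of the criteria described below (speaker count, noise level, voice-to-noise ratio) can be plugged in without changing the control flow.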
In a possible implementation, acquiring the environment information of the environment in which the electronic device is located includes: determining, based on the first human voice signal, the number of target users who emitted the first human voice signal; and taking the number of target users as the environment information.
In another possible implementation, determining, based on the first human voice signal, the number of target users who emitted it includes: determining, based on the first human voice signal, azimuth information of the target users relative to the electronic device, and determining the number of target users according to the azimuth information; or,
determining, based on the first human voice signal, voiceprint information included in the first human voice signal, and determining the number of target users according to the voiceprint information.
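The voiceprint alternative can be sketched as greedy clustering of speaker embeddings. This is an illustration under stated assumptions: it presumes some speaker-embedding model (not specified in the text) has already turned each voice segment into a vector, and the 0.8 cosine-similarity threshold and the function names are hypothetical.

```python
import math
from typing import List, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def count_speakers(embeddings: Sequence[Sequence[float]], threshold: float = 0.8) -> int:
    """Greedy clustering: an embedding starts a new speaker unless it is at
    least `threshold`-similar to a speaker already found (hypothetical rule)."""
    speakers: List[Sequence[float]] = []
    for emb in embeddings:
        if not any(cosine_similarity(emb, ref) >= threshold for ref in speakers):
            speakers.append(emb)
    return len(speakers)
```

Greedy assignment keeps the sketch short; a production system would more likely use a tuned clustering method, since the result depends on the order of segments.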
In another possible implementation, acquiring the environment information of the environment in which the electronic device is located includes: acquiring noise information, where the noise information is collected by the electronic device while it collects the first human voice signal; and taking the noise information as the environment information.
In another possible implementation manner, the method further includes:
determining a first decibel value of the noise information; and in response to the first decibel value being lower than a first preset decibel value, determining that the first human voice signal is intended to trigger the electronic device to perform the target operation.
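The first-decibel check can be sketched as an RMS level measurement followed by a threshold comparison. The sketch assumes 16-bit PCM noise samples and measures the level in dBFS; the -40 dBFS preset and the function names are hypothetical, since the text gives no reference level or threshold.

```python
import math
from typing import Sequence

INT16_FULL_SCALE = 32768.0  # assumption: 16-bit PCM samples


def level_dbfs(samples: Sequence[int]) -> float:
    """RMS level in dB relative to full scale (0 dBFS = full-scale RMS)."""
    if not samples:
        return float("-inf")
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    if rms == 0.0:
        return float("-inf")
    return 20.0 * math.log10(rms / INT16_FULL_SCALE)


def quiet_enough(noise_samples: Sequence[int], first_preset_db: float = -40.0) -> bool:
    """True if the noise level (first decibel value) is below the preset."""
    return level_dbfs(noise_samples) < first_preset_db
```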
In another possible implementation manner, the method further includes:
determining a first decibel value of the noise information and a second decibel value of the first human voice signal; and
in response to the first decibel value being smaller than the second decibel value and the ratio of the second decibel value to the first decibel value being larger than a preset value, determining that the first human voice signal is intended to trigger the electronic device to perform the target operation.
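The ratio rule translates almost directly into code. A minimal sketch, assuming both levels are reported on a positive scale such as dB SPL (a ratio of negative dBFS values would not be meaningful); the preset ratio of 1.5 and the function name are hypothetical.

```python
def voice_stands_out(noise_db: float, voice_db: float, preset_ratio: float = 1.5) -> bool:
    """Rule from the text: noise < voice and voice / noise > preset value.

    Assumes both levels are on a positive scale (e.g. dB SPL).
    """
    if noise_db <= 0.0:
        raise ValueError("noise level must be positive for the ratio test")
    return noise_db < voice_db and voice_db / noise_db > preset_ratio
```

For example, 70 dB of speech over 40 dB of noise gives a ratio of 1.75 and passes, while 70 dB over 60 dB gives about 1.17 and does not.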
In another possible implementation manner, before the obtaining of the environment information of the environment where the electronic device is located, the method further includes:
performing intent recognition on the first human voice signal to obtain intent information of the first human voice signal; determining, according to the intent information, a target application program for executing the intent; and in response to the target application program being installed on the electronic device, performing the step of acquiring the environment information of the environment in which the electronic device is located.
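The intent gate can be sketched as a lookup from recognized intent to application. This is an assumption-heavy illustration: it works on an already-transcribed text string, uses a keyword table in place of a real intent recognizer, and the intent names and application IDs (`com.example.maps` and so on) are hypothetical.

```python
from typing import Iterable, Optional

# Hypothetical keyword table standing in for a real intent recognizer.
INTENT_KEYWORDS = {
    "navigate": ("navigate", "go to"),
    "play_music": ("play", "song"),
    "weather": ("weather",),
}
# Hypothetical mapping from intent to the application that executes it.
INTENT_TO_APP = {
    "navigate": "com.example.maps",
    "play_music": "com.example.music",
    "weather": "com.example.weather",
}


def recognize_intent(text: str) -> Optional[str]:
    """Return the first intent whose keywords appear in the transcript."""
    lowered = text.lower()
    for intent, keywords in INTENT_KEYWORDS.items():
        if any(keyword in lowered for keyword in keywords):
            return intent
    return None


def should_check_environment(text: str, installed_apps: Iterable[str]) -> bool:
    """Proceed to the environment check only if an installed app handles the intent."""
    intent = recognize_intent(text)
    if intent is None:
        return False
    app = INTENT_TO_APP.get(intent)
    return app is not None and app in set(installed_apps)
```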
In another possible implementation manner, before the obtaining of the environment information of the environment where the electronic device is located, the method further includes:
in response to no second human voice signal being collected within a preset time period before the first human voice signal is collected, performing the step of acquiring the environment information of the environment in which the electronic device is located.
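The preset-period condition can be sketched as a gate that remembers when the last human voice signal was collected. The class name and the 5-second period are hypothetical; timestamps are passed in explicitly (for example from `time.monotonic()`) so the logic is easy to test.

```python
from typing import Optional


class SilenceGate:
    """Allow the environment check only if no human voice was collected
    within `preset_period_s` seconds before the current one (hypothetical)."""

    def __init__(self, preset_period_s: float = 5.0) -> None:
        self.preset_period_s = preset_period_s
        self._last_voice_time: Optional[float] = None

    def should_check_environment(self, now: float) -> bool:
        # True if there was no earlier voice, or it was long enough ago.
        quiet_before = (
            self._last_voice_time is None
            or now - self._last_voice_time > self.preset_period_s
        )
        # Record this voice signal as the most recent one.
        self._last_voice_time = now
        return quiet_before
```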
In another possible implementation, executing, through the voice control module, the operation corresponding to the first human voice signal based on the first human voice signal includes: sending the first human voice signal to a server through the voice control module, and receiving an operation instruction corresponding to the first human voice signal returned by the server; and executing the operation corresponding to the operation instruction.
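The server round trip can be sketched as "send audio, receive instruction, dispatch to a local handler". The JSON reply shape (`instruction`, `args`), the `send` transport callable and the handler table are all hypothetical; the text does not specify a protocol.

```python
import json
from typing import Callable, Dict


def execute_via_server(
    signal: bytes,
    send: Callable[[bytes], bytes],
    handlers: Dict[str, Callable[[dict], str]],
) -> str:
    """Send the voice signal, parse the returned instruction, run its handler."""
    # Hypothetical reply format: {"instruction": "...", "args": {...}}
    reply = json.loads(send(signal).decode("utf-8"))
    instruction = reply["instruction"]
    handler = handlers.get(instruction)
    if handler is None:
        raise ValueError(f"unsupported instruction: {instruction}")
    return handler(reply.get("args", {}))
```

With a stub `send` that returns `{"instruction": "navigate", "args": {"destination": "Beijing West Railway Station"}}`, the navigation handler receives the destination and performs the operation locally.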
In another aspect, a voice control apparatus is provided, the apparatus comprising:
a collection module, configured to collect sound while the voice control module of the electronic device is in a dormant state;
an obtaining module, configured to acquire, in response to the collection of a first human voice signal, environment information of the environment in which the electronic device is located;
a wake-up module, configured to wake up the voice control module in response to determining, based on the environment information, that the first human voice signal is intended to trigger the electronic device to perform a target operation; and
an execution module, configured to execute, through the voice control module, an operation corresponding to the first human voice signal based on the first human voice signal.
In one possible implementation, the obtaining module is further configured to determine, based on the first human voice signal, the number of target users who emitted the first human voice signal, and to take the number of target users as the environment information.
In another possible implementation, the obtaining module is further configured to determine, based on the first human voice signal, azimuth information of the target users relative to the electronic device, and to determine the number of target users according to the azimuth information; or,
to determine, based on the first human voice signal, voiceprint information included in the first human voice signal, and to determine the number of target users according to the voiceprint information.
In another possible implementation, the obtaining module is further configured to obtain noise information collected by the electronic device while it collects the first human voice signal, and to take the noise information as the environment information.
In another possible implementation, the obtaining module is further configured to determine a first decibel value of the noise information, and, in response to the first decibel value being lower than a first preset decibel value, to determine that the first human voice signal is intended to trigger the electronic device to perform the target operation.
In another possible implementation, the obtaining module is further configured to determine a first decibel value of the noise information and a second decibel value of the first human voice signal, and, in response to the first decibel value being smaller than the second decibel value and the ratio of the second decibel value to the first decibel value being larger than a preset value, to determine that the first human voice signal is intended to trigger the electronic device to perform the target operation.
In another possible implementation, the execution module is further configured to perform intent recognition on the first human voice signal to obtain intent information of the first human voice signal; to determine, according to the intent information, a target application program for executing the intent; and, in response to the target application program being installed on the electronic device, to perform the step of acquiring the environment information of the environment in which the electronic device is located.
In another possible implementation, the execution module is further configured to perform the step of acquiring the environment information of the environment in which the electronic device is located in response to no second human voice signal being collected within a preset time period before the first human voice signal is collected.
In another possible implementation, the execution module is further configured to send the first human voice signal to a server through the voice control module, to receive an operation instruction corresponding to the first human voice signal returned by the server, and to execute the operation corresponding to the operation instruction.
In another aspect, an electronic device is provided, and the electronic device includes a processor and a memory, where the memory stores at least one instruction, and the instruction is loaded and executed by the processor to implement the operations performed in the voice control method in any one of the above possible implementations.
In another aspect, a computer-readable storage medium is provided, where at least one instruction is stored, and the instruction is loaded and executed by a processor to implement the operations performed by the electronic device in the voice control method in any one of the above possible implementation manners.
In another aspect, a computer program product is provided, including at least one computer program which, when executed by a processor, implements the operations performed by the electronic device in the voice control method in any one of the above possible implementations.
The technical scheme provided by the embodiment of the disclosure has the following beneficial effects:
In the embodiments of the disclosure, sound is collected while the voice control module of the electronic device is in a dormant state; in response to collecting a first human voice signal, environment information of the environment in which the electronic device is located is acquired; in response to determining, based on the environment information, that the first human voice signal is intended to trigger the electronic device to perform a target operation, the voice control module is woken up; and the operation corresponding to the first human voice signal is executed through the voice control module. Because the environment information is acquired when the first human voice signal is collected and the voice control module is woken up based on that information, no specific wake-up word is needed during voice control, and the operation corresponding to the first human voice signal can be executed directly. This simplifies the voice control process and improves voice control efficiency.
Drawings
To illustrate the technical solutions in the embodiments of the present disclosure more clearly, the drawings used in describing the embodiments are briefly introduced below. The drawings described below show only some embodiments of the present disclosure; those of ordinary skill in the art may derive other drawings from them without creative effort.
FIG. 1 is a schematic diagram of an implementation environment provided by an embodiment of the present disclosure;
FIG. 2 is a flowchart of a voice control method provided by an embodiment of the present disclosure;
FIG. 3 is a flowchart of another voice control method provided by an embodiment of the present disclosure;
FIG. 4 is a schematic diagram of a voice control method provided by an embodiment of the present disclosure;
FIG. 5 is a flowchart of another voice control method provided by an embodiment of the present disclosure;
FIG. 6 is a flowchart of another voice control method provided by an embodiment of the present disclosure;
FIG. 7 is a flowchart of another voice control method provided by an embodiment of the present disclosure;
FIG. 8 is a flowchart of another voice control method provided by an embodiment of the present disclosure;
FIG. 9 is a flowchart of another voice control method provided by an embodiment of the present disclosure;
FIG. 10 is a block diagram of a voice control apparatus provided by an embodiment of the present disclosure;
FIG. 11 is a schematic structural diagram of an electronic device provided by an embodiment of the present disclosure.
Detailed Description
To make the objects, technical solutions and advantages of the present disclosure more apparent, embodiments of the present disclosure will be described in detail with reference to the accompanying drawings.
FIG. 1 is a schematic diagram of an implementation environment provided by an embodiment of the present disclosure. Referring to FIG. 1, the implementation environment includes an electronic device 101 and a server 102, which are connected through a wireless or wired network. An intelligent voice application served by the server 102 may be installed on the electronic device 101, and a user of the electronic device 101 may use the intelligent voice application for functions such as voice control, voice interaction and data transmission.
The electronic device 101 may be a computer, a mobile phone, a tablet computer, an intelligent robot, a smart speaker, a smart home device, a smart toy, a vehicle-mounted terminal, a television box or another device with a voice control function. The intelligent voice application may be an application in the operating system of the electronic device 101 or an application provided by a third party; for example, it may be an intelligent voice assistant. The server 102 may be a background server of the intelligent voice application; for example, the server 102 may be a voice recognition server, through which the intelligent voice application recognizes the operation instruction corresponding to a human voice signal.
The electronic device 101 may interact with the user by voice through the intelligent voice application; that is, the electronic device 101 receives the user's human voice signal through the intelligent voice application and performs the corresponding operation according to that signal. For example, the human voice signal received by the electronic device 101 through the intelligent voice application is "go to Beijing West Railway Station"; through the intelligent voice application, the electronic device 101 determines that the operation instruction for this signal is "open the navigation application on the electronic device and navigate to Beijing West Railway Station"; and the electronic device then executes the corresponding operation.
In this solution, the electronic device 101 may send the received first human voice signal to the server 102, and the server 102 determines that the operation instruction for the first human voice signal is "open the navigation application on the electronic device and navigate to Beijing West Railway Station".
It should be noted that the operation corresponding to the first human voice signal may be any operation the electronic device can perform. In one possible implementation, it may be a query operation performed by the electronic device, such as "query a route" or "look up knowledge". In another possible implementation, it may be the electronic device giving feedback on the intent information, such as "chat with the user". In yet another possible implementation, it may be the electronic device opening a target application, such as "open the XX application".
Fig. 2 is a flowchart of a voice control method according to an embodiment of the present disclosure. Referring to fig. 2, the embodiment includes:
Step 201: The electronic device collects sound while its voice control module is in a dormant state.
Step 202: In response to collecting a first human voice signal, the electronic device acquires environment information of the environment in which it is located.
Step 203: In response to determining, based on the environment information, that the first human voice signal is intended to trigger the electronic device to perform a target operation, the electronic device wakes up the voice control module.
Step 204: The electronic device executes, through the voice control module, the operation corresponding to the first human voice signal based on the first human voice signal.
In one possible implementation, acquiring the environment information of the environment in which the electronic device is located includes: determining, based on the first human voice signal, the number of target users who emitted it, and taking the number of target users as the environment information.
In another possible implementation, determining, based on the first human voice signal, the number of target users who emitted it includes: determining, based on the first human voice signal, azimuth information of the target users relative to the electronic device, and determining the number of target users according to the azimuth information; or,
determining, based on the first human voice signal, voiceprint information included in it, and determining the number of target users according to the voiceprint information.
In another possible implementation, acquiring the environment information of the environment in which the electronic device is located includes: acquiring noise information collected by the electronic device while it collects the first human voice signal, and taking the noise information as the environment information.
In another possible implementation manner, the method further includes:
determining a first decibel value of the noise information; and in response to the first decibel value being lower than the first preset decibel value, determining that the first human voice signal is intended to trigger the electronic device to perform the target operation.
In another possible implementation manner, the method further includes:
determining a first decibel value of the noise information and a second decibel value of the first human voice signal; and
in response to the first decibel value being smaller than the second decibel value and the ratio of the second decibel value to the first decibel value being larger than a preset value, determining that the first human voice signal is intended to trigger the electronic device to perform the target operation.
In another possible implementation manner, before obtaining environment information of an environment in which the electronic device is located, the method further includes:
performing intent recognition on the first human voice signal to obtain intent information of the first human voice signal; determining, according to the intent information, a target application program for executing the intent; and in response to the target application program being installed on the electronic device, performing the step of acquiring the environment information of the environment in which the electronic device is located.
In another possible implementation manner, before obtaining environment information of an environment in which the electronic device is located, the method further includes:
in response to no second human voice signal being collected within a preset time period before the first human voice signal is collected, performing the step of acquiring the environment information of the environment in which the electronic device is located.
In another possible implementation, executing, through the voice control module, the operation corresponding to the first human voice signal includes: sending the first human voice signal to a server through the voice control module, receiving the operation instruction corresponding to the first human voice signal returned by the server, and executing the target operation corresponding to the operation instruction.
In the embodiments of the disclosure, the electronic device collects sound while its voice control module is in a dormant state; in response to collecting a first human voice signal, it acquires environment information of the environment in which it is located; in response to determining, based on the environment information, that the first human voice signal is intended to trigger it to perform a target operation, it wakes up the voice control module; and it executes, through the voice control module, the operation corresponding to the first human voice signal. Because the electronic device acquires the environment information when it collects the first human voice signal and wakes up its voice control module based on that information, the operation corresponding to the first human voice signal can be executed without a specific wake-up word, which simplifies the voice control process and improves voice control efficiency.
FIG. 3 is a flowchart of another voice control method provided by an embodiment of the present disclosure. This embodiment takes the number of target users as the environment information by way of example. Referring to FIG. 3, the embodiment includes:
Step 301: The electronic device collects sound while its voice control module is in a dormant state.
In the embodiments of the disclosure, the sound may be a sound wave produced by the vibration of any object. For example, the sound may be a human voice signal emitted by the user, or a noise signal produced by friction between objects, air flow and the like. The sound source may be moving or stationary.
The electronic device may determine that the voice control module is in a dormant state either from the state of the voice control module or from the state of the electronic device itself.
In one possible implementation, in response to the voice control module not currently executing an operation corresponding to a voice signal, the electronic device determines that its voice control module is in a dormant state.
In this embodiment, the electronic device collects sound in its current environment only while the voice control module is not executing an operation corresponding to a voice signal. This prevents newly collected sound from conflicting with a voice-triggered operation the electronic device is already executing and makes sound collection more intelligent.
In another possible implementation, the electronic device determines that its voice control module is in a dormant state in response to the electronic device itself being in a dormant state. Here, the voice control module being in a dormant state may mean that the display screen of the electronic device is off. Correspondingly, the steps may be: the electronic device acquires state information of its display screen; in response to the display screen being off, it determines that the voice control module is in a dormant state; and it collects sound.
In this embodiment, the voice control module is determined to be dormant when the display screen is off. At that time the user is not operating the electronic device by touching the display screen, so voice control does not conflict with touch operations, which makes voice control more intelligent.
In another possible implementation, the voice control module being in a dormant state may mean that the electronic device is not currently performing any operation. Correspondingly, the steps may be: the electronic device acquires its operation information; in response to the electronic device not performing an operation, it determines that the voice control module is in a dormant state; and it collects sound.
In this embodiment, the voice control module is determined to be dormant when the electronic device is not currently performing an operation. Voice control then does not conflict with an operation the electronic device is executing, which makes voice control more intelligent.
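The three alternative dormancy criteria described above can be sketched as a single predicate with a selectable rule. The `DeviceState` fields and the criterion names are hypothetical; a real device would read them from its voice service, display and task state.

```python
from dataclasses import dataclass


@dataclass
class DeviceState:
    executing_voice_operation: bool  # voice control module busy with a voice command
    screen_on: bool                  # display screen lit
    executing_any_operation: bool    # device busy with any operation


def voice_module_dormant(state: DeviceState, criterion: str = "voice_idle") -> bool:
    """Apply one of the three alternative dormancy criteria (hypothetical names)."""
    if criterion == "voice_idle":
        return not state.executing_voice_operation
    if criterion == "screen_off":
        return not state.screen_on
    if criterion == "device_idle":
        return not state.executing_any_operation
    raise ValueError(f"unknown criterion: {criterion}")
```

Keeping the criteria separate mirrors the text, which presents them as alternative implementations rather than conditions to be combined.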
In one possible implementation, referring to FIG. 4, the electronic device may collect sound through voice activity detection (VAD). VAD identifies and removes silent periods in the sound and passes only the non-silent portions on to the electronic device, which saves bandwidth resources of the electronic device and makes sound collection more effective.
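The text does not say which VAD algorithm is used. As a stand-in, the simplest technique, a frame-energy threshold, is sketched below; it assumes 16-bit PCM at 16 kHz (so 160 samples = 10 ms per frame), and the -45 dBFS threshold and function names are hypothetical.

```python
import math
from typing import List, Sequence

INT16_FULL_SCALE = 32768.0  # assumption: 16-bit PCM input


def frame_level_dbfs(frame: Sequence[int]) -> float:
    """RMS level of one non-empty frame in dBFS (-inf for digital silence)."""
    rms = math.sqrt(sum(s * s for s in frame) / len(frame))
    return 20.0 * math.log10(rms / INT16_FULL_SCALE) if rms > 0.0 else float("-inf")


def drop_silence(samples: Sequence[int], frame_len: int = 160, threshold_db: float = -45.0) -> List[int]:
    """Energy-threshold VAD: keep only frames at or above `threshold_db`.

    160 samples is 10 ms at an assumed 16 kHz sample rate.
    """
    kept: List[int] = []
    for start in range(0, len(samples), frame_len):
        frame = samples[start:start + frame_len]
        if frame_level_dbfs(frame) >= threshold_db:
            kept.extend(frame)
    return kept
```

Practical VADs add hangover frames and spectral features so that quiet word endings are not clipped; this sketch only shows the "remove silent periods" idea.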
In step 302, the electronic device, in response to acquiring the first vocal signal, determines the number of target users who emit the first vocal signal based on the first vocal signal.
In the embodiment of the present disclosure, the directions of different target users are different from the electronic device, and the electronic device may determine the number of target users according to the directions of the target users. Correspondingly, the electronic device determines the number of target users emitting the first personal sound signal based on the first personal sound signal, and the determining includes: the electronic equipment determines the direction information of the target users sending the first personal sound signals in the electronic equipment based on the first personal sound signals, and determines the number of the target users according to the direction information.
In one possible implementation, the electronic device determines the direction information of a target user from that user's azimuth. Correspondingly, determining, based on the first human voice signal, the direction information of the target user who emitted it includes: the electronic device determines, based on the first human voice signal, the azimuth angle between that target user and the electronic device, and determines the target user's direction information from this azimuth angle. The azimuth angle between the target user and the electronic device is the angle between the electronic device's north reference line and the straight line from the electronic device to the target user.
The electronic device can determine the number of target users from the number of distinct pieces of direction information. Correspondingly, determining the number of target users from the direction information includes: in response to there being only one piece of direction information, the electronic device determines that the number of target users is one; or, in response to the target users having different direction information, the electronic device obtains a first number, namely the number of distinct pieces of direction information, and takes the first number as the number of target users.
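Counting distinct directions can be sketched as grouping azimuth estimates that lie close together. The sketch assumes that a direction-of-arrival step has already produced one azimuth in degrees (measured from the device's north reference line) per detected speech segment; the 15° tolerance, the greedy grouping, and the function names are hypothetical choices, not taken from the disclosure.

```python
from typing import List, Sequence


def angular_distance(a: float, b: float) -> float:
    """Smallest absolute difference between two azimuths in degrees, handling the 0/360 wrap."""
    d = abs(a - b) % 360.0
    return min(d, 360.0 - d)


def count_target_users(azimuths: Sequence[float], tolerance_deg: float = 15.0) -> int:
    """Greedy grouping: an azimuth within tolerance_deg (assumed value) of an existing group
    is treated as the same direction; the number of groups is the number of target users."""
    groups: List[float] = []  # one representative azimuth per distinct direction
    for az in azimuths:
        if not any(angular_distance(az, g) <= tolerance_deg for g in groups):
            groups.append(az % 360.0)
    return len(groups)
```

For example, estimates of 10°, 12°, and 358° fall into one direction (a single target user), while 10°, 120°, and 250° yield three.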
In this embodiment of the disclosure, the electronic device determines the number of target users from the number of distinct pieces of direction information; because different target users are in different directions, this improves the accuracy with which the number of target users is determined.
In another possible implementation, because different target users have different voiceprints, the electronic device may determine the number of target users from their voiceprints. Correspondingly, determining, based on the first human voice signal, the number of target users who emitted it includes: the electronic device determines, based on the first human voice signal, the voiceprint information included in that signal, and determines the number of target users from that voiceprint information.
The first human voice signal may include one or more pieces of voiceprint information. Correspondingly, determining the number of target users from the voiceprint information includes: in response to the first human voice signal including one piece of voiceprint information, the electronic device determines that the number of target users is one; or, in response to the first human voice signal including multiple pieces of voiceprint information, the electronic device obtains a third number, namely the number of pieces of voiceprint information, and takes the third number as the number of target users.
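One way to count pieces of voiceprint information is to group speaker embeddings by cosine similarity. This is a sketch under stated assumptions: it presumes that some speaker-embedding model (not specified in the disclosure) has already turned each voiced segment into a fixed-length vector, and the 0.75 similarity threshold and function names are hypothetical.

```python
import math
from typing import List, Sequence


def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine of the angle between two embedding vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def count_speakers(embeddings: Sequence[Sequence[float]], threshold: float = 0.75) -> int:
    """Greedy grouping: an embedding joins an existing speaker if it is similar enough to that
    speaker's reference vector; otherwise it starts a new speaker. Threshold is assumed."""
    speakers: List[Sequence[float]] = []
    for emb in embeddings:
        if not any(cosine_similarity(emb, ref) >= threshold for ref in speakers):
            speakers.append(emb)
    return len(speakers)
```

A result of 1 corresponds to a single target user; any larger value is the "third number" described above.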
In this embodiment of the disclosure, the electronic device determines the number of target users from their voiceprint information; because different target users have different voiceprints, this improves the accuracy with which the number of target users is determined.
In this embodiment of the disclosure, the electronic device may acquire the first human voice signal and determine the number of target users who emitted it directly from that signal.
In another possible implementation, the electronic device may instead acquire the first human voice signal and first analyze it to determine whether it is intended to trigger the electronic device to perform a target operation. If the electronic device determines that the first human voice signal is intended to trigger the target operation, it determines, based on that signal, the number of target users who emitted it; if it determines that the signal is not intended to trigger the target operation, it discards the signal and continues collecting sound.
The electronic device can analyze the first human voice signal and determine whether it is intended to trigger the target operation in either of the following two ways:
In the first implementation, the electronic device performs intent recognition on the first human voice signal to obtain its intent information, determines from the intent information a target application program for executing it, and, in response to that target application program being present on the electronic device, determines that the first human voice signal is intended to trigger the electronic device to perform the target operation.
In one possible implementation, with continued reference to Fig. 4, the electronic device performs speech recognition on the first human voice signal through an ASR (automatic speech recognition) engine, which converts speech into text. Correspondingly, converting the first human voice signal into first text information includes: the electronic device sends the first human voice signal to the ASR engine; the ASR engine receives the signal, converts it into first text information, and returns the first text information to the electronic device; and the electronic device receives the first text information returned by the ASR engine.
In one possible implementation, with continued reference to Fig. 4, the electronic device may perform intent recognition on the first text information through NLP (natural language processing) to obtain the intent information of the first human voice signal. Correspondingly, performing intent recognition on the first human voice signal to obtain its intent information includes: the electronic device sends the first text information to the NLP component; the NLP component receives the first text information, performs intent recognition on it to obtain the intent information of the first human voice signal, and returns the intent information to the electronic device; and the electronic device receives the returned intent information.
In one possible implementation, determining, from the intent information, the target application program for executing it includes: the electronic device obtains the intent keywords contained in the intent information and determines the target application program from those keywords.
In one possible implementation, the electronic device stores a correspondence between intent keywords and target application programs. Correspondingly, determining the target application program from the intent keywords includes: the electronic device looks up, by intent keyword, the target application program for executing the intent information in the stored keyword-to-application correspondence.
For example, the intent keyword "go" corresponds to a navigation application, the intent keyword "buy" corresponds to a shopping application, and the intent keywords "add", "subtract", "multiply", and "divide" correspond to a calculator application.
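The keyword-to-application lookup and the "is the target application present" check can be sketched as a dictionary lookup. The sketch assumes that ASR and NLP have already reduced the first human voice signal to a set of English intent keywords; the table contents mirror the examples above, while the application names, the tie-breaking by sorted order, and the function names are hypothetical.

```python
from typing import AbstractSet, Dict, Optional

# Hypothetical keyword-to-application table mirroring the examples in the text.
INTENT_APP_TABLE: Dict[str, str] = {
    "go": "navigation",
    "buy": "shopping",
    "add": "calculator",
    "subtract": "calculator",
    "multiply": "calculator",
    "divide": "calculator",
}


def find_target_app(intent_keywords: AbstractSet[str]) -> Optional[str]:
    """Return the application mapped to the first matching keyword, or None if none match."""
    for kw in sorted(intent_keywords):  # sorted only to make the choice deterministic
        app = INTENT_APP_TABLE.get(kw)
        if app is not None:
            return app
    return None


def triggers_target_operation(intent_keywords: AbstractSet[str],
                              installed_apps: AbstractSet[str]) -> bool:
    """True when the intent maps to an application that is present on the device."""
    app = find_target_app(intent_keywords)
    return app is not None and app in installed_apps
```

A signal whose intent maps to no application, or to one that is not installed, would be discarded and sound collection would continue.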
In this embodiment of the disclosure, the electronic device performs intent recognition on the first human voice signal and checks whether a target application program that executes the intent information is present on the device; only if it is does the device determine that the first human voice signal is intended to trigger the target operation. This improves the validity of the first human voice signal and thus the efficiency of voice control.
In another possible implementation, the electronic device determines that the first human voice signal is intended to trigger the target operation in response to the intent information of the first human voice signal containing a command word that belongs to the device's command word bank. Accordingly, this step may include: the electronic device converts the first human voice signal into first text information; performs intent recognition on the first text information to obtain the intent information of the first human voice signal; and, in response to the intent information containing any command word in the command word bank, determines that the first human voice signal is intended to trigger the electronic device to perform the target operation.
In one possible implementation, the command word bank may be stored on the electronic device. Correspondingly, determining that the first human voice signal is intended to trigger the target operation in response to its intent information containing a command word from the bank includes: the electronic device identifies the command words in the intent information of the first human voice signal and, in response to an identified command word being present in the stored command word bank, determines that the first human voice signal is intended to trigger the target operation.
In one possible implementation, the command words stored in the command word bank may correspond to operation instructions of the electronic device, and the electronic device can build the command word bank from its own operation instructions.
For example, the command word "open" corresponds to the operation instruction of opening application XX, and the command word "play" corresponds to the operation instruction of playing music XX; accordingly, the command word bank created on the electronic device includes command words such as "open" and "play".
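Building the command word bank from the device's operation instructions and matching intent words against it can be sketched as follows. This assumes the intent information is available as a list of words; case-insensitive matching and the function names are hypothetical choices, and the instruction strings simply reuse the "XX" placeholders from the example above.

```python
from typing import Dict, Iterable, Optional


def build_command_bank(operation_instructions: Dict[str, str]) -> Dict[str, str]:
    """Command word -> operation instruction, created from the device's own instructions.
    Keys are lower-cased so matching is case-insensitive (an assumption)."""
    return {word.lower(): instr for word, instr in operation_instructions.items()}


def match_command(intent_words: Iterable[str], bank: Dict[str, str]) -> Optional[str]:
    """Return the instruction for the first intent word found in the bank, else None."""
    for word in intent_words:
        instr = bank.get(word.lower())
        if instr is not None:
            return instr
    return None


bank = build_command_bank({"open": "open XX application", "play": "play XX music"})
```

A non-None result means the first human voice signal is treated as intended to trigger the target operation.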
In this embodiment of the disclosure, when the intent information of the first human voice signal includes a command word from the command word bank, the electronic device determines that the signal is intended to trigger the target operation, which further improves the validity of the first human voice signal and the efficiency of voice control.
In the second implementation, the reasoning is that a user who is triggering the electronic device to perform the target operation is not chatting with others at that moment. The electronic device can therefore infer whether the user is chatting from whether a second human voice signal was collected within a preset duration before the first human voice signal. Correspondingly, determining that the first human voice signal is intended to trigger the target operation includes: in response to no second human voice signal having been collected within the preset duration before the first human voice signal was collected, the electronic device determines that the first human voice signal is intended to trigger the target operation.
The preset duration may be any value between 1 s and 10 s, for example 1 s, 3 s, or 5 s; this embodiment does not specifically limit it, and it may be set and changed as needed. The second human voice signal may be uttered by the user or by someone other than the user.
In one possible implementation, the electronic device records the collection time of the second human voice signal collected before the first human voice signal. Correspondingly, determining that no second human voice signal was collected within the preset duration before the first human voice signal includes: in response to collecting the first human voice signal, the electronic device obtains the current time as a first time; it reads the recorded collection time as a second time; and, in response to the difference between the first time and the second time exceeding the preset duration, it determines that no second human voice signal was collected within the preset duration before the first human voice signal.
In another possible implementation, the electronic device records the collection time of the second human voice signal as a time record, and deletes the time record if no subsequent human voice signal is collected within the preset duration. Correspondingly, determining that no second human voice signal was collected within the preset duration before the first human voice signal includes: in response to collecting the first human voice signal, the electronic device looks up the time record on the device; and, in response to no time record being stored on the electronic device, it determines that no second human voice signal was collected within the preset duration before the first human voice signal.
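The timestamp comparison can be sketched as a small guard object. Timestamps in seconds are passed in explicitly (in practice they might come from a monotonic clock such as `time.monotonic()`), the 3 s default is one value from the stated 1 s to 10 s range, and the class name is hypothetical. An empty record (`None`) plays the role of the deleted time record in the second variant.

```python
from typing import Optional


class ChatGuard:
    """Hypothetical helper: records when human voice was last heard and decides whether a new
    utterance follows at least preset_s seconds without voice (i.e. is not part of a chat)."""

    def __init__(self, preset_s: float = 3.0) -> None:
        self.preset_s = preset_s
        self.last_voice_time: Optional[float] = None  # the recorded "second time"

    def is_trigger_candidate(self, now: float) -> bool:
        """now is the "first time", when the first human voice signal is collected."""
        quiet = self.last_voice_time is None or (now - self.last_voice_time) > self.preset_s
        self.last_voice_time = now  # this utterance becomes the reference for the next one
        return quiet
```

With a 3 s preset, utterances at 0 s and 10 s would qualify, while one at 1 s (only 1 s after the previous voice) would be treated as part of a conversation.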
In this embodiment of the disclosure, when no second human voice signal was collected within the preset duration before the first human voice signal, the electronic device concludes that the first human voice signal is neither part of the user's conversation nor the user talking to themselves, which improves the validity of the first human voice signal and the efficiency of voice control.
In this embodiment of the present disclosure, the first human voice signal may include one or more of a voiceprint signal, a natural language signal, and a noise signal.
In one possible implementation, the first human voice signal includes a voiceprint signal. Accordingly, acquiring the first human voice signal includes: the electronic device detects a voiceprint signal in the collected sound and, in response to detecting it, determines that it has acquired the first human voice signal.
In another possible implementation, the first human voice signal may be the voice of the owner of the electronic device. Accordingly, acquiring the first human voice signal includes: the electronic device collects sound; in response to collecting a second human voice signal, it obtains the first voiceprint information of that signal; it compares the first voiceprint information with second voiceprint information preset on the electronic device; in response to the first voiceprint information matching the preset second voiceprint information, it takes the second human voice signal as the first human voice signal; and, in response to the first voiceprint information not matching the second voiceprint information, it discards the signal.
The preset second voiceprint information may be the voiceprint information of a human voice signal recorded in advance by the electronic device, for example the voice of the owner of the electronic device or of a relative of the owner.
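The comparison against preset voiceprints can be sketched as a cosine-similarity test between embeddings. The sketch assumes that both the collected signal's voiceprint and the enrolled voiceprints (for example the owner's and a relative's) are fixed-length vectors from some speaker-embedding model that the disclosure does not name; the 0.8 threshold and the function names are hypothetical.

```python
import math
from typing import Sequence


def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine of the angle between two voiceprint vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def matches_enrolled(voiceprint: Sequence[float],
                     enrolled: Sequence[Sequence[float]],
                     threshold: float = 0.8) -> bool:
    """True if the collected voiceprint is close enough to any preset (enrolled) voiceprint.
    The threshold is an assumed value, not taken from the disclosure."""
    return any(cosine_similarity(voiceprint, ref) >= threshold for ref in enrolled)
```

A match keeps the signal as the first human voice signal; no match means the signal is discarded.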
In this embodiment of the disclosure, a collected human voice signal is taken as the first human voice signal only if its voiceprint matches the second voiceprint information preset on the electronic device. This avoids interference from irrelevant voice signals, substantially improves the validity of the first human voice signal, and improves the efficiency of voice control of the electronic device.
In another possible implementation, the first human voice signal may be a natural language signal, such as Chinese, English, or Japanese speech. Accordingly, acquiring the first human voice signal includes: the electronic device detects a natural language signal in the collected human voice and, in response to detecting it, determines that it has acquired the first human voice signal.
In this embodiment of the disclosure, the electronic device collects natural language signals; because a natural language signal is a voice signal produced by a person, this improves the validity of the first human voice signal and makes voice control of the electronic device more intelligent.
Step 303, in response to determining, based on the number of target users, that the first human voice signal was emitted by a single sound source in the environment, the electronic device determines that the first human voice signal is intended to trigger the target operation and wakes up the voice control module.
In this embodiment of the present disclosure, if the number of target users is one, this indicates that the first human voice signal acquired by the electronic device was addressed to the electronic device by the user; that is, the user is using the first human voice signal to trigger the target operation, and the electronic device wakes up the voice control module. If there are multiple target users, the electronic device cannot be sure that the first human voice signal was addressed to it, so it discards the signal and continues collecting sound.
In one possible implementation, determining, based on the number of target users, that the first human voice signal was emitted by a single sound source in the environment includes: in response to the number of target users being one, the electronic device determines that the first human voice signal was emitted by a single sound source in the environment.
In this embodiment of the present disclosure, waking up the voice control module means that the electronic device switches the voice control module from the dormant state to the awake state.
In one possible implementation, the awake state of the voice control module is a state in which the display screen of the electronic device is on; accordingly, this step may include: the electronic device switches its display screen from the screen-off state to the screen-on state.
In another possible implementation, the awake state of the voice control module is a state in which the electronic device accepts the first human voice signal; accordingly, this step may include: the electronic device switches from a state in which it does not accept the first human voice signal to a state in which it does.
In another possible implementation, the awake state of the voice control module is a state in which the module can execute the operation corresponding to the first human voice signal; accordingly, this step may include: the electronic device switches the voice control module from the dormant state to a state in which it can execute that operation.
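The wake decision of step 303 can be sketched as a two-state machine. The concrete meaning of "awake" (screen on, accepting the signal, or ready to execute the operation) is deliberately not modelled, and the class, enum, and method names are hypothetical; the sketch only encodes the rule that the module wakes when exactly one target user is detected.

```python
from enum import Enum


class ModuleState(Enum):
    DORMANT = "dormant"
    AWAKE = "awake"


class VoiceControlModule:
    """Hypothetical model of the voice control module's dormant/awake switching."""

    def __init__(self) -> None:
        self.state = ModuleState.DORMANT

    def on_voice(self, target_user_count: int) -> bool:
        """Wake only when exactly one target user (a single sound source) was detected;
        otherwise the signal is ignored and the module stays dormant.
        Returns True if the module is awake after the call."""
        if self.state is ModuleState.DORMANT and target_user_count == 1:
            self.state = ModuleState.AWAKE
        return self.state is ModuleState.AWAKE
```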
Step 304, the electronic device, through the voice control module, executes the operation corresponding to the first human voice signal based on that signal.
In this embodiment of the present disclosure, this step may be implemented in either of the following two ways:
In the first implementation, the electronic device recognizes the first human voice signal locally and executes the corresponding operation. Accordingly, referring to Fig. 5, step 304 may be replaced with the following steps:
Step 5041, the electronic device converts the first human voice signal into first text information.
In one possible implementation, the electronic device converts speech into text through an ASR engine. The corresponding steps may be as follows: the electronic device sends the first human voice signal to the ASR engine; the ASR engine receives the signal and converts it into first text information; and the electronic device obtains the first text information.
Step 5042, the electronic device executes the operation corresponding to the first human voice signal according to the first text information.
In this embodiment of the disclosure, the electronic device may execute the operation corresponding to the first human voice signal directly from the first text information, or it may first process the first text information into second text information and execute the operation from the second text information.
In one possible implementation, the electronic device may recognize the first text information through NLP and determine the corresponding first operation instruction. Accordingly, this step may include: the electronic device sends the first text information to the NLP component; the NLP component receives it, recognizes its semantics, determines the corresponding first operation instruction, and returns that instruction to the electronic device; and the electronic device receives the first operation instruction and executes the operation corresponding to the first human voice signal according to it.
In another possible implementation, the electronic device removes invalid words from the first text information corresponding to the first human voice signal to obtain the second text information, and executes the operation corresponding to the first human voice signal according to the second text information.
In one possible implementation, the invalid words include isolated words. Correspondingly, removing the invalid words from the first text information includes: the electronic device detects isolated words in the first text information corresponding to the first human voice signal and, in response to detecting them, removes them.
In another possible implementation, the invalid words include modal particles. Correspondingly, removing the invalid words from the first text information includes: the electronic device detects modal particles in the first text information and, in response to detecting them, removes them.
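Removing invalid words can be sketched as a token filter. The sketch assumes the first text information has already been tokenized; because the text does not define "isolated words" precisely, the caller supplies them, and the modal-particle list (a few Chinese particles plus English fillers) is a hypothetical example rather than a list from the disclosure.

```python
from typing import AbstractSet, Iterable, List

# Hypothetical modal-particle/filler list; a real list would be language- and product-specific.
MODAL_PARTICLES: AbstractSet[str] = frozenset({"啊", "吧", "呢", "嘛", "哦", "um", "uh"})


def remove_invalid_words(tokens: Iterable[str],
                         isolated_words: AbstractSet[str],
                         modal_particles: AbstractSet[str] = MODAL_PARTICLES) -> List[str]:
    """Turn first text information (as tokens) into second text information by dropping
    isolated words and modal particles while keeping the order of the remaining tokens."""
    invalid = set(isolated_words) | set(modal_particles)
    return [t for t in tokens if t not in invalid]
```

For example, the tokens "um play some music uh" would be reduced to "play some music" before the operation is executed.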
In this embodiment of the disclosure, the electronic device removes invalid words from the first text information corresponding to the first human voice signal to obtain the second text information, which improves the validity of the text used to execute the operation and the efficiency of voice control of the electronic device.
In the second implementation, the electronic device, through the voice control module, has a server recognize the first human voice signal and then executes the corresponding operation. Accordingly, referring to Fig. 6, step 304 may be replaced with the following steps:
Step 6041, the electronic device sends the first human voice signal to the server through the voice control module.
Step 6042, the server receives and recognizes the first human voice signal, determines the corresponding second operation instruction from the recognition result, and returns that instruction to the electronic device.
Step 6043, the electronic device receives the second operation instruction returned by the server and executes the corresponding operation.
In this embodiment of the disclosure, when the first human voice signal is acquired, the number of target users who emitted it is determined from the signal, and the voice control module of the electronic device is woken up when the number of target users shows that the signal was emitted by a single sound source in the environment. Consequently, no specific wake word is needed during voice control: the operation corresponding to the first human voice signal can be executed directly from that signal, which shortens the voice control process and improves its efficiency.
Fig. 7 is a flowchart of a voice control method according to an embodiment of the present disclosure. This embodiment uses noise information as the example of environmental information. Referring to Fig. 7, the embodiment includes:
Step 701, the electronic device collects sound while its voice control module is in the dormant state.
Step 701 is the same as step 301, and is not described herein again.
Step 702, in response to acquiring the first human voice signal, the electronic device acquires noise information, that is, the noise collected by the electronic device while it was collecting the first human voice signal.
The way the electronic device acquires the first human voice signal in step 702 is the same as in step 302 and is not described again here.
In one possible implementation, the noise information may be a non-natural-language signal. Correspondingly, acquiring the noise information includes: the electronic device extracts the non-natural-language components from the collected human voice signal and uses them as the noise information.
In another possible implementation, the noise information may be a natural language signal whose voiceprint differs from the first voiceprint signal of the first human voice signal. Correspondingly, acquiring the noise information includes: the electronic device extracts from the collected human voice signal a second voiceprint signal that differs from the first voiceprint signal and uses it as the noise information.
Step 703, in response to determining, based on the noise information, that the first human voice signal was emitted by a single sound source in the environment, the electronic device determines that the first human voice signal is intended to trigger the target operation and wakes up the voice control module.
In one possible implementation, determining, based on the noise information, that the first human voice signal was emitted by a single sound source in the environment includes: the electronic device determines a first decibel level of the noise information and, in response to the first decibel level being below a first preset decibel level, determines that the first human voice signal was emitted by a single sound source in the environment.
The first preset decibel level may be any value between 30 dB and 50 dB, for example 30 dB, 35 dB, or 40 dB; this embodiment does not specifically limit it, and it may be set and changed as needed.
In one possible implementation, the electronic device determines the decibel level of the noise information from its sound pressure, and may store a correspondence between sound pressure values and decibel values. Accordingly, determining the first decibel level of the noise information includes: the electronic device measures a first sound pressure value of the noise information through the microphone and looks up the first decibel level for that value in the stored correspondence between sound pressure values and decibel values.
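Instead of the stored sound-pressure-to-decibel table described above, the sketch below substitutes the standard sound pressure level formula, $L_p = 20\log_{10}(p / p_0)$ with $p_0 = 20\ \mu\text{Pa}$, and then applies the single-source check of step 703. It assumes a calibrated microphone that reports RMS sound pressure in pascals; the 40 dB default is one value from the stated 30 dB to 50 dB range, and the function names are hypothetical.

```python
import math

P_REF_PA = 20e-6  # standard reference sound pressure in air, 20 micropascals


def spl_db(pressure_pa: float) -> float:
    """Sound pressure level in dB for an RMS pressure in pascals (formula used in place of a lookup table)."""
    if pressure_pa <= 0.0:
        raise ValueError("sound pressure must be positive")
    return 20.0 * math.log10(pressure_pa / P_REF_PA)


def is_single_source(noise_pressure_pa: float, first_preset_db: float = 40.0) -> bool:
    """Single-source decision from the noise level alone: noise below the first preset level (assumed 40 dB)."""
    return spl_db(noise_pressure_pa) < first_preset_db
```

For instance, 0.0002 Pa corresponds to 20 dB (quiet enough), whereas 0.02 Pa corresponds to 60 dB (too noisy).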
In this embodiment of the disclosure, when the first decibel level of the noise information is below the first preset decibel level, the electronic device determines that the first human voice signal was emitted by a single sound source in the environment. This ensures that the first human voice signal is not disturbed by noise, which improves its accuracy and the efficiency of voice control.
In another possible implementation, determining, based on the noise information, that the first human voice signal is intended to trigger the target operation includes: the electronic device determines the first decibel level of the noise information and a second decibel level of the first human voice signal; and, in response to the first decibel level being lower than the second decibel level and the ratio of the second decibel level to the first being greater than a preset value, it determines that the first human voice signal is intended to trigger the target operation.
The preset value may be any value between 20 and 50, for example 20, 30, or 40; this embodiment does not specifically limit it, and it may be set and changed as needed.
In one possible implementation, the electronic device determining a second decibel of the first personal sound signal includes: the electronic equipment acquires a second sound pressure value of the first human voice signal through the microphone, and determines a second decibel of the first human voice signal from the corresponding relation between the stored sound pressure value and the decibel value according to the second sound pressure value.
In the embodiment of the disclosure, when the first decibel of the noise information is lower than the second decibel of the first vocal signal, and the ratio of the second decibel to the first decibel is greater than the preset value, the electronic device determines that the first vocal signal is a vocal signal emitted by a single sound source in the environment, thereby ensuring that the first vocal signal is not interfered by the noise signal, improving the accuracy of the first vocal signal, and improving the efficiency of the voice control.
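The two noise-based checks described above (claims 5 and 6) reduce to simple comparisons. The sketch below is illustrative only: the first preset decibel value is not specified in this section, so 40 dB is a hypothetical example, and the handling of a non-positive noise level (where the ratio is undefined) is an assumption of this sketch, not of the disclosure.

```python
def passes_noise_floor(noise_db: float, first_preset_db: float) -> bool:
    """Claim 5: the noise level is below the first preset decibel value."""
    return noise_db < first_preset_db


def passes_ratio_rule(noise_db: float, voice_db: float, preset_ratio: float) -> bool:
    """Claim 6: noise quieter than the voice, and voice_db / noise_db > preset_ratio."""
    if noise_db <= 0.0:
        # Assumption: the ratio is undefined here, so a non-positive noise level is
        # treated as a quiet environment and only the "quieter than the voice" test applies.
        return noise_db < voice_db
    return noise_db < voice_db and voice_db / noise_db > preset_ratio


# Hypothetical thresholds: 40 dB noise floor, preset ratio 20 (within the 20-50 range above).
print(passes_noise_floor(35.0, 40.0))       # True
print(passes_ratio_rule(2.0, 60.0, 20.0))   # True: 60 / 2 = 30 > 20
print(passes_ratio_rule(30.0, 60.0, 20.0))  # False: 60 / 30 = 2
```

Note what the ratio rule implies with the 20 to 50 range given above: for a 60 dB voice and a preset value of 20, the noise must be below 3 dB (60 / 20) for the voice to count as a trigger.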
Step 704, the electronic device executes, through the voice control module, an operation corresponding to the first human voice signal based on the first human voice signal.
In the embodiment of the present disclosure, this step may be implemented in either of the following two ways.
First implementation: the electronic device recognizes the first human voice signal locally and executes the operation corresponding to the first human voice signal. Accordingly, referring to Fig. 8, step 704 may be replaced with the following steps:
Step 8041, the electronic device converts the first human voice signal into first text information.
Step 8041 is the same as step 5041, and is not described herein again.
Step 8042, the electronic device executes the operation corresponding to the first human voice signal according to the first text information.
Step 8042 is the same as step 5042, and is not described herein again.
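Steps 8041 and 8042 amount to speech-to-text followed by a lookup from recognized text to an operation. The sketch below assumes a recognizer is passed in as a function; `fake_asr`, which simply decodes bytes as text, and the `COMMANDS` phrase table are hypothetical stand-ins, not part of the disclosure.

```python
from typing import Callable, Dict, Optional

# Hypothetical phrase table: recognized text fragment -> operation to run.
COMMANDS: Dict[str, Callable[[], str]] = {
    "turn on the light": lambda: "light_on",
    "play music": lambda: "music_playing",
}


def fake_asr(audio: bytes) -> str:
    """Stand-in recognizer for this sketch: treats the audio bytes as UTF-8 text."""
    return audio.decode("utf-8")


def run_locally(audio: bytes,
                speech_to_text: Callable[[bytes], str],
                commands: Dict[str, Callable[[], str]]) -> Optional[str]:
    # Step 8041: convert the first human voice signal into first text information.
    text = speech_to_text(audio).strip().lower()
    # Step 8042: execute the operation whose phrase appears in the text.
    for phrase, operation in commands.items():
        if phrase in text:
            return operation()
    return None  # no matching operation


print(run_locally(b"Please turn on the light", fake_asr, COMMANDS))  # light_on
```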
Second implementation: through the voice control module, the electronic device has the first human voice signal recognized by the server and executes the operation corresponding to the first human voice signal. Accordingly, referring to Fig. 9, step 704 may be replaced with the following steps:
Step 9041, the electronic device sends the first human voice signal to the server through the voice control module.
Step 9042, the server receives and recognizes the first human voice signal, determines a second operation instruction corresponding to the first human voice signal according to the recognition result, and returns the second operation instruction to the electronic device.
Step 9043, the electronic device receives the second operation instruction returned by the server and executes the operation corresponding to the second operation instruction.
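The server round trip in steps 9041 to 9043 could look roughly like the following. This is a sketch under assumptions: the JSON message format, base64 audio encoding, instruction names, and the in-process `server_handle` function (which "recognizes" audio by decoding it as text) are all hypothetical; in a real device, `send` would be a network call to the server.

```python
import base64
import json
from typing import Callable, Dict

# Hypothetical instruction vocabulary shared by the server and the device.
INSTRUCTIONS = {"turn on the light": "LIGHT_ON", "play music": "PLAY_MUSIC"}


def server_handle(request: str) -> str:
    """Step 9042 stand-in: recognize the signal and return an operation instruction."""
    audio = base64.b64decode(json.loads(request)["audio"])
    text = audio.decode("utf-8").lower()  # placeholder for real speech recognition
    instruction = next(
        (code for phrase, code in INSTRUCTIONS.items() if phrase in text), "NONE")
    return json.dumps({"instruction": instruction})


def device_round_trip(audio: bytes,
                      send: Callable[[str], str],
                      handlers: Dict[str, Callable[[], str]]) -> str:
    # Step 9041: send the first human voice signal to the server.
    request = json.dumps({"audio": base64.b64encode(audio).decode("ascii")})
    # Step 9043: receive the operation instruction and execute the matching operation.
    instruction = json.loads(send(request))["instruction"]
    return handlers.get(instruction, lambda: "ignored")()


HANDLERS = {"LIGHT_ON": lambda: "light is on", "PLAY_MUSIC": lambda: "music playing"}
print(device_round_trip(b"Play music please", server_handle, HANDLERS))  # music playing
```

Keeping recognition on the server keeps the device-side code to encoding, transport, and a dispatch table.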
In the embodiment of the present disclosure, when the first human voice signal is collected, the electronic device obtains noise information of the environment in which it is located, and wakes up its voice control module on the basis that the noise information does not interfere with the first human voice signal. Therefore, during voice control, the operation corresponding to the first human voice signal can be executed directly based on that signal, without a specific wake-up word being needed to wake up the voice control module; this shortens the voice control process and improves voice control efficiency.
Fig. 10 is a block diagram of another voice control apparatus provided in an embodiment of the present disclosure. Referring to Fig. 10, the apparatus includes:
a collection module 1001, configured to collect sound when the voice control module of the electronic device is in a dormant state;
an obtaining module 1002, configured to obtain, in response to a first human voice signal being collected, environment information of the environment in which the electronic device is located;
a wake-up module 1003, configured to wake up the voice control module in response to determining, based on the environment information, that the first human voice signal is used to trigger the electronic device to perform a target operation;
an execution module 1004, configured to execute, through the voice control module, an operation corresponding to the first human voice signal based on the first human voice signal.
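One way to picture how modules 1001 to 1004 cooperate is a small class whose stages are injected as functions. This is a hypothetical Python sketch; the class name, the `on_voice_collected` method, and the dictionary used as environment information are illustrative assumptions.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class VoiceControlApparatus:
    """Hypothetical wiring of modules 1001-1004; each stage is injected as a function."""
    get_environment: Callable[[bytes], dict]   # obtaining module 1002
    is_trigger: Callable[[bytes, dict], bool]  # decision used by wake-up module 1003
    execute: Callable[[bytes], str]            # execution module 1004
    awake: bool = False                        # state of the voice control module

    def on_voice_collected(self, voice: bytes) -> Optional[str]:
        """Called by collection module 1001 when a human voice signal is collected while dormant."""
        environment = self.get_environment(voice)
        if not self.is_trigger(voice, environment):
            return None           # not a trigger: the voice control module stays dormant
        self.awake = True         # wake-up module 1003 wakes the voice control module
        return self.execute(voice)


# Example: treat the signal as a trigger only when a single speaker is present.
apparatus = VoiceControlApparatus(
    get_environment=lambda voice: {"speaker_count": 1},
    is_trigger=lambda voice, env: env["speaker_count"] == 1,
    execute=lambda voice: "operation executed",
)
print(apparatus.on_voice_collected(b"turn on the light"))  # operation executed
```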
In one possible implementation, the obtaining module 1002 is further configured to determine, based on the first human voice signal, the number of target users who emitted the first human voice signal, and to take the number of target users as the environment information.
In another possible implementation, the obtaining module 1002 is further configured to determine, based on the first human voice signal, azimuth information of the target users emitting the first human voice signal relative to the electronic device, and determine the number of target users according to the azimuth information; or,
determine, based on the first human voice signal, voiceprint information included in the first human voice signal, and determine the number of target users according to the voiceprint information.
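Both counting strategies can be sketched as greedy clustering. This is an illustrative sketch, assuming a microphone-array front end already produces per-frame azimuth estimates in degrees and a speaker model already produces one voiceprint embedding per speech segment; the 20-degree tolerance and 0.8 cosine-similarity threshold are hypothetical values, not taken from the disclosure.

```python
import math
from typing import List, Sequence


def count_by_azimuth(azimuths_deg: Sequence[float], tolerance_deg: float = 20.0) -> int:
    """Count speakers as clusters of direction-of-arrival estimates (greedy, wrap-aware)."""
    centers: List[float] = []
    for azimuth in azimuths_deg:
        azimuth %= 360.0
        # Angular distance on a circle: 355 and 10 degrees are 15 degrees apart.
        if not any(min(abs(azimuth - c), 360.0 - abs(azimuth - c)) <= tolerance_deg
                   for c in centers):
            centers.append(azimuth)
    return len(centers)


def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity of two non-zero vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    return dot / (math.hypot(*u) * math.hypot(*v))


def count_by_voiceprint(embeddings: Sequence[Sequence[float]], threshold: float = 0.8) -> int:
    """Count speakers: an embedding close enough to a known speaker is not counted again."""
    speakers: List[Sequence[float]] = []
    for embedding in embeddings:
        if not any(cosine_similarity(embedding, s) >= threshold for s in speakers):
            speakers.append(embedding)
    return len(speakers)


print(count_by_azimuth([10.0, 15.0, 355.0, 180.0]))                # 2
print(count_by_voiceprint([[1.0, 0.0], [0.99, 0.1], [0.0, 1.0]]))  # 2
```

Greedy assignment depends on input order; a production system might use a proper clustering step, but the count it returns plays the same role as the number of target users above.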
In another possible implementation, the obtaining module 1002 is further configured to obtain noise information, where the noise information is collected by the electronic device while collecting the first human voice signal, and to take the noise information as the environment information.
In another possible implementation, the obtaining module 1002 is further configured to determine a first decibel value of the noise information, and to determine that the first human voice signal is used to trigger the electronic device to perform the target operation in response to the first decibel value being lower than the first preset decibel value.
In another possible implementation, the obtaining module 1002 is further configured to determine a first decibel value of the noise information and a second decibel value of the first human voice signal, and to determine that the first human voice signal is used to trigger the electronic device to perform the target operation in response to the first decibel value being smaller than the second decibel value and the ratio of the second decibel value to the first decibel value being larger than a preset value.
In another possible implementation, the execution module 1004 is further configured to perform intent recognition on the first human voice signal to obtain intent information of the first human voice signal; determine, according to the intent information, a target application program for executing the intent information; and, in response to the target application program being included on the electronic device, execute the step of obtaining the environment information of the environment in which the electronic device is located.
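The intent gate can be sketched as a keyword lookup followed by an installed-app check. The sketch assumes the first human voice signal has already been transcribed to text; the keyword rules, intent names, and `com.example.*` package names are hypothetical.

```python
from typing import Dict, Optional, Set

# Hypothetical keyword rules and intent-to-application routing.
INTENT_KEYWORDS: Dict[str, str] = {
    "weather": "query_weather",
    "navigate": "navigation",
    "play": "play_media",
}
INTENT_TO_APP: Dict[str, str] = {
    "query_weather": "com.example.weather",
    "navigation": "com.example.maps",
    "play_media": "com.example.music",
}


def recognize_intent(text: str) -> Optional[str]:
    """Toy intent recognizer: the first keyword found in the text wins."""
    lowered = text.lower()
    for keyword, intent in INTENT_KEYWORDS.items():
        if keyword in lowered:
            return intent
    return None


def should_check_environment(text: str, installed_apps: Set[str]) -> bool:
    """Proceed to the environment check only if an installed app can serve the intent."""
    intent = recognize_intent(text)
    if intent is None:
        return False
    return INTENT_TO_APP.get(intent) in installed_apps


installed = {"com.example.weather"}
print(should_check_environment("What's the weather tomorrow?", installed))  # True
print(should_check_environment("Navigate home", installed))                 # False
```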
In another possible implementation, the execution module 1004 is further configured to execute the step of obtaining the environment information of the environment in which the electronic device is located in response to no second human voice signal having been collected within a preset time period before the first human voice signal was collected.
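The preset-time-period condition is a comparison of timestamps. A minimal sketch, assuming timestamps in seconds and treating a previous voice exactly at the window boundary as outside the window (the disclosure does not say how the boundary is handled); the function name is hypothetical:

```python
from typing import Optional


def no_recent_voice(first_voice_time_s: float,
                    previous_voice_time_s: Optional[float],
                    preset_period_s: float) -> bool:
    """True if no other human voice was collected in the preset period before the first one.

    previous_voice_time_s is the time of the most recent earlier voice, or None if there
    was none. A previous voice exactly preset_period_s earlier counts as outside the period.
    """
    if previous_voice_time_s is None:
        return True
    return first_voice_time_s - previous_voice_time_s >= preset_period_s


print(no_recent_voice(100.0, 90.0, 5.0))  # True: last voice was 10 s earlier
print(no_recent_voice(100.0, 97.0, 5.0))  # False: last voice was 3 s earlier
```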
In another possible implementation, the execution module 1004 is further configured to send the first human voice signal to the server through the voice control module, receive an operation instruction corresponding to the first human voice signal returned by the server, and execute the target operation corresponding to the operation instruction.
In the embodiment of the present disclosure, the electronic device collects sound while its voice control module is in a dormant state; in response to collecting a first human voice signal, obtains environment information of the environment in which it is located; in response to determining, based on the environment information, that the first human voice signal is used to trigger the electronic device to perform the target operation, wakes up the voice control module; and executes, through the voice control module, the operation corresponding to the first human voice signal based on that signal. Because the voice control module is woken up based on the environment information obtained when the first human voice signal is collected, the operation corresponding to the first human voice signal can be executed without a specific wake-up word, which shortens the voice control process and improves voice control efficiency.
All the above optional technical solutions may be combined arbitrarily to form the optional embodiments of the present disclosure, and are not described herein again.
It should be noted that, when the voice control apparatus provided in the foregoing embodiment performs voice control, the division into the functional modules described above is merely an example. In practical applications, the functions may be assigned to different functional modules as needed; that is, the internal structure of the apparatus may be divided into different functional modules to complete all or part of the functions described above. In addition, the voice control apparatus and the voice control method provided in the above embodiments belong to the same concept; for the specific implementation process, refer to the method embodiments, which are not repeated here.
Fig. 11 shows a block diagram of an electronic device 1100 provided in an exemplary embodiment of the present disclosure. The electronic device 1100 may be a smartphone, a tablet computer, an MP3 (Moving Picture Experts Group Audio Layer III) player, an MP4 (Moving Picture Experts Group Audio Layer IV) player, a notebook computer, or a desktop computer. The electronic device 1100 may also be referred to by other names, such as user equipment, portable electronic device, laptop electronic device, or desktop electronic device.
In general, the electronic device 1100 includes: a processor 1101 and a memory 1102.
The processor 1101 may include one or more processing cores, for example a 4-core or 8-core processor. The processor 1101 may be implemented in at least one hardware form among a DSP (Digital Signal Processor), an FPGA (Field-Programmable Gate Array), and a PLA (Programmable Logic Array). The processor 1101 may also include a main processor and a coprocessor: the main processor, also called a CPU (Central Processing Unit), processes data in the awake state, and the coprocessor is a low-power processor that processes data in the standby state. In some embodiments, the processor 1101 may integrate a GPU (Graphics Processing Unit) responsible for rendering and drawing the content to be displayed on the display screen. In some embodiments, the processor 1101 may further include an AI (Artificial Intelligence) processor for computing operations related to machine learning.
The memory 1102 may include one or more computer-readable storage media, which may be non-transitory. The memory 1102 may also include high-speed random access memory and non-volatile memory, such as one or more magnetic disk storage devices or flash memory storage devices. In some embodiments, a non-transitory computer-readable storage medium in the memory 1102 stores at least one instruction, which is executed by the processor 1101 to implement the voice control method provided by the method embodiments of the present disclosure.
In some embodiments, the electronic device 1100 may also optionally include: a peripheral interface 1103 and at least one peripheral. The processor 1101, memory 1102 and peripheral interface 1103 may be connected by a bus or signal lines. Various peripheral devices may be connected to the peripheral interface 1103 by buses, signal lines, or circuit boards. Specifically, the peripheral device includes: at least one of radio frequency circuitry 1104, touch screen display 1105, camera assembly 1106, audio circuitry 1107, positioning assembly 1108, and power supply 1109.
The peripheral interface 1103 may be used to connect at least one peripheral associated with I/O (Input/Output) to the processor 1101 and the memory 1102. In some embodiments, the processor 1101, memory 1102, and peripheral interface 1103 are integrated on the same chip or circuit board; in some other embodiments, any one or two of the processor 1101, the memory 1102 and the peripheral device interface 1103 may be implemented on separate chips or circuit boards, which is not limited by this embodiment.
The radio frequency circuit 1104 is used to receive and transmit RF (Radio Frequency) signals, also called electromagnetic signals. The radio frequency circuit 1104 communicates with communication networks and other communication devices via electromagnetic signals, converting electrical signals into electromagnetic signals for transmission, or converting received electromagnetic signals into electrical signals. Optionally, the radio frequency circuit 1104 includes an antenna system, an RF transceiver, one or more amplifiers, a tuner, an oscillator, a digital signal processor, a codec chipset, a subscriber identity module card, and so forth. The radio frequency circuit 1104 may communicate with other electronic devices via at least one wireless communication protocol. The wireless communication protocols include, but are not limited to, metropolitan area networks, mobile communication networks of various generations (2G, 3G, 4G, and 5G), wireless local area networks, and/or WiFi (Wireless Fidelity) networks. In some embodiments, the radio frequency circuit 1104 may also include NFC (Near Field Communication) related circuits, which is not limited by this disclosure.
The display screen 1105 is used to display a UI (User Interface), which may include graphics, text, icons, video, and any combination thereof. When the display screen 1105 is a touch display screen, it can also capture touch signals on or above its surface; a touch signal may be input to the processor 1101 as a control signal for processing. In this case, the display screen 1105 may also provide virtual buttons and/or a virtual keyboard, also called soft buttons and/or a soft keyboard. In some embodiments, there may be one display screen 1105, arranged on the front panel of the electronic device 1100; in other embodiments, there may be at least two display screens 1105, arranged on different surfaces of the electronic device 1100 or in a folding design; in still other embodiments, the display screen 1105 may be a flexible display arranged on a curved or folded surface of the electronic device 1100. The display screen 1105 may even be given a non-rectangular irregular shape, that is, a shaped screen. The display screen 1105 may be made of materials such as LCD (Liquid Crystal Display) or OLED (Organic Light-Emitting Diode).
The camera assembly 1106 is used to capture images or video. Optionally, the camera assembly 1106 includes a front camera and a rear camera. Generally, the front camera is arranged on the front panel of the electronic device and the rear camera on its back. In some embodiments, there are at least two rear cameras, each being any one of a main camera, a depth-of-field camera, a wide-angle camera, and a telephoto camera, so that the main camera and the depth-of-field camera can be fused to provide background blurring, and the main camera and the wide-angle camera can be fused to provide panoramic shooting, VR (Virtual Reality) shooting, or other fused shooting functions. In some embodiments, the camera assembly 1106 may also include a flash, which may be a single-color-temperature flash or a dual-color-temperature flash. A dual-color-temperature flash combines a warm-light flash and a cold-light flash and can be used for light compensation at different color temperatures.
The audio circuit 1107 may include a microphone and a speaker. The microphone collects sound waves from the user and the environment, converts them into electrical signals, and inputs the signals to the processor 1101 for processing or to the radio frequency circuit 1104 for voice communication. For stereo capture or noise reduction, multiple microphones may be provided at different locations on the electronic device 1100. The microphone may also be an array microphone or an omnidirectional microphone. The speaker converts electrical signals from the processor 1101 or the radio frequency circuit 1104 into sound waves. The speaker may be a conventional thin-film speaker or a piezoelectric ceramic speaker. A piezoelectric ceramic speaker can convert electrical signals not only into sound waves audible to humans but also into sound waves inaudible to humans, for purposes such as distance measurement. In some embodiments, the audio circuit 1107 may also include a headphone jack.
The positioning component 1108 is used to determine the current geographic location of the electronic device 1100 for navigation or LBS (Location Based Service). The positioning component 1108 may be based on the GPS (Global Positioning System) of the United States, the BeiDou system of China, the GLONASS system of Russia, or the Galileo system of the European Union.
The power supply 1109 supplies power to the components of the electronic device 1100. The power supply 1109 may be an alternating-current or direct-current supply, or a disposable or rechargeable battery. When the power supply 1109 includes a rechargeable battery, the battery may support wired or wireless charging, and may also support fast-charging technology.
In some embodiments, the electronic device 1100 further includes one or more sensors 1110. The one or more sensors 1110 include, but are not limited to, an acceleration sensor 1111, a gyroscope sensor 1112, a pressure sensor 1113, a fingerprint sensor 1114, an optical sensor 1115, and a proximity sensor 1116.
The acceleration sensor 1111 may detect the magnitude of acceleration on three coordinate axes of a coordinate system established with the electronic device 1100. For example, the acceleration sensor 1111 may be configured to detect components of the gravitational acceleration in three coordinate axes. The processor 1101 may control the touch display screen 1105 to display a user interface in a landscape view or a portrait view according to the gravitational acceleration signal collected by the acceleration sensor 1111. The acceleration sensor 1111 may also be used for acquisition of motion data of a game or a user.
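Choosing landscape or portrait from the gravity components can be sketched as comparing the two in-plane axes. This sketch assumes x is the device's short axis and y its long axis, with readings in m/s²; the 2 m/s² hysteresis margin, which keeps the view from flipping when the device is held near 45 degrees, is a hypothetical design choice.

```python
def next_orientation(current: str, gx: float, gy: float, margin: float = 2.0) -> str:
    """Return "portrait" or "landscape" from in-plane gravity components (m/s^2).

    Hypothetical hysteresis: switch only when one axis exceeds the other by `margin`;
    otherwise keep the current orientation.
    """
    if abs(gy) - abs(gx) > margin:
        return "portrait"   # gravity mostly along the long (y) axis
    if abs(gx) - abs(gy) > margin:
        return "landscape"  # gravity mostly along the short (x) axis
    return current


print(next_orientation("landscape", 0.5, 9.7))  # portrait
print(next_orientation("portrait", 6.0, 6.5))   # portrait (within the margin, unchanged)
```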
The gyroscope sensor 1112 may detect the orientation and rotation angle of the body of the electronic device 1100, and may cooperate with the acceleration sensor 1111 to capture the user's 3D motion of the electronic device 1100. Based on the data collected by the gyroscope sensor 1112, the processor 1101 may implement functions such as motion sensing (for example, changing the UI according to the user's tilting operation), image stabilization during shooting, game control, and inertial navigation.
The pressure sensor 1113 may be arranged on a side frame of the electronic device 1100 and/or beneath the touch display screen 1105. When the pressure sensor 1113 is arranged on the side frame, it can detect the user's grip signal on the electronic device 1100, and the processor 1101 performs left/right-hand recognition or shortcut operations according to that grip signal. When the pressure sensor 1113 is arranged beneath the touch display screen 1105, the processor 1101 controls operable controls on the UI according to the user's pressure operations on the touch display screen 1105. The operable controls include at least one of a button control, a scroll bar control, an icon control, and a menu control.
The fingerprint sensor 1114 is used to collect the user's fingerprint, and either the processor 1101 or the fingerprint sensor 1114 itself identifies the user from the collected fingerprint. Upon recognizing the user's identity as trusted, the processor 1101 authorizes the user to perform related sensitive operations, including unlocking the screen, viewing encrypted information, downloading software, making payments, and changing settings. The fingerprint sensor 1114 may be arranged on the front, back, or side of the electronic device 1100. When a physical button or vendor logo is provided on the electronic device 1100, the fingerprint sensor 1114 may be integrated with the physical button or vendor logo.
Optical sensor 1115 is used to collect ambient light intensity. In one embodiment, the processor 1101 may control the display brightness of the touch display screen 1105 based on the ambient light intensity collected by the optical sensor 1115. Specifically, when the ambient light intensity is high, the display brightness of the touch display screen 1105 is increased; when the ambient light intensity is low, the display brightness of the touch display screen 1105 is turned down. In another embodiment, processor 1101 may also dynamically adjust the shooting parameters of camera assembly 1106 based on the ambient light intensity collected by optical sensor 1115.
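A brightness curve that rises with ambient light could be sketched as follows. The logarithmic mapping, the 10,000 lux ceiling, and the 5% minimum level are assumptions for illustration; the disclosure only states that brightness goes up in bright surroundings and down in dim ones.

```python
import math


def display_brightness(lux: float, max_lux: float = 10_000.0, min_level: float = 0.05) -> float:
    """Map ambient illuminance (lux) to a brightness fraction in [min_level, 1.0].

    The logarithmic curve (a hypothetical choice) gives dim environments finer
    adjustment steps than bright ones.
    """
    level = math.log10(max(lux, 0.0) + 1.0) / math.log10(max_lux + 1.0)
    return min(1.0, max(min_level, level))


print(display_brightness(0.0))       # 0.05 (minimum level in darkness)
print(display_brightness(10_000.0))  # 1.0
```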
The proximity sensor 1116, also called a distance sensor, is usually arranged on the front panel of the electronic device 1100 and measures the distance between the user and the front of the electronic device 1100. In one embodiment, when the proximity sensor 1116 detects that this distance is gradually decreasing, the processor 1101 controls the touch display screen 1105 to switch from the screen-on state to the screen-off state; when the proximity sensor 1116 detects that the distance is gradually increasing, the processor 1101 controls the touch display screen 1105 to switch from the screen-off state to the screen-on state.
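The "gradually decreasing / gradually increasing" test can be sketched as a check on the trend of recent distance readings. The sketch assumes readings in centimetres, oldest first, and keeps the current state when the trend is mixed or flat; these choices and the function name are hypothetical.

```python
from typing import Sequence


def next_screen_state(current: str, distances_cm: Sequence[float]) -> str:
    """Return "off" if recent distances strictly decrease, "on" if they strictly increase.

    distances_cm holds the most recent readings, oldest first; a mixed or flat
    trend (or fewer than two readings) keeps the current state.
    """
    if len(distances_cm) < 2:
        return current
    steps = [later - earlier for earlier, later in zip(distances_cm, distances_cm[1:])]
    if all(step < 0 for step in steps):
        return "off"  # distance shrinking: user approaching the front panel
    if all(step > 0 for step in steps):
        return "on"   # distance growing: user moving away
    return current


print(next_screen_state("on", [8.0, 5.0, 2.0]))  # off
```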
Those skilled in the art will appreciate that the configuration shown in fig. 11 does not constitute a limitation of the electronic device 1100, and may include more or fewer components than those shown, or combine certain components, or employ a different arrangement of components.
In an exemplary embodiment, a computer-readable storage medium, such as a memory including instructions, is further provided; the instructions can be executed by a processor in an electronic device to perform the voice control method in the above embodiments. For example, the computer-readable storage medium may be a ROM (Read-Only Memory), a RAM (Random Access Memory), a CD-ROM, a magnetic tape, a floppy disk, an optical data storage device, or the like.
An embodiment of the present disclosure further provides a computer program product including at least one computer program which, when executed by a processor, implements the voice control method in the above embodiments.
It will be understood by those skilled in the art that all or part of the steps for implementing the above embodiments may be implemented by hardware, or may be implemented by a program instructing relevant hardware, where the program may be stored in a computer-readable storage medium, and the above-mentioned storage medium may be a read-only memory, a magnetic disk or an optical disk, etc.
The above description is merely exemplary and is not intended to limit the present disclosure. Any modification, equivalent replacement, or improvement made without departing from the spirit and scope of the present disclosure shall fall within the protection scope of the present disclosure.

Claims (12)

1. A method for voice control, the method comprising:
collecting sound when a voice control module of an electronic device is in a dormant state;
in response to collecting a first human voice signal, obtaining environment information of the environment in which the electronic device is located;
in response to determining, based on the environment information, that the first human voice signal is used to trigger the electronic device to perform a target operation, waking up the voice control module;
and executing, through the voice control module, an operation corresponding to the first human voice signal based on the first human voice signal.
2. The method of claim 1, wherein obtaining the environment information of the environment in which the electronic device is located comprises:
determining, based on the first human voice signal, the number of target users who emitted the first human voice signal;
and taking the number of target users as the environment information.
3. The method of claim 2, wherein determining, based on the first human voice signal, the number of target users who emitted the first human voice signal comprises:
determining, based on the first human voice signal, azimuth information of the target users emitting the first human voice signal relative to the electronic device, and determining the number of target users according to the azimuth information; or,
determining, based on the first human voice signal, voiceprint information included in the first human voice signal, and determining the number of target users according to the voiceprint information.
4. The method of claim 1, wherein obtaining the environment information of the environment in which the electronic device is located comprises:
obtaining noise information, wherein the noise information is collected by the electronic device while collecting the first human voice signal;
and taking the noise information as the environment information.
5. The method of claim 4, further comprising:
determining a first decibel value of the noise information;
and in response to the first decibel value being lower than a first preset decibel value, determining that the first human voice signal is used to trigger the electronic device to perform the target operation.
6. The method of claim 4, further comprising:
determining a first decibel value of the noise information and a second decibel value of the first human voice signal;
and in response to the first decibel value being smaller than the second decibel value and the ratio of the second decibel value to the first decibel value being larger than a preset value, determining that the first human voice signal is used to trigger the electronic device to perform the target operation.
7. The method of claim 1, wherein before obtaining the environment information of the environment in which the electronic device is located, the method further comprises:
performing intent recognition on the first human voice signal to obtain intent information of the first human voice signal;
determining, according to the intent information, a target application program for executing the intent information;
and in response to the electronic device including the target application program, performing the step of obtaining the environment information of the environment in which the electronic device is located.
8. The method of claim 1, wherein before obtaining the environment information of the environment in which the electronic device is located, the method further comprises:
in response to no second human voice signal having been collected within a preset time period before the first human voice signal was collected, performing the step of obtaining the environment information of the environment in which the electronic device is located.
9. The method according to any one of claims 1-8, wherein executing, through the voice control module, the operation corresponding to the first human voice signal based on the first human voice signal comprises:
sending the first human voice signal to a server through the voice control module, and receiving an operation instruction corresponding to the first human voice signal returned by the server;
and executing the operation corresponding to the operation instruction.
10. A voice control apparatus, characterized in that the apparatus comprises:
a collection module, configured to collect sound when a voice control module of an electronic device is in a dormant state;
an obtaining module, configured to obtain, in response to a first human voice signal being collected, environment information of the environment in which the electronic device is located;
a wake-up module, configured to wake up the voice control module in response to determining, based on the environment information, that the first human voice signal is used to trigger the electronic device to perform a target operation;
and an execution module, configured to execute, through the voice control module, an operation corresponding to the first human voice signal based on the first human voice signal.
11. An electronic device, comprising a processor and a memory, wherein at least one instruction is stored in the memory, and wherein the instruction is loaded and executed by the processor to implement the operations performed by the voice control method according to any one of claims 1 to 9.
12. A computer-readable storage medium having stored therein at least one instruction, which is loaded and executed by a processor to perform operations performed by the voice control method of any one of claims 1 to 9.
CN202010437874.9A 2020-05-21 2020-05-21 Voice control method and device, electronic equipment and storage medium Pending CN111681655A (en)

Publications (1)

Publication Number Publication Date
CN111681655A true CN111681655A (en) 2020-09-18

Family

ID=72452908

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010437874.9A Pending CN111681655A (en) 2020-05-21 2020-05-21 Voice control method and device, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN111681655A (en)

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9542941B1 (en) * 2015-10-01 2017-01-10 Lenovo (Singapore) Pte. Ltd. Situationally suspending wakeup word to enable voice command input
CN107977183A (en) * 2017-11-16 2018-05-01 百度在线网络技术(北京)有限公司 voice interactive method, device and equipment
CN108520743A (en) * 2018-02-02 2018-09-11 百度在线网络技术(北京)有限公司 Sound control method, smart machine and the computer-readable medium of smart machine
CN108712681A (en) * 2018-05-30 2018-10-26 深圳市零度智控科技有限公司 Smart television sound control method, smart television and readable storage medium storing program for executing
CN108766438A (en) * 2018-06-21 2018-11-06 Oppo广东移动通信有限公司 Man-machine interaction method, device, storage medium and intelligent terminal
CN109671426A (en) * 2018-12-06 2019-04-23 珠海格力电器股份有限公司 Voice control method and device, storage medium and air conditioner
CN110232916A (en) * 2019-05-10 2019-09-13 平安科技(深圳)有限公司 Method of speech processing, device, computer equipment and storage medium
CN110827821A (en) * 2019-12-04 2020-02-21 三星电子(中国)研发中心 Voice interaction device and method and computer readable storage medium

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112259097A (en) * 2020-10-27 2021-01-22 深圳康佳电子科技有限公司 Control method for voice recognition and computer equipment
CN112365899A (en) * 2020-10-30 2021-02-12 北京小米松果电子有限公司 Voice processing method, device, storage medium and terminal equipment
CN113335205A (en) * 2021-06-09 2021-09-03 东风柳州汽车有限公司 Voice wake-up method, device, equipment and storage medium
CN113335205B (en) * 2021-06-09 2022-06-03 东风柳州汽车有限公司 Voice wake-up method, device, equipment and storage medium
CN113778226A (en) * 2021-08-26 2021-12-10 江西恒必达实业有限公司 Infrared AI intelligent glasses based on speech recognition technology control intelligence house
WO2024103893A1 (en) * 2022-11-16 2024-05-23 荣耀终端有限公司 Method for waking up application program, and electronic device

Similar Documents

Publication Publication Date Title
CN111933112B (en) Awakening voice determination method, device, equipment and medium
CN110971930A (en) Live virtual image broadcasting method, device, terminal and storage medium
CN111477225B (en) Voice control method and device, electronic equipment and storage medium
CN111681655A (en) Voice control method and device, electronic equipment and storage medium
CN111445901B (en) Audio data acquisition method and device, electronic equipment and storage medium
CN111862972B (en) Voice interaction service method, device, equipment and storage medium
CN111613213B (en) Audio classification method, device, equipment and storage medium
CN111681654A (en) Voice control method and device, electronic equipment and storage medium
CN114594923A (en) Control method, device and equipment of vehicle-mounted terminal and storage medium
CN113744736B (en) Command word recognition method and device, electronic equipment and storage medium
CN113362836B (en) Vocoder training method, terminal and storage medium
CN108831423B (en) Method, device, terminal and storage medium for extracting main melody tracks from audio data
CN111986700B (en) Method, device, equipment and storage medium for triggering contactless operation
CN114333774A (en) Speech recognition method, speech recognition device, computer equipment and storage medium
CN111341317B (en) Method, device, electronic equipment and medium for evaluating wake-up audio data
CN113782025B (en) Speech recognition method, device, terminal and storage medium
CN110992954A (en) Method, device, equipment and storage medium for voice recognition
CN113160802B (en) Voice processing method, device, equipment and storage medium
CN111028846B (en) Method and device for registration of wake-up-free words
CN115035187A (en) Sound source direction determining method, device, terminal, storage medium and product
CN113162837B (en) Voice message processing method, device, equipment and storage medium
CN114299997A (en) Audio data processing method and device, electronic equipment, storage medium and product
CN114595019A (en) Theme setting method, device and equipment of application program and storage medium
CN115166633B (en) Sound source direction determining method, device, terminal and storage medium
CN112749583A (en) Face image grouping method and device, computer equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination