
WO2022068694A1 - Electronic device and wake-up method thereof - Google Patents

Electronic device and wake-up method thereof

Info

Publication number
WO2022068694A1
Authority
WO
WIPO (PCT)
Prior art keywords
electronic device
voiceprint
wake
orientation
azimuth
Application number
PCT/CN2021/120305
Other languages
French (fr)
Chinese (zh)
Inventor
孙渊
屈伸
许天亮
Original Assignee
华为技术有限公司
Application filed by 华为技术有限公司
Publication of WO2022068694A1

Classifications

    • G PHYSICS
    • G01 MEASURING; TESTING
    • G01S RADIO DIRECTION-FINDING; RADIO NAVIGATION; DETERMINING DISTANCE OR VELOCITY BY USE OF RADIO WAVES; LOCATING OR PRESENCE-DETECTING BY USE OF THE REFLECTION OR RERADIATION OF RADIO WAVES; ANALOGOUS ARRANGEMENTS USING OTHER WAVES
    • G01S5/00 Position-fixing by co-ordinating two or more direction or position line determinations; Position-fixing by co-ordinating two or more distance determinations
    • G01S5/18 Position-fixing by co-ordinating two or more direction or position line determinations; Position-fixing by co-ordinating two or more distance determinations using ultrasonic, sonic, or infrasonic waves
    • G PHYSICS
    • G01 MEASURING; TESTING
    • G01S RADIO DIRECTION-FINDING; RADIO NAVIGATION; DETERMINING DISTANCE OR VELOCITY BY USE OF RADIO WAVES; LOCATING OR PRESENCE-DETECTING BY USE OF THE REFLECTION OR RERADIATION OF RADIO WAVES; ANALOGOUS ARRANGEMENTS USING OTHER WAVES
    • G01S5/00 Position-fixing by co-ordinating two or more direction or position line determinations; Position-fixing by co-ordinating two or more distance determinations
    • G01S5/18 Position-fixing by co-ordinating two or more direction or position line determinations; Position-fixing by co-ordinating two or more distance determinations using ultrasonic, sonic, or infrasonic waves
    • G01S5/22 Position of source determined by co-ordinating a plurality of position lines defined by path-difference measurements
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/22 Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/26 Speech to text systems
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L17/00 Speaker identification or verification techniques
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04W WIRELESS COMMUNICATION NETWORKS
    • H04W4/00 Services specially adapted for wireless communication networks; Facilities therefor
    • H04W4/02 Services making use of location information
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04W WIRELESS COMMUNICATION NETWORKS
    • H04W4/00 Services specially adapted for wireless communication networks; Facilities therefor
    • H04W4/02 Services making use of location information
    • H04W4/021 Services related to particular areas, e.g. point of interest [POI] services, venue services or geofences

Definitions

  • the present application relates to the field of terminal technologies, and in particular, to an electronic device and a wake-up method thereof.
  • Electronic devices can perform functions through voice interaction with the user.
  • Such electronic devices include pickups (e.g., microphone arrays) and speakers (e.g., loudspeakers), and therefore have sound pickup and playback functions.
  • the electronic device needs to be woken up. By waking up, the electronic device can enter the working state from the standby state.
  • the electronic device determines whether to wake up by recognizing whether the received sound contains a preset wake-up word.
  • take a smart speaker whose wake-up word is "Xiaoyi Xiaoyi" as an example. If the user utters a sound containing "Xiaoyi Xiaoyi", the smart speaker detects "Xiaoyi Xiaoyi" in the received sound and wakes up. Sometimes the smart speaker also plays a wake-up response, such as "Xiaoyi is here, what can I do for you", to interact with the user by voice. However, in some scenarios the user or another device emits a sound that does not contain "Xiaoyi Xiaoyi", yet the smart speaker is woken up by mistake.
  • for example, the user is watching TV.
  • the user does not say "Xiaoyi Xiaoyi", and the sound from the TV does not contain "Xiaoyi Xiaoyi", but the smart speaker is woken up by mistake.
  • this disturbs the user's normal life and requires the user to turn the smart speaker off again, which results in a poor user experience.
  • the present application provides an electronic device and a wake-up method thereof, which can reduce the false wake-up rate of the electronic device and improve user experience.
  • in a first aspect, a wake-up method is provided.
  • the method is applied to an electronic device comprising a pickup and a speaker, the pickup including a plurality of microphones.
  • the method includes: receiving a sound; calculating a wake-up word confidence of the sound, where the wake-up word confidence indicates the probability that the sound includes a wake-up word; after the wake-up word confidence is greater than or equal to a first threshold, calculating the sound source orientation of the sound; after the sound source orientation matches a first orientation in a first orientation set, and after the first orientation confidence corresponding to the matched first orientation is greater than or equal to a third threshold, waking up the electronic device; or, after the first orientation confidence corresponding to the matched first orientation is less than the third threshold, not waking up the electronic device.
  • the wake-up word is used to wake up the electronic device;
  • the sound source orientation is the direction and position of the sound source relative to the electronic device;
  • the first orientation set includes M first orientation elements, and each first orientation element includes a first orientation and a first orientation confidence;
  • the first orientation is the direction and position, relative to the electronic device, of a sound source that has woken up the electronic device, and indicates that the electronic device has been woken up in the first orientation;
  • the first orientation confidence indicates the confidence that the electronic device has been woken up in the first orientation;
  • M is a positive integer greater than or equal to 1.
  • calculating the sound source orientation of the sound after the wake-up word confidence is greater than or equal to the first threshold includes: after the wake-up word confidence is greater than or equal to the first threshold and less than a second threshold, calculating the sound source orientation corresponding to the sound. In this way, by setting the second threshold, only the cases in which the wake-up word confidence lies between the first threshold and the second threshold are singled out for further checking, so that the processing efficiency of the electronic device can be improved.
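  • The two-threshold gating described above can be sketched as follows. This is a minimal illustration, not the applicant's implementation: the names `Gate` and `gate_wake_word` and the example threshold values are hypothetical. Treating a confidence below the first threshold as "ignore" is an assumption, and the "wake" branch follows the statement elsewhere in this description that the device is woken up once the confidence reaches the second threshold.

```python
from enum import Enum


class Gate(Enum):
    IGNORE = "ignore"   # assumption: below the first threshold nothing further happens
    VERIFY = "verify"   # between the thresholds: compute the sound source orientation next
    WAKE = "wake"       # at or above the second threshold: wake up directly


def gate_wake_word(confidence: float, first_threshold: float, second_threshold: float) -> Gate:
    """Route a wake-up word confidence through the first and second thresholds."""
    if first_threshold > second_threshold:
        raise ValueError("first_threshold must not exceed second_threshold")
    if confidence >= second_threshold:
        return Gate.WAKE
    if confidence >= first_threshold:
        return Gate.VERIFY
    return Gate.IGNORE


if __name__ == "__main__":
    # Hypothetical thresholds: 0.5 (first) and 0.9 (second).
    for c in (0.3, 0.7, 0.95):
        print(c, gate_wake_word(c, 0.5, 0.9).value)
```

  • Only the middle band pays for orientation (and possibly voiceprint) processing, which is where the claimed efficiency gain would come from.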
  • the sound source orientation matching a first orientation in the first orientation set includes: the angular deviation between the direction of the sound source orientation relative to the electronic device and the direction of a first orientation in the first orientation set relative to the electronic device is within a preset fourth threshold; and the position deviation between the position of the sound source orientation relative to the electronic device and the position of that first orientation relative to the electronic device is within a preset fifth threshold.
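  • A minimal sketch of this matching rule and of the first-aspect decision that depends on it is given below. The data layout is an assumption made for illustration: an orientation is modelled as an angle in degrees plus a 2-D position in metres, angular deviation is taken modulo 360 degrees, and position deviation is Euclidean distance. All names (`Orientation`, `OrientationElement`, `matches`, `decide_wake`) and numeric values are hypothetical.

```python
import math
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Orientation:
    angle_deg: float  # direction of the sound source relative to the device
    x_m: float        # position of the sound source relative to the device
    y_m: float


@dataclass(frozen=True)
class OrientationElement:
    orientation: Orientation
    confidence: float  # first orientation confidence


def angular_deviation(a_deg: float, b_deg: float) -> float:
    """Smallest absolute difference between two directions, in degrees."""
    d = abs(a_deg - b_deg) % 360.0
    return min(d, 360.0 - d)


def matches(source: Orientation, ref: Orientation,
            fourth_threshold_deg: float, fifth_threshold_m: float) -> bool:
    """Both the angular and the position deviation must be within their thresholds."""
    angle_ok = angular_deviation(source.angle_deg, ref.angle_deg) <= fourth_threshold_deg
    pos_ok = math.hypot(source.x_m - ref.x_m, source.y_m - ref.y_m) <= fifth_threshold_m
    return angle_ok and pos_ok


def decide_wake(source: Orientation, first_set: Sequence[OrientationElement],
                fourth_threshold_deg: float, fifth_threshold_m: float,
                third_threshold: float) -> Optional[bool]:
    """True: wake up; False: do not wake up; None: no first orientation matched."""
    for element in first_set:
        if matches(source, element.orientation, fourth_threshold_deg, fifth_threshold_m):
            return element.confidence >= third_threshold
    return None


if __name__ == "__main__":
    sofa = OrientationElement(Orientation(30.0, 1.0, 0.6), confidence=0.8)
    heard = Orientation(33.0, 1.1, 0.6)
    print(decide_wake(heard, [sofa], 10.0, 0.5, third_threshold=0.6))
```

  • Returning `None` for "no match" keeps that case separate from an explicit "do not wake up", so a caller can fall back to another check such as a voiceprint comparison.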
  • the method further includes: extracting a voiceprint from the sound; after the voiceprint matches a first voiceprint in a first voiceprint set, and after the first voiceprint confidence corresponding to that first voiceprint is greater than or equal to a preset sixth threshold, waking up the electronic device; or, after the first voiceprint confidence corresponding to that first voiceprint is less than the preset sixth threshold, not waking up the electronic device.
  • the first voiceprint set includes L voiceprint elements, and each voiceprint element includes a first voiceprint and a first voiceprint confidence level.
  • the first voiceprint is used to represent the voiceprint for waking up the electronic device, and the first voiceprint confidence is used to represent the probability that the first voiceprint wakes up the electronic device; L is a positive integer greater than or equal to 1. In this way, if the possible false awakening cannot be screened out according to the first orientation set, the possible false awakening is further screened through the first voiceprint set, thereby reducing the false awakening probability of the electronic device and improving the user experience.
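  • The voiceprint check can be sketched as below. The text does not say how voiceprints are represented or compared, so this sketch assumes a voiceprint is a fixed-length feature vector and that "matching" means cosine similarity at or above a hypothetical `similarity_min`. The names `VoiceprintElement` and `voiceprint_decision` are likewise illustrative only.

```python
import math
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class VoiceprintElement:
    voiceprint: Sequence[float]  # assumed representation: a feature vector
    confidence: float            # first voiceprint confidence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def voiceprint_decision(voiceprint: Sequence[float],
                        first_voiceprint_set: Sequence[VoiceprintElement],
                        similarity_min: float,
                        sixth_threshold: float) -> Optional[bool]:
    """True: wake up; False: do not wake up; None: no first voiceprint matched."""
    best = max(first_voiceprint_set,
               key=lambda e: cosine_similarity(voiceprint, e.voiceprint),
               default=None)
    if best is None or cosine_similarity(voiceprint, best.voiceprint) < similarity_min:
        return None
    return best.confidence >= sixth_threshold


if __name__ == "__main__":
    owner = VoiceprintElement([1.0, 0.0, 0.0], confidence=0.9)
    print(voiceprint_decision([0.9, 0.1, 0.0], [owner], 0.8, 0.7))
```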
  • the method further includes: updating the first orientation set and the first voiceprint set.
  • after the wake-up word confidence is greater than or equal to the second threshold, the electronic device is woken up, and the first orientation set and the first voiceprint set are updated.
  • waking up the electronic device and updating the first orientation set and the first voiceprint set after the wake-up word confidence is greater than or equal to the second threshold includes: after the wake-up word confidence is greater than or equal to the second threshold, waking up the electronic device and creating the first orientation set and the first voiceprint set; the orientation from which the electronic device was woken up is included in the first orientation set, and the voiceprint that woke up the electronic device is included in the first voiceprint set; an initial orientation confidence is assigned to each orientation included in the first orientation set, and an initial voiceprint confidence is assigned to each voiceprint included in the first voiceprint set.
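  • The bootstrapping of the two sets on a high-confidence wake-up can be sketched as follows. The container types, the function name `handle_confident_wake` and the initial confidence value of 0.5 are assumptions chosen for illustration; the text only states that initial confidences are assigned, not what they are.

```python
from dataclasses import dataclass, field
from typing import Any, List


@dataclass
class Element:
    value: Any         # an orientation or a voiceprint
    confidence: float  # its orientation or voiceprint confidence


@dataclass
class WakeHistory:
    first_orientation_set: List[Element] = field(default_factory=list)
    first_voiceprint_set: List[Element] = field(default_factory=list)


def handle_confident_wake(history: WakeHistory, wake_word_confidence: float,
                          second_threshold: float, orientation: Any, voiceprint: Any,
                          initial_orientation_confidence: float = 0.5,
                          initial_voiceprint_confidence: float = 0.5) -> bool:
    """Wake up and record the orientation and voiceprint when the confidence reaches
    the second threshold. Returns True if the device should wake up here."""
    if wake_word_confidence < second_threshold:
        return False  # handled by the orientation and voiceprint checks instead
    history.first_orientation_set.append(Element(orientation, initial_orientation_confidence))
    history.first_voiceprint_set.append(Element(voiceprint, initial_voiceprint_confidence))
    return True


if __name__ == "__main__":
    history = WakeHistory()
    woke = handle_confident_wake(history, 0.95, 0.9, orientation=(30.0, 1.0, 0.6),
                                 voiceprint=[1.0, 0.0, 0.0])
    print(woke, len(history.first_orientation_set), len(history.first_voiceprint_set))
```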
  • in a second aspect, a wake-up method is provided.
  • the method is applied to an electronic device comprising a pickup and a speaker, the pickup including a plurality of microphones.
  • the method includes: receiving a sound; calculating a wake-up word confidence of the sound, where the wake-up word confidence indicates the probability that the sound includes a wake-up word; after the wake-up word confidence is greater than or equal to a first threshold, calculating the sound source orientation of the sound; after the sound source orientation matches a second orientation in a second orientation set, and after the second orientation confidence corresponding to the matched second orientation is greater than or equal to a seventh threshold, waking up the electronic device; or, after the second orientation confidence corresponding to the matched second orientation is less than the seventh threshold, not waking up the electronic device.
  • the wake-up word is used to wake up the electronic device;
  • the sound source orientation is the direction and position of the sound source relative to the electronic device;
  • the second orientation set includes N second orientation elements, and each second orientation element includes a second orientation and a second orientation confidence;
  • the second orientation is the direction and position, relative to the electronic device, of a sound source that did not wake up the electronic device, and indicates that the electronic device was not woken up in the second orientation; the second orientation confidence indicates the confidence that the electronic device has not been woken up in the second orientation;
  • N is a positive integer greater than or equal to 1.
  • the sound source orientation matching a second orientation in the second orientation set includes: the angular deviation between the direction of the sound source orientation relative to the electronic device and the direction of a second orientation in the second orientation set relative to the electronic device is within a preset eighth threshold; and the position deviation between the position of the sound source orientation relative to the electronic device and the position of that second orientation relative to the electronic device is within a preset ninth threshold.
  • the method further includes: after the sound source orientation does not match any second orientation in the second orientation set, extracting a voiceprint from the sound; after the voiceprint does not match any first voiceprint in the first voiceprint set, updating the second orientation set.
  • the first voiceprint set includes L voiceprint elements, and each voiceprint element includes a first voiceprint and a first voiceprint confidence; the first voiceprint represents a voiceprint that wakes up the electronic device, and the first voiceprint confidence represents the probability that the first voiceprint wakes up the electronic device; L is a positive integer greater than or equal to 1. In this way, if a possible false wake-up cannot be screened out according to the second orientation set, it is further screened through the first voiceprint set, thereby reducing the false wake-up probability of the electronic device and improving the user experience.
  • the method further includes: after the sound source orientation does not match any second orientation in the second orientation set, extracting a voiceprint from the sound; after the voiceprint matches a first voiceprint in the first voiceprint set, and after the first voiceprint confidence corresponding to that first voiceprint is greater than or equal to a preset tenth threshold, waking up the electronic device; or, after the first voiceprint confidence corresponding to that first voiceprint is less than the preset tenth threshold, not waking up the electronic device, and updating the second orientation set.
  • the first voiceprint set includes L voiceprint elements, and each voiceprint element includes a first voiceprint and a first voiceprint confidence; the first voiceprint confidence indicates the probability that the first voiceprint wakes up the electronic device, and the first voiceprint represents a voiceprint that wakes up the electronic device; L is a positive integer greater than or equal to 1.
  • the method further includes: updating the first voiceprint set; after not waking up the electronic device, the method further includes: updating the second orientation set.
  • after the wake-up word confidence is greater than or equal to the second threshold, the electronic device is woken up, and the first voiceprint set is updated.
  • by setting the second threshold, only the cases in which the wake-up word confidence lies between the first threshold and the second threshold are singled out for further checking, so that the processing efficiency of the electronic device can be improved.
  • in a third aspect, a wake-up method is provided.
  • the method is applied to an electronic device comprising a pickup and a speaker, the pickup including a plurality of microphones.
  • the method includes: receiving a sound; calculating a wake-up word confidence of the sound, where the wake-up word confidence indicates the probability that the sound includes a wake-up word; after the wake-up word confidence is greater than or equal to a first threshold, calculating the sound source orientation of the sound; after the sound source orientation matches a second orientation in a second orientation set, and after the sound source orientation does not match any first orientation in a first orientation set, and after the second orientation confidence corresponding to the matched second orientation is greater than or equal to an eleventh threshold, waking up the electronic device; or, after the second orientation confidence corresponding to the matched second orientation is less than the eleventh threshold, not waking up the electronic device.
  • the wake-up word is used to wake up the electronic device;
  • the sound source orientation is the direction and position of the sound source relative to the electronic device;
  • the first orientation set includes M first orientation elements, and each first orientation element includes a first orientation and a first orientation confidence;
  • the first orientation is the direction and position, relative to the electronic device, of a sound source that has woken up the electronic device, and indicates that the electronic device has been woken up in the first orientation;
  • the first orientation confidence indicates the confidence that the electronic device has been woken up in the first orientation;
  • the second orientation set includes N second orientation elements, and each second orientation element includes a second orientation and a second orientation confidence;
  • the second orientation is the direction and position, relative to the electronic device, of a sound source that did not wake up the electronic device, and indicates that the electronic device was not woken up in the second orientation;
  • the second orientation confidence indicates the confidence that the electronic device has not been woken up in the second orientation;
  • M and N are both positive integers greater than or equal to 1.
  • the sound source orientation matching a second orientation in the second orientation set includes: the angular deviation between the direction of the sound source orientation relative to the electronic device and the direction of a second orientation in the second orientation set relative to the electronic device is within a preset twelfth threshold; and the position deviation between the position of the sound source orientation relative to the electronic device and the position of that second orientation relative to the electronic device is within a preset thirteenth threshold.
  • the sound source orientation not matching any first orientation in the first orientation set includes: the angular deviation between the direction of the sound source orientation relative to the electronic device and the direction of any first orientation in the first orientation set relative to the electronic device is not within a preset fourteenth threshold; and the position deviation between the position of the sound source orientation relative to the electronic device and the position of any first orientation in the first orientation set relative to the electronic device is not within a preset fifteenth threshold.
  • the method further includes: after the sound source orientation matches a first orientation in the first orientation set, and after the sound source orientation does not match any second orientation in the second orientation set, and after the first orientation confidence corresponding to the matched first orientation is greater than or equal to a sixteenth threshold, waking up the electronic device; or, after the first orientation confidence corresponding to the matched first orientation is less than the sixteenth threshold, not waking up the electronic device.
  • the sound source orientation matching a first orientation in the first orientation set includes: the angular deviation between the direction of the sound source orientation relative to the electronic device and the direction of a first orientation in the first orientation set relative to the electronic device is within the preset fourteenth threshold; and the position deviation between the position of the sound source orientation relative to the electronic device and the position of that first orientation relative to the electronic device is within the preset fifteenth threshold.
  • the sound source orientation not matching any second orientation in the second orientation set includes: the angular deviation between the direction of the sound source orientation relative to the electronic device and the direction of any second orientation in the second orientation set relative to the electronic device is not within the preset twelfth threshold; and the position deviation between the position of the sound source orientation relative to the electronic device and the position of any second orientation in the second orientation set relative to the electronic device is not within the preset thirteenth threshold.
  • the method further includes: after the sound source orientation does not match any second orientation in the second orientation set, and after the sound source orientation does not match any first orientation in the first orientation set, extracting a voiceprint from the sound; after the voiceprint matches a first voiceprint in the first voiceprint set, and after the first voiceprint confidence corresponding to that first voiceprint is greater than or equal to the preset sixteenth threshold, waking up the electronic device, and updating the first orientation set and the first voiceprint set; or, after the first voiceprint confidence corresponding to that first voiceprint is less than the preset sixteenth threshold, not waking up the electronic device, and updating the second orientation set.
  • the first voiceprint set includes L voiceprint elements, and each voiceprint element includes a first voiceprint and a first voiceprint confidence; the first voiceprint confidence indicates the probability that the first voiceprint wakes up the electronic device, and the first voiceprint represents a voiceprint that wakes up the electronic device; L is a positive integer greater than or equal to 1. In this way, if a possible false wake-up cannot be screened out according to the first orientation set and the second orientation set, it is further screened through the first voiceprint set, thereby reducing the false wake-up probability of the electronic device and improving the user experience.
  • the method further includes: after the voiceprint does not match any one of the first voiceprints in the first voiceprint set, updating the second orientation set.
  • the method further includes: after waking up the electronic device, updating the first set of orientations; after not waking up the electronic device, updating the second set of orientations.
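  • The third aspect distinguishes its branches by which of the two orientation sets the sound source orientation matches. The sketch below only captures that routing, that is, which confidence is consulted in each case. The enum and function names are hypothetical, and the case in which both sets match is not described in the text above, so it is marked as unspecified rather than guessed.

```python
from enum import Enum


class Check(Enum):
    SECOND_ORIENTATION_CONFIDENCE = "compare second orientation confidence"
    FIRST_ORIENTATION_CONFIDENCE = "compare first orientation confidence"
    VOICEPRINT = "extract and compare voiceprint"
    UNSPECIFIED = "not covered by the described branches"


def route_third_aspect(matches_first: bool, matches_second: bool) -> Check:
    """Select the check that applies for a given pair of orientation-match results."""
    if matches_second and not matches_first:
        return Check.SECOND_ORIENTATION_CONFIDENCE
    if matches_first and not matches_second:
        return Check.FIRST_ORIENTATION_CONFIDENCE
    if not matches_first and not matches_second:
        return Check.VOICEPRINT
    return Check.UNSPECIFIED


if __name__ == "__main__":
    for first in (False, True):
        for second in (False, True):
            print(first, second, route_third_aspect(first, second).name)
```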
  • in a fourth aspect, an electronic device is provided. The electronic device includes a pickup and a speaker, and the pickup includes a plurality of microphones. The electronic device further includes: a processor; a memory; and a computer program, wherein the computer program is stored in the memory, and when the computer program is executed by the processor, the electronic device executes the method described in the first aspect and any implementation of the first aspect, the second aspect and any implementation of the second aspect, and the third aspect and any implementation of the third aspect.
  • a computer-readable storage medium is provided. The computer-readable storage medium includes a computer program that, when run on an electronic device, causes the electronic device to perform the method described in the first aspect and any implementation of the first aspect, the second aspect and any implementation of the second aspect, and the third aspect and any implementation of the third aspect, wherein the electronic device includes a pickup and a speaker, and the pickup includes a plurality of microphones.
  • a computer program product is provided. When the computer program product runs on a computer, it causes the computer to execute the method described in the first aspect and any implementation of the first aspect, the second aspect and any implementation of the second aspect, and the third aspect and any implementation of the third aspect.
  • FIG. 1 is a schematic diagram of a hardware structure of an electronic device provided by an embodiment of the present application.
  • FIG. 2 is a schematic diagram of a software structure of an electronic device provided by an embodiment of the present application.
  • FIG. 3 is a schematic diagram of a scenario of a wake-up method provided by an embodiment of the present application.
  • FIG. 4 is a schematic diagram of a graphical user interface set by a user in a wake-up method provided by an embodiment of the present application
  • FIG. 5 is a flowchart of an embodiment of a wake-up method provided by an embodiment of the present application.
  • FIG. 6 is a flowchart of another embodiment of the wake-up method provided by the embodiment of the present application.
  • FIG. 7 is a flowchart of another embodiment of the wake-up method provided by the embodiment of the present application.
  • FIG. 8 is a flowchart of another embodiment of the wake-up method provided by the embodiment of the present application.
  • FIG. 9 is a flowchart of another embodiment of the wake-up method provided by the embodiment of the present application.
  • FIG. 10 is a flowchart of another embodiment of the wake-up method provided by the embodiment of the present application.
  • FIG. 11 is a schematic structural composition diagram of an electronic device provided by an embodiment of the present application.
  • references in this specification to "one embodiment" or "some embodiments" and the like mean that a particular feature, structure, or characteristic described in connection with the embodiment is included in one or more embodiments of the present application.
  • appearances of the phrases "in one embodiment," "in some embodiments," "in other embodiments," etc. in various places in this specification do not necessarily all refer to the same embodiment, but mean "one or more but not all embodiments" unless specifically emphasized otherwise.
  • the terms "including", "comprising", "having" and their variants mean "including but not limited to" unless specifically emphasized otherwise.
  • the term “connected” includes both direct and indirect connections unless otherwise specified.
  • the probability of false wake-up generated by the electronic device is reduced by optimizing the wake-up word model preset in the electronic device.
  • the main function of the wake-up word model is to detect the wake-up word from the sound picked up by the electronic device, and obtain the probability that the sound contains the wake-up word.
  • the wake-up word model is a trained machine learning model. For example, a model for detecting wake-up words can be established in advance, and a wake-up word model can be obtained by training the model with samples.
  • the above-mentioned pre-established model may be a neural network model, a Gaussian mixture model, a hidden Markov model, or the like.
  • the above-mentioned samples may be sounds containing wake-up words, or phoneme sequences of sounds containing wake-up words, or audio features of sounds containing wake-up words, or the like.
  • Voices containing wake words can be recorded by different people in different scenarios.
  • Using the sounds containing wake words recorded by different people in different scenarios can enable the trained wake word model to detect wake words in sounds in various scenarios.
  • the sounds recorded in different scenarios do not only include wake words, but may include noise (such as non-wake words). In this way, if the sounds recorded in different scenarios are used as samples to train the wake-up word model, the wake-up word model will be polluted, so that the wake-up word model may recognize sounds including non-wake words as wake-up word sounds, resulting in false wake-up.
  • for example, the wake-up word model may detect sounds picked up by the smart speaker whose pronunciation is similar to, or even completely different from, "Xiaoyi Xiaoyi" as sounds containing the wake-up word, thus causing the smart speaker to be woken up by mistake.
  • embodiments of the present application provide an electronic device and a wake-up method, which can reduce the false wake-up probability of the electronic device and improve user experience.
  • the electronic device provided by the embodiment of the present application is an electronic device with a function of picking up sound and a function of broadcasting voice.
  • examples include smart speakers, smartphones, tablet computers, personal computers (PCs), wearable devices (such as smart glasses, smart watches, and smart bracelets), smart home appliances such as smart TVs and smart screens, intelligent connected vehicles (ICVs), smart/intelligent cars, and in-vehicle equipment.
  • FIG. 1 shows a schematic structural diagram of an electronic device 100 .
  • the electronic device 100 may include a processor 110, an external memory interface 120, an internal memory 121, a universal serial bus (USB) interface 130, a charge management module 140, a power management module 141, a battery 142, an antenna 1, an antenna 2, a mobile communication module 150, a wireless communication module 160, an audio module 170, a speaker 170A, a receiver 170B, a microphone 170C, a headphone jack 170D, a sensor module 180, buttons 190, a motor 191, an indicator 192, a camera 193, a display screen 194, a subscriber identification module (SIM) card interface 195, and so on.
  • the structures illustrated in the embodiments of the present invention do not constitute a specific limitation on the electronic device 100 .
  • the electronic device 100 may include more or fewer components than shown, or combine some components, or split some components, or arrange the components differently.
  • the illustrated components may be implemented in hardware, software, or a combination of software and hardware.
  • the electronic device 100 may be a smart speaker.
  • the smart speaker may include: a processor 110, an internal memory 121, a speaker 170A, and a microphone 170C.
  • the processor 110 may include one or more processing units. For example, the processor 110 may include some or all of an application processor (AP), a modem processor, a graphics processing unit (GPU), an image signal processor (ISP), a controller, a video codec, a digital signal processor (DSP), a baseband processor, and/or a neural-network processing unit (NPU).
  • different processing units may be independent devices, or may be integrated in one or more processors.
  • the processor 110 may include one or more interfaces.
  • the interface may include an integrated circuit built-in audio (inter-integrated circuit sound, I2S) interface, a pulse code modulation (pulse code modulation, PCM) interface, and the like.
  • the I2S interface can be used for audio communication.
  • the processor 110 may contain multiple sets of I2S buses.
  • the processor 110 may be coupled with the audio module 170 through an I2S bus to implement communication between the processor 110 and the audio module 170 .
  • the audio module 170 can transmit sound to the wireless communication module 160 through the I2S interface, so as to realize the function of answering calls through a Bluetooth headset.
  • the PCM interface can also be used for audio communications, sampling, quantizing and encoding analog signals.
  • the audio module 170 and the wireless communication module 160 may be coupled through a PCM bus interface.
  • the audio module 170 can also transmit sound to the wireless communication module 160 through the PCM interface, so as to realize the function of answering calls through a Bluetooth headset. Both the I2S interface and the PCM interface can be used for audio communication.
  • the interface connection relationship between the modules illustrated in the embodiment of the present invention is only a schematic illustration, and does not constitute a structural limitation of the electronic device 100 .
  • The electronic device 100 may also adopt interface connection manners different from those in the foregoing embodiments, or a combination of multiple interface connection manners.
  • the electronic device 100 implements a display function through a GPU, a display screen 194, an application processor, and the like.
  • the GPU is a microprocessor for image processing, and is connected to the display screen 194 and the application processor.
  • the GPU is used to perform mathematical and geometric calculations for graphics rendering.
  • Processor 110 may include one or more GPUs that execute program instructions to generate or alter display information.
  • the external memory interface 120 can be used to connect an external memory card, such as a Micro SD card, to expand the storage capacity of the electronic device 100 .
  • The external memory card communicates with the processor 110 through the external memory interface 120 to realize the data storage function, for example, to save files such as music and video in the external memory card.
  • Internal memory 121 may be used to store computer executable program code, which includes instructions.
  • the internal memory 121 may include a storage program area and a storage data area.
  • the storage program area can store an operating system, an application program required for at least one function (such as a sound playback function, an image playback function, etc.), and the like.
  • the storage data area may store data (such as audio data, phone book, etc.) created during the use of the electronic device 100 and the like.
  • the internal memory 121 may include high-speed random access memory, and may also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, universal flash storage (UFS), and the like.
  • the processor 110 executes various functional applications and data processing of the electronic device 100 by executing instructions stored in the internal memory 121 and/or instructions stored in a memory provided in the processor.
  • The electronic device 100 may implement audio functions, such as music playback and recording, through an audio module 170, a speaker 170A, a receiver 170B, a microphone 170C, an earphone interface 170D, an application processor, and the like.
  • Audio module 170 is used to convert digital audio information to analog sound output, and also to convert analog audio input to digital sound. Audio module 170 may also be used to encode and decode sound. In some embodiments, the audio module 170 may be provided in the processor 110 , or some functional modules of the audio module 170 may be provided in the processor 110 .
  • The speaker 170A, also referred to as a "loudspeaker", is used to convert audio electrical signals into sound signals.
  • Music or a hands-free call can be listened to through the speaker 170A of the electronic device 100.
  • The receiver 170B, also referred to as an "earpiece", is used to convert audio electrical signals into sound signals.
  • A voice can be heard by placing the receiver 170B close to the human ear.
  • The microphone 170C, also called a "mic", is used to convert sound signals into electrical signals.
  • The user can speak with the mouth close to the microphone 170C to input a sound signal into the microphone 170C.
  • the electronic device 100 may be provided with at least one microphone 170C. In other embodiments, the electronic device 100 may be provided with two microphones 170C, which can implement a noise reduction function in addition to collecting sound signals. In other embodiments, the electronic device 100 may further be provided with three, four or more microphones 170C to collect sound signals, reduce noise, identify sound sources, and implement directional recording functions.
  • the earphone jack 170D is used to connect wired earphones.
  • The earphone interface 170D can be the USB interface 130, a 3.5 mm open mobile terminal platform (OMTP) standard interface, or a cellular telecommunications industry association of the USA (CTIA) standard interface.
  • Motor 191 can generate vibrating cues.
  • the motor 191 can be used for vibrating alerts for incoming calls, and can also be used for touch vibration feedback.
  • touch operations acting on different applications can correspond to different vibration feedback effects.
  • the motor 191 can also correspond to different vibration feedback effects for touch operations on different areas of the display screen 194 .
  • Different application scenarios (for example, time reminders, receiving information, alarm clocks, and games) can also correspond to different vibration feedback effects.
  • the touch vibration feedback effect can also support customization.
  • the indicator 192 can be an indicator light, which can be used to indicate the charging state, the change of the power, and can also be used to indicate a message, a missed call, a notification, and the like.
  • the software system of the electronic device 100 may adopt a layered architecture, an event-driven architecture, a microkernel architecture, a microservice architecture, or a cloud architecture.
  • The embodiment of the present invention takes an Android system with a layered architecture as an example to illustrate the software structure of the electronic device 100.
  • FIG. 2 is a block diagram of a software structure of an electronic device 100 according to an embodiment of the present invention.
  • the layered architecture divides the software into several layers, and each layer has a clear role and division of labor. Layers communicate with each other through software interfaces.
  • the Android system is divided into four layers, which are, from top to bottom, an application layer, an application framework layer, an Android runtime (Android runtime) and a system library, and a kernel layer.
  • the application layer can include a series of application packages.
  • the application package can include applications such as camera, gallery, calendar, call, map, navigation, WLAN, Bluetooth, music, video, short message and so on.
  • the application framework layer provides an application programming interface (application programming interface, API) and a programming framework for applications in the application layer.
  • the application framework layer includes some predefined functions.
  • the application framework layer may include window managers, content providers, view systems, telephony managers, resource managers, notification managers, and the like.
  • a window manager is used to manage window programs.
  • the window manager can get the size of the display screen, determine whether there is a status bar, lock the screen, take screenshots, etc.
  • Content providers are used to store and retrieve data and make these data accessible to applications.
  • the data may include video, images, audio, calls made and received, browsing history and bookmarks, phone book, etc.
  • the view system includes visual controls, such as controls for displaying text, controls for displaying pictures, and so on. View systems can be used to build applications.
  • a display interface can consist of one or more views.
  • the display interface including the short message notification icon may include a view for displaying text and a view for displaying pictures.
  • The phone manager is used to provide the communication function of the electronic device 100, for example, the management of call status (including connecting, hanging up, etc.).
  • the resource manager provides various resources for the application, such as localization strings, icons, pictures, layout files, video files and so on.
  • the notification manager enables applications to display notification information in the status bar, which can be used to convey notification-type messages, and can disappear automatically after a brief pause without user interaction. For example, the notification manager is used to notify download completion, message reminders, etc.
  • the notification manager can also display notifications in the status bar at the top of the system in the form of graphs or scroll bar text, such as notifications of applications running in the background, and notifications on the screen in the form of dialog windows. For example, text information is prompted in the status bar, a prompt sound is issued, the electronic device vibrates, and the indicator light flashes.
  • Android Runtime includes core libraries and a virtual machine. Android runtime is responsible for scheduling and management of the Android system.
  • The core library consists of two parts: one is the functions that the Java language needs to call, and the other is the core library of Android.
  • the application layer and the application framework layer run in virtual machines.
  • The virtual machine executes the Java files of the application layer and the application framework layer as binary files.
  • The virtual machine is used to perform functions such as object lifecycle management, stack management, thread management, security and exception management, and garbage collection.
  • A system library can include multiple functional modules, for example: a surface manager, media libraries, a 3D graphics processing library (e.g., OpenGL ES), a 2D graphics engine (e.g., SGL), etc.
  • the Surface Manager is used to manage the display subsystem and provides a fusion of 2D and 3D layers for multiple applications.
  • the media library supports playback and recording of a variety of commonly used audio and video formats, as well as still image files.
  • the media library can support a variety of audio and video encoding formats, such as: MPEG4, H.264, MP3, AAC, AMR, JPG, PNG, etc.
  • the 3D graphics processing library is used to implement 3D graphics drawing, image rendering, compositing, and layer processing.
  • 2D graphics engine is a drawing engine for 2D drawing.
  • the kernel layer is the layer between hardware and software.
  • the kernel layer contains at least display drivers, camera drivers, audio drivers, and sensor drivers.
  • FIG. 2 is used as an example of the software structure of the electronic device. The software structure shown in FIG. 2 is only a schematic example, and the software structures of other operating systems are also applicable to the wake-up method provided in this embodiment of the present application.
  • FIG. 3 is a schematic diagram of a scenario of a wake-up method provided by an embodiment of the present application.
  • the home environment is also equipped with other devices with a function of broadcasting sound, such as TVs and traditional speakers, and furniture such as sofas and dining tables. Users can move around the sofa, dining table, etc., and wake up the smart speaker by uttering a voice containing a wake-up word.
  • Smart speakers can also be placed in other scenarios, such as shopping malls and office environments.
  • By executing the wake-up method of the embodiment of the present application, the false wake-up probability of the smart speaker can also be reduced.
  • The specific implementation of the wake-up method provided by the embodiments of the present application is described below.
  • When the smart speaker is not awakened and is in the standby state, it picks up the sound in the environment to obtain a sound.
  • This sound includes the voice of the target speaker, that is, the user, and also includes noise signals in the environment. For this reason, noise reduction processing is generally performed on the received sound to obtain a clean sound, which is used as the sound for triggering the execution of the wake-up method in the embodiment of the present application.
  • At least one of a wake-up orientation set, a false wake-up orientation set, and a voiceprint set is set in the smart speaker.
  • The false wake-up orientation set includes: false wake-up orientations and the confidence levels of the false wake-up orientations.
  • An element in the false wake-up orientation set is represented by (false wake-up orientation, confidence).
  • The false wake-up orientation is used to record the sound source orientation of a sound that did not wake up the smart speaker.
  • The confidence level of the false wake-up orientation is used to describe the probability that a voice for waking up the smart speaker is issued at the false wake-up orientation. The confidence can indicate the probability by the magnitude of its value.
  • the orientation in the embodiments of the present application refers to the direction and position relative to the smart speaker, for example, the orientation of the sound source refers to the direction and position of the sound source relative to the smart speaker.
  • The wake-up orientation set includes: wake-up orientations and the confidence levels of the wake-up orientations.
  • An element in the wake-up orientation set is represented by (wake-up orientation, confidence).
  • The wake-up orientation is used to record the sound source orientation of a sound that wakes up the smart speaker.
  • The confidence of the wake-up orientation is used to describe the probability that a voice for waking up the smart speaker is issued at the wake-up orientation.
  • the voiceprint set includes: wake-up voiceprint, and the confidence level of the wake-up voiceprint.
  • the confidence of the wake-up voiceprint can be represented by the number of hits of the wake-up voiceprint.
  • an element in the voiceprint set is represented by (wakeup voiceprint, confidence).
  • the wake-up voiceprint is used to record the user's voiceprint of the sound that wakes up the smart speaker.
  • the confidence of the wake-up voiceprint is used to record the probability that the sound with the wake-up voiceprint wakes up the smart speaker.
  • the number of hits is used to record the number of times the sound with the wake-up voiceprint wakes up the smart speaker.
  • User voiceprint and wake-up voiceprint can be represented by parameter values of voiceprint feature parameters.
  • The above-mentioned voiceprint feature parameters may include, but are not limited to, intensity, wavelength, frequency, rhythm, and the like.
  • the parameter value of at least one voiceprint feature parameter is different between different voiceprints.
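  • The three sets described above can be pictured as simple records. The following is a minimal sketch only; the class names, field names, and units (`Orientation`, `distance_m`, `angle_deg`, `hits`, and so on) are illustrative assumptions and do not come from the embodiments.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Orientation:
    """Sound source orientation relative to the smart speaker (assumed units)."""
    distance_m: float
    angle_deg: float


@dataclass
class OrientationEntry:
    """One element of a wake-up or false wake-up orientation set."""
    orientation: Orientation
    confidence: float


@dataclass
class VoiceprintEntry:
    """One element of the voiceprint set; confidence is kept as a hit count."""
    features: Dict[str, float]  # e.g. {"frequency": 180.0, "intensity": 60.0}
    hits: int = 0


@dataclass
class WakeupState:
    """The three sets; each may start empty, as in the factory state."""
    wake_orientations: List[OrientationEntry] = field(default_factory=list)
    false_wake_orientations: List[OrientationEntry] = field(default_factory=list)
    voiceprints: List[VoiceprintEntry] = field(default_factory=list)
```

  • Keeping the confidence next to each orientation or voiceprint mirrors the (orientation, confidence) and (wake-up voiceprint, confidence) element notation used above.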
  • a coordinate system of the smart speaker can be established.
  • the origin of the coordinate system may be the physical center point of the smart speaker, and the positive direction of the x-axis may be the direction pointing horizontally to the front of the smart speaker.
  • the method for establishing the coordinate system is only an example, and is not intended to limit the method for establishing the coordinate system of the smart speaker.
  • the sound source orientation can be identified by distance and angle in the above-mentioned coordinate system. Specifically, the distance of the sound source azimuth can be used to record: the distance between the sound source of the sound and the origin of the coordinate system of the smart speaker.
  • The angle can be used to record: the angle between the ray from the origin of the coordinate system of the smart speaker to the sound source of the sound and the positive direction of the x-axis of the smart speaker.
  • a parameter of the dimension of height may be further added to the sound source orientation.
  • Height can be used to record: the vertical distance between the source of the sound and the origin of the coordinate system.
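  • As a small illustration of the distance-and-angle representation, the sketch below converts an orientation into x and y coordinates in the speaker's coordinate system. It assumes the angle is given in degrees, measured counterclockwise from the positive x-axis, and that the distance lies in the horizontal plane; the function name `to_cartesian` is hypothetical.

```python
import math


def to_cartesian(distance, angle_deg):
    """Convert (distance, angle) relative to the speaker into (x, y).

    Assumption: angle_deg is measured counterclockwise from the positive
    x-axis, which points horizontally to the front of the smart speaker.
    """
    angle = math.radians(angle_deg)
    return distance * math.cos(angle), distance * math.sin(angle)
```

  • For example, under these assumptions a source 2 units away at 90 degrees lies on the positive y-axis, directly to one side of the speaker.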
  • Information such as the distance and angle of the sound source azimuth can be calculated by the smart speaker based on the relevant sound source localization method.
  • The sound source localization method can calculate the relative position between the sound source and the smart speaker, for example the distance and angle, based on a microphone array composed of at least two microphones set in the smart speaker.
  • Sound source localization methods may include, but are not limited to: steerable beamforming technology based on maximum output power, high-resolution spectral estimation technology, and sound source localization technology based on time-delay estimation (TDE), etc.
  • In the TDE-based technology, the propagation delay is generally obtained by performing cross-correlation processing on the sounds picked up by the microphone array of the smart speaker. After that, the distance between the smart speaker and the sound source can be calculated by simple delay summation, by geometric calculation, or by directly using the cross-correlation results to search the steered response power.
  • The specific algorithms are not described one by one in the embodiments of the present application.
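  • To make the time-delay estimation idea concrete, the following sketch estimates the delay between two microphone signals by a brute-force cross-correlation search and converts a delay into a far-field arrival angle. It is a simplified illustration under stated assumptions (two microphones, a far-field source, a speed of sound of 343 m/s) and is not the algorithm of the embodiments; the function names are hypothetical, and the sketch yields an angle only, not a distance.

```python
import math


def estimate_delay_samples(ref, other, max_lag):
    """Return the lag (in samples) at which `other` best matches `ref`.

    A positive lag means `other` received the sound later than `ref`.
    """
    best_lag, best_score = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        # Cross-correlation of the two signals at this candidate lag.
        score = 0.0
        for n, value in enumerate(ref):
            m = n + lag
            if 0 <= m < len(other):
                score += value * other[m]
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag


def arrival_angle_deg(delay_s, mic_spacing_m, speed_of_sound=343.0):
    """Far-field angle between the source direction and the array broadside."""
    ratio = speed_of_sound * delay_s / mic_spacing_m
    ratio = max(-1.0, min(1.0, ratio))  # guard against rounding outside [-1, 1]
    return math.degrees(math.asin(ratio))
```

  • A delay in samples is converted to seconds by dividing by the sampling rate before calling `arrival_angle_deg`. Practical systems typically use more robust variants of cross-correlation and more than two microphones.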
  • Initial setting of the sets: in the initial case, for example, when the smart speaker has not been used since leaving the factory or has been restored to factory settings, at least one of the false wake-up orientation set, the wake-up orientation set, and the voiceprint set preset in the smart speaker can be empty.
  • the user can set at least one of the false wake-up orientation set, the wake-up orientation set and the voiceprint set based on the environment in which the smart speaker is located, or not set. If the user does not make settings, user operations can be reduced and user experience can be improved.
  • An example of setting the above sets is given below. Since the false wake-up orientation records the sound source orientation of a sound that does not wake up the smart speaker, the false wake-up orientation generally corresponds to the orientation, relative to the smart speaker, of other devices in the environment that can emit sound. Based on this, the false wake-up orientation can be set based on the orientation of such devices relative to the smart speaker, and the user or the smart speaker can set an initial confidence level for the false wake-up orientation. Taking the home environment shown in FIG. 3 as an example, when the smart speaker has a display screen, a setting interface for the false wake-up orientation can be provided to the user on the display screen of the smart speaker.
  • The user can set one false wake-up orientation and its confidence based on the relative position between the TV and the smart speaker, set another false wake-up orientation and its confidence based on the relative position between the traditional speaker and the smart speaker, and then click the "OK" control. Correspondingly, the smart speaker detects the user's operation on the "OK" control in the setting interface, obtains the false wake-up orientations, confidence levels, and other information from the setting interface, and saves them in the false wake-up orientation set.
  • Alternatively, the setting interface can be displayed to the user by another device associated with the smart speaker (such as the user's smartphone), and that device sends the false wake-up orientation, confidence, and other information obtained from the setting interface to the smart speaker.
  • Since the wake-up orientation records the sound source orientation of a sound that wakes up the smart speaker, the wake-up orientation generally corresponds to the orientation, relative to the smart speaker, of the positions in the environment where the user often utters the voice for waking up the smart speaker. Based on this, the user can set the wake-up orientation based on the orientation of such positions relative to the smart speaker, and the user or the smart speaker can set an initial confidence level for the wake-up orientation. Taking the home environment shown in FIG. 3 as an example, users often move around the sofa, the dining table, and so on, and wake up the smart speaker from there.
  • One or more wake-up orientations and corresponding confidence levels can be set based on positions on the sofa relative to the smart speaker, and one or more wake-up orientations and corresponding confidence levels can be set based on positions near the dining table, such as the positions of the dining chairs, relative to the smart speaker.
  • For the specific setting method, reference may be made to the setting method of the false wake-up orientation shown in FIG. 4, which will not be repeated here.
  • the wake-up voiceprint can be set by the user by recording the voice.
  • the smart speaker obtains the user's voiceprint according to the sound obtained by recording the voice, and sets it as the wake-up voiceprint.
  • the user or the smart speaker sets the initial confidence level of the wake-up voiceprint. For example, if the confidence is the number of hits, the initial confidence may be 0.
  • the wake-up orientation set can be updated according to the sound source orientation of the sound that wakes up the smart speaker;
  • the voiceprint set can be updated according to the user voiceprint extracted from the sound;
  • the false wake-up orientation set is updated according to the sound source orientation of a sound that did not wake up the smart speaker.
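  • The sound source orientation may be completely consistent with a wake-up orientation, or may deviate from it to a certain extent.
  • A distance threshold and an angle threshold can be preset separately. When the distance difference and the angle difference between the sound source orientation and a wake-up orientation in the set (denoted wake-up orientation 1) are smaller than the respective thresholds, the wake-up orientation set can be considered to include the sound source orientation.
  • The wake-up orientation 1 may be referred to as the wake-up orientation corresponding to the sound source orientation, or may be referred to as the wake-up orientation including the sound source orientation.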
  • The confidence of the wake-up orientation 1 corresponding to the sound source orientation is increased. It should be noted that the embodiment of the present application does not limit the value of the initial confidence level, nor does it limit the amount by which the confidence of the wake-up orientation is increased each time.
  • The increment may be a fixed value, a fixed percentage of the confidence level, or the like.
  • the embodiments of the present application also do not limit the specific values of the preset distance threshold and angle threshold.
  • the distance threshold and the angle threshold may be determined based on the accuracy of the wake-up method, the accuracy of the sound source orientation calculation method, and the like. Specifically, the higher the accuracy of the wake-up method, the smaller the distance threshold and the angle threshold are; the higher the accuracy of the sound source orientation calculation method, the smaller the distance and angle thresholds are.
  • the setting of the distance threshold and the angle threshold can expand the wake-up orientation in the wake-up orientation set from a point to an area, and the distance threshold and the angle threshold can be set based on the size of the desired expansion area.
  • the distance threshold and the angle threshold can be adjusted by the user of the smart speaker according to their needs.
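  • The threshold-based matching and the confidence update of the wake-up orientation set can be sketched as follows. This is an illustration under assumptions: orientations are plain dictionaries with hypothetical keys `distance`, `angle` (degrees), and `confidence`; the increment is a fixed value; and a sound source orientation that matches no stored entry is added as a new wake-up orientation, by analogy with the voiceprint set.

```python
def angle_difference(a_deg, b_deg):
    """Smallest absolute difference between two angles, in degrees."""
    diff = abs(a_deg - b_deg) % 360.0
    return min(diff, 360.0 - diff)


def find_match(entries, source, dist_threshold, angle_threshold):
    """Return the first stored entry whose area contains `source`, else None."""
    for entry in entries:
        close_in_distance = abs(entry["distance"] - source["distance"]) < dist_threshold
        close_in_angle = angle_difference(entry["angle"], source["angle"]) < angle_threshold
        if close_in_distance and close_in_angle:
            return entry
    return None


def update_wake_set(entries, source, dist_threshold, angle_threshold,
                    step=1.0, initial=1.0):
    """Raise the confidence of the matching entry, or add a new entry."""
    match = find_match(entries, source, dist_threshold, angle_threshold)
    if match is not None:
        match["confidence"] += step  # assumed fixed-value increment
    else:
        entries.append({"distance": source["distance"],
                        "angle": source["angle"],
                        "confidence": initial})
    return entries
```

  • Larger thresholds widen the area around each stored orientation, so nearby sound sources are merged into one entry instead of creating new ones.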
  • the smart speaker can extract the user's voiceprint of the sound, and determine whether the extracted user's voiceprint is included in the wake-up voiceprint of the voiceprint set. If included, increase the confidence of the wake-up voiceprint; otherwise, add the user's voiceprint as a wake-up voiceprint to the voiceprint set, and set the confidence for the newly added wake-up voiceprint. Similar to the judgment of the wake-up orientation set, in judging whether the voiceprint set includes the extracted user voiceprint, it is also possible to allow a certain error between the user's voiceprint and the wake-up voiceprint.
  • A threshold may be set for each voiceprint feature included in the voiceprint. As long as the difference between the value of each voiceprint feature of the user's voiceprint and the value of the corresponding voiceprint feature of a wake-up voiceprint is smaller than the threshold corresponding to that voiceprint feature, it can be considered that the voiceprint set includes the user's voiceprint, and that wake-up voiceprint is the wake-up voiceprint corresponding to the user's voiceprint.
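  • The per-feature comparison can be sketched as follows. The feature names, the dictionary layout, and the use of a hit count as the confidence are assumptions made for illustration; real voiceprint features and their thresholds would depend on the voiceprint extraction method used.

```python
def voiceprint_matches(user, wake, thresholds):
    """True if every feature differs by less than its own threshold."""
    return all(abs(user[name] - wake[name]) < limit
               for name, limit in thresholds.items())


def update_voiceprint_set(voiceprints, user, thresholds, initial_hits=0):
    """Increase the hit count of the matching voiceprint, or add a new one."""
    for entry in voiceprints:
        if voiceprint_matches(user, entry["features"], thresholds):
            entry["hits"] += 1
            return entry
    entry = {"features": dict(user), "hits": initial_hits}
    voiceprints.append(entry)
    return entry
```

  • Requiring every feature to fall within its threshold means that a single feature with a large difference is enough to treat the voiceprints as different.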
  • The smart speaker calculates the sound source orientation of the sound.
  • The smart speaker judges whether the false wake-up orientation set includes the sound source orientation of the sound. If included, the confidence level of the false wake-up orientation corresponding to the sound source orientation is reduced; if not, the sound source orientation is added to the false wake-up orientation set as a false wake-up orientation, and an initial confidence level is set for the newly added false wake-up orientation.
  • For the update of the false wake-up orientation set, reference may be made to the relevant description of the update of the wake-up orientation set, which will not be repeated here.
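  • The update of the false wake-up orientation set differs from the wake-up case mainly in the direction of the confidence change. The sketch below shows this under assumptions: `matches` is a caller-supplied, hypothetical predicate that decides whether a stored orientation includes the sound source orientation, the decrement is a fixed step, and the confidence is floored at zero (the floor is an assumption, not something stated in the embodiments).

```python
def update_false_wake_set(entries, source, matches, step=1.0, initial=5.0):
    """Lower the confidence of the matching entry, or add a new entry.

    `matches(entry, source)` is a hypothetical predicate supplied by the
    caller, for example a distance-and-angle threshold comparison.
    """
    for entry in entries:
        if matches(entry, source):
            # A lower confidence means a wake-up voice is less likely here.
            entry["confidence"] = max(0.0, entry["confidence"] - step)
            return entry
    entry = dict(source, confidence=initial)
    entries.append(entry)
    return entry
```

  • With this convention, an orientation from which non-waking sounds arrive repeatedly ends up with a low confidence, which is consistent with treating sounds from that orientation as probable noise.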
  • the wake-up method of the embodiment of the present application determines whether to execute the wake-up process based on the wake-up word confidence output by the wake-up word model, the false wake-up location set and/or the wake-up location set, and the voiceprint set, thereby reducing the probability of false wake-up.
  • the wake-up method will be described in detail below.
  • the smart speaker includes a pickup and a speaker.
  • the pickup includes a microphone array
  • the microphone array includes a plurality of microphones.
  • The smart speaker is preset with a wake-up orientation set (also referred to as a first orientation set), a false wake-up orientation set (also referred to as a second orientation set), and a voiceprint set (also referred to as a first voiceprint set).
  • the wake-up method in this embodiment of the present application may include:
  • Step 501 The smart speaker picks up the sound in the environment to obtain the sound.
  • Since smart speakers generally pick up sounds in the environment continuously, they generally divide the continuously picked-up sound into audio segments of a certain duration.
  • the sound in the embodiment of the present application generally refers to the divided audio segment.
  • the specific duration of the audio segment is not limited in this embodiment of the present application.
  • In order to reduce the influence of noise on subsequent processing, the smart speaker generally performs noise reduction processing on the sound before executing step 502, so as to suppress noise signals in the sound and obtain a relatively clean sound. In this way, the sound used in step 502 is generally the sound after noise reduction processing.
  • Preset conditions, such as a sound intensity threshold, can be set for the picked-up sound. Only for a sound that meets the preset conditions is the wake-up word confidence calculated based on the wake-up word model to trigger subsequent processing. Specific preset conditions are not limited in this embodiment of the present application.
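  • The segmentation and the intensity precondition of step 501 can be sketched as follows. The sketch assumes that the picked-up sound is a list of samples, that segments have a fixed length in samples, and that intensity is measured as the root mean square (RMS) of a segment; these choices and the function names are illustrative assumptions.

```python
import math


def split_into_segments(samples, segment_len):
    """Divide a continuous sample stream into fixed-length audio segments."""
    return [samples[i:i + segment_len]
            for i in range(0, len(samples), segment_len)]


def rms(segment):
    """Root-mean-square level of one segment (0.0 for an empty segment)."""
    if not segment:
        return 0.0
    return math.sqrt(sum(s * s for s in segment) / len(segment))


def segments_to_score(samples, segment_len, intensity_threshold):
    """Keep only the segments loud enough to be passed to the wake-up word model."""
    return [seg for seg in split_into_segments(samples, segment_len)
            if rms(seg) >= intensity_threshold]
```

  • Filtering quiet segments before scoring avoids running the wake-up word model on audio that is unlikely to contain speech.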
  • Step 502 The smart speaker calculates the wake-up word confidence of the sound based on the wake-up word model.
  • the wake word confidence is used to describe the probability that the sound includes the wake word sound.
  • Step 503 the smart speaker determines whether the confidence level of the wake-up word is less than the first threshold; if the confidence level of the wake-up word is not less than the first threshold, step 504 is executed.
  • step 503 further includes: if it is less than the first threshold, do not execute the wake-up process, update the false wake-up orientation set according to the sound source orientation of the sound, and this branch process ends.
  • step 503 can also be performed by a wake-up word model, so that the wake-up word model can output two parameters, a judgment result of whether to wake up and a wake-up word confidence, which are not limited in this embodiment of the present application.
  • the wake word confidence is used to describe the probability that the sound includes the wake word sound. The higher the wake word confidence, the greater the probability that the sound includes the wake word sound.
  • the wake-up method of the embodiment of the present application further performs the following steps 504 to 511 to further determine whether to execute the wake-up process, thereby realizing the screening of false wake-ups and reducing the probability of false wake-ups.
  • Step 504 The smart speaker determines whether the confidence level of the wake-up word is less than the second threshold, and the second threshold is greater than the first threshold. If it is not less than the second threshold, execute the wake-up process, and execute step 511; if it is less than the second threshold, execute step 505 .
  • The situation where the confidence of the wake-up word is not less than the first threshold is further divided into two cases by the second threshold: if the confidence of the wake-up word is not less than the second threshold, the sound has a high probability of including the sound of the wake-up word and the probability of a false wake-up is low, so the wake-up process is directly executed to wake up the smart speaker; if the confidence of the wake-up word is less than the second threshold and not less than the first threshold, the probability that the sound includes the sound of the wake-up word is relatively low and the probability of a false wake-up is relatively high, so the following steps 506 to 509 are performed, and the wake-up orientation set, the false wake-up orientation set, or the voiceprint set is further combined to determine whether to execute the wake-up process.
  • For example, the value range of the wake-up word confidence is (0, 100), the first threshold is 30, and the second threshold is 80.
  • If the confidence of the wake-up word is less than 30, the wake-up process is not executed; if it is not less than 80, the wake-up process is directly executed; if it is less than 80 and not less than 30, the following steps 505 to 509 are executed to further screen out possible false wake-ups.
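  • The two-threshold routing of steps 503 and 504 can be sketched as follows, using the example values 30 and 80 as defaults. The enum and function names are hypothetical.

```python
from enum import Enum


class Decision(Enum):
    REJECT = "do not wake up"           # confidence below the first threshold
    WAKE = "wake up directly"           # confidence not below the second threshold
    SCREEN = "screen with stored sets"  # in between: continue with further checks


def route_by_confidence(confidence, first_threshold=30, second_threshold=80):
    """Route a wake-up word confidence according to the two thresholds."""
    if second_threshold <= first_threshold:
        raise ValueError("the second threshold must be greater than the first")
    if confidence < first_threshold:
        return Decision.REJECT
    if confidence >= second_threshold:
        return Decision.WAKE
    return Decision.SCREEN
```

  • Only the middle band incurs the extra cost of sound source localization and set lookups; clear accepts and clear rejects are decided from the wake-up word confidence alone.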
  • the wake-up location set, the false wake-up location set, or the voiceprint set may include at least one set element, and each set element includes at least two units.
  • The set elements of the wake-up orientation set include a wake-up orientation and the confidence level corresponding to that wake-up orientation;
  • the set elements of the false wake-up orientation set include a false wake-up orientation and the confidence level corresponding to that false wake-up orientation;
  • the set elements of the voiceprint set include a voiceprint and the confidence level corresponding to that voiceprint.
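  • One possible in-memory layout for these sets is sketched below. It is an assumption made for illustration, not the structure used in the application: `Element`, `ConfidenceSet`, and `close_azimuth` are hypothetical names, orientations are assumed to be azimuth angles in degrees, and the 10-degree matching tolerance is invented for the example.

```python
from dataclasses import dataclass, field
from typing import Callable, Generic, List, Optional, TypeVar

K = TypeVar("K")


@dataclass
class Element(Generic[K]):
    key: K              # an orientation or a voiceprint
    confidence: float   # confidence level attached to that key


@dataclass
class ConfidenceSet(Generic[K]):
    same: Callable[[K, K], bool]   # decides whether two keys match
    elements: List[Element[K]] = field(default_factory=list)

    def find(self, key: K) -> Optional[Element[K]]:
        """Return the first stored element matching key, or None."""
        for element in self.elements:
            if self.same(element.key, key):
                return element
        return None


def close_azimuth(a: float, b: float, tolerance: float = 10.0) -> bool:
    """Hypothetical matcher: azimuths in degrees, with wrap-around at 360."""
    d = abs(a - b) % 360.0
    return min(d, 360.0 - d) <= tolerance


wake_orientations = ConfidenceSet(same=close_azimuth)
wake_orientations.elements.append(Element(90.0, 0.8))
print(wake_orientations.find(95.0))
```

  • Keeping the matching rule as a parameter lets the same container hold orientations and voiceprints, which need different notions of "the same" key.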
  • In step 504, there is no required order of execution between executing the wake-up process and executing step 511.
  • step 511 is executed as an example.
  • Both the first threshold and the second threshold may be preset.
  • Step 505 The smart speaker calculates the sound source orientation of the sound.
  • Step 506: The smart speaker judges whether the false wake-up orientation set includes the sound source orientation of the sound, and judges whether the wake-up orientation set includes the sound source orientation of the sound. If only the false wake-up orientation set includes the sound source orientation, step 507 is performed; if only the wake-up orientation set includes the sound source orientation, step 508 is performed; in any other case, step 509 is performed.
  • Step 507: The smart speaker judges whether to execute the wake-up process according to the confidence of the false wake-up orientation corresponding to the sound source orientation. If so, the wake-up process is executed and step 511 is executed; if not, the wake-up process is not executed, the false wake-up orientation set is updated according to the sound source orientation of the sound, and this branch ends.
  • The smart speaker judging whether to execute the wake-up process according to the confidence of the false wake-up orientation corresponding to the sound source orientation may include:
  • if the confidence of the false wake-up orientation is less than threshold a, the sound is relatively likely to be a noise signal, so it is judged that the wake-up process is not executed, that is, the smart speaker is not woken up, which reduces the probability of false wake-ups.
  • Step 508: The smart speaker judges whether to execute the wake-up process according to the confidence of the wake-up orientation corresponding to the sound source orientation. If so, the wake-up process is executed and step 511 is executed; if not, the wake-up process is not executed, the false wake-up orientation set is updated according to the sound source orientation of the sound, and this branch ends.
  • The smart speaker judging whether to execute the wake-up process according to the confidence of the wake-up orientation corresponding to the sound source orientation may include:
  • if the confidence of the wake-up orientation is less than threshold b, the sound is relatively likely to be a noise signal, so it is judged that the wake-up process is not executed, that is, the smart speaker is not woken up, which reduces the probability of false wake-ups.
  • Step 509: The smart speaker extracts the user voiceprint of the sound and determines whether the wake-up voiceprints of the voiceprint set include the extracted user voiceprint. If so, step 510 is performed; if not, the wake-up process is not executed, the false wake-up orientation set is updated according to the sound source orientation of the sound, and this branch ends.
  • Step 510: The smart speaker judges whether to execute the wake-up process according to the confidence of the wake-up voiceprint corresponding to the user voiceprint. If so, the wake-up process is executed and step 511 is executed; if not, the wake-up process is not executed, the false wake-up orientation set is updated according to the sound source orientation of the sound, and this branch ends.
  • the execution sequence between the wake-up process and step 511 is not limited.
  • The smart speaker judging whether to execute the wake-up process according to the confidence of the wake-up voiceprint corresponding to the user voiceprint may include:
  • if the confidence of the wake-up voiceprint is less than threshold c, the sound is more likely to have been made by a person who does not often appear in the environment, so it is judged not to execute the wake-up process, that is, not to wake up the smart speaker, which reduces the probability of false wake-ups.
  • Step 511 The smart speaker updates the wake-up orientation set according to the sound source orientation of the sound, and updates the voiceprint set according to the user voiceprint of the sound, and this branch process ends.
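  • The branching of steps 504 to 510 can be summarized in a short sketch. It rests on several assumptions that are not part of the application: the function name `decide_wake` is hypothetical, each set is modelled as a plain dictionary from key to confidence, keys are matched by exact equality, and the default thresholds are simply the values used in the worked example of this description. The set updates of step 511 and of the non-wake branches are left out so that only the decision is shown.

```python
def decide_wake(word_conf, azimuth, voiceprint,
                false_wake, wake, voiceprints,
                second_threshold=0.7, a=0.5, b=0.6, c=5):
    """Return True if the wake-up process should be executed.

    Assumes step 503 already passed, i.e. word_conf is not less than
    the first threshold.
    """
    # Step 504: a high wake-up word confidence wakes the device directly.
    if word_conf >= second_threshold:
        return True

    # Step 506: which orientation sets contain the sound source orientation?
    in_false = azimuth in false_wake
    in_wake = azimuth in wake

    if in_false and not in_wake:
        # Step 507: judge by the false wake-up orientation confidence.
        return false_wake[azimuth] >= a
    if in_wake and not in_false:
        # Step 508: judge by the wake-up orientation confidence.
        return wake[azimuth] >= b

    # Step 509: in both sets or in neither, fall back to the voiceprint.
    if voiceprint not in voiceprints:
        return False
    # Step 510: judge by the wake-up voiceprint confidence.
    return voiceprints[voiceprint] >= c


# Sound m of the worked example: confidence 0.65, orientation confidence 0.55.
print(decide_wake(0.65, 30, "user-1", {30: 0.55}, {}, {}))
```

  • A real device would match orientations within some angular tolerance and compare voiceprints by similarity rather than by equality; those details are outside this sketch.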
  • the second threshold may not be set, that is, step 504 is not executed, but step 505 is directly executed.
  • The judgment in step 504 can also be moved into steps 507, 508, and 510 in FIG. 5, where the smart speaker additionally takes the wake-up word confidence into account when determining whether to execute the wake-up process. For the specific judgment criteria, refer to the judgment criteria shown in FIG. 5.
  • step 507 will be replaced by: the smart speaker judges whether to execute the wake-up process according to the confidence of the wake-up word and the confidence of the false wake-up orientation corresponding to the sound source orientation; if the judgment result is yes, then execute the wake-up process, and Step 511 is executed; if the judgment result is no, the wake-up process is not executed, and step 512 is executed.
  • The smart speaker judging whether to execute the wake-up process according to the wake-up word confidence and the confidence of the false wake-up orientation corresponding to the sound source orientation may include:
  • if the wake-up word confidence is less than the second threshold and the confidence of the false wake-up orientation is not less than the first orientation confidence threshold, it is determined to execute the wake-up process.
  • the smart speaker updates the wake-up orientation set and the voiceprint set every time it judges to execute the wake-up process, and updates the false wake-up orientation set every time it determines not to execute the wake-up process.
  • the above set may not be updated after each judgment to execute the wake-up process or not, but to update the above set after selecting certain judgments based on a certain rule.
  • the embodiments of the present application are not limited.
  • After judging to execute the wake-up process, the smart speaker updates the wake-up orientation set and the voiceprint set; after judging not to execute the wake-up process, it updates the false wake-up orientation set. In this way, as the smart speaker is used over time, the wake-up orientations recorded in the wake-up orientation set come to correspond to the positions in the environment from which the user often speaks the wake-up word, the false wake-up orientations recorded in the false wake-up orientation set come to correspond to the positions of other sound-emitting devices in the environment, and the high-confidence wake-up voiceprints recorded in the voiceprint set come to correspond to the voiceprints of users who frequently wake up the smart speaker, so that the wake-up method of the embodiment of the present application can better reduce the probability of false wake-ups.
  • the smart speaker is preset with a wake-up orientation set, a false wake-up orientation set, and a voiceprint set, and all three sets are initially empty;
  • the second threshold is 0.7, the threshold a is 0.5, the threshold b is 0.6, and the threshold c is 5; then,
  • the wake-up word confidence of sound 1 is calculated based on the wake-up word model to be 0.1, which is less than the preset first threshold of 0.4, then go to step 503 In the branch where the judgment result is yes, no wake-up is performed, and the sound source orientation 1 of sound 1 is added to the false-awakening orientation set to obtain the false-awakening orientation 1, and an initial confidence level is set for it, for example, 0.8; It is very small.
  • the smart speaker picks up the sound of the TV again to obtain sound 2, calculates the wake-up word confidence of sound 2 to be 0.2, and executes the branch of step 503 whose judgment result is yes, reducing the confidence of false wake-up orientation 1 in the false wake-up orientation set. As the smart speaker performs the above process many times, the confidence of false wake-up orientation 1 is reduced.
  • the smart speaker calculates the wake-up word confidence of sound n as a value between the first threshold of 0.4 and the second threshold of 0.7, such as 0.55. The smart speaker executes steps 503, 504, and 505 in sequence and, given that the confidence of false wake-up orientation 1 is 0.48, which is less than threshold a of 0.5, does not execute the wake-up process. A possible false wake-up, for which the prior art would have executed the wake-up process, is thus screened out, reducing the probability of false wake-ups;
  • the smart speaker picks up the TV sound to obtain sound m, whose calculated wake-up word confidence is 0.65, and executes steps 503 to 505 in sequence; given that the confidence of false wake-up orientation 1 is, for example, 0.55, which is not less than threshold a of 0.5, it executes the wake-up process.
  • the wake-up orientation set and the voiceprint set will then be updated, so that the wake-up orientation set also includes sound source orientation 1. The smart speaker will also update the wake-up orientation set and the voiceprint set according to the sound made when the user wakes up the smart speaker, so that the confidence of the voiceprint gradually increases. When the smart speaker subsequently obtains, at the sound source orientation where the TV is located, a sound whose wake-up word confidence is between 0.4 and 0.7, it can further judge whether to execute the wake-up process according to the voiceprint set, so as to screen out possible false wake-ups and reduce the probability of false wake-ups.
  • In the wake-up method of the embodiment of the present application shown in FIG. 5, after the wake-up word confidence of the sound is calculated based on the wake-up word model, the wake-up orientation set, the false wake-up orientation set, and the voiceprint set are further combined to determine whether to wake up the smart speaker. Thus, when the judgment result output by the wake-up word model is to wake up the smart speaker, possible false wake-ups are further screened, reducing the probability of false wake-ups of the smart speaker and improving the user experience.
  • the wake-up location set, false wake-up location set, and voiceprint set may also be non-preset, but created along with machine learning; after creation, continue to enrich and adjust based on machine learning.
  • The embodiment above takes a wake-up orientation set, a false wake-up orientation set, and a voiceprint set preset in the smart speaker as an example; the embodiment shown in FIG. 6 takes a false wake-up orientation set and a voiceprint set preset in the smart speaker as an example.
  • steps 506 to 511 are replaced by the following steps 601 to 605, specifically:
  • Step 601 the smart speaker determines whether the set of mis-awakened orientations includes the sound source orientation of the sound; if it does, go to step 602; if not, go to step 603;
  • Step 602 the smart speaker judges whether to execute the wake-up process according to the confidence of the false wake-up orientation corresponding to the sound source orientation; if the judgment result is yes, the wake-up process is executed, and step 605 is executed; if the judgment result is no, the wake-up process is not executed , and update the false wake-up orientation set according to the sound source orientation of the sound, and this branch process ends.
  • For the implementation of this step, please refer to the description in step 507, which is not repeated here.
  • Step 603: The smart speaker extracts the user voiceprint of the sound and determines whether the wake-up voiceprints of the voiceprint set include the extracted user voiceprint. If so, step 604 is performed; if not, the wake-up process is not executed, the false wake-up orientation set is updated according to the sound source orientation of the sound, and this branch ends.
  • Step 604: The smart speaker judges whether to execute the wake-up process according to the confidence of the wake-up voiceprint corresponding to the user voiceprint. If the judgment result is yes, the wake-up process is executed and step 605 is executed; if the judgment result is no, the wake-up process is not executed, the false wake-up orientation set is updated according to the sound source orientation of the sound, and this branch ends.
  • Step 605 The smart speaker updates the voiceprint set according to the user's voiceprint of the voice.
  • For the implementation of this step, please refer to the description in step 511, which is not repeated here.
  • The wake-up method of the embodiment of the present application shown in FIG. 6 calculates the wake-up word confidence of the sound based on the wake-up word model and further combines the false wake-up orientation set and the voiceprint set to determine whether to wake up the smart speaker. Thus, when the judgment result output by the wake-up word model is to wake up the smart speaker, possible false wake-ups are further screened, reducing the probability of false wake-ups of the smart speaker and improving the user experience.
  • the wake-up method shown in Figure 7 takes the preset wake-up orientation set and voiceprint set in the smart speaker as an example.
  • the difference with Fig. 6 is mainly: the wrong wake-up orientation set is replaced by the wake-up orientation set, and, omitting the wrong wake-up orientation set update step, in step 705, the wake-up orientation set is updated according to the sound source orientation of the sound, according to the user voiceprint of the sound Update the voiceprint collection.
  • For the implementation of judging whether to execute the wake-up process according to the confidence of the wake-up orientation in step 702, refer to the description in step 508, which is not repeated here.
  • the first orientation set and the first voiceprint set are updated, including: creating the first orientation set and the first voiceprint set; incorporating the orientation from which the electronic device was woken up into the first orientation set and giving that orientation an initial first orientation confidence; and incorporating the voiceprint that woke up the electronic device into the first voiceprint set and giving that voiceprint an initial first voiceprint confidence.
  • the first orientation set and the first voiceprint set are updated, including at least one of the following: creating a new first orientation in the first orientation set and giving the newly created first orientation an initial first orientation confidence; creating a new first voiceprint in the first voiceprint set and giving the newly created first voiceprint an initial first voiceprint confidence; for an existing first orientation in the first orientation set that is matched, increasing the first orientation confidence corresponding to that existing first orientation; for an existing first voiceprint in the first voiceprint set that is matched, increasing the first voiceprint confidence corresponding to that existing first voiceprint.
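  • The create-or-increase rule described here can be sketched as follows. The application does not state the initial confidence, the increment, or an upper bound, so the values 0.5, 0.1, and 1.0 are placeholders; `record_wake` and `update_after_wake` are hypothetical names, and the sets are modelled as dictionaries from key to confidence with exact-match lookup.

```python
def record_wake(confidences, key, initial=0.5, step=0.1, ceiling=1.0):
    """Create the entry with an initial confidence, or raise an existing one."""
    if key in confidences:
        # Matched an existing element: increase its confidence, capped.
        confidences[key] = min(ceiling, confidences[key] + step)
    else:
        # No match: create a new element with the initial confidence.
        confidences[key] = initial
    return confidences[key]


def update_after_wake(orientations, voiceprints, azimuth, voiceprint):
    """Apply the rule to both sets after the device has been woken up."""
    record_wake(orientations, azimuth)
    record_wake(voiceprints, voiceprint)


orientations, voiceprints = {}, {}
update_after_wake(orientations, voiceprints, 30, "user-1")
update_after_wake(orientations, voiceprints, 30, "user-1")
print(orientations, voiceprints)
```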
  • The updating of the false wake-up orientation set, the voiceprint set, and so on in steps 602 to 605 in FIG. 6 is similar to this.
  • The updating of the false wake-up orientation set, the voiceprint set, and so on in steps 507 to 511 in FIG. 5 is also similar to this; these are not described one by one here.
  • The wake-up method of the embodiment of the present application shown in FIG. 7 calculates the wake-up word confidence of the sound based on the wake-up word model and further combines the wake-up orientation set and the voiceprint set to determine whether to wake up the smart speaker. Thus, when the judgment result output by the wake-up word model is to wake up the smart speaker, possible false wake-ups are further screened, reducing the probability of false wake-ups of the smart speaker and improving the user experience.
  • FIG. 8 is a schematic flowchart of another embodiment of the wake-up method provided by the embodiment of the present application.
  • the method can be applied to electronic devices such as the above-mentioned smart speakers.
  • the method can include:
  • Step 801 Receive the sound, and calculate the wake-up word confidence level of the sound; the wake-up word confidence level is used to describe the probability that the sound includes the wake-up word sound;
  • Step 802 If the wake-up word confidence is greater than or equal to the first threshold, calculate the sound source orientation of the sound;
  • Step 803 Determine whether the sound source orientation is in the first orientation set or the second orientation set; wherein, the first orientation set includes several first orientations, and the first orientation is used to record the sound source orientation of the sound that does not wake up the smart speaker ;
  • the second orientation set includes several second orientations, and the second orientation is used to record the sound source orientation of the sound that wakes up the smart speaker;
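  • Step 804: If the sound source orientation is only in the first orientation set, determine whether to wake up the smart speaker according to the confidence of the first orientation corresponding to the sound source orientation; the confidence of the first orientation is used to describe the probability that a voice that wakes up the electronic device is issued at the first orientation;
  • Step 805: If the sound source orientation is only in the second orientation set, determine whether to wake up the smart speaker according to the confidence of the second orientation corresponding to the sound source orientation; the confidence of the second orientation is used to describe the probability that a voice that wakes up the electronic device is issued at the second orientation.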
  • the wake-up word confidence level may correspond to the wake-up word confidence level
  • the first orientation may correspond to the false wake-up orientation
  • the second orientation may correspond to the wake-up orientation
  • it can also include:
  • if the sound source orientation is in both the first orientation set and the second orientation set, or is in neither the first orientation set nor the second orientation set, extracting the user voiceprint from the sound;
  • judging whether the first voiceprint set includes the user voiceprint;
  • the first voiceprint set includes a first voiceprint
  • the first voiceprint is used to record the user voiceprint of the sound that wakes up the electronic device
  • Whether to wake up the electronic device is determined according to the confidence of the first voiceprint corresponding to the user's voiceprint.
  • the method may further include: judging that the confidence level of the wake-up word is less than a second threshold; and the second threshold is greater than the first threshold.
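  • judging whether to wake up the electronic device according to the confidence of the first orientation corresponding to the sound source orientation may include:
  • if the confidence of the first orientation is less than threshold a, the judgment result is not to wake up the electronic device;
  • if the confidence of the first orientation is not less than threshold a, the judgment result is to wake up the electronic device.
  • judging whether to wake up the electronic device according to the confidence of the second orientation corresponding to the sound source orientation may include:
  • if the confidence of the second orientation is less than threshold b, the judgment result is not to wake up the electronic device;
  • if the confidence of the second orientation is not less than threshold b, the judgment result is to wake up the electronic device.
  • judging whether to wake up the electronic device according to the confidence of the first voiceprint corresponding to the user voiceprint may include:
  • if the confidence of the first voiceprint is less than threshold c, the judgment result is not to wake up the electronic device;
  • if the confidence of the first voiceprint is not less than threshold c, the judgment result is to wake up the electronic device.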
  • it can also include:
  • if the judgment result is to wake up the electronic device and the second orientation set includes the sound source orientation of the sound, increasing the confidence of the second orientation corresponding to the sound source orientation;
  • if the judgment result is to wake up the electronic device and the second orientation set does not include the sound source orientation of the sound, storing the sound source orientation as a second orientation in the second orientation set and setting an initial confidence for that second orientation.
  • it can also include:
  • if the judgment result is to wake up the electronic device and the first voiceprint set includes the user voiceprint of the sound, increasing the confidence of the first voiceprint corresponding to the user voiceprint;
  • if the judgment result is to wake up the electronic device and the first voiceprint set does not include the user voiceprint of the sound, storing the user voiceprint as a first voiceprint in the first voiceprint set and setting an initial confidence for that first voiceprint.
  • it can also include:
  • if the judgment result is not to wake up the electronic device and the first orientation set includes the sound source orientation of the sound, reducing the confidence of the first orientation corresponding to the sound source orientation;
  • if the judgment result is not to wake up the electronic device and the first orientation set does not include the sound source orientation of the sound, storing the sound source orientation as a first orientation in the first orientation set and setting an initial confidence for that first orientation.
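  • The update applied when the device is not woken up can be sketched in the same spirit. The initial confidence of 0.8 follows the worked example earlier in this description; the decrement of 0.05 and the lower bound of 0.0 are assumptions, as are the name `record_no_wake` and the dictionary representation with exact-match lookup.

```python
def record_no_wake(first_orientations, azimuth,
                   initial=0.8, step=0.05, floor=0.0):
    """Update the first orientation set after a decision not to wake up."""
    if azimuth in first_orientations:
        # Orientation already recorded: reduce its confidence, never below floor.
        first_orientations[azimuth] = max(
            floor, first_orientations[azimuth] - step)
    else:
        # New orientation: store it with the initial confidence.
        first_orientations[azimuth] = initial
    return first_orientations[azimuth]


first_orientations = {}
for _ in range(8):
    record_no_wake(first_orientations, 30)
print(first_orientations)
```

  • Because repeated non-wake sounds from the same orientation keep lowering its confidence, that orientation eventually falls below threshold a and sounds from it stop waking the device.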
  • For the specific implementation of FIG. 8, reference may be made to the embodiment shown in FIG. 5, which will not be repeated here.
  • FIG. 9 is a flowchart of another embodiment of the wake-up method of the present application.
  • the method can be applied to electronic devices such as the above-mentioned smart speakers.
  • the method can include:
  • Step 901 Receive the sound, and calculate the wake-up word confidence level of the sound; the wake-up word confidence level is used to describe the probability that the sound includes the wake-up word sound;
  • Step 902 If the wake-up word confidence is greater than or equal to the first threshold, calculate the sound source orientation of the sound;
  • Step 903 Determine whether the sound source orientation is in the first orientation set; wherein the first orientation set includes the first orientation, and the first orientation is used to record the sound source orientation of the sound that does not wake up the electronic device;
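  • Step 904: If the sound source orientation is in the first orientation set, determine whether to wake up the electronic device according to the confidence of the first orientation corresponding to the sound source orientation; the confidence of the first orientation is used to describe the probability that a voice that wakes up the electronic device is issued at the first orientation.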
  • it can also include:
  • the first voiceprint set includes a user voiceprint
  • the first voiceprint set includes a first voiceprint
  • the first voiceprint is used to record the user voiceprint of the sound that wakes up the electronic device
  • Whether to wake up the electronic device is determined according to the confidence of the first voiceprint corresponding to the user's voiceprint.
  • the method may further include:
  • the second threshold is greater than the first threshold
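  • judging whether to wake up the electronic device according to the confidence of the first orientation corresponding to the sound source orientation includes:
  • if the confidence of the first orientation is less than threshold a, the judgment result is not to wake up the electronic device;
  • if the confidence of the first orientation is not less than threshold a, the judgment result is to wake up the electronic device.
  • judging whether to wake up the electronic device according to the confidence of the first voiceprint corresponding to the user voiceprint may include:
  • if the confidence of the first voiceprint is less than threshold c, the judgment result is not to wake up the electronic device;
  • if the confidence of the first voiceprint is not less than threshold c, the judgment result is to wake up the electronic device.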
  • it can also include:
  • if the judgment result is to wake up the electronic device and the first voiceprint set includes the user voiceprint of the sound, increasing the confidence of the first voiceprint corresponding to the user voiceprint;
  • if the judgment result is to wake up the electronic device and the first voiceprint set does not include the user voiceprint of the sound, storing the user voiceprint as a first voiceprint in the first voiceprint set and setting an initial confidence for that first voiceprint.
  • it can also include:
  • if the judgment result is not to wake up the electronic device and the first orientation set includes the sound source orientation of the sound, reducing the confidence of the first orientation corresponding to the sound source orientation;
  • if the judgment result is not to wake up the electronic device and the first orientation set does not include the sound source orientation of the sound, storing the sound source orientation as a first orientation in the first orientation set and setting an initial confidence for that first orientation.
  • For the specific implementation of FIG. 9, reference may be made to the embodiment shown in FIG. 6, which will not be repeated here.
  • FIG. 10 is a flowchart of another embodiment of a wake-up method provided by an embodiment of the present application.
  • the method can be applied to electronic devices such as the above-mentioned smart speakers.
  • the method can include:
  • Step 1001 Receive the sound, calculate the wake-up word confidence level of the sound; the wake-up word confidence level is used to describe the probability that the sound includes the wake-up word sound;
  • Step 1002 If the wake-up word confidence is greater than or equal to the first threshold, calculate the sound source orientation of the sound;
  • Step 1003 determine whether the sound source orientation is in the second orientation set; wherein, the second orientation set includes the second orientation, and the second orientation is used to record the sound source orientation of the sound that wakes up the electronic device;
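  • Step 1004: If the sound source orientation is in the second orientation set, determine whether to wake up the electronic device according to the confidence of the second orientation corresponding to the sound source orientation; the confidence of the second orientation is used to describe the probability that a voice that wakes up the electronic device is issued at the second orientation.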
  • it can also include:
  • the first voiceprint set includes a user voiceprint
  • the first voiceprint set includes a first voiceprint
  • the first voiceprint is used to record the user voiceprint of the sound that wakes up the electronic device
  • Whether to wake up the electronic device is determined according to the confidence of the first voiceprint corresponding to the user's voiceprint.
  • the method may further include:
  • the second threshold is greater than the first threshold
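  • judging whether to wake up the electronic device according to the confidence of the second orientation corresponding to the sound source orientation may include:
  • if the confidence of the second orientation is less than threshold b, the judgment result is not to wake up the electronic device;
  • if the confidence of the second orientation is not less than threshold b, the judgment result is to wake up the electronic device.
  • judging whether to wake up the electronic device according to the confidence of the first voiceprint corresponding to the user voiceprint may include:
  • if the confidence of the first voiceprint is less than threshold c, the judgment result is not to wake up the electronic device;
  • if the confidence of the first voiceprint is not less than threshold c, the judgment result is to wake up the electronic device.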
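  • it can also include:
  • if the judgment result is to wake up the electronic device and the second orientation set includes the sound source orientation of the sound, increasing the confidence of the second orientation corresponding to the sound source orientation;
  • if the judgment result is to wake up the electronic device and the second orientation set does not include the sound source orientation of the sound, storing the sound source orientation as a second orientation in the second orientation set and setting an initial confidence for that second orientation.
  • it can also include:
  • if the judgment result is to wake up the electronic device and the first voiceprint set includes the user voiceprint of the sound, increasing the confidence of the first voiceprint corresponding to the user voiceprint;
  • if the judgment result is to wake up the electronic device and the first voiceprint set does not include the user voiceprint of the sound, storing the user voiceprint as a first voiceprint in the first voiceprint set and setting an initial confidence for that first voiceprint.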
  • For the specific implementation of FIG. 10, reference may be made to the embodiment shown in FIG. 7, which will not be repeated here.
  • FIG. 11 is a schematic structural diagram of an electronic device provided by an embodiment of the present application. As shown in FIG. 11 , the electronic device 1100 may include: a calculation unit 1110 and a judgment unit 1120 .
  • the calculation unit 1110 is used to receive the sound and calculate the wake-up word confidence of the sound; the wake-up word confidence is used to describe the probability that the sound includes the wake-up word sound, if the wake-up word confidence is greater than or equal to the first threshold, then calculate the sound source of the sound position;
  • the determining unit 1120 is used to determine whether the sound source orientation is in the first orientation set or the second orientation set; wherein the first orientation set includes the first orientation, and the first orientation is used to record the sound source orientation of the sound that does not wake up the electronic device , the second azimuth set includes a second azimuth, and the second azimuth is used to record the sound source azimuth of the sound that wakes up the electronic device; if the sound source azimuth is only in the first azimuth set, according to the confidence level of the first azimuth corresponding to the sound source azimuth To judge whether to wake up the electronic device, the confidence of the first position is used to describe the probability of the voice that wakes up the electronic device at the first position; if the sound source position is only in the second position set, according to the sound source position corresponding to the second position The confidence level is used to determine whether to wake up the electronic device, and the confidence level of the second orientation is used to describe the probability that the voice to wake up the electronic device is issued at the second orientation.
  • the judging unit 1120 may also be configured to: if the sound source orientation is in both the first orientation set and the second orientation set, or is in neither the first orientation set nor the second orientation set, extract the user voiceprint from the sound; judge whether the first voiceprint set includes the user voiceprint, where the first voiceprint set includes a first voiceprint and the first voiceprint is used to record the user voiceprint of a sound that wakes up the electronic device; and judge whether to wake up the electronic device according to the confidence of the first voiceprint corresponding to the user voiceprint.
  • the judging unit 1120 may also be configured to: before calculating the sound source azimuth of the sound, judging that the wake-up word confidence is less than a second threshold; the second threshold is greater than the first threshold.
  • the judging unit 1120 may be specifically configured to: judge whether the confidence of the first orientation corresponding to the sound source orientation is less than threshold a; if it is less than threshold a, the judgment result is not to wake up the electronic device; if it is not less than threshold a, the judgment result is to wake up the electronic device.
  • the judging unit 1120 may be specifically configured to: judge whether the confidence of the second orientation corresponding to the sound source orientation is less than threshold b; if it is less than threshold b, the judgment result is not to wake up the electronic device; if it is not less than threshold b, the judgment result is to wake up the electronic device.
  • the judging unit 1120 may be specifically configured to: judge whether the confidence of the first voiceprint corresponding to the user voiceprint is less than threshold c; if it is less than threshold c, the judgment result is not to wake up the electronic device; if it is not less than threshold c, the judgment result is to wake up the electronic device.
  • it may further include: an update unit, configured to: if the judgment result is to wake up the electronic device and the second orientation set includes the sound source orientation of the sound, increase the confidence of the second orientation corresponding to the sound source orientation; if the judgment result is to wake up the electronic device and the second orientation set does not include the sound source orientation of the sound, store the sound source orientation as a second orientation in the second orientation set and set an initial confidence for that second orientation.
  • the updating unit may also be used to: if the judgment result is to wake up the electronic device and the first voiceprint set includes the user voiceprint of the sound, increase the confidence level of the first voiceprint corresponding to the user voiceprint; if the judgment result is to wake up the electronic device and the first voiceprint set does not include the user voiceprint of the sound, store the user voiceprint as a first voiceprint in the first voiceprint set and set an initial confidence level for that first voiceprint.
  • the updating unit may be further configured to: if the judgment result is not to wake up the electronic device and the first azimuth set includes the sound source azimuth of the sound, reduce the confidence level of the first azimuth corresponding to the sound source azimuth; if the judgment result is not to wake up the electronic device and the first azimuth set does not include the sound source azimuth of the sound, store the sound source azimuth as a first azimuth in the first azimuth set and set an initial confidence level for that first azimuth.
  • the calculation unit 1110 is used to: receive the sound; calculate the wake-up word confidence of the sound, which describes the probability that the sound includes the wake-up word; and, if the wake-up word confidence is greater than or equal to the first threshold, calculate the sound source azimuth of the sound;
  • the judgment unit 1120 is used to: judge whether the sound source azimuth is in the first azimuth set, where the first azimuth set includes a first azimuth that records the sound source azimuth of a sound that did not wake up the electronic device; and, if the sound source azimuth is in the first azimuth set, determine whether to wake up the electronic device according to the confidence level of the first azimuth corresponding to the sound source azimuth, where the confidence level of the first azimuth describes the probability that a sound that wakes up the electronic device is uttered at the first azimuth.
  • the judging unit 1120 can also be used to: if the sound source azimuth is not in the first azimuth set, extract the user voiceprint from the sound; judge whether the first voiceprint set includes the user voiceprint, where the first voiceprint set includes a first voiceprint that records the user voiceprint of a sound that woke up the electronic device; and determine whether to wake up the electronic device according to the confidence level of the first voiceprint corresponding to the user voiceprint.
  • the judging unit 1120 may also be configured to: before the sound source azimuth of the sound is calculated, judge that the wake-up word confidence is less than a second threshold, where the second threshold is greater than the first threshold.
  • the judging unit 1120 may be specifically configured to: judge whether the confidence level of the first azimuth corresponding to the sound source azimuth is less than threshold a; if it is less than threshold a, the judgment result is not to wake up the electronic device; if it is not less than threshold a, the judgment result is to wake up the electronic device.
  • the judging unit 1120 may be specifically configured to: judge whether the confidence level of the first voiceprint corresponding to the user voiceprint is less than threshold c; if it is less than threshold c, the judgment result is not to wake up the electronic device; if it is not less than threshold c, the judgment result is to wake up the electronic device.
  • it can also include:
  • the updating unit is used to: if the judgment result is to wake up the electronic device and the first voiceprint set includes the user voiceprint of the sound, increase the confidence level of the first voiceprint corresponding to the user voiceprint; if the judgment result is to wake up the electronic device and the first voiceprint set does not include the user voiceprint of the sound, store the user voiceprint as a first voiceprint in the first voiceprint set and set an initial confidence level for that first voiceprint.
  • the updating unit may be further configured to: if the judgment result is not to wake up the electronic device and the first azimuth set includes the sound source azimuth of the sound, reduce the confidence level of the first azimuth corresponding to the sound source azimuth; if the judgment result is not to wake up the electronic device and the first azimuth set does not include the sound source azimuth of the sound, store the sound source azimuth as a first azimuth in the first azimuth set and set an initial confidence level for that first azimuth.
  • the calculation unit 1110 is used to: receive the sound; calculate the wake-up word confidence of the sound, which describes the probability that the sound includes the wake-up word; and, if the wake-up word confidence is greater than or equal to the first threshold, calculate the sound source azimuth of the sound;
  • the judgment unit 1120 is used to: judge whether the sound source azimuth is in the second azimuth set, where the second azimuth set includes a second azimuth that records the sound source azimuth of a sound that woke up the electronic device; and, if the sound source azimuth is in the second azimuth set, determine whether to wake up the electronic device according to the confidence level of the second azimuth corresponding to the sound source azimuth, where the confidence level of the second azimuth describes the probability that a sound that wakes up the electronic device is uttered at the second azimuth.
  • the judging unit 1120 may also be used to: if the sound source azimuth is not in the second azimuth set, extract the user voiceprint from the sound; judge whether the first voiceprint set includes the user voiceprint, where the first voiceprint set includes a first voiceprint that records the user voiceprint of a sound that woke up the electronic device; and determine whether to wake up the electronic device according to the confidence level of the first voiceprint corresponding to the user voiceprint.
  • the judging unit 1120 may also be configured to: before the sound source azimuth of the sound is calculated, judge that the wake-up word confidence is less than a second threshold, where the second threshold is greater than the first threshold.
  • the judging unit 1120 may be specifically configured to: judge whether the confidence level of the second azimuth corresponding to the sound source azimuth is less than threshold b; if it is less than threshold b, the judgment result is not to wake up the electronic device; if it is not less than threshold b, the judgment result is to wake up the electronic device.
  • the judging unit 1120 may be specifically configured to: judge whether the confidence level of the first voiceprint corresponding to the user voiceprint is less than threshold c; if it is less than threshold c, the judgment result is not to wake up the electronic device; if it is not less than threshold c, the judgment result is to wake up the electronic device.
  • it can also include:
  • the updating unit is used to: if the judgment result is to wake up the electronic device and the second azimuth set includes the sound source azimuth of the sound, increase the confidence level of the second azimuth corresponding to the sound source azimuth; if the judgment result is to wake up the electronic device and the second azimuth set does not include the sound source azimuth of the sound, store the sound source azimuth as a second azimuth in the second azimuth set and set an initial confidence level for that second azimuth.
  • the updating unit may also be used to: if the judgment result is to wake up the electronic device and the first voiceprint set includes the user voiceprint of the sound, increase the confidence level of the first voiceprint corresponding to the user voiceprint; if the judgment result is to wake up the electronic device and the first voiceprint set does not include the user voiceprint of the sound, store the user voiceprint as a first voiceprint in the first voiceprint set and set an initial confidence level for that first voiceprint.
  • the electronic device provided by the embodiment shown in FIG. 11 can be used to implement the technical solutions of the method embodiments shown in FIG. 5 to FIG. 7 of the present application.
  • for the implementation principle and technical effects, refer to the related descriptions in the method embodiments.
  • each unit of the apparatus shown in FIG. 11 above is only a division of logical functions, and may be fully or partially integrated into a physical entity in actual implementation, or may be physically separated.
  • these units can all be implemented as software invoked by a processing element; they can also all be implemented in hardware; alternatively, some units can be implemented as software invoked by a processing element while other units are implemented in hardware.
  • the acquisition unit may be a separately established processing element, or may be integrated in a certain chip of the electronic device.
  • the implementation of other units is similar.
  • all or part of these units can be integrated together, and can also be implemented independently.
  • each step of the above-mentioned method or each of the above-mentioned units may be completed by an integrated logic circuit of hardware in the processor element or an instruction in the form of software.
  • Embodiments of the present application further provide an electronic device, including: a processor; a memory; and a computer program, wherein the computer program is stored in the memory and includes instructions that, when executed by the device, cause the device to execute the methods shown in FIG. 5 to FIG. 7.
  • Embodiments of the present application further provide a computer-readable storage medium storing a computer program that, when run on a computer, causes the computer to execute the methods provided by the embodiments shown in FIG. 5 to FIG. 7 of the present application.
  • Embodiments of the present application further provide a computer program product, where the computer program product includes a computer program that, when run on a computer, enables the computer to execute the methods provided by the embodiments shown in FIGS. 5 to 7 of the present application.
  • if any function is implemented in the form of a software functional unit and sold or used as an independent product, it can be stored in a computer-readable storage medium.
  • the technical solution of the present application in essence, or the part of it that contributes to the prior art, or a part of the technical solution, can be embodied in the form of a software product.
  • the computer software product is stored in a storage medium and includes several instructions that cause a computer device (which may be a personal computer, a server, a network device, or the like) to execute all or part of the steps of the methods described in the embodiments of the present application.
  • the aforementioned storage medium includes various media that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc.

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Computational Linguistics (AREA)
  • General Physics & Mathematics (AREA)
  • Radar, Positioning & Navigation (AREA)
  • Remote Sensing (AREA)
  • Telephone Function (AREA)

Abstract

Provided is a wake-up method, comprising: receiving a sound and calculating a wake-up word confidence for the sound (801); if the wake-up word confidence is greater than or equal to a first threshold, calculating a sound source orientation of the sound (802) and determining whether the sound source orientation is in a first orientation set or a second orientation set (803); if the sound source orientation is in the first orientation set, determining whether to wake up the electronic device according to the confidence of the corresponding first orientation in the first orientation set (804); if the sound source orientation is in the second orientation set, determining whether to wake up the electronic device according to the confidence of the corresponding second orientation in the second orientation set (805). The method reduces the probability of erroneous wake-up of the electronic device and improves user experience. Also provided are an electronic device (100) and a computer storage medium.

Description

Electronic device and wake-up method thereof
Technical Field
The present application relates to the field of terminal technologies, and in particular to an electronic device and a wake-up method thereof.
Background
Electronic devices can perform functions through voice interaction with the user. Such an electronic device includes a sound pickup (for example, a microphone array) and a loudspeaker, and has sound pickup and playback functions. Examples include smart speakers, smartphones, and smart TVs. Before a user can interact with the electronic device by voice, the device must be woken up. By waking up, the electronic device enters the working state from the standby state. Generally, the electronic device decides whether to wake up by recognizing whether a received sound contains a preset wake-up word.
Take as an example an electronic device that is a smart speaker whose wake-up word is "Xiaoyi Xiaoyi". If the user utters a sound containing "Xiaoyi Xiaoyi" and the smart speaker detects "Xiaoyi Xiaoyi" in the received sound, the smart speaker wakes up. Sometimes the smart speaker also plays a wake-up response, such as "Xiaoyi is here, what can I do for you?", to interact with the user by voice. In some scenarios, however, a user or another device produces a sound that does not contain "Xiaoyi Xiaoyi", yet the smart speaker is woken up by mistake. For example, the user is watching TV. The user has not said "Xiaoyi Xiaoyi", and the sound from the TV does not contain "Xiaoyi Xiaoyi", but the smart speaker is woken up by mistake. This disrupts the user's normal life and requires the user to turn off the smart speaker, resulting in a poor user experience.
Summary
To solve the above technical problem in the prior art, the present application provides an electronic device and a wake-up method thereof that can reduce the false wake-up rate of the electronic device and improve user experience.
In a first aspect, a wake-up method is provided. The method is applied to an electronic device that includes a sound pickup and a loudspeaker, the sound pickup including multiple microphones. The method includes: receiving a sound; calculating a wake-up word confidence of the sound, where the wake-up word confidence indicates the probability that the sound includes the wake-up word; after the wake-up word confidence is greater than or equal to a first threshold, calculating the sound source orientation of the sound; after the sound source orientation matches a first orientation in a first orientation set, and the first orientation confidence corresponding to the matched first orientation is greater than or equal to a third threshold, waking up the electronic device; or, after the first orientation confidence corresponding to the matched first orientation is less than the third threshold, not waking up the electronic device. The wake-up word is used to wake up the electronic device. The sound source orientation is the direction and position of the sound source relative to the electronic device. The first orientation set includes M first orientation elements, and each first orientation element includes a first orientation and a first orientation confidence. A first orientation is the direction and position, relative to the electronic device, of a sound source that woke up the electronic device, and indicates that the electronic device has been woken up from that orientation. The first orientation confidence indicates the probability of waking up the electronic device from the first orientation. M is a positive integer greater than or equal to 1. In this way, when the wake-up word confidence of the sound is greater than or equal to the first threshold, possible false wake-ups are further screened out according to the first orientation set, which reduces the probability of false wake-up of the electronic device and improves user experience.
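The first-aspect decision can be sketched as follows. This is a minimal illustration under stated assumptions, not the applicant's implementation: the `Decision` enum, the function name, and the caller-supplied `matches` predicate are hypothetical, and the orientation is passed in already computed for brevity. The source specifies only the comparisons against the first and third thresholds.

```python
from enum import Enum
from typing import Any, Callable, List, Tuple


class Decision(Enum):
    IGNORE = "ignore"        # wake-up word confidence below the first threshold
    WAKE = "wake"
    NO_WAKE = "no_wake"
    UNMATCHED = "unmatched"  # no first orientation matched (assumed outcome label)


def decide_first_aspect(
    wake_word_conf: float,
    source: Any,
    first_set: List[Tuple[Any, float]],   # (first orientation, first orientation confidence)
    first_threshold: float,
    third_threshold: float,
    matches: Callable[[Any, Any], bool],  # hypothetical orientation matcher
) -> Decision:
    # Below the first threshold the sound is not treated as a wake-up candidate.
    if wake_word_conf < first_threshold:
        return Decision.IGNORE
    # Look for a stored first orientation that matches the sound source orientation.
    for orientation, confidence in first_set:
        if matches(source, orientation):
            # Wake only if the matched orientation's confidence reaches the third threshold.
            return Decision.WAKE if confidence >= third_threshold else Decision.NO_WAKE
    return Decision.UNMATCHED
```

Returning `UNMATCHED` rather than a wake decision leaves room for a further check, such as a voiceprint comparison, when no stored orientation matches.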
According to the first aspect, calculating the orientation of the sound source corresponding to the sound after the wake-up word confidence is greater than or equal to the first threshold includes: calculating the orientation of the sound source corresponding to the sound after the wake-up word confidence is greater than or equal to the first threshold and less than a second threshold. In this way, setting the second threshold singles out the cases in which the wake-up word confidence lies between the first threshold and the second threshold, which can improve the processing efficiency of the electronic device.
According to the first aspect, or any implementation of the first aspect above, the sound source orientation matching a first orientation in the first orientation set includes: the angular deviation between the direction of the sound source orientation relative to the electronic device and the direction of a first orientation in the first orientation set relative to the electronic device is within a preset fourth threshold; and the position deviation between the position of the sound source orientation relative to the electronic device and the position of that first orientation relative to the electronic device is within a preset fifth threshold.
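The matching rule above can be illustrated with a short sketch. The encoding is an assumption made for illustration: the direction is an angle in degrees and the position is an (x, y) coordinate in metres relative to the device, with Euclidean distance as the position deviation. The source does not specify how direction and position are represented or measured.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Orientation:
    angle_deg: float  # direction relative to the device (assumed: degrees)
    x_m: float        # position relative to the device (assumed: metres)
    y_m: float


def angular_deviation(a_deg: float, b_deg: float) -> float:
    """Smallest absolute difference between two angles, handling wrap-around at 360."""
    diff = abs(a_deg - b_deg) % 360.0
    return min(diff, 360.0 - diff)


def orientations_match(
    source: Orientation,
    stored: Orientation,
    max_angle_deg: float,   # plays the role of the fourth threshold
    max_distance_m: float,  # plays the role of the fifth threshold
) -> bool:
    # Both the direction and the position must be within their thresholds.
    angle_ok = angular_deviation(source.angle_deg, stored.angle_deg) <= max_angle_deg
    distance = math.hypot(source.x_m - stored.x_m, source.y_m - stored.y_m)
    return angle_ok and distance <= max_distance_m
```

Handling the wrap-around matters in practice: directions of 350 and 10 degrees are 20 degrees apart, not 340.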
According to the first aspect, or any implementation of the first aspect above, after the sound source orientation matches none of the first orientations in the first orientation set, a voiceprint is extracted from the sound; after the voiceprint matches a first voiceprint in a first voiceprint set, and the first voiceprint confidence corresponding to that first voiceprint is greater than or equal to a preset sixth threshold, the electronic device is woken up; or, after the first voiceprint confidence corresponding to that first voiceprint is less than the preset sixth threshold, the electronic device is not woken up. The first voiceprint set includes L voiceprint elements, and each voiceprint element includes a first voiceprint and a first voiceprint confidence. A first voiceprint is a voiceprint that woke up the electronic device, and the first voiceprint confidence indicates the probability that the first voiceprint wakes up the electronic device. L is a positive integer greater than or equal to 1. In this way, if possible false wake-ups cannot be screened out according to the first orientation set, they are further screened through the first voiceprint set, which reduces the probability of false wake-up of the electronic device and improves user experience.
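A minimal sketch of the voiceprint fallback follows. It assumes voiceprints are fixed-length feature vectors compared by cosine similarity against a hypothetical `min_similarity` cut-off; the source does not say how voiceprints are extracted or matched. Returning `None` when no stored voiceprint matches is likewise an assumption, since this passage does not cover that case.

```python
import math
from typing import List, Optional, Sequence, Tuple


def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def decide_by_voiceprint(
    voiceprint: Sequence[float],
    first_voiceprints: List[Tuple[Sequence[float], float]],  # (first voiceprint, confidence)
    sixth_threshold: float,
    min_similarity: float = 0.8,  # assumed matching cut-off
) -> Optional[bool]:
    """Return True to wake, False to not wake, None if no stored voiceprint matches."""
    for stored, confidence in first_voiceprints:
        if cosine_similarity(voiceprint, stored) >= min_similarity:
            return confidence >= sixth_threshold
    return None
```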
According to the first aspect, or any implementation of the first aspect above, after waking up the electronic device, the method further includes: updating the first orientation set and the first voiceprint set.
According to the first aspect, or any implementation of the first aspect above, after the wake-up word confidence is greater than or equal to the second threshold, the electronic device is woken up, and the first orientation set and the first voiceprint set are updated.
According to the first aspect, or any implementation of the first aspect above, waking up the electronic device and updating the first orientation set and the first voiceprint set after the wake-up word confidence is greater than or equal to the second threshold includes: after the wake-up word confidence is greater than or equal to the second threshold, waking up the electronic device and creating the first orientation set and the first voiceprint set; adding the orientation from which the electronic device was woken up to the first orientation set, and adding the voiceprint that woke up the electronic device to the first voiceprint set; and assigning an initial orientation confidence to the orientation added to the first orientation set and an initial voiceprint confidence to the voiceprint added to the first voiceprint set.
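The high-confidence path and the set updates can be sketched as below. Several details are assumptions for illustration only: the sets are dictionaries keyed by a hashable identifier for each orientation or voiceprint, the initial confidence is 0.5, and an existing entry is raised by a fixed step of 0.1. The source states only that new entries receive an initial confidence and that the sets are updated.

```python
from typing import Dict, Hashable

INITIAL_CONFIDENCE = 0.5  # assumed initial value
CONFIDENCE_STEP = 0.1     # assumed increment for an existing entry


def record_wake(entries: Dict[Hashable, float], key: Hashable) -> None:
    """Raise the confidence of an existing entry, or add it with the initial value."""
    if key in entries:
        entries[key] = min(1.0, entries[key] + CONFIDENCE_STEP)
    else:
        entries[key] = INITIAL_CONFIDENCE


def handle_high_confidence(
    wake_word_conf: float,
    second_threshold: float,
    orientation_key: Hashable,
    voiceprint_key: Hashable,
    first_orientations: Dict[Hashable, float],
    first_voiceprints: Dict[Hashable, float],
) -> bool:
    """Wake directly and update both sets when the second threshold is reached."""
    if wake_word_conf < second_threshold:
        return False  # left to the orientation and voiceprint screening
    record_wake(first_orientations, orientation_key)
    record_wake(first_voiceprints, voiceprint_key)
    return True
```

Starting from two empty dictionaries, the first high-confidence wake-up creates one entry in each set, which corresponds to the set creation described above.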
In a second aspect, a wake-up method is provided. The method is applied to an electronic device that includes a sound pickup and a loudspeaker, the sound pickup including multiple microphones. The method includes: receiving a sound; calculating a wake-up word confidence of the sound, where the wake-up word confidence indicates the probability that the sound includes the wake-up word; after the wake-up word confidence is greater than or equal to a first threshold, calculating the sound source orientation of the sound; after the sound source orientation matches a second orientation in a second orientation set, and the second orientation confidence corresponding to the matched second orientation is greater than or equal to a seventh threshold, waking up the electronic device; or, after the second orientation confidence corresponding to the matched second orientation is less than the seventh threshold, not waking up the electronic device. The wake-up word is used to wake up the electronic device. The sound source orientation is the direction and position of the sound source relative to the electronic device. The second orientation set includes N second orientation elements, and each second orientation element includes a second orientation and a second orientation confidence. A second orientation is the direction and position, relative to the electronic device, of a sound source that did not wake up the electronic device, and indicates that the electronic device was not woken up from that orientation. The second orientation confidence indicates the probability that the electronic device is not woken up from the second orientation. N is a positive integer greater than or equal to 1. In this way, when the wake-up word confidence of the sound is greater than or equal to the first threshold, possible false wake-ups are further screened out according to the second orientation set, which reduces the probability of false wake-up of the electronic device and improves user experience.
According to the second aspect, the sound source orientation matching a second orientation in the second orientation set includes: the angular deviation between the direction of the sound source orientation relative to the electronic device and the direction of a second orientation in the second orientation set relative to the electronic device is within a preset eighth threshold; and the position deviation between the position of the sound source orientation relative to the electronic device and the position of that second orientation relative to the electronic device is within a preset ninth threshold.
According to the second aspect, or any implementation of the second aspect above, the method further includes: after the sound source orientation matches none of the second orientations in the second orientation set, extracting a voiceprint from the sound; and after the voiceprint matches none of the first voiceprints in a first voiceprint set, updating the second orientation set. The first voiceprint set includes L voiceprint elements, and each voiceprint element includes a first voiceprint and a first voiceprint confidence. A first voiceprint is a voiceprint that woke up the electronic device, and the first voiceprint confidence indicates the probability that the first voiceprint wakes up the electronic device. L is a positive integer greater than or equal to 1. In this way, if possible false wake-ups cannot be screened out according to the second orientation set, they are further screened through the first voiceprint set, which reduces the probability of false wake-up of the electronic device and improves user experience.
According to the second aspect, or any implementation of the second aspect above, the method further includes: after the sound source orientation matches none of the second orientations in the second orientation set, extracting a voiceprint from the sound; after the voiceprint matches a first voiceprint in the first voiceprint set, and the first voiceprint confidence corresponding to that first voiceprint is greater than or equal to a preset tenth threshold, waking up the electronic device; or, after the first voiceprint confidence corresponding to that first voiceprint is less than the preset tenth threshold, not waking up the electronic device and updating the second orientation set. The first voiceprint set includes L voiceprint elements, and each voiceprint element includes a first voiceprint and a first voiceprint confidence. The first voiceprint confidence indicates the probability that the first voiceprint wakes up the electronic device, and a first voiceprint is a voiceprint that woke up the electronic device. L is a positive integer greater than or equal to 1.
According to the second aspect, or any implementation of the second aspect above, after waking up the electronic device, the method further includes: updating the first voiceprint set; and after not waking up the electronic device, the method further includes: updating the second orientation set.
According to the second aspect, or any implementation of the second aspect above, after the wake-up word confidence is greater than or equal to the second threshold, the electronic device is woken up and the first voiceprint set is updated. In this way, setting the second threshold singles out the cases in which the wake-up word confidence lies between the first threshold and the second threshold, which can improve the processing efficiency of the electronic device.
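The second-aspect fallback and its bookkeeping can be sketched as follows for a sound whose orientation matched no second orientation. The sketch simplifies matching to exact dictionary lookup on hypothetical keys, and it assumes that "updating" a set means raising an existing entry's confidence by a fixed step or adding a new entry with an initial confidence. None of these details are given in the source.

```python
from typing import Dict, Hashable

INITIAL_CONFIDENCE = 0.5  # assumed initial value
CONFIDENCE_STEP = 0.1     # assumed adjustment per event


def bump(entries: Dict[Hashable, float], key: Hashable) -> None:
    """Raise an existing entry's confidence, or add the entry with the initial value."""
    if key in entries:
        entries[key] = min(1.0, entries[key] + CONFIDENCE_STEP)
    else:
        entries[key] = INITIAL_CONFIDENCE


def voiceprint_fallback(
    orientation_key: Hashable,
    voiceprint_key: Hashable,
    second_orientations: Dict[Hashable, float],  # orientations that did not wake the device
    first_voiceprints: Dict[Hashable, float],    # voiceprints that woke the device
    tenth_threshold: float,
) -> bool:
    """Return True if the device wakes up, updating the relevant set either way."""
    confidence = first_voiceprints.get(voiceprint_key)
    if confidence is not None and confidence >= tenth_threshold:
        bump(first_voiceprints, voiceprint_key)  # woken: update the first voiceprint set
        return True
    # Unknown voiceprint or low confidence: not woken, update the second orientation set.
    bump(second_orientations, orientation_key)
    return False
```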
In a third aspect, a wake-up method is provided. The method is applied to an electronic device that includes a sound pickup and a speaker, the sound pickup including a plurality of microphones. The method includes: receiving a sound; calculating a wake-word confidence of the sound, the wake-word confidence indicating the probability that the sound includes a wake-up word; when the wake-word confidence is greater than or equal to a first threshold, calculating a sound source orientation of the sound; and, when the sound source orientation matches one second orientation in a second orientation set and matches no first orientation in a first orientation set, waking up the electronic device if the second orientation confidence corresponding to the matched second orientation is greater than or equal to an eleventh threshold, or not waking up the electronic device if the second orientation confidence corresponding to the matched second orientation is less than the eleventh threshold. The wake-up word is used to wake up the electronic device. The sound source orientation is the direction and position of the sound source relative to the electronic device. The first orientation set includes M first orientation elements, each of which includes a first orientation and a first orientation confidence. A first orientation is the direction and position, relative to the electronic device, of a sound source that woke up the electronic device, and indicates that the electronic device has been woken up from the first orientation; the first orientation confidence indicates the probability of waking up the electronic device from the first orientation. The second orientation set includes N second orientation elements, each of which includes a second orientation and a second orientation confidence. A second orientation is the direction and position, relative to the electronic device, of a sound source that did not wake up the electronic device, and indicates that the electronic device was not woken up from the second orientation; the second orientation confidence indicates the confidence that the electronic device is not woken up from the second orientation. M and N are both positive integers greater than or equal to 1. In this way, when the wake-word confidence of the sound is greater than or equal to the first threshold, possible false wake-ups are further screened out according to the first orientation set and the second orientation set, which reduces the false wake-up probability of the electronic device and improves user experience.
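The branch described in this paragraph can be sketched as follows. This is a minimal illustration under stated assumptions, not the claimed implementation: the names `OrientationElement`, `find_match`, and `decide_wake` are hypothetical, an orientation is encoded as an (angle, distance) tuple purely for illustration, and the matching rule is passed in as a predicate because it is defined separately. The function returns `None` for every case this paragraph does not decide.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence, Tuple

# Assumed encoding for this sketch: (angle in degrees, distance in metres).
Orientation = Tuple[float, float]
Matcher = Callable[[Orientation, Orientation], bool]


@dataclass(frozen=True)
class OrientationElement:
    orientation: Orientation
    confidence: float


def find_match(source: Orientation,
               elements: Sequence[OrientationElement],
               matches: Matcher) -> Optional[OrientationElement]:
    """Return the first stored element whose orientation matches, else None."""
    for element in elements:
        if matches(source, element.orientation):
            return element
    return None


def decide_wake(wake_confidence: float,
                source: Orientation,
                first_set: Sequence[OrientationElement],
                second_set: Sequence[OrientationElement],
                matches_first: Matcher,
                matches_second: Matcher,
                first_threshold: float,
                eleventh_threshold: float) -> Optional[bool]:
    """True = wake, False = do not wake, None = not decided by this paragraph."""
    if wake_confidence < first_threshold:
        return None  # the orientation is only computed at or above the first threshold
    hit_second = find_match(source, second_set, matches_second)
    hit_first = find_match(source, first_set, matches_first)
    if hit_second is not None and hit_first is None:
        # Comparison direction follows the paragraph as written.
        return hit_second.confidence >= eleventh_threshold
    return None  # other combinations are handled by other implementations


if __name__ == "__main__":
    close = lambda a, b: abs(a[0] - b[0]) <= 10 and abs(a[1] - b[1]) <= 0.5
    first = [OrientationElement((90.0, 2.0), 0.9)]
    second = [OrientationElement((0.0, 1.0), 0.8)]
    print(decide_wake(0.7, (3.0, 1.2), first, second, close, close, 0.5, 0.6))
```

Keeping the matcher as a parameter lets the same flow be reused with different deviation thresholds for the first and second orientation sets.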
According to the third aspect, the sound source orientation matching one second orientation in the second orientation set includes: the angular deviation between the direction of the sound source orientation relative to the electronic device and the direction of that second orientation relative to the electronic device is within a preset twelfth threshold; and the position deviation between the position of the sound source orientation relative to the electronic device and the position of that second orientation relative to the electronic device is within a preset thirteenth threshold. The sound source orientation matching no first orientation in the first orientation set includes: for every first orientation in the first orientation set, the angular deviation between the direction of the sound source orientation relative to the electronic device and the direction of the first orientation relative to the electronic device is not within a preset fourteenth threshold; and the position deviation between the position of the sound source orientation relative to the electronic device and the position of the first orientation relative to the electronic device is not within a preset fifteenth threshold.
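The two conditions above can be sketched as predicates. This is a minimal illustration under stated assumptions: the `Orientation` fields, the use of planar coordinates with Euclidean distance as the "position deviation", the wrap-around handling of angles, and the inclusive comparison are all choices made for this sketch and are not specified in the text. As literally worded, "match" requires both deviations to be within their thresholds, while "does not match" requires both to be outside them, so a sound source with only one deviation in range falls into neither category; the sketch keeps that distinction rather than treating one as the negation of the other.

```python
import math
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class Orientation:
    angle_deg: float  # direction relative to the device (assumed unit: degrees)
    x_m: float        # position relative to the device (assumed planar, metres)
    y_m: float


def angular_deviation(a_deg: float, b_deg: float) -> float:
    """Smallest absolute difference between two angles, in degrees."""
    diff = abs(a_deg - b_deg) % 360.0
    return min(diff, 360.0 - diff)


def position_deviation(a: Orientation, b: Orientation) -> float:
    """Euclidean distance between the two positions (an assumption)."""
    return math.hypot(a.x_m - b.x_m, a.y_m - b.y_m)


def matches(source: Orientation, stored: Orientation,
            max_angle_deg: float, max_position_m: float) -> bool:
    """Both deviations are within their thresholds."""
    return (angular_deviation(source.angle_deg, stored.angle_deg) <= max_angle_deg
            and position_deviation(source, stored) <= max_position_m)


def differs(source: Orientation, stored: Orientation,
            max_angle_deg: float, max_position_m: float) -> bool:
    """Both deviations are outside their thresholds (literal reading)."""
    return (angular_deviation(source.angle_deg, stored.angle_deg) > max_angle_deg
            and position_deviation(source, stored) > max_position_m)


def matches_none(source: Orientation, stored_set: Iterable[Orientation],
                 max_angle_deg: float, max_position_m: float) -> bool:
    """The source differs from every stored orientation in the set."""
    return all(differs(source, s, max_angle_deg, max_position_m) for s in stored_set)
```

An implementation that prefers a strict either/or decision could instead define the non-match as `not matches(...)`; which reading is intended is not settled by this paragraph.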
According to the third aspect, or any implementation of the third aspect above, the method further includes: when the sound source orientation matches one first orientation in the first orientation set and matches no second orientation in the second orientation set, waking up the electronic device if the first orientation confidence corresponding to the matched first orientation is greater than or equal to a sixteenth threshold; or not waking up the electronic device if the first orientation confidence corresponding to the matched first orientation is less than the sixteenth threshold.
According to the third aspect, or any implementation of the third aspect above, the sound source orientation matching one first orientation in the first orientation set includes: the angular deviation between the direction of the sound source orientation relative to the electronic device and the direction of that first orientation relative to the electronic device is within the preset fourteenth threshold; and the position deviation between the position of the sound source orientation relative to the electronic device and the position of that first orientation relative to the electronic device is within the preset fifteenth threshold. The sound source orientation matching no second orientation in the second orientation set includes: for every second orientation in the second orientation set, the angular deviation between the direction of the sound source orientation relative to the electronic device and the direction of the second orientation relative to the electronic device is not within the preset twelfth threshold; and the position deviation between the position of the sound source orientation relative to the electronic device and the position of the second orientation relative to the electronic device is not within the preset thirteenth threshold.
According to the third aspect, or any implementation of the third aspect above, the method further includes: when the sound source orientation matches no second orientation in the second orientation set and matches no first orientation in the first orientation set, extracting a voiceprint from the sound; and, when the voiceprint matches one first voiceprint in a first voiceprint set, waking up the electronic device and updating the first orientation set and the first voiceprint set if the first voiceprint confidence corresponding to that first voiceprint is greater than or equal to the preset sixteenth threshold, or not waking up the electronic device and updating the second orientation set if the first voiceprint confidence corresponding to that first voiceprint is less than the preset sixteenth threshold. The first voiceprint set includes L voiceprint elements, each of which includes a first voiceprint and a first voiceprint confidence; the first voiceprint confidence indicates the probability that the first voiceprint wakes up the electronic device, and the first voiceprint represents a voiceprint that woke up the electronic device. L is a positive integer greater than or equal to 1. In this way, if possible false wake-ups cannot be screened out according to the first orientation set and the second orientation set, they are further screened through the first voiceprint set, which reduces the false wake-up probability of the electronic device and improves user experience.
According to the third aspect, or any implementation of the third aspect above, the method further includes: when the voiceprint matches no first voiceprint in the first voiceprint set, updating the second orientation set.
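The voiceprint fallback in the two paragraphs above can be sketched as follows. This is a minimal illustration under stated assumptions: the text does not say how a voiceprint is represented or matched, so the embedding-vector representation, the cosine-similarity test, the `min_similarity` parameter, and all names are hypothetical. The wake decision for a voiceprint that matches nothing is not stated explicitly, so the sketch returns `None` for it and only reports that the second orientation set should be updated.

```python
import math
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple


@dataclass(frozen=True)
class VoiceprintElement:
    embedding: Tuple[float, ...]  # assumed representation of a first voiceprint
    confidence: float             # first voiceprint confidence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def voiceprint_stage(voiceprint: Sequence[float],
                     first_voiceprints: Sequence[VoiceprintElement],
                     sixteenth_threshold: float,
                     min_similarity: float = 0.8,  # hypothetical matching rule
                     ) -> Tuple[Optional[bool], Tuple[str, ...]]:
    """Return (wake decision, names of the sets to update)."""
    best = max(first_voiceprints,
               key=lambda e: cosine_similarity(voiceprint, e.embedding),
               default=None)
    if best is None or cosine_similarity(voiceprint, best.embedding) < min_similarity:
        # No first voiceprint matches: only the set update is stated.
        return None, ("second_orientation_set",)
    if best.confidence >= sixteenth_threshold:
        return True, ("first_orientation_set", "first_voiceprint_set")
    return False, ("second_orientation_set",)


if __name__ == "__main__":
    stored = [VoiceprintElement((1.0, 0.0), 0.9), VoiceprintElement((0.0, 1.0), 0.2)]
    print(voiceprint_stage((0.9, 0.1), stored, sixteenth_threshold=0.5))
```

Returning the names of the sets to update, rather than mutating them, keeps the sketch independent of how the sets are actually maintained.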
According to the third aspect, or any implementation of the third aspect above, the method further includes: after the electronic device is woken up, updating the first orientation set; and after the electronic device is not woken up, updating the second orientation set.
In a fourth aspect, an electronic device is provided. The electronic device includes a sound pickup and a speaker, the sound pickup including a plurality of microphones. The electronic device further includes: a processor; a memory; and a computer program, where the computer program is stored in the memory and, when executed by the processor, causes the electronic device to perform the method according to the first aspect or any implementation of the first aspect, the second aspect or any implementation of the second aspect, or the third aspect or any implementation of the third aspect.
For the technical effects of the fourth aspect and any implementation of the fourth aspect, refer to the technical effects of the first aspect and any implementation of the first aspect, the second aspect and any implementation of the second aspect, and the third aspect and any implementation of the third aspect. Details are not repeated here.
In a fifth aspect, a computer-readable storage medium is provided. The computer-readable storage medium includes a computer program that, when run on an electronic device, causes the electronic device to perform the method according to the first aspect or any implementation of the first aspect, the second aspect or any implementation of the second aspect, or the third aspect or any implementation of the third aspect, where the electronic device includes a sound pickup and a speaker, and the sound pickup includes a plurality of microphones.
For the technical effects of the fifth aspect and any implementation of the fifth aspect, refer to the technical effects of the first aspect and any implementation of the first aspect, the second aspect and any implementation of the second aspect, and the third aspect and any implementation of the third aspect. Details are not repeated here.
In a sixth aspect, a computer program product is provided. When the computer program product runs on a computer, it causes the computer to perform the method according to the first aspect or any implementation of the first aspect, the second aspect or any implementation of the second aspect, or the third aspect or any implementation of the third aspect.
For the technical effects of the sixth aspect and any implementation of the sixth aspect, refer to the technical effects of the first aspect and any implementation of the first aspect, the second aspect and any implementation of the second aspect, and the third aspect and any implementation of the third aspect. Details are not repeated here.
Description of Drawings
To describe the technical solutions of the embodiments of the present invention more clearly, the accompanying drawings used in the embodiments are briefly introduced below. Evidently, the drawings described below show only some embodiments of the present invention, and a person of ordinary skill in the art can derive other drawings from them without creative effort.
FIG. 1 is a schematic diagram of a hardware structure of an electronic device provided by an embodiment of the present application;
FIG. 2 is a schematic diagram of a software structure of an electronic device provided by an embodiment of the present application;
FIG. 3 is a schematic diagram of a scenario of a wake-up method provided by an embodiment of the present application;
FIG. 4 is a schematic diagram of a graphical user interface for user settings in a wake-up method provided by an embodiment of the present application;
FIG. 5 is a flowchart of one embodiment of a wake-up method provided by an embodiment of the present application;
FIG. 6 is a flowchart of another embodiment of the wake-up method provided by an embodiment of the present application;
FIG. 7 is a flowchart of yet another embodiment of the wake-up method provided by an embodiment of the present application;
FIG. 8 is a flowchart of yet another embodiment of the wake-up method provided by an embodiment of the present application;
FIG. 9 is a flowchart of yet another embodiment of the wake-up method provided by an embodiment of the present application;
FIG. 10 is a flowchart of yet another embodiment of the wake-up method provided by an embodiment of the present application;
FIG. 11 is a schematic diagram of the structural composition of an electronic device provided by an embodiment of the present application.
Detailed Description
The terms used in the following embodiments are only for the purpose of describing particular embodiments and are not intended to limit the present application. As used in the specification and the appended claims of the present application, the singular forms "a", "an", "said", "the above", "the" and "this" are intended to also include expressions such as "one or more", unless the context clearly indicates otherwise. It should also be understood that, in the following embodiments of the present application, "at least one" and "one or more" mean one, or two or more (including two). The term "and/or" describes an association between associated objects and indicates that three relationships may exist; for example, "A and/or B" may indicate: A alone, both A and B, or B alone, where A and B may each be singular or plural. The character "/" generally indicates an "or" relationship between the objects before and after it.
References in this specification to "one embodiment", "some embodiments" and the like mean that a particular feature, structure, or characteristic described in connection with that embodiment is included in one or more embodiments of the present application. Thus, the phrases "in one embodiment", "in some embodiments", "in some other embodiments", "in still other embodiments" and the like appearing in different places in this specification do not necessarily all refer to the same embodiment; rather, they mean "one or more but not all embodiments", unless specifically emphasized otherwise. The terms "including", "comprising", "having" and their variants all mean "including but not limited to", unless specifically emphasized otherwise. The term "connected" covers both direct and indirect connections, unless otherwise specified.
The terms used in the Detailed Description section of the present application are only used to explain specific embodiments of the present application and are not intended to limit the present application.
In one example, the probability of false wake-ups of an electronic device is reduced by optimizing the wake-word model preset in the electronic device. The main role of the wake-word model is to detect the wake-up word in the sound picked up by the electronic device and to obtain the probability that the sound contains the wake-up word. The wake-word model is a trained machine learning model. For example, a model for detecting the wake-up word can be built in advance and trained with samples to obtain the wake-word model. The pre-built model may be a neural network model, a Gaussian mixture model, a hidden Markov model, or the like. The samples may be sounds containing the wake-up word, phoneme sequences of such sounds, audio features of such sounds, or the like. Sounds containing the wake-up word can be recorded by different people in different scenarios, and using such recordings enables the trained wake-word model to detect the wake-up word in sounds from a variety of scenarios. However, sounds recorded in different scenarios do not contain only the wake-up word; they may also contain noise (for example, non-wake-up words). Training the wake-word model with such recordings as samples therefore contaminates the model, so that it may recognize sounds containing non-wake-up words as wake-up word sounds, resulting in false wake-ups. Take a smart speaker whose wake-up word is "小艺小艺" ("Xiaoyi Xiaoyi") as an example. After the wake-word model trained by the above method is deployed in the smart speaker, the model may detect sounds picked up by the smart speaker that are pronounced similarly to "小艺小艺", or even completely differently, as sounds containing the wake-up word, so that the smart speaker is falsely woken up.
To minimize the contamination of the wake-word model caused by unclean samples, and the resulting false wake-ups of the electronic device, the wake-word model needs to be optimized continuously. Specifically, the wake-word model is iteratively optimized through data labeling. However, data labeling requires the sample sounds to be labeled manually, which consumes excessive human resources, and the optimized wake-word model still has a certain probability of false wake-ups. Therefore, the embodiments of the present application provide an electronic device and a wake-up method that can reduce the false wake-up probability of the electronic device and improve user experience.
The electronic device provided by the embodiments of the present application is an electronic device with a sound pickup function and a voice playback function, for example: a smart speaker, a smartphone, a tablet computer, a personal computer (PC), a wearable device (such as smart glasses, a smart watch, or a smart band), a smart home appliance such as a smart TV, a smart screen, an intelligent connected vehicle (ICV), a smart car (smart/intelligent car), or an in-vehicle device.
By way of example, FIG. 1 shows a schematic structural diagram of an electronic device 100. The electronic device 100 may include a processor 110, an external memory interface 120, an internal memory 121, a universal serial bus (USB) interface 130, a charging management module 140, a power management module 141, a battery 142, an antenna 1, an antenna 2, a mobile communication module 150, a wireless communication module 160, an audio module 170, a speaker 170A, a receiver 170B, a microphone 170C, an earphone jack 170D, a sensor module 180, a button 190, a motor 191, an indicator 192, a camera 193, a display screen 194, a subscriber identification module (SIM) card interface 195, and the like.
It can be understood that the structure illustrated in the embodiments of the present invention does not constitute a specific limitation on the electronic device 100. In other embodiments of the present application, the electronic device 100 may include more or fewer components than shown, combine some components, split some components, or arrange the components differently. The illustrated components may be implemented in hardware, software, or a combination of software and hardware. By way of example, the electronic device 100 may be a smart speaker, which may include the processor 110, the internal memory 121, the speaker 170A, and the microphone 170C.
The processor 110 may include one or more processing units. For example, the processor 110 may include some or all of an application processor (AP), a modem processor, a graphics processing unit (GPU), an image signal processor (ISP), a controller, a video codec, a digital signal processor (DSP), a baseband processor, and/or a neural-network processing unit (NPU). Different processing units may be independent devices or may be integrated in one or more processors.
In some embodiments, the processor 110 may include one or more interfaces. The interfaces may include an inter-integrated circuit sound (I2S) interface, a pulse code modulation (PCM) interface, and the like.
The I2S interface can be used for audio communication. In some embodiments, the processor 110 may include multiple sets of I2S buses. The processor 110 may be coupled to the audio module 170 through an I2S bus to implement communication between the processor 110 and the audio module 170. In some embodiments, the audio module 170 can transmit sound to the wireless communication module 160 through the I2S interface, to implement the function of answering calls through a Bluetooth headset.
The PCM interface can also be used for audio communication, to sample, quantize, and encode analog signals. In some embodiments, the audio module 170 and the wireless communication module 160 may be coupled through a PCM bus interface. In some embodiments, the audio module 170 can also transmit sound to the wireless communication module 160 through the PCM interface, to implement the function of answering calls through a Bluetooth headset. Both the I2S interface and the PCM interface can be used for audio communication.
It can be understood that the interface connection relationships between the modules illustrated in the embodiments of the present invention are only schematic and do not constitute a structural limitation on the electronic device 100. In other embodiments of the present application, the electronic device 100 may also use interface connection manners different from those in the foregoing embodiments, or a combination of multiple interface connection manners.
The electronic device 100 implements a display function through the GPU, the display screen 194, the application processor, and the like. The GPU is a microprocessor for image processing and is connected to the display screen 194 and the application processor. The GPU performs mathematical and geometric calculations for graphics rendering. The processor 110 may include one or more GPUs that execute program instructions to generate or change display information.
The external memory interface 120 can be used to connect an external memory card, such as a Micro SD card, to expand the storage capacity of the electronic device 100. The external memory card communicates with the processor 110 through the external memory interface 120 to implement a data storage function, for example, saving files such as music and videos on the external memory card.
The internal memory 121 may be used to store computer-executable program code, which includes instructions. The internal memory 121 may include a program storage area and a data storage area. The program storage area may store an operating system, an application program required by at least one function (such as a sound playback function or an image playback function), and the like. The data storage area may store data created during use of the electronic device 100 (such as audio data and a phone book). In addition, the internal memory 121 may include a high-speed random access memory and may also include a non-volatile memory, such as at least one magnetic disk storage device, a flash memory device, or a universal flash storage (UFS). The processor 110 executes various functional applications and data processing of the electronic device 100 by running instructions stored in the internal memory 121 and/or instructions stored in a memory provided in the processor.
The electronic device 100 may implement audio functions, such as music playback and recording, through the audio module 170, the speaker 170A, the receiver 170B, the microphone 170C, the earphone jack 170D, the application processor, and the like.
The audio module 170 is used to convert digital audio information into an analog sound output, and also to convert an analog audio input into digital sound. The audio module 170 may also be used to encode and decode sound. In some embodiments, the audio module 170 may be provided in the processor 110, or some functional modules of the audio module 170 may be provided in the processor 110.
The speaker 170A, also called a "loudspeaker", is used to convert audio electrical signals into sound signals. Music or a hands-free call can be listened to through the speaker 170A of the electronic device 100.
The receiver 170B, also called an "earpiece", is used to convert audio electrical signals into sound signals. When the electronic device 100 answers a call or a voice message, the voice can be heard by placing the receiver 170B close to the ear.
The microphone 170C, also called a "mic" or "sound transducer", is used to convert sound signals into electrical signals. When making a call or sending a voice message, the user can speak with the mouth close to the microphone 170C to input the sound signal into the microphone 170C. The electronic device 100 may be provided with at least one microphone 170C. In other embodiments, the electronic device 100 may be provided with two microphones 170C, which can implement a noise reduction function in addition to collecting sound signals. In still other embodiments, the electronic device 100 may be provided with three, four, or more microphones 170C to collect sound signals, reduce noise, identify the sound source, implement a directional recording function, and so on.
The earphone jack 170D is used to connect wired earphones. The earphone jack 170D may be the USB interface 130, a 3.5 mm open mobile terminal platform (OMTP) standard interface, or a cellular telecommunications industry association of the USA (CTIA) standard interface.
The motor 191 can generate vibration alerts. The motor 191 can be used for incoming-call vibration alerts and for touch vibration feedback. For example, touch operations acting on different applications (such as photographing and audio playback) can correspond to different vibration feedback effects. Touch operations acting on different areas of the display screen 194 can also correspond to different vibration feedback effects of the motor 191. Different application scenarios (for example, time reminders, receiving messages, alarm clocks, and games) can also correspond to different vibration feedback effects. The touch vibration feedback effect can also be customized.
The indicator 192 may be an indicator light, which can be used to indicate the charging status and battery level changes, and can also be used to indicate messages, missed calls, notifications, and the like.
The software system of the electronic device 100 may adopt a layered architecture, an event-driven architecture, a microkernel architecture, a microservice architecture, or a cloud architecture. The embodiments of the present invention take an Android system with a layered architecture as an example to illustrate the software structure of the electronic device 100.
FIG. 2 is a block diagram of the software structure of the electronic device 100 according to an embodiment of the present invention.
The layered architecture divides the software into several layers, each with a clear role and division of labor. The layers communicate with each other through software interfaces. In some embodiments, the Android system is divided into four layers, which are, from top to bottom, the application layer, the application framework layer, the Android runtime and system libraries, and the kernel layer.
The application layer may include a series of application packages.
As shown in FIG. 2, the application packages may include applications such as camera, gallery, calendar, call, map, navigation, WLAN, Bluetooth, music, video, and short message.
The application framework layer provides an application programming interface (API) and a programming framework for the applications in the application layer. The application framework layer includes some predefined functions.
As shown in FIG. 2, the application framework layer may include a window manager, a content provider, a view system, a phone manager, a resource manager, a notification manager, and the like.
The window manager is used to manage window programs. The window manager can obtain the size of the display screen, determine whether there is a status bar, lock the screen, take screenshots, and so on.
The content provider is used to store and retrieve data and to make the data accessible to applications. The data may include videos, images, audio, calls made and received, browsing history and bookmarks, the phone book, and so on.
The view system includes visual controls, such as controls for displaying text and controls for displaying pictures. The view system can be used to build applications. A display interface may consist of one or more views. For example, a display interface that includes a short message notification icon may include a view for displaying text and a view for displaying pictures.
The phone manager is used to provide the communication functions of the electronic device 100, for example, management of the call state (including connecting, hanging up, and so on).
The resource manager provides various resources for applications, such as localized strings, icons, pictures, layout files, and video files.
The notification manager enables applications to display notification information in the status bar. It can be used to convey informational messages that disappear automatically after a short stay, without user interaction. For example, the notification manager is used to announce that a download is complete, to give message reminders, and so on. A notification may also appear in the status bar at the top of the system in the form of a chart or scroll-bar text, such as a notification of an application running in the background, or appear on the screen in the form of a dialog window. For example, text information is prompted in the status bar, a prompt tone is played, the electronic device vibrates, or the indicator light flashes.
The Android runtime includes core libraries and a virtual machine. The Android runtime is responsible for scheduling and management of the Android system.
The core libraries consist of two parts: one part is the functions that the Java language needs to call, and the other part is the core libraries of Android.
The application layer and the application framework layer run in the virtual machine. The virtual machine executes the Java files of the application layer and the application framework layer as binary files. The virtual machine is used to perform functions such as object lifecycle management, stack management, thread management, security and exception management, and garbage collection.
The system libraries may include multiple functional modules, for example: a surface manager, media libraries, a three-dimensional graphics processing library (for example, OpenGL ES), and a 2D graphics engine (for example, SGL).
The surface manager is used to manage the display subsystem and provides fusion of 2D and 3D layers for multiple applications.
The media libraries support playback and recording of a variety of commonly used audio and video formats, as well as still image files. The media libraries can support a variety of audio and video encoding formats, such as MPEG4, H.264, MP3, AAC, AMR, JPG, and PNG.
The three-dimensional graphics processing library is used to implement three-dimensional graphics drawing, image rendering, compositing, layer processing, and so on.
The 2D graphics engine is a drawing engine for 2D drawing.
The kernel layer is the layer between hardware and software. The kernel layer contains at least a display driver, a camera driver, an audio driver, and a sensor driver.
For ease of understanding, the following embodiments of the present application take an electronic device having the structures shown in FIG. 1 and FIG. 2 as an example and describe the method provided by the embodiments of the present application in detail with reference to the drawings and application scenarios. It should be noted that, although FIG. 2 is used as an example of the software structure of the electronic device, the software structure shown in FIG. 2 is only a schematic example, and the wake-up method provided by the embodiments of the present application is also applicable to the software structures of other operating systems.
For convenience of description, the wake-up method provided by the embodiments of the present application is explained by taking as an example an electronic device that is a smart speaker located in a home environment. FIG. 3 is a schematic diagram of a scenario of the wake-up method provided by an embodiment of the present application. As shown in FIG. 3, in addition to the smart speaker, the home environment contains other devices capable of playing sound aloud, such as a TV and a traditional speaker, and furniture such as a sofa and a dining table. The user can move around near the sofa, the dining table, and so on, and wake up the smart speaker by uttering a voice that contains the wake-up word. The smart speaker can also be placed in other scenarios, such as shopping malls and office environments. Executing the wake-up method of the embodiments of the present application can likewise reduce the false wake-up probability of the smart speaker in those scenarios. The specific implementation of the wake-up method provided by the embodiments of the present application is described below.
When the smart speaker has not been woken up and is in the standby state, it picks up sound from the environment. The picked-up sound includes the voice of the target speaker, that is, the user, as well as noise signals in the environment. Therefore, noise reduction is generally performed on the received sound to obtain a clean sound, which serves as the sound that triggers execution of the wake-up method of the embodiments of the present application.
In the wake-up method of the embodiments of the present application, at least one of a wake-up orientation set, a false wake-up orientation set, and a voiceprint set is set in the smart speaker. The false wake-up orientation set includes false wake-up orientations and the confidence of each false wake-up orientation. Hereinafter, an element of the false wake-up orientation set is represented by (false wake-up orientation, confidence). A false wake-up orientation records the sound source orientation of a sound that did not wake up the smart speaker. The confidence of a false wake-up orientation describes the probability that a voice that wakes up the smart speaker is uttered at that false wake-up orientation. The confidence can indicate the probability through the magnitude of its value. For example, the larger the value, the higher the probability, and the smaller the value, the lower the probability. An orientation in the embodiments of the present application refers to a direction and position relative to the smart speaker; for example, the sound source orientation refers to the direction and position of the sound source relative to the smart speaker.
The wake-up orientation set includes wake-up orientations and the confidence of each wake-up orientation. Hereinafter, an element of the wake-up orientation set is represented by (wake-up orientation, confidence). A wake-up orientation records the sound source orientation of a sound that woke up the smart speaker. The confidence of a wake-up orientation describes the probability that a voice that wakes up the smart speaker is uttered at that wake-up orientation.
The voiceprint set includes wake-up voiceprints and the confidence of each wake-up voiceprint. The confidence of a wake-up voiceprint can be expressed by the number of hits of the wake-up voiceprint. Hereinafter, an element of the voiceprint set is represented by (wake-up voiceprint, confidence). A wake-up voiceprint records the user voiceprint of a sound that woke up the smart speaker. The confidence of a wake-up voiceprint records the probability that a sound with that wake-up voiceprint wakes up the smart speaker. The number of hits records the number of times that a sound with that wake-up voiceprint has woken up the smart speaker. User voiceprints and wake-up voiceprints can be represented by the parameter values of voiceprint feature parameters. The voiceprint feature parameters may include, but are not limited to, intensity, wavelength, frequency, and rhythm. Different voiceprints differ in the parameter value of at least one voiceprint feature parameter.
Possible representations and calculation methods of the sound source orientation are described as follows. Optionally, a coordinate system of the smart speaker can be established. For example, the origin of the coordinate system may be the physical center point of the smart speaker, and the positive direction of the x-axis may be the direction pointing horizontally to the front of the smart speaker. This way of establishing the coordinate system is only an example and is not intended to limit how the coordinate system of the smart speaker is established. In this coordinate system, the sound source orientation can be identified by a distance and an angle. Specifically, the distance records the distance between the sound source and the origin of the coordinate system of the smart speaker. The angle records the angle between the ray from the origin of the coordinate system toward the sound source and the positive direction of the x-axis. Optionally, a height dimension may further be added to the sound source orientation. The height records the vertical distance between the sound source and the origin of the coordinate system. Information such as the distance and angle of the sound source orientation can be calculated by the smart speaker using a sound source localization method. The sound source localization method can calculate the relative position between the sound source and the smart speaker, for example the distance and the angle, based on a microphone array composed of at least two microphones in the smart speaker. Specifically, sound source localization methods may include, but are not limited to: steered beamforming based on maximum output power, high-resolution spectral estimation, and sound source localization based on time-delay estimation (TDE). Taking the TDE-based algorithm as an example, its core is the accurate estimation of the propagation delay, which is generally obtained by cross-correlating the sounds picked up by the microphones of the smart speaker's microphone array. The distance between the smart speaker and the sound source can then be calculated by methods such as simple delay-and-sum, geometric calculation, or a steered response power search that uses the cross-correlation results directly. The specific algorithms are not described one by one in the embodiments of the present application.
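As an illustration of the TDE idea described above, the following minimal sketch estimates the delay between two microphone signals by brute-force cross-correlation and converts it to a bearing. It is not the method of the embodiments: the function names, the speed-of-sound constant, and the far-field (plane-wave) assumption are all assumptions introduced here, and with only two microphones the sketch yields an angle relative to the microphone axis rather than a distance.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed room-temperature value


def estimate_delay_samples(ref, other, max_lag):
    """Return the lag (in samples) by which `other` trails `ref`.

    Picks the lag in [-max_lag, max_lag] that maximises the
    cross-correlation sum(ref[i] * other[i + lag]).
    """
    best_lag, best_score = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        score = 0.0
        for i in range(len(ref)):
            j = i + lag
            if 0 <= j < len(other):
                score += ref[i] * other[j]
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag


def bearing_from_delay(delay_samples, sample_rate, mic_spacing_m):
    """Far-field angle (degrees) between the source direction and the mic axis.

    The axis points from the `other` microphone toward the `ref` microphone.
    The cosine is clamped to [-1, 1] to tolerate estimation noise.
    """
    tau = delay_samples / sample_rate
    cos_theta = max(-1.0, min(1.0, SPEED_OF_SOUND * tau / mic_spacing_m))
    return math.degrees(math.acos(cos_theta))
```

A zero delay corresponds to a source broadside to the two microphones (90 degrees). A practical implementation would typically use an FFT-based correlation and more than two microphones to recover distance as well.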
Initial setting of the sets: in the initial state, for example when the smart speaker has left the factory without being used or has been restored to factory settings, at least one of the false wake-up orientation set, the wake-up orientation set, and the voiceprint set preset in the smart speaker may be empty. While using the smart speaker, the user may set at least one of the false wake-up orientation set, the wake-up orientation set, and the voiceprint set based on the environment in which the smart speaker is located, or may leave them unset. Leaving them unset reduces user operations and improves the user experience.
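The three sets and their initially empty state can be sketched as plain data structures. The class names, field names, and units below are hypothetical and chosen only for illustration; the embodiments do not prescribe a storage format.

```python
from dataclasses import dataclass, field


@dataclass
class Orientation:
    """A sound source orientation relative to the smart speaker."""
    distance_m: float  # distance from the coordinate origin (metres assumed)
    angle_deg: float   # angle from the positive x-axis (degrees assumed)


@dataclass
class OrientationEntry:
    """One (orientation, confidence) element of an orientation set."""
    orientation: Orientation
    confidence: float


@dataclass
class VoiceprintEntry:
    """One (wake-up voiceprint, confidence) element of the voiceprint set."""
    features: dict  # feature name -> value, e.g. {"frequency": 200.0}
    hits: int = 0   # confidence expressed as a hit count


@dataclass
class WakeState:
    """The three sets kept by the speaker; all start empty."""
    wake_orientations: list = field(default_factory=list)
    false_wake_orientations: list = field(default_factory=list)
    voiceprints: list = field(default_factory=list)
```

Using `default_factory=list` gives every `WakeState` its own empty lists, which matches the factory-reset case in which each set starts out empty.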
The method of setting the above sets is illustrated by example as follows. Because a false wake-up orientation records the sound source orientation of a sound that did not wake up the smart speaker, a false wake-up orientation generally corresponds to the orientation, relative to the smart speaker, of another device in the environment that can emit sound. Accordingly, false wake-up orientations can be set based on the orientations of such devices relative to the smart speaker, and the user or the smart speaker sets an initial confidence for each false wake-up orientation. Taking the home environment shown in FIG. 3 as an example, when the smart speaker has a display screen, a setting interface for false wake-up orientations can be provided to the user on the display screen of the smart speaker. For example, as shown in FIG. 4, the user can set one false wake-up orientation and its confidence based on the relative position between the TV and the smart speaker, set another false wake-up orientation and its confidence based on the relative position between the traditional speaker and the smart speaker, and then tap the "OK" control. Correspondingly, the smart speaker detects the user's operation on the "OK" control in the setting interface, obtains the false wake-up orientations, confidences, and other information from the setting interface, and saves them in the false wake-up orientation set. Optionally, if the smart speaker does not include a display screen or the display screen is inconvenient for the user to operate, the setting interface can be presented to the user by another device associated with the smart speaker (such as the user's smartphone), and that device sends the false wake-up orientations, confidences, and other information obtained from the setting interface to the smart speaker.
Because a wake-up orientation records the sound source orientation of a sound that woke up the smart speaker, a wake-up orientation generally corresponds to the orientation, relative to the smart speaker, of a position in the environment from which the user often utters the voice that wakes up the smart speaker. Accordingly, the user can set wake-up orientations based on the orientations of such positions relative to the smart speaker, and the user or the smart speaker sets an initial confidence for each wake-up orientation. Taking the home environment shown in FIG. 3 as an example, users often move around near the sofa, the dining table, and so on, and wake up the smart speaker from there. Therefore, one or more wake-up orientations and corresponding confidences can be set based on the orientation of positions on the sofa relative to the smart speaker, and one or more wake-up orientations and corresponding confidences can be set based on the orientation of positions near the dining table, such as the positions of the dining chairs, relative to the smart speaker. For the specific setting method, reference may be made to the setting method for false wake-up orientations shown in FIG. 4, which is not repeated here.
A wake-up voiceprint can be set by the user by recording a voice. Correspondingly, the smart speaker obtains the user voiceprint from the recorded sound and sets it as a wake-up voiceprint. The user or the smart speaker sets the initial confidence of the wake-up voiceprint. For example, if the confidence is the number of hits, the initial confidence may be 0.
Updating the sets: during use of the smart speaker, when the smart speaker is woken up, the wake-up orientation set can be updated according to the sound source orientation of the sound that woke up the smart speaker, and the voiceprint set can be updated according to the user voiceprint extracted from that sound; when the smart speaker is not woken up, the false wake-up orientation set is updated according to the sound source orientation of the sound that did not wake up the smart speaker.
For a sound that woke up the smart speaker, the smart speaker calculates the sound source orientation of the sound and determines whether the wake-up orientation set includes that sound source orientation. If it does, the confidence of the wake-up orientation corresponding to the sound source orientation is increased; if it does not, the sound source orientation is added to the wake-up orientation set as a wake-up orientation, and an initial confidence is set for the newly added wake-up orientation. When the smart speaker determines whether the wake-up orientation set includes the sound source orientation, the sound source orientation may be exactly the same as a wake-up orientation or may deviate from it to some extent. For example, when the wake-up orientation and the sound source orientation are each represented by (distance, angle), a distance threshold and an angle threshold can be preset. If the distance difference between the sound source orientation and wake-up orientation 1 satisfies the distance threshold and the angle difference satisfies the angle threshold, it can be determined that the wake-up orientation set includes the sound source orientation. Wake-up orientation 1 may be referred to as the wake-up orientation corresponding to the sound source orientation, or as the wake-up orientation that includes the sound source orientation. Correspondingly, the confidence of wake-up orientation 1 is increased. It should be noted that the embodiments of the present application do not limit the value of the initial confidence, nor do they limit the amount by which the confidence is increased each time the confidence of a wake-up orientation is increased. For example, the amount may be a fixed value or a fixed percentage of the confidence. Similarly, the embodiments of the present application do not limit the specific values of the preset distance threshold and angle threshold. The distance threshold and the angle threshold may be determined based on the accuracy of the wake-up method, the accuracy of the sound source orientation calculation method, and the like. Specifically, the higher the accuracy of the wake-up method, the smaller the distance threshold and the angle threshold generally are; the higher the accuracy of the sound source orientation calculation method, the smaller the distance threshold and the angle threshold generally are. In addition, the distance threshold and the angle threshold extend each wake-up orientation in the wake-up orientation set from a point to a region, so they can be set based on the desired size of that region. The distance threshold and the angle threshold can be adjusted by the user of the smart speaker as needed.
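The threshold-based matching and update described above can be sketched as follows. This is an illustrative sketch only: entries are stored as hypothetical `[distance, angle, confidence]` lists, "satisfies the threshold" is assumed to mean "difference not greater than the threshold", the confidence step is assumed to be a fixed value, and the wrap-around handling of angles near 0/360 degrees is an added assumption.

```python
def angle_diff_deg(a, b):
    """Smallest absolute difference between two angles, in degrees."""
    d = abs(a - b) % 360.0
    return min(d, 360.0 - d)


def update_orientation_set(entries, source, dist_thresh, angle_thresh,
                           initial_conf, step):
    """Update a list of [distance, angle, confidence] entries in place.

    If `source` (distance, angle) lies within both thresholds of an
    existing entry, that entry's confidence is changed by `step`;
    otherwise a new entry is appended with `initial_conf`.
    Returns the entry that was updated or added.
    """
    dist, angle = source
    for entry in entries:
        if (abs(entry[0] - dist) <= dist_thresh
                and angle_diff_deg(entry[1], angle) <= angle_thresh):
            entry[2] += step
            return entry
    new_entry = [dist, angle, initial_conf]
    entries.append(new_entry)
    return new_entry
```

For example, with a distance threshold of 0.5 and an angle threshold of 20 degrees, a sound located at (2.1, 5) is treated as belonging to a stored wake-up orientation at (2.0, 350), because the angular difference across the 0-degree boundary is only 15 degrees.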
For a sound that woke up the smart speaker, the smart speaker can extract the user voiceprint of the sound and determine whether the wake-up voiceprints in the voiceprint set include the extracted user voiceprint. If they do, the confidence of the corresponding wake-up voiceprint is increased; otherwise, the user voiceprint is added to the voiceprint set as a wake-up voiceprint, and a confidence is set for the newly added wake-up voiceprint. As with the wake-up orientation set, a certain error between the user voiceprint and a wake-up voiceprint may be allowed when determining whether the voiceprint set includes the extracted user voiceprint. For example, a threshold may be set for each voiceprint feature of the voiceprint. As long as the difference between the value of each voiceprint feature of the user voiceprint and the value of the corresponding voiceprint feature of a wake-up voiceprint is less than the threshold for that feature, the voiceprint set can be considered to include the user voiceprint, and that wake-up voiceprint is the wake-up voiceprint corresponding to the user voiceprint.
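The per-feature threshold comparison can be sketched as below. The dictionary layout, the feature names, and the choice to start a newly added voiceprint at zero hits are assumptions made for illustration; real voiceprint systems generally compare learned embeddings rather than a handful of scalar features.

```python
def find_matching_voiceprint(voiceprints, user_print, thresholds):
    """Return the first stored voiceprint matching `user_print`, or None.

    A match requires every feature listed in `thresholds` to differ
    by less than that feature's threshold.
    """
    for stored in voiceprints:
        feats = stored["features"]
        if all(abs(feats[name] - user_print[name]) < thresholds[name]
               for name in thresholds):
            return stored
    return None


def update_voiceprint_set(voiceprints, user_print, thresholds):
    """Increment the hit count of a matching voiceprint or add a new one."""
    match = find_matching_voiceprint(voiceprints, user_print, thresholds)
    if match is None:
        # Assumed initial confidence: a hit count of 0.
        match = {"features": dict(user_print), "hits": 0}
        voiceprints.append(match)
    else:
        match["hits"] += 1
    return match
```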
For a sound that did not wake up the smart speaker, the smart speaker calculates the sound source orientation of the sound and determines whether the false wake-up orientation set includes that sound source orientation. If it does, the confidence of the false wake-up orientation corresponding to the sound source orientation is reduced; if it does not, the sound source orientation is added to the false wake-up orientation set as a false wake-up orientation, and an initial confidence is set for the newly added false wake-up orientation. For the implementation of updating the false wake-up orientation set, reference may be made to the description of updating the wake-up orientation set, which is not repeated here.
The wake-up method of the embodiments of the present application determines whether to execute the wake-up process based on the wake-up word confidence output by the wake-up word model, the false wake-up orientation set and/or the wake-up orientation set, and the voiceprint set, thereby reducing the probability of false wake-up. The wake-up method is described in detail below.
In one implementation, the smart speaker includes a sound pickup and a loudspeaker. The sound pickup includes a microphone array, and the microphone array includes a plurality of microphones.
As shown in FIG. 5, the smart speaker is preset with a wake-up orientation set (also referred to as a first orientation set), a false wake-up orientation set (also referred to as a second orientation set), and a voiceprint set (also referred to as a first voiceprint set). The wake-up method of the embodiments of the present application may include:
Step 501: The smart speaker picks up sound from the environment to obtain a sound.
Because the smart speaker generally picks up sound from the environment continuously, it generally divides the continuously picked-up sound into audio segments of a certain duration. The sound in the embodiments of the present application generally refers to such an audio segment. The specific duration of the audio segment is not limited in the embodiments of the present application.
To reduce the influence of noise on subsequent processing, the smart speaker generally performs noise reduction on the sound before executing step 502, so as to suppress the noise signals in the sound and obtain a relatively clean sound. Thus, the sound used in step 502 is generally the sound after noise reduction.
Because the smart speaker picks up sound continuously, preset conditions such as a sound intensity threshold can be set for the picked-up sound in order to reduce the data processing load and power consumption of the smart speaker. Only a sound that meets the preset conditions has its wake-up word confidence calculated by the wake-up word model, which triggers subsequent processing. The specific preset conditions are not limited in the embodiments of the present application.
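One possible preset condition is a simple level gate on each audio segment. The sketch below uses the root-mean-square (RMS) level as the intensity measure; the choice of RMS, the function names, and the threshold semantics are assumptions for illustration, since the embodiments leave the preset condition open.

```python
import math


def rms(samples):
    """Root-mean-square level of a sequence of samples."""
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def passes_intensity_gate(samples, rms_threshold):
    """True if the segment is loud enough to be scored by the wake-up word model."""
    return rms(samples) >= rms_threshold
```

Segments that fail the gate are simply skipped, so the more expensive wake-up word model runs only on segments that are likely to contain speech.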
Step 502: The smart speaker calculates the wake-up word confidence of the sound based on the wake-up word model.
The wake-up word confidence describes the probability that the sound includes the wake-up word.
Step 503: The smart speaker determines whether the wake-up word confidence is less than a first threshold; if the wake-up word confidence is not less than the first threshold, step 504 is executed.
Further, step 503 also includes: if the wake-up word confidence is less than the first threshold, the wake-up process is not executed, the false wake-up orientation set is updated according to the sound source orientation of the sound, and this branch of the process ends.
It should be noted that the determination in step 503 may also be performed by the wake-up word model, so that the wake-up word model outputs two parameters: the determination result of whether to wake up and the wake-up word confidence. This is not limited in the embodiments of the present application.
The wake-up word confidence describes the probability that the sound includes the wake-up word. The higher the wake-up word confidence, the greater the probability that the sound includes the wake-up word. The wake-up method of the embodiments of the present application further executes the following steps 504 to 511 to further determine whether to execute the wake-up process, thereby screening out false wake-ups and reducing the probability of false wake-up.
步骤504:智能音箱判断唤醒词置信度是否小于第二阈值,第二阈值大于第一阈值,如果不小于第二阈值,执行唤醒流程,并且,执行步骤511;如果小于第二阈值,执行步骤505。Step 504: The smart speaker determines whether the confidence level of the wake-up word is less than the second threshold, and the second threshold is greater than the first threshold. If it is not less than the second threshold, execute the wake-up process, and execute step 511; if it is less than the second threshold, execute step 505 .
本申请实施例中将唤醒词置信度不小于第一阈值的情况进一步通过第二阈值划分为两种:如果唤醒词置信度不小于第二阈值,则说明声音中包括唤醒词声音的概率高,出现误唤醒的概率较低,从而直接执行唤醒流程,唤醒智能音箱;如果唤醒词置信度小于第二阈值,不小于第一阈值,则说明声音中包括唤醒词声音的概率相对较低,出现误唤醒的概率相对较高,从而执行以下步骤506~步骤509,进一步结合唤醒方位集合、误唤醒方位集合、或者声纹集合判断是否执行唤醒流程。举例来说,唤醒词置信度的取值范围是(0,100),第一阈值为30,第二阈值为80。相应的,如果唤醒词置信度小于30,不执行唤醒流程;如果唤醒词置信度不小于80,直接执行唤醒流程;如果唤醒词置信度小于80,不小于30,则执行以下的步骤505~步骤509,进一步筛选出可能的误唤醒。In the embodiment of the present application, the situation where the confidence of the wake-up word is not less than the first threshold is further divided into two types by the second threshold: if the confidence of the wake-up word is not less than the second threshold, it means that the sound has a high probability of including the sound of the wake-up word, The probability of false wake-up is low, so the wake-up process is directly executed to wake up the smart speaker; if the confidence of the wake-up word is less than the second threshold and not less than the first threshold, it means that the probability that the sound includes the sound of the wake-up word is relatively low, and an error occurs. The probability of wake-up is relatively high, so the following steps 506 to 509 are performed, and further combined with the wake-up orientation set, the false-awakened orientation set, or the voiceprint set to determine whether to execute the wake-up process. For example, the value range of the wake word confidence is (0, 100), the first threshold is 30, and the second threshold is 80. Correspondingly, if the confidence of the wake-up word is less than 30, the wake-up process is not executed; if the confidence of the wake-up word is not less than 80, the wake-up process is directly executed; if the confidence of the wake-up word is less than 80 and not less than 30, the following steps 505~ are executed 509, further screen out possible false awakenings.
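The three-way split made by steps 503 and 504 can be sketched as follows. This is a minimal Python illustration that uses the example values 30 and 80 from the text as defaults; the names `Decision` and `classify_confidence` are hypothetical and do not come from the application.

```python
from enum import Enum


class Decision(Enum):
    REJECT = "reject"   # below the first threshold: do not wake up
    SCREEN = "screen"   # between the thresholds: continue with the set-based checks
    WAKE = "wake"       # at or above the second threshold: wake up directly


def classify_confidence(confidence: float,
                        first_threshold: float = 30,
                        second_threshold: float = 80) -> Decision:
    """Three-way split of the wake-up word confidence (steps 503 and 504)."""
    if second_threshold <= first_threshold:
        raise ValueError("the second threshold must be greater than the first threshold")
    if confidence < first_threshold:
        return Decision.REJECT
    if confidence < second_threshold:
        return Decision.SCREEN
    return Decision.WAKE
```

Note that both comparisons are "less than", so a confidence exactly equal to a threshold falls into the higher band, matching the "not less than" wording of the steps.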
The wake-up azimuth set, the false wake-up azimuth set, and the voiceprint set may each include at least one set element, and each set element includes at least two units. For example, a set element of the wake-up azimuth set includes a wake-up azimuth and the confidence corresponding to that wake-up azimuth; a set element of the false wake-up azimuth set includes a false wake-up azimuth and the confidence corresponding to that false wake-up azimuth; and a set element of the voiceprint set includes a voiceprint and the confidence corresponding to that voiceprint.
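One possible in-memory layout for these set elements is sketched below in Python. The class names, the representation of an azimuth as an angle in degrees, the representation of a voiceprint as a vector, and the 10-degree matching tolerance are all assumptions for illustration; the application only requires that each element pair an azimuth or voiceprint with a confidence.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class AzimuthEntry:
    azimuth_deg: float   # recorded sound source azimuth, in degrees
    confidence: float    # confidence associated with this azimuth


@dataclass
class VoiceprintEntry:
    voiceprint: List[float]  # hypothetical feature vector for one speaker
    confidence: float        # confidence associated with this voiceprint


@dataclass
class AzimuthSet:
    entries: List[AzimuthEntry] = field(default_factory=list)
    tolerance_deg: float = 10.0  # assumed matching tolerance

    def find(self, azimuth_deg: float) -> Optional[AzimuthEntry]:
        """Return the first entry within the tolerance of azimuth_deg, else None."""
        for entry in self.entries:
            # Shortest angular distance on a circle, so 359 and 1 are 2 degrees apart.
            diff = abs(entry.azimuth_deg - azimuth_deg) % 360.0
            if min(diff, 360.0 - diff) <= self.tolerance_deg:
                return entry
        return None
```

The same `AzimuthSet` type could hold either the wake-up azimuths or the false wake-up azimuths, since both sets share the same element shape.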
It should be noted that there is no restriction on the execution order between executing the wake-up process in step 504 and executing step 511; FIG. 5 takes executing the wake-up process first and then step 511 as an example.
Both the first threshold and the second threshold may be preset.
Step 505: The smart speaker calculates the sound source azimuth of the sound.
The method for calculating the sound source azimuth has been described above and is not repeated here.
Step 506: The smart speaker determines whether the false wake-up azimuth set includes the sound source azimuth of the sound, and whether the wake-up azimuth set includes the sound source azimuth of the sound. If only the false wake-up azimuth set includes the sound source azimuth, step 507 is executed; if only the wake-up azimuth set includes the sound source azimuth, step 508 is executed; in any other case, step 509 is executed.
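The branching in step 506 reduces to a small routing function over two membership tests. The Python sketch below assumes the two membership results have already been computed; the function name is hypothetical.

```python
def route_step_506(in_false_wake_set: bool, in_wake_set: bool) -> int:
    """Return the number of the step that follows step 506."""
    if in_false_wake_set and not in_wake_set:
        return 507  # decide using the false wake-up azimuth confidence
    if in_wake_set and not in_false_wake_set:
        return 508  # decide using the wake-up azimuth confidence
    return 509      # in both sets or in neither: fall back to the voiceprint check
```

The "any other case" branch covers two situations: the azimuth is recorded in both sets, or it is recorded in neither.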
Step 507: The smart speaker determines whether to execute the wake-up process according to the confidence of the false wake-up azimuth corresponding to the sound source azimuth. If yes, the wake-up process is executed and step 511 is executed; if no, the wake-up process is not executed, the false wake-up azimuth set is updated according to the sound source azimuth of the sound, and this branch of the process ends.
The smart speaker determining whether to execute the wake-up process according to the confidence of the false wake-up azimuth corresponding to the sound source azimuth may include:
if the confidence of the false wake-up azimuth is less than threshold a, determining not to execute the wake-up process;
if the confidence of the false wake-up azimuth is not less than threshold a, determining to execute the wake-up process.
If the confidence of the false wake-up azimuth is less than threshold a, the probability that the sound is a noise signal is relatively high, so it is determined not to execute the wake-up process, that is, the smart speaker is not woken up, which reduces the false wake-up probability.
Step 508: The smart speaker determines whether to execute the wake-up process according to the confidence of the wake-up azimuth corresponding to the sound source azimuth. If yes, the wake-up process is executed and step 511 is executed; if no, the wake-up process is not executed, the false wake-up azimuth set is updated according to the sound source azimuth of the sound, and this branch of the process ends.
The smart speaker determining whether to execute the wake-up process according to the confidence of the wake-up azimuth corresponding to the sound source azimuth may include:
if the confidence of the wake-up azimuth is less than threshold b, determining not to execute the wake-up process;
if the confidence of the wake-up azimuth is not less than threshold b, determining to execute the wake-up process.
If the confidence of the wake-up azimuth is less than threshold b, the probability that the sound is a noise signal is relatively high, so it is determined not to execute the wake-up process, that is, the smart speaker is not woken up, which reduces the false wake-up probability.
Step 509: The smart speaker extracts the user voiceprint from the sound and determines whether the wake-up voiceprints in the voiceprint set include the extracted user voiceprint. If so, step 510 is executed; if not, the wake-up process is not executed, the false wake-up azimuth set is updated according to the sound source azimuth of the sound, and this branch of the process ends.
Step 510: The smart speaker determines whether to execute the wake-up process according to the confidence of the wake-up voiceprint corresponding to the user voiceprint. If yes, the wake-up process is executed and step 511 is executed; if no, the wake-up process is not executed, the false wake-up azimuth set is updated according to the sound source azimuth of the sound, and this branch of the process ends.
When the determination result is yes, the execution order between the wake-up process and step 511 is not limited.
The smart speaker determining whether to execute the wake-up process according to the confidence of the wake-up voiceprint corresponding to the user voiceprint may include:
if the confidence of the wake-up voiceprint is less than threshold c, determining not to execute the wake-up process;
if the confidence of the wake-up voiceprint is not less than threshold c, determining to execute the wake-up process.
If the confidence of the wake-up voiceprint is less than threshold c, it is more likely that the sound was uttered by a person who does not often appear in the environment, so it is determined not to execute the wake-up process, that is, the smart speaker is not woken up, which reduces the false wake-up probability.
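Steps 509 and 510 can be sketched in Python as a lookup followed by a threshold test. The application does not specify how an extracted voiceprint is matched against stored ones; the sketch below assumes, purely for illustration, that voiceprints are numeric vectors compared by cosine similarity with a hypothetical minimum similarity of 0.8. All names here are illustrative.

```python
import math
from typing import List, Optional, Sequence, Tuple

# Each element pairs a stored wake-up voiceprint (assumed to be a vector) with its confidence.
VoiceprintSet = List[Tuple[Sequence[float], float]]


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two vectors; 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def match_confidence(voiceprints: VoiceprintSet,
                     user_voiceprint: Sequence[float],
                     min_similarity: float = 0.8) -> Optional[float]:
    """Step 509: confidence of the best-matching stored voiceprint, or None if none matches."""
    best_conf, best_sim = None, min_similarity
    for stored, conf in voiceprints:
        sim = cosine_similarity(stored, user_voiceprint)
        if sim >= best_sim:
            best_conf, best_sim = conf, sim
    return best_conf


def should_wake_by_voiceprint(voiceprints: VoiceprintSet,
                              user_voiceprint: Sequence[float],
                              threshold_c: float) -> bool:
    """Steps 509 and 510: wake up only if a stored voiceprint matches and
    its confidence is not less than threshold c."""
    conf = match_confidence(voiceprints, user_voiceprint)
    return conf is not None and conf >= threshold_c
```

An unknown speaker and a known speaker with low confidence both lead to the same outcome, which is that the wake-up process is not executed.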
Step 511: The smart speaker updates the wake-up azimuth set according to the sound source azimuth of the sound and updates the voiceprint set according to the user voiceprint of the sound, and this branch of the process ends.
For the implementation of this step, refer to the foregoing description of updating the sets; details are not repeated here.
It should be noted that, in the embodiment shown in FIG. 5, the second threshold may also be omitted, that is, step 504 is not executed and step 505 is executed directly. Alternatively, in another possible implementation, the determination of step 504 may be moved into steps 507, 508, and 510 in FIG. 5, so that the smart speaker also takes the wake-up word confidence into account when determining whether to execute the wake-up process; for the specific criteria, refer to the criteria shown in FIG. 5. Taking step 507 as an example, step 507 is replaced with: the smart speaker determines whether to execute the wake-up process according to the wake-up word confidence and the confidence of the false wake-up azimuth corresponding to the sound source azimuth; if the determination result is yes, the wake-up process is executed and step 511 is executed; if the determination result is no, the wake-up process is not executed and step 512 is executed. In this case, the smart speaker determining in step 507 whether to execute the wake-up process according to the wake-up word confidence and the confidence of the false wake-up azimuth corresponding to the sound source azimuth may include:
if the wake-up word confidence is not less than the second threshold, determining to execute the wake-up process;
if the wake-up word confidence is less than the second threshold and the confidence of the false wake-up azimuth is less than the first azimuth confidence threshold, determining not to execute the wake-up process;
if the wake-up word confidence is less than the second threshold and the confidence of the false wake-up azimuth is not less than the first azimuth confidence threshold, determining to execute the wake-up process.
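The three cases of this alternative form of step 507 collapse into two comparisons, as the following Python sketch shows. The function and parameter names are hypothetical.

```python
def step_507_combined(wake_word_confidence: float,
                      false_wake_azimuth_confidence: float,
                      second_threshold: float,
                      first_azimuth_confidence_threshold: float) -> bool:
    """Return True if the wake-up process should be executed."""
    # Case 1: a high wake-up word confidence wakes the device regardless of the azimuth.
    if wake_word_confidence >= second_threshold:
        return True
    # Cases 2 and 3: otherwise the false wake-up azimuth confidence decides.
    return false_wake_azimuth_confidence >= first_azimuth_confidence_threshold
```

For example, with a second threshold of 0.7 and an azimuth confidence threshold of 0.5, a sound with wake-up word confidence 0.6 from an azimuth with confidence 0.4 would not wake the device.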
It should be noted that FIG. 5 takes as an example a smart speaker that updates the wake-up azimuth set and the voiceprint set each time it determines to execute the wake-up process, and updates the false wake-up azimuth set each time it determines not to execute the wake-up process. However, to reduce the data processing load and power consumption of the smart speaker, the sets need not be updated after every such determination; instead, the sets may be updated only after certain determinations selected according to some rule. This is not limited in the embodiments of this application.
After determining to execute the wake-up process, the smart speaker updates the wake-up azimuth set and the voiceprint set; after determining not to execute the wake-up process, it updates the false wake-up azimuth set. In this way, as the smart speaker is used over time, the wake-up azimuths recorded in the wake-up azimuth set come to correspond to the positions in the environment from which users often utter the wake-up word, the false wake-up azimuths recorded in the false wake-up azimuth set come to correspond to the positions of other sound-emitting devices in the environment, and the high-confidence wake-up voiceprints recorded in the voiceprint set come to correspond to the voiceprints of users who frequently wake up the smart speaker. The wake-up method of this embodiment can therefore reduce the false wake-up probability more effectively.
For example, assume that a smart speaker is placed in a home environment and put into use after leaving the factory, that a wake-up azimuth set, a false wake-up azimuth set, and a voiceprint set are preset in the smart speaker, and that all three sets are empty. Let the first threshold be 0.4, the second threshold be 0.7, threshold a be 0.5, threshold b be 0.6, and threshold c be 5. Then:
Assume that a television placed in the environment is loud. The smart speaker picks up the television's sound to obtain sound 1, and the wake-up word confidence of sound 1 calculated based on the wake-up word model is 0.1, which is less than the preset first threshold of 0.4. The branch of step 503 in which the determination result is yes is therefore executed: the wake-up is not performed, and sound source azimuth 1 of sound 1 is added to the false wake-up azimuth set as false wake-up azimuth 1, with an initial confidence of, for example, 0.8. Because the probability of a false wake-up is generally small, when the smart speaker again picks up the television's sound to obtain sound 2 and calculates a wake-up word confidence of 0.2 for it, the yes branch of step 503 is executed again and the confidence of false wake-up azimuth 1 in the false wake-up azimuth set is lowered. As the smart speaker repeats this process many times, the confidence of false wake-up azimuth 1 keeps decreasing. Once that confidence falls below threshold a (0.5), for example to 0.45, then even if the smart speaker occasionally picks up a sound n from the television whose wake-up word confidence lies between the first threshold 0.4 and the second threshold 0.7, for example 0.55, the smart speaker executes steps 503, 504, and 505 in sequence, determines that the confidence of false wake-up azimuth 1 (0.48) is less than threshold a (0.5), and does not execute the wake-up process. A possible false wake-up is thus screened out from a case in which the prior art would have executed the wake-up process, which reduces the false wake-up probability.
Even if a false wake-up caused by the television's sound has already occurred before the confidence of false wake-up azimuth 1 fell below threshold a (0.5), the method still helps. For example, the smart speaker picks up the television's sound to obtain sound m, calculates a wake-up word confidence of 0.65, executes steps 503 to 505 in sequence, determines that the confidence of false wake-up azimuth 1, for example 0.55, is not less than threshold a (0.5), and executes the wake-up process. The wake-up azimuth set and the voiceprint set are then updated, so the wake-up azimuth set also comes to include sound source azimuth 1. During use, however, the smart speaker also updates the wake-up azimuth set and the voiceprint set according to the sounds with which users wake it up, so among the wake-up voiceprints recorded in the voiceprint set, the confidence of the voiceprints of users who frequently wake up the smart speaker gradually increases. When the smart speaker later obtains, from the sound source azimuth of the television, a sound whose wake-up word confidence is between 0.4 and 0.7, it can further determine according to the voiceprint set whether to execute the wake-up process, thereby screening out possible false wake-ups and reducing the false wake-up probability.
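The television example can be replayed with a small Python simulation. The thresholds 0.4, 0.7, and 0.5 and the initial confidence 0.8 come from the example; the decrement of 0.05 per rejected sound is an assumption, since the text does not say by how much the confidence is lowered. The sketch models only the path in which the television azimuth is recorded in the false wake-up azimuth set alone, and it assumes an empty voiceprint set, so the voiceprint fallback never wakes the device.

```python
from typing import Optional, Tuple

FIRST_THRESHOLD = 0.4
SECOND_THRESHOLD = 0.7
THRESHOLD_A = 0.5
INITIAL_CONFIDENCE = 0.8   # initial value used in the example
DECREMENT = 0.05           # assumed step; the text does not specify one


def process_tv_sound(wake_word_confidence: float,
                     azimuth_confidence: Optional[float]) -> Tuple[bool, Optional[float]]:
    """Handle one sound arriving from the television azimuth.

    azimuth_confidence is the current confidence of false wake-up azimuth 1,
    or None if that azimuth has not been recorded yet.
    Returns (woke, new_azimuth_confidence).
    """
    if wake_word_confidence >= SECOND_THRESHOLD:
        return True, azimuth_confidence          # step 504: wake up directly
    if (wake_word_confidence >= FIRST_THRESHOLD
            and azimuth_confidence is not None
            and azimuth_confidence >= THRESHOLD_A):
        return True, azimuth_confidence          # step 507: azimuth still trusted
    # No wake-up: record the azimuth, or lower the confidence of the recorded one.
    if azimuth_confidence is None:
        return False, INITIAL_CONFIDENCE
    return False, round(azimuth_confidence - DECREMENT, 2)


if __name__ == "__main__":
    woke, conf = process_tv_sound(0.1, None)     # sound 1 creates false wake-up azimuth 1
    for _ in range(7):                           # seven further low-confidence sounds
        woke, conf = process_tv_sound(0.2, conf)
    print(conf, process_tv_sound(0.55, conf)[0])
```

Under these assumptions the confidence of false wake-up azimuth 1 drops below threshold a after seven further rejected sounds, after which a sound with wake-up word confidence 0.55 from the same azimuth no longer wakes the device.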
In the wake-up method of the embodiment shown in FIG. 5, after the wake-up word confidence of the sound is calculated based on the wake-up word model, whether to wake up the smart speaker is further determined in combination with the wake-up azimuth set, the false wake-up azimuth set, and the voiceprint set. Possible false wake-ups are thus screened out even when the determination result output by the wake-up word model is to wake up the smart speaker, which reduces the false wake-up probability of the smart speaker and improves the user experience.
Alternatively, the wake-up azimuth set, the false wake-up azimuth set, and the voiceprint set need not be preset; they may instead be created through machine learning and, once created, continuously enriched and adjusted through further machine learning.
Unlike FIG. 5, which takes as an example a smart speaker preset with a wake-up azimuth set, a false wake-up azimuth set, and a voiceprint set, the embodiment shown in FIG. 6 takes as an example a smart speaker preset with a false wake-up azimuth set and a voiceprint set. In this case, steps 506 to 511 are replaced with the following steps 601 to 605. Specifically:
Step 601: The smart speaker determines whether the false wake-up azimuth set includes the sound source azimuth of the sound; if so, step 602 is executed; if not, step 603 is executed.
For the implementation of this step, refer to the related determination method described above for updating the false wake-up azimuth set; details are not repeated here.
Step 602: The smart speaker determines whether to execute the wake-up process according to the confidence of the false wake-up azimuth corresponding to the sound source azimuth. If the determination result is yes, the wake-up process is executed and step 605 is executed; if the determination result is no, the wake-up process is not executed, the false wake-up azimuth set is updated according to the sound source azimuth of the sound, and this branch of the process ends.
For the implementation of this step, refer to the description of step 507; details are not repeated here.
Step 603: The smart speaker extracts the user voiceprint from the sound and determines whether the wake-up voiceprints in the voiceprint set include the extracted user voiceprint. If so, step 604 is executed; if not, the wake-up process is not executed, the false wake-up azimuth set is updated according to the sound source azimuth of the sound, and this branch of the process ends.
Step 604: The smart speaker determines whether to execute the wake-up process according to the confidence of the wake-up voiceprint corresponding to the user voiceprint. If the determination result is yes, the wake-up process is executed and step 605 is executed; if the determination result is no, the wake-up process is not executed, the false wake-up azimuth set is updated according to the sound source azimuth of the sound, and this branch of the process ends.
Step 605: The smart speaker updates the voiceprint set according to the user voiceprint of the sound.
For the implementation of this step, refer to the description of step 511; details are not repeated here.
In the wake-up method of the embodiment shown in FIG. 6, after the wake-up word confidence of the sound is calculated based on the wake-up word model, whether to wake up the smart speaker is further determined in combination with the false wake-up azimuth set and the voiceprint set. Possible false wake-ups are thus screened out even when the determination result output by the wake-up word model is to wake up the smart speaker, which reduces the false wake-up probability of the smart speaker and improves the user experience.
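Steps 601 to 605 can be condensed into a single decision function, sketched below in Python. The lookups into the two sets are assumed to have been done already and are passed in as optional confidences (`None` meaning "not in the set"); the function name and the string labels for the set to update are hypothetical.

```python
from typing import Optional, Tuple


def figure6_screen(false_wake_conf: Optional[float],
                   voiceprint_conf: Optional[float],
                   threshold_a: float,
                   threshold_c: float) -> Tuple[bool, str]:
    """Return (wake, set_to_update) for a sound that reached step 601."""
    if false_wake_conf is not None:            # step 601: azimuth is in the set
        wake = false_wake_conf >= threshold_a  # step 602
    elif voiceprint_conf is not None:          # step 603: voiceprint is in the set
        wake = voiceprint_conf >= threshold_c  # step 604
    else:
        wake = False                           # step 603: unknown voiceprint
    # Step 605 updates the voiceprint set on wake-up; every "no" branch
    # updates the false wake-up azimuth set instead.
    return wake, ("voiceprint_set" if wake else "false_wake_azimuth_set")
```

Note that the voiceprint is consulted only when the azimuth is not already recorded as a false wake-up azimuth, which matches the order of steps 601 and 603.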
Unlike the wake-up method shown in FIG. 6, which takes as an example a smart speaker preset with a false wake-up azimuth set and a voiceprint set, the wake-up method shown in FIG. 7 takes as an example a smart speaker preset with a wake-up azimuth set and a voiceprint set. The main differences from FIG. 6 are that the false wake-up azimuth set is replaced with the wake-up azimuth set, the step of updating the false wake-up azimuth set is omitted, and in step 705 the wake-up azimuth set is updated according to the sound source azimuth of the sound and the voiceprint set is updated according to the user voiceprint of the sound.
For how step 702 determines whether to execute the wake-up process according to the confidence of the wake-up azimuth, refer to the description of step 508; details are not repeated here.
When step 705 is executed for the first time, updating the first azimuth set and the first voiceprint set includes: creating the first azimuth set and the first voiceprint set; adding the azimuth from which the electronic device was woken up to the first azimuth set and assigning that azimuth an initial first azimuth confidence; and adding the voiceprint that woke up the electronic device to the first voiceprint set and assigning that voiceprint an initial first voiceprint confidence.
When step 705 is executed subsequently, updating the first azimuth set and the first voiceprint set includes at least one of the following: creating a new first azimuth in the first azimuth set and assigning the newly created first azimuth an initial first azimuth confidence; creating a new first voiceprint in the first voiceprint set and assigning the newly created first voiceprint an initial first voiceprint confidence; for an existing first azimuth in the first azimuth set that is matched, increasing the first azimuth confidence corresponding to that existing first azimuth; and for an existing first voiceprint in the first voiceprint set that is matched, increasing the first voiceprint confidence corresponding to that existing first voiceprint.
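The create-or-increase update described for step 705 can be sketched in Python as follows. The sketch assumes that matching has already mapped the azimuth or voiceprint to a key (for example a hypothetical azimuth bucket or speaker identifier), and the initial confidence, increment, and cap are illustrative values that the application does not specify.

```python
from typing import Dict

INITIAL_CONFIDENCE = 0.5  # assumed initial confidence for a new entry
INCREMENT = 0.1           # assumed amount by which a matched entry is raised
MAX_CONFIDENCE = 1.0      # assumed upper bound


def update_on_wake(confidences: Dict[str, float], key: str) -> None:
    """Create the entry with an initial confidence, or raise the confidence
    of an existing matched entry (capped at MAX_CONFIDENCE)."""
    if key in confidences:
        confidences[key] = min(MAX_CONFIDENCE, confidences[key] + INCREMENT)
    else:
        confidences[key] = INITIAL_CONFIDENCE
```

The same function serves both the first azimuth set and the first voiceprint set, and it also covers the first execution of step 705 if the sets start out as empty dictionaries.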
Although step 705 in FIG. 7 is used as the example here, updating the false wake-up azimuth set, updating the voiceprint set, and so on in steps 602 to 605 in FIG. 6 are similar, as are updating the false wake-up azimuth set, updating the voiceprint set, and so on in steps 507 to 511 in FIG. 5; they are not described one by one here.
In the wake-up method of the embodiment shown in FIG. 7, after the wake-up word confidence of the sound is calculated based on the wake-up word model, whether to wake up the smart speaker is further determined in combination with the wake-up azimuth set and the voiceprint set. Possible false wake-ups are thus screened out even when the determination result output by the wake-up word model is to wake up the smart speaker, which reduces the false wake-up probability of the smart speaker and improves the user experience.
FIG. 8 is a schematic flowchart of yet another embodiment of the wake-up method provided in the embodiments of this application. The method may be applied to an electronic device such as the smart speaker described above. The method may include:
Step 801: A sound is received, and the wake-up word confidence of the sound is calculated; the wake-up word confidence describes the probability that the sound includes the sound of the wake-up word.
Step 802: If the wake-up word confidence is greater than or equal to the first threshold, the sound source azimuth of the sound is calculated.
Step 803: It is determined whether the sound source azimuth is in the first azimuth set or the second azimuth set. The first azimuth set includes several first azimuths, where a first azimuth records the sound source azimuth of a sound that did not wake up the smart speaker; the second azimuth set includes several second azimuths, where a second azimuth records the sound source azimuth of a sound that woke up the smart speaker.
Step 804: If the sound source azimuth is only in the first azimuth set, whether to wake up the smart speaker is determined according to the confidence of the first azimuth corresponding to the sound source azimuth, where the confidence of the first azimuth describes the probability that a voice that wakes up the smart speaker is uttered at the first azimuth.
Step 805: If the sound source azimuth is only in the second azimuth set, whether to wake up the smart speaker is determined according to the confidence of the second azimuth corresponding to the sound source azimuth, where the confidence of the second azimuth describes the probability that a voice that wakes up the smart speaker is uttered at the second azimuth.
Here, the wake-up word confidence may correspond to the wake-up word confidence described above, the first azimuth may correspond to the false wake-up azimuth described above, and the second azimuth may correspond to the wake-up azimuth described above.
In a possible implementation, the method may further include:
if the sound source azimuth is in both the first azimuth set and the second azimuth set, or is in neither of them, extracting a user voiceprint from the sound;
determining whether the first voiceprint set includes the user voiceprint, where the first voiceprint set includes first voiceprints, and a first voiceprint records the user voiceprint of a sound that woke up the electronic device;
determining whether to wake up the electronic device according to the confidence of the first voiceprint corresponding to the user voiceprint.
在一种可能的实现方式中,在计算声音的声源方位之前,还可以包括:判断唤醒词置信度小于第二阈值;第二阈值大于第一阈值。In a possible implementation manner, before calculating the sound source azimuth of the sound, the method may further include: judging that the confidence level of the wake-up word is less than a second threshold; and the second threshold is greater than the first threshold.
在一种可能的实现方式中,根据声源方位对应的第一方位的置信度判断是否唤醒电子设备,可以包括:In a possible implementation manner, judging whether to wake up the electronic device according to the confidence of the first azimuth corresponding to the sound source azimuth may include:
判断声源方位对应的第一方位的置信度是否小于阈值a;Determine whether the confidence of the first azimuth corresponding to the sound source azimuth is less than the threshold a;
如果小于阈值a,判断结果为不唤醒电子设备;If it is less than the threshold value a, the judgment result is not to wake up the electronic device;
如果不小于阈值a,判断结果为唤醒电子设备。If it is not less than the threshold value a, the judgment result is to wake up the electronic device.
在一种可能的实现方式中,根据声源方位对应的第二方位的置信度判断是否唤醒电子设备,可以包括:In a possible implementation manner, judging whether to wake up the electronic device according to the confidence of the second orientation corresponding to the sound source orientation may include:
判断声源方位对应的第二方位的置信度是否小于阈值b;Determine whether the confidence of the second azimuth corresponding to the sound source azimuth is less than the threshold b;
如果小于阈值b,判断结果为不唤醒电子设备;If it is less than the threshold b, the judgment result is not to wake up the electronic device;
如果不小于阈值b,判断结果为唤醒电子设备。If it is not less than the threshold value b, the judgment result is to wake up the electronic device.
在一种可能的实现方式中,根据用户声纹对应的第一声纹的置信度判断是否唤醒电子设备,可以包括:In a possible implementation manner, judging whether to wake up the electronic device according to the confidence of the first voiceprint corresponding to the user's voiceprint may include:
判断用户声纹对应的第一声纹的置信度是否小于阈值c;Determine whether the confidence level of the first voiceprint corresponding to the user's voiceprint is less than a threshold c;
如果小于阈值c,判断结果为不唤醒电子设备;If it is less than the threshold value c, the judgment result is not to wake up the electronic device;
如果不小于阈值c,判断结果为唤醒电子设备。If it is not less than the threshold value c, the judgment result is to wake up the electronic device.
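The three threshold comparisons above (thresholds a, b and c) share one rule: a confidence below the threshold yields "do not wake up", and any other value yields "wake up". A minimal sketch of that rule follows; the function name `decide` and the numeric threshold values are illustrative assumptions, since the text gives no concrete numbers.

```python
from typing import Literal

Decision = Literal["wake", "no_wake"]

# Hypothetical values for thresholds a, b and c; the text does not specify them.
THRESHOLD_A = 0.5  # first-orientation confidence
THRESHOLD_B = 0.5  # second-orientation confidence
THRESHOLD_C = 0.6  # first-voiceprint confidence


def decide(confidence: float, threshold: float) -> Decision:
    """Return "no_wake" when confidence is below threshold, otherwise "wake"."""
    return "no_wake" if confidence < threshold else "wake"
```

Note that a confidence exactly equal to the threshold counts as "not less than" and therefore wakes the device.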
在一种可能的实现方式中,还可以包括:In a possible implementation, it can also include:
如果判断结果为唤醒电子设备,且第二方位集合中包括声音的声源方位,提高声源方位对应的第二方位的置信度;If the judgment result is to wake up the electronic device, and the second orientation set includes the sound source orientation of the sound, improve the confidence of the second orientation corresponding to the sound source orientation;
如果判断结果为唤醒电子设备,且第二方位集合中不包括声音的声源方位,将声源方位作为第二方位存储至第二方位集合中,并为该第二方位设置初始置信度。If the determination result is to wake up the electronic device and the sound source orientation of the sound is not included in the second orientation set, the sound source orientation is stored as the second orientation in the second orientation set, and an initial confidence level is set for the second orientation.
在一种可能的实现方式中,还可以包括:In a possible implementation, it can also include:
如果判断结果为唤醒电子设备,且第一声纹集合中包括声音的用户声纹,提高用户声纹对应的第一声纹的置信度;If the judgment result is to wake up the electronic device, and the first voiceprint set includes the user voiceprint of the sound, increase the confidence of the first voiceprint corresponding to the user voiceprint;
如果判断结果为唤醒电子设备,且第一声纹集合中不包括声音的用户声纹,将用户声纹作为第一声纹存储至第一声纹集合中,并为该第一声纹设置初始置信度。If the judgment result is to wake up the electronic device, and the first voiceprint set does not include the user voiceprint of the sound, store the user voiceprint as a first voiceprint in the first voiceprint set, and set an initial confidence for that first voiceprint.
在一种可能的实现方式中,还可以包括:In a possible implementation, it can also include:
如果判断结果为不唤醒电子设备,且第一方位集合中包括声音的声源方位,降低包括声源方位的第一方位的置信度;If the determination result is that the electronic device is not to be woken up, and the sound source orientation of the sound is included in the first orientation set, reducing the confidence level of the first orientation including the sound source orientation;
如果判断结果为不唤醒电子设备,且第一方位集合中不包括声音的声源方位,将声源方位作为第一方位存储至第一方位集合中,并为该第一方位设置初始置信度。If the determination result is that the electronic device is not to be woken up and the sound source orientation of the sound is not included in the first orientation set, the sound source orientation is stored as the first orientation in the first orientation set, and an initial confidence level is set for the first orientation.
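The update rules above can be sketched as follows. This is only an illustration under stated assumptions: the initial confidence, the step size, the clamping to [0, 1], and the representation of orientations as integer keys and voiceprints as string identifiers are all hypothetical, since the text only says that confidences are raised, lowered, or initialized.

```python
from typing import Dict, Hashable

INITIAL_CONFIDENCE = 0.5  # assumed initial confidence
STEP = 0.1                # assumed adjustment step


def raise_or_add(entries: Dict[Hashable, float], key: Hashable) -> None:
    """Raise the confidence of an existing entry, or store a new one."""
    if key in entries:
        entries[key] = min(1.0, entries[key] + STEP)
    else:
        entries[key] = INITIAL_CONFIDENCE


def lower_or_add(entries: Dict[Hashable, float], key: Hashable) -> None:
    """Lower the confidence of an existing entry, or store a new one."""
    if key in entries:
        entries[key] = max(0.0, entries[key] - STEP)
    else:
        entries[key] = INITIAL_CONFIDENCE


def update_after_decision(
    woke: bool,
    orientation: int,
    voiceprint: str,
    first_orientations: Dict[int, float],
    second_orientations: Dict[int, float],
    first_voiceprints: Dict[str, float],
) -> None:
    if woke:
        # Wake-up: update the second orientation set and the first voiceprint set.
        raise_or_add(second_orientations, orientation)
        raise_or_add(first_voiceprints, voiceprint)
    else:
        # No wake-up: update the first orientation set.
        lower_or_add(first_orientations, orientation)
```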
图8的具体实现可以参考图5所示实施例,此处不再赘述。For the specific implementation of FIG. 8 , reference may be made to the embodiment shown in FIG. 5 , which will not be repeated here.
图9是本申请唤醒方法又一个实施例的流程图。该方法可以应用于电子设备例如上述的智能音箱。该方法可以包括:FIG. 9 is a flowchart of another embodiment of the wake-up method of the present application. The method can be applied to electronic devices such as the above-mentioned smart speakers. The method can include:
步骤901:接收到声音,计算声音的唤醒词置信度;唤醒词置信度用于描述声音包括唤醒词声音的概率;Step 901: Receive the sound, and calculate the wake-up word confidence level of the sound; the wake-up word confidence level is used to describe the probability that the sound includes the wake-up word sound;
步骤902:若唤醒词置信度大于等于第一阈值,则计算声音的声源方位;Step 902: If the wake-up word confidence is greater than or equal to the first threshold, calculate the sound source orientation of the sound;
步骤903:判断声源方位是否在第一方位集合中;其中,第一方位集合包括第一方位,第一方位用于记录未唤醒电子设备的声音的声源方位;Step 903: Determine whether the sound source orientation is in the first orientation set; wherein the first orientation set includes the first orientation, and the first orientation is used to record the sound source orientation of the sound that does not wake up the electronic device;
步骤904:如果声源方位在第一方位集合中,根据声源方位对应的第一方位的置信度判断是否唤醒电子设备,第一方位的置信度用于描述第一方位处发出唤醒电子设备的语音的概率。Step 904: If the sound source orientation is in the first orientation set, determine whether to wake up the electronic device according to the confidence of the first orientation corresponding to the sound source orientation; the confidence of the first orientation describes the probability that speech waking up the electronic device is issued from the first orientation.
在一种可能的实现方式中,还可以包括:In a possible implementation, it can also include:
如果声源方位不在第一方位集合中,根据声音提取用户声纹;If the sound source azimuth is not in the first azimuth set, extract the user's voiceprint according to the sound;
判断第一声纹集合中是否包括用户声纹;第一声纹集合中包括第一声纹,第一声纹用于记录唤醒电子设备的声音的用户声纹;Determine whether the first voiceprint set includes a user voiceprint; the first voiceprint set includes a first voiceprint, and the first voiceprint is used to record the user voiceprint of the sound that wakes up the electronic device;
根据用户声纹对应的第一声纹的置信度判断是否唤醒电子设备。Whether to wake up the electronic device is determined according to the confidence of the first voiceprint corresponding to the user's voiceprint.
在一种可能的实现方式中,在计算声音的声源方位之前,还可以包括:In a possible implementation manner, before calculating the sound source orientation of the sound, the method may further include:
判断唤醒词置信度小于第二阈值;第二阈值大于第一阈值。It is judged that the confidence level of the wake-up word is less than the second threshold; the second threshold is greater than the first threshold.
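With two thresholds, the wake-up word confidence falls into one of three bands. A minimal sketch follows, with one loudly labeled assumption: this passage only states that the orientation check runs when the confidence is at least the first threshold and below the second threshold. The handling of the other two bands (rejecting below the first threshold, waking directly at or above the second) is an assumption of this sketch, and the function and label names are hypothetical.

```python
def wake_word_band(confidence: float, first_threshold: float, second_threshold: float) -> str:
    """Classify a wake-up word confidence into one of three bands."""
    if not first_threshold < second_threshold:
        raise ValueError("the second threshold must be greater than the first threshold")
    if confidence < first_threshold:
        return "reject"             # assumed: too unlikely to contain the wake-up word
    if confidence < second_threshold:
        return "check_orientation"  # stated: compute the sound source orientation
    return "wake_directly"          # assumed: confident enough to skip further checks
```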
在一种可能的实现方式中,根据声源方位对应的第一方位的置信度判断是否唤醒电子设备,包括:In a possible implementation manner, judging whether to wake up the electronic device according to the confidence of the first azimuth corresponding to the sound source azimuth includes:
判断声源方位对应的第一方位的置信度是否小于阈值a;Determine whether the confidence of the first azimuth corresponding to the sound source azimuth is less than the threshold a;
如果小于阈值a,判断结果为不唤醒电子设备;If it is less than the threshold value a, the judgment result is not to wake up the electronic device;
如果不小于阈值a,判断结果为唤醒电子设备。If it is not less than the threshold value a, the judgment result is to wake up the electronic device.
在一种可能的实现方式中,根据用户声纹对应的第一声纹的置信度判断是否唤醒电子设备,可以包括:In a possible implementation manner, judging whether to wake up the electronic device according to the confidence of the first voiceprint corresponding to the user's voiceprint may include:
判断用户声纹对应的第一声纹的置信度是否小于阈值c;Determine whether the confidence level of the first voiceprint corresponding to the user's voiceprint is less than a threshold c;
如果小于阈值c,判断结果为不唤醒电子设备;If it is less than the threshold value c, the judgment result is not to wake up the electronic device;
如果不小于阈值c,判断结果为唤醒电子设备。If it is not less than the threshold value c, the judgment result is to wake up the electronic device.
在一种可能的实现方式中,还可以包括:In a possible implementation, it can also include:
如果判断结果为唤醒电子设备,且第一声纹集合中包括声音的用户声纹,提高用户声纹对应的第一声纹的置信度;If the judgment result is to wake up the electronic device, and the first voiceprint set includes the user voiceprint of the sound, increase the confidence of the first voiceprint corresponding to the user voiceprint;
如果判断结果为唤醒电子设备,且第一声纹集合中不包括声音的用户声纹,将用户声纹作为第一声纹存储至第一声纹集合中,并为该第一声纹设置初始置信度。If the judgment result is to wake up the electronic device, and the first voiceprint set does not include the user voiceprint of the sound, store the user voiceprint as a first voiceprint in the first voiceprint set, and set an initial confidence for that first voiceprint.
在一种可能的实现方式中,还可以包括:In a possible implementation, it can also include:
如果判断结果为不唤醒电子设备,且第一方位集合中包括声音的声源方位,降低包括声源方位的第一方位的置信度;If the determination result is that the electronic device is not to be woken up, and the sound source orientation of the sound is included in the first orientation set, reducing the confidence level of the first orientation including the sound source orientation;
如果判断结果为不唤醒电子设备,且第一方位集合中不包括声音的声源方位,将声源方位作为第一方位存储至第一方位集合中,并为该第一方位设置初始置信度。If the determination result is that the electronic device is not to be woken up and the sound source orientation of the sound is not included in the first orientation set, the sound source orientation is stored as the first orientation in the first orientation set, and an initial confidence level is set for the first orientation.
图9的具体实现可以参考图6所示实施例,此处不再赘述。For the specific implementation of FIG. 9 , reference may be made to the embodiment shown in FIG. 6 , which will not be repeated here.
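The decision path of the FIG. 9 embodiment (steps 901 to 904 plus the voiceprint fallback) can be sketched as below. All names are hypothetical; orientations are assumed to be integer keys and voiceprints string identifiers. Where this passage does not state an outcome (confidence below the first threshold, or a voiceprint that is not in the first voiceprint set), the sketch returns `None` rather than guessing.

```python
from typing import Callable, Dict, Optional


def fig9_decision(
    wake_word_confidence: float,
    locate_source: Callable[[], int],
    extract_voiceprint: Callable[[], str],
    first_orientations: Dict[int, float],
    first_voiceprints: Dict[str, float],
    first_threshold: float,
    threshold_a: float,
    threshold_c: float,
) -> Optional[bool]:
    """Return True (wake), False (do not wake) or None (not specified here)."""
    # Steps 901-902: the orientation is computed only at or above the first threshold.
    if wake_word_confidence < first_threshold:
        return None
    orientation = locate_source()
    # Steps 903-904: decide by the confidence of the matching first orientation.
    if orientation in first_orientations:
        return first_orientations[orientation] >= threshold_a
    # Fallback: decide by the confidence of the matching first voiceprint.
    voiceprint = extract_voiceprint()
    if voiceprint in first_voiceprints:
        return first_voiceprints[voiceprint] >= threshold_c
    return None
```

The voiceprint extractor is passed as a callable so that it only runs when the orientation check cannot decide, mirroring the order of the steps.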
图10是本申请实施例提供的唤醒方法又一个实施例的流程图。该方法可以应用于电子设备例如上述的智能音箱。该方法可以包括:FIG. 10 is a flowchart of another embodiment of a wake-up method provided by an embodiment of the present application. The method can be applied to electronic devices such as the above-mentioned smart speakers. The method can include:
步骤1001:接收到声音,计算声音的唤醒词置信度;唤醒词置信度用于描述声音包括唤醒词声音的概率;Step 1001: Receive the sound, calculate the wake-up word confidence level of the sound; the wake-up word confidence level is used to describe the probability that the sound includes the wake-up word sound;
步骤1002:若唤醒词置信度大于等于第一阈值,则计算声音的声源方位;Step 1002: If the wake-up word confidence is greater than or equal to the first threshold, calculate the sound source orientation of the sound;
步骤1003:判断声源方位是否在第二方位集合中;其中,第二方位集合包括第二方位,第二方位用于记录唤醒电子设备的声音的声源方位;Step 1003: determine whether the sound source orientation is in the second orientation set; wherein, the second orientation set includes the second orientation, and the second orientation is used to record the sound source orientation of the sound that wakes up the electronic device;
步骤1004:如果声源方位在第二方位集合中,根据声源方位对应的第二方位的置信度判断是否唤醒电子设备,第二方位的置信度用于描述第二方位处发出唤醒电子设备的语音的概率。Step 1004: If the sound source orientation is in the second orientation set, determine whether to wake up the electronic device according to the confidence of the second orientation corresponding to the sound source orientation; the confidence of the second orientation describes the probability that speech waking up the electronic device is issued from the second orientation.
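Step 1003 requires deciding whether a measured sound source orientation is "in" an orientation set. The text does not say how two orientations are compared; one plausible approach, shown here purely as an assumption, is to treat orientations as angles in degrees and accept a stored orientation within a fixed angular tolerance, taking wrap-around at 360 degrees into account. The tolerance value and function names are hypothetical.

```python
from typing import Dict, Optional


def angular_distance(a_deg: float, b_deg: float) -> float:
    """Smallest absolute difference between two angles, in degrees."""
    diff = abs(a_deg - b_deg) % 360.0
    return min(diff, 360.0 - diff)


def find_matching_orientation(
    source_deg: float,
    stored: Dict[float, float],
    tolerance_deg: float = 15.0,  # assumed tolerance
) -> Optional[float]:
    """Return the closest stored orientation within the tolerance, or None."""
    best: Optional[float] = None
    best_dist = tolerance_deg
    for orientation in stored:
        dist = angular_distance(source_deg, orientation)
        if dist <= best_dist:
            best, best_dist = orientation, dist
    return best
```

Here `stored` maps each stored orientation to its confidence, so the returned key can be used directly to look up the confidence compared against threshold b.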
在一种可能的实现方式中,还可以包括:In a possible implementation, it can also include:
如果声源方位不在第二方位集合中,根据声音提取用户声纹;If the sound source azimuth is not in the second azimuth set, extract the user's voiceprint according to the sound;
判断第一声纹集合中是否包括用户声纹;第一声纹集合中包括第一声纹,第一声纹用于记录唤醒电子设备的声音的用户声纹;Determine whether the first voiceprint set includes a user voiceprint; the first voiceprint set includes a first voiceprint, and the first voiceprint is used to record the user voiceprint of the sound that wakes up the electronic device;
根据用户声纹对应的第一声纹的置信度判断是否唤醒电子设备。Whether to wake up the electronic device is determined according to the confidence of the first voiceprint corresponding to the user's voiceprint.
在一种可能的实现方式中,在计算声音的声源方位之前,还可以包括:In a possible implementation manner, before calculating the sound source orientation of the sound, the method may further include:
判断唤醒词置信度小于第二阈值;第二阈值大于第一阈值。It is judged that the confidence level of the wake-up word is less than the second threshold; the second threshold is greater than the first threshold.
在一种可能的实现方式中,根据声源方位对应的第二方位的置信度判断是否唤醒电子设备,可以包括:In a possible implementation manner, judging whether to wake up the electronic device according to the confidence of the second orientation corresponding to the sound source orientation may include:
判断声源方位对应的第二方位的置信度是否小于阈值b;Determine whether the confidence of the second azimuth corresponding to the sound source azimuth is less than the threshold b;
如果小于阈值b,判断结果为不唤醒电子设备;If it is less than the threshold b, the judgment result is not to wake up the electronic device;
如果不小于阈值b,判断结果为唤醒电子设备。If it is not less than the threshold value b, the judgment result is to wake up the electronic device.
在一种可能的实现方式中,根据用户声纹对应的第一声纹的置信度判断是否唤醒电子设备,可以包括:In a possible implementation manner, judging whether to wake up the electronic device according to the confidence of the first voiceprint corresponding to the user's voiceprint may include:
判断用户声纹对应的第一声纹的置信度是否小于阈值c;Determine whether the confidence level of the first voiceprint corresponding to the user's voiceprint is less than a threshold c;
如果小于阈值c,判断结果为不唤醒电子设备;If it is less than the threshold value c, the judgment result is not to wake up the electronic device;
如果不小于阈值c,判断结果为唤醒电子设备。If it is not less than the threshold value c, the judgment result is to wake up the electronic device.
在一种可能的实现方式中,还可以包括:In a possible implementation, it can also include:
如果判断结果为唤醒电子设备,且第二方位集合中包括声音的声源方位,提高声源方位对应的第二方位的置信度;If the judgment result is to wake up the electronic device, and the second orientation set includes the sound source orientation of the sound, increase the confidence of the second orientation corresponding to the sound source orientation;
如果判断结果为唤醒电子设备,且第二方位集合中不包括声音的声源方位,将声源方位作为第二方位存储至第二方位集合中,并为该第二方位设置初始置信度。If the determination result is to wake up the electronic device and the sound source orientation of the sound is not included in the second orientation set, the sound source orientation is stored as the second orientation in the second orientation set, and an initial confidence level is set for the second orientation.
在一种可能的实现方式中,还可以包括:In a possible implementation, it can also include:
如果判断结果为唤醒电子设备,且第一声纹集合中包括声音的用户声纹,提高用户声纹对应的第一声纹的置信度;If the judgment result is to wake up the electronic device, and the first voiceprint set includes the user's voiceprint of the voice, improve the confidence level of the first voiceprint corresponding to the user's voiceprint;
如果判断结果为唤醒电子设备,且第一声纹集合中不包括声音的用户声纹,将用户声纹作为第一声纹存储至第一声纹集合中,并为该第一声纹设置初始置信度。If the judgment result is to wake up the electronic device, and the first voiceprint set does not include the user voiceprint of the sound, store the user voiceprint as a first voiceprint in the first voiceprint set, and set an initial confidence for that first voiceprint.
图10的具体实现可以参考图7所示实施例,此处不再赘述。For the specific implementation of FIG. 10 , reference may be made to the embodiment shown in FIG. 7 , which will not be repeated here.
可以理解的是,上述实施例中的部分或全部步骤或操作仅是示例,本申请实施例还可以执行其它操作或者各种操作的变形。此外,各个步骤可以按照上述实施例呈现的不同的顺序来执行,并且有可能并非要执行上述实施例中的全部操作。It can be understood that some or all of the steps or operations in the foregoing embodiments are merely examples, and the embodiments of the present application may also perform other operations or variations of these operations. In addition, the steps may be performed in an order different from that presented in the foregoing embodiments, and not all of the operations in the foregoing embodiments necessarily need to be performed.
图11是本申请实施例提供的电子设备的结构示意图。如图11所示,电子设备1100可以包括:计算单元1110和判断单元1120。FIG. 11 is a schematic structural diagram of an electronic device provided by an embodiment of the present application. As shown in FIG. 11, the electronic device 1100 may include a calculation unit 1110 and a judgment unit 1120.
在一个实施例中:In one embodiment:
计算单元1110,用于接收到声音,计算声音的唤醒词置信度;唤醒词置信度用于描述声音包括唤醒词声音的概率,若唤醒词置信度大于等于第一阈值,则计算声音的声源方位;The calculation unit 1110 is used to receive the sound and calculate the wake-up word confidence of the sound; the wake-up word confidence is used to describe the probability that the sound includes the wake-up word sound, if the wake-up word confidence is greater than or equal to the first threshold, then calculate the sound source of the sound position;
判断单元1120,用于判断声源方位是否在第一方位集合或第二方位集合中;其中,第一方位集合包括第一方位,第一方位用于记录未唤醒电子设备的声音的声源方位,第二方位集合包括第二方位,第二方位用于记录唤醒电子设备的声音的声源方位;如果声源方位仅在第一方位集合中,根据声源方位对应的第一方位的置信度判断是否唤醒电子设备,第一方位的置信度用于描述第一方位处发出唤醒电子设备的语音的概率;如果声源方位仅在第二方位集合中,根据声源方位对应的第二方位的置信度判断是否唤醒电子设备,第二方位的置信度用于描述第二方位处发出唤醒电子设备的语音的概率。The judgment unit 1120 is configured to judge whether the sound source orientation is in the first orientation set or the second orientation set, where the first orientation set includes a first orientation that records the sound source orientation of a sound that did not wake up the electronic device, and the second orientation set includes a second orientation that records the sound source orientation of a sound that woke up the electronic device; if the sound source orientation is only in the first orientation set, judge whether to wake up the electronic device according to the confidence of the first orientation corresponding to the sound source orientation, where the confidence of the first orientation describes the probability that speech waking up the electronic device is issued from the first orientation; and if the sound source orientation is only in the second orientation set, judge whether to wake up the electronic device according to the confidence of the second orientation corresponding to the sound source orientation, where the confidence of the second orientation describes the probability that speech waking up the electronic device is issued from the second orientation.
在一种可能的实现方式中,判断单元1120还可以用于:如果声源方位在第一方位集合和第二方位集合中,或者声源方位不在第一方位集合和第二方位集合中,根据声音提取用户声纹;判断第一声纹集合中是否包括用户声纹;第一声纹集合中包括第一声纹,第一声纹用于记录唤醒电子设备的声音的用户声纹;根据用户声纹对应的第一声纹的置信度判断是否唤醒电子设备。In a possible implementation, the judgment unit 1120 may be further configured to: if the sound source orientation is in both the first orientation set and the second orientation set, or is in neither of them, extract a user voiceprint from the sound; judge whether the first voiceprint set includes the user voiceprint, where the first voiceprint set includes a first voiceprint that records the user voiceprint of a sound that woke up the electronic device; and judge whether to wake up the electronic device according to the confidence of the first voiceprint corresponding to the user voiceprint.
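The routing performed by the judgment unit in this embodiment depends only on which of the two orientation sets contain the sound source orientation. A minimal sketch follows; the function name and the returned labels are hypothetical, and set membership is simplified to exact equality of integer orientations.

```python
from typing import Set


def select_check(orientation: int, first_set: Set[int], second_set: Set[int]) -> str:
    """Pick which confidence the wake-up decision should be based on."""
    in_first = orientation in first_set
    in_second = orientation in second_set
    if in_first and not in_second:
        return "first_orientation"   # compare against threshold a
    if in_second and not in_first:
        return "second_orientation"  # compare against threshold b
    # In both sets or in neither: fall back to the user voiceprint (threshold c).
    return "voiceprint"
```

Falling back to the voiceprint in the "both" and "neither" cases reflects that orientation history is either contradictory or absent there.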
在一种可能的实现方式中,判断单元1120还可以用于:在计算声音的声源方位之前,判断唤醒词置信度小于第二阈值;第二阈值大于第一阈值。In a possible implementation manner, the judging unit 1120 may also be configured to: before calculating the sound source azimuth of the sound, judging that the wake-up word confidence is less than a second threshold; the second threshold is greater than the first threshold.
在一种可能的实现方式中,判断单元1120具体可以用于:判断声源方位对应的第一方位的置信度是否小于阈值a;如果小于阈值a,判断结果为不唤醒电子设备;如果不小于阈值a,判断结果为唤醒电子设备。In a possible implementation, the judgment unit 1120 may be specifically configured to: judge whether the confidence of the first orientation corresponding to the sound source orientation is less than threshold a; if it is less than threshold a, the judgment result is not to wake up the electronic device; if it is not less than threshold a, the judgment result is to wake up the electronic device.
在一种可能的实现方式中,判断单元1120具体可以用于:判断声源方位对应的第二方位的置信度是否小于阈值b;如果小于阈值b,判断结果为不唤醒电子设备;如果不小于阈值b,判断结果为唤醒电子设备。In a possible implementation, the judgment unit 1120 may be specifically configured to: judge whether the confidence of the second orientation corresponding to the sound source orientation is less than threshold b; if it is less than threshold b, the judgment result is not to wake up the electronic device; if it is not less than threshold b, the judgment result is to wake up the electronic device.
在一种可能的实现方式中,判断单元1120具体可以用于:判断用户声纹对应的第一声纹的置信度是否小于阈值c;如果小于阈值c,判断结果为不唤醒电子设备;如果不小于阈值c,判断结果为唤醒电子设备。In a possible implementation, the judgment unit 1120 may be specifically configured to: judge whether the confidence of the first voiceprint corresponding to the user voiceprint is less than threshold c; if it is less than threshold c, the judgment result is not to wake up the electronic device; if it is not less than threshold c, the judgment result is to wake up the electronic device.
在一种可能的实现方式中,还可以包括:更新单元,用于如果判断结果为唤醒电子设备,且第二方位集合中包括声音的声源方位,提高声源方位对应的第二方位的置信度;如果判断结果为唤醒电子设备,且第二方位集合中不包括声音的声源方位,将声源方位作为第二方位存储至第二方位集合中,并为该第二方位设置初始置信度。In a possible implementation, the electronic device may further include an update unit, configured to: if the judgment result is to wake up the electronic device and the second orientation set includes the sound source orientation of the sound, increase the confidence of the second orientation corresponding to the sound source orientation; if the judgment result is to wake up the electronic device and the second orientation set does not include the sound source orientation of the sound, store the sound source orientation as a second orientation in the second orientation set, and set an initial confidence for that second orientation.
在一种可能的实现方式中,更新单元还可以用于:如果判断结果为唤醒电子设备,且第一声纹集合中包括声音的用户声纹,提高用户声纹对应的第一声纹的置信度;如果判断结果为唤醒电子设备,且第一声纹集合中不包括声音的用户声纹,将用户声纹作为第一声纹存储至第一声纹集合中,并为该第一声纹设置初始置信度。In a possible implementation, the update unit may be further configured to: if the judgment result is to wake up the electronic device and the first voiceprint set includes the user voiceprint of the sound, increase the confidence of the first voiceprint corresponding to the user voiceprint; if the judgment result is to wake up the electronic device and the first voiceprint set does not include the user voiceprint of the sound, store the user voiceprint as a first voiceprint in the first voiceprint set, and set an initial confidence for that first voiceprint.
在一种可能的实现方式中,更新单元还可以用于:如果判断结果为不唤醒电子设备,且第一方位集合中包括声音的声源方位,降低包括声源方位的第一方位的置信度;如果判断结果为不唤醒电子设备,且第一方位集合中不包括声音的声源方位,将声源方位作为第一方位存储至第一方位集合中,并为该第一方位设置初始置信度。In a possible implementation manner, the updating unit may be further configured to: if the determination result is that the electronic device is not to be woken up, and the sound source azimuth of the sound is included in the first azimuth set, reducing the confidence level of the first azimuth including the sound source azimuth If the judgment result is not to wake up the electronic equipment, and the sound source orientation of the sound is not included in the first orientation set, the sound source orientation is stored as the first orientation in the first orientation set, and the initial confidence level is set for the first orientation .
在另一个实施例中:In another embodiment:
计算单元1110,用于接收到声音;计算声音的唤醒词置信度;唤醒词置信度用于描述声音包括唤醒词声音的概率;若唤醒词置信度大于等于第一阈值,则计算声音的声源方位;The calculation unit 1110 is used to receive the sound; calculate the wake-up word confidence level of the sound; the wake-up word confidence level is used to describe the probability that the sound includes the wake-up word sound; if the wake-up word confidence level is greater than or equal to the first threshold, then calculate the sound source of the sound position;
判断单元1120,用于判断声源方位是否在第一方位集合中;其中,第一方位集合包括第一方位,第一方位用于记录未唤醒电子设备的声音的声源方位;如果声源方位在第一方位集合中,根据声源方位对应的第一方位的置信度判断是否唤醒电子设备,第一方位的置信度用于描述第一方位处发出唤醒电子设备的语音的概率。The judgment unit 1120 is used to judge whether the sound source azimuth is in the first azimuth set; wherein, the first azimuth set includes the first azimuth, and the first azimuth is used to record the sound source azimuth of the sound that does not wake up the electronic device; if the sound source azimuth In the first set of orientations, whether to wake up the electronic device is determined according to the confidence level of the first orientation corresponding to the sound source orientation, and the confidence level of the first orientation is used to describe the probability that the voice to wake up the electronic device is issued at the first orientation.
在一种可能的实现方式中,判断单元1120还可以用于:如果声源方位不在第一方位集合中,根据声音提取用户声纹;判断第一声纹集合中是否包括用户声纹;第一声纹集合中包括第一声纹,第一声纹用于记录唤醒电子设备的声音的用户声纹;根据用户声纹对应的第一声纹的置信度判断是否唤醒电子设备。In a possible implementation, the judgment unit 1120 may be further configured to: if the sound source orientation is not in the first orientation set, extract a user voiceprint from the sound; judge whether the first voiceprint set includes the user voiceprint, where the first voiceprint set includes a first voiceprint that records the user voiceprint of a sound that woke up the electronic device; and judge whether to wake up the electronic device according to the confidence of the first voiceprint corresponding to the user voiceprint.
在一种可能的实现方式中,判断单元1120还可以用于:在计算声音的声源方位之前,判断唤醒词置信度小于第二阈值;第二阈值大于第一阈值。In a possible implementation manner, the judging unit 1120 may also be configured to: before calculating the sound source azimuth of the sound, judging that the wake-up word confidence is less than a second threshold; the second threshold is greater than the first threshold.
在一种可能的实现方式中,判断单元1120具体可以用于:判断声源方位对应的第一方位的置信度是否小于阈值a;如果小于阈值a,判断结果为不唤醒电子设备;如果不小于阈值a,判断结果为唤醒电子设备。In a possible implementation, the judgment unit 1120 may be specifically configured to: judge whether the confidence of the first orientation corresponding to the sound source orientation is less than threshold a; if it is less than threshold a, the judgment result is not to wake up the electronic device; if it is not less than threshold a, the judgment result is to wake up the electronic device.
在一种可能的实现方式中,判断单元1120具体可以用于:判断用户声纹对应的第一声纹的置信度是否小于阈值c;如果小于阈值c,判断结果为不唤醒电子设备;如果不小于阈值c,判断结果为唤醒电子设备。In a possible implementation, the judgment unit 1120 may be specifically configured to: judge whether the confidence of the first voiceprint corresponding to the user voiceprint is less than threshold c; if it is less than threshold c, the judgment result is not to wake up the electronic device; if it is not less than threshold c, the judgment result is to wake up the electronic device.
在一种可能的实现方式中,还可以包括:In a possible implementation, it can also include:
更新单元,用于如果判断结果为唤醒电子设备,且第一声纹集合中包括声音的用户声纹,提高用户声纹对应的第一声纹的置信度;如果判断结果为唤醒电子设备,且第一声纹集合中不包括声音的用户声纹,将用户声纹作为第一声纹存储至第一声纹集合中,并为该第一声纹设置初始置信度。An update unit, configured to: if the judgment result is to wake up the electronic device and the first voiceprint set includes the user voiceprint of the sound, increase the confidence of the first voiceprint corresponding to the user voiceprint; if the judgment result is to wake up the electronic device and the first voiceprint set does not include the user voiceprint of the sound, store the user voiceprint as a first voiceprint in the first voiceprint set, and set an initial confidence for that first voiceprint.
在一种可能的实现方式中,更新单元还可以用于:如果判断结果为不唤醒电子设备,且第一方位集合中包括声音的声源方位,降低包括声源方位的第一方位的置信度;如果判断结果为不唤醒电子设备,且第一方位集合中不包括声音的声源方位,将声源方位作为第一方位存储至第一方位集合中,并为该第一方位设置初始置信度。In a possible implementation manner, the updating unit may be further configured to: if the determination result is that the electronic device is not to be woken up, and the sound source azimuth of the sound is included in the first azimuth set, reducing the confidence level of the first azimuth including the sound source azimuth If the judgment result is not to wake up the electronic equipment, and the sound source orientation of the sound is not included in the first orientation set, the sound source orientation is stored as the first orientation in the first orientation set, and the initial confidence level is set for the first orientation .
在又一个实施例中:In yet another embodiment:
计算单元1110,用于接收到声音;计算声音的唤醒词置信度;唤醒词置信度用于描述声音包括唤醒词声音的概率;若唤醒词置信度大于等于第一阈值,则计算声音的声源方位;The calculation unit 1110 is used to receive the sound; calculate the wake-up word confidence level of the sound; the wake-up word confidence level is used to describe the probability that the sound includes the wake-up word sound; if the wake-up word confidence level is greater than or equal to the first threshold, then calculate the sound source of the sound position;
判断单元1120,用于判断声源方位是否在第二方位集合中;其中,第二方位集合包括第二方位,第二方位用于记录唤醒电子设备的声音的声源方位;如果声源方位在第二方位集合中,根据声源方位对应的第二方位的置信度判断是否唤醒电子设备,第二方位的置信度用于描述第二方位处发出唤醒电子设备的语音的概率。The judgment unit 1120 is used to judge whether the sound source azimuth is in the second azimuth set; wherein, the second azimuth set includes the second azimuth, and the second azimuth is used to record the sound source azimuth of the sound that wakes up the electronic device; if the sound source azimuth is in In the second azimuth set, whether to wake up the electronic device is determined according to the confidence of the second azimuth corresponding to the sound source azimuth, and the confidence of the second azimuth is used to describe the probability that the voice to wake up the electronic device is issued at the second azimuth.
在一种可能的实现方式中,判断单元1120还可以用于:如果声源方位不在第二方位集合中,根据声音提取用户声纹;判断第一声纹集合中是否包括用户声纹;第一声纹集合中包括第一声纹,第一声纹用于记录唤醒电子设备的声音的用户声纹;根据用户声纹对应的第一声纹的置信度判断是否唤醒电子设备。In a possible implementation, the judgment unit 1120 may be further configured to: if the sound source orientation is not in the second orientation set, extract a user voiceprint from the sound; judge whether the first voiceprint set includes the user voiceprint, where the first voiceprint set includes a first voiceprint that records the user voiceprint of a sound that woke up the electronic device; and judge whether to wake up the electronic device according to the confidence of the first voiceprint corresponding to the user voiceprint.
在一种可能的实现方式中,判断单元1120还可以用于:在计算声音的声源方位之前,判断唤醒词置信度小于第二阈值;第二阈值大于第一阈值。In a possible implementation manner, the judging unit 1120 may also be configured to: before calculating the sound source azimuth of the sound, judging that the wake-up word confidence is less than a second threshold; the second threshold is greater than the first threshold.
在一种可能的实现方式中,判断单元1120具体可以用于:判断声源方位对应的第二方位的置信度是否小于阈值b;如果小于阈值b,判断结果为不唤醒电子设备;如果不小于阈值b,判断结果为唤醒电子设备。In a possible implementation, the judgment unit 1120 may be specifically configured to: judge whether the confidence of the second orientation corresponding to the sound source orientation is less than threshold b; if it is less than threshold b, the judgment result is not to wake up the electronic device; if it is not less than threshold b, the judgment result is to wake up the electronic device.
在一种可能的实现方式中,判断单元1120具体可以用于:判断用户声纹对应的第一声纹的置信度是否小于阈值c;如果小于阈值c,判断结果为不唤醒电子设备;如果不小于阈值c,判断结果为唤醒电子设备。In a possible implementation, the judgment unit 1120 may be specifically configured to: judge whether the confidence of the first voiceprint corresponding to the user voiceprint is less than threshold c; if it is less than threshold c, the judgment result is not to wake up the electronic device; if it is not less than threshold c, the judgment result is to wake up the electronic device.
在一种可能的实现方式中,还可以包括:In a possible implementation, it can also include:
更新单元,用于如果判断结果为唤醒电子设备,且第二方位集合中包括声音的声源方位,提高声源方位对应的第二方位的置信度;如果判断结果为唤醒电子设备,且第二方位集合中不包括声音的声源方位,将声源方位作为第二方位存储至第二方位集合中,并为该第二方位设置初始置信度。An update unit, configured to: if the judgment result is to wake up the electronic device and the second orientation set includes the sound source orientation of the sound, increase the confidence of the second orientation corresponding to the sound source orientation; if the judgment result is to wake up the electronic device and the second orientation set does not include the sound source orientation of the sound, store the sound source orientation as a second orientation in the second orientation set, and set an initial confidence for that second orientation.
在一种可能的实现方式中,更新单元还可以用于:如果判断结果为唤醒电子设备,且第一声纹集合中包括声音的用户声纹,提高用户声纹对应的第一声纹的置信度;如果判断结果为唤醒电子设备,且第一声纹集合中不包括声音的用户声纹,将用户声纹作为第一声纹存储至第一声纹集合中,并为该第一声纹设置初始置信度。In a possible implementation, the update unit may be further configured to: if the judgment result is to wake up the electronic device and the first voiceprint set includes the user voiceprint of the sound, increase the confidence of the first voiceprint corresponding to the user voiceprint; if the judgment result is to wake up the electronic device and the first voiceprint set does not include the user voiceprint of the sound, store the user voiceprint as a first voiceprint in the first voiceprint set, and set an initial confidence for that first voiceprint.
图11所示实施例提供的电子设备可用于执行本申请图5~图7所示方法实施例的技术方案,其实现原理和技术效果可以进一步参考方法实施例中的相关描述。The electronic device provided by the embodiment shown in FIG. 11 can be used to implement the technical solutions of the method embodiments shown in FIG. 5 to FIG. 7 of the present application. For the implementation principle and technical effect, reference may be made to the related descriptions in the method embodiments.
应理解以上图11所示装置的各个单元的划分仅仅是一种逻辑功能的划分,实际实现时可以全部或部分集成到一个物理实体上,也可以物理上分开。且这些单元可以全部以软件通过处理元件调用的形式实现;也可以全部以硬件的形式实现;还可以部分单元以软件通过处理元件调用的形式实现,部分单元通过硬件的形式实现。例如,获取单元可以为单独设立的处理元件,也可以集成在电子设备的某一个芯片中实现。其它单元的实现与之类似。此外这些单元全部或部分可以集成在一起,也可以独立实现。在实现过程中,上述方法的各步骤或以上各个单元可以通过处理器元件中的硬件的集成逻辑电路或者软件形式的指令完成。It should be understood that the division of each unit of the apparatus shown in FIG. 11 above is only a division of logical functions, and may be fully or partially integrated into a physical entity in actual implementation, or may be physically separated. And these units can all be implemented in the form of software calling through processing elements; they can also all be implemented in hardware; some units can also be implemented in the form of software calling through processing elements, and some units can be implemented in hardware. For example, the acquisition unit may be a separately established processing element, or may be integrated in a certain chip of the electronic device. The implementation of other units is similar. In addition, all or part of these units can be integrated together, and can also be implemented independently. In the implementation process, each step of the above-mentioned method or each of the above-mentioned units may be completed by an integrated logic circuit of hardware in the processor element or an instruction in the form of software.
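As an illustration of the logical division described above, the units can be modeled as separate software objects composed into one device. This is only a sketch under stated assumptions: the class names, method names and callable parameters are hypothetical, the judgment unit shown implements only the second-orientation check, and `None` marks cases whose outcome this sketch does not decide.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional


@dataclass
class CalculationUnit:
    score_wake_word: Callable[[bytes], float]  # assumed wake-up word scorer
    locate_source: Callable[[bytes], int]      # assumed sound source locator
    first_threshold: float

    def process(self, sound: bytes) -> Optional[int]:
        """Return the sound source orientation, or None below the first threshold."""
        if self.score_wake_word(sound) < self.first_threshold:
            return None
        return self.locate_source(sound)


@dataclass
class JudgmentUnit:
    second_orientations: Dict[int, float]  # orientation -> confidence
    threshold_b: float

    def judge(self, orientation: int) -> Optional[bool]:
        """Decide by the second-orientation confidence; None if not stored."""
        confidence = self.second_orientations.get(orientation)
        if confidence is None:
            return None
        return confidence >= self.threshold_b


@dataclass
class ElectronicDevice:
    calculation_unit: CalculationUnit
    judgment_unit: JudgmentUnit

    def on_sound(self, sound: bytes) -> Optional[bool]:
        orientation = self.calculation_unit.process(sound)
        if orientation is None:
            return None
        return self.judgment_unit.judge(orientation)
```

Because each unit is an independent object, either one could be swapped for a hardware-backed implementation without changing the other, which is the point of a purely logical division.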
An embodiment of this application further provides an electronic device, including: a processor; a memory; and a computer program, wherein the computer program is stored in the memory and includes instructions that, when executed by the device, cause the device to perform the methods shown in FIG. 5 to FIG. 7.
An embodiment of this application further provides a computer-readable storage medium storing a computer program that, when run on a computer, causes the computer to perform the methods provided by the embodiments shown in FIG. 5 to FIG. 7 of this application.
An embodiment of this application further provides a computer program product including a computer program that, when run on a computer, causes the computer to perform the methods provided by the embodiments shown in FIG. 5 to FIG. 7 of this application.
A person of ordinary skill in the art will appreciate that the units and algorithm steps described in the embodiments disclosed herein can be implemented in electronic hardware, or in a combination of computer software and electronic hardware. Whether these functions are performed in hardware or software depends on the specific application and the design constraints of the technical solution. A skilled person may use different methods to implement the described functions for each particular application, but such implementations should not be considered beyond the scope of this application.
A person skilled in the art will clearly understand that, for convenience and brevity of description, the specific working processes of the systems, apparatuses, and units described above may be found in the corresponding processes of the foregoing method embodiments and are not repeated here.
In the several embodiments provided in this application, any function that is implemented in the form of a software functional unit and sold or used as an independent product may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of this application in essence, or the part that contributes to the prior art, or a part of the technical solution, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to perform all or some of the steps of the methods described in the embodiments of this application. The foregoing storage medium includes various media that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc.
The foregoing describes only specific implementations of this application. Any variation or replacement that a person skilled in the art could readily conceive of within the technical scope disclosed in this application shall fall within the protection scope of this application. The protection scope of this application shall be subject to the protection scope of the claims.

Claims (21)

  1. A wake-up method, applied to an electronic device comprising a sound pickup and a speaker, the sound pickup comprising a plurality of microphones, characterized in that the method comprises:
    receiving a sound;
    calculating a wake-up word confidence of the sound, wherein the wake-up word confidence represents the probability that the sound includes a wake-up word;
    after the wake-up word confidence is greater than or equal to a first threshold, calculating a sound source orientation of the sound;
    after the sound source orientation matches a first orientation in a first orientation set, and
    after the first orientation confidence corresponding to the matched first orientation is greater than or equal to a third threshold, waking up the electronic device; or
    after the first orientation confidence corresponding to the matched first orientation is less than the third threshold, not waking up the electronic device;
    wherein the wake-up word is used to wake up the electronic device; the sound source orientation is the direction and position of the sound source relative to the electronic device; the first orientation set includes M first orientation elements, each first orientation element including a first orientation and a first orientation confidence; the first orientation is the direction and position, relative to the electronic device, of a sound source that woke up the electronic device, and indicates that the electronic device has been woken up from the first orientation; the first orientation confidence represents the probability of waking up the electronic device from the first orientation; and M is a positive integer greater than or equal to 1.
  2. The method according to claim 1, characterized in that
    calculating the orientation of the sound source corresponding to the sound after the wake-up word confidence is greater than or equal to the first threshold comprises:
    after the wake-up word confidence is greater than or equal to the first threshold and is less than a second threshold, calculating the orientation of the sound source corresponding to the sound.
  3. The method according to claim 1 or 2, characterized in that the sound source orientation matching a first orientation in the first orientation set comprises:
    the angular deviation between the direction of the sound source orientation relative to the electronic device and the direction of a first orientation in the first orientation set relative to the electronic device is within a preset fourth threshold; and
    the position deviation between the position of the sound source orientation relative to the electronic device and the position of that first orientation relative to the electronic device is within a preset fifth threshold.
  4. The method according to any one of claims 1-3, characterized in that the method further comprises:
    after the sound source orientation matches none of the first orientations in the first orientation set,
    extracting a voiceprint from the sound;
    after the voiceprint matches a first voiceprint in the first voiceprint set, and
    after the first voiceprint confidence corresponding to the first voiceprint is greater than or equal to a preset sixth threshold, waking up the electronic device; or
    after the first voiceprint confidence corresponding to the first voiceprint is less than the preset sixth threshold, not waking up the electronic device;
    wherein the first voiceprint set includes L voiceprint elements, each voiceprint element including a first voiceprint and a first voiceprint confidence; the first voiceprint represents a voiceprint that woke up the electronic device; the first voiceprint confidence represents the probability that the first voiceprint wakes up the electronic device; and L is a positive integer greater than or equal to 1.
  5. The method according to any one of claims 1-4, characterized in that, after waking up the electronic device, the method further comprises: updating the first orientation set and the first voiceprint set.
  6. The method according to any one of claims 2-5, characterized in that
    after the wake-up word confidence is greater than or equal to the second threshold, the electronic device is woken up, and the first orientation set and the first voiceprint set are updated.
  7. A wake-up method, applied to an electronic device comprising a sound pickup and a speaker, the sound pickup comprising a plurality of microphones, characterized in that the method comprises:
    receiving a sound;
    calculating a wake-up word confidence of the sound, wherein the wake-up word confidence represents the probability that the sound includes a wake-up word;
    after the wake-up word confidence is greater than or equal to a first threshold, calculating a sound source orientation of the sound;
    after the sound source orientation matches a second orientation in a second orientation set, and
    after the second orientation confidence corresponding to the matched second orientation is greater than or equal to a seventh threshold, waking up the electronic device; or
    after the second orientation confidence corresponding to the matched second orientation is less than the seventh threshold, not waking up the electronic device;
    wherein the wake-up word is used to wake up the electronic device; the sound source orientation is the direction and position of the sound source relative to the electronic device; the second orientation set includes N second orientation elements, each second orientation element including a second orientation and a second orientation confidence; the second orientation is the direction and position, relative to the electronic device, of a sound source that did not wake up the electronic device, and indicates that the electronic device was not woken up from the second orientation; the second orientation confidence represents the probability that the electronic device is not woken up from the second orientation; and N is a positive integer greater than or equal to 1.
  8. The method according to claim 7, characterized in that the sound source orientation matching a second orientation in the second orientation set comprises:
    the angular deviation between the direction of the sound source orientation relative to the electronic device and the direction of a second orientation in the second orientation set relative to the electronic device is within a preset eighth threshold; and
    the position deviation between the position of the sound source orientation relative to the electronic device and the position of that second orientation relative to the electronic device is within a preset ninth threshold.
  9. The method according to claim 7 or 8, characterized in that the method further comprises:
    after the sound source orientation matches none of the second orientations in the second orientation set,
    extracting a voiceprint from the sound;
    after the voiceprint matches none of the first voiceprints in the first voiceprint set, updating the second orientation set;
    wherein the first voiceprint set includes L voiceprint elements, each voiceprint element including a first voiceprint and a first voiceprint confidence; the first voiceprint represents a voiceprint that woke up the electronic device; the first voiceprint confidence represents the probability that the first voiceprint wakes up the electronic device; and L is a positive integer greater than or equal to 1.
  10. The method according to claim 7 or 8, characterized in that the method further comprises:
    after the sound source orientation matches none of the second orientations in the second orientation set,
    extracting a voiceprint from the sound;
    after the voiceprint matches a first voiceprint in the first voiceprint set, and
    after the first voiceprint confidence corresponding to the first voiceprint is greater than or equal to a preset tenth threshold, waking up the electronic device; or
    after the first voiceprint confidence corresponding to the first voiceprint is less than the preset tenth threshold, not waking up the electronic device, and updating the second orientation set;
    wherein the first voiceprint set includes L voiceprint elements, each voiceprint element including a first voiceprint and a first voiceprint confidence; the first voiceprint confidence represents the probability that the first voiceprint wakes up the electronic device; the first voiceprint represents a voiceprint that woke up the electronic device; and L is a positive integer greater than or equal to 1.
  11. The method according to any one of claims 7-10, characterized in that
    after waking up the electronic device, the method further comprises: updating the first voiceprint set;
    after not waking up the electronic device, the method further comprises: updating the second orientation set.
  12. The method according to any one of claims 9-11, characterized in that
    after the wake-up word confidence is greater than or equal to a second threshold, the electronic device is woken up, and the first voiceprint set is updated.
  13. A wake-up method, applied to an electronic device comprising a sound pickup and a speaker, the sound pickup comprising a plurality of microphones, characterized in that the method comprises:
    receiving a sound;
    calculating a wake-up word confidence of the sound, wherein the wake-up word confidence represents the probability that the sound includes a wake-up word;
    after the wake-up word confidence is greater than or equal to a first threshold, calculating a sound source orientation of the sound;
    after the sound source orientation matches a second orientation in a second orientation set, and after the sound source orientation matches none of the first orientations in a first orientation set, and
    after the second orientation confidence corresponding to the matched second orientation is greater than or equal to an eleventh threshold, waking up the electronic device; or
    after the second orientation confidence corresponding to the matched second orientation is less than the eleventh threshold, not waking up the electronic device;
    wherein the wake-up word is used to wake up the electronic device; the sound source orientation is the direction and position of the sound source relative to the electronic device; the first orientation set includes M first orientation elements, each first orientation element including a first orientation and a first orientation confidence; the first orientation is the direction and position, relative to the electronic device, of a sound source that woke up the electronic device, and indicates that the electronic device has been woken up from the first orientation; the first orientation confidence represents the probability of waking up the electronic device from the first orientation; the second orientation set includes N second orientation elements, each second orientation element including a second orientation and a second orientation confidence; the second orientation is the direction and position, relative to the electronic device, of a sound source that did not wake up the electronic device, and indicates that the electronic device was not woken up from the second orientation; the second orientation confidence represents the confidence that the electronic device is not woken up from the second orientation; and M and N are both positive integers greater than or equal to 1.
  14. The method according to claim 13, characterized in that
    the sound source orientation matching a second orientation in the second orientation set comprises:
    the angular deviation between the direction of the sound source orientation relative to the electronic device and the direction of a second orientation in the second orientation set relative to the electronic device is within a preset twelfth threshold; and
    the position deviation between the position of the sound source orientation relative to the electronic device and the position of that second orientation relative to the electronic device is within a preset thirteenth threshold;
    the sound source orientation matching none of the first orientations in the first orientation set comprises:
    for every first orientation in the first orientation set, the angular deviation between the direction of the sound source orientation relative to the electronic device and the direction of that first orientation relative to the electronic device is not within a preset fourteenth threshold; and
    for every first orientation in the first orientation set, the position deviation between the position of the sound source orientation relative to the electronic device and the position of that first orientation relative to the electronic device is not within a preset fifteenth threshold.
  15. The method according to claim 13, characterized in that the method further comprises:
    after the sound source orientation matches a first orientation in the first orientation set, and after the sound source orientation matches none of the second orientations in the second orientation set, and
    after the first orientation confidence corresponding to the matched first orientation is greater than or equal to a sixteenth threshold, waking up the electronic device; or
    after the first orientation confidence corresponding to the matched first orientation is less than the sixteenth threshold, not waking up the electronic device.
  16. The method according to claim 15, characterized in that
    the sound source orientation matching a first orientation in the first orientation set comprises:
    the angular deviation between the direction of the sound source orientation relative to the electronic device and the direction of a first orientation in the first orientation set relative to the electronic device is within a preset fourteenth threshold; and
    the position deviation between the position of the sound source orientation relative to the electronic device and the position of that first orientation relative to the electronic device is within a preset fifteenth threshold;
    the sound source orientation matching none of the second orientations in the second orientation set comprises:
    for every second orientation in the second orientation set, the angular deviation between the direction of the sound source orientation relative to the electronic device and the direction of that second orientation relative to the electronic device is not within a preset twelfth threshold; and
    for every second orientation in the second orientation set, the position deviation between the position of the sound source orientation relative to the electronic device and the position of that second orientation relative to the electronic device is not within a preset thirteenth threshold.
  17. The method according to claim 13, characterized in that the method further comprises:
    after the sound source orientation matches none of the second orientations in the second orientation set, and after the sound source orientation matches none of the first orientations in the first orientation set,
    extracting a voiceprint from the sound;
    after the voiceprint matches a first voiceprint in the first voiceprint set, and
    after the first voiceprint confidence corresponding to the first voiceprint is greater than or equal to a preset sixteenth threshold, waking up the electronic device, and updating the first orientation set and the first voiceprint set; or
    after the first voiceprint confidence corresponding to the first voiceprint is less than the preset sixteenth threshold, not waking up the electronic device, and updating the second orientation set;
    wherein the first voiceprint set includes L voiceprint elements, each voiceprint element including a first voiceprint and a first voiceprint confidence; the first voiceprint confidence represents the probability that the first voiceprint wakes up the electronic device; the first voiceprint represents a voiceprint that woke up the electronic device; and L is a positive integer greater than or equal to 1.
  18. The method according to claim 17, characterized in that the method further comprises:
    after the voiceprint matches none of the first voiceprints in the first voiceprint set, updating the second orientation set.
  19. The method according to any one of claims 13-16, characterized in that the method further comprises:
    after waking up the electronic device, updating the first orientation set;
    after not waking up the electronic device, updating the second orientation set.
  20. An electronic device, comprising a sound pickup and a speaker, the sound pickup comprising a plurality of microphones, characterized in that the electronic device further comprises:
    a processor;
    a memory;
    and a computer program, wherein the computer program is stored in the memory and, when executed by the processor, causes the electronic device to perform the method according to any one of claims 1-19.
  21. A computer-readable storage medium, characterized in that the computer-readable storage medium comprises a computer program that, when run on an electronic device, causes the electronic device to perform the method according to any one of claims 1-19, wherein the electronic device comprises a sound pickup and a speaker, and the sound pickup comprises a plurality of microphones.
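The decision flow recited in claims 1 to 3 (with the direct wake-up of claim 6) can be sketched roughly as follows. This is an illustrative reading, not the applicant's implementation: every name here (`Decision`, `Orientation`, `OrientationElement`, `angular_deviation`, `find_match`, `decide_wake`) is hypothetical, "position" is reduced to a single distance value as a simplifying assumption, and the branch for a wake-up word confidence below the first threshold is assumed to mean no wake-up because the claims do not state it.

```python
import enum
from dataclasses import dataclass
from typing import List, Optional


class Decision(enum.Enum):
    WAKE = "wake"
    NO_WAKE = "no_wake"
    CHECK_VOICEPRINT = "check_voiceprint"  # no orientation matched (claim 4 path)


@dataclass
class Orientation:
    angle_deg: float   # direction of the source relative to the device
    distance_m: float  # assumed stand-in for "position": range from the device


@dataclass
class OrientationElement:
    orientation: Orientation
    confidence: float  # probability of waking the device from this orientation


def angular_deviation(a_deg: float, b_deg: float) -> float:
    """Smallest absolute difference between two angles, in degrees."""
    diff = abs(a_deg - b_deg) % 360.0
    return min(diff, 360.0 - diff)


def find_match(source: Orientation, elements: List[OrientationElement],
               max_angle_dev: float, max_pos_dev: float) -> Optional[OrientationElement]:
    """Return the first element whose direction and position are both within tolerance."""
    for element in elements:
        stored = element.orientation
        if (angular_deviation(source.angle_deg, stored.angle_deg) <= max_angle_dev
                and abs(source.distance_m - stored.distance_m) <= max_pos_dev):
            return element
    return None


def decide_wake(wake_word_conf: float, source: Orientation,
                first_set: List[OrientationElement], *,
                first_threshold: float, second_threshold: float,
                third_threshold: float, max_angle_dev: float,
                max_pos_dev: float) -> Decision:
    if wake_word_conf < first_threshold:
        # Assumption: the claims do not recite this branch; treated as no wake-up.
        return Decision.NO_WAKE
    if wake_word_conf >= second_threshold:
        # Claim 6: a high enough wake-up word confidence wakes the device directly.
        return Decision.WAKE
    match = find_match(source, first_set, max_angle_dev, max_pos_dev)
    if match is None:
        return Decision.CHECK_VOICEPRINT
    if match.confidence >= third_threshold:
        return Decision.WAKE
    return Decision.NO_WAKE
```

The sketch takes the sound source orientation as an argument for brevity; in the claimed method it is computed only after the wake-up word confidence reaches the first threshold.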
PCT/CN2021/120305 2020-09-30 2021-09-24 Electronic device and wake-up method thereof WO2022068694A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202011063583.4 2020-09-30
CN202011063583.4A CN114360546A (en) 2020-09-30 2020-09-30 Electronic equipment and awakening method thereof

Publications (1)

Publication Number Publication Date
WO2022068694A1 (en) 2022-04-07

Family

ID=80951134

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/120305 WO2022068694A1 (en) 2020-09-30 2021-09-24 Electronic device and wake-up method thereof

Country Status (2)

Country Link
CN (1) CN114360546A (en)
WO (1) WO2022068694A1 (en)

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20140188483A1 (en) * 2012-12-28 2014-07-03 Alpine Electronics, Inc. Audio device and storage medium
CN108800473A (en) * 2018-07-20 2018-11-13 珠海格力电器股份有限公司 Device control method and apparatus, storage medium, and electronic apparatus
CN110428810A (en) * 2019-08-30 2019-11-08 北京声智科技有限公司 A kind of recognition methods, device and electronic equipment that voice wakes up
CN110727821A (en) * 2019-10-12 2020-01-24 深圳海翼智新科技有限公司 Method, apparatus, system and computer storage medium for preventing device from being awoken by mistake

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11762052B1 (en) * 2021-09-15 2023-09-19 Amazon Technologies, Inc. Sound source localization
CN115376524A (en) * 2022-07-15 2022-11-22 荣耀终端有限公司 Voice awakening method, electronic equipment and chip system
CN115376524B (en) * 2022-07-15 2023-08-04 荣耀终端有限公司 Voice awakening method, electronic equipment and chip system

Also Published As

Publication number Publication date
CN114360546A (en) 2022-04-15

Similar Documents

Publication Publication Date Title
EP2821992B1 (en) Method for updating voiceprint feature model and terminal
WO2020228815A1 (en) Voice-based wakeup method and device
CN107580113B (en) Reminding method, device, storage medium and terminal
JP2019117623A (en) Voice dialogue method, apparatus, device and storage medium
WO2018045536A1 (en) Sound signal processing method, terminal, and headphones
CN110910872A (en) Voice interaction method and device
CN112470217A (en) Method for determining electronic device to perform speech recognition and electronic device
CN108475502A (en) Speech enhan-cement perceptual model
KR20160026317A (en) Method and apparatus for voice recording
CN107919138B (en) Emotion processing method in voice and mobile terminal
CN108712566A (en) A kind of voice assistant awakening method and mobile terminal
WO2022068694A1 (en) Electronic device and wake-up method thereof
CN111599358A (en) Voice interaction method and electronic equipment
CN107371102B (en) Audio playing volume control method and device, storage medium and mobile terminal
CN105975063B (en) A kind of method and apparatus controlling intelligent terminal
CN106293601A (en) A kind of audio frequency playing method and device
WO2023207149A1 (en) Speech recognition method and electronic device
WO2022143258A1 (en) Voice interaction processing method and related apparatus
CN112394901A (en) Audio output mode adjusting method and device and electronic equipment
CN115810356A (en) Voice control method, device, storage medium and electronic equipment
CN111613213A (en) Method, device, equipment and storage medium for audio classification
WO2022001170A1 (en) Call prompting method, call device, readable storage medium and system on chip
WO2019061292A1 (en) Noise reduction method for terminal and terminal
CN114765026A (en) Voice control method, device and system
CN116405593B (en) Audio processing method and related device

Legal Events

Date Code Title Description
121 EP: the EPO has been informed by WIPO that EP was designated in this application (Ref document number: 21874354; Country of ref document: EP; Kind code of ref document: A1)
NENP Non-entry into the national phase (Ref country code: DE)
122 EP: PCT application non-entry in European phase (Ref document number: 21874354; Country of ref document: EP; Kind code of ref document: A1)