CN110211578B

CN110211578B - Sound box control method, device and equipment

Info

Publication number: CN110211578B
Application number: CN201910304851.8A
Authority: CN
Inventors: 戚耀文
Original assignee: Baidu Online Network Technology Beijing Co Ltd; Shanghai Xiaodu Technology Co Ltd
Current assignee: Beijing Baidu Netcom Science and Technology Co Ltd; Shanghai Xiaodu Technology Co Ltd
Priority date: 2019-04-16
Filing date: 2019-04-16
Publication date: 2022-01-04
Anticipated expiration: 2039-04-16
Also published as: CN110211578A

Abstract

The embodiment of the invention provides a sound box control method, device and equipment. The method comprises the following steps: after at least two sound boxes detect preset voice information, acquiring sound reception energy information of the at least two sound boxes, wherein the sound reception energy information indicates the volume at which each sound box received the preset voice information; and determining a target sound box among the at least two sound boxes according to the sound reception energy information of the at least two sound boxes, and awakening the target sound box. The accuracy of controlling the sound box is improved.

Description

Sound box control method, device and equipment

Technical Field

The embodiment of the invention relates to the technical field of computers, in particular to a sound box control method, a sound box control device and sound box control equipment.

Background

Currently, smart speakers are deployed in many scenes (e.g., home scenes, laboratory scenes, etc.), and users can control the smart speakers through voice.

In practical applications, a plurality of smart speakers may be deployed in the same scene, and when a user needs to wake up one smart speaker, the user can speak a wake-up word. However, when the user's voice is too quiet, the smart speaker cannot detect the user's voice, so the smart speaker is not awakened. When the user's voice is too loud, other smart speakers may be awakened along with the intended smart speaker, so that smart speakers are awakened by mistake. Therefore, the accuracy of controlling the smart speakers is poor.

Disclosure of Invention

The embodiment of the invention provides a sound box control method, a sound box control device and sound box control equipment, and the accuracy of controlling a sound box is improved.

In a first aspect, an embodiment of the present invention provides a sound box control method, including:

after at least two sound boxes detect preset voice information, acquiring sound reception energy information of the at least two sound boxes, wherein the sound reception energy information is used for indicating the sound size of the sound boxes receiving the preset voice information;

and determining a target sound box in the at least two sound boxes according to the sound receiving energy information of the at least two sound boxes, and awakening the target sound box.

In a possible implementation manner, at least two microphones are arranged in the sound box, and the received sound energy information includes a received sound energy value of each microphone receiving the preset voice information; the determining a target loudspeaker box in the at least two loudspeaker boxes according to the sound reception energy information of the at least two loudspeaker boxes comprises:

determining the average value of the sound reception energy of each sound box according to the sound reception energy values of the preset voice information received by at least two microphones in each sound box;

and determining a target sound box in the at least two sound boxes according to the sound receiving energy average value of each sound box.

In a possible embodiment, the determining a target loudspeaker in the at least two loudspeakers according to the average value of the sound pickup energy of each loudspeaker comprises:

determining at least one first loudspeaker box in the at least two loudspeaker boxes according to the average value of the sound receiving energy of each loudspeaker box, wherein the average value of the sound receiving energy of the first loudspeaker box in the at least two loudspeaker boxes is the largest;

determining the target loudspeaker box in the at least one first loudspeaker box.

In a possible implementation, determining the target loudspeaker among the at least one first loudspeaker comprises:

when the number of the at least one first sound box is 1, determining the at least one first sound box as the target sound box;

when the number of the at least one first sound box is larger than 1, acquiring a maximum sound receiving energy value corresponding to each first sound box, determining at least one second sound box in the at least one first sound box according to the maximum sound receiving energy value corresponding to each first sound box, and determining the target sound box in the at least one second sound box; the maximum sound receiving energy value is the maximum value of the sound receiving energy values of at least two microphones in the first loudspeaker box, and the maximum sound receiving energy value of the second loudspeaker box in the at least one first loudspeaker box is the maximum.

In a possible implementation, determining the target loudspeaker among the at least one second loudspeaker comprises:

when the number of the at least one second sound box is 1, determining the at least one second sound box as the target sound box;

and when the number of the at least one second sound box is more than 1, acquiring a sound reception energy difference value between the microphone with the largest sound reception energy value in each second sound box and the adjacent microphone, and determining the target sound box according to the sound reception energy difference value corresponding to each second sound box.

In a possible implementation manner, the determining the target loudspeaker according to the sound energy difference corresponding to each second loudspeaker includes:

determining at least one third sound box with the minimum sound receiving energy difference in the at least one second sound box;

when the number of the at least one third sound box is 1, determining the at least one third sound box as the target sound box;

and when the number of the at least one third sound box is larger than 1, determining any one sound box in the at least one third sound box as the target sound box.

In one possible embodiment, the method further comprises:

after the preset voice information is detected by the at least two sound boxes, acquiring the preset voice information;

acquiring a preset voiceprint corresponding to each sound box and a voiceprint corresponding to the preset voice information;

and determining a target sound box in the at least two sound boxes according to the preset voiceprint corresponding to each sound box and the voiceprint corresponding to the preset voice information, and awakening the target sound box, wherein the voiceprint of the target sound box is matched with the voiceprint corresponding to the preset voice information.

In a possible embodiment, the at least two loudspeakers are located on the same local area network.

In a possible embodiment, the at least two speakers are smart speakers.

In a second aspect, an embodiment of the present invention provides a sound box control device, including: a first obtaining module, a determining module and a waking module, wherein,

the first obtaining module is used for acquiring the sound reception energy information of at least two sound boxes after the at least two sound boxes detect preset voice information, wherein the sound reception energy information indicates the volume at which each sound box received the preset voice information;

the determining module is used for determining a target sound box in the at least two sound boxes according to the sound receiving energy information of the at least two sound boxes;

the awakening module is used for awakening the target sound box.

In a possible implementation manner, at least two microphones are arranged in the sound box, and the received sound energy information includes a received sound energy value of each microphone receiving the preset voice information; the determining module is specifically configured to:

In a possible implementation, the determining module is specifically configured to:

In a possible implementation, the determining module is specifically configured to:

In a possible embodiment, the apparatus further comprises a second obtaining module, wherein,

the second obtaining module is used for obtaining the preset voice information after the preset voice information is detected by the at least two sound boxes, and obtaining a preset voiceprint corresponding to each sound box and a voiceprint corresponding to the preset voice information;

the determining module is further configured to determine a target sound box in the at least two sound boxes according to a preset voiceprint corresponding to each sound box and a voiceprint corresponding to the preset voice information, wherein the voiceprint of the target sound box is matched with the voiceprint corresponding to the preset voice information;

the awakening module is also used for awakening the target sound box.

In a possible embodiment, the at least two speakers are smart speakers.

In a third aspect, an embodiment of the present invention provides a sound box control device, including: at least one processor and memory;

the memory stores computer-executable instructions;

the at least one processor executes computer-executable instructions stored in the memory, so that the at least one processor executes the sound box control method according to any one of the first aspect.

In a fourth aspect, an embodiment of the present invention provides a computer-readable storage medium, where computer-executable instructions are stored in the computer-readable storage medium, and when a processor executes the computer-executable instructions, the sound box control method according to any one of the first aspect is implemented.

According to the sound box control method, device and equipment provided by the embodiments of the invention, after at least two sound boxes detect the preset voice information, the server can obtain the sound receiving energy information of the at least two sound boxes, determine the target sound box among the at least two sound boxes according to that information, and awaken the target sound box. In the above process, even if a plurality of sound boxes detect the preset voice information of the user at the same time, the server can still select the target sound box with the best sound receiving effect from the plurality of sound boxes and awaken it. When the user's voice is too loud, unnecessary awakening of too many sound boxes is avoided, the probability that sound boxes are awakened by mistake is reduced, and the accuracy of controlling the sound boxes is improved.

Drawings

In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings needed to be used in the description of the embodiments or the prior art will be briefly introduced below, and it is obvious that the drawings in the following description are some embodiments of the present invention, and for those skilled in the art, other drawings can be obtained according to these drawings without creative efforts.

Fig. 1 is a schematic view of an application scenario of a sound box control method according to an embodiment of the present invention;

Fig. 2 is a schematic flow chart of a sound box control method according to an embodiment of the present invention;

Fig. 3A is a schematic view of a sound box according to an embodiment of the present invention;

Fig. 3B is a schematic view of a sound box according to an embodiment of the present invention;

Fig. 4 is a schematic flow chart of another sound box control method according to an embodiment of the present invention;

Fig. 5 is a schematic flow chart of a sound box control method according to an embodiment of the present invention;

Fig. 6 is a schematic flow chart of another sound box control method according to an embodiment of the present invention;

Fig. 7 is a schematic view of a sound box according to an embodiment of the present invention;

Fig. 8 is a schematic structural diagram of a sound box control device according to an embodiment of the present invention;

Fig. 9 is a schematic structural diagram of another sound box control device according to an embodiment of the present invention;

Fig. 10 is a schematic diagram of a hardware structure of the sound box control device according to the embodiment of the present invention.

Detailed Description

In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, but not all, embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.

Fig. 1 is a schematic view of an application scenario of the sound box control method according to the embodiment of the present invention. Referring to fig. 1, the scenario includes a plurality of sound boxes (e.g., sound box 1, sound box 2, sound box 3, and sound box 4) and a server; the plurality of sound boxes are located in the same local area network, and each sound box can communicate with the server. The states of a sound box include a dormant state and an awake state. When a sound box is in the dormant state, it can perform voice monitoring; after detecting the preset voice information, the sound box can acquire sound reception energy information for the received preset voice information, which indicates the volume at which the sound box received the preset voice information. The sound boxes send the sound reception energy information to the server; the server selects one sound box from the plurality of sound boxes according to their sound reception energy information and awakens the selected sound box. After a sound box is awakened, it can play audio.

In the application, even if the plurality of sound boxes monitor the preset voice information of the user at the same time, the server can still select one sound box with the best voice recognition effect from the plurality of sound boxes and awaken the selected sound box. When the sound of the user is too large, unnecessary awakening of too many sound boxes is avoided, the probability that the sound boxes are awakened by mistake is reduced, and the accuracy of controlling the intelligent sound boxes is improved.

The technical means shown in the present application will be described in detail below with reference to specific examples. It should be noted that the following embodiments may be combined with each other, and the description of the same or similar contents in different embodiments is not repeated.

Fig. 2 is a schematic flow chart of a sound box control method according to an embodiment of the present invention. Referring to fig. 2, the method may include:

S201, after the preset voice information is detected by the at least two sound boxes, the sound receiving energy information of the at least two sound boxes is obtained.

The executing entity of the embodiment of the invention may be a server, or may be a sound box control device arranged in the server. Optionally, the sound box control device may be implemented by software, or by a combination of software and hardware.

Optionally, the sound box according to the embodiment of the present invention may be an intelligent sound box, that is, the sound box according to the embodiment of the present invention at least has functions of voice monitoring, voice recognition, processing voice information, communicating with a server, and the like.

Optionally, the at least two speakers are located in the same lan, that is, the at least two speakers may access the same lan and communicate with the server through the same lan.

Optionally, the at least two speakers are located at different positions in the lan. For example, at least two speakers may be placed at different locations in a home.

The preset voice information is a wake-up word for waking up the sound box. For example, the preset voice information may be "hi, small sound box", "hi, sound box", "Xiaodu", or the like.

The wake-up word for each speaker in the same lan may be the same. That is, all speakers in the lan can be woken up by the same wake-up word.

In practical applications, the at least two sound boxes can perform voice monitoring; after the at least two sound boxes detect the preset voice information, each of them can acquire its own sound reception energy values and send them to the server.

Optionally, when the at least two sound boxes send their sound reception energy values to the server, the sound boxes may also send an identifier of the local area network in which they are located, so that the server can identify all sound boxes in the same local area network according to the identifier of the local area network.

The sound receiving energy information is used for indicating the sound size of the sound box receiving the preset voice information.

Optionally, a plurality of microphones may be disposed in one sound box, each microphone may receive voice information, and correspondingly, the sound reception energy information of one sound box may include the sound reception energy value of each microphone in the sound box. The larger the sound receiving energy value of the microphone is, the larger the sound received by the microphone is.
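As a minimal sketch of the report a sound box might send in S201, and of how the server could group reports by local area network, consider the following. The class name `EnergyReport` and the field names `lan_id`, `speaker_id` and `mic_energies` are assumptions made for illustration and do not appear in the original text; the energy values are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class EnergyReport:
    """Hypothetical report sent by one sound box after it detects the wake-up word."""
    lan_id: str          # identifier of the local area network (assumed field name)
    speaker_id: str      # identifier of the reporting sound box (assumed field name)
    mic_energies: tuple  # one sound reception energy value per microphone


def group_by_lan(reports):
    """Group reports so that only sound boxes in the same LAN are compared."""
    groups = defaultdict(list)
    for report in reports:
        groups[report.lan_id].append(report)
    return dict(groups)


# Hypothetical reports from two different local area networks.
reports = [
    EnergyReport("lan-home", "speaker-1", (0.42, 0.40, 0.39)),
    EnergyReport("lan-home", "speaker-2", (0.61, 0.58, 0.60)),
    EnergyReport("lan-lab", "speaker-9", (0.30, 0.31, 0.29)),
]
grouped = group_by_lan(reports)
print(sorted(grouped))           # LAN identifiers seen by the server
print(len(grouped["lan-home"]))  # number of reports from the home LAN
```

Grouping first keeps the later comparison restricted to sound boxes that could plausibly have heard the same utterance.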

S202, determining a target sound box in the at least two sound boxes according to the sound receiving energy information of the at least two sound boxes.

Optionally, the sound box with the best sound receiving effect may be determined among the at least two sound boxes according to the sound receiving energy information of the at least two sound boxes, and that sound box may be determined as the target sound box. The sound box with the best sound receiving effect is usually the sound box closest to the user, and it also gives the best voice playing effect.

It should be noted that, in the embodiment shown in fig. 4, a process of determining a target sound box is described, and details are not described here.

S203, awakening the target sound box.

Optionally, after the server determines the target sound box, the server may send a wake-up instruction to the target sound box, so that the target sound box switches to the awake state according to the wake-up instruction.

It should be noted that, when only one sound box detects the preset voice information, the server only obtains the reception energy information of one sound box, and correspondingly, the server determines the one sound box as the target sound box and wakes up the one sound box.
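The flow of S201 to S203, including the case in which only one sound box reports, can be sketched roughly as follows. The function names, the `send_wake` callback standing in for the wake-up instruction, and the placeholder selection rule are all assumptions for illustration; the selection actually described in this document is the staged procedure of fig. 4.

```python
def handle_wake_reports(reports, select_target, send_wake):
    """Pick one sound box from the reports and send it a wake-up instruction.

    reports:       dict mapping sound box id -> list of microphone energy values
    select_target: callable choosing one id when several sound boxes reported
    send_wake:     callable delivering the wake-up instruction (hypothetical transport)
    """
    if not reports:
        return None                      # no sound box detected the wake-up word
    if len(reports) == 1:
        target = next(iter(reports))     # a single reporting sound box is the target
    else:
        target = select_target(reports)  # S202
    send_wake(target)                    # S203
    return target


def loudest_on_average(reports):
    """Placeholder rule: highest mean energy (ties resolved by dict order)."""
    return max(reports, key=lambda box: sum(reports[box]) / len(reports[box]))


# Hypothetical energy values; a list stands in for the network transport.
woken = []
handle_wake_reports({"box-1": [0.2, 0.3], "box-2": [0.6, 0.5]}, loudest_on_average, woken.append)
handle_wake_reports({"box-3": [0.1, 0.1]}, loudest_on_average, woken.append)
print(woken)
```

Passing the selection rule and the transport in as callables keeps the sketch independent of how the server actually communicates with the sound boxes, which the text does not specify.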

According to the sound box control method provided by the embodiment of the invention, after at least two sound boxes detect the preset voice information, the server can acquire the sound receiving energy information of the at least two sound boxes, determine the target sound box among the at least two sound boxes according to that information, and awaken the target sound box. In the above process, even if a plurality of sound boxes detect the preset voice information of the user at the same time, the server can still select the target sound box with the best sound receiving effect from the plurality of sound boxes and awaken it. When the user's voice is too loud, unnecessary awakening of too many sound boxes is avoided, the probability that sound boxes are awakened by mistake is reduced, and the accuracy of controlling the sound boxes is improved.

On the basis of the embodiment shown in fig. 2, optionally, at least two microphones are disposed in the sound box, and each microphone can receive voice information. Optionally, the sound box may be a stereo sound box. When the sound box is a cylinder, microphones at neighboring positions may be referred to as adjacent microphones. When the sound box is a cube, a plurality of microphones may be disposed on different side surfaces of the sound box; microphones disposed on the same side surface and next to each other may be referred to as adjacent microphones, and microphones disposed on different side surfaces and next to each other may also be referred to as adjacent microphones. Next, the microphones in the sound box will be described with reference to figs. 3A and 3B.

Fig. 3A is a schematic view of a sound box according to an embodiment of the present invention. Referring to fig. 3A, the sound box is a cylinder, and a microphone A, a microphone B, a microphone C, and a microphone D are disposed on a side surface of the sound box, so that the microphone A is adjacent to the microphone B, the microphone B is adjacent to the microphone C, and the microphone C is adjacent to the microphone D.

Fig. 3B is a schematic view of a sound box according to an embodiment of the present invention. Referring to fig. 3B, the sound box is a cube, a microphone E, a microphone F and a microphone G are disposed on one side of the sound box, and a microphone H and a microphone I are disposed on another side of the sound box, so that the microphone E is adjacent to the microphone F, the microphone F is adjacent to the microphone G, and the microphone H is adjacent to the microphone I. Optionally, the microphone G and the microphone H may also be regarded as adjacent microphones.

Fig. 4 is a schematic flow chart of another sound box control method according to an embodiment of the present invention. Referring to fig. 4, the method may include:

S401, determining the average value of the sound receiving energy of each sound box according to the sound receiving energy values of the preset voice information received by at least two microphones in each sound box.

For any sound box, the sound reception energy information of the sound box comprises the sound reception energy value at which each microphone in the sound box received the preset voice information.

For any sound box, the average value of the sound receiving energy values of the microphones in the sound box can be determined as the average value of the sound receiving energy of the sound box.

For example, suppose that 3 microphones, respectively denoted as microphone 1, microphone 2 and microphone 3, are disposed in a sound box, and suppose that the value of the sound receiving energy of the microphone 1 receiving the preset voice message is a, the value of the sound receiving energy of the microphone 2 receiving the preset voice message is b, the value of the sound receiving energy of the microphone 3 receiving the preset voice message is c, and accordingly, the average value of the sound receiving energy of the sound box is (a + b + c)/3.
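The averaging in S401 can be sketched in a few lines. The function name `average_energy` and the numeric values are hypothetical; the values merely stand in for a, b and c from the example above.

```python
from statistics import fmean


def average_energy(mic_energies):
    """Return the mean of the per-microphone sound reception energy values."""
    if not mic_energies:
        raise ValueError("a sound box must report at least one microphone value")
    return fmean(mic_energies)


# Hypothetical values standing in for a, b and c.
a, b, c = 3.0, 4.5, 6.0
print(average_energy([a, b, c]))  # equals (a + b + c) / 3
```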

S402, determining at least one first sound box in at least two sound boxes according to the sound receiving energy average value of each sound box.

Among the at least two sound boxes, the first sound box has the largest sound receiving energy average value.

Optionally, the number of the first sound boxes may be 1, and may also be multiple.

S403, judging whether the number of the first sound boxes is greater than 1.

If yes, go to S405.

If not, go to S404.

S404, determining the first sound box as the target sound box.

When the number of the first sound boxes is not more than 1, the number of the first sound boxes is 1. Since the number of the first speakers is 1, the one first speaker can be determined as the target speaker.

S405, obtaining the maximum sound receiving energy value corresponding to each first sound box.

Wherein, the maximum sound receiving energy value is the maximum value of the sound receiving energy values of at least two microphones in the first sound box.

For example, assuming that 3 microphones, respectively denoted as microphone 1, microphone 2, and microphone 3, are disposed in the first sound box, and assuming that the sound-collected energy value of the microphone 1 is the largest among the microphones 1, 2, and 3, the sound-collected energy value of the microphone 1 is determined as the maximum sound-collected energy value corresponding to the first sound box.

S406, determining at least one second sound box in the at least one first sound box according to the maximum sound receiving energy value corresponding to each first sound box.

Among the at least one first sound box, the second sound box has the largest maximum sound receiving energy value.

Optionally, the number of the second sound boxes may be 1, and may also be multiple.

S407, judging whether the number of the second sound boxes is greater than 1.

If yes, S409 is performed.

If not, go to step S408.

S408, determining the second sound box as the target sound box.

When the number of the second sound boxes is not greater than 1, the number of the second sound boxes is 1. Since the number of the second sound boxes is 1, that second sound box can be determined as the target sound box.

S409, acquiring a sound receiving energy difference value between the microphone with the maximum sound receiving energy value in each second sound box and the microphone adjacent to that microphone.

For example, assuming that the second sound box is as shown in fig. 3B, and the sound pickup energy of the microphone E in the second sound box is the largest, and the microphone adjacent to the microphone E is the microphone F, the sound pickup energy difference corresponding to the second sound box is the difference of the sound pickup energy values between the microphone E and the microphone F.

For another example, assuming that the second sound box is as shown in fig. 3B and the sound pickup energy of the microphone F in the second sound box is the largest, the microphones adjacent to the microphone F are the microphone E and the microphone G. Assuming that the difference between the sound pickup energy values of the microphone F and the microphone E is difference 1, and the difference between the sound pickup energy values of the microphone F and the microphone G is difference 2, the sound pickup energy difference corresponding to the second sound box is the smaller of difference 1 and difference 2.
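The computation in S409 can be illustrated with a short sketch. It assumes that the adjacency of the sound box of fig. 3B is E-F, F-G and H-I, as in the two examples above, and it uses hypothetical energy values. The names `energy_difference` and `FIG_3B_ADJACENCY` are illustrative, and when several microphones tie for the loudest, the sketch simply takes the first one.

```python
def energy_difference(mic_energies, adjacency):
    """Smallest energy gap between the loudest microphone and its neighbours.

    mic_energies: dict mapping microphone name -> sound reception energy value
    adjacency:    dict mapping microphone name -> list of adjacent microphone names
    """
    loudest = max(mic_energies, key=mic_energies.get)
    neighbours = adjacency.get(loudest, [])
    if not neighbours:
        raise ValueError(f"microphone {loudest!r} has no adjacent microphone")
    return min(mic_energies[loudest] - mic_energies[n] for n in neighbours)


# Assumed adjacency for the cube-shaped sound box of fig. 3B (E-F, F-G, H-I).
FIG_3B_ADJACENCY = {
    "E": ["F"], "F": ["E", "G"], "G": ["F"], "H": ["I"], "I": ["H"],
}

# Hypothetical energy values in which microphone F is the loudest.
energies = {"E": 0.50, "F": 0.75, "G": 0.25, "H": 0.125, "I": 0.125}
print(energy_difference(energies, FIG_3B_ADJACENCY))  # smaller of F-E and F-G
```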

S410, determining at least one third sound box with the minimum sound receiving energy difference in the at least one second sound box.

Optionally, the number of the third sound boxes may be 1, and may also be greater than 1.

S411, judging whether the number of the third sound boxes is larger than 1.

If yes, S413 is performed.

If not, go to S412.

S412, determining the third sound box as the target sound box.

When the number of the third sound boxes is not greater than 1, the number of the third sound boxes is 1. Since the number of the third sound boxes is 1, that third sound box can be determined as the target sound box.

S413, determining any sound box in the at least one third sound box as the target sound box.

Because the average value of the sound receiving energy of each sound box in the at least one third sound box is the same, the maximum sound receiving energy value is the same, and the difference value of the sound receiving energy is the same, one sound box can be arbitrarily selected as the target sound box in the at least one third sound box.

In the embodiment shown in fig. 4, at least one first sound box with the largest average value of the sound receiving energies is first determined among the at least two sound boxes, and if the number of the first sound boxes is 1, the first sound box is determined as the target sound box. And if the number of the first sound boxes is greater than 1, determining at least one second sound box with the largest maximum sound receiving energy value in the at least one first sound box, and if the number of the second sound boxes is 1, determining the second sound box as a target sound box. And if the number of the second sound boxes is greater than 1, determining at least one third sound box with the minimum sound receiving energy difference in the at least one second sound box, if the number of the third sound boxes is 1, determining the third sound box as a target sound box, and if the number of the third sound boxes is greater than 1, arbitrarily selecting one sound box in the at least one third sound box as the target sound box. In the process, the determined target sound box is the sound box with the best sound receiving effect.
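The staged selection of fig. 4 can be sketched as follows. This is only an illustration under stated assumptions: each sound box is represented by a hypothetical dict with `id`, `energies` and `adjacency` keys, ties are detected by exact equality of the computed values, and the "any one" choice of S413 is taken to be the first remaining candidate.

```python
from statistics import fmean


def keep_best(candidates, key, best=max):
    """Keep every candidate whose key equals the best key (ties are preserved)."""
    target = best(key(c) for c in candidates)
    return [c for c in candidates if key(c) == target]


def energy_difference(box):
    """Smallest gap between the loudest microphone and its adjacent microphones."""
    energies, adjacency = box["energies"], box["adjacency"]
    loudest = max(energies, key=energies.get)
    return min(energies[loudest] - energies[n] for n in adjacency[loudest])


def select_target(boxes):
    """Apply the three stages of fig. 4 and return one sound box."""
    first = keep_best(boxes, lambda b: fmean(b["energies"].values()))  # S402
    if len(first) == 1:
        return first[0]                                                # S404
    second = keep_best(first, lambda b: max(b["energies"].values()))   # S406
    if len(second) == 1:
        return second[0]                                               # S408
    third = keep_best(second, energy_difference, best=min)             # S410
    return third[0]                                    # S412, or any one (S413)


# Hypothetical sound boxes with three microphones in a row: A-B and B-C adjacent.
row = {"A": ["B"], "B": ["A", "C"], "C": ["B"]}
boxes = [
    {"id": "box-1", "energies": {"A": 2.0, "B": 2.0, "C": 2.0}, "adjacency": row},
    {"id": "box-2", "energies": {"A": 4.0, "B": 2.0, "C": 3.0}, "adjacency": row},
    {"id": "box-3", "energies": {"A": 4.0, "B": 3.0, "C": 2.0}, "adjacency": row},
]
print(select_target(boxes)["id"])
```

In this hypothetical data, box-2 and box-3 tie on both the average and the maximum, so the decision falls to the energy difference stage. A real system would probably compare measured values with a tolerance rather than exact equality; the text does not say how ties are detected.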

Next, the sound box control method shown in the above method embodiment is described in detail by a specific example with reference to fig. 5.

Fig. 5 is a schematic flow chart of a sound box control method according to an embodiment of the present invention. Referring to fig. 5, there are 6 sound boxes in the local area network, denoted as sound box 1, sound box 2, sound box 3, sound box 4, sound box 5, and sound box 6. Assume that the wake-up word of each of the 6 sound boxes is "hi, min".

In practical applications, when the user needs to wake up the sound box closest to him (or the sound box with the best sound pickup), the user may say "hi, min". Assuming that, after the user says "hi, min", the sound box 2, the sound box 4, the sound box 5, and the sound box 6 close to the user detect the voice information, these four sound boxes each send their own sound reception energy information to the server. The sound reception energy information of each sound box comprises the sound reception energy values of the microphones arranged in that sound box.

The server first determines, according to the sound receiving energy information of the sound boxes 2, 4, 5 and 6, the sound boxes with the largest average value of the sound receiving energy; assume that these are the sound boxes 4, 5 and 6. Because more than one sound box has the largest average value, the server then determines, among the sound boxes 4, 5 and 6, the sound boxes with the largest maximum sound receiving energy value; assume that these are the sound boxes 4 and 6. Because more than one sound box has the largest maximum sound receiving energy value, the server determines, among the sound boxes 4 and 6, the sound box with the smallest sound receiving energy difference; assuming that this is the sound box 6, the server determines the sound box 6 as the target sound box and awakens it.
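The scenario above can be reproduced with a compact sketch that ranks sound boxes by a single lexicographic key: higher average first, then higher maximum, then smaller energy difference. This key ordering yields a member of the same final candidate set as the staged procedure; it is offered as an illustrative alternative, not as the method stated in the text. The energy values are hypothetical and chosen only to produce the ties described, and the sketch assumes each sound box lists its microphones in ring order so that each microphone is adjacent to the entries before and after it.

```python
from statistics import fmean


def ranking_key(mic_energies):
    """Lexicographic key: higher average, then higher peak, then smaller gap.

    mic_energies lists one sound box's microphones in ring order (assumption),
    so each microphone is adjacent to the entries before and after it.
    """
    peak = max(mic_energies)
    i = mic_energies.index(peak)  # first loudest microphone if several tie
    neighbours = (mic_energies[i - 1], mic_energies[(i + 1) % len(mic_energies)])
    gap = min(peak - n for n in neighbours)
    return (fmean(mic_energies), peak, -gap)


# Hypothetical energy values chosen to reproduce the ties described above.
reports = {
    "sound box 2": [2.0, 2.0, 2.0, 2.0],  # average 2.0, eliminated first
    "sound box 4": [5.0, 1.0, 3.0, 3.0],  # average 3.0, peak 5.0, gap 2.0
    "sound box 5": [4.0, 3.0, 2.0, 3.0],  # average 3.0, peak 4.0, eliminated second
    "sound box 6": [5.0, 4.0, 1.0, 2.0],  # average 3.0, peak 5.0, gap 1.0
}
target = max(reports, key=lambda name: ranking_key(reports[name]))
print(target)
```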

In the above process, after the user speaks the wake-up word "hi, min" of the sound boxes, even if a plurality of sound boxes detect the wake-up word, the server determines, among the plurality of sound boxes, the sound box that is closest to the user and has the best sound receiving effect, and awakens that sound box. In this way, when the user's voice is too loud, unnecessary awakening of too many sound boxes is avoided, the probability that sound boxes are awakened by mistake is reduced, and the accuracy of controlling the smart speakers is improved.

On the basis of any one of the above embodiments, optionally, a corresponding preset voiceprint may be set for each sound box in advance, so that only a voice matching that voiceprint can wake up the sound box. Next, a sound box control method in this case will be described with reference to the embodiment shown in fig. 6.

Fig. 6 is a schematic flow chart of another sound box control method according to an embodiment of the present invention. Referring to fig. 6, the method may include:

S601, after the at least two sound boxes detect the preset voice information, acquiring the preset voice information.

Optionally, after the at least two speakers detect the preset voice information, the detected preset voice information is sent to the server.

Optionally, the preset voice messages sent by the at least two speakers to the server are the same.

Optionally, when the at least two speakers send the preset voice information to the server, the identifier of the local area network where the speakers are located may also be sent to the server, so that the server may identify all speakers in the same local area network according to the identifier of the local area network.
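As a rough sketch of how a server might use the local area network identifier, the following groups incoming reports by that identifier. The field names `lan_id` and `speaker_id` and the sample values are hypothetical; the embodiment does not specify a message format.

```python
from collections import defaultdict

def group_by_lan(reports):
    """Group speaker ids by the (hypothetical) lan_id field of each report."""
    groups = defaultdict(list)
    for report in reports:
        groups[report["lan_id"]].append(report["speaker_id"])
    return dict(groups)

# Hypothetical reports received by the server for one wake-up word.
incoming = [
    {"lan_id": "lan-A", "speaker_id": "box1"},
    {"lan_id": "lan-B", "speaker_id": "box9"},
    {"lan_id": "lan-A", "speaker_id": "box2"},
]
groups = group_by_lan(incoming)
```

Only the speakers that fall into the same group would then be compared with each other when choosing the target.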

S602, acquiring a preset voiceprint corresponding to each sound box and a voiceprint corresponding to the preset voice information.

Optionally, the preset voiceprint corresponding to each speaker may be pre-stored in the server.

After receiving the preset voice information, the server may perform recognition processing on the voice information to recognize and obtain a voiceprint corresponding to the preset voice information.

S603, determining a target sound box in at least two sound boxes according to the preset voiceprint corresponding to each sound box and the voiceprint corresponding to the preset voice information.

The preset voiceprint of the target sound box matches the voiceprint corresponding to the preset voice information.

Optionally, the sound boxes and the preset voiceprints may be in a one-to-one correspondence. In this case, the server can determine exactly one sound box among the at least two sound boxes according to the voiceprint corresponding to the preset voice information.

Optionally, a many-to-one relationship may also be set between the sound boxes and the preset voiceprint, that is, a voice corresponding to one preset voiceprint may wake up a plurality of sound boxes. Accordingly, if the server finds that a plurality of sound boxes match the voiceprint corresponding to the preset voice information, any one of the matching sound boxes may be determined as the target sound box, or the target sound box may be determined among the matching sound boxes by the method shown in the embodiments of fig. 2 to 5.
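The one-to-one and many-to-one cases of S603 can be illustrated with the sketch below. It treats a voiceprint as an opaque label compared by equality, which is a simplifying assumption; the embodiment does not specify how voiceprints are compared. The function name, the `tie_breaker` parameter, and the `None` result when nothing matches are likewise hypothetical.

```python
def select_by_voiceprint(preset_voiceprints, detected_voiceprint, tie_breaker=None):
    """Pick the target sound box whose preset voiceprint matches.

    preset_voiceprints maps a sound box id to its preset voiceprint label.
    tie_breaker, if given, chooses among several matching sound boxes
    (for example, an energy-based selection); otherwise any match is used.
    """
    matches = [box for box, voiceprint in preset_voiceprints.items()
               if voiceprint == detected_voiceprint]
    if not matches:
        return None  # assumption: no sound box is awakened
    if len(matches) == 1 or tie_breaker is None:
        return matches[0]
    return tie_breaker(matches)

# One-to-one: each sound box has its own preset voiceprint.
one_to_one = {"box1": "user1", "box2": "user2"}
target_single = select_by_voiceprint(one_to_one, "user2")

# Many-to-one: two sound boxes share a preset voiceprint; a stand-in
# tie-breaker (here simply the last match) chooses between them.
many_to_one = {"box1": "user1", "box2": "user1", "box3": "user2"}
target_shared = select_by_voiceprint(many_to_one, "user1",
                                     tie_breaker=lambda boxes: boxes[-1])
```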

S604, awakening the target sound box.

It should be noted that the execution process of S604 may refer to the execution process of S203, and is not described herein again.

Next, the method shown in the embodiment of fig. 6 will be described by way of specific examples with reference to fig. 7.

Fig. 7 is a schematic view of a sound box according to an embodiment of the present invention. Referring to fig. 7, the local area network is provided with a sound box 1, a sound box 2, a sound box 3, and a sound box 4, and it is assumed that the voiceprint of the user 1 is preset to correspond to the sound box 1, the voiceprint of the user 2 corresponds to the sound box 2, the voiceprint of the user 3 corresponds to the sound box 3, and the voiceprint of the user 4 corresponds to the sound box 4. That is, only the wake-up word spoken by the user 1 can wake up the sound box 1, only the wake-up word spoken by the user 2 can wake up the sound box 2, only the wake-up word spoken by the user 3 can wake up the sound box 3, and only the wake-up word spoken by the user 4 can wake up the sound box 4.

In an actual application process, after the user 1 speaks the wake-up word, if the sound box 1, the sound box 2, the sound box 3, and the sound box 4 all detect the wake-up word, each of them sends the wake-up word to the server. The server determines that the voiceprint of the wake-up word corresponds to the sound box 1, and wakes up the sound box 1.

In the embodiments shown in fig. 6 to 7, the correspondence between the sound boxes and the voiceprints may be preset, so that even if a plurality of sound boxes detect the preset voice information of the user at the same time, the server can still select, from the plurality of sound boxes, the one sound box whose preset voiceprint matches the voiceprint of the preset voice information, and wake up that sound box. This avoids waking an unnecessary number of sound boxes when the user's voice is too loud, reduces the probability that a sound box is awakened by mistake, and improves the accuracy of controlling the intelligent sound boxes.

Fig. 8 is a schematic structural diagram of a sound box control device according to an embodiment of the present invention. Referring to fig. 8, the speaker control device 10 includes: a first acquisition module 11, a determination module 12 and a wake-up module 13, wherein,

the first obtaining module 11 is configured to obtain the sound reception energy information of at least two sound boxes after the at least two sound boxes detect a preset voice message, where the sound reception energy information is used to indicate a sound size of the sound boxes receiving the preset voice message;

the determining module 12 is configured to determine a target loudspeaker box in the at least two loudspeaker boxes according to the sound reception energy information of the at least two loudspeaker boxes;

the awakening module 13 is configured to awaken the target sound box.

It should be noted that the sound box control device provided in the embodiment of the present invention may implement the technical solution shown in the above method embodiment, and the implementation principle and the beneficial effect are similar, which are not described again here.
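The division into a first obtaining module, a determining module, and an awakening module might be wired together as in the following sketch. The class name, the callable-based design, and the sample data are hypothetical, and the determining step shown here uses only the average sound reception energy for brevity.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Energies = Dict[str, List[float]]  # sound box id -> per-microphone energies

@dataclass
class SpeakerControlDevice:
    obtain: Callable[[], Energies]         # first obtaining module
    determine: Callable[[Energies], str]   # determining module
    wake: Callable[[str], None]            # awakening module

    def handle_wake_word(self) -> str:
        """Run the three modules in order and return the awakened sound box."""
        energies = self.obtain()
        target = self.determine(energies)
        self.wake(target)
        return target

woken = []  # records which sound boxes were awakened
device = SpeakerControlDevice(
    obtain=lambda: {"box_a": [3.0, 5.0], "box_b": [6.0, 8.0]},
    determine=lambda e: max(e, key=lambda box: sum(e[box]) / len(e[box])),
    wake=woken.append,
)
device.handle_wake_word()
```

Keeping the modules as separate callables mirrors the logical division described above, so a different determining strategy can be substituted without touching the other two.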

In a possible implementation manner, at least two microphones are arranged in the sound box, and the received sound energy information includes a received sound energy value of each microphone receiving the preset voice information; the determining module 12 is specifically configured to:

In a possible implementation, the determining module 12 is specifically configured to:

In a possible embodiment, the determining module 12 is specifically configured to:

In a possible implementation, the determining module 12 is specifically configured to:

Fig. 9 is a schematic structural diagram of another sound box control device according to an embodiment of the present invention. On the basis of the embodiment shown in fig. 8, please refer to fig. 9, the sound box control apparatus 10 further includes a second obtaining module 14, wherein,

the second obtaining module 14 is configured to, after the at least two sound boxes detect the preset voice information, obtain the preset voice information, and obtain a preset voiceprint corresponding to each sound box and a voiceprint corresponding to the preset voice information;

the determining module 12 is further configured to determine a target sound box in the at least two sound boxes according to a preset voiceprint corresponding to each sound box and a voiceprint corresponding to the preset voice information, where the voiceprint of the target sound box is matched with the voiceprint corresponding to the preset voice information;

the awakening module 13 is further configured to awaken the target sound box.

In a possible embodiment, the at least two speakers are smart speakers.

Fig. 10 is a schematic diagram of a hardware structure of a sound box control device according to an embodiment of the present invention, and as shown in fig. 10, the sound box control device 20 includes: at least one processor 21 and a memory 22. The processor 21 and the memory 22 are connected by a bus 23.

Optionally, the speaker control device 20 may further include a communication component, which may include a receiver and/or a transmitter.

In a specific implementation process, the at least one processor 21 executes the computer-executable instructions stored in the memory 22, so that the at least one processor 21 executes the sound box control method.

For a specific implementation process of the processor 21, reference may be made to the above method embodiments, which implement similar principles and technical effects, and this embodiment is not described herein again.

In the embodiment shown in fig. 10, it should be understood that the processor may be a Central Processing Unit (CPU), another general-purpose processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), or the like. A general-purpose processor may be a microprocessor, or the processor may be any conventional processor or the like. The steps of a method disclosed in connection with the present invention may be performed directly by a hardware processor, or by a combination of hardware and software modules within the processor.

The memory may comprise high-speed RAM and may also include non-volatile memory (NVM), such as at least one disk memory.

The bus may be an Industry Standard Architecture (ISA) bus, a Peripheral Component Interconnect (PCI) bus, an Extended ISA (EISA) bus, or the like. The bus may be divided into an address bus, a data bus, a control bus, etc. For ease of illustration, the buses in the figures of the present application are not limited to only one bus or one type of bus.

The application also provides a computer-readable storage medium, wherein computer-executable instructions are stored in the computer-readable storage medium, and when a processor executes the computer-executable instructions, the sound box control method is realized.

The computer-readable storage medium may be implemented by any type of volatile or non-volatile memory device or combination thereof, such as Static Random Access Memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, magnetic or optical disk. Readable storage media can be any available media that can be accessed by a general purpose or special purpose computer.

An exemplary readable storage medium is coupled to the processor such that the processor can read information from, and write information to, the readable storage medium. Of course, the readable storage medium may also be an integral part of the processor. The processor and the readable storage medium may reside in an Application Specific Integrated Circuit (ASIC). Of course, the processor and the readable storage medium may also reside as discrete components in the apparatus.

The division of the units is only a logical division, and other divisions may be realized in practice, for example, a plurality of units or components may be combined or integrated into another system, or some features may be omitted or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, devices or units, and may be in an electrical, mechanical or other form.

The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.

In addition, functional units in the embodiments of the present invention may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit.

The functions, if implemented in the form of software functional units and sold or used as a stand-alone product, may be stored in a computer readable storage medium. Based on such understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present invention. And the aforementioned storage medium includes: a U-disk, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk or an optical disk, and other various media capable of storing program codes.

Those of ordinary skill in the art will understand that: all or a portion of the steps of implementing the above-described method embodiments may be performed by hardware associated with program instructions. The program may be stored in a computer-readable storage medium. When executed, the program performs steps comprising the method embodiments described above; and the aforementioned storage medium includes: various media that can store program codes, such as ROM, RAM, magnetic or optical disks.

Finally, it should be noted that: the above embodiments are only used to illustrate the technical solution of the present invention, and not to limit the same; while the invention has been described in detail and with reference to the foregoing embodiments, it will be understood by those skilled in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some or all of the technical features may be equivalently replaced; and the modifications or the substitutions do not make the essence of the corresponding technical solutions depart from the scope of the technical solutions of the embodiments of the present invention.

Claims

1. A sound box control method is characterized by comprising the following steps:

according to the sound receiving energy information of the at least two sound boxes, determining a target sound box in the at least two sound boxes, and awakening the target sound box, wherein the target sound box comprises a third sound box, and the third sound box is the sound box with the smallest sound receiving energy difference value between the microphone with the largest sound receiving energy value and the microphone adjacent to the microphone in the at least two second sound boxes; the second sound box is the sound box with the largest maximum sound receiving energy value in the at least one first sound box; the maximum sound reception energy value is the maximum value of the sound reception energy values of at least two microphones in the first sound box; the first sound box is the sound box with the largest average value of the sound receiving energy in at least two sound boxes.

2. The method according to claim 1, wherein the determining a target loudspeaker box among the at least two loudspeaker boxes according to the sound pickup energy information of the at least two loudspeaker boxes comprises:

3. The method of claim 2, wherein the determining a target loudspeaker in the at least two loudspeakers based on the average of the picked energies for each loudspeaker comprises:

determining at least one first sound box in the at least two sound boxes according to the sound receiving energy average value of each sound box;

4. The method of claim 3, wherein determining the target loudspeaker among the at least one first loudspeaker comprises:

when the number of the at least one first sound box is larger than 1, obtaining a maximum sound receiving energy value corresponding to each first sound box, determining at least one second sound box in the at least one first sound box according to the maximum sound receiving energy value corresponding to each first sound box, and determining the target sound box in the at least one second sound box.

5. The method of claim 4, wherein determining the target loudspeaker among the at least one second loudspeaker comprises:

6. The method of claim 5, wherein the determining the target speaker according to the received energy difference corresponding to each second speaker comprises:

7. The method according to any one of claims 1-6, further comprising:

8. The method of any of claims 1-6, wherein the at least two sound boxes are located on the same local area network.

9. The method of any of claims 1-6, wherein the at least two sound boxes are smart sound boxes.

10. A speaker control apparatus, comprising: a first obtaining module, a determining module and a waking module, wherein,

the determining module is configured to determine a target loudspeaker box among the at least two loudspeaker boxes according to the sound reception energy information of the at least two loudspeaker boxes, where the target loudspeaker box includes a third loudspeaker box, and the third loudspeaker box is a loudspeaker box with a minimum sound reception energy difference between a microphone with a maximum sound reception energy value and a microphone adjacent to the microphone in the at least two second loudspeaker boxes; the second sound box is the sound box with the largest maximum sound receiving energy value in the at least one first sound box; the maximum sound reception energy value is the maximum value of the sound reception energy values of at least two microphones in the first sound box; the first sound box is the sound box with the largest average value of the received sound energy in the at least two sound boxes;

the awakening module is used for awakening the target sound box.

11. The apparatus of claim 10, wherein the determining module is specifically configured to:

12. The apparatus of claim 11, wherein the determining module is specifically configured to:

13. The apparatus of claim 12, wherein the determining module is specifically configured to:

14. The apparatus of claim 13, wherein the determining module is specifically configured to:

15. The apparatus of claim 14, wherein the determining module is specifically configured to:

16. The apparatus of any one of claims 10-15, further comprising a second acquisition module, wherein,

the awakening module is also used for awakening the target sound box.

17. The apparatus according to any one of claims 10-15, wherein the at least two speakers are located on the same local area network.

18. The apparatus of any one of claims 10-15, wherein the at least two speakers are smart speakers.

19. A speaker control apparatus, comprising: at least one processor and memory;

the memory stores computer-executable instructions;

the at least one processor executes computer-executable instructions stored by the memory, such that the at least one processor performs the loudspeaker control method of any one of claims 1-9.

20. A computer-readable storage medium, wherein the computer-readable storage medium stores computer-executable instructions, and when a processor executes the computer-executable instructions, the sound box control method according to any one of claims 1 to 9 is implemented.