WO2021180085A1

WO2021180085A1 - Sound pickup method and apparatus and electronic device

Info

Publication number: WO2021180085A1
Application number: PCT/CN2021/079789
Authority: WO
Inventors: 黄磊; 鲍光照; 缪海波
Original assignee: 华为技术有限公司
Priority date: 2020-03-11
Filing date: 2021-03-09
Publication date: 2021-09-16
Also published as: CN113393856A; CN113393856B

Abstract

A sound pickup method and apparatus (600) and an electronic device (800). The method comprises: acquiring the orientation of a user relative to the electronic device (800) (201), wherein the electronic device (800) is provided with at least three microphones; selecting, from preconfigured fixed beams of the electronic device (800), a fixed beam closest to the orientation as a main beam, and selecting at least one fixed beam as a side beam in the order of decreasing distances to the orientation (202); when the microphones receive a sound signal, calculating a main output signal of the sound signal by using a beamforming coefficient of the main beam, and calculating a side output signal of the sound signal by using a beamforming coefficient of the side beam (203); and performing filtering processing on the main output signal by using the side output signal to obtain a target sound signal (204), thereby alleviating the problem of speech distortion and the problem of incomplete elimination of human voice interference.

Description

Sound pickup method, device and electronic equipment

Technical field

This application relates to the technical field of smart terminals, in particular to methods, devices and electronic equipment for picking up sound.

Background

Most terminal electronic devices on the market, such as smartphones and tablets, have voice assistant applications. Their main function is to let the user control the electronic device through voice commands and complete simple, frequently used command operations, such as playing music, querying the weather, setting alarms, making calls, and map navigation, without touching the device.

The above-mentioned human-computer interaction process generally includes: using a microphone of the electronic device to pick up an audio signal; using a front-end enhancement algorithm to estimate a clean voice signal from the audio signal; and using the voice signal for voice wake-up and voice recognition. The front-end enhancement algorithm mainly extracts a clean speech signal through noise cancellation, which includes echo cancellation, interference suppression, and background noise removal. The echo to be eliminated in echo cancellation is generally the sound played by the electronic device's own loudspeaker during human-computer interaction. The interference in interference suppression is generally directional noise, such as the TV sound in a living room environment or the car horn in a car environment. The performance of the front-end enhancement algorithm directly affects the success rate of human-computer interaction, and ultimately affects the user experience.

Take mobile phones as an example. The front-end enhancement algorithm mainly uses the microphones on the mobile phone to eliminate noise. Given the limits on power consumption and computing resources, in most cases only one microphone is used for noise reduction; such an algorithm is called a single-channel noise reduction algorithm. Common single-channel noise reduction algorithms include spectral subtraction, Wiener filtering, and deep learning methods. Single-channel noise reduction algorithms are ineffective against unpredictable non-stationary noise, and they cause serious speech distortion at low signal-to-noise ratio.

To achieve better noise reduction, dual-channel noise reduction algorithms based on two microphones are becoming increasingly common in electronic devices. They are mainly used in scenarios that are not sensitive to power consumption, such as in-vehicle scenarios where the user can charge the electronic device at any time, and they use two microphones located at the top and bottom of the phone for noise suppression. The main idea of the dual-channel noise reduction algorithm is to select one microphone as the main microphone and the other as the auxiliary microphone. First, the time-frequency information of the noise in the main microphone data is determined using a harmonic detection algorithm for the human voice. Then, the noise picked up by the auxiliary microphone is used to filter out the noise in the main microphone signal, which improves voice quality and achieves noise reduction. However, the harmonic detection algorithm cannot distinguish between human voice interference and the target human voice containing the wake-up word, so human voice interference is difficult to eliminate.

Summary of the invention

The embodiment of the present application provides a sound pickup method to alleviate the problem of voice distortion and incomplete elimination of human voice interference.

In the first aspect, an embodiment of the present application provides a sound pickup method, including:

Obtain the user's position relative to the electronic device; the electronic device is equipped with N microphones, and N is an integer greater than or equal to 3. The electronic device may be a mobile terminal (mobile phone), a computer, a PAD, a wearable device, a smart screen, a drone, an intelligent connected vehicle (hereinafter referred to as ICV), a smart/intelligent car, or on-board equipment. Optionally, to achieve a better sound pickup effect, the N microphones can be distributed across different parts of the electronic device; the location of each microphone includes but is not limited to the upper part, lower part, top, or bottom of the electronic device, the upper surface where the screen is located, and/or the back;

Among the preset fixed beams of the electronic device, select the fixed beam closest to the position as the main beam, and select at least one fixed beam as a secondary beam in order of distance from the position, from farthest to nearest; the number of preset fixed beams is greater than or equal to 2;

When the N microphones receive a sound signal, use the beamforming coefficient of the main beam to calculate the main output signal of the sound signal, and use the beamforming coefficient of the secondary beam to calculate the secondary output signal of the sound signal;

Use the secondary output signal to filter the main output signal to obtain the target sound signal.

In this method, the position of the user relative to the electronic device is obtained, and the main beam and the secondary beam are selected from the preset fixed beams of the electronic device according to that position, so that the sound signal of the target sound source can be extracted more accurately from the received sound signal, effectively reducing the human voice interference in the target sound signal. In addition, at least three microphones are used to receive the sound signal; because of the influence of the electronic device housing, noise can be better distinguished, which enhances the effect of the filtering processing and alleviates the problems of voice distortion at low signal-to-noise ratio and incomplete elimination of human voice interference.

In a possible implementation manner, obtaining the position of the user relative to the electronic device includes:

Obtain the image captured by the camera of the electronic device;

If the facial information of the user of the electronic device is recognized from the image, the position of the user relative to the electronic device is obtained according to the position information of the facial information in the image;

If the user's face information is not recognized from the image, the placement position of the electronic device is obtained; according to the placement position, the user's position relative to the electronic device is obtained.

By obtaining the position of the user relative to the electronic device, more accurate target person's speech information can be obtained, which brings more prior information for subsequent signal processing.

In a possible implementation manner, selecting, among the preset fixed beams of the electronic device, the fixed beam closest to the position as the main beam, and selecting at least one fixed beam as a secondary beam in order of distance from the position, from farthest to nearest, includes:

Calculate the ratio $K_k$ of the position to each fixed beam:

$$K_k = \frac{\Delta_k}{\text{beam width of fixed beam } k}$$

where $K_k$ is the ratio of the position to fixed beam k, $\Delta_k$ is the included angle between the position and the direction of fixed beam k, and the denominator is the beam width of fixed beam k; k = 1, 2, ..., M; M is the number of fixed beam groups;

Select the fixed beam corresponding to the smallest ratio as the main beam, and, taking the ratios in descending order starting from the largest, select the fixed beam corresponding to each of at least one ratio as a secondary beam.
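The selection rule above can be illustrated with a minimal Python sketch. It assumes, for simplicity, that the user's position and each beam direction are reduced to a single azimuth angle in degrees, so that the included angle is a circular difference; the function names and the `num_secondary` parameter are hypothetical and do not come from the application.

```python
def angular_difference(a_deg, b_deg):
    """Smallest absolute difference between two angles, in degrees (0..180)."""
    diff = abs(a_deg - b_deg) % 360.0
    return min(diff, 360.0 - diff)


def select_beams(user_deg, beam_dirs_deg, beam_widths_deg, num_secondary=1):
    """Return (main_index, secondary_indices) using K_k = delta_k / width_k."""
    if len(beam_dirs_deg) < 2 or len(beam_dirs_deg) != len(beam_widths_deg):
        raise ValueError("need at least two beams, each with a beam width")
    ratios = [
        angular_difference(user_deg, direction) / width
        for direction, width in zip(beam_dirs_deg, beam_widths_deg)
    ]
    # Beam indices sorted by ascending ratio; the smallest ratio is the main beam.
    order = sorted(range(len(ratios)), key=lambda k: ratios[k])
    main = order[0]
    # Secondary beams: largest ratio first, never reusing the main beam.
    secondary = [k for k in reversed(order) if k != main][:num_secondary]
    return main, secondary
```

For four beams pointing at 0°, 90°, 180°, and 270° with equal widths, a user at 80° would get the 90° beam as the main beam, and the beams farthest from the user would be picked first as secondary beams.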

In a possible implementation manner, before obtaining the user's position relative to the electronic device, the method further includes:

Obtain beamforming coefficients, directions, and beam widths of M groups of fixed beams, where M is an integer greater than or equal to 2.

In a possible implementation manner, obtaining the beamforming coefficient, direction, and beam width of a preset number of fixed beams includes:

Establish a three-dimensional Cartesian coordinate system for electronic equipment;

Obtain the coordinates of N microphones in the coordinate system;

Calculate the steering vector of the target sound source under ideal conditions according to the coordinates of the N microphones;

Obtain the frequency domain response matrix of the housing of the electronic device to the microphone;

Calculate the true steering vector of the target sound source according to the steering vector under ideal conditions and the frequency domain response matrix;

Calculate the beam forming coefficient, direction, and beam width of the preset number of fixed beams according to the real steering vector.
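As a rough illustration of the steering-vector steps above, the following Python sketch computes a far-field, free-field steering vector from microphone coordinates and then applies a housing response. Several points are assumptions not stated in this section: the speed of sound value, the far-field model, treating the housing response as an N x N matrix per frequency that multiplies the ideal vector, and the `matched_weights` beamformer, which is only a simple stand-in because the method for computing the beamforming coefficients is not described here. All function names are hypothetical.

```python
import numpy as np

SPEED_OF_SOUND = 343.0  # m/s, assumed value for air at room temperature


def unit_direction(azimuth_deg, pitch_deg):
    """Unit vector from the origin toward the source; pitch is measured from +Z."""
    az, pt = np.radians(azimuth_deg), np.radians(pitch_deg)
    return np.array([np.sin(pt) * np.cos(az), np.sin(pt) * np.sin(az), np.cos(pt)])


def ideal_steering_vector(mic_xyz, azimuth_deg, pitch_deg, freq_hz):
    """Far-field, free-field steering vector for microphones at mic_xyz (N x 3, metres)."""
    u = unit_direction(azimuth_deg, pitch_deg)
    # Arrival delay of each microphone relative to the coordinate origin, in seconds.
    delays = -(np.asarray(mic_xyz, dtype=float) @ u) / SPEED_OF_SOUND
    return np.exp(-2j * np.pi * freq_hz * delays)


def true_steering_vector(ideal, housing_response):
    """Assumed model: multiply the ideal vector by an N x N housing response matrix."""
    return np.asarray(housing_response) @ ideal


def matched_weights(steering):
    """Stand-in beamformer: w = a / (a^H a), so that w^H a = 1."""
    return steering / np.vdot(steering, steering).real
```

With an identity housing matrix the true steering vector equals the ideal one; a measured housing response would replace it in practice.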

In a second aspect, an embodiment of the present application provides a sound pickup device, including:

The position obtaining unit is used to obtain the position of the user relative to the electronic device; the electronic device is provided with N microphones; N is an integer greater than or equal to 3;

The beam selection unit is configured to select, among the preset fixed beams of the electronic device, the fixed beam closest to the position obtained by the position obtaining unit as the main beam, and to select at least one fixed beam as a secondary beam in order of distance from the position, from farthest to nearest;

The signal calculation unit is configured to, when the N microphones receive a sound signal, calculate the main output signal of the sound signal using the beamforming coefficient of the main beam selected by the beam selection unit, and calculate the secondary output signal of the sound signal using the beamforming coefficient of the secondary beam selected by the beam selection unit;

The filtering unit is configured to filter the main output signal using the secondary output signal calculated by the signal calculation unit to obtain the target sound signal.

In a possible implementation manner, the position obtaining unit includes:

The image acquisition subunit is used to acquire the image captured by the camera of the electronic device;

The position obtaining subunit is used to: if the facial information of the user of the electronic device is recognized from the image obtained by the image acquisition subunit, obtain the position of the user relative to the electronic device according to the position information of the facial information in the image; and if the facial information of the user is not recognized from the image obtained by the image acquisition subunit, obtain the placement position of the electronic device and obtain the position of the user relative to the electronic device according to the placement position.

In a possible implementation manner, the beam selection unit includes:

The ratio calculation subunit is used to calculate the ratio $K_k$ of the position to each fixed beam, where $K_k = \Delta_k / (\text{beam width of fixed beam } k)$;

The beam selection subunit is used to select, among the ratios calculated by the ratio calculation subunit, the fixed beam corresponding to the smallest ratio as the main beam, and, taking the ratios in descending order starting from the largest, to select the fixed beam corresponding to each of at least one ratio as a secondary beam.

In a possible implementation manner, the device further includes:

The beam obtaining unit is used to obtain the beam forming coefficients, directions, and beam widths of M groups of fixed beams, where M is an integer greater than or equal to 2.

In a possible implementation manner, the beam obtaining unit includes:

The coordinate system establishment subunit is used to establish a three-dimensional Cartesian coordinate system for electronic equipment;

The coordinate obtaining subunit is used to obtain the coordinates of the N microphones in the coordinate system;

The ideal steering vector calculation subunit is used to calculate the steering vector of the target sound source under ideal conditions according to the coordinates of the N microphones;

The matrix obtaining subunit is used to obtain the frequency domain response matrix of the housing of the electronic device to the microphone;

The true steering vector calculation subunit is used to calculate the true steering vector of the target sound source according to the steering vector under ideal conditions and the frequency domain response matrix;

The fixed beam calculation subunit is used to calculate the beam forming coefficient, direction, and beam width of the preset number of fixed beams according to the real steering vector.

In a third aspect, an embodiment of the present application provides an electronic device, including:

A display screen; one or more processors; a memory; multiple application programs; and one or more computer programs, where the one or more computer programs are stored in the memory and include instructions that, when executed by the device, cause the device to perform the following steps:

Obtain the user's position relative to the electronic device; the electronic device is equipped with N microphones; N is an integer greater than or equal to 3;

Among the preset fixed beams of the electronic device, select the fixed beam closest to the position as the main beam, and select at least one fixed beam as a secondary beam in order of distance from the position, from farthest to nearest;

Use the secondary output signal to filter the main output signal to obtain the target sound signal.

In a possible implementation manner, when the instruction is executed by the device, the step of obtaining the position of the user relative to the electronic device includes:

Obtain the image captured by the camera of the electronic device;

In a possible implementation manner, when the instruction is executed by the device, the step of selecting, among the preset fixed beams of the electronic device, the fixed beam closest to the position as the main beam, and selecting at least one fixed beam as a secondary beam in order of distance from the position, from farthest to nearest, includes:

In a possible implementation manner, when the instruction is executed by the device, the following steps are performed before the step of obtaining the user's position relative to the electronic device:

In a possible implementation manner, when the instruction is executed by the device, the step of obtaining the beamforming coefficient, direction, and beam width of a preset number of fixed beams includes:

Obtain the coordinates of N microphones in the coordinate system;

In a fourth aspect, an embodiment of the present application provides a computer-readable storage medium in which a computer program is stored, and when it runs on a computer, the computer executes the method of the first aspect.

In the fifth aspect, an embodiment of the present application provides a computer program, which is used to execute the method of the first aspect when the computer program is executed by a computer.

In a possible design, the program in the fifth aspect may be stored in whole or in part on a storage medium that is packaged with the processor, or may be stored in whole or in part in a memory that is not packaged with the processor.

Description of the drawings

In order to explain the technical solutions of the embodiments of the present invention more clearly, the following will briefly introduce the drawings needed in the embodiments. Obviously, the drawings in the following description are only some embodiments of the present invention. For those of ordinary skill in the art, without creative work, other drawings can be obtained from these drawings.

FIG. 1 is an example diagram of microphone settings on an electronic device according to an embodiment of the application;

FIG. 2 is a flowchart of an embodiment of the sound pickup method of this application;

FIG. 3a is a flowchart of another embodiment of a sound pickup method according to the present application;

FIG. 3b is an example diagram of the three-dimensional Cartesian coordinate system of the electronic device of this application;

FIG. 3c is an example diagram of the azimuth angle and the pitch angle according to the embodiment of the application;

FIG. 3d is an example diagram of the placement position of the electronic device according to the embodiment of the application;

FIG. 4 is a flowchart of an embodiment of a method for implementing one step of this application;

FIGS. 5a and 5b are structural diagrams of an electronic device to which the sound pickup method of this application is applicable;

Fig. 6a is a schematic structural diagram of an embodiment of a sound pickup device according to the present application;

Fig. 6b is a schematic structural diagram of an embodiment of a unit of the sound pickup device of the present application;

Fig. 6c is a schematic structural diagram of an embodiment of another unit of the sound pickup device of the present application;

Fig. 7a is a schematic structural diagram of another embodiment of a sound pickup device according to the present application;

Fig. 7b is a schematic structural diagram of an embodiment of another unit of the sound pickup device of the present application;

FIG. 8 is a schematic structural diagram of an embodiment of an electronic device of this application.

Detailed description

The terms used in the implementation mode part of this application are only used to explain specific embodiments of this application, and are not intended to limit this application.

In existing implementations, single-channel noise reduction algorithms suffer from serious voice distortion at low signal-to-noise ratio, and dual-channel noise reduction algorithms have difficulty eliminating human voice interference. For this reason, this application proposes a sound pickup method that can alleviate voice distortion at low signal-to-noise ratio and can also reduce human voice interference.

In the embodiments of the present application, at least three microphones are provided on the electronic device, and the location of each microphone on the electronic device is not limited. Optionally, to achieve a better sound pickup effect, the at least three microphones are distributed across different parts of the electronic device; the position of each microphone includes but is not limited to the upper part, lower part, top, or bottom of the electronic device, the upper surface where the screen is located, and/or the back. In a possible implementation manner, as shown in FIG. 1, three microphones can be arranged on the top, the bottom, and the back of the electronic device, respectively.

The embodiments of this application can be applied to voice assistant scenarios on electronic devices, providing relatively clean voice signals for voice wake-up and voice recognition, and can also be applied to other scenarios that require a relatively clean voice signal, such as audio or video recording of a particular person.

FIG. 2 is a flowchart of an embodiment of a sound pickup method according to this application. As shown in FIG. 2, the above method may include:

Step 201: Obtain the position of the user relative to the electronic device; the electronic device is provided with N microphones, N≥3.

Step 202: Among the preset fixed beams of the electronic device, select the fixed beam closest to the position as the main beam, and select at least one fixed beam as a secondary beam in order of distance from the position, from farthest to nearest.

Step 203: When the N microphones receive a sound signal, use the beamforming coefficient of the main beam to calculate the main output signal of the sound signal, and use the beamforming coefficient of the secondary beam to calculate the secondary output signal of the sound signal.

Step 204: Use the secondary output signal to filter the main output signal to obtain the target sound signal.

Here, the target sound signal obtained is a clean speech signal with noise filtered out.
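Steps 203 and 204 can be sketched for a single frequency bin as follows. This is only a sketch under stated assumptions: the microphone signals are assumed to be available as complex short-time spectra of shape N x T (microphones by frames), and the filter is a single-tap complex NLMS canceller chosen as a stand-in, since this section does not specify which filter is used. The function names, `step`, and `eps` are hypothetical.

```python
import numpy as np


def beam_output(weights, mic_frames):
    """Apply beamforming coefficients (N,) to microphone spectra (N x T): y = w^H x."""
    return np.conj(np.asarray(weights)) @ np.asarray(mic_frames)


def cancel_with_secondary(main, secondary, step=0.5, eps=1e-8):
    """Single-tap complex NLMS: remove the part of `main` predictable from `secondary`."""
    main = np.asarray(main, dtype=complex)
    secondary = np.asarray(secondary, dtype=complex)
    h = 0.0 + 0.0j                      # adaptive filter coefficient (assumed single tap)
    out = np.empty(len(main), dtype=complex)
    for t, (d, x) in enumerate(zip(main, secondary)):
        e = d - np.conj(h) * x          # residual after removing the estimated interference
        h = h + step * x * np.conj(e) / (abs(x) ** 2 + eps)  # normalized LMS update
        out[t] = e
    return out
```

A typical call would compute `main = beam_output(w_main, X)` and `sec = beam_output(w_sec, X)` for each bin and then pass both to `cancel_with_secondary`. Note that a canceller of this kind also removes any target speech that leaks into the secondary beam, which is one reason the secondary beams are chosen far from the user's position.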

In the method shown in FIG. 2, the position of the user relative to the electronic device is obtained, and the main beam and the secondary beam are selected from the preset fixed beams of the electronic device according to that position, so that the sound signal of the target sound source can be extracted more accurately from the received sound signal, effectively reducing the human voice interference in the target sound signal. At least three microphones are used to receive the sound signal; because of the influence of the electronic device housing, noise can be better distinguished, which enhances the effect of the filtering processing and alleviates the problem of voice distortion at low signal-to-noise ratio. In particular, when the at least three microphones are distributed across different parts of the electronic device, for example when three microphones are arranged on the top, bottom, and back of the electronic device, respectively, the influence of the housing makes it easier to distinguish noise coming from the front and from the rear, which enhances the effect of the filtering processing and alleviates the problems of voice distortion at low signal-to-noise ratio and incomplete elimination of human voice interference.

Fig. 3a is a flowchart of another embodiment of a sound pickup method according to the present application. As shown in Fig. 3a, the method may include:

Step 301: Obtain the beamforming coefficients, directions, and beam widths of a preset number of groups of fixed beams.

The preset number of groups is greater than or equal to 2; that is, its minimum value is 2, and its maximum value is not limited.

This step is generally performed in advance: after the beamforming coefficients, directions, and beam widths of the preset number of groups of fixed beams are obtained, the obtained information can be stored in the electronic device, so this step does not need to be executed every time before steps 302 to 309 are performed. In practical applications, the information stored in the electronic device can also be modified.

For the implementation of this step, please refer to the description shown in Figure 4, which will not be repeated here.

To facilitate the description of the following steps, the three-dimensional Cartesian coordinate system established for the electronic device in the embodiment shown in FIG. 4 is described first. As shown in FIG. 3b, the coordinate system takes the center point of the upper surface of the electronic device as the coordinate origin, the symmetry axes of the upper surface of the electronic device as the X-axis and the Y-axis, respectively, and the perpendicular passing through the center point of the upper surface as the Z-axis. The upper surface of the electronic device is generally the surface on the side of the display screen.

The following steps 302 to 304 are a possible implementation method of the step of obtaining the user's position relative to the electronic device.

Step 302: Obtain the image captured by the camera of the electronic device, and determine whether the face information of the user of the electronic device can be recognized from the image, if not, execute step 303; if yes, execute step 304.

In practical applications, the electronic device can store the facial information of the user of the electronic device. In a possible implementation manner, the facial information can be independently set by the user of the electronic device in the electronic device.

This embodiment does not limit whether all cameras of the electronic device or only some of them are used to capture images in this step. For example, the front camera alone can be used, or the front camera and the rear camera can both be used.

In a possible implementation, this step can use face recognition technology to recognize the user's facial information. Specifically, face recognition technology refers to a series of related techniques that use the camera of an electronic device to collect images or video streams containing human faces, automatically detect and track the faces in the collected images or video streams, and then perform facial recognition on the detected faces. Using this technology, the user's facial information and its position information can be identified in the image or video stream.

Step 303: Obtain the placement position of the electronic device, and estimate the position of the user relative to the electronic device according to the placement position; go to step 305.

In a possible implementation, the position of the user relative to the electronic device can be represented by (azimuth angle, pitch angle). In the three-dimensional Cartesian coordinate system shown in FIG. 3b, the position of the user relative to the electronic device can be represented by the ray from the origin of the coordinate system to the center point of the user's face. The azimuth angle is the angle between the projection of that ray onto the XOY plane and the positive direction of the X-axis; the pitch angle is the angle between that ray and the positive direction of the Z-axis. Referring to the example in FIG. 3c, assume that point A is the center point of the user's face. Then OA is the ray from the origin of the coordinate system to the center point of the user's face, that is, the position of the user relative to the electronic device. The azimuth angle is the angle between the ray OB, which is the projection of OA onto the XOY plane, and the positive direction of the X-axis, shown in FIG. 3c as ∠XOB; the pitch angle is the angle between the ray OA and the positive direction of the Z-axis, shown in FIG. 3c as ∠ZOA. These two angles together express the position of the user relative to the electronic device. It should be noted that representing the position of the user relative to the electronic device with the azimuth angle and the pitch angle is only an example, and does not limit other representations or implementations of that position in the embodiments of the present application.
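The two angles defined above follow directly from the coordinates of point A. The short Python sketch below assumes that the coordinates (x, y, z) of the face center in the device coordinate system are already known; how they are derived from the camera image is not covered here, and the function name is hypothetical.

```python
import math


def position_angles(x, y, z):
    """Return (azimuth_deg, pitch_deg) of the ray from the origin O to A = (x, y, z)."""
    r = math.sqrt(x * x + y * y + z * z)
    if r == 0.0:
        raise ValueError("point A must differ from the origin")
    azimuth = math.degrees(math.atan2(y, x)) % 360.0  # angle XOB in the XOY plane
    pitch = math.degrees(math.acos(z / r))            # angle ZOA, measured from +Z
    return azimuth, pitch
```

For a point on the negative Y-axis, such as A = (0, -1, 0), this gives an azimuth of 270° and a pitch of 90°.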

In a possible implementation manner, a gravity sensor (g-sensor) in the electronic device may be used to obtain the placement position of the electronic device. Specifically, the gravity sensor can obtain the gravitational acceleration of the electronic device in different directions, and the values it obtains in those directions differ with the placement position of the electronic device. Take the three-dimensional Cartesian coordinate system of FIG. 3b as an example, and refer to the example diagram of possible placement positions of the electronic device shown in FIG. 3d (the desktop is not shown in FIG. 3d). When the electronic device is placed on the desktop with its display screen facing up, the X-axis and Y-axis gravitational accelerations are 0, and the value of the Z-axis gravitational acceleration is greater than 9.8. When the electronic device is placed on the desktop with its display screen facing down, the X-axis and Y-axis gravitational accelerations are 0, and the value of the Z-axis gravitational acceleration is less than -9.8. When the electronic device is placed upright (fully vertical), the X-axis and Z-axis gravitational accelerations are 0, and the value of the Y-axis gravitational acceleration is greater than 9.8. When the electronic device is placed upside down (fully vertical), the X-axis and Z-axis gravitational accelerations are 0, and the value of the Y-axis gravitational acceleration is less than -9.8. When the electronic device is placed horizontally to the left (fully horizontal), the Y-axis and Z-axis gravitational accelerations are 0, and the value of the X-axis gravitational acceleration is greater than 9.8. When the electronic device is placed horizontally to the right (fully horizontal), the Y-axis and Z-axis gravitational accelerations are 0, and the value of the X-axis gravitational acceleration is less than -9.8. Therefore, according to the gravitational acceleration values obtained by the gravity sensor in the various directions, the placement position of the electronic device can be obtained.

Specifically, the threshold ranges of the X-axis, Y-axis, and Z-axis gravitational acceleration corresponding to the different placement positions of the electronic device can be preset. Correspondingly, in this step, the threshold ranges in which the X-axis, Y-axis, and Z-axis gravitational accelerations output by the gravity sensor fall can be determined, so as to obtain the placement position of the electronic device. For example, referring to the aforementioned examples of the gravitational acceleration corresponding to each placement position, assume that the gravitational accelerations of the X-axis, Y-axis, and Z-axis are $g_1$, $g_2$, and $g_3$, respectively. When $|g_1| < \Delta_1$, $|g_2| < \Delta_1$, and either $|g_3 - 9.8| < \Delta_1$ or $|g_3 + 9.8| < \Delta_1$, the electronic device is placed horizontally; when $|g_1| < \Delta_1$, $|g_3| < \Delta_1$, and $g_2 > \Delta_2$, the electronic device is in a handheld state; when $|g_2| < \Delta_1$, $|g_3| < \Delta_1$, and $g_1 > \Delta_2$, the electronic device is tilted to the left; when $|g_2| < \Delta_1$, $|g_3| < \Delta_1$, and $g_1 < -\Delta_2$, the electronic device is tilted to the right. Here $\Delta_1$ and $\Delta_2$ are preset thresholds: $\Delta_1$ may be a positive number close to 0, and $\Delta_2$ may be a positive number greater than $\Delta_1$. The specific values of $\Delta_1$ and $\Delta_2$ can be set independently in practical applications and are not limited by this application.
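The threshold rules above translate into a small classifier. In the Python sketch below, the default values of `delta1` and `delta2` are illustrative assumptions only (the text leaves them to the implementer), and the function name and the returned labels are hypothetical.

```python
G = 9.8  # m/s^2, the gravitational acceleration value used in the text


def classify_placement(g1, g2, g3, delta1=1.0, delta2=5.0):
    """Map g-sensor readings (X, Y, Z axes) to a placement label, or None if no rule matches."""
    if abs(g1) < delta1 and abs(g2) < delta1 and (
        abs(g3 - G) < delta1 or abs(g3 + G) < delta1
    ):
        return "horizontal"   # lying flat, screen facing up or down
    if abs(g1) < delta1 and abs(g3) < delta1 and g2 > delta2:
        return "handheld"
    if abs(g2) < delta1 and abs(g3) < delta1 and g1 > delta2:
        return "tilt_left"
    if abs(g2) < delta1 and abs(g3) < delta1 and g1 < -delta2:
        return "tilt_right"
    return None               # reading falls outside every preset threshold range
```

Readings that match none of the four rules return `None`, so a caller can decide how to handle intermediate orientations.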

In practical applications, the correspondence between different placement positions and the position of the user relative to the electronic device can be preset; estimating the position of the user relative to the electronic device based on the placement position may then include:

Obtain the user's position relative to the electronic device corresponding to the placement position of the electronic device from the preset correspondence relationship.

This implementation is explained as follows: if the electronic device does not recognize the face information of its user in the image taken by the camera, the user's face is outside the camera's shooting angle range, and the most likely position of the user relative to the electronic device can then be estimated according to the placement position and the shooting angle range of the camera. Specifically,

The position corresponding to the shooting angle range of the camera may be first excluded from all positions of the user relative to the electronic device;

Then, based on big-data statistical analysis of users' usage habits, the position at which the user is most likely to be relative to the electronic device can be determined from the remaining positions for each placement position of the electronic device, so as to obtain the correspondence between placement positions and user positions.

For example, referring to the foregoing placement position example, based on usage habits and ease of reading, when the electronic device is in a handheld state or a horizontally placed state, the user is likely to face the electronic device directly and to be located in the negative direction of the Y axis of the electronic device. After the positions corresponding to the shooting angle range of the camera are excluded, the user's position relative to the electronic device corresponding to the handheld state or the horizontally placed state, expressed as (azimuth angle, pitch angle), can be set to (270°, 90°). When the electronic device is tilted to the left or to the right, the user is mostly watching videos or playing games and is located in the XOZ plane of the electronic device. After the positions corresponding to the shooting angle range of the camera are excluded, the user's position relative to the electronic device corresponding to the left-tilted or right-tilted state can be set to (0°, 45°) or (180°, 45°).

The foregoing is only an exemplary description of possible implementation manners and is not intended to limit the embodiments of the present application. For example, the specific values of the above-mentioned azimuth and pitch angles can be different; different electronic devices have different camera shooting angle ranges, so for different electronic devices in the same placement position, the user's position relative to the electronic device corresponding to that placement position may also be set differently.
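The preset correspondence described above can be pictured as a small lookup table. The following is only a sketch: the dictionary name, the function name, and the placement labels are hypothetical, the angle values are the example values given above, and the assignment of (0°, 45°) to the left tilt and (180°, 45°) to the right tilt is an assumption, since the example lists the two values without stating which tilt each belongs to.

```python
# (azimuth angle, pitch angle) in degrees, taken from the example values above.
# Which tilt direction maps to which of the two tilt entries is an assumption.
PLACEMENT_TO_USER_POSITION = {
    "handheld": (270.0, 90.0),
    "horizontal": (270.0, 90.0),
    "tilt_left": (0.0, 45.0),
    "tilt_right": (180.0, 45.0),
}


def estimate_user_position(placement):
    """Return the preset (azimuth, pitch) for a placement, or None if not preset."""
    return PLACEMENT_TO_USER_POSITION.get(placement)
```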

Compared with estimating the user's position relative to the electronic device according to the position of the face in the image in step 304 below, estimating it indirectly from the placement position of the electronic device is somewhat less accurate. However, scenes in which the user is outside the camera's shooting angle range are uncommon, and the width of the fixed beams in the subsequent steps can tolerate a certain angular error. Therefore, estimating the position of the user relative to the electronic device according to the placement position in this step still meets the requirements of the embodiments of this application and has little impact on the subsequent processing results.

For example, according to the big data of users' usage habits and the placement position of the electronic device, the position at which the user is most likely to be relative to the electronic device can be obtained for each placement position. Taking a mobile phone as an example of the electronic device, assuming that the electronic device is in the handheld state, and excluding the positions corresponding to the shooting angles of the front camera and the rear camera, the most likely position of the user relative to the electronic device can be at the bottom of the mobile phone, that is, in the negative direction of the y-axis in Figure 3b.

Step 304: Obtain the position information of the user's face in the image, and obtain the position of the user relative to the electronic device according to the position information; go to step 305.

In this step, projection and other related technologies can be used to directly convert the position information of the user's face in the image into an azimuth angle and a pitch angle in the three-dimensional Cartesian coordinate system shown in FIG. 3b, so as to obtain the user's position relative to the electronic device.

The following steps 305 to 306 are a possible implementation of step 202.

Step 305: Calculate the ratio K_k of the position to each fixed beam:

K_k = Δ_k / (beam width of fixed beam k)

where the included angle Δ_k is the angle between the position and the direction of fixed beam k, and k = 1, 2, ..., M.

In one possible implementation, this step may include: for each fixed beam k, calculating the angle Δ_k between the position and the direction of fixed beam k, and then calculating the ratio between the angle Δ_k and the beam width of fixed beam k.

Step 306: From the ratios, select the fixed beam corresponding to the smallest ratio as the main beam, and, in descending order of the ratios starting from the largest ratio, select at least one fixed beam as the secondary beam.

In practical applications, the number of secondary beams may be one or more, and the specific number is not limited in this application. However, the total number of secondary beams and main beams does not exceed the number M of fixed beams. In other words, if M is 2, the number of secondary beams can only be 1, and if M is 5, the number of secondary beams can be 2, 3, or 4. In a possible implementation, the number of secondary beams may be two.
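Steps 305 and 306 can be sketched as follows. This is a minimal illustration under stated assumptions: directions are given as (azimuth angle, pitch angle) in degrees with the pitch angle measured from the positive Z axis, the included angle Δ_k is taken as the angle between the two direction vectors, and the function names and the beam widths used in the usage note are hypothetical.

```python
import math


def unit_vector(azimuth_deg, pitch_deg):
    """Direction as a unit vector; the pitch angle is measured from the +Z axis."""
    az, pt = math.radians(azimuth_deg), math.radians(pitch_deg)
    return (math.sin(pt) * math.cos(az), math.sin(pt) * math.sin(az), math.cos(pt))


def included_angle_deg(dir_a, dir_b):
    """Angle in degrees between two (azimuth, pitch) directions."""
    ua, ub = unit_vector(*dir_a), unit_vector(*dir_b)
    dot = sum(p * q for p, q in zip(ua, ub))
    # Clamp to [-1, 1] to guard against rounding before acos.
    return math.degrees(math.acos(max(-1.0, min(1.0, dot))))


def select_beams(user_position, beam_dirs, beam_widths, num_secondary=2):
    """Return (main beam index, list of secondary beam indices)."""
    ratios = [included_angle_deg(user_position, d) / w
              for d, w in zip(beam_dirs, beam_widths)]
    order = sorted(range(len(ratios)), key=lambda k: ratios[k])
    main = order[0]  # smallest ratio K_k
    # Secondary beams: largest ratios first, never reusing the main beam.
    secondary = [k for k in reversed(order) if k != main][:num_secondary]
    return main, secondary
```

For instance, with five beams pointing to +X, -X, +Y, -Y, and +Z, an assumed beam width of 90° for each, and a user at (270°, 90°), the -Y beam becomes the main beam and the +Y beam is the first secondary beam chosen.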

The beamforming coefficient of the main beam is denoted as W_(1)(f), and the beamforming coefficient of the secondary beam is denoted as W_(q)(f), q = 2, ..., S+1, where S is the number of secondary beams.

Step 307: Obtain N channels of sound signals received by the N microphones, and perform echo cancellation on the N channels of sound signals to obtain the sound signal X(f, l) = [X₁(f, l), X₂(f, l), ..., X_N(f, l)]^T, where l is the frame number.

Wherein, the echo cancellation step is an optional step. How to perform echo cancellation on N channels of sound signals in this step is not limited in this application.

In practical applications, an existing echo cancellation algorithm can be used to perform echo cancellation on the N channels of sound signals. Echo cancellation algorithms include time domain processing algorithms and frequency domain processing algorithms, which will not be repeated here. The basic principle of an adaptive echo cancellation algorithm is to use the reference signal to adaptively estimate the echo signal and to subtract the estimated echo signal from the sound signal received by the microphone, so as to obtain an echo-free sound signal.
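The basic principle stated above can be illustrated with a time domain normalized least mean squares (NLMS) filter for a single microphone channel. NLMS is chosen here only as a familiar example of an adaptive algorithm; this application does not prescribe it, and the function name, the number of taps, and the step size `mu` are assumptions.

```python
def nlms_echo_cancel(mic, ref, taps=8, mu=0.5, eps=1e-8):
    """Subtract an adaptively estimated echo of `ref` from `mic` (NLMS sketch).

    mic: samples received by the microphone (echo plus any near-end sound).
    ref: reference samples, for example the signal sent to the loudspeaker.
    """
    w = [0.0] * taps      # adaptive estimate of the echo path
    buf = [0.0] * taps    # most recent reference samples, newest first
    out = []
    for d, x in zip(mic, ref):
        buf = [x] + buf[:-1]
        echo_est = sum(wi * bi for wi, bi in zip(w, buf))
        e = d - echo_est  # echo-cancelled output sample
        norm = sum(b * b for b in buf) + eps
        # Normalized LMS update of the echo path estimate.
        w = [wi + mu * e * bi / norm for wi, bi in zip(w, buf)]
        out.append(e)
    return out
```

In a practical system the adaptation would normally be slowed or frozen while the near-end user is speaking; that control logic is omitted from this sketch.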

There is no restriction on the execution order between step 307 and step 305 to step 306.

Step 308: Calculate the main output signal Y₁(f, l) = W_(1)(f) X(f, l) according to the sound signal X(f, l) and the beamforming coefficient W_(1)(f) of the main beam; calculate the secondary output signal Y_q(f, l) = W_(q)(f) X(f, l) according to the sound signal X(f, l) and the beamforming coefficient W_(q)(f) of the secondary beam.
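For a single frequency f and frame l, the products in step 308 reduce to a weighted sum over the N microphone channels. The sketch below assumes that each beamforming coefficient is a 1×N row of complex weights, so that no conjugation is applied, as the products are written in step 308; the function names are hypothetical.

```python
def beam_output(weights, x):
    """Y(f, l) = W(f) X(f, l) for one frequency and one frame.

    weights: length-N sequence of complex coefficients (a 1 x N row vector).
    x: length-N sequence of complex microphone spectra X_i(f, l).
    """
    if len(weights) != len(x):
        raise ValueError("weights and x must have the same length N")
    return sum(w * xi for w, xi in zip(weights, x))


def all_beam_outputs(main_w, secondary_ws, x):
    """Return (Y_1, [Y_2, ..., Y_{S+1}]) for one frequency and one frame."""
    return beam_output(main_w, x), [beam_output(w, x) for w in secondary_ws]
```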

Step 309: Use the secondary output signal Y_q(f, l) to filter the main output signal Y₁(f, l) to obtain the target sound signal.

In a possible implementation manner, assuming that there are two secondary beams and therefore two secondary output signals, Y₂(f, l) and Y₃(f, l), and assuming that the target sound signal is Z(f, l), Z(f, l) is obtained by filtering Y₁(f, l) with y₂ and y₃ using the filter coefficients b₂ and b₃, where y₂ = [Y₂(f, l), ..., Y₂(f, l-p+1)]^T, y₃ = [Y₃(f, l), ..., Y₃(f, l-p+1)]^T, b₂ and b₃ are p×1 dimensional filter coefficient matrices, and p is the dimension of the filter coefficient matrix. The specific value of p can be selected and set independently in practical applications and is not limited in this application.

In practical applications, relevant filtering algorithms such as Wiener filtering, minimum mean square error criterion filtering, Kalman filtering, etc. can be used to perform the filtering processing in this step, which will not be repeated here.
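As a sketch only, assume for illustration that the target sound signal is formed by subtracting the filtered secondary outputs from the main output, Z(f, l) = Y₁(f, l) - b₂ᴴy₂ - b₃ᴴy₃, and that b₂ and b₃ are adapted per frequency with a complex normalized least mean squares (NLMS) update. Both the subtraction form and the NLMS update are assumptions standing in for the Wiener, minimum mean square error, or Kalman filtering named above; the function name and the default values of `p` and `mu` are hypothetical.

```python
def filter_main_output(y1_frames, y2_frames, y3_frames, p=2, mu=0.5, eps=1e-8):
    """Per-frequency complex NLMS: Z(l) = Y1(l) - b2^H y2(l) - b3^H y3(l).

    Each argument is the sequence over frames l of one beam output at a
    single frequency f. Returns the sequence Z(f, l).
    """
    b = [0j] * (2 * p)   # stacked coefficients [b2; b3]
    h2 = [0j] * p        # y2 = [Y2(l), ..., Y2(l-p+1)]
    h3 = [0j] * p        # y3 = [Y3(l), ..., Y3(l-p+1)]
    z_frames = []
    for y1, y2, y3 in zip(y1_frames, y2_frames, y3_frames):
        h2 = [y2] + h2[:-1]
        h3 = [y3] + h3[:-1]
        u = h2 + h3
        z = y1 - sum(bi.conjugate() * ui for bi, ui in zip(b, u))
        norm = sum(abs(ui) ** 2 for ui in u) + eps
        # Complex NLMS update: b <- b + mu * u * conj(z) / ||u||^2.
        b = [bi + mu * ui * z.conjugate() / norm for bi, ui in zip(b, u)]
        z_frames.append(z)
    return z_frames
```

Because the secondary beams point away from the user, their outputs mainly contain interference, so removing whatever part of Y₁(f, l) can be predicted from them suppresses interference while largely preserving the user's voice.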

In the embodiment of this application, at least one microphone is added to the conventional two microphones. Optionally, the added microphone may be a back microphone. These microphones form a three-dimensional microphone array. Owing to the influence of the housing of the electronic device, the microphone array can perform directional beamforming in three-dimensional space and can distinguish noise coming from the front from noise coming from the rear.

Hereinafter, the implementation of step 301 will be explained by using the step flow shown in FIG. 4 as an example. Referring to Figure 4, the flow includes:

Step 401: Establish a three-dimensional Cartesian coordinate system based on the electronic device.

For the method of establishing the three-dimensional Cartesian coordinate system, please refer to Figure 3b and the corresponding description, which will not be repeated here. In Figure 3b, the number of microphones N being 3 is taken as an example, with the three microphones located at the top, bottom, and back of the electronic device, respectively.

Step 402: Obtain the coordinates of the N microphones in the three-dimensional Cartesian coordinate system according to the positions of the N microphones on the electronic device.

Assume that the coordinates of each microphone Mic i are (x_i, y_i, z_i), i = 1, 2, ..., N.

As shown in Figure 3b, the coordinates of the first microphone Mic1 are (x₁, y₁, z₁); the coordinates of the second microphone Mic2 are (x₂, y₂, z₂); the coordinates of the third microphone Mic3 are (x₃, y₃, z₃).

Step 403: Calculate the steering vector of the target sound source under ideal conditions according to the coordinates of the N microphones in the three-dimensional Cartesian coordinate system.

Specifically, assume that the direction of the target sound source is (θ, φ), where θ is the azimuth angle of the target sound source and φ is the pitch angle of the target sound source. The steering vector a(θ, φ, f) of the target sound source under ideal conditions is determined by the time delays τ_i, i = 1, 2, ..., N, where τ_i is the time delay of microphone i relative to the origin of the coordinate system and is calculated by formula (1), c is the speed of sound, and f is the frequency.
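A minimal sketch of an ideal steering vector is given below, assuming the common far-field plane-wave model in which element i equals exp(-j2πfτ_i) and τ_i is the projection of the microphone coordinates onto the source direction divided by c. The sign convention, the value used for the speed of sound, and the function names are assumptions and may differ from the formulas of this application.

```python
import cmath
import math

SPEED_OF_SOUND = 343.0  # m/s, an assumed room-temperature value


def delays(mic_coords, azimuth_deg, pitch_deg, c=SPEED_OF_SOUND):
    """Far-field delay tau_i of each microphone relative to the origin, in seconds.

    A microphone displaced towards the source receives the wavefront earlier,
    so its delay is negative under this assumed sign convention.
    """
    az, pt = math.radians(azimuth_deg), math.radians(pitch_deg)
    u = (math.sin(pt) * math.cos(az), math.sin(pt) * math.sin(az), math.cos(pt))
    return [-(x * u[0] + y * u[1] + z * u[2]) / c for x, y, z in mic_coords]


def ideal_steering_vector(mic_coords, azimuth_deg, pitch_deg, f, c=SPEED_OF_SOUND):
    """a(theta, phi, f) = [exp(-j 2 pi f tau_1), ..., exp(-j 2 pi f tau_N)]."""
    return [cmath.exp(-2j * math.pi * f * tau)
            for tau in delays(mic_coords, azimuth_deg, pitch_deg, c)]
```

Under ideal conditions every element has unit magnitude and only the phase depends on the microphone position; the frequency domain response of the housing is what changes this in a real device.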

Step 404: Obtain the frequency domain response matrix Γ(θ, φ, f) of the housing of the electronic device to the microphone.

In practical applications, the response of the microphone to signals from different directions is generally measured by having the microphone of the electronic device receive the same audio from different directions, from which the frequency domain response matrix of the housing of the electronic device, such as a mobile phone, to the microphone is obtained. The specific steps are as follows: place the electronic device in a professional anechoic chamber; taking the electronic device as the center of a sphere, play the same audio, generally Gaussian white noise, at different positions on the spherical surface with a radius of 1 m; and receive the audio signals from the different positions on the spherical surface through the microphones of the electronic device. Based on the principle that the audio signals received by the microphones should be consistent in the absence of the influence of the housing of the electronic device, the response of the housing to each microphone is obtained by comparison and calculation, yielding the frequency domain response matrix Γ(θ, φ, f).

Step 405: Calculate the true steering vector of the target sound source according to the frequency domain response matrix Γ(θ, φ, f) and the steering vector a(θ, φ, f) of the target sound source under ideal conditions.

Step 406: According to the true steering vector of the target sound source, calculate the beamforming coefficient W_k(f), direction, and beam width of a preset number of groups of fixed beams, k = 1, 2, ..., M, where M is the preset number of groups of fixed beams.

In a possible implementation, if M < 4, the direction of each fixed beam is horizontal, and the fixed beams divide the 360° space into M equal parts; if M ≥ 4, the direction of one fixed beam points to the positive direction of the Z axis, the directions of the other M-1 fixed beams are horizontal, and these M-1 fixed beams divide the 360° space into M-1 equal parts, similar to a lotus shape. For example, when M = 5, the directions of the 5 groups of fixed beams can respectively point to the positive direction of the X axis, the negative direction of the X axis, the positive direction of the Y axis, the negative direction of the Y axis, and the positive direction of the Z axis.
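The layout rule above can be sketched as follows, with each direction expressed as (azimuth angle, pitch angle) in degrees and the pitch angle measured from the positive Z axis. Starting the horizontal beams at the positive X axis is an assumption, as is the function name.

```python
def fixed_beam_directions(m):
    """Return M (azimuth, pitch) pairs in degrees for the fixed beams.

    For m < 4 all beams are horizontal (pitch 90 degrees); for m >= 4 one
    beam points along +Z and the remaining m - 1 beams are horizontal.
    """
    if m < 1:
        raise ValueError("m must be at least 1")
    horizontal = m if m < 4 else m - 1
    # Horizontal beams split the 360-degree space evenly, starting at +X.
    dirs = [(k * 360.0 / horizontal, 90.0) for k in range(horizontal)]
    if m >= 4:
        dirs.append((0.0, 0.0))  # one beam along the positive Z axis
    return dirs
```

For M = 5 this yields the +X, +Y, -X, -Y, and +Z directions, that is, the same five directions as in the example above, listed in a different order.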

In a possible implementation manner, M can be 5, and the beamforming coefficients W_k(f) of five groups of fixed beams are obtained, k = 1, 2, 3, 4, 5; the directions of the five groups of fixed beams can respectively point to the positive direction of the X axis, the negative direction of the X axis, the positive direction of the Y axis, the negative direction of the Y axis, and the positive direction of the Z axis; and each of the five groups of fixed beams has its own beam width.

In practical applications, a fixed beamforming algorithm can be used to calculate five sets of fixed beamforming coefficients.

A simple fixed beamforming algorithm is the delay-and-sum algorithm, whose beamforming coefficient for fixed beam k is determined by the steering vector in the direction (θ_k, φ_k), where θ_k represents the azimuth angle of fixed beam k and φ_k represents the pitch angle of fixed beam k. Taking the directions of the above five fixed beams pointing to the positive X axis, the negative X axis, the positive Y axis, the negative Y axis, and the positive Z axis as an example, the azimuth and pitch angles (θ_k, φ_k) of the five fixed beams are, respectively: (0°, 90°), (180°, 90°), (90°, 90°), (270°, 90°), and (0°, 0°).
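A delay-and-sum coefficient can be sketched as the conjugate of the steering vector in the look direction, normalized so that the beam passes a plane wave from that direction with unit gain. The normalization by aᴴa is an assumption; for a steering vector whose elements all have unit magnitude it reduces to the familiar aᴴ/N. The function names are hypothetical.

```python
def delay_and_sum_weights(steering):
    """Return W = a^H / (a^H a) for a steering vector a (list of complex)."""
    power = sum(abs(a) ** 2 for a in steering)
    if power == 0:
        raise ValueError("steering vector must be non-zero")
    return [a.conjugate() / power for a in steering]


def beam_response(weights, steering):
    """Response W a of a beam to a plane wave with steering vector a."""
    return sum(w * a for w, a in zip(weights, steering))
```

With these weights, `beam_response` equals 1 for the look direction, and for unit-magnitude steering vectors its magnitude cannot exceed 1 for any other direction.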

The direction of a fixed beam can also be expressed as (azimuth angle, pitch angle). The azimuth angle of the fixed beam is the angle, in the three-dimensional Cartesian coordinate system, between the projection of the direction of the fixed beam onto the XOY plane and the positive direction of the X axis; the pitch angle of the fixed beam is the angle, in the three-dimensional Cartesian coordinate system, between the direction of the fixed beam and the positive direction of the Z axis. For details, please refer to the aforementioned example of FIG. 3c, which will not be repeated here.

More complex fixed beamforming algorithms include superdirective beamforming, constant-beamwidth beamforming, and the like. These algorithms ultimately reduce to a quadratic programming problem, which is solved with convex optimization techniques to obtain the fixed beamforming coefficient W_k(f).

The setting of the beam width is related to the number of beams, the microphone layout on the electronic device, the selected fixed beamforming algorithm, and the range of sound sources that needs to be picked up by each fixed beam. It can be set independently in practical applications and is not limited here.

The method shown in Figure 4 achieves the acquisition of M groups of fixed beams.

In a possible implementation manner, the sound pickup method shown in Figure 3a of the embodiment of the present application can be applied to a voice assistant scenario of an electronic device. For example, driving is a scenario in which users use a mobile phone voice assistant relatively frequently. The noise environment in this scenario is relatively harsh and includes engine sound, tire friction sound, air-conditioning sound, and wind noise when the windows are open. This directly reduces the signal-to-noise ratio of the user's voice received by the mobile phone and makes it considerably harder for the voice assistant to pick up the user's voice cleanly. Specifically, referring to FIG. 5a, the electronic device may include: a sensor module, a scene analysis module, a front-end enhancement module, a voice wake-up module, a voiceprint recognition and confirmation module, a voice recognition module, and other interaction modules. The sensor module may include a camera, a microphone, and a gravity sensor, through which data such as the user's image, the sound signal, and the placement position of the electronic device can be obtained, respectively. The scene analysis module is used to obtain a priori information about the sound signal so that targeted sound pickup can be performed. The front-end enhancement module is used to extract the sound signal of the user (the owner), that is, the target sound signal, while suppressing other interference signals and noise. The voice wake-up module is used to detect specific wake-up words in the target sound signal; these wake-up words can "wake up" the electronic device, but whether the electronic device is finally awakened depends on a check by the voiceprint recognition and confirmation module. The voiceprint recognition and confirmation module is used to recognize and confirm the user's voiceprint: the electronic device is finally awakened only when the voiceprint of the user currently speaking the wake-up word is consistent with the preset user's voiceprint.

Due to the resource and cost constraints of electronic devices, the voice wake-up module supports only one channel of wake-up detection, which requires the front-end enhancement module to output only one audio signal to the voice wake-up module. When there are multiple speakers, information such as the position of the target speaker needs to be identified accurately, and noise reduction algorithms such as echo cancellation, fixed beamforming, and multi-channel adaptive filtering are then used for directional sound pickup enhancement. The resulting estimate of the clean target sound signal is sent to the voice wake-up module for subsequent processing such as voiceprint detection and voice wake-up recognition.

Based on the structure of the electronic device shown in FIG. 5a, and in conjunction with the embodiment shown in FIG. 3a, the processing of the embodiment shown in FIG. 3a in the electronic device shown in FIG. 5a is described below as an example. As shown in Fig. 5b, the interaction between the user and the sensor module includes: the camera captures an image containing a human face, the gravity sensor obtains the gravitational acceleration values of the electronic device in various directions, and the microphone obtains the user's voice signal. The image captured by the camera and the gravitational acceleration values obtained by the gravity sensor are transmitted to the scene analysis module, which obtains the position of the user relative to the electronic device from them and transmits the position to the front-end enhancement module. The sensor module also transmits the sound signal obtained by the microphone to the front-end enhancement module, and the front-end enhancement module extracts the target sound signal according to the position and the sound signal. The target sound signal is a relatively clean voice signal. The target sound signal is transmitted to the voice wake-up module and the voiceprint recognition and confirmation module. The voice wake-up module detects the specific wake-up word, and the voiceprint recognition and confirmation module compares the voiceprint of the target sound signal with the preset user's voiceprint to confirm whether they are consistent; if the voiceprint recognition and confirmation module confirms that the voiceprints are consistent, the voice recognition module interacts with the other interaction modules according to the specific wake-up word extracted by the voice wake-up module.

It can be understood that some or all of the steps or operations in the above-mentioned embodiments are only examples, and the embodiments of the present application may also perform other operations or variations of various operations. In addition, the steps may be executed in an order different from that presented in the foregoing embodiments, and it may not be necessary to perform all of the operations in the foregoing embodiments.

Fig. 6a is a structural diagram of an embodiment of a sound pickup device of this application. As shown in Fig. 6a, the sound pickup device 600 may include:

The position obtaining unit 610 is configured to obtain the position of the user relative to the electronic device; the electronic device is provided with N microphones; N is an integer greater than or equal to 3;

The beam selection unit 620 is configured to select, among the preset fixed beams of the electronic device, the fixed beam closest to the position obtained by the position obtaining unit 610 as the main beam, and to select at least one fixed beam as the secondary beam in order of distance from the position, from farthest to nearest;

The signal calculation unit 630 is configured to, when the N microphones receive a sound signal, calculate the main output signal of the sound signal using the beamforming coefficient of the main beam selected by the beam selection unit 620, and calculate the secondary output signal of the sound signal using the beamforming coefficient of the secondary beam selected by the beam selection unit 620;

The filtering unit 640 is configured to use the secondary output signal calculated by the signal calculation unit 630 to filter the main output signal to obtain a target sound signal.

Wherein, referring to FIG. 6b, the position obtaining unit 610 may include:

The image acquisition subunit 611 is configured to acquire the image captured by the camera of the electronic device;

The position obtaining subunit 612 is configured to: if the face information of the user of the electronic device is recognized from the image obtained by the image acquisition subunit 611, obtain the position of the user relative to the electronic device according to the position information of the face information in the image; and if the face information of the user is not recognized from the image obtained by the image acquisition subunit 611, obtain the placement position of the electronic device and obtain the position of the user relative to the electronic device according to the placement position.

Wherein, referring to FIG. 6c, the beam selection unit 620 may include:

The ratio calculation subunit 621 is configured to calculate the ratio K_k of the position to each fixed beam: K_k = Δ_k / (beam width of fixed beam k), where K_k is the ratio of the position to fixed beam k, and the included angle Δ_k is the angle between the position and the direction of fixed beam k;

The beam selection subunit 622 is configured to select, among the ratios calculated by the ratio calculation subunit 621, the fixed beam corresponding to the smallest ratio as the main beam, and to select, in descending order of the ratios starting from the largest ratio, at least one fixed beam as a secondary beam.

Referring to FIG. 7a, based on the device shown in FIG. 6a, the device 600 may further include:

The beam obtaining unit 650 is configured to obtain beamforming coefficients, directions, and beam widths of M groups of fixed beams, where M is an integer greater than or equal to 2.

Referring to FIG. 7b, the beam obtaining unit 650 may include:

The coordinate system establishment subunit 651 is used to establish a three-dimensional Cartesian coordinate system for the electronic device;

A coordinate obtaining subunit 652, configured to obtain the coordinates of the N microphones in the coordinate system;

The ideal steering vector calculation subunit 653 is configured to calculate the steering vector of the target sound source under ideal conditions according to the coordinates of the N microphones;

A matrix obtaining subunit 654, configured to obtain a frequency domain response matrix of the electronic device housing to the microphone;

The true steering vector calculation subunit 655 is configured to calculate the true steering vector of the target sound source according to the steering vector under the ideal condition and the frequency domain response matrix;

The fixed beam calculation subunit 656 is configured to calculate the beam forming coefficient, direction, and beam width of the preset number of fixed beams according to the real steering vector.

The sound pickup device 600 provided in the embodiment shown in FIGS. 6a to 7b can be used to implement the technical solutions of the method embodiments shown in FIGS. 2 to 4 of this application. For its implementation principles and technical effects, please refer to the related descriptions in the method embodiments.

It should be understood that the division of the various units of the sound pickup device shown in FIGS. 6a to 7b is only a division of logical functions, and may be fully or partially integrated into a physical entity in actual implementation, or may be physically separated. And these units can all be implemented in the form of software invocation through processing elements; they can also be implemented in the form of hardware; part of the units can also be implemented in the form of software invocation through processing elements, and some of the units can be implemented in the form of hardware. For example, the position obtaining unit may be a separately established processing element, or it may be integrated in a certain chip of the electronic device. The implementation of other units is similar. In addition, all or part of these units can be integrated together or implemented independently. In the implementation process, each step of the above method or each of the above units can be completed by an integrated logic circuit of hardware in the processor element or instructions in the form of software.

For example, the above units may be one or more integrated circuits configured to implement the above methods, such as: one or more application specific integrated circuits (Application Specific Integrated Circuit; hereinafter referred to as ASIC), one or more digital signal processors (Digital Signal Processor; hereinafter referred to as DSP), or one or more field programmable gate arrays (Field Programmable Gate Array; hereinafter referred to as FPGA), etc. For another example, these units can be integrated together and implemented in the form of a System-On-a-Chip (hereinafter referred to as SOC).

FIG. 8 is a schematic structural diagram of an embodiment of an electronic device of this application. As shown in FIG. 8, the above-mentioned electronic device may include: a display screen; one or more processors; a memory; and one or more computer programs.

Among them, the above-mentioned display screen may include the display screen of a vehicle-mounted computer (Mobile Data Center); the above-mentioned electronic device may be a mobile terminal (mobile phone), a computer, a PAD, a wearable device, a smart screen, a drone, an intelligent connected vehicle (Intelligent Connected Vehicle; hereinafter referred to as ICV), a smart/intelligent car, or in-vehicle equipment.

The above-mentioned one or more computer programs are stored in the above-mentioned memory, and the above-mentioned one or more computer programs include instructions. When the above-mentioned instructions are executed by the above-mentioned device, the above-mentioned device is caused to perform the following steps:

Obtain the position of the user relative to the electronic device; the electronic device is provided with N microphones; N is an integer greater than or equal to 3;

Among the preset fixed beams of the electronic device, selecting the fixed beam closest to the position as the main beam, and selecting at least one fixed beam as the secondary beam in order of distance from the position, from farthest to nearest;

When the N microphones receive a sound signal, calculating the main output signal of the sound signal using the beamforming coefficient of the main beam, and calculating the secondary output signal of the sound signal using the beamforming coefficient of the secondary beam;

Using the secondary output signal to perform filtering processing on the main output signal to obtain a target sound signal.

In a possible implementation manner, when the instruction is executed by the device, the step of obtaining the user's position relative to the electronic device may include:

Acquiring an image captured by a camera of the electronic device;

If the face information of the user of the electronic device is recognized from the image, obtain the position of the user relative to the electronic device according to the position information of the face information in the image;

If the face information of the user is not recognized from the image, obtain the placement position of the electronic device; obtain the position of the user relative to the electronic device according to the placement position.

In a possible implementation manner, when the instruction is executed by the device, the step of selecting, among the preset fixed beams of the electronic device, the fixed beam closest to the position as the main beam and selecting at least one fixed beam as the secondary beam in order of distance from the position, from farthest to nearest, may include:

Calculating the ratio of the position to each fixed beam, selecting the fixed beam corresponding to the smallest ratio as the main beam, and selecting, in descending order of the ratios starting from the largest ratio, at least one fixed beam as the secondary beam.

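In a possible implementation manner, when the instruction is executed by the device, the following step is further executed before the step of obtaining the user's position relative to the electronic device: obtaining the beamforming coefficients, directions, and beam widths of a preset number of groups of fixed beams.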

In a possible implementation manner, when the instruction is executed by the device, the step of obtaining the beamforming coefficient, direction, and beam width of a preset number of fixed beams may include:

Obtaining the coordinates of the N microphones in the coordinate system;

Calculating the steering vector of the target sound source under ideal conditions according to the coordinates of the N microphones;

Obtaining a frequency domain response matrix of the housing of the electronic device to the microphone;

Calculating the true steering vector of the target sound source according to the steering vector under the ideal condition and the frequency domain response matrix;

The electronic device shown in FIG. 8 may be a terminal device or a circuit device built in the aforementioned terminal device. The device can be used to execute the functions/steps in the methods provided in the embodiments shown in FIGS. 2 to 4 of this application.

The electronic device 800 may include a processor 810, an external memory interface 820, an internal memory 821, a universal serial bus (USB) interface 830, a charging management module 840, a power management module 841, a battery 842, an antenna 1, an antenna 2, a mobile communication module 850, a wireless communication module 860, an audio module 870, a speaker 870A, a receiver 870B, a microphone 870C, an earphone jack 870D, a sensor module 880, buttons 890, a motor 891, an indicator 892, a camera 893, a display 894, and a subscriber identification module (SIM) card interface 895, etc. The sensor module 880 may include a pressure sensor 880A, a gyroscope sensor 880B, an air pressure sensor 880C, a magnetic sensor 880D, an acceleration sensor 880E, a distance sensor 880F, a proximity light sensor 880G, a fingerprint sensor 880H, a temperature sensor 880J, a touch sensor 880K, an ambient light sensor 880L, a bone conduction sensor 880M, etc.

It can be understood that the structure illustrated in the embodiment of the present invention does not constitute a specific limitation on the electronic device 800. In other embodiments of the present application, the electronic device 800 may include more or fewer components than shown, combine certain components, split certain components, or arrange the components differently. The illustrated components can be implemented in hardware, software, or a combination of software and hardware.

The processor 810 may include one or more processing units. For example, the processor 810 may include an application processor (AP), a modem processor, a graphics processing unit (GPU), an image signal processor (ISP), a controller, a video codec, a digital signal processor (DSP), a baseband processor, and/or a neural-network processing unit (NPU). The different processing units may be independent devices or may be integrated in one or more processors.

The controller can generate operation control signals according to the instruction operation code and timing signals to complete the control of fetching instructions and executing instructions.

A memory may also be provided in the processor 810 for storing instructions and data. In some embodiments, the memory in the processor 810 is a cache memory. The memory can store instructions or data that the processor 810 has just used or uses cyclically. If the processor 810 needs to use the instructions or data again, it can fetch them directly from this memory. This avoids repeated accesses, reduces the waiting time of the processor 810, and improves the efficiency of the system.

In some embodiments, the processor 810 may include one or more interfaces. The interfaces may include an inter-integrated circuit (I2C) interface, an inter-integrated circuit sound (I2S) interface, a pulse code modulation (PCM) interface, a universal asynchronous receiver/transmitter (UART) interface, a mobile industry processor interface (MIPI), a general-purpose input/output (GPIO) interface, a subscriber identity module (SIM) interface, and/or a universal serial bus (USB) interface.

The I2C interface is a bidirectional synchronous serial bus, including a serial data line (SDA) and a serial clock line (SCL). In some embodiments, the processor 810 may include multiple sets of I2C buses. The processor 810 may be coupled to the touch sensor 880K, charger, flash, camera 893, etc., respectively through different I2C bus interfaces. For example, the processor 810 may couple the touch sensor 880K through an I2C interface, so that the processor 810 and the touch sensor 880K communicate through the I2C bus interface to implement the touch function of the electronic device 800.

The I2S interface can be used for audio communication. In some embodiments, the processor 810 may include multiple sets of I2S buses. The processor 810 may be coupled with the audio module 870 through an I2S bus to implement communication between the processor 810 and the audio module 870. In some embodiments, the audio module 870 may transmit audio signals to the wireless communication module 860 through the I2S interface, so as to realize the function of answering calls through the Bluetooth headset.

The PCM interface can also be used for audio communication to sample, quantize and encode analog signals. In some embodiments, the audio module 870 and the wireless communication module 860 may be coupled through a PCM bus interface. In some embodiments, the audio module 870 may also transmit audio signals to the wireless communication module 860 through the PCM interface, so as to realize the function of answering calls through the Bluetooth headset. Both the I2S interface and the PCM interface can be used for audio communication.
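The three PCM stages named above (sampling, quantization, encoding) can be illustrated with a short sketch. It assumes 16-bit signed little-endian samples and a hypothetical 1 kHz test tone; the function names and parameters are chosen for the example only and do not describe the PCM bus interface of the electronic device 800.

```python
import math
import struct


def pcm_encode(analog, sample_rate_hz, duration_s, bits=16):
    """Sample a continuous signal, quantize to signed integers, pack little-endian."""
    if bits != 16:
        raise ValueError("this sketch only packs 16-bit samples")
    full_scale = 2 ** (bits - 1) - 1  # 32767 for 16-bit samples
    n_samples = int(round(sample_rate_hz * duration_s))
    codes = []
    for n in range(n_samples):
        value = analog(n / sample_rate_hz)            # sampling
        value = max(-1.0, min(1.0, value))            # clip to [-1, 1]
        codes.append(int(round(value * full_scale)))  # quantization
    return struct.pack("<%dh" % len(codes), *codes)   # encoding


def tone(t):
    """Hypothetical analog input: a 1 kHz sine wave with unit amplitude."""
    return math.sin(2 * math.pi * 1000.0 * t)


frame = pcm_encode(tone, sample_rate_hz=8000, duration_s=0.001)
```

At 8 kHz, one millisecond of signal yields eight samples, so `frame` holds 16 bytes.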

The UART interface is a universal serial data bus used for asynchronous communication. The bus can be a two-way communication bus. It converts the data to be transmitted between serial communication and parallel communication. In some embodiments, the UART interface is usually used to connect the processor 810 and the wireless communication module 860. For example, the processor 810 communicates with the Bluetooth module in the wireless communication module 860 through the UART interface to implement the Bluetooth function. In some embodiments, the audio module 870 may transmit audio signals to the wireless communication module 860 through a UART interface, so as to realize the function of playing music through a Bluetooth headset.

The MIPI interface can be used to connect the processor 810 with the display screen 894, the camera 893 and other peripheral devices. The MIPI interface includes camera serial interface (CSI), display serial interface (DSI) and so on. In some embodiments, the processor 810 and the camera 893 communicate through a CSI interface to implement the shooting function of the electronic device 800. The processor 810 and the display screen 894 communicate through a DSI interface to realize the display function of the electronic device 800.

The GPIO interface can be configured through software. The GPIO interface can be configured as a control signal or as a data signal. In some embodiments, the GPIO interface can be used to connect the processor 810 with the camera 893, the display screen 894, the wireless communication module 860, the audio module 870, the sensor module 880, and so on. The GPIO interface can also be configured as an I2C interface, I2S interface, UART interface, MIPI interface, etc.

The USB interface 830 is an interface that complies with the USB standard specification, and specifically may be a Mini USB interface, a Micro USB interface, a USB Type C interface, and so on. The USB interface 830 can be used to connect a charger to charge the electronic device 800, and can also be used to transfer data between the electronic device 800 and peripheral devices. It can also be used to connect earphones and play audio through earphones. This interface can also be used to connect other electronic devices, such as AR devices.

It can be understood that the interface connection relationship between the modules illustrated in the embodiment of the present invention is merely a schematic description, and does not constitute a structural limitation of the electronic device 800. In other embodiments of the present application, the electronic device 800 may also adopt interface connection modes different from those in the foregoing embodiments, or a combination of multiple interface connection modes.

The charging management module 840 is used to receive charging input from the charger. Among them, the charger can be a wireless charger or a wired charger. In some wired charging embodiments, the charging management module 840 may receive the charging input of the wired charger through the USB interface 830. In some embodiments of wireless charging, the charging management module 840 may receive the wireless charging input through the wireless charging coil of the electronic device 800. While the charging management module 840 charges the battery 842, it can also supply power to the electronic device through the power management module 841.

The power management module 841 is used to connect the battery 842, the charging management module 840 and the processor 810. The power management module 841 receives input from the battery 842 and/or the charging management module 840, and supplies power to the processor 810, the internal memory 821, the display screen 894, the camera 893, and the wireless communication module 860. The power management module 841 can also be used to monitor parameters such as battery capacity, battery cycle times, and battery health status (leakage, impedance). In some other embodiments, the power management module 841 may also be provided in the processor 810. In other embodiments, the power management module 841 and the charging management module 840 may also be provided in the same device.

The wireless communication function of the electronic device 800 can be implemented by the antenna 1, the antenna 2, the mobile communication module 850, the wireless communication module 860, the modem processor, and the baseband processor.

The antenna 1 and the antenna 2 are used to transmit and receive electromagnetic wave signals. Each antenna in the electronic device 800 can be used to cover a single communication frequency band or multiple communication frequency bands. Different antennas can also be multiplexed to improve antenna utilization. For example, the antenna 1 can be multiplexed as a diversity antenna of a wireless local area network. In other embodiments, an antenna can be used in combination with a tuning switch.

The mobile communication module 850 may provide a wireless communication solution including 2G/3G/4G/5G and the like applied to the electronic device 800. The mobile communication module 850 may include at least one filter, a switch, a power amplifier, a low noise amplifier (LNA), and the like. The mobile communication module 850 can receive electromagnetic waves by the antenna 1, filter and amplify the received electromagnetic waves, and transmit them to the modem processor for demodulation. The mobile communication module 850 can also amplify the signal modulated by the modem processor, and convert it to electromagnetic wave radiation via the antenna 1. In some embodiments, at least part of the functional modules of the mobile communication module 850 may be provided in the processor 810. In some embodiments, at least part of the functional modules of the mobile communication module 850 and at least part of the modules of the processor 810 may be provided in the same device.

The modem processor may include a modulator and a demodulator. Among them, the modulator is used to modulate the low frequency baseband signal to be sent into a medium and high frequency signal. The demodulator is used to demodulate the received electromagnetic wave signal into a low-frequency baseband signal. The demodulator then transmits the demodulated low-frequency baseband signal to the baseband processor for processing. After the low-frequency baseband signal is processed by the baseband processor, it is passed to the application processor. The application processor outputs a sound signal through an audio device (not limited to a speaker 870A, a receiver 870B, etc.), or displays an image or video through the display screen 894. In some embodiments, the modem processor may be an independent device. In other embodiments, the modem processor may be independent of the processor 810 and be provided in the same device as the mobile communication module 850 or other functional modules.

The wireless communication module 860 can provide wireless communication solutions applied to the electronic device 800, including wireless local area network (WLAN) (such as a wireless fidelity (Wi-Fi) network), Bluetooth (BT), global navigation satellite system (GNSS), frequency modulation (FM), near field communication (NFC), infrared (IR) technology, and so on. The wireless communication module 860 may be one or more devices integrating at least one communication processing module. The wireless communication module 860 receives electromagnetic waves via the antenna 2, frequency modulates and filters the electromagnetic wave signals, and sends the processed signals to the processor 810. The wireless communication module 860 may also receive a signal to be sent from the processor 810, perform frequency modulation on it, amplify it, and convert it into electromagnetic waves radiated via the antenna 2.

In some embodiments, the antenna 1 of the electronic device 800 is coupled with the mobile communication module 850, and the antenna 2 is coupled with the wireless communication module 860, so that the electronic device 800 can communicate with the network and other devices through wireless communication technology. The wireless communication technology may include global system for mobile communications (GSM), general packet radio service (GPRS), code division multiple access (CDMA), wideband code division multiple access (WCDMA), time-division synchronous code division multiple access (TD-SCDMA), long term evolution (LTE), BT, GNSS, WLAN, NFC, FM, and/or IR technology. The GNSS may include the global positioning system (GPS), the global navigation satellite system (GLONASS), the Beidou navigation satellite system (BDS), the quasi-zenith satellite system (QZSS), and/or satellite-based augmentation systems (SBAS).

The electronic device 800 implements a display function through a GPU, a display screen 894, and an application processor. The GPU is a microprocessor for image processing, which connects the display 894 and the application processor. The GPU is used to perform mathematical and geometric calculations and is used for graphics rendering. The processor 810 may include one or more GPUs, which execute program instructions to generate or change display information.

The display screen 894 is used to display images, videos, and so on. The display screen 894 includes a display panel. The display panel may be a liquid crystal display (LCD), an organic light-emitting diode (OLED), an active-matrix organic light-emitting diode (AMOLED), a flexible light-emitting diode (FLED), a Mini LED, a Micro LED, a Micro OLED, quantum dot light-emitting diodes (QLED), etc. In some embodiments, the electronic device 800 may include one or N display screens 894, where N is a positive integer greater than one.

The electronic device 800 can realize a shooting function through an ISP, a camera 893, a video codec, a GPU, a display screen 894, and an application processor.

The ISP is used to process the data fed back from the camera 893. For example, when a picture is taken, the shutter opens, light passes through the lens to the photosensitive element of the camera, and the light signal is converted into an electrical signal. The photosensitive element then transmits the electrical signal to the ISP, which processes it and converts it into an image visible to the naked eye. The ISP can also optimize the noise, brightness, and skin color of the image, as well as the exposure, color temperature, and other parameters of the shooting scene. In some embodiments, the ISP may be provided in the camera 893.

The camera 893 is used to capture still images or videos. The object generates an optical image through the lens and is projected to the photosensitive element. The photosensitive element may be a charge coupled device (CCD) or a complementary metal-oxide-semiconductor (CMOS) phototransistor. The photosensitive element converts the optical signal into an electrical signal, and then transfers the electrical signal to the ISP to convert it into a digital image signal. ISP outputs digital image signals to DSP for processing. DSP converts digital image signals into standard RGB, YUV and other formats of image signals. In some embodiments, the electronic device 800 may include 1 or N cameras 893, and N is a positive integer greater than 1.

Digital signal processors are used to process digital signals. In addition to digital image signals, they can also process other digital signals. For example, when the electronic device 800 selects a frequency point, the digital signal processor is used to perform Fourier transform on the energy of the frequency point.
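As an illustration of evaluating the energy at one selected frequency point, the following sketch computes a single bin of the discrete Fourier transform and its squared magnitude. It is an assumption made for this example that the frequency point corresponds to a DFT bin index `k`; the function name and the test signal are hypothetical and are not taken from this application.

```python
import cmath
import math


def bin_energy(samples, k):
    """Energy |X[k]|^2 of the k-th DFT bin of a real or complex sequence."""
    n = len(samples)
    acc = 0j
    for i, x in enumerate(samples):
        acc += x * cmath.exp(-2j * math.pi * k * i / n)
    return abs(acc) ** 2


# Hypothetical test signal: a cosine that completes exactly 2 cycles in 16 samples.
signal = [math.cos(2 * math.pi * 2 * i / 16) for i in range(16)]
energy_on_tone = bin_energy(signal, 2)
energy_off_tone = bin_energy(signal, 3)
```

For this test signal the energy concentrates in bin 2 (and its mirror bin 14), while bin 3 is close to zero.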

Video codecs are used to compress or decompress digital video. The electronic device 800 may support one or more video codecs. In this way, the electronic device 800 can play or record videos in multiple encoding formats, such as: moving picture experts group (MPEG) 1, MPEG2, MPEG3, MPEG4, and so on.

NPU is a neural-network (NN) computing processor. By drawing on the structure of biological neural networks, for example, the transfer mode between human brain neurons, it can quickly process input information, and it can also continuously self-learn. Through the NPU, applications such as intelligent cognition of the electronic device 800 can be realized, such as image recognition, face recognition, voice recognition, text understanding, and so on.

The external memory interface 820 may be used to connect an external memory card, such as a Micro SD card, so as to expand the storage capacity of the electronic device 800. The external memory card communicates with the processor 810 through the external memory interface 820 to realize the data storage function. For example, save music, video and other files in an external memory card.

The internal memory 821 may be used to store computer executable program code, where the executable program code includes instructions. The internal memory 821 may include a storage program area and a storage data area. Among them, the storage program area can store an operating system, an application program (such as a sound playback function, an image playback function, etc.) required by at least one function, and the like. The data storage area can store data (such as audio data, phone book, etc.) created during the use of the electronic device 800. In addition, the internal memory 821 may include a high-speed random access memory, and may also include a non-volatile memory, such as at least one magnetic disk storage device, a flash memory device, a universal flash storage (UFS), and the like. The processor 810 executes various functional applications and data processing of the electronic device 800 by running instructions stored in the internal memory 821 and/or instructions stored in a memory provided in the processor.

The electronic device 800 can implement audio functions, such as music playback and recording, through the audio module 870, the speaker 870A, the receiver 870B, the microphone 870C, the earphone interface 870D, and the application processor.

The audio module 870 is used to convert digital audio information into an analog audio signal for output, and is also used to convert an analog audio input into a digital audio signal. The audio module 870 can also be used to encode and decode audio signals. In some embodiments, the audio module 870 may be provided in the processor 810, or part of the functional modules of the audio module 870 may be provided in the processor 810.

The speaker 870A, also called a "loudspeaker", is used to convert audio electrical signals into sound signals. The user can listen to music or to a hands-free call through the speaker 870A of the electronic device 800.

The receiver 870B, also called an "earpiece", is used to convert audio electrical signals into sound signals. When the electronic device 800 answers a call or plays a voice message, the user can hear the voice by holding the receiver 870B close to the ear.

The microphone 870C, also called a "mic" or "mike", is used to convert sound signals into electrical signals. When making a call or sending a voice message, the user can speak with the mouth close to the microphone 870C, so that the sound signal is input to the microphone 870C. The electronic device 800 may be provided with at least one microphone 870C. In other embodiments, the electronic device 800 may be provided with two microphones 870C, which can implement a noise reduction function in addition to collecting sound signals. In some other embodiments, the electronic device 800 may also be provided with three, four, or more microphones 870C to collect sound signals, reduce noise, identify sound sources, and implement directional recording.

The earphone interface 870D is used to connect wired earphones. The earphone interface 870D may be the USB interface 830, a 3.5 mm open mobile terminal platform (OMTP) standard interface, or a cellular telecommunications industry association of the USA (CTIA) standard interface.

The pressure sensor 880A is used to sense a pressure signal and can convert the pressure signal into an electrical signal. In some embodiments, the pressure sensor 880A may be provided on the display screen 894. There are many types of pressure sensors 880A, such as resistive pressure sensors, inductive pressure sensors, and capacitive pressure sensors. A capacitive pressure sensor may include at least two parallel plates made of conductive material. When a force is applied to the pressure sensor 880A, the capacitance between the electrodes changes. The electronic device 800 determines the intensity of the pressure according to the change in capacitance. When a touch operation acts on the display screen 894, the electronic device 800 detects the intensity of the touch operation by means of the pressure sensor 880A. The electronic device 800 may also calculate the touched position according to the detection signal of the pressure sensor 880A. In some embodiments, touch operations that act on the same touch position but have different intensities may correspond to different operation instructions. For example, when a touch operation whose intensity is less than a first pressure threshold acts on the short message application icon, an instruction to view the short message is executed. When a touch operation whose intensity is greater than or equal to the first pressure threshold acts on the short message application icon, an instruction to create a new short message is executed.
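The short message example above amounts to a threshold comparison, which can be sketched as follows. The threshold value, the normalized intensity scale, and the returned action names are hypothetical and serve only to illustrate the mapping from touch intensity to operation instruction.

```python
FIRST_PRESSURE_THRESHOLD = 0.5  # hypothetical normalized intensity


def sms_icon_action(touch_intensity, threshold=FIRST_PRESSURE_THRESHOLD):
    """Map the intensity of a touch on the short message icon to an instruction."""
    if touch_intensity < threshold:
        return "view_message"  # intensity below the first pressure threshold
    return "new_message"       # intensity greater than or equal to the threshold


light_press = sms_icon_action(0.2)
firm_press = sms_icon_action(0.8)
```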

The gyroscope sensor 880B can be used to determine the movement posture of the electronic device 800. In some embodiments, the angular velocity of the electronic device 800 around three axes (i.e., the x, y, and z axes) can be determined by the gyroscope sensor 880B. The gyroscope sensor 880B can be used for image stabilization during shooting. Exemplarily, when the shutter is pressed, the gyroscope sensor 880B detects the jitter angle of the electronic device 800, calculates the distance that the lens module needs to compensate according to the angle, and allows the lens to counteract the jitter of the electronic device 800 through reverse movement, thereby achieving stabilization. The gyroscope sensor 880B can also be used in navigation and somatosensory game scenarios.
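The anti-shake computation described above can be sketched under simplifying assumptions. The sketch assumes that the jitter angle is obtained by integrating angular-velocity samples over a fixed sampling interval, and that the image shift for a rotation by an angle θ is approximately f·tan(θ) for a lens of focal length f; neither formula is stated in this application, and the function names are hypothetical.

```python
import math


def jitter_angle_deg(angular_velocity_dps, dt_s):
    """Integrate angular-velocity samples (degrees per second) into an angle."""
    return sum(w * dt_s for w in angular_velocity_dps)


def lens_compensation_mm(angle_deg, focal_length_mm):
    """Lens shift that opposes the image shift caused by the rotation (assumed model)."""
    return -focal_length_mm * math.tan(math.radians(angle_deg))


angle = jitter_angle_deg([10.0, 10.0, 10.0], dt_s=0.01)  # about 0.3 degrees
shift = lens_compensation_mm(angle, focal_length_mm=4.0)
```

The negative sign expresses the reverse movement of the lens relative to the detected jitter.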

The air pressure sensor 880C is used to measure air pressure. In some embodiments, the electronic device 800 uses the air pressure value measured by the air pressure sensor 880C to calculate the altitude to assist positioning and navigation.
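One common way to estimate altitude from a pressure reading is the international barometric formula, sketched below. This application does not state which formula is used, so the formula, the sea-level reference pressure, and the function name are assumptions made for the example.

```python
def altitude_m(pressure_hpa, sea_level_hpa=1013.25):
    """Approximate altitude in metres from air pressure (standard-atmosphere model)."""
    return 44330.0 * (1.0 - (pressure_hpa / sea_level_hpa) ** (1.0 / 5.255))


at_sea_level = altitude_m(1013.25)
slightly_higher = altitude_m(1000.0)  # roughly 110 m under this model
```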

The magnetic sensor 880D includes a Hall sensor. The electronic device 800 can use the magnetic sensor 880D to detect the opening and closing of a flip leather case. In some embodiments, when the electronic device 800 is a flip phone, the electronic device 800 can detect the opening and closing of the flip cover by means of the magnetic sensor 880D. Features such as automatic unlocking upon opening the flip cover can then be set according to the detected opening and closing state of the leather case or of the flip cover.

The acceleration sensor 880E can detect the magnitude of the acceleration of the electronic device 800 in various directions (generally along three axes). When the electronic device 800 is stationary, the magnitude and direction of gravity can be detected. The acceleration sensor 880E can also be used to identify the posture of the electronic device, and is applied in scenarios such as switching between landscape and portrait modes and pedometers.

The distance sensor 880F is used to measure distance. The electronic device 800 can measure distance by infrared or laser. In some embodiments, when shooting a scene, the electronic device 800 may use the distance sensor 880F to measure distance so as to achieve fast focusing.

The proximity light sensor 880G may include, for example, a light emitting diode (LED) and a light detector, such as a photodiode. The light emitting diode may be an infrared light emitting diode. The electronic device 800 emits infrared light to the outside through the light emitting diode. The electronic device 800 uses the photodiode to detect infrared light reflected from nearby objects. When sufficient reflected light is detected, the electronic device 800 can determine that there is an object nearby. When insufficient reflected light is detected, the electronic device 800 can determine that there is no object nearby. The electronic device 800 can use the proximity light sensor 880G to detect that the user is holding the electronic device 800 close to the ear during a call, so as to automatically turn off the screen and save power. The proximity light sensor 880G can also be used in leather case mode and pocket mode to automatically unlock and lock the screen.

The ambient light sensor 880L is used to sense the brightness of the ambient light. The electronic device 800 can adaptively adjust the brightness of the display screen 894 according to the perceived brightness of the ambient light. The ambient light sensor 880L can also be used to automatically adjust the white balance when taking pictures. The ambient light sensor 880L can also cooperate with the proximity light sensor 880G to detect whether the electronic device 800 is in the pocket to prevent accidental touch.

The fingerprint sensor 880H is used to collect fingerprints. The electronic device 800 can use the collected fingerprint characteristics to implement fingerprint unlocking, application lock access, fingerprint-triggered photographing, fingerprint-based call answering, and so on.

The temperature sensor 880J is used to detect temperature. In some embodiments, the electronic device 800 uses the temperature detected by the temperature sensor 880J to execute a temperature processing strategy. For example, when the temperature reported by the temperature sensor 880J exceeds a threshold, the electronic device 800 reduces the performance of the processor located near the temperature sensor 880J, so as to reduce power consumption and implement thermal protection. In other embodiments, when the temperature is lower than another threshold, the electronic device 800 heats the battery 842 to avoid abnormal shutdown of the electronic device 800 due to low temperature. In some other embodiments, when the temperature is lower than yet another threshold, the electronic device 800 boosts the output voltage of the battery 842 to avoid abnormal shutdown caused by low temperature.
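The temperature processing strategy described above can be sketched as a set of threshold checks. The three threshold values and the action names below are hypothetical; this application only states that distinct thresholds exist, not their values or how the embodiments are combined.

```python
def thermal_actions(temp_c, high_c=45.0, low_heat_c=0.0, low_boost_c=-10.0):
    """Return the actions a temperature strategy might take (thresholds assumed)."""
    actions = []
    if temp_c > high_c:
        actions.append("throttle_processor")     # reduce power, thermal protection
    if temp_c < low_heat_c:
        actions.append("heat_battery")           # avoid low-temperature shutdown
    if temp_c < low_boost_c:
        actions.append("boost_battery_voltage")  # avoid low-temperature shutdown
    return actions


hot = thermal_actions(50.0)
normal = thermal_actions(20.0)
very_cold = thermal_actions(-20.0)
```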

The touch sensor 880K is also called a "touch device". The touch sensor 880K may be arranged on the display screen 894, and the touch sensor 880K and the display screen 894 together form a touchscreen, also called a "touch-control screen". The touch sensor 880K is used to detect touch operations acting on or near it. The touch sensor can pass the detected touch operation to the application processor to determine the type of touch event. Visual output related to the touch operation may be provided through the display screen 894. In other embodiments, the touch sensor 880K may also be disposed on the surface of the electronic device 800, at a position different from that of the display screen 894.

The bone conduction sensor 880M can acquire vibration signals. In some embodiments, the bone conduction sensor 880M can obtain the vibration signal of the bone that vibrates when a person speaks. The bone conduction sensor 880M can also be placed in contact with the human pulse to receive the blood pressure pulse signal. In some embodiments, the bone conduction sensor 880M may also be provided in an earphone to form a bone conduction earphone. The audio module 870 can parse out a voice signal based on the vibration signal of the vibrating bone obtained by the bone conduction sensor 880M, so as to implement a voice function. The application processor can parse heart rate information based on the blood pressure pulse signal obtained by the bone conduction sensor 880M, so as to implement a heart rate detection function.

The button 890 includes a power button, a volume button, and so on. The button 890 may be a mechanical button. It can also be a touch button. The electronic device 800 may receive key input, and generate key signal input related to user settings and function control of the electronic device 800.

The motor 891 can generate vibration prompts. The motor 891 can be used for incoming call vibration notification, and can also be used for touch vibration feedback. For example, touch operations applied to different applications (such as photo taking and audio playback) can correspond to different vibration feedback effects. The motor 891 can also produce different vibration feedback effects for touch operations acting on different areas of the display screen 894. Different application scenarios (for example, time reminders, receiving information, alarm clocks, and games) can also correspond to different vibration feedback effects. The touch vibration feedback effect can also be customized.

The indicator 892 can be an indicator light, which can be used to indicate the charging status and battery level changes, and can also be used to indicate messages, missed calls, notifications, and so on.

The SIM card interface 895 is used to connect to a SIM card. The SIM card can be inserted into the SIM card interface 895 or pulled out from the SIM card interface 895 to achieve contact with and separation from the electronic device 800. The electronic device 800 may support one or N SIM card interfaces, where N is a positive integer greater than 1. The SIM card interface 895 can support Nano SIM cards, Micro SIM cards, SIM cards, etc. Multiple cards can be inserted into the same SIM card interface 895 at the same time. The types of the multiple cards can be the same or different. The SIM card interface 895 can also be compatible with different types of SIM cards and with external memory cards. The electronic device 800 interacts with the network through the SIM card to implement functions such as calls and data communication. In some embodiments, the electronic device 800 uses an eSIM, that is, an embedded SIM card. The eSIM card can be embedded in the electronic device 800 and cannot be separated from the electronic device 800.

It should be understood that the electronic device 800 shown in FIG. 8 can implement various processes of the methods provided in the embodiments shown in FIGS. 2 to 4 of this application. The operations and/or functions of each module in the electronic device 800 are respectively for implementing the corresponding processes in the foregoing method embodiments. For details, please refer to the descriptions in the method embodiments shown in Figs. 2 to 4 of this application. To avoid repetition, detailed descriptions are appropriately omitted here.

It should be understood that the processor 810 in the electronic device 800 shown in FIG. 8 may be a system on chip (SoC). The processor 810 may include a central processing unit (CPU), and may further include other types of processors, such as a graphics processing unit (GPU).

In short, the processors or processing units inside the processor 810 can cooperate to implement the foregoing method flow, and the corresponding software program of each processor or processing unit can be stored in the internal memory 821.

The present application also provides an electronic device. The device includes a storage medium and a central processing unit. The storage medium may be a non-volatile storage medium in which a computer executable program is stored. The central processing unit is connected to the non-volatile storage medium and executes the computer executable program to implement the methods provided by the embodiments shown in FIGS. 2 to 4 of this application.

In the above embodiments, the processors involved may include, for example, a CPU, a digital signal processor (DSP), or a microcontroller, and may also include a GPU, an embedded neural-network processing unit (hereinafter referred to as NPU), and an image signal processor (hereinafter referred to as ISP). The processors may further include necessary hardware accelerators or logic processing hardware circuits, such as an ASIC, or one or more integrated circuits used to control the execution of the technical solutions of this application. In addition, the processor may have the function of running one or more software programs, and the software programs may be stored in a storage medium.

The embodiments of the present application also provide a computer-readable storage medium that stores a computer program which, when running on a computer, causes the computer to execute the method provided by the embodiments shown in FIGS. 2 to 4 of the present application.

The embodiments of the present application also provide a computer program product. The computer program product includes a computer program that, when running on a computer, causes the computer to execute the method provided by the embodiments shown in FIGS. 2 to 4 of the present application.

In the embodiments of the present application, "at least one" refers to one or more, and "multiple" refers to two or more. "And/or" describes the association relationship of the associated objects and indicates that three relationships are possible. For example, A and/or B can mean that A exists alone, that A and B exist at the same time, or that B exists alone, where A and B can each be singular or plural. The character "/" generally indicates that the associated objects before and after it are in an "or" relationship. "At least one of the following items" and similar expressions refer to any combination of these items, including any combination of single items or plural items. For example, at least one of a, b, and c can represent: a, b, c, a and b, a and c, b and c, or a and b and c, where each of a, b, and c can be single or multiple.
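As a quick illustration of the "at least one of a, b, and c" reading given above, the following Python sketch enumerates every non-empty combination of three items. The function name `at_least_one_of` is hypothetical and is used only for this example.

```python
from itertools import combinations


def at_least_one_of(items):
    """Return every non-empty combination of the given items, smallest first."""
    result = []
    for size in range(1, len(items) + 1):
        result.extend(combinations(items, size))
    return result


combos = at_least_one_of(["a", "b", "c"])
for combo in combos:
    # Prints one combination per line, e.g. "a and b"
    print(" and ".join(combo))
```

The seven combinations produced correspond one-to-one to the seven cases listed in the paragraph above.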

A person of ordinary skill in the art may be aware that the units and algorithm steps described in the embodiments disclosed herein can be implemented by electronic hardware, or by a combination of computer software and electronic hardware. Whether these functions are executed by hardware or software depends on the specific application and the design constraints of the technical solution. A person skilled in the art can use different methods to implement the described functions for each specific application, but such implementation should not be considered beyond the scope of this application.

Those skilled in the art can clearly understand that, for the convenience and conciseness of description, the specific working process of the system, device and unit described above can refer to the corresponding process in the foregoing method embodiment, which will not be repeated here.

In the several embodiments provided in this application, if any function is implemented in the form of a software functional unit and sold or used as an independent product, it can be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present application in essence, or the part that contributes to the prior art, or a part of the technical solution, can be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions that cause a computer device (which may be a personal computer, a server, a network device, or the like) to execute all or part of the steps of the methods described in the various embodiments of the present application. The aforementioned storage medium includes any medium that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (Read-Only Memory; hereinafter referred to as ROM), a random access memory (Random Access Memory; hereinafter referred to as RAM), a magnetic disk, or an optical disk.

The above are only specific implementations of this application. Any change or substitution that a person skilled in the art can readily conceive of within the technical scope disclosed in this application shall fall within the protection scope of this application. The protection scope of this application shall be subject to the protection scope of the claims.

Claims

A sound pickup method, characterized in that it comprises:

Obtaining the azimuth of the user relative to the electronic device, wherein the electronic device is provided with N microphones, and N is an integer greater than or equal to 3;

Selecting, among the preset fixed beams of the electronic device, the fixed beam closest to the azimuth as the main beam, and selecting at least one fixed beam as a secondary beam in order of distance from the azimuth, from farthest to nearest;

When the N microphones receive a sound signal, calculating the main output signal of the sound signal using the beamforming coefficients of the main beam, and calculating the secondary output signal of the sound signal using the beamforming coefficients of the secondary beam;

Using the secondary output signal to perform filtering processing on the main output signal to obtain a target sound signal.
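The last two steps of the method above can be sketched in Python for a single frequency bin. This is only an illustrative sketch under stated assumptions: the claim does not name the filter, so a normalized least-mean-squares (NLMS) canceller is assumed here, and the names `beam_output`, `filter_main_with_secondary`, `mu`, and `eps` are hypothetical.

```python
def beam_output(coeffs, mic_bins):
    """Fixed beamformer output for one frequency bin: y = w^H x."""
    return sum(w.conjugate() * x for w, x in zip(coeffs, mic_bins))


def filter_main_with_secondary(main_frames, secondary_frames, mu=0.5, eps=1e-9):
    """Assumed NLMS canceller for one frequency bin.

    main_frames: complex main-beam outputs, one per frame.
    secondary_frames: per frame, a list of complex secondary-beam outputs.
    Returns the per-frame residual, taken here as the target sound signal.
    """
    taps = [0j] * len(secondary_frames[0])
    target = []
    for y_main, y_sec in zip(main_frames, secondary_frames):
        # Part of the main output that is predictable from the secondary beams
        estimate = sum(h.conjugate() * s for h, s in zip(taps, y_sec))
        error = y_main - estimate
        # Normalized step: h <- h + mu * u * conj(e) / (||u||^2 + eps)
        power = sum(abs(s) ** 2 for s in y_sec) + eps
        taps = [h + mu * s * error.conjugate() / power for h, s in zip(taps, y_sec)]
        target.append(error)
    return target


# Toy check: the main beam output contains only leaked interference,
# which is also observed by one secondary beam.
interference = [complex(1, 1), complex(2, -1), complex(-1, 0.5)] * 10
main = [0.5 * s for s in interference]
secondary = [[s] for s in interference]
residual = filter_main_with_secondary(main, secondary)
```

In this toy case the residual shrinks frame by frame as the canceller learns the leakage factor, which mirrors the intent of filtering the main output signal with the secondary output signal.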
The method according to claim 1, wherein obtaining the azimuth of the user relative to the electronic device comprises:

Acquiring an image captured by a camera of the electronic device;

If the face information of the user of the electronic device is recognized from the image, obtaining the azimuth of the user relative to the electronic device according to the position information of the face information in the image;

If the face information of the user is not recognized from the image, obtaining the placement position of the electronic device, and obtaining the azimuth of the user relative to the electronic device according to the placement position.
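The face-first, placement-fallback logic above can be sketched as follows. Everything specific in this sketch is an assumption made for illustration: the pinhole camera model, the horizontal field of view parameter, the function names, and the `placement_azimuths` lookup table are not taken from this application.

```python
import math


def azimuth_from_face(face_center_x, image_width, horizontal_fov_deg):
    """Horizontal angle (degrees) of the face relative to the camera axis.

    Assumes an ideal pinhole camera whose optical axis passes through
    the image center.
    """
    half_width = image_width / 2
    focal_px = half_width / math.tan(math.radians(horizontal_fov_deg) / 2)
    return math.degrees(math.atan((face_center_x - half_width) / focal_px))


def user_azimuth(face_center_x, image_width, horizontal_fov_deg,
                 placement, placement_azimuths):
    """Use the face position when a face was recognized, else the placement.

    face_center_x is None when no face was recognized in the image.
    placement_azimuths is a hypothetical table mapping a placement label
    to an assumed user azimuth in degrees.
    """
    if face_center_x is not None:
        return azimuth_from_face(face_center_x, image_width, horizontal_fov_deg)
    return placement_azimuths[placement]


# Hypothetical values, for illustration only
table = {"upright": 0.0, "on_stand": 30.0}
centered = user_azimuth(960, 1920, 80.0, "upright", table)
fallback = user_azimuth(None, 1920, 80.0, "on_stand", table)
```

A face at the image center maps to the camera axis, and a face at the image edge maps to half the field of view; when no face is found, the table entry for the current placement is returned instead.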
The method according to claim 1 or 2, characterized in that selecting, among the preset fixed beams of the electronic device, the fixed beam closest to the azimuth as the main beam, and selecting at least one fixed beam as a secondary beam in order of distance from the azimuth, from farthest to nearest, comprises:

Calculating the ratio K of the azimuth to each fixed beam: $K_k = \Delta_k / (\text{beam width})_k$, where $K_k$ is the ratio of the azimuth to fixed beam k, $\Delta_k$ is the included angle between the azimuth and the direction of fixed beam k, and $(\text{beam width})_k$ is the beam width of fixed beam k; k = 1, 2, ..., M; M is the number of fixed beam groups;

Selecting the fixed beam corresponding to the smallest ratio as the main beam, and selecting, in descending order of the ratio starting from the largest ratio, at least one fixed beam as a secondary beam.
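The ratio-based selection above can be sketched in Python. The sketch assumes that the azimuth and beam directions are angles in degrees in a single plane with wrap-around at 360 degrees; the names `angular_difference`, `select_beams`, and `num_secondary` are hypothetical.

```python
def angular_difference(a_deg, b_deg):
    """Smallest included angle between two planar directions, in degrees."""
    diff = abs(a_deg - b_deg) % 360.0
    return min(diff, 360.0 - diff)


def select_beams(azimuth_deg, directions_deg, widths_deg, num_secondary=1):
    """Pick the main beam (smallest ratio) and secondary beams (largest ratios).

    The ratio for beam k is the included angle between the azimuth and the
    beam direction, divided by the beam width of that beam.
    """
    ratios = [
        angular_difference(azimuth_deg, direction) / width
        for direction, width in zip(directions_deg, widths_deg)
    ]
    order = sorted(range(len(ratios)), key=lambda k: ratios[k])
    main = order[0]
    # Walk from the largest ratio downwards, skipping the main beam
    secondary = [k for k in reversed(order) if k != main][:num_secondary]
    return main, secondary, ratios


# Four hypothetical beams, 60 degrees wide, with the user at 80 degrees
main, secondary, ratios = select_beams(
    80.0, [0.0, 90.0, 180.0, 270.0], [60.0, 60.0, 60.0, 60.0], num_secondary=2
)
```

Dividing by the beam width means a wide beam can win over a narrow one even when its center direction is slightly farther from the user, which appears to be the purpose of using a ratio rather than the raw angle.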
The method according to claim 1 or 2, wherein before obtaining the azimuth of the user relative to the electronic device, the method further comprises:

Obtaining the beamforming coefficients, directions, and beam widths of M groups of fixed beams, where M is an integer greater than or equal to 2.
The method according to claim 4, wherein the obtaining the beam forming coefficient, direction, and beam width of a preset number of fixed beams comprises:

Establishing a three-dimensional Cartesian coordinate system for the electronic device;

Obtaining the coordinates of the N microphones in the coordinate system;

Calculating the steering vector of the target sound source under ideal conditions according to the coordinates of the N microphones;

Obtaining a frequency domain response matrix of the housing of the electronic device to the microphone;

Calculating the true steering vector of the target sound source according to the steering vector under the ideal condition and the frequency domain response matrix;

Calculating the beamforming coefficient, direction, and beam width of the preset number of fixed beams according to the true steering vector.
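The steering-vector steps above can be sketched in Python for one frequency. Several choices here are assumptions and are not taken from this application: a far-field plane-wave model, a speed of sound of 343 m/s, combining the housing response with the ideal steering vector by a matrix product, and a simple matched (delay-and-sum style) beamformer for the coefficients. Direction and beam width estimation are omitted. All function names are hypothetical.

```python
import cmath
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed value


def ideal_steering_vector(mic_coords, azimuth_deg, elevation_deg, freq_hz):
    """Far-field steering vector from microphone coordinates in meters."""
    az, el = math.radians(azimuth_deg), math.radians(elevation_deg)
    # Unit vector pointing from the device towards the source
    u = (math.cos(el) * math.cos(az), math.cos(el) * math.sin(az), math.sin(el))
    vector = []
    for p in mic_coords:
        # Time by which this microphone hears the wave before the origin
        advance = sum(pi * ui for pi, ui in zip(p, u)) / SPEED_OF_SOUND
        vector.append(cmath.exp(2j * math.pi * freq_hz * advance))
    return vector


def true_steering_vector(response_matrix, ideal_vector):
    """Assumed combination: housing response matrix times ideal vector."""
    return [sum(h * a for h, a in zip(row, ideal_vector)) for row in response_matrix]


def matched_beam_coeffs(steering_vector):
    """Coefficients w = a / (a^H a), so that w^H a = 1 in the look direction."""
    norm = sum(abs(a) ** 2 for a in steering_vector)
    return [a / norm for a in steering_vector]


# Three hypothetical microphones and an identity housing response
mics = [(0.0, 0.0, 0.0), (0.1, 0.0, 0.0), (0.0, 0.1, 0.0)]
identity = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
a_ideal = ideal_steering_vector(mics, 90.0, 0.0, 1000.0)
a_true = true_steering_vector(identity, a_ideal)
w = matched_beam_coeffs(a_true)
gain = sum(wi.conjugate() * ai for wi, ai in zip(w, a_true))
```

With an identity response matrix the true steering vector equals the ideal one; a measured housing response would replace `identity` in practice.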
A sound pickup device, characterized in that it comprises:

An azimuth obtaining unit, configured to obtain the azimuth of the user relative to the electronic device, wherein the electronic device is provided with N microphones, and N is an integer greater than or equal to 3;

A beam selection unit, configured to select, among the preset fixed beams of the electronic device, the fixed beam closest to the azimuth obtained by the azimuth obtaining unit as the main beam, and to select at least one fixed beam as a secondary beam in order of distance from the azimuth, from farthest to nearest;

A signal calculation unit, configured to, when the N microphones receive a sound signal, calculate the main output signal of the sound signal using the beamforming coefficients of the main beam selected by the beam selection unit, and calculate the secondary output signal of the sound signal using the beamforming coefficients of the secondary beam selected by the beam selection unit;

A filtering unit, configured to perform filtering processing on the main output signal using the secondary output signal calculated by the signal calculation unit to obtain a target sound signal.
The device according to claim 6, wherein the azimuth obtaining unit comprises:

An image acquisition subunit, configured to acquire an image captured by a camera of the electronic device;

An azimuth obtaining subunit, configured to: if the face information of the user of the electronic device is recognized from the image acquired by the image acquisition subunit, obtain the azimuth of the user relative to the electronic device according to the position information of the face information in the image; and if the face information of the user is not recognized from the image acquired by the image acquisition subunit, obtain the placement position of the electronic device, and obtain the azimuth of the user relative to the electronic device according to the placement position.
The device according to claim 6 or 7, wherein the beam selection unit comprises:

A ratio calculation subunit, configured to calculate the ratio K of the azimuth to each fixed beam: $K_k = \Delta_k / (\text{beam width})_k$, where $K_k$ is the ratio of the azimuth to fixed beam k, $\Delta_k$ is the included angle between the azimuth and the direction of fixed beam k, and $(\text{beam width})_k$ is the beam width of fixed beam k; k = 1, 2, ..., M; M is the number of fixed beam groups;

A beam selection subunit, configured to select, among the ratios calculated by the ratio calculation subunit, the fixed beam corresponding to the smallest ratio as the main beam, and to select, in descending order of the ratio starting from the largest ratio, at least one fixed beam as a secondary beam.
The device according to claim 6 or 7, further comprising:

The beam obtaining unit is used to obtain the beam forming coefficients, directions, and beam widths of M groups of fixed beams, where M is an integer greater than or equal to 2.
The apparatus according to claim 9, wherein the beam obtaining unit comprises:

A coordinate system establishment subunit, configured to establish a three-dimensional Cartesian coordinate system for the electronic device;

A coordinate obtaining subunit, configured to obtain the coordinates of the N microphones in the coordinate system;

The ideal steering vector calculation subunit is used to calculate the steering vector of the target sound source under ideal conditions according to the coordinates of the N microphones;

A matrix obtaining subunit for obtaining a frequency domain response matrix of the electronic device housing to the microphone;

A true steering vector calculation subunit, configured to calculate the true steering vector of the target sound source according to the steering vector under the ideal condition and the frequency domain response matrix;

A fixed beam calculation subunit, configured to calculate the beamforming coefficient, direction, and beam width of the preset number of fixed beams according to the true steering vector.
An electronic device, characterized in that it comprises:

A display screen; one or more processors; a memory; a plurality of application programs; and one or more computer programs, wherein the one or more computer programs are stored in the memory, the one or more computer programs include instructions, and the instructions, when executed by the device, cause the device to perform the following steps:

Obtaining the azimuth of the user relative to the electronic device, wherein the electronic device is provided with N microphones, and N is an integer greater than or equal to 3;

Selecting, among the preset fixed beams of the electronic device, the fixed beam closest to the azimuth as the main beam, and selecting at least one fixed beam as a secondary beam in order of distance from the azimuth, from farthest to nearest;

When the N microphones receive a sound signal, calculating the main output signal of the sound signal using the beamforming coefficients of the main beam, and calculating the secondary output signal of the sound signal using the beamforming coefficients of the secondary beam;

Using the secondary output signal to perform filtering processing on the main output signal to obtain a target sound signal.
The electronic device according to claim 11, wherein, when the instructions are executed by the device, the step of obtaining the azimuth of the user relative to the electronic device comprises:

Acquiring an image captured by a camera of the electronic device;

If the face information of the user of the electronic device is recognized from the image, obtaining the azimuth of the user relative to the electronic device according to the position information of the face information in the image;

If the face information of the user is not recognized from the image, obtaining the placement position of the electronic device, and obtaining the azimuth of the user relative to the electronic device according to the placement position.
The electronic device according to claim 11 or 12, wherein, when the instructions are executed by the device, the step of selecting, among the preset fixed beams of the electronic device, the fixed beam closest to the azimuth as the main beam, and selecting at least one fixed beam as a secondary beam in order of distance from the azimuth, from farthest to nearest, comprises:

Calculating the ratio K of the azimuth to each fixed beam: $K_k = \Delta_k / (\text{beam width})_k$, where $K_k$ is the ratio of the azimuth to fixed beam k, $\Delta_k$ is the included angle between the azimuth and the direction of fixed beam k, and $(\text{beam width})_k$ is the beam width of fixed beam k; k = 1, 2, ..., M; M is the number of fixed beam groups;

Selecting the fixed beam corresponding to the smallest ratio as the main beam, and selecting, in descending order of the ratio starting from the largest ratio, at least one fixed beam as a secondary beam.
The electronic device according to claim 11 or 12, wherein, when the instructions are executed by the device, the following step is performed before the step of obtaining the azimuth of the user relative to the electronic device:

Obtaining the beamforming coefficients, directions, and beam widths of M groups of fixed beams, where M is an integer greater than or equal to 2.
The electronic device according to claim 14, wherein when the instruction is executed by the device, the step of obtaining the beamforming coefficient, direction, and beam width of a preset number of fixed beams comprises:

Establishing a three-dimensional Cartesian coordinate system for the electronic device;

Obtaining the coordinates of the N microphones in the coordinate system;

Calculating the steering vector of the target sound source under ideal conditions according to the coordinates of the N microphones;

Obtaining a frequency domain response matrix of the housing of the electronic device to the microphone;

Calculating the true steering vector of the target sound source according to the steering vector under the ideal condition and the frequency domain response matrix;

Calculating the beamforming coefficient, direction, and beam width of the preset number of fixed beams according to the true steering vector.
A computer-readable storage medium, characterized in that a computer program is stored in the computer-readable storage medium, which when running on a computer, causes the computer to execute the method according to any one of claims 1 to 5.