
WO2021180085A1 - Sound pickup method and apparatus and electronic device - Google Patents

Sound pickup method and apparatus and electronic device Download PDF

Info

Publication number
WO2021180085A1
WO2021180085A1 (PCT/CN2021/079789)
Authority
WO
WIPO (PCT)
Prior art keywords
electronic device
fixed
azimuth
user
ratio
Prior art date
Application number
PCT/CN2021/079789
Other languages
French (fr)
Chinese (zh)
Inventor
黄磊
鲍光照
缪海波
Original Assignee
Huawei Technologies Co., Ltd.
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Huawei Technologies Co., Ltd.
Publication of WO2021180085A1 publication Critical patent/WO2021180085A1/en

Links

Images

Classifications

    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L21/0208Noise filtering
    • G10L21/0216Noise filtering characterised by the method used for estimating noise
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/70Determining position or orientation of objects or cameras
    • G06T7/73Determining position or orientation of objects or cameras using feature-based methods
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L21/0208Noise filtering
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04MTELEPHONIC COMMUNICATION
    • H04M1/00Substation equipment, e.g. for use by subscribers
    • H04M1/02Constructional features of telephone sets
    • H04M1/19Arrangements of transmitters, receivers, or complete sets to prevent eavesdropping, to attenuate local noise or to prevent undesired transmission; Mouthpieces or receivers specially adapted therefor
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04MTELEPHONIC COMMUNICATION
    • H04M1/00Substation equipment, e.g. for use by subscribers
    • H04M1/72Mobile telephones; Cordless telephones, i.e. devices for establishing wireless links to base stations without route selection
    • H04M1/724User interfaces specially adapted for cordless or mobile telephones
    • H04M1/72448User interfaces specially adapted for cordless or mobile telephones with means for adapting the functionality of the device according to specific conditions
    • H04M1/72454User interfaces specially adapted for cordless or mobile telephones with means for adapting the functionality of the device according to specific conditions according to context-related or environment-related conditions
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04RLOUDSPEAKERS, MICROPHONES, GRAMOPHONE PICK-UPS OR LIKE ACOUSTIC ELECTROMECHANICAL TRANSDUCERS; DEAF-AID SETS; PUBLIC ADDRESS SYSTEMS
    • H04R1/00Details of transducers, loudspeakers or microphones
    • H04R1/10Earpieces; Attachments therefor ; Earphones; Monophonic headphones
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/10Image acquisition modality
    • G06T2207/10004Still image; Photographic image
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/10Image acquisition modality
    • G06T2207/10016Video; Image sequence
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/30Subject of image; Context of image processing
    • G06T2207/30196Human being; Person
    • G06T2207/30201Face
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L21/0208Noise filtering
    • G10L21/0216Noise filtering characterised by the method used for estimating noise
    • G10L2021/02161Number of inputs available containing the signal or the noise to be suppressed
    • G10L2021/02166Microphone arrays; Beamforming
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04MTELEPHONIC COMMUNICATION
    • H04M2201/00Electronic components, circuits, software, systems or apparatus used in telephone systems
    • H04M2201/40Electronic components, circuits, software, systems or apparatus used in telephone systems using speech recognition
    • H04M2201/405Electronic components, circuits, software, systems or apparatus used in telephone systems using speech recognition involving speaker-dependent recognition
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04MTELEPHONIC COMMUNICATION
    • H04M2250/00Details of telephonic subscriber devices
    • H04M2250/12Details of telephonic subscriber devices including a sensor for measuring a physical value, e.g. temperature or motion
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04MTELEPHONIC COMMUNICATION
    • H04M2250/00Details of telephonic subscriber devices
    • H04M2250/74Details of telephonic subscriber devices with voice recognition means

Definitions

  • This application relates to the technical field of smart terminals, in particular to methods, devices and electronic equipment for picking up sound.
  • the above-mentioned human-computer interaction process generally includes: using a microphone of an electronic device to pick up an audio signal; using a front-end enhancement algorithm to estimate a clean voice signal from the audio signal; using the voice signal for voice wake-up and voice recognition.
  • the front-end enhancement algorithm mainly extracts clean speech signals through noise cancellation.
  • Noise cancellation includes echo cancellation, interference suppression, and background noise removal.
  • the echo that needs to be eliminated in echo cancellation is generally the sound played by the electronic device's own loudspeaker during human-computer interaction.
  • the interference in interference suppression is generally directional noise, such as the TV sound in the living room environment, the car horn in the car environment, and so on.
  • the performance of the front-end enhancement algorithm directly affects the success rate of human-computer interaction, and ultimately affects the user experience.
  • the front-end enhancement algorithm mainly uses the microphones on the mobile phone to eliminate noise. Considering the limitations of power consumption and computing resources, in most cases only one microphone is used for noise reduction; such an algorithm is called a single-channel noise reduction algorithm. Common single-channel noise reduction algorithms include spectral subtraction, Wiener filtering, and deep-learning methods. The single-channel noise reduction algorithm is ineffective against unpredictable non-stationary noise, and speech distortion is serious under low signal-to-noise-ratio conditions.
  • dual-channel noise reduction algorithms based on two microphones are becoming more and more popular in electronic devices; they are mainly used in scenarios that are not sensitive to power consumption, such as in-vehicle scenarios where users can charge their electronic devices at any time.
  • the main idea of the dual-channel noise reduction algorithm is to select one microphone as the main microphone and one microphone as the auxiliary microphone: first, the time-frequency information of the noise in the main-microphone data is determined with a harmonic detection algorithm for the human voice, and then, following the idea of filtering, the noise picked up by the auxiliary microphone is used to filter the noise out of the main microphone, which improves voice quality and achieves noise reduction.
  • however, the harmonic detection algorithm cannot distinguish interfering human voices from the target human voice containing the wake-up word, so human voice interference is basically difficult to eliminate.
  • the embodiment of the present application provides a sound pickup method to alleviate the problem of voice distortion and incomplete elimination of human voice interference.
  • an embodiment of the present application provides a sound pickup method, including:
  • the electronic device is equipped with N microphones; N is an integer greater than or equal to 3; the above-mentioned electronic device may include a mobile terminal (mobile phone), a computer, a PAD, a wearable device, a smart screen, a drone, an Intelligent Connected Vehicle (hereinafter referred to as ICV), a smart/intelligent car, or in-vehicle equipment; optionally, in order to achieve a better sound pickup effect, the N microphones can be arranged dispersedly, for example, in different parts of the electronic device.
  • the location of each microphone includes but is not limited to: the upper part, lower part, top, or bottom of the electronic device, the upper surface where the screen is located, and/or the back, etc.;
  • among the preset fixed beams of the electronic device, the fixed beam closest to the azimuth is selected as the main beam, and at least one fixed beam is selected as the secondary beam in order of distance from the azimuth, from farthest to nearest; the number of preset fixed beams is greater than or equal to 2;
  • when the N microphones receive a sound signal, the beamforming coefficient of the main beam is used to calculate the main output signal of the sound signal, and the beamforming coefficient of the secondary beam is used to calculate the auxiliary output signal of the sound signal;
  • in this way, the user's position relative to the electronic device is obtained, and the main beam and the secondary beam are selected from the preset fixed beams of the electronic device according to this position, so that the sound signal of the target sound source can be obtained more accurately from the received sound signal, effectively reducing human voice interference in the target sound signal; at least 3 microphones are used to receive the sound signal, and because of the influence of the electronic device casing, noise can be distinguished better, which enhances the filtering effect and alleviates the problems of voice distortion under low signal-to-noise-ratio conditions and incomplete elimination of human voice interference.
  • obtaining the position of the user relative to the electronic device includes:
  • if the facial information of the user of the electronic device is recognized from the image captured by the camera, the position of the user relative to the electronic device is obtained according to the position information of the facial information in the image;
  • if the facial information is not recognized, the placement position of the electronic device is obtained, and the position of the user relative to the electronic device is obtained according to the placement position.
  • selecting the fixed beam closest to the azimuth as the main beam, and selecting at least one fixed beam as the secondary beam in order of distance from the azimuth from farthest to nearest, includes:
  • calculating the ratio K_k of the azimuth to each fixed beam: K_k = θ_k / beam width of fixed beam k, where θ_k is the angle between the azimuth and the direction of fixed beam k, k = 1, 2, ..., M, and M is the number of fixed beam groups;
  • the fixed beam corresponding to the smallest ratio is selected as the main beam, and at least one fixed beam is selected as the secondary beam starting from the largest ratio, in descending order of the ratio.
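  • as an illustration only (not the patent's literal implementation), the following Python sketch shows how the ratio K_k could be computed for each fixed beam and how the main and secondary beams could then be selected; the function name, parameters, and the choice of two secondary beams are assumptions made for the example:

    import numpy as np

    def select_beams(user_dir, beam_dirs, beam_widths_deg, num_secondary=2):
        # user_dir        : unit 3-D vector from the device towards the user
        # beam_dirs       : (M, 3) array of unit vectors, one per fixed beam k
        # beam_widths_deg : (M,) beam widths of the fixed beams, in degrees
        cosines = np.clip(beam_dirs @ user_dir, -1.0, 1.0)
        theta_deg = np.degrees(np.arccos(cosines))   # included angle theta_k
        K = theta_deg / beam_widths_deg              # ratio K_k = theta_k / beam width

        main = int(np.argmin(K))                     # smallest ratio -> main beam
        order = np.argsort(K)[::-1]                  # largest ratio first
        secondary = [int(i) for i in order if i != main][:num_secondary]
        return main, secondary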
  • in a possible implementation, before obtaining the user's position relative to the electronic device, the method further includes:
  • obtaining the beamforming coefficient, direction, and beam width of a preset number of fixed beams includes:
  • an embodiment of the present application provides a sound pickup device, including:
  • the position obtaining unit is used to obtain the position of the user relative to the electronic device; the electronic device is provided with N microphones; N is an integer greater than or equal to 3;
  • the beam selection unit is configured to select, among the preset fixed beams of the electronic device, the fixed beam closest to the azimuth obtained by the position obtaining unit as the main beam, and to select at least one fixed beam as the secondary beam in order of distance from the azimuth, from farthest to nearest;
  • the signal calculation unit is configured to, when the N microphones receive a sound signal, calculate the main output signal of the sound signal using the beamforming coefficient of the main beam selected by the beam selection unit, and calculate the auxiliary output signal of the sound signal using the beamforming coefficient of the secondary beam selected by the beam selection unit;
  • the filtering unit is configured to perform filtering processing on the main output signal using the auxiliary output signal calculated by the signal calculation unit to obtain the target sound signal.
  • the position obtaining unit includes:
  • the image acquisition subunit is used to acquire the image captured by the camera of the electronic device
  • the position obtaining subunit is configured to, if the facial information of the user of the electronic device is recognized from the image acquired by the image acquisition subunit, obtain the position of the user relative to the electronic device according to the position information of the facial information in the image; and, if the user's facial information is not recognized in the acquired image, obtain the placement position of the electronic device and obtain the position of the user relative to the electronic device according to the placement position.
  • the beam selection unit includes:
  • the ratio calculation subunit is configured to calculate the ratio K_k of the azimuth to each fixed beam: K_k = θ_k / beam width of fixed beam k, where θ_k is the angle between the azimuth and the direction of fixed beam k, k = 1, 2, ..., M, and M is the number of fixed beam groups;
  • the beam selection subunit is configured to select, among the ratios calculated by the ratio calculation subunit, the fixed beam corresponding to the smallest ratio as the main beam, and to select at least one fixed beam as the secondary beam starting from the largest ratio, in descending order of the ratio.
  • it also includes:
  • the beam obtaining unit is used to obtain the beam forming coefficients, directions, and beam widths of M groups of fixed beams, where M is an integer greater than or equal to 2.
  • the beam obtaining unit includes:
  • the coordinate system establishment subunit is used to establish a three-dimensional Cartesian coordinate system for electronic equipment
  • the coordinate obtaining subunit is used to obtain the coordinates of the N microphones in the coordinate system
  • the ideal steering vector calculation subunit is used to calculate the steering vector of the target sound source under ideal conditions according to the coordinates of the N microphones;
  • the matrix obtaining subunit is used to obtain the frequency domain response matrix of the housing of the electronic device to the microphone;
  • the true steering vector calculation subunit is used to calculate the true steering vector of the target sound source according to the steering vector under ideal conditions and the frequency domain response matrix;
  • the fixed beam calculation subunit is used to calculate the beam forming coefficient, direction, and beam width of the preset number of fixed beams according to the real steering vector.
  • an electronic device including:
  • the electronic device is equipped with N microphones; N is an integer greater than or equal to 3;
  • among the preset fixed beams of the electronic device, the fixed beam closest to the azimuth is selected as the main beam, and at least one fixed beam is selected as the secondary beam in order of distance from the azimuth, from farthest to nearest;
  • when the N microphones receive a sound signal, the beamforming coefficient of the main beam is used to calculate the main output signal of the sound signal, and the beamforming coefficient of the secondary beam is used to calculate the auxiliary output signal of the sound signal;
  • the step of obtaining the position of the user relative to the electronic device includes:
  • the position of the user relative to the electronic device is obtained according to the position information of the facial information in the image
  • the placement position of the electronic device is obtained; according to the placement position, the user's position relative to the electronic device is obtained.
  • the step of selecting, among the preset fixed beams of the electronic device, the fixed beam closest to the azimuth as the main beam and selecting at least one fixed beam as the secondary beam in order of distance from the azimuth from farthest to nearest includes:
  • calculating the ratio K_k of the azimuth to each fixed beam: K_k = θ_k / beam width of fixed beam k, where θ_k is the angle between the azimuth and the direction of fixed beam k, k = 1, 2, ..., M, and M is the number of fixed beam groups;
  • the fixed beam corresponding to the smallest ratio is selected as the main beam, and at least one fixed beam is selected as the secondary beam starting from the largest ratio, in descending order of the ratio.
  • the step of obtaining the beamforming coefficient, direction, and beam width of a preset number of fixed beams includes:
  • an embodiment of the present application provides a computer-readable storage medium in which a computer program is stored, and when it runs on a computer, the computer executes the method of the first aspect.
  • an embodiment of the present application provides a computer program, which is used to execute the method of the first aspect when the computer program is executed by a computer.
  • the program in the fifth aspect may be stored in whole or in part on a storage medium packaged together with the processor, or may be stored in whole or in part in a memory not packaged together with the processor.
  • FIG. 1 is an example diagram of microphone settings on an electronic device according to an embodiment of the application
  • FIG. 2 is a flowchart of an embodiment of the sound pickup method of this application.
  • Fig. 3a is a flowchart of another embodiment of a sound pickup method according to the present application.
  • Figure 3b is an example diagram of the three-dimensional Cartesian coordinate system of the electronic device of this application.
  • FIG. 3c is an example diagram of the azimuth angle and the pitch angle according to the embodiment of the application.
  • FIG. 3d is an example diagram of the placement position of the electronic device according to the embodiment of the application.
  • FIG. 4 is a flowchart of an embodiment of a method for implementing one step of this application.
  • FIGS. 5a and 5b are structural diagrams of an electronic device to which the sound pickup method of this application is applicable;
  • Fig. 6a is a schematic structural diagram of an embodiment of a sound pickup device according to the present application.
  • Fig. 6b is a schematic structural diagram of an embodiment of a unit of the sound pickup device of the present application.
  • Fig. 6c is a schematic structural diagram of an embodiment of another unit of the sound pickup device of the present application.
  • Fig. 7a is a schematic structural diagram of another embodiment of a sound pickup device according to the present application.
  • Fig. 7b is a schematic structural diagram of an embodiment of another unit of the sound pickup device of the present application.
  • FIG. 8 is a schematic structural diagram of an embodiment of an electronic device of this application.
  • the single-channel noise reduction algorithm causes serious voice distortion under low signal-to-noise-ratio conditions, and the dual-channel noise reduction algorithm basically cannot eliminate human voice interference.
  • this application proposes a sound pickup method that can alleviate voice distortion under low signal-to-noise-ratio conditions and can also reduce human voice interference.
  • the at least three microphones are provided on the electronic device, and the location of each microphone on the electronic device is not limited in the embodiment of the present application.
  • the at least three microphones are dispersedly arranged on the electronic device, for example, arranged in different parts of the electronic device; the position of each microphone includes but is not limited to: the upper part, lower part, top, or bottom of the electronic device, the upper surface where the screen is located, and/or the back, etc.
  • three microphones can be respectively arranged on the top of the electronic device, the bottom of the electronic device, and the back of the electronic device.
  • the embodiments of this application can be applied to the voice assistant scenario of electronic equipment, providing relatively clean voice signals for voice wake-up and voice recognition; they can also be applied to other scenarios that require a relatively clean voice signal, such as recording audio or video for a specific person.
  • FIG. 2 is a flowchart of an embodiment of a sound pickup method according to this application. As shown in FIG. 2, the above method may include:
  • Step 201 Obtain the position of the user relative to the electronic device; the electronic device is provided with N microphones, N ≥ 3.
  • Step 202 Among the preset fixed beams of the electronic device, select the fixed beam closest to the azimuth as the main beam, and select at least one fixed beam as the secondary beam in order of distance from the azimuth, from farthest to nearest.
  • Step 203 When the N microphones receive the sound signal, use the beamforming coefficient of the main beam to calculate the main output signal of the sound signal, and use the beamforming coefficient of the secondary beam to calculate the auxiliary output signal of the sound signal.
  • Step 204 Use the auxiliary output signal to filter the main output signal to obtain a target sound signal.
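  • purely as an illustrative sketch (the patent does not spell out this computation), the main and auxiliary output signals of steps 203 and 204 could be obtained in the short-time Fourier transform domain by applying each beam's coefficients to the N microphone spectra; the array shapes and names below are assumptions:

    import numpy as np

    def beam_output(X, W):
        # X : (N, F, L) complex STFT of the N microphone signals
        # W : (N, F) complex beamforming coefficients of one fixed beam
        # returns Y(f, l) = W(f)^H X(:, f, l), with shape (F, L)
        return np.einsum('nf,nfl->fl', W.conj(), X)

    # Y_main = beam_output(X, W_main)                       # main output signal
    # Y_aux  = [beam_output(X, W_q) for W_q in W_secondary] # auxiliary output signals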
  • the target sound signal obtained is a clean speech signal with noise filtered out.
  • in this way, the user's position relative to the electronic device is obtained, and the main beam and the secondary beam are selected from the preset fixed beams of the electronic device according to this position, so that the sound signal of the target sound source can be obtained more accurately from the received sound signal, effectively reducing human voice interference in the target sound signal; at least 3 microphones are used to receive the sound signal, and due to the influence of the electronic device housing, noise can be distinguished better, which enhances the filtering effect and alleviates the problem of voice distortion under low signal-to-noise-ratio conditions.
  • when the at least three microphones are dispersedly arranged on different parts of the electronic device, for example when three microphones are arranged on the top, bottom, and back of the electronic device respectively, the influence of the housing of the electronic device makes front and rear noise easier to distinguish, which enhances the filtering effect and alleviates the problems of voice distortion under low signal-to-noise-ratio conditions and incomplete elimination of human voice interference.
  • Fig. 3a is a flowchart of another embodiment of a sound pickup method according to the present application. As shown in Fig. 3a, the method may include:
  • Step 301 Obtain beamforming coefficients, directions, and beam widths of a preset number of fixed beams.
  • the preset group number is greater than or equal to 2, that is, the minimum value of the preset group number is 2, and the maximum value is not limited.
  • this step is generally a preset step; that is, after the beamforming coefficients, directions, and beam widths of a preset number of fixed beams are obtained, the obtained information can be stored in the electronic device, so that this step does not need to be executed every time before steps 302 to 309. In practical applications, the information stored in the electronic device can also be modified.
  • here, the three-dimensional Cartesian coordinate system established based on the electronic device in the embodiment shown in FIG. 4 is described: the coordinate system takes the center point of the upper surface of the electronic device as the coordinate origin, the symmetry axes of the upper surface of the electronic device as the X-axis and the Y-axis, respectively, and the vertical line passing through the center point of the upper surface as the Z-axis.
  • the upper surface of the electronic device is generally the surface of the electronic device on the side of the display screen.
  • the following steps 302 to 304 are a possible implementation method of the step of obtaining the user's position relative to the electronic device.
  • Step 302 Obtain the image captured by the camera of the electronic device, and determine whether the face information of the user of the electronic device can be recognized from the image, if not, execute step 303; if yes, execute step 304.
  • the electronic device can store the facial information of the user of the electronic device.
  • the facial information can be independently set by the user of the electronic device in the electronic device.
  • this step can use face recognition detection technology to recognize the user's face information.
  • face recognition detection technology refers to a series of related technologies that use the camera of an electronic device to collect images or video streams containing human faces, automatically detect and track faces in the collected images or video streams, and then perform facial recognition on the detected faces.
  • with this technology, the user's face information and its location information can be identified in the image or video stream.
  • Step 303 Obtain the placement position of the electronic device, and estimate the position of the user relative to the electronic device according to the placement position; go to step 305.
  • the user's position relative to the electronic device can be represented by (azimuth angle, pitch angle) in the three-dimensional Cartesian coordinate system shown in Figure 3b.
  • the azimuth angle is the angle between the projection, on the XOY plane, of the ray from the coordinate origin to the center point of the user's face and the positive direction of the X-axis; the pitch angle is the angle between the ray from the coordinate origin to the center point of the user's face and the positive direction of the Z-axis.
  • as shown in Figure 3c, OA is the ray from the coordinate origin to the center point of the user's face, i.e., it represents the position of the user relative to the electronic device; the azimuth angle is ∠XOB, the angle between OB (the projection of ray OA on the XOY plane) and the positive direction of the X-axis; the pitch angle is ∠ZOA.
  • the identification of the user's position relative to the electronic device with the azimuth angle and the pitch angle is only an example, and is not used to limit other representations or implementations of the user's position relative to the electronic device in the embodiment of the present application.
  • a g-sensor in the electronic device may be used to obtain the placement position of the electronic device.
  • the gravity sensor can obtain the gravitational acceleration of the electronic device in different directions, and the value of the gravitational acceleration obtained by the gravity sensor in different directions will be different when the position of the electronic device is different.
  • the threshold range of X-axis gravitational acceleration, the threshold range of Y-axis gravitational acceleration, and the threshold range of Z-axis gravitational acceleration corresponding to different placement positions of the electronic device can be preset.
  • in this step, the placement position of the electronic device can be determined according to which preset threshold ranges the X-axis, Y-axis, and Z-axis gravitational accelerations output by the gravity sensor fall into.
  • for example, denoting the gravitational accelerations along the X-axis, Y-axis, and Z-axis as g_1, g_2, and g_3, respectively, the placement position is obtained by checking which combination of the preset threshold ranges (g_1, g_2, g_3) satisfies.
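  • a minimal sketch of this idea is shown below; the threshold logic and the mapping of axes to placement positions are illustrative assumptions, not the preset ranges of any specific device:

    def placement_from_gravity(g1, g2, g3, g=9.8, tol=2.0):
        # g1, g2, g3: gravitational acceleration along the device X, Y, Z axes
        if g3 > g - tol:
            return "lying flat, screen up"      # Z axis carries most of gravity
        if g3 < -(g - tol):
            return "lying flat, screen down"
        if abs(g2) > g - tol:
            return "upright / handheld"         # long (Y) axis near vertical
        if g1 > g - tol:
            return "tilted to the right"
        if g1 < -(g - tol):
            return "tilted to the left"
        return "unknown"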
  • the corresponding relationship between different placement positions and the user's position relative to the electronic device can be preset; then, estimating the user's position relative to the electronic device based on the placement position may include:
  • the implementation is described as follows: if the electronic device does not recognize the user's facial information in the image taken by the camera, this indicates that the user is outside the camera's shooting angle range, and the most likely position of the user relative to the electronic device can then be estimated from the placement position and the camera's shooting angle range.
  • specifically, the positions corresponding to the camera's shooting angle range may first be excluded from all possible positions of the user relative to the electronic device; then, for each placement position of the electronic device, the most probable position of the user relative to the electronic device among the remaining positions can be determined, so as to obtain the correspondence between placement positions and user positions.
  • for example, when the electronic device is in a handheld state or a horizontally placed state, after the positions corresponding to the camera's shooting angle range are excluded, the corresponding position of the user relative to the electronic device can be set to (270°, 90°).
  • when the electronic device is tilted to the left or tilted to the right, the user is most likely watching videos or playing games, so the user is located in the XOZ plane of the electronic device; after the positions corresponding to the camera's shooting angle range are excluded, the corresponding position of the user relative to the electronic device can be set to (0°, 45°) or (180°, 45°).
  • the foregoing is only an exemplary description of possible implementation manners, and is not used to limit the embodiments of the present application.
  • the specific values of the above-mentioned azimuth and pitch angles can be different; different electronic devices have different camera shooting angle ranges, and for different electronic devices in the same placement position, the corresponding position of the user relative to the electronic device may also be set differently.
  • in this implementation, the placement position of the electronic device is used to indirectly estimate the user's position relative to the electronic device; the accuracy is somewhat lower, but scenes in which the user is outside the camera's shooting angle are relatively rare, and the width of the fixed beam in the subsequent steps can also tolerate a certain angle error. Therefore, estimating the user's position relative to the electronic device according to the placement position in this step can still meet the requirements of the embodiments of this application and has little impact on the subsequent processing results.
  • the position of the user with the greatest probability relative to the electronic device corresponding to different placement positions can be obtained.
  • taking the electronic device as a mobile phone as an example, assuming that the electronic device is placed in a handheld position, after excluding the positions corresponding to the shooting angles of the front camera and the rear camera, the most probable position of the user relative to the electronic device is at the bottom of the mobile phone, that is, in the negative direction of the Y-axis in Figure 3b.
  • Step 304 Obtain the position information of the user's face information in the image, and obtain the position of the user relative to the electronic device according to the position information; go to step 305.
  • projection and other related technologies can be used to directly convert the position information of the user in the image into the azimuth angle and pitch angle in the three-dimensional Cartesian coordinate system shown in FIG. 3b, thereby obtaining the position of the user relative to the electronic device.
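  • one possible (hypothetical) conversion is a pinhole-camera approximation such as the sketch below; the field-of-view values and the assumption that the camera looks along the positive Z-axis of the device coordinate system are illustrative only:

    import numpy as np

    def face_to_direction(u, v, img_w, img_h, hfov_deg=70.0, vfov_deg=55.0):
        # (u, v): pixel position of the detected face centre in the image
        x = np.tan(np.radians(hfov_deg / 2.0)) * (2.0 * u / img_w - 1.0)
        y = np.tan(np.radians(vfov_deg / 2.0)) * (1.0 - 2.0 * v / img_h)
        d = np.array([x, y, 1.0])
        d /= np.linalg.norm(d)                                # unit direction towards the user
        azimuth = np.degrees(np.arctan2(d[1], d[0])) % 360.0  # XOY projection vs +X axis
        pitch = np.degrees(np.arccos(d[2]))                   # angle vs +Z axis
        return azimuth, pitch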
  • the following steps 305 to 306 are a possible implementation of step 202.
  • Step 305 Calculate the ratio K_k of the azimuth to each fixed beam: K_k = θ_k / beam width of fixed beam k, where θ_k is the angle between the azimuth and the direction of fixed beam k, the beam width is the beam width of fixed beam k, and k = 1, 2, ..., M.
  • specifically, this step may include: for each fixed beam k, calculating the angle θ_k between the azimuth and the direction of fixed beam k, and then calculating the ratio between θ_k and the beam width of fixed beam k.
  • Step 306 From these ratios, select the fixed beam corresponding to the smallest ratio as the main beam, and select at least one fixed beam as the secondary beam starting from the largest ratio, in descending order of the ratio.
  • the number of secondary beams may be one or more, and the specific number is not limited in this application.
  • the total number of secondary beams and main beams does not exceed the number M of fixed beams.
  • for example, when M = 2, the number of sub-beams can only be 1
  • when M is larger, for example M = 5, the number of sub-beams can be 2, 3, or 4.
  • the number of sub-beams may be two.
  • the echo cancellation step is an optional step. How to perform echo cancellation on N channels of sound signals in this step is not limited in this application.
  • the related echo cancellation algorithm can be used to perform echo cancellation of N channels of sound signals.
  • the echo cancellation algorithm includes time domain processing algorithm and frequency domain processing algorithm, which will not be repeated here.
  • the basic principle of the adaptive echo cancellation algorithm is: use the reference signal to adaptively estimate the echo signal, and subtract the estimated echo signal from the sound signal received by the microphone to obtain an echoless sound signal.
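  • a minimal time-domain sketch of this principle is given below, using a normalized least-mean-squares (NLMS) adaptive filter; the filter length and step size are illustrative, and this is only one common way to realize adaptive echo cancellation:

    import numpy as np

    def nlms_echo_cancel(mic, ref, filter_len=256, mu=0.5, eps=1e-6):
        # mic : microphone samples containing near-end speech plus echo
        # ref : reference (loudspeaker) samples that produce the echo
        w = np.zeros(filter_len)          # adaptive estimate of the echo path
        buf = np.zeros(filter_len)        # most recent reference samples
        out = np.zeros(len(mic))
        for n in range(len(mic)):
            buf = np.roll(buf, 1)
            buf[0] = ref[n]
            echo_est = w @ buf            # estimated echo sample
            e = mic[n] - echo_est         # echo-removed (error) signal
            w += mu * e * buf / (buf @ buf + eps)   # NLMS update
            out[n] = e
        return out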
  • there is no restriction on the execution order between step 307 and steps 305 to 306.
  • Step 309 Use the auxiliary output signal Y_q(f, l) to filter the main output signal Y_1(f, l) to obtain the target sound signal.
  • relevant filtering algorithms such as Wiener filtering, minimum mean square error criterion filtering, Kalman filtering, etc. can be used to perform the filtering processing in this step, which will not be repeated here.
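  • as one simple possibility (not the specific filter used in the patent), the auxiliary outputs can be treated as a noise reference and a Wiener-style gain applied to the main output; the sketch below assumes the STFT-domain signals produced earlier:

    import numpy as np

    def aux_reference_postfilter(Y_main, Y_aux_list, gain_floor=0.1, eps=1e-12):
        # Y_main     : (F, L) main output signal Y_1(f, l)
        # Y_aux_list : list of (F, L) auxiliary output signals Y_q(f, l)
        noise_psd = np.mean([np.abs(Y) ** 2 for Y in Y_aux_list], axis=0)
        main_psd = np.abs(Y_main) ** 2
        gain = np.maximum(1.0 - noise_psd / (main_psd + eps), gain_floor)
        return gain * Y_main                # attenuate bins dominated by noise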
  • At least one microphone is added on the basis of the conventional two microphones.
  • the added microphone may be a back microphone.
  • these microphones form a three-dimensional microphone array; due to the influence of the housing of the electronic device, the microphone array can perform directional beamforming in 3D space and achieve the effect of distinguishing front and rear noise.
  • the implementation of step 301 will be explained using the step flow shown in FIG. 4 as an example; referring to FIG. 4, the flow includes:
  • Step 401 Establish a three-dimensional Cartesian coordinate system based on the electronic device.
  • please refer to Figure 3b and the corresponding description for the establishment method of the three-dimensional Cartesian coordinate system, which will not be repeated here.
  • in the following, the number of microphones N = 3 is taken as an example, with the three microphones located on the top, bottom, and back of the electronic device.
  • Step 402 Obtain the coordinates of the N microphones in the three-dimensional Cartesian coordinate system according to the positions of the N microphones on the electronic device.
  • the coordinates of the first microphone Mic1 are (x_1, y_1, z_1); the coordinates of the second microphone Mic2 are (x_2, y_2, z_2); the coordinates of the third microphone Mic3 are (x_3, y_3, z_3).
  • Step 403 Calculate the steering vector of the target sound source under ideal conditions according to the coordinates of the N microphones in the three-dimensional Cartesian coordinate system.
  • the steering vector of the target sound source under ideal conditions, a(θ, φ, f), is constructed from the time delays τ_i of each microphone i relative to the coordinate origin; the calculation formula of τ_i refers to the following formula (1).
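  • for illustration, the sketch below computes the ideal free-field steering vector from the microphone coordinates; the far-field plane-wave delay used for τ_i is a standard assumption standing in for formula (1), which is not reproduced in this text:

    import numpy as np

    SPEED_OF_SOUND = 343.0  # m/s

    def ideal_steering_vector(mic_xyz, azimuth_deg, pitch_deg, f):
        # mic_xyz : (N, 3) microphone coordinates in the device coordinate system
        theta = np.radians(azimuth_deg)
        phi = np.radians(pitch_deg)
        u = np.array([np.sin(phi) * np.cos(theta),   # unit vector towards the source
                      np.sin(phi) * np.sin(theta),
                      np.cos(phi)])
        tau = -(mic_xyz @ u) / SPEED_OF_SOUND        # delay of mic i vs the origin
        return np.exp(-1j * 2.0 * np.pi * f * tau)   # a(theta, phi, f), shape (N,)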
  • Step 404 Obtain the frequency domain response matrix ⁇ ( ⁇ , ⁇ , f) of the housing of the electronic device to the microphone.
  • the microphones' responses to signals from different directions are generally measured by letting the microphones of the electronic device receive the same audio played from different directions, and the frequency domain response matrix of the housing of the electronic device (for example, a mobile phone) to the microphones is obtained from these measurements.
  • the specific steps are: place the electronic device in a professional anechoic chamber, take the electronic device as the center of a sphere, and play the same audio from different positions on a spherical surface with a radius of 1 m; the audio is generally Gaussian white noise and is received by the microphones of the electronic device.
  • based on the principle that, without the influence of the electronic device's casing, the audio signals received by the microphones from different positions on the spherical surface should be consistent, the response of the casing to each microphone is obtained by comparison and calculation, which yields the frequency domain response matrix.
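  • a sketch of the comparison calculation is shown below; representing the casing's effect as a per-microphone (diagonal) complex ratio between the measured spectra and the free-field reference is an assumption made for illustration:

    import numpy as np

    def estimate_casing_response(measured_spec, freefield_spec, eps=1e-12):
        # measured_spec  : (N, F) spectra recorded by the N microphones for one
        #                  playback direction in the anechoic chamber
        # freefield_spec : (F,) reference spectrum the microphones would see
        #                  without the casing (ideally identical for all mics)
        # returns (N, F) complex ratios: the casing response for this direction
        return measured_spec / (freefield_spec[None, :] + eps)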
  • Step 405 Calculate the true steering vector of the target sound source according to the frequency domain response matrix ⁇ ( ⁇ , ⁇ , f) and the steering vector a( ⁇ , ⁇ , f) of the target sound source under ideal conditions
  • in one configuration, the direction of each fixed beam points to a horizontal direction, and the 360° space is divided into M equal parts; if M ≥ 4, the direction of one fixed beam may instead point to the positive direction of the Z-axis, and the directions of the other M-1 fixed beams point to horizontal directions, dividing the 360° space into M-1 equal parts, similar to a lotus shape.
  • a fixed beamforming algorithm can be used to calculate five sets of fixed beamforming coefficients.
  • the simplest fixed beamforming algorithm is the delay-and-sum algorithm, whose beamforming coefficient for fixed beam k is built from the steering vector in the beam's direction, where θ_k represents the azimuth angle of fixed beam k and φ_k represents the pitch angle of fixed beam k.
  • the azimuth and pitch angles (θ_k, φ_k) of the five fixed beams are respectively: (0°, 90°), (180°, 90°), (90°, 90°), (270°, 90°), and (0°, 0°).
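  • as a hedged sketch of how the five fixed beams could be built, the code below combines the ideal_steering_vector helper shown after step 403 with a hypothetical casing_response(az, pitch, f) helper (for example, built from estimate_casing_response above); the weight a_true / N is the standard delay-and-sum choice, used here as an assumption for the coefficient whose exact expression is not reproduced in this text:

    import numpy as np

    # the five fixed beam directions (azimuth, pitch) listed above
    BEAM_DIRS = [(0, 90), (180, 90), (90, 90), (270, 90), (0, 0)]

    def delay_sum_coefficients(mic_xyz, casing_response, freqs):
        N = mic_xyz.shape[0]
        W = np.zeros((len(BEAM_DIRS), N, len(freqs)), dtype=complex)
        for k, (az, pitch) in enumerate(BEAM_DIRS):
            for j, f in enumerate(freqs):
                a_ideal = ideal_steering_vector(mic_xyz, az, pitch, f)
                a_true = casing_response(az, pitch, f) * a_ideal  # element-wise
                W[k, :, j] = a_true / N                           # delay-and-sum weight
        return W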
  • the direction of the fixed beam can also be expressed by (azimuth angle, elevation angle).
  • the azimuth angle of a fixed beam is: in the three-dimensional Cartesian coordinate system, the angle between the projection of the fixed beam's direction on the XOY plane and the positive direction of the X-axis; the pitch angle of a fixed beam is: in the three-dimensional Cartesian coordinate system, the angle between the direction of the fixed beam and the positive direction of the Z-axis; for details, please refer to the aforementioned example of FIG. 3c, which will not be repeated here.
  • complex fixed beamforming algorithms include super-directive beamforming, constant-beamwidth beamforming, etc.
  • the above complex fixed beamforming algorithms ultimately boil down to a quadratic programming problem, which requires convex optimization techniques to solve in order to obtain the fixed beamforming coefficients W_k(f).
  • the setting of the beam width is related to the number of beams, the microphone layout on the electronic device, the selected fixed beamforming algorithm, and the range of sound sources that each fixed beam needs to pick up; it can be set independently in practical applications and is not limited here.
  • the method shown in Figure 4 achieves the acquisition of M groups of fixed beams.
  • a driving scenario is a scenario in which a user uses a mobile phone voice assistant with a relatively high frequency.
  • the noise environment in this scenario is relatively harsh, including engine sound, tire friction sound, air-conditioning sound, wind noise when windows are open, etc.; this directly lowers the signal-to-noise ratio of the user's voice signal received by the mobile phone and makes it difficult for the voice assistant to pick up a clean voice signal.
  • the electronic device may include: a sensor module, a scene analysis module, a front-end enhancement module, a voice wake-up module, a voiceprint recognition and confirmation module, a voice recognition module, and other interaction modules.
  • the sensor module may include: a camera, a microphone, and a gravity sensor, through which data such as the user's image, sound signal, and the placement position of the electronic device can be obtained respectively;
  • the scene analysis module is used to obtain a priori information about the sound signal so that targeted sound pickup can be performed;
  • the front-end enhancement module is used to extract the user's (host) sound signal, that is, the target sound signal, while suppressing other interference signals and noise;
  • the voice wake-up module is used to detect specific wake-up words in the target sound signal; these specific wake-up words can "wake up" the electronic device, and whether the electronic device is eventually awakened requires the voiceprint recognition and confirmation module to "check".
  • the voiceprint recognition and confirmation module is used to recognize and confirm the user's voiceprint; only when the voiceprint of the user currently speaking the wake-up word is consistent with the preset user's voiceprint is the electronic device finally awakened by the user.
  • the voice wake-up module supports wake-up with only one audio channel, which requires the front-end enhancement module to output only one audio signal to the voice wake-up module for wake-up detection.
  • when there are multiple speakers, the front-end enhancement module needs to accurately identify information such as the position of the target speaker, and then use noise reduction algorithms such as echo cancellation, fixed beamforming, and multi-channel adaptive filtering for directional sound pickup enhancement; the estimated clean target sound signal is then sent to the voice wake-up module for subsequent processing such as voiceprint detection and voice wake-up recognition.
  • the interaction between the user and the sensor module includes: a camera captures an image containing a human face, a gravity sensor can obtain gravitational acceleration values of the electronic device in various directions, and a microphone obtains the user's voice signal.
  • the image captured by the camera in the sensor module and the gravitational acceleration value obtained by the gravity sensor are transmitted to the scene analysis module, and the scene analysis module obtains the position of the user relative to the electronic device according to this, and transmits the position to the front-end enhancement module.
  • the sensor module also transmits the sound signal obtained by the microphone to the front-end enhancement module, and the front-end enhancement module extracts the target sound signal according to the position and the sound signal.
  • the target sound signal is a relatively clean voice signal.
  • the target sound signal will be transmitted to the voice wake-up module and the voiceprint recognition and confirmation module.
  • the voice wake-up module detects the specific wake-up word, and the voiceprint recognition and confirmation module compares the voiceprint of the target sound signal with the preset user's voiceprint to confirm whether they are consistent; if the voiceprint recognition and confirmation module confirms that the voiceprints are consistent, the voice recognition module interacts with other interaction modules according to the specific wake-up word extracted by the voice wake-up module.
  • Fig. 6a is a structural diagram of an embodiment of a sound pickup device of this application. As shown in Fig. 6a, the sound pickup device 600 may include:
  • the position obtaining unit 610 is configured to obtain the position of the user relative to the electronic device; the electronic device is provided with N microphones; N is an integer greater than or equal to 3;
  • the beam selection unit 620 is configured to select, among the preset fixed beams of the electronic device, the fixed beam that is closest to the azimuth obtained by the azimuth obtaining unit 610 as the main beam, according to the distance from the farthest to the closest to the azimuth. Select at least one fixed beam as the secondary beam in sequence;
  • the signal calculation unit 630 is configured to, when the N microphones receive a sound signal, calculate the main output signal of the sound signal using the beamforming coefficient of the main beam selected by the beam selection unit 620, and calculate the auxiliary output signal of the sound signal using the beamforming coefficient of the secondary beam selected by the beam selection unit 620;
  • the filtering unit 640 is configured to use the auxiliary output signal calculated by the signal calculation unit 630 to filter the main output signal to obtain a target sound signal.
  • the position obtaining unit 610 may include:
  • the image acquisition subunit 611 is configured to acquire the image captured by the camera of the electronic device
  • the position obtaining subunit 612 is configured to, if the facial information of the user of the electronic device is recognized from the image acquired by the image acquisition subunit 611, obtain the position of the user relative to the electronic device according to the position information of the facial information in the image; and, if the user's facial information is not recognized in the acquired image, obtain the placement position of the electronic device and obtain the position of the user relative to the electronic device according to the placement position.
  • the beam selection unit 620 may include:
  • the beam selection subunit 622 is configured to select, among the ratios calculated by the ratio calculation subunit, the fixed beam corresponding to the smallest ratio as the main beam, and to select at least one fixed beam as the secondary beam starting from the largest ratio, in descending order of the ratio.
  • the device 600 may further include:
  • the beam obtaining unit 650 is configured to obtain beamforming coefficients, directions, and beam widths of M groups of fixed beams, where M is an integer greater than or equal to 2.
  • the beam obtaining unit 650 may include:
  • the coordinate system establishment subunit 651 is used to establish a three-dimensional Cartesian coordinate system for the electronic device
  • the ideal steering vector calculation subunit 653 is configured to calculate the steering vector of the target sound source under ideal conditions according to the coordinates of the N microphones;
  • the true steering vector calculation subunit 655 is configured to calculate the true steering vector of the target sound source according to the steering vector under the ideal condition and the frequency domain response matrix;
  • the fixed beam calculation subunit 656 is configured to calculate the beam forming coefficient, direction, and beam width of the preset number of fixed beams according to the real steering vector.
  • the sound pickup device 600 provided in the embodiment shown in FIGS. 6a to 7b can be used to implement the technical solutions of the method embodiments shown in FIGS. 2 to 4 of this application.
  • each step of the above method or each of the above units can be completed by an integrated logic circuit of hardware in the processor element or instructions in the form of software.
  • the above units may be one or more integrated circuits configured to implement the above methods, for example: one or more application-specific integrated circuits (Application Specific Integrated Circuit; hereinafter referred to as ASIC), or one or more digital signal processors (Digital Signal Processor; hereinafter referred to as DSP), or one or more field programmable gate arrays (Field Programmable Gate Array; hereinafter referred to as FPGA), etc.
  • these units can be integrated together and implemented in the form of a System-On-a-Chip (hereinafter referred to as SOC).
  • FIG. 8 is a schematic structural diagram of an embodiment of an electronic device of this application. As shown in FIG. 8, the above-mentioned electronic device may include: a display screen; one or more processors; a memory; and one or more computer programs.
  • the above-mentioned display screen may include the display screen of a vehicle-mounted computer (Mobile Data Center); the above-mentioned electronic device may be a mobile terminal (mobile phone), a computer, a PAD, a wearable device, a smart screen, a drone, an Intelligent Connected Vehicle (hereinafter referred to as ICV), a smart/intelligent car, or in-vehicle equipment.
  • the above-mentioned one or more computer programs are stored in the above-mentioned memory, and the above-mentioned one or more computer programs include instructions.
  • the above-mentioned instructions are executed by the above-mentioned device, the above-mentioned device is caused to perform the following steps:
  • the electronic device is provided with N microphones; N is an integer greater than or equal to 3;
  • when the N microphones receive a sound signal, the beamforming coefficients of the main beam are used to calculate the main output signal of the sound signal, and the beamforming coefficients of the secondary beam are used to calculate the auxiliary output signal of the sound signal;
  • the auxiliary output signal is used to perform filtering processing on the main output signal to obtain a target sound signal.
  • the step of obtaining the user's position relative to the electronic device may include:
  • the step of selecting, among the preset fixed beams of the electronic device, the fixed beam closest to the azimuth as the main beam, and selecting at least one fixed beam as the secondary beam in order of distance from the azimuth from farthest to nearest, may include:
  • calculating the ratio K_k of the azimuth to each fixed beam: K_k = θ_k / beam width of fixed beam k, where θ_k is the angle between the azimuth and the direction of fixed beam k, k = 1, 2, ..., M, and M is the number of fixed beam groups;
  • the fixed beam corresponding to the smallest ratio is selected as the main beam, and at least one fixed beam is selected as the secondary beam starting from the largest ratio, in descending order of the ratio.
  • the step of obtaining the beamforming coefficient, direction, and beam width of a preset number of fixed beams may include:
  • the electronic device shown in FIG. 8 may be a terminal device or a circuit device built in the aforementioned terminal device.
  • the device can be used to execute the functions/steps in the methods provided in the embodiments shown in FIGS. 2 to 4 of this application.
  • the electronic device 800 may include a processor 810, an external memory interface 820, an internal memory 821, a universal serial bus (USB) interface 830, a charging management module 840, a power management module 841, a battery 842, an antenna 1, an antenna 2, a mobile communication module 850, a wireless communication module 860, an audio module 870, a speaker 870A, a receiver 870B, a microphone 870C, an earphone jack 870D, a sensor module 880, buttons 890, a motor 891, an indicator 892, a camera 893, a display screen 894, a subscriber identification module (SIM) card interface 895, and so on.
  • the sensor module 880 may include a pressure sensor 880A, a gyroscope sensor 880B, an air pressure sensor 880C, a magnetic sensor 880D, an acceleration sensor 880E, a distance sensor 880F, a proximity light sensor 880G, a fingerprint sensor 880H, a temperature sensor 880J, a touch sensor 880K, an ambient light sensor 880L, a bone conduction sensor 880M, etc.
  • the structure illustrated in the embodiment of the present invention does not constitute a specific limitation on the electronic device 800.
  • the electronic device 800 may include more or fewer components than shown, or combine certain components, or split certain components, or arrange different components.
  • the illustrated components can be implemented in hardware, software, or a combination of software and hardware.
  • the processor 810 may include one or more processing units.
  • the processor 810 may include an application processor (AP), a modem processor, a graphics processing unit (GPU), an image signal processor (ISP), a controller, a video codec, a digital signal processor (DSP), a baseband processor, and/or a neural-network processing unit (NPU), etc.
  • the different processing units may be independent devices or integrated in one or more processors.
  • the controller can generate operation control signals according to the instruction operation code and timing signals to complete the control of fetching instructions and executing instructions.
  • a memory may also be provided in the processor 810 for storing instructions and data.
  • the memory in the processor 810 is a cache memory.
  • the memory can store instructions or data that have just been used or recycled by the processor 810. If the processor 810 needs to use the instruction or data again, it can be directly called from the memory. Repeated accesses are avoided, the waiting time of the processor 810 is reduced, and the efficiency of the system is improved.
  • the processor 810 may include one or more interfaces.
  • the interface may include an inter-integrated circuit (I2C) interface, an inter-integrated circuit sound (I2S) interface, a pulse code modulation (PCM) interface, a universal asynchronous receiver/transmitter (UART) interface, a mobile industry processor interface (MIPI), a general-purpose input/output (GPIO) interface, a subscriber identity module (SIM) interface, and/or a universal serial bus (USB) interface.
  • the I2C interface is a bidirectional synchronous serial bus, including a serial data line (SDA) and a serial clock line (SCL).
  • the processor 810 may include multiple sets of I2C buses.
  • the processor 810 may be coupled to the touch sensor 880K, charger, flash, camera 893, etc., respectively through different I2C bus interfaces.
  • the processor 810 may couple the touch sensor 880K through an I2C interface, so that the processor 810 and the touch sensor 880K communicate through the I2C bus interface to implement the touch function of the electronic device 800.
  • the I2S interface can be used for audio communication.
  • the processor 810 may include multiple sets of I2S buses.
  • the processor 810 may be coupled with the audio module 870 through an I2S bus to implement communication between the processor 810 and the audio module 870.
  • the audio module 870 may transmit audio signals to the wireless communication module 860 through the I2S interface, so as to realize the function of answering calls through the Bluetooth headset.
  • the PCM interface can also be used for audio communication to sample, quantize and encode analog signals.
  • the audio module 870 and the wireless communication module 860 may be coupled through a PCM bus interface.
  • the audio module 870 may also transmit audio signals to the wireless communication module 860 through the PCM interface, so as to realize the function of answering calls through the Bluetooth headset. Both the I2S interface and the PCM interface can be used for audio communication.
  • the UART interface is a universal serial data bus used for asynchronous communication.
  • the bus can be a two-way communication bus. It converts the data to be transmitted between serial communication and parallel communication.
  • the UART interface is usually used to connect the processor 810 and the wireless communication module 860.
  • the processor 810 communicates with the Bluetooth module in the wireless communication module 860 through the UART interface to implement the Bluetooth function.
  • the audio module 870 may transmit audio signals to the wireless communication module 860 through a UART interface, so as to realize the function of playing music through a Bluetooth headset.
  • the MIPI interface can be used to connect the processor 810 with the display screen 894, the camera 893 and other peripheral devices.
  • the MIPI interface includes camera serial interface (CSI), display serial interface (DSI) and so on.
  • the processor 810 and the camera 893 communicate through a CSI interface to implement the shooting function of the electronic device 800.
  • the processor 810 and the display screen 894 communicate through a DSI interface to realize the display function of the electronic device 800.
  • the GPIO interface can be configured through software.
  • the GPIO interface can be configured as a control signal or as a data signal.
  • the GPIO interface can be used to connect the processor 810 with the camera 893, the display screen 894, the wireless communication module 860, the audio module 870, the sensor module 880, and so on.
  • the GPIO interface can also be configured as an I2C interface, I2S interface, UART interface, MIPI interface, etc.
  • the USB interface 830 is an interface that complies with the USB standard specification, and specifically may be a Mini USB interface, a Micro USB interface, a USB Type C interface, and so on.
  • the USB interface 830 can be used to connect a charger to charge the electronic device 800, and can also be used to transfer data between the electronic device 800 and peripheral devices. It can also be used to connect earphones and play audio through earphones. This interface can also be used to connect other electronic devices, such as AR devices.
  • the interface connection relationship between the modules illustrated in the embodiment of the present invention is merely a schematic description, and does not constitute a structural limitation of the electronic device 800.
  • the electronic device 800 may also adopt different interface connection modes in the foregoing embodiments, or a combination of multiple interface connection modes.
  • the charging management module 840 is used to receive charging input from the charger.
  • the charger can be a wireless charger or a wired charger.
  • the charging management module 840 may receive the charging input of the wired charger through the USB interface 830.
  • the charging management module 840 may receive the wireless charging input through the wireless charging coil of the electronic device 800. While the charging management module 840 charges the battery 842, it can also supply power to the electronic device through the power management module 841.
  • the power management module 841 is used to connect the battery 842, the charging management module 840 and the processor 810.
  • the power management module 841 receives input from the battery 842 and/or the charging management module 840, and supplies power to the processor 810, the internal memory 821, the display screen 894, the camera 893, and the wireless communication module 860.
  • the power management module 841 can also be used to monitor parameters such as battery capacity, battery cycle times, and battery health status (leakage, impedance).
  • the power management module 841 may also be provided in the processor 810.
  • the power management module 841 and the charging management module 840 may also be provided in the same device.
  • the wireless communication function of the electronic device 800 can be implemented by the antenna 1, the antenna 2, the mobile communication module 850, the wireless communication module 860, the modem processor, and the baseband processor.
  • the antenna 1 and the antenna 2 are used to transmit and receive electromagnetic wave signals.
  • Each antenna in the electronic device 800 can be used to cover a single or multiple communication frequency bands. Different antennas can also be reused to improve antenna utilization.
  • Antenna 1 can be multiplexed as a diversity antenna of a wireless local area network.
  • the antenna can be used in combination with a tuning switch.
  • the mobile communication module 850 may provide a wireless communication solution including 2G/3G/4G/5G and the like applied to the electronic device 800.
  • the mobile communication module 850 may include at least one filter, a switch, a power amplifier, a low noise amplifier (LNA), and the like.
  • the mobile communication module 850 can receive electromagnetic waves by the antenna 1, filter and amplify the received electromagnetic waves, and transmit them to the modem processor for demodulation.
  • the mobile communication module 850 can also amplify the signal modulated by the modem processor, and convert it to electromagnetic wave radiation via the antenna 1.
  • at least part of the functional modules of the mobile communication module 850 may be provided in the processor 810.
  • at least part of the functional modules of the mobile communication module 850 and at least part of the modules of the processor 810 may be provided in the same device.
  • the modem processor may include a modulator and a demodulator.
  • the modulator is used to modulate the low frequency baseband signal to be sent into a medium and high frequency signal.
  • the demodulator is used to demodulate the received electromagnetic wave signal into a low-frequency baseband signal.
  • the demodulator then transmits the demodulated low-frequency baseband signal to the baseband processor for processing.
  • the application processor outputs a sound signal through an audio device (not limited to a speaker 870A, a receiver 870B, etc.), or displays an image or video through the display screen 894.
  • the modem processor may be an independent device.
  • the modem processor may be independent of the processor 810 and be provided in the same device as the mobile communication module 850 or other functional modules.
  • the wireless communication module 860 can provide wireless communication solutions applied to the electronic device 800, including wireless local area network (WLAN) (such as a wireless fidelity (Wi-Fi) network), Bluetooth (BT), global navigation satellite system (GNSS), frequency modulation (FM), near field communication (NFC), infrared (IR) technology, and so on.
  • the wireless communication module 860 may be one or more devices integrating at least one communication processing module.
  • the wireless communication module 860 receives electromagnetic waves via the antenna 2, frequency modulates and filters the electromagnetic wave signals, and sends the processed signals to the processor 810.
  • the wireless communication module 860 may also receive the signal to be sent from the processor 810, perform frequency modulation, amplify it, and convert it into electromagnetic wave radiation via the antenna 2.
  • the antenna 1 of the electronic device 800 is coupled with the mobile communication module 850, and the antenna 2 is coupled with the wireless communication module 860, so that the electronic device 800 can communicate with the network and other devices through wireless communication technology.
  • the wireless communication technology may include global system for mobile communications (GSM), general packet radio service (GPRS), code division multiple access (CDMA), wideband code division multiple access (WCDMA), time-division synchronous code division multiple access (TD-SCDMA), long term evolution (LTE), BT, GNSS, WLAN, NFC, FM, and/or IR technology, etc.
  • the GNSS may include a global positioning system (GPS), a global navigation satellite system (GLONASS), a BeiDou navigation satellite system (BDS), a quasi-zenith satellite system (QZSS), and/or a satellite-based augmentation system (SBAS).
  • the electronic device 800 implements a display function through a GPU, a display screen 894, and an application processor.
  • the GPU is a microprocessor for image processing, which connects the display 894 and the application processor.
  • the GPU is used to perform mathematical and geometric calculations and is used for graphics rendering.
  • the processor 810 may include one or more GPUs, which execute program instructions to generate or change display information.
  • the display screen 894 is used to display images, videos, and so on.
  • the display screen 894 includes a display panel.
  • the display panel can adopt a liquid crystal display (LCD), an organic light-emitting diode (OLED), an active-matrix organic light-emitting diode (AMOLED), a flexible light-emitting diode (FLED), a Mini-LED, a Micro-LED, a Micro-OLED, quantum dot light-emitting diodes (QLED), and so on.
  • the electronic device 800 may include one or N display screens 894, and N is a positive integer greater than one.
  • the electronic device 800 can realize a shooting function through an ISP, a camera 893, a video codec, a GPU, a display screen 894, and an application processor.
  • the ISP is used to process the data fed back from the camera 893. For example, when taking a picture, the shutter is opened, the light is transmitted to the photosensitive element of the camera through the lens, the light signal is converted into an electrical signal, and the photosensitive element of the camera transmits the electrical signal to the ISP for processing and is converted into an image visible to the naked eye.
  • ISP can also optimize the image noise, brightness, and skin color. ISP can also optimize the exposure, color temperature and other parameters of the shooting scene.
  • the ISP may be provided in the camera 893.
  • the camera 893 is used to capture still images or videos.
  • the object generates an optical image through the lens and is projected to the photosensitive element.
  • the photosensitive element may be a charge coupled device (CCD) or a complementary metal-oxide-semiconductor (CMOS) phototransistor.
  • the photosensitive element converts the optical signal into an electrical signal, and then transfers the electrical signal to the ISP to convert it into a digital image signal.
  • ISP outputs digital image signals to DSP for processing.
  • DSP converts digital image signals into standard RGB, YUV and other formats of image signals.
  • the electronic device 800 may include 1 or N cameras 893, and N is a positive integer greater than 1.
  • Digital signal processors are used to process digital signals. In addition to digital image signals, they can also process other digital signals. For example, when the electronic device 800 selects a frequency point, the digital signal processor is used to perform Fourier transform on the energy of the frequency point.
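  • As a rough illustration of this kind of frequency-domain processing (the frame length, sample rate, and target frequency are assumed example values, not the device's actual firmware), the energy at one frequency point can be read off an FFT of a signal frame:

```python
import numpy as np

def frequency_point_energy(frame, sample_rate_hz=16000, target_hz=1000):
    """Energy of one frequency point in a single frame of samples."""
    spectrum = np.fft.rfft(frame * np.hanning(len(frame)))   # windowed FFT of the frame
    bin_hz = sample_rate_hz / len(frame)                     # frequency resolution per bin
    k = int(round(target_hz / bin_hz))                       # bin closest to the target frequency
    return np.abs(spectrum[k]) ** 2

energy = frequency_point_energy(np.random.randn(512))
```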
  • Video codecs are used to compress or decompress digital video.
  • the electronic device 800 may support one or more video codecs. In this way, the electronic device 800 can play or record videos in multiple encoding formats, such as: moving picture experts group (MPEG) 1, MPEG2, MPEG3, MPEG4, and so on.
  • the NPU is a neural-network (NN) computing processor.
  • Through the NPU, applications such as intelligent cognition of the electronic device 800 can be realized, for example image recognition, face recognition, voice recognition, text understanding, and so on.
  • the external memory interface 820 may be used to connect an external memory card, such as a Micro SD card, so as to expand the storage capacity of the electronic device 800.
  • the external memory card communicates with the processor 810 through the external memory interface 820 to realize the data storage function. For example, save music, video and other files in an external memory card.
  • the internal memory 821 may be used to store computer executable program code, where the executable program code includes instructions.
  • the internal memory 821 may include a storage program area and a storage data area.
  • the storage program area can store an operating system, an application program (such as a sound playback function, an image playback function, etc.) required by at least one function, and the like.
  • the data storage area can store data (such as audio data, phone book, etc.) created during the use of the electronic device 800.
  • the internal memory 821 may include a high-speed random access memory, and may also include a non-volatile memory, such as at least one magnetic disk storage device, a flash memory device, a universal flash storage (UFS), and the like.
  • the processor 810 executes various functional applications and data processing of the electronic device 800 by running instructions stored in the internal memory 821 and/or instructions stored in a memory provided in the processor.
  • the electronic device 800 can implement audio functions through an audio module 870, a speaker 870A, a receiver 870B, a microphone 870C, a headphone interface 870D, and an application processor. For example, music playback, recording, etc.
  • the audio module 870 is used to convert digital audio information into an analog audio signal for output, and is also used to convert an analog audio input into a digital audio signal.
  • the audio module 870 can also be used to encode and decode audio signals.
  • the audio module 870 may be provided in the processor 810, or part of the functional modules of the audio module 870 may be provided in the processor 810.
  • the speaker 870A, also called a "loudspeaker", is used to convert audio electrical signals into sound signals.
  • the electronic device 800 can listen to music through the speaker 870A, or listen to a hands-free call.
  • the receiver 870B, also called an "earpiece", is used to convert audio electrical signals into sound signals.
  • when the electronic device 800 answers a call or a voice message, the voice can be received by bringing the receiver 870B close to the human ear.
  • the microphone 870C, also called a "mic" or "mike", is used to convert sound signals into electrical signals. When making a call or sending a voice message, the user can speak with the mouth close to the microphone 870C to input the sound signal into the microphone 870C.
  • the electronic device 800 may be provided with at least one microphone 870C. In other embodiments, the electronic device 800 may be provided with two microphones 870C, which can implement noise reduction functions in addition to collecting sound signals. In some other embodiments, the electronic device 800 can also be equipped with three, four or more microphones 870C to collect sound signals, reduce noise, identify sound sources, and realize directional recording functions.
  • the earphone interface 870D is used to connect wired earphones.
  • the earphone interface 870D may be the USB interface 830, a 3.5 mm open mobile terminal platform (OMTP) standard interface, or a cellular telecommunications industry association of the USA (CTIA) standard interface.
  • the pressure sensor 880A is used to sense the pressure signal and can convert the pressure signal into an electrical signal.
  • the pressure sensor 880A may be provided on the display screen 894.
  • the capacitive pressure sensor may include at least two parallel plates with conductive materials.
  • the electronic device 800 may also calculate the touched position according to the detection signal of the pressure sensor 880A.
  • touch operations that act on the same touch position but have different touch operation strengths may correspond to different operation instructions. For example: when a touch operation whose intensity of the touch operation is less than the first pressure threshold is applied to the short message application icon, an instruction to view the short message is executed. When a touch operation with a touch operation intensity greater than or equal to the first pressure threshold acts on the short message application icon, an instruction to create a new short message is executed.
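  • A minimal sketch of this force-dependent dispatch (the threshold value and instruction names are assumptions used only for illustration) might look like:

```python
FIRST_PRESSURE_THRESHOLD = 0.5   # hypothetical normalized force threshold

def on_short_message_icon_touch(touch_force):
    # Light press: view the short message; firm press: create a new short message.
    if touch_force < FIRST_PRESSURE_THRESHOLD:
        return "view_short_message"
    return "create_new_short_message"

print(on_short_message_icon_touch(0.3))   # view_short_message
print(on_short_message_icon_touch(0.8))   # create_new_short_message
```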
  • the gyro sensor 880B can be used to determine the movement posture of the electronic device 800.
  • in some embodiments, the angular velocity of the electronic device 800 around three axes (i.e., the x, y, and z axes) can be determined through the gyro sensor 880B.
  • the gyro sensor 880B can be used for shooting anti-shake.
  • the gyroscope sensor 880B detects the jitter angle of the electronic device 800, calculates the distance that the lens module needs to compensate according to the angle, and allows the lens to counteract the jitter of the electronic device 800 through reverse movement to achieve anti-shake.
  • the gyro sensor 880B can also be used for navigation and somatosensory game scenes.
  • the air pressure sensor 880C is used to measure air pressure.
  • the electronic device 800 uses the air pressure value measured by the air pressure sensor 880C to calculate the altitude to assist positioning and navigation.
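  • As an illustration (using the standard barometric approximation and an assumed sea-level reference pressure, not necessarily the device's own algorithm), altitude can be estimated from the measured pressure as follows:

```python
SEA_LEVEL_PRESSURE_HPA = 1013.25   # assumed reference; real devices often calibrate this

def altitude_from_pressure(pressure_hpa):
    # International barometric formula, valid in the lower atmosphere.
    return 44330.0 * (1.0 - (pressure_hpa / SEA_LEVEL_PRESSURE_HPA) ** (1.0 / 5.255))

print(altitude_from_pressure(954.6))   # roughly 500 m
```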
  • the magnetic sensor 880D includes a Hall sensor.
  • the electronic device 800 can use the magnetic sensor 880D to detect the opening and closing of the flip holster.
  • when the electronic device 800 is a flip phone, the electronic device 800 can detect the opening and closing of the flip cover according to the magnetic sensor 880D.
  • then, according to the detected opening and closing state of the holster or the flip cover, features such as automatic unlocking of the flip cover can be set.
  • the acceleration sensor 880E can detect the magnitude of the acceleration of the electronic device 800 in various directions (generally three axes). When the electronic device 800 is stationary, the magnitude and direction of gravity can be detected. It can also be used to identify the posture of electronic devices, and apply to applications such as horizontal and vertical screen switching, pedometers, and so on.
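  • A simplified sketch of such posture identification for horizontal/vertical screen switching is shown below (the axis convention and the tilt threshold are assumptions, not the vendor's algorithm):

```python
import math

def screen_orientation(ax, ay, az, tilt_threshold_deg=30.0):
    """ax, ay, az: acceleration in m/s^2 along the device x, y, z axes while roughly static."""
    # Angle of the gravity vector projected onto the screen plane.
    roll_deg = math.degrees(math.atan2(ax, ay))
    if abs(roll_deg) < tilt_threshold_deg:
        return "portrait"
    if abs(abs(roll_deg) - 90.0) < tilt_threshold_deg:
        return "landscape"
    return "unchanged"   # keep the previous orientation near the decision boundary

print(screen_orientation(0.3, 9.7, 0.4))   # portrait
print(screen_orientation(9.6, 0.5, 0.6))   # landscape
```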
  • the distance sensor 880F is used to measure distance. The electronic device 800 can measure distance by infrared or laser. In some embodiments, when shooting a scene, the electronic device 800 may use the distance sensor 880F to measure the distance to achieve fast focusing.
  • the proximity light sensor 880G may include, for example, a light emitting diode (LED) and a light detector, such as a photodiode.
  • the light emitting diode may be an infrared light emitting diode.
  • the electronic device 800 emits infrared light to the outside through the light emitting diode.
  • the electronic device 800 uses a photodiode to detect infrared reflected light from nearby objects. When sufficient reflected light is detected, it can be determined that there is an object near the electronic device 800. When insufficient reflected light is detected, the electronic device 800 can determine that there is no object near the electronic device 800.
  • the electronic device 800 can use the proximity light sensor 880G to detect that the user holds the electronic device 800 close to the ear to talk, so as to automatically turn off the screen to save power.
  • the proximity light sensor 880G can also be used in holster mode and pocket mode to automatically unlock and lock the screen.
  • the ambient light sensor 880L is used to sense the brightness of the ambient light.
  • the electronic device 800 can adaptively adjust the brightness of the display screen 894 according to the perceived brightness of the ambient light.
  • the ambient light sensor 880L can also be used to automatically adjust the white balance when taking pictures.
  • the ambient light sensor 880L can also cooperate with the proximity light sensor 880G to detect whether the electronic device 800 is in the pocket to prevent accidental touch.
  • the fingerprint sensor 880H is used to collect fingerprints.
  • the electronic device 800 can use the collected fingerprint characteristics to realize fingerprint unlocking, access application locks, fingerprint photographs, fingerprint answering calls, and so on.
  • the temperature sensor 880J is used to detect temperature.
  • the electronic device 800 uses the temperature detected by the temperature sensor 880J to execute a temperature processing strategy. For example, when the temperature reported by the temperature sensor 880J exceeds a threshold value, the electronic device 800 executes to reduce the performance of the processor located near the temperature sensor 880J, so as to reduce power consumption and implement thermal protection.
  • in some other embodiments, when the temperature is below another threshold, the electronic device 800 heats the battery 842 to avoid an abnormal shutdown of the electronic device 800 caused by low temperature.
  • in still other embodiments, when the temperature is below a further threshold, the electronic device 800 boosts the output voltage of the battery 842 to avoid an abnormal shutdown caused by low temperature.
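  • A minimal sketch of such a temperature processing strategy (the thresholds and actions are assumed example values, not the device's actual policy) could be:

```python
THROTTLE_THRESHOLD_C = 45.0
CRITICAL_THRESHOLD_C = 55.0

def temperature_policy(temp_c):
    if temp_c >= CRITICAL_THRESHOLD_C:
        return {"cpu_max_freq_ratio": 0.5, "charge": False}   # strong thermal protection
    if temp_c >= THROTTLE_THRESHOLD_C:
        return {"cpu_max_freq_ratio": 0.8, "charge": True}    # mild throttling
    return {"cpu_max_freq_ratio": 1.0, "charge": True}        # normal operation

print(temperature_policy(48.0))
```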
  • the touch sensor 880K is also called “touch device”.
  • the touch sensor 880K can be arranged on the display screen 894, and the touch screen is composed of the touch sensor 880K and the display screen 894, which is also called a “touch screen”.
  • the touch sensor 880K is used to detect touch operations acting on or near it.
  • the touch sensor can pass the detected touch operation to the application processor to determine the type of touch event.
  • the visual output related to the touch operation may be provided through the display screen 894.
  • the touch sensor 880K may also be disposed on the surface of the electronic device 800, which is different from the position of the display screen 894.
  • the bone conduction sensor 880M can acquire vibration signals.
  • the bone conduction sensor 880M can obtain the vibration signal of the vibrating bone mass of the human voice.
  • the bone conduction sensor 880M can also contact the human pulse and receive the blood pressure pulse signal.
  • the bone conduction sensor 880M may also be provided in the earphone, combined with the bone conduction earphone.
  • the audio module 870 can parse the voice signal based on the vibration signal of the vibrating bone block of the voice obtained by the bone conduction sensor 880M, and realize the voice function.
  • the application processor can analyze the heart rate information based on the blood pressure beating signal obtained by the bone conduction sensor 880M, and realize the heart rate detection function.
  • the button 890 includes a power button, a volume button, and so on.
  • the button 890 may be a mechanical button. It can also be a touch button.
  • the electronic device 800 may receive key input, and generate key signal input related to user settings and function control of the electronic device 800.
  • the motor 891 can generate vibration prompts.
  • the motor 891 can be used for incoming call vibration notification, and can also be used for touch vibration feedback.
  • touch operations applied to different applications can correspond to different vibration feedback effects.
  • for touch operations acting on different areas of the display screen 894, the motor 891 can also produce different vibration feedback effects.
  • different application scenarios (for example: time reminder, receiving information, alarm clock, games, etc.) can also correspond to different vibration feedback effects.
  • the touch vibration feedback effect can also support customization.
  • the indicator 892 can be an indicator light, which can be used to indicate the charging status, power change, and can also be used to indicate messages, missed calls, notifications, and so on.
  • the SIM card interface 895 is used to connect to the SIM card.
  • the SIM card can be inserted into the SIM card interface 895 or pulled out from the SIM card interface 895 to achieve contact and separation with the electronic device 800.
  • the electronic device 800 may support 1 or N SIM card interfaces, and N is a positive integer greater than 1.
  • the SIM card interface 895 can support Nano SIM cards, Micro SIM cards, SIM cards, etc.
  • the same SIM card interface 895 can insert multiple cards at the same time. The types of the multiple cards can be the same or different.
  • the SIM card interface 895 can also be compatible with different types of SIM cards.
  • the SIM card interface 895 can also be compatible with external memory cards.
  • the electronic device 800 interacts with the network through the SIM card to implement functions such as call and data communication.
  • the electronic device 800 uses an eSIM, that is, an embedded SIM card.
  • the eSIM card can be embedded in the electronic device 800 and cannot be separated from the electronic device 800.
  • the electronic device 800 shown in FIG. 8 can implement various processes of the methods provided in the embodiments shown in FIGS. 2 to 4 of this application.
  • the operations and/or functions of each module in the electronic device 800 are respectively for implementing the corresponding processes in the foregoing method embodiments.
  • processor 810 in the electronic device 800 shown in FIG. 8 may be a system-on-chip SOC, and the processor 810 may include a central processing unit (CPU), and may further include other types of processors. For example: Graphics Processing Unit (GPU), etc.
  • each processor or processing unit inside the processor 810 can cooperate to implement the foregoing method flow, and the corresponding software program of each processor or processing unit can be stored in the internal memory 821.
  • the present application also provides an electronic device.
  • the device includes a storage medium and a central processing unit.
  • the storage medium may be a non-volatile storage medium.
  • a computer executable program is stored in the storage medium.
  • the central processing unit is connected to the non-volatile storage medium and executes the computer executable program to implement the methods provided by the embodiments shown in FIGS. 2 to 4 of this application.
  • the processors involved may include, for example, a CPU, a DSP, a microcontroller, or a digital signal processor, and may also include a GPU, an embedded neural-network processing unit (NPU), and an image signal processor (ISP); the processors may also include necessary hardware accelerators or logic processing hardware circuits, such as an ASIC, or one or more integrated circuits used to control the execution of the technical solutions of this application, etc.
  • the processor may have a function of operating one or more software programs, and the software programs may be stored in a storage medium.
  • the embodiments of the present application also provide a computer-readable storage medium, which stores a computer program that, when running on a computer, causes the computer to execute the methods provided by the embodiments shown in FIGS. 2 to 4 of the present application.
  • the embodiments of the present application also provide a computer program product.
  • the computer program product includes a computer program that, when running on a computer, causes the computer to execute the method provided by the embodiments shown in FIGS. 2 to 4 of the present application.
  • in this application, "at least one" refers to one or more, and "multiple" refers to two or more.
  • "and/or" describes the association relationship of associated objects, indicating that three relationships can exist; for example, "A and/or B" can mean that A exists alone, A and B exist at the same time, or B exists alone, where A and B can be singular or plural.
  • the character "/" generally indicates that the associated objects before and after it are in an "or" relationship.
  • "at least one of the following items" and similar expressions refer to any combination of these items, including any combination of a single item or plural items.
  • for example, at least one of a, b, and c can represent: a, b, c, a and b, a and c, b and c, or a and b and c, where each of a, b, and c can be singular or plural.
  • if any function is implemented in the form of a software functional unit and sold or used as an independent product, it can be stored in a computer-readable storage medium.
  • the technical solution of the present application, in essence, or the part that contributes to the existing technology, or a part of the technical solution, can be embodied in the form of a software product; the computer software product is stored in a storage medium and includes several instructions used to make a computer device (which may be a personal computer, a server, or a network device, etc.) execute all or part of the steps of the methods described in the various embodiments of the present application.
  • the aforementioned storage media include: a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), magnetic disks, optical discs, etc.

Landscapes

  • Engineering & Computer Science (AREA)
  • Signal Processing (AREA)
  • Physics & Mathematics (AREA)
  • Human Computer Interaction (AREA)
  • Acoustics & Sound (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Quality & Reliability (AREA)
  • Computational Linguistics (AREA)
  • Multimedia (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Environmental & Geological Engineering (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Circuit For Audible Band Transducer (AREA)

Abstract

A sound pickup method and apparatus (600) and an electronic device (800). The method comprises: acquiring the orientation of a user relative to the electronic device (800) (201), wherein the electronic device (800) is provided with at least three microphones; selecting, from preconfigured fixed beams of the electronic device (800), a fixed beam closest to the orientation as a main beam, and selecting at least one fixed beam as a side beam in the order of decreasing distances to the orientation (202); when the microphones receive a sound signal, calculating a main output signal of the sound signal by using a beamforming coefficient of the main beam, and calculating a side output signal of the sound signal by using a beamforming coefficient of the side beam (203); and performing filtering processing on the main output signal by using the side output signal to obtain a target sound signal (204), thereby alleviating the problem of speech distortion and the problem of incomplete elimination of human voice interference.

Description

Sound pickup method, device and electronic equipment

Technical field

This application relates to the technical field of smart terminals, in particular to methods, devices and electronic equipment for picking up sound.

Background
Most terminal electronic devices on the market, such as smart phones and tablets, carry a voice assistant application. Its main function is to control the electronic device through voice commands without the user touching the electronic device such as a mobile phone, and to complete some low-difficulty, high-frequency command operations, such as playing music, querying the weather, setting alarms, making calls, map navigation, and so on.

The above-mentioned human-computer interaction process generally includes: using a microphone of the electronic device to pick up an audio signal; using a front-end enhancement algorithm to estimate a clean voice signal from the audio signal; and using that voice signal for voice wake-up and voice recognition. The front-end enhancement algorithm mainly extracts a clean speech signal through noise cancellation. Noise cancellation includes echo cancellation, interference suppression, and background noise removal. The echo that needs to be eliminated in echo cancellation is generally the sound played by the electronic device's own speaker during human-computer interaction; the interference in interference suppression is generally directional noise, such as the TV sound in a living room environment or the loudspeaker sound in an in-vehicle environment. The performance of the front-end enhancement algorithm directly affects the success rate of human-computer interaction and ultimately affects the user experience.

Take mobile phones as an example. The front-end enhancement algorithm mainly uses the microphone on the mobile phone to eliminate noise. Considering the limitations of power consumption and computing resources, in most cases only one microphone is used for single-mic noise reduction; this is called a single-channel noise reduction algorithm. Common single-channel noise reduction algorithms include spectral subtraction, the Wiener filtering algorithm, and deep learning methods. The single-channel noise reduction algorithm has no effect on unpredictable non-stationary noise, and speech distortion is serious under low signal-to-noise ratio conditions.

In order to achieve a better noise reduction effect, dual-channel noise reduction algorithms based on two microphones are becoming more and more popular in electronic devices. They are mainly used in scenarios that are not sensitive to power consumption, such as in-vehicle scenarios where the user can charge the electronic device at any time, and they use the two microphones located at the top and the bottom of the phone for noise suppression. The main idea of the dual-channel noise reduction algorithm is to select one microphone as the main microphone and one microphone as the auxiliary microphone: first, the time-frequency information of the noise in the main microphone data is determined based on a harmonic detection algorithm for human voice; then, based on the idea of filtering, the auxiliary microphone data is used to filter out the noise in the main microphone data, improving the voice quality and achieving noise reduction. However, the harmonic detection algorithm cannot distinguish between human voice interference and the target human voice containing the wake-up word, so human voice interference is basically difficult to eliminate.
Summary of the invention

The embodiments of the present application provide a sound pickup method to alleviate the problem of voice distortion and the problem of incomplete elimination of human voice interference.
In a first aspect, an embodiment of the present application provides a sound pickup method, including:

obtaining the azimuth of the user relative to the electronic device, where the electronic device is provided with N microphones and N is an integer greater than or equal to 3; the above-mentioned electronic device may include a mobile terminal (mobile phone), a computer, a PAD, a wearable device, a smart screen, a drone, an intelligent connected vehicle (ICV), a smart/intelligent car, or an in-vehicle device; optionally, in order to achieve a better sound pickup effect, the N microphones may be arranged dispersedly on the electronic device, for example in different parts of the electronic device, and the position of each microphone includes but is not limited to: the upper part, the lower part, the top, the bottom, the upper surface where the screen is located, and/or the back of the electronic device;

selecting, among the preset fixed beams of the electronic device, the fixed beam closest to the azimuth as the main beam, and selecting at least one fixed beam as the secondary beam in the order of distance from the azimuth from far to near, where the number of preset fixed beams is greater than or equal to 2;

when the N microphones receive a sound signal, calculating the main output signal of the sound signal using the beamforming coefficients of the main beam, and calculating the secondary output signal of the sound signal using the beamforming coefficients of the secondary beam;

filtering the main output signal using the secondary output signal to obtain the target sound signal.

In this method, the azimuth of the user relative to the electronic device is obtained, and the main beam and the secondary beam are selected from the preset fixed beams of the electronic device according to this azimuth, so that the sound signal of the target sound source can be obtained more accurately from the received sound signal, effectively reducing the human voice interference in the target sound signal. At least 3 microphones are used to receive the sound signal; owing to the influence of the electronic device's casing, noise can be distinguished better, the effect of the filtering processing is enhanced, and the problems of voice distortion under low signal-to-noise ratio conditions and of incomplete elimination of human voice interference are alleviated.
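As an illustration of how these steps fit together, the following is a minimal sketch of the pick-up pipeline for one signal frame. It is not the patent's reference implementation: the beamforming coefficients are assumed to be precomputed, and the per-bin spectral gain is only a stand-in for the filtering step, whose exact form is not fixed here.

```python
import numpy as np

def fixed_beam_output(mic_frames_fft, beam_coeffs_fft):
    """Apply one set of frequency-domain beamforming coefficients.

    mic_frames_fft  : complex array (num_mics, num_bins), one FFT frame per microphone
    beam_coeffs_fft : complex array of the same shape, precomputed fixed-beam weights
    """
    return np.sum(np.conj(beam_coeffs_fft) * mic_frames_fft, axis=0)   # shape (num_bins,)

def pick_up_frame(mic_frames_fft, main_coeffs, secondary_coeffs_list, floor=1e-6):
    # Main beam points toward the user's azimuth; secondary beams point away from it.
    main_out = fixed_beam_output(mic_frames_fft, main_coeffs)
    secondary_outs = [fixed_beam_output(mic_frames_fft, c) for c in secondary_coeffs_list]

    # Use the secondary outputs as a noise reference to filter the main output.
    # A per-bin spectral gain is used here purely as an illustrative stand-in.
    noise_power = sum(np.abs(s) ** 2 for s in secondary_outs) / len(secondary_outs)
    main_power = np.abs(main_out) ** 2
    gain = main_power / (main_power + noise_power + floor)
    return gain * main_out   # frequency-domain estimate of the target sound signal
```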
In a possible implementation manner, obtaining the azimuth of the user relative to the electronic device includes:

obtaining the image captured by the camera of the electronic device;

if the facial information of the user of the electronic device is recognized from the image, obtaining the azimuth of the user relative to the electronic device according to the position information of the facial information in the image;

if the user's facial information is not recognized from the image, obtaining the placement position of the electronic device, and obtaining the azimuth of the user relative to the electronic device according to the placement position.

By obtaining the azimuth of the user relative to the electronic device, more accurate information about the target person's speech can be obtained, which brings more prior information for subsequent signal processing.
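As a rough illustration of the face-based branch (a pinhole-camera approximation with an assumed field of view, not the patent's formula), the azimuth can be derived from where the detected face lies in the camera image:

```python
import math

def azimuth_from_face(face_center_x, image_width, horizontal_fov_deg=78.0):
    """Map the horizontal position of a detected face to an azimuth relative to the camera axis.

    face_center_x      : x coordinate (pixels) of the face bounding-box center
    image_width        : image width in pixels
    horizontal_fov_deg : camera horizontal field of view; 78 degrees is an assumed value
    """
    # Offset of the face from the image center, normalized to [-0.5, 0.5].
    offset = face_center_x / image_width - 0.5
    # Pinhole-camera approximation: tan(angle) scales linearly with the pixel offset.
    half_fov = math.radians(horizontal_fov_deg / 2.0)
    return math.degrees(math.atan(2.0 * offset * math.tan(half_fov)))

print(azimuth_from_face(face_center_x=1600, image_width=1920))   # face to the right of center
```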
In a possible implementation manner, selecting, among the preset fixed beams of the electronic device, the fixed beam closest to the azimuth as the main beam, and selecting at least one fixed beam as the secondary beam in the order of distance from the azimuth from far to near, includes:

calculating the ratio K_k of the azimuth with respect to each fixed beam k: K_k = included angle Δ_k / beam width of the fixed beam k, where K_k is the ratio of the azimuth with respect to the fixed beam k, the included angle Δ_k is the angle between the azimuth and the direction of the fixed beam k, k = 1, 2, ..., M, and M is the number of groups of fixed beams;

selecting the fixed beam corresponding to the smallest ratio as the main beam, and selecting the fixed beams corresponding to at least one ratio as the secondary beams, starting from the largest ratio, in descending order of the ratios.
In a possible implementation manner, before obtaining the azimuth of the user relative to the electronic device, the method further includes:

obtaining the beamforming coefficients, directions, and beam widths of M groups of fixed beams, where M is an integer greater than or equal to 2.
In a possible implementation manner, obtaining the beamforming coefficients, directions, and beam widths of the preset number of groups of fixed beams includes:

establishing a three-dimensional Cartesian coordinate system for the electronic device;

obtaining the coordinates of the N microphones in the coordinate system;

calculating the steering vector of the target sound source under ideal conditions according to the coordinates of the N microphones;

obtaining the frequency domain response matrix of the electronic device's housing with respect to the microphones;

calculating the true steering vector of the target sound source according to the steering vector under ideal conditions and the frequency domain response matrix;

calculating the beamforming coefficients, directions, and beam widths of the preset number of groups of fixed beams according to the true steering vector.
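The ideal (free-field) steering vector in this procedure can be computed from the microphone coordinates alone; the sketch below shows that step under assumed values (far-field plane-wave model, example microphone coordinates). The housing's frequency domain response matrix, which the method then multiplies in to obtain the true steering vector, would come from measurement or simulation and is only stubbed here as an identity matrix.

```python
import numpy as np

SPEED_OF_SOUND = 343.0  # m/s

def ideal_steering_vector(mic_xyz, azimuth_deg, elevation_deg, freq_hz):
    """Far-field plane-wave steering vector for one frequency.

    mic_xyz : (N, 3) microphone coordinates in the device's Cartesian frame, in meters
    """
    az, el = np.radians(azimuth_deg), np.radians(elevation_deg)
    # Unit vector pointing from the device toward the target sound source.
    direction = np.array([np.cos(el) * np.cos(az), np.cos(el) * np.sin(az), np.sin(el)])
    delays = mic_xyz @ direction / SPEED_OF_SOUND          # per-mic propagation delay (s)
    return np.exp(-2j * np.pi * freq_hz * delays)          # shape (N,)

# Three mics roughly at the top, bottom and back of a phone (coordinates are assumptions).
mics = np.array([[0.00, 0.075, 0.000],
                 [0.00, -0.075, 0.000],
                 [0.01, 0.060, -0.008]])
a_ideal = ideal_steering_vector(mics, azimuth_deg=30.0, elevation_deg=0.0, freq_hz=1000.0)

# True steering vector = housing response matrix (measured/simulated) times the ideal one.
H_housing = np.eye(3, dtype=complex)   # placeholder; a real matrix would be measured
a_true = H_housing @ a_ideal
```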
In a second aspect, an embodiment of the present application provides a sound pickup device, including:

an azimuth obtaining unit, configured to obtain the azimuth of the user relative to the electronic device, where the electronic device is provided with N microphones and N is an integer greater than or equal to 3;

a beam selection unit, configured to select, among the preset fixed beams of the electronic device, the fixed beam closest to the azimuth obtained by the azimuth obtaining unit as the main beam, and to select at least one fixed beam as the secondary beam in the order of distance from the azimuth from far to near;

a signal calculation unit, configured to, when the N microphones receive a sound signal, calculate the main output signal of the sound signal using the beamforming coefficients of the main beam selected by the beam selection unit, and calculate the secondary output signal of the sound signal using the beamforming coefficients of the secondary beam selected by the beam selection unit;

a filtering unit, configured to filter the main output signal using the secondary output signal calculated by the signal calculation unit to obtain the target sound signal.

In a possible implementation manner, the azimuth obtaining unit includes:

an image acquisition subunit, configured to acquire the image captured by the camera of the electronic device;

an azimuth obtaining subunit, configured to: if the facial information of the user of the electronic device is recognized from the image acquired by the image acquisition subunit, obtain the azimuth of the user relative to the electronic device according to the position information of the facial information in the image; and if the user's facial information is not recognized from the image acquired by the image acquisition subunit, obtain the placement position of the electronic device and obtain the azimuth of the user relative to the electronic device according to the placement position.

In a possible implementation manner, the beam selection unit includes:
a ratio calculation subunit, configured to calculate the ratio K_k of the azimuth with respect to each fixed beam k: K_k = included angle Δ_k / beam width of the fixed beam k, where K_k is the ratio of the azimuth with respect to the fixed beam k, the included angle Δ_k is the angle between the azimuth and the direction of the fixed beam k, k = 1, 2, ..., M, and M is the number of groups of fixed beams;

a beam selection subunit, configured to select, among the ratios calculated by the ratio calculation subunit, the fixed beam corresponding to the smallest ratio as the main beam, and to select the fixed beams corresponding to at least one ratio as the secondary beams, starting from the largest ratio, in descending order of the ratios.
In a possible implementation manner, the device further includes:

a beam obtaining unit, configured to obtain the beamforming coefficients, directions, and beam widths of M groups of fixed beams, where M is an integer greater than or equal to 2.

In a possible implementation manner, the beam obtaining unit includes:

a coordinate system establishment subunit, configured to establish a three-dimensional Cartesian coordinate system for the electronic device;

a coordinate obtaining subunit, configured to obtain the coordinates of the N microphones in the coordinate system;

an ideal steering vector calculation subunit, configured to calculate the steering vector of the target sound source under ideal conditions according to the coordinates of the N microphones;

a matrix obtaining subunit, configured to obtain the frequency domain response matrix of the electronic device's housing with respect to the microphones;

a true steering vector calculation subunit, configured to calculate the true steering vector of the target sound source according to the steering vector under ideal conditions and the frequency domain response matrix;

a fixed beam calculation subunit, configured to calculate the beamforming coefficients, directions, and beam widths of the preset number of groups of fixed beams according to the true steering vector.
In a third aspect, an embodiment of the present application provides an electronic device, including:

a display screen; one or more processors; a memory; multiple application programs; and one or more computer programs, where the one or more computer programs are stored in the memory and include instructions that, when executed by the device, cause the device to perform the following steps:

obtaining the azimuth of the user relative to the electronic device, where the electronic device is provided with N microphones and N is an integer greater than or equal to 3;

selecting, among the preset fixed beams of the electronic device, the fixed beam closest to the azimuth as the main beam, and selecting at least one fixed beam as the secondary beam in the order of distance from the azimuth from far to near;

when the N microphones receive a sound signal, calculating the main output signal of the sound signal using the beamforming coefficients of the main beam, and calculating the secondary output signal of the sound signal using the beamforming coefficients of the secondary beam;

filtering the main output signal using the secondary output signal to obtain the target sound signal.

In a possible implementation manner, when the instructions are executed by the device, the step of obtaining the azimuth of the user relative to the electronic device includes:

obtaining the image captured by the camera of the electronic device;

if the facial information of the user of the electronic device is recognized from the image, obtaining the azimuth of the user relative to the electronic device according to the position information of the facial information in the image;

if the user's facial information is not recognized from the image, obtaining the placement position of the electronic device, and obtaining the azimuth of the user relative to the electronic device according to the placement position.
In a possible implementation manner, when the instructions are executed by the device, the step of selecting, among the preset fixed beams of the electronic device, the fixed beam closest to the azimuth as the main beam, and selecting at least one fixed beam as the secondary beam in the order of distance from the azimuth from far to near, includes:

calculating the ratio K_k of the azimuth with respect to each fixed beam k: K_k = included angle Δ_k / beam width of the fixed beam k, where K_k is the ratio of the azimuth with respect to the fixed beam k, the included angle Δ_k is the angle between the azimuth and the direction of the fixed beam k, k = 1, 2, ..., M, and M is the number of groups of fixed beams;

selecting the fixed beam corresponding to the smallest ratio as the main beam, and selecting the fixed beams corresponding to at least one ratio as the secondary beams, starting from the largest ratio, in descending order of the ratios.
In a possible implementation manner, when the instructions are executed by the device, the following step is further performed before the step of obtaining the azimuth of the user relative to the electronic device:

obtaining the beamforming coefficients, directions, and beam widths of M groups of fixed beams, where M is an integer greater than or equal to 2.

In a possible implementation manner, when the instructions are executed by the device, the step of obtaining the beamforming coefficients, directions, and beam widths of the preset number of groups of fixed beams includes:

establishing a three-dimensional Cartesian coordinate system for the electronic device;

obtaining the coordinates of the N microphones in the coordinate system;

calculating the steering vector of the target sound source under ideal conditions according to the coordinates of the N microphones;

obtaining the frequency domain response matrix of the electronic device's housing with respect to the microphones;

calculating the true steering vector of the target sound source according to the steering vector under ideal conditions and the frequency domain response matrix;

calculating the beamforming coefficients, directions, and beam widths of the preset number of groups of fixed beams according to the true steering vector.

In a fourth aspect, an embodiment of the present application provides a computer-readable storage medium in which a computer program is stored; when the computer program runs on a computer, the computer is caused to execute the method of the first aspect.

In a fifth aspect, an embodiment of the present application provides a computer program, which is used to execute the method of the first aspect when the computer program is executed by a computer.

In a possible design, the program in the fifth aspect may be stored in whole or in part on a storage medium packaged together with the processor, or may be stored in part or in whole in a memory not packaged together with the processor.
附图说明Description of the drawings
为了更清楚地说明本发明实施例的技术方案,下面将对实施例中所需要使用的附图作简单地介绍,显而易见地,下面描述中的附图仅仅是本发明的一些实施例,对于本领域普通技术人员来讲,在不付出创造性劳动的前提下,还可以根据这些附图获得其它的附图。In order to explain the technical solutions of the embodiments of the present invention more clearly, the following will briefly introduce the drawings needed in the embodiments. Obviously, the drawings in the following description are only some embodiments of the present invention. For those of ordinary skill in the art, without creative work, other drawings can be obtained from these drawings.
图1为本申请实施例电子设备上麦克风设置示例图;FIG. 1 is an example diagram of microphone settings on an electronic device according to an embodiment of the application;
图2为本申请拾音方法一个实施例的流程图;FIG. 2 is a flowchart of an embodiment of the sound pickup method of this application;
图3a为本申请拾音方法另一个实施例的流程图;Fig. 3a is a flowchart of another embodiment of a sound pickup method according to the present application;
图3b为本申请电子设备的三维笛卡尔坐标系示例图;Figure 3b is an example diagram of the three-dimensional Cartesian coordinate system of the electronic device of this application;
图3c为本申请实施例方位角和俯仰角示例图;FIG. 3c is an example diagram of the azimuth angle and the pitch angle according to the embodiment of the application;
图3d为本申请实施例电子设备摆放位置示例图;FIG. 3d is an example diagram of the placement position of the electronic device according to the embodiment of the application;
图4为本申请一个步骤的实现方法一个实施例的流程图;FIG. 4 is a flowchart of an embodiment of a method for implementing one step of this application;
图5a和图5b为本申请拾音方法所适用的电子设备的一种结构图;5a and 5b are a structural diagram of an electronic device to which the sound pickup method of this application is applicable;
图6a为本申请拾音装置一个实施例的结构示意图;Fig. 6a is a schematic structural diagram of an embodiment of a sound pickup device according to the present application;
图6b为本申请拾音装置一个单元的一个实施例的结构示意图;Fig. 6b is a schematic structural diagram of an embodiment of a unit of the sound pickup device of the present application;
图6c为本申请拾音装置另一个单元的一个实施例的结构示意图;Fig. 6c is a schematic structural diagram of an embodiment of another unit of the sound pickup device of the present application;
图7a为本申请拾音装置又一个实施例的结构示意图;Fig. 7a is a schematic structural diagram of another embodiment of a sound pickup device according to the present application;
图7b为本申请拾音装置另一个单元一个实施例的结构示意图;Fig. 7b is a schematic structural diagram of an embodiment of another unit of the sound pickup device of the present application;
图8为本申请电子设备一个实施例的结构示意图。FIG. 8 is a schematic structural diagram of an embodiment of an electronic device of this application.
具体实施方式Detailed ways
本申请的实施方式部分使用的术语仅用于对本申请的具体实施例进行解释,而非旨在限定本申请。The terms used in the implementation mode part of this application are only used to explain specific embodiments of this application, and are not intended to limit this application.
In existing implementations, single-channel noise-reduction algorithms suffer from severe speech distortion under low signal-to-noise-ratio conditions, and dual-channel noise-reduction algorithms can hardly remove interfering human voices. For this reason, this application proposes a sound pickup method that can alleviate speech distortion under low signal-to-noise-ratio conditions and can also reduce interference from other human voices.
本申请实施例中电子设备上设置有至少3个麦克风,各个麦克风在电子设备上的设置位置本申请实施例不作限定。可选地,为了达到更好的拾音效果,所述至少3个麦克风在电子设备上分散设置,例如设置在电子设备的不同部位,每个麦克风设置的位置包括但不限于:电子设备的上部、下部、顶部、底部、屏幕所在的上表面、和/或背部等。在一种可能的实现方式中,参见图1所示,3个麦克风可以分别设置于:电子设备的顶部,电子设备的底部,电子设备的背部。In the embodiment of the present application, at least three microphones are provided on the electronic device, and the location of each microphone on the electronic device is not limited in the embodiment of the present application. Optionally, in order to achieve a better sound pickup effect, the at least three microphones are dispersedly arranged on the electronic device, for example, arranged in different parts of the electronic device, and the position of each microphone includes but is not limited to: the upper part of the electronic device , Bottom, top, bottom, top surface where the screen is located, and/or back, etc. In a possible implementation manner, as shown in FIG. 1, three microphones can be respectively arranged on the top of the electronic device, the bottom of the electronic device, and the back of the electronic device.
本申请实施例可以适用于电子设备的语音助手应用的场景下,为语音唤醒和语音识别提供较为干净的语音信号,也可以适用于其他场景下,例如为某个人进行录音、录像等需要较为干净的语音信号的场景下。The embodiments of this application can be applied to the scenario of voice assistant application of electronic equipment, providing relatively clean voice signals for voice wake-up and voice recognition, and can also be applied to other scenarios, such as recording and video recording for a certain person, which need to be relatively clean. The scene of the voice signal.
图2为本申请拾音方法一个实施例的流程图,如图2所示,上述方法可以包括:FIG. 2 is a flowchart of an embodiment of a sound pickup method according to this application. As shown in FIG. 2, the above method may include:
步骤201:获得用户相对电子设备的方位;所述电子设备设置有N个麦克风,N≥3。Step 201: Obtain the position of the user relative to the electronic device; the electronic device is provided with N microphones, N≥3.
步骤202:在所述电子设备的预设固定波束中,选择距离所述方位最近的固定波束作为主波束,按照距离所述方位从远到近的顺序选择至少一个固定波束作为副波束。Step 202: Among the preset fixed beams of the electronic device, the fixed beam closest to the azimuth is selected as the main beam, and at least one fixed beam is selected as the secondary beam in the order from the farthest to the shortest from the azimuth.
步骤203:当所述N个麦克风接收到声音信号时,使用所述主波束的波束形成系数计算所述声音信号的主输出信号,并且,使用所述副波束的波束形成系数计算所述声音信号的副输出信号。Step 203: When the N microphones receive the sound signal, use the beamforming coefficient of the main beam to calculate the main output signal of the sound signal, and use the beamforming coefficient of the secondary beam to calculate the sound signal The secondary output signal.
步骤204:使用所述副输出信号对所述主输出信号进行滤波处理,得到目标声音信号。Step 204: Use the auxiliary output signal to filter the main output signal to obtain a target sound signal.
这里,得到的目标声音信号是滤除了噪声的干净语音信号。Here, the target sound signal obtained is a clean speech signal with noise filtered out.
In the method shown in FIG. 2, the position of the user relative to the electronic device is obtained, and the main beam and the secondary beams are selected from the preset fixed beams of the electronic device according to this position, so that the sound signal of the target sound source can be extracted more accurately from the received sound signal, effectively reducing human-voice interference in the target sound signal. At least 3 microphones are used to receive the sound signal; owing to the influence of the housing of the electronic device, noise can be better distinguished, the effect of the filtering processing is enhanced, and speech distortion under low signal-to-noise-ratio conditions is alleviated. In particular, when the at least 3 microphones are distributed over different parts of the electronic device, for example when 3 microphones are respectively arranged at the top, the bottom, and the back of the electronic device, the influence of the housing makes it possible to better distinguish noise coming from the front and from the back, enhancing the filtering effect and alleviating both speech distortion under low signal-to-noise-ratio conditions and incomplete removal of human-voice interference.
图3a为本申请拾音方法另一个实施例的流程图,如图3a所示,该方法可以包括:Fig. 3a is a flowchart of another embodiment of a sound pickup method according to the present application. As shown in Fig. 3a, the method may include:
步骤301:获得预设组数的固定波束的波束形成系数、方向、以及波束宽度。Step 301: Obtain beamforming coefficients, directions, and beam widths of a preset number of fixed beams.
其中,所述预设组数大于等于2,也即预设组数的最小值为2,最大值不限制。Wherein, the preset group number is greater than or equal to 2, that is, the minimum value of the preset group number is 2, and the maximum value is not limited.
其中,本步骤一般为预设步骤,也即获得预设组数的固定波束的波束形成系数、方向、以及波束宽度后,可以将获得的上述信息存储于电子设备中,无需每次执行步骤302~步骤309之前均执行该步骤。实际应用中,也可以对存储于电子设备中的上述信息进行修改。Among them, this step is generally a preset step, that is, after obtaining the beamforming coefficient, direction, and beam width of a preset number of fixed beams, the obtained information can be stored in the electronic device, without performing step 302 every time. Execute this step before step 309. In practical applications, the above-mentioned information stored in the electronic device can also be modified.
本步骤的实现请参考图4所示的说明,这里不赘述。For the implementation of this step, please refer to the description shown in Figure 4, which will not be repeated here.
为了便于以下步骤中的描述,对图4所示实施例中基于电子设备建立的三维笛卡尔坐标系进行说明,参见图3b所示,三维笛卡尔坐标系以电子设备上表面的中心点作为坐标原点,电子设备上表面的对称轴分别为X轴和Y轴,经过电子设备上表面的中心点的垂线作为Z轴。电子设备的上表面一般是电子设备具有显示屏侧的表面。In order to facilitate the description in the following steps, the three-dimensional Cartesian coordinate system established based on the electronic device in the embodiment shown in FIG. 4 is described. As shown in FIG. 3b, the three-dimensional Cartesian coordinate system uses the center point of the upper surface of the electronic device as the coordinates. The origin and the symmetry axes of the upper surface of the electronic device are the X-axis and the Y-axis, respectively, and the vertical line passing through the center point of the upper surface of the electronic device is the Z-axis. The upper surface of the electronic device is generally the surface of the electronic device on the side of the display screen.
以下的步骤302~步骤304为所述获得用户相对电子设备的方位的步骤的一种可能实现方法。The following steps 302 to 304 are a possible implementation method of the step of obtaining the user's position relative to the electronic device.
步骤302:获取所述电子设备的摄像头捕捉到的图像,判断从所述图像中是否能够识别出所述电子设备的用户的人脸信息,如果否,执行步骤303;如果是,执行步骤304。Step 302: Obtain the image captured by the camera of the electronic device, and determine whether the face information of the user of the electronic device can be recognized from the image, if not, execute step 303; if yes, execute step 304.
在实际应用中,电子设备中可以保存电子设备的用户的人脸信息,在一种可能的实现方式中,该人脸信息可以由电子设备的用户在电子设备中自主设置。In practical applications, the electronic device can store the facial information of the user of the electronic device. In a possible implementation manner, the facial information can be independently set by the user of the electronic device in the electronic device.
其中,本步骤中使用电子设备的所有摄像头还是部分摄像头捕捉图像,本申请实施例不作限定。例如,可以使用前置摄像头捕捉图像,或者,也可以使用前置摄像头和后置摄像头捕捉图像。Among them, whether all cameras of the electronic device are used in this step to capture images or some cameras are not limited in the embodiment of the present application. For example, you can use the front camera to capture images, or you can use the front camera and the rear camera to capture images.
在一种可能的实现方式中,本步骤可以使用人像识别检测技术识别用户的人脸信息,具体的,人像识别检测技术是用电子设备的摄像头采集含有人脸的图像或视频流,并自动在采集的图像或视频流中检测和跟踪人脸,进而对检测到的人脸进行脸部识别的一系列相关技术,使用这一技术就可以识别出用户的人脸信息在图像或者视频流中的位置信息。In a possible implementation, this step can use face recognition detection technology to recognize the user's face information. Specifically, the face recognition detection technology uses the camera of an electronic device to collect images or video streams containing human faces, and automatically A series of related technologies that detect and track faces in the collected images or video streams, and then perform facial recognition on the detected faces. Using this technology, the user’s face information can be identified in the image or video stream. location information.
步骤303:获取电子设备的摆放位置,根据所述摆放位置估计所述用户相对电子设备的方位;执行步骤305。Step 303: Obtain the placement position of the electronic device, and estimate the position of the user relative to the electronic device according to the placement position; go to step 305.
在一种可能的实现方式中,用户相对电子设备的方位可以通过(方位角,俯仰角)来表示,其中,用户相对电子设备的方位在图3b所示三维笛卡尔坐标系中可以用坐标系原点指向用户的人脸中心点的射线来表示,所述方位角是:坐标系原点指向用户的人脸中心点的射线在XOY平面上投影的射线与X轴正方向的夹角;所述俯仰角是:坐标系原点指向用户的人脸中心点的射线与Z轴正方向的夹角。参见图3c的具体举例,假设A点为用户的人脸中心点,则OA为坐标系原点指向用户的人脸中心点的射线,也即用户相对电子设备的方位,其方位角为射线OA在XOY平面上投影的射线OB与X轴正方向的夹角,如图3c所示,为∠XOB;其俯仰角为射线OA与Z轴正方向的夹角,如图3c所示,为∠ZOA,通过这两个角度来表示用户相对电子设备的方位。需要说明的是,用户相对电子设备的方位用方位角和俯仰角来标 识仅为举例,并不用以限定本申请实施例用户相对电子设备的方位的其他表示方式或实现方式。In a possible implementation, the user's position relative to the electronic device can be represented by (azimuth angle, pitch angle), where the user's position relative to the electronic device can be represented in the three-dimensional Cartesian coordinate system shown in Figure 3b. A ray whose origin points to the center point of the user’s face is expressed as the azimuth angle is: the angle between the ray projected on the XOY plane and the positive direction of the X-axis by the ray whose origin of the coordinate system points to the center point of the user’s face; The angle is: the angle between the ray with the origin of the coordinate system pointing to the center of the user's face and the positive direction of the Z axis. Referring to the specific example in Figure 3c, assuming point A is the center point of the user's face, then OA is the ray whose origin of the coordinate system points to the center point of the user's face, that is, the position of the user relative to the electronic device, and the azimuth is the ray OA at The angle between the ray OB projected on the XOY plane and the positive direction of the X axis, as shown in Figure 3c, is ∠XOB; its elevation angle is the angle between the ray OA and the positive direction of the Z axis, as shown in Figure 3c, which is ∠ZOA , Through these two angles to show the user's position relative to the electronic device. It should be noted that the identification of the user's position relative to the electronic device with the azimuth angle and the pitch angle is only an example, and is not used to limit other representations or implementations of the user's position relative to the electronic device in the embodiment of the present application.
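To make the (azimuth, pitch) representation above concrete, the following sketch converts a point expressed in the device coordinate system of FIG. 3b into the corresponding azimuth and pitch angles; the function name and the sample point are illustrative only.

```python
import numpy as np

def direction_to_angles(point):
    """Return (azimuth, pitch) in degrees for the ray from the coordinate
    origin (center of the device's upper surface) to `point` (x, y, z).
    Azimuth: angle of the XOY-plane projection measured from the +X axis.
    Pitch:   angle between the ray and the +Z axis."""
    x, y, z = point
    azimuth = np.degrees(np.arctan2(y, x)) % 360.0
    pitch = np.degrees(np.arccos(z / np.linalg.norm(point)))
    return azimuth, pitch

# Example: a face center roughly below the phone and slightly above its plane
print(direction_to_angles(np.array([0.0, -0.5, 0.2])))
```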
在一种可能的实现方式中,可以使用电子设备中的重力传感器(g-sensor)获取电子设备的摆放位置。具体的,重力传感器可以获取电子设备不同方向的重力加速度,电子设备的摆放位置不同,重力传感器在不同方向获取的重力加速度的数值也会不同。以为电子设备建立图3b中的三维笛卡尔坐标系为例,参见图3d所示电子设备的可能摆放位置示例图,将电子设备的显示屏朝上平躺放在桌面上,X轴与Y轴重力加速度为0,Z轴重力加速度的值大于9.8,桌面在图3d中未示出;将电子设备的显示屏朝下平躺放在桌面上时,X轴与Y轴重力加速度为0,Z轴重力加速度的值小于-9.8;将电子设备正向竖立放置(完全竖直),X轴与Z轴的重力加速度为0,Y轴重力加速度的值大于9.8;将电子设备倒立放置(完全竖直),X轴与Z轴的重力加速度为0,Y轴重力加速度的值小于-9.8;将电子设备向左横立放置(完全横立),Y轴与Z轴的重力加速度为0,X轴重力加速度的值大于9.8;将电子设备向右横立放置(完全横立),Y轴与Z轴的重力加速度为0,X轴重力加速度的值小于-9.8。因此,根据重力传感器在各个方向上获取的重力加速度的数值,可以得到电子设备的摆放位置。In a possible implementation manner, a g-sensor in the electronic device may be used to obtain the placement position of the electronic device. Specifically, the gravity sensor can obtain the gravitational acceleration of the electronic device in different directions, and the value of the gravitational acceleration obtained by the gravity sensor in different directions will be different when the position of the electronic device is different. Taking the electronic device to establish the three-dimensional Cartesian coordinate system in Figure 3b as an example, refer to the example diagram of possible placement positions of the electronic device shown in Figure 3d, and place the electronic device’s display screen facing up on the desktop, with the X axis and Y The axis gravitational acceleration is 0, the value of the Z axis gravitational acceleration is greater than 9.8, and the desktop is not shown in Figure 3d; when the electronic device’s display screen is placed on the desktop, the X-axis and Y-axis gravitational accelerations are 0, Z The value of the axis gravitational acceleration is less than -9.8; the electronic device is placed upright (fully vertical), the gravitational acceleration of the X-axis and Z-axis is 0, and the value of the Y-axis gravitational acceleration is greater than 9.8; the electronic device is placed upside down (fully vertical) Straight), the gravitational acceleration of X-axis and Z-axis is 0, and the value of Y-axis gravitational acceleration is less than -9.8; if the electronic device is placed horizontally to the left (fully horizontally), the gravitational acceleration of Y-axis and Z-axis is 0, X The value of the axis gravitational acceleration is greater than 9.8; the electronic device is placed horizontally to the right (fully horizontally), the gravitational acceleration of the Y-axis and the Z-axis are 0, and the value of the X-axis gravitational acceleration is less than -9.8. Therefore, according to the gravity acceleration values obtained by the gravity sensor in various directions, the placement position of the electronic device can be obtained.
Specifically, threshold ranges of the X-axis, Y-axis, and Z-axis gravitational accelerations corresponding to the different placement positions of the electronic device may be preset; accordingly, in this step the placement position of the electronic device can be obtained by determining which threshold ranges the X-axis, Y-axis, and Z-axis gravitational accelerations output by the gravity sensor fall into. For example, referring to the foregoing gravitational-acceleration examples for the placement positions, assume the gravitational accelerations on the X-axis, Y-axis, and Z-axis are g_1, g_2, and g_3, respectively. When |g_1|<Δ_1, |g_2|<Δ_1, and |g_3−9.8|<Δ_1 or |g_3+9.8|<Δ_1, the electronic device is lying flat; when |g_1|<Δ_1, |g_3|<Δ_1, and g_2>Δ_2, the electronic device is in a hand-held (upright) state; when |g_2|<Δ_1, |g_3|<Δ_1, and g_1>Δ_2, the electronic device is tilted to the left; when |g_2|<Δ_1, |g_3|<Δ_1, and g_1<−Δ_2, the electronic device is tilted to the right. Here Δ_1 and Δ_2 are preset thresholds, Δ_1 may be a positive number close to 0, and Δ_2 may be a positive number greater than Δ_1; their specific values can be set as needed in practical applications and are not limited by this application.
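A minimal sketch of this threshold-based classification, assuming the axis conventions of FIG. 3b and illustrative threshold values (the concrete Δ_1 and Δ_2 below are not fixed by this application):

```python
def classify_placement(g1, g2, g3, d1=1.0, d2=5.0):
    """Classify the device placement from per-axis gravity readings (m/s^2).
    d1 and d2 correspond to the thresholds Δ1 and Δ2 in the text (example values)."""
    if abs(g1) < d1 and abs(g2) < d1 and (abs(g3 - 9.8) < d1 or abs(g3 + 9.8) < d1):
        return "flat"          # lying on a table, screen up or down
    if abs(g1) < d1 and abs(g3) < d1 and g2 > d2:
        return "handheld"      # held upright
    if abs(g2) < d1 and abs(g3) < d1 and g1 > d2:
        return "tilted_left"   # landscape, tilted to the left
    if abs(g2) < d1 and abs(g3) < d1 and g1 < -d2:
        return "tilted_right"  # landscape, tilted to the right
    return "unknown"

print(classify_placement(0.1, 9.7, 0.3))   # -> "handheld"
```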
在实际应用中,可以预设不同摆放位置与用户相对电子设备的方位之间的对应关系;则,所述根据所述摆放位置估计所述用户相对电子设备的方位,可以包括:In practical applications, the corresponding relationship between different placement positions and the user's position relative to the electronic device can be preset; then, the estimating the user's position relative to the electronic device based on the placement position may include:
从预设的所述对应关系中获取所述电子设备的摆放位置对应的用户相对电子设备的方位。Obtain the user's position relative to the electronic device corresponding to the placement position of the electronic device from the preset correspondence relationship.
对该实现方式说明如下:如果电子设备从摄像头拍摄的图像中没有识别出电子设备的用户的人脸信息,表明用户的人脸方位超过了摄像头拍摄角度范围,这时可以根据所述摆放位置以及所述摄像头的拍摄角度范围来估计用户相对电子设备最可能的方位,具体的,The implementation method is described as follows: if the electronic device does not recognize the user's face information of the electronic device from the image taken by the camera, it indicates that the user's face orientation exceeds the camera's shooting angle range, and then it can be placed according to the position And the shooting angle range of the camera to estimate the most likely position of the user relative to the electronic device. Specifically,
可以先从用户相对电子设备的所有方位中剔除掉所述摄像头的拍摄角度范围对应的方位;The position corresponding to the shooting angle range of the camera may be first excluded from all positions of the user relative to the electronic device;
之后,可以根据用户使用习惯大数据统计分析,从剩余的方位中推算出电子设 备在不同摆放位置下用户相对电子设备最大概率的方位,从而可以得到:不同摆放位置、用户相对电子设备的方位两者之间的对应关系。Then, according to the big data statistical analysis of the user’s usage habits, from the remaining positions, the position with the greatest probability of the user relative to the electronic device in the different placement positions of the electronic device can be calculated, so as to obtain: Correspondence between the two directions.
例如,参考前述的摆放位置举例,基于使用习惯和阅读方便,电子设备处于手持状态或者水平放置状态时,用户大概率正对电子设备,位于电子设备的y轴负方向位置,剔除掉摄像头的拍摄角度范围对应的方位,可以设置电子设备处于手持状态或者水平放置状态时,对应的用户相对电子设备的方位可以为:(270°,90°);电子设备处于向左倾斜状态或者向右倾斜状态时,用户大多是在观看视频或者玩游戏,用户位于电子设备的XOZ平面内,剔除掉摄像头的拍摄角度范围对应的方位,可以设置电子设备处于向左倾斜状态或者向右倾斜状态时,对应的用户相对电子设备的方位可以为:(0°,45°)或者(180°,45°)。For example, referring to the foregoing placement position example, based on usage habits and ease of reading, when the electronic device is in a handheld state or a horizontal placement state, the user is likely to face the electronic device directly, which is located in the negative position of the y-axis of the electronic device, and eliminate the camera The position corresponding to the shooting angle range can be set when the electronic device is in a handheld state or a horizontally placed state. The corresponding user's position relative to the electronic device can be: (270°, 90°); the electronic device is tilted to the left or tilted to the right In the state, the user is mostly watching videos or playing games. The user is located in the XOZ plane of the electronic device. The orientation corresponding to the shooting angle range of the camera is eliminated. The electronic device can be set to the left or right tilt state, corresponding The position of the user relative to the electronic device can be: (0°, 45°) or (180°, 45°).
以上仅为可能实现方式的示例性说明,并不用以限定本申请实施例。例如:上述方位角和俯仰角的具体取值可以不同;不同电子设备的摄像头拍摄角度范围不同,不同电子设备处于同一摆放位置,设置的该摆放位置对应的用户相对电子设备的方位也可能不同。The foregoing is only an exemplary description of possible implementation manners, and is not used to limit the embodiments of the present application. For example: the specific values of the above-mentioned azimuth and pitch angles can be different; different electronic devices have different camera shooting angle ranges, and different electronic devices are in the same placement position, and the user's orientation relative to the electronic device corresponding to the placement position may also be set different.
相比于以下步骤304中根据人脸在图像中的位置估计用户相对电子设备的方位,通过电子设备的摆放位置来间接估计用户相对电子设备的方位,准确度要低一点,但是考虑到超过摄像头角度的场景不多,另外后续步骤中固定波束的宽度也可以容许一定的角度误差,因此,本步骤中根据电子设备的摆放位置估计出用户相对电子设备的方位,仍然可以满足本申请实施例的要求,对本申请实施例后续的处理结果影响较小。Compared with estimating the user's position relative to the electronic device according to the position of the face in the image in the following step 304, the position of the electronic device is used to indirectly estimate the user's position relative to the electronic device. The accuracy is a bit lower, but it is considered to exceed There are not many scenes with camera angles. In addition, the width of the fixed beam in the subsequent steps can also tolerate a certain angle error. Therefore, in this step, the position of the user relative to the electronic device is estimated according to the placement position of the electronic device, which can still meet the requirements of the implementation of this application. The requirements of the examples have little impact on the subsequent processing results of the examples of this application.
举例来说,可以根据用户使用习惯大数据以及电子设备的摆放位置,得到不同摆放位置对应的用户相对电子设备最大概率的方位。以电子设备是手机为例,假设电子设备的摆放位置为手持,剔除前置摄像头和后置摄像头的拍摄角度方位对应的方位,用户相对电子设备最大概率的方位可以为:位于手机的底部方位,即图3b中y轴负方向。For example, according to the big data of the user's usage habits and the placement position of the electronic device, the position of the user with the greatest probability relative to the electronic device corresponding to different placement positions can be obtained. Taking the electronic device as a mobile phone as an example, assuming that the electronic device is placed in a handheld position, and excluding the positions corresponding to the shooting angles of the front camera and the rear camera, the position with the greatest probability of the user relative to the electronic device can be: located at the bottom position of the mobile phone , That is, the negative direction of the y-axis in Figure 3b.
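The correspondence between placement positions and the most likely user orientation can be stored as a simple lookup table; the sketch below uses the example angles given above, and the assignment of (0°, 45°) versus (180°, 45°) to the left and right tilt states is an assumption made only for illustration.

```python
# (azimuth, pitch) in degrees; values taken from the examples in the text
DEFAULT_ORIENTATION = {
    "handheld":     (270.0, 90.0),   # user facing the phone, below it (-Y)
    "flat":         (270.0, 90.0),
    "tilted_left":  (0.0, 45.0),     # assumed mapping for left tilt
    "tilted_right": (180.0, 45.0),   # assumed mapping for right tilt
}

def estimate_orientation(placement):
    """Fallback orientation used when no face is detected in the camera image."""
    return DEFAULT_ORIENTATION.get(placement, (270.0, 90.0))
```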
步骤304:获取所述用户的人脸信息在所述图像中的位置信息,根据所述位置信息,获得所述用户相对电子设备的方位;执行步骤305。Step 304: Obtain the position information of the user's face information in the image, and obtain the position of the user relative to the electronic device according to the position information; go to step 305.
本步骤中可以使用投影等相关技术,将用户在图像中的位置信息直接转化为在图3b所示三维笛卡尔坐标系中的方位角和俯仰角,得到用户相对电子设备的方位。In this step, projection and other related technologies can be used to directly convert the user's position information in the image into the azimuth and elevation angles in the three-dimensional Cartesian coordinate system shown in FIG. 3b to obtain the user's azimuth relative to the electronic device.
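One way to realize this projection-based conversion is a pinhole-camera model that maps the face-center pixel to a direction in the coordinate system of FIG. 3b; the field-of-view values and the assumption that the front camera looks along the +Z axis are illustrative, not mandated by this application.

```python
import numpy as np

def face_pixel_to_direction(u, v, img_w, img_h, fov_h_deg=70.0, fov_v_deg=55.0):
    """Map the face-center pixel (u, v) of a front-camera image to (azimuth, pitch)
    in the device coordinate system of FIG. 3b (pinhole model, assumed FOV values)."""
    # Normalized offsets from the image center, in [-1, 1]
    nx = (u - img_w / 2) / (img_w / 2)
    ny = (v - img_h / 2) / (img_h / 2)
    # Angular offsets from the camera's optical axis (assumed to be the +Z axis)
    ax = np.radians(fov_h_deg / 2) * nx      # left/right  -> X direction
    ay = np.radians(fov_v_deg / 2) * ny      # up/down     -> Y direction
    direction = np.array([np.tan(ax), -np.tan(ay), 1.0])
    direction /= np.linalg.norm(direction)
    x, y, z = direction
    azimuth = np.degrees(np.arctan2(y, x)) % 360.0
    pitch = np.degrees(np.arccos(z))
    return azimuth, pitch

print(face_pixel_to_direction(800, 300, 1080, 1920))
```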
通过获得用户相对电子设备的方位,可以获得更加准确的目标人说话信息,为后续信号处理带来更多先验信息。By obtaining the position of the user relative to the electronic device, more accurate target person's speech information can be obtained, which brings more prior information for subsequent signal processing.
以下的步骤305~步骤306是步骤202的一种可能的实现方式。The following steps 305 to 306 are a possible implementation of step 202.
步骤305:计算所述方位针对每个固定波束的比值K。Step 305: Calculate the ratio K of the azimuth to each fixed beam.
K_k = Δ_k / BW_k, where the included angle Δ_k is the angle between the azimuth and the direction of fixed beam k, BW_k is the beam width of fixed beam k, and k = 1, 2, …, M.
In a possible implementation, this step may include: for each fixed beam k, calculating the included angle Δ_k between the azimuth and the direction of fixed beam k, and then calculating the ratio between Δ_k and the beam width BW_k of that fixed beam.
步骤306:从所述比值中选择最小的比值对应的固定波束作为主波束,按照所述比值从大到小的顺序从最大的所述比值开始选择至少一个所述比值对应的固定波束作为副波束。Step 306: Select the fixed beam corresponding to the smallest ratio from the ratios as the main beam, and select at least one fixed beam corresponding to the ratio as the secondary beam starting from the largest ratio in the descending order of the ratio. .
在实际应用中,副波束的数量可以为1个或者多个,具体数量本申请并不限制,但是,副波束和主波束的总数量不超过固定波束的数量M。也即是说:假设M为2,则副波束的数量只能为1,假设M为5,则副波束的数量可以为2、3、或4。在一种可能的实现方式中,副波束的数量可以为2。In practical applications, the number of secondary beams may be one or more, and the specific number is not limited in this application. However, the total number of secondary beams and main beams does not exceed the number M of fixed beams. In other words, if M is 2, the number of sub-beams can only be 1, and if M is 5, the number of sub-beams can be 2, 3, or 4. In a possible implementation, the number of sub-beams may be two.
主波束的波束形成系数记为W (1)(f),副波束的波束形成系数记为W (q)(f),q=2,…,S+1;S为副波束的数量。 The beamforming coefficient of the main beam is denoted as W (1) (f), and the beamforming coefficient of the secondary beam is denoted as W (q) (f), q=2,...,S+1; S is the number of secondary beams.
步骤307:获得N个麦克风接收到的N路声音信号,对N路声音信号进行回声消除,得到声音信号:X(f,l)=[X 1(f,l),X 2(f,l),…,X N(f,l)] T;l为帧号。 Step 307: Obtain N channels of sound signals received by N microphones, and perform echo cancellation on the N channels of sound signals to obtain a sound signal: X(f,l)=[X 1 (f,l), X 2 (f,l) ),..., X N (f, l)] T ; l is the frame number.
其中,所述回声消除步骤为可选步骤,本步骤中如何对N路声音信号进行回声消除,本申请并不限制。Wherein, the echo cancellation step is an optional step. How to perform echo cancellation on N channels of sound signals in this step is not limited in this application.
在实际应用中,可以使用相关的回声消除算法进行N路声音信号的回声消除,回声消除算法又包括时域处理算法和频域处理算法,这里不赘述。自适应的回声消除算法的基本原理是:利用参考信号自适应地估计出回声信号,将麦克风接收到的声音信号减去估计的回声信号,获得无回声的声音信号。In practical applications, the related echo cancellation algorithm can be used to perform echo cancellation of N channels of sound signals. The echo cancellation algorithm includes time domain processing algorithm and frequency domain processing algorithm, which will not be repeated here. The basic principle of the adaptive echo cancellation algorithm is: use the reference signal to adaptively estimate the echo signal, and subtract the estimated echo signal from the sound signal received by the microphone to obtain an echoless sound signal.
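The adaptive echo cancellation described above can be sketched, for example, with a normalized LMS (NLMS) filter applied per microphone channel; this is only one possible realization consistent with the principle stated above, not the specific algorithm mandated by this application.

```python
import numpy as np

def nlms_echo_cancel(mic, ref, order=256, mu=0.5, eps=1e-8):
    """Subtract an adaptively estimated echo of `ref` (the playback/reference signal)
    from `mic` (one microphone channel). Returns the echo-reduced signal."""
    w = np.zeros(order)                     # adaptive filter coefficients
    out = np.zeros_like(mic, dtype=float)
    for n in range(len(mic)):
        x = ref[max(0, n - order + 1):n + 1][::-1]
        x = np.pad(x, (0, order - len(x)))  # most recent reference samples
        echo_est = np.dot(w, x)             # estimated echo sample
        e = mic[n] - echo_est               # error = echo-free estimate
        w += mu * e * x / (np.dot(x, x) + eps)
        out[n] = e
    return out
```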
步骤307与步骤305~步骤306之间没有执行顺序的限制。There is no restriction on the execution order between step 307 and step 305 to step 306.
步骤308:根据声音信号X(f,l)以及主波束的波束形成系数W (1)(f)计算主输出信号Y 1(f,l)=W (1)(f)X(f,l);根据声音信号X(f,l)以及副波束的波束形成系数W (q)(f)计算副输出信号Y q(f,l)=W (q)(f)X(f,l)。 Step 308: Calculate the main output signal Y 1 (f, l) = W (1) (f) X (f, l) according to the sound signal X (f, l) and the beam forming coefficient W (1) (f) of the main beam ); calculate the secondary output signal Y q (f, l) = W (q) (f) X (f, l) according to the sound signal X (f, l) and the beam forming coefficient W (q) (f) of the secondary beam .
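In the frequency domain this computation is simply an inner product per frequency bin and frame; a minimal sketch follows, where the array shapes are assumptions made for illustration.

```python
import numpy as np

def apply_beam(W, X):
    """Apply beamforming weights to multi-microphone spectra.
    W: (F, N) complex weights, one 1xN row per frequency bin f.
    X: (F, L, N) complex STFT of the N microphone signals (bin f, frame l).
    Returns Y: (F, L) beamformer output Y(f, l) = W(f) X(f, l)."""
    return np.einsum('fn,fln->fl', W, X)

# Y_main = apply_beam(W_main, X); Y_secondary = [apply_beam(Wq, X) for Wq in W_secs]
```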
步骤309:使用副输出信号Y q(f,l)对主输出信号Y 1(f,l)进行滤波处理,得到目标声音信号。 Step 309: Use the auxiliary output signal Y q (f, l) to filter the main output signal Y 1 (f, l) to obtain the target sound signal.
In a possible implementation, assume there are 2 secondary beams and therefore 2 secondary output signals, and denote the target sound signal by Z(f,l); then
Z(f,l) = Y_1(f,l) − b_2^T·y_2 − b_3^T·y_3,
where y_2 = [Y_2(f,l), …, Y_2(f,l−p+1)]^T, y_3 = [Y_3(f,l), …, Y_3(f,l−p+1)]^T, b_2 and b_3 are p×1 filter coefficient matrices, and p is the dimension of the filter coefficient matrices; the specific value of p can be chosen as needed in practical applications and is not limited by this application.
在实际应用中可以使用相关的滤波算法如维纳滤波、最小均方差准则滤波、卡尔曼滤波等进行本步骤中的滤波处理,这里不再赘述。In practical applications, relevant filtering algorithms such as Wiener filtering, minimum mean square error criterion filtering, Kalman filtering, etc. can be used to perform the filtering processing in this step, which will not be repeated here.
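As one hedged illustration of the filtering in step 309, the sketch below adapts the coefficients b_2 and b_3 with an NLMS update per frequency bin so that the secondary (noise-reference) outputs cancel the noise leaked into the main output; Wiener or Kalman filtering could equally be substituted, and the step size is an example value.

```python
import numpy as np

def adaptive_noise_cancel(Y_main, Y_secs, p=4, mu=0.1, eps=1e-8):
    """Sidelobe-canceller style post-filter for one frequency bin.
    Y_main: (L,) complex main-beam output; Y_secs: list of (L,) secondary outputs.
    Returns Z: (L,) estimated target signal Z = Y1 - sum_q b_q^H y_q."""
    S = len(Y_secs)
    b = np.zeros((S, p), dtype=complex)          # filter coefficients b_q
    L = len(Y_main)
    Z = np.zeros(L, dtype=complex)
    for l in range(L):
        z = Y_main[l]
        regs = []
        for q in range(S):
            y = Y_secs[q][max(0, l - p + 1):l + 1][::-1]
            y = np.pad(y, (0, p - len(y)))       # last p secondary samples
            regs.append(y)
            z -= np.vdot(b[q], y)                # subtract estimated noise
        for q in range(S):                       # NLMS coefficient update
            y = regs[q]
            b[q] += mu * np.conj(z) * y / (np.vdot(y, y).real + eps)
        Z[l] = z
    return Z
```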
本申请实施例中,在常规2个麦克风的基础上增加了至少一个麦克风,可选地,增加的麦克风可以为背部麦克风,这些麦克风组成了一个立体的麦克风阵列,由于电子设备壳体的影响,该麦克风阵列可以很好地基于3D空间进行定向波束形成,做 到区分前后噪声的效果。In the embodiment of this application, at least one microphone is added on the basis of the conventional two microphones. Optionally, the added microphone may be a back microphone. These microphones form a stereo microphone array. Due to the influence of the housing of the electronic device, The microphone array can perform directional beamforming based on the 3D space, and achieve the effect of distinguishing front and rear noise.
以下,通过图4所示的步骤流程对步骤301的实现进行举例说明。参见图4所示,包括:Hereinafter, the implementation of step 301 will be explained by using the step flow shown in FIG. 4 as an example. See Figure 4, including:
步骤401:建立基于电子设备的三维笛卡尔坐标系。Step 401: Establish a three-dimensional Cartesian coordinate system based on the electronic device.
三维笛卡尔坐标系的建立方法请参见图3b以及对应描述,这里不赘述,在图3b中以麦克风的数量N取值3为例,以这3个麦克风分别位于电子设备的顶部、底部以及背部为例。Please refer to Figure 3b and the corresponding description for the establishment method of the three-dimensional Cartesian coordinate system, which will not be repeated here. In Figure 3b, the number of microphones N is taken as an example, and the three microphones are located on the top, bottom and back of the electronic device. Take for example.
步骤402:根据N个麦克风在电子设备上的位置,分别获得N个麦克风在三维笛卡尔坐标系中的坐标。Step 402: Obtain the coordinates of the N microphones in the three-dimensional Cartesian coordinate system according to the positions of the N microphones on the electronic device.
假设每个麦克风Mici的坐标为(x i,y i,z i),i=1,2,…,N。 Assuming that the coordinates of each microphone Mici are (x i , y i , z i ), i=1, 2, ..., N.
参见图3b所示,第一个麦克风Mic1的坐标为(x 1,y 1,z 1);第二个麦克风Mic2的坐标为(x 2,y 2,z 2);第三个麦克风Mic3的坐标为(x 3,y 3,z 3)。 As shown in Figure 3b, the coordinates of the first microphone Mic1 are (x 1 , y 1 , z 1 ); the coordinates of the second microphone Mic2 are (x 2 , y 2 , z 2 ); the coordinates of the third microphone Mic3 are The coordinates are (x 3 , y 3 , z 3 ).
步骤403:根据N个麦克风在三维笛卡尔坐标系中的坐标,计算目标声源在理想条件下的导向矢量。Step 403: Calculate the steering vector of the target sound source under ideal conditions according to the coordinates of the N microphones in the three-dimensional Cartesian coordinate system.
Specifically, assume the direction of the target sound source is (θ, φ), where θ is the azimuth angle of the target sound source and φ is its pitch angle. The steering vector of the target sound source under ideal (free-field) conditions is then
a(θ, φ, f) = [e^(−j2πf·τ_1), e^(−j2πf·τ_2), …, e^(−j2πf·τ_N)]^T,
where τ_i is the time delay of microphone i relative to the coordinate origin, computed according to the following formula (1):
τ_i = (x_i·sinφ·cosθ + y_i·sinφ·sinθ + z_i·cosφ) / c    (1)
where c is the speed of sound and f is the frequency.
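A sketch of this ideal steering-vector computation (formula (1) above), assuming far-field propagation and the coordinate conventions of FIG. 3b; the microphone positions used in the example are hypothetical.

```python
import numpy as np

def ideal_steering_vector(mic_xyz, theta_deg, phi_deg, f, c=343.0):
    """Free-field steering vector a(θ, φ, f) for N microphones.
    mic_xyz: (N, 3) microphone coordinates in meters; θ azimuth, φ pitch (degrees)."""
    theta, phi = np.radians(theta_deg), np.radians(phi_deg)
    # Unit vector pointing from the origin toward the source direction
    u = np.array([np.sin(phi) * np.cos(theta),
                  np.sin(phi) * np.sin(theta),
                  np.cos(phi)])
    tau = mic_xyz @ u / c                        # formula (1): delay per microphone
    return np.exp(-1j * 2 * np.pi * f * tau)     # one complex element per microphone

# Example: 3 microphones (top, bottom, back) at assumed positions, evaluated at 1 kHz
mics = np.array([[0.0, 0.07, 0.0], [0.0, -0.07, 0.0], [0.02, 0.05, -0.008]])
print(ideal_steering_vector(mics, theta_deg=270, phi_deg=90, f=1000.0))
```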
步骤404:获得电子设备壳体对麦克风的频域响应矩阵Γ(θ,φ,f)。Step 404: Obtain the frequency domain response matrix Γ(θ, φ, f) of the housing of the electronic device to the microphone.
In practical applications, the response of the microphones to signals from different directions is generally measured by having the microphones of the electronic device receive the same audio played from different directions, thereby obtaining the frequency-domain response matrix of the housing of the electronic device (for example, a mobile phone) to the microphones. The specific steps are: place the electronic device in a professional anechoic chamber; with the electronic device as the center of a sphere, play the same audio (generally Gaussian white noise) successively at different positions on a spherical surface with a radius of 1 m; receive, through the microphones of the electronic device, the audio signals coming from the different positions on the sphere; and then, based on the principle that without the influence of the housing the audio signals received by the microphones should be identical, obtain by comparison and calculation the response of the housing to each microphone, yielding the frequency-domain response matrix Γ(θ, φ, f).
Step 405: Calculate the true steering vector of the target sound source according to the frequency-domain response matrix Γ(θ, φ, f) and the steering vector a(θ, φ, f) of the target sound source under ideal conditions.
The true steering vector of the target sound source is ā(θ, φ, f) = Γ(θ, φ, f)·a(θ, φ, f).
Step 406: According to the true steering vector ā(θ, φ, f) of the target sound source, calculate the beamforming coefficients W_k(f), the directions, and the beam widths BW_k of the preset number of groups of fixed beams, k = 1, 2, …, M, where M is the preset number of groups of fixed beams.
在一种可能的实现方式中,如果M<4,每个固定波束的方向指向一水平方向,将360°空间平均划分为M份;如果M≥4,一个固定波束的方向指向Z轴正方向,其他M-1个固定波束的方向指向一水平方向,将360°空间平均划分为M-1份,类似一个莲花状。例如,M=5时,5组固定波束的方向可以分别指向X轴正方向、X轴负方向、Y轴正方向、Y轴负方向和Z轴正方向。In a possible implementation, if M<4, the direction of each fixed beam points to a horizontal direction, and the 360° space is divided into M equally; if M≥4, the direction of a fixed beam points to the positive direction of the Z axis , The directions of the other M-1 fixed beams point to a horizontal direction, and the 360° space is divided into M-1 parts on average, similar to a lotus shape. For example, when M=5, the directions of the 5 groups of fixed beams can respectively point to the positive direction of the X axis, the negative direction of the X axis, the positive direction of the Y axis, the negative direction of the Y axis, and the positive direction of the Z axis.
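This "lotus-shaped" layout can be generated programmatically; the sketch below returns the beam directions as (azimuth, pitch) pairs for a given M, where starting the horizontal split at azimuth 0° is an assumption made only for illustration.

```python
def fixed_beam_directions(M):
    """Return M beam directions as (azimuth_deg, pitch_deg) pairs.
    M < 4:  all beams horizontal, splitting 360° evenly.
    M >= 4: one beam toward +Z (pitch 0°), the rest splitting 360° horizontally."""
    if M < 4:
        return [(360.0 * k / M, 90.0) for k in range(M)]
    horizontal = [(360.0 * k / (M - 1), 90.0) for k in range(M - 1)]
    return horizontal + [(0.0, 0.0)]     # last beam points to the +Z axis

print(fixed_beam_directions(5))  # 0°, 90°, 180°, 270° horizontal beams plus the zenith beam
```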
In a possible implementation, M may be 5, yielding five groups of fixed-beam beamforming coefficients W_k(f), k = 1, 2, 3, 4, 5; the directions of the five groups of beams may point to the positive X-axis, the negative X-axis, the positive Y-axis, the negative Y-axis, and the positive Z-axis, respectively; and the beam widths of the five groups of fixed beams are BW_1, BW_2, …, BW_5, respectively.
在实际应用中,可以使用固定波束形成算法来计算五组固定波束形成系数。In practical applications, a fixed beamforming algorithm can be used to calculate five sets of fixed beamforming coefficients.
The simplest fixed beamforming algorithm is the delay-and-sum algorithm, whose beamforming coefficients are
W_k(f) = (1/N)·a^H(θ_k, φ_k, f),
where θ_k denotes the azimuth angle of fixed beam k and φ_k denotes the pitch angle of fixed beam k. Taking the above five groups of fixed beams pointing to the positive X-axis, negative X-axis, positive Y-axis, negative Y-axis, and positive Z-axis as an example, the azimuth and pitch angles (θ_k, φ_k) of the five groups of fixed beams are (0°, 90°), (180°, 90°), (90°, 90°), (270°, 90°), and (0°, 0°), respectively.
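A sketch of computing such delay-and-sum coefficients for the five example beam directions, reusing the ideal_steering_vector helper sketched earlier; taking the housing response as identity (i.e., using the ideal rather than the true steering vector) is an assumption made only to keep the example self-contained.

```python
import numpy as np

def delay_and_sum_weights(mic_xyz, beam_dirs_deg, freqs, c=343.0):
    """Delay-and-sum coefficients W_k(f) = a^H(θ_k, φ_k, f) / N per beam and bin.
    beam_dirs_deg: list of (azimuth, pitch) pairs; freqs: array of frequencies in Hz."""
    N = mic_xyz.shape[0]
    W = np.empty((len(beam_dirs_deg), len(freqs), N), dtype=complex)
    for k, (theta, phi) in enumerate(beam_dirs_deg):
        for i, f in enumerate(freqs):
            a = ideal_steering_vector(mic_xyz, theta, phi, f, c)  # helper defined above
            W[k, i] = np.conj(a) / N
    return W

beam_dirs = [(0, 90), (180, 90), (90, 90), (270, 90), (0, 0)]
freqs = np.linspace(0, 8000, 257)   # example STFT bins for 16 kHz audio
# W = delay_and_sum_weights(mics, beam_dirs, freqs)   # `mics` as in the earlier sketch
```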
其中,固定波束的方向也可以用(方位角,俯仰角)来表示。固定波束的方位角是:在三维笛卡尔坐标系中,固定波束的方向在XOY平面上投影的射线与X轴正方向的夹角;固定波束的俯仰角是:在三维笛卡尔坐标系中,固定波束的方向与Z轴正方向的夹角;具体可以参考前述关于图3c的举例,不再赘述。Among them, the direction of the fixed beam can also be expressed by (azimuth angle, elevation angle). The azimuth angle of the fixed beam is: in the three-dimensional Cartesian coordinate system, the angle between the ray projected on the XOY plane and the positive direction of the X axis in the direction of the fixed beam; the pitch angle of the fixed beam is: in the three-dimensional Cartesian coordinate system, The angle between the direction of the fixed beam and the positive direction of the Z-axis; for details, please refer to the aforementioned example of FIG. 3c, which will not be repeated.
复杂的固定波束形成算法包括超定向波、恒定束宽波束形成等,以上复杂的固定波束形成算法最后归结为一个二次规划问题,需要借助凸优化技术来求解获得固定波束形成系数W k(f)。 Complex fixed beamforming algorithms include superdirectional waves, constant beamwidth beamforming, etc. The above complex fixed beamforming algorithms finally boil down to a quadratic programming problem, which requires the help of convex optimization technology to solve and obtain the fixed beamforming coefficient W k (f ).
The setting of the beam width BW_k is related to the number of beams, the microphone layout on the electronic device, the chosen fixed beamforming algorithm, and the range of sound sources that each fixed beam needs to pick up; it can be set as required in practical applications and is not limited here.
图4所示的方法实现了M组固定波束的获得。The method shown in Figure 4 achieves the acquisition of M groups of fixed beams.
在一种可能的实现方式中,可以将本申请实施例图3a所示的拾音方法应用于电子设备的语音助手场景下,例如,驾驶场景是用户使用手机语音助手频率相对较高的场景,该场景下噪声环境相对恶劣,包括有发动机的声音、轮胎摩擦声、空调声、开窗时的风噪等,这将直接导致手机接收到的用户语音信噪比变低,对语音助手拾取干净的用户语音提出了较大的挑战。具体的,参见图5a所示,电子设备可以包括:传感器模块、场景分析模块、前端增强模块、语音唤醒模块、声纹辨识确认模块、语音识别模块以及其他交互模块。其中,传感器模块可以包括:摄像头、麦克风以及重力传感 器,通过这些传感器可以分别获得用户的图像、声音信号、电子设备的摆放方位等数据;场景分析模块用于获取关于声音信号的先验信息,进行有针对性的声音拾取;前端增强模块用于提取用户(机主)的声音信号,也即是目标声音信号,同时抑制其他干扰信号和噪声;语音唤醒模块用于检测目标声音信号中的特定唤醒词,这些特定唤醒词可以“叫醒”电子设备,而电子设备最终会不会被唤醒还需要声纹辨识确认模块来“把关”,顾名思义,声纹辨识确认模块用于对用户声纹的辨识和确认,只有当前说唤醒词的用户声纹和预设的用户声纹一致,电子设备才最终被用户唤醒。In a possible implementation manner, the sound pickup method shown in Figure 3a of the embodiment of the present application can be applied to a voice assistant scenario of an electronic device. For example, a driving scenario is a scenario in which a user uses a mobile phone voice assistant with a relatively high frequency. The noise environment in this scenario is relatively harsh, including engine sound, tire friction sound, air-conditioning sound, wind noise when opening windows, etc. This will directly cause the user's voice signal-to-noise ratio received by the mobile phone to decrease, and the voice assistant will pick up cleanly. The user’s voice posed a greater challenge. Specifically, referring to FIG. 5a, the electronic device may include: a sensor module, a scene analysis module, a front-end enhancement module, a voice wake-up module, a voiceprint recognition and confirmation module, a voice recognition module, and other interaction modules. Among them, the sensor module may include: a camera, a microphone, and a gravity sensor, through which data such as the user's image, sound signal, and the placement position of the electronic device can be obtained respectively; the scene analysis module is used to obtain a priori information about the sound signal, Perform targeted sound pickup; the front-end enhancement module is used to extract the user's (host) sound signal, that is, the target sound signal, while suppressing other interference signals and noise; the voice wake-up module is used to detect specific target sound signals Wake-up words, these specific wake-up words can "wake up" the electronic device, and whether the electronic device will eventually be awakened requires the voiceprint recognition confirmation module to "check". As the name implies, the voiceprint recognition confirmation module is used to check the user's voiceprint Recognition and confirmation, only when the user's voiceprint currently speaking the wake-up word is consistent with the preset user's voiceprint, the electronic device is finally awakened by the user.
由于电子设备的资源开销限制,语音唤醒模块只支持一路唤醒,这就要求前端增强模块只能输出一路音频信号至语音唤醒模块进行唤醒检测,在存在多个说话人时,需准确辨识出目标说话人方位等信息,然后利用回声消除、固定波束形成、多通道自适应滤波等降噪算法进行定向拾音增强,估计出干净的目标声音信号送入语音唤醒模块,进行声纹检测和语音唤醒识别等后续处理。Due to the resource cost limitation of electronic equipment, the voice wake-up module only supports one way to wake up, which requires the front-end enhancement module to only output one audio signal to the voice wake-up module for wake-up detection. When there are multiple speakers, it is necessary to accurately identify the target speech The person’s position and other information are then used for noise reduction algorithms such as echo cancellation, fixed beam forming, and multi-channel adaptive filtering for directional sound pickup enhancement, and the clean target sound signal is estimated to be sent to the voice wake-up module for voiceprint detection and voice wake-up recognition Wait for follow-up processing.
基于图5a所示的电子设备的结构,结合图3a所示的实施例,对图3a所示实施例在图5a所示电子设备中的处理过程进行举例说明。参见图5b所示,用户与传感器模块之间的交互包括:摄像头捕捉包含人脸的图像,重力传感器可以获得电子设备在各个方向的重力加速度值,麦克风获取用户的声音信号。传感器模块中摄像头捕捉的图像、以及重力传感器获得的重力加速度值传输至场景分析模块,场景分析模块据此获得用户相对电子设备的方位,将该方位传输至前端增强模块。传感器模块将麦克风获取的声音信号也传输至前端增强模块,前端增强模块根据所述方位以及声音信号提取出目标声音信号。该目标声音信号是比较干净的一路语音信号,该目标声音信号将传输至语音唤醒模块和声纹辨识确认模块,由语音唤醒模块检测特定唤醒词,声纹辨识确认模块将目标声音信号的声纹与预设的用户声纹进行比对,确认声纹是否一致;如果声纹辨识确认模块确认声纹一致,语音识别模块根据语音唤醒模块提取的特定唤醒词与其他交互模块进行交互。Based on the structure of the electronic device shown in FIG. 5a, in conjunction with the embodiment shown in FIG. 3a, the processing process of the embodiment shown in FIG. 3a in the electronic device shown in FIG. 5a will be described as an example. As shown in Fig. 5b, the interaction between the user and the sensor module includes: a camera captures an image containing a human face, a gravity sensor can obtain gravitational acceleration values of the electronic device in various directions, and a microphone obtains the user's voice signal. The image captured by the camera in the sensor module and the gravitational acceleration value obtained by the gravity sensor are transmitted to the scene analysis module, and the scene analysis module obtains the position of the user relative to the electronic device according to this, and transmits the position to the front-end enhancement module. The sensor module also transmits the sound signal obtained by the microphone to the front-end enhancement module, and the front-end enhancement module extracts the target sound signal according to the position and the sound signal. The target sound signal is a relatively clean voice signal. The target sound signal will be transmitted to the voice wake-up module and the voiceprint recognition and confirmation module. The voice wake-up module detects the specific wake-up word, and the voiceprint recognition and confirmation module compares the voiceprint of the target sound signal. Compare with the preset user's voiceprint to confirm whether the voiceprint is consistent; if the voiceprint recognition confirmation module confirms that the voiceprint is consistent, the voice recognition module interacts with other interaction modules according to the specific wake-up words extracted by the voice wake-up module.
可以理解的是,上述实施例中的部分或全部步骤或操作仅是示例,本申请实施例还可以执行其它操作或者各种操作的变形。此外,各个步骤可以按照上述实施例呈现的不同的顺序来执行,并且有可能并非要执行上述实施例中的全部操作。It can be understood that some or all of the steps or operations in the above-mentioned embodiments are only examples, and the embodiments of the present application may also perform other operations or various operation variations. In addition, each step may be executed in a different order presented in the foregoing embodiment, and it may not be necessary to perform all operations in the foregoing embodiment.
图6a为本申请拾音装置一个实施例的结构图,如图6a所示,拾音装置600可以包括:Fig. 6a is a structural diagram of an embodiment of a sound pickup device of this application. As shown in Fig. 6a, the sound pickup device 600 may include:
方位获得单元610,用于获得用户相对电子设备的方位;所述电子设备设置有N个麦克风;N为大于等于3的整数;The position obtaining unit 610 is configured to obtain the position of the user relative to the electronic device; the electronic device is provided with N microphones; N is an integer greater than or equal to 3;
波束选择单元620,用于在所述电子设备的预设固定波束中,选择距离所述方位获得单元610获得的所述方位最近的固定波束作为主波束,按照距离所述方位从远到近的顺序选择至少一个固定波束作为副波束;The beam selection unit 620 is configured to select, among the preset fixed beams of the electronic device, the fixed beam that is closest to the azimuth obtained by the azimuth obtaining unit 610 as the main beam, according to the distance from the farthest to the closest to the azimuth. Select at least one fixed beam as the secondary beam in sequence;
信号计算单元630,用于当所述N个麦克风接收到声音信号时,使用所述波束选择单元620选择的所述主波束的波束形成系数计算所述声音信号的主输出信号, 并且,使用所述波束选择单元620选择的所述副波束的波束形成系数计算所述声音信号的副输出信号;The signal calculation unit 630 is configured to use the beamforming coefficient of the main beam selected by the beam selection unit 620 to calculate the main output signal of the sound signal when the N microphones receive the sound signal, and use all Calculating the beamforming coefficient of the secondary beam selected by the beam selecting unit 620 to the secondary output signal of the sound signal;
滤波单元640,用于使用所述信号计算单元630计算的所述副输出信号对所述主输出信号进行滤波处理,得到目标声音信号。The filtering unit 640 is configured to use the auxiliary output signal calculated by the signal calculation unit 630 to filter the main output signal to obtain a target sound signal.
其中,参见图6b所示,所述方位获得单元610可以包括:Wherein, referring to FIG. 6b, the position obtaining unit 610 may include:
图像获取子单元611,用于获取所述电子设备的摄像头捕捉到的图像;The image acquisition subunit 611 is configured to acquire the image captured by the camera of the electronic device;
方位获得子单元612,用于如果从所述图像子单元611获取到的所述图像中识别出所述电子设备的用户的人脸信息,根据所述人脸信息在所述图像中的位置信息,获得所述用户相对电子设备的方位;如果从所述图像子单元获取到的所述图像中未识别出所述用户的人脸信息,获取所述电子设备的摆放位置;根据所述摆放位置,获得所述用户相对所述电子设备的方位。The position obtaining subunit 612 is configured to, if the facial information of the user of the electronic device is recognized from the image obtained by the image subunit 611, according to the position information of the facial information in the image , Obtain the position of the user relative to the electronic device; if the user’s face information is not recognized in the image obtained from the image subunit, obtain the placement position of the electronic device; Position to obtain the position of the user relative to the electronic device.
其中,参见图6c所示,所述波束选择单元620可以包括:Wherein, referring to FIG. 6c, the beam selection unit 620 may include:
The ratio calculation subunit 621 is configured to calculate the ratio K of the azimuth with respect to each fixed beam: K_k = Δ_k / BW_k, where K_k is the ratio of the azimuth with respect to fixed beam k, the included angle Δ_k is the angle between the azimuth and the direction of fixed beam k, BW_k is the beam width of fixed beam k, k = 1, 2, …, M, and M is the number of groups of fixed beams;
波束选择子单元622,用于在所述比值计算子单元计算的比值中,选择最小的所述比值对应的固定波束作为主波束,按照所述比值从大到小的顺序从最大的所述比值开始选择至少一个所述比值对应的固定波束作为副波束。The beam selection subunit 622 is configured to select, among the ratios calculated by the ratio calculation subunit, the fixed beam corresponding to the smallest ratio as the main beam, and the ratio from the largest to the smallest in the order of the ratio Start to select at least one fixed beam corresponding to the ratio as a secondary beam.
参见图7a,在图6a所示装置的基础上,该装置600还可以包括:Referring to FIG. 7a, based on the device shown in FIG. 6a, the device 600 may further include:
波束获得单元650,用于获得M组固定波束的波束形成系数、方向、以及波束宽度,M为大于等于2的整数。The beam obtaining unit 650 is configured to obtain beamforming coefficients, directions, and beam widths of M groups of fixed beams, where M is an integer greater than or equal to 2.
参见图7b,所述波束获得单元650可以包括:Referring to FIG. 7b, the beam obtaining unit 650 may include:
坐标系建立子单元651,用于为电子设备建立三维笛卡尔坐标系;The coordinate system establishment subunit 651 is used to establish a three-dimensional Cartesian coordinate system for the electronic device;
坐标获得子单元652,用于获得所述N个麦克风在所述坐标系中的坐标;A coordinate obtaining subunit 652, configured to obtain the coordinates of the N microphones in the coordinate system;
理想导向矢量计算子单元653,用于根据所述N个麦克风的坐标计算目标声源在理想条件下的导向矢量;The ideal steering vector calculation subunit 653 is configured to calculate the steering vector of the target sound source under ideal conditions according to the coordinates of the N microphones;
矩阵获得子单元654,用于获得电子设备壳体对所述麦克风的频域响应矩阵;A matrix obtaining subunit 654, configured to obtain a frequency domain response matrix of the electronic device housing to the microphone;
真实导向矢量计算子单元655,用于根据所述理想条件下的导向矢量以及所述频域响应矩阵计算所述目标声源的真实导向矢量;The true steering vector calculation subunit 655 is configured to calculate the true steering vector of the target sound source according to the steering vector under the ideal condition and the frequency domain response matrix;
固定波束计算子单元656,用于根据所述真实导向矢量计算所述预设组数的固定波束的波束形成系数、方向、以及波束宽度。The fixed beam calculation subunit 656 is configured to calculate the beam forming coefficient, direction, and beam width of the preset number of fixed beams according to the real steering vector.
图6a~图7b所示实施例提供的拾音装置600可用于执行本申请图2~图4所示方法实施例的技术方案,其实现原理和技术效果可以进一步参考方法实施例中的相关描述。The sound pickup device 600 provided in the embodiment shown in FIGS. 6a to 7b can be used to implement the technical solutions of the method embodiments shown in FIGS. 2 to 4 of this application. For its implementation principles and technical effects, please refer to the related descriptions in the method embodiments. .
应理解以上图6a~图7b所示的拾音装置的各个单元的划分仅仅是一种逻辑功能的划分,实际实现时可以全部或部分集成到一个物理实体上,也可以物理上分开。且这些单元可以全部以软件通过处理元件调用的形式实现;也可以全部以硬件的形式实 现;还可以部分单元以软件通过处理元件调用的形式实现,部分单元通过硬件的形式实现。例如,方位获得单元可以为单独设立的处理元件,也可以集成在电子设备的某一个芯片中实现。其它单元的实现与之类似。此外这些单元全部或部分可以集成在一起,也可以独立实现。在实现过程中,上述方法的各步骤或以上各个单元可以通过处理器元件中的硬件的集成逻辑电路或者软件形式的指令完成。It should be understood that the division of the various units of the sound pickup device shown in FIGS. 6a to 7b is only a division of logical functions, and may be fully or partially integrated into a physical entity in actual implementation, or may be physically separated. And these units can all be implemented in the form of software invocation through processing elements; they can also be implemented in the form of hardware; part of the units can also be implemented in the form of software invocation through processing elements, and some of the units can be implemented in the form of hardware. For example, the position obtaining unit may be a separately established processing element, or it may be integrated in a certain chip of the electronic device. The implementation of other units is similar. In addition, all or part of these units can be integrated together or implemented independently. In the implementation process, each step of the above method or each of the above units can be completed by an integrated logic circuit of hardware in the processor element or instructions in the form of software.
例如,以上这些单元可以是被配置成实施以上方法的一个或多个集成电路,例如:一个或多个特定集成电路(Application Specific Integrated Circuit;以下简称:ASIC),或,一个或多个微处理器(Digital Singnal Processor;以下简称:DSP),或,一个或者多个现场可编程门阵列(Field Programmable Gate Array;以下简称:FPGA)等。再如,这些单元可以集成在一起,以片上系统(System-On-a-Chip;以下简称:SOC)的形式实现。For example, the above units may be one or more integrated circuits configured to implement the above methods, such as: one or more specific integrated circuits (Application Specific Integrated Circuit; hereinafter referred to as ASIC), or, one or more micro-processing Digital Processor (Digital Singnal Processor; hereinafter referred to as DSP), or, one or more Field Programmable Gate Array (Field Programmable Gate Array; hereinafter referred to as FPGA), etc. For another example, these units can be integrated together and implemented in the form of a System-On-a-Chip (hereinafter referred to as SOC).
图8为本申请电子设备一个实施例的结构示意图,如图8所示,上述电子设备可以包括:显示屏;一个或多个处理器;存储器;以及一个或多个计算机程序。FIG. 8 is a schematic structural diagram of an embodiment of an electronic device of this application. As shown in FIG. 8, the above-mentioned electronic device may include: a display screen; one or more processors; a memory; and one or more computer programs.
其中,上述显示屏可以包括车载计算机(移动数据中心Mobile Data Center)的显示屏;上述电子设备可以为移动终端(手机),电脑,PAD,可穿戴设备,智慧屏,无人机,智能网联车(Intelligent Connected Vehicle;以下简称:ICV),智能(汽)车(smart/intelligent car)或车载设备等设备。Among them, the above-mentioned display screen may include the display screen of a vehicle-mounted computer (Mobile Data Center); the above-mentioned electronic device may be a mobile terminal (mobile phone), a computer, a PAD, a wearable device, a smart screen, a drone, and an intelligent network connection. Vehicle (Intelligent Connected Vehicle; hereinafter referred to as ICV), smart/intelligent car (smart/intelligent car), or in-vehicle equipment.
其中上述一个或多个计算机程序被存储在上述存储器中,上述一个或多个计算机程序包括指令,当上述指令被上述设备执行时,使得上述设备执行以下步骤:The above-mentioned one or more computer programs are stored in the above-mentioned memory, and the above-mentioned one or more computer programs include instructions. When the above-mentioned instructions are executed by the above-mentioned device, the above-mentioned device is caused to perform the following steps:
获得用户相对电子设备的方位;所述电子设备设置有N个麦克风;N为大于等于3的整数;Obtain the position of the user relative to the electronic device; the electronic device is provided with N microphones; N is an integer greater than or equal to 3;
在所述电子设备的预设固定波束中,选择距离所述方位最近的固定波束作为主波束,按照距离所述方位从远到近的顺序选择至少一个固定波束作为副波束;Among the preset fixed beams of the electronic device, selecting the fixed beam closest to the azimuth as the main beam, and selecting at least one fixed beam as the secondary beam in the order from the farthest to the shortest from the azimuth;
当所述N个麦克风接收到声音信号时,使用所述主波束的波束形成系数计算所述声音信号的主输出信号,并且,使用所述副波束的波束形成系数计算所述声音信号的副输出信号;When the N microphones receive sound signals, the beamforming coefficients of the main beam are used to calculate the main output signal of the sound signal, and the beamforming coefficients of the side beams are used to calculate the side output of the sound signal. Signal;
使用所述副输出信号对所述主输出信号进行滤波处理,得到目标声音信号。Using the auxiliary output signal to perform filtering processing on the main output signal to obtain a target sound signal.
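For illustration only, and not as part of the original disclosure, the following Python sketch shows one plausible realization of the two calculation steps above: each fixed beam is applied as a set of per-bin weights to the N-microphone STFT, and the secondary output then drives a per-bin, single-tap normalized LMS filter that subtracts from the main output whatever the secondary (noise-oriented) beam can predict. The frame layout, the single-tap filter structure, and the step size mu are assumptions introduced here.

```python
import numpy as np

def beam_output(stft_frames, coeffs):
    # stft_frames: (num_frames, num_bins, num_mics) complex STFT of the N microphones.
    # coeffs:      (num_bins, num_mics) complex fixed-beam weights for one beam.
    # Returns the beam output, shape (num_frames, num_bins).
    return np.einsum('tfm,fm->tf', stft_frames, np.conj(coeffs))

def cancel_with_sub_beam(main_out, sub_out, mu=0.1, eps=1e-8):
    # Per-bin, single-tap NLMS: the secondary beam output acts as a noise
    # reference that is adaptively subtracted from the main beam output.
    num_frames, num_bins = main_out.shape
    w = np.zeros(num_bins, dtype=complex)
    target = np.empty_like(main_out)
    for t in range(num_frames):
        noise_est = w * sub_out[t]
        err = main_out[t] - noise_est            # filtered frame (target signal)
        target[t] = err
        power = np.abs(sub_out[t]) ** 2 + eps
        w = w + mu * np.conj(sub_out[t]) * err / power   # NLMS weight update
    return target
```

In a practical system the target signal would then be transformed back to the time domain; that step is omitted from the sketch.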
在一种可能的实现方式中,所述指令被所述设备执行时,使得所述获得用户相对电子设备的方位的步骤可以包括:In a possible implementation manner, when the instruction is executed by the device, the step of obtaining the user's position relative to the electronic device may include:
获取所述电子设备的摄像头捕捉到的图像;Acquiring an image captured by a camera of the electronic device;
如果从所述图像中识别出所述电子设备的用户的人脸信息,根据所述人脸信息在所述图像中的位置信息,获得所述用户相对电子设备的方位;If the face information of the user of the electronic device is recognized from the image, obtain the position of the user relative to the electronic device according to the position information of the face information in the image;
如果从所述图像中未识别出所述用户的人脸信息,获取所述电子设备的摆放位置;根据所述摆放位置,获得所述用户相对所述电子设备的方位。If the face information of the user is not recognized from the image, obtain the placement position of the electronic device; obtain the position of the user relative to the electronic device according to the placement position.
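The mapping from the recognized face position to an azimuth is not spelled out above, so the sketch below uses a hypothetical linear mapping over the camera's horizontal field of view, with a placeholder lookup table for the placement-based fallback; the function name, the 78-degree field of view, and the fallback angles are all assumptions rather than values from the embodiments.

```python
def user_azimuth_deg(face_box, image_width, placement, horizontal_fov_deg=78.0):
    # face_box: (x_left, x_right) of the detected face in pixels, or None if no
    # face was recognized. placement: a label describing how the device is placed.
    if face_box is not None:
        x_center = 0.5 * (face_box[0] + face_box[1])
        # Offset of the face from the image centre, scaled to the assumed FOV.
        return (x_center / image_width - 0.5) * horizontal_fov_deg
    # No face recognized: fall back to a default direction per placement (placeholders).
    defaults = {"upright_on_stand": 0.0, "flat_on_table": 45.0}
    return defaults.get(placement, 0.0)
```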
在一种可能的实现方式中，所述指令被所述设备执行时，使得所述在所述电子设备的预设固定波束中，选择距离所述方位最近的固定波束作为主波束，按照距离所述方位从远到近的顺序选择至少一个固定波束作为副波束的步骤可以包括：In a possible implementation manner, when the instruction is executed by the device, the step of selecting, among the preset fixed beams of the electronic device, the fixed beam closest to the azimuth as the main beam and selecting at least one fixed beam as the secondary beam in the order from the farthest to the nearest to the azimuth may include:
计算所述方位针对每个固定波束的比值K；K_k=夹角Δ_k/波束宽度W_k；其中，K_k是所述方位针对固定波束k的比值，夹角Δ_k是所述方位与固定波束k的方向之间的夹角，波束宽度W_k是固定波束k的波束宽度；k=1,2,…,M；M是固定波束的组数；
Calculate the ratio K of the azimuth for each fixed beam: K_k = Δ_k / W_k, where K_k is the ratio of the azimuth for fixed beam k, the included angle Δ_k is the angle between the azimuth and the direction of fixed beam k, the beam width W_k is the beam width of fixed beam k, k = 1, 2, …, M, and M is the number of groups of fixed beams;
选择最小的所述比值对应的固定波束作为主波束,按照所述比值从大到小的顺序从最大的所述比值开始选择至少一个所述比值对应的固定波束作为副波束。The fixed beam corresponding to the smallest ratio is selected as the main beam, and at least one fixed beam corresponding to the ratio is selected as the secondary beam starting from the largest ratio in the descending order of the ratio.
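The ratio-based selection described above can be condensed into a short routine; the beam directions, beam widths, and the single secondary beam in the example call below are placeholder values rather than figures from the disclosure.

```python
import numpy as np

def select_beams(azimuth_deg, beam_dirs_deg, beam_widths_deg, num_sub=1):
    # Angular distance between the user azimuth and each fixed-beam direction,
    # wrapped into [0, 180] degrees.
    dirs = np.asarray(beam_dirs_deg, dtype=float)
    deltas = np.abs((dirs - azimuth_deg + 180.0) % 360.0 - 180.0)
    ratios = deltas / np.asarray(beam_widths_deg, dtype=float)  # K_k = delta_k / width_k
    order = np.argsort(ratios)
    main = int(order[0])                              # smallest ratio -> main beam
    subs = [int(i) for i in order[::-1][:num_sub]]    # largest ratios -> secondary beams
    return main, subs

# Example: four fixed beams 90 degrees apart, user located at 100 degrees.
main_idx, sub_idx = select_beams(100.0, [0, 90, 180, 270], [60, 60, 60, 60], num_sub=1)
```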
在一种可能的实现方式中,所述指令被所述设备执行时,使得所述获得用户相对电子设备的方位的步骤之前还执行以下步骤:In a possible implementation manner, when the instruction is executed by the device, the following steps are further executed before the step of obtaining the user's position relative to the electronic device:
获得M组固定波束的波束形成系数、方向、以及波束宽度,M为大于等于2的整数。Obtain beamforming coefficients, directions, and beam widths of M groups of fixed beams, where M is an integer greater than or equal to 2.
在一种可能的实现方式中,所述指令被所述设备执行时,使得所述获得预设组数的固定波束的波束形成系数、方向、以及波束宽度的步骤可以包括:In a possible implementation manner, when the instruction is executed by the device, the step of obtaining the beamforming coefficient, direction, and beam width of a preset number of fixed beams may include:
为电子设备建立三维笛卡尔坐标系;Establish a three-dimensional Cartesian coordinate system for electronic equipment;
获得所述N个麦克风在所述坐标系中的坐标;Obtaining the coordinates of the N microphones in the coordinate system;
根据所述N个麦克风的坐标计算目标声源在理想条件下的导向矢量;Calculating the steering vector of the target sound source under ideal conditions according to the coordinates of the N microphones;
获得电子设备壳体对所述麦克风的频域响应矩阵;Obtaining a frequency domain response matrix of the housing of the electronic device to the microphone;
根据所述理想条件下的导向矢量以及所述频域响应矩阵计算所述目标声源的真实导向矢量;Calculating the true steering vector of the target sound source according to the steering vector under the ideal condition and the frequency domain response matrix;
根据所述真实导向矢量计算所述预设组数的固定波束的波束形成系数、方向、以及波束宽度。Calculate the beam forming coefficient, direction, and beam width of the preset number of fixed beams according to the real steering vector.
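As a rough illustration of the steps listed above, and again outside the original disclosure, the sketch below assumes a free-field plane-wave model for the ideal steering vector and a simple matched, distortionless weight design; the housing response matrix is assumed to be supplied from measurement or simulation, and the actual weight design used by the embodiments is left open by the text.

```python
import numpy as np

SPEED_OF_SOUND = 343.0  # m/s, assumed

def ideal_steering_vector(mic_xyz, azimuth_rad, elevation_rad, freq_hz):
    # mic_xyz: (N, 3) microphone coordinates in the device Cartesian coordinate system.
    u = np.array([np.cos(elevation_rad) * np.cos(azimuth_rad),
                  np.cos(elevation_rad) * np.sin(azimuth_rad),
                  np.sin(elevation_rad)])             # unit vector toward the source
    delays = mic_xyz @ u / SPEED_OF_SOUND             # relative delay per microphone
    return np.exp(-2j * np.pi * freq_hz * delays)     # ideal steering vector, shape (N,)

def true_steering_vector(a_ideal, housing_response):
    # housing_response: (N, N) frequency-domain response of the device housing
    # to the microphones at this frequency bin.
    return housing_response @ a_ideal

def fixed_beam_weights(a_true):
    # Matched weights normalized for unit gain in the look direction; one simple
    # option among many possible fixed-beam designs.
    return a_true / np.vdot(a_true, a_true)
```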
图8所示的电子设备可以是终端设备也可以是内置于上述终端设备的电路设备。该设备可以用于执行本申请图2~图4所示实施例提供的方法中的功能/步骤。The electronic device shown in FIG. 8 may be a terminal device or a circuit device built in the aforementioned terminal device. The device can be used to execute the functions/steps in the methods provided in the embodiments shown in FIGS. 2 to 4 of this application.
电子设备800可以包括处理器810，外部存储器接口820，内部存储器821，通用串行总线（universal serial bus，USB）接口830，充电管理模块840，电源管理模块841，电池842，天线1，天线2，移动通信模块850，无线通信模块860，音频模块870，扬声器870A，受话器870B，麦克风870C，耳机接口870D，传感器模块880，按键890，马达891，指示器892，摄像头893，显示屏894，以及用户标识模块（subscriber identification module，SIM）卡接口895等。其中传感器模块880可以包括压力传感器880A，陀螺仪传感器880B，气压传感器880C，磁传感器880D，加速度传感器880E，距离传感器880F，接近光传感器880G，指纹传感器880H，温度传感器880J，触摸传感器880K，环境光传感器880L，骨传导传感器880M等。The electronic device 800 may include a processor 810, an external memory interface 820, an internal memory 821, a universal serial bus (USB) interface 830, a charging management module 840, a power management module 841, a battery 842, an antenna 1, an antenna 2, a mobile communication module 850, a wireless communication module 860, an audio module 870, a speaker 870A, a receiver 870B, a microphone 870C, an earphone jack 870D, a sensor module 880, buttons 890, a motor 891, an indicator 892, a camera 893, a display 894, a subscriber identification module (SIM) card interface 895, etc. The sensor module 880 may include a pressure sensor 880A, a gyroscope sensor 880B, an air pressure sensor 880C, a magnetic sensor 880D, an acceleration sensor 880E, a distance sensor 880F, a proximity light sensor 880G, a fingerprint sensor 880H, a temperature sensor 880J, a touch sensor 880K, an ambient light sensor 880L, a bone conduction sensor 880M, etc.
可以理解的是,本发明实施例示意的结构并不构成对电子设备800的具体限定。在本申请另一些实施例中,电子设备800可以包括比图示更多或更少的部件,或者组合某些部件,或者拆分某些部件,或者不同的部件布置。图示的部件可以以硬件,软件或软件和硬件的组合实现。It can be understood that the structure illustrated in the embodiment of the present invention does not constitute a specific limitation on the electronic device 800. In other embodiments of the present application, the electronic device 800 may include more or fewer components than shown, or combine certain components, or split certain components, or arrange different components. The illustrated components can be implemented in hardware, software, or a combination of software and hardware.
处理器810可以包括一个或多个处理单元，例如：处理器810可以包括应用处理器（application processor，AP），调制解调处理器，图形处理器（graphics processing unit，GPU），图像信号处理器（image signal processor，ISP），控制器，视频编解码器，数字信号处理器（digital signal processor，DSP），基带处理器，和/或神经网络处理器（neural-network processing unit，NPU）等。其中，不同的处理单元可以是独立的器件，也可以集成在一个或多个处理器中。The processor 810 may include one or more processing units. For example, the processor 810 may include an application processor (AP), a modem processor, a graphics processing unit (GPU), an image signal processor (ISP), a controller, a video codec, a digital signal processor (DSP), a baseband processor, and/or a neural-network processing unit (NPU), etc. Among them, the different processing units may be independent devices or integrated in one or more processors.
控制器可以根据指令操作码和时序信号,产生操作控制信号,完成取指令和执行指令的控制。The controller can generate operation control signals according to the instruction operation code and timing signals to complete the control of fetching instructions and executing instructions.
处理器810中还可以设置存储器,用于存储指令和数据。在一些实施例中,处理器810中的存储器为高速缓冲存储器。该存储器可以保存处理器810刚用过或循环使用的指令或数据。如果处理器810需要再次使用该指令或数据,可从所述存储器中直接调用。避免了重复存取,减少了处理器810的等待时间,因而提高了系统的效率。A memory may also be provided in the processor 810 for storing instructions and data. In some embodiments, the memory in the processor 810 is a cache memory. The memory can store instructions or data that have just been used or recycled by the processor 810. If the processor 810 needs to use the instruction or data again, it can be directly called from the memory. Repeated accesses are avoided, the waiting time of the processor 810 is reduced, and the efficiency of the system is improved.
在一些实施例中，处理器810可以包括一个或多个接口。接口可以包括集成电路（inter-integrated circuit，I2C）接口，集成电路内置音频（inter-integrated circuit sound，I2S）接口，脉冲编码调制（pulse code modulation，PCM）接口，通用异步收发传输器（universal asynchronous receiver/transmitter，UART）接口，移动产业处理器接口（mobile industry processor interface，MIPI），通用输入输出（general-purpose input/output，GPIO）接口，用户标识模块（subscriber identity module，SIM）接口，和/或通用串行总线（universal serial bus，USB）接口等。In some embodiments, the processor 810 may include one or more interfaces. The interfaces may include an inter-integrated circuit (I2C) interface, an inter-integrated circuit sound (I2S) interface, a pulse code modulation (PCM) interface, a universal asynchronous receiver/transmitter (UART) interface, a mobile industry processor interface (MIPI), a general-purpose input/output (GPIO) interface, a subscriber identity module (SIM) interface, and/or a universal serial bus (USB) interface, etc.
I2C接口是一种双向同步串行总线，包括一根串行数据线（serial data line，SDA）和一根串行时钟线（serial clock line，SCL）。在一些实施例中，处理器810可以包含多组I2C总线。处理器810可以通过不同的I2C总线接口分别耦合触摸传感器880K，充电器，闪光灯，摄像头893等。例如：处理器810可以通过I2C接口耦合触摸传感器880K，使处理器810与触摸传感器880K通过I2C总线接口通信，实现电子设备800的触摸功能。The I2C interface is a bidirectional synchronous serial bus, including a serial data line (SDA) and a serial clock line (SCL). In some embodiments, the processor 810 may include multiple sets of I2C buses. The processor 810 may be coupled to the touch sensor 880K, charger, flash, camera 893, etc., respectively through different I2C bus interfaces. For example, the processor 810 may couple the touch sensor 880K through an I2C interface, so that the processor 810 and the touch sensor 880K communicate through the I2C bus interface to implement the touch function of the electronic device 800.
I2S接口可以用于音频通信。在一些实施例中,处理器810可以包含多组I2S总线。处理器810可以通过I2S总线与音频模块870耦合,实现处理器810与音频模块870之间的通信。在一些实施例中,音频模块870可以通过I2S接口向无线通信模块860传递音频信号,实现通过蓝牙耳机接听电话的功能。The I2S interface can be used for audio communication. In some embodiments, the processor 810 may include multiple sets of I2S buses. The processor 810 may be coupled with the audio module 870 through an I2S bus to implement communication between the processor 810 and the audio module 870. In some embodiments, the audio module 870 may transmit audio signals to the wireless communication module 860 through the I2S interface, so as to realize the function of answering calls through the Bluetooth headset.
PCM接口也可以用于音频通信,将模拟信号抽样,量化和编码。在一些实施例中,音频模块870与无线通信模块860可以通过PCM总线接口耦合。在一些实施例中,音频模块870也可以通过PCM接口向无线通信模块860传递音频信号,实现通过蓝牙耳机接听电话的功能。所述I2S接口和所述PCM接口都可以用于音频通信。The PCM interface can also be used for audio communication to sample, quantize and encode analog signals. In some embodiments, the audio module 870 and the wireless communication module 860 may be coupled through a PCM bus interface. In some embodiments, the audio module 870 may also transmit audio signals to the wireless communication module 860 through the PCM interface, so as to realize the function of answering calls through the Bluetooth headset. Both the I2S interface and the PCM interface can be used for audio communication.
UART接口是一种通用串行数据总线,用于异步通信。该总线可以为双向通信总线。它将要传输的数据在串行通信与并行通信之间转换。在一些实施例中,UART接口通常被用于连接处理器810与无线通信模块860。例如:处理器810通过UART接口与无线通信模块860中的蓝牙模块通信,实现蓝牙功能。在一些实施例中,音频模块870可以通过UART接口向无线通信模块860传递音频信号,实现通过蓝牙耳机播放音乐的功能。The UART interface is a universal serial data bus used for asynchronous communication. The bus can be a two-way communication bus. It converts the data to be transmitted between serial communication and parallel communication. In some embodiments, the UART interface is usually used to connect the processor 810 and the wireless communication module 860. For example, the processor 810 communicates with the Bluetooth module in the wireless communication module 860 through the UART interface to implement the Bluetooth function. In some embodiments, the audio module 870 may transmit audio signals to the wireless communication module 860 through a UART interface, so as to realize the function of playing music through a Bluetooth headset.
MIPI接口可以被用于连接处理器810与显示屏894,摄像头893等外围器件。MIPI接口包括摄像头串行接口(camera serial interface,CSI),显示屏串行接口(display  serial interface,DSI)等。在一些实施例中,处理器810和摄像头893通过CSI接口通信,实现电子设备800的拍摄功能。处理器810和显示屏894通过DSI接口通信,实现电子设备800的显示功能。The MIPI interface can be used to connect the processor 810 with the display screen 894, the camera 893 and other peripheral devices. The MIPI interface includes camera serial interface (CSI), display serial interface (DSI) and so on. In some embodiments, the processor 810 and the camera 893 communicate through a CSI interface to implement the shooting function of the electronic device 800. The processor 810 and the display screen 894 communicate through a DSI interface to realize the display function of the electronic device 800.
GPIO接口可以通过软件配置。GPIO接口可以被配置为控制信号,也可被配置为数据信号。在一些实施例中,GPIO接口可以用于连接处理器810与摄像头893,显示屏894,无线通信模块860,音频模块870,传感器模块880等。GPIO接口还可以被配置为I2C接口,I2S接口,UART接口,MIPI接口等。The GPIO interface can be configured through software. The GPIO interface can be configured as a control signal or as a data signal. In some embodiments, the GPIO interface can be used to connect the processor 810 with the camera 893, the display screen 894, the wireless communication module 860, the audio module 870, the sensor module 880, and so on. The GPIO interface can also be configured as an I2C interface, I2S interface, UART interface, MIPI interface, etc.
USB接口830是符合USB标准规范的接口,具体可以是Mini USB接口,Micro USB接口,USB Type C接口等。USB接口830可以用于连接充电器为电子设备800充电,也可以用于电子设备800与外围设备之间传输数据。也可以用于连接耳机,通过耳机播放音频。该接口还可以用于连接其他电子设备,例如AR设备等。The USB interface 830 is an interface that complies with the USB standard specification, and specifically may be a Mini USB interface, a Micro USB interface, a USB Type C interface, and so on. The USB interface 830 can be used to connect a charger to charge the electronic device 800, and can also be used to transfer data between the electronic device 800 and peripheral devices. It can also be used to connect earphones and play audio through earphones. This interface can also be used to connect other electronic devices, such as AR devices.
可以理解的是,本发明实施例示意的各模块间的接口连接关系,只是示意性说明,并不构成对电子设备800的结构限定。在本申请另一些实施例中,电子设备800也可以采用上述实施例中不同的接口连接方式,或多种接口连接方式的组合。It can be understood that the interface connection relationship between the modules illustrated in the embodiment of the present invention is merely a schematic description, and does not constitute a structural limitation of the electronic device 800. In other embodiments of the present application, the electronic device 800 may also adopt different interface connection modes in the foregoing embodiments, or a combination of multiple interface connection modes.
充电管理模块840用于从充电器接收充电输入。其中,充电器可以是无线充电器,也可以是有线充电器。在一些有线充电的实施例中,充电管理模块840可以通过USB接口830接收有线充电器的充电输入。在一些无线充电的实施例中,充电管理模块840可以通过电子设备800的无线充电线圈接收无线充电输入。充电管理模块840为电池842充电的同时,还可以通过电源管理模块841为电子设备供电。The charging management module 840 is used to receive charging input from the charger. Among them, the charger can be a wireless charger or a wired charger. In some wired charging embodiments, the charging management module 840 may receive the charging input of the wired charger through the USB interface 830. In some embodiments of wireless charging, the charging management module 840 may receive the wireless charging input through the wireless charging coil of the electronic device 800. While the charging management module 840 charges the battery 842, it can also supply power to the electronic device through the power management module 841.
电源管理模块841用于连接电池842,充电管理模块840与处理器810。电源管理模块841接收电池842和/或充电管理模块840的输入,为处理器810,内部存储器821,显示屏894,摄像头893,和无线通信模块860等供电。电源管理模块841还可以用于监测电池容量,电池循环次数,电池健康状态(漏电,阻抗)等参数。在其他一些实施例中,电源管理模块841也可以设置于处理器810中。在另一些实施例中,电源管理模块841和充电管理模块840也可以设置于同一个器件中。The power management module 841 is used to connect the battery 842, the charging management module 840 and the processor 810. The power management module 841 receives input from the battery 842 and/or the charging management module 840, and supplies power to the processor 810, the internal memory 821, the display screen 894, the camera 893, and the wireless communication module 860. The power management module 841 can also be used to monitor parameters such as battery capacity, battery cycle times, and battery health status (leakage, impedance). In some other embodiments, the power management module 841 may also be provided in the processor 810. In other embodiments, the power management module 841 and the charging management module 840 may also be provided in the same device.
电子设备800的无线通信功能可以通过天线1,天线2,移动通信模块850,无线通信模块860,调制解调处理器以及基带处理器等实现。The wireless communication function of the electronic device 800 can be implemented by the antenna 1, the antenna 2, the mobile communication module 850, the wireless communication module 860, the modem processor, and the baseband processor.
天线1和天线2用于发射和接收电磁波信号。电子设备800中的每个天线可用于覆盖单个或多个通信频带。不同的天线还可以复用,以提高天线的利用率。例如:可以将天线1复用为无线局域网的分集天线。在另外一些实施例中,天线可以和调谐开关结合使用。The antenna 1 and the antenna 2 are used to transmit and receive electromagnetic wave signals. Each antenna in the electronic device 800 can be used to cover a single or multiple communication frequency bands. Different antennas can also be reused to improve antenna utilization. For example: Antenna 1 can be multiplexed as a diversity antenna of a wireless local area network. In other embodiments, the antenna can be used in combination with a tuning switch.
移动通信模块850可以提供应用在电子设备800上的包括2G/3G/4G/5G等无线通信的解决方案。移动通信模块850可以包括至少一个滤波器,开关,功率放大器,低噪声放大器(low noise amplifier,LNA)等。移动通信模块850可以由天线1接收电磁波,并对接收的电磁波进行滤波,放大等处理,传送至调制解调处理器进行解调。移动通信模块850还可以对经调制解调处理器调制后的信号放大,经天线1转为电磁波辐射出去。在一些实施例中,移动通信模块850的至少部分功能模块可以被设置于处理器810中。在一些实施例中,移动通信模块850的至少部分功能模块可以 与处理器810的至少部分模块被设置在同一个器件中。The mobile communication module 850 may provide a wireless communication solution including 2G/3G/4G/5G and the like applied to the electronic device 800. The mobile communication module 850 may include at least one filter, a switch, a power amplifier, a low noise amplifier (LNA), and the like. The mobile communication module 850 can receive electromagnetic waves by the antenna 1, filter and amplify the received electromagnetic waves, and transmit them to the modem processor for demodulation. The mobile communication module 850 can also amplify the signal modulated by the modem processor, and convert it to electromagnetic wave radiation via the antenna 1. In some embodiments, at least part of the functional modules of the mobile communication module 850 may be provided in the processor 810. In some embodiments, at least part of the functional modules of the mobile communication module 850 and at least part of the modules of the processor 810 may be provided in the same device.
调制解调处理器可以包括调制器和解调器。其中,调制器用于将待发送的低频基带信号调制成中高频信号。解调器用于将接收的电磁波信号解调为低频基带信号。随后解调器将解调得到的低频基带信号传送至基带处理器处理。低频基带信号经基带处理器处理后,被传递给应用处理器。应用处理器通过音频设备(不限于扬声器870A,受话器870B等)输出声音信号,或通过显示屏894显示图像或视频。在一些实施例中,调制解调处理器可以是独立的器件。在另一些实施例中,调制解调处理器可以独立于处理器810,与移动通信模块850或其他功能模块设置在同一个器件中。The modem processor may include a modulator and a demodulator. Among them, the modulator is used to modulate the low frequency baseband signal to be sent into a medium and high frequency signal. The demodulator is used to demodulate the received electromagnetic wave signal into a low-frequency baseband signal. The demodulator then transmits the demodulated low-frequency baseband signal to the baseband processor for processing. After the low-frequency baseband signal is processed by the baseband processor, it is passed to the application processor. The application processor outputs a sound signal through an audio device (not limited to a speaker 870A, a receiver 870B, etc.), or displays an image or video through the display screen 894. In some embodiments, the modem processor may be an independent device. In other embodiments, the modem processor may be independent of the processor 810 and be provided in the same device as the mobile communication module 850 or other functional modules.
无线通信模块860可以提供应用在电子设备800上的包括无线局域网(wireless local area networks,WLAN)(如无线保真(wireless fidelity,Wi-Fi)网络),蓝牙(bluetooth,BT),全球导航卫星系统(global navigation satellite system,GNSS),调频(frequency modulation,FM),近距离无线通信技术(near field communication,NFC),红外技术(infrared,IR)等无线通信的解决方案。无线通信模块860可以是集成至少一个通信处理模块的一个或多个器件。无线通信模块860经由天线2接收电磁波,将电磁波信号调频以及滤波处理,将处理后的信号发送到处理器810。无线通信模块860还可以从处理器810接收待发送的信号,对其进行调频,放大,经天线2转为电磁波辐射出去。The wireless communication module 860 can provide applications on the electronic device 800 including wireless local area networks (WLAN) (such as wireless fidelity (Wi-Fi) networks), bluetooth (BT), and global navigation satellites. System (global navigation satellite system, GNSS), frequency modulation (FM), near field communication (NFC), infrared technology (infrared, IR) and other wireless communication solutions. The wireless communication module 860 may be one or more devices integrating at least one communication processing module. The wireless communication module 860 receives electromagnetic waves via the antenna 2, frequency modulates and filters the electromagnetic wave signals, and sends the processed signals to the processor 810. The wireless communication module 860 may also receive the signal to be sent from the processor 810, perform frequency modulation, amplify it, and convert it into electromagnetic wave radiation via the antenna 2.
在一些实施例中,电子设备800的天线1和移动通信模块850耦合,天线2和无线通信模块860耦合,使得电子设备800可以通过无线通信技术与网络以及其他设备通信。所述无线通信技术可以包括全球移动通讯系统(global system for mobile communications,GSM),通用分组无线服务(general packet radio service,GPRS),码分多址接入(code division multiple access,CDMA),宽带码分多址(wideband code division multiple access,WCDMA),时分码分多址(time-division code division multiple access,TD-SCDMA),长期演进(long term evolution,LTE),BT,GNSS,WLAN,NFC,FM,和/或IR技术等。所述GNSS可以包括全球卫星定位系统(global positioning system,GPS),全球导航卫星系统(global navigation satellite system,GLONASS),北斗卫星导航系统(beidou navigation satellite system,BDS),准天顶卫星系统(quasi-zenith satellite system,QZSS)和/或星基增强系统(satellite based augmentation systems,SBAS)。In some embodiments, the antenna 1 of the electronic device 800 is coupled with the mobile communication module 850, and the antenna 2 is coupled with the wireless communication module 860, so that the electronic device 800 can communicate with the network and other devices through wireless communication technology. The wireless communication technology may include global system for mobile communications (GSM), general packet radio service (GPRS), code division multiple access (CDMA), broadband Code division multiple access (wideband code division multiple access, WCDMA), time-division code division multiple access (TD-SCDMA), long term evolution (LTE), BT, GNSS, WLAN, NFC , FM, and/or IR technology, etc. The GNSS may include global positioning system (GPS), global navigation satellite system (GLONASS), Beidou navigation satellite system (BDS), quasi-zenith satellite system (quasi -zenith satellite system, QZSS) and/or satellite-based augmentation systems (SBAS).
电子设备800通过GPU,显示屏894,以及应用处理器等实现显示功能。GPU为图像处理的微处理器,连接显示屏894和应用处理器。GPU用于执行数学和几何计算,用于图形渲染。处理器810可包括一个或多个GPU,其执行程序指令以生成或改变显示信息。The electronic device 800 implements a display function through a GPU, a display screen 894, and an application processor. The GPU is a microprocessor for image processing, which connects the display 894 and the application processor. The GPU is used to perform mathematical and geometric calculations and is used for graphics rendering. The processor 810 may include one or more GPUs, which execute program instructions to generate or change display information.
显示屏894用于显示图像，视频等。显示屏894包括显示面板。显示面板可以采用液晶显示屏（liquid crystal display，LCD），有机发光二极管（organic light-emitting diode，OLED），有源矩阵有机发光二极体或主动矩阵有机发光二极体（active-matrix organic light emitting diode，AMOLED），柔性发光二极管（flex light-emitting diode，FLED），Miniled，MicroLed，Micro-oLed，量子点发光二极管（quantum dot light emitting diodes，QLED）等。在一些实施例中，电子设备800可以包括1个或N个显示屏894，N为大于1的正整数。The display screen 894 is used to display images, videos, and so on. The display screen 894 includes a display panel. The display panel may adopt a liquid crystal display (LCD), an organic light-emitting diode (OLED), an active-matrix organic light-emitting diode (AMOLED), a flexible light-emitting diode (FLED), a Miniled, a MicroLed, a Micro-oLed, quantum dot light-emitting diodes (QLED), etc. In some embodiments, the electronic device 800 may include 1 or N display screens 894, and N is a positive integer greater than 1.
电子设备800可以通过ISP,摄像头893,视频编解码器,GPU,显示屏894以及应用处理器等实现拍摄功能。The electronic device 800 can realize a shooting function through an ISP, a camera 893, a video codec, a GPU, a display screen 894, and an application processor.
ISP用于处理摄像头893反馈的数据。例如,拍照时,打开快门,光线通过镜头被传递到摄像头感光元件上,光信号转换为电信号,摄像头感光元件将所述电信号传递给ISP处理,转化为肉眼可见的图像。ISP还可以对图像的噪点,亮度,肤色进行算法优化。ISP还可以对拍摄场景的曝光,色温等参数优化。在一些实施例中,ISP可以设置在摄像头893中。The ISP is used to process the data fed back from the camera 893. For example, when taking a picture, the shutter is opened, the light is transmitted to the photosensitive element of the camera through the lens, the light signal is converted into an electrical signal, and the photosensitive element of the camera transmits the electrical signal to the ISP for processing and is converted into an image visible to the naked eye. ISP can also optimize the image noise, brightness, and skin color. ISP can also optimize the exposure, color temperature and other parameters of the shooting scene. In some embodiments, the ISP may be provided in the camera 893.
摄像头893用于捕获静态图像或视频。物体通过镜头生成光学图像投射到感光元件。感光元件可以是电荷耦合器件(charge coupled device,CCD)或互补金属氧化物半导体(complementary metal-oxide-semiconductor,CMOS)光电晶体管。感光元件把光信号转换成电信号,之后将电信号传递给ISP转换成数字图像信号。ISP将数字图像信号输出到DSP加工处理。DSP将数字图像信号转换成标准的RGB,YUV等格式的图像信号。在一些实施例中,电子设备800可以包括1个或N个摄像头893,N为大于1的正整数。The camera 893 is used to capture still images or videos. The object generates an optical image through the lens and is projected to the photosensitive element. The photosensitive element may be a charge coupled device (CCD) or a complementary metal-oxide-semiconductor (CMOS) phototransistor. The photosensitive element converts the optical signal into an electrical signal, and then transfers the electrical signal to the ISP to convert it into a digital image signal. ISP outputs digital image signals to DSP for processing. DSP converts digital image signals into standard RGB, YUV and other formats of image signals. In some embodiments, the electronic device 800 may include 1 or N cameras 893, and N is a positive integer greater than 1.
数字信号处理器用于处理数字信号,除了可以处理数字图像信号,还可以处理其他数字信号。例如,当电子设备800在频点选择时,数字信号处理器用于对频点能量进行傅里叶变换等。Digital signal processors are used to process digital signals. In addition to digital image signals, they can also process other digital signals. For example, when the electronic device 800 selects a frequency point, the digital signal processor is used to perform Fourier transform on the energy of the frequency point.
视频编解码器用于对数字视频压缩或解压缩。电子设备800可以支持一种或多种视频编解码器。这样,电子设备800可以播放或录制多种编码格式的视频,例如:动态图像专家组(moving picture experts group,MPEG)1,MPEG2,MPEG3,MPEG4等。Video codecs are used to compress or decompress digital video. The electronic device 800 may support one or more video codecs. In this way, the electronic device 800 can play or record videos in multiple encoding formats, such as: moving picture experts group (MPEG) 1, MPEG2, MPEG3, MPEG4, and so on.
NPU为神经网络(neural-network,NN)计算处理器,通过借鉴生物神经网络结构,例如借鉴人脑神经元之间传递模式,对输入信息快速处理,还可以不断的自学习。通过NPU可以实现电子设备800的智能认知等应用,例如:图像识别,人脸识别,语音识别,文本理解等。NPU is a neural-network (NN) computing processor. By drawing on the structure of biological neural networks, for example, the transfer mode between human brain neurons, it can quickly process input information, and it can also continuously self-learn. Through the NPU, applications such as intelligent cognition of the electronic device 800 can be realized, such as image recognition, face recognition, voice recognition, text understanding, and so on.
外部存储器接口820可以用于连接外部存储卡,例如Micro SD卡,实现扩展电子设备800的存储能力。外部存储卡通过外部存储器接口820与处理器810通信,实现数据存储功能。例如将音乐,视频等文件保存在外部存储卡中。The external memory interface 820 may be used to connect an external memory card, such as a Micro SD card, so as to expand the storage capacity of the electronic device 800. The external memory card communicates with the processor 810 through the external memory interface 820 to realize the data storage function. For example, save music, video and other files in an external memory card.
内部存储器821可以用于存储计算机可执行程序代码,所述可执行程序代码包括指令。内部存储器821可以包括存储程序区和存储数据区。其中,存储程序区可存储操作系统,至少一个功能所需的应用程序(比如声音播放功能,图像播放功能等)等。存储数据区可存储电子设备800使用过程中所创建的数据(比如音频数据,电话本等)等。此外,内部存储器821可以包括高速随机存取存储器,还可以包括非易失性存储器,例如至少一个磁盘存储器件,闪存器件,通用闪存存储器(universal flash storage,UFS)等。处理器810通过运行存储在内部存储器821的指令,和/或存储在设置于处理器中的存储器的指令,执行电子设备800的各种功能应用以及数据处理。The internal memory 821 may be used to store computer executable program code, where the executable program code includes instructions. The internal memory 821 may include a storage program area and a storage data area. Among them, the storage program area can store an operating system, an application program (such as a sound playback function, an image playback function, etc.) required by at least one function, and the like. The data storage area can store data (such as audio data, phone book, etc.) created during the use of the electronic device 800. In addition, the internal memory 821 may include a high-speed random access memory, and may also include a non-volatile memory, such as at least one magnetic disk storage device, a flash memory device, a universal flash storage (UFS), and the like. The processor 810 executes various functional applications and data processing of the electronic device 800 by running instructions stored in the internal memory 821 and/or instructions stored in a memory provided in the processor.
电子设备800可以通过音频模块870,扬声器870A,受话器870B,麦克风870C,耳机接口870D,以及应用处理器等实现音频功能。例如音乐播放,录音等。The electronic device 800 can implement audio functions through an audio module 870, a speaker 870A, a receiver 870B, a microphone 870C, a headphone interface 870D, and an application processor. For example, music playback, recording, etc.
音频模块870用于将数字音频信息转换成模拟音频信号输出,也用于将模拟音频输入转换为数字音频信号。音频模块870还可以用于对音频信号编码和解码。在一些实施例中,音频模块870可以设置于处理器810中,或将音频模块870的部分功能模块设置于处理器810中。The audio module 870 is used to convert digital audio information into an analog audio signal for output, and is also used to convert an analog audio input into a digital audio signal. The audio module 870 can also be used to encode and decode audio signals. In some embodiments, the audio module 870 may be provided in the processor 810, or part of the functional modules of the audio module 870 may be provided in the processor 810.
扬声器870A,也称“喇叭”,用于将音频电信号转换为声音信号。电子设备800可以通过扬声器870A收听音乐,或收听免提通话。The speaker 870A, also called "speaker", is used to convert audio electrical signals into sound signals. The electronic device 800 can listen to music through the speaker 870A, or listen to a hands-free call.
受话器870B,也称“听筒”,用于将音频电信号转换成声音信号。当电子设备800接听电话或语音信息时,可以通过将受话器870B靠近人耳接听语音。The receiver 870B, also called "earpiece", is used to convert audio electrical signals into sound signals. When the electronic device 800 answers a call or voice message, it can receive the voice by bringing the receiver 870B close to the human ear.
麦克风870C,也称“话筒”,“传声器”,用于将声音信号转换为电信号。当拨打电话或发送语音信息时,用户可以通过人嘴靠近麦克风870C发声,将声音信号输入到麦克风870C。电子设备800可以设置至少一个麦克风870C。在另一些实施例中,电子设备800可以设置两个麦克风870C,除了采集声音信号,还可以实现降噪功能。在另一些实施例中,电子设备800还可以设置三个,四个或更多麦克风870C,实现采集声音信号,降噪,还可以识别声音来源,实现定向录音功能等。 Microphone 870C, also called "microphone" or "microphone", is used to convert sound signals into electrical signals. When making a call or sending a voice message, the user can make a sound by approaching the microphone 870C through the human mouth, and input the sound signal to the microphone 870C. The electronic device 800 may be provided with at least one microphone 870C. In other embodiments, the electronic device 800 may be provided with two microphones 870C, which can implement noise reduction functions in addition to collecting sound signals. In some other embodiments, the electronic device 800 can also be equipped with three, four or more microphones 870C to collect sound signals, reduce noise, identify sound sources, and realize directional recording functions.
耳机接口870D用于连接有线耳机。耳机接口870D可以是USB接口830,也可以是3.5mm的开放移动电子设备平台(open mobile terminal platform,OMTP)标准接口,美国蜂窝电信工业协会(cellular telecommunications industry association of the USA,CTIA)标准接口。The earphone interface 870D is used to connect wired earphones. The earphone interface 870D may be a USB interface 830, or a 3.5mm open mobile terminal platform (OMTP) standard interface, and a cellular telecommunications industry association (cellular telecommunications industry association of the USA, CTIA) standard interface.
压力传感器880A用于感受压力信号,可以将压力信号转换成电信号。在一些实施例中,压力传感器880A可以设置于显示屏894。压力传感器880A的种类很多,如电阻式压力传感器,电感式压力传感器,电容式压力传感器等。电容式压力传感器可以是包括至少两个具有导电材料的平行板。当有力作用于压力传感器880A,电极之间的电容改变。电子设备800根据电容的变化确定压力的强度。当有触摸操作作用于显示屏894,电子设备800根据压力传感器880A检测所述触摸操作强度。电子设备800也可以根据压力传感器880A的检测信号计算触摸的位置。在一些实施例中,作用于相同触摸位置,但不同触摸操作强度的触摸操作,可以对应不同的操作指令。例如:当有触摸操作强度小于第一压力阈值的触摸操作作用于短消息应用图标时,执行查看短消息的指令。当有触摸操作强度大于或等于第一压力阈值的触摸操作作用于短消息应用图标时,执行新建短消息的指令。The pressure sensor 880A is used to sense the pressure signal and can convert the pressure signal into an electrical signal. In some embodiments, the pressure sensor 880A may be provided on the display screen 894. There are many types of pressure sensors 880A, such as resistive pressure sensors, inductive pressure sensors, capacitive pressure sensors and so on. The capacitive pressure sensor may include at least two parallel plates with conductive materials. When a force is applied to the pressure sensor 880A, the capacitance between the electrodes changes. The electronic device 800 determines the intensity of the pressure according to the change in capacitance. When a touch operation acts on the display screen 894, the electronic device 800 detects the intensity of the touch operation according to the pressure sensor 880A. The electronic device 800 may also calculate the touched position according to the detection signal of the pressure sensor 880A. In some embodiments, touch operations that act on the same touch position but have different touch operation strengths may correspond to different operation instructions. For example: when a touch operation whose intensity of the touch operation is less than the first pressure threshold is applied to the short message application icon, an instruction to view the short message is executed. When a touch operation with a touch operation intensity greater than or equal to the first pressure threshold acts on the short message application icon, an instruction to create a new short message is executed.
陀螺仪传感器880B可以用于确定电子设备800的运动姿态。在一些实施例中，可以通过陀螺仪传感器880B确定电子设备800围绕三个轴（即，x，y和z轴）的角速度。陀螺仪传感器880B可以用于拍摄防抖。示例性的，当按下快门，陀螺仪传感器880B检测电子设备800抖动的角度，根据角度计算出镜头模组需要补偿的距离，让镜头通过反向运动抵消电子设备800的抖动，实现防抖。陀螺仪传感器880B还可以用于导航，体感游戏场景。The gyro sensor 880B can be used to determine the movement posture of the electronic device 800. In some embodiments, the angular velocity of the electronic device 800 around three axes (ie, x, y, and z axes) can be determined by the gyroscope sensor 880B. The gyro sensor 880B can be used for shooting anti-shake. Exemplarily, when the shutter is pressed, the gyroscope sensor 880B detects the jitter angle of the electronic device 800, calculates the distance that the lens module needs to compensate according to the angle, and allows the lens to counteract the jitter of the electronic device 800 through reverse movement to achieve anti-shake. The gyro sensor 880B can also be used for navigation and somatosensory game scenes.
气压传感器880C用于测量气压。在一些实施例中,电子设备800通过气压传感 器880C测得的气压值计算海拔高度,辅助定位和导航。The air pressure sensor 880C is used to measure air pressure. In some embodiments, the electronic device 800 uses the air pressure value measured by the air pressure sensor 880C to calculate the altitude to assist positioning and navigation.
磁传感器880D包括霍尔传感器。电子设备800可以利用磁传感器880D检测翻盖皮套的开合。在一些实施例中,当电子设备800是翻盖机时,电子设备800可以根据磁传感器880D检测翻盖的开合。进而根据检测到的皮套的开合状态或翻盖的开合状态,设置翻盖自动解锁等特性。The magnetic sensor 880D includes a Hall sensor. The electronic device 800 can use the magnetic sensor 880D to detect the opening and closing of the flip holster. In some embodiments, when the electronic device 800 is a flip machine, the electronic device 800 can detect the opening and closing of the flip according to the magnetic sensor 880D. Furthermore, according to the detected opening and closing state of the leather case or the opening and closing state of the flip cover, features such as automatic unlocking of the flip cover are set.
加速度传感器880E可检测电子设备800在各个方向上(一般为三轴)加速度的大小。当电子设备800静止时可检测出重力的大小及方向。还可以用于识别电子设备姿态,应用于横竖屏切换,计步器等应用。The acceleration sensor 880E can detect the magnitude of the acceleration of the electronic device 800 in various directions (generally three axes). When the electronic device 800 is stationary, the magnitude and direction of gravity can be detected. It can also be used to identify the posture of electronic devices, and apply to applications such as horizontal and vertical screen switching, pedometers, and so on.
距离传感器880F,用于测量距离。电子设备800可以通过红外或激光测量距离。在一些实施例中,拍摄场景,电子设备800可以利用距离传感器880F测距以实现快速对焦。Distance sensor 880F, used to measure distance. The electronic device 800 can measure the distance by infrared or laser. In some embodiments, when shooting a scene, the electronic device 800 may use the distance sensor 880F to measure the distance to achieve fast focusing.
接近光传感器880G可以包括例如发光二极管(LED)和光检测器,例如光电二极管。发光二极管可以是红外发光二极管。电子设备800通过发光二极管向外发射红外光。电子设备800使用光电二极管检测来自附近物体的红外反射光。当检测到充分的反射光时,可以确定电子设备800附近有物体。当检测到不充分的反射光时,电子设备800可以确定电子设备800附近没有物体。电子设备800可以利用接近光传感器880G检测用户手持电子设备800贴近耳朵通话,以便自动熄灭屏幕达到省电的目的。接近光传感器880G也可用于皮套模式,口袋模式自动解锁与锁屏。The proximity light sensor 880G may include, for example, a light emitting diode (LED) and a light detector, such as a photodiode. The light emitting diode may be an infrared light emitting diode. The electronic device 800 emits infrared light to the outside through the light emitting diode. The electronic device 800 uses a photodiode to detect infrared reflected light from nearby objects. When sufficient reflected light is detected, it can be determined that there is an object near the electronic device 800. When insufficient reflected light is detected, the electronic device 800 can determine that there is no object near the electronic device 800. The electronic device 800 can use the proximity light sensor 880G to detect that the user holds the electronic device 800 close to the ear to talk, so as to automatically turn off the screen to save power. The proximity light sensor 880G can also be used in leather case mode, and the pocket mode will automatically unlock and lock the screen.
环境光传感器880L用于感知环境光亮度。电子设备800可以根据感知的环境光亮度自适应调节显示屏894亮度。环境光传感器880L也可用于拍照时自动调节白平衡。环境光传感器880L还可以与接近光传感器880G配合,检测电子设备800是否在口袋里,以防误触。The ambient light sensor 880L is used to sense the brightness of the ambient light. The electronic device 800 can adaptively adjust the brightness of the display screen 894 according to the perceived brightness of the ambient light. The ambient light sensor 880L can also be used to automatically adjust the white balance when taking pictures. The ambient light sensor 880L can also cooperate with the proximity light sensor 880G to detect whether the electronic device 800 is in the pocket to prevent accidental touch.
指纹传感器880H用于采集指纹。电子设备800可以利用采集的指纹特性实现指纹解锁,访问应用锁,指纹拍照,指纹接听来电等。The fingerprint sensor 880H is used to collect fingerprints. The electronic device 800 can use the collected fingerprint characteristics to realize fingerprint unlocking, access application locks, fingerprint photographs, fingerprint answering calls, and so on.
温度传感器880J用于检测温度。在一些实施例中,电子设备800利用温度传感器880J检测的温度,执行温度处理策略。例如,当温度传感器880J上报的温度超过阈值,电子设备800执行降低位于温度传感器880J附近的处理器的性能,以便降低功耗实施热保护。在另一些实施例中,当温度低于另一阈值时,电子设备800对电池842加热,以避免低温导致电子设备800异常关机。在其他一些实施例中,当温度低于又一阈值时,电子设备800对电池842的输出电压执行升压,以避免低温导致的异常关机。The temperature sensor 880J is used to detect temperature. In some embodiments, the electronic device 800 uses the temperature detected by the temperature sensor 880J to execute a temperature processing strategy. For example, when the temperature reported by the temperature sensor 880J exceeds a threshold value, the electronic device 800 executes to reduce the performance of the processor located near the temperature sensor 880J, so as to reduce power consumption and implement thermal protection. In other embodiments, when the temperature is lower than another threshold, the electronic device 800 heats the battery 842 to avoid abnormal shutdown of the electronic device 800 due to low temperature. In some other embodiments, when the temperature is lower than another threshold, the electronic device 800 boosts the output voltage of the battery 842 to avoid abnormal shutdown caused by low temperature.
触摸传感器880K,也称“触控器件”。触摸传感器880K可以设置于显示屏894,由触摸传感器880K与显示屏894组成触摸屏,也称“触控屏”。触摸传感器880K用于检测作用于其上或附近的触摸操作。触摸传感器可以将检测到的触摸操作传递给应用处理器,以确定触摸事件类型。可以通过显示屏894提供与触摸操作相关的视觉输出。在另一些实施例中,触摸传感器880K也可以设置于电子设备800的表面,与显示屏894所处的位置不同。The touch sensor 880K is also called "touch device". The touch sensor 880K can be arranged on the display screen 894, and the touch screen is composed of the touch sensor 880K and the display screen 894, which is also called a “touch screen”. The touch sensor 880K is used to detect touch operations acting on or near it. The touch sensor can pass the detected touch operation to the application processor to determine the type of touch event. The visual output related to the touch operation may be provided through the display screen 894. In other embodiments, the touch sensor 880K may also be disposed on the surface of the electronic device 800, which is different from the position of the display screen 894.
骨传导传感器880M可以获取振动信号。在一些实施例中,骨传导传感器880M 可以获取人体声部振动骨块的振动信号。骨传导传感器880M也可以接触人体脉搏,接收血压跳动信号。在一些实施例中,骨传导传感器880M也可以设置于耳机中,结合成骨传导耳机。音频模块870可以基于所述骨传导传感器880M获取的声部振动骨块的振动信号,解析出语音信号,实现语音功能。应用处理器可以基于所述骨传导传感器880M获取的血压跳动信号解析心率信息,实现心率检测功能。The bone conduction sensor 880M can acquire vibration signals. In some embodiments, the bone conduction sensor 880M can obtain the vibration signal of the vibrating bone mass of the human voice. The bone conduction sensor 880M can also contact the human pulse and receive the blood pressure pulse signal. In some embodiments, the bone conduction sensor 880M may also be provided in the earphone, combined with the bone conduction earphone. The audio module 870 can parse the voice signal based on the vibration signal of the vibrating bone block of the voice obtained by the bone conduction sensor 880M, and realize the voice function. The application processor can analyze the heart rate information based on the blood pressure beating signal obtained by the bone conduction sensor 880M, and realize the heart rate detection function.
按键890包括开机键,音量键等。按键890可以是机械按键。也可以是触摸式按键。电子设备800可以接收按键输入,产生与电子设备800的用户设置以及功能控制有关的键信号输入。The button 890 includes a power button, a volume button, and so on. The button 890 may be a mechanical button. It can also be a touch button. The electronic device 800 may receive key input, and generate key signal input related to user settings and function control of the electronic device 800.
马达891可以产生振动提示。马达891可以用于来电振动提示,也可以用于触摸振动反馈。例如,作用于不同应用(例如拍照,音频播放等)的触摸操作,可以对应不同的振动反馈效果。作用于显示屏894不同区域的触摸操作,马达891也可对应不同的振动反馈效果。不同的应用场景(例如:时间提醒,接收信息,闹钟,游戏等)也可以对应不同的振动反馈效果。触摸振动反馈效果还可以支持自定义。The motor 891 can generate vibration prompts. The motor 891 can be used for incoming call vibration notification, and can also be used for touch vibration feedback. For example, touch operations applied to different applications (such as photo taking, audio playback, etc.) can correspond to different vibration feedback effects. Acting on touch operations in different areas of the display screen 894, the motor 891 can also correspond to different vibration feedback effects. Different application scenarios (for example: time reminding, receiving information, alarm clock, games, etc.) can also correspond to different vibration feedback effects. The touch vibration feedback effect can also support customization.
指示器892可以是指示灯,可以用于指示充电状态,电量变化,也可以用于指示消息,未接来电,通知等。The indicator 892 can be an indicator light, which can be used to indicate the charging status, power change, and can also be used to indicate messages, missed calls, notifications, and so on.
SIM卡接口895用于连接SIM卡。SIM卡可以通过插入SIM卡接口895,或从SIM卡接口895拔出,实现和电子设备800的接触和分离。电子设备800可以支持1个或N个SIM卡接口,N为大于1的正整数。SIM卡接口895可以支持Nano SIM卡,Micro SIM卡,SIM卡等。同一个SIM卡接口895可以同时插入多张卡。所述多张卡的类型可以相同,也可以不同。SIM卡接口895也可以兼容不同类型的SIM卡。SIM卡接口895也可以兼容外部存储卡。电子设备800通过SIM卡和网络交互,实现通话以及数据通信等功能。在一些实施例中,电子设备800采用eSIM,即:嵌入式SIM卡。eSIM卡可以嵌在电子设备800中,不能和电子设备800分离。The SIM card interface 895 is used to connect to the SIM card. The SIM card can be inserted into the SIM card interface 895 or pulled out from the SIM card interface 895 to achieve contact and separation with the electronic device 800. The electronic device 800 may support 1 or N SIM card interfaces, and N is a positive integer greater than 1. The SIM card interface 895 can support Nano SIM cards, Micro SIM cards, SIM cards, etc. The same SIM card interface 895 can insert multiple cards at the same time. The types of the multiple cards can be the same or different. The SIM card interface 895 can also be compatible with different types of SIM cards. The SIM card interface 895 can also be compatible with external memory cards. The electronic device 800 interacts with the network through the SIM card to implement functions such as call and data communication. In some embodiments, the electronic device 800 uses an eSIM, that is, an embedded SIM card. The eSIM card can be embedded in the electronic device 800 and cannot be separated from the electronic device 800.
应理解,图8所示的电子设备800能够实现本申请图2~图4所示实施例提供的方法的各个过程。电子设备800中的各个模块的操作和/或功能,分别为了实现上述方法实施例中的相应流程。具体可参见本申请图2~图4所示方法实施例中的描述,为避免重复,此处适当省略详细描述。It should be understood that the electronic device 800 shown in FIG. 8 can implement various processes of the methods provided in the embodiments shown in FIGS. 2 to 4 of this application. The operations and/or functions of each module in the electronic device 800 are respectively for implementing the corresponding processes in the foregoing method embodiments. For details, please refer to the descriptions in the method embodiments shown in Figs. 2 to 4 of this application. To avoid repetition, detailed descriptions are appropriately omitted here.
应理解,图8所示的电子设备800中的处理器810可以是片上系统SOC,该处理器810中可以包括中央处理器(Central Processing Unit,CPU),还可以进一步包括其他类型的处理器,例如:图像处理器(Graphics Processing Unit,GPU)等。It should be understood that the processor 810 in the electronic device 800 shown in FIG. 8 may be a system-on-chip SOC, and the processor 810 may include a central processing unit (CPU), and may further include other types of processors. For example: Graphics Processing Unit (GPU), etc.
总之，处理器810内部的各部分处理器或处理单元可以共同配合实现之前的方法流程，且各部分处理器或处理单元相应的软件程序可存储在内部存储器821中。In short, each part of the processor or processing unit inside the processor 810 can cooperate to implement the previous method flow, and the corresponding software program of each part of the processor or processing unit can be stored in the internal memory 821.
本申请还提供一种电子设备，所述设备包括存储介质和中央处理器，所述存储介质可以是非易失性存储介质，所述存储介质中存储有计算机可执行程序，所述中央处理器与所述非易失性存储介质连接，并执行所述计算机可执行程序以实现本申请图2～图4所示实施例提供的方法。The present application further provides an electronic device. The device includes a storage medium and a central processing unit. The storage medium may be a non-volatile storage medium, and a computer executable program is stored in the storage medium. The central processing unit is connected to the non-volatile storage medium and executes the computer executable program to implement the methods provided by the embodiments shown in FIGS. 2 to 4 of this application.
以上各实施例中，涉及的处理器可以例如包括CPU、DSP、微控制器或数字信号处理器，还可包括GPU、嵌入式神经网络处理器（Neural-network Processing Units；以下简称：NPU）和图像信号处理器（Image Signal Processor；以下简称：ISP），该处理器还可包括必要的硬件加速器或逻辑处理硬件电路，如ASIC，或一个或多个用于控制本申请技术方案程序执行的集成电路等。此外，处理器可以具有操作一个或多个软件程序的功能，软件程序可以存储在存储介质中。In the above embodiments, the processor involved may include, for example, a CPU, a DSP, a microcontroller, or a digital signal processor, and may further include a GPU, an embedded neural-network processing unit (Neural-network Processing Units; hereinafter referred to as NPU), and an image signal processor (Image Signal Processor; hereinafter referred to as ISP). The processor may further include a necessary hardware accelerator or logic processing hardware circuit, such as an ASIC, or one or more integrated circuits used to control the execution of the programs of the technical solutions of this application, etc. In addition, the processor may have the function of running one or more software programs, and the software programs may be stored in a storage medium.
本申请实施例还提供一种计算机可读存储介质，该计算机可读存储介质中存储有计算机程序，当其在计算机上运行时，使得计算机执行本申请图2～图4所示实施例提供的方法。The embodiments of the present application further provide a computer-readable storage medium. The computer-readable storage medium stores a computer program that, when run on a computer, causes the computer to execute the methods provided by the embodiments shown in FIGS. 2 to 4 of the present application.
本申请实施例还提供一种计算机程序产品,该计算机程序产品包括计算机程序,当其在计算机上运行时,使得计算机执行本申请图2~图4所示实施例提供的方法。The embodiments of the present application also provide a computer program product. The computer program product includes a computer program that, when running on a computer, causes the computer to execute the method provided by the embodiments shown in FIGS. 2 to 4 of the present application.
本申请实施例中,“至少一个”是指一个或者多个,“多个”是指两个或两个以上。“和/或”,描述关联对象的关联关系,表示可以存在三种关系,例如,A和/或B,可以表示单独存在A、同时存在A和B、单独存在B的情况。其中A,B可以是单数或者复数。字符“/”一般表示前后关联对象是一种“或”的关系。“以下至少一项”及其类似表达,是指的这些项中的任意组合,包括单项或复数项的任意组合。例如,a,b和c中的至少一项可以表示:a,b,c,a和b,a和c,b和c或a和b和c,其中a,b,c可以是单个,也可以是多个。In the embodiments of the present application, "at least one" refers to one or more, and "multiple" refers to two or more. "And/or" describes the association relationship of the associated objects, indicating that there can be three types of relationships, for example, A and/or B, which can mean that A exists alone, A and B exist at the same time, and B exists alone. Among them, A and B can be singular or plural. The character "/" generally indicates that the associated objects before and after are in an "or" relationship. "The following at least one item" and similar expressions refer to any combination of these items, including any combination of single items or plural items. For example, at least one of a, b, and c can represent: a, b, c, a and b, a and c, b and c, or a and b and c, where a, b, and c can be single, or There can be more than one.
本领域普通技术人员可以意识到,本文中公开的实施例中描述的各单元及算法步骤,能够以电子硬件、计算机软件和电子硬件的结合来实现。这些功能究竟以硬件还是软件方式来执行,取决于技术方案的特定应用和设计约束条件。专业技术人员可以对每个特定的应用来使用不同方法来实现所描述的功能,但是这种实现不应认为超出本申请的范围。A person of ordinary skill in the art may be aware that the units and algorithm steps described in the embodiments disclosed herein can be implemented by a combination of electronic hardware, computer software, and electronic hardware. Whether these functions are executed by hardware or software depends on the specific application and design constraint conditions of the technical solution. Professionals and technicians can use different methods for each specific application to implement the described functions, but such implementation should not be considered beyond the scope of this application.
所属领域的技术人员可以清楚地了解到,为描述的方便和简洁,上述描述的系统、装置和单元的具体工作过程,可以参考前述方法实施例中的对应过程,在此不再赘述。Those skilled in the art can clearly understand that, for the convenience and conciseness of description, the specific working process of the system, device and unit described above can refer to the corresponding process in the foregoing method embodiment, which will not be repeated here.
在本申请所提供的几个实施例中,任一功能如果以软件功能单元的形式实现并作为独立的产品销售或使用时,可以存储在一个计算机可读取存储介质中。基于这样的理解,本申请的技术方案本质上或者说对现有技术做出贡献的部分或者该技术方案的部分可以以软件产品的形式体现出来,该计算机软件产品存储在一个存储介质中,包括若干指令用以使得一台计算机设备(可以是个人计算机,服务器,或者网络设备等)执行本申请各个实施例所述方法的全部或部分步骤。而前述的存储介质包括:U盘、移动硬盘、只读存储器(Read-Only Memory;以下简称:ROM)、随机存取存储器(Random Access Memory;以下简称:RAM)、磁碟或者光盘等各种可以存储程序代码的介质。In the several embodiments provided in this application, if any function is implemented in the form of a software functional unit and sold or used as an independent product, it can be stored in a computer readable storage medium. Based on this understanding, the technical solution of the present application essentially or the part that contributes to the existing technology or the part of the technical solution can be embodied in the form of a software product, and the computer software product is stored in a storage medium, including Several instructions are used to make a computer device (which may be a personal computer, a server, or a network device, etc.) execute all or part of the steps of the methods described in the various embodiments of the present application. The aforementioned storage media include: U disk, mobile hard disk, read-only memory (Read-Only Memory; hereinafter referred to as ROM), random access memory (Random Access Memory; hereinafter referred to as RAM), magnetic disks or optical disks, etc. A medium that can store program codes.
以上所述,仅为本申请的具体实施方式,任何熟悉本技术领域的技术人员在本申请揭露的技术范围内,可轻易想到变化或替换,都应涵盖在本申请的保护范围之内。本申请的保护范围应以所述权利要求的保护范围为准。The above are only specific implementations of this application. Any person skilled in the art can easily conceive of changes or substitutions within the technical scope disclosed in this application, and they should all be covered by the protection scope of this application. The protection scope of this application shall be subject to the protection scope of the claims.

Claims (16)

  1. 一种拾音方法,其特征在于,包括:A sound pickup method, characterized in that it comprises:
    获得用户相对电子设备的方位;所述电子设备设置有N个麦克风;N为大于等于3的整数;Obtain the position of the user relative to the electronic device; the electronic device is provided with N microphones; N is an integer greater than or equal to 3;
    在所述电子设备的预设固定波束中，选择距离所述方位最近的固定波束作为主波束，按照距离所述方位从远到近的顺序选择至少一个固定波束作为副波束；Among the preset fixed beams of the electronic device, selecting the fixed beam closest to the azimuth as the main beam, and selecting at least one fixed beam as the secondary beam in the order from the farthest to the nearest to the azimuth;
    当所述N个麦克风接收到声音信号时，使用所述主波束的波束形成系数计算所述声音信号的主输出信号，并且，使用所述副波束的波束形成系数计算所述声音信号的副输出信号；When the N microphones receive a sound signal, the beamforming coefficient of the main beam is used to calculate the main output signal of the sound signal, and the beamforming coefficient of the secondary beam is used to calculate the secondary output signal of the sound signal;
    使用所述副输出信号对所述主输出信号进行滤波处理,得到目标声音信号。Using the auxiliary output signal to perform filtering processing on the main output signal to obtain a target sound signal.
  2. 根据权利要求1所述的方法,其特征在于,所述获得用户相对电子设备的方位,包括:The method according to claim 1, wherein the obtaining the position of the user relative to the electronic device comprises:
    获取所述电子设备的摄像头捕捉到的图像;Acquiring an image captured by a camera of the electronic device;
    如果从所述图像中识别出所述电子设备的用户的人脸信息,根据所述人脸信息在所述图像中的位置信息,获得所述用户相对电子设备的方位;If the face information of the user of the electronic device is recognized from the image, obtain the position of the user relative to the electronic device according to the position information of the face information in the image;
    如果从所述图像中未识别出所述用户的人脸信息,获取所述电子设备的摆放位置;根据所述摆放位置,获得所述用户相对所述电子设备的方位。If the face information of the user is not recognized from the image, obtain the placement position of the electronic device; obtain the position of the user relative to the electronic device according to the placement position.
  3. 根据权利要求1或2所述的方法，其特征在于，所述在所述电子设备的预设固定波束中，选择距离所述方位最近的固定波束作为主波束，按照距离所述方位从远到近的顺序选择至少一个固定波束作为副波束，包括：The method according to claim 1 or 2, characterized in that the selecting, among the preset fixed beams of the electronic device, the fixed beam closest to the azimuth as the main beam and selecting at least one fixed beam as the secondary beam in the order from the farthest to the nearest to the azimuth comprises:
    计算所述方位针对每个固定波束的比值K；K_k=夹角Δ_k/波束宽度W_k；其中，K_k是所述方位针对固定波束k的比值，夹角Δ_k是所述方位与固定波束k的方向之间的夹角，波束宽度W_k是固定波束k的波束宽度；k=1,2,…,M；M是固定波束的组数；
    Calculate the ratio K of the azimuth for each fixed beam: K_k = Δ_k / W_k, where K_k is the ratio of the azimuth for fixed beam k, the included angle Δ_k is the angle between the azimuth and the direction of fixed beam k, the beam width W_k is the beam width of fixed beam k, k = 1, 2, …, M, and M is the number of groups of fixed beams;
    选择最小的所述比值对应的固定波束作为主波束，按照所述比值从大到小的顺序从最大的所述比值开始选择至少一个所述比值对应的固定波束作为副波束。Selecting the fixed beam corresponding to the smallest of the ratios as the main beam, and selecting, in descending order of the ratios starting from the largest ratio, at least one fixed beam corresponding to a ratio as a secondary beam.
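As a worked example of the selection rule in this claim, the sketch below computes K_k for every fixed beam, picks the smallest ratio as the main beam, and takes secondary beams starting from the largest ratio. The function name, the handling of angle wrap-around, and the n_secondary parameter are illustrative assumptions.

import numpy as np

def select_beams(beam_directions_deg, beam_widths_deg, user_azimuth_deg, n_secondary=1):
    # K_k = (angle between the user azimuth and beam k's direction) / (beam width of beam k).
    directions = np.asarray(beam_directions_deg, dtype=float)
    widths = np.asarray(beam_widths_deg, dtype=float)
    # Angular distance wrapped onto [0, 180] degrees.
    delta = np.abs((user_azimuth_deg - directions + 180.0) % 360.0 - 180.0)
    K = delta / widths
    order = np.argsort(K)
    main = int(order[0])                                      # smallest ratio -> main beam
    secondary = [int(i) for i in order[::-1][:n_secondary]]   # largest ratios -> secondary beams
    return main, secondary

# Three fixed beams pointing at 0, 120 and 240 degrees, each 60 degrees wide;
# a user at 100 degrees yields main beam index 1 and secondary beam index 2.
print(select_beams([0.0, 120.0, 240.0], [60.0, 60.0, 60.0], 100.0))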
  4. 根据权利要求1或2所述的方法,其特征在于,所述获得用户相对电子设备的方位之前,还包括:The method according to claim 1 or 2, wherein before obtaining the user's position relative to the electronic device, the method further comprises:
    获得M组固定波束的波束形成系数、方向、以及波束宽度,M为大于等于2的整数。Obtain beamforming coefficients, directions, and beam widths of M groups of fixed beams, where M is an integer greater than or equal to 2.
  5. 根据权利要求4所述的方法，其特征在于，所述获得预设组数的固定波束的波束形成系数、方向、以及波束宽度，包括：The method according to claim 4, wherein the obtaining beamforming coefficients, directions, and beam widths of a preset number of groups of fixed beams comprises:
    为电子设备建立三维笛卡尔坐标系;Establish a three-dimensional Cartesian coordinate system for electronic equipment;
    获得所述N个麦克风在所述坐标系中的坐标;Obtaining the coordinates of the N microphones in the coordinate system;
    根据所述N个麦克风的坐标计算目标声源在理想条件下的导向矢量;Calculating the steering vector of the target sound source under ideal conditions according to the coordinates of the N microphones;
    获得电子设备壳体对所述麦克风的频域响应矩阵;Obtaining a frequency domain response matrix of the housing of the electronic device to the microphone;
    根据所述理想条件下的导向矢量以及所述频域响应矩阵计算所述目标声源的真实导向矢量;Calculating the true steering vector of the target sound source according to the steering vector under the ideal condition and the frequency domain response matrix;
    根据所述真实导向矢量计算所述预设组数的固定波束的波束形成系数、方向、以及波束宽度。Calculating the beamforming coefficients, directions, and beam widths of the preset number of groups of fixed beams according to the true steering vector.
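The steps of this claim can be prototyped as follows: build free-field steering vectors from the microphone coordinates, correct them with the housing's frequency response, and derive fixed-beam weights from the corrected (true) steering vectors. In this sketch the element-wise correction and the normalised matched-filter weights are assumptions made for illustration; the claim itself leaves the exact beam design open.

import numpy as np

SPEED_OF_SOUND = 343.0  # m/s

def ideal_steering_vectors(mic_xyz, azimuth_deg, elevation_deg, freqs_hz):
    # Free-field (ideal) steering vectors a[f, n] = exp(-j*2*pi*f*tau_n) for a
    # far-field source, where tau_n is the delay of microphone n relative to
    # the origin of the device coordinate system.
    az, el = np.radians(azimuth_deg), np.radians(elevation_deg)
    u = np.array([np.cos(el) * np.cos(az), np.cos(el) * np.sin(az), np.sin(el)])
    tau = -(np.asarray(mic_xyz, dtype=float) @ u) / SPEED_OF_SOUND
    freqs = np.asarray(freqs_hz, dtype=float)[:, None]
    return np.exp(-2j * np.pi * freqs * tau[None, :])          # shape (F, N)

def true_steering_vectors(a_ideal, housing_response):
    # Correct the ideal steering vectors with the housing's frequency response
    # matrix H (F, N); element-wise multiplication is an assumed model.
    return a_ideal * housing_response

def fixed_beam_weights(a_true):
    # Normalised matched (delay-and-sum style) weights w[f] = a[f] / (a[f]^H a[f]),
    # which keep the target direction distortionless (w[f]^H a[f] = 1).
    norm = np.sum(np.abs(a_true) ** 2, axis=1, keepdims=True)
    return a_true / norm

Repeating this for each of the M preset beam directions yields M groups of beamforming coefficients; the direction and beam width of each group can then be read off the resulting beam pattern.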
  6. 一种拾音装置,其特征在于,包括:A sound pickup device, characterized in that it comprises:
    方位获得单元,用于获得用户相对电子设备的方位;所述电子设备设置有N个麦克风;N为大于等于3的整数;The position obtaining unit is used to obtain the position of the user relative to the electronic device; the electronic device is provided with N microphones; N is an integer greater than or equal to 3;
    波束选择单元，用于在所述电子设备的预设固定波束中，选择距离所述方位获得单元获得的所述方位最近的固定波束作为主波束，按照距离所述方位从远到近的顺序选择至少一个固定波束作为副波束；A beam selection unit, configured to select, among preset fixed beams of the electronic device, the fixed beam closest to the azimuth obtained by the azimuth obtaining unit as a main beam, and to select at least one fixed beam as a secondary beam in order of distance from the azimuth from farthest to nearest;
    信号计算单元，用于当所述N个麦克风接收到声音信号时，使用所述波束选择单元选择的所述主波束的波束形成系数计算所述声音信号的主输出信号，并且，使用所述波束选择单元选择的所述副波束的波束形成系数计算所述声音信号的副输出信号；A signal calculation unit, configured to: when the N microphones receive a sound signal, calculate a main output signal of the sound signal by using beamforming coefficients of the main beam selected by the beam selection unit, and calculate a secondary output signal of the sound signal by using beamforming coefficients of the secondary beam selected by the beam selection unit;
    滤波单元，用于使用所述信号计算单元计算的所述副输出信号对所述主输出信号进行滤波处理，得到目标声音信号。A filtering unit, configured to perform filtering processing on the main output signal by using the secondary output signal calculated by the signal calculation unit, to obtain a target sound signal.
  7. 根据权利要求6所述的装置,其特征在于,所述方位获得单元包括:The device according to claim 6, wherein the position obtaining unit comprises:
    图像获取子单元,用于获取所述电子设备的摄像头捕捉到的图像;An image acquisition subunit for acquiring the image captured by the camera of the electronic device;
    方位获得子单元，用于如果从所述图像子单元获取到的所述图像中识别出所述电子设备的用户的人脸信息，根据所述人脸信息在所述图像中的位置信息，获得所述用户相对电子设备的方位；如果从所述图像子单元获取到的所述图像中未识别出所述用户的人脸信息，获取所述电子设备的摆放位置；根据所述摆放位置，获得所述用户相对所述电子设备的方位。An orientation obtaining subunit, configured to: if face information of the user of the electronic device is recognized from the image acquired by the image acquisition subunit, obtain the position of the user relative to the electronic device according to position information of the face information in the image; and if the face information of the user is not recognized from the image acquired by the image acquisition subunit, obtain a placement position of the electronic device and obtain the position of the user relative to the electronic device according to the placement position.
  8. 根据权利要求6或7所述的装置,其特征在于,所述波束选择单元包括:The device according to claim 6 or 7, wherein the beam selection unit comprises:
    比值计算子单元，用于计算所述方位针对每个固定波束的比值K；K_k = 夹角Δ_k / 波束宽度Φ_k；其中，K_k是所述方位针对固定波束k的比值，夹角Δ_k是所述方位与固定波束k的方向之间的夹角，波束宽度Φ_k是固定波束k的波束宽度；k=1,2,…,M；M是固定波束的组数；A ratio calculation subunit, configured to calculate a ratio K of the azimuth to each fixed beam, where K_k = Δ_k / Φ_k; K_k is the ratio of the azimuth to fixed beam k, the included angle Δ_k is the angle between the azimuth and the direction of fixed beam k, the beam width Φ_k is the beam width of fixed beam k; k = 1, 2, ..., M; and M is the number of groups of fixed beams;
    波束选择子单元，用于在所述比值计算子单元计算的比值中，选择最小的所述比值对应的固定波束作为主波束，按照所述比值从大到小的顺序从最大的所述比值开始选择至少一个所述比值对应的固定波束作为副波束。A beam selection subunit, configured to select, among the ratios calculated by the ratio calculation subunit, the fixed beam corresponding to the smallest ratio as the main beam, and to select, in descending order of the ratios starting from the largest ratio, at least one fixed beam corresponding to a ratio as a secondary beam.
  9. 根据权利要求6或7所述的装置,其特征在于,还包括:The device according to claim 6 or 7, further comprising:
    波束获得单元,用于获得M组固定波束的波束形成系数、方向、以及波束宽度,M为大于等于2的整数。The beam obtaining unit is used to obtain the beam forming coefficients, directions, and beam widths of M groups of fixed beams, where M is an integer greater than or equal to 2.
  10. 根据权利要求9所述的装置,其特征在于,所述波束获得单元包括:The apparatus according to claim 9, wherein the beam obtaining unit comprises:
    坐标系建立子单元,用于为电子设备建立三维笛卡尔坐标系;The coordinate system establishment subunit is used to establish a three-dimensional Cartesian coordinate system for electronic equipment;
    坐标获得子单元,用于获得所述N个麦克风在所述坐标系中的坐标;A coordinate obtaining subunit, configured to obtain the coordinates of the N microphones in the coordinate system;
    理想导向矢量计算子单元,用于根据所述N个麦克风的坐标计算目标声源在理想条件下的导向矢量;The ideal steering vector calculation subunit is used to calculate the steering vector of the target sound source under ideal conditions according to the coordinates of the N microphones;
    矩阵获得子单元,用于获得电子设备壳体对所述麦克风的频域响应矩阵;A matrix obtaining subunit for obtaining a frequency domain response matrix of the electronic device housing to the microphone;
    真实导向矢量计算子单元,用于根据所述理想条件下的导向矢量以及所述频域响应矩阵计算所述目标声源的真实导向矢量;A true steering vector calculation subunit, configured to calculate the true steering vector of the target sound source according to the steering vector under the ideal condition and the frequency domain response matrix;
    固定波束计算子单元，用于根据所述真实导向矢量计算所述预设组数的固定波束的波束形成系数、方向、以及波束宽度。A fixed beam calculation subunit, configured to calculate the beamforming coefficients, directions, and beam widths of the preset number of groups of fixed beams according to the true steering vector.
  11. 一种电子设备,其特征在于,包括:An electronic device, characterized in that it comprises:
    显示屏；一个或多个处理器；存储器；多个应用程序；以及一个或多个计算机程序，其中所述一个或多个计算机程序被存储在所述存储器中，所述一个或多个计算机程序包括指令，当所述指令被所述设备执行时，使得所述设备执行以下步骤：a display screen; one or more processors; a memory; a plurality of application programs; and one or more computer programs, wherein the one or more computer programs are stored in the memory and comprise instructions, and when the instructions are executed by the device, the device is caused to perform the following steps:
    获得用户相对电子设备的方位;所述电子设备设置有N个麦克风;N为大于等于3的整数;Obtain the position of the user relative to the electronic device; the electronic device is provided with N microphones; N is an integer greater than or equal to 3;
    在所述电子设备的预设固定波束中，选择距离所述方位最近的固定波束作为主波束，按照距离所述方位从远到近的顺序选择至少一个固定波束作为副波束；Among preset fixed beams of the electronic device, selecting the fixed beam closest to the azimuth as a main beam, and selecting at least one fixed beam as a secondary beam in order of distance from the azimuth from farthest to nearest;
    当所述N个麦克风接收到声音信号时，使用所述主波束的波束形成系数计算所述声音信号的主输出信号，并且，使用所述副波束的波束形成系数计算所述声音信号的副输出信号；When the N microphones receive a sound signal, calculating a main output signal of the sound signal by using beamforming coefficients of the main beam, and calculating a secondary output signal of the sound signal by using beamforming coefficients of the secondary beam;
    使用所述副输出信号对所述主输出信号进行滤波处理，得到目标声音信号。Performing filtering processing on the main output signal by using the secondary output signal, to obtain a target sound signal.
  12. 根据权利要求11所述的电子设备,其特征在于,所述指令被所述设备执行时,使得所述获得用户相对电子设备的方位的步骤包括:The electronic device according to claim 11, wherein when the instruction is executed by the device, the step of obtaining the user's position relative to the electronic device comprises:
    获取所述电子设备的摄像头捕捉到的图像;Acquiring an image captured by a camera of the electronic device;
    如果从所述图像中识别出所述电子设备的用户的人脸信息,根据所述人脸信息在所述图像中的位置信息,获得所述用户相对电子设备的方位;If the face information of the user of the electronic device is recognized from the image, obtain the position of the user relative to the electronic device according to the position information of the face information in the image;
    如果从所述图像中未识别出所述用户的人脸信息,获取所述电子设备的摆放位置;根据所述摆放位置,获得所述用户相对所述电子设备的方位。If the face information of the user is not recognized from the image, obtain the placement position of the electronic device; obtain the position of the user relative to the electronic device according to the placement position.
  13. 根据权利要求11或12所述的电子设备，其特征在于，所述指令被所述设备执行时，使得所述在所述电子设备的预设固定波束中，选择距离所述方位最近的固定波束作为主波束，按照距离所述方位从远到近的顺序选择至少一个固定波束作为副波束的步骤包括：The electronic device according to claim 11 or 12, wherein when the instruction is executed by the device, the step of selecting, among the preset fixed beams of the electronic device, the fixed beam closest to the azimuth as the main beam, and selecting at least one fixed beam as the secondary beam in order of distance from the azimuth from farthest to nearest comprises:
    计算所述方位针对每个固定波束的比值K；K_k = 夹角Δ_k / 波束宽度Φ_k；其中，K_k是所述方位针对固定波束k的比值，夹角Δ_k是所述方位与固定波束k的方向之间的夹角，波束宽度Φ_k是固定波束k的波束宽度；k=1,2,…,M；M是固定波束的组数；Calculating a ratio K of the azimuth to each fixed beam, where K_k = Δ_k / Φ_k; K_k is the ratio of the azimuth to fixed beam k, the included angle Δ_k is the angle between the azimuth and the direction of fixed beam k, the beam width Φ_k is the beam width of fixed beam k; k = 1, 2, ..., M; and M is the number of groups of fixed beams;
    选择最小的所述比值对应的固定波束作为主波束，按照所述比值从大到小的顺序从最大的所述比值开始选择至少一个所述比值对应的固定波束作为副波束。Selecting the fixed beam corresponding to the smallest of the ratios as the main beam, and selecting, in descending order of the ratios starting from the largest ratio, at least one fixed beam corresponding to a ratio as a secondary beam.
  14. 根据权利要求11或12所述的电子设备,其特征在于,所述指令被所述设备执行时,使得所述获得用户相对电子设备的方位的步骤之前还执行以下步骤:The electronic device according to claim 11 or 12, wherein when the instruction is executed by the device, the following steps are performed before the step of obtaining the user's position relative to the electronic device:
    获得M组固定波束的波束形成系数、方向、以及波束宽度,M为大于等于2的整数。Obtain beamforming coefficients, directions, and beam widths of M groups of fixed beams, where M is an integer greater than or equal to 2.
  15. 根据权利要求14所述的电子设备,其特征在于,所述指令被所述设备执行时,使得所述获得预设组数的固定波束的波束形成系数、方向、以及波束宽度的步骤包括:The electronic device according to claim 14, wherein when the instruction is executed by the device, the step of obtaining the beamforming coefficient, direction, and beam width of a preset number of fixed beams comprises:
    为电子设备建立三维笛卡尔坐标系;Establish a three-dimensional Cartesian coordinate system for electronic equipment;
    获得所述N个麦克风在所述坐标系中的坐标;Obtaining the coordinates of the N microphones in the coordinate system;
    根据所述N个麦克风的坐标计算目标声源在理想条件下的导向矢量;Calculating the steering vector of the target sound source under ideal conditions according to the coordinates of the N microphones;
    获得电子设备壳体对所述麦克风的频域响应矩阵;Obtaining a frequency domain response matrix of the housing of the electronic device to the microphone;
    根据所述理想条件下的导向矢量以及所述频域响应矩阵计算所述目标声源的真实导向矢量;Calculating the true steering vector of the target sound source according to the steering vector under the ideal condition and the frequency domain response matrix;
    根据所述真实导向矢量计算所述预设组数的固定波束的波束形成系数、方向、以及波束宽度。Calculating the beamforming coefficients, directions, and beam widths of the preset number of groups of fixed beams according to the true steering vector.
  16. 一种计算机可读存储介质，其特征在于，所述计算机可读存储介质中存储有计算机程序，当其在计算机上运行时，使得计算机执行权利要求1至5任一项所述的方法。A computer-readable storage medium, characterized in that the computer-readable storage medium stores a computer program which, when run on a computer, causes the computer to perform the method according to any one of claims 1 to 5.
PCT/CN2021/079789 2020-03-11 2021-03-09 Sound pickup method and apparatus and electronic device WO2021180085A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202010167292.3A CN113393856B (en) 2020-03-11 2020-03-11 Pickup method and device and electronic equipment
CN202010167292.3 2020-03-11

Publications (1)

Publication Number Publication Date
WO2021180085A1 true WO2021180085A1 (en) 2021-09-16

Family

ID=77615411

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/079789 WO2021180085A1 (en) 2020-03-11 2021-03-09 Sound pickup method and apparatus and electronic device

Country Status (2)

Country Link
CN (1) CN113393856B (en)
WO (1) WO2021180085A1 (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114023347A (en) * 2021-11-30 2022-02-08 云知声智能科技股份有限公司 Directional sound pickup method and device, electronic equipment and storage medium
CN114257684A (en) * 2021-12-17 2022-03-29 歌尔科技有限公司 Voice processing method, system and device and electronic equipment
CN114339525A (en) * 2021-12-31 2022-04-12 紫光展锐(重庆)科技有限公司 Signal processing method and device, chip and module equipment
CN117215516A (en) * 2023-09-12 2023-12-12 深圳市品声科技有限公司 Interaction method and device based on microphone

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117292700A (en) * 2022-06-20 2023-12-26 青岛海尔科技有限公司 Voice enhancement method and device for distributed wakeup and storage medium
CN115103267A (en) * 2022-06-30 2022-09-23 歌尔科技有限公司 Beam-forming function implementation method and system

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20150110284A1 * 2013-10-21 2015-04-23 Nokia Corporation Noise reduction in multi-microphone systems
CN106710603A (en) * 2016-12-23 2017-05-24 上海语知义信息技术有限公司 Speech recognition method and system based on linear microphone array
CN107742522A (en) * 2017-10-23 2018-02-27 科大讯飞股份有限公司 Target voice acquisition methods and device based on microphone array
CN109996137A (en) * 2017-12-30 2019-07-09 Gn 奥迪欧有限公司 Microphone apparatus and earphone
CN110447073A (en) * 2017-03-20 2019-11-12 伯斯有限公司 Audio Signal Processing for noise reduction

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7415117B2 (en) * 2004-03-02 2008-08-19 Microsoft Corporation System and method for beamforming using a microphone array
US10015588B1 (en) * 2016-12-20 2018-07-03 Verizon Patent And Licensing Inc. Beamforming optimization for receiving audio signals
CN109102822B (en) * 2018-07-25 2020-07-28 出门问问信息科技有限公司 Filtering method and device based on fixed beam forming
CN110428851B (en) * 2019-08-21 2022-02-18 浙江大华技术股份有限公司 Beam forming method and device based on microphone array and storage medium

Also Published As

Publication number Publication date
CN113393856A (en) 2021-09-14
CN113393856B (en) 2024-01-16

Similar Documents

Publication Publication Date Title
WO2021180085A1 (en) Sound pickup method and apparatus and electronic device
US11782554B2 (en) Anti-mistouch method of curved screen and electronic device
WO2021052214A1 (en) Hand gesture interaction method and apparatus, and terminal device
CN110798568B (en) Display control method of electronic equipment with folding screen and electronic equipment
CN111369988A (en) Voice awakening method and electronic equipment
WO2021208723A1 (en) Full-screen display method and apparatus, and electronic device
EP4258685A1 (en) Sound collection method, electronic device, and system
WO2020019355A1 (en) Touch control method for wearable device, and wearable device and system
WO2022022319A1 (en) Image processing method, electronic device, image processing system and chip system
WO2021175266A1 (en) Identity verification method and apparatus, and electronic devices
WO2020034104A1 (en) Voice recognition method, wearable device, and system
CN113347560A (en) Bluetooth connection method, electronic device and storage medium
CN114610193A (en) Content sharing method, electronic device, and storage medium
WO2020237617A1 (en) Screen control method, device and apparatus, and storage medium
CN110248037A (en) A kind of identity document scan method and device
WO2020051852A1 (en) Method for recording and displaying information in communication process, and terminals
WO2021046696A1 (en) Antenna switching method and apparatus
WO2023197997A1 (en) Wearable device, and sound pickup method and apparatus
WO2023216930A1 (en) Wearable-device based vibration feedback method, system, wearable device and electronic device
WO2020078267A1 (en) Method and device for voice data processing in online translation process
CN114089902A (en) Gesture interaction method and device and terminal equipment
WO2023174161A1 (en) Message transmission method and corresponding terminal
CN113436635B (en) Self-calibration method and device of distributed microphone array and electronic equipment
WO2022214004A1 (en) Target user determination method, electronic device and computer-readable storage medium
CN113467904B (en) Method and device for determining collaboration mode, electronic equipment and readable storage medium

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21766956

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 21766956

Country of ref document: EP

Kind code of ref document: A1