CN113129916A

CN113129916A - Audio acquisition method, system and related device

Info

Publication number: CN113129916A
Application number: CN201911404753.8A
Authority: CN
Inventors: 王昆; 王宇峰; 余珞
Original assignee: Huawei Technologies Co Ltd
Current assignee: Huawei Technologies Co Ltd
Priority date: 2019-12-30
Filing date: 2019-12-30
Publication date: 2021-07-16
Anticipated expiration: 2039-12-30
Also published as: CN113129916B

Abstract

The application discloses an audio acquisition method, an audio acquisition system and a related device. In the method, an audio acquisition device acquires a first voice signal and a first vibration signal of a bone mass vibrated by the vocal part of a human body; the audio acquisition device performs signal processing on the first vibration signal to obtain a second vibration signal; the audio acquisition device determines a first noise signal in the first voice signal by using the second vibration signal; and the audio acquisition device separates the first noise signal from the first voice signal to obtain a second voice signal and sends the second voice signal to an electronic device. In this way, the interference of noise with the collected user voice signal can be effectively reduced.

Description

Audio acquisition method, system and related device

Technical Field

The present application relates to the field of electronic technologies, and in particular, to an audio acquisition method, system, and related apparatus.

Background

In recent years, when performing voice interaction (for example, a voice call, a video call, or use of a voice assistant) with an electronic device (for example, a mobile phone), more and more users choose to use an audio acquisition device that can capture the user's voice signal, such as an earphone or glasses with a microphone. That is, the audio acquisition device captures the voice signal of the user and then transmits the voice signal to the electronic device. In this way, the user's hands are freed, and the user does not need to hold the electronic device.

However, in the prior art, when the audio capturing device captures the voice signal of the user, a noise signal in the environment where the user is located is also captured. The noise signals collected by the audio collection device interfere with the user's voice interaction with the electronic device.

Therefore, how an audio acquisition device in a noisy environment can effectively reduce the interference of noise with the acquired user voice signal is an urgent problem to be solved.

Disclosure of Invention

The application provides an audio acquisition method, an audio acquisition system and a related device, which can effectively reduce the interference of noise with the acquired user voice signal, thereby improving the user experience during voice interaction between the user and the electronic device.

In a first aspect, the present application provides an audio acquisition system, which includes an audio acquisition device and a first electronic device, where a communication connection is established between the audio acquisition device and the first electronic device; wherein,

the audio acquisition device is configured to acquire a first voice signal of a user and a first vibration signal of a bone mass vibrated by the vocal part of the user's body;

the audio acquisition device is configured to perform signal processing on the first vibration signal to obtain a second vibration signal;

the audio acquisition device is configured to determine a first noise signal in the first voice signal by using the second vibration signal;

the audio acquisition device is configured to separate the first noise signal from the first voice signal to obtain a second voice signal, and to send the second voice signal to the first electronic device;

and the first electronic device is configured to receive the second voice signal.

In the audio acquisition system provided in the first aspect, the audio acquisition device processes the acquired vibration signal and filters noise out of the voice signal according to the processed vibration signal. The vibration signal is free of interference from the noise signal. The frequency of the vibration signal of the bone mass vibrated by the vocal part of the human body is within 2 kHz, whereas the frequency of the voice signal is between 20 Hz and 20 kHz. Therefore, the vibration signal can be used to determine the magnitude of the noise signal in the part of the voice signal whose frequency is within 2 kHz. The magnitude of the noise signal contained in the voice signal over the full range of 20 Hz to 20 kHz is then determined by linear prediction, so that the noise signal in the voice signal can be filtered out. In this way, the influence of the noise signal can be effectively reduced, the user can interact with the electronic device more effectively, and the user experience is improved.
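As a rough illustration of the reasoning above, the following Python sketch works on per-band magnitudes. The band layout, the function names, and the use of plain subtraction to estimate and to remove noise are assumptions made for this example; the application does not specify them.

```python
def estimate_low_band_noise(voice_bands, vibration_bands):
    """Estimate noise in the bands covered by the vibration signal.

    voice_bands: magnitudes of the voice signal, lowest band first.
    vibration_bands: magnitudes of the processed vibration signal for the
    low bands only (assumed here to carry speech without ambient noise).
    Plain subtraction is an assumption, not a method stated in the source.
    """
    if len(vibration_bands) > len(voice_bands):
        raise ValueError("vibration signal covers more bands than the voice signal")
    # zip stops at the shorter list, so only the low bands are compared.
    return [max(v - b, 0.0) for v, b in zip(voice_bands, vibration_bands)]


def remove_noise(voice_bands, noise_bands):
    """Subtract a per-band noise estimate from the voice signal, clamping at zero."""
    if len(voice_bands) != len(noise_bands):
        raise ValueError("voice and noise must cover the same bands")
    return [max(v - n, 0.0) for v, n in zip(voice_bands, noise_bands)]
```

In this sketch the low-band estimate would still have to be extended to all bands before `remove_noise` is called, which corresponds to the linear prediction step mentioned in the text.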

With reference to the first aspect, in a possible implementation manner, the audio acquisition device is specifically configured to: filter out a low-frequency signal in the first vibration signal to obtain a third vibration signal; determine a conjugate coefficient H^* of a channel coefficient H of the first vibration signal acquisition channel; and convolve the third vibration signal with the conjugate coefficient H^* to obtain the second vibration signal. Thus, by convolving with the conjugate coefficient, channel interference in the audio acquisition device can be reduced.
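The two processing steps described here (removing the low-frequency component, then convolving with the conjugate of the channel coefficient) can be sketched in Python as follows. The first-order high-pass filter, the `alpha` parameter, the representation of the channel coefficient as a list of complex taps, and all function names are assumptions for illustration only; the application does not name a filter design or state how H is obtained.

```python
def high_pass(samples, alpha=0.95):
    """First-order high-pass filter (an assumed design, not from the source)."""
    out = []
    prev_in = 0.0
    prev_out = 0.0
    for x in samples:
        y = alpha * (prev_out + x - prev_in)
        out.append(y)
        prev_in, prev_out = x, y
    return out


def convolve(signal, kernel):
    """Full discrete convolution of two sequences."""
    out = [0j] * (len(signal) + len(kernel) - 1)
    for i, s in enumerate(signal):
        for j, k in enumerate(kernel):
            out[i + j] += s * k
    return out


def process_vibration(first_vibration, channel_h, alpha=0.95):
    """Return the second vibration signal from the first one.

    channel_h: hypothetical channel coefficient taps H (complex numbers).
    """
    third_vibration = high_pass(first_vibration, alpha)       # drop low frequencies
    h_conj = [complex(h).conjugate() for h in channel_h]      # H^*
    return convolve(third_vibration, h_conj)                  # convolve with H^*
```

With a constant (zero-frequency) input, the output of this sketch decays toward zero, which is the expected behaviour of the low-frequency filtering step.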

With reference to the first aspect, in a possible implementation manner, the audio acquisition device is specifically configured to: determine, according to the second vibration signal, a second noise signal in the first voice signal, where the frequency range of the second noise signal is the same as that of the second vibration signal, and the frequency range of the first voice signal is larger than that of the second vibration signal; and perform linear prediction on the second noise signal to obtain the first noise signal covering the complete frequency range of the first voice signal. In this way, the audio acquisition device can determine the noise signal carried in the captured voice signal.
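The application does not say which form of linear prediction is used. As one hypothetical reading, the Python sketch below fits a first-order linear predictor to per-band noise magnitudes in the low band and applies it repeatedly to extend the estimate to the remaining bands. The per-band representation, the predictor order, and the function names are all assumptions.

```python
def fit_predictor(values):
    """Least-squares coefficient a of the first-order predictor x[k+1] ~ a * x[k]."""
    num = sum(values[k] * values[k + 1] for k in range(len(values) - 1))
    den = sum(values[k] ** 2 for k in range(len(values) - 1))
    return num / den if den else 0.0


def extend_noise(low_band_noise, total_bands):
    """Extend a low-band noise estimate (second noise signal) to total_bands
    bands (first noise signal) by repeated first-order linear prediction."""
    if not low_band_noise:
        raise ValueError("at least one low-band noise value is required")
    a = fit_predictor(low_band_noise)
    full = list(low_band_noise)
    while len(full) < total_bands:
        full.append(a * full[-1])  # predict the next band from the previous one
    return full
```

A higher-order predictor would follow the same pattern but needs a small linear solve to fit its coefficients, which is omitted here to keep the sketch short.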

With reference to the first aspect, in a possible implementation manner, the audio acquisition device is specifically configured to: receive a first user operation of the user before acquiring the first voice signal and the first vibration signal; and, in response to the first user operation, acquire the first vibration signal and the first voice signal.

With reference to the first aspect, in a possible implementation manner, the audio acquisition device is specifically configured to: receive a first instruction sent by the first electronic device, where the first instruction is used to instruct the audio acquisition device to start acquiring the first voice signal and the first vibration signal of the bone mass vibrated by the vocal part of the human body.

With reference to the first aspect, in a possible implementation manner, the first electronic device is further configured to: receive a second user operation of the user; and, in response to the second user operation, send the first instruction to the audio acquisition device.

With reference to the first aspect, in a possible implementation manner, the first electronic device is further configured to: in response to the second user operation, start a voice call with a second electronic device or start a voice assistant in the first electronic device.

With reference to the first aspect, in one possible implementation manner, the audio acquisition device may be glasses. In this way, when the user needs to wear glasses (for example, the user is near-sighted or needs sunglasses to block sunlight), the user's voice can be captured without the user also wearing an earphone, which is convenient for the user and improves the user experience.

In a second aspect, the present application provides an audio acquisition method, including: an audio acquisition device acquires a first voice signal and a first vibration signal of a bone mass vibrated by the vocal part of a human body; the audio acquisition device performs signal processing on the first vibration signal to obtain a second vibration signal; the audio acquisition device determines a first noise signal in the first voice signal by using the second vibration signal; and the audio acquisition device separates the first noise signal from the first voice signal to obtain a second voice signal and sends the second voice signal to an electronic device, where a communication connection is established between the audio acquisition device and the electronic device.

With reference to the second aspect, in one possible implementation manner, that the audio acquisition device acquires the first voice signal and the first vibration signal of the bone mass vibrated by the vocal part of the human body includes: the audio acquisition device receives a first user operation of the user before acquiring the first voice signal and the first vibration signal; and, in response to the first user operation, the audio acquisition device acquires the first vibration signal and the first voice signal.

With reference to the second aspect, in a possible implementation manner, before the audio acquisition device acquires the first voice signal and the first vibration signal of the bone mass vibrated by the vocal part of the human body, the method includes: the audio acquisition device receives a first instruction sent by the electronic device, where the first instruction is used to instruct the audio acquisition device to start acquiring the first voice signal and the first vibration signal of the bone mass vibrated by the vocal part of the human body.

With reference to the second aspect, in a possible implementation manner, that the audio acquisition device performs signal processing on the first vibration signal to obtain the second vibration signal includes: the audio acquisition device filters out a low-frequency signal in the first vibration signal to obtain a third vibration signal; the audio acquisition device determines a conjugate coefficient H^* of a channel coefficient H of the first vibration signal acquisition channel; and the audio acquisition device convolves the third vibration signal with the conjugate coefficient H^* to obtain the second vibration signal.

With reference to the second aspect, in a possible implementation manner, that the audio acquisition device determines the first noise signal in the first voice signal by using the second vibration signal includes: the audio acquisition device determines, according to the second vibration signal, a second noise signal in the first voice signal that has the same frequency range as the second vibration signal, where the frequency range of the first voice signal is larger than that of the second vibration signal; and the audio acquisition device performs linear prediction on the second noise signal to obtain the first noise signal covering the complete frequency range of the first voice signal.

In a third aspect, the present application provides an audio capture device comprising one or more processors and one or more memories. The one or more memories are coupled to the one or more processors, the one or more memories are for storing computer program code comprising computer instructions that, when executed by the one or more processors, cause the audio acquisition apparatus to perform the audio acquisition method of any of the possible implementations of any of the aspects.

In a fourth aspect, an embodiment of the present application provides a computer-readable storage medium, which includes computer instructions, and when the computer instructions are executed on an electronic device, the electronic device is caused to execute an audio acquisition method in any possible implementation manner of any one of the foregoing aspects.

In a fifth aspect, the present application provides a computer program product, which when run on a computer, causes the computer to execute the audio acquisition method in any one of the possible implementations of any one of the above aspects.

Drawings

Fig. 1 is a schematic diagram of an audio acquisition system 10 according to an embodiment of the present application;

Fig. 2 is a schematic structural diagram of an audio acquisition apparatus 100 according to an embodiment of the present application;

Fig. 3 is a schematic structural diagram of glasses 101 according to an embodiment of the present disclosure;

Fig. 4 is a schematic structural diagram of an electronic device 200 according to an embodiment of the present disclosure;

Fig. 5 is a schematic flowchart of an audio acquisition method according to an embodiment of the present application;

Fig. 6 is a schematic flow chart of signal processing provided in an embodiment of the present application;

Fig. 7 is a schematic flow chart of signal processing provided in an embodiment of the present application;

Fig. 8 is a waveform diagram of a vibration signal provided by an embodiment of the present application;

Fig. 9 is a schematic diagram of a waveform of a speech signal provided by an embodiment of the present application;

Fig. 10 is an interaction diagram of an audio acquisition method according to an embodiment of the present application.

Detailed Description

The technical solutions in the embodiments of the present application are described clearly and in detail below with reference to the accompanying drawings. In the description of the embodiments herein, "/" means "or" unless otherwise specified; for example, A/B may mean A or B. "And/or" in the text describes only an association relationship between associated objects and means that three relationships may exist; for example, A and/or B may mean three cases: A alone, both A and B, and B alone. In addition, "a plurality" means two or more in the description of the embodiments of the present application.

In the following, the terms "first" and "second" are used for descriptive purposes only and are not to be understood as indicating or implying relative importance or implicitly indicating the number of technical features indicated. Thus, a feature defined as "first" or "second" may explicitly or implicitly include one or more of that feature.

First, an audio capture system according to an embodiment of the present application will be described. Referring to fig. 1, fig. 1 is a schematic diagram of an audio acquisition system according to an embodiment of the present disclosure. The audio capture system 10 may include an audio capture device 100 and an electronic device 200. The embodiment of the application takes the following scenes as an example for introduction: the audio collecting device 100 is glasses capable of collecting voice signals, and the electronic device 200 is a mobile phone.

The audio collecting apparatus 100 may establish a communication connection with the electronic apparatus 200 through bluetooth or a wireless lan. The audio capture device 100 may capture a user's voice. The audio capture device 100 can also receive audio transmitted by the electronic device 200 and play the audio. The electronic device 200 may receive the user's voice captured by the audio capturing device 100. The electronic device 200 may also transmit an audio signal to the audio capture device 100.

The audio collecting apparatus 100 can collect a user voice signal and a vibration signal of a bone mass vibrated by a voice part when the user speaks. The audio acquisition device 100 processes the acquired vibration signal, and performs noise signal filtering processing on the speech signal according to the processed vibration signal to obtain a speech signal after the noise signal filtering processing. The audio collecting apparatus 100 transmits the voice signal processed by filtering the noise signal to the electronic apparatus 200.

The audio collecting device 100 may be implemented as any device capable of collecting vibration signals and voice signals of a bone mass vibrated by a human body sound part, for example, glasses, earphones, and the like having a bone conduction sensor and a microphone. The electronic device 200 may be implemented as any one of the following electronic devices: mobile phones, personal computers, portable game machines, portable media playback devices, in-vehicle media playback devices, and the like.

The following describes an audio capture device 100 according to an embodiment of the present application. Referring to fig. 2, fig. 2 is a schematic structural diagram of an audio capture device 100 according to an embodiment of the present disclosure.

As shown in fig. 2, the audio collecting apparatus 100 may include: a processor 301, a memory 302, a sensor 303, a wireless communication module 304, at least one electro-acoustic transducer 305, a microphone 306, and a power supply 307.

It should be understood that the audio capture device 100 shown in fig. 2 is merely an example, and the audio capture device 100 may have more or fewer components than shown in fig. 2, may combine two or more components, or may have a different configuration of components. The various components shown in the figures may be implemented in hardware, software, or a combination of hardware and software, including one or more signal processing and/or application specific integrated circuits.

The memory 302 may be used, among other things, to store application program code that is executed by the processor 301 to cause the audio capture device 100 to perform methods in embodiments of the present invention. The audio capture device 100 may establish a communication connection, specifically a bluetooth or Wi-Fi connection, with the electronic device 200 through the wireless communication module 304. The audio collecting apparatus 100 may transmit a voice signal to the electronic apparatus 200 or receive an audio signal transmitted by the electronic apparatus 200 through the wireless communication module 304. The application code stored in the memory 302 may also be used to implement functions that call the electroacoustic transducer 305 to convert an audio signal to audio and play it.

The sensor 303 may include a bone conduction sensor, which may acquire a vibration signal of the bone mass vibrated by the vocal part.

In some embodiments, the sensor 303 may also include an acceleration sensor. The acceleration sensor may detect a tapping operation. In particular, different numbers of taps cause the acceleration sensor to output different voltage signals, which can be passed to the processor 301 to perform corresponding control functions. For example, when the acceleration sensor detects N (N is an integer greater than 0) consecutive tapping operations and outputs a corresponding voltage signal, the processor 301 may start the wireless communication module 304 to establish a communication connection, specifically a Wi-Fi connection, with the electronic device 200. For another example, when the acceleration sensor detects N+1 consecutive tapping operations and outputs a corresponding voltage signal, the processor 301 may start the wireless communication module 304 to establish a communication connection, specifically a Bluetooth connection or a Wi-Fi connection, with the electronic device 200.
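The mapping from tap count to control function in the two examples above can be sketched in Python as a small dispatch function. The function name and the returned labels are hypothetical; the text only states which connection is established for N and for N+1 consecutive taps.

```python
def action_for_taps(tap_count, n):
    """Map a number of consecutive taps to a connection action.

    n is the threshold N from the text (an integer greater than 0).
    The returned strings are hypothetical labels, not names from the source.
    """
    if n <= 0:
        raise ValueError("N must be an integer greater than 0")
    if tap_count == n:
        return "connect_wifi"
    if tap_count == n + 1:
        return "connect_bluetooth_or_wifi"
    return "ignore"  # behaviour for other counts is not described in the text
```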

In other embodiments, the sensor 303 may also include a fingerprint sensor for detecting a user's fingerprint, identifying a user's identity, and the like.

In other embodiments, the sensor 303 may further include a touch sensor. When the touch sensor detects a touch operation, the processor 301 may activate the wireless communication module 304 to receive an audio or voice signal.

In other embodiments, the sensor 303 may further include a pressure sensor for detecting a pressing operation by the user. When the pressure sensor detects a pressing operation, the processor 301 may activate the wireless communication module 304 to receive an audio or voice signal.

In other embodiments, the sensor 303 may also include a distance sensor and a proximity light sensor. The distance sensor and the proximity light sensor may detect whether there is an object near the audio capture device 100, thereby determining whether the audio capture device 100 is worn by the user. In other embodiments, the sensor 303 may further include an ambient light sensor, and the processor 301 may adaptively adjust some parameters, such as volume, according to the brightness of the ambient light sensed by the ambient light sensor.

The wireless communication module 304 is used to support short-range data interaction between the audio capture device 100 and various electronic devices. In some embodiments, the wireless communication module 304 can include a Bluetooth transceiver for receiving Bluetooth audio broadcast signals broadcast by the electronic device 200. The wireless communication module 304 may also include a Wi-Fi module that may receive audio or voice signals transmitted by the electronic device 200.

The electroacoustic transducer 305, which may include a receiver (i.e., an "earpiece") or a speaker, may be used to convert an audio electrical signal into an acoustic signal for playback. The audio electrical signal may be obtained by decoding audio transmitted by the electronic device 200. For example, the electronic device 200 establishes a Wi-Fi connection with the audio capture device 100, and the audio capture device 100 receives an audio file transmitted by the electronic device 200, converts the audio file into a sound signal, and plays the sound signal.

The microphone 306, which may also be referred to as a "mic", is used to convert sound signals into electrical audio signals. For example, when the user speaks, the microphone 306 may capture the user's voice signal and convert it to an electrical audio signal.

A power supply 307 may be used to supply power to various components included in the audio capture device 100. In some embodiments, the power source 307 may be a battery, such as a rechargeable battery.

It is to be understood that the structure illustrated in the embodiment of the present application does not constitute a specific limitation on the audio capture device 100. In addition, the wireless communication module 304 may further include a Bluetooth transceiver. The audio capture device 100 can establish a Bluetooth connection with another Bluetooth audio source through the Bluetooth transceiver to realize short-range data interaction between the two; for example, the audio capture device 100 receives an audio signal through the Bluetooth transceiver and then plays the audio signal. The audio capture device 100 may also contain one earbud or two earbuds. Each earbud contains the functional modules described above (processor 301, memory 302, sensor 303, wireless communication module 304, electroacoustic transducer 305, microphone 306, and power supply 307) and an earbud housing that encloses these functional modules. When the audio capture device 100 includes two earbuds, the two earbuds can be used as a pair of binaural earphones, and a Bluetooth connection can be established between them through the Bluetooth transceiver to achieve data interaction between the two earbuds. The audio capture device 100 may have more or fewer components than shown in fig. 2, may combine two or more components, or may have a different configuration of components. For example, the audio capture device 100 may further include an indicator light (which may indicate the status of the earbud, such as its battery level), a dust screen (which may be used with the earpiece), and the like. The various components shown in fig. 2 may be implemented in hardware, software, or a combination of hardware and software, including one or more signal processing or application specific integrated circuits.

In some illustrative examples, the audio capture device 100 according to embodiments of the present application may be implemented as the glasses 101 shown in fig. 3. Referring to fig. 3, fig. 3 is a schematic view of glasses according to an embodiment of the present application. As shown in fig. 3, each nose pad 10a of the glasses 101 may include at least one bone conduction sensor for collecting vibration signals of the nasal bone when a user wearing the glasses speaks. The temple 10b contains at least one microphone for picking up the voice signal of the user wearing the glasses. The frame 10c contains a processor, a memory, a power source, a plurality of transmission elements, and the like. The glasses 101 may be glasses for near-sightedness or sunglasses.

It should be understood that the glasses 101 shown in fig. 3 are only an example, and the glasses 101 may have an outer shape different from that shown in fig. 3. That is, the outer shapes of the nose pads, the frame, the temples, and the lenses may be different from the shapes of the nose pad 10a, the temple 10b, the frame 10c, and the lenses shown in fig. 3.

Because the glasses collect both the vibration signal of the user's nasal bone and the user's voice signal, the glasses can capture the user's voice while still serving as glasses for near-sightedness or as sunglasses. When wearing the glasses, the user does not need to wear another wearable device, such as an earphone, for audio acquisition. Therefore, a single device can meet several of the user's needs, and the user experience is improved.

The electronic device 200 according to the embodiment of the present application is described below. Referring to fig. 4, fig. 4 is a schematic structural diagram of an electronic device 200 according to an embodiment of the present application.

It should be understood that the electronic device 200 shown in fig. 4 is merely an example, and that the electronic device 200 may have more or fewer components than shown in fig. 4, may combine two or more components, or may have a different configuration of components. The various components shown in the figures may be implemented in hardware, software, or a combination of hardware and software, including one or more signal processing and/or application specific integrated circuits.

The electronic device 200 may include a processor 110, an external memory interface 120, an internal memory 121, a Universal Serial Bus (USB) interface 130, a charging management module 140, a power management module 141, a battery 142, an antenna 1, an antenna 2, a mobile communication module 150, a wireless communication module 160, an audio module 170, a speaker 170A, a receiver 170B, a microphone 170C, an earphone interface 170D, a sensor module 180, a button 190, a motor 191, an indicator 192, a camera 193, a display screen 194, a Subscriber Identity Module (SIM) card interface 195, and the like. The sensor module 180 may include a pressure sensor 180A, a gyroscope sensor 180B, an air pressure sensor 180C, a magnetic sensor 180D, an acceleration sensor 180E, a distance sensor 180F, a proximity light sensor 180G, a fingerprint sensor 180H, a temperature sensor 180J, a touch sensor 180K, an ambient light sensor 180L, a bone conduction sensor 180M, and the like.

It is to be understood that the illustrated structure of the embodiment of the present invention does not specifically limit the electronic device 200. In other embodiments of the present application, the electronic device 200 may include more or fewer components than shown, or combine certain components, or split certain components, or a different arrangement of components. The illustrated components may be implemented in hardware, software, or a combination of software and hardware.

Processor 110 may include one or more processing units, such as: the processor 110 may include an Application Processor (AP), a modem processor, a Graphics Processing Unit (GPU), an Image Signal Processor (ISP), a controller, a memory, a video codec, a Digital Signal Processor (DSP), a baseband processor, and/or a neural-Network Processing Unit (NPU), etc. The different processing units may be separate devices or may be integrated into one or more processors.

The controller may be the nerve center and command center of the electronic device 200. The controller can generate an operation control signal according to the instruction operation code and the timing signal to control instruction fetching and instruction execution.

A memory may also be provided in the processor 110 for storing instructions and data. In some embodiments, the memory in the processor 110 is a cache memory. The memory may hold instructions or data that the processor 110 has just used or uses repeatedly. If the processor 110 needs to use the instructions or data again, it can fetch them directly from this memory. This avoids repeated accesses and reduces the waiting time of the processor 110, thereby increasing the efficiency of the system.

In some embodiments, the processor 110 may include one or more interfaces. The interfaces may include an inter-integrated circuit (I2C) interface, an inter-integrated circuit sound (I2S) interface, a Pulse Code Modulation (PCM) interface, a universal asynchronous receiver/transmitter (UART) interface, a Mobile Industry Processor Interface (MIPI), a general-purpose input/output (GPIO) interface, a Subscriber Identity Module (SIM) interface, and/or a Universal Serial Bus (USB) interface, etc.

The I2C interface is a bi-directional synchronous serial bus that includes a serial data line (SDA) and a Serial Clock Line (SCL). In some embodiments, processor 110 may include multiple sets of I2C buses. The processor 110 may be coupled to the touch sensor 180K, the charger, the flash, the camera 193, etc. through different I2C bus interfaces, respectively. For example: the processor 110 may be coupled to the touch sensor 180K via an I2C interface, such that the processor 110 and the touch sensor 180K communicate via an I2C bus interface to implement the touch functionality of the electronic device 200.

The I2S interface may be used for audio communication. In some embodiments, processor 110 may include multiple sets of I2S buses. The processor 110 may be coupled to the audio module 170 via an I2S bus to enable communication between the processor 110 and the audio module 170. In some embodiments, the audio module 170 may communicate audio signals to the wireless communication module 160 via the I2S interface, enabling answering of calls via a bluetooth headset.

The PCM interface may also be used for audio communication, sampling, quantizing and encoding analog signals. In some embodiments, the audio module 170 and the wireless communication module 160 may be coupled by a PCM bus interface. In some embodiments, the audio module 170 may also transmit audio signals to the wireless communication module 160 through the PCM interface, so as to implement a function of answering a call through a bluetooth headset. Both the I2S interface and the PCM interface may be used for audio communication.
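The three PCM steps named above (sampling, quantizing, and encoding) can be illustrated with a short Python sketch. It is a generic software illustration and not a description of the PCM bus interface of the electronic device 200; the function name, the 16-bit depth, and the little-endian byte order are choices made for this example.

```python
import struct


def pcm_encode(analog, sample_rate, num_samples):
    """Sample, quantize, and encode a signal as 16-bit little-endian PCM.

    analog: function mapping time in seconds to an amplitude in [-1.0, 1.0].
    The 16-bit depth and byte order are choices made for this example.
    """
    max_code = 32767  # largest positive 16-bit signed value
    codes = []
    for n in range(num_samples):
        value = analog(n / sample_rate)        # sampling at t = n / sample_rate
        value = max(-1.0, min(1.0, value))     # clamp out-of-range input
        codes.append(round(value * max_code))  # quantizing to an integer code
    return struct.pack("<%dh" % num_samples, *codes)  # encoding as bytes
```

Each sample occupies two bytes in the result, so eight samples produce 16 bytes.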

The UART interface is a universal serial data bus used for asynchronous communications. The bus may be a bidirectional communication bus. It converts the data to be transmitted between serial communication and parallel communication. In some embodiments, a UART interface is generally used to connect the processor 110 with the wireless communication module 160. For example: the processor 110 communicates with a bluetooth module in the wireless communication module 160 through a UART interface to implement a bluetooth function. In some embodiments, the audio module 170 may transmit the audio signal to the wireless communication module 160 through a UART interface, so as to realize the function of playing music through a bluetooth headset.

MIPI interfaces may be used to connect processor 110 with peripheral devices such as display screen 194, camera 193, and the like. The MIPI interface includes a Camera Serial Interface (CSI), a Display Serial Interface (DSI), and the like. In some embodiments, processor 110 and camera 193 communicate through a CSI interface to implement the capture functionality of electronic device 200. The processor 110 and the display screen 194 communicate through the DSI interface to implement the display function of the electronic device 200.

The GPIO interface may be configured by software. The GPIO interface may be configured as a control signal and may also be configured as a data signal. In some embodiments, a GPIO interface may be used to connect the processor 110 with the camera 193, the display 194, the wireless communication module 160, the audio module 170, the sensor module 180, and the like. The GPIO interface may also be configured as an I2C interface, an I2S interface, a UART interface, a MIPI interface, and the like.

The USB interface 130 is an interface conforming to the USB standard specification, and may specifically be a Mini USB interface, a Micro USB interface, a USB Type C interface, or the like. The USB interface 130 may be used to connect a charger to charge the electronic device 200, and may also be used to transmit data between the electronic device 200 and a peripheral device. It may also be used to connect an earphone and play audio through the earphone. The interface may also be used to connect other electronic devices, such as AR devices.

It should be understood that the interface connection relationship between the modules illustrated in the embodiment of the present application is only illustrative and does not constitute a limitation on the structure of the electronic device 200. In other embodiments of the present application, the electronic device 200 may also adopt interface connection manners different from those in the above embodiments, or a combination of multiple interface connection manners.

The charging management module 140 is configured to receive charging input from a charger. The charger may be a wireless charger or a wired charger. In some wired charging embodiments, the charging management module 140 may receive charging input from a wired charger via the USB interface 130. In some wireless charging embodiments, the charging management module 140 may receive a wireless charging input through a wireless charging coil of the electronic device 200. The charging management module 140 may also supply power to the electronic device 200 through the power management module 141 while charging the battery 142.

The power management module 141 is used to connect the battery 142, the charging management module 140 and the processor 110. The power management module 141 receives input from the battery 142 and/or the charge management module 140 and provides power to the processor 110, the internal memory 121, the external memory, the display 194, the camera 193, the wireless communication module 160, and the like. The power management module 141 may also be used to monitor parameters such as battery capacity, battery cycle count, battery state of health (leakage, impedance), etc. In some other embodiments, the power management module 141 may also be disposed in the processor 110. In other embodiments, the power management module 141 and the charging management module 140 may be disposed in the same device.

The wireless communication function of the electronic device 200 may be implemented by the antenna 1, the antenna 2, the mobile communication module 150, the wireless communication module 160, a modem processor, a baseband processor, and the like.

The antennas 1 and 2 are used for transmitting and receiving electromagnetic wave signals. Each antenna in the electronic device 200 may be used to cover a single or multiple communication bands. Different antennas can also be multiplexed to improve the utilization of the antennas. For example: the antenna 1 may be multiplexed as a diversity antenna of a wireless local area network. In other embodiments, the antenna may be used in conjunction with a tuning switch.

The mobile communication module 150 may provide a solution including 2G/3G/4G/5G wireless communication applied on the electronic device 200. The mobile communication module 150 may include at least one filter, a switch, a power amplifier, a Low Noise Amplifier (LNA), and the like. The mobile communication module 150 may receive the electromagnetic wave from the antenna 1, filter, amplify, etc. the received electromagnetic wave, and transmit the electromagnetic wave to the modem processor for demodulation. The mobile communication module 150 may also amplify the signal modulated by the modem processor, and convert the signal into electromagnetic wave through the antenna 1 to radiate the electromagnetic wave. In some embodiments, at least some of the functional modules of the mobile communication module 150 may be disposed in the processor 110. In some embodiments, at least some of the functional modules of the mobile communication module 150 may be disposed in the same device as at least some of the modules of the processor 110.

The modem processor may include a modulator and a demodulator. The modulator is used for modulating a low-frequency baseband signal to be transmitted into a medium-high frequency signal. The demodulator is used for demodulating the received electromagnetic wave signal into a low-frequency baseband signal. The demodulator then passes the demodulated low frequency baseband signal to a baseband processor for processing. The low frequency baseband signal is processed by the baseband processor and then transferred to the application processor. The application processor outputs a sound signal through an audio device (not limited to the speaker 170A, the receiver 170B, etc.) or displays an image or video through the display screen 194. In some embodiments, the modem processor may be a stand-alone device. In other embodiments, the modem processor may be provided in the same device as the mobile communication module 150 or other functional modules, independent of the processor 110.

The wireless communication module 160 may provide a solution for wireless communication applied to the electronic device 200, including Wireless Local Area Networks (WLANs) (e.g., wireless fidelity (Wi-Fi) networks), bluetooth (bluetooth, BT), Global Navigation Satellite System (GNSS), Frequency Modulation (FM), Near Field Communication (NFC), Infrared (IR), and the like. The wireless communication module 160 may be one or more devices integrating at least one communication processing module. The wireless communication module 160 receives electromagnetic waves via the antenna 2, performs frequency modulation and filtering processing on electromagnetic wave signals, and transmits the processed signals to the processor 110. The wireless communication module 160 may also receive a signal to be transmitted from the processor 110, perform frequency modulation and amplification on the signal, and convert the signal into electromagnetic waves through the antenna 2 to radiate the electromagnetic waves.

In some embodiments, antenna 1 of electronic device 200 is coupled to mobile communication module 150 and antenna 2 is coupled to wireless communication module 160 so that electronic device 200 can communicate with networks and other devices through wireless communication techniques. The wireless communication technology may include global system for mobile communications (GSM), general packet radio service (GPRS), code division multiple access (CDMA), wideband code division multiple access (WCDMA), time-division code division multiple access (TD-SCDMA), long term evolution (LTE), BT, GNSS, WLAN, NFC, FM, and/or IR technologies, etc. The GNSS may include a global positioning system (GPS), a global navigation satellite system (GLONASS), a beidou navigation satellite system (BDS), a quasi-zenith satellite system (QZSS), and/or a satellite based augmentation system (SBAS).

The electronic device 200 implements display functions through the GPU, the display screen 194, and the application processor. The GPU is a microprocessor for image processing, and is connected to the display screen 194 and an application processor. The GPU is used to perform mathematical and geometric calculations for graphics rendering. The processor 110 may include one or more GPUs that execute program instructions to generate or alter display information.

The display screen 194 is used to display images, video, and the like. The display screen 194 includes a display panel. The display panel may adopt a liquid crystal display (LCD), an organic light-emitting diode (OLED), an active-matrix organic light-emitting diode (AMOLED), a flexible light-emitting diode (FLED), a Mini-LED, a Micro-LED, a Micro-OLED, a quantum dot light-emitting diode (QLED), and the like. In some embodiments, the electronic device 200 may include 1 or N display screens 194, with N being a positive integer greater than 1.

The electronic device 200 may implement a photographing function through the ISP, the camera 193, the video codec, the GPU, the display screen 194, the application processor, and the like.

The ISP is used to process the data fed back by the camera 193. For example, when a photo is taken, the shutter is opened, light is transmitted to the camera photosensitive element through the lens, the optical signal is converted into an electrical signal, and the camera photosensitive element transmits the electrical signal to the ISP for processing and converting into an image visible to naked eyes. The ISP can also carry out algorithm optimization on the noise, brightness and skin color of the image. The ISP can also optimize parameters such as exposure, color temperature and the like of a shooting scene. In some embodiments, the ISP may be provided in camera 193.

The camera 193 is used to capture still images or video. The object generates an optical image through the lens and projects the optical image to the photosensitive element. The photosensitive element may be a Charge Coupled Device (CCD) or a complementary metal-oxide-semiconductor (CMOS) phototransistor. The light sensing element converts the optical signal into an electrical signal, which is then passed to the ISP where it is converted into a digital image signal. And the ISP outputs the digital image signal to the DSP for processing. The DSP converts the digital image signal into image signal in standard RGB, YUV and other formats. In some embodiments, the electronic device 200 may include 1 or N cameras 193, N being a positive integer greater than 1.

The digital signal processor is used for processing digital signals, and can process digital image signals as well as other digital signals. For example, when the electronic device 200 selects a frequency bin, the digital signal processor is used to perform a Fourier transform or the like on the frequency bin energy.

Video codecs are used to compress or decompress digital video. The electronic device 200 may support one or more video codecs. In this way, the electronic device 200 may play or record video in a variety of encoding formats, such as Moving Picture Experts Group (MPEG)-1, MPEG-2, MPEG-3, MPEG-4, and the like.

The NPU is a neural-network (NN) computing processor that processes input information quickly by using a biological neural network structure, for example, by using a transfer mode between neurons of a human brain, and can also learn by itself continuously. The NPU can implement applications such as intelligent recognition of the electronic device 200, for example: image recognition, face recognition, speech recognition, text understanding, and the like.

The external memory interface 120 may be used to connect an external memory card, such as a Micro SD card, to extend the memory capability of the electronic device 200. The external memory card communicates with the processor 110 through the external memory interface 120 to implement a data storage function. For example, files such as music, video, etc. are saved in an external memory card.

The internal memory 121 may be used to store computer-executable program code, which includes instructions. The processor 110 executes various functional applications of the electronic device 200 and data processing by executing instructions stored in the internal memory 121. The internal memory 121 may include a program storage area and a data storage area. The program storage area may store an operating system, an application program (such as a sound playing function, an image playing function, etc.) required by at least one function, and the like. The data storage area may store data (e.g., audio data, a phone book, etc.) created during use of the electronic device 200, and the like. In addition, the internal memory 121 may include a high-speed random access memory, and may further include a nonvolatile memory, such as at least one magnetic disk storage device, a flash memory device, a universal flash storage (UFS), and the like.

The electronic device 200 may implement audio functions, such as music playing and recording, via the audio module 170, the speaker 170A, the receiver 170B, the microphone 170C, the headphone interface 170D, and the application processor.

The audio module 170 is used to convert digital audio information into an analog audio signal output and also to convert an analog audio input into a digital audio signal. The audio module 170 may also be used to encode and decode audio signals. In some embodiments, the audio module 170 may be disposed in the processor 110, or some functional modules of the audio module 170 may be disposed in the processor 110.

The speaker 170A, also called a "horn", is used to convert the audio electrical signal into an acoustic signal. The user can listen to music or to a hands-free call through the speaker 170A of the electronic device 200.

The receiver 170B, also called "earpiece", is used to convert the electrical audio signal into an acoustic signal. When the electronic device 200 receives a call or voice information, the user can listen to the voice by placing the receiver 170B close to the ear.

The microphone 170C, also referred to as a "mic", is used to convert sound signals into electrical signals. When making a call or transmitting voice information, the user can input a voice signal to the microphone 170C by speaking with the mouth close to the microphone 170C. The electronic device 200 may be provided with at least one microphone 170C. In other embodiments, the electronic device 200 may be provided with two microphones 170C to achieve a noise reduction function in addition to collecting sound signals. In other embodiments, the electronic device 200 may further include three, four or more microphones 170C to collect sound signals, reduce noise, identify sound sources, perform directional recording, and the like.

The headphone interface 170D is used to connect a wired headphone. The headphone interface 170D may be the USB interface 130, or may be a 3.5 mm open mobile terminal platform (OMTP) standard interface or a cellular telecommunications industry association of the USA (CTIA) standard interface.

The pressure sensor 180A is used for sensing a pressure signal, and converting the pressure signal into an electrical signal. In some embodiments, the pressure sensor 180A may be disposed on the display screen 194. The pressure sensor 180A can be of a wide variety, such as a resistive pressure sensor, an inductive pressure sensor, a capacitive pressure sensor, and the like. The capacitive pressure sensor may be a sensor comprising at least two parallel plates having an electrically conductive material. When a force acts on the pressure sensor 180A, the capacitance between the electrodes changes. The electronic device 200 determines the intensity of the pressure from the change in capacitance. When a touch operation is applied to the display screen 194, the electronic device 200 detects the intensity of the touch operation according to the pressure sensor 180A. The electronic device 200 may also calculate the touched position from the detection signal of the pressure sensor 180A. In some embodiments, touch operations that are applied to the same touch position but have different touch operation intensities may correspond to different operation instructions. For example, when a touch operation whose intensity is smaller than a first pressure threshold acts on the short message application icon, an instruction for viewing the short message is executed. When a touch operation whose intensity is greater than or equal to the first pressure threshold acts on the short message application icon, an instruction for creating a new short message is executed.
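The threshold rule in the short message example above can be illustrated with a minimal sketch. The function name, the instruction strings and the threshold value are hypothetical and chosen only for illustration; the application does not specify them.

```python
# Hypothetical threshold in arbitrary units; the source gives no concrete value.
FIRST_PRESSURE_THRESHOLD = 0.5


def sms_icon_instruction(touch_intensity: float,
                         threshold: float = FIRST_PRESSURE_THRESHOLD) -> str:
    """Map the intensity of a touch on the short message icon to an instruction."""
    if touch_intensity < threshold:
        # Intensity below the first pressure threshold: view the short message.
        return "view_short_message"
    # Intensity greater than or equal to the threshold: create a new short message.
    return "create_short_message"
```

A light touch such as `sms_icon_instruction(0.2)` selects the viewing instruction, while a touch at or above the threshold selects the instruction for creating a new message.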

The gyro sensor 180B may be used to determine the motion attitude of the electronic device 200. In some embodiments, the angular velocity of the electronic device 200 about three axes (i.e., x, y, and z axes) may be determined by the gyroscope sensor 180B. The gyro sensor 180B may be used for photographing anti-shake. Illustratively, when the shutter is pressed, the gyro sensor 180B detects a shake angle of the electronic device 200, calculates a distance to be compensated for by the lens module according to the shake angle, and allows the lens to counteract the shake of the electronic device 200 through a reverse movement, thereby achieving anti-shake. The gyroscope sensor 180B may also be used for navigation, somatosensory gaming scenes.

The air pressure sensor 180C is used to measure air pressure. In some embodiments, the electronic device 200 calculates altitude, aiding in positioning and navigation, from barometric pressure values measured by the barometric pressure sensor 180C.
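The application does not state how the altitude is calculated from the air pressure. As a hedged illustration only, the sketch below assumes the commonly used international barometric formula for the standard atmosphere; the function name and the default sea-level pressure are assumptions, not part of the source.

```python
# Assumed standard sea-level pressure in hPa (not specified by the source).
SEA_LEVEL_PRESSURE_HPA = 1013.25


def altitude_m(pressure_hpa: float,
               sea_level_hpa: float = SEA_LEVEL_PRESSURE_HPA) -> float:
    """Estimate altitude in meters from air pressure with the barometric formula."""
    if pressure_hpa <= 0 or sea_level_hpa <= 0:
        raise ValueError("pressures must be positive")
    # h = 44330 * (1 - (p / p0) ** (1 / 5.255))
    return 44330.0 * (1.0 - (pressure_hpa / sea_level_hpa) ** (1.0 / 5.255))
```

Under these assumptions, a reading equal to the sea-level pressure maps to an altitude of zero, and lower readings map to higher altitudes.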

The magnetic sensor 180D includes a Hall sensor. The electronic device 200 may detect the opening and closing of a flip holster using the magnetic sensor 180D. In some embodiments, when the electronic device 200 is a flip phone, the electronic device 200 may detect the opening and closing of the flip cover according to the magnetic sensor 180D. Features such as automatic unlocking upon flipping open can then be set according to the detected open or closed state of the holster or of the flip cover.

The acceleration sensor 180E may detect the magnitude of acceleration of the electronic device 200 in various directions (typically three axes). The magnitude and direction of gravity can be detected when the electronic device 200 is stationary. The acceleration sensor 180E can also be used to recognize the posture of the electronic device 200, and is applied to horizontal and vertical screen switching, pedometers and other applications.

The distance sensor 180F is used to measure distance. The electronic device 200 may measure the distance by infrared or laser. In some embodiments, when photographing a scene, the electronic device 200 may use the distance sensor 180F to measure distance in order to achieve fast focusing.

The proximity light sensor 180G may include, for example, a light emitting diode (LED) and a light detector, such as a photodiode. The light emitting diode may be an infrared light emitting diode. The electronic device 200 emits infrared light to the outside through the light emitting diode. The electronic device 200 detects infrared reflected light from nearby objects using the photodiode. When sufficient reflected light is detected, it can be determined that there is an object near the electronic device 200. When insufficient reflected light is detected, the electronic device 200 may determine that there are no objects near the electronic device 200. The electronic device 200 can use the proximity light sensor 180G to detect that the user is holding the electronic device 200 close to the ear during a call, so as to automatically turn off the screen and save power. The proximity light sensor 180G may also be used in a holster mode or a pocket mode to automatically unlock and lock the screen.
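The "sufficient reflected light" decision described above can be sketched as a simple comparison. The threshold value, the units and the function names are hypothetical; the application does not define what counts as sufficient.

```python
# Hypothetical detector reading (arbitrary units) that counts as "sufficient".
REFLECTED_LIGHT_THRESHOLD = 100.0


def object_nearby(reflected_light: float,
                  threshold: float = REFLECTED_LIGHT_THRESHOLD) -> bool:
    """Return True when enough infrared reflected light is detected."""
    return reflected_light >= threshold


def screen_should_turn_off(in_call: bool, reflected_light: float) -> bool:
    """Turn the screen off only while the user is in a call and an object is near."""
    return in_call and object_nearby(reflected_light)
```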

The ambient light sensor 180L is used to sense the ambient light level. The electronic device 200 may adaptively adjust the brightness of the display screen 194 based on the perceived ambient light level. The ambient light sensor 180L may also be used to automatically adjust the white balance when taking a picture. The ambient light sensor 180L may also cooperate with the proximity light sensor 180G to detect whether the electronic device 200 is in a pocket to prevent accidental touches.

The fingerprint sensor 180H is used to collect a fingerprint. The electronic device 200 can use the collected fingerprint characteristics to implement fingerprint unlocking, application lock access, fingerprint photographing, fingerprint call answering, and the like.

The temperature sensor 180J is used to detect temperature. In some embodiments, the electronic device 200 implements a temperature processing strategy using the temperature detected by the temperature sensor 180J. For example, when the temperature reported by the temperature sensor 180J exceeds a threshold, the electronic device 200 performs a reduction in performance of a processor located near the temperature sensor 180J, so as to reduce power consumption and implement thermal protection. In other embodiments, the electronic device 200 heats the battery 142 when the temperature is below another threshold to avoid the low temperature causing the electronic device 200 to shut down abnormally. In other embodiments, the electronic device 200 performs boosting of the output voltage of the battery 142 when the temperature is below a further threshold value to avoid abnormal shutdown due to low temperature.
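The temperature processing strategy above can be illustrated as follows. The source describes the three responses as separate embodiments and gives no threshold values, so combining them in one function and the numbers used here are assumptions made purely for illustration.

```python
from typing import List

# Hypothetical thresholds in degrees Celsius; the source gives no values.
HIGH_TEMP_C = 45.0        # above this: reduce processor performance
LOW_TEMP_HEAT_C = 0.0     # below this: heat the battery
LOW_TEMP_BOOST_C = -10.0  # below this: boost the battery output voltage


def thermal_actions(temp_c: float) -> List[str]:
    """Return the actions triggered by the reported temperature."""
    actions: List[str] = []
    if temp_c > HIGH_TEMP_C:
        actions.append("reduce_processor_performance")
    if temp_c < LOW_TEMP_HEAT_C:
        actions.append("heat_battery")
    if temp_c < LOW_TEMP_BOOST_C:
        actions.append("boost_battery_output_voltage")
    return actions
```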

The touch sensor 180K is also referred to as a "touch panel". The touch sensor 180K may be disposed on the display screen 194, and the touch sensor 180K and the display screen 194 together form a touchscreen, also called a "touch-controlled screen". The touch sensor 180K is used to detect a touch operation applied on or near it. The touch sensor can pass the detected touch operation to the application processor to determine the touch event type. Visual output associated with the touch operation may be provided through the display screen 194. In other embodiments, the touch sensor 180K may be disposed on the surface of the electronic device 200 at a position different from that of the display screen 194.

The bone conduction sensor 180M may acquire a vibration signal. In some embodiments, the bone conduction sensor 180M may acquire a vibration signal of the bone mass vibrated by the human vocal part. The bone conduction sensor 180M may also contact the human pulse to receive the blood pressure pulsation signal. In some embodiments, the bone conduction sensor 180M may also be disposed in a headset, integrated into a bone conduction headset. The audio module 170 may parse out a voice signal based on the vibration signal of the bone mass vibrated by the vocal part acquired by the bone conduction sensor 180M, so as to implement a voice function. The application processor can parse out heart rate information based on the blood pressure pulsation signal acquired by the bone conduction sensor 180M, so as to implement the heart rate detection function. For example, the bone mass vibrated by the vocal part may include teeth, gums, the maxilla and mandible, the nasal bone, and the like.

The keys 190 include a power-on key, a volume key, and the like. The keys 190 may be mechanical keys or touch keys. The electronic device 200 may receive a key input, and generate a key signal input related to user settings and function control of the electronic device 200.

The motor 191 may generate a vibration cue. The motor 191 may be used for incoming call vibration cues, as well as for touch vibration feedback. For example, touch operations applied to different applications (e.g., photographing, audio playing, etc.) may correspond to different vibration feedback effects. The motor 191 may also produce different vibration feedback effects for touch operations applied to different areas of the display screen 194. Different application scenes (such as time reminding, receiving information, alarm clock, game and the like) can also correspond to different vibration feedback effects. The touch vibration feedback effect may also support customization.

Indicator 192 may be an indicator light that may be used to indicate a state of charge, a change in charge, or a message, missed call, notification, etc.

The SIM card interface 195 is used to connect a SIM card. The SIM card can be brought into and out of contact with the electronic device 200 by being inserted into the SIM card interface 195 or being pulled out of the SIM card interface 195. The electronic device 200 may support 1 or N SIM card interfaces, N being a positive integer greater than 1. The SIM card interface 195 may support a Nano SIM card, a Micro SIM card, a SIM card, etc. Multiple cards can be inserted into the same SIM card interface 195 at the same time. The types of the plurality of cards may be the same or different. The SIM card interface 195 may also be compatible with different types of SIM cards. The SIM card interface 195 may also be compatible with external memory cards. The electronic device 200 interacts with the network through the SIM card to implement functions such as calls and data communication. In some embodiments, the electronic device 200 employs an eSIM, namely an embedded SIM card. The eSIM card can be embedded in the electronic device 200 and cannot be separated from the electronic device 200.

The embodiment of the application provides an audio acquisition method, in which an audio acquisition device 100 acquires a vibration signal and a voice signal of a human body sound part vibration bone block of a user. The audio collecting device 100 processes the collected vibration signal and filters noise of the voice signal according to the vibration signal, so as to obtain a voice signal with noise filtered and send the voice signal to the electronic device 200.

In the prior art, the audio acquisition device only acquires a voice signal, and a noise signal in the environment interferes with the voice signal. When the user is in a call, the audio capture device 100 captures the voice of the user and transmits it to the electronic device 200, and the electronic device 200 transmits the voice of the user to the electronic device of the other party of the call. Since the voice of the user collected by the audio collecting device 100 carries noise, the other party may be unable to make out what the user said, and the call experience is poor. In addition, when the user interacts with the electronic device 200 through voice commands, if the user voice collected by the audio collecting device 100 is interfered with by a noise signal, the electronic device 200 may fail to correctly recognize the voice command in the voice signal sent by the audio collecting device 100. The electronic device 200 then cannot correctly execute the user's voice command, resulting in a poor user experience.

Compared with the prior art, in the audio acquisition method provided by the application, the audio acquisition device 100 processes the acquired vibration signal and filters the noise of the voice signal according to the vibration signal. The vibration signal is free of interference from the noise signal. The frequency of the vibration signal of the human body vocal part vibration bone block is within 2 kHz, and the frequency of the voice signal is between 20 Hz and 20 kHz. Therefore, the vibration signal can be used to determine the magnitude of the noise signal in the part of the voice signal whose frequency is within 2 kHz. The magnitude of the noise signal contained in the voice signal over the 20 Hz to 20 kHz range is then determined by linear prediction, so that the noise signal in the voice signal can be filtered out. In this way, the influence of the noise signal can be effectively reduced, so that the user can interact with the electronic device 200 more effectively, and the user experience is improved.
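The two-stage idea above (use the vibration signal to size the noise below 2 kHz, then extend the estimate to the rest of the voice band) can be illustrated with a rough sketch. The application does not detail how the linear prediction is performed, so the sketch substitutes a plain least-squares line fit over the low-band noise estimates as a stand-in. It also assumes that the signals are already expressed as per-band magnitudes on a common scale, and that the vibration magnitude approximates the speech component in each low band. All names are hypothetical.

```python
from typing import List, Sequence, Tuple

VIBRATION_LIMIT_HZ = 2000.0  # from the text: the vibration signal lies within 2 kHz


def fit_line(xs: Sequence[float], ys: Sequence[float]) -> Tuple[float, float]:
    """Least-squares fit y = slope * x + intercept (assumed stand-in for the
    linear prediction mentioned in the text)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        return 0.0, mean_y
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sxx
    return slope, mean_y - slope * mean_x


def estimate_noise(freqs_hz: Sequence[float],
                   mic_mag: Sequence[float],
                   vib_mag: Sequence[float]) -> List[float]:
    """Estimate the per-band noise magnitude in the microphone signal.

    vib_mag has one entry per band; entries above 2 kHz are ignored.
    """
    low = [i for i, f in enumerate(freqs_hz) if f <= VIBRATION_LIMIT_HZ]
    if not low:
        raise ValueError("need at least one band within 2 kHz")
    # Within 2 kHz: noise is what the microphone hears beyond the vibration.
    low_noise = [max(mic_mag[i] - vib_mag[i], 0.0) for i in low]
    slope, intercept = fit_line([freqs_hz[i] for i in low], low_noise)
    noise = []
    for i, f in enumerate(freqs_hz):
        if f <= VIBRATION_LIMIT_HZ:
            est = max(mic_mag[i] - vib_mag[i], 0.0)
        else:
            est = slope * f + intercept  # extrapolated from the low band
        # Noise can be neither negative nor larger than the observed magnitude.
        noise.append(min(max(est, 0.0), mic_mag[i]))
    return noise


def remove_noise(mic_mag: Sequence[float], noise: Sequence[float]) -> List[float]:
    """Subtract the estimated noise magnitude from each band."""
    return [m - n for m, n in zip(mic_mag, noise)]
```

With a flat noise floor, the low-band estimates are identical, the fitted line is flat, and the same noise level is removed from the bands above 2 kHz. A real implementation would likely work on short-time spectra and a more careful predictor.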

Fig. 5 is a schematic flowchart of an audio acquisition method according to an embodiment of the present application. The audio collecting device 100 may collect a voice signal of a user and a vibration signal of a nasal bone, determine a noise signal in the voice signal according to the vibration signal, filter the noise signal, and finally send the voice signal with the noise signal filtered to the electronic device 200. The method specifically comprises the following steps:

S101, the audio acquisition equipment 100 acquires a first voice signal and a first vibration signal of a vibration bone block of a human body sound part of a user.

The audio collecting apparatus 100 collects a first vibration signal of a human body sound part vibrating a bone mass using the bone conduction sensor 303 and collects a first voice signal of a user using the microphone 306. For example, the audio capture device 100 may be glasses 101 as shown in fig. 3. Bone conduction sensors at the nose pads of the glasses 101 collect vibration signals at the user's nasal bones. The microphones at the temples of the glasses 101 pick up the voice signals of the user.

In one possible implementation manner, before the audio acquiring apparatus 100 acquires the first vibration signal and the first voice signal of the user's human body sound part vibration bone mass, the method includes: the audio collecting apparatus 100 receives a first instruction for instructing the audio collecting apparatus 100 to start collecting a first vibration signal and a first voice signal of the user's human body sound part vibrating bone mass. The first instruction may be an instruction generated by the audio capturing device 100 receiving a user operation, for example, the audio capturing device 100 receiving a single-click or double-click operation or a pressing operation, the processor in the audio capturing device 100 generating an instruction instructing the bone conduction sensor 303 in the audio capturing device 100 to start capturing a first vibration signal of a vibration bone mass of a human body vocal part of the user, and instructing the microphone 306 in the audio capturing device 100 to start capturing a first voice signal. Alternatively, the first instruction may be an instruction transmitted by the electronic apparatus 200. For example, after the electronic apparatus 200 receives a user operation for the user to select to answer a call with the audio capture apparatus 100, the electronic apparatus 200 transmits a first instruction to the audio capture apparatus 100.

In one possible implementation manner, before the audio acquiring apparatus 100 acquires the first vibration signal and the first voice signal of the user's human body sound part vibration bone mass, the method includes: the audio capture device 100 and the electronic device 200 establish a communication connection. Specifically, the audio capture device 100 may establish a communication connection with the electronic device 200 through the wireless communication module 304. The communication connection may be a Wi-Fi connection or a Bluetooth connection.

S102, the audio acquisition equipment 100 performs signal processing on the first vibration signal to obtain a second vibration signal.

The first vibration signal collected by the audio collecting apparatus 100 is an analog signal. Owing to the limitations of the acquisition device, the acquired first vibration signal is weak and needs to be subjected to signal processing. After the audio acquisition device 100 acquires the first vibration signal, it performs signal processing on the first vibration signal to obtain a second vibration signal.

In one possible implementation, the processing of the first vibration signal by the audio acquisition device 100 may refer to fig. 6. Fig. 6 is a schematic diagram of a signal processing flow according to an embodiment of the present application. As shown in fig. 6, the processing procedure of the vibration signal by the audio acquisition device 100 may include:

S1021, the audio capture device 100 amplifies the first vibration signal.

The audio collecting apparatus 100 may amplify the first vibration signal through a signal amplifying circuit.

S1022, the audio collecting device 100 performs analog-to-digital conversion on the amplified first vibration signal to obtain a vibration signal in the form of a digital signal.

The audio collecting apparatus 100 converts the amplified first vibration signal into a vibration signal in the form of a digital signal through an analog-to-digital converter.
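Steps S1021 and S1022 can be sketched in software as a gain stage followed by a quantizer. This is only an illustration: in the embodiment both steps are performed by hardware (a signal amplifying circuit and an analog-to-digital converter), and the gain value, the full-scale range, and the 16-bit resolution used here are assumptions that do not come from the application.

```python
def amplify(samples, gain):
    """Model the amplifying circuit as a constant gain (hypothetical value)."""
    return [gain * s for s in samples]


def quantize(samples, full_scale=1.0, bits=16):
    """Model the analog-to-digital converter: scale, round, and clip to signed codes."""
    max_code = 2 ** (bits - 1) - 1
    min_code = -(2 ** (bits - 1))
    codes = []
    for s in samples:
        code = round(s / full_scale * max_code)
        # Inputs beyond the full-scale range saturate instead of wrapping around.
        codes.append(max(min_code, min(max_code, code)))
    return codes


# A weak vibration signal becomes usable after amplification; the last sample clips.
weak_signal = [0.0, 0.001, -0.001, 0.5]
digital = quantize(amplify(weak_signal, gain=100.0))
```

The clipping branch shows why the gain has to be matched to the converter range: a sample that is amplified beyond full scale saturates and is distorted.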

S1023, the audio capture device 100 processes the vibration signal in the form of a digital signal using a digital signal processing unit in the audio capture device 100.

In a possible implementation manner, the process of processing the vibration signal in the form of a digital signal by the audio acquisition device 100 by using the digital signal processing unit can refer to fig. 7. As shown in fig. 7, the process of processing the vibration signal in the form of a digital signal by the audio capture device 100 may include:

S201, filtering out low-frequency vibration signals generated by human body motion.

When the human body moves, the bones vibrate, and a vibration signal is therefore generated. The frequency of the vibration of the human body sound part vibration bone block caused by motion is far lower than the frequency of the vibration caused by speaking. Because the vibration signal collected by the audio capture device 100 has been amplified, the motion-induced vibration component is amplified as well. To prevent the bone vibration produced by the user's motion from affecting the vibration signal produced by the user's speech, the audio capture device 100 filters the collected first vibration signal.

In one possible implementation, the audio capturing device 100 may retain only a portion of the first vibration signal having a vibration frequency greater than a first threshold.
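One way to retain only the portion above a threshold frequency is a high-pass filter. The following minimal sketch uses a first-order RC-style high-pass filter; the filter type, the 100 Hz cutoff standing in for the first threshold, and the 8 kHz sample rate are assumptions made for illustration, since the application does not specify them.

```python
import math


def high_pass(samples, sample_rate_hz, cutoff_hz):
    """First-order high-pass filter: attenuates components below cutoff_hz."""
    rc = 1.0 / (2.0 * math.pi * cutoff_hz)
    dt = 1.0 / sample_rate_hz
    alpha = rc / (rc + dt)
    out = []
    prev_in = 0.0
    prev_out = 0.0
    for x in samples:
        # The output follows changes in the input and slowly forgets its level.
        y = alpha * (prev_out + x - prev_in)
        out.append(y)
        prev_in, prev_out = x, y
    return out


def rms(values):
    """Root-mean-square level, used here to compare signal strength."""
    return math.sqrt(sum(v * v for v in values) / len(values))


fs = 8000  # hypothetical sample rate in Hz
motion = [math.sin(2 * math.pi * 5 * n / fs) for n in range(fs)]    # 5 Hz body motion
speech = [math.sin(2 * math.pi * 500 * n / fs) for n in range(fs)]  # 500 Hz speech tone
motion_out = high_pass(motion, fs, cutoff_hz=100.0)
speech_out = high_pass(speech, fs, cutoff_hz=100.0)
```

With these values the 5 Hz motion component is strongly attenuated while the 500 Hz component passes almost unchanged. A steeper filter would be needed if the motion and speech bands were closer together.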

S202, performing filtering compensation.

Human motion may also cause mechanical changes in the sensors of the audio capture device 100, which interferes with the collected first vibration signal. In addition, vibration of the speaker in the audio capture device 100 may disturb the collected first vibration signal. Such device-related factors distort the collected first vibration signal and the voice signal, so that the complete voice information of the user cannot be obtained. To solve this problem, in the embodiment of the present application, the audio capture device 100 performs filter compensation processing on the collected first vibration signal to reduce distortion of the signal.

In one possible implementation manner, the filter compensation processing performed by the audio capture device 100 on the first vibration signal includes: determining a filter compensation parameter H^* for the interference channel parameter H that disturbs the vibration signal; and convolving the vibration signal with the filter compensation parameter H^* to obtain the second vibration signal.

In a possible implementation manner, the vibration signal obtained by amplifying the first vibration signal collected by the audio capture device and filtering out the low-frequency vibration signal generated by human body motion may be denoted R1, and the vibration signal obtained by digitally processing the first vibration signal may be denoted S1. Then:

S1 = H * R1 (1)

In formula (1), "*" indicates convolution. According to formula (1), the filter compensation parameter H^* can be obtained:

H^* = INV(A1^T * A1) * A1^T * S1 (2)

H * H^* = 1 (3)

Here "INV" represents matrix inversion, A1 is the matrix generated from R1, A1^T is the transposed matrix of A1, and H^* is the conjugate matrix of H.

S1024, the audio capture device 100 obtains a second vibration signal.

The second vibration signal is then:

S2 = S1 * H^* = R1 * H * H^* (4)

Because H * H^* = 1 according to formula (3), the second vibration signal S2 approximates R1. The second vibration signal is therefore closer to the original vibration signal of the human body sound part vibration bone block, and channel interference is reduced. As a result, the voice signal of the user can be acquired more accurately by using the second vibration signal.
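Formulas (3) and (4) can be checked numerically. The sketch below is not the computation of formula (2). It assumes that the channel H is a known, short FIR filter (the taps [1.0, 0.5] are hypothetical), designs an approximate inverse filter by least squares, using the normal-equation form INV(A^T * A) * A^T on a convolution matrix built from H, and then convolves S1 with that filter. All function names are illustrative.

```python
def convolve(a, b):
    """Full linear convolution of two sequences."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return out


def convolution_matrix(h, n_taps):
    """Matrix A such that A times x equals convolve(h, x) for len(x) == n_taps."""
    rows = len(h) + n_taps - 1
    return [[h[r - c] if 0 <= r - c < len(h) else 0.0 for c in range(n_taps)]
            for r in range(rows)]


def solve(matrix, vector):
    """Solve matrix * x = vector by Gaussian elimination with partial pivoting."""
    n = len(vector)
    aug = [row[:] + [vector[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for k in range(col, n + 1):
                aug[r][k] -= factor * aug[col][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        partial = sum(aug[r][k] * x[k] for k in range(r + 1, n))
        x[r] = (aug[r][n] - partial) / aug[r][r]
    return x


def least_squares_inverse(h, n_taps):
    """Find h_inv minimising |convolve(h, h_inv) - unit impulse| in the least-squares sense."""
    a = convolution_matrix(h, n_taps)
    target = [1.0] + [0.0] * (len(a) - 1)
    ata = [[sum(row[i] * row[j] for row in a) for j in range(n_taps)]
           for i in range(n_taps)]
    atd = [sum(a[r][i] * target[r] for r in range(len(a))) for i in range(n_taps)]
    return solve(ata, atd)


h = [1.0, 0.5]                               # hypothetical interference channel H
h_inv = least_squares_inverse(h, n_taps=8)   # plays the role of H^*
r1 = [0.3, -0.7, 0.5, 0.1, -0.2, 0.9]        # undistorted vibration signal R1
s1 = convolve(h, r1)                         # formula (1): S1 = H * R1
s2 = convolve(s1, h_inv)[:len(r1)]           # formula (4): S2 = S1 * H^*
```

With these values, convolving H with the estimated inverse gives approximately a unit impulse, and S2 matches R1 to within a small error. A longer inverse filter reduces the residual further.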

S103, the audio capture device 100 determines a noise signal in the first voice signal by using the second vibration signal.

Fig. 8 is a schematic diagram of a nasal bone vibration signal collected by the audio capture device 100. The nasal bone vibration signal collected by the audio capture device 100 is not interfered with by a noise signal. The frequency of the nasal bone vibration signal in fig. 8 extends up to F1, and F1 is less than 2 kHz. It will be understood that the frequency of the vibration signal picked up by the glasses depends on the sensors in the glasses. When the sensor is sensitive, the vibration frequency of the collected vibration signal may be greater than 2 kHz, which is not limited herein. Fig. 9 is a schematic diagram of a voice signal collected by a microphone in the audio capture device 100. The voice signal collected by the audio capture device 100 includes human voice and noise, and the noise interferes with the human voice. Within the same frequency range, the nasal bone vibration signal and the human voice component of the voice signal carry the same speech of the user. Therefore, the magnitude of the noise signal in the voice signal in the frequency range of 0 to F1 can be determined by using the nasal bone vibration signal. Then, by linear prediction, the magnitude of the noise signal in the voice signal in the range of 0 to F2 (the entire frequency band of the voice signal shown in fig. 9) can be obtained. The noise can therefore be filtered out, and a human voice signal without noise interference is obtained.

In an illustrative example, assume that the vibration signal in fig. 8 is a vector Y and that the speech signal in fig. 9 within the frequency range of 0 to F1 is a vector X1 = X + N, where the vector X is the human voice collected by the microphone and the vector N is the noise collected by the microphone. If the processing coefficient is a vector B, then Y = N × B, N = (Y - X × B)/B, and the noise in the frequency range of 0 to F1 can be obtained. The noise N1 in the frequency range of 0 to F2 can then be found by linear prediction. For the linear prediction algorithm, reference may be made to linear prediction algorithms in the prior art, and details thereof are not described herein.

In one possible implementation manner, before the audio capture device 100 determines the noise signal in the first voice signal by using the second vibration signal, the method further includes: performing signal processing on the first voice signal.

Optionally, the processing of the first voice signal may include amplifying and filtering the first voice signal. For details, reference may be made to the processing procedure of the first vibration signal, which is not described herein again.

S104, the audio acquisition equipment 100 separates the noise signal in the first voice signal to obtain a second voice signal and sends the second voice signal to the electronic equipment.

When the noise signal over the entire frequency band (for example, the frequency range of 0 to F2 shown in fig. 9) of the voice signal collected by the audio capture device 100 is predicted to be N1 and the voice signal is X2, the human voice signal is X3 = X2 - N1, which is the second voice signal.
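The flow of steps S103 and S104 can be sketched on per-bin magnitude values. Several choices here are assumptions made only for illustration: the vibration signal is modelled as the voice scaled by a coefficient B in each bin, a least-squares straight-line fit stands in for the linear prediction algorithm that the application leaves to the prior art, and the result of the subtraction is floored at zero. The function names are hypothetical.

```python
def in_band_noise(mic_band, vibration_band, coeff):
    """Noise per bin in 0..F1, assuming vibration = coeff * voice in each bin."""
    return [x1 - y / b for x1, y, b in zip(mic_band, vibration_band, coeff)]


def extend_by_line_fit(noise_band, total_bins):
    """Extend the in-band noise to all bins with a least-squares straight line.

    Needs at least two in-band bins; this is a stand-in for linear prediction.
    """
    n = len(noise_band)
    mean_i = (n - 1) / 2.0
    mean_v = sum(noise_band) / n
    spread = sum((i - mean_i) ** 2 for i in range(n))
    slope = sum((i - mean_i) * (v - mean_v)
                for i, v in enumerate(noise_band)) / spread
    intercept = mean_v - slope * mean_i
    return list(noise_band) + [intercept + slope * i for i in range(n, total_bins)]


def separate_noise(mic_full, noise_full):
    """X3 = X2 - N1 per bin, floored at zero so magnitudes stay non-negative."""
    return [max(x2 - n1, 0.0) for x2, n1 in zip(mic_full, noise_full)]


# Synthetic example: six bins, of which the first three lie inside 0..F1.
voice = [4.0, 3.0, 2.0, 1.0, 1.0, 1.0]
noise = [1.0, 1.1, 1.2, 1.3, 1.4, 1.5]
mic = [v + n for v, n in zip(voice, noise)]    # X2, the microphone signal
vibration = [0.5 * v for v in voice[:3]]       # Y, available only in 0..F1
n_band = in_band_noise(mic[:3], vibration, [0.5] * 3)
n1 = extend_by_line_fit(n_band, len(mic))      # N1 over 0..F2
x3 = separate_noise(mic, n1)                   # second voice signal
```

In this synthetic case the noise grows linearly with the bin index, so the straight-line extension is exact and X3 reproduces the voice. Real noise is rarely that regular, which is why a proper prediction model would be used in practice.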

In the audio acquisition method provided by the application, the audio capture device 100 processes the collected vibration signal and filters noise out of the voice signal according to the vibration signal. The vibration signal is free of noise interference. The frequency of the vibration signal of the human body sound part vibration bone block is within 2 kHz, whereas the frequency of the voice signal is between 20 Hz and 20 kHz. Therefore, the vibration signal can be used to determine the magnitude of the noise signal in the portion of the voice signal within 2 kHz. The magnitude of the noise signal in the voice signal over the range of 20 Hz to 20 kHz is then determined by linear prediction, so that the noise signal in the voice signal can be filtered out. In this way, the influence of the noise signal is effectively reduced, the user can interact with the electronic device 200 more effectively, and the user experience is improved.

Fig. 10 is an interaction diagram of an audio acquisition method according to an embodiment of the present application. In the embodiment of the present application, the following scenarios are taken as examples: the audio capture device 100 is a pair of glasses 101 and the electronic device 200 is a mobile phone. When the mobile phone 1 and the mobile phone 2 perform a voice call, the user of the mobile phone 1 inputs a voice signal to the mobile phone 1 through the glasses 101. The glasses 101 process the voice signal after acquiring the voice signal, so as to reduce the influence of noise signals. Therefore, the conversation process between the mobile phone 1 and the mobile phone 2 can be smoother. As shown in fig. 10, the method specifically includes:

S301, the glasses 101 and the mobile phone 1 establish a communication connection.

Specifically, the glasses 101 may establish a Wi-Fi connection or a Bluetooth connection with the mobile phone 1.

S302, triggering the glasses 101 to start executing step S303.

In this embodiment, step S302 may include two ways to trigger the glasses 101 to execute step S303. The manner of triggering the glasses 101 to execute step S303 can be specifically referred to steps S302a and S302 b.

The first method is as follows:

S302a, the glasses 101 receive a first user operation.

The glasses 101 may receive a first user operation, which may be used to trigger the glasses 101 to perform step S303. The first user operation may specifically be a user clicking or double-clicking on the glasses 101, or long-pressing a power-on key of the glasses 101, and the like, and the first user operation is not limited here.

The second method comprises the following steps:

S302b, the glasses 101 receive a first instruction sent by the mobile phone 1.

The glasses 101 may receive a first instruction sent by the mobile phone 1, where the first instruction is used to trigger the glasses 101 to perform step S303.

It will be appreciated that, in one possible implementation, the mobile phone 1 receives a second user operation and, in response to the second user operation, sends the first instruction to the glasses 101. The second user operation may be the user answering a voice call on the mobile phone 1 or starting a voice assistant application installed in the mobile phone 1.
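The two trigger paths of step S302 can be sketched as a small controller on the glasses. The class, method names, and the string values used for operations and instructions are all hypothetical; the application does not define a message format.

```python
class CaptureController:
    """Illustrative controller that starts capture (step S303) on either trigger."""

    # Hypothetical identifiers for the first user operation.
    USER_OPERATIONS = {"single_click", "double_click", "long_press"}

    def __init__(self):
        self.capturing = False
        self.trigger = None

    def on_user_operation(self, operation):
        """S302a: a recognised operation on the glasses starts capture."""
        if operation in self.USER_OPERATIONS:
            self._start("user_operation")

    def on_instruction(self, instruction):
        """S302b: the first instruction received from the phone starts capture."""
        if instruction == "first_instruction":
            self._start("first_instruction")

    def _start(self, trigger):
        # Starting twice has no effect; the first trigger is remembered.
        if not self.capturing:
            self.capturing = True
            self.trigger = trigger


controller = CaptureController()
controller.on_user_operation("swipe")            # not a recognised operation
ignored = controller.capturing
controller.on_instruction("first_instruction")   # phone-side trigger (S302b)
```

Keeping both paths behind one start routine means the sensor and the microphone are enabled in the same way regardless of which side initiated the capture.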

S303, the glasses 101 collect a first vibration signal and a first voice signal of the nasal bone of the user.

Step S303 may refer to the description in step S101, and is not described herein again.

S304, the glasses 101 perform signal processing on the first vibration signal to obtain a second vibration signal, and process the first voice signal according to the second vibration signal to obtain a second voice signal.

Step S304 may refer to the descriptions in steps S102-S104, which are not repeated herein.

S305, the glasses 101 send the second voice signal to the mobile phone 1.

S306, the mobile phone 1 receives the second voice signal, and encodes the second voice signal to obtain the first voice data.

After receiving the second voice signal, the mobile phone 1 may encode it to obtain the first voice data. The first voice data may be a pulse code modulation (PCM) file, that is, a file in the form of a digital signal containing only 0s and 1s.
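As a minimal sketch of such an encoding, the following packs voice samples as signed 16-bit little-endian PCM. The sample width, the byte order, and the assumption that samples are floating-point values in the range -1.0 to 1.0 are illustrative choices; the application does not specify the PCM parameters.

```python
import struct


def encode_pcm16(samples):
    """Pack float samples in [-1.0, 1.0] as little-endian signed 16-bit PCM bytes."""
    codes = []
    for s in samples:
        clipped = max(-1.0, min(1.0, s))      # out-of-range samples saturate
        codes.append(int(round(clipped * 32767)))
    return struct.pack("<%dh" % len(codes), *codes)


def decode_pcm16(data):
    """Inverse of encode_pcm16, up to quantisation error."""
    count = len(data) // 2
    return [c / 32767 for c in struct.unpack("<%dh" % count, data)]


second_voice_signal = [0.0, 1.0, -1.0, 0.5]
first_voice_data = encode_pcm16(second_voice_signal)
recovered = decode_pcm16(first_voice_data)
```

Each sample occupies two bytes, so the encoded data is a plain sequence of bits with no header. A receiver needs to know the sample rate and sample width separately in order to play it back.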

S307, the mobile phone 1 sends the first voice data to the mobile phone 2.

S308, the mobile phone 2 receives the first voice data.

In a possible implementation manner, after the glasses 101 performs S303, the first vibration signal and the first voice signal are directly transmitted to the mobile phone 1, and then the mobile phone 1 performs step S304 and step S305.

In a possible implementation manner, after the mobile phone 1 receives the second voice signal, the voice assistant application program in the mobile phone 1 performs voice recognition on the second voice signal and displays the recognition result in the user interface. The voice assistant application may also execute a voice command carried in the recognition result. For example, if the voice command carried in the recognition result is to look up the weather, the voice assistant application program may display the weather query result in the user interface or broadcast the weather query result to the user by voice.

Embodiments of the present application also provide a computer-readable storage medium having stored therein instructions, which when executed on a computer or processor, cause the computer or processor to perform one or more steps of any one of the methods described above.

The embodiment of the application also provides a computer program product containing instructions. The computer program product, when run on a computer or processor, causes the computer or processor to perform one or more steps of any of the methods described above.

In the above embodiments, the implementation may be wholly or partially realized by software, hardware, firmware, or any combination thereof. When implemented in software, the embodiments may be implemented in whole or in part in the form of a computer program product. The computer program product includes one or more computer instructions. When the computer instructions are loaded and executed on a computer, the processes or functions described in accordance with the embodiments of the application are generated in whole or in part. The computer may be a general purpose computer, a special purpose computer, a network of computers, or other programmable device. The computer instructions may be stored in or transmitted over a computer-readable storage medium. The computer instructions may be transmitted from one website, computer, server, or data center to another website, computer, server, or data center in a wired manner (e.g., coaxial cable, optical fiber, digital subscriber line) or a wireless manner (e.g., infrared, radio, microwave). The computer-readable storage medium can be any available medium that can be accessed by a computer, or a data storage device, such as a server or a data center, that integrates one or more available media. The available medium may be a magnetic medium (e.g., floppy disk, hard disk, magnetic tape), an optical medium (e.g., DVD), or a semiconductor medium (e.g., Solid State Disk (SSD)), among others.

One of ordinary skill in the art will appreciate that all or part of the processes in the methods of the above embodiments may be implemented by hardware related to instructions of a computer program, which may be stored in a computer-readable storage medium, and when executed, may include the processes of the above method embodiments. And the aforementioned storage medium includes: various media capable of storing program codes, such as ROM or RAM, magnetic or optical disks, etc.

The above description is only a specific implementation of the embodiments of the present application, but the scope of the embodiments of the present application is not limited thereto, and any changes or substitutions within the technical scope disclosed in the embodiments of the present application should be covered by the scope of the embodiments of the present application. Therefore, the protection scope of the embodiments of the present application shall be subject to the protection scope of the claims.

The above embodiments are only used for illustrating the technical solutions of the present application, and not for limiting the same; although the present application has been described in detail with reference to the foregoing embodiments, it should be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and the modifications or the substitutions do not make the essence of the corresponding technical solutions depart from the scope of the technical solutions of the embodiments of the present application.

Claims

1. The audio acquisition system is characterized by comprising audio acquisition equipment and first electronic equipment, wherein the audio acquisition equipment is in communication connection with the first electronic equipment; wherein,

the audio acquisition equipment is used for separating the first noise signal in the first voice signal to obtain a second voice signal and sending the second voice signal to first electronic equipment;

the first electronic device is used for receiving the second voice signal.

2. The system of claim 1, wherein the audio capture device is specifically configured to:

filtering a low-frequency signal in the first vibration signal to obtain a third vibration signal;

determining a conjugate coefficient H^* of a channel coefficient H in the acquisition channel of the first vibration signal;

convolving the third vibration signal with the conjugate coefficient H^* to obtain a second vibration signal.

3. The system according to claim 1 or 2, wherein the audio capture device is specifically configured to:

determining a second noise signal in the first voice signal, wherein the frequency range of the second noise signal is the same as that of the second vibration signal; the frequency range of the first voice signal is larger than that of the second vibration signal;

and performing linear prediction on the second noise signal to obtain a first noise signal of the complete frequency range of the first voice signal.

4. The system of claim 1, wherein the audio capture device is specifically configured to:

receiving a first user operation of a user before acquiring the first voice signal and the first vibration signal;

in response to the first user operation, the first vibration signal and the first voice signal are collected.

5. The system according to any one of claims 1 to 3, wherein the audio acquisition device is specifically configured to:

and receiving a first instruction sent by the first electronic device, wherein the first instruction is used for instructing the audio acquisition device to start acquiring a first voice signal and a first vibration signal of the human body sound part vibration bone block.

6. The system of claim 5, wherein the first electronic device is further configured to:

receiving a second user operation of the user;

and responding to the second user operation, and sending a first instruction to the audio acquisition equipment.

7. The system of claim 6, wherein the first electronic device is further configured to:

and responding to the second user operation, starting a voice call with a second electronic device, or starting a voice assistant in the first electronic device.

8. An audio acquisition method, comprising:

the method comprises the steps that audio acquisition equipment acquires a first voice signal and a first vibration signal of a human body sound part vibration bone block;

the audio acquisition equipment performs signal processing on the first vibration signal to obtain a second vibration signal;

the audio acquisition equipment determines a first noise signal in the first voice signal by using the second vibration signal;

the audio acquisition equipment separates the first noise signal in the first voice signal to obtain a second voice signal and sends the second voice signal to the electronic equipment, wherein the audio acquisition equipment is in communication connection with the electronic equipment.

9. The method of claim 8, wherein the audio capture device captures a first speech signal and a first vibration signal of a human vocal part vibrating bone mass, comprising:

the audio acquisition equipment receives a first user operation of a user before acquiring the first voice signal and the first vibration signal;

in response to the first user operation, the audio capture device captures the first vibration signal and the first voice signal.

10. The method of claim 8, wherein the audio capturing device prior to capturing the first speech signal and the first vibration signal of the human vocal part vibrating bone mass comprises:

the audio acquisition equipment receives a first instruction sent by the electronic equipment, wherein the first instruction is used for indicating the audio acquisition equipment to start acquiring a first voice signal and a first vibration signal of a vibration bone block of a human body sound part.

11. The method according to any one of claims 8-10, wherein the audio capturing device performs signal processing on the first vibration signal to obtain a second vibration signal, comprising:

the audio acquisition equipment filters a low-frequency signal in the first vibration signal to obtain a third vibration signal;

the audio acquisition equipment determines a conjugate coefficient H^* of a channel coefficient H in the acquisition channel of the first vibration signal;

the audio acquisition equipment convolves the third vibration signal with the conjugate coefficient H^* to obtain a second vibration signal.

12. The method according to any one of claims 8-11, wherein the audio capture device determining a first noise signal in the first speech signal using the second vibration signal comprises:

the audio acquisition equipment determines a second noise signal in the first voice signal, which has the same frequency range as the second vibration signal, according to the second vibration signal; the frequency range of the first voice signal is larger than that of the second vibration signal;

and the audio acquisition equipment performs linear prediction on the second noise signal to obtain a first noise signal of the complete frequency range of the first voice signal.

13. An audio acquisition device, comprising: a communication interface, a memory, and a processor; the communication interface, the memory coupled with the processor, the memory for storing computer program code comprising computer instructions which, when read from the memory by the processor, cause the audio capturing device to perform the method of any of claims 8-12.

14. A computer storage medium comprising computer instructions that, when executed on an electronic device, cause the electronic device to perform the method of any of claims 8 to 12.

15. A computer program product, characterized in that it causes a computer to carry out the method according to any one of claims 8 to 12 when said computer program product is run on the computer.