CN118235431A - Spatial audio acquisition method and device - Google Patents
Spatial audio acquisition method and device
- Publication number
- CN118235431A (application CN202280004436.0A)
- Authority
- CN
- China
- Prior art keywords
- microphone
- spatial audio
- array
- arrays
- signals
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04R—LOUDSPEAKERS, MICROPHONES, GRAMOPHONE PICK-UPS OR LIKE ACOUSTIC ELECTROMECHANICAL TRANSDUCERS; DEAF-AID SETS; PUBLIC ADDRESS SYSTEMS
- H04R5/00—Stereophonic arrangements
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04S—STEREOPHONIC SYSTEMS
- H04S3/00—Systems employing more than two channels, e.g. quadraphonic
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L19/00—Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
- G10L19/008—Multichannel audio signal coding or decoding using interchannel correlation to reduce redundancy, e.g. joint-stereo, intensity-coding or matrixing
Landscapes
- Physics & Mathematics (AREA)
- Engineering & Computer Science (AREA)
- Acoustics & Sound (AREA)
- Signal Processing (AREA)
- Circuit For Audible Band Transducer (AREA)
Abstract
The disclosure provides a spatial audio acquisition method and device, relating to the technical field of mobile communication. In the method, multiple groups of mutually orthogonal microphone arrays are arranged in a UE, and differential beam processing is performed on the microphone signals acquired by the arrays to obtain a spatial audio signal. Because the mutually orthogonal arrays are built from miniature microphones and rely on beamforming, the spatial audio acquisition system can be kept small enough to be built into existing mobile smart devices. At the same time, the directivity of the microphone arrays is controlled by differential beamforming, which reduces the need for additional electroacoustic and acoustic hardware, so that a mobile smart device can capture immersive audio while its size is kept under control.
Description
The disclosure relates to the technical field of mobile communication, and in particular relates to a spatial audio acquisition method and device.
With the development of technology, spatial audio has found many applications in multimedia and instant messaging on consumer devices. However, spatial audio acquisition currently depends on external equipment and cannot be performed directly by a smart mobile device. Existing spatial audio acquisition equipment is also bulky and difficult to operate, and does not meet users' growing demand for high-quality audio and video capture.
Disclosure of Invention
The disclosure provides a spatial audio acquisition method and a device, which are used for solving the problem that a spatial audio acquisition system cannot be integrated in UE to perform effective and high-quality spatial audio acquisition in the prior art.
An embodiment of a first aspect of the present disclosure provides a spatial audio acquisition method performed by a user equipment (UE) in which multiple groups of microphone arrays are arranged, the maximum response directions of the groups being mutually orthogonal. The method includes: performing differential beam processing on the microphone signals acquired by the microphone arrays to obtain a spatial audio signal.
In some embodiments, performing differential beam processing on the microphone signals acquired by the microphone arrays to obtain a spatial audio signal includes: applying appropriate delay filters and corresponding compensation filters to the microphone signals to obtain array signals with the required directivity; and decoding the array signals to obtain the spatial audio signal.
In some embodiments, the method further comprises: a plurality of directivities of the microphone array are acquired, the directivities characterizing the sensitivity of the signals in different directions.
In some embodiments, the method further comprises: acquiring a plurality of directional differential arrays; and acquiring directivity required in a three-dimensional space through the combination of different differential arrays so as to acquire a spatial audio signal.
In some embodiments, the method further comprises: the spatial audio signal is decoded to output immersive multi-channel audio and/or ambisonic audio.
In some embodiments, the method further comprises: the microphone signal is subjected to a filtering process to obtain a low frequency component and a high frequency component, wherein the low frequency component is output as a low frequency effect and the high frequency component is used to form a spatial audio signal.
In some embodiments, the microphone array is arranged in the UE in any of the following ways: the microphone array is arranged at a position close to the voice acquisition component in the UE; the microphone array is disposed in the UE at a location proximate to the image acquisition component.
In some embodiments, the microphone arrays include a predetermined number of microphones forming three groups of microphone arrays. The three groups are mutually orthogonal, or deviate from orthogonality by an angle within a predetermined range, and their centers coincide or are separated by a distance that does not exceed an error threshold.
An embodiment of a second aspect of the present disclosure provides a spatial audio acquisition device configured in a user equipment (UE) in which multiple groups of microphone arrays are arranged, the maximum response directions of the groups being mutually orthogonal. The device includes a spatial audio signal acquisition module configured to perform differential beam processing on the microphone signals acquired by the microphone arrays to obtain a spatial audio signal.
An embodiment of a third aspect of the present disclosure provides a communication apparatus including a transceiver, a memory, and a processor connected to the transceiver and the memory. The processor is configured to control wireless signal transmission and reception by the transceiver by executing computer-executable instructions stored in the memory, and can implement the spatial audio acquisition method of the embodiment of the first aspect.
An embodiment of a fourth aspect of the present disclosure provides a computer storage medium storing computer-executable instructions which, when executed by a processor, implement the spatial audio acquisition method of the embodiment of the first aspect.
The embodiments of the disclosure provide a spatial audio acquisition method and device in which multiple groups of mutually orthogonal microphone arrays are arranged in a UE, and differential beam processing is performed on the microphone signals acquired by the arrays to obtain a spatial audio signal. Arranging mutually orthogonal microphone arrays makes it possible to control the directivity of the microphones while keeping the pickup system small enough to be built into existing mobile smart devices. The signals collected by the pickup system are processed by differential beamforming to capture spatial audio, which reduces the need for additional electroacoustic and acoustic hardware, so that a mobile smart device can capture immersive audio while its size is kept under control.
Additional aspects and advantages of the disclosure will be set forth in part in the description which follows, and in part will be obvious from the description, or may be learned by practice of the disclosure.
The foregoing and/or additional aspects and advantages of the present disclosure will become apparent and readily appreciated from the following description of the embodiments, taken in conjunction with the accompanying drawings, in which:
Fig. 1 is a flow chart of a spatial audio collection method according to an embodiment of the disclosure;
Fig. 2 is a flow chart of a spatial audio collection method according to an embodiment of the disclosure;
Fig. 3 is a schematic diagram of spatial audio acquisition logic according to an embodiment of the present disclosure;
Fig. 4 is a schematic diagram of a first-order differential array according to an embodiment of the present disclosure;
Fig. 5 is a schematic diagram of the directivity of a microphone array according to an embodiment of the disclosure;
Fig. 6 is a schematic diagram of the directivity of the decoded left and right channels according to an embodiment of the present disclosure;
Fig. 7 is a schematic diagram of a first-order B format according to an embodiment of the present disclosure;
Fig. 8 is a diagram of the first-order B format directional signal components according to an embodiment of the present disclosure;
Fig. 9 is a schematic diagram of an arrangement of a microphone array in a mobile device according to an embodiment of the disclosure;
Fig. 10 is a schematic diagram of an arrangement of a microphone array in a mobile device according to an embodiment of the disclosure;
Fig. 11 is a block diagram of a spatial audio acquisition device according to an embodiment of the present disclosure;
Fig. 12 is a schematic structural diagram of a communication device according to an embodiment of the disclosure;
Fig. 13 is a schematic structural diagram of a chip according to an embodiment of the disclosure.
Embodiments of the present disclosure are described in detail below, examples of which are illustrated in the accompanying drawings, wherein like or similar reference numerals refer to like or similar elements or elements having like or similar functions throughout. The embodiments described below by referring to the drawings are exemplary and intended for the purpose of explaining the present disclosure and are not to be construed as limiting the present disclosure.
With the development of technology, spatial audio has found many applications in consumer devices, and websites such as YouTube and Facebook support spatial audio content. For real-time communication, audio and video codecs such as AVS support the coding and decoding of spatial audio.
The existing spatial audio acquisition technology can achieve good audio quality and immersive sound field reproduction, but the acquisition of the current spatial audio depends on external equipment and cannot be directly acquired through intelligent mobile equipment. In addition, the existing spatial audio acquisition equipment has the problems of overlarge volume and difficult operation. The following table shows a prior art 3D spatial audio acquisition device:
It can be seen that the spatial audio acquisition devices in the related art cannot be built into existing mobile smart devices and do not meet users' growing demand for high-quality audio and video capture. Take the most common mobile smart device, the smartphone: its size is about 7 inches, for example the Xiaomi 12S Pro (length 163.6 mm, width 74.6 mm, thickness 8.16 mm). In addition, the hardware inside a mobile smart device is packed very tightly, so the volume available for a built-in audio acquisition system is very limited.
Therefore, the disclosure provides a spatial audio acquisition method and device, so as to solve the problem that a spatial audio acquisition system cannot be integrated in a UE to perform effective and high-quality spatial audio acquisition in the prior art.
The spatial audio acquisition method and the spatial audio acquisition device provided by the application are described in detail below with reference to the accompanying drawings.
Fig. 1 shows a flow diagram of a spatial audio acquisition method according to an embodiment of the present disclosure. The method may be performed by a User Equipment (UE). In the present disclosure, the UE is arranged with multiple groups of microphone arrays, and the maximum response directions of each group of arrays are orthogonal to each other. As shown in fig. 1, the method may include the following steps.
S101, differential beam processing is carried out on microphone signals acquired by a microphone array so as to acquire a space audio signal.
In embodiments of the present disclosure, multiple groups of mutually orthogonal microphone arrays are provided in the UE, and each microphone array may include multiple microphones. The present disclosure does not limit the microphone type. Omnidirectional miniature microphones, such as MEMS (micro-electro-mechanical system) microphones or electret microphones, may be used: they are small, have small errors, are well suited to integration in small devices such as user equipment, and are suitable for beam control. Using them keeps the pickup system small, so the volume of the spatial audio acquisition system can be greatly reduced compared with earlier spatial audio acquisition devices.
Common microphone beamforming techniques include delay-and-sum, filter-and-sum, adaptive beamforming (e.g., MVDR), and differential beamforming. Differential beamforming has the advantages of a compact layout and a frequency-invariant beam pattern. In embodiments of the present disclosure, the directivity of the multiple groups of microphone arrays may therefore be controlled by differential beamforming to help acquire the spatial audio signal. The directivities required in three-dimensional space are obtained through different differential beam designs and combinations of different low-order arrays, so as to capture the spatial audio signal. Because the scheme relies only on beamforming to control directivity, it effectively reduces the dependence of the pickup system on electroacoustic and acoustic hardware.
In summary, in the spatial audio acquisition method provided by the present disclosure, multiple groups of mutually orthogonal microphone arrays are arranged in the UE, and differential beam processing is performed on the microphone signals acquired by the arrays to obtain a spatial audio signal. Because the mutually orthogonal arrays are built from miniature microphones and rely on beamforming, the spatial audio acquisition system can be kept small enough to be built into existing mobile smart devices. At the same time, the directivity of the microphone arrays is controlled by differential beamforming, which reduces the need for additional electroacoustic and acoustic hardware, so that a mobile smart device can capture immersive audio while its size is kept under control.
Fig. 2 shows a flow diagram of a spatial audio acquisition method according to an embodiment of the disclosure. The method may be performed by a UE, and in an embodiment of the present disclosure, an arrangement of a microphone array is first described.
In some alternative embodiments, the microphone arrays include a predetermined number of microphones forming three groups of microphone arrays. The three groups are mutually orthogonal, or deviate from orthogonality by an angle within a predetermined range, and their centers coincide or are separated by a distance that does not exceed an error threshold.
In other words, the present disclosure does not limit the number of microphones in the microphone arrays, and each array may consist of any number of microphones. In a preferred embodiment, four MEMS microphones offer a cost-effective design: for example, they are arranged at four adjacent vertices of a regular hexahedron (cube), forming three mutually orthogonal microphone arrays.
The microphone type is not limited in the present disclosure. Miniature microphones (such as MEMS microphones) can be used to keep the pickup system small, so that the volume is greatly reduced compared with conventional spatial audio acquisition devices.
It should be appreciated that three microphone arrays arranged at orthogonal angles are one preferred form of the invention; in alternative embodiments the three arrays may deviate from orthogonality by some angle. Likewise, the centers of the three arrays ideally coincide; in alternative embodiments, any separation between the centers may be treated as an error. In practice, the microphone arrays on a device should be kept as close to mutually orthogonal as possible to reduce the interference caused by position errors. In addition, calibration is required according to the actual situation, since position errors, mismatch between microphones, and interference from the device itself all affect the final performance.
For example, the present disclosure uses omnidirectional miniature microphones with identical parameters to form 3 mutually orthogonal microphone pairs, with the midpoints of the lines connecting each pair coinciding. Because microphone signals can be shared between arrays, the microphone array required by the invention can be formed with as few as 4 microphones, which can be arranged at any 4 vertices of a regular hexahedron. Given the size limits of mobile smart devices, it is recommended to arrange the 4 microphones at 4 adjacent vertices of the regular hexahedron, with the main axes of the microphones pointing in the same direction and the spacing between microphones as small as possible.
In this example, a three-dimensional coordinate system is established with microphone 0 at the origin, microphone 1 on the x-axis, microphone 2 on the y-axis, and microphone 3 on the z-axis. The spacing between microphone 0 and each of microphones 1, 2, and 3 is equal, forming 3 orthogonal first-order differential pairs. Compared with traditional condenser and moving-coil microphones, miniature microphones have the advantage of small size: the spacing of the 3 pairs can be kept to 4 mm, which is much smaller than the shortest wavelength (about 1.7 cm, at 20 kHz) of the target signal (20 Hz to 20 kHz), so the error caused by the microphone spacing is negligible.
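The geometry described above can be checked numerically. The following is a minimal sketch that places the four microphones as described (the 4 mm spacing is taken from the text), derives the three pair axes, and compares the spacing with the wavelength at the top of the audio band. The speed of sound (343 m/s) and all function and variable names are assumptions made for illustration.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed room-temperature value
SPACING = 0.004         # m, microphone spacing from the text (4 mm)

# Microphone 0 at the origin; microphones 1, 2, 3 on the x, y, z axes.
MIC_POSITIONS = {
    0: (0.0, 0.0, 0.0),
    1: (SPACING, 0.0, 0.0),
    2: (0.0, SPACING, 0.0),
    3: (0.0, 0.0, SPACING),
}


def pair_axis(a, b):
    """Unit vector pointing from microphone a to microphone b."""
    diff = [q - p for p, q in zip(MIC_POSITIONS[a], MIC_POSITIONS[b])]
    norm = math.sqrt(sum(d * d for d in diff))
    return tuple(d / norm for d in diff)


def dot(u, v):
    """Dot product of two 3-vectors."""
    return sum(p * q for p, q in zip(u, v))


def wavelength(freq_hz):
    """Acoustic wavelength in metres at the given frequency."""
    return SPEED_OF_SOUND / freq_hz


# Axes of the three first-order pairs (0,1), (0,2), (0,3).
axes = [pair_axis(0, n) for n in (1, 2, 3)]
shortest = wavelength(20_000.0)

if __name__ == "__main__":
    print("pair axes:", axes)
    print("shortest wavelength [cm]:", round(shortest * 100, 2))
    print("spacing / wavelength:", round(SPACING / shortest, 3))
```

The three axes come out mutually orthogonal by construction, and the spacing is a fraction of the shortest wavelength, which is the condition the small-spacing approximation relies on.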
Based on the embodiment shown in fig. 1, the method may comprise the following steps, as shown in fig. 2.
S201, filtering processing is performed on the microphone signals acquired by the microphone array to acquire a low frequency component and a high frequency component.
In an embodiment of the present disclosure, the microphone signals acquired by the microphone array are filtered; the resulting low-frequency component is output as the low-frequency effect (LFE), and the high-frequency component is passed to subsequent processing to form the spatial audio signal. Fig. 3 shows the spatial audio acquisition logic described in the present disclosure.
It should be understood that a differential beam has a high-pass characteristic and therefore performs poorly at low frequencies. The raw signal of microphone 0 (one of the microphone signals acquired by the microphone array) is therefore passed through a low-pass filter so that only its low-frequency component is retained, and this component is used as the LFE channel. Because low-frequency components have long wavelengths, they contribute little to localization by the human ear, so this enhances the low-frequency effect without impairing the sense of space. The remaining channels are passed through a high-pass filter to remove the low-frequency components, and the resulting high-frequency components are used in subsequent processing to form the spatial audio signal.
S202, applying appropriate delay filters and corresponding compensation filters to the microphone signals to obtain array signals with the required directivity.
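The band split described above can be illustrated with a small sketch. It uses a first-order (one-pole) low-pass filter and takes the high band as the complement of the low band. The filter type, the 120 Hz cutoff, the 48 kHz sample rate, and all names are assumptions for illustration; the document does not specify the filters.

```python
import math


def one_pole_lowpass(samples, cutoff_hz, sample_rate):
    """First-order IIR low-pass: y[n] = y[n-1] + a * (x[n] - y[n-1])."""
    a = 1.0 - math.exp(-2.0 * math.pi * cutoff_hz / sample_rate)
    out, state = [], 0.0
    for x in samples:
        state += a * (x - state)
        out.append(state)
    return out


def split_bands(samples, cutoff_hz, sample_rate):
    """Return (low, high); high is the input minus the low-pass output."""
    low = one_pole_lowpass(samples, cutoff_hz, sample_rate)
    high = [x - l for x, l in zip(samples, low)]
    return low, high


def rms(samples):
    """Root-mean-square level of a sample list."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def tone(freq_hz, sample_rate, n):
    """Unit-amplitude sine test tone."""
    return [math.sin(2.0 * math.pi * freq_hz * i / sample_rate) for i in range(n)]


SAMPLE_RATE = 48_000   # Hz, assumed
CUTOFF_HZ = 120.0      # Hz, assumed LFE crossover frequency

if __name__ == "__main__":
    for freq in (30.0, 5_000.0):
        low, high = split_bands(tone(freq, SAMPLE_RATE, SAMPLE_RATE), CUTOFF_HZ, SAMPLE_RATE)
        print(freq, "Hz -> low RMS", round(rms(low), 3), "high RMS", round(rms(high), 3))
```

In this sketch the low band of microphone 0 would feed the LFE channel, and the high band of every microphone would go on to the differential beam stage. A steeper filter would separate the bands more cleanly; the one-pole form is chosen only to keep the example short.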
It should be understood that, for the high frequency component obtained in step S201, an appropriate delay filter and a corresponding compensation filter may be added to obtain an array signal of a desired directivity.
S203, acquiring a plurality of directivities of the microphone array.
In embodiments of the present disclosure, directivity characterizes the sensitivity to signals from different directions. The present disclosure obtains differential arrays with several directivities and combines them to obtain the directivity required in three-dimensional space. Fig. 4 shows a schematic diagram of a first-order differential array.
Specifically, the above steps S202 to S203 are described in detail below.
A standard first-order differential array obtains the target signal by subtracting the signals of two microphones whose main axes point in the same direction. Directivity is controlled by applying a frequency-invariant delay to the subtracted microphone signal.
First, let $\tau_0 = \delta / c$, where $\delta$ is the microphone spacing and $c$ is the speed of sound.
The output compensation filter is denoted $H_L(\omega)$, where $\omega$ is the angular frequency and $\alpha_{1,1}$ is the delay filter coefficient.
Thus, for a signal arriving at angle $\theta$ (the angle of incidence of the sound source at the microphones), the signal output by the array (i.e., the array signal described above) is expressed as $Y(\omega, \theta) = (X_1(\omega, \theta) - X_2(\omega, \theta)) H_L(\omega)$, where $X_n(\omega, \theta)$ is the signal of the $n$-th microphone.
Since the microphone spacing is much smaller than the wavelength, $\omega(\tau_0 - \alpha_{1,1}\tau_0) \ll 2\pi$, the amplitude difference between $X_1$ and $X_2$ is negligible, and $e^x \approx 1 + x$.
Under these approximations, the array output $Y(\omega, \theta)$ simplifies, and the directivity of the array (its sensitivity to signals from different directions) depends only on $\theta$ and $\alpha_{1,1}$.
The two most common directivities are (main axis at 90°):
Directivity | $\alpha_{1,1}$ | Angle of zero sensitivity (null) |
Figure-8 (dipole) | 0 | 0°, 180° |
Cardioid | -1 | -90° |
The direction of the differential beam can be controlled by controlling the delay filter coefficients.
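The relations above can be explored with a short numerical sketch of a two-microphone first-order differential array. It is an illustration under stated assumptions rather than the document's exact formulation: the compensation filter is taken to be the common textbook form $1/(j\omega\tau_0(1-\alpha_{1,1}))$, the delay is applied to the rear microphone as the factor $e^{j\omega\tau_0\alpha_{1,1}}$, the main axis is at 90°, and the speed of sound and all names are chosen for the example. With these choices the small-spacing pattern is $(\sin\theta - \alpha_{1,1})/(1 - \alpha_{1,1})$, which reproduces the nulls listed in the table.

```python
import cmath
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed


def array_response(theta_deg, alpha, freq_hz, spacing_m=0.004):
    """Complex output of a two-microphone first-order differential array
    for a unit plane wave arriving from angle theta (main axis at 90 degrees)."""
    omega = 2.0 * math.pi * freq_hz
    tau0 = spacing_m / SPEED_OF_SOUND
    s = math.sin(math.radians(theta_deg))  # projection onto the main axis
    # Microphone 1 is the front microphone, microphone 2 the rear one.
    x1 = cmath.exp(1j * omega * tau0 * s / 2.0)
    x2 = cmath.exp(-1j * omega * tau0 * s / 2.0)
    # Delay term applied to microphone 2 before the subtraction.
    delayed_x2 = cmath.exp(1j * omega * tau0 * alpha) * x2
    # Assumed textbook compensation filter; it flattens the high-pass slope.
    h_l = 1.0 / (1j * omega * tau0 * (1.0 - alpha))
    return (x1 - delayed_x2) * h_l


def ideal_pattern(theta_deg, alpha):
    """Small-spacing approximation obtained with e^x ~ 1 + x."""
    s = math.sin(math.radians(theta_deg))
    return (s - alpha) / (1.0 - alpha)


if __name__ == "__main__":
    for name, alpha in (("figure-8", 0.0), ("cardioid", -1.0)):
        for theta in (90, 0, -90, 180):
            exact = abs(array_response(theta, alpha, 1_000.0))
            print(name, theta, round(exact, 4), round(abs(ideal_pattern(theta, alpha)), 4))
```

At 1 kHz and 4 mm spacing the exact magnitude and the approximation agree closely, which is the point of keeping the spacing far below the wavelength. Changing only `alpha` moves the null, which is how the delay coefficient steers the beam.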
S204, decoding the array signal to obtain a spatial audio signal.
In the embodiments of the present disclosure, according to the principle of differential array, 3 pairs of microphones may constitute the following 5 first order differential arrays of different directivities:
No. | Main-axis direction | Directivity | Microphones used |
Array 1 | Positive X-axis | Cardioid | Microphone 0, microphone 1 |
Array 2 | Negative X-axis | Cardioid | Microphone 0, microphone 1 |
Array 3 | Positive Y-axis | Figure-8 | Microphone 0, microphone 2 |
Array 4 | Positive Z-axis | Figure-8 | Microphone 0, microphone 3 |
Array 5 | Positive X-axis | Figure-8 | Microphone 0, microphone 1 |
And acquiring the directivity required in the three-dimensional space through the combination of different order arrays, thereby acquiring the spatial audio signal.
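The five arrays in the table can be written down as a small configuration and evaluated with the ideal first-order pattern. This is a sketch under assumptions: the pattern $(\cos\gamma - \alpha)/(1 - \alpha)$, with $\gamma$ the angle between the arrival direction and the array's main axis, is the standard idealized first-order model, with $\alpha = -1$ for a cardioid and $\alpha = 0$ for a figure-8. The dictionary layout and function names are hypothetical.

```python
import math

# Array number -> main-axis unit vector, delay coefficient, microphone pair.
# alpha = -1 gives a cardioid, alpha = 0 a figure-8 (ideal first-order model).
ARRAYS = {
    1: {"axis": (1.0, 0.0, 0.0), "alpha": -1.0, "mics": (0, 1)},
    2: {"axis": (-1.0, 0.0, 0.0), "alpha": -1.0, "mics": (0, 1)},
    3: {"axis": (0.0, 1.0, 0.0), "alpha": 0.0, "mics": (0, 2)},
    4: {"axis": (0.0, 0.0, 1.0), "alpha": 0.0, "mics": (0, 3)},
    5: {"axis": (1.0, 0.0, 0.0), "alpha": 0.0, "mics": (0, 1)},
}


def direction(azimuth_deg, elevation_deg):
    """Unit vector toward the source; azimuth is measured from +X toward +Y."""
    az, el = math.radians(azimuth_deg), math.radians(elevation_deg)
    return (math.cos(az) * math.cos(el), math.sin(az) * math.cos(el), math.sin(el))


def sensitivity(array_id, azimuth_deg, elevation_deg=0.0):
    """Ideal first-order sensitivity of one array toward a direction."""
    cfg = ARRAYS[array_id]
    u = direction(azimuth_deg, elevation_deg)
    cos_angle = sum(a * b for a, b in zip(cfg["axis"], u))
    return (cos_angle - cfg["alpha"]) / (1.0 - cfg["alpha"])


if __name__ == "__main__":
    for array_id in ARRAYS:
        row = [round(sensitivity(array_id, az), 3) for az in (0, 90, 180, 270)]
        print("array", array_id, "azimuth 0/90/180/270:", row)
```

Arrays 1, 2, and 5 all use the same microphone pair and differ only in the delay coefficient and the sign of the axis, which is why three pairs are enough to produce five directivities.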
S205, performing decoding processing on the spatial audio signal to output immersive multi-channel audio and/or ambisonic audio.
In embodiments of the present disclosure, the audio signals required for spatial audio are obtained through different differential beam designs. For example, different audio formats such as multi-channel audio and ambisonic audio (B-format) may be output; multi-channel audio and ambisonic audio are two formats of immersive (surround) sound.
For example, in an alternative embodiment, an M/S-3D recording system is constructed according to the M/S (mid/side) recording principle, and 5.1.4-channel multi-channel audio is output by decoding the spatial audio signal. The two cardioid arrays have opposite directivities, pointing in the positive and negative X-axis directions, and the two figure-8 arrays point in the positive Y-axis and positive Z-axis directions respectively.
The decoding used to obtain the multi-channel audio is shown below, where "+" indicates that the signal is added and "-" indicates that the signal is added with inverted phase (subtracted).
Channel | Array 1 | Array 2 | Array 3 | Array 4 |
Left | + | + | - | |
Center | + | |||
Right | + | - | - | |
Left surround | + | + | - | |
Right surround | + | - | - | |
Top front left | + | + | + | |
Top front right | + | + | ||
Top rear left | + | + | + | |
Top rear right | + | - | + |
The microphone arrays of the present invention form the five arrays listed above. Fig. 5(a) shows the directivity of array 1 in the xoy plane, Fig. 5(b) shows the directivity of array 3 in the xoy plane, and Fig. 5(c) shows the directivity of array 4 in the xoz plane. In Figs. 5(b) and 5(c), + denotes positive phase and - denotes negative phase, and identical signals of opposite phase cancel each other.
The directivity of the left channel and the right channel after decoding is shown in fig. 6 in a coordinate axis plane section, wherein the left channel is shown in fig. 6 (a), and the right channel is shown in fig. 6 (b).
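How a left/right pair arises from a cardioid and a figure-8 can be sketched with the generic M/S relation, left = mid + side and right = mid - side. The sketch below uses ideal patterns in the xoy plane, with a forward cardioid along +X as the mid signal and a figure-8 along +Y as the side signal. Treating +Y as the left side is an assumption about device orientation, the sign pattern is the generic M/S convention rather than a transcription of the decoding table, and all names are hypothetical.

```python
import math


def cardioid(azimuth_deg, axis_deg):
    """Ideal cardioid sensitivity with its main axis at axis_deg."""
    return 0.5 * (1.0 + math.cos(math.radians(azimuth_deg - axis_deg)))


def figure8(azimuth_deg, axis_deg):
    """Ideal figure-8 sensitivity with its positive lobe at axis_deg."""
    return math.cos(math.radians(azimuth_deg - axis_deg))


def decode_left_right(azimuth_deg):
    """Generic M/S decode: left = mid + side, right = mid - side."""
    mid = cardioid(azimuth_deg, 0.0)    # forward cardioid along +X
    side = figure8(azimuth_deg, 90.0)   # figure-8 along +Y (assumed left)
    return mid + side, mid - side


if __name__ == "__main__":
    for az in (-90, -45, 0, 45, 90):
        left, right = decode_left_right(az)
        print(az, round(left, 3), round(right, 3))
```

The two decoded patterns are mirror images of each other about the X-axis: a source on the +Y side is louder in the left channel, and a source straight ahead is equal in both.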
In another embodiment, the present disclosure may output standard ambisonic audio. It should be appreciated that the first-order B-format is a first-order spherical harmonic decomposition, as shown in Fig. 7. A standard B-format signal requires one omnidirectional signal (W) and three mutually orthogonal figure-8 signals (X, Y, Z). By choosing the corresponding arrays, the four components required for the B-format can be expressed as: W = microphone 0; X = array 1; Y = array 2; Z = array 5, as shown in Fig. 8.
Therefore, the method and the device acquire the audio signals in different formats by decoding the spatial audio signals so as to meet the diversity requirement of the spatial audio acquisition.
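The role of the four first-order B-format components can be illustrated with ideal gains for a single plane wave: an omnidirectional W and three figure-8 components along the X, Y, and Z axes. This is a generic sketch, not the document's decoder. W is left unscaled (some B-format conventions attenuate it), and the function names and angle conventions (azimuth from +X toward +Y, elevation toward +Z) are assumptions.

```python
import math


def encode_first_order(azimuth_deg, elevation_deg):
    """Ideal first-order gains (W, X, Y, Z) for a plane wave from a direction."""
    az, el = math.radians(azimuth_deg), math.radians(elevation_deg)
    w = 1.0                            # omnidirectional component, unscaled
    x = math.cos(az) * math.cos(el)    # figure-8 along the X-axis
    y = math.sin(az) * math.cos(el)    # figure-8 along the Y-axis
    z = math.sin(el)                   # figure-8 along the Z-axis
    return w, x, y, z


def estimate_direction(w, x, y, z):
    """Recover (azimuth, elevation) in degrees from the three figure-8 gains."""
    azimuth = math.degrees(math.atan2(y, x))
    elevation = math.degrees(math.atan2(z, math.hypot(x, y)))
    return azimuth, elevation


if __name__ == "__main__":
    components = encode_first_order(30.0, 20.0)
    print("W, X, Y, Z:", [round(c, 4) for c in components])
    print("estimated direction:", estimate_direction(*components))
```

Because the three figure-8 components are mutually orthogonal, their gains form the unit direction vector of the source, so the direction can be read back from them directly.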
Furthermore, in an alternative example, the arrangement of the microphone array in the UE may be laid out according to actual needs.
In one example, the microphone array is disposed in the UE in proximity to the voice acquisition component when hand-held call requirements are compromised. For example, the microphone array is arranged at the lower end of the mobile intelligent device and is closer to the position of the mouth of a person, so that better signal-to-noise ratio is ensured, as shown in fig. 9, which shows a schematic layout of the microphone array in the mobile device, wherein fig. 9 (a) is a back schematic view of the mobile device, and fig. 9 (b) is a front schematic view.
In another example, the microphone array is disposed in the UE in close proximity to the image acquisition component when video effects are compromised. For example, the array is disposed close to the camera and is aligned with the forward direction of the camera. The visual angle of the camera is ensured to be consistent as much as possible, so that better audio-visual effect is ensured. Fig. 10 shows a schematic diagram of the arrangement of the microphone array in the mobile device, where fig. 10 (a) is a schematic diagram of the back side of the mobile device and fig. 10 (b) is a schematic diagram of the front side.
In summary, according to the spatial audio acquisition method provided by the present disclosure, a plurality of groups of mutually orthogonal microphone arrays are arranged in the UE, and differential beam processing is performed on microphone signals acquired by the microphone arrays, so as to acquire spatial audio signals. The size of the spatial audio acquisition system can be controlled within a certain size by using the mutually orthogonal microphone arrays formed by the miniature microphones applying the beam technology so as to be arranged in the existing mobile equipment to form a pickup system capable of internally arranging the mobile intelligent equipment, meanwhile, the directivity of the microphone arrays is controlled by the differential beam technology, and the requirements of extra electroacoustic and acoustic hardware are reduced, so that the requirements of the mobile intelligent equipment on acquisition of immersive audio are met under the condition of controlling the volume of the equipment. In addition, different application requirements can be met by outputting audio in different formats, and the present disclosure can accommodate different application scenarios by arranging microphone arrays at different locations within a mobile device.
In the embodiments of the present application, the method provided by the embodiments is described from the perspective of the user equipment. To implement the functions in the method provided by the embodiments of the present application, the user equipment may include a hardware structure and/or a software module, and may implement the functions in the form of a hardware structure, a software module, or a hardware structure plus a software module. Any one of the functions described above may be implemented by a hardware structure, a software module, or a combination of a hardware structure and a software module.
Corresponding to the spatial audio acquisition methods provided in the above embodiments, the present disclosure further provides a spatial audio acquisition device. Since the spatial audio acquisition device provided in the embodiments of the present disclosure corresponds to the spatial audio acquisition methods provided in the above embodiments, the implementations of the spatial audio acquisition method also apply to the spatial audio acquisition device and are not described in detail again in this embodiment.
Fig. 11 is a schematic structural diagram of a spatial audio acquisition device 1100 according to an embodiment of the present disclosure. The spatial audio acquisition device 1100 is disposed in a UE, multiple groups of microphone arrays are disposed in the UE, and the maximum response directions of the multiple groups of microphone arrays are orthogonal to each other.
As shown in fig. 11, the device 1100 includes a spatial audio signal acquisition module 1110, which is configured to perform differential beam processing on the microphone signals acquired by the microphone arrays, so as to acquire a spatial audio signal.
In some embodiments, the spatial audio signal acquisition module 1110 is further configured to: apply suitable delay filtering and corresponding compensation filters to the microphone signals to obtain array signals with the required directivity; and decode the array signals to obtain the spatial audio signal.
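By way of non-limiting illustration, the delay filtering and compensation filtering described above can be sketched for a single pair of microphones. The sketch below is an assumption about one possible realization and is not taken from the disclosure: the function name `differential_beam`, the parameters `spacing`, `tau`, and `floor`, the speed of sound of 343 m/s, and the block-wise FFT processing (which treats the block as periodic) are all hypothetical choices. The rear microphone signal is delayed by `tau` and subtracted from the front signal, and the result is divided by the on-axis magnitude response of the pair so that the high-pass tilt of the difference is equalized.

```python
import numpy as np

SPEED_OF_SOUND = 343.0  # m/s, assumed value for air at room temperature


def differential_beam(x_front, x_rear, fs, spacing, tau, floor=1e-2):
    """First-order delay-and-subtract beam for one microphone pair.

    x_front, x_rear: equal-length sample blocks from the two microphones.
    spacing: distance between the microphones in metres.
    tau: internal delay in seconds (0 gives a dipole, spacing/c a cardioid).
    floor: lower bound on the compensation denominator (limits noise gain).
    """
    n = len(x_front)
    omega = 2.0 * np.pi * np.fft.rfftfreq(n, d=1.0 / fs)
    front = np.fft.rfft(x_front)
    rear = np.fft.rfft(x_rear)
    # Delay filtering: delay the rear signal by tau, then take the difference.
    diff = front - np.exp(-1j * omega * tau) * rear
    # Compensation filter: magnitude of the response to an on-axis plane wave.
    on_axis = 2.0 * np.abs(np.sin(0.5 * omega * (tau + spacing / SPEED_OF_SOUND)))
    return np.fft.irfft(diff / np.maximum(on_axis, floor), n=n)
```

With `tau` equal to `spacing / SPEED_OF_SOUND`, a plane wave arriving from the rear of the pair is cancelled, while a wave arriving from the front is passed with approximately unit gain after compensation. The `floor` parameter trades low-frequency flatness against amplification of microphone self-noise.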
In some embodiments, the spatial audio signal acquisition module 1110 is further configured to: acquire a plurality of directivities of the microphone array, where a directivity characterizes the sensitivity to signals arriving from different directions.
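As an illustrative example of what a directivity expresses, the sensitivity of a first-order differential pair can be written as a function of the arrival angle. The following sketch assumes the standard first-order pattern `alpha + (1 - alpha) * cos(theta)` and the small-spacing (low-frequency) relation between the internal delay and `alpha`; the function names, the 343 m/s speed of sound, and the -60 dB display floor are hypothetical and are not specified in the disclosure.

```python
import numpy as np

SPEED_OF_SOUND = 343.0  # m/s, assumed


def first_order_pattern(theta, alpha):
    """Relative sensitivity at angle theta (radians) from the array axis.

    alpha = 1 is omnidirectional, 0.5 a cardioid, 0 a dipole (figure-eight).
    """
    return alpha + (1.0 - alpha) * np.cos(theta)


def alpha_from_delay(tau, spacing, c=SPEED_OF_SOUND):
    """Pattern parameter produced by an internal delay tau (seconds).

    Valid when the spacing is small compared with the wavelength.
    """
    return tau / (tau + spacing / c)


def sensitivity_db(theta, alpha, floor_db=-60.0):
    """Sensitivity in decibels, clipped at floor_db so nulls stay finite."""
    gain = np.abs(first_order_pattern(theta, alpha))
    return np.maximum(20.0 * np.log10(np.maximum(gain, 1e-12)), floor_db)
```

For instance, an internal delay equal to the acoustic travel time across the pair gives `alpha = 0.5`, that is, a cardioid with full sensitivity on axis and a null directly behind the pair.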
In some embodiments, the spatial audio signal acquisition module 1110 is further configured to: acquire differential arrays having the plurality of directivities; and obtain the directivity required in three-dimensional space by combining different differential arrays, so as to acquire the spatial audio signal.
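One way to picture the combination of different differential arrays is to treat the three orthogonal arrays as dipoles along the x, y, and z axes and to add an omnidirectional component. The sketch below is a simplified assumption rather than the method of the disclosure: the signal names `w`, `x`, `y`, `z`, the unnormalized gains, and the functions `encode_plane_wave` and `virtual_microphone` are hypothetical. A weighted sum of the four signals yields a first-order pattern pointing in an arbitrary direction in three-dimensional space.

```python
import numpy as np


def unit_vector(azimuth, elevation):
    """Unit direction vector for angles given in radians."""
    return np.array([
        np.cos(elevation) * np.cos(azimuth),
        np.cos(elevation) * np.sin(azimuth),
        np.sin(elevation),
    ])


def encode_plane_wave(s, azimuth, elevation):
    """Ideal omni (w) and orthogonal dipole (x, y, z) signals for one source."""
    s = np.asarray(s, dtype=float)
    v = unit_vector(azimuth, elevation)
    return s, s * v[0], s * v[1], s * v[2]


def virtual_microphone(w, x, y, z, azimuth, elevation, alpha=0.5):
    """Steer a first-order pattern toward (azimuth, elevation).

    alpha = 0.5 gives a cardioid; for a source at angle g from the look
    direction the gain is alpha + (1 - alpha) * cos(g).
    """
    u = unit_vector(azimuth, elevation)
    return alpha * w + (1.0 - alpha) * (u[0] * x + u[1] * y + u[2] * z)
```

Because the look direction enters only through the weights, the same four signals can be reused to form any number of beams after capture, without moving the microphones.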
In some embodiments, the spatial audio signal acquisition module 1110 is further configured to: decode the spatial audio signal to output immersive multi-channel audio and/or Ambisonic audio.
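For illustration only, decoding to a multi-channel format can be sketched as pointing one virtual cardioid at each loudspeaker of a target layout. The square four-loudspeaker layout, the row order (omni, x dipole, y dipole, z dipole) of the input, and the names `decoding_matrix` and `decode` are assumptions made for this sketch; the disclosure does not prescribe a particular layout or decoder.

```python
import numpy as np

# Hypothetical horizontal layout: loudspeaker azimuths in degrees,
# measured counterclockwise from the front.
QUAD_AZIMUTHS_DEG = (45.0, 135.0, -135.0, -45.0)


def decoding_matrix(azimuths_deg, alpha=0.5):
    """One row per loudspeaker: weights for the omni, x, y and z signals."""
    az = np.radians(np.asarray(azimuths_deg, dtype=float))
    return np.column_stack([
        np.full(az.shape, alpha),
        (1.0 - alpha) * np.cos(az),
        (1.0 - alpha) * np.sin(az),
        np.zeros(az.shape),  # horizontal layout: the height signal is unused
    ])


def decode(first_order_signals, azimuths_deg=QUAD_AZIMUTHS_DEG, alpha=0.5):
    """Map a (4, n_samples) array of first-order signals to loudspeaker feeds."""
    signals = np.asarray(first_order_signals, dtype=float)
    return decoding_matrix(azimuths_deg, alpha) @ signals
```

In this sketch, the Ambisonic-style output corresponds to passing the four first-order signals through unchanged, while the multi-channel output is the matrix product above; a practical decoder would typically add layout-specific gain normalization.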
In some embodiments, the spatial audio signal acquisition module 1110 is further configured to: filter the microphone signal to obtain a low-frequency component and a high-frequency component, where the low-frequency component is output as a low-frequency effect and the high-frequency component is used to form the spatial audio signal.
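The band split described above can be illustrated with a minimal complementary crossover. The one-pole low-pass filter, the 120 Hz crossover frequency, and the name `split_bands` are assumptions chosen for brevity; the disclosure does not state a filter type or a crossover frequency.

```python
import numpy as np


def split_bands(x, fs, crossover_hz=120.0):
    """Split x into (low, high) components that sum back to x exactly.

    low: one-pole low-pass output, usable as a low-frequency-effect feed.
    high: the remainder, which would go on to the differential beam processing.
    """
    x = np.asarray(x, dtype=float)
    k = 1.0 - np.exp(-2.0 * np.pi * crossover_hz / fs)  # smoothing coefficient
    low = np.empty_like(x)
    state = 0.0
    for i, sample in enumerate(x):
        state += k * (sample - state)
        low[i] = state
    return low, x - low
```

Deriving the high band by subtraction keeps the two components exactly complementary, which is convenient for a sketch; a steeper crossover would separate the bands more sharply at the cost of additional phase handling.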
In some embodiments, the microphone array is arranged in the UE in either of the following ways: the microphone array is disposed in the UE at a position close to the voice acquisition component; or the microphone array is disposed in the UE at a position close to the image acquisition component.
In some embodiments, the microphone arrays include a predetermined number of microphones forming three groups of microphone arrays. The three groups of microphone arrays are mutually orthogonal, or their angular deviation from orthogonality is within a predetermined range, and the centers of the three groups of microphone arrays coincide or are separated by a distance that does not exceed an error threshold.
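The geometric condition above can be checked numerically from the microphone coordinates. In the following sketch, each group is reduced to its two end microphones, and the tolerance values (5 degrees and 2 mm) as well as the name `check_array_geometry` are purely hypothetical, since the disclosure leaves the predetermined range and the error threshold unspecified.

```python
import numpy as np


def check_array_geometry(pairs, max_angle_error_deg=5.0, max_center_offset=0.002):
    """Return True if the groups are near-orthogonal and share a center.

    pairs: array-like of shape (3, 2, 3) holding, for each group, the
    coordinates in metres of its two end microphones.
    """
    pairs = np.asarray(pairs, dtype=float)
    axes = pairs[:, 1] - pairs[:, 0]
    axes /= np.linalg.norm(axes, axis=1, keepdims=True)
    centers = pairs.mean(axis=1)
    for i in range(len(pairs)):
        for j in range(i + 1, len(pairs)):
            # Deviation of the angle between the two axes from 90 degrees.
            cos_angle = abs(float(np.dot(axes[i], axes[j])))
            error_deg = np.degrees(np.arcsin(min(cos_angle, 1.0)))
            if error_deg > max_angle_error_deg:
                return False
            if np.linalg.norm(centers[i] - centers[j]) > max_center_offset:
                return False
    return True
```

Such a check could be run on a candidate layout at design time to confirm that the placement constraints of the housing still leave the arrays within the chosen tolerances.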
According to the spatial audio acquisition device provided by the present disclosure, a plurality of groups of mutually orthogonal microphone arrays are arranged in the UE, and differential beam processing is performed on the microphone signals acquired by the microphone arrays to acquire spatial audio signals. Because the microphone arrays are formed from miniature microphones to which beam technology is applied, the size of the spatial audio acquisition system can be kept within a certain limit while the directivity of the microphones is controlled, so that the system can be built into existing mobile devices and forms a pickup system that can be built into a mobile smart device. The directivity of the signals acquired by the pickup system is controlled by the differential beam technology, which reduces the need for additional electroacoustic and acoustic hardware, so that the mobile smart device's need to acquire immersive audio is met while the volume of the device is kept under control. In addition, different application requirements can be met by outputting audio in different formats, and different application scenarios can be accommodated by arranging the microphone arrays at different positions in the mobile device.
Referring to fig. 12, fig. 12 is a schematic structural diagram of a communication device 1200 according to an embodiment of the application. The communication device 1200 may be a network device or a user equipment; it may also be a chip, a chip system, a processor, or the like that supports the network device in implementing the above method, or a chip, a chip system, a processor, or the like that supports the user equipment in implementing the above method. The device can be used to implement the method described in the above method embodiments; for details, refer to the description in the method embodiments.
The communication device 1200 may include one or more processors 1201. The processor 1201 may be a general-purpose processor, a special-purpose processor, or the like, for example a baseband processor or a central processing unit. The baseband processor may be used to process communication protocols and communication data, and the central processing unit may be used to control the communication device (e.g., a base station, a baseband chip, a terminal equipment chip, a DU, or a CU), execute computer programs, and process data of the computer programs.
Optionally, the communication device 1200 may further include one or more memories 1202, on which a computer program 1204 may be stored, and the processor 1201 executes the computer program 1204, so that the communication device 1200 performs the method described in the above method embodiments. Optionally, the memory 1202 may also have data stored therein. The communication device 1200 and the memory 1202 may be provided separately or may be integrated.
Optionally, the communication device 1200 may further include a transceiver 1205 and an antenna 1206. The transceiver 1205 may be referred to as a transceiver unit, a transceiver circuit, or the like, and is used to implement the transceiving function. The transceiver 1205 may include a receiver and a transmitter: the receiver may be referred to as a receiving unit, a receiving circuit, or the like, and is used to implement the receiving function; the transmitter may be referred to as a transmitting unit, a transmitting circuit, or the like, and is used to implement the transmitting function.
Optionally, one or more interface circuits 1207 may also be included in the communications device 1200. The interface circuit 1207 is configured to receive code instructions and transmit the code instructions to the processor 1201. The processor 1201 executes code instructions to cause the communication apparatus 1200 to perform the method described in the method embodiments described above.
In one implementation, a transceiver for implementing the receive and transmit functions may be included in the processor 1201. For example, the transceiver may be a transceiver circuit, or an interface circuit. The transceiver circuitry, interface or interface circuitry for implementing the receive and transmit functions may be separate or may be integrated. The transceiver circuit, interface or interface circuit may be used for reading and writing codes/data, or the transceiver circuit, interface or interface circuit may be used for transmitting or transferring signals.
In one implementation, the processor 1201 may store a computer program 1203, where the computer program 1203 runs on the processor 1201, and may cause the communication apparatus 1200 to perform the method described in the above method embodiment. The computer program 1203 may be solidified in the processor 1201, in which case the processor 1201 may be implemented in hardware.
In one implementation, the communication apparatus 1200 may include circuitry that may implement the functions of transmitting, receiving, or communicating in the foregoing method embodiments. The processors and transceivers described in this disclosure may be implemented on integrated circuits (ICs), analog ICs, radio frequency integrated circuits (RFICs), mixed-signal ICs, application-specific integrated circuits (ASICs), printed circuit boards (PCBs), electronic devices, and the like. The processor and transceiver may also be fabricated using a variety of IC process technologies, such as complementary metal oxide semiconductor (CMOS), N-type metal oxide semiconductor (NMOS), P-type metal oxide semiconductor (PMOS), bipolar junction transistor (BJT), bipolar CMOS (BiCMOS), silicon germanium (SiGe), and gallium arsenide (GaAs).
The communication apparatus described in the above embodiment may be a network device or a user device, but the scope of the communication apparatus described in the present application is not limited thereto, and the structure of the communication apparatus may not be limited by fig. 12. The communication means may be a stand-alone device or may be part of a larger device. For example, the communication device may be:
(1) A stand-alone integrated circuit IC, or chip, or a system-on-a-chip or subsystem;
(2) A set of one or more ICs, optionally including storage means for storing data, a computer program;
(3) An ASIC, such as a Modem (Modem);
(4) Modules that may be embedded within other devices;
(5) A receiver, a terminal device, an intelligent terminal device, a cellular phone, a wireless device, a handset, a mobile unit, a vehicle-mounted device, a network device, a cloud device, an artificial intelligent device, and the like;
(6) Others, and so on.
For the case where the communication device may be a chip or a chip system, reference may be made to the schematic structural diagram of the chip shown in fig. 13. The chip shown in fig. 13 includes a processor 1301 and an interface 1302. Wherein the number of processors 1301 may be one or more, and the number of interfaces 1302 may be a plurality.
Optionally, the chip further comprises a memory 1303, the memory 1303 being configured to store necessary computer programs and data.
Those of skill in the art will further appreciate that the various illustrative logical blocks and steps described in connection with the embodiments of the application may be implemented by electronic hardware, computer software, or combinations of both. Whether such functionality is implemented as hardware or software depends upon the particular application and design requirements of the overall system. Those skilled in the art may implement the functionality in a variety of ways for each particular application, but such implementation should not be construed as beyond the scope of the embodiments of the present application.
The application also provides a readable storage medium having stored thereon instructions which when executed by a computer perform the functions of any of the method embodiments described above.
The application also provides a computer program product which, when executed by a computer, implements the functions of any of the method embodiments described above.
The above embodiments may be implemented in whole or in part by software, hardware, firmware, or any combination thereof. When implemented in software, they may be implemented in whole or in part in the form of a computer program product. The computer program product comprises one or more computer programs. When the computer program is loaded and executed on a computer, the flows or functions according to the embodiments of the present application are produced in whole or in part. The computer may be a general-purpose computer, a special-purpose computer, a computer network, or another programmable apparatus. The computer program may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another; for example, the computer program may be transmitted from one website, computer, server, or data center to another website, computer, server, or data center by wired (e.g., coaxial cable, optical fiber, digital subscriber line (DSL)) or wireless (e.g., infrared, radio, microwave) means. The computer-readable storage medium can be any available medium that can be accessed by a computer, or a data storage device, such as a server or data center, that integrates one or more available media. The available medium may be a magnetic medium (e.g., a floppy disk, a hard disk, a magnetic tape), an optical medium (e.g., a high-density digital video disc (DVD)), or a semiconductor medium (e.g., a solid-state disk (SSD)).
Those of ordinary skill in the art will appreciate that: the first, second, etc. numbers referred to in the present application are merely for convenience of description and are not intended to limit the scope of the embodiments of the present application, but also to indicate the sequence.
In the present application, "at least one" may also be described as "one or more", and "a plurality" may be two, three, four, or more; the present application is not limited in this respect. In the embodiments of the application, where technical features of a given kind are distinguished by "first", "second", "third", "A", "B", "C", "D", and the like, the technical features so described have no order of precedence or of magnitude.
As used herein, the terms "machine-readable medium" and "computer-readable medium" refer to any computer program product, apparatus, and/or device (e.g., magnetic disks, optical disks, memory, programmable logic devices (PLDs)) used to provide machine instructions and/or data to a programmable processor, including a machine-readable medium that receives machine instructions as a machine-readable signal. The term "machine-readable signal" refers to any signal used to provide machine instructions and/or data to a programmable processor.
The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include local area networks (LANs), wide area networks (WANs), and the internet.
The computer system may include a client and a server. The client and server are typically remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other.
It should be appreciated that various forms of the flows shown above may be used to reorder, add, or delete steps. For example, the steps recited in the present disclosure may be performed in parallel, sequentially, or in a different order, provided that the desired results of the disclosed aspects are achieved, and are not limited herein.
Furthermore, it is to be understood that the various embodiments of the application may be practiced alone or in combination with other embodiments as the scheme permits.
Those of ordinary skill in the art will appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, or combinations of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the solution. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.
It will be clear to those skilled in the art that, for convenience and brevity of description, specific working procedures of the above-described systems, apparatuses and units may refer to corresponding procedures in the foregoing method embodiments, and are not repeated herein.
The foregoing are merely specific embodiments of the present application, but the scope of protection of the present application is not limited thereto. Any variation or substitution that a person skilled in the art can readily conceive of within the technical scope disclosed by the present application shall fall within the scope of protection of the present application. Therefore, the scope of protection of the application shall be subject to the scope of protection of the claims.
Claims (11)
- A spatial audio acquisition method, characterized in that the method is performed by a user equipment (UE) in which a plurality of groups of microphone arrays are arranged, the maximum response directions of the groups of arrays being mutually orthogonal, the method comprising: performing differential beam processing on the microphone signals acquired by the microphone arrays to acquire a spatial audio signal.
- The method of claim 1, wherein the differential beam processing of the microphone signals acquired by the microphone arrays to acquire a spatial audio signal comprises: applying suitable delay filtering and corresponding compensation filters to the microphone signals to obtain array signals with the required directivity; and decoding the array signals to obtain the spatial audio signal.
- The method according to claim 2, wherein the method further comprises: acquiring a plurality of directivities of the microphone array, the directivities characterizing the sensitivity to signals from different directions.
- The method according to claim 3, characterized in that the method further comprises: acquiring differential arrays having the plurality of directivities; and obtaining the directivity required in three-dimensional space by combining different differential arrays, so as to acquire the spatial audio signal.
- The method according to any one of claims 1 to 4, further comprising: decoding the spatial audio signal to output immersive multi-channel audio and/or Ambisonic audio.
- The method according to any one of claims 1 to 5, further comprising: filtering the microphone signal to obtain a low-frequency component and a high-frequency component, wherein the low-frequency component is output as a low-frequency effect and the high-frequency component is used to form the spatial audio signal.
- The method according to any one of claims 1 to 6, wherein the microphone array is arranged in the UE in either of the following ways: the microphone array is disposed in the UE at a position close to a voice acquisition component; or the microphone array is disposed in the UE at a position close to an image acquisition component.
- The method according to any one of claims 1 to 7, wherein the microphone array comprises a predetermined number of microphones forming three groups of microphone arrays, the three groups of microphone arrays being mutually orthogonal or deviating from orthogonality by an angle within a predetermined range, and the centers of the three groups of microphone arrays coinciding or being separated by a distance not exceeding an error threshold.
- A spatial audio acquisition device, characterized in that the device is arranged in a user equipment (UE) in which a plurality of groups of microphone arrays are arranged, the maximum response directions of the groups of arrays being mutually orthogonal, the device comprising: a spatial audio signal acquisition module, configured to perform differential beam processing on the microphone signals acquired by the microphone arrays so as to acquire a spatial audio signal.
- A communication device, comprising: a transceiver; a memory; and a processor, coupled to the transceiver and the memory respectively, configured to control wireless signal transmission and reception of the transceiver and to implement the method of any one of claims 1-8 by executing computer-executable instructions on the memory.
- A computer storage medium, wherein the computer storage medium stores computer-executable instructions; the computer-executable instructions, when executed by a processor, are capable of implementing the method of any one of claims 1-8.
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
PCT/CN2022/126234 WO2024082181A1 (en) | 2022-10-19 | 2022-10-19 | Spatial audio collection method and apparatus |
Publications (1)
Publication Number | Publication Date |
---|---|
CN118235431A true CN118235431A (en) | 2024-06-21 |
Family
ID=90736504
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202280004436.0A Pending CN118235431A (en) | 2022-10-19 | 2022-10-19 | Spatial audio acquisition method and device |
Country Status (2)
Country | Link |
---|---|
CN (1) | CN118235431A (en) |
WO (1) | WO2024082181A1 (en) |
Family Cites Families (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US9161149B2 (en) * | 2012-05-24 | 2015-10-13 | Qualcomm Incorporated | Three-dimensional sound compression and over-the-air transmission during a call |
CN105451151B (en) * | 2014-08-29 | 2018-09-21 | 华为技术有限公司 | A kind of method and device of processing voice signal |
WO2016123572A1 (en) * | 2015-01-30 | 2016-08-04 | Dts, Inc. | System and method for capturing, encoding, distributing, and decoding immersive audio |
US10477304B2 (en) * | 2016-06-15 | 2019-11-12 | Mh Acoustics, Llc | Spatial encoding directional microphone array |
-
2022
- 2022-10-19 CN CN202280004436.0A patent/CN118235431A/en active Pending
- 2022-10-19 WO PCT/CN2022/126234 patent/WO2024082181A1/en active Application Filing
Also Published As
Publication number | Publication date |
---|---|
WO2024082181A1 (en) | 2024-04-25 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||