CN117037816A - Multi-channel audio coding method, system, medium and equipment - Google Patents
- Publication number
- CN117037816A (application CN202311111144.XA)
- Authority
- CN
- China
- Prior art keywords
- spectrum coefficient
- coding
- coefficient
- downmix
- stereo
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L19/00—Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
- G10L19/008—Multichannel audio signal coding or decoding using interchannel correlation to reduce redundancy, e.g. joint-stereo, intensity-coding or matrixing
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04R—LOUDSPEAKERS, MICROPHONES, GRAMOPHONE PICK-UPS OR LIKE ACOUSTIC ELECTROMECHANICAL TRANSDUCERS; DEAF-AID SETS; PUBLIC ADDRESS SYSTEMS
- H04R5/00—Stereophonic arrangements
- H04R5/04—Circuit arrangements, e.g. for selective connection of amplifier inputs/outputs to loudspeakers, for loudspeaker detection, or for adaptation of settings to personal preferences or hearing impairments
Abstract
The application discloses a multi-channel audio coding method, system, medium and equipment, and belongs to the technical field of audio coding. The method comprises: performing a discrete cosine transform on the left channel audio data and the right channel audio data to obtain left channel spectral coefficients and right channel spectral coefficients; performing parametric stereo coding on the left channel spectral coefficients and the right channel spectral coefficients to obtain downmix spectral coefficients and stereo parameters; applying the standard coding flow to the downmix spectral coefficients, and quantizing and arithmetic coding the stereo parameters, to obtain the corresponding coding results; and encapsulating the coding results into a code stream to obtain the encoded code stream. When encoding multi-channel audio, the application exploits the spatial information between the two channels and transmits only the downmix spectral coefficients and the stereo parameters, thereby reducing the encoding code rate, saving Bluetooth air bandwidth, reducing interference between Bluetooth devices, and improving the user experience.
Description
Technical Field
The present application relates to the field of audio encoding and decoding technologies, and in particular, to a method, a system, a medium, and an apparatus for encoding multi-channel audio.
Background
When encoding audio for TWS (True Wireless Stereo), that is, when encoding multi-channel audio, the LC3 audio codec works on a mono basis. For example, if the encoder input is a multi-channel signal (such as two-channel stereo, 5.1 surround or 7.1 surround), the LC3 audio codec encodes each channel independently, so the overall code rate of the LC3 audio encoder is a linear multiple of the single-channel code rate. For a configuration with a 48 kHz sampling rate and a 10 ms frame length, the standard recommends a mono code rate of 124 kbps, which gives: a total code rate of 2 x 124 = 248 kbps for two-channel stereo; 6 x 124 = 744 kbps for 5.1 surround sound; and 8 x 124 = 992 kbps for 7.1 surround sound.
In the LC3 audio encoder, as the number of channels increases, the linearly increasing code rate raises the transmission power of the Bluetooth device, which increases interference among Bluetooth devices. Especially in public environments, the increased interference reduces transmission reliability and causes audio stuttering, which degrades the user experience.
Disclosure of Invention
Aiming at the problem that when an LC3 codec encodes multi-channel audio, the encoding code rate is higher, so that the interference exists between devices and the communication between the devices is influenced, the application provides a multi-channel audio encoding method, a multi-channel audio encoding system, a multi-channel audio medium and multi-channel audio equipment.
In a first aspect, the present application proposes a multi-channel audio coding method, comprising: performing a discrete cosine transform on left channel audio data and right channel audio data, respectively, to obtain left channel spectral coefficients and right channel spectral coefficients; performing parametric stereo coding on the left channel spectral coefficients and the right channel spectral coefficients to obtain downmix spectral coefficients and stereo parameters; applying the standard coding flow to the downmix spectral coefficients, and quantizing and arithmetic coding the stereo parameters, to obtain the corresponding coding results; and encapsulating the coding results into a code stream to obtain the encoded code stream.
Optionally, performing parametric stereo coding on the left channel spectral coefficients and the right channel spectral coefficients to obtain downmix spectral coefficients and stereo parameters includes: downmixing the left channel spectral coefficients and the right channel spectral coefficients to obtain the downmix spectral coefficients; calculating the corresponding modified discrete sine transform (MDST) spectral coefficients from the left channel spectral coefficients and the right channel spectral coefficients, respectively; calculating the modified discrete Fourier transform (MDFT) signal of each channel from the MDST spectral coefficients of the left channel and the right channel; and calculating the stereo parameters from the MDFT signals of the left channel and the right channel.
Optionally, downmixing the left channel spectral coefficients and the right channel spectral coefficients to obtain the downmix spectral coefficients includes: dividing the left channel spectral coefficients and the right channel spectral coefficients into sub-bands; and downmixing the left channel spectral coefficients and the right channel spectral coefficients in the order of the corresponding sub-bands to obtain the downmix spectral coefficients.
Optionally, downmixing the left channel spectral coefficients and the right channel spectral coefficients to obtain the downmix spectral coefficients includes: performing energy equalization on the obtained downmix spectral coefficients to optimize them.
Optionally, encapsulating the coding results into a code stream to obtain the encoded code stream includes: when the coding results are encapsulated, adding one indicator bit that indicates whether stereo coding is enabled for the current audio frame.
Optionally, the method further comprises: performing code stream analysis and arithmetic and residual decoding on the encoded code stream to obtain the downmix spectral coefficients and the stereo parameters; and performing stereo decoding on the downmix spectral coefficients and the stereo parameters to obtain left channel audio data and right channel audio data.
Optionally, performing stereo decoding on the downmix spectral coefficients and the stereo parameters to obtain left channel audio data and right channel audio data includes: calculating the MDST spectral coefficients corresponding to the downmix spectral coefficients, and constructing a first MDFT signal from the downmix spectral coefficients and the MDST spectral coefficients; decorrelating the first MDFT signal to obtain a second MDFT signal; upmixing according to the first MDFT signal, the second MDFT signal and the stereo parameters to obtain left channel spectral coefficients and right channel spectral coefficients; and performing an inverse discrete cosine transform on the left channel spectral coefficients and the right channel spectral coefficients to obtain the left channel audio data and the right channel audio data.
In a second aspect, the present application proposes a multi-channel audio coding system comprising: a module for performing a discrete cosine transform on the left channel audio data and the right channel audio data to obtain left channel spectral coefficients and right channel spectral coefficients; a module for performing parametric stereo coding on the left channel spectral coefficients and the right channel spectral coefficients to obtain downmix spectral coefficients and stereo parameters; a module for applying the standard coding flow to the downmix spectral coefficients, and quantizing and arithmetic coding the stereo parameters, to obtain the corresponding coding results; and a module for encapsulating the coding results into a code stream to obtain the encoded code stream.
In a third aspect, the present application proposes a computer readable storage medium storing a computer program, wherein the computer program is operative to perform the multi-channel audio coding method of the first aspect.
In a fourth aspect, the present application proposes a computer device comprising a processor and a memory, the memory storing a computer program, wherein the processor runs the computer program to perform the multi-channel audio coding method of the first aspect.
When encoding multi-channel audio, the application exploits the spatial information between the two channels and transmits only the downmix spectral coefficients of the downmix channel and the stereo parameters, thereby reducing the encoding code rate, saving Bluetooth air bandwidth, reducing interference between Bluetooth devices, and improving the user experience.
Drawings
In order to illustrate the technical solutions of the embodiments of the present application more clearly, the drawings used in the description of the embodiments are briefly introduced below. The drawings in the following description show some embodiments of the present application by way of example.
FIG. 1 is a schematic diagram of one embodiment of a multi-channel audio encoding method of the present application;
FIG. 2 is a schematic diagram of one example of stereo encoding of the present application;
FIG. 3 is a schematic diagram of an example of a multi-channel audio encoding method of the present application;
FIG. 4 is a schematic diagram of one example of stereo decoding of the present application;
FIG. 5 is a schematic diagram of an example of a multi-channel audio encoding method of the present application;
FIG. 6 is a schematic diagram of an embodiment of the multi-channel audio codec system of the present application.
Specific embodiments of the present application have been shown by way of the above drawings and will be described in more detail below. The drawings and the written description are not intended to limit the scope of the inventive concepts in any way, but rather to illustrate the inventive concepts to those skilled in the art by reference to the specific embodiments.
Detailed Description
The preferred embodiments of the present application will be described in detail below with reference to the accompanying drawings so that the advantages and features of the present application can be more easily understood by those skilled in the art, thereby making clear and defining the scope of the present application.
It is noted that relational terms such as first and second, and the like are used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Moreover, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising … …" does not exclude the presence of other like elements in a process, method, article, or apparatus that comprises an element.
When encoding audio for TWS (True Wireless Stereo), that is, when encoding multi-channel audio, the LC3 audio codec works on a mono basis. For example, if the encoder input is a multi-channel signal (such as two-channel stereo, 5.1 surround or 7.1 surround), the LC3 audio codec encodes each channel independently, so the overall code rate of the LC3 audio encoder is a linear multiple of the single-channel code rate. For a configuration with a 48 kHz sampling rate and a 10 ms frame length, the standard recommends a mono code rate of 124 kbps, which gives: a total code rate of 2 x 124 = 248 kbps for two-channel stereo; 6 x 124 = 744 kbps for 5.1 surround sound; and 8 x 124 = 992 kbps for 7.1 surround sound.
In the LC3 audio encoder, as the number of channels increases, the linearly increasing code rate raises the transmission power of the Bluetooth device, which increases interference among Bluetooth devices. Especially in public environments, the increased interference reduces transmission reliability and causes audio stuttering, which degrades the user experience.
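The linear growth described above can be checked with a few lines of arithmetic. This is only an illustrative sketch: the 124 kbps figure and the channel counts come from the text, while the names `MONO_KBPS`, `LAYOUTS` and `total_rate_kbps` are hypothetical.

```python
# Mono code rate quoted in the text for the 48 kHz, 10 ms configuration.
MONO_KBPS = 124

# Channel counts for the layouts named in the text (5.1 = 6 channels, 7.1 = 8).
LAYOUTS = {"stereo": 2, "5.1 surround": 6, "7.1 surround": 8}


def total_rate_kbps(channels: int, mono_kbps: int = MONO_KBPS) -> int:
    """Total code rate when every channel is encoded independently."""
    return channels * mono_kbps


rates = {name: total_rate_kbps(n) for name, n in LAYOUTS.items()}
for name, kbps in rates.items():
    print(f"{name}: {kbps} kbps")
```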
In view of the above problems, the present application provides a multi-channel audio encoding method, system, medium and apparatus. The method comprises: performing a discrete cosine transform on the left channel audio data and the right channel audio data to obtain left channel spectral coefficients and right channel spectral coefficients; performing parametric stereo coding on the left channel spectral coefficients and the right channel spectral coefficients to obtain downmix spectral coefficients and stereo parameters; applying the standard coding flow to the downmix spectral coefficients, and quantizing and arithmetic coding the stereo parameters, to obtain the corresponding coding results; and encapsulating the coding results into a code stream to obtain the encoded code stream.
When encoding multi-channel audio, the application exploits the spatial information between the two channels and transmits only the downmix spectral coefficients of the downmix channel and the stereo parameters that carry the spatial information, thereby reducing the encoding code rate, saving Bluetooth air bandwidth, reducing interference among Bluetooth devices, and improving the user experience.
The following describes the technical scheme of the present application and how the technical scheme of the present application solves the above technical problems in detail with specific embodiments. The specific embodiments described below may be combined with one another to form new embodiments. The same or similar ideas or processes described in one embodiment may not be repeated in certain other embodiments. Embodiments of the present application will be described below with reference to the accompanying drawings.
Fig. 1 is a schematic diagram of an embodiment of a multi-channel audio encoding method of the present application.
In the embodiment shown in fig. 1, the multi-channel audio encoding method of the present application includes a process S101 of performing discrete cosine transform on left channel audio data and right channel audio data, respectively, to obtain left channel spectral coefficients and right channel spectral coefficients.
In the embodiment shown in fig. 1, when encoding two-channel audio, low-delay modified discrete cosine transform is performed on left-channel audio data and right-channel audio data, respectively, to obtain corresponding left-channel spectral coefficients and right-channel spectral coefficients.
In the embodiment shown in fig. 1, the multi-channel audio encoding method of the present application includes a process S102 of performing parametric stereo encoding on the left channel spectral coefficients and the right channel spectral coefficients to obtain downmix spectral coefficients and stereo parameters.
In this embodiment, the encoding process acquires the spatial information of the two-channel audio data and encodes it. Encoding the downmix spectral coefficients and the stereo parameters replaces the original process of encoding the audio data of each channel separately, which reduces the code rate.
Optionally, performing parametric stereo coding on the left channel spectral coefficients and the right channel spectral coefficients to obtain downmix spectral coefficients and stereo parameters includes: downmixing the MDCT spectral coefficients of the left channel and the right channel to obtain the downmix spectral coefficients; calculating the corresponding MDST (modified discrete sine transform) spectral coefficients from the left channel spectral coefficients and the right channel spectral coefficients, respectively; calculating the MDFT (modified discrete Fourier transform) signal of each channel from the MDST spectral coefficients of the left channel and the right channel; and calculating the stereo parameters from the MDFT signals of the left channel and the right channel.
In this alternative embodiment, when stereo encoding is performed, the left channel spectral coefficients and the right channel spectral coefficients are downmixed to obtain the downmix spectral coefficients. The subsequent encoding and decoding process then operates on the downmix spectral coefficients. Compared with the prior art, in which the left channel spectral coefficients and the right channel spectral coefficients are encoded separately, this reduces both the computational load and the code rate. To calculate the stereo parameters, the MDST coefficients of each channel are first calculated from the left channel spectral coefficients and the right channel spectral coefficients; the MDFT signals of the left and right channels are then calculated from the MDST coefficients of the left and right channels; and finally the stereo parameters are calculated.
Optionally, downmixing the left channel spectral coefficient and the right channel spectral coefficient to obtain a downmix spectral coefficient, including: sub-band division is carried out on the left channel spectrum coefficient and the right channel spectrum coefficient; and (3) down-mixing the left channel spectrum coefficient and the right channel spectrum coefficient according to the corresponding sequence of dividing the sub-bands to obtain a down-mixed spectrum coefficient.
In this alternative embodiment, when the left channel spectral coefficient and the right channel spectral coefficient are downmixed, the left channel spectral coefficient and the right channel spectral coefficient are sub-band divided, and then the downmixing process of the left and right channel spectral coefficients is performed in the order in which the sub-bands correspond.
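Specifically, a low-delay modified discrete cosine transform (MDCT) is applied to each of the left and right channels to obtain the MDCT coefficients:

$$t(n) = x(Z - N_F + n), \quad n = 0 \ldots 2N_F - 1 - Z$$

$$t(2N_F - Z + n) = 0, \quad n = 0 \ldots Z - 1$$

$$X(k) = \sqrt{\frac{2}{N_F}} \sum_{n=0}^{2N_F-1} w(n)\, t(n) \cos\left[\frac{\pi}{N_F}\left(n + \frac{1}{2} + \frac{N_F}{2}\right)\left(k + \frac{1}{2}\right)\right], \quad k = 0 \ldots N_F - 1$$

where $w(n)$ is the analysis window of the low-delay MDCT and $X(k)$ denotes the spectral coefficients of each frame. The spectral coefficients of the left and right channels are denoted $X_{left}(k)$ and $X_{right}(k)$, respectively.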
For sub-band division, the spectral coefficients are divided into a plurality of frequency bands according to the standard specification. Taking the 48 kHz, 10 ms frame length configuration as an example, the 400 spectral coefficients are divided into 64 frequency bands with the following band edges:

int I_48000[65] = {0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 20, 22, 24, 26, 28, 30, 32, 34, 36, 39, 42, 45, 48, 51, 55, 59, 63, 67, 71, 76, 81, 86, 92, 98, 105, 112, 119, 127, 135, 144, 154, 164, 175, 186, 198, 211, 225, 240, 256, 273, 291, 310, 330, 352, 375, 400};
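The band-edge table can be read as 65 boundaries that delimit 64 bands, with band b covering coefficient indices I_48000[b] up to but not including I_48000[b + 1]. A minimal sketch of that lookup follows; the helper names `band_of` and `band_widths` are hypothetical and are not part of the source.

```python
import bisect

# Band edges for the 48 kHz, 10 ms configuration, copied from the text.
I_48000 = [
    0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18,
    20, 22, 24, 26, 28, 30, 32, 34, 36, 39, 42, 45, 48, 51, 55, 59, 63,
    67, 71, 76, 81, 86, 92, 98, 105, 112, 119, 127, 135, 144, 154, 164,
    175, 186, 198, 211, 225, 240, 256, 273, 291, 310, 330, 352, 375, 400,
]


def band_of(k: int) -> int:
    """Return the band b with I_48000[b] <= k < I_48000[b + 1]."""
    if not 0 <= k < I_48000[-1]:
        raise ValueError("coefficient index out of range")
    return bisect.bisect_right(I_48000, k) - 1


# Number of spectral coefficients in each band.
band_widths = [hi - lo for lo, hi in zip(I_48000, I_48000[1:])]
```

The bands are one coefficient wide at low frequencies and widen toward high frequencies, so per-band parameters cost relatively more at the low end of the spectrum.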
Based on the MDCT spectral coefficients of the left and right channels, the downmix spectral coefficients are calculated:

$$X_{mix}(k) = X_{left}(k) + X_{right}(k), \quad k = 0 \ldots N_E - 1$$
Optionally, energy equalization is performed on the obtained downmix spectral coefficients to optimize them.
Specifically, to prevent the downmix spectral coefficients from overflowing or going out of range, energy equalization is performed:
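The equalization formula itself is not reproduced in this text, so the following is only a sketch under a stated assumption: each band of the sum is rescaled so that its energy equals the mean of the left and right band energies. The function name `downmix`, the `band_edges` argument and the rescaling rule are all hypothetical and may differ from the rule the application actually uses.

```python
import numpy as np


def downmix(x_left, x_right, band_edges=None, eps=1e-12):
    """Sum two MDCT spectra; optionally equalize the energy per band.

    The per-band rule used here (match the mean of the two input band
    energies) is an assumption made for illustration only.
    """
    x_left = np.asarray(x_left, dtype=float)
    x_right = np.asarray(x_right, dtype=float)
    x_mix = x_left + x_right  # X_mix(k) = X_left(k) + X_right(k)
    if band_edges is None:
        return x_mix
    out = x_mix.copy()
    for lo, hi in zip(band_edges, band_edges[1:]):
        e_in = 0.5 * (np.sum(x_left[lo:hi] ** 2) + np.sum(x_right[lo:hi] ** 2))
        e_mix = np.sum(x_mix[lo:hi] ** 2)
        out[lo:hi] *= np.sqrt(e_in / (e_mix + eps))
    return out
```

With identical channels the plain sum doubles the amplitude, and the assumed rule scales it back to the level of a single channel, which is one way to keep the downmix within the range of the inputs.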
Specifically, when calculating the stereo parameters, the corresponding MDST spectral coefficients are first derived from the MDCT spectral coefficients of the left and right channels. The calculation may follow the prior art. In the MDST calculation below, $C_0$ and $C_1$ are the MDCT spectral-line basis vector matrices, $S_0$ and $S_1$ are the MDST spectral-line basis vector matrices, and $T$ denotes matrix transposition. The MDST spectral lines of the m-th frame can be calculated from the spectral lines at the corresponding positions in the preceding frame (frame m-1) and the following frame (frame m+1) together with the basis vector matrices. The specific formula is as follows:
The MDFT signals of the left and right channels are then constructed:

$$Z(k) = X_{mdct}(k) + jX_{mdst}(k)$$

New signals are constructed for the left and right channels using this formula and are denoted $Z_{left}(k)$ and $Z_{right}(k)$, respectively.
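To make the construction of Z(k) concrete, the sketch below computes the MDCT and MDST directly from one windowed time-domain block and combines them. This is a deliberate substitution: the text derives the MDST from the MDCT spectral lines of adjacent frames, whereas this example uses a direct O(N^2) transform because it is easier to verify. The cosine kernel is assumed to be the usual low-delay MDCT form, and the sine window is a placeholder, not the LC3 window; `sine_window`, `mdct_mdst` and `mdft` are hypothetical names.

```python
import numpy as np


def sine_window(n_f):
    """Placeholder analysis window of length 2*n_f (not the LC3 window)."""
    return np.sin(np.pi * (np.arange(2 * n_f) + 0.5) / (2 * n_f))


def mdct_mdst(t, window):
    """Direct MDCT and MDST of one block t of 2*N_F samples."""
    t = np.asarray(t, dtype=float)
    n_f = t.size // 2
    n = np.arange(2 * n_f)
    k = np.arange(n_f)
    # Shared phase term; cos gives the MDCT, sin gives the MDST.
    phase = np.pi / n_f * np.outer(k + 0.5, n + 0.5 + n_f / 2)
    wt = window * t
    scale = np.sqrt(2.0 / n_f)
    return scale * (np.cos(phase) @ wt), scale * (np.sin(phase) @ wt)


def mdft(t, window):
    """Z(k) = X_mdct(k) + j * X_mdst(k)."""
    x_mdct, x_mdst = mdct_mdst(t, window)
    return x_mdct + 1j * x_mdst
```

Because the two transforms share one phase term, Z(k) is a complex spectrum whose magnitude and phase can be compared between channels, which real-valued MDCT coefficients alone do not allow.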
In order to further reduce the code rate in practical application, the whole frequency band can be divided into a smaller number of sub-bands, so as to reduce the number of parameters to be transmitted:
The ILD, ITD and ICC obtained above are the stereo parameters, where ILD (Interchannel Level Difference) is the inter-channel level difference parameter, ITD (Interchannel Time Difference) is the inter-channel time difference parameter, and ICC (Interchannel Coherence) is the inter-channel correlation parameter.
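The parameter formulas are not reproduced in this text. As a rough illustration only, the sketch below uses common textbook definitions computed per sub-band from the complex signals of the two channels: an energy ratio in dB for the level difference, the phase of the cross-spectrum as a stand-in for the time difference, and the normalized cross-spectrum magnitude for the correlation. These definitions, and the name `stereo_parameters`, are assumptions and may differ from the formulas of the application.

```python
import numpy as np


def stereo_parameters(z_left, z_right, band_edges, eps=1e-12):
    """Per-band level difference (dB), phase difference (rad), coherence.

    All three definitions are assumed textbook forms; the phase
    difference is used here as a proxy for the time difference.
    """
    z_left = np.asarray(z_left, dtype=complex)
    z_right = np.asarray(z_right, dtype=complex)
    ild, ipd, icc = [], [], []
    for lo, hi in zip(band_edges, band_edges[1:]):
        l, r = z_left[lo:hi], z_right[lo:hi]
        e_l = np.sum(np.abs(l) ** 2)
        e_r = np.sum(np.abs(r) ** 2)
        cross = np.sum(l * np.conj(r))
        ild.append(10.0 * np.log10((e_l + eps) / (e_r + eps)))
        ipd.append(np.angle(cross))
        icc.append(np.abs(cross) / np.sqrt(e_l * e_r + eps))
    return np.array(ild), np.array(ipd), np.array(icc)
```

Using fewer, wider bands in `band_edges` yields fewer parameters per frame, which matches the code rate reduction described above.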
Fig. 2 is a schematic diagram of one example of stereo encoding of the present application.
FIG. 2 shows the specific process of parametric stereo coding. In this process, the left channel MDCT spectral coefficients and the right channel MDCT spectral coefficients are downmixed, and the resulting downmix spectral coefficients are the downmix signal shown in FIG. 2. To calculate the stereo parameters, the corresponding MDST coefficients are calculated from the left channel MDCT spectral coefficients and the right channel MDCT spectral coefficients, respectively; the MDFT signals of the left channel and the right channel are then calculated; and finally the stereo parameters are obtained, following the calculation described above.
In the embodiment shown in fig. 1, the multi-channel audio encoding method of the present application includes a process S103 of applying the standard encoding flow to the downmix spectral coefficients, and quantizing and arithmetic coding the stereo parameters, to obtain the corresponding encoding results.
In this embodiment, the downmix coefficients are subjected to coding processes such as transform domain noise shaping, time domain noise shaping, quantization, noise level estimation, and arithmetic and residual coding. The stereo parameters are subjected to quantization and arithmetic coding processes.
In the embodiment shown in fig. 1, the multi-channel audio encoding method of the present application includes a process S104 of encapsulating the encoding results into a code stream to obtain the encoded code stream.
Optionally, encapsulating the encoding results into a code stream to obtain the encoded code stream includes: when the encoding results are encapsulated, adding one indicator bit that indicates whether stereo coding is enabled for the current audio frame.
In this alternative embodiment, before encoding, the transmitting end and the receiving end negotiate and configure parameters, including the audio format, the sampling rate, the code rate range, and whether parametric stereo preprocessing is supported. If both the transmitting end and the receiving end support the stereo coding of the present application, the transmitting end performs stereo coding; if the receiving end does not support stereo coding, the transmitting end performs the standard coding flow. The encoding process therefore needs to signal the state of each audio frame. When stereo coding is globally enabled, one indicator bit is added to the output code stream of each frame, located after the time domain noise shaping code stream: the value 1 indicates that stereo coding is enabled for the current frame, and the value 0 indicates that it is not enabled, that is, the current frame uses standard coding. The bit may be written at the end of the side information. The side information is the part of the LC3 output code stream that stores frame-level information such as the bandwidth, the global gain and the TNS activation flag.
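The signalling described above can be sketched as follows. This is a simplified model, not the LC3 bitstream layout: the side information is represented as a plain list of bits, and the names `stereo_globally_enabled`, `append_stereo_flag` and `split_stereo_flag` are hypothetical.

```python
def stereo_globally_enabled(tx_supports: bool, rx_supports: bool) -> bool:
    """Stereo coding is used only if both ends support it."""
    return tx_supports and rx_supports


def append_stereo_flag(side_info_bits, stereo_enabled: bool):
    """Return a copy of the side-information bits with the flag appended.

    1 means the current frame uses stereo coding, 0 means standard coding.
    """
    return list(side_info_bits) + [1 if stereo_enabled else 0]


def split_stereo_flag(side_info_bits):
    """Return (remaining side-information bits, stereo_enabled)."""
    if not side_info_bits:
        raise ValueError("side information is empty")
    *rest, flag = side_info_bits
    return rest, flag == 1
```

A decoder that reads the flag first can choose between the parametric stereo path and the standard path before interpreting the rest of the frame.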
Specifically, fig. 3 is a schematic diagram of an example of the multi-channel audio encoding method of the present application.
As shown in FIG. 3, in the encoding process of the present application, parametric stereo encoding is performed after the low-delay modified discrete cosine transform has been applied to the left channel audio data and the right channel audio data, respectively. The output downmix spectral coefficients then go through the standard encoding flow, while the resulting stereo parameters are quantized and arithmetic coded. Compared with the prior art, only the downmix spectral coefficients are encoded instead of the spectral coefficients of both channels, so the encoding code rate can be reduced.
Optionally, the present application further includes: performing code stream analysis and arithmetic and residual decoding on the encoded code stream to obtain the downmix spectral coefficients and the stereo parameters; and performing stereo decoding on the downmix spectral coefficients and the stereo parameters to obtain left channel audio data and right channel audio data.
In this alternative embodiment, when the encoded code stream is decoded, code stream analysis and arithmetic and residual decoding are first performed on it to obtain the downmix spectral coefficients and the stereo parameters. Stereo decoding is then performed on the downmix spectral coefficients and the stereo parameters to obtain the left channel audio data and the right channel audio data.
Optionally, performing stereo decoding on the downmix spectral coefficients and the stereo parameters to obtain left channel audio data and right channel audio data includes: calculating the MDST spectral coefficients corresponding to the downmix spectral coefficients, and constructing a first MDFT signal from the downmix spectral coefficients and the MDST coefficients; decorrelating the first MDFT signal to obtain a second MDFT signal; upmixing according to the first MDFT signal, the second MDFT signal and the stereo parameters to obtain left channel spectral coefficients and right channel spectral coefficients; and performing an inverse discrete cosine transform on the left channel spectral coefficients and the right channel spectral coefficients to obtain the left channel audio data and the right channel audio data.
Specifically, FIG. 4 is a schematic diagram of one example of stereo decoding of the present application. As shown in FIG. 4, the MDST spectral coefficients corresponding to the MDCT spectral coefficients of the downmix signal are calculated. A first MDFT signal is then constructed from the downmix MDCT spectral coefficients and these MDST spectral coefficients:
Then the decorrelated signal of the first MDFT signal, that is, the second MDFT signal, is calculated:
where the time-domain expression of the all-pass decorrelation filter is
Upmixing is performed based on the two MDFT signals and the stereo parameter information (ILD, ITD and ICC) to obtain the MDCT spectral coefficients of the left and right channels. First, the signals modulated by the correlation ICC are calculated:
Next, the signals modulated by the level difference ILD are calculated:
where $g_0$ and $g_1$ must satisfy:
Then the time difference is modulated to obtain the spectral coefficients of the left and right channels:
Finally, the inverse transform is applied to the spectral coefficients of the left and right channels to obtain the time-domain signals.
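The upmix formulas are not reproduced in this text, so the sketch below only illustrates the order of the first two modulation steps under explicit assumptions: the ICC step is modelled as a rotation between the first (direct) and second (decorrelated) MDFT signals, and the gains are assumed to satisfy g0^2 / g1^2 = 10^(ILD/10) and g0^2 + g1^2 = 2. The all-pass decorrelation filter and the time difference modulation are omitted. The name `upmix_band` and both constraints are hypothetical.

```python
import numpy as np


def upmix_band(z_mix, z_decorr, ild_db, icc):
    """Upmix one band from the direct and decorrelated MDFT signals.

    Assumed model: ICC sets a rotation angle, ILD sets the two gains.
    """
    z_mix = np.asarray(z_mix, dtype=complex)
    z_decorr = np.asarray(z_decorr, dtype=complex)
    # ICC modulation: mix in the decorrelated signal with opposite signs.
    alpha = 0.5 * np.arccos(np.clip(icc, -1.0, 1.0))
    a = np.cos(alpha) * z_mix + np.sin(alpha) * z_decorr
    b = np.cos(alpha) * z_mix - np.sin(alpha) * z_decorr
    # ILD modulation: energy ratio 10^(ILD/10), total gain energy of 2 (assumed).
    ratio = 10.0 ** (ild_db / 10.0)
    g0 = np.sqrt(2.0 * ratio / (1.0 + ratio))
    g1 = np.sqrt(2.0 / (1.0 + ratio))
    return g0 * a, g1 * b
```

When the two inputs are orthogonal and have equal energy, the outputs of this model reproduce the requested level difference and correlation exactly, which is the property a decorrelator is meant to provide.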
Specifically, fig. 5 is a schematic diagram of an example of the multi-channel audio encoding method of the present application.
Fig. 5 shows the decoding process corresponding to the multi-channel audio encoding. As shown in Fig. 5, the enabled state of the current frame is first identified from the side information in the input code stream. During decoding, the downmix spectral coefficients and the stereo parameters are obtained through code stream parsing and arithmetic and residual decoding. The stereo parameters are inverse-quantized. The downmix spectral coefficients undergo noise filling, global gain, temporal noise shaping decoding and transform-domain noise shaping decoding, followed by parametric stereo decoding to obtain the left channel spectral coefficients and the right channel spectral coefficients. Finally, a low-delay inverse modified discrete cosine transform and long-term post-filter decoding yield the left channel audio data and the right channel audio data.
When encoding multi-channel audio, the application exploits the spatial information between the two channels and transmits only the downmix spectral coefficients of the downmix channel and the spatial information parameters. This reduces the coding bit rate and saves Bluetooth air bandwidth, which in turn reduces interference between Bluetooth devices and improves the user experience.
Fig. 6 is a schematic diagram of an embodiment of the multi-channel audio codec system of the present application.
In the embodiment shown in Fig. 6, the multi-channel audio codec system of the present application includes: a module 601 for performing a discrete cosine transform on the left channel audio data and the right channel audio data to obtain left channel spectral coefficients and right channel spectral coefficients, respectively; a module 602 for performing parametric stereo coding on the left channel spectral coefficients and the right channel spectral coefficients to obtain downmix spectral coefficients and stereo parameters; a module 603 for applying the standard coding flow to the downmix spectral coefficients and performing quantization and arithmetic coding on the stereo parameters to obtain the corresponding coding result; and a module 604 for performing code stream encapsulation on the coding result to obtain an encoded code stream.
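As an illustration of the transform performed by module 601 and of the MDST and MDFT quantities used by the parametric stereo stage, the following is a minimal direct-form sketch. It assumes the textbook MDCT and MDST definitions on an unwindowed frame of 2N samples, and the sign convention MDFT = MDCT - j·MDST; the application may use a different windowing, scaling or sign convention, and it derives the MDST from the MDCT coefficients rather than from time-domain samples. The function names are hypothetical.

```python
import math


def mdct_mdst(frame):
    """Direct O(N^2) MDCT and MDST of a frame of 2N samples (no window).

    Returns two lists of N coefficients each.
    """
    if len(frame) % 2:
        raise ValueError("frame length must be even")
    n_half = len(frame) // 2
    n0 = 0.5 + n_half / 2.0             # standard MDCT phase offset
    mdct, mdst = [], []
    for k in range(n_half):
        c = s = 0.0
        for n, x in enumerate(frame):
            phase = math.pi / n_half * (n + n0) * (k + 0.5)
            c += x * math.cos(phase)
            s += x * math.sin(phase)
        mdct.append(c)
        mdst.append(s)
    return mdct, mdst


def mdft(mdct, mdst):
    """Combine MDCT and MDST into a complex spectrum.

    ASSUMED convention: X[k] = MDCT[k] - j * MDST[k].
    """
    return [complex(c, -s) for c, s in zip(mdct, mdst)]
```

A production codec would use a windowed, FFT-based low-delay transform; the quadratic loop here is only meant to make the definitions concrete.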
Optionally, performing parametric stereo coding on the left channel spectral coefficients and the right channel spectral coefficients to obtain downmix spectral coefficients and stereo parameters includes: downmixing the left channel spectral coefficients and the right channel spectral coefficients to obtain the downmix spectral coefficients; calculating the corresponding MDST spectral coefficients from the left channel and right channel (MDCT) spectral coefficients, respectively; calculating the MDFT signal of each channel from the MDST spectral coefficients corresponding to the left channel and the right channel; and calculating the stereo parameters from the MDFT signals corresponding to the left channel and the right channel.
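The last step, calculating stereo parameters from the two MDFT signals, can be sketched as follows. The definitions are the ones commonly used in parametric stereo and are an assumption here, since the application's own formulas are not reproduced in this text: ILD as an energy ratio in dB, ICC as the normalized magnitude of the cross-spectrum, and an inter-channel phase difference from which a time difference could be derived. The function name `stereo_params` and the `eps` guard are hypothetical.

```python
import cmath
import math


def stereo_params(left, right, eps=1e-12):
    """Estimate (ild_db, icc, ipd) for one sub-band of complex spectra.

    left, right: equal-length sequences of complex MDFT coefficients.
    eps guards against division by zero on silent bands.
    """
    e_l = sum(abs(v) ** 2 for v in left)
    e_r = sum(abs(v) ** 2 for v in right)
    cross = sum(l * r.conjugate() for l, r in zip(left, right))
    ild_db = 10.0 * math.log10((e_l + eps) / (e_r + eps))
    icc = abs(cross) / math.sqrt((e_l + eps) * (e_r + eps))
    ipd = cmath.phase(cross)            # radians, in (-pi, pi]
    return ild_db, icc, ipd
```

In this sketch a right channel that is simply a scaled copy of the left channel gives an ICC close to 1 and a phase difference of 0.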
Optionally, downmixing the left channel spectral coefficients and the right channel spectral coefficients to obtain the downmix spectral coefficients includes: dividing the left channel spectral coefficients and the right channel spectral coefficients into sub-bands; and downmixing the left channel spectral coefficients and the right channel spectral coefficients sub-band by sub-band, in the order of the sub-band division, to obtain the downmix spectral coefficients.
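A minimal sketch of sub-band division followed by band-by-band downmixing is given below. The band edges, the per-band weights and the function names are assumptions made for illustration; the application does not state the sub-band layout or the downmix weights in this text.

```python
def split_bands(coeffs, edges):
    """Split a coefficient list into sub-bands.

    edges: increasing bin indices, e.g. [0, 4, 8, 16]; band i covers
    coeffs[edges[i]:edges[i + 1]].
    """
    return [coeffs[a:b] for a, b in zip(edges[:-1], edges[1:])]


def downmix_by_band(left, right, edges, weights=None):
    """Downmix two spectra band by band, in sub-band order.

    weights: optional list of (w_left, w_right) per band; defaults to
    the plain average (0.5, 0.5), which is an ASSUMPTION of this sketch.
    """
    n_bands = len(edges) - 1
    if weights is None:
        weights = [(0.5, 0.5)] * n_bands
    out = []
    bands = zip(split_bands(left, edges), split_bands(right, edges), weights)
    for l_band, r_band, (w_l, w_r) in bands:
        out.extend(w_l * l + w_r * r for l, r in zip(l_band, r_band))
    return out
```

Processing per band only changes the result when the weights differ between bands, which is why the weights are exposed as a parameter.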
Optionally, downmixing the left channel spectral coefficients and the right channel spectral coefficients to obtain the downmix spectral coefficients includes: performing energy equalization on the obtained downmix spectral coefficients to optimize them.
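One plausible form of energy equalization is sketched below: the downmix of a band is rescaled so that its energy matches the mean energy of the two input channels, which compensates for cancellation between out-of-phase channels. The target energy, the gain limit `max_gain` and the function name are assumptions of this sketch; the application does not specify the equalization rule in this text.

```python
import math


def equalize_band(downmix, left, right, max_gain=2.0, eps=1e-12):
    """Rescale one downmix band toward the mean energy of left and right.

    max_gain limits amplification when the channels nearly cancel;
    both the target and the limit are ASSUMED values.
    """
    target = 0.5 * (sum(abs(v) ** 2 for v in left)
                    + sum(abs(v) ** 2 for v in right))
    actual = sum(abs(v) ** 2 for v in downmix)
    gain = min(max_gain, math.sqrt(target / (actual + eps)))
    return [gain * v for v in downmix]
```

For example, with left = [1, 0] and right = [0, 1], the plain average [0.5, 0.5] has energy 0.5, and the sketch scales it by about 1.414 so that its energy becomes 1.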
Optionally, performing code stream encapsulation on the coding result to obtain an encoded code stream includes: adding an indication bit when encapsulating the coding result, the bit indicating the enabled state of the current audio frame.
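The indication bit can be illustrated with a small packing sketch. The layout used here, a one-byte header whose most significant bit carries the flag, is purely an assumption; the application only states that a bit is added during encapsulation, and a real code stream would more likely place the flag in bit-packed side information. The function names are hypothetical.

```python
PS_ENABLED_MASK = 0x80  # assumed position of the indication bit


def pack_frame(payload: bytes, ps_enabled: bool) -> bytes:
    """Prefix the payload with a header byte carrying the enable flag."""
    header = PS_ENABLED_MASK if ps_enabled else 0x00
    return bytes([header]) + payload


def unpack_frame(frame: bytes):
    """Return (ps_enabled, payload) from a frame built by pack_frame."""
    if not frame:
        raise ValueError("empty frame")
    return bool(frame[0] & PS_ENABLED_MASK), frame[1:]
```

A decoder following this sketch would read the flag first and run the parametric stereo decoding path only when it is set.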
Optionally, the method further includes: performing code stream parsing and arithmetic and residual decoding on the encoded code stream to obtain the downmix spectral coefficients and the stereo parameters; and performing stereo decoding on the downmix spectral coefficients and the stereo parameters to obtain left channel audio data and right channel audio data.
Optionally, performing stereo decoding on the downmix spectral coefficients and the stereo parameters to obtain left channel audio data and right channel audio data includes: calculating the MDST spectral coefficients corresponding to the downmix spectral coefficients, and constructing a first MDFT signal from the downmix spectral coefficients and the MDST spectral coefficients; decorrelating the first MDFT signal to obtain a second MDFT signal; upmixing according to the first MDFT signal, the second MDFT signal and the stereo parameters to obtain the left channel spectral coefficients and the right channel spectral coefficients; and performing an inverse discrete cosine transform on the left channel spectral coefficients and the right channel spectral coefficients to obtain the left channel audio data and the right channel audio data.
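The decorrelation step can be illustrated with a generic first-order all-pass filter. This is a stand-in only: the time-domain expression of the application's all-pass decorrelation filter is not preserved in this text, so the filter order, the coefficient `a` and the axis along which the filter runs (here simply along the input sequence) are all assumptions.

```python
def allpass(signal, a=0.5):
    """First-order all-pass filter H(z) = (-a + z^-1) / (1 - a z^-1).

    Works on real or complex sequences and requires |a| < 1 for
    stability. Magnitudes are preserved at every frequency while
    phases are altered, which is what makes the output usable as a
    decorrelated copy of the input.
    """
    out = []
    x_prev = 0.0
    y_prev = 0.0
    for x in signal:
        y = -a * x + x_prev + a * y_prev
        out.append(y)
        x_prev, y_prev = x, y
    return out
```

Because the filter is all-pass, the energy of its impulse response sums to 1, so the second MDFT signal keeps the energy of the first.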
When encoding multi-channel audio, the application exploits the spatial information between the two channels and transmits only the downmix channel and the spatial information parameters. This reduces the coding bit rate and saves Bluetooth air bandwidth, which in turn reduces interference between Bluetooth devices and improves the user experience.
In one embodiment of the application, a computer readable storage medium stores computer instructions, wherein the computer instructions are operative to perform the multi-channel audio encoding method described in any of the embodiments. The storage medium may be embodied directly in hardware, in a software module executed by a processor, or in a combination of the two.
A software module may reside in RAM memory, flash memory, ROM memory, EPROM memory, EEPROM memory, registers, hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the art. An exemplary storage medium is coupled to the processor such that the processor can read information from, and write information to, the storage medium.
The processor may be a central processing unit (CPU) or another general purpose processor, a digital signal processor (DSP), an application specific integrated circuit (ASIC), a field programmable gate array (FPGA) or other programmable logic device, discrete gate or transistor logic, discrete hardware components, or any combination thereof. A general purpose processor may be a microprocessor, but in the alternative, the processor may be any conventional processor, controller, microcontroller, or state machine. A processor may also be implemented as a combination of computing devices, e.g., a combination of a DSP and a microprocessor, a plurality of microprocessors, one or more microprocessors in conjunction with a DSP core, or any other such configuration. In the alternative, the storage medium may be integral to the processor. The processor and the storage medium may reside in an ASIC. The ASIC may reside in a user terminal. In the alternative, the processor and the storage medium may reside as discrete components in a user terminal.
In one embodiment of the application, a computer device includes a processor and a memory storing computer instructions, wherein: the processor operates the computer instructions to perform the multi-channel audio encoding method described in any of the embodiments.
In the embodiments provided in the present application, it should be understood that the disclosed apparatus may be implemented in other manners. For example, the apparatus embodiments described above are merely illustrative: the division into units is only a logical functional division, and other divisions are possible in an actual implementation; for instance, multiple units or components may be combined or integrated into another system, or some features may be omitted or not performed. In addition, the mutual coupling, direct coupling or communication connection shown or discussed may be an indirect coupling or communication connection via some interfaces, devices or units, and may be electrical, mechanical or of another form.
The units described as separate units may or may not be physically separate, and units shown as units may or may not be physical units, may be located in one place, or may be distributed over a plurality of network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
The foregoing is only illustrative of the present application and is not to be construed as limiting the scope of the application, and all equivalent structural changes made by the present application and the accompanying drawings, or direct or indirect application in other related technical fields, are included in the scope of the present application.
Claims (10)
1. A multi-channel audio encoding method, comprising:
discrete cosine transforming the left channel audio data and the right channel audio data to obtain a left channel spectrum coefficient and a right channel spectrum coefficient;
performing parametric stereo coding on the left channel spectrum coefficient and the right channel spectrum coefficient to obtain a downmix spectrum coefficient and a stereo parameter;
performing standard coding flow on the downmix spectral coefficients, and performing quantization and arithmetic coding on the stereo parameters to obtain corresponding coding results;
and carrying out code stream encapsulation on the coding result to obtain a coding code stream.
2. The multi-channel audio coding method according to claim 1, wherein said parametric stereo coding the left channel spectral coefficients and the right channel spectral coefficients to obtain downmix spectral coefficients and stereo parameters comprises:
downmixing the left channel spectrum coefficient and the right channel spectrum coefficient to obtain the downmix spectrum coefficient;
respectively calculating corresponding modified discrete sine transform (MDST) spectrum coefficients according to the left channel spectrum coefficient and the right channel spectrum coefficient;
calculating modified discrete Fourier transform (MDFT) signals corresponding to the channels according to the MDST spectrum coefficients corresponding to the left channel and the right channel;
and calculating the stereo parameters according to the MDFT signals corresponding to the left channel and the right channel.
3. The multi-channel audio encoding method according to claim 2, wherein said downmixing the left channel spectral coefficients and the right channel spectral coefficients to obtain the downmix spectral coefficients comprises:
sub-band division is carried out on the left channel spectrum coefficient and the right channel spectrum coefficient;
and carrying out down-mixing on the left channel spectrum coefficient and the right channel spectrum coefficient according to the corresponding sequence of dividing sub-bands to obtain the down-mixed spectrum coefficient.
4. A multi-channel audio coding method according to claim 3, wherein said downmixing the left channel spectral coefficients and the right channel spectral coefficients to obtain the downmix spectral coefficients comprises:
and carrying out energy equalization on the downmix spectral coefficients and optimizing the downmix spectral coefficients.
5. The multi-channel audio coding method according to claim 1, wherein said performing a code stream encapsulation on the coding result to obtain a coded code stream comprises:
and adding an indication bit when the coding result is encapsulated, the indication bit indicating the enabled state of the current audio frame.
6. The multi-channel audio encoding method of claim 1, further comprising:
performing code stream parsing and arithmetic and residual decoding on the coded code stream to obtain the downmix spectral coefficient and the stereo parameter;
and carrying out stereo decoding on the downmix spectral coefficient and the stereo parameter to obtain left channel audio data and right channel audio data.
7. The multi-channel audio encoding method of claim 6, wherein said stereo decoding the downmix spectral coefficients and the stereo parameters to obtain left channel audio data and right channel audio data comprises:
calculating a modified discrete sine transform (MDST) spectrum coefficient corresponding to the downmix spectrum coefficient, and constructing a first modified discrete Fourier transform (MDFT) signal according to the downmix spectrum coefficient and the MDST spectrum coefficient;
decorrelating the first MDFT signal to obtain a second MDFT signal;
upmixing according to the first MDFT signal, the second MDFT signal and the stereo parameter to obtain the left channel spectrum coefficient and the right channel spectrum coefficient;
and performing inverse discrete cosine transform on the left channel spectrum coefficient and the right channel spectrum coefficient to obtain the left channel audio data and the right channel audio data.
8. A multi-channel audio coding system, comprising:
a module for performing discrete cosine transform on the left channel audio data and the right channel audio data to obtain a left channel spectrum coefficient and a right channel spectrum coefficient;
a module for performing parametric stereo coding on the left channel spectrum coefficient and the right channel spectrum coefficient to obtain a downmix spectrum coefficient and stereo parameters;
a module for carrying out standard coding flow on the downmix spectral coefficient, and carrying out quantization and arithmetic coding on the stereo parameter to obtain a corresponding coding result;
and a module for carrying out code stream encapsulation on the coding result to obtain a coded code stream.
9. A computer readable storage medium storing a computer program, wherein the computer program is operative to perform the multi-channel audio encoding method of any one of claims 1-7.
10. A computer device comprising a processor and a memory, the memory storing a computer program, wherein: the processor executes the computer program to perform the multi-channel audio coding method of any one of claims 1-7.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202311111144.XA CN117037816A (en) | 2023-08-31 | 2023-08-31 | Multi-channel audio coding method, system, medium and equipment |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202311111144.XA CN117037816A (en) | 2023-08-31 | 2023-08-31 | Multi-channel audio coding method, system, medium and equipment |
Publications (1)
Publication Number | Publication Date |
---|---|
CN117037816A (en) | 2023-11-10 |
Family
ID=88644918
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202311111144.XA Pending CN117037816A (en) | 2023-08-31 | 2023-08-31 | Multi-channel audio coding method, system, medium and equipment |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN117037816A (en) |
-
2023
- 2023-08-31 CN CN202311111144.XA patent/CN117037816A/en active Pending
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US10741187B2 (en) | Encoding of multi-channel audio signal to generate encoded binaural signal, and associated decoding of encoded binaural signal | |
KR102230727B1 (en) | Apparatus and method for encoding or decoding a multichannel signal using a wideband alignment parameter and a plurality of narrowband alignment parameters | |
US8433583B2 (en) | Audio decoding | |
US8655670B2 (en) | Audio encoder, audio decoder and related methods for processing multi-channel audio signals using complex prediction | |
KR101662681B1 (en) | Multi-channel audio encoder and method for encoding a multi-channel audio signal | |
RU2665214C1 (en) | Stereophonic coder and decoder of audio signals | |
CN102084418B (en) | Apparatus and method for adjusting spatial cue information of a multichannel audio signal | |
CA2750451C (en) | Upmixer, method and computer program for upmixing a downmix audio signal | |
KR101453732B1 (en) | Method and apparatus for encoding and decoding stereo signal and multi-channel signal | |
CN101410889A (en) | Controlling spatial audio coding parameters as a function of auditory events | |
JP2008530616A (en) | Near-transparent or transparent multi-channel encoder / decoder configuration | |
TWI792006B (en) | Audio synthesizer, signal generation method, and storage unit | |
CN104246873A (en) | Parametric encoder for encoding a multi-channel audio signal | |
KR20140139586A (en) | Method for parametric spatial audio coding and decoding, parametric spatial audio coder and parametric spatial audio decoder | |
JP2015517121A (en) | Inter-channel difference estimation method and spatial audio encoding device | |
CN112233682B (en) | Stereo encoding method, stereo decoding method and device | |
CN110462733A (en) | The decoding method and codec of multi-channel signal | |
EP2212883B1 (en) | An encoder | |
CN112151045B (en) | Stereo encoding method, stereo decoding method and device | |
CN117037816A (en) | Multi-channel audio coding method, system, medium and equipment | |
WO2009068086A1 (en) | Mutichannel audio encoder, decoder, and method thereof | |
MX2008011994A (en) | Generation of spatial downmixes from parametric representations of multi channel signals. | |
GB2582916A (en) | Spatial audio representation and associated rendering | |
MX2008010631A (en) | Audio encoding and decoding |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||