CN107358961B

CN107358961B - Coding method and coder for multi-channel signal

Info

Publication number: CN107358961B
Application number: CN201610305243.5A
Authority: CN
Inventors: 张兴涛; 刘泽新; 苗磊
Original assignee: Huawei Technologies Co Ltd
Current assignee: Huawei Technologies Co Ltd
Priority date: 2016-05-10
Filing date: 2016-05-10
Publication date: 2021-09-17
Anticipated expiration: 2036-05-10
Also published as: WO2017193549A1; CN107358961A

Abstract

Embodiments of the invention provide a coding method and a coder for a multi-channel signal. The method comprises: constructing a target frequency domain signal from the multi-channel signal such that the phase of the target frequency domain signal is linearly related to the IPD between the first channel and the second channel; transforming the target frequency domain signal into a target time domain signal; and extracting an ITD parameter based on the target time domain signal.

Description

Coding method and coder for multi-channel signal

Technical Field

Embodiments of the present invention relate to the field of audio coding, and more particularly, to a method and an encoder for encoding a multi-channel signal.

Background

With the improvement of quality of life, people's demand for high-quality audio is increasing. Compared with single-channel audio, stereo audio has the sense of direction and distribution of each sound source, and can improve the definition, intelligibility and presence of sound, thereby being popular among people.

Stereo processing techniques mainly include mid/side (MS, i.e. sum/difference) coding, Intensity Stereo (IS) coding, and Parametric Stereo (PS) coding.

MS coding applies a sum and difference transformation to the two signals based on the correlation between the channels, so that the energy is mainly concentrated in the sum channel and the redundancy between the channels is removed. In the MS coding technique, the bit rate saving depends on the correlation of the input signals; when the correlation between the left and right channel signals is poor, the left channel signal and the right channel signal need to be transmitted separately. IS coding simplifies the high frequency components of the left and right signals, based on the characteristic that the human auditory system is insensitive to the fine structure of the phase difference of the high frequency components (for example, components above 2 kHz) of the channels. However, the IS coding technique is only effective for high frequency components; if IS coding is extended to low frequencies, serious artifacts will be caused. PS coding is based on a binaural auditory model: at the encoding end, the stereo signal is converted into a mono signal and a small number of spatial parameters (or spatial perceptual parameters) describing the spatial sound field, as shown in fig. 1 (in fig. 1, x_L is the left channel time domain signal and x_R is the right channel time domain signal). After obtaining the mono signal, the decoding end restores the stereo signal by further combining the spatial parameters, as shown in fig. 2. Compared with MS coding, PS coding has a high compression ratio, can obtain a higher coding gain while maintaining good sound quality, can work over the full audio bandwidth, and can restore the spatial perception effect of stereo sound well.
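The sum and difference transformation that MS coding relies on can be illustrated with a minimal sketch. The scaling by 1/2 and the function names `ms_encode` and `ms_decode` are assumptions chosen for illustration; the text above does not specify a normalization.

```python
def ms_encode(left, right):
    """Sum/difference transform (assumed scaling: mid=(L+R)/2, side=(L-R)/2)."""
    mid = [(l + r) / 2 for l, r in zip(left, right)]
    side = [(l - r) / 2 for l, r in zip(left, right)]
    return mid, side


def ms_decode(mid, side):
    """Inverse transform: L = mid + side, R = mid - side."""
    left = [m + s for m, s in zip(mid, side)]
    right = [m - s for m, s in zip(mid, side)]
    return left, right


def energy(signal):
    """Sum of squared sample values."""
    return sum(v * v for v in signal)


# Two strongly correlated channels: most of the energy ends up in the sum channel.
left = [1.0, 0.5, -0.25, 0.75]
right = [0.9, 0.6, -0.20, 0.70]
mid, side = ms_encode(left, right)
restored_left, restored_right = ms_decode(mid, side)
```

For correlated inputs such as these, `energy(side)` is much smaller than `energy(mid)`, which is where the bit rate saving comes from; for weakly correlated inputs the side channel carries comparable energy and the saving disappears.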

In PS coding, the spatial parameters include Inter-channel Correlation (IC), Inter-channel Level Difference (ILD), Inter-channel Time Difference (ITD), and Inter-channel Phase Difference (IPD). IC describes the cross-correlation or coherence between the channels; it determines the perceived width of the sound field and can improve the spatial impression and stability of the audio signal. ILD describes the intensity difference between the channels and is used to resolve the horizontal direction angle of stereo sources; this parameter affects the frequency components of the entire spectrum. ITD and IPD are spatial parameters representing the horizontal orientation of the sound source; they describe the time and phase differences between the channels and mainly affect the frequency components below 2 kHz. ILD, ITD and IPD determine how the human ear perceives the position of a sound source, can effectively locate the sound field, and play an important role in restoring the stereo signal.

The encoding flow of the ITD parameters is shown in fig. 3. As can be seen from fig. 3, in the prior art, the extraction of the ITD parameters is implemented based on frequency domain signals. The main steps of encoding of the ITD parameters include:

Step 1. Perform time-frequency transformation on the left and right channel time domain signals respectively to obtain the frequency domain signals of the left and right channels.

Specifically, the following formulas may be adopted for the time-frequency transformation:

X_L(k)＝\sum_{n=0}^{Length-1} x_L(n)*e^{-j2\pi kn/L} (1)

X_R(k)＝\sum_{n=0}^{Length-1} x_R(n)*e^{-j2\pi kn/L} (2)

where x_L(n) and x_R(n) are the time domain signals of the left and right channels respectively, Length is the frame length or the subframe length, and L is the length of the time-frequency transformation.
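As an illustration of step 1, the sketch below assumes that the time-frequency transformation is a standard discrete Fourier transform of length L applied to a frame of Length samples (implicitly zero-padded when Length is smaller than L). The function name `dft` is hypothetical, and a real encoder would use an FFT rather than this direct O(L²) sum.

```python
import cmath


def dft(frame, L):
    """X(k) = sum over n of x(n) * exp(-j*2*pi*k*n/L), for 0 <= k < L.

    `frame` holds Length samples; samples beyond Length are treated as zero.
    """
    return [
        sum(x * cmath.exp(-2j * cmath.pi * k * n / L) for n, x in enumerate(frame))
        for k in range(L)
    ]


# A unit impulse has a flat spectrum; a two-sample frame shows the zero-padding.
impulse_spectrum = dft([1.0, 0.0, 0.0, 0.0], 4)
short_frame_spectrum = dft([1.0, 1.0], 4)
```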

Step 2. Extract the ITD parameter based on the frequency domain signals of the left and right channels.

In particular, step 2 may be subdivided into the following steps:

Step 2.1. Based on formula (3), calculate the IPD parameter frequency point by frequency point within a preset range [k1, k2]:

IPD(k)＝∠L(k)*R^*(k),k₁≤k≤k₂ (3)

where k represents a frequency point, L(k) and R(k) are respectively the kth frequency point values of the left and right channel frequency domain signals, each frequency point value comprises a real part and an imaginary part, and R^*(k) represents the conjugate of the kth frequency point value of the right channel frequency domain signal. The real and imaginary parts of L(k) and R(k) may be constructed based on X_L(k) and X_R(k); see the prior art for details.

Step 2.2. Calculate the inter-channel time difference of each frequency point based on formula (4):

where L is the time-frequency transformation length used when the time domain signals of the left and right channels are transformed into the frequency domain signals of the left and right channels, and π is the ratio of a circle's circumference to its diameter.

Step 2.3. Perform statistical processing on ITD(k) to obtain the ITD parameter.

Specifically, after ITD(k) has been obtained within the range [k1, k2], the number N_pos of positive ITD(k) values and the number N_neg of negative ITD(k) values are counted; the mean M_pos and variance V_pos of the positive ITD(k) values and the mean M_neg and variance V_neg of the negative ITD(k) values are then calculated; finally, the ITD parameter of the current frame/subframe is obtained from N_pos, N_neg, M_pos, M_neg, V_pos and V_neg. For example, when N_pos > N_neg and V_pos < V_neg, the ITD parameter is M_pos rounded up.

Step 3. Quantize the extracted ITD parameter.

The decoding end can combine the single-channel signal and the decoded ITD parameter to recover the stereo phase information.

As can be seen from formula (4), the prior art calculates the ITD based on the IPD. However, for signals with a large time delay, the IPD may exceed a range of 2π; in that case the ITD parameter extracted by the prior art may be inaccurate, which degrades the quality of the decoded audio.
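The wrapping problem can be made concrete with a small numerical sketch. It assumes a pure inter-channel delay of `delay` samples, an IPD measured as an angle in (-π, π], and the conventional per-bin relation ITD(k) = IPD(k)·L/(2πk); the exact form and sign convention of the prior-art formula are assumptions here, and the function names are hypothetical.

```python
import math


def wrapped_ipd(k, delay, L):
    """Phase difference at bin k for a pure delay, wrapped into (-pi, pi]."""
    true_phase = 2 * math.pi * k * delay / L
    return math.atan2(math.sin(true_phase), math.cos(true_phase))


def itd_from_ipd(ipd, k, L):
    """Per-bin time difference under the assumed relation ITD = IPD*L/(2*pi*k)."""
    return ipd * L / (2 * math.pi * k)


L = 512      # time-frequency transformation length
delay = 40   # true inter-channel delay in samples (illustrative value)
estimates = {k: itd_from_ipd(wrapped_ipd(k, delay, L), k, L) for k in (2, 6, 10)}
```

With these values, bins 2 and 6 recover the delay of 40 samples, but at bin 10 the true phase exceeds π, the measured IPD wraps, and the per-bin estimate becomes about -11.2 samples. Any statistic computed over such bins is biased, which is the inaccuracy the paragraph above refers to.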

Disclosure of Invention

The application provides an encoding method and an encoder of a multi-channel signal, so as to accurately extract ITD parameters of the multi-channel signal.

In a first aspect, a method for encoding a multi-channel signal is provided, including: acquiring a multi-channel signal; generating a target frequency domain signal according to the multi-channel signal, wherein the phase of the target frequency domain signal is linearly related to the IPD of the multi-channel signal; performing frequency-time transformation on the target frequency domain signal to obtain a target time domain signal; determining ITD parameters of the multi-channel signals according to the target time domain signals; encoding the ITD parameters.

Because the phase of the constructed target frequency domain signal is linearly related to the IPD of the multi-channel signal, the maximum value of the target time domain signal obtained by performing frequency-time transformation on the target frequency domain signal is positioned at the ITD, and the ITD parameter obtained by using the target time domain signal is not influenced by whether the IPD of the multi-channel signal exceeds the range of 2 pi or not, so that the method is more accurate.

In some implementations, the phase of the target frequency domain signal is the IPD of the multi-channel signal. It is to be understood that the frequency-domain signal may be represented by a complex number, and the complex number may be represented by an amplitude and a phase, and the phase of the target frequency-domain signal may refer to a phase representing the complex number constituting the target frequency-domain signal.

With reference to the first aspect, in a first implementation manner of the first aspect, the generating a target frequency domain signal according to the multi-channel signal includes: determining the amplitude of the target frequency domain signal according to the multi-channel signal; determining IPD parameters of the multi-channel signals according to the multi-channel signals; and generating the target frequency domain signal according to the amplitude of the target frequency domain signal and the IPD parameter of the multi-channel signal.

In combination with the first aspect, in a second implementation manner of the first aspect, the determining an amplitude of the target frequency domain signal according to the multi-channel signal includes: according to

determining the amplitude of the target frequency domain signal, wherein A_M(k) represents the amplitude of the target frequency domain signal, A₁(k) represents the amplitude of the frequency domain signal of a first channel of the multi-channel signal, A₂(k) represents the amplitude of the frequency domain signal of a second channel of the multi-channel signal, k represents a frequency point, 0≤k≤L/2, and L represents the time-frequency transformation length used when the multi-channel signal is transformed from the time domain to the frequency domain.

With reference to the first or the second implementation manner of the first aspect, in a third implementation manner of the first aspect, the generating the target frequency domain signal according to the amplitude of the target frequency domain signal and the IPD parameter of the multi-channel signal includes: according to

X_{M_real}(k)＝A_M(k)*cos(IPD(k))

X_{M_image}(k)＝A_M(k)*sin(IPD(k))

generating the target frequency domain signal, wherein A_M(k) represents the amplitude of the target frequency domain signal, X_{M_real}(k) represents the real part of the target frequency domain signal, X_{M_image}(k) represents the imaginary part of the target frequency domain signal, IPD(k) represents the IPD parameter, k represents a frequency point, 0≤k≤L/2, and L represents the time-frequency transformation length used when the multi-channel signal is transformed from the time domain to the frequency domain.

With reference to any one of the first to third implementation manners of the first aspect, in a fourth implementation manner of the first aspect, the generating a target frequency domain signal according to the multi-channel signal includes: generating the target frequency domain signal according to X_M(k)＝X₁(k)*X₂^*(k), wherein X_M(k) represents the target frequency domain signal, X₁(k) represents the frequency domain signal of a first channel of the multi-channel signal, X₂^*(k) represents the conjugate of the frequency domain signal of a second channel of the multi-channel signal, k represents a frequency point, 0≤k≤L/2, and L represents the time-frequency transformation length used when the multi-channel signal is transformed from the time domain to the frequency domain.

The target frequency domain signal is constructed in a mode of directly multiplying the frequency domain signal of the first channel and the conjugate of the frequency domain signal of the second channel, and therefore the coding efficiency can be improved.

With reference to the first aspect or any one of the first to third implementation manners of the first aspect, in a fifth implementation manner of the first aspect, the generating a target frequency domain signal according to the multi-channel signal includes: determining a frequency domain signal X_M(k) according to X_M(k)＝X₁(k)*X₂^*(k), wherein X₁(k) represents the frequency domain signal of a first channel of the multi-channel signal, X₂^*(k) represents the conjugate of the frequency domain signal of a second channel of the multi-channel signal, k represents a frequency point, 0≤k≤L/2, and L represents the time-frequency transformation length used when the multi-channel signal is transformed from the time domain to the frequency domain; and normalizing the amplitude of the frequency domain signal X_M(k) to obtain the target frequency domain signal.

With reference to the first aspect or any one of the first to fifth implementation manners of the first aspect, in a sixth implementation manner of the first aspect, the determining the ITD parameter of the multi-channel signal according to the target time domain signal includes: selecting a target sampling point from N sampling points of the target time domain signal, wherein the target sampling point is the sampling point with the largest sampling value among the N sampling points, and N represents the number of sampling points of the target time domain signal; and determining the ITD parameter of the multi-channel signal according to the index value corresponding to the target sampling point, wherein the index value indicates the position of the target sampling point in the ordering of the N sampling points, in other words, which of the N sampling points the target sampling point is. For example, the range of the index values of the N sampling points may be (-N/2, N/2); if the target sampling point is the last of the N sampling points, the index value corresponding to the target sampling point is N/2.

With reference to the sixth implementation manner of the first aspect, in a seventh implementation manner of the first aspect, the determining the ITD parameter according to the index value corresponding to the target sampling point includes: and determining the index value corresponding to the target sampling point as the ITD parameter of the multi-channel signal.

With reference to the first aspect or any one of the first to seventh implementations of the first aspect, in an eighth implementation of the first aspect, before the determining the ITD parameters of the multi-channel signal according to the target time-domain signal, the method further includes: and smoothing the sampling value of the target time domain signal.

By smoothing the sampling value of the target time domain signal, calculation errors caused by noise interference can be effectively avoided.

With reference to the first aspect or any one of the first to eighth implementation manners of the first aspect, in a ninth implementation manner of the first aspect, the frequency-time transforming the target frequency-domain signal to obtain a target time-domain signal includes: and performing frequency-time transformation on part of the frequency domain signals in the target frequency domain signals to obtain the target time domain signals.

And selecting partial frequency domain signals in the target frequency domain signals to perform frequency-time conversion, so that the encoding complexity can be effectively reduced.

In a second aspect, there is provided an encoder comprising means capable of performing the steps of the encoding method of the first aspect.

In a third aspect, an encoder is provided, comprising a memory for storing a program and a processor for executing the program, wherein the processor performs the method of the first aspect or any one of the implementations of the first aspect when the program is executed.

In some implementations, the target frequency-domain signal may be a cross-correlation signal of the frequency-domain signals of the multiple channels.

In some implementations, the phase of the target frequency domain signal may be an IPD of the multi-channel signal. It is to be understood that the frequency-domain signal may be represented by a complex number, and the complex number may be represented by an amplitude and a phase, and the phase of the target frequency-domain signal may refer to a phase representing the complex number constituting the target frequency-domain signal.

Drawings

In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings needed to be used in the embodiments of the present invention will be briefly described below, and it is obvious that the drawings described below are only some embodiments of the present invention, and it is obvious for those skilled in the art to obtain other drawings based on these drawings without creative efforts.

Fig. 1 is a flowchart of PS encoding in the prior art.

Fig. 2 is a flowchart of PS decoding in the prior art.

Fig. 3 is a flowchart of ITD parameter encoding in the prior art.

Fig. 4 is a schematic flowchart of an encoding method of a multi-channel signal according to an embodiment of the present invention.

Fig. 5 is a schematic flowchart of an encoding method of a multi-channel signal according to an embodiment of the present invention.

Fig. 6 is a schematic flowchart of an encoding method of a multi-channel signal according to an embodiment of the present invention.

Fig. 7 is a schematic configuration diagram of an encoder of the embodiment of the present invention.

Fig. 8 is a schematic configuration diagram of an encoder of the embodiment of the present invention.

Detailed Description

For ease of understanding, the meaning of multi-channel ILD, ITD, IPD is briefly introduced. Taking the signal picked up by the first microphone as the first channel signal and the signal picked up by the second microphone as the second channel signal as an example:

the ILD describes the difference in intensity between the first channel signal and the second channel signal; if the ILD is larger than 0, the energy of the first channel signal is higher than that of the second channel signal; if ILD equals 0, it means that the energy of the first channel signal equals the energy of the second channel signal; if the ILD is less than 0, it indicates that the energy of the first channel signal is less than the energy of the second channel signal;

the ITD describes the time difference between the first channel signal and the second channel signal, namely the time difference of the sound source reaching the first microphone and the second microphone, if the ITD is more than 0, the time of the sound source reaching the first microphone is earlier than the time of the sound source reaching the second microphone; if ITD equals 0, it indicates that the sound source arrives at the first microphone and the second microphone simultaneously; if the ITD is less than 0, it indicates that the sound source arrives at the first microphone later than the sound source arrives at the second microphone;

IPD describes the phase difference between the first channel signal and the second channel signal, which is usually combined with the ITD parameters to recover the phase information of the multi-channel signal at the decoding end.

The encoding method according to the embodiment of the present invention is described in detail below with reference to fig. 4 to 6.

Fig. 4 is a schematic flow chart of an encoding method of an embodiment of the present invention. The method of fig. 4 includes:

410. A multi-channel signal is acquired.

It should be understood that the multi-channel signal may be a multi-channel time domain signal or a multi-channel frequency domain signal.

420. A target frequency domain signal is generated according to the multi-channel signal.

In some embodiments, the phase of the target frequency domain signal may be linearly related to the IPD of the multi-channel signal. In one example, the phase of the target frequency domain signal may be IPD, i.e., a linear correlation coefficient of 1.

The implementation of step 420 can be various, and will be described in detail with reference to specific examples, which are not described in detail here.

430. Frequency-time transformation is performed on the target frequency domain signal to obtain a target time domain signal.

In some embodiments, all of the target frequency domain signals may be frequency-to-time transformed to obtain target time domain signals. In some embodiments, a portion of the frequency domain signal in the target frequency domain signal may be subjected to frequency-time transformation to obtain a target time domain signal, which may reduce the encoding complexity.

It should be noted that, in the embodiment of the present invention, a selection manner of a part of the frequency domain signals in the target frequency domain signal is not specifically limited. In some embodiments, assuming that the spectral range of the target frequency-domain signal is [0, F ], the selected partial frequency-domain signal may be a low-frequency portion of the target frequency-domain signal, such as [0, F/2], [3, F/4] or [ F/4, F/2] portion of the target frequency-domain signal, and this selection is based on: for a stable signal, the results obtained based on the low frequency part of the signal (i.e. the ITD parameters of the multi-channels) do not differ much from the results obtained based on the entire spectrum of the signal.

440. The ITD parameters of the multi-channel signal are determined according to the target time domain signal.

In some embodiments, ITD parameters of a multi-channel signal may be determined from a target time domain signal.

As an optional implementation manner, a target sampling point may be selected from N sampling points of a target time domain signal, where the target sampling point is a sampling point with a largest sampling value among the N sampling points, and N represents the number of sampling points of the target time domain signal; and determining the ITD parameters of the multi-channel signal according to the index values corresponding to the target sampling points, wherein the index values are used for indicating the sequencing of the target sampling points in the N sampling points.

For example, an index value corresponding to the target sampling point may be determined as an ITD parameter of the multi-channel signal. For another example, the index value corresponding to the target sampling point may be transformed according to a preset rule, and the transform result may be determined as the ITD parameter of the multi-channel signal.

In some embodiments, the sampled values of the target time domain signal may be smoothed prior to determining the ITD parameters from the target time domain signal.

450. The ITD parameters of the multi-channel signal are encoded.

In particular, ITD parameters of the multi-channel signal may be quantized.

In some embodiments, the method of fig. 4 may further include: and transmitting the ITD parameters of the encoded multi-channel signal to a decoding end.

In addition, the encoding method of fig. 4 may further include: down-mixing the multi-channel time domain signals to obtain a single-channel signal; coding the single sound channel signal to obtain a bit stream corresponding to the single sound channel signal; carrying out bit stream multiplexing on a bit stream corresponding to the single-channel signal and a bit stream corresponding to the spatial parameter; and transmitting the multiplexed bit stream to a decoding end.

The decoding end can restore stereo sound by combining the decoded mono signal and the ITD parameters in a manner similar to the prior art.
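The flow of steps 410 to 440 can be sketched end to end as follows. This is only an illustration under stated assumptions: the target frequency domain signal is taken to be the product of the first channel's spectrum and the conjugate of the second channel's spectrum, the transforms are plain DFT/IDFT sums over the full spectrum, the test signal is a circularly shifted noise frame, and the final sign flip is an assumed preset rule chosen so that a positive result means the first channel leads. The names `dft`, `idft` and `estimate_itd` are hypothetical.

```python
import cmath
import random


def dft(x):
    """Forward DFT with kernel exp(-j*2*pi*k*n/N)."""
    N = len(x)
    return [sum(x[n] * cmath.exp(-2j * cmath.pi * k * n / N) for n in range(N))
            for k in range(N)]


def idft(X):
    """Inverse DFT with kernel exp(+j*2*pi*k*n/N) and 1/N scaling."""
    N = len(X)
    return [sum(X[k] * cmath.exp(2j * cmath.pi * k * n / N) for k in range(N)) / N
            for n in range(N)]


def estimate_itd(ch1, ch2):
    """Estimate the time difference (in samples) between two equal-length frames."""
    X1, X2 = dft(ch1), dft(ch2)                                 # time -> frequency
    target_freq = [a * b.conjugate() for a, b in zip(X1, X2)]   # phase equals the IPD
    target_time = idft(target_freq)                             # frequency -> time
    N = len(target_time)
    index = max(range(N), key=lambda n: abs(target_time[n]))    # largest sample
    lag = index if index < N // 2 else index - N                # signed index
    return -lag  # assumed preset rule: positive when ch1 leads ch2


rng = random.Random(0)
N, delay = 64, 5
ch1 = [rng.uniform(-1.0, 1.0) for _ in range(N)]
ch2 = [ch1[(n - delay) % N] for n in range(N)]   # ch2 lags ch1 by `delay` samples
itd = estimate_itd(ch1, ch2)
```

Because the estimate is read from the position of a time domain peak rather than from a per-bin phase, it does not depend on whether the phase difference at any individual frequency point has wrapped.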

The following description takes left and right channel signals as an example of the multi-channel signal, but the embodiments of the present invention are not limited thereto. In practice, the solution in the present application may be applied to any two channels of a two-channel or multi-channel signal; in a multi-channel scenario, the left and right channels referred to below may be any two of the channels.

Fig. 5 is a schematic flowchart of an encoding method of a multi-channel signal according to an embodiment of the present invention. In the embodiment corresponding to fig. 5, the target frequency domain signal is a frequency domain signal constructed by calculating, frequency bin by frequency bin, the amplitude of the mono frequency domain signal and the IPD of the left and right channel signals. It should be understood that the process steps or operations illustrated in fig. 5 are merely examples, and embodiments of the present invention may also perform other operations or variations of the operations in fig. 5. Moreover, the steps in fig. 5 may be performed in an order different from that presented in fig. 5, and not all of the operations in fig. 5 need to be performed.

510. Time-frequency transformation is performed on the time domain signals of the left and right channels respectively to obtain the frequency domain signals of the left and right channels.

Specifically, a Fast Fourier Transform (FFT) may be performed on the time domain signals of the left and right channels using formulas (5) and (6):

X_L(k)＝\sum_{n=0}^{Length-1} x_L(n)*e^{-j2\pi kn/L} (5)

X_R(k)＝\sum_{n=0}^{Length-1} x_R(n)*e^{-j2\pi kn/L} (6)

where x_L(n) and x_R(n) are the time domain signals of the left and right channels respectively, Length is the frame length or subframe length, k is the index value of a frequency point of the frequency domain signal, and L is the time-frequency transformation length.

The frequency domain signal obtained after the FFT is a complex signal containing a real part and an imaginary part. For the frequency domain signal of the left channel, the real part is X_{L_real}(k) and the imaginary part is X_{L_image}(k); for the frequency domain signal of the right channel, the real part is X_{R_real}(k) and the imaginary part is X_{R_image}(k).

specifically, taking the frequency domain signal of the left channel as an example, the values of the real part and the imaginary part thereof may be calculated as follows:

X_{L_real}(0)＝X_L(0),X_{L_image}(0)＝0 (7)

or,

X_{L_real}(0)＝X_L(0),X_{L_image}(0)＝0 (10)

It should be noted that, after the time-frequency transformation of a wideband (WB) signal, if the length of the time-frequency transformation is 512, the obtained frequency domain signal includes 256 frequency points, where frequency point 256 corresponds to 8 kHz, frequency point 128 corresponds to 4 kHz, and so on.
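The correspondence between frequency points and frequencies stated above follows from f = k·fs/L. The sketch below assumes a 16 kHz sampling rate, which is the rate implied by frequency point 256 of a length-512 transform mapping to 8 kHz; the function names are hypothetical.

```python
def bin_to_hz(k, fft_length=512, sample_rate_hz=16000):
    """Centre frequency of frequency point k (assumed 16 kHz wideband sampling)."""
    return k * sample_rate_hz / fft_length


def hz_to_bin(freq_hz, fft_length=512, sample_rate_hz=16000):
    """Nearest frequency point for a given frequency in Hz."""
    return round(freq_hz * fft_length / sample_rate_hz)


nyquist_hz = bin_to_hz(256)      # 8000.0
half_band_hz = bin_to_hz(128)    # 4000.0
two_khz_bin = hz_to_bin(2000)    # 64
```

Under the same assumption, the 2 kHz boundary mentioned earlier for ITD and IPD corresponds to frequency point 64.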

520. Frequency domain coefficient processing is performed on the frequency domain signals of the left and right channels to obtain the target frequency domain signal.

In some embodiments, the amplitude A_M(k) of the target frequency domain signal and the inter-channel phase difference IPD(k) may be calculated frequency point by frequency point, where k is a frequency point, 0≤k≤L/2, and L is the time-frequency transformation length used when the time domain signals of the left and right channels are transformed into the frequency domain signals of the left and right channels.

Specifically, the amplitude A_M(k) of the target frequency domain signal may be calculated first:

The amplitude of the frequency domain signal of the left channel may be:

the amplitude of the frequency domain signal of the right channel may be:

Then, IPD(k) of the left and right channel signals can be calculated:

IPD(k)＝∠L(k)*R^*(k),k₁≤k≤k₂ (16)

L(k) and R(k) are respectively the kth frequency point values of the left and right channel frequency domain signals, each frequency point value comprises a real part and an imaginary part, and R^*(k) represents the conjugate of the kth frequency point value of the right channel frequency domain signal. The real and imaginary parts of L(k) and R(k) may be constructed based on X_L(k) and X_R(k).

Formula (16) may be further rewritten as:

IPD(k)＝arctan(A″(k)/A′(k)) (17)

where:

A′(k)＝X_{L_real}(k)*X_{R_real}(k)+X_{L_image}(k)*X_{R_image}(k) (18)

A″(k)＝X_{L_image}(k)*X_{R_real}(k)-X_{L_real}(k)*X_{R_image}(k) (19)
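A′(k) and A″(k) are the real and imaginary parts of L(k)·R^*(k), so the IPD can be obtained directly from the real and imaginary parts of the two spectra. The sketch below checks this numerically; the use of a four-quadrant arctangent (`math.atan2`) to resolve the quadrant is an assumption, and the function name is hypothetical.

```python
import cmath
import math


def ipd_from_parts(l_real, l_imag, r_real, r_imag):
    """IPD at one frequency point from real/imaginary parts of both channels."""
    a1 = l_real * r_real + l_imag * r_imag   # A'(k), formula (18)
    a2 = l_imag * r_real - l_real * r_imag   # A''(k), formula (19)
    return math.atan2(a2, a1)                # four-quadrant arctangent (assumed)


# Left bin with phase 0.9 rad, right bin with phase 0.2 rad: expected IPD is 0.7 rad.
left = cmath.rect(2.0, 0.9)
right = cmath.rect(0.5, 0.2)
ipd = ipd_from_parts(left.real, left.imag, right.real, right.imag)
reference = cmath.phase(left * right.conjugate())
```

The result does not depend on the amplitudes 2.0 and 0.5, only on the phase difference between the two frequency point values.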

then, after obtaining the amplitude of the target frequency domain signal and the phase difference of the left and right channel signals, further processing to obtain the target frequency domain signal:

X_{M_real}(k)＝A_M(k)*cos(IPD(k)) (20)

X_{M_image}(k)＝A_M(k)*sin(IPD(k)) (21)

in some embodiments, after obtaining the amplitude of the target frequency domain signal and the IPD of the left and right channel signals, a table lookup method may be used to obtain the target frequency domain signal, for example, a sin function table and a cos function table are set, and the table lookup method is used to obtain the target frequency domain signal, which may effectively reduce the computational complexity of the algorithm.
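A table lookup of this kind might look like the following sketch. The table size of 256 entries, the nearest-entry quantization of the phase, and the names `TABLE_SIZE`, `COS_TABLE`, `SIN_TABLE` and `target_bin` are all assumptions for illustration; the text does not specify the table resolution.

```python
import math

TABLE_SIZE = 256  # hypothetical resolution: one entry per 2*pi/256 radians
COS_TABLE = [math.cos(2 * math.pi * i / TABLE_SIZE) for i in range(TABLE_SIZE)]
SIN_TABLE = [math.sin(2 * math.pi * i / TABLE_SIZE) for i in range(TABLE_SIZE)]


def target_bin(amplitude, ipd):
    """Real and imaginary parts A_M*cos(IPD), A_M*sin(IPD) via table lookup."""
    i = round(ipd / (2 * math.pi) * TABLE_SIZE) % TABLE_SIZE  # nearest table entry
    return amplitude * COS_TABLE[i], amplitude * SIN_TABLE[i]


real_part, imag_part = target_bin(1.0, 1.0)
```

The lookup trades accuracy for speed: with 256 entries the phase is quantized to steps of about 0.025 rad, so each component is off by at most roughly 0.012 times the amplitude.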

530. Frequency-time transformation is performed on the target frequency domain signal to obtain a target time domain signal.

In some embodiments, the target frequency domain signal may be windowed and subjected to an Inverse Discrete Fourier Transform (IDFT).

Specifically, the target frequency domain signal may be windowed first:

k is a frequency point, k is more than or equal to 0 and less than or equal to L/2, and L is a time-frequency transformation length adopted when the time domain signals of the left and right sound channels are transformed into the frequency domain signals of the left and right sound channels.

Then, performing IDFT on the windowed signal to obtain a target time domain signal:

wherein n is an index value of a sampling point of the time domain signal, and n is more than or equal to 0 and less than L/2.

In some embodiments, step 530 may use IDFT for frequency-time transformation, and may also use Inverse Fast Fourier Transform (IFFT) for frequency-time transformation.

In some embodiments, the frequency-time transformation may be performed only in a specific frequency domain range without performing the frequency-time transformation on all frequency points, so that the computational complexity of the algorithm may be effectively reduced. For example, frequency-time transformation may be performed within a frequency bin range [ k3, k4], where k3>0 and k4< L/2.

540. The amplitude of the target time domain signal is smoothed.

Specifically, the amplitude of the target time domain signal can be represented by the following formula:

The amplitude of the target time domain signal is smoothed to obtain the amplitude smoothing value A_sm(n):

A_sm(n)＝w₁*A_sm^{(-1)}(n)+w₂*A(n)

where A_sm^{(-1)}(n) is the amplitude smoothing value of the nth point of the previous frame/subframe of the current frame, and w₁ and w₂ are smoothing factors, which may be set to constants or determined according to A_sm^{(-1)}(n) and A(n), while satisfying w₁+w₂＝1. For example, w₁＝0.75 and w₂＝0.25, or w₁＝0.8 and w₂＝0.2, or w₁＝0.9 and w₂＝0.1.
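The smoothing step can be sketched as follows, assuming it is a weighted combination of the previous frame's smoothed amplitude (weight w₁) and the current amplitude (weight w₂) with w₁ + w₂ = 1. The assignment of w₁ to the previous frame and the function name `smooth_amplitude` are assumptions.

```python
def smooth_amplitude(prev_smoothed, current, w1=0.75, w2=0.25):
    """Per-sample smoothing: w1 * previous smoothed value + w2 * current value."""
    if abs(w1 + w2 - 1.0) > 1e-12:
        raise ValueError("smoothing factors must satisfy w1 + w2 = 1")
    return [w1 * p + w2 * a for p, a in zip(prev_smoothed, current)]


# The previous frames peaked at index 2; a noisy spike appears at index 1 now.
prev_smoothed = [0.0, 0.0, 4.0, 0.0]
current = [0.0, 8.0, 0.0, 0.0]
smoothed = smooth_amplitude(prev_smoothed, current)
peak_index = max(range(len(smoothed)), key=smoothed.__getitem__)
```

Here the unsmoothed maximum would jump to index 1, while the smoothed maximum stays at index 2, which is how smoothing suppresses calculation errors caused by a noisy frame.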

550. The index value corresponding to the sampling point with the maximum sampling value of the smoothed time domain signal is searched for to obtain the ITD parameter.

Specifically, the index value corresponding to the maximum sampling point of the smoothed time domain signal is searched for, that is, index＝argmax(A_sm(n)), and the ITD parameter is index.

As can be seen from equations (20) and (21), the phase of the target frequency domain signal obtained after the frequency domain coefficient processing is the IPD between the first channel signal and the second channel signal. Further, since there is a linear relationship between IPD and ITD, the target frequency domain signal can be approximately rewritten as follows:

After the frequency-time transformation of the target frequency domain signal, the maximum value of the target time domain signal will be located at the ITD.
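The peak location can be checked with a short derivation. As a sketch, assume the phase is exactly linear in k with the sign convention IPD(k) = -2πk·ITD/L for an integer ITD (the sign and the symbol K, denoting the set of frequency points used in the inverse transform, are assumptions made here for illustration):

```latex
\begin{aligned}
X_M(k) &\approx A_M(k)\, e^{-j 2\pi k\,\mathrm{ITD}/L}, \qquad A_M(k) \ge 0,\\
x_M(n) &= \frac{1}{L}\sum_{k \in K} X_M(k)\, e^{j 2\pi k n / L}
        \approx \frac{1}{L}\sum_{k \in K} A_M(k)\, e^{j 2\pi k (n - \mathrm{ITD})/L},\\
|x_M(n)| &\le \frac{1}{L}\sum_{k \in K} A_M(k) = x_M(\mathrm{ITD}).
\end{aligned}
```

At n = ITD every term has zero phase, so the terms add coherently and the amplitude reaches its upper bound. At other values of n the phases differ across frequency points and the terms generally cancel in part.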

Fig. 6 is a schematic flowchart of an encoding method of a multi-channel signal according to an embodiment of the present invention. In the embodiment corresponding to fig. 6, the target frequency domain signal is constructed mainly from the frequency domain signal of one of the left and right channels and the conjugate of the frequency domain signal of the other channel. It should be understood that the process steps or operations shown in fig. 6 are merely examples, and embodiments of the present invention may also perform other operations or variations of the operations in fig. 6. Moreover, the steps in fig. 6 may be performed in an order different from that presented in fig. 6, and not all of the operations in fig. 6 need to be performed. In addition, each step in fig. 6 corresponds to a step in fig. 5; only the processing in step 620 differs from that in step 520. For the other steps, refer to fig. 5; they are not described in detail here.

610. Perform time-frequency transformation on the time domain signals of the left and right channels, respectively, to obtain frequency domain signals of the left and right channels.

620. Obtain a target frequency domain signal by multiplying the frequency domain signal of one of the left and right channels by the conjugate of the frequency domain signal of the other channel.

It will be appreciated that the phase of the frequency domain signal resulting from the multiplication of the conjugate of the frequency domain signal of one channel and the frequency domain signal of the other channel is the IPD between the two channel signals.

Specifically, the target frequency domain signal X_M(k) can be calculated by the following formula:

X_M(k) = L(k) · R^*(k)    (30)

where L(k) and R(k) are the kth frequency point values of the left and right channel frequency domain signals, respectively, each frequency point value comprising a real part and an imaginary part; R^*(k) represents the conjugate of the kth frequency point value of the right channel frequency domain signal; and the real and imaginary parts of L(k) and R(k) may be constructed from X_L(k) and X_R(k).

Or

X_M(k) = R(k) · L^*(k)    (31)

where R(k) is the kth frequency point value of the right channel frequency domain signal, L^*(k) is the conjugate of the kth frequency point value of the left channel frequency domain signal, and 0 ≤ k ≤ L/2.

In some embodiments, after X_M(k) is obtained, X_M(k) may be further normalized to obtain the target frequency domain signal.

Specifically, the maximum value of X_M(k) may be calculated first:

Then the amplitude of X_M(k) is normalized:

630. Perform frequency-time transformation on the target frequency domain signal to obtain a target time domain signal.

640. Smooth the amplitude of the target time domain signal.

650. Search for the index value corresponding to the sampling point with the maximum sampling value of the smoothed time domain signal to obtain the ITD parameter.
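Steps 610 to 650 can be sketched end to end as follows. This is a minimal illustration under several assumptions rather than the embodiment itself: it uses a full-length FFT without windowing, normalizes by dividing by the largest magnitude of X_M(k), and applies w₁ to the previous frame's smoothed amplitude. The function name `estimate_itd` and the toy input are hypothetical.

```python
import numpy as np


def estimate_itd(left, right, prev_smoothed=None, w1=0.75, w2=0.25):
    """Sketch of steps 610-650: return (itd_index, smoothed_amplitude)."""
    # 610: time-frequency transformation of both channels (full-length FFT here).
    spec_l = np.fft.fft(left)
    spec_r = np.fft.fft(right)

    # 620: one channel times the conjugate of the other, as in formula (30).
    x_m = spec_l * np.conj(spec_r)

    # Optional normalization; dividing by the largest magnitude is an assumption.
    peak = np.max(np.abs(x_m))
    if peak > 0:
        x_m = x_m / peak

    # 630: frequency-time transformation, then take the amplitude.
    amplitude = np.abs(np.fft.ifft(x_m))

    # 640: smooth against the previous frame (weight assignment is assumed).
    if prev_smoothed is None:
        smoothed = amplitude
    else:
        smoothed = w1 * prev_smoothed + w2 * amplitude

    # 650: the index of the largest smoothed sample is taken as the ITD parameter.
    return int(np.argmax(smoothed)), smoothed


# Toy input: the left channel is the right channel delayed circularly by 5 samples.
rng = np.random.default_rng(0)
right = rng.standard_normal(256)
left = np.roll(right, 5)
itd, state = estimate_itd(left, right)
print(itd)
```

Because the transform is circular, a delay of the opposite sign shows up near the upper end of the index range in this sketch. Passing `state` back in as `prev_smoothed` for the next frame carries the smoothing history forward.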

The method of encoding a multi-channel signal according to an embodiment of the present invention is described in detail above with reference to fig. 4 to 6, and the encoder according to an embodiment of the present invention is described in detail below with reference to fig. 7 to 8. It is to be understood that the encoder in fig. 7 or fig. 8 can perform the respective steps in fig. 4 to fig. 6, and a detailed description thereof will not be provided to avoid repetition.

Fig. 7 is a schematic configuration diagram of an encoder of the embodiment of the present invention. The encoder 700 of fig. 7 includes:

an acquisition unit 710 for acquiring a multi-channel signal;

a generating unit 720, configured to generate a target frequency domain signal according to the multi-channel signal, where a phase of the target frequency domain signal is linearly related to an IPD of the multi-channel signal;

a frequency-time transformation unit 730, configured to perform frequency-time transformation on the target frequency domain signal to obtain a target time domain signal;

a determining unit 740, configured to determine an ITD parameter of the multi-channel signal according to the target time domain signal;

an encoding unit 750 for encoding the ITD parameter of the multi-channel signal.

Optionally, as an embodiment, the generating unit 720 is specifically configured to determine, according to the multi-channel signal, an amplitude of the target frequency domain signal; determining IPD parameters of the multi-channel signals according to the multi-channel signals; and generating the target frequency domain signal according to the amplitude of the target frequency domain signal and the IPD parameter of the multi-channel signal.

Optionally, as an embodiment, the generating unit 720 is specifically configured to generate the target frequency domain signal according to

where A_M(k) represents the amplitude of the target frequency domain signal, X_{M_real}(k) represents the real part of the target frequency domain signal, X_{M_image}(k) represents the imaginary part of the target frequency domain signal, IPD(k) represents the IPD parameter, k represents a frequency point, 0 ≤ k ≤ L/2, and L represents the time-frequency transformation length used when the multi-channel signal is transformed from the time domain to the frequency domain.

Optionally, as an embodiment, the generating unit 720 is specifically configured to generate the target frequency domain signal according to X_M(k) = X₁(k) · X₂^*(k), where X_M(k) represents the target frequency domain signal, X₁(k) represents the frequency domain signal of a first channel of the multi-channel signal, X₂^*(k) represents the conjugate of the frequency domain signal of a second channel of the multi-channel signal, k represents a frequency point, 0 ≤ k ≤ L/2, and L represents the time-frequency transformation length used when the multi-channel signal is transformed from the time domain to the frequency domain.

Optionally, as an embodiment, the generating unit 720 is specifically configured to determine a frequency domain signal X_M(k) according to X_M(k) = X₁(k) · X₂^*(k), where X₁(k) represents the frequency domain signal of a first channel of the multi-channel signal, X₂^*(k) represents the conjugate of the frequency domain signal of a second channel of the multi-channel signal, k represents a frequency point, 0 ≤ k ≤ L/2, and L represents the time-frequency transformation length used when the multi-channel signal is transformed from the time domain to the frequency domain; and to normalize the amplitude of the frequency domain signal X_M(k) to obtain the target frequency domain signal.

Optionally, as an embodiment, the determining unit 740 is specifically configured to select a target sampling point from N sampling points of the target time domain signal, where the target sampling point is the sampling point with the largest sampling value among the N sampling points, and N represents the number of sampling points of the target time domain signal; and to determine the ITD parameter of the multi-channel signal according to the index value corresponding to the target sampling point, where the index value indicates the position of the target sampling point among the N sampling points.

Optionally, as an embodiment, the determining unit 740 is specifically configured to determine an index value corresponding to the target sampling point as an ITD parameter of the multi-channel signal.

Optionally, as an embodiment, the encoder 700 further includes a smoothing unit, configured to smooth the sampling values of the target time domain signal.

Optionally, as an embodiment, the frequency-time transforming unit 730 is specifically configured to perform frequency-time transformation on a part of the frequency domain signals in the target frequency domain signals to obtain the target time domain signals.

Fig. 8 is a schematic configuration diagram of an encoder of the embodiment of the present invention. The encoder 800 of fig. 8 includes:

a memory 810 for storing a program;

a processor 820 for executing a program in the memory 810, the processor 820 acquiring a multi-channel signal when the program is executed; generating a target frequency domain signal according to the multi-channel signal, wherein the phase of the target frequency domain signal is linearly related to the IPD of the multi-channel signal; performing frequency-time transformation on the target frequency domain signal to obtain a target time domain signal; determining ITD parameters of the multi-channel signals according to the target time domain signals; encoding the ITD parameters of the multi-channel signal.

Optionally, as an embodiment, the processor 820 is specifically configured to determine, according to the multi-channel signal, an amplitude of the target frequency domain signal; determining IPD parameters of the multi-channel signals according to the multi-channel signals; and generating the target frequency domain signal according to the amplitude of the target frequency domain signal and the IPD parameter of the multi-channel signal.

Optionally, as an embodiment, the processor 820 is specifically configured to generate the target frequency domain signal according to

where A_M(k) represents the amplitude of the target frequency domain signal, X_{M_real}(k) represents the real part of the target frequency domain signal, X_{M_image}(k) represents the imaginary part of the target frequency domain signal, IPD(k) represents the IPD parameter, k represents a frequency point, 0 ≤ k ≤ L/2, and L represents the time-frequency transformation length used when the multi-channel signal is transformed from the time domain to the frequency domain.

Optionally, as an embodiment, the processor 820 is specifically configured to generate the target frequency domain signal according to X_M(k) = X₁(k) · X₂^*(k), where X_M(k) represents the target frequency domain signal, X₁(k) represents the frequency domain signal of a first channel of the multi-channel signal, X₂^*(k) represents the conjugate of the frequency domain signal of a second channel of the multi-channel signal, k represents a frequency point, 0 ≤ k ≤ L/2, and L represents the time-frequency transformation length used when the multi-channel signal is transformed from the time domain to the frequency domain.

Optionally, as an embodiment, the processor 820 is specifically configured to determine a frequency domain signal X_M(k) according to X_M(k) = X₁(k) · X₂^*(k), where X₁(k) represents the frequency domain signal of a first channel of the multi-channel signal, X₂^*(k) represents the conjugate of the frequency domain signal of a second channel of the multi-channel signal, k represents a frequency point, 0 ≤ k ≤ L/2, and L represents the time-frequency transformation length used when the multi-channel signal is transformed from the time domain to the frequency domain; and to normalize the amplitude of the frequency domain signal X_M(k) to obtain the target frequency domain signal.

Optionally, as an embodiment, the processor 820 is specifically configured to select a target sampling point from N sampling points of the target time domain signal, where the target sampling point is the sampling point with the largest sampling value among the N sampling points, and N represents the number of sampling points of the target time domain signal; and to determine the ITD parameter of the multi-channel signal according to the index value corresponding to the target sampling point, where the index value indicates the position of the target sampling point among the N sampling points.

Optionally, as an embodiment, the processor 820 is specifically configured to determine an index value corresponding to the target sampling point as an ITD parameter of the multi-channel signal.

Optionally, as an embodiment, the processor 820 is further configured to perform smoothing processing on the sample value of the target time-domain signal.

Optionally, as an embodiment, the processor 820 is specifically configured to perform frequency-time transformation on a part of the target frequency-domain signal to obtain the target time-domain signal.

Those of ordinary skill in the art will appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware or combinations of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the implementation. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present invention.

It is clear to those skilled in the art that, for convenience and brevity of description, the specific working processes of the above-described systems, apparatuses and units may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again.

In the several embodiments provided in the present application, it should be understood that the disclosed system, apparatus and method may be implemented in other ways. For example, the above-described apparatus embodiments are merely illustrative, and for example, the division of the units is only one logical division, and other divisions may be realized in practice, for example, a plurality of units or components may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, devices or units, and may be in an electrical, mechanical or other form.

The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.

In addition, functional units in the embodiments of the present invention may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit.

The functions, if implemented in the form of software functional units and sold or used as a stand-alone product, may be stored in a computer readable storage medium. Based on such understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present invention. And the aforementioned storage medium includes: a U-disk, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk or an optical disk, and other various media capable of storing program codes.

The above description is only for the specific embodiments of the present invention, but the scope of the present invention is not limited thereto, and any person skilled in the art can easily conceive of the changes or substitutions within the technical scope of the present invention, and all the changes or substitutions should be covered within the scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.

Claims

1. A method of encoding a multi-channel signal, comprising:

acquiring a multi-channel signal;

generating a target frequency domain signal according to the multi-channel signal, wherein the phase of the target frequency domain signal is linearly related to the inter-channel phase difference IPD of the multi-channel signal;

performing frequency-time transformation on the target frequency domain signal to obtain a target time domain signal;

determining an inter-channel time difference ITD parameter of the multi-channel signal according to the target time domain signal;

encoding the ITD parameters.

2. The method of claim 1, wherein generating a target frequency domain signal from the multi-channel signal comprises:

determining the amplitude of the target frequency domain signal according to the multi-channel signal;

determining IPD parameters of the multi-channel signals according to the multi-channel signals;

and generating the target frequency domain signal according to the amplitude of the target frequency domain signal and the IPD parameter of the multi-channel signal.

3. The method of claim 2, wherein determining the magnitude of the target frequency domain signal from the multi-channel signal comprises:

determining the amplitude of the target frequency domain signal according to

where A_M(k) represents the amplitude of the target frequency domain signal, A₁(k) represents the amplitude of the frequency domain signal of a first channel of the multi-channel signal, A₂(k) represents the amplitude of the frequency domain signal of a second channel of the multi-channel signal, k represents a frequency point, 0 ≤ k ≤ L/2, and L represents the time-frequency transformation length used when the multi-channel signal is transformed from the time domain to the frequency domain.

4. The method of claim 2 or 3, wherein the generating the target frequency-domain signal according to the amplitude of the target frequency-domain signal and the IPD parameter of the multi-channel signal comprises:

according to

5. The method according to any of claims 1-3, wherein generating a target frequency domain signal from the multi-channel signal comprises:

generating the target frequency domain signal according to X_M(k) = X₁(k) · X₂^*(k), wherein X_M(k) represents the target frequency domain signal, X₁(k) represents the frequency domain signal of a first channel of the multi-channel signal, X₂^*(k) represents the conjugate of the frequency domain signal of a second channel of the multi-channel signal, k represents a frequency point, 0 ≤ k ≤ L/2, and L represents the time-frequency transformation length used when the multi-channel signal is transformed from the time domain to the frequency domain.

6. The method according to any of claims 1-3, wherein generating a target frequency domain signal from the multi-channel signal comprises:

determining a frequency domain signal X_M(k) according to X_M(k) = X₁(k) · X₂^*(k), wherein X₁(k) represents the frequency domain signal of a first channel of the multi-channel signal, X₂^*(k) represents the conjugate of the frequency domain signal of a second channel of the multi-channel signal, k represents a frequency point, 0 ≤ k ≤ L/2, and L represents the time-frequency transformation length used when the multi-channel signal is transformed from the time domain to the frequency domain;

normalizing the amplitude of the frequency domain signal X_M(k) to obtain the target frequency domain signal.

7. The method according to any of claims 1-3, wherein the determining ITD parameters of the multi-channel signal from the target time domain signal comprises:

determining the ITD parameter of the multi-channel signal according to the index value corresponding to the sampling point with the largest sampling value of the target time domain signal.

8. The method of claim 7, wherein determining the ITD parameter of the multi-channel signal according to the index value corresponding to the sampling point with the largest sampling value of the target time-domain signal comprises:

determining the index value as the ITD parameter.

9. A method according to any one of claims 1-3, wherein prior to said determining ITD parameters of the multi-channel signal from the target time domain signal, the method further comprises:

smoothing the sampling values of the target time domain signal.

10. The method according to any one of claims 1-3, wherein said frequency-time transforming said target frequency domain signal to obtain a target time domain signal comprises:

performing frequency-time transformation on a part of the target frequency domain signal to obtain the target time domain signal.

11. An encoder, comprising:

an acquisition unit configured to acquire a multi-channel signal;

a generating unit configured to generate a target frequency domain signal according to the multi-channel signal, wherein the phase of the target frequency domain signal is linearly related to the inter-channel phase difference IPD of the multi-channel signal;

a frequency-time transformation unit configured to perform frequency-time transformation on the target frequency domain signal to obtain a target time domain signal;

a determining unit configured to determine an inter-channel time difference ITD parameter of the multi-channel signal according to the target time domain signal;

an encoding unit configured to encode the ITD parameter.

12. The encoder of claim 11, wherein the generating unit is specifically configured to determine an amplitude of the target frequency domain signal based on the multi-channel signal; determining IPD parameters of the multi-channel signals according to the multi-channel signals; and generating the target frequency domain signal according to the amplitude of the target frequency domain signal and the IPD parameter of the multi-channel signal.

13. The encoder of claim 12, wherein the generating unit is specifically configured to be based on

14. The encoder according to claim 12 or 13, wherein the generating unit is specifically configured to generate the target frequency domain signal according to

wherein A_M(k) represents the amplitude of the target frequency domain signal, X_{M_real}(k) represents the real part of the target frequency domain signal, X_{M_image}(k) represents the imaginary part of the target frequency domain signal, IPD(k) represents the IPD parameter, k represents a frequency point, 0 ≤ k ≤ L/2, and L represents the time-frequency transformation length used when the multi-channel signal is transformed from the time domain to the frequency domain.

15. The encoder according to any of claims 11-13, wherein the generating unit is specifically configured to generate the target frequency domain signal according to X_M(k) = X₁(k) · X₂^*(k), wherein X_M(k) represents the target frequency domain signal, X₁(k) represents the frequency domain signal of a first channel of the multi-channel signal, X₂^*(k) represents the conjugate of the frequency domain signal of a second channel of the multi-channel signal, k represents a frequency point, 0 ≤ k ≤ L/2, and L represents the time-frequency transformation length used when the multi-channel signal is transformed from the time domain to the frequency domain.

16. The encoder according to any of claims 11-13, wherein the generating unit is specifically configured to determine a frequency domain signal X_M(k) according to X_M(k) = X₁(k) · X₂^*(k), wherein X₁(k) represents the frequency domain signal of a first channel of the multi-channel signal, X₂^*(k) represents the conjugate of the frequency domain signal of a second channel of the multi-channel signal, k represents a frequency point, 0 ≤ k ≤ L/2, and L represents the time-frequency transformation length used when the multi-channel signal is transformed from the time domain to the frequency domain; and to normalize the amplitude of the frequency domain signal X_M(k) to obtain the target frequency domain signal.

17. The encoder according to any of claims 11 to 13, wherein the determining unit is specifically configured to determine the ITD parameter of the multi-channel signal based on an index value corresponding to a sampling point of a target time domain signal having a largest sampling value.

18. The encoder of claim 17, wherein the determination unit is specifically configured to determine the index value as the ITD parameter.

19. The encoder of any one of claims 11-13, wherein the encoder further comprises:

a smoothing unit configured to smooth the sampling values of the target time domain signal.

20. The encoder according to any of claims 11 to 13, wherein the frequency-time transform unit is specifically configured to perform a frequency-time transform on a part of the target frequency-domain signal to obtain the target time-domain signal.