CN102804264A - Apparatus and method for extracting a direct/ambience signal from a downmix signal and spatial parametric information

Info

Publication number: CN102804264A
Application number: CN2011800140389A
Authority: CN
Inventors: Juha Vilkamo; Jan Plogsties; Bernhard Neugebauer; Jürgen Herre
Original assignee: Fraunhofer Gesellschaft zur Forderung der Angewandten Forschung eV
Current assignee: Fraunhofer Gesellschaft zur Forderung der Angewandten Forschung eV
Priority date: 2010-01-15
Filing date: 2011-01-11
Publication date: 2012-11-28
Anticipated expiration: 2031-01-11
Also published as: CA2786943C; US20120314876A1; AU2011206670A1; RU2568926C2; RU2012136027A; ES2587196T3; WO2011086060A1; CA2786943A1; EP2524370B1; AR079998A1; US9093063B2; JP2013517518A; TWI459376B; BR112012017551A2; KR20120109627A; EP2360681A1; EP2524370A1; AU2011206670B2; TW201142825A; CN102804264B

Abstract

An apparatus for extracting a direct/ambience signal from a downmix signal and spatial parametric information, wherein the downmix signal and the spatial parametric information represent a multi-channel audio signal having more channels than the downmix signal, and wherein the spatial parametric information comprises inter-channel relations of the multi-channel audio signal. The apparatus comprises a direct/ambience estimator and a direct/ambience extractor. The direct/ambience estimator is configured for estimating level information of a direct part and/or an ambient part of the multi-channel audio signal based on the spatial parametric information. The direct/ambience extractor is configured for extracting a direct signal part and/or an ambient signal part from the downmix signal based on the estimated level information of the direct part or the ambient part.

Description

Apparatus and method for extracting a direct/ambience signal from a downmix signal and spatial parametric information

Technical field

The present invention relates to audio signal processing and, in particular, to an apparatus and a method for extracting a direct/ambience signal from a downmix signal and spatial parametric information. Further embodiments of the invention relate to the use of direct/ambience separation for enhancing binaural reproduction of audio signals. Yet further embodiments relate to binaural reproduction of multi-channel sound, where multi-channel audio means audio having two or more channels. Typical audio content with multi-channel sound includes movie soundtracks and multi-channel music recordings.

Background

The human spatial hearing system tends to process sound roughly in two parts: a localizable or direct part on the one hand, and an unlocalizable or ambient part on the other. There are many audio processing applications, such as binaural sound reproduction and multi-channel upmixing, in which access to these two audio components is desirable.

Methods of direct/ambience separation are known in the art, as described for example in "Primary-ambient signal decomposition and vector-based localization for spatial audio coding and enhancement", Goodwin, Jot, IEEE International Conference on Acoustics, Speech and Signal Processing, April 2007; "Correlation-based ambience extraction from stereo recordings", Merimaa, Goodwin, Jot, AES 123rd Convention, New York, 2007; "Multiple-loudspeaker playback of stereo signals", C. Faller, AES, October 2007; "Primary-ambient decomposition of stereo audio signals using a complex similarity index", Goodwin et al., Publication No. US2009/0198356A1, August 2009; "Patent application title: Method to generate multi-channel audio signals from stereo signals", Inventor: Christof Faller, Agents: FISH & RICHARDSON P.C., Assignee: LG Electronics, Inc., Origin: Minneapolis, MN, US, IPC8 Class: AH04R500FI, USPC Class: 3811; and "Ambience generation for stereo signals", Avendano et al., Date issued: July 28, 2009, Application No.: 10/163,158, Filed: June 4, 2002. These methods can be used in a variety of applications. The state-of-the-art direct/ambience separation algorithms are based on an inter-channel signal comparison of stereo sound in frequency bands.

Moreover, binaural playback with ambience extraction is addressed in "Binaural 3-D audio rendering based on spatial audio scene coding", Goodwin, Jot, AES 123rd Convention, New York, 2007. Ambience extraction in connection with binaural reproduction is also described in J. Usher and J. Benesty, "Enhancement of spatial sound quality: a new reverberation-extraction audio upmixer", IEEE Transactions on Audio, Speech, and Language Processing, vol. 15, pp. 2141-2150, September 2007. The latter paper focuses on ambience extraction in stereo microphone recordings, using adaptive least-mean-square cross-channel filtering of the direct component in each channel. Spatial audio codecs, for example MPEG Surround, typically consist of a one- or two-channel audio stream combined with spatial side information, which extends the audio into multiple channels, as described in ISO/IEC 23003-1 (MPEG Surround) and in Breebaart, J., Herre, J., Villemoes, L., Jin, C., Kjörling, K., Plogsties, J., Koppens, J. (2006), "Multi-channel goes mobile: MPEG Surround binaural rendering", Proceedings of the 29th AES Conference, Seoul, Korea.

However, modern parametric audio coding technologies, such as MPEG Surround (MPS) and Parametric Stereo (PS), provide only a reduced number of audio downmix channels, in some cases only one, together with additional spatial side information. A comparison between the "original" input channels is then only possible after first decoding the sound into the intended output format.

Therefore, a concept for extracting a direct signal part or an ambient signal part from a downmix signal and spatial parametric information is required. However, there is no existing solution for direct/ambience extraction using the parametric side information.

It is therefore an object of the present invention to provide a concept for extracting a direct signal part or an ambient signal part from a downmix signal by using spatial parametric information.

This object is achieved by an apparatus according to claim 1, a method according to claim 15, or a computer program according to claim 16.

Summary of the invention

The basic idea underlying the present invention is that the above-mentioned direct/ambience extraction can be achieved when level information of a direct part or an ambient part of a multi-channel audio signal is estimated based on the spatial parametric information, and a direct signal part or an ambient signal part is extracted from a downmix signal based on the estimated level information. Here, the downmix signal and the spatial parametric information represent the multi-channel audio signal, which has more channels than the downmix signal. This approach allows direct and/or ambience extraction from a downmix signal having one or more input channels by using spatial parametric side information.

According to an embodiment of the present invention, an apparatus for extracting a direct and/or ambience signal from a downmix signal and spatial parametric information comprises a direct/ambience estimator and a direct/ambience extractor. The downmix signal and the spatial parametric information represent a multi-channel audio signal having more channels than the downmix signal. Moreover, the spatial parametric information comprises inter-channel relations of the multi-channel audio signal. The direct/ambience estimator is configured for estimating level information of a direct part or an ambient part of the multi-channel audio signal based on the spatial parametric information. The direct/ambience extractor is configured for extracting a direct signal part or an ambient signal part from the downmix signal based on the estimated level information of the direct part or the ambient part.

According to another embodiment of the present invention, an apparatus for extracting a direct and/or ambience signal from a downmix signal and spatial parametric information comprises a binaural direct sound rendering device, a binaural ambient sound rendering device and a combiner. The binaural direct sound rendering device is configured for processing the direct signal part to obtain a first binaural output signal. The binaural ambient sound rendering device is configured for processing the ambient signal part to obtain a second binaural output signal. The combiner is configured for combining the first and the second binaural output signals to obtain a combined binaural output signal. A binaural reproduction of an audio signal can thus be provided in which the direct signal part and the ambient signal part of the audio signal are processed separately.

Brief description of the drawings

Fig. 1 shows a block diagram of an embodiment of an apparatus for extracting a direct/ambience signal from a downmix signal and spatial parametric information representing a multi-channel audio signal;

Fig. 2 shows a block diagram of an embodiment of an apparatus for extracting a direct/ambience signal from a mono downmix signal and spatial parametric information representing a parametric stereo audio signal;

Fig. 3a shows a schematic illustration of a spectral decomposition of a multi-channel audio signal according to an embodiment of the present invention;

Fig. 3b shows a schematic illustration of calculating inter-channel relations of a multi-channel audio signal based on the spectral decomposition of Fig. 3a;

Fig. 4 shows a block diagram of an embodiment of a direct/ambience extractor including downmixing of the estimated level information;

Fig. 5 shows a block diagram of a further embodiment of a direct/ambience extractor that applies gain parameters to a downmix signal;

Fig. 6 shows a block diagram of a further embodiment of a direct/ambience extractor based on a least mean square (LMS) solution with channel crossmixing;

Fig. 7a shows a block diagram of an embodiment of a direct/ambience estimator using a stereo ambience estimation formula;

Fig. 7b shows a graph of an example of the direct-to-total energy ratio versus the inter-channel coherence;

Fig. 8 shows a block diagram of an encoder/decoder system according to an embodiment of the present invention;

Fig. 9a shows a block diagram of an overview of binaural direct sound rendering according to an embodiment of the present invention;

Fig. 9b shows a block diagram of details of the binaural direct sound rendering of Fig. 9a;

Fig. 10a shows a block diagram of an overview of binaural ambient sound rendering according to an embodiment of the present invention;

Fig. 10b shows a block diagram of details of the binaural ambient sound rendering of Fig. 10a;

Fig. 11 shows a conceptual block diagram of an embodiment of binaural reproduction of a multi-channel audio signal;

Fig. 12 shows an overall block diagram of an embodiment of direct/ambience extraction including binaural reproduction;

Fig. 13a shows a block diagram of an embodiment of an apparatus for extracting a direct/ambience signal from a mono downmix signal in a filter bank domain;

Fig. 13b shows a block diagram of an embodiment of the direct/ambience extraction block of Fig. 13a; and

Fig. 14 shows a schematic illustration of an example of an MPEG Surround decoding scheme according to a further embodiment of the present invention.

Detailed description

Fig. 1 shows a block diagram of an embodiment of an apparatus 100 for extracting a direct/ambience signal 125-1, 125-2 from a downmix signal 115 and spatial parametric information 105. As shown in Fig. 1, the downmix signal 115 and the spatial parametric information 105 represent a multi-channel audio signal 101 having more channels Ch₁…Ch_N than the downmix signal 115. The spatial parametric information 105 may comprise inter-channel relations of the multi-channel audio signal 101. In particular, the apparatus 100 comprises a direct/ambience estimator 110 and a direct/ambience extractor 120. The direct/ambience estimator 110 may be configured for estimating level information 113 of a direct part or an ambient part of the multi-channel audio signal 101 based on the spatial parametric information 105. The direct/ambience extractor 120 may be configured for extracting a direct signal part 125-1 or an ambient signal part 125-2 from the downmix signal 115 based on the estimated level information 113 of the direct part or the ambient part.

Fig. 2 shows a block diagram of an embodiment of an apparatus 200 for extracting a direct/ambience signal 125-1, 125-2 from a mono downmix signal 215 and spatial parametric information 105 representing a parametric stereo audio signal 201. The apparatus 200 of Fig. 2 essentially comprises the same blocks as the apparatus 100 of Fig. 1. Identical blocks having the same implementations and/or functions are therefore denoted by the same reference numerals. Moreover, the parametric stereo audio signal 201 of Fig. 2 may correspond to the multi-channel audio signal 101 of Fig. 1, and the mono downmix signal 215 of Fig. 2 may correspond to the downmix signal 115 of Fig. 1. In the embodiment of Fig. 2, the mono downmix signal 215 and the spatial parametric information 105 represent the parametric stereo audio signal 201. The parametric stereo audio signal may comprise a left channel indicated by "L" and a right channel indicated by "R". Here, the direct/ambience extractor 120 is configured to extract the direct signal part 125-1 or the ambient signal part 125-2 from the mono downmix signal 215 based on the estimated level information 113, which can be derived from the spatial parametric information 105 by the direct/ambience estimator 110.

In practice, the spatial parameters (spatial parametric information 105) in the embodiments of Fig. 1 or Fig. 2 refer in particular to MPEG Surround (MPS) or Parametric Stereo (PS) side information. These two technologies are state-of-the-art low-bitrate stereo or surround audio coding methods. Referring to Fig. 2, PS provides one downmix audio channel with spatial parameters; referring to Fig. 1, MPS provides one, two or more downmix audio channels with spatial parameters.

Specifically, the embodiments of Fig. 1 and Fig. 2 show clearly that the spatial parametric side information 105 can readily be used for direct and/or ambience extraction from a signal (i.e. downmix signal 115; 215) having one or more input channels.

The estimation of the direct and/or ambience levels (level information 113) is based on information about inter-channel relations or inter-channel differences, such as level differences and/or correlation. These values can be calculated from a stereo or multi-channel signal. Fig. 3a shows a schematic illustration of a spectral decomposition 300 of a multi-channel audio signal (Ch₁…Ch_N) used for calculating the inter-channel relations of the respective channels Ch₁…Ch_N. As can be seen in Fig. 3a, the spectral decomposition of an inspected channel Ch_i of the multi-channel audio signal (Ch₁…Ch_N), or of a linear combination R of the remaining channels, comprises a plurality of subbands 301, wherein each subband 303 of the plurality of subbands 301 extends along a horizontal axis (time axis 310) and holds subband values 305, as indicated by the small boxes of the time/frequency grid. Moreover, the subbands 303 are located consecutively along a vertical axis (frequency axis 320), corresponding to different frequency regions of a filter bank. In Fig. 3a, the respective time/frequency tiles of the channel Ch_i and of the linear combination R are indicated by dashed lines. Here, the index i denotes the channel Ch_i and R the linear combination of the remaining channels, while the indices n and k correspond to certain filter bank time slots 307 and filter bank subbands 303.

Based on these time/frequency tiles, located for example at the same time/frequency point (t₀, f₀) with respect to the time/frequency axes 310, 320, inter-channel relations 335, such as the inter-channel coherence (ICC_i) or the channel level difference (CLD_i) of the inspected channel Ch_i, can be determined in step 330, as shown in Fig. 3b. Here, the inter-channel relations ICC_i and CLD_i can be calculated using the following relations:

$ICC_{i} = \frac{\langle Ch_{i} R^{*} \rangle}{\sqrt{\langle Ch_{i} Ch_{i}^{*} \rangle \langle R R^{*} \rangle}}$

$\sigma_{i} = \frac{\langle Ch_{i} Ch_{i}^{*} \rangle}{\langle R R^{*} \rangle}$

where Ch_i is the inspected channel, R is the linear combination of the remaining channels, and the angle brackets ⟨…⟩ denote a time average. One example of a linear combination R of the remaining channels is their energy-normalized sum. Furthermore, the channel level difference (CLD_i) is typically the decibel value of the parameter σ_i.
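The two relations above can be illustrated with a minimal NumPy sketch. It is not taken from the patent: the function name `inter_channel_relations`, the small regularizer `eps`, and the use of `10*log10` for the decibel conversion (which treats σ_i as a power ratio) are assumptions made for illustration. The inputs are the complex subband values of one subband over a run of time slots.

```python
import numpy as np

def inter_channel_relations(ch_i, r, eps=1e-12):
    """Return (ICC_i, sigma_i, CLD_i in dB) for one filter bank subband.

    ch_i : complex subband values of the inspected channel over time slots
    r    : complex subband values of the linear combination R of the
           remaining channels, same length as ch_i
    eps  : hypothetical regularizer that avoids division by zero
    """
    ch_i = np.asarray(ch_i, dtype=complex)
    r = np.asarray(r, dtype=complex)
    cross = np.mean(ch_i * np.conj(r))   # <Ch_i R*>
    p_i = np.mean(np.abs(ch_i) ** 2)     # <Ch_i Ch_i*>
    p_r = np.mean(np.abs(r) ** 2)        # <R R*>
    icc = cross / np.sqrt(p_i * p_r + eps)   # complex in general
    sigma = p_i / (p_r + eps)
    cld_db = 10.0 * np.log10(sigma + eps)    # assumes sigma is a power ratio
    return icc, sigma, cld_db
```

For a channel that is a scaled copy of R, the coherence is close to 1 and σ_i is the squared scale factor. Whether the real part or the magnitude of the complex coherence is used downstream is left open here.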

With reference to the above equations, the channel level difference (CLD_i) or the parameter σ_i may correspond to the level P_i of the channel Ch_i normalized to the level P_R of the linear combination R of the remaining channels. Here, the levels P_i or P_R can be derived from the inter-channel level difference parameter ICLD_i of the channel Ch_i and from a linear combination ICLD_R of the inter-channel level difference parameters ICLD_j (j ≠ i) of the remaining channels.

Here, ICLD_i and ICLD_j each relate to a reference channel Ch_ref. In further embodiments, the inter-channel level difference parameters ICLD_i and ICLD_j may also relate to any other channel of the multi-channel audio signal (Ch₁…Ch_N) serving as the reference channel Ch_ref. This eventually leads to the same result for the channel level difference (CLD_i) or the parameter σ_i.

According to further embodiments, the inter-channel relations 335 of Fig. 3b may also be derived by operating on different or all pairs Ch_i, Ch_j of input channels of the multi-channel audio signal (Ch₁…Ch_N). In this case, pairwise calculated inter-channel coherence parameters ICC_{i,j} or channel level differences (CLD_{i,j}) or parameters σ_{i,j} (or ICLD_{i,j}) are obtained, where the indices (i, j) denote a certain pair of channels Ch_i and Ch_j.
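The pairwise variant can be sketched in the same way. This is a hedged illustration only: it assumes that the pairwise parameters follow the same form as the ICC_i and σ_i relations given earlier, with Ch_j taking the place of R, and the function name `pairwise_relations` and the regularizer `eps` are hypothetical.

```python
import numpy as np

def pairwise_relations(channels, eps=1e-12):
    """Pairwise ICC_{i,j} and sigma_{i,j} for one filter bank subband.

    channels : array of shape (N, T) holding the complex subband values of
               N channels over T time slots
    Returns two (N, N) arrays, indexed by the channel pair (i, j).
    """
    x = np.asarray(channels, dtype=complex)
    cov = x @ x.conj().T / x.shape[1]        # cov[i, j] = <Ch_i Ch_j*>
    power = np.real(np.diag(cov))            # <Ch_i Ch_i*>
    icc = cov / np.sqrt(np.outer(power, power) + eps)
    sigma = power[:, None] / (power[None, :] + eps)
    return icc, sigma
```

Entry (i, j) of each result relates channel Ch_i to channel Ch_j, so the diagonal of the coherence matrix is close to 1 and that of the level-ratio matrix is close to 1 as well.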

Fig. 4 shows a block diagram of an embodiment 400 of a direct/ambience extractor 420 that includes downmixing of the estimated level information 113. The embodiment of Fig. 4 essentially comprises the same blocks as the embodiment of Fig. 1. Identical blocks having similar implementations and/or functions are therefore denoted by the same reference numerals. However, the direct/ambience extractor 420 of Fig. 4, which may correspond to the direct/ambience extractor 120 of Fig. 1, is configured to downmix the estimated level information 113 of the direct part or the ambient part of the multi-channel audio signal to obtain downmixed level information of the direct part or the ambient part, and to extract the direct signal part 125-1 or the ambient signal part 125-2 from the downmix signal 115 based on the downmixed level information. As shown in Fig. 4, the spatial parametric information 105 can, for example, be derived from the multi-channel audio signal 101 (Ch₁…Ch_N) of Fig. 1 and may comprise the inter-channel relations 335 of Ch₁…Ch_N introduced in Fig. 3b. The spatial parametric information 105 of Fig. 4 further comprises downmixing information 410 to be fed into the direct/ambience extractor 420. In embodiments, the downmixing information 410 may characterize the downmix of an original multi-channel audio signal (e.g. the multi-channel audio signal 101 of Fig. 1) into the downmix signal 115. The downmixing may, for example, be performed by a downmixer (not shown) operating in any coding domain, such as the time domain or the frequency domain.

According to further embodiments, the direct/ambience extractor 420 is also configured to perform the downmix of the estimated level information 113 of the direct part or the ambient part of the multi-channel audio signal 101 by combining the estimated level information of the direct part with a coherent summation and the estimated level information of the ambient part with an incoherent summation.

It should be noted that the estimated level information may represent energy levels or power levels of the direct part or the ambient part, respectively.

In particular, the downmixing of the energies of the estimated direct/ambient part (i.e. level information 113) may be performed by assuming either full incoherence or full coherence between the channels. The following two formulas apply to downmixing based on incoherent or coherent summation, respectively.

For incoherent signals, the downmixed energy or downmixed level information can be calculated as $E_{DMX} = \sum_{i=1}^{N} g_{i}^{2} E_{Ch_{i}}$.

For coherent signals, the downmixed energy or downmixed level information can be calculated as $E_{DMX} = \left( \sum_{i=1}^{N} g_{i} \sqrt{E_{Ch_{i}}} \right)^{2}$.

Here, g_i is the downmix gain of channel Ch_i, which may be obtained from the downmixing information, and E_{Ch_i} denotes the energy of the direct/ambient part of a channel Ch_i of the multi-channel audio signal. As a typical example of an incoherent downmix, when 5.1 channels are downmixed into two channels, the energy of the left downmix can be:

$E_{L\_DMX} = E_{Left} + E_{Left\_surround} + 0.5 \cdot E_{Center}$
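The two downmix rules can be compared in a short sketch. The function names and the energy values are hypothetical; the gains follow the 5.1 example above, where a factor of 0.5 on the center energy corresponds to a center downmix gain of sqrt(0.5).

```python
import math

def downmix_energy_incoherent(gains, energies):
    """E_DMX = sum_i g_i^2 * E_Ch_i (channels assumed fully incoherent)."""
    return sum(g * g * e for g, e in zip(gains, energies))

def downmix_energy_coherent(gains, energies):
    """E_DMX = (sum_i g_i * sqrt(E_Ch_i))^2 (channels assumed fully coherent)."""
    return sum(g * math.sqrt(e) for g, e in zip(gains, energies)) ** 2

# Left downmix of a 5.1 signal: Left, Left surround, Center.
gains = [1.0, 1.0, math.sqrt(0.5)]
energies = [1.0, 0.5, 2.0]        # hypothetical per-channel energies

e_incoherent = downmix_energy_incoherent(gains, energies)
e_coherent = downmix_energy_coherent(gains, energies)
```

With non-negative gains, the coherent sum is never smaller than the incoherent one, because it additionally contains the cross terms between channels.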

Fig. 5 shows a further embodiment of a direct/ambience extractor 520 that applies gain parameters g_D, g_A to the downmix signal 115. The direct/ambience extractor 520 of Fig. 5 may correspond to the direct/ambience extractor 420 of Fig. 4. First, estimated level information of a direct part 545-1 or an ambient part 545-2 may be received from a direct/ambience estimator as described above. The received level information 545-1, 545-2 may be combined/downmixed in step 550 to obtain downmixed level information of the direct part 555-1 or the ambient part 555-2, respectively. Then, in step 560, gain parameters g_D 565-1 and g_A 565-2 may be derived from the downmixed level information 555-1, 555-2 for the direct part or the ambient part, respectively. Finally, the direct/ambience extractor 520 may apply the derived gain parameters 565-1, 565-2 to the downmix signal 115 (step 570), so that the direct signal part 125-1 or the ambient signal part 125-2 is obtained.

Here, it should be noted that in the embodiments of Figs. 1, 4 and 5, the downmix signal 115 may consist of a plurality of downmix channels (Ch₁…Ch_M) present at the input of the direct/ambience extractor 120, 420, 520, respectively.

In further embodiments, the direct/ambience extractor 520 is configured to determine a direct-to-total (DTT) or an ambient-to-total (ATT) energy ratio from the downmixed level information 555-1, 555-2 of the direct part or the ambient part, and to use extraction parameters based on the determined DTT or ATT energy ratio as the gain parameters 565-1, 565-2.

In yet further embodiments, the direct/ambience extractor 520 is configured to multiply the downmix signal 115 by a first extraction parameter sqrt(DTT) to obtain the direct signal part 125-1, and by a second extraction parameter sqrt(ATT) to obtain the ambient signal part 125-2. Here, the downmix signal 115 may correspond to the mono downmix signal 215 as shown in the embodiment of Fig. 2 ("mono downmix case").

In the mono downmix case, the ambience extraction can be done by applying sqrt(ATT) and sqrt(DTT). However, the same approach is also valid for multi-channel downmix signals, in particular by applying sqrt(ATT_i) and sqrt(DTT_i) to each channel Ch_i.

According to further embodiments, in case the downmix signal 115 comprises a plurality of channels ("multi-channel downmix case"), the direct/ambience extractor 520 may be configured to apply a first plurality of extraction parameters, e.g. sqrt(DTT_i), to the downmix signal 115 to obtain the direct signal part 125-1, and a second plurality of extraction parameters, e.g. sqrt(ATT_i), to the downmix signal 115 to obtain the ambient signal part 125-2. Here, the first and the second plurality of extraction parameters may each constitute a diagonal matrix.
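The gain-based extraction described in the preceding paragraphs can be sketched as follows. This is an illustration under stated assumptions rather than the patented implementation: the total energy is taken as the sum of the downmixed direct and ambient energies (so that DTT + ATT = 1), one gain per downmix channel is used instead of one gain per time/frequency tile, and the names `extract_direct_ambience` and `eps` are hypothetical.

```python
import numpy as np

def extract_direct_ambience(downmix, e_direct, e_ambient, eps=1e-12):
    """Split a downmix into direct and ambient parts with sqrt(DTT), sqrt(ATT).

    downmix   : array of shape (T,) for a mono downmix or (M, T) for M channels
    e_direct  : downmixed direct energy per downmix channel (scalar or length M)
    e_ambient : downmixed ambient energy per downmix channel (scalar or length M)
    """
    x = np.atleast_2d(np.asarray(downmix))
    e_d = np.atleast_1d(np.asarray(e_direct, dtype=float))
    e_a = np.atleast_1d(np.asarray(e_ambient, dtype=float))
    total = e_d + e_a + eps          # assumption: total = direct + ambient
    dtt = e_d / total                # direct-to-total energy ratio
    att = e_a / total                # ambient-to-total energy ratio
    g_d = np.sqrt(dtt)[:, None]      # gain g_D per downmix channel
    g_a = np.sqrt(att)[:, None]      # gain g_A per downmix channel
    return g_d * x, g_a * x          # each of shape (M, T)
```

Because the two gains satisfy g_D² + g_A² ≈ 1 under the stated assumption, the energies of the two output parts add up to the energy of the downmix sample by sample.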

In general, the direct/ambience extractor 120, 420, 520 may also be configured to extract the direct signal part 125-1 or the ambient signal part 125-2 by applying a square M×M extraction matrix to the downmix signal 115, wherein the size (M) of the square M×M extraction matrix corresponds to the number (M) of downmix channels (Ch₁…Ch_M).

The ambience extraction can therefore be described as the application of a square M×M extraction matrix, where M is the number of downmix channels (Ch₁…Ch_M). This covers all possible ways of manipulating the input signal to obtain the direct/ambience output, including the relatively simple approach based on the sqrt(ATT_i) and sqrt(DTT_i) parameters, which form the main elements of a square M×M extraction matrix configured as a diagonal matrix, and the LMS crossmixing approach, which is configured as a full matrix. The latter is described below. Note that the above approach of applying an M×M extraction matrix covers any number of channels, including one.

According to other embodiments, the extraction matrix need not be a square matrix of size M×M, because a smaller number of output channels may be used. In that case the extraction matrix has a reduced number of rows. An example is extracting a single direct signal instead of M.

Nor is it always necessary to take all M downmix channels as inputs, which would correspond to an extraction matrix with M columns. This may be relevant in particular for applications in which not all channels are needed as input signals.

Fig. 6 shows a block diagram of a further embodiment 600 of a direct/ambience extractor 620 based on an LMS (least mean square) solution with channel crossmixing. The direct/ambience extractor 620 of Fig. 6 may correspond to the direct/ambience extractor 120 of Fig. 1. In the embodiment of Fig. 6, blocks having similar implementations and/or functions as in the embodiment of Fig. 1 are therefore denoted by the same reference numerals. However, the downmix signal 615 of Fig. 6, which corresponds to the downmix signal 115 of Fig. 1, comprises a plurality 617 of downmix channels Ch₁…Ch_M, where the number of downmix channels (M) is smaller than the number (N) of channels Ch₁…Ch_N of the multi-channel audio signal 101, i.e. M<N. More specifically, the direct/ambience extractor 620 is configured to extract the direct signal portion 125-1 or the ambient signal portion 125-2 by means of a least mean square (LMS) solution with channel crossmixing, where the LMS solution does not require equal ambience levels. Such an LMS solution, which does not require equal ambience levels and can be extended to any number of channels, is given below. This LMS solution is not mandatory; it is a more precise alternative to the approach described above.

The symbols used in the LMS solution of the crossmixing weights for direct/ambience extraction are:

$Ch_i$: channel i

$a_i$: gain of the direct sound in channel i

$D$ and $\hat{D}$: direct part of the sound and its estimate

$A_i$ and $\hat{A}_i$: ambient part of channel i and its estimate

$P_X = E[XX^*]$: estimated energy of X

$E[\,]$: expectation

$E_{\hat{X}}$: estimation error of X

$w_{\hat{D}i}$: LMS crossmixing weight of channel i for the direct part

$w_{\hat{A}i,n}$: LMS crossmixing weight of channel n for the ambient part of channel i

In this context, it should be noted that the derivation of the LMS solution may be based on a spectral representation of the respective channels of the multi-channel audio signal, which means that everything operates in frequency bands.

The signal model is given by

$Ch_i = a_i D + A_i$

The derivation first treats a) the direct part and then b) the ambient part. Finally, the solution for the weights is derived and a method for normalizing the weights is described.

a) Direct part

The weighted estimate of the direct part is

$\hat{D} = \sum_{i=1}^{N} w_{\hat{D}i} Ch_i = \sum_{i=1}^{N} w_{\hat{D}i} (a_i D + A_i)$

The estimation error reads

$E_{\hat{D}} = D - \hat{D} = D - \sum_{i=1}^{N} w_{\hat{D}i} (a_i D + A_i)$

To obtain the LMS solution, $E_{\hat{D}}$ is required to be orthogonal to the input signals:

$E[E_{\hat{D}} Ch_k^*] = 0$, for all k

$E[(D - \sum_{i=1}^{N} w_{\hat{D}i} (a_i D + A_i)) (a_k D + A_k)^*]$

$= (a_k - \sum_{i=1}^{N} w_{\hat{D}i} a_i a_k) P_D - w_{\hat{D}k} P_{Ak} = 0$

$\Leftrightarrow \sum_{i=1}^{N} w_{\hat{D}i} a_i a_k P_D + w_{\hat{D}k} P_{Ak} = a_k P_D$

In matrix form, the above relation reads

$\mathbf{A} \begin{bmatrix} w_{\hat{D}1} \\ w_{\hat{D}2} \\ \vdots \\ w_{\hat{D}N} \end{bmatrix} = \begin{bmatrix} a_1 P_D \\ a_2 P_D \\ \vdots \\ a_N P_D \end{bmatrix}, \quad \mathbf{A} = \begin{bmatrix} a_1 a_1 P_D + P_{A1} & a_1 a_2 P_D & \cdots & a_1 a_N P_D \\ a_2 a_1 P_D & a_2 a_2 P_D + P_{A2} & \cdots & a_2 a_N P_D \\ \vdots & \vdots & \ddots & \vdots \\ a_N a_1 P_D & a_N a_2 P_D & \cdots & a_N a_N P_D + P_{AN} \end{bmatrix}$

b) Ambient part

Starting from the same signal model, the weighted estimate of the ambient part of channel i is

$\hat{A}_i = \sum_{n=1}^{N} w_{\hat{A}i,n} Ch_n = \sum_{n=1}^{N} w_{\hat{A}i,n} (a_n D + A_n)$

The estimation error is

$E_{\hat{A}i} = A_i - \hat{A}_i = A_i - \sum_{n=1}^{N} w_{\hat{A}i,n} (a_n D + A_n)$

and the orthogonality condition is

$E[E_{\hat{A}i} Ch_k^*] = 0$, for all k

In matrix form, the above relation reads

$\begin{bmatrix} a_1 a_1 P_D + P_{A1} & a_1 a_2 P_D & \cdots & a_1 a_N P_D \\ a_2 a_1 P_D & a_2 a_2 P_D + P_{A2} & \cdots & a_2 a_N P_D \\ \vdots & \vdots & \ddots & \vdots \\ a_N a_1 P_D & a_N a_2 P_D & \cdots & a_N a_N P_D + P_{AN} \end{bmatrix} \begin{bmatrix} w_{\hat{A}i,1} \\ w_{\hat{A}i,2} \\ \vdots \\ w_{\hat{A}i,N} \end{bmatrix} = \begin{bmatrix} 0 \\ \vdots \\ P_{Ai} \\ \vdots \\ 0 \end{bmatrix}$

where $P_{Ai}$ is located in the i-th row of the right-hand side.

Solution for the weights

The weights can be solved by inverting the matrix A, which is the same for the calculation of the direct part and of the ambient part. In the stereo case, the solution is:

$w_{\hat{D}1} = \frac{a_1 P_D P_{A2}}{a_2 a_2 P_D P_{A1} + a_1 a_1 P_D P_{A2} + P_{A1} P_{A2}} = \frac{a_1 P_D P_{A2}}{div}$

$w_{\hat{D}2} = \frac{a_2 P_D P_{A1}}{div}$

$w_{\hat{A}1,1} = \frac{a_2 a_2 P_D P_{A1} + P_{A1} P_{A2}}{div}$

$w_{\hat{A}1,2} = -\frac{a_1 a_2 P_D P_{A1}}{div}$

$w_{\hat{A}2,1} = -\frac{a_1 a_2 P_D P_{A2}}{div}$

$w_{\hat{A}2,2} = \frac{a_1 a_1 P_D P_{A2} + P_{A1} P_{A2}}{div}$

Here, div is the divisor $a_2 a_2 P_D P_{A1} + a_1 a_1 P_D P_{A2} + P_{A1} P_{A2}$.
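The linear systems of the derivation can also be solved numerically for any number of channels. The following Python sketch assumes real-valued gains a_i and builds the matrix with entries a_k a_i P_D (plus P_Ak on the diagonal) from the orthogonality condition, with the right-hand side a_k P_D for the direct weights. The right-hand side used for the ambient part of channel i (P_Ai in row i, zero elsewhere) is an assumption of this sketch: it follows from the same condition when the ambient parts are taken to be mutually incoherent and incoherent with D. The helper names are hypothetical.

```python
def solve_linear(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(M[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (M[r][n] - s) / M[r][r]
    return x

def lms_weights(a, p_d, p_a):
    """Return (w_direct, w_ambient) for gains a, direct energy p_d, ambient energies p_a.

    w_direct[i]     weight of channel i for the direct estimate
    w_ambient[i][n] weight of channel n for the ambient estimate of channel i
    """
    n = len(a)
    A = [[a[k] * a[i] * p_d + (p_a[k] if i == k else 0.0) for i in range(n)]
         for k in range(n)]
    w_direct = solve_linear(A, [a[k] * p_d for k in range(n)])
    w_ambient = [solve_linear(A, [p_a[i] if k == i else 0.0 for k in range(n)])
                 for i in range(n)]
    return w_direct, w_ambient
```

Solving the system directly avoids writing out a closed form for each channel count; for two channels the determinant of the matrix equals the divisor div.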

Normalization of the weights

The weights above are the LMS solution, but because the energy levels should be preserved, the weights are normalized. This also makes the division by the term div in the formulas above unnecessary. The normalization is done by ensuring that the energies of the output direct and ambient channels are P_D and P_Ai, where i is the channel index.

This is straightforward assuming that the inter-channel coherence, the mixing factors and the channel energies are known. For simplicity, the two-channel case is considered, and in particular the pair of weights $w_{\hat{A}1,1}$ and $w_{\hat{A}1,2}$, which are the gains for producing the first ambient channel from the first and second input channels. The steps are as follows:

Step 1: Calculate the output signal energy (where the coherent part is summed by amplitude and the incoherent part is summed by energy)

$P_{\hat{A}1} = (w_{\hat{A}1,1} \sqrt{|ICC| \cdot P_1} + sign(ICC) \, w_{\hat{A}1,2} \sqrt{|ICC| \cdot P_2})^2 + (1 - |ICC|) P_1 w_{\hat{A}1,1}^2 + (1 - |ICC|) P_2 w_{\hat{A}1,2}^2$

Step 2: Calculate the normalization gain factor

$g = \sqrt{\frac{P_{A1}}{P_{\hat{A}1}}}$

and apply the result to the crossmixing weight factors $w_{\hat{A}1,1}$ and $w_{\hat{A}1,2}$.

In step 1, the absolute value and the sign operator of the ICC are included so that the case of negatively coherent input channels is also covered. The remaining weight factors are normalized in the same way.
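The two normalization steps can be sketched in Python as follows. This is a minimal illustration for one pair of ambient weights with real-valued quantities; the function names are hypothetical, and sign(ICC) is taken as +1 for ICC = 0, which has no effect because the coherent term then vanishes.

```python
import math

def output_energy(w1, w2, p1, p2, icc):
    """Step 1: energy of w1*Ch_1 + w2*Ch_2 for channel energies p1, p2 and coherence icc."""
    coh = abs(icc)
    sign = 1.0 if icc >= 0 else -1.0
    # Coherent parts add by amplitude, incoherent parts add by energy.
    coherent = (w1 * math.sqrt(coh * p1) + sign * w2 * math.sqrt(coh * p2)) ** 2
    incoherent = (1.0 - coh) * (p1 * w1 ** 2 + p2 * w2 ** 2)
    return coherent + incoherent

def normalize_weights(w1, w2, p1, p2, icc, p_target):
    """Step 2: scale both weights so that the output energy equals p_target."""
    g = math.sqrt(p_target / output_energy(w1, w2, p1, p2, icc))
    return g * w1, g * w2
```

Because both weights are scaled by the same factor g, any common factor in the unnormalized weights cancels, which is why the division by div can be skipped.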

More specifically, with reference to the above description, the direct/ambience extractor 620 may be configured to derive the LMS solution by assuming a stable multi-channel signal model, so that the LMS solution is not restricted to a stereo downmix signal.

Fig. 7a shows a block diagram of an embodiment 700 of a direct/ambience estimator 710 based on a stereo ambience estimation formula. The direct/ambience estimator 710 of Fig. 7a may correspond to the direct/ambience estimator 110 of Fig. 1. More specifically, the direct/ambience estimator 710 of Fig. 7a is configured to apply, for each channel (Ch_i) of the multi-channel audio signal 101, a stereo ambience estimation formula using the spatial parameter information 105, where the stereo ambience estimation formula can be expressed as the functional dependence

$DTT_i = f_{DTT}[\sigma_i(Ch_i, R), ICC_i(Ch_i, R)]$

$ATT_i = 1 - DTT_i$

This explicitly shows the dependence on the channel level difference (CLD_i) or parameter σ_i and on the inter-channel coherence (ICC_i) parameter of the channel Ch_i. As shown in Fig. 7a, the spatial parameter information 105 is fed to the direct/ambience estimator 710 and may comprise the inter-channel relation parameters ICC_i and σ_i for each channel Ch_i. After this stereo ambience estimation formula has been applied by the direct/ambience estimator 710, the direct-to-total (DTT_i) energy ratio or the ambient-to-total (ATT_i) energy ratio, respectively, is obtained at its output 715. It should be noted that the above stereo ambience estimation formula used for estimating the respective DTT or ATT energy ratio is not based on a condition of equal ambience.

More specifically, the direct/ambience ratio estimation is performed such that the ratio (DTT) of the direct energy in a channel to the total energy of that channel is formulated in terms of the level parameter σ and the coherence parameter ICC between Ch and R, where Ch is the channel under inspection and R is a linear combination of the remaining channels; ⟨ ⟩ denotes the time average. The formula follows when the ambience levels in the channel and in the linear combination of the remaining channels are assumed to be equal and the coherence of the ambience is assumed to be zero.
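As an illustration, one closed form that satisfies the stated assumptions can be derived and evaluated in Python. The expression below is an assumption of this sketch and is not quoted from the text: it takes σ as the power ratio P(Ch)/P(R), models Ch = D + A_Ch and R = b·D + A_R with equal-energy, mutually incoherent ambient parts, and solves the resulting quadratic σ·DTT² + (1 − σ)·DTT − ICC² = 0 for its non-negative root. The function name is hypothetical.

```python
import math

def dtt_from_parameters(sigma, icc):
    """Direct-to-total energy ratio of Ch from sigma = P(Ch)/P(R) and the coherence icc.

    Assumed model: equal ambient energy in Ch and R, incoherent ambient parts.
    """
    x = 1.0 - 1.0 / sigma
    return 0.5 * (x + math.sqrt(x * x + 4.0 * icc * icc / sigma))

def att_from_parameters(sigma, icc):
    """Ambient-to-total energy ratio, using ATT = 1 - DTT."""
    return 1.0 - dtt_from_parameters(sigma, icc)
```

For σ = 1 the expression reduces to DTT = |ICC|, so the ratio grows linearly with the coherence when both levels are equal.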

Fig. 7b shows a graph 750 of an exemplary DTT (direct-to-total) energy ratio 760 as a function of the inter-channel coherence parameter ICC 770. In the embodiment of Fig. 7b, the channel level difference (CLD) or parameter σ is set, by way of example, to 1 (σ=1), so that the level P(Ch_i) of the channel Ch_i and the level P(R) of the linear combination R of the remaining channels are equal. In this case, the DTT energy ratio 760 is linearly proportional to the ICC parameter, as indicated by the straight line 775 labeled DTT~ICC. As can be seen from Fig. 7b, for ICC=0, which may correspond to a fully decorrelated inter-channel relationship, the DTT energy ratio 760 is 0, which may correspond to a fully ambient situation (case "R₁"). For ICC=1, which may correspond to a fully coherent inter-channel relationship, the DTT energy ratio 760 is 1, which may correspond to a fully direct situation (case "R₂"). Hence, relative to the total energy of the channel, there is essentially no direct energy in the channel in case R₁ and essentially no ambient energy in case R₂.

Fig. 8 shows a block diagram of an encoder/decoder system 800 according to further embodiments of the present invention. On the decoder side of the encoder/decoder system 800, an embodiment of a decoder 820 is shown, which may correspond to the apparatus 100 of Fig. 1. Because of the similarity of the embodiments of Fig. 1 and Fig. 8, blocks having similar implementations and/or functions in the two embodiments are denoted by the same reference numerals. As shown in the embodiment of Fig. 8, the direct/ambience extractor 120 may operate on the downmix signal 115 having the plurality of downmix channels Ch₁…Ch_M. The direct/ambience estimator 110 of Fig. 8 is furthermore configured to (optionally) receive at least two downmix channels 825 of the downmix signal 115, so that the level information 113 of the direct part or the ambient part of the multi-channel audio signal 101 is estimated based on the received at least two downmix channels 825 in addition to the spatial parameter information 105. Finally, the direct signal portion 125-1 or the ambient signal portion 125-2 is obtained after extraction by the direct/ambience extractor 120.

On the encoder side of the encoder/decoder system 800, an embodiment of an encoder 810 is shown, which may comprise a downmixer 815 for downmixing the multi-channel audio signal 101 (Ch₁…Ch_N) into the downmix signal 115 having the plurality of downmix channels Ch₁…Ch_M, the number of channels being reduced from N to M. The downmixer 815 may also be configured to output the spatial parameter information 105 by calculating inter-channel relations from the multi-channel audio signal 101. In the encoder/decoder system 800 of Fig. 8, the downmix signal 115 and the spatial parameter information 105 may be transmitted from the encoder 810 to the decoder 820. Here, the encoder 810 may derive an encoded signal based on the downmix signal 115 and the spatial parameter information 105 for transmission from the encoder side to the decoder side. Moreover, the spatial parameter information 105 is based on channel information of the multi-channel audio signal 101.

On the one hand, the inter-channel relation parameters σ_i(Ch_i,R) and ICC_i(Ch_i,R) may be calculated in the encoder 810 between a channel Ch_i and the linear combination R of the remaining channels, and transmitted within the encoded signal. The decoder 820 may in turn receive the encoded signal and operate on the transmitted inter-channel relation parameters σ_i(Ch_i,R) and ICC_i(Ch_i,R).

On the other hand, the encoder 810 may also be configured to calculate the inter-channel coherence parameters ICC_i,j between pairs of different channels (Ch_i,Ch_j) to be transmitted. In this case, the decoder 820 should be able to derive the parameters ICC_i(Ch_i,R) between the channel Ch_i and the linear combination R of the remaining channels from the transmitted pairwise calculated ICC_i,j(Ch_i,Ch_j), so that the corresponding embodiments described above can be realized. It should be noted in this context that the decoder 820 cannot reconstruct the parameters ICC_i(Ch_i,R) from knowledge of the downmix signal 115 alone.

In embodiments, the transmitted spatial parameters are not restricted to pairwise channel comparisons.

For example, the most typical MPS case is the one with two downmix channels. The first set of spatial parameters in MPS decoding turns the two channels into three: center, left and right. The set of parameters that guides this mapping is called the center prediction coefficients (CPC), together with an ICC parameter that is specific to this two-to-three configuration.

The second set of spatial parameters divides each of these into two: the side channels are split into the corresponding front and rear channels, and the center channel is split into the center and LFE channels. This mapping is governed by the ICC and CLD parameters introduced above.

It is not practical to derive calculation rules for every kind of downmix configuration and every kind of spatial parameter. It is, however, practical to follow the downmix steps virtually. Since it is known how the two channels are turned into three and the three into six, an input-output relation can be found that describes how the two input channels are routed to the six output channels. The outputs are only linear combinations of the downmix channels plus linear combinations of decorrelated versions of them. It is not necessary to actually decode the output signal and measure it; knowing this "decoding matrix", the ICC and CLD parameters between any channels or combinations of channels can be calculated efficiently in the parametric domain.

Irrespective of the downmix signal configuration and the multi-channel signal configuration, each output of the decoded signal is a linear combination of the downmix signals plus a linear combination of a decorrelated version of each of them.

$Ch\_out_i = \sum_{k=1}^{dmx\_channels} (a_{k,i} \, Ch\_dmx_k + b_{k,i} \, D[Ch\_dmx_k])$

where the operator D[ ] corresponds to a decorrelator, i.e. a process that produces an incoherent copy of the input signal. The factors a and b are known because they can be derived directly from the parametric side information. This is because, by definition, the parametric information tells the decoder how to create the multi-channel output from the downmix signals. The above formula can be simplified to

$Ch\_out_i = \sum_{k=1}^{dmx\_channels} (a_{k,i} \, Ch\_dmx_k) + D_i$

because all decorrelated parts can be combined for the energy/coherence comparison. The energy of $D_i$ is known because the factors b in the first formula are also known.

Based on this, any kind of coherence and energy comparison can be made between output channels or between different linear combinations of output channels. For the simple example of two downmix channels and a set of output channels in which channels number 3 and 5 are compared with each other, σ is calculated as follows:

$\sigma_{3,5} = \frac{E[Ch\_out_3^2]}{E[Ch\_out_5^2]}$

where E[ ] is the expectation (in practice: averaging) operator. Both terms can be expanded into quantities that are all known or measurable from the downmix signals. The cross terms E[Ch_dmx*D] are zero by definition and therefore do not appear in the expanded expression. Similarly, the coherence formula is

$ICC_{3,5} = \frac{E[Ch\_out_3 \, Ch\_out_5]}{\sqrt{E[Ch\_out_3^2] \, E[Ch\_out_5^2]}}$

Again, since all parts of the above formula are linear combinations of the input signals plus decorrelated signals, the solution can be obtained directly.
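A parameter-domain evaluation of σ and ICC between two output channels can be sketched in Python as follows. The sketch assumes real-valued signals, a measured downmix covariance matrix cov[k][l] = E[Ch_dmx_k · Ch_dmx_l], and decorrelator outputs that keep the energy of their inputs while being incoherent with each other and with every downmix channel. These assumptions, the function names and the argument layout are hypothetical.

```python
import math

def output_covariance(a_i, b_i, a_j, b_j, cov):
    """E[Ch_out_i * Ch_out_j] from decoding factors and the downmix covariance.

    a_i, a_j: factors applied to the downmix channels for outputs i and j
    b_i, b_j: factors applied to the decorrelated downmix channels
    cov:      downmix covariance matrix (list of lists)
    """
    n = len(cov)
    # Dry part: linear combinations of the downmix channels.
    dry = sum(a_i[k] * a_j[l] * cov[k][l] for k in range(n) for l in range(n))
    # Decorrelated part: only same-decorrelator terms survive; cross terms are zero.
    wet = sum(b_i[k] * b_j[k] * cov[k][k] for k in range(n))
    return dry + wet

def sigma_and_icc(a_i, b_i, a_j, b_j, cov):
    """Return (sigma, icc) between output channels i and j without decoding them."""
    p_i = output_covariance(a_i, b_i, a_i, b_i, cov)
    p_j = output_covariance(a_j, b_j, a_j, b_j, cov)
    cross = output_covariance(a_i, b_i, a_j, b_j, cov)
    return p_i / p_j, cross / math.sqrt(p_i * p_j)
```

Only the downmix covariance has to be measured from the signal; everything else comes from the decoding factors, so no output channel needs to be synthesized for the comparison.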

The above example compares two output channels, but comparisons between linear combinations of output channels can be made in the same way, for instance in the exemplary procedure described later.

Summarizing the previous embodiments, the presented technique/concept comprises the following steps:

1. Obtain the inter-channel relations (coherence, level) of an "original" set of channels, whose number may be higher than the number of downmix channels.

2. Estimate the ambient and direct energies in this "original" set of channels.

3. Downmix the direct and ambient energies of the "original" set of channels to a smaller number of channels.

4. Use the downmixed energies to extract the direct and ambient signals from the provided downmix channels by applying gain factors or a gain matrix.

The use of spatial parametric side information is best explained and summarized with the embodiment of Fig. 2. In the embodiment of Fig. 2, there is a parametric stereo stream comprising a single audio channel and spatial side information about the inter-channel differences (coherence, level) of the stereo sound it represents. Since the inter-channel differences are known, the above stereo ambience estimation formula can be applied to them to obtain the direct and ambient energies of the original channel set. The channel energies can then be "downmixed" by adding the direct energies together (with coherent summation) and the ambient energies together (with incoherent summation), which yields the direct-to-total and ambient-to-total energy ratios of the single downmix channel.

Referring to the embodiment of Fig. 2, the spatial parameter information essentially comprises the inter-channel coherence parameters (ICC_L, ICC_R) and the channel level difference parameters (CLD_L, CLD_R) corresponding to the left (L) and right (R) channel of the parametric stereo audio signal, respectively. Here, it should be noted that the inter-channel coherence parameters ICC_L and ICC_R are equal (ICC_L = ICC_R), while the channel level difference parameters CLD_L and CLD_R are related by CLD_L = -CLD_R. Correspondingly, since the channel level difference parameters CLD_L and CLD_R are typically the decibel values of the parameters σ_L and σ_R, respectively, the parameters σ_L and σ_R of the left (L) and right (R) channel are related by σ_L = 1/σ_R.

These inter-channel difference parameters can readily be used to calculate the respective direct-to-total energy ratios (DTT_L, DTT_R) and ambient-to-total energy ratios (ATT_L, ATT_R) for both channels (L, R) based on the stereo ambience estimation formula. In the stereo ambience estimation formula, the direct-to-total and ambient-to-total energy ratios (DTT_L, ATT_L) of the left channel (L) depend on the inter-channel difference parameters (CLD_L, ICC_L) of the left channel L, while the direct-to-total and ambient-to-total energy ratios (DTT_R, ATT_R) of the right channel (R) depend on the inter-channel difference parameters (CLD_R, ICC_R) of the right channel R. Moreover, the energies (E_L, E_R) of both channels L, R of the parametric stereo audio signal can be derived based on the channel level difference parameters (CLD_L, CLD_R) of the left (L) and right (R) channel, respectively. Here, the energy (E_L) of the left channel L may be obtained by applying the channel level difference parameter (CLD_L) of the left channel L to the mono downmix signal, while the energy (E_R) of the right channel R may be obtained by applying the channel level difference parameter (CLD_R) of the right channel R to the mono downmix signal. Then, by multiplying the energies (E_L, E_R) of both channels (L, R) with the corresponding DTT_L-, DTT_R- and ATT_L-, ATT_R-based parameters, the direct energies (E_DL, E_DR) and ambient energies (E_AL, E_AR) of both channels (L, R) are obtained. The direct energies (E_DL, E_DR) of both channels (L, R) may then be combined/added using a coherent downmixing rule to obtain the downmixed energy (E_D,mono) of the direct part of the mono downmix signal, while the ambient energies (E_AL, E_AR) of both channels (L, R) may be combined/added using an incoherent downmixing rule to obtain the downmixed energy (E_A,mono) of the ambient part of the mono downmix signal.

Then, by relating the downmixed energies (E_D,mono, E_A,mono) of the direct signal part and the ambient signal part to the total energy (E_mono) of the mono downmix signal, the direct-to-total energy ratio (DTT_mono) and the ambient-to-total energy ratio (ATT_mono) of the mono downmix signal are obtained. Finally, based on these DTT_mono and ATT_mono energy ratios, the direct signal part or the ambient signal part can essentially be extracted from the mono downmix signal.
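The chain of steps for the parametric stereo case can be sketched in Python as follows. Several details are assumptions of this sketch rather than statements of the text: σ_L is taken as the power ratio 10^(CLD_L/10); the channel energies are normalized so that E_L + E_R = 1; the per-channel ratio uses a closed form derived under the equal-ambience, zero-ambience-coherence model; the direct parts are assumed to be in phase, so coherent addition sums amplitudes; and E_mono is taken as E_D,mono + E_A,mono. All function names are hypothetical.

```python
import math

def dtt(sigma, icc):
    """Per-channel direct-to-total ratio (closed form assumed for this sketch)."""
    x = 1.0 - 1.0 / sigma
    return 0.5 * (x + math.sqrt(x * x + 4.0 * icc * icc / sigma))

def mono_downmix_ratios(cld_left_db, icc):
    """Return (DTT_mono, ATT_mono) for a parametric stereo stream."""
    sigma_l = 10.0 ** (cld_left_db / 10.0)   # CLD_L = -CLD_R, so sigma_R = 1/sigma_L
    sigma_r = 1.0 / sigma_l
    e_l = sigma_l / (1.0 + sigma_l)          # relative channel energies
    e_r = 1.0 / (1.0 + sigma_l)
    e_dl = dtt(sigma_l, icc) * e_l           # direct energies
    e_dr = dtt(sigma_r, icc) * e_r
    e_al = max(0.0, e_l - e_dl)              # ambient energies (ATT = 1 - DTT)
    e_ar = max(0.0, e_r - e_dr)
    e_d_mono = (math.sqrt(e_dl) + math.sqrt(e_dr)) ** 2   # coherent addition
    e_a_mono = e_al + e_ar                                 # incoherent addition
    e_mono = e_d_mono + e_a_mono
    return e_d_mono / e_mono, e_a_mono / e_mono
```

The square roots of the two returned ratios could then serve as the extraction gains for the direct and ambient portions of the mono downmix signal.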

In audio reproduction, there is often a need to reproduce sound over headphones. Headphone listening has specific characteristics that make it drastically different from loudspeaker listening and also from any natural sound environment. The audio is fed directly to the left and right ear. Audio content is typically produced for loudspeaker playback. Therefore, the audio signals do not contain the properties and cues that the human auditory system uses for spatial sound perception. This is the case unless binaural processing is introduced into the system.

Basically, binaural processing can be described as a process that takes the input sound and modifies it so that it contains only those inter-aural and monaural properties that are perceptually correct (with respect to the way the human auditory system processes spatial sound). Binaural processing is not a straightforward task, and existing state-of-the-art solutions are still suboptimal.

There is a large number of applications that already include binaural processing for music and movie playback, such as media players and processing devices designed to transform multi-channel audio signals into their binaural counterparts for headphones. The typical approach is to use head-related transfer functions (HRTFs) to create virtual loudspeakers and to add a room effect to the signal. In theory, this could be equivalent to listening with loudspeakers in a specific room.

In practice, however, it has repeatedly been shown that this approach does not consistently satisfy listeners. There appears to be a trade-off: good spatialization with this straightforward method comes at the cost of audio quality, such as undesirable changes in sound color or timbre, an annoying perception of the room effect, and a loss of dynamics. Further problems include inaccurate localization (e.g. in-head localization, front-back confusion), a lack of spatial distance of the sound sources, and inter-aural mismatch, i.e. an auditory sensation near the ears due to wrong inter-aural cues.

Different listeners judge these problems very differently. The sensitivity also varies with the input material, such as music (strict quality criteria with respect to sound color), movies (less strict) and games (even less strict, but localization is important). There are also typically different design goals depending on the content.

Therefore, the following description deals with an approach to overcome the above problems as successfully as possible in order to maximize the average perceived overall quality.

图9a示出了根据本发明其它实施例的双耳直接声音呈现装置910的概况900的框图。如图9a所示，双耳直接声音呈现装置910被构造为用于处理其可存在于图1实施例的直接/周围提取器120的输出处的直接信号部分125-1，以获得第一双耳输出信号915。第一双耳输出信号915可包含L指示的左声道及R指示的右声道。Fig. 9a shows a block diagram of an overview 900 of a binaural direct sound presentation device 910 according to other embodiments of the present invention. As shown in FIG. 9a, the binaural direct sound rendering device 910 is configured for processing the direct signal portion 125-1 which may be present at the output of the direct/surround extractor 120 of the embodiment of FIG. 1 to obtain a first binaural Ear output signal 915. The first binaural output signal 915 may include a left channel indicated by L and a right channel indicated by R.

此处，双耳直接声音呈现装置910可被构造为通过头部相关传递函数（HRTF）馈送直接信号部分125-1来获得已变换的直接信号部分。此外，双耳直接声音呈现装置910可被构造为施加室内效果给己变换的直接信号部分来最终获得第一双耳输出信号915。Here, the binaural direct sound rendering device 910 may be configured to feed the direct signal portion 125-1 through a head related transfer function (HRTF) to obtain a transformed direct signal portion. Furthermore, the binaural direct sound rendering device 910 may be configured to apply a room effect to the transformed direct signal portion to finally obtain the first binaural output signal 915 .

Fig. 9b shows a block diagram of a detail 905 of the binaural direct sound rendering device 910 of Fig. 9a. The binaural direct sound rendering device 910 may comprise an "HRTF transformer" indicated by block 912 and a room effect processing device (parallel reverberation or simulation of early reflections) indicated by block 914. As shown in Fig. 9b, the HRTF transformer 912 and the room effect processing device 914 may operate on the direct signal portion 125-1 by applying the head-related transfer functions (HRTFs) and the room effect in parallel, so that the first binaural output signal 915 is obtained.

More specifically, referring to Fig. 9b, this room effect processing may also provide an incoherent reverberated direct signal 919, which may be processed by a subsequent crossmix filter 920 to adapt the signal to the interaural coherence of diffuse sound fields. Here, the combined outputs of the filter 920 and the HRTF transformer 912 constitute the first binaural output signal 915. According to further embodiments, the room effect processing of the direct sound may also be a parametric representation of early reflections.

In embodiments, therefore, the room effect may preferably be applied in parallel with the HRTFs rather than serially (i.e., by applying the room effect after feeding the signal through the HRTFs). Specifically, only the sound that propagates directly from the source passes through, or is transformed by, the corresponding HRTFs. The indirect/reverberated sound entering the ears can be approximated, i.e., treated in a statistical fashion (by employing coherence control instead of HRTFs). A serial implementation is also possible, but the parallel approach is preferred.

Fig. 10a shows a block diagram of an overview 1000 of a binaural ambient sound rendering device 1010 according to further embodiments of the present invention. As shown in Fig. 10a, the binaural ambient sound rendering device 1010 may be configured for processing the ambient signal portion 125-2, which may be present at the output of the direct/ambience extractor 120 of the Fig. 1 embodiment, to obtain a second binaural output signal 1015. The second binaural output signal 1015 may comprise a left channel (L) and a right channel (R).

Fig. 10b shows a block diagram of a detail 1005 of the binaural ambient sound rendering device 1010 of Fig. 10a. As can be seen in Fig. 10b, the binaural ambient sound rendering device 1010 may be configured to apply a room effect, as indicated by the block 1012 labeled "room effect processing", to the ambient signal portion 125-2, so that an incoherent reverberated ambient signal 1013 is obtained. In addition, the binaural ambient sound rendering device 1010 may be configured to process the incoherent reverberated ambient signal 1013 by applying a filter, such as the crossmix filter indicated by block 1014, so that the second binaural output signal 1015 is provided, the second binaural output signal 1015 being adapted to the interaural coherence of real diffuse sound fields. The block 1012 labeled "room effect processing" may also be configured so that it directly produces the interaural coherence of real diffuse sound fields. In this case, the block 1014 is not used.

According to further embodiments, the binaural ambient sound rendering device 1010 is configured to apply a room effect and/or a filter to the ambient signal portion 125-2 for providing the second binaural output signal 1015, so that the second binaural output signal 1015 is adapted to the interaural coherence of real diffuse sound fields.

In the above embodiments, decorrelation and coherence control may be performed in two consecutive steps, but this is not a requirement. The same result can also be achieved in a single-step process, without formulating an intermediate incoherent signal. Both methods are equally valid.
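As a rough illustration of the coherence control step, the following Python sketch crossmixes two mutually incoherent, equal-energy signals so that the resulting pair reaches a target coherence. The function names, the single broadband mixing angle, and the toy test signals are assumptions made for illustration only; the crossmix filters 920 and 1014 would in general be frequency dependent, and their design is not specified here.

```python
import math


def crossmix(n1, n2, target_coherence):
    """Mix two incoherent, equal-energy signals to a target coherence.

    With l = cos(t)*n1 + sin(t)*n2 and r = sin(t)*n1 + cos(t)*n2 the
    normalized correlation of (l, r) equals sin(2t), so t = asin(c) / 2.
    target_coherence is expected to lie in [-1, 1].
    """
    t = 0.5 * math.asin(target_coherence)
    c, s = math.cos(t), math.sin(t)
    left = [c * x + s * y for x, y in zip(n1, n2)]
    right = [s * x + c * y for x, y in zip(n1, n2)]
    return left, right


def coherence(x, y):
    """Normalized zero-lag correlation of two real-valued signals."""
    num = sum(a * b for a, b in zip(x, y))
    den = math.sqrt(sum(a * a for a in x) * sum(b * b for b in y))
    return num / den


# Two orthogonal, equal-energy toy signals standing in for decorrelated reverb.
n1 = [1.0, 0.0, -1.0, 0.0]
n2 = [0.0, 1.0, 0.0, -1.0]
left, right = crossmix(n1, n2, 0.6)
```

Because cos² + sin² = 1, this particular mixing rule leaves the energy of each output equal to that of the inputs, which is one reason it is a convenient choice for a sketch.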

Fig. 11 shows a conceptual block diagram of an embodiment 1100 of binaural reproduction of a multi-channel audio signal 101. Specifically, the Fig. 11 embodiment represents an apparatus for binaural reproduction of the multi-channel audio signal 101, comprising a first converter 1110 ("frequency transform"), a separator 1120 ("direct-ambience separation"), the binaural direct sound rendering device 910 ("direct source rendering"), the binaural ambient sound rendering device 1010 ("ambient sound rendering"), a combiner 1130 indicated by "+", and a second converter 1140 ("inverse frequency transform"). The first converter 1110 may be configured for converting the multi-channel audio signal 101 into a spectral representation 1115. The separator 1120 may be configured for extracting the direct signal portion 125-1 or the ambient signal portion 125-2 from the spectral representation 1115. Here, the separator 1120 may correspond to the apparatus 100 of Fig. 1, comprising in particular the direct/ambience estimator 110 and the direct/ambience extractor 120 of the Fig. 1 embodiment. As explained above, the binaural direct sound rendering device 910 may operate on the direct signal portion 125-1 to obtain the first binaural output signal 915. Correspondingly, the binaural ambient sound rendering device 1010 may operate on the ambient signal portion 125-2 to obtain the second binaural output signal 1015. The combiner 1130 may be configured for combining the first binaural output signal 915 and the second binaural output signal 1015 to obtain a combined signal 1135. Finally, the second converter 1140 may be configured for converting the combined signal 1135 into the time domain to obtain a stereo output audio signal 1150 ("stereo output signal for headphones").

The frequency transform operation in the Fig. 11 embodiment illustrates that the system functions in a frequency transform domain, which is the natural domain for perceptual processing of spatial audio. If the system is used as an add-on to a system that already functions in a frequency transform domain, the system itself does not necessarily need frequency transforms.

The direct/ambience separation described above can be subdivided into two distinct parts. In the direct/ambience estimation part, the levels and/or ratios of the direct and ambient parts are estimated based on a combination of a signal model and the properties of the audio signal. In the direct/ambience extraction part, the known ratios and the input signal can be used to form the direct and ambient output signals.

Finally, Fig. 12 shows an overall block diagram of an embodiment 1200 of direct/ambience estimation/extraction including the use case of binaural reproduction. In particular, the embodiment 1200 of Fig. 12 may correspond to the embodiment 1100 of Fig. 11. In the embodiment 1200, however, the details of the separator 1120 of Fig. 11 are shown, which correspond to the blocks 110, 120 of the Fig. 1 embodiment and include the estimation/extraction process based on the spatial parameter information 105. Moreover, in contrast to the embodiment 1100 of Fig. 11, no conversion between different domains is shown in the embodiment 1200 of Fig. 12. The blocks of the embodiment 1200 also operate explicitly on the downmix signal 115, which can be derived from the multi-channel audio signal 101.

Fig. 13a shows a block diagram of an embodiment of an apparatus 1300 for extracting a direct/ambient signal from a mono downmix signal in a filter bank domain. As shown in Fig. 13a, the apparatus 1300 comprises an analysis filter bank 1310, a synthesis filter bank 1320 for the direct part, and a synthesis filter bank 1322 for the ambient part.

Specifically, the analysis filter bank 1310 of the apparatus 1300 may be implemented to perform a short-time Fourier transform (STFT) or may, for example, be configured as an analysis QMF filter bank, while the synthesis filter banks 1320, 1322 of the apparatus 1300 may be implemented to perform an inverse short-time Fourier transform (ISTFT) or may, for example, be configured as synthesis QMF filter banks.

The analysis filter bank 1310 is configured for receiving a mono downmix signal 1315, which may correspond to the mono downmix signal 215 shown in the Fig. 2 embodiment, and for converting the mono downmix signal 1315 into a plurality of filter bank subbands 1311. As can be seen in Fig. 13a, the plurality of filter bank subbands 1311 is connected to a plurality of direct/ambience extraction blocks 1350, 1352, respectively, which are configured to apply DTT_mono-based or ATT_mono-based parameters 1333, 1335, respectively, to the filter bank subbands.

As shown in Fig. 13b, the DTT_mono-based or ATT_mono-based parameters 1333, 1335 may be supplied by a DTT_mono, ATT_mono calculator 1330. Specifically, the DTT_mono, ATT_mono calculator 1330 of Fig. 13b may be configured to calculate the DTT_mono, ATT_mono energy ratios, or to derive the DTT_mono-based or ATT_mono-based parameters, from the provided inter-channel coherence and channel level difference parameters (ICC_L, CLD_L, ICC_R, CLD_R) corresponding to the left and right channels (L, R) of a parametric stereo audio signal (e.g., the parametric stereo audio signal 201 of Fig. 2), as described correspondingly above. Here, for a single filter bank subband, the corresponding parameters 105 and the DTT_mono-based or ATT_mono-based parameters 1333, 1335 can be used. In this context, it is pointed out that these parameters are not constant over frequency.

As a result of applying the DTT_mono-based or ATT_mono-based parameters 1333, 1335, a plurality of modified filter bank subbands 1353, 1355, respectively, is obtained. Subsequently, the plurality of modified filter bank subbands 1353, 1355 is fed into the synthesis filter banks 1320, 1322, respectively, which may be configured to synthesize the plurality of modified filter bank subbands 1353, 1355, so that the direct signal portion 1325-1 or the ambient signal portion 1325-2 of the mono downmix signal 1315, respectively, is obtained. Here, the direct signal portion 1325-1 of Fig. 13a corresponds to the direct signal portion 125-1 of Fig. 2, while the ambient signal portion 1325-2 of Fig. 13a corresponds to the ambient signal portion 125-2 of Fig. 2.

Referring to Fig. 13b, a direct/ambience extraction block 1380 of the plurality of direct/ambience extraction blocks 1350, 1352 of Fig. 13a comprises, in particular, the DTT_mono, ATT_mono calculator 1330 and a multiplier 1360. The multiplier 1360 may be configured to multiply a single filter bank (FB) subband 1301 of the plurality of filter bank subbands 1311 by the corresponding DTT_mono-based or ATT_mono-based parameter 1333, 1335, so that a modified single filter bank subband 1365 of the plurality of modified filter bank subbands 1353, 1355 is obtained. Specifically, the direct/ambience extraction block 1380 is configured to apply the DTT_mono-based parameter if it belongs to the plurality of blocks 1350, and to apply the ATT_mono-based parameter if it belongs to the plurality of blocks 1352. The modified single filter bank subband 1365 may then be supplied to the respective synthesis filter bank 1320, 1322 for the direct part or the ambient part.

According to embodiments, the spatial parameters and the derived parameters are given at a frequency resolution that follows the critical bands of the human auditory system (e.g., 28 bands), which is normally lower than the resolution of the filter bank.

Therefore, the direct/ambience extraction according to the Fig. 13a embodiment essentially operates on the different subbands of the filter bank domain, based on inter-channel coherence and channel level difference parameters calculated subband by subband (which may correspond to the inter-channel relation parameters 335 of Fig. 3b).
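The mapping from coarse parameter bands to filter bank subbands, followed by the per-subband multiplication, can be sketched in Python as follows. The helper names, band edges, and gain values are invented for illustration; the actual band layout would come from the codec in use and is not given in this description.

```python
def expand_to_subbands(band_values, band_edges, num_subbands):
    """Repeat each parameter-band value over the subbands it covers.

    band_edges[k] is the index of the first subband of parameter band k
    (hypothetical layout; must be sorted and start at 0).
    """
    out = []
    band = 0
    for sb in range(num_subbands):
        # Advance to the parameter band that contains subband sb.
        while band + 1 < len(band_edges) and sb >= band_edges[band + 1]:
            band += 1
        out.append(band_values[band])
    return out


def apply_gains(subbands, gains):
    """Multiply each (complex) subband sample by its real-valued gain."""
    return [g * x for g, x in zip(gains, subbands)]


# Three parameter bands spread over eight filter bank subbands.
gains = expand_to_subbands([0.9, 0.5, 0.1], [0, 2, 5], 8)
direct_subbands = apply_gains([1 + 1j] * 8, gains)
```

In this toy layout the first two subbands share one gain, the next three share another, and the last three share the third, mirroring how one transmitted parameter typically steers several adjacent subbands.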

Fig. 14 shows a schematic illustration of an exemplary MPEG Surround decoding scheme 1400 according to a further embodiment of the present invention. Specifically, the Fig. 14 embodiment describes decoding from a stereo downmix 1410 to six output channels 1420. Here, the signals denoted "res" are residual signals, which are optional replacements for the decorrelated signals (obtained from the blocks denoted "D"). According to the Fig. 14 embodiment, the spatial parameter information or inter-channel relation parameters (ICC, CLD) transmitted within an MPS stream from an encoder, such as the encoder 810 of Fig. 8, to a decoder, such as the decoder 820 of Fig. 8, may be used to generate the decoding matrices 1430, 1440, denoted "pre-decorrelator matrix M1" and "mix matrix M2", respectively. Specific to the Fig. 14 embodiment is that the generation of the output channels 1420 (i.e., the upmix channels L, LS, R, RS, C, LFE) from the side channels (L, R) and the center channel (C) (L, R, C 1435) by means of the mix matrix M2 1440 is essentially determined by the spatial parameter information 1405, which may correspond to the spatial parameter information 105 of Fig. 1 and comprises particular inter-channel relation parameters (ICC, CLD) according to the MPEG Surround standard.

Here, the division of the left channel (L) into the corresponding output channels L, LS, of the right channel (R) into the corresponding output channels R, RS, and of the center channel (C) into the corresponding output channels C, LFE can be represented by a one-to-two (OTT) configuration of the respective input signal with the corresponding ICC and CLD parameters.

In particular, the exemplary MPEG Surround decoding scheme 1400, which corresponds to a "5-2-5 configuration", may comprise the following steps. In a first step, the spatial parameters or parametric side information are formulated into the decoding matrices 1430, 1440 shown in Fig. 14 according to the existing MPEG Surround standard. In a second step, the decoding matrices 1430, 1440 are used in the parameter domain to provide the inter-channel information of the upmix channels 1420. In a third step, the direct/ambient energies of each upmix channel are calculated from the inter-channel information thus provided. In a fourth step, the direct/ambient energies thus obtained are downmixed to the number of downmix channels 1410. In a fifth step, the weights to be applied to the downmix channels 1410 are calculated.

Before going further, it should be pointed out that the procedure just described requires measuring

$E[|L_{dmx}|^{2}], \; E[|R_{dmx}|^{2}]$,

which are the mean powers of the downmix channels, and

$E[L_{dmx} R_{dmx}^{*}]$,

which may be referred to as the cross-spectrum of the downmix channels. Here, the mean powers of the downmix channels are purposely referred to as energies, because the term "mean power" is not commonly used.

The expectation operator indicated by the square brackets can in practice be replaced by a time average, either recursive or non-recursive. The energies and the cross-spectrum can be measured directly from the downmix signal.
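A recursive time average of the kind mentioned above could look like the following Python sketch, which tracks the two downmix energies and the cross-spectrum in one subband. The function names and the smoothing constant `alpha` are assumptions for illustration; the description does not prescribe a particular averaging rule or time constant.

```python
def recursive_average(prev, new, alpha):
    """One-pole smoother; alpha close to 1 means slow tracking."""
    return alpha * prev + (1.0 - alpha) * new


def track_downmix_statistics(l_frames, r_frames, alpha=0.9):
    """Estimate E[|L_dmx|^2], E[|R_dmx|^2] and E[L_dmx * conj(R_dmx)].

    l_frames, r_frames: complex subband samples of the left and right
    downmix channels, one value per time frame.
    """
    e_l, e_r, cross = 0.0, 0.0, 0j
    for l, r in zip(l_frames, r_frames):
        e_l = recursive_average(e_l, abs(l) ** 2, alpha)
        e_r = recursive_average(e_r, abs(r) ** 2, alpha)
        cross = recursive_average(cross, l * r.conjugate(), alpha)
    return e_l, e_r, cross
```

A non-recursive alternative would simply take the arithmetic mean of the same three quantities over a sliding window of frames.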

It should also be noted that the energy of a linear combination of two channels can be formulated from the channel energies, the mixing factors, and the cross-spectrum (all in the parameter domain; no signal operations are needed).

The linear combination

$Ch = a L_{dmx} + b R_{dmx}$

has the following energy:

$E[|Ch|^{2}] = E[|a L_{dmx} + b R_{dmx}|^{2}] = a^{2} E[|L_{dmx}|^{2}] + b^{2} E[|R_{dmx}|^{2}] + ab \left( E[L_{dmx} R_{dmx}^{*}] + E[R_{dmx} L_{dmx}^{*}] \right)$

$= a^{2} E[|L_{dmx}|^{2}] + b^{2} E[|R_{dmx}|^{2}] + 2ab \, \mathrm{Re}\{ E[L_{dmx} R_{dmx}^{*}] \}$
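The point that this energy is available purely in the parameter domain can be checked with a small Python sketch. The function name and the sample values are made up for illustration; the mixing factors a and b are assumed to be real, as in the formula above.

```python
def combination_energy(a, b, e_l, e_r, cross_lr):
    """Energy of Ch = a*L_dmx + b*R_dmx from statistics only.

    e_l, e_r: channel energies; cross_lr: complex cross-spectrum
    E[L_dmx * conj(R_dmx)]; a, b: real mixing factors.
    """
    return a * a * e_l + b * b * e_r + 2.0 * a * b * cross_lr.real


# Toy complex subband samples (arbitrary values).
ls = [1 + 2j, 0.5 - 1j]
rs = [2 - 1j, -1 + 0.5j]
n = len(ls)
e_l = sum(abs(x) ** 2 for x in ls) / n
e_r = sum(abs(y) ** 2 for y in rs) / n
cross_lr = sum(x * y.conjugate() for x, y in zip(ls, rs)) / n

a, b = 0.7, -0.3
from_parameters = combination_energy(a, b, e_l, e_r, cross_lr)
from_signals = sum(abs(a * x + b * y) ** 2 for x, y in zip(ls, rs)) / n
```

Both routes give the same number, but only the second one needs access to the mixed signal itself.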

The individual steps of the procedure (i.e., the decoding scheme) are described below.

First step (spatial parameters to mixing matrices)

As stated above, the matrices M1 and M2 are formed according to the MPEG Surround standard. The element in row a and column b of M1 is denoted M1(a, b).

Second step (mixing matrices with energies and cross-spectra of the downmix to inter-channel information of the upmix channels)

We now have the mixing matrices M1 and M2. We need to formulate how the output channels are created from the left downmix channel (L_dmx) and the right downmix channel (R_dmx). We assume that the decorrelators are used (Fig. 14, gray area). The decoding/upmixing in the MPS standard basically ends up providing the following formula for the overall input-output relation of the whole process:

$L = a_{L} L_{dmx} + b_{L} R_{dmx} + c_{L} D_{1}[S_{1}] + d_{L} D_{2}[S_{2}] + e_{L} D_{3}[S_{3}]$

The above shows the example of the upmixed front left channel. The other channels can be formulated in the same way. The D elements are the decorrelators, and a to e are weights that can be calculated from the entries of the matrices M1 and M2.

Specifically, the factors a to e can be formulated directly from the matrix entries:

$a_{L} = \sum_{i=1}^{3} M1_{i,1} \, M2_{1,i}$

$b_{L} = \sum_{i=1}^{3} M1_{i,2} \, M2_{1,i}$

$c_{L} = M2_{1,4}$

$d_{L} = M2_{1,5}$

$e_{L} = M2_{1,6}$

and correspondingly for the other channels.
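Written out in Python with zero-based indices, the weight formulas above might be sketched as follows. The function name is hypothetical and the matrix entries are invented placeholders, not values from the MPEG Surround standard; M1 is assumed to have 6 rows and 2 columns, and M2 to have 6 columns (three direct paths followed by three decorrelator paths).

```python
def upmix_weights(m1, m2, row):
    """Weights (a, b, c, d, e) of one upmix channel.

    m1: pre-decorrelator matrix, m1[i][j] with j = 0 for L_dmx, 1 for R_dmx.
    m2: mix matrix; `row` selects the upmix channel (0 for front left).
    """
    a = sum(m1[i][0] * m2[row][i] for i in range(3))
    b = sum(m1[i][1] * m2[row][i] for i in range(3))
    c, d, e = m2[row][3], m2[row][4], m2[row][5]
    return a, b, c, d, e


# Placeholder matrices for illustration only.
m1 = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5],
      [1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
m2 = [[0.8, 0.0, 0.1, 0.3, 0.0, 0.0]]
weights_front_left = upmix_weights(m1, m2, 0)
```

Rows 3 to 5 of `m1` (zero-based) are the ones that feed the decorrelators, so they do not enter a and b but determine the S signals.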

The S signals are

$S_{n} = M1_{n+3,1} L_{dmx} + M1_{n+3,2} R_{dmx}$

These S signals are the inputs to the decorrelators from the left-hand matrix in Fig. 14. The energy

$E[|D[S_{n}]|^{2}] = E[|S_{n}|^{2}]$

can be calculated as explained above. The decorrelator does not affect the energy.

A perceptually motivated way of performing multi-channel ambience extraction is to compare one channel against the sum of all other channels (note that this is only one option out of many). Taking channel L as an example, the rest of the channels reads:

$X_{L} = \sum_{Ch=(REST)} a_{Ch} L_{dmx} + \sum_{Ch=(REST)} b_{Ch} R_{dmx} + \sum_{Ch=(REST)} c_{Ch} D_{1}[S_{1}] + \sum_{Ch=(REST)} d_{Ch} D_{2}[S_{2}] + \sum_{Ch=(REST)} e_{Ch} D_{3}[S_{3}]$

We use "X" here because using "R" for "rest of the channels" might be confusing.

The energy of channel L is then

$E[|L|^{2}] = a_{L}^{2} E[|L_{dmx}|^{2}] + b_{L}^{2} E[|R_{dmx}|^{2}] + c_{L}^{2} E[|S_{1}|^{2}] + d_{L}^{2} E[|S_{2}|^{2}] + e_{L}^{2} E[|S_{3}|^{2}] + 2 a_{L} b_{L} \, \mathrm{Re}\{ E[L_{dmx} R_{dmx}^{*}] \}$

The energy of channel X is

$E[|X_{L}|^{2}] = \left( \sum_{Ch=(REST)} a_{Ch} \right)^{2} E[|L_{dmx}|^{2}] + \left( \sum_{Ch=(REST)} b_{Ch} \right)^{2} E[|R_{dmx}|^{2}] + \left( \sum_{Ch=(REST)} c_{Ch} \right)^{2} E[|S_{1}|^{2}] + \left( \sum_{Ch=(REST)} d_{Ch} \right)^{2} E[|S_{2}|^{2}]$

$+ \left( \sum_{Ch=(REST)} e_{Ch} \right)^{2} E[|S_{3}|^{2}] + 2 \left( \sum_{Ch=(REST)} a_{Ch} \sum_{Ch=(REST)} b_{Ch} \right) \mathrm{Re}\{ E[L_{dmx} R_{dmx}^{*}] \}$

and the cross-spectrum is:

$E[L X_{L}^{*}] = \sum_{Ch=(REST)} a_{Ch} a_{L} E[|L_{dmx}|^{2}] + \sum_{Ch=(REST)} b_{Ch} b_{L} E[|R_{dmx}|^{2}] + \sum_{Ch=(REST)} c_{Ch} c_{L} E[|S_{1}|^{2}] + \sum_{Ch=(REST)} d_{Ch} d_{L} E[|S_{2}|^{2}]$

$+ \sum_{Ch=(REST)} e_{Ch} e_{L} E[|S_{3}|^{2}] + \sum_{Ch=(REST)} a_{L} b_{Ch} E[L_{dmx} R_{dmx}^{*}] + \sum_{Ch=(REST)} a_{Ch} b_{L} E[L_{dmx} R_{dmx}^{*}]^{*}$

We can now formulate the ICC

$ICC_{L} = \frac{\mathrm{Re}\{ E[L X_{L}^{*}] \}}{\sqrt{E[|L|^{2}] \, E[|X_{L}|^{2}]}}$

and the energy ratio sigma

$\sigma_{L} = \frac{E[|L|^{2}]}{E[|X_{L}|^{2}]}$
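The second step as a whole, from weights and downmix statistics to ICC and sigma, can be sketched in Python as below. The function names and argument layout are assumptions for illustration. The sketch follows the formulas above in treating the decorrelator outputs as mutually incoherent and incoherent with the downmix channels, so that only their energies contribute.

```python
import math


def channel_vs_rest(w_ch, w_rest, e_l, e_r, e_s, cross_lr):
    """Energies and cross-spectrum of one upmix channel versus the rest.

    w_ch, w_rest: weight tuples (a, b, c, d, e); w_rest holds the weights
    summed over all remaining channels. e_s: energies of S1, S2, S3.
    cross_lr: complex cross-spectrum E[L_dmx * conj(R_dmx)].
    """
    a, b = w_ch[0], w_ch[1]
    ax, bx = w_rest[0], w_rest[1]
    e_ch = a * a * e_l + b * b * e_r + 2.0 * a * b * cross_lr.real
    e_x = ax * ax * e_l + bx * bx * e_r + 2.0 * ax * bx * cross_lr.real
    cross = (a * ax * e_l + b * bx * e_r
             + a * bx * cross_lr + ax * b * cross_lr.conjugate())
    # Decorrelator branches contribute through their energies only.
    for w, wx, es in zip(w_ch[2:], w_rest[2:], e_s):
        e_ch += w * w * es
        e_x += wx * wx * es
        cross += w * wx * es
    return e_ch, e_x, cross


def icc_and_sigma(e_ch, e_x, cross):
    """ICC and energy ratio sigma of a channel against the rest."""
    icc = cross.real / math.sqrt(e_ch * e_x)
    sigma = e_ch / e_x
    return icc, sigma
```

A practical implementation would also need to guard against zero energies before dividing; that is left out here to keep the correspondence with the formulas visible.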

Third step (inter-channel information of the upmix channels to DTT parameters of the upmix channels)

We can now calculate the DTT of channel L according to

$DTT_{L} = \frac{1}{2} \left[ \left( 1 - \frac{1}{\sigma_{L}} \right) + \sqrt{ \left( \frac{1}{\sigma_{L}} - 1 \right)^{2} + 4 \frac{ICC_{L}^{2}}{\sigma_{L}} } \right]$

The direct energy of L is

$E[|D_{L}|^{2}] = DTT_{L} \cdot E[|L|^{2}]$

The ambient energy of L is

$E[|A_{L}|^{2}] = (1 - DTT_{L}) \cdot E[|L|^{2}]$
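The DTT formula and the resulting energy split translate almost literally into Python. The function names below are hypothetical, and the sketch assumes sigma > 0 and |ICC| ≤ 1, in which case the result stays between 0 and 1.

```python
import math


def direct_to_total(icc, sigma):
    """Direct-to-total energy ratio (DTT) from ICC and energy ratio sigma."""
    inv = 1.0 / sigma
    return 0.5 * ((1.0 - inv)
                  + math.sqrt((inv - 1.0) ** 2 + 4.0 * icc * icc * inv))


def split_energy(e_ch, icc, sigma):
    """Return (direct energy, ambient energy) of a channel."""
    ratio = direct_to_total(icc, sigma)
    return ratio * e_ch, (1.0 - ratio) * e_ch
```

For equal energies (sigma = 1) the expression reduces to DTT = |ICC|, so a fully coherent channel is treated as entirely direct and a fully incoherent one as entirely ambient.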

Fourth step (downmixing the direct/ambient energies)

If, for example, an incoherent downmixing rule is used, the ambient energy of the left downmix channel is

$E[|A_{Ldmx}|^{2}] = E[|A_{L}|^{2}] + E[|A_{Ls}|^{2}] + \frac{E[|A_{C}|^{2}] + E[|A_{LF}|^{2}]}{2}$

and correspondingly for the direct part, so that both the direct and the ambient parts of the left downmix channel are obtained. Note that the above is only one downmixing rule; other downmixing rules are also possible.
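As a minimal Python sketch of this particular incoherent downmixing rule, the ambient energies of the upmix channels can be held in a dictionary and combined as in the formula above. The function name, the dictionary keys, and the numbers are illustrative assumptions only.

```python
def left_downmix_ambience(e_ambient):
    """Ambient energy of the left downmix channel (incoherent rule).

    e_ambient maps upmix channel names to their ambient energies; the
    centre and LF channels contribute half their energy to each side.
    """
    return (e_ambient["L"] + e_ambient["Ls"]
            + 0.5 * (e_ambient["C"] + e_ambient["LF"]))


example = {"L": 1.0, "Ls": 2.0, "C": 3.0, "LF": 1.0}
e_a_ldmx = left_downmix_ambience(example)
```

The same function applied to a dictionary of direct energies would give the direct part of the left downmix channel under this rule.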

Fifth step (calculating the weights for ambience extraction in the downmix channels)

The left downmix DTT ratio is

$DTT_{Ldmx} = 1 - \frac{E[|A_{Ldmx}|^{2}]}{E[|L_{dmx}|^{2}]}$

The weight factors can then be calculated as described in the Fig. 5 embodiment (i.e., using the sqrt(DTT) or sqrt(1-DTT) approach) or as described in the Fig. 6 embodiment (i.e., using the crossmixing matrix method).
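For the sqrt(DTT) and sqrt(1-DTT) variant, the fifth step might be sketched in Python as follows. The function names, the small constant `eps`, and the clamping of the ratio to the range 0 to 1 are assumptions added for numerical safety; they are not part of the formula above. The crossmixing matrix variant is not covered by this sketch.

```python
import math


def downmix_dtt(e_ambient_dmx, e_dmx, eps=1e-12):
    """DTT of a downmix channel from its ambient and total energy."""
    ratio = e_ambient_dmx / max(e_dmx, eps)
    # Clamp so that estimation errors cannot push the ratio out of range.
    return min(1.0, max(0.0, 1.0 - ratio))


def extraction_gains(dtt):
    """Gains applied to the downmix for the direct and ambient outputs."""
    return math.sqrt(dtt), math.sqrt(1.0 - dtt)


dtt_left = downmix_dtt(1.0, 4.0)
g_direct, g_ambient = extraction_gains(dtt_left)
```

With these gains the squared weights sum to one, so the energies of the extracted direct and ambient parts add up to the energy of the downmix channel.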

Basically, the exemplary procedure described above relates the CPC, ICC, and CLD parameters of the MPS stream to the ambience ratios of the downmix channels.

According to further embodiments, there are typically other means of achieving similar goals, and other conditions as well. For example, there may be downmixing rules other than those described above, other loudspeaker layouts, other decoding methods, and other ways of performing the multi-channel ambience estimation in which a specific channel is compared with the remaining channels.

Although the present invention has been described in the context of block diagrams in which the blocks represent actual or logical hardware components, the present invention can also be implemented by a computer-implemented method. In the latter case, the blocks represent corresponding method steps, where these steps stand for the functionalities performed by the corresponding logical or physical hardware blocks.

The described embodiments merely illustrate the principles of the present invention. It is understood that modifications and variations of the arrangements and details described herein will be apparent to those skilled in the art. The intent is therefore to be limited only by the scope of the appended claims and not by the specific details presented by way of description and explanation of the embodiments herein.

Depending on certain implementation requirements, the inventive methods can be implemented in hardware or in software. The implementation can be performed using a digital storage medium, in particular a disc, a DVD, or a CD having electronically readable control signals stored thereon, which cooperates with a programmable computer system such that the inventive methods are performed. Generally, the present invention can therefore be implemented as a computer program product with program code stored on a machine-readable carrier, the program code being operative for performing the inventive methods when the computer program product runs on a computer. In other words, the inventive methods are therefore a computer program having program code for performing at least one of the inventive methods when the computer program runs on a computer. The inventive encoded audio signal can be stored on any machine-readable storage medium, such as a digital storage medium.

该新颖构想及技术的优点为本案所述前述实施例，也即装置、方法或计算机程序允许借助于参数空间信息而从音频信号估算与提取直接和/或周围组件。更明确言之，本发明的新颖处理在频带中发挥功能，如同典型地在周围提取领域中那样。所呈现的构想与音频信号处理有关，原因在于有多项应用要求直接及周围组件与音频信号分开。The advantages of this novel concept and technique are the foregoing embodiments described in this application, ie a device, a method or a computer program allowing estimation and extraction of direct and/or surrounding components from an audio signal by means of parametric spatial information. More specifically, the novel processing of the present invention functions in frequency bands as typically in the field of ambient extraction. The concepts presented are relevant to audio signal processing, since several applications require separation of the immediate and surrounding components from the audio signal.

与先前技术的周围提取方法相反，本构想并非仅基于立体输入信号，其也可应用至单声道下混情况。用于单一声道下混，通常并无声道间差异可资运算。但通过考虑空间侧边信息，周围提取在此种情况也变可能。Contrary to the surrounding extraction methods of the prior art, the present concept is not based only on stereo input signals, it can also be applied to the mono downmix case. For single-channel downmixing, there is usually no channel-to-channel difference to calculate. But by considering spatial side information, surrounding extraction becomes possible in this case as well.

本发明的优点在于其利用空间参数来估算“原先”信号的周围位准。其基于下述构想：空间参数已经含有有关“原先”立体声或多声道信号的声道间差的相关信息。An advantage of the present invention is that it uses spatial parameters to estimate the ambient level of the "old" signal. It is based on the idea that the spatial parameters already contain relevant information about the inter-channel differences of the "original" stereo or multi-channel signal.

Once the ambience levels of the original stereo or multi-channel signal have been estimated, the direct and ambience levels in the provided downmix channels can also be derived. This may be done by linear combinations (i.e. weighted sums) of the ambience energies for the ambience part and of the direct energies or amplitudes for the direct part. Embodiments of the present invention therefore provide ambience estimation and extraction with the aid of spatial side information.

Extending from this concept of side-information-based processing, the following beneficial properties or advantages exist.

Embodiments of the present invention provide ambience estimation with the aid of spatial side information and the provided downmix channels. Such ambience estimation is important when more than one downmix channel is provided along with the side information. The side information and the information measured from the downmix channels can be used together in the ambience estimation. In MPEG Surround with a stereo downmix, these two information sources together provide complete information on the inter-channel relations of the original multi-channel sound, and the ambience estimation is based on these relations.

Embodiments of the present invention also provide downmixing of the direct and ambience energies. In the described side-information-based ambience extraction, there is an intermediate step of estimating the ambience in a number of channels higher than the number of provided downmix channels. This ambience information therefore has to be mapped to the number of downmix audio channels in a valid manner. This process can be referred to as downmixing because it corresponds to the downmixing of the audio channels. It is most straightforwardly done by combining the direct and ambience energies in the same way as the provided downmix channels were downmixed.
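As a purely illustrative sketch of this energy downmix, the following Python fragment combines the per-channel direct and ambience energies of one frequency band into a single downmix channel. It assumes, in line with the coherent/incoherent summation recited in the claims, that direct amplitudes add coherently and ambience energies add incoherently. The function names, the real-valued downmix weights and the in-phase assumption for the direct parts are assumptions made here and are not taken from the embodiments.

```python
import math


def downmix_energies(weights, direct_energies, ambient_energies):
    """Map per-channel direct/ambience energies of one band to one downmix channel.

    Hypothetical helper: assumes the direct parts are fully coherent and in
    phase across channels (amplitudes add) and the ambience parts are mutually
    incoherent (energies add).
    """
    if not (len(weights) == len(direct_energies) == len(ambient_energies)):
        raise ValueError("all inputs must have one entry per channel")
    # Coherent summation: weighted sum of amplitudes, then squared.
    direct_amplitude = sum(
        w * math.sqrt(e) for w, e in zip(weights, direct_energies)
    )
    direct = direct_amplitude ** 2
    # Incoherent summation: weighted sum of energies.
    ambient = sum(w * w * e for w, e in zip(weights, ambient_energies))
    return direct, ambient


def direct_to_total(direct, ambient):
    """Direct-to-total (DTT) energy ratio of the downmix channel."""
    total = direct + ambient
    return direct / total if total > 0.0 else 0.0
```

With two channels of equal direct and ambience energy and unit weights, the coherent sum gives the direct part twice the energy of the ambience part in the downmix channel, which is why the two parts cannot simply be downmixed with the same rule.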

There is no single ideal downmix rule; rather, the rule is likely to depend on the application. For example, in MPEG Surround it can be advantageous to treat the channels (center, front loudspeakers, rear loudspeakers) differently because their signal content typically differs.

Moreover, embodiments provide a multi-channel ambience estimation that is performed independently for each channel with respect to the other channels. This property allows the presented stereo ambience estimation formula to be applied simply to each channel relative to all other channels. In this way, it is not necessary to assume an equal ambience level in all channels. The presented approach is based on the assumption, concerning spatial perception, that the ambience component in each channel is the component that has an incoherent counterpart in some or all of the other channels. An example suggesting that this assumption is valid is that one of two channels emitting noise (ambience) can be further divided into two channels with half the energy each without significantly affecting the perceived sound scene.

In terms of signal processing, it is advantageous that the actual direct/ambience ratio estimation can be performed by applying the presented ambience estimation formula to each channel in comparison with a linear combination of all other channels.

Finally, embodiments provide an application of the estimated direct and ambience energies to extract the actual signals. Once the ambience levels in the downmix channels are known, two inventive methods can be applied to obtain the ambience signals. The first method is based on a simple multiplication, in which the direct and ambience parts of each downmix channel are generated by multiplying the signal by sqrt(direct-to-total energy ratio) and sqrt(ambient-to-total energy ratio), respectively. For each downmix channel this yields two signals that are coherent with each other but have the energies estimated for the direct part and the ambience part.
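The first method can be sketched as follows for the samples of one downmix channel in one frequency band. This is only a minimal illustration under the assumption that the direct-to-total (DTT) energy ratio of that band is already known and that the ambient-to-total ratio is ATT = 1 - DTT; the function name and the use of a plain list of real-valued samples are hypothetical.

```python
import math


def split_by_energy_ratio(samples, dtt):
    """Split one band of a downmix channel into direct and ambience parts.

    Hypothetical helper: `dtt` is the direct-to-total energy ratio in [0, 1];
    the ambient-to-total ratio is taken as 1 - dtt.
    """
    if not 0.0 <= dtt <= 1.0:
        raise ValueError("dtt must lie in [0, 1]")
    direct_gain = math.sqrt(dtt)         # sqrt(direct-to-total energy ratio)
    ambient_gain = math.sqrt(1.0 - dtt)  # sqrt(ambient-to-total energy ratio)
    direct = [direct_gain * x for x in samples]
    ambient = [ambient_gain * x for x in samples]
    return direct, ambient
```

Because both outputs are scaled copies of the same input, they are fully coherent with each other, while their energies add up to the energy of the input band.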

The second method is based on a least-mean-square solution with cross-mixing of the channels, in which the cross-mixing (possibly with negative signs) allows a better estimation of the direct and ambience signals than the above solution. In contrast to the least-mean-square solution for stereo input and equal ambience levels in the channels provided in "Multi-Loudspeaker Playback of Stereo Signals", C. Faller, AES Conference, October 2007, and in "Patent application title: Method to Generate Multi-Channel Audio Signals from Stereo Signals", Inventor: Christof Faller, Agent: FISH & RICHARDSON P.C., Assignee: LG Electronics, Origin: Minneapolis, Minnesota, USA, IPC8 Class: AH04R500FI, USPC Class: 3811, the present invention provides a least-mean-square solution that does not require equal ambience levels and that is extendable to any number of channels.
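A minimal sketch of a least-mean-square extraction with cross-mixing for two downmix channels is given below. It is not the derivation used in the embodiments. It assumes a simple signal model in which each downmix channel is the sum of a direct and an ambience component that are mutually uncorrelated, described per frequency band by real-valued 2-by-2 covariance matrices C_d and C_a. Under that assumption, the standard linear minimum-mean-square-error estimator of the direct part is W_d = C_d (C_d + C_a)^-1, and correspondingly W_a = C_a (C_d + C_a)^-1 for the ambience part. All function names are hypothetical.

```python
def invert_2x2(m):
    """Invert a real 2-by-2 matrix given as nested lists."""
    (a, b), (c, d) = m
    det = a * d - b * c
    if abs(det) < 1e-12:
        raise ValueError("covariance matrix is singular")
    return [[d / det, -b / det], [-c / det, a / det]]


def multiply_2x2(p, q):
    """Multiply two real 2-by-2 matrices."""
    return [
        [sum(p[i][k] * q[k][j] for k in range(2)) for j in range(2)]
        for i in range(2)
    ]


def lms_extraction_matrices(cov_direct, cov_ambient):
    """Return (W_d, W_a) for the assumed model x = direct + ambient.

    The off-diagonal entries of W_d and W_a are the cross-mixing terms;
    the ambience levels of the two channels need not be equal.
    """
    cov_downmix = [
        [cov_direct[i][j] + cov_ambient[i][j] for j in range(2)]
        for i in range(2)
    ]
    inverse = invert_2x2(cov_downmix)
    return multiply_2x2(cov_direct, inverse), multiply_2x2(cov_ambient, inverse)
```

For a direct source panned towards the first channel and unequal ambience levels in the two channels, the ambience matrix obtained this way contains negative cross-mixing terms, which illustrates the remark above about negative signs. The two matrices sum to the identity, so the extracted direct and ambience parts add up to the downmix signal.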

Additional properties of the novel processing are as follows. In the ambience processing for binaural rendering, the ambience can be processed with a filter that has the property of providing, in frequency bands, an inter-aural coherence similar to the inter-aural coherence of a real diffuse sound field, wherein the filter may also include room effects. In the direct-part processing for binaural rendering, the direct part can be fed through head-related transfer functions (HRTFs), possibly with added room effects such as early reflections and/or reverberation.

Besides this, a "level of separation" control corresponding to a dry/wet control may be realized in further embodiments. In particular, full separation may not be desirable in many applications, as it may lead to audible artifacts such as abrupt changes or modulation effects. All relevant parts of the described processes can therefore be implemented with a "level of separation" control for controlling the amount of separation that is desired and useful. With regard to FIG. 11, such a level-of-separation control is indicated by the dashed box of the control input 1105 that controls the direct/ambience separation 1120 and/or the binaural rendering devices 910, 1010. This control may work similarly to a dry/wet control in audio effects processing.

The main effects of the proposed solution are as follows. The system works in all situations, including parametric stereo and MPEG Surround with a mono downmix, unlike previous solutions that rely on the downmix information only. Moreover, compared with a simple inter-channel analysis of the downmix channels, the system can use the spatial side information conveyed together with the audio signal in the spatial audio bitstream to estimate the direct and ambience energies more accurately. Many applications, such as binaural processing, can therefore benefit from applying different processing to the direct and ambience parts of the sound.

Embodiments are based on the following psychoacoustic assumptions. The human auditory system localizes sound sources based on inter-aural cues in time-frequency tiles (areas restricted to a certain frequency and time range). If two or more incoherent concurrent sources that overlap in time and frequency are presented simultaneously at different locations, the auditory system is not able to perceive the locations of the sources, because the sum of these sources does not produce reliable inter-aural cues at the listener. The auditory system may thus be described as picking up from the audio scene those time-frequency tiles that provide reliable localization information and treating the rest as unlocalizable. In this way the auditory system is able to localize sources in complex sound environments. Simultaneous coherent sources have a different effect: they form approximately the same inter-aural cues that a single source located between the coherent sources would form.

This is also the property that embodiments take advantage of. The levels of the localizable (direct) and unlocalizable (ambience) sound can be estimated, and these components can then be extracted. The spatialization signal processing is applied only to the localizable/direct part, while the diffuseness/spaciousness/envelopment processing is applied to the unlocalizable/ambience part. This gives a significant benefit in the design of a binaural processing system, since many processes can be applied only where they are needed, leaving the remaining signal unaffected. All processing takes place in frequency bands that approximate the frequency resolution of human hearing.

Embodiments are based on a decomposition of the signal that maximizes the perceptual quality while minimizing the perceived problems. With such a decomposition, the direct and ambience components of an audio signal can be obtained separately. The two components can then be further processed to achieve a desired effect or representation.

In particular, embodiments of the present invention allow ambience estimation in the coded domain with the aid of the spatial side information.

The present invention is also advantageous in that typical problems of headphone reproduction of audio signals can be reduced by separating the signals into a direct signal and an ambience signal. Embodiments allow existing direct/ambience extraction methods to be improved for application to binaural sound rendering for headphone reproduction.

The main use cases of the processing based on spatial side information are naturally MPEG Surround and parametric stereo (and similar parametric coding techniques). Typical applications that benefit from ambience extraction are binaural playback, because different degrees of room effect can be applied to different parts of the sound, and upmixing to a higher number of channels, because different components of the sound can be positioned and processed differently. There may also be applications in which the user requires modification of the direct/ambience level, for example to enhance speech intelligibility.

Claims

1. An apparatus (100) for extracting a direct and/or ambience signal (125-1, 125-2) from a downmix signal (115) and spatial parametric information (105), the downmix signal (115) and the spatial parametric information (105) representing a multi-channel audio signal (101) having more channels (Ch_1 ... Ch_N) than the downmix signal (115), wherein the spatial parametric information (105) comprises inter-channel relations of the multi-channel audio signal (101), the apparatus (100) comprising:

a direct/ambience estimator (110) for estimating level information (113) of a direct portion and/or an ambient portion of the multi-channel audio signal (101) based on the spatial parametric information (105); and

a direct/ambience extractor (120) for extracting the direct signal portion (125-1) and/or the ambient signal portion (125-2) from the downmix signal (115) based on the estimated level information (113) of the direct portion or the ambient portion.

2. The apparatus according to claim 1, wherein the direct/ambience extractor (420) is configured to downmix the estimated level information (113) of the direct portion or the ambient portion to obtain downmixed level information of the direct portion or the ambient portion, and to extract the direct signal portion (125-1) or the ambient signal portion (125-2) from the downmix signal (115) based on the downmixed level information.

3. The apparatus according to claim 2, wherein the direct/ambience extractor (420) is further configured to perform the downmix of the estimated level information (113) of the direct portion or the ambient portion by combining the estimated level information of the direct portion with coherent summation and the estimated level information of the ambient portion with incoherent summation.

4. The apparatus according to claim 2 or 3, wherein the direct/ambience extractor (520) is further configured to derive gain parameters (565-1, 565-2) from the downmixed level information (555-1, 555-2) of the direct portion or the ambient portion, and to apply the derived gain parameters (565-1, 565-2) to the downmix signal (115) to obtain the direct signal portion (125-1) or the ambient signal portion (125-2).

5. The apparatus according to claim 4, wherein the direct/ambience extractor (520) is further configured to determine a direct-to-total (DTT) energy ratio or an ambient-to-total (ATT) energy ratio from the downmixed level information (555-1, 555-2) of the direct portion or the ambient portion, and to use extraction parameters based on the determined DTT or ATT energy ratio as the gain parameters (565-1, 565-2).

6. The apparatus according to any one of claims 1 to 5, wherein the direct/ambience extractor (520) is configured to extract the direct signal portion (125-1) or the ambient signal portion (125-2) by applying a square M-by-M extraction matrix to the downmix signal (115), wherein the size (M) of the square M-by-M extraction matrix corresponds to the number (M) of downmix channels (Ch_1 ... Ch_M).

7. The apparatus according to claim 6, wherein the direct/ambience extractor (520) is further configured to apply a first plurality of extraction parameters to the downmix signal (115) to obtain the direct signal portion (125-1) and a second plurality of extraction parameters to the downmix signal (115) to obtain the ambient signal portion (125-2), the first plurality of extraction parameters and the second plurality of extraction parameters each constituting a diagonal matrix.

8. The apparatus according to any one of claims 1 to 7, wherein the direct/ambience estimator (110) is configured to estimate the level information (113) of the direct portion or the ambient portion of the multi-channel audio signal (101) based on the spatial parametric information (105) and at least two downmix channels (825) of the downmix signal (115) received by the direct/ambience estimator (110).

9. The apparatus according to any one of claims 1 to 8, wherein the direct/ambience estimator (710) is configured to apply, for each channel (Ch_i) of the multi-channel audio signal (101), a stereo ambience estimation formula using the spatial parametric information (105), wherein the stereo ambience estimation formula is given by

$$\mathrm{DTT}_i = f_{\mathrm{DTT}}\left[\sigma_i(\mathrm{Ch}_i, R),\ \mathrm{ICC}_i(\mathrm{Ch}_i, R)\right]$$

$$\mathrm{ATT}_i = 1 - \mathrm{DTT}_i$$

the formula depending on the channel level difference (CLD_i), which is the decibel value of σ_i, and on the inter-channel coherence (ICC_i) parameter of the channel Ch_i, wherein R is a linear combination of the remaining channels.

10. The apparatus according to any one of claims 1 to 9, wherein the direct/ambience extractor (620) is configured to extract the direct signal portion (125-1) or the ambient signal portion (125-2) by a least-mean-square (LMS) solution with channel cross-mixing, the LMS solution not requiring equal ambience levels.

11. The apparatus according to claim 9, wherein the direct/ambience extractor (620) is configured to derive the LMS solution by assuming a signal model, such that the LMS solution is not restricted to a stereo channel downmix signal.

12. The apparatus according to any one of claims 1 to 11, wherein the apparatus further comprises:

a binaural direct sound rendering device (910) for processing the direct signal portion (125-1) to obtain a first binaural output signal (915);

a binaural ambient sound rendering device (1010) for processing the ambient signal portion (125-2) to obtain a second binaural output signal (1015); and

a combiner (1130) for combining the first binaural output signal (915) and the second binaural output signal (1015) to obtain a combined binaural output signal (1135).

13. The apparatus according to claim 12, wherein the binaural ambient sound rendering device (1010) is configured to apply a room effect and/or a filter to the ambient signal portion (125-2) to provide the second binaural output signal (1015), the second binaural output signal (1015) being adapted to the inter-aural coherence of a real diffuse sound field.

14. The apparatus according to claim 12 or 13, wherein the binaural direct sound rendering device (910) is configured to feed the direct signal portion (125-1) through filters based on head-related transfer functions (HRTFs) to obtain the first binaural output signal (915).

15. A method (100) for extracting a direct and/or ambience signal (125-1, 125-2) from a downmix signal (115) and spatial parametric information (105), the downmix signal (115) and the spatial parametric information (105) representing a multi-channel audio signal (101) having more channels (Ch_1 ... Ch_N) than the downmix signal (115), wherein the spatial parametric information (105) comprises inter-channel relations of the multi-channel audio signal (101), the method (100) comprising:

estimating (110) level information (113) of a direct portion and/or an ambient portion of the multi-channel audio signal (101) based on the spatial parametric information (105); and

extracting (120) the direct signal portion (125-1) and/or the ambient signal portion (125-2) from the downmix signal (115) based on the estimated level information (113) of the direct portion or the ambient portion.

16. A computer program having program code for performing the method (100) according to claim 15 when the computer program is executed on a computer.