CN102804264A - Apparatus and method for extracting a direct/ambience signal from a downmix signal and spatial parametric information - Google Patents
- Publication number
- CN102804264A
- Authority
- CN
- China
- Prior art keywords
- direct
- signal
- channel
- ambient
- sound
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L19/00—Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
- G10L19/008—Multichannel audio signal coding or decoding using interchannel correlation to reduce redundancy, e.g. joint-stereo, intensity-coding or matrixing
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04S—STEREOPHONIC SYSTEMS
- H04S2420/00—Techniques used stereophonic systems covered by H04S but not provided for in its groups
- H04S2420/01—Enhancing the perception of the sound image or of the spatial distribution using head related transfer functions [HRTF's] or equivalents thereof, e.g. interaural time difference [ITD] or interaural level difference [ILD]
Abstract
An apparatus for extracting a direct/ambience signal from a downmix signal and spatial parametric information is described, the downmix signal and the spatial parametric information representing a multi-channel audio signal having more channels than the downmix signal, wherein the spatial parametric information comprises inter-channel relations of the multi-channel audio signal. The apparatus comprises a direct/ambience estimator and a direct/ambience extractor. The direct/ambience estimator is configured for estimating level information of a direct portion and/or an ambient portion of the multi-channel audio signal based on the spatial parametric information. The direct/ambience extractor is configured for extracting a direct signal portion and/or an ambient signal portion from the downmix signal based on the estimated level information of the direct portion or the ambient portion.
Description
Technical Field
The present invention relates to audio signal processing and, in particular, to an apparatus and a method for extracting a direct/ambience signal from a downmix signal and spatial parametric information. Further embodiments of the invention relate to the use of direct/ambience separation for enhancing binaural reproduction of audio signals. Yet further embodiments relate to binaural reproduction of multi-channel sound, where multi-channel audio means audio having two or more channels. Typical audio content with multi-channel sound includes movie soundtracks and multi-channel music recordings.
Background Art
The human spatial hearing system tends to process sound roughly in two parts: a localizable or direct part on the one hand, and an unlocalizable or ambient part on the other. Many audio processing applications, such as binaural sound reproduction and multi-channel upmixing, benefit from access to these two audio components.
Direct/ambience separation methods are known in the art, for example from "Primary-ambient signal decomposition and vector-based localization for spatial audio coding and enhancement", Goodwin, Jot, IEEE International Conference on Acoustics, Speech and Signal Processing, April 2007; "Correlation-based ambience extraction from stereo recordings", Merimaa, Goodwin, Jot, AES 123rd Convention, New York, 2007; "Multiple-loudspeaker playback of stereo signals", C. Faller, AES conference, October 2007; "Primary-ambient decomposition of stereo audio signals using a complex similarity index", Goodwin et al., publication number US2009/0198356A1, August 2009; "Patent application title: Method to generate multi-channel audio signal from stereo signals", inventor: Christof Faller, agent: FISH & RICHARDSON P.C., assignee: LG Electronics, Inc., origin: Minneapolis, MN, US, IPC8 class: AH04R500FI, USPC class: 3811; and "Ambience generation for stereo signals", Avendano et al., date issued: July 28, 2009, application number: 10/163,158, filed: June 4, 2002. These methods can be used in a variety of applications. State-of-the-art direct/ambience separation algorithms are based on an inter-channel signal comparison of stereo sound in frequency bands.
Moreover, binaural playback with ambience extraction is addressed in "Binaural 3-D audio rendering based on spatial audio scene coding", Goodwin, Jot, AES 123rd Convention, New York, 2007. Ambience extraction in connection with binaural reproduction is also described in J. Usher and J. Benesty, "Enhancement of spatial sound quality: a new reverberation-extraction audio upmixer", IEEE Transactions on Audio, Speech, and Language Processing, vol. 15, pp. 2141-2150, September 2007. The latter paper focuses on ambience extraction in stereo microphone recordings, using adaptive least-mean-square cross-channel filtering of the direct component in each channel. Spatial audio codecs, e.g. MPEG Surround, typically consist of a one- or two-channel audio stream in combination with spatial side information, which extends the audio into multiple channels, as described in ISO/IEC 23003-1 (MPEG Surround) and in Breebaart, J., Herre, J., Villemoes, L., Jin, C., Kjörling, K., Plogsties, J., Koppens, J. (2006), "Multi-channel goes mobile: MPEG Surround binaural rendering", Proceedings of the 29th AES Conference, Seoul, Korea.
However, modern parametric audio coding technologies such as MPEG Surround (MPS) and Parametric Stereo (PS) provide only a reduced number of audio downmix channels, in some cases only one, along with additional spatial side information. A comparison between the "original" input channels is then only possible after first decoding the sound into the intended output format.
Therefore, a concept for extracting a direct signal portion or an ambient signal portion from a downmix signal and spatial parametric information is required. However, there is no existing solution for direct/ambience extraction that uses the parametric side information.
It is therefore an object of the present invention to provide a concept for extracting a direct signal portion or an ambient signal portion from a downmix signal by using spatial parametric information.
This object is achieved by an apparatus according to claim 1, a method according to claim 15, or a computer program according to claim 16.
Summary of the Invention
The basic idea underlying the present invention is that the above-mentioned direct/ambience extraction can be achieved when level information of a direct portion or an ambient portion of a multi-channel audio signal is estimated based on the spatial parametric information, and a direct signal portion or an ambient signal portion is extracted from a downmix signal based on the estimated level information. Here, the downmix signal and the spatial parametric information represent the multi-channel audio signal, which has more channels than the downmix signal. This approach allows direct and/or ambience extraction from a downmix signal having one or more input channels by using spatial parametric side information.
According to an embodiment of the present invention, an apparatus for extracting a direct and/or ambience signal from a downmix signal and spatial parametric information comprises a direct/ambience estimator and a direct/ambience extractor. The downmix signal and the spatial parametric information represent a multi-channel audio signal having more channels than the downmix signal. Moreover, the spatial parametric information comprises inter-channel relations of the multi-channel audio signal. The direct/ambience estimator is configured for estimating level information of a direct portion or an ambient portion of the multi-channel audio signal based on the spatial parametric information. The direct/ambience extractor is configured for extracting a direct signal portion or an ambient signal portion from the downmix signal based on the estimated level information of the direct portion or the ambient portion.
According to a further embodiment of the present invention, an apparatus for extracting a direct and/or ambience signal from a downmix signal and spatial parametric information comprises a binaural direct sound rendering device, a binaural ambient sound rendering device and a combiner. The binaural direct sound rendering device is configured for processing the direct signal portion to obtain a first binaural output signal. The binaural ambient sound rendering device is configured for processing the ambient signal portion to obtain a second binaural output signal. The combiner is configured for combining the first and the second binaural output signal to obtain a combined binaural output signal. A binaural reproduction of an audio signal can thus be provided in which the direct signal portion and the ambient signal portion of the audio signal are processed separately.
Brief Description of the Drawings
Fig. 1 shows a block diagram of an embodiment of an apparatus for extracting a direct/ambience signal from a downmix signal and spatial parametric information representing a multi-channel audio signal;
Fig. 2 shows a block diagram of an embodiment of an apparatus for extracting a direct/ambience signal from a mono downmix signal and spatial parametric information representing a parametric stereo audio signal;
Fig. 3a shows a schematic illustration of a spectral decomposition of a multi-channel audio signal according to an embodiment of the present invention;
Fig. 3b shows a schematic illustration for calculating inter-channel relations of a multi-channel audio signal based on the spectral decomposition of Fig. 3a;
Fig. 4 shows a block diagram of an embodiment of a direct/ambience extractor with downmixing of estimated level information;
Fig. 5 shows a block diagram of a further embodiment of a direct/ambience extractor that applies gain parameters to a downmix signal;
Fig. 6 shows a block diagram of a further embodiment of a direct/ambience extractor based on a least-mean-square (LMS) solution with channel crossmixing;
Fig. 7a shows a block diagram of an embodiment of a direct/ambience estimator using a stereo ambience estimation formula;
Fig. 7b shows a graph of an exemplary direct-to-total energy ratio versus inter-channel coherence;
Fig. 8 shows a block diagram of an encoder/decoder system according to an embodiment of the present invention;
Fig. 9a shows a block diagram of an overview of binaural direct sound rendering according to an embodiment of the present invention;
Fig. 9b shows a block diagram of details of the binaural direct sound rendering of Fig. 9a;
Fig. 10a shows a block diagram of an overview of binaural ambient sound rendering according to an embodiment of the present invention;
Fig. 10b shows a block diagram of details of the binaural ambient sound rendering of Fig. 10a;
Fig. 11 shows a conceptual block diagram of an embodiment of binaural reproduction of a multi-channel audio signal;
Fig. 12 shows an overall block diagram of an embodiment of direct/ambience extraction including binaural reproduction;
Fig. 13a shows a block diagram of an embodiment of an apparatus for extracting a direct/ambience signal from a mono downmix signal in a filter bank domain;
Fig. 13b shows a block diagram of an embodiment of the direct/ambience extraction block of Fig. 13a; and
Fig. 14 shows a schematic illustration of an exemplary MPEG Surround decoding scheme according to a further embodiment of the present invention.
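Detailed Description
Fig. 1 shows a block diagram of an embodiment of an apparatus 100 for extracting a direct/ambience signal 125-1, 125-2 from a downmix signal 115 and spatial parametric information 105. As shown in Fig. 1, the downmix signal 115 and the spatial parametric information 105 represent a multi-channel audio signal 101 having more channels Ch_1 … Ch_N than the downmix signal 115. The spatial parametric information 105 may comprise inter-channel relations of the multi-channel audio signal 101. In particular, the apparatus 100 comprises a direct/ambience estimator 110 and a direct/ambience extractor 120. The direct/ambience estimator 110 may be configured for estimating level information 113 of a direct portion or an ambient portion of the multi-channel audio signal 101 based on the spatial parametric information 105. The direct/ambience extractor 120 may be configured for extracting a direct signal portion 125-1 or an ambient signal portion 125-2 from the downmix signal 115 based on the estimated level information 113 of the direct portion or the ambient portion.
Fig. 2 shows a block diagram of an embodiment of an apparatus 200 for extracting a direct/ambience signal 125-1, 125-2 from a mono downmix signal 215 and spatial parametric information 105 representing a parametric stereo audio signal 201. The apparatus 200 of Fig. 2 essentially comprises the same blocks as the apparatus 100 of Fig. 1. Therefore, identical blocks having the same implementation and/or function are denoted by the same reference numerals. Moreover, the parametric stereo audio signal 201 of Fig. 2 may correspond to the multi-channel audio signal 101 of Fig. 1, and the mono downmix signal 215 of Fig. 2 to the downmix signal 115 of Fig. 1. In the embodiment of Fig. 2, the mono downmix signal 215 and the spatial parametric information 105 represent the parametric stereo audio signal 201. The parametric stereo audio signal may comprise a left channel, indicated by "L", and a right channel, indicated by "R". Here, the direct/ambience extractor 120 is configured to extract the direct signal portion 125-1 or the ambient signal portion 125-2 from the mono downmix signal 215 based on the estimated level information 113, which can be derived from the spatial parametric information 105 by use of the direct/ambience estimator 110.
In practice, the spatial parameters (spatial parametric information 105) in the embodiments of Fig. 1 or Fig. 2 refer in particular to MPEG Surround (MPS) or Parametric Stereo (PS) side information. These two technologies are state-of-the-art low-bitrate stereo or surround audio coding methods. With reference to Fig. 2, PS provides one downmix audio channel with spatial parameters; with reference to Fig. 1, MPS provides one, two or more downmix audio channels with spatial parameters.
In particular, the embodiments of Figs. 1 and 2 clearly show that the spatial parametric side information 105 can readily be used for direct and/or ambience extraction from a signal having one or more input channels (i.e. the downmix signal 115; 215).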
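The estimation of the direct and/or ambience levels (level information 113) is based on information about the inter-channel relations or inter-channel differences, such as level differences and/or correlation. These values can be calculated from a stereo or multi-channel signal. Fig. 3a shows a schematic illustration of a spectral decomposition 300 of a multi-channel audio signal (Ch_1 … Ch_N) used for calculating the inter-channel relations of the respective channels Ch_1 … Ch_N. As can be seen in Fig. 3a, the spectral decomposition of an inspected channel Ch_i of the multi-channel audio signal, or of a linear combination R of the remaining channels, comprises a plurality 301 of subbands, wherein each subband 303 of the plurality 301 of subbands extends along a horizontal axis (time axis 310) and has subband values 305, as indicated by the small boxes of the time/frequency grid. Moreover, the subbands 303 are located consecutively along a vertical axis (frequency axis 320), corresponding to different frequency regions of a filter bank. In Fig. 3a, the respective time/frequency tiles of the channel Ch_i and of the linear combination R are indicated by dashed lines. In the tile notation, the index i denotes the channel Ch_i and R the linear combination of the remaining channels, while the indices n and k correspond to certain filter bank time slots 307 and filter bank subbands 303. Based on these time/frequency tiles, e.g. tiles located at the same time/frequency point (t_0, f_0) with respect to the time/frequency axes 310, 320, the inter-channel relations 335, such as the inter-channel coherence (ICC_i) or the channel level difference (CLD_i) of the inspected channel Ch_i, can be calculated in a step 330, as shown in Fig. 3b. Here, the inter-channel relations ICC_i and CLD_i can be calculated using the following relations:
where Ch_i is the inspected channel, R is the linear combination of the remaining channels, and <…> denotes a time average. One example of a linear combination R of the remaining channels is their energy-normalized sum. Furthermore, the channel level difference (CLD_i) is typically the decibel value of the parameter σ_i.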
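The relations referred to above are not reproduced in this text. As a minimal sketch, assuming the conventional definitions, the coherence and the level parameter of one subband could be computed as

$$ICC_i = \frac{\mathrm{Re}\,\langle Ch_i R^{*} \rangle}{\sqrt{\langle Ch_i Ch_i^{*} \rangle \langle R R^{*} \rangle}}, \qquad \sigma_i = \sqrt{\frac{\langle Ch_i Ch_i^{*} \rangle}{\langle R R^{*} \rangle}}, \qquad CLD_i = 20 \log_{10} \sigma_i .$$

Taking the real part of the cross term and using an arithmetic mean for the time average are assumptions of this sketch, as are the function names; the patent's own formulas may differ in detail.

```python
import math

def time_average(values):
    """Arithmetic mean over the time slots of one subband (stands in for <...>)."""
    values = list(values)
    return sum(values) / len(values)

def inter_channel_relations(ch_i, rest):
    """Return (ICC_i, sigma_i, CLD_i in dB) for one subband.

    ch_i and rest are equal-length sequences of (complex) subband samples of
    the inspected channel Ch_i and of the linear combination R of the other
    channels. Both are assumed to carry non-zero energy.
    """
    p_i = time_average(abs(x) ** 2 for x in ch_i)   # <Ch_i Ch_i*>
    p_r = time_average(abs(r) ** 2 for r in rest)   # <R R*>
    cross = time_average(x * r.conjugate() for x, r in zip(ch_i, rest))  # <Ch_i R*>
    icc = cross.real / math.sqrt(p_i * p_r)  # assumption: real part of the cross term
    sigma = math.sqrt(p_i / p_r)             # amplitude ratio of Ch_i to R
    cld_db = 20.0 * math.log10(sigma)        # decibel value of sigma
    return icc, sigma, cld_db
```

For example, if R is an exact copy of Ch_i at twice the amplitude, the sketch yields a coherence of 1, σ_i = 0.5 and a CLD_i of about -6 dB.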
With reference to the above equations, the channel level difference (CLD_i) or the parameter σ_i may correspond to a level P_i of the channel Ch_i normalized to a level P_R of the linear combination R of the remaining channels. Here, the levels P_i or P_R can be derived from the inter-channel level difference parameter ICLD_i of the channel Ch_i and from a linear combination ICLD_R of the inter-channel level difference parameters ICLD_j (j ≠ i) of the remaining channels.
Here, ICLD_i and ICLD_j may each be related to a reference channel Ch_ref. In further embodiments, the inter-channel level difference parameters ICLD_i and ICLD_j may also be related to any other channel of the multi-channel audio signal (Ch_1 … Ch_N) serving as the reference channel Ch_ref. This eventually leads to the same result for the channel level difference (CLD_i) or the parameter σ_i.
According to further embodiments, the inter-channel relations 335 of Fig. 3b may also be derived by operating on different or all pairs Ch_i, Ch_j of input channels of the multi-channel audio signal (Ch_1 … Ch_N). In this case, pairwise calculated inter-channel coherence parameters ICC_{i,j}, channel level differences (CLD_{i,j}) or parameters σ_{i,j} (or ICLD_{i,j}) are obtained, where the indices (i, j) denote a certain pair of channels Ch_i and Ch_j.
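Fig. 4 shows a block diagram of an embodiment 400 of a direct/ambience extractor 420 that includes downmixing of the estimated level information 113. The embodiment of Fig. 4 essentially comprises the same blocks as the embodiment of Fig. 1. Therefore, identical blocks having similar implementations and/or functions are denoted by the same reference numerals. However, the direct/ambience extractor 420 of Fig. 4, which corresponds to the direct/ambience extractor 120 of Fig. 1, is configured to downmix the estimated level information 113 of the direct portion or the ambient portion of the multi-channel audio signal to obtain downmixed level information of the direct portion or the ambient portion, and to extract the direct signal portion 125-1 or the ambient signal portion 125-2 from the downmix signal 115 based on the downmixed level information. As shown in Fig. 4, the spatial parametric information 105 can, for example, be derived from the multi-channel audio signal 101 (Ch_1 … Ch_N) of Fig. 1 and may comprise the inter-channel relations 335 of Ch_1 … Ch_N introduced in Fig. 3b. The spatial parametric information 105 of Fig. 4 furthermore comprises downmixing information 410 to be fed into the direct/ambience extractor 420. In embodiments, the downmixing information 410 may characterize the downmix of an original multi-channel audio signal (e.g. the multi-channel audio signal 101 of Fig. 1) into the downmix signal 115. The downmixing may, for example, be performed by a downmixer (not shown) operating in any coding domain, e.g. in the time domain or in the frequency domain.
According to further embodiments, the direct/ambience extractor 420 is also configured to perform the downmix of the estimated level information 113 of the direct portion or the ambient portion of the multi-channel audio signal 101 by combining the estimated level information of the direct portion with a coherent sum and the estimated level information of the ambient portion with an incoherent sum.
It should be noted that the estimated level information may represent energy levels or power levels of the direct portion or the ambient portion, respectively.
More specifically, the downmix of the energies of the estimated direct/ambient parts (i.e. the level information 113) may be performed by assuming either full incoherence or full coherence between the channels. The following two formulas apply to downmixing based on an incoherent sum and a coherent sum, respectively.
For incoherent signals, the downmixed energy or downmixed level information can be calculated by:
For coherent signals, the downmixed energy or downmixed level information can be calculated by:
Here, g is the downmix gain, which can be obtained from the downmixing information, and E(Ch_i) denotes the energy of the direct/ambient part of a channel Ch_i of the multi-channel audio signal. As a typical example of an incoherent downmix, when 5.1 channels are downmixed to two channels, the energy of the left downmix may be:
$$E_{L\_DMX} = E_{Left} + E_{Left\_surround} + 0.5 \cdot E_{Center}$$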
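The two downmix formulas are not reproduced in this text. A minimal sketch, assuming the usual incoherent (energy-wise) and coherent (amplitude-wise) sums with one downmix gain g_i per channel, is

$$E_{DMX}^{incoh} = \sum_i g_i^2\, E(Ch_i), \qquad E_{DMX}^{coh} = \Big( \sum_i g_i \sqrt{E(Ch_i)} \Big)^2 .$$

The incoherent form agrees with the 5.1 example above if the center channel enters the left downmix with a gain of 1/sqrt(2), since its square is 0.5. The function names, gains and energy values below are illustrative assumptions.

```python
import math

def downmix_energy_incoherent(gains, energies):
    """Energies add up: sum_i g_i^2 * E(Ch_i)."""
    return sum(g * g * e for g, e in zip(gains, energies))

def downmix_energy_coherent(gains, energies):
    """Amplitudes add up: (sum_i g_i * sqrt(E(Ch_i)))^2."""
    return sum(g * math.sqrt(e) for g, e in zip(gains, energies)) ** 2

# Left downmix channel of a 5.1-to-stereo downmix (hypothetical values).
gains = [1.0, 1.0, math.sqrt(0.5)]   # left, left surround, center
energies = [4.0, 1.0, 2.0]           # made-up per-channel energies

e_left_incoherent = downmix_energy_incoherent(gains, energies)  # 4 + 1 + 0.5 * 2
e_left_coherent = downmix_energy_coherent(gains, energies)      # (2 + 1 + 1)^2
```

For fully coherent channels the result is never smaller than the incoherent sum when all gains are non-negative, which is why the direct portion (assumed coherent) and the ambient portion (assumed incoherent) are downmixed differently.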
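Fig. 5 shows a further embodiment of a direct/ambience extractor 520 that applies gain parameters g_D, g_A to the downmix signal 115. The direct/ambience extractor 520 of Fig. 5 may correspond to the direct/ambience extractor 420 of Fig. 4. First, estimated level information of a direct portion 545-1 or an ambient portion 545-2 may be received from a direct/ambience estimator as described above. The received level information 545-1, 545-2 may be combined/downmixed in a step 550 to obtain downmixed level information of the direct portion 555-1 or the ambient portion 555-2, respectively. Then, in a step 560, the gain parameters g_D 565-1 and g_A 565-2 may be derived from the downmixed level information 555-1, 555-2 for the direct portion or the ambient portion, respectively. Finally, the direct/ambience extractor 520 may apply the derived gain parameters 565-1, 565-2 to the downmix signal 115 (step 570), so that the direct signal portion 125-1 or the ambient signal portion 125-2 is obtained.
Here, it should be noted that in the embodiments of Figs. 1, 4 and 5, the downmix signal 115 may consist of a plurality of downmix channels (Ch_1 … Ch_M) present at the inputs of the direct/ambience extractors 120, 420, 520, respectively.
In further embodiments, the direct/ambience extractor 520 is configured to determine a direct-to-total (DTT) or an ambient-to-total (ATT) energy ratio from the downmixed level information 555-1, 555-2 of the direct portion or the ambient portion, and to use extraction parameters based on the determined DTT or ATT energy ratio as the gain parameters 565-1, 565-2.
In yet further embodiments, the direct/ambience extractor 520 is configured to multiply the downmix signal 115 by a first extraction parameter sqrt(DTT) to obtain the direct signal portion 125-1, and by a second extraction parameter sqrt(ATT) to obtain the ambient signal portion 125-2. Here, the downmix signal 115 may correspond to the mono downmix signal 215 shown in the embodiment of Fig. 2 ("mono downmix case").
In the mono downmix case, the ambience extraction can be performed by applying sqrt(ATT) and sqrt(DTT). The same approach is, however, also valid for multi-channel downmix signals, by applying sqrt(ATT_i) and sqrt(DTT_i) to each channel Ch_i.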
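The gain-based extraction described above can be sketched as follows for one subband of a mono downmix. The sketch assumes that the total energy is the sum of the downmixed direct and ambient energies, so that DTT + ATT = 1; the function names are hypothetical and not taken from the patent.

```python
import math

def extraction_gains(direct_energy, ambient_energy):
    """Return (sqrt(DTT), sqrt(ATT)) from downmixed direct/ambient energies.

    Assumption: total energy = direct energy + ambient energy.
    """
    total = direct_energy + ambient_energy
    if total <= 0.0:
        return 0.0, 0.0              # silent band: nothing to extract
    dtt = direct_energy / total      # direct-to-total energy ratio
    att = ambient_energy / total     # ambient-to-total energy ratio
    return math.sqrt(dtt), math.sqrt(att)

def extract_direct_ambience(downmix_band, direct_energy, ambient_energy):
    """Scale the subband samples of a mono downmix by g_D and g_A."""
    g_d, g_a = extraction_gains(direct_energy, ambient_energy)
    direct = [g_d * x for x in downmix_band]
    ambient = [g_a * x for x in downmix_band]
    return direct, ambient
```

Because the squared gains sum to one, the energies of the two extracted signals add up to the energy of the downmix band. For a multi-channel downmix, the same functions would be applied per channel with the ratios DTT_i and ATT_i.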
According to further embodiments, in case the downmix signal 115 comprises a plurality of channels ("multi-channel downmix case"), the direct/ambience extractor 520 may be configured to apply a first plurality of extraction parameters, e.g. sqrt(DTT_i), to the downmix signal 115 to obtain the direct signal portion 125-1, and a second plurality of extraction parameters, e.g. sqrt(ATT_i), to the downmix signal 115 to obtain the ambient signal portion 125-2. Here, the first and the second plurality of extraction parameters may each constitute a diagonal matrix.
In general, the direct/ambience extractor 120, 420, 520 may also be configured to extract the direct signal portion 125-1 or the ambient signal portion 125-2 by applying a square M×M extraction matrix to the downmix signal 115, wherein the size (M) of the square M×M extraction matrix corresponds to the number (M) of downmix channels (Ch_1 … Ch_M).
Applying the ambience extraction can therefore be described as applying a square M×M extraction matrix, where M is the number of downmix channels (Ch_1 … Ch_M). This covers all possible ways of manipulating the input signals to obtain the direct/ambience output, including the relatively simple approach based on the sqrt(ATT_i) and sqrt(DTT_i) parameters, which form the main elements of a square M×M extraction matrix configured as a diagonal matrix, and the LMS crossmixing approach, in which the matrix is configured as a full matrix. The latter is described below. It should be noted that the above approach of applying an M×M extraction matrix covers any number of channels, including one.
According to further embodiments, the extraction matrix need not be a square M×M matrix, since there may be fewer output channels. The extraction matrix then has a reduced number of rows. An example would be extracting a single direct signal instead of M.
Nor is it always necessary to take all M downmix channels as inputs, corresponding to M columns of the extraction matrix. Depending on the application, it may not be necessary to have all channels as input signals.
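The matrix view above can be illustrated with a small sketch that applies a K×M extraction matrix to the M downmix samples of one time/frequency tile. A diagonal matrix reproduces the per-channel gain approach, and a matrix with fewer rows yields fewer output channels. The helper names and the numeric values are illustrative assumptions.

```python
def diagonal_extraction_matrix(gains):
    """Square M x M matrix with the per-channel gains on the main diagonal."""
    m = len(gains)
    return [[gains[r] if r == c else 0.0 for c in range(m)] for r in range(m)]

def apply_extraction_matrix(matrix, downmix_tile):
    """Multiply a K x M extraction matrix by the M downmix samples of one tile."""
    return [sum(w * x for w, x in zip(row, downmix_tile)) for row in matrix]

downmix_tile = [2.0, 3.0]                      # M = 2 downmix channels

# Diagonal case: each output depends only on its own input channel.
per_channel = apply_extraction_matrix(
    diagonal_extraction_matrix([0.5, 2.0]), downmix_tile)

# Reduced number of rows: a single output mixed from both inputs (K = 1).
single_output = apply_extraction_matrix([[0.5, 0.5]], downmix_tile)
```

A full (non-diagonal) square matrix corresponds to crossmixing, where every output channel may draw on every downmix channel.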
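Fig. 6 shows a block diagram of a further embodiment 600 of a direct/ambience extractor 620 based on an LMS (least-mean-square) solution with channel crossmixing. The direct/ambience extractor 620 of Fig. 6 may correspond to the direct/ambience extractor 120 of Fig. 1. In the embodiment of Fig. 6, identical blocks having implementations and/or functions similar to those of the embodiment of Fig. 1 are therefore denoted by the same reference numerals. However, the downmix signal 615 of Fig. 6, which corresponds to the downmix signal 115 of Fig. 1, comprises a plurality 617 of downmix channels Ch_1 … Ch_M, wherein the number of downmix channels (M) is smaller than the number (N) of channels Ch_1 … Ch_N of the multi-channel audio signal 101, i.e. M < N. In particular, the direct/ambience extractor 620 is configured to extract the direct signal portion 125-1 or the ambient signal portion 125-2 by a least-mean-square (LMS) solution with channel crossmixing, the LMS solution not requiring equal ambience levels. Such an LMS solution, which does not require equal ambience levels and can be extended to any number of channels, is provided in the following. The LMS solution is not mandatory, but represents a more precise alternative to the approach described above.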
The symbols used in the LMS solution of the crossmixing weights for direct/ambience extraction are:
$Ch_i$: channel i
$a_i$: direct sound gain in channel i
$D$ and $\hat{D}$: direct part of the sound and its estimate
$A_i$ and $\hat{A}_i$: ambient part of channel i and its estimate
$P_X = E[XX^{*}]$: estimated energy of X
$E[\,\cdot\,]$: expectation
$E_{\hat{X}}$: estimation error of $\hat{X}$
$w_{\hat{D}i}$: LMS crossmixing weight of channel i to the direct part
$w_{\hat{A}_i n}$: LMS crossmixing weight of channel n to the ambient part of channel i
在本内文中,须注意,LMS解的导算可基于多声道音频信号的各个声道的频谱表示型态,其表示频带中的每项函数。In this context, it is noted that the derivation of the LMS solution may be based on the spectral representation of each channel of a multi-channel audio signal, which represents each function in the frequency band.
信号模型被表示为The signal model is expressed as
Chi=aiD+Ai Ch i =a i D+A i
导算首先处理a)直接部分,然后,b)周围部分。最后,导算出权值的解,并描述权值的标准化方法。The derivation first deals with a) the immediate part, then, b) the surrounding part. Finally, a solution to the weights is derived and a method for normalizing the weights is described.
a)直接部分a) direct part
权值直接部分的估算为The direct part of the weight is estimated as
估算误差读取Estimate Error Read
为了获得LMS解,发明人要求与输入信号正交To obtain the LMS solution, the inventors require Orthogonal to the input signal
E[EσChi]=0,对于全部kE[E σ Ch i ]=0, for all k
呈矩阵形式,前述关系式读成In matrix form, the aforementioned relation is read as
b) Ambient part
Starting from the same signal model, the ambient part is estimated from the weights according to:
The estimation error is:
and the orthogonality condition is:
In matrix form, the above relation reads:
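The ambient-part relations can be written out in the same way. Again, this is a reconstruction under the assumptions that the gains $a_n$ are real and that $D$ and all $A_n$ are mutually uncorrelated; $\mathbf{A}$ denotes the input covariance matrix $P_D\,\mathbf{a}\mathbf{a}^{T} + \operatorname{diag}(P_{A_1},\dots,P_{A_N})$ and $\mathbf{e}_i$ the i-th unit vector, both introduced here for illustration.

```latex
\begin{gather*}
\hat{A}_i = \sum_{n=1}^{N} w_{\hat{A}_{i,n}}\,Ch_n, \qquad E_{\hat{A}_i} = A_i - \hat{A}_i,\\
E[E_{\hat{A}_i}\,Ch_k] = 0 \;\Rightarrow\;
\sum_{n=1}^{N} w_{\hat{A}_{i,n}}\bigl(a_n a_k P_D + \delta_{nk} P_{A_n}\bigr) = \delta_{ik}\,P_{A_i},
\quad k = 1,\dots,N,\\
\mathbf{A}\,\mathbf{w}_{\hat{A}_i} = P_{A_i}\,\mathbf{e}_i .
\end{gather*}
```

Only the right-hand side differs from the direct part, because $E[A_i\,Ch_k]$ is non-zero only for $k = i$.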
Solution for the weights
The weights can be solved by inverting the matrix A, which is identical for the calculation of the direct part and of the ambient part. In the stereo case, the solution is:
Here, div is the divisor $a_2 a_2 P_D P_{A_1} + a_1 a_1 P_D P_{A_2} + P_{A_1} P_{A_2}$.
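The stereo case can be checked numerically. The sketch below is an illustration rather than the patent's own formula listing: it assumes real gains and mutually uncorrelated D, A_1 and A_2, builds the 2x2 input covariance matrix from the signal model, and solves it explicitly. The function name `lms_stereo_weights` is hypothetical. Under these assumptions the determinant of the matrix equals the divisor div given above.

```python
def lms_stereo_weights(a1, a2, p_d, p_a1, p_a2):
    """Solve the 2x2 LMS normal equations for the model Ch_i = a_i*D + A_i.

    Assumes real gains and mutually uncorrelated D, A_1, A_2 (an assumption
    of this sketch). Returns (div, w_direct, w_amb1, w_amb2).
    """
    # Covariance matrix of the two input channels.
    m11 = a1 * a1 * p_d + p_a1
    m12 = a1 * a2 * p_d
    m22 = a2 * a2 * p_d + p_a2
    # Determinant; expands to a2*a2*P_D*P_A1 + a1*a1*P_D*P_A2 + P_A1*P_A2.
    div = m11 * m22 - m12 * m12

    def solve(r1, r2):
        # Explicit inverse of the symmetric 2x2 matrix applied to (r1, r2).
        return ((m22 * r1 - m12 * r2) / div, (m11 * r2 - m12 * r1) / div)

    w_direct = solve(a1 * p_d, a2 * p_d)  # right-hand side E[D * Ch_k]
    w_amb1 = solve(p_a1, 0.0)             # right-hand side E[A_1 * Ch_k]
    w_amb2 = solve(0.0, p_a2)             # right-hand side E[A_2 * Ch_k]
    return div, w_direct, w_amb1, w_amb2
```

With these assumptions the direct weights reduce to a_1·P_D·P_A2/div and a_2·P_D·P_A1/div, so the channel with the weaker ambience contributes more to the direct estimate.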
Normalization of the weights
The weights are those of the LMS solution, but because the energy levels must be preserved, the weights are normalized. This also makes the division by the term div in the above formulas unnecessary. The normalization is performed by ensuring that the energies of the output direct and ambient channels are $P_D$ and $P_{A_i}$, where i is the channel index.
This is straightforward, assuming that the inter-channel coherence, the mixing factors and the channel energies are known. For simplicity, consider the two-channel case, and in particular one pair of weights, $w_{\hat{A}_{1,1}}$ and $w_{\hat{A}_{1,2}}$, which are the gains that produce the first ambient channel from the first and second input channels. The steps are as follows:
Step 1: Calculate the energy of the output signal (where the coherent part adds up amplitude-wise and the incoherent part adds up energy-wise):
Step 2: Calculate the normalization gain factor:
and apply the result to the crossmixing weight factors $w_{\hat{A}_{1,1}}$ and $w_{\hat{A}_{1,2}}$. In step 1, the absolute value and the sign operator of the ICC are included to also cover the case in which the input channels are negatively coherent. The remaining weight factors are normalized in the same way.
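The two normalization steps can be sketched as follows. This is one reading of the verbal description, not the patent's exact formula: the coherent share |ICC| of the output is summed amplitude-wise (with the sign of the ICC), the remaining share is summed energy-wise, and the gain is the square root of the target-to-output energy ratio. The function and argument names are hypothetical.

```python
import math


def normalize_ambience_weights(w1, w2, p_ch1, p_ch2, icc, p_target):
    """Scale the weight pair (w1, w2) so that w1*Ch1 + w2*Ch2 has energy p_target.

    p_ch1, p_ch2: energies of the two input channels; icc: their coherence.
    """
    sign = 1.0 if icc >= 0.0 else -1.0
    # Step 1: output energy. Coherent part adds amplitude-wise,
    # incoherent part adds energy-wise.
    coherent = abs(icc) * (w1 * math.sqrt(p_ch1) + sign * w2 * math.sqrt(p_ch2)) ** 2
    incoherent = (1.0 - abs(icc)) * (w1 * w1 * p_ch1 + w2 * w2 * p_ch2)
    p_out = coherent + incoherent
    if p_out <= 0.0:
        return 0.0, 0.0  # nothing to scale
    # Step 2: normalization gain, applied to both weights.
    gain = math.sqrt(p_target / p_out)
    return gain * w1, gain * w2
```

Because the gain depends only on an energy ratio, a common factor such as 1/div in both weights cancels, which is consistent with the remark that the division becomes unnecessary.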
More specifically, referring to the above description, the direct/ambience extractor 620 may be configured to derive the LMS solution by assuming a stable multi-channel signal model, so that the LMS solution is not restricted to a stereo downmix signal.
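Fig. 7a shows a block diagram of an embodiment 700 of a direct/ambience estimator 710 that is based on a stereo ambience estimation formula. The direct/ambience estimator 710 of Fig. 7a may correspond to the direct/ambience estimator 110 of Fig. 1. More specifically, the direct/ambience estimator 710 of Fig. 7a is configured to apply, for each channel (Ch_i) of the multi-channel audio signal 101, a stereo ambience estimation formula that uses the spatial parametric information 105, where the stereo ambience estimation formula can be expressed as the functional dependence
$DTT_i = f_{DTT}[\sigma_i(Ch_i, R),\ ICC_i(Ch_i, R)]$
$ATT_i = 1 - DTT_i$
which explicitly shows the dependence on the channel level difference (CLD_i), or parameter σ_i, of channel Ch_i and on the inter-channel coherence (ICC_i) parameter. As shown in Fig. 7a, the spatial parametric information 105 is fed to the direct/ambience estimator 710 and may comprise the inter-channel relation parameters ICC_i and σ_i for each channel Ch_i. After the stereo ambience estimation formula has been applied by the direct/ambience estimator 710, the direct-to-total (DTT_i) or ambience-to-total (ATT_i) energy ratio is obtained at its output 715. Note that the stereo ambience estimation formula used for estimating the respective DTT or ATT energy ratios is not based on the condition of equal ambience.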
More specifically, the direct/ambience ratio estimation can be performed such that the ratio of the direct energy of a channel to the total energy of that channel (DTT) is formulated as:
Here, Ch is the channel under examination and R is the linear combination of the remaining channels. <> denotes the time average. This formula follows when the ambience levels of the channel and of the linear combination of the remaining channels are assumed to be equal and their coherence is assumed to be zero.
Fig. 7b shows a graph 750 of an exemplary DTT (direct-to-total) energy ratio 760 as a function of the inter-channel coherence parameter ICC 770. In the embodiment of Fig. 7b, the channel level difference (CLD), or parameter σ, is set for example to 1 (σ = 1), so that the level P(Ch_i) of channel Ch_i and the level P(R) of the linear combination R of the remaining channels are equal. In this case, the DTT energy ratio 760 is linearly proportional to the ICC parameter, as indicated by the straight line 775 labeled DTT~ICC. As Fig. 7b shows, for ICC = 0, which may correspond to a fully decorrelated inter-channel relation, the DTT energy ratio 760 is 0, which may correspond to a fully ambient situation (case "R1"). For ICC = 1, which may correspond to a fully coherent inter-channel relation, the DTT energy ratio 760 is 1, which may correspond to a fully direct situation (case "R2"). Hence, relative to the total energy of the channel, there is essentially no direct energy in the channel in case R1 and essentially no ambient energy in case R2.
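The behavior in Fig. 7b can be reproduced with a small sketch. The closed form below is derived here from the stated assumptions (the channel and the rest share a coherent direct sound, have equal ambience energy, and their ambience is mutually incoherent); it is not necessarily the formula printed in the patent. In particular, `sigma` is assumed to be the energy ratio P(Ch)/P(R), and the function name is hypothetical.

```python
import math


def dtt_from_sigma_icc(sigma, icc):
    """Direct-to-total energy ratio of channel Ch.

    sigma: energy ratio P(Ch)/P(R) (assumed definition), must be > 0.
    icc:   inter-channel coherence between Ch and R, in [-1, 1].
    Solves x^2 - (1 - 1/sigma) x - icc^2/sigma = 0 for the direct share x.
    """
    r = 1.0 - 1.0 / sigma
    x = 0.5 * (r + math.sqrt(r * r + 4.0 * icc * icc / sigma))
    return min(1.0, max(0.0, x))  # guard against rounding outside [0, 1]


def att_from_sigma_icc(sigma, icc):
    """Ambience-to-total ratio, ATT = 1 - DTT."""
    return 1.0 - dtt_from_sigma_icc(sigma, icc)
```

For sigma = 1 the expression collapses to DTT = |ICC|, which matches the straight line 775 between the cases R1 (ICC = 0) and R2 (ICC = 1).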
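Fig. 8 shows a block diagram of an encoder/decoder system 800 according to further embodiments of the present invention. On the decoder side of the encoder/decoder system 800, an embodiment of a decoder 820 is shown, which may correspond to the apparatus 100 of Fig. 1. Because the embodiments of Fig. 1 and Fig. 8 are similar, identical blocks having similar implementations and/or functions in the two embodiments are denoted by the same reference numerals. As shown in the embodiment of Fig. 8, the direct/ambience extractor 120 may operate on a downmix signal 115 having a plurality of downmix channels Ch_1…Ch_M. The direct/ambience estimator 110 of Fig. 8 may further be configured to receive (optionally) at least two downmix channels 825 of the downmix signal 115, so that the level information 113 of the direct part or the ambient part of the multi-channel audio signal 101 is estimated based on the at least two received downmix channels 825 in addition to the spatial parametric information 105. Finally, after extraction by the direct/ambience extractor 120, the direct signal part 125-1 or the ambient signal part 125-2 is obtained.
On the encoder side of the encoder/decoder system 800, an embodiment of an encoder 810 is shown, which may comprise a downmixer 815 for downmixing the multi-channel audio signal (Ch_1…Ch_N) into the downmix signal 115 having the plurality of downmix channels Ch_1…Ch_M, the number of channels being reduced from N to M. The downmixer 815 may also be configured to output the spatial parametric information 105 by calculating inter-channel relations from the multi-channel audio signal 101. In the encoder/decoder system 800 of Fig. 8, the downmix signal 115 and the spatial parametric information 105 may be transmitted from the encoder 810 to the decoder 820. Here, the encoder 810 may derive an encoded signal based on the downmix signal 115 and the spatial parametric information 105 for transmission from the encoder side to the decoder side. Moreover, the spatial parametric information 105 is based on channel information of the multi-channel audio signal 101.
On the one hand, the inter-channel relation parameters σ_i(Ch_i, R) and ICC_i(Ch_i, R) may be calculated in the encoder 810 between channel Ch_i and the linear combination R of the remaining channels and transmitted within the encoded signal. The decoder 820 may in turn receive the encoded signal and operate on the transmitted inter-channel relation parameters σ_i(Ch_i, R) and ICC_i(Ch_i, R).
On the other hand, the encoder 810 may also be configured to calculate, for transmission, the inter-channel coherence parameters ICC_i,j between pairs of different channels (Ch_i, Ch_j). In this case, the decoder 820 should be able to derive the parameters ICC_i(Ch_i, R) between channel Ch_i and the linear combination R of the remaining channels from the transmitted pairwise-calculated ICC_i,j(Ch_i, Ch_j), so that the corresponding embodiments described above can be realized. Note in this context that the decoder 820 cannot reconstruct the parameters ICC_i(Ch_i, R) from knowledge of the downmix signal 115 alone.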
In embodiments, the transmitted spatial parameters are not limited to pairwise channel comparisons.
For example, the most typical MPS case has two downmix channels. The first set of spatial parameters in MPS decoding turns the two channels into three: center, left and right. The parameter set guiding this mapping comprises the center prediction coefficients (CPC) and an ICC parameter specific to this two-to-three configuration.
The second set of spatial parameters splits each of these into two: the side channels are split into the corresponding front and rear channels, and the center channel is split into the center and LFE channels. This mapping is governed by the ICC and CLD parameters introduced earlier.
It is not practical to work out calculation rules for every class of downmix configuration and every kind of spatial parameter. It is practical, however, to follow the downmixing steps virtually. Because it is known how two channels become three and how three become six, an input/output relation can ultimately be found that describes how the two input channels are routed to the six output channels. The outputs are only linear combinations of the downmix channels plus linear combinations of their decorrelated versions. It is not necessary to actually decode the output signal and measure it; knowing this "decoding matrix", the ICC and CLD parameters between any channels or combinations of channels can be calculated in the parameter domain in a computationally efficient way.
Independently of the downmix and multi-channel signal configurations, each output of the decoded signal is a linear combination of the downmix signals plus a linear combination of their respective decorrelated versions:
where the operator D[] corresponds to a decorrelator, i.e. a process that produces an incoherent copy of the input signal. The factors a and b are known because they can be derived directly from the parametric side information: by definition, the parametric information instructs the decoder how to form the multi-channel output signal from the downmix signal. The above formula can be simplified to:
because all decorrelated parts can be combined for the purpose of energy/coherence comparison. The energy of D is known because the factors b in the first formula are also known.
Given this, any kind of coherence and energy comparison can be made between output channels or between different linear combinations of output channels. In the simple example of two downmix channels and a set of output channels in which, say, channels 3 and 5 are compared with each other, σ is calculated as follows:
where E[] is the expectation (in practice: averaging) operator. Both terms can be formulated as follows:
All of the above parameters are known or measurable from the downmix signal. The cross terms E[Ch_dmx*D] are zero by definition and therefore do not appear in the lower line of the formula. Similarly, the coherence formula is:
Again, since all parts of the above formula are linear combinations of the input signals plus decorrelated signals, the solution is directly obtainable.
The above example compares two output channels, but comparisons between linear combinations of output channels can be made in the same way, as in the exemplary procedure described later.
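The parameter-domain comparison described above can be sketched in a few lines. The sketch assumes that each output channel is given by its gains on the two downmix channels and on each decorrelator output, that the decorrelator outputs are uncorrelated with the downmix channels and with each other, and that the coherence is taken as the normalized real part of the cross term. The gain layout and the function names are illustrative, not taken from the MPEG Surround specification.

```python
import math


def cross_term(g1, g2, e_ll, e_rr, e_lr, p_dec):
    """E[Ch1 * conj(Ch2)] for two output channels described only by gains.

    g = (a, b, d): gain on L_dmx, gain on R_dmx, list of gains on the
    decorrelator outputs. e_ll, e_rr: downmix energies; e_lr = E[L * conj(R)];
    p_dec: energies of the decorrelator outputs.
    """
    a1, b1, d1 = complex(g1[0]), complex(g1[1]), g1[2]
    a2, b2, d2 = complex(g2[0]), complex(g2[1]), g2[2]
    e_lr = complex(e_lr)
    x = (a1 * a2.conjugate() * e_ll
         + b1 * b2.conjugate() * e_rr
         + a1 * b2.conjugate() * e_lr
         + b1 * a2.conjugate() * e_lr.conjugate())
    # Decorrelated parts: cross terms with the downmix are zero by definition.
    for u, v, p in zip(d1, d2, p_dec):
        x += complex(u) * complex(v).conjugate() * p
    return x


def level_and_coherence(g1, g2, e_ll, e_rr, e_lr, p_dec):
    """Return (level difference in dB, coherence) between two output channels."""
    e1 = cross_term(g1, g1, e_ll, e_rr, e_lr, p_dec).real
    e2 = cross_term(g2, g2, e_ll, e_rr, e_lr, p_dec).real
    icc = cross_term(g1, g2, e_ll, e_rr, e_lr, p_dec).real / math.sqrt(e1 * e2)
    return 10.0 * math.log10(e1 / e2), icc
```

A linear combination of output channels is handled by adding the gain tuples of its members before calling `level_and_coherence`, so no output signal ever has to be synthesized.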
Summarizing the previously described embodiments, the presented technique/concept comprises the following steps:
1. Obtain the inter-channel relations (coherence, level) of an "original" set of channels, which may be larger in number than the downmix channels.
2. Estimate the ambient and direct energies of the "original" set of channels.
3. Downmix the ambient and direct energies of the "original" set of channels to the smaller number of channels.
4. Use the downmixed energies to extract the direct and ambient signals in the provided downmix channels by applying gain factors or a gain matrix.
The use of the spatial parametric side information is best explained and summarized by the embodiment of Fig. 2. In the embodiment of Fig. 2, there is a parametric stereo stream comprising a single audio channel and spatial side information about the inter-channel differences (coherence, level) of the stereo sound it represents. Since the inter-channel differences are known, the stereo ambience estimation formula above can be applied to them to obtain the direct and ambient energies of the original set of channels. The channel energies can then be "downmixed" by adding the direct energies together (with coherent summation) and the ambient energies together (with incoherent summation), which yields the direct-to-total and ambience-to-total energy ratios of the single downmix channel.
Referring to the embodiment of Fig. 2, the spatial parametric information essentially comprises inter-channel coherence parameters (ICC_L, ICC_R) and channel level difference parameters (CLD_L, CLD_R), corresponding to the left (L) and right (R) channels of the parametric stereo audio signal, respectively. Note that the inter-channel coherence parameters ICC_L and ICC_R are equal (ICC_L = ICC_R), while the channel level difference parameters CLD_L and CLD_R are related by CLD_L = -CLD_R. Correspondingly, because CLD_L and CLD_R are typically the decibel values of the parameters σ_L and σ_R, respectively, the parameters σ_L and σ_R of the left (L) and right (R) channels are related by σ_L = 1/σ_R. These inter-channel difference parameters can readily be used to calculate the respective direct-to-total (DTT_L, DTT_R) and ambience-to-total (ATT_L, ATT_R) energy ratios of the two channels (L, R) based on the stereo ambience estimation formula. In the stereo ambience estimation formula, the direct-to-total and ambience-to-total energy ratios of the left channel (DTT_L, ATT_L) depend on the inter-channel difference parameters of the left channel L (CLD_L, ICC_L), while those of the right channel (DTT_R, ATT_R) depend on the inter-channel difference parameters of the right channel R (CLD_R, ICC_R). Furthermore, the energies (E_L, E_R) of the two channels L, R of the parametric stereo audio signal can be derived from the channel level difference parameters (CLD_L, CLD_R) of the left and right channels, respectively. Here, the energy of the left channel (E_L) is obtained by applying the channel level difference parameter of the left channel (CLD_L) to the mono downmix signal, and the energy of the right channel (E_R) is obtained by applying the channel level difference parameter of the right channel (CLD_R) to the mono downmix signal. Multiplying the energies (E_L, E_R) of the two channels by the corresponding DTT_L-, DTT_R-, ATT_L- and ATT_R-based parameters then yields the direct energies (E_DL, E_DR) and ambient energies (E_AL, E_AR) of the two channels (L, R). The direct energies (E_DL, E_DR) can then be combined/added using a coherent downmixing rule to obtain the downmixed energy (E_D,mono) of the direct part of the mono downmix signal, while the ambient energies (E_AL, E_AR) can be combined/added using an incoherent downmixing rule to obtain the downmixed energy (E_A,mono) of the ambient part of the mono downmix signal. Relating the downmixed energies (E_D,mono, E_A,mono) of the direct and ambient signal parts to the total energy (E_mono) of the mono downmix signal then gives the direct-to-total (DTT_mono) and ambience-to-total (ATT_mono) energy ratios of the mono downmix signal. Finally, based on these DTT_mono and ATT_mono energy ratios, the direct signal part or the ambient signal part can essentially be extracted from the mono downmix signal.
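The chain from (CLD, ICC) to DTT_mono can be sketched as follows. Several choices here are assumptions made for illustration and are not stated in the text: the CLD is converted to an energy ratio σ = 10^(CLD/10), the mono energy is split as E_L + E_R = E_mono, the per-channel DTT uses a closed form derived from the equal-ambience, zero-ambience-coherence model, and the total energy used for the final ratio is taken as E_D,mono + E_A,mono. All function names are hypothetical.

```python
import math


def _dtt(sigma, icc):
    # Direct-to-total ratio from an energy ratio sigma and a coherence icc
    # (closed form assumed for this sketch, clamped against rounding).
    r = 1.0 - 1.0 / sigma
    x = 0.5 * (r + math.sqrt(r * r + 4.0 * icc * icc / sigma))
    return min(1.0, max(0.0, x))


def mono_direct_ambient_ratios(cld_db, icc, e_mono=1.0):
    """Return (DTT_mono, ATT_mono) for one parameter band."""
    sigma_l = 10.0 ** (cld_db / 10.0)        # E_L / E_R, so sigma_R = 1 / sigma_L
    e_l = e_mono * sigma_l / (1.0 + sigma_l)
    e_r = e_mono / (1.0 + sigma_l)
    e_dl = _dtt(sigma_l, icc) * e_l          # direct energies per channel
    e_dr = _dtt(1.0 / sigma_l, icc) * e_r
    e_al, e_ar = e_l - e_dl, e_r - e_dr      # ambient energies per channel
    e_d_mono = (math.sqrt(e_dl) + math.sqrt(e_dr)) ** 2  # coherent (amplitude) sum
    e_a_mono = e_al + e_ar                               # incoherent (energy) sum
    total = e_d_mono + e_a_mono
    dtt_mono = e_d_mono / total if total > 0.0 else 0.0
    return dtt_mono, 1.0 - dtt_mono
```

For CLD = 0 dB and ICC = 0.5, each channel is half direct and half ambient, yet the mono ratio comes out at 2/3, because the coherent summation favors the direct part over the incoherently summed ambience.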
In audio reproduction, there is often a need to reproduce sound over headphones. Headphone listening has specific characteristics that make it very different from loudspeaker listening and from any natural sound environment. The audio is fed directly to the left and right ears. Audio content is typically produced for loudspeaker playback. The audio signals therefore do not contain the properties and cues that the human auditory system uses in spatial sound perception. This is the case unless binaural processing is introduced into the system.
Binaural processing can basically be described as a process that takes an input sound and modifies it so that it contains only those interaural and monaural properties that are perceptually correct (with respect to how the human auditory system processes spatial sound). Binaural processing is not a straightforward task, and existing state-of-the-art solutions are still suboptimal.
There are a large number of applications in which binaural processing for audio and movie playback is already included, such as media players and processing devices designed to transform multi-channel audio signals into their binaural counterpart for headphones. The typical approach is to use head-related transfer functions (HRTFs) to create virtual loudspeakers and to add a room effect to the signal. In theory, this would be equivalent to listening with loudspeakers in a specific room.
In practice, however, it has repeatedly been shown that this approach does not consistently satisfy listeners. There appears to be a compromise: good spatialization with this straightforward method comes at the cost of audio quality, such as undesirable changes in sound color or timbre, an annoying perception of the room effect, and loss of dynamics. Further problems include inaccurate localization (e.g. in-head localization, front-back confusion), lack of spatial distance of the sound sources, and interaural mismatch, i.e. an auditory sensation near the ears caused by wrong interaural cues.
Different listeners judge these problems very differently. The sensitivity also varies with the input material, such as music (strict quality criteria in terms of sound color), movies (less strict) and games (even less strict, but localization is important). There are also typically different design goals depending on the content.
Therefore, the following description deals with an approach that overcomes the above problems as successfully as possible, in order to maximize the average perceived overall quality.
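Fig. 9a shows a block diagram of an overview 900 of a binaural direct sound rendering device 910 according to further embodiments of the present invention. As shown in Fig. 9a, the binaural direct sound rendering device 910 is configured to process the direct signal part 125-1, which may be present at the output of the direct/ambience extractor 120 of the Fig. 1 embodiment, to obtain a first binaural output signal 915. The first binaural output signal 915 may comprise a left channel, denoted L, and a right channel, denoted R.
Here, the binaural direct sound rendering device 910 may be configured to feed the direct signal part 125-1 through head-related transfer functions (HRTFs) to obtain a transformed direct signal part. In addition, the binaural direct sound rendering device 910 may be configured to apply a room effect to the transformed direct signal part to finally obtain the first binaural output signal 915.
Fig. 9b shows a block diagram of details 905 of the binaural direct sound rendering device 910 of Fig. 9a. The binaural direct sound rendering device 910 may comprise an "HRTF transformer", indicated by block 912, and a room effect processing device (parallel reverb or simulation of early reflections), indicated by block 914. As shown in Fig. 9b, the HRTF transformer 912 and the room effect processing device 914 may operate on the direct signal part 125-1 by applying the head-related transfer functions (HRTFs) and the room effect in parallel, thereby obtaining the first binaural output signal 915.
More specifically, referring to Fig. 9b, this room effect processing may also provide an incoherent reverberated direct signal 919, which may be processed by a subsequent crossmixing filter 920 to adapt the signal to the interaural coherence of a diffuse sound field. Here, the combined outputs of the filter 920 and the HRTF transformer 912 constitute the first binaural output signal 915. According to further embodiments, the room effect processing of the direct sound may also be a parametric representation of early reflections.
Thus, in embodiments, the room effect is preferably applied in parallel with the HRTFs rather than in series (i.e. by applying the room effect after feeding the signal through the HRTFs). More specifically, only the sound that propagates directly from the source passes through, or is transformed by, the corresponding HRTFs. The indirect/reverberated sound can be approximated as entering the ears in a statistical fashion (by employing coherence control instead of HRTFs). A serial implementation is also possible, but the parallel approach is preferred.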
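Fig. 10a shows a block diagram of an overview 1000 of a binaural ambient sound rendering device 1010 according to further embodiments of the present invention. As shown in Fig. 10a, the binaural ambient sound rendering device 1010 may be configured to process the ambient signal part 125-2, which may be present at the output of the direct/ambience extractor 120 of the Fig. 1 embodiment, to obtain a second binaural output signal 1015. The second binaural output signal 1015 may comprise a left channel (L) and a right channel (R).
Fig. 10b shows a block diagram of details 1005 of the binaural ambient sound rendering device 1010 of Fig. 10a. As can be seen in Fig. 10b, the binaural ambient sound rendering device 1010 may be configured to apply a room effect, as indicated by the block 1012 labeled "room effect processing", to the ambient signal part 125-2, so that an incoherent reverberated ambient signal 1013 is obtained. In addition, the binaural ambient sound rendering device 1010 may be configured to process the incoherent reverberated ambient signal 1013 by applying a filter, such as the crossmixing filter indicated by block 1014, thereby providing the second binaural output signal 1015, which is adapted to the interaural coherence of a real diffuse sound field. The block 1012 labeled "room effect processing" may also be configured to produce the interaural coherence of a real diffuse sound field directly. In that case, block 1014 is not used.
According to further embodiments, the binaural ambient sound rendering device 1010 is configured to apply a room effect and/or a filter to the ambient signal part 125-2 to provide the second binaural output signal 1015, so that the second binaural output signal 1015 is adapted to the interaural coherence of a real diffuse sound field.
In the above embodiments, decorrelation and coherence control may be performed in two consecutive steps, but this is not a requirement. The same result can also be achieved in a single-step process without formulating an intermediate incoherent signal. Both methods are equally valid.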
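Fig. 11 shows a conceptual block diagram of an embodiment 1100 of binaural reproduction of the multi-channel audio signal 101. More specifically, the embodiment of Fig. 11 represents an apparatus for binaural reproduction of the multi-channel audio signal 101, comprising a first converter 1110 ("frequency transform"), a separator 1120 ("direct-ambience separation"), the binaural direct sound rendering device 910 ("direct source rendering"), the binaural ambient sound rendering device 1010 ("ambient sound rendering"), a combiner 1130, indicated by "+", and a second converter 1140 ("inverse frequency transform"). In particular, the first converter 1110 may be configured to convert the multi-channel audio signal 101 into a spectral representation 1115. The separator 1120 may be configured to extract the direct signal part 125-1 or the ambient signal part 125-2 from the spectral representation 1115. Here, the separator 1120 may correspond to the apparatus 100 of Fig. 1, in particular comprising the direct/ambience estimator 110 and the direct/ambience extractor 120 of the Fig. 1 embodiment. As explained above, the binaural direct sound rendering device 910 may operate on the direct signal part 125-1 to obtain the first binaural output signal 915. Correspondingly, the binaural ambient sound rendering device 1010 may operate on the ambient signal part 125-2 to obtain the second binaural output signal 1015. The combiner 1130 may be configured to combine the first binaural output signal 915 and the second binaural output signal 1015 to obtain a combined signal 1135. Finally, the second converter 1140 may be configured to convert the combined signal 1135 into the time domain to obtain a stereo output audio signal 1150 ("stereo output signal for headphones").
The frequency transform operations of the Fig. 11 embodiment illustrate that the system functions in a frequency transform domain, which is the natural domain for perceptual processing of spatial audio. If the system is used as an add-on to a system that already operates in a frequency transform domain, the system itself does not necessarily need frequency transforms.
The direct/ambience separation method described above can be subdivided into two parts. In the direct/ambience estimation part, the levels and/or ratios of the direct and ambient parts are estimated based on a combination of a signal model and the properties of the audio signal. In the direct/ambience extraction part, the known ratios and the input signal are used to form the direct and ambience output signals.
Finally, Fig. 12 shows an overall block diagram of an embodiment 1200 of direct/ambience estimation/extraction including the binaural reproduction use case. In particular, the embodiment 1200 of Fig. 12 may correspond to the embodiment 1100 of Fig. 11. In the embodiment 1200, however, details of the separator 1120 of Fig. 11 are shown, corresponding to blocks 110 and 120 of the Fig. 1 embodiment, which comprise the estimation/extraction process based on the spatial parametric information 105. Moreover, in contrast to the embodiment 1100 of Fig. 11, no transformation between different domains is shown in the embodiment 1200 of Fig. 12. The blocks of the embodiment 1200 also explicitly operate on the downmix signal 115, which may be derived from the multi-channel audio signal 101.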
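Fig. 13a shows a block diagram of an embodiment of an apparatus 1300 for extracting a direct/ambient signal from a mono downmix signal in a filter bank domain. As shown in Fig. 13a, the apparatus 1300 comprises an analysis filter bank 1310, a synthesis filter bank 1320 for the direct part, and a synthesis filter bank 1322 for the ambient part.
More specifically, the analysis filter bank 1310 of the apparatus 1300 may be implemented to perform a short-time Fourier transform (STFT) or may, for example, be configured as an analysis QMF filter bank, while the synthesis filter banks 1320, 1322 of the apparatus 1300 may be implemented to perform an inverse short-time Fourier transform (ISTFT) or may, for example, be configured as synthesis QMF filter banks.
The analysis filter bank 1310 is configured to receive a mono downmix signal 1315, which may correspond to the mono downmix signal 215 shown in the embodiment of Fig. 2, and to convert the mono downmix signal 1315 into a plurality 1311 of filter bank subbands. As can be seen in Fig. 13a, the plurality 1311 of filter bank subbands is connected to a plurality of direct/ambience extraction blocks 1350, 1352, respectively, which are configured to apply the DTT_mono- or ATT_mono-based parameters 1333, 1335 to the filter bank subbands.
As shown in Fig. 13b, the DTT_mono- or ATT_mono-based parameters 1333, 1335 may be supplied by a DTT_mono, ATT_mono calculator 1330. More specifically, the DTT_mono, ATT_mono calculator 1330 of Fig. 13b may be configured to calculate the DTT_mono and ATT_mono energy ratios, or to derive the DTT_mono- or ATT_mono-based parameters, from the provided inter-channel coherence and channel level difference parameters (ICC_L, CLD_L, ICC_R, CLD_R) corresponding to the left and right channels (L, R) of a parametric stereo audio signal (e.g. the parametric stereo audio signal 201 of Fig. 2), as described above. Here, for a single filter bank subband, the corresponding parameters 105 and DTT_mono- or ATT_mono-based parameters 1333, 1335 may be used. Note in this context that these parameters are not constant over frequency.
As a result of applying the DTT_mono- or ATT_mono-based parameters 1333, 1335, a plurality of modified filter bank subbands 1353, 1355 is obtained, respectively. The modified filter bank subbands 1353, 1355 are then fed into the synthesis filter banks 1320, 1322, respectively, which may be configured to synthesize them so as to obtain the direct signal part 1325-1 or the ambient signal part 1325-2 of the mono downmix signal 1315. Here, the direct signal part 1325-1 of Fig. 13a corresponds to the direct signal part 125-1 of Fig. 2, and the ambient signal part 1325-2 of Fig. 13a corresponds to the ambient signal part 125-2 of Fig. 2.
Referring to Fig. 13b, a direct/ambience extraction block 1380 of the plurality of direct/ambience extraction blocks 1350, 1352 of Fig. 13a comprises, in particular, the DTT_mono, ATT_mono calculator 1330 and a multiplier 1360. The multiplier 1360 may be configured to multiply a single filter bank (FB) subband 1301 of the plurality 1311 of filter bank subbands by the corresponding DTT_mono- or ATT_mono-based parameter 1333, 1335, so that a modified single filter bank subband 1365 of the plurality of modified filter bank subbands 1353, 1355 is obtained. More specifically, the direct/ambience extraction block 1380 is configured to apply the DTT_mono-based parameter if it belongs to the plurality of blocks 1350, and the ATT_mono-based parameter if it belongs to the plurality of blocks 1352. The modified single filter bank subband 1365 may then be supplied to the respective synthesis filter bank 1320, 1322 for the direct part or the ambient part.
According to embodiments, the spatial parameters and the derived parameters are provided at a frequency resolution that follows the critical bands of the human auditory system (e.g. 28 bands), which is normally lower than the resolution of the filter bank.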
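The per-subband multiplication can be sketched as below. The sketch assumes that the DTT_mono-based and ATT_mono-based parameters are the square roots of the respective energy ratios (the sqrt(DTT) and sqrt(1-DTT) approach), and that a lookup table maps each filter bank subband to one of the coarser parameter bands. The table `band_of_subband` and the function name are hypothetical.

```python
import math


def split_subbands(subbands, dtt_per_band, band_of_subband):
    """Scale each complex subband sample into a direct and an ambient sample.

    subbands:         complex samples of one time frame, one per subband
    dtt_per_band:     DTT_mono per parameter band (values in [0, 1])
    band_of_subband:  index of the parameter band that covers each subband
    """
    direct, ambient = [], []
    for k, x in enumerate(subbands):
        dtt = dtt_per_band[band_of_subband[k]]
        direct.append(math.sqrt(dtt) * x)           # DTT_mono-based gain
        ambient.append(math.sqrt(1.0 - dtt) * x)    # ATT_mono-based gain
    return direct, ambient
```

With square-root gains the energies of the two outputs add up to the energy of the input in every subband, so the split itself neither adds nor removes energy.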
Thus, the direct/ambience extraction according to the embodiment of Fig. 13a essentially operates on the individual subbands of the filter bank domain, based on inter-channel coherence and channel level difference parameters calculated subband by subband (which may correspond to the inter-channel relation parameters 335 of Fig. 3b).
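Fig. 14 shows a schematic illustration of an exemplary MPEG Surround decoding scheme 1400 according to a further embodiment of the present invention. More specifically, the Fig. 14 embodiment describes decoding from a stereo downmix signal 1410 to six output channels 1420. Here, the signals labeled "res" are residual signals, which are optional replacements for the decorrelated signals (obtained from the blocks labeled "D"). According to the Fig. 14 embodiment, the spatial parametric information or inter-channel relation parameters (ICC, CLD) transmitted within the MPS stream from an encoder, such as the encoder 810 of Fig. 8, to a decoder, such as the decoder 820 of Fig. 8, may be used to generate the decoding matrices 1430, 1440 labeled "pre-decorrelator matrix M1" and "mix matrix M2", respectively. Specific to the Fig. 14 embodiment is that the generation of the output channels 1420 (i.e. the upmix channels L, LS, R, RS, C, LFE) from the side channels (L, R) and the center channel (C) (L, R, C 1435) using the mix matrix M2 1440 is essentially determined by the spatial parametric information 1405, which may correspond to the spatial parametric information 105 of Fig. 1 and comprises the specific inter-channel relation parameters (ICC, CLD) according to the MPEG Surround standard.
Here, the division of the left channel (L) into the corresponding output channels L, LS, of the right channel (R) into the corresponding output channels R, RS, and of the center channel (C) into the corresponding output channels C, LFE may each be represented by a one-to-two (OTT) configuration with corresponding ICC and CLD parameters for the respective input signal.
In particular, the exemplary MPEG Surround decoding scheme 1400, which corresponds to a "5-2-5 configuration", may for example comprise the following steps. In a first step, the spatial parameters or parametric side information are formulated into the decoding matrices 1430, 1440 shown in Fig. 14, according to the existing MPEG Surround standard. In a second step, the decoding matrices 1430, 1440 are used in the parameter domain to provide the inter-channel information of the upmix channels 1420. In a third step, the direct/ambient energies of each upmix channel are calculated using the inter-channel information thus provided. In a fourth step, the direct/ambient energies thus obtained are downmixed to the number of downmix channels 1410. In a fifth step, the weights to be applied to the downmix channels 1410 are calculated.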
Before going further, note that the procedure just described requires as measurements
$E[|L_{dmx}|^2],\ E[|R_{dmx}|^2]$,
which are the average powers of the downmix channels, and
$E[L_{dmx} R_{dmx}^{*}]$, which may be called the cross-spectrum of the downmix channels. Here, the average power of the downmix channels is deliberately referred to as energy, because the term "average power" is not in common use.
The expectation operator, indicated by the square brackets, can in practical applications be replaced by a time average, computed recursively or non-recursively. The energies and the cross-spectrum are directly measurable from the downmix signal.
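A recursive time average is one simple way to replace the expectation operator in practice. The sketch below applies first-order (one-pole) smoothing to the two energies and the cross-spectrum of one subband; the smoothing constant `alpha` and the function name are illustrative choices, not values from the text.

```python
def smoothed_statistics(l_frames, r_frames, alpha=0.9):
    """Recursively averaged E[|L|^2], E[|R|^2] and E[L * conj(R)] per frame.

    l_frames, r_frames: complex subband samples of the two downmix channels,
    one sample per time frame. alpha in [0, 1): larger means slower averaging.
    """
    e_l = e_r = 0.0
    cross = 0j
    history = []
    for l, r in zip(l_frames, r_frames):
        e_l = alpha * e_l + (1.0 - alpha) * abs(l) ** 2
        e_r = alpha * e_r + (1.0 - alpha) * abs(r) ** 2
        cross = alpha * cross + (1.0 - alpha) * l * complex(r).conjugate()
        history.append((e_l, e_r, cross))
    return history
```

A non-recursive alternative would average the same three quantities over a sliding block of frames; the recursive form needs only three state values per subband.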
Note also that the energy of a linear combination of two channels can be formulated from the channel energies, the mixing factors and the cross-spectrum (all in the parameter domain, with no signal operations needed).
The linear combination
$Ch = a L_{dmx} + b R_{dmx}$
has the following energy:
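For real mixing factors a and b, expanding E[|a·L_dmx + b·R_dmx|²] gives a²·E[|L_dmx|²] + b²·E[|R_dmx|²] + 2ab·Re{E[L_dmx·R_dmx*]}. The sketch below states this identity as code and provides a helper that measures the three statistics by plain averaging, so the identity can be checked against a direct signal-domain computation. Real-valued a and b are an assumption of this sketch, and both function names are hypothetical.

```python
def combination_energy(a, b, e_l, e_r, e_lr):
    """Energy of Ch = a*L + b*R from parameter-domain quantities only.

    e_l, e_r: channel energies; e_lr = E[L * conj(R)]; a, b: real gains.
    """
    return a * a * e_l + b * b * e_r + 2.0 * a * b * complex(e_lr).real


def measured_statistics(l_samples, r_samples):
    """Non-recursive averages of |L|^2, |R|^2 and L * conj(R)."""
    n = len(l_samples)
    e_l = sum(abs(l) ** 2 for l in l_samples) / n
    e_r = sum(abs(r) ** 2 for r in r_samples) / n
    e_lr = sum(l * complex(r).conjugate() for l, r in zip(l_samples, r_samples)) / n
    return e_l, e_r, e_lr
```

The parameter-domain value agrees with the average of |a·L + b·R|² over the same samples, which is what allows the later steps to avoid synthesizing any intermediate signal.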
The individual steps of the procedure (i.e. the decoding scheme) are described below.
First step (spatial parameters to mixing matrices)
As stated above, the M1 and M2 matrices are formed according to the MPEG Surround standard. The element in row a and column b of M1 is M1(a, b).
Second step (mixing matrices, together with the energies and cross-spectrum of the downmix, to inter-channel information of the upmix channels)
The mixing matrices M1 and M2 are now available. It must be formulated how the output channels are created from the left downmix channel (L_dmx) and the right downmix channel (R_dmx). It is assumed that the decorrelators are used (Fig. 14, gray area). The decoding/upmixing of the MPS standard ultimately provides the following formula for the overall input/output relation of the whole process:
$L = a_L L_{dmx} + b_L R_{dmx} + c_L D_1[S_1] + d_L D_2[S_2] + e_L D_3[S_3]$
The above is an example for the upmixed front left channel. The other channels can be formulated in the same way. The D elements are the decorrelators, and a to e are weights that can be calculated from the entries of the M1 and M2 matrices.
In particular, the factors a to e can be formulated directly from the matrix entries:
$c_L = M2_{1,4}$
$d_L = M2_{1,5}$
$e_L = M2_{1,6}$
and correspondingly for the other channels.
The S signals are
$S_n = M1_{n+3,1} L_{dmx} + M1_{n+3,2} R_{dmx}$
These S signals are the inputs to the decorrelators from the left-hand matrix in Fig. 14. The energy
$E[|D[S_n]|^2] = E[|S_n|^2]$
can be calculated as explained above. The decorrelator does not affect the energy.
A perceptually motivated way to perform multi-channel ambience extraction is to compare one channel against the sum of all other channels (note that this is only one option among many). Taking channel L as an example, the rest of the channels reads:
Here, "X" is used because using "R" for "rest of the channels" could cause confusion.
Then, the energy of channel L is:
the energy of channel X is:
and the cross-spectrum is:
Now the ICC can be formulated:
and the σ:
Third step (inter-channel information of the upmix channels to DTT parameters of the upmix channels)
Now the DTT of channel L can be calculated according to
The direct energy of L is
$$E[|D_L|^2] = DTT \cdot E[|L|^2]$$
The ambience energy of L is
$$E[|A_L|^2] = (1 - DTT) \cdot E[|L|^2]$$
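The two relations above amount to splitting the channel energy by the direct-to-total ratio. A minimal sketch, with a hypothetical function name and a range check added for illustration:

```python
def split_channel_energy(dtt, energy):
    """Split a channel energy into (direct, ambience) parts using its DTT.

    dtt    : direct-to-total energy ratio, expected in [0, 1]
    energy : E[|L|^2] of the channel in one frequency band
    """
    if not 0.0 <= dtt <= 1.0:
        raise ValueError("DTT must lie in [0, 1]")
    direct = dtt * energy             # E[|D_L|^2]
    ambience = (1.0 - dtt) * energy   # E[|A_L|^2]
    return direct, ambience
```

By construction, the two parts always add up to the original channel energy.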
Fourth step (downmixing the direct/ambience energies)
If, for example, an incoherent downmix rule is used, the ambience energy of the left downmix channel is
, and the same holds for the direct part, and for the direct and ambience parts of the left channel. Note that the above is only one downmix rule. Other downmix rules are also possible.
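An incoherent downmix of energies can be sketched as follows. This assumes that "incoherent" means the contributions are treated as mutually uncorrelated, so that their energies add after weighting with the squared downmix gains. The gain values in the usage note are illustrative only and are not taken from the patent.

```python
def incoherent_downmix_energy(energies, gains):
    """Energy of a downmix channel when its contributions are incoherent.

    energies : per-channel energies (e.g. ambience energies) of the
               upmix channels that feed this downmix channel
    gains    : downmix gains of those channels (hypothetical values)
    """
    if len(energies) != len(gains):
        raise ValueError("energies and gains must have the same length")
    # Uncorrelated signals: cross terms vanish, so weighted energies add.
    return sum(g * g * e for g, e in zip(gains, energies))
```

For example, if a left downmix channel were assumed to be fed by the front left, left surround and center channels with gains 1, 1 and sqrt(0.5), the call would be `incoherent_downmix_energy([a_l, a_ls, a_c], [1.0, 1.0, 0.5 ** 0.5])`. The same function can be applied to the direct energies.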
Fifth step (calculating the weights for ambience extraction in the downmix channels)
The DTT ratio of the left downmix channel is
The weighting factors can then be calculated as described in the embodiment of Fig. 5 (i.e., using the sqrt(DTT) or sqrt(1-DTT) approach) or as described in the embodiment of Fig. 6 (i.e., using the crossmixing matrix method).
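The sqrt(DTT) approach can be sketched as below. The sketch assumes that the DTT ratio of a downmix channel is its direct energy divided by the sum of its direct and ambience energies. The function names and the handling of a silent band are illustrative choices.

```python
import math


def downmix_dtt(direct_energy, ambience_energy):
    """Assumed direct-to-total ratio of one downmix channel in one band."""
    total = direct_energy + ambience_energy
    # A silent band has no defined ratio; returning 0.0 is an arbitrary choice.
    return direct_energy / total if total > 0.0 else 0.0


def extraction_gains(dtt):
    """Weights of the Fig. 5 style approach: sqrt(DTT) and sqrt(1 - DTT)."""
    return math.sqrt(dtt), math.sqrt(1.0 - dtt)
```

The squares of the two gains sum to one, so the extracted direct and ambience parts together carry the energy of the downmix channel.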
Basically, the example procedure above relates the CPC, ICC and CLD parameters of the MPS stream to the ambience ratios of the downmix channels.
According to further embodiments, there are typically other means of achieving similar goals, and other conditions as well. For example, there may be other rules for downmixing, other loudspeaker layouts, other decoding methods, and other ways of performing the multi-channel ambience estimation than the one described above, in which a specific channel is compared with the remaining channels.
Although the present invention has been described in the context of block diagrams, where the blocks represent actual or logical hardware components, the present invention can also be implemented by a computer-implemented method. In the latter case, the blocks represent corresponding method steps, where these steps stand for the functionalities performed by the corresponding logical or physical hardware blocks.
The described embodiments merely illustrate the principles of the present invention. It is understood that modifications and variations of the arrangements and details described herein will be apparent to those skilled in the art. It is therefore intended that the invention be limited only by the scope of the appended claims and not by the specific details presented by way of description and explanation of the embodiments herein.
Depending on certain implementation requirements, the inventive methods can be implemented in hardware or in software. The implementation can be performed using a digital storage medium, in particular a disc, a DVD or a CD having electronically readable control signals stored thereon, which cooperates with a programmable computer system such that the inventive methods are performed. Generally, the present invention can therefore be implemented as a computer program product with program code stored on a machine-readable carrier, the program code being operative to perform the inventive methods when the computer program product runs on a computer. In other words, the inventive methods are therefore a computer program having program code for performing at least one of the inventive methods when the computer program runs on a computer. The inventive encoded audio signal can be stored on any machine-readable storage medium, such as a digital storage medium.
An advantage of the novel concept and technique is that the embodiments described above, i.e. the apparatus, method or computer program, allow estimating and extracting the direct and/or ambience components from an audio signal with the aid of parametric spatial information. In particular, the novel processing operates in frequency bands, as is typical in the field of ambience extraction. The presented concept is relevant to audio signal processing because many applications require separating the direct and ambience components of an audio signal.
In contrast to prior-art ambience extraction methods, the present concept is not based on stereo input signals only; it can also be applied to a mono downmix. For a single-channel downmix, there are normally no inter-channel differences that could be computed. By taking the spatial side information into account, however, ambience extraction becomes possible in this case as well.
An advantage of the present invention is that it uses the spatial parameters to estimate the ambience levels of the "original" signal. It is based on the idea that the spatial parameters already contain the relevant information about the inter-channel differences of the "original" stereo or multi-channel signal.
Once the ambience levels of the original stereo or multi-channel signal have been estimated, the direct and ambience levels in the provided downmix channels can be derived as well. This can be done by a linear combination (i.e., a weighted sum) of the ambience energies for the ambience part, and of the direct energies or amplitudes for the direct part. Embodiments of the present invention therefore provide ambience estimation and extraction with the aid of spatial side information.
Extending from this concept of side-information-based processing, the following beneficial properties or advantages exist.
Embodiments of the present invention provide ambience estimation with the aid of spatial side information and the provided downmix channels. Such an ambience estimation is important when more than one downmix channel is provided along with the side information. The side information and the information measured from the downmix channels can be used together in the ambience estimation. In MPEG Surround with a stereo downmix, these two information sources together provide complete information about the inter-channel relations of the original multi-channel sound, and the ambience estimation is based on these relations.
Embodiments of the present invention also provide downmixing of the direct and ambience energies. In the described situation of side-information-based ambience extraction, there is an intermediate step of estimating the ambience in a number of channels higher than the number of provided downmix channels. This ambience information therefore has to be mapped to the number of downmix audio channels in a valid way. This process can be referred to as downmixing because it corresponds to the downmixing of audio channels. It is most straightforwardly done by combining the direct and ambience energies in the same way in which the provided downmix channels were downmixed.
The downmix rule does not have one ideal solution; it is likely to depend on the application. For example, in MPEG Surround it can be beneficial to treat the channels (center, front loudspeakers, rear loudspeakers) differently because their signal content typically differs.
Moreover, embodiments provide a multi-channel ambience estimation that is performed independently in each channel with respect to the other channels. This property/approach allows the presented stereo ambience estimation formula to be applied simply to each channel relative to all other channels. By this measure, it is not necessary to assume equal ambience levels in all channels. The presented approach is based on the assumption about spatial perception that the ambience component in each channel is that component which has an incoherent counterpart in some or all of the other channels. An example suggesting the validity of this assumption is that one of two channels emitting noise (ambience) can be divided further into two channels with half the energy each, without significantly affecting the perceived sound scene.
In terms of signal processing, it is beneficial that the actual direct/ambience ratio estimation is carried out by applying the presented ambience estimation formula to each channel versus the linear combination of all other channels.
Finally, embodiments provide an application of the estimated direct/ambience energies to extract the actual signals. Once the ambience levels in the downmix channels are known, two inventive methods can be applied to obtain the ambience signals. The first method is based on a simple multiplication, in which the direct and ambience parts of each downmix channel are generated by multiplying the signal by sqrt(direct-to-total energy ratio) and sqrt(ambience-to-total energy ratio). This provides, for each downmix channel, two signals that are coherent with each other but have the energies that the direct and ambience parts were estimated to have.
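The first, multiplication-based method can be sketched for one frequency band as follows. The sketch assumes that the DTT of the downmix channel in this band is already known and that the band signal is a list of complex subband samples. The function names are hypothetical.

```python
import math


def band_energy(samples):
    """Mean energy of a list of complex subband samples."""
    return sum(abs(v) ** 2 for v in samples) / len(samples)


def extract_direct_ambience(samples, dtt):
    """Scale one downmix band into a direct and an ambience signal.

    Both outputs are scaled copies of the input, so they are fully
    coherent with each other; only their energies differ.
    """
    g_direct = math.sqrt(dtt)            # sqrt(direct-to-total ratio)
    g_ambience = math.sqrt(1.0 - dtt)    # sqrt(ambience-to-total ratio)
    direct = [g_direct * s for s in samples]
    ambience = [g_ambience * s for s in samples]
    return direct, ambience
```

The direct output carries the fraction DTT of the band energy and the ambience output the remaining fraction. The second, least-mean-square method is not covered by this sketch.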
The second method is based on a least-mean-square solution with crossmixing of the channels, where the channel crossmixing (also possible with negative signs) allows a better estimation of the direct and ambience signals than the above solution. In contrast to the least-mean-square solution for stereo input and equal ambience levels in the channels provided in "Multiple-loudspeaker playback of stereo signals", C. Faller, AES Conference, October 2007, and in "Patent application title: Method to Generate Multi-Channel Audio Signals from Stereo Signals", Inventor: Christof Faller, Agents: FISH & RICHARDSON P.C., Assignee: LG ELECTRONICS, INC., Origin: MINNEAPOLIS, MN US, IPC8 Class: AH04R500FI, USPC Class: 381 1, the present invention provides a least-mean-square solution that does not require equal ambience levels and that can be extended to any number of channels.
Additional properties of the novel processing are as follows. In the ambience processing for binaural rendering, the ambience can be processed with a filter that has the property of providing, in frequency bands, an inter-aural coherence similar to the inter-aural coherence in a real diffuse sound field, where the filter may also include room effects. In the direct-part processing for binaural rendering, the direct part can be fed through head-related transfer functions (HRTFs), possibly with added room effects such as early reflections and/or reverberation.
Besides this, a "separation level" control corresponding to a dry/wet control can be realized in further embodiments. In particular, full separation may not be desirable in many applications because it may lead to audible artifacts such as abrupt changes, modulation effects and the like. Therefore, all relevant parts of the described processes can be implemented with a "separation level" control for controlling the desired and useful amount of separation. With regard to Fig. 11, such a separation level control is indicated by the dashed box of a control input signal 1105 for controlling the direct/ambience extraction 1120 and/or the binaural rendering devices 910, 1010. This control can work similarly to a dry/wet control in audio effects processing.
The main benefits of the presented solution are as follows. The system works in all situations, including parametric stereo and MPEG Surround with a mono downmix, unlike previous solutions that rely on downmix information only. Furthermore, compared with a plain inter-channel analysis of the downmix channels, the system can use the spatial side information conveyed together with the audio signal in the spatial audio bitstream to estimate the direct and ambience energies more accurately. Many applications, such as binaural processing, may therefore benefit from applying different processing to the direct and ambience parts of the sound.
The embodiments are based on the following psychoacoustic assumptions. The human auditory system localizes sources based on inter-aural cues in time-frequency tiles (areas restricted to a certain frequency and time range). If two or more incoherent concurrent sources that overlap in time and frequency are presented simultaneously at different locations, the auditory system cannot perceive the locations of the sources, because the sum of these sources does not produce reliable inter-aural cues at the listener. The auditory system may thus be described as picking from the audio scene those time-frequency tiles that provide reliable localization information and treating the rest as non-localizable. By this means, the auditory system is able to localize sources in complex sound environments. Simultaneous coherent sources have a different effect: they form approximately the same inter-aural cues that a single source located between the coherent sources would form.
This is also the property that the embodiments take advantage of. The levels of localizable (direct) and non-localizable (ambience) sound can be estimated, and these components are then extracted. Spatialization signal processing is applied only to the localizable/direct part, while diffuseness/spaciousness/envelopment processing is applied to the non-localizable/ambience part. This gives a significant benefit in the design of a binaural processing system, since many processes can be applied only where they are needed, leaving the remaining signal unaffected. All processing takes place in frequency bands that approximate the frequency resolution of human hearing.
Embodiments are based on a decomposition of the signal that maximizes the perceptual quality while minimizing the perceived problems. By means of such a decomposition, the direct and ambience components of an audio signal can be obtained separately. The two components can then be processed further to achieve a desired effect or representation.
In particular, embodiments of the present invention allow ambience estimation in the coded domain with the aid of spatial side information.
The present invention is also advantageous in that typical problems of headphone reproduction of audio signals can be reduced by separating the signals into a direct signal and an ambience signal. Embodiments allow improving existing direct/ambience extraction methods to be applied to binaural sound rendering for headphone reproduction.
The main use cases for processing based on spatial side information are naturally MPEG Surround and parametric stereo (and similar parametric coding techniques). Typical applications that benefit from ambience extraction are binaural playback, because different degrees of room effect can be applied to different parts of the sound, and upmixing to a higher number of channels, because the different components of the sound can be positioned and processed differently. There may also be applications in which the user requires a modification of the direct/ambience level, e.g. for the purpose of enhancing speech intelligibility.
Claims (16)
Applications Claiming Priority (5)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US29527810P | 2010-01-15 | 2010-01-15 | |
US61/295,278 | 2010-01-15 | ||
EP10174230.2 | 2010-08-26 | ||
EP10174230A EP2360681A1 (en) | 2010-01-15 | 2010-08-26 | Apparatus and method for extracting a direct/ambience signal from a downmix signal and spatial parametric information |
PCT/EP2011/050265 WO2011086060A1 (en) | 2010-01-15 | 2011-01-11 | Apparatus and method for extracting a direct/ambience signal from a downmix signal and spatial parametric information |
Publications (2)
Publication Number | Publication Date |
---|---|
CN102804264A true CN102804264A (en) | 2012-11-28 |
CN102804264B CN102804264B (en) | 2016-03-09 |
Family
ID=43536672
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201180014038.9A Active CN102804264B (en) | Apparatus and method for extracting a direct/ambience signal from a downmix signal and spatial parametric information | 2010-01-15 | 2011-01-11 |
Country Status (14)
Country | Link |
---|---|
US (1) | US9093063B2 (en) |
EP (2) | EP2360681A1 (en) |
JP (1) | JP5820820B2 (en) |
KR (1) | KR101491890B1 (en) |
CN (1) | CN102804264B (en) |
AR (1) | AR079998A1 (en) |
AU (1) | AU2011206670B2 (en) |
BR (1) | BR112012017551B1 (en) |
CA (1) | CA2786943C (en) |
ES (1) | ES2587196T3 (en) |
MX (1) | MX2012008119A (en) |
RU (1) | RU2568926C2 (en) |
TW (1) | TWI459376B (en) |
WO (1) | WO2011086060A1 (en) |
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN105405445A (en) * | 2015-12-10 | 2016-03-16 | 北京大学 | Parameter stereo coding, decoding method based on inter-channel transfer function |
CN109313907A (en) * | 2016-04-22 | 2019-02-05 | 诺基亚技术有限公司 | Combined audio signal and Metadata |
CN109644314A (en) * | 2016-09-23 | 2019-04-16 | 苹果公司 | Headphone driving signal is generated in digital audio and video signals processing ears rendering contexts |
WO2020057050A1 (en) * | 2018-09-17 | 2020-03-26 | 中科上声(苏州)电子有限公司 | Method for extracting direct sound and background sound, and loudspeaker system and sound reproduction method therefor |
Families Citing this family (38)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2011083981A2 (en) * | 2010-01-06 | 2011-07-14 | Lg Electronics Inc. | An apparatus for processing an audio signal and method thereof |
TWI854548B (en) * | 2010-12-03 | 2024-09-01 | 美商杜比實驗室特許公司 | Audio decoding device, audio decoding method, and audio encoding method |
US9253574B2 (en) | 2011-09-13 | 2016-02-02 | Dts, Inc. | Direct-diffuse decomposition |
IN2014CN03413A (en) * | 2011-11-01 | 2015-07-03 | Koninkl Philips Nv | |
CN104704558A (en) * | 2012-09-14 | 2015-06-10 | 杜比实验室特许公司 | Multi-channel audio content analysis based upmix detection |
WO2014126688A1 (en) | 2013-02-14 | 2014-08-21 | Dolby Laboratories Licensing Corporation | Methods for audio signal transient detection and decorrelation control |
TWI618050B (en) | 2013-02-14 | 2018-03-11 | 杜比實驗室特許公司 | Method and apparatus for signal decorrelation in an audio processing system |
KR101729930B1 (en) * | 2013-02-14 | 2017-04-25 | 돌비 레버러토리즈 라이쎈싱 코오포레이션 | Methods for controlling the inter-channel coherence of upmixed signals |
CN105075293B (en) * | 2013-03-29 | 2017-10-20 | 三星电子株式会社 | Audio frequency apparatus and its audio provide method |
CN108806704B (en) | 2013-04-19 | 2023-06-06 | 韩国电子通信研究院 | Multi-channel audio signal processing device and method |
US10075795B2 (en) | 2013-04-19 | 2018-09-11 | Electronics And Telecommunications Research Institute | Apparatus and method for processing multi-channel audio signal |
EP2804176A1 (en) | 2013-05-13 | 2014-11-19 | Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V. | Audio object separation from mixture signal using object-specific time/frequency resolutions |
CN104240711B (en) * | 2013-06-18 | 2019-10-11 | 杜比实验室特许公司 | For generating the mthods, systems and devices of adaptive audio content |
EP2830053A1 (en) | 2013-07-22 | 2015-01-28 | Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V. | Multi-channel audio decoder, multi-channel audio encoder, methods and computer program using a residual-signal-based adjustment of a contribution of a decorrelated signal |
US9319819B2 (en) | 2013-07-25 | 2016-04-19 | Etri | Binaural rendering method and apparatus for decoding multi channel audio |
CN105493182B (en) * | 2013-08-28 | 2020-01-21 | 杜比实验室特许公司 | Hybrid waveform coding and parametric coding speech enhancement |
UA117258C2 (en) | 2013-10-21 | 2018-07-10 | Долбі Інтернешнл Аб | Decorrelator structure for parametric reconstruction of audio signals |
EP2866227A1 (en) | 2013-10-22 | 2015-04-29 | Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V. | Method for decoding and encoding a downmix matrix, method for presenting audio content, encoder and decoder for a downmix matrix, audio encoder and audio decoder |
ES2755349T3 (en) | 2013-10-31 | 2020-04-22 | Dolby Laboratories Licensing Corp | Binaural rendering for headphones using metadata processing |
CN103700372B (en) * | 2013-12-30 | 2016-10-05 | 北京大学 | A kind of parameter stereo coding based on orthogonal decorrelation technique, coding/decoding method |
EP2892250A1 (en) * | 2014-01-07 | 2015-07-08 | Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V. | Apparatus and method for generating a plurality of audio channels |
US9955276B2 (en) | 2014-10-31 | 2018-04-24 | Dolby International Ab | Parametric encoding and decoding of multichannel audio signals |
KR102146878B1 (en) * | 2015-03-27 | 2020-08-21 | 프라운호퍼 게젤샤프트 쭈르 푀르데룽 데어 안겐반텐 포르슝 에. 베. | Apparatus and method for processing stereo signals for reproduction of automobiles to achieve individual stereoscopic sound by front loudspeakers |
EA034936B1 (en) | 2015-08-25 | 2020-04-08 | Долби Интернешнл Аб | AUDIO CODING AND DECODING USING REPRESENT CONVERSION PARAMETERS |
KR102357287B1 (en) | 2016-03-15 | 2022-02-08 | 프라운호퍼 게젤샤프트 쭈르 푀르데룽 데어 안겐반텐 포르슝 에. 베. | Apparatus, Method or Computer Program for Generating a Sound Field Description |
JP6846822B2 (en) * | 2016-04-27 | 2021-03-24 | 国立大学法人富山大学 | Audio signal processor, audio signal processing method, and audio signal processing program |
US9913061B1 (en) | 2016-08-29 | 2018-03-06 | The Directv Group, Inc. | Methods and systems for rendering binaural audio content |
CN109427337B (en) * | 2017-08-23 | 2021-03-30 | 华为技术有限公司 | Method and device for reconstructing a signal during coding of a stereo signal |
US10306391B1 (en) | 2017-12-18 | 2019-05-28 | Apple Inc. | Stereophonic to monophonic down-mixing |
WO2020009350A1 (en) * | 2018-07-02 | 2020-01-09 | 엘지전자 주식회사 | Method and apparatus for transmitting or receiving audio data associated with occlusion effect |
WO2020008112A1 (en) | 2018-07-03 | 2020-01-09 | Nokia Technologies Oy | Energy-ratio signalling and synthesis |
EP3618464A1 (en) * | 2018-08-30 | 2020-03-04 | Nokia Technologies Oy | Reproduction of parametric spatial audio using a soundbar |
FI3874492T3 (en) | 2018-10-31 | 2024-01-08 | Nokia Technologies Oy | Determination of spatial audio parameter encoding and associated decoding |
GB2578603A (en) * | 2018-10-31 | 2020-05-20 | Nokia Technologies Oy | Determination of spatial audio parameter encoding and associated decoding |
BR112020018466A2 (en) * | 2018-11-13 | 2021-05-18 | Dolby Laboratories Licensing Corporation | representing spatial audio through an audio signal and associated metadata |
CN114402631B (en) * | 2019-05-15 | 2024-05-31 | 苹果公司 | Method and electronic device for playback of captured sound |
US12183351B2 (en) | 2019-09-23 | 2024-12-31 | Dolby Laboratories Licensing Corporation | Audio encoding/decoding with transform parameters |
WO2024081957A1 (en) * | 2022-10-14 | 2024-04-18 | Virtuel Works Llc | Binaural externalization processing |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN1264264A (en) * | 2000-02-14 | 2000-08-23 | 王幼庚 | Method for generating space sound signals by recording sound waves before ear |
WO2005101905A1 (en) * | 2004-04-16 | 2005-10-27 | Coding Technologies Ab | Scheme for generating a parametric representation for low-bit rate applications |
WO2007110101A1 (en) * | 2006-03-28 | 2007-10-04 | Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V. | Enhanced method for signal shaping in multi-channel audio reconstruction |
Family Cites Families (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
IL129752A (en) * | 1999-05-04 | 2003-01-12 | Eci Telecom Ltd | Telecommunication method and system for using same |
US7567845B1 (en) | 2002-06-04 | 2009-07-28 | Creative Technology Ltd | Ambience generation for stereo signals |
SE0402652D0 (en) * | 2004-11-02 | 2004-11-02 | Coding Tech Ab | Methods for improved performance of prediction based multi-channel reconstruction |
EP1761110A1 (en) | 2005-09-02 | 2007-03-07 | Ecole Polytechnique Fédérale de Lausanne | Method to generate multi-channel audio signals from stereo signals |
US8103005B2 (en) | 2008-02-04 | 2012-01-24 | Creative Technology Ltd | Primary-ambient decomposition of stereo audio signals using a complex similarity index |
JP5237463B2 (en) * | 2008-12-11 | 2013-07-17 | フラウンホッファー−ゲゼルシャフト ツァ フェルダールング デァ アンゲヴァンテン フォアシュンク エー.ファオ | Apparatus for generating a multi-channel audio signal |
-
2010
- 2010-08-26 EP EP10174230A patent/EP2360681A1/en not_active Withdrawn
-
2011
- 2011-01-07 TW TW100100644A patent/TWI459376B/en active
- 2011-01-11 RU RU2012136027/08A patent/RU2568926C2/en active
- 2011-01-11 JP JP2012548400A patent/JP5820820B2/en active Active
- 2011-01-11 KR KR1020127021317A patent/KR101491890B1/en active IP Right Grant
- 2011-01-11 CA CA2786943A patent/CA2786943C/en active Active
- 2011-01-11 EP EP11700088.5A patent/EP2524370B1/en active Active
- 2011-01-11 AU AU2011206670A patent/AU2011206670B2/en active Active
- 2011-01-11 BR BR112012017551-3A patent/BR112012017551B1/en active IP Right Grant
- 2011-01-11 WO PCT/EP2011/050265 patent/WO2011086060A1/en active Application Filing
- 2011-01-11 CN CN201180014038.9A patent/CN102804264B/en active Active
- 2011-01-11 ES ES11700088.5T patent/ES2587196T3/en active Active
- 2011-01-11 MX MX2012008119A patent/MX2012008119A/en active IP Right Grant
- 2011-01-13 AR ARP110100109A patent/AR079998A1/en active IP Right Grant
-
2012
- 2012-07-11 US US13/546,048 patent/US9093063B2/en active Active
Non-Patent Citations (1)
Title |
---|
JEROEN BREEBAART ET AL: "Multi-channel goes mobile:MPEG Surround binaural rendering", 《PROC.29TH AES CONFERENCE,SEOUL,KOREA》, 4 September 2006 (2006-09-04) * |
Cited By (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN105405445A (en) * | 2015-12-10 | 2016-03-16 | 北京大学 | Parameter stereo coding, decoding method based on inter-channel transfer function |
CN105405445B (en) * | 2015-12-10 | 2019-03-22 | 北京大学 | A kind of parameter stereo coding, coding/decoding method based on transmission function between sound channel |
CN109313907A (en) * | 2016-04-22 | 2019-02-05 | 诺基亚技术有限公司 | Combined audio signal and Metadata |
CN109313907B (en) * | 2016-04-22 | 2023-11-17 | 诺基亚技术有限公司 | Combining audio signals and spatial metadata |
CN109644314A (en) * | 2016-09-23 | 2019-04-16 | 苹果公司 | Headphone driving signal is generated in digital audio and video signals processing ears rendering contexts |
WO2020057050A1 (en) * | 2018-09-17 | 2020-03-26 | 中科上声(苏州)电子有限公司 | Method for extracting direct sound and background sound, and loudspeaker system and sound reproduction method therefor |
Also Published As
Publication number | Publication date |
---|---|
CA2786943C (en) | 2017-11-07 |
US20120314876A1 (en) | 2012-12-13 |
AU2011206670A1 (en) | 2012-08-09 |
RU2568926C2 (en) | 2015-11-20 |
RU2012136027A (en) | 2014-02-20 |
ES2587196T3 (en) | 2016-10-21 |
WO2011086060A1 (en) | 2011-07-21 |
CA2786943A1 (en) | 2011-07-21 |
EP2524370B1 (en) | 2016-07-27 |
AR079998A1 (en) | 2012-03-07 |
US9093063B2 (en) | 2015-07-28 |
JP2013517518A (en) | 2013-05-16 |
TWI459376B (en) | 2014-11-01 |
BR112012017551A2 (en) | 2017-10-03 |
KR20120109627A (en) | 2012-10-08 |
EP2360681A1 (en) | 2011-08-24 |
EP2524370A1 (en) | 2012-11-21 |
AU2011206670B2 (en) | 2014-01-23 |
TW201142825A (en) | 2011-12-01 |
CN102804264B (en) | 2016-03-09 |
KR101491890B1 (en) | 2015-02-09 |
BR112012017551B1 (en) | 2020-12-15 |
MX2012008119A (en) | 2012-10-09 |
JP5820820B2 (en) | 2015-11-24 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN102804264B (en) | Apparatus and method for extracting a direct/ambience signal from a downmix signal and spatial parametric information | |
US12131744B2 (en) | Audio encoding and decoding using presentation transform parameters | |
EP1817768B1 (en) | Parametric coding of spatial audio with cues based on transmitted channels | |
Herre et al. | MPEG surround-the ISO/MPEG standard for efficient and compatible multichannel audio coding | |
EP2122613B1 (en) | A method and an apparatus for processing an audio signal | |
RU2409911C2 (en) | Decoding binaural audio signals | |
EP1977417B1 (en) | Method and system for decoding a multi-channel signal | |
CN101160618B (en) | Compact side information for parametric coding of spatial audio | |
PT2372701E (en) | Enhanced coding and parameter representation of multichannel downmixed object coding | |
CN101410889A (en) | Controlling spatial audio coding parameters as a function of auditory events | |
CN101853660A (en) | Diffuse sound shaping for binaural cue coding schemes and similar schemes | |
EP2834813A1 (en) | Multi-channel audio encoder and method for encoding a multi-channel audio signal | |
KR20070091587A (en) | Stereo signal generation method and apparatus | |
US20240406650A1 (en) | Binaural dialogue enhancement | |
He | Spatial audio reproduction with primary ambient extraction | |
Breebaart et al. | Binaural rendering in MPEG Surround | |
He et al. | Literature review on spatial audio | |
He et al. | Time-shifting based primary-ambient extraction for spatial audio reproduction |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
CB02 | Change of applicant information |
Address after: Munich, Germany Applicant after: Fraunhofer Application and Research Promotion Association Address before: Munich, Germany Applicant before: Fraunhofer-Gesellschaft Zur Foerderung Der Angewandten Forschung E.V. |
|
COR | Change of bibliographic data | ||
C14 | Grant of patent or utility model | ||
GR01 | Patent grant |