TW201528251A

TW201528251A - Apparatus and method for efficient object metadata coding

Info

Publication number: TW201528251A
Application number: TW103124953A
Authority: TW
Inventors: Christian Borss; Christian Ertel
Original assignee: Fraunhofer Ges Forschung
Priority date: 2013-07-22
Filing date: 2014-07-21
Publication date: 2015-07-16
Also published as: TWI560699B

Abstract

An apparatus (100) for generating one or more audio channels is provided. The apparatus (100) comprises a metadata decoder (110) for receiving one or more compressed metadata signals. Each of the one or more compressed metadata signals comprises a plurality of first metadata samples. The first metadata samples of each of the one or more compressed metadata signals indicate information associated with an audio object signal of one or more audio object signals. The metadata decoder (110) is configured to generate one or more reconstructed metadata signals, so that each of the one or more reconstructed metadata signals comprises the first metadata samples of one of the one or more compressed metadata signals and further comprises a plurality of second metadata samples. Moreover, the metadata decoder (110) is configured to generate each of the second metadata samples of each reconstructed metadata signal of the one or more reconstructed metadata signals depending on at least two of the first metadata samples of said reconstructed metadata signal. Moreover, the apparatus (100) comprises an audio channel generator (120) for generating the one or more audio channels depending on the one or more audio object signals and depending on the one or more reconstructed metadata signals. Furthermore, an apparatus for generating encoded audio information comprising one or more encoded audio signals and one or more compressed metadata signals is provided.

Description

High-efficiency object metadata coding device and method thereof

The present invention relates to audio encoding/decoding, in particular to spatial audio coding and spatial audio object coding, and more particularly to efficient object metadata coding.

Spatial audio coding tools are well known in the art and are, for example, standardized in the MPEG Surround standard. Spatial audio coding starts from original input channels, such as five or seven channels identified by their placement in a reproduction setup: a left channel, a center channel, a right channel, a left surround channel, a right surround channel and a low-frequency enhancement channel. A spatial audio encoder typically derives one or more downmix channels from the original channels and, additionally, derives parametric data relating to spatial cues, such as inter-channel level differences, inter-channel coherence values, inter-channel phase differences and inter-channel time differences. The one or more downmix channels are transmitted, together with parametric side information indicating the spatial cues, to a spatial audio decoder. The decoder decodes the downmix channels and the associated parametric data to finally obtain output channels that approximate the original input channels. The placement of the channels in the output setup is typically fixed, for example a 5.1 format or a 7.1 format.
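As a rough illustration of the encoder side described above, the following minimal Python sketch derives a mono downmix from a stereo pair and computes one spatial cue, a broadband inter-channel level difference in dB. It assumes a single frame and no filterbank. Real MPEG Surround encoders compute such cues per time/frequency tile and transmit further cues as well. The function name `downmix_and_icld` is hypothetical.

```python
import math


def downmix_and_icld(left, right, eps=1e-12):
    """Return a mono downmix and a broadband inter-channel level difference in dB."""
    if len(left) != len(right):
        raise ValueError("channels must have the same length")
    # Passive downmix: the average of the two input channels.
    downmix = [0.5 * (xl + xr) for xl, xr in zip(left, right)]
    # Channel energies; eps avoids division by zero and log(0) for silent input.
    p_left = sum(x * x for x in left) + eps
    p_right = sum(x * x for x in right) + eps
    icld_db = 10.0 * math.log10(p_left / p_right)
    return downmix, icld_db


dmx, icld = downmix_and_icld([1.0, -1.0], [0.5, -0.5])
print(dmx, round(icld, 2))  # [0.75, -0.75] 6.02
```

Here the left channel carries four times the energy of the right one, so the level difference is 10·log10(4) ≈ 6.02 dB; a decoder could use this cue to redistribute the downmix between two output channels.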

Such channel-based audio formats are widely used for storing or transmitting multi-channel audio content, where each channel relates to a specific loudspeaker at a given position. A faithful reproduction of these formats requires a loudspeaker setup in which the loudspeakers are placed at the same positions as the loudspeakers used during production of the audio signals. Increasing the number of loudspeakers improves the reproduction of truly immersive 3D audio scenes, but this requirement becomes increasingly difficult to fulfill, especially in a domestic environment such as a living room.

The need for a specific loudspeaker setup can be overcome by an object-based approach, in which the loudspeaker signals are rendered specifically for the playback setup.

For example, spatial audio object coding tools are well known in the art and are standardized in the MPEG SAOC standard. In contrast to spatial audio coding, which starts from original channels, spatial audio object coding starts from audio objects that are not automatically dedicated to a particular rendering reproduction setup. Instead, the placement of the audio objects in the reproduction scene is flexible and can be determined by the user by inputting certain rendering information into a spatial audio object coding decoder. Alternatively or additionally, rendering information, i.e., information on the position in the reproduction setup at which a certain audio object is to be placed, can be transmitted as additional side information or metadata. To obtain a certain data compression, a number of audio objects are encoded by an SAOC encoder, which calculates one or more transport channels from the input objects by downmixing the objects in accordance with certain downmixing information. Furthermore, the SAOC encoder calculates parametric side information representing inter-object cues, such as object level differences (OLD), object coherence values, etc. As in spatial audio coding (SAC), the inter-object parametric data is calculated for individual time/frequency tiles: for a given frame of the audio signal, comprising for example 1024 or 2048 samples, a number of frequency bands (for example 24, 32 or 64) is considered, so that parametric data exists for each frame and each frequency band. For example, when an audio piece has 20 frames and each frame is subdivided into 32 frequency bands, the number of time/frequency tiles is 640.
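The per-tile parameter extraction can be sketched as follows. This is a minimal Python sketch, assuming the power of each object in each time/frequency tile has already been computed. It normalizes each object's power to the strongest object in the same tile, as is usual for object level differences, and it omits quantization and the filterbank. The function names are hypothetical.

```python
def object_level_differences(powers):
    """powers[i][t] is the power of object i in time/frequency tile t.

    Returns old[i][t] = powers[i][t] / max_j powers[j][t].
    """
    num_tiles = len(powers[0])
    old = [[0.0] * num_tiles for _ in powers]
    for t in range(num_tiles):
        peak = max(obj[t] for obj in powers)
        for i, obj in enumerate(powers):
            # Normalize to the strongest object in this tile; an all-silent tile gives 0.
            old[i][t] = obj[t] / peak if peak > 0 else 0.0
    return old


def count_tiles(frames, bands):
    """Number of time/frequency tiles for which parameters are computed."""
    return frames * bands


print(count_tiles(20, 32))  # 640
print(object_level_differences([[4.0, 1.0], [2.0, 2.0]]))  # [[1.0, 0.5], [0.5, 1.0]]
```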

In an object-based approach, the sound field is described by discrete audio objects. This requires object metadata that describes the time-variant position of each sound source in 3D space.

In the prior art, one metadata concept is the Spatial Sound Description Interchange Format (SpatDIF), an audio scene description format that is still under development [1]. It is designed as an interchange format for object-based sound scenes and does not provide any method for compressing object trajectories. SpatDIF uses the text-based Open Sound Control (OSC) format to structure the object metadata [2]. A simple text-based representation, however, is not an option for the compressed transmission of object trajectories.

Another metadata concept in the prior art is the Audio Scene Description Format (ASDF) [3], which has the same disadvantage of being a text-based solution. Its data is structured by an extension of the Synchronized Multimedia Integration Language (SMIL), which is itself based on the Extensible Markup Language (XML) [4, 5].

A further metadata concept in the prior art is the audio binary format for scenes (AudioBIFS), a binary format that is part of the MPEG-4 specification [6, 7]. It is closely related to the Virtual Reality Modeling Language (VRML), which was developed for the description of audio-visual 3D scenes and interactive virtual reality applications [8]. The complex AudioBIFS specification uses scene graphs to specify the paths of object movements. A major disadvantage of AudioBIFS is that it is not designed for real-time operation, where a limited system delay and random access to the data stream are required. Furthermore, the encoding of the object positions does not exploit the limited localization ability of human listeners. For a fixed listener position within the audio-visual scene, the object data can be quantized with a much lower number of bits [9]. Hence, the object metadata coding applied in AudioBIFS is not efficient with regard to data compression.

It would therefore be highly appreciated if improved, efficient object metadata coding concepts were provided.

The object of the present invention is to provide improved concepts for object metadata coding. This object is achieved by an apparatus according to claim 1, an apparatus according to claim 8, a system according to claim 14, a method according to claim 15, a method according to claim 16 and a computer program according to claim 17.

The present invention provides an apparatus for generating one or more audio channels. The apparatus comprises a metadata decoder for receiving one or more compressed metadata signals. Each compressed metadata signal comprises a plurality of first metadata samples. The first metadata samples of each compressed metadata signal indicate information associated with an audio object signal of one or more audio object signals. The metadata decoder is configured to generate one or more reconstructed metadata signals, so that each reconstructed metadata signal comprises the first metadata samples of one of the compressed metadata signals and further comprises a plurality of second metadata samples. The metadata decoder is configured to generate each second metadata sample of a reconstructed metadata signal depending on at least two of the first metadata samples of that reconstructed metadata signal. Moreover, the apparatus comprises an audio channel generator for generating the one or more audio channels depending on the one or more audio object signals and depending on the one or more reconstructed metadata signals.

Moreover, the present invention provides an apparatus for generating encoded audio information comprising one or more encoded audio signals and one or more compressed metadata signals. The apparatus comprises a metadata encoder for receiving one or more original metadata signals. Each original metadata signal comprises a plurality of metadata samples, and the metadata samples of each original metadata signal indicate information associated with an audio object signal of one or more audio object signals. The metadata encoder is configured to generate the one or more compressed metadata signals, so that each compressed metadata signal comprises a first group of two or more of the metadata samples of one of the original metadata signals, and so that said compressed metadata signal does not comprise any metadata sample of a second group of another two or more of the metadata samples of said original metadata signal. Moreover, the apparatus comprises an audio encoder for encoding the one or more audio object signals to obtain the one or more encoded audio signals.

Moreover, the present invention provides a system. The system comprises the apparatus for generating encoded audio information comprising one or more encoded audio signals and one or more compressed metadata signals. Furthermore, the system comprises an apparatus for receiving the one or more encoded audio signals and the one or more compressed metadata signals, and for generating one or more audio channels depending on the one or more encoded audio signals and depending on the one or more compressed metadata signals.

According to embodiments, data compression concepts for object metadata are provided that achieve an efficient compression mechanism for transmission channels with a limited data rate. Moreover, a good compression rate is achieved for pure azimuth changes, for example camera rotations. Furthermore, the provided concepts support discontinuous trajectories, such as positional jumps. Moreover, low decoding complexity is realized. Furthermore, random access with a limited reinitialization time is achieved.

Moreover, the present invention provides a method for generating one or more audio channels. The method comprises: receiving one or more compressed metadata signals, wherein each compressed metadata signal comprises a plurality of first metadata samples, and wherein the first metadata samples of each compressed metadata signal indicate information associated with an audio object signal of one or more audio object signals; generating one or more reconstructed metadata signals, so that each reconstructed metadata signal comprises the first metadata samples of one of the compressed metadata signals and further comprises a plurality of second metadata samples, wherein generating the reconstructed metadata signals comprises generating each second metadata sample of a reconstructed metadata signal depending on at least two of the first metadata samples of that reconstructed metadata signal; and generating the one or more audio channels depending on the one or more audio object signals and depending on the one or more reconstructed metadata signals.

Furthermore, the present invention provides a method for generating encoded audio information comprising one or more encoded audio signals and one or more compressed metadata signals. The method comprises:

Receiving one or more original metadata signals, wherein each original metadata signal comprises a plurality of metadata samples, and wherein the metadata samples of each original metadata signal indicate information associated with an audio object signal of one or more audio object signals.

Generating the one or more compressed metadata signals, so that each compressed metadata signal comprises a first group of two or more of the metadata samples of one of the original metadata signals, and so that said compressed metadata signal does not comprise any metadata sample of a second group of another two or more of the metadata samples of said original metadata signal.

Encoding the one or more audio object signals to obtain the one or more encoded audio signals.

Furthermore, the present invention provides a computer program for implementing the above-described methods when executed on a computer or signal processor.

100‧‧‧Apparatus for generating one or more audio channels

100‧‧‧Apparatus

101‧‧‧Audio input data

110‧‧‧Metadata decoder

120‧‧‧Audio channel generator, audio decoder

200‧‧‧Mixer, block

200‧‧‧Optional pre-renderer/mixer, pre-renderer/mixer, mixer

210‧‧‧Metadata encoder

220‧‧‧Audio encoder

250‧‧‧Apparatus for generating encoded audio information, apparatus

300/500‧‧‧USAC encoder

300‧‧‧Core encoder

400‧‧‧Origin, metadata compressor

400‧‧‧Metadata compressor, block

400‧‧‧OAM encoder

410‧‧‧Position of the audio object, exact position of the audio object

415‧‧‧Straight line

420‧‧‧OAM decoder

500‧‧‧Output interface, origin

501‧‧‧Audio output data, data

510‧‧‧Position of the first audio object, position, loudspeaker

511, 512, 513, 514‧‧‧Loudspeakers

520‧‧‧Position of the second audio object, position, loudspeaker

600‧‧‧Mode controller

611‧‧‧Quantization

612‧‧‧Downsampling

621‧‧‧Upsampling

622‧‧‧Linear interpolation

640‧‧‧Polygon approximation

721‧‧‧Upsampling

722‧‧‧Linear interpolation

740‧‧‧Step, inverse polygon conversion

800‧‧‧Optional SAOC encoder, SAOC encoder

900‧‧‧Connection

1100‧‧‧Input interface

1200‧‧‧Object processor, processor

1205‧‧‧Output channels

1210‧‧‧Object renderer

1220‧‧‧Mixer

1300‧‧‧Core decoder, USAC decoder

1400‧‧‧Metadata decompressor

1400‧‧‧OAM decoder

1600‧‧‧Mode controller

1700‧‧‧Post-processor

1710‧‧‧Binaural renderer

1720‧‧‧Format conversion, format converter

1727‧‧‧Shortcut

1730‧‧‧32-channel loudspeakers, output, output interface

1800‧‧‧Optional SAOC decoder, SAOC decoder, block

1810‧‧‧Vector base amplitude panning (VBAP) stage

Figure 1 illustrates an apparatus for generating one or more audio channels according to an embodiment.

Figure 2 illustrates an apparatus for generating encoded audio information comprising one or more encoded audio signals and one or more compressed metadata signals according to an embodiment.

Figure 3 illustrates a system according to an embodiment.

Figure 4 illustrates the position of an audio object in three-dimensional space, expressed from an origin by azimuth, elevation and radius.

Figure 5 illustrates positions of audio objects and a loudspeaker setup assumed by the audio channel generator.

Figure 6 illustrates metadata encoding according to an embodiment.

Figure 7 illustrates metadata decoding according to an embodiment.

Figure 8 illustrates metadata encoding according to another embodiment.

Figure 9 illustrates metadata decoding according to another embodiment.

Figure 10 illustrates metadata encoding according to a further embodiment.

Figure 11 illustrates metadata decoding according to a further embodiment.

Figure 12 illustrates a first embodiment of a 3D audio encoder.

Figure 13 illustrates a first embodiment of a 3D audio decoder.

Figure 14 illustrates a second embodiment of a 3D audio encoder.

Figure 15 illustrates a second embodiment of a 3D audio decoder.

Figure 16 illustrates a third embodiment of a 3D audio encoder.

Figure 17 illustrates a third embodiment of a 3D audio decoder.

Figure 2 illustrates an apparatus 250 for generating encoded audio information comprising one or more encoded audio signals and one or more compressed metadata signals, according to an embodiment.

The apparatus 250 comprises a metadata encoder 210 for receiving one or more original metadata signals. Each original metadata signal comprises a plurality of metadata samples. The metadata samples of each original metadata signal indicate information associated with an audio object signal of one or more audio object signals. The metadata encoder 210 is configured to generate the one or more compressed metadata signals, so that each compressed metadata signal comprises a first group of two or more of the metadata samples of one of the original metadata signals, and so that said compressed metadata signal does not comprise any metadata sample of a second group of another two or more of the metadata samples of said original metadata signal.
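A minimal Python sketch of this encoder-side reduction, assuming that the first group is simply every N-th sample of an original metadata signal and that the samples in between form the second group that is not transmitted. The helper `compress_metadata` is hypothetical, and keeping the final sample is an extra assumption made so that a decoder can interpolate up to the end of the signal.

```python
def compress_metadata(original, n):
    """Keep every n-th sample of an original metadata signal as (index, value) pairs.

    Samples at the other indices (the "second group") are not included.
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    kept = [(k, original[k]) for k in range(0, len(original), n)]
    last = len(original) - 1
    # Assumption: also keep the final sample so the end of the signal can be reconstructed.
    if last >= 0 and kept[-1][0] != last:
        kept.append((last, original[last]))
    return kept


azimuth = [0.0, 10.0, 20.0, 30.0, 40.0, 50.0]
print(compress_metadata(azimuth, 2))
# [(0, 0.0), (2, 20.0), (4, 40.0), (5, 50.0)]
```

Transmitting the time index with each kept value is one simple choice; with a fixed, known N the indices could instead be implied by the position in the stream.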

Moreover, the apparatus 250 comprises an audio encoder 220 for encoding the one or more audio object signals to obtain the one or more encoded audio signals. For example, the audio encoder 220 may comprise an SAOC encoder according to the state of the art that encodes the one or more audio object signals to obtain one or more SAOC transport channels as the one or more encoded audio signals. Various other coding techniques for encoding one or more audio object channels may alternatively or additionally be employed.

The apparatus 100 of Figure 1 comprises a metadata decoder 110 for receiving one or more compressed metadata signals. Each compressed metadata signal comprises a plurality of first metadata samples. The first metadata samples of each compressed metadata signal indicate information associated with an audio object signal of one or more audio object signals. The metadata decoder 110 is configured to generate one or more reconstructed metadata signals, so that each reconstructed metadata signal comprises the first metadata samples of one of the compressed metadata signals and further comprises a plurality of second metadata samples. Moreover, the metadata decoder 110 is configured to generate each second metadata sample of a reconstructed metadata signal depending on at least two of the first metadata samples of that reconstructed metadata signal.

Moreover, the apparatus 100 comprises an audio channel generator 120 for generating the one or more audio channels depending on the one or more audio object signals and depending on the one or more reconstructed metadata signals.

When referring to metadata samples, it should be noted that a metadata sample is characterized by its metadata sample value and by the point in time to which it relates. Such a point in time may, for example, be relative to the start of an audio sequence or similar. For example, an index n or k may identify the position of a metadata sample within a metadata signal and thereby indicate a (relative) point in time, relative to a start time. Two metadata samples that relate to different points in time are different metadata samples, even when their metadata sample values are equal, which may sometimes be the case.
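The distinction between a metadata sample and its value can be made explicit in a small data type. In this hypothetical Python sketch, a metadata sample is identified by its time index together with its value, so two samples with equal values at different indices compare as different.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MetadataSample:
    index: int    # position n within the metadata signal (time relative to the start)
    value: float  # metadata sample value, e.g. an azimuth angle in degrees


a = MetadataSample(index=3, value=30.0)
b = MetadataSample(index=7, value=30.0)
# Equal values, different points in time: two different metadata samples.
print(a.value == b.value, a == b)  # True False
```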

The above embodiments are based on the finding that the metadata information (comprised by a metadata signal) associated with an audio object signal often changes slowly.

For example, a metadata signal may indicate position information of an audio object (e.g., an azimuth angle, an elevation angle or a radius defining the position of the audio object). It may be assumed that, most of the time, the position of the audio object either does not change or changes only slowly.

Alternatively, a metadata signal may, for example, indicate the volume (e.g., a gain) of an audio object, and it may likewise be assumed that, most of the time, the volume of the audio object changes only slowly.

For this reason, (complete) metadata information need not be transmitted for every point in time. Instead, it is transmitted only at certain points in time, for example periodically at every N-th point in time, i.e., at points in time 0, N, 2N, 3N, etc. On the decoder side, the metadata for intermediate points in time (e.g., points in time 1, 2, ..., N-1) can then be approximated from the metadata samples of two or more transmitted points in time. For example, the metadata samples for points in time 1, 2, ..., N-1 can be approximated from the metadata samples for points in time 0 and N, e.g., by linear interpolation. As stated above, such an approximation works because the metadata information of audio objects generally changes slowly.
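The decoder-side approximation can be sketched as follows, assuming the transmitted first samples arrive as (time index, value) pairs and that linear interpolation, as in the example above, is used for the second samples. The function name `reconstruct_metadata` is hypothetical, and holding the nearest transmitted value outside the transmitted range is an added assumption.

```python
def reconstruct_metadata(transmitted, length):
    """Rebuild `length` metadata values from transmitted (index, value) pairs.

    `transmitted` must be sorted by strictly increasing index. Values at indices
    between two transmitted samples are linearly interpolated; indices before the
    first or after the last transmitted sample hold the nearest transmitted value.
    """
    if not transmitted:
        raise ValueError("at least one transmitted sample is required")
    first_k, first_v = transmitted[0]
    last_k, last_v = transmitted[-1]
    out = [float(first_v)] * length          # also covers indices before the first sample
    for k in range(max(last_k, 0), length):  # hold the last value after the final sample
        out[k] = float(last_v)
    for (k0, v0), (k1, v1) in zip(transmitted, transmitted[1:]):
        for k in range(max(k0, 0), min(k1, length - 1) + 1):
            t = (k - k0) / (k1 - k0)   # 0 at k0, 1 at k1
            out[k] = v0 + t * (v1 - v0)
    return out


print(reconstruct_metadata([(0, 0.0), (4, 8.0), (8, 8.0)], 9))
# [0.0, 2.0, 4.0, 6.0, 8.0, 8.0, 8.0, 8.0, 8.0]
```

With N = 4, the samples at points in time 0, 4 and 8 suffice to rebuild all nine values of this slowly changing trajectory; the larger N is chosen, the fewer samples are transmitted, at the cost of a coarser approximation of fast movements.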

For example, in embodiments, three metadata signals specify the position of an audio object in 3D space. A first one of the metadata signals may, for example, specify the azimuth angle of the position of the audio object. A second one may, for example, specify the elevation angle of the position of the audio object. A third one may, for example, specify the radius, i.e., the distance of the audio object.

As illustrated in Figure 4, azimuth angle, elevation angle and radius unambiguously define the position of an audio object in 3D space.

Figure 4 illustrates the position 410 of an audio object in three-dimensional (3D) space, expressed from an origin 400 by azimuth, elevation and radius.

The elevation angle specifies, for example, the angle between the straight line from the origin to the object position and the orthogonal projection of this straight line onto the xy-plane (the plane defined by the x-axis and the y-axis). The azimuth angle defines, for example, the angle between the x-axis and said orthogonal projection. By specifying the azimuth angle and the elevation angle, the straight line 415 through the origin 400 and the position 410 of the audio object can be defined. By additionally specifying the radius, the exact position 410 of the audio object can be defined.
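The geometry of Figure 4 corresponds to the usual spherical-to-Cartesian conversion. A minimal Python sketch, assuming angles in degrees and positive azimuth measured from the x-axis towards the y-axis (the text does not fix the direction of rotation); the function name is hypothetical.

```python
import math


def position_from_spherical(azimuth_deg, elevation_deg, radius):
    """Convert azimuth/elevation (degrees) and radius to Cartesian x, y, z."""
    az = math.radians(azimuth_deg)
    el = math.radians(elevation_deg)
    r_xy = radius * math.cos(el)   # length of the orthogonal projection onto the xy-plane
    x = r_xy * math.cos(az)        # azimuth: angle between the x-axis and that projection
    y = r_xy * math.sin(az)
    z = radius * math.sin(el)      # elevation: angle between the line and the xy-plane
    return x, y, z


x, y, z = position_from_spherical(90.0, 0.0, 2.0)
print(round(x, 6), round(y, 6), round(z, 6))  # 0.0 2.0 0.0
```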

In an embodiment, the azimuth angle is defined in the range -180° < azimuth ≤ 180°, the elevation angle in the range -90° ≤ elevation ≤ 90°, and the radius may, for example, be defined in meters [m] (greater than or equal to 0 m).

In another embodiment, where it may be assumed that all x-values of the audio object positions in an xyz-coordinate system are greater than or equal to zero, the azimuth angle may be defined in the range -90° ≤ azimuth ≤ 90°, the elevation angle in the range -90° ≤ elevation ≤ 90°, and the radius may, for example, be defined in meters [m].

In a further embodiment, the metadata signals may be scaled such that the azimuth angle is defined in the range -128° < azimuth ≤ 128°, the elevation angle in the range -32° ≤ elevation ≤ 32°, and the radius may, for example, be defined on a logarithmic scale. In some embodiments, the original metadata signals, the compressed metadata signals and the reconstructed metadata signals may comprise a scaled representation of position information and/or a scaled representation of the volume of one of the one or more audio object signals.
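For the first range convention above (-180° < azimuth ≤ 180°, -90° ≤ elevation ≤ 90°, radius ≥ 0 m), an encoder might bring incoming values into range before coding them. The following Python sketch is hypothetical: it wraps the periodic azimuth into (-180°, 180°] and rejects elevations or radii outside the stated ranges rather than guessing how to correct them.

```python
def normalize_position(azimuth_deg, elevation_deg, radius_m):
    """Bring a position into -180 < azimuth <= 180, -90 <= elevation <= 90, radius >= 0."""
    # Azimuth is periodic, so it can be wrapped without changing the direction.
    az = (azimuth_deg + 180.0) % 360.0 - 180.0   # now in [-180, 180)
    if az == -180.0:
        az = 180.0                               # the stated range excludes -180
    if not -90.0 <= elevation_deg <= 90.0:
        raise ValueError("elevation out of range")
    if radius_m < 0.0:
        raise ValueError("radius must be non-negative")
    return az, elevation_deg, radius_m


print(normalize_position(190.0, 45.0, 1.5))  # (-170.0, 45.0, 1.5)
print(normalize_position(-180.0, 0.0, 0.0))  # (180.0, 0.0, 0.0)
```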

The audio channel generator 120 may, for example, be configured to generate the one or more audio channels depending on the one or more audio object signals and depending on the reconstructed metadata signals, wherein the reconstructed metadata signals may, for example, indicate the positions of the audio objects.

Figure 5 illustrates positions of audio objects and a loudspeaker setup assumed by the audio channel generator. The origin 500 of the xyz-coordinate system is depicted, as are the position 510 of a first audio object and the position 520 of a second audio object. Furthermore, Figure 5 illustrates a scenario in which the audio channel generator 120 generates four audio channels for four loudspeakers. The audio channel generator 120 assumes that the four loudspeakers 511, 512, 513 and 514 are located at the positions shown in Figure 5.

In Fig. 5, the first audio object is located at position 510, close to the assumed positions of loudspeakers 511 and 512 and far away from loudspeakers 513 and 514. The audio channel generator 120 may therefore generate the four audio channels such that the first audio object 510 is reproduced by loudspeakers 511 and 512 but not by loudspeakers 513 and 514.

In another embodiment, the audio channel generator 120 may generate the four audio channels such that the first audio object 510 is reproduced at a high volume by loudspeakers 511 and 512 and at a low volume by loudspeakers 513 and 514.

Moreover, the second audio object is located close to the assumed positions of loudspeakers 513 and 514 and far away from loudspeakers 511 and 512. The audio channel generator 120 may therefore generate the four audio channels such that the second audio object 520 is reproduced by loudspeakers 513 and 514 but not by loudspeakers 511 and 512.

In another embodiment, the audio channel generator 120 may generate the four audio channels such that the second audio object 520 is reproduced at a high volume by loudspeakers 513 and 514 and at a low volume by loudspeakers 511 and 512.

In an alternative embodiment, only two metadata signals are used to specify the position of an audio object. For example, if all audio objects are assumed to lie in a single plane, only the azimuth and the radius may be specified.

In another embodiment, only a single metadata signal is encoded and transmitted as position information for each audio object. For example, only the azimuth may be specified as the position information of an audio object (e.g., it may be assumed that all audio objects lie in the same plane at the same distance from a center point and therefore have the same radius). The azimuth information may, for example, be used to determine that an audio object is located close to a left loudspeaker and far away from a right loudspeaker. In this case, the audio channel generator may, for example, generate the one or more audio channels such that the audio object is reproduced by the left loudspeaker but not by the right loudspeaker.

For example, Vector Base Amplitude Panning (VBAP) may be employed to determine the weight of an audio object signal within each of the audio channels of the loudspeakers (see, e.g., [12]). With respect to VBAP, it is assumed, for example, that an audio object is associated with a virtual source.
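
The VBAP weighting mentioned above can be illustrated with a minimal two-dimensional sketch for a single loudspeaker pair. It assumes that the virtual source and both loudspeakers lie in the horizontal plane, that azimuths are given in degrees, and that the weights are power-normalized; the function name `vbap_2d_pair` and these conventions are illustrative assumptions, not taken from the text or from [12].

```python
import math


def vbap_2d_pair(source_az_deg, spk1_az_deg, spk2_az_deg):
    """Power-normalized 2-D VBAP weights for one loudspeaker pair (sketch)."""
    def unit(az_deg):
        a = math.radians(az_deg)
        return math.cos(a), math.sin(a)

    p, l1, l2 = unit(source_az_deg), unit(spk1_az_deg), unit(spk2_az_deg)
    # Solve p = g1 * l1 + g2 * l2 for (g1, g2) with Cramer's rule.
    det = l1[0] * l2[1] - l2[0] * l1[1]
    g1 = (p[0] * l2[1] - l2[0] * p[1]) / det
    g2 = (l1[0] * p[1] - p[0] * l1[1]) / det
    # Normalize so that g1^2 + g2^2 = 1 (constant power).
    norm = math.hypot(g1, g2)
    return g1 / norm, g2 / norm


print([round(g, 3) for g in vbap_2d_pair(0.0, 30.0, -30.0)])  # [0.707, 0.707]
```

For a source at 0° between loudspeakers at +30° and −30°, both weights are about 0.707. A negative weight would indicate that the source lies outside the arc spanned by the pair; a complete renderer would then choose another pair, which this sketch does not do.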

In embodiments, a further metadata signal may specify the volume of each audio object, e.g., as a gain (for example, expressed in decibels [dB]).

For example, in Fig. 5, a first gain value may be specified by a further metadata signal for the first audio object at position 510, and a second gain value by a further metadata signal for the second audio object at position 520, wherein the first gain value is greater than the second gain value. In this case, loudspeakers 511 and 512 reproduce the first audio object at a higher volume than loudspeakers 513 and 514 reproduce the second audio object.
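
Assuming the gain metadata is an amplitude gain in dB using the common 20·log10 convention (the text only says the gain may be expressed in decibels), applying it to an audio object signal could look like the following sketch. The function names are hypothetical.

```python
def db_to_linear(gain_db):
    """Linear amplitude factor for a gain in dB (assumes the 20*log10 convention)."""
    return 10.0 ** (gain_db / 20.0)


def apply_gain(samples, gain_db):
    """Scale the samples of an audio object signal by a gain given in dB."""
    g = db_to_linear(gain_db)
    return [g * x for x in samples]


print(apply_gain([0.5, -0.25], 20.0))  # [5.0, -2.5]
```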

Embodiments are also based on the assumption that such gain values of the audio objects usually change slowly. It is therefore not necessary to transmit such metadata information at every point in time. Instead, metadata information is transmitted only at certain points in time. At intermediate points in time, the metadata information may, for example, be approximated from the preceding and the succeeding transmitted metadata samples, e.g., by linear interpolation of the intermediate values.

In this way, the metadata can be transmitted more efficiently.

The system comprises an apparatus 250 for generating encoded audio information comprising one or more encoded audio signals and one or more compressed metadata signals.

Moreover, the system comprises an apparatus 100 for receiving the one or more encoded audio signals and the one or more compressed metadata signals, and for generating one or more audio channels depending on the one or more encoded audio signals and depending on the one or more compressed metadata signals.

For example, if the apparatus 250 has used a SAOC encoder to encode the one or more audio objects, the apparatus 100 for generating one or more audio channels may decode the one or more encoded audio signals with a SAOC decoder according to the state of the art to obtain the one or more audio object signals.

Considering object positions merely as one example of metadata: to allow random access with a limited reinitialization time, embodiments provide a regular retransmission of all object positions.

According to an embodiment, the apparatus 100 is configured to receive random access information, wherein, for each compressed metadata signal of the one or more compressed metadata signals, the random access information indicates an accessed signal portion of said compressed metadata signal, and wherein at least one other signal portion of said compressed metadata signal is not indicated by the random access information. The metadata decoder 110 is configured to generate one of the one or more reconstructed metadata signals depending on the first metadata samples of said accessed signal portion, but not depending on any other first metadata samples of any other signal portion of said compressed metadata signal. In other words, by specifying random access information, one portion of each compressed metadata signal can be indicated, while the other portions of that compressed metadata signal are not indicated. In this case, only the indicated portion of the compressed metadata signal, and no other portion, is reconstructed as one of the reconstructed metadata signals. Reconstruction from a particular point in time is possible because the first metadata samples transmitted in the compressed metadata signal represent complete metadata information at those points in time (whereas for other points in time, no metadata information is transmitted).

Fig. 6 illustrates metadata encoding according to an embodiment. In embodiments, the metadata encoder 210 may be configured to implement the metadata encoding illustrated in Fig. 6.

In Fig. 6, s(n) may represent one of the original metadata signals. For example, s(n) may represent the azimuth of one of the audio objects as a function of time, where n indicates time (e.g., by indicating the sample position within the original metadata signal).

The time-varying trajectory component s(n), which is sampled at a sampling rate significantly lower than the audio sampling rate (e.g., at a ratio of 1:1024 or less), is quantized (see 611) and downsampled by a factor N (see 612). This results in the regularly transmitted digital signal denoted z(k).

z(k) is one of the one or more compressed metadata signals. For example, every N-th metadata sample of $\hat{s}(n)$, the quantized version of s(n), is also a metadata sample of the compressed metadata signal z(k), whereas the N−1 other metadata samples of $\hat{s}(n)$ between every N-th metadata sample are not metadata samples of z(k).

For example, assume that in s(n), n indicates time (e.g., the sample position within the original metadata signal), where n is a non-negative integer (e.g., start time: n = 0). N is the downsampling factor, e.g., N = 32 or any other suitable downsampling factor.

For example, the downsampling at 612, which obtains the compressed metadata signal z from the original metadata signal, may be understood as follows: $z(k) = \hat{s}(k \cdot N)$, where k is a non-negative integer (k = 0, 1, 2, ...).

Thus, $z(0) = \hat{s}(0)$, $z(1) = \hat{s}(32)$, $z(2) = \hat{s}(64)$, $z(3) = \hat{s}(96)$, and so on.
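
The quantization at 611 and the downsampling at 612 can be sketched as follows. The uniform quantizer and its step size are assumptions made for illustration (the text does not specify the quantizer), and the function names are hypothetical.

```python
def quantize(s, step):
    """Uniform scalar quantizer standing in for 611 (step size is hypothetical).

    Python's round() rounds exact halves to even, which is acceptable here.
    """
    return [round(x / step) for x in s]


def downsample(s_hat, N):
    """Downsampling by the factor N (612): z(k) = s_hat(k * N), k = 0, 1, 2, ..."""
    return s_hat[::N]


# A slowly rising, hypothetical azimuth trajectory s(n) in degrees, with N = 32.
s = [0.25 * n for n in range(129)]
s_hat = quantize(s, step=1.0)   # quantized trajectory s_hat(n)
z = downsample(s_hat, N=32)     # z(0) = s_hat(0), z(1) = s_hat(32), ...
print(z)  # [0, 8, 16, 24, 32]
```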

Fig. 7 illustrates metadata decoding according to an embodiment. In embodiments, the metadata decoder 110 may be configured to implement the metadata decoding illustrated in Fig. 7.

According to the embodiment illustrated in Fig. 7, the metadata decoder 110 is configured to generate each reconstructed metadata signal by upsampling one of the one or more compressed metadata signals, wherein the metadata decoder 110 is configured to generate each of the second metadata samples of each reconstructed metadata signal by conducting a linear interpolation depending on at least two of the first metadata samples of said reconstructed metadata signal.

Thus, each reconstructed metadata signal comprises all metadata samples of its compressed metadata signal (these samples are referred to as the "first metadata samples" of the one or more compressed metadata signals).

By conducting upsampling, additional ("second") metadata samples are added to the reconstructed metadata signal. The upsampling step determines the positions at which these additional ("second") metadata samples are inserted into the reconstructed metadata signal.

The values of the second metadata samples are determined by conducting linear interpolation. The linear interpolation is based on two metadata samples of the compressed metadata signal (which have become first metadata samples of the reconstructed metadata signal).

According to embodiments, the upsampling and the generation of the second metadata samples by linear interpolation may, for example, be conducted in a single step.

In Fig. 7, the upsampling (see 721), which inverts the downsampling performed by the encoder, in combination with linear interpolation (see 722) results in a coarse approximation of the original signal. The upsampling (see 721) and the linear interpolation (see 722) may be conducted in a single step.

For example, the upsampling (see 721) and the linear interpolation (see 722) on the decoder side may be conducted as follows:

$$s'(k \cdot N) = z(k), \quad k = 0, 1, 2, \ldots$$

$$s'\big((k-1) \cdot N + j\big) = \frac{N-j}{N}\, z(k-1) + \frac{j}{N}\, z(k),$$

where j is an integer with $1 \le j \le N-1$.

Here, z(k) is the most recently received metadata sample of the compressed metadata signal z, and z(k−1) is the metadata sample of z received immediately before z(k).
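
The decoder-side upsampling (721) and linear interpolation (722) can be done in a single loop, as the text notes. The following is a minimal sketch of that idea, assuming the received samples z(k) are available as a Python list; the function name is hypothetical.

```python
def upsample_interpolate(z, N):
    """Upsampling by N combined with linear interpolation (721 and 722).

    Keeps s'(k*N) = z(k) and fills the N-1 positions in between with
    s'((k-1)*N + j) = ((N - j) * z(k-1) + j * z(k)) / N, 1 <= j <= N-1.
    """
    if not z:
        return []
    s_rec = []
    for k in range(1, len(z)):
        for j in range(N):  # j = 0 reproduces the received sample z(k-1)
            s_rec.append(((N - j) * z[k - 1] + j * z[k]) / N)
    s_rec.append(float(z[-1]))  # last received sample
    return s_rec


print(upsample_interpolate([0, 32], 4))  # [0.0, 8.0, 16.0, 24.0, 32.0]
```

Forming the integer weighted sum before the single division keeps the interpolated values exact whenever they are representable as floats.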

Fig. 8 illustrates metadata encoding according to another embodiment. In embodiments, the metadata encoder 210 may be configured to implement the metadata encoding illustrated in Fig. 8.

In embodiments, as illustrated in Fig. 8, the fine structure may be specified in the metadata encoding by the encoded difference between the delay-compensated input signal and the linearly interpolated coarse approximation.

According to such embodiments, the upsampling in combination with linear interpolation is also conducted on the encoder side as part of the metadata encoding (see 621 and 622 in Fig. 6). Again, the upsampling (see 621) and the linear interpolation (see 622) may be conducted in a single step.

As described above, the metadata encoder 210 is configured to generate the one or more compressed metadata signals such that each compressed metadata signal comprises a first group of two or more of the metadata samples of one of the original metadata signals; that compressed metadata signal may be regarded as being associated with said original metadata signal.

Each metadata sample that is comprised by one of the one or more original metadata signals and that is also comprised by the compressed metadata signal associated with said original metadata signal may be regarded as one of a plurality of first metadata samples.

Moreover, each metadata sample that is comprised by one of the one or more original metadata signals but that is not comprised by the compressed metadata signal associated with said original metadata signal is one of a plurality of second metadata samples.

According to the embodiment of Fig. 8, the metadata encoder 210 is configured to generate an approximated metadata sample for each of the second metadata samples of one of the original metadata signals by conducting a linear interpolation depending on at least two of the first metadata samples of said original metadata signal.

Moreover, in the embodiment of Fig. 8, the metadata encoder 210 is configured to generate a difference value for each second metadata sample of one of the one or more original metadata signals, such that said difference value indicates the difference between said second metadata sample and the approximated metadata sample of said second metadata sample.

In a preferred embodiment, described with reference to Fig. 10, the metadata encoder 210 is configured to determine, for at least one of the difference values of the second metadata samples of one of the one or more original metadata signals, whether said difference value is greater than a threshold value.

In the embodiment of Fig. 8, the approximated metadata samples may, for example, be determined by conducting upsampling and linear interpolation on the compressed metadata signal z(k). The upsampling and the linear interpolation are conducted on the encoder side as part of the metadata encoding (see 621 and 622 in Fig. 6), in the same way as in the metadata decoding at 721 and 722:

$$s''(k \cdot N) = z(k), \quad k = 0, 1, 2, \ldots$$

$$s''\big((k-1) \cdot N + j\big) = \frac{N-j}{N}\, z(k-1) + \frac{j}{N}\, z(k),$$

where j is an integer with $1 \le j \le N-1$.

For example, in the embodiment illustrated in Fig. 8, when metadata encoding is conducted, difference values may be determined at 630 as

$$s(n) - s''(n),$$

e.g., for all n with $(k-1) \cdot N < n < k \cdot N$, or, e.g., for all n with $(k-1) \cdot N < n \le k \cdot N$.

In embodiments, one or more of these difference values are transmitted to the metadata decoder.

Fig. 9 illustrates metadata decoding according to another embodiment. In embodiments, the metadata decoder 110 may be configured to implement the metadata decoding illustrated in Fig. 9.

As described above, each reconstructed metadata signal comprises the first metadata samples of one of the one or more compressed metadata signals; that reconstructed metadata signal may be regarded as being associated with said compressed metadata signal.

In the embodiment of Fig. 9, the metadata decoder 110 is configured to generate the second metadata samples of each reconstructed metadata signal by generating a plurality of approximated metadata samples for said reconstructed metadata signal, wherein the metadata decoder 110 is configured to generate each of the approximated metadata samples depending on at least two of the first metadata samples of said reconstructed metadata signal. The approximated metadata samples may, for example, be generated by linear interpolation, as described with reference to Fig. 7.

According to the embodiment illustrated in Fig. 9, the metadata decoder 110 is configured to receive a plurality of difference values for one of the one or more compressed metadata signals. Moreover, the metadata decoder 110 is configured to add each of the difference values to one of the approximated metadata samples of the reconstructed metadata signal associated with said compressed metadata signal, to obtain the second metadata samples of said reconstructed metadata signal.

For every approximated metadata sample for which a difference value has been received, that difference value is added to the approximated metadata sample to obtain the corresponding second metadata sample.

According to an embodiment, an approximated metadata sample for which no difference value has been received is used as a second metadata sample of the reconstructed metadata signal.

According to a different embodiment, however, if no difference value has been received for an approximated metadata sample, an approximated difference value is generated for said approximated metadata sample depending on one or more received difference values, and said approximated difference value is added to said approximated metadata sample, as described below.

According to the embodiment illustrated in Fig. 9, the received difference values are added to the corresponding metadata samples of the upsampled metadata signal. Thus, wherever difference values have been transmitted, the corresponding interpolated metadata samples can be corrected, if necessary, to obtain the correct metadata sample values.
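
A minimal sketch of the Fig. 9 decoding: interpolate the received first metadata samples, then add each received difference value at its position. Positions without a received difference keep the interpolated value, i.e., the variant in which approximated samples are used as-is. The dictionary representation of the received differences and the function names are assumptions.

```python
def interpolate(z, N):
    """Upsampling with linear interpolation between consecutive z(k)."""
    out = []
    for k in range(1, len(z)):
        for j in range(N):
            out.append(((N - j) * z[k - 1] + j * z[k]) / N)
    out.append(float(z[-1]))
    return out


def reconstruct(z, diffs, N):
    """Add received difference values {n: d} to the interpolated samples."""
    s_rec = interpolate(z, N)
    for n, d in diffs.items():
        s_rec[n] += d  # correct the approximated (second) metadata sample
    return s_rec


print(reconstruct([0, 32], {1: 1, 3: -4}, N=4))  # [0.0, 9.0, 16.0, 20.0, 32.0]
```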

Referring to the metadata encoding of Fig. 8, in preferred embodiments, the number of bits used to encode the difference values is smaller than the number of bits used to encode the metadata samples. These embodiments are based on the finding that, most of the time, subsequent (e.g., N) metadata samples vary only slightly. For example, if a metadata sample is encoded with 8 bits, it can take one of 256 different values. Because subsequent (e.g., N) metadata values generally vary only slightly, encoding only the difference values, e.g., with 5 bits, may be considered sufficient. Thus, even though difference values are transmitted, the number of transmitted bits can be reduced.

In a preferred embodiment, one or more difference values are transmitted, each of which is encoded with fewer bits than each of the metadata samples, and each of which is an integer value.

According to an embodiment, the metadata encoder 210 is configured to encode each metadata sample of one of the one or more compressed metadata signals with a first number of bits, wherein each metadata sample of said compressed metadata signal represents an integer. Moreover, the metadata encoder 210 is configured to encode one or more of the difference values of the second metadata samples with a second number of bits, wherein each of these difference values represents an integer, and wherein the second number of bits is smaller than the first number of bits.

In an embodiment, the metadata samples may, for example, represent an azimuth encoded with 8 bits, where the azimuth is an integer with, for example, −90 ≤ azimuth ≤ 90. The azimuth can thus take 181 different values. If it can be assumed that subsequent (e.g., N) azimuth samples differ only slightly, e.g., by no more than ±15, then 5 bits (2^5 = 32) may be sufficient to encode the difference values. If the difference values can be represented as integers, determining the difference values automatically maps the additional values to be transmitted to a suitable value range.

For example, consider a first audio object with a first azimuth value of 60° whose subsequent azimuth values vary between 45° and 75°, and a second audio object with a second azimuth value of −30° whose subsequent azimuth values vary between −45° and −15°. The difference values for both the first and the second audio object then lie within the range from −15° to +15°, so that 5 bits are sufficient to encode each of the difference values, and a given 5-bit sequence has the same meaning for difference values of the first azimuth and difference values of the second azimuth.
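
The bit budgets in these examples can be checked with a small sketch. The text does not specify how signed difference values are mapped to bits; a two's-complement field is assumed here purely for illustration, and all function names are hypothetical.

```python
import math


def bits_needed(lo, hi):
    """Number of bits needed to index every integer in [lo, hi]."""
    return math.ceil(math.log2(hi - lo + 1))


def pack_signed(d, bits):
    """Map an integer difference to a two's-complement field of `bits` bits."""
    lo, hi = -(1 << (bits - 1)), (1 << (bits - 1)) - 1
    if not lo <= d <= hi:
        raise ValueError(f"difference {d} does not fit in {bits} bits")
    return d & ((1 << bits) - 1)


def unpack_signed(u, bits):
    """Inverse of pack_signed."""
    return u - (1 << bits) if u >= (1 << (bits - 1)) else u


print(bits_needed(-90, 90), bits_needed(-15, 15))  # 8 5
print(pack_signed(-15, 5), unpack_signed(17, 5))   # 17 -15
```

Because the same 5-bit code always denotes the same difference, the decoder can interpret difference values of both audio objects in the example without knowing their absolute azimuths.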

In an embodiment, a difference value is transmitted to the decoding side for each metadata sample that is not comprised by the compressed metadata signal. Moreover, according to an embodiment, a difference value is received and processed by the metadata decoder for each metadata sample that is not comprised by the compressed metadata signal. However, some of the preferred embodiments illustrated in Fig. 10 and Fig. 11 implement a different concept.

Fig. 10 illustrates metadata encoding according to another embodiment. In embodiments, the metadata encoder 210 may be configured to implement the metadata encoding illustrated in Fig. 10.

In some embodiments, as illustrated in Fig. 10, difference values are first determined, for example, for each metadata sample of the original metadata signal that is not comprised by the compressed metadata signal. For example, if the metadata samples at time points n = 0 and n = N are comprised by the compressed metadata signal but the metadata samples at time points n = 1 to n = N−1 are not, difference values are determined for the time points n = 1 to n = N−1.

According to the embodiment of Fig. 10, however, a polygon approximation is then conducted at 640. The metadata encoder 210 is configured to decide which of the difference values will be transmitted, and whether difference values are transmitted at all.

For example, the metadata encoder 210 may transmit only those difference values that are greater than a threshold value.

In another embodiment, the metadata encoder 210 may transmit a difference value only if the ratio of said difference value to the corresponding metadata sample is greater than a threshold value.

In an embodiment, the metadata encoder 210 checks whether the largest absolute difference value is greater than a threshold value. If so, that difference value is transmitted; otherwise, no difference value is transmitted and the check ends. The check then continues with the second-largest difference value, the third-largest difference value, and so on, until all remaining difference values are smaller than the threshold value.
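
The magnitude-ordered threshold check described above can be sketched as follows, assuming the difference values of one segment are held in a dictionary that maps sample positions to values; the function name is hypothetical.

```python
def select_differences(diffs, threshold):
    """Select difference values in order of decreasing magnitude while |d| > threshold.

    Returns the (position, value) pairs that would be transmitted.
    """
    selected = {}
    for n, d in sorted(diffs.items(), key=lambda item: abs(item[1]), reverse=True):
        if abs(d) <= threshold:
            break  # all remaining differences are at most this large
        selected[n] = d
    return selected


print(select_differences({1: 1, 2: 0, 3: -4, 4: 6}, threshold=2))  # {4: 6, 3: -4}
```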

According to embodiments, since not all difference values are necessarily transmitted, the metadata encoder 210 encodes not only the difference value itself (one of the values y1[k], ..., yN−1[k] in Fig. 10) but also information identifying the metadata sample of the original metadata signal to which the difference value relates (one of the values x1[k], ..., xN−1[k] in Fig. 10). For example, the metadata encoder may encode the point in time to which the difference value relates, e.g., as a value between 1 and N−1 indicating which metadata sample between the samples at 0 and N, which are transmitted in the compressed metadata signal, the difference value relates to. Listing the values x1[k], ..., xN−1[k] and y1[k], ..., yN−1[k] at the output of the polygon approximation does not mean that all of these values are necessarily transmitted; rather, depending on the difference values, none, one, some or all of the value pairs may be transmitted.

In an embodiment, the metadata encoder 210 may process segments of (e.g., N) consecutive difference values and approximate each segment by a polygon course formed by a variable number of quantized polygon points [x_i, y_i].

It can be expected that the average number of polygon points needed to approximate the difference signal sufficiently accurately is significantly smaller than N. Moreover, since [x_i, y_i] are small integer values, they can be encoded with a low number of bits.
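
The text does not specify the polygon approximation algorithm used at 640. One possible realization, shown here purely as an assumption, is a greedy point-insertion scheme in the spirit of the Ramer-Douglas-Peucker algorithm: starting from zero difference at n = 0 and n = N (where the samples are carried exactly by z(k)), it repeatedly adds the polygon point [x_i, y_i] at which the current polygon deviates most from the difference signal, until the deviation is at most a threshold everywhere. The function name and threshold are hypothetical.

```python
def polygon_approx(residual, threshold):
    """Greedy polygon approximation of one segment of difference values.

    `residual` holds the differences for n = 0..N; the end points are assumed
    to be zero. Returns the interior polygon points (x_i, y_i) to transmit.
    """
    N = len(residual) - 1
    points = {0: 0, N: 0}  # fixed end points of the polygon
    while True:
        xs = sorted(points)
        worst_n, worst_err = None, threshold
        for a, b in zip(xs, xs[1:]):
            for n in range(a + 1, b):
                # value of the current polygon at n (linear between a and b)
                poly = points[a] + (points[b] - points[a]) * (n - a) / (b - a)
                err = abs(residual[n] - poly)
                if err > worst_err:
                    worst_n, worst_err = n, err
        if worst_n is None:  # polygon is within the threshold everywhere
            break
        points[worst_n] = residual[worst_n]
    return [(x, points[x]) for x in sorted(points) if 0 < x < N]


print(polygon_approx([0, 1, 0, -4, 0], threshold=2))  # [(2, 0), (3, -4)]
```

Each iteration inserts a point with zero error, so the loop terminates after at most N−1 insertions; a smaller threshold yields more polygon points and a more accurate difference signal.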

Fig. 11 illustrates metadata decoding according to another embodiment. In embodiments, the metadata decoder 110 may be configured to implement the metadata decoding illustrated in Fig. 11.

In embodiments, the metadata decoder 110 receives some difference values and adds them, at 730, to the corresponding linearly interpolated metadata samples.

In some embodiments, the metadata decoder 110 adds only the received difference values to the corresponding linearly interpolated metadata samples at 730 and leaves the other linearly interpolated metadata samples, for which no difference value has been received, unchanged.

However, embodiments also realize another concept, described below.

According to such embodiments, the metadata decoder 110 is configured to receive a plurality of difference values for one of the one or more compressed metadata signals; each of these may be referred to as a "received difference value". Each received difference value is assigned to one of the approximated metadata samples of the reconstructed metadata signal associated with said compressed metadata signal.

As already described with reference to Fig. 9, the metadata decoder 110 is configured to add each of the received difference values to the approximated metadata sample to which it is assigned. One of the second metadata samples of the reconstructed metadata signal is obtained by adding a received difference value to its approximated metadata sample.

For some approximated metadata samples, however, typically no difference value is received.

In some embodiments, for each approximated metadata sample of the reconstructed metadata signal associated with the compressed metadata signal to which none of the received difference values is assigned, the metadata decoder 110 may, for example, determine an approximated difference value depending on at least one of the received difference values.

In other words, for every approximated metadata sample for which no difference value has been received, an approximated difference value is still generated from at least one of the received difference values.

The metadata decoder 110 is configured to add each approximated difference value to the approximated metadata sample to which it belongs, thereby obtaining further second metadata samples of the reconstructed metadata signal.

In a further embodiment, however, the metadata decoder 110 approximates the difference values that were not received by linearly interpolating, for the corresponding metadata samples, between the difference values received in step 740.

For example, if a first difference value and a second difference value are received, the difference values lying between them can be approximated, e.g., by linear interpolation.

For example, if the first received difference value at time index n=15 is d[15]=5 and the second received difference value at time index n=18 is d[18]=2, then the difference values at n=16 and n=17 can be linearly approximated as d[16]=4 and d[17]=3.
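The arithmetic of this example can be checked with a short sketch that fills the gaps between received difference values by linear interpolation. The function name and the dictionary representation of d[n] are assumptions made for illustration only.

```python
def interpolate_differences(received: dict[int, float]) -> dict[int, float]:
    """Fill the gaps between received difference values by linear interpolation."""
    keys = sorted(received)
    diffs = dict(received)
    for a, b in zip(keys, keys[1:]):
        for n in range(a + 1, b):
            diffs[n] = received[a] + (received[b] - received[a]) * (n - a) / (b - a)
    return diffs


d = interpolate_differences({15: 5.0, 18: 2.0})
print(d[16], d[17])  # 4.0 3.0
```

Between n=15 and n=18 the difference drops by 3 over 3 steps, so each intermediate step decreases it by 1.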

In another embodiment, the difference values at those metadata samples that are contained in the compressed metadata signal are assumed to be 0, and the metadata decoder performs the linear interpolation of the non-received difference values using these zero-valued differences as additional support points.

For example, if a single difference value d=8 is transmitted at n=16, and the metadata samples at n=0 and n=32 are transmitted within the compressed metadata signal, then the non-transmitted difference values at n=0 and n=32 are assumed to be 0.

Let n denote time and let d[n] denote the difference value at time index n. Then:

d[16]=8 (received difference value)

d[0]=0 (assumed difference value, since a metadata sample is present in z(k) at this position)

d[32]=0 (assumed difference value, since a metadata sample is present in z(k) at this position)

The approximated difference values are then: d[1]=0.5; d[2]=1; d[3]=1.5; d[4]=2; d[5]=2.5; d[6]=3; d[7]=3.5; d[8]=4; d[9]=4.5; d[10]=5; d[11]=5.5; d[12]=6; d[13]=6.5; d[14]=7; d[15]=7.5; d[17]=7.5; d[18]=7; d[19]=6.5; d[20]=6; d[21]=5.5; d[22]=5; d[23]=4.5; d[24]=4; d[25]=3.5; d[26]=3; d[27]=2.5; d[28]=2; d[29]=1.5; d[30]=1; d[31]=0.5. That is, d[n] = n/2 for 1 <= n <= 15 and d[n] = (32 - n)/2 for 17 <= n <= 31.

In an embodiment, the approximated difference values are added to the corresponding linearly interpolated sample values (of step 730).
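The complete procedure of the zero-anchored variant can be sketched as follows, assuming unquantized values and hypothetical function names: the coarse trajectory is interpolated from z(k), the differences are assumed to be 0 wherever z(k) holds a sample, the remaining differences are interpolated, and both curves are added.

```python
def linear_fill(points: dict[int, float], length: int) -> list[float]:
    """Piecewise-linear curve through the given (index -> value) support points."""
    keys = sorted(points)
    out = [0.0] * length
    for k in keys:
        out[k] = points[k]
    for a, b in zip(keys, keys[1:]):
        for n in range(a + 1, b):
            out[n] = points[a] + (points[b] - points[a]) * (n - a) / (b - a)
    return out


def reconstruct(z: dict[int, float], received: dict[int, float], length: int) -> list[float]:
    coarse = linear_fill(z, length)        # interpolated metadata samples (cf. step 730)
    anchors = {n: 0.0 for n in z}          # differences assumed 0 where z(k) holds a sample
    anchors.update(received)               # received differences, e.g. d[16] = 8
    diffs = linear_fill(anchors, length)   # approximated differences for all other n
    return [c + d for c, d in zip(coarse, diffs)]


# Example from the text: samples at n=0 and n=32, one received difference d[16]=8.
print(reconstruct({0: 10.0, 32: 42.0}, {16: 8.0}, 33)[16])  # 34.0
```

With z(0)=10 and z(32)=42 the coarse trajectory is 10+n, so the reconstructed value at n=16 is 26+8=34, and at n=8 it is 18+4=22.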

A preferred embodiment is described below.

The (object) metadata encoder may use a look-ahead buffer of a given size N to encode a sequence of regularly (sub)sampled trajectory values. Once the buffer is full, the entire data block is encoded and transmitted. The encoded object data may consist of two parts: the intracoded object data and an optional differential data part containing the fine structure of each segment.

The intracoded object data contain the quantized values z(k), sampled on a regular grid (e.g., every 32 audio frames of length 1024). Boolean variables may be used to indicate whether the values are specified individually for each object or whether a single value applies to all objects.

The decoder can derive a coarse trajectory from the intracoded object data by linear interpolation. The fine structure of the trajectory is given by the differential data part, which contains the encoded differences between the input trajectory and the linear interpolation. A polygon representation, combined with different quantization step sizes for azimuth, elevation and radius, yields the desired irrelevance reduction.
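The two-part representation can be illustrated with a minimal encoder-side sketch. It assumes an unquantized trajectory whose length is a multiple of the grid spacing plus one, and it uses a small grid spacing for readability instead of 32 frames. The function name is hypothetical.

```python
def split_trajectory(traj: list[float], period: int) -> tuple[list[float], list[float]]:
    """Return intracoded grid samples z(k) and the residual to their linear interpolation."""
    if (len(traj) - 1) % period != 0:
        raise ValueError("sketch assumes len(traj) == K * period + 1")
    z = traj[::period]                       # samples on the regular grid
    coarse = []
    for k in range(len(z) - 1):
        for j in range(period):
            coarse.append(z[k] + (z[k + 1] - z[k]) * j / period)
    coarse.append(z[-1])
    residual = [x - c for x, c in zip(traj, coarse)]  # fine structure
    return z, residual


z, res = split_trajectory([0.0, 1.0, 5.0, 3.0, 4.0], period=4)
print(z, res)  # [0.0, 4.0] [0.0, 0.0, 3.0, 0.0, 0.0]
```

In the scheme described above, the residual would not be sent sample by sample; it is approximated by a few polygon points and quantized.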

The polygon representation may be obtained from a non-recursive variant of the Douglas-Peucker algorithm [10, 11], which differs from the original approach by an additional break condition in its loop.
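The exact break condition is not specified here, so the following is only a sketch of one possible non-recursive Douglas-Peucker variant. It assumes that the error is measured as the vertical distance to the linear interpolation (matching the differential data above), and that the additional break condition is a maximum number of polygon points. Instead of recursing, it repeatedly inserts the worst-fitting sample until the tolerance is met or the point budget is used up.

```python
def simplify(values: list[float], tol: float, max_points: int) -> list[int]:
    """Return sorted indices of the retained polygon points."""
    kept = [0, len(values) - 1]
    while True:
        best_err, best_idx = 0.0, None
        for a, b in zip(kept, kept[1:]):
            for n in range(a + 1, b):
                interp = values[a] + (values[b] - values[a]) * (n - a) / (b - a)
                err = abs(values[n] - interp)
                if err > best_err:
                    best_err, best_idx = err, n
        # Stop when everything is within tolerance, or (extra break) when the budget is spent.
        if best_idx is None or best_err <= tol or len(kept) >= max_points:
            break
        kept.append(best_idx)
        kept.sort()
    return kept


print(simplify([0, 1, 2, 3, 10, 5, 0], tol=0.5, max_points=10))  # [0, 3, 4, 6]
```

With max_points=3 the same input stops after the peak at n=4 has been added, giving [0, 4, 6].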

The resulting polygon points may be encoded in the differential data part using a variable word length that is specified within the bitstream. Additional Boolean variables indicate the common coding of identical values.

According to embodiments, the object data frames and their symbolic representation are described below.

To improve efficiency, a sequence of regularly (sub)sampled trajectory values is coded jointly. The encoder may use a look-ahead buffer of a given size; once the buffer is full, the entire data block is encoded and transmitted. The encoded object data (e.g., the payload of the object metadata) may, for example, comprise two parts: the intracoded object data (first part) and the optional differential data (second part).

For example, some or all parts of the following syntax may be used:

The intracoded object data are described below according to an embodiment. To support random access to the encoded object metadata, a complete and self-contained specification of all object metadata has to be transmitted regularly. This is realized by intracoded object data ("I-frames"), which contain quantized values sampled on a regular grid (e.g., every 32 frames of length 1024). The I-frame has the following syntax, in which position_azimuth, position_elevation, position_radius and gain_factor specify the quantized values iframe_period frames after the current I-frame.

The differential object data are described below according to an embodiment.

A more accurate approximation is achieved by transmitting polygon approximations with a small number of sampling points. Hence, a very sparse three-dimensional matrix is transmitted, in which the first dimension may be the object index, the second dimension may be the metadata component (azimuth, elevation, radius, gain), and the third dimension may be the frame index of the polygon sampling points. Without further measures, merely indicating which elements of this matrix carry a value already requires num_objects * num_components * (iframe_period-1) bits. A first step to reduce this number of bits is to add four flags that indicate whether at least one value is present for each of the four components; for example, differential radius or gain values can be expected to occur only rarely. The third dimension of the reduced three-dimensional matrix is a vector with iframe_period-1 elements. If only a few polygon points are expected, it is more efficient to parameterize this vector by a set of frame indices and the cardinality of that set. For example, for an iframe_period of N_period = 32 frames and a maximum of 16 polygon points, this method is advantageous for N_points < (32 - log2(16)) / log2(32) = 5.6 polygon points. According to embodiments, the following syntax is used for such a coding scheme:
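The 5.6 threshold can be derived as follows, assuming, as the formula above does, that the bitfield costs N_period = 32 bits, that the number of points is signalled with log2(16) bits, and that each frame index costs log2(32) bits:

```latex
\log_2 16 + N_\mathrm{points}\,\log_2 32 < N_\mathrm{period} = 32
\quad\Longleftrightarrow\quad
N_\mathrm{points} < \frac{32 - \log_2 16}{\log_2 32} = \frac{32 - 4}{5} = 5.6
```

Hence the index list is cheaper for up to 5 polygon points (4 + 5 * 5 = 29 bits), while 6 points (4 + 6 * 5 = 34 bits) already cost more than the bitfield.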

The macro offset_data() encodes the positions (frame offsets) of the polygon points, either as a simple bitfield or using the concept described above. The num_bits values allow large position jumps to be encoded, while the remainder of the differential data is coded with a smaller word size.

In particular, in an embodiment, the macros may, for example, have the following meaning. According to an embodiment, the parameters of object_metadata() are defined as follows:

has_differential_metadata: indicates whether differential object metadata are present.

According to an embodiment, the parameters of intracoded_object_metadata() are defined as follows:

ifperiod: defines the number of frames between independent frames.

common_azimuth: indicates whether a common azimuth angle is used for all objects.

default_azimuth: defines the value of the common azimuth angle.

position_azimuth: if there is no common azimuth value, a value is transmitted for each object.

common_elevation: indicates whether a common elevation angle is used for all objects.

default_elevation: defines the value of the common elevation angle.

position_elevation: if there is no common elevation value, a value is transmitted for each object.

common_radius: indicates whether a common radius value is used for all objects.

default_radius: defines the value of the common radius.

position_radius: if there is no common radius value, a value is transmitted for each object.

common_gain: indicates whether a common gain value is used for all objects.

default_gain: defines the value of the common gain factor.

gain_factor: if there is no common gain factor value, a value is transmitted for each object.

position_azimuth: if there is only one object, this is its azimuth.

position_elevation: if there is only one object, this is its elevation.

position_radius: if there is only one object, this is its radius.

gain_factor: if there is only one object, this is its gain factor.
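The common/default/per-object logic of intracoded_object_metadata() can be sketched for a single component (e.g., azimuth) as follows. The dictionary keys only loosely mirror common_*, default_* and position_*; they are hypothetical and do not reproduce the actual bitstream syntax.

```python
def encode_component(values: list[int]) -> dict:
    """Choose between a single object value, a common value, or per-object values."""
    if len(values) == 1:
        return {"position": values[0]}             # only one object: send its value
    if all(v == values[0] for v in values):
        return {"common": 1, "default": values[0]}  # one value shared by all objects
    return {"common": 0, "position": list(values)}  # one value per object


def decode_component(fields: dict, num_objects: int) -> list[int]:
    if num_objects == 1:
        return [fields["position"]]
    if fields["common"]:
        return [fields["default"]] * num_objects
    return list(fields["position"])


print(encode_component([30, 30, 30]))  # {'common': 1, 'default': 30}
```

Signalling a common value replaces one value per object by a single flag plus one value, which pays off whenever all objects share, for example, the same radius or gain.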

According to an embodiment, the parameters of differential_object_metadata() are defined as follows:

bits_per_point: number of bits required to represent the polygon points.

fixed_azimuth: flag indicating whether the azimuth value is fixed for all objects.

flag_azimuth: per-object flag indicating whether the azimuth value changes.

nbits_azimuth: number of bits required to represent the difference values.

differential_azimuth: difference between the linearly interpolated value and the actual value.

fixed_elevation: flag indicating whether the elevation value is fixed for all objects.

flag_elevation: per-object flag indicating whether the elevation value changes.

nbits_elevation: number of bits required to represent the difference values.

differential_elevation: difference between the linearly interpolated value and the actual value.

fixed_radius: flag indicating whether the radius is fixed for all objects.

flag_radius: per-object flag indicating whether the radius changes.

nbits_radius: number of bits required to represent the difference values.

differential_radius: difference between the linearly interpolated value and the actual value.

fixed_gain: flag indicating whether the gain factor is fixed for all objects.

flag_gain: per-object flag indicating whether the gain factor changes.

nbits_gain: number of bits required to represent the difference values.

differential_gain: difference between the linearly interpolated value and the actual value.
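How an encoder might derive the fixed_*, flag_* and nbits_* values for one component can be sketched as follows. The text does not state how the difference values are represented, so this sketch assumes signed integers in two's complement; the function names are hypothetical.

```python
def nbits_signed(diffs: list[int]) -> int:
    """Smallest two's-complement word length that holds every difference value."""
    n = 1
    while not all(-(1 << (n - 1)) <= d <= (1 << (n - 1)) - 1 for d in diffs):
        n += 1
    return n


def component_flags(diffs_per_object: list[list[int]]) -> tuple[bool, list[bool]]:
    """Return (fixed_*, per-object flag_*) for one metadata component."""
    flags = [any(d != 0 for d in diffs) for diffs in diffs_per_object]
    return not any(flags), flags


azimuth_diffs = [[0, 0, 0], [3, -4, 1]]
print(component_flags(azimuth_diffs))  # (False, [False, True])
print(nbits_signed(azimuth_diffs[1]))  # 3
```

Only objects whose flag is set would then carry differential values, each coded with the signalled word length.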

According to an embodiment, the parameters of offset_data() are defined as follows:

bitfield_syntax: flag indicating whether a vector with the polygon indices is present in the bitstream.

offset_bitfield: Boolean array containing, for each point of iframe_period, a flag indicating whether it is a polygon point.

npoints: number of polygon points minus 1 (num_points = npoints + 1).

foffset: time slice index of the polygon points within frame_period (frame_offset = foffset + 1).
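The choice made by offset_data() between a bitfield and an explicit index list can be sketched as below. This is a hypothetical writer/reader pair, not the normative syntax: the key use_bitfield stands in for the signalling flag, the bitfield is assumed to cover frame positions 1 to iframe_period-1 (the iframe_period-1 elements mentioned above), and the cost rule rounds log2 up to whole bits.

```python
import math


def encode_offsets(frame_offsets: list[int], iframe_period: int, max_points: int = 16) -> dict:
    """Hypothetical offset_data() writer: bitfield or index list, whichever is cheaper."""
    if not frame_offsets:
        raise ValueError("sketch assumes at least one polygon point")
    n = len(frame_offsets)
    bitfield_bits = iframe_period - 1  # one flag per frame position 1..iframe_period-1
    index_bits = math.ceil(math.log2(max_points)) + n * math.ceil(math.log2(iframe_period))
    if bitfield_bits <= index_bits:
        flags = [int(f in frame_offsets) for f in range(1, iframe_period)]
        return {"use_bitfield": True, "offset_bitfield": flags}
    return {"use_bitfield": False,
            "npoints": n - 1,                            # num_points = npoints + 1
            "foffset": [f - 1 for f in frame_offsets]}   # frame_offset = foffset + 1


def decode_offsets(fields: dict) -> list[int]:
    if fields["use_bitfield"]:
        return [i + 1 for i, flag in enumerate(fields["offset_bitfield"]) if flag]
    assert len(fields["foffset"]) == fields["npoints"] + 1
    return [f + 1 for f in fields["foffset"]]


print(encode_offsets([3, 10], iframe_period=32))  # {'use_bitfield': False, 'npoints': 1, 'foffset': [2, 9]}
```

Storing npoints and foffset minus one means that no code word is wasted on zero points or a zero offset.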

According to an embodiment, the metadata may, for example, be conveyed as the given position (e.g., indicated by azimuth, elevation and radius) of each audio object at defined timestamps.

In the prior art, there is no flexible technique that combines channel coding on the one hand and object coding on the other hand such that acceptable audio quality is obtained at low bit rates.

The 3D audio codec system described below overcomes this limitation.

Fig. 12 illustrates a 3D audio encoder according to an embodiment of the present invention. The 3D audio encoder is configured to encode audio input data 101 to obtain audio output data 501. The 3D audio encoder comprises an input interface for receiving a plurality of audio channels, indicated by CH, and a plurality of audio objects, indicated by OBJ. Furthermore, as illustrated in Fig. 12, the input interface 1100 additionally receives metadata related to one or more of the audio objects OBJ. The 3D audio encoder also comprises a mixer 200 for mixing the objects and the channels to obtain pre-mixed channels, wherein each pre-mixed channel comprises audio data of a channel and audio data of at least one object.

Furthermore, the 3D audio encoder comprises a core encoder 300 for core encoding its input data, and a metadata compressor 400 for compressing the metadata related to one or more of the audio objects.

Furthermore, the 3D audio encoder may comprise a mode controller 600 for controlling the mixer, the core encoder and/or an output interface 500 in one of several operation modes. In the first mode, the core encoder encodes the audio channels and the audio objects received by the input interface 1100 without any interaction by the mixer, i.e., without mixing by the mixer 200. In the second mode, however, the mixer 200 is active, and the core encoder encodes the mixed channels, i.e., the output generated by block 200. In the latter case, it is preferred not to encode any object data anymore. Instead, the metadata indicating the positions of the audio objects is already used by the mixer 200 to render the objects onto the channels as indicated by the metadata. In other words, the mixer 200 uses the metadata related to the audio objects to pre-render the audio objects, and the pre-rendered audio objects are then mixed with the channels to obtain mixed channels at the output of the mixer. In this embodiment, no objects need to be transmitted, and the same applies to the compressed metadata output by block 400.
However, if not all objects input into the interface 1100 are mixed but only a certain number of objects is mixed, then the remaining non-mixed objects and their associated metadata are still transmitted to the core encoder 300 and the metadata compressor 400, respectively.
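The routing of the first and second modes can be sketched as follows. How the mixer 200 derives rendering gains from the object metadata is not specified here; the sketch simply assumes precomputed per-channel gains per object (e.g., from some panning law), and all names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class AudioObject:
    samples: list[float]
    gains: list[float]  # hypothetical per-channel rendering gains derived from the object metadata


def prerender(channels: list[list[float]], objects: list[AudioObject]) -> list[list[float]]:
    """Mix the objects into copies of the channels using their rendering gains."""
    mixed = [list(ch) for ch in channels]
    for obj in objects:
        for c, g in enumerate(obj.gains):
            for i, s in enumerate(obj.samples):
                mixed[c][i] += g * s
    return mixed


def route(mode: int, channels, objects, n_prerender: int):
    """Return (core encoder input channels, objects still coded separately)."""
    if mode == 1:  # first mode: mixer bypassed, channels and objects coded individually
        return channels, objects
    if mode == 2:  # second mode: the first n_prerender objects are rendered into the channels
        return prerender(channels, objects[:n_prerender]), objects[n_prerender:]
    raise ValueError("only the first and second modes are sketched here")


obj = AudioObject(samples=[1.0, 2.0], gains=[0.5, 1.0])
print(route(2, [[0.0, 0.0], [0.0, 0.0]], [obj], n_prerender=1))  # ([[0.5, 1.0], [1.0, 2.0]], [])
```

Only the objects left in the second return value, and their metadata, would still go to the core encoder 300 and the metadata compressor 400.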

According to one of the embodiments described above, the metadata compressor 400 in Fig. 12 is the metadata encoder 210 of the apparatus 250 for generating encoded audio information. Furthermore, according to one of the embodiments described above, the mixer 200 and the core encoder 300 in Fig. 12 together form the audio encoder 220 of the apparatus 250 for generating encoded audio information.

Fig. 14 illustrates a further embodiment of a 3D audio encoder. This 3D audio encoder additionally comprises an SAOC encoder 800 for generating one or more transport channels and parametric data from spatial audio object encoder input data. As illustrated in Fig. 14, the spatial audio object encoder input data are objects that have not been processed by the prerenderer/mixer. Alternatively, when the prerenderer/mixer is bypassed, as in the first mode in which individual channel/object coding is active, all objects input into the input interface 1100 are encoded by the SAOC encoder 800.

Furthermore, as illustrated in Fig. 14, the core encoder 300 is preferably implemented as a USAC encoder, i.e., as an encoder as defined and standardized in the MPEG-USAC standard (USAC = Unified Speech and Audio Coding). The output of the entire 3D audio encoder illustrated in Fig. 14 is an MPEG-4 data stream with container-like structures for the individual data types. Furthermore, the metadata is denoted as "OAM" data, and the metadata compressor 400 of Fig. 12 corresponds to the OAM encoder 400, which produces compressed OAM data that are input into the USAC encoder 300. As illustrated in Fig. 14, the USAC encoder 300 further comprises the output interface for obtaining the MP4 output data stream containing both the encoded channel/object data and the compressed OAM data.

According to one of the embodiments described above, the OAM encoder 400 in Fig. 14 is the metadata encoder 210 of the apparatus 250 for generating encoded audio information. Furthermore, according to one of the embodiments described above, the SAOC encoder 800 and the USAC encoder 300 in Fig. 14 together form the audio encoder 220 of the apparatus 250 for generating encoded audio information.

Fig. 16 illustrates a further embodiment of a 3D audio encoder. In contrast to Fig. 14, the SAOC encoder can be configured either to encode, using the SAOC encoding algorithm, the channels provided at the prerenderer/mixer 200 when the prerenderer/mixer is not active in this mode, or to SAOC-encode the pre-rendered channels plus objects. Thus, the SAOC encoder 800 in Fig. 16 can operate on three different kinds of input data: channels without any pre-rendered objects, channels plus pre-rendered objects, or objects alone. Furthermore, an additional OAM decoder 420 is preferably provided in Fig. 16, so that the SAOC encoder 800 processes the same data as the decoder side, i.e., data obtained by lossy compression, rather than the original OAM data.

The 3D audio encoder of Fig. 16 can operate in several individual modes.

In addition to the first and second modes described in the context of Fig. 12, the 3D audio encoder of Fig. 16 can additionally operate in a third mode, in which the core encoder generates one or more transport channels from the individual objects while the prerenderer/mixer 200 is not active. Alternatively or additionally, in this third mode, the SAOC encoder can generate one or more alternative or additional transport channels from the original signals, again while the prerenderer/mixer 200, corresponding to the mixer 200 of Fig. 12, is not active.

Finally, when the 3D audio encoder operates in a fourth mode, the SAOC encoder 800 can encode the channels plus the pre-rendered objects generated by the prerenderer/mixer. Thus, in the fourth mode, the lowest-bit-rate applications will provide good quality, because the channels and objects have been completely transformed into individual SAOC transport channels and the associated side information, indicated as "SAOC-SI" in Figs. 3 and 5. In addition, no compressed metadata has to be transmitted in this fourth mode.

According to one of the embodiments described above, the OAM encoder 400 in Fig. 16 is the metadata encoder 210 of the apparatus 250 for generating encoded audio information. Furthermore, according to one of the embodiments described above, the SAOC encoder 800 and the USAC encoder 300 in Fig. 16 together form the audio encoder 220 of the apparatus 250 for generating encoded audio information.

According to a further embodiment, an apparatus for encoding audio input data 101 to obtain audio output data 501 is provided. It comprises: an input interface 1100 for receiving a plurality of audio channels, a plurality of audio objects and metadata related to one or more of the audio objects; a mixer 200 for mixing the objects and the channels to obtain a plurality of pre-mixed channels, each pre-mixed channel comprising audio data of a channel and audio data of at least one object; and an apparatus 250 for generating encoded audio information, comprising a metadata encoder and an audio encoder.

The audio encoder 220 of the apparatus 250 is a core encoder 300 for core encoding core encoder input data.

The metadata encoder 210 of the apparatus 250 for generating encoded audio information is a metadata compressor 400 for compressing the metadata related to one or more of the audio objects.

Fig. 13 illustrates a 3D audio decoder according to an embodiment of the present invention. The 3D audio decoder receives encoded audio data, i.e., the data 501 of Fig. 12, as input.

The 3D audio decoder comprises a metadata decompressor 1400, a core decoder 1300, an object processor 1200, a mode controller 1600 and a post-processor 1700.

Specifically, the 3D audio decoder is configured to decode encoded audio data, and its input interface is configured to receive the encoded audio data, which comprise a plurality of encoded channels, a plurality of encoded objects and, in a certain mode, compressed metadata associated with the objects.

Furthermore, the core decoder 1300 is configured to decode the encoded channels and the encoded objects, and the metadata decompressor is configured to decompress the compressed metadata.

Furthermore, the object processor 1200 is configured to process the decoded objects generated by the core decoder 1300 using the decompressed metadata, to obtain a predetermined number of output channels comprising the object data and the decoded channels. These output channels, indicated at 1205, are then input into the post-processor 1700. The post-processor 1700 is configured to convert the output channels 1205 into a certain output format, which may be a binaural output format or a loudspeaker output format such as 5.1 or 7.1.

Preferably, the 3D audio decoder comprises a mode controller 1600 for analyzing the encoded data to detect a mode indication. Therefore, the mode controller 1600 is connected to the input interface 1100 in Fig. 13. However, a mode controller is not strictly necessary here. Instead, the adjustable audio decoder can be preset by any other kind of control data, such as a user input or any other control. Preferably, the 3D audio decoder in Fig. 13, controlled by the mode controller 1600, is configured to bypass the object processor and to feed the decoded channels into the post-processor 1700. This applies in the second mode, i.e., when only pre-rendered channels are received because the 3D audio encoder of Fig. 12 has operated in the second mode. Alternatively, when the first mode has been applied in the 3D audio encoder, i.e., when the 3D audio encoder has performed individual channel/object coding, the object processor 1200 is not bypassed, and the decoded channels and the decoded objects are fed into the object processor 1200 together with the decompressed metadata generated by the metadata decompressor 1400.

Preferably, the indication of whether the first or the second mode is to be applied is included in the encoded audio data, and the mode controller 1600 analyzes the encoded data to detect this mode indication. The first mode is used when the mode indication indicates that the encoded audio data comprise encoded channels and encoded objects; the second mode is used when the mode indication indicates that the encoded audio data do not contain any audio objects, i.e., contain only the pre-rendered channels obtained by the second mode of the 3D audio encoder of Fig. 12.
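The decoder-side routing controlled by the mode indication can be sketched as follows. The object processor and post-processor are passed in as plain callables; their signatures and the integer mode values are assumptions for illustration, not the actual decoder interfaces.

```python
from typing import Callable

Channels = list[list[float]]


def decode_frame(mode: int, channels: Channels, objects: Channels, metadata: list[dict],
                 object_processor: Callable[[Channels, Channels, list[dict]], Channels],
                 post_processor: Callable[[Channels], Channels]) -> Channels:
    """Route the core-decoded signals according to the mode indication."""
    if mode == 1:
        # First mode: decoded channels, decoded objects and decompressed metadata
        # are fed into the object processor (1200).
        rendered = object_processor(channels, objects, metadata)
    elif mode == 2:
        # Second mode: only pre-rendered channels were received; bypass the object processor.
        rendered = channels
    else:
        raise ValueError("unknown mode indication")
    return post_processor(rendered)  # e.g. format conversion or binaural rendering (1700)


identity = lambda x: x
print(decode_frame(2, [[1.0]], [], [], lambda c, o, m: None, identity))  # [[1.0]]
```

In the second mode the object processor is never called, so the dummy processor returning None has no effect on the result.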

In Fig. 13, the metadata decompressor 1400 is the metadata decoder 110 of the apparatus 100 for generating one or more audio channels according to one of the embodiments described above. Moreover, the core decoder 1300, the object processor 1200 and the post-processor 1700 of Fig. 13 together form the audio channel generator 120 of the apparatus 100 for generating one or more audio channels according to one of the embodiments described above.

Fig. 15 illustrates a further embodiment of the 3D audio decoder of Fig. 13; the embodiment of Fig. 15 corresponds to the 3D audio encoder of Fig. 14. In addition to the 3D audio decoder implementation of Fig. 13, the 3D audio decoder of Fig. 15 comprises an SAOC decoder 1800. Furthermore, the object processor 1200 of Fig. 13 is implemented as a separate object renderer 1210 and a mixer 1220, while, depending on the mode, the functionality of the object renderer 1210 can also be implemented by the SAOC decoder 1800.

Furthermore, the post-processor 1700 can be implemented as a binaural renderer 1710 or a format converter 1720. Alternatively, a direct output of the data 1205 of Fig. 13 can also be implemented, as illustrated by 1730. For flexibility, it is therefore preferred to perform the processing within the decoder on a larger number of channels, such as 22.2 or 32, and then to post-process if a smaller format is required. However, when it is clear from the outset that only a small format, such as a 5.1 format, is required, then preferably, as indicated by the shortcut 1727 of Fig. 13 or Fig. 6, a specific control across the SAOC decoder and/or the USAC decoder can be applied in order to avoid unnecessary upmixing operations and subsequent downmixing operations.

In a preferred embodiment of the present invention, the object processor 1200 comprises the SAOC decoder 1800, which is configured to decode one or more transport channels output by the core decoder together with the associated parametric data, and to use the decompressed metadata to obtain a plurality of rendered audio objects. To this end, the OAM output is connected to block 1800.

Furthermore, the object processor 1200 is configured to render decoded objects output by the core decoder that are not encoded in SAOC transport channels but are individually encoded in typical single channel elements, as indicated by the object renderer 1210. Furthermore, the decoder comprises an output interface, corresponding to the output 1730, for outputting an output of the mixer to the loudspeakers.

In a further embodiment, the object processor 1200 comprises a spatial audio object coding decoder 1800 for decoding one or more transport channels and associated parametric side information representing encoded audio signals or encoded audio channels, wherein the spatial audio object coding decoder is configured to transcode the associated parametric information and the decompressed metadata into transcoded parametric side information that can be used for directly rendering the output format, as defined, for example, in an earlier version of SAOC. The post-processor 1700 is configured to calculate the audio channels of the output format using the decoded transport channels and the transcoded parametric side information. The processing performed by the post-processor may be similar to MPEG Surround processing or may be any other processing, such as BCC processing.

In a further embodiment, the object processor 1200 comprises a spatial audio object coding decoder 1800 configured to directly upmix and render the signals for the output format using the transport channels decoded by the core decoder and the parametric side information.

Furthermore, and importantly, the object processor 1200 of Fig. 13 additionally comprises the mixer 1220. When pre-rendered objects mixed with channels exist, i.e., when the mixer 200 of Fig. 12 was active, the mixer 1220 directly receives the data output by the USAC decoder 1300 as an input. Additionally, the mixer 1220 receives, from the object renderer performing the object rendering, the data that has not been SAOC-decoded. Furthermore, the mixer receives the SAOC decoder output data, i.e., the SAOC-rendered objects.

The mixer 1220 is connected to the output interface 1730, the binaural renderer 1710 and the format converter 1720. The binaural renderer 1710 is configured to render the output channels into two binaural channels using head-related transfer functions or binaural room impulse responses (BRIR). The format converter 1720 is configured to convert the output channels into an output format having fewer channels than the output channels 1205 of the mixer; for this, the format converter 1720 requires information on the reproduction layout, such as a 5.1 loudspeaker setup.
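As a rough illustration of the two post-processing paths, the sketch below renders each output channel binaurally by convolving it with a left/right impulse-response pair and summing the results, and performs format conversion as a plain downmix-matrix multiplication. The function names, the per-channel HRIR/BRIR pairs and the downmix gains are hypothetical inputs chosen for illustration; an actual binaural renderer 1710 or format converter 1720 may use considerably more elaborate processing.

```python
from typing import List, Sequence, Tuple

Signal = List[float]


def convolve(x: Sequence[float], h: Sequence[float]) -> Signal:
    """Full linear convolution of signal x with impulse response h."""
    y = [0.0] * (len(x) + len(h) - 1)
    for n, xn in enumerate(x):
        for k, hk in enumerate(h):
            y[n + k] += xn * hk
    return y


def _accumulate(acc: Signal, sig: Signal) -> None:
    """Add sig onto acc in place, growing acc if sig is longer."""
    if len(sig) > len(acc):
        acc.extend([0.0] * (len(sig) - len(acc)))
    for i, v in enumerate(sig):
        acc[i] += v


def binaural_render(channels: Sequence[Signal],
                    hrirs: Sequence[Tuple[Sequence[float], Sequence[float]]]) -> Tuple[Signal, Signal]:
    """Convolve each channel with its (left, right) impulse responses and sum."""
    left: Signal = []
    right: Signal = []
    for sig, (h_left, h_right) in zip(channels, hrirs):
        _accumulate(left, convolve(sig, h_left))
        _accumulate(right, convolve(sig, h_right))
    return left, right


def format_convert(channels: Sequence[Signal],
                   downmix: Sequence[Sequence[float]]) -> List[Signal]:
    """Map N input channels to M outputs with an M x N gain matrix."""
    n_samples = len(channels[0])
    return [[sum(row[c] * channels[c][t] for c in range(len(channels)))
             for t in range(n_samples)]
            for row in downmix]
```

Time-domain convolution is used only for clarity; a practical renderer would normally filter in blocks or in the frequency domain.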

In Fig. 15, the OAM decoder 1400 is the metadata decoder 110 of the apparatus 100 for generating one or more audio channels according to one of the embodiments described above. Moreover, the object renderer 1210, the USAC decoder 1300 and the mixer 1220 of Fig. 15 together form the audio channel generator 120 of the apparatus 100 for generating one or more audio channels according to one of the embodiments described above.

The 3D audio decoder of Fig. 17 differs from the 3D audio decoder of Fig. 15 in that its SAOC decoder can generate not only rendered objects but also rendered channels. This is the case when the 3D audio encoder of Fig. 16 has been used and the connection 900 between the channels/pre-rendered objects and the input interface of the SAOC encoder 800 is active.

Furthermore, a vector base amplitude panning (VBAP) stage 1810 is configured to receive information on the reproduction layout from the SAOC decoder and to output a rendering matrix to the SAOC decoder, so that the SAOC decoder can ultimately provide rendered channels in the high channel format of 1205, i.e., 32 loudspeakers, without any further operation of the mixer.

Preferably, the VBAP block receives the decompressed OAM data to derive the rendering matrices. More generally, it preferably requires geometric information both on the reproduction layout and on the positions within the reproduction layout to which the input signals are to be rendered. This geometric input data can be OAM data for objects, or channel position information for channels that have been transmitted using SAOC.
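The rendering matrices mentioned above can be derived with vector base amplitude panning [12]. The sketch below is a minimal two-dimensional, pairwise version of that formulation, assuming a horizontal layout and azimuths in degrees: it solves p = g1*l1 + g2*l2 for the gains of one loudspeaker pair and normalizes them to unit power. The function name is hypothetical, and selecting the active pair and filling the full rendering matrix are omitted.

```python
import math
from typing import Tuple


def vbap_pair_gains(source_az_deg: float, spk1_az_deg: float,
                    spk2_az_deg: float) -> Tuple[float, float]:
    """2-D pairwise VBAP: solve p = g1*l1 + g2*l2, then normalize to unit power."""
    def unit(az_deg: float) -> Tuple[float, float]:
        a = math.radians(az_deg)
        return math.cos(a), math.sin(a)

    px, py = unit(source_az_deg)   # source direction p
    l1x, l1y = unit(spk1_az_deg)   # loudspeaker direction l1
    l2x, l2y = unit(spk2_az_deg)   # loudspeaker direction l2
    det = l1x * l2y - l2x * l1y
    if abs(det) < 1e-12:
        raise ValueError("loudspeaker directions must be linearly independent")
    # Cramer's rule for the 2x2 system [l1 l2] [g1 g2]^T = p
    g1 = (px * l2y - l2x * py) / det
    g2 = (l1x * py - px * l1y) / det
    norm = math.hypot(g1, g2)
    return g1 / norm, g2 / norm
```

A negative gain indicates that the source lies outside the chosen pair; a full renderer would typically pick the pair whose gains are both non-negative and write those gains into the corresponding row of the rendering matrix.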

However, if only a specific output interface is required, the VBAP stage 1810 can already provide the required rendering matrix for, e.g., the 5.1 output. The SAOC decoder 1800 then performs a direct rendering from the SAOC transport channels, the associated parametric data and the decompressed metadata into the required output format, without any interaction of the mixer 1220. However, when a certain mix between the modes is applied, i.e., when several channels are SAOC-encoded but not all channels, or when several objects are SAOC-encoded but not all objects, or when only a certain number of pre-rendered objects with channels are SAOC-decoded and the remaining channels are not SAOC-processed, the mixer combines the data from the individual input portions, i.e., directly from the core decoder 1300, from the object renderer 1210 and from the SAOC decoder 1800.

In Fig. 17, the OAM decoder 1400 is the metadata decoder 110 of the apparatus 100 for generating one or more audio channels according to one of the embodiments described above. Moreover, in Fig. 17, the object renderer 1210, the USAC decoder 1300 and the mixer 1220 together form the audio channel generator 120 of the apparatus 100 for generating one or more audio channels according to one of the embodiments described above.

An apparatus for decoding encoded audio data is provided. The apparatus for decoding encoded audio data comprises: an input interface 1100 for receiving the encoded audio data, the encoded audio data comprising a plurality of encoded channels, or a plurality of encoded objects, or compressed metadata related to the plurality of objects; and an apparatus 100, comprising a metadata decoder 110 and an audio channel generator 120, for generating one or more audio channels as described above.

The metadata decoder 110 of the apparatus 100 for generating one or more audio channels is a metadata decompressor 400 for decompressing the compressed metadata.

The audio channel generator 120 of the apparatus 100 for generating one or more audio channels comprises a core decoder 1300 for decoding the plurality of encoded channels and the plurality of encoded objects.

Moreover, the audio channel generator 120 further comprises an object processor 1200 for processing the plurality of decoded objects using the decompressed metadata to obtain a number of output channels 1205 comprising audio data from the objects and the decoded channels.
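How the object processor might combine decoded objects with the decoded channels can be sketched as a gain-weighted mix. This assumes, hypothetically, that the decompressed metadata has already been converted into one static gain per object and output channel (for example by VBAP) and that all signals have equal length; real renderers update such gains over time. The name `render_and_mix` is illustrative only.

```python
from typing import List, Sequence

Signal = List[float]


def render_and_mix(channels: Sequence[Signal], objects: Sequence[Signal],
                   gains: Sequence[Sequence[float]]) -> List[Signal]:
    """Add each object, weighted by gains[obj][ch], onto the decoded channels."""
    out = [list(ch) for ch in channels]  # copy so the inputs stay untouched
    for obj, obj_gains in zip(objects, gains):
        for ch_index, g in enumerate(obj_gains):
            for t, sample in enumerate(obj):
                out[ch_index][t] += g * sample
    return out
```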

Furthermore, the audio channel generator 120 further comprises a post-processor 1700 for converting the number of output channels 1205 into an output format.

Although some aspects have been described in the context of an apparatus, it is clear that these aspects also represent a description of the corresponding method, where a block or device corresponds to a method step or a feature of a method step. Analogously, aspects described in the context of a method step also represent a description of a corresponding block, item or feature of a corresponding apparatus.

The decompressed signal of the invention can be stored on a digital storage medium or can be transmitted on a transmission medium, such as a wireless transmission medium or a wired transmission medium, e.g., the Internet.

Depending on certain implementation requirements, embodiments of the invention can be implemented in hardware or in software. The implementation can be performed using a digital storage medium, for example a floppy disk, a DVD, a CD, a ROM, a PROM, an EPROM, an EEPROM or a FLASH memory, having electronically readable control signals stored thereon, which cooperate (or are capable of cooperating) with a programmable computer system such that the respective method is performed.

Some embodiments according to the invention comprise a non-transitory data carrier having electronically readable control signals, which are capable of cooperating with a programmable computer system such that one of the methods described herein is performed.

Generally, embodiments of the present invention can be implemented as a computer program product with a program code, the program code being operative for performing one of the methods when the computer program product runs on a computer. The program code may, for example, be stored on a machine-readable carrier.

Other embodiments comprise the computer program for performing one of the methods described herein, stored on a machine-readable carrier.

In other words, an embodiment of the inventive method is, therefore, a computer program having a program code for performing one of the methods described herein when the computer program runs on a computer.

A further embodiment of the inventive methods is, therefore, a data carrier (or a digital storage medium, or a computer-readable medium) comprising, recorded thereon, the computer program for performing one of the methods described herein.

A further embodiment of the inventive method is, therefore, a data stream or a sequence of signals representing the computer program for performing one of the methods described herein. The data stream or the sequence of signals may, for example, be configured to be transferred via a data communication connection, for example via the Internet.

A further embodiment comprises a processing means, for example a computer or a programmable logic device, configured or adapted to perform one of the methods described herein.

A further embodiment comprises a computer having installed thereon the computer program for performing one of the methods described herein.

In some embodiments, a programmable logic device (for example a field-programmable gate array) may be used to perform some or all of the functionalities of the methods described herein. In some embodiments, a field-programmable gate array may cooperate with a microprocessor in order to perform one of the methods described herein. Generally, the methods are preferably performed by any hardware apparatus.

The above-described embodiments are merely illustrative of the principles of the present invention. It is understood that the specific embodiments presented in the detailed description of the preferred embodiments serve only to illustrate the technical content of the invention and are not intended to limit the invention narrowly to these embodiments; various modifications made without departing from the spirit of the invention and the scope of the following claims fall within the scope of the invention.

References:

[1] Peters, N., Lossius, T. and Schacher, J. C., "SpatDIF: Principles, Specification, and Examples", 9th Sound and Music Computing Conference, Copenhagen, Denmark, Jul. 2012.

[2] Wright, M. and Freed, A., "Open Sound Control: A New Protocol for Communicating with Sound Synthesizers", International Computer Music Conference, Thessaloniki, Greece, 1997.

[3] Geier, M., Ahrens, J. and Spors, S. (2010), "Object-based audio reproduction and the audio scene description format", Organised Sound, Vol. 15, No. 3, pp. 219-227, Dec. 2010.

[4] W3C, "Synchronized Multimedia Integration Language (SMIL 3.0)", Dec. 2008.

[5] W3C, "Extensible Markup Language (XML) 1.0 (Fifth Edition)", Nov. 2008.

[6] MPEG, "ISO/IEC International Standard 14496-3 - Coding of audio-visual objects, Part 3: Audio", 2009.

[7] Schmidt, J. and Schroeder, E. F. (2004), "New and Advanced Features for Audio Presentation in the MPEG-4 Standard", 116th AES Convention, Berlin, Germany, May 2004.

[8] Web3D, "International Standard ISO/IEC 14772-1:1997 - The Virtual Reality Modeling Language (VRML), Part 1: Functional specification and UTF-8 encoding", 1997.

[9] Sporer, T. (2012), "Codierung räumlicher Audiosignale mit leicht-gewichtigen Audio-Objekten", Proc. Annual Meeting of the German Audiological Society (DGA), Erlangen, Germany, Mar. 2012.

[10] Ramer, U. (1972), "An iterative procedure for the polygonal approximation of plane curves", Computer Graphics and Image Processing, 1(3), 244-256.

[11] Douglas, D. and Peucker, T. (1973), "Algorithms for the reduction of the number of points required to represent a digitized line or its caricature", The Canadian Cartographer, 10(2), 112-122.

[12] Pulkki, V. (1997), "Virtual Sound Source Positioning Using Vector Base Amplitude Panning", J. Audio Eng. Soc., Vol. 45, No. 6, pp. 456-466, Jun. 1997.

100‧‧‧apparatus

110‧‧‧metadata decoder

Claims

An apparatus (100) for generating one or more audio channels, wherein the apparatus comprises: a metadata decoder (110) for receiving one or more compressed metadata signals, wherein each of the one or more compressed metadata signals comprises a plurality of first metadata samples, wherein the first metadata samples of each of the one or more compressed metadata signals indicate information associated with an audio object signal of one or more audio object signals, wherein the metadata decoder (110) is configured to generate one or more reconstructed metadata signals, so that each of the one or more reconstructed metadata signals comprises the first metadata samples of one of the one or more compressed metadata signals and further comprises a plurality of second metadata samples, and wherein the metadata decoder (110) is configured to generate each of the second metadata samples of each reconstructed metadata signal of the one or more reconstructed metadata signals depending on at least two of the first metadata samples of said reconstructed metadata signal; and an audio channel generator (120) for generating the one or more audio channels depending on the one or more audio object signals and depending on the one or more reconstructed metadata signals.

The apparatus (100) according to claim 1, wherein the metadata decoder (110) is configured to generate each of the one or more reconstructed metadata signals by upsampling one of the one or more compressed metadata signals, and wherein the metadata decoder (110) is configured to generate each of the second metadata samples of each reconstructed metadata signal of the one or more reconstructed metadata signals by conducting a linear interpolation depending on at least two of the first metadata samples of said reconstructed metadata signal.
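The upsampling by linear interpolation recited in this claim can be illustrated with the following sketch. It assumes, for illustration only, that the first metadata samples arrive together with their strictly increasing sample positions; `reconstruct_metadata` is a hypothetical name. The first metadata samples are reproduced unchanged, and every sample in between (a second metadata sample) is interpolated from its two neighboring first metadata samples.

```python
from typing import List, Sequence


def reconstruct_metadata(positions: Sequence[int],
                         first_samples: Sequence[float]) -> List[float]:
    """Upsample by linear interpolation between consecutive first metadata samples.

    positions are the strictly increasing sample indices of the first metadata
    samples; the result covers positions[0] .. positions[-1] inclusive.
    """
    if not positions or len(positions) != len(first_samples):
        raise ValueError("need one position per first metadata sample")
    out: List[float] = []
    for (p0, v0), (p1, v1) in zip(zip(positions, first_samples),
                                  zip(positions[1:], first_samples[1:])):
        for n in range(p0, p1):
            t = (n - p0) / (p1 - p0)
            out.append(v0 + t * (v1 - v0))  # t == 0 keeps the first sample itself
    out.append(float(first_samples[-1]))
    return out
```

For example, `reconstruct_metadata([0, 4], [0.0, 8.0])` fills in the three missing samples as 2.0, 4.0 and 6.0.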

The apparatus (100) according to claim 1, wherein each reconstructed metadata signal of the one or more reconstructed metadata signals comprises the first metadata samples of the compressed metadata signal associated with said reconstructed metadata signal, wherein the metadata decoder (110) is configured to generate the second metadata samples of each reconstructed metadata signal by generating a plurality of approximated metadata samples for said reconstructed metadata signal, wherein the metadata decoder (110) is configured to generate each of the approximated metadata samples depending on at least two of the first metadata samples of said reconstructed metadata signal, and wherein the metadata decoder (110) is configured to receive a plurality of difference values for a compressed metadata signal of the one or more compressed metadata signals, and to add each of the difference values to one of the approximated metadata samples of the reconstructed metadata signal associated with said compressed metadata signal, to obtain the second metadata samples of said reconstructed metadata signal.

The apparatus (100) according to claim 3, wherein the metadata decoder (110) is configured to receive the plurality of difference values for the compressed metadata signal of the one or more compressed metadata signals, wherein each of the difference values is a received difference value assigned to one of the approximated metadata samples of the reconstructed metadata signal associated with said compressed metadata signal, wherein the metadata decoder (110) is configured to add each received difference value to the approximated metadata sample to which said received difference value is assigned, to obtain one of the second metadata samples of said reconstructed metadata signal, and wherein, for each approximated metadata sample of said reconstructed metadata signal to which none of the received difference values is assigned, the metadata decoder (110) is configured to determine an approximated difference value depending on at least one of the received difference values, and to add said approximated difference value to said approximated metadata sample, to obtain another one of the second metadata samples of said reconstructed metadata signal.
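The difference-value correction of claims 3 and 4 can be sketched as below. The claim only requires that an approximated difference value depend on at least one received difference value; linear interpolation between the neighboring received difference values, holding the end values constant, is an assumed rule chosen here for illustration, and all names are hypothetical. The index `i` counts the approximated metadata samples of one reconstructed metadata signal.

```python
import bisect
from typing import Dict, List, Sequence


def _approximated_difference(i: int, keys: List[int], received: Dict[int, float]) -> float:
    """Assumed rule: interpolate linearly between neighboring received differences."""
    j = bisect.bisect_left(keys, i)
    if j == 0:
        return received[keys[0]]    # before the first received value: hold it
    if j == len(keys):
        return received[keys[-1]]   # after the last received value: hold it
    k0, k1 = keys[j - 1], keys[j]
    t = (i - k0) / (k1 - k0)
    return received[k0] + t * (received[k1] - received[k0])


def correct_samples(approx: Sequence[float], received: Dict[int, float]) -> List[float]:
    """Add a received (or approximated) difference value to every approximated sample."""
    if not received:
        raise ValueError("at least one received difference value is required")
    keys = sorted(received)
    return [a + (received[i] if i in received
                 else _approximated_difference(i, keys, received))
            for i, a in enumerate(approx)]
```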

The apparatus (100) according to claim 1, wherein at least one of the one or more reconstructed metadata signals comprises position information on one of the one or more audio object signals, or comprises a scaled representation of the position information on said one of the one or more audio object signals, and wherein the audio channel generator (120) is configured to generate at least one of the one or more audio channels depending on said one of the one or more audio object signals and depending on said position information.

The apparatus (100) according to claim 1, wherein at least one of the one or more reconstructed metadata signals comprises a volume of one of the one or more audio object signals, or comprises a scaled representation of the volume of said one of the one or more audio object signals, and wherein the audio channel generator (120) is configured to generate at least one of the one or more audio channels depending on said one of the one or more audio object signals and depending on said volume.

The apparatus (100) according to claim 1, wherein the apparatus (100) is configured to receive random access information, wherein, for each compressed metadata signal of the one or more compressed metadata signals, the random access information indicates an accessed signal portion of said compressed metadata signal, wherein at least one other signal portion of said compressed metadata signal is not indicated by said random access information, and wherein the metadata decoder (110) is configured to generate one of the one or more reconstructed metadata signals depending on the first metadata samples of said accessed signal portion of said compressed metadata signal, but not depending on any other first metadata samples of any other signal portion of said compressed metadata signal.

An apparatus (250) for generating encoded audio information comprising one or more encoded audio signals and one or more compressed metadata signals, wherein the apparatus (250) comprises: a metadata encoder (210) for receiving one or more original metadata signals, wherein each of the one or more original metadata signals comprises a plurality of metadata samples, wherein the metadata samples of each of the one or more original metadata signals indicate information associated with an audio object signal of one or more audio object signals, and wherein the metadata encoder (210) is configured to generate the one or more compressed metadata signals, so that each compressed metadata signal of the one or more compressed metadata signals comprises a first group of at least two of the metadata samples of one of the original metadata signals, and so that said compressed metadata signal does not comprise any metadata sample of a second group of at least two other metadata samples of said one of the original metadata signals; and an audio encoder (220) for encoding the one or more audio object signals to obtain the one or more encoded audio signals.

The apparatus (250) according to claim 8, wherein the metadata encoder (210) is configured to generate the one or more compressed metadata signals so that each compressed metadata signal comprises the first group of at least two of the metadata samples of the original metadata signal associated with said compressed metadata signal, wherein each metadata sample that is comprised by one of the original metadata signals and that is also comprised by the compressed metadata signal associated with said original metadata signal is one of a plurality of first metadata samples, wherein each metadata sample that is comprised by one of the original metadata signals and that is not comprised by the compressed metadata signal associated with said original metadata signal is one of a plurality of second metadata samples, wherein the metadata encoder (210) is configured to generate an approximated metadata sample for each of the second metadata samples of one of the original metadata signals by conducting a linear interpolation depending on at least two of the first metadata samples of said original metadata signal, and wherein the metadata encoder (210) is configured to generate a difference value for each second metadata sample of said original metadata signal, so that said difference value indicates a difference between said second metadata sample and the approximated metadata sample of said second metadata sample.
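An encoder-side counterpart can be sketched as follows, assuming, purely for illustration, that the first metadata samples are chosen at a fixed spacing `step` plus the final sample; the claim does not fix how the first group is selected, and `encode_metadata` is a hypothetical name. Each remaining (second) metadata sample yields a difference value relative to its linearly interpolated approximation.

```python
from typing import Dict, List, Sequence, Tuple


def encode_metadata(samples: Sequence[float],
                    step: int) -> Tuple[List[int], List[float], Dict[int, float]]:
    """Keep every step-th sample (and the last one); return differences for the rest."""
    if step < 1 or not samples:
        raise ValueError("step must be >= 1 and samples must be non-empty")
    n = len(samples)
    kept = list(range(0, n, step))
    if kept[-1] != n - 1:
        kept.append(n - 1)  # always keep the final sample
    diffs: Dict[int, float] = {}
    for p0, p1 in zip(kept, kept[1:]):
        for i in range(p0 + 1, p1):
            t = (i - p0) / (p1 - p0)
            approx = samples[p0] + t * (samples[p1] - samples[p0])
            diffs[i] = samples[i] - approx  # second sample minus its approximation
    return kept, [float(samples[p]) for p in kept], diffs
```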

The apparatus (250) according to claim 9, wherein the metadata encoder (210) is configured to determine, for at least one of the difference values of the second metadata samples of said one of the one or more original metadata signals, whether each of said at least one of the difference values is greater than a threshold value.
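The claim states only that the encoder compares difference values with a threshold. One plausible use, assumed here and not stated in this claim, is to transmit only the difference values that exceed the threshold and let the decoder approximate the rest; the sketch also assumes the comparison is made on the magnitude of each difference value. `differences_to_transmit` is a hypothetical name.

```python
from typing import Dict


def differences_to_transmit(diffs: Dict[int, float], threshold: float) -> Dict[int, float]:
    """Keep only the difference values whose magnitude exceeds the threshold."""
    return {i: d for i, d in diffs.items() if abs(d) > threshold}
```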

The apparatus (250) according to claim 9, wherein the metadata encoder (210) is configured to encode each of at least one of the metadata samples of the one or more compressed metadata signals with a first number of bits, wherein each of said at least one of the metadata samples represents an integer, wherein the metadata encoder (210) is configured to encode each of at least one of the difference values of the second metadata samples with a second number of bits, wherein each of said at least one of the difference values represents an integer, and wherein the second number of bits is smaller than the first number of bits.
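To see why the difference values can use fewer bits than the metadata samples, the hypothetical helper below computes the smallest two's-complement word length that holds a set of integers. The example values are illustrative assumptions, not taken from this document.

```python
from typing import Iterable


def bits_for_signed(values: Iterable[int]) -> int:
    """Smallest two's-complement word length that can hold every value."""
    vals = list(values)
    lo, hi = min(vals), max(vals)
    bits = 1
    while not (-(1 << (bits - 1)) <= lo and hi <= (1 << (bits - 1)) - 1):
        bits += 1
    return bits
```

For instance, quantized azimuth samples spanning -180 to 180 need 9 bits, whereas difference values confined to -3 to 3 fit in 3 bits.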

The apparatus (250) according to claim 8, wherein at least one of the one or more original metadata signals comprises position information on one of the one or more audio object signals, or comprises a scaled representation of the position information on said one of the one or more audio object signals, and wherein the metadata encoder (210) is configured to generate at least one of the one or more compressed metadata signals depending on said at least one of the one or more original metadata signals.

The apparatus (250) according to claim 8, wherein at least one of the one or more original metadata signals comprises a volume of one of the one or more audio object signals, or comprises a scaled representation of the volume of said one of the one or more audio object signals, and wherein the metadata encoder (210) is configured to generate at least one of the one or more compressed metadata signals depending on said at least one of the one or more original metadata signals.

A system, comprising: an apparatus (250) according to any one of claims 8 to 13 for generating encoded audio information comprising one or more encoded audio signals and one or more compressed metadata signals; and an apparatus (100) according to any one of claims 1 to 7 for receiving the one or more encoded audio signals and the one or more compressed metadata signals, and for generating one or more audio channels depending on the one or more encoded audio signals and depending on the one or more compressed metadata signals.

A method for generating at least one audio channel, wherein the method comprises: receiving at least one compressed metadata signal, wherein each of the at least one compressed metadata signal comprises a plurality of first metadata samples, and wherein the first metadata samples of each of the at least one compressed metadata signal indicate information associated with an audio object signal of at least one audio object signal; generating at least one reconstructed metadata signal, such that each of the at least one reconstructed metadata signal comprises the first metadata samples of one of the at least one compressed metadata signal and further comprises a plurality of second metadata samples, wherein generating the at least one reconstructed metadata signal comprises generating each of the second metadata samples of each reconstructed metadata signal depending on at least two of the first metadata samples of said reconstructed metadata signal; and generating the at least one audio channel depending on the at least one audio object signal and depending on the at least one reconstructed metadata signal.
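The reconstruction step, in which each second metadata sample depends on at least two first metadata samples, can be sketched with linear interpolation. This assumes the first samples lie on a uniform grid, `factor` samples apart; linear interpolation is one instance that satisfies the claim, not necessarily the interpolation the patent prescribes.

```python
def reconstruct(first_samples: list[float], factor: int) -> list[float]:
    """Insert factor - 1 interpolated (second) samples between each pair of first samples."""
    if factor < 1:
        raise ValueError("factor must be >= 1")
    if not first_samples:
        return []
    out: list[float] = []
    for a, b in zip(first_samples, first_samples[1:]):
        out.append(a)  # the first sample is carried over unchanged
        for k in range(1, factor):
            t = k / factor
            out.append((1 - t) * a + t * b)  # second sample from two first samples
    out.append(first_samples[-1])
    return out


print(reconstruct([0.0, 4.0, 8.0], 4))
# [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
```

With `n` first samples, the reconstructed signal has `(n - 1) * factor + 1` samples, and every original first sample appears in it at its original position.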

A method for generating encoded audio information, the encoded audio information comprising at least one encoded audio signal and at least one compressed metadata signal, wherein the method comprises: receiving at least one original metadata signal, wherein each of the at least one original metadata signal comprises a plurality of metadata samples, and wherein the metadata samples of each of the at least one original metadata signal indicate information associated with an audio object signal of at least one audio object signal; generating the at least one compressed metadata signal, such that each compressed metadata signal comprises a first group of two or more of the metadata samples of one of the original metadata signals, and such that said compressed metadata signal does not comprise any metadata sample of a second group of two or more other metadata samples of said original metadata signal; and encoding the at least one audio object signal to obtain the at least one encoded audio signal.
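The split into a retained first group and a discarded second group can be sketched as plain downsampling. The sketch assumes the encoder keeps every `factor`-th sample; the claim only requires that at least two samples are kept and at least two others are dropped, so other selection rules would also qualify.

```python
def compress(samples: list[float], factor: int) -> list[float]:
    """Keep every `factor`-th metadata sample (first group) and drop the rest."""
    if factor < 2:
        raise ValueError("factor must be >= 2 so that some samples are dropped")
    kept = samples[::factor]
    if len(kept) < 2:
        raise ValueError("the first group must contain at least two samples")
    return kept


def dropped(samples: list[float], factor: int) -> list[float]:
    """The second group: samples that the compressed signal does not carry."""
    return [s for i, s in enumerate(samples) if i % factor != 0]


original = [10.0, 11.0, 12.5, 13.0, 14.0, 15.5, 16.0, 17.0, 18.0]
print(compress(original, 4))  # [10.0, 14.0, 18.0]
print(dropped(original, 4))   # [11.0, 12.5, 13.0, 15.5, 16.0, 17.0]
```

When the signal length minus one is a multiple of `factor`, the last sample is retained as well, so a decoder can interpolate all the way to the end of the signal.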

A computer program for implementing the method of claim 15 or 16 when the computer program is executed on a computer or a signal processor.

An apparatus for encoding audio input data (101) to obtain audio output data (501), comprising: an input interface (1100) for receiving a plurality of audio channels, a plurality of audio objects, and metadata related to at least one of the plurality of audio objects; a mixer (200) for mixing the plurality of objects and the plurality of channels to obtain a plurality of pre-mixed channels, each of the plurality of pre-mixed channels comprising audio data of a channel and audio data of at least one object; and a device (250) according to any one of claims 8 to 13, wherein the audio encoder (220) of the device (250) according to any one of claims 8 to 13 is a core encoder (300) for core encoding core encoder input data, and wherein the metadata encoder (210) of the device (250) according to any one of claims 8 to 13 is a metadata compressor (400) for compressing the metadata related to the at least one of the plurality of audio objects.
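The mixer (200) in the claim above produces pre-mixed channels that each carry channel audio plus audio from at least one object. A minimal sketch, assuming each object is added to each channel with a fixed, hypothetical gain `gains[o][c]` (in practice such gains would presumably be derived from the object metadata, which this sketch does not model):

```python
def premix(channels: list[list[float]], objects: list[list[float]],
           gains: list[list[float]]) -> list[list[float]]:
    """premixed[c][n] = channels[c][n] + sum over o of gains[o][c] * objects[o][n]."""
    n_samples = len(channels[0])
    if any(len(ch) != n_samples for ch in channels):
        raise ValueError("all channels must have the same length")
    out = [list(ch) for ch in channels]  # copy so the inputs stay untouched
    for obj, obj_gains in zip(objects, gains):
        if len(obj) != n_samples or len(obj_gains) != len(channels):
            raise ValueError("object length or gain vector does not match")
        for c, g in enumerate(obj_gains):
            for n in range(n_samples):
                out[c][n] += g * obj[n]
    return out


channels = [[1.0, 1.0], [0.0, 0.0]]
objects = [[2.0, 4.0]]
gains = [[0.5, 0.5]]  # hypothetical: the object is split equally over both channels
print(premix(channels, objects, gains))  # [[2.0, 3.0], [1.0, 2.0]]
```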

An apparatus for decoding encoded audio data, comprising: an input interface (1100) for receiving the encoded audio data, the encoded audio data comprising a plurality of encoded channels, or a plurality of encoded objects, or compressed metadata related to the plurality of objects; and a device (100) according to any one of claims 1 to 7, wherein the metadata decoder (110) of the device (100) according to any one of claims 1 to 7 is a metadata decompressor (400) for decompressing the compressed metadata; wherein the audio channel generator (120) of the device (100) according to any one of claims 1 to 7 comprises a core decoder (1300) for decoding the plurality of encoded channels and the plurality of encoded objects; wherein the audio channel generator (120) further comprises an object processor (1200) for processing the plurality of decoded objects using the decompressed metadata to obtain a number of output channels (1205) comprising audio data from the objects and the decoded channels; and wherein the audio channel generator (120) further comprises a post processor (1700) for converting the number of output channels (1205) into an output format.
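The last two stages of the decoder claim, the object processor (1200) and the post processor (1700), can be sketched as two matrix operations. This assumes the core decoding and metadata decompression have already run, that the decompressed metadata has been turned into a hypothetical rendering matrix (rows: output channels, columns: objects), and that the format conversion is a plain downmix matrix. The 0.5 center-to-stereo coefficient is hypothetical and chosen for exact arithmetic.

```python
Matrix = list[list[float]]


def apply_matrix(m: Matrix, signals: list[list[float]]) -> list[list[float]]:
    """out[i][k] = sum over j of m[i][j] * signals[j][k]."""
    n = len(signals[0])
    return [[sum(row[j] * signals[j][k] for j in range(len(signals)))
             for k in range(n)] for row in m]


def object_processor(decoded_channels: list[list[float]],
                     decoded_objects: list[list[float]],
                     render_matrix: Matrix) -> list[list[float]]:
    """Render objects into the channel layout and add the decoded channels."""
    rendered = apply_matrix(render_matrix, decoded_objects)
    return [[a + b for a, b in zip(ch, r)]
            for ch, r in zip(decoded_channels, rendered)]


def post_processor(output_channels: list[list[float]],
                   format_matrix: Matrix) -> list[list[float]]:
    """Convert the output channels to the target format with a downmix matrix."""
    return apply_matrix(format_matrix, output_channels)


# Hypothetical 3-channel layout (L, R, C) with one object rendered to C.
decoded_channels = [[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]]
decoded_objects = [[2.0, 2.0]]
render_matrix = [[0.0], [0.0], [1.0]]
format_matrix = [[1.0, 0.0, 0.5], [0.0, 1.0, 0.5]]  # L, R, C -> stereo

out = object_processor(decoded_channels, decoded_objects, render_matrix)
print(post_processor(out, format_matrix))  # [[2.0, 1.0], [1.0, 2.0]]
```

Keeping the format conversion as a separate matrix lets the same rendered output channels (1205) be adapted to different loudspeaker layouts without re-rendering the objects.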