JP3346404B2 - Audio coding device

Info

Publication number: JP3346404B2
Application number: JP2000328620A
Authority: JP
Inventors: 美昭田中; 昭治植野
Original assignee: Victor Company of Japan Ltd
Current assignee: Victor Company of Japan Ltd
Priority date: 2000-10-27
Filing date: 2000-10-27
Publication date: 2002-11-18
Anticipated expiration: 2018-11-16
Also published as: JP2001188588A

Description

DETAILED DESCRIPTION OF THE INVENTION

[0001]

[Technical Field of the Invention] The present invention relates to an audio encoding device that compresses multi-channel audio signals into variable-length data.

[0002]

[Related Art] As a method of compressing an audio signal into variable-length data, the present inventors proposed a predictive coding method in an earlier application (Japanese Patent Application No. H9-289159). In that method, for a one-channel original digital audio signal, a plurality of predictors with different characteristics each compute a linear prediction value of the current signal from past signals in the time domain, a prediction residual is computed for each predictor from the original digital audio signal and these linear prediction values, and the smallest prediction residual is selected.

[0003] The above method achieves a reasonable degree of compression when the original digital audio signal has a sampling frequency of 96 kHz and about 20 quantization bits. Recent DVD-Audio discs, however, tend to use twice that sampling frequency (192 kHz) and 24 quantization bits, so the compression ratio needs to be improved. In addition, in multi-channel audio the sampling frequency and the number of quantization bits may differ from channel to channel.

[0004]

[Problem to Be Solved by the Invention] A compression scheme such as predictive coding has a variable compression ratio (VBR: variable bit rate), so when a multi-channel audio signal is predictively coded, the amount of data per channel varies greatly over time. Moreover, when such data is transmitted, it is sent as a single data stream rather than in parallel per channel.

[0005] Therefore, for the playback (decoding) side to reproduce (present) such a variable-length data stream with the channels synchronized, two times must be managed: a decoding time, which indicates when the data stream accumulated in the input buffer is read out and passed to the decoder, and a presentation time, which indicates when the decoded data accumulated in the output buffer is read out and output (presented) to a loudspeaker or the like. The playback side must also manage the time used for search playback of such a variable-length data stream.

[0006] An object of the present invention is therefore to provide an audio encoding device that makes it possible to manage processing times on the playback side when a multi-channel audio signal is encoded at a variable compression ratio.

[0007]

[Means for Solving the Problem] To achieve the above object, the present invention comprises the following means. That is,

[0008] An audio encoding device comprising: compression means that, for each channel of a multi-channel audio signal, taken either as-is or after the channels have been correlated with one another, obtains a leading sample value in response to the input audio signal, predicts linear prediction values of the current signal from past signals in the time domain by a plurality of linear prediction methods having different characteristics, and compresses the signal by selecting the linear prediction method that minimizes the prediction residual obtained from the predicted linear prediction value and the audio signal; timing generation means that generates an access unit search pointer for search playback of the access unit a predetermined time before or a predetermined time after in the compressed data; and means for formatting into a packet having user data that includes a private header containing the access unit search pointer and the compressed data containing the access unit.

[0009]

[Embodiments of the Invention] Embodiments of the present invention are described below with reference to the drawings. FIG. 1 is a block diagram showing a first embodiment of an audio encoding device according to the invention and the corresponding audio decoding device. FIG. 2 is a block diagram showing the encoding unit of FIG. 1 in detail. FIG. 3 is an explanatory diagram showing the bitstream encoded by the encoding unit of FIGS. 1 and 2. FIG. 4 is an explanatory diagram showing the format of a DVD pack. FIG. 5 is an explanatory diagram showing the format of a DVD audio pack. FIG. 6 is a block diagram showing the decoding unit of FIG. 1 in detail. FIG. 7 is a timing chart showing the write/read timing of the input buffer of FIG. 6. FIG. 8 is an explanatory diagram showing the amount of compressed data per access unit. FIG. 9 is an explanatory diagram showing access units and presentation units.

[0010] The following four systems, for example, are known as multi-channel systems.
(1) 4-channel system: as in Dolby Surround, three front channels (L, C, R) plus one rear channel (S), four channels in total.
(2) 5-channel system: as in Dolby AC-3 without the SW channel, three front channels (L, C, R) plus two rear channels (SL, SR), five channels in total.
(3) 6-channel system: six channels (L, C, R, SW (Lfe), SL, SR), as in DTS (Digital Theater System) or Dolby AC-3.
(4) 8-channel system: as in SDDS (Sony Dynamic Digital Sound), six front channels (L, LC, C, RC, R, SW) plus two rear channels (SL, SR), eight channels in total.

[0011] The 6-channel (ch) mix & matrix circuit 1' on the encoding side shown in FIG. 1 takes, as an example of a multi-channel signal, 6-ch PCM data consisting of front left (Lf), center (C), front right (Rf), surround left (Ls), surround right (Rs), and Lfe (Low Frequency Effect). Using equation (1) below, it converts and classifies this data into two channels "1" and "2" belonging to the front group and four channels "3" to "6" belonging to the other group. It outputs channels "1" and "2" to the first encoding unit 2'-1 and channels "3" to "6" to the second encoding unit 2'-2.
"1" = Lf + Rf
"2" = Lf - Rf
"3" = C - (Ls + Rs)/2
"4" = Ls + Rs
"5" = Ls - Rs
"6" = Lfe - a × C
where 0 ≤ a ≤ 1 ... (1)

[0012] The first and second encoding units 2'-1 and 2'-2, which constitute the encoding unit 2', predictively code the PCM data of the two channels "1" and "2" and of the four channels "3" to "6", respectively, as shown in detail in FIG. 2. They transmit the predictively coded data to the decoding side via the recording medium 5 or the communication medium 6 as a bitstream such as that shown in FIG. 3. On the decoding side, the first and second decoding units 3'-1 and 3'-2, which constitute the decoding unit 3', decode the predictively coded data of the two front-group channels "1" and "2" and of the four other-group channels "3" to "6", respectively, into PCM data, as shown in detail in FIG. 6.

[0013] The mix & matrix circuit 4' then restores the original six channels (Lf, C, Rf, Ls, Rs, Lfe) based on equation (1) and, from these six channels and the coefficients mij (i = 1, 2; j = 1, 2, ..., 6), generates stereo 2-ch data (L, R) according to equation (2) below.
L = m11·Lf + m12·Rf + m13·C + m14·Ls + m15·Rs + m16·Lfe
R = m21·Lf + m22·Rf + m23·C + m24·Ls + m25·Rs + m26·Lfe ... (2)
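
The matrixing of equation (1), its inverse, and the downmix of equation (2) can be sketched as follows. This is only an illustration under stated assumptions: the function names, the use of exact fractions (to keep the (Ls + Rs)/2 term lossless), and the sample coefficient values are not taken from the patent, and the inverse formulas are derived here by solving equation (1).

```python
from fractions import Fraction


def matrix_encode(lf, c, rf, ls, rs, lfe, a):
    """Equation (1): map (Lf, C, Rf, Ls, Rs, Lfe) to channels "1".."6"."""
    return (
        lf + rf,                      # "1"
        lf - rf,                      # "2"
        c - Fraction(ls + rs) / 2,    # "3"
        ls + rs,                      # "4"
        ls - rs,                      # "5"
        lfe - a * c,                  # "6"
    )


def matrix_decode(channels, a):
    """Inverse of equation (1), obtained by solving it for the six inputs."""
    c1, c2, c3, c4, c5, c6 = channels
    lf = Fraction(c1 + c2) / 2
    rf = Fraction(c1 - c2) / 2
    ls = Fraction(c4 + c5) / 2
    rs = Fraction(c4 - c5) / 2
    c = c3 + Fraction(c4) / 2         # "4" equals Ls + Rs
    lfe = c6 + a * c
    return lf, c, rf, ls, rs, lfe


def downmix(lf, c, rf, ls, rs, lfe, m):
    """Equation (2): m is a 2x6 matrix whose columns follow (Lf, Rf, C, Ls, Rs, Lfe)."""
    ordered = (lf, rf, c, ls, rs, lfe)
    left = sum(mij * x for mij, x in zip(m[0], ordered))
    right = sum(mij * x for mij, x in zip(m[1], ordered))
    return left, right


# Hypothetical sample values, chosen only for illustration.
source = (100, 40, -60, 8, -3, 5)     # Lf, C, Rf, Ls, Rs, Lfe
a = Fraction(1, 2)
restored = matrix_decode(matrix_encode(*source, a), a)
m = [[1, 0, Fraction(1, 2), 1, 0, 0],
     [0, 1, Fraction(1, 2), 0, 1, 0]]
stereo = downmix(*restored, m)
```

Note that the column order of m follows the operand order of equation (2), which differs from the (Lf, C, Rf, ...) order used elsewhere in the text.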

[0014] The encoding units 2'-1 and 2'-2 are described in detail with reference to FIG. 2. The PCM data of each of channels "1" to "6" is stored frame by frame in a one-frame buffer 10. One frame of sample data of each of channels "1" to "6" is applied to the prediction circuits 13D1, 13D2, and 15D1 to 15D4, respectively, and the leading sample data of each frame of each channel is applied to the formatting circuit 19. For the PCM data of its channel, each of the prediction circuits 13D1, 13D2, and 15D1 to 15D4 uses a plurality of predictors (not shown) with different characteristics to compute a plurality of linear prediction values of the current signal from past signals in the time domain, and then computes a prediction residual for each predictor from the original PCM data and these linear prediction values. The buffer/selectors 14D1, 14D2, and 16D1 to 16D4 that follow temporarily store the prediction residuals computed by the prediction circuits 13D1, 13D2, and 15D1 to 15D4, respectively, and select the smallest prediction residual for each subframe designated by the selection signal/DTS (decoding time stamp) generator 17.
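
The per-subframe selection among several predictors can be illustrated with a small sketch. The patent does not specify the predictors, so fixed polynomial predictors of orders 0 to 3 are used here purely as a stand-in. The cost measure (sum of absolute residuals) and the zero-initialized history are likewise assumptions, as are all function names; the patent itself transmits the leading sample of each frame separately.

```python
def predict(hist, order):
    """Stand-in fixed polynomial predictors; hist holds the last three samples, oldest first."""
    p1, p2, p3 = hist[-1], hist[-2], hist[-3]
    return (0, p1, 2 * p1 - p2, 3 * p1 - 3 * p2 + p3)[order]


def residuals(samples, order):
    """Residual e[n] = x[n] - prediction; samples before the subframe are taken as 0."""
    hist = [0, 0, 0]
    out = []
    for x in samples:
        out.append(x - predict(hist, order))
        hist = hist[1:] + [x]
    return out


def select_predictor(subframe, orders=(0, 1, 2, 3)):
    """Return (order, residuals) for the predictor with the smallest total |residual|."""
    cost, best = min((sum(abs(e) for e in residuals(subframe, k)), k) for k in orders)
    return best, residuals(subframe, best)


# A linear ramp is predicted best by the second-order predictor.
order, res = select_predictor([10, 12, 14, 16, 18, 20, 22])
```

The selected order plays the role of the predictor selection flag: the decoder needs it to run the same predictor.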

[0015] The selection signal/DTS generator 17 applies a bit-count flag for the prediction residuals to the packing circuit 18 and the formatting circuit 19. It also applies to the formatting circuit 19 a predictor selection flag indicating the predictor with the smallest prediction residual, the correlation coefficient a of equation (1), and a DTS indicating the time at which the decoding side takes stream data out of the input buffer 22a (FIG. 6). The packing circuit 18 packs the six channels of prediction residuals selected by the buffer/selectors 14D1, 14D2, and 16D1 to 16D4 using the number of bits designated by the bit-count flag from the selection signal/DTS generator 17. The PTS generator 17c generates a PTS (presentation time stamp) indicating the time at which the decoding side takes PCM data out of the output buffer 110 (FIG. 6) and outputs it to the formatting circuit 19.
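
Packing residuals with a designated number of bits might look like the sketch below. The bit layout is an assumption made for illustration (two's-complement values, most significant bit first, zero padding to a byte boundary); the patent does not describe the actual layout, and the function names are hypothetical.

```python
def pack(values, nbits):
    """Pack signed integers into nbits-wide two's-complement fields, MSB first."""
    lo, hi = -(1 << (nbits - 1)), (1 << (nbits - 1)) - 1
    mask = (1 << nbits) - 1
    acc = 0
    for v in values:
        if not lo <= v <= hi:
            raise ValueError(f"{v} does not fit in {nbits} bits")
        acc = (acc << nbits) | (v & mask)
    total = nbits * len(values)
    pad = (-total) % 8                      # zero bits appended to reach a byte boundary
    return (acc << pad).to_bytes((total + pad) // 8, "big")


def unpack(data, nbits, count):
    """Recover `count` signed values packed by pack()."""
    acc = int.from_bytes(data, "big")
    total_bits = len(data) * 8
    mask = (1 << nbits) - 1
    out = []
    for i in range(count):
        raw = (acc >> (total_bits - nbits * (i + 1))) & mask
        if raw >= 1 << (nbits - 1):         # sign-extend negative values
            raw -= 1 << nbits
        out.append(raw)
    return out


packed = pack([3, -2, 0, 1], 3)             # 4 values x 3 bits = 12 bits, padded to 2 bytes
```

The unpacker needs both the bit count and the number of values, which is why a bit-count flag has to travel with the residuals.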

[0016] The formatting circuit 19 then formats the data into user data as shown in FIGS. 3 to 5. The user data (subpacket) shown in FIG. 3 consists of a variable-rate bitstream (substream) BS0 containing the predictively coded data of the two front-group channels "1" and "2", a variable-rate bitstream (substream) BS1 containing the predictively coded data of the four other-group channels "3" to "6", and a bitstream header (restart header) placed before substreams BS0 and BS1. One frame of substreams BS0 and BS1 multiplexes the following:
- a frame header;
- the leading sample data of one frame of each of channels "1" to "6";
- a predictor selection flag for each subframe of each of channels "1" to "6";
- a bit-count flag for each subframe of each of channels "1" to "6";
- the prediction residual data sequence (variable number of bits) of each of channels "1" to "6"; and
- the coefficient a for channel "6".
With this predictive coding, a compression ratio of 71% can be achieved when the original signal has, for example, a sampling frequency of 96 kHz, 24 quantization bits, and six channels.
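
As a rough worked example of what the 71% figure implies, the sketch below computes the uncompressed PCM bit rate for the stated format and scales it. It assumes that "compression ratio of 71%" means the compressed size is 71% of the original, which the text does not state explicitly; the function name is hypothetical.

```python
def pcm_bitrate(fs_hz, bits_per_sample, channels):
    """Uncompressed PCM bit rate in bits per second."""
    return fs_hz * bits_per_sample * channels


raw = pcm_bitrate(96_000, 24, 6)       # 13,824,000 bit/s
compressed = raw * 71 // 100           # 9,815,040 bit/s under the 71%-of-original reading
print(raw, compressed)
```

Because the coding is variable-rate, such a figure can only be an average; individual access units may be larger or smaller.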

[0017] When the variable-rate bitstream data predictively coded by the encoding units 2'-1 and 2'-2 shown in FIG. 2 is recorded on a DVD-Audio disc, as one example of a recording medium, it is packed into the audio (A) packs shown in FIG. 4. Each pack consists of 2034 bytes of user data (an A packet or V packet) to which a 14-byte pack header is added. The pack header is made up of 4 bytes of pack start information, 6 bytes of SCR (System Clock Reference) information, 3 bytes of mux rate information, and 1 byte of stuffing (one pack = 2048 bytes in total). By setting the SCR information, which is a time stamp, to "1" in the first pack and making it continuous within the same title, the time of the A packs within the same title can be managed.
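
The byte budget of a pack can be checked with a small sketch. Only the field lengths come from the text; the field contents are treated as opaque bytes because their encodings are not given here, and the constant and function names are hypothetical.

```python
PACK_START_LEN = 4
SCR_LEN = 6
MUX_RATE_LEN = 3
STUFFING_LEN = 1
USER_DATA_LEN = 2034
PACK_HEADER_LEN = PACK_START_LEN + SCR_LEN + MUX_RATE_LEN + STUFFING_LEN   # 14
PACK_LEN = PACK_HEADER_LEN + USER_DATA_LEN                                 # 2048


def build_pack(pack_start, scr, mux_rate, user_data, stuffing=b"\x00"):
    """Concatenate the pack header fields and the user data, checking each length."""
    fields = (
        (pack_start, PACK_START_LEN),
        (scr, SCR_LEN),
        (mux_rate, MUX_RATE_LEN),
        (stuffing, STUFFING_LEN),
        (user_data, USER_DATA_LEN),
    )
    for value, length in fields:
        if len(value) != length:
            raise ValueError(f"expected {length} bytes, got {len(value)}")
    return b"".join(value for value, _ in fields)


pack_bytes = build_pack(bytes(4), bytes(6), bytes(3), bytes(2034))
```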

[0018] As shown in detail in FIG. 5, a compressed-PCM A packet consists of a packet header of 19 or 14 bytes, a compressed-PCM private header, and 1 to 2011 bytes of audio data (compressed PCM) in the format shown in FIG. 3. The DTS and PTS are set in the packet header of FIG. 5 (specifically, the PTS in bytes 10 to 14 of the packet header and the DTS in bytes 15 to 19). The compressed-PCM private header consists of:
- a 1-byte substream ID;
- a 2-byte UPC/EAN-ISRC (Universal Product Code/European Article Number-International Standard Recording Code) number and UPC/EAN-ISRC data;
- a 1-byte private header length;
- a 2-byte first access unit pointer;
- 8 bytes of audio data information (ADI); and
- 0 to 7 stuffing bytes.
Within the ADI, a forward access unit search pointer for searching for the access unit one second later and a backward access unit search pointer for searching for the access unit one second earlier are each set in 1 byte (specifically, the forward access unit search pointer in byte 7 of the ADI and the backward access unit search pointer in byte 8).
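
The placement of the two search pointers inside the 8-byte ADI can be sketched as below. Only the byte positions come from the text; the meaning of ADI bytes 1 to 6 and the encoding of the pointer values are not described here, so they are treated as opaque, and the function names are hypothetical.

```python
ADI_LEN = 8
FORWARD_POS = 7     # 1-based byte positions within the ADI, as stated in the text
BACKWARD_POS = 8


def build_adi(forward_ptr, backward_ptr, other=bytes(6)):
    """Build an 8-byte ADI; `other` stands in for bytes 1 to 6, whose layout is not given."""
    if len(other) != ADI_LEN - 2:
        raise ValueError("expected 6 bytes for ADI bytes 1 to 6")
    # bytes([...]) raises ValueError if a pointer does not fit in one byte (0..255).
    return bytes(other) + bytes([forward_ptr, backward_ptr])


def parse_adi(adi):
    """Return (forward_ptr, backward_ptr) from an 8-byte ADI."""
    if len(adi) != ADI_LEN:
        raise ValueError("ADI must be 8 bytes")
    return adi[FORWARD_POS - 1], adi[BACKWARD_POS - 1]


adi = build_adi(5, 9)
```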

[0019] Next, the decoding units 3'-1 and 3'-2 are described with reference to FIG. 6. The variable-rate bitstream data BS0 and BS1 in the above format is separated by the deformatting circuit 21. The leading sample data of one frame and the predictor selection flags of each of channels "1" to "6" are applied to the prediction circuits 24D1, 24D2, and 23D1 to 23D4, respectively, and the bit-count flags of each channel are applied to the unpacking circuit 22. The SCR, the DTS, and the prediction residual data sequence are applied to the input buffer 22a, and the PTS is applied to the output buffer 110. The plurality of predictors (not shown) in each of the prediction circuits 24D1, 24D2, and 23D1 to 23D4 have the same characteristics as the plurality of predictors in the encoding-side prediction circuits 13D1, 13D2, and 15D1 to 15D4, and the predictor with the same characteristics is selected by the predictor selection flag.

[0020] The stream data (prediction residual data sequence) separated by the deformatting circuit 21 is taken into the input buffer 22a and accumulated access unit by access unit according to the SCR, as shown in FIG. 7. The amount of data in one access unit corresponds to, for example, (1/96 kHz) seconds when fs = 96 kHz, but it is variable in length, as shown in detail in FIG. 8 and FIG. 9(a). The stream data accumulated in the input buffer 22a is read out in FIFO order based on the DTS and applied to the unpacking circuit 22.
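
The DTS-driven FIFO behavior of the input buffer can be modeled with a short sketch. The class and method names are hypothetical, time is an abstract integer clock, and the SCR-controlled write side is reduced to a plain `write` call; this is a model of the behavior described, not of the actual circuit.

```python
from collections import deque


class InputBuffer:
    """FIFO of variable-length access units, released when their DTS is reached."""

    def __init__(self):
        self._fifo = deque()

    def write(self, dts, access_unit):
        """Store one access unit together with its decoding time stamp."""
        self._fifo.append((dts, access_unit))

    def read_due(self, clock):
        """Pop, in arrival order, every access unit whose DTS is at or before `clock`."""
        due = []
        while self._fifo and self._fifo[0][0] <= clock:
            due.append(self._fifo.popleft()[1])
        return due

    def occupancy(self):
        """Number of bytes currently held in the buffer."""
        return sum(len(au) for _, au in self._fifo)


buf = InputBuffer()
buf.write(10, b"aaaa")       # access units differ in size because the coding is variable-rate
buf.write(20, b"bb")
buf.write(30, b"cccccc")
```

Since the access units vary in size, the buffer occupancy fluctuates even though the units are released at regular decoding times.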

[0021] The unpacking circuit 22 separates the prediction residual data sequence of each of channels "1" to "6" according to its bit-count flag and outputs the results to the prediction circuits 24D1, 24D2, and 23D1 to 23D4, respectively. In each of these prediction circuits, the current prediction residual data of its channel from the unpacking circuit 22 is added to the previous prediction value produced by the one predictor, among the plurality of internal predictors, that is selected by the predictor selection flag, giving the current prediction value. The PCM data of each sample is then computed with the leading sample data of the frame as the reference and accumulated in the output buffer 110. The PCM data accumulated in the output buffer 110 is read out and output based on the PTS. In this way, the variable-length access units shown in FIG. 9(a) are expanded and the fixed-length presentation units shown in FIG. 9(b) are output.
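
Reconstruction on the decoding side adds each residual to the output of the predictor named by the selection flag. The sketch below uses stand-in fixed polynomial predictors of orders 0 to 3 and a zero-initialized history; both are assumptions for illustration, since the patent does not specify the predictors and anchors each frame on its transmitted leading sample.

```python
def predict(hist, order):
    """Stand-in fixed polynomial predictors; must match the ones used by the encoder."""
    p1, p2, p3 = hist[-1], hist[-2], hist[-3]
    return (0, p1, 2 * p1 - p2, 3 * p1 - 3 * p2 + p3)[order]


def reconstruct(residuals, order):
    """Rebuild samples as x[n] = prediction + e[n], feeding each result back as history."""
    hist = [0, 0, 0]
    out = []
    for e in residuals:
        x = predict(hist, order) + e
        out.append(x)
        hist = hist[1:] + [x]
    return out


# Residuals of a linear ramp under the second-order predictor decode back to the ramp.
pcm = reconstruct([10, -8, 0, 0, 0, 0, 0], order=2)
```

Decoding is lossless only if the decoder runs exactly the same predictor as the encoder, which is why the text requires predictors with identical characteristics on both sides.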

[0022] When search playback is instructed via the operation unit 101, the control unit 100 reproduces access units based on the forward access unit search pointer, which indicates one second ahead, and the backward access unit search pointer, which indicates one second earlier, both placed in the ADI shown in FIG. 5. The search pointers may instead indicate two seconds ahead and two seconds earlier rather than one second ahead and one second earlier.
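
One way an encoder could derive such pointers is sketched below. This is speculative: the patent does not say how the pointer values are encoded, so the sketch simply returns index offsets to the access unit that starts at or just before the target time. The function name, the representation of time in seconds, and the clamping at the start of the stream are all assumptions.

```python
from bisect import bisect_right


def search_pointers(start_times, index, jump=1.0):
    """Return (forward, backward) access-unit index offsets for a jump of `jump` seconds.

    start_times: sorted presentation start times (seconds) of the access units.
    """
    t = start_times[index]
    # Last access unit starting at or before the target time in each direction.
    forward = bisect_right(start_times, t + jump) - 1
    backward = max(bisect_right(start_times, t - jump) - 1, 0)
    return forward - index, index - backward


# Hypothetical stream with one access unit every 0.5 s.
times = [0.0, 0.5, 1.0, 1.5, 2.0, 2.5]
mid = search_pointers(times, 2)
first = search_pointers(times, 0)
last = search_pointers(times, 5)
```

Passing `jump=2.0` gives the two-second variant mentioned in the text.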

[0023] When the variable-rate bitstream data predictively coded by the encoding units 2'-1 and 2'-2 shown in FIG. 2 is transmitted over a network, the encoding side packetizes it for transmission as shown in FIG. 10 (step S41), then adds a packet header (step S42), and then sends the packet out onto the network (step S43).

[0024] On the decoding side, as shown in FIG. 11(A), the header is removed (step S51), the data is restored (step S52), and the data is stored in memory to await decoding (step S53). When decoding is performed, as shown in FIG. 11(B), deformatting is carried out (step S61), followed by input/output control of the input buffer 22a (step S62) and then unpacking (step S63). At this point, if search playback has been instructed, the search pointer is decoded. Next, a predictor is selected based on the flag and decoding is performed (step S64), followed by input/output control of the output buffer 110 (step S65). The original multi-channel signal is then restored (step S66) and output (step S67), and the process repeats thereafter.

[0025] In the above embodiment, the two front-group channels "1" and "2" are converted by
"1" = Lf + Rf
"2" = Lf - Rf
and predictively coded. Alternatively, the multi-channel signal may be downmixed by equation (2) to generate stereo 2-ch data (L, R), which is then converted by equation (1)' below and predictively coded (second embodiment).
"1" = L + R
"2" = L - R
"3" to "5" are the same as in equation (1)
"6" = Lfe - C ... (1)'
In this case, the mix & matrix circuit 4' on the decoding side can generate channel L by adding channels "1" and "2" and channel R by subtracting them.

[0026] As a third embodiment, as shown in FIG. 12, the multi-channel signal may be downmixed by equation (2) to generate stereo 2-ch data (L, R) in place of the two channels "1" and "2", and this stereo 2-ch data (L, R) and the four channels "3" to "6" may be predictively coded. In the second and third embodiments, front left (Lf) and front right (Rf) are not transmitted to the decoding side, so the decoding side generates them using equations (1) and (2).

[0027] Next, a fourth embodiment is described with reference to FIGS. 13 and 14. The above embodiments are configured to predictively code one group of correlated signals "1" to "6". The fourth embodiment is instead configured to generate a plurality of groups of correlated signals, predictively code them, and select the predictively coded data of the group with the highest compression ratio. For this purpose, the encoding unit shown in FIG. 13 is provided with first to n-th correlation circuits 1-1 to 1-n, and these n correlation circuits convert, for example, 6-ch PCM data (Lf, C, Rf, Ls, Rs, Lfe) into n kinds of 6-ch signals "1" to "6" with different correlations.

[0028] For example, the first correlation circuit 1-1 converts as follows:
"1" = Lf
"2" = C - (Ls + Rs)/2
"3" = Rf - Lf
"4" = Ls - a × Lfe
"5" = Rs - b × Rf
"6" = Lfe
and the n-th correlation circuit 1-n converts as follows:
"1" = Lf + Rf
"2" = C - Lf
"3" = Rf - Lf
"4" = Ls - Lf
"5" = Rs - Lf
"6" = Lfe - C

[0029] A prediction circuit 15 and a buffer/selector 16 are provided for each of the correlation circuits 1-1 to 1-n, and the correlation selection signal generator 17b selects the group with the highest compression ratio based on the data amount of the minimum prediction residuals of each group. The formatting circuit 19 then additionally multiplexes the selection flags (the correlation circuit selection flag and the correlation coefficients a and b of that correlation circuit).
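
Selecting the best correlation group can be sketched as follows. The cost measure is a stand-in: a simple first-difference predictor with the sum of absolute residuals replaces the real multi-predictor coding, and only two candidate groups are compared (the unconverted channels and the conversion given for the n-th correlation circuit). All names and sample values are hypothetical.

```python
def first_difference_cost(channel):
    """Stand-in for coded size: sum of |x[n] - x[n-1]|, with x[-1] taken as 0."""
    prev, total = 0, 0
    for x in channel:
        total += abs(x - prev)
        prev = x
    return total


def identity(f):
    """Group of unconverted channels."""
    return [f["Lf"], f["C"], f["Rf"], f["Ls"], f["Rs"], f["Lfe"]]


def circuit_n(f):
    """Conversion given in the text for the n-th correlation circuit."""
    add = lambda p, q: [x + y for x, y in zip(p, q)]
    sub = lambda p, q: [x - y for x, y in zip(p, q)]
    return [add(f["Lf"], f["Rf"]), sub(f["C"], f["Lf"]), sub(f["Rf"], f["Lf"]),
            sub(f["Ls"], f["Lf"]), sub(f["Rs"], f["Lf"]), sub(f["Lfe"], f["C"])]


def select_group(frame, circuits):
    """Return (index of cheapest group, list of costs); the index acts as the selection flag."""
    costs = [sum(first_difference_cost(ch) for ch in circuit(frame)) for circuit in circuits]
    best = min(range(len(circuits)), key=costs.__getitem__)
    return best, costs


# Six identical channels: differences against Lf vanish, so the correlated group wins.
frame = {name: [1, 5, 2, 8] for name in ("Lf", "C", "Rf", "Ls", "Rs", "Lfe")}
best, costs = select_group(frame, [identity, circuit_n])
```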

[0030] On the decoding side shown in FIG. 14, n correlation circuits 4-1 to 4-n (or a single correlation circuit, not shown, whose coefficients a and b can be changed) are provided corresponding to the encoding-side correlation circuits 1-1 to 1-n. If the n groups of prediction circuits shown in FIG. 13 have the same configuration, the decoding device does not need n groups of prediction circuits as shown in FIG. 14; one group of prediction circuits suffices. Based on the selection flag transmitted from the encoding device, one of the correlation circuits 4-1 to 4-n is selected, or the coefficients a and b are set, to restore the original six channels (Lf, C, Rf, Ls, Rs, Lfe), and the multi-channel signal is downmixed by equation (2) to generate stereo 2-ch data (L, R).

[0031] The first embodiment above is configured to predictively code one kind of correlated signals "1" to "6". Alternatively, both this group of signals "1" to "6" and the group of original signals (Lf, C, Rf, Ls, Rs, Lfe) may be predictively coded and the group with the higher compression ratio selected.

[0032]

[Effects of the Invention] As described above, according to the present invention, the access unit search pointer is set in the packet header, so the playback side can perform search playback when a multi-channel audio signal is encoded at a variable compression ratio.

[Brief description of the drawings]

FIG. 1 is a block diagram showing a first embodiment of an audio encoding device according to the present invention and the corresponding audio decoding device.

FIG. 2 is a block diagram showing the encoding unit of FIG. 1 in detail.

FIG. 3 is an explanatory diagram showing the bitstream encoded by the encoding unit of FIGS. 1 and 2.

FIG. 4 is an explanatory diagram showing the format of a DVD pack.

FIG. 5 is an explanatory diagram showing the format of a DVD audio pack.

FIG. 6 is a block diagram showing the decoding unit of FIG. 1 in detail.

FIG. 7 is a timing chart showing the write/read timing of the input buffer of FIG. 6.

FIG. 8 is an explanatory diagram showing the amount of compressed data per access unit.

FIG. 9 is an explanatory diagram showing access units and presentation units.

FIG. 10 is a flowchart showing an audio transmission method.

FIG. 11 is a flowchart showing an audio transmission method.

FIG. 12 is a block diagram showing an audio encoding device of a third embodiment and the corresponding audio decoding device.

FIG. 13 is a block diagram showing an audio encoding device of a fourth embodiment.

FIG. 14 is a block diagram showing an audio decoding device of the fourth embodiment.

[Explanation of symbols]

1': 6-ch mix & matrix circuit
13D1, 13D2, 15D1 to 15D4: Prediction circuits (together with the buffer/selectors 14D1, 14D2, 16D1 to 16D4, these constitute the compression means)
14D1, 14D2, 16D1 to 16D4: Buffer/selectors
17: Selection signal/DTS generator (timing generation means)
17c: PTS generator (timing generation means)
19: Formatting circuit (formatting means)
21: Deformatting circuit (separation means)
22: Unpacking circuit
22a: Input buffer
24D1, 24D2, 23D1 to 23D4: Prediction circuits (expansion means)
100: Control unit (readout means)
110: Output buffer

Continuation of front page: (58) Fields searched (Int. Cl.7, DB name): G10L 19/00 - 19/14; G11B 20/10 - 20/12; H03M 7/30 - 7/40

Claims

(57) [Claims]

An audio encoding device comprising: compression means that, for each channel of a multi-channel audio signal, taken either as-is or after the channels have been correlated with one another, obtains a leading sample value in response to the input audio signal, predicts linear prediction values of the current signal from past signals in the time domain by a plurality of linear prediction methods having different characteristics, and compresses the signal by selecting the linear prediction method that minimizes the prediction residual obtained from the predicted linear prediction value and the audio signal; timing generation means that generates an access unit search pointer for search playback of the access unit a predetermined time before or a predetermined time after in the compressed data; and means for formatting into a packet having user data that includes a private header containing the access unit search pointer and the compressed data containing the access unit.