JP4305869B2

JP4305869B2 - Speech encoding method and speech decoding method

Info

Publication number: JP4305869B2
Application number: JP2005236430A
Authority: JP
Inventors: 美昭田中; 昭治植野
Original assignee: Victor Company of Japan Ltd
Current assignee: Victor Company of Japan Ltd
Priority date: 2005-08-17
Filing date: 2005-08-17
Publication date: 2009-07-29
Anticipated expiration: 2018-11-16
Also published as: JP2006023769A

Description

The present invention relates to a speech encoding method and a speech decoding method for variable-length compression of multi-channel audio signals.

As a method of variable-length compression of an audio signal, the present inventors proposed a predictive encoding method in an earlier application (Japanese Patent Application No. H9-289159). In that method, for a one-channel original digital audio signal, a plurality of predictors having different characteristics each calculate a linear prediction value of the current signal from past signals in the time domain; a prediction residual is calculated for each predictor from the original digital audio signal and these linear prediction values; and the minimum prediction residual is selected.

The above method achieves a certain degree of compression when the original digital audio signal has a sampling frequency of 96 kHz and about 20 quantization bits. Recent DVD-Audio discs, however, tend to use twice that sampling frequency (192 kHz) and 24 quantization bits, so the compression rate needs to be improved. In addition, in multi-channel audio the sampling frequency and the number of quantization bits may differ from channel to channel.

A compression scheme such as predictive encoding has a variable compression rate (VBR: variable bit rate), so when a multi-channel audio signal is predictively encoded, the amount of data in each channel varies greatly over time. Moreover, when such data is transmitted, it is sent as a single data stream rather than in parallel for each channel.

Therefore, for the playback (decoding) side to reproduce (present) such a variable-length data stream with the channels synchronized, two times must be managed: a decoding time, which indicates when the data stream accumulated in the input buffer is read out and passed to the decoder, and a presentation time, which indicates when the decoded data accumulated in the output buffer is read out and output (presented) to a speaker or the like. The playback side must also manage the time used for search playback of such a variable-length data stream.

Accordingly, an object of the present invention is to provide a speech encoding method and a speech decoding method that allow the processing times on the playback side to be managed when a multi-channel audio signal is encoded at a variable compression rate.

To achieve the above object, the present invention comprises the means described in 1) and 2) below.

1) A speech encoding method comprising:
a step of, in response to audio signals input for each channel, the channels being either the channels of a multi-channel audio signal as they are or channels obtained by correlating them with one another, obtaining a head sample value, predicting linear prediction values of the current signal from past signals in the time domain by each of a plurality of linear prediction methods having different characteristics, and performing predictive encoding by selecting the linear prediction method that minimizes the prediction residual obtained from the predicted linear prediction value and the audio signal;
a step of, when packing the predictive encoded data, which includes the selected linear prediction method, the prediction residual, and a predetermined head sample value for each channel, packing the prediction residual with a number of bits based on bit-number information;
a step of generating, according to the amount of the packed compressed data, decoding time stamp information indicating the timing at which the compressed data in an input buffer on the decoding side is read out, and generating presentation time stamp information indicating the timing at which the decompressed data, once accumulated on the decoding side, is output; and
a step of formatting into a packet having user data that includes a packet header containing the decoding time stamp information and the presentation time stamp information, a compressed PCM private header containing the UPC/EAN number and ISRC data of the audio signal, and a compressed PCM access unit containing the compressed data.
2) A speech decoding method for decoding the original audio signals of the plurality of channels from data encoded by the speech encoding method according to claim 1, comprising:
a step of decoding the UPC/EAN number and ISRC data placed in the compressed PCM private header;
a step of decoding the stored subpacket based on the decoding time stamp information and separating the compressed PCM access unit;
a step of decoding the prediction residual of the compressed data in the separated compressed PCM access unit with a number of bits based on the bit information, and calculating a prediction value for each channel based on the decoded prediction residual, the head sample value, and the linear prediction method;
a step of restoring the original audio signals of the plurality of channels from the calculated prediction values as audio data; and
a step of retrieving the restored audio data based on the presentation time stamp information.

As described above, according to the present invention, decoding time stamp information indicating the timing at which compressed data is read out is placed in the packet header, so the playback side can perform search playback when audio signals of a plurality of channels are encoded at a variable compression rate.

Embodiments of the present invention will be described below with reference to the drawings. FIG. 1 is a block diagram showing a first embodiment of a speech encoding apparatus to which the present invention is applied and the corresponding speech decoding apparatus. FIG. 2 is a block diagram showing the encoding unit of FIG. 1 in detail. FIG. 3 is an explanatory diagram showing the bitstream encoded by the encoding unit of FIGS. 1 and 2. FIG. 4 is an explanatory diagram showing the format of a DVD pack. FIG. 5 is an explanatory diagram showing the format of a DVD audio pack. FIG. 6 is a block diagram showing the decoding unit of FIG. 1 in detail. FIG. 7 is a timing chart showing the write/read timing of the input buffer of FIG. 6. FIG. 8 is an explanatory diagram showing the amount of compressed data in each access unit. FIG. 9 is an explanatory diagram showing access units and presentation units.

Here, the following four schemes, for example, are known as multi-channel schemes.
(1) 4-channel scheme: as in Dolby Surround, three front channels L, C, and R plus one rear channel S, four channels in total.
(2) 5-channel scheme: as in Dolby AC-3 without the SW channel, three front channels L, C, and R plus two rear channels SL and SR, five channels in total.
(3) 6-channel scheme: as in DTS (Digital Theater System) and Dolby AC-3, six channels (L, C, R, SW (Lfe), SL, SR).
(4) 8-channel scheme: as in SDDS (Sony Dynamic Digital Sound), six front channels L, LC, C, RC, R, and SW plus two rear channels SL and SR, eight channels in total.
The 6-channel (ch) mix and matrix circuit 1' on the encoding side shown in FIG. 1 takes, as an example of a multi-channel signal, six channels of PCM data, namely front left (Lf), center (C), front right (Rf), surround left (Ls), surround right (Rs), and Lfe (Low Frequency Effect). Using the following equation (1), it converts them into two channels "1" and "2" for the front group and four channels "3" to "6" for the other group. It outputs the two channels "1" and "2" to the first encoding unit 2'-1 and the four channels "3" to "6" to the second encoding unit 2'-2.

"1" = Lf + Rf
"2" = Lf - Rf
"3" = C - (Ls + Rs)/2
"4" = Ls + Rs
"5" = Ls - Rs
"6" = Lfe - a × C
where 0 ≤ a ≤ 1 ... (1)
As shown in detail in FIG. 2, the first and second encoding units 2'-1 and 2'-2, which constitute the encoding unit 2', predictively encode the PCM data of the two channels "1" and "2" and of the four channels "3" to "6", respectively, and transmit the predictive encoded data to the decoding side via the recording medium 5 or the communication medium 6 as a bitstream such as that shown in FIG. 3. On the decoding side, as shown in detail in FIG. 6, the first and second decoding units 3'-1 and 3'-2, which constitute the decoding unit 3', decode the predictive encoded data of the two channels "1" and "2" of the front group and of the four channels "3" to "6" of the other group, respectively, into PCM data.
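
The conversion of equation (1) and its inverse on the decoding side can be sketched as follows. This is a minimal illustration, not the circuit of the patent: the function names are hypothetical, and the integer handling of (Ls + Rs)/2 and a × C (floor shift and truncation) is an assumption chosen only so that the round trip is exact.

```python
def matrix_encode(lf, c, rf, ls, rs, lfe, a=0.5):
    """Equation (1): map six PCM samples to the correlated channels "1".."6"."""
    ch1 = lf + rf
    ch2 = lf - rf
    ch4 = ls + rs
    ch5 = ls - rs
    ch3 = c - (ch4 >> 1)      # assumption: (Ls + Rs)/2 is taken with a floor shift
    ch6 = lfe - int(a * c)    # assumption: a*C is truncated to an integer
    return ch1, ch2, ch3, ch4, ch5, ch6


def matrix_decode(ch1, ch2, ch3, ch4, ch5, ch6, a=0.5):
    """Invert equation (1) using the same rounding as matrix_encode."""
    lf = (ch1 + ch2) >> 1     # ch1 + ch2 = 2*Lf, always even
    rf = (ch1 - ch2) >> 1     # ch1 - ch2 = 2*Rf
    ls = (ch4 + ch5) >> 1
    rs = (ch4 - ch5) >> 1
    c = ch3 + (ch4 >> 1)      # C must be recovered before Lfe
    lfe = ch6 + int(a * c)
    return lf, c, rf, ls, rs, lfe
```

Because the decoder recomputes the same rounded terms from values it has already recovered, the original samples come back exactly under these assumptions.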

Next, the mix and matrix circuit 4' restores the original six channels (Lf, C, Rf, Ls, Rs, Lfe) based on equation (1), and generates stereo 2-channel data (L, R) from these original six channels and the coefficients mij (i = 1, 2; j = 1 to 6) according to the following equation (2).

L = m11·Lf + m12·Rf + m13·C + m14·Ls + m15·Rs + m16·Lfe
R = m21·Lf + m22·Rf + m23·C + m24·Ls + m25·Rs + m26·Lfe ... (2)
The encoding units 2'-1 and 2'-2 will now be described in detail with reference to FIG. 2. The PCM data of each channel "1" to "6" is stored in the one-frame buffer 10 frame by frame. The sample data of one frame of each channel "1" to "6" is applied to the prediction circuits 13D1, 13D2, and 15D1 to 15D4, respectively, and the head sample data of each frame of each channel is applied to the formatting circuit 19. Each of the prediction circuits 13D1, 13D2, and 15D1 to 15D4 uses a plurality of predictors (not shown) having different characteristics to calculate, for the PCM data of its channel, a plurality of linear prediction values of the current signal from past signals in the time domain, and then calculates a prediction residual for each predictor from the original PCM data and these linear prediction values. The subsequent buffer/selectors 14D1, 14D2, and 16D1 to 16D4 each temporarily store the prediction residuals calculated by the corresponding prediction circuit and select the minimum prediction residual for each subframe designated by the selection signal/DTS (decoding time stamp) generator 17.
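
The per-subframe selection among several predictors can be sketched as below. The patent does not show the predictors themselves, so the predictor bank here (fixed polynomial predictors of order 0 to 3) and the selection criterion (smallest peak absolute residual in the subframe) are assumptions made for illustration; all names are hypothetical.

```python
# Hypothetical predictor bank: each predictor maps the past samples to a prediction.
PREDICTORS = [
    lambda h: 0,                               # order 0: predict zero
    lambda h: h[-1],                           # order 1: previous sample
    lambda h: 2 * h[-1] - h[-2],               # order 2: linear extrapolation
    lambda h: 3 * h[-1] - 3 * h[-2] + h[-3],   # order 3: quadratic extrapolation
]


def residuals(samples, predict, history):
    """Residual (sample minus prediction) for each sample under one predictor."""
    past = list(history)       # three samples preceding the subframe
    out = []
    for s in samples:
        out.append(s - predict(past))
        past.append(s)
    return out


def select_predictor(samples, history=(0, 0, 0)):
    """Return (predictor index, residuals) for the predictor with the smallest peak residual."""
    candidates = [residuals(samples, p, history) for p in PREDICTORS]
    best = min(range(len(PREDICTORS)),
               key=lambda i: max(abs(r) for r in candidates[i]))
    return best, candidates[best]
```

The returned index plays the role of the predictor selection flag, and the returned residuals are what would be handed on for packing.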

The selection signal/DTS generator 17 applies a bit-number flag for the prediction residuals to the packing circuit 18 and the formatting circuit 19. It also applies to the formatting circuit 19 a predictor selection flag indicating the predictor with the minimum prediction residual, the correlation coefficient a of equation (1), and a DTS indicating the time at which the decoding side takes stream data out of the input buffer 22a (FIG. 6). The packing circuit 18 packs the six channels of prediction residuals selected by the buffer/selectors 14D1, 14D2, and 16D1 to 16D4, using the number of bits specified by the bit-number flag from the selection signal/DTS generator 17. The PTS generator 17c generates a PTS (presentation time stamp) indicating the time at which the decoding side takes PCM data out of the output buffer 110 (FIG. 6), and outputs it to the formatting circuit 19.
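
Packing residuals with the number of bits given by the bit-number flag might look like the following sketch. The two's-complement field layout, big-endian bit order, and zero padding to a byte boundary are assumptions; the patent only states that the residuals are packed with the designated number of bits.

```python
def bits_needed(residuals):
    """Smallest two's-complement width (at least 1) that holds every residual."""
    width = 1
    while not all(-(1 << (width - 1)) <= r < (1 << (width - 1)) for r in residuals):
        width += 1
    return width


def pack(residuals, width):
    """Concatenate residuals as width-bit two's-complement fields, zero-padded to bytes."""
    acc = 0
    for r in residuals:
        acc = (acc << width) | (r & ((1 << width) - 1))
    total = width * len(residuals)
    pad = (-total) % 8
    return (acc << pad).to_bytes((total + pad) // 8, "big")


def unpack(data, width, count):
    """Recover count signed residuals of the given width from packed bytes."""
    acc = int.from_bytes(data, "big") >> (len(data) * 8 - width * count)
    out = []
    for i in range(count):
        v = (acc >> (width * (count - 1 - i))) & ((1 << width) - 1)
        if v >= 1 << (width - 1):      # restore the sign
            v -= 1 << width
        out.append(v)
    return out
```

Here `bits_needed` stands in for the bit-number flag: the decoder needs that width, plus the residual count, to split the packed string again.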

The subsequent formatting circuit 19 formats the data into user data as shown in FIGS. 3 to 5. The user data (subpacket) shown in FIG. 3 consists of a variable-rate bitstream (substream) BS0 containing the predictive encoded data of the two channels "1" and "2" of the front group, a variable-rate bitstream (substream) BS1 containing the predictive encoded data of the four channels "3" to "6" of the other group, and a bitstream header (restart header) placed before the substreams BS0 and BS1.
In one frame of the substreams BS0 and BS1, the following are multiplexed:
・a frame header,
・the head sample data of one frame of each channel "1" to "6",
・a predictor selection flag for each subframe of each channel "1" to "6",
・a bit-number flag for each subframe of each channel "1" to "6",
・the prediction residual data string (variable number of bits) of each channel "1" to "6", and
・the coefficient a for channel "6".
With such predictive encoding, a compression rate of 71% can be achieved when the original signal has, for example, a sampling frequency of 96 kHz, 24 quantization bits, and six channels.

When the variable-rate bitstream data predictively encoded by the encoding units 2'-1 and 2'-2 shown in FIG. 2 is recorded on a DVD-Audio disc, as an example of a recording medium, it is packed into the audio (A) pack shown in FIG. 4. This pack consists of 2034 bytes of user data (A packet, V packet) preceded by a 14-byte pack header made up of 4 bytes of pack start information, 6 bytes of SCR (System Clock Reference) information, 3 bytes of Mux rate information, and 1 byte of stuffing (one pack = 2048 bytes in total). The SCR information, which is a time stamp, is set to "1" in the first pack and runs continuously within the same title, which allows the time of the A packs within that title to be managed.
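
The byte budget of one pack can be checked with a short sketch. Only the field sizes come from the text; the field contents used here (zero bytes for the start information and Mux rate, the SCR stored as a plain big-endian integer, 0xFF stuffing) are placeholders and do not reproduce the real bit-level pack header.

```python
# Field sizes in bytes, as stated in the text.
PACK_START, SCR, MUX_RATE, STUFFING = 4, 6, 3, 1
USER_DATA = 2034
PACK_SIZE = PACK_START + SCR + MUX_RATE + STUFFING + USER_DATA


def build_pack(user_data: bytes, scr: int) -> bytes:
    """Lay out one pack using placeholder field contents (sizes only are meaningful)."""
    if len(user_data) != USER_DATA:
        raise ValueError("user data must be exactly 2034 bytes")
    header = (
        b"\x00" * PACK_START          # placeholder for the pack start information
        + scr.to_bytes(SCR, "big")    # assumption: SCR written as a plain integer
        + b"\x00" * MUX_RATE          # placeholder for the Mux rate
        + b"\xff" * STUFFING          # placeholder stuffing byte
    )
    return header + user_data
```

With these sizes the header is 14 bytes and the whole pack is 2048 bytes, matching the totals given above.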

As shown in detail in FIG. 5, the compressed PCM A packet consists of a 19- or 14-byte packet header, a compressed PCM private header, and 1 to 2011 bytes of audio data (compressed PCM) in the format shown in FIG. 3. The DTS and PTS are set in the packet header of FIG. 5 (specifically, the PTS in bytes 10 to 14 and the DTS in bytes 15 to 19 of the packet header). The compressed PCM private header consists of:
・a 1-byte substream ID,
・2 bytes of UPC/EAN-ISRC (Universal Product Code/European Article Number-International Standard Recording Code) number and UPC/EAN-ISRC data,
・a 1-byte private header length,
・a 2-byte first access unit pointer,
・8 bytes of audio data information (ADI), and
・0 to 7 stuffing bytes.
Within the ADI, a forward access unit search pointer for searching for the access unit one second later and a backward access unit search pointer for searching for the access unit one second earlier are each set in 1 byte (specifically, the forward access unit search pointer in the 7th byte and the backward access unit search pointer in the 8th byte of the ADI).
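
Reading the fields of the compressed PCM private header could be sketched as follows. The field order and sizes follow the list above; the big-endian byte order, the treatment of the 2-byte UPC/EAN-ISRC field as opaque bytes, and all class and function names are assumptions for illustration.

```python
from dataclasses import dataclass


@dataclass
class PrivateHeader:
    substream_id: int
    upc_ean_isrc: bytes               # 2 bytes, kept opaque in this sketch
    header_length: int
    first_access_unit_pointer: int
    adi: bytes                        # 8 bytes of audio data information
    forward_search_pointer: int       # 7th byte of the ADI
    backward_search_pointer: int      # 8th byte of the ADI


def parse_private_header(buf: bytes) -> PrivateHeader:
    """Parse the fixed 14-byte part of the header; stuffing bytes are ignored."""
    if len(buf) < 14:
        raise ValueError("private header needs at least 14 bytes")
    adi = buf[6:14]
    return PrivateHeader(
        substream_id=buf[0],
        upc_ean_isrc=buf[1:3],
        header_length=buf[3],
        first_access_unit_pointer=int.from_bytes(buf[4:6], "big"),
        adi=adi,
        forward_search_pointer=adi[6],
        backward_search_pointer=adi[7],
    )
```

A player doing search playback would read the two search pointers from the parsed ADI; how a pointer value maps to a disc position is not covered by this sketch.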

Next, the decoding units 3'-1 and 3'-2 will be described with reference to FIG. 6. The variable-rate bitstream data BS0 and BS1 in the above format is separated by the deformatting circuit 21. The head sample data of one frame and the predictor selection flags of each channel "1" to "6" are applied to the prediction circuits 24D1, 24D2, and 23D1 to 23D4, respectively, and the bit-number flags of each channel "1" to "6" are applied to the unpacking circuit 22. The SCR, the DTS, and the prediction residual data strings are applied to the input buffer 22a, and the PTS is applied to the output buffer 110. The plurality of predictors (not shown) in each of the prediction circuits 24D1, 24D2, and 23D1 to 23D4 have the same characteristics as the plurality of predictors in the encoding-side prediction circuits 13D1, 13D2, and 15D1 to 15D4, and the predictor with the same characteristic as the one used in encoding is selected by the predictor selection flag.

The stream data (prediction residual data strings) separated by the deformatting circuit 21 is taken into and accumulated in the input buffer 22a access unit by access unit according to the SCR, as shown in FIG. 7. The amount of data in one access unit corresponds to, for example, (1/96 kHz) seconds when fs = 96 kHz, but its length is variable, as shown in detail in FIGS. 8 and 9(a). The stream data accumulated in the input buffer 22a is read out in FIFO order based on the DTS and applied to the unpacking circuit 22.
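
The DTS-driven FIFO readout of the input buffer can be sketched as follows. This is only an illustration of the timing rule: the class name, the use of integer clock ticks, and the rule "release every unit whose DTS is at or before the current clock" are assumptions.

```python
from collections import deque


class InputBuffer:
    """FIFO of variable-length access units, each tagged with its DTS."""

    def __init__(self):
        self._fifo = deque()

    def push(self, dts, access_unit):
        """Store an access unit in arrival order together with its DTS."""
        self._fifo.append((dts, access_unit))

    def pop_due(self, clock):
        """Return, in FIFO order, every access unit whose DTS has been reached."""
        due = []
        while self._fifo and self._fifo[0][0] <= clock:
            due.append(self._fifo.popleft()[1])
        return due
```

The access units may differ in length, but each leaves the buffer only when the clock reaches its DTS, which is what keeps the decoder fed at the right moments.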

The unpacking circuit 22 separates the prediction residual data strings of each channel "1" to "6" based on the bit-number flags and outputs them to the prediction circuits 24D1, 24D2, and 23D1 to 23D4, respectively. In each of the prediction circuits 24D1, 24D2, and 23D1 to 23D4, the current prediction residual data of the channel from the unpacking circuit 22 is added to the previous prediction value predicted by the one predictor selected from the internal predictors by the predictor selection flag, giving the current prediction value. The PCM data of each sample is then calculated with the head sample data of the frame as the reference and accumulated in the output buffer 110. The PCM data accumulated in the output buffer 110 is read out and output based on the PTS. In this way the variable-length access units shown in FIG. 9(a) are decompressed and the fixed-length presentation units shown in FIG. 9(b) are output.
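
The reconstruction performed in each decoding-side prediction circuit can be sketched as below. The predictor bank is the same hypothetical one assumed for the encoder-side illustration (fixed polynomial predictors of order 0 to 3), and seeding the predictor with three past samples is likewise an assumption; in the patent the head sample data of the frame serves as the reference.

```python
# Hypothetical predictor bank; it must match the bank used on the encoding side.
PREDICTORS = [
    lambda h: 0,
    lambda h: h[-1],
    lambda h: 2 * h[-1] - h[-2],
    lambda h: 3 * h[-1] - 3 * h[-2] + h[-3],
]


def reconstruct(residuals, predictor_index, history=(0, 0, 0)):
    """Rebuild PCM samples by adding each residual to the selected predictor's output."""
    predict = PREDICTORS[predictor_index]   # chosen by the predictor selection flag
    past = list(history)                    # samples preceding this subframe
    out = []
    for r in residuals:
        sample = predict(past) + r
        out.append(sample)
        past.append(sample)                 # decoded samples feed later predictions
    return out
```

Because the decoder runs the same predictor on the samples it has already rebuilt, adding the transmitted residual reproduces the original PCM value exactly.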

When search playback is instructed via the operation unit 101, the control unit 100 reproduces the access unit based on the forward access unit search pointer, which points one second ahead, and the backward access unit search pointer, which points one second back, both placed in the ADI shown in FIG. 5. The search pointers may instead point two seconds ahead and two seconds back.

When the variable-rate bitstream data predictively encoded by the encoding units 2'-1 and 2'-2 shown in FIG. 2 is transmitted over a network, the encoding side packetizes it for transmission as shown in FIG. 10 (step S41), adds a packet header (step S42), and then sends the packet out onto the network (step S43).

On the decoding side, as shown in FIG. 11(A), the header is removed (step S51), the data is restored (step S52), and the data is stored in memory to await decoding (step S53). When decoding is performed, as shown in FIG. 11(B), deformatting is performed (step S61), input/output control of the input buffer 22a is performed (step S62), and unpacking is performed (step S63). If there is a search playback instruction at this point, the search pointer is decoded. Next, a predictor is selected based on the flag and decoding is performed (step S64), input/output control of the output buffer 110 is performed (step S65), the original multi-channel signal is restored (step S66) and output (step S67), and this process is then repeated.

In the above embodiment, the two channels "1" and "2" of the front group are converted by
"1" = Lf + Rf
"2" = Lf - Rf
and predictively encoded. Alternatively, the multi-channel signal may be downmixed by equation (2) to generate stereo 2-channel data (L, R), then converted by the following equation (1)'
"1" = L + R
"2" = L - R
"3" to "5": same as in equation (1)
"6" = Lfe - C ... (1)'
and predictively encoded (second embodiment). In this case, the mix and matrix circuit 4' on the decoding side can generate channel L by adding channels "1" and "2" and channel R by subtracting channel "2" from channel "1" (the sum equals 2L and the difference equals 2R).

As a third embodiment, as shown in FIG. 12, instead of the two channels "1" and "2", the multi-channel signal may be downmixed by equation (2) to generate stereo 2-channel data (L, R), and this stereo pair (L, R) and the four channels "3" to "6" may be predictively encoded. In the second and third embodiments, front left (Lf) and front right (Rf) are not transmitted to the decoding side, so the decoding side generates them using equations (1) and (2).

Next, a fourth embodiment will be described with reference to FIGS. 13 and 14. In the embodiments above, one group of correlated signals "1" to "6" is predictively encoded. In the fourth embodiment, a plurality of groups of correlated signals are generated and predictively encoded, and the predictive encoded data of the group with the highest compression rate is selected. To this end, the encoding unit shown in FIG. 13 is provided with first to n-th correlation circuits 1-1 to 1-n, which convert, for example, six channels (Lf, C, Rf, Ls, Rs, Lfe) of PCM data into n kinds of six-channel signals "1" to "6" with different correlations.

For example, the first correlation circuit 1-1 converts as follows:
"1" = Lf
"2" = C - (Ls + Rs)/2
"3" = Rf - Lf
"4" = Ls - a × Lfe
"5" = Rs - b × Rf
"6" = Lfe
and the n-th correlation circuit 1-n converts as follows:
"1" = Lf + Rf
"2" = C - Lf
"3" = Rf - Lf
"4" = Ls - Lf
"5" = Rs - Lf
"6" = Lfe - C
A prediction circuit 15 and a buffer/selector 16 are provided for each of the correlation circuits 1-1 to 1-n, and the correlation selection signal generator 17b selects the group with the highest compression rate based on the data amount of the minimum prediction residuals of each group. The formatting circuit 19 then additionally multiplexes the selection flag (the correlation circuit selection flag and the correlation coefficients a and b of that correlation circuit).
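
Choosing the group with the highest compression rate can be sketched as a comparison of estimated packed sizes. The cost model below (per channel, the residual count times a width of one sign bit plus the bit length of the largest magnitude) is a simple assumption for illustration, not the measure used by the correlation selection signal generator 17b; the function names are hypothetical.

```python
def total_bits(group_residuals):
    """Estimated packed size of one group: one residual list per channel."""
    bits = 0
    for channel in group_residuals:
        peak = max((abs(r) for r in channel), default=0)
        width = peak.bit_length() + 1      # sign bit plus magnitude bits
        bits += width * len(channel)
    return bits


def select_group(groups):
    """Return the index of the group whose residuals need the fewest bits."""
    return min(range(len(groups)), key=lambda i: total_bits(groups[i]))
```

The returned index corresponds to the correlation circuit selection flag that the formatting circuit would multiplex into the stream.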

On the decoding side shown in FIG. 14, n correlation circuits 4-1 to 4-n (or a single correlation circuit, not shown, whose coefficients a and b can be changed) are provided to correspond to the encoding-side correlation circuits 1-1 to 1-n. If the n groups of prediction circuits shown in FIG. 13 have the same configuration, the decoding apparatus, as shown in FIG. 14, does not need prediction circuits for n groups; prediction circuits for one group suffice. Based on the selection flag transmitted from the encoding apparatus, one of the correlation circuits 4-1 to 4-n is selected, or the coefficients a and b are set, to restore the original six channels (Lf, C, Rf, Ls, Rs, Lfe), and the multi-channel signal is downmixed by equation (2) to generate stereo 2-channel data (L, R).
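
The downmix of equation (2) is a 2 × 6 weighted sum and can be sketched as follows. The channel order (Lf, Rf, C, Ls, Rs, Lfe) follows the order of the terms in equation (2); the coefficient values in the usage example are purely hypothetical, since the text does not give values for mij.

```python
def downmix(channels, m):
    """Equation (2): channels = (Lf, Rf, C, Ls, Rs, Lfe); m is a 2 x 6 coefficient table."""
    if len(channels) != 6 or len(m) != 2 or any(len(row) != 6 for row in m):
        raise ValueError("expected 6 channels and a 2 x 6 coefficient table")
    left = sum(mij * x for mij, x in zip(m[0], channels))    # row m11..m16
    right = sum(mij * x for mij, x in zip(m[1], channels))   # row m21..m26
    return left, right


# Hypothetical coefficients: fronts at full level, center and surrounds at half.
EXAMPLE_M = [
    [1, 0, 0.5, 0.5, 0, 0],
    [0, 1, 0.5, 0, 0.5, 0],
]
```

For example, `downmix((10, 20, 4, 2, 6, 100), EXAMPLE_M)` weights the six restored channels into one left and one right sample.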

Also, in the first embodiment described above, one kind of correlated signals "1" to "6" is predictively encoded. Alternatively, both this group of signals "1" to "6" and the group of original signals (Lf, C, Rf, Ls, Rs, Lfe) may be predictively encoded, and the group with the higher compression rate selected.

FIG. 1 is a block diagram showing a first embodiment of a speech encoding apparatus to which the present invention is applied and the corresponding speech decoding apparatus.
FIG. 2 is a block diagram showing the encoding unit of FIG. 1 in detail.
FIG. 3 is an explanatory diagram showing the bitstream encoded by the encoding unit of FIGS. 1 and 2.
FIG. 4 is an explanatory diagram showing the format of a DVD pack.
FIG. 5 is an explanatory diagram showing the format of a DVD audio pack.
FIG. 6 is a block diagram showing the decoding unit of FIG. 1 in detail.
FIG. 7 is a timing chart showing the write/read timing of the input buffer of FIG. 6.
FIG. 8 is an explanatory diagram showing the amount of compressed data in each access unit.
FIG. 9 is an explanatory diagram showing access units and presentation units.
FIG. 10 is a flowchart showing the audio transmission method.
FIG. 11 is a flowchart showing the audio transmission method.
FIG. 12 is a block diagram showing a third embodiment of a speech encoding apparatus to which the present invention is applied and the corresponding speech decoding apparatus.
FIG. 13 is a block diagram showing the speech encoding apparatus of the fourth embodiment.
FIG. 14 is a block diagram showing the speech decoding apparatus of the fourth embodiment.

Explanation of symbols

1' 6-ch mix and matrix circuit
13D1, 13D2, 15D1 to 15D4 Prediction circuits (constitute the compression means together with the buffer/selectors 14D1, 14D2, 16D1 to 16D4)
14D1, 14D2, 16D1 to 16D4 Buffer/selectors
17 Selection signal/DTS generator (timing generation means)
17c PTS generator (timing generation means)
19 Formatting circuit (formatting means)
21 Deformatting circuit (separation means)
22 Unpacking circuit
22a Input buffer
24D1, 24D2, 23D1 to 23D4 Prediction circuits (decompression means)
100 Control unit (readout means)
110 Output buffer

Claims

A speech encoding method comprising:
a step of, in response to audio signals input for each channel, the channels being either the channels of a multi-channel audio signal as they are or channels obtained by correlating them with one another, obtaining a head sample value, predicting linear prediction values of the current signal from past signals in the time domain by each of a plurality of linear prediction methods having different characteristics, and performing predictive encoding by selecting the linear prediction method that minimizes the prediction residual obtained from the predicted linear prediction value and the audio signal;
a step of, when packing the predictive encoded data, which includes the selected linear prediction method, the prediction residual, and a predetermined head sample value for each channel, packing the prediction residual with a number of bits based on bit-number information;
a step of generating, according to the amount of the packed compressed data, decoding time stamp information indicating the timing at which the compressed data in an input buffer on the decoding side is read out, and generating presentation time stamp information indicating the timing at which the decompressed data, once accumulated on the decoding side, is output; and
a step of formatting into a packet having user data that includes a packet header containing the decoding time stamp information and the presentation time stamp information, a compressed PCM private header containing the UPC/EAN number and ISRC data of the audio signal, and a compressed PCM access unit containing the compressed data.

A speech decoding method for decoding the original audio signals of the plurality of channels from data encoded by the speech encoding method according to claim 1, comprising:
a step of decoding the UPC/EAN number and ISRC data placed in the compressed PCM private header;
a step of decoding the stored subpacket based on the decoding time stamp information and separating the compressed PCM access unit;
a step of decoding the prediction residual of the compressed data in the separated compressed PCM access unit with a number of bits based on the bit information, and calculating a prediction value for each channel based on the decoded prediction residual, the head sample value, and the linear prediction method;
a step of restoring the original audio signals of the plurality of channels from the calculated prediction values as audio data; and
a step of retrieving the restored audio data based on the presentation time stamp information.