JP2002049399A

JP2002049399A - Digital signal processing method, learning method, and their apparatus, and program storage media therefor

Info

Publication number: JP2002049399A
Application number: JP2000238898A
Authority: JP
Inventors: Tetsujiro Kondo; 哲二郎近藤; Masaaki Hattori; 正明服部; Tsutomu Watanabe; 勉渡辺; Hiroto Kimura; 裕人木村
Original assignee: Sony Corp
Current assignee: Sony Corp
Priority date: 2000-08-02
Filing date: 2000-08-02
Publication date: 2002-02-15
Anticipated expiration: 2020-08-02
Also published as: JP4645869B2

Abstract

PROBLEM TO BE SOLVED: To provide a digital signal processing method, a learning method, apparatuses therefor, and a program storage medium capable of further improving the waveform reproducibility of a digital signal. SOLUTION: Power spectrum data is calculated from a digital audio signal D10; the calculated power spectrum data is normalized by its maximum value to obtain normalized data; the class of the digital audio signal D10 is determined on the basis of the normalized data; and the digital audio signal D10 is converted by a prediction method corresponding to that class. This makes it possible to perform a conversion better adapted to the characteristics of the digital audio signal D10.

Description

DETAILED DESCRIPTION OF THE INVENTION

[0001]

[Technical Field of the Invention] The present invention relates to a digital signal processing method, a learning method, apparatuses therefor, and a program storage medium, and is suitably applied to a digital signal processing method, a learning method, apparatuses therefor, and a program storage medium that perform data interpolation on a digital signal in a rate converter, a PCM (Pulse Code Modulation) decoding device, or the like.

[0002]

[Description of the Related Art] Conventionally, before a digital audio signal is input to a digital/analog converter, oversampling is performed to convert the sampling frequency to several times its original value. As a result, for the digital audio signal output from the digital/analog converter, the phase characteristic of the analog anti-aliasing filter is kept constant in the upper audible frequency range, and the influence of digital image noise accompanying sampling is eliminated.

[0003] In such oversampling, a digital filter using first-order linear (straight-line) interpolation is normally used. When the sampling rate changes or data is missing, such a digital filter takes the average of a plurality of existing data values to generate linearly interpolated data.
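
The conventional straight-line interpolation described in [0003] can be illustrated with a minimal sketch. The function name `oversample_linear` and its `factor` parameter are hypothetical and do not come from the patent; the sketch only shows why such a filter adds samples without adding new waveform information.

```python
def oversample_linear(samples, factor):
    """Upsample by `factor` using straight-line interpolation between neighbours."""
    if factor < 1:
        raise ValueError("factor must be a positive integer")
    out = []
    for left, right in zip(samples, samples[1:]):
        for step in range(factor):
            # step = 0 reproduces the existing sample; the rest lie on the line
            out.append(left + (right - left) * step / factor)
    if samples:
        out.append(samples[-1])  # keep the final original sample
    return out


print(oversample_linear([0.0, 2.0, 4.0], 2))  # [0.0, 1.0, 2.0, 3.0, 4.0]
```

With a factor of 2, each inserted value is simply the average of its two neighbours, which is the behaviour the paragraph above attributes to the conventional filter.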

[0004]

[Problems to Be Solved by the Invention] However, although linear interpolation makes the oversampled digital audio signal several times denser in data along the time axis, its frequency band is little changed from before the conversion, and the sound quality itself is not improved. Furthermore, because the interpolated data is not necessarily generated on the basis of the waveform of the analog audio signal before A/D conversion, waveform reproducibility is hardly improved either.

[0005] Likewise, when digital audio signals with different sampling frequencies are dubbed, the frequency is converted using a sampling rate converter, but even in this case a first-order linear digital filter can perform only straight-line data interpolation, and it has been difficult to improve sound quality and waveform reproducibility. The same applies when data samples of a digital audio signal are missing.

[0006] The present invention has been made in view of the above points, and proposes a digital signal processing method, a learning method, apparatuses therefor, and a program storage medium capable of further improving the waveform reproducibility of a digital audio signal.

[0007]

[Means for Solving the Problems] To solve this problem, in the present invention, power spectrum data is calculated from a digital audio signal, the calculated power spectrum data is normalized by its maximum value to obtain normalized data, the class is determined on the basis of the normalized data, and the digital audio signal is converted by a prediction method corresponding to that class. This makes it possible to perform a conversion better adapted to the characteristics of the digital audio signal.

[0008]

[Embodiments of the Invention] An embodiment of the present invention is described in detail below with reference to the drawings.

[0009] In FIG. 1, an audio signal processing device 10 generates audio data close to the true values by class classification adaptive processing when raising the sampling rate of a digital audio signal (hereinafter referred to as audio data) or interpolating audio data.

[0010] Incidentally, audio data in this embodiment is musical-sound data representing human voices, the sounds of musical instruments, and the like, as well as data representing various other sounds.

[0011] That is, in the audio signal processing device 10, a spectrum processing unit 11 constructs class taps, which are time-axis waveform data obtained by cutting the input audio data D10 supplied from an input terminal T_IN into regions of a predetermined duration (in this embodiment, for example, every six samples). For each constructed class tap, it then calculates logarithmic data by the logarithmic data calculation method described later, in accordance with control data D18 supplied from input means 18.

[0012] For the class tap of the input audio data D10 constructed at this time, the spectrum processing unit 11 calculates logarithmic data D11, which is the result of the logarithmic data calculation method and is the data to be classified, and supplies it to a class classification unit 14.

[0013] The class classification unit 14 has an ADRC (Adaptive Dynamic Range Coding) circuit unit that compresses the logarithmic data D11 supplied from the spectrum processing unit 11 to generate a compressed data pattern, and a class code generation circuit unit that generates the class code to which the logarithmic data D11 belongs.

[0014] The ADRC circuit unit forms pattern-compressed data by performing an operation on the logarithmic data D11 that compresses it, for example, from 8 bits to 2 bits. The ADRC circuit unit performs adaptive quantization; because it can efficiently represent a local pattern of signal levels with a short word length, it is used here to generate codes for classifying signal patterns.

[0015] Specifically, classifying six 8-bit data values (logarithmic data) directly would require 2⁴⁸ classes, an enormous number that places a heavy burden on the circuit. The class classification unit 14 of this embodiment therefore classifies on the basis of the pattern-compressed data generated by its internal ADRC circuit unit. For example, if 1-bit quantization is applied to six logarithmic data values, they can be represented in 6 bits and classified into 2⁶ = 64 classes.

[0016] Here, letting DR be the dynamic range within the cut-out region, m the bit allocation, L the data level of each logarithmic data value, and Q the quantization code, the ADRC circuit unit, in accordance with the equation

[0017]

(Equation 1)

[0018] quantizes by dividing the range between the maximum value MAX and the minimum value MIN within the region evenly with the specified bit length. In equation (1), { } denotes truncation of the fractional part. Thus, if each of the six logarithmic data values calculated in the spectrum processing unit 11 consists of, for example, 8 bits (m = 8), each is compressed to 2 bits in the ADRC circuit unit.
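
Equation (1) is not reproduced in this text, so the following is only a sketch of the quantization that [0016] to [0018] describe. It assumes the commonly used ADRC form DR = MAX - MIN + 1 and Q = {(L - MIN + 0.5) × 2^bits / DR}; that form, the function name `adrc_quantize`, and the parameter name `bits` are assumptions, not text taken from the patent.

```python
import math


def adrc_quantize(levels, bits):
    """Requantize each level to `bits` bits relative to the local MIN..MAX range."""
    lo, hi = min(levels), max(levels)
    dr = hi - lo + 1  # assumed definition of the dynamic range DR
    # math.floor plays the role of the truncation written as { } in the text
    return [math.floor((level - lo + 0.5) * (1 << bits) / dr) for level in levels]


# Six 8-bit values compressed to 2 bits each
print(adrc_quantize([0, 255, 128, 64, 192, 32], 2))  # [0, 3, 2, 1, 3, 0]
```

Because the range is taken from the tap itself, the same 2-bit pattern results whether the tap is loud or quiet, which is what makes the code usable as a waveform-pattern classifier.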

[0019] Letting the logarithmic data compressed in this way be q_n (n = 1 to 6), the class code generation circuit unit provided in the class classification unit 14 executes, on the basis of the compressed logarithmic data q_n, the operation shown in the equation

[0020]

(Equation 2)

[0021] to calculate a class code class indicating the class to which the block (q₁ to q₆) belongs, and supplies class code data D14 representing the calculated class code class to a prediction coefficient memory 15. The class code class indicates the read address used when prediction coefficients are read from the prediction coefficient memory 15. In equation (2), n is the number of compressed logarithmic data values q_n, with n = 6 in this embodiment, and P is the bit allocation, with P = 2 in this embodiment.
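
Equation (2) is likewise not reproduced here. A minimal sketch of one way to pack the compressed values into a single class code follows, assuming the positional form class = Σ q_i · (2^P)^(i-1); that form and the names `class_code` and `bits` are assumptions made for illustration.

```python
def class_code(codes, bits):
    """Pack quantization codes into one integer, treating them as base-2**bits digits."""
    base = 1 << bits
    value = 0
    for position, q in enumerate(codes):
        if not 0 <= q < base:
            raise ValueError("each code must fit in the given bit allocation")
        value += q * base ** position  # q_1 is the least significant digit
    return value


print(class_code([0, 3, 2, 1, 3, 0], 2))  # 876
print(class_code([1, 1, 1, 1, 1, 1], 1))  # 63, the last of the 64 one-bit classes
```

With n = 6 and P = 2 the code ranges over 4⁶ = 4096 values, so it can serve directly as a read address into a coefficient table of that size.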

[0022] In this way, the class classification unit 14 generates the class code data D14 for the logarithmic data D11 calculated from the input audio data D10, and supplies it to the prediction coefficient memory 15.

[0023] In the prediction coefficient memory 15, the set of prediction coefficients corresponding to each class code is stored at the address corresponding to that class code. On the basis of the class code data D14 supplied from the class classification unit 14, the set of prediction coefficients W₁ to W_n stored at the address corresponding to the class code is read out and supplied to a prediction calculation unit 16.

[0024] The prediction calculation unit 16 takes the audio waveform data (prediction taps) D13 (X₁ to X_n), which a prediction calculation unit extraction unit 13 has cut out from the input audio data D10 in the time-axis domain for the prediction calculation, together with the prediction coefficients W₁ to W_n, and performs the product-sum operation shown in the equation

[0025]

(Equation 3)

[0026] to obtain a prediction result y′. This predicted value y′ is output from the prediction calculation unit 16 as audio data D16 with improved sound quality.
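
The lookup and product-sum described in [0023] to [0026] can be sketched as follows, assuming equation (3) is the plain weighted sum y′ = W₁X₁ + ... + W_nX_n. The dictionary `coefficient_memory`, its single entry, and the coefficient values are hypothetical placeholders, not learned values from the patent.

```python
def predict(taps, coefficients):
    """Product-sum of prediction taps X_1..X_n and coefficients W_1..W_n."""
    if len(taps) != len(coefficients):
        raise ValueError("taps and coefficients must have the same length")
    return sum(w * x for w, x in zip(coefficients, taps))


# Hypothetical coefficient table addressed by class code
coefficient_memory = {876: [0.25, 0.25, 0.25, 0.25]}

y = predict([1.0, 2.0, 3.0, 4.0], coefficient_memory[876])
print(y)  # 2.5
```

The arithmetic is a fixed linear filter per class; the adaptivity comes entirely from choosing a different coefficient set for each class code.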

[0027] Although the functional blocks described above with reference to FIG. 1 have been shown as the configuration of the audio signal processing device 10, in this embodiment a computer-based device shown in FIG. 2 is used as the concrete implementation of these functional blocks. That is, in FIG. 2, the audio signal processing device 10 has a configuration in which a CPU 21, a ROM (Read Only Memory) 22, a RAM (Random Access Memory) 15 constituting the prediction coefficient memory 15, and the individual circuit units are connected to one another via a bus BUS. By executing the various programs stored in the ROM 22, the CPU 21 operates as each of the functional blocks described above with reference to FIG. 1 (the spectrum processing unit 11, the prediction calculation unit extraction unit 13, the class classification unit 14, and the prediction calculation unit 16).

[0028] The audio signal processing device 10 also has a communication interface 24 for communicating with a network and a removable drive 28 for reading information from an external storage medium such as a floppy disk or a magneto-optical disk. The programs for performing the class classification adaptive processing described above with reference to FIG. 1 can be read onto the hard disk of a hard disk device 25 via the network or from the external storage medium, and the class classification adaptive processing can be performed in accordance with the programs thus read.

[0029] By entering various commands via the input means 18, such as a keyboard or a mouse, the user causes the CPU 21 to execute the class classification processing described above with reference to FIG. 1. In this case, the audio signal processing device 10 receives the audio data whose sound quality is to be improved (input audio data) D10 via a data input/output unit 27, applies the class classification adaptive processing to the input audio data D10, and can then output the audio data D16 with improved sound quality to the outside via the data input/output unit 27.

[0030] FIG. 3 shows the procedure of the class classification adaptive processing in the audio signal processing device 10. On entering the procedure at step SP101, the audio signal processing device 10 calculates, in the following step SP102, the logarithmic data D11 of the input audio data D10 in the spectrum processing unit 11.

[0031] The calculated logarithmic data D11 represents the characteristics of the input audio data D10. The audio signal processing device 10 proceeds to step SP103, where the class classification unit 14 determines the class on the basis of the logarithmic data D11. The audio signal processing device 10 then reads prediction coefficients from the prediction coefficient memory 15 using the class code obtained by the classification. The prediction coefficients have been stored for each class in advance through learning, so by reading the prediction coefficients corresponding to the class code, the audio signal processing device 10 can use prediction coefficients that match the characteristics of the logarithmic data D11 at that time.

[0032] The prediction coefficients read from the prediction coefficient memory 15 are used in the prediction calculation of the prediction calculation unit 16 in step SP104. The input audio data D10 is thereby converted into the desired audio data D16 by a prediction calculation adapted to the characteristics of its logarithmic data D11. The input audio data D10 is thus converted into the audio data D16 with improved sound quality, and the audio signal processing device 10 proceeds to step SP105 and ends the procedure.

[0033] Next, the method by which the spectrum processing unit 11 of the audio signal processing device 10 calculates the logarithmic data D11 of the input audio data D10 is described.

[0034] FIG. 4 shows the logarithmic data calculation procedure of the logarithmic data calculation method in the spectrum processing unit 11. On entering the procedure at step SP1, the spectrum processing unit 11 constructs, in the following step SP2, class taps, which are time-axis waveform data obtained by cutting the input audio data D10 into regions of a predetermined duration, and proceeds to step SP3.

[0035] In step SP3, letting the window function be "W(k)", the spectrum processing unit 11 multiplies the class tap by the Hamming window shown in the equation

[0036]

(Equation 4)

[0037] to calculate multiplied data, and proceeds to step SP4. In this window-function multiplication, the first and last values of each constructed class tap are made equal in order to improve the accuracy of the frequency analysis performed in the following step SP4. In equation (4), "N" is the number of samples in the Hamming window, and "k" is the index of the sample.
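
Equation (4) is not reproduced in this text, so the sketch below uses the textbook Hamming window w[k] = 0.54 - 0.46·cos(2πk/(N-1)) as a stand-in. The coefficients, the N-1 denominator, and the function names `hamming` and `apply_window` are assumptions and may differ from the window actually given in the patent.

```python
import math


def hamming(n):
    """Textbook Hamming window of length n (assumed form, see the note above)."""
    if n < 2:
        raise ValueError("window length must be at least 2")
    return [0.54 - 0.46 * math.cos(2 * math.pi * k / (n - 1)) for k in range(n)]


def apply_window(tap):
    """Multiply a class tap by the window, sample by sample."""
    return [x * w for x, w in zip(tap, hamming(len(tap)))]


window = hamming(6)
print(window[0], window[-1])  # both ends carry the same small weight
```

Because both ends of the window have the same weight, a tap with equal first and last samples stays matched at its edges after multiplication, which is the property [0037] relies on for the subsequent frequency analysis.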

[0038] In step SP4, the spectrum processing unit 11 applies a fast Fourier transform (FFT) to the multiplied data to calculate power spectrum data such as that shown in FIG. 5, and proceeds to step SP5.

[0039] In step SP5, the spectrum processing unit 11 extracts only the significant power spectrum data from the power spectrum data.

[0040] In this extraction, of the power spectrum data calculated from the N multiplied data values, the power spectrum data group AR2 (FIG. 5) to the right of N/2 has almost the same components as the power spectrum data group AR1 (FIG. 5) on the left, from zero to N/2 (that is, the two are mirror-symmetric). This shows that, within the frequency band of the N multiplied data values, the power spectrum components at two frequency points equidistant from the two ends are conjugates of each other. The spectrum processing unit 11 therefore takes only the left power spectrum data group AR1 (FIG. 5), from zero to N/2, as the target of extraction.
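
The symmetry described in [0040] can be checked with a small sketch. For self-containment it uses a naive discrete Fourier transform rather than an FFT; the two give the same values, only at different cost. The names `power_spectrum` and `left_half` are hypothetical.

```python
import cmath


def power_spectrum(x):
    """Squared magnitude of the DFT of a real sequence (naive O(N^2) version)."""
    n = len(x)
    spectrum = []
    for k in range(n):
        s = sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
        spectrum.append(abs(s) ** 2)
    return spectrum


def left_half(ps):
    """Keep bins 0..N/2, the group called AR1 in the text."""
    return ps[: len(ps) // 2 + 1]


signal = [1.0, 0.0, -1.0, 0.0, 1.0, 0.0, -1.0, 0.0]  # one tone at bin 2 of 8
ps = power_spectrum(signal)
print([round(p, 6) for p in ps])             # energy appears at bins 2 and 6
print([round(p, 6) for p in left_half(ps)])  # the mirrored half is dropped
```

For real input the bins k and N - k carry complex-conjugate DFT values and therefore equal power, so discarding the right half loses no information.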

[0041] From the power spectrum data group AR1 thus targeted, the spectrum processing unit 11 then extracts data while excluding the m power spectrum data values other than those the user has selected and set in advance via the input means 18 (FIGS. 1 and 2).

[0042] Specifically, when the user has made a selection via the input means 18 to, for example, further improve the quality of human voices, control data D18 corresponding to that selection is output from the input means 18 to the spectrum processing unit 11 (FIGS. 1 and 2). The spectrum processing unit 11 then extracts, from the power spectrum data group AR1 (FIG. 5), only the power spectrum data from about 500 Hz to about 4 kHz, which is significant for human voices (that is, the power spectrum data outside roughly 500 Hz to 4 kHz are the m power spectrum data values to be excluded).

[0043] Similarly, when the user has made a selection via the input means 18 (FIGS. 1 and 2) to, for example, further improve the quality of music, control data D18 corresponding to that selection is output from the input means 18 to the spectrum processing unit 11. The spectrum processing unit 11 then extracts, from the power spectrum data group AR1 (FIG. 5), only the power spectrum data from about 20 Hz to about 20 kHz, which is significant for music (that is, the power spectrum data outside roughly 20 Hz to 20 kHz are the m power spectrum data values to be excluded).

[0044] In this way, the control data D18 output from the input means 18 (FIGS. 1 and 2) determines the frequency components to be extracted as significant power spectrum data, and reflects the intention of the user who makes the selection manually via the input means 18 (FIGS. 1 and 2).

[0045] Accordingly, the spectrum processing unit 11, which extracts power spectrum data in accordance with the control data D18, extracts as significant power spectrum data the frequency components of the specific audio component that the user wishes to have output with high sound quality.

[0046] Incidentally, because the aim is to represent the pitch of the original waveform, the spectrum processing unit 11 also excludes from the targeted power spectrum data group AR1 the power spectrum data of the DC component, which carries no significant feature.

[0047] Thus, in step SP5, the spectrum processing unit 11, in accordance with the control data D18, excludes the m power spectrum data values and also the power spectrum data of the DC component from the power spectrum data group AR1 (FIG. 5), extracting only the minimum necessary power spectrum data, that is, only the significant power spectrum data, and proceeds to the following step SP6.
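
The band selection of step SP5 can be sketched as a filter over bin indices. The sketch assumes the usual mapping from bin index k to frequency, k × sample_rate / n_fft; the sampling rate, the function name `select_band`, and the example values are all hypothetical, since the text does not state them.

```python
def select_band(ps_half, sample_rate, n_fft, f_lo, f_hi):
    """Return (bin, power) pairs inside [f_lo, f_hi], always dropping the DC bin."""
    kept = []
    for k, power in enumerate(ps_half):
        freq = k * sample_rate / n_fft  # assumed bin-to-frequency mapping
        if k != 0 and f_lo <= freq <= f_hi:
            kept.append((k, power))
    return kept


# Bins 0..4 of an 8-point spectrum sampled at 8 kHz sit at 0, 1, 2, 3 and 4 kHz
ps_half = [9.0, 1.0, 2.0, 3.0, 4.0]
print(select_band(ps_half, 8000, 8, 500, 4000))   # voice setting: bins 1 to 4
print(select_band(ps_half, 8000, 8, 500, 2500))   # narrower band: bins 1 and 2
```

The two band limits stand in for what the control data D18 conveys: a voice setting would pass roughly 500 Hz to 4 kHz and a music setting roughly 20 Hz to 20 kHz.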

[0048] In step SP6, for the extracted power spectrum data, the spectrum processing unit 11 follows the equation

[0049]

(Equation 5)

[0050] to calculate the maximum value (ps_max) of the extracted power spectrum data (ps[k]), follows the equation

[0051]

(Equation 6)

[0052] to normalize (divide) the extracted power spectrum data (ps[k]) by the maximum value (ps_max), and, for the reference values (psn[k]) thus obtained, follows the equation

[0053]

(Equation 7)

[0054] to perform a logarithmic (decibel) conversion.
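
Equations (5) to (7) are not reproduced here, but the three operations named in [0048] to [0054] (maximum, division by the maximum, common logarithm) can be sketched directly. The factor 10 for a power-ratio decibel value and the `floor_db` guard for zero-power bins are assumptions; the names `ps_max` and `psn` follow the text, while `log_data` is hypothetical.

```python
import math


def log_data(ps, floor_db=-120.0):
    """Normalize power values by their maximum and convert them to decibels."""
    ps_max = max(ps)                      # maximum of the extracted data
    if ps_max <= 0:
        raise ValueError("power spectrum has no positive value")
    result = []
    for value in ps:
        psn = value / ps_max              # normalized value, at most 1
        # assumed decibel scaling; floor_db avoids log10(0) for empty bins
        result.append(10.0 * math.log10(psn) if psn > 0 else floor_db)
    return result


print(log_data([100.0, 10.0, 1.0]))  # roughly 0, -10 and -20 dB
```

After normalization the largest component always maps to 0 dB, so a component a hundred times weaker still lands at a clearly distinguishable value instead of vanishing next to the peak.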

[0055] In equation (7), log is the common logarithm. In a logarithmic conversion, even a small waveform can be expressed as a decibel value (sound pressure level) relative to an arbitrary reference value. If, for example, the spectrum processing unit 11 did not logarithmically convert audio data in which a significant small waveform lies near a large waveform, the significant small waveform portion would be masked by the large waveform, because such audio data is generally quantized with a large number of bits, such as 16 bits.

[0056] The spectrum processing unit 11 would then be unable to find the characteristic portion (the significant small waveform portion). The spectrum processing unit 11 therefore performs the logarithmic conversion so that it can find even such characteristic portions (significant small waveform portions).

[0057] In addition, human sensation in response to a stimulus such as sound is roughly proportional to the logarithm of the stimulus intensity, so a quantity expressed through logarithmic conversion (that is, a decibel value) represents the degree of sensation. By performing the logarithmic conversion, the spectrum processing unit 11 consequently enables the human listener to hear the sound comfortably.

[0058] Thus, in step SP6, by normalizing by the maximum amplitude and logarithmically converting the amplitude, the spectrum processing unit 11 calculates logarithmic data D11 that captures even the characteristic portions (significant small waveform portions) and consequently enables the human listener to hear the sound comfortably, and proceeds to the following step SP7 to end the logarithmic data calculation procedure.

[0059] In this way, through the logarithmic data calculation procedure of the logarithmic data calculation method, the spectrum processing unit 11 can calculate logarithmic data D11 that brings out the characteristics of the signal waveform represented by the input audio data D10 even more clearly.

[0060] Next, a learning circuit for obtaining in advance, through learning, the set of prediction coefficients for each class to be stored in the prediction coefficient memory 15 described above with reference to FIG. 1 is described.

[0061] In FIG. 6, a learning circuit 30 receives high-quality teacher audio data D30 at a student signal generation filter 37. The student signal generation filter 37 thins out the teacher audio data D30 by a predetermined number of samples at predetermined time intervals, at the thinning rate set by a thinning rate setting signal D39.

[0062] In this case, the prediction coefficients generated differ according to the thinning rate in the student signal generation filter 37, and the audio data reproduced by the audio signal processing device 10 described above differs accordingly. For example, when the audio signal processing device 10 is to improve the sound quality of audio data by raising the sampling frequency, the student signal generation filter 37 performs thinning that lowers the sampling frequency. When, on the other hand, the audio signal processing device 10 is to improve sound quality by filling in missing data samples of the input audio data D10, the student signal generation filter 37 correspondingly performs thinning that drops data samples.
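
The two kinds of thinning described in [0062] can be sketched as follows. The function names `thin_by_rate` and `drop_samples` and their parameters are hypothetical. The sketch only discards samples, as the text describes; a practical decimator would normally low-pass filter first, which the source does not mention.

```python
def thin_by_rate(teacher, rate):
    """Keep every `rate`-th sample, lowering the sampling frequency by `rate`."""
    if rate < 1:
        raise ValueError("rate must be a positive integer")
    return teacher[::rate]


def drop_samples(teacher, missing_indices):
    """Remove the samples at the given positions to imitate missing data."""
    missing = set(missing_indices)
    return [x for i, x in enumerate(teacher) if i not in missing]


teacher = [0, 1, 2, 3, 4, 5, 6, 7]
print(thin_by_rate(teacher, 2))       # [0, 2, 4, 6]
print(drop_samples(teacher, [2, 5]))  # [0, 1, 3, 4, 6, 7]
```

In either case the discarded teacher samples are the true values that the learned coefficients are later expected to predict from the surviving student samples.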

[0063] Thus, the student signal generation filter 37 generates student audio data D37 from the teacher audio data D30 by predetermined thinning, and supplies it to a spectrum processing unit 31 and a prediction calculation unit extraction unit 33.

[0064] The spectrum processing unit 31 divides the student audio data D37 supplied from the student signal generation filter 37 into regions of a predetermined duration (in this embodiment, for example, every six samples). For the waveform in each divided time region, it then calculates logarithmic data D31, which is the result of the logarithmic data calculation method described above with reference to FIG. 4 and is the data to be classified, and supplies it to a class classification unit 34.

[0065] The class classification unit 34 has an ADRC circuit unit that compresses the logarithmic data D31 supplied from the spectrum processing unit 31 to generate a compressed data pattern, and a class code generation circuit unit that generates the class code to which the logarithmic data D31 belongs.

[0066] The ADRC circuit unit forms pattern compression data by performing an operation on the logarithmic data D31 that compresses it from, for example, 8 bits to 2 bits. The ADRC circuit unit performs adaptive quantization; because it can efficiently represent a local pattern of the signal level with a short word length, it is used here to generate the code for classifying signal patterns.

[0067] Specifically, classifying six 8-bit data items (logarithmic data) directly would require an enormous number of classes, 2^48, which places a heavy burden on the circuit. The class classification unit 34 of this embodiment therefore classifies on the basis of the pattern compression data generated by its internal ADRC circuit unit. For example, if 1-bit quantization is applied to six logarithmic data items, they can be represented by six bits and classified into 2^6 = 64 classes.

[0068] Here, with DR denoting the dynamic range within the cut-out region, m the bit allocation, L the data level of each logarithmic data item, and Q the quantization code, the ADRC circuit unit performs quantization by the same operation as equation (1) above, dividing the interval between the maximum value MAX and the minimum value MIN in the region evenly according to the specified bit length. Thus, if the six logarithmic data items calculated by the spectrum processing unit 31 each consist of, for example, 8 bits (m = 8), the ADRC circuit unit compresses each of them to 2 bits.
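The requantization described in this paragraph can be sketched as follows. Equation (1) lies outside this section, so the sketch assumes one common ADRC form, DR = MAX - MIN + 1 and Q = floor((L - MIN + 0.5) * 2^bits / DR); the function name `adrc_quantize` and the sample levels are illustrative only and do not come from the patent.

```python
import math

def adrc_quantize(levels, bits):
    """Requantize each level to `bits` bits relative to the block's own range.

    Assumed ADRC form (the text only refers to equation (1)):
    DR = MAX - MIN + 1, Q = floor((L - MIN + 0.5) * 2**bits / DR).
    """
    lo, hi = min(levels), max(levels)
    dr = hi - lo + 1                      # dynamic range of the block
    steps = 2 ** bits                     # number of quantization codes
    return [math.floor((lv - lo + 0.5) * steps / dr) for lv in levels]

# Six hypothetical 8-bit logarithmic data levels, compressed to 2 bits each.
block = [12, 200, 87, 255, 40, 130]
codes = adrc_quantize(block, bits=2)
print(codes)
```

Because the step size follows the block's own MIN and MAX, two blocks with the same shape but different overall level map to the same code pattern, which is what makes the codes usable as a class signature.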

[0069] Letting the logarithmic data compressed in this way be q_n (n = 1 to 6), the class code generation circuit unit provided in the class classification unit 34 executes the same operation as equation (2) above on the compressed logarithmic data q_n, thereby calculating the class code class indicating the class to which the block (q_1 to q_6) belongs, and supplies class code data D34 representing the calculated class code class to the prediction coefficient calculation unit 36. In equation (2), n represents the number of compressed logarithmic data items q_n, which is n = 6 in this embodiment, and P represents the bit allocation, which is P = 2 in this embodiment.
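As a rough illustration of how a block of compressed data becomes one class code: equation (2) is not reproduced in this section, so the sketch below assumes the usual positional form, class = sum of q_i * (2^P)^(i-1) for i = 1 to n. The function name `class_code` and the example codes are hypothetical.

```python
def class_code(codes, bits):
    """Pack n ADRC codes of `bits` bits each into one integer class code.

    Assumed form of equation (2): class = sum(q_i * (2**bits)**(i - 1)).
    """
    base = 2 ** bits
    return sum(q * base ** i for i, q in enumerate(codes))

codes = [0, 3, 1, 3, 0, 1]            # hypothetical 2-bit codes q_1..q_6
cls = class_code(codes, bits=2)
num_classes = (2 ** 2) ** len(codes)  # n = 6, P = 2 gives 4**6 classes
print(cls, num_classes)
```

With n = 6 and P = 2 this yields 4096 possible classes, compared with 2^48 for the uncompressed 8-bit data and 64 for the 1-bit example given earlier in the description.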

[0070] In this way, the class classification unit 34 generates the class code data D34 for the logarithmic data D31 supplied from the spectrum processing unit 31 and supplies it to the prediction coefficient calculation unit 36. The prediction calculation unit extraction unit 33 also cuts out the time-domain audio waveform data D33 (x_1, x_2, ..., x_n) corresponding to the class code data D34 and supplies it to the prediction coefficient calculation unit 36.

[0071] The prediction coefficient calculation unit 36 sets up a normal equation using the class code class supplied from the class classification unit 34, the audio waveform data D33 cut out for each class code class, and the high-quality teacher audio data D30 supplied from the input terminal T_IN.

[0072] That is, let the levels of n samples of the student audio data D37 be x_1, x_2, ..., x_n, and let the quantized data obtained by applying p-bit ADRC to each of them be q_1, ..., q_n. The class code class of this region is then defined as in equation (2) above. With the levels of the student audio data D37 denoted x_1, x_2, ..., x_n as described above and the level of the high-quality teacher audio data D30 denoted y, an n-tap linear estimation equation with prediction coefficients w_1, w_2, ..., w_n is set up for each class code. This is written as the following equation:

[0073]

(Equation 8)

[0074] Before learning, each w_n is an undetermined coefficient.
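The image for equation (8) is not reproduced in this text. Given the n-tap linear estimate of the teacher level y from the student levels x_1, ..., x_n described above, it presumably has the following form (a reconstruction from the surrounding definitions, not the original typography):

```latex
y = w_1 x_1 + w_2 x_2 + \cdots + w_n x_n \tag{8}
```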

[0075] The learning circuit 30 performs learning on a plurality of audio data items for each class code. When the number of data samples is M, the following equation is set up in accordance with equation (8) above:

[0076]

(Equation 9)

[0077] where k = 1, 2, ..., M.
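Equation (9) is likewise missing from this text. Applying the linear estimate of equation (8) to each of the M training samples suggests the following reconstruction, where x_{ki} (an index convention assumed here) denotes the i-th tap of the k-th sample:

```latex
y_k = w_1 x_{k1} + w_2 x_{k2} + \cdots + w_n x_{kn}, \qquad k = 1, 2, \ldots, M \tag{9}
```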

[0078] When M > n, the prediction coefficients w_1, ..., w_n are not uniquely determined, so the elements of the error vector e are defined by the following equation:

[0079]

(Equation 10)

[0080] (where k = 1, 2, ..., M), and the prediction coefficients that minimize the following equation are obtained:

[0081]

(Equation 11)

[0082] This is the so-called least squares solution.
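Equations (10) and (11) are not reproduced in this text. Since the description defines the elements of an error vector e for k = 1 to M and then minimizes by least squares, they can plausibly be reconstructed as the per-sample residual and the sum of squared residuals (the index x_{ki} for the i-th tap of the k-th sample is an assumed convention):

```latex
\begin{align}
e_k &= y_k - \left( w_1 x_{k1} + w_2 x_{k2} + \cdots + w_n x_{kn} \right) \tag{10} \\
e^2 &= \sum_{k=1}^{M} e_k^{\,2} \tag{11}
\end{align}
```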

[0083] Here, the partial derivative of equation (11) with respect to each w_n is taken. In this case, the following expression,

[0084]

(Equation 12)

[0085] is set to 0, and each w_n (n = 1 to 6) is solved for.
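Equation (12) is not reproduced in this text. Taking the squared error to be the sum over k = 1 to M of e_k^2, with e_k the difference between y_k and the linear estimate, its partial derivative with respect to one coefficient would read as follows. This is a reconstruction; the index i is used for the coefficient to avoid clashing with the tap count n, and the original may differ in sign convention.

```latex
\frac{\partial e^2}{\partial w_i}
  = \sum_{k=1}^{M} 2 \, \frac{\partial e_k}{\partial w_i} \, e_k
  = -2 \sum_{k=1}^{M} x_{ki} \, e_k ,
  \qquad i = 1, 2, \ldots, n \tag{12}
```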

[0086] Then, if X_ij and Y_i are defined as in the following equations,

[0087]

(Equation 13)

[0088]

(Equation 14)

[0089] equation (12) can be written using matrices as the following equation:

[0090]

(Equation 15)

[0091] This is the matrix form of equation (12).

[0092] This equation is generally called a normal equation. Here, n = 6.
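Equations (13) to (15) are not reproduced in this text. Setting the derivative of the squared error with respect to each coefficient to zero gives one linear equation per coefficient, which suggests the following reconstruction of the definitions and of the matrix form (standard least-squares algebra, with x_{ki} the i-th tap of the k-th sample as an assumed index convention):

```latex
\begin{align}
X_{ij} &= \sum_{k=1}^{M} x_{ki} \, x_{kj} \tag{13} \\
Y_i &= \sum_{k=1}^{M} x_{ki} \, y_k \tag{14} \\
\begin{pmatrix}
X_{11} & X_{12} & \cdots & X_{1n} \\
X_{21} & X_{22} & \cdots & X_{2n} \\
\vdots & \vdots & \ddots & \vdots \\
X_{n1} & X_{n2} & \cdots & X_{nn}
\end{pmatrix}
\begin{pmatrix} w_1 \\ w_2 \\ \vdots \\ w_n \end{pmatrix}
&=
\begin{pmatrix} Y_1 \\ Y_2 \\ \vdots \\ Y_n \end{pmatrix} \tag{15}
\end{align}
```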

[0093] After all the learning data (teacher audio data D30, class code class, and audio waveform data D33) has been input, the prediction coefficient calculation unit 36 sets up the normal equation shown in equation (15) above for each class code class, solves it for each w_n using a general matrix solution method such as the sweeping-out method (Gauss-Jordan elimination), and calculates the prediction coefficients for each class code. The prediction coefficient calculation unit 36 writes the calculated prediction coefficients (D36) into the prediction coefficient memory 15.
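A minimal sketch of this per-class learning step, assuming the normal equation has the usual least-squares form X w = Y with X_ij the sum of x_i * x_j and Y_i the sum of x_i * y over the samples of one class. The function names, the data layout, and the toy training data are illustrative assumptions, not taken from the patent.

```python
def solve_normal_equation(samples):
    """Least-squares prediction coefficients for one class.

    `samples` is a list of (x, y) pairs: x holds the n student tap levels,
    y the teacher level. Builds the augmented matrix [X | Y] and solves it
    by Gauss-Jordan elimination (sweeping-out) with partial pivoting.
    """
    n = len(samples[0][0])
    a = [[0.0] * (n + 1) for _ in range(n)]
    for x, y in samples:
        for i in range(n):
            for j in range(n):
                a[i][j] += x[i] * x[j]   # accumulate X_ij
            a[i][n] += x[i] * y          # accumulate Y_i
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        if abs(a[pivot][col]) < 1e-12:
            raise ValueError("singular normal equation; need more data")
        a[col], a[pivot] = a[pivot], a[col]
        p = a[col][col]
        a[col] = [v / p for v in a[col]]          # scale pivot row to 1
        for r in range(n):
            if r != col:                          # sweep the column out
                f = a[r][col]
                a[r] = [vr - f * vc for vr, vc in zip(a[r], a[col])]
    return [a[i][n] for i in range(n)]

def learn(training):
    """training: iterable of (class_code, x_taps, y). Returns {class: w}."""
    by_class = {}
    for cls, x, y in training:
        by_class.setdefault(cls, []).append((x, y))
    return {cls: solve_normal_equation(s) for cls, s in by_class.items()}

# Hypothetical data: class 5 follows y = 0.5*x1 + 0.25*x2 exactly.
data = [(5, [1.0, 0.0], 0.5), (5, [0.0, 1.0], 0.25), (5, [2.0, 2.0], 1.5)]
coeffs = learn(data)
print(coeffs)
```

The returned dictionary plays the role of the prediction coefficient memory: one coefficient vector per class code, looked up at conversion time.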

[0094] As a result of this learning, the prediction coefficient memory 15 stores, for each class code, the prediction coefficients for estimating the high-quality audio data y for each pattern defined by the quantized data q_1, ..., q_6. This prediction coefficient memory 15 is used in the audio signal processing device 10 described above with reference to FIG. 1. This processing completes the learning of the prediction coefficients for creating high-quality audio data from ordinary audio data in accordance with the linear estimation equation.

[0095] In this way, the learning circuit 30 can generate prediction coefficients for the interpolation processing in the audio signal processing device 10 by having the student signal generation filter 37 thin out the high-quality teacher audio data in a manner that takes into account the degree of interpolation performed in the audio signal processing device 10.

[0096] In the above configuration, the audio signal processing device 10 calculates a power spectrum on the frequency axis by applying a fast Fourier transform to the input audio data D10. Because frequency analysis (the fast Fourier transform) can reveal subtle differences that cannot be seen in the time-axis waveform data, the audio signal processing device 10 can find subtle features that cannot be found in the time domain.

[0097] In the state in which subtle features can be found (that is, once the power spectrum has been calculated), the audio signal processing device 10 extracts only the power spectrum data regarded as significant (that is, N/2 - m items) in accordance with the selection range setting means (the selection setting made manually by the user through the input means 18).

[0098] The audio signal processing device 10 can thereby further reduce the processing load and improve the processing speed.

[0099] Furthermore, the audio signal processing device 10 generates logarithmic data by normalizing the significant, minimum necessary power spectrum data by its maximum amplitude and taking the logarithm of the amplitude. This logarithmic transformation brings out characteristic portions (significant small waveform portions) as well, and as a result produces logarithmic data that lets the human listener hear the sound comfortably.

[0100] In this way, the audio signal processing device 10 performs frequency analysis to obtain power spectrum data in which subtle features can be found, extracts from it only the power spectrum data regarded as significant, and then specifies the class on the basis of the logarithmic data obtained by normalizing the extracted power spectrum data by its maximum amplitude and taking the logarithm of the amplitude.
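The chain summarized here (frequency analysis, extraction, normalization by the maximum, logarithm) might look like the sketch below. Several details are assumptions because this section does not fix them: a plain DFT stands in for the FFT, the normalization acts on power rather than amplitude, the logarithm is base 10, a small floor avoids log(0), and only bins 1 to N/2 are kept (left half, DC excluded). The function name is hypothetical.

```python
import cmath
import math

def log_spectrum_data(samples, floor=1e-12):
    """Power spectrum -> max-normalized -> log10, for one block of samples.

    Uses a direct DFT for clarity (an FFT yields the same values) and keeps
    only bins 1..N/2, i.e. the left half without the DC component.
    """
    n = len(samples)
    power = []
    for k in range(1, n // 2 + 1):
        s = sum(x * cmath.exp(-2j * math.pi * k * t / n)
                for t, x in enumerate(samples))
        power.append(abs(s) ** 2)
    peak = max(power) or 1.0             # avoid dividing by zero on silence
    return [math.log10(max(p / peak, floor)) for p in power]

# Hypothetical input: one cycle of a sine wave over eight samples.
wave = [math.sin(2.0 * math.pi * t / 8) for t in range(8)]
log_data = log_spectrum_data(wave)
print(log_data)
```

After normalization the strongest bin maps to 0 and every other bin to a negative value, so the class depends on the shape of the spectrum rather than on the overall signal level.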

[0101] The audio signal processing device 10 then performs a prediction operation on the input audio data D10 using the prediction coefficients for the class specified on the basis of the extracted significant power spectrum data, and can thereby convert the input audio data D10 into audio data D16 of still higher sound quality.

[0102] In addition, at learning time, when the prediction coefficients for each class are generated, prediction coefficients are obtained for a large number of teacher audio data items with different phases. As a result, even if a phase variation occurs during the class classification adaptive processing of the input audio data D10 in the audio signal processing device 10, processing that accommodates the phase variation can be performed.

[0103] According to the above configuration, frequency analysis yields power spectrum data in which subtle features can be found, only the power spectrum data regarded as significant is extracted from it, and the extracted power spectrum data is normalized by its maximum amplitude and its amplitude is logarithmically transformed. The input audio data D10 is then subjected to a prediction operation using the prediction coefficients based on the result of classifying the resulting logarithmic data, so that the input audio data D10 can be converted into audio data D16 of still higher sound quality.

[0104] In the above embodiment, the case of multiplying by a Hamming window as the window function has been described, but the present invention is not limited to this. Instead of the Hamming window, multiplication may be performed with any of various other window functions, such as a Hanning window or a Blackman window. Alternatively, the spectrum processing unit may be prepared in advance to multiply by any of several window functions (Hamming, Hanning, Blackman, and so on), and may multiply by the desired window function according to the frequency characteristics of the input digital audio signal.

[0105] Incidentally, when the spectrum processing unit multiplies by a Hanning window, it multiplies the class taps supplied from the cut-out unit by the Hanning window given by the following equation,

[0106]

(Equation 16)

[0107] and thereby calculates the multiplication data.

[0108] When the spectrum processing unit multiplies by a Blackman window, it multiplies the class taps supplied from the cut-out unit by the Blackman window given by the following equation,

[0109]

(Equation 17)

[0110] and thereby calculates the multiplication data.
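Equations (16) and (17) are not reproduced in this text, so the exact window definitions used in the patent are unknown. The sketch below uses the common textbook forms of the Hamming, Hanning, and Blackman windows as an assumption; the original equations may use a different indexing or normalization. The function names are hypothetical.

```python
import math

def window(name, n):
    """Textbook window coefficients w[0..n-1] (assumed forms, not the
    patent's own equations)."""
    if n < 2:
        raise ValueError("need at least two points")
    out = []
    for k in range(n):
        a = 2.0 * math.pi * k / (n - 1)
        if name == "hamming":
            out.append(0.54 - 0.46 * math.cos(a))
        elif name == "hanning":
            out.append(0.5 - 0.5 * math.cos(a))
        elif name == "blackman":
            out.append(0.42 - 0.5 * math.cos(a) + 0.08 * math.cos(2 * a))
        else:
            raise ValueError("unknown window: " + name)
    return out

def apply_window(taps, name):
    """Multiply the class taps element-wise by the chosen window."""
    return [x * w for x, w in zip(taps, window(name, len(taps)))]

# Hypothetical class taps of constant level, to expose the window shape.
shaped = apply_window([1.0] * 5, "hanning")
print(shaped)
```

Selecting the window by name mirrors the variation described above, in which the spectrum processing unit holds several window functions and picks one according to the input signal's frequency characteristics.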

[0111] In the above embodiment, the case of using the fast Fourier transform has been described, but the present invention is not limited to this; various other frequency analysis means can be applied, such as the discrete Fourier transform (DFT), the discrete cosine transform (DCT), the maximum entropy method, or a method based on linear prediction analysis.

[0112] Furthermore, in the above embodiment, the case where the spectrum processing unit 11 extracts only the left-hand power spectrum data group AR1 (FIG. 5), from zero to N/2, has been described, but the present invention is not limited to this; only the right-hand power spectrum data group AR2 (FIG. 5) may be extracted instead.

[0113] In this case, the processing load on the audio signal processing device 10 can be further reduced and the processing speed further improved.

[0114] Furthermore, in the above embodiment, the case of performing ADRC as the pattern generating means for generating the compressed data pattern has been described, but the present invention is not limited to this; compression means such as lossless coding (DPCM: Differential Pulse Code Modulation) or vector quantization (VQ: Vector Quantize) may be used. In short, any compression means that can represent the pattern of a signal waveform with a small number of classes will do.

[0115] Furthermore, in the above embodiment, the case has been described where the selection range setting means that the user can operate manually selects between human voice and audio (that is, 500 Hz to 4 kHz or 20 Hz to 20 kHz as the frequency components to be extracted), but the present invention is not limited to this. Various other selection range setting means can be applied: for example, selecting one of the high (UPP), middle (MID), and low (LOW) frequency ranges as shown in FIG. 7, selecting frequency components sparsely as shown in FIG. 8, or selecting bands of non-uniform width as shown in FIG. 9.
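A selection range of this kind can be pictured as a list of frequency bands that decides which power spectrum bins are kept. The sketch below is only an illustration: the FFT size of 64, the 44.1 kHz sampling rate, and the function name `select_bins` are assumptions, while the 500 Hz to 4 kHz and 20 Hz to 20 kHz ranges come from the text.

```python
def select_bins(n_fft, sample_rate, bands):
    """Indices of left-half bins (1..N/2) whose centre frequency lies in any
    of the (low_hz, high_hz) bands. DC (bin 0) is never selected."""
    picked = []
    for k in range(1, n_fft // 2 + 1):
        freq = k * sample_rate / n_fft
        if any(lo <= freq <= hi for lo, hi in bands):
            picked.append(k)
    return picked

VOICE = [(500.0, 4000.0)]        # "human voice" setting from the text
AUDIO = [(20.0, 20000.0)]        # "audio" setting from the text
voice_bins = select_bins(n_fft=64, sample_rate=44100.0, bands=VOICE)
audio_bins = select_bins(n_fft=64, sample_rate=44100.0, bands=AUDIO)
print(voice_bins, len(audio_bins))
```

Passing several disjoint bands in `bands` would correspond to the sparse or non-uniform selections of FIGS. 8 and 9.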

[0116] In this case, a program corresponding to the newly provided selection range setting means is created and stored in predetermined storage means of the audio signal processing device, such as a hard disk drive or ROM. Then, even when the user manually selects the newly provided selection range setting means through the input means 18, control data corresponding to the selected selection range setting means is output from the input means to the spectrum processing unit, and the spectrum processing unit extracts power spectrum data from the desired frequency components using the program corresponding to the newly provided selection range setting means.

[0117] In this way, various other selection range setting means can be applied, and significant power spectrum data that matches the user's intention can be extracted.

[0118] Furthermore, in the above embodiment, the case has been described where the audio signal processing device 10 (FIG. 2) executes the class code generation processing procedure by means of a program, but the present invention is not limited to this. These functions may be realized in hardware and provided in various digital signal processing devices (for example, a rate converter, an oversampling processing device, or a PCM error correction device that corrects PCM (Pulse Code Modulation) digital audio errors, as used in BS (Broadcasting Satellite) broadcasting and the like), or programs that realize the functions may be loaded into various digital signal processing devices from a program storage medium (a floppy (registered trademark) disk, an optical disk, or the like) to realize the functional units.

[0119]

[Effects of the Invention] As described above, according to the present invention, power spectrum data is calculated from a digital audio signal, the calculated power spectrum data is normalized by the maximum value width to calculate normalized data, the class is classified on the basis of the calculated normalized data, and the digital audio signal is converted by a prediction method corresponding to the classified class. Conversion that is better adapted to the characteristics of the digital audio signal can thus be performed, and the digital audio signal can be converted into a high-quality digital audio signal with further improved waveform reproducibility.

[Brief description of the drawings]

FIG. 1 is a functional block diagram showing an audio signal processing device according to the present invention.

FIG. 2 is a block diagram showing an audio signal processing device according to the present invention.

FIG. 3 is a flowchart showing the audio data conversion processing procedure.

FIG. 4 is a flowchart showing the logarithmic data calculation processing procedure.

FIG. 5 is a schematic diagram showing an example of power spectrum data calculation.

FIG. 6 is a block diagram showing the configuration of the learning circuit.

FIG. 7 is a schematic diagram showing an example of power spectrum data selection.

FIG. 8 is a schematic diagram showing an example of power spectrum data selection.

FIG. 9 is a schematic diagram showing an example of power spectrum data selection.

[Explanation of symbols]

10: audio signal processing device; 11: spectrum processing unit; 22: ROM; 15: RAM; 24: communication interface; 25: hard disk drive; 26: input means; 27: data input/output unit; 28: removable drive.

Continuation of front page

(72) Inventor: Tsutomu Watanabe, c/o Sony Corporation, 6-7-35 Kita-Shinagawa, Shinagawa-ku, Tokyo
(72) Inventor: Hiroto Kimura, c/o Sony Corporation, 6-7-35 Kita-Shinagawa, Shinagawa-ku, Tokyo
F-terms (reference): 5D045 CC07; 5J064 AA01 BB03 BC01 BC19 BC21 BD03; 5K041 CC01 EE23 EE35 HH11 HH43

Claims

[Claims]

1. A digital signal processing method for converting a digital audio signal, comprising: a frequency analysis step of calculating power spectrum data from the digital audio signal; a normalization step of normalizing the power spectrum data by a maximum value width to calculate normalized data; a class classification step of classifying the class on the basis of the normalized data; and a prediction calculation step of generating a new digital audio signal converted from the digital audio signal by performing a prediction operation on the digital audio signal with a prediction method corresponding to the classified class.

2. The digital signal processing method according to claim 1, wherein the frequency analysis step has various arithmetic processing methods for a window function, and a desired one of the arithmetic processing methods is used.

3. The digital signal processing method according to claim 1, wherein, in the spectrum data extracting step, power spectrum data of a DC component is excluded when the part of the power spectrum data is extracted.

4. The digital signal processing method according to claim 1, wherein the prediction calculation step uses prediction coefficients generated in advance by learning based on a desired digital audio signal.

5. The digital signal processing method according to claim 1, wherein the power spectrum data comprises substantially symmetrical left and right components, and in the spectrum data extracting step, either the left or the right component of the power spectrum data is extracted.

6. A digital signal processing device for converting a digital audio signal, comprising: frequency analysis means for calculating power spectrum data from the digital audio signal; normalization means for normalizing the power spectrum data by a maximum value width to calculate normalized data; class classification means for classifying the class on the basis of the normalized data; and prediction calculation means for generating a new digital audio signal converted from the digital audio signal by performing a prediction operation on the digital audio signal with a prediction method corresponding to the classified class.

7. The digital signal processing device according to claim 6, wherein the frequency analysis means has various arithmetic processing means for a window function, and a desired one of the arithmetic processing means is used according to the frequency characteristics of the digital audio signal.

8. The digital signal processing device according to claim 6, wherein the spectrum data extracting means excludes power spectrum data of a DC component when extracting the part of the power spectrum data.

9. The digital signal processing device according to claim 6, wherein the prediction calculation means uses prediction coefficients generated in advance by learning based on a desired digital audio signal.

10. The digital signal processing device according to claim 6, wherein the power spectrum data comprises substantially symmetrical left and right components, and the spectrum data extracting means extracts either the left or the right component of the power spectrum data.

11. A program storage medium that causes a digital signal processing device to execute a program comprising: a frequency analysis step of calculating power spectrum data from a digital audio signal; a normalization step of normalizing the power spectrum data by a maximum value width to calculate normalized data; a class classification step of classifying the class; and a prediction calculation step of generating a new digital audio signal converted from the digital audio signal by performing a prediction operation on the digital audio signal with a prediction method corresponding to the classified class.

12. The program storage medium according to claim 11, wherein the frequency analysis step has various arithmetic processing methods for a window function, and a desired one of the arithmetic processing methods is used according to the frequency characteristics of the digital audio signal.

13. The program storage medium according to claim 11, wherein, in the spectrum data extracting step, power spectrum data of a DC component is excluded when the part of the power spectrum data is extracted.

14. The program storage medium according to claim 11, wherein the power spectrum data comprises substantially symmetrical left and right components, and in the spectrum data extracting step, either the left or the right component of the power spectrum data is extracted.

15. A learning method for generating prediction coefficients used in the prediction operation of the conversion processing of a digital signal processing device that converts a digital audio signal, comprising: a student digital audio signal generation step of generating, from a desired digital audio signal, a student digital audio signal in which the digital audio signal is degraded; a frequency analysis step of calculating power spectrum data from the student digital audio signal; a normalization step of normalizing the power spectrum data by a maximum value width to calculate normalized data; a class classification step of classifying the class on the basis of the normalized data; and a prediction coefficient calculation step of calculating prediction coefficients corresponding to the class on the basis of the digital audio signal and the student digital audio signal.

16. The learning method according to claim 15, wherein the frequency analysis step has various arithmetic processing methods for a window function, and a desired one of the arithmetic processing methods is used according to the frequency characteristics of the digital audio signal.

17. The learning method according to claim 15, wherein, in the spectrum data extracting step, power spectrum data of a DC component is excluded when the part of the power spectrum data is extracted.

18. The learning method according to claim 15, wherein the power spectrum data comprises substantially symmetrical left and right components, and in the spectrum data extracting step, either the left or the right component of the power spectrum data is extracted.

19. A learning device for generating prediction coefficients used in the prediction operation of the conversion processing of a digital signal processing device that converts a digital audio signal, comprising: student digital audio signal generation means for generating, from a desired digital audio signal, a student digital audio signal in which the digital audio signal is degraded; frequency analysis means for calculating power spectrum data from the student digital audio signal; normalization means for normalizing the power spectrum data by a maximum value width to calculate normalized data; class classification means for classifying the class on the basis of the normalized data; and prediction coefficient calculation means for calculating prediction coefficients corresponding to the class on the basis of the digital audio signal and the student digital audio signal.

20. The learning device according to claim 19, wherein the frequency analysis means has various arithmetic processing means for a window function, and a desired one of the arithmetic processing means is used according to the frequency characteristics of the digital audio signal.

21. The learning device according to claim 19, wherein the spectrum data extracting means excludes power spectrum data of a DC component when extracting the part of the power spectrum data.

22. The learning device according to claim 19, wherein the power spectrum data comprises substantially symmetrical left and right components, and the spectrum data extracting means extracts either the left or the right component of the power spectrum data.

23. A program storage medium that causes a digital signal processing device to execute a program comprising: a student digital audio signal generation step of generating, from a desired digital audio signal, a student digital audio signal in which the digital audio signal is degraded; a frequency analysis step of calculating power spectrum data from the student digital audio signal; a normalization step of normalizing the power spectrum data by a maximum value width to calculate normalized data; a class classification step of classifying the class on the basis of the normalized data; and a prediction coefficient calculation step of calculating prediction coefficients corresponding to the class on the basis of the digital audio signal and the student digital audio signal.

24. The program storage medium according to claim 23, wherein the frequency analysis step has various arithmetic processing methods for a window function, and a desired one of the arithmetic processing methods is used according to the frequency characteristics of the digital audio signal.

25. The program storage medium according to claim 23, wherein, in the spectrum data extracting step, power spectrum data of a DC component is excluded when the part of the power spectrum data is extracted.

26. The program storage medium according to claim 23, wherein the power spectrum data comprises substantially symmetrical left and right components, and in the spectrum data extracting step, either the left or the right component of the power spectrum data is extracted.