JPH04125700A

JPH04125700A - Voice encoder and voice decoder

Info

Publication number: JPH04125700A
Application number: JP2249441A
Authority: JP
Inventors: Toshiyuki Morii; 利幸森井
Original assignee: Matsushita Electric Industrial Co Ltd
Current assignee: Panasonic Holdings Corp
Priority date: 1990-09-18
Filing date: 1990-09-18
Publication date: 1992-04-27
Anticipated expiration: 2016-11-12
Also published as: JP3227608B2

Abstract

PURPOSE: To encode speech waveforms with high sound quality by providing a linear prediction analysis unit in a framework (skeleton) encoder and transmitting the frequency characteristics of the speech waveform to the decoder in the form of linear prediction parameters. CONSTITUTION: For an input speech signal 3, the linear prediction coefficients obtained by a linear prediction analysis unit 6 are encoded by a parameter encoding unit 5 using a parameter codebook 4 to obtain parameter information. An average waveform of one pitch (the basic waveform) is then found from the pitch information obtained by a pitch analysis unit 7 for the input speech signal 3, and the basic waveform is filtered with the linear prediction coefficients to obtain a basic residual waveform. A time series of several kinds of pulses representing its shape (the framework) is searched for by a framework search unit 8 to obtain framework information, and the waveforms stretched between the elements of the framework are encoded by a framework waveform encoding unit 9 using a framework waveform codebook 10 to obtain inter-framework waveform information. The information obtained by the encoder 1 is transmitted to a decoder 2. Speech waveforms with high sound quality can thus be encoded with simple data processing at a low bit rate.

Description

DETAILED DESCRIPTION OF THE INVENTION

Field of Industrial Application

The present invention relates to a speech encoding device and a speech decoding device that encode and decode speech.

Prior Art

Conventionally, there are two approaches to low-bit-rate (about 4.8 kbps) speech coding. Analysis-synthesis coding extracts the frequency characteristics of speech by frequency analysis such as linear predictive analysis and encodes them together with excitation (sound source) information. Waveform coding exploits the redundancy of speech to encode the waveform itself. In addition, skeleton coding is one method of achieving a low bit rate in the form of waveform coding.

In this method, pitch analysis of the speech signal first yields pitch information and a basic waveform of one pitch period. A time series of several kinds of pulses representing the shape of that basic waveform (the skeleton) is then searched for, giving skeleton information. Next, the waveforms stretched between the bones of the skeleton (interbone waveforms) are encoded to give interbone waveform information. Speech is encoded and decoded using the pitch information, the skeleton information, and the interbone waveform information.

With this method, the outline of the speech waveform can be encoded by the pitch information and by the positions and magnitudes of the skeleton that represents the shape of the one-pitch basic waveform. Moreover, because the outline of the one-pitch basic waveform is encoded by the skeleton information, the interbone waveforms, once normalized with fixed endpoints, can be encoded at a low bit rate by vector quantization. Low-bit-rate speech coding is thus achieved with simple data processing while retaining the form of waveform coding, which can yield good-quality decoded speech.

This method is described in detail below.

FIG. 3 is a functional block diagram of a conventional speech encoding device and speech decoding device. Each block is described below.

In the encoder 19, the input speech signal 21 is first sampled and converted into a digital signal, which is divided into segments of fixed duration (one frame).

Next, the pitch analysis unit 22 determines the pitch within the segment and outputs it as pitch information. Based on the pitch information, an average waveform of one pitch period is computed from the waveform in the segment and sent to the skeleton search unit 23 as the basic waveform.
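As a rough illustration of the averaging step, the sketch below cuts a frame into consecutive segments of one pitch period and averages them sample by sample. It assumes that the pitch period is already known as a whole number of samples and that segments are aligned to the start of the frame; the source specifies neither detail, and the function name `average_pitch_waveform` is hypothetical.

```python
def average_pitch_waveform(frame, pitch_period):
    """Average all complete pitch-length segments of a frame (illustrative only)."""
    if pitch_period < 1 or len(frame) < pitch_period:
        raise ValueError("frame must contain at least one full pitch period")
    # Assumption: an incomplete segment at the end of the frame is ignored.
    n_segments = len(frame) // pitch_period
    basic = [0.0] * pitch_period
    for s in range(n_segments):
        start = s * pitch_period
        for i in range(pitch_period):
            basic[i] += frame[start + i]
    return [v / n_segments for v in basic]


# Two full segments of length 3 are averaged; the trailing sample is dropped.
print(average_pitch_waveform([1, 2, 3, 3, 4, 5, 9], 3))
```

A real pitch analyzer would also have to estimate the period and align the segments to pitch marks, which this sketch leaves out.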

The skeleton search unit 23 first analyzes the shape of the basic waveform produced by the pitch analysis unit 22. While deciding how many skeleton stages to use, it searches, for each stage, for the points of maximum absolute value on the positive and negative sides, and takes the positions and amplitudes of those samples as the skeleton information.

The skeleton search method is described here.

Every one-pitch basic waveform has an impulse-response-like shape, but the shape varies with the speaker and the speaking conditions. To represent its outline by a skeleton, the number of stages must therefore be chosen according to the shape of the waveform: few stages for a waveform shaped like a gentle hill, and many stages for a waveform that oscillates strongly between positive and negative values. An algorithm that searches for the skeleton while taking the number of stages into account is described below.

(1) Set the initial values.

$X_i\ (i = 1, \dots, L)$: the one-pitch basic waveform, where $L$ is its length.

$D$: the maximum number of skeleton stages.

$K$: the set of positions, among positions 1 to $L$, at which the search is prohibited. Initially $K = \emptyset$ (the empty set).

$M$: the number of search stages. Initially $M = 0$.

$H_j = (A_x, A_n, I_x, I_n)$: the skeleton information, consisting of four values: the MAX signal value $A_x$, the MIN signal value $A_n$, the MAX position $I_x$, and the MIN position $I_n$.

(2) $M = M + 1$

(3) Find the maximum and minimum outside the prohibited region and record them as stage $M$:

$$X_{\max} = \max\{X_i \mid i = 1, \dots, L,\ i \notin K\} = X_{i_1}$$

$$X_{\min} = \min\{X_i \mid i = 1, \dots, L,\ i \notin K\} = X_{i_2}$$

$$H_M = (X_{\max}, X_{\min}, i_1, i_2)$$

(4) Around each of $i_1$ and $i_2$, add to $K$, as prohibited region, every position in the surrounding section over which the sign of $X_i$ does not change.

(5) If $M = D$, or if $K$ contains all positions 1 to $L$, go to (6). Otherwise, go to (2).

(6) Extract only the position components of $H_j\ (j = 1, \dots, M)$ and arrange them in order of magnitude.

(7) Starting from the smallest, check whether each position is a MAX position or a MIN position. If two of the same kind occur consecutively, set $M = M - 1$ and go to (6). If the MAX and MIN positions all alternate, go to (8).

(8) End the search, taking $M$ as the number of skeleton stages and $H_j\ (j = 1, \dots, M)$ as the skeleton information.
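Steps (1) to (8) can be sketched roughly as follows. This is a minimal reading of the procedure, not the patent's implementation: positions are 0-based here, a zero sample is treated as having its own sign, and the names `skeleton_search` and `same_sign_run` are hypothetical.

```python
def sign(v):
    return (v > 0) - (v < 0)


def same_sign_run(x, i):
    """Indices of the contiguous section around i over which the sign of x is unchanged."""
    s = sign(x[i])
    lo = i
    while lo > 0 and sign(x[lo - 1]) == s:
        lo -= 1
    hi = i
    while hi < len(x) - 1 and sign(x[hi + 1]) == s:
        hi += 1
    return range(lo, hi + 1)


def skeleton_search(x, max_stages):
    """Return a list of stages (max value, min value, max position, min position)."""
    forbidden = set()   # K: prohibited search positions
    stages = []         # H_1 .. H_M
    # Steps (2) to (5): add stages until D is reached or every position is prohibited.
    while len(stages) < max_stages and len(forbidden) < len(x):
        free = [i for i in range(len(x)) if i not in forbidden]
        i_max = max(free, key=lambda i: x[i])
        i_min = min(free, key=lambda i: x[i])
        stages.append((x[i_max], x[i_min], i_max, i_min))
        forbidden.update(same_sign_run(x, i_max))
        forbidden.update(same_sign_run(x, i_min))
    # Steps (6) and (7): drop the last stage until MAX and MIN positions alternate.
    while stages:
        marks = sorted([(s[2], "MAX") for s in stages] + [(s[3], "MIN") for s in stages])
        kinds = [kind for _, kind in marks]
        if all(a != b for a, b in zip(kinds, kinds[1:])):
            break
        stages.pop()
    return stages  # step (8)


print(skeleton_search([1, 3, 1, -4, -1, 2, 1, -2, -1], 3))
print(skeleton_search([1, 5, 1, -1, 3, 1, -6, -2, 1, -3], 2))
```

In the second call the two MAX positions are adjacent once the positions are sorted, so the second stage is discarded and a one-stage skeleton remains.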

FIG. 4 shows examples of sets of basic waveforms classified by the above algorithm. The waveforms in FIG. 4(a) are examples classified as being encoded with a one-stage skeleton, those in FIG. 4(b) are two-stage examples, and those in FIG. 4(c) are three-stage examples. In FIG. 4, solid lines show the one-pitch basic waveforms and broken lines show the skeleton positions. FIG. 5 shows the relationship between the basic waveform and the skeleton information, taking a two-stage skeleton as an example: A11, A12, A21, and A22 are the skeleton position information, and B11, B12, B21, and B22 are the signal value information.

Next, the function of the interbone waveform selection unit 24 is described with reference to FIG. 6, which is a conceptual diagram for a one-stage skeleton.

First, based on the skeleton information, two waveforms are obtained within one pitch: the waveform stretched from the MAX signal C1 of the skeleton to the MIN signal C2, and the waveform stretched from the MIN signal C2 to the MAX signal C1'. These are the basic interbone waveforms D1 and D2. Each basic interbone waveform is then normalized with fixed endpoints (in time and in power) to give the signals E1 and E2.
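The source does not give a formula for the endpoint-fixed normalization. One plausible reading, shown here purely as an assumption, is to resample each segment to a fixed number of points (time normalization) and to map its amplitudes affinely so that the first sample becomes 1 and the last becomes 0 (power normalization). The function name `normalize_segment` and the default of 8 points are hypothetical.

```python
def normalize_segment(segment, n_points=8):
    """Resample to n_points and map amplitudes so that start -> 1.0 and end -> 0.0."""
    if len(segment) < 2 or n_points < 2:
        raise ValueError("need at least two samples and two output points")
    start, end = segment[0], segment[-1]
    if start == end:
        raise ValueError("endpoints must differ (a bone runs from a MAX to a MIN)")
    out = []
    for k in range(n_points):
        # Fractional position in the original segment (linear interpolation).
        t = k * (len(segment) - 1) / (n_points - 1)
        i = min(int(t), len(segment) - 2)
        frac = t - i
        value = segment[i] * (1.0 - frac) + segment[i + 1] * frac
        out.append((value - end) / (start - end))
    return out


# A straight line from MAX (4.0) to MIN (-2.0) becomes a straight line from 1 to 0.
print(normalize_segment([4.0, 1.0, -2.0], n_points=5))
```

After this step every segment has the same length and the same endpoint values, so only its shape remains to be matched against the codebook.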

These are then compared with the numbered interbone waveform samples stored in the interbone waveform codebook 25, and the numbers N and M attached to the samples closest to the normalized basic interbone waveforms are taken as the interbone waveform information. The pitch information, skeleton information, and interbone waveform information are then transmitted as the speech code for the unit time.
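The codebook lookup amounts to a nearest-neighbor search. The sketch below assumes squared Euclidean distance as the measure of closeness and equal-length vectors; the source says only "closest" at this point, and the name `nearest_codebook_index` is hypothetical.

```python
def nearest_codebook_index(codebook, target):
    """Return the number (index) of the codebook entry closest to target."""
    if not codebook:
        raise ValueError("codebook is empty")
    if any(len(entry) != len(target) for entry in codebook):
        raise ValueError("all entries must have the same length as the target")

    def sq_dist(entry):
        # Assumed distance measure: squared Euclidean distance.
        return sum((a - b) ** 2 for a, b in zip(entry, target))

    return min(range(len(codebook)), key=lambda j: sq_dist(codebook[j]))


codebook = [[1.0, 0.5, 0.0], [1.0, 0.9, 0.0], [1.0, 0.1, 0.0]]
print(nearest_codebook_index(codebook, [1.0, 0.8, 0.0]))
```

Only the returned index needs to be transmitted, which is what keeps the bit rate of the interbone waveform information low.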

The interbone waveform codebook 25 used here is created in advance by collecting basic interbone waveforms obtained by analyzing a large amount of speech data, normalizing each with fixed endpoints (in time and in power), and storing them with numbers.

The method of creating it is described in detail here.

Clearly, the larger the interbone waveform codebook 25, the smaller the coding distortion, so a large codebook is desirable for high sound quality. For a low bit rate, however, the interbone waveform information should use few bits, and for the encoder 19 to operate in real time, the amount of computation required for matching against the interbone waveform codebook 25 should be small.

An efficient interbone waveform codebook 25, small in size yet low in coding distortion, is therefore needed. To create it, a sufficiently large set of interbone waveform samples is clustered so that the Euclidean distance between the samples and the centroids is minimized, dividing the set into as many clusters as the intended codebook size. The centroids of those clusters then form the interbone waveform codebook 25.

The clustering algorithm used in this conventional example is a cell-division type algorithm, described below.

(1) K = 1.

(2) Compute the centroids of the K clusters by simple averaging. For each cluster, compute the Euclidean distance between every sample belonging to the cluster and its centroid, and take the maximum as the distortion of that cluster.

(3) Create two centroids near the centroid of the cluster with the largest distortion among the K clusters. (These become the nuclei of the cell division.)

(4) Perform clustering on the basis of the K+1 centroids and recompute the centroids.

(5) If any cluster is empty, delete its centroid and go to (3).

(6) Compute the distortions of the K+1 clusters as in (2). If the change in their sum is at or below a preset small threshold, go to (7); if it is larger than the threshold, go to (4).

(7) If K+1 has not reached the target number of clusters, set K = K+1 and go to (2); if it has, go to (8).

(8) Compute the centroids of all clusters and create the codebook.
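The cell-division procedure can be sketched as below. Several details are assumptions rather than statements from the source: the two new centroids are placed a small step `eps` on either side of the old centroid, along the direction of the farthest sample; empty clusters are simply dropped before the next split; and iteration caps guard against endless loops. All function names are hypothetical.

```python
import math


def centroid(cluster):
    """Simple average of the vectors in a non-empty cluster."""
    dim = len(cluster[0])
    return [sum(v[d] for v in cluster) / len(cluster) for d in range(dim)]


def distortion(cluster, center):
    """Step (2): the largest sample-to-centroid Euclidean distance."""
    return max(math.dist(v, center) for v in cluster)


def assign(samples, centers):
    """Assign each sample to its nearest center and drop empty clusters (step 5)."""
    clusters = [[] for _ in centers]
    for v in samples:
        j = min(range(len(centers)), key=lambda j: math.dist(v, centers[j]))
        clusters[j].append(v)
    return [c for c in clusters if c]


def cell_division_codebook(samples, target, eps=1e-3, tol=1e-9, max_iter=100):
    clusters = [list(samples)]                       # step (1): K = 1
    for _split in range(10 * target):                # guard against endless splitting
        if len(clusters) >= target:                  # step (7)
            break
        centers = [centroid(c) for c in clusters]    # step (2)
        worst = max(range(len(clusters)),
                    key=lambda j: distortion(clusters[j], centers[j]))
        seed = centers.pop(worst)                    # step (3): split the worst cluster
        far = max(clusters[worst], key=lambda v: math.dist(v, seed))
        step = [eps * (f - s) for f, s in zip(far, seed)]
        centers.append([s + d for s, d in zip(seed, step)])
        centers.append([s - d for s, d in zip(seed, step)])
        previous = None
        for _it in range(max_iter):                  # steps (4) to (6)
            clusters = assign(samples, centers)
            centers = [centroid(c) for c in clusters]
            total = sum(distortion(c, m) for c, m in zip(clusters, centers))
            if previous is not None and abs(previous - total) <= tol:
                break
            previous = total
    return [centroid(c) for c in clusters]           # step (8)


data = [[0.0, 0.0], [0.0, 1.0], [10.0, 0.0], [10.0, 1.0]]
print(sorted(cell_division_codebook(data, target=2)))
```

Growing the codebook one cluster at a time, always splitting the cluster with the largest distortion, is what distinguishes this scheme from starting k-means with all centroids at once.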

Next, the function of the decoder is described with reference to FIGS. 3 and 7. FIG. 7 is a waveform diagram for a one-stage skeleton.

First, the skeleton forming unit 26 in the decoder 20 shown in FIG. 3 forms the speech skeleton C1, C2 from the pitch information and skeleton information obtained by the encoding. The upper part of FIG. 7 shows an example of this skeleton and how it is formed from the skeleton information. The waveform synthesis unit 27 then selects the basic interbone waveforms E1 and E2, according to the interbone waveform information N and M, from the interbone waveform codebook 28, which is identical to the interbone waveform codebook 25 stored in the encoder 19. It transforms them in time and in power to fit the skeleton, stretches them between the bones, and outputs the resulting synthesized waveform F as the output speech 29. The lower part of FIG. 7 shows an example of this synthesis: the interbone waveform samples selected from the interbone waveform codebook 28 according to the interbone waveform information are stretched between the bones of the skeleton.
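The decoder-side transformation "in time and in power" can be pictured as follows. The sketch assumes that a codebook entry is a shape running from 1.0 at its first sample to 0.0 at its last, and that the decoder resamples it by linear interpolation and rescales it so that it starts at the first bone and ends at the second. This convention and the name `stretch_between_bones` are hypothetical; the source does not spell out the mapping.

```python
def stretch_between_bones(shape, pos_a, amp_a, pos_b, amp_b):
    """Place a normalized shape (1.0 -> 0.0) between bone A and bone B.

    Returns the samples for positions pos_a .. pos_b inclusive.
    """
    n = pos_b - pos_a
    if n < 1 or len(shape) < 2:
        raise ValueError("bones must be at least one sample apart")
    out = []
    for k in range(n + 1):
        # Time transformation: resample the shape to n + 1 points.
        t = k * (len(shape) - 1) / n
        i = min(int(t), len(shape) - 2)
        frac = t - i
        s = shape[i] * (1.0 - frac) + shape[i + 1] * frac
        # Power transformation: 1.0 maps to amp_a, 0.0 maps to amp_b.
        out.append(amp_b + s * (amp_a - amp_b))
    return out


shape = [1.0, 0.75, 0.5, 0.25, 0.0]
print(stretch_between_bones(shape, pos_a=2, amp_a=4.0, pos_b=4, amp_b=-2.0))
```

Because the endpoints are fixed by the skeleton, consecutive segments join without a jump at each bone.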

Problems to Be Solved by the Invention

The skeleton coding method described above yields natural, smooth synthesized speech with simple data processing at a low bit rate.

However, the method has a problem: the intelligibility of the decoded speech is poor. There appear to be three main causes.

First, because the one-pitch basic waveform is the average of the one-pitch waveforms in an analysis interval, the fine structure of the waveform is blurred. Second, coding distortion arises when the codebook is referenced. Third, waveforms are overlapped so that they join smoothly between analysis intervals.

Because this coding method encodes one-pitch waveforms, the third cause is unavoidable, but there is room for improvement in the other two.

The object of the present invention is to perform low-bit-rate, high-quality speech waveform coding with simple data processing, and to encode and decode speech without degrading the intelligibility of the decoded speech.

Means for Solving the Problems

To achieve this object, the present invention comprises: basic waveform extraction means that divides the speech signal into fixed time intervals, performs pitch analysis on each analysis interval, and uses the pitch information to obtain a basic waveform of one pitch length representing the analysis interval; parameter extraction means that performs linear predictive analysis on each analysis interval and extracts linear prediction parameters representing the frequency characteristics of the interval; basic prediction residual waveform extraction means that filters the basic waveform using those parameters to obtain a linear prediction residual of one pitch length; skeleton encoding means that obtains and encodes a time series of several kinds of pulses (the skeleton) representing the shape of the basic prediction residual waveform; an interbone waveform codebook storing a plurality of numbered interbone waveform samples; and interbone waveform encoding means that uses the interbone waveform codebook to encode the interbone waveforms stretched between the bones of the skeleton obtained by the skeleton encoding means.

The present invention also comprises: skeleton decoding means that creates a time series of several kinds of pulses (the skeleton) from the encoded information; an interbone waveform codebook storing a plurality of numbered interbone waveform samples; interbone waveform decoding means that, based on the waveform shape information encoded by the interbone waveform encoding means, uses the interbone waveform codebook to decode the interbone waveforms stretched between the bones of the skeleton and creates the basic prediction residual waveform; basic waveform decoding means that filters the basic prediction residual waveform created by the interbone waveform decoding means, using the parameters transmitted from the encoder, to obtain the one-pitch basic waveform; and intra-interval waveform decoding means that decodes the waveform within one analysis interval using the one-pitch basic waveform decoded by the basic waveform decoding means.

More preferably, the interbone waveform codebook is created by normalizing each of a plurality of interbone waveforms, obtained by analyzing speech signals, with endpoints fixed in time and in power, and storing them with numbers.

Operation

With the above configuration, the present invention places a linear predictive analysis unit in the encoder and sends the frequency characteristics of the speech waveform to the decoder in the form of linear prediction parameters.

That is, linear predictive analysis is first performed on the input speech to obtain linear prediction coefficients. The linear prediction coefficients are encoded with a codebook to give the parameter information.

Next, pitch analysis is performed on the input speech to obtain pitch information. Based on the pitch information, an average waveform of one pitch (the basic waveform) is obtained, and filtering it with the linear prediction coefficients gives the basic residual waveform. A time series of several kinds of pulses (the skeleton) representing the shape of the basic residual waveform is then searched for to obtain skeleton information. Further, the waveforms stretched between the bones of the skeleton (interbone waveforms) are encoded using the interbone waveform codebook to obtain the interbone waveform information.

The parameter information, pitch information, skeleton information, and interbone waveform information are then sent to the decoder.

On the decoder side, the basic residual waveform is first obtained from the pitch information, skeleton information, and interbone waveform information. Filtering with the parameter information then gives the basic waveform.

The waveform is then decoded by lining up the basic waveform across the analysis interval.

With this coding method, the frequency characteristics of the speech waveform are sent to the decoder in the form of linear prediction parameters, and the decoder applies them to the decoded basic residual waveform through a synthesis filter. Speech can thus be encoded and decoded without degrading the intelligibility of the decoded speech.

The frequency characteristics of the speech are thus conveyed by the parameter information. The outline of the one-pitch basic residual waveform can be encoded by the positions and magnitudes of the skeleton, and the interbone waveforms, once normalized with fixed endpoints, can be encoded at a low bit rate by vector quantization. Moreover, when the one-pitch basic waveform is decoded, the basic residual waveform is first synthesized and then filtered with linear prediction coefficients derived from the parameter information, which imparts the frequency characteristics of the input speech directly to the basic waveform. The intelligibility of the decoded speech can therefore be improved.

Embodiment

An embodiment of the present invention is described below with reference to the drawings. FIG. 1 is a block diagram of a speech encoding device and a speech decoding device according to an embodiment of the present invention.

Each block is described below.

In the encoder 1, the input speech signal 3 is first sampled and converted into a digital signal, which is divided into segments of fixed duration (one frame).

The linear predictive analysis unit 6 performs linear predictive analysis on each frame to obtain linear prediction coefficients. In the parameter encoding unit 5, these coefficients are converted into LSP parameters, which compress and interpolate well, and are then vector-quantized using the LSP parameter codebook 4. The result is sent to the decoder 2 as the parameter information.
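The source does not say how the linear prediction coefficients are computed. A common choice, assumed here purely for illustration, is the autocorrelation method solved with the Levinson-Durbin recursion. The sketch uses the convention that sample x[n] is predicted as the sum of a[k-1] * x[n-k]; the conversion to LSP parameters is not shown, and the function names are hypothetical.

```python
def autocorrelation(x, max_lag):
    """Autocorrelation r[0..max_lag] of a (windowed) frame."""
    return [sum(x[n] * x[n - lag] for n in range(lag, len(x)))
            for lag in range(max_lag + 1)]


def levinson_durbin(r, order):
    """Solve the normal equations for predictor coefficients a_1 .. a_order.

    Returns (coefficients, final prediction error energy).
    """
    if r[0] <= 0:
        raise ValueError("r[0] must be positive")
    a = [0.0] * order
    err = r[0]
    for i in range(order):
        # Reflection coefficient for stage i + 1.
        acc = r[i + 1] - sum(a[j] * r[i - j] for j in range(i))
        k = acc / err
        updated = a[:]
        updated[i] = k
        for j in range(i):
            updated[j] = a[j] - k * a[i - 1 - j]
        a = updated
        err *= 1.0 - k * k
        if err <= 0:
            break  # signal is perfectly predictable; stop early
    return a, err


# Autocorrelation of a first-order process with coefficient 0.5.
coeffs, err = levinson_durbin([1.0, 0.5, 0.25], order=2)
print(coeffs, err)
```

For the 10th-order analysis mentioned in the experiment, `order` would be 10 and `r` would hold 11 autocorrelation lags of the windowed frame.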

The parameter codebook 4 used here is created as follows. First, linear predictive analysis is performed in advance on a large amount of speech data to create a population of LSP parameters. This population of LSP parameters is then clustered so that the average Euclidean distance between the samples and the centroids is minimized, dividing it into as many clusters as the intended codebook size, and the codebook is formed from the centroids of those clusters. The clustering procedure is the one described in detail above for creating the interbone waveform codebook in the prior art.

Next, the pitch analysis unit 7 determines the pitch within the analysis interval and outputs it as pitch information, which is sent to the decoder 2. Based on this pitch information, an average waveform of one pitch (the basic waveform) is obtained from the waveform in the frame. The basic waveform is then filtered using the linear prediction coefficients obtained by the linear predictive analysis unit 6 to give the one-pitch basic residual waveform, which is sent to the skeleton search unit 8.
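The filtering that turns the basic waveform into a residual is, in the usual linear prediction setting, an inverse (analysis) filter. The sketch below assumes the convention e[n] = x[n] - sum of a[k-1] * x[n-k] and zero samples before the start of the waveform; the source states neither, and `lpc_residual` is a hypothetical name.

```python
def lpc_residual(x, a):
    """Prediction residual of x under predictor coefficients a (zero initial state)."""
    residual = []
    for n in range(len(x)):
        prediction = 0.0
        for k in range(1, len(a) + 1):
            if n - k >= 0:
                prediction += a[k - 1] * x[n - k]
        residual.append(x[n] - prediction)
    return residual


# A decaying exponential is perfectly predicted by a single coefficient of 0.5,
# so only the first sample survives in the residual.
print(lpc_residual([1.0, 0.5, 0.25, 0.125], [0.5]))
```

The residual is spectrally flatter than the waveform itself, which is why its shape is a better target for the skeleton and codebook stages that follow.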

The skeleton search unit 8 first analyzes the shape of the basic residual waveform produced by the pitch analysis unit 7. While deciding how many skeleton stages to use, it searches, for each stage, for the points of maximum absolute value on the positive and negative sides, and takes the positions and amplitudes of those samples as the skeleton information. The method of searching for the skeleton while determining the number of stages is the one described in detail above in the explanation of the skeleton search method in the prior art.

Based on the skeleton information obtained by the skeleton search unit 8, the interbone waveform encoding unit 9 normalizes, with fixed endpoints (in time and in power), the partial waveforms of the basic residual waveform stretched between the bones of the skeleton (the basic interbone waveforms). It compares them with the numbered interbone waveform samples stored in the interbone waveform codebook 10 and takes the numbers attached to the samples closest to the normalized basic interbone waveforms as the interbone waveform information. This selection method is the one described in detail above in the explanation of the interbone waveform selection unit 24 of the prior art with reference to FIG. 6.

The interbone waveform codebook 10 used here is created in advance by collecting basic interbone waveforms obtained by skeleton analysis of a large amount of speech data, normalizing each with fixed endpoints (in time and in power), and storing them with numbers. The clustering procedure used to create the codebook is the one described in detail above for creating the interbone waveform codebook in the prior art.

Next, the function of the decoder 2 is described with reference to FIGS. 1 and 2.

First, the skeleton forming unit 11 forms the skeleton of the one-pitch basic residual waveform from the pitch information and skeleton information obtained by the encoding. The upper part of FIG. 2 shows the skeleton C1, C2 of the one-pitch basic residual waveform being formed.

The basic residual waveform synthesis unit 12 selects basic interbone waveforms, according to the interbone waveform information, from the interbone waveform codebook 13, which is identical to the interbone waveform codebook 10 stored in the encoder 1. It transforms them in time and in power to fit the skeleton and stretches them between the bones to synthesize the basic residual waveform. The middle part of FIG. 2 shows the basic interbone waveforms E1 and E2 being selected from the interbone waveform codebook 13 according to the interbone waveform information M and N, transformed in time and in power to fit the skeleton C1, C2, and stretched between the bones to create the basic residual waveform F.

Based on the parameter information sent from the encoder 1, the parameter decoding unit 15 selects LSP parameters from the parameter codebook 14, which is identical to the parameter codebook 4 stored in the encoder 1, and sends them to the basic waveform decoding unit 16.

The basic waveform decoding unit 16 filters the basic residual waveform using the LSP parameters to create the basic waveform G (FIG. 2).
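The decoder-side filtering is the inverse of the residual computation: an all-pole synthesis filter driven by the residual. The sketch assumes that the LSP parameters have already been converted back to direct-form predictor coefficients `a` (that conversion is not shown), uses the convention x[n] = e[n] + sum of a[k-1] * x[n-k], and starts from a zero filter state. The name `lpc_synthesize` is hypothetical.

```python
def lpc_synthesize(residual, a):
    """Rebuild a waveform from its residual with an all-pole filter (zero initial state)."""
    x = []
    for n in range(len(residual)):
        prediction = 0.0
        for k in range(1, len(a) + 1):
            if n - k >= 0:
                prediction += a[k - 1] * x[n - k]
        x.append(residual[n] + prediction)
    return x


# A single pulse excites the filter and produces its decaying impulse response.
print(lpc_synthesize([1.0, 0.0, 0.0, 0.0], [0.5]))
```

If the residual were transmitted exactly, this filter would reproduce the encoder's basic waveform; in the scheme described here the residual is an approximation built from the skeleton and the codebook, so the output is an approximation as well.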

The waveform decoding unit 17 then creates speech waveform H (FIG. 2) by lining up the decoded one-pitch basic waveform repeatedly from the start to the end of the analysis interval, and this becomes output speech 18.
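A minimal sketch of this repetition step follows. The function name is hypothetical, and cutting the final repetition short where the interval length is not a multiple of the pitch length is an assumption; the text does not say how the last period is handled.

```python
def repeat_pitch_waveform(pitch_waveform, interval_length):
    """Tile a one-pitch waveform over an analysis interval.

    Assumption: the last repetition is simply truncated at the interval end.
    """
    if not pitch_waveform:
        raise ValueError("pitch waveform must not be empty")
    period = len(pitch_waveform)
    return [pitch_waveform[n % period] for n in range(interval_length)]
```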

To demonstrate the effectiveness of this speech coding method, a simulation experiment of the encoding and decoding was carried out.

The speech data to be encoded is a weather forecast spoken by one female announcer: "Weather forecast. Here is the weather forecast announced by the Japan Meteorological Agency Forecast Department at 1:30 p.m. A front extending from east to west is stationary along the southern coast of Japan, and low-pressure systems on the front, east of Hachijojima and near the Goto Islands of northern Kyushu, are moving east-northeast." It was A/D converted at a sampling rate of 8 kHz into digital speech data about 20 seconds long. The speech data is analyzed every 20 msec (one frame) with a 40 msec analysis window. The order of the linear prediction analysis was 10, and the LSP parameters were found using a 128-point DFT. The size of parameter codebooks 4 and 14 is 4096.
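The figures above translate into sample counts and index widths as in the following short calculation. The variable names are illustrative, and the final line assumes that one parameter codebook index is sent per frame, which the text does not state explicitly.

```python
SAMPLE_RATE_HZ = 8000
FRAME_MS = 20
WINDOW_MS = 40
PARAM_CODEBOOK_SIZE = 4096

# Samples covered by one frame shift and by one analysis window.
samples_per_frame = SAMPLE_RATE_HZ * FRAME_MS // 1000
samples_per_window = SAMPLE_RATE_HZ * WINDOW_MS // 1000

# Bits needed to address every entry of the parameter codebook.
param_index_bits = (PARAM_CODEBOOK_SIZE - 1).bit_length()

# Assumption: one parameter index per 20 msec frame.
param_bits_per_second = param_index_bits * 1000 // FRAME_MS
```

Under that assumption the parameter index takes 12 bits per frame, or 600 bits per second.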

The maximum number of skeleton stages in the skeleton search unit 8 was set to three. For the skeleton position information of the second and third stages and the skeleton gain information of the third stage, several pieces of information were grouped into a vector and coded with a codebook in the same way as the interosseous waveforms, which saves bit rate.
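Coding a vector with a codebook generally means sending the index of the closest stored entry. The sketch below shows a plain nearest-neighbour search under a squared-error distance; the function name and the choice of distance measure are assumptions, since the text does not state which measure was used.

```python
def nearest_codeword(vector, codebook):
    """Return the index of the codebook entry closest to `vector`.

    Assumption: closeness is measured by squared Euclidean distance.
    """
    best_index, best_dist = 0, float("inf")
    for index, codeword in enumerate(codebook):
        dist = sum((v - c) ** 2 for v, c in zip(vector, codeword))
        if dist < best_dist:
            best_index, best_dist = index, dist
    return best_index
```

Only the returned index is transmitted, and the decoder looks up the same entry in its own copy of the codebook.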

In the interosseous search unit 8, adaptive bit allocation was performed according to the number of stages in order to lower the bit rate further. The size of interosseous waveform codebook 10, which is used to obtain the interosseous waveform information, was varied according to the stage and the length of the waveform, so that short waveforms are coded with a small codebook and long waveforms with a large codebook.

Interosseous waveform codebook 10 was created with the clustering algorithm described above from a set of interosseous waveform samples obtained by analyzing about 10 seconds of speech from each of 50 male and female speakers, none of which is included in the speech data above. The sample set contains about 20,000 waveforms.

In addition, the waveform decoding unit 17 of decoder 2 overlaps the waveforms using a 40 msec triangular window so that the synthesized waveforms are joined smoothly.
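A possible form of this joining step is triangular-window overlap-add. The sketch assumes that each frame yields a synthesized segment as long as the window and that successive segments are shifted by half that length (20 msec for a 40 msec window); the function names and the half-window shift are assumptions rather than details given in the text.

```python
def triangular_window(length):
    """Triangular window that rises from 0 to 1 at the centre and falls again."""
    half = length / 2
    return [1.0 - abs(n - half) / half for n in range(length)]


def overlap_add(segments, hop):
    """Window equal-length segments and sum them at multiples of `hop`."""
    length = len(segments[0])
    window = triangular_window(length)
    out = [0.0] * (hop * (len(segments) - 1) + length)
    for index, segment in enumerate(segments):
        start = index * hop
        for n, sample in enumerate(segment):
            out[start + n] += window[n] * sample
    return out
```

With a shift of half the window length, the overlapping window weights add up to one, so a steady signal passes through unchanged away from the two ends.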

Table 1 below shows the bit allocation per unit of speech data (20 msec) in this system.

Table 1 (maximum 5.1 kbps)

In the coding experiment under the above conditions, smooth and natural speech was synthesized despite the low bit rate. An S/N ratio of about 10 dB was also obtained. When the same experiment was tried on speech other than this speech data, S/N ratios of 5 to 10 dB were obtained and the sound quality was also good.
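For reference, an S/N ratio between an original and a decoded waveform is commonly computed as below. The text does not say whether the reported figures are whole-utterance or segmental values, so the whole-signal form used here, like the function name, is an assumption.

```python
import math


def snr_db(original, decoded):
    """Signal-to-noise ratio in dB, with the coding error treated as noise."""
    signal = sum(x * x for x in original)
    noise = sum((x - y) ** 2 for x, y in zip(original, decoded))
    if signal == 0:
        raise ValueError("original signal has zero energy")
    if noise == 0:
        return float("inf")  # perfect reconstruction
    return 10.0 * math.log10(signal / noise)
```

On this scale, 10 dB means that the energy of the error is one tenth of the energy of the original signal.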

In a comparative experiment against a conventional speech encoding device and speech decoding device, the S/N ratios were equivalent, but in a listening test the speech encoding device and speech decoding device according to the present invention were rated as giving better clarity.

The above simulation experiment verified that the speech encoding device and speech decoding device according to the present invention achieve clear speech encoding and decoding at a low bit rate.

Effects of the Invention

As described above, the present invention places a linear prediction analysis unit in the skeleton encoder and sends the frequency characteristics of the speech waveform to the decoder in the form of linear prediction parameters, while the decoder gives those frequency characteristics to the decoded basic residual waveform with a synthesis filter. The decoding unit for the one-pitch basic waveform therefore synthesizes the basic residual waveform and then filters it with the linear prediction coefficients obtained from the parameter information, which gives the frequency characteristics of the input speech directly to the basic waveform. As a result, high-quality speech waveform coding can be performed with simple data processing at a low bit rate, and speech can be encoded and decoded without degrading the intelligibility of the decoded speech.

[Brief Description of the Drawings]

FIG. 1 is a functional block diagram of a speech encoding device and a speech decoding device according to an embodiment of the present invention. FIG. 2 is a functional explanatory diagram of the speech decoding device according to an embodiment of the present invention. FIG. 3 is a functional block diagram showing a speech encoding device and a speech decoding device based on the conventional skeleton coding method. FIG. 4 is a waveform diagram showing sets of basic waveforms classified by number of skeleton stages by the conventional skeleton search algorithm. FIG. 5 is a waveform diagram showing the relationship between the basic waveform and the skeleton information in a speech encoding device and a speech decoding device based on the conventional skeleton coding method, taking a two-stage skeleton as an example. FIG. 6 is a functional explanatory diagram of a speech encoding device based on the conventional skeleton coding method. FIG. 7 is a functional explanatory diagram of a speech decoding device based on the conventional skeleton coding method.

1: encoder, 2: decoder, 3: input speech signal, 4: parameter codebook, 5: parameter encoding unit, 6: linear prediction analysis unit, 7: pitch analysis unit, 8: skeleton search unit, 9: interosseous waveform encoding unit, 10: interosseous waveform codebook, 11: skeleton forming unit, 12: basic residual waveform synthesis unit, 13: interosseous waveform codebook, 14: parameter codebook, 15: parameter decoding unit, 16: basic waveform decoding unit, 17: waveform decoding unit, 18: output speech.

Name of agent: Patent attorney Akira Kokaji (小鍜治 明) and two others

[Figures omitted. Recoverable labels: E2 and "filtering" (FIG. 2); panel titles (a) one-stage example, (b) two-stage example, (c) three-stage example.]

Claims

[Claims]

(1) A speech encoding device comprising: basic waveform extraction means that divides a speech signal into fixed time intervals, performs pitch analysis for each analysis interval, and uses the pitch information to obtain a basic waveform, one pitch in length, that is representative of the analysis interval; parameter extraction means that performs linear prediction analysis for each analysis interval and extracts linear prediction parameters representing the frequency characteristics of the analysis interval; basic prediction residual waveform extraction means for obtaining a linear prediction residual; skeleton encoding means for obtaining and encoding a time series (skeleton) of several types of pulses representing the shape of the basic prediction residual waveform; an interosseous waveform codebook in which a plurality of numbered interosseous waveform samples are stored; and interosseous waveform encoding means for encoding, using the interosseous waveform codebook, the interosseous waveforms stretched between the skeleton obtained by the skeleton encoding means.

(2) The speech encoding device according to claim 1, wherein the interosseous waveform codebook is created by normalizing each of a plurality of interosseous waveforms obtained by analyzing speech signals, with the end points fixed in time and power, and then numbering and storing them.

(3) A speech decoding device comprising: skeleton decoding means that creates a time series (skeleton) of several types of pulses based on encoded information; an interosseous waveform codebook in which a plurality of numbered interosseous waveform samples are stored; interosseous waveform decoding means that, based on the waveform shape information encoded by the interosseous waveform encoding means, decodes the interosseous waveforms stretched between the time series (skeleton) using the interosseous waveform codebook and creates a basic prediction residual waveform; basic waveform decoding means that filters the basic prediction residual waveform created by the interosseous waveform decoding means using the parameters transmitted from the encoder and obtains a one-pitch basic waveform; and intra-analysis-interval waveform decoding means that decodes the waveform within one analysis interval using the one-pitch basic waveform decoded by the basic waveform decoding means.

(4) The speech decoding device according to claim 3, wherein the interosseous waveform codebook is created by normalizing each of a plurality of interosseous waveforms obtained by analyzing speech signals, with the end points fixed in time and power, and then numbering and storing them.