JPS6040636B2

JPS6040636B2 - speech synthesizer

Info

Publication number: JPS6040636B2
Application number: JP56156797A
Authority: JP
Inventors: 稔黒田; 博糸山
Original assignee: Matsushita Electric Works Ltd
Current assignee: Panasonic Electric Works Co Ltd
Priority date: 1981-09-30
Filing date: 1981-09-30
Publication date: 1985-09-11
Also published as: JPS5857199A

Description

[Detailed description of the invention]

The present invention relates to a speech synthesis device. Its object is to provide a speech synthesis device that can selectively synthesize speech at a plurality of different pitches for each compression parameter, without increasing the storage capacity of the data storage unit.

In general, a speech synthesis device of this type samples the speech signal with sampling pulses whose frequency is higher than the speech frequency and extracts feature parameters consisting of an amplitude parameter representing the loudness of the sound (hereinafter the A parameter), a pitch parameter representing the pitch of the sound, that is, its fundamental period (hereinafter the P parameter), and a spectral parameter representing the timbre of the sound, that is, its spectral distribution (hereinafter the S parameter). Each feature parameter is compressed to a number of bits that reflects how much it contributes to sound quality and is stored in a data storage unit as a compression parameter. The compression parameters read out sequentially from the data storage unit are used to access a playback ROM in which the feature parameters have been stored in advance, and the feature parameters read from the playback ROM drive a sound source to synthesize speech. In such a device, speech that differs only in pitch (fundamental period) has had to be handled in the same way as entirely different speech: compression parameters for the speech at each pitch had to be stored in the data storage unit.

Consequently, to reproduce speech at a pitch suited to the ambient noise or to the user's preference, compression parameters had to be stored in the data storage unit for the speech at each pitch, with the drawback that the storage capacity of the data storage unit had to be made larger than necessary. The present invention has been made in view of this drawback.

An embodiment of a PARCOR type speech synthesis device is described below with reference to the drawings. In the PARCOR speech synthesis method, as shown in FIG. 1, the speech signal Vs is sampled by sampling pulses at a suitable period t0. The correlation contributed by the (p - 1) sampling values lying between the sampled values Xt and Xt-p is removed, and the PARCOR coefficient (partial autocorrelation coefficient, hereinafter the K parameter), which expresses only the correlation between Xt and Xt-p, is used as the S parameter to synthesize speech. The K parameters are obtained by sampling the speech signal Vs at every suitable period t0 (about 100 μsec) within one frame (5 to 20 msec), over which the speech can be regarded as nearly stationary. The correlation coefficient between adjacent sampling values is K1. For sampling values separated by several intervals, the influence of the sampling values lying between them is obtained by linear prediction with least-squares error and subtracted, and the resulting correlation coefficients are K2 to K10.

Coefficients such as K1, K2 and K3, which express the partial autocorrelation with points close to Xt, carry a great deal of information about the spectral distribution, whereas coefficients such as K8, K9 and K10, which express the partial autocorrelation with points far from Xt, carry little. It is therefore effective to assign many quantization bits to the low-order K parameters and few to the high-order K parameters, which saves bits and reduces redundancy. The PARCOR method thus achieves a better band compression ratio than the autocorrelation coefficient method, which uses autocorrelation coefficients as the S parameter and assigns the same number of bits to each coefficient.
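As a rough illustration of how the K parameters relate to the autocorrelation of a frame, the sketch below computes reflection coefficients with the Levinson-Durbin recursion. This is a standard textbook technique and not a procedure given in this specification; the function name `parcor_from_autocorr`, the sign convention and the sample autocorrelation values are assumptions made for the example.

```python
def parcor_from_autocorr(r, order):
    """Return reflection (PARCOR) coefficients k_1..k_order.

    r is the autocorrelation sequence r[0..order] of one frame.
    Sign convention (an assumption): the prediction-error filter is
    A(z) = 1 + a_1 z^-1 + ... and k_m is the last coefficient at order m.
    """
    if r[0] <= 0:
        raise ValueError("r[0] must be positive")
    a = [1.0] + [0.0] * order   # prediction-error filter coefficients
    err = float(r[0])           # prediction error energy at order 0
    ks = []
    for m in range(1, order + 1):
        # Correlation left over after removing what the lower orders explain.
        acc = r[m] + sum(a[i] * r[m - i] for i in range(1, m))
        k = -acc / err
        ks.append(k)
        new_a = a[:]
        for i in range(1, m):
            new_a[i] = a[i] + k * a[m - i]
        new_a[m] = k
        a = new_a
        err *= 1.0 - k * k
    return ks


# Hypothetical frame: first-order process with correlation 0.5 between neighbours.
ks = parcor_from_autocorr([1.0, 0.5, 0.25], 2)
```

With this sign convention the example gives K1 = -0.5 and K2 = 0, because the second lag carries no correlation beyond what the first already explains.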

Each of the A, P and K parameters is normally compressed before being stored or transmitted. For example, 5 bits are assigned to the A parameter, 6 bits to the P parameter, and 7, 6, 5, 4, 4, 4, 3, 3, 3 and 3 bits to K1, K2, ..., K10 respectively. The configuration of an embodiment of the present invention is described below with reference to the illustrated embodiment.
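The bit assignment above can be checked with a short sketch that packs one frame of compressed codes into a single word. The bit widths come from the text; the field order, the function names `pack_frame` and `unpack_frame`, and the use of a Python integer as the frame word are assumptions made for illustration.

```python
# Bits assigned to each compressed parameter, as listed in the text.
BIT_ALLOCATION = {
    "A": 5, "P": 6,
    "K1": 7, "K2": 6, "K3": 5, "K4": 4, "K5": 4,
    "K6": 4, "K7": 3, "K8": 3, "K9": 3, "K10": 3,
}
FRAME_BITS = sum(BIT_ALLOCATION.values())


def pack_frame(codes):
    """Pack one frame of codes into an integer (field order is an assumption)."""
    word = 0
    for name, bits in BIT_ALLOCATION.items():
        value = codes[name]
        if not 0 <= value < (1 << bits):
            raise ValueError(f"{name} does not fit in {bits} bits")
        word = (word << bits) | value
    return word


def unpack_frame(word):
    """Inverse of pack_frame: split the integer back into per-parameter codes."""
    codes = {}
    for name, bits in reversed(list(BIT_ALLOCATION.items())):
        codes[name] = word & ((1 << bits) - 1)
        word >>= bits
    return codes
```

Summing the widths gives 53 bits per frame, so a frame in which every code is at its maximum packs to a word of exactly 53 set bits.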

FIG. 3 is a block diagram of the speech synthesis device according to the embodiment. As shown in the figure, the device consists of two chips, a control IC A that includes the data storage unit 8 and a speech synthesis IC (the portion excluding the dotted-line sections A and B), and data is passed between them bit-serially.

All speech feature parameters are stored in the playback ROM 1 as 10-bit data. The playback ROM 1 contains a corrected speech storage section, which holds corrected pitch parameters (hereinafter Pm parameters) for synthesizing corrected speech whose pitch has been corrected, and a standard speech storage section, which holds standard pitch parameters (P parameters) for synthesizing standard speech having the standard pitch. The number of data items assigned to each feature parameter is distributed optimally according to how much that parameter contributes to sound quality. FIG. 4 shows the number of data items stored in the playback ROM 1 for each of the feature parameters Pm, A, P and K10 to K1. For the A parameter, for example, 32 data items of 10 bits each are stored, so the relative address needed to access any A parameter data item is 5 bits long. Because this relative address expresses the feature parameter compressed to the necessary minimum, it is called a compression parameter. The actual feature parameter stored in the playback ROM 1 is called a playback parameter.

As is clear from the above, the playback parameters are 10 bits long for every one of the feature parameters Pm, A, P and K10 to K1, whereas the compression parameters differ in length: 5, 6, 3, 3, 3, 3, 4, 4, 4, 5, 6 and 7 bits for A, P and K10 to K1 respectively (53 bits in total). The Pm parameter is accessed with the relative address (compression parameter) of the P parameter. In addition, a spare area of 3 bits, that is, 8 data items, is reserved in the playback ROM.

One set of compression parameters (53 bits) is extracted every 20 msec (one frame), over which the speech signal can be regarded as nearly stationary, so the speech signal can be recorded at no more than 2650 bits/sec. Taking silent intervals and repeated intervals into account, it can in practice be recorded at about 1600 bits/sec.

The compression parameters (that is, the relative addresses into the playback ROM 1) are read from the data storage unit 8 and stored bit-serially, frame by frame, in the ring register 3 via the switching circuit 10. The relative address alone is not enough to retrieve data from the playback ROM 1. The head addresses stored in the index ROM 2 as shown in FIG. 5 are therefore read out in turn under the control of the address counter 11 and added to the relative address by the adder circuit 4 to give the absolute address (9 bits) of the playback ROM 1, and the playback ROM 1 is accessed with this absolute address.
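The address calculation and the bit rate described above can be sketched as follows. The table widths are those given in the text, with Pm taken to have the same 6-bit width as P because it is accessed with the P relative address. The consecutive head addresses are a hypothetical layout, since the contents of FIG. 5 are not reproduced here, and the names `build_index_rom` and `absolute_address` are illustrative.

```python
# Compressed code width per table, in the order the text lists FIG. 4.
TABLE_BITS = [
    ("Pm", 6), ("A", 5), ("P", 6),
    ("K10", 3), ("K9", 3), ("K8", 3), ("K7", 3),
    ("K6", 4), ("K5", 4), ("K4", 4), ("K3", 5), ("K2", 6), ("K1", 7),
]
SPARE_ENTRIES = 8  # spare area of 3 bits, i.e. 8 data items


def build_index_rom(table_bits):
    """Assign consecutive head addresses (a hypothetical layout, not FIG. 5)."""
    index, head = {}, 0
    for name, bits in table_bits:
        index[name] = head
        head += 1 << bits  # a b-bit relative address selects one of 2**b entries
    return index, head


INDEX_ROM, USED_ENTRIES = build_index_rom(TABLE_BITS)


def absolute_address(table, relative):
    """Head address from the index ROM plus the relative address."""
    bits = dict(TABLE_BITS)[table]
    if not 0 <= relative < (1 << bits):
        raise ValueError("relative address out of range")
    return INDEX_ROM[table] + relative


# Pm has no compressed code of its own: it reuses the P relative address.
FRAME_BITS = sum(bits for name, bits in TABLE_BITS if name != "Pm")
BITS_PER_SECOND = FRAME_BITS * 1000 // 20  # one frame every 20 msec
```

Under this layout the tables occupy 464 entries, or 472 with the spare area, which fits within the 512 entries addressable by the 9-bit absolute address mentioned in the text.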

In the embodiment, the way the fundamental period is generated differs between synthesis of standard speech and synthesis of corrected speech. To synthesize corrected speech, a pitch control code is added to the head of the compressed A parameter among the compression parameters supplied from the control IC A. When the pitch control code is detected, the pitch correction signal VM is output. The pitch switching circuit 30, which receives the pitch correction signal VM, then controls the adder circuit 4 so that the head address of the absolute address becomes 0, and the Pm parameter is read from the corrected speech storage section of the playback ROM 1 using the compression parameter of the P parameter. When the correction signal VM is not present, the P parameter is read from the standard speech storage section of the playback ROM 1.

The Pm parameter raises or lowers the pitch of the synthesized corrected speech by a fixed correction ratio. In the embodiment the correction ratio is set to +10%, and the corrected speech is corrected toward the high-pitched side relative to the standard speech. The method of synthesizing speech with the fundamental period corresponding to the P or Pm parameter is described later. The correction ratio may be chosen as appropriate. To provide several correction ratios (for example -20%, -10%, +10% and +20%), the capacity of the corrected speech storage section is multiplied accordingly and the pitch control code is given several bits, so that the P parameter or any one of the several Pm parameters read out by the compressed P parameter can be selected freely. A pitch selector switch may also be provided in place of the pitch control code detection circuit 9.

The operation of reading the playback parameters stored in the playback ROM 1 is now described in detail. The index ROM 2 stores the number of bits allotted to each compression parameter as a 3-bit binary number, one sharing bit used to reduce the storage capacity of the playback ROM 1, and a spare bit corresponding to the spare area in the playback ROM 1. The data on the number of allotted bits is sent to the playback control circuit 12, which sends that number of shift clocks to the ring register 3. The ring register 3 therefore sends the compression parameters (relative addresses) serially to the adder circuit according to the number of allotted bits: 5 bits for the A parameter, 6 bits for the P parameter, 3 bits for the K10 parameter, ..., and 7 bits for the K1 parameter. The ring register 3 is built from dynamic shift registers so as to occupy as little chip area as possible. The head address in the playback ROM 1 of each feature parameter, stored in the index ROM 2, is sent to the adder circuit 4 one bit at a time via the parallel-serial conversion circuit 13, so the absolute address is computed by adding one bit at a time. The absolute address, computed in this way as serial data, is converted to parallel data by the serial-parallel converter 14 and can then be used to access the playback ROM 1.

The feature parameters output from the playback ROM 1 are updated every frame. If a feature parameter changes discontinuously at the boundary between frames when the data is updated, the speech signal may be distorted and intelligibility reduced. An interpolation calculation circuit 5 is therefore provided so that the feature parameters change smoothly on update, performing approximate linear interpolation at 8 points within one frame.
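The pitch switching described above, in which the same compressed P code selects either the standard or the corrected pitch value, can be sketched as follows. The head address 0 for the Pm table is stated in the text. The head address 96 for the P table, the ROM contents and the function names are hypothetical and serve only to make the example runnable.

```python
P_CODE_BITS = 6
PM_HEAD = 0   # the text states the head address is forced to 0 for Pm
P_HEAD = 96   # hypothetical head address of the standard P table


def pitch_address(p_code, vm):
    """Absolute address of the pitch entry selected by the compressed P code.

    vm models the pitch correction signal VM: True selects the corrected
    speech storage section, False the standard speech storage section.
    """
    if not 0 <= p_code < (1 << P_CODE_BITS):
        raise ValueError("compressed P code out of range")
    head = PM_HEAD if vm else P_HEAD
    return head + p_code


def read_pitch(rom, p_code, vm):
    """Read the playback pitch parameter for one frame."""
    return rom[pitch_address(p_code, vm)]


# Toy ROM with arbitrary placeholder values, not figures from the specification.
rom = [0] * 512
rom[P_HEAD + 3] = 100   # standard entry for code 3
rom[PM_HEAD + 3] = 200  # corrected entry for the same code
```

The point of the arrangement is that the stored frame data is identical in both cases; only the head address supplied to the adder changes.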

The interpolation calculation circuit 5 does not operate when corrected speech is synthesized. The interpolation calculation circuit 5 is controlled by the timing control circuit 28. As shown in FIG. 2, the timing control circuit 28 generates 8 D clocks for interpolation (2.5 msec) in one frame (20 msec), 25 P clocks for reading parameters (100 μsec) in one D clock, and 22 T clocks for reading bits (4.5 μsec) in one P clock. Data is read into the ring register 3 during D1, the first of the 8 D clocks.
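The clock hierarchy and the 8-point interpolation can be sketched together. The frame length, the 8 D clocks and the 22 T clocks per P clock are from the text; the figure of 25 P clocks per D clock is the one implied by 2.5 msec divided by 100 μsec. The `interpolate` function is an assumed form of straight-line interpolation, since the circuit's exact approximation is not specified.

```python
from fractions import Fraction

FRAME_US = 20_000   # one frame, 20 msec
D_PER_FRAME = 8     # interpolation (D) clocks per frame
P_PER_D = 25        # parameter-read (P) clocks per D clock
T_PER_P = 22        # bit-read (T) clocks per P clock

D_US = Fraction(FRAME_US, D_PER_FRAME)  # 2500 μsec
P_US = D_US / P_PER_D                   # 100 μsec
T_US = P_US / T_PER_P                   # about 4.5 μsec


def interpolate(previous, target, steps=D_PER_FRAME):
    """Values at each D clock when moving linearly from previous to target."""
    return [previous + (target - previous) * Fraction(i, steps)
            for i in range(1, steps + 1)]
```

For example, a parameter moving from 0 to 80 over one frame takes the values 10, 20, ..., 80 at the eight D clocks.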

The compression parameters A, P, K10, ..., K1 are read in turn on the odd-numbered P clocks; the A parameter, for example, is read on the five T clocks T6 to T10 of the P1 interval. The even-numbered P clocks and the remaining T clocks are used as timing for the interpolation calculation circuit 5, the sound source ROM 6, the digital filter 7 and so on. The feature parameters, updated to new values every 2.5 msec by the interpolation calculation circuit 5, are held temporarily in the P latch 16 and the AK latch 23. Parameters not needed for the time being in the interpolation calculation are all transferred to the AK parameter stack 24 and accumulated as speech synthesis data for the digital filter 7.

The data on the fundamental period of the speech held in the P latch 16, that is, the Pm or P parameter, is preset in the preset-type subtraction counter 17. The clock of the subtraction counter 17 is switched by the clock switching circuit 17a between the standard speech clock (P clock), whose frequency equals that of the sampling pulses, and the corrected speech clock (T clock), whose frequency is higher than that of the sampling pulses. The clock switching circuit 17a is controlled by the pitch correction signal VM output from the pitch control code detection circuit 9. The 0 output signal VR of the subtraction counter 17 resets the address counter 18 of the sound source ROM 6, so sound source control data is read sequentially from the sound source ROM 6 at a fundamental period equal to the period of the 0 output signal VR, and this data drives the voiced sound source 19 to generate a voiced sound having that fundamental period. The sound source control data reproduces the residual waveform obtained by frequency analysis of the original sound so that the timbre is reproduced faithfully. When the speech has no fundamental period, the sound source control circuit 20 drives the switching circuit 22 to switch to the unvoiced sound source 21, which generates white noise having no fundamental period.

The A parameter and the K parameters are supplied to the digital filter 7, which reproduces the speech by adding information on amplitude and spectral distribution to the signal supplied from the sound source circuit. In FIG. 3, 25 is an amplifier, 26 is a speaker and 27 is a crystal oscillator circuit. The operation of the fundamental period generating section for standard speech and corrected speech is described concretely below.
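The interplay of the preset subtraction counter and the sound source ROM address counter can be modelled in a few lines. This is a behavioural sketch only: the function name `voiced_excitation`, the toy waveform and the choice to output silence once the stored waveform has been read to its end are assumptions, not details from the specification.

```python
def voiced_excitation(source_rom, preset, n_ticks):
    """Replay source_rom from address 0 every `preset` clock ticks."""
    if preset < 1:
        raise ValueError("preset must be at least 1")
    samples = []
    counter = preset   # subtraction counter 17, preset from the P latch 16
    address = 0        # address counter 18 of the sound source ROM 6
    for _ in range(n_ticks):
        # Past the end of the stored waveform the source stays silent (assumption).
        samples.append(source_rom[address] if address < len(source_rom) else 0)
        address += 1
        counter -= 1
        if counter == 0:   # 0 output signal VR: reload counter, reset address
            counter = preset
            address = 0
    return samples
```

With a two-sample waveform `[1, 2]` and a preset of 3, the output repeats as 1, 2, 0, 1, 2, 0, ..., so the preset value alone sets the fundamental period of the excitation.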

When the pitch correction signal VM is not being output by the pitch control code detection circuit 9, the P latch 16, which holds the data that sets the fundamental period of the speech, latches the P parameter (an integer) read from the standard speech storage section of the playback ROM 1, and the clock of the subtraction counter 17 is switched to the standard speech clock, that is, the P clock (100 μsec).

The period of the 0 output signal VR of the subtraction counter 17 is therefore an integer multiple of 100 μsec, and the speech generated from the sound source control data read from the sound source ROM 6 by the address counter 18, which is reset by this 0 output signal VR, has that period. If the P parameter is 25, for example, the fundamental period is 100 × 25 μsec (fundamental frequency 400 Hz).

When the pitch correction signal VM is output by the pitch control code detection circuit 9, the P latch 16 latches the Pm parameter (an integer) read from the corrected speech storage section of the playback ROM 1, and the clock of the subtraction counter 17 is switched by the clock switching circuit 17a to the corrected speech clock, that is, the T clock (4.5 μsec). The period of the 0 output signal VR of the subtraction counter 17 is therefore an integer multiple of 4.5 μsec. Suppose the compressed P parameter that reads the P parameter 25 from the standard speech storage section reads the Pm parameter 610 from the corrected speech storage section. The subtraction counter 17 then produces the 0 output signal VR with a period of 4.5 × 610 μsec, and the speech generated from the sound source control data read from the sound source ROM 6 under the address counter 18 has a fundamental period of 4.5 × 610 μsec (364 Hz), so corrected speech shifted by about 10% toward the low-pitched side is synthesized. The Pm parameter 610 corresponds to a P parameter of 27.45 and serves to synthesize speech about 10% lower in pitch than the standard speech.
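The arithmetic of this example can be checked with a short sketch. The clock periods, the P value of 25 and the equivalent P value of 27.45 are taken from the text; the T clock count is derived here from those figures rather than assumed, and comes out as 610. The function name `fundamental` is illustrative.

```python
P_CLOCK_US = 100.0  # standard speech clock period
T_CLOCK_US = 4.5    # corrected speech clock period


def fundamental(count, clock_us):
    """Return (period in μsec, frequency in Hz) for a counter preset."""
    period_us = count * clock_us
    return period_us, 1e6 / period_us


# Standard speech: P parameter 25 counted on the P clock.
std_period, std_hz = fundamental(25, P_CLOCK_US)

# Corrected speech: the T clock count whose period matches an equivalent
# P value of 27.45, derived from the stated figures.
pm_count = round(27.45 * P_CLOCK_US / T_CLOCK_US)
cor_period, cor_hz = fundamental(pm_count, T_CLOCK_US)

period_change = cor_period / std_period - 1.0
```

The standard case gives 2500 μsec and 400 Hz. The corrected case gives 2745 μsec, about 364 Hz, which is a period roughly 9.8% longer and therefore a lower pitch.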

The corrected speech synthesized in this way poses no problem as far as the fundamental period is concerned, but a slight problem arises when spectral information based on the K parameters is added by the digital filter 7. The arithmetic in the digital filter 7 is performed in synchronization with the P clock, so if the address counter 18 is reset out of synchronization with the P clock, errors arise in the digital filter 7 and the synthesized speech is distorted. In the embodiment, the 0 output signal VR from the subtraction counter 17 is therefore applied to the reset terminal of the address counter 18 via the reset pulse generation circuit 40 shown in FIG. 6. The reset pulse generation circuit 40 consists of inverters 41a and 41b, a capacitor 42, a NAND gate 43, a D flip-flop 44 and an AND gate 45. As shown in the time chart of FIG. 7a, it outputs the P clock immediately following the 0 output signal VR from the subtraction counter 17 as the reset pulse VR' of the address counter 18. In the figure, trace イ is the 0 detection signal VR when standard speech with a P parameter of 12 is synthesized, trace ロ is the 0 detection signal VR when corrected speech is synthesized from the Pm parameter 284, which corresponds to a P parameter of 12.8, and trace ハ is the reset pulse VR' when the same corrected speech is synthesized.

Because the reset pulse VR' output from the reset pulse generation circuit 40 is synchronized with the P clock, the reset intervals of the address counter 18 are not uniform. When the period of the 0 detection signal VR is 4.5 × 284 μsec, the address counter 18 is reset after counting 13 P clocks in some cycles and after counting 12 P clocks in others, in a ratio of 4:1.

The sound source ROM 6 is thus addressed, and the voiced sound source 19 controlled, at a fundamental period equivalent to a P parameter of 12.8, so corrected speech having the prescribed fundamental period is synthesized. The time chart in FIG. 7b illustrates the relationship between the 0 detection signal VR and the reset pulse VR' more clearly, taking as an example the reset pulse VR' corresponding to a 0 output signal VR of 3.75 kHz (267 μsec period).
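The synchronization of the reset pulse to the P clock can be modelled as rounding each VR instant up to the next P clock edge. The sketch below assumes an ideal VR period expressed in P clock units and assumes that a VR falling exactly on a P clock edge is taken on that edge; the function names are illustrative.

```python
from fractions import Fraction
from math import ceil


def reset_positions(vr_period_in_p_clocks, n):
    """P clock numbers at which the reset pulse VR' is issued.

    Each VR instant is moved to the first P clock edge at or after it
    (the 'at or after' rule is an assumption of this sketch).
    """
    return [ceil(vr_period_in_p_clocks * i) for i in range(1, n + 1)]


def reset_intervals(vr_period_in_p_clocks, n):
    """Number of P clocks counted between successive resets."""
    positions = [0] + reset_positions(vr_period_in_p_clocks, n)
    return [b - a for a, b in zip(positions, positions[1:])]


# 3.75 kHz example: VR period of 800/3 μsec, i.e. 8/3 of a 100 μsec P clock.
fig_7b = reset_positions(Fraction(8, 3), 6)

# Equivalent P value of 12.8: five VR periods span exactly 64 P clocks.
p_12_8 = reset_intervals(Fraction(64, 5), 5)
```

For the 3.75 kHz case this yields resets on P clocks 3, 6, 8, 11, 14 and 16, and for an equivalent P value of 12.8 it yields four intervals of 13 P clocks for every interval of 12, matching the 4:1 ratio given in the text.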

As the figure shows, the 3rd, 6th, 8th, 11th, 14th, 16th, ... pulses of the P clock are output as the reset pulse VR'. The sound source ROM 6 is addressed by the address counter 18, which is reset by this reset pulse VR', so voiced sound source data is read from the sound source ROM 6 at a period that can be regarded as equivalent to 3.75 kHz (267 μsec period). The voiced sound source 19 is thus driven at the prescribed fundamental frequency and the corrected speech is synthesized at the correct pitch.

As described above, the present invention provides, in the playback ROM, a standard speech storage section that stores standard pitch parameters for synthesizing standard speech having a standard pitch and a corrected speech storage section that stores corrected pitch parameters for synthesizing corrected speech whose pitch has been corrected, together with a pitch switching circuit that switches the access method of the playback ROM as appropriate so that the pitch parameter read from the playback ROM on the basis of the compression parameter is either the standard or the corrected pitch parameter. Speech at several different pitches can therefore be synthesized selectively for each compression parameter without increasing the storage capacity of the data storage unit, and a speech synthesis device of simple construction can be provided that synthesizes speech at a pitch suited to the ambient noise or the user's preference.

[Brief description of the drawings]

FIG. 1 is a diagram explaining the principle of the speech synthesis method of an embodiment of the present invention. FIG. 2 is a diagram explaining its operation. FIG. 3 is its block circuit diagram. FIGS. 4 and 5 show the configurations of its playback ROM and index ROM. FIG. 6 is a circuit diagram of its main part. FIGS. 7a and 7b are diagrams explaining its operation.

1 is the playback ROM, 8 is the data storage unit, 19 and 21 are sound sources, and 30 is the pitch switching circuit.

Claims

1. A speech synthesis device in which a speech signal is sampled with sampling pulses whose frequency is higher than the speech frequency to extract feature parameters consisting of an amplitude parameter, a pitch parameter and a spectral parameter, each feature parameter is compressed to a number of bits according to the degree to which it contributes to sound quality and stored in a data storage unit as a compression parameter, a playback ROM in which the feature parameters have been stored in advance is accessed with the compression parameters read out sequentially from the data storage unit, and a sound source is driven by the feature parameters read from the playback ROM to synthesize speech, characterized in that the playback ROM is provided with a standard speech storage section that stores standard pitch parameters for synthesizing standard speech having a standard pitch and a corrected speech storage section that stores corrected pitch parameters for synthesizing corrected speech whose pitch has been corrected, and in that a pitch switching circuit is provided that switches the access method of the playback ROM as appropriate so that the pitch parameter read from the playback ROM on the basis of the compression parameter is the standard or the corrected pitch parameter.