JPS58215700A

JPS58215700A - Voice synthesizer

Info

Publication number: JPS58215700A
Application number: JP9880782A
Authority: JP
Inventors: 竹内　貞二
Original assignee: Nippon Electric Co Ltd
Current assignee: NEC Corp
Priority date: 1982-06-09
Filing date: 1982-06-09
Publication date: 1983-12-15

Abstract

(57) [Abstract] This publication contains application data from before electronic filing, so no abstract data is recorded.

Description

DETAILED DESCRIPTION OF THE INVENTION The present invention relates to a speech synthesis device, and in particular to a synthesis device that requires sound-source amplitude data as a synthesis parameter.

In a parametric speech synthesizer, a sound-source amplitude parameter is generally needed for each predetermined interval to set the initial values for synthesis. The usual choice is the average amplitude value, which is computed by the following equation from a natural speech waveform a such as the one shown in FIG. 1.

Here, A is the average amplitude value, n is the number of samples in a given interval, and a(i) is the amplitude value at time i.
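The equation itself does not appear in this text. The following is a minimal Python sketch under two stated assumptions. First, A is taken to be the mean absolute value (1/n)·Σ|a(i)| of the n samples in one interval, although an RMS value would be an equally plausible reading. Second, the intervals are assumed to be fixed-length and non-overlapping. The function names are hypothetical.

```python
def average_amplitude(samples):
    """Average amplitude A of one interval.

    Assumption: A = (1/n) * sum(|a(i)|), i.e. the mean absolute sample value;
    the patent's own equation is not reproduced in the text.
    """
    n = len(samples)
    if n == 0:
        raise ValueError("interval must contain at least one sample")
    return sum(abs(a) for a in samples) / n


def frame_average_amplitudes(waveform, frame_length):
    """Return A for each consecutive, non-overlapping interval of
    frame_length samples; a trailing partial interval is dropped."""
    if frame_length <= 0:
        raise ValueError("frame_length must be positive")
    return [
        average_amplitude(waveform[start:start + frame_length])
        for start in range(0, len(waveform) - frame_length + 1, frame_length)
    ]


# Two 4-sample intervals; the ninth sample forms a partial interval and is ignored.
print(frame_average_amplitudes([1, -1, 2, -2, 0, 0, 4, -4, 9], 4))  # [1.5, 2.0]
```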

Conventional synthesis with such average amplitude values, for example conventional formant synthesis, takes the average amplitude data of the natural (original) speech from the above equation and uses it as is. It gives no consideration at all to the characteristics of the synthesizer, such as compatibility with the other speech parameters or the errors introduced when those parameters are quantized. This caused several drawbacks: the reproduced synthesized speech showed abnormal amplitude values, its sound quality deteriorated, and its level was difficult to control.

An object of the present invention is to provide a synthesizer that causes neither deterioration of sound quality nor fluctuations in level.

The speech synthesis device of the present invention has means for correcting the average amplitude value extracted from the original speech to suit the synthesizer. It synthesizes speech using the corrected average amplitude value.

According to the present invention, the device comprises four means:
- means for synthesizing speech using a predetermined value as the sound-source amplitude value;
- means for extracting the average amplitude value of the synthesized speech;
- means for comparing the average amplitude value of natural speech with that of the synthesized speech;
- means for correcting the predetermined value according to the result of the comparison.

The corrected value is then used as the sound-source amplitude value of the speech synthesizer.

According to the present invention, the average amplitude value extracted from natural (original) speech is not used directly as the sound-source amplitude value. It is first corrected to suit the synthesizer and only then used. The correction therefore compensates for deterioration of sound quality and fluctuations in level caused by three sources:
- speech parameters other than the average amplitude value obtained by analyzing the original speech, such as the voiced/unvoiced and pitch parameters;
- errors introduced when quantizing the parameters;
- characteristic errors of the synthesizer itself.

As a result, the synthesized speech shows no amplitude anomalies and no loss of sound quality. Furthermore, the same compensation applies to any synthesizer that uses a similar parametric synthesis method. Even if the device itself has characteristic errors that shift the average amplitude value, it can still compensate for them effectively and synthesize speech.
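The following toy sketch illustrates how another parameter, here pitch, can change the output level even when the amplitude parameter stays the same. The impulse-train source is an assumption chosen for illustration, not the patent's synthesizer. Because there is no vocal-tract filter, the mean absolute amplitude of an interval is simply E times the number of pulses, divided by n. The function names and numbers are hypothetical.

```python
def impulse_train(amplitude, pitch_period, n):
    """Hypothetical voiced source: a pulse of height `amplitude` at every
    multiple of `pitch_period`, zeros elsewhere, for `n` samples."""
    return [amplitude if i % pitch_period == 0 else 0.0 for i in range(n)]


def mean_abs(samples):
    """Average amplitude of one interval, taken here as the mean absolute value."""
    return sum(abs(s) for s in samples) / len(samples)


E = 1000.0  # identical source amplitude parameter for both intervals
N = 400     # samples per interval (illustrative)

for period in (50, 100):
    level = mean_abs(impulse_train(E, period, N))
    # 8 pulses at period 50 give 20.0; 4 pulses at period 100 give 10.0
    print(f"pitch period {period}: average amplitude {level}")
```

With the same E, halving the pitch period doubles the measured average amplitude. An uncorrected amplitude parameter therefore cannot reproduce the original level in both intervals.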

One embodiment of the present invention is described in more detail below with reference to FIG. 2.

FIG. 2 is a functional block diagram of an embodiment in which the present invention is applied to a formant synthesizer. In the figure, the devices are numbered as follows: 1 is a speech analyzer, 2 and 4 are average amplitude extractors, 3 and 6 are speech synthesizers, and 5 is a comparator. The signals and parameters are numbered as follows:
- 7 is the natural speech input;
- 8 denotes the parameters other than the sound-source amplitude that are analyzed from the natural speech (speech parameters, voiced/unvoiced parameters, and pitch parameters);
- 9 is the average amplitude value obtained from the original speech;
- 10 is a sound-source amplitude parameter with a predetermined value (the average amplitude value obtained from the natural speech may also be used as is);
- 11 is the synthesized speech.

12 is the average amplitude value of synthesized speech 11, 13 is the sound-source amplitude parameter obtained by comparing values 9, 10, and 12, and 14 is the target synthesized speech.

Average amplitude extractors 2 and 4 may be one and the same shared unit, and so may speech synthesizers 3 and 6. They are shown as separate blocks only to make the explanation easier to follow.

The operation of this configuration is as follows. First, natural speech 7 is input to speech analyzer 1, which includes average amplitude extractor 2. The analyzer outputs average amplitude value 9 and the other parameters 8 (speech parameters and so on).

Parameters 8 and 9 are not used directly for the final speech synthesis. Instead, the sound-source amplitude parameter is adjusted beforehand. For this adjustment, the sound-source amplitude is tentatively set to another predetermined value, sound-source amplitude parameter 10. This parameter is input together with the other parameters 8 to speech synthesizer 3, which is identical to speech synthesizer 6, and speech is synthesized on a trial basis. Average amplitude extractor 4 then obtains average amplitude value 12 of the resulting synthesized speech 11. Comparator 5 uses values 9, 10, and 12 to determine the sound-source amplitude parameter 13 that is optimal for speech synthesizer 6. Finally, speech synthesizer 6 uses this optimal parameter 13 together with the other parameters 8. It produces synthesized speech 14, whose amplitude faithfully matches that of natural speech 7.
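The procedure above can be sketched per interval as shown below, under several assumptions. The synthesizer is modeled as a hypothetical callable that takes the amplitude parameter and the other parameters and returns samples. The average amplitude is taken to be the mean absolute value. The toy synthesizer has a fixed gain of 0.8, which stands in for the device's own characteristic error. All names, the toy synthesizer, and the handling of a silent trial are illustrative and do not come from the patent. As the description allows, a single `synth` object serves as both synthesizer 3 and synthesizer 6.

```python
from typing import Callable, Dict, List

# Hypothetical synthesizer interface: (source amplitude, other params) -> samples
Synthesizer = Callable[[float, Dict[str, float]], List[float]]


def mean_abs(samples: List[float]) -> float:
    """Average amplitude of one interval (mean absolute value; an assumption)."""
    return sum(abs(s) for s in samples) / len(samples)


def toy_synthesizer(amplitude: float, params: Dict[str, float]) -> List[float]:
    """Hypothetical stand-in for synthesizers 3 and 6: an impulse train spaced
    by the pitch period and scaled by a fixed gain error of 0.8."""
    n = int(params["n"])
    period = int(params["pitch_period"])
    gain = 0.8
    return [gain * amplitude if i % period == 0 else 0.0 for i in range(n)]


def synthesize_with_correction(
    synth: Synthesizer,
    frames: List[Dict[str, float]],
    natural_averages: List[float],
    trial_amplitude: float,
) -> List[List[float]]:
    """For each interval: trial-synthesize with the predetermined value E
    (parameter 10), measure D (value 12), set E' = D' * E / D
    (parameter 13), and synthesize the final interval (speech 14)."""
    output = []
    for params, target in zip(frames, natural_averages):
        trial = synth(trial_amplitude, params)  # synthesizer 3 -> speech 11
        measured = mean_abs(trial)              # extractor 4 -> value 12
        if measured == 0.0:
            corrected = 0.0  # assumption: a silent trial gives nothing to scale
        else:
            corrected = target * trial_amplitude / measured  # comparator 5
        output.append(synth(corrected, params))  # synthesizer 6 -> speech 14
    return output


frames = [{"n": 400, "pitch_period": 50}, {"n": 400, "pitch_period": 100}]
result = synthesize_with_correction(toy_synthesizer, frames, [30.0, 30.0], 1000.0)
print([mean_abs(f) for f in result])
```

In this toy case the trial intervals measure 16.0 and 8.0, even though the natural speech has an average of 30.0 in both. After correction, both output intervals come out at 30.0. The comparator scales the parameter by the synthesizer's measured behavior and never needs to know the gain or pitch effect explicitly.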

Next, comparator 5 is described. The goal is for synthesized speech 14 from speech synthesizer 6 to have the same average amplitude as value 9 of original speech 7. Assume that all parameters other than the sound-source amplitude parameter are the same. Then inputting an amplitude parameter with a predetermined value into speech synthesizer 3 yields the average amplitude value 12 of the resulting synthesized speech 11. To obtain synthesized speech 14 with the same average amplitude as value 9 of natural speech 7, comparator 5 therefore only needs to supply the following sound-source amplitude parameter 13 to speech synthesizer 6.

The relation follows from the gain of the speech synthesizer. Define the quantities as follows:
- E is sound-source amplitude parameter 10, which has a predetermined value;
- D is average amplitude value 12 of speech 11 synthesized using E;
- D' is the average amplitude value of the natural speech;
- E' is value 13 that comparator 5 should output, that is, the value to input to speech synthesizer 6 so that synthesized speech 14 has average amplitude D'.

The ratio of output average amplitude to source amplitude is the gain of the synthesizer. When all other parameters are identical, this gain is the same for E and E', so the following holds:

$$\frac{D}{E} = \frac{D'}{E'}, \qquad \text{hence} \qquad E' = \frac{D' \times E}{D}.$$

Comparator 5 can therefore be built to take D', E, and D as inputs and output E'.
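The worked example below applies the comparator relation to illustrative numbers that do not come from the patent. The symbol G = D/E for the synthesizer's effective gain is introduced here only for convenience.

```latex
\begin{aligned}
% Illustrative values only
E &= 1000, \quad D = 20 \;\Rightarrow\; G = \frac{D}{E} = 0.02 \\
D' &= 30 \;\Rightarrow\; E' = \frac{D' \times E}{D} = \frac{30 \times 1000}{20} = 1500 \\
\text{check: } G \times E' &= 0.02 \times 1500 = 30 = D'
\end{aligned}
```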

As explained above, the present invention corrects the sound-source amplitude parameter extracted by the speech analyzer to an optimal value. The correction takes into account the parameter's compatibility with the other parameters and the characteristics of the speech synthesizer. This has the significant effect of faithfully reproducing the amplitude of natural speech.

[Brief Description of the Drawings]

FIG. 1 is a waveform diagram of speech over a given interval. a: speech waveform; n: number of samples in the given interval; a(0), a(1), ...: amplitude values at times 0, 1, ....

FIG. 2 is a functional block diagram showing an embodiment of the present invention. 1: speech analyzer; 2, 4: average amplitude extractors; 3, 6: speech synthesizers; 5: comparator; 7: natural speech input terminal; 8: parameters other than the sound-source amplitude; 9: average amplitude value obtained from natural speech; 10: sound-source amplitude parameter with a predetermined value (the average amplitude value obtained from natural speech may be used); 11: synthesized speech; 12: average amplitude value of synthesized speech 11; 13: sound-source amplitude parameter obtained by comparing values 9, 10, and 12; 14: synthesized speech output terminal.

Claims


1. A speech synthesis device comprising:
- means for synthesizing speech using speech parameters extracted from natural speech and a predetermined sound-source amplitude value parameter;
- means for extracting the average amplitude value of the synthesized speech;
- means for comparing the average amplitude value of the natural speech with that of the synthesized speech and correcting the predetermined value accordingly;
- means for synthesizing the target speech using the corrected value as the sound-source amplitude value, together with the speech parameters.