JPH0419800A

JPH0419800A - Voice synthesizing device

Info

Publication number: JPH0419800A
Application number: JP2125074A
Authority: JP
Inventors: Kanji Kunisawa; 国澤　寛治; Noboru Uechi; 上地　登; Akira Yamamura; 山村　彰; Junko Omukai; 大向　順子
Original assignee: Matsushita Electric Works Ltd
Current assignee: Panasonic Electric Works Co Ltd
Priority date: 1990-05-15
Filing date: 1990-05-15
Publication date: 1992-01-23

Abstract

PURPOSE: To reduce the circuit scale and to narrow the quality difference between the synthesized speech of fixed words and that of variable words, by performing both fixed-word synthesis and variable-word synthesis with a common vocoder. CONSTITUTION: The device is equipped with a storage means 18 that stores codes of the output speech of fixed words, generated by vector quantization or matrix quantization with respect to standard patterns. The device is further equipped with a vocoder 8 that decodes and synthesizes both the codes extracted from the storage means 18 and the codes of variable words. The variable-word codes are generated from a correspondence table between the synthesis units of text synthesis and the standard patterns, together with the phonological information and prosodic information obtained by text synthesis 6 of the variable words' character strings. Fixed words and variable words are thus synthesized by the common vocoder 8. Consequently, the circuit scale is reduced and the quality difference between the synthesized speech of fixed words and that of variable words is kept small.

Description

[Detailed Description of the Invention] [Field of Industrial Application] The present invention relates to a speech synthesis device that outputs synthesized speech of variable words, i.e. words that can be replaced within a fixed sentence, combined with synthesized speech of fixed words, which cannot be replaced.

[Prior Art] There are three types of speech synthesis: the recording-editing method, the parameter-editing method, and the text synthesis method.

The recording-editing method and the parameter-editing method (hereinafter collectively called recording-editing type methods) produce high-quality synthesized speech, but they require the work of recording natural speech corresponding to every utterance to be output and storing it (in compressed form) in advance.

The text synthesis method does not require this work: any utterance can be output simply by inputting a character string. At the current state of the art, however, the quality of its synthesized speech is low.

Consider first implementing, with a recording-editing type method alone, a speech synthesis device whose output is a fixed sentence in which some words vary, for example a person's name. Call the words that vary in this way variable words, and the words and phrases that remain when the variable words are removed from the fixed sentence fixed words. The simplest way to store the output speech is to store the complete fixed sentence for every variable word, but this requires a large storage capacity.

To solve this problem, one could store the speech data of each fixed word and of each variable word in advance in a fixed-word speech data file 1 and a variable-word speech data file 2, as shown in Fig. 4. A recording-editing synthesis unit 3 receives a selection signal that selects fixed words and variable words in sequence, reads the corresponding fixed-word and variable-word speech data from speech data files 1 and 2 according to that signal, and concatenates them, outputting the synthesized speech from a speaker 4.

In this case the naturalness of the prosody generally deteriorates, but it can be improved by synthesizing the variable words with prosodic information adjusted to match that of the fixed words.

The drawback of this method, however, is that only variable words whose speech has been stored in advance can be output.

Therefore, as shown in Fig. 5, one could input an output speech instruction signal consisting of selection signals for the fixed words and character-string signals for the variable words to be output, and have a fixed-word/variable-word separation unit 5 split it into the selection signals and the character-string signals. Based on the selection signals, the recording-editing synthesis unit 3 reads fixed-word speech data from the fixed-word speech data file 1 and synthesizes the fixed words; meanwhile, the character-string signals for the variable words are input to a text synthesis unit 6, which performs text synthesis. The speech synthesized by each path is added by a demultiplexer 7, and the speech of the fixed sentence is output from the speaker 4.

[Problems to be Solved by the Invention] The conventional example of Fig. 5 requires both the recording-editing synthesis unit 3 for synthesizing fixed words and the text synthesis unit 6 for synthesizing variable words, so the circuit scale becomes large. In addition, the quality of the synthesized speech of fixed words differs too much from that of variable words, which makes the synthesized speech sound unnatural.

The present invention was made in view of the above problems. Its object is to provide a speech synthesis device that has a small circuit scale, keeps the quality difference between the synthesized speech of fixed words and that of variable words small, performs high-quality speech synthesis, and is inexpensive to manufacture.

[Means for Solving the Problems] To achieve the above object, the invention of claim 1 is a speech synthesis device that outputs the speech of a fixed sentence by combining fixed words, whose wording does not change, with variable words, which can be changed. The device comprises a storage means that stores codes formed by encoding the output speech of the fixed words by vector quantization or matrix quantization with respect to standard patterns, and a vocoder that decodes and synthesizes both the codes extracted from that storage means and the codes of the variable words. The variable-word codes are generated from a correspondence table between the synthesis units of text synthesis and the standard patterns, together with the phonological information and prosodic information obtained by text synthesis of the variable word's character string.

The invention of claim 2 comprises a storage means that stores codes of variable words generated in advance; the codes of the variable words are extracted from this storage means and supplied to the vocoder.

[Operation] In the invention of claim 1, fixed words and variable words are synthesized by a common vocoder. This reduces the circuit scale and narrows the quality difference between the synthesized speech of fixed words and that of variable words, so high-quality synthesized speech is obtained at lower cost.

The invention of claim 2 eliminates the need for a text synthesis unit in the speech synthesis device, so the circuit scale can be reduced further. As a result, the device can be made smaller and lighter, and its cost reduced further.

[Embodiments] The present invention is described below by way of embodiments.

Fig. 1 shows the overall configuration of this embodiment, in which both fixed words and variable words are synthesized by a common matrix quantization LPC vocoder 8. In the method using the matrix quantization LPC vocoder 8, training proceeds as shown in Fig. 2(a). A linear prediction analysis means 9 performs linear prediction analysis on the speech waveform; a segmentation means 10 divides the resulting time series of spectral parameters into segments according to their temporal variation; a resampling means 11 resamples each segment; and a matrix quantization means 12 expresses each resampled segment as a matrix. These matrices serve as standard patterns, from which a codebook 13 is built. During encoding, an encoding means 14 encodes the speech as the spectrum, expressed as a time series of standard patterns from the codebook 13, together with the segment lengths and the pitch.
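The training and encoding steps above can be sketched roughly as follows. This is a minimal, hypothetical illustration in plain Python, not the method of the cited work: frames are lists of spectral parameters, a simple frame-to-frame distance threshold stands in for the segmentation means 10, linear interpolation to a fixed number of frames `k` stands in for the resampling means 11, and the codebook is assumed to be given (codebook training, e.g. by clustering, is omitted). Pitch is reduced to one mean value per segment, which is a simplification.

```python
import math

def frame_distance(a, b):
    """Euclidean distance between two spectral-parameter frames."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def segment(frames, threshold):
    """Split frames wherever consecutive frames differ by more than threshold.

    Hypothetical stand-in for segmentation means 10.
    Returns (start, end) index pairs with end exclusive.
    """
    bounds = [0]
    for i in range(1, len(frames)):
        if frame_distance(frames[i - 1], frames[i]) > threshold:
            bounds.append(i)
    bounds.append(len(frames))
    return list(zip(bounds, bounds[1:]))

def resample(seg, k):
    """Linearly resample a segment (list of frames) to exactly k frames."""
    n = len(seg)
    out = []
    for j in range(k):
        pos = j * (n - 1) / (k - 1) if k > 1 else 0.0
        i0 = int(math.floor(pos))
        i1 = min(i0 + 1, n - 1)
        w = pos - i0
        out.append([(1 - w) * x + w * y for x, y in zip(seg[i0], seg[i1])])
    return out

def matrix_distance(m1, m2):
    """Sum of squared frame distances between two k-frame matrices."""
    return sum(frame_distance(r1, r2) ** 2 for r1, r2 in zip(m1, m2))

def encode(frames, pitches, codebook, threshold, k):
    """Encode speech as (pattern index, segment length, mean pitch) triples."""
    codes = []
    for s, e in segment(frames, threshold):
        mat = resample(frames[s:e], k)
        # Matrix quantization: nearest standard pattern in the codebook.
        idx = min(range(len(codebook)),
                  key=lambda c: matrix_distance(mat, codebook[c]))
        codes.append((idx, e - s, sum(pitches[s:e]) / (e - s)))
    return codes

if __name__ == "__main__":
    frames = [[0.0], [0.1], [0.0], [5.0], [5.1], [5.0]]
    pitches = [100.0] * 3 + [120.0] * 3
    codebook = [[[0.0], [0.0]], [[5.0], [5.0]]]  # two hypothetical 2-frame patterns
    print(encode(frames, pitches, codebook, threshold=1.0, k=2))
    # -> [(0, 3, 100.0), (1, 3, 120.0)]
```

Keeping the segment length and pitch outside the codebook is what lets the same standard patterns be reused with different durations and intonation, which the variable-word path described below relies on.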

For decoding and synthesis, as shown in Fig. 2(b), the generated codes are input to a decoding means 15 and decoded, after which a linear prediction synthesis means 16 produces the synthesized speech waveform. This matrix quantization LPC vocoder method is a known technique, described in Yoshihisa Shiraki and Masaaki Honda, "Extremely low bit-rate speech coding using time-space spectral patterns" (1984).

In this embodiment, the original speech used for the synthesized fixed words and the original speech for the synthesis units that the text synthesis unit 6 stores for the variable words come from the same speaker. First, standard patterns are extracted by the above method from a large quantity of speech uttered by this speaker to create the codebook 13. Next, a file of the synthesis units (CV syllables) used in text synthesis is generated from the same speech. Each synthesis unit is then encoded with respect to its spectrum only; that is, the time series of standard patterns for each synthesis unit is determined, and a synthesis-unit/standard-pattern correspondence table 17 is prepared.
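The decoding and linear prediction synthesis of Fig. 2(b) can be sketched as below. This is a hypothetical illustration under strong simplifying assumptions: each codebook frame is taken to hold direct-form predictor coefficients a_1..a_p for s[n] = e[n] + sum_i a_i s[n-i] (real vocoders often store other parameter sets), each k-frame pattern is stretched back to its segment length by nearest-frame lookup, and the excitation is an impulse train at the coded pitch, with silence for a non-positive pitch. Unvoiced excitation and gain coding are not modeled.

```python
def decode(codes, codebook):
    """Expand (pattern index, segment length, pitch) codes into per-frame parameters.

    Returns a list of (predictor_coefficients, pitch_hz), one per frame.
    """
    frames = []
    for idx, length, pitch in codes:
        pattern = codebook[idx]
        k = len(pattern)
        for t in range(length):
            src = min(k - 1, t * k // length)  # nearest pattern frame
            frames.append((pattern[src], pitch))
    return frames

def lpc_synthesize(frames, fs=8000, frame_len=80, gain=1.0):
    """All-pole LPC synthesis driven by an impulse train at each frame's pitch."""
    out = []
    history = []  # past output samples, newest last
    phase = 0.0   # samples remaining until the next pitch pulse
    for coeffs, pitch in frames:
        for _ in range(frame_len):
            excitation = 0.0
            if pitch > 0:
                if phase <= 0.0:
                    excitation = gain
                    phase += fs / pitch
                phase -= 1.0
            s = excitation + sum(a * history[-i]
                                 for i, a in enumerate(coeffs, start=1)
                                 if i <= len(history))
            history.append(s)
            out.append(s)
    return out

if __name__ == "__main__":
    codebook = [[[0.5], [0.5]]]  # one hypothetical 2-frame, first-order pattern
    frames = decode([(0, 1, 1000.0)], codebook)
    print(lpc_synthesize(frames, fs=8000, frame_len=4))
    # -> [1.0, 0.5, 0.25, 0.125]
```

Because both fixed-word and variable-word codes share this format, a single decoder of this kind can serve both paths.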

In the fixed-word synthesis path, on the other hand, the output speech is encoded by the above method using these standard patterns, and the resulting codes are stored in advance in a suitable storage means as a fixed-word speech data file 18.

The text synthesis unit 6 provided in the variable-word synthesis path has a sentence analysis unit 19 and a prosodic information generation unit 20, which generate phonological information and prosodic information from the input character-string signal. From this information, a matrix quantization LPC vocoder code generation unit 21 generates codes in the format of the above method. For the spectrum, the time series of standard patterns is obtained from the synthesis-unit/standard-pattern correspondence table 17 according to the sequence of synthesis units derived from the phonological information. For the pitch, the pitch pattern generated from the prosodic information is converted into a form that can be expressed per segment. For the segment lengths, each length is determined from the way each synthesis unit is stretched or compressed according to the phoneme-duration data in the prosodic information. Normally it is the steady-state part of each synthesis unit that is stretched or compressed, so the length of the segment (standard pattern) corresponding to that steady-state part is determined and encoded.
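The code generation for a variable word can be sketched as follows. This is a hypothetical illustration: the table contents, the syllable names, and the layout (pattern index, nominal length in frames, steady-state flag) are invented for the example; each synthesis unit is assumed to contain exactly one steady-state segment; and pitch is simplified to one value per synthesis unit rather than a converted pitch contour.

```python
# Hypothetical synthesis-unit/standard-pattern table (cf. table 17):
# CV syllable -> list of (pattern index, nominal length in frames, is_steady)
UNIT_TABLE = {
    "ka": [(3, 2, False), (7, 4, True)],
    "na": [(5, 2, False), (1, 5, True)],
}

def generate_codes(syllables, durations, pitches, table=UNIT_TABLE):
    """Build (pattern index, segment length, pitch) codes for a variable word.

    syllables: sequence of synthesis units from the phonological information.
    durations: target length in frames for each unit (prosodic information).
    pitches:   one pitch value per unit (simplified pitch pattern).
    Only the steady-state segment is stretched or compressed to hit the target.
    """
    codes = []
    for syl, target, pitch in zip(syllables, durations, pitches):
        segments = table[syl]
        if sum(1 for _, _, steady in segments if steady) != 1:
            raise ValueError(f"unit {syl!r} must have exactly one steady segment")
        transitional = sum(n for _, n, steady in segments if not steady)
        for idx, n, steady in segments:
            length = max(1, target - transitional) if steady else n
            codes.append((idx, length, pitch))
    return codes

if __name__ == "__main__":
    print(generate_codes(["ka", "na"], [10, 6], [120.0, 110.0]))
    # -> [(3, 2, 120.0), (7, 8, 120.0), (5, 2, 110.0), (1, 4, 110.0)]
```

Leaving the transitional segments at their nominal length and absorbing the duration change in the steady-state segment follows the description above; the clamp to at least one frame is an added safeguard for very short target durations.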

In this embodiment, the output speech instruction signal input from outside consists of character-string signals specifying variable words and selection signals selecting fixed words, arranged in the order in which they are to be spoken. A fixed-word/variable-word separation unit 22 splits this instruction signal into character-string signals and selection signals, and a speech data extraction unit 23 extracts the speech data, that is, the codes, of the fixed words specified by the selection signals from the fixed-word speech data file 18.

Accordingly, to synthesize the speech of a given fixed sentence, it suffices to give the speech synthesis device an output speech instruction signal in which the numbers designating the fixed words and the character strings designating the variable words are arranged in output order. The fixed-word/variable-word separation unit 22 splits the given signal into number signals, which serve as selection signals, and character-string signals, and sends them to the speech data extraction unit 23 and the text synthesis unit 6, respectively. For fixed words, the codes designated by the selection signals are extracted from the fixed-word speech data file 18; for variable words, codes are generated by the matrix quantization LPC vocoder code generation unit 21. These codes are passed through the demultiplexer 7 to the matrix quantization LPC vocoder 8, which decodes and synthesizes them, and the synthesized speech is emitted from the speaker 4. The codebook 8a and synthesis section 8b of the matrix quantization LPC vocoder 8 correspond to the decoding means 15 and the linear prediction synthesis means 16 of Fig. 2(b), respectively.
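The routing described above can be sketched as follows. This is a hypothetical illustration: the instruction signal is modeled as a list in which integers are fixed-word selection numbers and strings are variable-word character strings, the fixed-word file is a dictionary of code lists, and `text_to_codes` is a placeholder for the text synthesis unit 6 with code generation unit 21. The point is that both paths emit codes in one format, so a single vocoder consumes the merged stream.

```python
from typing import Callable, Dict, List, Tuple, Union

Code = Tuple[int, int, float]  # (pattern index, segment length, pitch)

def separate(instruction: List[Union[int, str]]) -> List[Tuple[str, Union[int, str]]]:
    """Fixed-word/variable-word separation (cf. unit 22): tag each item by type."""
    return [("fixed", item) if isinstance(item, int) else ("variable", item)
            for item in instruction]

def build_code_stream(instruction: List[Union[int, str]],
                      fixed_file: Dict[int, List[Code]],
                      text_to_codes: Callable[[str], List[Code]]) -> List[Code]:
    """Merge fixed-word and variable-word codes in output order for one vocoder."""
    stream: List[Code] = []
    for kind, item in separate(instruction):
        if kind == "fixed":
            stream.extend(fixed_file[item])     # speech data extraction (cf. unit 23)
        else:
            stream.extend(text_to_codes(item))  # code generation (cf. unit 21)
    return stream

if __name__ == "__main__":
    fixed_file = {1: [(0, 3, 100.0)], 2: [(2, 4, 100.0)]}
    fake_tts = lambda s: [(9, len(s), 110.0)]  # hypothetical text-to-code stub
    print(build_code_stream([1, "tanaka", 2], fixed_file, fake_tts))
    # -> [(0, 3, 100.0), (9, 6, 110.0), (2, 4, 100.0)]
```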

Thus, in this embodiment, both fixed words and variable words are synthesized by the synthesis section 8b of the same matrix quantization LPC vocoder 8. Moreover, the speech parameter file for the standard patterns of the codebook 8a used in the vocoder 8 and the speech parameter file for the synthesis units used in the text synthesis unit 6 can be shared, which further reduces the scale, and the quality difference between the synthesized speech of fixed words and that of variable words becomes small. The only remaining differences between fixed words and variable words in this embodiment therefore stem from imperfections in the prosodic information generated by the text synthesis unit 6 and in the method of creating the synthesis units.

Alternatively, the text synthesis unit 6 need not be built into the speech synthesis device. As shown in Fig. 3(a), it may instead be provided separately as a variable-word speech data file creation device A, which creates the speech data, that is, the codes, of the variable words in advance and stores them in a suitable storage means as a variable-word speech data file 25; this file 25 is then included in the speech synthesis device as shown in Fig. 3(b). In this case only pre-registered words can be output, but a common speech data extraction unit 23' can extract the speech data (codes) specified by the selection signal from either the fixed-word speech data file 18 or the variable-word speech data file 24. As a result, the demultiplexer 7 of the embodiment of Fig. 1 becomes unnecessary, and the circuit configuration can be simplified further, reducing cost, size, and weight.
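The common extraction of Fig. 3(b) reduces to a lookup over two pre-built code files. In this hypothetical sketch, each selection signal is modeled as a pair naming the file ("F" for fixed words, "V" for variable words) and an entry number; how the actual device distinguishes the two files is not specified in the text.

```python
from typing import Dict, List, Tuple

Code = Tuple[int, int, float]  # (pattern index, segment length, pitch)

def extract(selections: List[Tuple[str, int]],
            fixed_file: Dict[int, List[Code]],
            variable_file: Dict[int, List[Code]]) -> List[Code]:
    """Common speech data extraction (cf. unit 23'): no text synthesis at run time."""
    files = {"F": fixed_file, "V": variable_file}
    stream: List[Code] = []
    for which, number in selections:
        stream.extend(files[which][number])
    return stream

if __name__ == "__main__":
    fixed_file = {1: [(0, 3, 100.0)]}
    variable_file = {7: [(4, 5, 110.0)]}
    # Registering an additional variable word is just adding an entry
    # whose codes were produced offline (cf. creation device A).
    variable_file[8] = [(6, 2, 105.0)]
    print(extract([("F", 1), ("V", 8)], fixed_file, variable_file))
    # -> [(0, 3, 100.0), (6, 2, 105.0)]
```

Since every entry is already in vocoder code format, adding a variable word only requires running the offline creation device once and appending its output to the file.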

The user can also easily register additional variable words.

Although matrix quantization is used in each of the above embodiments, other encoding methods, for example a waveform segment vocoder method, may of course be used, as long as the encoding uses segmentation together with vector quantization or matrix quantization.

[Effects of the Invention] The invention of claim 1 comprises a storage means that stores codes formed by encoding the output speech of the fixed words by vector quantization or matrix quantization with respect to standard patterns, and a vocoder that decodes and synthesizes both the codes extracted from the storage means and the codes of the variable words, which are generated from a correspondence table between the synthesis units of text synthesis and the standard patterns together with the phonological and prosodic information obtained by text synthesis of the variable word's character string. Fixed words and variable words are therefore synthesized by a common vocoder, which reduces the circuit scale and narrows the quality difference between the synthesized speech of fixed words and that of variable words, making it possible to realize a high-quality, low-cost speech synthesis device.

In the invention of claim 2, which comprises a storage means storing codes of variable words generated in advance and extracts the codes of the variable words from this storage means to supply them to the vocoder, the speech synthesis device no longer needs a text synthesis unit. The circuit scale can therefore be reduced further, so the device can be made smaller, lighter, and cheaper.

[Brief explanation of the drawing]

Fig. 1 is a circuit configuration diagram of an embodiment of the present invention; Fig. 2(a) is a circuit configuration diagram explaining encoding in the matrix quantization LPC vocoder of the same embodiment; Fig. 2(b) is a circuit configuration diagram explaining decoding in that vocoder; Fig. 3(a) is a circuit configuration diagram of the variable-word speech data file creation device used in another embodiment of the present invention; Fig. 3(b) is a circuit configuration diagram of that other embodiment; Fig. 4 is a circuit configuration diagram of a conventional example; and Fig. 5 is a circuit configuration diagram of another conventional example.

6 is the text synthesis unit, 7 the demultiplexer, 8 the matrix quantization LPC vocoder, 18 the fixed-word speech data file, 19 the sentence analysis unit, 20 the prosodic information generation unit, 21 the matrix quantization LPC vocoder code generation unit, 22 the fixed-word/variable-word separation unit, and 23 the speech data extraction unit.

Claims

[Claims]

(1) A speech synthesis device that outputs the speech of a fixed sentence by combining fixed words, whose wording does not change, and variable words, which can be changed, the device comprising: a storage means that stores codes formed by encoding the output speech of the fixed words by vector quantization or matrix quantization based on standard patterns; and a vocoder that decodes and synthesizes the codes of the variable words, generated from a correspondence table between the synthesis units of text synthesis and the standard patterns and from the phonological information and prosodic information obtained by text synthesis of the character strings of the variable words, together with the codes extracted from the storage means.

(2) The speech synthesis device according to claim 1, further comprising a storage means that stores codes of variable words generated in advance, wherein the codes of the variable words are extracted from this storage means and supplied to the vocoder.