
JPH0549998B2 - Google Patents

Info

Publication number
JPH0549998B2
JPH0549998B2 JP58180246A JP18024683A
Authority
JP
Japan
Prior art keywords
speech
speech synthesis
information
symbol string
type
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired - Lifetime
Application number
JP58180246A
Other languages
Japanese (ja)
Other versions
JPS6073589A (en)
Inventor
Hiroshi Ichikawa
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hitachi Ltd
Original Assignee
Hitachi Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hitachi Ltd filed Critical Hitachi Ltd
Priority to JP58180246A priority Critical patent/JPS6073589A/en
Publication of JPS6073589A publication Critical patent/JPS6073589A/en
Publication of JPH0549998B2 publication Critical patent/JPH0549998B2/ja
Granted legal-status Critical Current

Links

Description

DETAILED DESCRIPTION OF THE INVENTION

[Field of Application of the Invention]

The present invention relates to a speech synthesis device, and particularly to a speech synthesis device suitable for outputting, in an easy-to-hear form, speech that expresses complex content.

[Background of the Invention]

An example of conventional easy-to-hear speech synthesis is described in Japanese Patent Laid-Open No. 56-29297. However, when converting a complex sentence to speech — for example, a sentence containing an annotation enclosed in parentheses, such as "The human voice comes in many varieties (for example, male and female voices)" — a conventional system synthesizes it literally as "hito no koe ni wa samazama na shurui, open-parenthesis, tatoeba otoko to onna no koe, close-parenthesis, ga aru". The resulting sentence form is awkward, and is not sufficient for the listener to grasp the meaning of the text.

[Object of the Invention]

An object of the present invention is to provide a speech synthesis device that converts such complex written sentences into speech in a format that is easy to hear.

[Summary of the Invention]

The present invention is characterized by comprising: control means that receives an input symbol string containing a sentence with an explanatory part enclosed by predetermined symbols (for example, parentheses), analyzes the input symbol string, generates — by a predetermined synthesis-by-rule method — first speech-synthesis information from the input symbol string with the enclosed explanatory part ignored, the first speech-synthesis information including information indicating the position of the ignored explanatory part, and generates second speech-synthesis information from the input symbol string of the ignored explanatory part by the predetermined synthesis-by-rule method; storage means that stores the first speech-synthesis information and the second speech-synthesis information respectively; and synthesis means that reads out the first speech-synthesis information from the storage means and synthesizes speech of a first type using speech segments of a first type (for example, a first speaker), and, when the information indicating the position of the explanatory part is read out during readout of the first speech-synthesis information, switches to reading out the second speech-synthesis information stored in the storage means and synthesizes speech of a second type using speech segments of a second type (for example, a second speaker).
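The two-stream preparation described above — ignoring the parenthesized explanatory part while leaving a position marker in the first stream, and generating a second stream from the explanatory part itself — can be sketched as follows. This is an illustrative reconstruction, not the patent's implementation; the function names, the `MARKER` token, and the dictionary record format are all invented for the example.

```python
import re

MARKER = "<ANNOT>"  # position marker left where the parenthesized part was


def split_annotations(text):
    """Split input text into main text (with markers) and a list of annotations."""
    annotations = re.findall(r"\(([^)]*)\)", text)
    main = re.sub(r"\([^)]*\)", MARKER, text)
    return main, annotations


def build_control_info(text):
    """Build the two 'speech-synthesis information' streams.

    Stream 1: main text, tagged for speaker 1, containing position markers.
    Stream 2: one record per annotation, tagged for speaker 2.
    """
    main, annotations = split_annotations(text)
    stream1 = {"speaker": 1, "symbols": main}
    stream2 = [{"speaker": 2, "symbols": a} for a in annotations]
    return stream1, stream2


main, annots = split_annotations(
    "The human voice comes in many varieties (for example, male and female voices).")
print(main)    # main text with the annotation replaced by <ANNOT>
print(annots)  # list containing the annotation text
```

In a real rule-synthesis pipeline each stream would then be expanded into accent, duration, and pitch-pattern control data; here the sketch stops at the symbol-string level.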

The main text of a sentence to be converted to speech and the parenthesized part that explains some or all of it are grammatically independent. The present invention therefore achieves the above object by simulating, at synthesis time, a format in which a first speaker utters the main text and a second speaker explains the parenthesized part. In this way, by following the first speaker's voice the listener can hear the main text directly, while hearing the explanation as commentary in the second voice.

[Embodiment of the Invention]

An embodiment of the present invention will now be described with reference to the drawing, taking the first object as an example. In Fig. 1, a symbol string representing an input sentence is fed to a control unit 1. The control unit 1 analyzes the input symbol string and, ignoring the part enclosed in parentheses, determines the accent type of each word by a known synthesis-by-rule method, estimates the duration of the phrase corresponding to each character, takes the pitch pattern of the first speaker's voice from part A of a pitch-pattern memory 3, assembles this into control information for segment-editing speech synthesis, and writes it into a first part of a control-information buffer 6. This information also records that the voice is the first speaker's, and a marker is attached so that the position of the ignored parenthesized part can be identified. Next, the control unit 1 takes out the text inside the parentheses, takes the second speaker's pitch-pattern information from part B of the pitch-pattern memory 3, creates control information by the same procedure, and writes it into a second part of the control-information buffer 6. This control information likewise records that the voice is the second speaker's. When the creation of the control information is complete, the control unit 1 instructs an editing/synthesis unit 7 to synthesize speech. The editing/synthesis unit 7 takes the information from the first part of the control-information buffer 6 and, following its commands, controls a switch 5 to select the first speaker's segment part 4-1 of a speech-segment memory 4, synthesizing speech in sequence. When the marker indicating the position of the parentheses is encountered in the data of the first part of the control-information buffer 6, processing jumps to the data of the second part; the switch 5 accordingly switches to the second speaker, the second speaker's segments are taken from memory part 4-2, and the text inside the parentheses is synthesized in sequence. On completion, processing returns to the first speaker's data and synthesis continues. Speech synthesis using speech segments is a known technique, described for example in the Journal of the Institute of Electronics and Communication Engineers, Vol. 58-D, No. 9, p. 522, "Method for creating speech segments in a monosyllable-editing speech synthesis method using speech segments"; synthesis by rule is likewise well known, as in the Acoustical Society of Japan speech study group material "Monosyllable-editing voice response method using speech segments" and many other sources.
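The readout-and-switch behavior of the editing/synthesis unit 7 — read stream 1 with the first speaker's segments, jump to stream 2 when the position marker is hit, then resume — can be sketched as a small interpreter. The token format, the `MARKER` value, and the way a "speech-segment memory" is modeled as a string prefix are all assumptions made for illustration; they are not the patent's actual data structures.

```python
MARKER = "<ANNOT>"  # marker written into stream 1 where the parentheses were


def synthesize(stream1_tokens, stream2_queue, segment_memory):
    """Emit (speaker, unit) pairs, switching voices at each marker.

    stream1_tokens: tokens of the main text, containing MARKER entries
    stream2_queue:  one token list per ignored parenthesized part, in order
    segment_memory: maps a speaker number to that speaker's segment store
                    (modeled here as a string prefix, like parts 4-1 / 4-2)
    """
    out = []
    annot_iter = iter(stream2_queue)
    for tok in stream1_tokens:
        if tok == MARKER:
            # the switch flips to the second speaker's segment memory
            for atok in next(annot_iter):
                out.append((2, segment_memory[2] + atok))
            # ...then control returns to the first speaker's data
        else:
            out.append((1, segment_memory[1] + tok))
    return out


voices = {1: "m:", 2: "f:"}  # e.g. male main voice, female annotation voice
print(synthesize(["many", "voices", MARKER, "."], [["male", "female"]], voices))
```

The jump-and-return at the marker is the essential point: the annotation is synthesized in the second voice in place, then the main text continues in the first voice.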

Although this embodiment uses the speech-segment method in the synthesis unit, it goes without saying that other synthesis methods, such as the PARCOR method, may also be used.

In the above embodiment the parentheses appear in only one place, but it goes without saying that extension to multiple places is easy. The voices are likewise not limited to two types; the system can easily be extended to provide more. When two types are used, a male voice and a female voice would be appropriate. Naturally, the speech can also be modified, for example by changing the output level of one voice or turning it into a whisper.

To achieve the second object, the same basic configuration as in Fig. 1 is used: a code distinguishing control information from data is added to the input signal (such a distinction is generally made in conventional terminals as well), and the control unit 1 switches and selects the voice according to that code. This is easily realized.
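This code-driven selection for the second object might look like the following sketch. The code values ("C" for control information, "D" for data) and the speaker assignment are invented for the example; the patent only states that such a distinguishing code exists.

```python
def route_by_code(records):
    """Select a speaker per input record from its type code.

    Each record is (code, text); "C" = control information, "D" = data.
    The specific codes and speaker numbers are assumptions for illustration.
    """
    speaker_for = {"C": 1, "D": 2}
    return [(speaker_for[code], text) for code, text in records]


print(route_by_code([("C", "READY"), ("D", "42"), ("C", "DONE")]))
```

Downstream, each (speaker, text) pair would be synthesized exactly as in the first embodiment, with the switch selecting the matching segment memory.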

[Effects of the Invention]

As explained above, according to the present invention, multiple kinds of information that in written form are easily distinguished by parentheses, position, color, and so on can be easily heard, distinguished, and understood in speech, by exploiting differences between voices to mark the kind of information.

[Brief Description of the Drawing]

Fig. 1 is a diagram illustrating an embodiment of the present invention. 1: control unit; 2: input signal line; 3: pitch-pattern memory; 4: speech-segment memory; 5: switch; 6: control-information buffer; 7: editing/synthesis unit; 8: synthesized output signal line.

Claims (1)

[Claims]

1. A speech synthesis device comprising: control means for receiving an input symbol string containing a sentence with an explanatory part enclosed by predetermined symbols, analyzing the input symbol string, generating, by a predetermined synthesis-by-rule method, first speech-synthesis information from the input symbol string with the explanatory part enclosed by the predetermined symbols ignored, the first speech-synthesis information including information indicating the position of the explanatory part enclosed by the predetermined symbols, and generating, by the predetermined synthesis-by-rule method, second speech-synthesis information from the input symbol string of the ignored explanatory part; storage means for storing the first speech-synthesis information and the second speech-synthesis information respectively; and synthesis means for reading out the first speech-synthesis information stored in the storage means and synthesizing speech of a first type using speech segments of a first type, and, when the information indicating the position of the explanatory part enclosed by the predetermined symbols is read out during readout of the first speech-synthesis information, switching to readout of the second speech-synthesis information stored in the storage means and synthesizing speech of a second type using speech segments of a second type.

2. A speech synthesis device according to claim 1, wherein the explanatory part enclosed by the predetermined symbols is an explanatory part enclosed by parentheses.

3. A speech synthesis device according to claim 1, wherein the first type and the second type are a first speaker and a second speaker.
JP58180246A 1983-09-30 1983-09-30 Voice synthesization system Granted JPS6073589A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
JP58180246A JPS6073589A (en) 1983-09-30 1983-09-30 Voice synthesization system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
JP58180246A JPS6073589A (en) 1983-09-30 1983-09-30 Voice synthesization system

Publications (2)

Publication Number Publication Date
JPS6073589A JPS6073589A (en) 1985-04-25
JPH0549998B2 true JPH0549998B2 (en) 1993-07-27

Family

ID=16079918

Family Applications (1)

Application Number Title Priority Date Filing Date
JP58180246A Granted JPS6073589A (en) 1983-09-30 1983-09-30 Voice synthesization system

Country Status (1)

Country Link
JP (1) JPS6073589A (en)

Families Citing this family (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPS61259295A (en) * 1985-05-14 Mitsubishi Heavy Industries, Ltd. Voice conversion system for sentence
JPS6375799A (en) * 1986-09-19 Fujitsu Ltd Voice rule synthesizer
JPS63231493A (en) * 1987-03-20 Sanyo Electric Co Ltd Reciting of sentence using voice rule synthesizer
JPS63259691A (en) * 1987-04-17 Matsushita Electric Industrial Co Ltd Voice output device
JPS6488599A (en) * 1987-09-30 Matsushita Electric Ind Co Ltd Voice synthesizer
JP2740510B2 (en) * 1988-02-09 Ricoh Co Ltd Text-to-speech synthesis method
US9348554B2 (en) * 2011-12-20 2016-05-24 Audible, Inc. Managing playback of supplemental information

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPS58154900A (en) * 1982-03-10 Ricoh Co Ltd Sentence voice converter

Patent Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPS58154900A (en) * 1982-03-10 Ricoh Co Ltd Sentence voice converter

Also Published As

Publication number Publication date
JPS6073589A (en) 1985-04-25

Similar Documents

Publication Publication Date Title
EP1490861B1 (en) Method, apparatus and computer program for voice synthesis
US5561736A (en) Three dimensional speech synthesis
US6148285A (en) Allophonic text-to-speech generator
JPH08212228A (en) Summarized sentence generation device and summarized voice generation device
JP2007086316A (en) Speech synthesizer, speech synthesizing method, speech synthesizing program, and computer readable recording medium with speech synthesizing program stored therein
JPH06161704A (en) Speech interface builder system
EP0936597B1 (en) Storage medium having electronic circuits, and voice synthesizer having the storage medium
JPH0549998B2 (en)
JPH0527789A (en) Voice synthesizer
JP3936351B2 (en) Voice response service equipment
JP3617603B2 (en) Audio information encoding method and generation method thereof
JP2005215888A (en) Display device for text sentence
JPH0876796A (en) Voice synthesizer
JPH06337876A (en) Sentence reader
JP3578961B2 (en) Speech synthesis method and apparatus
JPH07200554A (en) Sentence read-aloud device
JPS6184771A (en) Voice input device
JP2577372B2 (en) Speech synthesis apparatus and method
JPH05224689A (en) Speech synthesizing device
JPH02153397A (en) Voice recording device
JPH08328575A (en) Voice synthesizer
JPS613241A (en) Speech recognition system
JPH05173587A (en) Speech synthesizer
JPH04243299A (en) Voice output device
JPH02251998A (en) Voice synthesizing device