JP4684609B2

JP4684609B2 - Speech synthesizer, control method, control program, and recording medium

Info

Publication number: JP4684609B2
Application number: JP2004284240A
Authority: JP
Inventors: 武史川本
Original assignee: Clarion Co Ltd
Current assignee: Faurecia Clarion Electronics Co Ltd
Priority date: 2004-09-29
Filing date: 2004-09-29
Publication date: 2011-05-18
Anticipated expiration: 2024-09-29
Also published as: JP2006098695A

Description

The present invention relates to a speech synthesizer, a control method, a control program, and a recording medium. More particularly, it relates to a speech synthesizer that receives text information and performs speech synthesis on it, together with its control method, control program, and recording medium.

Conventionally, some navigation devices that guide the user along a route from the current location to a destination are equipped with a TTS (Text To Speech) controller. The controller performs speech synthesis and reads aloud text information corresponding to route guidance, as well as arbitrarily specified text information (VICS information, e-mail, etc.) (see, for example, Patent Document 1).
In such devices, speech synthesis (utterance) is performed using fixed values, preset in the navigation device, for voice parameters such as speaking rate, voice pitch, and voice thickness. The quality of the synthesized speech (voice quality) is therefore always the same, which is a problem.
Patent Document 1: Japanese Patent Application Laid-Open No. H5-113795

When people speak, they often change their manner of speaking (speaking rate, accent, loudness, etc.) in parts of an utterance according to how important that content is. This can make the content easier for the listener to hear and let the listener grasp it more quickly.
In the conventional navigation system described above, however, the quality of the synthesized speech is always constant. The system therefore cannot meet user requests such as reading a long e-mail quickly, or reading proper names (place names, building names, facility names, etc.) more slowly or in a louder voice.
SUMMARY OF THE INVENTION Accordingly, an object of the present invention is to provide a speech synthesizer that can appropriately change the manner of speech synthesis based on the type and content of the text information to be synthesized, together with a control method, a control program, and a recording medium therefor.

To solve the above problems, a speech synthesizer that reads text aloud by performing speech synthesis based on input text information comprises the following units:
- a mode setting unit that sets the speech reading mode to one of a mail reading mode, a long mail high-speed reading mode, a traffic information reading mode, and a route guidance mode, according to the type of input source of the text information;
- a parameter specifying unit that specifies the combination of speech synthesis control parameters corresponding to the set speech reading mode, so that proper nouns are pronounced loudly and clearly in the mail reading mode or the long mail high-speed reading mode, proper names such as place names and interchange names are pronounced loudly and clearly in the traffic information reading mode, and distances, directions, and landmarks are pronounced loudly and clearly in the route guidance mode; and
- a speech synthesis unit that performs speech synthesis based on the specified speech synthesis parameters and outputs the result as speech.
With this configuration, the mode setting unit sets the speech reading mode to one of the mail reading mode, the long mail high-speed reading mode, the traffic information reading mode, and the route guidance mode, according to the type of input source of the text information.
The parameter specifying unit then specifies the combination of speech synthesis control parameters corresponding to the set speech reading mode. Proper nouns are pronounced loudly and clearly in the mail reading mode or the long mail high-speed reading mode. Proper names such as place names and interchange names are pronounced loudly and clearly in the traffic information reading mode. Distances, directions, and landmarks are pronounced loudly and clearly in the route guidance mode. The speech synthesis unit performs speech synthesis based on the specified speech synthesis parameters and outputs the result as speech.

In this case, the input source of the text information may be a mail application, a traffic information processing application, or a navigation application. The mode setting unit may then set the speech reading mode as follows:
- to the mail reading mode or the long mail high-speed reading mode when the input source is the mail application;
- to the traffic information reading mode when the input source is the traffic information processing application; and
- to the route guidance mode when the input source is the navigation application.
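The mode setting described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patented implementation. The enum names, the `select_mode` function, and the threshold value are hypothetical. The text here does not say how the two mail modes are chosen. The embodiment later switches to the long mail high-speed reading mode when a mail exceeds a predetermined reference capacity, and that capacity is measured here in UTF-8 bytes as an assumption.

```python
from enum import Enum, auto


class Source(Enum):
    """Applications that may request text-to-speech output."""
    MAIL = auto()
    TRAFFIC_INFO = auto()
    NAVIGATION = auto()


class ReadingMode(Enum):
    MAIL = auto()
    LONG_MAIL_FAST = auto()
    TRAFFIC_INFO = auto()
    ROUTE_GUIDANCE = auto()


# Hypothetical "reference capacity" for long mail; the patent gives no value.
LONG_MAIL_THRESHOLD_BYTES = 1000


def select_mode(source: Source, text: str,
                threshold: int = LONG_MAIL_THRESHOLD_BYTES) -> ReadingMode:
    """Choose a speech reading mode from the input source (and, for mail, its size)."""
    if source is Source.MAIL:
        # Mail larger than the reference capacity is read in the high-speed mode.
        size = len(text.encode("utf-8"))
        return ReadingMode.LONG_MAIL_FAST if size > threshold else ReadingMode.MAIL
    if source is Source.TRAFFIC_INFO:
        return ReadingMode.TRAFFIC_INFO
    if source is Source.NAVIGATION:
        return ReadingMode.ROUTE_GUIDANCE
    raise ValueError(f"unsupported input source: {source!r}")
```

A lookup table could replace the branches. Explicit branches were chosen so that the extra mail-size rule stays visible.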

A control method for a speech synthesizer that reads text aloud by performing speech synthesis based on input text information comprises the following steps:
- a mode setting step of setting the speech reading mode to one of a mail reading mode, a long mail high-speed reading mode, a traffic information reading mode, and a route guidance mode, according to the type of input source of the text information;
- a parameter specifying step of specifying the combination of speech synthesis control parameters corresponding to the set speech reading mode, so that proper nouns are pronounced loudly and clearly in the mail reading mode or the long mail high-speed reading mode, proper names such as place names and interchange names in the traffic information reading mode, and distances, directions, and landmarks in the route guidance mode; and
- a speech synthesis step of performing speech synthesis based on the specified speech synthesis parameters and outputting the result as speech.
In this case, the input source of the text information may be a mail application, a traffic information processing application, or a navigation application. In the mode setting step, the speech reading mode may then be set as follows:
- to the mail reading mode or the long mail high-speed reading mode when the input source is the mail application;
- to the traffic information reading mode when the input source is the traffic information processing application; and
- to the route guidance mode when the input source is the navigation application.

A control program may be used to have a computer control a speech synthesizer that reads text aloud by performing speech synthesis based on input text information. The program causes the computer to:
- set the speech reading mode to one of the mail reading mode, the long mail high-speed reading mode, the traffic information reading mode, and the route guidance mode, according to the type of input source of the text information;
- specify the combination of speech synthesis control parameters corresponding to the set speech reading mode, so that proper nouns are pronounced loudly and clearly in the mail reading mode or the long mail high-speed reading mode, proper names such as place names and interchange names in the traffic information reading mode, and distances, directions, and landmarks in the route guidance mode; and
- perform speech synthesis based on the specified speech synthesis parameters and output the result as speech.
In this case, the input source of the text information may be a mail application, a traffic information processing application, or a navigation application. The program may then cause the computer to set the speech reading mode as follows:
- to the mail reading mode or the long mail high-speed reading mode when the input source is the mail application;
- to the traffic information reading mode when the input source is the traffic information processing application; and
- to the route guidance mode when the input source is the navigation application.
Each of the above control programs may also be recorded on a computer-readable recording medium.

According to the present invention, the manner of speech synthesis can be changed appropriately based on the type and content of the text information to be synthesized.

Embodiments of the present invention are described below with reference to the drawings. In the following description, an in-vehicle navigation device (a so-called car navigation device) is used as the example navigation device.
FIG. 1 is a block diagram showing the functional configuration of the navigation device 100 according to the present embodiment. As shown in the figure, the navigation device 100 includes the following components:
- an absolute position/orientation detection unit 1, a relative orientation detection unit 2, and a vehicle speed detection unit 3;
- a main control unit 4;
- a ROM 5, a DRAM 6, an SRAM 7, and a VRAM 8;
- a user interface unit 9, a display unit 10, and an input unit 11;
- a disk control unit 12, an FM multiplexed signal processing unit 13, and an external recording device control unit 14; and
- a voice data generation unit (speech synthesis output unit) 15.

The absolute position/orientation detection unit 1 includes a receiver (including an antenna) that receives GPS radio waves transmitted from GPS (Global Positioning System) satellites. Based on these radio waves, it calculates the absolute position coordinates and orientation of the vehicle carrying the navigation device 100 (the current location, i.e. the own vehicle position) on the earth's surface. It outputs the result to the main control unit 4. The relative orientation detection unit 2 has a gyro sensor; it detects the relative orientation of the own vehicle and outputs it to the main control unit 4. The vehicle speed detection unit 3 processes vehicle speed pulses obtained from the vehicle and outputs the speed of the own vehicle to the main control unit 4.

The main control unit 4 controls each part of the navigation device 100 and executes various processes, such as those for the navigation function. It includes a CPU as an arithmetic means and other peripheral circuits.

The ROM 5 stores various programs in advance and is accessible by the main control unit 4. These include a control program, a BIOS (Basic Input Output System), a boot program for starting the device, and a program that implements the navigation function.

The DRAM 6 is a volatile memory used as a work area for the main control unit 4.

The SRAM 7 is a non-volatile memory that functions as a backup memory. It is powered by a main power source (not shown), such as the vehicle's accessory power supply. While the main power source is off, it is powered by a backup power source (not shown), such as a battery, so that its stored contents are always retained.

The VRAM 8 is a buffer memory into which the screen data displayed on the display unit 10 is written.

Under the control of the user interface unit 9, the display unit 10 displays various kinds of information, such as a navigation map, the vehicle position, and operation menus. It includes a display device such as an LCD (Liquid Crystal Display) or an EL (Electro Luminescent) display. The input unit 11 receives the user's instruction operations and outputs them to the user interface unit 9. It includes a plurality of operation elements on the front of the navigation device 100 and a touch panel (not shown) on the display device of the display unit 10. The input unit 11 may also be configured so that the navigation device 100 can be operated remotely with a remote controller or the like.

The user interface unit 9 includes an I/O (Input/Output) control circuit and a driver circuit. It serves as the interface connecting the display unit 10 and the input unit 11 to the main control unit 4. Specifically, under the control of the main control unit 4, the user interface unit 9 controls the display of the display unit 10 and outputs operations on the input unit 11 to the main control unit 4.

The disk control unit 12 controls storage devices that store map data used for navigation and various other data, such as a CD-ROM drive, a DVD-ROM drive, or a hard disk drive. The FM multiplexed signal processing unit 13 receives FM multiplex broadcast waves. It extracts VICS (Vehicle Information and Communication System) information on traffic jams, accidents, traffic regulations, and the like, and outputs that information to the main control unit 4. The external recording device control unit 14 writes data to and reads data from external recording media such as memory cards, Memory Stick (registered trademark), and CompactFlash (registered trademark) cards.

FIG. 2 is a block diagram outlining the configuration of the voice data generation unit.
The voice data generation unit 15 includes a synthesis unit sequence conversion unit 15A, which receives a phonetic symbol string.
When a phonetic symbol string is input, the synthesis unit sequence conversion unit 15A analyzes it and converts it into a synthesis unit sequence symbol string, i.e. a string of speech synthesis unit symbols. It then outputs the result to the speech synthesis control parameter generation unit 15B.
The speech synthesis control parameter generation unit 15B refers to preset TTS parameters and to a speech synthesis rule database (DB) 15C. From these it generates, as a time series, the speech synthesis control parameters that control the sound source unit 15D and the speech synthesis filter 15E.

Here, the speech synthesis control parameter generation unit 15B functions as a mode setting unit. It sets the speech reading mode according to the content of the text information or the type of its input source (mail application, navigation application, traffic information processing application, etc.). Multiple sets of TTS parameters are preset, one for each speech reading mode. The speech synthesis control parameter generation unit 15B automatically sets the speech reading mode from the content or input source type of the text information, and refers to the corresponding TTS parameter set.
The generated speech synthesis control parameters fall into two groups: parameters that determine the transfer characteristics of the vocal tract, and parameters related to the sound source characteristics (pitch (fundamental frequency), source amplitude, voiced/unvoiced, etc.). These parameters are set in correlation with one another within fixed time frames.

As a result, the sound source unit 15D and the speech synthesis filter 15E synthesize continuous speech based on the set parameters, and the synthesized speech is output from the speaker 15F.
In this configuration, the user must run a TTS parameter setting process before speech synthesis is used to read out information. This process presets multiple sets of TTS parameters, one for each speech reading mode.

FIG. 3 is a flowchart of the TTS parameter setting process.
First, the user selects the speech reading mode N for which TTS parameters are to be set (step S1).
Examples of speech reading mode N include the following:
- Mail reading mode: either normal speech synthesis is performed, or proper nouns and the like are pronounced loudly and clearly.
- Long mail high-speed reading mode: proper nouns and the like are pronounced loudly and clearly, and the speaking rate is increased, so that the content of the mail can be grasped in a short time.
- Traffic information (VICS) reading mode: proper names such as place names and interchange names are pronounced loudly and clearly.
- Route guidance mode: distances, directions (such as the direction of travel), and landmarks (traffic signals, buildings, etc.) are pronounced loudly and clearly.
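The per-mode emphasis rules listed above can be sketched as a lookup from reading mode to the word categories that should be read loudly and clearly. This is an illustrative sketch only. The category labels are hypothetical tags assumed to come from the text analysis stage, and the patent does not specify such a data structure. The mail mode follows the claimed behavior (proper-noun emphasis) rather than the plain "normal synthesis" alternative.

```python
from enum import Enum, auto


class ReadingMode(Enum):
    MAIL = auto()
    LONG_MAIL_FAST = auto()
    TRAFFIC_INFO = auto()
    ROUTE_GUIDANCE = auto()


# Word categories pronounced loudly and clearly in each mode. The category
# labels are hypothetical tags assumed to come from the text analysis stage.
EMPHASIZED_CATEGORIES = {
    ReadingMode.MAIL: {"proper_noun"},
    ReadingMode.LONG_MAIL_FAST: {"proper_noun"},
    ReadingMode.TRAFFIC_INFO: {"place_name", "interchange_name"},
    ReadingMode.ROUTE_GUIDANCE: {"distance", "direction", "landmark"},
}


def mark_emphasis(tokens: list[tuple[str, str]],
                  mode: ReadingMode) -> list[tuple[str, bool]]:
    """Pair each word with True if it should be read loudly and clearly."""
    targets = EMPHASIZED_CATEGORIES[mode]
    return [(word, category in targets) for word, category in tokens]
```

For example, in route guidance mode the words "300 m" (distance) and "right" (direction) would be flagged, while connecting words would not be.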

Next, the user sets TTS parameters P1 to Pn for the selected speech reading mode (step S2).
TTS parameters P1 to Pn include, for example:
- pitch, voice height, and voice thickness;
- speaking rate, voice volume, and accent strength;
- male/female voice;
- whether special symbols are read aloud; and
- whether vowels are devoiced.
The main control unit 4 then stores the set TTS parameters P1 to Pn in the DRAM 6 and the SRAM 7 in association with reading mode N, and the setting process ends (step S3).
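Steps S1 to S3 amount to keeping one parameter set per reading mode. The following Python sketch shows that idea under stated assumptions. The field names, their relative scales, the `TtsParamStore` class, and the fallback to defaults for unconfigured modes are all hypothetical. An in-memory dictionary stands in for the DRAM 6 / SRAM 7 storage, and persistence to non-volatile memory is not modeled.

```python
from dataclasses import dataclass
from enum import Enum, auto


class ReadingMode(Enum):
    MAIL = auto()
    LONG_MAIL_FAST = auto()
    TRAFFIC_INFO = auto()
    ROUTE_GUIDANCE = auto()


@dataclass(frozen=True)
class TtsParams:
    """One set of TTS parameters P1..Pn (field names and scales are hypothetical)."""
    pitch: float = 1.0              # relative pitch
    speed: float = 1.0              # relative speaking rate
    volume: float = 1.0             # relative loudness
    voice_thickness: float = 1.0
    accent_strength: float = 1.0
    male_voice: bool = False
    read_special_symbols: bool = False
    devoice_vowels: bool = True


class TtsParamStore:
    """Keeps one TtsParams per reading mode, mimicking steps S1 to S3."""

    def __init__(self) -> None:
        self._by_mode: dict[ReadingMode, TtsParams] = {}

    def set_params(self, mode: ReadingMode, params: TtsParams) -> None:
        # S1: the user picked `mode`; S2: the user supplied `params`; S3: store them.
        self._by_mode[mode] = params

    def get_params(self, mode: ReadingMode) -> TtsParams:
        # Unconfigured modes fall back to defaults (an assumption, not from the patent).
        return self._by_mode.get(mode, TtsParams())
```

Making `TtsParams` frozen means that a stored set cannot be altered accidentally after step S3.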

FIG. 4 is a flowchart of the voice output request process.
Suppose the main control unit 4 is executing a mail application, traffic information processing application, or navigation application, and that application issues a voice output request. The main control unit 4 then obtains or sets the text data T to be output and the speech reading mode (step S11). It sets TTS output request information S and outputs it to the voice data generation unit 15 (step S12).
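Steps S11 and S12 can be sketched as building a request object and handing it to the voice data generation unit. The patent does not define the contents of request information S. This sketch assumes, hypothetically, that S carries the text T and the reading mode, and it passes the request through a caller-supplied `send` function.

```python
from dataclasses import dataclass
from enum import Enum, auto
from typing import Callable


class ReadingMode(Enum):
    MAIL = auto()
    LONG_MAIL_FAST = auto()
    TRAFFIC_INFO = auto()
    ROUTE_GUIDANCE = auto()


@dataclass(frozen=True)
class TtsOutputRequest:
    """TTS output request information S (assumed contents: text T plus mode)."""
    text: str
    mode: ReadingMode


def request_voice_output(text: str, mode: ReadingMode,
                         send: Callable[[TtsOutputRequest], None]) -> TtsOutputRequest:
    """Steps S11-S12: take text T and the reading mode, then hand request S on."""
    request = TtsOutputRequest(text=text, mode=mode)  # S11/S12: build S
    send(request)                                      # S12: output S to unit 15
    return request
```

Passing `send` as a callable keeps the requesting side independent of how unit 15 is reached, such as a direct call or a queue.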

FIG. 5 is a flowchart of the TTS voice output process.
When the voice data generation unit 15 obtains TTS output request information S from the main control unit 4 (step S21), it sets TTS parameters P1 to Pn (step S22).
Next, the voice data generation unit 15 sets the text data T (step S23).
Finally, the voice data generation unit 15 converts the text data T into voice data (step S24).
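The receiving side of steps S21 to S24 can be sketched as follows. It is a hypothetical illustration. `VoiceDataGenerator` and its methods are invented names, the request is assumed to carry the text and the mode, and `convert` is a placeholder that reports the inputs rather than producing audio.

```python
from dataclasses import dataclass
from enum import Enum, auto


class ReadingMode(Enum):
    MAIL = auto()
    LONG_MAIL_FAST = auto()
    TRAFFIC_INFO = auto()
    ROUTE_GUIDANCE = auto()


@dataclass(frozen=True)
class TtsParams:
    speed: float = 1.0
    volume: float = 1.0


@dataclass(frozen=True)
class TtsOutputRequest:
    text: str
    mode: ReadingMode


class VoiceDataGenerator:
    """Hypothetical stand-in for the voice data generation unit 15."""

    def __init__(self, params_by_mode: dict[ReadingMode, TtsParams]) -> None:
        self.params_by_mode = params_by_mode
        self.params = TtsParams()
        self.text = ""

    def handle(self, request: TtsOutputRequest) -> dict:
        # S21: request S has been received from the main control unit.
        self.params = self.params_by_mode.get(request.mode, TtsParams())  # S22
        self.text = request.text                                         # S23
        return self.convert()                                            # S24

    def convert(self) -> dict:
        # Placeholder for the real text-to-voice conversion: it returns the
        # inputs a synthesizer would use instead of audio samples.
        return {"text": self.text, "speed": self.params.speed, "volume": self.params.volume}
```

Setting the parameters before the text mirrors the order of steps S22 and S23, so the conversion in S24 always sees the parameters for the requested mode.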

FIG. 6 is a flowchart of the processing in the voice data generation unit.
First, the voice data generation unit 15 analyzes the input text data T and converts it into a sequence of phonetic symbols. To automatically generate prosodic features, it also performs morphological analysis and syntactic analysis (step S31).
Specifically, it divides the input text into words and a sequence of morphemes. To do this, it refers to a dictionary 15H and to a word search table 15G, which defines the kinds of words that can grammatically connect to a given word.
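The patent does not say which segmentation algorithm is used. As a stand-in, the sketch below uses simple greedy longest-match segmentation, constrained by a connection table in the spirit of the word search table 15G. The dictionary entries, part-of-speech labels, and connection rules are hypothetical toy data.

```python
# Hypothetical stand-in for dictionary 15H: surface form -> part of speech.
DICTIONARY = {"東京": "noun", "東": "noun", "京": "noun", "へ": "particle", "行く": "verb"}

# Hypothetical stand-in for word search table 15G: which part of speech may
# follow the previous one (None marks the start of the sentence).
CAN_FOLLOW = {
    None: {"noun", "verb"},
    "noun": {"noun", "particle"},
    "particle": {"noun", "verb"},
    "verb": {"particle"},
}

MAX_WORD_LEN = max(len(word) for word in DICTIONARY)


def segment(text: str) -> list[tuple[str, str]]:
    """Split text into (word, part_of_speech) pairs by greedy longest match,
    accepting a word only if the connection table allows it after the previous word."""
    words: list[tuple[str, str]] = []
    prev_pos = None
    i = 0
    while i < len(text):
        for length in range(min(MAX_WORD_LEN, len(text) - i), 0, -1):
            candidate = text[i:i + length]
            pos = DICTIONARY.get(candidate)
            if pos is not None and pos in CAN_FOLLOW[prev_pos]:
                words.append((candidate, pos))
                prev_pos = pos
                i += length
                break
        else:
            raise ValueError(f"no dictionary word fits at position {i}: {text[i:]!r}")
    return words
```

With this toy data, `segment("東京へ行く")` yields the noun 東京, the particle へ, and the verb 行く. Greedy matching never backtracks, so it can commit to a wrong split. Practical morphological analyzers instead score a lattice of candidate segmentations, for example with the Viterbi algorithm.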

Next, based on the divided words and the morpheme sequence, the following are extracted as speech synthesis information: reading (kana) information, grammatical information, accent information, and word/phrase accent information. This information is output to the synthesis unit sequence conversion unit 15A (step S32).
The synthesis unit sequence conversion unit 15A analyzes the speech synthesis information and converts it into a synthesis unit sequence symbol string, i.e. a string of speech synthesis unit symbols. It then outputs this string to the speech synthesis control parameter generation unit 15B.

The speech synthesis control parameter generation unit 15B refers to the set TTS parameters P1 to Pn and the speech synthesis rule database (DB) 15C. From these it generates, as a time series, the speech synthesis control parameters that control the sound source and the speech synthesis filter (step S34).
These speech synthesis control parameters fall into two groups: parameters that determine the transfer characteristics of the vocal tract, and parameters related to the sound source characteristics (pitch (fundamental frequency), source amplitude, voiced/unvoiced, etc.). These parameters are set in correlation with one another within fixed time frames.
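The patent does not specify how the mode-level TTS parameters P1 to Pn are turned into frame-level control parameters. One plausible sketch, offered as an assumption, is to scale each frame's source parameters. Fundamental frequency is scaled by a pitch factor, source amplitude by a volume factor, and frame duration by the inverse of the speaking rate. The `Frame` type and `apply_tts_params` function are hypothetical, and vocal-tract (filter) parameters are left out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Frame:
    """Hypothetical per-frame sound source control parameters."""
    duration_ms: float
    f0_hz: float          # fundamental frequency (0 for unvoiced frames)
    amplitude: float      # sound source amplitude
    voiced: bool


def apply_tts_params(frames: list[Frame], pitch: float = 1.0,
                     speed: float = 1.0, volume: float = 1.0) -> list[Frame]:
    """Scale per-frame source parameters by mode-level TTS parameters."""
    return [
        Frame(
            duration_ms=f.duration_ms / speed,             # faster speech -> shorter frames
            f0_hz=f.f0_hz * pitch if f.voiced else 0.0,     # unvoiced frames have no pitch
            amplitude=f.amplitude * volume,
            voiced=f.voiced,
        )
        for f in frames
    ]
```

Under this scheme, the long mail high-speed reading mode would use `speed > 1`, which shortens every frame. Emphasized words could be given a locally higher `volume` and a lower `speed`.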

As a result, the sound source unit 15D and the speech synthesis filter 15E synthesize continuous speech based on the set parameters, and the synthesized speech is output from the speaker 15F (step S25).
The synthesized speech output from the speaker 15F at this time follows the set TTS parameters P1 to Pn.
For example, suppose an e-mail is a long mail whose size exceeds a predetermined reference capacity. The reading mode is then set to the long mail high-speed reading mode and the reading speed is increased, so the content of the mail can be grasped in a shorter time.

When route guidance information or traffic information is read aloud, the speech can be made easier to hear. For example, distances or place names (proper names) can be pronounced slowly and loudly.
As described above, in the present embodiment the main control unit 4 automatically sets the speech reading mode according to the content of the text information or the type of its input source. Speech synthesis is then performed using the TTS parameter set for that mode, and the synthesized speech is output. The synthesized speech is therefore easier to hear for each kind of material being read, which improves usability.

In the above description, the main control unit 4 sets the speech reading mode automatically. Alternatively, the user may set the speech reading mode as desired via the input unit 11.
In the above description, the input source of the text information is a mail application, a traffic information processing application, or a navigation application executed on the navigation device. Various other applications may also be used.
Text information may also come from external devices rather than applications, such as an external traffic information processing device or an Internet terminal.
Although a navigation device has been described above, the invention can be applied to any device that can be equipped with a speech synthesizer.

FIG. 1 is a block diagram showing the functional configuration of a navigation device according to an embodiment of the present invention.
FIG. 2 is a block diagram outlining the configuration of the voice data generation unit.
FIG. 3 is a flowchart of the TTS parameter setting process.
FIG. 4 is a flowchart of the voice output request process.
FIG. 5 is a flowchart of the TTS voice output process, showing the operation of the navigation device.
FIG. 6 is a flowchart of the processing in the voice data generation unit.

Explanation of symbols

100 Navigation device
1 Absolute position/orientation detection unit
2 Relative orientation detection unit
3 Vehicle speed detection unit
4 Main control unit (mode setting unit, parameter specifying unit)
5 ROM
6 DRAM (parameter storage unit)
7 SRAM (parameter storage unit)
9 User interface unit
10 Display unit
11 Input unit (mode designation unit)
12 Disk control unit
13 FM multiplexed signal processing unit
14 External recording device control unit
15 Voice data generation unit (speech synthesis output unit, speech synthesis unit)
15A Synthesis unit sequence conversion unit
15B Speech synthesis control parameter generation unit
15C Speech synthesis rule database (DB)
15D Sound source unit
15E Speech synthesis filter
15F Speaker

Claims

A speech synthesizer that reads input text information aloud by performing speech synthesis based on it, comprising:
a mode setting unit that sets a voice reading mode to one of a mail reading mode, a long-text mail high-speed reading mode, a traffic information reading mode, and a route guidance mode, depending on the type of the input source of the text information;
a parameter specifying unit that specifies the combination of speech synthesis control parameters corresponding to the set voice reading mode, such that proper nouns are pronounced loudly and clearly in the mail reading mode or the long-text mail high-speed reading mode, proper names such as place names and interchange names are pronounced clearly in the traffic information reading mode, and distances, directions, and landmarks are pronounced clearly in the route guidance mode; and
a speech synthesis unit that performs speech synthesis based on the specified speech synthesis control parameters and outputs the synthesized speech.

The speech synthesizer according to claim 1, wherein
the input source of the text information is a mail application, a traffic information processing application, or a navigation application, and
the mode setting unit sets the voice reading mode to the mail reading mode or the long-text mail high-speed reading mode when the input source of the text information is the mail application, to the traffic information reading mode when the input source is the traffic information processing application, and to the route guidance mode when the input source is the navigation application.

A method of controlling a speech synthesizer that reads input text information aloud by performing speech synthesis based on it, the method comprising:
a mode setting step of setting a voice reading mode to one of a mail reading mode, a long-text mail high-speed reading mode, a traffic information reading mode, and a route guidance mode, depending on the type of the input source of the text information;
a parameter specifying step of specifying the combination of speech synthesis control parameters corresponding to the set voice reading mode, such that proper nouns are pronounced loudly and clearly in the mail reading mode or the long-text mail high-speed reading mode, proper names such as place names and interchange names are pronounced clearly in the traffic information reading mode, and distances, directions, and landmarks are pronounced clearly in the route guidance mode; and
a speech synthesis step of performing speech synthesis based on the specified speech synthesis control parameters and outputting the result as speech.

The method of controlling a speech synthesizer according to claim 3, wherein
the input source of the text information is a mail application, a traffic information processing application, or a navigation application, and
in the mode setting step, the voice reading mode is set to the mail reading mode or the long-text mail high-speed reading mode when the input source of the text information is the mail application, to the traffic information reading mode when the input source is the traffic information processing application, and to the route guidance mode when the input source is the navigation application.

A control program for controlling, by means of a computer, a speech synthesizer that reads input text information aloud by performing speech synthesis based on it, the control program causing the computer to:
set a voice reading mode to one of a mail reading mode, a long-text mail high-speed reading mode, a traffic information reading mode, and a route guidance mode, depending on the type of the input source of the text information;
specify the combination of speech synthesis control parameters corresponding to the set voice reading mode, such that proper nouns are pronounced loudly and clearly in the mail reading mode or the long-text mail high-speed reading mode, proper names such as place names and interchange names are pronounced clearly in the traffic information reading mode, and distances, directions, and landmarks are pronounced clearly in the route guidance mode; and
perform speech synthesis based on the specified speech synthesis control parameters and output the result as speech.

The control program according to claim 5, wherein
the input source of the text information is a mail application, a traffic information processing application, or a navigation application, and
the voice reading mode is set to the mail reading mode or the long-text mail high-speed reading mode when the input source of the text information is the mail application, to the traffic information reading mode when the input source is the traffic information processing application, and to the route guidance mode when the input source is the navigation application.

A computer-readable recording medium on which the control program according to claim 5 or claim 6 is recorded.
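The mode setting and parameter specification recited in the claims can be illustrated with a minimal Python sketch. It is not the patentee's implementation, and it rests on several assumptions. The text is taken to be already split into (word, category) tokens by some upstream analysis, which the claims do not describe. The enum and function names are hypothetical. The 500-character threshold that separates the mail reading mode from the long-text mail high-speed reading mode is an invented example value, and so are the relative speaking rates. The sketch only marks which words are to be pronounced clearly or loudly. It does not perform the speech synthesis itself.

```python
from dataclasses import dataclass
from enum import Enum, auto


class InputSource(Enum):
    """Input sources named in the claims (hypothetical identifiers)."""
    MAIL_APP = auto()
    TRAFFIC_INFO_APP = auto()
    NAVIGATION_APP = auto()


class ReadingMode(Enum):
    MAIL = auto()
    LONG_MAIL_HIGH_SPEED = auto()
    TRAFFIC_INFO = auto()
    ROUTE_GUIDANCE = auto()


@dataclass(frozen=True)
class ParameterCombination:
    emphasized: frozenset  # token categories to pronounce clearly
    louder: bool           # also raise the volume of emphasized tokens
    rate: float            # relative speaking rate (hypothetical values)


# Hypothetical: the claims do not state how a mail is judged to be "long".
LONG_MAIL_CHARS = 500

# One parameter combination per mode. The emphasized categories follow the
# claims; the rate values are illustrative assumptions.
PARAMETERS = {
    ReadingMode.MAIL: ParameterCombination(frozenset({"proper_noun"}), True, 1.0),
    ReadingMode.LONG_MAIL_HIGH_SPEED: ParameterCombination(
        frozenset({"proper_noun"}), True, 1.5),
    ReadingMode.TRAFFIC_INFO: ParameterCombination(
        frozenset({"place_name", "interchange_name"}), False, 1.0),
    ReadingMode.ROUTE_GUIDANCE: ParameterCombination(
        frozenset({"distance", "direction", "landmark"}), False, 1.0),
}


def set_mode(source: InputSource, text: str) -> ReadingMode:
    """Mode setting: choose the voice reading mode from the input source."""
    if source is InputSource.MAIL_APP:
        if len(text) > LONG_MAIL_CHARS:
            return ReadingMode.LONG_MAIL_HIGH_SPEED
        return ReadingMode.MAIL
    if source is InputSource.TRAFFIC_INFO_APP:
        return ReadingMode.TRAFFIC_INFO
    if source is InputSource.NAVIGATION_APP:
        return ReadingMode.ROUTE_GUIDANCE
    raise ValueError(f"unsupported input source: {source}")


def plan_utterance(tokens, params):
    """Mark each (word, category) token as (word, clear, loud)."""
    plan = []
    for word, category in tokens:
        clear = category in params.emphasized
        plan.append((word, clear, clear and params.louder))
    return plan


if __name__ == "__main__":
    mode = set_mode(InputSource.NAVIGATION_APP, "Turn left in 300 m")
    tokens = [("Turn", "verb"), ("left", "direction"),
              ("in", "other"), ("300 m", "distance")]
    print(plan_utterance(tokens, PARAMETERS[mode]))
    # → [('Turn', False, False), ('left', True, False), ('in', False, False), ('300 m', True, False)]
```

In this sketch each mode selects a whole parameter combination at once, rather than adjusting individual parameters word by word. This mirrors the claims, which tie one combination of speech synthesis control parameters to each voice reading mode.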