JP5625321B2

JP5625321B2 - Speech synthesis apparatus and program

Info

Publication number: JP5625321B2
Application number: JP2009247784A
Authority: JP
Inventors: Masashi Yoshida (吉田 雅史); Yuji Hisaminato (久湊 裕司); Hayato Oshita (大下 隼人); Yasuo Yoshioka (吉岡 靖雄)
Original assignee: Yamaha Corp
Current assignee: Yamaha Corp
Priority date: 2009-10-28
Filing date: 2009-10-28
Publication date: 2014-11-19
Anticipated expiration: 2029-10-28
Also published as: JP2011095397A

Description

The present invention relates to a technique for setting control variables applied to speech synthesis.

Techniques have been proposed for synthesizing voices (typically singing voices) whose musical expression, such as the intensity of articulation and the strength of the breath component, is controlled in various ways. Patent Document 1 discloses a technique that displays two things on a common time axis. The first is an image (a piano-roll image) showing the time series of sounds designated as synthesis targets (hereinafter "designated sounds"). The second is a set of graphs showing the temporal transitions of multiple types of control variables that represent the expression given to each designated sound. The user edits the temporal transition of a control variable by selecting one of the control variables, whose transitions have been set in advance, and manipulating its graph.

Patent Document 1: JP 2008-165130 A

However, under the technique of Patent Document 1, changing the expression given to the synthesized sound requires the user to edit control variables prepared in advance. This places a heavy workload on users who are not familiar with how each control variable relates to the musical expression it affects. When multiple types of control variables are editable, the user must select and edit them one type at a time, so the workload problem becomes particularly serious. In view of these circumstances, an object of the present invention is to reduce the user's workload in setting multiple types of control variables.

To solve the above problems, a speech synthesis apparatus of the present invention comprises the following means.

Section setting means variably sets, in response to a user instruction, an application section within the time series of designated sounds (sounds designated as synthesis targets) indicated by music information.

Variable selection means selects, in response to a user instruction, one piece of variable information from among a plurality of pieces, each of which indicates a time series of control variables applied to speech synthesis.

Variable setting means sets the time series of control variables within the application section according to the variable information selected by the variable selection means.

Speech synthesis means synthesizes the designated sounds indicated by the music information, applying the time series of control variables set by the variable setting means to the designated sounds within the application section.

The music information and the variable information may be stored in separate storage areas of a single storage device (for example, the storage device 12 in FIG. 1) or in storage areas of separate storage devices.

In this configuration, the user selects one of several pieces of variable information indicating temporal transitions of control variables. The selected variable information is used to set the time series of the control variables X applied to speech synthesis of each designated sound in the application section. In the technique of Patent Document 1, the user must change (edit) the time series of the control variables to change the musical expression of the synthesized sound. Compared with that technique, this configuration reduces the work needed to give the synthesized sound a desired expression. A further configuration is also within the scope of the present invention as a preferred aspect: in addition to the variable selection means choosing one piece of variable information, the user may edit the control variables set by the variable setting means, as in Patent Document 1.

In a preferred aspect of the present invention, each piece of variable information indicates a time series for each of multiple types of control variables applied to speech synthesis. Because each piece covers several types of control variables, this aspect can generate synthesized sounds with a wider variety of expressions than a configuration in which the variable information indicates only one type of control variable.

A speech synthesis apparatus according to a preferred aspect of the present invention comprises display control means. This means causes a display device to show, on a common time axis, the time series of sound indicators for the designated sounds in the music information and the time series of control variables set by the variable setting means. Because both series share a time axis, the user can easily check the musical expression that the control variables give to each designated sound.

In a preferred aspect of the present invention, the synthesized sound has several attributes (for example, voice quality, genre, song part, and key), each with a set of options. Each piece of variable information corresponds to a different combination of these options. The variable selection means selects the piece whose combination matches the options the user has indicated for the attributes. Because variable information is selected through attributes familiar to the user, suitable variable information can be applied to speech synthesis even if the user has no detailed knowledge of the musical expression it encodes.
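Selecting variable information by attribute amounts to a lookup keyed by the combination of options the user chose. The following Python sketch assumes three attributes (voice quality, genre, song part). Each piece of variable information is represented as a dictionary that maps a control-variable name to a list of sampled values. The function name `select_variable_info`, the catalog layout, and all sample values are hypothetical and are not taken from the patent.

```python
from typing import Dict, List, Tuple

# Hypothetical stand-in for one piece of variable information DP:
# control-variable name -> sampled time series.
VariableInfo = Dict[str, List[float]]
# One DP per combination of (voice quality, genre, song part).
Catalog = Dict[Tuple[str, str, str], VariableInfo]


def select_variable_info(catalog: Catalog, voice: str, genre: str, part: str) -> VariableInfo:
    """Return the variable information registered for the chosen options."""
    key = (voice, genre, part)
    if key not in catalog:
        raise KeyError(f"no variable information for {key}")
    return catalog[key]


catalog: Catalog = {
    ("female voice", "rock", "chorus"): {"dynamics": [80.0, 100.0, 90.0]},
    ("male voice", "jazz", "intro"): {"dynamics": [50.0, 55.0, 52.0]},
}
```

A dictionary keyed by the full tuple makes each combination map to exactly one piece of variable information. Combinations that were never prepared fail loudly instead of silently falling back to a default.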

In a preferred aspect of the present invention, the section setting means variably sets an editing section within the application section in response to a user instruction. The variable setting means can then set the time series of control variables inside the editing section independently of the time series in the rest of the application section (the non-editing section). In other words, the time series of control variables within the application section can be changed partially. Compared with a configuration in which only the application section is set, this aspect can generate synthesized sounds with a wide variety of expressions that closely reflect the user's intent. A specific example of this aspect is described later as the second embodiment.
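Setting the editing section independently can be pictured as splicing two series that are sampled on the same grid over the application section. In this minimal sketch, `compose_section` and its index-based section bounds are hypothetical. The patent does not specify a sampling scheme.

```python
from typing import List


def compose_section(base: List[float], edited: List[float],
                    edit_start: int, edit_end: int) -> List[float]:
    """Combine two control-variable series over one application section.

    Both series cover the whole application section on the same grid.
    Indices [edit_start, edit_end) form the editing section and take their
    values from `edited`; the remaining (non-editing) samples keep `base`.
    """
    if len(base) != len(edited):
        raise ValueError("series must have the same length")
    if not 0 <= edit_start <= edit_end <= len(base):
        raise ValueError("editing section must lie inside the application section")
    return base[:edit_start] + edited[edit_start:edit_end] + base[edit_end:]
```

For example, `compose_section([1, 1, 1, 1, 1], [9, 9, 9, 9, 9], 1, 3)` keeps the base values outside indices 1 and 2 and returns `[1, 9, 9, 1, 1]`.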

When the time series of control variables is set independently for the editing section and the non-editing section, the control variables may become discontinuous at the boundary between the two. Accordingly, in a preferred aspect of the present invention, the variable setting means interpolates the control variables so that they remain continuous across the edges of the editing section within the application section. Suppressing discontinuous changes at the boundary prevents unnatural (abrupt) changes in the musical expression of the synthesized sound. A specific example of this aspect is described later as the third embodiment.
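The interpolation method itself is left to the third embodiment, which is not part of this section. As one plausible illustration only, the sketch below replaces a small window around a boundary with a straight line between the values at the window's two ends, so the spliced series no longer jumps. `smooth_boundary` and the window `width` are assumptions.

```python
from typing import List


def smooth_boundary(series: List[float], boundary: int, width: int) -> List[float]:
    """Linearly interpolate across one boundary to remove a jump.

    Samples from boundary - width to boundary + width (inclusive) are
    replaced by a straight line joining the values at the two ends of
    that window; all other samples are left unchanged.
    """
    left, right = boundary - width, boundary + width
    if width < 1 or left < 0 or right >= len(series):
        raise ValueError("window must have width >= 1 and lie inside the series")
    out = list(series)
    v0, v1 = series[left], series[right]
    for i in range(left, right + 1):
        out[i] = v0 + (v1 - v0) * (i - left) / (right - left)
    return out
```

Take a jump from 0 to 10 at index 3 with `width=1`. Then `smooth_boundary([0, 0, 0, 10, 10, 10], 3, 1)` returns `[0, 0, 0.0, 5.0, 10.0, 10]`.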

A speech synthesis apparatus according to a preferred aspect of the present invention comprises adjustment value setting means, which variably sets an effect adjustment value in response to a user instruction. The variable setting means sets the time series of control variables within the application section so that the variable information is reflected in the synthesis of the designated sounds there to a degree corresponding to the effect adjustment value. Because the user controls how strongly the variable information is reflected in speech synthesis, this aspect can generate synthesized sounds with a variety of expressions that reflect the user's musical intent.
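This section does not say how the effect adjustment value enters the computation. One simple reading, assumed here, is a linear blend between a neutral value of the control variable and the curve from the variable information. A value of 0 disables the effect, and a value of 1 applies the curve in full. `apply_effect`, the neutral value, and the [0, 1] range are all hypothetical.

```python
from typing import List


def apply_effect(curve: List[float], neutral: float, amount: float) -> List[float]:
    """Scale how far a variable-information curve departs from a neutral value.

    amount = 0 leaves every sample at the neutral value, amount = 1
    reproduces the curve unchanged, and values in between blend linearly.
    """
    if not 0.0 <= amount <= 1.0:
        raise ValueError("amount must be in [0, 1]")
    return [neutral + amount * (v - neutral) for v in curve]
```

For example, `apply_effect([64, 100, 32], 64, 0.5)` halves each deviation from 64 and returns `[64.0, 82.0, 48.0]`.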

A speech synthesis apparatus according to a preferred aspect of the present invention comprises adjustment value setting means that sets effect adjustment values in response to user instructions, separately for the editing section of the application section and for the rest of the application section. The variable setting means sets the time series of control variables within the application section so that two conditions hold. First, the variable information is reflected in the synthesis of the designated sounds in the editing section to a degree corresponding to that section's effect adjustment value. Second, it is reflected in the synthesis of the designated sounds in the rest of the application section to a degree corresponding to that section's effect adjustment value. The time series of control variables is therefore set independently for the editing and non-editing sections according to their respective effect adjustment values. As a result, this aspect can generate synthesized sounds with a variety of expressions that match the user's intent.

When effect adjustment values are set for both the editing section and the non-editing section, an extreme difference between the two values may make the musical expression of the synthesized sound unnatural. Accordingly, in a preferred aspect of the present invention, the adjustment value setting means changes both values in response to a user instruction while keeping the ratio between the editing section's value and the value for the rest of the application section fixed. Because this ratio is maintained, the musical expression of the synthesized sound is kept from changing unnaturally across the edges of the editing section. A specific example of this aspect is described later as the fourth embodiment.
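Keeping the ratio fixed means both effect adjustment values are multiplied by the same factor. The minimal sketch below assumes that the user sets a new value for the editing section and the value for the rest of the application section follows. `rescale_pair` is a hypothetical helper.

```python
from typing import Tuple


def rescale_pair(edit_value: float, rest_value: float,
                 new_edit_value: float) -> Tuple[float, float]:
    """Set a new effect adjustment value for the editing section and move the
    value for the rest of the application section by the same factor, so that
    rest_value / edit_value is unchanged."""
    if edit_value == 0:
        raise ValueError("ratio is undefined when the editing-section value is 0")
    factor = new_edit_value / edit_value
    return new_edit_value, rest_value * factor
```

For example, `rescale_pair(0.5, 0.25, 1.0)` doubles both values and returns `(1.0, 0.5)`, so the 2:1 ratio between the sections survives the change.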

The apparatus according to each of the above aspects may be realized by hardware (an electronic circuit) dedicated to processing music information, such as a DSP (Digital Signal Processor). It may also be realized by a general-purpose processing unit, such as a CPU (Central Processing Unit), working together with a program.

A program of the present invention causes a computer to function as the following means:
- section setting means that variably sets, in response to a user instruction, an application section within the time series of designated sounds (sounds designated as synthesis targets) indicated by music information;
- variable selection means that selects, in response to a user instruction, one piece of variable information from among a plurality of pieces, each indicating a time series of control variables applied to speech synthesis;
- variable setting means that sets the time series of control variables within the application section according to the variable information selected by the variable selection means; and
- speech synthesis means that synthesizes the designated sounds indicated by the music information, applying the time series of control variables set by the variable setting means to the designated sounds within the application section.

This program achieves the same operation and effects as the speech synthesis apparatus of the present invention. The program may be provided to users on a computer-readable recording medium and installed on a computer, or distributed from a server device over a communication network and installed on a computer.

FIG. 1 is a block diagram of the speech synthesis apparatus according to the first embodiment.
FIG. 2 is a schematic diagram of an edit image.
FIG. 3 is a schematic diagram of the edit image after an application section has been specified.
FIG. 4 is a schematic diagram of a management image.
FIG. 5 is a schematic diagram of an operation image.
FIG. 6 is a schematic diagram of the operation image after section conditions and an effect adjustment value have been specified.
FIG. 7 is a schematic diagram of the edit image with a variable transition image displayed.
FIG. 8 is a schematic diagram of an edit image in the second embodiment.
FIG. 9 is a schematic diagram of a management image in the second embodiment.
FIG. 10 is a schematic diagram of a management image in the second embodiment.
FIG. 11 is a schematic diagram explaining the interpolation of control variables by the section setting unit in the third embodiment.
FIG. 12 is a schematic diagram of a management image in the fourth embodiment.

<A: First Embodiment>
FIG. 1 is a block diagram of a speech synthesis apparatus 100 according to the first embodiment of the present invention. The speech synthesis apparatus 100 synthesizes various voices such as singing voices (hereinafter "synthesized sounds"). As shown in FIG. 1, it is realized by a computer system comprising a control device 10, a storage device 12, an input device 14, a display device 16, and a sound emitting device 18. The following description assumes that the speech synthesis apparatus 100 is used to synthesize the singing voice of a song.

The control device (CPU) 10 executes a program PG stored in the storage device 12. This implements the functions needed to generate an audio signal SOUT: a display control unit 22, an information generation unit 24, a section setting unit 26, a variable processing unit 32, and a speech synthesis unit 34. The audio signal SOUT represents the waveform of the synthesized sound. The functions of the control device 10 may instead be realized by a dedicated electronic circuit (DSP), or distributed over multiple integrated circuits.

The input device 14 (for example, a mouse or keyboard) receives instructions from the user. The display device 16 (for example, a liquid crystal display) displays images as instructed by the control device 10. The sound emitting device 18 (for example, speakers or headphones) emits sound waves corresponding to the audio signal SOUT generated by the control device 10.

The storage device 12 stores the program PG executed by the control device 10 and the data used by the control device 10: phoneme information DV, music information DS, and variable information DP. Any known recording medium, such as a semiconductor or magnetic recording medium, or a combination of several types of recording media, may serve as the storage device 12. The program PG and the data (DV, DS, DP) may also be distributed across multiple recording media.

The phoneme information DV is a data group used as material for synthesized sounds. It contains many pieces of segment data, each corresponding to a different speech segment (for example, data indicating the time waveform or feature quantities of the segment). A speech segment is either a single phoneme, which is the smallest audibly distinguishable unit of speech, or a phoneme chain made of several concatenated phonemes.

The music information DS is information (score data) indicating the time series of the designated sounds that make up a song. For each designated sound in the song, the music information DS specifies three items: its pitch (note number), its sounding period (for example, onset time and duration), and its sounding characters (for example, the syllables or phonemes corresponding to the lyrics).
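The fields listed for the music information DS map naturally onto a small record type. The sketch below uses an assumed layout: seconds for timing, MIDI-style note numbers for pitch, and one syllable per note. It also includes a helper, `sounds_in`, that finds the designated sounds overlapping a time span such as an application section. None of these names come from the patent.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class DesignatedSound:
    """One entry of hypothetical score data: pitch, sounding period, lyric."""
    note_number: int  # pitch as a MIDI-style note number (assumption)
    onset: float      # start of the sounding period, in seconds
    duration: float   # length of the sounding period, in seconds
    lyric: str        # sounding characters, e.g. one syllable

    @property
    def end(self) -> float:
        return self.onset + self.duration


def sounds_in(song: List[DesignatedSound], start: float, end: float) -> List[DesignatedSound]:
    """Return the designated sounds whose sounding period overlaps [start, end)."""
    return [s for s in song if s.onset < end and s.end > start]
```

For example, with notes "sa" (0.0 to 0.5 s), "ku" (0.5 to 1.0 s), and "ra" (1.0 to 2.0 s), the span from 0.6 s to 1.2 s picks out "ku" and "ra".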

The variable information DP is a data group indicating temporal changes in multiple types of control variables (control parameters) X applied to speech synthesis. A control variable X controls the musical expression given to the synthesized sound. Any known variable used in speech synthesis may serve as a control variable X. Examples include:
- the attack strength of the designated sound (velocity);
- volume (dynamics);
- strength of the breath component (breathiness);
- clarity (brightness, clearness);
- mouth opening during sounding (opening);
- the speaker's gender (gender factor);
- the point at which the pitch starts to glide continuously, i.e. portamento (portamento timing);
- fine pitch variation (pitch bend);
- the maximum range of fine pitch variation (pitch bend sensitivity).

As shown in FIG. 1, the storage device 12 stores multiple pieces of variable information DP. Each piece contains multiple items of variable transition data V, one for each type of control variable X (X1, X2, ...). The variable transition data V for a control variable X is a data string indicating the time series (temporal transition) of that variable over a predetermined time. The way a given control variable changes may differ from one piece of variable information DP to another. The types of control variables X covered by the variable transition data V are basically the same in every piece of variable information DP, but a configuration in which the types differ between pieces may also be adopted.
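The nesting described here can be modelled as a mapping from control-variable name to sampled values: there are several pieces of variable information DP, and each holds one item of variable transition data V per control variable X. In this hypothetical sketch, `shared_variables` returns the control-variable types common to every DP. This covers both the usual case, where the types are identical, and the variant where they differ.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class VariableInfo:
    """Hypothetical layout of one piece of variable information DP."""
    name: str
    # One variable transition data V per control variable X, keyed by the
    # control variable's name and sampled at a fixed interval (assumption).
    transitions: Dict[str, List[float]] = field(default_factory=dict)


def shared_variables(infos: List[VariableInfo]) -> Set[str]:
    """Return the control variables present in every piece of variable information."""
    if not infos:
        return set()
    common = set(infos[0].transitions)
    for info in infos[1:]:
        common &= set(info.transitions)
    return common
```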

The display control unit 22 in FIG. 1 causes the display device 16 to show the images the user works with: an edit image 60, a management image 70, and an operation image 80. These are used to generate and edit the music information DS and to edit the musical expression given to the synthesized sound. FIG. 2 is a schematic diagram of the edit image 60, which is used to create and edit the music information DS. As shown in FIG. 2, the edit image 60 includes a score area 62 that displays the time series of designated sounds and a variable area 64 that displays how the control variables X change over time.

The score area 62 is a piano-roll image area with a vertical axis (pitch axis) for pitch and a horizontal axis (time axis) for time. While viewing the score area 62, the user operates the input device 14 to specify the pitch and sounding period (start and end points) of each designated sound. The display control unit 22 places a sound indicator 622 for each specified sound in the score area 62. The indicator's position along the pitch axis follows the pitch the user specified, and its position and length along the time axis follow the specified sounding period. The user also operates the input device 14 to specify the sounding characters (lyrics) of each designated sound. Alternatively, the score area 62 may display the designated sounds as notes on a five-line staff.
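The placement rule for a sound indicator 622 reduces to two linear mappings: pitch maps to vertical position, and time maps to horizontal position and length. The pixel scales and the top note in this sketch are arbitrary assumptions, and screen coordinates are taken to grow downward.

```python
from typing import Tuple


def indicator_rect(note_number: int, onset: float, duration: float,
                   px_per_second: float = 100.0, px_per_semitone: float = 8.0,
                   top_note: int = 84) -> Tuple[float, float, float, float]:
    """Return (x, y, width, height) of a sound indicator in a piano-roll area.

    Time runs left to right and pitch runs bottom to top, so higher notes get
    smaller y values because screen coordinates grow downward.
    """
    x = onset * px_per_second
    width = duration * px_per_second
    y = (top_note - note_number) * px_per_semitone
    return x, y, width, px_per_semitone
```

For example, `indicator_rect(60, 1.5, 0.5)` places middle C starting at 1.5 s as the rectangle `(150.0, 192.0, 50.0, 8.0)`.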

For each designated sound the user specifies in the score area 62, the information generation unit 24 in FIG. 1 stores the pitch, sounding period, and sounding characters together in the music information DS in the storage device 12. Repeating this process builds, in the storage device 12, music information DS that indicates the time series of designated sounds specified by the user. The time series of sound indicators 622 is displayed in the score area 62, as illustrated in FIG. 2.

The section setting unit 26 in FIG. 1 variably sets an application section SA in response to user instructions given through the input device 14. The application section SA is the part of the time series of designated sounds (the song) in the music information DS to which variable information DP is applied. For example, the section setting unit 26 treats the span from a start point to an end point that the user specifies in the score area 62 as the application section SA. In response to further user instructions, the section setting unit 26 successively sets multiple application sections SA within the song that do not overlap in time. The start and end points of each application section SA can be changed at any time through the input device 14.
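Because application sections may not overlap in time, adding or moving one requires an overlap check. This sketch assumes half-open `(start, end)` intervals in seconds, so sections that merely touch are allowed. `add_section` is a hypothetical helper.

```python
from typing import List, Tuple

Section = Tuple[float, float]  # (start, end) in seconds, end exclusive


def add_section(sections: List[Section], start: float, end: float) -> List[Section]:
    """Add an application section, rejecting it if it overlaps an existing one.

    Returns a new list sorted by start time; the input list is not modified.
    """
    if start >= end:
        raise ValueError("start must precede end")
    for s, e in sections:
        if start < e and s < end:  # half-open intervals intersect
            raise ValueError(f"overlaps existing section ({s}, {e})")
    return sorted(sections + [(start, end)])
```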

As shown in FIG. 3, the display control unit 22 places a section indicator 624 (the shaded portion) in the edit image 60 to show the application section SA set by the section setting unit 26. The section indicator 624 is, for example, a band-shaped image in the score area 62 that extends along the time axis from the start point to the end point of the application section SA. Identification information for the application section SA specified by the user (for example, the name "Part A" in FIG. 3) is attached to the section indicator 624.

The display control unit 22 also causes the display device 16 to show the management image 70 of FIG. 4, which is used to manage the application sections SA. As shown in FIG. 4, the management image 70 is a table containing one record 72 per application section SA. Each record 72 holds the identification information (name) and the duration of its application section SA. The duration is defined by the section's start and end points, and any method of specifying them may be used. Besides giving start and end times as in FIG. 4, the start and end points may be given as bar or beat numbers within the song. The user can also specify an application section SA directly by entering its duration and identification information in the management image 70. The section setting unit 26 then sets an application section SA matching the duration entered.
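Specifying a duration by bar or beat numbers requires converting musical positions to time. The sketch below makes a simplifying assumption that tempo and time signature are constant, although real songs may change either. The function name and the 1-based numbering are hypothetical.

```python
def bar_beat_to_seconds(bar: int, beat: int, tempo_bpm: float,
                        beats_per_bar: int = 4) -> float:
    """Convert a 1-based (bar, beat) position to seconds from the start of the song.

    Assumes a constant tempo and time signature throughout the song.
    """
    if bar < 1 or not 1 <= beat <= beats_per_bar:
        raise ValueError("bar and beat are 1-based and beat must fit in the bar")
    if tempo_bpm <= 0:
        raise ValueError("tempo must be positive")
    beats_elapsed = (bar - 1) * beats_per_bar + (beat - 1)
    return beats_elapsed * 60.0 / tempo_bpm
```

At 120 BPM in 4/4 time, bar 3 beat 1 lies eight beats in, so `bar_beat_to_seconds(3, 1, 120.0)` returns `4.0`.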

The variable processing unit 32 in FIG. 1 takes the variable information DP (its variable transition data V) in the storage device 12 and generates from it multiple items of variable transition data W, one for each control variable X (X1, X2, ...). The variable transition data W for a control variable X is a data string indicating the time series (temporal transition) of that variable to be applied to speech synthesis in the application section SA. The specific configuration and operation of the variable processing unit 32 are described later.
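How W is derived from V is described later in the patent and is not reproduced here. Suppose, as an assumption, that V spans a fixed length and must be fitted to an application section of arbitrary length. A linear time-stretch is then one simple option. `stretch` below resamples a series to `n` points by linear interpolation.

```python
from typing import List


def stretch(values: List[float], n: int) -> List[float]:
    """Resample a control-variable series to n points by linear interpolation.

    The first and last input samples land on the first and last output
    points; interior points are interpolated between neighbouring samples.
    """
    if n < 2 or len(values) < 2:
        raise ValueError("need at least two input and two output samples")
    out = []
    for i in range(n):
        pos = i * (len(values) - 1) / (n - 1)  # position in input-index units
        j = min(int(pos), len(values) - 2)      # left neighbour, clamped at the end
        frac = pos - j
        out.append(values[j] + (values[j + 1] - values[j]) * frac)
    return out
```

For example, `stretch([0, 10, 0], 5)` returns `[0.0, 5.0, 10.0, 5.0, 0.0]`.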

The speech synthesis unit 34 synthesizes the designated sounds indicated by the music information DS stored in the storage device 12 and generates the audio signal SOUT. Specifically, for each designated sound in the music information DS, it takes from the phoneme information DV the segment data matching the sound's sounding characters (speech segments). It then adjusts that data to the pitch and sounding period given in the music information DS and concatenates the results to form the audio signal SOUT. The variable transition data W generated by the variable processing unit 32 are applied to the synthesis of each designated sound within the application sections SA set by the section setting unit 26. As a result, the audio signal SOUT represents designated sounds that carry a musical expression following the variable transition data W of each control variable X. Designated sounds outside the application sections SA may be handled in either of two ways: they receive no musical expression, or each control variable X is held at a predetermined (initial) value. Any known technique may be used for speech synthesis according to the music information DS and the control variables X.
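Consider the variant in which every control variable outside the application sections is held at a predetermined initial value. Under that variant, building a song-wide control track is a fill-then-overwrite operation. The per-frame representation and the name `control_track` are assumptions made for illustration.

```python
from typing import List, Tuple


def control_track(n_frames: int, default: float,
                  sections: List[Tuple[int, List[float]]]) -> List[float]:
    """Build a per-frame track for one control variable over a whole song.

    Each section is (first_frame, W), where W holds one value per frame of
    that application section. Frames outside every application section keep
    the fixed default (initial) value.
    """
    track = [default] * n_frames
    for first, w in sections:
        if first < 0 or first + len(w) > n_frames:
            raise ValueError("application section exceeds the song")
        track[first:first + len(w)] = w
    return track
```

For example, `control_track(6, 64.0, [(1, [70.0, 80.0]), (4, [50.0])])` returns `[64.0, 70.0, 80.0, 64.0, 50.0, 64.0]`.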

As shown in FIG. 1, the variable processing unit 32 includes a variable selection unit 42, an adjustment value setting unit 44, and a variable setting unit 46. For each application section SA, the variable selection unit 42 selects the variable information DP that corresponds to an instruction from the user, choosing among the pieces of variable information DP stored in the storage device 12. Specifically, the user specifies a condition for the application section SA through the input device 14 (hereinafter, the "section condition"), and the variable selection unit 42 retrieves the corresponding variable information DP from the storage device 12. The adjustment value setting unit 44 of FIG. 1 sets an effect adjustment value A in accordance with the user's instruction on the input device 14. This value indicates how strongly the variable information DP selected by the variable selection unit 42 is reflected in the speech synthesis.

The display control unit 22 causes the display device 16 to display the operation image 80 of FIG. 5, which is used to specify the section condition and the effect adjustment value A. As shown in FIG. 5, the operation image 80 includes a condition instruction area 82 for specifying the section condition and an adjustment value instruction area 84 for specifying the effect adjustment value A.

The section condition is defined by a plurality of attributes of the application section SA (voice quality, genre, and song part). The condition instruction area 82 is an image that lists, for each attribute defining the section condition, the options (candidates) from which the user can choose. Specifically, as illustrated in FIG. 5, it presents options for the voice quality of the application section SA (male voice, female voice, robot voice). It also presents options for the genre (rock, pop, jazz, ...) and for the song part (intro, verse A, verse B, chorus, ...). As these examples show, the song part corresponds to the structural position of the application section SA within the piece of music.

By operating the input device 14, the user selects one of the options in the condition instruction area 82 for each attribute of the application section SA. FIG. 5 illustrates a case in which the user has selected "female voice" for voice quality, "bossa nova" for genre, and "verse A" for song part. The combination of options selected for the attributes is passed to the variable selection unit 42 as the section condition.

The storage device 12 stores variable information DP for each section condition that the user can specify (that is, for each combination of options selectable for the attributes). Each piece of variable transition data V represents a time series of the control variable X, and that time series expresses a temporal transition of musical expression. For a given section condition, each piece of variable transition data V in the corresponding variable information DP is created so that this transition musically suits a melody satisfying the condition (voice quality, genre, song part). For example, consider the section condition specified in FIG. 5 (female voice, bossa nova, verse A). Each piece of variable transition data V in its variable information DP is created so that the transition of expression suits a female voice singing the verse-A melody of a bossa nova piece. The variable selection unit 42 of FIG. 1 selectively retrieves from the storage device 12 the stored variable information DP that matches or approximates the section condition specified by the user.
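The text says only that the variable selection unit 42 retrieves the variable information DP that "matches or approximates" the section condition. The Python sketch below illustrates one possible reading. An exact match is preferred; otherwise, the stored entry sharing the most attributes with the requested condition is used. The names `Condition`, `VariableInfo`, and `select_variable_info` are hypothetical, as is the attribute-count similarity rule; none of them come from the source.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

# Hypothetical section condition: (voice quality, genre, song part).
Condition = Tuple[str, str, str]


@dataclass
class VariableInfo:
    """Hypothetical stand-in for variable information DP:
    one time series (variable transition data V) per control variable name."""
    series: Dict[str, List[float]]


def select_variable_info(store: Dict[Condition, VariableInfo],
                         condition: Condition) -> VariableInfo:
    """Return the entry stored for `condition` if it exists; otherwise the
    entry whose condition shares the most attributes with it.
    Raises ValueError if `store` is empty."""
    if condition in store:
        return store[condition]

    def score(key: Condition) -> int:
        # Hypothetical similarity rule: number of matching attributes.
        return sum(a == b for a, b in zip(key, condition))

    best = max(store, key=score)  # ties go to the first-inserted entry
    return store[best]
```

For example, a request for (female voice, jazz, verse A) with no exact entry would fall back to a stored (female voice, bossa nova, verse A) entry, because that entry shares two of the three attributes.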

Meanwhile, a slider-type operator image 842 that moves in response to operation of the input device 14 is placed in the adjustment value instruction area 84 of FIG. 5. The adjustment value setting unit 44 sets the effect adjustment value A according to where the user has moved the operator image 842. For example, it sets A to the minimum (for example, 0%) when the operator image 842 is at the lower end of its movable range, and to the maximum (for example, 100%) when it is at the upper end. When A is at the minimum (0%), the circle labeled "OFF" in FIG. 5 lights up, indicating that no musical expression is given to the synthesized sound in the application section SA. When A exceeds the minimum, the circle labeled "ON" lights up, indicating that a musical expression is given.
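The source gives only the two endpoints of the slider mapping: the lower end maps to the minimum and the upper end to the maximum. The sketch below assumes a linear mapping between them and clamps positions outside the movable range. `slider_to_effect_value` and `expression_enabled` are hypothetical helpers.

```python
def slider_to_effect_value(position: float, lower: float, upper: float,
                           a_min: float = 0.0, a_max: float = 100.0) -> float:
    """Map a slider position linearly onto [a_min, a_max] percent.

    The linear mapping is an assumption; positions outside
    [lower, upper] are clamped to the nearest end.
    """
    if upper <= lower:
        raise ValueError("upper must exceed lower")
    t = (position - lower) / (upper - lower)
    t = min(max(t, 0.0), 1.0)
    return a_min + t * (a_max - a_min)


def expression_enabled(a: float, a_min: float = 0.0) -> bool:
    """'ON' indicator when A exceeds the minimum, 'OFF' otherwise."""
    return a > a_min
```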

As shown in FIG. 6, the section condition specified in the condition instruction area 82 and the effect adjustment value A specified in the adjustment value instruction area 84 are also reflected in the management image 70. The user may also enter the section condition and the effect adjustment value A directly into the management image 70 by operating the input device 14. In that case, the variable selection unit 42 selects the variable information DP according to the section condition entered in the management image 70, and the adjustment value setting unit 44 sets the effect adjustment value A according to the value entered there.

The variable setting unit 46 of FIG. 1 generates the variable transition data W of each control variable X (X1, X2, ...) applied to speech synthesis in the application section SA. It does so according to the variable information DP selected by the variable selection unit 42 and the effect adjustment value A set by the adjustment value setting unit 44. Specifically, the variable setting unit 46 executes the first process and the second process described below.

Each piece of variable transition data V stored in the storage device 12 specifies a time series of the control variable X over a fixed duration chosen in advance, independently of the application section SA. The first process stretches or compresses each piece of variable transition data V so that it matches the time length of the application section SA set by the section setting unit 26. For example, the first process may resample the time series by interpolation (or decimation). Alternatively, it may concatenate copies of the variable transition data V along the time axis so that the time series repeats.
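The sketch below illustrates the two first-process variants named above. It assumes that each piece of variable transition data V is a list of control-variable samples at a fixed frame rate, and that the length of SA has already been converted to a frame count `n`. Linear interpolation is one concrete choice of "interpolation". The function names are hypothetical.

```python
from typing import List


def stretch(v: List[float], n: int) -> List[float]:
    """Resample v to n frames by linear interpolation.
    When n < len(v) this thins the series out (decimation)."""
    if n <= 0 or not v:
        return []
    if len(v) == 1 or n == 1:
        return [v[0]] * n
    out = []
    for i in range(n):
        pos = i * (len(v) - 1) / (n - 1)  # position in source frames
        j = int(pos)
        frac = pos - j
        nxt = v[min(j + 1, len(v) - 1)]
        out.append(v[j] + (nxt - v[j]) * frac)
    return out


def tile(v: List[float], n: int) -> List[float]:
    """Repeat v end to end along the time axis and cut it to n frames."""
    if not v:
        return []
    return [v[i % len(v)] for i in range(n)]
```

Stretching preserves the overall shape of the curve but changes its tempo. Tiling preserves the local rate of change but repeats the pattern.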

The second process generates each piece of variable transition data W by adjusting the corresponding variable transition data V, as output by the first process, according to the effect adjustment value A. Specifically, the variable setting unit 46 generates W so that the similarity between the time series of the control variable X in W and its time series in the processed V varies with A. This similarity is the degree to which V is reflected in W. For example, the closer A is to the maximum (100%), the closer the control variable X in W comes to its values in V. The closer A is to the minimum (0%), the closer the control variable X in W comes to a predetermined value independent of V (for example, zero). The second process is described here as following the first, but the first process may instead be executed after the second.
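The source specifies the second process only by its limits: W approaches V as A approaches 100%, and approaches a predetermined value as A approaches 0%. The sketch below assumes the simplest rule with those limits, a linear blend W = c + (A/100)(V − c) toward a neutral value c. By default c is zero, matching the example above. Both the linear rule and the name `apply_effect` are assumptions.

```python
from typing import List


def apply_effect(v: List[float], a_percent: float,
                 neutral: float = 0.0) -> List[float]:
    """Blend each sample of v toward `neutral` according to A (0-100 %).

    A = 100 returns v unchanged, A = 0 returns the neutral value everywhere.
    The linear blend is an assumed rule, not one stated in the source.
    """
    w = min(max(a_percent, 0.0), 100.0) / 100.0
    return [neutral + w * (x - neutral) for x in v]
```

The blend is an affine map applied sample by sample. Applying it before or after linear-interpolation resampling therefore gives the same result, which is consistent with the statement that the two processes may run in either order.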

The speech synthesis unit 34 generates the audio signal SOUT by applying the time series of the control variable X in each piece of variable transition data W to the synthesis of each designated sound within the application section SA. These data are the ones generated by the variable processing unit 32 (variable setting unit 46) through the procedure above. Meanwhile, as shown in FIG. 7, the display control unit 22 places a variable transition image 642 in the variable area 64 of the edit image 60 for each control variable X. Each variable transition image 642 shows the time series of the control variable X indicated by the corresponding variable transition data W. Specifically, it is a graph of the transition of the control variable X (for example, a line graph). The display control unit 22 displays the variable transition image 642 so that it shares a common time axis with the note indicators 622 (designated sounds) in the score area 62. That is, the value of the control variable X at each point on the time axis of the variable transition image 642 is applied to the synthesis of the designated sound present at that point in the score area 62.
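The shared time axis means that a control value at time t governs the designated sound sounding at t. The sketch below rests on three assumptions. W is sampled at a fixed frame rate starting at the beginning of SA. Outside SA the variable is held at a fixed initial value, one of the options mentioned for sections outside SA. As a simplification, each note takes the value at its onset. `Note`, `value_at`, and `values_for_notes` are hypothetical names.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Note:
    """Hypothetical designated sound: onset time and pitch."""
    onset: float  # seconds
    pitch: int    # MIDI note number (assumed representation)


def value_at(w: List[float], sa_start: float, frame_rate: float,
             t: float, initial: float) -> float:
    """Value of one control variable X at time t on the shared time axis.

    Inside SA the frame of W covering t is returned; outside SA the
    control variable falls back to a fixed initial value.
    """
    if t < sa_start:
        return initial
    idx = int((t - sa_start) * frame_rate)  # frame index within SA
    return w[idx] if idx < len(w) else initial


def values_for_notes(notes: List[Note], w: List[float], sa_start: float,
                     frame_rate: float, initial: float) -> List[float]:
    """Control value applied to each note, sampled at its onset."""
    return [value_at(w, sa_start, frame_rate, n.onset, initial) for n in notes]
```

A real synthesizer would more likely apply W frame by frame across each note's duration. Sampling at the onset only keeps the example short.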

As shown in FIG. 7, each control variable X has its variable transition image 642 drawn in a different style (display color, line width, or line type). For example, FIG. 7 shows the variable transition image 642 of control variable X1 as a solid line and that of control variable X2 as a broken line. Alternatively, only the variable transition images 642 of one or more control variables X selected by the user may be placed in the variable area 64. It is also preferable that the user be able to edit a variable transition image 642 in the variable area 64 through the input device 14. When such an edit is entered, the variable setting unit 46 updates the corresponding variable transition data W (time series of the control variable X) accordingly.

As described above, the first embodiment stores a plurality of pieces of variable information DP, each indicating temporal transitions of the control variables X. The piece selected according to the user's instruction (section condition) is used to set the time series of the control variables X applied to speech synthesis of the designated sounds in the application section SA (that is, to generate the variable transition data W). The musical expression of the synthesized sound can therefore be changed simply through the selection made by the variable selection unit 42. In the technique of Patent Document 1, by contrast, the user must edit the control variables to change the musical expression of the synthesized sound. The present embodiment therefore reduces the user's workload in giving the synthesized sound a desired expression. Furthermore, the variable information DP in the above example specifies the temporal transitions of a plurality of control variables X. This allows a wider variety of expressions than a configuration in which the variable information DP specifies only one control variable X.

In addition, the user's instruction (effect adjustment value A) controls how strongly the variable information DP is reflected in the speech synthesis. Even though the variable information DP itself is prepared in advance, this yields a wider variety of synthesized sounds reflecting the user's musical intent than a configuration in which this degree is fixed.

<B: Second Embodiment>
Next, a second embodiment of the present invention will be described. In each of the following examples, elements whose operation or function is equivalent to that in the first embodiment are denoted by the same reference numerals, and their detailed descriptions are omitted as appropriate.

FIG. 8 is a schematic diagram of the edit image 60 in the second embodiment. The section setting unit 26 of the second embodiment sets an editing section SB within the application section SA according to the user's instruction on the input device 14. In response to the user's instructions, the section setting unit 26 can successively define, within each application section SA, a plurality of editing sections SB that do not overlap in time. The editing section SB is set by the same method as the application section SA.
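The sketch below illustrates the constraint that editing sections lie within the application section and do not overlap in time. It assumes sections are half-open intervals (start, end) in seconds, so sections that merely touch are allowed. `add_editing_section` is a hypothetical name; the source does not describe how such a check is performed.

```python
from typing import List, Tuple

Interval = Tuple[float, float]  # (start, end) in seconds, end exclusive


def add_editing_section(sa: Interval, existing: List[Interval],
                        new: Interval) -> List[Interval]:
    """Accept `new` only if it lies inside SA and overlaps no existing SB.

    Returns the updated, time-sorted list; raises ValueError otherwise.
    """
    start, end = new
    if not (sa[0] <= start < end <= sa[1]):
        raise ValueError("editing section must lie inside the application section")
    for s, e in existing:
        if start < e and s < end:  # half-open intervals intersect
            raise ValueError("editing sections must not overlap in time")
    return sorted(existing + [new])
```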

As shown in FIG. 8, the display control unit 22 places in the edit image 60 a section indicator 626 marking the editing section SB set by the section setting unit 26. Next to it appears the identification information the user specified for that editing section SB (the name "Phrase A" in FIG. 8). As shown in FIG. 9, the display control unit 22 also places in the management image 70 a record 72 containing the identification information and duration of the editing section SB, separate from the record 72 of the application section SA. The parts of the application section SA other than the editing section SB are hereinafter called the "non-editing section". They are processed in the same way as the application section SA in the first embodiment, so their description is omitted below as appropriate.

By selecting a desired editing section SB and operating the operation image 80, the user can set the section condition and effect adjustment value A of the editing section SB independently of those of the non-editing section. The variable selection unit 42 selects the variable information DP corresponding to the section condition specified for the editing section SB. The adjustment value setting unit 44 passes the effect adjustment value A specified for the editing section SB to the variable setting unit 46. The display control unit 22 reflects the section condition and effect adjustment value A specified by the user in the record 72 of the editing section SB in the management image 70. For example, FIG. 10 assumes the following change, shown by shading. The effect adjustment value A of the editing section SB has been raised to 90% from the 80% set in FIG. 9 for the application section SA that contains it.

The variable setting unit 46 sets the variable transition data W of the editing section SB and that of the non-editing section independently. It keeps the variable transition data W already generated for the non-editing section of the application section SA. It then generates the variable transition data W of the editing section SB from two inputs: the variable information DP that the variable selection unit 42 selected for the editing section SB, and the effect adjustment value A that the adjustment value setting unit 44 set for it. This generation may use, for example, the same method as for the application section SA in the first embodiment.
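The sketch below shows how the non-editing section's W can be kept while only the editing section's frames are replaced. It assumes the whole of SA is one frame list, and that the editing section's own W (`w_sb`, produced by the same steps as for SA) starts at frame `sb_start`. The helper name is hypothetical.

```python
from typing import List


def splice_editing_section(w_sa: List[float], sb_start: int,
                           w_sb: List[float]) -> List[float]:
    """Return a copy of SA's W in which frames [sb_start, sb_start + len(w_sb))
    are replaced by SB's own W; frames of the non-editing section are kept."""
    end = sb_start + len(w_sb)
    if sb_start < 0 or end > len(w_sa):
        raise ValueError("editing section must lie within the application section")
    return w_sa[:sb_start] + list(w_sb) + w_sa[end:]
```

Returning a new list instead of editing `w_sa` in place makes it easy to keep the previous version, for example to support undo.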

As shown in FIG. 8, the display control unit 22 displays the variable transition images 642 for the non-editing section of the application section SA as in the first embodiment. It also places variable transition images 642 based on the variable transition data W of the editing section SB in the part of the variable area 64 that corresponds to the editing section SB. The speech synthesis unit 34 generates the audio signal SOUT by applying the variable transition data W set for the editing section SB to the synthesis of each designated sound within that section.

In the second embodiment, the variable transition data W for the editing section SB, which the user specifies within the application section SA, is generated independently of the non-editing section. The user can therefore edit part of the time series of the control variable X within the application section SA. Compared with a configuration in which only the application section SA is set (the first embodiment), this produces synthesized sounds with a wider variety of expressions that more closely reflect the user's intent.

<C: Third Embodiment>
In the second embodiment, the variable transition data W is set separately for the editing section SB and the non-editing section of the application section SA. As the variable area 64 in FIG. 8 shows, the time series of the control variable X may then become discontinuous at the boundary between the two. The variable setting unit 46 of the third embodiment therefore interpolates between the variable transition data W of the non-editing section and that of the editing section SB. This keeps the control variable X continuous (smoothly transitioning) across the boundary of the editing section SB.

For example, as shown in FIG. 11, the variable setting unit 46 works with two time series of the control variable X. The first, α1, is indicated by the variable transition data W within the editing section SB. The second, α2, is indicated by the variable transition data W of the sections immediately before and after the editing section SB (the non-editing section). The unit overlaps α1 and α2 on the time axis and crossfades them to compute the time series α3 (broken-line portions) at the start and end of the editing section SB. The speech synthesis unit 34 applies the interpolated variable transition data W to speech synthesis within the application section SA (editing section SB and non-editing section).
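The sketch below shows one possible realization of the crossfade. It assumes a linear fade over `fade` frames just inside each end of the editing section, with the non-editing curve (α2) also available underneath the editing section. The fade shape, the window placement, and the name `crossfade_section` are all assumptions, since FIG. 11 is not reproduced here. Setting `fade=0` reproduces the hard switch of the second embodiment, which is one way to honor a user's choice not to interpolate.

```python
from typing import List


def crossfade_section(w_outer: List[float], w_inner: List[float],
                      start: int, fade: int) -> List[float]:
    """Insert w_inner (alpha1) into w_outer (alpha2) at frame `start`,
    blending linearly over `fade` frames at both ends of the editing
    section instead of switching abruptly."""
    n = len(w_inner)
    if start < 0 or start + n > len(w_outer):
        raise ValueError("editing section out of range")
    fade = max(0, min(fade, n // 2))  # fades at the two ends must not overlap
    out = list(w_outer)
    for i, x in enumerate(w_inner):
        # g is the weight of the inner curve: it ramps up at the start
        # of the editing section and down at its end.
        if fade and i < fade:
            g = (i + 1) / (fade + 1)
        elif fade and i >= n - fade:
            g = (n - i) / (fade + 1)
        else:
            g = 1.0
        out[start + i] = (1.0 - g) * w_outer[start + i] + g * x
    return out
```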

This configuration suppresses discontinuous changes in the control variable X at the boundary between the editing section SB and the non-editing section. It therefore prevents unnatural (abrupt) changes in the musical expression of the synthesized sound, producing a natural sound whose expression transitions smoothly. However, some pieces of music may call for the musical expression to change discontinuously at the boundary of the editing section SB, so that interpolating the control variable X would instead sound unnatural. It is therefore preferable that the user be able to specify whether the variable transition data W is interpolated.

<D: Fourth Embodiment>
In the second embodiment, the effect adjustment value A of the editing section SB and that of the non-editing section are set independently. If the two values differ greatly, however, the transition of the musical expression of the synthesized sound may become unnatural. In the fourth embodiment, therefore, one of the two effect adjustment values A is changed in conjunction with the other.

Specifically, the adjustment value setting unit 44 changes the effect adjustment values A of the editing section SB and the non-editing section according to the user's instruction while maintaining the ratio between them. For example, suppose that, as in FIG. 10, A is set to 80% for the non-editing section (application section SA) and to 90% for the editing section SB. As illustrated in FIG. 12, the user then changes A for the non-editing section (labeled "Part A") from 80% to 70%. The adjustment value setting unit 44 accordingly changes A for the editing section SB from 90% (FIG. 10) to approximately 78% (≈ 90 × 70/80). Likewise, when the user changes A for the editing section SB, the unit changes A for the non-editing section so that the ratio between the two before the change is maintained.
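The sketch below illustrates the ratio-preserving update. It assumes both values are percentages and that whichever value the user edits is taken as given. `linked_update` is a hypothetical helper. With the FIG. 12 numbers it returns 78.75 for the editing section, which the text gives as approximately 78%. The source says neither how the result is rounded nor whether a rescaled value above 100% is clipped, so the sketch does neither.

```python
from typing import Tuple


def linked_update(a_outer: float, a_inner: float, new_value: float,
                  changed: str = "outer") -> Tuple[float, float]:
    """Set one effect adjustment value A and rescale the other so that the
    ratio a_inner / a_outer is the same as before the change.

    a_outer: A of the non-editing section; a_inner: A of the editing section.
    Returns (new a_outer, new a_inner) in percent.
    """
    if changed == "outer":
        if a_outer == 0:
            raise ValueError("ratio undefined when the outer value is 0")
        return new_value, a_inner * new_value / a_outer
    if changed == "inner":
        if a_inner == 0:
            raise ValueError("ratio undefined when the inner value is 0")
        return a_outer * new_value / a_inner, new_value
    raise ValueError("changed must be 'outer' or 'inner'")
```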

With this configuration, both effect adjustment values A change while their ratio is maintained. This suppresses unnatural changes in musical expression across the boundary of the editing section SB.

<E: Modifications>
Each of the above embodiments can be modified in various ways. Specific modifications are exemplified below. Two or more aspects freely selected from the following examples may be combined as appropriate.

(1) Modification 1
The attributes making up the section condition are not limited to those exemplified above (voice quality, genre, song part). For example, the variable information DP may be selected according to a section condition that includes the key of the piece. Specifying a section condition is not essential to selecting the variable information DP, however, and the selection method may be changed as appropriate. For example, the user may specify one of the pieces of variable information DP directly through the input device 14, without entering a section condition. As this shows, the variable selection unit 42 in each of the above embodiments is, in general terms, an element that selects from a plurality of pieces of variable information DP the one corresponding to the user's instruction. Nevertheless, the above embodiments select the variable information DP using section conditions familiar to the user, such as voice quality, genre, and song part. This has the particular advantage that the user can apply the desired variable information DP without detailed knowledge of the musical expression it imparts.

(2) Modification 2
Any method may be used to generate the variable transition data W according to the effect adjustment value A. For example, the variable setting unit 46 may generate W by adding the effect adjustment value A to the variable transition data V or by multiplying V by A. If the benefit of imparting a variety of expressions is not needed, setting the effect adjustment value A (the adjustment value setting unit 44) may be omitted.
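The two alternatives named above could take the form of the following sketch. It assumes that A is a percentage in the multiplicative variant and an offset in the control variable's own units in the additive variant. In the additive variant the result is clipped to an assumed valid range [lo, hi]. None of these scales or names come from the source.

```python
from typing import List


def scale_by_effect(v: List[float], a_percent: float) -> List[float]:
    """Multiplicative variant (assumed form): W = V * A / 100."""
    return [x * a_percent / 100.0 for x in v]


def offset_by_effect(v: List[float], a_offset: float,
                     lo: float, hi: float) -> List[float]:
    """Additive variant (assumed form): W = V + A, clipped to [lo, hi]."""
    return [min(max(x + a_offset, lo), hi) for x in v]
```

The multiplicative form is equivalent to a linear blend toward zero. The additive form shifts the whole curve, so some clipping rule is needed to keep values valid.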

(3) Modification 3
In the third embodiment, the control variable X is interpolated between the editing section SB and the section outside it. A configuration may also be adopted in which the control variable X is interpolated between two application sections SA that are adjacent on the time axis. The interpolation between application sections SA can use the same method that the third embodiment uses between the editing section SB and the non-edited section. Interpolating the control variable X between application sections SA suppresses discontinuous changes of the control variable X at the boundary between successive application sections SA, which has the advantage that a natural synthesized sound whose musical expression transitions smoothly can be generated.
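The interpolation rule of the third embodiment is not spelled out in this part of the text, so the following Python sketch assumes one: the jump at the boundary is split in half and ramped linearly into the last `width` frames of the earlier application section and the first `width` frames of the later one, so both sections keep their shape away from the boundary. `join_with_interpolation` and `width` are hypothetical names.

```python
from typing import List


def join_with_interpolation(left: List[float], right: List[float], width: int) -> List[float]:
    """Concatenate X for two adjacent application sections SA without a jump.

    Assumed rule: half of the boundary step is ramped linearly into each side
    over `width` frames; samples farther from the boundary are unchanged.
    """
    if width < 1 or width > min(len(left), len(right)):
        raise ValueError("width must be between 1 and the shorter section's length")
    half = (right[0] - left[-1]) / 2.0  # half of the discontinuity at the boundary
    new_left, new_right = list(left), list(right)
    for i in range(width):
        # Raise the tail of the earlier section toward the midpoint ...
        k = len(left) - width + i
        new_left[k] = left[k] + half * (i + 1) / width
        # ... and lower the head of the later section toward the same midpoint.
        new_right[i] = right[i] - half * (width - i) / width
    return new_left + new_right


joined = join_with_interpolation([0.0] * 4, [1.0] * 4, width=2)
```

For example, joining four frames of 0 with four frames of 1 using `width=2` yields `[0, 0, 0.25, 0.5, 0.5, 0.75, 1, 1]` instead of a single jump from 0 to 1.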

(4) Modification 4
Each of the above embodiments illustrates a speech synthesis apparatus 100 in which the speech synthesis unit 34 generates the speech signal SOUT according to the variable transition data W (the time series of the control variable X) generated by the variable processing unit 32 (variable setting unit 46). However, the present invention can also be implemented as a device that generates the variable transition data W applied to speech synthesis (a variable processing device).

Specifically, the variable processing device comprises: the storage device 12, which stores a plurality of pieces of variable information DP each indicating a time series of the control variable X; the section setting unit 26, which variably sets the application section SA according to an instruction from the user; and the variable processing unit 32, which sets the time series of the control variable X within the application section SA (the variable transition data W) using the variable information DP that corresponds to the user's instruction among the plurality of pieces of variable information DP. That is, the information generation unit 24, the speech synthesis unit 34, and the display control unit 22 may be omitted as appropriate. The variable transition data W generated by the variable processing device (variable processing unit 32) is supplied, via a portable recording medium or a communication network, to a speech synthesis apparatus (comprising the speech synthesis unit 34) that is separate from the variable processing device, and is applied to speech synthesis there.
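A Python sketch of the standalone variable processing device described above, under several assumptions not stated in the text: control variables are per-frame float lists, the selected DP is linearly stretched to fit the application section SA, and W is handed to the separate synthesizer as JSON. The class `VariableProcessingDevice`, the helpers `resample` and `export_for_synthesizer`, and the JSON key are all hypothetical.

```python
import json
from dataclasses import dataclass
from typing import Dict, List, Tuple


def resample(dp: List[float], n: int) -> List[float]:
    """Linearly stretch or compress a DP to n samples (assumed fitting rule)."""
    if not dp:
        raise ValueError("empty variable information")
    m = len(dp)
    if m == 1 or n == 1:
        return [dp[0]] * n
    out = []
    for i in range(n):
        pos = i * (m - 1) / (n - 1)
        lo = int(pos)
        hi = min(lo + 1, m - 1)
        frac = pos - lo
        out.append(dp[lo] * (1 - frac) + dp[hi] * frac)
    return out


@dataclass
class VariableProcessingDevice:
    """Hypothetical device with storage (cf. 12), section setting (cf. 26)
    and variable processing (cf. 32), but no synthesis or display units."""
    storage: Dict[str, List[float]]  # DP name -> time series of X
    base: List[float]  # X over the whole piece before editing (assumption)
    section: Tuple[int, int] = (0, 0)  # application section SA as [start, end) frames

    def set_section(self, start: int, end: int) -> None:
        # Section setting: the user variably chooses SA.
        if not 0 <= start < end <= len(self.base):
            raise ValueError("application section out of range")
        self.section = (start, end)

    def make_transition_data(self, dp_name: str) -> List[float]:
        # Variable processing: write the user-chosen DP into SA only.
        start, end = self.section
        w = list(self.base)
        w[start:end] = resample(self.storage[dp_name], end - start)
        return w


def export_for_synthesizer(w: List[float]) -> str:
    # Hypothetical interchange format for a separate speech synthesis apparatus.
    return json.dumps({"variable_transition_data": w})


device = VariableProcessingDevice(storage={"rise": [0.0, 1.0]}, base=[0.0] * 6)
device.set_section(1, 4)
payload = export_for_synthesizer(device.make_transition_data("rise"))
```

Keeping the export step as a plain function reflects the point of the modification: the device only produces W, and whatever synthesizer receives it is outside its scope.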

100 ... speech synthesis apparatus, 10 ... control device, 12 ... storage device, 14 ... input device, 16 ... display device, 18 ... sound emitting device, 22 ... display control unit, 24 ... information generation unit, 26 ... section setting unit, 32 ... variable processing unit, 34 ... speech synthesis unit, 42 ... variable selection unit, 44 ... adjustment value setting unit, 46 ... variable setting unit, 60 ... edit image, 62 ... score area, 622 ... sound indicator, 624, 626 ... section indicators, 64 ... variable area, 642 ... variable transition image, 70 ... management image, 72 ... record, 80 ... operation image, 82 ... condition designation area, 84 ... adjustment value designation area, SA ... application section, SB ... editing section, DS ... music information, DV ... phoneme information, DP ... variable information, V, W ... variable transition data.

Claims

1. A speech synthesis apparatus comprising:
section setting means for variably setting, according to an instruction from a user, an application section in the time series of designated sounds indicated by music information;
variable selection means for selecting, from among a plurality of pieces of variable information that each indicate a time series of a control variable applied to speech synthesis and that correspond respectively to different combinations of options for a plurality of attributes of synthesized speech, the variable information corresponding to the combination of options designated by the user for the respective attributes;
variable setting means for setting a time series of the control variable in the application section according to the variable information selected by the variable selection means; and
speech synthesis means for synthesizing the designated sounds indicated by the music information, the speech synthesis means applying the time series of the control variable set by the variable setting means to the synthesis of the designated sounds in the application section.

2. A speech synthesis apparatus comprising:
section setting means for variably setting, according to an instruction from a user, an application section in the time series of designated sounds indicated by music information and an editing section within the application section;
variable selection means for selecting, from among a plurality of pieces of variable information each indicating a time series of a control variable applied to speech synthesis, variable information according to an instruction from the user;
variable setting means for setting a time series of the control variable in the application section according to the variable information selected by the variable selection means, the variable setting means being capable of setting the time series of the control variable in the editing section independently of the time series of the control variable in the part of the application section other than the editing section; and
speech synthesis means for synthesizing the designated sounds indicated by the music information, the speech synthesis means applying the time series of the control variable set by the variable setting means to the synthesis of the designated sounds in the application section.

3. The speech synthesis apparatus according to claim 2, wherein the variable setting means interpolates the control variable so that the control variable is continuous between the inside and the outside of the editing section within the application section.

4. The speech synthesis apparatus according to claim 3, further comprising adjustment value setting means for variably setting an effect adjustment value according to an instruction from the user,
wherein the variable setting means sets the time series of the control variable in the application section so that the variable information is reflected in the synthesis of the designated sounds in the application section to a degree corresponding to the effect adjustment value.

5. A program causing a computer to function as:
  section setting means for variably setting, according to an instruction from a user, an application section in the time series of designated sounds indicated by music information;
  variable selection means for selecting, from among a plurality of pieces of variable information that each indicate a time series of a control variable applied to speech synthesis and that correspond respectively to combinations of options for a plurality of attributes of synthesized sound, the variable information corresponding to the combination of options designated by the user for the respective attributes;
  variable setting means for setting a time series of the control variable in the application section according to the variable information selected by the variable selection means; and
  speech synthesis means for synthesizing the designated sounds indicated by the music information, the speech synthesis means applying the time series of the control variable set by the variable setting means to the synthesis of the designated sounds in the application section.

6. A program causing a computer to function as:
  section setting means for variably setting, according to an instruction from a user, an application section in the time series of designated sounds indicated by music information and an editing section within the application section;
  variable selection means for selecting, from among a plurality of pieces of variable information each indicating a time series of a control variable applied to speech synthesis, variable information according to an instruction from the user;
  variable setting means for setting a time series of the control variable in the application section according to the variable information selected by the variable selection means, the variable setting means being capable of setting the time series of the control variable in the editing section independently of the time series of the control variable in the part of the application section other than the editing section; and
  speech synthesis means for synthesizing the designated sounds indicated by the music information, the speech synthesis means applying the time series of the control variable set by the variable setting means to the synthesis of the designated sounds in the application section.