JP6337455B2 - Speech synthesizer

Info

Publication number: JP6337455B2
Application number: JP2013257938A
Authority: JP
Inventors: 康英檜垣
Original assignee: NEC Corp
Current assignee: NEC Corp
Priority date: 2013-12-13
Filing date: 2013-12-13
Publication date: 2018-06-06
Anticipated expiration: 2033-12-13
Also published as: JP2015114584A

Description

The present invention relates to a speech synthesizer, for example one that synthesizes audio transmitted from a base station with audio transmitted from another mobile station over the digital radio of a firefighting radio system.

In recent years, techniques have become known that synthesize audio transmitted from a base station with audio transmitted from another mobile station, for example over the digital radio of a firefighting radio system.

The technique described in Patent Document 1 is known as a reference technique for the present invention.

Japanese Patent Laid-Open No. 11-161295

However, when audio transmitted from a base station is synthesized with audio transmitted from another mobile station, the two audio streams are synchronized to different clocks. Consequently, if the two streams are simply synthesized, a slip occurs in one of them and periodic noise is produced.
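The periodic nature of the slip can be illustrated with a small calculation. This is a minimal sketch, not part of the patent: the 8 kHz nominal sampling rate, the 50 ppm clock offset, and all function names are assumptions chosen only for illustration.

```python
def samples_drift(clk_a_hz: float, clk_b_hz: float, seconds: float) -> int:
    """Difference in sample counts accumulated by the two clocks over `seconds`."""
    return round((clk_b_hz - clk_a_hz) * seconds)


def seconds_between_slips(clk_a_hz: float, clk_b_hz: float) -> float:
    """Time needed for the two streams to drift apart by one whole sample."""
    if clk_a_hz == clk_b_hz:
        raise ValueError("clocks are identical; no slip occurs")
    return 1.0 / abs(clk_b_hz - clk_a_hz)


# Assumed figures: 8 kHz nominal rate, second clock running 50 ppm fast.
CLK_A_HZ = 8000.0
CLK_B_HZ = 8000.0 * (1 + 50e-6)

drift_after_10_s = samples_drift(CLK_A_HZ, CLK_B_HZ, 10.0)
slip_interval_s = seconds_between_slips(CLK_A_HZ, CLK_B_HZ)
```

Under these assumed figures the streams drift apart by one sample roughly every 2.5 seconds, so a naive mixer would drop or repeat a sample at that regular interval.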

The present invention has been made in view of these circumstances, and its object is to provide a speech synthesizer that can synthesize audio while suppressing the occurrence of audio slip.

The speech synthesizer of the present invention includes: a first counter that measures a first audio data count, which is the number of data items in first audio data; a second counter that measures a second audio data count, which is the number of data items in second audio data; a data count comparison unit that compares the first audio data count measured by the first counter with the second audio data count measured by the second counter; a data shortening/expansion unit that shortens or expands the second audio data based on the comparison result from the data count comparison unit and outputs the shortened or expanded second audio data; and a synthesis unit that synthesizes the shortened or expanded second audio data output by the data shortening/expansion unit with the first audio data.

The speech synthesizer according to the present invention can synthesize audio while suppressing the occurrence of audio slip.

FIG. 1 is a diagram showing the configuration of a speech synthesizer according to an embodiment of the present invention.

FIG. 2 is a diagram showing an example of audio data shortening processing.

FIG. 3 is a diagram showing an example of audio data expansion processing.

The configuration of speech synthesizer 100 according to an embodiment of the present invention will now be described.

FIG. 1 is a diagram showing the configuration of speech synthesizer 100.

As shown in FIG. 1, speech synthesizer 100 includes a first FIFO (First In First Out) memory 110, a second FIFO memory 120, a first counter 130, a second counter 140, a data count comparison unit 150, a correlation peak detection unit 160, a data shortening/expansion unit 170, and a synthesis unit 180.

As shown in FIG. 1, first audio data F1 is input to first FIFO memory 110. Here, first audio data F1 is, for example, audio data transmitted from a base station (not shown). The base station corresponds to the first communication station of the present invention.

Clock signal CLK_A is also input to first FIFO memory 110. Clock signal CLK_A corresponds to the frequency of the radio wave of first audio data F1.

First FIFO memory 110 is connected to first counter 130 and synthesis unit 180.

First FIFO memory 110 temporarily stores the incoming first audio data F1 in synchronization with the radio wave of that data, while sequentially outputting it to first counter 130 and synthesis unit 180.

As shown in FIG. 1, second audio data F2 is input to second FIFO memory 120. Here, second audio data F2 is, for example, audio data transmitted from a mobile station (not shown). The mobile station corresponds to the second communication station of the present invention.

Clock signal CLK_A and clock signal CLK_B are also input to second FIFO memory 120. Clock signal CLK_B corresponds to the frequency of the radio wave of second audio data F2.

Second FIFO memory 120 is connected to second counter 140, correlation peak detection unit 160, and data shortening/expansion unit 170.

Second FIFO memory 120 temporarily stores the incoming second audio data F2 in synchronization with the radio wave of that data, while sequentially outputting it to second counter 140, correlation peak detection unit 160, and data shortening/expansion unit 170.

As shown in FIG. 1, first counter 130 is connected to first FIFO memory 110 and data count comparison unit 150.

First audio data F1 is input to first counter 130 from first FIFO memory 110. First counter 130 measures first audio data count N1, which is the number of data items in first audio data F1. First counter 130 then outputs the measured value of N1 to data count comparison unit 150.

As shown in FIG. 1, second counter 140 is connected to second FIFO memory 120 and data count comparison unit 150.

Second audio data F2 is input to second counter 140 from second FIFO memory 120. Second counter 140 measures second audio data count N2, which is the number of data items in second audio data F2. Second counter 140 then outputs the measured value of N2 to data count comparison unit 150.
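The pairing of a FIFO memory with a counter can be sketched in software as follows. This is only an illustration under stated assumptions: the patent describes hardware blocks, and the class name `CountedFifo`, its methods, and the sample values are hypothetical.

```python
from collections import deque


class CountedFifo:
    """FIFO buffer paired with a counter of the samples read out of it."""

    def __init__(self) -> None:
        self._queue: deque = deque()
        self.count = 0  # plays the role of N1 or N2

    def write(self, sample: int) -> None:
        """Store one incoming sample at the tail of the queue."""
        self._queue.append(sample)

    def read(self) -> int:
        """Output the oldest stored sample and count it."""
        sample = self._queue.popleft()
        self.count += 1
        return sample

    def __len__(self) -> int:
        return len(self._queue)


# Hypothetical data: F1 delivered four samples, F2 only three, in the same interval.
fifo_1, fifo_2 = CountedFifo(), CountedFifo()
for s in (10, 11, 12, 13):
    fifo_1.write(s)
for s in (20, 21, 22):
    fifo_2.write(s)

out_1 = [fifo_1.read() for _ in range(len(fifo_1))]
out_2 = [fifo_2.read() for _ in range(len(fifo_2))]
n1, n2 = fifo_1.count, fifo_2.count
```

Here `n1` and `n2` stand in for the measured values that the two counters pass on for comparison.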

As shown in FIG. 1, data count comparison unit 150 is connected to first counter 130, second counter 140, and data shortening/expansion unit 170.

The measured value of first audio data count N1 is input to data count comparison unit 150 from first counter 130. The measured value of second audio data count N2 is input to data count comparison unit 150 from second counter 140.

Data count comparison unit 150 compares first audio data count N1 measured by first counter 130 with second audio data count N2 measured by second counter 140. It then outputs the difference between N1 and N2 to data shortening/expansion unit 170.

As shown in FIG. 1, correlation peak detection unit 160 is connected to second FIFO memory 120 and data shortening/expansion unit 170.

Correlation peak detection unit 160 detects the autocorrelation peak of second audio data F2. It then calculates wavelength λ2 of second audio data F2 based on the position of the detected autocorrelation peak and second audio data count N2, and outputs wavelength λ2 to data shortening/expansion unit 170.
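One common way to obtain a period from an autocorrelation peak is sketched below. It is an assumption that "wavelength" here means the waveform period in samples; the patent does not give the formula or say exactly how N2 enters the calculation, so the lag search range and the function names are hypothetical.

```python
import math
from typing import Sequence


def autocorrelation(samples: Sequence[float], lag: int) -> float:
    """Unnormalized autocorrelation of `samples` at the given lag."""
    return sum(samples[i] * samples[i + lag] for i in range(len(samples) - lag))


def estimate_wavelength(samples: Sequence[float], min_lag: int, max_lag: int) -> int:
    """Return the lag in [min_lag, max_lag] where the autocorrelation peaks."""
    if not 0 < min_lag <= max_lag < len(samples):
        raise ValueError("lag range must lie inside the sample block")
    return max(range(min_lag, max_lag + 1), key=lambda lag: autocorrelation(samples, lag))


# Synthetic stand-in for F2: a sine wave with a 20-sample period.
signal = [math.sin(2 * math.pi * i / 20) for i in range(200)]
wavelength = estimate_wavelength(signal, min_lag=5, max_lag=30)
```

Restricting the search to a lag range keeps the trivial peak at lag 0 out of consideration; for real speech the range would be chosen to cover plausible pitch periods.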

As shown in FIG. 1, data shortening/expansion unit 170 is connected to second FIFO memory 120, data count comparison unit 150, correlation peak detection unit 160, and synthesis unit 180.

Data shortening/expansion unit 170 shortens or expands second audio data F2 based on the comparison result from data count comparison unit 150, and outputs the shortened or expanded second audio data F2.

That is, based on the comparison result from data count comparison unit 150, data shortening/expansion unit 170 determines which of first audio data count N1 and second audio data count N2 is larger. It then decides whether to shorten or expand second audio data F2 so that the data count of second audio data F2 matches that of first audio data F1. At this point, it determines how much to shorten or expand second audio data F2 based on the difference between N1 and N2. Data shortening/expansion unit 170 then shortens or expands second audio data F2 based on the comparison result from data count comparison unit 150 and wavelength λ2 of the second audio data detected by correlation peak detection unit 160, and outputs the shortened or expanded second audio data F2 to synthesis unit 180.
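The decision described above can be sketched as a small planning function. The patent states only that the direction follows from which count is larger and that the amount follows from the difference; expressing the amount as whole wavelengths plus a leftover remainder is an assumption, as are the function name and return format.

```python
from typing import Tuple


def plan_adjustment(n1: int, n2: int, wavelength: int) -> Tuple[str, int, int]:
    """Decide how to adjust F2 so that its data count approaches N1.

    Returns (action, whole_periods, leftover_samples), where action is
    "expand", "shorten", or "none".
    """
    if wavelength <= 0:
        raise ValueError("wavelength must be positive")
    diff = n1 - n2
    if diff == 0:
        return ("none", 0, 0)
    # F2 has too few samples when N1 > N2, so it must be expanded.
    action = "expand" if diff > 0 else "shorten"
    periods, leftover = divmod(abs(diff), wavelength)
    return (action, periods, leftover)
```

For example, `plan_adjustment(8000, 8100, 40)` reports that F2 should be shortened by two whole wavelengths, with 20 samples of mismatch left to be handled later.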

The specific shortening and expansion processing performed by data shortening/expansion unit 170 will now be described.

FIG. 2 is a diagram showing an example of audio data shortening processing. As shown in FIG. 2, based on wavelength λ2 of the second audio data detected by correlation peak detection unit 160, data shortening/expansion unit 170 shortens, for example, data three wavelengths long into data two wavelengths long. More preferably, as shown in FIG. 2, providing an overlapping section yields audio that joins more naturally.
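A shortening step with an overlapping section might look like the sketch below. FIG. 2 is not reproduced here, so the linear cross-fade of two adjacent wavelengths into one is an assumed overlap scheme, and the function name and sample values are hypothetical.

```python
from typing import List, Sequence


def shorten_by_one_period(samples: Sequence[float], start: int, wavelength: int) -> List[float]:
    """Merge the two periods beginning at `start` into one by cross-fading.

    The first period fades out while the second fades in, so the result is
    exactly one wavelength shorter than the input.
    """
    a = samples[start:start + wavelength]
    b = samples[start + wavelength:start + 2 * wavelength]
    if len(a) < wavelength or len(b) < wavelength:
        raise ValueError("need two full periods starting at `start`")
    faded = [
        a[i] * (1 - i / wavelength) + b[i] * (i / wavelength)
        for i in range(wavelength)
    ]
    return list(samples[:start]) + faded + list(samples[start + 2 * wavelength:])


# Three identical 4-sample periods become two.
three_periods = [0, 1, 2, 3] * 3
two_periods = shorten_by_one_period(three_periods, start=4, wavelength=4)
```

Because the removed material is a whole wavelength and the seam is cross-faded, the waveform phase is preserved on both sides of the cut.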

FIG. 3 is a diagram showing an example of audio data expansion processing. As shown in FIG. 3, based on wavelength λ2 of the second audio data detected by correlation peak detection unit 160, data shortening/expansion unit 170 expands, for example, data two wavelengths long into data three wavelengths long. More preferably, as shown in FIG. 3, providing an overlapping section yields audio that joins more naturally.
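The expansion step can be sketched in the same spirit. FIG. 3 is not reproduced here, so inserting one extra wavelength built by cross-fading the two neighbouring periods is an assumed overlap scheme; the function name and sample values are hypothetical.

```python
from typing import List, Sequence


def expand_by_one_period(samples: Sequence[float], start: int, wavelength: int) -> List[float]:
    """Insert one synthesized period between the two periods beginning at `start`.

    The inserted period starts out like the second period and ends like the
    first, so it joins smoothly to its neighbours on both sides.
    """
    a = samples[start:start + wavelength]
    b = samples[start + wavelength:start + 2 * wavelength]
    if len(a) < wavelength or len(b) < wavelength:
        raise ValueError("need two full periods starting at `start`")
    inserted = [
        b[i] * (1 - i / wavelength) + a[i] * (i / wavelength)
        for i in range(wavelength)
    ]
    split = start + wavelength
    return list(samples[:split]) + inserted + list(samples[split:])


# Two identical 4-sample periods become three.
two_periods = [0, 1, 2, 3] * 2
three_periods = expand_by_one_period(two_periods, start=0, wavelength=4)
```

The output is exactly one wavelength longer than the input, which matches the idea of adjusting the data in units of λ2.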

As described with reference to FIGS. 2 and 3, in the present invention, first audio data F1 and second audio data F2 are synthesized after second audio data F2 has been shortened or expanded. This allows digital audio synthesis to be performed without causing audio slip.

As shown in FIG. 1, synthesis unit 180 is connected to first FIFO memory 110 and data shortening/expansion unit 170.

Synthesis unit 180 synthesizes the shortened or expanded second audio data F2 output by data shortening/expansion unit 170 with first audio data F1, and outputs the synthesized data.
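The patent does not say how the two streams are combined. The sketch below assumes the simplest reading, sample-by-sample addition of 16-bit PCM values with saturation; the sample format, the clipping limits, and the function name are all assumptions.

```python
from typing import List, Sequence

PCM16_MIN, PCM16_MAX = -32768, 32767  # assumed 16-bit signed sample range


def mix(f1: Sequence[int], f2: Sequence[int]) -> List[int]:
    """Add two equal-length sample sequences, clipping to the 16-bit range."""
    if len(f1) != len(f2):
        raise ValueError("streams must have the same data length before mixing")
    return [max(PCM16_MIN, min(PCM16_MAX, a + b)) for a, b in zip(f1, f2)]


mixed = mix([1000, -20000, 30000], [2000, -20000, 10000])
```

The length check reflects the point of the preceding stages: F2 is adjusted first so that both streams have the same data length when they reach this step.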

This concludes the description of the configuration of speech synthesizer 100.

Next, the operation of speech synthesizer 100 will be described.

First, first audio data F1 transmitted from a base station (not shown) is input to first FIFO memory 110. At the same time, second audio data F2 transmitted from another mobile station (not shown) is input to second FIFO memory 120. Simultaneously with these operations, clock signal CLK_A is input to first FIFO memory 110 and second FIFO memory 120, and clock signal CLK_B is input to second FIFO memory 120. As noted above, clock signal CLK_A corresponds to the frequency of the radio wave of first audio data F1, and clock signal CLK_B corresponds to the frequency of the radio wave of second audio data F2.

Next, first FIFO memory 110 temporarily stores the incoming first audio data F1 in synchronization with the radio wave of that data, while sequentially outputting it to first counter 130 and synthesis unit 180. Second FIFO memory 120 temporarily stores the incoming second audio data F2 in synchronization with the radio wave of that data, while sequentially outputting it to second counter 140, correlation peak detection unit 160, and data shortening/expansion unit 170.

Next, first counter 130 measures first audio data count N1 and outputs the measured value to data count comparison unit 150. Second counter 140 measures second audio data count N2 and outputs the measured value to data count comparison unit 150.

Next, data count comparison unit 150 compares first audio data count N1 measured by first counter 130 with second audio data count N2 measured by second counter 140. It then outputs the difference between N1 and N2 to data shortening/expansion unit 170.

In addition, correlation peak detection unit 160 detects the autocorrelation peak of second audio data F2. It calculates wavelength λ2 of second audio data F2 based on the position of the detected autocorrelation peak and second audio data count N2, and outputs wavelength λ2 to data shortening/expansion unit 170.

Next, data shortening/expansion unit 170 shortens or expands second audio data F2 based on the comparison result from data count comparison unit 150, and outputs the shortened or expanded second audio data F2.

That is, based on the comparison result from data count comparison unit 150, data shortening/expansion unit 170 determines which of first audio data count N1 and second audio data count N2 is larger. It then decides whether to shorten or expand second audio data F2 so that the data count of second audio data F2 matches that of first audio data F1.

At this point, data shortening/expansion unit 170 determines how much to shorten or expand second audio data F2 based on the difference between first audio data count N1 and second audio data count N2. For example, data shortening/expansion unit 170 compresses or expands second audio data F2 by a factor of (N1/N2).
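As a worked example of the (N1/N2) factor, the sketch below computes the length that a block of F2 should have after adjustment. The function names and the example counts are hypothetical, and rounding to the nearest whole sample is an assumption.

```python
def scale_factor(n1: int, n2: int) -> float:
    """Ratio N1/N2: below 1 means F2 must be shortened, above 1 expanded."""
    if n2 <= 0:
        raise ValueError("N2 must be positive")
    return n1 / n2


def target_length(f2_length: int, n1: int, n2: int) -> int:
    """Length of the F2 block after scaling by N1/N2, rounded to whole samples."""
    return round(f2_length * scale_factor(n1, n2))


# Hypothetical counts: F2 delivered 8004 samples while F1 delivered 8000.
factor = scale_factor(8000, 8004)
new_length = target_length(8004, 8000, 8004)
```

With these counts the factor is slightly below 1 and the 8004-sample block of F2 is reduced to 8000 samples, matching the length of F1.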

Data shortening/expansion unit 170 shortens or expands second audio data F2 based on the comparison result from data count comparison unit 150 and wavelength λ2 of the second audio data detected by correlation peak detection unit 160, and outputs the shortened or expanded second audio data F2 to synthesis unit 180.

Finally, synthesis unit 180 synthesizes the shortened or expanded second audio data F2 output by data shortening/expansion unit 170 with first audio data F1, and outputs the synthesized data.

In this way, in the present invention, first audio data F1 and second audio data F2 are synthesized after second audio data F2 has been shortened or expanded. As a result, digital audio synthesis can be performed without causing audio slip even when clock CLK_A, to which first audio data F1 is synchronized, differs from clock CLK_B, to which second audio data F2 is synchronized.

This concludes the description of the operation of speech synthesizer 100.

As described above, speech synthesizer 100 according to the embodiment of the present invention includes first counter 130, second counter 140, data count comparison unit 150, data shortening/expansion unit 170, and synthesis unit 180.

First counter 130 measures first audio data count N1, which is the number of data items in first audio data F1. Second counter 140 measures second audio data count N2, which is the number of data items in second audio data F2. Data count comparison unit 150 compares first audio data count N1 measured by first counter 130 with second audio data count N2 measured by second counter 140. Data shortening/expansion unit 170 shortens or expands second audio data F2 based on the comparison result from data count comparison unit 150, and outputs the shortened or expanded second audio data F2. Synthesis unit 180 synthesizes the shortened or expanded second audio data F2 output by data shortening/expansion unit 170 with first audio data F1.

In this way, data count comparison unit 150 compares first audio data count N1 measured by first counter 130 with second audio data count N2 measured by second counter 140. Data shortening/expansion unit 170 then shortens or expands second audio data F2 based on the comparison result and outputs the shortened or expanded second audio data F2. This allows the data length of second audio data F2 to be matched to the data length of first audio data F1. Synthesis unit 180 then synthesizes the shortened or expanded second audio data F2 output by data shortening/expansion unit 170 with first audio data F1. In other words, synthesis unit 180 synthesizes first audio data F1 and second audio data F2 only after the two have been given the same data length. As a result, audio can be synthesized while suppressing the occurrence of audio slip even when clock CLK_A, to which first audio data F1 is synchronized, differs from clock CLK_B, to which second audio data F2 is synchronized.

Speech synthesizer 100 according to the embodiment of the present invention further includes correlation peak detection unit 160. Correlation peak detection unit 160 detects the autocorrelation peak of second audio data F2. It also calculates wavelength λ2 of second audio data F2 based on the position of the autocorrelation peak and second audio data count N2, and outputs wavelength λ2 to data shortening/expansion unit 170. Data shortening/expansion unit 170 then shortens or expands second audio data F2 based on the comparison result from data count comparison unit 150 and wavelength λ2 detected by correlation peak detection unit 160, and outputs the shortened or expanded second audio data F2.

In this way, data shortening/expansion unit 170 shortens or expands second audio data F2 based on the comparison result from data count comparison unit 150 and wavelength λ2 of second audio data F2 detected by correlation peak detection unit 160. This allows data shortening/expansion unit 170 to shorten or expand second audio data F2 in units of its wavelength λ2. As a result, audio can be synthesized while suppressing the occurrence of audio slip more efficiently, even when clock CLK_A, to which first audio data F1 is synchronized, differs from clock CLK_B, to which second audio data F2 is synchronized.

In speech synthesizer 100 according to the embodiment of the present invention, first audio data F1 is data transmitted from a first communication station (for example, a base station). Second audio data F2 is data transmitted from a second communication station different from the first communication station (for example, another mobile station). In this way, two sets of audio data transmitted from different communication stations can be synthesized while suppressing the occurrence of audio slip.

The present invention has been described above based on an embodiment. The embodiment is illustrative, and various changes, additions, omissions, and combinations may be applied to the embodiment described above without departing from the gist of the present invention. Those skilled in the art will understand that variations incorporating such changes, additions, omissions, and combinations also fall within the scope of the present invention.

Description of symbols
100 Speech synthesizer
110 First FIFO memory
120 Second FIFO memory
130 First counter
140 Second counter
150 Data count comparison unit
160 Correlation peak detection unit
170 Data shortening/expansion unit
180 Synthesis unit

Claims

1. A speech synthesizer comprising:
a first counter that measures a first audio data count, which is the number of data items in first audio data;
a second counter that measures a second audio data count, which is the number of data items in second audio data;
a data count comparison unit that compares the first audio data count measured by the first counter with the second audio data count measured by the second counter;
a data shortening/expansion unit that shortens or expands the second audio data based on the comparison result from the data count comparison unit, and outputs the shortened or expanded second audio data;
a synthesis unit that synthesizes the shortened or expanded second audio data output by the data shortening/expansion unit with the first audio data; and
a correlation peak detection unit that detects an autocorrelation peak of the second audio data, calculates a wavelength of the second audio data based on the position of the autocorrelation peak of the second audio data and the second audio data count, and outputs the wavelength of the second audio data to the data shortening/expansion unit,
wherein the data shortening/expansion unit shortens or expands the second audio data based on the comparison result from the data count comparison unit and the wavelength of the second audio data detected by the correlation peak detection unit, and outputs the shortened or expanded second audio data.

2. The speech synthesizer according to claim 1, wherein the first audio data is data transmitted from a first communication station, and the second audio data is data transmitted from a second communication station different from the first communication station.