JPH09101798A

JPH09101798A - Method and device for expanding voice band

Info

Publication number: JPH09101798A
Application number: JP7258448A
Authority: JP
Inventors: Yoshihisa Nakato; 良久中藤; Mineo Tsushima; 峰生津島; Takeshi Norimatsu; 武志則松
Original assignee: Matsushita Electric Industrial Co Ltd
Current assignee: Panasonic Holdings Corp
Priority date: 1995-10-05
Filing date: 1995-10-05
Publication date: 1997-04-15
Anticipated expiration: 2015-10-05
Also published as: JP2956548B2

Abstract

PROBLEM TO BE SOLVED: To synthesize high-quality voice by converting a band-limited input voice into a wideband voice whose frequency band includes that of the input voice. SOLUTION: The device comprises a voice analyzer 101 that analyzes the band-limited input voice and separates it into a narrowband residual signal and a narrowband spectral envelope; a residual band widener 102 that generates a wideband residual signal from the narrowband residual signal; an envelope band widener 103 that estimates a wideband spectral envelope from the narrowband spectral envelope; a voice synthesizer 104 that synthesizes a wideband synthesized voice from the wideband residual signal and the wideband spectral envelope; a filter 105 that extracts, from the wideband synthesized voice, the out-of-band components lying outside the frequency band of the input voice; and a voice superposer 106 that superposes the waveforms of the out-of-band components and the input voice on the time axis to synthesize a wideband voice whose frequency band includes that of the input voice.

Description

DETAILED DESCRIPTION OF THE INVENTION

[0001]

TECHNICAL FIELD OF THE INVENTION The present invention relates to a voice band expanding method and a voice band expanding device that synthesize high-quality voice by converting an input voice band-limited to a certain frequency band into a wideband voice having a wide frequency band that includes the frequency band of the input voice.

[0002]

DESCRIPTION OF THE RELATED ART Most analog telephone communication is carried over the public network operated by NTT. Because of physical constraints of the line, the signal is band-limited to 300 Hz to 3.4 kHz, and sound quality deteriorates because the low band below 300 Hz and the high band above 3.4 kHz are lost. In digital voice communication, such as mobile telephony, there is also an inherent constraint that the voice band is limited by the bit rate.

[0003] In recent years there has therefore been strong demand for a technique that improves the quality of telephone voice while leaving the line unchanged, and the problem has been actively studied. One example is Yoshida and Abe, "Reconstruction method of wideband speech from narrowband speech by codebook mapping," Proceedings of the Acoustical Society of Japan, 1-8-18, pp. 179-180 (1993.3). That method is based on a correspondence between a narrowband speech codebook and a wideband speech codebook. The wideband code corresponding to the code obtained by vector quantization (VQ) of the telephone speech is looked up in the wideband codebook, so that the wideband spectrum is obtained indirectly, and wideband voice is then obtained by speech synthesis using the pitch as the excitation source.

[0004]

PROBLEMS TO BE SOLVED BY THE INVENTION However, because the conventional method described above expands the band by codebook mapping, the synthesized sound is considerably degraded and the amount of processing is relatively large.

[0005] The present invention solves the above problems. Its object is to provide a voice band expanding method and a voice band expanding device that convert an input voice band-limited to a certain frequency band into a wideband voice having a wide frequency band that includes the frequency band of the input voice and synthesize that voice, thereby widening the band of analog telephones band-limited by the communication line and of band-limited mobile telephones, and improving call quality.

[0006]

MEANS FOR SOLVING THE PROBLEMS To achieve the above object, the present invention provides a voice band expanding method for converting an input voice band-limited to a specific frequency band into a wideband voice having a wide frequency band that includes the frequency band of the input voice, characterized in that: the input voice is subjected to linear prediction analysis for each fixed frame and separated into a narrowband residual signal and a narrowband spectral envelope; a wideband residual signal is generated from the narrowband residual signal by nonlinear processing; a wideband spectral envelope is estimated from the narrowband spectral envelope; and a wideband voice having a wide frequency band that includes the frequency band of the input voice is synthesized from the wideband residual signal and the wideband spectral envelope by linear predictive synthesis.

[0007] The present invention also provides a voice band expanding device for converting an input voice band-limited to a specific frequency band into a wideband voice having a wide frequency band that includes the frequency band of the input voice, characterized by comprising: a voice analyzer that performs linear prediction analysis on the input voice for each fixed frame and separates it into a narrowband residual signal and a narrowband spectral envelope; a residual bandwidth expander that doubles the sampling frequency by inserting a zero after every sample of the narrowband residual signal and then applies an aliasing filter, thereby creating a band-expanded residual signal in which only the bandwidth of the narrowband residual signal is expanded; a pitch filter that filters the band-expanded residual signal to extract a pitch-containing residual signal, in which only the frequency band containing the fundamental frequency component of the voice is cut out of the band-expanded residual signal; a residual band widener that generates a wideband residual signal from the pitch-containing residual signal; an envelope band widener that estimates a wideband spectral envelope from the narrowband spectral envelope; and a voice synthesizer that synthesizes, from the wideband residual signal and the wideband spectral envelope by linear predictive synthesis, a wideband voice having a wide frequency band that includes the frequency band of the input voice.

[0008]

EMBODIMENTS OF THE INVENTION With the configuration described above, in order to convert an input voice band-limited to a certain frequency band into a wideband signal having a wide frequency band that includes the frequency band of the input voice, the present invention uses the band-limited input voice to create a wideband synthesized voice having a wider frequency band than the input voice, and then obtains the final wideband voice by filtering and superposing the input voice and the wideband synthesized voice. The band of a band-limited voice can therefore be widened with high performance using a simple configuration.

[0009] Communication terminals such as mobile telephones are becoming more sophisticated, and call quality is being actively discussed. For many people the telephone has been an indispensable means of communication from its invention to the present day, and improving its quality is a very important research subject. The present invention provides a method and a device for restoring to the original wideband signal an analog telephone voice band-limited to 0.3 to 3.4 kHz by the line characteristics, or a digital voice band-limited by the bit rate.

[0010] A first embodiment of the present invention, which widens the band of a band-limited voice, is described below.

[0011] FIG. 1 is a block diagram showing the overall configuration of the first embodiment of the present invention. In FIG. 1, reference numeral 101 denotes a voice analyzer that analyzes the band-limited input voice frame by frame (a frame being a unit of time obtained by dividing the voice signal into predetermined periods) and separates it into a narrowband residual signal and a narrowband spectral envelope. For example, it computes for each frame the spectral envelope and the residual signal obtained by LPC analysis (linear prediction analysis). The spectral envelope obtained by LPC analysis may be represented by, for example, LPC coefficients, PARCOR coefficients, reflection coefficients, LSP coefficients, LPC cepstrum coefficients, or LPC mel-cepstrum coefficients. All of these are feature quantities that express the spectral characteristics of the voice, so any of them may be used. The residual signal is the information that remains after the spectral envelope information is removed from the input voice, and it represents the pitch structure of the voice well. A pitch, a multipulse train, or an excitation codebook may be used in place of the residual signal, since each is a characteristic representation of the residual signal obtained after LPC analysis of the voice. Feature quantities such as linear prediction coefficients, PARCOR coefficients, reflection coefficients, LPC cepstrum coefficients, and LPC mel-cepstrum coefficients are described in detail in, for example, L. R. Rabiner and R. W. Schafer, Japanese translation by Suzuki (鈴木久喜), "Digital Signal Processing of Speech (Vols. 1 and 2)," Corona Publishing. LSP coefficients are described in, for example, F. K. Soong and B. H. Juang, "Line Spectrum Pair (LSP) and Speech Data Compression," Proc. ICASSP 84, pp. 1.10.1-1.10.4. Pitch and multipulse trains are described in detail in, for example, Furui, "Acoustic and Speech Engineering," Kindai Kagaku Sha, and excitation codebooks in, for example, Ono, "Recent progress in speech coding technology," Journal of the Acoustical Society of Japan, Vol. 48, No. 1, pp. 52-59 (1992).
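
As an illustration of the analysis attributed to the voice analyzer 101, the following is a minimal sketch of per-frame LPC analysis by the autocorrelation method with the Levinson-Durbin recursion. The function name `lpc_analysis`, the Hamming window, and the sign convention of the PARCOR coefficients are assumptions made for this example; the source does not specify them.

```python
import numpy as np

def lpc_analysis(frame, order):
    """Hypothetical helper: return (a, parcor, residual) for one frame.

    a        : prediction-error filter [1, a1, ..., aM]
    parcor   : reflection (PARCOR) coefficients k1..kM (sign convention assumed)
    residual : frame passed through the prediction-error filter
    """
    frame = np.asarray(frame, dtype=float)
    windowed = frame * np.hamming(len(frame))
    # Autocorrelation at lags 0..order
    r = np.array([np.dot(windowed[:len(windowed) - k], windowed[k:])
                  for k in range(order + 1)])
    a = np.zeros(order + 1)
    a[0] = 1.0
    parcor = np.zeros(order)
    err = r[0] + 1e-12  # small floor avoids division by zero on silence
    for m in range(1, order + 1):
        # Levinson-Durbin step for model order m
        acc = r[m] + np.dot(a[1:m], r[m - 1:0:-1])
        k = -acc / err
        parcor[m - 1] = k
        a_prev = a.copy()
        a[1:m] = a_prev[1:m] + k * a_prev[m - 1:0:-1]
        a[m] = k
        err *= (1.0 - k * k)
    # Inverse filtering: e[n] = x[n] + sum_j a[j] * x[n - j]
    residual = np.convolve(frame, a)[:len(frame)]
    return a, parcor, residual
```

With the autocorrelation method every PARCOR coefficient has magnitude below 1, which is one practical reason such coefficients are convenient as an envelope representation.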

[0012] Other analysis methods such as cepstrum analysis, PSE analysis, or the wavelet transform may also be used, since they too are techniques for separating and extracting the frequency-domain characteristics of the voice. With cepstrum analysis or PSE analysis, for example, the cepstrum coefficients extracted by a lifter can be used as the spectral envelope and the remainder as the residual signal. These analysis methods are well known. Cepstrum analysis is described in detail in, for example, L. R. Rabiner and R. W. Schafer, Japanese translation by Suzuki (鈴木久喜), "Digital Signal Processing of Speech (Vols. 1 and 2)," Corona Publishing. PSE analysis is described in, for example, Nakajima and Suzuki, "Power spectrum envelope (PSE) speech analysis and synthesis system," Journal of the Acoustical Society of Japan, Vol. 44, No. 11, pp. 824-832 (1988), and the wavelet transform in Kawahara, "Application of wavelet analysis to auditory research," Journal of the Acoustical Society of Japan, Vol. 47, No. 6, pp. 424-429 (1991). In the following description of this embodiment, LPC analysis is used as the voice analysis method, PARCOR coefficients are used as the spectral envelope, and the residual signal itself is used as the residual signal.

[0013] Reference numeral 102 denotes a residual band widener that nonlinearly distorts the narrowband residual signal separated by the voice analyzer 101 to obtain a wideband residual signal. To apply the nonlinear distortion, the sampling frequency is first doubled by inserting a zero after every sample of the narrowband residual signal, and an aliasing filter is then applied to create a sampling-expanded residual signal free of aliasing distortion. A wideband residual signal can then be generated from the sampling-expanded residual signal in various ways: by setting to 0 only those samples whose absolute value is at or below a fixed value; by setting to 0 only those samples whose value is 0 or less; or by inverting the sign of only those samples whose value is 0 or less. Any of these can be used, since each generates a wideband residual signal from the narrowband residual signal. A wideband residual signal can also be generated by clipping: a sample whose absolute value is at or above a fixed value is replaced by that fixed value if its sign is positive, and by the negated fixed value if its sign is negative. In addition, although the sound quality is somewhat inferior, simply inserting a zero after every sample of the narrowband residual signal produces a pseudo wideband residual signal through aliasing distortion; this gives a similar effect and can also be used. Various methods of nonlinear distortion are thus conceivable, and their effects differ considerably from one another.
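
The nonlinear operations listed in this paragraph can be sketched as follows. The function names and the `threshold` parameter are hypothetical labels chosen for this example; the source gives no numeric value for the "fixed value".

```python
import numpy as np

def center_clip(x, threshold):
    """Set to 0 only the samples whose absolute value is <= threshold."""
    x = np.asarray(x, dtype=float)
    return np.where(np.abs(x) <= threshold, 0.0, x)

def half_wave(x):
    """Set to 0 only the samples whose value is <= 0."""
    x = np.asarray(x, dtype=float)
    return np.where(x <= 0.0, 0.0, x)

def full_wave(x):
    """Invert the sign of only the samples whose value is <= 0."""
    return np.abs(np.asarray(x, dtype=float))

def peak_clip(x, threshold):
    """Replace samples beyond +/- threshold by +threshold or -threshold."""
    return np.clip(np.asarray(x, dtype=float), -threshold, threshold)
```

Each function is memoryless and sample-wise, so it creates harmonics of the components already present, which is how energy appears above the original band.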

[0014] Reference numeral 103 denotes an envelope band widener that converts the narrowband spectral envelope of the input voice, separated by the voice analyzer 101, into a wideband spectral envelope using a mapping function obtained in advance from narrowband and wideband spectral envelopes extracted from a large amount of training data. This embodiment uses a linear mapping as the mapping function, but a quadratic transformation or a nonlinear transformation such as a neural network may be used instead, since these likewise convert the narrowband spectrum directly into a wideband spectrum. Quadratic transformation is described in, for example, F. Class, A. Kaltenmeier, P. Regel, and K. Trottler, "Fast speaker adaptation for speech recognition systems," Proc. IEEE ICASSP, pp. 133-136 (Apr. 1990), and transformation by neural network in, for example, Iso, Aso, Yoshida, and Watanabe, "Speaker adaptation by neural network," Proceedings of the Acoustical Society of Japan, 1-6-16 (1989.3). The training data may be, for example, varied utterances of a single standard speaker. Using data from several speakers is useful for creating a mapping function that is robust to variation in speakers' utterances.

[0015] Reference numeral 104 denotes a voice synthesizer that synthesizes, from the wideband residual signal and the wideband spectral envelope obtained by the residual band widener 102 and the envelope band widener 103 respectively, a wideband synthesized voice having a wide frequency band that includes the frequency band of the input signal. The synthesis method is determined by what is used as the wideband residual signal and which feature quantity is used as the wideband spectral envelope. For example, when a residual signal or pitch is used as the wideband residual signal and linear prediction coefficients are used as the wideband spectral envelope, linear predictive synthesis may be used. Cepstrum synthesis, PSE synthesis, and the like can also be used. It is better still to smooth the spectrum of the wideband residual signal output by the residual band widener 102 before synthesis in the voice synthesizer 104. As one example of such smoothing, performing low-order linear prediction analysis on the wideband residual signal flattens its spectrum and reduces the unwanted spectral unevenness introduced when the distortion is generated. A process related to smoothing is phase equalization; applying it to the wideband residual signal further improves the sound quality.

[0016] Reference numeral 105 denotes a filter for extracting, from the wideband synthesized voice, the frequency components outside the frequency band of the input voice, and 106 denotes a voice superposer that superposes the waveforms of the extracted out-of-band components and the input voice on the time axis to synthesize a wideband voice having a wide frequency band that includes the frequency band of the input voice.

[0017] The first embodiment of the present invention is described in detail below with reference to the block diagram of FIG. 1.

[0018] First, when voice that has passed through a telephone line, a band-limiting filter, or the like is input to the voice analyzer 101, the voice analyzer 101 extracts the M-th order PARCOR coefficients P_i(M) at each fixed time interval i. Taking the sampling frequency of the narrowband voice to be 8 kHz (bandwidth 4 kHz), for example, the fixed time interval is 160 samples (20 ms), and this unit of time is called a frame. For the wideband voice after band widening, the sampling frequency is 16 kHz (bandwidth 8 kHz), and a frame may be 320 samples (20 ms).

[0019] Next, the residual band widener 102 nonlinearly distorts the narrowband residual signal separated by the voice analyzer 101 to obtain a wideband residual signal. First, the sampling frequency is doubled by inserting a zero after every sample of the narrowband residual signal (upsampling). A filter that extracts only the signal components in the original frequency band (the aliasing filter) is prepared in advance, and the upsampled residual signal is passed through it to create a sampling-expanded residual signal free of aliasing distortion. If the original residual signal has 160 samples per frame, for example, the sampling-expanded residual signal has 320. A wideband residual signal can then be generated, for example, by setting to 0 only those samples of the sampling-expanded residual signal whose absolute value is at or below a fixed value.
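
The upsampling step described here (zero insertion followed by a filter that keeps only the original band) might be sketched as below. The windowed-sinc design, the tap count, and the name `upsample_by_two` are assumptions made for illustration; the source does not say how the aliasing filter is designed.

```python
import numpy as np

def upsample_by_two(x, num_taps=63):
    """Hypothetical helper: double the sampling rate of x.

    Inserts a zero after every sample, then applies a low-pass filter whose
    cutoff is the original Nyquist frequency (a quarter of the new rate).
    """
    x = np.asarray(x, dtype=float)
    up = np.zeros(2 * len(x))
    up[::2] = x  # zero insertion: x0, 0, x1, 0, ...
    # Hamming-windowed sinc with cutoff 0.25 cycles per output sample
    n = np.arange(num_taps) - (num_taps - 1) / 2
    h = 0.5 * np.sinc(0.5 * n) * np.hamming(num_taps)
    h *= 2.0 / np.sum(h)  # gain of 2 compensates for the inserted zeros
    # mode="same" keeps the length and removes the filter delay (odd tap count)
    return np.convolve(up, h, mode="same")
```

A 160-sample frame at 8 kHz becomes a 320-sample frame at 16 kHz, matching the frame sizes given in the text. Frame-to-frame filter state is ignored in this sketch.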

[0020] Meanwhile, the envelope band widener 103 converts the narrowband spectral envelope of the input voice, separated by the voice analyzer 101, into a wideband spectral envelope. First, a large amount of training voice data prepared in advance is filtered to create narrowband training voice. The filter simulates, for example, the characteristics of a telephone line or the low-pass filter used in digitization. This processing allows the input voice and the training voice to be treated as data obtained under the same conditions. The same training data, left unfiltered, serves as the wideband training voice. This procedure is shown in FIG. 2.

[0021] FIG. 3 shows the envelope band widener 103 in detail. The narrowband data processor 301 extracts narrowband spectral envelopes from the narrowband training voice obtained above, in the same way as the voice analyzer 101. Similarly, the wideband data processor 302 extracts wideband spectral envelopes from the wideband training voice. The mapping function estimator 303 then estimates the relationship between the extracted narrowband and wideband spectral envelopes as a mapping function. This embodiment uses a linear mapping as the mapping function to convert narrowband spectral information into wideband spectral information. Specifically, the mapping function {A} is estimated by minimizing the squared error between the band-widened spectral envelope z_i, obtained by converting the input spectral envelope x_i, and the target wideband spectral envelope y_i. That is, {A} is obtained by minimizing the objective function given by (Equation 1) over all frames of all the training data.

[0022]

(Equation 1)
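
The body of Equation 1 does not survive in this text. A form consistent with the surrounding description (a squared error between the target envelope y_i and the converted envelope z_i, summed over all frames i of all training data) would be the following; the symbol E for the objective function is an assumption.

```latex
E = \sum_{i} \left\lVert y_i - z_i \right\rVert^{2}
```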

[0023] Here, {A} is an M × M matrix, and y_i and z_i are M-dimensional vectors.

[0024] Because the wideband and narrowband spectral envelopes used in this estimation are obtained from the same training voice data, they correspond exactly one-to-one frame by frame. Reference numeral 304 denotes an envelope band widener that converts the narrowband spectral envelope x_i of the i-th frame of the input voice, extracted by the voice analyzer 101, into the band-widened spectral envelope z_i using the mapping function {A}. Specifically, the conversion is performed according to (Equation 2).
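
One way to estimate and apply such a linear mapping is ordinary least squares over paired training frames. The sketch below assumes the mapping is a plain matrix product with no offset term and that envelopes are stored one frame per row; the names `estimate_mapping` and `widen_envelope` are hypothetical.

```python
import numpy as np

def estimate_mapping(X, Y):
    """Return the M x M matrix A minimizing sum_i ||y_i - A x_i||^2.

    X : (frames, M) narrowband envelopes, one frame per row (assumed layout)
    Y : (frames, M) wideband envelopes for the same frames
    """
    X = np.asarray(X, dtype=float)
    Y = np.asarray(Y, dtype=float)
    # Row form of the problem: X @ A.T ~= Y, solved in the least-squares sense
    A_T, *_ = np.linalg.lstsq(X, Y, rcond=None)
    return A_T.T

def widen_envelope(A, x):
    """Convert one narrowband envelope x into a band-widened envelope A x."""
    return A @ np.asarray(x, dtype=float)
```

Because the narrowband and wideband frames come from the same recordings, row i of `X` and row i of `Y` describe the same instant, which is what makes this direct regression possible.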

[0025]

(Equation 2)
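
The body of Equation 2 is likewise missing here. Given that {A} is an M × M matrix and that x_i and z_i are the narrowband and band-widened envelopes of frame i, a consistent form, assuming a pure matrix product with no offset term, would be:

```latex
z_i = A \, x_i
```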

[0026] Next, the voice synthesizer 104 synthesizes voice by linear predictive synthesis from the wideband residual signal and the wideband spectral envelope obtained by the residual band widener 102 and the envelope band widener 103, respectively. The synthesis method is determined by what is used as the wideband residual signal and which feature quantity is used as the wideband spectral envelope. For example, when a residual signal or pitch is used as the wideband residual signal and linear prediction coefficients are used as the wideband spectral envelope, linear predictive synthesis may be used. Cepstrum synthesis, PSE synthesis, and the like can be used in the same way.
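
Linear predictive synthesis from PARCOR coefficients can be sketched in two steps: a step-up recursion that converts the reflection coefficients to direct-form LPC coefficients, and an all-pole filter driven by the residual. The function names and the sign convention (prediction-error filter A(z) = 1 + a1 z^-1 + ... + aM z^-M) are assumptions for this example, and filter state is not carried across frames.

```python
import numpy as np

def parcor_to_lpc(parcor):
    """Step-up recursion: reflection coefficients -> [1, a1, ..., aM]."""
    a = np.array([1.0])
    for k in parcor:
        a_ext = np.append(a, 0.0)
        # a_new[j] = a[j] + k * a[m - j], with a[m] = 0 before the update
        a = a_ext + k * a_ext[::-1]
    return a

def lpc_synthesis(residual, a):
    """All-pole filter 1/A(z): s[n] = e[n] - sum_j a[j] * s[n - j]."""
    residual = np.asarray(residual, dtype=float)
    order = len(a) - 1
    out = np.zeros(len(residual))
    for n in range(len(residual)):
        acc = residual[n]
        for j in range(1, min(order, n) + 1):
            acc -= a[j] * out[n - j]
        out[n] = acc
    return out
```

The synthesis filter is stable only if every reflection coefficient has magnitude below 1, so an envelope produced by a mapping may need checking before it is used this way.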

[0027] Next, the filter 105 extracts from the wideband synthesized voice the frequency components outside the frequency band of the input voice. An FIR filter, an IIR filter, or any other filter may be used, since each serves equally to cut out of the wideband synthesized voice the frequency components outside the frequency band of the input voice.

[0028] Finally, the voice superposer 106 superposes the waveforms of the extracted out-of-band components, such as low-band and high-band components, and the input voice on the time axis to synthesize a wideband voice having a wide frequency band that includes the frequency band of the input voice.
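
The out-of-band extraction and the time-domain superposition can be sketched with a linear-phase FIR band-stop filter. The band edges of 300 Hz and 3400 Hz come from the telephone band mentioned in the text; the windowed-sinc design, the tap count, the function names, and the assumption that the input voice has already been brought to the 16 kHz rate are all choices made for this example.

```python
import numpy as np

def lowpass_taps(cutoff_hz, fs, num_taps):
    """Hamming-windowed sinc low-pass filter (hypothetical helper)."""
    n = np.arange(num_taps) - (num_taps - 1) / 2
    fc = cutoff_hz / fs
    return 2 * fc * np.sinc(2 * fc * n) * np.hamming(num_taps)

def out_of_band_taps(low_hz, high_hz, fs, num_taps=255):
    """Band-stop filter: unit impulse minus a band-pass for low_hz..high_hz."""
    band_pass = (lowpass_taps(high_hz, fs, num_taps)
                 - lowpass_taps(low_hz, fs, num_taps))
    taps = -band_pass
    taps[(num_taps - 1) // 2] += 1.0
    return taps

def superpose(input_up, synth, low_hz=300.0, high_hz=3400.0, fs=16000.0):
    """Add the out-of-band part of synth to the input voice (same rate)."""
    taps = out_of_band_taps(low_hz, high_hz, fs)
    out_band = np.convolve(synth, taps, mode="same")  # zero-delay for odd taps
    return np.asarray(input_up, dtype=float) + out_band
```

Keeping the original input inside its own band and taking only the missing bands from the synthesized voice means synthesis errors cannot degrade the part of the spectrum that was transmitted intact.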

[0029] As described above, the configuration of this embodiment provides a voice band expanding method and device that can accurately convert a band-limited input voice into a wideband voice having a wide frequency band with a relatively simple configuration.

[0030] Next, a second embodiment of the present invention is described. This embodiment has basically the same configuration as the first embodiment (FIG. 1), except that one part, the residual band widener 102, is replaced with a different configuration.

[0031] FIG. 4 is a block diagram showing the residual band widener of this embodiment in detail. First, the voice analyzer 101 analyzes the band-limited input voice frame by frame and separates it into a narrowband residual signal and a narrowband spectral envelope. The main difference from the first embodiment, and the most characteristic feature of this embodiment, is that when the separated narrowband residual signal is nonlinearly distorted to obtain a wideband residual signal, it is first filtered by the pitch filter 401, before the nonlinear distortion is applied, so that only the frequency band containing the fundamental frequency component of the voice is cut out of the residual signal.

[0032] Next, the distortion generator 402 generates nonlinear distortion to create the wideband residual signal. Extracting only the fundamental frequency component of the narrowband residual signal (the pitch-containing residual signal) and generating nonlinear distortion from it reduces the unwanted distortion that would arise from signal components other than the pitch, so that a high-quality wideband residual signal can be created. The pitch filter is a filter used to cut out only the frequency band containing the fundamental frequency component of the voice; one example is a low-pass filter with a cutoff frequency near 800 Hz. The cutoff frequency can be chosen freely.
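
A minimal sketch of this pitch-filter-then-distort arrangement follows. The 800 Hz cutoff is the example value from the text; the windowed-sinc design, the 16 kHz rate, the tap count, the function names, and the choice of full-wave rectification as the distortion are assumptions made for illustration.

```python
import numpy as np

def pitch_filter(x, fs=16000.0, cutoff_hz=800.0, num_taps=127):
    """Low-pass filter keeping only the band that contains the fundamental."""
    n = np.arange(num_taps) - (num_taps - 1) / 2
    fc = cutoff_hz / fs
    h = 2 * fc * np.sinc(2 * fc * n) * np.hamming(num_taps)
    h /= np.sum(h)  # unity gain at DC
    return np.convolve(np.asarray(x, dtype=float), h, mode="same")

def widen_residual(x, fs=16000.0):
    """Pitch filter followed by a nonlinear distortion (here: rectification)."""
    pitched = pitch_filter(x, fs)
    return np.abs(pitched)  # creates harmonics of the retained low band
```

Since the distortion sees only the low band, the harmonics it creates are multiples of the pitch-related components rather than intermodulation products of the whole residual.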

【００３３】以下、包絡広帯域化器１０３で、音声分析
器１０１により分離された入力音声の狭帯域スペクトル
包絡を広帯域スペクトル包絡へと変換し、さらに得られ
た広帯域残差信号と広帯域スペクトル包絡とから音声合
成により入力信号の周波数帯域を包含する広い周波数帯
域を有する広帯域合成音声を合成し、さらに広帯域合成
音声から入力音声の持つ周波数帯域以外の周波数成分を
フィルタにより抽出し、入力音声とを時間軸上で波形重
畳して、入力音声の持つ周波数帯域を包含する広い周波
数帯域を有する広帯域音声を求める部分は第１の実施例
と同様である。The remaining steps are the same as in the first embodiment: the envelope band broadening device 103 converts the narrowband spectrum envelope of the input voice, separated by the voice analyzer 101, into a wideband spectrum envelope; voice synthesis from the resulting wideband residual signal and wideband spectrum envelope produces a wideband synthesized voice whose frequency band includes that of the input signal; a filter extracts from the wideband synthesized voice the frequency components outside the frequency band of the input voice; and these components are waveform-superimposed on the input voice on the time axis to obtain a wideband voice whose frequency band includes that of the input voice.

【００３４】次に、本発明の第３の実施例について説明
する。本実施例は第１の実施例（図１）と基本的な構成
は同様であり、共通する部分については、詳しい説明は
割愛する。本実施例が第１の実施例と大きく異なる部分
は、第１の実施例ではフィルタ１０５において、広帯域
合成音声から入力音声の持つ周波数帯域以外の周波数成
分をフィルタにより抽出し、音声重畳器１０６で入力音
声にそのまま時間軸上で波形重畳していたのに対し、本
実施例では、フィルタにより抽出された成分と入力音声
とにそれぞれ一定の比を掛けた後、時間軸上で波形重畳
しているところにある。Next, a third embodiment of the present invention will be described. Its basic configuration is the same as that of the first embodiment (FIG. 1), and a detailed description of the common parts is omitted. The main difference is as follows: in the first embodiment, the filter 105 extracts from the wideband synthesized voice the frequency components outside the frequency band of the input voice, and the voice superimposing unit 106 superimposes them on the input voice as they are on the time axis; in the present embodiment, the components extracted by the filter and the input voice are each multiplied by a fixed ratio before being waveform-superimposed on the time axis.

【００３５】以下、本発明の第３の実施例について詳細
に説明する。まず、音声分析器１０１により入力音声を
フレーム毎に音声分析して、狭帯域残差信号と狭帯域ス
ペクトル包絡とを分離し、残差広帯域化器１０２により
狭帯域残差信号から広帯域残差信号を発生させる部分
は、第１の実施例と同様である。本実施例では、包絡広
帯域化器１０３の部分が第１の実施例と異なる。The third embodiment of the present invention will be described in detail below. As in the first embodiment, the voice analyzer 101 analyzes the input voice frame by frame to separate it into the narrowband residual signal and the narrowband spectrum envelope, and the residual band broadening device 102 generates the wideband residual signal from the narrowband residual signal. In this embodiment, the envelope band broadening device 103 differs from that of the first embodiment.

【００３６】図５はこの包絡広帯域化器１０３の部分を
詳しく示したブロック構成図である。以下、図５のブロ
ック構成図を参照しながら説明する。まず５０１は、あ
らかじめ多量の学習用音声データから抽出しておいた狭
帯域スペクトル包絡を作成する狭帯域データ作成器であ
り、さらに５０２はこの狭帯域スペクトル包絡と時間的
に対応づけされた広帯域スペクトル包絡を作成する狭帯
域データ作成器である。次に、狭帯域符号帳作成器５０
３により狭帯域スペクトル包絡をいくつかの類似したス
ペクトル包絡毎に分類しておき、代表コード毎を求めて
おく。そして写像関数推定器５０４により、代表コード
毎に狭帯域スペクトル包絡から広帯域スペクトル包絡を
導く写像関数を推定する。推定方法は、第１の実施例と
同様である。そして、実際の入力音声の狭帯域スペクト
ル包絡がどの代表コードに近いかをコード判定器５０５
により判定し、包絡広帯域化器５０６により、最も近い
コードと対応する写像関数を用いて広帯域スペクトル包
絡へと変換する。FIG. 5 is a block diagram showing the envelope band broadening device 103 in detail. The following description refers to the block diagram of FIG. 5. First, 501 is a narrowband data generator that creates narrowband spectrum envelopes extracted in advance from a large amount of learning voice data, and 502 is a narrow band data generator that creates the wideband spectrum envelopes temporally associated with these narrowband spectrum envelopes. Next, the narrowband codebook generator 503 classifies the narrowband spectrum envelopes into groups of similar spectrum envelopes and obtains a representative code for each group. The mapping function estimator 504 then estimates, for each representative code, a mapping function that derives the wideband spectrum envelope from the narrowband spectrum envelope. The estimation method is the same as in the first embodiment. Finally, the code determiner 505 determines which representative code the narrowband spectrum envelope of the actual input voice is closest to, and the envelope band broadening unit 506 converts it into a wideband spectrum envelope using the mapping function corresponding to the closest code.

【００３７】例えば、実際に狭帯域スペクトル包絡をベ
クトル量子化するには、まずｊフレーム目の狭帯域スペ
クトル包絡ｘ_jに対するｋ番目のコードＶ_k（コード数
Ｌ）に対する量子化歪Ｄ_jkは、（数３）で計算され
る。For example, to vector-quantize the narrowband spectrum envelope in practice, the quantization distortion D_jk of the narrowband spectrum envelope x_j of the j-th frame with respect to the k-th code V_k (of L codes in total) is first calculated by (Equation 3).

【００３８】[0038]

【数３】 (Equation 3)

【００３９】ただし、ｘ_j、Ｖ_kはＭ次元のベクトル（Ｍ
次元の特徴量）である。そして、この歪が最も小さいコ
ードがｊフレーム目の狭帯域スペクトル包絡に対する代
表コードになる。Here, x_j and V_k are M-dimensional vectors (M-dimensional feature quantities). The code with the smallest distortion becomes the representative code for the narrowband spectrum envelope of the j-th frame.
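
The code selection described above can be sketched as follows. Equation 3 itself is not reproduced in this text, so the distortion measure used here, the squared Euclidean distance between the two M-dimensional vectors, is an assumption (it is a common choice for vector quantization); the function names are hypothetical.

```python
def quantization_distortion(x_j, v_k):
    """Assumed distortion D_jk: squared Euclidean distance between x_j and V_k."""
    if len(x_j) != len(v_k):
        raise ValueError("x_j and V_k must have the same dimension M")
    return sum((a - b) ** 2 for a, b in zip(x_j, v_k))

def nearest_code(x_j, codebook):
    """Return (k, D_jk) for the code V_k with the smallest distortion."""
    distortions = [quantization_distortion(x_j, v_k) for v_k in codebook]
    k = min(range(len(codebook)), key=distortions.__getitem__)
    return k, distortions[k]
```

With a codebook of L codes, `nearest_code(x_j, codebook)` returns the index of the representative code for frame j; that index would then select the mapping function used for the envelope conversion.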

【００４０】このように、狭帯域スペクトル包絡を類似
したスペクトルのグループにクラスタリングし、各グル
ープを代表的に表現するいくつかの代表コードを求める
方法は、ベクトル量子化法（Y.Linde, A.buzo and R.M.
Gray:"An algorithm for vector quantizer design",IE
EE Trans.Commun.,COM-28,1,pp.84-95(Jan.1980)）と呼
ばれ、多量のデータの特徴を少ないデータで効率的に表
現することが可能となる。さらにクラスタリング方法と
して別の方法を用いても何等差し支えない。This method of clustering the narrowband spectrum envelopes into groups of similar spectra and obtaining representative codes, each of which stands for one group, is called vector quantization (Y. Linde, A. Buzo and R. M. Gray: "An algorithm for vector quantizer design", IEEE Trans. Commun., COM-28, 1, pp. 84-95 (Jan. 1980)); it makes it possible to express the characteristics of a large amount of data efficiently with a small amount of data. Another clustering method may be used instead without any problem.
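
As a rough illustration of how such a codebook might be trained, the sketch below grows a codebook by repeatedly splitting each code into two and refining the result with nearest-neighbour reassignment and centroid updates. It is written in the spirit of the splitting procedure usually associated with the cited reference, not as a faithful reproduction of it; the perturbation size, the fixed iteration count, and the assumption that the codebook size is a power of two are all choices made for the example.

```python
def squared_distance(a, b):
    return sum((p - q) ** 2 for p, q in zip(a, b))

def centroid(vectors):
    dim = len(vectors[0])
    return [sum(v[d] for v in vectors) / len(vectors) for d in range(dim)]

def design_codebook(training, num_codes, epsilon=0.01, iterations=20):
    """Grow a codebook by splitting; num_codes is assumed to be a power of two."""
    codebook = [centroid(training)]
    while len(codebook) < num_codes:
        # Split every code into a slightly perturbed pair.
        codebook = [[c * (1 + s * epsilon) + s * epsilon for c in code]
                    for code in codebook for s in (1, -1)]
        for _ in range(iterations):
            # Assign each training vector to its nearest code.
            clusters = [[] for _ in codebook]
            for v in training:
                k = min(range(len(codebook)),
                        key=lambda i: squared_distance(v, codebook[i]))
                clusters[k].append(v)
            # Move each code to the centroid of its cluster; empty clusters keep their code.
            codebook = [centroid(c) if c else code
                        for c, code in zip(clusters, codebook)]
    return codebook
```

Each entry of the returned list plays the role of one representative code V_k for a group of similar narrowband spectrum envelopes.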

【００４１】以上のようにして求めた広帯域残差信号と
広帯域スペクトル包絡から音声合成器１０４により、入
力音声の持つ周波数帯域を包含する広い周波数帯域を有
する広帯域合成音声を合成する。この部分は第１の実施
例と同様である。From the wide band residual signal and the wide band spectrum envelope obtained as described above, the voice synthesizer 104 synthesizes a wide band synthesized voice having a wide frequency band including the frequency band of the input voice. This part is the same as in the first embodiment.

【００４２】さらに本実施例では、第１の実施例のフィ
ルタ１０５および音声重畳器１０６の部分が異なる。図
６はこのフィルタおよび音声重畳器の部分を詳しく示し
たブロック構成図である。以下、図６を参照しながら説
明する。Furthermore, in this embodiment the filter 105 and the voice superimposing unit 106 differ from those of the first embodiment. FIG. 6 is a block diagram showing the filter and the voice superimposing unit in detail. The following description refers to FIG. 6.

【００４３】まず、６０１は広帯域合成音声から入力音
声の持つ周波数帯域以外の周波数成分のうち低域成分の
みを低域通過フィルタにより抽出する低域フィルタであ
り、６０２は広帯域合成音声から入力音声の持つ周波数
帯域以外の周波数成分のうち高域成分のみを高域通過フ
ィルタにより抽出する高域フィルタである。なお、ここ
では、広帯域合成音声から入力音声の持つ周波数帯域以
外の周波数成分を抽出するフィルタとして低域、高域の
２種類を考えたが、３つ以上のフィルタを用いても何等
差し支えない。First, 601 is a low-band filter that uses a low-pass filter to extract from the wideband synthesized voice only the low-frequency part of the frequency components outside the frequency band of the input voice, and 602 is a high-band filter that uses a high-pass filter to extract from the wideband synthesized voice only the high-frequency part of those components. Two filters, one for the low band and one for the high band, are considered here for extracting the frequency components outside the frequency band of the input voice, but three or more filters may be used without any problem.

【００４４】６０３は、この低域成分および高域成分と
入力音声とを時間軸上で重畳する音声重畳器であるが、
本実施例ではコード判定器５０３において最も近いコー
ドと判定されたコードの内容に応じて、あらかじめ決定
しておいた比率で低域成分および高域成分と入力音声と
を重畳する機能を有する。この際、その比率として、例
えば、コードが摩擦音等の無声音を表すコードの場合
は、低域、高域、入力音声の比率を０.５:１:１.５のよ
うにして、高域を強調するようにし、コードが母音等の
有声音を表すコードの場合は、低域、高域、入力音声の
比率を１.５:１:０.５のようにして、低域を強調するよ
うにして、入力音声と波形重畳することで、コードすな
わち音声スペクトルの形状に応じて、的確に入力音声の
広帯域化が可能となる。Reference numeral 603 denotes a voice superimposing unit that superimposes the low-frequency component and the high-frequency component on the input voice on the time axis. In this embodiment it superimposes the low-frequency component, the high-frequency component, and the input voice at a ratio determined in advance according to the content of the code that the code determiner 503 judged to be the closest. For example, when the code represents an unvoiced sound such as a fricative, the ratio of low band, high band, and input voice is set to something like 0.5:1:1.5 so that the high band is emphasized; when the code represents a voiced sound such as a vowel, the ratio of low band, high band, and input voice is set to something like 1.5:1:0.5 so that the low band is emphasized. Superimposing the waveforms on the input voice in this way broadens the band of the input voice appropriately for the code, that is, for the shape of the voice spectrum.
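
The weighted superposition can be sketched as below. The sketch reads the ratios the way claim 19 states them, with the low and high bands scaled relative to the input voice (0.5 and 1.5 times for unvoiced sounds, 1.5 and 0.5 times for voiced sounds); that reading, the class labels, and the function name are assumptions made for illustration. How each code is labelled as voiced or unvoiced is not specified here and is left outside the sketch.

```python
# Hypothetical gain table: (low-band gain, high-band gain), relative to the input voice.
BAND_GAINS = {
    "unvoiced": (0.5, 1.5),  # e.g. fricatives: emphasise the high band
    "voiced": (1.5, 0.5),    # e.g. vowels: emphasise the low band
}

def superimpose(low_band, high_band, input_voice, sound_class):
    """Weighted sample-by-sample sum of the two out-of-band components and the input voice."""
    if not (len(low_band) == len(high_band) == len(input_voice)):
        raise ValueError("all three signals must have the same length")
    g_low, g_high = BAND_GAINS[sound_class]
    return [g_low * lo + g_high * hi + x
            for lo, hi, x in zip(low_band, high_band, input_voice)]
```

Because the three signals are aligned on the time axis, the superposition is a per-sample weighted sum; only the weights change from frame to frame.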

【００４５】次に、本発明の第４の実施例について説明
する。本実施例は、第３の実施例と共通する部分は多
く、共通する部分については、詳しい説明は割愛する。
本実施例が第３の実施例と異なる部分は、第３の実施例
では音声重畳器６０３で低域成分および高域成分と入力
音声とを時間軸上で重畳する際、コード判定器５０３に
おいて最も近いコードと判定されたコードの内容に応じ
て、あらかじめ決定しておいた比率で低域成分、高域成
分と入力音声とを重畳しているのに対し、本実施例で
は、コードの代わりに入力音声の有声性に応じた比率を
掛けた後、時間軸上で波形重畳しているところにある。Next, a fourth embodiment of the present invention will be described. This embodiment has much in common with the third embodiment, and a detailed description of the common parts is omitted. The difference is as follows: in the third embodiment, when the voice superimposing unit 603 superimposes the low-frequency and high-frequency components on the input voice on the time axis, it does so at a ratio determined in advance according to the content of the code that the code determiner 503 judged to be the closest; in the present embodiment, the components are instead multiplied by ratios that depend on the voicedness of the input voice, rather than on the code, before being waveform-superimposed on the time axis.

【００４６】以下、本発明の第４の実施例について詳細
に説明する。まず、音声分析器１０１により入力音声を
フレーム毎に音声分析して、狭帯域残差信号と狭帯域ス
ペクトル包絡とを分離し、残差広帯域化器１０２により
狭帯域残差信号から広帯域残差信号を発生させ、包絡広
帯域化器１０３により狭帯域スペクトル包絡から広帯域
スペクトル包絡を推定する。さらに音声合成器１０４に
より、入力音声の持つ周波数帯域を包含する広い周波数
帯域を有する広帯域合成音声を合成する。ここまでの部
分は、第３の実施例と同様である。The fourth embodiment of the present invention will be described in detail below. First, the voice analyzer 101 analyzes the input voice frame by frame to separate it into the narrowband residual signal and the narrowband spectrum envelope, the residual band broadening device 102 generates the wideband residual signal from the narrowband residual signal, and the envelope band broadening device 103 estimates the wideband spectrum envelope from the narrowband spectrum envelope. The voice synthesizer 104 then synthesizes a wideband synthesized voice whose frequency band includes that of the input voice. Up to this point the processing is the same as in the third embodiment.

【００４７】しかし本実施例では、第３の実施例の音声
重畳器１０６の部分が異なる。図７はこの音声重畳器お
よびフィルタの部分を詳しく示したブロック構成図であ
る。以下、図７を参照しながら説明する。In this embodiment, however, the voice superimposing unit 106 differs from that of the third embodiment. FIG. 7 is a block diagram showing the voice superimposing unit and the filter in detail. The following description refers to FIG. 7.

【００４８】音声合成器１０４により合成された広帯域
合成音声から低域フィルタ７０１により入力音声の持つ
周波数帯域以外の周波数成分のうち低域成分のみを抽出
し、さらに高域フィルタ７０２により、入力音声の持つ
周波数帯域以外の周波数成分のうち高域成分のみを抽出
する。この部分は第３の実施例と同様である。From the wideband synthesized voice produced by the voice synthesizer 104, the low-band filter 701 extracts only the low-frequency part of the frequency components outside the frequency band of the input voice, and the high-band filter 702 extracts only the high-frequency part of those components. This part is the same as in the third embodiment.

【００４９】次に、本実施例では、まず入力音声から抽
出した狭帯域残差信号および狭帯域スペクトル包絡を用
いて、有声性判定器７０３により入力音声の有声性を求
める。さらに、この有声性の割合にに応じて、あらかじ
め決定しておいた比率で低域成分、高域成分と入力音声
とを音声重畳器７０４により重畳する。この際その比率
として、例えば、有声性が摩擦音等の無声音を表す場合
は、低域、高域、入力音声の比率を０.５:１:１.５のよ
うにして、高域を強調するようにし、有声性が母音等の
有声音を表す場合は、低域、高域、入力音声の比率を
１.５:１:０.５のようにして、低域を強調するように
し、入力音声と波形重畳することで、有声性すなわちス
ペクトルの形状に応じて、的確に入力音声の広帯域化が
可能となる。入力音声の有声性としては、たとえば入力
音声の自己相関係数を求め、さらに０次の係数すなわち
パワーで割った正規化自己相関係数を求め、０次の係数
以外の値の中で最大となる係数がピッチに相当するピッ
チ係数と判定し、このピッチ係数を有声性として用いる
ことで実現することができる。Next, in this embodiment, the voicedness determiner 703 first obtains the voicedness of the input voice from the narrowband residual signal and the narrowband spectrum envelope extracted from the input voice. The voice superimposing unit 704 then superimposes the low-frequency component, the high-frequency component, and the input voice at a ratio determined in advance according to the degree of voicedness. For example, when the voicedness indicates an unvoiced sound such as a fricative, the ratio of low band, high band, and input voice is set to something like 0.5:1:1.5 so that the high band is emphasized; when the voicedness indicates a voiced sound such as a vowel, the ratio is set to something like 1.5:1:0.5 so that the low band is emphasized. Superimposing the waveforms on the input voice in this way broadens the band of the input voice appropriately for the voicedness, that is, for the shape of the spectrum. The voicedness of the input voice can be obtained, for example, as follows: compute the autocorrelation coefficients of the input voice, divide them by the zeroth-order coefficient (the power) to obtain normalized autocorrelation coefficients, take the largest coefficient other than the zeroth-order one as the pitch coefficient corresponding to the pitch, and use this pitch coefficient as the voicedness.
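
The voicedness measure described above can be sketched as follows. The text only says to take the largest normalized autocorrelation coefficient other than the zeroth-order one; the `min_lag` and `max_lag` parameters, which restrict the search to a plausible range of pitch lags, and the handling of silent frames are assumptions added for the example.

```python
def pitch_coefficient(frame, min_lag=1, max_lag=None):
    """Largest normalised autocorrelation coefficient over the lags min_lag..max_lag."""
    n = len(frame)
    if max_lag is None:
        max_lag = n - 1
    r0 = sum(x * x for x in frame)  # zeroth-order coefficient (power)
    if r0 == 0.0:
        return 0.0                  # silent frame: treated here as unvoiced
    best = -1.0
    for lag in range(min_lag, max_lag + 1):
        r = sum(frame[i] * frame[i + lag] for i in range(n - lag))
        best = max(best, r / r0)
    return best
```

A strongly periodic frame yields a coefficient close to 1 and a noise-like frame a much smaller one, so the value could be mapped onto the superposition ratios, for instance by interpolating between the unvoiced and voiced settings. That mapping is not specified in the text.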

【００５０】以上のように、本実施例の構成によれば、
帯域制限された入力音声を広い周波数帯域を有する広帯
域音声に正確に変換することができる音声帯域拡大装置
を提供することができる。As described above, the configuration of this embodiment provides a voice band expanding device that can accurately convert a band-limited input voice into a wideband voice having a wide frequency band.

【００５１】このように、本発明の実施例の音声帯域拡
大装置によれば、特定の周波数帯域に帯域制限された入
力音声を一定フレーム毎に音声分析して狭帯域残差信号
と狭帯域スペクトル包絡とに分離する音声分析器１０１
と、狭帯域残差信号から広帯域残差信号を発生させる残
差広帯域化器１０２と、狭帯域スペクトル包絡から広帯
域スペクトル包絡を推定する包絡広帯域化器１０３と、
広帯域残差信号と広帯域スペクトル包絡とから広帯域合
成音声を合成する音声合成器１０４と、広帯域合成音声
から入力音声の持つ周波数帯域以外の帯域外成分を抽出
するフィルタ１０５と、帯域外成分と入力音声とを時間
軸上で波形重畳して、入力音声の持つ周波数帯域を包含
する周波数帯域を有する広帯域音声を合成する音声重畳
器１０６とを備えたことにより、簡単な構成でしかも高
精度な音声帯域拡大装置を提供することができる。As described above, the voice band expanding device of the embodiments of the present invention comprises: the voice analyzer 101, which analyzes an input voice band-limited to a specific frequency band at fixed frame intervals and separates it into a narrowband residual signal and a narrowband spectrum envelope; the residual band broadening device 102, which generates a wideband residual signal from the narrowband residual signal; the envelope band broadening device 103, which estimates a wideband spectrum envelope from the narrowband spectrum envelope; the voice synthesizer 104, which synthesizes a wideband synthesized voice from the wideband residual signal and the wideband spectrum envelope; the filter 105, which extracts from the wideband synthesized voice the out-of-band components outside the frequency band of the input voice; and the voice superimposing unit 106, which waveform-superimposes the out-of-band components and the input voice on the time axis to synthesize a wideband voice whose frequency band includes that of the input voice. This provides a voice band expanding device that is simple in configuration and yet highly accurate.

【００５２】[0052]

【発明の効果】以上の実施例から明らかなように、本発
明によれば、特定の周波数帯域に帯域制限された入力音
声を一定フレーム毎に音声分析して狭帯域残差信号と狭
帯域スペクトル包絡とに分離し、前記狭帯域残差信号か
ら広帯域残差信号を発生させ、前記狭帯域スペクトル包
絡から広帯域スペクトル包絡を推定し、前記広帯域残差
信号と前記広帯域スペクトル包絡から線形予測合成法を
用いて、入力音声の持つ周波数帯域を包含する広い周波
数帯域を有する広帯域合成音声を合成し、前記広帯域合
成音声から入力音声の持つ周波数帯域以外の周波数成分
をフィルタにより抽出し、抽出された前記帯域外成分と
入力音声とを時間軸上で波形重畳して、入力音声の持つ
周波数帯域を包含する広い周波数帯域を有する広帯域音
声を合成するように構成しているので、比較的簡単な構
成で、帯域制限された入力音声を、入力音声の持つ周波
数帯域を包含するような広い周波数帯域を有する広帯域
信号に正確に変換することができる。As is apparent from the above embodiments, according to the present invention, an input voice band-limited to a specific frequency band is analyzed at fixed frame intervals and separated into a narrowband residual signal and a narrowband spectrum envelope; a wideband residual signal is generated from the narrowband residual signal; a wideband spectrum envelope is estimated from the narrowband spectrum envelope; a wideband synthesized voice whose frequency band includes that of the input voice is synthesized from the wideband residual signal and the wideband spectrum envelope by a linear prediction synthesis method; the frequency components outside the frequency band of the input voice are extracted from the wideband synthesized voice by a filter; and the extracted out-of-band components are waveform-superimposed on the input voice on the time axis to synthesize a wideband voice whose frequency band includes that of the input voice. With this relatively simple configuration, a band-limited input voice can be converted accurately into a wideband signal whose frequency band includes that of the input voice.

[Brief description of the drawings]

【図１】本発明の一実施例の音声帯域拡大装置の全体構
成を示すブロック図FIG. 1 is a block diagram showing the overall configuration of a voice band expansion device according to an embodiment of the present invention.

【図２】実施例において学習用帯域音声を生成する処理
手順を示す図FIG. 2 is a diagram showing a processing procedure for generating learning band speech in the embodiment.

【図３】本発明の第１の実施例における包絡広帯域化器
の構成を示すブロック図FIG. 3 is a block diagram showing a configuration of an envelope broadband device according to the first embodiment of the present invention.

【図４】本発明の第２の実施例における残差広帯域化器
の構成を示すブロック図FIG. 4 is a block diagram showing a configuration of a residual band broadening device according to a second embodiment of the present invention.

【図５】本発明の第３の実施例における包絡広帯域化器
の構成を示すブロック図FIG. 5 is a block diagram showing a configuration of an envelope broadening device according to a third embodiment of the present invention.

【図６】本発明の第３の実施例におけるフィルタおよび
音声重畳器のブロック図FIG. 6 is a block diagram of a filter and a voice superimposing device according to a third embodiment of the present invention.

【図７】本発明の第４の実施例におけるフィルタおよび
音声重畳器のブロック図FIG. 7 is a block diagram of a filter and a voice superimposing device according to a fourth embodiment of the present invention.

[Explanation of symbols]

１０１音声分析器１０２残差広帯域化器１０３包絡広帯域化器１０４音声合成器１０５フィルタ１０６音声重畳器 101: voice analyzer; 102: residual band broadening device; 103: envelope band broadening device; 104: voice synthesizer; 105: filter; 106: voice superimposing unit

Claims

[Claims]

1. A voice band expanding method for converting an input voice band-limited to a specific frequency band into a wideband voice having a wide frequency band that includes the frequency band of the input voice, characterized in that: the input voice is subjected to linear prediction analysis at fixed frame intervals and separated into a narrowband residual signal and a narrowband spectrum envelope; a wideband residual signal is generated from the narrowband residual signal by nonlinear processing; the narrowband spectrum envelope is converted into a wideband spectrum envelope; and a wideband voice having a wide frequency band that includes the frequency band of the input voice is synthesized from the wideband residual signal and the wideband spectrum envelope by a linear prediction synthesis method.

2. The voice band expanding method according to claim 1, wherein, as the method of generating the wideband residual signal from the narrowband residual signal by nonlinear processing, a zero is inserted for each sample of the narrowband residual signal to double the sampling frequency and thereby generate the wideband residual signal.

3. The voice band expanding method according to claim 1, wherein, as the method of generating the wideband residual signal from the narrowband residual signal by nonlinear processing: the sampling frequency is doubled by inserting a zero for each sample of the narrowband residual signal; aliasing filtering is then applied to create a band-expanded residual signal in which only the bandwidth of the narrowband residual signal is expanded; and the wideband residual signal is generated by changing each signal value of the band-expanded residual signal whose absolute value is equal to or greater than a fixed positive value to that fixed value if its sign is positive, and to the sign-inverted fixed value if its sign is negative.

4. The voice band expanding method according to claim 1, wherein, as the method of generating the wideband residual signal from the narrowband residual signal by nonlinear processing: the sampling frequency is doubled by inserting a zero for each sample of the narrowband residual signal; aliasing filtering is then applied to create a band-expanded residual signal in which only the bandwidth of the narrowband residual signal is expanded; and the wideband residual signal is generated by changing to 0 only those signal values of the band-expanded residual signal whose absolute value is equal to or less than a fixed value.

5. The voice band expanding method according to claim 1, wherein, as the method of generating the wideband residual signal from the narrowband residual signal by nonlinear processing: the sampling frequency is doubled by inserting a zero for each sample of the narrowband residual signal; aliasing filtering is then applied to create a band-expanded residual signal in which only the bandwidth of the narrowband residual signal is expanded; and the wideband residual signal is generated by changing to 0 only those signal values of the band-expanded residual signal that are 0 or less.

6. The voice band expanding method according to claim 1, wherein, as the method of generating the wideband residual signal from the narrowband residual signal by nonlinear processing: the sampling frequency is doubled by inserting a zero for each sample of the narrowband residual signal; aliasing filtering is then applied to create a band-expanded residual signal in which only the bandwidth of the narrowband residual signal is expanded; and the wideband residual signal is generated by inverting the sign of only those signal values of the band-expanded residual signal that are 0 or less.

7. The voice band expanding method according to claim 1, wherein, as the method of estimating, from the narrowband spectrum envelope obtained by voice analysis of the input voice band-limited to a specific frequency band, a wideband spectrum envelope having a wide frequency band that includes the frequency band of the input voice: narrowband spectrum envelopes having the same frequency band as the input voice, extracted in advance from a large amount of learning voice data, and temporally associated wideband spectrum envelopes having a wide frequency band that includes the frequency band of the input voice are used to estimate in advance, for each group of similar spectrum envelopes, a mapping function that derives the wideband spectrum envelope from the narrowband spectrum envelope; and the mapping function is used to estimate, from the narrowband spectrum envelope of the input voice, the wideband spectrum envelope having a wide frequency band that includes the frequency band of the input voice.

8. A voice band expanding device for converting an input voice band-limited to a specific frequency band into a wideband voice having a wide frequency band that includes the frequency band of the input voice, comprising: a voice analyzer that performs linear prediction analysis on the input voice at fixed frame intervals to separate it into a narrowband residual signal and a narrowband spectrum envelope; a residual band expander that doubles the sampling frequency by inserting a zero for each sample of the narrowband residual signal and then performs aliasing filtering to create a band-expanded residual signal in which only the bandwidth of the narrowband residual signal is expanded; a pitch filter that filters the band-expanded residual signal to extract a pitch-containing residual signal, in which only the frequency band where the fundamental frequency component of the voice exists is cut out of the band-expanded residual signal; a residual band broadening device that generates a wideband residual signal from the pitch-containing residual signal; an envelope band broadening device that estimates a wideband spectrum envelope from the narrowband spectrum envelope; and a voice synthesizer that synthesizes, from the wideband residual signal and the wideband spectrum envelope by a linear prediction synthesis method, a wideband voice having a wide frequency band that includes the frequency band of the input voice.

9. The voice band expanding device according to claim 8, wherein the residual band broadening device generates the wideband residual signal by changing each signal value of the pitch-containing residual signal whose absolute value is equal to or greater than a fixed value to that fixed value if its sign is positive, and to the sign-inverted fixed value if its sign is negative.

10. The voice band expanding device according to claim 8, wherein the residual band broadening device generates the wideband residual signal by changing to 0 only those signal values of the pitch-containing residual signal whose absolute value is equal to or less than a fixed value.

11. The voice band expanding device according to claim 8, wherein the residual band broadening device generates the wideband residual signal by changing to 0 only those signal values of the pitch-containing residual signal that are 0 or less.

12. The voice band expanding device according to claim 8, wherein the residual band broadening device generates the wideband residual signal by inverting the sign of only those signal values of the pitch-containing residual signal that are 0 or less.

13. An envelope band broadening device estimates in advance, for each group of similar spectrum envelopes, a mapping function that derives the wideband spectrum envelope from the narrowband spectrum envelope, using narrowband spectrum envelopes having the same frequency band as the input voice, extracted in advance from a large amount of learning voice data, and temporally associated wideband spectrum envelopes having a wide frequency band that includes the frequency band of the input voice, and uses the mapping function to convert the narrowband spectrum envelope of the input voice into a wideband spectrum envelope having a wide frequency band that includes the frequency band of the input voice. The voice band expansion device according to any one of 1.

14. A filter for extracting an out-of-band component of a frequency other than the frequency band of the input voice by a filter from the wide-band synthesized voice synthesized by the voice synthesizer, and the extracted out-of-band component and the input voice. 13. A voice superimposing device for synthesizing a wide-band voice having a wide frequency band including a frequency band of an input voice by superimposing a waveform on a time axis, and further comprising a voice superimposing device. A voice band expansion device according to claim 1.

15. A low-pass filter for extracting only a low-frequency component of a frequency component other than a frequency band of an input voice from a wide-band synthesized voice synthesized by a voice synthesizer by a low-pass filter, and the wide-band synthesized voice. From the high-pass filter that extracts only the high-frequency component of the frequency components other than the frequency band of the input voice by the high-pass filter, and the extracted low-pass component, the high-frequency component and the input voice are constant. 13. A voice superimposing device for synthesizing a wide-band voice having a wide frequency band including a frequency band of an input voice by superimposing a waveform on a time axis by a ratio. The voice band expansion device according to any one of the above.

16. An envelope broadening device, when estimating a wide band spectrum envelope from a narrow band spectrum envelope, has a narrow band spectrum envelope having the same frequency band as the input voice extracted from a large amount of learning voice data in advance. To create a codebook by classifying each of several similar spectral envelopes, and further obtain a representative code representative of each codebook, and a narrowband spectral envelope extracted from the learning voice data, and Using the wideband spectrum envelope having a wide frequency band including the frequency band of the input speech temporally associated with the narrowband spectrum envelope extracted from the learning voice data, the narrowband spectrum envelope to the wideband The mapping function for deriving the spectrum envelope is estimated for each representative code, and the narrow band spectrum envelope of the input speech is calculated. A wideband spectrum having a wide frequency band including the frequency band of the input voice is determined by determining which of the representative codes is closest, determining the most similar code as a similar code, and using the mapping function corresponding to the similar code. The speech synthesizer uses a linear predictive synthesis method from the wideband residual signal and the wideband spectrum envelope to synthesize a wideband synthesized speech having a wide frequency band including the frequency band of the input speech. Further, a low-pass filter for extracting only a low-frequency component of the frequency components other than the frequency band of the input voice from the wide-band synthesized voice by a low-pass filter, and the input voice from the wide-band synthesized voice. 
The high-pass filter that extracts only the high-frequency components of the frequency components other than the frequency band of the When the output low-frequency component, the high-frequency component and the input voice are waveform-superimposed on the time axis, the low-frequency component, the aforesaid ratio at a ratio determined in advance according to the content of the similar code. 9. A voice superimposing device for synthesizing a wide band voice having a wide frequency band including a frequency band of the input voice by superposing a high frequency component and the input voice. The voice band expansion device according to claim 12.

17. The voice band expanding device according to claim 16, wherein the voice superimposing unit, if the similar code represents an unvoiced sound such as a fricative, sets the low band to 0.5 times the input voice and the high band to 1.5 times the input voice so as to emphasize the high band, and, if the similar code represents a voiced sound such as a vowel, sets the low band to 1.5 times the input voice and the high band to 0.5 times the input voice so as to emphasize the low band, and waveform-superimposes them on the input voice.

18. The voice band expanding device according to any one of claims 8 to 12, comprising:
an envelope broadening device that, when estimating the wideband spectrum envelope from the narrowband spectrum envelope, in advance extracts narrowband spectrum envelopes having the same frequency band as the input voice from a large amount of learning voice data, classifies them into groups of similar spectrum envelopes to create a codebook, and obtains a representative code representing each codebook; estimates, for each representative code, a mapping function for deriving the wideband spectrum envelope from the narrowband spectrum envelope, using the narrowband spectrum envelopes extracted from the learning voice data and the wideband spectrum envelopes temporally associated with them, which have a wide frequency band including the frequency band of the input voice; determines which of the representative codes the narrowband spectrum envelope of the input voice is closest to and takes the most similar code as the similar code; and uses the mapping function corresponding to the similar code to estimate a wideband spectrum envelope having a wide frequency band including the frequency band of the input voice;
a voice synthesizer that uses linear predictive synthesis to synthesize, from the wideband residual signal and the wideband spectrum envelope, a wideband synthesized voice having a wide frequency band including the frequency band of the input voice;
a low-pass filter that extracts from the wideband synthesized voice only the low-frequency component among the frequency components outside the frequency band of the input voice;
a high-pass filter that extracts from the wideband synthesized voice only the high-frequency component among the frequency components outside the frequency band of the input voice; and
a voice superimposing device that, when superimposing the waveforms of the output low-frequency component, the high-frequency component, and the input voice on the time axis, obtains the voicedness of the input voice in advance from the narrowband residual signal and superimposes the low-frequency component, the high-frequency component, and the input voice at a ratio according to the voicedness, thereby synthesizing a wideband voice having a wide frequency band including the frequency band of the input voice.
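The envelope estimation step of claim 18 (find the representative code closest to the input envelope, then apply that code's mapping function) can be sketched as follows. This is only an illustration under stated assumptions: the claim does not fix the distance measure or the form of the mapping function, so squared Euclidean distance and a linear (matrix) mapping are assumed here, and all function and variable names are hypothetical.

```python
from typing import List, Sequence

Vector = Sequence[float]


def squared_distance(a: Vector, b: Vector) -> float:
    # Assumed similarity measure: squared Euclidean distance.
    return sum((x - y) ** 2 for x, y in zip(a, b))


def find_similar_code(envelope: Vector, representative_codes: Sequence[Vector]) -> int:
    # Index of the representative code closest to the input narrowband envelope.
    return min(
        range(len(representative_codes)),
        key=lambda i: squared_distance(envelope, representative_codes[i]),
    )


def apply_mapping(envelope: Vector, matrix: Sequence[Vector]) -> List[float]:
    # Assumed mapping function: one matrix per representative code,
    # applied as a matrix-vector product.
    return [sum(w * x for w, x in zip(row, envelope)) for row in matrix]


def estimate_wideband_envelope(
    envelope: Vector,
    representative_codes: Sequence[Vector],
    mapping_matrices: Sequence[Sequence[Vector]],
) -> List[float]:
    # Select the similar code, then map narrowband to wideband with its matrix.
    code = find_similar_code(envelope, representative_codes)
    return apply_mapping(envelope, mapping_matrices[code])
```

For example, with two representative codes and one mapping matrix per code, `estimate_wideband_envelope(envelope, codes, matrices)` returns a vector whose length equals the number of rows in the selected matrix.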

19. The voice band expanding device according to claim 18, wherein the voice superimposing device, if the input voice is a phoneme close to an unvoiced sound such as a fricative, sets the low range to 0.5 times the input voice and the high range to 1.5 times the input voice so as to emphasize the high range, and, if the input voice is close to a voiced sound such as a vowel, sets the low range to 1.5 times the input voice and the high range to 0.5 times the input voice so as to emphasize the low range, and superimposes the waveforms on the input voice.
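The weighting rule of claim 19 can be illustrated with a short sketch. It assumes that "0.5 times" and "1.5 times" act as gain factors applied to the extracted low-frequency and high-frequency components before they are added sample by sample to the input voice; the claim wording leaves room for other readings. The function names and the boolean voiced/unvoiced decision are hypothetical.

```python
from typing import List, Sequence, Tuple


def band_gains(is_voiced: bool) -> Tuple[float, float]:
    # Returns (low_gain, high_gain).
    # Voiced (e.g. vowel): emphasize the low range.
    # Unvoiced (e.g. fricative): emphasize the high range.
    return (1.5, 0.5) if is_voiced else (0.5, 1.5)


def superimpose(
    input_voice: Sequence[float],
    low_band: Sequence[float],
    high_band: Sequence[float],
    is_voiced: bool,
) -> List[float]:
    # Sample-by-sample superposition on the time axis (assumed reading).
    low_gain, high_gain = band_gains(is_voiced)
    return [
        x + low_gain * lo + high_gain * hi
        for x, lo, hi in zip(input_voice, low_band, high_band)
    ]
```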

20. The voice band expanding device according to claim 17, wherein the voice superimposing device obtains the autocorrelation coefficients of the input voice as the voicedness of the input voice, obtains normalized autocorrelation coefficients by dividing them by the zero-order coefficient, that is, the power, determines the largest of the coefficients other than the zero-order coefficient to be the pitch coefficient corresponding to the pitch, and superimposes the waveforms on the input voice using the pitch coefficient multiplied by an appropriate value.
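The pitch coefficient described in claim 20 can be computed as in the following sketch. The function names and the `max_lag` search range are assumptions made for illustration; the claim does not say which lags are searched or what the "appropriate value" is.

```python
from typing import List, Sequence


def autocorrelation(signal: Sequence[float], max_lag: int) -> List[float]:
    # r[lag] = sum over t of signal[t] * signal[t + lag], for lag = 0..max_lag.
    n = len(signal)
    return [
        sum(signal[t] * signal[t + lag] for t in range(n - lag))
        for lag in range(max_lag + 1)
    ]


def pitch_coefficient(signal: Sequence[float], max_lag: int) -> float:
    # Largest normalized autocorrelation coefficient excluding lag 0.
    if max_lag < 1:
        raise ValueError("max_lag must be at least 1")
    r = autocorrelation(signal, max_lag)
    if r[0] == 0:
        return 0.0  # silent frame: treat as having no voicedness
    normalized = [value / r[0] for value in r]  # divide by the power r[0]
    return max(normalized[1:])
```

A strongly periodic frame gives a value close to 1, while a noise-like frame gives a value close to 0, which is why the coefficient can serve as a measure of voicedness.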

21. The voice band expanding device according to any one of claims 8, 14, 15, 16 and 18, wherein an FIR filter is used as the filter.

22. The voice band expanding device according to any one of claims 8, 14, 15, 16 and 18, wherein an IIR filter is used as the filter.
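The difference between the two filter types named in claims 21 and 22 can be shown with a minimal sketch. The coefficients and function names below are hypothetical and are not taken from the patent; a practical device would use filters designed for the required band edges.

```python
from typing import List, Sequence


def fir_filter(x: Sequence[float], taps: Sequence[float]) -> List[float]:
    # FIR: each output is a weighted sum of current and past inputs only.
    return [
        sum(taps[k] * x[n - k] for k in range(len(taps)) if n - k >= 0)
        for n in range(len(x))
    ]


def one_pole_iir(x: Sequence[float], a: float) -> List[float]:
    # IIR: each output also depends on the previous output (feedback).
    # y[n] = a * x[n] + (1 - a) * y[n - 1], with y[-1] = 0.
    y: List[float] = []
    previous = 0.0
    for sample in x:
        previous = a * sample + (1.0 - a) * previous
        y.append(previous)
    return y
```

An FIR filter of this kind has a finite impulse response, whereas the feedback term in the IIR filter makes its response decay indefinitely, usually at a lower computational cost for a comparable cutoff.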