JP3540988B2

JP3540988B2 - Sounding body directivity correction method and device

Info

Publication number: JP3540988B2
Application number: JP2000215545A
Authority: JP
Inventors: 健司清原 (Kenji Kiyohara); 賢一古家 (Kenichi Furuya)
Original assignee: Nippon Telegraph and Telephone Corp
Current assignee: Nippon Telegraph and Telephone Corp
Priority date: 2000-07-17
Filing date: 2000-07-17
Publication date: 2004-07-07
Anticipated expiration: 2020-07-17
Also published as: JP2002031674A

Description

[0001]
[Technical field of the invention]
The present invention relates to a method and an apparatus for correcting the directivity of a sounding body, such as a talker, when the output signals of a microphone array composed of a plurality of microphones are processed to pick up a target sound with a high SN ratio. In particular, it relates to a method and apparatus that:

- monitor, in the frequency domain, the outputs of a plurality of microphones located close to the sounding body (hereinafter, "nearby microphone outputs");
- detect, as the front of the sounding body, the direction of the microphone whose nearby microphone output has the highest high-frequency level in the spectral envelope of its frequency characteristic;
- detect the differences between that frequency characteristic and the frequency characteristics of the other microphones; and
- correct those differences.
[0002]
[Prior art]
In recent years, advances in multimedia technology have made it possible to hold teleconferences, such as video conferences, in a hands-free (loudspeaker) mode using microphones and loudspeakers. Such conferences call for a sound pickup device with two properties. First, it should allow natural conversation without the participants being conscious of the microphones and without one microphone per talker on the conference table. Second, it should pick up only the target sound, such as speech.
[0003]
One example of such a device installs a plurality of microphones (a microphone array) and processes their outputs to extract the target sound. Many signal processing methods are known for suppressing noise and extracting a target sound with a microphone array, including the delay-and-sum method and AMNOR (see, for example, Oga, Yamazaki, and Kaneda, "Acoustic Systems and Digital Processing," The Institute of Electronics, Information and Communication Engineers, 1995, pp. 173-197). The delay-and-sum method, for instance, extracts the target sound as follows.
[0004]
FIG. 1 is a diagram for explaining the principle of target sound extraction by the delay-and-sum method.
In FIG. 1, the reference numerals and symbols are as follows:

- 1 is a sound pickup unit (microphone array);
- 2₁, 2₂, ..., 2_M are microphones (M is the number of microphones);
- 3₁, 3₂, ..., 3_M are delay units;
- 4 is an adder;
- 5 is the output signal;
- 6 is a noise suppression unit;
- d is the microphone spacing;
- s(t) is a sound wave arriving at the sound pickup unit 1 (t denotes time);
- θ is the angle at which s(t) arrives at the sound pickup unit 1.
[0005]
Assume that the microphones 2₁, 2₂, ..., 2_M in FIG. 1 lie on a straight line at equal spacing d, and that the sound wave s(t) arrives at them from a distant source at angle θ.

After reaching microphone 2₁, the wavefront must travel a further distance d sin θ to reach microphone 2₂; this distance follows from the spacing d and the arrival angle θ (FIG. 1). Similarly, the extra distance to the i-th microphone 2_i (i = 2, ..., M) is (i−1) d sin θ.

Taking microphone 2₁ as the reference, the delay time τ_i until the wave reaches microphone 2_i (i = 2, ..., M) is this distance divided by the speed of sound c, as given by equation (1).
[0006]
$$\tau_i = \frac{(i-1)\,d\sin\theta}{c} \qquad (1)$$
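As a quick numerical check of equation (1), the sketch below computes τ_i for a uniform linear array. The speed of sound (343 m/s), the 5 cm spacing and the 30° arrival angle are illustrative assumptions, not values taken from this patent.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s; assumed value for air at room temperature


def arrival_delays(num_mics: int, spacing_m: float, theta_deg: float,
                   c: float = SPEED_OF_SOUND) -> list:
    """Return tau_1..tau_M of equation (1), in seconds, relative to microphone 2_1."""
    theta = math.radians(theta_deg)
    # tau_i = (i - 1) * d * sin(theta) / c for i = 1..M
    return [(i - 1) * spacing_m * math.sin(theta) / c for i in range(1, num_mics + 1)]


# Four microphones 5 cm apart, plane wave arriving from 30 degrees.
taus = arrival_delays(4, 0.05, 30.0)
print([round(t * 1e6, 1) for t in taus])  # microseconds -> [0.0, 72.9, 145.8, 218.7]
```

The delays grow linearly with the microphone index, which is what makes a single steering angle θ sufficient to describe a plane wave.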
Let X_i(t) denote the output signal of microphone 2_i (i = 1, ..., M). This output is the sound wave s(t) delayed by τ_i, with τ_1 = 0 for the reference microphone, so it is given by equation (2).
$$X_i(t) = s(t - \tau_i) \qquad (2)$$
It is shown below that, with a suitable delay D_i for each delay unit 3_i (i = 1, ..., M), the output signal 5 emphasizes only the sound wave arriving from direction θ.
[0007]
The delay D_i of each delay unit 3_i (i = 1, ..., M) is set as in equation (3).
$$D_i = D_0 - \tau_i \qquad (3)$$
Here, D_0 is a fixed delay added to every channel. Without it, a very small τ_i would reduce the accuracy with which the delay characteristic can be realized by a digital filter.
The output of each delay unit 3_i (i = 1, ..., M) is the signal of equation (2) delayed by the D_i of equation (3). It is therefore given by equation (4).
[0008]
$$X_i(t - D_i) = s\bigl(t - (D_0 - \tau_i) - \tau_i\bigr) = s(t - D_0) \qquad (4)$$
That is, regardless of the microphone index i, every delay-unit output is the same signal: s(t) delayed by D_0.
When the adder 4 sums these phase-aligned signals, the sound wave arriving from direction θ is reinforced by the summation.

A sound wave from a different direction θ_N behaves differently. It reaches the microphones with delay times τ_N that differ from τ_i, so the delays of equation (3) do not bring it into phase. As a result, it is not reinforced when the adder 4 sums the signals.
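The phase-alignment argument of equations (3) and (4) can be checked numerically for a single-frequency plane wave. The sketch below applies each delay D as the phase factor exp(−j2πfD). The 8-microphone array, 4 cm spacing, 2 kHz tone, D_0 = 1 ms and 343 m/s speed of sound are assumed values chosen only for illustration.

```python
import numpy as np

C = 343.0  # speed of sound in m/s (assumed)


def tau(theta_deg, m, d, c=C):
    """tau_i of equation (1) for i = 1..m."""
    return np.arange(m) * d * np.sin(np.radians(theta_deg)) / c


def das_gain(src_deg, steer_deg, m=8, d=0.04, f=2000.0, d0=1e-3):
    """Normalized delay-and-sum output amplitude for a tone of frequency f.

    The tone arrives from src_deg; the delays D_i = D_0 - tau_i (equation (3))
    are set for steer_deg. A delay D is applied as the phase factor exp(-j 2 pi f D).
    """
    total_delay = tau(src_deg, m, d) + (d0 - tau(steer_deg, m, d))  # tau_i + D_i
    phasors = np.exp(-2j * np.pi * f * total_delay)
    return abs(phasors.sum()) / m  # 1.0 when all channels are in phase


on_target = das_gain(30.0, 30.0)    # equation (4): every channel becomes s(t - D_0)
off_target = das_gain(-20.0, 30.0)  # wave from theta_N = -20 deg, array steered to 30 deg
print(round(on_target, 3), round(off_target, 3))
```

When the source and steering directions agree, all phasors coincide and the normalized gain is 1. For the off-axis wave, the phasors spread around the unit circle and largely cancel.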
[0009]
In this way, the delay-and-sum method emphasizes sound waves arriving from the desired direction θ and relatively suppresses noise arriving from other directions θ_N.
The same property can be used to locate the target talker. If the steering direction θ is scanned while the array output is monitored, the output becomes large when θ points toward the talker, so the talker's direction can be found.

The array is then steered toward that direction θ: the signals are phase-aligned according to equation (4) and summed, which emphasizes the sound wave from the talker. In this way, the target sound is picked up with a high SN ratio.
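The direction search described above can be sketched with a broadband test signal. In this sketch:

- delays are applied as linear phase shifts in the FFT domain, which treats the signal as periodic;
- the 16 kHz sampling rate, the 8-microphone 4 cm array, the white-noise stand-in for speech and the 1° scan grid are all assumptions for illustration, not parameters from the patent.

```python
import numpy as np

FS = 16000  # sampling rate in Hz (assumed)
C = 343.0   # speed of sound in m/s (assumed)


def delay_fft(x, delay_s, fs=FS):
    """Delay x by delay_s seconds (circularly) via a linear phase shift in the FFT domain."""
    freqs = np.fft.rfftfreq(len(x), 1 / fs)
    return np.fft.irfft(np.fft.rfft(x) * np.exp(-2j * np.pi * freqs * delay_s), n=len(x))


def taus(theta_deg, m, d, c=C):
    """tau_i of equation (1) for i = 1..m."""
    return np.arange(m) * d * np.sin(np.radians(theta_deg)) / c


def scan_direction(mic_signals, d, angles_deg, d0=1e-3):
    """Return the steering angle whose delay-and-sum output power is largest."""
    m = len(mic_signals)
    powers = []
    for theta in angles_deg:
        # Delay each channel by D_i = D_0 - tau_i (equation (3)) and sum (adder 4).
        out = sum(delay_fft(x, d0 - t) for x, t in zip(mic_signals, taus(theta, m, d)))
        powers.append(np.mean(out ** 2))
    return angles_deg[int(np.argmax(powers))]


rng = np.random.default_rng(0)
s = rng.standard_normal(4096)                               # broadband stand-in for speech
mics = [delay_fft(s, t) for t in taus(25.0, m=8, d=0.04)]   # X_i(t) = s(t - tau_i)
angles = np.arange(-90, 91)
estimate = scan_direction(mics, d=0.04, angles_deg=angles)
print(estimate)
```

The output power peaks when every channel is brought into phase, that is, when the scanned angle equals the true arrival angle of 25°.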
[0010]
For convenience, the microphones have been described here as lying on a straight line at equal spacing d. They may, however, also be spaced unequally and arranged in two or three dimensions.
Consider a point-like sound source S located relatively close to the microphone array, as in FIG. 2. To improve the pickup SN ratio in this case, it is important to exploit the spherical-wave nature of the sound from S. Gain units 7₁, 7₂, ..., 7_M are placed after the delay units 3₁, 3₂, ..., 3_M, and these gain units are given appropriate weights (gains). One way of assigning the weights is given by equations (5), (6), and (7) below (Nomura, Kaneda, and Kojima, "Near-field microphone array," Journal of the Acoustical Society of Japan, Vol. 53, No. 2 (1997), pp. 110-116).
[0011]
(Equation 1)

Here, r_1, r_2, ..., r_M are the distances from the sound source S to the microphones 2₁, 2₂, ..., 2_M.

The quantity r_c is the critical distance of the room, that is, the distance at which the direct-sound power and the reverberant-sound power of the source are equal. For a room volume V (m³) and reverberation time T (s), it is given by r_c = √(0.0032 V/T) (H. Kuttruff, "Room Acoustics (Third Edition)," Elsevier Applied Science, 1991, pp. 100-132).

With these weights, the microphone array is most sensitive to the "point" at the position of the sound source S; in other words, a "focus" of sensitivity is formed. This focus can be scanned by varying two sets of parameters:

- the delays D_0 − r_i/c (c: speed of sound) of the delay units 3₁, 3₂, ..., 3_M, which depend on the distances r_i (i = 1, ..., M) to the microphones;
- the gain g_0 (that is, a) described above.

While the focus is scanned, the array output is monitored. The output becomes large when the focus points at the location of the target talker, so the talker's position can be found.
[0012]
In this way, the region where the target talker is present is found, as either a direction or a position. Steering the array directivity toward that region then allows the target sound to be picked up with a high SN ratio.
[0013]
[Problems to be solved by the invention]
When a person speaks, the voice is directed toward the front, and its high-frequency components are generally attenuated toward the rear (see, for example, "Hearing and Speech," edited by the Institute of Electronics and Communication Engineers of Japan, Corona Publishing, 1975, p. 236). FIG. 3 illustrates this with the directional frequency characteristics of the human mouth, measured relative to the front. Behind the talker, the level is attenuated relative to the front by about 5 dB at 500 Hz and about 10 dB at 4 kHz.

This causes a problem for the delay-and-sum array method of FIG. 2. When the focus is directed near the back of the talker's head, the array picks up a muffled sound lacking high frequencies.
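To put the FIG. 3 values in linear terms, an attenuation of A dB corresponds to an amplitude ratio of 10^(−A/20). The short sketch below applies this standard conversion to the approximate rear-versus-front values quoted above.

```python
def attenuation_to_amplitude_ratio(attenuation_db: float) -> float:
    """Convert a level drop in dB to a linear amplitude ratio (rear / front)."""
    return 10 ** (-attenuation_db / 20)


# Approximate rear-versus-front attenuations read from FIG. 3
for freq_hz, att_db in [(500, 5.0), (4000, 10.0)]:
    print(freq_hz, "Hz:", round(attenuation_to_amplitude_ratio(att_db), 3))
# 500 Hz: 0.562
# 4000 Hz: 0.316
```

Behind the talker, 500 Hz keeps more than half of its frontal amplitude, while 4 kHz keeps only about a third. This downward spectral tilt is what makes the rear pickup sound muffled.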
[0014]
[Means for Solving the Problems]
To solve the above problem, the invention provides the following means (see FIG. 4):

- sound source position detection means for detecting the position of a sounding body such as a talker;
- frequency characteristic monitoring means for monitoring, in the frequency domain, the outputs of the plurality of microphones located close to the sounding body (hereinafter, "nearby microphone outputs");
- sounding body directivity detection means for detecting, as the front of the sounding body, the direction of the microphone whose nearby microphone output has the highest high-frequency level in the spectral envelope of its frequency characteristic;
- spectrum difference detection means for detecting the differences between that output's spectral characteristic and those of the other microphone outputs; and
- spectrum difference correction means for correcting those differences.
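The front-detection step (the sounding body directivity detection means) can be sketched as "high-pass each nearby microphone signal and pick the one with the most high-frequency power". The patent states only that a high-pass filter is used and that the largest high-frequency power marks the front. The ideal FFT-domain high-pass filter, the 2 kHz cutoff and the toy directivity model below are assumptions for illustration.

```python
import numpy as np

FS = 16000  # sampling rate in Hz (assumed)


def highpass_power(x, cutoff_hz=2000.0, fs=FS):
    """Mean power of x above cutoff_hz, using an ideal FFT-domain high-pass filter."""
    spec = np.fft.rfft(x)
    spec[np.fft.rfftfreq(len(x), 1 / fs) < cutoff_hz] = 0.0
    return float(np.mean(np.fft.irfft(spec, n=len(x)) ** 2))


def detect_front(candidates, cutoff_hz=2000.0):
    """Index of the nearby-microphone signal with the largest high-frequency power."""
    return int(np.argmax([highpass_power(x, cutoff_hz) for x in candidates]))


def tilt(x, hf_loss_db, corner_hz=1000.0, fs=FS):
    """Toy directivity model: attenuate everything above corner_hz by hf_loss_db."""
    spec = np.fft.rfft(x)
    spec[np.fft.rfftfreq(len(x), 1 / fs) > corner_hz] *= 10 ** (-hf_loss_db / 20)
    return np.fft.irfft(spec, n=len(x))


rng = np.random.default_rng(1)
speech_like = rng.standard_normal(8192)
# Four nearby microphones that see the same talker with different high-frequency loss
candidates = [tilt(speech_like, loss) for loss in (8.0, 3.0, 0.0, 10.0)]
print(detect_front(candidates))  # -> 2, the microphone with no high-frequency loss
```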
[0015]
With the above configuration, the present invention avoids picking up a muffled sound lacking high frequencies. This holds even when the weighting gains computed in the delay-and-sum array method of FIG. 2 turn out to give high gains toward the back of the talker.
[0016]
[Embodiments of the invention]
Hereinafter, embodiments will be described with reference to the drawings.
FIG. 5 shows a first embodiment of the present invention.
In this figure, the microphone array 22 of the sounding body directivity correction device 21 is arranged two-dimensionally (in a plane) on a ceiling or the like. The device then operates as follows.

Focus scanning and source localization:

1. The focus control unit 11 scans the focus by varying the delay units 3 and the weighting units 7 applied to the outputs x_i(t) of the microphone array 22.
2. For each focus, the delay units 3 align the phases to produce signals y_i(t), and the weighting units 7 weight them to give g_i × y_i(t).
3. These outputs are sent to the power calculation unit 23, which computes the power (Σ g_i × y_i(t))².
4. The sound source position detection unit 24 compares the powers obtained for the individual foci and detects the focus with the maximum power as the sound source position.

Front candidate extraction:

5. The position information is sent to the delay units 3', which align the phases of the microphone outputs x_i(t) so that the focus points at the sound source position.
6. The phase-aligned signals y_i(t) are sent to the front candidate extraction unit 25.
7. The front candidate extraction unit 25 extracts the phase-aligned outputs of the microphones surrounding the sound source position as front candidate signals y_i'(t) and sends them to the high-pass filter 26. The remaining y_i(t) are sent to the weighting units 7''.

Front direction detection:

8. The signals y_i''(t) that have passed through the high-pass filter 26 are sent to a power calculation unit, which computes their high-frequency powers (y_i''(t))².
9. These powers are passed to the front direction detection unit 27, which detects the microphone output with the largest high-frequency power as the front direction.
10. The front direction detection unit 27 sends this front direction information to the front direction determination unit 30.

Spectral envelope computation:

11. Meanwhile, the frequency domain conversion unit 28 transforms the front candidate signals y_i'(t) by FFT (fast Fourier transform) into Y_i(ω).
12. The Y_i(ω) are sent to the spectral envelope extraction unit 29, which computes the spectral envelopes S_i(ω).
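Claim 4 states that the spectral envelope can be extracted from a low-order cepstrum. A common way to do this is cepstral smoothing, sketched below: take the log magnitude spectrum, keep only the lowest quefrencies of the real cepstrum, and transform back. The Hann window, the cutoff `n_lifter = 30`, the frame length and the test signal are assumptions for illustration, not values from the patent.

```python
import numpy as np


def cepstral_envelope(frame, n_lifter=30):
    """Spectral envelope (linear magnitude, rfft grid) from the low-order real cepstrum.

    n_lifter is the number of low-quefrency coefficients kept (must be >= 2).
    """
    if n_lifter < 2:
        raise ValueError("n_lifter must be at least 2")
    n = len(frame)
    spec = np.fft.rfft(frame * np.hanning(n))
    log_mag = np.log(np.abs(spec) + 1e-12)
    cepstrum = np.fft.irfft(log_mag, n=n)        # real, even sequence
    lifter = np.zeros(n)
    lifter[:n_lifter] = 1.0                      # low quefrencies ...
    lifter[n - n_lifter + 1:] = 1.0              # ... and their mirror images
    return np.exp(np.fft.rfft(cepstrum * lifter).real)


rng = np.random.default_rng(2)
N, FS = 1024, 16000  # frame length and sampling rate (assumed)
freqs = np.fft.rfftfreq(N, 1 / FS)
X = np.fft.rfft(rng.standard_normal(N))
X[freqs > 2000] *= 0.1                           # 20 dB less energy above 2 kHz
env = cepstral_envelope(np.fft.irfft(X, n=N))
print(env[freqs < 1000].mean() > env[freqs > 3000].mean())  # -> True
```

A smaller `n_lifter` gives a smoother envelope. A larger one follows the spectrum more closely, including its fine structure.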
Spectral correction and output:

13. Based on the front direction information sent from the front direction detection unit 27, the front direction determination unit 30 selects the front-direction spectral envelope S_0(ω) from among the S_i(ω).
14. The spectrum difference detection unit 31 computes D_i(ω) = S_0(ω)/S_i(ω).
15. The spectrum difference correction unit 32 performs the spectral correction by computing Z_i(ω) = Y_i(ω) × D_i(ω).
16. The time domain conversion unit 33 converts Z_i(ω) into time waveforms z_i(t) by inverse FFT.
17. The z_i(t) are sent to the weighting units 7', whose weights are based on the sound source position information from the sound source position detection unit 24, yielding g_i × z_i(t). The y_i(t) sent to the weighting units 7'' yield g_i × y_i(t).
18. Both sets of weighted signals are sent to the adder 4', summed, and delivered as the output 5.
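The correction D_i(ω) = S_0(ω)/S_i(ω), Z_i(ω) = Y_i(ω) × D_i(ω), followed by an inverse FFT, can be sketched as below. Several simplifications are assumptions for illustration rather than the patent's implementation:

- the whole signal is processed as one FFT block, with no framing or overlap-add;
- the envelope is a simple moving average of the magnitude spectrum (a stand-in for unit 29);
- the "rear" microphone is simulated by a 10 dB loss above 1 kHz.

```python
import numpy as np

FS = 16000  # sampling rate in Hz (assumed)


def smooth_envelope(mag, width=15):
    """Crude spectral envelope: moving average of |Y(omega)| (stand-in for unit 29)."""
    kernel = np.ones(width) / width
    return np.convolve(mag, kernel, mode="same") + 1e-12


def correct_toward_front(y_i, y_front):
    """Return z_i(t) with Z_i = Y_i * D_i and D_i = S_0 / S_i (units 31 to 33)."""
    Y_i = np.fft.rfft(y_i)
    S_i = smooth_envelope(np.abs(Y_i))
    S_0 = smooth_envelope(np.abs(np.fft.rfft(y_front)))
    D_i = S_0 / S_i                       # spectrum difference
    Z_i = Y_i * D_i                       # spectrum difference correction
    return np.fft.irfft(Z_i, n=len(y_i))  # back to a time waveform


def hf_power(x, cutoff_hz=2000.0, fs=FS):
    """Power of x above cutoff_hz, used here only to check the result."""
    spec = np.fft.rfft(x)
    spec[np.fft.rfftfreq(len(x), 1 / fs) < cutoff_hz] = 0.0
    return float(np.mean(np.fft.irfft(spec, n=len(x)) ** 2))


rng = np.random.default_rng(3)
y_front = rng.standard_normal(4096)
rear_spec = np.fft.rfft(y_front)
rear_spec[np.fft.rfftfreq(len(y_front), 1 / FS) > 1000] *= 10 ** (-10 / 20)  # -10 dB above 1 kHz
y_rear = np.fft.irfft(rear_spec, n=len(y_front))

z_rear = correct_toward_front(y_rear, y_front)
print(hf_power(y_rear) / hf_power(y_front), hf_power(z_rear) / hf_power(y_front))
```

Before correction, the rear signal has about a tenth of the frontal high-frequency power. After correction, the ratio returns to about one, because D_i restores the missing 10 dB wherever the envelopes differ.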
[0017]
In the above embodiment, the microphone array may be arranged three-dimensionally instead of two-dimensionally (in a plane) on the ceiling or the like.
[0018]
[Effect of the invention]
As described above, the present invention provides the following means:

- sound source position detection means for detecting the position of a sounding body;
- frequency characteristic monitoring means for monitoring, in the frequency domain, the outputs of the plurality of microphones located close to the sounding body (hereinafter, "nearby microphone outputs");
- sounding body directivity detection means for detecting, as the front of the sounding body, the direction of the microphone whose nearby microphone output has the highest high-frequency level in the spectral envelope of its frequency characteristic;
- spectrum difference detection means, which detects the differences from the frequency characteristics of the other microphones; and
- spectrum difference correction means, which corrects those differences.

Therefore, even if the weighting gains computed in the delay-and-sum array method of FIG. 2 raise the gain behind the talker, muffling of the high frequencies is prevented. The invention thus achieves an excellent effect not attained before.
[Brief description of the drawings]
FIG. 1 is a view for explaining the principle of noise suppression sound collection by a delay-and-sum method.
FIG. 2 is a diagram for explaining how, when a sound source is located close to a microphone array, appropriately setting the weights of the gain units that follow the delay units improves the pickup SN ratio.
FIG. 3 is a diagram for explaining directivity of a human mouth.
FIG. 4 is a diagram for explaining how to detect the directivity of a sounding body.
FIG. 5 is a configuration diagram showing an embodiment of the present invention.
[Explanation of symbols]
1 Sound pickup unit
2 Microphone
3 Delay unit
4 Adder (adding unit)
6 Noise suppression unit
7 Gain unit (weighting unit)
11 Focus control section
21 Sounding body directivity correction device
22 Microphone array
23 Power calculator
24 Sound source position detector
25 Front candidate extraction unit
26 High-pass filter
27 Front direction detector
28 Frequency domain converter
29 Spectral envelope extractor
30 Front direction determination unit
31 Spectrum difference detector
32 Spectrum difference corrector
33 Time domain converter

Claims

1. A sounding body directivity correction method for a sound pickup method using a microphone array composed of a plurality of microphones and a microphone array device that performs signal processing on the output signals of the microphone array,
characterized in that the position of a sounding body is detected from the output signals of the plurality of microphones; among the output signals of the microphones, the output signals of a plurality of microphones located close to the sounding body (hereinafter, "nearby microphone output signals") are monitored in the frequency domain; the direction of the microphone whose nearby microphone output signal has the highest high-frequency level in the spectral envelope of its frequency characteristic is detected as the front of the sounding body; the differences from the frequency characteristics of the other microphones are detected; and the differences are corrected.

2. A sounding body directivity correction device for a sound pickup device using a microphone array composed of a plurality of microphones and a microphone array device that performs signal processing on the output signals of the microphone array, the correction device comprising:
sound source position detection means for receiving the output signals of the plurality of microphones and detecting the position of a sounding body; frequency characteristic monitoring means for monitoring, in the frequency domain, the output signals of a plurality of microphones located close to the sounding body among the output signals of the microphones (hereinafter, "nearby microphone output signals"); sounding body directivity detection means for detecting, as the front of the sounding body, the direction of the microphone whose nearby microphone output signal has the highest high-frequency level in the spectral envelope of its frequency characteristic; spectrum difference detection means for detecting the differences between the spectral characteristic of that output signal and those of the other microphone output signals; and spectrum difference correction means for correcting the differences.

3. The sounding body directivity correction device according to claim 2,
wherein the frequency characteristic monitoring means comprises frequency domain conversion means for converting the nearby microphone output signals into the frequency domain, and spectral envelope extraction means for extracting the spectral envelope of the frequency characteristic from the frequency-domain signals.

4. The sounding body directivity correction device according to claim 3,
wherein the spectral envelope extraction means comprises means for extracting a low-order cepstrum from the frequency-domain signals and means for extracting the spectral envelope from the low-order cepstrum.