JP5321171B2 - Sound processing apparatus and program

Info

Publication number: JP5321171B2
Application number: JP2009064758A
Authority: JP
Inventors: 健一山内
Original assignee: Yamaha Corp
Current assignee: Yamaha Corp
Priority date: 2009-03-17
Filing date: 2009-03-17
Publication date: 2013-10-23
Anticipated expiration: 2029-03-17
Also published as: JP2010217552A

Abstract

PROBLEM TO BE SOLVED: To extract only the non-target stationary sound in an environment where both non-target stationary sound and non-target fluctuation sound exist. SOLUTION: From a sound signal S1 and a sound signal S2, a sound source separation section 30 specifies, for each unit section, the intensity XB(k) at each non-target sound frequency FB, that is, each of the K frequencies f1 to fK at which non-target sound arriving from a direction different from that of the target sound is dominant. A noise estimation section 42 creates a noise spectrum N for each unit section. Specifically, when the intensity XB(k) at a frequency fk (a non-target sound frequency FB) in the n-th unit section is lower than a threshold XTH, the noise estimation section 42 sets the intensity μn(k) at the frequency fk in the noise spectrum N of that unit section according to the intensity XB(k) and the intensity μn-1(k) of the noise spectrum N of the (n-1)-th unit section. When the intensity XB(k) exceeds the threshold XTH, the intensity μn(k) is set according to the intensity μn-1(k), and the intensity XB(k) is not reflected. COPYRIGHT: (C)2010, JPO&INPIT

Description

The present invention relates to a technique for estimating non-target sound (in particular, its stationary component) from a mixture of sound arriving from a predetermined direction (hereinafter “target sound”) and sound other than the target sound (hereinafter “non-target sound”).

Techniques have been proposed for sorting each of a plurality of frequencies (frequency bands) in a plurality of sound signals generated by a plurality of sound collecting devices into target sound frequencies, at which the target sound is dominant, and non-target sound frequencies, at which the non-target sound is dominant. For example, Non-Patent Document 1 discloses a technique (SAFIA) that selects as a target sound frequency any frequency at which, among the plurality of sound signals, the one generated by the sound collecting device close to the target sound source has a high intensity. Patent Document 1 discloses a technique that generates a target-sound-dominant signal, in which the target sound is emphasized, and a target-sound-inferior signal, in which the target sound is suppressed, by delaying and adding a plurality of sound signals (that is, by beamforming), and selects as a target sound frequency any frequency at which the intensity of the target-sound-dominant signal exceeds that of the target-sound-inferior signal.

Non-Patent Document 1: Mariko Aoki, et al., "Sound source segregation based on estimating incident angle of each frequency component of input signals acquired by multiple microphones", Acoustical Science and Technology, Vol. 22, No. 2, pp. 149-157, 2001

Patent Document 1: JP 2006-197552 A

Under the techniques of Non-Patent Document 1 and Patent Document 1, the target sound and the non-target sound are distinguished solely by whether or not a sound arrives from a predetermined direction. Consider an environment in which temporally stationary noise is present, such as the operating sound of air conditioning equipment or the hubbub of a crowd (hereinafter “non-target stationary sound”), and in which sound whose acoustic characteristics (for example, volume and pitch) change from moment to moment, such as voices and musical tones (hereinafter “non-target fluctuation sound”), arrives from a direction different from that of the target sound. In such an environment, the non-target stationary sound and the non-target fluctuation sound are both extracted as non-target sound without distinction. In other words, it is difficult to extract only the non-target stationary sound as the non-target sound. Against this background, an object of the present invention is to extract only the non-target stationary sound with high accuracy in an environment where both non-target stationary sound and non-target fluctuation sound exist.

To solve the above problem, a sound processing apparatus according to the present invention comprises: sound source separation means for specifying, for each unit section, from a plurality of sound signals generated by a plurality of sound collecting devices, the intensity (amplitude or power) of the component at each non-target sound frequency, that is, each of a plurality of frequencies at which non-target sound arriving from a direction different from that of the target sound is dominant; and noise estimation means for generating a noise spectrum for each unit section. When the intensity of the component at one non-target sound frequency in a first unit section (for example, the intensity XB(k) in FIG. 4) is below a threshold (for example, the threshold XTH in FIG. 4) that exceeds the intensity at that non-target sound frequency in the noise spectrum of a second unit section preceding the start of the first unit section (for example, the intensity μn-1(k) in FIG. 4), the noise estimation means sets the intensity at that non-target sound frequency in the noise spectrum of the first unit section (for example, the intensity μn(k) in FIG. 4) according to both the intensity of the component at that frequency in the first unit section and the intensity at that frequency in the noise spectrum of the second unit section. When the intensity of the component at that non-target sound frequency in the first unit section exceeds the threshold, the noise estimation means sets the intensity at that frequency in the noise spectrum of the first unit section to a value exceeding the intensity at that frequency in the noise spectrum of the second unit section, without reflecting the intensity of the component in the first unit section. In this configuration, when the intensity of the component at a non-target sound frequency in the first unit section exceeds the threshold (for example, when non-target fluctuation sound occurs at that frequency), the noise spectrum intensity is set without reflecting the intensity of that component, so it is possible to generate a noise spectrum in which only the non-target stationary sound is extracted with high accuracy (that is, in which the non-target fluctuation sound is effectively suppressed). Moreover, because a value exceeding the noise spectrum intensity of the second unit section is applied as the noise spectrum intensity of the first unit section, non-target stationary sound that newly arises while the sound processing apparatus is operating can be properly incorporated into the noise spectrum.

The noise spectrum intensity of the first unit section is set according to the noise spectrum intensity of a single unit section (second unit section) preceding (for example, immediately preceding) the start of the first unit section, or according to the noise spectrum intensities of a plurality of unit sections (second unit sections) preceding the start of the first unit section. In a preferred aspect of the present invention, when the intensity of the component at one non-target sound frequency in the first unit section is below the threshold, the noise estimation means sets, as the intensity at that frequency in the noise spectrum of the first unit section, a weighted sum (for example, Equation (2)) of the intensity of the component at that frequency in the first unit section and the intensity at that frequency in the noise spectrum of the second unit section. In this aspect, because the noise spectrum intensity of the first unit section is calculated as a weighted sum of the intensity at the non-target sound frequency in the first unit section and the noise spectrum intensity of the second unit section, there is the advantage that noise spectra need not be held for a plurality of past unit sections. Note that “frequency” in the present invention is a concept that includes not only a single point on the frequency axis but also a frequency band having some extent along the frequency axis.
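The two-branch update described above can be sketched for a single non-target sound frequency as follows. This is a minimal illustration, not the patented implementation: the multiplicative forms XTH = alpha * μn-1(k) and μn(k) = gamma * μn-1(k), the constants `alpha`, `beta` and `gamma`, and the handling of the case XB(k) = XTH are all assumptions introduced here. The text only requires that the threshold exceed μn-1(k), that the lower branch be a weighted sum, and that the upper branch yield a value exceeding μn-1(k) without using XB(k).

```python
def update_noise_bin(xb, mu_prev, alpha=2.0, beta=0.9, gamma=1.01):
    """Return mu_n(k) for one non-target sound frequency.

    xb      -- intensity XB(k) in the current (first) unit section
    mu_prev -- noise spectrum intensity mu_{n-1}(k) of the preceding unit section
    alpha, beta, gamma -- hypothetical constants, not taken from the source
    """
    if not (alpha > 1.0 and 0.0 < beta < 1.0 and gamma > 1.0):
        raise ValueError("expected alpha > 1, 0 < beta < 1, gamma > 1")
    if mu_prev <= 0.0:
        # With the multiplicative forms assumed here, a zero estimate could never grow.
        raise ValueError("mu_prev must be positive")

    xth = alpha * mu_prev  # assumed form of a threshold that exceeds mu_prev
    if xb < xth:
        # Below the threshold: weighted sum of the previous estimate and XB(k).
        return beta * mu_prev + (1.0 - beta) * xb
    # At or above the threshold (equality is an assumption): ignore XB(k) and
    # let the estimate rise slightly above the previous value.
    return gamma * mu_prev
```

With these assumed constants, a burst of non-target fluctuation sound (XB(k) far above μn-1(k)) raises the estimate by only 1% per unit section, whereas a newly started stationary noise is gradually absorbed as the estimate climbs toward it.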

In a preferred aspect of the present invention, the sound source separation means generates a target sound spectrum composed of the components at the target sound frequencies, that is, those of the plurality of frequencies at which the target sound is dominant, and the apparatus further comprises noise suppression means for subtracting the noise spectrum from the target sound spectrum. In this aspect, because the noise spectrum of the non-target stationary sound is subtracted from the target sound spectrum composed of the components sorted as target sound frequencies, both the non-target fluctuation sound and the non-target stationary sound can be effectively suppressed.
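As a rough illustration of the subtraction performed by the noise suppression means, the sketch below subtracts a noise spectrum from a target sound spectrum bin by bin. The function name `suppress_noise` and the clamping of negative results to a `floor` value are assumptions added here; the text states only that the noise spectrum is subtracted from the target sound spectrum.

```python
def suppress_noise(target_spectrum, noise_spectrum, floor=0.0):
    """Subtract the noise spectrum from the target sound spectrum, bin by bin.

    Both arguments are sequences of non-negative intensities of equal length K.
    Results below `floor` are clamped to `floor` (an assumption, not from the source).
    """
    if len(target_spectrum) != len(noise_spectrum):
        raise ValueError("spectra must have the same number of frequencies")
    return [max(xa - mu, floor) for xa, mu in zip(target_spectrum, noise_spectrum)]
```

For example, `suppress_noise([5.0, 1.0, 0.0], [2.0, 3.0, 0.5])` yields `[3.0, 0.0, 0.0]`: the first bin keeps the excess over the noise estimate, and the remaining bins are clamped.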

The sound processing apparatus according to each of the above aspects may be realized by hardware (electronic circuitry) such as a DSP (Digital Signal Processor) dedicated to sound processing, or by cooperation between a general-purpose arithmetic processing device such as a CPU (Central Processing Unit) and a program. A program according to the present invention causes a computer to execute: a sound source separation process of specifying, for each unit section, from a plurality of sound signals generated by a plurality of sound collecting devices, the intensity of the component at each non-target sound frequency, that is, each of a plurality of frequencies at which non-target sound arriving from a direction different from that of the target sound is dominant; and a noise estimation process of generating a noise spectrum for each unit section. In the noise estimation process, when the intensity of the component at one non-target sound frequency in a first unit section is below a threshold that exceeds the intensity at that frequency in the noise spectrum of a second unit section preceding the start of the first unit section, the intensity at that frequency in the noise spectrum of the first unit section is set according to both the intensity of the component at that frequency in the first unit section and the intensity at that frequency in the noise spectrum of the second unit section. When the intensity of the component at that frequency in the first unit section exceeds the threshold, the intensity at that frequency in the noise spectrum of the first unit section is set to a value exceeding the intensity at that frequency in the noise spectrum of the second unit section, without reflecting the intensity of the component in the first unit section. This program provides the same operation and effects as the signal processing apparatus according to the present invention. The program of the present invention may be provided to users stored on a computer-readable recording medium and installed on a computer, or may be provided from a server device by distribution over a communication network and installed on a computer.

FIG. 1 is a block diagram of a sound processing apparatus according to a first embodiment of the present invention.
FIG. 2 is a block diagram of the sound source separation unit.
FIG. 3 is a graph for explaining the processing performed by the signal processing unit.
FIG. 4 is a flowchart of the operation of the noise estimation unit.
FIG. 5 is a spectrogram of the noise spectrum in the first embodiment.
FIG. 6 is a spectrogram of the noise spectrum in a comparative example.

<A: First Embodiment>
FIG. 1 is a block diagram of a sound processing apparatus according to the first embodiment of the present invention. As shown in FIG. 1, a sound collecting device M1 and a sound collecting device M2 are connected to the sound processing apparatus 100. The sound collecting devices M1 and M2 are omnidirectional (substantially omnidirectional) microphones that generate signals representing the waveform of the surrounding sound. A mixture of target sound and non-target sound reaches the sound collecting devices M1 and M2 from the surroundings, and each device generates an electrical signal representing the waveform of that mixture. The sound collecting device M1 generates a sound signal S1, and the sound collecting device M2 generates a sound signal S2.

The target sound is sound that arrives at the sound collecting devices M1 and M2 from a known direction D0. For example, when the sound processing apparatus 100 is installed in an electronic device (for example, a mobile phone) that receives a user's speech, the speech arrives as the target sound from the direction D0 directly in front of the device body. The sound collecting devices M1 and M2 are spaced apart from each other along a direction perpendicular to the direction D0 from which the target sound arrives. The non-target sound, by contrast, is sound arriving from directions (DR, DL) other than the direction D0 of the target sound. It reaches the sound collecting devices M1 and M2 from the direction DR, 45° clockwise from the direction D0, or from the direction DL, 45° counterclockwise from the direction D0.

The sound processing apparatus 100 generates, from the sound signal S1 and the sound signal S2, a sound signal SOUT in which the non-target sound in the mixture of target sound and non-target sound is suppressed. The sound signal SOUT is reproduced as sound by being supplied to a sound emitting device (for example, a speaker or headphones). The A/D converters that convert the sound signals S1 and S2 into digital signals and the D/A converter that converts the sound signal SOUT into an analog signal are omitted from the drawings for convenience.

As shown in FIG. 1, the sound processing apparatus 100 is realized by a computer system including an arithmetic processing device 12 and a storage device 14. The storage device 14 stores a program and various data for generating the sound signal SOUT from the sound signals S1 and S2. Any known recording medium, such as a magnetic recording medium or a semiconductor recording medium, may be used as the storage device 14. By executing the program stored in the storage device 14, the arithmetic processing device 12 functions as a plurality of elements (frequency analysis unit 20, sound source separation unit 30, noise estimation unit 42, noise suppression unit 44, and signal synthesis unit 50). A configuration in which an electronic circuit (DSP) dedicated to sound processing realizes each element of the arithmetic processing device 12, or one in which the elements of the arithmetic processing device 12 are distributed across a plurality of integrated circuits, may also be adopted.

The frequency analysis unit 20 calculates a frequency spectrum P1 for each of a plurality of unit sections (frames) into which the sound signal S1 is divided on the time axis. Any known frequency analysis, such as FFT (Fast Fourier Transform) processing, may be used to obtain the frequency spectrum P1. In the same manner, the frequency analysis unit 20 obtains a frequency spectrum P2 for each unit section of the sound signal S2.
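The frame-by-frame analysis can be sketched as below. This is only an illustration under stated assumptions: the unit sections are taken to be non-overlapping and unwindowed, and a naive discrete Fourier transform stands in for the FFT so that the example needs nothing beyond the standard library. The names `dft` and `frame_spectra` are hypothetical.

```python
import cmath


def dft(frame):
    """Naive O(N^2) discrete Fourier transform of one unit section."""
    n = len(frame)
    return [
        sum(x * cmath.exp(-2j * cmath.pi * k * t / n) for t, x in enumerate(frame))
        for k in range(n)
    ]


def frame_spectra(signal, frame_len):
    """Split a sampled signal into consecutive unit sections and transform each.

    Non-overlapping, unwindowed frames are an assumption of this sketch;
    trailing samples that do not fill a whole frame are dropped.
    """
    spectra = []
    for start in range(0, len(signal) - frame_len + 1, frame_len):
        spectra.append(dft(signal[start:start + frame_len]))
    return spectra
```

Applying `frame_spectra` to both S1 and S2 with the same `frame_len` gives the per-section spectra P1 and P2 on a common frequency grid; a practical implementation would normally use an FFT routine with overlapping, windowed frames.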

The sound source separation unit 30 of FIG. 1 sorts, for each unit section, each of K frequencies (frequency bands) f1 to fK set discretely on the frequency axis (K is a natural number) into a target sound frequency FA or a non-target sound frequency FB, and thereby generates a target sound spectrum QA and a non-target sound spectrum QB for each unit section. A target sound frequency FA is a frequency at which the target sound is dominant (typically, a frequency at which the volume of the target sound exceeds that of the non-target sound), and a non-target sound frequency FB is a frequency at which the non-target sound is dominant (typically, a frequency at which the volume of the non-target sound exceeds that of the target sound). The target sound spectrum QA is composed of the components at the target sound frequencies FA, and the non-target sound spectrum QB is composed of the components at the non-target sound frequencies FB. For sorting into target sound frequencies FA and non-target sound frequencies FB, a method that exploits the difference between the direction D0 from which the target sound arrives and the directions (DR, DL) from which the non-target sound arrives (Patent Document 1) is preferably used, as illustrated below.

FIG. 2 is a block diagram of the sound source separation unit 30. As shown in FIG. 2, the sound source separation unit 30 includes a signal processing unit 32, a frequency selection unit 34, and an intensity specifying unit 36. From the frequency spectrum P1 and the frequency spectrum P2, the signal processing unit 32 generates a plurality of frequency spectra (P0, PR, PL), each of which suppresses sound arriving from one of a plurality of directions (D0, DR, DL) relative to sound arriving from the other directions. FIG. 3 is a graph for explaining the processing performed by the signal processing unit 32. The horizontal axis of FIG. 3 represents the angle θ measured from the target sound direction D0 (0°), and the vertical axis represents signal intensity (amplitude or power).

As shown in FIG. 2, the signal processing unit 32 is composed of a first processing unit 321, a second processing unit 322, and a third processing unit 323. The first processing unit 321 generates the frequency spectrum P0 by subtracting the frequency spectrum P2 from the frequency spectrum P1. Because the target sound arriving from the direction D0 reaches the sound collecting devices M1 and M2 in substantially the same phase, the frequency spectrum P0 corresponds, as indicated by B0 (solid line) in FIG. 3, to a spectrum in which the target sound arriving from the direction D0 is suppressed relative to sound arriving from other directions. In other words, the first processing unit 321 is a null-steering beamformer that forms a null (a blind spot in sound pickup) in the direction D0.

The second processing unit 322 generates the frequency spectrum PR by subtracting, from the frequency spectrum P2, the frequency spectrum D(P1) of the signal obtained by delaying the sound signal S1 by a delay amount D. The delay amount D is set to the time difference between the moment sound arriving from the direction DR reaches the sound collecting device M1 and the moment it reaches the sound collecting device M2. Accordingly, as indicated by BR (broken line) in FIG. 3, the frequency spectrum PR corresponds to a spectrum in which the non-target sound arriving from the direction DR is suppressed relative to sound arriving from other directions. In other words, the second processing unit 322 is a null-steering beamformer that forms a null in the direction DR. Similarly, the third processing unit 323 is a null-steering beamformer that, as indicated by BL in FIG. 3, generates the frequency spectrum PL, in which the non-target sound from the direction DL is suppressed, by subtracting from the frequency spectrum P1 the frequency spectrum D(P2) of the signal obtained by delaying the sound signal S2 by the delay amount D.
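The three null-steering beamformers can be sketched in the frequency domain as follows. One assumption is made explicit: the spectrum of the delayed signal, D(P1) or D(P2), is approximated by rotating the phase of each bin by 2πf·D, which holds when the delay is short relative to the unit section. The function names and the `freqs_hz` and `delay_s` parameters are hypothetical.

```python
import cmath


def delayed(spectrum, freqs_hz, delay_s):
    """Approximate the spectrum of a signal delayed by delay_s seconds.

    Each complex bin is rotated by exp(-j*2*pi*f*delay); this per-bin phase
    model is an assumption of the sketch, not a detail given in the source.
    """
    return [x * cmath.exp(-2j * cmath.pi * f * delay_s)
            for x, f in zip(spectrum, freqs_hz)]


def null_beamformers(p1, p2, freqs_hz, delay_s):
    """Return (P0, PR, PL) for one unit section from complex spectra P1 and P2."""
    d_p1 = delayed(p1, freqs_hz, delay_s)
    d_p2 = delayed(p2, freqs_hz, delay_s)
    p0 = [a - b for a, b in zip(p1, p2)]      # null toward D0: P1 - P2
    pr = [b - da for b, da in zip(p2, d_p1)]  # null toward DR: P2 - D(P1)
    pl = [a - db for a, db in zip(p1, d_p2)]  # null toward DL: P1 - D(P2)
    return p0, pr, pl
```

When P1 and P2 are identical (sound from D0 arriving in phase), P0 vanishes; when P2 equals P1 delayed by `delay_s` (sound from DR reaching M1 first), PR vanishes while PL does not.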

The frequency selection unit 34 of FIG. 2 compares the intensities of the three frequency spectra (P0, PR, PL) generated by the signal processing unit 32 at each of the K frequencies f1 to fK, and thereby sorts each of the K frequencies f1 to fK into a target sound frequency FA or a non-target sound frequency FB. As shown in FIG. 2, the frequency selection unit 34 includes a first comparison unit 341 and a second comparison unit 342.

The first comparison unit 341 generates a frequency spectrum PLR by comparing the intensities of the frequency spectrum PR and the frequency spectrum PL at each of the K frequencies f1 to fK. The intensity of the frequency spectrum PLR at a frequency fk is set to the lower of the intensity of the frequency spectrum PR at fk and the intensity of the frequency spectrum PL at fk. Because the frequency spectrum PR suppresses the non-target sound from the direction DR and the frequency spectrum PL suppresses the non-target sound from the direction DL, the frequency spectrum PLR corresponds to a spectrum in which the non-target sound from both the direction DR and the direction DL is suppressed (that is, a spectrum in which the target sound from the direction D0 is emphasized).

The second comparison unit 342 compares the intensities of the frequency spectrum P0 and the frequency spectrum PLR at each of the K frequencies f1 to fK. The frequency spectrum P0 emphasizes the non-target sound, and the frequency spectrum PLR emphasizes the target sound. Accordingly, the second comparison unit 342 sorts as a target sound frequency FA each frequency fk, among the K frequencies f1 to fK, at which the intensity of the frequency spectrum PLR exceeds that of the frequency spectrum P0, and sorts as a non-target sound frequency FB each frequency fk at which the intensity of the frequency spectrum P0 exceeds that of the frequency spectrum PLR.
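The two comparison stages can be sketched as below, operating on per-frequency intensities (magnitudes or powers) of P0, PR and PL. The name `classify_bins` is hypothetical, and the treatment of a tie between PLR and P0 as a non-target sound frequency is an assumption; the text does not specify the equal case.

```python
def classify_bins(p0_mag, pr_mag, pl_mag):
    """Sort each of the K frequencies into target (True) or non-target (False).

    Returns (plr_mag, is_target), where plr_mag[k] is the lower of the PR and
    PL intensities at frequency fk (first comparison unit) and is_target[k] is
    True when PLR exceeds P0 at fk (second comparison unit).
    """
    plr_mag = [min(r, l) for r, l in zip(pr_mag, pl_mag)]
    # Ties are assigned to the non-target side (an assumption of this sketch).
    is_target = [plr > p0 for plr, p0 in zip(plr_mag, p0_mag)]
    return plr_mag, is_target
```

For example, `classify_bins([1.0, 3.0, 2.0], [2.0, 1.0, 2.0], [4.0, 5.0, 3.0])` gives a PLR of `[2.0, 1.0, 2.0]` and marks only the first frequency as a target sound frequency.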

The intensity specifying unit 36 of FIG. 2 uses the result of the sorting by the frequency selection unit 34 to generate the target sound spectrum QA and the non-target sound spectrum QB for each unit section. The target sound spectrum QA is a series of intensities XA(k) (XA(1) to XA(K)) set for each frequency fk according to the intensity of the target sound, and the non-target sound spectrum QB is a series of intensities XB(k) (XB(1) to XB(K)) set for each frequency fk according to the intensity of the non-target sound. The setting of the intensities XA(k) and XB(k) is described in detail below.

As shown in FIG. 3, the non-target sound is emphasized in the frequency spectrum P0 (B0), and the target sound is emphasized in the frequency spectrum PLR. The intensity specifying unit 36 therefore sets the intensity XA(k) of each frequency fk sorted as a target sound frequency FA in the target sound spectrum QA to the value obtained by subtracting the intensity of the frequency spectrum P0 at fk (an intensity derived mainly from the non-target sound) from the intensity of the frequency spectrum PLR at fk (an intensity derived mainly from the target sound). Because the intensity XA(k) at each target sound frequency FA is calculated by subtracting the frequency spectrum P0 from the frequency spectrum PLR (spectral subtraction), it is possible to generate a target sound spectrum QA in which the influence of the non-target sound contained at the target sound frequencies FA of the frequency spectrum PLR is effectively reduced. A configuration in which the intensity of the frequency spectrum PLR, in which the target sound is emphasized, is itself set as the intensity XA(k) of the target sound spectrum QA is also suitable. The intensity XA(k) of each frequency fk sorted as a non-target sound frequency FB in the target sound spectrum QA is set to zero.

The intensity specifying unit 36 also sets the intensity XB(k) of each frequency fk sorted as a non-target sound frequency FB in the non-target sound spectrum QB to the intensity at fk of the frequency spectrum P1 generated by the frequency analysis unit 20. Alternatively, the intensity XB(k) at a non-target sound frequency FB may be set to the intensity of the frequency spectrum P2 at fk, or to the value obtained by subtracting the intensity of the frequency spectrum PLR at fk (an intensity derived mainly from the target sound) from the intensity of the frequency spectrum P0 at fk (an intensity derived mainly from the non-target sound). The intensity XB(k) of each frequency fk sorted as a target sound frequency FA in the non-target sound spectrum QB is set to zero.
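The construction of QA and QB from the sorting result can be sketched as follows, using the primary variants described above: XA(k) is the PLR intensity minus the P0 intensity at target sound frequencies, and XB(k) is the P1 intensity at non-target sound frequencies. The function name and argument layout are hypothetical.

```python
def specify_intensities(is_target, plr_mag, p0_mag, p1_mag):
    """Build the target sound spectrum QA and non-target sound spectrum QB.

    is_target[k] is True when fk was sorted as a target sound frequency FA.
    The other arguments are per-frequency intensities of PLR, P0 and P1.
    """
    qa, qb = [], []
    for target, plr, p0, p1 in zip(is_target, plr_mag, p0_mag, p1_mag):
        if target:
            qa.append(plr - p0)  # XA(k): PLR intensity minus P0 intensity
            qb.append(0.0)       # XB(k) is zero at target sound frequencies
        else:
            qa.append(0.0)       # XA(k) is zero at non-target sound frequencies
            qb.append(p1)        # XB(k): intensity of P1 at fk
    return qa, qb
```

Because a frequency is sorted as a target sound frequency only when the PLR intensity exceeds the P0 intensity, the difference stored in QA is positive whenever `is_target` comes from that comparison.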

The components at the non-target sound frequencies FB (the non-target sound) include, in addition to non-target stationary sound that is temporally stationary (its acoustic characteristics such as volume and pitch change little), non-target fluctuation sound that arrives from a direction different from that of the target sound. Non-target stationary sound is noise such as the operating sound of air-conditioning equipment or the bustle of a crowd. Non-target fluctuation sound is interfering sound, such as speech or musical tones, whose acoustic characteristics such as volume and pitch change from moment to moment. The noise estimation unit 42 in FIG. 1 generates a noise spectrum N for each unit section by suppressing (ideally removing) the non-target fluctuation sound in the non-target sound spectrum QB. The noise spectrum N of the n-th unit section is the series of intensities μn(1) to μn(K) at the K frequencies f1 to fK.

FIG. 4 is a flowchart of the operation by which the noise estimation unit 42 generates the noise spectrum N of the n-th unit section. The processing of FIG. 4 is executed sequentially for each unit section. On starting the processing of FIG. 4, the noise estimation unit 42 initializes a variable k to 1 (step S1). The variable k is a number that designates one of the K frequencies f1 to fK.

The noise estimation unit 42 determines whether the frequency fk is a non-target sound frequency FB (step S2). If it is, the noise estimation unit 42 determines whether the intensity XB(k) at the frequency fk (a non-target sound frequency FB) in the non-target sound spectrum QB of the n-th unit section exceeds a threshold XTH (step S3).

As defined by formula (1) below, the threshold XTH is the product of a coefficient τ and the intensity μn-1(k), at the frequency fk, of the noise spectrum N that the noise estimation unit 42 generated for the immediately preceding ((n-1)-th) unit section. The coefficient τ is set to a predetermined value greater than 1 (for example, 2). The threshold XTH is therefore set to a value above the intensity μn-1(k) (a variable value that depends on the intensity μn-1(k)). For the first unit section, a predetermined initial value is used as the intensity μn-1(k).
$$X_{TH} = \tau \cdot \mu_{n-1}(k) \tag{1}$$

Because the intensity of non-target fluctuation sound changes more readily than that of non-target stationary sound, the intensity XB(k) at a frequency fk of the non-target sound spectrum QB where non-target fluctuation sound occurs varies widely over time. The comparison between the intensity XB(k) and the threshold XTH in step S3 therefore amounts to determining whether non-target fluctuation sound has occurred at the frequency fk of the non-target sound spectrum QB. That is, when the intensity XB(k) exceeds the threshold XTH, the component of the non-target sound spectrum QB at the frequency fk is presumed to be non-target fluctuation sound, and when the intensity XB(k) is below the threshold XTH, that component is presumed not to be non-target fluctuation sound (that is, to be non-target stationary sound). The coefficient τ of formula (1) is chosen statistically or experimentally so that the intensity XB(k) exceeds the threshold XTH when non-target fluctuation sound occurs and falls below the threshold XTH when only non-target stationary sound is present.

When the intensity XB(k) of the non-target sound spectrum QB is below the threshold XTH (S3: NO, that is, when no non-target fluctuation sound has occurred at the frequency fk), the noise estimation unit 42 calculates the intensity μn(k) of the n-th noise spectrum N at the frequency fk from the intensity XB(k) of the non-target sound spectrum QB of the n-th unit section at the frequency fk and the intensity μn-1(k) of the noise spectrum N of the (n-1)-th unit section at the frequency fk (step S4). For example, as defined by formula (2) below, the intensity μn(k) is calculated as a weighted sum (weighted average) of the intensity XB(k) in the non-target sound spectrum QB of the n-th unit section and the intensity μn-1(k) in the noise spectrum N of the (n-1)-th unit section. The coefficient α of formula (2) is set to a positive number less than 1 (for example, 0.9). As formula (2) shows, the larger the coefficient α, the smaller the influence of the current intensity XB(k) on the intensity μn(k) (and the larger the influence of the intensities XB(k) of past unit sections).
$$\mu_n(k) = \alpha \cdot \mu_{n-1}(k) + (1 - \alpha) \cdot X_B(k) \tag{2}$$

When, on the other hand, the intensity XB(k) of the non-target sound spectrum QB exceeds the threshold XTH (S3: YES), the noise estimation unit 42 sets the intensity μn-1(k) of the (n-1)-th noise spectrum N at the frequency fk as the intensity μn(k) of the n-th noise spectrum N at the frequency fk (a non-target sound frequency FB), as shown in formula (3) (step S5). That is, when the intensity XB(k) exceeds the threshold XTH (when XB(k) has increased because non-target fluctuation sound has occurred at the frequency fk), the intensity XB(k) of the non-target sound spectrum QB is not reflected in the intensity μn(k). Non-target fluctuation sound in the non-target sound spectrum QB is therefore suppressed (removed) in the noise spectrum N.
$$\mu_n(k) = \mu_{n-1}(k) \tag{3}$$

When the frequency fk is a target sound frequency FA (S2: NO), the noise estimation unit 42 sets, in the same way as formula (3), the intensity μn-1(k) of the (n-1)-th noise spectrum N as the intensity μn(k) of the n-th noise spectrum N at the frequency fk (a target sound frequency FA) (step S6).

As formulas (2) and (3) show, the intensity μn(k) of the noise spectrum N in the n-th unit section is a value that cumulatively reflects the intensities of the noise spectra N calculated for multiple past unit sections (the (n-1)-th and earlier). In other words, the intensity μn(k) of the noise spectrum N is the intensity XB(k) of the non-target sound spectrum QB smoothed (averaged) over the unit sections in which the intensity XB(k) at the frequency fk, classified as a non-target sound frequency FB, is below the threshold XTH.
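The cumulative character of formula (2) can be made explicit by unrolling the recursion. The derivation below assumes that formula (2) was applied in each of the last m unit sections; the shorthand x_j, denoting the intensity XB(k) observed j unit sections before the current one, is introduced here for illustration and does not appear in the original text.

```latex
\mu_n(k) = \alpha\,\mu_{n-1}(k) + (1-\alpha)\,x_0
         = \alpha^{2}\,\mu_{n-2}(k) + (1-\alpha)\,\bigl(x_0 + \alpha\,x_1\bigr)
         = \cdots
         = \alpha^{m}\,\mu_{n-m}(k) + (1-\alpha)\sum_{j=0}^{m-1} \alpha^{j}\,x_j
```

Each past observation x_j thus enters with weight (1-α)·α^j, which decays geometrically with its age. With the example value α = 0.9, the weight falls to about one tenth of the weight of the newest observation after roughly 22 unit sections.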

Once the intensity μn(k) has been set in one of the steps above (S4, S5, or S6), the noise estimation unit 42 determines whether the variable k has reached the predetermined value K (step S7). If it has not, the noise estimation unit 42 adds 1 to the variable k (step S8) and returns to step S2. The intensity μn(k) is thus calculated in turn for each of the K frequencies f1 to fK. When the variable k has reached the value K (S7: YES, that is, when the intensities μn(1) to μn(K) have all been calculated), the noise estimation unit 42 ends the processing of FIG. 4. The series of intensities μn(1) to μn(K) for the K frequencies f1 to fK constitutes the noise spectrum N of the n-th unit section.
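The flow of FIG. 4 (steps S1 to S8) can be sketched in vectorized form, processing all K frequencies at once instead of looping over k. This is a hedged sketch: the function name `update_noise_spectrum`, the array representation, and the treatment of the case XB(k) = XTH (handled here as "not exceeding") are assumptions, since the text only distinguishes "exceeds" from "is below".

```python
import numpy as np

def update_noise_spectrum(mu_prev, xb, is_non_target, tau=2.0, alpha=0.9):
    """Compute mu_n(k) for all k from mu_{n-1}(k) and X_B(k) (first embodiment).

    mu_prev       : mu_{n-1}(k), shape (K,)
    xb            : X_B(k) of the n-th unit section, shape (K,)
    is_non_target : True where fk was classified as a non-target frequency FB
    """
    mu_prev = np.asarray(mu_prev, dtype=float)
    xb = np.asarray(xb, dtype=float)
    is_non_target = np.asarray(is_non_target, dtype=bool)

    x_th = tau * mu_prev                           # formula (1)
    smoothed = alpha * mu_prev + (1 - alpha) * xb  # formula (2), step S4
    # Step S4 applies only at FB frequencies whose intensity does not exceed XTH.
    use_s4 = is_non_target & (xb <= x_th)
    # Otherwise (steps S5 and S6) the previous value is carried over, formula (3).
    return np.where(use_s4, smoothed, mu_prev)
```

Calling this function once per unit section, feeding each result back in as `mu_prev`, reproduces the frame-by-frame recursion; the initial `mu_prev` corresponds to the predetermined initial value mentioned for the first unit section.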

The noise suppression unit 44 in FIG. 1 generates a noise suppression spectrum QC by subtracting the noise spectrum N generated by the noise estimation unit 42 from the target sound spectrum QA generated by the sound source separation unit 30 (spectral subtraction). Specifically, the noise suppression unit 44 generates the noise suppression spectrum QC by subtracting the intensity μn(k), at the frequency fk, of the noise spectrum N generated for the n-th unit section from the intensity XA(k), at the frequency fk, of the target sound spectrum QA of that unit section.

That is, the intensity XC(k) of the noise suppression spectrum QC at the frequency fk for the n-th unit section is expressed by formula (4a). However, at any frequency fk for which the right-hand side of formula (4a), XA(k) - μn(k), is negative, the intensity XC(k) is set to zero. The noise suppression spectrum QC itself is expressed by formula (4b), in which the factor e^{jθ(k)} is the phase component of the target sound spectrum QA. As formulas (4a) and (4b) show, the noise suppression spectrum QC corresponds to the spectrum of the sound (that is, the target sound) obtained by suppressing the non-target stationary sound (the noise spectrum N) in the sound arriving from the direction D0 (the target sound spectrum QA).
$$X_C(k) = X_A(k) - \mu_n(k) \tag{4a}$$
$$Q_C = \{X_A(k) - \mu_n(k)\}\, e^{j\theta(k)} \tag{4b}$$
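Formulas (4a) and (4b), together with the rule that negative differences are set to zero, can be sketched as below. The function name `suppress_noise` and the representation of QA as a complex NumPy array (magnitude XA(k), phase θ(k)) are assumptions made for this illustration.

```python
import numpy as np

def suppress_noise(qa, mu):
    """Spectral subtraction of the noise spectrum N from the target spectrum QA.

    qa : complex spectrum QA of one unit section, shape (K,)
    mu : noise intensities mu_n(k), shape (K,)
    """
    qa = np.asarray(qa, dtype=complex)
    mu = np.asarray(mu, dtype=float)
    xa = np.abs(qa)        # intensity XA(k)
    theta = np.angle(qa)   # phase theta(k) of QA
    # Formula (4a), with negative results replaced by zero.
    xc = np.maximum(xa - mu, 0.0)
    # Formula (4b): reattach the phase of the target sound spectrum.
    return xc * np.exp(1j * theta)
```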

The signal synthesis unit 50 generates a time-domain sound signal SOUT from the noise suppression spectrum QC generated by the noise suppression unit 44. As shown in FIG. 1, the signal synthesis unit 50 comprises an adjustment unit 52, a synthesis unit 54, and an inverse transform unit 56. The adjustment unit 52 multiplies each of the intensities XB(1) to XB(K) of the non-target sound spectrum QB generated by the sound source separation unit 30 by a coefficient p. The coefficient p is set to a predetermined positive number (for example, 0.01).

The synthesis unit 54 generates an output spectrum R for each unit section by combining the noise suppression spectrum QC generated by the noise suppression unit 44 with the non-target sound spectrum QB as processed by the adjustment unit 52. The output spectrum R is the series obtained by arranging along the frequency axis the intensity XC(k) of each frequency fk of the noise suppression spectrum QC classified as a target sound frequency FA and the intensity XB(k) of each frequency fk of the non-target sound spectrum QB classified as a non-target sound frequency FB. That is, in the output spectrum R, the intensity at each frequency fk classified as a target sound frequency FA is set to the intensity XC(k) of the noise suppression spectrum QC, and the intensity at each frequency fk classified as a non-target sound frequency FB is set to the product of the intensity XB(k) of the non-target sound spectrum QB and the coefficient p. Because the adjusted non-target sound spectrum QB is combined with the noise suppression spectrum QC in this way, a more natural-sounding reproduced sound can be generated than with a configuration that outputs the noise suppression spectrum QC alone as the output spectrum R (a configuration in which the intensity of the reproduced sound at the non-target sound frequencies FB is zero).
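The combination performed by the adjustment unit 52 and the synthesis unit 54 reduces to a per-frequency selection, sketched below. The function name `synthesize_output` and the use of complex arrays for QC and QB are assumptions; the text specifies only the intensities, not how the phase at FB frequencies is handled.

```python
import numpy as np

def synthesize_output(qc, qb, is_target, p=0.01):
    """Build the output spectrum R for one unit section.

    qc        : noise suppression spectrum QC, shape (K,)
    qb        : non-target sound spectrum QB, shape (K,)
    is_target : True where fk was classified as a target sound frequency FA
    p         : attenuation applied to QB by the adjustment unit
    """
    qc = np.asarray(qc, dtype=complex)
    qb = np.asarray(qb, dtype=complex)
    is_target = np.asarray(is_target, dtype=bool)
    # FA frequencies take XC(k); FB frequencies take p * XB(k).
    return np.where(is_target, qc, p * qb)
```

Keeping a small residual (p much smaller than 1) at the FB frequencies, instead of zeroing them, is what the text credits for the more natural reproduced sound.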

The inverse transform unit 56 converts the output spectrum R of each unit section into a time-domain signal by inverse FFT processing and generates the sound signal SOUT by joining the converted signals of the successive unit sections on the time axis. Supplying the sound signal SOUT to a sound emitting device (not shown) produces reproduced sound in which the non-target sound is suppressed and the target sound is emphasized.

In the embodiment described above, the target sound frequencies FA and the non-target sound frequencies FB are classified using the difference between the direction D0 of the target sound and the directions (DR, DL) of the non-target sound, so the target sound spectrum QA and the non-target sound spectrum QB can be separated with high accuracy even when the target sound and the non-target sound have similar acoustic features. Furthermore, because the noise spectrum N generated from the non-target sound spectrum QB is subtracted from the target sound spectrum QA, a noise suppression spectrum QC (and hence an output spectrum R and reproduced sound) in which non-target stationary sound is effectively reduced can be generated.

In addition, in the embodiment described above, the intensity XB(k) is not reflected in the intensity μn(k) of the noise spectrum N at any frequency fk where the intensity XB(k) of the non-target sound spectrum QB exceeds the threshold XTH. A noise spectrum N that captures only the non-target stationary sound with high accuracy can therefore be generated even in an environment where both non-target stationary sound and non-target fluctuation sound are present. This effect is described in detail below.

FIGS. 5 and 6 show time series (spectrograms) of the noise spectrum N of each unit section. FIG. 5 is the time series of the noise spectrum N in the first embodiment, and FIG. 6 is the time series of the noise spectrum N in a comparative example. In the comparative example, the intensity μn(k) of the noise spectrum N is calculated by formula (2) regardless of the intensity XB(k) at the non-target sound frequency FB (that is, steps S3 and S5 of FIG. 4 are omitted).

FIGS. 5 and 6 show lines that connect, along the time axis, the high-intensity frequencies fk (peak frequencies) of the noise spectrum N. The thicker the line at a given point, the higher the intensity. In the examples of FIGS. 5 and 6, non-target stationary sound that does not change over time is present on the low-frequency side of the noise spectrum N (non-target sound spectrum QB). FIGS. 5 and 6 also indicate the time at which non-target fluctuation sound occurred.

In the comparative example, the intensity μn(k) of the noise spectrum N is calculated by formula (2) regardless of the intensity XB(k) of the non-target sound spectrum QB (that is, regardless of whether non-target fluctuation sound is present). The noise spectrum N therefore contains both non-target stationary sound and non-target fluctuation sound. Moreover, because the past intensity μn-1(k) is cumulatively reflected in the intensity μn(k) calculated by formula (2), the intensity μn(k) at a frequency fk of the noise spectrum N where non-target fluctuation sound occurred at some point remains high over multiple subsequent unit sections even after the non-target fluctuation sound has stopped, as shown in FIG. 6. As a result, the intensity of the target sound spectrum QA at the frequency fk where the non-target fluctuation sound occurred is reduced excessively by the processing of the noise suppression unit 44, which may cause harsh musical noise.

In contrast to the comparative example, in the first embodiment the intensity XB(k) (that is, the intensity of the non-target fluctuation sound at the frequency fk) is not reflected in the intensity μn(k) at a frequency fk where the intensity XB(k) exceeds the threshold XTH, so a noise spectrum N in which non-target fluctuation sound is suppressed is generated, as shown in FIG. 5. This prevents excessive reduction of the intensity of the target sound spectrum QA at the frequency fk where the non-target fluctuation sound occurred and suppresses the generation of musical noise. Because non-target fluctuation sound is suppressed in the noise spectrum N, the processing of the noise suppression unit 44 does little to reduce non-target fluctuation sound in the target sound spectrum QA. However, non-target fluctuation sound arriving from the direction DR or the direction DL has already been excluded from the target sound spectrum QA by the classification in the sound source separation unit 30. Even though the noise suppression unit 44 does not reduce non-target fluctuation sound, it is therefore possible to generate reproduced sound in which both non-target stationary sound and non-target fluctuation sound are suppressed with high accuracy.
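The difference between the first embodiment and the comparative example can be reproduced with a toy single-frequency simulation. Everything in this sketch is illustrative: the function `track`, the synthetic intensity series (a stationary level of 1.0 interrupted by a five-section burst of 10.0), and the initial value 1.0 are assumptions, not data from FIGS. 5 and 6.

```python
def track(xb_series, mu0, alpha=0.9, tau=2.0, gated=True):
    """Track mu_n(k) at one FB frequency over successive unit sections.

    gated=True  : first embodiment (steps S3 and S5 present)
    gated=False : comparative example (formula (2) applied unconditionally)
    """
    mu = mu0
    history = []
    for xb in xb_series:
        if gated and xb > tau * mu:
            pass                                # formula (3): keep mu unchanged
        else:
            mu = alpha * mu + (1 - alpha) * xb  # formula (2)
        history.append(mu)
    return history

# Stationary noise at 1.0 with a five-section burst of fluctuation sound at 10.0.
xb_series = [1.0] * 5 + [10.0] * 5 + [1.0] * 5
gated = track(xb_series, mu0=1.0, gated=True)
ungated = track(xb_series, mu0=1.0, gated=False)
```

In this toy series the gated estimate stays at the stationary level throughout, whereas the ungated estimate is pulled up by the burst and is still well above the stationary level five unit sections after the burst ends, which is the lingering over-estimate the text associates with musical noise.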

The intensity specifying unit 36 of the first embodiment generates the target sound spectrum QA by subtracting the frequency spectrum P0, in which the non-target sound is emphasized, from the frequency spectrum PLR, in which the target sound is emphasized. The non-target sound is thus suppressed to some extent by the processing of the intensity specifying unit 36 alone. However, when the sound arriving from the direction D0 itself contains non-target stationary sound, for example, subtracting the frequency spectrum P0 from the frequency spectrum PLR does not suppress that non-target stationary sound sufficiently. The first embodiment, in which the noise spectrum N of the non-target stationary sound is subtracted from the target sound spectrum QA, suppresses non-target stationary sound more effectively than a configuration that suppresses the non-target sound by the processing of the intensity specifying unit 36 alone (that is, a configuration without the noise suppression unit 44).

<B: Second Embodiment>
Next, a second embodiment of the present invention will be described. In each of the following aspects, elements whose operation and function are equivalent to those of the first embodiment are given the same reference signs as above, and their detailed description is omitted as appropriate.

In the first embodiment, when the intensity XB(k) of the non-target sound spectrum QB exceeds the threshold XTH, the intensity μn-1(k) of the previous noise spectrum N is used as the intensity μn(k) of the n-th noise spectrum N. This configuration removes the influence of non-target fluctuation sound from the noise spectrum N, but it also removes from the noise spectrum N any non-target stationary sound that newly begins during operation of the sound processing apparatus 100 with an intensity XB(k) above the threshold XTH and then continues (hereinafter referred to as "new stationary sound"). Suppression of the new stationary sound may therefore be insufficient. The second embodiment is a configuration that resolves this problem.

The second embodiment differs from the first embodiment in the processing of step S5 of FIG. 4. When the intensity XB(k) of the non-target sound spectrum QB exceeds the threshold XTH (that is, when non-target fluctuation sound or new stationary sound has occurred), the noise estimation unit 42 executes the calculation of formula (5) below instead of formula (3) of the first embodiment. That is, the noise estimation unit 42 sets the product of the intensity μn-1(k) of the (n-1)-th noise spectrum N and a coefficient β as the intensity μn(k) of the n-th noise spectrum N (step S5).
$$\mu_n(k) = \beta \cdot \mu_{n-1}(k) \tag{5}$$

The coefficient β is set to a predetermined value greater than 1 (for example, 1.01). Consequently, over a run of unit sections in which the intensity XB(k) remains above the threshold XTH, the intensity μn(k) of the noise spectrum N increases over time and approaches the intensity of the non-target sound (non-target fluctuation sound or new stationary sound). The larger the coefficient β, the more quickly the intensity μn(k) approaches the intensity of the non-target stationary sound.

In this embodiment, the intensity μn(k) of the noise spectrum N approaches the intensity of the new stationary sound over time, so a noise spectrum N that reflects the characteristics of the new stationary sound is generated. Non-target sound including new stationary sound can therefore be effectively suppressed in the target sound spectrum QA.

Note that the intensity μn(k) of the noise spectrum N increases over time through the calculation of formula (5) not only when new stationary sound occurs but also when non-target fluctuation sound occurs. In the second embodiment, therefore, the occurrence of non-target fluctuation sound is reflected in the intensity μn(k) of the noise spectrum N. However, because non-target fluctuation sound changes readily over time, it is far less likely than new stationary sound to remain at a high intensity for a long period. That is, even when non-target fluctuation sound occurs, its intensity drops below the threshold XTH before the intensity μn(k) of the noise spectrum N has come close to it (whereupon formula (2) is applied to the calculation of μn(k)), and the rise of μn(k) is thereby limited. Although the intensity μn(k) of the noise spectrum N rises over time while the intensity XB(k) exceeds the threshold XTH, the rise caused by non-target fluctuation sound is therefore sufficiently small. The second embodiment thus has the advantage that a noise spectrum N reflecting new stationary sound can be generated while the influence of non-target fluctuation sound is sufficiently suppressed.
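The trade-off described above can be checked numerically with a small single-frequency sketch of the second embodiment. The function names `update_mu` and `frames_until_tracking`, the example levels, and the default parameters (τ = 2, α = 0.9, β = 1.01, taken from the example values in the text) are assumptions for illustration.

```python
def update_mu(mu_prev, xb, tau=2.0, alpha=0.9, beta=1.01):
    """One update of mu_n(k) at an FB frequency (second embodiment)."""
    if xb > tau * mu_prev:
        return beta * mu_prev                   # formula (5)
    return alpha * mu_prev + (1 - alpha) * xb   # formula (2)

def frames_until_tracking(mu0, level, tau=2.0, beta=1.01):
    """Count the unit sections spent in the formula (5) branch before a sound
    of constant intensity `level` stops exceeding the threshold tau * mu."""
    mu, n = mu0, 0
    while level > tau * mu:
        mu *= beta
        n += 1
    return n

# A five-section burst at ten times the noise level barely moves the estimate.
mu = 1.0
for _ in range(5):
    mu = update_mu(mu, 10.0)
```

With these example values, a sound that stays at ten times the current estimate needs on the order of 160 unit sections before formula (2) takes over and the estimate starts converging on it, while a five-section burst raises the estimate by only about 5%. This is consistent with the text's argument that short-lived fluctuation sound has little effect whereas a persistent new stationary sound is eventually tracked.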

<C: Modifications>
The embodiments illustrated above can be modified in various ways. Specific modifications are illustrated below. Two or more of the following aspects may be freely selected and combined.

(1) Modification 1
The content of the processing by the noise suppression unit 44 may be changed as appropriate. For example, formula (6) below may be used instead of formula (4a) to calculate the intensity XC(k) of the noise suppression spectrum QC. However, when the right-hand side of formula (6), XA(k) - γ·μn(k), is below a predetermined value δ·μn(k), the intensity XC(k) is set to δ·μn(k). The coefficient γ is set to a predetermined value of 1 or more (for example, 3 to 6), and the coefficient δ is set to a positive number sufficiently smaller than 1 (for example, 0.01).
$$X_C(k) = X_A(k) - \gamma \cdot \mu_n(k) \tag{6}$$

As formula (6) shows, the intensity μn(k) of the noise spectrum N is subtracted from the intensity XA(k) in excess (over-subtraction), so high-quality reproduced sound in which the non-target sound (non-target stationary sound) is sufficiently suppressed can be generated. Meanwhile, at any frequency fk for which the right-hand side of formula (6), XA(k) - γ·μn(k), is below the predetermined value δ·μn(k), the intensity XC(k) of the noise suppression spectrum QC is set to δ·μn(k). This prevents the intensity XC(k) from falling to zero and makes it possible to generate natural reproduced sound.
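Over-subtraction with a spectral floor, as described for Modification 1, can be sketched in a few lines. The function name `oversubtract` is hypothetical, and the defaults γ = 3 and δ = 0.01 are simply taken from the example ranges in the text.

```python
import numpy as np

def oversubtract(xa, mu, gamma=3.0, delta=0.01):
    """Formula (6) with the floor delta * mu_n(k) applied per frequency."""
    xa = np.asarray(xa, dtype=float)
    mu = np.asarray(mu, dtype=float)
    # Subtract gamma times the noise estimate, but never go below delta * mu.
    return np.maximum(xa - gamma * mu, delta * mu)
```

Compared with formula (4a), where negative results are clamped to zero, the floor leaves a small residual proportional to the noise estimate at heavily suppressed frequencies.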

(2) Modification 2
A configuration in which the noise estimation unit 42 variably controls the coefficient α of formula (2), as illustrated below, is also suitable.
Because the noise suppression spectrum QC is calculated by subtracting the noise spectrum N from the target sound spectrum QA, when the noise spectrum N changes along with a change in the characteristics (for example, the volume) of the non-target stationary sound, the characteristics of the noise suppression spectrum QC change as well. On the other hand, as formula (2) shows, the larger the coefficient α (the smaller the coefficient (1-α)), the more the influence of the intensity XB(k) at the non-target sound frequency FB in the n-th unit section is suppressed. The change in the volume of the target sound in the reproduced sound when the volume of the non-target sound changes is therefore smaller the larger the coefficient α is.

If the volume of the target sound fluctuates noticeably during a period in which the target sound is dominant (a period with many target sound frequencies FA), the result sounds unnatural. A configuration in which the noise estimation unit 42 variably controls the coefficient α so that α increases as the number of target sound frequencies FA in the n-th unit section increases (as the number of non-target sound frequencies FB decreases) is therefore suitable. With this configuration, during a period in which the target sound is dominant, changes in the volume of the target sound are suppressed even if the volume of the non-target stationary sound changes, so reproduced sound that is natural to the ear can be generated.
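The text requires only that α increase with the number of target sound frequencies FA; it does not give a specific mapping. The sketch below uses a linear interpolation between two bounds purely as one possible choice. The function name `smoothing_coefficient` and the bounds `alpha_min` and `alpha_max` are hypothetical.

```python
def smoothing_coefficient(num_target, num_bins, alpha_min=0.9, alpha_max=0.99):
    """Map the number of FA frequencies in a unit section to a coefficient alpha.

    num_target : number of frequencies classified as FA in the n-th unit section
    num_bins   : total number of frequencies K
    """
    if num_bins <= 0 or not 0 <= num_target <= num_bins:
        raise ValueError("num_target must lie between 0 and num_bins")
    ratio = num_target / num_bins
    # Linear interpolation (an assumed mapping): more FA bins -> larger alpha.
    return alpha_min + (alpha_max - alpha_min) * ratio
```

Both bounds are kept below 1 so that formula (2) remains a weighted average for every unit section.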

(3) Modification 3
The method of sorting the K frequencies f1 to fK into target sound frequencies FA and non-target sound frequencies FB may be changed as appropriate. Specifically, the technique (SAFIA) disclosed in Non-Patent Document 1 and Japanese Patent Laid-Open No. 10-313497 can be used to distinguish the target sound frequencies FA from the non-target sound frequencies FB. For example, assume that the sound collecting device M1 is closer to the target sound source than the sound collecting device M2, and that the sound collecting device M2 is closer to the non-target sound source than the sound collecting device M1. The sound source separation unit 30 compares, at each of the K frequencies f1 to fK, the intensity of the frequency spectrum P1 with that of the frequency spectrum P2. It selects each frequency fk at which the frequency spectrum P1 has the greater intensity as a target sound frequency FA, and each frequency fk at which the frequency spectrum P2 has the greater intensity as a non-target sound frequency FB. This configuration has the advantage that the signal processing unit 32 of FIG. 2 becomes unnecessary, which simplifies the processing and configuration of the sound processing apparatus 100.
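The per-frequency comparison described for this modification can be sketched as below. The function name `select_frequencies` and the sample intensities are hypothetical, and the handling of equal intensities (assigned to FB here) is an assumption, since the text does not address ties.

```python
def select_frequencies(p1, p2):
    """Split bin indices into target (FA) and non-target (FB) sets.

    p1, p2: per-bin intensities of the spectra P1 and P2 (same length K).
    Ties are assigned to FB here; the text does not specify tie handling.
    """
    if len(p1) != len(p2):
        raise ValueError("spectra must have the same number of bins")
    fa, fb = [], []
    for k, (a, b) in enumerate(zip(p1, p2)):
        (fa if a > b else fb).append(k)
    return fa, fb


# Made-up intensities for K = 4 bins.
fa_bins, fb_bins = select_frequencies([0.9, 0.2, 0.5, 0.1],
                                      [0.3, 0.8, 0.4, 0.6])
```

Bins 0 and 2 are stronger in P1 and become target sound frequencies; bins 1 and 3 become non-target sound frequencies.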

Instead of the blind spot control (null steering) beamformer, the following configuration, in which a delay-and-sum beamformer is adopted in the signal processing unit 32 (first processing unit 321, second processing unit 322, and third processing unit 323), is also suitable. The first processing unit 321 adds the frequency spectrum P1 and the frequency spectrum P2 to generate a frequency spectrum P0 in which the target sound from the direction D0 is emphasized. The second processing unit 322 adds the frequency spectrum P2 and the frequency spectrum P1 to which the delay amount D has been applied, thereby generating a frequency spectrum PR in which the non-target sound from the direction DR is emphasized. Similarly, the third processing unit 323 generates a frequency spectrum PL in which the non-target sound from the direction DL is emphasized. The first comparison unit 341 sets the intensity at the frequency fk of the frequency spectrum PLR to the higher of the intensity at the frequency fk of the frequency spectrum PR and the intensity at the frequency fk of the frequency spectrum PL. The frequency spectrum PLR is therefore a spectrum in which the non-target sounds from the directions DR and DL are emphasized. The second comparison unit 342 then selects, from among the K frequencies, each frequency at which the intensity of the frequency spectrum PLR exceeds that of the frequency spectrum P0 as a non-target sound frequency FB, and each frequency at which the intensity of the frequency spectrum P0 exceeds that of the frequency spectrum PLR as a target sound frequency FA.
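A rough sketch of this delay-and-sum variant follows. Several points are assumptions rather than statements of the source: the delay amount D is modeled as a per-bin phase shift exp(-j2πfD), the spectrum PL is taken to be P1 plus the delayed P2 (the text only says it is generated "similarly"), intensity is taken as the magnitude of the complex value, and ties go to FB. All function and variable names are hypothetical.

```python
import cmath


def delayed(spectrum, freqs_hz, delay_s):
    """Apply a time delay to a complex spectrum as a per-bin phase shift."""
    return [x * cmath.exp(-2j * cmath.pi * f * delay_s)
            for x, f in zip(spectrum, freqs_hz)]


def classify_delay_sum(p1, p2, freqs_hz, delay_s):
    """Return (FA, FB) bin indices using delay-and-sum beams.

    P0 = P1 + P2 and PR = P2 + delayed(P1) follow the text;
    PL = P1 + delayed(P2) is an assumed mirror image of PR.
    """
    p1_d = delayed(p1, freqs_hz, delay_s)
    p2_d = delayed(p2, freqs_hz, delay_s)
    fa, fb = [], []
    for k in range(len(p1)):
        x0 = abs(p1[k] + p2[k])      # beam toward D0 (target)
        xr = abs(p2[k] + p1_d[k])    # beam toward DR
        xl = abs(p1[k] + p2_d[k])    # beam toward DL (assumed form)
        xlr = max(xr, xl)            # role of first comparison unit 341
        (fa if x0 > xlr else fb).append(k)  # role of second comparison unit 342
    return fa, fb


# Bin 0: identical signals at both devices (sound from D0).
# Bin 1: P2 lags P1 by exactly the delay D (sound from DR).
freqs = [1000.0, 2000.0]
ds_fa, ds_fb = classify_delay_sum([1 + 0j, 1 + 0j], [1 + 0j, -1j],
                                  freqs, delay_s=0.000125)
```

In this toy input, bin 0 is classified as a target sound frequency and bin 1 as a non-target sound frequency.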

A configuration in which the signal processing unit 32 processes the sound signal S1 and the sound signal S2 in the time domain may also be employed. In this case, the signal processing unit 32 generates a signal S0 obtained by subtracting the sound signal S2 from the sound signal S1, a signal SR obtained by subtracting the sound signal S1 delayed by the delay amount D from the sound signal S2, and a signal SL obtained by subtracting the sound signal S2 delayed by the delay amount D from the sound signal S1. The frequency analysis unit 20, arranged downstream of the signal processing unit 32, converts the signal S0 into the frequency spectrum P0, the signal SR into the frequency spectrum PR, and the signal SL into the frequency spectrum PL.
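The three time-domain signals can be illustrated with the short sketch below. It assumes that the delay amount D is a whole number of samples and that the delayed signal is zero-padded at its start; the source specifies neither, and the function names are hypothetical.

```python
def delay_samples(signal, d):
    """Delay a sequence by d samples, padding the start with zeros."""
    if d < 0:
        raise ValueError("delay must be non-negative")
    padded = [0.0] * d + list(signal)
    return padded[:len(signal)]  # keep the original length


def time_domain_beams(s1, s2, d):
    """Return (S0, SR, SL) as described for the time-domain configuration."""
    s1_d = delay_samples(s1, d)
    s2_d = delay_samples(s2, d)
    s0 = [a - b for a, b in zip(s1, s2)]      # S1 - S2
    sr = [b - a for a, b in zip(s1_d, s2)]    # S2 - delayed S1
    sl = [a - b for a, b in zip(s1, s2_d)]    # S1 - delayed S2
    return s0, sr, sl


# S2 is S1 delayed by one sample, so SR cancels completely.
sig1 = [1.0, 2.0, 3.0, 4.0]
sig2 = [0.0, 1.0, 2.0, 3.0]
sig0, sig_r, sig_l = time_domain_beams(sig1, sig2, d=1)
```

Each of the three outputs would then be passed to the frequency analysis unit 20 to obtain P0, PR, and PL.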

(4) Modification 4
The method of calculating the intensity μn(k) when the intensity XB(k) of the non-target sound spectrum QB is below the threshold XTH (S3: NO) is not limited to Expression (2). For example, the average (moving average) of the intensities XB(k) over a predetermined number of unit sections including the n-th unit section may be calculated as the intensity μn(k). That is, the number of noise spectra N (the number of unit sections) used to calculate the intensity μn(k) may be changed arbitrarily.
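As a sketch of the moving-average alternative, the class below keeps the most recent XB(k) values for one frequency and returns their mean as μn(k). The class name, the window length, and the choice to average over fewer values until the window fills are assumptions made for illustration; in the described apparatus this update would apply only in unit sections where XB(k) is below the threshold XTH.

```python
from collections import deque


class MovingAverageNoise:
    """Track mu_n(k) for one frequency as the mean of recent XB(k) values."""

    def __init__(self, window):
        if window < 1:
            raise ValueError("window must be at least 1")
        self._history = deque(maxlen=window)  # oldest value drops out automatically

    def update(self, xb):
        """Add XB(k) of the current unit section and return the new mean."""
        self._history.append(xb)
        return sum(self._history) / len(self._history)


estimator = MovingAverageNoise(window=3)
mu_values = [estimator.update(x) for x in [1.0, 2.0, 3.0, 4.0]]
```

A longer window smooths the estimate more strongly, at the cost of slower tracking of changes in the stationary noise.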

In the second embodiment, the method of calculating the intensity μn(k) when the intensity XB(k) exceeds the threshold XTH (S3: YES) is not limited to multiplying the past intensity μn-1(k) by the coefficient β (Expression (5)). For example, a configuration that calculates the sum of the past intensity μn-1(k) and a predetermined positive number as the intensity μn(k) may also be employed. In other words, a configuration is suitable in which, when the intensity XB(k) exceeds the threshold XTH, a value exceeding the intensity μn-1(k) of the past noise spectrum N is set as the intensity μn(k).
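The two ways of raising the estimate when XB(k) exceeds the threshold can be compared in a small sketch. The default values of `beta` and `delta` are hypothetical, and the requirement β > 1 is inferred from the statement that the new intensity should exceed μn-1(k); note that multiplication raises the estimate only when μn-1(k) is positive. In both cases XB(k) itself is not used.

```python
def raise_by_factor(mu_prev, beta=1.05):
    """Multiply the past intensity by beta, as described for Expression (5)."""
    if beta <= 1.0:
        raise ValueError("beta is assumed to exceed 1")
    return mu_prev * beta


def raise_by_offset(mu_prev, delta=0.01):
    """Alternative: add a predetermined positive number to the past intensity."""
    if delta <= 0.0:
        raise ValueError("delta must be positive")
    return mu_prev + delta


mu_mult = raise_by_factor(2.0, beta=1.5)
mu_add = raise_by_offset(2.0, delta=0.5)
```

The multiplicative form grows in proportion to the current estimate, whereas the additive form grows by a fixed step regardless of level.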

(5) Modification 5
A configuration in which the noise suppression spectrum QC generated by the noise suppression unit 44 is output to the inverse conversion unit 56 as the output spectrum R (that is, a configuration in which the adjustment unit 52 and the synthesis unit 54 are omitted) may also be employed. However, because the intensity at each non-target sound frequency FB in the noise suppression spectrum QC is zero, the reproduced sound generated from the noise suppression spectrum QC may sound unnatural. From the viewpoint of generating a natural reproduced sound, therefore, the configuration of FIG. 1, in which the non-target sound spectrum QB processed by the adjustment unit 52 is combined with the noise suppression spectrum QC, is preferable.

(6) Modification 6
In each of the above embodiments, the noise spectrum N is used to suppress the non-target stationary sound (that is, to generate the noise suppression spectrum QC). However, the use of the sound processing apparatus according to the present invention (the use of the noise spectrum N) is not limited to suppressing the non-target stationary sound. For example, the sound processing apparatus of the present invention is also suitable as an apparatus for extracting the non-target stationary sound from a mixture of the target sound, the non-target stationary sound, and the non-target fluctuating sound.

DESCRIPTION OF SYMBOLS 100 ... sound processing apparatus, 12 ... arithmetic processing device, 14 ... storage device, 20 ... frequency analysis unit, 30 ... sound source separation unit, 32 ... signal processing unit, 34 ... frequency selection unit, 36 ... intensity specifying unit, 42 ... noise estimation unit, 44 ... noise suppression unit, 50 ... signal synthesis unit, 52 ... adjustment unit, 54 ... synthesis unit, 56 ... inverse conversion unit.

Claims

A sound processing apparatus comprising:
sound source separation means for specifying, for each unit section and from a plurality of sound signals generated by a plurality of sound collecting devices, the intensity of the component at each non-target sound frequency, among a plurality of frequencies, at which a non-target sound arriving from a direction different from that of a target sound is dominant; and
noise estimation means for generating a noise spectrum for each unit section,
wherein the noise estimation means:
when the intensity of the component at one non-target sound frequency in a first unit section is below a threshold that exceeds the intensity at the one non-target sound frequency in the noise spectrum of a second unit section preceding the first unit section, sets the intensity at the one non-target sound frequency in the noise spectrum of the first unit section according to the intensity of the component at the one non-target sound frequency in the first unit section and the intensity at the one non-target sound frequency in the noise spectrum of the second unit section; and
when the intensity of the component at the one non-target sound frequency in the first unit section exceeds the threshold, sets the intensity at the one non-target sound frequency in the noise spectrum of the first unit section to a value exceeding the intensity at the one non-target sound frequency in the noise spectrum of the second unit section, without reflecting the intensity of the component at the one non-target sound frequency in the first unit section.

The sound processing apparatus according to claim 1, wherein, when the intensity of the component at the one non-target sound frequency in the first unit section exceeds the threshold, the noise estimation means sets, as the intensity at the one non-target sound frequency in the noise spectrum of the first unit section, the product of the intensity at the one non-target sound frequency in the noise spectrum of the second unit section and a coefficient exceeding 1.

The sound processing apparatus according to claim 1 or claim 2, wherein the sound source separation means generates a target sound spectrum composed of the components at each target sound frequency, among the plurality of frequencies, at which the target sound is dominant,
the sound processing apparatus further comprising noise suppression means for subtracting the noise spectrum from the target sound spectrum.

A program that causes a computer to execute:
sound source separation processing of specifying, for each unit section and from a plurality of sound signals generated by a plurality of sound collecting devices, the intensity of the component at each non-target sound frequency, among a plurality of frequencies, at which a non-target sound arriving from a direction different from that of a target sound is dominant; and
noise estimation processing of generating a noise spectrum for each unit section,
wherein the noise estimation processing:
when the intensity of the component at one non-target sound frequency in a first unit section is below a threshold that exceeds the intensity at the one non-target sound frequency in the noise spectrum of a second unit section preceding the first unit section, sets the intensity at the one non-target sound frequency in the noise spectrum of the first unit section according to the intensity of the component at the one non-target sound frequency in the first unit section and the intensity at the one non-target sound frequency in the noise spectrum of the second unit section; and
when the intensity of the component at the one non-target sound frequency in the first unit section exceeds the threshold, sets the intensity at the one non-target sound frequency in the noise spectrum of the first unit section to a value exceeding the intensity at the one non-target sound frequency in the noise spectrum of the second unit section, without reflecting the intensity of the component at the one non-target sound frequency in the first unit section.