JP2011015018A - Automatic sound volume controller

Info

Publication number: JP2011015018A
Application number: JP2009155488A
Authority: JP
Inventors: Takeshi Hashimoto; 武志橋本
Original assignee: Clarion Co Ltd
Current assignee: Faurecia Clarion Electronics Co Ltd
Priority date: 2009-06-30
Filing date: 2009-06-30
Publication date: 2011-01-20

Abstract

PROBLEM TO BE SOLVED: To enable a smooth, normal conversation without requiring a driver or a passenger to operate a volume control switch or a mute switch, or to speak in a loud voice. SOLUTION: The automatic volume control device 1 includes: voice signal extraction means 2, 3 for extracting acoustic signals by applying voice-band limiting and an adaptive algorithm to voice signals collected by microphones M1 and M2; volume correction means 4 for reducing the perceived volume fluctuation of music signals; voice analysis means 6 for obtaining a voice analysis gain in which the voice detection portions of the acoustic signals are weighted; voice detection means 7 for applying the voice analysis gain to the acoustic signals, thereby emphasizing the voice detection portions and obtaining a voice detection signal; and volume control means 8 for reducing the output level of the music signals when voice is detected, based on the voice detection signal.

Description

The present invention relates to an automatic volume control device, and more particularly to an automatic volume control device that enables smooth conversation by reducing the output level of music in response to speech when a conversation takes place in a space where music is playing.

In the cabin of a moving vehicle, music, radio programs, and the like are often played while driving. In such a situation, when the driver and a passenger converse, the reproduced sound of the music or the like may hinder smooth conversation (for example, by making speech hard to hear).

A typical in-vehicle audio device is provided with a volume control switch for adjusting the volume, a mute switch for temporarily reducing the volume, and the like (see, for example, Patent Document 1 and Patent Document 2). For this reason, the driver or another occupant has often operated the volume control switch or the mute switch to lower the playback volume of the music or the like to a level that does not interfere with conversation.

Patent Document 1: JP 2008-62906 A
Patent Document 2: JP 2006-67490 A

However, operating the volume control switch to reduce the reproduced sound every time a conversation takes place is cumbersome and may itself hinder smooth conversation. With the mute switch, on the other hand, the reproduced sound stays reduced even after the conversation has stopped, so the occupants cannot enjoy the music or radio program.

For this reason, occupants often converse with the music still playing, without operating the volume control switch or the mute switch, by speaking loudly enough to be understood. When such a conversation continues, however, not only the speaker but also the listener may find the conversation tiring.

The present invention has been made in view of the above problems. Its object is to provide an automatic volume control device that allows a driver or a passenger to converse smoothly in a normal manner, without operating a volume control switch or a mute switch and without raising their voice.

To solve the above problems, an automatic volume control device according to the present invention comprises: voice signal extraction means for extracting a voice-band voice signal as an acoustic signal by applying band-limiting processing for voice-band components, together with an adaptive algorithm, to a voice signal collected by a microphone; volume correction means for correcting the volume of a music signal from a sound source by dividing the music signal into bands and, when the signal level of the music signal is at or above a certain level, maintaining the signal level in each divided band at a constant value; voice analysis means for obtaining, from the acoustic signal extracted by the voice signal extraction means, a voice analysis gain in which the voice detection portions are weighted; voice detection means for applying the voice analysis gain obtained by the voice analysis means to the acoustic signal extracted by the voice signal extraction means, thereby emphasizing the voice detection portions of the acoustic signal and obtaining a voice detection signal that indicates whether voice is detected; and volume control means for reducing, when voice is detected, the output level of the music signal whose volume has been corrected by the volume correction means, based on the voice detection signal obtained by the voice detection means.

According to the automatic volume control device of the present invention, the output level of the music signal whose volume has been corrected by the volume correction means is reduced when voice is detected, based on the voice detection signal obtained by the voice detection means. The signal level of the music signal can therefore be reduced automatically in response to the conversation (speech) of the occupants, and a smooth conversation becomes possible without operating a volume control switch or a mute switch each time a conversation takes place.

In particular, the volume correction means divides the music signal from the sound source into bands and, when the signal level of the music signal is at or above a certain level, maintains the signal level in each divided band at a constant value, which reduces the perceived volume fluctuation of the music signal. Because the output level reduction at the time of voice detection is applied to a music signal whose volume fluctuation has already been reduced in this way, the music volume can be lowered accurately and without sounding unnatural, regardless of the source or genre of the sound source.
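The per-band level holding described above can be sketched as follows. This is only a minimal illustration, assuming that each band's level is already available as a single number and that "maintaining a constant value" means scaling any level at or above the hold level down to that hold level. The function name `correct_band_levels` and the value of `hold_level` are hypothetical and do not come from the patent.

```python
def correct_band_levels(band_levels, hold_level=0.25):
    """Return one gain per band.

    A band whose level is at or above hold_level (hypothetical value) is
    scaled so that its level becomes exactly hold_level; quieter bands are
    passed through unchanged with a gain of 1.0.
    """
    gains = []
    for level in band_levels:
        if level > 0.0 and level >= hold_level:
            gains.append(hold_level / level)
        else:
            gains.append(1.0)
    return gains


# Low, mid, and high band levels of one frame (illustrative numbers only).
levels = [0.5, 0.25, 0.1]
gains = correct_band_levels(levels)
corrected = [level * gain for level, gain in zip(levels, gains)]
print(gains, corrected)
```

In this sketch the loud low band is pulled down to the hold level while the quiet high band is left alone, which is one simple way to obtain a music signal with less level fluctuation before any voice-triggered reduction is applied.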

Furthermore, when the voice detection means obtains the voice detection signal indicating whether voice is detected, it applies the voice analysis gain, in which the voice detection portions are weighted, to the voice-band signal extracted by the voice signal extraction means. This emphasizes the voice detection portions of the acoustic signal before the presence or absence of voice is determined, so the accuracy of the voice detection signal can be increased.
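The weighting-then-thresholding idea can be illustrated with a short sketch. It assumes the extracted acoustic signal is represented as per-frame levels and the voice analysis gain as per-frame weights. The function name `detect_voice` and the threshold value are hypothetical; this section of the patent does not state how the threshold is chosen.

```python
def detect_voice(envelope, analysis_gain, threshold=0.5):
    """Return a binary voice detection signal (1 = voice, 0 = no voice).

    envelope      -- per-frame level of the extracted acoustic signal
    analysis_gain -- per-frame weight that emphasizes voice-like frames
    threshold     -- hypothetical detection threshold (an assumption)
    """
    if len(envelope) != len(analysis_gain):
        raise ValueError("envelope and analysis_gain must have equal length")
    return [1 if level * gain > threshold else 0
            for level, gain in zip(envelope, analysis_gain)]


envelope = [0.2, 0.4, 0.4, 0.2]
analysis_gain = [1.0, 2.0, 1.0, 1.0]  # second frame is weighted as voice-like
print(detect_voice(envelope, analysis_gain))  # prints [0, 1, 0, 0]
```

The second and third frames have the same raw level, but only the weighted frame crosses the threshold, which is the effect the weighting is meant to have.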

In the automatic volume control device described above, the voice detection means may further include: tempo detection means for integrating, based on the voice detection signal, the voice detection values obtained at predetermined intervals in the voice detection portions, thereby obtaining the change in the voice detection state over a predetermined time, and for calculating from the resulting integral a tempo detection value used to judge the speaker's speaking speed; and adaptive attack release filter means for determining, based on the tempo detection value obtained by the tempo detection means, an attack time corresponding to the voice detection time and a release time corresponding to the voice detection holding time, and for setting the determined attack time and release time for the voice detection signal.

Because the tempo detection means of the voice detection means calculates the tempo detection value from the change in the voice detection state over a predetermined time, the speaker's speaking speed can be judged from the tempo detection value. Accordingly, by having the adaptive attack release filter means determine the attack time and the release time from the tempo detection value, the voice detection time (fade-in time) and the holding time (fade-out time) can be varied according to the speaker's speaking speed (tempo), and the volume can be controlled without sounding unnatural.

In the automatic volume control device described above, the tempo detection means may clear the integral obtained by the integration processing in response to an input reset signal, thereby recalculating the tempo detection value used to judge the speaker's speaking speed.

Recalculating the tempo detection value in response to the reset signal means that the voice detection time (fade-in time) and the holding time (fade-out time), which are determined from the tempo detection value, are calculated (learned) again. The voice detection time and the holding time can thus be controlled so that they change appropriately, which reduces the discomfort caused by unexpected volume changes.
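As a rough illustration of integrating the voice detection signal and clearing the integral on reset, the sketch below uses a leaky integrator (a one-pole low-pass filter) over the binary detection values. The class name `TempoDetector`, the smoothing constant `alpha`, and the choice of a leaky integrator are all assumptions made for illustration; the gain and offset stages that the patent applies around the integration are not modelled.

```python
class TempoDetector:
    """Leaky integration of a binary voice detection signal (illustrative)."""

    def __init__(self, alpha=0.1):
        if not 0.0 < alpha <= 1.0:
            raise ValueError("alpha must be in (0, 1]")
        self.alpha = alpha      # hypothetical smoothing constant
        self.integral = 0.0     # running integral of the detection values

    def update(self, detected):
        """Feed one detection value (0 or 1) and return the new integral."""
        self.integral += self.alpha * (float(detected) - self.integral)
        return self.integral

    def reset(self):
        """Clear the integral so the tempo value is learned again."""
        self.integral = 0.0


def final_value(pattern, repeats=25):
    """Integral reached after feeding a repeating detection pattern."""
    detector = TempoDetector()
    for _ in range(repeats):
        for detected in pattern:
            detector.update(detected)
    return detector.integral


sparse = final_value([1, 0, 0, 0])  # voice detected a quarter of the time
dense = final_value([1, 1, 1, 0])   # voice detected most of the time
print(sparse < dense)
```

In this sketch the integral simply tracks how much of the recent past contained detected voice. How the patent converts the integral into a speaking-speed estimate is not described in this section.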

In the automatic volume control device described above, the adaptive attack release filter means may have a variable mode, in which the attack time and the release time are set based on the tempo detection value, and a fixed mode, in which the attack time and the release time are set to predetermined values regardless of the tempo detection value.

In this way, the automatic volume control device according to the present invention lets the user choose whether the voice detection time (fade-in time) and the holding time (fade-out time) determined by the adaptive attack release filter means vary with the speaker's speaking speed or are set to predetermined values regardless of it. The optimum voice detection time (fade-in time) and holding time (fade-out time) can therefore be set according to the user's situation and preference. In particular, if the fixed mode is configured so that the user can set the voice detection time (fade-in time) and the holding time (fade-out time) as desired, the device can respond closely to the user's needs.
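The two modes can be sketched with a simple one-pole attack/release smoother. Everything numeric here is an assumption: the time values, the frame rate of 100 frames per second, the tempo value being normalized to the range 0 to 1, and in particular the direction of the variable-mode mapping (faster speech giving shorter times). The names `choose_times`, `smoothing_coeff`, and `attack_release` are hypothetical.

```python
import math


def choose_times(mode, tempo, fixed=(0.05, 1.0), slow=(0.10, 2.0), fast=(0.02, 0.5)):
    """Return (attack_s, release_s) for the given mode.

    fixed mode    -- always the preset pair, regardless of tempo
    variable mode -- linear blend between the 'slow' and 'fast' pairs,
                     driven by a tempo value clamped to [0, 1] (assumption)
    """
    if mode == "fixed":
        return fixed
    if mode == "variable":
        t = min(max(tempo, 0.0), 1.0)
        return tuple(s + t * (f - s) for s, f in zip(slow, fast))
    raise ValueError("mode must be 'fixed' or 'variable'")


def smoothing_coeff(time_s, frame_rate):
    """One-pole coefficient that covers about 63% of a step in time_s."""
    return 1.0 - math.exp(-1.0 / (time_s * frame_rate))


def attack_release(detections, attack_s, release_s, frame_rate=100.0):
    """Smooth a binary detection signal with separate rise and fall times."""
    attack = smoothing_coeff(attack_s, frame_rate)
    release = smoothing_coeff(release_s, frame_rate)
    y = 0.0
    out = []
    for d in detections:
        coeff = attack if d > y else release  # rising uses attack, falling uses release
        y += coeff * (d - y)
        out.append(y)
    return out


attack_s, release_s = choose_times("fixed", tempo=0.9)
control = attack_release([1] * 50 + [0] * 50, attack_s, release_s)
print(round(control[49], 3), round(control[99], 3))
```

With the fixed-mode values used here, the control signal rises almost fully within half a second of detected voice and has only partly decayed half a second after the voice stops, which corresponds to a quick fade-in and a slower fade-out.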

In the automatic volume control device described above, the voice signal extraction means may include: array microphone means for extracting a voice-band voice signal by applying an NLMS adaptive algorithm to the voice signal collected by the microphone after performing first band-limiting processing corresponding to the voice-band components; and audio canceller means for performing second band-limiting processing corresponding to the voice-band components on the voice-band voice signal extracted by the array microphone means, and then applying a multi-stage LMS adaptive algorithm to the resulting signal using adaptive filters that are cascade-connected in a number corresponding to the number of channels of the music signal.

By applying the NLMS adaptive algorithm in the array microphone means, and then applying the LMS adaptive algorithm in multiple stages in the audio canceller means using adaptive filters cascade-connected according to the number of channels of the music signal, signal components other than the voice component (noise components) in the voice signal can be reduced effectively while good convergence is maintained. This improves the accuracy with which the voice signal is detected in the voice band.

In the automatic volume control device described above, the band limit of the second band-limiting processing in the audio canceller means may be set so that it contains the upper and lower limits of the band limit of the first band-limiting processing in the array microphone means and is slightly wider than that band.

Setting the band of the second band-limiting processing so that it contains the upper and lower limits of the band of the first band-limiting processing and is slightly wider than it improves the audio cancellation performance near the cutoff frequencies of the band limitation.

In the automatic volume control device described above, the volume control means may change the amount by which the output level of the music signal is reduced according to the volume state of the sound source.

Changing the amount of output level reduction according to the volume state of the sound source allows the volume control to follow that state, so the signal level (volume) of the music signal can be changed appropriately.

According to the automatic volume control device of the present invention, the output level of the music signal is reduced when voice is detected, so the signal level of the music signal can be reduced automatically in response to the conversation (speech) of the occupants. A smooth conversation therefore becomes possible without operating a volume control switch or a mute switch each time a conversation takes place.

FIG. 1 is a block diagram showing the schematic configuration of the automatic volume control device according to the present embodiment.
FIG. 2 is a block diagram showing the schematic configuration of the array microphone unit according to the present embodiment.
FIG. 3 is a diagram showing microphones and speakers installed in a vehicle cabin.
FIG. 4 is a diagram visualizing the directivity of microphone M1 and microphone M2 according to the present embodiment, and the emphasized directivity obtained from the difference between the directivities of the two microphones.
FIG. 5 is a block diagram showing the schematic configuration of the audio canceller unit according to the present embodiment.
FIG. 6 is a diagram showing the frequency characteristics of the acoustic signal collected by the microphone according to the present embodiment (omnidirectional microphone), the acoustic signal after the adaptive filter unit of the array microphone unit has been applied (array microphone unit), the acoustic signal after the first adaptive filter unit of the audio canceller unit has been applied (array microphone unit + audio canceller unit (L)), and the acoustic signal after the second adaptive filter unit has been applied (array microphone unit + audio canceller unit (L+R)).
FIG. 7 shows filter coefficients: (a) those applied in the adaptive filter unit of the array microphone unit, (b) those applied in the first adaptive filter unit of the audio canceller unit, and (c) those applied in the second adaptive filter unit of the audio canceller unit.
FIG. 8 is a block diagram showing the schematic configuration of the volume correction unit according to the present embodiment.
FIG. 9 is a block diagram showing the functional units used in the three-band bandpass filter unit according to the present embodiment to divide the L-channel music signal L1 into three frequency bands: low, mid, and high.
FIG. 10 is a diagram showing the filter characteristics of the first low-pass filter unit and the second low-pass filter unit of the three-band bandpass filter unit according to the present embodiment.
FIG. 11 is a diagram showing how the maximum value detection and maximum value hold unit according to the present embodiment detects the maximum value of the input signal every 2 msec and holds that maximum value for 16 msec.
FIG. 12 is a block diagram showing the schematic configuration of the gain calculation unit according to the present embodiment.
FIG. 13 is a diagram showing an example of the level conversion operation of the first to third lookup table units according to the present embodiment.
FIG. 14 is a block diagram showing the schematic configuration of the gain setting unit according to the present embodiment.
FIG. 15 is a diagram showing, with the low-band signal of the volume correction unit according to the present embodiment as a reference, the maximum value hold signal (control signal) output from the maximum value detection and maximum value hold unit, the output signal of the first attack release filter unit of the gain calculation unit, and the low-band control signal output from the gain calculation unit.
FIG. 16 is a diagram showing, with the mid-band signal of the volume correction unit according to the present embodiment as a reference, the maximum value hold signal (control signal) output from the maximum value detection and maximum value hold unit, the output signal of the second attack release filter unit of the gain calculation unit, and the mid-band control signal output from the gain calculation unit.
FIG. 17 is a diagram showing the content of FIG. 15 when the signal level of the sound source is low.
FIG. 18 shows signal states: (a) when volume correction is not performed by the volume correction unit according to the present embodiment, and (b) when volume correction is performed by the volume correction unit.
FIG. 19 is a block diagram showing the schematic configuration of the voice analysis unit according to the present embodiment.
FIG. 20 shows (a) the music signal output from the sound source according to the present embodiment, (b) the voice signal collected by the microphone according to the present embodiment, and (c) the music signal after volume control has been performed by the automatic volume control device according to the present embodiment.
FIG. 21 shows, for the case where the music signal shown in FIG. 20(a) and the voice signal shown in FIG. 20(b) are input to the microphone, (a) the voice analysis value for the signal input from the audio canceller unit to the voice analysis unit, and (b) the voice analysis gain output from the voice analysis unit.
FIG. 22 is a block diagram showing the schematic configuration of the voice detection unit according to the present embodiment.
FIG. 23 is a diagram showing, for the case where the music signal shown in FIG. 20(a) and the voice signal shown in FIG. 20(b) are input to the microphone, the input signal of the voice detection threshold unit (voice signal × voice analysis gain) and the voice detection threshold set by the voice detection threshold unit.
FIG. 24 is a block diagram showing the schematic configuration of the tempo detection unit according to the present embodiment.
FIG. 25 shows (a) the voice detection signal binarized by threshold detection in the voice detection threshold unit according to the present embodiment, (b) the signal output from the tempo gain unit based on the binarized voice detection signal, and (c) the integral (integration output) obtained in the low-pass filter unit according to the present embodiment and the tempo detection value output by the gain offset unit.
FIG. 26 shows (a) the voice detection signal input from the voice detection threshold unit according to the present embodiment in the fixed mode, and (b) the voice holding filter of the adaptive attack release filter unit and the voice holding threshold of the voice holding threshold unit.
FIG. 27 shows (a) the voice interval detection value obtained from the signal states shown in FIGS. 26(a) and 26(b), (b) the output signal of the attack release filter unit for the voice detection signal shown in (a), and (c) the volume control value in the level limiter unit.
FIG. 28 is a block diagram showing the schematic configuration of the volume control unit according to the present embodiment.
FIG. 29 is a diagram showing several types of volume control level that change with increases and decreases in the volume information (volume adjustment amount) in the level calculation unit according to the present embodiment.
FIG. 30 is a diagram showing the relationship between the attack time (voice detection time) and the release time (holding time) determined according to the tempo detection value in the tempo detection unit according to the present embodiment.
FIG. 31 shows (a) the detection state of the voice detection signal when the speech segments are short and the speech intervals are very long, (b) an example of the operation of the voice holding filter of the adaptive attack release filter unit in the fixed mode, and (c) an example of the adaptive operation of the voice holding filter of the adaptive attack release filter unit in the variable mode.
FIG. 32 shows (a) the detection state of the voice detection value when the speech segments are somewhat short and the speech intervals are somewhat long, (b) an example of the operation of the voice holding filter of the adaptive attack release filter unit in the fixed mode, and (c) an example of the adaptive operation of the voice holding filter of the adaptive attack release filter unit in the variable mode.

The automatic volume control device according to the present invention is described in detail below with reference to the drawings.

FIG. 1 is a block diagram showing the schematic configuration of the automatic volume control device 1 according to the present embodiment. The present embodiment is described using, as an example, the case where the automatic volume control device 1 is installed in a vehicle. Installing the automatic volume control device according to the present invention in a vehicle makes it possible to automatically reduce the volume of the music output from the in-vehicle audio device depending on whether a conversation is taking place.

As shown in FIG. 1, the automatic volume control device 1 according to the present embodiment is broadly composed of an array microphone unit (array microphone means, voice signal extraction means) 2, an audio canceller unit (audio canceller means, voice signal extraction means) 3, a volume correction unit (volume correction means) 4, a main volume unit 5, a voice analysis unit (voice analysis means) 6, a voice detection unit (voice detection means) 7, a volume control unit (volume control means) 8, a power amplifier unit 9, microphones M1 and M2, and speakers 10a and 10b.

[Array microphone unit]
FIG. 2 is a block diagram showing the schematic configuration of the array microphone unit 2. As shown in FIG. 2, the array microphone unit 2 includes a first bandpass filter unit 21, a second bandpass filter unit 22, a delay unit 23, and an adaptive filter unit 24.

The first bandpass filter unit 21 and the second bandpass filter unit 22 limit the acoustic signals input via microphone M1 and microphone M2 to a band of about 400 Hz to 2.4 kHz. Accordingly, of the acoustic signals input via microphones M1 and M2, only the components corresponding to the voice band pass through the first bandpass filter unit 21 and the second bandpass filter unit 22.
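A band limitation of roughly 400 Hz to 2.4 kHz can be illustrated with a single second-order (biquad) bandpass section. The patent does not state the filter type or order, so the biquad design below (the widely used constant-peak-gain bandpass form), the 16 kHz sample rate, and the function names `bandpass_coeffs`, `biquad`, and `rms_after_filter` are assumptions made for illustration only.

```python
import math


def bandpass_coeffs(low_hz, high_hz, sample_rate):
    """Biquad bandpass coefficients with 0 dB gain at the center frequency."""
    f0 = math.sqrt(low_hz * high_hz)      # geometric center of the band
    q = f0 / (high_hz - low_hz)           # quality factor from the bandwidth
    w0 = 2.0 * math.pi * f0 / sample_rate
    alpha = math.sin(w0) / (2.0 * q)
    a0 = 1.0 + alpha
    b = (alpha / a0, 0.0, -alpha / a0)                       # feed-forward
    a = (-2.0 * math.cos(w0) / a0, (1.0 - alpha) / a0)       # feedback
    return b, a


def biquad(samples, b, a):
    """Apply the difference equation of one biquad section."""
    x1 = x2 = y1 = y2 = 0.0
    out = []
    for x in samples:
        y = b[0] * x + b[1] * x1 + b[2] * x2 - a[0] * y1 - a[1] * y2
        x2, x1 = x1, x
        y2, y1 = y1, y
        out.append(y)
    return out


def rms_after_filter(freq_hz, sample_rate=16000, n=3200):
    """RMS of a unit sine at freq_hz after filtering (second half only)."""
    b, a = bandpass_coeffs(400.0, 2400.0, sample_rate)
    tone = [math.sin(2.0 * math.pi * freq_hz * i / sample_rate) for i in range(n)]
    tail = biquad(tone, b, a)[n // 2:]
    return math.sqrt(sum(v * v for v in tail) / len(tail))


for freq in (50.0, 1000.0, 7000.0):
    print(freq, round(rms_after_filter(freq), 3))
```

A 1 kHz tone inside the voice band passes almost unchanged, while tones at 50 Hz and 7 kHz are strongly attenuated. A real implementation would likely use a steeper, higher-order filter.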

The delay unit 23 delays the acoustic signal on the microphone M1 side so that it is aligned for the signal subtraction performed in the adaptive filter unit 24. For this reason, the delay unit 23 is applied only to the acoustic signal of microphone M1 that has been band-limited by the first bandpass filter unit 21. The acoustic signal delayed by the delay unit 23 is input to the adaptive filter unit 24.

The adaptive filter unit 24 subtracts the acoustic signal input from the microphone M2 from the acoustic signal that was input from the microphone M1 and delayed by the delay unit 23.

The adaptive filter unit 24 includes an FIR (Finite Impulse Response Filter) unit 25, an NLMS (Normalized Least Mean Square) unit 26, and an adder unit 27.

The FIR unit 25 has a finite impulse response filter and filters the acoustic signal picked up by the microphone M2 under the coefficient control performed by the NLMS unit 26. The adder unit 27 adds the phase-inverted acoustic signal from the microphone M2, filtered by the FIR unit 25, to the acoustic signal from the microphone M1 delayed by the delay unit 23 (in effect, it subtracts the filtered M2 signal from the M1 signal). The acoustic signal produced by the adder unit 27 is output from the adaptive filter unit 24 and is also fed to the NLMS unit 26.

The NLMS unit 26 controls the filter coefficients of the FIR unit 25 according to a least squares algorithm, based on the acoustic signal obtained from the adder unit 27 (the M1 acoustic signal minus the filtered M2 acoustic signal) and the acoustic signal picked up by the microphone M2. Providing the NLMS unit 26 in the adaptive filter unit 24 makes it possible to apply the NLMS algorithm, whose adaptation speed does not depend on the magnitude of the input signal.
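The structure described above (FIR filter on the M2 signal, subtraction from the delayed M1 signal, NLMS coefficient update driven by the result) can be sketched roughly as follows. This is a minimal illustration in Python, not the patented implementation: the function name `nlms_cancel`, the 16-tap length, the step size `mu`, and the synthetic two-sample acoustic path are all assumptions chosen for the example.

```python
import random

def nlms_cancel(primary, reference, taps=16, mu=0.5, eps=1e-8):
    """Subtract an adaptively filtered reference from the primary signal.

    primary   -- delayed M1 samples (assumed input)
    reference -- M2 samples fed to the FIR filter (assumed input)
    Returns the error signal e[n] = primary[n] - y[n].
    """
    w = [0.0] * taps            # FIR coefficients (role of FIR unit 25)
    buf = [0.0] * taps          # most recent reference samples, newest first
    out = []
    for d, x in zip(primary, reference):
        buf = [x] + buf[:-1]
        y = sum(wi * bi for wi, bi in zip(w, buf))   # FIR output
        e = d - y                                     # role of adder unit 27
        norm = sum(b * b for b in buf) + eps          # reference power in the buffer
        step = mu * e / norm                          # step normalized by input power
        w = [wi + step * bi for wi, bi in zip(w, buf)]
        out.append(e)
    return out

random.seed(0)
ref = [random.uniform(-1.0, 1.0) for _ in range(4000)]
# Hypothetical acoustic path: primary is the reference scaled by 0.8 and delayed 2 samples.
prim = [0.0, 0.0] + [0.8 * r for r in ref[:-2]]
err = nlms_cancel(prim, ref)
head = sum(e * e for e in err[:200]) / 200     # error power while adapting
tail = sum(e * e for e in err[-200:]) / 200    # error power after convergence
```

Dividing the update by the buffered input power is what makes the adaptation speed independent of the input level: scaling both signals by the same factor leaves the coefficient trajectory essentially unchanged.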

As shown in FIG. 3, the microphones M1 and M2 are mounted on the sun visors located above the driver's seat 28a and the front passenger seat 28b of the vehicle 28. The microphones M1 and M2 are used to pick up conversation in the vehicle cabin. As shown in FIG. 4, an omnidirectional microphone is used for the microphone M1 and a unidirectional microphone is used for the microphone M2. The sound picked up by the omnidirectional microphone M1 and the sound picked up by the directional microphone M2 are each input to the array microphone unit 2.

When the sound picked up by the omnidirectional microphone M1 and the sound picked up by the directional microphone M2 are input to the array microphone unit 2, the adaptive filter unit 24 subtracts the M2 acoustic signal from the M1 acoustic signal. What remains after the subtraction is the sound from the null direction of the microphone M2 (the directions outside the directional range of the microphone M2), so the directivity toward that direction is emphasized.
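The effect of subtracting a directional pattern from an omnidirectional one can be illustrated with idealized polar patterns. The sketch below assumes an ideal first-order cardioid for M2 and a unit-gain subtraction with no adaptive filtering; neither assumption comes from the text, which only states that M2 is unidirectional.

```python
import math

def omni(theta):
    # Ideal omnidirectional pattern: equal sensitivity in every direction
    return 1.0

def cardioid(theta):
    # Ideal first-order cardioid facing theta = 0 (assumed pattern for M2)
    return 0.5 * (1.0 + math.cos(theta))

def difference(theta):
    # Omnidirectional minus cardioid: sensitivity that survives the subtraction
    return omni(theta) - cardioid(theta)

for deg in (0, 90, 180):
    print(deg, round(difference(math.radians(deg)), 3))
```

Under these assumptions the difference equals 0.5 (1 - cos θ): it vanishes in the direction the cardioid faces and reaches its maximum at 180 degrees, the cardioid's null, which matches the statement that the null direction of M2 is what remains.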

Accordingly, by installing the microphones M1 and M2 so that the speaker is located in the direction in which the directivity is emphasized, the speaker's voice can be picked up effectively. Because the picked-up voice is emphasized in this way, the acoustic signal obtained in the array microphone unit 2 has an improved D/U, that is, an improved ratio of the speaker's voice (desired signal D) to the music output from the in-vehicle audio device (undesired signal U).

Note that the configurations of the array microphone unit 2 and the microphones M1 and M2 are not limited to those described in the present embodiment. Any other configuration may be used as long as it emphasizes the directivity toward the speaker's voice and can improve the D/U.

As shown in FIG. 3, the vehicle 28 has speakers at four locations: the right front door, the left front door, the right rear door, and the left rear door. The speakers in the right front door and the right rear door (these correspond to the speaker 10a) output a music signal in which the acoustic effect of the right-side component has been emphasized in the power amplifier unit 9 (right music signal R5), and the speakers in the left front door and the left rear door (these correspond to the speaker 10b) output a music signal in which the acoustic effect of the left-side component has been emphasized in the power amplifier unit 9 (left music signal L5).

[Audio canceller unit]
Next, the audio canceller unit 3 will be described. FIG. 5 is a block diagram showing the schematic configuration of the audio canceller unit 3. As shown in FIG. 5, the audio canceller unit 3 includes a first bandpass filter unit 31, a second bandpass filter unit 32, a first delay unit 33, a second delay unit 34, a first adaptive filter unit 35, and a second adaptive filter unit 36.

The first bandpass filter unit 31 and the second bandpass filter unit 32 apply a band limitation of approximately 200 Hz to 2.6 kHz to the two-channel music signals that have passed through the volume correction unit 4, namely the left music signal L2 and the right music signal R2, so that mainly the voice-band components of these signals are passed.

In the audio canceller unit 3, the passband set in the first bandpass filter unit 31 and the second bandpass filter unit 32 (approximately 200 Hz to 2.6 kHz) is made wider than the passband set in the first bandpass filter unit 21 and the second bandpass filter unit 22 of the array microphone unit 2 (approximately 400 Hz to 2.4 kHz), while still containing the 400 Hz to 2.4 kHz range. This improves the audio cancellation performance near the cutoff frequencies of the array microphone unit 2, that is, around 400 Hz and 2.4 kHz.

The first delay unit 33 and the second delay unit 34 delay the signals that have been band-limited by the first bandpass filter unit 31 and the second bandpass filter unit 32. This delay processing makes it possible to compensate for the propagation delay of the acoustic signal input through the array microphone unit 2.

The first adaptive filter unit 35 is roughly composed of a first FIR unit 37, a first LMS unit 39, and a first adder unit 41, and the second adaptive filter unit 36 is roughly composed of a second FIR unit 38, a second LMS unit 40, and a second adder unit 42. The first adaptive filter unit 35 and the second adaptive filter unit 36 correspond to the adaptive filter unit 24 of the array microphone unit 2 with the NLMS unit 26 replaced by the first LMS unit 39 and the second LMS unit 40, respectively.

In the first adaptive filter unit 35 and the second adaptive filter unit 36, the first LMS unit 39 and the second LMS unit 40 use the standard LMS (Least Mean Square) algorithm to subtract, in turn, the music signal L2 and the music signal R2, whose volume has been corrected by the volume correction unit 4, from the acoustic signal input from the array microphone unit 2. The specific configurations of the first adaptive filter unit 35 and the second adaptive filter unit 36 are the same as that of the adaptive filter unit 24 of the array microphone unit 2 shown in FIG. 2, so a detailed description is omitted here.

In the audio canceller unit 3, the first adaptive filter unit 35 and the second adaptive filter unit 36 are connected in cascade, as shown in FIG. 5. The first adaptive filter unit 35 subtracts the music signal L2 from the acoustic signal input from the array microphone unit 2, and the second adaptive filter unit 36 then subtracts the music signal R2 from the output of the first adaptive filter unit 35. In this arrangement the first adaptive filter unit 35 must converge faster than the second adaptive filter unit 36, so its adaptation speed is set higher. When the sound source has two or more channels, the same effect can be obtained by increasing the number of adaptive filter units to match the number of channels.
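The cascade can be sketched as two plain LMS stages, the first referenced to the left music channel and the second to the right. The code below is an illustrative Python sketch under stated assumptions: the function names, the 8-tap length, the step sizes, and the one-tap "room" that mixes the two channels into the microphone are hypothetical and are not taken from the text, which uses 192-tap filters.

```python
import random

def lms_stage(primary, reference, taps, mu):
    """One LMS canceller stage: returns primary minus the filtered reference."""
    w = [0.0] * taps
    buf = [0.0] * taps
    out = []
    for d, x in zip(primary, reference):
        buf = [x] + buf[:-1]
        e = d - sum(wi * bi for wi, bi in zip(w, buf))
        w = [wi + mu * e * bi for wi, bi in zip(w, buf)]   # plain (unnormalized) LMS update
        out.append(e)
    return out

def power(x):
    return sum(v * v for v in x) / len(x)

random.seed(1)
n = 6000
music_l = [random.uniform(-1.0, 1.0) for _ in range(n)]   # uncorrelated test channels
music_r = [random.uniform(-1.0, 1.0) for _ in range(n)]
# Hypothetical room: each channel reaches the microphone with a single gain.
mic = [0.6 * l + 0.4 * r for l, r in zip(music_l, music_r)]

stage1 = lms_stage(mic, music_l, taps=8, mu=0.05)     # first stage: larger step, faster
stage2 = lms_stage(stage1, music_r, taps=8, mu=0.02)  # second stage: smaller step

residual = power(stage2[-500:])
original = power(mic[-500:])
```

The first stage is given the larger step size so that its output settles before the second stage has to track it, mirroring the requirement that the first adaptive filter unit converge faster.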

FIG. 6 shows, for the vehicle 28 illustrated in FIG. 3, the frequency characteristics of the acoustic signal picked up by the microphone M1 and of the output signals of the adaptive filter unit 24, the first adaptive filter unit 35, and the second adaptive filter unit 36 when the array microphone unit 2 and the audio canceller unit 3 are operating. Specifically, FIG. 6 shows the acoustic signal of the microphone M1 before the adaptive filter unit 24 of the array microphone unit 2 is applied (the curve labeled "omnidirectional microphone"), the acoustic signal after the adaptive filter unit 24 is applied (the curve labeled "array microphone unit"), the acoustic signal after the first adaptive filter unit 35 of the audio canceller unit 3 is applied (the curve labeled "array microphone unit + audio canceller unit (L)"), and the acoustic signal after the second adaptive filter unit 36 of the audio canceller unit 3 is applied (the curve labeled "array microphone unit + audio canceller unit (L+R)").

In the case shown in FIG. 6, long-period M-sequence signals are used as the music signals L1 and R1 output from the in-vehicle audio device, and the music signals L1 and R1 are uncorrelated with each other. FIG. 7(a) shows the filter coefficients applied in the adaptive filter unit 24 of the array microphone unit 2, FIG. 7(b) shows those applied in the first adaptive filter unit 35 of the audio canceller unit 3, and FIG. 7(c) shows those applied in the second adaptive filter unit 36 of the audio canceller unit 3. Specifically, the FIR filter length of the FIR unit 25 in the adaptive filter unit 24 is 128 taps, the FIR filter lengths in the first adaptive filter unit 35 and the second adaptive filter unit 36 are 192 taps each, and the sampling frequency in each FIR unit is set to 6 kHz.
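As a quick worked example, the tap counts and the 6 kHz sampling frequency quoted above determine how long an impulse response each filter can model. The helper name `span_ms` is an assumption made for this illustration.

```python
FS_HZ = 6000            # sampling frequency of the FIR units, from the text

def span_ms(taps, fs_hz=FS_HZ):
    """Time span covered by an FIR filter of the given length, in milliseconds."""
    return 1000.0 * taps / fs_hz

array_span = span_ms(128)      # adaptive filter unit 24
canceller_span = span_ms(192)  # adaptive filter units 35 and 36
```

The 128-tap filter spans roughly 21.3 msec and the 192-tap filters span 32 msec. A 6 kHz sampling frequency also places the Nyquist limit at 3 kHz, above the 2.6 kHz upper edge of the audio canceller passband.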

Comparing the frequency characteristics before and after each of the adaptive filter units 24, 35, and 36 is applied, FIG. 6 shows that the output signal level of the array microphone unit 2 is attenuated by about 10 dB relative to the output signal level of the omnidirectional microphone M1. The output level of the first adaptive filter unit 35 of the audio canceller unit 3 is attenuated by about a further 8 dB relative to the output signal level of the array microphone unit 2, and the output signal level of the second adaptive filter unit 36 is attenuated by about 11 dB relative to the output level of the first adaptive filter unit 35.

Thus, compared with the signal level before the adaptive filter unit 24 is applied, the signal level after the adaptive filter unit 24, the first adaptive filter unit 35, and the second adaptive filter unit 36 have all been applied is attenuated by nearly 30 dB in total, and the D/U is greatly improved as a result. In the filter coefficients of the adaptive filters shown in FIGS. 7(a) to 7(c), the response of the FIR filter is still continuing, so a further improvement in D/U can be expected from a longer filter tap length.
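The "nearly 30 dB" figure follows from adding the three approximate stage attenuations quoted for FIG. 6. The short calculation below also converts the total to a linear ratio, assuming the levels are amplitude levels (20 log10), which the text does not state explicitly.

```python
steps_db = [10.0, 8.0, 11.0]   # approximate per-stage attenuations quoted for FIG. 6
total_db = sum(steps_db)        # attenuations in dB add across cascaded stages
amplitude_ratio = 10.0 ** (-total_db / 20.0)   # assumes amplitude (20*log10) levels
```

The sum is 29 dB, which under the amplitude assumption corresponds to the residual music being roughly 3.5 percent of its original amplitude.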

[Volume correction unit]
Next, the volume correction unit 4 will be described. FIG. 8 is a block diagram showing the schematic configuration of the volume correction unit 4. As shown in FIG. 8, the volume correction unit 4 includes a three-band bandpass filter unit 51, a maximum value detection and maximum value hold unit 52, a gain calculation unit 53, a delay unit 54, and a gain setting unit 55.

The three-band bandpass filter unit 51 divides each of the L-channel music signal L1 and the R-channel music signal R1 output from the in-vehicle audio device into three frequency bands: low, middle, and high.

FIG. 9 is a block diagram showing the functional units of the three-band bandpass filter unit 51 that divide the L-channel music signal L1 into the low, middle, and high bands. These functional units are roughly composed of a first low-pass filter unit 56, a second low-pass filter unit 57, a delay unit 58, and two adder units 59 and 60. For convenience, FIG. 9 shows only the functional units for the L-channel music signal L1, but the actual three-band bandpass filter unit 51 also has a corresponding configuration for dividing the R-channel music signal R1.

The first low-pass filter unit 56 and the second low-pass filter unit 57 are FIR low-pass filters with the filter characteristics shown in FIG. 10. In the present embodiment, the sampling frequency is set to 48 kHz; the first low-pass filter unit 56 has an FIR filter length of 128 taps and a cutoff frequency of 400 Hz, and the second low-pass filter unit 57 has an FIR filter length of 128 taps and a cutoff frequency of 4 kHz. The delay unit 58 is provided to compensate for the delay introduced by the first low-pass filter unit 56 and the second low-pass filter unit 57, and is set to 64 taps, half the filter length of those filters.

With the configuration shown in FIG. 9, the adder unit 60 subtracts the output signal of the second low-pass filter unit 57 from the music signal delayed by the delay unit 58 (the full-band (ALL) signal), which produces the high-band signal (High) of 4 kHz to 24 kHz. The adder unit 59 subtracts the output signal of the first low-pass filter unit 56 (the low-band signal) from the output signal of the second low-pass filter unit 57, which produces the mid-band signal (Mid). The filtering in the first low-pass filter unit 56 produces the low-band signal (Low), and the delay unit 58 produces the delayed full-band signal. The L-channel music signal L1 input to the three-band bandpass filter unit 51 is therefore separated into and output as a low-band music signal, a mid-band music signal, a high-band music signal, and a full-band music signal.
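The band split built from two low-pass filters, one delay, and two subtractions can be sketched as below. This is an illustration only: the Hamming-windowed sinc design and the function names are assumptions, and the sketch uses 129 taps (an odd length) so that the filter delay is exactly 64 samples, whereas the text specifies 128 taps with a 64-tap delay.

```python
import math

def lowpass_taps(cutoff_hz, fs_hz, n_taps):
    """Windowed-sinc (Hamming) low-pass FIR, normalized to unity gain at DC."""
    mid = (n_taps - 1) / 2.0
    fc = cutoff_hz / fs_hz
    taps = []
    for i in range(n_taps):
        t = i - mid
        ideal = 2.0 * fc if t == 0 else math.sin(2.0 * math.pi * fc * t) / (math.pi * t)
        window = 0.54 - 0.46 * math.cos(2.0 * math.pi * i / (n_taps - 1))
        taps.append(ideal * window)
    total = sum(taps)
    return [c / total for c in taps]

def fir(signal, taps):
    """Causal FIR filtering with zero initial state."""
    out = []
    for n in range(len(signal)):
        acc = 0.0
        for k, c in enumerate(taps):
            if n - k < 0:
                break
            acc += c * signal[n - k]
        out.append(acc)
    return out

def split_three_bands(signal, fs_hz=48000, n_taps=129):
    delay = (n_taps - 1) // 2
    lp1 = fir(signal, lowpass_taps(400.0, fs_hz, n_taps))     # role of low-pass unit 56
    lp2 = fir(signal, lowpass_taps(4000.0, fs_hz, n_taps))    # role of low-pass unit 57
    full = [0.0] * delay + list(signal[:len(signal) - delay])  # role of delay unit 58 (ALL)
    low = lp1
    mid = [b - a for a, b in zip(lp1, lp2)]                    # role of adder unit 59
    high = [f - b for f, b in zip(full, lp2)]                  # role of adder unit 60
    return low, mid, high, full

fs = 48000
tone = [math.sin(2.0 * math.pi * 1000.0 * n / fs) for n in range(600)]
low, mid, high, full = split_three_bands(tone)
```

A useful property of this subtractive structure is that the three bands always sum back to the delayed full-band signal, whatever the filter quality, so applying equal gains to all bands leaves the music unchanged apart from the delay.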

As noted above, the three-band bandpass filter unit 51 has functional units for the music signal R1 (R channel) as well as for the music signal L1 (L channel), so the operation described for the L channel is performed in the same way for the R channel.

The maximum value detection and maximum value hold unit 52 combines the full-band music signals (ALL) for the music signal L1 (L channel) and the music signal R1 (R channel) from the three-band bandpass filter unit 51, detects the maximum value in each predetermined interval, holds that maximum value for a predetermined time, and outputs it as a control signal (maximum value hold signal).

In the present embodiment, the predetermined interval is set to 2 msec and the hold time is set to 8 times the predetermined interval. As shown in FIG. 11, the sampling frequency is 48 kHz, so the maximum value over each 96-sample interval is calculated and output every 2 msec, and it is held for up to 32 msec.
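One way to picture block-wise maximum detection with a hold is sketched below. The function names and the sliding-window interpretation of "hold" are assumptions; the hold length is left as a parameter (`hold_blocks`) because the text quotes both a factor of 8 and a maximum of 32 msec.

```python
def block_peaks(signal, block_len=96):
    """Maximum absolute value of each complete block (96 samples = 2 msec at 48 kHz)."""
    return [
        max(abs(v) for v in signal[i:i + block_len])
        for i in range(0, len(signal) - block_len + 1, block_len)
    ]

def peak_hold(peaks, hold_blocks=8):
    """Hold each block peak until a larger one arrives or hold_blocks blocks pass."""
    held = []
    for i in range(len(peaks)):
        start = max(0, i - hold_blocks + 1)
        held.append(max(peaks[start:i + 1]))
    return held

# One loud block followed by silence: the peak persists for hold_blocks outputs.
signal = [0.0] * 96 + [0.9] * 96 + [0.0] * (96 * 10)
control = peak_hold(block_peaks(signal))
```

In this example the single loud block produces a control value of 0.9 for eight consecutive 2 msec outputs before the control signal falls back to zero.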

Next, the gain calculation unit 53 will be described. FIG. 12 is a block diagram showing the schematic configuration of the gain calculation unit 53. As shown in FIG. 12, the gain calculation unit 53 includes a first attack release filter unit 63, a second attack release filter unit 64, a third attack release filter unit 65, a first lookup table unit 67, a second lookup table unit 68, a third lookup table unit 69, a first low-pass filter unit 71, a second low-pass filter unit 72, and a third low-pass filter unit 73.

The first attack release filter unit 63 to the third attack release filter unit 65 set the response speed of the control signal received from the maximum value detection and maximum value hold unit 52. Specifically, the response speed is set by an attack time and a release time. The attack time and release time can be set freely; in the present embodiment, the attack time and release time for the mid-band signal are set longer than those for the high-band and low-band signals (so that the control is slower).

The first lookup table unit 67 to the third lookup table unit 69 perform level conversion of the control signal received from the maximum value detection and maximum value hold unit 52: the input signal is converted to a fixed output value and then output as the output signal.

FIG. 13 shows an example of the level conversion performed by the first lookup table unit 67 to the third lookup table unit 69 in the present embodiment. When the signal level of the input signal (control signal) received from the maximum value detection and maximum value hold unit 52 is between -30 dB and 0 dB, the output level is set to -16 dB in the first lookup table unit 67 (low-band control signal), to -20 dB in the second lookup table unit 68 (mid-band control signal), and to -18 dB in the third lookup table unit 69 (high-band control signal). When the signal level of the input signal (control signal) is -30 dB or lower, the first lookup table unit 67 to the third lookup table unit 69 set the output level so that it decreases with the input control signal while maintaining a fixed gain.

Thus, when the signal level of the input signal (control signal) is between -30 dB and 0 dB, the output levels of the low-band, mid-band, and high-band control signals are each converted to a constant value, and when it is -30 dB or lower, the output value decreases in step with the input signal. This reduces the perceived fluctuation in volume, avoids an unnatural impression, and allows the volume to be corrected toward a constant amplitude.
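The level conversion described for FIG. 13 can be written as a piecewise function of the input level. The sketch below uses the three target levels and the -30 dB knee quoted in the text; the names, and the reading that the applied gain is the converted level minus the input level, are assumptions made for illustration.

```python
KNEE_DB = -30.0
TARGET_DB = {"low": -16.0, "mid": -20.0, "high": -18.0}  # levels quoted for FIG. 13

def lut_output_db(input_db, band):
    """Level conversion: constant target above the knee, fixed gain below it."""
    target = TARGET_DB[band]
    if input_db >= KNEE_DB:
        return target
    return input_db + (target - KNEE_DB)   # same gain as at the knee, so the curve is continuous

def gain_db(input_db, band):
    """Assumed gain: difference between the converted level and the input level."""
    return lut_output_db(input_db, band) - input_db
```

Under this reading, a loud low-band input at -10 dB is reduced by 6 dB, while a quiet input at -40 dB is raised by 14 dB, which is consistent with loud passages being lowered and quiet passages being raised.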

The level conversion in the first lookup table unit 67 to the third lookup table unit 69 is not limited to the input/output relationship shown in FIG. 13; the output level may be converted according to a level conversion different from that relationship.

The first low-pass filter unit 71 to the third low-pass filter unit 73 smooth the respective control signals (low-band control signal, mid-band control signal, high-band control signal) received from the first lookup table unit 67 to the third lookup table unit 69. The smoothed control signals (low-band control signal 2, mid-band control signal 2, high-band control signal 2) are output from the gain calculation unit 53 to the gain setting unit 55.

The delay unit 54 delays the L-channel and R-channel signals that have been divided into the low, middle, and high bands by the three-band bandpass filter unit 51 (the low-band, mid-band, and high-band signals of the L and R channels). This delay compensates for the attack time of the first attack release filter unit 63 to the third attack release filter unit 65 in the gain calculation unit 53.

The gain setting unit 55 corrects the amplitude toward a constant value by combining the L-channel and R-channel signals delayed by the delay unit 54 (the low-band, mid-band, and high-band signals of each channel) with the control signals output from the gain calculation unit 53 to the gain setting unit 55 (low-band control signal 2, mid-band control signal 2, high-band control signal 2).

FIG. 14 is a block diagram showing the schematic configuration of the gain setting unit 55. As shown in FIG. 14, the gain setting unit 55 includes a multiplier unit 74a that multiplies the low-band signal by the low-band control signal 2, a multiplier unit 74b that multiplies the mid-band signal by the mid-band control signal 2, a multiplier unit 74c that multiplies the high-band signal by the high-band control signal 2, and an adder unit 74d that sums the signals produced by the multiplier units 74a to 74c. The signal summed in the adder unit 74d is output as the music signal L2 to the main volume unit 5 and the audio canceller unit 3. For convenience, FIG. 14 shows only the L-channel configuration (multiplier units 74a to 74c and adder unit 74d), but the gain setting unit 55 also has a corresponding R-channel configuration, which multiplies the R-channel signals by the control signals and sums them to output the music signal R2.
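The multiply-and-sum structure of the gain setting unit can be sketched in a few lines. For simplicity the sketch assumes one constant gain per band expressed in dB, whereas in the text the control signals vary over time; the function names are hypothetical.

```python
def db_to_linear(db):
    # Convert a gain in dB to a linear amplitude factor
    return 10.0 ** (db / 20.0)

def apply_band_gains(low, mid, high, gains_db):
    """Multiply each band by its control gain (roles of 74a to 74c) and sum (role of 74d)."""
    g_low, g_mid, g_high = (db_to_linear(g) for g in gains_db)
    return [g_low * l + g_mid * m + g_high * h for l, m, h in zip(low, mid, high)]

# With 0 dB on every band the output is simply the sum of the three bands.
out = apply_band_gains([1.0, 0.5], [0.2, 0.2], [0.0, -0.1], (0.0, 0.0, 0.0))
```

Because the bands come from a subtractive split, unity gain on all three bands reconstructs the delayed input, so only the differences between band gains shape the corrected signal.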

FIG. 15 shows, for the low-band signal in the volume correction unit 4, the maximum value hold signal (control signal) output from the maximum value detection and maximum value hold unit 52, the output signal (low-band control signal) of the first attack release filter unit 63 of the gain calculation unit 53, and the low-band control signal 2 output from the gain calculation unit 53 (that is, from the first low-pass filter unit 71). FIG. 16 shows, for the mid-band signal in the volume correction unit 4, the maximum value hold signal (control signal) output from the maximum value detection and maximum value hold unit 52, the output signal (mid-band control signal) of the second attack release filter unit 64 of the gain calculation unit 53, and the mid-band control signal 2 output from the gain calculation unit 53 (that is, from the second low-pass filter unit 72).

For the low-band signal shown in FIG. 15, the first attack release filter unit 63 is set to an attack time of 10 msec and a release time of 0.5 sec. For the high-band signal, the third attack release filter unit 65 is set in the same way as the first attack release filter unit 63 (the low-band case), with an attack time of 10 msec and a release time of 0.5 sec.

In contrast, for the mid-range signal shown in FIG. 16, the second attack release filter unit 64 is set to an attack time of 2 sec and a release time of 2 sec. Slowing the control speed for the mid-range signal in this way reduces the perceived volume fluctuation in the vocal (voice) region of the music (mainly mid-range sound), to which hearing is particularly sensitive, so that the volume can be corrected toward a constant amplitude without sounding unnatural.
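The effect of the two settings can be illustrated with a minimal sketch. It assumes a one-pole smoother whose attack and release times are treated as time constants; the description does not specify the filter form, and the control-signal rate `fs` and the function name `attack_release` are hypothetical.

```python
import math

def attack_release(x, fs, attack_s, release_s):
    """One-pole smoother with separate rise (attack) and fall (release) time constants.

    Assumption: the attack/release times are interpreted as time constants.
    """
    a_att = math.exp(-1.0 / (attack_s * fs))
    a_rel = math.exp(-1.0 / (release_s * fs))
    y, out = 0.0, []
    for v in x:
        a = a_att if v > y else a_rel  # rising input uses the attack coefficient
        y = a * y + (1.0 - a) * v
        out.append(y)
    return out

fs = 1000                         # assumed control-signal rate in Hz
step = [1.0] * fs + [0.0] * fs    # 1 s at full level, then 1 s of silence
low = attack_release(step, fs, attack_s=0.010, release_s=0.5)  # low/high band settings
mid = attack_release(step, fs, attack_s=2.0, release_s=2.0)    # mid band settings
```

With these settings the low-band control signal follows the step almost immediately, while the mid-band control signal has barely moved after 100 msec, which is the slower control speed the paragraph above describes.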

FIGS. 15 and 16 show an example in which the signal level of the sound source is relatively high. The low-frequency (or high-frequency) signal when the signal level of the sound source is low is shown in FIG. 17. In that case, as shown in FIG. 17, the first lookup table unit 67 through the third lookup table unit 69 convert the level so that the output value is higher than the input value. The control signal output from the gain calculation unit 53 is therefore 0 dB or more, and the volume is actively increased.

FIG. 18(a) shows the signal state when the volume correction unit 4 does not perform volume correction, and FIG. 18(b) shows the signal state when it does. Comparing the two, when volume correction is performed (FIG. 18(b)), the signal level is reduced for a sound source with a large amplitude and raised for a sound source with a small amplitude, relative to the uncorrected case (FIG. 18(a)).

Performing volume correction in the volume correction unit 4 in this way reduces the perceived volume fluctuation and corrects the volume toward a constant amplitude without sounding unnatural. Although it is not applied in the present embodiment, combining a limiter with the volume correction in the volume correction unit 4 would make the amplitude even more constant.

[Main volume unit]
The main volume unit 5 adjusts the volume of the music signal that has been corrected by the volume correction unit 4, according to the volume set by a passenger or other user (the amount of operation of the volume adjustment switch).

The volume adjustment by the main volume unit 5 is performed in conjunction with the volume set in the in-vehicle audio apparatus, or based on the setting of a volume switch provided separately from the in-vehicle audio apparatus. Information on the volume adjustment performed in the main volume unit 5 (hereinafter, volume information) is output to the voice detection unit 7.

[Voice analysis unit]
Next, the voice analysis unit 6 will be described. FIG. 19 is a block diagram showing a schematic configuration of the voice analysis unit 6. As shown in FIG. 19, the voice analysis unit 6 includes an effective value detection unit 75, a standard deviation detection unit 76, an average unit 77, a first moving average unit 78, a second moving average unit 79, a division unit 80, and a level conversion unit 81.

The effective value detection unit 75 detects the effective (RMS) value over a predetermined section of the output signal of the audio canceller unit 3. The standard deviation detection unit 76 detects the standard deviation over a predetermined section of the signal whose effective value has been detected by the effective value detection unit 75, and the average unit 77 detects the average value of that same signal.

The first moving average unit 78 and the second moving average unit 79 each take a moving average of their input signal over a predetermined section. The division unit 80 calculates the voice analysis value by dividing the standard deviation averaged by the first moving average unit 78 by the average value averaged by the second moving average unit 79.

The level conversion unit 81 applies a gain and an offset to the voice analysis value calculated by the division unit 80 and outputs the result as the voice analysis gain. The gain and the offset set the weighting given to the voice analysis in the voice detection performed by the voice detection unit 7.

FIG. 21(a) shows the voice analysis value for the signal input from the audio canceller unit 3 to the voice analysis unit 6 when a music signal as shown in FIG. 20(a) and a voice signal as shown in FIG. 20(b) are input to the microphones M1 and M2. (For convenience FIG. 20(a) shows only the L-channel music signal, but an R-channel music signal is also present.) FIG. 21(b) shows the output signal (voice analysis gain) of the level conversion unit 81 of the voice analysis unit 6 for the same music signal (FIG. 20(a)) and voice signal (FIG. 20(b)).

In the state shown in FIGS. 21(a) and 21(b), the sampling frequency in the voice analysis unit 6 is 6 kHz, the effective value detection section in the effective value detection unit 75 is 2.7 msec, the detection section for the standard deviation in the standard deviation detection unit 76 and for the average value in the average unit 77 is about 340 msec, and the moving average section of the first moving average unit 78 and the second moving average unit 79 is about 2.7 sec.

Comparing the voice analysis value in FIG. 21(a) with the music signal in FIG. 20(a) and the voice signal in FIG. 20(b) shows that, in the sections of FIG. 21(a) corresponding to the sections of FIG. 20(b) in which a voice signal is present (for example, the sections marked with arrows in FIG. 21(a)), the voice analysis value rises to indicate the presence of voice. The voice is thus confirmed to be detected.

The voice analysis gain shown in FIG. 21(b) is obtained from the voice analysis value shown in FIG. 21(a) with the gain set to 0.5 and the offset set to 1. This voice analysis gain is used as the weighting for voice detection in the voice detection unit 7 described next.
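The voice analysis chain described above (effective value, standard deviation divided by mean, moving average, gain and offset) can be sketched as follows. This is an illustrative sketch only. The block lengths are assumptions chosen to approximate the stated sections at 6 kHz (16 samples is about 2.7 msec, 128 effective values is about 340 msec, 8 statistics blocks is about 2.7 sec), non-overlapping blocks are assumed, and the linear form `gain * value + offset` is an assumed reading of the level conversion; all function names are hypothetical.

```python
import math
import statistics

def rms_blocks(x, n):
    """Effective (RMS) value of consecutive non-overlapping blocks of n samples."""
    return [math.sqrt(sum(v * v for v in x[i:i + n]) / n)
            for i in range(0, len(x) - n + 1, n)]

def voice_analysis_gain(x, rms_len=16, stat_len=128, avg_len=8,
                        gain=0.5, offset=1.0):
    """Return one voice analysis gain per statistics block (assumed layout)."""
    rms = rms_blocks(x, rms_len)
    stds, means = [], []
    for i in range(0, len(rms) - stat_len + 1, stat_len):
        seg = rms[i:i + stat_len]
        stds.append(statistics.pstdev(seg))   # standard deviation detection
        means.append(statistics.fmean(seg))   # average detection
    gains = []
    for k in range(len(stds)):
        lo = max(0, k - avg_len + 1)
        s = statistics.fmean(stds[lo:k + 1])   # first moving average
        m = statistics.fmean(means[lo:k + 1])  # second moving average
        value = s / m if m > 0 else 0.0        # voice analysis value
        gains.append(gain * value + offset)    # level conversion (assumed linear)
    return gains

steady = [1.0] * 4096                          # constant level: no fluctuation
bursty = ([1.0] * 16 + [0.0] * 16) * 128       # strongly fluctuating level
steady_gain = voice_analysis_gain(steady)
bursty_gain = voice_analysis_gain(bursty)
```

A steady signal yields a ratio near zero and therefore a gain near the offset of 1, whereas a strongly fluctuating level, as is typical of speech, yields a larger gain.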

[Voice detection unit]
Next, the voice detection unit 7 will be described. FIG. 22 is a block diagram showing a schematic configuration of the voice detection unit 7. The voice detection unit 7 includes an effective value detection unit 85, a moving average unit 86, a voice analysis gain multiplication unit 87, a voice detection threshold unit 88, an adaptive attack release filter unit 89 (adaptive attack release filter means), a tempo detection unit 90 (tempo detection means), and a voice holding threshold unit 91.

The effective value detection unit 85 detects the effective value over a predetermined section of the output signal of the audio canceller unit 3. The moving average unit 86 obtains a moving average over a predetermined section of the signal whose effective value has been detected by the effective value detection unit 85. The voice analysis gain multiplication unit 87 multiplies the voice analysis gain input from the voice analysis unit 6 by the signal whose moving average has been obtained by the moving average unit 86.

The voice detection threshold unit 88 detects the voice signal based on a preset voice detection threshold. The value of the voice detection threshold changes according to the volume information input from the main volume unit 5. For example, when the volume of the main volume unit 5 is raised by 5 dB, the voice detection threshold is also raised by 5 dB.
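The weighting and the volume-tracked threshold can be sketched as below. This is a sketch under assumptions: the level and threshold are taken to be linear amplitudes, so a change of 5 dB corresponds to a factor of 10^(5/20), and the base threshold value and function names are hypothetical.

```python
def db_to_linear(db):
    """Convert a level change in dB to a linear amplitude ratio."""
    return 10.0 ** (db / 20.0)

def detect_voice(level, analysis_gain, base_threshold, volume_change_db=0.0):
    """Return 1 when the weighted level exceeds the volume-tracked threshold.

    level: smoothed effective value (moving average of RMS), linear amplitude
    analysis_gain: voice analysis gain from the voice analysis stage
    base_threshold: hypothetical threshold at the reference volume setting
    volume_change_db: main volume change relative to that reference
    """
    threshold = base_threshold * db_to_linear(volume_change_db)
    return 1 if level * analysis_gain > threshold else 0

at_reference = detect_voice(0.1, 1.5, 0.12)                        # 0.15 vs 0.12
after_5db_up = detect_voice(0.1, 1.5, 0.12, volume_change_db=5.0)  # 0.15 vs about 0.213
```

Raising the main volume by 5 dB raises the threshold by the same amount, so the same weighted input level no longer counts as voice in this example.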

FIG. 23 shows the input signal of the voice detection threshold unit 88 (voice signal × voice analysis gain) and the voice detection threshold set in the voice detection threshold unit 88 when a music signal as shown in FIG. 20(a) (including the R-channel music signal as well as the L-channel music signal) and a voice signal as shown in FIG. 20(b) are input from the microphones M1 and M2.

In the situation shown in FIG. 23, the sampling frequency in the voice detection unit 7 is 6 kHz, the effective value detection section in the effective value detection unit 85 is about 21 msec, and the moving average section in the moving average unit 86 is about 42 msec. As FIG. 23 shows, the voice signal is emphasized because the array microphone unit 2 and the audio canceller unit 3 greatly improve the D/U, and the voice portions are further brought out (emphasized) by the voice analysis gain input from the voice analysis unit 6, so detection against the voice detection threshold is easy. In addition, the volume correction processing in the volume correction unit 4 reduces the signal level of large-amplitude sound sources and raises the signal level of small-amplitude sound sources, making the amplitude constant. The voice detection threshold can therefore be set independently of the source or genre of the music.

The tempo detection unit 90 changes the voice detection time and the holding time in the adaptive attack release filter unit 89, described below, according to the speaking speed of the speaker (the input speed of the voice signal). Adjusting the detection accuracy according to the speaking speed in this way makes optimum volume control possible.

FIG. 24 is a block diagram showing a schematic configuration of the tempo detection unit 90. As shown in FIG. 24, the tempo detection unit 90 includes a threshold crossing detection unit 94, a crossing gain unit 95, a moving average unit 96, a multiplication unit 97, a tempo gain unit 98, a low-pass filter unit 99, and a gain offset unit 100.

The threshold crossing detection unit 94 counts the number of pulses in the voice detection signal, which the voice detection threshold unit 88 detects based on the voice detection threshold, while shifting a predetermined section one sample at a time. The crossing gain unit 95 weights the number of pulses detected by the threshold crossing detection unit 94. The moving average unit 96 obtains a moving average over a predetermined section of the same voice detection signal, and the multiplication unit 97 multiplies the output of the crossing gain unit 95 by the output of the moving average unit 96.

The tempo gain unit 98 weights the output of the multiplication unit 97. The low-pass filter unit 99 performs integration processing, using an integral value to track how the weighted output signal (that is, the signal input to the low-pass filter unit 99) changes over a predetermined time. The gain offset unit 100 performs gain offset processing and rounding, and outputs the tempo detection value. The integration processing of the low-pass filter unit 99 uses a first-order IIR (Infinite Impulse Response) filter, and the accumulated integral value can be cleared when a reset signal is input from outside. Clearing the integral value in response to the reset signal in this way makes it possible to realize a learning function in the volume control.

An operation example of the tempo detection unit 90 is shown in FIG. 25. FIG. 25(a) shows the voice detection signal binarized by the threshold detection in the voice detection threshold unit 88, FIG. 25(b) shows the signal output from the tempo gain unit 98 based on the binarized voice detection signal, and FIG. 25(c) shows the integral value (integrated output) obtained by the low-pass filter unit 99 and the tempo detection value output by the gain offset unit 100.

In the tempo detection unit 90 according to the present embodiment, the predetermined section over which the threshold crossing detection unit 94 detects pulses is about 1 sec, the moving average section in the moving average unit 96 is about 1 sec, the crossing gain in the crossing gain unit 95 is 0.5, the tempo gain in the tempo gain unit 98 is 20, the normalized cutoff frequency of the filter in the low-pass filter unit 99 is 0.0002, and the gain offset in the gain offset unit 100 is 1. Examining FIGS. 25(a) to 25(c) shows that the tempo detection value in FIG. 25(c) changes according to the voice detection signal in FIG. 25(a).
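The tempo detection chain can be sketched as a small streaming class. This is a rough sketch under several assumptions that the description does not confirm: a pulse is counted at each rising edge of the binarized detection signal, the moving average is the fraction of detected samples in the window, the first-order IIR coefficient is derived as `1 - exp(-pi * cutoff)` with the cutoff normalized to the Nyquist frequency, and the gain offset is a plain addition before rounding. The class and attribute names are hypothetical.

```python
import math
from collections import deque

class TempoDetector:
    """Streaming sketch of pulse counting, averaging, weighting and integration."""

    def __init__(self, window, crossing_gain=0.5, tempo_gain=20.0,
                 cutoff=0.0002, offset=1.0):
        self.window = window                  # samples in the ~1 s section
        self.crossing_gain = crossing_gain
        self.tempo_gain = tempo_gain
        self.alpha = 1.0 - math.exp(-math.pi * cutoff)  # assumed IIR coefficient
        self.offset = offset
        self.bits = deque([0] * window, maxlen=window)
        self.edges = deque([0] * window, maxlen=window)
        self.prev = 0
        self.n_bits = 0
        self.n_edges = 0
        self.integral = 0.0

    def reset(self):
        """Clear the integral value, as an external reset signal would."""
        self.integral = 0.0

    def step(self, bit):
        """Consume one binarized detection sample and return the tempo value."""
        edge = 1 if (bit and not self.prev) else 0
        self.prev = bit
        # Slide the window by one sample: drop the oldest entry, add the newest.
        self.n_bits += bit - self.bits[0]
        self.n_edges += edge - self.edges[0]
        self.bits.append(bit)
        self.edges.append(edge)
        crossings = self.crossing_gain * self.n_edges   # weighted pulse count
        duty = self.n_bits / self.window                # moving average
        drive = self.tempo_gain * crossings * duty      # multiply, then tempo gain
        self.integral += self.alpha * (drive - self.integral)  # first-order IIR
        return round(self.integral + self.offset)       # offset and rounding
```

With no detections the sketch stays at the offset value of 1; frequent, regular pulses drive the integral upward, and `reset()` clears it so that the value is learned again from the following conversation.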

The reset signal may be input at any timing. For example, when the number or the members of the vehicle's occupants change and the conversation pattern is likely to differ from before, an occupant may perform the reset (output the reset signal) by operating an operation switch or the like. Alternatively, the reset may be performed each time the vehicle is started (each time the engine is started).

Next, the adaptive attack release filter unit 89 will be described. Based on the voice signal detected by the microphones M1 and M2, the adaptive attack release filter unit 89 applies an attack time at the start of a conversation and a release time at the end of a conversation. It has two modes: a "variable mode", in which the attack time and release time set in the adaptive attack release filter unit 89 (that is, the voice detection time and the holding time) vary according to the tempo detection value obtained by the tempo detection unit 90, and a "fixed mode", in which the voice detection time and the holding time are fixed values independent of the tempo detection value.

The fixed mode is described first; the variable mode is described later.

FIG. 26(a) shows the voice detection signal input from the voice detection threshold unit 88 in the fixed mode, and FIG. 26(b) shows an example of applying the voice holding filter in the adaptive attack release filter unit 89. FIG. 26(b) shows the case in which the attack time and the release time are fixed (fixed mode): the attack time is fixed at 0.2 sec and the release time at 4 sec. Accordingly, when a voice signal is detected, the voice holding filter raises the signal output value in 0.2 sec, and when detection of the voice signal ends, the signal output is reduced over 4 sec.

The voice holding threshold unit 91 detects the voice interval time based on a preset voice holding threshold. FIG. 26(b) shows the voice holding threshold value set in the voice holding threshold unit 91, which is 0.3 in the present embodiment.

In the fixed mode, as described above, when the binarized voice detection signal shown in FIG. 26(a), obtained by the threshold detection in the voice detection threshold unit 88, is input to the adaptive attack release filter unit 89, the adaptive attack release filter unit 89 applies the attack time and the release time to it. The voice holding threshold unit 91 then obtains, as the voice interval time (voice detection and holding time), the time during which the voice detection signal with the attack time and release time applied (the voice holding filter output) exceeds the voice holding threshold value set in the voice holding threshold unit 91. FIG. 27(a) shows the voice interval time (voice interval detection value) obtained from the signal states of FIGS. 26(a) and 26(b). The case in which the adaptive attack release filter unit 89 is in the variable mode is described later.
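The fixed-mode behaviour can be sketched as a smoother followed by a comparison against the holding threshold. As a labeled assumption, the attack and release times are again treated as one-pole time constants, which the description does not state; the sample rate used in the example and the function name `hold_voice` are hypothetical.

```python
import math

def hold_voice(detect, fs, attack_s=0.2, release_s=4.0, hold_threshold=0.3):
    """Turn a binarized detection signal into a held voice interval signal.

    Assumption: attack/release times act as one-pole time constants.
    """
    a_att = math.exp(-1.0 / (attack_s * fs))
    a_rel = math.exp(-1.0 / (release_s * fs))
    y, interval = 0.0, []
    for d in detect:
        a = a_att if d > y else a_rel       # rise quickly, decay slowly
        y = a * y + (1.0 - a) * d           # voice holding filter output
        interval.append(1 if y > hold_threshold else 0)
    return interval

fs = 100                                     # assumed rate of the detection signal
detect = [0] * 100 + [1] * 100 + [0] * 1000  # 1 s of speech, then 10 s without
interval = hold_voice(detect, fs)
```

In this sketch the interval signal switches on shortly after speech begins and remains on for several seconds after the speech ends, so brief pauses between utterances do not end the voice interval.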

[Volume control unit]
Next, the volume control unit 8 will be described. FIG. 28 is a block diagram showing a schematic configuration of the volume control unit 8. As shown in FIG. 28, the volume control unit 8 includes an attack release filter unit 101, a level limiter unit 102, a level calculation unit 103, and multiplication units 104 and 105.

The attack release filter unit 101 sets the fade-in time and the fade-out time of the music being played, based on the voice interval time detected by the voice detection unit 7.

The level calculation unit 103 calculates the volume control amount for the level limiter unit 102 according to the volume information input from the main volume unit 5. Specifically, as shown in FIG. 29, the way the volume control level changes as the volume information (volume adjustment amount) increases or decreases can be chosen from a plurality of types according to the listener's preference (FIG. 29 shows four types as an example). Once a curve has been selected, the level calculation unit 103 uses it to determine the volume control level corresponding to the volume information from the main volume unit 5.

The level limiter unit 102 adjusts the output signal of the attack release filter unit 101 based on the volume control level obtained by the level calculation unit 103.

FIG. 27(b) shows how the output signal of the attack release filter unit 101 changes for the voice detection signal (voice interval time) shown in FIG. 27(a), and FIG. 27(c) shows how the output signal (volume control value) of the level limiter unit 102 changes. In the present embodiment, the attack time of the attack release filter unit 101 is set to 0.1 sec, the release time to 5.0 sec, and the volume control amount to 24 dB (0.0631 in linear terms). The attack time set in the attack release filter unit 101 corresponds to the fade-out time of the music, and the release time corresponds to the fade-in time of the music.

As shown in FIGS. 27(b) and 27(c), the level limiter unit 102 inverts the rise and fall of the output signal of the attack release filter unit 101, so that the volume control value is 1 in sections where no voice is detected and is reduced to about 0.05 in sections where voice is detected. As shown in FIG. 27(c), by setting the volume control value to a value lower than 1 (for example, a value close to 0) when voice is detected and to 1 when voice is not detected, the level limiter unit 102 makes it possible to reduce the signal level of the music signal in step with the timing of voice detection.

The multiplication units 104 and 105 multiply the signal output (volume control value) of the level limiter unit 102 by the two-channel music signals (music signal L3 and music signal R3) from the main volume unit 5. In the multiplication units 104 and 105, the volume control signal shown in FIG. 27(c) is multiplied by each of the music signal L3 and the music signal R3, and the resulting volume-controlled music signals L4 and R4 are output to the power amplifier unit 9. The music signals L4 and R4 are then output from the speakers 10a and 10b via the power amplifier unit 9.
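The inversion in the level limiter and the multiplication with the music signals can be sketched as follows. This is an illustrative sketch: the linear interpolation between 1 and the attenuation floor is an assumption about how the inversion is carried out, the envelope is assumed to lie between 0 and 1, and the function names are hypothetical. The 24 dB control amount corresponds to a linear factor of 10^(-24/20), about 0.0631.

```python
def volume_control_value(envelope, attenuation_db=24.0):
    """Map the attack/release filter output (0..1) to a volume control value.

    envelope = 0 (no voice)  -> 1.0 (music unchanged)
    envelope = 1 (voice)     -> floor set by the volume control amount
    """
    floor = 10.0 ** (-attenuation_db / 20.0)   # 24 dB -> about 0.0631
    e = min(max(envelope, 0.0), 1.0)           # clamp to the assumed range
    return 1.0 - e * (1.0 - floor)             # assumed linear inversion

def apply_volume_control(left, right, control):
    """Multiply both music channels (L3, R3) by the control values to get L4, R4."""
    l4 = [s * g for s, g in zip(left, control)]
    r4 = [s * g for s, g in zip(right, control)]
    return l4, r4

control = [volume_control_value(e) for e in (0.0, 0.5, 1.0)]
l4, r4 = apply_volume_control([1.0, 1.0, 1.0], [0.5, 0.5, 0.5], control)
```

The same control value is applied to both channels, so the stereo balance of the music is preserved while its level is reduced.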

FIG. 20(c) shows the music signal output from the speakers 10a and 10b (the music signal whose volume has been controlled by the volume control unit 8) when a voice signal as shown in FIG. 20(b) is input to the microphones M1 and M2 while a music signal as shown in FIG. 20(a) is being played by the in-vehicle audio apparatus. (For convenience FIG. 20(a) shows only the L-channel music signal, but an R-channel music signal is also present.) As a comparison of FIGS. 20(a) to 20(c) shows, when a conversation takes place and the speaker's voice is picked up by the microphones M1 and M2, the music signal is output at a reduced signal level in step with the timing of the picked-up voice. The people talking can therefore enjoy a natural conversation in the vehicle, since the music is lowered in response to the conversation and does not interfere with it.

As described above, the voice interval time (voice detection time and holding time) detected by the voice detection unit 7 is a fixed value in this mode. Based on this voice interval time, the fade-out time of the music (the attack time of the attack release filter unit 101) is set to 0.1 sec and the fade-in time of the music (the release time of the attack release filter unit 101) is set to 5.0 sec. Consequently, as shown in FIG. 20(c), the signal level of the music signal is reduced quickly when a voice signal is picked up by the microphones M1 and M2, and when the voice signal ends, the signal level of the music signal is restored slowly over a period of time (5.0 sec).

Next, the case in which the adaptive attack release filter unit 89 of the voice detection unit 7 is set to the "variable mode" will be described.

In the variable mode, the attack time and release time set in the adaptive attack release filter unit 89, that is, the voice detection time and the holding time, change according to the tempo detection value of the tempo detection unit 90.

FIG. 30 shows the relationship between the attack time (voice detection time) and the release time (holding time) determined according to the tempo detection value obtained by the tempo detection unit 90. In FIG. 30, a tempo detection value of less than 3 corresponds to cases in which the sound is brief and the interval between sounds is long, such as a momentary road-surface change noise picked up by the microphones M1 and M2 while the vehicle is moving, or a short utterance such as talking to oneself (the case "(A) road surface change sound, short utterance" in FIG. 30). In this case the voice signal is not detected, and even if it is detected, the holding time is made short. This prevents the signal level of the music signal from being reduced inadvertently by a road-surface change noise or a short utterance, and even if the signal level is reduced, the music signal returns to its original signal level in a short time.

When the tempo detection value obtained by the tempo detection unit 90 is 3 or more and 7 or less, that is, when the utterance speed is judged to be slow and the utterance interval somewhat long (the case labeled "(B) voice speed: slow, voice interval: somewhat long" in FIG. 30), the attack time and release time are set so that the attack time for voice detection is short and the release time for voice holding is long. With these settings, the signal level of the music signal is lowered as soon as the conversation starts, and because the voice interval is judged to be long, the music signal level is restored gradually so that the change does not feel unnatural.

Further, when the tempo detection value obtained by the tempo detection unit 90 is 7 or more, that is, when the voice speed is judged to be fast and the voice interval short (the case labeled "(C) voice speed: fast, voice interval: short" in FIG. 30), the release time is made shorter than in the case where the tempo detection value is 3 or more and 7 or less. This shortens the voice holding time and improves the responsiveness with which the music signal level is restored.
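The three cases of FIG. 30 can be sketched as a simple selection function. This is a minimal illustration only: the function name, the `FilterTimes` type, and all millisecond values are hypothetical, since the description gives only the qualitative relations (slow detection and short hold for (A), short attack and long release for (B), shorter release for (C)). The description assigns the value 7 to both (B) and (C); the sketch arbitrarily places it in (B), and it assumes the attack time for (C) equals that of (B).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FilterTimes:
    regime: str        # "A", "B" or "C" as labeled in FIG. 30
    attack_ms: float   # voice detection time (hypothetical value)
    release_ms: float  # holding time (hypothetical value)


def select_filter_times(tempo_value: float) -> FilterTimes:
    """Pick attack/release times from the tempo detection value."""
    if tempo_value < 3:
        # (A) road surface change sound or short utterance:
        # detect slowly and, if detected anyway, hold only briefly.
        return FilterTimes("A", attack_ms=400.0, release_ms=100.0)
    if tempo_value <= 7:
        # (B) slow speech with somewhat long pauses:
        # detect quickly, hold for a long time.
        return FilterTimes("B", attack_ms=20.0, release_ms=2000.0)
    # (C) fast speech with short pauses:
    # release shorter than in (B) for a more responsive recovery.
    return FilterTimes("C", attack_ms=20.0, release_ms=800.0)
```

For example, `select_filter_times(5.0)` returns the (B) settings, whose release time is longer than that returned for a tempo detection value of 9.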

FIG. 31(a) shows the voice detection signal when, in the detection state of the voice signal, the voice speed is short (the detection time is short) and the voice interval is very long (the detection interval is very long), which is the case labeled "(A) road surface change sound, short utterance" in FIG. 30. FIG. 31(b) shows an example of the operation of the voice holding filter of the adaptive attack release filter unit 89 in fixed mode, and FIG. 31(c) shows an example of its operation in variable mode for the same detection state as FIG. 31(a).

As shown in FIG. 31(b), in fixed mode the value of the voice holding filter exceeds the voice holding threshold, so voice holding is performed and volume control is executed for a predetermined time. In variable mode, on the other hand, the value of the voice holding filter never exceeds the voice holding threshold, as shown in FIG. 31(c), so volume control is not performed. The automatic volume control device 1 therefore does not perform volume control for a momentary road surface change sound or a short utterance such as a remark to oneself while the vehicle is traveling. Because volume correction in response to such unexpected sounds is suppressed, music output that would feel unnatural to the occupants of the vehicle can be prevented.
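The difference between the two modes for a short burst can be illustrated with a one-pole attack/release smoother followed by a threshold. This is a sketch under stated assumptions, not the filter of the embodiment: the one-pole form, the 1 ms frame period, the threshold of 0.5, and all time constants are hypothetical, and the function names are introduced here only for illustration.

```python
import math


def smoothing_coef(time_ms, frame_ms=1.0):
    """Per-frame coefficient of a one-pole smoother with the given time constant."""
    return 1.0 - math.exp(-frame_ms / time_ms)


def hold_filter(detections, attack_ms, release_ms, frame_ms=1.0):
    """Smooth a 0/1 voice detection sequence with separate attack and release times."""
    attack = smoothing_coef(attack_ms, frame_ms)
    release = smoothing_coef(release_ms, frame_ms)
    value = 0.0
    out = []
    for x in detections:
        coef = attack if x > value else release  # rising input uses the attack time
        value += coef * (x - value)
        out.append(value)
    return out


def volume_control_active(filtered, hold_threshold=0.5):
    """Volume control is active while the filter output exceeds the threshold."""
    return [v > hold_threshold for v in filtered]


# A 50 ms burst (for example a road surface change sound) followed by silence.
BURST = [1.0] * 50 + [0.0] * 450

# Fixed mode: a short attack time lets the burst cross the threshold.
FIXED = hold_filter(BURST, attack_ms=20.0, release_ms=300.0)

# Variable mode, case (A): a long attack time keeps the output below the threshold.
VARIABLE = hold_filter(BURST, attack_ms=400.0, release_ms=100.0)
```

With these hypothetical settings, the fixed-mode output rises above the threshold during the burst and triggers volume control, while the variable-mode output stays below it, which mirrors the behavior described for FIG. 31(b) and FIG. 31(c).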

FIG. 32(a), on the other hand, shows the voice detection value when, in the detection state of the voice signal, the voice speed is somewhat short (the detection time is somewhat short) and the voice interval is somewhat long (the detection interval is somewhat long). FIG. 32(b) shows an example of the operation of the voice holding filter of the adaptive attack release filter unit 89 in fixed mode, and FIG. 32(c) shows an example of its operation in variable mode for the same detection state as FIG. 32(a).

As shown in FIG. 32(b), in fixed mode the value of the voice holding filter sometimes falls below the voice holding threshold, so volume control tends to be released intermittently. In variable mode, as shown in FIG. 32(c), volume control at the initial stage behaves in the same way, but the integration of the detection values in the tempo detection unit 90, that is, the learning function, allows the device to judge progressively that an utterance is in progress. The filter value then no longer falls below the voice holding threshold, so volume control is maintained, and the listener is spared the discomfort of an unexpected change in volume.

As described above, in the automatic volume control device 1 according to the present embodiment, when voice is detected by the microphone M1 and the microphone M2, the output volume of the music output from the in-vehicle audio device is reduced automatically. A smooth conversation can therefore be held without operating the volume adjustment switch or the mute switch each time a conversation takes place.

In particular, the automatic volume control device 1 according to the present embodiment uses the omnidirectional microphone M1 together with the unidirectional microphone M2 to emphasize directivity, so the speaker's voice can be acquired accurately.

Further, the NLMS adaptive algorithm is applied in the adaptive filter unit 24 of the array microphone unit 2, and the LMS adaptive algorithm is applied in the first adaptive filter unit 35 and the second adaptive filter unit 36 of the audio canceller unit 3. Signal components other than the voice signal component (noise components) in the voice signal can thereby be reduced effectively while good convergence is ensured, which improves the detection accuracy of the voice signal in the voice band.
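For reference, the standard NLMS update can be sketched as follows. This is a generic textbook form, not the specific configuration of the adaptive filter unit 24: the tap count, step size, signal roles, and the two-tap leakage path in the demonstration are all hypothetical. The LMS update used in the audio canceller unit 3 differs only in omitting the division by the input power.

```python
import random


def nlms_cancel(reference, primary, taps=8, mu=0.5, eps=1e-6):
    """Remove the part of `primary` that is linearly predictable from `reference`.

    Returns the error signal (primary minus the FIR estimate) and the final weights.
    """
    weights = [0.0] * taps
    history = [0.0] * taps  # most recent reference samples, newest first
    errors = []
    for x, d in zip(reference, primary):
        history = [x] + history[:-1]
        estimate = sum(w * h for w, h in zip(weights, history))
        error = d - estimate
        power = eps + sum(h * h for h in history)
        # NLMS: step size normalized by the instantaneous input power.
        weights = [w + mu * error * h / power for w, h in zip(weights, history)]
        errors.append(error)
    return errors, weights


def make_demo_signals(n=2000, seed=0):
    """Hypothetical test case: the primary input is the reference through a 2-tap path."""
    rng = random.Random(seed)
    reference = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    primary = [
        0.6 * reference[i] + (0.3 * reference[i - 1] if i > 0 else 0.0)
        for i in range(n)
    ]
    return reference, primary
```

In the demonstration signals the primary input contains no voice component, so the error signal decays toward zero as the weights converge to the leakage path. In a real cabin the residual error would contain the speech that the later stages analyze.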

In particular, noise components are first reduced by applying the NLMS adaptive algorithm in the adaptive filter unit 24 of the array microphone unit 2. In addition, the first adaptive filter unit 35 and the second adaptive filter unit 36 of the audio canceller unit 3 are cascade-connected according to the number of channels of the sound source, and the adaptation speed is made larger for the stages that apply their filter processing earlier. The signal to which the filter processing is applied can therefore converge quickly.

Further, the band limiting width set in the first band-pass filter unit 31 and the second band-pass filter unit 32 of the audio canceller unit 3 is wider than the band limiting width set in the first band-pass filter unit 21 of the array microphone unit 2. This improves the audio cancellation performance near the cutoff frequencies of the band limitation.

By applying a plurality of adaptive algorithms and setting the band limiting widths of the band-pass filters as described above, a signal with an excellent D/U ratio (D: desired signal, the voice signal; U: undesired signal, the music signal) can be obtained.

Further, in the gain calculation unit 53 of the volume correction unit 4, the attack time and release time set for the mid-band control signal in the second attack release filter unit 64 are longer than those set for the low-band and high-band control signals in the first attack release filter unit 63 and the third attack release filter unit 65. Control processing of the mid-band signal is thereby slowed down. As a result, the sense of volume fluctuation in the vocal range of the music, to which hearing is sensitive (which is easy to hear), is reduced and no longer feels unnatural, and the volume can be corrected toward a constant amplitude.

Further, in the gain calculation unit 53 of the volume correction unit 4, the first lookup table unit 67 to the third lookup table unit 69 convert the output signal levels of the low-band, mid-band, and high-band control signals. When the signal level of the input signal is equal to or higher than a predetermined value (-30 dB or higher in the present embodiment), the signal level of the output signal is set to a constant value. When the signal level of the input signal is equal to or lower than a predetermined value (-20 dB to -16 dB in the present embodiment), the signal level of the output signal is converted so that it is higher than that of the input signal. Volume correction that reduces the sense of volume fluctuation in the music signal can thus be performed.

Further, when the voice detection threshold unit 88 of the voice detection unit 7 detects the voice interval time, the voice analysis gain, in which the voice detection portion is weighted, is applied to the output signal that has obtained an excellent D/U ratio through the processing of the array microphone unit 2 and the audio canceller unit 3 described above. The voice detection portion of the output signal is thereby emphasized, and the presence or absence of voice can be judged easily and reliably. Moreover, because the voice detection threshold is set for a signal in which the voice detection portion has been emphasized, and the voice interval time is detected on that signal, the voice detection threshold can be set easily regardless of the source or genre of the sound source, and the detection accuracy of the voice interval time is improved.

Further, the voice detection threshold set in the voice detection threshold unit 88 of the voice detection unit 7 is determined on the basis of the volume information input from the main volume unit 5. The voice detection threshold can therefore be optimized in conjunction with the volume adjustment amount (volume information) resulting from operation of the volume adjustment switch. Accordingly, the control amount of the voice signal can be controlled according to the volume, and the control amount can be set arbitrarily.

Further, the voice tempo (the speaker's speaking speed) is detected, and the voice detection time (fade-in time) and holding time (fade-out time) used in volume control can be varied according to the tempo, so volume control that does not feel unnatural can be performed.

Further, the determination of the voice detection time (fade-in time) and holding time (fade-out time) can be recalculated (relearned) depending on whether a reset input is present. The voice detection time and holding time can thus be controlled so that they change appropriately, which reduces the discomfort caused by unexpected changes in volume.

Further, volume control in the level calculation unit 103 of the volume control unit 8 can be performed according to the volume information of the main volume unit 5, so the signal level (volume) of the music signal can be changed appropriately according to the volume set in the main volume unit 5.

Further, in the automatic volume control device 1, the attack release filter unit 101 of the volume control unit 8 sets, according to the voice detection signal acquired from the voice detection unit 7, an attack time corresponding to the fade-out time of the music and a release time corresponding to the fade-in time of the music. By changing these time settings, the fade-out and fade-in times of the music can be set arbitrarily.
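The fade-out and fade-in behavior can be sketched as a gain track driven by the voice detection signal. This is an illustrative assumption rather than the processing of the attack release filter unit 101: the dB-domain smoothing, the 10 ms frame period, the reduction amount of -12 dB, and the default times are hypothetical parameters, and the function name is introduced only for this example.

```python
import math


def duck_gain_track(voice_flags, frame_ms=10.0, fade_out_ms=100.0,
                    fade_in_ms=1000.0, duck_db=-12.0):
    """Return one linear music gain per frame from per-frame voice detection flags.

    The gain moves toward `duck_db` with the fade-out (attack) time while voice
    is detected and back toward 0 dB with the fade-in (release) time afterwards.
    """
    fade_out = 1.0 - math.exp(-frame_ms / fade_out_ms)
    fade_in = 1.0 - math.exp(-frame_ms / fade_in_ms)
    level_db = 0.0
    gains = []
    for flag in voice_flags:
        target_db = duck_db if flag else 0.0
        coef = fade_out if target_db < level_db else fade_in
        level_db += coef * (target_db - level_db)
        gains.append(10.0 ** (level_db / 20.0))  # dB to linear amplitude
    return gains
```

Multiplying each frame of the music signal by the corresponding gain lowers the music quickly when speech starts and restores it slowly after speech ends. Changing `fade_out_ms` and `fade_in_ms` changes the two times independently, as the paragraph above describes.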

The automatic volume control device according to the present invention has been described above in detail with reference to the drawings, but it is not limited to the embodiment described above. It is clear that a person skilled in the art could conceive of various changes or modifications within the scope set forth in the claims, and these are naturally understood to belong to the technical scope of the present invention.

DESCRIPTION OF SYMBOLS
1 ... Automatic volume control device
2 ... Array microphone unit (array microphone means, voice signal extraction means)
3 ... Audio canceller unit (audio canceller means, voice signal extraction means)
4 ... Volume correction unit (volume correction means)
5 ... Main volume unit
6 ... Voice analysis unit (voice analysis means)
7 ... Voice detection unit (voice detection means)
8 ... Volume control unit (volume control means)
9 ... Power amplifier unit
10a, 10b ... Speakers
21 ... First band-pass filter unit (of the array microphone unit)
22 ... Second band-pass filter unit (of the array microphone unit)
23 ... Delay unit (of the array microphone unit)
24 ... Adaptive filter unit (of the array microphone unit)
25 ... FIR unit (of the adaptive filter unit)
26 ... NLMS unit (of the adaptive filter unit)
27 ... Addition unit (of the adaptive filter unit)
28 ... Vehicle
28a ... Driver's seat
28b ... Passenger seat
31 ... First band-pass filter unit (of the audio canceller unit)
32 ... Second band-pass filter unit (of the audio canceller unit)
33 ... First delay unit (of the audio canceller unit)
34 ... Second delay unit (of the audio canceller unit)
35 ... First adaptive filter unit (of the audio canceller unit)
36 ... Second adaptive filter unit (of the audio canceller unit)
37 ... First FIR unit (of the first adaptive filter unit)
38 ... Second FIR unit (of the second adaptive filter unit)
39 ... First LMS unit (of the first adaptive filter unit)
40 ... Second LMS unit (of the second adaptive filter unit)
41 ... First addition unit (of the first adaptive filter unit)
42 ... Second addition unit (of the second adaptive filter unit)
51 ... Three-band band-pass filter unit (of the volume correction unit)
52 ... Maximum value detection and maximum value hold unit (of the volume correction unit)
53 ... Gain calculation unit (of the volume correction unit)
54 ... Delay unit (of the volume correction unit)
55 ... Gain setting unit (of the volume correction unit)
56 ... First low-pass filter unit (of the three-band band-pass filter unit)
57 ... Second low-pass filter unit (of the three-band band-pass filter unit)
58 ... Delay unit (of the three-band band-pass filter unit)
59, 60 ... Addition units (of the three-band band-pass filter unit)
63 ... First attack release filter unit (of the gain calculation unit)
64 ... Second attack release filter unit (of the gain calculation unit)
65 ... Third attack release filter unit (of the gain calculation unit)
67 ... First lookup table unit (of the gain calculation unit)
68 ... Second lookup table unit (of the gain calculation unit)
69 ... Third lookup table unit (of the gain calculation unit)
71 ... First low-pass filter unit (of the gain calculation unit)
72 ... Second low-pass filter unit (of the gain calculation unit)
73 ... Third low-pass filter unit (of the gain calculation unit)
74a, 74b, 74c ... Multiplication units (of the gain setting unit)
74d ... Addition unit (of the gain setting unit)
75 ... RMS value detection unit (of the voice analysis unit)
76 ... Standard deviation detection unit (of the voice analysis unit)
77 ... Averaging unit (of the voice analysis unit)
78 ... First moving average unit (of the voice analysis unit)
79 ... Second moving average unit (of the voice analysis unit)
80 ... Division unit (of the voice analysis unit)
81 ... Level conversion unit (of the voice analysis unit)
85 ... RMS value detection unit (of the voice detection unit)
86 ... Moving average unit (of the voice detection unit)
87 ... Voice analysis gain multiplication unit (of the voice detection unit)
88 ... Voice detection threshold unit (of the voice detection unit)
89 ... Adaptive attack release filter unit (of the voice detection unit) (adaptive attack release filter means)
90 ... Tempo detection unit (of the voice detection unit) (tempo detection means)
91 ... Voice holding threshold unit (of the voice detection unit)
94 ... Threshold crossing detection unit (of the tempo detection unit)
95 ... Crossing gain unit (of the tempo detection unit)
96 ... Moving average unit (of the tempo detection unit)
97 ... Multiplication unit (of the tempo detection unit)
98 ... Tempo gain unit (of the tempo detection unit)
99 ... Low-pass filter unit (of the tempo detection unit)
100 ... Gain offset unit (of the tempo detection unit)
101 ... Attack release filter unit (of the volume control unit)
102 ... Level limiter unit (of the volume control unit)
103 ... Level calculation unit (of the volume control unit)
104, 105 ... Multiplication units (of the volume control unit)
M1, M2 ... Microphones

Claims

An automatic volume control device comprising:
voice signal extraction means for extracting, as an acoustic signal, a voice signal in the voice band by applying band limiting for voice band components and an adaptive algorithm to a voice signal collected by a microphone;
volume correction means for correcting the volume of a music signal from a sound source by dividing the music signal into bands and, when the signal level of the music signal is at or above a certain level, holding the signal level in each divided band at a constant value;
voice analysis means for obtaining, from the acoustic signal extracted by the voice signal extraction means, a voice analysis gain in which the voice detection portion is weighted;
voice detection means for applying the voice analysis gain obtained by the voice analysis means to the acoustic signal extracted by the voice signal extraction means, thereby emphasizing the voice detection portion in the acoustic signal and obtaining a voice detection signal indicating whether voice is detected; and
volume control means for reducing, when voice is detected on the basis of the voice detection signal obtained by the voice detection means, the output level of the music signal whose volume has been corrected by the volume correction means.

The automatic volume control device according to claim 1, wherein the voice detection means further includes:
tempo detection means for integrating, on the basis of the voice detection signal, the voice detection values at every predetermined time in the voice detection portion to obtain the change in the voice detection state over the predetermined time, and for calculating, from the resulting integrated value, a tempo detection value for judging the speaker's utterance speed; and
adaptive attack release filter means for determining, on the basis of the tempo detection value obtained by the tempo detection means, an attack time corresponding to the voice detection time and a release time corresponding to the voice detection holding time, and for setting the determined attack time and release time for the voice detection signal.

The automatic volume control device according to claim 2, wherein the tempo detection means recalculates the tempo detection value for judging the speaker's utterance speed by clearing, in response to input of a reset signal, the integrated value obtained by the integration processing.

The automatic volume control device according to claim 2 or 3, wherein the adaptive attack release filter means has a variable mode, in which the attack time and the release time are set on the basis of the tempo detection value, and a fixed mode, in which the attack time and the release time are set to predetermined values regardless of the tempo detection value.

The automatic volume control device according to item 1, wherein the voice signal extraction means includes:
array microphone means for extracting a voice signal in the voice band by applying a first band limiting process corresponding to voice band components to the voice signal collected by the microphone and then applying an NLMS adaptive algorithm; and
audio canceller means for applying a second band limiting process corresponding to voice band components to the voice signal in the voice band extracted by the array microphone means and then applying a multistage LMS adaptive algorithm to the band-limited signal using adaptive filters that are cascade-connected according to the number of channels of the music signal.

The automatic volume control device according to claim 5, wherein the band limiting width of the second band limiting process in the audio canceller means contains the upper limit and the lower limit of the band limiting width of the first band limiting process in the array microphone means and is set slightly wider than the band limiting width of the first band limiting process.

The automatic volume control device according to any one of claims 1 to 6, wherein the volume control means changes the amount by which the output level of the music signal is reduced in accordance with the volume state of the sound source.