JP2010230972A

JP2010230972A - Voice signal processing device, method and program therefor, and reproduction device

Info

Publication number: JP2010230972A
Application number: JP2009078326A
Authority: JP
Inventors: Shinji Suzuki; 信司鈴木
Original assignee: Pioneer Electronic Corp
Current assignee: Pioneer Corp
Priority date: 2009-03-27
Filing date: 2009-03-27
Publication date: 2010-10-14

Abstract

PROBLEM TO BE SOLVED: To provide a reproduction system that makes human voice easy to hear without a sense of incongruity.

SOLUTION: Among the sound signals of channels corresponding to a plurality of loudspeakers arranged around a reference point, a sound signal that contains human voice is detected by comparing sound characteristics such as volume and frequency band. Dynamic range compression is applied only to the sound signal of the detected channel. With this simple configuration, which merely compares volume and frequency band, a listener can easily hear human voice such as movie dialogue without a sense of discomfort and can enjoy the content.

COPYRIGHT: (C)2011, JPO&INPIT

Description

The present invention relates to a sound signal processing device that processes sound signals of channels corresponding to a plurality of speakers arranged around a reference point, and to a method, a program, and a playback device therefor.

Playback systems that reproduce multi-channel sound data using a plurality of speakers are known. Such a system displays image data on a monitor, for example, and reproduces sound data from around the viewer through a plurality of speakers arranged around the viewer. The sound data reproduced by these systems is recorded on package media such as a DVD (Digital Versatile Disc) or distributed over a network such as the Internet.
A conventional sound data processing device outputs the corresponding sound data from each of the speakers arranged around the viewer, as in a 5.1-channel (5.1ch) configuration.

Most content such as movies is produced on the assumption that it will be played back in an environment such as a movie theater, which is dark, has little noise, and allows a high playback volume. To achieve maximum expressive power in such an environment, the sound is recorded with a large dynamic range.
When the same content is played back on a home audio/video playback device, however, it can be hard to hear, particularly when the volume is turned down at night or when ambient noise is high. Specifically, in passages produced at low volume, such as whispered dialogue, the playback sound often drops to near the so-called masking threshold and becomes very difficult to hear.

For content with a large dynamic range such as movies, dynamic range compression of the sound data has been proposed (see, for example, Patent Document 1).

Patent Document 1: JP 2006-524968 A (published Japanese translation of a PCT application)

However, when the sound data of content is dynamic-range compressed as described in Patent Document 1, the artistic presentation of the content may be impaired if, for example, the content includes music, as in a musical or an opera.
In view of this, one object of the present invention is to provide a sound signal processing device, a method, a program, and a playback device that make human voice easy to hear without a sense of incongruity.

The sound signal processing device according to the present invention processes sound signals so that a plurality of speakers arranged around a reference point reproduce the sound signals of the channels corresponding to those speakers. The device comprises voice detection means for detecting, among the sound signals of the channels, a sound signal that contains human voice on the basis of a comparison of their respective sound characteristics, and compression processing means for applying dynamic range compression only to the sound signal of the channel in which the voice detection means has detected voice.

The sound signal processing method according to the present invention uses calculation means to process sound signals so that a plurality of speakers arranged around a reference point reproduce the sound signals of the channels corresponding to those speakers. The calculation means carries out a voice detection step of detecting, among the sound signals of the channels, a sound signal that contains human voice on the basis of a comparison of their respective sound characteristics, and a compression processing step of applying dynamic range compression only to the sound signal of the channel in which voice was detected in the voice detection step.

The sound signal processing program according to the present invention causes calculation means to function as the sound signal processing device according to any one of claims 1 to 7.

The playback device according to the present invention comprises the sound signal processing device according to any one of claims 1 to 7, and output means for causing the speakers to output the sound signals of the channels that the sound signal processing device has processed for the plurality of speakers arranged around the reference point.

The calculation means in the present invention is not limited to a single computer. It also includes a configuration in which a plurality of computers are combined in a network, an element such as a CPU or a microcomputer, and a circuit board on which a plurality of electronic components are mounted.

Block diagram showing the schematic configuration of a playback device according to one embodiment of the present invention.

An embodiment of the present invention is described below with reference to FIG. 1.
This embodiment illustrates the configuration of a playback system provided with a plurality of speakers, but the invention is not limited to this configuration.

[Playback system]
In FIG. 1, reference numeral 100 denotes a playback system. The playback system 100 has a so-called multi-channel configuration in which sound signals are reproduced from a plurality of speakers arranged around a reference point.
The sound signals to be reproduced are not limited to those of content that includes a video signal, such as movies, musicals, plays, and music promotion videos. They may also belong to content data that consists only of sound signals without a video signal, such as music. Content that contains human voice is the main target, but playback of content data that contains no human voice is not excluded.
The playback system 100 comprises a playback device 200 that processes content data and a plurality of speakers 300 that output the sound signals processed by the playback device 200. If the playback device 200 is able to process video signals, the playback system 100 may also include a display device that outputs the video signal, that is, displays it on a screen.

The playback device 200 processes content data so that the sound signals of the content data can be output from the speakers.
The playback device 200 includes a content acquisition unit 210, a display unit 220, an input unit 230, a calculation unit 240, and an output unit 250.

The content acquisition unit 210 acquires content data. Examples of the content acquisition unit 210 include a drive device that reads content data stored on an optical disc such as a CD (Compact Disc) or DVD (Digital Versatile Disc) or on a magnetic disk, and an interface that acquires content data from outside the playback device 200 over a network such as the Internet, an intranet, or a LAN.
The content acquisition unit 210 outputs the acquired content data to the calculation unit 240. The content data is output as stream data while it is being acquired sequentially.
If the content data includes both a video signal and a sound signal, only the sound signal is output to the calculation unit 240, and the video signal is processed by a separate processing device.

The display unit 220 has a monitor that is mounted in a housing (not shown) of the playback device 200 and is visible from the outside. Various display devices, such as a liquid crystal panel or an EL (Electro Luminescence) panel, can be used as the monitor.
Under the control of the calculation unit 240, the display unit 220 displays the processing status and playback output status of the content data, the content of input operations on the input unit 230, and the like, on the basis of signals output from the calculation unit 240.

The input unit 230 has a plurality of switches, such as operation buttons and operation knobs (not shown), that the user can operate. When these switches are operated, the input unit 230 outputs predetermined signals to the calculation unit 240 and thereby sets various conditions in the calculation unit 240.
The input unit 230 is not limited to setting input by switch operation. Any input method, such as voice input, can be used. It may also be configured as a remote controller that transmits a signal corresponding to the input operation to the calculation unit 240 over a wireless medium.
As described in detail later, the input unit 230 includes a switching unit 231 that, in response to a user input operation, switches the degree (strength) of the dynamic range compression applied to the sound signal.

The calculation unit 240 is, for example, a system microcomputer and can control the entire playback device 200.
As programs for processing sound signals, the calculation unit 240 includes a sound signal acquisition unit 241, a voice detection unit 242, and a compression processing unit 243.

The sound signal acquisition unit 241 acquires the sound signal of the content data output from the content acquisition unit 210 and extracts the sound signal of each channel.
Various methods can be applied to extract the per-channel sound signals, for example band-pass filtering, mixing, effect processing, and delay processing, as appropriate.

The voice detection unit 242 detects, among the per-channel sound signals acquired by the sound signal acquisition unit 241, the sound signal of a channel that contains human voice. That is, the voice detection unit 242 determines from the sound characteristics of each sound signal whether voice is included.
Specifically, the voice detection unit 242 includes a volume comparison unit 242A, a correlation determination unit 242B, and a frequency analysis unit 242C, and judges the sound signal of a channel identified by these units to contain human voice.

The volume comparison unit 242A identifies, among the sound signals of the channels, the sound signal of a channel whose volume is high relative to the sound signals of the other channels. In a movie, for example, when there is dialogue, the dialogue is in most cases louder than the other sounds in that scene. Because the sound signal of a channel that contains voice is thus louder than those of the other channels, a loud channel is likely to contain voice. The volume comparison unit 242A therefore recognizes the sound signal of a loud channel as a candidate that contains voice.
It is particularly preferable to compare the sound signals of channels that correspond to adjacent speakers 300 among the speakers 300 arranged around the reference point. For example, the volume comparison unit 242A recognizes a sound signal that is louder than the sound signals corresponding to the adjacent speakers 300 as a candidate that contains voice. In many movie scenes only one character is speaking, and the dialogue is reproduced prominently in the channel that corresponds to that character's position. If the character is speaking to the left of the screen center, for example, the dialogue is prominent in the channel of the left speaker 300. Comparing the sound signals of adjacent speakers is therefore especially effective.
In content where the dialogue is localized midway between two adjacent speakers 300, that is, where the channels of both speakers contain the dialogue, the volume of both speakers 300 may be higher than that of the other channels. In this case, both adjacent speakers 300 are treated as candidate channels reproducing voice, and if the other units described later also detect that these channels are reproducing dialogue, both are determined to be channels reproducing dialogue. This case can still be detected by the adjacent-speaker comparison, because each of the two speakers 300 is louder than the speaker 300 adjacent to it on its other side.
Comparing the sound signals of channels corresponding to adjacent speakers 300 is only a means of detection. The dynamic range compression described later is not restricted to one or both of a pair of adjacent speakers 300. It may be applied to the sound signals of speakers 300 located apart from each other, such as a center speaker and a rear speaker. In short, dynamic range compression is applied to whichever channels contain human voice such as dialogue.
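The adjacent-speaker volume comparison can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the function names, the use of per-frame RMS level as the "volume", the 6 dB margin, and the ordering of channels as a ring around the reference point are all hypothetical choices.

```python
import math


def rms_db(samples):
    """RMS level of one frame in dB relative to full scale (1.0)."""
    mean_sq = sum(s * s for s in samples) / len(samples)
    return 10.0 * math.log10(max(mean_sq, 1e-12))


def volume_candidates(frames, margin_db=6.0):
    """Return indices of channels that look louder than their neighbours.

    frames: one list of samples per channel, ordered as the speakers are
    arranged around the reference point (assumed to form a ring).
    margin_db: hypothetical margin by which a channel must exceed a neighbour.
    """
    n = len(frames)
    levels = [rms_db(f) for f in frames]
    candidates = set()

    # Case 1: a single channel is louder than both adjacent channels.
    for i in range(n):
        left, right = (i - 1) % n, (i + 1) % n
        if (levels[i] >= levels[left] + margin_db
                and levels[i] >= levels[right] + margin_db):
            candidates.add(i)

    # Case 2: dialogue localized between two adjacent speakers. Both are
    # similarly loud, and each is louder than its neighbour on the far side.
    for i in range(n):
        j = (i + 1) % n
        outer_i, outer_j = (i - 1) % n, (j + 1) % n
        if (abs(levels[i] - levels[j]) < margin_db
                and levels[i] >= levels[outer_i] + margin_db
                and levels[j] >= levels[outer_j] + margin_db):
            candidates.update((i, j))

    return candidates
```

The result is only a set of candidates. In the scheme described here, a channel is treated as containing voice only if the other detectors agree.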

The correlation determination unit 242B identifies, among the sound signals of the channels, the sound signal of a channel whose sound characteristics have low correlation with those of the other channels. In certain genres such as action or war movies, for example, a loud effect such as an explosion may occur in the middle of dialogue. The same explosion is then reproduced in the sound signals of the other channels as well, so the correlation between channels is high. Dialogue, by contrast, is often assigned to a specific channel only, so its correlation with the other channels is necessarily low. The correlation determination unit 242B therefore recognizes the sound signal of a channel that has low correlation with the sound signals of the other channels as a candidate that contains voice.
As with the volume comparison unit 242A, the correlation determination unit 242B preferably compares the sound signals of channels corresponding to adjacent speakers 300 among the speakers 300 arranged around the reference point. For example, the correlation determination unit 242B compares sound characteristics such as frequency band and volume with the sound signal corresponding to the adjacent speaker 300. It recognizes as a candidate that contains voice a sound signal that may correlate highly with its neighbor but has low correlation with the sound signals of the channels corresponding to the other speakers 300, that is, a signal whose frequency-band levels and volume are dissimilar to theirs. Similarity in volume can be judged, for example, from the volume difference.
In content where the dialogue is localized midway between two adjacent speakers 300, the sounds reproduced by those two speakers 300 correlate highly with each other, while each correlates poorly with the speaker 300 adjacent to it on the opposite side. In this case, as with volume, the sound signals of these adjacent speakers 300 are compared with those of the channels corresponding to the speakers 300 further adjacent to them, and both of the two adjacent speakers 300 are treated as candidate channels reproducing voice. If the other units described later also detect that these channels are reproducing dialogue, both speakers 300 are determined to be channels reproducing dialogue.
As noted above, comparing the sound signals of channels corresponding to adjacent speakers 300 is only a means of detection. The dynamic range compression described later is not restricted to one or both of a pair of adjacent speakers 300. It may be applied to the sound signals of speakers 300 located apart from each other, such as a center speaker and a rear speaker.
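One way to illustrate the low-correlation test is sketched below. Note the substitution: the text compares frequency-band levels and volume, whereas this sketch uses the sample-wise Pearson correlation coefficient between frames as a stand-in similarity measure. The function names and the 0.5 threshold are hypothetical, and only the simple single-channel case is covered.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length frames."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0.0 or syy == 0.0:
        return 0.0  # a constant (e.g. silent) frame carries no correlation
    return sxy / math.sqrt(sxx * syy)


def low_correlation_candidates(frames, threshold=0.5):
    """Return indices of channels weakly correlated with both neighbours.

    frames: one list of samples per channel, ordered as a ring of speakers.
    threshold: hypothetical bound on |correlation| below which two channels
    are considered dissimilar.
    """
    n = len(frames)
    candidates = set()
    for i in range(n):
        neighbours = {(i - 1) % n, (i + 1) % n} - {i}
        if all(abs(pearson(frames[i], frames[j])) < threshold
               for j in neighbours):
            candidates.add(i)
    return candidates
```

A shared effect such as an explosion makes neighbouring channels correlate strongly, so they are not flagged. A channel that carries different material from both neighbours is flagged as a candidate.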

The frequency analysis unit 242C determines whether voice is included by analyzing the frequency band of the sound signal of each channel. Human voice generally lies between 300 Hz and 4 kHz. The frequency analysis unit 242C therefore analyzes the frequency content of each sound signal and recognizes the sound signal of a channel in which the 300 Hz to 4 kHz band accounts for a markedly larger share than the other bands as a candidate that contains voice.
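The band test can be illustrated by measuring what fraction of a frame's spectral energy falls between 300 Hz and 4 kHz. The sketch below uses a naive discrete Fourier transform so that it needs only the standard library. The function names and the 0.6 decision threshold are assumptions, and a real device would more likely use an FFT or a filter bank.

```python
import cmath
import math


def band_energy_ratio(samples, sample_rate, lo_hz=300.0, hi_hz=4000.0):
    """Fraction of spectral energy (DC excluded) inside [lo_hz, hi_hz]."""
    n = len(samples)
    in_band = 0.0
    total = 0.0
    for k in range(1, n // 2 + 1):
        freq = k * sample_rate / n
        # Naive DFT coefficient for bin k (O(n) per bin, fine for short frames).
        coeff = sum(s * cmath.exp(-2j * math.pi * k * t / n)
                    for t, s in enumerate(samples))
        power = abs(coeff) ** 2
        total += power
        if lo_hz <= freq <= hi_hz:
            in_band += power
    return in_band / total if total > 0.0 else 0.0


def is_voice_band_candidate(samples, sample_rate, threshold=0.6):
    """Hypothetical decision rule: most of the energy lies in the voice band."""
    return band_energy_ratio(samples, sample_rate) > threshold
```

With a 480-sample frame at 48 kHz the bins are 100 Hz apart, so the 300 Hz and 4 kHz band edges fall exactly on bin centers.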

The voice detection unit 242 then detects the sound signal of a channel that the volume comparison unit 242A, the correlation determination unit 242B, and the frequency analysis unit 242C have each recognized as containing human voice.
The detected sound signal is not limited to a single channel. When human voice is contained in the sound signals of several channels, for example when several characters speak from different positions, the sound signals of all those channels are detected.
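Requiring agreement among the three detectors amounts to intersecting their candidate sets. A minimal sketch, in which the function name and the representation of candidates as sets of channel indices are assumptions:

```python
def detect_voice_channels(volume_set, correlation_set, band_set):
    """Channels flagged as voice candidates by all three detectors."""
    return set(volume_set) & set(correlation_set) & set(band_set)
```

Because the result is a set, it naturally covers the case where more than one channel is detected at the same time.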

The compression processing unit 243 applies dynamic range compression only to the sound signals of the channels detected by the voice detection unit 242.
Various compression methods can be used. A typical example converts the output level relative to the input level of the sound signal by a kind of logarithmic conversion that maps small input levels to larger output levels while compressing level differences at large inputs. Specifically, this can be realized with a DSP (Digital Signal Processor) that performs a table lookup for the sound signal of each channel and substitutes the output level for the input level.
The degree of dynamic range compression applied by the compression processing unit 243 can be changed. For example, when the user operates the switching unit 231 of the input unit 230 to set the strength of dynamic range compression, the compression is performed according to the strength that was set.
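The table-based level conversion can be sketched as follows, assuming levels are handled in dB relative to full scale and the table is indexed in 1 dB steps. The strength names, the compression ratios, and the per-frame gain (with no attack or release smoothing) are illustrative assumptions, not values from the source.

```python
import math

# Hypothetical mapping from the user's strength setting to a compression ratio.
STRENGTH_TO_RATIO = {"weak": 1.5, "medium": 2.0, "strong": 4.0}


def build_table(ratio, floor_db=-96):
    """Lookup table: index i is an input level of -i dB, value is output dB.

    Dividing the dB level by the ratio raises quiet levels toward full scale
    and leaves 0 dB unchanged, which compresses the dynamic range.
    """
    return [-i / ratio for i in range(0, -floor_db + 1)]


def compress_frame(samples, table):
    """Scale one frame so its RMS level follows the table's output level."""
    mean_sq = sum(s * s for s in samples) / len(samples)
    if mean_sq <= 0.0:
        return list(samples)  # silence stays silent
    in_db = 10.0 * math.log10(mean_sq)
    # Quantize the input level to the nearest 1 dB table entry.
    idx = min(max(int(round(-in_db)), 0), len(table) - 1)
    gain_db = table[idx] - (-idx)  # output level minus quantized input level
    gain = 10.0 ** (gain_db / 20.0)
    return [s * gain for s in samples]
```

With the "medium" ratio of 2.0, a frame at -40 dB is raised to -20 dB while a frame at 0 dB is left unchanged, so quiet dialogue is lifted more than loud passages.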

The output unit 250 includes a digital-to-analog converter (DAC) 251 and amplifiers 252.
The DAC 251 is connected to the playback device 200 and converts the processed digital sound signal output for each channel from the playback device 200 into an analog signal. The DAC 251 then outputs the analog sound signals to the respective amplifiers 252.
The amplifiers 252 are connected to the DAC 251 and to the respective speakers 300. The amplifiers 252 process the analog sound signals output from the DAC 251 so that they can be output properly from the speakers 300, and output them to the speakers 300 for reproduction.

[Operation of the playback system]
Next, the operation of the playback system 100 is described.
First, the user makes a setting input requesting playback of the desired content data. In response, the playback device 200 acquires the content data, for example from a recording medium, with the content acquisition unit 210 and outputs it sequentially to the calculation unit 240.
The sound signal acquisition unit 241 of the calculation unit 240 acquires the sound signal of the content data output from the content acquisition unit 210 and obtains the sound signal of each channel from it.
For the sound signals of the channels acquired by the sound signal acquisition unit 241, the voice detection unit 242 of the calculation unit 240 detects the sound signal of a channel that contains human voice.

That is, the voice detection unit 242 uses the volume comparison unit 242A to identify, among the sound signals of the channels, the sound signal of a channel whose volume is high relative to the other channels. In particular, it compares the sound signals of channels corresponding to adjacent speakers 300 and identifies the channel with the higher volume.
The voice detection unit 242 also uses the correlation determination unit 242B to identify, among the sound signals of the channels, the sound signal of a channel whose sound characteristics have low correlation with those of the other channels. In particular, it compares sound characteristics such as frequency-band levels and volume between the sound signals of channels corresponding to adjacent speakers 300. When the sound characteristics are dissimilar, that is, when the level distribution across frequency bands differs or the volume difference is large, it identifies the channel with the low correlation.
Furthermore, the voice detection unit 242 uses the frequency analysis unit 242C to analyze the frequency band of the sound signal of each channel, that is, to identify the sound signal of a channel in which the 300 Hz to 4 kHz band accounts for a markedly larger share than the other bands.
When the channels identified by the volume comparison unit 242A, the correlation determination unit 242B, and the frequency analysis unit 242C are the same, the voice detection unit 242 detects the sound signal of that channel as the candidate that contains voice.

The calculation unit 240 then uses the compression processing unit 243 to apply dynamic range compression only to the sound signals of the channels detected by the voice detection unit 242.
The sound signals of the channels are synchronized with one another, output to the output unit 250 for the corresponding speakers 300, and output from the speakers 300. The content data is thereby reproduced.
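The per-frame flow of this operation, in which only the detected channels are compressed and all other channels pass through unchanged, can be sketched as below. The `detect` and `compress` callables are hypothetical placeholders for whatever detector and compressor a device actually uses.

```python
def process_frame(frames, detect, compress):
    """Compress only the channels that the detector flags as containing voice.

    frames: one list of samples per channel for the current time frame.
    detect: callable taking all frames, returning a set of channel indices.
    compress: callable taking one channel's samples, returning new samples.
    """
    detected = detect(frames)
    return [compress(frame) if index in detected else list(frame)
            for index, frame in enumerate(frames)]
```

Because every channel's frame is returned in the same call, the processed and unprocessed channels stay aligned in time before they are passed on for output.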

[Effects of the playback system]
As described above, in this embodiment, among the sound signals of channels corresponding to a plurality of speakers arranged around a reference point, a sound signal that contains human voice is detected on the basis of a comparison of sound characteristics, and dynamic range compression is applied only to the sound signal of the detected channel.
This avoids the drawback of applying dynamic range compression to the entire sound signal of the content data, in which compression also reaches material that is sensitive to sound balance, such as music, and alters its artistic presentation. Human voice such as dialogue becomes easy to hear without upsetting the balance of the sound characteristics of the sound as a whole, so the content can be enjoyed without a sense of incongruity.

Sound signals that contain human voice are detected by comparing the sound characteristics of the sound signals of channels corresponding to adjacent speakers 300 among the speakers 300 arranged around the reference point.
Because movie dialogue is often reproduced prominently in only a specific channel, the simple configuration of comparing the sound signals of adjacent speakers' channels makes movie dialogue, which has conventionally been hard to hear, easy to hear without a sense of incongruity while suppressing erroneous operation.

Further, when detecting the sound signal containing human voice, the sound signal of a channel whose volume is relatively higher than that of the sound signals of the other channels is treated as a candidate for the sound signal containing voice.
That is, in a movie scene with dialogue, for example, the dialogue is generally localized so that its sound position correlates with the position of the person on the screen. In other words, it is generally mixed so that it is reproduced prominently on a given channel or between two specific adjacent channels. The simple configuration of comparing volumes therefore makes dialogue easier to hear without a sense of incongruity, with relatively little risk of erroneous operation.
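One way to picture the volume comparison is the minimal sketch below. It assumes RMS level as the volume measure and a hypothetical `margin` factor for deciding that a channel is "relatively" louder; neither is stated in this description, and the sketch considers only a single channel rather than a pair of adjacent channels.

```python
import math


def rms(samples):
    """Root-mean-square level of a block of samples (0.0 for an empty block)."""
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def loudest_channel(channels, margin=1.5):
    """Return the name of the channel whose RMS level is at least `margin`
    times that of every other channel, or None if no channel stands out.
    `channels` maps a channel name to a list of samples."""
    if not channels:
        return None
    levels = {name: rms(samples) for name, samples in channels.items()}
    best = max(levels, key=levels.get)
    if levels[best] <= 0.0:
        return None
    others = [level for name, level in levels.items() if name != best]
    if all(levels[best] >= margin * level for level in others):
        return best
    return None
```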

Furthermore, when detecting the sound signal containing human voice, the sound signal of a channel whose sound characteristics have low correlation with those of the sound signals of the other channels, that is, low similarity in frequency band or volume, is treated as a candidate for the sound signal containing voice.
That is, in a movie scene with dialogue, for example, the dialogue is generally localized so that its sound position correlates with the position of the person on the screen. In other words, it is generally mixed so that it is reproduced prominently on a given channel or between two specific adjacent channels. Because a channel carrying dialogue therefore has sound characteristics different from those of the sound signals of the other channels, the simple configuration of comparing sound characteristics makes dialogue easier to hear without a sense of incongruity, with relatively little risk of erroneous operation.
In particular, when judging the correlation, erroneous operation can be prevented further by taking as candidates those channels whose sound signals are highly correlated between channels corresponding to adjacent speakers 300 but weakly correlated with the sound signals of the other channels.
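The low-correlation criterion could be approximated as in the sketch below. It uses the Pearson correlation of the sample waveforms as a stand-in for the "sound characteristic correlation", and a hypothetical `max_corr` threshold; the description speaks of similarity in frequency band or volume and does not define a specific measure, so both choices are assumptions. The refinement involving adjacent channels is not modeled here.

```python
import math


def pearson(a, b):
    """Pearson correlation of two sample blocks (0.0 if either is constant)."""
    n = min(len(a), len(b))
    if n == 0:
        return 0.0
    a, b = a[:n], b[:n]
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0
    return cov / math.sqrt(var_a * var_b)


def least_correlated_channel(channels, max_corr=0.3):
    """Return the channel whose mean absolute correlation with all other
    channels is lowest, provided it does not exceed `max_corr`; else None."""
    names = list(channels)
    if len(names) < 2:
        return None
    scores = {}
    for name in names:
        corrs = [abs(pearson(channels[name], channels[other]))
                 for other in names if other != name]
        scores[name] = sum(corrs) / len(corrs)
    best = min(scores, key=scores.get)
    return best if scores[best] <= max_corr else None
```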

Also, when detecting the sound signal containing human voice, the frequency band is analyzed; that is, the sound signal of a channel in which the band from 300 Hz to 4 kHz, the band of human voice, makes up a large proportion is treated as a candidate for the sound signal containing voice.
Thus, the simple configuration of comparing the level distribution across frequency bands makes dialogue easier to hear without a sense of incongruity, with relatively little risk of erroneous operation.
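The proportion of the 300 Hz to 4 kHz band can be estimated, for illustration, with a plain discrete Fourier transform as sketched below. Measuring the proportion as a ratio of spectral power, ignoring the DC component, and the function name are assumptions of this sketch; a practical implementation would more likely use an FFT or a filter bank.

```python
import cmath
import math


def voice_band_ratio(samples, sample_rate, lo=300.0, hi=4000.0):
    """Fraction of spectral power (DC excluded) that lies between `lo` and
    `hi` Hz, computed with a naive O(n^2) DFT over one block of samples."""
    n = len(samples)
    if n == 0:
        return 0.0
    total = 0.0
    in_band = 0.0
    for k in range(1, n // 2 + 1):  # positive-frequency bins, skipping DC
        freq = k * sample_rate / n
        coeff = sum(s * cmath.exp(-2j * math.pi * k * t / n)
                    for t, s in enumerate(samples))
        power = abs(coeff) ** 2
        total += power
        if lo <= freq <= hi:
            in_band += power
    return in_band / total if total > 0 else 0.0
```

A channel whose ratio is clearly higher than that of the other channels would then be the candidate under this criterion.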

The sound signal containing human voice is detected as follows: when the channels identified by the volume comparison means 242A, the correlation determination means 242B, and the frequency analysis means 242C are the same, the sound signal of that channel is detected as the candidate for the sound signal containing human voice.
This allows the sound signal of the channel containing human voice to be detected more reliably and further prevents erroneous operation.
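The agreement check among the three means can be expressed compactly, as in the sketch below. The function name and the convention that each means reports either a channel name or `None` are assumptions made for illustration.

```python
def agreed_channel(by_volume, by_correlation, by_frequency):
    """Return the channel name only when all three detectors identified the
    same channel; return None if any detector disagrees or found nothing."""
    if by_volume is not None and by_volume == by_correlation == by_frequency:
        return by_volume
    return None
```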

[Modifications]
The present invention is not limited to the embodiment described above and includes the following modifications within the scope in which the object of the invention can be achieved.

For example, this embodiment illustrates a configuration that includes the volume comparison means 242A, the correlation determination means 242B, and the frequency analysis means 242C, each of which determines whether a sound signal contains human voice, but the invention is not limited to this. The determination may be made with at least one of these means. Other determinations may also be added to them.
Also, although the illustrated configuration detects and compresses a signal that each of the volume comparison means 242A, the correlation determination means 242B, and the frequency analysis means 242C recognizes as a sound signal containing human voice, a signal that at least one of them recognizes as containing human voice may be compressed instead.

Also, although the sound detection means 242 has been illustrated as comparing the sound signals of channels corresponding to adjacent speakers 300 among the speakers 300 arranged around the reference point, the invention is not limited to this. For example, a sound signal whose sound characteristics differ from those of the sound signals of the other channels may be recognized as a sound signal containing human voice.

Although a configuration in which the switching means 231 is provided so that the degree of dynamic range compression can be changed has been illustrated, the degree of dynamic range compression may instead be fixed.

Although the playback system 100 has been illustrated, the invention may also be configured as, for example, a playback device 200 without the speakers 300, a circuit board on which the calculation means 240 is mounted, or a program that causes an arithmetic device such as a computer to function as the calculation means 240.
Also, although the description dealt with content data containing dialogue, this does not exclude playback of content data that contains no dialogue, such as an orchestral performance. When such content data is processed, no dialogue is present, so the dynamic range compression is simply not executed.

In addition, the specific structures and procedures used in carrying out the present invention may be changed as appropriate to other structures and the like within the scope in which the object of the invention can be achieved.

200 ... Playback device
240 ... Calculation means
242 ... Sound detection means
242A ... Volume comparison means
242B ... Correlation determination means
242C ... Frequency analysis means
243 ... Compression processing means
300 ... Speaker

Claims

A sound signal processing device that processes sound signals so that sound signals of channels corresponding to a plurality of speakers arranged around a reference point are reproduced from those speakers, comprising:
sound detection means for detecting, among the sound signals of the channels, a sound signal containing human voice based on a comparison of their respective sound characteristics; and
compression processing means for applying dynamic range compression only to the sound signal of the channel containing the voice detected by the sound detection means.

The sound signal processing device according to claim 1, wherein
the sound detection means compares the sound signals of channels corresponding to adjacent speakers among the speakers arranged around the reference point to determine whether human voice is included.

The sound signal processing device according to claim 1 or 2, wherein
the sound detection means detects, as a sound signal containing voice, the sound signal of a channel whose volume, as a sound characteristic, is relatively higher than that of the sound signals of the other channels among the sound signals of the channels.

The sound signal processing device according to any one of claims 1 to 3, wherein
the sound detection means detects, as a sound signal containing voice, the sound signal of a channel whose sound characteristics have low correlation with those of the sound signals of the other channels among the sound signals of the channels.

The sound signal processing device according to claim 4, wherein
the sound detection means detects, as a sound signal containing voice, the sound signal of a channel whose sound characteristics are highly correlated between the sound signals of channels corresponding to adjacent speakers and weakly correlated with the sound signals of the other channels.

The sound signal processing device according to any one of claims 1 to 5, wherein
the sound detection means determines whether voice is included by analyzing the frequency band among the sound characteristics of the sound signal of each channel.

The sound signal processing device according to claim 6, wherein
the sound detection means detects, as a sound signal containing voice, the sound signal of a channel in which the frequency band from 300 Hz to 4 kHz makes up a large proportion of the entire frequency band of the sound signal of that channel.

A sound signal processing method in which calculation means processes sound signals so that sound signals of channels corresponding to a plurality of speakers arranged around a reference point are reproduced from those speakers, wherein
the calculation means performs:
a sound detection step of detecting, among the sound signals of the channels, a sound signal containing human voice based on a comparison of their respective sound characteristics; and
a step of applying dynamic range compression only to the sound signal of the channel containing the voice detected in the sound detection step.

A sound signal processing program for causing a calculation means to function as the sound signal processing device according to any one of claims 1 to 7.

A playback device comprising:
the sound signal processing device according to any one of claims 1 to 7; and
output means for causing the speakers to output the sound signals of the channels, processed by the sound signal processing device, that correspond to the plurality of speakers arranged around the reference point.