JP2008225235A

JP2008225235A - Audio signal processor

Info

Publication number: JP2008225235A
Application number: JP2007065602A
Authority: JP
Inventors: Hideo Koide; 英生小出
Original assignee: Crimson Tech Inc
Current assignee: Crimson Tech Inc
Priority date: 2007-03-14
Filing date: 2007-03-14
Publication date: 2008-09-25

Abstract

PROBLEM TO BE SOLVED: To maintain continuity of signal level between temporally adjacent frames when related information is added within a frame of audio data, and to process the audio signal relatively simply so that the audio signal carrying the related information contains no noise.

SOLUTION: The audio signal processor includes frame position specifying means 302 that sections audio data into frames and specifies the frame positions; frame extracting means 303 that extracts the sectioned frames one by one; additional information switching means 304 that alternates, frame by frame, between embedding and not embedding related information associated with the audio data in the extracted frames; related information adding means 305 that adds the related information to a first frame selected as an embedding target; and fade adding means 306 that adds fade data to a second frame selected as a non-embedding target so as to maintain continuity with the signal level of the first frame temporally adjacent to the second frame.

COPYRIGHT: (C)2008, JPO&INPIT

Description

The present invention relates to an audio signal processing apparatus, and more particularly to an apparatus that embeds or extracts related information associated with audio data. It is used, for example, in a distribution system for music content in a stream data format.

Digitized music content, for example CD-quality digital audio data, consists of 44100 16-bit samples per second for each channel. When digital watermark information such as copyright information, or other information, is added to (embedded in) such uncompressed digital audio data, a basic digital signal processing technique is widely used in which samples are extracted over a section of fixed length and the extracted samples are processed individually. Each fixed-length extraction section is called a frame. The fast Fourier transform is a typical technique that uses frame processing.

Conventionally, an information adding device that adds information to CD-quality digital audio data divides the audio data every 2048 samples, for example, extracts the samples in each resulting frame, performs a predetermined information adding process, and outputs the result as information-added audio data. An information extraction device that receives such information-added audio data extracts the samples in each frame section every 2048 samples, the same as in the information adding device, and performs predetermined signal processing to extract the information embedded in the audio data.

In the conventional information adding device, however, depending on the method used to add information within a frame of audio data, it becomes impossible to maintain the continuity of the audio signal between adjacent frames. In that case, the audio signal to which the information has been added contains noise.

As a technique for reducing noise caused by information addition based on frame processing, a technique has been proposed that adaptively changes the frame length, the frame shape, the processing method for consecutive frames, and so on according to the state. However, it requires a large amount of computation and is difficult to apply to inexpensive products (see Patent Document 1).
Patent Document 1: JP-T-2004-525429

The present invention has been made to solve the conventional problem described above. Its object is to provide an audio signal processing apparatus that, when related information is added within a frame of audio data, can maintain continuity of signal level between temporally adjacent frames and can process the audio signal relatively simply so that the audio signal carrying the related information contains no noise.

Another object of the present invention is to provide an audio signal processing apparatus that can relatively simply extract desired related information from information-embedded audio data in which related information is added to alternate frames of the audio data and continuity of signal level is maintained between temporally adjacent frames.

A first aspect of the audio signal processing apparatus of the present invention comprises: frame position specifying means for dividing audio data into frames, each a section of fixed length, and specifying the time position of each frame; frame extracting means for extracting each frame delimited by the frame position specifying means; additional information switching means for performing control so as to switch alternately, for each frame extracted by the frame extracting means, whether or not related information associated with the audio data is embedded in the frame; related information adding means for adding the related information, according to a predetermined rule, to a first frame selected by the additional information switching means as an embedding target of the related information; and fade adding means for adding fade data to a second frame selected by the additional information switching means as a non-embedding target of the related information, so as to maintain continuity with the signal level in the first frame temporally adjacent to the second frame.

A second aspect of the audio signal processing apparatus of the present invention comprises: frame extracting means for receiving information-embedded audio data, in which related information is added to alternate frames of the audio data and continuity of signal level is maintained between temporally adjacent frames, and extracting the data in each frame; frame switching means for performing control so as to switch alternately the output destination of each frame extracted by the frame extracting means; related information extracting means for performing predetermined processing on a frame supplied from the frame switching means and extracting the related information embedded in the frame; and reproducing means for alternately taking in and combining the frame output from the related information extracting means and the frame output supplied from the frame switching means, reproducing and outputting the audio data, and outputting the related information.

According to the audio signal processing apparatus of the present invention, when related information is added within a frame of audio data, processing is performed so as to maintain continuity of signal level between temporally adjacent frames, and the audio signal can be processed relatively simply so that the signal carrying the information contains no noise.

Further, according to the audio signal processing apparatus of the present invention, desired related information can be extracted relatively simply from audio data in which related information is added to alternate frames and continuity of signal level is maintained between temporally adjacent frames.

Embodiments of the present invention will now be described with reference to the drawings. In this description, parts common to all the drawings are denoted by common reference numerals.

<First Embodiment>
FIG. 1 schematically shows the configuration of an audio content distribution system according to a first embodiment of the audio signal processing apparatus of the present invention. The transmitting side 10 of this audio content distribution system comprises an information adding device 30, which receives the audio data of the audio content distribution system and adds predetermined information to it, and a transmission circuit 20, which sends the output of the information adding device 30 to some kind of transmission path 40. The receiving side 50 comprises a receiving circuit 60, which receives the input from the transmission path 40, and an information extraction device 70, which extracts the predetermined information and the audio data from the output of the receiving circuit 60. The information adding device 30 and the information extraction device 70 can be realized in software or in hardware; in this example they are realized using a CPU and a program.

FIG. 2 is a block diagram showing the information adding device 30 in FIG. 1, organized by processing function along an example processing flow (steps). The information adding device 30 comprises frame position specifying means 302, frame extracting means 303, additional information switching means 304, related information adding means 305, and fade adding means 306.

The frame position specifying means 302 receives the audio data of the audio content distribution system, for example CD-quality audio data, and specifies frame positions by dividing the audio data into frames, each a section of fixed length (for example, 2048 samples). The frame extracting means 303 extracts the samples in each frame delimited by the frame position specifying means 302.
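The framing performed by means 302 and 303 can be sketched as follows. This is a minimal illustration, assuming mono audio held in a Python list of integer samples and the example values from the text (2048 samples per frame, 44100 samples per second). The function names `frame_positions` and `extract_frames` are hypothetical, and ignoring a trailing partial frame is an assumption that the description does not address.

```python
FRAME_LEN = 2048      # example frame length given in the description
SAMPLE_RATE = 44100   # CD-quality sampling rate, samples per second per channel

def frame_positions(num_samples, frame_len=FRAME_LEN, sample_rate=SAMPLE_RATE):
    """Return (frame_index, start_sample, start_time_seconds) for each full frame."""
    count = num_samples // frame_len  # assumption: a trailing partial frame is ignored
    return [(k, k * frame_len, k * frame_len / sample_rate) for k in range(count)]

def extract_frames(samples, frame_len=FRAME_LEN):
    """Yield each full frame as a list of samples, in temporal order."""
    for _, start, _ in frame_positions(len(samples), frame_len):
        yield samples[start:start + frame_len]
```

With these example values, one frame spans 2048 / 44100 seconds, or about 46.4 ms.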

In this example, the additional information switching means 304 consists of first frame switching means 3041 and second frame switching means 3042. For each frame extracted by the frame extracting means 303, the additional information switching means 304 performs control so as to switch alternately whether or not related information associated with the audio data is embedded in the frame.

The related information adding means 305 adds the related information, according to a predetermined rule, to a first frame selected by the additional information switching means 304 as an embedding target of the related information. The related information associated with the audio data is, for example, text information such as lyrics, copyright information, performance information such as a musical score or rhythm, or performance control information for a performing robot or toy, and is added, for example, in the MIDI data format.

The fade adding means 306 adds fade data to a second frame selected by the additional information switching means 304 as a non-embedding target of the related information, so as to maintain continuity with the signal level in the first frame temporally adjacent to the second frame.

The first frame switching means 3041 supplies the frame output alternately to the related information adding means 305 and the fade adding means 306, switching for each frame extracted by the frame extracting means 303. When the first frame switching means 3041 supplies, for example, the odd-numbered frames to the related information adding means 305, an odd-numbered frame signal to which the related information has been added may lose continuity with the adjacent even-numbered frame signals. To restore this continuity, the first frame switching means 3041 supplies the even-numbered frame signals to the fade adding means 306, where they are processed so that waveform continuity with the temporally adjacent odd-numbered frame signals is maintained.

The second frame switching means 3042 operates in synchronization with the first frame switching means 3041. It alternately takes in and combines the output of the related information adding means 305 and the output of the fade adding means 306, and outputs the result as information-added audio data.
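The alternating routing performed by the two switching means can be sketched as follows. This is an illustration only: `add_information` is a hypothetical name, and the `embed` and `fade` callbacks are hypothetical stand-ins for means 305 and 306, whose internal rules this sketch does not attempt to reproduce. Sending the odd-numbered frames to embedding follows the example in the text.

```python
def add_information(frames, embed, fade):
    """Alternate frames between an embedding step and a fade step, then rejoin them.

    frames: list of equal-length sample lists, in temporal order
    embed:  callable(frame) -> frame, stand-in for related information adding means 305
    fade:   callable(prev_frame, frame, next_frame) -> frame, stand-in for fade
            adding means 306; next_frame is None when no following frame exists
    """
    # First switch (3041): the 1st, 3rd, 5th ... frames (indices 0, 2, 4 ...)
    # are routed to the embedding step; the others pass through for now.
    processed = [embed(f) if i % 2 == 0 else f for i, f in enumerate(frames)]
    # The remaining frames are faded against their already-embedded neighbours.
    for i in range(1, len(processed), 2):
        prev_f = processed[i - 1]
        next_f = processed[i + 1] if i + 1 < len(processed) else None
        processed[i] = fade(prev_f, processed[i], next_f)
    # Second switch (3042): frames are rejoined in their original temporal order.
    return [s for f in processed for s in f]
```

Passing the embedded neighbours to `fade` reflects the point that the fade must bridge the levels produced by embedding, not the original levels.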

Next, an example of processing in the information adding device 30 of FIG. 2 will be described. FIG. 3 shows an example of the signal waveform of the audio data input to the frame position specifying means 302 in FIG. 2. Taking as an example a CD-quality audio data input, in which the sampled amplitude values of the audio signal are represented digitally, the figure draws the sampled values as a continuous waveform divided into three frame sections N1, N2, and N3. The odd-numbered frames N1 and N3 are supplied to the related information adding means 305 by the first frame switching means 3041, and the even-numbered frame N2 is supplied to the fade adding means 306 by the first frame switching means 3041.

FIG. 4 shows an example of the signal waveform of information-added audio data, in which the related information adding means 305 in FIG. 2 has added related information associated with the audio data to frames of the original audio data. Because the odd-numbered frames N1 and N3 are processed by the related information adding means 305 and their waveforms are modified, continuity of the signal waveform is lost between the trailing edge of frame N1 (its right end in the figure) and the leading edge of the following frame N2 (its left end in the figure), and between the trailing edge of frame N2 and the leading edge of the following frame N3.

FIG. 5 shows an example of the fade data added to frame N2, the target of fade addition that is input to the fade adding means 306 in FIG. 2. FIG. 6 shows an example of the signal waveform of the data x'(n) after the fade adding means 306 in FIG. 2 has added the fade data to frame N2.

In this example, when the frame length is set to N samples, the fade adding means 306 replaces the signals x(N) to x(2N-1) of frame N2 with the signal x'(n) given by Formula 1 and Formula 2 below.

When N ≤ n < N + N/2:
x'(n) = x(n) + {x(N-1) - x(N)} × (3N/2 - n) / (N/2) ... (Formula 1)
When N + N/2 ≤ n < 2N:
x'(n) = x(n) + {x(2N) - x(2N-1)} × (n - 3N/2) / (N/2) ... (Formula 2)
That is, the fade adding means 306 detects the correlation of signal amplitude between adjacent frames and adds a fade signal to frame N2, the target of fade addition. In this example, the value of the fade signal is the difference value between adjacent frames, varied in proportion to the distance (time) from the center position (temporal center) of frame N2.
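Formula 1 and Formula 2 can be sketched in code as follows. This is a minimal illustration, assuming the samples of three consecutive frames are held in one list `x` in which the first and third frames already carry the embedded information, and assuming N is even. The function name `add_fade` is hypothetical, and the sketch works in floating point.

```python
def add_fade(x, N):
    """Apply Formula 1 and Formula 2 to the second frame, samples N .. 2N-1.

    x: list of samples covering at least indices 0 .. 2N, where the first and
       third frames are assumed to already carry the embedded information
    N: frame length in samples (assumed even)
    Returns a new list; samples outside the second frame are unchanged.
    """
    half = N / 2
    start_diff = x[N - 1] - x[N]        # gap at the boundary with the preceding frame
    end_diff = x[2 * N] - x[2 * N - 1]  # gap at the boundary with the following frame
    out = list(x)
    for n in range(N, 2 * N):
        if n < N + half:
            out[n] = x[n] + start_diff * (3 * N / 2 - n) / half  # Formula 1
        else:
            out[n] = x[n] + end_diff * (n - 3 * N / 2) / half    # Formula 2
    return out
```

For example, with N = 4 and x = [10, 10, 10, 10, 0, 0, 0, 0, 6, 6, 6, 6], the second frame becomes 10, 5, 0, 3. At n = N the weight in Formula 1 is exactly 1, so x'(N) equals x(N-1). At n = 2N-1 the weight in Formula 2 is 1 - 2/N, so the last sample comes close to x(2N) without reaching it. A real implementation on 16-bit data would presumably also round and clip the result, which this sketch omits.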

At this time, the fade adding means 306 adds the fade signal after detecting the correlation of signal amplitude between adjacent frames for the signals of a plurality of temporally adjacent frames. Although this introduces a delay equal to the short time the processing requires, the delay is hardly a problem relative to the rate of the audio data, and fade addition can be processed in real time.

The fade signal value need not vary linearly (first-order) along the time axis as described above. Various other characteristics can be used, such as a second-order (quadratic) variation or a curve based on a trigonometric function.
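The alternatives mentioned here can be sketched as interchangeable weight curves. The text names only the categories (second-order, trigonometric); the specific curves below, t squared and a raised cosine, are assumptions chosen for illustration, and all function names are hypothetical.

```python
import math

def linear_weight(t):
    """First-order weight: proportional to the normalised distance t."""
    return t

def quadratic_weight(t):
    """Second-order alternative (assumed form): smaller than linear for 0 < t < 1."""
    return t * t

def raised_cosine_weight(t):
    """Trigonometric alternative (assumed form): zero slope at t = 0 and t = 1."""
    return 0.5 - 0.5 * math.cos(math.pi * t)

def fade_weight(n, N, curve=linear_weight):
    """Weight for sample n of the second frame (N <= n < 2N).

    t is the distance from the frame centre 3N/2 normalised by N/2, so t = 1
    at n = N and t = 0 at the centre.
    """
    t = abs(n - 3 * N / 2) / (N / 2)
    return curve(t)
```

All three curves map t = 0 to 0 and t = 1 to 1, so the boundary behaviour of the fade is the same and only the shape in between differs.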

Performing fade addition on frame N2 as described above corrects the continuity of the signal waveforms of temporally adjacent frames. The signals of frames N1 and N3, which are not targets of fade addition, are unaffected by the fade processing, so the fade does not interfere with the extraction of the additional information in a later processing stage.

According to the information adding device 30 described above, when information is added within a frame of audio data, processing is performed so as to maintain the continuity of the audio signal between temporally adjacent frames, and the audio signal can be processed relatively simply so that the signal carrying the information contains no noise. The information adding device 30 is not limited to the transmitting side of an audio content distribution system as described above and can be provided in any audio signal processing system.

FIG. 7 is a block diagram showing the information extraction device 70 in FIG. 1, organized by processing function along an example processing flow (steps). The information extraction device 70 comprises frame extracting means 402, frame switching means 403, related information extracting means 404, and reproducing means 405.

The frame extracting means 402 receives information-embedded audio data as described above and extracts the samples in each frame (every 2048 samples, the same as in the operation of the information adding device).

The frame switching means 403 supplies the frame output alternately to the related information extracting means 404 and the reproducing means 405, switching for each frame extracted by the frame extracting means 402. In this example, only the odd-numbered frames are supplied to the related information extracting means 404; the even-numbered frames are not.

The related information extracting means 404 performs predetermined processing on the odd-numbered frames supplied by the frame switching means 403 and extracts the related information embedded in each frame. In this example, information is extracted only from the odd-numbered frames supplied by the frame switching means 403, and no information is extracted from the even-numbered frames.

The reproducing means 405 alternately takes in and combines the odd-numbered frames output from the related information extracting means 404 and the even-numbered frames output from the frame switching means 403, reproduces and outputs the audio data, and outputs the related information.

According to the information extraction device 70 described above, desired related information can be extracted relatively simply even when the input is information-embedded audio data in which related information is added to alternate frames of the audio data and continuity of signal level is maintained between temporally adjacent frames.
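The receiving-side flow can be sketched as follows. This is an illustration only: `extract_information` is a hypothetical name, the `extract` callback is a hypothetical stand-in for means 404 (the description does not give the extraction rule), and ignoring a trailing partial frame is an assumption.

```python
def extract_information(samples, frame_len, extract):
    """Split embedded audio into frames, read alternate frames, and rejoin the audio.

    extract: callable(frame) -> payload, stand-in for related information
             extracting means 404
    Returns (audio_samples, payloads).
    """
    count = len(samples) // frame_len  # assumption: a trailing partial frame is ignored
    audio, payloads = [], []
    for k in range(count):
        frame = samples[k * frame_len:(k + 1) * frame_len]
        if k % 2 == 0:                 # the 1st, 3rd, 5th ... frames carry information
            payloads.append(extract(frame))
        audio.extend(frame)            # every frame is passed on for reproduction
    return audio, payloads
```

The frame length and the odd/even convention must match those used when the information was added, as the description states for the 2048-sample example.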

The present invention is applicable to fields that use the audio portion of a video signal, such as a DVD or a movie file, and embed an auxiliary signal such as a control signal in that audio portion. It is also applicable to fields that transmit and distribute audio signals: audio data recorded on CDs; uncompressed audio such as the WAVE format handled on personal computers; packages and file formats that use audio compression and decompression technologies such as MP3 and AAC, as in DVDs, music distribution, portable digital music players, and Chaku-Uta; and file distribution and streaming. Examples include sales of music content such as Chaku-Uta with displayed lyrics (or commentary and artist messages), karaoke services using audio compression and decompression technologies such as MP3 and AAC, music distribution services with lyrics display, commercialization of instrument-playing robot toys, sales of music content with performance information, and sales of music content with beat timing.

FIG. 1 is a block diagram schematically showing the configuration of an audio content distribution system according to a first embodiment of the audio signal processing apparatus of the present invention.
FIG. 2 is a block diagram showing the information adding device in FIG. 1, organized by processing function along an example processing flow (steps).
FIG. 3 is a signal waveform diagram showing an example of the audio data input to the frame position specifying means in FIG. 2.
FIG. 4 is a signal waveform diagram showing an example of information-added audio data, in which the related information adding means in FIG. 2 has added related information associated with the audio data to frames of the audio data.
FIG. 5 is a signal waveform diagram showing an example of the fade data added to frame N2, the target of fade addition that is input to the fade adding means in FIG. 2.
FIG. 6 is a signal waveform diagram showing an example of the data x'(n) after the fade adding means in FIG. 2 has added the fade data to frame N2.
FIG. 7 is a block diagram showing the information extraction device in FIG. 1, organized by processing function along an example processing flow (steps).

Explanation of symbols

302 ... frame position specifying means, 303 ... frame extracting means, 304 ... additional information switching means, 3041 ... first frame switching means, 3042 ... second frame switching means, 305 ... related information adding means, 306 ... fade adding means.

Claims

An audio signal processing device comprising:
frame position specifying means for dividing audio data into frames, each a section of fixed length, and specifying the time position of each frame;
frame extracting means for extracting each frame delimited by the frame position specifying means;
additional information switching means for performing control so as to switch alternately, for each frame extracted by the frame extracting means, whether or not related information associated with the audio data is embedded in the frame;
related information adding means for adding the related information, according to a predetermined rule, to a first frame selected by the additional information switching means as an embedding target of the related information; and
fade adding means for adding fade data to a second frame selected by the additional information switching means as a non-embedding target of the related information, so as to maintain continuity with the signal level in the first frame temporally adjacent to the second frame.

The audio signal processing device according to claim 1, wherein the additional information switching means comprises:
first frame switching means for performing control so as to supply the signal of each frame extracted by the frame extracting means alternately to the related information adding means or the fade adding means; and
second frame switching means that operates in synchronization with the first frame switching means, alternately takes in the output frame of the related information adding means and the output frame of the fade adding means, combines the frames so that they are temporally continuous, and outputs the result as information-added audio data.

An audio signal processing device comprising:
frame extracting means for receiving information-embedded audio data, in which related information is added to alternate frames of the audio data and continuity of signal level is maintained between temporally adjacent frames, and extracting the data in each frame;
frame switching means for performing control so as to switch alternately the output destination of each frame extracted by the frame extracting means;
related information extracting means for performing predetermined processing on a frame supplied from the frame switching means and extracting the related information embedded in the frame; and
reproducing means for alternately taking in and combining the frame output from the related information extracting means and the frame output supplied from the frame switching means, reproducing and outputting audio data, and outputting the related information.