JP2003178078A

JP2003178078A - Additional indicator data to image and voice data, and its adding method

Info

Publication number: JP2003178078A
Application number: JP2001378313A
Authority: JP
Inventors: Koji Nishikawa; 孝司西川
Original assignee: Matsushita Electric Industrial Co Ltd
Current assignee: Panasonic Holdings Corp
Priority date: 2001-12-12
Filing date: 2001-12-12
Publication date: 2003-06-27

Abstract

PROBLEM TO BE SOLVED: For images and sound in an archive, a viewer's request is sometimes difficult to express as simple text. It is also very difficult to attach sensory keywords from viewers' requests, such as "interesting", to scenes as metadata.

SOLUTION: In addition to conventional metadata, indicator data is used. This data is obtained by automatically capturing and processing viewers' biological reactions to images and sound. It is associated with each piece of content, such as images and sound, and with each scene within the content. Viewers' biological reactions to images or sound are thus automatically captured, processed into indicator data, and automatically associated with each piece of content and each scene within it.

COPYRIGHT: (C)2003,JPO

Description

Detailed Description of the Invention

[0001]

BACKGROUND OF THE INVENTION

1. Field of the Invention

The present invention relates to marker data associated with data including images and sound, and to a method for processing the marker data.

[0002]

2. Description of the Related Art

A wide variety of video and audio is broadcast or distributed over several kinds of media. These include terrestrial, satellite, and cable television and radio broadcasting and, more recently, websites on the Internet.

In television and radio, the "substance itself, such as the video and audio to be broadcast" (referred to as content) is determined in advance by the operator that transmits it. An operator that is sensitive to the season, current trends, or topics that have drawn particular public attention can select the most suitable material just before transmission. It can then send out content that matches viewers' tastes; this is possible and is actually done. However, such selection estimates and caters to the tastes of an unspecified, large population of viewers as a group. It does not respond to the particular tastes and requests of each individual viewer.

[0003] Meanwhile, with recent advances in storage and communication technology, attempts have begun to deliver content that answers the particular tastes of each individual. One such attempt is TV Anytime (TVAnytime). It is described in, for example, Reference 1, Nikkei Electronics, April 23, 2001 issue, page 173, and Reference 2, the Asahi Shimbun website (URL: http://www.asahi.com./science/waza/010521.html).

The content stored at television and radio stations, studios, and the like (referred to as an archive) has reached an enormous volume and will continue to grow and diversify. Viewers' tastes are also diversifying. Even within sports, they range from domestic baseball and sumo, which are popular with every generation, to US Major League Baseball and soccer. Under these circumstances, it is difficult for television and radio stations to infer viewers' preferences and requests accurately and send out the most suitable content. Moreover, because each viewer's tastes and requests differ, it is inherently impossible to send out content that satisfies every viewer completely.

This technology is therefore a mechanism that lets viewers freely call up the content that best suits their wishes at any given time, and it is currently being developed for practical use. In particular, it gives each piece of content in the archive a text-format ID indicating what kinds of scenes the content contains. This ID is separate from the video and audio information, and viewers use it to select and call up the content they want. (Data that is added separately from the content's own data, such as video or audio, to provide information about the content is called "metadata".)

Reference 2 describes this as follows, starting from line 6 of its main text: "For example, you come home late at night and watch a recorded professional baseball game. In a Giants game, you want to check Matsui's at-bats first. 'It is a mechanism that lets you collect and watch only Matsui's at-bats,' explains Takeharu Urano of the New Technology Research and Planning Division at Nippon Television. Terrestrial television signals carry only video and audio, but digital broadcasting can carry other information as well. Terrestrial digital broadcasting will begin in Japan in 2003. Information such as the genre ('movie', 'sports', 'news', and so on), the title, and the performers is therefore sent together with the program. For example, if the television station attaches an ID to the footage of Matsui's at-bats, viewers can select and collect just those parts. Mr. Urano says, 'Depending on how the IDs are attached, it is also easy to watch the game as a digest.'"

In other words, a viewer sends, as a keyword, text describing the content or scene they want to watch. The images and sound carrying text data (metadata) that matches the keyword are then selected and sent back. Extracting the content one wants to see or hear from a huge archive should therefore become very easy, fast, and wide-ranging.

[0004]

Problem to Be Solved by the Invention

However, a wish to watch or listen to particular video or audio in an archive cannot always be expressed in simple text. Sometimes it is the viewer's request that is hard to express in simple text. At other times it is a characteristic scene in the content.

Some cases are easy. Take this week's home-run scene by Matsui of the Giants in professional baseball. Keywords such as "date", "professional baseball", "Central League", "Giants game", "Matsui", "home run", and "homer" can express both the viewer's request and the characteristic scene within the content itself. Likewise, Nakata's goal scene in soccer's J.League can be expressed as metadata with keywords such as "date", "venue", "soccer", "J.League", "Nakata", and "goal".

However, the highlights of a soccer match that ended 1-0, for example, are not limited to that single goal. The scenes leading up to the goal probably contain many moments that could be regarded as highlights, and so do the plays in which goals were prevented. There are also likely to be more than a few scenes, unrelated to either, that draw cheers and attention as big moments of the match. Yet it is not easy to describe such highlights, other than the goal scene, in simple text and attach metadata to them.

Even if a keyword such as "cheers" were attached to particular scenes, it would be very difficult to decide several things: who should attach it, how, by what criteria, and to how many scenes. The process would also be hard to automate. It is likewise very difficult to attach sensory keywords from viewers' requests, such as "interesting", "amazing", "sad", "painful", "gentle", and "calming", to scenes as metadata. It is hard to decide by what criteria such keywords should be matched to scenes. Judging and attaching them also takes considerable human labor and time, and this, too, is hard to automate.

[0005] The present invention therefore contemplates using marker data in addition to conventional metadata. The marker data is obtained by automatically capturing and processing the biological reactions of people who view content, such as images and sound. The result is associated with each piece of content and with each scene within it.

[0006] An object of the present invention is to provide such marker data. The biological reactions of people viewing target images or sound are automatically captured, processed, and associated with the content or with each scene within it. Some content and scenes cannot be described with simple keywords. The marker data lets viewers search for, extract, and call up such content and scenes using ambiguous or sensory keywords.

[0007]

Means for Solving the Problems

It is well known that human mental activity shows up in various biological reactions. Even in a person who is completely still and not exercising at all, mental activity produces various changes in biological reactions. For example, heart rate and blood pressure rise or fall. Blood flow in the brain increases or decreases, changing the distribution of blood circulation. Electrical pulses arise in nerves in the brain, or their waveforms change. Sweating over the whole body or in specific areas increases or decreases, and muscles expand or contract. The pupils dilate and constrict. Breathing rate and depth change. Facial expressions, and posture such as how the limbs are placed, also change.

[0008] Such biological reactions can now be detected and measured with various instruments. They are used in fields such as medicine, physiological research, ergonomics, and sports.

Blood flow in the brain, for example, can be measured with large-scale equipment such as MRI or SQUID. It can now also be measured easily with a small headgear-type device using an infrared laser array. This device simply has a few to several dozen infrared semiconductor lasers and light-receiving sensors (each a few millimeters square) arranged in the headgear. It is therefore simple, lightweight, and inexpensive, and can readily be used anywhere.

Dilation and constriction of the pupil can be observed with a small camera. The amount of change can easily be quantified by an image-recognition system combined with the camera. Iris-recognition systems for security fall within this category of application and are already commercially available.

Heart rate can be detected continuously in real time with a small sensor a few millimeters square. Blood pressure measurement is already widely used by the general public for daily health checks. Today a measurement takes only a few tens of seconds, simply by passing one finger through a ring.

[0009] The human biological reactions exemplified above are, of course, often changed by physical exercise or work. But they also change when no exercise or work is performed at all, because mental activity causes changes in biological reactions. Mental activity may be conscious or unconscious, and active or passive.

It is very common for mental activity to change or react to information taken in through the five senses, even without any large bodily movement. That mental change or reaction in turn triggers biological reactions. Surprise and excitement, for example, affect every kind of biological reaction. They raise heart rate and blood pressure, momentarily constrict the pupils, and increase palm sweating. They naturally also cause changes in blood flow and electrical currents in specific regions of the brain.

[0010] When a person views content such as video or audio, the video or audio acts as a stimulus that produces a mental reaction or activity in the viewer. Viewers watch content precisely to be moved and thrilled, or to find comfort, humor, and fun. It is therefore only natural that mental reactions and activity result.

[0011] In other words, a piece of content can be evaluated by viewers in a particular way. The evaluation covers how much the content moves, thrills, comforts, amuses, or entertains them, and which parts and scenes do so, and to what degree. This evaluation can be collected as information in the form of the viewer's biological reactions, which reflect the viewer's mental reactions and activity. The collected biological reactions can then be processed into data of a suitable form and stored in association with the content or with scenes in it. This data makes it possible to search an archive holding an enormous amount of material and accurately select content that meets a viewer's wishes.
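The sketch below is a rough illustration of the retrieval described in [0011]. It makes three assumptions that the patent does not spell out. Each scene in an archive already carries AR scores on a 0-to-1 scale for a few named reaction dimensions. A fixed table maps a viewer's sensory keyword to one of those dimensions. A fixed threshold filters the matches. The Scene class, the KEYWORD_TO_DIMENSION table, the dimension names, and the threshold are all hypothetical, since the patent defines no scoring scale or keyword mapping.

```python
from dataclasses import dataclass, field


@dataclass
class Scene:
    content_id: str
    start_s: float
    end_s: float
    # Hypothetical AR scores: reaction dimension -> score in [0, 1].
    ar_scores: dict = field(default_factory=dict)


# Hypothetical mapping from a viewer's sensory keyword to a reaction dimension.
KEYWORD_TO_DIMENSION = {
    "interesting": "excitement",
    "amazing": "excitement",
    "sad": "sadness",
    "calming": "relaxation",
}


def search_scenes(archive, keyword, threshold=0.5, limit=10):
    """Return up to `limit` scenes whose score for the keyword's dimension
    is at least `threshold`, highest score first."""
    dimension = KEYWORD_TO_DIMENSION.get(keyword)
    if dimension is None:
        return []  # keyword not covered by the mapping
    hits = [s for s in archive if s.ar_scores.get(dimension, 0.0) >= threshold]
    hits.sort(key=lambda s: s.ar_scores[dimension], reverse=True)
    return hits[:limit]


archive = [
    Scene("match-1", 0, 30, {"excitement": 0.9}),
    Scene("match-1", 30, 60, {"excitement": 0.2}),
    Scene("match-2", 0, 30, {"excitement": 0.7, "relaxation": 0.1}),
]
for scene in search_scenes(archive, "amazing"):
    print(scene.content_id, scene.start_s, scene.end_s)
```

A real system would need a far richer mapping from keywords to reaction patterns. The table here only shows where such a mapping would plug in.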

[0012] In this way, viewers' evaluations of a piece of content are collected as the biological reactions that result from their mental reactions and activity. These reactions are processed into data associated with the original scene or content and used for searching. The data evaluates the viewers' reactions, and thereby evaluates the scene or content. It is also attached to the original scene or content as an individual marker (tag). For these reasons it is referred to here as an Appraised Response Tag, or AR tag for short. (To "appraise" means to evaluate a person, ability, and so on, or to recognize a situation and so on.)

[0013] The present invention derived from the above considerations is described below.

[0014] The AR tag of the present invention is marker data for image and sound data. It is characterized in that the biological reactions of people viewing a target image or sound are processed and associated with that image or sound.

[0015] The AR tag of the present invention is also marker data for image and sound data. It is characterized in that the biological reactions of people viewing a target image or sound are processed in association with those people's emotional reactions. The result is associated with that image or sound.
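One way to picture the processing in [0015] is a rule that maps measured reaction changes to a coarse emotion label. The sketch below loosely follows the common arousal/valence framing used in affective computing. A rise in heart rate or in skin conductance counts as arousal; skin conductance is a common proxy for the sweating mentioned in [0007], and both changes are measured against the viewer's baseline. A facial-expression score in [-1, 1] supplies valence. The function name, inputs, thresholds, and labels are hypothetical assumptions for illustration, not values from the patent.

```python
def classify_emotion(hr_delta_bpm, sc_delta_us, valence):
    """Map reaction changes to a coarse emotion label.

    hr_delta_bpm: heart-rate change from the viewer's baseline (beats/min)
    sc_delta_us:  skin-conductance change from baseline (microsiemens)
    valence:      facial-expression score in [-1, 1], negative = unpleasant
    All thresholds below are hypothetical.
    """
    aroused = hr_delta_bpm > 5.0 or sc_delta_us > 0.5
    if aroused:
        return "excited" if valence >= 0 else "distressed"
    if valence > 0.3:
        return "calm-pleasant"
    if valence < -0.3:
        return "sad"
    return "neutral"


print(classify_emotion(12.0, 0.8, 0.6))  # large rise, pleasant face -> excited
```

In practice the thresholds would have to be calibrated per viewer and per sensor. A learned classifier could replace the hand-written rules without changing how the label is then attached to the scene.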

[0016] The AR tag of the present invention is also marker data for image and sound data. It is produced by processing two inputs together: the biological reactions of people viewing a target image or sound, and existing marker data already associated with that image or sound data. The result is newly associated with that image or sound.
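Paragraph [0016] combines biological reactions with marker data that the content already carries. The following is a minimal sketch. It assumes the existing marker data is a set of text keywords and that the reaction has been reduced to a single excitement score in [0, 1]. When the score clears a hypothetical threshold, the scene gains a highlight tag. It also gains compound tags that qualify each existing keyword. The function name, the tag spelling, and the threshold are illustrative assumptions.

```python
def combine_with_metadata(keywords, excitement, threshold=0.7):
    """Derive new tags from existing keyword metadata plus a reaction score.

    keywords:   existing text metadata for the scene, e.g. {"soccer", "goal"}
    excitement: processed reaction score in [0, 1] (hypothetical scale)
    """
    tags = set(keywords)
    if excitement >= threshold:
        tags.add("highlight")
        # Qualify each existing keyword so a search for "goal:highlight" can match.
        tags.update(f"{k}:highlight" for k in keywords)
    return sorted(tags)


print(combine_with_metadata({"soccer", "goal"}, 0.85))
```

Compound tags like these let an ordinary keyword search reach scenes that are hard to describe, such as the near-misses around a goal.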

[0017] The AR tag of the present invention is also marker data for image and sound data. It is characterized in that the biological reactions of people viewing a target image or sound are associated with any continuous data fragment of that image or sound.
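For [0017], associating reactions with an arbitrary continuous fragment amounts to summarizing the reaction samples that fall inside the fragment's time range. The sketch assumes timestamped samples of the form (t_seconds, value) and uses the mean as the summary. The function name is hypothetical, and so is the choice of the mean rather than, say, the peak.

```python
from statistics import fmean


def fragment_reaction(samples, start_s, end_s):
    """Mean reaction value over the half-open time range [start_s, end_s).

    samples: iterable of (t_seconds, value) pairs from one reaction channel.
    Returns None when no sample falls inside the fragment.
    """
    values = [v for t, v in samples if start_s <= t < end_s]
    return fmean(values) if values else None


heart_rate = [(0.0, 70), (0.5, 72), (1.0, 90), (1.5, 95), (2.0, 80)]
print(fragment_reaction(heart_rate, 1.0, 2.0))  # mean of 90 and 95 -> 92.5
```

Because the range is half-open, adjacent fragments such as [1.0, 2.0) and [2.0, 3.0) never count the same sample twice.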

[0018] The present invention also provides a method of processing marker data for image and sound data. The method processes the biological reactions of people viewing a target image or sound and associates the result with that image or sound.

[0019] The present invention also provides a method of processing marker data for image and sound data. The method processes the biological reactions of people viewing a target image or sound in association with those people's emotional reactions, and associates the result with that image or sound.

[0020] The present invention also provides a method of processing marker data for image and sound data. The method processes two inputs together: the biological reactions of people viewing a target image or sound, and existing marker data already associated with that image or sound data. The result is newly associated with that image or sound.

[0021] The present invention also provides a method of processing marker data for image and sound data. The method associates the biological reactions of people viewing a target image or sound with any continuous fragment of that image or sound.

[0022] The AR tag of the present invention is also marker data for image and sound data. It is characterized in that the individual who produced the biological reactions used for the association can be identified.
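If each AR tag records which reaction provider it came from, as [0022] allows, tags can be grouped per person. This would let a system, for example, compare how different viewers rated the same content. The ARTag fields and the per-provider mean below are hypothetical. The patent leaves open whether provider identity lives inside the tag or in separate metadata.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class ARTag:
    provider_id: str  # hypothetical identifier of the reaction provider
    start_s: float    # start of the tagged span, in seconds
    end_s: float      # end of the tagged span, in seconds
    value: int        # processed reaction value, e.g. an 8-bit level


def mean_value_by_provider(tags):
    """Average AR tag value for each reaction provider."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for tag in tags:
        sums[tag.provider_id] += tag.value
        counts[tag.provider_id] += 1
    return {p: sums[p] / counts[p] for p in sums}


tags = [
    ARTag("viewer-a", 0, 10, 200),
    ARTag("viewer-a", 10, 20, 100),
    ARTag("viewer-b", 0, 10, 50),
]
print(mean_value_by_provider(tags))  # {'viewer-a': 150.0, 'viewer-b': 50.0}
```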

[0023]

Embodiments of the Invention

(First Embodiment) This section describes an embodiment concerning the production and use of AR tags, with reference to the drawings. AR tags are markers produced by processing human biological reactions according to the present invention and associated with a target image or sound.

[0024] FIG. 1 is a relationship diagram showing the procedure for producing AR tags according to the first embodiment of the present invention.

First, there is a target video or audio 11. The video or audio 11 may be any kind of event that can be recorded by filming, audio recording, or copying. It may be an event actually in progress. It may also be one being played back or reproduced after having already been filmed, recorded, or copied.

Such events include the wide range of material typically shown on television, video, or film. Examples are natural phenomena, sports, dramas, variety shows, news coverage of incidents, lectures, press conferences, concerts, live broadcasts, documentary reports, and stage entertainment. The event may also be audio only, covering the wide range of material one can listen to on radio, CD, cassette tape, and the like. Examples are natural phenomena, music, dramas, variety shows, news coverage of incidents, lectures, press conferences, concerts, live broadcasts, documentary reports, and stage entertainment.

When a reaction provider 12 watches or listens to this material, reactions to the video or audio appear in the reaction provider's biological information.

[0025] The reaction provider 12 is one or more people who supply data on their biological reactions while viewing the event 11, for use in producing AR tags. The provider may be any of the following:

- someone recruited specifically for this purpose;
- someone who happens to watch or listen to the video or audio for no particular purpose;
- the camera operator or sound recordist filming or recording the video or audio of the event 11;
- an operator who plays back, edits, or processes the video or audio;
- a person who is performing, composing, or forming the event while the video or audio event is taking place;
- a specific viewer, an unspecified large number of viewers using the content, or a group of such viewers.

[0026] The data collector 13 is a device that collects and records data on one or more types of biological reactions produced by the reaction provider 12. It consists of several parts, including:

- a part that senses the biological reactions;
- a part that temporarily stores the sensed data;
- a part that sends the data to the processor 14 in the next stage.

These parts may be combined into a single device, or separate devices for each function may be connected by communication means. One example is a device worn by the reaction provider that incorporates a pulse meter, a sphygmomanometer, a device for measuring electrical currents in the brain, an electromyograph, or the like. Another example is a device with data-collection equipment that is not attached directly to the reaction provider. This could be a camera for photographing the pupils or a camera for capturing facial expressions, installed near the reaction provider.
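The three parts of the data collector 13 named in [0026] can be sketched as a small buffering class. Those parts are sensing, temporary storage, and sending to the processor 14. This is a minimal sketch under two assumptions. Samples arrive as (t_seconds, channel, value) tuples from a sensor driver not shown here. The link to the processor is modeled as a plain callback. The class name, the tuple layout, and the batch size are hypothetical.

```python
from typing import Callable, List, Tuple

Sample = Tuple[float, str, float]  # (t_seconds, channel name, value)


class DataCollector:
    """Buffers sensed samples and forwards them to the processor in batches."""

    def __init__(self, send: Callable[[List[Sample]], None], batch_size: int = 32):
        self._send = send                # link to the processor (hypothetical callback)
        self._batch_size = batch_size
        self._buffer: List[Sample] = []  # temporary storage

    def record(self, t_s: float, channel: str, value: float) -> None:
        """Called by the sensing part for every new reading."""
        self._buffer.append((t_s, channel, value))
        if len(self._buffer) >= self._batch_size:
            self.flush()

    def flush(self) -> None:
        """Send whatever is buffered, e.g. at the end of a viewing session."""
        if self._buffer:
            self._send(self._buffer)
            self._buffer = []  # start a fresh list; the sent batch is left untouched


received = []
collector = DataCollector(received.append, batch_size=2)
collector.record(0.0, "pulse", 72)
collector.record(0.5, "pulse", 75)      # fills the batch, so it is sent
collector.record(1.0, "pupil_mm", 4.1)
collector.flush()                       # sends the remaining sample
print(len(received))  # 2 batches
```

Whether the parts live in one device or in several linked by communication means only changes what the send callback does, for example writing locally or transmitting over a network.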

[0027] The data collector 13 may be a device dedicated to collecting, recording, and transmitting biological reaction data, or it may be combined with other functions. For example, it may combine a device that collects biological reactions with a camera, audio recorder, or video recorder that records the event 11 itself, such as video or audio.

One example is a camera operator shooting a baseball broadcast, together with the equipment around them. In this case, the video and audio 11 is the baseball game actually being played in front of them, and the reaction provider 12 is the camera operator. The data collector 13 is the camera. Added to it are a device that senses the camera operator's biological reactions and a device that records the collected reactions or relays them elsewhere. Alternatively, the data collector 13 may consist of two devices connected by communication means: a camera that records the event 11, such as video or audio, and a device that senses, collects, and records the camera operator's biological reactions.

[0028] The processor 14 follows the data collector 13 in the procedure. It processes the collected biological reactions into a suitable form to obtain AR data; these reactions may be of one or more types and may come from one or more reaction providers. It also associates the result with the corresponding scene, content, or the like of the original event's video or audio 11.

Various processing methods are possible. The data of several biological reactions may simply be added together and then normalized. Alternatively, the data may be weighted according to type and combined by a computation based on some function. Various data sizes are also conceivable, typically from a few bits to a little over ten bits. One AR tag may consist of several values of this size, and a combination of AR tags may in turn form a new AR tag. The reaction provider 12 may also be identified. That information may be included in the AR tag or kept as metadata separate from the AR tag.
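The processing options in [0028] can be sketched in three steps. First, scale each reaction channel to a common range. Next, combine the channels with equal weights (plain addition) or with type-specific weights. Finally, quantize the result to a small number of bits. Channels such as heart rate and skin conductance have different units, so this sketch min-max normalizes each channel before combining them. That step is an assumption rather than the patent's stated method, and so are the function names and the default of 8 bits (chosen to fall within the data sizes the text mentions).

```python
def normalize(values):
    """Min-max scale one channel to [0, 1]; a flat channel maps to 0."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0] * len(values)
    return [(v - lo) / (hi - lo) for v in values]


def make_ar_values(channels, weights=None, bits=8):
    """Combine equal-length, time-aligned reaction channels into quantized AR values.

    channels: dict of channel name -> list of samples
    weights:  dict of channel name -> positive weight; None means plain addition
    bits:     size of each output value
    """
    names = sorted(channels)
    if weights is None:
        weights = {name: 1.0 for name in names}
    total_weight = sum(weights[name] for name in names)
    scaled = {name: normalize(channels[name]) for name in names}
    levels = (1 << bits) - 1  # e.g. 255 for 8 bits
    out = []
    for i in range(len(channels[names[0]])):
        # A weighted average stays in [0, 1] because every weight is positive.
        combined = sum(weights[n] * scaled[n][i] for n in names) / total_weight
        out.append(round(combined * levels))
    return out


channels = {"heart_rate": [60, 80, 100], "skin_conductance": [1.0, 1.0, 3.0]}
print(make_ar_values(channels))                                          # equal weights
print(make_ar_values(channels, {"heart_rate": 3, "skin_conductance": 1}))  # weighted
```

Dividing by the total weight keeps the equal-weight case equivalent to "add, then normalize". Raising one weight lets a more trusted signal dominate without changing the output range.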

[0029] The processed and generated AR tag is associated with the original event's video or audio 11. The association may be with the entire content, with a scene within the content, or with a shorter unit or frame within that scene. In each case, however, the position information of the AR tag matches that of the portion it is associated with. An AR tag associated with the entire content must be processed and generated as an evaluation of the entire content. Likewise, an AR tag associated with a particular scene must be processed and generated as an evaluation of that scene.

This is shown conceptually in FIG. 3. One or more AR tags can be associated with the entire data 30 of the content itself; in this case, the AR tags are processed and generated as evaluations of the whole content. Likewise, one or more AR tags can be associated with a scene 31, which is a fragment of the content. The same applies to an even shorter fragment 32.
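Paragraph [0029] requires that an AR tag's position information match the portion it evaluates. This can be sketched by giving every tag an explicit time span. A whole-content tag spans the full duration, a scene tag spans the scene, and a fragment tag spans a shorter range. The Content and ARTag classes, the use of seconds as the position unit, and the validation rule are illustrative assumptions.

```python
from dataclasses import dataclass, field


@dataclass
class ARTag:
    start_s: float  # position of the evaluated portion, in seconds
    end_s: float
    value: int      # processed evaluation of exactly that portion


@dataclass
class Content:
    content_id: str
    duration_s: float
    tags: list = field(default_factory=list)

    def attach(self, tag):
        """Attach a tag whose span lies inside the content."""
        if not (0 <= tag.start_s < tag.end_s <= self.duration_s):
            raise ValueError("AR tag span must lie within the content")
        self.tags.append(tag)

    def values_at(self, t_s):
        """Values of every tag (content, scene, or fragment) covering time t_s."""
        return [tag.value for tag in self.tags if tag.start_s <= t_s < tag.end_s]


match = Content("match-001", 5400.0)
match.attach(ARTag(0.0, 5400.0, 180))     # whole content (data 30)
match.attach(ARTag(1200.0, 1500.0, 240))  # one scene (scene 31)
match.attach(ARTag(1290.0, 1300.0, 255))  # shorter fragment (fragment 32)
print(match.values_at(1295.0))  # [180, 240, 255]
```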

[0030] In one case, the generated AR data is added to the recorded data of the video or audio 11 of the original event itself. This produces tagged data 15, which is the recorded data of the video or audio 11 of the original event with the AR data added. This data is stored in a storage medium 16. The storage medium 16 may be a database attached to a computer system, or a server or gateway that includes such a database. It may also be a video or audio playback device or transmission device that has a storage medium, such as home video or audio equipment. It may also be an easily portable storage medium, typified by an optical recording medium such as a DVD, a magnetic recording medium such as VHS videotape, or solid-state memory such as flash memory.
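When the AR data is added to the recorded data itself, content and tags travel as one unit (tagged data 15). The sketch below is a rough illustration only. It assumes a made-up container layout: a 4-byte length prefix, then a JSON header holding the AR data, then the raw media bytes. `pack_tagged` and `unpack_tagged` are hypothetical helpers. A practical system would more likely use the metadata facilities of an existing media container.

```python
from __future__ import annotations

import json
import struct


def pack_tagged(media: bytes, ar_data: list[dict]) -> bytes:
    """Prepend AR data to recorded media (hypothetical layout).

    Layout: 4-byte big-endian header length, UTF-8 JSON header, then the raw media bytes.
    """
    header = json.dumps({"ar_data": ar_data}).encode("utf-8")
    return struct.pack(">I", len(header)) + header + media


def unpack_tagged(blob: bytes) -> tuple[bytes, list[dict]]:
    """Split a tagged blob back into the media bytes and the AR data."""
    (header_len,) = struct.unpack(">I", blob[:4])
    header = json.loads(blob[4:4 + header_len].decode("utf-8"))
    return blob[4 + header_len:], header["ar_data"]


tagged_15 = pack_tagged(b"\x00\x01\x02", [{"label": "fun", "start": 0, "end": 300}])
media, ar_data = unpack_tagged(tagged_15)
```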

[0031] In another case, the generated AR tag may be stored separately from data 17, which is the content recording data itself, such as the video or audio 11 of the original event. While data 17 is recorded on a storage medium 18, the AR tag 19 may be stored on the same storage medium 18 or on an entirely different storage medium 20. This case differs from the previous one in that data 17 and AR tag 19 are kept separate. Even when they are stored separately in different places, AR tag 19 remains associated with the original data 17, so the storage location can be identified. Even if the AR tag itself cannot identify where the original data 17 is stored, the storage locations of the original data 17 and its associated AR tag 19 may be compiled into a database held at another location 21. Storage media 18 and 20 may be a database attached to a computer system, or a server or gateway. They may also be a video or audio playback device or transmission device that has a storage medium, such as home video or audio equipment. They may also be an easily portable storage medium such as a DVD, VHS videotape, or solid-state memory such as flash memory.
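In the separated case, database 21 only needs to link each piece of content data 17 to the locations of its AR tags 19. A tag that does not itself record where its content lives can then still be resolved. The following is a minimal in-memory sketch. It assumes hypothetical `Location` and `TagIndex` types and plain string identifiers for storage media 18 and 20.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Location:
    """Where something is stored: a medium identifier plus a path on it (hypothetical)."""
    medium: str
    path: str


class TagIndex:
    """Stand-in for database 21: links content data 17 to separately stored AR tags 19."""

    def __init__(self) -> None:
        self._content: dict[str, Location] = {}
        self._tags: dict[str, list[Location]] = {}

    def register(self, content_id: str, content_loc: Location, tag_loc: Location) -> None:
        self._content[content_id] = content_loc
        self._tags.setdefault(content_id, []).append(tag_loc)

    def tags_for(self, content_id: str) -> list[Location]:
        return list(self._tags.get(content_id, []))

    def content_for_tag(self, tag_loc: Location) -> Location | None:
        # Reverse lookup for a tag that does not itself record its content's location.
        for content_id, locs in self._tags.items():
            if tag_loc in locs:
                return self._content[content_id]
        return None


index = TagIndex()
data_17 = Location("medium-18", "/content/ep01.mpg")
tag_19 = Location("medium-20", "/tags/ep01.json")
index.register("ep01", data_17, tag_19)
```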

[0032] (Second Embodiment) Next, an embodiment concerning the use of AR tags in the present invention will be described with reference to the drawings.

[0033] FIG. 2 is a relational diagram showing how AR tags are used in the present invention. A general viewer 22 is any unspecified person who ordinarily watches television, listens to the radio, or uses the Internet. The general viewer 22 sends one or more keywords describing a scene or content that he or she would like to watch to a processor 23. The processor 23 may be a database attached to a computer system, or a server or gateway that includes such a database. It may also be a video or audio playback device or transmission device that has a storage medium, such as home video or audio equipment. It may also be an easily portable storage medium, typified by an optical recording medium such as a DVD, a magnetic recording medium such as VHS videotape, or solid-state memory such as flash memory.

If the keywords sent to the processor 23 are concrete and can be expressed in simple text, the processor first searches and selects content and scenes using metadata classified and labeled by existing techniques. Such keywords are typified by "place," "time," "person," "phenomenon," and "event." If the keywords are instead vague or sensory words, the processor 23 uses the AR tags to search for archived data that can satisfy the request. Examples of such words are "interesting," "fun," "sad," "moving," "thrilling," "funny," "calming," and "soothing." In this case, the processor 23 searches the AR-tagged video and audio content files stored in a storage medium 24, relying on the AR tags attached to the data. It then selects matching files and delivers them to the general viewer 22. The processor 23 may search the tagged data carrying the AR tags directly. Alternatively, the processor 23 may search the tags 27, which are AR tags stored in a storage medium 26. Based on the results, it then retrieves the original video or audio data 29 from a storage medium 28 and sends it to the general viewer 22.
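The two-stage search performed by processor 23 might be sketched as follows. Concrete keywords are resolved through conventional metadata, and vague or sensory ones through AR tags. The patent does not say how the processor decides which kind of keyword it has received. Purely for illustration, this sketch assumes that any keyword found among the archive's metadata values is concrete, and that everything else is sensory. `Item`, `search`, and the sample archive are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Item:
    """One archived content file (hypothetical model)."""
    content_id: str
    metadata: dict[str, str] = field(default_factory=dict)  # conventional metadata, e.g. place, person
    ar_labels: set[str] = field(default_factory=set)        # labels derived from AR tags


def search(archive: list[Item], keywords: list[str]) -> list[str]:
    """Return the IDs of items matching every keyword (sketch of the FIG. 2 flow)."""
    # Assumed rule: a keyword is "concrete" if it appears anywhere in the archive's metadata.
    vocabulary = {value for item in archive for value in item.metadata.values()}
    concrete = [k for k in keywords if k in vocabulary]
    sensory = [k for k in keywords if k not in vocabulary]

    # Stage 1: narrow the archive with conventional metadata.
    hits = [it for it in archive if all(k in it.metadata.values() for k in concrete)]
    # Stage 2: resolve vague or sensory words through AR-tag labels.
    hits = [it for it in hits if all(k in it.ar_labels for k in sensory)]
    return [it.content_id for it in hits]


archive = [
    Item("ep01", {"place": "Kyoto", "person": "Sato"}, {"calming"}),
    Item("ep02", {"place": "Kyoto", "person": "Tanaka"}, {"funny", "interesting"}),
    Item("ep03", {"place": "Osaka", "person": "Sato"}, {"funny"}),
]
print(search(archive, ["Kyoto", "funny"]))  # ['ep02']
```

Running the concrete filter first limits the AR-tag comparison to a smaller candidate set. This mirrors the order described above.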

[0034] Alternatively, the processor 23 may itself be a server equipped with a database of the tagged data 25 and the tags 27. In that case it may retrieve tagged data 25 from the database and send it to the general viewer 22. It may also retrieve a tag 27 from the database, select data 29 based on that tag 27, and send the data to the general viewer 22.

[0035] After being sent to the general viewer 22, an AR tag may remain unchanged. Alternatively, it may be further processed by the processor 23 to incorporate a new history record showing that it matched the search request of the general viewer 22. In other words, the AR-tag-based search shown in FIG. 2 may be reused as the starting point for the AR tag generation described with reference to FIG. 1. In addition, the general viewer 22 who made the search request may be identified, and that information may also be incorporated as new AR data.
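A search hit could be folded back into the tag, optionally together with the identity of the requesting viewer 22, as in the sketch below. The history schema (`query`, `at`, `viewer`) and the names `ARTag` and `record_hit` are assumptions made for illustration. The patent says only that the match history and the viewer's identity may be incorporated as new AR data.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class ARTag:
    """Hypothetical AR tag that keeps a history of the searches it has satisfied."""
    label: str
    content_id: str
    history: list[dict] = field(default_factory=list)

    @property
    def hit_count(self) -> int:
        return len(self.history)


def record_hit(tag: ARTag, query: list[str], viewer_id: str | None = None) -> ARTag:
    """Append one search hit to the tag so it can seed the next round of tag generation."""
    entry = {"query": list(query), "at": datetime.now(timezone.utc).isoformat()}
    if viewer_id is not None:
        entry["viewer"] = viewer_id  # only when the requesting viewer 22 is identified
    tag.history.append(entry)
    return tag


tag = ARTag("funny", "ep02")
record_hit(tag, ["Kyoto", "funny"], viewer_id="viewer-0042")
record_hit(tag, ["funny"])
print(tag.hit_count)  # 2
```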

[0036]

EFFECT OF THE INVENTION
Stored video, audio, and similar information contains events that are inherently difficult to define with concrete keywords. Such events could not be searched accurately, because labeling them with impression-based or sensory keywords was impractical. By using the AR tags of the present invention, marker tags that serve as cues for searching these events can be generated and associated automatically. As a result, such events can be searched easily and quickly.

[Brief description of drawings]

FIG. 1 is a relational diagram showing the procedure for creating an AR tag according to the first embodiment of the present invention.

FIG. 2 is a relational diagram showing the procedure for using an AR tag according to the first embodiment of the present invention.

FIG. 3 is a schematic diagram showing a method of attaching AR tags in the second embodiment of the present invention.

[Explanation of symbols]

11 Event including video and audio
12 Reaction provider
13 Data collector
14 Processor
15 Tagged content data
16 Storage medium
17 Content data
18 Storage medium
19 AR tag
20 Storage medium
21 Database or server
22 General viewer
23 Processor
24 Storage medium
25 Tagged content data
26 Storage medium
27 AR tag
28 Storage medium
29 Content data
30 Entire content data
31 Content data fragment (scene)
32 Content data fragment (frame)

Continuation of front page: (51) Int. Cl.⁷: H04N 7/08, 7/081; FI: H04N 7/08 Z

Claims

[Claims]

1. Marker data for addition to image or audio data, characterized in that a biological reaction of a person who has viewed a target image or audio is processed and associated with that image or audio.

2. The marker data for addition to image or audio data according to claim 1, wherein the biological reaction of the person who viewed the target image or audio is processed in association with that person's emotional reaction, and the processed result is associated with the image or audio.

3. The marker data for addition to image or audio data according to claim 1 or 2, which is newly associated with the image or audio through processing that uses both the biological reaction of the person who viewed the target image or audio and existing marker data already associated with the target image or audio data.

4. The marker data for addition to image or audio data according to any one of claims 1 to 3, wherein the biological reaction of the person who viewed the target image or audio is associated with any continuous data fragment of the target image or audio.

5. A method of adding marker data to image or audio data, characterized in that a biological reaction of a person who has viewed a target image or audio is processed and associated with that image or audio.

6. The method of adding marker data to image or audio data according to claim 5, wherein the biological reaction of the person who viewed the target image or audio is processed in association with that person's emotional reaction, and the processed result is associated with the image or audio.

7. The method of adding marker data to image or audio data according to claim 5 or 6, wherein the marker data is newly associated with the image or audio through processing that uses both the biological reaction of the person who viewed the target image or audio and existing marker data already associated with the target image or audio data.

8. The method of adding marker data to image or audio data according to any one of claims 5 to 7, wherein the biological reaction of the person who viewed the target image or audio is associated with any continuous fragment of the target image or audio.

9. The marker data for addition to image or audio data according to any one of claims 1 to 4, wherein the individual whose biological reaction is used for the association can be identified.