JP2003018540A

JP2003018540A - Image summarizing method and control program

Info

Publication number: JP2003018540A
Application number: JP2001203878A
Authority: JP
Inventors: Nozomi Takahashi; 望高橋
Original assignee: Ricoh Co Ltd
Current assignee: Ricoh Co Ltd
Priority date: 2001-07-04
Filing date: 2001-07-04
Publication date: 2003-01-17
Anticipated expiration: 2021-07-04
Also published as: JP4390407B2

Abstract

PROBLEM TO BE SOLVED: To provide an image summarizing method, and a control program for the method, that effectively summarize image information made up of a plurality of image segments, each formed by attaching content description information expressing the image content to a partial image. SOLUTION: For each combination of the content description information attached to the image segments, taken starting from the lowest-numbered segment, a degree of similarity is calculated for each item, and the average of these values is taken as the degree of similarity of the combination. After the calculation, the combination with the highest degree of similarity is searched for (S100). The content description information is merged simply, with the lower-numbered segment taking precedence. The merged content description information is attached to the partial image of the image segment whose identification number ranks higher, and the lower-ranked segment is deleted. This process is repeated until the desired number of image segments is obtained (S104). The first frame of each summarized image segment and its content description information are then displayed in table form on a monitor or printed, and the process ends.

Description

Detailed Description of the Invention

[0001]

BACKGROUND OF THE INVENTION 1. Field of the Invention The present invention relates to a video summarization method, and a control program therefor, that make it possible to grasp the outline of a video without playing back and watching it, which generally takes a great deal of time.

[0002]

2. Description of the Related Art To grasp the content of video information, the video must be played back and watched. This generally takes about as long as the video itself, which is extremely burdensome when only an outline of the video information is wanted. Inventions that address this problem and aim to make the task more efficient include, for example, those disclosed in Japanese Patent Application Laid-Open No. 2000-308008 and Japanese Patent Application Laid-Open No. 10-112835.

[0003] The method of determining the importance of video segments and the method of packing a set of frames into a limited area, disclosed in Japanese Patent Application Laid-Open No. 2000-308008, aim to determine the importance of each shot of a video, generate a video summary based on that importance, and resize the representative frames according to importance and pack them into a limited area.

[0004] According to that publication, the solution is as follows. A measure of importance is calculated for the segmented portions of the video. The measure of importance can be used to select the most important segments and to generate representative frames for the selected segments. A thresholding process is applied to the importance scores to yield either a predetermined number of shots or segments to be represented by frames, or an appropriate number determined at run time. The representative frames are then packed into the video summary. The size of each frame to be packed is determined in advance by its measure of importance and adjusted according to the available space.

[0005] The video summarization method and video display method disclosed in Japanese Patent Application Laid-Open No. 10-112835 aim to provide a video summarization device that accommodates the diversity of video content and of user preferences, and a video display device for displaying the summary information efficiently.

[0006] As its solution, the video summarization system of that invention comprises a video summarization device, which includes a plurality of video summarization means for extracting summary information from a plurality of scenes formed by dividing captured video according to a predetermined criterion, and a video display device on which this summary information can be selected. The publication states that, compared with a conventional video summarization device that selects representative images uniformly under a single fixed summarization criterion, and a conventional video display device that selects representative images regardless of the length of the video, this configuration can accommodate the diversity of video content and of user preferences.

[0007] Meanwhile, the International Organization for Standardization and the International Electrotechnical Commission are currently standardizing the "Multimedia Content Description Interface", that is, MPEG-7, as a scheme for searching digital content by its features. FIG. 5 shows an example of segments and content description information for a video. As shown in FIG. 5, suppose there is a video titled "Report on a conference presentation". This video centers on the presentation given at the conference and also covers what came before and after it. It first consists of large sections, namely "Rehearsal before departure", "The way to the conference venue", "Inside the venue", and "Presentation". Taking "The way to the conference venue" as an example, that scene is in turn made up of "Outside the city", "Waiting at a traffic light", "Station square", and so on. The video thus has a tree structure, which is typical of video information. Partial videos that form such a tree structure, such as shots, scenes, and semantically coherent units, are hereinafter referred to as "video segments".
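The tree of video segments described above can be sketched as a small recursive data structure. This is only an illustration: the `Segment` class, the `leaves` helper, and the dotted IDs assigned to each title are assumptions made for the example, not taken from FIG. 5.

```python
from dataclasses import dataclass, field


@dataclass
class Segment:
    """A video segment: one node in the tree of partial videos."""
    segment_id: str                      # dotted ID; the numbering here is assumed
    title: str
    children: list["Segment"] = field(default_factory=list)


def leaves(node: Segment) -> list[Segment]:
    """Return the leaf segments under `node`, in playback order."""
    if not node.children:
        return [node]
    found: list[Segment] = []
    for child in node.children:
        found.extend(leaves(child))
    return found


# Titles follow the example in the text; the IDs are hypothetical.
report = Segment("1", "Report on a conference presentation", [
    Segment("1.1", "Rehearsal before departure"),
    Segment("1.2", "The way to the conference venue", [
        Segment("1.2.1", "Outside the city"),
        Segment("1.2.2", "Waiting at a traffic light"),
        Segment("1.2.3", "Station square"),
    ]),
    Segment("1.3", "Inside the venue"),
    Segment("1.4", "Presentation"),
])
```

Calling `leaves(report)` yields the finest-grained segments, which is the level at which shots or scenes would typically be compared with one another.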

[0008] The content description information is text information describing the content of the video. In a simple case it describes, for example, the people appearing, the name of the shooting location, the date and time, and an outline. Where richer information is included, it can also describe background information related to the scene, the describer's subjective impressions, and the like; as shown in FIG. 5, content description information can be attached to each scene. As noted above, this content description information is currently being standardized in MPEG-7 and elsewhere. In general, such content description information is used as an index for retrieving segments from a video database holding an enormous amount of video data.

[0009]

PROBLEMS TO BE SOLVED BY THE INVENTION In both of the prior-art references cited above, to make the above task more efficient, the video is automatically divided into shots and representative frames are selected using image color information and the like. Similar shots are then merged or deleted based on the similarity between the representative frames of the shots, which reduces the number of shots. Finally, the representative frame and a description of each remaining shot are mapped onto a two-dimensional display medium such as paper, so that the video information can be grasped efficiently. These references call the result a summary of the video information.

[0010] However, these methods depend on image features. When two shots are visually entirely different, neither is merged or deleted, even if their semantic content is similar. In other words, the video summary these methods produce is a summary of the images in the video, but it cannot be called a summary of the content of the video.

[0011] To solve the above problem, an object of the present invention is therefore to provide a video summarization method, and a control program therefor, that improve the efficiency of grasping the outline of video information by summarizing the content of the video using the content description information attached to each video segment.

[0012]

MEANS FOR SOLVING THE PROBLEMS As the prior-art references cited above also describe, summarizing a video amounts to repeatedly merging segments. The present inventor found that video summary information from which the content of a video can be grasped efficiently can be generated by using, instead of image information, the textual similarity of the content description information and the time length of the segments, and thereby arrived at the present invention.

[0013] That is, to achieve the above object, the invention of claim 1 is a video summarization method for summarizing video information in which content description information, consisting of text expressing the video content, is attached to each partial video such as a scene or cut to form a video segment, and a plurality of such video segments are linked in a tree. The method comprises: a calculation step of calculating the similarity between pieces of content description information; and a first merging step of merging a plurality of video segments for which the calculation step found a high similarity, by keeping the partial video of the one video segment that serves as their representative, deleting the others, and attaching content description information to the retained partial video. Through the calculation step and the first merging step, partial videos are weeded out to the extent that the content of the video information can still be grasped, producing a video summary to which content description information is attached.
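The interplay of the calculation step and the first merging step can be sketched as a greedy loop. This is a minimal sketch under stated assumptions: each segment is reduced to an `(id, description)` pair, `difflib.SequenceMatcher` stands in for the text similarity measure (the claim does not fix one), and the earlier segment of the most similar pair is treated as the representative. The function names are hypothetical.

```python
from difflib import SequenceMatcher
from itertools import combinations


def text_similarity(a: str, b: str) -> float:
    """Stand-in text similarity in [0.0, 1.0]; an assumption, not the patent's formula."""
    return SequenceMatcher(None, a, b).ratio()


def summarize(segments: list[tuple[str, str]], target: int) -> list[tuple[str, str]]:
    """Repeat calculation and merging until at most `target` segments remain.

    Each segment is (segment_id, description). Of the most similar pair, the
    earlier segment is kept as the representative and the later one is deleted.
    """
    segments = list(segments)
    while len(segments) > max(target, 1):
        # Calculation step: compare the descriptions of every pair of segments.
        i, j = max(
            combinations(range(len(segments)), 2),
            key=lambda p: text_similarity(segments[p[0]][1], segments[p[1]][1]),
        )
        # First merging step: combinations() guarantees i < j, so segment i survives.
        del segments[j]
    return segments
```

Stopping at a target count is one possible reading of "to the extent that the content can still be grasped"; a similarity threshold would be another.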

[0014] The invention of claim 2 is the video summarization method of claim 1, wherein the content description information consists of a plurality of items, and the calculation step calculates the similarity only for a designated subset of those items.

[0015] The invention of claim 3 is the video summarization method of claim 1, wherein the content description information consists of a plurality of items; each item is given a weight representing the degree to which it drives the summarization; the calculation step calculates the similarity only for the weighted items; and the first merging step determines the merging priority according to the weights and merges video segments in that priority order until the desired number of video segments is reached.
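One way to read the per-item weights is as coefficients of a weighted mean over per-item similarity scores. The sketch below assumes that reading; the function name and the convention that a weight of 0 (or a missing weight) excludes an item are illustrative choices, and the sketch covers only the similarity side of the claim, not the ordering of merges.

```python
def weighted_similarity(item_scores: dict[str, float], weights: dict[str, float]) -> float:
    """Weighted mean of per-item similarity scores in [0.0, 1.0].

    Items that have no weight, or a weight of 0, are ignored, which also
    covers the case of comparing only a designated subset of items.
    """
    used = {k: w for k, w in weights.items() if w > 0 and k in item_scores}
    total = sum(used.values())
    if total == 0:
        return 0.0  # nothing to compare; treated as dissimilar in this sketch
    return sum(item_scores[k] * w for k, w in used.items()) / total
```

For example, weighting only `Where` would merge segments shot at the same place first, regardless of who appears in them.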

[0016] The invention of claim 4 is the video summarization method of any one of claims 1 to 3, wherein, when there are a plurality of sets of video segments with high similarity, the first merging step gives priority to merging the set whose partial videos have the smallest total time length.
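The duration-based priority can be sketched as a compound sort key. The sketch assumes that "a plurality of sets with high similarity" means pairs with exactly equal scores; in practice scores might be rounded or bucketed first. The tuple layout and function name are hypothetical.

```python
def pick_pair(candidates: list[tuple[float, float, float]]) -> int:
    """Return the index of the pair to merge first.

    Each candidate is (similarity, duration_a, duration_b), durations in seconds.
    The highest similarity wins; among equally similar pairs, the one with the
    smallest combined duration is preferred.
    """
    return max(
        range(len(candidates)),
        key=lambda k: (candidates[k][0], -(candidates[k][1] + candidates[k][2])),
    )
```

Preferring the shortest pair means that, when several merges are equally justified by the text, the one that discards the least footage is performed first.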

[0017] The invention of claim 5 is the video summarization method of any one of claims 1 to 3, wherein, when there are a plurality of sets of video segments with high similarity, the first merging step gives priority to merging the set of video segments that lies at a deeper layer of the tree structure.

[0018] The invention of claim 6 is the video summarization method of any one of claims 1 to 5, wherein the first merging step uses the content description information of the representative video segment, among the plurality of video segments being merged, as the content description information of the merged video segment.

[0019] The invention of claim 7 is the video summarization method of claim 6, wherein the representative video segment is, among the plurality of video segments for which the calculation step found a high similarity, the higher-level video segment in a containment relationship.

[0020] The invention of claim 8 is the video summarization method of any one of claims 1 to 5, wherein the first merging step merges the respective content description information of the plurality of video segments for which the calculation step found a high similarity, and uses the merged content description information as the content description information of the merged video segment.
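Merging two pieces of content description information could, for instance, be done item by item. The sketch below is one assumed rule, not the method fixed by the claim: identical texts are kept once, and differing texts are joined with the earlier segment's text first. The `" / "` separator and the function name are hypothetical.

```python
def merge_descriptions(first: dict[str, str], second: dict[str, str]) -> dict[str, str]:
    """Merge two item -> text mappings, the earlier segment's text coming first.

    Identical texts are kept once; differing texts are joined, earlier first.
    """
    merged: dict[str, str] = {}
    # Keep the item order of the earlier segment, then append any extra items.
    for item in list(first) + [k for k in second if k not in first]:
        a, b = first.get(item, ""), second.get(item, "")
        if not b or a == b:
            merged[item] = a
        elif not a:
            merged[item] = b
        else:
            merged[item] = f"{a} / {b}"
    return merged
```

With this rule, items such as `Who` that are shared by neighbouring scenes stay compact, while items such as `WhatAction` accumulate the sequence of actions.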

[0021] The invention of claim 9 is the video summarization method of any one of claims 1 to 8, further comprising a second merging step of searching for and merging the set of video segments whose video times, which differ from segment to segment, have the shortest sum, and using the content description information attached to the video segment with the longer video time in that set as the content description information of the merged video segment, wherein the second merging step is combined with, or used selectively in place of, the first merging step as appropriate so as to obtain an optimal video summary.
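The second merging step can be sketched as follows, assuming each segment is an `(id, duration, description)` tuple and that the shorter segment of the chosen pair is simply dropped so that the longer segment's description survives. What happens to the dropped footage is not specified in the claim, so that part is an assumption, as are the names used.

```python
from itertools import combinations

Segment = tuple[str, float, str]  # (segment_id, duration_seconds, description)


def merge_shortest_pair(segments: list[Segment]) -> list[Segment]:
    """Merge the two segments whose durations have the smallest sum.

    The longer segment of that pair is kept together with its description;
    the shorter one is removed.
    """
    if len(segments) < 2:
        return list(segments)
    i, j = min(
        combinations(range(len(segments)), 2),
        key=lambda p: segments[p[0]][1] + segments[p[1]][1],
    )
    # On equal durations the earlier segment (index i) is kept.
    drop = j if segments[i][1] >= segments[j][1] else i
    return [s for k, s in enumerate(segments) if k != drop]
```

Because this step ignores the text entirely, it is a natural fallback when no pair of descriptions is similar enough for the first merging step to act on.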

[0022] The invention of claim 10 is the video summarization method of any one of claims 1 to 9, further comprising a changing step of giving each layer of the tree-structured video information a weight representing the degree to which summarization is applied to it, and changing the priority of the video segments to be merged according to the weights.

[0023] The invention of claim 11 is the video summarization method of any one of claims 1 to 10, further comprising a step of merging video segments based on image features, wherein the merging of video segments based on the content description information and the merging of video segments based on the image features are combined or used selectively as appropriate so as to produce an optimal video summary.

[0024] The invention of claim 12 is the video summarization method of any one of claims 1 to 11, further comprising a summary output step of creating and outputting a video summary from a still frame of each video segment constituting the summarized video information and the content description information for that still frame.

[0025] The invention of claim 13 is a control program for video summarization that summarizes video information in which content description information, consisting of text expressing the video content, is attached to each partial video such as a scene or cut to form a video segment, and a plurality of such video segments are linked in a tree. The program comprises calculation means for calculating the similarity between pieces of content description information, and first merging means for merging a plurality of video segments for which the calculation means found a high similarity. The program causes a computer to function as the calculation means and the first merging means, and weeds out partial videos to the extent that the content of the video information can still be grasped, producing a video summary to which content description information is attached.

[0026] The invention of claim 14 is the control program for video summarization of claim 13, wherein the content description information consists of a plurality of items, and the calculation means calculates the similarity only for a designated subset of those items.

[0027] The invention of claim 15 is the control program for video summarization of claim 13, wherein the content description information consists of a plurality of items; each item is given a weight representing the degree to which it drives the summarization; the calculation means calculates the similarity only for the weighted items; and the first merging means determines the merging priority according to the weights and merges video segments in that priority order until the desired number of video segments is reached.

[0028] The invention of claim 16 is the control program for video summarization of any one of claims 13 to 15, wherein, when there are a plurality of sets of video segments with high similarity, the first merging means gives priority to merging the set whose partial videos have the smallest total time length.

[0029] The invention of claim 17 is the control program for video summarization of any one of claims 13 to 15, wherein, when there are a plurality of sets of video segments with high similarity, the first merging means gives priority to merging the set of video segments that lies at a deeper layer of the tree structure.

[0030] The invention of claim 18 is the control program for video summarization of any one of claims 13 to 17, wherein the first merging means uses the content description information of the representative video segment, among the plurality of video segments being merged, as the content description information of the merged video segment.

[0031] The invention of claim 19 is the control program for video summarization of claim 18, wherein the representative video segment is, among the plurality of video segments for which the calculation means found a high similarity, the higher-level video segment in a containment relationship.

[0032] The invention of claim 20 is the control program for video summarization of any one of claims 13 to 17, wherein the first merging means merges the respective content description information of the plurality of video segments for which the calculation means found a high similarity, and uses the merged content description information as the content description information of the merged video segment.

[0033] The invention of claim 21 is the control program for video summarization of any one of claims 13 to 20, further comprising second merging means for searching for and merging the set of video segments whose video times, which differ from segment to segment, have the shortest sum, and using the content description information attached to the video segment with the longer video time in that set as the content description information of the merged video segment, wherein the program causes the computer to function as the second merging means, and the second merging means is combined with, or used selectively in place of, the first merging means as appropriate so as to obtain an optimal video summary.

[0034] The invention of claim 22 is the control program for video summarization of any one of claims 13 to 21, further comprising changing means for giving each layer of the tree-structured video information a weight representing the degree to which summarization is applied to it, and changing the priority of the video segments to be merged according to the weights, wherein the program causes the computer to function as the changing means.

[0035] The invention of claim 23 is the control program for video summarization of any one of claims 13 to 22, further comprising means for merging video segments based on image features, wherein the program causes the computer to function as the means for merging video segments based on image features, and the merging of video segments based on the content description information and the merging of video segments based on the image features are combined or used selectively as appropriate so as to produce an optimal video summary.

[0036] The invention of claim 24 is the control program for video summarization of any one of claims 13 to 23, further comprising summary output means for creating and outputting a video summary from a still frame of each video segment constituting the summarized video information and the content description information for that still frame, wherein the program causes the computer to function as the summary output means.

[0037]

DETAILED DESCRIPTION OF THE INVENTION Embodiments of the present invention are described in detail below with reference to the accompanying drawings. Before the control program for video summarization of this embodiment is described, video information to which content description information has been attached is described with reference to FIG. 5. As explained in the Description of the Related Art, FIG. 5 illustrates a video titled "Report on a conference presentation". This video centers on the presentation given at the conference and also covers what came before and after it. It first consists of large sections, namely "Rehearsal before departure", "The way to the conference venue", "Inside the venue", and "Presentation". Taking "The way to the conference venue" as an example, that scene is in turn composed of partial videos arranged in a tree (hierarchical) structure, such as "Outside the city", "Waiting at a traffic light", "Station square", and so on.

[0038] The partial videos in each layer, each with its own video time, are assigned a SegmentID so that each scene can be identified. Content description information (text) consisting of seven items, Who, When, Where, WhatAction, WhatObject, Why, and FreeText, is attached to each SegmentID to form a video segment, so that searching, management, editing, and the like can be performed easily. The content description information is not limited to the seven items illustrated, nor to this kind of itemized form; any format may be used as long as it can be associated with a video segment.

[0039] To simplify the explanation, the description below focuses on part of the video in FIG. 5, namely the content description information for the video segments SegmentID 1.2.3, SegmentID 1.2.4, and SegmentID 1.2.5, which lie in the layer below SegmentID 1.2, "The way to the conference venue". Their content description information is as follows.

[0040] (1) Content description information of SegmentID 1.2.3
Who = "Nozomi Takahashi"
When = "October 16, 1999"
Where = "Capitole city streets, Toulouse, France"
WhatAction = "Walking to the station square while explaining"
WhatObject = "People passing by"
Why = "Heading to the IDMS'99 presentation venue"
FreeText = "A beautiful city. All kinds of people come and go. Stopped at a red light."

[0041] (2) Content description information of SegmentID 1.2.4
Who = "Nozomi Takahashi"
When = "October 16, 1999"
Where = "Capitole city streets, Toulouse, France"
WhatAction = "Waiting at a traffic light"
WhatObject = "Traffic light"
Why = "Heading to the IDMS'99 presentation venue"
FreeText = "I thought something like an alarm was sounding, but apparently it is the traffic signal. Cross on green."

[0042] (3) Content description information of SegmentID 1.2.5
Who = "Nozomi Takahashi"
When = "October 16, 1999"
Where = "Metro Capitole Station square, Toulouse, France"
WhatAction = "Going down the stairs into the station"
WhatObject = "Metro entrance and sign"
Why = "Heading to the IDMS'99 presentation venue"
FreeText = "As pretty as a park. The Metro sign."

[0043] As shown in FIG. 1, the control program for video summarization of this embodiment comprises content description information parsing means 1, image information acquisition means 2, and video summary output means 3. The content description information parsing means 1 merges the video information and the content description information, and comprises first merging means 11, which merges video segments based on the content description information; second merging means 12, which merges video segments based on the time length of each video segment; and layer-based segment merging means 13.

【００４４】The first merging means 11 includes calculating means for calculating the similarity of the content description information added to each video segment. It successively calculates similarities over the content description information of the video segments that make up the video information, then extracts and merges the video segments with high similarity. The merging operation of the first merging means 11 is as follows. First, the similarity is calculated for every combination of the content description information added to the video segments of all leaf nodes in the tree.

【００４５】A simple example of a similarity calculation is to compute the similarity for each item (Who, Where, ...) with the following formula and take the average as the overall similarity. In this case the value ranges from 0.0 to 1.0, and a larger value means a higher similarity.

【００４６】[0046]

【数１】 [Equation 1]
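Equation 1 is not reproduced in this text, so the following Python sketch substitutes a generic character-level similarity from the standard library (`difflib.SequenceMatcher`) for the per-item formula. The function names, the `ITEMS` tuple, and the choice to treat two empty items as fully similar are assumptions made for illustration. Only the idea of averaging per-item similarities into a value between 0.0 and 1.0 comes from the description.

```python
from difflib import SequenceMatcher

# The seven items named in the description.
ITEMS = ("Who", "When", "Where", "WhatAction", "WhatObject", "Why", "FreeText")


def item_similarity(a: str, b: str) -> float:
    """Stand-in text similarity in [0.0, 1.0]; NOT the patent's Equation 1."""
    if not a and not b:
        return 1.0  # assumption: two empty items count as identical
    return SequenceMatcher(None, a, b).ratio()


def description_similarity(c: dict, d: dict) -> float:
    """Average of the per-item similarities of two content descriptions."""
    scores = [item_similarity(c.get(k, ""), d.get(k, "")) for k in ITEMS]
    return sum(scores) / len(scores)
```

Any text comparison engine that returns a value in the same range could replace `item_similarity`, including one that normalizes spelling variants before comparing.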

【００４７】In the present invention, the calculation method is not particularly limited as long as the similarity between texts can be calculated. However, a text comparison engine that can absorb variation in wording (for example, "おはよう" and "おはよー", two spellings of "good morning") is preferable. In terms of the calculation example above, it is also possible to weight each item instead of taking a plain average, so that the similarity is calculated with attention on any chosen item or items of the content description information.

【００４８】[0048]

【数２】 [Equation 2]

【００４９】Here, weight(n) is the degree of priority given to each of the seven items when judging similarity, and the weights sum to 1. For example, suppose the weights for the Where and WhatAction items are each "0.3", the weights for the WhatObject and Why items are each "0.2", and the weights for the remaining Who, When, and FreeText items are each "0". In that case, the similarity is first calculated for the Where and WhatAction items, and then for the WhatObject and Why items. No similarity is calculated for the remaining Who, When, and FreeText items, even if they are identical or very similar. The similarity may thus be calculated only for specific items and judged item by item rather than as an average; in that case, the per-item weight represents the priority at judgment time. It is also possible to skip such weights altogether, simply calculate the similarity for specific items only, and use the average of those values as the similarity.
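Equation 2 is not reproduced in this text either. The following is a minimal Python sketch, assuming it is a weighted sum of per-item similarities, using the example weights from the paragraph above. The names `WEIGHTS`, `item_similarity`, and `weighted_similarity` are hypothetical, and `difflib.SequenceMatcher` is again only a stand-in for the per-item formula.

```python
from difflib import SequenceMatcher

# Example weights from the description; they sum to 1.
WEIGHTS = {
    "Who": 0.0, "When": 0.0, "FreeText": 0.0,
    "Where": 0.3, "WhatAction": 0.3,
    "WhatObject": 0.2, "Why": 0.2,
}


def item_similarity(a: str, b: str) -> float:
    """Stand-in text similarity in [0.0, 1.0]; not the patent's formula."""
    if not a and not b:
        return 1.0
    return SequenceMatcher(None, a, b).ratio()


def weighted_similarity(c: dict, d: dict, weights: dict = WEIGHTS) -> float:
    """Weighted sum of per-item similarities; zero-weight items are skipped."""
    total = 0.0
    for item, weight in weights.items():
        if weight == 0.0:
            continue  # these items are never compared, as in the example
        total += weight * item_similarity(c.get(item, ""), d.get(item, ""))
    return total
```

With these weights, two descriptions that differ only in Who, When, or FreeText still score as fully similar, which matches the behavior described for zero-weight items.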

【００５０】In this way, the similarity is calculated successively for every combination of the content description information added to the video segments of all leaf nodes in the tree. Among the similarities of all combinations, the pair of video segments with the highest similarity is selected as the target for merging both the video information and the content description information, and is merged. Next, the similarity is calculated for every combination of the merged video segment with the other video segments, and merging continues. This series of operations is repeated until the number of video segments reaches a number specified in advance.
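The loop described above can be sketched in Python as a greedy pairwise merge. This is an illustration under assumptions, not the patented implementation: segments are modeled as a dictionary from a dotted SegmentID string to a description dictionary, `similarity` is a stand-in built on `difflib.SequenceMatcher`, and the caller supplies the `merge` function that produces the description of the merged segment.

```python
from difflib import SequenceMatcher
from itertools import combinations


def id_key(segment_id: str) -> tuple:
    """Sort key so that '1.2.10' comes after '1.2.9'."""
    return tuple(int(part) for part in segment_id.split("."))


def similarity(c: dict, d: dict) -> float:
    """Average stand-in similarity over the items present in either description."""
    items = sorted(set(c) | set(d))
    if not items:
        return 1.0
    scores = [SequenceMatcher(None, c.get(i, ""), d.get(i, "")).ratio() for i in items]
    return sum(scores) / len(scores)


def summarize(segments: dict, target_count: int, merge) -> dict:
    """Merge the most similar pair repeatedly until target_count segments remain."""
    segments = dict(segments)  # work on a copy
    while len(segments) > max(target_count, 1):
        ids = sorted(segments, key=id_key)
        # Every pair is re-scored each round: simple, but quadratic per round.
        keep, drop = max(
            combinations(ids, 2),
            key=lambda pair: similarity(segments[pair[0]], segments[pair[1]]),
        )
        segments[keep] = merge(segments[keep], segments[drop])
        del segments[drop]  # the segment with the larger ID is removed
    return segments
```

Because pairs are generated from IDs in ascending order, the segment with the smaller SegmentID is the one that survives each merge.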

【００５１】Once the calculating means has calculated the similarity of the content description information added to each video segment, the pairs with high similarity are merged. The content description information of the two video segments can be merged in, for example, the following three ways. Let C and D be the content description information of the two video segments to be merged, and let E be the content description information of the resulting merged video segment.

【００５２】When there is an inclusion relationship between the content description information of C and D, only the content description information of the including video segment is used as the content description information of E. That is:
When text of C ⊃ text of D holds, content description information of E = content description information of C.
When text of C ⊂ text of D holds, content description information of E = content description information of D.
When text of C = text of D holds, the content description information of E may be that of either C or D.
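The three inclusion cases can be sketched in Python as follows. The sketch assumes that "text of C includes text of D" means plain substring containment, which the description does not spell out, and the function name is hypothetical.

```python
from typing import Optional


def choose_by_inclusion(c_text: str, d_text: str) -> Optional[str]:
    """Return the description text for E, or None if neither text includes the other.

    Assumption: inclusion is interpreted as substring containment.
    """
    if d_text in c_text:
        return c_text  # C includes D (also covers C == D, where either is allowed)
    if c_text in d_text:
        return d_text  # D includes C
    return None  # no inclusion relationship; another merge method is needed
```

When the function returns `None`, the two texts have no inclusion relationship, and a different way of determining E has to be used.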

【００５３】Once the new content description information has been determined in this way, it is added to the partial video of the video segment that is higher in the inclusion relationship among the video segments being merged, and the video segment that is lower in the inclusion relationship is discarded. In this way, multiple video segments with high similarity are consolidated into one. The form of merging is not particularly limited. For example, merging may be performed over all combinations of all video segments regardless of layer, over combinations restricted to each layer, progressively from the lowest layer toward the upper layers or in the reverse direction, or with a "weight" assigned to each layer as its merging priority.

【００５４】As another method of determining the new content description information, the content description information of C and D may be merged and the result used as the content description information of E. The merge may simply join the words item by item and omit duplicated parts. For the FreeText part, an existing technique for summarizing multiple similar documents may be used (Hiroto Inagaki et al., Proposal of a Transfer-type Electronic Document Summarization Method by Integration of Similar Semantic Content, Proceedings of the 56th National Convention of the Information Processing Society of Japan, Vol. 2, pp. 255-256, 1998). When words (sentences) are simply joined, placing the one with the smaller SegmentID first is preferable, because it more often yields natural wording. As an example of the former method, the result of merging the content description information of SegmentID 1.2.3 and SegmentID 1.2.4 is shown below.

【００５５】(1) Content description information after merging SegmentID 1.2.3 and SegmentID 1.2.4
Who = "Nozomi Takahashi"
When = "October 16, 1999"
Where = "Capitole city, Toulouse, France"
WhatAction = "Walk to the station square while explaining Waiting at the traffic light"
WhatObject = "People passing by Traffic light"
Why = "Heading to the IDMS'99 presentation venue"
FreeText = "A beautiful city. All kinds of people come and go. Stop at a red light. I thought something like an alarm was sounding, but apparently it is the traffic signal. Cross when it is green."
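The simple item-by-item merge shown above can be sketched in Python as follows. The sketch interprets "omit duplicated parts" as "do not repeat a value that is identical to, or contained in, the other value", and it joins values with a single space. Both choices, and the function names, are assumptions for illustration.

```python
def merge_item(first: str, second: str) -> str:
    """Join two item values, first one leading, without repeating a duplicate."""
    if second in first:
        return first  # second adds nothing new (also covers equal values)
    if first in second:
        return second
    return first + " " + second  # assumption: a space as the separator


def merge_descriptions(younger: dict, older: dict) -> dict:
    """Merge two descriptions, putting the smaller SegmentID's text first."""
    items = list(younger) + [k for k in older if k not in younger]
    return {k: merge_item(younger.get(k, ""), older.get(k, "")) for k in items}
```

Applied to the SegmentID 1.2.3 and 1.2.4 descriptions, identical items such as Who stay unchanged while WhatAction and WhatObject become concatenations, as in the merged listing.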

【００５６】Comparing the pair SegmentID 1.2.3 and SegmentID 1.2.4 in this way, the word "信号" ("signal") appears the same number of times in the FreeText parts of both, and the Who, When, Where, and Why parts each match exactly, so the pair becomes a merge target. The descriptions are merged with the smaller SegmentID first, the merged content description information is added to the partial video of the segment with the smaller SegmentID, and SegmentID 1.2.4 is deleted.

【００５７】When several pairs have a high similarity, the merge is performed on the pair whose two video segments have the smallest total time length. In this case, the shorter partial video may be discarded, or the partial videos may simply be joined. When several pairs have a high similarity, it is also preferable to give priority to the pair of video segments located at a deeper layer of the tree structure.

【００５８】Next, the second merging means 12, which is based on segment time length, is described. The second merging means 12 does not judge the similarity of the content description information of the video segments. It simply finds the pair of video segments whose video times have the shortest sum and merges that pair. The content description information of the two video segments is merged as follows.

【００５９】The time lengths of C and D are compared, and only the content description information of the longer video segment is used as the content description information of E. That is:
When time length of C > time length of D holds, content description information of E = content description information of C.
When time length of C < time length of D holds, content description information of E = content description information of D.
When time length of C = time length of D holds, the content description information of E may be that of either C or D.

【００６０】Once the new content description information has been determined in this way, either the video segment with the shorter video time among the merge targets is discarded, or the partial videos are joined and the new content description information is added to consolidate them into one.
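One step of the duration-based merge can be sketched in Python as follows. The data layout (a dictionary from a segment ID to a `(duration_seconds, description)` tuple) and the function name are hypothetical, and the sketch implements only the variant that discards the shorter segment rather than joining the two partial videos.

```python
from itertools import combinations


def merge_shortest_pair(segments: dict) -> dict:
    """Merge the pair with the smallest total duration; keep the longer segment."""
    if len(segments) < 2:
        return dict(segments)
    a, b = min(
        combinations(sorted(segments), 2),
        key=lambda pair: segments[pair[0]][0] + segments[pair[1]][0],
    )
    # The longer segment supplies the description of the merged segment.
    # On equal durations either may be kept; this sketch keeps the first.
    keep, drop = (a, b) if segments[a][0] >= segments[b][0] else (b, a)
    result = dict(segments)
    del result[drop]  # the shorter segment is discarded
    return result
```

Calling the function repeatedly until the desired number of segments remains gives a summary that never consults the content description text.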

【００６１】The layer-based segment merging means 13 adds, to each of the first merging means and the second merging means described above, changing means that assigns to each layer of the tree-shaped video information a weight, which is the degree to which video summarization is executed, and changes the priority of the video segments to be merged according to that weight.

【００６２】For example, in FIG. 5 the video information "Presentation at an academic conference" has "3" layers. Suppose the weight of the middle layer is "0.6" and the weight of the bottom layer is "0.4". First, the first merging means or the second merging means described above is used to search the middle layer for video segments that can be merged. When no merge targets remain in the middle layer, processing moves to the bottom layer, which is searched in the same way using the first merging means or the second merging means. The "weight" thus serves as the merging priority: merging is performed in the middle layer, then in the bottom layer, until the desired number of video segments is reached. If the desired number of video segments is reached by merging in the middle layer alone, processing ends there.

【００６３】Alternatively, the "weight" may be treated as a simple ratio. When the original video information, composed of multiple video segments, is reconstructed with the desired number of video segments, the number of video segments to be removed is divided among the layers in proportion to their weights, and the search and merge are performed using the first merging means 11 or the second merging means 12 described above.
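Treating the layer weights as ratios amounts to splitting the number of segments to remove across the layers. The Python sketch below uses a largest-remainder rounding rule so that the per-layer counts always add up to the total. That rounding rule, the layer labels, and the function name are assumptions, since the description does not say how fractional counts are handled.

```python
def removals_per_layer(total_to_remove: int, layer_weights: dict) -> dict:
    """Split total_to_remove across layers in proportion to their weights."""
    weight_sum = sum(layer_weights.values())
    exact = {k: total_to_remove * w / weight_sum for k, w in layer_weights.items()}
    counts = {k: int(x) for k, x in exact.items()}  # round down first
    leftover = total_to_remove - sum(counts.values())
    # Hand the remaining removals to the layers with the largest fractional parts.
    by_remainder = sorted(exact, key=lambda k: exact[k] - counts[k], reverse=True)
    for layer in by_remainder[:leftover]:
        counts[layer] += 1
    return counts
```

For example, removing 200 segments with weights of 0.6 for the middle layer and 0.4 for the bottom layer assigns 120 removals to the middle layer and 80 to the bottom layer.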

【００６４】The image information acquisition means 2 consists of image-information-based segment merging means 21. This technical means is a known technique that uses the image features of each partial video; because its details are not the gist of the present application, their description is omitted.

【００６５】The video summary output means 3 selects one of the first merging means 11, the second merging means 12, the layer-based segment merging means 13, and the image-information-based segment merging means 21 described above, merges the original video information, composed of multiple video segments, until the desired number of video segments is reached, and outputs the reconstructed video information. Output may take the form of simply showing the summarized video information on a display as a "moving image", or of extracting the front frame of each summarized video segment and displaying on a display, or printing, a list of the front frames with their content description information.

【００６６】For the latter list format, FIG. 2 shows an output example before summarization (SegmentID 1.2.3 to 1.2.5 are illustrated), and FIG. 3 shows an output example after summarization (SegmentID 1.2.3 to 1.2.6 are illustrated). These figures were output by installing and running the control program for video summarization of this embodiment on a computer with an OS that provides a GUI environment. The summarization used for this output example is the first merging means 11, with the content description information of the segments merged and the result used as the new content description information.

【００６７】As shown in FIG. 3, comparing the pair SegmentID 1.2.3 and SegmentID 1.2.4, "信号" ("signal") matches in the FreeText parts, and the Who, When, Where, and Why parts each match exactly, so the pair becomes a merge target. Then, as shown in FIG. 4, the content description information is simply merged with the smaller video segment ID first, the segment with the smaller video segment ID is kept, the video segment of SegmentID 1.2.4 is deleted, and the result is displayed on the display in list format. In this example, the present invention reduces the viewer's download volume by three images, and a screen that could show only three video segments can now show four. The summarized list can also be printed.

【００６８】The functions of the individual means of the video summarization control program of this embodiment have been described above. As shown in FIG. 1, the merging means may be switched as appropriate to output a video summary, or combined to perform the summarization. An example of a series of steps that performs video summarization with the function of each means specified is described below with reference to the flowchart of FIG. 4.

【００６９】First, the number of video segments after summarization is decided. For instance, to reduce video information composed of 300 video segments to 100 video segments, 200 video segments are deleted. Once the number of video segments after summarization has been decided, the pairs of content description information added to the video segments are compared exhaustively, starting from the smaller segment IDs: for each pair, the similarity is calculated for each item (Who, Where, ...) and the average is taken as the similarity of the pair. When the calculation is finished, the pair with the highest similarity is searched for (step S100).

【００７０】At this point, when several pairs have a high similarity, the pair of video segments at a deeper layer of the tree structure is given a higher merging priority (merging order) (step S101). The pair whose two video segments have the smallest total time length is also given priority (step S102). The balance between these two priorities is set by a configured value that takes both total time length and layer depth into account.
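The tie-breaking of steps S101 and S102 can be sketched in Python as a sort key. The description only says that a configured value balances layer depth against total time length, so this sketch reduces that setting to a hypothetical boolean `depth_first` that decides which criterion is applied first. The `Candidate` class and its field names are likewise assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    pair: tuple            # the two segment IDs
    similarity: float      # similarity of their content descriptions
    depth: int             # layer depth of the pair in the tree
    total_duration: float  # sum of the two segments' time lengths


def pick_merge_candidate(candidates, depth_first: bool = True) -> Candidate:
    """Pick the pair to merge: highest similarity, then the two tie-breakers."""
    def key(c: Candidate):
        if depth_first:
            tie = (-c.depth, c.total_duration)  # deeper first, then shorter
        else:
            tie = (c.total_duration, -c.depth)  # shorter first, then deeper
        return (-c.similarity, *tie)
    return min(candidates, key=key)
```

A weighted score that mixes depth and duration would be another way to model the configured value; the strict ordering used here is only the simplest option.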

【００７１】Once the merge targets have been determined in this way, the video segments of the pair with the highest priority (merging order) are merged. The content description information is simply merged with the smaller segment ID first; the merged content description information is added to the partial video of the video segment that ranks higher in the number of matching characters of the content description information, and the lower-ranked video segment is deleted (step S103).

【００７２】It is then judged whether the desired number of video segments after summarization has been reached (step S104). If it has not been reached (step S104: NO), processing returns to step S100 and the steps up to merging are repeated. When the desired number has been reached (step S104: YES), image segments are further merged using the image features of each partial video (step S105) to obtain the summarized image information. From the image information thus obtained, the front frame of each summarized video segment is extracted, the front frames and their content description information are displayed on the display in list format or printed, and processing ends.

【００７３】It is also possible to impose a limit by applying a user-set threshold to the similarity. In this case, the process is repeated until the similarity of every pair of content description information is at or below the threshold.
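The threshold-based stopping rule can be sketched in Python as a variation of the greedy loop in which the stopping condition is a similarity bound instead of a segment count. The function name and the caller-supplied `similarity` and `merge` functions are assumptions for illustration.

```python
from itertools import combinations


def merge_until_below(segments: dict, similarity, merge, threshold: float) -> dict:
    """Merge the most similar pair until no pair exceeds the threshold."""
    segments = dict(segments)  # work on a copy
    while len(segments) > 1:
        a, b = max(
            combinations(sorted(segments), 2),
            key=lambda pair: similarity(segments[pair[0]], segments[pair[1]]),
        )
        if similarity(segments[a], segments[b]) <= threshold:
            break  # every remaining pair is at or below the threshold
        segments[a] = merge(segments[a], segments[b])
        del segments[b]
    return segments
```

With this rule the number of segments in the summary depends on the material: highly repetitive video collapses to a few segments, while varied video keeps most of them.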

【００７４】Conversely, the number of video segments can likewise be reduced by first applying a summarization method based on image features and then applying the present method, or by using the two alternately in combination. The combination is not limited to summarization using content description information (the present method) and summarization using image features; other methods, for example summarization using video segment time length or audio information, can also be combined. The system may of course consist only of the summarization means of the present method, which uses content description information.

【００７５】[0075]

【発明の効果】[Effects of the Invention] Because the present invention is configured as described above, it provides the following advantageous effects. According to the present invention, the similarity of the content description information added to each video segment is calculated and video segments are merged on that basis, so that the content of the video, rather than its images as in conventional methods, is reliably summarized. Therefore, when grasping the outline of video information, the content of the video can be understood in a short time. For example, when the video summary is output as a list of the still frame of each video segment constituting the summarized video information together with the content description information for that still frame, there is no need to play back and watch the video, and the time cost of that task is reduced. In addition, when publishing an outline of a video over a network such as the Internet, there is no need to download the representative frames and content description information of all video segments, so time cost and network bandwidth cost can be reduced. A highly suitable image summarization method and control program can thus be provided.

[Brief description of drawings]

【図１】FIG. 1 is an explanatory diagram showing the configuration of the control program for image summarization according to the present invention.

【図２】FIG. 2 is an explanatory diagram showing an output example, in list format, before summarization.

【図３】FIG. 3 is an explanatory diagram showing an output example, in list format, after summarization.

【図４】FIG. 4 is a flowchart showing an example of a series of steps for performing video summarization.

【図５】FIG. 5 is a conceptual diagram showing an example of video information having a tree structure.

[Explanation of symbols]

1 Content description information parsing means
11 First merging means
12 Second merging means
13 Layer-based segment merging means
2 Image information acquisition means
3 Video summary output means

Continuation of front page (51) Int.Cl.⁷ Identification code FI Theme code (reference) // H04N 7/24 H04N 7/13 Z

Claims

[Claims]

1. A video summarizing method for summarizing video information composed of a plurality of video segments arranged in a tree, each video segment being formed by adding, to a partial video such as a scene or a cut, content description information consisting of character information expressing the video content of that partial video, the method comprising: a calculating step of calculating a similarity between pieces of the content description information; and a first merging step of merging a plurality of video segments having a high similarity calculated in the calculating step, by leaving the partial video constituting one representative video segment among them, deleting the other partial videos, and adding the content description information to the remaining partial video; wherein, through the calculating step and the first merging step, partial videos are selected to the extent that the content of the video information can be grasped, and a video summary to which the content description information is added is created.

2. The video summarizing method according to claim 1, wherein the content description information consists of a plurality of items, and the calculating step calculates the similarity only for designated items that are a part of the content description information.

3. The video summarizing method according to claim 1, wherein the content description information consists of a plurality of items, each item is given a weight that is the degree to which video summarization is executed, the calculating step calculates the similarity only for the weighted items, and the first merging step determines the merging priority according to the magnitude of the weights and merges video segments based on that priority until the desired number of video segments is reached.

4. The video summarizing method according to claim 1, wherein, when there are a plurality of sets of video segments having a high similarity, the first merging step preferentially merges the set for which the total time length of the partial videos in the set is smallest.

5. The video summarizing method according to any one of claims 1 to 3, wherein, when there are a plurality of sets of video segments having a high similarity, the first merging step preferentially merges the set of video segments located at a deeper layer of the tree structure.

6. The video summarizing method according to any one of claims 1 to 5, wherein the first merging step uses the content description information of a representative video segment among the plurality of video segments to be merged as the content description information of the merged video segment.

7. The video summarizing method wherein the representative video segment is the video segment that is higher in the inclusion relationship among the plurality of video segments having a high similarity calculated in the calculating step.

8. The video summarizing method according to any one of claims 1 to 5, wherein the first merging step merges the content description information of each of the plurality of video segments having a high similarity calculated in the calculating step, and uses the merged content description information as the content description information of the merged video segment.

9. The video summarizing method according to any one of claims 1 to 8, further comprising a second merging step of searching for and merging the set of video segments for which the sum of the video times, which differ for each video segment, is shortest, and using the content description information added to the video segment with the longer video time in that set as the content description information of the merged video segment, wherein the second merging step is combined with, or selected in place of, the first merging step as appropriate so as to obtain an optimum video summary.

10. The video summarizing method according to any one of claims 1 to 9, further comprising a changing step of assigning, to each layer of the tree-shaped video information, a weight that is the degree to which video summarization is executed, and changing the priority of the video segments to be merged according to that weight.

11. The video summarizing method according to claim 1, further comprising a step of merging video segments based on image features, wherein the merging of video segments based on the content description information and the merging of video segments based on image features are combined or selected as appropriate so as to obtain an optimum video summary.

12. The video summarizing method according to any one of claims 1 to 11, further comprising a summary output step of creating and outputting a video summary from a still frame of each video segment constituting the summarized video information and the content description information for that still frame.

13. A control program for video summarization that summarizes video information composed of a plurality of video segments arranged in a tree, each video segment being formed by adding, to a partial video such as a scene or a cut, content description information consisting of character information expressing the video content of that partial video, the control program causing a computer to function as: calculating means for calculating a similarity between pieces of the content description information; and first merging means for merging a plurality of video segments having a high similarity calculated by the calculating means; wherein partial videos are selected to the extent that the content of the video information can be grasped, and a video summary to which the content description information is added is created.

14. The control program for video summarization described above, wherein the content description information is composed of a plurality of items, and the calculating means calculates the similarity only for designated items among the items of the content description information.

15. The control program for video summarization according to claim 13, wherein the content description information is composed of a plurality of items, each item is given a weight indicating the degree to which video summarization is to be performed, the calculating means calculates the similarity only for the items to which a weight is given, and the first merging means determines the priority order for merging according to the magnitude of the weight and merges the video segments in that priority order until the desired number of video segments is reached.
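
One possible reading of the weighted similarity in claim 15 is a weighted mean over only those items that carry a weight. The sketch below assumes that reading; the function name `weighted_similarity`, the dictionary layout, and the use of `difflib.SequenceMatcher` as the per-item measure are illustrative assumptions rather than anything the claim specifies.

```python
from difflib import SequenceMatcher


def weighted_similarity(desc1: dict, desc2: dict, weights: dict) -> float:
    """Weighted mean of per-item similarities; unweighted items are skipped."""
    total = 0.0
    weight_sum = 0.0
    for item, weight in weights.items():
        if weight <= 0 or item not in desc1 or item not in desc2:
            continue  # only items that carry a weight take part
        ratio = SequenceMatcher(None, desc1[item], desc2[item]).ratio()
        total += weight * ratio
        weight_sum += weight
    # With no usable weighted item the pair is treated as dissimilar.
    return total / weight_sum if weight_sum else 0.0
```

Under this reading, raising the weight of an item makes agreement on that item count for more when candidate pairs are ranked for merging.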

16. The control program for video summarization according to any one of claims 13 to 15, wherein, when there are a plurality of sets of video segments having a high degree of similarity, the first merging means preferentially merges the set whose partial videos have the smallest total time length.

17. The control program for video summarization according to any one of claims 13 to 15, wherein, when there are a plurality of sets of video segments having a high degree of similarity, the first merging means preferentially merges the set of video segments located at the deeper layer of the tree structure.
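
Claims 16 and 17 describe two alternative ways of choosing among equally similar candidate sets. The sketch below illustrates both; the `Candidate` record, its field names, and the `rule` argument are hypothetical, and exact equality of similarity values is used to detect ties purely for simplicity.

```python
from typing import NamedTuple


class Candidate(NamedTuple):
    """Hypothetical record for one mergeable set of video segments."""
    name: str
    similarity: float
    total_duration: float  # summed length of the partial videos, in seconds
    depth: int             # layer depth of the set in the tree (root = 0)


def pick_candidate(candidates: list, rule: str) -> Candidate:
    # Keep only the candidates that share the highest similarity.
    best = max(c.similarity for c in candidates)
    tied = [c for c in candidates if c.similarity == best]
    if rule == "shortest":   # claim 16: smallest total duration first
        return min(tied, key=lambda c: c.total_duration)
    if rule == "deepest":    # claim 17: deepest tree layer first
        return max(tied, key=lambda c: c.depth)
    raise ValueError(f"unknown rule: {rule}")
```

Preferring the shortest set removes the least footage per merge, while preferring the deepest set collapses fine-grained cuts before whole scenes.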

18. The control program for video summarization according to any one of claims 13 to 17, wherein the first merging means uses the content description information of a representative video segment among the plurality of merged video segments as the content description information of the merged video segment.

19. The control program for video summarization, wherein the representative video segment is the higher-order video segment in an inclusion relationship among the plurality of video segments for which the similarity calculated by the calculating means is high.

20. The control program for video summarization according to any one of claims 13 to 17, wherein the first merging means merges the respective content description information of the plurality of video segments for which the similarity calculated by the calculating means is high, and uses the merged content description information as the content description information of the merged video segment.
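
Claim 20 merges the descriptions themselves rather than keeping one representative description as in claim 18. A minimal sketch of one way to do this is given below, assuming each description is a dictionary of item name to text. The function name `merged_description` and the `" / "` separator are hypothetical; placing the earlier segment's text first follows the abstract's statement that the younger segment takes precedence.

```python
def merged_description(earlier: dict, later: dict, sep: str = " / ") -> dict:
    """Combine two descriptions item by item, earlier segment first."""
    merged = dict(earlier)
    for item, text in later.items():
        if item not in merged:
            merged[item] = text                       # item only in the later segment
        elif text != merged[item]:
            merged[item] = merged[item] + sep + text  # keep both, earlier first
        # identical text is kept once
    return merged
```

For two segments that agree on who appears but differ in what happens, the merged description keeps the shared name once and lists both actions in time order.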

21. The control program for video summarization according to any one of claims 13 to 20, further causing the computer to function as second merging means for searching for and merging the set of video segments for which the sum of the video times, which differ from video segment to video segment, is shortest, and for using the content description information added to the video segment having the longer video time in the set as the content description information of the merged video segment, wherein the second merging means is appropriately combined with, or selected in place of, the first merging means so as to obtain an optimum video summary.
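
The second merging means of claim 21 works on video time rather than text. The sketch below is illustrative only: the `Clip` class and `merge_shortest_pair` are hypothetical names, and restricting the search to neighbouring clips in a time-ordered list is an assumption the claim does not state.

```python
from dataclasses import dataclass


@dataclass
class Clip:
    """Hypothetical segment: start/end in seconds plus a text description."""
    start: float
    end: float
    description: str

    @property
    def duration(self) -> float:
        return self.end - self.start


def merge_shortest_pair(clips: list) -> list:
    # Nothing to merge with fewer than two clips.
    if len(clips) < 2:
        return list(clips)
    # Assumes clips are time-ordered and only neighbours may be merged.
    i = min(range(len(clips) - 1),
            key=lambda k: clips[k].duration + clips[k + 1].duration)
    a, b = clips[i], clips[i + 1]
    # The longer clip of the pair supplies the merged description.
    longer = a if a.duration >= b.duration else b
    merged = Clip(a.start, b.end, longer.description)
    return clips[:i] + [merged] + clips[i + 2:]
```

Because the shortest pair is absorbed first, brief inserts such as logos or transitions tend to disappear into their longer neighbours before any substantial scene is touched.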

22. The control program for video summarization according to claim 13, further causing the computer to function as changing means for giving each layer of the tree-shaped video information a weight indicating the degree to which video summarization is to be performed, and for changing the priority of the video segments to be merged according to the weight.

23. The control program for video summarization according to any one of claims 13 to 22, further causing the computer to function as means for merging video segments according to image features, wherein the video summary is created by appropriately combining, or selecting between, the merging of video segments according to the content description information and the merging of video segments according to the image features so as to obtain an optimum video summary.

24. The control program for video summarization according to any one of claims 13 to 23, further causing the computer to function as summary output means for creating and outputting a video summary from a still frame of each video segment forming the summarized video information and the content description information for that still frame.