JP2013070158A

JP2013070158A - Video retrieval apparatus and program

Info

Publication number: JP2013070158A
Application number: JP2011206061A
Authority: JP
Inventors: Yusuke Uchida; 祐介内田; Shigeyuki Sakasawa; 茂之酒澤
Original assignee: KDDI Corp
Current assignee: KDDI Corp
Priority date: 2011-09-21
Filing date: 2011-09-21
Publication date: 2013-04-18

Abstract

PROBLEM TO BE SOLVED: To make a query video retrievable, and to perform the retrieval at high speed, even when the query video contains footage cut from part of a reference video along the time axis or footage in which a telop (on-screen caption) or logo has been inserted into the reference video. SOLUTION: A video retrieval apparatus retrieves a query video from reference videos stored in a database 15. The apparatus includes: a video division section 11 that divides the query video into segments of a prescribed time length and selects a predetermined number of frames from each segment; a feature extracting section 12 that extracts, from each selected frame, a feature value representing a feature of the video; a feature quantizing section 13 that converts each extracted feature value into one or more quantization identifiers; and a feature accumulating section 14 that stores the video identifier and time information of the selected frame in the list of an inverted index that corresponds to the quantization identifier.

Description

The present invention relates to a video search apparatus and program that take a query video as input and search for that query video among reference videos stored on a hard disk drive, other media, network storage, or the like.

With the recent spread of broadband and the growing capacity of storage such as HDDs (hard disk drives), DVDs (Digital Versatile Discs), and Blu-ray discs, it has become easy to share and publish digital content over networks without the permission of copyright holders or content providers, and such unauthorized sharing and publication has become a problem. To address this problem, techniques have been proposed that use fingerprints (feature values) of digital content to automatically detect, among many items of digital content, specific content whose free distribution the copyright holder has not permitted.

Patent Document 1 describes content features using three-dimensional frequency analysis and principal component analysis. In this method, frequency analysis along the time axis (FFT) is applied to the coefficients obtained by spatial frequency analysis (DCT), giving a three-dimensional frequency analysis, and feature values are then extracted from the resulting coefficients by principal component analysis. Patent Document 2 uses the feature values of Patent Document 1 to narrow down the specific content that resembles a distributed item of content. When the candidates cannot be narrowed down, it uses phase-only correlation to determine the specific content most similar to the distributed content and decides by a threshold whether the two are the same content.

Non-Patent Document 1 extracts a feature called the color layout from the whole of each video frame and matches multiple frames sequentially, which makes detection possible even after temporal editing such as cutting out part of a video.

Non-Patent Document 2 detects feature points called corners in each video frame, extracts feature values from the neighborhood of each point, and matches the feature points, so that illegally distributed content can be detected even after editing such as cropping.

Patent Document 1: JP 2005-18675 A
Patent Document 2: JP 2006-285907 A

Non-Patent Document 1: E. Kasutani and A. Yamada, "The MPEG-7 color layout descriptor: a compact image feature description for high-speed image/video segment retrieval," in Proc. of ICIP, 2001, pp. 674-677.
Non-Patent Document 2: J. Law-To et al., "Video Copy Detection: A Comparative Study," in Proc. ACM CIVR'07, pp. 371-378, 2007.

However, the methods disclosed in Patent Documents 1 and 2 extract a single feature value from each item of video content, so detection fails after editing along the time axis, for example when the content is split into two. The method disclosed in Non-Patent Document 1 extracts only one feature value from the whole screen, so detection fails after spatial editing such as the insertion of a telop or logo. The method disclosed in Non-Patent Document 2 extracts several tens of feature points from each screen and matches all of them, so extracting and matching the feature points takes too much time.

The present invention has been made in view of these circumstances. Its object is to provide a video search apparatus and program that can search for a query video, and do so at high speed, even when the query video contains footage cut from part of a reference video along the time axis or footage in which a telop or logo has been inserted into the reference video.

(1) To achieve the above object, the present invention takes the following measures. The video search apparatus of the present invention searches for a query video among reference videos stored in a database and comprises: a video dividing unit that divides the reference video and the query video into segments of a fixed time length and selects a predetermined number of frames from each segment; a feature extraction unit that extracts, from the selected frames, feature values representing features of the video; a feature quantization unit that converts each extracted feature value into one or more quantization identifiers; a feature storage unit that, based on the segment information of the reference video and the corresponding set of quantization identifiers, stores the video identifier and time information of the segment in the lists that make up an inverted index; and an inverted index search unit that, based on the segment information of the query video and the corresponding set of quantization identifiers, refers to the inverted index and identifies by voting the video identifier and time information of a reference video that contains all or part of the query video.

In this configuration, the reference video and the query video are divided into segments of a fixed time length, a predetermined number of frames is selected from each segment, feature values are extracted from the selected frames, and each feature value is converted into one or more quantization identifiers. For the reference video, the video identifier and time information of each segment are stored in the inverted index lists that correspond to the segment's quantization identifiers. For the query video, the inverted index is consulted using the segment's quantization identifiers, and voting identifies the video identifier and time information of a reference video that contains all or part of the query video. As a result, the query video can be found even when all or part of it was cut from part of a reference video along the time axis or is a reference video with a telop or logo inserted, and search time is reduced while search accuracy is improved.

(2) In the video search apparatus of the present invention, the feature extraction unit extracts, for each selected frame, a feature value for each of one or more predefined regions; the feature quantization unit converts the feature value extracted for each region into quantization identifiers; and the feature storage unit, based on the quantization identifiers of each region, stores the video identifier and time information of the region in the lists that make up the inverted index corresponding to that region.

Because a feature value is extracted for each of one or more predefined regions of each selected frame, quantized per region, and stored in the inverted index that corresponds to that region, multiple feature values can be extracted from a single frame, which improves search accuracy.

(3) In the video search apparatus of the present invention, the feature quantization unit has a codebook for vector quantization and converts a feature value into a quantization identifier by vector-quantizing it to the representative vector nearest to the feature value.

Because the feature value is converted into a quantization identifier by vector quantization to its nearest representative vector in the codebook, and matching is performed on quantization identifiers, search time can be reduced.

(4) In the video search apparatus of the present invention, the feature quantization unit converts a feature value into quantization identifiers by vector-quantizing it to the k representative vectors (k being a natural number) that are closest to it, taken in order of increasing distance starting from the nearest neighbor.

Vector-quantizing to the k closest representative vectors (k being a natural number) raises the probability that similar feature values share a quantization identifier, which improves search accuracy.

(5) In the video search apparatus of the present invention, the feature quantization unit converts a feature value into a quantization identifier by discretizing, with prescribed thresholds, the values of the inner products between the feature value and predefined basis vectors.

Because the feature value is converted into a quantization identifier by discretizing its inner products with predefined basis vectors at prescribed thresholds, and matching is performed on quantization identifiers, search time can be reduced.

(6) In the video search apparatus of the present invention, when discretizing the inner product, the feature quantization unit discretizes it into one or more values, thereby converting the feature value into one or more quantization identifiers.

Discretizing the inner product into one or more values raises the probability that similar feature values share a quantization identifier, which reduces search time and improves search accuracy.

(7) In the video search apparatus of the present invention, the inverted index search unit refers to the inverted index entries corresponding to the quantization identifiers of each segment of the query video and casts a vote for each pair consisting of the video identifier of a reference video registered in the inverted index and the offset between the time information of the reference-video segment and the time information of the query-video segment. It takes as detection candidates those pairs of video identifier and offset whose vote count is at least a fixed value and is a local maximum within a fixed range.

Because votes are cast on pairs of reference-video identifier and time offset found through the inverted index, and only pairs whose vote count is at least a fixed value and locally maximal within a fixed range become detection candidates, search time can be reduced.
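The offset voting described in (7) can be sketched as follows. This is a minimal illustration under stated assumptions, not the patent's implementation: the index is assumed to be a plain dict from quantization identifier to a list of (video identifier, segment time) pairs, and the names `vote`, `candidates`, `min_votes`, and `window` are hypothetical.

```python
from collections import Counter

def vote(index, query_segments):
    """Cast one vote per (video_id, offset) pair, where offset = t_ref - t_query.

    index: dict mapping quantization id -> list of (video_id, t_ref)  (assumed layout)
    query_segments: iterable of (t_query, quantization_ids)
    """
    votes = Counter()
    for t_query, qids in query_segments:
        for qid in set(qids):  # each id counts once per query segment
            for video_id, t_ref in index.get(qid, ()):
                votes[(video_id, t_ref - t_query)] += 1
    return votes

def candidates(votes, min_votes, window=1):
    """Keep pairs with enough votes that are also a local maximum
    among offsets within +/- window for the same video."""
    result = []
    for (video_id, offset), count in votes.items():
        if count < min_votes:
            continue
        neighbours = [votes.get((video_id, offset + d), 0)
                      for d in range(-window, window + 1) if d != 0]
        if all(count >= n for n in neighbours):
            result.append((video_id, offset, count))
    return sorted(result)

# Usage: three query segments that all align with video "v1" at offset 10.
index = {3: [("v1", 10), ("v2", 0)], 7: [("v1", 11)], 9: [("v1", 12)]}
query = [(0, [3]), (1, [7]), (2, [9])]
votes = vote(index, query)
print(candidates(votes, min_votes=2))
```

A consistent temporal alignment concentrates votes on a single offset, whereas chance matches scatter across many offsets and stay below the vote threshold.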

(8) In the video search apparatus of the present invention, for each quantization identifier id assigned to a segment of the query video, let v be the video identifier of a reference video registered in the inverted index, t' the time information of the reference-video segment, and t the time information of the query-video segment. At voting time, the inverted index search unit records the (t, id) information for each combination of v and offset value (t' - t). For each detection candidate, it refers to the list of (t, id) entries, splits the list wherever the t values of adjacent entries are at least a fixed distance apart, calculates a score for each resulting sub-list, normalizes that score using the value obtained by subtracting the smallest t in the sub-list from the largest t, and takes the normalized value as the detection result.

Because the (t, id) information is recorded per v and offset value (t' - t) at voting time, and each candidate's list is split at large gaps in t, scored per sub-list, and normalized by the time span of the sub-list (largest t minus smallest t), search accuracy can be improved.
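The split-and-normalize step of (8), with the count-based score of (9), might look like the sketch below. The function name `score_candidate` and the parameter `max_gap` are hypothetical. One deliberate deviation is labeled in the code: the text normalizes by (largest t minus smallest t), which is zero for a sub-list covering a single segment, so this sketch assumes a span of (t_max - t_min + 1) to avoid division by zero.

```python
def score_candidate(matches, max_gap=3):
    """Score the (t, id) entries recorded for one (video, offset) candidate.

    Returns a list of (t_min, t_max, normalized_score), one per sub-list.
    """
    ordered = sorted(matches)
    if not ordered:
        return []
    # Split wherever adjacent entries are at least max_gap apart in t.
    runs = [[ordered[0]]]
    for prev, cur in zip(ordered, ordered[1:]):
        if cur[0] - prev[0] >= max_gap:
            runs.append([cur])
        else:
            runs[-1].append(cur)
    results = []
    for run in runs:
        t_min, t_max = run[0][0], run[-1][0]
        raw_score = len(run)       # score = number of entries, as in (9)
        span = t_max - t_min + 1   # ASSUMPTION: +1 guards against a zero span
        results.append((t_min, t_max, raw_score / span))
    return results

# Usage: two matching stretches separated by a long gap in query time.
print(score_candidate([(0, 5), (0, 6), (1, 7), (9, 1)], max_gap=3))
```

Normalizing by the span keeps a long query from outscoring a short one merely because it contributes more segments.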

(9) In the video search apparatus of the present invention, the inverted index search unit takes the number of elements in a (t, id) list as the score of that list.

(10) In the video search apparatus of the present invention, the inverted index search unit takes as the score of a (t, id) list the sum, over the elements of the list, of the importance associated with each id, where the importance is the logarithm of the number of segments stored in the inverted index divided by the size of the id-th list of the inverted index.

Because the score of a (t, id) list is the sum of the importance values of its ids, and the importance is the logarithm of the number of stored segments divided by the size of the id-th list of the inverted index, quantization identifiers that appear in many segments receive low importance and identifiers specific to particular segments receive high importance, which improves search accuracy.
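The importance weighting of (10) has the same form as an inverse document frequency. A small sketch, assuming the inverted index is a dict from quantization identifier to its posting list and that the total number of stored segments is known; `importance_weights` and `weighted_score` are hypothetical names.

```python
import math

def importance_weights(index, num_segments):
    """importance(id) = log(num_segments / size of the id-th list)."""
    return {qid: math.log(num_segments / len(postings))
            for qid, postings in index.items() if postings}

def weighted_score(matches, weights):
    """Sum the importance of the id of every (t, id) entry in a list."""
    return sum(weights.get(qid, 0.0) for _t, qid in matches)

# Usage: id 1 occurs in one of four segments, id 2 occurs in all four.
index = {1: [("v", 0)], 2: [("v", 0), ("v", 1), ("v", 2), ("v", 3)]}
weights = importance_weights(index, num_segments=4)
print(weights)                                    # id 2 gets weight 0.0
print(weighted_score([(0, 1), (1, 2)], weights))  # only id 1 contributes
```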

(11) In the video search apparatus of the present invention, when the same id appears for adjacent values of t, the inverted index search unit adds its score only once.

When the score of a sub-list is calculated and the same (r, id) pair, that is, the same region and quantization identifier, appears for adjacent values of t, the score is added only once. This reduces false detections caused by quantization identifiers of unrelated features happening to match repeatedly, especially in regions with little motion, and improves search accuracy.

(12) The program of the present invention searches for a query video among reference videos stored in a database and causes a computer to execute a series of processes comprising: dividing the reference video and the query video into segments of a fixed time length and selecting a predetermined number of frames from each segment; extracting, from the selected frames, feature values representing features of the video; converting each extracted feature value into one or more quantization identifiers; storing, based on the segment information of the reference video and the corresponding set of quantization identifiers, the video identifier and time information of the segment in the lists that make up an inverted index; and referring to the inverted index based on the segment information of the query video and the corresponding set of quantization identifiers, and identifying by voting the video identifier and time information of a reference video that contains all or part of the query video.

(13) The program of the present invention further includes: extracting, for each selected frame, a feature value for each of one or more predefined regions; converting the feature value extracted for each region into quantization identifiers; and storing, based on the quantization identifiers of each region, the video identifier and time information of the region in the lists that make up the inverted index corresponding to that region.

Because a feature value is extracted for each of one or more predefined regions of each selected frame, quantized per region, and stored in the inverted index that corresponds to that region, multiple feature values can be extracted from a single frame, which improves search accuracy.

According to the present invention, the query video is divided into segments of a fixed time length, a predetermined number of frames is selected from each segment, feature values representing features of the video are extracted from the selected frames, each extracted feature value is converted into one or more quantization identifiers, and the video identifier and time information of the selected frames are stored in the inverted index lists corresponding to those quantization identifiers. As a result, the query video can be found even when all or part of it was cut from part of a reference video along the time axis or is a reference video with a telop or logo inserted, and search time is reduced while search accuracy is improved.

FIG. 1 is a block diagram of a video search apparatus according to an embodiment of the present invention.
FIG. 2 is a diagram showing how a video is divided.
FIG. 3A is a diagram showing an example of setting the rectangular regions R^i.
FIG. 3B is a diagram showing an example of setting the rectangular regions R^i.
FIG. 3C is a diagram showing an example of setting the rectangular regions R^i.
FIG. 4 is a diagram showing the feature extraction method.
FIG. 5 is a diagram showing an example of soft assignment.
FIG. 6 is a diagram showing an example of an inverted index.
FIG. 7 is a diagram showing how voting is performed.

Embodiments of the present invention are described below with reference to the drawings. In this embodiment, the copyrighted content to be detected is input in advance as reference video, feature values are extracted and quantized, and a database is built. A query video that is input afterwards is then used to search the reference videos.

FIG. 1 is a block diagram of a video search apparatus according to an embodiment of the present invention. As shown in FIG. 1, the video search apparatus 10 comprises a video dividing unit 11, a feature extraction unit 12, a feature quantization unit 13, a feature storage unit 14, a database 15, and a database search unit 16. These components are connected to a control bus 17 and can exchange signals with one another.

The video dividing unit 11 divides the reference video into blocks. FIG. 2 shows how the division is done. The division has a spatial part and a temporal part. The spatial division is defined by rectangles of arbitrary shape, as shown in FIG. 3. The example in FIG. 2 adopts the division of FIG. 3B and shows the block corresponding to rectangle R^4. FIG. 3A treats the whole screen as a single rectangular region; this setting is used when insertion of captions or logos is not expected. FIG. 3B is the setting used when the location of editing is not known in advance. FIG. 3C is the setting used when editing such as subtitles is expected at the bottom or the right of the screen; editing applied to one of these areas does not affect the rectangular region covering the other.

The temporal division is done by cutting the sequence of frames at fixed time intervals (for example, every 0.5 seconds). The block that belongs to the rectangle with rectangle ID r and is the t-th block in the time direction is written B_{r,t}. If a 30 fps video is cut every 0.5 seconds, each block contains 15 frames.
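The temporal division can be illustrated with a short sketch. It assumes frames are addressed by index and that a trailing partial segment is kept; `split_into_segments` and its parameter names are hypothetical.

```python
def split_into_segments(num_frames, fps=30.0, segment_seconds=0.5):
    """Return (t, first_frame, end_frame_exclusive) for each temporal segment."""
    frames_per_segment = int(round(fps * segment_seconds))
    if frames_per_segment <= 0:
        raise ValueError("segment must contain at least one frame")
    segments = []
    for t, start in enumerate(range(0, num_frames, frames_per_segment)):
        end = min(start + frames_per_segment, num_frames)  # last segment may be short
        segments.append((t, start, end))
    return segments

# Usage: 30 fps cut every 0.5 s gives 15 frames per segment.
print(split_into_segments(45))
```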

The feature extraction unit 12 extracts feature values from each block B_{r,t} obtained by the video dividing unit 11. Feature vectors may simply be extracted from every frame in B_{r,t}; when the number of extracted feature values needs to be limited, the frames may instead be sampled at regular intervals before extraction. As the feature, a method based on DCT coefficients can be used, for example, or the scale-invariant feature transform (SIFT), which is widely used to describe local feature regions.

Let F_{r,t} denote the set of feature values extracted from B_{r,t}. The extraction method when the AC (alternating current) components of the DCT coefficients are used as the feature is as follows. First, the partial frame is reduced to 8x8 pixels and a discrete cosine transform (DCT) is applied. From the resulting DCT coefficients, M AC components are taken in zigzag scan order, giving an M-dimensional feature value.
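A dependency-free sketch of this extraction follows. It assumes the partial frame has already been reduced to an 8x8 grid of luminance values (the downscaling step is omitted), uses an orthonormal DCT-II, and assumes the JPEG-style zigzag order; the function names are hypothetical.

```python
import math

def dct_2d(block):
    """Orthonormal 2-D DCT-II of a square block given as a list of rows."""
    n = len(block)
    def alpha(k):
        return math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
    out = [[0.0] * n for _ in range(n)]
    for u in range(n):
        for v in range(n):
            s = 0.0
            for x in range(n):
                for y in range(n):
                    s += (block[x][y]
                          * math.cos((2 * x + 1) * u * math.pi / (2 * n))
                          * math.cos((2 * y + 1) * v * math.pi / (2 * n)))
            out[u][v] = alpha(u) * alpha(v) * s
    return out

def zigzag_order(n):
    """(row, col) positions in JPEG-style zigzag order (assumed scan order)."""
    return sorted(((u, v) for u in range(n) for v in range(n)),
                  key=lambda p: (p[0] + p[1],
                                 p[0] if (p[0] + p[1]) % 2 else p[1]))

def ac_features(block8x8, m):
    """First m AC coefficients in zigzag order; the DC term at (0, 0) is skipped."""
    coeffs = dct_2d(block8x8)
    return [coeffs[u][v] for u, v in zigzag_order(8)[1:m + 1]]

# Usage: a horizontal luminance ramp puts its energy in the first AC term.
ramp = [[float(y) for y in range(8)] for _ in range(8)]
features = ac_features(ramp, 3)
print(features)
```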

The feature quantization unit 13 quantizes each feature value in each feature set F_{r,t} extracted by the feature extraction unit 12 to a value from 0 to N-1 and creates the set of quantization IDs W_{r,t} (integer values). The number of IDs in W_{r,t} is either capped by a predetermined maximum or determined by the number of soft assignments made during quantization.

The following describes how W_{r,t} is created when the AC components of the DCT coefficients are used as the feature. Each dimension of the M-dimensional feature value is binarized to 1 if it is positive and 0 if it is negative. The M-dimensional feature value is thereby converted into an M-bit bit string. There are 2^M such bit strings, and the decimal value of the bit string, from 0 to 2^M - 1, is the quantized value (N = 2^M). This procedure yields W_{r,t}. Because a small change in the feature value can change the quantization ID, soft assignment is used as well. Soft assignment can be performed, for example, with the technique described in the following reference.
J. Philbin, O. Chum, M. Isard, J. Sivic, and A. Zisserman, "Lost in quantization: Improving particular object retrieval in large scale image databases," in Proc. of CVPR, 2008.
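The sign binarization can be sketched in a few lines. Two details are assumptions, since the text does not specify them: a value of exactly zero maps to bit 0, and the first dimension becomes the most significant bit. `binarize_to_id` is a hypothetical name.

```python
def binarize_to_id(feature):
    """Map an M-dimensional feature to an integer in [0, 2**M - 1] by sign."""
    qid = 0
    for value in feature:
        bit = 1 if value > 0 else 0  # ASSUMPTION: zero is treated as 0
        qid = (qid << 1) | bit       # ASSUMPTION: first dimension is the MSB
    return qid

# Usage: signs (+, -, +) give the bit string 101.
print(binarize_to_id([0.5, -1.2, 3.0]))
```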

FIG. 5 shows an example of soft assignment. Soft assignment means assigning a feature value to several quantization IDs at quantization time rather than to a single ID. In the present invention, for each feature value f in F_{r,t}, the K dimensions with the smallest absolute values are selected, and f is also assigned to every ID obtained by freely inverting the bits of those K dimensions, so that it is assigned to 2^K IDs in total. When a maximum number of IDs is specified, IDs obtained by inverting bits in order of increasing absolute value are assigned repeatedly until the maximum number of IDs is reached.
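A sketch of this bit-flip soft assignment, under the same assumptions about bit order and zero handling as any sign binarization (zero maps to 0, first dimension is the most significant bit). The enumeration order and the handling of `max_ids` by simple truncation are also assumptions; `soft_assign` is a hypothetical name.

```python
def soft_assign(feature, k, max_ids=None):
    """Return the base ID plus every ID reachable by flipping any subset of
    the bits of the k dimensions with the smallest absolute value."""
    m = len(feature)
    base = 0
    for value in feature:
        base = (base << 1) | (1 if value > 0 else 0)
    # Dimensions closest to zero are the ones most likely to change sign.
    weakest = sorted(range(m), key=lambda i: abs(feature[i]))[:k]
    ids = []
    for mask in range(1 << len(weakest)):  # 2**k subsets; mask 0 is the base ID
        qid = base
        for j, dim in enumerate(weakest):
            if (mask >> j) & 1:
                qid ^= 1 << (m - 1 - dim)
        ids.append(qid)
    # ASSUMPTION: a cap on the number of IDs is applied by truncation.
    return ids if max_ids is None else ids[:max_ids]

# Usage: the two weakest dimensions (-0.1 and 0.2) are flipped in all combinations.
print(soft_assign([0.5, -0.1, 3.0, 0.2], k=2))
```

Flipping only the least reliable bits keeps the number of extra index lookups at 2^K while covering the sign changes most likely to occur after re-encoding or light editing.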

Vector quantization may also be used to quantize general feature vectors. k-means clustering is performed in advance on a set of feature amounts to obtain N representative vectors. Each feature amount in Fr,t is assigned to its nearest representative vector, and the ID of that representative vector is used as the quantization ID. Soft assignment can also be realized by assigning the feature amount to its K nearest representative vectors instead of only the nearest one.
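
The assignment step of this vector quantization might look like the sketch below. It assumes the codebook of representative vectors has already been obtained (the k-means step is not shown), uses squared Euclidean distance as the distance measure, and the name `nearest_codeword_ids` is hypothetical.

```python
def nearest_codeword_ids(feature, codebook, k=1):
    """Return the indices of the k representative vectors closest to the feature."""
    def squared_distance(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    ranked = sorted(range(len(codebook)),
                    key=lambda i: squared_distance(feature, codebook[i]))
    return ranked[:k]


codebook = [[0.0, 0.0], [1.0, 1.0], [4.0, 4.0]]  # toy codebook, N = 3
print(nearest_codeword_ids([0.9, 0.8], codebook))       # [1]  (hard assignment)
print(nearest_codeword_ids([0.9, 0.8], codebook, k=2))  # [1, 0]  (soft assignment)
```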

Wr,t is created as described above. Duplicate quantization IDs are not allowed within Wr,t.

The feature amount storage unit 14 stores the set of quantization IDs Wr,t in the inverted index shown in FIG. 6. A separate inverted index is prepared for each rectangle. As shown in FIG. 6, for every ID in Wr,t, the information of block Br,t is registered in the corresponding list. The registered information consists of the video ID of the reference video and the time information (t) of the block. This registration is performed for all r and t. The feature amount storage unit 14 stores the registered information in the database 15.
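
As an illustration of this registration step, the sketch below keeps one index per rectangle r and appends (video ID, t) to the list of every quantization ID in a block. The in-memory dictionary layout and the name `build_inverted_indexes` are assumptions; the description does not state how the database 15 stores the lists.

```python
from collections import defaultdict


def build_inverted_indexes(reference_blocks):
    """reference_blocks: iterable of (video_id, r, t, quantization_ids)."""
    # One index per rectangle r; each maps a quantization ID to a posting list.
    indexes = defaultdict(lambda: defaultdict(list))
    for video_id, r, t, quantization_ids in reference_blocks:
        for qid in set(quantization_ids):  # no duplicate IDs within one block
            indexes[r][qid].append((video_id, t))
    return indexes


blocks = [("v1", 0, 0, [10, 14]), ("v1", 0, 1, [10]), ("v2", 0, 0, [14])]
indexes = build_inverted_indexes(blocks)
print(indexes[0][10])  # [('v1', 0), ('v1', 1)]
print(indexes[0][14])  # [('v1', 0), ('v2', 0)]
```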

The database search unit 16, in turn, searches the reference videos for the query video. For the search, the query video is divided into blocks, feature amounts are extracted, and quantization is performed, in the same way as when the reference videos were stored. The block division method must be the same as the one used when the reference videos were stored. However, the search can be made faster by reducing the number of feature amounts extracted or by reducing the number of IDs used for soft assignment during quantization.

For each Wr,t extracted from the query video, and for every quantization ID in Wr,t, the database search unit 16 refers to the corresponding list in the inverted index and obtains the list of corresponding block IDs. If a block ID in the list is t', a vote is cast for the corresponding offset t' - t. At the same time, the voting information (r, t, id) is saved for each offset value t' - t, where id is the quantization identifier. The information is saved so that it remains sorted in ascending order of t.
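
The voting step can be sketched as follows. The function name `vote` and the dictionary layout of `indexes` (rectangle r, then quantization ID, then a list of (video ID, t') pairs) are assumptions made for this example. Votes are keyed here by both the video ID and the offset, which follows the wording of the claims.

```python
from collections import defaultdict


def vote(query_blocks, indexes):
    """query_blocks: iterable of (r, t, quantization_ids).
    indexes[r][id] is a list of (video_id, t_ref) postings."""
    votes = defaultdict(list)  # (video_id, offset) -> [(r, t, id), ...]
    # Visiting query blocks in ascending t keeps every vote list sorted by t.
    for r, t, quantization_ids in sorted(query_blocks, key=lambda b: b[1]):
        for qid in quantization_ids:
            for video_id, t_ref in indexes.get(r, {}).get(qid, []):
                votes[(video_id, t_ref - t)].append((r, t, qid))
    return votes


indexes = {0: {10: [("v1", 5), ("v1", 6)], 14: [("v1", 5)]}}
query = [(0, 0, [10, 14]), (0, 1, [10])]
votes = vote(query, indexes)
print(votes[("v1", 5)])  # [(0, 0, 10), (0, 0, 14), (0, 1, 10)]
```

The number of entries stored under a key is the simplest form of that key's vote score.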

FIG. 7 shows how voting proceeds. When voting has finished for all Wr,t, a list is obtained of the offset values whose scores are at or above a threshold and are local maxima. The following processing is then performed for each of these offset values.
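
One possible reading of this selection step is sketched below. The description does not fix the neighbourhood used to decide whether a score is a local maximum, so comparing against the offsets immediately before and after (plus or minus 1) is an assumption, as is the name `peak_offsets`. Under this reading, tied neighbouring offsets are all kept.

```python
def peak_offsets(scores, threshold):
    """scores: dict mapping offset -> vote score.
    Returns the offsets whose score is >= threshold and a local maximum."""
    peaks = []
    for offset, score in scores.items():
        left = scores.get(offset - 1, 0)
        right = scores.get(offset + 1, 0)
        if score >= threshold and score >= left and score >= right:
            peaks.append(offset)
    return sorted(peaks)


scores = {3: 2, 4: 9, 5: 4, 10: 6, 11: 1}
print(peak_offsets(scores, threshold=5))  # [4, 10]
```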

For the offset value t' - t, the corresponding list of (r, t, id) entries is obtained. As described above, this list is sorted in ascending order of t. The list is split into partial lists at every point where adjacent values of t are separated by at least a certain interval. A score is then calculated for each partial list. In the simplest case, the score is the number of elements in the partial list. Instead of counting elements, each element may be given a weight based on its id. The Inverse Document Frequency (IDF) used in document retrieval can be used as this weight.
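
The splitting into partial lists can be illustrated as follows. The name `split_at_gaps` and the parameter `gap_threshold` are hypothetical; the description only says that the list is split where adjacent t values are at least a certain interval apart.

```python
def split_at_gaps(entries, gap_threshold):
    """entries: list of (r, t, id) sorted by t.
    Start a new partial list whenever consecutive t differ by >= gap_threshold."""
    partial_lists = []
    for entry in entries:
        if partial_lists and entry[1] - partial_lists[-1][-1][1] < gap_threshold:
            partial_lists[-1].append(entry)
        else:
            partial_lists.append([entry])
    return partial_lists


entries = [(0, 0, 10), (0, 1, 10), (0, 2, 14), (0, 10, 10), (0, 11, 14)]
parts = split_at_gaps(entries, gap_threshold=5)
print([len(p) for p in parts])  # [3, 2]  (simplest score: element count)
```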

In the present invention, the IDF can be obtained as IDF = log(D / D'), where D is the number of stored video segments and D' is the size of the id-th list of the inverted index. Furthermore, the score may be normalized according to the difference between the maximum t and the minimum t in the partial list (that is, the length of the detected copy region), so that the score becomes smaller as this difference becomes larger. In the simplest case, the score is divided by the value obtained by subtracting the minimum t from the maximum t.
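
A sketch of IDF-weighted scoring with length normalization follows. It takes the IDF as the logarithm of the number of stored segments divided by the list size, which is the form stated in the claims. The names `idf` and `normalized_score` are hypothetical, and returning the raw score when the partial list spans a single t (length zero) is an assumption, since the description does not cover that case.

```python
import math


def idf(total_segments, list_size):
    """log(D / D'): rarer quantization IDs receive larger weights."""
    return math.log(total_segments / list_size)


def normalized_score(partial_list, list_sizes, total_segments):
    """partial_list: [(r, t, id)] sorted by t; list_sizes: id -> D'."""
    raw = sum(idf(total_segments, list_sizes[qid]) for _, _, qid in partial_list)
    length = partial_list[-1][1] - partial_list[0][1]  # max t - min t
    return raw / length if length > 0 else raw


list_sizes = {10: 10, 14: 100}
partial = [(0, 0, 10), (0, 1, 14), (0, 2, 10)]
print(normalized_score(partial, list_sizes, total_segments=1000))
```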

When the score of a partial list is calculated, the score may be added only once if the same (r, id) appears for adjacent values of t.
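
The following sketch shows one way to count a repeated (r, id) only once. It assumes that "adjacent" means block times that differ by at most 1 and uses a plain element count as the score; both choices, and the name `count_without_adjacent_repeats`, are illustrative.

```python
def count_without_adjacent_repeats(partial_list):
    """partial_list: [(r, t, id)] sorted by t.
    An (r, id) pair that recurs at adjacent t values is counted only once."""
    last_t = {}  # (r, id) -> most recent t at which the pair was seen
    score = 0
    for r, t, qid in partial_list:
        previous = last_t.get((r, qid))
        if previous is None or t - previous > 1:
            score += 1
        last_t[(r, qid)] = t
    return score


partial = [(0, 0, 10), (0, 1, 10), (0, 2, 10), (0, 2, 14), (0, 5, 10)]
print(count_without_adjacent_repeats(partial))  # 3
```

Such a rule keeps a long static scene, which produces the same ID in many consecutive blocks, from dominating the score.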

After this processing has been performed for all offsets, the partial lists are sorted by score, and the top-ranked results are returned as the search results.

As described above, according to this embodiment, the query video is divided into segments of a fixed time length, a predetermined number of frames is selected from each segment, feature amounts representing the characteristics of the video are extracted from the selected frames, the extracted feature amounts are converted into one or more quantization identifiers, and the video identifier and time information of the selected frames are stored in the lists that make up the inverted index corresponding to those quantization identifiers. Consequently, even when all or part of the query video is a portion cut out of the reference video along the time axis, or is the reference video with telops or logos inserted, the query video can be retrieved, the search time can be shortened, and the search accuracy can be improved.

Description of Symbols
10 Video search device
11 Video dividing unit
12 Feature amount extraction unit
13 Feature amount quantization unit
14 Feature amount storage unit
15 Database
16 Database search unit
17 Control bus

Claims

1. A video search device for searching reference videos stored in a database for a query video, comprising:
a video dividing unit that divides the reference video and the query video into segments of a predetermined time length and selects a predetermined number of frames from each segment;
a feature amount extraction unit that extracts, from the selected frames, a feature amount indicating a feature of the video;
a feature amount quantization unit that converts the extracted feature amount into one or more quantization identifiers;
a feature amount storage unit that, based on the set of quantization identifiers corresponding to the segment information of the reference video, stores the video identifier and time information of the segment in lists constituting an inverted index; and
an inverted index search unit that refers to the inverted index based on the set of quantization identifiers corresponding to the segment information of the query video and identifies, by voting, the video identifier and time information of the reference video that contains all or part of the query video.

2. The video search device described above, wherein:
the feature amount extraction unit extracts, for each of the selected frames, a feature amount for each of one or more predefined regions;
the feature amount quantization unit converts the feature amount extracted for each region into a quantization identifier; and
the feature amount storage unit stores, based on the quantization identifier of each region, the video identifier and time information of the region in lists constituting an inverted index corresponding to that region.

3. The video search device according to claim 1 or 2, wherein the feature amount quantization unit has a codebook for vector quantization and converts the feature amount into a quantization identifier by vector-quantizing it to the representative vector nearest to the feature amount.

4. The video search device described above, wherein the feature amount quantization unit converts the feature amount into quantization identifiers by vector-quantizing it to k representative vectors (k being a natural number), taken in order from the nearest.

5. The video search device according to claim 2, wherein the feature amount quantization unit converts the feature amount into a quantization identifier by discretizing, with a prescribed threshold, the value of the inner product between the feature amount and a predefined basis.

6. The video search device according to claim 5, wherein the feature amount quantization unit converts the feature amount into quantization identifiers by discretizing the inner product into one or more values.

7. The video search device according to claim 1, wherein the inverted index search unit refers to the inverted index corresponding to the quantization identifier of each segment of the query video, casts a vote for each combination of the video identifier of the reference video registered in the inverted index and the offset value between the time information of the segment of the reference video and the time information of the segment of the query video, and takes as detection candidates the video identifiers and offset values of the reference video whose number of votes is a maximum within a certain range.

8. The video search device according to claim 7, wherein,
for each quantization identifier id assigned to each segment of the query video,
with v being the video identifier of the reference video registered in the inverted index,
t' being the time information of the segment of the reference video, and
t being the time information of the segment of the query video,
the inverted index search unit records the (t, id) information at voting time for each combination of v and the offset value (t' - t), refers to the list of (t, id) for each of the detection candidates, splits the list at each point where the t values of adjacent elements are separated by at least a certain interval, calculates a score for each of the resulting lists, normalizes the calculated score using the value obtained by subtracting the minimum t from the maximum t in the list, and uses the normalized value as the detection result.

9. The video search device according to claim 8, wherein the inverted index search unit uses the number of elements in the list as the score of the list of (t, id).

10. The video search device according to claim 8, wherein the inverted index search unit uses, as the score of the list of (t, id), the sum of the importance values associated with the id of each element in the list, the importance value being the logarithm of the number of segments stored in the inverted index divided by the size of the id-th list of the inverted index.

11. The video search device according to claim 10, wherein the inverted index search unit adds the score only once when the same id appears for adjacent values of t.

12. A program for searching reference videos stored in a database for a query video, the program causing a computer to execute a series of processes comprising:
a process of dividing the reference video and the query video into segments of a fixed time length and selecting a predetermined number of frames from each segment;
a process of extracting, from the selected frames, a feature amount indicating a feature of the video;
a process of converting the extracted feature amount into one or more quantization identifiers;
a process of storing, based on the set of quantization identifiers corresponding to the segment information of the reference video, the video identifier and time information of the segment in lists constituting an inverted index; and
a process of referring to the inverted index based on the set of quantization identifiers corresponding to the segment information of the query video and identifying, by voting, the video identifier and time information of the reference video that contains all or part of the query video.

13. The program according to claim 12, further comprising:
a process of extracting, for each of the selected frames, a feature amount for each of one or more predefined regions;
a process of converting the feature amount extracted for each region into a quantization identifier; and
a process of storing, based on the quantization identifier of each region, the video identifier and time information of the region in lists constituting an inverted index corresponding to that region.