JP5306940B2

JP5306940B2 - Moving image content evaluation apparatus and computer program

Info

Publication number: JP5306940B2
Application number: JP2009186573A
Authority: JP
Inventors: 一晃小峯; 寿哉森田; 俊晃上向
Original assignee: KDDI Corp; Japan Broadcasting Corp
Current assignee: KDDI Corp; Japan Broadcasting Corp
Priority date: 2009-08-11
Filing date: 2009-08-11
Publication date: 2013-10-02
Anticipated expiration: 2029-08-11
Also published as: JP2011039778A

Abstract

PROBLEM TO BE SOLVED: To provide a moving image content evaluation device that easily and objectively evaluates moving image content by using point-of-gaze map data and saliency map data. SOLUTION: A point-of-gaze data analysis unit obtains a visual acuity distribution (point-of-gaze map data) from point-of-gaze data that includes point-of-gaze coordinate values for image content. An image analysis unit calculates saliency map data for a plurality of visual attributes by applying image analysis parameters. A comparison processing unit calculates the degree of matching between the point-of-gaze map data and the saliency map data. A parameter determination unit determines an optimum evaluation parameter based on the degree of matching. Moving image content to be evaluated, for which no point-of-gaze map data exists, is analyzed using learning moving image content that has point-of-gaze map data, saliency map data, and evaluation parameters, and the point-of-gaze distribution of an observer is thereby estimated. COPYRIGHT: (C)2011,JPO&INPIT

Description

The present invention relates to a moving image content evaluation apparatus and a computer program for evaluating moving image content.

Content for broadcasting, content provided through network distribution services, and content such as video advertisements shown on information display devices installed in public facilities are all visually designed at production time so as to attract viewers' attention. For example, directors and camera operators, who are the content creators, often decide on compositions and camera work intended to alert viewers or draw their attention, relying on know-how gained through experience and on specialized video production techniques. For this reason, creators want to know objectively how viewers actually look at their content and to apply that knowledge to production.

When a person looks at something, the line of sight moves and is directed toward whatever attracts attention or interest in the visual field. Various studies have therefore made use of eye movement, a human biological response. For example, a known technique captures the eye movement of an observer watching a moving image on a screen with a camera, measures the movement of the line of sight from the captured images, and determines which position in the moving image on the screen (the gazing point) the observer's line of sight is directed to. Techniques that use this approach to evaluate video content based on measurements of the observer's gaze movement have been proposed (see, for example, Patent Document 1 and Patent Document 2).

The video content evaluation technique described in Patent Document 1 images and analyzes the movement of a subject's eyeballs and presents the resulting eye movement data in synchronization with video playback. Specifically, the document discloses a video content evaluation apparatus that calculates, for each frame image of the video, the number of blinks, change in pupil diameter, reaction time, eye movement speed, fixation duration, number of fixations, and fixation position from the state of the subject's eyeballs captured by an infrared camera. It also discloses visualizing these calculation results as graphs or the like.

The image evaluation apparatus described in Patent Document 2 obtains, for each frame image of moving image data, the distribution of gazing points as a probability density function based on gazing point coordinate data measured for a plurality of viewers, and further calculates the entropy of the moving image as a whole. Using these groups of gazing points together, it calculates the probability of where and how much the lines of sight gather in a frame image as a degree of concentration for that frame image, and evaluates the content accordingly. A contour map showing the distribution of the degree of concentration can also be superimposed on the frame image to visualize where in the frame image the gazing points gather.

Separately, a technique is known that uses a saliency map to estimate the distribution of how readily attention is drawn when a viewer looks at an image (see, for example, Non-Patent Document 1). Image evaluation based on a saliency map requires no visual evaluation experiment with subjects; how readily an image draws attention can be evaluated simply by analyzing the physical features of the image data.

A saliency map can be obtained, for example, by a two-step process consisting of a feature map generation step and a feature map synthesis step. In the feature map generation step, image analysis for one or more visual attributes is performed on a frame image to generate feature maps. For example, six visual attributes can be used: color, intensity, orientation, contrast, flicker, and motion. In that case, the feature map generation step produces six feature maps. In the feature map synthesis step, the saliency map is obtained by calculating a weighted linear sum of the feature maps generated for the individual visual attributes.
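The feature map synthesis step can be illustrated with a minimal sketch. It assumes that each feature map is a plain H x W matrix of scalars and that there is one weight per visual attribute; the function name `combine_feature_maps` and the toy values are hypothetical and are not taken from the source.

```python
from typing import Dict, List

Matrix = List[List[float]]  # H rows x W columns of scalar values


def combine_feature_maps(feature_maps: Dict[str, Matrix],
                         weights: Dict[str, float]) -> Matrix:
    """Return the weighted linear sum of same-sized feature maps."""
    names = list(feature_maps)
    height = len(feature_maps[names[0]])
    width = len(feature_maps[names[0]][0])
    saliency = [[0.0] * width for _ in range(height)]
    for name in names:
        weight = weights[name]
        fmap = feature_maps[name]
        for y in range(height):
            for x in range(width):
                saliency[y][x] += weight * fmap[y][x]
    return saliency


# Toy 2x2 feature maps for two of the six attributes (values are made up).
toy_maps = {
    "intensity": [[1.0, 0.0], [0.0, 0.0]],
    "motion": [[0.0, 0.0], [0.0, 1.0]],
}
toy_weights = {"intensity": 0.25, "motion": 0.75}
toy_saliency = combine_feature_maps(toy_maps, toy_weights)
```

With these toy weights, the pixel that is salient only in the motion map ends up with a higher saliency value than the pixel that is salient only in the intensity map.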

Patent Document 1: JP 2004-282471 A
Patent Document 2: JP 2007-310454 A

Non-Patent Document 1: Laurent Itti, Christof Koch, Ernst Niebur, "A Model of Saliency-Based Visual Attention for Rapid Scene Analysis," IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 20, No. 11, pp. 1254-1259, November 1998.

However, with the conventional technique of evaluating image content based on gaze measurements of subjects, the gaze measurement experiment must be redone every time the image content under evaluation is changed. That is, the conventional technique requires an iterative process: produce the image content, run a gaze measurement experiment and analyze the distribution of gazing points, revise the image content in light of the results, and run the gaze measurement experiment again. Repeating content production and evaluation experiments in this way takes a great deal of time and effort, and the burden grows further as the number of subjects increases. A method is therefore needed that evaluates image content simply and efficiently while keeping the number of subjects small.

The evaluation method using the saliency map described above requires no gaze measurement experiment with subjects. However, when a saliency map computed from a sample image is compared with the results of a gaze measurement experiment on that sample image, the regions of high saliency and the regions where gazing points concentrate do not always coincide. This is thought to be because human gaze movement is influenced not only by the visual features of the image itself (bottom-up factors), which are the attributes used to compute the saliency map, but also by factors such as the subject's preferences, interests, experience, and knowledge (top-down factors).

To minimize the influence of such top-down factors, a method is needed for setting the image analysis parameters relating to physical features, which are used in the image analysis processing that generates the saliency map, to statistically appropriate values. Conventionally, however, users of saliency map calculation tools have usually either used the default settings as they are or used values obtained through experience. In other words, the prior art has not sufficiently examined how to set these image analysis parameters.

The present invention has been made in view of the above problems, and its object is to provide a moving image content evaluation apparatus and a computer program that can evaluate moving image content easily and objectively by using the distribution of gazing points based on gaze measurement together with the distribution of saliency based on image analysis processing.

[1] To solve the above problems, a moving image content evaluation apparatus according to one aspect of the present invention comprises: an image analysis unit that performs video analysis on each of a plurality of learning moving image contents and an evaluation target moving image content based on image analysis parameters relating to visual attributes, and generates saliency map data indicating a saliency distribution corresponding to the pixels included in the moving image content; a comparison processing unit that calculates, based on gazing point map data indicating a visual acuity distribution for the learning moving image content and on the saliency map data, a degree of coincidence that is an index of similarity between the gazing point map data and the saliency map data; a parameter determination unit that selects an image analysis parameter based on the degree of coincidence between the gazing point map data and the saliency map data, as calculated by the comparison processing unit from the results of video analysis performed by the image analysis unit on each of the learning moving image contents with different image analysis parameters, sets the selected image analysis parameter as the initial value of an evaluation parameter, obtains a final evaluation parameter by the steepest gradient method starting from that initial value, and determines the final evaluation parameter as the optimal image analysis parameter; a similar image content determination unit that determines a similar learning moving image content, which is a learning moving image content similar to the evaluation target moving image content; and a gazing point map estimation unit that outputs, as gazing point map data estimated to be the visual acuity distribution of the evaluation target moving image content, the saliency map data generated by the image analysis unit through video analysis of the evaluation target moving image content based on the optimal image analysis parameter determined by the parameter determination unit for the similar learning moving image content determined by the similar image content determination unit.
In the above configuration, the visual acuity distribution represents the distribution of a person's visual acuity at the gazing point and in its surroundings. Relative to the acuity at the gazing point, the acuity in the periphery decreases gradually with distance from the gazing point. When gazing points have been measured for a plurality of subjects, the superposition of those subjects' visual acuity distributions may be used as the visual acuity distribution. Because this distribution represents the degree to which a person is gazing, it may also be called a gaze strength distribution.
The saliency distribution is, in other words, an attractiveness distribution. Attractiveness here means the degree to which something readily draws visual attention.
The visual attributes may be of one type or of a plurality of types. The image analysis parameters represent, for example, the weight assigned to each visual attribute.
The parameter determination unit calculates the similarity between the gazing point map data of one learning moving image content, for which an evaluation parameter is to be obtained, and the gazing point map data of a plurality of other learning moving image contents. Based on the calculated similarity, it selects the gazing point map data of one or more of those learning moving image contents and selects the image analysis parameters associated with the selected gazing point map data. Next, the comparison processing unit calculates degrees of coincidence between the gazing point map data of the one learning moving image content and the saliency map data that the image analysis unit computes with each of the selected image analysis parameters, and the parameter determination unit determines the image analysis parameter giving the highest degree of coincidence as the evaluation parameter. The parameter determination unit determines evaluation parameters for the other learning moving image contents in the same way.
In other words, the parameter determination unit narrows the candidates down to the gazing point map data of one or more learning moving image contents, based on the similarity between the gazing point map data of the learning moving image content for which the evaluation parameter is sought and the gazing point map data of the other learning moving image contents. It then determines the evaluation parameter based on the image analysis parameter corresponding to whichever of the narrowed-down gazing point map data has the highest degree of coincidence with the saliency map data of its corresponding learning moving image content.
The similar image content determination unit determines the learning moving image content similar to the evaluation target moving image content by, for example, applying a common evaluation parameter for generating saliency map data to both the evaluation target moving image content and the learning moving image contents, and judging the similarity of the saliency map data obtained from the feature maps for predetermined visual attributes. Alternatively, clustering based on the features of the moving image contents may be performed, and the similarity between moving image contents may be judged from the result.
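The parameter determination described in [1] (choose the best-scoring candidate as the initial value, then refine it by the steepest gradient method) can be sketched as follows. This is an illustration under stated assumptions, not the patented procedure: `score` stands in for the degree of coincidence as a function of the attribute weights, the gradient is approximated by central differences, and the step size, iteration count, and toy score function are all hypothetical.

```python
from typing import Callable, List, Sequence

Score = Callable[[Sequence[float]], float]


def pick_initial(candidates: List[List[float]], score: Score) -> List[float]:
    """Select the candidate parameter set with the highest score."""
    return list(max(candidates, key=score))


def numerical_gradient(score: Score, w: Sequence[float],
                       eps: float = 1e-4) -> List[float]:
    """Approximate the gradient of score at w by central differences."""
    grad = []
    for i in range(len(w)):
        up = list(w)
        down = list(w)
        up[i] += eps
        down[i] -= eps
        grad.append((score(up) - score(down)) / (2 * eps))
    return grad


def refine(score: Score, w0: Sequence[float],
           step: float = 0.1, iterations: int = 200) -> List[float]:
    """Steepest ascent: repeatedly move along the gradient of the score."""
    w = list(w0)
    for _ in range(iterations):
        grad = numerical_gradient(score, w)
        w = [wi + step * gi for wi, gi in zip(w, grad)]
    return w


# Hypothetical stand-in for the degree of coincidence: peaks at (0.3, 0.7).
def toy_score(w: Sequence[float]) -> float:
    return -((w[0] - 0.3) ** 2 + (w[1] - 0.7) ** 2)


candidates = [[0.0, 1.0], [0.5, 0.5], [1.0, 0.0]]
initial = pick_initial(candidates, toy_score)  # best candidate as start point
final = refine(toy_score, initial)             # refined evaluation parameter
```

Starting the refinement from the best candidate rather than from an arbitrary point is the design choice described in the text; it reduces the risk that a purely local search settles far from a good solution.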

[2] The moving image content evaluation apparatus according to one aspect of the present invention further comprises a gazing point data analysis unit that generates, for the plurality of learning moving image contents, gazing point map data indicating a visual acuity distribution corresponding to the pixels included in the learning moving image content, based on gazing point data that includes the coordinate values of gazing points obtained by measuring the line of sight. The comparison processing unit calculates the degree of coincidence, which is an index of similarity between the gazing point map data and the saliency map data, based on the saliency map data for each of the plurality of learning moving image contents and the gazing point map data generated by the gazing point data analysis unit.
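As an illustration of a degree of coincidence between two maps of the same size, the sketch below uses the Pearson correlation coefficient over all pixels. This choice is an assumption made only for the example; the passage above states that the degree of coincidence is an index of similarity but does not specify the measure. The function name `coincidence` is hypothetical.

```python
import math
from typing import List

Matrix = List[List[float]]


def coincidence(gaze_map: Matrix, saliency_map: Matrix) -> float:
    """Pearson correlation between two same-sized maps (assumed measure)."""
    g = [v for row in gaze_map for v in row]
    s = [v for row in saliency_map for v in row]
    if len(g) != len(s):
        raise ValueError("maps must have the same number of pixels")
    mean_g = sum(g) / len(g)
    mean_s = sum(s) / len(s)
    cov = sum((a - mean_g) * (b - mean_s) for a, b in zip(g, s))
    var_g = sum((a - mean_g) ** 2 for a in g)
    var_s = sum((b - mean_s) ** 2 for b in s)
    if var_g == 0.0 or var_s == 0.0:
        return 0.0  # a flat map carries no spatial information to compare
    return cov / math.sqrt(var_g * var_s)


gaze = [[0.0, 1.0], [2.0, 3.0]]
same_shape = [[0.0, 2.0], [4.0, 6.0]]   # same pattern, different scale
inverted = [[3.0, 2.0], [1.0, 0.0]]     # opposite pattern
```

A correlation-style measure is insensitive to the overall scale of either map, which is convenient when the two maps are normalized differently.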

[3] A computer program according to the present invention causes a computer to function as: an image analysis unit that performs video analysis on each of a plurality of learning moving image contents and an evaluation target moving image content based on image analysis parameters relating to visual attributes, and generates saliency map data indicating a saliency distribution corresponding to the pixels included in the moving image content; a comparison processing unit that calculates, based on gazing point map data indicating a visual acuity distribution for the learning moving image content and on the saliency map data, a degree of coincidence that is an index of similarity between the gazing point map data and the saliency map data; a parameter determination unit that selects an image analysis parameter based on the degree of coincidence between the gazing point map data and the saliency map data, as calculated by the comparison processing unit from the results of video analysis performed by the image analysis unit on each of the learning moving image contents with different image analysis parameters, sets the selected image analysis parameter as the initial value of an evaluation parameter, obtains a final evaluation parameter by the steepest gradient method starting from that initial value, and determines the final evaluation parameter as the optimal image analysis parameter; a similar image content determination unit that determines a similar learning moving image content, which is a learning moving image content similar to the evaluation target moving image content; and a gazing point map estimation unit that outputs, as gazing point map data estimated to be the visual acuity distribution of the evaluation target moving image content, the saliency map data generated by the image analysis unit through video analysis of the evaluation target moving image content based on the optimal image analysis parameter determined by the parameter determination unit for the similar learning moving image content determined by the similar image content determination unit.

[4] To solve the above problems, the following aspect may also be adopted.
A moving image content evaluation method comprising: generating, for each of a plurality of learning moving image contents, gazing point map data indicating a visual acuity distribution corresponding to the pixels included in the learning moving image content, based on gazing point data that includes the coordinate values of gazing points obtained by measuring the line of sight; performing, for each of the plurality of learning moving image contents, video analysis with each of a plurality of image analysis parameters relating to visual attributes, and generating saliency map data indicating a saliency distribution corresponding to the pixels included in the learning moving image content; calculating, for each of the plurality of learning moving image contents, a degree of coincidence that is an index of similarity between the gazing point map data and the saliency map data; determining, for each of the plurality of learning moving image contents, the image analysis parameter that gave the highest degree of coincidence between the gazing point map data and the saliency map data as the evaluation parameter of that learning moving image content; determining, from among the plurality of learning moving image contents, the learning moving image content most similar to an input evaluation target moving image content; performing video analysis of the evaluation target moving image content based on the evaluation parameter of the most similar learning moving image content to generate saliency map data; and estimating and outputting that saliency map data as the gazing point map data of the evaluation target moving image content.
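The flow of the method in [4] can be sketched end to end. The sketch assumes that the analysis, coincidence, and similarity computations are supplied as functions; the toy versions below (a content is a list of per-pixel feature pairs, a parameter is a pair of weights) and all names are hypothetical and serve only to make the example executable.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Content = List[Tuple[float, float]]   # toy content: (feature_a, feature_b) per pixel
Params = Tuple[float, float]          # toy parameter: one weight per feature
Map = List[float]                     # toy map: one scalar per pixel


def learn_parameters(learning_set: Dict[str, Tuple[Content, Map]],
                     candidates: Sequence[Params],
                     analyze: Callable[[Content, Params], Map],
                     coincidence: Callable[[Map, Map], float]) -> Dict[str, Params]:
    """For each learning content, keep the candidate with the highest coincidence."""
    learned = {}
    for name, (content, gaze_map) in learning_set.items():
        learned[name] = max(
            candidates,
            key=lambda p: coincidence(gaze_map, analyze(content, p)))
    return learned


def estimate_gaze_map(target: Content,
                      learning_set: Dict[str, Tuple[Content, Map]],
                      learned: Dict[str, Params],
                      analyze: Callable[[Content, Params], Map],
                      similarity: Callable[[Content, Content], float]):
    """Analyze the target with the parameter of its most similar learning content."""
    nearest = max(learning_set,
                  key=lambda n: similarity(target, learning_set[n][0]))
    return nearest, analyze(target, learned[nearest])


# Toy stand-ins (assumptions, not the source's definitions).
def toy_analyze(content: Content, p: Params) -> Map:
    return [p[0] * a + p[1] * b for a, b in content]


def toy_coincidence(gaze: Map, sal: Map) -> float:
    return -sum((g - s) ** 2 for g, s in zip(gaze, sal))


def toy_similarity(c1: Content, c2: Content) -> float:
    return -sum((a1 - a2) ** 2 + (b1 - b2) ** 2
                for (a1, b1), (a2, b2) in zip(c1, c2))


learning_set = {
    "clip_a": ([(1.0, 0.0), (0.0, 1.0)], [1.0, 0.0]),  # gaze follows feature a
    "clip_b": ([(0.2, 0.9), (0.8, 0.1)], [0.9, 0.1]),  # gaze follows feature b
}
candidates = [(1.0, 0.0), (0.0, 1.0)]
learned = learn_parameters(learning_set, candidates, toy_analyze, toy_coincidence)
target = [(0.1, 0.8), (0.9, 0.2)]
nearest, estimated = estimate_gaze_map(
    target, learning_set, learned, toy_analyze, toy_similarity)
```

The target clip has no measured gaze data in this example; its estimated gazing point map is simply the saliency map computed with the parameter learned for its nearest learning clip.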

According to the present invention, in order to evaluate image content easily and objectively, image analysis parameters can be obtained by using the similarity between the distribution of gazing points obtained through gaze measurement experiments on learning moving image content and the distribution of saliency obtained through video analysis of that learning moving image content. By then using the image analysis parameters of a learning moving image similar to the evaluation target moving image content as the image analysis parameters for generating the saliency map of the evaluation target, the distribution of viewers' gazing points can be estimated even for evaluation target moving image content on which no gaze measurement experiment has been performed.

FIG. 1 is a block diagram showing the functional configuration of an image content evaluation apparatus according to an embodiment of the present invention.
FIG. 2 is a diagram showing an example of the structure of the data stored in the data storage unit in the embodiment.
FIG. 3 is a functional configuration diagram of the gazing point data generation device in the embodiment.
FIG. 4 is a data structure diagram of the gazing point data recorded in the storage unit in the embodiment.
FIG. 5 is a flowchart showing the procedure by which the gazing point data analysis unit generates gazing point map data for each frame image from the gazing point data for image content in the embodiment.
FIG. 6 is a diagram schematically showing how a subject's line of sight is directed while the subject observes the display screen, together with the coordinates of the gazing point on the display screen, in the embodiment.
FIG. 7 is an example of a three-dimensional graph showing the relationship between eccentricity relative to the gaze direction and peripheral visual acuity in the embodiment.
FIG. 8 is an example of a three-dimensional graph of gazing point map data in the embodiment.
FIG. 9 is a flowchart showing the procedure by which the image analysis unit generates saliency map data for image content in the embodiment.
FIG. 10 is a diagram showing the data structure of the image analysis parameters set by the image analysis unit in the embodiment.
FIG. 11 is an example of a three-dimensional graph of the saliency map generated for a referenced frame image in the embodiment.
FIG. 12 is a flowchart showing the procedure by which the comparison processing unit calculates the degree of coincidence between the gazing point map data and the saliency map data for image content in the embodiment.
FIG. 13 is a flowchart showing the procedure by which the parameter determination unit determines an evaluation parameter for generating saliency map data, using the gazing point map data and image analysis parameters for learning moving image content, in the embodiment.
FIG. 14 is a flowchart showing the procedure of the gazing point map data estimation processing performed by the image content evaluation apparatus in the embodiment.

Embodiments of the present invention are described below with reference to the drawings.
FIG. 1 is a block diagram showing the functional configuration of an image content evaluation apparatus according to an embodiment of the present invention. As shown in the figure, the image content evaluation apparatus 1 includes a data storage unit 11, a gazing point data analysis unit 12, an image analysis unit 13, a comparison processing unit 14, a parameter determination unit 15, an image input unit 21, a similar image content determination unit 22, and a gazing point map estimation unit 23.

FIG. 2 is a schematic diagram showing an example of the structure of the data stored in the data storage unit 11. The data storage unit 11 is implemented using semiconductor memory, a magnetic hard disk, or the like. As shown in FIG. 2(a), the data storage unit 11 stores image content that is shown to subjects for observation and subjected to image analysis processing. This image content is data that can be controlled frame by frame and includes a time code for each frame image. The image content used in this embodiment is content in which the influence of top-down factors has been reduced as far as possible or eliminated.

For example, as described in the literature (Ran Carmi and Laurent Itti, "Causal Saliency Effects During Natural Vision," Proc. of Symposium on Eye Tracking Research & Applications, pp. 11-18, March 2006), moving images whose content carries no cognitive meaning for the subject, or moving images unknown to the subject, are used as the image content. Alternatively, because the influence of top-down factors is thought to be reduced or eliminated even for ordinary moving images when the playback time is only a few seconds, a five-second moving image without audio, for example, is used as the image content. This gives the subject no time to think, so that gaze movement induced solely by the physical features of the image, independent of top-down factors, can be captured.

The image content may be moving image content comprising a plurality of frame images, or still image content consisting of a single frame image. This embodiment describes an example in which moving image content (evaluation target moving image content and learning moving image content) is used as the image content.

The data storage unit 11 also stores gazing point data, gazing point map data, feature map data, saliency map data, and a degree of coincidence in association with each frame image (#1 to #N) of the image content.

The gazing point data contains the coordinate values of gazing points obtained by measuring the line of sight of one or more subjects. The gazing point map data represents the distribution of gazing points, calculated from the gazing point data with the peripheral visual field around each gazing point taken into account. The feature map data represents the distribution of the feature amount obtained for each visual attribute of the frame image. The saliency map data represents the distribution of saliency (how readily attention is drawn to a part of the image), that is, the attractiveness distribution, obtained as a weighted linear sum of the feature map data. The degree of coincidence is an index of the similarity between the gazing point map data and the saliency map data. Each of these maps is a matrix whose elements correspond to the pixels of a frame image of W horizontal pixels × H vertical pixels, and each element of the matrix is a scalar value.

As shown in FIG. 2(b), the data storage unit 11 also stores an evaluation value and image analysis parameters in association with each image content. The evaluation value rates the degree of coincidence between the gazing point map data and the saliency map data over the image content as a whole. The image analysis parameters (called video analysis parameters when the content is moving image content) are the settings used to calculate the saliency map data as a linear sum of the individual feature map data, and include weight data for each visual attribute.

Returning to FIG. 1, the gazing point data analysis unit 12 analyzes the gazing point data of one or more subjects stored in the data storage unit 11 and generates gazing point data evaluation index data. In other words, based on gazing point data that contains the coordinate values of gazing points for the image content, the gazing point data analysis unit 12 generates gazing point map data representing the visual acuity distribution over the pixels of the image content.

The image analysis unit 13 generates saliency evaluation index data, which is evaluation index data relating to saliency, by image analysis processing that uses the physical feature amounts of the frame image. In other words, the image analysis unit 13 calculates, from the image content, per-pixel feature amount data for each visual attribute, and then generates saliency map data representing the per-pixel distribution of saliency from the feature amount data and the weight data defined for each visual attribute.

The comparison processing unit 14 compares the generated gazing point data evaluation index data with the saliency evaluation index data and calculates the degree of coincidence between the gazing point distribution and the saliency distribution. In other words, based on the gazing point map data and the saliency map data for the image content, the comparison processing unit 14 calculates a degree of coincidence that serves as an index of the similarity between the two maps.

The parameter determination unit 15 determines, based on this degree of coincidence, the evaluation parameters that the image analysis unit 13 uses to calculate the saliency map data.

When evaluation target moving image content is supplied from outside, the image input unit 21 inputs it to the image content evaluation device 1 and stores it in the data storage unit 11.

The similar image content determination unit 22 selects, from among a plurality of learning moving image contents, the learning moving image content that is similar to the evaluation target moving image content (the similar learning moving image content).

The gazing point map estimation unit 23 takes the evaluation parameters of the similar learning moving image content and uses them as the evaluation parameters for calculating the saliency map data of the evaluation target moving image content. It then outputs the saliency map data that the image analysis unit 13 generates with these parameters as estimated gazing point map data.

Next, the means and method for generating the gazing point data stored in advance in the data storage unit 11 are described. The gazing point data is generated by a gazing point data generation device, which is separate from the image content evaluation device 1, by measuring the line of sight of one or more subjects. This gazing point data generation process is a pre-process for the image content evaluation process.

FIG. 3 is a block diagram showing the functional configuration of the gazing point data generation device. As shown in the figure, the gazing point data generation device 3 includes an image reproduction unit 31, an image display unit 32, a gazing point data measurement unit 33, a gazing point data recording unit 34, and a storage unit 35. The image reproduction unit 31 reads the image content to be shown to the subject from the storage unit 35 and reproduces it. The image display unit 32 displays the reproduced image content on a screen. The gazing point data measurement unit 33 measures the eye movement of the subject observing the moving image on the image display unit 32 and determines the coordinate values of the gazing point, that is, the position on the screen being looked at. The gazing point data recording unit 34 records the gazing point coordinate values in the storage unit 35 in synchronization with the reproduction of the image content.

Next, a more specific configuration of the gazing point data generation device 3 and its operation are described. The gazing point data measurement unit 33 includes a gazing point measuring device 33a for measuring the subject's eye movement. The gazing point measuring device 33a may be a conventional one, for example a type that detects the gazing point with a visual sensor worn on the subject's face, or a type that measures the line of sight with a contact lens or goggles worn by the subject.

The gazing point measuring device 33a of this embodiment uses the pupil-corneal reflection method, which detects the gazing point by remotely sensing eye movement from captured images. The device illuminates the eyes of a subject, who observes the screen from a fixed distance from the display surface of the image display unit 32, with near-infrared light and captures the corneal reflection on the surface of the eye with a camera. It then detects the pupil center and the corneal reflection point in the captured image and geometrically calculates the direction of the line of sight and the coordinate values of the gazing point on the screen.

The resolution of this remote-sensing measurement is roughly 0.5 to 1 degree. For an HDTV (High Definition Television) image with 1920 effective pixels per line, viewed with a horizontal viewing angle of 30 degrees, this corresponds to a resolution of about 32 to 64 pixels. The gazing point data recording unit 34 records the gazing point coordinate values measured by the gazing point data measurement unit 33 in the storage unit 35 as gazing point data, in synchronization with the reproduction of the image content by the image reproduction unit 31. As a result, each frame image in the image content is associated with gazing point data.
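The pixel figures quoted above follow from a simple linear conversion between visual angle and pixels. The short sketch below reproduces that arithmetic; the function name and its default arguments are illustrative only and are not taken from the source.

```python
def angular_to_pixels(angle_deg, pixels_per_line=1920, view_angle_deg=30.0):
    """Convert an angular resolution to pixels, assuming the pixels are
    spread evenly over the horizontal viewing angle (linear approximation)."""
    return angle_deg * pixels_per_line / view_angle_deg


# 1920 pixels over 30 degrees gives 64 pixels per degree.
low = angular_to_pixels(0.5)   # 32.0 pixels
high = angular_to_pixels(1.0)  # 64.0 pixels
```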

FIG. 4 is a schematic diagram showing the data structure of the gazing point data. As shown in the figure, the gazing point data contains the frame number of each frame image of the image content, the time code of that frame, and the gazing point coordinate values of each subject. The time code is time information counted from the first frame image of the image content and is written as "hours:minutes:seconds.frames". The frame number starts at 1 for the first frame image of the image content and increases by 1 in time code order. The gazing point coordinates are expressed in a two-dimensional coordinate system on the frame image of W horizontal pixels × H vertical pixels, with the origin at the upper left corner of the display area of the frame image shown on the image display unit 32. For example, at time code "0:00:05.00", the gazing point coordinates of subject 1 are (175, 122), those of subject 2 are (168, 145), and so on, and those of subject M are (166, 260).
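One way to picture a row of the gazing point data in FIG. 4 is as a small record type. The sketch below is only an illustration: the class name, field names, and the frame rate of 30 frames per second are assumptions, since the source does not state a frame rate or a storage format.

```python
from dataclasses import dataclass, field


@dataclass
class GazeRecord:
    """One row of gazing point data (hypothetical layout)."""
    frame_number: int               # 1 for the first frame of the content
    time_code: str                  # "hours:minutes:seconds.frames"
    gaze_points: list = field(default_factory=list)  # (x, y) per subject


def time_code_to_frame_number(time_code, fps=30):
    """Map a time code to a 1-based frame number; fps is an assumed value."""
    hms, frames = time_code.split(".")
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    total_seconds = (hours * 60 + minutes) * 60 + seconds
    return total_seconds * fps + int(frames) + 1


record = GazeRecord(
    frame_number=time_code_to_frame_number("0:00:05.00"),
    time_code="0:00:05.00",
    gaze_points=[(175, 122), (168, 145), (166, 260)],
)
```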

The image content evaluation device 1 imports the gazing point data stored in the storage unit 35 by the above gazing point data generation process into the data storage unit 11 and uses it. This concludes the description of the gazing point data generation process, which is the pre-process.

Next, the image content evaluation process performed by the image content evaluation device 1 is described in four parts: gazing point data analysis processing, image analysis processing, comparison processing, and parameter determination processing.

<Gazing point data analysis processing>

FIG. 5 is a flowchart showing the procedure by which the gazing point data analysis unit 12 generates gazing point map data for each frame image from the gazing point data of an image content. In step S51, the gazing point data analysis unit 12 refers to the gazing point data stored in the data storage unit 11 one frame image at a time. In step S52, if there is no gazing point data left to refer to (step S52: NO), the processing of this flowchart ends. If there is gazing point data (step S52: YES), the process proceeds to step S53. In step S53, the gazing point data analysis unit 12 reads the gazing point data of all subjects for the referenced frame image.

Next, in step S54, the gazing point data analysis unit 12 generates gazing point map data that takes the peripheral visual field into account. The generation process is described in detail here. According to findings in visual science on eye movement, the relationship between the line of sight and the peripheral visual field is given by Equation (1), where E is the eccentricity relative to the line-of-sight direction, Vf is the visual acuity in the line-of-sight direction, Es is a predetermined constant, and V is the peripheral visual acuity value.

FIG. 6 schematically shows a subject directing his or her line of sight at the display screen, together with the coordinates of the gazing point on the screen. FIG. 6(a) shows the subject observing the moving image on the display screen 61 from a position at distance L from the screen, on the extension of the axis that is perpendicular to the display screen 61 and passes through its center. When the gazing point coordinates at the time indicated by a given time code are (GX, GY), the subject's line of sight is directed at the point P(GX, GY). Now consider the visual field at the coordinates (X, Y) on the display screen 61 that lie at an angle of eccentricity E from the subject's line of sight. The visual acuity distribution in the peripheral visual field around point P can be approximated as circular, so, as shown in FIG. 6(b), visual acuity is constant on the circumference of a circle of radius R centered on the gazing point coordinates (GX, GY). In the figure, the coordinate values X and Y take integer values in the ranges 1 ≤ X ≤ W and 1 ≤ Y ≤ H, where W and H are the numbers of horizontal and vertical pixels of the frame image, respectively.

The peripheral visual acuity value V[X, Y] at the coordinates (X, Y) is then given by Equation (2), which is obtained by rearranging Equation (1).

Here, atan is the arctangent function. Because the line of sight to any pixel on the screen is nearly perpendicular to the screen, the eccentricity E[X, Y] can be approximated using the arctangent function. The visual acuity Vf in the line-of-sight direction may be set individually for each subject or set to a common value.
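Equations (1) and (2) are not reproduced in this text, so the following is only a sketch under stated assumptions. It assumes the commonly used acuity falloff V = Vf / (1 + E / Es), takes the eccentricity as E = atan(R / L) with R the pixel distance from the gazing point, and expresses the viewing distance L in pixel units. The function name and the default values of `vf` and `es` are hypothetical.

```python
import math


def peripheral_acuity_map(gx, gy, width, height, distance_px, vf=1.0, es=2.0):
    """Return an H x W list of peripheral acuity values V[X, Y].

    Assumed model (not confirmed by the source): V = vf / (1 + E / es),
    with E = atan(R / distance_px) in degrees and R the distance in pixels
    between (X, Y) and the gazing point (gx, gy). Coordinates are 1-based.
    """
    acuity = []
    for y in range(1, height + 1):
        row = []
        for x in range(1, width + 1):
            radius = math.hypot(x - gx, y - gy)
            eccentricity = math.degrees(math.atan(radius / distance_px))
            row.append(vf / (1.0 + eccentricity / es))
        acuity.append(row)
    return acuity


# Small example: 5 x 4 frame, gazing point at (3, 2).
example_map = peripheral_acuity_map(gx=3, gy=2, width=5, height=4,
                                    distance_px=100.0)
```

Under this model the value peaks at the gazing point and is identical for all pixels at the same distance R from it, which matches the circular approximation described for FIG. 6(b).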

FIG. 7 is a three-dimensional graph of the peripheral visual acuity value V[X, Y] at eccentricity E[X, Y] from the line-of-sight direction, as calculated by the gazing point data analysis unit 12 using Equation (2). The figure is an example for a frame image with W = 320 horizontal pixels and H = 240 vertical pixels.

The peripheral visual acuity values V[X, Y] obtained from Equation (2) form a matrix of W horizontal pixels × H vertical pixels and represent the visual acuity distribution derived from a subject's gazing point data. This matrix is called the gazing point map data. That is, the gazing point map data GMs(f) of subject s at frame number f is expressed as in Equation (3).

Returning to FIG. 5, in step S54 the gazing point data analysis unit 12 takes the linear sum of the gazing point map data of all subjects for the referenced frame image and uses the result as the gazing point data evaluation index data for that frame image. That is, the gazing point map data GM(f) for the frame image with frame number f is obtained by Equation (4).

The constant cs may take a different value for each subject or a fixed value (for example, cs = 1.0 for all subjects).
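Equation (4) is not reproduced in this text, but the description (a linear sum over subjects with a per-subject constant cs) suggests the following element-wise computation. This is a minimal sketch; the function name and the plain nested-list representation of the maps are assumptions.

```python
def combine_gaze_maps(subject_maps, coefficients=None):
    """Sum per-subject gazing point maps GMs(f) into GM(f).

    subject_maps: list of H x W nested lists, one per subject.
    coefficients: per-subject constants cs; defaults to 1.0 for everyone.
    """
    if coefficients is None:
        coefficients = [1.0] * len(subject_maps)
    height = len(subject_maps[0])
    width = len(subject_maps[0][0])
    return [
        [sum(c * m[row][col] for c, m in zip(coefficients, subject_maps))
         for col in range(width)]
        for row in range(height)
    ]


# Two subjects, a 1 x 2 frame.
combined = combine_gaze_maps([[[1.0, 2.0]], [[3.0, 4.0]]])
weighted = combine_gaze_maps([[[1.0, 2.0]], [[3.0, 4.0]]], [0.5, 2.0])
```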

FIG. 8 is a three-dimensional graph of the gazing point map data GM(f) calculated by the gazing point data analysis unit 12 using Equation (4). The figure is an example for a frame image with W = 320 horizontal pixels and H = 240 vertical pixels.

Returning again to FIG. 5, in step S55 the gazing point data analysis unit 12 records the gazing point map data GM(f) for the referenced frame image in the data storage unit 11. The process then returns to step S51.

As described above, the gazing point data analysis unit 12 calculates the gazing point data evaluation index data with the visual acuity distribution of the peripheral visual field taken into account, and can thereby produce evaluation index data from the measured gazing points that reflects the characteristics of human eye movement. As a result, even when the number of subjects is small, the degree to which gazing points concentrate on each pixel of the frame image can be obtained efficiently.

Alternatively, the gazing point data analysis unit 12 may cluster the gazing point distributions of a plurality of subjects using an existing clustering method and then add up the distributions of all clusters to obtain gazing point map data in the form of a mixture of normal distributions.

<Image analysis processing>

As the visual attributes used in generating the saliency map data, the image analysis unit 13 uses, for example, the six attributes described earlier: color, intensity, orientation, contrast, flicker, and motion. The color attribute takes the color value of a pixel as its attribute value. The intensity attribute takes the luminance value of a pixel as its attribute value. The orientation attribute is obtained, for example, by summing for each pixel the strengths of the line components in four orientations (0, 45, 90, and 135 degrees, with the horizontal direction as the reference). The strength of the line component in a given orientation is calculated, for example, from the ratio between the image derivative in that orientation and the image derivative in the orthogonal direction. The contrast attribute takes as its attribute value the contrast calculated from the ratio between the pixel values of the region containing the pixel and the pixel values of other regions. The flicker attribute is calculated, when the temporal change in pixel values of the region containing the pixel has a given frequency component, from that frequency and the amplitude of that frequency component. The motion attribute is an attribute value that reflects, when a given pattern in the frame image moves in a given direction over time, the size of the pattern and its speed of movement. Attribute values for one or more visual attributes are sufficient to generate the saliency map data, but this embodiment uses all six attribute values above as the physical feature amounts corresponding to the visual attributes.

FIG. 9 is a flowchart showing the procedure by which the image analysis unit 13 generates saliency map data for an image content. In step S91, the image analysis unit 13 sets the image analysis parameters used to generate the saliency map data. These image analysis parameters are six items of weight data corresponding to the physical feature amounts used in the image analysis processing of the image content.

FIG. 10 shows the data structure of the image analysis parameters set by the image analysis unit 13. As shown in the figure, the image analysis parameters hold weight data for each of the six physical feature amounts: wc is the weight for the physical feature amount CC of the color attribute, wi is the weight for the physical feature amount CI of the intensity attribute, wo is the weight for the physical feature amount CO of the orientation attribute, wr is the weight for the physical feature amount CR of the contrast attribute, wj is the weight for the physical feature amount CJ of the flicker attribute, and wm is the weight for the physical feature amount CM of the motion attribute.

Returning to FIG. 9, in step S92 the image analysis unit 13 refers to the image content stored in the data storage unit 11 one frame image at a time. In step S93, if there is no frame image left to refer to (step S93: NO), the processing of this flowchart ends. If a frame image was referenced (step S93: YES), the process proceeds to step S94. In step S94, the image analysis unit 13 reads the referenced frame image.

Next, in step S95, the image analysis unit 13 generates the saliency map data. Using the image analysis parameters set in step S91, it generates saliency distribution data estimated from the feature amounts of all visual attributes. Specifically, the image analysis unit 13 performs image analysis for the six visual attributes on the frame image it has read and generates a feature map for each visual attribute. It then calculates the weighted linear sum of these feature maps to generate the saliency map data, which is the saliency evaluation index data. For each pixel (i, j) of the frame image of W horizontal pixels × H vertical pixels, the image analysis unit 13 calculates the weighted linear sum F[i, j] of the feature amounts using Equation (5).

The saliency map data SM(f) for the entire frame image is then expressed as in Equation (6).
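Equations (5) and (6) are not reproduced in this text. Going by the description (a weighted linear sum of six feature maps, with the weights of FIG. 10), the per-pixel computation can be sketched as follows. The dictionary keys, the function name, and the nested-list representation are assumptions made for illustration.

```python
# Keys mirror the suffixes in FIG. 10: wc/CC, wi/CI, wo/CO, wr/CR, wj/CJ, wm/CM.
ATTRIBUTES = ("c", "i", "o", "r", "j", "m")


def saliency_map(feature_maps, weights):
    """Return SM(f) as the weighted linear sum of six H x W feature maps.

    feature_maps: dict mapping each attribute key to an H x W nested list.
    weights: dict mapping each attribute key to its scalar weight.
    """
    first = feature_maps[ATTRIBUTES[0]]
    height, width = len(first), len(first[0])
    return [
        [sum(weights[a] * feature_maps[a][row][col] for a in ATTRIBUTES)
         for col in range(width)]
        for row in range(height)
    ]


# 1 x 2 frame in which every feature map is identical.
maps = {a: [[1.0, 2.0]] for a in ATTRIBUTES}
equal_weights = {a: 0.5 for a in ATTRIBUTES}
sm = saliency_map(maps, equal_weights)
```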

FIG. 11 is an example of a three-dimensional graph of the saliency map generated for the referenced frame image. The figure is an example for a frame image with W = 320 horizontal pixels and H = 240 vertical pixels.

Returning to FIG. 9, in step S96 the image analysis unit 13 records the saliency map data for the referenced frame image in the data storage unit 11. The process then returns to step S92.

<Comparison processing>

When the gazing point data evaluation index data and the saliency evaluation index data are created from image content in which the influence of top-down factors has at least been reduced, the regions of the image where gazing points concentrate and the regions of high saliency at least partly overlap or lie close to each other. The comparison processing unit 14 therefore calculates a degree of coincidence, which is an index of the similarity between the distribution of the gazing point data evaluation index data and the distribution of the saliency evaluation index data. The more similar the matrix values of the gazing point map data (the gazing point data evaluation index data) and the saliency map data (the saliency evaluation index data), the larger the degree of coincidence. Specifically, the comparison processing unit 14 compares the gazing point map data GM(f) given by Equation (4) with the saliency map data SM(f) given by Equation (6) to calculate the degree of coincidence.

FIG. 12 is a flowchart showing the procedure by which the comparison processing unit 14 compares the gazing point map data with the saliency map data of one image content and calculates the degree of coincidence. In step S121, the comparison processing unit 14 refers to the gazing point map data for one frame image of an image content stored in the data storage unit 11.

In the next step S122, it determines whether the data referred to in step S121 exists. If there is gazing point map data (step S122: YES), the process proceeds to step S123. If there is no gazing point map data left to refer to, that is, if the processing from step S123 onward has been completed for every frame image in the image content (step S122: NO), the process proceeds to step S127.

In step S123, the comparison processing unit 14 reads the gazing point map data for the referenced frame image. In step S124, it reads the saliency map data for the same frame image from the data storage unit 11. In step S125, it calculates coincidence evaluation map data from the gazing point map data and the saliency map data it has read. In step S126, it calculates the degree of coincidence for the referenced frame image. The process then returns to step S121.

Three specific examples of the degree-of-coincidence calculation in steps S125 and S126 are described below.

The first method takes the difference between corresponding elements of the gazing point map data and the saliency map data as the coincidence evaluation map data. That is, for the gazing point map data GM(f) and the saliency map data SM(f), the comparison processing unit 14 calculates the coincidence evaluation map data DM[i, j], the absolute value of the difference between the elements at matrix position (i, j), using Equation (7).

The comparison processing unit 14 then compares the coincidence evaluation map data DM[i, j] calculated by equation (7) with a predetermined threshold and counts the number of difference values DM[i, j] that are smaller than the threshold. That count is the degree of coincidence.
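
The first method can be sketched in a few lines. This is only an illustration under stated assumptions, not the patented implementation: the function names `difference_map` and `count_below`, the use of nested Python lists for the maps, and the sample values and threshold are all hypothetical. Only the element-wise absolute difference and the count of elements below the threshold come from the description.

```python
def difference_map(gm, sm):
    """Return DM, where DM[i][j] = |GM[i][j] - SM[i][j]| (the role of equation (7))."""
    if len(gm) != len(sm) or any(len(a) != len(b) for a, b in zip(gm, sm)):
        raise ValueError("both maps must have the same shape")
    return [[abs(g - s) for g, s in zip(g_row, s_row)]
            for g_row, s_row in zip(gm, sm)]


def count_below(dm, threshold):
    """Degree of coincidence: number of elements strictly below the threshold."""
    return sum(1 for row in dm for value in row if value < threshold)


# Hypothetical 2x2 maps with values in [0, 1].
gm = [[0.9, 0.1], [0.2, 0.0]]   # gazing point map data for one frame
sm = [[0.8, 0.6], [0.25, 0.0]]  # saliency map data for the same frame
coincidence = count_below(difference_map(gm, sm), threshold=0.2)
```

With these sample maps, three of the four positions differ by less than 0.2, so the count is 3.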

The second method measures the similarity between the gazing point map data and the saliency map data. That is, the comparison processing unit 14 creates a histogram from each of the gazing point map data GM(f) and the saliency map data SM(f), calculates the absolute value of the difference between the two histograms bin by bin, and takes the sum of those values as the degree of coincidence. When the elements of GM(f) and SM(f) can each take values from 0 to 1, the histogram is obtained by, for example, dividing that range into 10 equal bins of width 0.1 and counting the number of elements that fall into each bin.
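
The histogram comparison in the second method might look like the sketch below. The names `histogram` and `histogram_difference` are hypothetical, and placing the value 1.0 in the last bin is an assumption, since the description does not say how the upper boundary is handled.

```python
def histogram(values_2d, bins=10):
    """Count map elements per equal-width bin over [0, 1]."""
    counts = [0] * bins
    for row in values_2d:
        for v in row:
            if not 0.0 <= v <= 1.0:
                raise ValueError("map values are assumed to lie in [0, 1]")
            # Assumption: 1.0 is placed in the last bin rather than a bin of its own.
            counts[min(int(v * bins), bins - 1)] += 1
    return counts


def histogram_difference(gm, sm, bins=10):
    """Sum of the absolute per-bin differences between the two histograms."""
    return sum(abs(a - b)
               for a, b in zip(histogram(gm, bins), histogram(sm, bins)))


gm = [[0.05, 0.15], [0.95, 1.0]]   # hypothetical gazing point map data
sm = [[0.05, 0.05], [0.55, 0.95]]  # hypothetical saliency map data
total = histogram_difference(gm, sm)
```

Here the histograms differ by one count in each of four bins, so `total` is 4. Two maps with identical histograms give 0 under this formulation.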

The third method takes the element-wise product of the gazing point map data and the saliency map data as the coincidence evaluation map data. That is, for the gazing point map data GM(f) and the saliency map data SM(f), the comparison processing unit 14 uses equation (8) to calculate the coincidence evaluation map data MM[i, j], which is the product of the elements at matrix position (i, j).

The comparison processing unit 14 then compares the coincidence evaluation map data MM[i, j] calculated by equation (8) with a predetermined threshold and counts the number of product values MM[i, j] that are larger than the threshold. That count is the degree of coincidence.
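
The third method differs from the first only in the element-wise operation and the direction of the threshold test. A minimal sketch, with hypothetical function names, sample maps, and threshold:

```python
def product_map(gm, sm):
    """Return MM, where MM[i][j] = GM[i][j] * SM[i][j] (the role of equation (8))."""
    if len(gm) != len(sm) or any(len(a) != len(b) for a, b in zip(gm, sm)):
        raise ValueError("both maps must have the same shape")
    return [[g * s for g, s in zip(g_row, s_row)]
            for g_row, s_row in zip(gm, sm)]


def count_above(mm, threshold):
    """Degree of coincidence: number of elements strictly above the threshold."""
    return sum(1 for row in mm for value in row if value > threshold)


gm = [[0.9, 0.1], [0.5, 0.0]]  # hypothetical gazing point map data
sm = [[0.8, 0.6], [0.5, 1.0]]  # hypothetical saliency map data
coincidence = count_above(product_map(gm, sm), threshold=0.2)
```

A product is large only where both maps are large, so this count rewards positions that are both salient and actually looked at.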

The comparison processing unit 14 repeats steps S121 to S126 until it has calculated the degree of coincidence for every frame image constituting the target image content, and then proceeds to step S127. In step S127, it calculates an evaluation value, which is the degree of coincidence for the image content as a whole, from the per-frame degrees of coincidence.

The evaluation value is calculated as follows. For example, the average of the degrees of coincidence over all frame images constituting one item of image content is taken as the evaluation value. Alternatively, the per-frame degree of coincidence is integrated over time across all frame images, and the integral is taken as the evaluation value.
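
Both ways of forming the evaluation value can be sketched as follows. The function names are hypothetical, and the trapezoidal rule with a fixed frame interval is an assumption: the description only says that the degree of coincidence is integrated over time, without naming a numerical scheme.

```python
def mean_evaluation(per_frame):
    """Evaluation value as the average per-frame degree of coincidence."""
    if not per_frame:
        raise ValueError("at least one frame is required")
    return sum(per_frame) / len(per_frame)


def integrated_evaluation(per_frame, frame_interval):
    """Evaluation value as a time integral (trapezoidal rule, an assumption)."""
    return sum((a + b) / 2.0 * frame_interval
               for a, b in zip(per_frame, per_frame[1:]))


scores = [2, 4, 6]  # hypothetical per-frame degrees of coincidence
average = mean_evaluation(scores)
integral = integrated_evaluation(scores, frame_interval=0.5)
```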

Next, in step S128, the comparison processing unit 14 records the calculated evaluation value in the data storage unit 11.

In the gazing point data generation process described above, gazing point data is measured in synchronization with playback of the image content, and the gazing point data obtained at the playback time of each frame image is recorded. However, as a physiological response, the human eye moves its line of sight only after a short time lag from the moment an image enters the field of view. To account for this, the time at which the gazing point data corresponding to a frame image is calculated may be delayed from the playback time of that frame image by the length of the time lag.

Specifically, the number of frames corresponding to this time lag may be stored in advance, and when the comparison process compares the gazing point map data with the saliency map data, the degree of coincidence may be calculated using gazing point map data that lags the generation time of the saliency map data by that number of frames.
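
The frame-lag pairing can be illustrated as below. The function name `pair_with_lag` and the list-of-frames representation are hypothetical, and dropping the trailing frames that have no lagged partner is an assumption, since the description does not say how the end of the content is handled.

```python
def pair_with_lag(gaze_maps, saliency_maps, lag_frames):
    """Pair the saliency map of frame f with the gazing point map of frame f + lag.

    Frames near the end that have no lagged gazing point map are dropped.
    """
    if lag_frames < 0:
        raise ValueError("lag_frames must be non-negative")
    usable = min(len(saliency_maps), len(gaze_maps) - lag_frames)
    return [(gaze_maps[f + lag_frames], saliency_maps[f])
            for f in range(max(usable, 0))]


# Strings stand in for per-frame map data.
gaze = ["g0", "g1", "g2", "g3"]
saliency = ["s0", "s1", "s2", "s3"]
pairs = pair_with_lag(gaze, saliency, lag_frames=1)
```

Each pair can then be passed to whichever degree-of-coincidence calculation is in use.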

The delay time Td, that is, the delay of the collection time of the gazing point map data to be compared relative to the generation time of the saliency map, can be obtained as follows. By analyzing the frame image at a given time code together with the frame images before and after it, a location is detected at which the change in a physical feature within a partial region at the same position in the frame image exceeds a predetermined threshold. Let T1 be the time code of the frame image detected in this way. The movement of the subject's line of sight is then analyzed, and when a saccade is detected at the point where time Tb has elapsed since time code T1, the gazing point after the saccade and the time code at that moment, T2 = T1 + Tb, are recorded. In this case, the time Tb can be regarded as the delay time Td. A saccade is the rapid eye movement that occurs when the line of sight shifts, and is also called a jumping eye movement.
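
As a rough sketch of the delay estimate, assume the time code T1 of the feature change and a list of saccade time codes are already available. Taking the first saccade after T1 as the relevant one is an assumption, and the function name is hypothetical.

```python
def estimate_delay(change_time, saccade_times):
    """Return Tb = T2 - T1 for the first saccade after T1, or None if there is none."""
    later = [t for t in saccade_times if t > change_time]
    if not later:
        return None
    return min(later) - change_time


# T1 = 10.0 s; saccades were detected at 9.5 s, 10.25 s and 11.0 s.
td = estimate_delay(10.0, [9.5, 10.25, 11.0])
```

In this example T2 is 10.25, so the estimated delay is 0.25 seconds.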

<Parameter determination process>
One way to optimize the combination of weight data in the image analysis parameters used to generate saliency map data for image content is to search for the optimal solution by repeating the calculation with many different image analysis parameters, for example using the steepest gradient method. Used naively, however, such a search requires an enormous amount of computation and time. In this embodiment, therefore, the parameter determination unit 15 uses the gazing point map data and image analysis parameters of learning moving image content for which the degree of coincidence has already been calculated to determine, as the evaluation parameters, the optimal image analysis parameters for generating the saliency map data of the learning moving image content targeted for evaluation parameter determination (the evaluation parameter determination target learning moving image content).

The data storage unit 11 contains a learning image database (not shown). This database stores one or more items of learning moving image content and, for each item, its gazing point map data and saliency map data, the degree of coincidence between that gazing point map data and saliency map data, and the image analysis parameters used to calculate the saliency map data. When there are multiple items of learning moving image content, the weight data of the image analysis parameters is made to differ from item to item. The learning image database may omit the saliency map data corresponding to the learning moving image content from the data items above. Conversely, it may omit the degree-of-coincidence data. In the latter case, the degree of coincidence can be calculated from the gazing point map data and the saliency map data of the learning moving image content, so the pair of gazing point map data and saliency map data effectively also represents the degree of coincidence.

The image analysis parameters for each item of learning moving image content either have all six weight data set to the same value so that the weighting is uniform (for example, wc = wi = wo = wr = wj = wm = 1.0), or have weight data adjusted so that the degree of coincidence between the gazing point map data and the saliency map data of that learning moving image content is higher than a predetermined reference value.

FIG. 13 is a flowchart showing the procedure by which the parameter determination unit 15 uses the gazing point map data and image analysis parameters of the learning moving image content to determine the optimal image analysis parameters (evaluation parameters) for generating the saliency map data of the learning moving image content in the learning image database that is targeted for evaluation parameter determination.

First, in step S131, the parameter determination unit 15 reads from the data storage unit 11 the gazing point map data for the learning moving image content targeted for evaluation parameter determination.
Next, in step S132, the parameter determination unit 15 refers to the gazing point map data for one item of learning moving image content stored in the data storage unit 11, other than that target content.
Then, in step S133, it determines whether the data referred to in step S132 exists. If gazing point map data for the referenced learning moving image content was found (step S133: YES), the process proceeds to step S134. If there is no gazing point map data left to refer to, that is, if steps S134 and S135 have been completed for all of the learning moving image content in the parameter determination unit 15 (step S133: NO), the process proceeds to step S136.

Next, in step S134, the parameter determination unit 15 reads from the data storage unit 11 the gazing point map data of the learning moving image content referred to in step S133. Then, in step S135, it calculates the similarity between the gazing point map data of the learning moving image content targeted for evaluation parameter determination and the gazing point map data of the learning moving image content just read, and returns to step S132.
The similarity in step S135 is calculated in the same way that the comparison processing unit 14 calculates the degree of coincidence between gazing point map data and saliency map data in the comparison process described above. For example, the parameter determination unit 15 calculates the element-wise difference between the two sets of gazing point map data, compares each difference with a predetermined threshold, and counts the number of differences smaller than the threshold. That count is the similarity.

After the process moves from step S133 to step S136, it continues as follows.
In step S136, the parameter determination unit 15 selects gazing point map data, according to a predetermined selection criterion, from among the gazing point map data of the one or more items of learning moving image content for which a similarity has been calculated. One such criterion is to select the gazing point map data whose similarity, as calculated in step S135, exceeds a predetermined similarity reference value. Another is to select a predetermined number of gazing point map data in descending order of similarity.

Next, in step S137, the parameter determination unit 15 reads from the data storage unit 11 the image analysis parameters and degree of coincidence associated with the gazing point map data of each selected item of learning moving image content. In step S138, if more than one degree of coincidence has been read, the parameter determination unit 15 selects the image analysis parameters corresponding to the highest degree of coincidence and adopts them as the initial values of the evaluation parameters. If only one degree of coincidence has been read, it adopts the image analysis parameters corresponding to that degree of coincidence as the initial values of the evaluation parameters.
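
Steps S136 to S138 amount to a filter followed by an arg-max, which the sketch below illustrates. The function names, the dictionary layout, the content identifiers, and all numeric values are hypothetical. The weight names wc, wi, wo, wr, wj, and wm are the six weight data named in the description.

```python
WEIGHT_NAMES = ("wc", "wi", "wo", "wr", "wj", "wm")


def select_similar(similarities, reference=None, top_k=None):
    """Step S136: keep ids whose similarity exceeds `reference`, or the `top_k` best."""
    if (reference is None) == (top_k is None):
        raise ValueError("give exactly one of reference or top_k")
    if reference is not None:
        return [cid for cid, s in similarities.items() if s > reference]
    ranked = sorted(similarities, key=similarities.get, reverse=True)
    return ranked[:top_k]


def initial_parameters(selected, records):
    """Steps S137 and S138: weights stored with the highest degree of coincidence.

    `records` maps a content id to a (degree_of_coincidence, weights) pair.
    """
    if not selected:
        raise ValueError("no learning content was selected")
    best = max(selected, key=lambda cid: records[cid][0])
    return dict(records[best][1])


similarities = {"news": 120, "sports": 300, "drama": 250}
records = {
    "news": (40, dict.fromkeys(WEIGHT_NAMES, 1.0)),
    "sports": (55, dict.fromkeys(WEIGHT_NAMES, 0.5)),
    "drama": (70, dict.fromkeys(WEIGHT_NAMES, 2.0)),
}
selected = select_similar(similarities, reference=200)
initial = initial_parameters(selected, records)
```

Here "sports" and "drama" pass the similarity test, and the weights stored for "drama" become the initial values because its stored degree of coincidence is higher.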

Next, in step S139, the parameter determination unit 15 takes the evaluation parameters determined above as initial values and searches the six weight data in detail for the optimal evaluation parameters, for example using the steepest gradient method. One example follows. The parameter determination unit 15 selects the first of the six weight data constituting the evaluation parameters, changes its value, and supplies the six weight data to the image analysis unit 13. The image analysis unit 13 calculates the saliency map data of the learning moving image content targeted for evaluation parameter determination using the supplied six weight data, and returns control to the parameter determination unit 15. The parameter determination unit 15 then passes control to the comparison processing unit 14, which calculates the degree of coincidence between that saliency map data and the gazing point map data of the same target content and returns control to the parameter determination unit 15. In this way, the parameter determination unit 15 calculates the degree of coincidence while varying the value of the weight data over a desired range, and finds the value that gives the highest degree of coincidence. It then does the same for the second through sixth weight data.
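
The one-weight-at-a-time example in step S139 can be sketched as a simple coordinate-wise search. Note that this is not a gradient computation. The callable `score` is a hypothetical stand-in for the round trip through the image analysis unit 13 and the comparison processing unit 14, and the candidate values and toy scoring function are invented for illustration.

```python
def coordinate_search(initial, score, candidates):
    """Vary each weight in turn over `candidates`, keeping the best-scoring value."""
    weights = dict(initial)
    for name in list(weights):
        best_value, best_score = weights[name], score(weights)
        for value in candidates:
            trial = dict(weights)
            trial[name] = value
            trial_score = score(trial)
            if trial_score > best_score:
                best_value, best_score = value, trial_score
        weights[name] = best_value
    return weights


# Toy score: highest when every weight equals a hidden target value.
target = {"wc": 0.5, "wi": 2.0, "wo": 1.0, "wr": 1.5, "wj": 0.0, "wm": 1.0}


def toy_score(weights):
    return -sum((weights[k] - target[k]) ** 2 for k in target)


start = dict.fromkeys(target, 1.0)
final = coordinate_search(start, toy_score, [0.0, 0.5, 1.0, 1.5, 2.0])
```

Because the toy score treats each weight independently, a single pass recovers the target exactly. With a real degree of coincidence the weights interact, so one pass would generally give only an approximate optimum.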

Next, in step S140, the parameter determination unit 15 determines the six weight data found by the search to be the final evaluation parameters, that is, the optimal values of the evaluation parameters. In step S141, it records the final evaluation parameters in the data storage unit 11.

In the parameter determination process above, when there are few samples of learning moving image content, or few setting patterns of image analysis parameters for the learning moving image content, it can happen that only low similarities are obtained between the gazing point map data of the learning moving image content targeted for evaluation parameter determination and the gazing point map data of the other learning moving image content. In that case, the feature map data used in generating the saliency map data of the learning moving image content is used to increase the number of parameter setting patterns.

Specifically, the saliency map data of the learning moving image content is compared with each individual feature map data to select the physical feature that has a large influence on the saliency map data. The degree of coincidence is then calculated while the weight data for the selected physical feature is varied within a desired range, and the parameters giving the highest degree of coincidence are adopted as additional image analysis parameters.

How well the final evaluation parameters fit the image content differs from one item of image content to another. The evaluation criteria may therefore be varied according to the application of the image content (broadcasting, data distribution, and so on) and its intended use (for unspecified viewers, for specific viewers, and so on), with multiple sets of final evaluation parameters provided, one for each evaluation criterion.

<Evaluation of image content>
Image content for which no viewer gaze measurement experiment has been conducted, such as content still in production, has no gazing point data. For such content, the distribution of viewers' gazing points is estimated. FIG. 14 is a flowchart of the process for estimating the gazing point map data of evaluation target moving image content input from outside. In step S241, when evaluation target moving image content is supplied to the image input unit 21 from outside, the image input unit 21 receives it and stores it in the data storage unit 11.
Next, in step S242, the similar image content determination unit 22 selects, from the group of learning moving image content for which gazing point map data, saliency map data, and optimized evaluation parameters exist, the learning moving image content that is similar to the evaluation target moving image content, and designates it the similar learning moving image content.

Next, in step S243, the gazing point map estimation unit 23 refers to the evaluation parameters of the similar learning moving image content and uses them as the evaluation parameters for calculating saliency map data for the evaluation target moving image content, thereby generating the saliency map data.
In other words, the gazing point map estimation unit 23 takes the evaluation parameters that the parameter determination unit 15 determined for the similar learning moving image content selected by the similar image content determination unit 22, and the image analysis unit 13 performs video analysis of the evaluation target moving image content based on those parameters to generate the saliency map data.
The gazing point map estimation unit 23 then stores the saliency map data in the data storage unit 11.
Next, in step S244, the gazing point map estimation unit 23 reads the saliency map data stored in the data storage unit 11 in step S243 and outputs it externally as estimated gazing point map data. In this way, the distribution of viewers' gazing points for the evaluation target moving image content can be estimated.
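
The flow of steps S242 to S244 can be outlined as below. Everything here is a hypothetical stand-in: `similarity` represents whatever similarity judgment the similar image content determination unit 22 applies, `generate_saliency_map` represents the image analysis unit 13, and content is reduced to a single number so that the example stays runnable.

```python
def estimate_gaze_map(target, library, similarity, generate_saliency_map):
    """Outline of steps S242 to S244.

    `library` is a list of dicts with keys "content" and "parameters";
    `similarity(a, b)` returns a larger value for more similar content.
    """
    if not library:
        raise ValueError("no learning content available")
    # S242: choose the learning content most similar to the evaluation target.
    similar = max(library, key=lambda entry: similarity(target, entry["content"]))
    # S243: analyze the target using the similar content's evaluation parameters.
    saliency_map = generate_saliency_map(target, similar["parameters"])
    # S244: the saliency map is output as the estimated gazing point map data.
    return saliency_map


# Toy stand-ins so the outline can be executed.
library = [
    {"content": 1.0, "parameters": {"wc": 0.5}},
    {"content": 5.0, "parameters": {"wc": 2.0}},
]
estimated = estimate_gaze_map(
    target=4.0,
    library=library,
    similarity=lambda a, b: -abs(a - b),
    generate_saliency_map=lambda content, params: (content, params["wc"]),
)
```

The target value 4.0 is closest to the second library entry, so its parameters are the ones handed to the analysis step.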

The following two examples are methods for judging the similarity between the evaluation target moving image content and the learning moving image content.

The first method applies common evaluation parameters for saliency map data generation to both the evaluation target moving image content and the learning moving image content, and judges the similarity either between the feature maps for a single visual attribute or between the saliency map data obtained from the feature maps for multiple visual attributes.

The second method performs clustering based on features of the moving image content and judges the similarity between items of moving image content from the result. One example is to apply the process for judging similarity between moving images by video analysis of the moving image content that is described in the literature (Keiichiro Hoashi and three others, "Proposal of a CGM moving image content search method using frame clustering", IEICE Technical Committee on Pattern Recognition and Media Understanding, pp. 87-92, October 2007).

Displaying the gazing point distribution estimated as above superimposed on the playback screen of the evaluation target moving image content presents the evaluator with a result that is easy to understand visually.

As described above, this embodiment obtains viewer gaze parameters by using the degree of coincidence between the gaze distribution measured while viewers watch learning moving image content and the saliency map of visual characteristics obtained by video analysis of that content, and stores them in a database. It then adopts the viewer gaze parameters of learning moving image content similar to the evaluation target moving image content as the image analysis parameters for the evaluation target, and performs video analysis of the evaluation target moving image content to estimate the viewers' gaze distribution. In other words, even for image content on which no gaze measurement experiment has been performed, the gazing point distribution can be evaluated easily by using the gazing point map data, saliency map data, and saliency map generation parameters of other image content.

As described above, when generating a saliency map representing the distribution of how readily a person's attention is drawn when viewing an image, this embodiment adjusts the evaluation parameters so that the saliency map becomes more similar to the distribution of gazing points derived from eye movements recorded during actual human observation. As a result, an evaluation close to a human's subjective evaluation of the image can be obtained easily by an objective method that uses the physical features of the image.

In addition, because this embodiment generates the gazing point map data taking into account the visual acuity distribution of the peripheral visual field, the degree of concentration of gazing points can be obtained efficiently for every pixel of a frame image even when the number of subjects is small.

In addition, to determine the optimal final evaluation parameters for the learning moving image content targeted for evaluation parameter determination, this embodiment uses existing learning moving image content to choose, as the initial values of the evaluation parameters, image analysis parameters for which the gazing point distribution is similar and the degree of coincidence is high. Starting from those values, it adjusts the evaluation parameters by varying the six weight data so that a more appropriate saliency map is obtained, and thereby determines the final evaluation parameters. This makes it possible to obtain the optimal final evaluation parameters in less time.

In addition, by repeatedly evaluating desired evaluation target moving image content against various learning moving image content and obtaining final evaluation parameters each time, this embodiment makes it possible to determine the correlation between the evaluation target moving image content and the setting patterns of the weight data in the final evaluation parameters.

It is further desirable to determine, in advance, the correlation between experimental evaluation target moving image content and the setting patterns of the weight data in the final evaluation parameters, by repeatedly evaluating that experimental content against various learning moving image content and obtaining the final evaluation parameters. With that correlation available, the image content evaluation apparatus of this embodiment can evaluate image content and estimate its gazing point map data using only the physical features of the image data, which are objective evaluation material, without conducting a gaze measurement experiment to obtain gazing point data.

The functions of the image content evaluation apparatus of the embodiment described above may be implemented by a computer. In that case, a computer program for implementing the control functions may be recorded on a computer-readable recording medium, and the functions may be realized by having a computer system read and execute the program recorded on that medium. The term "computer system" here includes an OS (operating system) and hardware such as peripheral devices. A "computer-readable recording medium" means a portable recording medium such as a flexible disk, magneto-optical disk, optical disc, or memory card, or a storage device such as a hard disk built into the computer system. A "computer-readable recording medium" may also include something that holds the program dynamically for a short time, such as a communication line used when the program is transmitted over a network such as the Internet or over a communication line such as a telephone line, and something that holds the program for a fixed time, such as volatile memory inside the computer system acting as the server or client in that case. The program may implement only part of the functions described above, or may implement them in combination with a program already recorded in the computer system.

An embodiment of the present invention has been described in detail above with reference to the drawings, but the specific configuration is not limited to this embodiment, and designs and the like within a scope that does not depart from the gist of the present invention are also included.

The present invention can be used, for example, to evaluate image content during the production of content for broadcasting or network distribution. It can likewise be used to evaluate image content during the production of video advertisements presented to the public in public facilities and the like.

DESCRIPTION OF SYMBOLS
1 Image content evaluation apparatus
11 Data storage unit
12 Gazing point data analysis unit
13 Image analysis unit
14 Comparison processing unit
15 Parameter determination unit
21 Image input unit
22 Similar image content determination unit
23 Gazing point map estimation unit

Claims

A moving image content evaluation apparatus comprising:
an image analysis unit that performs video analysis on each of a plurality of learning moving image contents and on evaluation target moving image content, based on image analysis parameters relating to visual attributes, and generates saliency map data indicating a saliency distribution corresponding to the pixels included in the moving image content;
a comparison processing unit that calculates, based on gazing point map data indicating a gaze distribution for the learning moving image content and on the saliency map data, a degree of coincidence serving as an index of similarity between the gazing point map data and the saliency map data;
a parameter determination unit that selects an image analysis parameter based on the degree of coincidence between the gazing point map data and the saliency map data, as calculated by the comparison processing unit from the results of video analysis that the image analysis unit performed on the learning moving image content with different image analysis parameters, sets the selected image analysis parameter as the initial value of an evaluation parameter, determines a final evaluation parameter by the steepest gradient method starting from that initial value, and determines the final evaluation parameter to be the optimal image analysis parameter;
a similar image content determination unit that determines similar learning moving image content, which is learning moving image content similar to the evaluation target moving image content; and
a gazing point map estimation unit that takes, as gazing point map data estimated as the gaze distribution of the evaluation target moving image content, the saliency map data that the image analysis unit generates by performing video analysis on the evaluation target moving image content based on the optimal image analysis parameter that the parameter determination unit determined for the similar learning moving image content determined by the similar image content determination unit.

The moving image content evaluation apparatus according to claim 1, further comprising a gazing point data analysis unit that generates gazing point map data indicating a gaze distribution corresponding to the pixels included in the learning moving image content, based on gazing point data that includes the coordinate values of gazing points obtained by measuring the line of sight with respect to the plurality of learning moving image contents,
wherein the comparison processing unit calculates the degree of coincidence, serving as an index of similarity between the gazing point map data and the saliency map data, based on the saliency map data for each of the plurality of learning moving image contents and on the gazing point map data generated by the gazing point data analysis unit.

A computer program for causing a computer to function as:
an image analysis unit that performs video analysis on each of a plurality of learning moving image contents and on evaluation target moving image content, based on image analysis parameters relating to visual attributes, and generates saliency map data indicating a saliency distribution corresponding to the pixels included in the moving image content;
a comparison processing unit that calculates, based on gazing point map data indicating a gaze distribution for the learning moving image content and on the saliency map data, a degree of coincidence serving as an index of similarity between the gazing point map data and the saliency map data;
a parameter determination unit that selects an image analysis parameter based on the degree of coincidence between the gazing point map data and the saliency map data, as calculated by the comparison processing unit from the results of video analysis that the image analysis unit performed on the learning moving image content with different image analysis parameters, sets the selected image analysis parameter as the initial value of an evaluation parameter, determines a final evaluation parameter by the steepest gradient method starting from that initial value, and determines the final evaluation parameter to be the optimal image analysis parameter;
a similar image content determination unit that determines similar learning moving image content, which is learning moving image content similar to the evaluation target moving image content; and
a gazing point map estimation unit that takes, as gazing point map data estimated as the gaze distribution of the evaluation target moving image content, the saliency map data that the image analysis unit generates by performing video analysis on the evaluation target moving image content based on the optimal image analysis parameter that the parameter determination unit determined for the similar learning moving image content determined by the similar image content determination unit.