JP6553664B2

JP6553664B2 - Model learning device, score calculation device, method, data structure, and program

Info

Publication number: JP6553664B2
Application number: JP2017044172A
Authority: JP
Inventors: 崇之梅田; 豪入江; 隆行黒住; 杵渕　哲也; 哲也杵渕
Original assignee: Nippon Telegraph and Telephone Corp; NTT Inc
Current assignee: Nippon Telegraph and Telephone Corp; NTT Inc
Priority date: 2017-03-08
Filing date: 2017-03-08
Publication date: 2019-07-31
Anticipated expiration: 2037-03-08
Also published as: JP2018147392A

Description

The present invention relates to a model learning device, a score calculation device, a method, a data structure, and a program, and in particular to a model learning device, score calculation device, method, data structure, and program for calculating a similarity score between query data and reference data.

When a trademark application is filed, similarity to registered trademark images is one of the criteria for registration examination.

Many trademark images depict a concept in abstracted form. A "person", for example, may be drawn as a detailed full-body illustration, or may be represented as a stick figure or another combination of primitive symbols such as circles and squares.

Judging whether a filed trademark image is similar to the enormous number of abstract trademark images registered in the past is very difficult in terms of both time efficiency and objectivity.

Under these circumstances, there is demand for a technique that supports similarity judgments on abstracted concepts by trademark examiners and/or by users conducting a pre-filing search.

In addition, a loss function has been proposed that learns from sets of a query, a correct image, and an incorrect image so that the score between the query and the similar image becomes high and, conversely, the score between the query and the dissimilar image becomes low (Non-Patent Document 1).

Non-Patent Document 1: Jiang Wang et al., "Learning Fine-grained Image Similarity with Deep Ranking", Internet <URL: https://arxiv.org/pdf/1404.4661v1.pdf>

An object of the present invention is to provide a model learning device, a score calculation device, a method, a data structure, and a program capable of calculating a similarity score with high accuracy.

To achieve the above object, a model learning device according to a first invention is a model learning device that trains a multilayer neural network that takes data as input and outputs a feature vector. The device includes a learning unit that, based on query data, similar data similar to the query data, and dissimilar data not similar to the query data, learns the parameters of the multilayer neural network, which normalizes the output of a fully connected layer. The learning unit uses a loss function that includes the similarity score between the feature vector of the query data and the feature vector of the similar data output from the network, and the similarity score between the feature vector of the query data and the feature vector of the dissimilar data, and learns the parameters so that the former similarity score becomes high and the latter becomes low.

A model learning method according to a second invention is a model learning method in a model learning device that trains a multilayer neural network that takes data as input and outputs a feature vector. In the method, a learning unit learns the parameters of the multilayer neural network, which normalizes the output of a fully connected layer, based on query data, similar data similar to the query data, and dissimilar data not similar to the query data. The learning unit uses a loss function that includes the similarity score between the feature vector of the query data and the feature vector of the similar data output from the network, and the similarity score between the feature vector of the query data and the feature vector of the dissimilar data, and learns the parameters so that the former similarity score becomes high and the latter becomes low.

A model learning device according to a third invention is a model learning device that trains a multilayer neural network that takes data as input and outputs a feature vector. The device includes a learning unit that learns the parameters of the multilayer neural network based on query data, similar data similar to the query data, and dissimilar data not similar to the query data. The learning unit uses a loss function that includes four terms: the similarity score between the feature vector of the query data and the feature vector of the similar data output from the network; the similarity score between the feature vector of the query data and the feature vector of the dissimilar data; a similarity score between the query data and the similar data calculated using information different from the feature vectors; and a similarity score between the query data and the dissimilar data calculated using information different from the feature vectors. The parameters are learned so that the similarity score between the feature vectors of the query data and the similar data becomes high and the similarity score between the feature vectors of the query data and the dissimilar data becomes low.

A model learning method according to a fourth invention is a model learning method in a model learning device that trains a multilayer neural network that takes data as input and outputs a feature vector. In the method, a learning unit learns the parameters of the multilayer neural network based on query data, similar data similar to the query data, and dissimilar data not similar to the query data. The learning unit uses a loss function that includes four terms: the similarity score between the feature vector of the query data and the feature vector of the similar data output from the network; the similarity score between the feature vector of the query data and the feature vector of the dissimilar data; a similarity score between the query data and the similar data calculated using information different from the feature vectors; and a similarity score between the query data and the dissimilar data calculated using information different from the feature vectors. The parameters are learned so that the similarity score between the feature vectors of the query data and the similar data becomes high and the similarity score between the feature vectors of the query data and the dissimilar data becomes low.

A score calculation device according to a fifth invention is a score calculation device that calculates a score representing the similarity between query data and reference data. It includes: a score calculation unit that, based on the query data and the reference data, obtains the feature vector of the query data and the feature vector of the reference data output from a pre-trained multilayer neural network that normalizes the output of a fully connected layer, and calculates a similarity score between the two feature vectors; an other-method score calculation unit that calculates a similarity score between the query data and the reference data using information different from the feature vectors; and a score integration unit that calculates an integrated score combining the similarity score calculated by the score calculation unit with the similarity score calculated by the other-method score calculation unit.

A data structure according to a sixth invention consists of parameters used to calculate a similarity score between query data and reference data. It is a data structure containing the parameters of a multilayer neural network whose inputs are query data, similar data similar to the query data, and dissimilar data not similar to the query data, and whose outputs, used to obtain cosine similarity, are the normalized feature vector of the query data, the normalized feature vector of the similar data, and the normalized feature vector of the dissimilar data.

A program according to a seventh invention is a program for causing a computer to function as each unit of the above model learning device or score calculation device.

As described above, according to the model learning device, method, and program of the present invention, a model for calculating similarity scores with high accuracy can be learned by using the loss function to learn the parameters of a multilayer neural network that normalizes the output of a fully connected layer, such that the similarity score between the feature vector of the query data and the feature vector of the similar data becomes high and the similarity score between the feature vector of the query data and the feature vector of the dissimilar data becomes low.

According to the score calculation device and program of the present invention, a similarity score can be calculated with high accuracy by obtaining the feature vector of the query data and the feature vector of the reference data output from a multilayer neural network that normalizes the output of a fully connected layer, calculating a similarity score between them, calculating a further similarity score between the query data and the reference data using information different from the feature vectors, and integrating the two scores.

According to the data structure of the present invention, a model for calculating similarity scores with high accuracy can be learned by means of the parameters of a multilayer neural network whose inputs are query data, similar data, and dissimilar data and whose outputs, used to obtain cosine similarity, are the normalized feature vectors of the query data, the similar data, and the dissimilar data.

FIG. 1: (A) a diagram showing the configuration of the DNN "Illustration2Vec", and (B) a diagram showing the configuration of the DNN model of the first embodiment of the present invention.
FIG. 2: A diagram showing an example in which each trademark image is represented in binary, with the presence or absence of each figure as a bit.
FIG. 3: A block diagram showing the configuration of the model learning device according to the first embodiment of the present invention.
FIG. 4: A block diagram showing the configuration of the score calculation device according to the first embodiment of the present invention.
FIG. 5: A flowchart showing the model learning processing routine in the model learning device according to the first embodiment of the present invention.
FIG. 6: A flowchart showing the score calculation processing routine in the score calculation device according to the first embodiment of the present invention.
FIG. 7: A diagram for explaining a method of integrating a score obtained using DNN feature vectors with a score obtained by another method.
FIG. 8: A diagram for explaining a method of learning the DNN parameters using a loss function that includes a score obtained using DNN feature vectors and a score obtained by another method.
FIG. 9: A diagram for explaining a method of integrating a score obtained using DNN feature vectors with a score obtained by another method.
FIG. 10: A block diagram showing the configuration of the model learning device according to the second embodiment of the present invention.
FIG. 11: A block diagram showing the configuration of the score calculation device according to the second embodiment of the present invention.
FIG. 12: A flowchart showing the model learning processing routine in the model learning device according to the second embodiment of the present invention.
FIG. 13: A flowchart showing the score calculation processing routine in the score calculation device according to the second embodiment of the present invention.

Embodiments of the present invention are described in detail below with reference to the drawings.

<Overview of the embodiment of the present invention>
The DNN for illustration image classification, "Illustration2Vec", is designed for multi-class classification of general illustrations (see FIG. 1(A)). The DNN in the embodiment of the present invention therefore makes the following three improvements (see FIG. 1(B)).

(1) Color information is excluded (the input image is converted to grayscale, giving a single channel).
(2) A normalization layer is introduced to adapt the network to cosine similarity.
(3) The triplet loss, a loss function suited to search ranking, is introduced.

The normalization layer performs L2 normalization: each element of the vector is divided by the vector's L2 norm, that is, the square root of the sum of the squares of its elements.

$$ y = \frac{x}{\|x\|_2} = \frac{x}{\sqrt{\sum_i x_i^2}} \quad \cdots (1) $$

Here, y is the output of the normalization layer and corresponds to the feature vector, and x is the output of the preceding layer (in the example of FIG. 1(B), the 4096-dimensional vector output by the fully connected layer). The triplet loss is computed using y.
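The normalization step can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the function name `l2_normalize` and the `eps` guard against division by zero are not from the patent, and a two-element list stands in for the 4096-dimensional output of the fully connected layer.

```python
import math

def l2_normalize(x, eps=1e-12):
    """Divide each element of x by the vector's L2 norm (eps avoids division by zero)."""
    norm = math.sqrt(sum(v * v for v in x))
    return [v / max(norm, eps) for v in x]

# Toy stand-in for the fully connected layer output (the real one is 4096-dimensional).
x = [3.0, 4.0]
y = l2_normalize(x)
print(y)  # [0.6, 0.8]
```

After this step the vector has unit length, which is the property the later cosine-similarity argument relies on.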

The reason for introducing the normalization layer is as follows.

The distance in the triplet loss is computed as a Euclidean distance. In image retrieval, on the other hand, cosine similarity is often the more suitable measure of distance (similarity).

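For two feature vectors $a$ and $b$, the Euclidean distance $d$ is expressed by the following equation.

$$ d(a, b) = \|a - b\|_2 = \sqrt{\sum_i (a_i - b_i)^2} $$

The cosine similarity $\cos$ is expressed by the following equation.

$$ \cos(a, b) = \frac{a \cdot b}{\|a\|_2 \, \|b\|_2} $$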

With L2 normalization, every feature vector has unit norm, so the Euclidean distance between two feature vectors is a monotonically decreasing function of their cosine similarity, and ranking by one is equivalent to ranking by the other. Introducing a normalization layer that L2-normalizes the output of the fully connected layer therefore adapts the network to cosine similarity.
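The equivalence can be made explicit with a short derivation. The symbols $a$ and $b$ are introduced here only for illustration and denote two L2-normalized feature vectors, so that $\|a\|_2 = \|b\|_2 = 1$:

```latex
\begin{aligned}
\|a - b\|_2^2 &= \|a\|_2^2 + \|b\|_2^2 - 2\, a \cdot b \\
              &= 2 - 2\, a \cdot b \\
              &= 2 - 2 \cos(a, b)
\end{aligned}
```

A smaller Euclidean distance therefore always corresponds to a larger cosine similarity, and minimizing one maximizes the other.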

Next, the method of selecting triplet data is described.

Computing the triplet loss requires ground-truth data consisting of both a similar image and a dissimilar image for each query image. In retrieval, however, similarity is in many cases decided by subjective evaluation, and training a DNN requires a large amount of data, so collecting such data is costly.

In the embodiment of the present invention, the "dissimilar image" of each triplet is selected using information other than the image itself.

For example, every registered trademark image is classified according to the Vienna classification of figurative elements. The figure codes assigned to the query image are compared with those of an arbitrary trademark image, and trademark images whose codes match poorly are treated as dissimilar images, which artificially increases the amount of dissimilar-image data.

For example, as shown in FIG. 2, each trademark image is represented as a binary vector in which each bit indicates the presence or absence of a figure, the degree of match is computed as a Hamming distance, and trademark images whose Hamming distance from the query is at or above a threshold are regarded as dissimilar images.
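The selection rule just described might look like the following minimal sketch. The six-bit vocabulary, the image IDs, the threshold value, and the function names are all hypothetical; the patent states only that a Hamming distance over presence/absence bits is compared with a threshold.

```python
def hamming_distance(a, b):
    """Number of bit positions at which two equal-length bit vectors differ."""
    if len(a) != len(b):
        raise ValueError("bit vectors must have the same length")
    return sum(x != y for x, y in zip(a, b))

def select_dissimilar(query_bits, reference_bits, threshold):
    """Return IDs of reference images whose Hamming distance to the query is >= threshold."""
    return [image_id for image_id, bits in reference_bits.items()
            if hamming_distance(query_bits, bits) >= threshold]

# Hypothetical 6-figure vocabulary; each bit marks the presence of one figure code.
query = [1, 1, 0, 0, 0, 0]
references = {
    "tm_001": [1, 1, 0, 0, 0, 0],  # distance 0
    "tm_002": [1, 0, 1, 0, 0, 0],  # distance 2
    "tm_003": [0, 0, 1, 1, 1, 0],  # distance 5
}
print(select_dissimilar(query, references, threshold=4))  # ['tm_003']
```

Because the figure codes already exist for registered trademarks, this yields dissimilar images without any additional manual labeling.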

<First Embodiment>
<System Configuration of Model Learning Device>
FIG. 3 is a block diagram showing the model learning device 100 according to the first embodiment of the present invention. The model learning device 100 is a computer including a CPU, a RAM, and a ROM storing a program for executing the model learning processing routine described later, and is functionally configured as follows.

As shown in FIG. 3, the model learning device 100 according to the present embodiment includes an input unit 10, a calculation unit 20, and an output unit 40.

The input unit 10 receives, as learning data, a plurality of pairs each consisting of a query image and a similar image similar to that query image.

The calculation unit 20 includes a learning data storage unit 21, an image database 22, a dissimilar data selection unit 23, and a learning unit 24.

The learning data storage unit 21 stores, as learning data, the plurality of pairs of query images and similar images received by the input unit 10.

The image database 22 stores a set of reference images.

For each query image in the learning data, the dissimilar data selection unit 23 uses information other than the image to select, from the set of reference images stored in the image database 22, a dissimilar image that is not similar to that query image.

The learning unit 24 learns the parameters of the DNN based on the plurality of pairs of query images and similar images in the learning data and the dissimilar image selected for each query image. The DNN according to the present embodiment normalizes the output of the fully connected layer and outputs a feature vector.

In learning the DNN parameters, a loss function is used that includes (i) the similarity score between the feature vector output by the DNN for the query image and the feature vector output by the DNN for the similar image, and (ii) the similarity score between the feature vector of the query image and the feature vector output by the DNN for the dissimilar image. Each feature vector is the output of the normalization layer given by equation (1). The DNN parameters are learned so that the similarity score between the feature vectors of the query image and the similar image becomes high and the similarity score between the feature vectors of the query image and the dissimilar image becomes low. As shown in FIG. 1(B), the DNN according to the present embodiment includes an input layer that takes the single channel of the input image, a convolutional layer, a fully connected layer, and a normalization layer.
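As a rough illustration of a loss with this behavior, the sketch below uses a hinge-style triplet loss over cosine scores. The hinge form, the `margin` value, and all function names are assumptions made for the example; the text above specifies only that the loss contains the two similarity scores and pushes one up and the other down.

```python
import math

def l2_normalize(x):
    """Scale x to unit L2 norm (assumes x is not the zero vector)."""
    norm = math.sqrt(sum(v * v for v in x))
    return [v / norm for v in x]

def cosine_score(a, b):
    """Similarity score of two L2-normalized vectors: their dot product."""
    return sum(p * q for p, q in zip(a, b))

def triplet_loss(query, similar, dissimilar, margin=0.2):
    """Hinge-style loss: zero once the similar score beats the dissimilar score by `margin`."""
    s_pos = cosine_score(query, similar)
    s_neg = cosine_score(query, dissimilar)
    return max(0.0, margin - s_pos + s_neg)

# Toy 2-dimensional feature vectors standing in for normalization-layer outputs.
q = l2_normalize([1.0, 0.0])
p = l2_normalize([2.0, 0.0])   # same direction as the query
n = l2_normalize([0.0, 3.0])   # orthogonal to the query
print(triplet_loss(q, p, n))   # 0.0, the triplet is already ranked correctly
print(triplet_loss(q, n, p))   # positive, similar and dissimilar roles swapped
```

In training, the gradient of such a loss with respect to the DNN parameters would be obtained by backpropagation; that part is omitted here.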

A conventionally known method (for example, the error backpropagation commonly used in deep learning) may be used as the learning method.

Existing suitable learned parameters (or a part of them) may be used as the initial values of the DNN parameters. For example, parameters learned by Illustration2Vec or VGG may be used. This makes the learning more effective.

The learned DNN parameters are output by the output unit 40. Here, the DNN parameters are parameters used to calculate the similarity score between a query image and a reference image. They constitute a data structure containing the parameters of a multilayer neural network whose inputs are a query image, a similar image, and a dissimilar image, and whose outputs, used to obtain cosine similarity, are the normalized feature vectors of the query image, the similar image, and the dissimilar image.

<System Configuration of Score Calculation Device>
FIG. 4 is a block diagram showing the score calculation device 150 according to the first embodiment of the present invention. The score calculation device 150 is a computer including a CPU, a RAM, and a ROM storing a program for executing the score calculation processing routine described later, and is functionally configured as follows.

As shown in FIG. 4, the score calculation device 150 according to the present embodiment includes an input unit 60, a calculation unit 70, and an output unit 90.

The input unit 60 receives a query image as input.

The calculation unit 70 includes a model storage unit 71, an image database 72, and a score calculation unit 73.

The model storage unit 71 stores the DNN parameters output by the model learning device 100.

The image database 72 stores a set of reference images.

Based on the query image and the set of reference images, the score calculation unit 73 obtains the feature vector output by the DNN for the query image and the feature vector output by the DNN for each reference image (each being the output of the normalization layer given by equation (1)). For each reference image, it then calculates a similarity score representing the cosine similarity between the feature vector of the query image and the feature vector of that reference image.

The output unit 90 outputs the reference images in descending order of similarity score.

<Operation of Model Learning Device>
Next, the operation of the model learning device 100 according to the first embodiment is described. When a plurality of pairs of a query image and a similar image similar to that query image are input to the model learning device 100 as learning data, the model learning device 100 executes the model learning processing routine shown in FIG. 5.

First, in step S100, for each query image in the learning data, a dissimilar image that is not similar to that query image is selected from the set of reference images stored in the image database 22, using information other than the image.

Then, in step S102, the DNN parameters are learned based on the plurality of pairs of query images and similar images in the learning data and the dissimilar image selected for each query image. The parameters are learned so as to optimize a loss function that includes the similarity score between the feature vector output by the DNN for the query image and the feature vector output by the DNN for the similar image, and the similarity score between the feature vector of the query image and the feature vector output by the DNN for the dissimilar image. The learned DNN parameters are output by the output unit 40, and the model learning processing routine ends.

<Operation of Score Calculation Device>
Next, the operation of the score calculation device 150 according to the first embodiment is described. First, the DNN parameters output from the model learning device 100 are input to the score calculation device 150 and stored in the model storage unit 71. Then, when a query image is input to the score calculation device 150, the score calculation device 150 executes the score calculation processing routine shown in FIG. 6.

First, in step S110, the query image is input to the DNN, and the feature vector output from the DNN is obtained.

Then, in step S112, a reference image stored in the image database 72 is input to the DNN, and the feature vector output from the DNN is obtained.

In step S114, a similarity score representing the cosine similarity between the feature vector of the query image obtained in step S110 and the feature vector of the reference image obtained in step S112 is calculated.

In step S116, it is determined whether the processing of steps S112 to S114 has been executed for all reference images stored in the image database 72. If there is a reference image for which steps S112 to S114 have not been executed, the routine returns to step S112 and obtains the feature vector of that reference image. If steps S112 to S114 have been executed for all reference images stored in the image database 72, the routine proceeds to step S118.

In step S118, the output unit 90 outputs the reference images in descending order of the similarity scores calculated in step S114, and the score calculation processing routine ends.
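The routine of steps S110 to S118 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `embed` is a hypothetical stand-in for the learned DNN, `rank_references` and the toy database are invented for the example, and the "images" are plain vectors so that the sketch runs without any model.

```python
import math
from typing import Callable, Dict, List, Sequence, Tuple


def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def rank_references(
    query_image: object,
    reference_images: Dict[str, object],
    embed: Callable[[object], Sequence[float]],
) -> List[Tuple[str, float]]:
    """Return (reference id, score) pairs in descending order of score."""
    query_vec = embed(query_image)  # step S110
    scores = []
    # The loop over every stored reference image plays the role of step S116.
    for ref_id, ref_image in reference_images.items():
        ref_vec = embed(ref_image)  # step S112
        scores.append((ref_id, cosine_similarity(query_vec, ref_vec)))  # step S114
    # Step S118: output in descending order of similarity score.
    return sorted(scores, key=lambda item: item[1], reverse=True)


# Toy usage: `embed` is the identity because the toy "images" are vectors.
toy_db = {"ref_a": [1.0, 0.0], "ref_b": [0.0, 1.0], "ref_c": [1.0, 1.0]}
ranking = rank_references([1.0, 0.1], toy_db, embed=lambda image: image)
```

In practice the reference feature vectors could be computed once and cached, since they do not depend on the query; the text itself does not specify this.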

As described above, the model learning device according to the first embodiment of the present invention uses a loss function to learn the parameters of a DNN that normalizes the output of its fully connected layer, so that the similarity score between the feature vector of the query image and the feature vector of a similar image is high and the similarity score between the feature vector of the query image and the feature vector of a dissimilar image is low. A model for calculating similarity scores accurately can thereby be learned.

Further, the score calculation device according to the first embodiment of the present invention calculates the similarity score between the feature vector of the query image and the feature vector of a similar image based on the DNN parameters learned by the model learning device, and can thereby calculate similarity scores accurately.
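One reason to normalize the fully connected layer output is that, for unit-length vectors, the cosine similarity reduces to a plain dot product. The sketch below assumes the normalization is an L2 normalization; equation (1) is not reproduced in this part of the text, so that choice, the function names, and the sample values are all assumptions made for illustration.

```python
import math
from typing import List, Sequence


def l2_normalize(vec: Sequence[float], eps: float = 1e-12) -> List[float]:
    """Scale a vector to unit Euclidean length (eps guards against zero norm)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / max(norm, eps) for x in vec]


def dot(u: Sequence[float], v: Sequence[float]) -> float:
    return sum(a * b for a, b in zip(u, v))


# Hypothetical fully connected layer outputs for a query and a reference image.
fc_query = [3.0, 4.0]
fc_reference = [4.0, 3.0]

q = l2_normalize(fc_query)
r = l2_normalize(fc_reference)

# After normalization the dot product is the cosine similarity.
score = dot(q, r)

# Cosine similarity computed directly from the unnormalized outputs.
raw_cosine = dot(fc_query, fc_reference) / (
    math.sqrt(dot(fc_query, fc_query)) * math.sqrt(dot(fc_reference, fc_reference))
)
```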

＜Second Embodiment＞
Next, a second embodiment will be described. Parts configured in the same way as in the first embodiment are given the same reference signs, and their description is omitted.

The second embodiment differs from the first embodiment in two respects: the similarity score is calculated by integrating the DNN-based score with a score obtained by another method different from the DNN, and the DNN parameters are learned using a loss function that includes the score obtained by the other method.

＜Overview of the Present Embodiment＞
As shown in FIG. 7, in the present embodiment, the score based on the feature vectors output by the DNN and a score obtained by another method are integrated to obtain the final score. Specifically, the score y (cosine similarity) based on the feature vectors output by the DNN described above and the score x obtained by the other method are integrated as the weighted linear sum ax + by of the two scores to obtain the final score.

A score calculated by any method can be used as the score obtained by the other method. For example, the relevance of tags attached to the images or the degree of agreement of Vienna figure codes can be used.
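As a concrete illustration of an other-method score x and of the weighted linear sum ax + by, the following sketch measures the agreement of figure codes as a Jaccard overlap. The Jaccard measure, the function names, the code strings, and the weights are all assumptions chosen for the example; the text only says that some degree of agreement of the codes can be used.

```python
from typing import Set


def code_agreement(codes_a: Set[str], codes_b: Set[str]) -> float:
    """Jaccard overlap between two sets of figure codes, in the range 0.0 to 1.0."""
    if not codes_a and not codes_b:
        return 0.0
    return len(codes_a & codes_b) / len(codes_a | codes_b)


def integrated_score(x: float, y: float, a: float, b: float) -> float:
    """Weighted linear sum ax + by of the other-method score x and the DNN score y."""
    return a * x + b * y


# Placeholder code strings, used only to exercise the functions.
query_codes = {"02.01.01", "26.01.01"}
reference_codes = {"02.01.01", "26.04.02"}

x = code_agreement(query_codes, reference_codes)  # one shared code out of three distinct codes
y = 0.8  # hypothetical cosine similarity from the DNN feature vectors
final = integrated_score(x, y, a=0.5, b=0.5)
```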

The scores may also be integrated by any alternative method. For example, the scores may be integrated with an expression similar to the one incorporated in the DNN loss function, or by combining them with an SVM or Rank-SVM.

Further, as shown in FIG. 8, the DNN parameters are learned using a loss function that includes both the score based on the feature vectors output by the DNN and the score obtained by the other method.

The loss function in the present embodiment is expressed by the following equation (2).

... (2)

Here, f() denotes the DNN feature vector, and g() denotes the feature vector obtained by the other method. The first two terms concern the similarity of the DNN feature vectors. Because the three terms that follow are fixed values, the DNN can take the contribution of the other method into account and learn only the difference.

The third term is the similarity (fixed value) between the query image and the similar image according to the other method, the fourth term is the similarity (fixed value) between the query image and the dissimilar image according to the other method, and the fifth term is the margin (fixed value). If the value of the third and subsequent terms is negative, it is set to 0.

More than one other-method score may be used; each additional score is handled by adding three terms analogous to the last three terms.

The margin m is a parameter used only during learning and is tuned to the similarity calculated by the other method. For example, if the similarity based on the DNN feature vectors ranges from 0.00 to 1.00 and the similarity from the other method ranges from 100.00 to 101.00, m is set to, for example, -100.

The DNN parameters are learned by backpropagating to the DNN parameters so as to optimize the above loss function.

Although the third and fourth terms are written here in the form of distances, the content of g does not matter; any other form may be used as long as it yields values for the similarity between the query image and the similar image and for the similarity between the query image and the dissimilar image.
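The structure described above (two DNN terms, followed by three fixed terms that are clipped to 0 when negative) can be illustrated with a triplet-style hinge loss. The sketch below is one possible reading of the prose only and is not a reconstruction of equation (2): the squared Euclidean distances, the outer clipping at 0, the function names, and all numbers are assumptions.

```python
from typing import Sequence


def squared_distance(u: Sequence[float], v: Sequence[float]) -> float:
    return sum((a - b) ** 2 for a, b in zip(u, v))


def fixed_other_method_term(d_pos_other: float, d_neg_other: float, margin: float) -> float:
    """Third to fifth terms: fixed values, set to 0 when their sum is negative."""
    return max(0.0, d_pos_other - d_neg_other + margin)


def triplet_loss_with_other_method(
    f_q: Sequence[float],
    f_p: Sequence[float],
    f_n: Sequence[float],
    d_pos_other: float,
    d_neg_other: float,
    margin: float,
) -> float:
    """Hinge on the DNN distance gap plus the fixed other-method term (assumed form)."""
    dnn_part = squared_distance(f_q, f_p) - squared_distance(f_q, f_n)
    other_part = fixed_other_method_term(d_pos_other, d_neg_other, margin)
    # Only dnn_part depends on the DNN parameters; other_part is a constant.
    return max(0.0, dnn_part + other_part)


f_q, f_p, f_n = [1.0, 0.0], [0.8, 0.6], [0.0, 1.0]

# The other method already separates this triplet well: small fixed term.
easy = triplet_loss_with_other_method(f_q, f_p, f_n, 0.2, 0.9, margin=1.5)

# The other method confuses this triplet: large fixed term, so the DNN must do more.
hard = triplet_loss_with_other_method(f_q, f_p, f_n, 0.9, 0.2, margin=1.5)
```

Under this reading, the same DNN feature vectors incur no loss when the other method already ranks the triplet correctly, and a positive loss when it does not.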

The similarity score based on the feature vectors output by the DNN learned as described above and the similarity score obtained by the other method are integrated into an integrated score for each reference image, and reference images similar to the query image are retrieved. For example, as in the following formula, the simple sum of the two scores serves as the integrated score (see FIG. 9).

＜System Configuration of the Model Learning Device＞
FIG. 10 is a block diagram showing a model learning device 200 according to the second embodiment of the present invention. The model learning device 200 is a computer comprising a CPU, a RAM, and a ROM storing a program for executing the model learning processing routine described later, and is functionally configured as follows.

As shown in FIG. 10, the model learning device 200 according to the present embodiment includes an input unit 10, a calculation unit 220, and an output unit 40.

The calculation unit 220 includes a learning data storage unit 21, an image database 22, a dissimilar data selection unit 23, an other-method score calculation unit 223, a learning unit 224, a score calculation unit 226, and a weight learning unit 228.

For each query image in the learning data, the other-method score calculation unit 223 calculates, by another method that does not use the DNN feature vectors, the similarity score between the query image and its similar image and the similarity score between the query image and a dissimilar image that is not similar to the query image.

The learning unit 224 learns the DNN parameters based on the multiple pairs of query images and similar images in the learning data, the dissimilar image selected for each query image, the similarity scores between the query images and the similar images calculated by the other-method score calculation unit 223, and the similarity scores between the query images and the dissimilar images calculated by the other-method score calculation unit 223.

In learning the DNN parameters, the loss function shown in equation (2) above is used. It includes the similarity score between the feature vector output from the DNN for the query image and the feature vector output from the DNN for the similar image, the similarity score between the feature vector of the query image and the feature vector output from the DNN for the dissimilar image, the similarity score between the query image and the similar image calculated by the other-method score calculation unit 223, and the similarity score between the query image and the dissimilar image calculated by the other-method score calculation unit 223. Each DNN feature vector is the output of the normalization layer shown in equation (1) above. The DNN parameters are learned so that the similarity score between the feature vector of the query image and the feature vector of the similar image is high and the similarity score between the feature vector of the query image and the feature vector of the dissimilar image is low.

The learned DNN parameters are output by the output unit 40.

Based on the multiple pairs of query images and similar images and the learned DNN parameters, the score calculation unit 226 obtains the feature vector output from the DNN for each query image and the feature vector output from the DNN for each similar image (each being the output of the normalization layer shown in equation (1) above), and calculates, for each pair of a query image and a similar image, a similarity score representing the cosine similarity between the two feature vectors.

Likewise, based on the multiple pairs of query images and dissimilar images and the learned DNN parameters, the score calculation unit 226 obtains the feature vector output from the DNN for each query image and the feature vector output from the DNN for each dissimilar image (each being the output of the normalization layer shown in equation (1) above), and calculates, for each pair of a query image and a dissimilar image, a similarity score representing the cosine similarity between the two feature vectors.

Based on the similarity score y calculated by the score calculation unit 226 and the similarity score x calculated by the other-method score calculation unit 223, the weight learning unit 228 learns the weights a and b used to obtain the integrated score, so that the integrated score (ax + by) is high for pairs of a query image and a similar image and low for pairs of a query image and a dissimilar image. The learned weights are output by the output unit 40.
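The text does not fix how the weights a and b are learned. As a minimal sketch, the weights can be chosen by a coarse grid search that maximizes the number of queries whose similar pair receives a higher integrated score than its dissimilar pair. The grid search, the constraint b = 1 - a, the function names, and the sample scores are assumptions made for illustration; they stand in for whatever learner (for example an SVM or Rank-SVM) is actually used.

```python
from typing import List, Tuple

Pair = Tuple[float, float]  # (x, y): other-method score and DNN cosine score


def count_correct_orderings(
    a: float, b: float, similar: List[Pair], dissimilar: List[Pair]
) -> int:
    """Count queries whose similar pair outscores its dissimilar pair under ax + by."""
    return sum(
        1
        for (xs, ys), (xd, yd) in zip(similar, dissimilar)
        if a * xs + b * ys > a * xd + b * yd
    )


def learn_weights(
    similar: List[Pair], dissimilar: List[Pair], steps: int = 10
) -> Tuple[float, float]:
    """Grid search over a in [0, 1] with b = 1 - a; keeps the first best setting."""
    best = (0.0, 1.0)
    best_count = -1
    for i in range(steps + 1):
        a = i / steps
        b = 1.0 - a
        count = count_correct_orderings(a, b, similar, dissimilar)
        if count > best_count:
            best, best_count = (a, b), count
    return best


# One similar pair and one dissimilar pair per query, aligned by index.
similar_pairs = [(0.9, 0.2), (0.1, 0.9), (0.6, 0.7)]
dissimilar_pairs = [(0.3, 0.4), (0.5, 0.3), (0.2, 0.5)]

a, b = learn_weights(similar_pairs, dissimilar_pairs)
```

With these sample scores neither score alone orders all three queries correctly (a = 0 fails on the first query, a = 1 on the second), while a mixed weighting does.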

＜System Configuration of the Score Calculation Device＞
FIG. 11 is a block diagram showing a score calculation device 250 according to the second embodiment of the present invention. The score calculation device 250 is a computer comprising a CPU, a RAM, and a ROM storing a program for executing the score calculation processing routine described later, and is functionally configured as follows.

As shown in FIG. 11, the score calculation device 250 according to the present embodiment includes an input unit 60, a calculation unit 270, and an output unit 90.

The calculation unit 270 includes a model storage unit 71, an image database 72, a score calculation unit 73, an other-method score calculation unit 274, and a score integration unit 275.

The model storage unit 71 stores the DNN parameters output by the model learning device 200.

Based on the query image and the set of reference images, the other-method score calculation unit 274 calculates, for each reference image, a similarity score representing the similarity between the query image and that reference image, using another method that does not rely on the feature vectors output from the DNN.

Using the weights output by the model learning device 200, the score integration unit 275 integrates, for each reference image, the similarity score y for that reference image calculated by the score calculation unit 73 and the similarity score x for that reference image calculated by the other-method score calculation unit 274, and calculates the integrated score (ax + by).

The output unit 90 outputs the reference images in descending order of the integrated score.

＜Operation of the Model Learning Device＞
Next, the operation of the model learning device 200 according to the second embodiment will be described. When multiple pairs, each consisting of a query image and a similar image that is similar to that query image, are input to the model learning device 200 as learning data, the model learning device 200 executes the model learning processing routine shown in FIG. 12.

First, in step S100, for each query image in the learning data, a dissimilar image that is not similar to the query image is selected from the set of reference images stored in the image database 22, using information other than the image itself.
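Step S100 can be illustrated with a small sketch that picks dissimilar candidates from metadata alone. The use of tag sets, the "no shared tag" criterion, the function name, and the sample data are assumptions made for the example; this passage only states that information other than the image is used.

```python
from typing import Dict, List, Set


def select_dissimilar(
    query_tags: Set[str], reference_tags: Dict[str, Set[str]], limit: int = 1
) -> List[str]:
    """Return up to `limit` reference ids whose tags share nothing with the query's tags."""
    candidates = [
        ref_id
        for ref_id, tags in sorted(reference_tags.items())  # sorted for determinism
        if not (tags & query_tags)
    ]
    return candidates[:limit]


# Hypothetical metadata attached to the reference images.
reference_tags = {
    "ref_a": {"person", "circle"},
    "ref_b": {"tree"},
    "ref_c": {"star", "square"},
}

chosen = select_dissimilar({"person", "stick figure"}, reference_tags, limit=2)
```

Here `ref_a` is excluded because it shares the tag "person" with the query, so it is never offered as a dissimilar example.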

Then, in step S200, for each query image in the learning data, the similarity score between the query image and its similar image and the similarity score between the query image and the dissimilar image selected in step S100 are calculated by another method that does not use the DNN feature vectors.

In step S202, the DNN parameters are learned based on the multiple pairs of query images and similar images in the learning data, the dissimilar image selected for each query image in step S100, and the similarity scores calculated in step S200. The parameters are learned so as to optimize the loss function shown in equation (2) above, which includes the similarity score between the feature vector output from the DNN for the query image and the feature vector output from the DNN for the similar image, the similarity score between the feature vector of the query image and the feature vector output from the DNN for the dissimilar image, the similarity score between the query image and the similar image calculated in step S200, and the similarity score between the query image and the dissimilar image calculated in step S200. The learned DNN parameters are output by the output unit 40.

In step S204, based on the multiple pairs of query images and similar images and the learned DNN parameters, the feature vector output from the DNN for each query image and the feature vector output from the DNN for each similar image are obtained, and a similarity score representing the cosine similarity between the two feature vectors is calculated for each pair of a query image and a similar image.

Likewise, based on the multiple pairs of query images and dissimilar images and the learned DNN parameters, the feature vector output from the DNN for each query image and the feature vector output from the DNN for each dissimilar image are obtained, and a similarity score representing the cosine similarity between the two feature vectors is calculated for each pair of a query image and a dissimilar image.

Then, in step S206, based on the similarity scores calculated in step S204 and the similarity scores calculated in step S200, the weights used to obtain the integrated score by weighted addition are learned so that the integrated score is high for pairs of a query image and a similar image and low for pairs of a query image and a dissimilar image. The learned weights are output by the output unit 40, and the model learning processing routine ends.

＜Operation of the Score Calculation Device＞
Next, the operation of the score calculation device 250 according to the second embodiment will be described. First, the DNN parameters output from the model learning device 200 are input to the score calculation device 250 and stored in the model storage unit 71. Then, when a query image is input to the score calculation device 250, the score calculation device 250 executes the score calculation processing routine shown in FIG. 13.

First, in step S110, the query image is input to the DNN, and the feature vector output from the DNN is obtained.

In step S210, based on the query image and the set of reference images, a similarity score representing the similarity between the query image and each reference image is calculated for each reference image, using another method that does not rely on the feature vectors output from the DNN.

In step S212, using the weights output by the model learning device 200, the similarity score for each reference image calculated by the score calculation unit 73 and the similarity score for that reference image calculated by the other-method score calculation unit 274 are integrated by weighted addition to calculate the integrated score.

In step S116, it is determined whether the processing of steps S112 to S114 and S210 to S212 has been executed for all reference images stored in the image database 72. If there is a reference image for which these steps have not been executed, the routine returns to step S112 and obtains the feature vector of that reference image. If these steps have been executed for all reference images stored in the image database 72, the routine proceeds to step S118.

In step S118, the output unit 90 outputs the reference images in descending order of the integrated scores calculated in step S212, and the score calculation processing routine ends.
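The second-embodiment routine differs from the first only in that each reference image receives two scores that are merged before ranking. The sketch below assumes the two per-reference scores are available through hypothetical callables `dnn_score` and `other_score`; the names, the weights, and the sample values are assumptions made for illustration.

```python
from typing import Callable, Iterable, List, Tuple


def rank_by_integrated_score(
    reference_ids: Iterable[str],
    dnn_score: Callable[[str], float],
    other_score: Callable[[str], float],
    a: float,
    b: float,
) -> List[Tuple[str, float]]:
    """Rank reference images by the integrated score ax + by, highest first."""
    results = []
    for ref_id in reference_ids:  # the loop plays the role of step S116
        y = dnn_score(ref_id)  # cosine similarity of DNN feature vectors
        x = other_score(ref_id)  # step S210: score from the other method
        results.append((ref_id, a * x + b * y))  # step S212: weighted addition
    return sorted(results, key=lambda item: item[1], reverse=True)  # step S118


# Hypothetical precomputed scores for three reference images.
y_scores = {"ref_a": 0.9, "ref_b": 0.4, "ref_c": 0.7}
x_scores = {"ref_a": 0.1, "ref_b": 0.8, "ref_c": 0.9}

ranking = rank_by_integrated_score(
    y_scores, y_scores.__getitem__, x_scores.__getitem__, a=0.5, b=0.5
)
dnn_only = sorted(y_scores, key=y_scores.__getitem__, reverse=True)
```

With these sample values the integrated ranking places `ref_c` first, whereas ranking by the DNN score alone would place `ref_a` first, which shows how the other-method score can change the output order.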

As described above, the model learning device according to the second embodiment of the present invention uses a loss function that includes the similarity score between the feature vector of the query image and the feature vector of the similar image, the similarity score between the feature vector of the query image and the feature vector of the dissimilar image, the similarity score between the query image and the similar image calculated by the other method, and the similarity score between the query image and the dissimilar image calculated by the other method. With this loss function, it learns the parameters of a DNN that normalizes the output of its fully connected layer, so that the similarity score between the feature vector of the query image and the feature vector of the similar image is high and the similarity score between the feature vector of the query image and the feature vector of the dissimilar image is low. A model for calculating similarity scores accurately can thereby be learned. In particular, because the loss function includes the other-method similarity scores (fixed values), which are calculated independently of the DNN, and the error is backpropagated only to the DNN side, the DNN parameters can be learned so that the DNN does even better on images whose similarity it already judges well from its feature vectors, and spends less effort on images whose similarity the other method already judges well.

Further, the score calculation device according to the second embodiment of the present invention obtains the feature vector of the query image and the feature vector of the reference image output from a DNN that normalizes the output of its fully connected layer, calculates a similarity score from them, calculates a similarity score between the query image and the reference image by the other method, and integrates the two scores. Similarity scores can thereby be calculated accurately.

The present invention is not limited to the embodiments described above, and various modifications and applications are possible without departing from the gist of the invention.

For example, although the above embodiments describe a DNN that includes convolutional layers and fully connected layers, the invention is not limited to this. For example, the part consisting of the convolutional layers and the fully connected layers may be replaced with any network structure, such as an RNN or an LSTM.

Any network structure, such as a CNN, an RNN, or an LSTM, may also be used as the multilayer neural network.

Although the calculation of similarity scores for images has been described as an example, the invention is not limited to this; video or audio may be targeted, and similarity scores for video or audio may be calculated.

The model learning devices 100 and 200 and the score calculation devices 150 and 250 described above each contain a computer system. When a WWW system is used, the "computer system" also includes the environment for providing (or displaying) web pages.

For example, although this specification describes embodiments in which the program is installed in advance, the program may also be provided stored on a computer-readable recording medium.

10, 60 Input unit
20, 70, 220, 270 Calculation unit
21 Learning data storage unit
22 Image database
23 Dissimilar data selection unit
24, 224 Learning unit
40 Output unit
71 Model storage unit
72 Image database
73, 226 Score calculation unit
90 Output unit
100, 200 Model learning device
150, 250 Score calculation device
223, 274 Other-method score calculation unit
228 Weight learning unit
275 Score integration unit

Claims

A model learning device that learns a multilayer neural network that takes data as input and outputs a feature vector, the device including:
a learning unit that, based on query data, similar data that is similar to the query data, and dissimilar data that is not similar to the query data, learns parameters of the multilayer neural network, which normalizes the output of a fully connected layer, using a loss function that includes a similarity score between the feature vector of the query data and the feature vector of the similar data output from the multilayer neural network and a similarity score between the feature vector of the query data and the feature vector of the dissimilar data, such that the similarity score between the feature vector of the query data and the feature vector of the similar data becomes high and the similarity score between the feature vector of the query data and the feature vector of the dissimilar data becomes low.

The model learning device according to claim 1, further including a dissimilar data selection unit that selects, from a set of reference data and using information other than the data itself, the dissimilar data that is not similar to the query data,
wherein the learning unit learns the parameters of the multilayer neural network based on the query data, the similar data, and the selected dissimilar data.

A model learning device that learns a multilayer neural network that takes data as input and outputs a feature vector, the device including:
a learning unit that, based on query data, similar data that is similar to the query data, and dissimilar data that is not similar to the query data, learns parameters of the multilayer neural network using a loss function that includes a similarity score between the feature vector of the query data and the feature vector of the similar data output from the multilayer neural network, a similarity score between the feature vector of the query data and the feature vector of the dissimilar data, a similarity score between the query data and the similar data calculated using information different from the feature vectors, and a similarity score between the query data and the dissimilar data calculated using information different from the feature vectors, such that the similarity score between the feature vector of the query data and the feature vector of the similar data becomes high and the similarity score between the feature vector of the query data and the feature vector of the dissimilar data becomes low.

A score calculation device that calculates a score representing the similarity between query data and reference data, the device including:
a score calculation unit that, based on the query data and the reference data, calculates a similarity score between a feature vector of the query data and a feature vector of the reference data, each output from a multilayer neural network that has been learned in advance and that normalizes the output of a fully connected layer;
an other-method score calculation unit that calculates a similarity score between the query data and the reference data using information different from the feature vectors; and
a score integration unit that calculates an integrated score by integrating the similarity score calculated by the score calculation unit and the similarity score calculated by the other-method score calculation unit.

A data structure of parameters used to calculate a similarity score between query data and reference data,
the data structure including parameters of a multilayer neural network whose inputs are query data, similar data that is similar to the query data, and dissimilar data that is not similar to the query data, and whose outputs are a normalized feature vector of the query data, a normalized feature vector of the similar data, and a normalized feature vector of the dissimilar data, each for obtaining a cosine similarity.

A model learning method in a model learning device that learns a multilayer neural network that takes data as input and outputs a feature vector, the method including:
a learning unit learning, based on query data, similar data that is similar to the query data, and dissimilar data that is not similar to the query data, parameters of the multilayer neural network, which normalizes the output of a fully connected layer, using a loss function that includes a similarity score between the feature vector of the query data and the feature vector of the similar data output from the multilayer neural network and a similarity score between the feature vector of the query data and the feature vector of the dissimilar data, such that the similarity score between the feature vector of the query data and the feature vector of the similar data becomes high and the similarity score between the feature vector of the query data and the feature vector of the dissimilar data becomes low.

A model learning method in a model learning device that learns a multilayer neural network that takes data as input and outputs a feature vector, the method including:
a learning unit learning, based on query data, similar data that is similar to the query data, and dissimilar data that is not similar to the query data, parameters of the multilayer neural network using a loss function that includes a similarity score between the feature vector of the query data and the feature vector of the similar data output from the multilayer neural network, a similarity score between the feature vector of the query data and the feature vector of the dissimilar data, a similarity score between the query data and the similar data calculated using information different from the feature vectors, and a similarity score between the query data and the dissimilar data calculated using information different from the feature vectors, such that the similarity score between the feature vector of the query data and the feature vector of the similar data becomes high and the similarity score between the feature vector of the query data and the feature vector of the dissimilar data becomes low.

A program for causing a computer to function as each unit of the model learning device according to any one of claims 1 to 3 or of the score calculation device according to claim 4.