JP5728534B2

JP5728534B2 - Integrated classifier learning apparatus, integrated classifier learning method, and integrated classifier learning program

Info

Publication number: JP5728534B2
Application number: JP2013139244A
Authority: JP
Inventors: 上田　修功
Original assignee: Nippon Telegraph and Telephone Corp
Current assignee: Nippon Telegraph and Telephone Corp
Priority date: 2013-07-02
Filing date: 2013-07-02
Publication date: 2015-06-03
Anticipated expiration: 2033-07-02
Also published as: JP2015011686A

Description

The present invention relates to an integrated classifier learning apparatus, an integrated classifier learning method, and an integrated classifier learning program, and more particularly to an apparatus, method, and program for constructing an integrated classifier that identifies a class by integrating the classification results of a plurality of classifiers.

Generally, in supervised learning, observation data and class labels indicating the class to which each observation belongs are given as learning data. Features are extracted from the observation data by some method and are usually expressed as a feature vector. A classifier is then trained, on the set of feature vector and class label pairs (the learning data), so as to classify the learning data correctly. Here, classification means that, with K classes, a feature vector is assigned to exactly one of the K classes.

The trained classifier is then used to classify test data whose class labels are unknown. What matters for the classifier is its classification accuracy not only on the learning data but also on the test data (generalization performance). A classifier that adapts excessively to the learning data and classifies test data poorly is impractical. This problem is called overfitting and is an important practical issue in classifier design.

Classifiers have been studied intensively in the fields of pattern recognition and machine learning, and many classifiers are now available. However, when a new classification task arises and there is no prior or domain knowledge about which features are effective for classification, it is difficult to design an appropriate feature vector for the observation data. Moreover, when the number of classes is large, there is a limit to how well a feature vector effective for discriminating all classes can be designed in the first place. That is, a single type of feature vector or a single classifier often yields a biased classifier whose classification performance is high only for learning data of certain classes and insufficient for learning data of other classes.

For this reason, the classification result (class label) of test data has been obtained using an integrated classifier that integrates the class labels output as classification results by a plurality of classifiers.

Techniques for integrating the classification results (class labels) of a plurality of classifiers are called ensemble learning, and several have been proposed. The simplest takes a majority vote over the classification results (class labels) of the classifiers (see, for example, Non-Patent Documents 1 and 2). Specifically, a plurality of subsets (partial learning data sets) are prepared by random sampling from a labeled learning data set consisting of pairs of feature vectors and class labels. When there are J partial learning data sets, a classifier is trained on each of them, which yields J classifiers. To classify a test data item whose class label is unknown, the item is classified by the J classifiers, giving J classification results (class labels). The class that appears most often among the J class labels is then taken as the classification result (class label) of the test data item. It has been shown both theoretically and experimentally that, when the results of a single classifier are unstable, taking a majority vote improves the reliability of the classification result. However, this technique presupposes that each classifier performs reasonably well; when many poorly performing classifiers are included, the performance is only about the same as that of a single classifier.
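The majority-vote integration described above can be sketched as follows. This is a minimal illustration only: the function name `majority_vote`, the example votes, and the tie-breaking rule (the smallest label wins) are assumptions and do not come from the cited documents.

```python
from collections import Counter


def majority_vote(labels):
    """Return the most frequent class label among the J classifier outputs.

    Ties are broken in favour of the smallest label (an assumption made
    here for determinism; the text does not specify a tie-breaking rule).
    """
    counts = Counter(labels)
    best = max(counts.values())
    return min(label for label, count in counts.items() if count == best)


# Hypothetical outputs of J = 5 classifiers for one test data item.
votes = [2, 1, 2, 3, 2]
predicted = majority_vote(votes)
```

With these hypothetical votes, class 2 appears three times out of five and is therefore selected.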

Another approach trains a meta-level classifier using the classification results (class labels) of the classifiers as new features (see, for example, Non-Patent Document 3). Specifically, with J classifiers, a J-dimensional vector whose elements are the classification results (class labels) of the individual classifiers can be constructed for each learning data item.

The j-th element corresponds to the classification result of the j-th classifier. By listing, for each learning data item, the classification results of the J trained classifiers, one J-dimensional feature vector is obtained per learning data item. That is, from N learning data items each consisting of a class label and the original feature vector (hereinafter, primary learning data), N learning data items each consisting of a class label and a J-dimensional feature vector (hereinafter, secondary learning data) are obtained. Non-Patent Document 3 proposes a Bayesian learning method, Bayesian Classifier Combination (BCC), for constructing a classifier with the secondary learning data as new learning data, and shows experimentally that it improves classification performance on test data compared with majority voting. The idea of constructing a new classifier from secondary learning data consisting of the classification results of a plurality of classifiers is known as stacked regression, proposed in Non-Patent Document 4 and elsewhere. Non-Patent Document 3 can be regarded as a Bayesian formulation of stacked regression.

Non-Patent Document 1: Breiman, L., 1996. Bagging predictors. Machine Learning, 24, 123-140.
Non-Patent Document 2: Dietterich, T. G., 2000. Ensemble methods in machine learning. In Proceedings of the First International Workshop on Multiple Classifier Systems, Springer-Verlag, London, UK, 1-15.
Non-Patent Document 3: Kim, H.C. & Ghahramani, Z., 2012. Bayesian classifier combination. In Proceedings of International Conference on Artificial Intelligence and Statistics, AISTATS2012, http://www.aistats.org/papers.php.
Non-Patent Document 4: Wolpert, D. H., 1992. Stacked generalization. Neural Networks, 5, 241-259.

Conventional integration techniques presuppose that each classifier performs reasonably well, so when poorly performing classifiers are present, integration cannot be expected to help. In practice, unless feature extraction effective for the target classification task has been achieved, the performance of a single classifier is limited. As a result, a classifier may perform well on one class but poorly on another. This phenomenon becomes more pronounced as the number of classes increases.

When this phenomenon occurs, there is a concern that the classification accuracy of an integrated classifier that integrates the classification results of a plurality of classifiers will deteriorate.

The present invention has been made in view of the above problem, and an object thereof is to provide an integrated classifier learning apparatus, an integrated classifier learning method, and an integrated classifier learning program capable of improving the classification accuracy of an integrated classifier.

To achieve the above object, the integrated classifier learning apparatus of the present invention trains an integrated classifier that identifies a class by integrating the classification results output from each of a plurality of classifiers. For each of a plurality of data items of primary learning data, each item being a pair of the feature vector of the data item and the true class to which it belongs, the apparatus acquires secondary learning data consisting of a pair of the classification results obtained by having each of the plurality of classifiers identify the class of the data item and the true class to which the data item belongs. Based on the acquired secondary learning data, and using whether or not each of the plurality of classifiers outputs classification results that are consistent with respect to the true class, the apparatus trains an integrated classifier that identifies a class by integrating the classification results produced by the plurality of classifiers.

Preferably, the integrated classifier learning apparatus of the present invention trains, based on the acquired secondary learning data, the integrated classifier so that it includes, for each combination of a classifier and a class ω_k, a binary latent variable r_j^(k) indicating whether or not the j-th classifier outputs consistent classification results for data whose true class is ω_k.

Preferably, the integrated classifier learning apparatus of the present invention also learns, based on the acquired secondary learning data, the parameters α, β, a, and b of the generative model of the secondary learning data expressed by equations (I) to (V) below, by means of equation (VI) below, and determines, for each combination of a classifier and a class ω_k, the value of the latent variable r_j^(k) according to the probability distribution over its values calculated by equations (VII) and (VIII) below.

Here, c_{i,j}^(k) denotes the classification result of the j-th classifier contained in the i-th secondary learning data item whose true class is ω_k; R denotes the set of latent variables r_j^(k); r_j^(k) = 1 indicates that the j-th classifier outputs consistent classification results for data whose true class is ω_k, and r_j^(k) = 0 indicates that it does not; r_\(k,j) denotes the set of latent variables excluding r_j^(k); C denotes the set of secondary learning data; n_{j,l}^(k) denotes the number of data items belonging to the true class ω_k that the j-th classifier identified as class ω_l; β_{j,l}^(k) is a parameter of the Dirichlet distribution that generates the probability with which the j-th classifier, when it outputs consistent classification results for data whose true class is ω_k, identifies data belonging to the true class ω_k as class ω_l; α_l is a parameter of the Dirichlet distribution that generates the probability with which a classifier that does not output consistent classification results identifies data as class ω_l; N^(k) denotes the number of data items whose true class is ω_k; δ(x, y) is the delta function; and Γ(x) is the gamma function.

The integrated classifier learning method of the present invention is a method, performed in an integrated classifier learning apparatus, for training an integrated classifier that integrates the classification results output from each of a plurality of classifiers. The method includes a step in which the integrated classifier learning apparatus acquires, for each of a plurality of data items of primary learning data, each item being a pair of the feature vector of the data item and the true class to which it belongs, secondary learning data consisting of a pair of the classification results obtained by having each of the plurality of classifiers identify the class of the data item and the correct class to which the data item belongs, and, based on the acquired secondary learning data and using whether or not each of the plurality of classifiers outputs classification results that are consistent with respect to the true class, trains an integrated classifier that integrates the classification results produced by the plurality of classifiers.

The integrated classifier learning program of the present invention causes a computer to function as the integrated classifier learning apparatus of the present invention.

The integrated classifier learning apparatus, integrated classifier learning method, and integrated classifier learning program of the present invention have the effect of improving the classification accuracy of the integrated classifier.

A schematic diagram outlining an example of the identification system of the present embodiment.
A flowchart showing an example of the flow of the learning operation of the integrated classifier performed by the integrated classifier learning apparatus of the present embodiment.
An explanatory diagram showing a specific example of the secondary learning data in the integrated classifier learning apparatus of the identification system of the present embodiment.
A flowchart showing an example of the flow of the test data classification operation performed by the integrated classification result output device of the present embodiment.
An explanatory diagram showing the classification rate of an HMM (a single classifier) trained on a feature representation with a window size of 48 in the present example.
An explanatory diagram showing the classification rates of a conventional integrated classifier and the integrated classifier of the present embodiment.

Embodiments of the present invention are described in detail below with reference to the drawings. The present embodiment does not limit the present invention.

FIG. 1 is a schematic configuration diagram outlining an example of the identification system of the present embodiment. The identification system 10 of the present embodiment has a function of generating secondary learning data 22 based on the results of classifying the primary learning data 12 with the classifiers 14, a function of training the integrated classifier 24 with the integrated classifier learning apparatus 20 using the secondary learning data 22 as teacher data, and a function of classifying test data 30 with the integrated classification result output device 32 using the integrated classifier 24.

In the identification system 10 of the present embodiment, the integrated classifier learning apparatus 20, the integrated classification result output device 32, and so on are implemented, in order to realize the above functions, by a computer that includes a CPU (Central Processing Unit), a RAM (Random Access Memory), a ROM (Read Only Memory), and the like. The operations described in detail below are carried out by the CPU executing a program stored in the ROM. The primary learning data 12, the secondary learning data 22, and the test data 30 are stored in a storage unit such as an HDD (Hard Disk Drive) or on a storage medium. Alternatively, they may be stored in a storage unit or on a storage medium outside the identification system 10 and loaded into the identification system 10 as needed when each function is executed.

First, the function of generating the secondary learning data 22 based on the results of classifying the primary learning data 12 with the classifiers 14 is described. Each item of primary learning data 12 is a pair of a feature vector and a true class label. Using the set of primary learning data 12, J classifiers 14 (the first classifier 14_1 to the J-th classifier 14_J) are trained independently. Hereinafter, the classifiers are referred to collectively simply as the classifiers 14, and an individual classifier is referred to with its own reference sign, as in the j-th classifier 14_j. In the present embodiment, the type of each classifier 14 and the method used to train it are not particularly limited and may be chosen freely.

The set of primary learning data 12 is D = {d_1, ..., d_N}, consisting of N learning data items, where d_i = (x_i, y_i) is the i-th learning data item, x_i is its feature vector, and y_i is its class label. When the learning data belong to K classes, the class label takes one of the values 1, 2, ..., K. Let D_{-i} denote the set obtained by removing the i-th learning data item d_i from D. After the J classifiers 14 are trained on D_{-i}, the feature vector x_i of the i-th learning data item d_i, which was not used in training, is input to the J trained classifiers 14 to obtain J classification results. Executing this process for i = 1, ..., N yields the set of secondary learning data 22. In this case, each item of secondary learning data 22 is a pair of a J-dimensional feature vector (each element of which is a class label) and the true class label. That is, the set of secondary learning data 22 contains N such pairs.
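The leave-one-out procedure above can be sketched as follows. The embodiment leaves the type of classifier open, so the two toy learners here (`train_nearest_neighbour` and `train_majority_class`), the scalar features, and the data values are all hypothetical stand-ins chosen only to keep the sketch self-contained.

```python
from collections import Counter


def train_nearest_neighbour(data):
    """Hypothetical learner: 1-nearest-neighbour on a scalar feature."""
    def predict(x):
        return min(data, key=lambda d: abs(d[0] - x))[1]
    return predict


def train_majority_class(data):
    """Hypothetical learner: always predicts the most frequent label."""
    label = Counter(y for _, y in data).most_common(1)[0][0]
    return lambda x: label


def make_secondary_data(primary, trainers):
    """Build (J predicted labels, true label) pairs by leave-one-out."""
    secondary = []
    for i, (x_i, y_i) in enumerate(primary):
        held_out = primary[:i] + primary[i + 1:]        # D_{-i}
        predictors = [train(held_out) for train in trainers]
        c_i = [predict(x_i) for predict in predictors]  # J class labels
        secondary.append((c_i, y_i))
    return secondary


# Hypothetical primary learning data: (scalar feature x_i, class label y_i).
primary = [(0.0, 1), (0.1, 1), (0.2, 1), (0.3, 1), (1.0, 2), (1.1, 2)]
secondary = make_secondary_data(
    primary, [train_nearest_neighbour, train_majority_class]
)
```

Each entry of `secondary` pairs a J-dimensional vector of predicted labels (here J = 2) with the true label, and no data item is ever classified by a model that saw it during training.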

Next, the function of training the integrated classifier 24 with the integrated classifier learning apparatus 20 of the present embodiment, using the secondary learning data 22 as teacher data, is described. The training technique, which learns from the observation data C (a set of known classification results) an integrated classifier 24 that classifies as many test data 30 as possible correctly, is the core technique of the integrated classifier learning apparatus 20 of the identification system 10 of the present embodiment. FIG. 2 is a flowchart showing an example of the flow of the learning operation of the integrated classifier 24 performed by the integrated classifier learning apparatus 20.

In the integrated classifier learning apparatus 20 of the present embodiment, first, in step S100, the set of secondary learning data 22 based on the classification results of the J classifiers 14 (the first classifier 14_1 to the J-th classifier 14_J) is acquired.

In the next step S102, the parameters of the generative model of the secondary learning data 22 (α, β, a, and b, described in detail later) are estimated from the acquired set of secondary learning data 22 and stored. In the present embodiment, these parameters are stored in a storage unit (not shown) provided, for example, in the integrated classifier learning apparatus 20.

In the next step S104, the set of latent variables R = {r_j^(k)} is determined from the estimated parameters and stored, and the operation ends. In the present embodiment, the set R of latent variables is stored, like the parameters of the generative model of the secondary learning data 22, in a storage unit (not shown) provided, for example, in the integrated classifier learning apparatus 20.

By estimating the parameters of the generative model of the secondary learning data 22 and determining and storing the set R of latent variables in this way, the integrated classifier learning apparatus 20 of the present embodiment automatically learns which of the plurality of (J in the present embodiment) classifiers 14 is effective for identifying which class. In the integrated classifier learning apparatus 20 of the present embodiment, the criterion for judging whether a classifier 14 is effective is not limited to whether its classification results are correct. The apparatus judges a classifier 14 that outputs consistent classification results for a true class (the class to which the learning data should properly belong) to be effective for identifying that class. Intuitively, the apparatus models and learns the idea that, if a certain classifier 14 frequently classifies primary learning data 12 of class 1 as class 2, then when that classifier 14 outputs class 2 the true class is in fact likely to be class 1.
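The intuition in the last sentence can be illustrated with simple empirical frequencies. The following sketch is not the generative model of the embodiment; it only normalizes one column of a hypothetical confusion-count table for a single classifier, and the function name `posterior_true_class` and the counts are assumptions.

```python
def posterior_true_class(confusion, output_label):
    """Empirical P(true class | classifier output) for one classifier.

    confusion[k][l] is the number of data items of true class k+1 that
    the classifier labelled l+1 (hypothetical counts). Returns one
    probability per true class; uniform if the label was never output.
    """
    column = [row[output_label - 1] for row in confusion]
    total = sum(column)
    if total == 0:
        return [1.0 / len(confusion)] * len(confusion)
    return [count / total for count in column]


# Hypothetical classifier that labels class-1 data as class 2 most of the time.
confusion = [
    [1, 9, 0],   # true class 1: labelled 2 in 9 of 10 cases
    [8, 1, 1],   # true class 2
    [0, 0, 10],  # true class 3
]
post = posterior_true_class(confusion, output_label=2)
most_likely_true_class = post.index(max(post)) + 1
```

With these hypothetical counts, an output of class 2 points to true class 1, even though the classifier is almost never correct on class-1 data.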

An example of each step of the above learning operation (FIG. 2, steps S100 to S104) is described in detail below.

Let the N^(k) × J matrix C^(k) denote the set of secondary learning data 22 for the true class ω_k, and write the set of secondary learning data 22 over all classes, consisting of the classification results for the primary learning data 12 together with their class labels, as C = ∪_{k=1}^{K} C^(k). K is the total number of classes, and N^(k) is the number of learning data items whose true class is ω_k. The (i, j)-th element c_{i,j}^(k) of C^(k) is the classification result of the j-th classifier 14_j for the i-th learning data item whose true class is ω_k; that is, c_{i,j}^(k) ∈ {1, ..., K}.
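As a minimal sketch of this notation, the following code groups secondary learning data into the per-class matrices C^(k) and tallies, for each classifier, how often it output each label on data of a given true class (the counts written n_{j,l}^(k) in the definitions of this document). The function names and the toy data are hypothetical.

```python
def split_by_true_class(secondary, num_classes):
    """Group rows of predicted labels by true class: returns {k: C^(k)}."""
    matrices = {k: [] for k in range(1, num_classes + 1)}
    for predicted_labels, true_label in secondary:
        matrices[true_label].append(list(predicted_labels))
    return matrices


def count_outputs(c_k, num_classifiers, num_classes):
    """counts[j][l] = rows of C^(k) where classifier j+1 output label l+1."""
    counts = [[0] * num_classes for _ in range(num_classifiers)]
    for row in c_k:
        for j, label in enumerate(row):
            counts[j][label - 1] += 1
    return counts


# Hypothetical secondary learning data with K = 2 classes and J = 2 classifiers.
secondary = [([1, 2], 1), ([1, 2], 1), ([2, 1], 2), ([1, 1], 2)]
C = split_by_true_class(secondary, num_classes=2)
n_class1 = count_outputs(C[1], num_classifiers=2, num_classes=2)
n_class2 = count_outputs(C[2], num_classifiers=2, num_classes=2)
```

Here `C[k]` has N^(k) rows and J columns, and each row of `n_class1` or `n_class2` sums to N^(k).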

FIG. 3 shows a specific example of the secondary learning data 22. The example is a three-class problem, so K = 3, with J = 10, N = 15, and N^(k) = 5 (k = 1, 2, 3). The true class labels are not shown explicitly in FIG. 3, but true class labels 1, 2, and 3 are assigned, in order from the top, to successive blocks of five data items. For example, the first row of the matrix in FIG. 3 gives the classification results (class labels) of the ten classifiers 14 for the first item of primary learning data 12 whose true class is class 1 (class label 1). As noted above, a single classifier 14 can give biased results, classifying data of some classes relatively accurately while tending to misclassify data of other classes. This tendency becomes more pronounced as the number of classes grows. It is therefore not appropriate to treat the classification results of all the classifiers 14 equally.

The integrated classifier learning apparatus 20 of the present embodiment therefore introduces latent variables indicating whether or not each classifier 14 is effective for identifying each class, and learns those latent variables. Note, however, that a classifier 14 that misclassifies a given class frequently is not necessarily ineffective (invalid) for identifying that class. Even if the classifier 14 misclassifies almost all data of a class, it can be said to be effective for identifying that class as long as its classification results are consistent with respect to the true class (the correct classification result). In FIG. 3, for each classifier 14, correct classification results that are consistent within a class are hatched, and classification results that are incorrect but consistent with respect to the true class are shaded.

For example, the second classifier 14_2 (classifier id = 2 in FIG. 3) gives correct and consistent results for all classes, and is therefore effective for identifying all classes.

The third classifier 14_3 (classifier id = 3 in FIG. 3) gives correct and consistent results for class 2 and class 3, and is therefore effective for identifying those classes. For class 1, however, it misclassifies the data and its output is not consistent with respect to the true class "1", so it is ineffective for identifying class 1.

The fifth classifier 14_5 (classifier id = 5 in FIG. 3) gives correct and consistent results for class 2 and class 3, and is therefore effective for identifying those classes. For class 1 it misclassifies the data, but it does so consistently, labeling data of true class "1" as class "2", so it is also effective for identifying class 1.

The tenth classifier 14_10 (classifier id = 10 in FIG. 3) misclassifies data of class 2, but it does so consistently, labeling data of true class "2" as class "1", so it is effective for identifying class 2. For class 1 and class 3, however, it misclassifies the data and its output is not consistent with respect to the true classes "1" and "3", so it is ineffective for identifying class 1 and class 3.

Here, c_i^(k) = (c_{i,1}^(k), ..., c_{i,J}^(k)) can be regarded as a J-dimensional discrete feature vector for the i-th item of learning data whose true class is ω_k. If the J classifiers are assumed to be statistically independent, the probability distribution for class ω_k is expressed by the following equation (1).
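
Equation (1) is not reproduced in this text. As a hedged reconstruction, the independence assumption stated above implies a factorization of the following form; the per-classifier factors are written generically, and this is a sketch of what the equation expresses rather than a verbatim copy.

```latex
P\bigl(c_i^{(k)} \mid \omega_k\bigr) \;=\; \prod_{j=1}^{J} P\bigl(c_{i,j}^{(k)} \mid \omega_k\bigr)
```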

However, as described above, the classification results are not necessarily correct, so such simple modeling is insufficient. The integrated classifier learning device 20 therefore introduces a binary latent variable r_j^(k) indicating whether or not the j-th classifier 14_j outputs classification results that are consistent with respect to the true class ω_k. When r_j^(k) = 1, the j-th classifier 14_j outputs results that are consistent with respect to the true class ω_k, and the j-th feature of learning data whose true class is ω_k is assumed to be generated from a distribution specific to class ω_k. When r_j^(k) = 0, the j-th classifier 14_j does not output results that are consistent with respect to the true class ω_k, and the j-th feature of learning data whose true class is ω_k is assumed to be generated from a distribution that does not depend on class ω_k. Further, c_{i,j}^(k) is assumed to be generated by the following probability model.

Here, θ_j^(k) = {θ_{j,l}^(k)}_{l=1}^K and φ = {φ_l}_{l=1}^K. θ_{j,l}^(k) is the probability that, when r_j^(k) = 1, the j-th classifier 14_j outputs class ω_l for data whose true class is ω_k. φ_l is the probability that, when r_j^(k) = 0, the j-th classifier 14_j outputs class ω_l for data whose true class is ω_k. As described above, r_j^(k) = 0 means that the j-th classifier 14_j is not effective for identifying class ω_k, so φ_l does not depend on j or k. a and b are the parameters of the beta distribution (Beta). α = {α_l}_{l=1}^K and β_j^(k) = {β_{j,l}^(k)}_{l=1}^K are the parameters of the Dirichlet distributions (Dirichlet).

Equations (2) to (6) have the following meaning. First, λ is generated from a beta distribution (Beta) with parameters a and b. Next, φ is generated from a Dirichlet distribution (Dirichlet) with parameter α, and θ_j^(k) is generated from a Dirichlet distribution with parameter β_j^(k). The latent variable r_j^(k) is then generated from a Bernoulli distribution (Bernoulli) with parameter λ. When r_j^(k) = 1, c_{i,j}^(k) is generated from a discrete distribution (Discrete) with parameter θ_j^(k) = (θ_{j,1}^(k), ..., θ_{j,K}^(k)). When r_j^(k) = 0, c_{i,j}^(k) is generated from a discrete distribution with parameter φ = (φ_1, ..., φ_K).
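
The generative steps described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the patent's implementation: the function names, the 0-based indexing of classes and classifiers, the symmetric hyperparameter values, and the use of Gamma draws to obtain Dirichlet samples are all choices made here for illustration.

```python
import random

def sample_dirichlet(params, rng):
    """Draw from Dirichlet(params) by normalizing independent Gamma draws."""
    draws = [rng.gammavariate(p, 1.0) for p in params]
    total = sum(draws)
    return [d / total for d in draws]

def generate_secondary_data(K, J, n_per_class, a, b, alpha, beta, seed=0):
    """Sample latent variables R and classifier outputs C.

    beta[k][j] is the length-K Dirichlet parameter for classifier j on
    true class k (0-based indices; emitted labels are 1..K).
    Returns R as a K x J list of 0/1 and C as a list of (true_label, outputs).
    """
    rng = random.Random(seed)
    lam = rng.betavariate(a, b)              # lambda ~ Beta(a, b)
    phi = sample_dirichlet(alpha, rng)       # phi ~ Dirichlet(alpha)
    labels = list(range(1, K + 1))
    R = [[0] * J for _ in range(K)]
    C = []
    for k in range(K):
        # theta_j^(k) ~ Dirichlet(beta_j^(k)) for every classifier j
        theta = [sample_dirichlet(beta[k][j], rng) for j in range(J)]
        for j in range(J):
            R[k][j] = 1 if rng.random() < lam else 0   # r ~ Bernoulli(lambda)
        for _ in range(n_per_class):
            row = []
            for j in range(J):
                # class-specific distribution if r = 1, shared phi otherwise
                probs = theta[j] if R[k][j] == 1 else phi
                row.append(rng.choices(labels, weights=probs, k=1)[0])
            C.append((k + 1, row))
    return R, C

# Same shape as the FIG. 3 example: K = 3, J = 10, N^(k) = 5.
K, J = 3, 10
uniform = [1.0] * K   # hypothetical hyperparameter values
R, C = generate_secondary_data(K, J, 5, 1.0, 1.0, uniform,
                               [[uniform] * J for _ in range(K)])
```

Each element of `C` pairs a true label with one row of J classifier outputs, which mirrors the matrix layout described for FIG. 3.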

Equations (2) to (6) constitute a generative model of the secondary learning data 22. That is, if the parameters appearing in equations (2) to (6) are set appropriately, the model generates data closely resembling the secondary learning data 22 that are actually observed. The parameters are of course unknown, so they are estimated from the observation data C, the classification results actually observed. This estimation corresponds to the learning performed by the integrated classifier learning device 20.

Learning can be carried out within the following Bayesian estimation framework. From the statistical independence of {c_i^(k)}_{i=1}^K, the likelihood of the observation data C is expressed by the following equation (7).

In equation (7), R = {r_j^(k)}. n_{j,l}^(k) denotes the number of data items whose true class is ω_k that the j-th classifier 14_j classified as class ω_l, that is, n_{j,l}^(k) = Σ_{i=1}^{N^(k)} δ(c_{i,j}^(k), l), where δ(x, y) is the delta function, which takes the value 1 when x = y and 0 otherwise. Let Θ = {θ_j^(k)}. Furthermore, owing to the conjugacy of the prior distributions, Θ, φ, and λ can all be integrated out.
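
The count n_{j,l}^(k) defined above is a simple tally over the secondary learning data. The following sketch shows one way to compute it; the data layout (a list of `(true_label, outputs)` pairs with 1-based labels) and the toy values are assumptions made for illustration.

```python
def count_outputs(C, K, J):
    """Return n with n[k][j][l] = number of items of true class k+1
    that classifier j labeled as class l+1 (0-based list indices)."""
    n = [[[0] * K for _ in range(J)] for _ in range(K)]
    for true_label, outputs in C:
        for j, label in enumerate(outputs):
            n[true_label - 1][j][label - 1] += 1
    return n

# Toy data (hypothetical): two classes, two classifiers.
# Classifier 0 is always right; classifier 1 consistently swaps the labels.
toy_C = [
    (1, [1, 2]), (1, [1, 2]), (1, [1, 2]),
    (2, [2, 1]), (2, [2, 1]),
]
toy_n = count_outputs(toy_C, K=2, J=2)
```

In this toy case classifier 1 never outputs the correct label, yet its counts are as concentrated as those of classifier 0, which is the situation the latent variable r_j^(k) is meant to capture.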

In equation (8), Γ(x) denotes the gamma function, α_● = Σ_{l=1}^K α_l, and β_{j,●}^(k) = Σ_{l=1}^K β_{j,l}^(k). Similarly, the following equation (9) is obtained.

From equations (7) and (8), the posterior distribution expressed by the following equation (10) is obtained.

Following the empirical Bayes method, the logarithm of the right-hand side of equation (10) is taken and regarded, given the set C of secondary learning data 22, as the marginal likelihood expressed as a function of the parameters (α, β, a, b). The parameters (α, β, a, b) are learned by estimating the values that maximize this marginal likelihood.

Once P(R | C; α, β, a, b) is obtained, R = {r_j^(k)} can be determined stochastically by Gibbs sampling. Let r_\(k,j) denote all elements r_s^(u) of R other than r_j^(k), that is, those with (s, u) ≠ (k, j). Since P(r_j^(k) = 1 | r_\(k,j), C) + P(r_j^(k) = 0 | r_\(k,j), C) = 1, computing the ratio ν = P(r_j^(k) = 1 | r_\(k,j), C) / P(r_j^(k) = 0 | r_\(k,j), C) gives the probability distribution of the value of r_j^(k) by the following equation (11).

Here, ν is given by the following equation (12).

For each pair of the j-th classifier 14_j and class ω_k, the value of r_j^(k) is determined according to the probability distribution given by equation (11). This determination of the values of r_j^(k) is repeated T times.
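
The sampling loop described above can be sketched as follows. Because equations (11) and (12) are not reproduced in this text, the odds ratio used here is derived independently from the stated model with Θ, φ, and λ integrated out, under the assumption that all pairs with r = 0 share a single φ; it may differ in detail from the patent's equation (12). All function names and the toy counts are illustrative.

```python
import math
import random

def log_dm(counts, prior):
    """Log evidence of an ordered label sequence under a Dirichlet prior."""
    return (math.lgamma(sum(prior)) - math.lgamma(sum(prior) + sum(counts))
            + sum(math.lgamma(p + c) - math.lgamma(p)
                  for p, c in zip(prior, counts)))

def log_odds(n_kj, pool, ones_elsewhere, total_pairs, a, b, alpha, beta_kj):
    """log of nu = P(r=1 | rest, C) / P(r=0 | rest, C) for one (k, j) pair.

    pool holds the summed counts of all OTHER pairs that currently have r = 0.
    """
    log_prior = (math.log(a + ones_elsewhere)
                 - math.log(b + total_pairs - 1 - ones_elsewhere))
    merged = [p + c for p, c in zip(pool, n_kj)]
    log_like = log_dm(n_kj, beta_kj) + log_dm(pool, alpha) - log_dm(merged, alpha)
    return log_prior + log_like

def gibbs(n, a, b, alpha, beta, T, seed=0):
    """Run T sweeps over all (k, j) pairs; return the list of sampled R."""
    K, J, L = len(n), len(n[0]), len(alpha)
    rng = random.Random(seed)
    R = [[1] * J for _ in range(K)]
    samples = []
    for _ in range(T):
        for k in range(K):
            for j in range(J):
                others = [(u, s) for u in range(K) for s in range(J)
                          if (u, s) != (k, j)]
                ones = sum(R[u][s] for u, s in others)
                pool = [sum(n[u][s][l] for u, s in others if R[u][s] == 0)
                        for l in range(L)]
                lo = log_odds(n[k][j], pool, ones, K * J, a, b, alpha, beta[k][j])
                lo = max(-700.0, min(700.0, lo))      # avoid overflow in exp
                p_one = 1.0 / (1.0 + math.exp(-lo))   # nu / (1 + nu)
                R[k][j] = 1 if rng.random() < p_one else 0
        samples.append([row[:] for row in R])
    return samples

# Toy counts (hypothetical): K = 2 classes, J = 2 classifiers.
toy_counts = [[[5, 0], [0, 5]], [[0, 5], [5, 0]]]
flat = [1.0, 1.0]
toy_samples = gibbs(toy_counts, 1.0, 1.0, flat, [[flat, flat], [flat, flat]], T=5)
```

Under these assumptions, a pair whose counts are concentrated on one label gets positive log odds against a roughly uniform pool, regardless of whether that label is the correct one, while a pair whose counts resemble the pool gets negative log odds.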

In this way, in the identification system 10 of the present embodiment, the integrated classifier learning device 20 learns the integrated classifier 24.

The integrated classification result output device 32 uses the learned integrated classifier 24 to classify test data 30 whose class labels are unknown. The function by which the integrated classification result output device 32 classifies the test data 30 using the integrated classifier 24 is described next.

FIG. 4 is a flowchart showing an example of the flow of the operation by which the integrated classification result output device 32 classifies the test data 30.

In the integrated classification result output device 32 of the present embodiment, first, in step S200, the classification results obtained by classifying the test data 30 with the J classifiers 14 (the first classifier 14_1 to the J-th classifier 14_J) are acquired. The classifiers 14 used here are the same as those used when learning the integrated classifier 24.

In the next step S202, the optimal class ω_{k*} of the test data 30 is derived using the integrated classifier 24 learned by the integrated classifier learning device 20. Specifically, since step S200 yields J class labels, the J-dimensional feature vector representing these classification results is input to the integrated classifier 24 to derive the optimal class ω_{k*}.

In the next step S204, the optimal class ω_{k*} derived in step S202 is output as the classification result of the integrated classifier 24, for example to the outside of the identification system 10, and the operation ends.

An example of each step of the above classification operation (FIG. 4, steps S200 to S204) is described in detail below.

Let c_m^* = (c_{m,1}^*, ..., c_{m,J}^*) denote the classification results of the classifiers 14 for the m-th of M items of test data 30 whose class labels are unknown. For each m = 1, ..., M, the identification system 10 of the present embodiment identifies the class of the m-th item of test data 30 using the integrated classifier 24 learned by the integrated classifier learning device 20.

Once the integrated classifier learning device 20 has obtained R = {r_j^(k)} as described above, the integrated classifier 24 can classify the test data 30 by the following procedure. The Bayes-optimal class ω_{k*} for c_m^* corresponds to the k that maximizes the predictive posterior distribution P(ω_k | c_m^*, C) for c_m^*. By Bayes' theorem, the optimal class ω_{k*} is given by the following equation (13).

In equation (13), P(ω_k) is the class prior distribution, which is usually taken to be uniform. The remaining factor in equation (13), the predictive likelihood P(c_m^* | ω_k, C), can then be computed approximately using the Monte Carlo approximation of the following equation (14).

In equation (14), r_j^(k)(t) denotes the value of r_j^(k) at the t-th iteration of Gibbs sampling based on P(r_j^(k) | r_\(k,j), C). As described above, the determination of the values of r_j^(k) is repeated T times, and r_j^(k)(t) is the value determined at the t-th repetition. t_0 denotes the burn-in time. That is, the values r_j^(k)(t) for t = 1, ..., t_0 are discarded as early Gibbs samples of low reliability.

Furthermore, the right-hand side of equation (14) can be computed analytically, depending on the value of r_j^(k), as in the following equations (15) and (16).

In equations (15) and (16), n_{j,l}^(●) = Σ_{k=1}^K n_{j,l}^(k).

Finally, the optimal class ω_{k*} determined by the integrated classifier 24 for the m-th item of test data is obtained according to the following equation (17).

Equation (17) is the classification rule that the integrated classifier 24 applies to the test data 30. Equation (17) confirms that the classification rule depends on the values of r_j^(k).
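
The classification procedure described above can be sketched as follows. Equations (13) to (17) are not reproduced in this text, so the per-classifier predictive probabilities used here are assumptions: the standard Dirichlet posterior predictive for r = 1, and a class-independent predictive built from the pooled counts n_{j,l}^(●) for r = 0. The class prior is taken as uniform and therefore omitted, and the function name and toy values are illustrative.

```python
def predict(c_star, n, R_samples, alpha, beta, burn_in=0):
    """Return the 1-based class maximizing the Monte Carlo averaged likelihood.

    c_star: the J classifier outputs (1-based labels) for one test item.
    n[k][j][l]: training counts; R_samples: list of K x J 0/1 matrices.
    """
    K = len(n)
    kept = R_samples[burn_in:]            # discard the burn-in samples
    best_k, best_score = 0, -1.0
    for k in range(K):
        score = 0.0
        for R in kept:
            prob = 1.0
            for j, label in enumerate(c_star):
                l = label - 1
                if R[k][j] == 1:
                    # class-specific predictive (assumed form)
                    num = beta[k][j][l] + n[k][j][l]
                    den = sum(beta[k][j]) + sum(n[k][j])
                else:
                    # class-independent predictive from pooled counts (assumed form)
                    pooled = [sum(n[u][j][x] for u in range(K))
                              for x in range(len(alpha))]
                    num = alpha[l] + pooled[l]
                    den = sum(alpha) + sum(pooled)
                prob *= num / den
            score += prob / len(kept)
        if score > best_score:
            best_k, best_score = k, score
    return best_k + 1

# Toy setup (hypothetical): classifier 0 is always right,
# classifier 1 consistently swaps the two labels.
demo_n = [[[5, 0], [0, 5]], [[0, 5], [5, 0]]]
flat2 = [1.0, 1.0]
demo_beta = [[flat2, flat2], [flat2, flat2]]
all_effective = [[[1, 1], [1, 1]]]        # one retained sample of R
```

With both classifiers marked effective, `predict([1, 2], demo_n, all_effective, flat2, demo_beta)` favors class 1, because the consistently wrong output of classifier 1 is itself evidence for class 1. For a large J, a practical version would accumulate log probabilities to avoid underflow.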

(Example)
A specific example using the integrated classifier 24 constructed by the identification system 10 as described above is now described.

As a classification task that is difficult for a single classifier 14, experimental results are described for the case where nurse activity data of 22 types and of 14 types are used as the test data 30. The primary learning data 12 were created by the following procedure.

Data were obtained from a total of four 3-axis acceleration sensors attached to both wrists, the chest pocket, and the waist of a nurse, for 22 types of nursing activities (hereinafter simply "activities") performed on a simulated patient and for a 14-type subset of them. A sliding window of a given time width was set on these data, and features were extracted while shifting the window along the time axis with 50% overlap. The features used were the mean, the standard deviation, and the energy and entropy in the frequency domain. These features are described in detail in Bao, L., and Intille, S., 2004. Activity recognition from user-annotated acceleration data. In Proceedings of International Conference on Pervasive Computing, Pervasive 2004, Springer-Verlag, 1-17. Concatenating these features yields primary learning data 12 consisting of a sequence of 48-dimensional feature vectors and the true class label, which is the activity class of that sequence. A hidden Markov model (HMM) is then trained using the primary learning data 12. Because the primary learning data 12 here are time-series information, the feature vectors form a time series.
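
The windowed feature extraction described above can be sketched for a single sensor axis as follows. This is an illustrative sketch, not the procedure used in the experiment: the exact definitions of frequency-domain energy and entropy (here, the mean squared DFT magnitude excluding the DC component, and the entropy of the normalized squared magnitudes) are assumptions, as are the function names. Four features on each of 4 sensors x 3 axes would give the 48 dimensions mentioned above.

```python
import cmath
import math

def window_features(window):
    """Return [mean, std, energy, entropy] for one window of one axis."""
    n = len(window)
    mean = sum(window) / n
    std = math.sqrt(sum((x - mean) ** 2 for x in window) / n)
    # Squared DFT magnitudes, skipping the DC component (k = 0).
    power = []
    for k in range(1, n):
        coeff = sum(x * cmath.exp(-2j * math.pi * k * t / n)
                    for t, x in enumerate(window))
        power.append(abs(coeff) ** 2)
    energy = sum(power) / n
    total = sum(power)
    entropy = 0.0
    if total > 1e-12:                     # treat a flat window as zero entropy
        for p in power:
            q = p / total
            if q > 0.0:
                entropy -= q * math.log(q)
    return [mean, std, energy, entropy]

def sliding_features(signal, width):
    """Slide a window of the given width with 50% overlap over the signal."""
    step = max(1, width // 2)
    return [window_features(signal[s:s + width])
            for s in range(0, len(signal) - width + 1, step)]

flat_feats = sliding_features([2.0] * 10, 4)      # constant signal
wave_feats = sliding_features([0.0, 1.0, 0.0, -1.0], 4)
```

A window width of 4 with 50% overlap advances 2 samples at a time, so a 10-sample signal yields 4 windows.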

The nurse activities used in this example do not have uniform motion, which makes it difficult to determine the optimal window size for each activity class. For fast-moving activities a small sliding window width is desirable; conversely, for slow-moving activities a large sliding window width is desirable from the viewpoint of improving the recognition rate of the activity classes. It is therefore difficult to optimize a single sliding window width. In this example, the time width was set to 4, 6, ..., 100, giving primary learning data 12 consisting of 50 types of feature vector time series and their class labels, and an HMM was trained on each. That is, to learn the integrated classifier 24, the integrated classifier learning device 20 acquired the classification results produced by the HMMs of 50 classifiers 14 (J = 50). Among them, the HMM with the best recognition rate on the test data (the HMM trained on the feature representation with window size 48) had a recognition rate of about 42%, as shown in FIG. 5. FIG. 5 shows the recognition rate of the HMM (a single classifier 14) trained on the feature representation with window size 48. The values in parentheses in FIG. 5 are standard deviations. The integrated classifier learning device 20 then created the secondary learning data 22 from the classification results of these 50 HMMs and used them to learn the integrated classifier 24.

The recognition rates of conventional classifiers and of the integrated classifier 24 of the present embodiment were compared. The primary learning data 12 contain 1,097 items in total. From these, five data sets, each consisting of learning data and test data 30, were created, and the mean and standard deviation of the recognition rate on the test data 30 were compared. The conventional methods compared were the two BCC models of Non-Patent Document 3 mentioned above, IBCC (Independent Bayesian Classifier Combination) and EBCC (Enhanced Bayesian Classifier Combination); two representative integrated classifiers, majority voting (MV) and the naive Bayes model (NB); and, based on Non-Patent Document 4 mentioned above, logistic regression with an L1 regularization term (LR+Lasso) and support vector machines with a linear kernel (SVM(L)) and a polynomial kernel (SVM(P)). The NB model corresponds to setting r_j^(k) = 1 for all pairs of k and j in the identification system 10 (integrated classifier 24) of the present embodiment. LR+Lasso and SVM are not integrated classification methods in themselves, but they were applied within the stacked regression framework of Non-Patent Document 4, using the secondary learning data as learning data.
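
For reference, the majority voting baseline mentioned above can be sketched as follows. The tie-breaking rule (smallest label wins) and the function name are assumptions made for illustration; the text does not state how ties were resolved in the experiment.

```python
from collections import Counter

def majority_vote(outputs):
    """Return the label output by the most classifiers.

    Ties are broken in favor of the smallest label (an assumption).
    """
    counts = Counter(outputs)
    top = max(counts.values())
    return min(label for label, c in counts.items() if c == top)

vote_a = majority_vote([1, 2, 2, 3])
vote_b = majority_vote([1, 2])
```

Majority voting treats every classifier output at face value, so a classifier that consistently maps one class to another always votes against the true class; the latent-variable model described earlier is intended to exploit such outputs instead.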

FIG. 6 shows the recognition rates of the conventional integrated classifiers and of the integrated classifier 24 of the present embodiment. The values in parentheses in FIG. 6 are standard deviations. As is clear from FIG. 6, the integrated classifier 24 of the present embodiment achieves markedly better classification results. As described above, the method by which the integrated classifier learning device 20 learns the integrated classifier 24 can be regarded as a latent-variable extension of the NB model, and the comparison between the two methods confirms experimentally that introducing the latent variables is effective. In the learning method of the integrated classifier learning device 20, the meta-classifier is learned by Gibbs sampling, but it converges within about 100 iterations at most, so computation time is not a problem, and classification of the test data 30 can be computed in real time per sample.

As described above, the integrated classifier learning device 20 in the identification system 10 of the present embodiment learns the integrated classifier 24 using the classification results of each of the plurality of classifiers 14 together with the true classes as the secondary learning data 22 (teacher data). In this learning, the integrated classifier learning device 20 does not treat only the classifiers that classify correctly as effective. A classifier that misclassifies is also regarded as effective if its results are consistent with respect to the true class. For each class, whether or not a classifier 14 outputs consistent results when that class is the true class is learned as the latent variable r_j^(k).

As a result, the present embodiment can realize an integrated classifier 24 with high classification performance even when classification is very difficult with a single type of feature or a single type of classifier 14. In addition, when a plurality of classifiers 14 are integrated, a classifier 14 whose performance over all classes is low but whose performance on some classes is high can be integrated effectively.

The integrated classifier learning device 20 of the present embodiment can therefore improve the classification accuracy of the integrated classifier.

In addition, using the integrated classifier learning device 20 of the present embodiment removes the practical need to select the single best classifier, so the identification system 10 can be constructed more easily.

Furthermore, because the integrated classifier 24 is learned by the integrated classifier learning device 20 of the present embodiment, the identification system 10 has both versatility and robustness. Regarding versatility, since the system integrates the classification results of the plurality of classifiers 14, it does not depend on what feature vectors each classifier 14 was trained on or on what kind of classifier 14 is used. Only the classification results (class labels) of the plurality of classifiers for the learning data need to be given; in other words, the method can be applied to arbitrary classifiers 14. Regarding robustness, conventional methods have the problem that integrating classification results does not improve accuracy when the classifiers being integrated include classifiers of poor accuracy. In the present embodiment, even when the classifiers 14 whose results are integrated include classifiers 14 of poor accuracy, the system learns that a classifier 14 whose misclassifications are consistent is useful in constructing the integrated classifier 24.

In the present embodiment, the set of latent variables R = {r_j^(k)} is determined by Gibbs sampling, but this is not a limitation; it may instead be determined by another Markov chain Monte Carlo (MCMC) method, such as the Metropolis method. However, since R has as many elements as there are combinations of k and j, determination by Gibbs sampling is preferable.

The present embodiment has been described in detail with reference to the drawings, but it is only an example. The specific configuration is not limited to this embodiment; designs within a range that does not depart from the gist of the present invention are also included, and the configuration can of course be changed according to circumstances.

DESCRIPTION OF SYMBOLS
10 Identification system
12 Primary learning data
14 Classifier
20 Integrated classifier learning device
22 Secondary learning data
24 Integrated classifier
30 Test data
32 Integrated classification result output device

Claims

An integrated discriminator learning device that learns an integrated discriminator that integrates discrimination results output from each of a plurality of discriminators to identify a class,
wherein, for each of a plurality of items of primary learning data, each consisting of a pair of a feature vector of the item and the true class to which the item belongs, secondary learning data are acquired that consist of a pair of the classification results obtained by having each of the plurality of classifiers identify the class to which the item belongs and the true class to which the item belongs, and, based on each item of the acquired secondary learning data, and using whether or not each of the plurality of classifiers outputs classification results that are consistent with respect to the true class, an integrated classifier is learned that identifies the class by integrating the classification results obtained by having each of the plurality of classifiers identify the class to which data belong;
Integrated classifier learning device.

The integrated classifier learning device according to claim 1, which learns, based on each item of the acquired secondary learning data, the integrated classifier including, for each combination of classifier and class ω_k, a binary latent variable r_j^(k) indicating whether or not the j-th classifier outputs consistent classification results for data whose true class is ω_k.

The integrated classifier learning device according to claim 2, wherein, based on each item of the acquired secondary learning data, the parameters α, β, a, and b of the generative model of the secondary learning data represented by the following formulas (I) to (V) are learned by the following formula (VI), and, for each combination of classifier and class ω_k, the value of the latent variable r_j^(k) is determined according to the probability distribution of the values of the latent variable r_j^(k) calculated by the following formulas (VII) and (VIII).

Here, c_{i,j}^(k) denotes the classification result of the j-th classifier contained in the i-th item of the secondary learning data whose true class is ω_k; R denotes the set of latent variables r_j^(k); r_j^(k) = 1 indicates that the j-th classifier outputs consistent classification results for data whose true class is ω_k; r_j^(k) = 0 indicates that the j-th classifier does not output consistent classification results for data whose true class is ω_k; r_\(k,j) denotes the set of latent variables excluding r_j^(k); C denotes the set of secondary learning data; n_{j,l}^(k) denotes the number of data items belonging to the true class ω_k that the j-th classifier identified as class ω_l; β_{j,l}^(k) is a parameter of the Dirichlet distribution that generates the probability that the j-th classifier, when it outputs consistent classification results for data whose true class is ω_k, identifies data belonging to the true class ω_k as class ω_l; α_l is a parameter of the Dirichlet distribution that generates the probability that a classifier that does not output consistent classification results identifies data as class ω_l; N^(k) denotes the number of data items whose true class is ω_k; δ(x, y) is the delta function; and Γ(x) is the gamma function.

An integrated classifier learning method in an integrated classifier learning device that learns an integrated classifier that identifies a class by integrating the classification results output from each of a plurality of classifiers,
the method comprising: acquiring, for each of a plurality of items of primary learning data, each consisting of a pair of a feature vector of the item and the true class to which the item belongs, secondary learning data consisting of a pair of the classification results obtained by having each of the plurality of classifiers identify the class to which the item belongs and the true class to which the item belongs; and learning, based on each item of the acquired secondary learning data, and using whether or not each of the plurality of classifiers outputs classification results that are consistent with respect to the true class, an integrated classifier that identifies the class by integrating the classification results obtained by having each of the plurality of classifiers identify the class to which data belong.
Integrated classifier learning method.

An integrated discriminator learning program for causing a computer to function as the integrated discriminator learning device according to any one of claims 1 to 3.