CN102609731A

CN102609731A - Image classifying method for combining vision vocabulary books of different sizes

Info

Publication number: CN102609731A
Application number: CN2012100070791A
Authority: CN
Inventors: 罗会兰; 廖列法; 胡中栋
Original assignee: Jiangxi University of Science and Technology
Current assignee: Jiangxi University of Science and Technology
Priority date: 2012-01-11
Filing date: 2012-01-11
Publication date: 2012-07-25
Anticipated expiration: 2032-01-11
Also published as: CN102609731B

Abstract

The invention discloses an image classification method that combines visual vocabularies of different sizes, and relates to the technical fields of pattern recognition, computer vision and image understanding. The method quantizes images using multi-resolution information and classifies them in parallel using multiple available cues from different levels of integration. To exploit information at different granularities, images are quantized with visual vocabularies of different sizes, each of which captures different image characteristics. Quantizing the training images with each vocabulary yields a different set of quantization vectors, from which a different classifier is learned; each classifier thus builds a different model of the objects from information at a different granularity, and the classifier models are combined to classify new images with better results. Experimental results show that the method clearly improves on the performance of a single-size visual vocabulary and is highly robust, so it achieves good classification results on different kinds of images.

Description

An Image Classification Method Combining Visual Vocabularies of Different Sizes

Technical Field

The invention belongs to the technical fields of pattern recognition, computer vision and image understanding, and relates in particular to an image classification method.

Background Art

The difficulty of image classification lies in building a class model that can accommodate large intra-class variation while still discriminating between classes. The "Constellation" model attempts to locate distinct object parts and determine their spatial relationships. Although such methods can be highly expressive, spatially constrained models cannot handle or recognize large deformations, such as out-of-plane rotations and occlusions, nor do they account for objects with a variable number of parts, such as buildings and trees. Many popular image classification methods represent an image as a collection of independent patches, each described by a local visual descriptor; the most typical is the "bag-of-words" model, which captures the characteristic proportions of parts in each class while ignoring the spatial relationships between parts. After the interest points (independent patches) of an image have been detected and described with descriptors (that is, feature representation), their distribution must be represented for both training and test images. A popular representation method, also called image quantization, clusters the set of described interest points to obtain a visual vocabulary; each image is then represented as a histogram of visual-word labels. However, almost all popular clustering algorithms require the user to specify the number of clusters. To supply this parameter, the user must either have some prior knowledge of the images or select a suitable value through many validation experiments. Recently, many methods based on the "bag-of-words" model have sought performance gains by fusing multiple features. A popular way to combine multiple features in computer vision is Multiple Kernel Learning (MKL). In terms of time complexity, however, MKL cannot learn from multiple features in parallel.

The present invention attempts to bring the advantages of ensemble learning to image classification. The idea of ensemble learning is to train multiple learners and combine their predictions. Image classification is very difficult for traditional machine learning algorithms because the vectors describing images are very high-dimensional. To classify images using cues from different levels of information integration, member visual vocabularies of different sizes are used to form a visual vocabulary ensemble. Applying the ensemble of classifiers learned on this visual vocabulary ensemble to new images yields a performance gain. Moreover, in terms of time complexity, the member visual vocabularies and the corresponding member classifiers can be learned in parallel, so the invention has good parallelism and scalability.

The main contribution of this invention is an image classification method that combines visual vocabularies of different sizes. The invention reduces the degree of supervision required for image classification, makes combined use of several kinds of useful information, learns object models in parallel, and thereby improves both the efficiency and the accuracy of image classification.

Summary of the Invention

Existing image classification methods cannot effectively fuse multiple kinds of information, and because the vectors describing images are high-dimensional, traditional machine learning methods tend to produce models that are very unstable and generalize poorly. To solve these problems, the present invention provides an image classification method that combines visual vocabularies of different sizes.

The invention applies the advantages of ensemble learning to image classification: features at different levels of integration are used to form a visual vocabulary ensemble. With this ensemble, the same image yields a different quantization vector for each member vocabulary, so an ensemble of classifiers can be learned on different sets of representation vectors of the same training image set. Because each member exploits a different kind of image information, classifying new images with this classifier ensemble gives unexpectedly good results. Ensemble methods improve the performance of existing algorithms by combining the predictions of multiple models.

Just as with classifier ensembles, a visual vocabulary ensemble is used to improve the quality and robustness of the visual vocabulary. Since a vocabulary is normally learned from the training image set with a standard clustering algorithm, using a vocabulary ensemble also serves to improve the quality of the clustering. The visual vocabulary ensemble is used to express different types of image information. Once a diverse visual vocabulary ensemble has been constructed, a highly diverse classifier ensemble can be obtained, in which each member classifier builds its object models from different image features. Classifying new images with this classifier ensemble therefore gives better and more robust results. Highly diverse ensembles are also very effective at reducing the amount of supervision needed to build an accurate model.

The present invention quantizes images directly using multi-resolution information and classifies them in parallel using multiple available cues from different levels of integration. To classify objects using information at different granularities, images are quantized with visual vocabularies of different sizes, which capture image features at different granularities. Quantizing the training image set with each of these vocabularies gives a different set of quantization vectors, from which a different classifier is learned; each classifier builds a different model of the objects from information at a different granularity, and these classifier models are combined to classify new images. The method comprises the following steps:

Step 1. Extract interest points from the training images with an interest point detector, and then describe the extracted interest points with a descriptor;

Step 2. Randomly select a subset of the described interest points and run a clustering algorithm on it to obtain a member visual vocabulary; setting a different number of clusters as the clustering parameter yields member visual vocabularies of different sizes;

Step 3. Quantize the training image set with this member visual vocabulary;

Step 4. Learn a classifier on the quantized training data set;

Step 5. Repeat steps 2 to 4 to generate a visual vocabulary ensemble and a classifier ensemble of preset size;

Step 6. Quantize the new image with one member visual vocabulary;

Step 7. Classify the new image with the corresponding member classifier to obtain a classification result;

Step 8. Repeat steps 6 and 7 until every member classifier has produced its own classification result;

Step 9. Combine the classification results of the member classifiers with an ensemble technique to obtain the final image classification label.
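The nine steps above can be sketched as a small orchestration layer. This is a minimal sketch under assumptions, not the patented implementation: `train_member`, `train_ensemble` and `classify` are hypothetical names, the clustering, quantization, training and scoring routines are passed in as callables, step 9 uses the averaging rule given in the embodiment below, and a thread pool stands in for whatever parallel scheduler a real system would use.

```python
import random
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Callable, List, Sequence, Tuple

Vector = List[float]
Member = Tuple[Any, Any]  # (member vocabulary, member classifier)


def train_member(size: int, descriptors: Sequence[Vector], images: Sequence[Any],
                 labels: Sequence[int], build_vocabulary: Callable, quantize: Callable,
                 train: Callable, sample_size: int, seed: int) -> Member:
    """Steps 2-4 for one member: sample descriptors, cluster, quantize, learn."""
    rng = random.Random(seed)
    sample = rng.sample(list(descriptors), min(sample_size, len(descriptors)))
    vocabulary = build_vocabulary(sample, size)                  # step 2
    features = [quantize(img, vocabulary) for img in images]     # step 3
    return vocabulary, train(features, labels)                   # step 4


def train_ensemble(sizes: Sequence[int], descriptors: Sequence[Vector],
                   images: Sequence[Any], labels: Sequence[int],
                   build_vocabulary: Callable, quantize: Callable, train: Callable,
                   sample_size: int = 200_000, workers: int = 4) -> List[Member]:
    """Step 5: one member per vocabulary size; members are independent, so they
    are submitted to a pool and trained concurrently."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [pool.submit(train_member, size, descriptors, images, labels,
                               build_vocabulary, quantize, train, sample_size, seed)
                   for seed, size in enumerate(sizes)]
        return [f.result() for f in futures]


def classify(image: Any, ensemble: Sequence[Member], quantize: Callable,
             decide: Callable) -> float:
    """Steps 6-9: score the image with every member, then average the scores."""
    scores = [decide(model, quantize(image, vocabulary))   # steps 6-7
              for vocabulary, model in ensemble]           # step 8
    return sum(scores) / len(scores)                        # step 9: averaging
```

Because the members share no state, replacing `ThreadPoolExecutor` with `ProcessPoolExecutor` would give true CPU parallelism for pure-Python workloads, provided the callables are picklable top-level functions.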

Experimental results show that the proposed method increases robustness. In high-dimensional problems it is hard to evaluate how good a classifier is, so users often do not know which method to choose; an ensemble method can use many models and combine them to produce stable results, automatically focusing on the information that best fits the given data.

The beneficial effects of the invention are better average performance on images from different domains and strong robustness. The model is simple and well suited to ordinary users: it needs no complex parameter tuning, requires little supervision and places low demands on the training data. By exploiting the inherent parallelism of ensemble learning, members can be learned in parallel on multiple processors from small amounts of training data, so the method is also relatively efficient.

Detailed Description of Embodiments

A preferred embodiment of the present invention:

Each descriptor is assigned to its nearest visual word in Euclidean space. After a member vocabulary has been formed, an image is quantized by building, from all of its detected interest points, a histogram over this member vocabulary. To make the histogram independent of the number of descriptors, the histogram vector is normalized to sum to 1. Each visual vocabulary is obtained by applying the clustering algorithm to a set of 200,000 descriptors randomly selected from the training image set. A weighted LibSVM is used to train the classifiers. In the training phase, the weight of positive samples is set to …, and the weight of negative samples is set to …, where #pos is the number of positive samples in the training set and #neg is the number of negative samples in the training set. To apply the SVM (Support Vector Machine) classifier to multi-class problems, the one-against-all method is used.
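The quantization described above can be sketched as follows, assuming descriptors and visual words are plain lists of floats. `nearest_word`, `bow_histogram` and `one_against_all_labels` are hypothetical helper names, and the weighted LibSVM training itself is not shown. Each descriptor is hard-assigned to the closest word by Euclidean distance and the counts are L1-normalized, so images with different numbers of interest points yield comparable vectors.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def nearest_word(descriptor: Vector, vocabulary: Sequence[Vector]) -> int:
    """Index of the visual word closest to the descriptor (Euclidean distance)."""
    return min(range(len(vocabulary)),
               key=lambda j: math.dist(descriptor, vocabulary[j]))


def bow_histogram(descriptors: Sequence[Vector],
                  vocabulary: Sequence[Vector]) -> List[float]:
    """Hard-assignment bag-of-words histogram, normalized to sum to 1."""
    counts = [0.0] * len(vocabulary)
    for d in descriptors:
        counts[nearest_word(d, vocabulary)] += 1.0
    total = sum(counts)
    # An image with no interest points keeps an all-zero histogram.
    return [c / total for c in counts] if total else counts


def one_against_all_labels(labels: Sequence[str], target: str) -> List[int]:
    """+1/-1 labels for the binary SVM that separates `target` from all others."""
    return [1 if y == target else -1 for y in labels]
```

For a problem with C classes, `one_against_all_labels` would be called once per class to produce the labels of that class's binary SVM.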

For interest point detection and description, the color descriptor software of Koen E. A. van de Sande et al. is used; it performs both the detection of interest points and the computation of color descriptors. To improve the performance of the members, the spatial pyramid 1x1+2x2+1x3 is used.
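A sketch of the 1x1+2x2+1x3 spatial pyramid, under two assumptions the text does not state: "1x3" is read as three horizontal strips stacked vertically, and each cell histogram is normalized on its own before concatenation. The function names and the `(columns, rows)` grid encoding are hypothetical. With a vocabulary of k words the pyramid yields a vector of 8k entries (1 + 4 + 3 cells).

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]  # (x, y) location of an interest point


def _hist(words: Sequence[int], k: int) -> List[float]:
    """Word-count histogram of one cell, normalized to sum to 1 (zeros if empty)."""
    counts = [0.0] * k
    for w in words:
        counts[w] += 1.0
    total = sum(counts)
    return [c / total for c in counts] if total else counts


def spatial_pyramid(points: Sequence[Point], words: Sequence[int], k: int,
                    width: float, height: float,
                    grids: Sequence[Tuple[int, int]] = ((1, 1), (2, 2), (1, 3))
                    ) -> List[float]:
    """Concatenate the per-cell histograms of every (columns, rows) grid."""
    feature: List[float] = []
    for cols, rows in grids:
        cells: List[List[int]] = [[] for _ in range(cols * rows)]
        for (x, y), w in zip(points, words):
            # Clamp so points on the right/bottom border fall in the last cell.
            cx = min(int(x / width * cols), cols - 1)
            cy = min(int(y / height * rows), rows - 1)
            cells[cy * cols + cx].append(w)
        for cell in cells:
            feature.extend(_hist(cell, k))
    return feature
```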

For a test image with feature vector $F$, the SVM decision function is:

$$f(F) = \sum_{i} y_i \alpha_i k(F, F_i) + b$$

where $k(F, F_i)$ is the kernel function between the training image $F_i$ and the test image $F$, $y_i$ is the class label of $F_i$ (+1 or −1), $\alpha_i$ is the weight of the training image $F_i$, and $b$ is the decision threshold.

Here $k$ is set to a kernel function based on the $\chi^2$ distance:

$$k(F, F_i) = \exp\left(-\frac{1}{A}\,\mathrm{dist}_{\chi^2}(F, F_i)\right)$$

where $A$ is a scale parameter that normalizes the distance, set here to the average $\chi^2$ distance between all training images.
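The SVM decision function and kernel above can be written out directly. This sketch assumes the distance in the kernel is the χ² distance, the usual choice for an exp(−dist/A) kernel on normalized histograms, defined here with the common factor 1/2. The coefficients α_i and b would come from SVM training (for example LibSVM with a precomputed kernel); here they are simply passed in, and all function names are hypothetical.

```python
import math
from itertools import combinations
from typing import Sequence

Vector = Sequence[float]


def chi2_distance(f: Vector, g: Vector) -> float:
    """0.5 * sum_j (f_j - g_j)^2 / (f_j + g_j), skipping bins empty in both."""
    return 0.5 * sum((a - b) ** 2 / (a + b) for a, b in zip(f, g) if a + b > 0)


def mean_pairwise_chi2(train: Sequence[Vector]) -> float:
    """Scale parameter A: mean chi-square distance over all pairs of training
    vectors (requires at least two vectors)."""
    pairs = list(combinations(train, 2))
    return sum(chi2_distance(f, g) for f, g in pairs) / len(pairs)


def chi2_kernel(f: Vector, g: Vector, A: float) -> float:
    """k(F, F_i) = exp(-dist_chi2(F, F_i) / A)."""
    return math.exp(-chi2_distance(f, g) / A)


def decision_function(F: Vector, train: Sequence[Vector], y: Sequence[int],
                      alpha: Sequence[float], b: float, A: float) -> float:
    """f(F) = sum_i y_i * alpha_i * k(F, F_i) + b."""
    return sum(yi * ai * chi2_kernel(F, Fi, A)
               for Fi, yi, ai in zip(train, y, alpha)) + b
```

Only training images with non-zero α_i (the support vectors) contribute to the sum, so a trained model would normally keep just those.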

To obtain AP values, the output of the SVM decision function is used directly. When a new image $x$ is tested, the output of the classifier ensemble is obtained by averaging the decision function values of all member classifiers:

$$\bar{f}(x) = \frac{1}{S}\sum_{i=1}^{S} f_i(x)$$

where $S$ is the ensemble size and $f_i(x)$ is the output value of the $i$-th member classifier. By setting different thresholds, the AP value and the precision-recall curve are obtained.
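A minimal sketch of the ensemble rule and of AP computed from the averaged scores. It assumes non-interpolated average precision, the mean of the precision values at the rank of each positive image as the threshold sweeps down the ranked list; the text does not say which AP variant was used, and interpolated variants give slightly different numbers. Function names are hypothetical.

```python
from typing import Sequence


def ensemble_score(member_scores: Sequence[float]) -> float:
    """(1/S) * sum_i f_i(x): the mean of the S member decision values."""
    return sum(member_scores) / len(member_scores)


def average_precision(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Non-interpolated AP: rank images by score (highest first) and average
    the precision measured at the rank of each positive (+1) image."""
    ranked = sorted(zip(scores, labels), key=lambda t: t[0], reverse=True)
    hits, precisions = 0, []
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            hits += 1
            precisions.append(hits / rank)
    return sum(precisions) / len(precisions) if precisions else 0.0
```

Each rank position corresponds to one threshold setting, so the same loop also yields the points of the precision-recall curve.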

The sizes of the member visual vocabularies are set to 200, 400, 800, 1200, 1300, 1500, 1600, 1700, 1900, 2000, 2200, 2600, 2800, 3000, 3600, 4000, 4500 and 5000, giving an ensemble of size 18. The Harris-Laplace interest point detector and the OpponentSift descriptor are used, and the clustering algorithm is k-means with the Euclidean distance measure.
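Building the member vocabularies can be sketched with Lloyd's k-means under the Euclidean distance. This pure-Python version is only illustrative: clustering 200,000 descriptors into as many as 5000 words would need an optimized implementation (for example scikit-learn's `KMeans` or `MiniBatchKMeans`). `kmeans`, `build_vocabularies` and the choice of drawing a fresh random sample for every member are assumptions modeled on step 2; `SIZES` lists the 18 sizes given above.

```python
import math
import random
from typing import Dict, List, Sequence

Vector = List[float]


def kmeans(points: Sequence[Vector], k: int, iters: int = 20, seed: int = 0) -> List[Vector]:
    """Lloyd's k-means with Euclidean distance; returns the k cluster centres."""
    rng = random.Random(seed)
    centres = [list(p) for p in rng.sample(list(points), k)]
    for _ in range(iters):
        groups: List[List[Vector]] = [[] for _ in range(k)]
        for p in points:  # assignment step
            j = min(range(k), key=lambda c: math.dist(p, centres[c]))
            groups[j].append(p)
        # Update step; an empty cluster keeps its previous centre.
        new = [[sum(col) / len(g) for col in zip(*g)] if g else centres[j]
               for j, g in enumerate(groups)]
        if new == centres:  # converged
            break
        centres = new
    return centres


def build_vocabularies(descriptors: Sequence[Vector], sizes: Sequence[int],
                       sample_size: int, seed: int = 0) -> Dict[int, List[Vector]]:
    """One member vocabulary per size, each clustered from its own random sample."""
    rng = random.Random(seed)
    vocabularies = {}
    for size in sizes:
        sample = rng.sample(list(descriptors), min(sample_size, len(descriptors)))
        vocabularies[size] = kmeans(sample, size, seed=rng.randrange(2**31))
    return vocabularies


SIZES = [200, 400, 800, 1200, 1300, 1500, 1600, 1700, 1900,
         2000, 2200, 2600, 2800, 3000, 3600, 4000, 4500, 5000]
```

Calling `build_vocabularies(descriptors, SIZES, sample_size=200_000)` would mirror the embodiment's configuration, at a cost that is impractical in pure Python.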

Experimental results show that the preferred embodiment of the present invention outperforms traditional recognition methods based on a single visual vocabulary, and even surpasses some complex models whose parameters have been carefully tuned.

Claims

1. An image classification method based on visual vocabularies, characterized in that images are quantized using multi-resolution information and classified in parallel using multiple available cues from different levels of integration; in order to classify images using information at different granularities, images are quantized with visual vocabularies of different sizes, which capture different image features; the method comprises the following steps:

(1) Extract interest points from the training images with an interest point detector, and then describe the extracted interest points with a descriptor;

(2) Randomly select a subset of the described interest points and run a clustering algorithm on it to obtain a member visual vocabulary, setting different numbers of clusters as the clustering parameter to obtain member visual vocabularies of different sizes;

(3) Quantize the training image set based on this member visual vocabulary;

(4) Learn a classifier on the quantized training data set;

(5) Repeat steps (2) to (4) to generate a visual vocabulary ensemble and a classifier ensemble of preset size;

(6) Quantize a new image based on one member visual vocabulary;

(7) Classify the new image with the corresponding member classifier to obtain a classification result;

(8) Repeat steps (6) and (7) until every member classifier has obtained its own classification result;

(9) Combine the classification results of the member classifiers with an ensemble technique to obtain the final image classification label.

2. The method according to claim 1, wherein the sizes of the member visual vocabularies are set to 200, 400, 800, 1200, 1300, 1500, 1600, 1700, 1900, 2000, 2200, 2600, 2800, 3000, 3600, 4000, 4500 and 5000, giving an ensemble of size 18.

3. The method according to claim 1, characterized in that, to classify images with the visual vocabulary ensemble and the corresponding classifier ensemble, the output of the SVM decision function is used directly: when a new image $x$ is tested, the output of the classifier ensemble is

$$\bar{f}(x) = \frac{1}{S}\sum_{i=1}^{S} f_i(x)$$

where $S$ is the ensemble size and $f_i(x)$ is the output value of the $i$-th member classifier, and the precision-recall curve is obtained by setting different thresholds.