CN104408469A

CN104408469A - Firework identification method and firework identification system based on deep learning of image

Info

Publication number: CN104408469A
Application number: CN201410711008.9A
Authority: CN
Inventors: 赵俭辉; 王勇; 章登义; 武小平
Original assignee: Wuhan University WHU
Current assignee: Wuhan University WHU
Priority date: 2014-11-28
Filing date: 2014-11-28
Publication date: 2015-03-11

Abstract

The invention discloses a smoke and fire recognition method and system based on image deep learning, comprising: step 1, collecting an unlabeled sample image set and a labeled sample image set; step 2, obtaining an unlabeled training data set and a labeled training data set; step 3, applying whitening preprocessing to the training data; step 4, based on the whitened unlabeled training data, using unsupervised learning to construct a deep neural network based on sparse autoencoding and extracting the basic image feature set of the unlabeled training data; step 5, convolving the image data with the basic image features and pooling the result; step 6, training a Softmax classifier on the convolved and pooled labeled training data set; step 7, inputting the convolved and pooled image to be recognized into the trained Softmax classifier to obtain the recognition result. The invention can effectively improve the visual recognition rate for smoke, fire, and similar-looking targets, and enables automatic smoke and fire recognition with higher accuracy.

Description

Smoke and fire recognition method and system based on image deep learning

Technical field

The invention belongs to the technical field of intelligent fire monitoring and automatic smoke and fire target recognition based on digital images, and in particular relates to a smoke and fire recognition method and system based on image deep learning.

Background

Intelligent smoke and fire monitoring based on digital images is a classic problem related to image processing, computer vision, artificial intelligence, machine learning, and many other fields. A body of literature on automatic recognition of smoke and fire objects already exists, and the recognition process can generally be divided into three stages: target segmentation, feature extraction, and comprehensive judgment.

Stage 1: target segmentation.

Automatic smoke and fire target segmentation methods fall roughly into threshold segmentation, edge-detection segmentation, region-property segmentation, and feature-space clustering segmentation. Threshold segmentation mainly includes histogram thresholds, the maximum between-class variance (Otsu) threshold, two-dimensional maximum entropy, fuzzy thresholds, and co-occurrence matrix thresholds. Edge-detection segmentation mainly includes the Sobel, Canny, Laplacian, Roberts, Prewitt, and SUSAN operators, active contour models, the watershed algorithm, and level set methods. Region-property segmentation mainly includes region growing, region splitting and merging, and mathematical morphology. Feature-space clustering segmentation mainly includes K-means, fuzzy C-means, and Mean-Shift. In practice, smoke and fire targets are often obtained through color segmentation, for example using the color range of fire and the gray-level range of smoke; commonly used color models include RGB, HSI, and YCbCr.

Stage 2: feature extraction.

The visual features of smoke and fire targets mainly include color, shape, texture, and spatial relationships. Color features are unaffected by image rotation and translation and, after normalization, by changes of scale; commonly used color features include color histograms, color sets, color moments, color coherence vectors, and color correlograms. Shape features comprise contour features and region features: contour features describe the object boundary, while region features relate to the whole object region. Commonly used shape features include boundary chain codes, Fourier descriptors, geometric shape parameters, shape invariant moments, and wavelet relative moments. Texture features are fairly robust to noise but are affected by factors such as resolution, directionality, and prior assumptions; common texture analysis methods include statistical, geometric, and spectral analysis. Spatial relationships describe the relative position or orientation of multiple targets. They improve the discriminative power of an image description but are sensitive to target rotation and scale changes, so spatial relationship information alone is often insufficient in practice. Expressing these smoke and fire features usually requires mathematical tools such as the Laplace operator, the Fourier transform, the gray-level co-occurrence matrix, hidden Markov models, the LBP operator, and discrete wavelet analysis.

Stage 3: comprehensive judgment.

Comprehensive judgment of smoke and fire targets draws a conclusion about whether a fire is present from the multiple extracted features; that is, it concerns the design and use of a pattern recognition classifier. Image features commonly used for this judgment include brightness, color distribution, texture parameters, centroid, area, average density, circularity, curvature, eccentricity, number of sharp corners, fractal coding, and transmittance. Pattern classification may be supervised or unsupervised and can be carried out, separately or jointly, at three levels: the information level, the feature level, and the decision level. Pattern classification for smoke and fire is mainly performed at the feature level, and common methods include voting, least mean square fusion, Bayes classifiers, fuzzy logic, artificial neural networks, and support vector machines.

The above methods have proven effective in settings such as building fire monitoring. In natural scenes, however, objects that resemble smoke or fire are sometimes present: red flowers, red leaves, and red flags resemble fire, while fog, clouds, and haze resemble smoke. The presence of such objects lowers recognition accuracy and raises both the miss rate and the false-alarm rate. The field of intelligent fire monitoring therefore urgently needs a more accurate smoke and fire target recognition method. In recent years, deep learning techniques from machine learning have gradually been applied to image processing and pattern recognition. By learning a deep nonlinear network structure, deep learning can approximate complex functions and form distributed representations of the input data, and it has shown a strong ability to learn the essential features of a data set from a small number of samples. To date, no research combining deep learning with smoke and fire recognition has appeared.

Summary of the invention

In view of the deficiencies of the prior art, the invention combines deep learning with smoke and fire recognition and provides a smoke and fire recognition method and system based on image deep learning, in order to achieve automatic smoke and fire recognition with higher accuracy.

To solve the above technical problem, the invention adopts the following technical solution:

A smoke and fire recognition method based on image deep learning, comprising the steps:

Step 1, collect a sample image set, comprising (1) an unlabeled sample image set composed of images of targets and of target lookalikes that have not been classified and labeled, and (2) a labeled sample image set composed of images of targets and of target lookalikes that have been classified and labeled;

Step 2, randomly extract unit image blocks from the unlabeled sample image set and from the labeled sample image set, forming an unlabeled training data set and a labeled training data set respectively;

Step 3, apply whitening preprocessing to the training data in the unlabeled and labeled training data sets, the training data being the color value matrices of the three RGB channels of the unit image blocks;

Step 4, based on the whitened unlabeled training data, use unsupervised learning to construct a deep neural network based on sparse autoencoding, and extract the basic image feature set of the unlabeled training data;

Step 5, convolve the image data with the basic image features of the unlabeled training data and pool the result, the image data comprising the labeled training data and the image to be recognized;

Step 6, train a Softmax classifier on the convolved and pooled labeled training data set;

Step 7, input the convolved and pooled image to be recognized into the trained Softmax classifier to obtain the recognition result.

The whitening preprocessing in step 3 is ZCA whitening or PCA whitening.

Step 4 further comprises the sub-steps:

4.1 Construct a deep neural network comprising a single input layer, multiple hidden layers, and a single output layer;

4.2 Use the whitened unlabeled training data as both the input and the output of the deep neural network, and perform unsupervised learning by training the sparse-autoencoder-based deep neural network;

4.3 Extract the basic image feature set of the unlabeled training data with the trained deep neural network.

The unsupervised learning by training the sparse-autoencoder-based deep neural network in sub-step 4.2 is specifically:

4.2.1 Obtain the weighted sums of the neuron inputs and the neuron output values;

4.2.2 Define a cost function that includes the sparsity constraint;

4.2.3 Define the gradient descent directions of the weight coefficient vector and bias term vector of the deep neural network, i.e. the iteration rule;

4.2.4 Use the L-BFGS parameter training algorithm to solve iteratively for the weight coefficient vector and bias term vector according to the iteration rule.

Step 5 further comprises the sub-steps:

5.1 Convolve each basic image feature of the unlabeled training data with each color channel of the image data to obtain the convolved image;

5.2 Exploiting the local-region statistics of natural images, reduce the feature dimensionality of the convolved image by mean pooling.

Step 6 further comprises the sub-steps:

6.1 Use the convolved and pooled labeled training data set as the training samples;

6.2 Construct the Softmax classifier regression model;

6.3 Define the gradient of the regression model parameters, i.e. the iteration rule;

6.4 Use the L-BFGS parameter training algorithm to solve iteratively for the model parameters θ according to the iteration rule.

The system corresponding to the above smoke and fire recognition method based on image deep learning comprises:

a sample image collection module, used to collect a sample image set comprising (1) an unlabeled sample image set composed of images of targets and of target lookalikes that have not been classified and labeled, and (2) a labeled sample image set composed of images of targets and of target lookalikes that have been classified and labeled;

a training data acquisition module, used to randomly extract unit image blocks from the unlabeled sample image set and from the labeled sample image set, forming an unlabeled training data set and a labeled training data set respectively;

a whitening preprocessing module, used to apply whitening preprocessing to the training data in the unlabeled and labeled training data sets, the training data being the color value matrices of the three RGB channels of the unit image blocks;

an unsupervised learning module, used to construct, based on the whitened unlabeled training data and by unsupervised learning, a deep neural network based on sparse autoencoding, and to extract the basic image feature set of the unlabeled training data;

a convolution and pooling module, used to convolve the image data with the basic image features of the unlabeled training data and pool the result, the image data comprising the labeled training data and the image to be recognized;

a classifier training module, used to train the Softmax classifier on the convolved and pooled labeled training data set;

a recognition module, used to input the convolved and pooled image to be recognized into the trained Softmax classifier to obtain the recognition result.

Compared with the prior art, the invention has the following advantages and positive effects:

(1) By learning a deep nonlinear network structure, deep learning approximates complex functions and forms distributed representations of the input data, and it has a strong ability to learn the essential features of data from large sample sets. The classification accuracy of the sparse autoencoder deep neural network is therefore higher than that of traditional neural networks.

(2) The ZCA technique is used for data dimensionality reduction, and the whitening technique reduces the correlation between input image pixels, which helps speed up unsupervised learning.

(3) The convolution technique reduces the number of parameters the neural network has to train and simplifies feature extraction; the pooling technique uses local-region statistics to reduce feature dimensionality and to prevent overfitting.

(4) The Softmax classifier is an extension of binary classification that handles multi-class problems, which helps distinguish smoke and fire from a wider variety of similar targets.

Description of drawings

Figure 1 shows a sample image and the basic image feature set learned by the deep neural network.

Detailed description

The technical solution of the invention can be implemented by those skilled in the art in computer software. The invention is further described below with reference to a specific embodiment.

The specific steps of the invention are as follows:

Step 1, obtain a sample image set consisting of an unlabeled sample image set and a labeled sample image set, and derive an unlabeled training data set and a labeled training data set from it.

This step further comprises the following sub-steps:

Step 1.1, collect unlabeled sample images to form the unlabeled sample image set.

The unlabeled sample image set contains images of targets and of target lookalikes that have not been classified and labeled. The targets are fire and smoke; target lookalikes are objects that resemble fire or smoke. For example, red flowers, red leaves, and red flags are lookalikes of fire, while fog, clouds, and haze are lookalikes of smoke. In this sub-step, a large number of images of fire, red flowers, red leaves, and red flags form the first class of unlabeled sample image set, and a large number of images of smoke, fog, clouds, and haze form the second class.

Step 1.2, collect labeled sample images to form the labeled sample image set.

The labeled sample image set contains images of targets and of target lookalikes that have been classified and labeled: labeled images of fire, red flowers, red leaves, and red flags form the first class of labeled sample image set, and labeled images of smoke, fog, clouds, and haze form the second class.

Step 1.3, obtain the unlabeled training data set and the labeled training data set from the sample image set.

Unit image blocks of fixed size (for example 8 pixels × 8 pixels) are extracted at random from the unlabeled sample images to form the unlabeled training data set, and unit image blocks of the same fixed size are extracted at random from the labeled sample images to form the labeled training data set.
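The random block extraction in step 1.3 can be sketched as follows. This is a minimal illustration, assuming each sample image is already loaded as an H × W × 3 NumPy array; the function name `sample_patches` and its parameters are hypothetical and do not come from the patent.

```python
import numpy as np

def sample_patches(images, num_patches, patch_size=8, seed=0):
    """Randomly crop square RGB patches from a list of H x W x 3 arrays.

    Returns an array of shape (num_patches, patch_size * patch_size * 3),
    one flattened RGB patch per row.
    """
    rng = np.random.default_rng(seed)
    patches = np.empty((num_patches, patch_size * patch_size * 3))
    for n in range(num_patches):
        img = images[rng.integers(len(images))]              # pick a source image
        top = rng.integers(img.shape[0] - patch_size + 1)    # random top-left corner
        left = rng.integers(img.shape[1] - patch_size + 1)
        patch = img[top:top + patch_size, left:left + patch_size, :]
        patches[n] = patch.reshape(-1)                       # flatten the three channels
    return patches
```

For labeled images, the same routine would be applied per class so that each patch inherits the label of its source image.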

Step 2, whitening preprocessing of the training data in the unlabeled and labeled training data sets. The training data are the unit image blocks obtained in step 1.3, i.e. the color value matrices of the three RGB channels of the unit image blocks.

In this embodiment the training data undergo ZCA whitening, which comprises the following sub-steps in order:

Step 2.1, zero-mean the training data.

The mean of each dimension is subtracted from that dimension to give $x^{i}$, and the training data are normalized to the range [0,1]. With m the number of training data items, the covariance matrix Σ of the training data is:

$\Sigma = \frac{1}{m} \sum_{i=1}^{m} \left[ x^{i} \cdot (x^{i})^{T} \right] \qquad (1)$

Step 2.2, compute the basis vectors of the zero-meaned training data in the new coordinate system.

Singular value decomposition of the covariance matrix Σ yields the diagonal eigenvalue matrix S and the n-dimensional eigenvector matrix $U = [u_{1}\ u_{2}\ \dots\ u_{n}]$, where $u_{1}$ is the principal eigenvector of Σ, $u_{2}$ is the second eigenvector, and $u_{n}$ is the last eigenvector. These eigenvectors form a set of basis vectors in the new coordinate system.

Step 2.3, obtain the training data in the new coordinate system.

The training data are transformed into the new basis to give $x^{r} = U^{T} \cdot x^{i}$, whose dimensions are mutually uncorrelated. Dividing $x^{r}$ by the standard deviation of each dimension then gives every dimension unit variance, so the two necessary conditions for whitening are satisfied: a mean close to 0 and equal variances. Let ε be the ZCA whitening parameter; in this embodiment ε = $10^{-5}$. The final ZCA whitening result is:

$x^{t} = U \cdot \frac{x^{r}}{\sqrt{S + \epsilon}} \cdot U^{T} \qquad (2)$

Training data preprocessing in the invention is not limited to ZCA whitening; other conventional whitening techniques such as PCA whitening may also be used.
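Steps 2.1 to 2.3 can be sketched in NumPy as below. The sketch uses the conventional ZCA form, in which the whitening matrix is $U \, \mathrm{diag}(1/\sqrt{S+\epsilon}) \, U^{T}$ applied to the zero-meaned data; reading formula (2) this way is an assumption. The function name `zca_whiten` and the (dimensions × samples) layout are illustrative choices, and the [0,1] normalization of step 2.1 is left out.

```python
import numpy as np

def zca_whiten(X, eps=1e-5):
    """ZCA-whiten data X of shape (n_dims, m_samples).

    Returns (X_white, mean, W), where W = U diag(1/sqrt(S + eps)) U^T.
    """
    mean = X.mean(axis=1, keepdims=True)       # per-dimension mean (step 2.1)
    Xc = X - mean
    m = Xc.shape[1]
    sigma = Xc @ Xc.T / m                      # covariance matrix, formula (1)
    U, S, _ = np.linalg.svd(sigma)             # eigenvectors U, eigenvalues S (step 2.2)
    X_rot = U.T @ Xc                           # x^r = U^T x (step 2.3)
    X_pca = X_rot / np.sqrt(S + eps)[:, None]  # unit variance per dimension
    X_white = U @ X_pca                        # rotate back to the original axes
    W = U @ np.diag(1.0 / np.sqrt(S + eps)) @ U.T
    return X_white, mean, W
```

Returning `mean` and `W` lets the same transform be reused later, for example on the labeled data or inside the convolution step.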

Step 3, perform unsupervised learning on the unlabeled training data and build a deep neural network based on sparse autoencoding.

Step 3.1, construct a deep neural network comprising an input layer, hidden layers, and an output layer. The input layer and the output layer are each a single layer and there are multiple hidden layers. The unlabeled training data serve as both the input and the output of the network.

Step 3.2, perform unsupervised learning by training the deep neural network, i.e. obtain its weight coefficient vector and bias term vector.

Step 3.2.1, obtain the weighted sum of the neuron inputs:

Let $w_{ij}^{l}$ denote the weight coefficient connecting the j-th neuron in layer l of the network to the i-th neuron in layer l+1, $b_{i}^{l+1}$ the bias term of the i-th neuron in layer l+1, $S_{l}$ the total number of neurons in layer l, and $z_{i}^{l+1}$ the weighted sum of the inputs to the i-th neuron in layer l+1. Then:

$z_{i}^{l+1} = \sum_{j=1}^{S_{l}} (w_{ij}^{l} x_{j}^{l}) + b_{i}^{l+1} \qquad (3)$

Step 3.2.2, obtain the neuron output values:

Given the neuron activation function $f(\cdot)$, let $a_{i}^{l}$ denote the output value of the i-th neuron in layer l of the network, i.e. $a_{i}^{l} = f(z_{i}^{l})$. The unlabeled training data $x^{t}$, i.e. the input samples of the autoencoder deep neural network, are required to equal the outputs $y^{t}$, i.e. $y^{t} = x^{t}$. Let M be the number of unlabeled training data items and t their index, so that 1 ≤ t ≤ M. Let $a_{j}^{l}(x^{t})$ denote the output value of the j-th neuron in layer l for input sample $x^{t}$; the average output value $\hat{\rho}_{j}$ of the j-th hidden-layer neuron is then:

$\hat{\rho}_{j} = \frac{1}{M} \sum_{t=1}^{M} a_{j}^{l}(x^{t})$
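The weighted sum of formula (3), the neuron outputs, and the average hidden output can be sketched for one layer as follows. The text does not state the form of the activation function, so the sigmoid used here is an assumption, as are the function names and the layout with one sample per column.

```python
import numpy as np

def sigmoid(z):
    # Assumed activation function f; the source does not give its form.
    return 1.0 / (1.0 + np.exp(-z))

def layer_forward(W, b, x):
    """Compute z^{l+1} = W x + b (formula (3)) and a^{l+1} = f(z^{l+1}).

    W: (S_{l+1}, S_l) weights, b: (S_{l+1},) biases, x: (S_l, M) layer-l outputs.
    """
    z = W @ x + b[:, None]
    return z, sigmoid(z)

def mean_activation(a):
    """Average output of each neuron over the M samples (columns of a)."""
    return a.mean(axis=1)
```

With a sigmoid, every output lies in (0, 1), which is what allows the average output of a hidden neuron to be compared with a small sparsity target.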

Step 3.2.3, define the cost function of the deep neural network:

A sparsity constraint is added to the deep neural network. Let ρ be the sparsity parameter, a positive number close to 0 that generally takes a value between 0 and 0.05; in this embodiment ρ = 0.035. The constraint requires the average output value $\hat{\rho}_{j}$ of the j-th hidden-layer neuron to be close to ρ. To enforce it, a cost function J(w,b) is defined.

The cost function is the sum of three parts: the first is the mean squared error term, the second is the regularization term, and the third is a penalty term that penalizes cases where $\hat{\rho}_{j}$ differs significantly from ρ, thereby imposing the sparsity constraint on the network. Here N is the number of layers of the autoencoder deep neural network; λ is the regularization coefficient, with λ = 0.003 in this embodiment; $h_{w,b}(x^{t})$ is the output of the network's output layer for input sample $x^{t}$; β is the coefficient controlling the sparsity penalty term, with β = 5 in this embodiment; and w and b are the weight coefficient vector and bias term vector of the deep neural network. $KL(\rho \| \hat{\rho}_{j})$ is the relative entropy between $\hat{\rho}_{j}$ and ρ, which measures the difference between two distributions and is a convex function. It is computed as:

$KL(\rho \| \hat{\rho}_{j}) = \rho \log \frac{\rho}{\hat{\rho}_{j}} + (1 - \rho) \log \frac{1 - \rho}{1 - \hat{\rho}_{j}}$
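Read together with the gradient in formula (7), the three-part description above is consistent with the standard sparse-autoencoder cost written out below. This is a reconstruction under that assumption: the factors of 1/2 and the summation limits are inferred so that differentiating with respect to $w^{l}$ reproduces the $\lambda w^{l}$ term of formula (7). Here $\hat{\rho}_{j}$ is the average output of the j-th hidden neuron over the M samples and KL is the relative entropy between ρ and $\hat{\rho}_{j}$.

```latex
J(w,b) =
  \underbrace{\frac{1}{M}\sum_{t=1}^{M}\frac{1}{2}
      \left\lVert h_{w,b}(x^{t}) - y^{t} \right\rVert^{2}}_{\text{mean squared error}}
  + \underbrace{\frac{\lambda}{2}\sum_{l=1}^{N-1}\sum_{i=1}^{S_{l+1}}\sum_{j=1}^{S_{l}}
      \left( w_{ij}^{l} \right)^{2}}_{\text{regularization}}
  + \underbrace{\beta \sum_{j} \mathrm{KL}\!\left( \rho \,\|\, \hat{\rho}_{j} \right)}_{\text{sparsity penalty}}
```

With the embodiment's values, the three terms are weighted by λ = 0.003 and β = 5, and the penalty vanishes only when every $\hat{\rho}_{j}$ equals ρ = 0.035.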

Step 3.2.4, minimize the cost function:

For the weight coefficient vector and bias term vector of the deep neural network, define their gradient descent directions:

$\begin{cases} \nabla w^{l} = \frac{1}{M} \cdot \sigma^{l+1} \cdot (a^{l})^{T} + \lambda w^{l} \\ \nabla b^{l} = \frac{1}{M} \sum_{t=1}^{M} \sigma_{t}^{l+1} \end{cases} \qquad (7)$

In formula (7), $\nabla w^{l}$ is the gradient descent direction of the layer-l weight coefficient vector and $\nabla b^{l}$ is that of the layer-l bias term vector; $w^{l}$ is the layer-l weight coefficient vector; $a^{l}$ is the output vector of layer l of the network; $\sigma_{t}^{l+1}$ is the residual of input sample $x^{t}$ at layer l+1; and $\sigma^{l+1}$ is the residual vector of that layer.

Formula (7) determines the iteration rule for w and b. In this embodiment the L-BFGS parameter training algorithm is used to solve for w and b iteratively. The weight coefficient vector w and bias term vector b at the point where the iteration converges or reaches the maximum number of iterations are taken as the weight coefficient vector and bias term vector of the trained sparse autoencoder deep neural network. The convergence criterion and the maximum number of iterations are set in advance according to practical needs. Once w and b have been obtained, training of the sparse autoencoder deep neural network is complete.
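The cost and the gradient of formula (7) can be sketched for the simplest case of a single hidden layer. This is an illustration under several assumptions: a sigmoid activation, the standard sparse-autoencoder cost, and a flat parameter vector `theta`. The embodiment itself uses multiple hidden layers, and all function names here are hypothetical.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def unpack(theta, n_in, n_hid):
    """Split the flat parameter vector into W1, W2, b1, b2."""
    i = n_hid * n_in
    W1 = theta[:i].reshape(n_hid, n_in)
    W2 = theta[i:2 * i].reshape(n_in, n_hid)
    b1 = theta[2 * i:2 * i + n_hid]
    b2 = theta[2 * i + n_hid:]
    return W1, W2, b1, b2

def sparse_ae_cost(theta, X, n_hid, lam=0.003, rho=0.035, beta=5.0):
    """Return (cost, gradient) for inputs X of shape (n_in, M), target y = x."""
    n_in, M = X.shape
    W1, W2, b1, b2 = unpack(theta, n_in, n_hid)
    a2 = sigmoid(W1 @ X + b1[:, None])            # hidden-layer outputs
    a3 = sigmoid(W2 @ a2 + b2[:, None])           # reconstruction h_{w,b}(x)
    rho_hat = a2.mean(axis=1)                     # average hidden outputs
    kl = np.sum(rho * np.log(rho / rho_hat)
                + (1 - rho) * np.log((1 - rho) / (1 - rho_hat)))
    cost = (0.5 * np.sum((a3 - X) ** 2) / M       # mean squared error term
            + 0.5 * lam * (np.sum(W1 ** 2) + np.sum(W2 ** 2))  # regularization
            + beta * kl)                          # sparsity penalty
    d3 = (a3 - X) * a3 * (1 - a3)                 # output-layer residuals
    sparse = beta * (-rho / rho_hat + (1 - rho) / (1 - rho_hat))
    d2 = (W2.T @ d3 + sparse[:, None]) * a2 * (1 - a2)  # hidden-layer residuals
    gW1 = d2 @ X.T / M + lam * W1                 # formula (7), weights
    gW2 = d3 @ a2.T / M + lam * W2
    gb1 = d2.mean(axis=1)                         # formula (7), biases
    gb2 = d3.mean(axis=1)
    return cost, np.concatenate([gW1.ravel(), gW2.ravel(), gb1, gb2])
```

A function that returns the pair (cost, gradient) is the form quasi-Newton routines usually expect; for instance, SciPy's `scipy.optimize.minimize` accepts it with `jac=True` and `method="L-BFGS-B"`.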

Step 3.3, use the trained deep neural network to extract the basic image feature set that represents the unlabeled training data.

The basic image feature set is the set of basic image features from which complex images can be composed. In Figure 1, the upper left shows one of the sample images used for training (1), the right shows the basic image feature set learned by the deep neural network from all sample images (3), and combinations of basic image features can represent any unit image block (2) of a sample image.

Step 4, convolve and pool the image data using the basic image feature set. The image data comprise the labeled training data and the image data to be recognized.

This step further comprises the sub-steps:

Step 4.1, convolve each basic image feature with each color channel of each image: the image pixels within the range of the convolution template are averaged and the average is taken as the target value. The convolution results of the three color channels are then summed to give the convolved image.

Step 4.2, exploiting the local-region statistics of natural images, reduce the feature dimensionality of the convolved image by mean pooling: the convolved image is divided into regions, the mean pixel value of each region is computed, and that mean is used to represent the region.

Local-region statistics are an inherent property of natural images: the statistics of one part of a natural image are similar to those of other parts. For example, one region of a landscape image resembles other regions. This means that features learned on one part of an image can also be applied to another part, and mean pooling is the concrete way this is exploited.
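Steps 4.1 and 4.2 can be sketched as below, assuming an H × W × 3 image and a single p × p × 3 basic image feature. The template is applied as a weighted sum over each window without flipping the kernel (cross-correlation), which is one reading of the template operation in step 4.1; function names are illustrative.

```python
import numpy as np

def convolve_feature(image, feature):
    """Slide a p x p x 3 feature over an H x W x 3 image ('valid' positions only).

    The responses of the three color channels are summed into one feature map
    of shape (H - p + 1, W - p + 1).
    """
    H, W, _ = image.shape
    p = feature.shape[0]
    out = np.zeros((H - p + 1, W - p + 1))
    for r in range(H - p + 1):
        for c in range(W - p + 1):
            window = image[r:r + p, c:c + p, :]
            out[r, c] = np.sum(window * feature)   # sums over pixels and channels
    return out

def mean_pool(fmap, pool):
    """Non-overlapping pool x pool mean pooling; any edge remainder is dropped."""
    rows, cols = fmap.shape[0] // pool, fmap.shape[1] // pool
    trimmed = fmap[:rows * pool, :cols * pool]
    return trimmed.reshape(rows, pool, cols, pool).mean(axis=(1, 3))
```

Repeating this for every basic image feature and concatenating the pooled maps gives the feature vector that is passed to the classifier.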

步骤5，基于有标签训练数据集训练Softmax分类器。Step 5, train the Softmax classifier based on the labeled training data set.

本步骤进一步包括子步骤：This step further includes sub-steps:

步骤5.1，构建Softmax分类器训练样本集。Step 5.1, construct the training sample set of Softmax classifier.

将卷积与池化后的有标签训练数据组成训练样本集{(x¹,y¹),(x²,y²),...,(x^K,y^K)}，K为有标签训练样本数量，xⁱ表示第i个训练样本，即卷积与池化后的有标签训练数据，yⁱ为训练样本xⁱ对应的分类标记。设Softmax分类器用于解决k分类问题，则yⁱ∈{1,2,...,k}。The labeled training data after convolution and pooling form a training sample set {(x ¹ ,y ¹ ),(x ² ,y ² ),...,(x ^K ,y ^K )}, K is labeled The number of training samples, ^xi represents the i-th training sample, that is, the labeled training data after convolution and pooling, and y ⁱ is the classification label corresponding to the training sample ^xi . Suppose the Softmax classifier is used to solve the k classification problem, then y ⁱ ∈ {1,2,...,k}.

步骤5.2，构造Softmax分类器回归模型。Step 5.2, construct the Softmax classifier regression model.

设θ为模型参数，为待估值； $h_{θ} (x^{i}) = [\begin{matrix} p (y^{i} = 1 | x^{i}; θ) \\ p (y^{i} = 2 | x^{i}; θ) \\ . \\ . \\ . \\ p (y^{i} = k | x^{i}; θ) \end{matrix}]$ 是模型参数θ的估值函数，其中估值函数h_θ(xⁱ)的代价函数J(θ)为：Let θ be the model parameter, which is to be estimated; $h_{θ} (x^{i}) = [\begin{matrix} p ({the y}^{i} = 1 | x^{i}; θ) \\ p ({the y}^{i} = 2 | x^{i}; θ) \\ . \\ . \\ . \\ p ({the y}^{i} = k | x^{i}; θ) \end{matrix}]$ is the evaluation function of the model parameter θ, where The cost function J(θ) of the evaluation function h _θ ( ^xi ) is:

$J(θ) = -\frac{1}{K} \left[ \sum_{i=1}^{K} \sum_{j=1}^{k} f(y^{i} = j) \log h_{θ}(x^{i}) \right] \qquad (8)$

Here f(yⁱ=j) is an indicator function taking the value 0 or 1: if the label of the i-th training sample xⁱ is class j, then f(yⁱ=j)=1; otherwise f(yⁱ=j)=0. In the (i, j) term of formula (8), log h_θ(xⁱ) is read as the logarithm of the j-th component of h_θ(xⁱ), that is, log p(yⁱ=j|xⁱ;θ).
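The cost function (8) and its indicator function can be sketched in NumPy as below. The text does not spell out how p(yⁱ=j|xⁱ;θ) is computed, so the standard softmax form exp(θ_jᵀx)/Σ_l exp(θ_lᵀx) is assumed here. The function names and the tiny data set are likewise hypothetical.

```python
import numpy as np

def log_hypothesis(theta, X):
    """Row i holds log p(y^i = j | x^i; theta) for j = 1..k (assumed softmax form)."""
    scores = X @ theta.T                                   # shape K x k
    scores = scores - scores.max(axis=1, keepdims=True)    # numerical stability
    return scores - np.log(np.exp(scores).sum(axis=1, keepdims=True))

def indicator(y, k):
    """f(y^i = j) as a K x k matrix of 0/1 values; labels y take values 1..k."""
    return (y[:, None] == np.arange(1, k + 1)[None, :]).astype(float)

def softmax_cost(theta, X, y):
    """J(theta) of formula (8): negative mean log-probability of the true labels."""
    K = X.shape[0]
    k = theta.shape[0]
    return -np.sum(indicator(y, k) * log_hypothesis(theta, X)) / K

# Three samples with two pooled features each, one sample per class.
X = np.array([[1.0, 0.5], [0.2, 1.5], [1.2, 0.1]])
y = np.array([1, 2, 3])
theta = np.zeros((3, 2))          # one parameter row per class
cost_at_zero = softmax_cost(theta, X, y)
```

With θ set to zero every class receives probability 1/k, so the cost equals log k. This gives a quick sanity check before any training.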

Step 5.3, define the gradient of the cost function with respect to the model parameter θ:

$\nabla_{θ_{j}} J(θ) = -\frac{1}{K} \sum_{i=1}^{K} \left[ x^{i} \left( f(y^{i} = j) - p(y^{i} = j \mid x^{i}; θ) \right) \right] \qquad (9)$
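One way to gain confidence in gradient formula (9) is to compare it against a finite-difference approximation of cost (8). The sketch below does this on random data. It assumes the standard softmax form for p(yⁱ=j|xⁱ;θ), and all function names are illustrative rather than taken from the patent.

```python
import numpy as np

def log_probabilities(theta, X):
    """log p(y^i = j | x^i; theta) for every sample i and class j (assumed softmax form)."""
    scores = X @ theta.T
    scores = scores - scores.max(axis=1, keepdims=True)
    return scores - np.log(np.exp(scores).sum(axis=1, keepdims=True))

def cost(theta, X, y):
    """Cost J(theta) of formula (8)."""
    k = theta.shape[0]
    F = (y[:, None] == np.arange(1, k + 1)).astype(float)   # indicator f(y^i = j)
    return -np.sum(F * log_probabilities(theta, X)) / X.shape[0]

def gradient(theta, X, y):
    """Formula (9): row j is the gradient of J with respect to theta_j."""
    k = theta.shape[0]
    F = (y[:, None] == np.arange(1, k + 1)).astype(float)
    P = np.exp(log_probabilities(theta, X))
    return -(F - P).T @ X / X.shape[0]

def numerical_gradient(theta, X, y, eps=1e-6):
    """Central finite differences of the cost, one parameter at a time."""
    grad = np.zeros_like(theta)
    for index in np.ndindex(*theta.shape):
        step = np.zeros_like(theta)
        step[index] = eps
        grad[index] = (cost(theta + step, X, y) - cost(theta - step, X, y)) / (2 * eps)
    return grad

rng = np.random.default_rng(0)
X = rng.normal(size=(6, 4))                 # 6 samples, 4 features
y = np.array([1, 2, 3, 1, 2, 3])            # labels in {1, 2, 3}
theta = rng.normal(size=(3, 4))
max_difference = np.max(np.abs(gradient(theta, X, y) - numerical_gradient(theta, X, y)))
```

A small `max_difference` indicates that the analytic gradient and the cost function agree.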

Formula (9) gives the iteration rule for the model parameter θ. In this embodiment, the LBFGS parameter training algorithm is used to solve for θ iteratively according to the iteration rule of formula (9). The value of θ at which the iteration converges, or at which the maximum number of iterations is reached, is taken as the optimal solution of the Softmax classifier regression model parameter θ. Obtaining θ completes the training of the Softmax classifier.
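A training loop of this kind might look like the sketch below, which uses SciPy's `minimize` with `method="L-BFGS-B"` (the bound-constrained L-BFGS variant, run here without bounds) as a stand-in for the LBFGS solver named in the text. The synthetic three-class data, the appended bias column, the softmax form of the probabilities, and the iteration limit of 200 are all assumptions made for illustration.

```python
import numpy as np
from scipy.optimize import minimize

def cost_and_gradient(theta_flat, X, y, k):
    """Return J(theta) from formula (8) and its gradient from formula (9)."""
    K, n = X.shape
    theta = theta_flat.reshape(k, n)
    scores = X @ theta.T
    scores = scores - scores.max(axis=1, keepdims=True)
    log_P = scores - np.log(np.exp(scores).sum(axis=1, keepdims=True))
    F = (y[:, None] == np.arange(1, k + 1)).astype(float)   # indicator f(y^i = j)
    cost = -np.sum(F * log_P) / K
    grad = -(F - np.exp(log_P)).T @ X / K
    return cost, grad.ravel()

# Synthetic stand-in for pooled features: three clusters of 30 samples each.
rng = np.random.default_rng(0)
centers = np.array([[0.0, 0.0], [5.0, 5.0], [-5.0, 5.0]])
y = np.repeat(np.arange(1, 4), 30)
X = centers[y - 1] + rng.normal(size=(90, 2))
X = np.hstack([X, np.ones((90, 1))])        # bias column (an added assumption)

k = 3
theta0 = np.zeros(k * X.shape[1])
initial_cost, _ = cost_and_gradient(theta0, X, y, k)

# jac=True tells SciPy that the objective returns (cost, gradient).
result = minimize(cost_and_gradient, theta0, args=(X, y, k),
                  method="L-BFGS-B", jac=True, options={"maxiter": 200})

theta = result.x.reshape(k, X.shape[1])
predicted = np.argmax(X @ theta.T, axis=1) + 1
accuracy = np.mean(predicted == y)
```

The solver stops either on convergence or at `maxiter`, mirroring the two stopping conditions described above. `result.x` is then taken as the trained parameter θ.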

After steps 1 to 5 are completed, the image to be recognized is convolved with the basic image feature set learned by the sparse self-encoding deep neural network and then pooled. The convolved and pooled image is input into the trained Softmax classifier to obtain the classification result, that is, whether the image to be recognized shows fire, a red flower, red leaves or a red flag, or shows smoke, fog, cloud or haze.
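The final recognition step can be illustrated with the sketch below, which maps one pooled feature vector to a class name using an already trained parameter matrix. The label ordering in `FIRE_GROUP_LABELS`, the hand-picked values of `theta`, the three-element feature vector, and the softmax form of the probabilities are hypothetical placeholders, not values from the patent.

```python
import numpy as np

# Hypothetical label order for the fire-like group named in the text.
FIRE_GROUP_LABELS = ["fire", "red flower", "red leaf", "red flag"]

def classify(theta, pooled_features, class_names):
    """Return the most probable class name and the full probability vector."""
    scores = theta @ pooled_features
    scores = scores - scores.max()                 # numerical stability
    probabilities = np.exp(scores) / np.exp(scores).sum()
    best = int(np.argmax(probabilities))
    return class_names[best], probabilities

# Placeholder parameters: one row per class, one column per pooled feature.
theta = np.array([[2.0, 0.0, 0.0],
                  [0.0, 2.0, 0.0],
                  [0.0, 0.0, 2.0],
                  [-1.0, -1.0, -1.0]])
features = np.array([0.1, 0.9, 0.2])               # pooled features of one image
label, probs = classify(theta, features, FIRE_GROUP_LABELS)
```

In a real pipeline, `features` would be the flattened output of the convolution and pooling stage, and `theta` would come from the training step.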

The specific embodiments described herein merely illustrate the spirit of the present invention. Those skilled in the art to which the invention belongs may make various modifications or additions to the described embodiments, or substitute them in similar ways, without departing from the spirit of the invention or exceeding the scope defined by the appended claims.

Claims

1. A firework recognition method based on image deep learning, characterized by comprising the steps of:

Step 1, collecting a sample image set, including (1) an unlabeled sample image set composed of target images and images of target-like objects that have not been classified and labeled, and (2) a labeled sample image set composed of target images and images of target-like objects that have been classified and labeled;

Step 2, randomly obtain unit image blocks from the unlabeled sample image set and the labeled sample image set, respectively, to form an unlabeled training data set and a labeled training data set;

Step 3, performing whitening preprocessing on the training data in the unlabeled training data set and the labeled training data set, wherein the training data is the color value matrix of the RGB three-color channel corresponding to the unit image block;

Step 4, based on the unlabeled training data after whitening preprocessing, use unsupervised learning to construct a deep neural network based on sparse self-encoding, and extract the basic image feature set of the unlabeled training data;

Step 5, convolving the basic image features of the unlabeled training data with image data and pooling the result, the image data including the labeled training data and the image to be recognized;

Step 6, training the Softmax classifier based on the labeled training data set after convolution and pooling;

Step 7, input the image to be recognized after convolution and pooling into the trained Softmax classifier to obtain the recognition result.

2. The firework recognition method based on image deep learning as claimed in claim 1, characterized in that:

The whitening preprocessing described in step 3 is ZCA whitening preprocessing or PCA whitening preprocessing.

3. The firework recognition method based on image deep learning as claimed in claim 1, characterized in that:

Step 4 further includes sub-steps:

4.1 Construct a deep neural network, including a single input layer, multiple hidden layers and a single output layer;

4.2 Use the unlabeled training data after whitening preprocessing as the input and output of the deep neural network, and perform unsupervised learning by training the deep neural network based on sparse self-encoding;

4.3 Extract the basic image feature set of unlabeled training data based on the trained deep neural network.

4. The firework recognition method based on image deep learning as claimed in claim 3, characterized in that:

The unsupervised learning by training the deep neural network based on sparse self-encoding described in sub-step 4.2 specifically comprises:

4.2.1 Obtain the weighted sum of each neuron's input values and the neuron's output value;

4.2.2 Set the target cost function with sparsity restriction;

4.2.3 Set the gradient descent direction of the weight coefficient vector and bias item vector of the deep neural network, that is, the iteration rule;

4.2.4 Use the LBFGS parameter training algorithm to iteratively solve the weight coefficient vector and bias item vector according to the set iteration rules.

5. The firework recognition method based on image deep learning as claimed in claim 1, characterized in that:

Step 5 further includes sub-steps:

5.1 Convolve the basic image features of the unlabeled training data with each color channel of the image data to obtain a convolution image;

5.2 Using the statistical characteristics of local areas in natural images, the feature dimensionality reduction of convolutional images is realized through mean pooling.

6. The firework recognition method based on image deep learning as claimed in claim 1, characterized in that:

Step 6 further includes sub-steps:

6.1 Use the labeled training data set after convolution and pooling as the training sample;

6.2 Construct a Softmax classifier regression model;

6.3 Set the gradient of the regression model parameters, that is, the iteration rule;

6.4 Use the LBFGS parameter training algorithm to iteratively solve the model parameters according to the set iteration rule.

7. A firework recognition system based on image deep learning, characterized by comprising:

The sample image collection module is used to collect a sample image set, including (1) an unlabeled sample image set composed of target images and images of target-like objects that have not been classified and labeled, and (2) a labeled sample image set composed of target images and images of target-like objects that have been classified and labeled;

The training data acquisition module is used to randomly obtain unit image blocks from the unlabeled sample image set and the labeled sample image set respectively to form an unlabeled training data set and a labeled training data set;

The whitening preprocessing module is used to carry out whitening preprocessing to the training data in the unlabeled training data set and the training data set with labels, and the described training data is the color value matrix of the RGB three-color channel corresponding to the unit image block;

The unsupervised learning module is used to construct a deep neural network based on sparse self-encoding by using unsupervised learning based on the unlabeled training data after whitening preprocessing, and extract the basic image feature set of the unlabeled training data;

The convolution and pooling module is used to convolve the basic image features of the unlabeled training data with image data and to pool the result, the image data including the labeled training data and the image to be recognized;

The classifier training module is used to train the Softmax classifier based on the labeled training data set after convolution and pooling;

The identification module is used to input the image to be identified after convolution and pooling into the trained Softmax classifier to obtain the identification result.