CN110046599A - Intelligent monitoring method based on deep fusion neural network pedestrian re-identification technology - Google Patents
Intelligent monitoring method based on deep fusion neural network pedestrian re-identification technology
- Publication number
- CN110046599A (application number CN201910330924.0A)
- Authority
- CN
- China
- Prior art keywords
- neural network
- pedestrian
- depth
- training
- image
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/213—Feature extraction, e.g. by transforming the feature space; Summarisation; Mappings, e.g. subspace methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/25—Fusion techniques
- G06F18/253—Fusion techniques of extracted features
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/40—Extraction of image or video features
- G06V10/56—Extraction of image or video features relating to colour
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/10—Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
Abstract
The present invention provides an intelligent monitoring method based on deep fusion neural network pedestrian re-identification technology. After pre-processing the acquired images with color enhancement, the method extracts traditional hand-crafted features from them and extracts deep features with a deep residual convolutional neural network; the dimension-reduced hand-crafted features are then fused with the deep features extracted by the fully trained neural network to complete identification of the target pedestrian. Compared with the prior art, the recognition precision is greatly improved: the pedestrian re-identification algorithm included in the invention raises the recognition success rate to 81.74%, making the technology fully practical. The re-identification process is completed automatically and does not lose details because of operator fatigue.
Description
Technical field
The present invention relates to the field of intelligent monitoring technology, and in particular to an intelligent monitoring method based on deep fusion neural network pedestrian re-identification technology.
Background art
Pedestrian re-identification, also referred to as person re-identification (re-ID), is the technique of using computer vision to judge whether a specific pedestrian is present in an image or a video sequence. Because of the differences between camera devices, and because a pedestrian is both rigid and deformable and its appearance is easily affected by clothing, scale, occlusion, pose, viewing angle and so on, pedestrian re-identification is a hot topic in the computer vision field that has both research value and great challenge.
At present there are mainly two kinds of research methods for pedestrian re-identification: methods based on hand-crafted features and methods based on deep features. Methods based on hand-crafted features usually look for a feature with good robustness to cope with the influence of illumination and viewpoint changes, and are mostly combined with metric learning; methods based on deep features instead focus on learning from the training data an adaptive structure matched to the labels, thereby realizing end-to-end identification of the target pedestrian. In recent years, the deep features extracted by convolutional neural networks have been proved to have good robustness. Since their introduction to the re-identification field in 2014, deep feature learning and deep distance metric learning based on convolutional neural networks have become the mainstream of pedestrian re-identification. Most subsequent re-identification algorithms improve on the network structure, including the cross neural network based on triplet loss proposed by Weihua Chen in 2017, the singular value neural network (SVDNet) based on singular value decomposition proposed by Yifan Sun in 2017, and SpindleNet, which is based on human-joint structure features. These algorithms perform well, but their robustness still leaves room for improvement. In 2016, Zheng Weishi proposed re-identification by fusing hand-crafted features and deep features. That algorithm derives the back-propagation of the loss function with respect to each parameter and demonstrates the constraint that traditional hand-crafted features impose on the neural network parameters, but the network is difficult to train in practical applications, its convergence speed is hard to guarantee, and the coupling between the hand-crafted features and the deep features is very weak.
Summary of the invention
In view of the technical problems set forth above, the present invention provides an intelligent monitoring method based on deep fusion neural network pedestrian re-identification technology. The invention starts from both hand-crafted features and deep features in order to effectively improve the matching rate. The technical means adopted by the invention are as follows:
As shown in Figure 1, an intelligent monitoring method based on deep fusion neural network pedestrian re-identification technology includes the following steps:
S1, acquiring images frame by frame from the surveillance video;
S2, pre-processing the image with color enhancement, then sending it into the local-maximal-occurrence structure that extracts hand-crafted features and the neural network that extracts deep features;
S3A, extracting the traditional hand-crafted features, specifically: extracting a feature vector with a local-max-pooling algorithm, traversing the whole picture with a grid of preset scale at a fixed step, extracting color features and texture features from the image inside each grid cell, and then applying dimensionality reduction to the entire feature vector;
S3B, extracting deep features from the image using a deep residual convolutional neural network and Gaussian pooling;
S4, fusing the dimension-reduced traditional hand-crafted features with the deep features extracted by the fully trained neural network, completing identification of the target pedestrian;
S5, tracking the recognized pedestrian, repeating the above procedure for re-identification whenever the tracking confidence falls below a certain threshold, and at the same time marking and displaying the recognition results in a preset style.
Further, the color-enhancement pre-processing specifically comprises: using a multi-scale image enhancement algorithm and applying Gaussian parameters of three different scales to color-enhance the acquired image.
Further, in the extraction of the traditional hand-crafted features, the extraction of color features comprises: converting the pre-processed image from RGB into the HSV color space and obtaining the color histogram of the image on that basis; the texture features are extracted with scale-invariant ternary-pattern coding (SILTP). The dimensionality reduction specifically comprises: using Gaussian pooling, cutting the data into blocks of a preset size, and then computing the origin moment and the central moment of each block to represent the data in that block, thereby reducing the dimensionality of the entire feature vector.
Further, S3B specifically comprises: the deep fusion neural network randomly crops the pre-processed image, then resizes it to a preset size and passes it into the deep neural network ResNet50; the input picture passes in turn through a convolutional layer, a local normalization layer, a rectified linear activation function and max pooling, and then enters the convolution modules for dimensionality reduction; after every dimension-reduction module, the feature map shrinks by a factor of 2 in each of its two dimensions.
Further, S4 specifically comprises: connecting the outputs of the two feature vectors with a 4096-dimensional fully connected fusion layer, then applying operations such as regularization, batch normalization and non-linear activation, and finally attaching a classifier, where a softmax function receives the result and gives the predicted value, completing identification of the target pedestrian.
Further, step S4 specifically comprises the following steps:
S41, fusing the dimension-reduced traditional hand-crafted features with the deep features extracted by the fully trained neural network, the total feature after fusion being expressed as:

F_z = [F_h, F_d]

where F_d is the deep feature vector obtained from the ResNet50 model and F_h is the traditional hand-crafted feature after Gaussian-pooling dimensionality reduction;
S42, connecting the features in the splicing layer, followed in turn by batch normalization and a non-linear activation function, and then the connection to the classifier, expressed as:

F_f = h\left(W_f^{T}\,\frac{F_z-\mu_z}{\sigma_z}+b_f\right)

where h(·) is the activation function, W_f and b_f are the weight coefficients and offset vector of the connection layer, and μ_z and σ_z are the mean and variance statistics of the spliced feature. The classifier is then attached; here a softmax structure is used as the classification layer, specifically:

p_k = \frac{e^{\theta_k^{T} x}}{\sum_j e^{\theta_j^{T} x}}

where x is the vector formed by the output of the previous network layer and θ_k is the parameter vector of class k obtained by training; the denominator exists in order to normalize the predicted output.
A cross-entropy loss function is used; denoting the cross-entropy loss by J, we have

J = -\sum_k q_k \log p_k

where q_k is the one-hot ground-truth label and p_k is the softmax output corresponding to the k-th output (which can also be understood as the probability that the neural network predicts class k).
Further, the model is trained with a gradient-freeze training method, which specifically comprises the following steps:
training the ResNet50 deep residual neural network together with the correspondingly connected 2048-dimensional fully connected classifier on the data set;
after training the model to convergence, importing the parameters of the trained ResNet50 into the deep fusion neural network, reconnecting a new fully connected classifier and softmax network, and training again on the same data set; during this training, a lower learning rate is set for the deep-feature-extraction network parameters initialized from the pre-trained model, and training focuses on the weight parameters of the fusion stage and the classifier, finally completing the training of the whole fusion network.
Further, the data set is specifically the Market1501 pedestrian re-identification data set.
The invention has the following advantages:
1. Compared with the prior art, the recognition precision of the invention is greatly improved: the accuracy of the deep fusion neural network that combines the deep feature (ResNet50) with the hand-crafted feature (LOMO) is about 30% higher than using ResNet50 alone, and the pedestrian re-identification algorithm included in the invention raises the recognition success rate to 81.74%, making the technology fully practical.
2. The gradient-freeze training method of the invention shortens training by about 10 epochs, so that the time disadvantage of the fused neural network during training is eased, and the convergence speed is improved while the accuracy rate is guaranteed.
3. On the basis of manual monitoring, the invention only requires installing a program that can be operated with a mouse; the user does not need to be concerned with the algorithmic details inside the technology, and pedestrians are identified and tracked automatically, so the invention is relatively simple to use.
4. The degree of automation in the security field is improved. Conventional security monitoring needs a large amount of manpower for visual inspection, and some case investigations even require staring at screens; this technology automates that process, all steps are completed automatically, and no details are lost because of personnel fatigue.
For the above reasons, the present invention can be widely popularized in the field of intelligent monitoring technology.
Description of the drawings
In order to explain the embodiments of the invention or the technical solutions in the prior art more clearly, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings described below are some embodiments of the invention, and those of ordinary skill in the art can obtain other drawings from them without any creative labor.
Fig. 1 is a flow chart of the intelligent monitoring method based on deep fusion neural network pedestrian re-identification technology of the present invention.
Fig. 2 is the specific design flow chart of LOMO and ResNet50 in an embodiment of the present invention.
Fig. 3 is a detail diagram of the deep fusion in an embodiment of the present invention.
Fig. 4 is a schematic diagram of the advantage of the gradient-freeze training method in an embodiment of the present invention.
Fig. 5 is a diagram comparing the detection of an embodiment of the present invention with other methods.
Specific embodiment
In order to make the objects, technical solutions and advantages of the embodiments of the invention clearer, the technical solutions in the embodiments of the invention are described clearly and completely below in conjunction with the accompanying drawings. Obviously, the described embodiments are only a part of the embodiments of the invention, not all of them. Based on the embodiments of the present invention, all other embodiments obtained by those of ordinary skill in the art without creative work shall fall within the protection scope of the present invention.
As shown in Figure 1, this embodiment provides an intelligent monitoring method based on deep fusion neural network pedestrian re-identification technology, which includes the following steps:
S1, acquiring images frame by frame from the surveillance video; specifically, reading the video stream from the camera and obtaining the information frame by frame. The Python bindings of OpenCV are used to read the video stream and obtain the image tensor of each frame, and the data are kept for later use.
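A minimal sketch of this acquisition step, assuming a standard OpenCV installation (the source path and the decision to buffer whole frames in memory are illustrative):

```python
import cv2

cap = cv2.VideoCapture("surveillance.mp4")  # or a camera index such as 0

frames = []
while cap.isOpened():
    ok, frame = cap.read()   # frame: H x W x 3 BGR array, the per-frame image tensor
    if not ok:               # end of stream or read failure
        break
    frames.append(frame)     # keep the data ready for the later steps
cap.release()
```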
S2, pre-processing the image with color enhancement, then sending it into the local-maximal-occurrence structure that extracts hand-crafted features and the neural network that extracts deep features. The color-enhancement pre-processing specifically comprises: using a multi-scale image enhancement algorithm and applying Gaussian parameters of three different scales to color-enhance the acquired image.
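The patent does not spell out the enhancement formula; a common way to realize "three Gaussian scales" is a multi-scale-retinex-style enhancement, sketched below under that assumption (the sigma values 15/80/250 are illustrative):

```python
import cv2
import numpy as np

def msr_enhance(img, sigmas=(15, 80, 250)):
    """Multi-scale color enhancement with three Gaussian surround scales."""
    img = img.astype(np.float32) + 1.0               # avoid log(0)
    out = np.zeros_like(img)
    for sigma in sigmas:
        blur = cv2.GaussianBlur(img, (0, 0), sigma)  # surround estimate at this scale
        out += np.log(img) - np.log(blur)            # retinex: log(image) - log(surround)
    out /= len(sigmas)
    # stretch back to a displayable 8-bit range
    out = (out - out.min()) / (out.max() - out.min() + 1e-8)
    return (out * 255).astype(np.uint8)
```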
S3A, extracting the traditional hand-crafted features, specifically: extracting a feature vector with a local-max-pooling algorithm, traversing the whole picture with a grid of preset scale at a fixed step, extracting color features and texture features from the image inside each grid cell, and then applying dimensionality reduction to the entire feature vector. In the extraction of the traditional hand-crafted features, the extraction of color features comprises converting the pre-processed image from RGB into the HSV color space and obtaining the color histogram of the image on that basis; the texture features are extracted with scale-invariant ternary-pattern coding (SILTP). The dimensionality reduction specifically comprises: using Gaussian pooling, cutting the data into blocks of a preset size, and then computing the origin moment and the central moment of each block to represent the data in that block, thereby reducing the dimensionality of the entire feature vector.
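A compact sketch of the per-cell feature extraction, assuming an HSV histogram for color and a simplified 4-neighbour scale-invariant local ternary pattern (SILTP) for texture; the bin counts and the tolerance tau are illustrative, and a full LOMO implementation would additionally take local maximal occurrences over the grid cells:

```python
import cv2
import numpy as np

def cell_color_hist(cell_bgr, bins=8):
    """HSV color histogram of one grid cell."""
    hsv = cv2.cvtColor(cell_bgr, cv2.COLOR_BGR2HSV)
    hist = cv2.calcHist([hsv], [0, 1, 2], None, [bins] * 3,
                        [0, 180, 0, 256, 0, 256]).flatten()
    return hist / (hist.sum() + 1e-8)

def cell_siltp(gray, tau=0.3):
    """Simplified SILTP: ternary-code each pixel against its 4 neighbours."""
    c = gray[1:-1, 1:-1].astype(np.float32)
    code = np.zeros_like(c)
    neighbours = [gray[:-2, 1:-1], gray[2:, 1:-1], gray[1:-1, :-2], gray[1:-1, 2:]]
    for k, n in enumerate(neighbours):
        n = n.astype(np.float32)
        # 1 if brighter than (1+tau)*centre, 2 if darker than (1-tau)*centre, else 0
        t = np.where(n > (1 + tau) * c, 1, np.where(n < (1 - tau) * c, 2, 0))
        code += t * (3 ** k)                     # base-3 encoding of the 4 ternary digits
    hist, _ = np.histogram(code, bins=81, range=(0, 81))
    return hist / (hist.sum() + 1e-8)
```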
S3B, extracting deep features from the image using a deep residual convolutional neural network and Gaussian pooling. Specifically, the deep fusion neural network randomly crops the pre-processed image, then resizes it to a preset size and passes it into the deep neural network ResNet50; the input picture passes in turn through a convolutional layer, a local normalization layer, a rectified linear activation function and max pooling, and then enters the convolution modules for dimensionality reduction; after every dimension-reduction module, the feature map shrinks by a factor of 2 in each of its two dimensions.
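A sketch of this deep branch, assuming the torchvision ResNet50 with ImageNet weights stands in for the trained backbone (the crop size and normalization statistics are illustrative):

```python
import torch
import torchvision.transforms as T
from torchvision.models import resnet50

# Backbone with the classification head removed: output is a 2048-d vector.
backbone = torch.nn.Sequential(*list(resnet50(pretrained=True).children())[:-1])
backbone.eval()

# Random crop then resize to the preset 256x128 input, as described above.
preprocess = T.Compose([
    T.RandomCrop((120, 56), pad_if_needed=True),   # illustrative crop size
    T.Resize((256, 128)),
    T.ToTensor(),
    T.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225]),
])

def deep_feature(pil_image):
    x = preprocess(pil_image).unsqueeze(0)         # 1 x 3 x 256 x 128
    with torch.no_grad():
        f = backbone(x)                            # 1 x 2048 x 1 x 1 after global pooling
    return f.flatten(1)                            # 1 x 2048 deep feature F_d
```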
As shown in Figure 2 and Figure 3: S4, fusing the dimension-reduced traditional hand-crafted features with the deep features extracted by the fully trained neural network, completing identification of the target pedestrian. Specifically, the outputs of the two feature vectors are connected with a 4096-dimensional fully connected fusion layer, operations such as regularization, batch normalization and non-linear activation are then applied, and finally a classifier is attached; a softmax function receives the result and gives the predicted value, completing identification of the target pedestrian.
S5, tracking the recognized pedestrian, repeating the above procedure for re-identification whenever the tracking confidence falls below a certain threshold, and at the same time marking and displaying the recognition results in a preset style. Specifically, the pedestrian obtained by the identification is tracked with the SSD algorithm, and re-identification with the above procedure is performed again only when the tracking confidence falls below a certain threshold; the results identified during this process are outlined with rectangular boxes. The user interface is written with PyQt5 and compiled into an operating-system executable file (such as an exe file) for the user's convenience.
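The control flow of S5 can be illustrated as below; the four callables are hypothetical stand-ins for the SSD detector, the fused re-identification network, the tracker and the PyQt5 display, and the threshold value is an assumption:

```python
CONF_THRESHOLD = 0.5  # assumed tracking-confidence threshold

def monitor(frames, target_feature, detect, reid_match, update_tracker, draw):
    """Track the identified pedestrian; fall back to full re-identification
    (steps S1-S4) whenever the tracking confidence drops below the threshold."""
    track = None
    for frame in frames:
        if track is None or track.confidence < CONF_THRESHOLD:
            boxes = detect(frame)                         # SSD pedestrian detections
            track = reid_match(frame, boxes, target_feature)
        else:
            track = update_tracker(frame, track)
        draw(frame, track.box)                            # preset-style rectangle marking
```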
Step S4 specifically comprises the following steps:
S41, fusing the dimension-reduced traditional hand-crafted features with the deep features extracted by the fully trained neural network, the total feature after fusion being expressed as:

F_z = [F_h, F_d]

where F_d is the deep feature vector obtained from the ResNet50 model and F_h is the traditional hand-crafted feature after Gaussian-pooling dimensionality reduction;
S42, connecting the features in the splicing layer, followed in turn by batch normalization and a non-linear activation function, and then the connection to the classifier, expressed as:

F_f = h\left(W_f^{T}\,\frac{F_z-\mu_z}{\sigma_z}+b_f\right)

where h(·) is the activation function, W_f and b_f are the weight coefficients and offset vector of the connection layer, and μ_z and σ_z are the mean and variance statistics of the spliced feature. The classifier is then attached; here a softmax structure is used as the classification layer, specifically:

p_k = \frac{e^{\theta_k^{T} x}}{\sum_j e^{\theta_j^{T} x}}

where x is the vector formed by the output of the previous network layer and θ_k is the parameter vector of class k obtained by training; the denominator exists in order to normalize the predicted output.
A cross-entropy loss function is used; denoting the cross-entropy loss by J, we have

J = -\sum_k q_k \log p_k

where q_k is the one-hot ground-truth label and p_k is the softmax output corresponding to the k-th output (which can also be understood as the probability that the neural network predicts class k).
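A small numerical check of the softmax and cross-entropy definitions above (the logit values are illustrative):

```python
import numpy as np

def softmax(logits):
    """p_k = exp(theta_k^T x) / sum_j exp(theta_j^T x), numerically stabilized."""
    z = logits - logits.max()
    e = np.exp(z)
    return e / e.sum()

def cross_entropy(p, true_k):
    """J = -log p_k for the one-hot ground-truth class k."""
    return -np.log(p[true_k] + 1e-12)

logits = np.array([2.0, 0.5, -1.0])
p = softmax(logits)            # approx [0.786, 0.175, 0.039]; sums to 1
loss = cross_entropy(p, 0)     # small loss: class 0 is confidently predicted
```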
As a preferred embodiment, the model is trained with a gradient-freeze training method, which specifically comprises the following steps:
training the ResNet50 deep residual neural network together with the correspondingly connected 2048-dimensional fully connected classifier on the data set;
after training the model to convergence, importing the parameters of the trained ResNet50 into the deep fusion neural network, reconnecting a new fully connected classifier and softmax network, and training again on the same data set; during this training, a lower learning rate is set for the deep-feature-extraction network parameters initialized from the pre-trained model, and training focuses on the weight parameters of the fusion stage and the classifier, finally completing the training of the whole fusion network.
The data set is specifically the Market1501 pedestrian re-identification data set. The model herein is trained (train), tested (test) and validated (validation) with the Market1501 data set. Market1501 was released by Liang Zheng in 2015 and is currently the largest pedestrian re-identification data set. It was captured in front of a supermarket at Tsinghua University by 6 different cameras, with 38,195 pedestrian pictures in total involving 1,501 pedestrians; the training set contains 12,936 images from 751 pedestrians, and the test set contains 19,732 images from the other 750 pedestrians.
Embodiment 1:
S3B: for an original image of size 128*64, the DFNN first crops it randomly, then resizes it to 256*128 and passes it into the deep neural network ResNet50. The input picture passes in turn through the convolutional layer, the local normalization layer, the rectified linear activation function, max pooling and similar processing, and then enters 16 convolution modules. Although the features extracted by these convolution modules become increasingly complex, the DFNN simultaneously maintains and reduces the feature size, and the convolution modules (conv blocks) are used to reduce the dimensionality of the data. There are 3 further identical dimension-reduction modules below; after each dimension-reduction module, the feature map shrinks by a factor of 2 in each of its two dimensions. Therefore, before the input and the output are connected by the adder, the input needs to be down-sized by a convolutional layer so that the addition can be carried out normally.
S3A: after the pre-processing by the multi-scale image enhancement algorithm, one part of the image can be represented as a locally scale-invariant background-difference encoding, and the other part can be converted into the color features of the HSV space; both kinds of feature descriptors are then extracted with the LOMO algorithm, yielding a 70,770-dimensional hand-crafted feature vector.
The hand-crafted feature dimensionality is 70,770; if it were fed directly into a fully connected layer, the number of parameters would be enormous. Even with traditional dimensionality-reduction means it is difficult to guarantee an accurate characterization of the entire traditional feature, so a reasonable and reliable characterization of the data is very necessary. Considering that the LOMO algorithm used when extracting the traditional features already uses the maximal component to characterize the distribution of all components, max pooling is not used again here; instead, average pooling is applied to blocks of a certain length. To further guarantee that the main information is not lost, the feature data are assumed here to approximately follow a one-dimensional Gaussian distribution, so the expectation and the variance are borrowed to characterize the local distribution: every 70 values are divided into one group, the Gaussian parameters of each group are computed and then concatenated, and a 2,022-dimensional feature vector is obtained, completing the dimensionality reduction while guaranteeing precision as far as possible.
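A sketch of this Gaussian-pooling reduction, relying only on the figures given above (70,770 values split into 70-value groups gives 1,011 groups, and mean plus variance per group gives the 2,022-dimensional vector):

```python
import numpy as np

def gaussian_pool(feature, group=70):
    """Dimension reduction by Gaussian pooling: each block of `group`
    consecutive values is summarized by its mean and variance."""
    blocks = feature.reshape(-1, group)    # 70770 -> 1011 x 70
    mu = blocks.mean(axis=1)               # expectation of each block
    var = blocks.var(axis=1)               # variance of each block
    return np.concatenate([mu, var])       # 2022-d reduced vector F_h

f_lomo = np.random.rand(70770)             # stand-in for a LOMO descriptor
f_h = gaussian_pool(f_lomo)
assert f_h.shape == (2022,)
```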
S4: the dimension-reduced traditional hand-crafted features and the deep features extracted by the fully trained neural network are fused and connected in a splicing layer of 2022+2048=4070 dimensions in total, realizing strong interaction and tight coupling between the two features.
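A PyTorch sketch of this splicing layer under the dimensions stated above; the choice of ReLU as the non-linear activation and the 751 output identities (the Market1501 training IDs) are assumptions:

```python
import torch
import torch.nn as nn

class FusionHead(nn.Module):
    """Splices the 2022-d hand-crafted and 2048-d deep features (4070-d total),
    then FC(4096) -> batch norm -> non-linear activation -> classifier."""
    def __init__(self, num_ids=751):
        super().__init__()
        self.fc = nn.Linear(2022 + 2048, 4096)
        self.bn = nn.BatchNorm1d(4096)
        self.act = nn.ReLU()
        self.classifier = nn.Linear(4096, num_ids)

    def forward(self, f_h, f_d):
        f_z = torch.cat([f_h, f_d], dim=1)   # F_z = [F_h, F_d]
        x = self.act(self.bn(self.fc(f_z)))
        return self.classifier(x)            # logits; softmax is applied in the loss

head = FusionHead()
logits = head(torch.rand(8, 2022), torch.rand(8, 2048))
loss = nn.CrossEntropyLoss()(logits, torch.randint(0, 751, (8,)))
```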
The final model is obtained after training with the gradient-freeze training method, and it is then tested to assess its superiority. Specifically, the common evaluation method for pedestrian re-identification is the CMC (Cumulative Match Characteristic). This evaluation method regards the pedestrian re-identification task as ranking the selected images by their similarity to the target image. CMC(k) is the frequency with which the correct picture appears among the k most similar pictures in the re-identification task; the frequency with which the computer obtains the correct conclusion at the first attempt is called Rank-1, and Rank-1 can be used to measure the accuracy rate of pedestrian re-identification.
In multi-label image classification, the average precision used in single-label classification cannot simply be used; instead the mAP (mean Average Precision) evaluation method is adopted. This evaluation method computes the average precision (AP, Average Precision) from the precision and the recall, and then averages the APs to obtain the mAP value. In general the mAP value is slightly lower than the Rank-1 value.
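A sketch of how Rank-1 and mAP can be computed from a query-gallery similarity matrix, assuming every query has at least one correct gallery match:

```python
import numpy as np

def rank1_and_map(similarity, gallery_ids, query_ids):
    """similarity: [n_query, n_gallery] scores; returns (Rank-1, mAP)."""
    rank1, aps = 0.0, []
    for i, q in enumerate(query_ids):
        order = np.argsort(-similarity[i])       # rank gallery by descending similarity
        matches = gallery_ids[order] == q
        rank1 += matches[0]                      # correct at the first attempt?
        hits = np.where(matches)[0]
        # precision at each correct hit: (hits so far) / (rank position)
        precisions = [(k + 1) / (pos + 1) for k, pos in enumerate(hits)]
        aps.append(np.mean(precisions))          # AP of this query
    return rank1 / len(query_ids), float(np.mean(aps))
```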
As shown in Figure 5, in each row the image on the left is the probe sample, and all recognition results are arranged in descending order of similarity; a red box indicates a false match and a blue box a correct match. In the DFNN of this application, all results except the 4th are correct matches, whereas SOMAnet is correct only at the 2nd and 7th positions and wrong elsewhere. For the second group of examples, in the DFNN of this application the 2nd, 4th, 7th and 10th results are correct and the others wrong, whereas SOMAnet is correct only at the 6th position and wrong elsewhere. The 751 given pedestrian IDs in the Market1501 data set, 12,936 pictures in total, are used for training, after which the additional 750 pedestrian IDs, 19,732 pictures in total, are used for testing.
In the training process, the detailed procedure of the gradient-freeze training method is as follows:
1) train the ResNet50 deep residual neural network and the correspondingly connected 2048-dimensional fully connected classifier on the Market1501 data set;
2) import the parameters of the trained ResNet50 into the deep fusion neural network, reconnect the new fully connected classifier and softmax network, and train again on the same data set.
As shown in Figure 4, when training with the ResNet50 network alone, the learning rates of the ResNet50 network and of the fully connected classifier of the deep fusion neural network are both set to 0.01; when training the deep fusion network, the learning rate of ResNet50 is lowered to 0.001. The optimization method in this model is mini-batch gradient descent with momentum correction; the momentum value is 0.9 and the weight decay is 0.0005. After the ResNet training is completed it is tested, and after the deep fusion network is subsequently trained a second test is carried out. Experiments show that the Rank-1 obtained using ResNet50 alone is 51%, while the Rank-1 obtained by testing the deep fusion neural network is 82%, an improvement of 31% in precision.
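A sketch of stage 2 of this schedule in PyTorch, reusing the `backbone` and `FusionHead` sketches above (the data loader yielding images, identity labels and pooled LOMO features is assumed):

```python
import torch
from torch.optim import SGD

head = FusionHead()  # newly connected fusion layer + softmax classifier (see above)
optimizer = SGD(
    [
        {"params": backbone.parameters(), "lr": 0.001},  # pre-trained ResNet50 branch, held at a low rate
        {"params": head.parameters(), "lr": 0.01},       # fusion stage and classifier, trained with emphasis
    ],
    momentum=0.9,          # momentum-corrected mini-batch gradient descent
    weight_decay=0.0005,   # weight decay as stated in the text
)
criterion = torch.nn.CrossEntropyLoss()                  # softmax + cross-entropy

for images, labels, f_h in loader:                       # f_h: 2022-d pooled LOMO features
    logits = head(f_h, backbone(images).flatten(1))      # fuse hand-crafted and deep features
    loss = criterion(logits, labels)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```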
The deep fusion neural network algorithm (DFNN) of this application is compared with pedestrian re-identification algorithms of recent years to verify the performance of the algorithm; the test results are shown in Table 1.
Table 1
As shown in Table 1, the Rank-1 of DFNN is 81.74%, higher than any other algorithm in the table. The DFNN model is better than classic algorithms such as LOMO+XQDA because it extracts deep features with ResNet50. In addition, the DFNN model is better than models such as SpindleNet or Triplet CNN because hand-crafted features are imported into the model to constrain the deep neural network. It can be seen that the recognition accuracy after deep fusion is obviously improved.
Finally, it should be noted that the above embodiments are only used to illustrate the technical solutions of the present invention, not to limit them. Although the present invention has been described in detail with reference to the foregoing embodiments, those skilled in the art should understand that they may still modify the technical solutions described in the foregoing embodiments or equivalently replace some or all of the technical features; such modifications or replacements do not remove the essence of the corresponding technical solutions from the scope of the technical solutions of the embodiments of the present invention.
Claims (8)
1. An intelligent monitoring method based on deep fusion neural network pedestrian re-identification technology, characterized by including the following steps:
S1, acquiring images frame by frame from the surveillance video;
S2, pre-processing the image with color enhancement, then sending it into the local-maximal-occurrence structure that extracts hand-crafted features and the neural network that extracts deep features;
S3A, extracting the traditional hand-crafted features, specifically: extracting a feature vector with a local-max-pooling algorithm, traversing the whole picture with a grid of preset scale at a fixed step, extracting color features and texture features from the image inside each grid cell, and then applying dimensionality reduction to the entire feature vector;
S3B, extracting deep features from the image using a deep residual convolutional neural network and Gaussian pooling;
S4, fusing the dimension-reduced traditional hand-crafted features with the deep features extracted by the fully trained neural network, completing identification of the target pedestrian;
S5, tracking the recognized pedestrian, repeating the above procedure for re-identification whenever the tracking confidence falls below a certain threshold, and at the same time marking and displaying the recognition results in a preset style.
2. The intelligent monitoring method based on deep fusion neural network pedestrian re-identification technology according to claim 1, characterized in that the color-enhancement pre-processing specifically comprises: using a multi-scale image enhancement algorithm and applying Gaussian parameters of three different scales to color-enhance the acquired image.
3. The intelligent monitoring method based on deep fusion neural network pedestrian re-identification technology according to claim 2, characterized in that, in the extraction of the traditional hand-crafted features, the extraction of color features comprises: converting the pre-processed image from RGB into the HSV color space and obtaining the color histogram of the image on that basis; the texture features are extracted with scale-invariant ternary-pattern coding; and the dimensionality reduction specifically comprises: using Gaussian pooling, cutting the data into blocks of a preset size, and then computing the origin moment and the central moment of each block to represent the data in that block, thereby reducing the dimensionality of the entire feature vector.
4. The intelligent monitoring method based on deep fusion neural network pedestrian re-identification technology according to claim 2, characterized in that S3B specifically comprises: the deep fusion neural network randomly crops the pre-processed image, then resizes it to a preset size and passes it into the deep neural network ResNet50; the input picture passes in turn through a convolutional layer, a local normalization layer, a rectified linear activation function and max pooling, and then enters the convolution modules for dimensionality reduction; and after every dimension-reduction module, the feature map shrinks by a factor of 2 in each of its two dimensions.
5. The intelligent monitoring method based on deep fusion neural network pedestrian re-identification technology according to any one of claims 1 to 4, characterized in that S4 specifically comprises: connecting the outputs of the two feature vectors with a 4096-dimensional fully connected fusion layer, then applying operations such as regularization, batch normalization and non-linear activation, and finally attaching a classifier, where a softmax function receives the result and gives the predicted value, completing identification of the target pedestrian.
6. The intelligent monitoring method based on deep fusion neural network pedestrian re-identification technology according to claim 5, characterized in that step S4 specifically comprises the following steps:
S41, fusing the dimension-reduced traditional hand-crafted features with the deep features extracted by the fully trained neural network, the total feature after fusion being expressed as:

F_z = [F_h, F_d]

where F_d is the deep feature vector obtained from the ResNet50 model and F_h is the traditional hand-crafted feature after Gaussian-pooling dimensionality reduction;
S42, connecting the features in the splicing layer, followed in turn by batch normalization and a non-linear activation function, and then the connection to the classifier, expressed as:

F_f = h\left(W_f^{T}\,\frac{F_z-\mu_z}{\sigma_z}+b_f\right)

where h(·) is the activation function, W_f and b_f are the weight coefficients and offset vector of the connection layer, and μ_z and σ_z are the mean and variance statistics of the spliced feature; the classifier is then attached, a softmax structure being used here as the classification layer, specifically:

p_k = \frac{e^{\theta_k^{T} x}}{\sum_j e^{\theta_j^{T} x}}

where x is the vector formed by the output of the previous network layer and θ_k is the parameter vector of class k obtained by training, the denominator existing in order to normalize the predicted output;
a cross-entropy loss function is used, and denoting the cross-entropy loss by J we have

J = -\sum_k q_k \log p_k

where q_k is the one-hot ground-truth label and p_k is the softmax output corresponding to the k-th output (which can also be understood as the probability that the neural network predicts class k).
7. The intelligent monitoring method based on deep fusion neural network pedestrian re-identification technology according to claim 1, characterized in that the model is trained with a gradient-freeze training method, which specifically comprises the following steps:
training the ResNet50 deep residual neural network and the correspondingly connected 2048-dimensional fully connected classifier on the data set;
after training the model to convergence, importing the parameters of the trained ResNet50 into the deep fusion neural network, reconnecting a new fully connected classifier and softmax network, and training again on the same data set; during this training, a lower learning rate is set for the deep-feature-extraction network parameters initialized from the pre-trained model, and training focuses on the weight parameters of the fusion stage and the classifier, finally completing the training of the whole fusion network.
8. The intelligent monitoring method based on deep fusion neural network pedestrian re-identification technology according to claim 7, characterized in that the data set is specifically the Market1501 pedestrian re-identification data set.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910330924.0A CN110046599A (en) | 2019-04-23 | 2019-04-23 | Intelligent monitoring method based on deep fusion neural network pedestrian re-identification technology
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910330924.0A CN110046599A (en) | 2019-04-23 | 2019-04-23 | Intelligent monitoring method based on deep fusion neural network pedestrian re-identification technology
Publications (1)
Publication Number | Publication Date |
---|---|
CN110046599A true CN110046599A (en) | 2019-07-23 |
Family
ID=67278848
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910330924.0A Pending CN110046599A (en) | 2019-04-23 | 2019-04-23 | Intelligent control method based on depth integration neural network pedestrian weight identification technology |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110046599A (en) |
Cited By (20)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110414430A (en) * | 2019-07-29 | 2019-11-05 | 郑州信大先进技术研究院 | A kind of pedestrian recognition methods and device again based on the fusion of more ratios |
CN110473185A (en) * | 2019-08-07 | 2019-11-19 | Oppo广东移动通信有限公司 | Image processing method and device, electronic equipment, computer readable storage medium |
CN110688966A (en) * | 2019-09-30 | 2020-01-14 | 华东师范大学 | Semantic-guided pedestrian re-identification method |
CN110728238A (en) * | 2019-10-12 | 2020-01-24 | 安徽工程大学 | Personnel re-detection method of fusion type neural network |
CN111079666A (en) * | 2019-12-20 | 2020-04-28 | 广州市鑫广飞信息科技有限公司 | Ground object identification method, device, equipment and storage medium |
CN111274873A (en) * | 2020-01-09 | 2020-06-12 | 济南浪潮高新科技投资发展有限公司 | Pedestrian re-identification method based on artificial feature and depth feature fusion |
CN111325709A (en) * | 2019-12-26 | 2020-06-23 | 联博智能科技有限公司 | Wireless capsule endoscope image detection system and detection method |
CN111460891A (en) * | 2020-03-01 | 2020-07-28 | 哈尔滨工业大学(深圳)(哈尔滨工业大学深圳科技创新研究院) | Automatic driving-oriented vehicle-road cooperative pedestrian re-identification method and system |
CN111488973A (en) * | 2020-04-09 | 2020-08-04 | 陕西师范大学 | Preprocessing method and device for neural network data |
CN111610518A (en) * | 2020-06-09 | 2020-09-01 | 电子科技大学 | Secondary radar signal denoising method based on depth residual separation convolutional network |
CN111681676A (en) * | 2020-06-09 | 2020-09-18 | 杭州星合尚世影视传媒有限公司 | Method, system and device for identifying and constructing audio frequency by video object and readable storage medium |
CN111797732A (en) * | 2020-06-22 | 2020-10-20 | 电子科技大学 | Video motion identification anti-attack method insensitive to sampling |
CN111985504A (en) * | 2020-08-17 | 2020-11-24 | 中国平安人寿保险股份有限公司 | Copying detection method, device, equipment and medium based on artificial intelligence |
CN112287965A (en) * | 2020-09-21 | 2021-01-29 | 卓尔智联(武汉)研究院有限公司 | Image quality detection model training method and device and computer equipment |
CN112749747A (en) * | 2021-01-13 | 2021-05-04 | 上海交通大学 | Garbage classification quality evaluation method and system |
CN112766180A (en) * | 2021-01-22 | 2021-05-07 | 重庆邮电大学 | Pedestrian re-identification method based on feature fusion and multi-core learning |
CN113112403A (en) * | 2021-03-31 | 2021-07-13 | 国网山东省电力公司枣庄供电公司 | Infrared image splicing method, system, medium and electronic equipment |
CN113420697A (en) * | 2021-07-01 | 2021-09-21 | 中科人工智能创新技术研究院(青岛)有限公司 | Reloading video pedestrian re-identification method and system based on appearance and shape characteristics |
CN113536995A (en) * | 2021-06-30 | 2021-10-22 | 河南大学 | Pedestrian re-identification method based on feature mapping space and sample judgment |
WO2024160215A1 (en) * | 2023-01-31 | 2024-08-08 | 华为技术有限公司 | Data processing method and apparatus |
Citations (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US9401020B1 (en) * | 2015-05-01 | 2016-07-26 | London Health Science Centre Research Inc | Multi-modality vertebra recognition |
CN107301380A (en) * | 2017-06-01 | 2017-10-27 | 华南理工大学 | One kind is used for pedestrian in video monitoring scene and knows method for distinguishing again |
US20180181842A1 (en) * | 2016-12-22 | 2018-06-28 | TCL Research America Inc. | Method and device for quasi-gibbs structure sampling by deep permutation for person identity inference |
CN108596010A (en) * | 2017-12-31 | 2018-09-28 | 厦门大学 | The implementation method of pedestrian's weight identifying system |
CN108960141A (en) * | 2018-07-04 | 2018-12-07 | 国家新闻出版广电总局广播科学研究院 | Pedestrian's recognition methods again based on enhanced depth convolutional neural networks |
CN109271888A (en) * | 2018-08-29 | 2019-01-25 | 汉王科技股份有限公司 | Personal identification method, device, electronic equipment based on gait |
CN109508731A (en) * | 2018-10-09 | 2019-03-22 | 中山大学 | A kind of vehicle based on fusion feature recognition methods, system and device again |
-
2019
- 2019-04-23 CN CN201910330924.0A patent/CN110046599A/en active Pending
Patent Citations (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US9401020B1 (en) * | 2015-05-01 | 2016-07-26 | London Health Science Centre Research Inc | Multi-modality vertebra recognition |
US20180181842A1 (en) * | 2016-12-22 | 2018-06-28 | TCL Research America Inc. | Method and device for quasi-gibbs structure sampling by deep permutation for person identity inference |
CN107301380A (en) * | 2017-06-01 | 2017-10-27 | 华南理工大学 | One kind is used for pedestrian in video monitoring scene and knows method for distinguishing again |
CN108596010A (en) * | 2017-12-31 | 2018-09-28 | 厦门大学 | The implementation method of pedestrian's weight identifying system |
CN108960141A (en) * | 2018-07-04 | 2018-12-07 | 国家新闻出版广电总局广播科学研究院 | Pedestrian's recognition methods again based on enhanced depth convolutional neural networks |
CN109271888A (en) * | 2018-08-29 | 2019-01-25 | 汉王科技股份有限公司 | Personal identification method, device, electronic equipment based on gait |
CN109508731A (en) * | 2018-10-09 | 2019-03-22 | 中山大学 | A kind of vehicle based on fusion feature recognition methods, system and device again |
Non-Patent Citations (3)
Title |
---|
SHANGXUAN WU.ET AL: ""An Enhanced Deep Feature Representation for Person Re-identification"", 《IEEE》 * |
SHENGCAI LIAO.ET AL: ""Modeling Pixel Process with Scale Invariant Local Patterns for Background Subtraction in Complex Scenes"", 《IEEE》 * |
LEIPHONE (雷峰网): "How to Train an Image Classifier with PyTorch" (如何用PyTorch训练图像分类器), 《HTTPS://BAIJIAHAO.BAIDU.COM/S?ID=1618188613267031927&WFR=SPIDER&FOR=PC》 *
Cited By (32)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110414430A (en) * | 2019-07-29 | 2019-11-05 | 郑州信大先进技术研究院 | A kind of pedestrian recognition methods and device again based on the fusion of more ratios |
CN110414430B (en) * | 2019-07-29 | 2022-10-04 | 郑州信大先进技术研究院 | Pedestrian re-identification method and device based on multi-proportion fusion |
CN110473185B (en) * | 2019-08-07 | 2022-03-15 | Oppo广东移动通信有限公司 | Image processing method and device, electronic equipment and computer readable storage medium |
CN110473185A (en) * | 2019-08-07 | 2019-11-19 | Oppo广东移动通信有限公司 | Image processing method and device, electronic equipment, computer readable storage medium |
CN110688966A (en) * | 2019-09-30 | 2020-01-14 | 华东师范大学 | Semantic-guided pedestrian re-identification method |
CN110688966B (en) * | 2019-09-30 | 2024-01-09 | 华东师范大学 | Semantic guidance pedestrian re-recognition method |
CN110728238A (en) * | 2019-10-12 | 2020-01-24 | 安徽工程大学 | Personnel re-detection method of fusion type neural network |
CN111079666A (en) * | 2019-12-20 | 2020-04-28 | 广州市鑫广飞信息科技有限公司 | Ground object identification method, device, equipment and storage medium |
CN111079666B (en) * | 2019-12-20 | 2024-03-19 | 广州市鑫广飞信息科技有限公司 | Ground object identification method, device, equipment and storage medium |
CN111325709A (en) * | 2019-12-26 | 2020-06-23 | 联博智能科技有限公司 | Wireless capsule endoscope image detection system and detection method |
CN111274873A (en) * | 2020-01-09 | 2020-06-12 | 济南浪潮高新科技投资发展有限公司 | Pedestrian re-identification method based on artificial feature and depth feature fusion |
CN111460891A (en) * | 2020-03-01 | 2020-07-28 | 哈尔滨工业大学(深圳)(哈尔滨工业大学深圳科技创新研究院) | Automatic driving-oriented vehicle-road cooperative pedestrian re-identification method and system |
CN111488973A (en) * | 2020-04-09 | 2020-08-04 | 陕西师范大学 | Preprocessing method and device for neural network data |
CN111488973B (en) * | 2020-04-09 | 2023-08-18 | 陕西师范大学 | Preprocessing method and device for neural network data |
CN111610518B (en) * | 2020-06-09 | 2022-08-05 | 电子科技大学 | Secondary radar signal denoising method based on depth residual separation convolutional network |
CN111610518A (en) * | 2020-06-09 | 2020-09-01 | 电子科技大学 | Secondary radar signal denoising method based on depth residual separation convolutional network |
CN111681676A (en) * | 2020-06-09 | 2020-09-18 | 杭州星合尚世影视传媒有限公司 | Method, system and device for identifying and constructing audio frequency by video object and readable storage medium |
CN111681676B (en) * | 2020-06-09 | 2023-08-08 | 杭州星合尚世影视传媒有限公司 | Method, system, device and readable storage medium for constructing audio frequency by video object identification |
CN111797732B (en) * | 2020-06-22 | 2022-03-25 | 电子科技大学 | Video motion identification anti-attack method insensitive to sampling |
CN111797732A (en) * | 2020-06-22 | 2020-10-20 | 电子科技大学 | Video motion identification anti-attack method insensitive to sampling |
CN111985504B (en) * | 2020-08-17 | 2021-05-11 | 中国平安人寿保险股份有限公司 | Copying detection method, device, equipment and medium based on artificial intelligence |
CN111985504A (en) * | 2020-08-17 | 2020-11-24 | 中国平安人寿保险股份有限公司 | Copying detection method, device, equipment and medium based on artificial intelligence |
CN112287965A (en) * | 2020-09-21 | 2021-01-29 | 卓尔智联(武汉)研究院有限公司 | Image quality detection model training method and device and computer equipment |
CN112749747B (en) * | 2021-01-13 | 2022-11-11 | 上海交通大学 | Garbage classification quality evaluation method and system |
CN112749747A (en) * | 2021-01-13 | 2021-05-04 | 上海交通大学 | Garbage classification quality evaluation method and system |
CN112766180B (en) * | 2021-01-22 | 2022-07-12 | 重庆邮电大学 | Pedestrian re-identification method based on feature fusion and multi-core learning |
CN112766180A (en) * | 2021-01-22 | 2021-05-07 | 重庆邮电大学 | Pedestrian re-identification method based on feature fusion and multi-core learning |
CN113112403A (en) * | 2021-03-31 | 2021-07-13 | 国网山东省电力公司枣庄供电公司 | Infrared image splicing method, system, medium and electronic equipment |
CN113536995A (en) * | 2021-06-30 | 2021-10-22 | 河南大学 | Pedestrian re-identification method based on feature mapping space and sample judgment |
CN113536995B (en) * | 2021-06-30 | 2022-11-18 | 河南大学 | Pedestrian re-identification method based on feature mapping space and sample judgment |
CN113420697A (en) * | 2021-07-01 | 2021-09-21 | 中科人工智能创新技术研究院(青岛)有限公司 | Reloading video pedestrian re-identification method and system based on appearance and shape characteristics |
WO2024160215A1 (en) * | 2023-01-31 | 2024-08-08 | 华为技术有限公司 | Data processing method and apparatus |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
RJ01 | Rejection of invention patent application after publication | Application publication date: 20190723 |