CN111062329B - Unsupervised person re-identification method based on augmented network
- Publication number: CN111062329B (application CN201911310016.1A)
- Authority: CN (China)
- Prior art keywords: augmented, network, image, pedestrian, pixel
- Legal status: Active (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Classifications
- G06V40/10: Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
- G06N3/045: Combinations of networks
- G06N3/084: Backpropagation, e.g. using gradient descent
- G06T3/4007: Scaling of whole images or parts thereof, e.g. expanding or contracting, based on interpolation, e.g. bilinear interpolation
- G06V10/40: Extraction of image or video features
- Y02T10/40: Engine management systems
Description
Technical Field

The present invention relates to the field of deep learning, and more specifically to an unsupervised pedestrian re-identification method.

Background Art

In recent years, deep learning technology has developed continuously, and deep learning methods based on deep neural networks are applied in all aspects of daily life, for example text translation and text classification in natural language processing, and image retrieval and face recognition in computer vision. The emergence of deep learning methods has brought great convenience to human society.

Pedestrian re-identification is an important application of deep learning. Person re-identification, also called pedestrian re-identification, uses computer vision to determine whether a specific pedestrian appears in images or video sequences captured by cameras with non-overlapping fields of view. Because of the differences between camera devices, and because pedestrians are both rigid and deformable, with appearance easily affected by clothing, scale, occlusion, pose, and viewpoint, person re-identification is a hot topic in computer vision that is both valuable to research and highly challenging.

Several dedicated datasets exist for person re-identification in academia, but because data acquisition and annotation demand substantial manpower and money, these datasets contain few images. Market-1501 and DukeMTMC-reID are two commonly used examples.

The Market-1501 dataset was collected on the Tsinghua University campus with images from 6 different cameras. The training set contains 12,936 images and the test set contains 19,732 images. There are 751 identities in the training data and 750 in the test set, so the training set averages 17.2 images per class (per identity).

The DukeMTMC-reID dataset was collected at Duke University with images from 8 different cameras. The training set contains 16,522 images and the test set contains 17,661 images. There are 702 identities in the training data, averaging 23.5 images per class (per identity).

These two commonly used person re-identification datasets each contain only around 33,000 images, a clear gap compared with the tens of millions of images routinely available in industry. When a dataset is too small, neural network training tends to overfit, so a network trained on the original dataset shows reduced test accuracy on other datasets.

For this reason, many data augmentation methods, such as random cropping and random flipping, have come into use in person re-identification. But these methods only reprocess the images of the existing labeled dataset, while other unlabeled datasets remain unexploited.
Summary of the Invention
The main purpose of the present invention is to overcome the shortcomings and deficiencies of the prior art and to provide an unsupervised person re-identification method based on an augmented network, which makes effective use of unlabeled pedestrian image data.
To achieve the above objective, the present invention adopts the following technical solution:

An unsupervised person re-identification method based on an augmented network, comprising the following steps:

S1: Perform an augmentation operation on the unlabeled original pedestrian image dataset D0, the augmentation operation comprising one or more of image scaling, random cropping, random erasing, noise addition, and Gaussian blur, to obtain M new augmented datasets D1~DM, where M is a positive integer;

S2: Pass the original image data of D0 through a convolutional neural network serving as the main network N0, and extract feature F0 by forward propagation;

S3: Input the corresponding augmented image data of the M augmented datasets D1~DM into M convolutional neural networks with unshared parameters, serving as the augmented networks N1~NM, and extract features F1~FM by forward propagation;

S4: Randomly select an image Inegative from the original pedestrian image dataset D0 as a negative sample, pass it through the main network N0, and extract feature Fnegative by forward propagation;

S5: Compute the Euclidean distance between the output feature F0 and each of the output features F1~FM, obtaining M loss values L1~LM;

S6: Compute the Euclidean distance between the output feature Fnegative and each of the output features F0~FM, obtaining M+1 loss values L0negative~LMnegative;

S7: Subtract each loss value L1negative~LMnegative obtained in S6 from the corresponding loss value L1~LM obtained in S5, and use the results as losses for backward propagation through the augmented networks N1~NM to compute gradients and update the augmented network parameters;

S8: Subtract the sum of the loss values L0negative~LMnegative obtained in S6 from the sum of the M loss values L1~LM obtained in S5 to obtain the total loss value L0;

S9: Use the total loss value L0 obtained in S8 as the loss for backward propagation through the main network N0 to compute gradients and update the main network parameters;

S10: Repeat S2~S9 until the main network and the augmented networks converge;

S11: Output the main network model.
As a preferred technical solution, in step S1, when the augmentation operations include image scaling, the image is scaled by bilinear interpolation, simulating the various image resolutions that may appear in natural datasets. The value at a point (x, y) is computed as

$$f(x,y) \approx \frac{f(Q_{11})(x_2-x)(y_2-y) + f(Q_{21})(x-x_1)(y_2-y) + f(Q_{12})(x_2-x)(y-y_1) + f(Q_{22})(x-x_1)(y-y_1)}{(x_2-x_1)(y_2-y_1)}$$

where Q11=(x1,y1), Q12=(x1,y2), Q21=(x2,y1), and Q22=(x2,y2) are the four pixel points closest to the point (x,y).
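By way of illustration, the following is a minimal NumPy sketch of this scaling step; the function names, the clamping at the image border, and the output-to-source coordinate mapping are illustrative assumptions, not specified by the patent.

```python
import numpy as np

def bilinear_sample(img, x, y):
    # Q11=(x1,y1), Q21=(x2,y1), Q12=(x1,y2), Q22=(x2,y2) are the four
    # pixels closest to the real-valued point (x, y).
    x1, y1 = int(x), int(y)
    x2 = min(x1 + 1, img.shape[1] - 1)
    y2 = min(y1 + 1, img.shape[0] - 1)
    dx, dy = x - x1, y - y1
    top = img[y1, x1] * (1 - dx) + img[y1, x2] * dx   # interpolate along x at row y1
    bot = img[y2, x1] * (1 - dx) + img[y2, x2] * dx   # interpolate along x at row y2
    return top * (1 - dy) + bot * dy                  # interpolate along y

def scale_bilinear(img, out_h, out_w):
    """Scale a whole H x W (x C) image to out_h x out_w."""
    h, w = img.shape[:2]
    out = np.zeros((out_h, out_w) + img.shape[2:], dtype=np.float32)
    for i in range(out_h):
        for j in range(out_w):
            # Map each output pixel back to source coordinates.
            out[i, j] = bilinear_sample(img,
                                        j * (w - 1) / max(out_w - 1, 1),
                                        i * (h - 1) / max(out_h - 1, 1))
    return out
```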
As a preferred technical solution, in step S1, when the augmentation operations include random cropping, augmentation is performed by random cropping, simulating the various partial pedestrian images that may appear in natural datasets. Specifically: randomly select a pixel in the image, form a rectangle of random width and height with that pixel as the top-left corner, and output the pixels inside this rectangle as the cropping result.
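A sketch of this cropping rule, assuming a minimum crop side of 16 pixels to keep the result usable (the bound is our choice; the patent only requires "a certain length and width"):

```python
import random

def random_crop(img, min_side=16):
    """Random top-left pixel, then a rectangle of random height and width
    clipped to the image border; the rectangle's content is the output."""
    h, w = img.shape[:2]
    top = random.randint(0, h - min_side)
    left = random.randint(0, w - min_side)
    crop_h = random.randint(min_side, h - top)
    crop_w = random.randint(min_side, w - left)
    return img[top:top + crop_h, left:left + crop_w]
```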
As a preferred technical solution, in step S1, when the augmentation operations include random erasing, augmentation is performed by random erasing, simulating the various missing or incomplete pedestrian images that may appear in natural datasets. Specifically: randomly select a pixel in the image, form a rectangle of random width and height with that pixel as the top-left corner, set all pixel values inside this rectangle to black, i.e., pixel value (0,0,0), and output the whole modified image as the random-erasing result.
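The erasing variant differs from cropping only in that the rectangle is blacked out and the full image is returned; a sketch under the same assumptions:

```python
def random_erase(img):
    """Blacken a random rectangle (pixel value (0,0,0)) and return the
    whole modified image."""
    out = img.copy()
    h, w = out.shape[:2]
    top = random.randint(0, h - 2)        # random top-left corner
    left = random.randint(0, w - 2)
    erase_h = random.randint(1, h - top)  # random extent, clipped to the border
    erase_w = random.randint(1, w - left)
    out[top:top + erase_h, left:left + erase_w] = 0
    return out
```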
As a preferred technical solution, in step S1, when the augmentation operations include noise addition, augmentation is performed by adding noise, simulating the image noise that may appear in natural datasets. Specifically: each pixel has a certain probability of becoming a white point, i.e., pixel value (255,255,255), or a black point, i.e., pixel value (0,0,0); the whole modified image is then output as the noise-addition result.
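A sketch of this salt-and-pepper noise; the per-pixel probability of 0.02 and the even split between white and black points are illustrative assumptions (the patent only requires "a certain probability"):

```python
import numpy as np

def add_salt_pepper(img, prob=0.02):
    """Each pixel becomes black (0,0,0) or white (255,255,255) with total
    probability `prob`, split evenly between the two."""
    out = img.copy()
    r = np.random.rand(*out.shape[:2])
    out[r < prob / 2] = 0                    # pepper: black points
    out[(r >= prob / 2) & (r < prob)] = 255  # salt: white points
    return out
```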
As a preferred technical solution, in step S1, when the augmentation operations include Gaussian blur, augmentation is performed with the Gaussian blur method, simulating the image blur that may appear in natural datasets, according to the two-dimensional Gaussian function

$$G(x,y) = \frac{1}{2\pi\sigma^2} e^{-\frac{x^2+y^2}{2\sigma^2}}$$

Once the value of σ is set, the weight matrix can be computed; performing this matrix operation centered on every pixel of the image achieves the blurring of the image.
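A sketch of the weight-matrix construction and the per-pixel matrix operation; the 5x5 kernel size and σ = 1.5 are illustrative defaults:

```python
def gaussian_kernel(size, sigma):
    """Weight matrix of the 2-D Gaussian G(x, y), normalised to sum to 1."""
    half = size // 2
    ys, xs = np.mgrid[-half:half + 1, -half:half + 1]
    k = np.exp(-(xs ** 2 + ys ** 2) / (2.0 * sigma ** 2))
    return k / k.sum()

def gaussian_blur(channel, size=5, sigma=1.5):
    """Blur one image channel by centring the kernel on every pixel."""
    half = size // 2
    padded = np.pad(channel.astype(np.float32), half, mode="edge")
    out = np.empty_like(channel, dtype=np.float32)
    k = gaussian_kernel(size, sigma)
    for i in range(channel.shape[0]):
        for j in range(channel.shape[1]):
            out[i, j] = (padded[i:i + size, j:j + size] * k).sum()
    return out
```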
As a preferred technical solution, in steps S2 and S3, the respective pedestrian image data are passed into the corresponding convolutional neural networks, and features are extracted by forward propagation:

$$a^{l} = \sigma(z^{l}) = \sigma(a^{l-1} \ast W^{l} + b^{l})$$

where a denotes the output of an intermediate layer; σ the activation function; z the input of the activation layer; the superscript the layer index; ∗ the convolution operation; W the convolution kernel; and b the bias.
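A single-channel sketch of one such layer, with σ taken as ReLU (the patent does not fix the activation) and ∗ implemented as the sliding-window cross-correlation conventional in CNNs:

```python
def conv_forward(a_prev, W, b):
    """z[l] = a[l-1] * W[l] + b[l]; a[l] = sigma(z[l]) with sigma = ReLU.
    a_prev: (H, W) input map; W: (k, k) kernel; b: scalar bias."""
    k = W.shape[0]
    out_h, out_w = a_prev.shape[0] - k + 1, a_prev.shape[1] - k + 1
    z = np.empty((out_h, out_w), dtype=np.float32)
    for i in range(out_h):
        for j in range(out_w):
            z[i, j] = (a_prev[i:i + k, j:j + k] * W).sum() + b
    return np.maximum(z, 0.0), z   # a[l], plus z[l] kept for backpropagation
```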
As a preferred technical solution, step S5 is specifically: compute the Euclidean distance between the feature F0 extracted by the main network N0 and each of the features F1~FM extracted by the augmented networks N1~NM:

$$d(x,y) = \sqrt{\sum_{i=1}^{n}(x_i - y_i)^2}$$

where x takes the feature F0 extracted by the main network; y takes in turn each of the features F1~FM extracted by the M augmented networks; and xi and yi are the values of the corresponding features in each of the n dimensions.

Step S6 is specifically: the feature Fnegative belongs to the negative sample; a randomly selected image serves as the negative sample, and the Euclidean distance between Fnegative and each of the output features F0~FM is computed in the same way.
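The distance computations of S5 and S6 reduce to a few lines; a sketch:

```python
def euclidean(x, y):
    """d(x, y) = sqrt(sum_i (x_i - y_i)^2) over the feature dimensions."""
    return float(np.sqrt(((x - y) ** 2).sum()))

# S5: positive losses between the main feature F0 and each augmented feature.
#   L_pos = [euclidean(F0, F) for F in (F1, ..., FM)]
# S6: negative losses between Fnegative and every feature F0 ... FM.
#   L_neg = [euclidean(F_negative, F) for F in (F0, F1, ..., FM)]
```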
As a preferred technical solution, step S7 is specifically: the computed error values are passed back to the corresponding convolutional neural networks, and the parameter values of each network are iteratively updated by the backpropagation algorithm:

$$\delta^{l-1} = \delta^{l} \ast \mathrm{rot180}(W^{l}) \odot \sigma'(z^{l-1})$$

where the superscript denotes the layer index; δ the gradient value; ∗ the convolution operation; W the convolution kernel; rot180 a 180-degree rotation of the matrix (flipped once vertically, then once horizontally); ⊙ element-wise (point-to-point) multiplication; and σ' the derivative of the activation function.
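A sketch of this update rule for the single-channel layer above, again assuming a ReLU activation so that σ'(z) is the indicator z > 0:

```python
def conv_backward_delta(delta, W, z_prev):
    """delta[l-1] = (delta[l] * rot180(W[l])) o sigma'(z[l-1]), where * is
    a 'full' convolution and o is element-wise multiplication."""
    k = W.shape[0]
    W_rot = np.rot90(W, 2)          # rot180: flip vertically, then horizontally
    padded = np.pad(delta, k - 1)   # zero padding realises the full convolution
    out = np.empty_like(z_prev, dtype=np.float32)
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = (padded[i:i + k, j:j + k] * W_rot).sum()
    return out * (z_prev > 0)       # o sigma'(z[l-1]) for ReLU
```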
As a preferred technical solution, step S8 is specifically: the error values L1~LM (the Euclidean distances between the augmented-network features and the main-network feature) are summed, the negative-sample error values L0negative~LMnegative are summed, and the difference of the two sums gives the total error value L0:

$$L_0 = \sum_{i=1}^{M} \lambda_i L_i - \sum_{i=0}^{M} \lambda_i^{neg} L_i^{negative}$$

where λi, i∈[1,M], are the positive-sample weights for the corresponding positive-sample error values Li, here taken as λi = 1; and λi^neg, i∈[0,M], are the negative-sample weights for the corresponding negative-sample error values Linegative, here taken as λi^neg = 1.
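With all weights set to 1 as stated, the total loss collapses to a plain difference of sums; a sketch:

```python
def total_loss(L_pos, L_neg, lam_pos=1.0, lam_neg=1.0):
    """L0 = sum_i lam_i * L_i - sum_i lam_i_neg * L_i_negative: pull the
    augmented features toward F0 while pushing Fnegative away from all."""
    return lam_pos * sum(L_pos) - lam_neg * sum(L_neg)
```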
Compared with the prior art, the present invention has the following advantages and beneficial effects:

The method of the present invention uses augmented networks to exploit unlabeled pedestrian image data, which otherwise cannot be used directly as training input for a deep neural network; once the unlabeled dataset has been augmented, the augmented networks and the main network can be trained end to end. Training exploits the constraint that the features extracted from the original data and from the augmented data derived from it should stay as consistent as possible. This is of great benefit to person re-identification, where datasets and data volumes are scarce. Moreover, the different augmentation operations simulate, to some extent, the blur and missing regions that can occur in real re-identification data. The proposed method therefore improves the generalization of the trained deep neural network, alleviates overfitting, and ultimately improves recognition accuracy.
Description of the Drawings

Figure 1 is a flow chart of the unsupervised person re-identification method based on an augmented network according to the present invention.

Detailed Description

The present invention is described in further detail below in conjunction with an embodiment and the accompanying drawing, but implementations of the present invention are not limited thereto.

Embodiment

As shown in Figure 1, this embodiment provides an unsupervised person re-identification method based on an augmented network, comprising the following steps:
S1: Perform augmentation operations on the unlabeled pedestrian image dataset D0, comprising image scaling, random cropping, random erasing, noise addition, and Gaussian blur (any subset of these five augmentation operations may be selected and combined, or all may be used; this embodiment takes all five as an example for further explanation), obtaining five new augmented datasets D1~D5.

In step S1, image scaling, random cropping, random erasing, noise addition, and Gaussian blur are performed as follows:
S11: For the original unlabeled pedestrian image data, scale the images with bilinear interpolation, simulating the various image resolutions that may appear in natural datasets, using the interpolation formula given above, where Q11=(x1,y1), Q12=(x1,y2), Q21=(x2,y1), and Q22=(x2,y2) are the four pixel points closest to the point (x,y).
S12: Augment the original unlabeled pedestrian image data by random cropping, simulating the various partial pedestrian images that may appear in natural datasets. Specifically: randomly select a pixel in the image, form a rectangle of random width and height with that pixel as the top-left corner, and output the pixels inside this rectangle as the cropping result.
S13: Augment the original unlabeled pedestrian image data by random erasing, simulating the various missing or incomplete pedestrian images that may appear in natural datasets. Specifically: randomly select a pixel in the image, form a rectangle of random width and height with that pixel as the top-left corner, set all pixel values inside this rectangle to black (pixel value (0,0,0)), and output the whole modified image as the random-erasing result.
S14: Augment the original unlabeled pedestrian image data by adding noise, simulating the image noise that may appear in natural datasets. Specifically: give each pixel a certain probability of becoming a white point (pixel value (255,255,255)) or a black point (pixel value (0,0,0)), then output the whole modified image as the noise-addition result.
S15: Augment the original unlabeled pedestrian image data with Gaussian blur, simulating the image blur that may appear in natural datasets, according to the Gaussian function given above. Once σ is set, the weight matrix can be computed; performing this matrix operation centered on every pixel of the image blurs the image.
S2: Pass the original image data of the original pedestrian image dataset D0 through a convolutional neural network serving as the main network N0, and extract feature F0 by forward propagation; that is, the pedestrian image data are passed into the corresponding convolutional neural network and features are extracted with the forward propagation formula given in the preferred technical solutions above.
S3: Input the corresponding augmented image data of the five augmented datasets D1~D5 into five convolutional neural networks with unshared parameters, serving as the augmented networks N1~N5, and extract features F1~F5 by forward propagation; forward propagation in step S3 uses the same method as in step S2.

S4: Randomly select an image Inegative from the original pedestrian image dataset D0 as a negative sample, pass it through the main network N0, and extract feature Fnegative by forward propagation.
S5: Compute the Euclidean distance between the output feature F0 and each of the output features F1~F5, obtaining five loss values L1~L5. Specifically, the Euclidean distance is computed between the feature F0 extracted by the main network N0 and each of the features F1~F5 extracted by the augmented networks N1~N5, using the distance formula given above, where x takes F0, y takes in turn each of F1~F5, and xi and yi are the values of the corresponding features in each dimension.
S6: Compute the Euclidean distance between the output feature Fnegative and each of the output features F0~F5, obtaining six loss values L0negative~L5negative. Since a dataset is generally large and each identity accounts for only a small fraction of it, treating a single randomly selected image as a negative sample is valid in the vast majority of cases.

S7: Subtract each loss value L1negative~L5negative obtained in S6 from the corresponding loss value L1~L5 obtained in S5, and use the results as losses for backward propagation through the augmented networks N1~N5 to compute gradients and update the augmented network parameters.
That is, the computed error values are passed back to the corresponding convolutional neural networks, and the parameter values of each network are iteratively updated by the backpropagation algorithm, using the backpropagation formula given above.
S8: Subtract the sum of the loss values L0negative~L5negative obtained in S6 from the sum of the five loss values L1~L5 obtained in S5, obtaining the total loss value L0 by the total-loss formula above with M = 5, where λi = 1, i∈[1,5], are the positive-sample weights and λi^neg = 1, i∈[0,5], are the negative-sample weights.
S9: Use the total loss value L0 obtained in S8 as the loss for backward propagation through the main network N0 to compute gradients and update the main network parameters.

S10: Repeat S2~S9 until the main network and the augmented networks converge.

S11: Output the main network model. A condensed sketch of one full training iteration follows.
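The following PyTorch-style sketch condenses steps S2~S9 into one training iteration with M = 5. Here `make_backbone()` and the list `augmentations` are hypothetical stand-ins for a CNN constructor and the five augmentation callables; neither is defined by the patent. Using a single autograd backward pass is an implementation choice: since each augmented network Ni appears only in its own terms Li and Linegative, differentiating the total loss L0 reproduces the per-network updates of S7 and the main-network update of S9 simultaneously.

```python
import torch

M = 5
main_net = make_backbone()                        # N0 (hypothetical constructor)
aug_nets = [make_backbone() for _ in range(M)]    # N1..N5, parameters not shared
params = list(main_net.parameters()) + [p for n in aug_nets for p in n.parameters()]
optimizer = torch.optim.SGD(params, lr=1e-3)

def train_step(img, img_negative, augmentations):
    f0 = main_net(img)                                              # S2
    feats = [aug_nets[i](augmentations[i](img)) for i in range(M)]  # S3
    f_neg = main_net(img_negative)                                  # S4
    l_pos = [torch.dist(f0, f) for f in feats]                      # S5: Euclidean distances
    l_neg = [torch.dist(f_neg, f) for f in [f0] + feats]            # S6
    loss = sum(l_pos) - sum(l_neg)                                  # S8: total loss L0
    optimizer.zero_grad()
    loss.backward()                                                 # S7 + S9 in one pass
    optimizer.step()
    return loss.item()
```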
The above embodiment is a preferred implementation of the present invention, but implementations of the present invention are not limited by it. Any change, modification, substitution, combination, or simplification made without departing from the spirit and principle of the present invention shall be an equivalent replacement and falls within the protection scope of the present invention.
Claims (10)
Priority Applications (1)

| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN201911310016.1A (CN111062329B) | 2019-12-18 | 2019-12-18 | Unsupervised person re-identification method based on augmented network |
Publications (2)

| Publication Number | Publication Date |
|---|---|
| CN111062329A | 2020-04-24 |
| CN111062329B | 2023-05-30 |

Family ID: 70302269
Family Applications (1)

| Application Number | Title | Priority Date | Filing Date | Status |
|---|---|---|---|---|
| CN201911310016.1A | Unsupervised person re-identification method based on augmented network | 2019-12-18 | 2019-12-18 | Active |

Country Status (1): CN
Families Citing this family (5)

| Publication Number | Priority Date | Publication Date | Assignee | Title |
|---|---|---|---|---|
| CN111985645A * | 2020-08-28 | 2020-11-24 | Beijing SenseTime Technology Development Co., Ltd. | Neural network training method and device, electronic equipment and storage medium |
| CN112043260B * | 2020-09-16 | 2022-11-15 | Hangzhou Normal University | ECG classification method based on local pattern transformation |
| CN112200187A * | 2020-10-16 | 2021-01-08 | Guangzhou Yuncong Kaifeng Technology Co., Ltd. | Target detection method, device, machine-readable medium and equipment |
| CN112580720B * | 2020-12-18 | 2024-07-09 | Huawei Technologies Co., Ltd. | Model training method and device |
| CN113033410B * | 2021-03-26 | 2023-06-06 | Sun Yat-sen University | Domain-generalized person re-identification method, system and medium based on automatic data augmentation |
Patent Citations (1)

| Publication Number | Priority Date | Publication Date | Assignee | Title |
|---|---|---|---|---|
| CN109583379A * | 2018-11-30 | 2019-04-05 | Changzhou University | Pedestrian re-identification method based on a pedestrian alignment network with selective erasing |

Family Cites Families (1)

| Publication Number | Priority Date | Publication Date | Assignee | Title |
|---|---|---|---|---|
| CN109426858B * | 2017-08-29 | 2021-04-06 | BOE Technology Group Co., Ltd. | Neural network, training method, image processing method, and image processing apparatus |

Non-Patent Citations (1)

| Title |
|---|
| Wei-Shi Zheng et al., "Fast Open-World Person Re-Identification," IEEE Transactions on Image Processing, vol. 27, no. 5, 2017-08-16, pp. 1-2. * |
Also Published As

| Publication Number | Publication Date |
|---|---|
| CN111062329A | 2020-04-24 |
Legal Events

| Code | Title |
|---|---|
| PB01 | Publication |
| SE01 | Entry into force of request for substantive examination |
| GR01 | Patent grant |