CN111062329B - Unsupervised pedestrian re-identification method based on augmented network - Google Patents
Unsupervised pedestrian re-identification method based on augmented network

- Publication number: CN111062329B (application CN201911310016.1A)
- Authority: CN (China)
- Prior art keywords: augmentation, network, pedestrian, image, value
- Prior art date: 2019-12-18
- Legal status: Active (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/10—Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
- G06N3/084—Backpropagation, e.g. using gradient descent
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T3/00—Geometric image transformations in the plane of the image
- G06T3/40—Scaling of whole images or parts thereof, e.g. expanding or contracting
- G06T3/4007—Scaling of whole images or parts thereof, e.g. expanding or contracting based on interpolation, e.g. bilinear interpolation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/40—Extraction of image or video features
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02T—CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
- Y02T10/00—Road transport of goods or passengers
- Y02T10/10—Internal combustion engine [ICE] based vehicles
- Y02T10/40—Engine management systems
Abstract
The invention provides an unsupervised pedestrian re-identification method based on an augmentation network. Starting from the pedestrian images in an original database, several forms of data augmentation are applied, and the features of the augmented data, which carry the same label as the underlying original data, are extracted by networks whose parameters are not shared, which helps train the network. The method mainly considers how to exploit unlabeled data that cannot be used directly as input when the data set is small. The main network model obtained by the method can be used directly for testing after extracting features from the test set; the method can also pre-train the several augmentation networks and the main network on unlabeled data and then fine-tune the main-network parameters with labeled data, so that the unlabeled information is effectively exploited and the accuracy of pedestrian re-identification is improved.
Description
Technical Field
The invention relates to the field of deep learning, in particular to an unsupervised pedestrian re-identification method.
Background
In recent years, deep learning has developed rapidly, and methods based on deep neural networks have been applied to many aspects of our lives: text translation and text classification in natural language processing (Natural Language Processing), and image retrieval and face recognition in computer vision (Computer Vision). The emergence of deep learning methods has brought great convenience to human society.
Pedestrian re-identification is an important application of deep learning. Person re-identification (Re-ID), also called pedestrian re-identification, uses computer vision techniques to judge whether a specific pedestrian is present in images or video sequences captured by cameras whose fields of view do not overlap. Because different camera devices differ, because pedestrians are both rigid and flexible, and because their appearance is easily affected by clothing, scale, occlusion, posture, and viewing angle, pedestrian re-identification has become a hot topic in computer vision that is both valuable to research and highly challenging.
Pedestrian re-identification has dedicated data sets in academia, but because data acquisition and annotation require substantial manpower and funding, these data sets contain relatively few images. Market-1501 and DukeMTMC-reID are two of the most common.
The Market-1501 dataset was collected on the Tsinghua University campus, with images from 6 different cameras. The training set contains 12,936 images and the test set contains 19,732 images. The training data cover 751 identities and the test set 750, with an average of 17.2 training images per class (per person).
The DukeMTMC-reID dataset was collected at Duke University, with images from 8 different cameras. The training set contains 16,522 images and the test set contains 17,661 images. The training data cover 702 identities, with an average of 23.5 training images per class (per person).
These two common pedestrian re-identification data sets each contain only around 33,000 images, a stark contrast with the tens of millions of images available in industry. When the data set is too small, training a neural network tends to overfit, reducing the accuracy of a network trained on the original data set when it is tested on other data sets.
In this context, many data augmentation (Data Augmentation) methods have begun to be used in pedestrian re-identification, such as random cropping and random flipping. However, these methods only perform secondary processing on the images of the original labeled data set; other unlabeled data sets are still not reasonably utilized.
Disclosure of Invention
The invention mainly aims to overcome the defects and shortcomings of the prior art, and provides an unsupervised pedestrian re-identification method based on an augmentation network, which makes effective use of unlabeled pedestrian image data that cannot otherwise serve directly as training input, thereby alleviating overfitting on small data sets and improving recognition accuracy.
In order to achieve the above purpose, the present invention adopts the following technical scheme:
An unsupervised pedestrian re-identification method based on an augmentation network comprises the following steps:
S1: performing augmentation operations on an unlabeled original pedestrian image data set D0, the operations comprising one or more of image scaling, random cropping, random erasing, noise addition and Gaussian blur, to obtain M new augmented data sets D1-DM, where M is a positive integer;
S2: feeding the original image data of the original pedestrian image data set D0 into a convolutional neural network serving as the main network N0 for forward propagation to extract a feature F0;
S3: respectively feeding the corresponding augmented image data of the M augmented data sets D1-DM into M convolutional neural networks with unshared parameters, serving as the augmentation networks N1-NM, for forward propagation to extract features F1-FM;
S4: randomly selecting an image Inegative from the original pedestrian image data set D0 as a negative sample, and feeding it into the main network N0 for forward propagation to extract a feature Fnegative;
S5: calculating the Euclidean distance between the output feature F0 and each of the output features F1-FM to obtain M loss values L1-LM;
S6: calculating the Euclidean distance between the output feature Fnegative and each of the output features F0-FM to obtain M+1 loss values L0negative-LMnegative;
S7: using the results of subtracting the M loss values L1negative-LMnegative obtained in S6 from the corresponding M loss values L1-LM obtained in S5 as losses, performing backward propagation on the augmentation networks N1-NM to compute gradients and update their parameters;
S8: summing the M loss values L1-LM obtained in S5 and subtracting the sum of the loss values L0negative-LMnegative obtained in S6 to obtain a total loss value L0;
S9: using the total loss value L0 obtained in S8 as the loss, performing backward propagation on the main network N0 to compute gradients and update the main-network parameters;
S10: repeating S2-S9 until the main network and the augmentation networks converge;
S11: outputting the main network model.
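To make the flow of S2-S9 concrete, the following is a minimal PyTorch-style sketch of one training iteration. All identifiers (main_net, aug_nets, euclidean, the optimizers) are illustrative assumptions rather than names from the patent, and detaching features is one bookkeeping choice for keeping the S7 and S9 updates separate, which the patent does not specify.

```python
import torch

def euclidean(a: torch.Tensor, b: torch.Tensor) -> torch.Tensor:
    # Euclidean distance between two feature vectors (used in S5 and S6)
    return torch.sqrt(((a - b) ** 2).sum())

def train_step(x0, x_augs, x_neg, main_net, aug_nets, opt_main, opt_augs):
    f0 = main_net(x0)                                  # S2: feature F0
    f_augs = [n(x) for n, x in zip(aug_nets, x_augs)]  # S3: features F1..FM
    f_neg = main_net(x_neg)                            # S4: feature Fnegative

    # S7: update each augmentation network Ni with the loss Li - Li_negative.
    # Main-network features are detached so only Ni receives gradients here.
    for f, opt in zip(f_augs, opt_augs):
        li = euclidean(f0.detach(), f)         # the S5 term Li
        li_neg = euclidean(f_neg.detach(), f)  # the S6 term Li_negative
        opt.zero_grad()
        (li - li_neg).backward()
        opt.step()

    # S5/S6/S8: total loss L0 = sum(Li) - sum(Li_negative) with all weights
    # lambda = 1; augmentation features are detached so that S9 updates only
    # the main network N0.
    pos = [euclidean(f0, f.detach()) for f in f_augs]      # L1..LM
    neg = [euclidean(f_neg, f0)]                           # L0negative
    neg += [euclidean(f_neg, f.detach()) for f in f_augs]  # L1neg..LMneg
    total = sum(pos) - sum(neg)
    opt_main.zero_grad()
    total.backward()                                       # S9
    opt_main.step()
    return total.item()
```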
As a preferred technical solution, in step S1, when the augmentation operations include image scaling, the image is scaled using bilinear interpolation, so as to simulate the images of various resolutions that may occur in natural data sets. The value at a point (x, y) is calculated as:

f(x, y) = f(q11)(x2 − x)(y2 − y) + f(q21)(x − x1)(y2 − y) + f(q12)(x2 − x)(y − y1) + f(q22)(x − x1)(y − y1)

where q11 = (x1, y1), q12 = (x1, y2), q21 = (x2, y1), q22 = (x2, y2) are the four pixel points closest to the point (x, y), with unit grid spacing so that (x2 − x1)(y2 − y1) = 1.
As a preferred technical solution, in step S1, when the augmentation operations include random cropping, the augmentation is performed by a random-cropping method, so as to simulate the various partial pedestrian images that may occur in natural data sets, specifically:
a pixel is randomly selected in the image; then, taking that pixel as the upper-left corner, a rectangle of random length and width is formed, and the pixels inside the rectangle are output as the cropping result.
As a preferred technical solution, in step S1, when the augmentation operations include random erasing, the augmentation is performed by a random-erasing method, so as to simulate the various missing or incomplete pedestrian images that may occur in natural data sets, specifically:
a pixel is randomly selected in the image; then, taking that pixel as the upper-left corner, a rectangle of random length and width is formed, all pixels inside the rectangle are set to black, i.e. pixel value (0, 0, 0), and the whole image after the operation is output as the random-erasing result.
As a preferred technical solution, in step S1, when the augmentation operations include noise addition, the augmentation is performed by a noise-adding method, so as to simulate the image noise that may occur in natural data sets, specifically:
each pixel becomes, with a certain probability, a white point, i.e. pixel value (255, 255, 255), or a black point, i.e. pixel value (0, 0, 0), and the whole image after the operation is output as the noise-addition result.
As a preferred technical solution, in step S1, when the augmentation operations include Gaussian blur, the augmentation is performed by a Gaussian blur method, so as to simulate the image blurring that may occur in natural data sets, according to the two-dimensional Gaussian function:

G(x, y) = (1 / (2πσ^2)) · exp(−(x^2 + y^2) / (2σ^2))

After the σ value is set, a weight matrix can be calculated from this formula; performing the matrix operation centered on each pixel of the image blurs the image.
As a preferred technical solution, in steps S2 and S3, the respective pedestrian image data are fed into the corresponding convolutional neural network, and features are extracted by forward propagation according to:

a^l = σ(z^l) = σ(a^(l−1) * W^l + b^l)

where a denotes the intermediate layer output; σ the activation function; z the input of the activation layer; the superscript the layer number; * the convolution operation; W the convolution kernel; and b the bias.
As a preferred technical solution, step S5 specifically includes:
calculating the Euclidean distance between the feature F0 extracted by the main network N0 and each of the features F1-FM extracted by the augmentation networks N1-NM:

d(x, y) = sqrt( Σ_i (x_i − y_i)^2 )

where x takes the feature F0 extracted by the main network; y takes in turn the features F1-FM extracted by the M augmentation networks; and x_i, y_i are the values of the corresponding features in each dimension;
Step S6 specifically includes:
an image is randomly selected as the negative sample, the main network extracts its feature Fnegative, and the Euclidean distance between Fnegative and each of the output features F0-FM is calculated.
As a preferred technical solution, step S7 specifically includes:
the calculated error value is passed back to the corresponding convolutional neural network, and the network's parameter values are updated iteratively by the backward propagation algorithm:

δ^(l−1) = δ^l * rot180(W^l) ⊙ σ′(z^(l−1))

where the superscript denotes the layer number; δ the gradient value; * the convolution operation; W the convolution kernel; rot180 the matrix rotated by 180 degrees, i.e. flipped once vertically and then once horizontally; ⊙ point-wise multiplication; and σ′ the derivative of the activation function.
As a preferred technical solution, step S8 specifically includes:
the error values L1-LM, obtained by computing the Euclidean distances between the features produced by the augmentation networks and the feature produced by the main network, are summed, and the sum of L0negative-LMnegative is subtracted, giving the total error value L0:

L0 = Σ_{i=1}^{M} λ_i·L_i − Σ_{inegative=0}^{M} λ_inegative·L_inegative

where λ_i, i ∈ [1, M] are the positive-sample weight values and L_i, i ∈ [1, M] the corresponding positive-sample error values, with λ_i = 1 taken here; λ_inegative, inegative ∈ [0, M] are the negative-sample weight values and L_inegative, inegative ∈ [0, M] the corresponding negative-sample error values, with λ_inegative = 1 taken here.
Compared with the prior art, the invention has the following advantages and beneficial effects:
the method uses the method of the augmentation network to utilize the unlabeled pedestrian image data which cannot be directly used as the input of the deep neural network, and the augmentation network and the main network can be trained by end-to-end operation after the unlabeled data set is subjected to the augmentation operation. The deep neural network is trained by utilizing the information that the characteristics extracted by the original data and the augmented data obtained by the original data are consistent as much as possible. The method has great benefits for the pedestrian re-recognition field in which the data set and the data quantity are relatively lacking, and in addition, various different augmentation operations simulate the possible condition of blurring and missing of the pedestrian re-recognition data to a certain extent, so the method provided by the invention can promote the generalization of the trained deep neural network, relieve the overfitting and finally achieve the effect of improving the recognition accuracy
Drawings
FIG. 1 is a flow chart of an unsupervised pedestrian re-identification method based on an augmented network of the present invention.
Detailed Description
The present invention will be described in further detail with reference to examples and drawings, but embodiments of the present invention are not limited thereto.
Examples
As shown in FIG. 1, this embodiment provides an unsupervised pedestrian re-identification method based on an augmentation network, comprising the following steps:
S1: the augmentation operations are performed on the unlabeled pedestrian image data set D0, comprising image scaling, random cropping, random erasing, noise addition and Gaussian blur (any subset of these five operations may be selected and combined, or all of them may be used; this embodiment uses all five and details them below), to obtain five new augmented data sets D1 to D5.
In step S1, the image scaling, random cropping, random erasing, noise addition and Gaussian blur are specified as follows:
S11: the original unlabeled pedestrian image data are scaled by bilinear interpolation, simulating the images of various resolutions that may occur in natural data sets. The value at a point (x, y) is calculated as:

f(x, y) = f(q11)(x2 − x)(y2 − y) + f(q21)(x − x1)(y2 − y) + f(q12)(x2 − x)(y − y1) + f(q22)(x − x1)(y − y1)

where q11 = (x1, y1), q12 = (x1, y2), q21 = (x2, y1), q22 = (x2, y2) are the four pixel points closest to the point (x, y), with unit grid spacing so that (x2 − x1)(y2 − y1) = 1.
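For illustration, a minimal NumPy sketch of this interpolation at one point, following the q11/q12/q21/q22 notation above (the function name, the img[y, x] indexing convention, and in-bounds coordinates are assumptions):

```python
import math
import numpy as np

def bilinear_sample(img: np.ndarray, x: float, y: float) -> np.ndarray:
    img = img.astype(np.float64)         # avoid uint8 overflow in the weighted sum
    x1, y1 = math.floor(x), math.floor(y)
    x2, y2 = x1 + 1, y1 + 1              # assumes (x, y) is at least 1 px from the border
    q11, q12 = img[y1, x1], img[y2, x1]
    q21, q22 = img[y1, x2], img[y2, x2]
    # Unit grid spacing, so the usual denominator (x2 - x1)(y2 - y1) equals 1.
    return (q11 * (x2 - x) * (y2 - y) + q21 * (x - x1) * (y2 - y) +
            q12 * (x2 - x) * (y - y1) + q22 * (x - x1) * (y - y1))
```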
S12: the original unlabeled pedestrian image data are augmented by random cropping, simulating the various partial pedestrian images that may occur in natural data sets, specifically: a pixel is randomly selected in the image; then, taking that pixel as the upper-left corner, a rectangle of random length and width is formed, and the pixels inside the rectangle are output as the cropping result.
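A minimal sketch of this cropping rule, assuming a NumPy H x W x C image and a NumPy random generator (all names are illustrative); `rng = np.random.default_rng(0)` makes the crop reproducible:

```python
import numpy as np

def random_crop(img: np.ndarray, rng: np.random.Generator) -> np.ndarray:
    h, w = img.shape[:2]
    top = rng.integers(0, h)           # random upper-left corner
    left = rng.integers(0, w)
    ch = rng.integers(1, h - top + 1)  # random height, kept in bounds
    cw = rng.integers(1, w - left + 1) # random width, kept in bounds
    return img[top:top + ch, left:left + cw]
```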
S13: the original unlabeled pedestrian image data are augmented by random erasing, simulating the various missing or incomplete pedestrian images that may occur in natural data sets, specifically:
a pixel is randomly selected in the image; then, taking that pixel as the upper-left corner, a rectangle of random length and width is formed, all pixels inside the rectangle are set to black (i.e. pixel value (0, 0, 0)), and the whole image after the operation is output as the random-erasing result.
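A corresponding sketch of random erasing under the same assumptions (NumPy H x W x C image, illustrative names):

```python
import numpy as np

def random_erase(img: np.ndarray, rng: np.random.Generator) -> np.ndarray:
    out = img.copy()
    h, w = img.shape[:2]
    top = rng.integers(0, h)           # random upper-left corner
    left = rng.integers(0, w)
    eh = rng.integers(1, h - top + 1)  # random rectangle kept in bounds
    ew = rng.integers(1, w - left + 1)
    out[top:top + eh, left:left + ew] = 0  # fill the rectangle with black (0, 0, 0)
    return out
```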
S14: the original unlabeled pedestrian image data are augmented by noise addition, simulating the image noise that may occur in natural data sets, specifically:
each pixel becomes, with a certain probability, a white point (i.e. pixel value (255, 255, 255)) or a black point (i.e. pixel value (0, 0, 0)), and the whole image after the operation is output as the noise-addition result.
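A sketch of this salt-and-pepper noise, where the per-pixel probability p is an assumed parameter the patent leaves open:

```python
import numpy as np

def add_salt_pepper(img: np.ndarray, p: float, rng: np.random.Generator) -> np.ndarray:
    out = img.copy()
    r = rng.random(img.shape[:2])    # one uniform draw per pixel
    out[r < p / 2] = 255             # white point (255, 255, 255)
    out[(r >= p / 2) & (r < p)] = 0  # black point (0, 0, 0)
    return out
```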
S15: the original unlabeled pedestrian image data are augmented with Gaussian blur to simulate the image blurring that may occur in natural data sets, according to the two-dimensional Gaussian function:

G(x, y) = (1 / (2πσ^2)) · exp(−(x^2 + y^2) / (2σ^2))

After the σ value is set, a weight matrix can be calculated from this formula; performing the matrix operation centered on each pixel of the image blurs the image.
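A sketch of the weight-matrix computation from the formula above; the kernel size is an assumed parameter (for example 5), and the weights are normalized so the blurred image keeps its overall brightness:

```python
import numpy as np

def gaussian_kernel(size: int, sigma: float) -> np.ndarray:
    # Weight matrix from G(x, y) = exp(-(x^2 + y^2) / (2 sigma^2)) / (2 pi sigma^2)
    ax = np.arange(size) - size // 2
    xx, yy = np.meshgrid(ax, ax)
    k = np.exp(-(xx**2 + yy**2) / (2 * sigma**2)) / (2 * np.pi * sigma**2)
    return k / k.sum()  # normalize so the weights sum to 1
```

Convolving the image with this matrix, centered on each pixel (for example with scipy.ndimage.convolve), produces the blurred output.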
S2: the original image data of the original pedestrian image data set D0 are fed into a convolutional neural network serving as the main network N0 for forward propagation to extract a feature F0. The pedestrian image data are passed to the corresponding convolutional neural network, and features are extracted by forward propagation according to:

a^l = σ(z^l) = σ(a^(l−1) * W^l + b^l)

where a denotes the intermediate layer output; σ the activation function; z the input of the activation layer; the superscript the layer number; * the convolution operation; W the convolution kernel; and b the bias.
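A one-layer sketch of this forward rule in PyTorch, with ReLU as an assumed activation and the padding chosen arbitrarily:

```python
import torch
import torch.nn.functional as F

# One layer of the forward pass: z^l = a^(l-1) * W^l + b^l, a^l = sigma(z^l).
# Shapes: a_prev (N, C_in, H, W), W (C_out, C_in, k, k), b (C_out,).
def conv_forward(a_prev: torch.Tensor, W: torch.Tensor, b: torch.Tensor) -> torch.Tensor:
    z = F.conv2d(a_prev, W, b, padding=1)  # z^l
    return torch.relu(z)                   # a^l = sigma(z^l)
```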
S3: the corresponding augmented image data of the five augmented data sets D1 to D5 are respectively fed into five convolutional neural networks with unshared parameters, serving as the augmentation networks N1 to N5, for forward propagation to extract features F1 to F5; the forward propagation in step S3 is performed in the same way as in step S2.
S4: an image Inegative is randomly selected from the original pedestrian image data set D0 as a negative sample and fed into the main network N0 for forward propagation to extract a feature Fnegative.
S5: the Euclidean distance between the output feature F0 and each of the output features F1 to F5 is calculated to obtain five loss values L1 to L5; specifically, the distance between the feature F0 extracted by the main network N0 and each of the features F1 to F5 extracted by the augmentation networks N1 to N5 is:

d(x, y) = sqrt( Σ_i (x_i − y_i)^2 )

where x takes the feature F0 extracted by the main network; y takes in turn the features F1 to F5 extracted by the five augmentation networks; and x_i, y_i are the values of the corresponding features in each dimension.
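A sketch of this distance in PyTorch; the feature variables f0, f1, ..., f5 are assumed to be the outputs of S2 and S3:

```python
import torch

def euclidean(x: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
    # d(x, y) = sqrt(sum_i (x_i - y_i)^2) over all feature dimensions
    return torch.sqrt(((x - y) ** 2).sum())

# S5: losses = [euclidean(f0, fi) for fi in (f1, f2, f3, f4, f5)]  # L1..L5
```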
S6: the Euclidean distance between the output feature Fnegative and each of the output features F0 to F5 is calculated to obtain six loss values L0negative to L5negative. Since the amount of data in a data set is generally large and any one class makes up only a small proportion of the total, a randomly selected image is taken here as a negative sample, which is valid in most cases.
S7: the results of subtracting the five loss values L1negative to L5negative obtained in S6 from the corresponding five loss values L1 to L5 obtained in S5 are used as losses to perform backward propagation on the augmentation networks N1 to N5, computing gradients and updating their parameters.
The calculated error value is passed back to the corresponding convolutional neural network, and the network's parameter values are updated iteratively by the backward propagation algorithm:

δ^(l−1) = δ^l * rot180(W^l) ⊙ σ′(z^(l−1))

where the superscript denotes the layer number; δ the gradient value; * the convolution operation; W the convolution kernel; rot180 the matrix rotated by 180 degrees, i.e. flipped once vertically and then once horizontally; ⊙ point-wise multiplication; and σ′ the derivative of the activation function.
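A literal single-channel sketch of this rule, assuming the forward pass used a 'valid' true convolution and a ReLU activation (so σ′(z) is 1 where z > 0 and 0 elsewhere); SciPy's convolve2d performs the true, kernel-flipping convolution written above as *:

```python
import numpy as np
from scipy.signal import convolve2d

# delta^(l-1) = delta^l (*) rot180(W^l), then point-wise times sigma'(z^(l-1)).
def conv_backward_delta(delta_l: np.ndarray, W: np.ndarray, z_prev: np.ndarray) -> np.ndarray:
    rotated = np.rot90(W, 2)                            # rot180: flip twice
    spread = convolve2d(delta_l, rotated, mode='full')  # back to the input size
    return spread * (z_prev > 0)                        # point-wise sigma'(z^(l-1))
```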
S8: the five loss values L1 to L5 obtained in S5 are summed, and the sum of the loss values L0negative to L5negative obtained in S6 is subtracted, giving the total loss value L0:

L0 = Σ_{i=1}^{5} λ_i·L_i − Σ_{inegative=0}^{5} λ_inegative·L_inegative

where λ_i, i ∈ [1, 5] are the positive-sample weight values and L_i, i ∈ [1, 5] the corresponding positive-sample error values, with λ_i = 1 taken here; λ_inegative, inegative ∈ [0, 5] are the negative-sample weight values and L_inegative, inegative ∈ [0, 5] the corresponding negative-sample error values, with λ_inegative = 1 taken here.
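Read literally with all weights λ set to 1, S8 reduces to a one-line computation (pos_losses and neg_losses are assumed to hold the S5 and S6 values):

```python
# S8: pos_losses = [L1, ..., L5], neg_losses = [L0negative, ..., L5negative]
def total_loss(pos_losses, neg_losses):
    return sum(pos_losses) - sum(neg_losses)
```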
S9: the total loss value L0 obtained in S8 is used as the loss to perform backward propagation on the main network N0, computing gradients and updating the main-network parameters.
S10: the operations of S2 to S9 are repeated until the main network and the augmentation networks converge.
S11: the master network model is taken as output.
The above examples are preferred embodiments of the present invention, but the embodiments of the present invention are not limited to them; any other change, modification, substitution, combination, or simplification that does not depart from the spirit and principle of the present invention is an equivalent replacement and is included in the protection scope of the present invention.
Claims (10)
1. An unsupervised pedestrian re-identification method based on an augmentation network, characterized by comprising the following steps:
S1: performing augmentation operations on an unlabeled original pedestrian image data set D0, the operations comprising one or more of image scaling, random cropping, random erasing, noise addition and Gaussian blur, to obtain M new augmented data sets D1-DM, where M is a positive integer;
S2: feeding the original image data of the original pedestrian image data set D0 into a convolutional neural network serving as the main network N0 for forward propagation to extract a feature F0;
S3: respectively feeding the corresponding augmented image data of the M augmented data sets D1-DM into M convolutional neural networks with unshared parameters, serving as the augmentation networks N1-NM, for forward propagation to extract features F1-FM;
S4: randomly selecting an image Inegative from the original pedestrian image data set D0 as a negative sample, and feeding it into the main network N0 for forward propagation to extract a feature Fnegative;
S5: calculating the Euclidean distance between the output feature F0 and each of the output features F1-FM to obtain M loss values L1-LM;
S6: calculating the Euclidean distance between the output feature Fnegative and each of the output features F0-FM to obtain M+1 loss values L0negative-LMnegative;
S7: using the results of subtracting the M loss values L1negative-LMnegative obtained in S6 from the corresponding M loss values L1-LM obtained in S5 as losses, performing backward propagation on the augmentation networks N1-NM to compute gradients and update their parameters;
S8: summing the M loss values L1-LM obtained in S5 and subtracting the sum of the loss values L0negative-LMnegative obtained in S6 to obtain a total loss value L0;
S9: using the total loss value L0 obtained in S8 as the loss, performing backward propagation on the main network N0 to compute gradients and update the main-network parameters;
S10: repeating S2-S9 until the main network and the augmentation networks converge;
S11: outputting the main network model.
2. The method for unsupervised pedestrian re-identification based on an augmentation network according to claim 1, characterized in that in step S1, when the augmentation operations include image scaling, the image is scaled using bilinear interpolation, so as to simulate the images of various resolutions that may occur in natural data sets, the value at a point (x, y) being calculated as:

f(x, y) = f(q11)(x2 − x)(y2 − y) + f(q21)(x − x1)(y2 − y) + f(q12)(x2 − x)(y − y1) + f(q22)(x − x1)(y − y1)

where q11 = (x1, y1), q12 = (x1, y2), q21 = (x2, y1), q22 = (x2, y2) are the four pixel points closest to the point (x, y), with unit grid spacing so that (x2 − x1)(y2 − y1) = 1.
3. The method for unsupervised pedestrian re-identification based on an augmentation network according to claim 1, characterized in that in step S1, when the augmentation operations include random cropping, the augmentation is performed by a random-cropping method, so as to simulate the various partial pedestrian images that may occur in natural data sets, specifically:
a pixel is randomly selected in the image; then, taking that pixel as the upper-left corner, a rectangle of random length and width is formed, and the pixels inside the rectangle are output as the cropping result.
4. The method for unsupervised pedestrian re-identification based on an augmentation network according to claim 1, characterized in that in step S1, when the augmentation operations include random erasing, the augmentation is performed by a random-erasing method, so as to simulate the various missing or incomplete pedestrian images that may occur in natural data sets, specifically:
a pixel is randomly selected in the image; then, taking that pixel as the upper-left corner, a rectangle of random length and width is formed, all pixels inside the rectangle are set to black, i.e. pixel value (0, 0, 0), and the whole image after the operation is output as the random-erasing result.
5. The method for unsupervised pedestrian re-identification based on an augmentation network according to claim 1, characterized in that in step S1, when the augmentation operations include noise addition, the augmentation is performed by a noise-adding method, so as to simulate the image noise that may occur in natural data sets, specifically:
each pixel becomes, with a certain probability, a white point, i.e. pixel value (255, 255, 255), or a black point, i.e. pixel value (0, 0, 0), and the whole image after the operation is output as the noise-addition result.
6. The method for unsupervised pedestrian re-identification based on an augmentation network according to claim 1, characterized in that in step S1, when the augmentation operations include Gaussian blur, the augmentation is performed by a Gaussian blur method, so as to simulate the image blurring that may occur in natural data sets, according to the two-dimensional Gaussian function:

G(x, y) = (1 / (2πσ^2)) · exp(−(x^2 + y^2) / (2σ^2))

After the σ value is set, a weight matrix can be calculated from this formula; performing the matrix operation centered on each pixel of the image blurs the image.
7. The method for unsupervised pedestrian re-identification based on an augmentation network according to claim 1, characterized in that in steps S2 and S3, the respective pedestrian image data are fed into the corresponding convolutional neural network, and features are extracted by forward propagation according to:

a^l = σ(z^l) = σ(a^(l−1) * W^l + b^l)

where a denotes the intermediate layer output; σ the activation function; z the input of the activation layer; the superscript the layer number; * the convolution operation; W the convolution kernel; and b the bias.
8. The method for unsupervised pedestrian re-identification based on an augmentation network according to claim 1, characterized in that step S5 specifically includes:
calculating the Euclidean distance between the feature F0 extracted by the main network N0 and each of the features F1-FM extracted by the augmentation networks N1-NM:

d(x, y) = sqrt( Σ_i (x_i − y_i)^2 )

where x takes the feature F0 extracted by the main network; y takes in turn the features F1-FM extracted by the M augmentation networks; and x_i, y_i are the values of the corresponding features in each dimension;
and step S6 specifically includes:
randomly selecting an image as the negative sample, extracting its feature Fnegative with the main network, and calculating the Euclidean distance between Fnegative and each of the output features F0-FM.
9. The method for unsupervised pedestrian re-identification based on an augmentation network according to claim 1, characterized in that step S7 specifically includes:
passing the calculated error value back to the corresponding convolutional neural network, and iteratively updating the network's parameter values by the backward propagation algorithm:

δ^(l−1) = δ^l * rot180(W^l) ⊙ σ′(z^(l−1))

where the superscript denotes the layer number; δ the gradient value; * the convolution operation; W the convolution kernel; rot180 the matrix rotated by 180 degrees, i.e. flipped once vertically and then once horizontally; ⊙ point-wise multiplication; and σ′ the derivative of the activation function.
10. The method for unsupervised pedestrian re-identification based on an augmentation network according to claim 1, characterized in that step S8 specifically includes:
summing the error values L1-LM, obtained by computing the Euclidean distances between the features produced by the augmentation networks and the feature produced by the main network, and subtracting the sum of L0negative-LMnegative to give the total error value L0:

L0 = Σ_{i=1}^{M} λ_i·L_i − Σ_{inegative=0}^{M} λ_inegative·L_inegative

where λ_i, i ∈ [1, M] are the positive-sample weight values and L_i, i ∈ [1, M] the corresponding positive-sample error values, with λ_i = 1 taken here; λ_inegative, inegative ∈ [0, M] are the negative-sample weight values and L_inegative, inegative ∈ [0, M] the corresponding negative-sample error values, with λ_inegative = 1 taken here.
Priority Applications (1)

Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201911310016.1A | 2019-12-18 | 2019-12-18 | Unsupervised pedestrian re-identification method based on augmented network |

Applications Claiming Priority (1)

Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201911310016.1A | 2019-12-18 | 2019-12-18 | Unsupervised pedestrian re-identification method based on augmented network |
Publications (2)

Publication Number | Publication Date |
---|---|
CN111062329A | 2020-04-24 |
CN111062329B | 2023-05-30 |
Family
ID=70302269

Family Applications (1)

Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201911310016.1A (Active) | Unsupervised pedestrian re-identification method based on augmented network | 2019-12-18 | 2019-12-18 |

Country Status (1)

Country | Link |
---|---|
CN | CN111062329B (en) |
Families Citing this family (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111985645A (en) * | 2020-08-28 | 2020-11-24 | 北京市商汤科技开发有限公司 | Neural network training method and device, electronic equipment and storage medium |
CN112043260B (en) * | 2020-09-16 | 2022-11-15 | 杭州师范大学 | Electrocardiogram classification method based on local mode transformation |
CN112200187A (en) * | 2020-10-16 | 2021-01-08 | 广州云从凯风科技有限公司 | Target detection method, device, machine readable medium and equipment |
CN112580720B (en) * | 2020-12-18 | 2024-07-09 | 华为技术有限公司 | Model training method and device |
CN113033410B (en) * | 2021-03-26 | 2023-06-06 | 中山大学 | Domain generalization pedestrian re-recognition method, system and medium based on automatic data enhancement |
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109426858B (en) * | 2017-08-29 | 2021-04-06 | 京东方科技集团股份有限公司 | Neural network, training method, image processing method, and image processing apparatus |
- 2019-12-18: application CN201911310016.1A filed in China; patent CN111062329B/en, status Active
Patent Citations (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109583379A (en) * | 2018-11-30 | 2019-04-05 | 常州大学 | A kind of pedestrian's recognition methods again being aligned network based on selective erasing pedestrian |
Non-Patent Citations (1)

Title |
---|
Wei-Shi Zheng et al., "Fast Open-World Person Re-Identification," IEEE Transactions on Image Processing, vol. 27, no. 5, 2017-08-16, pp. 1-2 |
Also Published As
Publication number | Publication date |
---|---|
CN111062329A (en) | 2020-04-24 |
Similar Documents
Publication | Title |
---|---|
CN111062329B (en) | Unsupervised pedestrian re-identification method based on augmented network | |
CN112287940B (en) | Semantic segmentation method of attention mechanism based on deep learning | |
CN109886121B (en) | Human face key point positioning method for shielding robustness | |
Li et al. | Semantic relationships guided representation learning for facial action unit recognition | |
CN110210551B (en) | Visual target tracking method based on adaptive subject sensitivity | |
CN108256562B (en) | Salient target detection method and system based on weak supervision time-space cascade neural network | |
CN107945118B (en) | Face image restoration method based on generating type confrontation network | |
WO2019136591A1 (en) | Salient object detection method and system for weak supervision-based spatio-temporal cascade neural network | |
CN116682120A (en) | Multilingual mosaic image text recognition method based on deep learning | |
CN114821050B (en) | Method for dividing reference image based on transformer | |
CN111861886B (en) | Image super-resolution reconstruction method based on multi-scale feedback network | |
CN113052775B (en) | Image shadow removing method and device | |
CN111276240A (en) | Multi-label multi-mode holographic pulse condition identification method based on graph convolution network | |
CN113378812A (en) | Digital dial plate identification method based on Mask R-CNN and CRNN | |
CN114898284B (en) | Crowd counting method based on feature pyramid local difference attention mechanism | |
CN116310693A (en) | Camouflage target detection method based on edge feature fusion and high-order space interaction | |
CN110751271B (en) | Image traceability feature characterization method based on deep neural network | |
Pang et al. | PTRSegNet: A Patch-to-Region Bottom-Up Pyramid Framework for the Semantic Segmentation of Large-Format Remote Sensing Images | |
Li et al. | Semantic prior-driven fused contextual transformation network for image inpainting | |
Yuan et al. | An efficient attention based image adversarial attack algorithm with differential evolution on realistic high-resolution image | |
CN114708591A (en) | Document image Chinese character detection method based on single character connection | |
Sun et al. | Knock knock, who’s there: Facial recognition using CNN-based classifiers | |
Essam et al. | SwinRelTR: an efficient single-stage scene graph generation model for low-resolution images | |
CN116188774B (en) | Hyperspectral image instance segmentation method and building instance segmentation method | |
JP7386940B2 (en) | Processing images containing overlapping particles |
Legal Events
Code | Title |
---|---|
PB01 | Publication |
SE01 | Entry into force of request for substantive examination |
GR01 | Patent grant |