CN109086811B - Multi-label image classification method and device and electronic equipment - Google Patents
- Publication number: CN109086811B
- Application number: CN201810802045.9A
- Authority: CN (China)
- Legal status: Active
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
- G06F18/241—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
Abstract
The invention provides a multi-label image classification method, a multi-label image classification device, and an electronic device. On one hand, a first feature image is extracted by a convolutional layer and then classified through a pooling layer and a fully-connected layer to obtain a first label classification prediction result. On the other hand, feature filtering is performed on the first feature image according to the parameters of the fully-connected layer to obtain a second feature image, where the parameters of the fully-connected layer and the parameters of the convolutional layer are optimized based on a metric learning algorithm; the second feature image is then pooled to obtain a second label classification prediction result. Finally, the first and second label classification prediction results are jointly considered to obtain a target label classification prediction result. The method performs label classification from two aspects, and the first label classification prediction result is corrected by the second label classification prediction result obtained from the second feature map, so that the number of label combinations is reduced and the precision of multi-label image recognition is improved.
Description
Technical Field
The invention relates to the technical field of image processing, in particular to a multi-label image classification method and device and electronic equipment.
Background
Multi-label image classification is a very important research topic in computer vision. Because a picture taken in a real scene usually contains multiple objects, the image carries multiple labels, and the number of possible label combinations grows exponentially compared with the single-label case. Compared with single-label image classification, multi-label image classification is therefore more difficult, achieves lower precision, and is of greater research significance.
Most conventional methods use a graph to model the relationships between labels, artificially adding constraints to the final prediction so as to reduce the number of classification results. Such methods depend heavily on human prior knowledge and on the quality of the constructed graph, and thus have great limitations. In recent years, with the rapid development of deep learning, more and more methods model the relationships between labels with a neural network, and such methods can break through the above limitations. However, most current deep learning methods use an attention mechanism and improve accuracy on the basis of single-label classification. Therefore, there is still no good solution for truly exploiting the relationships between labels to improve the precision of multi-label image recognition.
Disclosure of Invention
In view of the above, the present invention provides a multi-label image classification method, apparatus and electronic device to effectively reduce the number of label combinations and improve the accuracy of multi-label image recognition.
In a first aspect, an embodiment of the present invention provides a multi-label image classification method, including:
extracting a first feature image of an image to be processed;
processing the first feature image sequentially through a pooling layer and a fully-connected layer to obtain a first label classification prediction result;
obtaining a second feature image according to the parameters of the fully-connected layer and the first feature image, where the second feature image includes a sub-feature image corresponding to each label of a preset category;
performing pooling processing on the second feature image to obtain a second label classification prediction result;
and obtaining a target label classification prediction result according to the first label classification prediction result and the second label classification prediction result.
With reference to the first aspect, an embodiment of the present invention provides a first possible implementation manner of the first aspect, where the method further includes: determining a first loss function according to the target label classification prediction result;
determining a final loss function according to a preset metric learning loss function and the first loss function;
where the metric learning loss function is set based on a metric learning algorithm.
With reference to the first possible implementation manner of the first aspect, an embodiment of the present invention provides a second possible implementation manner of the first aspect, where the method further includes:
in the training process, optimizing the parameters of the fully-connected layer and the parameters of the convolutional layer based on the metric learning loss function and the correlations between the labels.
With reference to the second possible implementation manner of the first aspect, an embodiment of the present invention provides a third possible implementation manner of the first aspect, where optimizing the parameters of the fully-connected layer and the parameters of the convolutional layer based on the metric learning loss function and the correlations between the labels includes:
mapping the sub-feature image corresponding to each label into a preset space based on a metric learning algorithm, and calculating the distances between the sub-feature images in the preset space;
and optimizing the parameters of the fully-connected layer and the parameters of the convolutional layer by using a back-propagation algorithm, based on the metric learning loss function and the correlations between the labels, so as to adjust the distances between the sub-feature images in the preset space.
With reference to the third possible implementation manner of the first aspect, an embodiment of the present invention provides a fourth possible implementation manner of the first aspect, where, among the sub-feature images corresponding to the labels, the sub-feature image corresponding to a label that belongs to the currently input image is taken as a correlated image, and the sub-feature image corresponding to a label that does not belong to the currently input image is taken as an uncorrelated image;
and the distances between the correlated images in the preset space are reduced, while the uncorrelated images are pushed away so that they are far from the correlated images in the preset space.
With reference to the first aspect, an embodiment of the present invention provides a fifth possible implementation manner of the first aspect, where processing the first feature image sequentially through a pooling layer and a fully-connected layer to obtain a first label classification prediction result includes:
inputting the first feature image into a pooling layer, and pooling the first feature image based on a global max pooling function to obtain a dimension-reduced first feature image;
and inputting the dimension-reduced first feature image into a fully-connected layer for classification processing to generate the first label classification prediction result.
With reference to the first aspect, an embodiment of the present invention provides a sixth possible implementation manner of the first aspect, where obtaining a second feature image according to the parameters of the fully-connected layer and the first feature image includes:
multiplying the first feature image by the parameters of the fully-connected layer to obtain the second feature image.
With reference to the first aspect, an embodiment of the present invention provides a seventh possible implementation manner of the first aspect, where obtaining a target label classification prediction result according to the first label classification prediction result and the second label classification prediction result includes:
taking the sum of the first label classification prediction result and the second label classification prediction result as the target label classification prediction result.
In a second aspect, an embodiment of the present invention further provides a multi-label image classification apparatus, including:
a first extraction module, configured to extract a first feature image of an image to be processed;
a first prediction module, configured to process the first feature image sequentially through a pooling layer and a fully-connected layer to obtain a first label classification prediction result;
a second extraction module, configured to obtain a second feature image according to the parameters of the fully-connected layer and the first feature image, where the second feature image includes a sub-feature image corresponding to each label of a preset category;
a second prediction module, configured to perform pooling processing on the second feature image to obtain a second label classification prediction result;
and a target prediction module, configured to obtain a target label classification prediction result according to the first label classification prediction result and the second label classification prediction result.
In a third aspect, an embodiment of the present invention further provides an electronic device, including a memory and a processor, where the memory stores a computer program that is executable on the processor, and the processor executes the computer program to implement the method described in the first aspect and any possible implementation manner thereof.
In a fourth aspect, the present invention further provides a computer-readable medium having non-volatile program code executable by a processor, where the program code causes the processor to execute the method described in the first aspect and any possible implementation manner thereof.
The embodiment of the invention has the following beneficial effects:
in the embodiments provided by the invention, on one hand, a first feature image is obtained using the convolutional layer, and the first feature image is then classified using the pooling layer and the fully-connected layer to obtain a first label classification prediction result. On the other hand, feature filtering is performed on the first feature image according to the parameters of the fully-connected layer to obtain a second feature image, where the parameters of the fully-connected layer and the parameters of the convolutional layer are optimized based on a metric learning algorithm; the second feature image is then pooled to obtain a second label classification prediction result. Finally, the first and second label classification prediction results are jointly considered to obtain the target label classification prediction result. The method performs label classification from two aspects: a second feature map carrying the relationships between labels is obtained based on a metric learning algorithm, and the first label classification prediction result is corrected by the second label classification prediction result obtained from this second feature map, thereby reducing the number of label combinations and helping to improve the precision of multi-label image recognition.
Additional features and advantages of the invention will be set forth in the description which follows, and in part will be obvious from the description, or may be learned by practice of the invention. The objectives and other advantages of the invention will be realized and attained by the structure particularly pointed out in the written description and claims hereof as well as the appended drawings.
In order to make the aforementioned and other objects, features and advantages of the present invention comprehensible, preferred embodiments accompanied with figures are described in detail below.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, and it is obvious that the drawings in the following description are some embodiments of the present invention, and other drawings can be obtained by those skilled in the art without creative efforts.
Fig. 1 is a schematic diagram of a network model structure of a multi-label image classification method according to an embodiment of the present invention;
fig. 2 is a schematic flowchart of a multi-label image classification method according to an embodiment of the present invention;
fig. 3 is a schematic structural diagram of a multi-label image classification apparatus according to an embodiment of the present invention;
fig. 4 is a schematic structural diagram of another multi-label image classification apparatus according to an embodiment of the present invention;
fig. 5 is a schematic structural diagram of an electronic device according to an embodiment of the present invention.
Detailed Description
To make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions of the present invention will be clearly and completely described below with reference to the accompanying drawings, and it is apparent that the described embodiments are some, but not all embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
At present, modeling the relationships between labels with a graph has great limitations. To break through these limitations, neural-network-based methods have been adopted to model the relationships between labels; however, most of these methods use an attention mechanism and improve accuracy on the basis of single-label classification, so they do not truly model the relationships between labels, and the precision of multi-label image recognition still needs to be improved.
Regarding the problem of how to model the relationships between labels in multi-label image classification, it was found that some labels have strong correlation, such as sky and white clouds, while others have weak or even negative correlation, such as polar bears and penguins (which hardly ever occur together outside a zoo). Therefore, some labels have a high probability of appearing in the same image, while others essentially never co-occur in one image. Label combinations that never co-occur need not be considered, which reduces the number of predicted label combinations and improves the precision of multi-label image classification.
Based on this line of research, and assuming that labels appearing in the same image have correlation, the multi-label image classification method, apparatus, and electronic device provided in the embodiments of the present invention, on one hand, obtain a first feature image using a convolutional layer and then classify it through a pooling layer and a fully-connected layer to obtain a first label classification prediction result. On the other hand, feature filtering is performed on the first feature image according to the parameters of the fully-connected layer to obtain a second feature image; the second feature image is then pooled to obtain a second label classification prediction result. Finally, the first and second label classification prediction results are jointly considered to obtain a target label classification prediction result. The method performs label classification from two aspects: a second feature map carrying the relationships between labels is obtained based on a metric learning algorithm, and the first label classification prediction result is corrected by the second label classification prediction result obtained from this second feature map, thereby reducing the number of label combinations and helping to improve the precision of multi-label image recognition.
In one embodiment, the multi-label image classification method is implemented by the network model shown in fig. 1. The network model comprises two sub-networks: a main network, used to extract image features and generate the first label classification prediction result, and a metric learning network, used to apply constraints on the main network and help it correct and enhance the label classification result. During training, the metric learning network must first apply a metric learning algorithm to model the relationships between the labels, so that during testing the feature map carrying these relationships can be used to improve the final precision of multi-label image recognition. For ease of understanding, the multi-label image classification method disclosed in this embodiment is described in detail below with reference to fig. 2.
Fig. 2 is a flowchart illustrating a multi-label image classification method according to an embodiment of the present invention. As shown in fig. 2, the multi-label image classification method includes:
in step S201, a first feature image of an image to be processed is extracted.
In the embodiment of the present invention, the image to be processed may be an image uploaded by a user in a picture format such as bmp, jpg, or png; a photograph captured by an image acquisition device such as a camera; or an image in a picture format downloaded by the user over a network.
In one embodiment, as shown in fig. 1, the first feature image is obtained from the convolutional layers of the main network, for example a deep convolutional network such as ResNet101, which yields the corresponding convolutional features. For example, for a 448 × 448 image to be processed, a first feature image with dimensions 2048 × 14 × 14 is obtained through the ResNet101 network, and this first feature image contains the information of all labels in the image to be processed. For convenience of the following description, the first feature image is denoted z_i, where the index i denotes the i-th image input into the main network.
In practical applications, the features of the image to be processed may also be extracted in other ways to obtain the first feature image, for example with a VGG network or an Inception network.
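As a concrete illustration of this step, the following is a minimal sketch of the feature extraction described above, assuming a PyTorch/torchvision environment; the choice of ResNet101 and the 448 × 448 input follow the example in this section, while the variable names and the `weights` argument are illustrative.

```python
import torch
import torch.nn as nn
from torchvision import models

backbone = models.resnet101(weights=None)
# Keep only the convolutional trunk: drop the average pool and classifier head.
features = nn.Sequential(*list(backbone.children())[:-2])

x = torch.randn(1, 3, 448, 448)   # one image to be processed
z = features(x)                   # first feature image z_i
print(z.shape)                    # torch.Size([1, 2048, 14, 14])
```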
Step S202, the first feature image is processed sequentially by a pooling layer and a fully-connected layer to obtain a first label classification prediction result.
To avoid the loss of small-object information that easily occurs with Global Average Pooling (GAP), in one embodiment Global Max Pooling (GMP) is used to reduce the dimensionality of the first feature image. Based on this, step S202 includes: inputting the first feature image into a pooling layer and pooling it based on a global max pooling function to obtain a dimension-reduced first feature image; and inputting the dimension-reduced first feature image into a Fully Connected (FC) layer for classification to generate the first label classification prediction result.
Continuing the example from step S201, as shown in fig. 1, assume the image to be processed is 448 × 448 and a first feature image with dimensions 2048 × 14 × 14 is obtained through the ResNet101 network. After processing by the global max pooling function, a feature vector of dimension 2048 is obtained. Assume the total number of labels in the preset categories of the network model is C (i.e., the total number of labels contained in the training set when the network model is trained), with C less than 2048. The parameters of the fully-connected layer are then configured according to the total number of preset-category labels and the dimensionality of the dimension-reduced first feature image, so the fully-connected layer is 2048 × C. Finally, the 2048-dimensional feature vector is passed through the 2048 × C fully-connected layer to obtain the first label classification prediction result, which is a set of C values, each representing the prediction result for one category.
Assuming C is 3, the first label classification prediction result is a set of 3 values, for example A = {a1, a2, a3}, where a1 represents the prediction result for the first category, a2 the prediction result for the second category, and a3 the prediction result for the third category.
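The pooling and classification of step S202 can be sketched as follows, again assuming PyTorch; C = 80 anticipates the MS-COCO example used later in this description, and the bias-free linear layer is an assumption made so that its weight matrix matches the 2048 × C parameter W reused in step S203.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

C = 80                                      # e.g., the 80 labels of MS-COCO
fc = nn.Linear(2048, C, bias=False)         # the 2048 x C fully-connected layer W

z = torch.randn(1, 2048, 14, 14)            # first feature image from the backbone
v = F.adaptive_max_pool2d(z, 1).flatten(1)  # global max pooling -> (1, 2048)
y_p = fc(v)                                 # first label prediction, one value per label
print(y_p.shape)                            # torch.Size([1, 80])
```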
It should be noted that the number of preset-category labels in the embodiment of the present invention may be 1 or greater than 1; when it is greater than 1, the benefit of the multi-label image classification method of this embodiment is more pronounced.
Step S203, obtaining a second feature image according to the parameters of the fully-connected layer and the first feature image, where the second feature image comprises a sub-feature image corresponding to each label of a preset category.
For example, if the preset-category labels (hereinafter, the labels in the training set) include blue sky, white cloud, puppy, and kitten, the second feature map includes a sub-feature image corresponding to the blue sky, a sub-feature image corresponding to the white cloud, a sub-feature image corresponding to the puppy, and a sub-feature image corresponding to the kitten.
Step S203 includes: multiplying the first feature image by the parameters of the fully-connected layer to obtain the second feature image.
Still taking fig. 1 as an example, the parameter W of the fully-connected layer, with dimensions 2048 × C, is first multiplied by the first feature map. This parameter can be understood as a filter that keeps the information required by each label category in the first feature map and filters out the information unrelated to each label category. As can be seen from fig. 1, the dimensions of the second feature map are C × 14 × 14, where C is the total number of preset-category labels of the network model; for example, when the training set used to train the network model contains 80 labels in total, C = 80. The second feature image includes a sub-feature image corresponding to each preset-category label. The second feature image may also be called a multi-label activation map, and the sub-feature image corresponding to a label may also be called the activation map of that label, for example the activation map corresponding to the label sky.
In a possible embodiment, A_c may be defined as the activation map corresponding to label c: A_c = W_c · z_i, where W_c denotes the parameter corresponding to label c in the parameter W, with dimensions 1 × 2048.
Specifically, when multiplying the parameter W of the fully-connected layer by the first feature map, the first feature map is first straightened to 2048 × 196, then multiplied by W to obtain C × 196, and finally C × 196 is reshaped to C × 14 × 14.
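Under the same PyTorch assumptions as above, the straighten-multiply-reshape procedure just described can be sketched as:

```python
import torch

C = 80
W = torch.randn(C, 2048)       # fc.weight from the sketch above (C x 2048)
z = torch.randn(1, 2048, 14, 14)  # first feature image

z_flat = z.flatten(2)          # straighten: (1, 2048, 196)
A = torch.matmul(W, z_flat)    # (1, C, 196); row c is W_c . z_i
A = A.view(1, C, 14, 14)       # second feature image / multi-label activation map
print(A.shape)                 # torch.Size([1, 80, 14, 14])
```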
It should be noted that the parameters of the convolutional layer and the parameters of the fully-connected layer in fig. 1 are optimized during training based on a metric learning algorithm.
Step S204, performing pooling processing on the second feature image to obtain a second label classification prediction result.
As a possible embodiment, in the metric learning network, referring to fig. 1, the optimized second feature image is processed by a pooling layer based on a global max pooling function to obtain the second label classification prediction result.
Step S205, obtaining a target label classification prediction result according to the first label classification prediction result and the second label classification prediction result.
In a possible embodiment, the sum of the first and second label classification prediction results is taken as the target label classification prediction result, so that the second label classification prediction result obtained by the metric learning network corrects the first label classification prediction result obtained by the main network, helping to enhance the accuracy of multi-label classification and recognition.
Specifically, the first and second label classification prediction results may be expressed as confidences. Referring to fig. 1, the target label classification prediction result can be expressed as:

y^cls = y_p^cls + y_s^cls (1)

where y_p^cls represents the first label classification prediction result and y_s^cls represents the second label classification prediction result.
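A short sketch of steps S204 and S205 under the same assumptions (the tensor names are illustrative):

```python
import torch
import torch.nn.functional as F

A = torch.randn(1, 80, 14, 14)   # second feature image
y_p = torch.randn(1, 80)         # first label prediction from the main network

y_s = F.adaptive_max_pool2d(A, 1).flatten(1)  # global max pooling -> (1, 80)
y_cls = y_p + y_s                # target label classification prediction, eq. (1)
probs = torch.sigmoid(y_cls)     # per-label probabilities (see the loss below)
```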
The method performs label classification from two aspects: a second feature map carrying the relationships between labels is obtained based on a metric learning algorithm, and the first label classification prediction result is corrected by the second label classification prediction result obtained from this second feature map, thereby reducing the number of label combinations and helping to improve the precision of multi-label image recognition.
In a possible embodiment, the method further comprises:
(a1) determining a first loss function according to the target label classification prediction result.
Specifically, the first loss function may be expressed as:

L_cls = −(1/N) Σ_{i=1}^{N} Σ_{c=1}^{C} [ y_i^c · log(p̂_i^c) + (1 − y_i^c) · log(1 − p̂_i^c) ] (2)

where y_i^c is an identifier, taking the value 0 or 1, that indicates whether the i-th image in the image batch contains the c-th label: y_i^c = 1 if the i-th image contains the c-th label, and y_i^c = 0 otherwise; ŷ_i^c denotes the prediction confidence corresponding to the c-th label of the i-th image in the image batch (i.e., the target label classification prediction result for the c-th label of the i-th image); and p̂_i^c denotes the corresponding prediction probability, obtained by passing the confidence ŷ_i^c through a sigmoid function.
(a2) determining a final loss function according to a preset metric learning loss function and the first loss function.
In a possible embodiment, the final loss function is the sum of the metric learning loss function and the first loss function, which may be expressed as:

L = L_cls + α · L_dis (3)

where α is a hyper-parameter with value range α ∈ [0, 1].
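A hedged sketch of equations (2) and (3), assuming PyTorch; the `metric_loss` argument stands in for the L_dis term, one possible form of which is sketched after the description of the metric learning loss below.

```python
import torch
import torch.nn.functional as F

def final_loss(y_cls: torch.Tensor, targets: torch.Tensor,
               metric_loss: torch.Tensor, alpha: float = 0.5) -> torch.Tensor:
    # y_cls: (N, C) confidences; targets: (N, C) 0/1 label identifiers y_i^c.
    # binary_cross_entropy_with_logits applies the sigmoid internally.
    l_cls = F.binary_cross_entropy_with_logits(y_cls, targets.float())
    return l_cls + alpha * metric_loss   # L = L_cls + alpha * L_dis
```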
In a possible embodiment, the metric learning loss function is set based on a metric learning algorithm, and is a function of distance.
Metric learning learns the similarity of samples through a distance function. For example, to compare two images, the goal of metric learning is a measure under which the distance between similar images (e.g., images of the same category) is small and the distance between dissimilar images is large.
In this embodiment, the relationships between the labels are modeled based on the metric learning algorithm, and the network model shown in fig. 1 is then trained. The training process is described below.
The method further comprises the following step: in the training process, the parameters of the fully-connected layer and the parameters of the convolutional layer are optimized based on the metric learning loss function and the correlations between the labels.
Specifically, the purpose of optimizing the parameters of the fully-connected layer and of the convolutional layer is to improve the second feature map so that it carries the relationships between the labels, thereby modeling those relationships; the second feature map is then used to constrain the first classification prediction result and help improve the final classification result.
In a possible embodiment, a metric learning algorithm is used, and the correlations between the labels are embodied by the distances between the sub-feature maps corresponding to the labels, thereby realizing the relational modeling. On this basis, optimizing the parameters of the fully-connected layer and the parameters of the convolutional layer specifically includes:
(b1) mapping the sub-feature image corresponding to each label into a preset space based on a metric learning algorithm, and calculating the distances between the sub-feature images in the preset space.
Specifically, using the metric learning algorithm requires determining a matrix M under which the sub-feature image (i.e., the activation map) corresponding to each label is mapped into the preset space T. Let A_j and A_k denote the sub-feature images corresponding to any two different labels, and let a_j and a_k denote the corresponding one-dimensional vectors after straightening, where j ≠ k. The distance between any two sub-feature images can then be expressed as:

d_M(a_j, a_k) = √( (a_j − a_k)^T · M · (a_j − a_k) ) (4)

Because d_M represents a distance, it satisfies non-negativity, symmetry, and the triangle inequality, so M is a positive semi-definite matrix and can be decomposed as:

M = B^T B (5)

Thus, according to equation (5), equation (4) can be rewritten as:

d(a_j, a_k) = ‖B · a_j − B · a_k‖_2 (6)

where B can be simulated and computed by a neural network. Therefore, referring to fig. 1, the sub-feature image corresponding to each label can be mapped into the preset space T through the matrix B, and the distance between the sub-feature images in the preset space, i.e., the Euclidean distance, can be calculated.
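A minimal sketch of step (b1) under the same PyTorch assumptions; representing B by a single bias-free linear layer, and the 64-dimensional output size, are assumptions.

```python
import torch
import torch.nn as nn

dim_a = 14 * 14                        # length of a straightened activation map a_j
B = nn.Linear(dim_a, 64, bias=False)   # the matrix B, simulated by a linear layer

A = torch.randn(80, 14, 14)            # one activation map per label
a = A.flatten(1)                       # straightened vectors: (80, 196)
a_prime = B(a)                         # mapped into the preset space T
dists = torch.cdist(a_prime, a_prime)  # d(a_j, a_k) = ||B a_j - B a_k||_2
```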
(b2) optimizing the parameters of the fully-connected layer and the parameters of the convolutional layer by using a back-propagation algorithm, based on the metric learning loss function and the correlations between the labels, so as to adjust the distances of the sub-feature images in the preset space.
To embody the correlations between labels through distance, in the preset space T the sub-feature images corresponding to strongly correlated labels are drawn closer together, while the sub-feature images corresponding to weakly or even negatively correlated labels are pushed apart, so that strongly correlated labels cluster together. In this embodiment it is assumed that the labels appearing in the same image to be detected have correlation: the distances between the sub-feature images corresponding to the labels of that image are reduced, while the distances to the sub-feature images of labels not belonging to that image are increased. In this way, training on every image in the training set models the relationships between the labels.
Based on this, step (b2) includes: among the sub-feature images corresponding to the labels, taking the sub-feature image corresponding to a label that belongs to the currently input image as a correlated image, and taking the sub-feature image corresponding to a label that does not belong to the currently input image as an uncorrelated image; reducing the distances between the correlated images in the preset space, and pushing the uncorrelated images away so that they are far from the correlated images in the preset space.
The labels corresponding to each input image are known and can be annotated during training. Thus, if the labels in the training set include puppy, kitten, sky, and white cloud, and the input image contains only sky and white cloud, then the sub-feature images corresponding to sky and white cloud are taken as correlated images and the distance between them in the preset space is reduced, while the sub-feature images corresponding to puppy and kitten are taken as uncorrelated images and their distance in the preset space is increased, so that they move away from the correlated images, i.e., from the sub-feature images corresponding to sky and white cloud.
After the metric learning step, back-propagation is performed based on the final loss function to optimize the parameters of the fully-connected layer and of the convolutional layer. To achieve this distance adjustment, a metric learning loss function is constructed in the specific implementation, and a back-propagation algorithm is used to optimize the final loss function, in particular its metric learning term, so as to adjust the distance of each label's sub-feature image in the preset space T and thereby improve the second feature map so that it carries the correlations between the labels.
On this basis, the constructed metric learning loss function should embody the following: under the assumption that the labels in one image have correlation, the closer the sub-feature images corresponding to labels of the same image are to each other, and the farther the sub-feature images of labels not belonging to that image are from them, the smaller the metric learning loss computed after metric learning.
In a possible embodiment, the metric learning loss function is expressed as equation (7), in which

a′ = f_flat(f_B(A)) (8)

where f_B(A) represents the sub-feature image in the preset space T obtained after the sub-feature image A corresponding to a label is transformed by the matrix B, and a′ represents the one-dimensional feature vector obtained by straightening f_B(A). N represents the number of images (as images to be detected) in the image batch (a sample set drawn from the training set) input into the main network at a time; i indexes the images in the batch; j indexes the sub-feature images corresponding to labels belonging to the i-th image; and k indexes the sub-feature images corresponding to training-set labels that do not belong to the i-th image. S represents the set of sub-feature images corresponding to labels belonging to the i-th image, and C′ is the size of the set S, i.e., the number of sub-feature images corresponding to labels of the i-th image; S̄ represents the set of sub-feature images corresponding to training-set labels that do not belong to the i-th image.
Using the metric learning algorithm, the parameters of the network model are optimized by back-propagating the final loss function, in particular its metric learning term; the distances in the preset space T between the sub-feature images, i.e., each label's activation map, are thereby adjusted and the second feature image, i.e., the multi-label activation map, is improved. As a result, during testing the activation maps corresponding to correlated labels can be activated simultaneously, i.e., the activation maps corresponding to labels belonging to the same image move closer to each other in the preset space T.
Therefore, the multi-label activation map is optimized based on the correlations between the labels so that it contains the information of these correlations, realizing the modeling of the relationships between labels; the multi-label activation map is then used as an applied constraint to improve classification accuracy. For example, suppose the labels of one image include moon, sky, and white cloud, but the moon occupies so small a portion of the image that the neural network cannot capture it well. After the metric learning algorithm is used, the distances between the activation maps corresponding to the moon, sky, and white cloud labels are reduced, so the neural network treats moon, white cloud, and sky as correlated and activates the moon label's activation map in the multi-label activation map, yielding the optimized second feature image, i.e., the second feature image carrying the relationships between labels.
In the embodiments provided by the invention, on one hand, a first feature image is obtained using the convolutional layer, and the first feature image is then classified using the pooling layer and the fully-connected layer to obtain a first label classification prediction result. On the other hand, feature filtering is performed on the first feature image according to the parameters of the fully-connected layer to obtain a second feature image; the second feature image is then pooled to obtain a second label classification prediction result. Finally, the first and second label classification prediction results are jointly considered to obtain the target label classification prediction result. The method performs label classification from two aspects: a second feature map carrying the relationships between labels is obtained based on a metric learning algorithm, and the first label classification prediction result is corrected by the second label classification prediction result obtained from this second feature map, thereby reducing the number of label combinations and helping to improve the precision of multi-label image recognition.
To show the beneficial effects of the multi-label image classification method of the embodiment of the present invention more intuitively, its multi-label classification precision on MS-COCO, the current authoritative large-scale image dataset in the industry, is compared with existing methods, as shown in Table 1:
TABLE 1
To better evaluate the effectiveness of the method, thirteen indexes are provided in Table 1 as measurement standards, including OP (overall precision), OR (overall recall), OF1 (overall F1), CP (per-class precision), CR (per-class recall), CF1 (per-class F1), and mAP (mean average precision); in addition, CP/top3 denotes CP calculated from the three categories with the best prediction results, and likewise CR/top3, CF1/top3, OP/top3, OR/top3, and OF1/top3 denote CR, CF1, OP, OR, and OF1 calculated from the three categories with the best prediction results. For every index in Table 1, larger is better. The calculation formula for each index is as follows.
where D represents the number of classes to be predicted, e is the class index, N_e^c denotes the number of images of the e-th class predicted correctly, N_e^p denotes the number of images predicted to be of the e-th class, and N_e^g denotes the number of all images of the e-th class. The indexes are then computed as:

CP = (1/D) Σ_e N_e^c / N_e^p, CR = (1/D) Σ_e N_e^c / N_e^g,
OP = Σ_e N_e^c / Σ_e N_e^p, OR = Σ_e N_e^c / Σ_e N_e^g,
CF1 = 2 · CP · CR / (CP + CR), OF1 = 2 · OP · OR / (OP + OR).
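The per-class and overall indexes defined above can be computed as in the following sketch (NumPy assumed; the top-3 variants would apply the same formulas after restricting the predictions to the three categories with the best prediction results).

```python
import numpy as np

def multilabel_metrics(pred: np.ndarray, gt: np.ndarray, eps: float = 1e-9):
    # pred, gt: (num_images, D) 0/1 arrays of predictions and ground truth.
    n_c = (pred * gt).sum(axis=0)        # correctly predicted, per class
    n_p = pred.sum(axis=0)               # predicted, per class
    n_g = gt.sum(axis=0)                 # ground truth, per class
    cp = np.mean(n_c / (n_p + eps))      # per-class precision
    cr = np.mean(n_c / (n_g + eps))      # per-class recall
    op = n_c.sum() / (n_p.sum() + eps)   # overall precision
    orec = n_c.sum() / (n_g.sum() + eps) # overall recall
    cf1 = 2 * cp * cr / (cp + cr + eps)
    of1 = 2 * op * orec / (op + orec + eps)
    return dict(CP=cp, CR=cr, CF1=cf1, OP=op, OR=orec, OF1=of1)
```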
In addition, WARP is from the paper "Deep convolutional ranking for multilabel image annotation"; CNN-RNN (Convolutional Neural Network-Recurrent Neural Network) is from the paper "CNN-RNN: a unified framework for multi-label image classification"; RLSD is from the paper "Multi-label image classification with regional latent semantic dependencies"; RNN-Attention is from the paper "Multi-label image recognition by recurrently discovering attentional regions"; RNN-Reinforcement is from the paper "Recurrent attentional reinforcement learning for multi-label image recognition"; Order-Free RNN is from the paper "Order-free RNN with visual attention for multi-label classification"; SRN is from the paper "Learning spatial regularization with image-level supervisions for multi-label image classification"; Multi-Evidence is from the paper "Multi-evidence filtering and fusion for multi-label classification, object detection and semantic segmentation based on weakly supervised learning". Table 1 was obtained by comparing the method in the present example with the method in each paper.
Among these, OF1 and CF1 are important indicators, and mAP is the most important indicator. It can be seen intuitively from Table 1 that the OF1, CF1, and mAP values obtained by the multi-label image classification method provided by the embodiment of the present invention are all the largest compared with the results of the prior-art methods, so the method of the embodiment of the present invention can effectively improve the precision of multi-label image classification compared with the prior art.
Corresponding to the multi-label image classification method of the first embodiment, fig. 3 shows a multi-label image classification apparatus that adopts that method in one-to-one correspondence. The multi-label image classification apparatus comprises:
a first extraction module 11, configured to extract a first feature image of an image to be processed;
a first prediction module 12, configured to process the first feature image sequentially through a pooling layer and a fully-connected layer to obtain a first label classification prediction result;
a second extraction module 13, configured to obtain a second feature image according to the parameters of the fully-connected layer and the first feature image, where the second feature image includes a sub-feature image corresponding to each label of a preset category;
a second prediction module 14, configured to perform pooling processing on the second feature image to obtain a second label classification prediction result;
and a target prediction module 15, configured to obtain a target label classification prediction result according to the first label classification prediction result and the second label classification prediction result.
Further, referring to fig. 4, the apparatus includes a function determining module 16, configured to:
determine a first loss function according to the target label classification prediction result;
determine a final loss function according to a preset metric learning loss function and the first loss function;
where the metric learning loss function is set based on a metric learning algorithm.
Further, a model training module 17 is included, configured to:
optimize, during training, the parameters of the fully-connected layer and the parameters of the convolutional layer based on the metric learning loss function and the correlations between the labels.
Further, the model training module 17 is further configured to:
map the sub-feature image corresponding to each label into a preset space based on a metric learning algorithm, and calculate the distances between the sub-feature images in the preset space;
and optimize the parameters of the fully-connected layer and the parameters of the convolutional layer by using a back-propagation algorithm, based on the metric learning loss function and the correlations between the labels, so as to adjust the distances of the sub-feature images in the preset space.
Further, the model training module 17 is further configured to:
among the sub-feature images corresponding to the labels, take the sub-feature image corresponding to a label belonging to the currently input image as a correlated image and the sub-feature image corresponding to a label not belonging to the currently input image as an uncorrelated image; and reduce the distances between the correlated images in the preset space while pushing the uncorrelated images away so that they are far from the correlated images in the preset space.
The apparatus performs label classification from two aspects: a second feature map carrying the relationships between labels is obtained based on a metric learning algorithm, and the first label classification prediction result is corrected by the second label classification prediction result obtained from this second feature map, thereby reducing the number of label combinations and helping to improve the precision of multi-label image recognition.
Referring to fig. 5, an embodiment of the present invention further provides an electronic device 100, comprising: a processor 40, a memory 41, a bus 42, and a communication interface 43, where the processor 40, the communication interface 43, and the memory 41 are connected through the bus 42; the processor 40 is configured to execute executable modules, such as computer programs, stored in the memory 41.
The memory 41 may include a high-speed Random Access Memory (RAM) and may also include non-volatile memory, such as at least one disk memory. The communication connection between this system's network element and at least one other network element is realized through at least one communication interface 43 (which may be wired or wireless), and the Internet, a wide area network, a local area network, a metropolitan area network, etc. may be used.
The bus 42 may be an ISA bus, PCI bus, EISA bus, or the like. The bus may be divided into an address bus, a data bus, a control bus, etc. For ease of illustration, only one double-headed arrow is shown in FIG. 5, but this does not indicate only one bus or one type of bus.
The memory 41 is used to store a program; after receiving an execution instruction, the processor 40 executes the program, and the method executed by the apparatus defined by the flow disclosed in any of the foregoing embodiments of the present invention may be applied to, or implemented by, the processor 40.
The processor 40 may be an integrated circuit chip having signal processing capability. In implementation, the steps of the above method may be performed by integrated hardware logic circuits or software-form instructions in the processor 40. The processor 40 may be a general-purpose processor, including a Central Processing Unit (CPU), a Network Processor (NP), and the like; it may also be a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field-Programmable Gate Array (FPGA), or another programmable logic device, discrete gate or transistor logic device, or discrete hardware component, which can implement or perform the methods, steps, and logic blocks disclosed in the embodiments of the present invention. A general-purpose processor may be a microprocessor, or the processor may be any conventional processor. The steps of the method disclosed in connection with the embodiments of the present invention may be performed directly by a hardware decoding processor, or by a combination of hardware and software modules in a decoding processor. The software module may be located in a storage medium well known in the art, such as RAM, flash memory, ROM, PROM, EPROM, or a register. The storage medium is located in the memory 41, and the processor 40 reads the information in the memory 41 and completes the steps of the above method in combination with its hardware.
The multi-label image classification device and the electronic equipment provided by the embodiment of the invention have the same technical characteristics as the multi-label image classification method provided by the embodiment, so that the same technical problems can be solved, and the same technical effects can be achieved.
The computer program product for performing the multi-label image classification method provided in the embodiment of the present invention includes a computer-readable storage medium storing a non-volatile program code executable by a processor, where instructions included in the program code may be used to execute the method described in the foregoing method embodiment, and specific implementation may refer to the method embodiment, which is not described herein again.
It can be clearly understood by those skilled in the art that, for convenience and brevity of description, the specific working processes of the apparatus and the electronic device described above may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again.
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of methods and computer program products according to various embodiments of the present invention. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
Furthermore, the terms "first," "second," and "third" are used for descriptive purposes only and are not to be construed as indicating or implying relative importance. Unless specifically stated otherwise, the relative steps, numerical expressions, and values of the components and steps set forth in these embodiments do not limit the scope of the present invention.
In the several embodiments provided in the present application, it should be understood that the disclosed apparatus and method may be implemented in other ways. The above-described embodiments of the apparatus are merely illustrative, and for example, the division of the units is only one logical division, and there may be other divisions when actually implemented, and for example, a plurality of units or components may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection of devices or units through some communication interfaces, and may be in an electrical, mechanical or other form.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments of the present invention may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit.
The functions, if implemented in the form of software functional units and sold or used as a stand-alone product, may be stored in a non-volatile computer-readable storage medium executable by a processor. Based on such understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present invention. And the aforementioned storage medium includes: a U-disk, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk or an optical disk, and other various media capable of storing program codes.
Finally, it should be noted that the above-mentioned embodiments are only specific embodiments of the present invention, used to illustrate its technical solutions rather than to limit them, and the protection scope of the present invention is not limited thereto. Although the present invention has been described in detail with reference to the foregoing embodiments, those skilled in the art should understand that anyone familiar with the art may still modify the technical solutions described in the foregoing embodiments, readily conceive of changes to them, or make equivalent substitutions for some of their technical features within the technical scope of the present disclosure; such modifications, changes, or substitutions do not depart from the spirit and scope of the embodiments of the present invention and shall be covered by its protection scope. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.
Claims (8)
1. A multi-label image classification method is characterized by comprising the following steps:
extracting a first feature image of an image to be processed;
processing the first feature image sequentially through a pooling layer and a fully-connected layer to obtain a first label classification prediction result;
obtaining a second feature image according to the parameters of the fully-connected layer and the first feature image, wherein the second feature image comprises a sub-feature image corresponding to each label of a preset category, and wherein the parameters of the fully-connected layer and the parameters of the convolutional layer are optimized based on a metric learning algorithm;
performing pooling processing on the second feature image to obtain a second label classification prediction result;
obtaining a target label classification prediction result according to the first label classification prediction result and the second label classification prediction result;
further comprising:
determining a first loss function according to the target label classification prediction result;
determining a final loss function according to a preset metric learning loss function and the first loss function;
wherein the metric learning loss function is set based on a metric learning algorithm;
further comprising:
in the training process, optimizing the parameters of the fully-connected layer and the parameters of the convolutional layer based on the metric learning loss function and the correlation between the labels;
wherein optimizing the parameters of the fully-connected layer and the parameters of the convolutional layer based on the metric learning loss function and the correlation between the labels comprises:
mapping the sub-feature image corresponding to each label into a preset space based on a metric learning algorithm, and calculating the pairwise distances between the sub-feature images in the preset space;
and optimizing the parameters of the fully-connected layer and the parameters of the convolutional layer based on the metric learning loss function and the correlation between the labels by using a back-propagation algorithm, so as to adjust the distances between the sub-feature images in the preset space (see the first sketch following the claims).
2. The method of claim 1, wherein adjusting the distances between the sub-feature images in the preset space comprises:
among the sub-feature images corresponding to the labels, taking the sub-feature images corresponding to labels belonging to the currently input image as correlated images, and taking the sub-feature images corresponding to labels not belonging to the currently input image as non-correlated images;
and drawing the correlated images closer together in the preset space, and pushing the non-correlated images away, so that the non-correlated images are far from the correlated images in the preset space (see the second sketch following the claims).
3. The method according to claim 1, wherein processing the first feature image sequentially through a pooling layer and a fully-connected layer to obtain a first label classification prediction result comprises:
inputting the first feature image into a pooling layer, and performing pooling processing on the first feature image based on a global maximum pooling function to obtain a dimension-reduced first feature image;
and inputting the dimension-reduced first feature image into a fully-connected layer for classification processing to generate the first label classification prediction result (see the third sketch following the claims).
4. The method of claim 1, wherein obtaining the second feature image according to the parameters of the fully-connected layer and the first feature image comprises:
multiplying the first feature image by the parameters of the fully-connected layer to obtain the second feature image (see the fourth sketch following the claims).
5. The method of claim 1, wherein obtaining a target label classification prediction result according to the first label classification prediction result and the second label classification prediction result comprises:
taking the sum of the first label classification prediction result and the second label classification prediction result as the target label classification prediction result.
6. A multi-label image classification apparatus, comprising:
the first extraction module is used for extracting a first feature image of an image to be processed;
the first prediction module is used for processing the first feature image sequentially through a pooling layer and a fully-connected layer to obtain a first label classification prediction result;
the second extraction module is used for obtaining a second feature image according to the parameters of the fully-connected layer and the first feature image, wherein the second feature image comprises a sub-feature image corresponding to each label of a preset category;
the second prediction module is used for performing pooling processing on the second feature image to obtain a second label classification prediction result;
the target prediction module is used for obtaining a target label classification prediction result according to the first label classification prediction result and the second label classification prediction result;
further comprising:
the function determining module is used for determining a first loss function according to the target label classification prediction result;
and for determining a final loss function according to a preset metric learning loss function and the first loss function;
wherein the metric learning loss function is set based on a metric learning algorithm;
further comprising:
the model training module is used for optimizing, in the training process, the parameters of the fully-connected layer and the parameters of the convolutional layer based on the metric learning loss function and the correlation between the labels;
the model training module is further used for: mapping the sub-feature image corresponding to each label into a preset space based on a metric learning algorithm, and calculating the pairwise distances between the sub-feature images in the preset space;
and optimizing the parameters of the fully-connected layer and the parameters of the convolutional layer based on the metric learning loss function and the correlation between the labels by using a back-propagation algorithm, so as to adjust the distances between the sub-feature images in the preset space.
7. An electronic device comprising a memory, a processor, and a computer program stored on the memory and executable on the processor, wherein the processor implements the method of any of claims 1 to 5 when executing the computer program.
8. A computer-readable medium having non-volatile program code executable by a processor, wherein the program code causes the processor to perform the method of any of claims 1 to 5.
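The following sketches illustrate one plausible reading of claims 1 to 4 in PyTorch-style Python; they are illustrative sketches under stated assumptions, not the patented implementation. First, the two-branch pipeline of claim 1: the backbone, the use of global max pooling in both branches, the binary cross-entropy classification loss, and the weighting factor `alpha` are all assumptions not fixed by the claim.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TwoBranchMultiLabelNet(nn.Module):
    """Sketch of the claim 1 pipeline; shapes and pooling choices are assumed."""

    def __init__(self, backbone: nn.Module, feat_channels: int, num_labels: int):
        super().__init__()
        self.backbone = backbone  # convolutional feature extractor
        self.fc = nn.Linear(feat_channels, num_labels, bias=False)  # fully-connected layer

    def forward(self, x):
        feat = self.backbone(x)                                  # first feature image (B, C, H, W)
        pooled = F.adaptive_max_pool2d(feat, 1).flatten(1)       # pooling layer -> (B, C)
        logits1 = self.fc(pooled)                                # first prediction result
        # Second feature image: the FC parameters applied to the first feature
        # image, giving one sub-feature image per label of the preset category.
        sub_maps = torch.einsum("kc,bchw->bkhw", self.fc.weight, feat)  # (B, K, H, W)
        logits2 = F.adaptive_max_pool2d(sub_maps, 1).flatten(1)  # second prediction result
        return logits1 + logits2, sub_maps                       # target result = sum (claim 5)

def final_loss(target_logits, sub_maps, labels, metric_loss_fn, alpha=1.0):
    # First loss from the target prediction result, combined with the preset
    # metric learning loss; the BCE form and the weight alpha are assumptions.
    cls_loss = F.binary_cross_entropy_with_logits(target_logits, labels.float())
    return cls_loss + alpha * metric_loss_fn(sub_maps, labels)
```

Back-propagating `final_loss` during training updates both the convolutional and the fully-connected parameters, which is how the claim ties the metric learning loss to both sets of parameters.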
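Second, one plausible metric learning loss for claim 2. The embedding of each sub-feature image (flattening its spatial grid into a point of the preset space), the Euclidean distance, and the contrastive margin form are assumptions; the claim only requires that correlated images be drawn together and non-correlated images pushed away.

```python
import torch
import torch.nn.functional as F

def contrastive_subfeature_loss(sub_maps: torch.Tensor, labels: torch.Tensor,
                                margin: float = 1.0) -> torch.Tensor:
    """sub_maps: (B, K, H, W) sub-feature images; labels: (B, K) multi-hot targets."""
    emb = sub_maps.flatten(2)             # map each sub-feature image to a point: (B, K, H*W)
    loss = sub_maps.new_zeros(())
    for b in range(sub_maps.size(0)):
        rel = labels[b].bool()
        pos = emb[b][rel]                 # correlated images: labels of the current input
        neg = emb[b][~rel]                # non-correlated images
        if pos.size(0) > 1:               # draw correlated images closer together
            loss = loss + torch.cdist(pos, pos).mean()
        if pos.size(0) > 0 and neg.size(0) > 0:
            d = torch.cdist(pos, neg)     # push non-correlated images beyond the margin
            loss = loss + F.relu(margin - d).mean()
    return loss / sub_maps.size(0)
```

Minimizing the first term shrinks the distances among correlated images; minimizing the hinge term grows the distance from each correlated image to every non-correlated one, matching the pull/push behavior recited in the claim.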
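Third, the first branch of claim 3 in isolation, showing the dimension reduction explicitly; the tensor shapes and the 20-label preset category are assumed values for illustration.

```python
import torch
import torch.nn.functional as F

feat = torch.randn(2, 512, 14, 14)        # first feature image (B, C, H, W); shapes assumed
reduced = F.adaptive_max_pool2d(feat, 1)  # global maximum pooling -> (B, C, 1, 1)
reduced = reduced.flatten(1)              # dimension-reduced first feature image (B, C)
fc = torch.nn.Linear(512, 20)             # 20 preset label categories (assumed)
logits1 = fc(reduced)                     # first label classification prediction result
```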
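Fourth, the multiplication of claim 4. One natural reading applies the fully-connected weight matrix at every spatial position of the first feature image, which is exactly a 1x1 convolution whose kernels are the FC weights; the shapes are assumed. The resulting per-label maps play the role of class activation maps, consistent with the Zhou et al. CVPR 2016 paper cited among the non-patent references.

```python
import torch
import torch.nn.functional as F

B, C, H, W, K = 2, 512, 14, 14, 20                   # illustrative shapes (assumed)
feat = torch.randn(B, C, H, W)                       # first feature image
fc_weight = torch.randn(K, C)                        # parameters of the fully-connected layer
# Multiplying the FC parameters into the feature image at each spatial position
# is equivalent to a 1x1 convolution with the FC weights as kernels.
second = F.conv2d(feat, fc_weight.view(K, C, 1, 1))  # second feature image (B, K, H, W)
# second[:, k] is the sub-feature image for the k-th label of the preset category.
```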
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810802045.9A CN109086811B (en) | 2018-07-19 | 2018-07-19 | Multi-label image classification method and device and electronic equipment |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810802045.9A CN109086811B (en) | 2018-07-19 | 2018-07-19 | Multi-label image classification method and device and electronic equipment |
Publications (2)
Publication Number | Publication Date |
---|---|
CN109086811A CN109086811A (en) | 2018-12-25 |
CN109086811B true CN109086811B (en) | 2021-06-22 |
Family
ID=64838288
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201810802045.9A Active CN109086811B (en) | 2018-07-19 | 2018-07-19 | Multi-label image classification method and device and electronic equipment |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN109086811B (en) |
Families Citing this family (13)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111797660A (en) * | 2019-04-09 | 2020-10-20 | Oppo广东移动通信有限公司 | Image labeling method and device, storage medium and electronic equipment |
CN110188791B (en) * | 2019-04-18 | 2023-07-07 | 南开大学 | Visual emotion label distribution prediction method based on automatic estimation |
CN110111340B (en) * | 2019-04-28 | 2021-05-14 | 南开大学 | Weak supervision example segmentation method based on multi-path segmentation |
CN110322021B (en) * | 2019-06-14 | 2021-03-30 | 清华大学 | Hyper-parameter optimization method and device for large-scale network representation learning |
CN110602527B (en) | 2019-09-12 | 2022-04-08 | 北京小米移动软件有限公司 | Video processing method, device and storage medium |
CN110705425B (en) * | 2019-09-25 | 2022-06-28 | 广州西思数字科技有限公司 | Tongue picture multi-label classification method based on graph convolution network |
CN110689081B (en) * | 2019-09-30 | 2020-08-21 | 中国科学院大学 | Weak supervision target classification and positioning method based on bifurcation learning |
CN111046949A (en) * | 2019-12-10 | 2020-04-21 | 东软集团股份有限公司 | Image classification method, device and equipment |
CN112948631A (en) * | 2019-12-11 | 2021-06-11 | 北京金山云网络技术有限公司 | Video tag generation method and device and electronic terminal |
CN111493829A (en) * | 2020-04-23 | 2020-08-07 | 四川大学华西医院 | Method, system and equipment for determining mild cognitive impairment recognition parameters |
CN111694954B (en) * | 2020-04-28 | 2023-12-08 | 北京旷视科技有限公司 | Image classification method and device and electronic equipment |
CN113159195B (en) * | 2021-04-26 | 2024-08-02 | 深圳市大数据研究院 | Ultrasonic image classification method, system, electronic device and storage medium |
CN113537339B (en) * | 2021-07-14 | 2023-06-02 | 中国地质大学(北京) | Method and system for identifying symbiotic or associated minerals based on multi-label image classification |
Citations (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107886073A (en) * | 2017-11-10 | 2018-04-06 | 重庆邮电大学 | Fine-grained vehicle multi-attribute recognition method based on convolutional neural networks |
Family Cites Families (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106897390B * | 2017-01-24 | 2019-10-15 | 北京大学 | Accurate target retrieval method based on deep metric learning |
CN107145904A (en) * | 2017-04-28 | 2017-09-08 | 北京小米移动软件有限公司 | Image category determination method, device, and storage medium |
CN108133233A (en) * | 2017-12-18 | 2018-06-08 | 中山大学 | Multi-label image recognition method and device |
2018-07-19: CN CN201810802045.9A patent/CN109086811B/en, status: Active
Patent Citations (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107886073A (en) * | 2017-11-10 | 2018-04-06 | 重庆邮电大学 | Fine-grained vehicle multi-attribute recognition method based on convolutional neural networks |
Non-Patent Citations (2)
Title |
---|
Learning Deep Features for Discriminative Localization; Bolei Zhou et al.; 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR); 2016-12-12; pp. 2921-2928 *
Learning Spatial Regularization with Image-level Supervisions for Multi-label Image Classification; Feng Zhu et al.; 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR); 2017-11-09; pp. 2027-2034 *
Also Published As
Publication number | Publication date |
---|---|
CN109086811A (en) | 2018-12-25 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN109086811B (en) | Multi-label image classification method and device and electronic equipment | |
CN108764292B (en) | Deep learning image target mapping and positioning method based on weak supervision information | |
WO2018108129A1 (en) | Method and apparatus for use in identifying object type, and electronic device | |
CN110008956B (en) | Invoice key information positioning method, invoice key information positioning device, computer equipment and storage medium | |
CN110210513B (en) | Data classification method and device and terminal equipment | |
CN111860398B (en) | Remote sensing image target detection method and system and terminal equipment | |
US20240257423A1 (en) | Image processing method and apparatus, and computer readable storage medium | |
CN108564102A (en) | Image clustering evaluation of result method and apparatus | |
CN111680678A (en) | Target area identification method, device, equipment and readable storage medium | |
CN112365497A (en) | High-speed target detection method and system based on Trident Net and Cascade-RCNN structures | |
CN113487610B (en) | Herpes image recognition method and device, computer equipment and storage medium | |
CN112991280B (en) | Visual detection method, visual detection system and electronic equipment | |
CN116310850B (en) | Remote sensing image target detection method based on improved RetinaNet | |
CN115345905A (en) | Target object tracking method, device, terminal and storage medium | |
CN114170558B (en) | Method, system, apparatus, medium, and article for video processing | |
CN110210314B (en) | Face detection method, device, computer equipment and storage medium | |
CN112991281B (en) | Visual detection method, system, electronic equipment and medium | |
CN117710728A (en) | SAR image target recognition method, SAR image target recognition device, SAR image target recognition computer equipment and storage medium | |
CN111582057B (en) | Face verification method based on local receptive field | |
CN115482436B (en) | Training method and device for image screening model and image screening method | |
CN113569600A (en) | Method and device for identifying weight of object, electronic equipment and storage medium | |
CN111104965A (en) | Vehicle target identification method and device | |
CN115862119A (en) | Human face age estimation method and device based on attention mechanism | |
WO2022237065A1 (en) | Classification model training method, video classification method, and related device | |
CN114330542A (en) | Sample mining method and device based on target detection and storage medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||