CN113902948A - Fine-grained image classification method and system based on double-branch network - Google Patents
- Publication number: CN113902948A
- Application number: CN202111175746.2A
- Authority: CN (China)
- Prior art keywords: maximum, classified, double, target image, image
- Legal status: Pending (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Classifications
- G06F18/2415: Pattern recognition; classification techniques relating to the classification model, based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
- G06F18/2431: Pattern recognition; classification techniques relating to the number of classes; multiple classes
- G06F18/254: Pattern recognition; fusion techniques of classification results, e.g. of results related to same input data
- G06N3/045: Computing arrangements based on biological models; neural networks; combinations of networks
- G06N3/08: Computing arrangements based on biological models; neural networks; learning methods
Abstract
The invention discloses a fine-grained image classification method based on a double-branch network, comprising the following steps: preprocessing a target image to be classified; inputting the preprocessed target image into a pre-trained non-maximum activation double-branch network and extracting the features of the target image to be classified; performing class prediction on the extracted features with a classifier to obtain class prediction results; and fusing the class prediction results with a preset fusion method to obtain the classification result of the target image to be classified. The method addresses two problems of the prior art: convolutional neural network methods cannot be extended to fine-grained image classification tasks based on Transformer architecture networks, and the attention mechanism extracts target features insufficiently.
Description
Technical Field
The invention relates to a fine-grained image classification method and system based on a double-branch network, and belongs to the technical field of computer vision.
Background
Fine-grained image classification belongs to the field of image classification tasks. It differs from common classification tasks in that it aims at distinguishing subclasses within a large class, for example different types of cars, birds, or airplanes. Such targets exhibit large intra-class differences and small inter-class differences, so the key to classification lies in extracting fine features of the targets. At the present stage, fine-grained image classification methods mainly use an attention mechanism to perform maximum activation on the target and obtain effective local discriminant features, but they lack extraction of non-maximum activation features. On the other hand, most existing fine-grained classification methods extract target features based on convolutional neural networks and lack consideration of designing the network framework and objective function from the perspective of the Transformer architecture; existing convolutional neural network methods are therefore difficult to extend directly to fine-grained image classification methods based on Transformer architecture networks.
Disclosure of Invention
The invention aims to overcome the defects in the prior art, provides a fine-grained image classification method and a fine-grained image classification system based on a double-branch network, and can solve the problems that a convolutional neural network method in the prior art cannot be expanded to a fine-grained image classification task based on a Transformer architecture network, and the target features are not extracted sufficiently by an attention mechanism. In order to achieve the purpose, the invention is realized by adopting the following technical scheme:
in a first aspect, the present invention provides a fine-grained image classification method based on a dual-branch network, including:
preprocessing a target image to be classified;
inputting the preprocessed target image into a pre-trained non-maximum activated double-branch network, and extracting to obtain the characteristics of the target image to be classified; the non-maximum activation double-branch network comprises a non-maximum activation module and a homogeneous double-branch subnetwork, wherein the non-maximum activation module is used for outputting maximum activation characteristics and non-maximum activation characteristics, the output characteristics are input into the homogeneous double-branch subnetwork, the homogeneous double-branch subnetwork learns and outputs target image characteristics to be classified;
based on the obtained target image features to be classified, adopting a classifier to perform class prediction to obtain class prediction results of the target image features to be classified;
and fusing the category prediction results by adopting a preset fusion method to obtain the classification result of the target image to be classified.
With reference to the first aspect, further, the preprocessing the target image to be classified includes:
the target image to be classified is scaled to 600 × 600 pixels, and a 448 × 448 pixel image area centered on the image center is cropped.
With reference to the first aspect, further, the non-maximum-value-activated dual-branch network includes an image preprocessing module, a backbone network feature extraction module, a non-maximum-value-activated module, and a homogeneous dual-branch subnetwork.
With reference to the first aspect, further, the non-max-activated dual-branch network is trained by:
inputting an image for training into the non-maximum activation double-branch network, and, under the guidance of a preset target loss function, training the parameters of the non-maximum activation double-branch network with a stochastic gradient descent algorithm to obtain the optimal network parameters.
With reference to the first aspect, preferably, the image for training is obtained as follows: the image is scaled to 600 × 600 pixels, a 448 × 448 pixel region is randomly cropped, and the image is flipped horizontally at random with probability 0.5.
With reference to the first aspect, further, the preset target loss function is:
$$L = L_{CE} + \lambda L_{DB} \tag{1}$$

In formula (1), $L$ denotes the objective loss function; $\lambda$ denotes the weight between $L_{CE}$ and $L_{DB}$; $L_{CE}$ denotes the cross-entropy classification objective loss function, expressed as:

$$L_{CE} = -\frac{1}{B}\sum_{b=1}^{B}\log\frac{\exp\left(W_{y_b}^{\top} f_b\right)}{\sum_{j=1}^{C}\exp\left(W_j^{\top} f_b\right)} \tag{2}$$

In formula (2), $B$ denotes the number of input images; $f_b$ denotes the feature of the $b$-th image; $y_b \in \{1, 2, \dots, C\}$ denotes the true label of the $b$-th image; $W_{y_b}$ denotes the weight parameter mapping the feature $f_b$ to the true label class $y_b$; $W_j$ denotes the weight parameter mapping the feature $f_b$ to the $j$-th class; and $C$ is the total number of classes.

In formula (1), $L_{DB}$ denotes the similarity-measure objective loss function, expressed as:

$$L_{DB} = \frac{1}{B}\sum_{b=1}^{B}\left(\mathbf{1}^{\top} S^{b}\,\mathbf{1} - \mathbf{1}^{\top}\operatorname{diag}\!\left(S^{b}\right)\right) \tag{3}$$

In formula (3), $S^{b} = \tilde{f}^{b}\,(\tilde{f}^{b})^{\top}$ denotes the similarity matrix between the features of the different branches; $\tilde{f}^{b}$ denotes the concatenated and normalized features $f_{1}^{b}$ and $f_{2}^{b}$ of the $b$-th image; $(\tilde{f}^{b})^{\top}$ denotes the transposed feature; $\operatorname{diag}(S^{b})$ denotes the vector of main-diagonal values of $S^{b}$; and $B$ denotes the number of input images.
With reference to the first aspect, further, the calculation process of the non-max activation module includes:
$$F_m, F_n = \mathrm{NAM}(F) \tag{4}$$

In formula (4), $\mathrm{NAM}(\cdot)$ denotes the module calculation process; $F \in \mathbb{R}^{B\times L\times C}$ denotes the feature input to the non-maximum activation module, where $B$ denotes the number of input images, $L$ denotes the feature dimension, and $C$ denotes the number of feature channels; $F_m \in \mathbb{R}^{B\times L\times C}$ denotes the maximum activation feature output by the module; and $F_n \in \mathbb{R}^{B\times L\times C}$ denotes the non-maximum activation feature output by the module.
With reference to the first aspect, further, the calculation process of the maximum activation feature $F_m$ output by the non-maximum activation module is as follows:

the feature $F$ input to the non-maximum activation module is equally divided into $k$ groups along the 2nd dimension, where the feature dimension $L$ must be an integral multiple of the group number $k$, giving the $i$-th group of features $F^{i} \in \mathbb{R}^{B\times (L/k)\times C}$; the group weights are then normalized:

$$\bar{A}^{i}(b,l) = \frac{\exp\left(A^{i}(b,l)\right)}{\sum_{j=1}^{k}\exp\left(A^{j}(b,l)\right)} \tag{5}$$

In formula (5), $\bar{A}^{i}(b,l)$ denotes the weight of the $l$-th dimension of the $b$-th image in the $i$-th group, the denominator performing weight normalization over the $k$ blocks; $A^{i}(b,l)$ denotes the element at position $(b,l)$ of $A^{i}$; and $A^{i}$ denotes the $i$-th group of feature weight vectors, expressed as:

$$A^{i} = \sigma\left(\mathrm{Conv}\left(\mathrm{GAP}\left(F^{i}\right)\right)\right) \tag{6}$$

In formula (6), $\mathrm{GAP}(\cdot)$ denotes a global average pooling operation, $\mathrm{Conv}(\cdot)$ denotes a convolution operation, and $\sigma(\cdot)$ denotes a ReLU activation operation;

the weight matrix $\bar{A}^{i}$ is expanded along the channel dimension and applied to the feature $F^{i}$, giving the weighted feature $F_{m}^{i} = \bar{A}^{i} \odot F^{i}$.
With reference to the first aspect, further, the calculation process of the non-maximum activation feature $F_n$ output by the non-maximum activation module is as follows:

a maximum suppression operation is performed on each group of features:

$$\hat{A}^{i} = \mathrm{Rank}_{\alpha,\beta}\left(\bar{A}^{i}\right) \tag{7}$$

In formula (7), $\mathrm{Rank}_{\alpha,\beta}(\cdot)$ denotes ranking suppression applied to the weight matrix $\bar{A}^{i}$: the $\alpha$ largest values in $\bar{A}^{i}$ are scaled by $\beta \in [0, 1]$, where $\beta$ denotes the degree of suppression of the maximum activation feature weights; $\hat{A}^{i}$ denotes the $i$-th group non-maximum activation feature weight matrix;

the weight matrix $\hat{A}^{i}$ is expanded along the channel dimension and applied to the feature $F^{i}$, giving the weighted feature $F_{n}^{i} = \hat{A}^{i} \odot F^{i}$.
With reference to the first aspect, further, fusing the category prediction results by a preset fusion method includes:

the class probability prediction results are denoted $p_1$, $p_2$ and $p_3$, corresponding to the three features $X_1$, $X_2$ and $X_3$ output by the non-maximum activation double-branch network, where $X_1$ denotes the feature of the target image to be classified containing the maximum activation feature, $X_2$ denotes the feature containing the non-maximum activation feature, and $X_3 = \mathrm{Concat}(X_1, X_2)$ denotes the concatenated feature, with $\mathrm{Concat}(\cdot)$ denoting concatenation along the feature channel dimension;

the prediction result is obtained by weighted summation fusion, calculated as:

$$\hat{p}^{c} = \sum_{k=1}^{M} w_{k}\, p_{k}^{c} \tag{8}$$

In formula (8), $\hat{p}^{c}$ denotes the prediction probability of the $c$-th class after weighted fusion; $p_{k}^{c}$ denotes the probability of the $c$-th class output by the $k$-th path; $w_{k}$ denotes the fusion weight of the $k$-th path; $C$ is the number of classes; and $M$ denotes the total number of paths;

the classification result of the target image to be classified is the class corresponding to the maximum value among $\hat{p}^{1}, \hat{p}^{2}, \dots, \hat{p}^{C}$, calculated as $\hat{y} = \arg\max_{c \in \{1,\dots,C\}} \hat{p}^{c}$.
In a second aspect, the present invention provides a fine-grained image classification system based on a dual-branch network, including:
a preprocessing module: the image preprocessing module is used for preprocessing a target image to be classified;
a feature extraction module: the system is used for inputting the preprocessed target image into a pre-trained non-maximum activated double-branch network, and extracting to obtain the characteristics of the target image to be classified; the non-maximum activation double-branch network comprises a non-maximum activation module and a homogeneous double-branch subnetwork, wherein the non-maximum activation module is used for outputting maximum activation characteristics and non-maximum activation characteristics, the output characteristics are input into the homogeneous double-branch subnetwork, the homogeneous double-branch subnetwork learns and outputs target image characteristics to be classified;
a category prediction module: the image classification device is used for performing class prediction by adopting a classifier based on the obtained target image features to be classified to obtain class prediction results of the target image features to be classified;
a fusion output module: and fusing the category prediction results by adopting a preset fusion method to obtain the classification result of the target image to be classified.
Compared with the prior art, the fine-grained image classification method and the fine-grained image classification system based on the double-branch network have the advantages that:
the method comprises the steps of preprocessing a target image to be classified; inputting the preprocessed target image into a pre-trained non-maximum activated double-branch network, and extracting to obtain the characteristics of the target image to be classified; the non-maximum activation double-branch network comprises a non-maximum activation module and a homogeneous double-branch subnetwork, wherein the non-maximum activation module is used for outputting maximum activation characteristics and non-maximum activation characteristics, the output characteristics are input into the homogeneous double-branch subnetwork, the homogeneous double-branch subnetwork learns and outputs target image characteristics to be classified; the method is improved on the basis of the Swin transducer deep neural network, and a non-maximum value activation module and an isomorphic double-branch subnetwork are introduced, so that more sufficient target discrimination area characteristics can be obtained;
based on the obtained target image features to be classified, a classifier is adopted to carry out class prediction, and a class prediction result of the target image features to be classified is obtained; fusing the category prediction results by adopting a preset fusion method to obtain a classification result of the target image to be classified; the method can solve the problem that the current attention mechanism feature extraction is insufficient, can solve the defect that the attention mechanism method based on the convolutional neural network in the prior art is difficult to be directly expanded to a fine-grained classification method based on a transform frame, and can improve the accuracy and robustness of a fine-grained classification result.
Drawings
Fig. 1 is a flowchart of a fine-grained image classification method based on a dual-branch network according to an embodiment of the present invention;
fig. 2 is a structural diagram of a non-maximum activated dual-branch network in a fine-grained image classification method based on a dual-branch network according to an embodiment of the present invention;
fig. 3 is a structural diagram of a non-maximum activation module in a fine-grained image classification method based on a dual-branch network according to an embodiment of the present invention;
fig. 4 is a structural diagram of a fine-grained image classification system based on a dual-branch network according to a second embodiment of the present invention.
Detailed Description
The invention is further described below with reference to the accompanying drawings. The following examples are only for illustrating the technical solutions of the present invention more clearly, and the protection scope of the present invention is not limited thereby.
The first embodiment is as follows:
as shown in fig. 1, an embodiment of the present invention provides a fine-grained image classification method based on a dual-branch network, including:
preprocessing a target image to be classified;
inputting the preprocessed target image into a pre-trained non-maximum activated double-branch network, and extracting to obtain the characteristics of the target image to be classified; the non-maximum activation double-branch network comprises a non-maximum activation module and a homogeneous double-branch subnetwork, wherein the non-maximum activation module is used for outputting maximum activation characteristics and non-maximum activation characteristics, the output characteristics are input into the homogeneous double-branch subnetwork, the homogeneous double-branch subnetwork learns and outputs target image characteristics to be classified;
based on the obtained target image features to be classified, adopting a classifier to perform class prediction to obtain class prediction results of the target image features to be classified;
and fusing the category prediction results by adopting a preset fusion method to obtain the classification result of the target image to be classified.
As shown in fig. 1, the specific steps are as follows:
step 1: and preprocessing the target image to be classified.
The target image to be classified is scaled to 600 × 600 pixels, and a 448 × 448 pixel image area centered on the image center is cropped.
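The scale-and-center-crop preprocessing of step 1 can be sketched as follows. This is a minimal NumPy illustration; nearest-neighbor resizing is an assumption here, since the patent does not specify the interpolation method, and production code would normally use a library resizer.

```python
import numpy as np

def resize_nearest(img: np.ndarray, out_h: int, out_w: int) -> np.ndarray:
    """Nearest-neighbor resize of an (H, W, C) image array."""
    h, w = img.shape[:2]
    rows = (np.arange(out_h) * h / out_h).astype(int)
    cols = (np.arange(out_w) * w / out_w).astype(int)
    return img[rows][:, cols]

def preprocess(img: np.ndarray, scale: int = 600, crop: int = 448) -> np.ndarray:
    """Scale to scale x scale pixels, then crop a crop x crop region about the center."""
    img = resize_nearest(img, scale, scale)
    top = (scale - crop) // 2   # 76 for 600 -> 448
    left = (scale - crop) // 2
    return img[top:top + crop, left:left + crop]

img = np.zeros((375, 500, 3), dtype=np.uint8)  # arbitrary input size
out = preprocess(img)
print(out.shape)  # (448, 448, 3)
```

Any input size works, because the image is first brought to the fixed 600 × 600 scale before the center crop.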
Step 2: constructing and training the non-maximum activation double-branch network.
The non-maximum value activated double-branch network comprises an image preprocessing module, a backbone network feature extraction module, a non-maximum value activation module and a homogeneous double-branch sub-network.
Step 2.1: constructing the non-maximum activation double-branch network.
After the first three stages of the standard Swin Transformer deep neural network, a non-maximum activation module is introduced, and the last stage of the Swin Transformer deep neural network is duplicated to construct a homogeneous double-branch subnetwork, finally yielding the non-maximum activation double-branch network. The constructed non-maximum activation double-branch network is shown in fig. 2.

Specifically, for the Swin Transformer deep neural network, see Ze Liu, Yutong Lin, Yue Cao, Han Hu, Yixuan Wei, Zheng Zhang, Stephen Lin and Baining Guo, "Swin Transformer: Hierarchical Vision Transformer using Shifted Windows", [DB] [2021-04-08] https://arxiv.org/abs/2103.14030.
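The construction of the homogeneous double-branch subnetwork, i.e. duplicating the last backbone stage so that both branches share stages 1 to 3, can be sketched with stub stage objects. The `Stage` class below is a hypothetical stand-in for a real Swin Transformer stage, used only to show the wiring.

```python
import copy

class Stage:
    """Stand-in for one Swin Transformer stage (hypothetical stub)."""
    def __init__(self, name: str):
        self.name = name
    def __call__(self, x):
        return f"{self.name}({x})"

# Backbone: stages 1-3 are shared; stage 4 is duplicated to form
# the homogeneous double-branch subnetwork.
shared = [Stage(f"stage{i}") for i in (1, 2, 3)]
branch_max = Stage("stage4")               # branch fed with maximum activation features
branch_nonmax = copy.deepcopy(branch_max)  # identical copy for non-maximum features

x = "x"
for s in shared:
    x = s(x)
# The NAM module would split x into F_m and F_n at this point;
# here both branches receive x directly for illustration.
out_m = branch_max(x)
out_n = branch_nonmax(x)
print(out_m)  # stage4(stage3(stage2(stage1(x))))
```

The deep copy gives the two branches identical structure but independent parameters, which is what allows them to specialize on maximum and non-maximum activation features during training.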
As shown in fig. 3, the non-maximum activation module processes the output characteristics of the first three stages to obtain the maximum activation characteristics and the non-maximum activation characteristics, and sends the maximum activation characteristics and the non-maximum activation characteristics to the homogeneous dual-branch subnetwork for processing.
The calculation process of the non-maximum activation module comprises the following steps:

$$F_m, F_n = \mathrm{NAM}(F) \tag{1}$$

In formula (1), $\mathrm{NAM}(\cdot)$ denotes the module calculation process; $F \in \mathbb{R}^{B\times L\times C}$ denotes the feature input to the non-maximum activation module, where $B$ denotes the number of input images, $L$ denotes the feature dimension, and $C$ denotes the number of feature channels; $F_m \in \mathbb{R}^{B\times L\times C}$ denotes the maximum activation feature output by the module; and $F_n \in \mathbb{R}^{B\times L\times C}$ denotes the non-maximum activation feature output by the module.

The calculation process of the maximum activation feature $F_m$ output by the non-maximum activation module is as follows:

the feature $F$ input to the non-maximum activation module is equally divided into $k$ groups along the 2nd dimension, where the feature dimension $L$ must be an integral multiple of the group number $k$, giving the $i$-th group of features $F^{i} \in \mathbb{R}^{B\times (L/k)\times C}$; the group weights are then normalized:

$$\bar{A}^{i}(b,l) = \frac{\exp\left(A^{i}(b,l)\right)}{\sum_{j=1}^{k}\exp\left(A^{j}(b,l)\right)} \tag{2}$$

In formula (2), $\bar{A}^{i}(b,l)$ denotes the weight of the $l$-th dimension of the $b$-th image in the $i$-th group, the denominator performing weight normalization over the $k$ blocks; $A^{i}(b,l)$ denotes the element at position $(b,l)$ of $A^{i}$; and $A^{i}$ denotes the $i$-th group of feature weight vectors, expressed as:

$$A^{i} = \sigma\left(\mathrm{Conv}\left(\mathrm{GAP}\left(F^{i}\right)\right)\right) \tag{3}$$

In formula (3), $\mathrm{GAP}(\cdot)$ denotes a global average pooling operation, $\mathrm{Conv}(\cdot)$ denotes a convolution operation, and $\sigma(\cdot)$ denotes a ReLU activation operation;

the weight matrix $\bar{A}^{i}$ is expanded along the channel dimension and applied to the feature $F^{i}$, giving the weighted feature $F_{m}^{i} = \bar{A}^{i} \odot F^{i}$.
The calculation process of the non-maximum activation feature $F_n$ output by the non-maximum activation module is as follows:

a maximum suppression operation is performed on each group of features:

$$\hat{A}^{i} = \mathrm{Rank}_{\alpha,\beta}\left(\bar{A}^{i}\right) \tag{4}$$

In formula (4), $\mathrm{Rank}_{\alpha,\beta}(\cdot)$ denotes ranking suppression applied to the weight matrix $\bar{A}^{i}$: the $\alpha$ largest values in $\bar{A}^{i}$ are scaled by $\beta \in [0, 1]$, where $\beta$ denotes the degree of suppression of the maximum activation feature weights; $\hat{A}^{i}$ denotes the $i$-th group non-maximum activation feature weight matrix;

the weight matrix $\hat{A}^{i}$ is expanded along the channel dimension and applied to the feature $F^{i}$, giving the weighted feature $F_{n}^{i} = \hat{A}^{i} \odot F^{i}$.
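A minimal NumPy sketch of the NAM computation described above, under stated assumptions: the group weights $A^i$ are approximated by a ReLU of the channel-wise mean (standing in for GAP + Conv + ReLU, whose convolution parameters the patent does not publish), normalization uses a softmax over the $k$ groups, and maximum suppression scales the $\alpha$ largest weights per group by $\beta$.

```python
import numpy as np

def nam(F: np.ndarray, k: int = 4, alpha: int = 2, beta: float = 0.5):
    """Sketch of the non-maximum activation module (NAM).

    F: features of shape (B, L, C); L must be divisible by k.
    Returns (F_m, F_n), both of shape (B, L, C).
    """
    B, L, C = F.shape
    assert L % k == 0, "feature dimension L must be an integral multiple of k"
    groups = F.reshape(B, k, L // k, C)        # split along the 2nd dimension
    A = np.maximum(groups.mean(axis=-1), 0.0)  # (B, k, L/k): stand-in for ReLU(Conv(GAP))
    A_bar = np.exp(A) / np.exp(A).sum(axis=1, keepdims=True)  # softmax over the k groups

    # Maximum activation branch: expand weights along channels and reweight.
    F_m = (A_bar[..., None] * groups).reshape(B, L, C)

    # Non-maximum branch: suppress the alpha largest weights per group by factor beta.
    A_hat = A_bar.copy()
    idx = np.argsort(A_hat, axis=-1)[..., -alpha:]          # indices of alpha largest
    np.put_along_axis(A_hat, idx,
                      np.take_along_axis(A_hat, idx, -1) * beta, -1)
    F_n = (A_hat[..., None] * groups).reshape(B, L, C)
    return F_m, F_n

F = np.random.rand(2, 16, 8)
F_m, F_n = nam(F, k=4, alpha=2, beta=0.5)
print(F_m.shape, F_n.shape)  # (2, 16, 8) (2, 16, 8)
```

Because suppression only shrinks the largest weights, the non-maximum branch sees the same feature layout as the maximum branch but with the dominant activations damped, forcing it to learn from secondary regions.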
Step 2.2: training the non-maximum activation double-branch network.
An image for training is input into the non-maximum activation double-branch network; under the guidance of a preset target loss function, the parameters of the non-maximum activation double-branch network are trained with a stochastic gradient descent algorithm to obtain the optimal network parameters.
The images used for training are obtained as follows: the image is scaled to 600 × 600 pixels, a 448 × 448 pixel region is randomly cropped, and the image is flipped horizontally at random with probability 0.5.
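The training-time augmentation (random 448 × 448 crop plus random horizontal flip with probability 0.5) can be sketched in NumPy as follows, assuming the image has already been scaled to 600 × 600 pixels:

```python
import numpy as np

rng = np.random.default_rng(0)

def augment(img: np.ndarray, crop: int = 448) -> np.ndarray:
    """Random crop then random horizontal flip; img is (H, W, C) with H, W >= crop."""
    h, w = img.shape[:2]
    top = rng.integers(0, h - crop + 1)
    left = rng.integers(0, w - crop + 1)
    patch = img[top:top + crop, left:left + crop]
    if rng.random() < 0.5:
        patch = patch[:, ::-1]  # horizontal flip
    return patch

img = np.zeros((600, 600, 3), dtype=np.uint8)  # image already scaled to 600 x 600
print(augment(img).shape)  # (448, 448, 3)
```

The random crop differs from the deterministic center crop used at test time: it exposes the network to shifted views of the target, which is standard practice for this kind of training pipeline.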
The preset target loss function is composed of a cross entropy classification target loss function and a similarity measurement target loss function, and is calculated according to the following formula:
$$L = L_{CE} + \lambda L_{DB} \tag{5}$$

In formula (5), $L$ denotes the objective loss function; $\lambda$ denotes the weight between $L_{CE}$ and $L_{DB}$; $L_{CE}$ denotes the cross-entropy classification objective loss function, expressed as:

$$L_{CE} = -\frac{1}{B}\sum_{b=1}^{B}\log\frac{\exp\left(W_{y_b}^{\top} f_b\right)}{\sum_{j=1}^{C}\exp\left(W_j^{\top} f_b\right)} \tag{6}$$

In formula (6), $B$ denotes the number of input images; $f_b$ denotes the feature of the $b$-th image; $y_b \in \{1, 2, \dots, C\}$ denotes the true label of the $b$-th image; $W_{y_b}$ denotes the weight parameter mapping the feature $f_b$ to the true label class $y_b$; $W_j$ denotes the weight parameter mapping the feature $f_b$ to the $j$-th class; and $C$ is the total number of classes.

In formula (5), $L_{DB}$ denotes the similarity-measure objective loss function, expressed as:

$$L_{DB} = \frac{1}{B}\sum_{b=1}^{B}\left(\mathbf{1}^{\top} S^{b}\,\mathbf{1} - \mathbf{1}^{\top}\operatorname{diag}\!\left(S^{b}\right)\right) \tag{7}$$

In formula (7), $S^{b} = \tilde{f}^{b}\,(\tilde{f}^{b})^{\top}$ denotes the similarity matrix between the features of the different branches; $\tilde{f}^{b}$ denotes the concatenated and normalized features $f_{1}^{b}$ and $f_{2}^{b}$ of the $b$-th image; $(\tilde{f}^{b})^{\top}$ denotes the transposed feature; $\operatorname{diag}(S^{b})$ denotes the vector of main-diagonal values of $S^{b}$; and $B$ denotes the number of input images.
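A NumPy sketch of the combined objective $L = L_{CE} + \lambda L_{DB}$. The cross-entropy term follows the standard softmax form; the exact published form of the similarity-measure term is an equation image not reproduced in the text, so the version below is an assumed reconstruction that penalizes the off-diagonal (cross-branch) entries of the similarity matrix $S^b$ built from the L2-normalized branch features.

```python
import numpy as np

def cross_entropy_loss(feats, labels, W):
    """Softmax cross entropy over logits W^T f_b, averaged over the batch."""
    logits = feats @ W                                    # (B, C)
    logits = logits - logits.max(axis=1, keepdims=True)   # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(len(labels)), labels].mean()

def similarity_loss(f1, f2):
    """Assumed L_DB: off-diagonal mass of the per-image similarity matrix
    of the stacked, L2-normalized branch features."""
    total = 0.0
    for a, b in zip(f1, f2):
        f = np.stack([a, b])                              # (2, d) branch features
        f = f / np.linalg.norm(f, axis=1, keepdims=True)  # normalize rows
        S = f @ f.T                                       # similarity matrix S^b
        total += S.sum() - np.trace(S)                    # drop the main diagonal
    return total / len(f1)

B, d, C, lam = 4, 16, 10, 0.1  # illustrative sizes; lambda is a free hyperparameter
rng = np.random.default_rng(1)
feats = rng.standard_normal((B, d))
labels = rng.integers(0, C, size=B)
W = rng.standard_normal((d, C))
f1, f2 = rng.standard_normal((B, d)), rng.standard_normal((B, d))
L = cross_entropy_loss(feats, labels, W) + lam * similarity_loss(f1, f2)
print(float(L))
```

Minimizing the off-diagonal mass pushes the two branch features apart, which matches the stated goal of making the maximum and non-maximum branches capture complementary regions.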
Step 3: inputting the preprocessed target image into the pre-trained non-maximum activation double-branch network and extracting the features of the target image to be classified.

The non-maximum activation module outputs the maximum activation feature and the non-maximum activation feature; the output features are input into the homogeneous double-branch subnetwork, which learns and outputs the features of the target image to be classified.
Step 4: based on the obtained features of the target image to be classified, performing class prediction with a classifier to obtain the class prediction results of the target image features.
Specifically, the image to be classified $I \in \mathbb{R}^{3\times 448\times 448}$ passes through the non-maximum activation double-branch network to obtain three classification features $X_1$, $X_2$ and $X_3$; independent classifiers predict the class probabilities of the three classification features, giving the class probability prediction results $p_1$, $p_2$ and $p_3$. Here $X_1$ denotes the feature of the target image to be classified containing the maximum activation feature, $X_2$ denotes the feature containing the non-maximum activation feature, and $X_3 = \mathrm{Concat}(X_1, X_2)$ denotes the concatenated feature, where $\mathrm{Concat}(\cdot)$ denotes concatenation along the feature channel dimension.
Step 5: fusing the class prediction results with the preset fusion method to obtain the classification result of the target image to be classified.

Step 5.1: based on the class probability prediction results $p_1$, $p_2$ and $p_3$, the prediction result is obtained by weighted summation fusion:

$$\hat{p}^{c} = \sum_{k=1}^{M} w_{k}\, p_{k}^{c} \tag{8}$$

In formula (8), $\hat{p}^{c}$ denotes the prediction probability of the $c$-th class after weighted fusion; $p_{k}^{c}$ denotes the probability of the $c$-th class output by the $k$-th path; $w_{k}$ denotes the fusion weight of the $k$-th path; $C$ is the number of classes; and $M$ is the total number of paths.

Step 5.2: the classification result of the target image to be classified is the class corresponding to the maximum value among $\hat{p}^{1}, \hat{p}^{2}, \dots, \hat{p}^{C}$, calculated as $\hat{y} = \arg\max_{c \in \{1,\dots,C\}} \hat{p}^{c}$.
The method can obtain more sufficient target discriminative region features, can solve the problem of insufficient feature extraction by current attention mechanisms, can overcome the difficulty that attention mechanism methods based on convolutional neural networks in the prior art are hard to extend directly to fine-grained classification methods based on the Transformer framework, and can improve the accuracy and robustness of fine-grained classification results.
Example two:
as shown in fig. 4, an embodiment of the present invention provides a fine-grained image classification system based on a dual-branch network, including:
a preprocessing module: the image preprocessing module is used for preprocessing a target image to be classified;
a feature extraction module: the system is used for inputting the preprocessed target image into a pre-trained non-maximum activated double-branch network, and extracting to obtain the characteristics of the target image to be classified; the non-maximum activation double-branch network comprises a non-maximum activation module and a homogeneous double-branch subnetwork, wherein the non-maximum activation module is used for outputting maximum activation characteristics and non-maximum activation characteristics, the output characteristics are input into the homogeneous double-branch subnetwork, the homogeneous double-branch subnetwork learns and outputs target image characteristics to be classified;
a category prediction module: the image classification device is used for performing class prediction by adopting a classifier based on the obtained target image features to be classified to obtain class prediction results of the target image features to be classified;
a fusion output module: and fusing the category prediction results by adopting a preset fusion method to obtain the classification result of the target image to be classified.
Example three:
the embodiment of the invention provides a fine-grained image classification device based on a double-branch network, which comprises a processor and a storage medium, wherein the processor is used for processing images;
the storage medium is used for storing instructions;
the processor is configured to operate in accordance with the instructions to perform the steps of the method of embodiment one.
Example four:
Embodiments of the present invention also provide a computer-readable storage medium on which a computer program is stored, wherein the computer program, when executed by a processor, implements the steps of the method of embodiment one.
As will be appreciated by one skilled in the art, embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
The above description is only a preferred embodiment of the present invention, and it should be noted that, for those skilled in the art, several modifications and variations can be made without departing from the technical principle of the present invention, and these modifications and variations should also be regarded as the protection scope of the present invention.
Claims (10)
1. A fine-grained image classification method based on a double-branch network is characterized by comprising the following steps:
preprocessing a target image to be classified;
inputting the preprocessed target image into a pre-trained non-maximum activated double-branch network, and extracting the features of the target image to be classified; the non-maximum activated double-branch network comprises a non-maximum activation module and a homogeneous double-branch subnetwork, wherein the non-maximum activation module outputs maximum activation features and non-maximum activation features, and the output features are input into the homogeneous double-branch subnetwork, which learns and outputs the features of the target image to be classified;
based on the obtained target image features to be classified, adopting a classifier to perform class prediction to obtain class prediction results of the target image features to be classified;
and fusing the category prediction results by adopting a preset fusion method to obtain the classification result of the target image to be classified.
2. The fine-grained image classification method based on the double-branch network according to claim 1, wherein the preprocessing of the target image to be classified comprises:
the target image to be classified is scaled to a size of 600 pixels × 600 pixels, and an image area of 448 pixels × 448 pixels is cropped centered on the image center.
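The resize-then-center-crop step of claim 2 can be sketched as follows. Nearest-neighbour resampling is an assumption here; the patent does not specify the interpolation method:

```python
import numpy as np

def preprocess(image):
    """Scale to 600x600 (nearest-neighbour), then center-crop 448x448."""
    h, w = image.shape[:2]
    # nearest source row/column index for each target row/column
    rows = np.arange(600) * h // 600
    cols = np.arange(600) * w // 600
    resized = image[rows][:, cols]        # (600, 600, ...)
    off = (600 - 448) // 2                # 76-pixel margin on each side
    return resized[off:off + 448, off:off + 448]
```

The crop offset (600 − 448) / 2 = 76 keeps the crop symmetric about the image center.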
3. The fine-grained image classification method based on the double-branch network according to claim 1, wherein the non-maximum activated double-branch network comprises an image preprocessing module, a backbone network feature extraction module, a non-maximum activated module and a homogeneous double-branch subnetwork.
4. The fine-grained image classification method based on the double-branch network according to claim 1, characterized in that the non-maximum activated double-branch network is trained by the following steps:
inputting an image for training into the non-maximum activation double-branch network; and under the guidance of a preset target loss function, training the parameters of the non-maximum activated double-branch network by a stochastic gradient descent algorithm to obtain the optimal network parameters.
5. The fine-grained image classification method based on the dual-branch network according to claim 4, wherein the preset target loss function is:
L = L_CE + λ·L_DB (1)

In formula (1), L represents the objective loss function; λ represents the weight between L_CE and L_DB; L_CE represents the cross-entropy classification objective loss function, represented by the following equation:

L_CE = −(1/B) Σ_{b=1}^{B} log( exp(W_{y_b}^T f_b) / Σ_{j=1}^{C} exp(W_j^T f_b) ) (2)

In formula (2), B represents the number of input images; f_b represents the feature of the b-th image; y_b represents the real label of the b-th image, and y_b ∈ {1, 2, ..., C}; W_{y_b} represents the weight parameter mapping the feature f_b to the real label class y_b; W_j represents the weight parameter mapping the feature f_b to the j-th class; and C is the total number of classes;
In formula (1), L_DB represents the similarity-measure objective loss function, represented by the following equation:

L_DB = (1/B) Σ_{b=1}^{B} Σ_{i≠j} |S^b(i, j)| (3)

In formula (3), S^b = f̃_b·f̃_b^T represents the similarity matrix between the features of the different branches, where f̃_b represents the concatenated and normalized features f_b^1 and f_b^2 of the b-th image, f̃_b^T represents the transposed feature, diag(S^b) represents extracting the values on the main diagonal of S^b (the entries excluded from the sum), and B represents the number of input images.
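The two loss terms of formula (1) can be sketched in numpy. The cross-entropy follows the standard softmax form; the similarity term below, which penalises the off-diagonal (cross-branch) entries of the per-image similarity matrix S^b, is an assumption about the exact form of the similarity-measure loss, which the text describes only in terms of S^b and its diagonal. The λ value is illustrative:

```python
import numpy as np

def cross_entropy_loss(features, labels, W):
    """L_CE: softmax cross-entropy. features: (B, D), W: (D, C), labels: (B,)."""
    logits = features @ W
    logits = logits - logits.max(axis=1, keepdims=True)   # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(len(labels)), labels].mean()

def similarity_loss(f1, f2):
    """Assumed L_DB: mean off-diagonal mass of the per-image similarity matrix
    S^b built from the concatenated, row-normalised branch features."""
    B = f1.shape[0]
    total = 0.0
    for b in range(B):
        fb = np.stack([f1[b], f2[b]])                     # (2, D) branch features
        fb = fb / np.linalg.norm(fb, axis=1, keepdims=True)
        S = fb @ fb.T                                     # (2, 2) similarity matrix
        total += np.abs(S - np.diag(np.diag(S))).sum()    # off-diagonal entries
    return total / B

def total_loss(features, labels, W, f1, f2, lam=0.5):
    """L = L_CE + lambda * L_DB, per formula (1); lambda is illustrative."""
    return cross_entropy_loss(features, labels, W) + lam * similarity_loss(f1, f2)
```

With identical branch features the off-diagonal similarity is maximal (cosine 1 in both off-diagonal cells), so the term pushes the two branches toward dissimilar features.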
6. The fine-grained image classification method based on the dual-branch network according to claim 1, wherein the calculation process of the non-maximum activation module comprises:
F_m, F_n = NAM(F) (4)

In formula (4), NAM(·) represents the module calculation process; F represents the feature input to the non-maximum activation module, satisfying F ∈ R^{B×L×C}, where B represents the number of input images, L represents the feature dimension, and C represents the number of feature channels; F_m represents the maximum activation feature output by the non-maximum activation module, satisfying F_m ∈ R^{B×L×C};
F_n represents the non-maximum activation feature output by the non-maximum activation module, satisfying F_n ∈ R^{B×L×C}.
7. The fine-grained image classification method based on the double-branch network according to claim 6, characterized in that the maximum activation feature F_m output by the non-maximum activation module is calculated as follows:

the feature F input to the non-maximum activation module is equally divided into k groups along the 2nd dimension, the feature dimension L being required to be an integral multiple of the group number k, obtaining the i-th group of features F^i; the weight of each dimension is then computed as:

Ã^i(b, l) = A^i(b, l) / Σ_{j=1}^{k} A^j(b, l) (5)

In formula (5), Ã^i(b, l) represents the weight of the l-th dimension of the b-th image, the denominator normalizing the weights over the k blocks; A^i(b, l) represents the element at position (b, l) in A^i; and A^i represents the i-th group of feature vectors, represented by the following equation:

A^i = σ(W ⊛ GAP(F^i)) (6)

In formula (6), GAP(·) represents the global average pooling operation, ⊛ represents the convolution operation, and σ(·) represents the ReLU activation operation;

the weighting matrix Ã^i is expanded along the channel dimension and applied to the feature F^i by weighted calculation, obtaining the weighted feature F_m^i.
8. The fine-grained image classification method based on the double-branch network according to claim 7, characterized in that the non-maximum activation feature F_n output by the non-maximum activation module is calculated as follows:

a maximum suppression operation is performed on each group of features:

Â^i(b, l) = β·Ã^i(b, l), if Ã^i(b, l) is among the α largest values of Ã^i; Â^i(b, l) = Ã^i(b, l), otherwise (7)

In formula (7), the weight matrix Ã^i is ranked to determine the values to suppress: the α largest values of Ã^i are scaled by β, with β ∈ [0, 1] representing the degree of suppression of the maximum activation feature weights; Â^i represents the i-th group non-maximum excitation feature weight matrix;

the weighting matrix Â^i is expanded along the channel dimension and applied to the feature F^i by weighted calculation, obtaining the weighted feature F_n^i.
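Claims 6–8 together describe the non-maximum activation module. The sketch below assumes a simple reading of that procedure: split F along dimension 2 into k groups, weight each group (a random ReLU'd linear map over channels stands in for the conv + GAP excitation), normalise the weights across the groups, and for the non-maximum branch damp the α largest weights of each group by a factor β. The group layout, the excitation stand-in, and the exact suppression rule are all assumptions:

```python
import numpy as np

def nam(F, conv_w, k=4, alpha=2, beta=0.5):
    """Sketch of the non-maximum activation module.
    F: (B, L, C); conv_w: (C,) stand-in excitation weights.
    Returns (F_m, F_n), both of shape (B, L, C)."""
    B, L, C = F.shape
    assert L % k == 0, "feature dimension L must be a multiple of k"
    groups = np.split(F, k, axis=1)                     # k arrays of (B, L/k, C)
    # per-position weights via a ReLU'd linear map over channels
    A = np.stack([np.maximum(g @ conv_w, 0.0) for g in groups])   # (k, B, L/k)
    # normalise the weights across the k groups (the denominator of formula (5))
    A_norm = A / (A.sum(axis=0, keepdims=True) + 1e-8)
    # maximum-activation branch: weight each group, expanded over channels
    Fm = np.concatenate([g * A_norm[i][..., None]
                         for i, g in enumerate(groups)], axis=1)
    # non-maximum branch: suppress the alpha largest weights per group by beta
    A_sup = A_norm.copy()
    for i in range(k):
        for b in range(B):
            top = np.argsort(A_sup[i, b])[-alpha:]      # indices of alpha largest
            A_sup[i, b, top] *= beta
    Fn = np.concatenate([g * A_sup[i][..., None]
                         for i, g in enumerate(groups)], axis=1)
    return Fm, Fn
```

Because suppression only scales non-negative weights down, the non-maximum feature never exceeds the maximum-activation feature in magnitude at any position.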
9. The fine-grained image classification method based on the double-branch network according to claim 1, characterized in that fusing the category prediction results by a preset fusion method comprises:

denoting the class probability prediction results as p_1, p_2 and p_3, corresponding respectively to the three features X_1, X_2 and X_3 output by the non-maximum activated double-branch network, wherein X_1 represents the feature of the target image to be classified containing the maximum excitation feature, X_2 represents the feature of the target image to be classified containing the non-maximum excitation feature, and X_3 represents the spliced feature, satisfying X_3 = Concat(X_1, X_2), Concat(·) indicating that the splicing operation is performed in the feature channel dimension;

obtaining the prediction result by weighted-summation fusion, calculated by the following formula:

p̂^c = Σ_{k=1}^{M} w_k·p_k^c (8)

In formula (8), p̂^c represents the prediction probability of the c-th class after weighted fusion, p_k^c represents the probability of the c-th class output by the k-th path, w_k represents the fusion weight of the k-th path, C is the number of classes, and M represents the total number of paths.
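The weighted-summation fusion of the per-path predictions in claim 9 can be sketched as follows; equal path weights are assumed when none are supplied, since the claim does not fix the weight values:

```python
import numpy as np

def fuse_predictions(path_probs, weights=None):
    """Weighted sum of per-path class probabilities.
    path_probs: (M, C) array, one row per path; weights: (M,) or None for equal."""
    path_probs = np.asarray(path_probs, dtype=float)
    M = path_probs.shape[0]
    w = np.full(M, 1.0 / M) if weights is None else np.asarray(weights, dtype=float)
    fused = w @ path_probs                 # (C,) fused class probabilities
    return fused, int(np.argmax(fused))
```

With M = 3 paths (the maximum feature, the non-maximum feature, and the spliced feature), each row of `path_probs` would be the classifier output for one path.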
10. A fine-grained image classification system based on a double-branch network is characterized by comprising the following components:
a preprocessing module, configured to preprocess a target image to be classified;
a feature extraction module, configured to input the preprocessed target image into a pre-trained non-maximum activated double-branch network and extract the features of the target image to be classified, wherein the non-maximum activated double-branch network comprises a non-maximum activation module and a homogeneous double-branch subnetwork, the non-maximum activation module outputs maximum activation features and non-maximum activation features, and the output features are input into the homogeneous double-branch subnetwork, which learns and outputs the features of the target image to be classified;
a category prediction module, configured to perform category prediction with a classifier based on the obtained features of the target image to be classified, obtaining category prediction results;
a fusion output module, configured to fuse the category prediction results by a preset fusion method to obtain the classification result of the target image to be classified.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202111175746.2A CN113902948A (en) | 2021-10-09 | 2021-10-09 | Fine-grained image classification method and system based on double-branch network |
Publications (1)
Publication Number | Publication Date |
---|---|
CN113902948A true CN113902948A (en) | 2022-01-07 |
Family ID: 79190664
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202111175746.2A Pending CN113902948A (en) | 2021-10-09 | 2021-10-09 | Fine-grained image classification method and system based on double-branch network |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113902948A (en) |
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title
---|---|---|---|---
CN114626476A (en) * | 2022-03-21 | 2022-06-14 | 北京信息科技大学 | Bird fine-grained image recognition method and device based on Transformer and component feature fusion |
CN116452896A (en) * | 2023-06-16 | 2023-07-18 | 中国科学技术大学 | Method, system, device and medium for improving fine-grained image classification performance |
CN116452896B (en) * | 2023-06-16 | 2023-10-20 | 中国科学技术大学 | Method, system, device and medium for improving fine-grained image classification performance |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |