
CN113902948A - Fine-grained image classification method and system based on double-branch network


Info

Publication number
CN113902948A
Authority
CN
China
Prior art keywords: maximum, classified, double, target image, image
Prior art date
2021-10-09
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111175746.2A
Other languages
Chinese (zh)
Inventor
苗壮
赵勋
王家宝
李阳
张睿
许博
王亚鹏
杨利
赵昕昕
杨义鑫
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Army Engineering University of PLA
Original Assignee
Army Engineering University of PLA
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Application filed by Army Engineering University of PLA filed Critical Army Engineering University of PLA
Priority to CN202111175746.2A priority Critical patent/CN113902948A/en
Publication of CN113902948A publication Critical patent/CN113902948A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2415 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/243 Classification techniques relating to the number of classes
    • G06F18/2431 Multiple classes
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/25 Fusion techniques
    • G06F18/254 Fusion techniques of classification results, e.g. of results related to same input data
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Molecular Biology (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Probability & Statistics with Applications (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a fine-grained image classification method based on a double-branch network, which comprises the following steps: preprocessing a target image to be classified; inputting the preprocessed target image into a pre-trained non-maximum activation double-branch network and extracting the features of the target image to be classified; performing class prediction with a classifier based on the obtained target image features to be classified, to obtain the class prediction results of the target image features to be classified; and fusing the class prediction results by a preset fusion method to obtain the classification result of the target image to be classified. The method can solve the problems that convolutional neural network methods in the prior art cannot be extended to fine-grained image classification tasks based on Transformer-architecture networks and that attention mechanisms extract target features insufficiently.

Description

Fine-grained image classification method and system based on double-branch network
Technical Field
The invention relates to a fine-grained image classification method and system based on a double-branch network, and belongs to the technical field of computer vision.
Background
Fine-grained image classification belongs to the field of image classification tasks and differs from ordinary classification in that it aims to distinguish subclasses within a broad class, for example: different types of cars, birds, airplanes, etc. Such classification targets are characterized by large intra-class differences and small inter-class differences, so the key to classification lies in extracting fine features of the targets. At the present stage, fine-grained image classification methods mainly use an attention mechanism to perform maximum activation on the target to obtain effective local discriminative features, and lack extraction of non-maximum activation features; on the other hand, most existing fine-grained classification methods extract target features based on convolutional neural networks and lack consideration of designing the network framework and objective function from the perspective of the Transformer architecture, and existing convolutional neural network methods are difficult to extend directly to fine-grained image classification methods based on Transformer-architecture networks.
Disclosure of Invention
The invention aims to overcome the defects in the prior art and provides a fine-grained image classification method and system based on a double-branch network, which can solve the problems that convolutional neural network methods in the prior art cannot be extended to fine-grained image classification tasks based on Transformer-architecture networks and that attention mechanisms extract target features insufficiently. To achieve this purpose, the invention adopts the following technical scheme:
in a first aspect, the present invention provides a fine-grained image classification method based on a dual-branch network, including:
preprocessing a target image to be classified;
inputting the preprocessed target image into a pre-trained non-maximum activation double-branch network, and extracting the features of the target image to be classified; the non-maximum activation double-branch network comprises a non-maximum activation module and a homogeneous double-branch subnetwork, wherein the non-maximum activation module is used for outputting maximum activation features and non-maximum activation features, the output features are input into the homogeneous double-branch subnetwork, and the homogeneous double-branch subnetwork learns and outputs the target image features to be classified;
based on the obtained target image features to be classified, adopting a classifier to perform class prediction to obtain class prediction results of the target image features to be classified;
and fusing the category prediction results by adopting a preset fusion method to obtain the classification result of the target image to be classified.
With reference to the first aspect, further, the preprocessing the target image to be classified includes:
the target image to be classified is scaled to 600 pixels × 600 pixels, and a 448 pixel × 448 pixel image area is cropped centered on the image center.
With reference to the first aspect, further, the non-maximum activation double-branch network comprises an image preprocessing module, a backbone network feature extraction module, a non-maximum activation module and a homogeneous double-branch subnetwork.
With reference to the first aspect, further, the non-maximum activation double-branch network is trained by the following steps:
inputting an image for training into the non-maximum activation double-branch network; and, under the guidance of a preset target loss function, training the parameters of the non-maximum activation double-branch network with a stochastic gradient descent algorithm to obtain the optimal network parameters.
With reference to the first aspect, preferably, the image for training is obtained by: scaling the image to 600 pixels × 600 pixels, randomly cropping a 448 pixel × 448 pixel region, and horizontally flipping the image at random with probability 0.5.
With reference to the first aspect, further, the preset target loss function is:
L = L_CE + λ·L_DB    (1)
in formula (1), L denotes the target loss function; λ denotes the weight between L_CE and L_DB; L_CE denotes the cross-entropy classification target loss function, given by:
L_CE = -(1/B) Σ_{b=1}^{B} log[ exp(W_{y_b}^T f_b) / Σ_{j=1}^{C} exp(W_j^T f_b) ]    (2)
in formula (2), B denotes the number of input images; f_b denotes the feature of the b-th image; y_b denotes the real label of the b-th image, with y_b ∈ {1, 2, ..., C}; W_{y_b} denotes the weight parameter mapping feature f_b to the real label class y_b; W_j denotes the weight parameter mapping feature f_b to the j-th class; C is the total number of classes;
in formula (1), L_DB denotes the similarity-measure target loss function, given by:
L_DB = (1/B) Σ_{b=1}^{B} || S^b - diag(S^b) ||_F^2,  where S^b = F̂^b (F̂^b)^T    (3)
in formula (3), S^b denotes the similarity matrix between the features of the different branches; F̂^b denotes the spliced and standardized features F_m^b and F_n^b of the b-th image; (F̂^b)^T denotes F̂^b after transposition; diag(S^b) denotes extraction of the main-diagonal values of S^b; and B denotes the number of input images.
With reference to the first aspect, further, the calculation process of the non-maximum activation module includes:
F_m, F_n = NAM(F)    (4)
in formula (4), NAM(·) denotes the module calculation process; F denotes the feature input to the non-maximum activation module, satisfying F ∈ R^{B×L×C}, where B denotes the number of input images, L denotes the feature dimension, and C denotes the number of feature channels; F_m denotes the maximum activation feature output by the non-maximum activation module, satisfying F_m ∈ R^{B×L×C}; F_n denotes the non-maximum activation feature output by the non-maximum activation module, satisfying F_n ∈ R^{B×L×C}.
With reference to the first aspect, further, the maximum activation feature F_m output by the non-maximum activation module is calculated as follows:
the feature F input to the non-maximum activation module is divided equally into k groups along the 2nd dimension, the feature dimension L being required to be an integer multiple of the group number k, giving the i-th group of features F^i ∈ R^{B×(L/k)×C};
the weight matrix W_m^i ∈ R^{B×(L/k)} corresponding to each group of features F^i is calculated by the following formula:
W_m^i(b, l) = exp(A^i(b, l)) / Σ_{j=1}^{k} exp(A^j(b, l))    (5)
in formula (5), W_m^i(b, l) denotes the weight of the l-th dimension of the b-th image, the denominator normalizing the weights over the k blocks; A^i(b, l) denotes the element at position (b, l) in A^i, where A^i denotes the i-th group of feature vectors, given by:
A^i = σ(Conv(GAP(F^i)))    (6)
in formula (6), GAP(·) denotes a global average pooling operation, Conv(·) denotes a convolution operation, and σ(·) denotes the ReLU activation operation;
the weight matrix W_m^i is expanded along the channel dimension to weight the feature F^i, giving the weighted feature F_m^i; the k groups of features F_m^1, ..., F_m^k are spliced to obtain the maximum activation feature F_m.
With reference to the first aspect, further, the non-maximum activation feature F_n output by the non-maximum activation module is calculated as follows:
a maximum suppression operation is performed on each group of features:
W_n^i = Rank_{α,β}(W_m^i)    (7)
in formula (7), Rank_{α,β}(·) denotes a ranking suppression of the weight matrix W_m^i, in which the α largest values of W_m^i are scaled by β, where β ∈ [0, 1] denotes the degree of suppression of the maximum activation feature weights; W_n^i denotes the i-th group non-maximum activation feature weight matrix;
the weight matrix W_n^i is expanded along the channel dimension to weight the feature F^i, giving the weighted feature F_n^i; the k groups of features F_n^1, ..., F_n^k are spliced to obtain the non-maximum activation feature F_n.
With reference to the first aspect, further, fusing the class prediction results by a preset fusion method includes:
the class probability prediction results are denoted p_1, p_2 and p_3, corresponding respectively to the three features X_1, X_2 and X_3 output by the non-maximum activation double-branch network; X_1 denotes the feature of the target image to be classified containing the maximum activation feature, X_2 denotes the feature of the target image to be classified containing the non-maximum activation feature, and X_3 denotes the spliced feature, satisfying X_3 = Concat(X_1, X_2), where Concat(·) denotes a splicing operation along the feature channel dimension;
the prediction result is obtained by weighted-sum fusion, calculated by the following formula:
p̂_c = (1/M) Σ_{k=1}^{M} p_c^k    (8)
in formula (8), p̂_c denotes the prediction probability of the c-th class after weighted fusion, p_c^k denotes the probability of the c-th class output by the k-th path, C is the number of classes, and M denotes the total number of paths;
the classification result of the target image to be classified is the class corresponding to the maximum value in p̂ = [p̂_1, p̂_2, ..., p̂_C], calculated as ŷ = argmax_{c ∈ {1, ..., C}} p̂_c.
In a second aspect, the present invention provides a fine-grained image classification system based on a dual-branch network, including:
a preprocessing module: used for preprocessing a target image to be classified;
a feature extraction module: used for inputting the preprocessed target image into a pre-trained non-maximum activation double-branch network and extracting the features of the target image to be classified; the non-maximum activation double-branch network comprises a non-maximum activation module and a homogeneous double-branch subnetwork, wherein the non-maximum activation module is used for outputting maximum activation features and non-maximum activation features, the output features are input into the homogeneous double-branch subnetwork, and the homogeneous double-branch subnetwork learns and outputs the target image features to be classified;
a category prediction module: used for performing class prediction with a classifier based on the obtained target image features to be classified, to obtain the class prediction results of the target image features to be classified;
a fusion output module: used for fusing the class prediction results by a preset fusion method to obtain the classification result of the target image to be classified.
Compared with the prior art, the fine-grained image classification method and the fine-grained image classification system based on the double-branch network have the advantages that:
the method comprises the steps of preprocessing a target image to be classified; inputting the preprocessed target image into a pre-trained non-maximum activated double-branch network, and extracting to obtain the characteristics of the target image to be classified; the non-maximum activation double-branch network comprises a non-maximum activation module and a homogeneous double-branch subnetwork, wherein the non-maximum activation module is used for outputting maximum activation characteristics and non-maximum activation characteristics, the output characteristics are input into the homogeneous double-branch subnetwork, the homogeneous double-branch subnetwork learns and outputs target image characteristics to be classified; the method is improved on the basis of the Swin transducer deep neural network, and a non-maximum value activation module and an isomorphic double-branch subnetwork are introduced, so that more sufficient target discrimination area characteristics can be obtained;
based on the obtained target image features to be classified, a classifier is adopted to carry out class prediction, and a class prediction result of the target image features to be classified is obtained; fusing the category prediction results by adopting a preset fusion method to obtain a classification result of the target image to be classified; the method can solve the problem that the current attention mechanism feature extraction is insufficient, can solve the defect that the attention mechanism method based on the convolutional neural network in the prior art is difficult to be directly expanded to a fine-grained classification method based on a transform frame, and can improve the accuracy and robustness of a fine-grained classification result.
Drawings
Fig. 1 is a flowchart of a fine-grained image classification method based on a dual-branch network according to an embodiment of the present invention;
fig. 2 is a structural diagram of a non-maximum activated dual-branch network in a fine-grained image classification method based on a dual-branch network according to an embodiment of the present invention;
fig. 3 is a structural diagram of a non-maximum activation module in a fine-grained image classification method based on a dual-branch network according to an embodiment of the present invention;
fig. 4 is a structural diagram of a fine-grained image classification system based on a dual-branch network according to a second embodiment of the present invention.
Detailed Description
The invention is further described below with reference to the accompanying drawings. The following examples are only for illustrating the technical solutions of the present invention more clearly, and the protection scope of the present invention is not limited thereby.
Embodiment one:
as shown in fig. 1, an embodiment of the present invention provides a fine-grained image classification method based on a dual-branch network, including:
preprocessing a target image to be classified;
inputting the preprocessed target image into a pre-trained non-maximum activation double-branch network, and extracting the features of the target image to be classified; the non-maximum activation double-branch network comprises a non-maximum activation module and a homogeneous double-branch subnetwork, wherein the non-maximum activation module is used for outputting maximum activation features and non-maximum activation features, the output features are input into the homogeneous double-branch subnetwork, and the homogeneous double-branch subnetwork learns and outputs the target image features to be classified;
based on the obtained target image features to be classified, adopting a classifier to perform class prediction to obtain class prediction results of the target image features to be classified;
and fusing the category prediction results by adopting a preset fusion method to obtain the classification result of the target image to be classified.
As shown in fig. 1, the specific steps are as follows:
step 1: and preprocessing the target image to be classified.
The target image to be classified is scaled to a size of 600 pixels × 600 pixels, and an image area of 448 pixels × 448 pixels in size is cropped centering on the center of the image.
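For illustration, this preprocessing can be sketched with torchvision as follows; the interpolation mode, the placeholder file path and the normalization statistics (ImageNet defaults) are assumptions not specified above:

from PIL import Image
from torchvision import transforms

# Inference-time preprocessing: scale to 600 x 600, then center-crop 448 x 448.
eval_transform = transforms.Compose([
    transforms.Resize((600, 600)),
    transforms.CenterCrop(448),
    transforms.ToTensor(),
    # ImageNet statistics, an assumption of this sketch.
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])

image = Image.open("target.jpg").convert("RGB")   # "target.jpg" is a placeholder path
x = eval_transform(image).unsqueeze(0)            # tensor of shape (1, 3, 448, 448)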
Step 2: constructing and training the non-maximum activation double-branch network.
The non-maximum activation double-branch network comprises an image preprocessing module, a backbone network feature extraction module, a non-maximum activation module and a homogeneous double-branch subnetwork.
Step 2.1: constructing the non-maximum activation double-branch network.
After the first three stages of the conventional Swin Transformer deep neural network, a non-maximum activation module is introduced, and the last stage of the Swin Transformer deep neural network is copied to construct a homogeneous double-branch subnetwork, finally forming the non-maximum activation double-branch network. The constructed network is shown in fig. 2.
Specifically, for the Swin Transformer deep neural network see Ze Liu, Yutong Lin, Yue Cao, Han Hu, Yixuan Wei, Zheng Zhang, Stephen Lin and Baining Guo, Swin Transformer: Hierarchical Vision Transformer using Shifted Windows, [DB/OL] [2021-04-08] https://arxiv.org/abs/2103.14030.
As shown in fig. 3, the non-maximum activation module processes the output characteristics of the first three stages to obtain the maximum activation characteristics and the non-maximum activation characteristics, and sends the maximum activation characteristics and the non-maximum activation characteristics to the homogeneous dual-branch subnetwork for processing.
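As a structural illustration only, this construction can be sketched in PyTorch as below; the attribute name swin_backbone.stages, the NAM interface, and the omitted classifier heads are assumptions, since the text does not fix an implementation:

import copy
import torch.nn as nn

class NMADualBranchNet(nn.Module):
    """Sketch of the non-maximum activation double-branch network:
    Swin stages 1-3 -> NAM -> two homogeneous copies of stage 4."""
    def __init__(self, swin_backbone, nam_module):
        super().__init__()
        # First three stages of the backbone (assumed exposed as a
        # list-like attribute swin_backbone.stages of stage modules).
        self.stages_1to3 = nn.Sequential(*swin_backbone.stages[:3])
        self.nam = nam_module
        # Homogeneous double-branch subnetwork: the last stage is copied.
        self.branch_m = swin_backbone.stages[3]
        self.branch_n = copy.deepcopy(swin_backbone.stages[3])

    def forward(self, x):
        f = self.stages_1to3(x)      # shared features from stages 1-3
        f_m, f_n = self.nam(f)       # maximum / non-maximum activation features
        x1 = self.branch_m(f_m)      # branch over the maximum activation feature
        x2 = self.branch_n(f_n)      # branch over the non-maximum activation feature
        return x1, x2                # X3 = Concat(X1, X2) is formed downstream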
The calculation process of the non-maximum activation module comprises the following steps:
F_m, F_n = NAM(F)    (1)
in formula (1), NAM(·) denotes the module calculation process; F denotes the feature input to the non-maximum activation module, satisfying F ∈ R^{B×L×C}, where B denotes the number of input images, L denotes the feature dimension, and C denotes the number of feature channels; F_m denotes the maximum activation feature output by the non-maximum activation module, satisfying F_m ∈ R^{B×L×C}; F_n denotes the non-maximum activation feature output by the non-maximum activation module, satisfying F_n ∈ R^{B×L×C}.
The maximum activation feature F_m output by the non-maximum activation module is calculated as follows:
the feature F input to the non-maximum activation module is divided equally into k groups along the 2nd dimension, the feature dimension L being required to be an integer multiple of the group number k, giving the i-th group of features F^i ∈ R^{B×(L/k)×C};
the weight matrix W_m^i ∈ R^{B×(L/k)} corresponding to each group of features F^i is calculated by the following formula:
W_m^i(b, l) = exp(A^i(b, l)) / Σ_{j=1}^{k} exp(A^j(b, l))    (2)
in formula (2), W_m^i(b, l) denotes the weight of the l-th dimension of the b-th image, the denominator normalizing the weights over the k blocks; A^i(b, l) denotes the element at position (b, l) in A^i, where A^i denotes the i-th group of feature vectors, given by:
A^i = σ(Conv(GAP(F^i)))    (3)
in formula (3), GAP(·) denotes a global average pooling operation, Conv(·) denotes a convolution operation, and σ(·) denotes the ReLU activation operation;
the weight matrix W_m^i is expanded along the channel dimension to weight the feature F^i, giving the weighted feature F_m^i; the k groups of features F_m^1, ..., F_m^k are spliced to obtain the maximum activation feature F_m.
The non-maximum activation feature F_n output by the non-maximum activation module is calculated as follows:
a maximum suppression operation is performed on each group of features:
W_n^i = Rank_{α,β}(W_m^i)    (4)
in formula (4), Rank_{α,β}(·) denotes a ranking suppression of the weight matrix W_m^i, in which the α largest values of W_m^i are scaled by β, where β ∈ [0, 1] denotes the degree of suppression of the maximum activation feature weights; W_n^i denotes the i-th group non-maximum activation feature weight matrix;
the weight matrix W_n^i is expanded along the channel dimension to weight the feature F^i, giving the weighted feature F_n^i; the k groups of features F_n^1, ..., F_n^k are spliced to obtain the non-maximum activation feature F_n.
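A minimal PyTorch sketch of the module as just described follows; the convolution kernel size, the broadcasting of GAP over channels, and the per-image top-α selection across all blocks are assumptions where the text leaves details open:

import torch
import torch.nn as nn
import torch.nn.functional as F

class NAM(nn.Module):
    """Sketch of the non-maximum activation module, formulas (1)-(4)."""
    def __init__(self, k=4, alpha=8, beta=0.1):
        super().__init__()
        self.k, self.alpha, self.beta = k, alpha, beta
        # 1-D convolution over the grouped feature dimension; kernel size 3
        # is an assumption, the text only names "a convolution operation".
        self.conv = nn.Conv1d(1, 1, kernel_size=3, padding=1)

    def forward(self, x):                          # x: (B, L, C), L divisible by k
        B, L, C = x.shape
        groups = x.chunk(self.k, dim=1)            # k groups, each (B, L/k, C)
        # A^i = ReLU(Conv(GAP(F^i))), formula (3); GAP averages over channels.
        a = [F.relu(self.conv(g.mean(dim=2, keepdim=True).transpose(1, 2))).squeeze(1)
             for g in groups]                      # each (B, L/k)
        a = torch.stack(a, dim=0)                  # (k, B, L/k)
        w_m = torch.softmax(a, dim=0)              # normalize over the k blocks, formula (2)
        # Maximum suppression, formula (4): scale the alpha largest weights by beta.
        flat = w_m.clone().permute(1, 0, 2).reshape(B, -1)    # (B, L)
        top_vals, top_idx = flat.topk(self.alpha, dim=1)
        flat.scatter_(1, top_idx, top_vals * self.beta)
        w_n = flat.reshape(B, self.k, -1).permute(1, 0, 2)    # (k, B, L/k)
        # Expand the weights over the channel dimension and splice the groups.
        f_m = torch.cat([w_m[i].unsqueeze(-1) * g for i, g in enumerate(groups)], dim=1)
        f_n = torch.cat([w_n[i].unsqueeze(-1) * g for i, g in enumerate(groups)], dim=1)
        return f_m, f_n                            # both (B, L, C)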
Step 2.2: training the non-maximum activation double-branch network.
An image for training is input into the non-maximum activation double-branch network; under the guidance of a preset target loss function, the parameters of the non-maximum activation double-branch network are trained with a stochastic gradient descent algorithm to obtain the optimal network parameters.
The images used for training are obtained as follows: the image is scaled to 600 pixels × 600 pixels, a 448 pixel × 448 pixel region is randomly cropped, and the image is horizontally flipped at random with probability 0.5.
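For illustration, the corresponding training-time pipeline in torchvision, under the same assumed normalization as the inference sketch above:

from torchvision import transforms

# Training-time preprocessing: scale to 600 x 600, random 448 x 448 crop,
# random horizontal flip with probability 0.5.
train_transform = transforms.Compose([
    transforms.Resize((600, 600)),
    transforms.RandomCrop(448),
    transforms.RandomHorizontalFlip(p=0.5),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])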
The preset target loss function consists of a cross-entropy classification target loss function and a similarity-measure target loss function, calculated by the following formula:
L = L_CE + λ·L_DB    (5)
in formula (5), L denotes the target loss function; λ denotes the weight between L_CE and L_DB; L_CE denotes the cross-entropy classification target loss function, given by:
L_CE = -(1/B) Σ_{b=1}^{B} log[ exp(W_{y_b}^T f_b) / Σ_{j=1}^{C} exp(W_j^T f_b) ]    (6)
in formula (6), B denotes the number of input images; f_b denotes the feature of the b-th image; y_b denotes the real label of the b-th image, with y_b ∈ {1, 2, ..., C}; W_{y_b} denotes the weight parameter mapping feature f_b to the real label class y_b; W_j denotes the weight parameter mapping feature f_b to the j-th class; C is the total number of classes;
in formula (5), L_DB denotes the similarity-measure target loss function, given by:
L_DB = (1/B) Σ_{b=1}^{B} || S^b - diag(S^b) ||_F^2,  where S^b = F̂^b (F̂^b)^T    (7)
in formula (7), S^b denotes the similarity matrix between the features of the different branches; F̂^b denotes the spliced and standardized features F_m^b and F_n^b of the b-th image; (F̂^b)^T denotes F̂^b after transposition; diag(S^b) denotes extraction of the main-diagonal values of S^b; and B denotes the number of input images.
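A minimal sketch of this objective in PyTorch follows; the cross-entropy term is standard, while the similarity term implements the off-diagonal reading of formula (7) reconstructed above, which is an assumption:

import torch
import torch.nn.functional as F

def target_loss(logits, labels, f_m, f_n, lam=1.0):
    """L = L_CE + lambda * L_DB, formula (5); lam plays the role of lambda."""
    # Cross-entropy classification loss, formula (6).
    l_ce = F.cross_entropy(logits, labels)
    # Similarity-measure loss, formula (7): splice the two branch features,
    # normalize, and penalize the off-diagonal self-similarity per image.
    fb = F.normalize(torch.cat([f_m, f_n], dim=1), dim=2)   # (B, 2L, C)
    s = torch.bmm(fb, fb.transpose(1, 2))                   # (B, 2L, 2L)
    off_diag = s - torch.diag_embed(torch.diagonal(s, dim1=1, dim2=2))
    l_db = off_diag.pow(2).sum(dim=(1, 2)).mean()           # average over the batch
    return l_ce + lam * l_db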
Step 3: inputting the preprocessed target image into the pre-trained non-maximum activation double-branch network, and extracting the features of the target image to be classified.
The non-maximum activation module outputs the maximum activation features and the non-maximum activation features; the output features are input into the homogeneous double-branch subnetwork, which learns from them and outputs the target image features to be classified.
Step 4: based on the obtained target image features to be classified, performing class prediction with a classifier to obtain the class prediction results of the target image features to be classified.
Specifically, the image to be classified I ∈ R^{3×448×448} passes through the non-maximum activation double-branch network to obtain three classification features X_1, X_2 and X_3, and independent classifiers predict class probabilities for the three classification features, giving the class probability prediction results p_1, p_2 and p_3. Here, X_1 denotes the feature of the target image to be classified containing the maximum activation feature, X_2 denotes the feature of the target image to be classified containing the non-maximum activation feature, and X_3 denotes the spliced feature, satisfying X_3 = Concat(X_1, X_2), where Concat(·) denotes a splicing operation along the feature channel dimension.
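For illustration, a sketch of the three independent classifiers follows; linear heads with softmax outputs and equal feature dimensions for X_1 and X_2 are assumptions, as the text only specifies "independent classifiers":

import torch
import torch.nn as nn

class TriHead(nn.Module):
    """Three independent classifiers over X1, X2 and X3 = Concat(X1, X2)."""
    def __init__(self, dim, num_classes):
        super().__init__()
        self.head1 = nn.Linear(dim, num_classes)        # over X1
        self.head2 = nn.Linear(dim, num_classes)        # over X2
        self.head3 = nn.Linear(2 * dim, num_classes)    # over X3

    def forward(self, x1, x2):
        x3 = torch.cat([x1, x2], dim=1)                 # splice along the channel dimension
        p1 = self.head1(x1).softmax(dim=1)
        p2 = self.head2(x2).softmax(dim=1)
        p3 = self.head3(x3).softmax(dim=1)
        return p1, p2, p3                               # class probability predictions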
Step 5: fusing the class prediction results by a preset fusion method to obtain the classification result of the target image to be classified.
Step 5.1: based on the class probability prediction results p_1, p_2 and p_3, the prediction result is obtained by weighted-sum fusion:
p̂_c = (1/M) Σ_{k=1}^{M} p_c^k    (8)
in formula (8), p̂_c denotes the prediction probability of the c-th class after weighted fusion, p_c^k denotes the probability of the c-th class output by the k-th path, C is the number of classes, and M is the total number of paths.
Step 5.2: the classification result of the target image to be classified is the class corresponding to the maximum value in p̂ = [p̂_1, p̂_2, ..., p̂_C], calculated as ŷ = argmax_{c ∈ {1, ..., C}} p̂_c.
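A sketch of steps 5.1 and 5.2, using the equal-weight reading of formula (8) adopted above (other fixed path weights are possible):

import torch

def fuse_and_classify(p1, p2, p3):
    """Weighted-sum fusion of the three path probabilities, then argmax."""
    probs = torch.stack([p1, p2, p3], dim=0)   # (M, B, C), here M = 3
    fused = probs.mean(dim=0)                  # formula (8) with equal weights
    return fused.argmax(dim=1)                 # predicted class per image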
The method can obtain more sufficient target discriminative region features, can solve the problem of insufficient feature extraction by current attention mechanisms, can overcome the difficulty that attention mechanism methods based on convolutional neural networks in the prior art are hard to extend directly to fine-grained classification methods based on the Transformer framework, and can improve the accuracy and robustness of fine-grained classification results.
Embodiment two:
as shown in fig. 4, an embodiment of the present invention provides a fine-grained image classification system based on a dual-branch network, including:
a preprocessing module: used for preprocessing a target image to be classified;
a feature extraction module: used for inputting the preprocessed target image into a pre-trained non-maximum activation double-branch network and extracting the features of the target image to be classified; the non-maximum activation double-branch network comprises a non-maximum activation module and a homogeneous double-branch subnetwork, wherein the non-maximum activation module is used for outputting maximum activation features and non-maximum activation features, the output features are input into the homogeneous double-branch subnetwork, and the homogeneous double-branch subnetwork learns and outputs the target image features to be classified;
a category prediction module: used for performing class prediction with a classifier based on the obtained target image features to be classified, to obtain the class prediction results of the target image features to be classified;
a fusion output module: used for fusing the class prediction results by a preset fusion method to obtain the classification result of the target image to be classified.
Embodiment three:
the embodiment of the invention provides a fine-grained image classification device based on a double-branch network, which comprises a processor and a storage medium, wherein the processor is used for processing images;
the storage medium is used for storing instructions;
the processor is configured to operate in accordance with the instructions to perform the steps of the method of embodiment one.
Embodiment four:
embodiments of the present invention also provide a computer-readable storage medium, on which a computer program is stored, where the computer program, when executed by a processor, implements the steps of the method according to one embodiment.
As will be appreciated by one skilled in the art, embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
The above description is only a preferred embodiment of the present invention, and it should be noted that, for those skilled in the art, several modifications and variations can be made without departing from the technical principle of the present invention, and these modifications and variations should also be regarded as the protection scope of the present invention.

Claims (10)

1. A fine-grained image classification method based on a double-branch network is characterized by comprising the following steps:
preprocessing a target image to be classified;
inputting the preprocessed target image into a pre-trained non-maximum activated double-branch network, and extracting to obtain the characteristics of the target image to be classified; the non-maximum activation double-branch network comprises a non-maximum activation module and a homogeneous double-branch subnetwork, wherein the non-maximum activation module is used for outputting maximum activation characteristics and non-maximum activation characteristics, the output characteristics are input into the homogeneous double-branch subnetwork, the homogeneous double-branch subnetwork learns and outputs target image characteristics to be classified;
based on the obtained target image features to be classified, adopting a classifier to perform class prediction to obtain class prediction results of the target image features to be classified;
and fusing the category prediction results by adopting a preset fusion method to obtain the classification result of the target image to be classified.
2. The fine-grained image classification method based on the double-branch network according to claim 1, wherein the preprocessing of the target image to be classified comprises:
the target image to be classified is scaled to 600 pixels × 600 pixels, and a 448 pixel × 448 pixel image area is cropped centered on the image center.
3. The fine-grained image classification method based on the double-branch network according to claim 1, wherein the non-maximum activated double-branch network comprises an image preprocessing module, a backbone network feature extraction module, a non-maximum activated module and a homogeneous double-branch subnetwork.
4. The fine-grained image classification method based on the double-branch network according to claim 1, characterized in that the non-maximum activated double-branch network is trained by the following steps:
inputting an image for training into the non-maximum activation double-branch network; and, under the guidance of a preset target loss function, training the parameters of the non-maximum activation double-branch network with a stochastic gradient descent algorithm to obtain the optimal network parameters.
5. The fine-grained image classification method based on the dual-branch network according to claim 4, wherein the preset target loss function is:
L = L_CE + λ·L_DB    (1)
in formula (1), L denotes the target loss function; λ denotes the weight between L_CE and L_DB; L_CE denotes the cross-entropy classification target loss function, given by:
L_CE = -(1/B) Σ_{b=1}^{B} log[ exp(W_{y_b}^T f_b) / Σ_{j=1}^{C} exp(W_j^T f_b) ]    (2)
in formula (2), B denotes the number of input images; f_b denotes the feature of the b-th image; y_b denotes the real label of the b-th image, with y_b ∈ {1, 2, ..., C}; W_{y_b} denotes the weight parameter mapping feature f_b to the real label class y_b; W_j denotes the weight parameter mapping feature f_b to the j-th class; C is the total number of classes;
in formula (1), L_DB denotes the similarity-measure target loss function, given by:
L_DB = (1/B) Σ_{b=1}^{B} || S^b - diag(S^b) ||_F^2,  where S^b = F̂^b (F̂^b)^T    (3)
in formula (3), S^b denotes the similarity matrix between the features of the different branches; F̂^b denotes the spliced and standardized features F_m^b and F_n^b of the b-th image; (F̂^b)^T denotes F̂^b after transposition; diag(S^b) denotes extraction of the main-diagonal values of S^b; and B denotes the number of input images.
6. The fine-grained image classification method based on the dual-branch network according to claim 1, wherein the calculation process of the non-maximum activation module comprises:
F_m, F_n = NAM(F)    (4)
in formula (4), NAM(·) denotes the module calculation process; F denotes the feature input to the non-maximum activation module, satisfying F ∈ R^{B×L×C}, where B denotes the number of input images, L denotes the feature dimension, and C denotes the number of feature channels; F_m denotes the maximum activation feature output by the non-maximum activation module, satisfying F_m ∈ R^{B×L×C}; F_n denotes the non-maximum activation feature output by the non-maximum activation module, satisfying F_n ∈ R^{B×L×C}.
7. The fine-grained image classification method based on the double-branch network according to claim 6, wherein the maximum activation feature F_m output by the non-maximum activation module is calculated as follows:
the feature F input to the non-maximum activation module is divided equally into k groups along the 2nd dimension, the feature dimension L being required to be an integer multiple of the group number k, giving the i-th group of features F^i ∈ R^{B×(L/k)×C};
the weight matrix W_m^i ∈ R^{B×(L/k)} corresponding to each group of features F^i is calculated by the following formula:
W_m^i(b, l) = exp(A^i(b, l)) / Σ_{j=1}^{k} exp(A^j(b, l))    (5)
in formula (5), W_m^i(b, l) denotes the weight of the l-th dimension of the b-th image, the denominator normalizing the weights over the k blocks; A^i(b, l) denotes the element at position (b, l) in A^i, where A^i denotes the i-th group of feature vectors, given by:
A^i = σ(Conv(GAP(F^i)))    (6)
in formula (6), GAP(·) denotes a global average pooling operation, Conv(·) denotes a convolution operation, and σ(·) denotes the ReLU activation operation;
the weight matrix W_m^i is expanded along the channel dimension to weight the feature F^i, giving the weighted feature F_m^i; the k groups of features F_m^1, ..., F_m^k are spliced to obtain the maximum activation feature F_m.
8. The fine-grained image classification method based on the double-branch network according to claim 7, wherein the non-maximum activation feature F_n output by the non-maximum activation module is calculated as follows:
a maximum suppression operation is performed on each group of features:
W_n^i = Rank_{α,β}(W_m^i)    (7)
in formula (7), Rank_{α,β}(·) denotes a ranking suppression of the weight matrix W_m^i, in which the α largest values of W_m^i are scaled by β, where β ∈ [0, 1] denotes the degree of suppression of the maximum activation feature weights; W_n^i denotes the i-th group non-maximum activation feature weight matrix;
the weight matrix W_n^i is expanded along the channel dimension to weight the feature F^i, giving the weighted feature F_n^i; the k groups of features F_n^1, ..., F_n^k are spliced to obtain the non-maximum activation feature F_n.
9. The fine-grained image classification method based on the dual-branch network according to claim 1, wherein fusing the class prediction results by a preset fusion method comprises:
the class probability prediction results are denoted p_1, p_2 and p_3, corresponding respectively to the three features X_1, X_2 and X_3 output by the non-maximum activation double-branch network; X_1 denotes the feature of the target image to be classified containing the maximum activation feature, X_2 denotes the feature of the target image to be classified containing the non-maximum activation feature, and X_3 denotes the spliced feature, satisfying X_3 = Concat(X_1, X_2), where Concat(·) denotes a splicing operation along the feature channel dimension;
the prediction result is obtained by weighted-sum fusion, calculated by the following formula:
p̂_c = (1/M) Σ_{k=1}^{M} p_c^k    (8)
in formula (8), p̂_c denotes the prediction probability of the c-th class after weighted fusion, p_c^k denotes the probability of the c-th class output by the k-th path, C is the number of classes, and M denotes the total number of paths;
the classification result of the target image to be classified is the class corresponding to the maximum value in p̂ = [p̂_1, p̂_2, ..., p̂_C], calculated as ŷ = argmax_{c ∈ {1, ..., C}} p̂_c.
10. A fine-grained image classification system based on a double-branch network is characterized by comprising the following components:
a preprocessing module: used for preprocessing a target image to be classified;
a feature extraction module: used for inputting the preprocessed target image into a pre-trained non-maximum activation double-branch network and extracting the features of the target image to be classified; the non-maximum activation double-branch network comprises a non-maximum activation module and a homogeneous double-branch subnetwork, wherein the non-maximum activation module is used for outputting maximum activation features and non-maximum activation features, the output features are input into the homogeneous double-branch subnetwork, and the homogeneous double-branch subnetwork learns and outputs the target image features to be classified;
a category prediction module: used for performing class prediction with a classifier based on the obtained target image features to be classified, to obtain the class prediction results of the target image features to be classified;
a fusion output module: used for fusing the class prediction results by a preset fusion method to obtain the classification result of the target image to be classified.
CN202111175746.2A 2021-10-09 2021-10-09 Fine-grained image classification method and system based on double-branch network Pending CN113902948A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111175746.2A CN113902948A (en) 2021-10-09 2021-10-09 Fine-grained image classification method and system based on double-branch network

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111175746.2A CN113902948A (en) 2021-10-09 2021-10-09 Fine-grained image classification method and system based on double-branch network

Publications (1)

Publication Number Publication Date
CN113902948A true CN113902948A (en) 2022-01-07

Family

ID=79190664

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111175746.2A Pending CN113902948A (en) 2021-10-09 2021-10-09 Fine-grained image classification method and system based on double-branch network

Country Status (1)

Country Link
CN (1) CN113902948A (en)


Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114626476A (en) * 2022-03-21 2022-06-14 北京信息科技大学 Bird fine-grained image recognition method and device based on Transformer and component feature fusion
CN116452896A (en) * 2023-06-16 2023-07-18 中国科学技术大学 Method, system, device and medium for improving fine-grained image classification performance
CN116452896B (en) * 2023-06-16 2023-10-20 中国科学技术大学 Method, system, device and medium for improving fine-grained image classification performance


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination