MRDA-MGFSNet: Network Based on a Multi-Rate Dilated Attention Mechanism and Multi-Granularity Feature Sharer for Image-Based Butterflies Fine-Grained Classification
Figure 1. Differences comparison. (a,b) Butterfly images with complex backgrounds; (c–g) butterfly species with similar features.
Figure 2. The overall structure of the MRDA-MGFSNet.
Figure 3. Multi-rate Dilated Attention Mechanism.
Figure 4. Examples of dilated convolution with different coefficients. (a) Coefficient 1; (b) coefficient 2; (c) coefficient 4.
Figure 5. Multi-granularity Feature Sharer structure composition.
Figure 6. Detailed structure of the Multi-granularity Feature Sharer.
Figure 7. Depthwise separable convolution structure.
Figure 8. Loss curves of the various methods.
Figure 9. Gradient-weighted Class Activation Mapping (Grad-CAM) visualization of the proposed MRDA-MGFSNet.
Figure 10. Confusion matrices of selected networks.
Figure 11. Noise-polluted dataset.
Abstract
1. Introduction
- (1)
- To address the high background complexity of some butterfly images and the difficulty of identifying species with small inter-class variance, we propose MRDA-MGFSNet, designed as follows:
- A Multi-rate Dilated Attention Mechanism with a symmetrical structure, suited to fine-grained butterfly classification, is proposed. This module assigns different weights to channel and spatial features so as to retain the most important butterfly features and discard redundant information such as complex natural backgrounds. At the same time, it integrates dilated convolutions with different rates to expand the network's receptive field and capture rich context information, which is effective when butterfly images must be recognized against the interference of a complex background.
- A Multi-granularity Feature Sharer is designed. This module effectively integrates the overall features of butterflies and preserves and extracts fine feature information such as similar spots and patterns. While addressing the recognition problem posed by the small inter-class variance of butterfly spots, it connects a two-dimensional channel-by-channel convolution with a three-dimensional point-by-point convolution, which compensates for the increase in parameters caused by the multi-scale structure, saves training time, and improves the efficiency of the network.
- (2)
- The method in this paper obtains a mAP of 96.64% for the recognition of five categories of butterflies, and the F1 value reaches 95.44%. It distinguishes butterflies with similar patterns, spots, and other features well, and it performs well for butterfly classification in complex natural environments. This enables butterfly experts and scholars to apply the technology in the field of butterfly identification to record and study butterfly habits, helping to protect butterflies from extinction and, ultimately, the ecosystem from damage.
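The receptive-field gain from multi-rate dilated convolution mentioned above can be checked with a few lines of arithmetic. This sketch is illustrative, not code from the paper: it computes the effective kernel size k + (k − 1)(d − 1) of a 3 × 3 kernel at the dilation coefficients 1, 2, and 4 shown in Figure 4.

```python
# Effective kernel size of a k x k convolution with dilation (rate) d.
# Illustrative arithmetic, not code from the paper.
def effective_kernel_size(k: int, d: int) -> int:
    return k + (k - 1) * (d - 1)

# A 3x3 kernel at dilation coefficients 1, 2 and 4 covers progressively
# larger regions without adding any parameters:
for d in (1, 2, 4):
    print(d, effective_kernel_size(3, d))
# d = 1 covers 3x3, d = 2 covers 5x5, d = 4 covers 9x9
```

This is why stacking branches with different rates enlarges the visual field at no extra parameter cost.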
2. Related Work
3. Experimental Materials and Methods
3.1. Data Acquisition
3.2. MRDA-MGFSNet
- The first part was used to extract features: 64 convolution kernels of size 7 × 7 with a stride of 2, whose purpose was to quickly extract edge features and reduce the image to half its original size. A 3 × 3 max-pooling layer then retained the main features while reducing the number of parameters and calculations, preventing over-fitting, and improving the generalization ability of the model.
- The second part was composed of 16 MRDA and MGFS modules (explained in detail below). The MGFS module consisted of two 1 × 1 convolutions and four 3 × 3 convolutions at different scales, which attended to small feature information such as the similar spots and patterns of butterflies. Each 3 × 3 convolution used a two-dimensional channel-by-channel convolution followed by a three-dimensional point-by-point convolution in order to reduce the number of parameter calculations and speed up network training. The MRDA module first passed the input through three dilated convolutions with rates of 1, 2, and 3, respectively; it then applied a channel attention mechanism consisting of a max-pooling layer and two 1 × 1 convolutional layers, followed by a spatial attention mechanism consisting of an average-pooling layer, a max-pooling layer, and two 3 × 3 convolutions. By assigning different weights to channel and spatial features, the module distinguished similar patterns in the butterfly's feature maps, suppressed background information that resembled butterfly features but was invalid, and enhanced the expressive ability of the network. Finally, the feature map obtained in the first layer was added to the output of the attention mechanism, and the PReLU activation function was used to enhance the nonlinear expression ability of the network.
- In the last part, an average-pooling down-sampling layer was connected to a fully connected layer, and finally the output was converted into a probability distribution through softmax to obtain the classification result of the butterfly image.
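The halving behaviour of the 7 × 7 stride-2 stem in the first part can be verified with the standard output-size formula ⌊(n + 2p − k)/s⌋ + 1. The 224 × 224 input size and the padding values below are common defaults assumed for illustration, not values stated in the paper.

```python
# Spatial output size of a convolution or pooling layer:
# floor((n + 2*p - k) / s) + 1.
# Input size 224 and the padding values are assumed defaults.
def out_size(n: int, k: int, s: int, p: int) -> int:
    return (n + 2 * p - k) // s + 1

n = 224
n = out_size(n, k=7, s=2, p=3)  # 7x7 conv, stride 2: 224 -> 112 (half size)
print(n)
n = out_size(n, k=3, s=2, p=1)  # 3x3 max pool, stride 2: 112 -> 56
print(n)
```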
3.2.1. Multi-Rate Dilated Attention Mechanism (MRDA)
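Based on the description in Section 3.2, the MRDA block can be sketched in PyTorch as follows. Only the three dilation rates, the pooling and convolution ingredients of the two attention mechanisms, and the residual PReLU output come from the text; the branch-fusion strategy (summation), the channel-reduction ratio, and the exact attention wiring are assumptions made for illustration.

```python
import torch
import torch.nn as nn

class MRDA(nn.Module):
    """Sketch of the Multi-rate Dilated Attention module: three parallel
    dilated convolutions (rates 1, 2, 3), channel attention, spatial
    attention, and a residual connection with PReLU. Fusion by summation
    and the reduction ratio are assumptions, not the paper's choices."""

    def __init__(self, channels: int):
        super().__init__()
        # Three parallel 3x3 dilated convolutions; padding = rate keeps H x W.
        self.branches = nn.ModuleList(
            nn.Conv2d(channels, channels, 3, padding=r, dilation=r)
            for r in (1, 2, 3)
        )
        # Channel attention: global max pooling + two 1x1 convolutions.
        self.channel_att = nn.Sequential(
            nn.AdaptiveMaxPool2d(1),
            nn.Conv2d(channels, channels // 4, 1),
            nn.PReLU(),
            nn.Conv2d(channels // 4, channels, 1),
            nn.Sigmoid(),
        )
        # Spatial attention: avg- and max-pooled maps fused by two 3x3 convs.
        self.spatial_att = nn.Sequential(
            nn.Conv2d(2, 1, 3, padding=1),
            nn.Conv2d(1, 1, 3, padding=1),
            nn.Sigmoid(),
        )
        self.act = nn.PReLU()

    def forward(self, x):
        y = sum(branch(x) for branch in self.branches)  # fuse multi-rate branches
        y = y * self.channel_att(y)                     # reweight channels
        pooled = torch.cat(
            [y.mean(dim=1, keepdim=True), y.max(dim=1, keepdim=True).values],
            dim=1,
        )
        y = y * self.spatial_att(pooled)                # reweight positions
        return self.act(x + y)                          # residual + PReLU

x = torch.randn(2, 16, 28, 28)
print(MRDA(16)(x).shape)  # spatial size and channel count are preserved
```

Because every branch preserves the spatial size, the module can be dropped between stages without changing the surrounding tensor shapes.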
3.2.2. Multi-Granularity Feature Sharer (MGFS)
- (1)
- Generally, larger convolution kernels perceive large target objects better, while small kernels are better at extracting the features of small targets. However, the quality of the butterfly images varied: some were butterfly specimens with little background information, while others had complex backgrounds in which the target was not easy to find. Therefore, we added branches with receptive fields of different sizes, using convolution kernels of 3 × 3, 5 × 5, and 7 × 7, to improve recognition accuracy.
- (2)
- The MGFS structure divided the feature maps obtained after the 1 × 1 convolution evenly into four scales, and the 3 × 3 convolutions used depthwise separable convolution to reduce the number of parameters and calculations.
- (3)
- The PReLU activation function was used in place of the ReLU or Sigmoid activation function to improve the learning convergence of the network.
- (4)
- As the number of butterfly images was relatively small, group normalization (GN), which is not affected by batch size, was used in place of the batch normalization (BN) layer to improve network convergence; the batch size was set to 10.
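The parameter saving from replacing a standard 3 × 3 convolution with the channel-by-channel plus point-by-point pair described above can be quantified directly. This is illustrative arithmetic (weights only, biases ignored; the 256-channel example is an assumption, not a layer size from the paper).

```python
# Weight counts for a k x k convolution from c_in to c_out channels
# (biases ignored). The 256-channel example is illustrative only.
def standard_conv_weights(c_in: int, c_out: int, k: int = 3) -> int:
    return c_in * c_out * k * k

def depthwise_separable_weights(c_in: int, c_out: int, k: int = 3) -> int:
    # channel-by-channel (depthwise) k x k conv + point-by-point 1x1 conv
    return c_in * k * k + c_in * c_out

std = standard_conv_weights(256, 256)        # 589,824 weights
sep = depthwise_separable_weights(256, 256)  # 67,840 weights
print(std, sep, f"{std / sep:.1f}x fewer")
```

At this width the separable form uses roughly an eighth of the weights, which is consistent with the reduced training time reported in Section 4.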
4. Results and Analysis
4.1. Experimental Environment and Preparation
4.2. Results and Analysis
- a. Ablation experiment
- b. Comparison experiment with the latest methods
- c. Noise processing capability experiment
5. Conclusions
- Ablation experiments showed that the MRDA produced better results (+5.19%, +5.45%) for the butterfly categories with more complex backgrounds resembling their own features, and that the MGFS had a good recognition effect (+4.84%, +4.43%, +3.64%) for the three categories of butterflies for which spots carry the important information and patterns are few. Under the same experimental conditions, compared with the multi-scale network, the training time of the MGFS module (with the depthwise separable convolution module) was reduced by 1 h 38 min 29 s. These results show that the two architectures proposed in this paper achieved the expected experimental results and can effectively solve the problems of complex backgrounds and small inter-class variance between butterflies.
- Compared with some current state-of-the-art fine-grained classification methods, our mAP reached 96.64% and the average F1 value reached 95.44%, so the designed butterfly fine-grained classification method achieves better performance. The method was effective, with clear advantages, at identifying the different patterns and spots in butterfly images and at removing complex interference information from the background. In the noise processing capability experiment, our model achieved an accuracy of 93.57% and an F1 value of 93.64%, only 2.03% and 2.17% lower, respectively, than before noise was added, showing that our model handles noisy images well. It can therefore be applied to butterfly recognition to better protect important butterfly species for ecological protection in the future.
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
References
- Lowe, D.G. Object recognition from local scale-invariant features. In Proceedings of the Seventh IEEE International Conference on Computer Vision, Kerkyra, Greece, 20–27 September 1999; Volume 2, pp. 1150–1157. [Google Scholar] [CrossRef]
- Dalal, N.; Triggs, B. Histograms of Oriented Gradients for Human Detection. In Proceedings of the IEEE Computer Society Conference on Computer Vision & Pattern Recognition, San Diego, CA, USA, 20–25 June 2005; Volume 1, pp. 886–893. [Google Scholar]
- Jégou, H.; Douze, M.; Schmid, C.; Pérez, P. Aggregating local descriptors into a compact image representation. In Proceedings of the 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, San Francisco, CA, USA, 13–18 June 2010; pp. 3304–3311. [Google Scholar] [CrossRef] [Green Version]
- Sánchez, J.; Perronnin, F.; Mensink, T.; Verbeek, J. Image Classification with the Fisher Vector: Theory and Practice. Int. J. Comput. Vis. 2013, 105, 222–245. [Google Scholar] [CrossRef]
- Le Cun, Y.; Boser, B.; Denker, J.S.; Henderson, D.; Howard, R.E.; Hubbard, W.; Jackel, L.D. Backpropagation Applied to Handwritten Zip Code Recognition. Neural Comput. 1989, 1, 541–551. [Google Scholar] [CrossRef]
- Xin, D.; Chen, Y.-W.; Li, J. Fine-Grained Butterfly Classification in Ecological Images Using Squeeze-And-Excitation and Spatial Attention Modules. Appl. Sci. 2020, 10, 1681. [Google Scholar] [CrossRef] [Green Version]
- Tan, A.; Zhou, G.; He, M. Rapid Fine-Grained Classification of Butterflies Based on FCM-KM and Mask R-CNN Fusion. IEEE Access 2020, 8, 124722–124733. [Google Scholar] [CrossRef]
- Zhang, J.W. Automatic Identification of Butterflies Based on Computer Vision Technology; China Agriculture University: Beijing, China, 2006. [Google Scholar] [CrossRef]
- Liu, F. The Application of Wings’ Color Characters in Butterfly Species Automatic Identification; China Agricultural University: Beijing, China, 2007. [Google Scholar] [CrossRef]
- Kaya, Y.; Kayci, L.; Tekin, R. A computer vision system for the automatic identification of butterfly species via Gabor-filter-based texture features and extreme learning machine: GF + ELM. TEM J. 2013, 2, 13–20. [Google Scholar]
- Kaya, Y.; Kayci, L. Application of artificial neural network for automatic detection of butterfly species using color and texture features. Vis. Comput. 2013, 30, 71–79. [Google Scholar] [CrossRef]
- Kang, S.-H.; Cho, J.-H.; Lee, S.-H. Identification of butterfly based on their shapes when viewed from different angles using an artificial neural network. J. Asia-Pac. Èntomol. 2014, 17, 143–149. [Google Scholar] [CrossRef]
- Hernández-Serna, A.; Jiménez-Segura, L.F. Automatic identification of species with neural networks. PeerJ 2014, 2, e563. [Google Scholar] [CrossRef] [PubMed] [Green Version]
- Zhou, A.M.; Ma, P.P.; Xi, T.Y.; Jiang-Ning, W.; Jin, F.; Ze-Zhong, S.; Yu-Lei, T.; Qing, Y. Automatic identification of butterfly specimen images at the family level based on deep learning method. Acta Entomol. Sin. 2017, 60, 1339–1348. [Google Scholar] [CrossRef]
- Juan-Ying, X.; Qi, H.; Ying-Huan, S.; Peng, L.; Jing, L.; Zhuang, F.; Zhang, J.; Tang, X.; Xu, S. The automatic identification of butterfly species. J. Comput. Res. Dev. 2018, 55, 1609–1618. [Google Scholar] [CrossRef]
- Tan, A.; Zhou, G.; He, M. Surface defect identification of Citrus based on KF-2D-Renyi and ABC-SVM. Multimed. Tools Appl. 2021, 80, 9109–9136. [Google Scholar] [CrossRef]
- Chen, X.; Zhou, G.; Chen, A.; Yi, J.; Zhang, W.; Hu, Y. Identification of tomato leaf diseases based on combination of ABCK-BWTR and B-ARNet. Comput. Electron. Agric. 2020, 178, 105730. [Google Scholar] [CrossRef]
- Huang, S.; Zhou, G.; He, M.; Chen, A.; Zhang, W.; Hu, Y. Detection of Peach Disease Image Based on Asymptotic Non-Local Means and PCNN-IPELM. IEEE Access 2020, 8, 136421–136433. [Google Scholar] [CrossRef]
- Zhu, X.; Cheng, D.; Zhang, Z.; Lin, S.; Dai, J. An Empirical Study of Spatial Attention Mechanisms in Deep Networks. In Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision (ICCV), Institute of Electrical and Electronics Engineers (IEEE), Seoul, Korea, 27 October–2 November 2019; pp. 6687–6696. [Google Scholar]
- Hu, J.; Shen, L.; Albanie, S.; Sun, G.; Wu, E. Squeeze-and-Excitation Networks. IEEE Trans. Pattern Anal. Mach. Intell. 2020, 42, 2011–2023. [Google Scholar] [CrossRef] [Green Version]
- Chollet, F. Xception: Deep Learning with Depthwise Separable Convolutions. In Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA, 21–26 July 2017; pp. 1800–1807. [Google Scholar] [CrossRef] [Green Version]
- Krizhevsky, A.; Sutskever, I.; Hinton, G.E. Imagenet classification with deep convolutional neural networks. Adv. Neural. Inf. Process. Syst. 2012, 25, 1097–1105. [Google Scholar] [CrossRef]
- Simonyan, K.; Zisserman, A. Very deep convolutional networks for large-scale image recognition. arXiv 2014, arXiv:1409.1556. [Google Scholar]
- Huang, G.; Liu, Z.; Van Der Maaten, L.; Weinberger, K.Q. Densely Connected Convolutional Networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 4700–4708. [Google Scholar]
- He, K.; Zhang, X.; Ren, S.; Sun, J. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 770–778. [Google Scholar]
- Yang, Z.; Luo, T.; Wang, D.; Wang, D.; Hu, Z.; Gao, J.; Wang, L. Learning to Navigate for Fine-Grained Classification. In European Conference on Computer Vision; Springer: Cham, Switzerland, 2018. [Google Scholar]
- Wang, Y.; Morariu, V.I.; Davis, L.S. Learning a Discriminative Filter Bank within a CNN for Fine-grained Recognition. In Proceedings of the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–22 June 2018; pp. 4148–4157. [Google Scholar]
- Li, X.; Wu, J.; Sun, Z.; Ma, Z.; Cao, J.; Xue, J.-H. BSNet: Bi-Similarity Network for Few-shot Fine-grained Image Classification. IEEE Trans. Image Process. 2021, 30, 1318–1331. [Google Scholar] [CrossRef]
Example | Number | Proportion (%)
---|---|---
Argynnis hyperbius | 3014 | 19.10
Monarch butterfly | 2983 | 18.90
Polygonia caureum | 3205 | 20.31
Danaus genutia | 3311 | 20.98
Papilio machaon | 3271 | 20.72
Network | Argynnis hyperbius | Monarch butterfly | Polygonia caureum | Danaus genutia | Papilio machaon | Overall | Training Time |
---|---|---|---|---|---|---|---|
CNN | 84.41% | 68.84% | 76.29% | 83.69% | 83.94% | 79.57% | 4 h 28 min 03 s |
AlexNet-fc6 | 87.56% | 71.52% | 78.63% | 86.71% | 85.17% | 82.04% | 4 h 43 min 24 s |
VGG-16 | 87.40% | 69.85% | 78.63% | 85.35% | 85.32% | 81.44% | 5 h 54 min 27 s |
DenseNet-161 | 90.38% | 76.55% | 86.12% | 87.31% | 87.61% | 85.68% | 5 h 31 min 16 s |
Resnet-50 | 91.21% | 82.58% | 82.68% | 87.76% | 85.32% | 85.90% | 6 h 15 min 04 s |
Basic | 91.38% | 88.11% | 85.96% | 91.39% | 90.83% | 89.55% | 6 h 01 min 24 s |
Multi-scale | 93.37% | 89.45% | 86.12% | 91.84% | 91.44% | 90.44% | 7 h 21 min 11 s |
MGFS | 94.53% | 92.46% | 90.80% | 93.81% | 94.65% | 93.26% | 5 h 42 min 42 s |
MRDA | 95.02% | 93.30% | 91.41% | 95.47% | 95.26% | 94.11% | 5 h 51 min 16 s |
NTS-Net [26] | 90.88% | 88.27% | 91.11% | 91.84% | 88.38% | 90.12% | 8 h 12 min 50 s |
DFL-Net [27] | 92.70% | 90.12% | 89.24% | 91.09% | 91.28% | 90.88% | 5 h 56 min 43 s |
BSNet [28] | 93.53% | 91.12% | 90.95% | 91.24% | 92.66% | 91.89% | 6 h 04 min 52 s |
MGFS + MRDA (MRDA-MGFSNet) | 96.85% | 93.63% | 93.14% | 97.13% | 96.48% | 95.47% | 5 h 58 min 20 s |
Network | Argynnis hyperbius | Monarch butterfly | Polygonia caureum | Danaus genutia | Papilio machaon | Average F1 Value |
---|---|---|---|---|---|---|
CNN | 84.51% | 68.10% | 76.33% | 82.68% | 83.46% | 79.01% |
AlexNet-fc6 | 87.69% | 72.02% | 78.34% | 85.98% | 85.63% | 81.93% |
VGG-16 | 87.66% | 69.27% | 78.94% | 86.06% | 85.41% | 81.47% |
DenseNet-161 | 90.72% | 75.88% | 86.26% | 87.85% | 87.14% | 85.57% |
Resnet-50 | 91.36% | 81.73% | 83.06% | 87.46% | 85.92% | 85.91% |
Basic | 90.86% | 87.28% | 85.49% | 92.67% | 89.97% | 89.25% |
MGFS | 92.76% | 92.54% | 92.68% | 93.88% | 94.29% | 93.23% |
MRDA | 93.32% | 93.45% | 93.46% | 95.11% | 95.04% | 94.08% |
NTS-Net | 88.10% | 88.27% | 91.25% | 92.54% | 90.17% | 90.07% |
DFL-Net | 89.44% | 90.34% | 90.08% | 91.85% | 92.56% | 90.85% |
BSNet | 90.17% | 91.43% | 92.32% | 91.31% | 94.17% | 91.88% |
MRDA-MGFSNet | 95.42% | 94.50% | 94.92% | 95.83% | 96.55% | 95.44% |
Network | mAP(%) |
---|---|
CNN | 78.45 |
AlexNet-fc6 | 81.24 |
VGG | 82.03 |
DenseNet-161 | 84.38 |
Resnet-50 | 86.55 |
NTS-Net | 90.23 |
DFL-Net | 89.27 |
BSNet | 91.09 |
MRDA-MGFSNet (ours) | 96.64 |
Network Accuracy/F1 | Argynnis hyperbius | Monarch butterfly | Polygonia caureum | Danaus genutia | Papilio machaon | Overall |
---|---|---|---|---|---|---|
Before adding noise | 96.19/97.22 | 93.80/94.52 | 92.98/93.35 | 97.58/97.41 | 97.25/96.57 | 95.60/95.81 |
After adding noise | 94.69/94.88 | 92.46/93.03 | 90.80/91.28 | 94.11/93.84 | 95.72/95.16 | 93.57/93.64 |
© 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Share and Cite
Li, M.; Zhou, G.; Cai, W.; Li, J.; Li, M.; He, M.; Hu, Y.; Li, L. MRDA-MGFSNet: Network Based on a Multi-Rate Dilated Attention Mechanism and Multi-Granularity Feature Sharer for Image-Based Butterflies Fine-Grained Classification. Symmetry 2021, 13, 1351. https://doi.org/10.3390/sym13081351