DOI: 10.1145/3512353.3512359

Deep Learning for Fine-Grained Image Recognition: A Comprehensive Study

Published: 14 March 2022 Publication History

Abstract

In computer vision, image recognition is an active and rapidly developing research area. Its principal task is to automatically predict which predefined categories an image belongs to. Traditional image recognition aims to classify images into diverse, highly distinguishable categories. In contrast, Fine-Grained Image Recognition (FGIR) aims to recognize the differences among images in subordinate classes, e.g., species of birds, types of cars, or species of flowers, which are in certain respects equivalent to “species” in taxonomy. As a result, FGIR models are required to extract features at a finer granularity. Conventional methods apply special feature encodings to find discriminative attributes, while recent FGIR methods have made great advances with the help of deep learning, which has itself developed remarkably. In this paper, we provide a new synthesis of the current leading FGIR models, organized by how they advance FGIR. We classify them into five main categories, compare their performance on three popular datasets, and analyze the results. To encourage further development of this topic, we point out some open problems worth exploring.

Published In

APIT '22: Proceedings of the 2022 4th Asia Pacific Information Technology Conference
January 2022
239 pages
ISBN:9781450395571
DOI:10.1145/3512353
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. Computer Vision
  2. Convolutional Neural Network
  3. Deep Learning
  4. Fine-Grained Image Recognition

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Funding Sources

  • Research and Development Plan in Shandong Province

Conference

APIT 2022
APIT 2022: 2022 4th Asia Pacific Information Technology Conference
January 14 - 16, 2022
Virtual Event, Thailand

Cited By

  • (2024) SCGRFuse: An infrared and visible image fusion network based on spatial/channel attention mechanism and gradient aggregation residual dense blocks. Engineering Applications of Artificial Intelligence, 132 (107898). DOI: 10.1016/j.engappai.2024.107898. Online publication date: Jun-2024
  • (2024) Enhanced transfer learning with data augmentation. Engineering Applications of Artificial Intelligence, 129:C. DOI: 10.1016/j.engappai.2023.107602. Online publication date: 16-May-2024
  • (2024) Deep learning convolutional neural network ResNet101 and radiomic features accurately analyzes mpMRI imaging to predict MGMT promoter methylation status with transfer learning approach. International Journal of Imaging Systems and Technology, 34:2. DOI: 10.1002/ima.23059. Online publication date: 24-Mar-2024
  • (2023) A Diabetes Prediction System Based on Incomplete Fused Data Sources. Machine Learning and Knowledge Extraction, 5:2 (384-399). DOI: 10.3390/make5020023. Online publication date: 10-Apr-2023
  • (2022) Learning Transferable Feature Representation with Swin Transformer for Object Recognition. Neural Processing Letters, 55:3 (2211-2223). DOI: 10.1007/s11063-022-11004-3. Online publication date: 27-Aug-2022
  • (2022) Vascular Enhancement Analysis in Lightweight Deep Feature Space. Neural Processing Letters, 55:3 (2305-2320). DOI: 10.1007/s11063-022-10937-z. Online publication date: 2-Jul-2022
