Deep Learning for Fine-Grained Image Recognition: A Comprehensive Study

Authors:

Dianhui Chu

APIT '22: Proceedings of the 2022 4th Asia Pacific Information Technology Conference

Pages 31 - 39

https://doi.org/10.1145/3512353.3512359

Published: 14 March 2022

Abstract

In computer vision, image recognition is an active and rapidly developing research area. Its principal task is to automatically predict which of a set of pre-defined categories an image belongs to. Traditional image recognition classifies images into diverse, clearly distinguishable categories. Fine-Grained Image Recognition (FGIR), by contrast, aims to recognize the differences among images in subordinate classes, e.g., species of birds, types of cars, or species of flowers, which are in some respects equivalent to “species” in taxonomy. As a result, FGIR models must extract features at a finer granularity. Conventional methods apply specialized feature encodings to find discriminative attributes, while recent FGIR methods have advanced considerably with the help of deep learning, which has itself developed remarkably in recent years. In this paper, we provide a new organization of the current leading FGIR models according to how they advance FGIR. We classify them into five main categories, compare their performance on three popular datasets, and analyze the results. To encourage further development of this topic, we point out some open problems worth exploring.

Information & Contributors

Information

Published In

APIT '22: Proceedings of the 2022 4th Asia Pacific Information Technology Conference

January 2022

239 pages

ISBN: 9781450395571

DOI: 10.1145/3512353

Copyright © 2022 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article
Research
Refereed limited

Funding Sources

Research and Development Plan in Shandong Province

Conference

APIT 2022

APIT 2022: 2022 4th Asia Pacific Information Technology Conference

January 14 - 16, 2022

Virtual Event, Thailand

Cited By

Wang Y, Pu J, Miao D, Zhang L, Zhang L, Du X (2024). SCGRFuse: An infrared and visible image fusion network based on spatial/channel attention mechanism and gradient aggregation residual dense blocks. Engineering Applications of Artificial Intelligence, 132 (107898). Online publication date: Jun-2024.
https://doi.org/10.1016/j.engappai.2024.107898
Su J, Yu X, Wang X, Wang Z, Chao G (2024). Enhanced transfer learning with data augmentation. Engineering Applications of Artificial Intelligence, 129:C. Online publication date: 16-May-2024.
https://dl.acm.org/doi/10.1016/j.engappai.2023.107602
Shim S, Hussain L, Aziz W, Alshdadi A, Alzahrani A, Omar A (2024). Deep learning convolutional neural network ResNet101 and radiomic features accurately analyzes mpMRI imaging to predict MGMT promoter methylation status with transfer learning approach. International Journal of Imaging Systems and Technology, 34:2. Online publication date: 24-Mar-2024.
https://doi.org/10.1002/ima.23059
Yuan Z, Ding H, Chao G, Song M, Wang L, Ding W, Chu D (2023). A Diabetes Prediction System Based on Incomplete Fused Data Sources. Machine Learning and Knowledge Extraction, 5:2 (384-399). Online publication date: 10-Apr-2023.
https://doi.org/10.3390/make5020023
Ren J, Xiong Y, Xie X, Dai Y (2022). Learning Transferable Feature Representation with Swin Transformer for Object Recognition. Neural Processing Letters, 55:3 (2211-2223). Online publication date: 27-Aug-2022.
https://dl.acm.org/doi/10.1007/s11063-022-11004-3
Chai T, Li J, Wang Y, Sun G, Guo C, Zhang Z (2022). Vascular Enhancement Analysis in Lightweight Deep Feature Space. Neural Processing Letters, 55:3 (2305-2320). Online publication date: 2-Jul-2022.
https://dl.acm.org/doi/10.1007/s11063-022-10937-z
