Deep Transfer Learning Method Using Self-Pixel and Global Channel Attentive Regularization
<p>Block diagram of the proposed regularization method: (<b>a</b>) overview of the training process with the SPA (Self-Pixel Attention) module; (<b>b</b>) overview of the training process with the GCA (Global Channel Attention) module.</p>
<p>Comparison of top-1 accuracy results with different submodules on Caltech 256-60: (<b>a</b>) comparison between the AST [<a href="#B31-sensors-24-03522" class="html-bibr">31</a>] and SPA modules; (<b>b</b>) comparison between the ACT [<a href="#B31-sensors-24-03522" class="html-bibr">31</a>] and GCA modules.</p>
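The captions above describe training with attention modules that regularize fine-tuning. As a rough illustration of the quantity that attention-weighted feature-map regularizers of this family (cf. DELTA [28], AFA [31]) add to the task loss, the following sketch compares source and target feature maps under per-channel attention weights. The function name, shapes, and weighting scheme here are illustrative assumptions, not the paper's exact SPA/GCA formulation:

```python
import numpy as np

def attentive_feature_reg(feat_src, feat_tgt, channel_weights):
    """Attention-weighted feature-map distance (generic sketch).

    feat_src: (C, H, W) activations from the frozen source model.
    feat_tgt: (C, H, W) activations from the model being fine-tuned.
    channel_weights: (C,) non-negative attention weights deciding how
    strongly each channel is pulled back toward the source behavior.
    """
    diff = feat_src - feat_tgt                    # (C, H, W) elementwise gap
    per_channel = np.sum(diff ** 2, axis=(1, 2))  # squared L2 distance per channel
    return float(np.dot(channel_weights, per_channel))

# The regularizer is combined with the task loss via a trade-off factor:
# total_loss = cross_entropy + alpha * attentive_feature_reg(...)
```

When the fine-tuned activations match the source activations exactly, the penalty is zero; channels with larger attention weights are constrained more tightly.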
Abstract
1. Introduction
2. Related Works
3. The Proposed Method
3.1. The Self-Pixel Attention Submodule
3.2. The Global Channel Attention Submodule
3.3. Objective Function for Regularization
4. Details of the Experiments
4.1. Dataset Setup
4.2. The Structure of Network and Hyperparameters
5. Experimental Results
5.1. Performance Comparison
5.2. Ablation Study
6. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
References
- Hussein, S.; Kandel, P.; Bolan, C.W.; Wallace, M.B.; Bagci, U. Lung and pancreatic tumor characterization in the deep learning era: Novel supervised and unsupervised learning approaches. IEEE Trans. Med. Imaging 2019, 38, 1777–1787.
- Ramesh, A.; Pavlov, M.; Goh, G.; Gray, S.; Voss, C.; Radford, A.; Chen, M.; Sutskever, I. Zero-shot text-to-image generation. In Proceedings of the International Conference on Machine Learning, Online, 18–24 July 2021; Volume 139, pp. 8821–8831.
- Feng, D.; Haase-Schütz, C.; Rosenbaum, L.; Hertlein, H.; Gläser, C.; Timm, F.; Wiesbeck, W.; Dietmayer, K. Deep multi-modal object detection and semantic segmentation for autonomous driving: Datasets, methods, and challenges. IEEE Trans. Intell. Transp. Syst. 2020, 22, 1341–1360.
- Kang, C.; Kang, S.-U. Self-supervised denoising image filter based on recursive deep neural network structure. Sensors 2021, 21, 7827.
- Orabona, F.; Jie, L.; Caputo, B. Multi kernel learning with online-batch optimization. J. Mach. Learn. Res. 2012, 13, 227–253.
- Nilsback, M.-E.; Zisserman, A. Automated flower classification over a large number of classes. In Proceedings of the 2008 Sixth Indian Conference on Computer Vision, Graphics & Image Processing, Bhubaneswar, India, 16–19 December 2008; pp. 722–729.
- Cui, Y.; Song, Y.; Sun, C.; Howard, A.; Belongie, S. Large scale fine-grained categorization and domain-specific transfer learning. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 4109–4118.
- He, K.; Zhang, X.; Ren, S.; Sun, J. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 770–778.
- Simonyan, K.; Zisserman, A. Very deep convolutional networks for large-scale image recognition. arXiv 2014, arXiv:1409.1556.
- Deng, J.; Dong, W.; Socher, R.; Li, L.-J.; Li, K.; Li, F.-F. ImageNet: A large-scale hierarchical image database. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Miami, FL, USA, 20–25 June 2009; pp. 248–255.
- Wang, Y.-X.; Ramanan, D.; Hebert, M. Growing a brain: Fine-tuning by increasing model capacity. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 2471–2480.
- Guo, Y.; Shi, H.; Kumar, A.; Grauman, K.; Rosing, T.; Feris, R. SpotTune: Transfer learning through adaptive fine-tuning. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 15–20 June 2019; pp. 4805–4814.
- Tan, T.; Li, Z.; Liu, H.; Zanjani, F.G.; Ouyang, Q.; Tang, Y.; Hu, Z.; Li, Q. Optimize transfer learning for lung diseases in bronchoscopy using a new concept: Sequential fine-tuning. IEEE J. Transl. Eng. Health Med. 2018, 6, 1–8.
- Ge, W.; Yu, Y. Borrowing treasures from the wealthy: Deep transfer learning through selective joint fine-tuning. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 1086–1095.
- Ng, H.-W.; Nguyen, V.D.; Vonikakis, V.; Winkler, S. Deep learning for emotion recognition on small datasets using transfer learning. In Proceedings of the 2015 ACM on International Conference on Multimodal Interaction, Seattle, WA, USA, 9–13 November 2015; pp. 443–449.
- Zhao, W. Research on the deep learning of the small sample data based on transfer learning. AIP Conf. Proc. 2017, 1864, 020018.
- Mohammadian, S.; Karsaz, A.; Roshan, Y.M. Comparative study of fine-tuning of pre-trained convolutional neural networks for diabetic retinopathy screening. In Proceedings of the 2017 24th National and 2nd International Iranian Conference on Biomedical Engineering, Tehran, Iran, 30 November–1 December 2017; pp. 1–6.
- Pratt, H.; Coenen, F.; Broadbent, D.M.; Harding, S.P.; Zheng, Y. Convolutional neural networks for diabetic retinopathy. Procedia Comput. Sci. 2016, 90, 200–205.
- Wang, Z.; Dai, Z.; Poczos, B.; Carbonell, J. Characterizing and avoiding negative transfer. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 15–20 June 2019; pp. 11293–11302.
- Zhao, Z.; Zhang, B.; Jiang, Y.; Xu, L.; Li, L.; Ma, W.-Y. Effective domain knowledge transfer with soft fine-tuning. arXiv 2019, arXiv:1909.02236.
- Li, X.; Grandvalet, Y.; Davoine, F. Explicit inductive bias for transfer learning with convolutional networks. In Proceedings of the 35th International Conference on Machine Learning, Stockholm, Sweden, 10–15 July 2018; Volume 80, pp. 2825–2834.
- Li, X.; Grandvalet, Y.; Davoine, F. A baseline regularization scheme for transfer learning with convolutional neural networks. Pattern Recognit. 2020, 98, 107049.
- Hinton, G.; Vinyals, O.; Dean, J. Distilling the knowledge in a neural network. arXiv 2015, arXiv:1503.02531.
- Romero, A.; Ballas, N.; Kahou, S.E.; Chassang, A.; Gatta, C.; Bengio, Y. FitNets: Hints for thin deep nets. arXiv 2014, arXiv:1412.6550.
- Mirzadeh, S.I.; Farajtabar, M.; Li, A.; Levine, N.; Matsukawa, A.; Ghasemzadeh, H. Improved knowledge distillation via teacher assistant. Proc. AAAI Conf. Artif. Intell. 2020, 34, 5191–5198.
- Li, T.; Li, J.; Liu, Z.; Zhang, C. Few sample knowledge distillation for efficient network compression. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 13–19 June 2020; pp. 14639–14647.
- Li, Z.; Hoiem, D. Learning without forgetting. IEEE Trans. Pattern Anal. Mach. Intell. 2017, 40, 2935–2947.
- Li, X.; Xiong, H.; Wang, H.; Rao, Y.; Liu, L.; Chen, Z.; Huan, J. DELTA: Deep learning transfer using feature map with attention for convolutional networks. arXiv 2019, arXiv:1901.09229.
- Bahdanau, D.; Cho, K.; Bengio, Y. Neural machine translation by jointly learning to align and translate. arXiv 2014, arXiv:1409.0473.
- Mnih, V.; Heess, N.; Graves, A.; Kavukcuoglu, K. Recurrent models of visual attention. In Proceedings of the Neural Information Processing Systems, Montreal, QC, Canada, 8–13 December 2014; pp. 2204–2212.
- Xie, Z.; Wen, Z.; Wang, Y.; Wu, Q.; Tan, M. Towards effective deep transfer via attentive feature alignment. Neural Netw. 2021, 138, 98–109.
- Hu, J.; Shen, L.; Sun, G. Squeeze-and-excitation networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 7132–7141.
- Khosla, A.; Jayadevaprakash, N.; Yao, B.; Li, F.-F. Novel dataset for fine-grained image categorization: Stanford dogs. In Proceedings of the First Workshop on Fine-Grained Visual Categorization in IEEE Conference on Computer Vision and Pattern Recognition, Colorado Springs, CO, USA, 20–25 June 2011; pp. 1–2.
- Griffin, G.; Holub, A.; Perona, P. Caltech-256 Object Category Dataset; California Institute of Technology: Pasadena, CA, USA, 2007.
- Wah, C.; Branson, S.; Welinder, P.; Perona, P.; Belongie, S. The Caltech-UCSD Birds-200-2011 Dataset; California Institute of Technology: Pasadena, CA, USA, 2011.
- Zhou, B.; Lapedriza, A.; Khosla, A.; Oliva, A.; Torralba, A. Places: A 10 million image database for scene recognition. IEEE Trans. Pattern Anal. Mach. Intell. 2017, 40, 1452–1464.
- Quattoni, A.; Torralba, A. Recognizing indoor scenes. In Proceedings of the 2009 IEEE Conference on Computer Vision and Pattern Recognition, Miami, FL, USA, 20–25 June 2009; pp. 413–420.
Target Dataset | Task | Train Samples | Test Samples | Classes |
---|---|---|---|---|
Stanford Dogs 120 | Object Classification | 12,000 | 8580 | 120 |
Caltech 256-30 | Object Classification | 7710 | 5140 | 257 |
Caltech 256-60 | Object Classification | 15,420 | 5140 | 257 |
CUB-200-2011 | Object Classification | 5994 | 5794 | 200 |
MIT Indoor 67 | Scene Classification | 5360 | 1340 | 67 |
Model Name | Layer Type | Parameter | Value |
---|---|---|---|
 | Fully connected | Input size | |
 | | Output size | |
 | | Activation | ReLU |
 | Fully connected | Input size | |
 | | Output size | |
 | Fully connected | Input size | |
 | | Output size | |
 | | Activation | ReLU |
 | Fully connected | Input size | |
 | | Output size | |
Layer Index | ResNet-101 | ResNet-50 | Inception-V3 | MobileNetV2 |
---|---|---|---|---|
1 | Resnet.layer1.2.conv3 | Resnet.layer1.2.conv3 | Conv2d_4a_3×3 | features.5.conv.2 |
2 | Resnet.layer2.3.conv3 | Resnet.layer2.3.conv3 | Mixed_5d | features.9.conv.2 |
3 | Resnet.layer3.22.conv3 | Resnet.layer3.5.conv3 | Mixed_6e | features.13.conv.2 |
4 | Resnet.layer4.2.conv3 | Resnet.layer4.2.conv3 | Mixed_7c | features.17.conv.2 |
Model | Dataset | [28] | [21] | DELTA [28] | AFA [31] | Proposed |
---|---|---|---|---|---|---|
ResNet-101 | Stanford Dogs 120 | | | | | |
 | CUB-200-2011 | | | | | |
 | Caltech 256-30 | | | | | |
 | Caltech 256-60 | | | | | |
ResNet-50 | MIT Indoor 67 | | | | | |
Model | Dataset | AST [31] | ACT [31] | SPA | GCA |
---|---|---|---|---|---|
ResNet-101 | Stanford Dogs 120 | | | | |
 | CUB-200-2011 | | | | |
 | Caltech 256-30 | | | | |
 | Caltech 256-60 | | | | |
ResNet-50 | MIT Indoor 67 | | | | |
Model | Dataset | r = 4 | r = 8 | r = 16 | r = 32 |
---|---|---|---|---|---|
ResNet-101 | Caltech 256-30 | | | | |
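The reduction rate r ablated above corresponds, in squeeze-and-excitation-style channel attention [32], to the bottleneck ratio that compresses the channel descriptor before the per-channel gates are computed. A minimal sketch of such a block follows; the function name and weight shapes are illustrative assumptions rather than the paper's exact GCA implementation:

```python
import numpy as np

def se_channel_attention(feat, w_reduce, w_expand):
    """Squeeze-and-excitation-style channel gating (minimal sketch).

    feat: (C, H, W) feature map.
    w_reduce: (C // r, C) bottleneck FC weights; the reduction rate r
    controls how strongly the channel descriptor is compressed.
    w_expand: (C, C // r) expansion FC weights.
    """
    z = feat.mean(axis=(1, 2))                  # squeeze: global average pool -> (C,)
    h = np.maximum(w_reduce @ z, 0.0)           # excitation: FC + ReLU -> (C // r,)
    s = 1.0 / (1.0 + np.exp(-(w_expand @ h)))   # FC + sigmoid -> gates in (0, 1)
    return feat * s[:, None, None]              # rescale each channel by its gate
```

Larger r shrinks the bottleneck (fewer parameters, coarser channel interactions), while smaller r models channel dependencies more finely at higher cost, which is the trade-off the ablation probes.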
Model | Dataset | SPA→GCA | GCA→SPA |
---|---|---|---|
ResNet-101 | Stanford Dogs 120 | | |
 | CUB-200-2011 | | |
 | Caltech 256-30 | | |
 | Caltech 256-60 | | |
ResNet-50 | MIT Indoor 67 | | |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Kang, C.; Kang, S.-u. Deep Transfer Learning Method Using Self-Pixel and Global Channel Attentive Regularization. Sensors 2024, 24, 3522. https://doi.org/10.3390/s24113522