
DOI: 10.1007/978-3-031-19809-0_35
Article

ParC-Net: Position Aware Circular Convolution with Merits from ConvNets and Transformer

Published: 23 October 2022

Abstract

Recently, vision transformers have begun to show impressive results, significantly outperforming large convolution-based models. However, in the regime of small models for mobile or resource-constrained devices, ConvNets still hold advantages in both performance and model complexity. We propose ParC-Net, a pure ConvNet-based backbone model that further strengthens these advantages by fusing the merits of vision transformers into ConvNets. Specifically, we propose position aware circular convolution (ParC), a light-weight convolution op which boasts a global receptive field while producing location-sensitive features as in local convolutions. We combine ParC and squeeze-excitation ops to form a meta-former-like model block, which further has a transformer-like attention mechanism. The aforementioned block can be used in a plug-and-play manner to replace relevant blocks in ConvNets or transformers. Experimental results show that the proposed ParC-Net achieves better performance than popular light-weight ConvNets and vision-transformer-based models on common vision tasks and datasets, while having fewer parameters and faster inference speed. For classification on ImageNet-1k, ParC-Net achieves 78.6% top-1 accuracy with about 5.0 million parameters, saving 11% of the parameters and 13% of the computational cost while gaining 0.2% higher accuracy and 23% faster inference speed (on an ARM-based Rockchip RK3288) compared with MobileViT, and it uses only 0.5× the parameters of DeiT while gaining 2.7% accuracy. On the MS-COCO object detection and PASCAL VOC segmentation tasks, ParC-Net also shows better performance. Source code is available at https://github.com/hkzhang91/ParC-Net.
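Since the abstract fully describes the idea behind the op, a rough prototype is easy to write down. The PyTorch sketch below is built from the abstract alone, not from the authors' repository: the class names (ParC, ParCBlock), the fixed feature-map sizes, the placement of the position embedding, and the block wiring (horizontal then vertical ParC, a squeeze-excitation gate, a residual connection) are all illustrative assumptions; the reference implementation lives at the GitHub link above.

import torch
import torch.nn as nn
import torch.nn.functional as F


class ParC(nn.Module):
    """Position aware circular convolution along one spatial axis.

    Minimal sketch written from the abstract, not the authors' code:
    the fixed axis size and the position-embedding placement are
    assumptions made for illustration.
    """

    def __init__(self, channels: int, size: int, horizontal: bool = True):
        super().__init__()
        self.size = size
        self.horizontal = horizontal
        kernel = (1, size) if horizontal else (size, 1)
        # Depthwise conv whose kernel spans the whole axis: after
        # circular padding, every output pixel sees all positions on
        # that axis once, giving the global receptive field claimed
        # in the abstract.
        self.conv = nn.Conv2d(channels, channels, kernel, groups=channels)
        shape = (1, channels, 1, size) if horizontal else (1, channels, size, 1)
        # Learnable position embedding keeps the op location sensitive,
        # since circular convolution alone is shift invariant.
        self.pos_embed = nn.Parameter(torch.zeros(*shape))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (B, C, H, W); the convolved axis must equal self.size.
        x = x + self.pos_embed
        # Wrap the axis around so the global kernel slides over a
        # circularly padded row/column; (left, right, top, bottom) order.
        pad = (0, self.size - 1, 0, 0) if self.horizontal else (0, 0, 0, self.size - 1)
        x = F.pad(x, pad, mode="circular")
        return self.conv(x)


class ParCBlock(nn.Module):
    """Meta-former-like block: ParC ops for spatial mixing plus a
    squeeze-excitation op for channel attention. The exact wiring
    shown here (serial H-then-V ParC, SE gate, residual) is a
    hypothetical arrangement, not the paper's specification."""

    def __init__(self, channels: int, h: int, w: int, se_ratio: int = 4):
        super().__init__()
        self.parc_h = ParC(channels, w, horizontal=True)
        self.parc_v = ParC(channels, h, horizontal=False)
        # Squeeze-excitation: global pooling -> bottleneck MLP -> gate.
        self.se = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(channels, channels // se_ratio, 1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // se_ratio, channels, 1),
            nn.Sigmoid(),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        y = self.parc_v(self.parc_h(x))
        return x + y * self.se(y)  # residual add, channel-attention gate

As a shape check, ParCBlock(64, h=14, w=14) maps a tensor of shape (2, 64, 14, 14) to the same shape: each ParC op preserves spatial size because the circular padding adds exactly kernel_size - 1 wrapped positions before the full-axis kernel is applied.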

References

[1]
Bai, J., Lu, F., Zhang, K., et al.: ONNX: open neural network exchange (2019). https://github.com/onnx/onnx
[2]
Chen, G., Wang, Y., Li, H., Dong, W.: TinyNet: a lightweight, modular, and unified network architecture for the Internet of Things. In: Proceedings of the ACM SIGCOMM 2019 Conference Posters and Demos, pp. 9–11 (2019)
[3]
Chen, Y., et al.: Mobile-former: bridging MobileNet and transformer. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 5270–5279 (2022)
[4]
Dai, Z., Liu, H., Le, Q., Tan, M.: CoAtNet: marrying convolution and attention for all data sizes. In: Advances in Neural Information Processing Systems 34 (2021)
[5]
Dosovitskiy, A., et al.: An image is worth 16×16 words: transformers for image recognition at scale. arXiv preprint arXiv:2010.11929 (2020)
[6]
d’Ascoli, S., Touvron, H., Leavitt, M.L., Morcos, A.S., Biroli, G., Sagun, L.: ConViT: improving vision transformers with soft convolutional inductive biases. In: International Conference on Machine Learning, pp. 2286–2296. PMLR (2021)
[7]
Everingham, M., Eslami, S.M.A., Van Gool, L., Williams, C.K.I., Winn, J., Zisserman, A.: The PASCAL visual object classes challenge: a retrospective. Int. J. Comput. Vis. 111(1), 98–136 (2015)
[8]
Graham, B., et al.: LeViT: a vision transformer in ConvNet’s clothing for faster inference. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 12259–12269 (2021)
[9]
Guo, J., et al.: CMT: convolutional neural networks meet vision transformers. arXiv preprint arXiv:2107.06263 (2021)
[10]
Han, K., Wang, Y., Tian, Q., Guo, J., Xu, C., Xu, C.: GhostNet: more features from cheap operations. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 1580–1589 (2020)
[11]
Heo, B., Yun, S., Han, D., Chun, S., Choe, J., Oh, S.J.: Rethinking spatial dimensions of vision transformers. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 11936–11945 (2021)
[12]
Howard, A., et al.: Searching for MobileNetV3. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 1314–1324 (2019)
[13]
Howard, A.G., et al.: MobileNets: efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861 (2017)
[14]
Hu, J., Shen, L., Sun, G.: Squeeze-and-excitation networks. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 7132–7141 (2018)
[15]
Huang, Z., Wang, X., Huang, L., Huang, C., Wei, Y., Liu, W.: CCNet: criss-cross attention for semantic segmentation. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 603–612 (2019)
[16]
Jaderberg, M., Simonyan, K., Zisserman, A., et al.: Spatial transformer networks. In: Advances in Neural Information Processing Systems 28 (2015)
[17]
Jiang, X., et al.: MNN: a universal and efficient inference engine. In: Proceedings of Machine Learning and Systems, vol. 2, pp. 1–13 (2020)
[18]
Li, Y., et al.: MicroNet: improving image recognition with extremely low FLOPs. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 468–477 (2021)
[19]
Lin, T.-Y., et al.: Microsoft COCO: common objects in context. In: Fleet, D., Pajdla, T., Schiele, B., Tuytelaars, T. (eds.) Computer Vision – ECCV 2014, pp. 740–755. Springer, Cham (2014)
[20]
Liu, W., et al.: SSD: single shot MultiBox detector. In: Leibe, B., Matas, J., Sebe, N., Welling, M. (eds.) Computer Vision – ECCV 2016, pp. 21–37. Springer, Cham (2016)
[21]
Liu, Z., et al.: Swin transformer: hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021)
[22]
Liu, Z., Mao, H., Wu, C.Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 11976–11986 (2022)
[23]
Loshchilov, I., Hutter, F.: Decoupled weight decay regularization. ICLR (2019)
[24]
Ma, N., Zhang, X., Zheng, H.-T., Sun, J.: ShuffleNet V2: practical guidelines for efficient CNN architecture design. In: Ferrari, V., Hebert, M., Sminchisescu, C., Weiss, Y. (eds.) Computer Vision – ECCV 2018, pp. 122–138. Springer, Cham (2018)
[25]
Mehta, S., Rastegari, M.: MobileViT: light-weight, general-purpose, and mobile-friendly vision transformer. ICLR (2022)
[26]
Polyak, B.T., Juditsky, A.B.: Acceleration of stochastic approximation by averaging. SIAM J. Control Optim. 30(4), 838–855 (1992)
[27]
Sandler, M., Howard, A., Zhu, M., Zhmoginov, A., Chen, L.C.: MobileNetV2: inverted residuals and linear bottlenecks. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 4510–4520 (2018)
[28]
Srinivas, A., Lin, T.Y., Parmar, N., Shlens, J., Abbeel, P., Vaswani, A.: Bottleneck transformers for visual recognition. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 16519–16529 (2021)
[29]
Sun, C., Shrivastava, A., Singh, S., Gupta, A.: Revisiting unreasonable effectiveness of data in deep learning era. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 843–852 (2017)
[30]
Szegedy, C., Vanhoucke, V., Ioffe, S., Shlens, J., Wojna, Z.: Rethinking the inception architecture for computer vision. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2818–2826 (2016)
[31]
Tan, M., et al.: MnasNet: platform-aware neural architecture search for mobile. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2820–2828 (2019)
[32]
Tan, M., Le, Q.: EfficientNet: rethinking model scaling for convolutional neural networks. In: International Conference on Machine Learning, pp. 6105–6114. PMLR (2019)
[33]
Touvron, H., Cord, M., Douze, M., Massa, F., Sablayrolles, A., Jégou, H.: Training data-efficient image transformers & distillation through attention. In: International Conference on Machine Learning, pp. 10347–10357. PMLR (2021)
[34]
Vaswani, A., et al.: Attention is all you need. In: Advances in Neural Information Processing Systems, pp. 5998–6008 (2017)
[35]
Wang, W., et al.: Pyramid vision transformer: a versatile backbone for dense prediction without convolutions. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 568–578 (2021)
[36]
Woo, S., Park, J., Lee, J.-Y., Kweon, I.S.: CBAM: convolutional block attention module. In: Ferrari, V., Hebert, M., Sminchisescu, C., Weiss, Y. (eds.) Computer Vision – ECCV 2018, pp. 3–19. Springer, Cham (2018)
[37]
Wu, H., et al.: CvT: introducing convolutions to vision transformers. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 22–31 (2021)
[38]
Xiao, T., Dollar, P., Singh, M., Mintun, E., Darrell, T., Girshick, R.: Early convolutions help transformers see better. In: Advances in Neural Information Processing Systems 34 (2021)
[39]
Xie, S., Girshick, R., Dollár, P., Tu, Z., He, K.: Aggregated residual transformations for deep neural networks. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1492–1500 (2017)
[40]
Yu, W., et al.: MetaFormer is actually what you need for vision. arXiv preprint arXiv:2111.11418 (2021)

Cited By

  • (2023) Lightweight vision transformer with bidirectional interaction. In: Proceedings of the 37th International Conference on Neural Information Processing Systems, pp. 15234–15251. DOI: 10.5555/3666122.3666792. Online publication date: 10-Dec-2023
  • (2023) SkaNet: Split Kernel Attention Network. In: Artificial Neural Networks and Machine Learning – ICANN 2023, pp. 459–473. DOI: 10.1007/978-3-031-44192-9_37. Online publication date: 26-Sep-2023


Published In

Computer Vision – ECCV 2022: 17th European Conference, Tel Aviv, Israel, October 23–27, 2022, Proceedings, Part XXVI
Oct 2022, 814 pages
ISBN: 978-3-031-19808-3
DOI: 10.1007/978-3-031-19809-0

Publisher

Springer-Verlag, Berlin, Heidelberg


Author Tags

1. Light-weight
2. Edge devices
3. Pure ConvNet
4. Vision transformer
