
DOI: 10.1007/978-3-031-19809-0_35
Article

ParC-Net: Position Aware Circular Convolution with Merits from ConvNets and Transformer

Published: 23 October 2022

Abstract

Recently, vision transformers have begun to show impressive results, significantly outperforming large convolution-based models. However, in the regime of small models for mobile or resource-constrained devices, ConvNets still hold advantages in both performance and model complexity. We propose ParC-Net, a pure ConvNet-based backbone model that further strengthens these advantages by fusing the merits of vision transformers into ConvNets. Specifically, we propose position aware circular convolution (ParC), a light-weight convolution op which boasts a global receptive field while producing location-sensitive features as in local convolutions. We combine ParC and squeeze-excitation ops to form a meta-former-like model block, which further has a transformer-like attention mechanism. The aforementioned block can be used in a plug-and-play manner to replace relevant blocks in ConvNets or transformers. Experimental results show that the proposed ParC-Net achieves better performance than popular light-weight ConvNets and vision-transformer-based models on common vision tasks and datasets, while having fewer parameters and faster inference speed. For classification on ImageNet-1k, ParC-Net achieves 78.6% top-1 accuracy with about 5.0 million parameters, saving 11% of the parameters and 13% of the computational cost while gaining 0.2% higher accuracy and 23% faster inference speed (on an ARM-based Rockchip RK3288) compared with MobileViT, and it uses only 0.5× the parameters of DeiT while gaining 2.7% accuracy. On the MS-COCO object detection and PASCAL VOC segmentation tasks, ParC-Net also shows better performance. Source code is available at https://github.com/hkzhang91/ParC-Net.
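Since the abstract fully describes the idea behind the op, a rough prototype is easy to write down. The PyTorch sketch below is built from the abstract alone, not from the authors' repository: the class names (ParC, ParCBlock), the fixed feature-map sizes, the placement of the position embedding, and the block wiring (horizontal then vertical ParC, a squeeze-excitation gate, a residual connection) are all illustrative assumptions; the reference implementation lives at the GitHub link above.

import torch
import torch.nn as nn
import torch.nn.functional as F


class ParC(nn.Module):
    """Position aware circular convolution along one spatial axis.

    Minimal sketch written from the abstract, not the authors' code:
    the fixed axis size and the position-embedding placement are
    assumptions made for illustration.
    """

    def __init__(self, channels: int, size: int, horizontal: bool = True):
        super().__init__()
        self.size = size
        self.horizontal = horizontal
        kernel = (1, size) if horizontal else (size, 1)
        # Depthwise conv whose kernel spans the whole axis: after
        # circular padding, every output pixel sees all positions on
        # that axis once, giving the global receptive field claimed
        # in the abstract.
        self.conv = nn.Conv2d(channels, channels, kernel, groups=channels)
        shape = (1, channels, 1, size) if horizontal else (1, channels, size, 1)
        # Learnable position embedding keeps the op location sensitive,
        # since circular convolution alone is shift invariant.
        self.pos_embed = nn.Parameter(torch.zeros(*shape))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (B, C, H, W); the convolved axis must equal self.size.
        x = x + self.pos_embed
        # Wrap the axis around so the global kernel slides over a
        # circularly padded row/column; (left, right, top, bottom) order.
        pad = (0, self.size - 1, 0, 0) if self.horizontal else (0, 0, 0, self.size - 1)
        x = F.pad(x, pad, mode="circular")
        return self.conv(x)


class ParCBlock(nn.Module):
    """Meta-former-like block: ParC ops for spatial mixing plus a
    squeeze-excitation op for channel attention. The exact wiring
    shown here (serial H-then-V ParC, SE gate, residual) is a
    hypothetical arrangement, not the paper's specification."""

    def __init__(self, channels: int, h: int, w: int, se_ratio: int = 4):
        super().__init__()
        self.parc_h = ParC(channels, w, horizontal=True)
        self.parc_v = ParC(channels, h, horizontal=False)
        # Squeeze-excitation: global pooling -> bottleneck MLP -> gate.
        self.se = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(channels, channels // se_ratio, 1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // se_ratio, channels, 1),
            nn.Sigmoid(),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        y = self.parc_v(self.parc_h(x))
        return x + y * self.se(y)  # residual add, channel-attention gate

As a shape check, ParCBlock(64, h=14, w=14) maps a tensor of shape (2, 64, 14, 14) to the same shape: each ParC op preserves spatial size because the circular padding adds exactly kernel_size - 1 wrapped positions before the full-axis kernel is applied.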

References

[1]
Bai, J., Lu, F., Zhang, K., et al.: ONNX: open neural network exchange (2019). https://github.com/onnx/onnx
[2]
Chen, G., Wang, Y., Li, H., Dong, W.: TinyNet: a lightweight, modular, and unified network architecture for the Internet of Things. In: Proceedings of the ACM SIGCOMM 2019 Conference Posters and Demos, pp. 9–11 (2019)
[3]
Chen, Y., et al.: Mobile-former: bridging MobileNet and transformer. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 5270–5279 (2022)
[4]
Dai, Z., Liu, H., Le, Q., Tan, M.: CoAtNet: marrying convolution and attention for all data sizes. In: Advances in Neural Information Processing Systems 34 (2021)
[5]
Dosovitskiy, A., et al.: An image is worth 16×16 words: transformers for image recognition at scale. arXiv preprint arXiv:2010.11929 (2020)
[6]
d’Ascoli, S., Touvron, H., Leavitt, M.L., Morcos, A.S., Biroli, G., Sagun, L.: ConViT: improving vision transformers with soft convolutional inductive biases. In: International Conference on Machine Learning, pp. 2286–2296. PMLR (2021)
[7]
Everingham, M., Eslami, S.M.A., Van Gool, L., Williams, C.K.I., Winn, J., Zisserman, A.: The PASCAL visual object classes challenge: a retrospective. Int. J. Comput. Vis. 111(1), 98–136 (2015)
[8]
Graham, B., et al.: LeViT: a vision transformer in ConvNet’s clothing for faster inference. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 12259–12269 (2021)
[9]
Guo, J., et al.: CMT: convolutional neural networks meet vision transformers. arXiv preprint arXiv:2107.06263 (2021)
[10]
Han, K., Wang, Y., Tian, Q., Guo, J., Xu, C., Xu, C.: GhostNet: more features from cheap operations. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 1580–1589 (2020)
[11]
Heo, B., Yun, S., Han, D., Chun, S., Choe, J., Oh, S.J.: Rethinking spatial dimensions of vision transformers. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 11936–11945 (2021)
[12]
Howard, A., et al.: Searching for MobileNetV3. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 1314–1324 (2019)
[13]
Howard, A.G., et al.: MobileNets: efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861 (2017)
[14]
Hu, J., Shen, L., Sun, G.: Squeeze-and-excitation networks. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 7132–7141 (2018)
[15]
Huang, Z., Wang, X., Huang, L., Huang, C., Wei, Y., Liu, W.: CCNet: criss-cross attention for semantic segmentation. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 603–612 (2019)
[16]
Jaderberg, M., Simonyan, K., Zisserman, A., et al.: Spatial transformer networks. In: Advances in Neural Information Processing Systems 28 (2015)
[17]
Jiang, X., et al.: MNN: a universal and efficient inference engine. In: Proceedings of Machine Learning and Systems, vol. 2, pp. 1–13 (2020)
[18]
Li, Y., et al.: MicroNet: improving image recognition with extremely low FLOPs. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 468–477 (2021)
[19]
Lin, T.-Y., et al.: Microsoft COCO: common objects in context. In: Fleet, D., Pajdla, T., Schiele, B., Tuytelaars, T. (eds.) Computer Vision – ECCV 2014, pp. 740–755. Springer, Cham (2014)
[20]
Liu, W., et al.: SSD: single shot MultiBox detector. In: Leibe, B., Matas, J., Sebe, N., Welling, M. (eds.) Computer Vision – ECCV 2016, pp. 21–37. Springer, Cham (2016)
[21]
Liu, Z., et al.: Swin transformer: hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021)
[22]
Liu, Z., Mao, H., Wu, C.Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 11976–11986 (2022)
[23]
Loshchilov, I., Hutter, F.: Decoupled weight decay regularization. ICLR (2019)
[24]
Ma, N., Zhang, X., Zheng, H.-T., Sun, J.: ShuffleNet V2: practical guidelines for efficient CNN architecture design. In: Ferrari, V., Hebert, M., Sminchisescu, C., Weiss, Y. (eds.) Computer Vision – ECCV 2018, pp. 122–138. Springer, Cham (2018)
[25]
Mehta, S., Rastegari, M.: MobileViT: light-weight, general-purpose, and mobile-friendly vision transformer. ICLR (2022)
[26]
Polyak, B.T., Juditsky, A.B.: Acceleration of stochastic approximation by averaging. SIAM J. Control Optim. 30(4), 838–855 (1992)
[27]
Sandler, M., Howard, A., Zhu, M., Zhmoginov, A., Chen, L.C.: MobileNetV2: inverted residuals and linear bottlenecks. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 4510–4520 (2018)
[28]
Srinivas, A., Lin, T.Y., Parmar, N., Shlens, J., Abbeel, P., Vaswani, A.: Bottleneck transformers for visual recognition. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 16519–16529 (2021)
[29]
Sun, C., Shrivastava, A., Singh, S., Gupta, A.: Revisiting unreasonable effectiveness of data in deep learning era. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 843–852 (2017)
[30]
Szegedy, C., Vanhoucke, V., Ioffe, S., Shlens, J., Wojna, Z.: Rethinking the inception architecture for computer vision. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2818–2826 (2016)
[31]
Tan, M., et al.: MnasNet: platform-aware neural architecture search for mobile. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2820–2828 (2019)
[32]
Tan, M., Le, Q.: EfficientNet: rethinking model scaling for convolutional neural networks. In: International Conference on Machine Learning, pp. 6105–6114. PMLR (2019)
[33]
Touvron, H., Cord, M., Douze, M., Massa, F., Sablayrolles, A., Jégou, H.: Training data-efficient image transformers & distillation through attention. In: International Conference on Machine Learning, pp. 10347–10357. PMLR (2021)
[34]
Vaswani, A., et al.: Attention is all you need. In: Advances in Neural Information Processing Systems, pp. 5998–6008 (2017)
[35]
Wang, W., et al.: Pyramid vision transformer: a versatile backbone for dense prediction without convolutions. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 568–578 (2021)
[36]
Woo, S., Park, J., Lee, J.-Y., Kweon, I.S.: CBAM: convolutional block attention module. In: Ferrari, V., Hebert, M., Sminchisescu, C., Weiss, Y. (eds.) Computer Vision – ECCV 2018, pp. 3–19. Springer, Cham (2018)
[37]
Wu, H., et al.: CvT: introducing convolutions to vision transformers. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 22–31 (2021)
[38]
Xiao, T., Dollar, P., Singh, M., Mintun, E., Darrell, T., Girshick, R.: Early convolutions help transformers see better. In: Advances in Neural Information Processing Systems 34 (2021)
[39]
Xie, S., Girshick, R., Dollár, P., Tu, Z., He, K.: Aggregated residual transformations for deep neural networks. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1492–1500 (2017)
[40]
Yu, W., et al.: MetaFormer is actually what you need for vision. arXiv preprint arXiv:2111.11418 (2021)

Cited By

  • (2023) Lightweight vision transformer with bidirectional interaction. In: Proceedings of the 37th International Conference on Neural Information Processing Systems, pp. 15234–15251. DOI: 10.5555/3666122.3666792. Online publication date: 10-Dec-2023
  • (2023) SkaNet: Split Kernel Attention Network. In: Artificial Neural Networks and Machine Learning – ICANN 2023, pp. 459–473. DOI: 10.1007/978-3-031-44192-9_37. Online publication date: 26-Sep-2023


Published In

Computer Vision – ECCV 2022: 17th European Conference, Tel Aviv, Israel, October 23–27, 2022, Proceedings, Part XXVI
Oct 2022, 814 pages
ISBN: 978-3-031-19808-3
DOI: 10.1007/978-3-031-19809-0

Publisher

Springer-Verlag, Berlin, Heidelberg


Author Tags

1. Light-weight
2. Edge devices
3. Pure ConvNet
4. Vision transformer
