Efficient Joint-Dimensional Search with Solution Space Regularization for Real-Time Semantic Segmentation

Peng Ye¹,
Baopu Li²,
Tao Chen¹ (ORCID: orcid.org/0000-0002-6498-0138),
Jiayuan Fan³,
Zhen Mei¹,
Chen Lin⁵,
Chongyan Zuo⁴,
Qinghua Chi⁴ &
Wanli Ouyang⁶,⁷

Abstract

Semantic segmentation is a popular research topic in computer vision, and many efforts have been devoted to it with impressive results. In this paper, we aim to search for an optimal network structure that can run in real time for this problem. Towards this goal, we jointly search the depth, channel, dilation rate and feature spatial resolution, which results in a search space consisting of about $2.78\times 10^{324}$ possible choices. To handle such a large search space, we leverage differentiable architecture search methods. However, the architecture parameters found by existing differentiable methods need to be discretized, which causes a discretization gap between the continuous architecture parameters and the discretized version used as the final solution of the architecture search. Hence, we alleviate the discretization gap from the new perspective of solution space regularization. Specifically, a novel Solution Space Regularization (SSR) loss is first proposed to effectively encourage the supernet to converge to its discretized counterpart. Then, a new Hierarchical and Progressive Solution Space Shrinking method is presented to further improve search efficiency. In addition, we theoretically show that optimizing the SSR loss is equivalent to $L_{0}$-norm regularization, which accounts for the reduced search-evaluation gap. Comprehensive experiments show that the proposed search scheme can efficiently find an optimal network structure that yields an extremely fast segmentation speed (175 FPS) with a small model size (1 M) while maintaining comparable accuracy.
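The discretization gap described in the abstract can be illustrated with a minimal sketch. It assumes a single searchable choice whose architecture parameters are relaxed with a softmax and then discretized by argmax, as is common in differentiable architecture search. The helper names (`discretization_gap`, `one_hot_penalty`) and the example values are hypothetical. The penalty `1 - sum(w^2)` is only an illustrative stand-in that vanishes exactly at one-hot weights; it is not the SSR loss of the paper, whose definition is not given in this abstract.

```python
import math


def softmax(logits):
    """Continuous relaxation: map architecture parameters onto the simplex."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def discretize(weights):
    """Argmax discretization: one-hot vector selecting the largest weight."""
    best = max(range(len(weights)), key=lambda i: weights[i])
    return [1.0 if i == best else 0.0 for i in range(len(weights))]


def discretization_gap(weights):
    """L1 distance between the relaxed weights and their one-hot version."""
    one_hot = discretize(weights)
    return sum(abs(w - h) for w, h in zip(weights, one_hot))


def one_hot_penalty(weights):
    """Illustrative stand-in regularizer (NOT the paper's SSR loss).

    For weights on the simplex, 1 - sum(w^2) is zero only when the
    weights are exactly one-hot, so minimizing it pushes the relaxed
    solution towards a discrete one.
    """
    return 1.0 - sum(w * w for w in weights)


# Hypothetical architecture parameters for one choice with four candidates.
alpha_flat = [0.1, 0.0, -0.1, 0.05]    # nearly uniform after softmax
alpha_peaked = [8.0, 0.0, -1.0, 0.5]   # nearly one-hot after softmax

for name, alpha in [("flat", alpha_flat), ("peaked", alpha_peaked)]:
    w = softmax(alpha)
    print(name, "gap:", discretization_gap(w), "penalty:", one_hot_penalty(w))
```

In this sketch the nearly uniform parameters give a large gap and a large penalty, while the peaked parameters give values close to zero, which is the behavior a regularizer on the solution space is meant to encourage.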

Acknowledgements

This work is supported by the National Natural Science Foundation of China (Nos. 62071127, U1909207), the Shanghai Municipal Science and Technology Major Project (No. 2021SHZDZX0103), and the Zhejiang Lab Project (No. 2021KH0AB05).

Author information

Authors and Affiliations

School of Information Science and Technology, Fudan University, Shanghai, China
Peng Ye, Tao Chen & Zhen Mei
Oracle Health and AI, Oracle, USA
Baopu Li
Academy for Engineering and Technology, Fudan University, Shanghai, China
Jiayuan Fan
Huawei Inc. China, Huawei, China
Chongyan Zuo & Qinghua Chi
University of Oxford, Oxford, England
Chen Lin
University of Sydney, Sydney, Australia
Wanli Ouyang
Shanghai AI Laboratory, Shanghai, China
Wanli Ouyang

Corresponding author

Correspondence to Tao Chen.

Additional information

Communicated by Minsu Cho.

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Baopu Li: Co-first author.

Supplementary Information

Below is the link to the electronic supplementary material.

Supplementary file 1 (pdf 192 KB)

Rights and permissions

Springer Nature or its licensor holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Ye, P., Li, B., Chen, T. et al. Efficient Joint-Dimensional Search with Solution Space Regularization for Real-Time Semantic Segmentation. Int J Comput Vis 130, 2674–2694 (2022). https://doi.org/10.1007/s11263-022-01663-z

Received: 09 May 2021
Accepted: 16 July 2022
Published: 24 August 2022
Issue Date: November 2022
DOI: https://doi.org/10.1007/s11263-022-01663-z
