Abstract
Self-driving cars have been developing rapidly worldwide in recent years. Deep-learning-based, monocular-vision environmental perception is regarded as a feasible and sophisticated solution for both ADAS and self-driving cars, offering near human-level performance at low cost. The perceived surroundings generally include lane markings, curbs, drivable roads, intersections, obstacles, traffic signs, and landmarks used for navigation. Reliable detection or segmentation of the drivable road, in turn, provides a solid foundation for obstacle detection during autonomous driving. This paper proposes RPP, a 112-layer model for monocular vision-based road detection that incorporates residual learning and pyramid pooling into a fully convolutional network; in other words, RPP is a deep fully convolutional residual neural network with pyramid pooling. To substantially improve prediction accuracy on the KITTI-ROAD detection task, whose training set is small, we present a new strategy that adds road edge labels and introduces appropriate data augmentation. Experiments demonstrate that RPP achieves remarkable results, ranking second on both the unmarked-road and marked-road tasks, fifth on the multiple-marked-lane task, and third on the combined task. The results suggest that adding more labels and introducing appropriate data augmentation help to cope with small training sets, and that larger crop sizes, or the combination with more global information, further improve road segmentation accuracy. If the computing and memory restrictions imposed by a network as large as RPP are disregarded, using raw images instead of crops and selecting a large batch size are expected to increase road detection accuracy even further.
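To make the two architectural ingredients named above concrete, the following is a minimal PyTorch sketch of a residual block and a pyramid pooling head inside a small fully convolutional segmentation network. All class names, channel widths, pooling bin sizes, and the three-class labeling (background / road / road edge) are illustrative assumptions for readability; this is not the authors' 112-layer RPP configuration.

```python
# Illustrative sketch only: a residual block and a pyramid pooling head of the
# kind RPP combines (FCN + residual learning + pyramid pooling). Layer counts,
# channel widths, and bin sizes are assumptions, not the published model.
import torch
import torch.nn as nn
import torch.nn.functional as F


class ResidualBlock(nn.Module):
    """Basic residual block with an identity shortcut (He et al. style)."""
    def __init__(self, channels):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(channels)

    def forward(self, x):
        out = F.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        return F.relu(out + x)  # identity shortcut


class PyramidPooling(nn.Module):
    """Pool features at several scales, then concatenate the upsampled pooled
    maps with the original features to inject global context."""
    def __init__(self, in_channels, bin_sizes=(1, 2, 3, 6)):
        super().__init__()
        branch_ch = in_channels // len(bin_sizes)
        self.branches = nn.ModuleList(
            nn.Sequential(
                nn.AdaptiveAvgPool2d(b),
                nn.Conv2d(in_channels, branch_ch, 1, bias=False),
                nn.BatchNorm2d(branch_ch),
                nn.ReLU(inplace=True),
            )
            for b in bin_sizes
        )
        self.out_channels = in_channels + branch_ch * len(bin_sizes)

    def forward(self, x):
        h, w = x.shape[2:]
        pooled = [F.interpolate(branch(x), size=(h, w), mode="bilinear",
                                align_corners=False) for branch in self.branches]
        return torch.cat([x] + pooled, dim=1)


class TinyRoadSegNet(nn.Module):
    """Toy fully convolutional road-segmentation net: residual encoder,
    pyramid pooling, and a 1x1 classifier upsampled to input resolution."""
    def __init__(self, num_classes=2, width=64, num_blocks=4):
        super().__init__()
        self.stem = nn.Sequential(
            nn.Conv2d(3, width, 7, stride=2, padding=3, bias=False),
            nn.BatchNorm2d(width), nn.ReLU(inplace=True),
            nn.MaxPool2d(3, stride=2, padding=1),
        )
        self.blocks = nn.Sequential(*[ResidualBlock(width) for _ in range(num_blocks)])
        self.ppm = PyramidPooling(width)
        self.classifier = nn.Conv2d(self.ppm.out_channels, num_classes, 1)

    def forward(self, x):
        feats = self.ppm(self.blocks(self.stem(x)))
        logits = self.classifier(feats)
        # FCN-style upsampling back to input size for per-pixel prediction
        return F.interpolate(logits, size=x.shape[2:], mode="bilinear",
                             align_corners=False)


if __name__ == "__main__":
    net = TinyRoadSegNet(num_classes=3)  # e.g. background / road / road edge (hypothetical labels)
    scores = net(torch.randn(1, 3, 192, 640))
    print(scores.shape)  # torch.Size([1, 3, 192, 640])
```

The extra road-edge class in the toy example mirrors the label-addition strategy described in the abstract, but the actual label definitions and training pipeline follow the paper, not this sketch.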
Acknowledgments
The authors are grateful to the reviewers for their valuable comments that considerably contributed to improving this paper.
Funding
This work was supported in part by the National Natural Science Foundation of China (NSFC) under Grant Nos. 91420106, 90820305, and 60775040, and by the research fund of the Tsinghua University-Tencent Joint Laboratory for Internet Innovation Technology.
Ethics declarations
Conflict of Interest
Xiaolong Liu and Zhidong Deng declare that they have no conflict of interest.
Informed Consent
Informed consent was not required as no humans or animals were involved.
Human and Animal Rights
This article does not contain any studies with human participants or animals performed by any of the authors.
About this article
Cite this article
Liu, X., Deng, Z. Segmentation of Drivable Road Using Deep Fully Convolutional Residual Network with Pyramid Pooling. Cogn Comput 10, 272–281 (2018). https://doi.org/10.1007/s12559-017-9524-y
Received:
Accepted:
Published:
Issue Date:
DOI: https://doi.org/10.1007/s12559-017-9524-y