Abstract
We present a simple and flexible object detection framework optimized for autonomous driving. Building on the observation that point clouds in this application are extremely sparse, we propose a practical pillar-based approach to fix the imbalance issue caused by anchors. In particular, our algorithm incorporates a cylindrical projection into multi-view feature learning, predicts bounding box parameters per pillar rather than per point or per anchor, and includes an aligned pillar-to-point projection module to improve the final prediction. Our anchor-free approach avoids the hyperparameter search associated with past methods, simplifying 3D object detection while significantly improving upon the state of the art.
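To make the two geometric operations named in the abstract concrete, the short Python sketch below illustrates, under simple assumptions, what a bird's-eye-view pillar assignment and a cylindrical projection of a LiDAR sweep can look like. The grid range (150 m x 150 m), pillar size (0.3 m), and function names here are illustrative placeholders chosen for this sketch, not the authors' implementation or parameters.

import numpy as np

def cylindrical_projection(points):
    # Map LiDAR points (x, y, z) to a cylindrical view: the azimuth angle
    # around the sensor and the height z. This is one common way to form a
    # cylindrical/range image from a rotating-LiDAR sweep.
    x, y, z = points[:, 0], points[:, 1], points[:, 2]
    azimuth = np.arctan2(y, x)
    return np.stack([azimuth, z], axis=1)

def assign_points_to_pillars(points, grid_range=(-75.0, 75.0), pillar_size=0.3):
    # Assign each point to a vertical pillar on a bird's-eye-view grid by
    # discretizing its (x, y) coordinates; points outside the grid get -1.
    lo, hi = grid_range
    num_cells = int(np.ceil((hi - lo) / pillar_size))
    cols = np.floor((points[:, 0] - lo) / pillar_size).astype(np.int64)
    rows = np.floor((points[:, 1] - lo) / pillar_size).astype(np.int64)
    valid = (cols >= 0) & (cols < num_cells) & (rows >= 0) & (rows < num_cells)
    return np.where(valid[:, None], np.stack([rows, cols], axis=1), -1)

# Toy usage: a handful of random points in a 150 m x 150 m scene.
pts = np.random.uniform(-75.0, 75.0, size=(5, 3))
print(cylindrical_projection(pts))    # per-point (azimuth, z)
print(assign_points_to_pillars(pts))  # per-point (row, col) pillar index

A detector along these lines computes features per pillar and predicts one box per pillar rather than per anchor, which is what removes the anchor-matching hyperparameters mentioned in the abstract.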
Acknowledgements
Yue Wang, Justin Solomon, and the MIT Geometric Data Processing group acknowledge the generous support of Army Research Office grants W911NF1710068 and W911NF2010168, of Air Force Office of Scientific Research award FA9550-19-1-031, of National Science Foundation grant IIS-1838071, from the MIT–IBM Watson AI Laboratory, from the Toyota–CSAIL Joint Research Center, from gifts from Google and Adobe Systems, and from the Skoltech–MIT Next Generation Program. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of these organizations.
Copyright information
© 2020 Springer Nature Switzerland AG
About this paper
Cite this paper
Wang, Y. et al. (2020). Pillar-Based Object Detection for Autonomous Driving. In: Vedaldi, A., Bischof, H., Brox, T., Frahm, J.M. (eds) Computer Vision – ECCV 2020. ECCV 2020. Lecture Notes in Computer Science, vol. 12367. Springer, Cham. https://doi.org/10.1007/978-3-030-58542-6_2
DOI: https://doi.org/10.1007/978-3-030-58542-6_2
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-58541-9
Online ISBN: 978-3-030-58542-6
eBook Packages: Computer Science, Computer Science (R0)