EPGNet: Enhanced Point Cloud Generation for 3D Object Detection
Figure 1. Cars with symmetry points. From left to right, the images show easy-, moderate-, and hard-level cars before and after generating symmetry points.
Figure 2. The overall architecture of our network.
Figure 3. UNet-like encoder–decoder. The encoder downsamples the feature volume by eight times, and the decoder restores it to the original scale and extracts non-empty voxel-wise features.
Figure 4. Region proposal network (RPN) head. The same-colored boxes represent the features after the convolution of the same type.
Figure 5. AP metric of 3D and BEV car detection at the moderate level compared with the two-stage methods in Table 3.
Figure 6. Detection results. Each scene consists of three sub-images: a 3D point cloud (upper left), a BEV map (upper right), and the corresponding image (below). The red boxes represent the detection results and the green boxes represent the ground truth.
Figure 7. Detection results with our network (left) and the baseline (right).
Figure 8. Comparison of error detection between our network (left) and the baseline (right).
Figure 9. The regression effect of our network (left) and the baseline (right). Some of the details are circled and magnified.
Figure 10. Symmetry point generation results. The symmetry points are yellow, while the foreground points are green.
Abstract
1. Introduction
- We propose a new task of predicting the positions of symmetry points, which completes the missing symmetric parts of objects in the point cloud and thus enables better object detection.
- We propose a simple method for computing symmetry point labels: the position of a symmetry point in the point cloud coordinate system can be derived indirectly from its relative position inside the 3D ground truth box (see the sketch after this list).
- Our detector achieves competitive performance in both 3D and BEV (bird's eye view) detection and runs at 14 FPS, faster than many other two-stage methods.
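As a concrete illustration of the label computation in the second bullet, below is a minimal sketch that mirrors the observed foreground points of a car across the vertical symmetry plane of its ground-truth box, assuming the usual lidar box parameterization (center plus yaw about the vertical axis). The function and variable names are ours, not the paper's:

```python
import numpy as np

def symmetry_point_labels(points, box_center, box_yaw):
    """Mirror foreground points across the vertical symmetry plane of their
    ground-truth box to obtain symmetry point labels.

    points:     (N, 3) foreground points of one car, lidar frame
    box_center: (3,)   box center in the lidar frame
    box_yaw:    float  box rotation around the vertical (z) axis
    """
    c, s = np.cos(box_yaw), np.sin(box_yaw)
    # Rotation taking lidar-frame offsets into the box-local frame
    # (local x: car length axis, local y: lateral axis).
    R = np.array([[c, s, 0.0],
                  [-s, c, 0.0],
                  [0.0, 0.0, 1.0]])
    local = (points - box_center) @ R.T   # relative position inside the box
    local[:, 1] *= -1.0                   # reflect across the local x-z plane
    return local @ R + box_center         # back to the lidar frame
```

Because a car is roughly left-right symmetric about its local x-z plane, the mirrored positions of the observed foreground points can serve directly as regression targets for the symmetry point generation network.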
2. Related Work
2.1. Deep Learning on Point Cloud Feature Representation
2.2. Three-Dimensional Object Detection with Multiple Sensors
2.3. Three-Dimensional Object Detection with Lidar Only
3. EPGNet for Point Cloud Object Detection
3.1. Motivation
3.2. Network Architecture
3.2.1. Data Processing
3.2.2. Symmetry Point Generation Network
3.2.3. Region Proposal Network
3.2.4. Loss Function
4. Experiments
4.1. Implementation Details
4.1.1. KITTI Dataset
4.1.2. Network Details
4.1.3. Training Details
4.2. Comparisons on the KITTI Validation Set
4.3. Ablation Studies
4.3.1. The Upper-Bound Performance of Our EPGNet
4.3.2. Hyperparameters for Training and Testing
4.3.3. Location Prediction Loss
4.4. Analysis of the Results
4.4.1. Detection of More Cars
4.4.2. Less False Detection
4.4.3. Better Regression
4.4.4. High-Quality Symmetry Point Generation
5. Discussion
6. Conclusions
Author Contributions
Funding
Conflicts of Interest
References
- Girshick, R.; Donahue, J.; Darrell, T.; Malik, J. Rich feature hierarchies for accurate object detection and semantic segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Columbus, OH, USA, 24–27 June 2014; pp. 580–587.
- Girshick, R. Fast R-CNN. In Proceedings of the IEEE International Conference on Computer Vision, Santiago, Chile, 13–16 December 2015; pp. 1440–1448.
- Ren, S.; He, K.; Girshick, R.; Sun, J. Faster R-CNN: Towards real-time object detection with region proposal networks. In Proceedings of the Advances in Neural Information Processing Systems, Montreal, QC, Canada, 7–12 December 2015; pp. 91–99.
- Redmon, J.; Divvala, S.; Girshick, R.; Farhadi, A. You only look once: Unified, real-time object detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 26 June–1 July 2016; pp. 779–788.
- Redmon, J.; Farhadi, A. YOLO9000: Better, faster, stronger. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 7263–7271.
- Redmon, J.; Farhadi, A. YOLOv3: An incremental improvement. arXiv 2018, arXiv:1804.02767.
- Chen, X.; Ma, H.; Wan, J.; Li, B.; Xia, T. Multi-view 3D object detection network for autonomous driving. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 1907–1915.
- Ku, J.; Mozifian, M.; Lee, J.; Harakeh, A.; Waslander, S.L. Joint 3D proposal generation and object detection from view aggregation. In Proceedings of the 2018 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Madrid, Spain, 1–5 October 2018; pp. 1–8.
- Liang, M.; Yang, B.; Wang, S.; Urtasun, R. Deep continuous fusion for multi-sensor 3D object detection. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018; pp. 641–656.
- Liang, M.; Yang, B.; Chen, Y.; Hu, R.; Urtasun, R. Multi-task multi-sensor fusion for 3D object detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 15–20 June 2019; pp. 7345–7353.
- Shi, S.; Wang, X.; Li, H. PointRCNN: 3D object proposal generation and detection from point cloud. In Proceedings of the 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, USA, 15–20 June 2019.
- Yang, Z.; Sun, Y.; Liu, S.; Shen, X.; Jia, J. STD: Sparse-to-dense 3D object detector for point cloud. In Proceedings of the IEEE International Conference on Computer Vision, Seoul, South Korea, 27 October–2 November 2019; pp. 1951–1960.
- Yang, Z.; Sun, Y.; Liu, S.; Jia, J. 3DSSD: Point-based 3D single stage object detector. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 13–19 June 2020; pp. 11040–11048.
- Qi, C.R.; Su, H.; Mo, K.; Guibas, L.J. PointNet: Deep learning on point sets for 3D classification and segmentation. In Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA, 21–26 July 2017; pp. 77–85.
- Zhou, Y.; Tuzel, O. VoxelNet: End-to-end learning for point cloud based 3D object detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–22 June 2018; pp. 4490–4499.
- Yan, Y.; Mao, Y.; Li, B. SECOND: Sparsely embedded convolutional detection. Sensors 2018, 18, 3337.
- Kuang, H.; Wang, B.; An, J.; Zhang, M.; Zhang, Z. Voxel-FPN: Multi-scale voxel feature aggregation for 3D object detection from LIDAR point clouds. Sensors 2020, 20, 704.
- Shi, S.; Guo, C.; Jiang, L.; Wang, Z.; Shi, J.; Wang, X.; Li, H. PV-RCNN: Point-voxel feature set abstraction for 3D object detection. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 13–19 June 2020; pp. 10529–10538.
- He, C.; Zeng, H.; Huang, J.; Hua, X.S.; Zhang, L. Structure aware single-stage 3D object detection from point cloud. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 13–19 June 2020; pp. 11873–11882.
- Du, L.; Ye, X.; Tan, X.; Feng, J.; Xu, Z.; Ding, E.; Wen, S. Associate-3Ddet: Perceptual-to-conceptual association for 3D point cloud object detection. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 13–19 June 2020; pp. 13329–13338.
- Qi, C.R.; Yi, L.; Su, H.; Guibas, L.J. PointNet++: Deep hierarchical feature learning on point sets in a metric space. In Proceedings of the Advances in Neural Information Processing Systems, Long Beach, CA, USA, 4–9 December 2017; pp. 5099–5108.
- Graham, B. Sparse 3D convolutional neural networks. arXiv 2015, arXiv:1505.02890.
- Qi, C.R.; Liu, W.; Wu, C.; Su, H.; Guibas, L.J. Frustum PointNets for 3D object detection from RGB-D data. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–22 June 2018; pp. 918–927.
- Lang, A.H.; Vora, S.; Caesar, H.; Zhou, L.; Yang, J.; Beijbom, O. PointPillars: Fast encoders for object detection from point clouds. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 15–20 June 2019; pp. 12697–12705.
- Geiger, A.; Lenz, P.; Urtasun, R. Are we ready for autonomous driving? The KITTI vision benchmark suite. In Proceedings of the 2012 IEEE Conference on Computer Vision and Pattern Recognition, Providence, RI, USA, 16–21 June 2012; pp. 3354–3361.
- Ronneberger, O.; Fischer, P.; Brox, T. U-Net: Convolutional networks for biomedical image segmentation. In Proceedings of the International Conference on Medical Image Computing and Computer-Assisted Intervention, Munich, Germany, 5–9 October 2015; pp. 234–241.
- Shi, S.; Wang, Z.; Shi, J.; Wang, X.; Li, H. From points to parts: 3D object detection from point cloud with part-aware and part-aggregation network. IEEE Trans. Pattern Anal. Mach. Intell. 2020.
- Graham, B.; van der Maaten, L. Submanifold sparse convolutional networks. arXiv 2017, arXiv:1706.01307.
- Lin, T.Y.; Goyal, P.; Girshick, R.; He, K.; Dollár, P. Focal loss for dense object detection. In Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy, 22–29 October 2017; pp. 2980–2988.
- Liu, W.; Anguelov, D.; Erhan, D.; Szegedy, C.; Berg, A.C. SSD: Single shot multibox detector. In Proceedings of the European Conference on Computer Vision, Amsterdam, The Netherlands, 8–16 October 2016.
Table 1. Definition of the KITTI difficulty levels.

| Difficulty | Min Box Height (Pixels) | Max Occlusion | Max Truncation |
|---|---|---|---|
| Easy | 40 | Fully visible | 15% |
| Moderate | 25 | Partly occluded | 30% |
| Hard | 25 | Difficult to see | 50% |
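Table 1 translates directly into a difficulty-assignment rule. The sketch below follows the official KITTI devkit criteria (occlusion is encoded there as an integer 0–2); the function and parameter names are our own:

```python
def kitti_difficulty(bbox_height_px: float, occlusion: int, truncation: float) -> str:
    """Assign a KITTI difficulty level from 2D box height, occlusion level
    (0 = fully visible, 1 = partly occluded, 2 = difficult to see), and
    truncation fraction, per the criteria in Table 1."""
    if bbox_height_px >= 40 and occlusion <= 0 and truncation <= 0.15:
        return "easy"
    if bbox_height_px >= 25 and occlusion <= 1 and truncation <= 0.30:
        return "moderate"
    if bbox_height_px >= 25 and occlusion <= 2 and truncation <= 0.50:
        return "hard"
    return "ignored"  # objects outside all three levels are not evaluated
```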
Table 2. Statistics of the number of lidar points per car at each difficulty level.

| Difficulty | Min | Max | Average | Median | Mode | Num |
|---|---|---|---|---|---|---|
| Easy | 0 | 4378 | 426 | 256 | 116 (20) | 4836 |
| Moderate | 0 | 3895 | 197 | 66 | 34 (83) | 6173 |
| Hard | 0 | 4295 | 153 | 51 | 0 (107) | 3532 |
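Per-car point counts like those in Table 2 can be gathered by testing which lidar points fall inside each ground-truth box. A minimal numpy sketch, assuming the usual KITTI-style box parameterization (center, length × width × height, yaw); names are ours:

```python
import numpy as np

def points_in_box(points, center, lwh, yaw):
    """Return the lidar points inside a yaw-oriented 3D box; counting
    them per ground-truth car yields statistics like Table 2."""
    c, s = np.cos(yaw), np.sin(yaw)
    # Rotate offsets into the box-local frame (x: length, y: width, z: height).
    local = (points - center) @ np.array([[c, -s, 0.0],
                                          [s,  c, 0.0],
                                          [0.0, 0.0, 1.0]])
    half = np.asarray(lwh) / 2.0
    mask = np.all(np.abs(local) <= half, axis=1)
    return points[mask]
```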
Table 3. Comparison of 3D and BEV average precision (AP, %) for car detection on the KITTI validation set.

| Scheme | Method | Modality | 3D Easy | 3D Mod | 3D Hard | BEV Easy | BEV Mod | BEV Hard | FPS |
|---|---|---|---|---|---|---|---|---|---|
| one-stage | VoxelNet [15] | LIDAR | 81.98 | 65.46 | 62.85 | 89.60 | 84.81 | 78.57 | 4.4 |
| one-stage | SECOND [16] | LIDAR | 87.43 | 76.48 | 69.10 | 89.96 | 87.07 | 79.66 | 25 |
| one-stage | PointPillars [24] | LIDAR | 86.13 | 77.03 | 72.43 | 89.93 | 86.92 | 84.97 | 62 |
| one-stage | Voxel-FPN [17] | LIDAR | 88.27 | 77.86 | 75.84 | 90.28 | 87.92 | 86.27 | 50 |
| two-stage | MV3D [7] | RGB + LIDAR | 71.29 | 62.68 | 56.56 | 86.55 | 78.10 | 76.67 | 2.8 |
| two-stage | AVOD [8] | RGB + LIDAR | 84.41 | 74.44 | 68.65 | - | - | - | 10 |
| two-stage | F-PointNet [23] | RGB + LIDAR | 83.76 | 70.92 | 63.65 | 88.16 | 84.02 | 76.44 | 2.1 |
| two-stage | PointRCNN [11] | LIDAR | 88.88 | 78.63 | 77.38 | 89.96 | 87.07 | 79.66 | 10 |
| two-stage | STD [12] | LIDAR | 88.80 | 78.70 | 78.20 | 90.10 | 88.30 | 87.40 | 10 |
| ours | EPGNet | LIDAR | 89.30 | 78.98 | 77.79 | 90.32 | 87.52 | 86.02 | 14 |
Table 4. Comparison of average orientation similarity (AOS); values in parentheses show our improvement over the best previous result.

| Method | mAOS (Moderate) | AOS Easy (%) | AOS Moderate (%) | AOS Hard (%) |
|---|---|---|---|---|
| SECOND | 80.37 | 87.84 | 81.31 | 71.95 |
| AVOD-FPN | 85.61 | 89.95 | 87.13 | 79.74 |
| PointPillar | 88.44 | 90.19 | 88.76 | 86.38 |
| ours | 90.67 (+2.23) | 94.58 (+4.39) | 89.18 (+0.42) | 88.25 (+1.87) |
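For context, average orientation similarity is defined in the KITTI benchmark [25]; restating the standard definition here:

$$
\mathrm{AOS} = \frac{1}{11} \sum_{r \in \{0, 0.1, \dots, 1\}} \max_{\tilde{r}: \tilde{r} \ge r} s(\tilde{r}),
\qquad
s(r) = \frac{1}{|D(r)|} \sum_{i \in D(r)} \frac{1 + \cos \Delta_\theta^{(i)}}{2}\, \delta_i,
$$

where $D(r)$ is the set of detections at recall $r$, $\Delta_\theta^{(i)}$ is the angular difference between the estimated and ground-truth orientation of detection $i$, and $\delta_i$ is 1 if detection $i$ has been assigned to a ground-truth box and 0 otherwise.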
Table 5. Upper-bound study: AP3D (IoU = 0.7) with original versus enhanced point clouds used for training and validation.

| Training Data | Validation Data | Easy | Moderate | Hard |
|---|---|---|---|---|
| enhanced | original | 44.56 | 31.39 | 26.44 |
| original | enhanced | 89.73 | 84.01 | 84.35 |
| original | original | 88.48 | 78.14 | 76.68 |
| enhanced | enhanced | 90.33 | 89.09 | 89.11 |
Table 6. Influence of the two thresholds on AP3D.

(a) Influence of different values of the first threshold on AP3D.

| Threshold | Easy | Moderate | Hard |
|---|---|---|---|
| 0.9 | 88.45 | 78.42 | 77.16 |
| 0.8 | 88.85 | 78.66 | 77.15 |
| 0.7 | 89.01 | 78.78 | 77.32 |
| 0.6 | 88.88 | 78.43 | 76.89 |
| 0.5 | 89.35 | 78.67 | 76.91 |
| 0.4 | 88.97 | 78.66 | 77.28 |
| 0.3 | 88.85 | 78.47 | 77.19 |
| 0.2 | 88.72 | 78.36 | 77.04 |
| 0.1 | 88.69 | 78.32 | 77.03 |

(b) Influence of different values of the second threshold on AP3D when the first threshold = 0.7.

| Threshold | Easy | Moderate | Hard |
|---|---|---|---|
| 0.9 | 88.56 | 78.52 | 76.55 |
| 0.8 | 89.13 | 78.85 | 77.14 |
| 0.7 | 89.01 | 78.78 | 77.32 |
| 0.6 | 89.14 | 78.87 | 77.41 |
| 0.5 | 89.14 | 78.88 | 77.51 |
| 0.4 | 89.25 | 78.96 | 77.63 |
| 0.3 | 89.30 | 78.98 | 77.79 |
| 0.2 | 89.33 | 78.93 | 77.83 |
| 0.1 | 89.34 | 78.84 | 77.67 |
Table 7. Comparison of location prediction losses.

| Loss | 3D Easy | 3D Moderate | 3D Hard | BEV Easy | BEV Moderate | BEV Hard |
|---|---|---|---|---|---|---|
| Smooth-L1 Loss | 89.30 | 78.98 | 77.79 | 90.32 | 87.52 | 86.02 |
| Cross-Entropy Loss | 88.23 | 78.36 | 77.23 | 89.92 | 87.13 | 86.27 |
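For reference, the smooth-L1 loss compared above is the standard robust regression loss (stated here in its common form with unit transition point):

$$
\mathrm{SmoothL_1}(x) =
\begin{cases}
0.5\,x^2, & |x| < 1 \\
|x| - 0.5, & \text{otherwise,}
\end{cases}
$$

where $x$ is the difference between a predicted and a target location offset. It behaves quadratically near zero, giving stable gradients for small errors, and linearly for large errors, making it robust to outliers.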
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations. |
© 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).