Deep Learning-Based Monocular 3D Object Detection with Refinement of Depth Information
Figure 1. (a) Pseudo-LiDAR point cloud estimated by DORN [5] and ground truth; (b) pseudo-LiDAR point cloud estimated by our proposed depth refinement; (c) pseudo-LiDAR point cloud estimated by our proposed depth distribution adjustment. (Yellow ellipses indicate the object's point cloud.)

Figure 2. Comparison of monocular depth estimation errors between DORN [5] (blue data) and GUPNet [11] (red data).

Figure 3. An overview of the proposed monocular 3D object detection framework.

Figure 4. Comparison of 3D detection using: (a) a 2D bounding box; (b) 2D instance segmentation. Ellipses in Figure 4a denote the noise points lying between the boundary and the bounding box of a certain object. (A minimal point-selection sketch follows these captions.)

Figure 5. The complete object-guided depth refinement module. The module takes an image as input, along with the original depth map and the 3D height and uncertainty produced by the added 3D regression head. All of these are fed into our GCDR module for further depth refinement.

Figure 6. Linear interpolation discretization.

Figure 7. Results of car detection using: (a) vanilla PL; (b) vanilla PL + GCDR; (c) vanilla PL + GCDR + DDA. Zoom in for details. The same 3D detection method, F-PointNet [18], was adopted for comparison. Top row: frontal-view images; bottom row: bird's-eye-view point cloud maps. Ground-truth boxes are in green, and prediction results are in yellow.

Figure 8. Detection of cars at different distances with occlusion. Top row: frontal-view images; bottom row: bird's-eye-view point cloud maps. Ground-truth boxes are in green, and prediction results are in yellow. Our proposed framework creates accurate 3D boxes at different distances and for occluded targets.

Figure 9. Detection of a pedestrian and a cyclist. Top row: frontal-view images; bottom row: bird's-eye-view point cloud maps. Ground-truth boxes are in green, and predicted results are in yellow. Our proposed framework creates accurate 3D boxes for a pedestrian and a cyclist, and our DDA module produces reasonable point cloud shapes.
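The contrast in Figure 4 can be made concrete with a small Python sketch. This is our own illustration, not the authors' code; `object_points` and `box_mask` are hypothetical helper names, and the back-projection uses only the standard pinhole relations.

```python
import numpy as np

def object_points(depth, region, fx, fy, cx, cy):
    """Back-project the pixels selected by `region` (an HxW boolean array)
    into camera-frame 3D points. With a box-shaped region, background pixels
    between the object's true boundary and the box are lifted as well; their
    depths yield the noise points marked in Figure 4a. A tight instance mask
    leaves them out."""
    v, u = np.nonzero(region)      # pixel rows (v) and columns (u) in region
    z = depth[v, u]                # metric depth at those pixels
    x = (u - cx) * z / fx          # pinhole back-projection
    y = (v - cy) * z / fy
    return np.stack([x, y, z], axis=-1)

def box_mask(mask):
    """The box-shaped region is just the instance mask's filled bounding box."""
    v, u = np.nonzero(mask)
    box = np.zeros_like(mask)
    box[v.min():v.max() + 1, u.min():u.max() + 1] = True
    return box
```

Running `object_points(depth, mask, ...)` versus `object_points(depth, box_mask(mask), ...)` on the same instance reproduces the clean-versus-noisy clouds contrasted in Figure 4b and Figure 4a.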
Abstract

1. Introduction
- We found that the main limitations of monocular 3D detection stem from the inaccuracy of the target position and the uncertainty of the depth distribution of the foreground target. Both problems arise from inaccurate depth estimation.
- We propose a method that couples 2D instance segmentation with geometry-constrained, object-guided depth adjustment to predict the target depth and to provide a confidence measure for the prediction, improving the accuracy of the predicted target depth in the depth map (a minimal sketch of this geometric constraint follows this list).
- We utilize a target-size prior and a normalization strategy to tackle the long-tail noise problem in pseudo-LiDAR point clouds, reducing the uncertainty of the depth distribution.
- Thorough experiments were carried out on the KITTI dataset. With these two novel solutions, our proposed monocular 3D object detection framework outperforms various state-of-the-art methods.
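To fix the geometric intuition behind the first two contributions, here is a minimal Python sketch; it is an illustration under stated assumptions, not the authors' released implementation. The pinhole relation z = f·H/h, which turns an object's physical height H and pixel height h into a depth hint, is standard; the inverse-variance fusion and the size-prior clamping below are assumed stand-ins for the paper's learned depth refinement (Section 3.3) and depth distribution adjustment (Section 3.4), and all function names are hypothetical.

```python
import numpy as np

def depth_from_height(fy, h3d_m, h2d_px):
    """Pinhole geometry: an object of physical height H (meters) at depth z
    images to h = fy * H / z pixels, hence z = fy * H / h."""
    return fy * h3d_m / h2d_px

def fuse_depths(z_est, sigma_est, z_geom, sigma_geom):
    """Combine the depth-map estimate with the geometric hint by
    inverse-variance weighting: lower uncertainty -> larger weight."""
    w_est, w_geom = 1.0 / sigma_est ** 2, 1.0 / sigma_geom ** 2
    return (w_est * z_est + w_geom * z_geom) / (w_est + w_geom)

def clamp_depth_spread(points, z_center, prior_length):
    """Suppress long-tail noise by clamping an instance's depths (column 2
    of an Nx3 array) to a window set by a class-size prior, e.g., a car's
    typical length in meters."""
    lo, hi = z_center - prior_length / 2.0, z_center + prior_length / 2.0
    out = points.copy()
    out[:, 2] = np.clip(out[:, 2], lo, hi)
    return out

# Worked example: a 1.5 m tall car imaged 90 px tall by a camera with
# fy = 720 px should lie near z = 720 * 1.5 / 90 = 12 m.
z_geom = depth_from_height(fy=720.0, h3d_m=1.5, h2d_px=90.0)       # 12.0 m
z_fused = fuse_depths(z_est=13.0, sigma_est=1.0,
                      z_geom=z_geom, sigma_geom=0.5)               # ~12.2 m
```

Shifting an instance's points by z_fused − z_est and then applying clamp_depth_spread mirrors, qualitatively, the progression from Figure 1a to Figure 1b,c.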
2. Related Work
2.1. Monocular 3D Object Detection
2.2. Monocular Depth Estimation
2.3. Benchmark Dataset and Evaluation Metrics
3. Proposed Framework
3.1. 2D Instance Segmentation
3.2. Monocular Depth Estimation
3.3. Object-Guided Depth Refinement
3.4. Depth Distribution Adjustment
3.5. Pseudo-LiDAR Generation
3.6. 3D Object Detection
4. Results and Discussion
4.1. System Setup
4.2. Implementation Details
4.3. Quantitative Results
4.4. Ablation Study
4.5. Visual Results
5. Conclusions
Author Contributions
Funding
Conflicts of Interest
References
- Wu, X.; Sahoo, D.; Hoi, S.C. Recent advances in deep learning for object detection. Neurocomputing 2020, 396, 39–64.
- Kim, S.H.; Hwang, Y. A survey on deep learning based methods and datasets for monocular 3D object detection. Electronics 2021, 10, 517.
- Chen, L.; Lin, S.; Lu, X.; Cao, D.; Wu, H.; Guo, C.; Liu, C.; Wang, F.Y. Deep neural network based vehicle and pedestrian detection for autonomous driving: A survey. IEEE Trans. Intell. Transp. Syst. 2021, 22, 3234–3246.
- Arnold, E.; Al-Jarrah, O.Y.; Dianati, M.; Fallah, S.; Oxtoby, D.; Mouzakitis, A. A survey on 3D object detection methods for autonomous driving applications. IEEE Trans. Intell. Transp. Syst. 2019, 20, 3782–3795.
- Fu, H.; Gong, M.; Wang, C.; Batmanghelich, K.; Tao, D. Deep ordinal regression network for monocular depth estimation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–22 June 2018; pp. 2002–2011.
- Wang, Y.; Chao, W.L.; Garg, D.; Hariharan, B.; Campbell, M.; Weinberger, K.Q. Pseudo-LiDAR from visual depth estimation: Bridging the gap in 3D object detection for autonomous driving. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 16–17 June 2019; pp. 8445–8453.
- Ma, X.; Wang, Z.; Li, H.; Zhang, P.; Ouyang, W.; Fan, X. Accurate monocular 3D object detection via color-embedded 3D reconstruction for autonomous driving. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Seoul, Korea, 27 October–2 November 2019; pp. 6851–6860.
- Weng, X.; Kitani, K. Monocular 3D object detection with pseudo-LiDAR point cloud. In Proceedings of the IEEE/CVF International Conference on Computer Vision Workshops, Seoul, Korea, 27–28 October 2019.
- Geiger, A.; Lenz, P.; Stiller, C.; Urtasun, R. Vision meets robotics: The KITTI dataset. Int. J. Robot. Res. 2013, 32, 1231–1237.
- Shi, X.; Ye, Q.; Chen, X.; Chen, C.; Chen, Z.; Kim, T.K. Geometry-based distance decomposition for monocular 3D object detection. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Montreal, QC, Canada, 11–17 October 2021; pp. 15172–15181.
- Lu, Y.; Ma, X.; Yang, L.; Zhang, T.; Liu, Y.; Chu, Q.; Yan, J.; Ouyang, W. Geometry uncertainty projection network for monocular 3D object detection. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Montreal, QC, Canada, 11–17 October 2021; pp. 3111–3121.
- Ku, J.; Mozifian, M.; Lee, J.; Harakeh, A.; Waslander, S.L. Joint 3D proposal generation and object detection from view aggregation. In Proceedings of the 2018 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Madrid, Spain, 1–5 October 2018; pp. 1–8.
- Peng, L.; Liu, F.; Yu, Z.; Yan, S.; Deng, D.; Yang, Z.; Liu, H.; Cai, D. LiDAR point cloud guided monocular 3D object detection. arXiv 2021, arXiv:2104.09035.
- Vajgl, M.; Hurtik, P.; Nejezchleba, T. Dist-YOLO: Fast object detection with distance estimation. Appl. Sci. 2022, 12, 1354.
- Mauri, A.; Khemmar, R.; Decoux, B.; Haddad, M.; Boutteau, R. Lightweight convolutional neural network for real-time 3D object detection in road and railway environments. J. Real-Time Image Process. 2022, 429, 1–18.
- Xie, Z.; Song, Y.; Wu, J.; Li, Z.; Song, C.; Xu, Z. MDS-Net: A multi-scale depth stratification based monocular 3D object detection algorithm. arXiv 2022, arXiv:2201.04341.
- Xiao, P.; Yan, F.; Chi, J.; Wang, Z. Real-time 3D pedestrian tracking with monocular camera. Wirel. Commun. Mob. Comput. 2022, 2022, 7437289.
- Qi, C.R.; Liu, W.; Wu, C.; Su, H.; Guibas, L.J. Frustum PointNets for 3D object detection from RGB-D data. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–22 June 2018; pp. 918–927.
- Khan, F.; Salahuddin, S.; Javidnia, H. Deep learning-based monocular depth estimation methods—A state-of-the-art review. Sensors 2020, 20, 2272.
- Lian, G.; Wang, Y.; Qin, H.; Chen, G. Towards unified on-road object detection and depth estimation from a single image. Int. J. Mach. Learn. Cybern. 2021, 119, 1–11.
- Li, S.; Shi, J.; Song, W.; Hao, A.; Qin, H. Hierarchical object relationship constrained monocular depth estimation. Pattern Recognit. 2021, 120, 108116.
- Liu, P.; Zhang, Z.; Meng, Z.; Gao, N. Monocular depth estimation with joint attention feature distillation and wavelet-based loss function. Sensors 2021, 21, 54.
- Xu, X.; Chen, Z.; Yin, F. Multi-scale spatial attention-guided monocular depth estimation with semantic enhancement. IEEE Trans. Image Process. 2021, 30, 8811–8822.
- Geiger, A.; Lenz, P.; Urtasun, R. Are we ready for autonomous driving? The KITTI vision benchmark suite. In Proceedings of the 2012 IEEE Conference on Computer Vision and Pattern Recognition, Providence, RI, USA, 16–21 June 2012; pp. 3354–3361.
- Zamanakos, G.; Tsochatzidis, L.; Amanatiadis, A.; Pratikakis, I. A comprehensive survey of LiDAR-based 3D object detection methods with deep learning for autonomous driving. Comput. Graph. 2021, 99, 153–181.
- Caesar, H.; Bankiti, V.; Lang, A.H.; Vora, S.; Liong, V.E.; Xu, Q.; Krishnan, A.; Pan, Y.; Baldan, G.; Beijbom, O. nuScenes: A multimodal dataset for autonomous driving. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 14–19 June 2020; pp. 11621–11631.
- Sun, P.; Kretzschmar, H.; Dotiwalla, X.; Chouard, A.; Patnaik, V.; Tsui, P.; Guo, J.; Zhou, Y.; Chai, Y.; Caine, B.; et al. Scalability in perception for autonomous driving: Waymo Open Dataset. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 14–19 June 2020; pp. 2446–2454.
- Li, Y.; Ma, L.; Zhong, Z.; Liu, F.; Chapman, M.A.; Cao, D.; Li, J. Deep learning for LiDAR point clouds in autonomous driving: A review. IEEE Trans. Neural Netw. Learn. Syst. 2020, 32, 3412–3432.
- Ahmadyan, A.; Zhang, L.; Ablavatski, A.; Wei, J.; Grundmann, M. Objectron: A large scale dataset of object-centric videos in the wild with pose annotations. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA, 20–25 June 2021; pp. 7822–7831.
- Cordts, M.; Omran, M.; Ramos, S.; Rehfeld, T.; Enzweiler, M.; Benenson, R.; Franke, U.; Roth, S.; Schiele, B. The Cityscapes dataset for semantic urban scene understanding. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 26 June–1 July 2016; pp. 3213–3223.
- Silberman, N.; Hoiem, D.; Kohli, P.; Fergus, R. Indoor segmentation and support inference from RGBD images. In Proceedings of the European Conference on Computer Vision, Florence, Italy, 7–13 October 2012; Springer: Berlin/Heidelberg, Germany, 2012; pp. 746–760.
- Saxena, A.; Sun, M.; Ng, A.Y. Learning 3-D scene structure from a single still image. In Proceedings of the 2007 IEEE 11th International Conference on Computer Vision, Rio de Janeiro, Brazil, 14–21 October 2007; pp. 1–8.
- Khan, F.; Hussain, S.; Basak, S.; Moustafa, M.; Corcoran, P. A review of benchmark datasets and training loss functions in neural depth estimation. IEEE Access 2021, 9, 148479–148503.
- Lee, Y.; Park, J. CenterMask: Real-time anchor-free instance segmentation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 14–19 June 2020; pp. 13906–13915.
- Shi, S.; Guo, C.; Jiang, L.; Wang, Z.; Shi, J.; Wang, X.; Li, H. PV-RCNN: Point-voxel feature set abstraction for 3D object detection. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 13–19 June 2020; pp. 10529–10538.
- Chen, X.; Kundu, K.; Zhu, Y.; Berneshawi, A.G.; Ma, H.; Fidler, S.; Urtasun, R. 3D object proposals for accurate object class detection. In Proceedings of the Advances in Neural Information Processing Systems 28 (NIPS 2015), Montreal, QC, Canada, 7–12 December 2015; Volume 28.
- Chen, X.; Ma, H.; Wan, J.; Li, B.; Xia, T. Multi-view 3D object detection network for autonomous driving. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 1907–1915.
- Xu, D.; Anguelov, D.; Jain, A. PointFusion: Deep sensor fusion for 3D bounding box estimation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 244–253.
- Brazil, G.; Liu, X. M3D-RPN: Monocular 3D region proposal network for object detection. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Seoul, Korea, 27 October–2 November 2019; pp. 9287–9296.
- Li, P.; Zhao, H.; Liu, P.; Cao, F. RTM3D: Real-time monocular 3D detection from object keypoints for autonomous driving. In Proceedings of the European Conference on Computer Vision, Glasgow, UK, 23–28 August 2020; Springer: Berlin/Heidelberg, Germany, 2020; pp. 644–660.
- Chen, Y.; Tai, L.; Sun, K.; Li, M. MonoPair: Monocular 3D object detection using pairwise spatial relationships. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 14–19 June 2020; pp. 12093–12102.
- Liu, Z.; Wu, Z.; Tóth, R. SMOKE: Single-stage monocular 3D object detection via keypoint estimation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, Seattle, WA, USA, 14–19 June 2020; pp. 996–997.
- Simonelli, A.; Bulo, S.R.; Porzi, L.; Ricci, E.; Kontschieder, P. Towards generalization across depth for monocular 3D object detection. In Proceedings of the European Conference on Computer Vision, Glasgow, UK, 23–28 August 2020; Springer: Berlin/Heidelberg, Germany, 2020; pp. 767–782.
- Ding, M.; Huo, Y.; Yi, H.; Wang, Z.; Shi, J.; Lu, Z.; Luo, P. Learning depth-guided convolutions for monocular 3D object detection. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, Seattle, WA, USA, 14–19 June 2020; pp. 1000–1001.
- Reading, C.; Harakeh, A.; Chae, J.; Waslander, S.L. Categorical depth distribution network for monocular 3D object detection. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA, 20–25 June 2021; pp. 8555–8564.
- Ma, X.; Liu, S.; Xia, Z.; Zhang, H.; Zeng, X.; Ouyang, W. Rethinking pseudo-LiDAR representation. In Proceedings of the European Conference on Computer Vision, Glasgow, UK, 23–28 August 2020; Springer: Berlin/Heidelberg, Germany, 2020; pp. 311–327.
- Xu, B.; Chen, Z. Multi-level fusion based 3D object detection from monocular images. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 2345–2353.
- Mousavian, A.; Anguelov, D.; Flynn, J.; Kosecka, J. 3D bounding box estimation using deep learning and geometry. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 7074–7082.
- Qin, Z.; Wang, J.; Lu, Y. MonoGRNet: A geometric reasoning network for monocular 3D object localization. In Proceedings of the AAAI Conference on Artificial Intelligence, Honolulu, HI, USA, 27 January–1 February 2019; Volume 33, pp. 8851–8858.
- Ren, S.; He, K.; Girshick, R.; Sun, J. Faster R-CNN: Towards real-time object detection with region proposal networks. In Proceedings of the Advances in Neural Information Processing Systems 28 (NIPS 2015), Montreal, QC, Canada, 7–12 December 2015; Volume 28.
| Method | Input | 3D AP (Easy) | 3D AP (Moderate) | 3D AP (Hard) | BEV AP (Easy) | BEV AP (Moderate) | BEV AP (Hard) |
| --- | --- | --- | --- | --- | --- | --- | --- |
| M3D-RPN [39] | Mono | 14.76 | 9.71 | 7.42 | 21.02 | 13.67 | 10.23 |
| AM3D [7] | P-LiDAR | 16.50 | 10.74 | 9.52 | 25.03 | 17.32 | 14.91 |
| RTM3D [40] | Mono | 14.41 | 10.34 | 8.77 | 19.17 | 14.20 | 11.99 |
| MonoPair [41] | Mono | 13.04 | 9.99 | 8.65 | 19.28 | 14.83 | 12.89 |
| SMOKE [42] | Mono | 14.03 | 9.76 | 7.84 | 20.83 | 14.49 | 12.75 |
| MoVi-3D [43] | Mono | 15.19 | 10.90 | 9.26 | 22.76 | 17.03 | 14.85 |
| D4LCN [44] | Mono | 16.65 | 11.72 | 9.51 | 22.51 | 16.02 | 12.55 |
| CaDDN [45] | Mono | 19.17 | 13.41 | 11.46 | 27.94 | 18.91 | 17.19 |
| PatchNet [46] | Mono | 15.68 | 11.12 | 10.17 | 22.97 | 16.86 | 14.97 |
| Mono-PLiDAR [8] | P-LiDAR | 10.76 | 7.50 | 6.10 | 21.27 | 13.92 | 11.25 |
| GUPNet [11] | Mono | 20.11 | 14.20 | 11.77 | 30.29 | 21.19 | 18.20 |
| Proposed framework | P-LiDAR | 25.21 | 17.25 | 13.53 | 37.86 | 27.92 | 21.97 |
| Improvement | – | – | – | – | – | – | – |
| Method | 3D AP (Easy) | 3D AP (Moderate) | 3D AP (Hard) | BEV AP (Easy) | BEV AP (Moderate) | BEV AP (Hard) |
| --- | --- | --- | --- | --- | --- | --- |
| Multi-Fusion [47] | 10.53 | 5.69 | 5.39 | 22.03 | 13.63 | 11.60 |
| Deep3DBox [48] | 5.85 | 4.10 | 3.84 | 9.99 | 7.71 | 5.30 |
| MonoGRNet [49] | 13.88 | 10.19 | 7.62 | 19.72 | 12.81 | 10.15 |
| M3D-RPN [39] | 14.53 | 11.07 | 8.65 | 20.85 | 15.62 | 11.88 |
| MonoPair [41] | 16.28 | 12.30 | 10.42 | 24.12 | 18.17 | 15.76 |
| GUPNet [11] | 22.76 | 16.46 | 13.72 | 31.07 | 22.94 | 19.75 |
| Mono-PLiDAR [8] | 28.2 | 18.5 | 16.4 | 40.6 | 26.3 | 22.9 |
| AM3D [7] | 32.23 | 21.09 | 17.26 | 43.75 | 28.39 | 23.87 |
| Proposed framework | 44.6 | 26.7 | 22.6 | 56.6 | 33.4 | 29.4 |
| Improvement | – | – | – | – | – | – |
| Method | 3D AP (Easy) | 3D AP (Moderate) | 3D AP (Hard) | BEV AP (Easy) | BEV AP (Moderate) | BEV AP (Hard) |
| --- | --- | --- | --- | --- | --- | --- |
| Multi-Fusion [47] | 47.88 | 29.48 | 26.44 | 55.02 | 36.73 | 31.27 |
| Deep3DBox [48] | 27.04 | 20.55 | 15.88 | 30.02 | 23.77 | 18.83 |
| MonoGRNet [49] | 47.59 | 32.28 | 25.50 | 48.53 | 35.94 | 28.59 |
| M3D-RPN [39] | 48.53 | 35.94 | 28.59 | 53.35 | 39.60 | 31.76 |
| MonoPair [41] | 55.38 | 42.39 | 37.99 | 61.06 | 47.63 | 41.92 |
| GUPNet [11] | 57.62 | 42.33 | 37.59 | 61.78 | 47.06 | 40.88 |
| Mono-PLiDAR [8] | 66.3 | 42.3 | 38.5 | 70.8 | 49.4 | 42.7 |
| AM3D [7] | 68.86 | 49.19 | 42.24 | 72.64 | 51.82 | 44.21 |
| Proposed framework | 75.6 | 57.1 | 49.8 | 78.7 | 59.2 | 51.0 |
| Improvement | – | – | – | – | – | – |
| Method | 3D AP (Easy) | 3D AP (Moderate) | 3D AP (Hard) | BEV AP (Easy) | BEV AP (Moderate) | BEV AP (Hard) |
| --- | --- | --- | --- | --- | --- | --- |
| Mono-PLiDAR [8] | 11.6 | 11.2 | 10.9 | 14.4 | 13.8 | 12.0 |
| Proposed framework | 18.29 | 15.36 | 14.70 | 23.75 | 20.10 | 17.58 |
| Improvement | +6.69 | +4.16 | +3.80 | +9.35 | +6.30 | +5.58 |
| Method | 3D AP (Easy) | 3D AP (Moderate) | 3D AP (Hard) | BEV AP (Easy) | BEV AP (Moderate) | BEV AP (Hard) |
| --- | --- | --- | --- | --- | --- | --- |
| Mono-PLiDAR [8] | 8.5 | 6.5 | 6.5 | 11.0 | 7.7 | 6.8 |
| Proposed framework | 9.1 | 6.7 | 6.2 | 12.1 | 8.9 | 7.0 |
| Improvement | +0.6 | +0.2 | −0.3 | +1.1 | +1.2 | +0.2 |
| Method | 3D AP (Easy) | 3D AP (Moderate) | 3D AP (Hard) | BEV AP (Easy) | BEV AP (Moderate) | BEV AP (Hard) |
| --- | --- | --- | --- | --- | --- | --- |
| Vanilla PL | 28.2 | 18.5 | 16.4 | 40.6 | 26.3 | 22.9 |
| Vanilla PL + GC | 32.3 | 21.6 | 18.0 | 44.2 | 29.8 | 24.3 |
| Vanilla PL + GUL | 38.1 | 23.4 | 20.6 | 50.1 | 31.6 | 27.4 |
| Baseline | 3D AP (Easy) | 3D AP (Moderate) | 3D AP (Hard) | BEV AP (Easy) | BEV AP (Moderate) | BEV AP (Hard) |
| --- | --- | --- | --- | --- | --- | --- |
| Faster R-CNN [50] | 40.0 | 23.2 | 18.3 | 49.7 | 29.6 | 24.5 |
| CenterMask [34] | 42.1 | 23.4 | 20.6 | 50.1 | 31.6 | 27.4 |
| DSF | 3D AP (Easy) | 3D AP (Moderate) | 3D AP (Hard) | BEV AP (Easy) | BEV AP (Moderate) | BEV AP (Hard) |
| --- | --- | --- | --- | --- | --- | --- |
| × | 44.1 | 24.4 | 20.2 | 55.2 | 31.6 | 27.8 |
| ✓ | 44.6 | 26.7 | 22.6 | 56.6 | 33.4 | 29.4 |
| Adjust Distribution | 3D AP (Easy) | 3D AP (Moderate) | 3D AP (Hard) | BEV AP (Easy) | BEV AP (Moderate) | BEV AP (Hard) |
| --- | --- | --- | --- | --- | --- | --- |
| × | 38.1 | 23.4 | 20.6 | 50.1 | 31.6 | 27.4 |
| ✓ | 44.6 | 26.7 | 22.6 | 56.6 | 33.4 | 29.4 |
© 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Citation: Hu, H.; Zhu, M.; Li, M.; Chan, K.-L. Deep Learning-Based Monocular 3D Object Detection with Refinement of Depth Information. Sensors 2022, 22, 2576. https://doi.org/10.3390/s22072576