Towards Accurate Ground Plane Normal Estimation from Ego-Motion
Figure 1. Illustration of a typical dynamic motion of a front-facing camera on a moving vehicle. The pitch angle (rotation around the x-axis) oscillates with an amplitude of about 1°, even though the vehicle moves straight and the road surface looks flat. Such pitch oscillation is amplified when the vehicle encounters imperfect road surfaces and speed bumps.
Figure 2. Comparison of IPM images before and after applying our proposed method. (a) Original image from the KITTI odometry dataset. (b) IPM image using the fixed extrinsic from the camera to the ground. (c) IPM image using the dynamic extrinsic calculated by our proposed method. The image in (c) is clearly more accurate; see the supplementary video for better visualization.
Figure 3. IPM images with a constant ground plane normal: the road edges are not properly aligned.
Figure 4. Statistics of frames (KITTI odometry sequence 00) that are out of calibration in pitch and roll.
Figure 5. Overview of our proposed ground plane normal estimation pipeline. The proposed IEKF can process ego-motion from various sources, such as an IMU, visual odometry from monocular images, or SLAM systems that provide real-time odometry. The final ground plane normal vector N is predicted in real time by combining the residual rotation from the IEKF with the static extrinsic from prior calibration.
Figure 6. 2D side view of the camera reference system in two adjacent frames. T'_{k-1} and T'_k are the ideal camera reference systems when the vehicle is stationary; T_{k-1} and T_k are the actual camera poses. T_{k-1}^k = T_k^{-1}·T_{k-1} is the ego-motion between the two frames. The black dashed line is the ideal horizontal line parallel to the ground plane, and θ_{k-1} and θ_k are the pitch angles relative to the ground plane. The actual camera extrinsics to the ground plane are T_{k-1}·T'_{k-1} and T_k·T'_k, which are equivalent to the ground plane normal vector. Best viewed in colour.
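To make the pose relations in Figure 6 concrete, the following minimal NumPy sketch composes two camera poses into the ego-motion T_{k-1}^k and rotates a static ground normal by a residual rotation. This is our illustration, not the authors' released code: the static normal N0 = (0, -1, 0) (camera y-axis pointing down) and the use of the raw frame-to-frame rotation as the residual are assumptions; in the paper the residual comes from the IEKF via Equation (1).

```python
import numpy as np
from scipy.spatial.transform import Rotation as Rot

def ego_motion(T_prev, T_curr):
    """Relative ego-motion T_{k-1}^k = T_k^{-1} . T_{k-1} between 4x4 camera poses."""
    return np.linalg.inv(T_curr) @ T_prev

def normal_from_residual(R_res, n0=np.array([0.0, -1.0, 0.0])):
    """Rotate the static ground normal n0 by a residual rotation (cf. Equation (1))."""
    return R_res @ n0

# Toy example: the camera pitches up by 1 degree between two frames.
T_prev = np.eye(4)
T_curr = np.eye(4)
T_curr[:3, :3] = Rot.from_euler("x", 1.0, degrees=True).as_matrix()

T_rel = ego_motion(T_prev, T_curr)       # ego-motion between the two frames
n = normal_from_residual(T_rel[:3, :3])  # dynamic normal, tilted ~1 degree from n0
print(n)
```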
Figure 7. Visual comparison of IPM images using (a) the static normal vector based on a fixed extrinsic calibration and (b) the dynamic normal vector from our proposed method. The odometry input comes from the monocular version of ORB-SLAM2. With the static normal vector, the road edges are clearly not parallel to each other; with the dynamic normal vector from our method, the road edges in the IPM images are more parallel and consistent.
Figure 8. Plots of pitch angles with normal vectors calculated by the proposed methods. The bottom plot shows the details within 50 frames from the orange box. The oscillation of the pitch angles from the proposed methods aligns well with the ground truth. Note that the overall amplitude of the pitch angles is small, usually within 1 degree.
Figure 9. Visualization of vanishing lines. The red and green horizontal lines are vanishing lines converted from the fixed and dynamic ground plane normals, respectively. The bottom image is a zoom-in of the orange rectangular area in the top image. The green line is clearly a more accurate estimate of the vanishing line.
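The conversion from a ground plane normal to a vanishing line used in Figure 9 follows the standard projective relation l ∝ K^{-T}·n for a plane with normal n (Hartley and Zisserman). A small illustrative sketch, where the intrinsic matrix K holds assumed KITTI-like pinhole values:

```python
import numpy as np

# Assumed KITTI-like pinhole intrinsics (illustrative values only).
K = np.array([[718.856,   0.0,   607.193],
              [  0.0,   718.856, 185.216],
              [  0.0,     0.0,     1.0  ]])
n = np.array([0.0, -1.0, 0.0])      # ground normal in the camera frame

l = np.linalg.inv(K).T @ n          # homogeneous vanishing line (a, b, c)
# Pixels (u, v) on the line satisfy a*u + b*v + c = 0; for a (near-)horizontal
# line, the image row of the horizon is:
v_horizon = -l[2] / l[1]
print(v_horizon)                    # ~185.2 px, i.e., the principal point row
```

A tilted normal shifts v_horizon up or down, which is exactly the red-versus-green offset visualized in Figure 9.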
Figure 10. IPM visualization on the nuScenes dataset.
Figure 11. Comparison of ground plane normals estimated by odometry only.
Abstract
1. Introduction
2. Related Works
2.1. Ground Normal Estimation Using Depth Sensors
2.2. Ground Normal Estimation Using Stereo Cameras
2.3. Ground Normal Estimation Using Monocular Camera
3. Ground Plane Normal
4. Approach
Algorithm 1 Ground Plane Normal Vector Estimation
Require: Extrinsic calibration between the reference sensor and the ground plane.
Input: Ego-motion from the reference sensor: [].
Output: Ground plane normal vector w.r.t. the reference sensor: [].
Initialization: covariance matrix; initial state; process model; process variance; measurement model; measurement variance (Invariant Extended Kalman Filter); cumulative ego odometry.
for each frame do
    Compute the cumulative ego odometry
    Predict state
    Update filter
    Compute the residual rotation
    Compute the normal vector from the residual rotation using Equation (1)
end for
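As a rough illustration of the filtering loop above, here is a heavily simplified, per-axis linear Kalman filter over the residual pitch and roll angles. This is a sketch under stated assumptions, not the paper's invariant EKF: the state parameterization, the zero-residual ("flat road on average") measurement, and all noise values are our illustrative choices.

```python
import numpy as np
from scipy.spatial.transform import Rotation as Rot

class ResidualAngleFilter:
    """Simplified stand-in for Algorithm 1: filter the residual [pitch, roll]."""

    def __init__(self, q=1e-4, r=1e-2):
        self.x = np.zeros(2)          # residual [pitch, roll] in radians
        self.P = np.eye(2) * 1e-2     # state covariance
        self.Q = np.eye(2) * q        # process noise (assumed)
        self.R_meas = np.eye(2) * r   # measurement noise (assumed)

    def step(self, T_rel):
        # Predict: integrate the pitch/roll increments of the ego-motion.
        d_pitch, _, d_roll = Rot.from_matrix(T_rel[:3, :3]).as_euler("xyz")
        self.x = self.x + np.array([d_pitch, d_roll])
        self.P = self.P + self.Q
        # Update: on average the road is flat, so the measured residual is zero.
        K = self.P @ np.linalg.inv(self.P + self.R_meas)
        self.x = self.x + K @ (np.zeros(2) - self.x)
        self.P = (np.eye(2) - K) @ self.P
        return self.x

# Usage: feed relative camera poses frame by frame.
f = ResidualAngleFilter()
T_rel = np.eye(4)
T_rel[:3, :3] = Rot.from_euler("x", 0.5, degrees=True).as_matrix()
pitch, roll = f.step(T_rel)
# The filtered residual rotation then tilts the static normal (cf. Equation (1)):
n = Rot.from_euler("xz", [pitch, roll]).apply(np.array([0.0, -1.0, 0.0]))
```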
5. Experiments
5.1. Implementation
5.2. Quantitative Evaluation
5.3. Qualitative Evaluation
5.4. Ablation Study
6. Limitations
7. Conclusions
Supplementary Materials
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
References
- Jazar, R.N. Vehicle Dynamics; Springer: Berlin, Germany, 2008; Volume 1.
- Liu, T.; Liu, Y.; Tang, Z.; Hwang, J.N. Adaptive ground plane estimation for moving camera-based 3D object tracking. In Proceedings of the IEEE International Workshop on Multimedia Signal Processing, New Orleans, LA, USA, 24–26 November 2017; pp. 1–6.
- Wang, Y.; Teoh, E.K.; Shen, D. Lane detection and tracking using B-Snake. Image Vis. Comput. 2004, 22, 269–280.
- Chen, Q.; Wang, H. A real-time lane detection algorithm based on a hyperbola-pair model. In Proceedings of the IEEE Intelligent Vehicles Symposium, Gothenburg, Sweden, 19–22 June 2016; pp. 510–515.
- Garnett, N.; Cohen, R.; Pe’er, T.; Lahav, R.; Levi, D. 3D-LaneNet: End-to-end 3D multiple lane detection. In Proceedings of the IEEE International Conference on Computer Vision, Seoul, Republic of Korea, 27 October–2 November 2019; pp. 2921–2930.
- Yang, C.; Indurkhya, B.; See, J.; Grzegorzek, M. Towards automatic skeleton extraction with skeleton grafting. IEEE Trans. Vis. Comput. Graph. 2020, 27, 4520–4532.
- Qian, Y.; Dolan, J.M.; Yang, M. DLT-Net: Joint detection of drivable areas, lane lines, and traffic objects. IEEE Trans. Intell. Transp. Syst. 2019, 21, 4670–4679.
- Soquet, N.; Aubert, D.; Hautiere, N. Road segmentation supervised by an extended v-disparity algorithm for autonomous navigation. In Proceedings of the IEEE Intelligent Vehicles Symposium, Istanbul, Turkey, 13–15 June 2007; pp. 160–165.
- Alvarez, J.M.; Gevers, T.; LeCun, Y.; Lopez, A.M. Road scene segmentation from a single image. In Proceedings of the European Conference on Computer Vision, Florence, Italy, 7–13 October 2012; pp. 376–389.
- Lee, D.G. Fast Drivable Areas Estimation with Multi-Task Learning for Real-Time Autonomous Driving Assistant. Appl. Sci. 2021, 11, 10713.
- Lee, D.G.; Kim, Y.K. Joint Semantic Understanding with a Multilevel Branch for Driving Perception. Appl. Sci. 2022, 12, 2877.
- Knorr, M.; Niehsen, W.; Stiller, C. Online extrinsic multi-camera calibration using ground plane induced homographies. In Proceedings of the IEEE Intelligent Vehicles Symposium, Gold Coast City, Australia, 23–26 June 2013; pp. 236–241.
- Yang, C.; Wang, W.; Zhang, Y.; Zhang, Z.; Shen, L.; Li, Y.; See, J. MLife: A lite framework for machine learning lifecycle initialization. Mach. Learn. 2021, 110, 2993–3013.
- Yang, C.; Yang, Z.; Li, W.; See, J. FatigueView: A Multi-Camera Video Dataset for Vision-Based Drowsiness Detection. IEEE Trans. Intell. Transp. Syst. 2022.
- Liu, J.; Cao, L.; Li, Z.; Tang, X. Plane-based optimization for 3D object reconstruction from single line drawings. IEEE Trans. Pattern Anal. Mach. Intell. 2007, 30, 315–327.
- Chen, X.; Kundu, K.; Zhang, Z.; Ma, H.; Fidler, S.; Urtasun, R. Monocular 3D object detection for autonomous driving. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 2147–2156.
- Qin, Z.; Li, X. MonoGround: Detecting Monocular 3D Objects from the Ground. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA, 19–20 June 2022; pp. 3793–3802.
- Zhou, D.; Dai, Y.; Li, H. Ground-plane-based absolute scale estimation for monocular visual odometry. IEEE Trans. Intell. Transp. Syst. 2019, 21, 791–802.
- Qin, T.; Zheng, Y.; Chen, T.; Chen, Y.; Su, Q. A Light-Weight Semantic Map for Visual Localization towards Autonomous Driving. In Proceedings of the IEEE International Conference on Robotics and Automation, Xi’an, China, 30 May–5 June 2021; pp. 11248–11254.
- Reiher, L.; Lampe, B.; Eckstein, L. A sim2real deep learning approach for the transformation of images from multiple vehicle-mounted cameras to a semantically segmented image in bird’s eye view. In Proceedings of the IEEE International Conference on Intelligent Transportation Systems, Rhodes, Greece, 20–23 September 2020; pp. 1–7.
- Philion, J.; Fidler, S. Lift, Splat, Shoot: Encoding images from arbitrary camera rigs by implicitly unprojecting to 3D. In Proceedings of the European Conference on Computer Vision, Glasgow, UK, 23–28 August 2020; pp. 194–210.
- Li, Q.; Wang, Y.; Wang, Y.; Zhao, H. HDMapNet: An online HD map construction and evaluation framework. In Proceedings of the IEEE International Conference on Robotics and Automation, Philadelphia, PA, USA, 23–27 May 2022; pp. 4628–4634.
- Zhou, J.; Li, B. Robust ground plane detection with normalized homography in monocular sequences from a robot platform. In Proceedings of the International Conference on Image Processing, Atlanta, GA, USA, 8–11 October 2006; pp. 3017–3020.
- Dragon, R.; Van Gool, L. Ground plane estimation using a hidden Markov model. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Columbus, OH, USA, 23–28 June 2014; pp. 4026–4033.
- Sui, W.; Chen, T.; Zhang, J.; Lu, J.; Zhang, Q. Road-Aware Monocular Structure from Motion and Homography Estimation. arXiv 2021, arXiv:2112.08635.
- Xiong, L.; Wen, Y.; Huang, Y.; Zhao, J.; Tian, W. Joint Unsupervised Learning of Depth, Pose, Ground Normal Vector and Ground Segmentation by a Monocular Camera Sensor. Sensors 2020, 20, 3737.
- Man, Y.; Weng, X.; Li, X.; Kitani, K. GroundNet: Monocular ground plane normal estimation with geometric consistency. In Proceedings of the ACM International Conference on Multimedia, Nice, France, 21–25 October 2019; pp. 2170–2178.
- Geiger, A.; Lenz, P.; Urtasun, R. Are we ready for Autonomous Driving? The KITTI Vision Benchmark Suite. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Providence, RI, USA, 16–21 June 2012.
- Gallo, O.; Manduchi, R.; Rafii, A. Robust curb and ramp detection for safe parking using the Canesta TOF camera. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, Anchorage, AK, USA, 23–28 June 2008; pp. 1–8.
- Yu, H.; Zhu, J.; Wang, Y.; Jia, W.; Sun, M.; Tang, Y. Obstacle classification and 3D measurement in unstructured environments based on ToF cameras. Sensors 2014, 14, 10753–10782.
- Choi, S.; Park, J.; Byun, J.; Yu, W. Robust ground plane detection from 3D point clouds. In Proceedings of the International Conference on Control, Automation and Systems, Suwon-si, Republic of Korea, 22–25 October 2014; pp. 1076–1081.
- Zhang, W. LIDAR-based road and road-edge detection. In Proceedings of the IEEE Intelligent Vehicles Symposium, La Jolla, CA, USA, 21–24 June 2010; pp. 845–848.
- McDaniel, M.W.; Nishihata, T.; Brooks, C.A.; Iagnemma, K. Ground plane identification using LIDAR in forested environments. In Proceedings of the IEEE International Conference on Robotics and Automation, Anchorage, AK, USA, 3–7 May 2010; pp. 3831–3836.
- Miadlicki, K.; Pajor, M.; Sakow, M. Ground plane estimation from sparse LIDAR data for loader crane sensor fusion system. In Proceedings of the International Conference on Methods and Models in Automation and Robotics, Międzyzdroje, Poland, 28–31 August 2017; pp. 717–722.
- Lee, Y.H.; Leung, T.S.; Medioni, G. Real-time staircase detection from a wearable stereo system. In Proceedings of the International Conference on Pattern Recognition, Tsukuba, Japan, 11–15 November 2012; pp. 3770–3773.
- Schwarze, T.; Lauer, M. Robust ground plane tracking in cluttered environments from egocentric stereo vision. In Proceedings of the IEEE International Conference on Robotics and Automation, Seattle, WA, USA, 26–30 May 2015; pp. 2442–2447.
- Kusupati, U.; Cheng, S.; Chen, R.; Su, H. Normal assisted stereo depth estimation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 13–19 June 2020; pp. 2189–2199.
- Se, S.; Brady, M. Ground plane estimation, error analysis and applications. Robot. Auton. Syst. 2002, 39, 59–71.
- Chumerin, N.; Van Hulle, M. Ground plane estimation based on dense stereo disparity. In Proceedings of the International Conference on Neural Networks and Artificial Intelligence, Prague, Czech Republic, 3–6 September 2008; pp. 1–5.
- Song, S.; Chandraker, M. Robust scale estimation in real-time monocular SFM for autonomous driving. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Columbus, OH, USA, 23–28 June 2014; pp. 1566–1573.
- Zhou, D.; Dai, Y.; Li, H. Reliable scale estimation and correction for monocular visual odometry. In Proceedings of the IEEE Intelligent Vehicles Symposium, Gothenburg, Sweden, 19–22 June 2016; pp. 490–495.
- Hartley, R.; Zisserman, A. Multiple View Geometry in Computer Vision; Cambridge University Press: Cambridge, UK, 2003.
- Kalman, R.E. A new approach to linear filtering and prediction problems. J. Basic Eng. 1960, 82, 35–45.
- Bonnabel, S. Left-invariant extended Kalman filter and attitude estimation. In Proceedings of the IEEE Conference on Decision and Control, New Orleans, LA, USA, 12–14 December 2007; pp. 1027–1032.
- Barrau, A.; Bonnabel, S. The invariant extended Kalman filter as a stable observer. IEEE Trans. Autom. Control 2016, 62, 1797–1812.
- Mur-Artal, R.; Montiel, J.M.M.; Tardos, J.D. ORB-SLAM: A versatile and accurate monocular SLAM system. IEEE Trans. Robot. 2015, 31, 1147–1163.
- Engel, J.; Koltun, V.; Cremers, D. Direct sparse odometry. IEEE Trans. Pattern Anal. Mach. Intell. 2017, 40, 611–625.
- Yang, N.; Stumberg, L.V.; Wang, R.; Cremers, D. D3VO: Deep depth, deep pose and deep uncertainty for monocular visual odometry. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 14–19 June 2020; pp. 1281–1292.
- Zhang, J.; Sui, W.; Wang, X.; Meng, W.; Zhu, H.; Zhang, Q. Deep online correction for monocular visual odometry. In Proceedings of the IEEE International Conference on Robotics and Automation, Xi’an, China, 30 May–5 June 2021; pp. 14396–14402.
- Wagstaff, B.; Peretroukhin, V.; Kelly, J. On the Coupling of Depth and Egomotion Networks for Self-Supervised Structure from Motion. IEEE Robot. Autom. Lett. 2022, 7, 6766–6773.
- Zhang, S.; Zhang, J.; Tao, D. Towards Scale Consistent Monocular Visual Odometry by Learning from the Virtual World. In Proceedings of the IEEE International Conference on Robotics and Automation, Philadelphia, PA, USA, 23–27 May 2022; pp. 5601–5607.
- Brossard, M.; Barrau, A.; Bonnabel, S. AI-IMU dead-reckoning. IEEE Trans. Intell. Veh. 2020, 5, 585–595.
- Cheng, B.; Misra, I.; Schwing, A.G.; Kirillov, A.; Girdhar, R. Masked-attention mask transformer for universal image segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA, 19–20 June 2022; pp. 1290–1299.
- Fischler, M.A.; Bolles, R.C. Random sample consensus: A paradigm for model fitting with applications to image analysis and automated cartography. Commun. ACM 1981, 24, 381–395.
- Caesar, H.; Bankiti, V.; Lang, A.H.; Vora, S.; Liong, V.E.; Xu, Q.; Krishnan, A.; Pan, Y.; Baldan, G.; Beijbom, O. nuScenes: A multimodal dataset for autonomous driving. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 14–19 June 2020.
Sequence | 00 | 01 | 02 | 03 | 04 | 05 | 06 | 07 | 08 | 09 | Mean | Std |
---|---|---|---|---|---|---|---|---|---|---|---|---|
Pitch | 1.06 | 1.16 | 1.11 | 0.40 | 1.21 | 1.27 | 1.27 | 1.27 | 1.31 | 1.47 | 1.15 | 0.27 |
Roll | 0.92 | 0.59 | 1.20 | 1.30 | 1.46 | 0.99 | 0.78 | 0.70 | 0.93 | 0.91 | 0.98 | 0.26 |
Methods | Error (°) | Time (ms/frame) |
---|---|---|
HMM [24] | 4.10 | - |
Xiong [26] | 3.02 | - |
GroundNet [27] | 0.70 | 920 |
Road Aware [25] | 1.12 | 130 |
Naive [54] | 0.98 | - |
Ours (IMU) | 0.44 | 3 = 2 (IMU odometry) + 1 (IEKF) |
Ours (Monocular) | 0.39 | 50 = 49 (Visual odometry) + 1 (IEKF) |
Methods | Error (°) |
---|---|
Pure odometry (relative) | 1.09 |
Pure odometry (absolute) | 2.98 |
Naive (constant normal) | 0.98 |
Ours (IMU) | 0.44 |
Ours (Monocular) | 0.39 |
© 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).