Improving 3D Reconstruction Through RGB-D Sensor Noise Modeling
Figure 1. Our noise model, when integrated with KinectFusion [10], improves the quality of the surface estimation, highlighted in colored boxes (middle), compared with the quality without noise filtering (left). On the right, the noisy depth filtering using our noise model (top right) effectively captures high-resolution details, such as the ridges in the ear (green box) and the background texture (red box), which are absent in the unfiltered version (bottom right).
Figure 2. Illustration of Zivid camera noise components at an arbitrary point, P(x, y, z), measured by the camera. Axial noise, σ_Z, and lateral noise, σ_L, represent the uncertainty of the measured location of point P along the z-axis and the x- and y-axes, respectively.
Figure 3. Our experimental setup for modeling sensor noise includes a robot arm (b), a planar target (c), and a Zivid 2 RGB-D structured light sensor (d), as shown in (a).
Figure 4. Axial noise modeling. Fitted surface models for the axial noise measurements in (a) using (b) a 7th-order bivariate polynomial and (c) a bivariate exponential function plus a 2nd-order bivariate polynomial; (d) bilinearly interpolated image encoding of the axial noise values corresponding to (a). The axis labels and titles are provided for illustrative purposes.
Figure 5. Pre-processing steps for lateral noise modeling. (a) Depth map with the segmented edge of the planar target (marked in red); (b–d) line fitted to the pixels corresponding to the edge at far (b), medium (c), and near (d) distances. Rows indicate the rotation angle of the target, varying from 0° (top) to 60° (bottom).
Figure 6. Lateral noise against different distances and angles.
Figure 7. Variations in axial (top) and lateral (bottom) noise due to lighting conditions (left), number of captures (middle), and exposure time (right), shown as a function of distance at a fixed angle of 0°.
Figure 8. The experimental setup for dataset collection. From top to bottom: (a) complete setup with a robot arm, camera, and object placed on a rotating table for 0° and 45° rotation of the object; (b) corresponding RGB and scaled depth images captured; and (c) other object sequences captured: Dragon, Ganesh, Rock, and Dino.
Figure 9. Poisson 3D surface reconstructions with point cloud data merged based on pair-wise ICP registration followed by pose graph optimization for (a) Shiva and (b) Gripper.
Figure 10. Qualitative results from the Shiva dataset, in both top-front and bottom-back views, highlighting the improved reconstruction quality (b), shown in the colored boxes, when noisy depth measurements are filtered using our noise model in the KinectFusion algorithm. The improvement is particularly noticeable compared with the reconstruction without depth filtering (a) and against the traditional reconstruction pipeline (c) discussed in Section 4.1.
Figure 11. Comparison of trajectories for different objects with and without noise filtering. The results demonstrate superior trajectories with noise filtering (green and blue) compared with those without noise filtering (red). A circle is fitted to the trajectory obtained with the axial and lateral noise models (blue). The axes are not of the same scale.
Figure 12. Qualitative results of integrating our noise model into Point-SLAM on the Shiva dataset. Highlighted boxes show the improved reconstruction: Point-SLAM baseline (a), noisy depth filtering using our noise model (b), and depth loss term using our noise model (c). Please zoom in on the red box for better visualization.
Abstract
1. Introduction
- To improve 3D scanning in advanced manufacturing, we empirically model the axial and lateral noise characteristics of a high-resolution Zivid RGB-D sensor as a function of the measurement distance and surface angle of the scanned objects. We also provide insights into sensor performance under different lighting conditions, exposure times, and capture settings.
- We demonstrate how to employ our noise models in 3D reconstruction pipelines, from traditional to neural-based methods, to improve both 3D reconstruction quality and pose estimation, using bilinear interpolation as well as fitted analytical functions (see the lookup sketch after this list).
- We collect and publish a new dataset that contains high-resolution RGB-D scans of objects with complex geometry that can be used in many applications including pose estimation and 3D reconstruction.
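To make the bilinear-interpolation lookup mentioned above concrete, the sketch below shows one way to query a per-pixel noise value from an image-encoded noise map sampled on a (distance, angle) grid, as in Figure 4d. The grid values, the array `noise_map`, and the helper `lookup_noise` are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def lookup_noise(noise_map, z, theta, z_grid, theta_grid):
    """Bilinearly interpolate a noise value (e.g., axial sigma) from a map
    sampled at discrete distances z_grid and angles theta_grid."""
    # Indices of the lower-left grid cell enclosing (z, theta).
    i = int(np.clip(np.searchsorted(z_grid, z) - 1, 0, len(z_grid) - 2))
    j = int(np.clip(np.searchsorted(theta_grid, theta) - 1, 0, len(theta_grid) - 2))
    tz = (z - z_grid[i]) / (z_grid[i + 1] - z_grid[i])
    tt = (theta - theta_grid[j]) / (theta_grid[j + 1] - theta_grid[j])
    # Standard bilinear blend of the four surrounding samples.
    return ((1 - tz) * (1 - tt) * noise_map[i, j]
            + tz * (1 - tt) * noise_map[i + 1, j]
            + (1 - tz) * tt * noise_map[i, j + 1]
            + tz * tt * noise_map[i + 1, j + 1])

# Hypothetical usage with a grid matching the sweep described in Section 3.2.
z_grid = np.arange(0.375, 1.07 + 1e-6, 0.025)              # metres
theta_grid = np.deg2rad(np.arange(0, 91, 10))              # radians
noise_map = np.full((len(z_grid), len(theta_grid)), 1e-3)  # placeholder sigma values
sigma = lookup_noise(noise_map, z=0.6, theta=np.deg2rad(25),
                     z_grid=z_grid, theta_grid=theta_grid)
```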
2. Related Work
2.1. Noise Modeling of RGB-D Sensors
2.2. Application of Noise Models to Improve Computer Vision Tasks
2.3. 3D Reconstruction from Multi-View RGB-D Scans
3. Noise Modeling
3.1. Noise Characteristics
3.2. Experimental Setup
- Distance z: the distance between the target and the camera, z, is varied from 37.5 cm to 107.0 cm in steps of 2.5 cm. This axial movement of the camera is performed by moving the robot arm towards and away from the target in a straight line perpendicular to the planar target.
- Angle θ: the angle of the planar target is varied from 0° to 90° in steps of 10° by rotating the target about an axis perpendicular to the camera’s principal axis.
- Lighting conditions: the data collection process involves scans of the calibration board under both light and dark conditions at 0° for all distances.
- Exposure time: the exposure time, i.e., the duration for which the shutter remains open, is varied from 1677 µs to 100 ms.
- Number of captures: the number of captures is varied from 1 to 5, combining multiple acquisitions with different apertures to enable capture at a high dynamic range. The full acquisition sweep is enumerated in the sketch after this list.
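For concreteness, here is a minimal sketch of how the acquisition sweep above could be enumerated. The variable names, the sampled exposure values, and the grouping of the lighting, exposure, and capture-count variations with the 0° poses are illustrative assumptions, not the authors' acquisition code.

```python
import numpy as np

# Sweep parameters as listed above (Section 3.2).
distances_cm = np.arange(37.5, 107.0 + 1e-6, 2.5)  # camera-target distance, ~37.5-107 cm
angles_deg = np.arange(0, 91, 10)                   # planar-target angle, 0-90 deg
lighting = ["light", "dark"]                        # studied at 0 deg for all distances
exposures_us = [1677, 10_000, 100_000]              # assumed samples within 1677 us - 100 ms
num_captures = [1, 2, 3, 4, 5]                      # multi-acquisition HDR settings

# Core grid: every distance paired with every target angle.
core_poses = [(z, a) for z in distances_cm for a in angles_deg]

# Additional variations, each studied separately at 0 deg across all distances (cf. Figure 7).
lighting_scans = [(z, 0, cond) for z in distances_cm for cond in lighting]
exposure_scans = [(z, 0, exp) for z in distances_cm for exp in exposures_us]
capture_scans = [(z, 0, n) for z in distances_cm for n in num_captures]
```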
3.3. Axial Noise
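As a companion to the fitted surface in Figure 4b, the following is a minimal sketch of fitting a bivariate polynomial surface σ_Z(z, θ) to measured axial noise samples by linear least squares. The design-matrix construction, the placeholder data, and the use of `numpy.linalg.lstsq` are assumptions for illustration; they are not the paper's implementation.

```python
import numpy as np

def poly_design_matrix(z, theta, degree):
    """Columns are all monomials z^(d-i) * theta^i with total degree d <= degree."""
    cols = [np.ones_like(z)]
    for d in range(1, degree + 1):
        for i in range(d + 1):
            cols.append(z ** (d - i) * theta ** i)
    return np.stack(cols, axis=1)

# Measured axial-noise samples (placeholders): distance z in metres, angle theta in radians.
rng = np.random.default_rng(0)
z = rng.uniform(0.375, 1.07, 500)
theta = rng.uniform(0.0, np.deg2rad(90), 500)
sigma_z = 1e-4 + 5e-4 * z**2 + 2e-4 * theta + 1e-5 * rng.standard_normal(500)

A = poly_design_matrix(z, theta, degree=7)             # 7th-order surface as in Figure 4b
coeffs, *_ = np.linalg.lstsq(A, sigma_z, rcond=None)   # least-squares fit of the surface

def predict_sigma_z(z_q, theta_q, degree=7):
    """Evaluate the fitted axial-noise surface at a query distance/angle."""
    return poly_design_matrix(np.atleast_1d(z_q), np.atleast_1d(theta_q), degree) @ coeffs
```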
3.4. Lateral Noise
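Lateral noise is characterized from the edge of the planar target (Figure 5): a line is fitted to the detected edge pixels, and the spread of those pixels around the line quantifies the lateral uncertainty. The sketch below assumes that spread is taken as the standard deviation of perpendicular pixel distances from a total-least-squares line, converted to metric units using the depth and focal length; the exact procedure and the names used here are assumptions, not the paper's code.

```python
import numpy as np

def lateral_noise_from_edge(edge_px, z_m, fx):
    """edge_px: (N, 2) array of (u, v) pixel coordinates along one target edge.
    Returns an estimated lateral noise in metres: std of perpendicular distances
    from a total-least-squares line fit, scaled by the pixel footprint z / fx."""
    pts = edge_px - edge_px.mean(axis=0)
    # Line direction = principal axis of the centred edge points (TLS fit via SVD).
    _, _, vt = np.linalg.svd(pts, full_matrices=False)
    direction = vt[0]
    normal = np.array([-direction[1], direction[0]])
    dist_px = pts @ normal                     # signed perpendicular distances in pixels
    sigma_px = dist_px.std(ddof=1)
    return sigma_px * z_m / fx                 # convert pixels to metres at depth z

# Hypothetical usage: a near-vertical edge observed at 0.6 m depth, fx given in pixels.
rng = np.random.default_rng(0)
edge = np.column_stack([320.0 + 0.4 * rng.standard_normal(200),
                        np.linspace(100, 300, 200)])
print(lateral_noise_from_edge(edge, z_m=0.6, fx=1900.0))
```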
3.5. Other Effects to Sensor Noise
4. Applying Noise Model to Improve 3D Reconstruction
4.1. Datasets
4.2. Improving KinectFusion Using Our Noise Model
Algorithm 1 TSDF integration considering both axial and lateral noise using bilinear interpolation
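The body of Algorithm 1 is not reproduced here, but the sketch below illustrates the general idea of noise-aware TSDF integration: depth samples whose difference from the voxel depth exceeds the truncation band plus a multiple of the per-pixel axial noise (obtained, for example, via a bilinear lookup such as the `lookup_noise` sketch in the Introduction) are filtered out, and accepted samples are fused with a weight that decreases as the noise grows. The function names, the voxel layout, and the specific weight formula are assumptions for illustration, not the paper's Algorithm 1.

```python
import numpy as np

def integrate_depth(tsdf, weights, depth, sigma_axial_map, voxel_pts_cam, K,
                    trunc=0.01, reject_k=3.0, sigma0=1e-3):
    """One noise-aware TSDF integration step.
    tsdf, weights: flat arrays over N voxels; voxel_pts_cam: (N, 3) voxel centres in
    camera coordinates; sigma_axial_map: per-pixel axial noise (same shape as depth),
    e.g., bilinearly interpolated from the (distance, angle) noise model."""
    fx, fy, cx, cy = K
    x, y, z = voxel_pts_cam.T
    u = np.round(fx * x / np.maximum(z, 1e-9) + cx).astype(int)
    v = np.round(fy * y / np.maximum(z, 1e-9) + cy).astype(int)
    valid = (z > 0) & (u >= 0) & (u < depth.shape[1]) & (v >= 0) & (v < depth.shape[0])

    d = depth[v[valid], u[valid]]
    sigma = sigma_axial_map[v[valid], u[valid]]
    sdf = d - z[valid]

    # Filter: keep measured samples within the truncation band plus k-sigma of axial noise.
    keep = (d > 0) & (np.abs(sdf) < trunc + reject_k * sigma)
    idx = np.flatnonzero(valid)[keep]
    sdf_t = np.clip(sdf[keep] / trunc, -1.0, 1.0)

    # Noise-dependent fusion weight: noisier pixels contribute less (sigma0 = reference noise).
    w_new = 1.0 / (1.0 + (sigma[keep] / sigma0) ** 2)
    tsdf[idx] = (weights[idx] * tsdf[idx] + w_new * sdf_t) / (weights[idx] + w_new)
    weights[idx] += w_new
    return tsdf, weights
```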
4.3. Improving Neural Implicit SLAM Using Our Noise Model
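One way to use the noise model inside a neural implicit SLAM system such as Point-SLAM [9] is to downweight the per-pixel depth loss by the predicted measurement variance and to drop pixels whose depth deviates implausibly from the rendered estimate, which corresponds to the noisy-depth-filtering and depth-loss variants compared in Figure 12. The sketch below is a hedged PyTorch illustration of such a noise-weighted depth loss; the tensor names and the inverse-variance weighting are assumptions, not the exact loss used in the paper.

```python
import torch

def noise_weighted_depth_loss(rendered_depth, measured_depth, sigma_z, k=3.0):
    """Per-pixel depth loss weighted by the inverse variance of the sensor noise.
    rendered_depth, measured_depth, sigma_z: (H, W) tensors; sigma_z is the axial
    noise predicted by the noise model for each pixel (assumed to be given)."""
    valid = (measured_depth > 0) & torch.isfinite(sigma_z)
    residual = rendered_depth - measured_depth
    # Optionally reject gross outliers beyond k standard deviations of the sensor noise.
    valid &= residual.abs() < k * sigma_z
    w = 1.0 / (sigma_z ** 2 + 1e-8)            # inverse-variance weights
    return (w[valid] * residual[valid] ** 2).mean()
```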
5. Conclusions
Supplementary Materials
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Acknowledgments
Conflicts of Interest
References
- Han, J.; Shao, L.; Xu, D.; Shotton, J. Enhanced Computer Vision with Microsoft Kinect Sensor: A Review. IEEE Trans. Cybern. 2013, 43, 1318–1334. [Google Scholar] [PubMed]
- Haider, A.; Hel-Or, H. What Can We Learn from Depth Camera Sensor Noise? Sensors 2022, 22, 5448. [Google Scholar] [CrossRef] [PubMed]
- Jing, C.; Potgieter, J.; Noble, F.; Wang, R. A comparison and analysis of RGB-D cameras’ depth performance for robotics application. In Proceedings of the 2017 24th International Conference on Mechatronics and Machine Vision in Practice (M2VIP), Auckland, New Zealand, 21–23 November 2017; pp. 1–6. [Google Scholar]
- Tölgyessy, M.; Dekan, M.; Chovanec, L.; Hubinský, P. Evaluation of the Azure Kinect and its comparison to Kinect v1 and Kinect v2. Sensors 2021, 21, 413. [Google Scholar] [CrossRef]
- Nguyen, C.V.; Izadi, S.; Lovell, D. Modeling Kinect sensor noise for improved 3D reconstruction and tracking. In Proceedings of the 2012 Second International Conference on 3D Imaging, Modeling, Processing, Visualization & Transmission, Zurich, Switzerland, 13–15 October 2012; pp. 524–530. [Google Scholar]
- Fankhauser, P.; Bloesch, M.; Rodriguez, D.; Kaestner, R.; Hutter, M.; Siegwart, R. Kinect v2 for mobile robot navigation: Evaluation and modeling. In Proceedings of the 2015 International Conference on Advanced Robotics (ICAR), Istanbul, Turkey, 27–31 July 2015; pp. 388–394. [Google Scholar]
- Mallick, T.; Das, P.P.; Majumdar, A.K. Characterizations of noise in Kinect depth images: A review. IEEE Sensors J. 2014, 14, 1731–1740. [Google Scholar] [CrossRef]
- Khoshelham, K.; Elberink, S.O. Accuracy and Resolution of Kinect Depth Data for Indoor Mapping Applications. Sensors 2012, 12, 1437–1454. [Google Scholar] [CrossRef]
- Sandström, E.; Li, Y.; Van Gool, L.; Oswald, M.R. Point-SLAM: Dense neural point cloud-based SLAM. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Paris, France, 2–6 October 2023; pp. 18433–18444. [Google Scholar]
- Izadi, S.; Kim, D.; Hilliges, O.; Molyneaux, D.; Newcombe, R.; Kohli, P.; Shotton, J.; Hodges, S.; Freeman, D.; Davison, A.; et al. KinectFusion: Real-time 3D reconstruction and interaction using a moving depth camera. In Proceedings of the 24th Annual ACM Symposium on User Interface Software and Technology, Santa Barbara, CA, USA, 16–19 October 2011; pp. 559–568. [Google Scholar]
- Choo, B.; DeVore, M.D.; Beling, P.A. Statistical models of horizontal and vertical stochastic noise for the Microsoft Kinect™ sensor. In Proceedings of the IECON 2014—40th Annual Conference of the IEEE Industrial Electronics Society, Dallas, TX, USA, 29 October–1 November 2014; pp. 2624–2630. [Google Scholar]
- Halmetschlager-Funek, G.; Suchi, M.; Kampel, M.; Vincze, M. An empirical evaluation of ten depth cameras: Bias, precision, lateral noise, different lighting conditions and materials, and multiple sensor setups in indoor environments. IEEE Robot. Autom. Mag. 2018, 26, 67–77. [Google Scholar] [CrossRef]
- Dong, W.; Wang, Q.; Wang, X.; Zha, H. PSDF fusion: Probabilistic signed distance function for on-the-fly 3D data fusion and scene reconstruction. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018; pp. 701–717. [Google Scholar]
- Proenca, P.F.; Gao, Y. Probabilistic RGB-D odometry based on points, lines and planes under depth uncertainty. Robot. Auton. Syst. 2018, 104, 25–39. [Google Scholar] [CrossRef]
- Dryanovski, I.; Valenti, R.G.; Xiao, J. Fast visual odometry and mapping from RGB-D data. In Proceedings of the International Conference on Robotics and Automation (ICRA), Karlsruhe, Germany, 6–10 May 2013; pp. 2305–2310. [Google Scholar]
- Gutierrez-Gomez, D.; Mayol-Cuevas, W.; Guerrero, J.J. Dense RGB-D visual odometry using inverse depth. Robot. Auton. Syst. 2016, 75, 571–583. [Google Scholar] [CrossRef]
- Wasenmüller, O.; Ansari, M.D.; Stricker, D. Dense noise aware SLAM for tof RGB-D cameras. In Proceedings of the Asian Conference on Computer Vision Workshop (ACCV Workshop), Taipei, Taiwan, 20–24 November 2016. [Google Scholar]
- Yamaguchi, T.; Emaru, T.; Kobayashi, Y.; Ravankar, A.A. 3D map-building from RGB-D data considering noise characteristics of Kinect. In Proceedings of the 2016 IEEE/SICE International Symposium on System Integration (SII), Sapporo, Japan, 13–15 December 2016; pp. 379–384. [Google Scholar]
- Rosinol, A.; Leonard, J.J.; Carlone, L. Probabilistic Volumetric Fusion for Dense Monocular SLAM. In Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, Waikoloa, HI, USA, 2–7 January 2023; pp. 3097–3105. [Google Scholar]
- Brandao, M.; Figueiredo, R.; Takagi, K.; Bernardino, A.; Hashimoto, K.; Takanishi, A. Placing and scheduling many depth sensors for wide coverage and efficient mapping in versatile legged robots. Int. J. Robot. Res. 2020, 39, 431–460. [Google Scholar] [CrossRef]
- Lu, Y.; Song, D. Robust RGB-D odometry using point and line features. In Proceedings of the International Conference on Computer Vision (ICCV), Santiago, Chile, 7–13 December 2015. [Google Scholar]
- Smisek, J.; Jancosek, M.; Pajdla, T. 3D with Kinect. In Consumer Depth Cameras for Computer Vision; Springer: Berlin/Heidelberg, Germany, 2013; pp. 3–25. [Google Scholar]
- Iversen, T.M.; Kraft, D. Generation of synthetic Kinect depth images based on empirical noise model. Electron. Lett. 2017, 53, 856–858. [Google Scholar] [CrossRef]
- Glover, J.; Popovic, S. Bingham procrustean alignment for object detection in clutter. In Proceedings of the 2013 IEEE/RSJ International Conference on Intelligent Robots and Systems, Tokyo, Japan, 3–7 November 2013; pp. 2158–2165. [Google Scholar]
- Saulnier, K.; Atanasov, N.; Pappas, G.J.; Kumar, V. Information theoretic active exploration in signed distance fields. In Proceedings of the 2020 IEEE International Conference on Robotics and Automation (ICRA), Paris, France, 31 May–31 August 2020; pp. 4080–4085. [Google Scholar]
- Kerl, C.; Sturm, J.; Cremers, D. Robust odometry estimation for RGB-D cameras. In Proceedings of the International Conference on Robotics and Automation (ICRA), Karlsruhe, Germany, 6–10 May 2013; pp. 3748–3754. [Google Scholar]
- Kerl, C.; Sturm, J.; Cremers, D. Dense visual SLAM for RGB-D cameras. In Proceedings of the International Conference on Intelligent Robots and Systems (IROS), Tokyo, Japan, 3–7 November 2013; pp. 2100–2106. [Google Scholar]
- Dhawale, A.; Michael, N. Efficient Parametric Multi-Fidelity Surface Mapping. In Proceedings of the Robotics: Science and Systems, Virtually, Los Angeles, CA, USA, 12–16 July 2020. [Google Scholar]
- Oleynikova, H.; Taylor, Z.; Fehr, M.; Siegwart, R.; Nieto, J. Voxblox: Incremental 3D Euclidean Signed Distance Fields for On-Board MAV Planning. In Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Vancouver, BC, Canada, 24–28 September 2017; pp. 1366–1373. [Google Scholar]
- Ran, Y.; Zeng, J.; He, S.; Li, L.; Chen, Y.; Lee, G.; Chen, J.; Ye, Q. NeurAR: Neural Uncertainty for Autonomous 3D Reconstruction With Implicit Neural Representations. IEEE Robot. Autom. Lett. 2023, 8, 1125–1132. [Google Scholar] [CrossRef]
- Osvaldová, K.; Gajdošech, L.; Kocur, V.; Madaras, M. Enhancement of 3D Camera Synthetic Training Data with Noise Models. arXiv 2024, arXiv:2402.16514. [Google Scholar]
- Cai, Y.; Plozza, D.; Marty, S.; Joseph, P.; Magno, M. Noise Analysis and Modeling of the PMD Flexx2 Depth Camera for Robotic Applications. In Proceedings of the 2024 IEEE International Conference on Omni-layer Intelligent Systems (COINS), London, UK, 29–31 July 2024; pp. 1–6. [Google Scholar]
- Rustler, L.; Volprecht, V.; Hoffmann, M. Empirical Comparison of Four Stereoscopic Depth Sensing Cameras for Robotics Applications. arXiv 2025, arXiv:2501.07421. [Google Scholar] [CrossRef]
- Burger, L.; Sharan, L.; Karl, R.; Wang, C.; Karck, M.; De Simone, R.; Wolf, I.; Romano, G.; Engelhardt, S. Comparative evaluation of three commercially available markerless depth sensors for close-range use in surgical simulation. Int. J. Comput. Assist. Radiol. Surg. 2023, 18, 1109–1118. [Google Scholar] [CrossRef] [PubMed]
- Ramli, I.S.; OK Rahmat, R.W.; Ng, S.B. Enhancement of Depth Value Approximation for 3D Image-Based Modelling using Noise Filtering and Inverse Perspective Mapping Techniques for Complex Object. J. Comput. Res. Innov. 2023, 8, 246–264. [Google Scholar]
- Nießner, M.; Zollhöfer, M.; Izadi, S.; Stamminger, M. Real-time 3D reconstruction at scale using voxel hashing. ACM Trans. Graph. (Tog) 2013, 32, 169. [Google Scholar] [CrossRef]
- Newcombe, R.A.; Fox, D.; Seitz, S.M. DynamicFusion: Reconstruction and tracking of non-rigid scenes in real-time. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Boston, MA, USA, 7–12 June 2015; pp. 343–352. [Google Scholar]
- Runz, M.; Buffier, M.; Agapito, L. MaskFusion: Real-time recognition, tracking and reconstruction of multiple moving objects. In Proceedings of the 2018 IEEE International Symposium on Mixed and Augmented Reality (ISMAR), Munich, Germany, 16–20 October 2018; pp. 10–20. [Google Scholar]
- Dai, A.; Nießner, M.; Zollhöfer, M.; Izadi, S.; Theobalt, C. BundleFusion: Real-time globally consistent 3D reconstruction using on-the-fly surface reintegration. ACM Trans. Graph. (ToG) 2017, 36, 1. [Google Scholar] [CrossRef]
- Zhang, Y.; Tosi, F.; Mattoccia, S.; Poggi, M. Go-SLAM: Global optimization for consistent 3D instant reconstruction. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Paris, France, 2–6 October 2023; pp. 3727–3737. [Google Scholar]
- Xu, Q.; Xu, Z.; Philip, J.; Bi, S.; Shu, Z.; Sunkavalli, K.; Neumann, U. Point-NeRF: Point-based neural radiance fields. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA, 18–24 June 2022; pp. 5438–5448. [Google Scholar]
- Besl, P.J.; McKay, N.D. A method for registration of 3-D shapes. IEEE Trans. Pattern Anal. Mach. Intell. 1992, 14, 239–256. [Google Scholar] [CrossRef]
- Keetha, N.; Karhade, J.; Jatavallabhula, K.M.; Yang, G.; Scherer, S.; Ramanan, D.; Luiten, J. SplaTAM: Splat, Track and Map 3D Gaussians for Dense RGB-D SLAM. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 17–21 June 2024; pp. 21357–21366. [Google Scholar]
| Objects | Without Noise | Axial Noise | Axial-Lateral Noise |
|---|---|---|---|
| Shiva | (0.00613, 0.42107) | (0.00184, 0.12915) | (0.00178, 0.12375) |
| Controller | (1.13467, 33.417) | (1.08114, 48.397) | (0.00579, 0.58854) |
| Rock | (0.00240, 0.31135) | (0.00086, 0.13408) | (0.00080, 0.10139) |
| Dragon | (0.00250, 0.41372) | (0.01116, 1.7199) | (0.00820, 1.2446) |
| Dino | (0.01085, 1.159) | (0.01121, 1.1883) | (0.01131, 1.2041) |
| Method | Prec.↑ | Rec.↑ | F↑ |
|---|---|---|---|
| Baseline | 0.154 | 0.435 | 0.228 |
| Without Noise | 0.361 | 0.479 | 0.412 |
| Axial Noise | 0.382 | 0.485 | 0.427 |
| Axial-Lateral Noise | 0.385 | 0.482 | 0.428 |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Share and Cite
Afzal Maken, F.; Muthu, S.; Nguyen, C.; Sun, C.; Tong, J.; Wang, S.; Tsuchida, R.; Howard, D.; Dunstall, S.; Petersson, L. Improving 3D Reconstruction Through RGB-D Sensor Noise Modeling. Sensors 2025, 25, 950. https://doi.org/10.3390/s25030950