Neural Radiance Fields for Fisheye Driving Scenes Using Edge-Aware Integrated Depth Supervision
Figure 1. Overview of the proposed pipeline. Given a set of RGB images captured by a fisheye camera in driving scenarios, a monocular depth estimation network is trained to output a densely predicted depth map $\hat{D}$. In addition, the LiDAR points are projected to generate a sparse depth map $D$. These two depth priors are fed to the edge-aware integration module. Training of the radiance field is guided by the RGB images and the integrated depth maps $\tilde{D}$, which inform the model about ray termination. The NeRF takes 5D inputs and is trained with the $L_{color}$ loss and the proposed $L_{edge}$ loss.
Figure 2. Process of integrating the sparse LiDAR projection and the dense estimated depth map. We propose an edge-aware integration loss function to optimize the NeRF model with depth supervision, as detailed in Equation (4). The proposed method guides the NeRF model using depth priors from the scene by minimizing the difference between the ray-termination distribution predicted by the model and the given depth information. The integrated depth is computed with an edge-aware smoothing kernel that exploits both depth priors and assigns larger weights to adjacent points whose depth values are consistent.
Figure 3. View synthesis on the KITTI-360 dataset. The proposed method produces more photorealistic results, as highlighted by the red boxes.
Figure 4. View synthesis on the JBNU-Depth360 dataset. Details of the synthesized images are highlighted with red boxes.
Figure 5. Qualitative results of the ablation study on the edge-aware integration function. Compared with the spatial Gaussian function, the proposed approach better preserves object edges, resulting in a more realistic synthesized image.
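The captions of Figures 1 and 2 describe how the sparse LiDAR projection $D$ and the dense estimated depth map $\hat{D}$ are fused by an edge-aware smoothing kernel that gives larger weight to nearby pixels with consistent depth values. The snippet below is a minimal sketch of one way such a fusion could be implemented; the function name `edge_aware_integrate`, the kernel size, and the `sigma_s`/`sigma_d` parameters are illustrative assumptions rather than the paper's exact formulation.

```python
import numpy as np

def edge_aware_integrate(sparse_lidar, dense_pred, kernel=5,
                         sigma_s=2.0, sigma_d=0.5):
    """Hypothetical edge-aware fusion of a sparse LiDAR depth map and a
    dense predicted depth map (kernel form and parameters are illustrative).

    sparse_lidar : (H, W) array, 0 where no LiDAR return was projected.
    dense_pred   : (H, W) array, dense monocular depth prediction.
    Returns an integrated depth map of shape (H, W).
    """
    H, W = dense_pred.shape
    integrated = dense_pred.copy()
    half = kernel // 2
    ys, xs = np.nonzero(sparse_lidar)  # pixels that received a LiDAR depth
    for y, x in zip(ys, xs):
        d_lidar = sparse_lidar[y, x]
        y0, y1 = max(0, y - half), min(H, y + half + 1)
        x0, x1 = max(0, x - half), min(W, x + half + 1)
        patch = dense_pred[y0:y1, x0:x1]
        yy, xx = np.mgrid[y0:y1, x0:x1]
        # Spatial term: pixels closer to the LiDAR point get larger weight.
        w_spatial = np.exp(-((yy - y) ** 2 + (xx - x) ** 2) / (2 * sigma_s ** 2))
        # Edge-aware term: pixels whose predicted depth agrees with the LiDAR
        # value get larger weight, so depth discontinuities are preserved.
        w_edge = np.exp(-((patch - d_lidar) ** 2) / (2 * sigma_d ** 2))
        w = w_spatial * w_edge
        # Blend the LiDAR value into the local neighbourhood according to w.
        integrated[y0:y1, x0:x1] = (1 - w) * integrated[y0:y1, x0:x1] + w * d_lidar
    return integrated
```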
Abstract
1. Introduction
- We present a method that utilizes a fisheye camera to synthesize novel views in real-world driving scenarios.
- We propose an edge-aware integration loss function that minimizes the difference between the rendered ray-termination distribution and the integrated depth distribution (a minimal sketch is given after this list).
- We demonstrate that the proposed method outperforms other approaches, as shown by the results from the KITTI-360 [13] and JBNU-Depth360 datasets.
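As a rough illustration of the second contribution, the sketch below shows one common way to supervise a radiance field with depth: compute the standard NeRF ray-termination weights and penalize their mismatch with a narrow distribution centered at the integrated depth. This is a DS-NeRF-style stand-in under assumed tensor shapes, not the paper's exact $L_{edge}$, which is defined by its Equation (4).

```python
import torch

def ray_termination_weights(sigmas, deltas):
    """Standard NeRF volume-rendering weights w_i = T_i * (1 - exp(-sigma_i * delta_i)).
    sigmas, deltas: (num_rays, num_samples)."""
    alphas = 1.0 - torch.exp(-sigmas * deltas)
    trans = torch.cumprod(
        torch.cat([torch.ones_like(alphas[:, :1]), 1.0 - alphas + 1e-10], dim=-1),
        dim=-1)[:, :-1]                       # accumulated transmittance T_i
    return trans * alphas

def depth_distribution_loss(weights, t_samples, target_depth, sigma=0.1):
    """Illustrative depth-supervision term: push the ray-termination
    distribution toward a narrow Gaussian around the integrated depth.
    weights, t_samples: (num_rays, num_samples); target_depth: (num_rays,)."""
    target = torch.exp(-0.5 * ((t_samples - target_depth[:, None]) / sigma) ** 2)
    target = target / (target.sum(dim=-1, keepdim=True) + 1e-10)  # normalize per ray
    # Cross-entropy-style mismatch between the two per-ray distributions.
    return -(target * torch.log(weights + 1e-10)).sum(dim=-1).mean()
```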
2. Related Work
2.1. Fisheye and Omni-Directional NeRF
2.2. NeRF with Depth Supervision
3. Method
3.1. Preliminaries
3.2. Depth Supervision
3.3. Edge-Aware Integration Loss
4. Experimental Results
4.1. Datasets
- KITTI-360.
- JBNU-Depth360.
- Evaluation metric.
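The tables in Sections 4.2, 4.3, and 4.4 report PSNR (higher is better), SSIM (higher is better), and LPIPS (lower is better). Below is a minimal PSNR sketch assuming images normalized to [0, 1]; SSIM and LPIPS are typically computed with off-the-shelf implementations such as skimage.metrics.structural_similarity and the lpips package.

```python
import numpy as np

def psnr(pred, gt, max_val=1.0):
    """Peak signal-to-noise ratio for images in [0, max_val] (higher is better)."""
    mse = np.mean((pred.astype(np.float64) - gt.astype(np.float64)) ** 2)
    return float("inf") if mse == 0 else 10.0 * np.log10(max_val ** 2 / mse)
```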
4.2. Results on the KITTI-360 Dataset
4.3. Results on the JBNU-Depth360 Dataset
4.4. Ablation Study
- Depth type comparison.
- Depth integration method comparison.
5. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
References
- Mildenhall, B.; Srinivasan, P.P.; Tancik, M.; Barron, J.T.; Ramamoorthi, R.; Ng, R. Nerf: Representing scenes as neural radiance fields for view synthesis. Commun. ACM 2021, 65, 99–106. [Google Scholar] [CrossRef]
- Martin-Brualla, R.; Radwan, N.; Sajjadi, M.S.; Barron, J.T.; Dosovitskiy, A.; Duckworth, D. Nerf in the wild: Neural radiance fields for unconstrained photo collections. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA, 20–25 June 2021; pp. 7210–7219. [Google Scholar]
- Liu, S.; Zhang, X.; Zhang, Z.; Zhang, R.; Zhu, J.Y.; Russell, B. Editing conditional radiance fields. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Montreal, BC, Canada, 11–17 October 2021; pp. 5773–5783. [Google Scholar]
- Bao, C.; Zhang, Y.; Yang, B.; Fan, T.; Yang, Z.; Bao, H.; Zhang, G.; Cui, Z. Sine: Semantic-driven image-based nerf editing with prior-guided editing field. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Vancouver, BC, Canada, 17–24 June 2023; pp. 20919–20929. [Google Scholar]
- Lin, C.H.; Ma, W.C.; Torralba, A.; Lucey, S. Barf: Bundle-adjusting neural radiance fields. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Montreal, BC, Canada, 11–17 October 2021; pp. 5741–5751. [Google Scholar]
- Jeong, Y.; Ahn, S.; Choy, C.; Anandkumar, A.; Cho, M.; Park, J. Self-calibrating neural radiance fields. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Montreal, BC, Canada, 11–17 October 2021; pp. 5846–5854. [Google Scholar]
- Zeng, Y.; Lei, J.; Feng, T.; Qin, X.; Li, B.; Wang, Y.; Wang, D.; Song, J. Neural Radiance Fields-Based 3D Reconstruction of Power Transmission Lines Using Progressive Motion Sequence Images. Sensors 2023, 23, 9537. [Google Scholar] [CrossRef] [PubMed]
- Ge, H.; Wang, B.; Zhu, Z.; Zhu, J.; Zhou, N. Hash Encoding and Brightness Correction in 3D Industrial and Environmental Reconstruction of Tidal Flat Neural Radiation. Sensors 2024, 24, 1451. [Google Scholar] [CrossRef]
- Rematas, K.; Liu, A.; Srinivasan, P.P.; Barron, J.T.; Tagliasacchi, A.; Funkhouser, T.; Ferrari, V. Urban radiance fields. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA, 18–24 June 2022; pp. 12932–12942. [Google Scholar]
- Xie, Z.; Zhang, J.; Li, W.; Zhang, F.; Zhang, L. S-nerf: Neural radiance fields for street views. arXiv 2023, arXiv:2303.00749. [Google Scholar]
- Deng, L.; Yang, M.; Qian, Y.; Wang, C.; Wang, B. CNN based semantic segmentation for urban traffic scenes using fisheye camera. In Proceedings of the 2017 IEEE Intelligent Vehicles Symposium (IV), Los Angeles, CA, USA, 11–14 June 2017; IEEE: Piscataway, NJ, USA, 2017; pp. 231–236. [Google Scholar]
- Son, E.; Choi, J.; Song, J.; Jin, Y.; Lee, S.J. Monocular Depth Estimation from a Fisheye Camera Based on Knowledge Distillation. Sensors 2023, 23, 9866. [Google Scholar] [CrossRef] [PubMed]
- Liao, Y.; Xie, J.; Geiger, A. KITTI-360: A Novel Dataset and Benchmarks for Urban Scene Understanding in 2D and 3D. IEEE Trans. Pattern Anal. Mach. Intell. 2022, 45, 3292–3310. [Google Scholar] [CrossRef] [PubMed]
- Barron, J.T.; Mildenhall, B.; Tancik, M.; Hedman, P.; Martin-Brualla, R.; Srinivasan, P.P. Mip-nerf: A multiscale representation for anti-aliasing neural radiance fields. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Montreal, BC, Canada, 11–17 October 2021; pp. 5855–5864. [Google Scholar]
- Tang, X.; Yang, M.; Sun, P.; Li, H.; Dai, Y.; Zhu, F.; Lee, H. PaReNeRF: Toward Fast Large-scale Dynamic NeRF with Patch-based Reference. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 16–22 June 2024; pp. 5428–5438. [Google Scholar]
- Li, H.; Zhang, D.; Dai, Y.; Liu, N.; Cheng, L.; Li, J.; Wang, J.; Han, J. GP-NeRF: Generalized Perception NeRF for Context-Aware 3D Scene Understanding. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 16–22 June 2024; pp. 21708–21718. [Google Scholar]
- Gu, K.; Maugey, T.; Knorr, S.; Guillemot, C. Omni-nerf: Neural radiance field from 360 image captures. In Proceedings of the 2022 IEEE International Conference on Multimedia and Expo (ICME), Taipei, Taiwan, 18–22 July 2022; IEEE: Piscataway, NJ, USA, 2022; pp. 1–6. [Google Scholar]
- Kulkarni, S.; Yin, P.; Scherer, S. 360fusionnerf: Panoramic neural radiance fields with joint guidance. arXiv 2022, arXiv:2209.14265. [Google Scholar]
- Choi, J.; Hwang, G.; Lee, S.J. DiCo-NeRF: Difference of Cosine Similarity for Neural Rendering of Fisheye Driving Scenes. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 16–22 June 2024; pp. 7850–7858. [Google Scholar]
- Wei, Y.; Liu, S.; Rao, Y.; Zhao, W.; Lu, J.; Zhou, J. Nerfingmvs: Guided optimization of neural radiance fields for indoor multi-view stereo. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Montreal, BC, Canada, 11–17 October 2021; pp. 5610–5619. [Google Scholar]
- Deng, K.; Liu, A.; Zhu, J.Y.; Ramanan, D. Depth-Supervised NeRF: Fewer Views and Faster Training for Free. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), New Orleans, LA, USA, 18–24 June 2022; pp. 12882–12891. [Google Scholar]
- Schonberger, J.L.; Frahm, J.M. Structure-from-motion revisited. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 4104–4113. [Google Scholar]
- Caesar, H.; Bankiti, V.; Lang, A.H.; Vora, S.; Liong, V.E.; Xu, Q.; Krishnan, A.; Pan, Y.; Baldan, G.; Beijbom, O. nuscenes: A multimodal dataset for autonomous driving. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 13–19 June 2020; pp. 11621–11631. [Google Scholar]
- Sun, P.; Kretzschmar, H.; Dotiwalla, X.; Chouard, A.; Patnaik, V.; Tsui, P.; Guo, J.; Zhou, Y.; Chai, Y.; Caine, B.; et al. Scalability in perception for autonomous driving: Waymo open dataset. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 13–19 June 2020; pp. 2446–2454. [Google Scholar]
- Wang, G.; Chen, Z.; Loy, C.C.; Liu, Z. Sparsenerf: Distilling depth ranking for few-shot novel view synthesis. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Paris, France, 1–6 October 2023; pp. 9065–9076. [Google Scholar]
- Wang, C.; Sun, J.; Liu, L.; Wu, C.; Shen, Z.; Wu, D.; Dai, Y.; Zhang, L. Digging into depth priors for outdoor neural radiance fields. In Proceedings of the 31st ACM International Conference on Multimedia, Ottawa, ON, Canada, 29 October–3 November 2023; pp. 1221–1230. [Google Scholar]
- Silberman, N.; Hoiem, D.; Kohli, P.; Fergus, R. Indoor segmentation and support inference from rgbd images. In Proceedings of the Computer Vision–ECCV 2012: 12th European Conference on Computer Vision, Florence, Italy, 7–13 October 2012; Proceedings, Part V 12. Springer: Berlin/Heidelberg, Germany, 2012; pp. 746–760. [Google Scholar]
- Geiger, A.; Lenz, P.; Urtasun, R. Are we ready for autonomous driving? the kitti vision benchmark suite. In Proceedings of the 2012 IEEE Conference on Computer Vision and Pattern Recognition, Providence, RI, USA, 16–21 June 2012; IEEE: Piscataway, NJ, USA, 2012; pp. 3354–3361. [Google Scholar]
- Lee, J.H.; Han, M.K.; Ko, D.W.; Suh, I.H. From big to small: Multi-scale local planar guidance for monocular depth estimation. arXiv 2019, arXiv:1907.10326. [Google Scholar]
- Tancik, M.; Weber, E.; Ng, E.; Li, R.; Yi, B.; Kerr, J.; Wang, T.; Kristoffersen, A.; Austin, J.; Salahi, K.; et al. Nerfstudio: A modular framework for neural radiance field development. arXiv 2023, arXiv:2302.04264. [Google Scholar]
| Method | PSNR↑ Drive_0002 | PSNR↑ Drive_0007 | PSNR↑ Drive_0009 | PSNR↑ Average | SSIM↑ Drive_0002 | SSIM↑ Drive_0007 | SSIM↑ Drive_0009 | SSIM↑ Average | LPIPS↓ Drive_0002 | LPIPS↓ Drive_0007 | LPIPS↓ Drive_0009 | LPIPS↓ Average |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Nerfacto [30] | 13.466 | 13.699 | 13.650 | 13.024 | 0.580 | 0.583 | 0.627 | 0.537 | 0.532 | 0.607 | 0.439 | 0.583 |
| UrbanNeRF [9] | 13.645 | 13.765 | 13.997 | 13.099 | 0.579 | 0.580 | 0.629 | 0.539 | 0.551 | 0.603 | 0.436 | 0.590 |
| DS-NeRF [21] | 13.673 | 14.262 | 14.023 | 13.240 | 0.580 | 0.577 | 0.631 | 0.538 | 0.532 | 0.603 | 0.439 | 0.584 |
| Ours | 14.180 | 14.920 | 14.102 | 13.678 | 0.577 | 0.594 | 0.641 | 0.548 | 0.521 | 0.594 | 0.641 | 0.557 |
| Method | PSNR↑ Drive_2 | PSNR↑ Drive_3 | PSNR↑ Drive_4 | PSNR↑ Drive_5 | PSNR↑ Average | SSIM↑ Drive_2 | SSIM↑ Drive_3 | SSIM↑ Drive_4 | SSIM↑ Drive_5 | SSIM↑ Average | LPIPS↓ Drive_2 | LPIPS↓ Drive_3 | LPIPS↓ Drive_4 | LPIPS↓ Drive_5 | LPIPS↓ Average |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Nerfacto [30] | 20.102 | 19.811 | 20.155 | 19.002 | 20.388 | 0.763 | 0.737 | 0.754 | 0.749 | 0.763 | 0.561 | 0.544 | 0.524 | 0.565 | 0.542 |
| UrbanNeRF [9] | 20.130 | 19.822 | 20.164 | 19.170 | 20.452 | 0.764 | 0.737 | 0.754 | 0.750 | 0.763 | 0.563 | 0.539 | 0.521 | 0.561 | 0.540 |
| DS-NeRF [21] | 20.157 | 19.819 | 20.341 | 19.071 | 20.468 | 0.763 | 0.738 | 0.754 | 0.750 | 0.763 | 0.564 | 0.540 | 0.521 | 0.568 | 0.541 |
| Ours | 20.177 | 20.028 | 20.356 | 19.214 | 20.511 | 0.763 | 0.739 | 0.755 | 0.755 | 0.764 | 0.553 | 0.526 | 0.509 | 0.556 | 0.529 |
| Dataset | Depth Type | PSNR↑ | SSIM↑ | LPIPS↓ |
|---|---|---|---|---|
| KITTI-360 | LiDAR projection | 13.240 | 0.538 | 0.584 |
| KITTI-360 | Estimated depth | 13.555 | 0.539 | 0.578 |
| KITTI-360 | Integrated depth | 13.678 | 0.548 | 0.557 |
| JBNU-Depth360 | LiDAR projection | 20.468 | 0.763 | 0.541 |
| JBNU-Depth360 | Estimated depth | 20.401 | 0.761 | 0.542 |
| JBNU-Depth360 | Integrated depth | 20.511 | 0.764 | 0.529 |
| Dataset | Method | PSNR↑ | SSIM↑ | LPIPS↓ |
|---|---|---|---|---|
| KITTI-360 | Spatial Gaussian | 13.502 | 0.533 | 0.592 |
| KITTI-360 | Proposed method | 13.678 | 0.548 | 0.557 |
| JBNU-Depth360 | Spatial Gaussian | 19.899 | 0.753 | 0.549 |
| JBNU-Depth360 | Proposed method | 20.511 | 0.764 | 0.529 |