Self-supervised learning monocular depth estimation from internet photos

Published: 02 July 2024

Abstract

Monocular depth estimation (MDE) is a fundamental problem in computer vision. Recently, self-supervised learning (SSL) approaches have attracted significant attention due to their ability to train an MDE network without ground-truth depth data. However, the performance of most existing SSL-MDE methods is still limited by the available real training data, which consist of either binocular stereo pairs or monocular video sequences. In this paper, we propose a simple but effective generalization of the SSL framework that enables collections of multi-view Internet photos, a virtually unlimited source of real data, to be used for training an MDE network. By combining a depth-consistency constraint with a mask that suppresses interference such as moving objects, the network benefits from real correspondences across adjacent views and thus achieves improved accuracy. Experiments show that generalizing Monodepth2 via the proposed method not only outperforms the original model and several data-driven MDE methods, but also consistently boosts the performance of multiple state-of-the-art SSL-MDE methods. In addition, experiments on SeasonDepth, a dataset covering various environmental conditions, demonstrate the good generalization capability of the proposed method.
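The training objective sketched in the abstract — a photometric reprojection loss, a depth-consistency term, and a mask that discards pixels violating the static-scene assumption — can be illustrated as follows. This is a minimal pure-Python sketch under assumed conventions, not the paper's implementation; the function name `masked_ssl_loss`, the weight `alpha`, and the Monodepth2-style auto-mask criterion are all illustrative choices.

```python
def masked_ssl_loss(photo_err, identity_err, depth_tgt, depth_warped, alpha=0.85):
    """Average a combined photometric + depth-consistency loss over the
    pixels that pass a Monodepth2-style auto-mask.

    photo_err    -- per-pixel photometric error after warping the adjacent
                    view into the target view using the predicted depth
    identity_err -- per-pixel error between the two unwarped views
    depth_tgt    -- predicted depth in the target view
    depth_warped -- depth reprojected from the adjacent view
    alpha        -- weight between photometric and depth-consistency terms
    """
    total, kept = 0.0, 0
    for pe, ie, d1, d2 in zip(photo_err, identity_err, depth_tgt, depth_warped):
        # Auto-mask: keep a pixel only if warping actually reduces the
        # error; moving objects and occlusions typically fail this test.
        if pe < ie:
            # Scale-invariant depth-consistency term between the two views.
            dc = abs(d1 - d2) / (d1 + d2)
            total += alpha * pe + (1.0 - alpha) * dc
            kept += 1
    return total / max(kept, 1)


# Toy example: the middle pixel fails the auto-mask and is ignored.
loss = masked_ssl_loss(
    photo_err=[0.1, 0.5, 0.2],
    identity_err=[0.3, 0.4, 0.25],
    depth_tgt=[2.0, 1.0, 4.0],
    depth_warped=[2.0, 3.0, 5.0],
)
print(round(loss, 4))  # ~0.1358
```

The key design point, per the abstract, is that real correspondences between adjacent Internet photos supply the supervisory signal, while the mask keeps unreliable pixels (e.g. moving objects) from corrupting it.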



Published In

Journal of Visual Communication and Image Representation, Volume 99, Issue C
March 2024
249 pages

Publisher

Academic Press, Inc.

United States


Author Tags

  1. Monocular depth estimation
  2. Self-supervised learning
  3. Internet-data-driven

Qualifiers

  • Research-article
