DOI: 10.1007/978-3-030-87361-5_38
Article

Semantic and Optical Flow Guided Self-supervised Monocular Depth and Ego-Motion Estimation

Published: 06 August 2021

Abstract

Self-supervised depth and camera pose estimation methods were proposed to address the difficulty of acquiring densely labeled ground-truth data, and they have advanced considerably. Because stereo vision can constrain the predicted depth to real-world scale, in this paper we study the use of both left-right pairs and adjacent frames of stereo sequences for self-supervised, semantic and optical flow guided monocular depth and camera pose estimation without ground-truth pose information. In particular, we explore (i) a cascaded structure of depth-pose and optical flow estimation that gives the optical flow a good initialization, (ii) a cycle learning strategy that further constrains depth-pose learning through cross-task consistency, and (iii) a weighted semantic guided smoothness loss that better matches the true structure of a depth map. Our method compares favorably with state-of-the-art methods on several benchmarks. We also demonstrate its generalization ability in a cross-dataset setting.
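
The abstract's remark that stereo vision can fix depth to real-world scale rests on the standard rectified-stereo relation Z = f * B / d, where f is the focal length in pixels, B the baseline in metres, and d the disparity in pixels. The sketch below only illustrates that relation. The function name and the numeric values are assumptions chosen for illustration and are not taken from the paper.

```python
def disparity_to_depth(disparity_px, focal_px, baseline_m):
    """Return metric depth Z = f * B / d for a rectified stereo pair.

    Hypothetical helper for illustration; not the authors' code.
    """
    if disparity_px <= 0:
        raise ValueError("disparity must be positive")
    return focal_px * baseline_m / disparity_px


# Illustrative values only (assumed, not from the paper).
FOCAL_PX = 720.0    # focal length in pixels
BASELINE_M = 0.54   # distance between left and right cameras in metres

depth_m = disparity_to_depth(36.0, FOCAL_PX, BASELINE_M)
print(round(depth_m, 2))  # 10.8 (metres)
```

Because the baseline B is a known physical length, a disparity predicted from a left-right pair maps to depth in metres. A purely monocular sequence has no such known length, which is why depth learned from it is only defined up to an unknown scale factor.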

Published In

Image and Graphics: 11th International Conference, ICIG 2021, Haikou, China, August 6–8, 2021, Proceedings, Part III
Aug 2021
839 pages
ISBN: 978-3-030-87360-8
DOI: 10.1007/978-3-030-87361-5
Editors: Yuxin Peng, Shi-Min Hu, Moncef Gabbouj, Kun Zhou, Michael Elad, Kun Xu

Publisher

Springer-Verlag, Berlin, Heidelberg

Author Tags

1. Self-supervised learning
2. Monocular depth estimation
3. Camera pose estimation
4. Stereo vision
