
Transformer guided geometry model for flow-based unsupervised visual odometry

  • Original Article
  • Published in: Neural Computing and Applications (2021)

Abstract

Existing unsupervised visual odometry (VO) methods either match pairwise images or integrate temporal information over a long sequence of images using recurrent neural networks. These approaches are either inaccurate, time-consuming to train, or prone to error accumulation. In this paper, we propose a method consisting of two camera pose estimators that handle the information from pairwise images and from a short sequence of images, respectively. For image sequences, a transformer-like structure is adopted to build a geometry model over a local temporal window, referred to as the transformer-based auxiliary pose estimator (TAPE). Meanwhile, a flow-to-flow pose estimator (F2FPE) is proposed to exploit the relationship between pairwise images. The two estimators are constrained during training through a simple yet effective consistency loss. Empirical evaluation shows that the proposed method outperforms state-of-the-art unsupervised learning-based methods by a large margin and performs comparably to supervised and traditional ones on the KITTI and Málaga datasets.
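To make the two-estimator consistency idea concrete, below is a minimal PyTorch sketch of a pairwise pose estimator and a sequence pose estimator tied together by a consistency loss. The module internals, tensor shapes, and the L1 form of the loss are illustrative assumptions, not the authors' implementation; only the names TAPE and F2FPE and the two-estimator design come from the abstract.

```python
# Minimal sketch of the two-estimator consistency idea. All architecture
# details (layer sizes, window length, L1 consistency) are assumptions.
import torch
import torch.nn as nn

class F2FPE(nn.Module):
    """Hypothetical flow-to-flow pose estimator: maps the optical flow of an
    image pair to a 6-DoF relative pose (3 translation + 3 rotation)."""
    def __init__(self, flow_channels=2):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(flow_channels, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        self.head = nn.Linear(32, 6)

    def forward(self, flow):                       # flow: (B, 2, H, W)
        return self.head(self.encoder(flow).flatten(1))   # (B, 6)

class TAPE(nn.Module):
    """Hypothetical transformer-based auxiliary pose estimator: attends over
    per-frame features of a short window and predicts the same 6-DoF pose."""
    def __init__(self, dim=32):
        super().__init__()
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        self.head = nn.Linear(dim, 6)

    def forward(self, seq_feats):                  # seq_feats: (B, T, dim)
        return self.head(self.encoder(seq_feats).mean(dim=1))  # (B, 6)

def consistency_loss(pose_f2f, pose_tape):
    """Penalise disagreement between the two pose estimates (assumed L1)."""
    return torch.abs(pose_f2f - pose_tape).mean()

# Toy forward pass with random inputs.
f2fpe, tape = F2FPE(), TAPE()
flow = torch.randn(4, 2, 64, 64)                   # pairwise optical flow
seq_feats = torch.randn(4, 5, 32)                  # features of a 5-frame window
loss = consistency_loss(f2fpe(flow), tape(seq_feats))
print(loss.item())
```

The point of the sketch is the training signal: because the sequence-level estimator sees temporal context that the pairwise one does not, forcing their predictions to agree regularises both without requiring ground-truth poses.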



Acknowledgements

This study was funded by the National Natural Science Foundation of China (Grant numbers 61906173 and 61822701).

Author information


Corresponding author

Correspondence to Zhimin Gao.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article


Cite this article

Li, X., Hou, Y., Wang, P. et al. Transformer guided geometry model for flow-based unsupervised visual odometry. Neural Comput & Applic 33, 8031–8042 (2021). https://doi.org/10.1007/s00521-020-05545-8


Keywords

Navigation