Abstract
Existing unsupervised visual odometry (VO) methods either match pairwise images or integrate temporal information with recurrent neural networks over long image sequences. As a result, they tend to be inaccurate, time-consuming to train, or prone to error accumulation. In this paper, we propose a method consisting of two camera pose estimators that handle information from pairwise images and from short image sequences, respectively. For image sequences, a transformer-like structure is adopted to build a geometry model over a local temporal window, referred to as the transformer-based auxiliary pose estimator (TAPE). Meanwhile, a flow-to-flow pose estimator (F2FPE) is proposed to exploit the relationship between pairwise images. The two estimators are constrained by a simple yet effective consistency loss during training. Empirical evaluation shows that the proposed method outperforms state-of-the-art unsupervised learning-based methods by a large margin and performs comparably to supervised and traditional methods on the KITTI and Malaga datasets.
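To make the idea concrete, the following is a minimal PyTorch sketch of the two-branch design described above: a transformer encoder aggregates features over a short local window (standing in for TAPE), a small regressor maps a pairwise flow feature to a pose (standing in for F2FPE), and an L1 term ties the two predictions together. All module names, internals, dimensions, and the L1 form of the consistency loss are illustrative assumptions, not the paper's exact architecture.

```python
import torch
import torch.nn as nn

class TwoBranchVO(nn.Module):
    """Hypothetical sketch of the two-estimator idea from the abstract."""

    def __init__(self, feat_dim=256):
        super().__init__()
        # TAPE-like branch: transformer encoder over per-frame features
        # from a short local window, regressing a 6-DoF pose.
        layer = nn.TransformerEncoderLayer(d_model=feat_dim, nhead=8)
        self.tape_encoder = nn.TransformerEncoder(layer, num_layers=2)
        self.tape_head = nn.Linear(feat_dim, 6)
        # F2FPE-like branch: regresses a 6-DoF pose from a feature vector
        # encoding the optical flow between one image pair.
        self.f2fpe_head = nn.Sequential(
            nn.Linear(feat_dim, 128), nn.ReLU(), nn.Linear(128, 6))

    def forward(self, seq_feats, pair_flow_feat):
        # seq_feats: (seq_len, batch, feat_dim) features of a short window
        # pair_flow_feat: (batch, feat_dim) feature of one pairwise flow
        ctx = self.tape_encoder(seq_feats)          # temporal context
        pose_tape = self.tape_head(ctx[-1])         # pose for the last pair
        pose_f2f = self.f2fpe_head(pair_flow_feat)  # pairwise pose
        return pose_tape, pose_f2f

def consistency_loss(pose_tape, pose_f2f):
    # Simple L1 agreement term between the two estimators (assumed form).
    return torch.mean(torch.abs(pose_tape - pose_f2f))

# Usage with random features standing in for CNN/flow encodings:
model = TwoBranchVO()
seq = torch.randn(5, 4, 256)    # window of 5 frames, batch of 4
flow = torch.randn(4, 256)      # one pairwise flow feature per sample
p_tape, p_f2f = model(seq, flow)
loss = consistency_loss(p_tape, p_f2f)
```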
Acknowledgements
This study was funded by the National Natural Science Foundation of China (Grant numbers 61906173 and 61822701).
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest.
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Cite this article
Li, X., Hou, Y., Wang, P. et al. Transformer guided geometry model for flow-based unsupervised visual odometry. Neural Comput & Applic 33, 8031–8042 (2021). https://doi.org/10.1007/s00521-020-05545-8