Abstract
Deep-learning-based direct 6-degree-of-freedom (6-DoF) camera pose estimation is highly efficient at test time and can achieve accurate results in challenging, weakly textured environments. Typically, however, it requires a large number of training images spanning many orientations and positions in the environment, making it impractical for medium-sized or large environments. In this work we present a direct 6-DoF camera pose estimation method that alleviates the need for orientation augmentation at train time while still supporting any \(\mathrm{SO}(3)\) rotation at test time. This property is achieved by the following three-step procedure. First, omni-directional training images are rotated to a common orientation. Second, a fully rotation-equivariant DNN encoder is applied, and its output is used to obtain (i) a rotation-invariant prediction of the camera position and (ii) a rotation-equivariant prediction of the probability distribution over camera orientations. Finally, at test time, the camera position is predicted robustly due to the built-in rotation invariance, while the camera orientation is recovered from the relative shift of the peak in the probability distribution over camera orientations. We demonstrate our approach on synthetic- and real-image datasets, where we significantly outperform standard DNN-based pose regression (i) in accuracy when a single training orientation is used, and (ii) in training efficiency when orientation augmentation is employed. To the best of our knowledge, the proposed rotation-equivariant DNN for localization is the first direct pose estimation method able to predict orientation without explicit rotation augmentation at train time.
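To make the peak-shift idea concrete, below is a minimal sketch, not the authors' implementation: the paper predicts a distribution over full \(\mathrm{SO}(3)\) orientations, whereas this toy restricts the idea to a 1-DoF yaw rotation for readability. The names `N_BINS` and `recover_yaw` are illustrative. Equivariance means a yaw-rotated input panorama yields a circularly shifted orientation distribution, so the rotation can be read off from the shift of the peak.

```python
import numpy as np

N_BINS = 360  # yaw discretised into 1-degree bins (illustrative choice)

def recover_yaw(p_canonical, p_test):
    """Estimate the yaw offset (in degrees) between the canonical training
    orientation and the test image from the relative shift of the peak."""
    shift = (int(np.argmax(p_test)) - int(np.argmax(p_canonical))) % N_BINS
    return shift * (360.0 / N_BINS)

# Toy check: a distribution peaked at bin 10, and the same distribution
# shifted by 42 bins, as an equivariant encoder would produce for an input
# rotated by 42 degrees.
p_ref = np.exp(-0.5 * ((np.arange(N_BINS) - 10) / 3.0) ** 2)
print(recover_yaw(p_ref, np.roll(p_ref, 42)))  # -> 42.0
```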
Notes
- 1.
Since the input is an omni-directional image, the camera orientation can be adjusted with minimal loss; see the sketch below.
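As a hedged illustration of this footnote, the following sketch assumes an equirectangular panorama, for which a yaw rotation reduces to a circular shift of pixel columns, exact up to rounding the angle to whole columns; a general \(\mathrm{SO}(3)\) rotation would additionally require spherical resampling, hence "minimal loss".

```python
import numpy as np

def rotate_yaw(equirect, yaw_deg):
    """Rotate an H x W x C equirectangular image about the vertical axis."""
    width = equirect.shape[1]
    cols = int(round(yaw_deg / 360.0 * width))  # yaw expressed in pixel columns
    return np.roll(equirect, cols, axis=1)

img = np.random.rand(256, 512, 3)
# Rotating forward and back recovers the image exactly.
assert np.allclose(rotate_yaw(rotate_yaw(img, 90.0), -90.0), img)
```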