Rotation Equivariant Orientation Estimation for Omnidirectional Localization

  • Conference paper
Computer Vision – ACCV 2020 (ACCV 2020)

Part of the book series: Lecture Notes in Computer Science (LNIP, volume 12625)

Abstract

Deep-learning-based 6-degree-of-freedom (6-DoF) direct camera pose estimation is highly efficient at test time and can achieve accurate results in challenging, weakly textured environments. Typically, however, it requires large numbers of training images spanning many orientations and positions of the environment, making it impractical for medium-sized or large environments. In this work we present a direct 6-DoF camera pose estimation method that alleviates the need for orientation augmentation at training time while still supporting any \(\mathrm{SO}(3)\) rotation at test time. This property is achieved by the following three-step procedure. First, omnidirectional training images are rotated to a common orientation. Second, a fully rotation equivariant DNN encoder is applied, and its output is used to obtain (i) a rotation invariant prediction of the camera position and (ii) a rotation equivariant prediction of the probability distribution over camera orientations. Finally, at test time, the camera position is predicted robustly due to the built-in rotation invariance, while the camera orientation is recovered from the relative shift of the peak in the predicted probability distribution over orientations. We demonstrate our approach on synthetic and real-image datasets, where we significantly outperform standard DNN-based pose regression (i) in accuracy when a single training orientation is used, and (ii) in training efficiency when orientation augmentation is employed. To the best of our knowledge, our proposed rotation equivariant DNN for localization is the first direct pose estimation method able to predict orientation without explicit rotation augmentation at training time.
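
The core mechanism behind steps two and three can be illustrated with a toy model. The sketch below is a hypothetical illustration, not the authors' network: the function equivariant_encoder, the 1-D circular orientation axis, and the bin count are all invented for exposition. A circular cross-correlation is a minimal rotation (cyclic-shift) equivariant map, so a pooled statistic of its output is rotation invariant (standing in for the position branch), while the shift of the output peak recovers the test-time rotation (standing in for the orientation branch).

```python
import numpy as np

def equivariant_encoder(x, w):
    """Circular cross-correlation over a discretized orientation axis.

    A minimal rotation (cyclic-shift) equivariant map: rolling the
    input by k positions rolls the output by k positions as well.
    """
    n = len(x)
    return np.array([np.dot(np.roll(x, -s), w) for s in range(n)])

rng = np.random.default_rng(0)
n = 36                           # orientation bins (10 degrees each)
x = rng.normal(size=n)           # stand-in for an omnidirectional input
w = rng.normal(size=n)           # encoder weights (random, untrained)

f = equivariant_encoder(x, w)    # response at the training orientation
k = 7                            # unknown test-time rotation (in bins)
f_rot = equivariant_encoder(np.roll(x, k), w)

# (i) a pooled statistic is rotation invariant -> robust position branch
assert np.isclose(f.mean(), f_rot.mean())

# (ii) the peak shifts by exactly k bins -> orientation is recovered
# from the relative shift of the peak, as in step three above
recovered = (np.argmax(f_rot) - np.argmax(f)) % n
assert recovered == k
print("recovered rotation:", recovered * 10, "degrees")
```

In the paper, a fully rotation equivariant DNN encoder plays the role of this toy correlation, and the predicted distribution is over camera orientations in \(\mathrm{SO}(3)\) rather than over a circle.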


Notes

  1. Since the input is an omnidirectional image, the camera orientation can be adjusted with minimal loss; a sketch of such an adjustment follows below.
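
To illustrate this note, here is a minimal sketch (the helper rotate_yaw_equirect is hypothetical, not from the paper) of why the adjustment is nearly lossless for an equirectangular panorama: a rotation about the vertical (yaw) axis is just a circular shift of the image columns, so nothing is cropped; only sub-column precision is lost, and general \(\mathrm{SO}(3)\) rotations would additionally require resampling the sphere.

```python
import numpy as np

def rotate_yaw_equirect(img, degrees):
    """Yaw-rotate an equirectangular panorama (H x W x C).

    A rotation about the vertical axis is a horizontal circular shift
    of the columns, so content wraps around instead of being cropped;
    only sub-column precision is lost by rounding the shift.
    """
    w = img.shape[1]
    shift = int(round(degrees / 360.0 * w)) % w
    return np.roll(img, shift, axis=1)

pano = np.random.randint(0, 256, (256, 512, 3), dtype=np.uint8)
rotated = rotate_yaw_equirect(pano, 90.0)

# Rotating by a full turn reproduces the original image exactly.
assert np.array_equal(rotate_yaw_equirect(pano, 360.0), pano)
```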


Author information

Corresponding author

Correspondence to Chao Zhang.

Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary material 1 (pdf 989 KB)

Copyright information

© 2021 Springer Nature Switzerland AG

About this paper

Cite this paper

Zhang, C., Budvytis, I., Liwicki, S., Cipolla, R. (2021). Rotation Equivariant Orientation Estimation for Omnidirectional Localization. In: Ishikawa, H., Liu, C.-L., Pajdla, T., Shi, J. (eds) Computer Vision – ACCV 2020. ACCV 2020. Lecture Notes in Computer Science, vol 12625. Springer, Cham. https://doi.org/10.1007/978-3-030-69538-5_21

  • DOI: https://doi.org/10.1007/978-3-030-69538-5_21

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-69537-8

  • Online ISBN: 978-3-030-69538-5

  • eBook Packages: Computer Science, Computer Science (R0)
