Abstract
Deep-learning-based direct 6-degree-of-freedom (6-DoF) camera pose estimation is highly efficient at test time and can achieve accurate results in challenging, weakly textured environments. Typically, however, it requires a large number of training images spanning many orientations and positions in the environment, making it impractical for medium-sized or large environments. In this work we present a direct 6-DoF camera pose estimation method that alleviates the need for orientation augmentation at train time while still supporting any \(\mathrm{SO}(3)\) rotation at test time. This property is achieved by the following three-step procedure. First, omni-directional training images are rotated to a common orientation. Second, a fully rotation-equivariant DNN encoder is applied, and its output is used to obtain (i) a rotation-invariant prediction of the camera position and (ii) a rotation-equivariant prediction of the probability distribution over camera orientations. Finally, at test time, the camera position is predicted robustly due to the built-in rotation invariance, while the camera orientation is recovered from the relative shift of the peak in the probability distribution over camera orientations. We demonstrate our approach on synthetic- and real-image datasets, where we significantly outperform standard DNN-based pose regression (i) in accuracy when a single training orientation is used, and (ii) in training efficiency when orientation augmentation is employed. To the best of our knowledge, the proposed rotation-equivariant DNN for localization is the first direct pose estimation method able to predict orientation without explicit rotation augmentation at train time.
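To make the peak-shift idea concrete, below is a minimal sketch, not the authors' implementation: the paper predicts a distribution over full \(\mathrm{SO}(3)\) orientations, whereas this toy restricts the idea to a 1-DoF yaw rotation for readability. The names `N_BINS` and `recover_yaw` are illustrative. Equivariance means a yaw-rotated input panorama yields a circularly shifted orientation distribution, so the rotation can be read off from the shift of the peak.

```python
import numpy as np

N_BINS = 360  # yaw discretised into 1-degree bins (illustrative choice)

def recover_yaw(p_canonical, p_test):
    """Estimate the yaw offset (in degrees) between the canonical training
    orientation and the test image from the relative shift of the peak."""
    shift = (int(np.argmax(p_test)) - int(np.argmax(p_canonical))) % N_BINS
    return shift * (360.0 / N_BINS)

# Toy check: a distribution peaked at bin 10, and the same distribution
# shifted by 42 bins, as an equivariant encoder would produce for an input
# rotated by 42 degrees.
p_ref = np.exp(-0.5 * ((np.arange(N_BINS) - 10) / 3.0) ** 2)
print(recover_yaw(p_ref, np.roll(p_ref, 42)))  # -> 42.0
```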
Notes
- 1.
Since the input is an omni-directional image, the camera orientation can be adjusted with minimal loss; see the sketch below.
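As a hedged illustration of this footnote, the following sketch assumes an equirectangular panorama, for which a yaw rotation reduces to a circular shift of pixel columns, exact up to rounding the angle to whole columns; a general \(\mathrm{SO}(3)\) rotation would additionally require spherical resampling, hence "minimal loss".

```python
import numpy as np

def rotate_yaw(equirect, yaw_deg):
    """Rotate an H x W x C equirectangular image about the vertical axis."""
    width = equirect.shape[1]
    cols = int(round(yaw_deg / 360.0 * width))  # yaw expressed in pixel columns
    return np.roll(equirect, cols, axis=1)

img = np.random.rand(256, 512, 3)
# Rotating forward and back recovers the image exactly.
assert np.allclose(rotate_yaw(rotate_yaw(img, 90.0), -90.0), img)
```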