DOI: 10.1145/3664647.3681628
research-article
Free access

Exploring Matching Rates: From Keypoint Selection to Camera Relocalization

Published: 28 October 2024 Publication History

Abstract

Camera relocalization, the task of estimating camera pose within a known scene, is challenging and has wide applications in Virtual Reality (VR), Augmented Reality (AR), robotics, and other fields. Most existing learning-based methods use all the information in an image for pose estimation. Although these methods achieve leading pose accuracy in some cases, they still fall short of handling challenging viewpoints robustly without degrading localization accuracy on viewpoints that are easier to localize. In this paper, we propose a novel two-branch camera pose estimation framework: one branch uses keypoint-guided partial scene coordinate regression, while the other uses full scene coordinate regression to assess the credibility of image poses, thereby enabling more accurate camera localization. In particular, we devise a keypoint selection method based on matching rates, which measure the matching quality between a 3D keypoint and 2D keypoints across views. With the selected 3D keypoints and the ground-truth camera pose, we generate a 2D supervision mask to supervise the keypoint prediction of the keypoint selection network. We further refine the 2D supervision mask by optimizing reprojection errors on the scene coordinate network, so that the network estimates scene coordinates for the points in the scene that truly warrant attention, which also improves localization performance. We also introduce a gated camera pose estimation strategy on the two-branch framework, employing an updated keypoint selection network for images with higher credibility and a more robust network for difficult viewpoints. By adopting an effective curriculum learning scheme, we achieve higher accuracy within a training span of just 20 minutes. Experiments validate the superior performance of our method.
The code is released at https://github.com/DUT-ICCD/KP-Guided-Reloc.
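
The following is a minimal sketch, not the released implementation, of the two steps the abstract describes: selecting 3D keypoints by matching rate and projecting them with the ground-truth pose to form a 2D supervision mask. The abstract does not give the formula for the matching rate, so the definition used here (matched views divided by views that observe the point) is an assumption, as are the function names, the threshold, the world-to-camera pose convention, and the pinhole camera with its principal point at the image center.

```python
from typing import Dict, List, Optional, Sequence, Tuple

Vec3 = Tuple[float, float, float]
Mat3 = Tuple[Vec3, Vec3, Vec3]


def matching_rate(matched_views: int, visible_views: int) -> float:
    """Assumed definition: share of observing views with a matched 2D keypoint."""
    return matched_views / visible_views if visible_views > 0 else 0.0


def select_keypoints(stats: Dict[int, Tuple[int, int]], threshold: float) -> List[int]:
    """Keep ids of 3D keypoints whose (assumed) matching rate reaches the threshold.

    stats maps a keypoint id to (matched_views, visible_views).
    """
    return sorted(
        pid for pid, (matched, visible) in stats.items()
        if matching_rate(matched, visible) >= threshold
    )


def project(point: Vec3, rotation: Mat3, translation: Vec3,
            focal: float, cx: float, cy: float) -> Optional[Tuple[float, float]]:
    """Project a world point with a world-to-camera pose (assumed convention)."""
    xc, yc, zc = (
        sum(rotation[r][i] * point[i] for i in range(3)) + translation[r]
        for r in range(3)
    )
    if zc <= 0:  # point lies behind the camera, so it is not visible
        return None
    return focal * xc / zc + cx, focal * yc / zc + cy


def supervision_mask(points: Sequence[Vec3], rotation: Mat3, translation: Vec3,
                     focal: float, width: int, height: int) -> List[List[int]]:
    """Binary mask with a 1 at each pixel hit by a projected selected keypoint."""
    mask = [[0] * width for _ in range(height)]
    cx, cy = width / 2, height / 2  # assumed principal point at the image center
    for point in points:
        uv = project(point, rotation, translation, focal, cx, cy)
        if uv is None:
            continue
        u, v = int(round(uv[0])), int(round(uv[1]))  # nearest pixel
        if 0 <= u < width and 0 <= v < height:
            mask[v][u] = 1
    return mask


if __name__ == "__main__":
    # Toy data: id -> (matched views, observing views).
    stats = {0: (9, 10), 1: (2, 10), 2: (0, 0)}
    print(select_keypoints(stats, threshold=0.5))
    identity: Mat3 = ((1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0))
    for row in supervision_mask([(0.0, 0.0, 1.0)], identity, (0.0, 0.0, 0.0), 2.0, 4, 4):
        print(row)
```

In this sketch the mask marks single pixels only. A practical supervision target would more likely cover a small neighborhood around each projection, and the refinement with reprojection errors mentioned in the abstract is not modeled here.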

Information & Contributors

Information

Published In

MM '24: Proceedings of the 32nd ACM International Conference on Multimedia
October 2024
11719 pages
ISBN:9798400706868
DOI:10.1145/3664647

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. camera relocalization
  2. keypoint guided
  3. keypoint selection
  4. keypoint sets
  5. scene coordinates regression

Qualifiers

  • Research-article

Conference

MM '24
MM '24: The 32nd ACM International Conference on Multimedia
October 28 - November 1, 2024
Melbourne VIC, Australia

Acceptance Rates

MM '24 Paper Acceptance Rate 1,150 of 4,385 submissions, 26%;
Overall Acceptance Rate 2,145 of 8,556 submissions, 25%
