Abstract
We present a novel view of local image feature matching, inspired by the differences between existing approaches. Detector-based methods detect predefined keypoints in local regions, which ensures the stability and reliability of the established matches. In contrast, detector-free methods directly match dense features and refine the filtered results, which yields more matches. To combine their advantages, we propose a novel Matching-to-Detecting (M2D) process for feature matching: we first perform global reasoning to obtain patch-level matches and subsequently identify discriminative matches within local areas to obtain pixel-level matches. At the patch level, dense matching enables our pipeline to find plenty of matches even in low-texture areas; at the pixel level, our method can be viewed as detecting from a matching perspective, so the established matches are distinctive in their local regions and therefore more stable and reliable. Experimental results demonstrate that our method outperforms state-of-the-art methods by a significant margin in both matching accuracy and the number of matches. Moreover, the computational complexity of our model is low, making it well suited to real-world applications.
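To make the two-stage idea concrete, the following is a minimal sketch of a matching-to-detecting pipeline in PyTorch: dense patch-level matching via a dual-softmax over all patch pairs, followed by a pixel-level step that "detects" the correspondence standing out most within a matched local crop. All function names, shapes, thresholds, and the dual-softmax formulation are illustrative assumptions, not the authors' released implementation.

```python
# A minimal sketch of the two-stage "matching-to-detecting" idea described
# above. All names, shapes, and thresholds here are illustrative assumptions,
# not the authors' code.
import torch
import torch.nn.functional as F

def patch_level_matches(feat0, feat1, temperature=0.1, threshold=0.2):
    """Dense patch-level matching: global reasoning over all patch pairs.

    feat0: (N0, C) coarse descriptors of image-0 patches
    feat1: (N1, C) coarse descriptors of image-1 patches
    Returns an (M, 2) tensor of mutually consistent patch index pairs.
    """
    feat0 = F.normalize(feat0, dim=1)
    feat1 = F.normalize(feat1, dim=1)
    sim = feat0 @ feat1.t() / temperature                   # (N0, N1)
    conf = F.softmax(sim, dim=0) * F.softmax(sim, dim=1)    # dual-softmax score
    mask = conf > threshold
    # keep only mutual nearest neighbours so patch matches are reliable
    mask &= conf == conf.max(dim=1, keepdim=True).values
    mask &= conf == conf.max(dim=0, keepdim=True).values
    return mask.nonzero(as_tuple=False)

def pixel_level_refine(fine0, fine1):
    """'Detect' the pixel match that is most discriminative in its local area.

    fine0, fine1: (C, w, w) fine-feature crops around one patch match.
    Correlates the centre pixel of crop 0 against all pixels of crop 1 and
    returns the (row, col) of the correlation peak inside crop 1.
    """
    c, w, _ = fine0.shape
    anchor = fine0[:, w // 2, w // 2]                       # (C,) anchor pixel
    corr = (fine1.reshape(c, -1) * anchor[:, None]).sum(0)  # (w*w,) correlation
    peak = corr.argmax().item()
    return peak // w, peak % w
```

In a full pipeline, the coarse and fine features would typically come from a shared backbone, and the pixel-level refinement would run once per patch-level match.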
Acknowledgements
This research was partly supported by grants from the National Natural Science Foundation of China (NSFC, Grant No. 62171281) and the Science and Technology Commission of Shanghai Municipality (STCSM, Grant Nos. 20DZ1200203, 2021SHZDZX0102, and 22DZ2229005).
Copyright information
© 2024 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Xu, H., Zhou, J., Pan, R., Yang, H., Li, C., Zhao, X. (2024). Matching-to-Detecting: Establishing Dense and Reliable Correspondences Between Images. In: Liu, Q., et al. Pattern Recognition and Computer Vision. PRCV 2023. Lecture Notes in Computer Science, vol 14426. Springer, Singapore. https://doi.org/10.1007/978-981-99-8432-9_14
Publisher Name: Springer, Singapore
Print ISBN: 978-981-99-8431-2
Online ISBN: 978-981-99-8432-9
eBook Packages: Computer Science, Computer Science (R0)