Abstract
Keypoint-based matching is a fundamental technique in many computer vision tasks, and keypoint detection is a crucial step that directly affects overall performance. Learning-based keypoint detectors have advanced significantly with deep learning. To further improve the accuracy of high-level matching tasks, the extracted keypoints should provide more accurate point-to-point correspondences and maintain a uniform spatial distribution. Based on this idea, a self-supervised learning method for keypoint detection, named repeatable adaptive point, is proposed. The method consists of a self-supervised objective and an optimization algorithm. The objective maximizes a repeatability measure subject to a sparsity constraint on the keypoints. The sparsity constraint is formulated by combining the non-maximum suppression operation with a penalty function on the number of keypoints, which generally gives the keypoints a uniform spatial distribution. A novel approximate alternate optimization algorithm is proposed to maximize this objective, and its convergence is proved theoretically. The proposed detector is “adaptive” because, when combined with existing descriptors, it adapts to high-level matching tasks with fast convergence. Specifically, its combinations with the SuperPoint and HardNet descriptors achieve state-of-the-art accuracy on three high-level tasks based on image matching, namely homography estimation, camera pose estimation, and three-dimensional reconstruction. Furthermore, the proposed method converges faster on new scenes than the state-of-the-art method that jointly optimizes the detector and the descriptor.
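The two ingredients named in the abstract, non-maximum suppression on a detector score map and a repeatability measure between two views, can be illustrated with a minimal NumPy sketch. This is not the authors' implementation or objective: the function names (`nms_keypoints`, `warp_points`, `repeatability`), the window radius, the score threshold, the pixel tolerance `eps`, and the symmetric repeatability definition used here are all assumptions chosen for illustration, and the two views are assumed to be related by a known homography.

```python
import numpy as np


def nms_keypoints(score, radius=2, threshold=0.5, max_keypoints=None):
    """Return (x, y) keypoints that are local maxima of a score map.

    A pixel is kept if it is the maximum of its (2*radius+1)^2 window
    and its score exceeds `threshold` (both values are illustrative).
    """
    score = np.asarray(score, dtype=float)
    size = 2 * radius + 1
    # Pad with -inf so border pixels are compared only with real neighbors.
    padded = np.pad(score, radius, mode="constant", constant_values=-np.inf)
    windows = np.lib.stride_tricks.sliding_window_view(padded, (size, size))
    local_max = windows.max(axis=(2, 3))
    keep = (score >= local_max) & (score > threshold)
    ys, xs = np.nonzero(keep)
    # Strongest responses first; optionally cap the number of keypoints.
    order = np.argsort(-score[ys, xs], kind="stable")
    if max_keypoints is not None:
        order = order[:max_keypoints]
    return np.stack([xs[order], ys[order]], axis=1).astype(float)


def warp_points(points, H):
    """Apply a 3x3 homography H to an (N, 2) array of (x, y) points."""
    homog = np.hstack([points, np.ones((len(points), 1))])
    warped = homog @ H.T
    return warped[:, :2] / warped[:, 2:3]


def repeatability(kp_a, kp_b, H_ab, eps=3.0):
    """Symmetric repeatability: fraction of keypoints in either image
    that have a counterpart within `eps` pixels after warping A into B."""
    if len(kp_a) == 0 or len(kp_b) == 0:
        return 0.0
    warped = warp_points(kp_a, H_ab)
    dists = np.linalg.norm(warped[:, None, :] - kp_b[None, :, :], axis=2)
    hits_a = np.sum(dists.min(axis=1) <= eps)
    hits_b = np.sum(dists.min(axis=0) <= eps)
    return float(hits_a + hits_b) / (len(kp_a) + len(kp_b))


if __name__ == "__main__":
    # Toy score map with two nearby peaks and one isolated peak.
    score = np.zeros((8, 8))
    score[2, 3], score[2, 4], score[6, 6] = 0.9, 0.8, 0.7
    kp_a = nms_keypoints(score, radius=1)
    # Second view: the same points shifted one pixel to the right.
    H_ab = np.array([[1.0, 0.0, 1.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]])
    kp_b = kp_a + np.array([1.0, 0.0])
    print(kp_a, repeatability(kp_a, kp_b, H_ab))
```

In this sketch the suppression window keeps detections at least `radius` pixels apart, which is one simple way a detector's output can be pushed toward a sparse, spatially spread set of keypoints; the paper's penalty on the number of keypoints and its optimization algorithm are not reproduced here.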
Acknowledgements
This work was partially supported by the National Natural Science Foundation of China (Grant No. 41371339), the National R&D Program for Major Research Instruments of Natural Science Foundation of China (Grant No. 62027808), and the Fundamental Research Funds for the Central Universities (Grant No. 2017KFYXJJ179).
Cite this article
Yan, P., Tan, Y. & Tai, Y. Repeatable adaptive keypoint detection via self-supervised learning. Sci. China Inf. Sci. 65, 212103 (2022). https://doi.org/10.1007/s11432-021-3364-5