Abstract
This paper presents VideoInNet, a novel deep neural network for designated point tracking (DPT) in monocular RGB video. More concretely, the aim is to track four designated points that are correlated by a local homography on a textureless planar region in the scene. DPT can be applied to augmented reality and video editing, especially in the field of video advertising. Existing methods predict the locations of the four designated points without appropriately considering their correlation. To address this problem, VideoInNet predicts the motion of the four designated points, correlated by a local homography, within a heatmap prediction framework. Our network refines the heatmaps of the designated points in two stages. In the first stage, we introduce a context-aware and location-aware structure to learn a local homography for the designated plane in a supervised way. In the second stage, we introduce an iterative heatmap refinement module to improve the tracking accuracy. We also propose ScanDPT, a dataset focusing on textureless planar regions, for training and evaluation. The error rate of VideoInNet is about 29% lower than that of the state-of-the-art approach when evaluated on the first 120 frames of the test videos in ScanDPT.
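The geometric constraint underlying DPT as described above is that the four designated points lie on one plane, so their frame-to-frame motion is governed by a single 3x3 homography rather than four independent displacements. The following minimal NumPy sketch illustrates only that constraint; the point coordinates, the homography values, and the `warp_points` helper are hypothetical and are not part of VideoInNet.

```python
import numpy as np

def warp_points(H, points):
    """Warp Nx2 pixel coordinates by a 3x3 homography H (illustrative helper)."""
    pts_h = np.hstack([points, np.ones((points.shape[0], 1))])  # to homogeneous coords
    warped = (H @ pts_h.T).T
    return warped[:, :2] / warped[:, 2:3]                       # back to Euclidean coords

# Four designated corners of a textureless planar region in frame t (hypothetical values).
corners_t = np.array([[100.0, 120.0],
                      [300.0, 118.0],
                      [305.0, 260.0],
                      [ 98.0, 265.0]])

# A hypothetical local homography relating frame t to frame t+1 on that plane.
H_t_to_t1 = np.array([[1.02, 0.01,  3.0],
                      [0.00, 1.01, -2.0],
                      [1e-5, 0.00,  1.0]])

# Because the region is planar, all four designated points share the same homography.
corners_t1 = warp_points(H_t_to_t1, corners_t)
print(corners_t1)
```

VideoInNet learns this correlation inside a heatmap prediction framework rather than applying a hand-specified homography as in this sketch.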
Cite this article
Chen, Z., Fang, XN. & Zhang, SH. Local Homography Estimation on User-Specified Textureless Regions. J. Comput. Sci. Technol. 37, 615–625 (2022). https://doi.org/10.1007/s11390-022-2185-7