DOI: 10.1145/3503161.3547978

Less is More: Consistent Video Depth Estimation with Masked Frames Modeling

Published: 10 October 2022

Abstract

Temporal consistency is the key challenge in video depth estimation. Previous works rely on additional optical flow or camera poses, which are time-consuming to compute. By contrast, we derive consistency from less information. Since videos inherently contain heavy temporal redundancy, a missing frame can be recovered from its neighbors. Inspired by this, we propose the frame masking network (FMNet), a spatial-temporal transformer that predicts the depth of masked frames from their neighboring frames. By reconstructing masked temporal features, the FMNet learns intrinsic inter-frame correlations, which leads to consistency. Experimental results demonstrate that, compared with prior art, our approach achieves comparable spatial accuracy and higher temporal consistency without any additional information. Our work provides a new perspective on consistent video depth estimation.
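
To make the idea concrete, below is a minimal PyTorch-style sketch of masked frames modeling as described in the abstract. It is not the authors' FMNet implementation: the pooled per-frame feature tokens, layer sizes, and 40% mask ratio are illustrative assumptions, and the linear depth head stands in for a full spatial decoder.

```python
# Sketch of masked frames modeling: hide some frames' features and force the
# network to predict them (and their depth) from neighboring frames.
# Hyperparameters and tokenization are illustrative assumptions only.
import torch
import torch.nn as nn

class MaskedFrameDepthSketch(nn.Module):
    def __init__(self, feat_dim=256, num_layers=4, num_heads=8):
        super().__init__()
        # Learnable token substituted for the features of masked frames.
        self.mask_token = nn.Parameter(torch.zeros(1, 1, feat_dim))
        encoder_layer = nn.TransformerEncoderLayer(
            d_model=feat_dim, nhead=num_heads, batch_first=True)
        # Temporal transformer over per-frame feature tokens; the actual
        # FMNet is a spatial-temporal transformer over spatial feature maps.
        self.transformer = nn.TransformerEncoder(encoder_layer, num_layers)
        self.depth_head = nn.Linear(feat_dim, 1)  # stand-in for a depth decoder

    def forward(self, frame_feats, mask):
        # frame_feats: (B, T, C) pooled per-frame features from any backbone.
        # mask:        (B, T) boolean, True where a frame's features are hidden.
        tokens = torch.where(mask.unsqueeze(-1),
                             self.mask_token.expand_as(frame_feats),
                             frame_feats)
        tokens = self.transformer(tokens)
        # Depth for a masked frame must be inferred from its neighbors,
        # which is what encourages temporally consistent predictions.
        return self.depth_head(tokens)

# Toy usage: mask roughly 40% of the frames in each training clip
# (the ratio is a guess, not the paper's setting).
feats = torch.randn(2, 8, 256)                 # B=2 clips, T=8 frames, C=256
mask = torch.rand(2, 8) < 0.4
depth = MaskedFrameDepthSketch()(feats, mask)  # -> (2, 8, 1)
```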

Supplementary Material

MP4 File (MM22-fp945.mp4)
In this video, we introduce our work, Less is More: Consistent Video Depth Estimation with Masked Frames Modeling. The paper was accepted at ACM Multimedia 2022.



    Published In

    MM '22: Proceedings of the 30th ACM International Conference on Multimedia
    October 2022
    7537 pages
    ISBN: 9781450392037
    DOI: 10.1145/3503161
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]


    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 10 October 2022


    Author Tags

    1. depth estimation
    2. masked frames modeling
    3. temporal consistency

    Qualifiers

    • Research-article

    Conference

    MM '22

    Acceptance Rates

    Overall Acceptance Rate 2,145 of 8,556 submissions, 25%


    Article Metrics

    • Downloads (last 12 months): 121
    • Downloads (last 6 weeks): 8
    Reflects downloads up to 14 Nov 2024


    Cited By

    • (2024) ViTA: Video Transformer Adaptor for Robust Video Depth Estimation. IEEE Transactions on Multimedia, 26, 3302-3316. DOI: 10.1109/TMM.2023.3309559. Online publication date: 1-Jan-2024.
    • (2024) FutureDepth: Learning to Predict the Future Improves Video Depth Estimation. Computer Vision – ECCV 2024, 440-458. DOI: 10.1007/978-3-031-72652-1_26. Online publication date: 30-Oct-2024.
    • (2023) Foreground/Background-Masked Interaction Learning for Spatio-temporal Action Detection. Proceedings of the 31st ACM International Conference on Multimedia, 2381-2390. DOI: 10.1145/3581783.3611945. Online publication date: 26-Oct-2023.
    • (2023) Diffusion-Augmented Depth Prediction with Sparse Annotations. Proceedings of the 31st ACM International Conference on Multimedia, 2865-2876. DOI: 10.1145/3581783.3611807. Online publication date: 26-Oct-2023.
    • (2023) Constraining Depth Map Geometry for Multi-View Stereo: A Dual-Depth Approach with Saddle-shaped Depth Cells. 2023 IEEE/CVF International Conference on Computer Vision (ICCV), 17615-17624. DOI: 10.1109/ICCV51070.2023.01619. Online publication date: 1-Oct-2023.
    • (2023) Neural Video Depth Stabilizer. 2023 IEEE/CVF International Conference on Computer Vision (ICCV), 9432-9442. DOI: 10.1109/ICCV51070.2023.00868. Online publication date: 1-Oct-2023.
    • (2023) MAMo: Leveraging Memory and Attention for Monocular Video Depth Estimation. 2023 IEEE/CVF International Conference on Computer Vision (ICCV), 8720-8730. DOI: 10.1109/ICCV51070.2023.00804. Online publication date: 1-Oct-2023.
    • (2023) Temporally Consistent Online Depth Estimation Using Point-Based Fusion. 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 9119-9129. DOI: 10.1109/CVPR52729.2023.00880. Online publication date: Jun-2023.
    • (2023) 3D Cinemagraphy from a Single Image. 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 4595-4605. DOI: 10.1109/CVPR52729.2023.00446. Online publication date: Jun-2023.
