Research article
Free access
DOI: 10.1145/3664647.3681516

MagicVFX: Visual Effects Synthesis in Just Minutes

Published: 28 October 2024

Abstract

Visual effects synthesis, which enhances raw footage with virtual elements for greater expressiveness, is crucial in the film and television industry. As the demand for detailed and realistic effects escalates in modern production, professionals must devote substantial time and resources to this work. There is therefore an urgent need for more convenient and less resource-intensive methods, such as those built on the burgeoning Artificial Intelligence Generated Content (AIGC) technology; however, this potential integration has not yet been studied. As the first work to connect visual effects synthesis with AIGC technology, we begin by carefully defining two paradigms according to whether pre-produced effects are required: synthesis with reference effects and synthesis without reference effects. We then compile a dataset from a collection of effects videos and scene videos; it spans a wide variety of effect categories and scenarios, adequately covering the effects commonly seen in the film and television industry. Furthermore, we explore the capability of a pre-trained text-to-video model to synthesize visual effects within these two paradigms. The experimental results demonstrate that our pipeline can produce impressive visual effects, evidencing the significant potential of existing AIGC technology for visual effects synthesis. Our dataset is available at https://github.com/ruffiann/MagicVFX.
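
To make the "synthesis without reference effects" paradigm concrete, the sketch below generates a short effect clip with an off-the-shelf pre-trained text-to-video diffusion model and naively composites it onto placeholder footage. The checkpoint (ModelScope's open text-to-video weights), the prompt, and the screen-blend compositing step are illustrative assumptions; the abstract does not specify the paper's actual pipeline.

```python
# Minimal sketch of "synthesis without reference effects": a pre-trained
# text-to-video diffusion model generates an effect clip, which is then
# screen-blended onto footage. Checkpoint, prompt, and blending are
# illustrative assumptions, not the paper's actual pipeline.
import imageio  # requires imageio-ffmpeg for .mp4 output
import numpy as np
import torch
from diffusers import DiffusionPipeline  # pip install diffusers transformers accelerate

pipe = DiffusionPipeline.from_pretrained(
    "damo-vilab/text-to-video-ms-1.7b",   # any open text-to-video checkpoint would do
    torch_dtype=torch.float16,
).to("cuda")

# Generate a 16-frame effect clip (recent diffusers API: .frames is batched).
out = pipe("a burst of glowing blue embers on a dark background", num_frames=16)
effect = [np.asarray(f, dtype=np.float32) for f in out.frames[0]]
effect = [f / 255.0 if f.max() > 1.0 else f for f in effect]  # normalize to [0, 1]

# Placeholder scene; in practice, load frames from the raw footage here.
scene = [np.full_like(effect[0], 0.2) for _ in effect]

# Screen blend: bright effect pixels light up the scene, dark pixels vanish.
composite = [1.0 - (1.0 - s) * (1.0 - e) for s, e in zip(scene, effect)]

imageio.mimsave("vfx_preview.mp4",
                [(c * 255).astype(np.uint8) for c in composite], fps=8)
```

In the "synthesis with reference effects" paradigm, the generated clip would instead be replaced by a pre-produced effect video before compositing.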


Index Terms

  1. MagicVFX: Visual Effects Synthesis in Just Minutes

    Recommendations

    Comments

    Please enable JavaScript to view thecomments powered by Disqus.

    Information & Contributors

    Information

    Published In

    MM '24: Proceedings of the 32nd ACM International Conference on Multimedia
    October 2024
    11719 pages
    ISBN: 9798400706868
    DOI: 10.1145/3664647

    Publisher

    Association for Computing Machinery

    New York, NY, United States



    Author Tags

    1. diffusion models
    2. video synthesis
    3. visual effects

    Qualifiers

    • Research-article


    Conference

    MM '24: The 32nd ACM International Conference on Multimedia
    October 28 - November 1, 2024
    Melbourne VIC, Australia

    Acceptance Rates

    MM '24 Paper Acceptance Rate: 1,150 of 4,385 submissions (26%)
    Overall Acceptance Rate: 2,145 of 8,556 submissions (25%)
