demonstration

Text2Video: Automatic Video Generation Based on Text Scripts

Authors:

Zixun Sun

MM '21: Proceedings of the 29th ACM International Conference on Multimedia

Pages 2753 - 2755

https://doi.org/10.1145/3474085.3478548

Published: 17 October 2021

Abstract

To make video creation simpler, we present Text2Video, a novel system that lets novice users produce videos automatically through text editing alone. Given an input text script, the director-like system generates engaging game-related videos that illustrate the given narrative, provide diverse multi-modal content, and follow video editing guidelines. The system comprises five modules: (1) A material manager extracts highlights from raw live game videos and tags each video highlight, image, and audio with labels. (2) A natural language processor extracts entities and semantics from the input text script. (3) A refined cross-modal retrieval module searches the material manager for matching candidate shots. (4) A text-to-speech speaker reads the processed text script in a synthesized human voice. (5) The selected material shots and synthesized speech are assembled artistically using appropriate video editing techniques.
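The five modules can be pictured as a simple data flow. The following Python sketch is only an illustration of that flow, not the system described in the paper: the `Shot` type, every function name, the keyword-based entity extraction, the Jaccard tag-overlap retrieval, and the words-per-second speech estimate are all assumptions introduced here as stand-ins for the actual modules.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Shot:
    """Hypothetical record for a tagged clip held by the material manager (module 1)."""
    clip_id: str
    tags: frozenset
    duration: float  # seconds


def extract_entities(sentence, vocabulary):
    """Module 2 stand-in: keep the words that appear in the known tag vocabulary."""
    words = {w.strip(".,!?").lower() for w in sentence.split()}
    return words & vocabulary


def retrieve(entities, library):
    """Module 3 stand-in: return the shot whose tags best overlap the entities (Jaccard)."""
    def score(shot):
        union = entities | shot.tags
        return len(entities & shot.tags) / len(union) if union else 0.0
    return max(library, key=score)  # raises ValueError if the library is empty


def speech_duration(sentence, words_per_second=2.5):
    """Module 4 stand-in: estimate narration length instead of synthesizing audio."""
    return len(sentence.split()) / words_per_second


def assemble(script, library):
    """Module 5 stand-in: place one retrieved shot per sentence on a timeline."""
    vocabulary = frozenset().union(*(shot.tags for shot in library))
    timeline, start = [], 0.0
    for sentence in script:
        shot = retrieve(extract_entities(sentence, vocabulary), library)
        length = speech_duration(sentence)
        timeline.append({"start": start, "end": start + length,
                         "clip": shot.clip_id, "narration": sentence})
        start += length
    return timeline


# Toy material library and script (invented for illustration only).
library = [
    Shot("clip_dragon", frozenset({"dragon", "fight"}), 8.0),
    Shot("clip_tower", frozenset({"tower", "destroy"}), 6.0),
]
script = ["The team wins the dragon fight.", "Then they destroy the tower."]
timeline = assemble(script, library)
for entry in timeline:
    print(entry)
```

In this toy version each sentence is matched to exactly one shot and the shot is shown for as long as the narration lasts. A real system would presumably rank several candidate shots per sentence and apply editing rules such as transitions and pacing, which this sketch leaves out.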

Supplementary Material

MP4 File (de3172.mp4): Supplemental video, 10.46 MB

MP4 File (text2video.mp4): 10.46 MB

Cited By

Harde L, Jensen L, Krogh J, Plesner A, Sørensen O, Pohl H (2024). The Generative Fairy Tale of Scary Little Red Riding Hood. Proceedings of the 2024 ACM International Conference on Interactive Media Experiences, 129-144. Online publication date: 7-Jun-2024.
https://dl.acm.org/doi/10.1145/3639701.3656303

Yu Y, Chen X, Zhan H, Elkind E (2023). VideoMaster. Proceedings of the Thirty-Second International Joint Conference on Artificial Intelligence, 7179-7182. Online publication date: 19-Aug-2023.
https://dl.acm.org/doi/10.24963/ijcai.2023/844

Wünsche K, Koesten L, Möller T, Chen J (2023). Supporting Video Authoring for Communication of Research Results. Proceedings of the 2023 ACM International Conference on Interactive Media Experiences, 47-59. Online publication date: 12-Jun-2023.
https://dl.acm.org/doi/10.1145/3573381.3596157

Index Terms

Text2Video: Automatic Video Generation Based on Text Scripts
1. Information systems
  1. Information systems applications
    1. Multimedia information systems
      1. Multimedia content creation

Published In

MM '21: Proceedings of the 29th ACM International Conference on Multimedia

October 2021

5796 pages

ISBN:9781450386517

DOI:10.1145/3474085

General Chairs:
Heng Tao Shen (University of Electronic Science and Technology of China, China)
Yueting Zhuang (Zhejiang University, China)
John R. Smith (IBM, USA)

Program Chairs:
Yang Yang (University of Electronic Science and Technology of China, China)
Pablo Cesar (CWI & TU Delft, The Netherlands)
Florian Metze (FACEBOOK, Inc., USA)
Balakrishnan Prabhakaran (University of Texas at Dallas, USA)

Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the Owner/Author.

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Demonstration

Conference

MM '21

Sponsor:

SIGMM

MM '21: ACM Multimedia Conference

October 20 - 24, 2021

Virtual Event, China

Acceptance Rates

Overall Acceptance Rate 995 of 4,171 submissions, 24%

Bibliometrics

Total Citations: 3
Total Downloads: 343
Downloads (Last 12 months): 63
Downloads (Last 6 weeks): 2

Reflects downloads up to 29 Sep 2024
