DOI: 10.1145/3474085.3478548
demonstration

Text2Video: Automatic Video Generation Based on Text Scripts

Published: 17 October 2021

Abstract

To simplify video creation, we present Text2Video, a novel system that lets novice users produce videos automatically through text editing alone. Given an input text script, the director-like system generates engaging game-related videos that illustrate the narrative, provide diverse multi-modal content, and follow video editing guidelines. The system comprises five modules: (1) A material manager extracts highlights from raw live game videos and tags each video highlight, image, and audio clip with labels. (2) A natural language processor extracts entities and semantics from the input text script. (3) A refined cross-modal retrieval module searches the material manager for candidate shots that match the script. (4) A text-to-speech speaker reads the processed text script in a synthesized human voice. (5) The selected material shots and synthesized speech are assembled artistically using appropriate video editing techniques.
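
The data flow between the five modules can be illustrated with a minimal Python sketch. It is not the authors' implementation. Every name in it (Shot, Segment, extract_entities, retrieve, synthesize, text_to_video) is hypothetical. The material manager is assumed to have already produced a tagged library, entity extraction is reduced to a vocabulary lookup, retrieval to label overlap, speech synthesis to a placeholder string, and assembly to an ordered timeline.

```python
from dataclasses import dataclass
from typing import List, Optional, Set


@dataclass
class Shot:
    """A tagged item, as a material manager (module 1) might store it."""
    name: str
    labels: Set[str]


@dataclass
class Segment:
    """One script sentence paired with its chosen shot and narration."""
    text: str
    shot: Optional[str]
    narration: str


def extract_entities(sentence: str, vocabulary: Set[str]) -> Set[str]:
    """Module 2 stand-in: keep the words found in the label vocabulary."""
    words = {w.strip(".,!?").lower() for w in sentence.split()}
    return words & vocabulary


def retrieve(entities: Set[str], library: List[Shot]) -> Optional[Shot]:
    """Module 3 stand-in: pick the shot sharing the most labels, if any."""
    best = max(library, key=lambda s: len(s.labels & entities), default=None)
    if best is None or not (best.labels & entities):
        return None
    return best


def synthesize(sentence: str) -> str:
    """Module 4 stand-in: a real system would return audio, not a string."""
    return f"<speech:{sentence}>"


def text_to_video(script: str, library: List[Shot]) -> List[Segment]:
    """Module 5 stand-in: assemble shots and narration in script order."""
    vocabulary = set().union(*(s.labels for s in library))
    timeline = []
    for sentence in filter(None, (p.strip() for p in script.split("."))):
        shot = retrieve(extract_entities(sentence, vocabulary), library)
        timeline.append(
            Segment(sentence, shot.name if shot else None, synthesize(sentence))
        )
    return timeline


if __name__ == "__main__":
    library = [
        Shot("dragon_fight.mp4", {"dragon", "fight"}),
        Shot("victory.mp4", {"victory", "team"}),
    ]
    for seg in text_to_video("The team won the dragon fight. Then came victory.", library):
        print(seg.shot, "|", seg.text)
```

A sentence with no matching labels gets no shot (None) in this sketch. How the actual system handles unmatched text is not stated in the abstract.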

Supplementary Material

MP4 File (de3172.mp4)
Supplemental video
MP4 File (text2video.mp4)

Cited By

  • (2024) The Generative Fairy Tale of Scary Little Red Riding Hood. Proceedings of the 2024 ACM International Conference on Interactive Media Experiences, 129-144. DOI: 10.1145/3639701.3656303. Online publication date: 7-Jun-2024
  • (2023) VideoMaster. Proceedings of the Thirty-Second International Joint Conference on Artificial Intelligence, 7179-7182. DOI: 10.24963/ijcai.2023/844. Online publication date: 19-Aug-2023
  • (2023) Supporting Video Authoring for Communication of Research Results. Proceedings of the 2023 ACM International Conference on Interactive Media Experiences, 47-59. DOI: 10.1145/3573381.3596157. Online publication date: 12-Jun-2023

    Published In

    MM '21: Proceedings of the 29th ACM International Conference on Multimedia
    October 2021
    5796 pages
    ISBN: 9781450386517
    DOI: 10.1145/3474085
    Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the Owner/Author.

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. text2video
    2. video dubbing
    3. video editing
    4. video generation

    Qualifiers

    • Demonstration

    Conference

    MM '21: ACM Multimedia Conference
    October 20 - 24, 2021
    Virtual Event, China

    Acceptance Rates

    Overall Acceptance Rate 995 of 4,171 submissions, 24%
