Computer Science > Computer Vision and Pattern Recognition

arXiv:2410.05586 (cs)

[Submitted on 8 Oct 2024 (v1), last revised 10 Nov 2024 (this version, v2)]

Title: TeaserGen: Generating Teasers for Long Documentaries

Authors: Weihan Xu, Paul Pu Liang, Haven Kim, Julian McAuley, Taylor Berg-Kirkpatrick, Hao-Wen Dong

Abstract: Teasers are an effective tool for promoting content in entertainment, commercial, and educational fields. However, creating an effective teaser for a long video is challenging: it requires long-range multimodal modeling of the input video, while the output teaser must maintain audiovisual alignment, manage scene changes, and preserve factual accuracy. Progress in this research direction has been hindered by the lack of a publicly available dataset. In this work, we present DocumentaryNet, a collection of 1,269 documentaries paired with their teasers, featuring multimodal data streams of video, speech, music, sound effects, and narrations. With DocumentaryNet, we propose a new two-stage system for generating teasers from long documentaries. The proposed TeaserGen system first generates the teaser narration from the transcribed narration of the documentary using a pretrained large language model, and then selects the most relevant visual content to accompany the generated narration through language-vision models. For narration-video matching, we explore two approaches: a pretraining-based model using pretrained contrastive language-vision models, and a deep sequential model that learns the mapping between narrations and visuals. Our experimental results show that the pretraining-based approach is more effective at identifying relevant visual content than directly trained deep autoregressive models.
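The pretraining-based matching step can be illustrated with a minimal sketch. The abstract does not specify how clips are selected, so everything below is an assumption: each generated narration sentence and each candidate video clip is assumed to already have an embedding from a CLIP-style contrastive text/image encoder, and clips are chosen greedily by cosine similarity, optionally forbidding the same clip from being used twice. The helper names `cosine_similarity` and `match_narration_to_clips` are hypothetical and are not taken from the TeaserGen code.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def cosine_similarity(a: Vector, b: Vector) -> float:
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def match_narration_to_clips(
    narration_embs: Sequence[Vector],
    clip_embs: Sequence[Vector],
    allow_reuse: bool = False,
) -> List[int]:
    """Hypothetical helper: greedily pick the most similar clip for each sentence.

    Assumes text and clip embeddings live in a shared space produced by a
    pretrained contrastive language-vision model (e.g. a CLIP-style encoder).
    Returns one clip index per narration sentence, in narration order.
    """
    if narration_embs and not clip_embs:
        raise ValueError("no clips to match")
    if not allow_reuse and len(clip_embs) < len(narration_embs):
        raise ValueError("not enough clips to match without reuse")
    used = set()
    selected = []
    for text_emb in narration_embs:
        best_idx, best_score = -1, -math.inf
        for idx, clip_emb in enumerate(clip_embs):
            # Assumed constraint: skip clips already shown, to avoid repeated shots.
            if not allow_reuse and idx in used:
                continue
            score = cosine_similarity(text_emb, clip_emb)
            if score > best_score:
                best_idx, best_score = idx, score
        used.add(best_idx)
        selected.append(best_idx)
    return selected


if __name__ == "__main__":
    narration = [[1.0, 0.0], [0.0, 1.0]]
    clips = [[0.0, 0.9], [0.8, 0.1], [0.7, 0.7]]
    print(match_narration_to_clips(narration, clips))  # [1, 0]
```

Greedy per-sentence selection keeps the sketch short; a real system might instead solve a global assignment or enforce temporal ordering of the chosen clips, which the abstract leaves open.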

Subjects:	Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
Cite as:	arXiv:2410.05586 [cs.CV]
	(or arXiv:2410.05586v2 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2410.05586

Submission history

From: Weihan Xu
[v1] Tue, 8 Oct 2024 01:00:09 UTC (7,585 KB)
[v2] Sun, 10 Nov 2024 02:20:47 UTC (7,796 KB)