Open access

Video digests: a browsable, skimmable format for informational lecture videos

Authors:

Björn Hartmann,

Maneesh Agrawala

UIST '14: Proceedings of the 27th annual ACM symposium on User interface software and technology

Pages 573 - 582

https://doi.org/10.1145/2642918.2647400

Published: 05 October 2014

Abstract

Increasingly, authors are publishing long informational talks, lectures, and distance-learning videos online. However, it is difficult to browse and skim the content of such videos using current timeline-based video players. Video digests are a new format for informational videos that afford browsing and skimming by segmenting videos into a chapter/section structure and providing short text summaries and thumbnails for each section. Viewers can navigate by reading the summaries and clicking on sections to access the corresponding point in the video. We present a set of tools to help authors create such digests using transcript-based interactions. With our tools, authors can manually create a video digest from scratch, or they can automatically generate a digest by applying a combination of algorithmic and crowdsourcing techniques and then manually refine it as needed. Feedback from first-time users suggests that our transcript-based authoring tools and automated techniques greatly facilitate video digest creation. In an evaluative crowdsourced study, we find that, given a short viewing time, video digests support browsing and skimming better than timeline-based or transcript-based video players.
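The chapter/section structure described in the abstract can be pictured as a small data model. The following is a minimal sketch under stated assumptions, not the authors' implementation: the class names (`Section`, `Chapter`, `VideoDigest`), the field names, and the two helper methods are hypothetical and chosen only for illustration. It shows how clicking a section could map to a playback time, and how the current playback time could map back to a section.

```python
from bisect import bisect_right
from dataclasses import dataclass, field
from typing import List


@dataclass
class Section:
    # All field names here are illustrative assumptions, not taken from the paper.
    start: float          # seconds into the video where the section begins
    summary: str          # short text summary shown to the viewer
    thumbnail: str = ""   # hypothetical path or URL of a representative frame


@dataclass
class Chapter:
    title: str
    sections: List[Section] = field(default_factory=list)


@dataclass
class VideoDigest:
    chapters: List[Chapter]

    def flat_sections(self) -> List[Section]:
        """Return every section of every chapter, ordered by start time."""
        return sorted(
            (s for c in self.chapters for s in c.sections),
            key=lambda s: s.start,
        )

    def seek_time(self, chapter_idx: int, section_idx: int) -> float:
        """Playback time to jump to when the viewer clicks a section."""
        return self.chapters[chapter_idx].sections[section_idx].start

    def section_at(self, t: float) -> Section:
        """Section whose time range contains playback time t."""
        sections = self.flat_sections()
        starts = [s.start for s in sections]
        # Index of the last section that starts at or before t.
        i = bisect_right(starts, t) - 1
        return sections[max(i, 0)]
```

A player built on such a model would call `seek_time` in response to a click and `section_at` to highlight the section currently playing. Storing only start times keeps sections contiguous by construction; an explicit end time would be needed if parts of the video could be left out of the digest.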

Supplementary Material

ZIP File (uistf3662-file5.zip)

The supplementary PDF contains information on how we selected and tuned the segmentation algorithm.

suppl.mov (uistf3662-file3.mp4)

Supplemental video

Index Terms

Human-centered computing

Published In

UIST '14: Proceedings of the 27th annual ACM symposium on User interface software and technology

October 2014

722 pages

ISBN:9781450330695

DOI:10.1145/2642918

General Chair: Hrvoje Benko (Microsoft Research, USA)

Program Chairs: Mira Dontcheva (Adobe, USA), Daniel Wigdor (University of Toronto, Canada)

Copyright © 2014 Owner/Author.

Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the Owner/Author.

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article

Funding Sources

Division of Information and Intelligent Systems

Conference

UIST '14

UIST '14: The 27th Annual ACM Symposium on User Interface Software and Technology

October 5 - 8, 2014

Honolulu, Hawaii, USA

Acceptance Rates

UIST '14 Paper Acceptance Rate 74 of 333 submissions, 22%;

Overall Acceptance Rate 561 of 2,567 submissions, 22%

Article Metrics

Total citations: 70
Total downloads: 1,954
Downloads (last 12 months): 171
Downloads (last 6 weeks): 28

Reflects downloads up to 16 Nov 2024
