DOI: 10.1145/2642918.2647400
research-article
Open access

Video digests: a browsable, skimmable format for informational lecture videos

Published: 05 October 2014

Abstract

Increasingly, authors are publishing long informational talks, lectures, and distance-learning videos online. However, it is difficult to browse and skim the content of such videos using current timeline-based video players. Video digests are a new format for informational videos that afford browsing and skimming by segmenting videos into a chapter/section structure and providing short text summaries and thumbnails for each section. Viewers can navigate by reading the summaries and clicking on sections to access the corresponding point in the video. We present a set of tools to help authors create such digests using transcript-based interactions. With our tools, authors can manually create a video digest from scratch, or they can automatically generate a digest by applying a combination of algorithmic and crowdsourcing techniques and then manually refine it as needed. Feedback from first-time users suggests that our transcript-based authoring tools and automated techniques greatly facilitate video digest creation. In an evaluative crowdsourced study we find that given a short viewing time, video digests support browsing and skimming better than timeline-based or transcript-based video players.
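
The chapter/section structure described in the abstract can be pictured as a small nested data model. The following is a minimal sketch only, not the authors' implementation: the class names (`Section`, `Chapter`, `VideoDigest`), the field names, and the file name `lecture.mp4` are all hypothetical and chosen for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Section:
    """One digest section: a time span with a summary and a thumbnail (hypothetical model)."""
    start: float           # seconds into the video where the section begins
    end: float             # seconds into the video where the section ends
    summary: str           # short text summary shown to the viewer
    thumbnail_time: float  # timestamp of the frame used as the thumbnail


@dataclass
class Chapter:
    """A chapter groups consecutive sections under a title."""
    title: str
    sections: List[Section] = field(default_factory=list)


@dataclass
class VideoDigest:
    """A video plus its chapter/section outline."""
    video_url: str
    chapters: List[Chapter] = field(default_factory=list)

    def seek_time(self, chapter_idx: int, section_idx: int) -> float:
        """Playback position to jump to when a viewer clicks a section."""
        return self.chapters[chapter_idx].sections[section_idx].start

    def section_at(self, t: float) -> Optional[Section]:
        """Section whose time span contains t, or None if no section covers it."""
        for chapter in self.chapters:
            for section in chapter.sections:
                if section.start <= t < section.end:
                    return section
        return None


# Illustrative data only; the summaries and times are made up.
digest = VideoDigest(
    video_url="lecture.mp4",
    chapters=[
        Chapter("Introduction", [
            Section(0.0, 95.0, "Course goals and logistics.", 12.0),
            Section(95.0, 240.0, "Motivating example.", 130.0),
        ]),
    ],
)
print(digest.seek_time(0, 1))  # 95.0
```

Clicking a section maps to `seek_time`, while `section_at` supports the reverse lookup (highlighting the current section during playback).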

Supplementary Material

ZIP File (uistf3662-file5.zip)
The supplementary pdf contains information on how we selected and tuned the segmentation algorithm.
suppl.mov (uistf3662-file3.mp4)
Supplemental video

    Published In

    UIST '14: Proceedings of the 27th annual ACM symposium on User interface software and technology
    October 2014
    722 pages
    ISBN: 9781450330695
    DOI: 10.1145/2642918
    Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the Owner/Author.

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. education
    2. video digests
    3. video presentation interfaces

    Qualifiers

    • Research-article

    Conference

    UIST '14

    Acceptance Rates

    UIST '14 Paper Acceptance Rate 74 of 333 submissions, 22%;
    Overall Acceptance Rate 561 of 2,567 submissions, 22%

    Article Metrics

    • Downloads (Last 12 months): 171
    • Downloads (Last 6 weeks): 28
    Reflects downloads up to 16 Nov 2024

    Cited By

    • (2024) EduLive: Re-Creating Cues for Instructor-Learners Interaction in Educational Live Streams with Learners' Transcript-Based Annotations. Proceedings of the ACM on Human-Computer Interaction 8:CSCW2 (1-33). DOI: 10.1145/3686960. Online publication date: 8-Nov-2024
    • (2024) STIVi: Turning Perspective Sketching Videos into Interactive Tutorials. Proceedings of the 50th Graphics Interface Conference (1-13). DOI: 10.1145/3670947.3670969. Online publication date: 3-Jun-2024
    • (2024) SkillsInterpreter: A Case Study of Automatic Annotation of Flowcharts to Support Browsing Instructional Videos in Modern Martial Arts using Large Language Models. Proceedings of the Augmented Humans International Conference 2024 (217-225). DOI: 10.1145/3652920.3652942. Online publication date: 4-Apr-2024
    • (2024) FastPerson: Enhancing Video-Based Learning through Video Summarization that Preserves Linguistic and Visual Contexts. Proceedings of the Augmented Humans International Conference 2024 (205-216). DOI: 10.1145/3652920.3652922. Online publication date: 4-Apr-2024
    • (2024) PodReels: Human-AI Co-Creation of Video Podcast Teasers. Proceedings of the 2024 ACM Designing Interactive Systems Conference (958-974). DOI: 10.1145/3643834.3661591. Online publication date: 1-Jul-2024
    • (2024) Tutorial mismatches: investigating the frictions due to interface differences when following software video tutorials. Proceedings of the 2024 ACM Designing Interactive Systems Conference (1942-1955). DOI: 10.1145/3643834.3661511. Online publication date: 1-Jul-2024
    • (2024) ExpressEdit: Video Editing with Natural Language and Sketching. Proceedings of the 29th International Conference on Intelligent User Interfaces (515-536). DOI: 10.1145/3640543.3645164. Online publication date: 18-Mar-2024
    • (2024) Temaneki: Map-Based Collaboration Tool for Consensus-Building in Student-Run Festival Management Teams. Extended Abstracts of the CHI Conference on Human Factors in Computing Systems (1-8). DOI: 10.1145/3613905.3651013. Online publication date: 11-May-2024
    • (2024) Surgment: Segmentation-enabled Semantic Search and Creation of Visual Question and Feedback to Support Video-Based Surgery Learning. Proceedings of the 2024 CHI Conference on Human Factors in Computing Systems (1-18). DOI: 10.1145/3613904.3642587. Online publication date: 11-May-2024
    • (2024) SwapVid: Integrating Video Viewing and Document Exploration with Direct Manipulation. Proceedings of the 2024 CHI Conference on Human Factors in Computing Systems (1-13). DOI: 10.1145/3613904.3642515. Online publication date: 11-May-2024
