DOI: 10.1145/2647868.2654964
poster

Multi-modal Language Models for Lecture Video Retrieval

Published: 03 November 2014

Abstract

We propose Multi-modal Language Models (MLMs), which adapt latent variable techniques for document analysis to exploring co-occurrence relationships in multi-modal data. In this paper, we focus on the application of MLMs to indexing text from slides and speech in lecture videos, and subsequently employ a multi-modal probabilistic ranking function for lecture video retrieval. The MLM achieves highly competitive results against well-established retrieval methods such as the Vector Space Model and Probabilistic Latent Semantic Analysis. When noise is present in the data, retrieval performance with MLMs is shown to improve with the quality of the spoken text extracted from the video.
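
To make the comparison concrete, here is a minimal sketch of the Vector Space Model baseline named in the abstract, applied to lecture videos whose slide text and speech transcript are pooled into one bag of words. It is not the MLM and not the authors' implementation: the function names, the whitespace tokenizer, the smoothed IDF formula, and the toy data are all assumptions made for illustration.

```python
import math
from collections import Counter


def tokenize(text):
    # Assumption: lowercase whitespace tokenization, no stemming or stop words.
    return text.lower().split()


def build_index(videos):
    """videos maps video_id -> (slide_text, speech_text).

    Returns (tf-idf vector per video, idf table). Slide and speech
    tokens are simply pooled, which is one possible design choice.
    """
    counts = {
        vid: Counter(tokenize(slides) + tokenize(speech))
        for vid, (slides, speech) in videos.items()
    }
    n_docs = len(counts)
    doc_freq = Counter()
    for term_counts in counts.values():
        doc_freq.update(term_counts.keys())
    # Smoothed inverse document frequency (an assumed variant).
    idf = {t: math.log((1 + n_docs) / (1 + df)) + 1.0 for t, df in doc_freq.items()}
    vectors = {
        vid: {t: c * idf[t] for t, c in term_counts.items()}
        for vid, term_counts in counts.items()
    }
    return vectors, idf


def cosine(a, b):
    # Cosine similarity between two sparse vectors stored as dicts.
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rank(query, vectors, idf):
    # Query terms unseen in the collection are ignored.
    q_counts = Counter(tokenize(query))
    q_vec = {t: c * idf[t] for t, c in q_counts.items() if t in idf}
    scores = {vid: cosine(q_vec, vec) for vid, vec in vectors.items()}
    return sorted(scores, key=scores.get, reverse=True)


# Hypothetical toy collection: (slide text, speech transcript) per video.
videos = {
    "v1": ("gradient descent optimization", "we minimize the loss with gradient steps"),
    "v2": ("french revolution history", "the monarchy fell in 1789"),
}
vectors, idf = build_index(videos)
print(rank("gradient descent", vectors, idf))
```

Pooling the two modalities discards which source a word came from; the abstract indicates that the MLM instead models co-occurrence relationships between slide and speech text.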

    Published In

    MM '14: Proceedings of the 22nd ACM international conference on Multimedia
    November 2014
    1310 pages
    ISBN: 9781450330633
    DOI: 10.1145/2647868
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. latent variable modeling
    2. multi-modal probabilistic ranking
    3. multi-modal retrieval

    Qualifiers

    • Poster

    Conference

    MM '14: 2014 ACM Multimedia Conference
    November 3-7, 2014
    Orlando, Florida, USA

    Acceptance Rates

    MM '14 paper acceptance rate: 55 of 286 submissions (19%)
    Overall acceptance rate: 2,145 of 8,556 submissions (25%)

    Bibliometrics & Citations

    Article Metrics

    • Downloads (last 12 months): 22
    • Downloads (last 6 weeks): 3
    Reflects downloads up to 18 Nov 2024

    Cited By

    • (2024) Semantic Labels-Aware Transformer Model for Searching over a Large Collection of Lecture-Slides. 2024 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), pages 6004-6013. DOI: 10.1109/WACV57701.2024.00591. Online publication date: 3-Jan-2024.
    • (2024) EduCross: Dual adversarial bipartite hypergraph learning for cross-modal retrieval in multimodal educational slides. Information Fusion, 109 (102428). DOI: 10.1016/j.inffus.2024.102428. Online publication date: Sep-2024.
    • (2023) Lecture Presentations Multimodal Dataset: Towards Understanding Multimodality in Educational Videos. 2023 IEEE/CVF International Conference on Computer Vision (ICCV), pages 20030-20041. DOI: 10.1109/ICCV51070.2023.01838. Online publication date: 1-Oct-2023.
    • (2022) Online Educational Video Recommendation System Analysis. Encyclopedia of Data Science and Machine Learning, pages 1559-1577. DOI: 10.4018/978-1-7998-9220-5.ch093. Online publication date: 14-Oct-2022.
    • (2022) Learning to Retrieve Videos by Asking Questions. Proceedings of the 30th ACM International Conference on Multimedia, pages 356-365. DOI: 10.1145/3503161.3548361. Online publication date: 10-Oct-2022.
    • (2019) Video Transcript Indexing and Retrieval Procedure. 2019 International Conference on Software, Telecommunications and Computer Networks (SoftCOM), pages 1-6. DOI: 10.23919/SOFTCOM.2019.8903790. Online publication date: Sep-2019.
    • (2019) Towards a Custom Designed Mechanism for Indexing and Retrieving Video Transcripts. Hybrid Artificial Intelligent Systems, pages 299-309. DOI: 10.1007/978-3-030-29859-3_26. Online publication date: 26-Aug-2019.
    • (2018) Temporal Lecture Video Fragmentation Using Word Embeddings. MultiMedia Modeling, pages 254-265. DOI: 10.1007/978-3-030-05716-9_21. Online publication date: 11-Dec-2018.
    • (2017) Improving speech transcription by exploiting user feedback and word repetition. Multimedia Tools and Applications, 76:19, pages 20359-20376. DOI: 10.1007/s11042-017-4714-x. Online publication date: 1-Oct-2017.
    • (2016) Videopedia. Proceedings, Part I, of the 22nd International Conference on MultiMedia Modeling - Volume 9516, pages 238-250. DOI: 10.1007/978-3-319-27671-7_20. Online publication date: 4-Jan-2016.
