Article

Automatic music video summarization based on audio-visual-text analysis and alignment

Authors:

Namunu C. Maddage,

Mohan S. Kankanhalli

SIGIR '05: Proceedings of the 28th annual international ACM SIGIR conference on Research and development in information retrieval

Pages 361 - 368

https://doi.org/10.1145/1076034.1076097

Published: 15 August 2005

Abstract

In this paper, we propose a novel approach to automatic music video summarization based on audio-visual-text analysis and alignment. The music video is separated into music and video tracks. For the music track, the chorus is detected using music structure analysis. For the video track, we first segment the shots and classify them into close-up face shots and non-face shots; we then extract the lyrics and detect the most repeated lyrics from the shots. The music video summary is generated by aligning the boundaries of the detected chorus, the shot classes, and the most repeated lyrics. Experiments on chorus detection, shot classification, and lyrics detection using 20 English music videos are described. Subjective user studies were conducted to evaluate the quality and effectiveness of the summaries. Comparisons with summaries produced by our previous method and by a manual method indicate that the proposed method better meets users' expectations.
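The abstract does not state the exact alignment rule, so the following is only a minimal sketch of the general idea under assumed simplifications: the most repeated lyric line is found by counting normalized text lines, and a detected chorus interval is snapped to the nearest shot boundaries. The `Shot` type and the functions `most_repeated_lyric` and `align_chorus_to_shots` are hypothetical names introduced for illustration and are not taken from the paper.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Shot:
    """A video shot (hypothetical structure, not from the paper)."""
    start: float   # shot start time in seconds
    end: float     # shot end time in seconds
    is_face: bool  # True for a close-up face shot


def most_repeated_lyric(lines):
    """Return the most frequent lyric line after trimming and lowercasing.

    Ties are resolved in favour of the line seen first.
    """
    if not lines:
        raise ValueError("no lyric lines given")
    counts = Counter(line.strip().lower() for line in lines)
    return counts.most_common(1)[0][0]


def align_chorus_to_shots(chorus, shots):
    """Snap a (start, end) chorus interval to the nearest shot boundaries.

    Assumed rule for illustration only: each chorus boundary moves to the
    closest shot boundary. If snapping would collapse the interval, the
    original chorus interval is returned unchanged.
    """
    boundaries = sorted({s.start for s in shots} | {s.end for s in shots})
    if not boundaries:
        return chorus
    start = min(boundaries, key=lambda b: abs(b - chorus[0]))
    end = min(boundaries, key=lambda b: abs(b - chorus[1]))
    return (start, end) if end > start else chorus


if __name__ == "__main__":
    shots = [
        Shot(0.0, 10.0, False),
        Shot(10.0, 24.0, True),
        Shot(24.0, 40.0, False),
        Shot(40.0, 55.0, True),
    ]
    print(most_repeated_lyric(["Hold on", "hold on ", "Let go", "Hold on"]))
    print(align_chorus_to_shots((11.5, 38.0), shots))
```

Snapping to shot boundaries is one plausible way to avoid cutting a summary in the middle of a shot; the paper's actual procedure also takes the shot class and the position of the repeated lyrics into account, which this sketch leaves out.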

Index Terms

Automatic music video summarization based on audio-visual-text analysis and alignment
1. Information systems
  1. Information retrieval
    1. Document representation
    2. Search engine architectures and scalability
      1. Search engine indexing
2. Theory of computation
  1. Semantics and reasoning
    1. Program reasoning
      1. Abstraction

Published In

SIGIR '05: Proceedings of the 28th annual international ACM SIGIR conference on Research and development in information retrieval

August 2005

708 pages

ISBN:1595930345

DOI:10.1145/1076034

General Chairs: Ricardo Baeza-Yates (University of Chile, Chile), Nivio Ziviani (Federal University of Minas Gerais, Brazil)

Program Chairs: Gary Marchionini (University of North Carolina, USA), Alistair Moffat (University of Melbourne, Australia), John Tait (University of Sunderland, UK)

Copyright © 2005 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

SIGIR: ACM Special Interest Group on Information Retrieval

Publisher

Association for Computing Machinery

New York, NY, United States


Qualifiers

Article

Conference

SIGIR05

Sponsor:

SIGIR

SIGIR05: The 28th ACM/SIGIR International Symposium on Information Retrieval 2005

August 15 - 19, 2005

Salvador, Brazil

Acceptance Rates

Overall Acceptance Rate 792 of 3,983 submissions, 20%
