
DOI: 10.1145/1180639.1180727
Article

The challenge problem for automated detection of 101 semantic concepts in multimedia

Published: 23 October 2006

Abstract

We introduce a challenge problem for generic video indexing, with two aims: to gain insight into the intermediate steps that affect the performance of multimedia analysis methods, and to foster repeatability of experiments. To arrive at the challenge problem, we provide a general scheme for the systematic examination of automated concept detection methods, decomposing the generic video indexing problem into two unimodal analysis experiments, two multimodal analysis experiments, and one combined analysis experiment. For each experiment, we evaluate generic video indexing performance on 85 hours of international broadcast news data from the TRECVID 2005/2006 benchmark, using a lexicon of 101 semantic concepts. By establishing a minimum performance for each experiment, the challenge problem allows component-based optimization of generic indexing, while offering other researchers a reference for comparison during the development of indexing methodology. To stimulate further investigation of the intermediate analysis steps that influence video indexing performance, the challenge offers the research community a manually annotated concept lexicon, pre-computed low-level multimedia features, trained classifier models, and five experiments with baseline performance, all available at http://www.mediamill.nl/challenge/.
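The comparison against a per-experiment baseline described in the abstract can be sketched as follows. This is only an illustrative sketch: the experiment identifiers, the function names, and the assumption that a higher score is better are all hypothetical and not taken from the challenge itself, and the abstract does not state which evaluation measure is used.

```python
from typing import Dict, List

# Layout stated in the abstract: 2 unimodal, 2 multimodal, 1 combined experiment.
# The keys are hypothetical labels, not the names used by the challenge.
EXPERIMENTS: Dict[str, str] = {
    "unimodal_1": "unimodal",
    "unimodal_2": "unimodal",
    "multimodal_1": "multimodal",
    "multimodal_2": "multimodal",
    "combined": "combined",
}


def compare_to_baseline(
    baseline: Dict[str, float], candidate: Dict[str, float]
) -> Dict[str, float]:
    """Return candidate minus baseline score for each experiment.

    Assumes one scalar score per experiment, where higher is better.
    """
    for name, scores in (("baseline", baseline), ("candidate", candidate)):
        missing = sorted(set(EXPERIMENTS) - set(scores))
        if missing:
            raise ValueError(f"{name} is missing experiments: {missing}")
    return {exp: candidate[exp] - baseline[exp] for exp in EXPERIMENTS}


def improved_experiments(
    baseline: Dict[str, float], candidate: Dict[str, float]
) -> List[str]:
    """List the experiments in which the candidate exceeds the baseline."""
    deltas = compare_to_baseline(baseline, candidate)
    return [exp for exp, delta in deltas.items() if delta > 0]
```

Keeping the five experiments separate, rather than reporting a single aggregate, mirrors the component-based view in the abstract: a method can be seen to improve one analysis step without hiding a regression in another.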



Information & Contributors

Information

Published In

MM '06: Proceedings of the 14th ACM international conference on Multimedia
October 2006
1072 pages
ISBN:1595934472
DOI:10.1145/1180639

Publisher

Association for Computing Machinery

New York, NY, United States


Author Tags

  1. baseline
  2. generic concept detection
  3. video analysis

Qualifiers

  • Article

Conference

MM06: The 14th ACM International Conference on Multimedia 2006
October 23 - 27, 2006
Santa Barbara, CA, USA

Acceptance Rates

Overall Acceptance Rate 2,145 of 8,556 submissions, 25%
