DOI: 10.1145/957013.957065
Article

Discriminative model fusion for semantic concept detection and annotation in video

Published: 02 November 2003

Abstract

In this paper we describe a general information fusion algorithm that can be used to incorporate multimodal cues in building user-defined semantic concept models. We compare this technique with a Bayesian Network-based approach on a semantic concept detection task. Results indicate that this technique yields superior performance. We demonstrate this approach further by building classifiers of arbitrary concepts in a score space defined by a pre-deployed set of multimodal concepts. Results show annotation for user-defined concepts both in and outside the pre-deployed set is competitive with our best video-only models on the TREC Video 2002 corpus.
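The idea of classifying in a score space can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not name the classifier, so a small logistic regression stands in as the discriminative model, and the three detectors, their strengths (`DETECTOR_STRENGTHS`), and the synthetic shots are all assumptions made up for illustration. Each shot is represented by the vector of confidence scores from the pre-deployed detectors, and the fusion model learns a decision boundary for a new concept in that space.

```python
import math
import random

# Hypothetical: how strongly each pre-deployed detector responds to the target concept.
DETECTOR_STRENGTHS = [1.0, 0.8, 0.6]

def make_shots(n, rng):
    """Generate synthetic (score_vector, label) pairs standing in for video shots."""
    shots = []
    for _ in range(n):
        label = rng.randint(0, 1)
        # Each detector emits a noisy score whose mean shifts when the concept is present.
        scores = [rng.gauss(label * s, 1.0) for s in DETECTOR_STRENGTHS]
        shots.append((scores, label))
    return shots

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_fusion(shots, lr=0.1, epochs=200):
    """Fit logistic regression on score vectors with batch gradient descent."""
    dim = len(shots[0][0])
    weights = [0.0] * dim
    bias = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * dim
        grad_b = 0.0
        for scores, label in shots:
            err = sigmoid(bias + sum(w * s for w, s in zip(weights, scores))) - label
            for i, s in enumerate(scores):
                grad_w[i] += err * s
            grad_b += err
        weights = [w - lr * g / len(shots) for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / len(shots)
    return weights, bias

def fused_decision(weights, bias, scores):
    """Return 1 if the fused model predicts the concept is present, else 0."""
    return 1 if bias + sum(w * s for w, s in zip(weights, scores)) > 0 else 0

def accuracy(weights, bias, shots):
    hits = sum(fused_decision(weights, bias, s) == y for s, y in shots)
    return hits / len(shots)

rng = random.Random(0)
train_shots = make_shots(400, rng)
test_shots = make_shots(400, rng)
weights, bias = train_fusion(train_shots)
print("fused accuracy:", accuracy(weights, bias, test_shots))
```

In a real system the score vectors would come from trained audio, visual, and text concept detectors rather than from a random generator, and the fusion classifier could be any discriminative model trained on those vectors.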


    Published In

    MULTIMEDIA '03: Proceedings of the eleventh ACM international conference on Multimedia
    November 2003
    670 pages
    ISBN:1581137222
    DOI:10.1145/957013

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. ACM proceedings
    2. digital video annotation and indexing
    3. semantic concept detection

    Qualifiers

    • Article

    Conference

    Acceptance Rates

    Overall Acceptance Rate 2,145 of 8,556 submissions, 25%

