Discriminative model fusion for semantic concept detection and annotation in video

Authors: G. Iyengar, H. J. Nock

MULTIMEDIA '03: Proceedings of the eleventh ACM international conference on Multimedia

Pages 255 - 258

https://doi.org/10.1145/957013.957065

Published: 02 November 2003

Abstract

In this paper we describe a general information fusion algorithm that can be used to incorporate multimodal cues in building user-defined semantic concept models. We compare this technique with a Bayesian Network-based approach on a semantic concept detection task. Results indicate that this technique yields superior performance. We demonstrate this approach further by building classifiers of arbitrary concepts in a score space defined by a pre-deployed set of multimodal concepts. Results show annotation for user-defined concepts both in and outside the pre-deployed set is competitive with our best video-only models on the TREC Video 2002 corpus.
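
The score-space idea in the abstract can be illustrated with a small sketch. The abstract does not say which discriminative classifier is used, so the code below substitutes plain logistic regression trained by gradient descent as a stand-in; the three detector dimensions, the toy scores, and the function names `train_fusion` and `fuse` are all hypothetical and are not taken from the paper. Each video shot is represented by the vector of confidence scores produced by a fixed set of pre-deployed concept detectors, and a new user-defined concept is learned as a classifier over those vectors.

```python
import math


def train_fusion(score_vectors, labels, epochs=500, lr=0.5):
    """Fit a logistic-regression fusion model on concept-score vectors.

    Stand-in classifier (assumption): the abstract does not name one.
    """
    dim = len(score_vectors[0])
    n = len(score_vectors)
    weights = [0.0] * dim
    bias = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * dim
        grad_b = 0.0
        for x, y in zip(score_vectors, labels):
            z = bias + sum(w * xi for w, xi in zip(weights, x))
            p = 1.0 / (1.0 + math.exp(-z))
            err = p - y  # gradient of the log loss with respect to z
            for j in range(dim):
                grad_w[j] += err * x[j]
            grad_b += err
        # Batch gradient step, averaged over the training shots.
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
    return weights, bias


def fuse(weights, bias, x):
    """Return the fused confidence for one shot's score vector."""
    z = bias + sum(w * xi for w, xi in zip(weights, x))
    return 1.0 / (1.0 + math.exp(-z))


# Toy data (invented): each row holds the scores of three hypothetical
# pre-deployed detectors for one shot; the label marks the new concept.
shots = [
    [0.9, 0.2, 0.7],
    [0.8, 0.1, 0.4],
    [0.7, 0.3, 0.9],
    [0.2, 0.8, 0.6],
    [0.1, 0.9, 0.3],
    [0.3, 0.7, 0.8],
]
labels = [1, 1, 1, 0, 0, 0]

weights, bias = train_fusion(shots, labels)
print(fuse(weights, bias, [0.85, 0.15, 0.5]))  # shot resembling the positives
print(fuse(weights, bias, [0.15, 0.85, 0.5]))  # shot resembling the negatives
```

The point of the sketch is only the data flow: detector outputs become features, and the fused model is trained discriminatively on labeled shots rather than by modeling the joint distribution of the scores.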

Index Terms

Information systems > Information retrieval > Retrieval models and ranking

Published In

MULTIMEDIA '03: Proceedings of the eleventh ACM international conference on Multimedia

November 2003

670 pages

ISBN:1581137222

DOI:10.1145/957013

General Chairs: Lawrence Rowe (University of California, Berkeley), Harrick Vin (University of Texas, Austin)

Program Chairs: Thomas Plagemann (University of Oslo), Prashant Shenoy (University of Massachusetts, Amherst), John R. Smith (IBM T.J. Watson Research Center)

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Article

Conference

MM03: 2003 11th Annual ACM International Conference on Multimedia

November 2 - 8, 2003

Berkeley, CA, USA

Acceptance Rates

Overall Acceptance Rate 2,145 of 8,556 submissions, 25%
