Inferring generic activities and events from image content and bags of geo-tags

Authors:

Jiebo Luo

CIVR '08: Proceedings of the 2008 international conference on Content-based image and video retrieval

Pages 37 - 46

https://doi.org/10.1145/1386352.1386361

Published: 07 July 2008

Abstract

The use of contextual information in building concept detectors for digital media has caught the attention of the multimedia community in recent years. Generally speaking, any information extracted from image headers or tags, or from large collections of related images, and used at classification time can be considered contextual. Such information is discriminative in its own right and, when combined with pure content-based detection systems that use pixel information, can significantly improve overall recognition performance. In this paper, we describe a framework for probabilistically modeling geographical information using a Geographical Information Systems (GIS) database for event and activity recognition in general-purpose consumer images, such as those obtained from Flickr. The proposed framework discriminatively models the statistical saliency of geo-tags in describing an activity or event. Our work leverages the inherent patterns of association between events and their geographical venues. We use descriptions of small local neighborhoods to form bags of geo-tags as our representation. Statistical coherence is observed in such descriptions across a wide range of event classes and across many different users. To test our approach, we identify classes of activities and events in which people commonly participate and take pictures. Images and corresponding metadata for the identified events and activities are obtained from Flickr. We employ visual detectors obtained from Columbia University (Columbia 374), which perform pure visual event and activity recognition. In our experiments, we present the performance advantage obtained by combining contextual GPS information with pixel-based detection systems.
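
As a rough illustration of the idea in the abstract, the following Python sketch builds a bag of geo-tags from the names of places near a photo, scores event classes from that bag, and fuses the result with scores from a visual detector. Everything in it is an assumption made for illustration: the function names, the toy place names, the naive Bayes scorer (a simple generative stand-in, not the discriminative model the paper describes), and the weighted-average fusion rule are not taken from the paper.

```python
from collections import Counter
import math


def bag_of_geotags(place_descriptions):
    """Turn descriptions of places near a photo into a bag (multiset) of tokens."""
    bag = Counter()
    for description in place_descriptions:
        bag.update(description.lower().split())
    return bag


def train_naive_bayes(labeled_bags, alpha=1.0):
    """Estimate class log-priors and Laplace-smoothed token log-likelihoods.

    labeled_bags is a list of (bag, label) pairs. Naive Bayes is an
    illustrative stand-in here, not the model used in the paper.
    """
    vocab = set()
    token_counts = {}
    photo_counts = Counter()
    for bag, label in labeled_bags:
        photo_counts[label] += 1
        token_counts.setdefault(label, Counter()).update(bag)
        vocab.update(bag)
    total_photos = sum(photo_counts.values())
    model = {}
    for label, counts in token_counts.items():
        denom = sum(counts.values()) + alpha * len(vocab)
        model[label] = {
            "log_prior": math.log(photo_counts[label] / total_photos),
            "log_lik": {t: math.log((counts[t] + alpha) / denom) for t in vocab},
        }
    return model


def geo_posterior(model, bag):
    """Return P(class | bag); tokens never seen in training are ignored."""
    scores = {}
    for label, params in model.items():
        score = params["log_prior"]
        for token, n in bag.items():
            if token in params["log_lik"]:
                score += n * params["log_lik"][token]
        scores[label] = score
    top = max(scores.values())  # subtract the max for numerical stability
    exp_scores = {k: math.exp(v - top) for k, v in scores.items()}
    z = sum(exp_scores.values())
    return {k: v / z for k, v in exp_scores.items()}


def late_fusion(visual_scores, geo_scores, weight=0.5):
    """Weighted average of per-class visual and geo scores (hypothetical rule)."""
    return {
        c: weight * visual_scores[c] + (1.0 - weight) * geo_scores.get(c, 0.0)
        for c in visual_scores
    }


# Toy data: place names are invented for the example.
training = [
    (bag_of_geotags(["Sunny Beach", "Ocean Pier"]), "beach"),
    (bag_of_geotags(["Pine Ski Resort", "Mountain Lodge"]), "ski"),
]
model = train_naive_bayes(training)
geo = geo_posterior(model, bag_of_geotags(["Ocean Beach Cafe"]))
fused = late_fusion({"beach": 0.4, "ski": 0.6}, geo, weight=0.5)
print(max(fused, key=fused.get))
```

In this toy run the visual scores alone slightly favor "ski", while the geo-tag evidence favors "beach" strongly enough to change the fused decision. The fusion weight would normally be tuned on held-out data rather than fixed at 0.5.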

Index Terms

Inferring generic activities and events from image content and bags of geo-tags
1. Information systems
  1. Information retrieval
    1. Retrieval models and ranking

Information & Contributors

Information

Published In

CIVR '08: Proceedings of the 2008 international conference on Content-based image and video retrieval

July 2008

674 pages

ISBN: 9781605580708

DOI: 10.1145/1386352

General Chairs:
Jiebo Luo (Kodak Research Laboratories), Ling Guan (Ryerson University)

Program Chairs:
Alan Hanjalic (Delft University of Technology), Mohan Kankanhalli (National University of Singapore), Ivan Lee (University of South Australia)

Copyright © 2008 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article

Conference

CIVR08: CIVR'08 - International Conference on Content-based Image and Video Retrieval

July 7 - 9, 2008

Niagara Falls, Canada

Bibliometrics & Citations

Bibliometrics

Article Metrics

Total Citations: 55

Total Downloads: 733

Downloads (Last 12 months): 7

Downloads (Last 6 weeks): 0

Reflects downloads up to 22 Nov 2024

Citations

Cited By

Arbinger, C., Bullin, M., and Henrich, A. 2022. Exploiting Geodata to Improve Image Recognition with Deep Learning. Companion Proceedings of the Web Conference 2022, 648-655. Online publication date: 25-Apr-2022.
https://dl.acm.org/doi/10.1145/3487553.3524645
Yin, Y., Zhang, Y., Liu, Z., Wang, S., Shah, R., and Zimmermann, R. 2022. GPS2Vec: Pre-Trained Semantic Embeddings for Worldwide GPS Coordinates. IEEE Transactions on Multimedia 24, 890-903. Online publication date: 2022.
https://doi.org/10.1109/TMM.2021.3060951
Yin, Y., Zhang, Y., Liu, Z., Liang, Y., Wang, S., Shah, R., Zimmermann, R., Shen, H., Zhuang, Y., Smith, J., Yang, Y., Cesar, P., Metze, F., and Prabhakaran, B. 2021. Learning Multi-context Aware Location Representations from Large-scale Geotagged Images. Proceedings of the 29th ACM International Conference on Multimedia, 899-907. Online publication date: 17-Oct-2021.
https://dl.acm.org/doi/10.1145/3474085.3475268
Jaffali, S., Jamoussi, S., Khelifi, N., and Hamadou, A. 2019. Survey on Social Networks Data Analysis. Innovations for Community Services, 100-119. Online publication date: 15-Dec-2019.
https://doi.org/10.1007/978-3-030-37484-6_6
Newsam, S. and Leung, D. 2018. Georeferenced Social Multimedia as Volunteered Geographic Information. CyberGIS for Geospatial Discovery and Innovation, 225-246. Online publication date: 27-Jun-2018.
https://doi.org/10.1007/978-94-024-1531-5_12
Rawat, Y. and Kankanhalli, M. 2017. ClickSmart. IEEE Transactions on Circuits and Systems for Video Technology 27:1, 149-158. Online publication date: 1-Jan-2017.
https://dl.acm.org/doi/10.1109/TCSVT.2016.2555658
Kaneko, T. and Yanai, K. 2016. Event photo mining from Twitter using keyword bursts and image clustering. Neurocomputing 172, 143-158. Online publication date: Jan-2016.
https://doi.org/10.1016/j.neucom.2015.02.081
Spyrou, E. and Mylonas, P. 2016. A survey on Flickr multimedia research challenges. Engineering Applications of Artificial Intelligence 51:C, 71-91. Online publication date: 1-May-2016.
https://dl.acm.org/doi/10.1016/j.engappai.2016.01.006
Li, L., Jha, R., Thomee, B., Shamma, D., Cao, L., and Wang, Y. 2016. Where the Photos Were Taken: Location Prediction by Learning from Flickr Photos. Large-Scale Visual Geo-Localization, 41-58. Online publication date: 6-Jul-2016.
https://doi.org/10.1007/978-3-319-25781-5_3
Yanai, K. 2015. [Invited Paper] A Review of Web Image Mining. ITE Transactions on Media Technology and Applications 3:3, 156-169. Online publication date: 2015.
https://doi.org/10.3169/mta.3.156
