DOI: 10.1145/1991996.1992035
research-article

Saliency moments for image categorization

Published: 18 April 2011

Abstract

In this paper we present Saliency Moments, a new holistic descriptor for image recognition inspired by two principles of biological vision: gist perception and selective visual attention. While traditional image features extract either local or global discriminative properties from the visual content, we use a hybrid approach that exploits coarsely localized information, namely the shape and contours of salient regions, to build a global, low-dimensional image signature. Results show that this new type of image description outperforms traditional global features on scene and object categorization across a variety of challenging datasets. Moreover, we show that, when combined with other existing descriptors (SIFT, Color Moments, Wavelet Feature and Edge Histogram), the saliency-based features provide complementary information, improving the precision of a retrieval system we built for TRECVID 2010.

Published In

ICMR '11: Proceedings of the 1st ACM International Conference on Multimedia Retrieval
April 2011
512 pages
ISBN: 9781450303361
DOI: 10.1145/1991996
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. feature extraction
  2. gist
  3. image indexing
  4. saliency
  5. scene recognition
  6. visual attention

Qualifiers

  • Research-article

Conference

ICMR'11

Acceptance Rates

Overall Acceptance Rate 254 of 830 submissions, 31%

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (last 12 months): 1
  • Downloads (last 6 weeks): 0
Reflects downloads up to 22 Nov 2024

Citations

Cited By

  • (2015) The beauty of capturing faces: Rating the quality of digital portraits. 2015 11th IEEE International Conference and Workshops on Automatic Face and Gesture Recognition (FG), pages 1-8. DOI: 10.1109/FG.2015.7163086. Online publication date: May-2015.
  • (2014) 6 Seconds of Sound and Vision. Proceedings of the 2014 IEEE Conference on Computer Vision and Pattern Recognition, pages 4272-4279. DOI: 10.1109/CVPR.2014.544. Online publication date: 23-Jun-2014.
  • (2014) Retina enhanced SURF descriptors for spatio-temporal concept detection. Multimedia Tools and Applications, 69:2, pages 443-469. DOI: 10.1007/s11042-012-1280-0. Online publication date: 1-Mar-2014.
  • (2014) Hierarchical Late Fusion for Concept Detection in Videos. Fusion in Computer Vision, pages 53-77. DOI: 10.1007/978-3-319-05696-8_3. Online publication date: 26-Mar-2014.
  • (2013) Semantic indexing and computational aesthetics. Proceedings of the 3rd ACM conference on International conference on multimedia retrieval, pages 337-340. DOI: 10.1145/2461466.2461532. Online publication date: 16-Apr-2013.
  • (2012) Where is the beauty? Proceedings of the 20th ACM international conference on Multimedia, pages 1363-1364. DOI: 10.1145/2393347.2396486. Online publication date: 29-Oct-2012.
  • (2012) Analysing Facebook features to support event detection for photo-based Facebook applications. Proceedings of the 2nd ACM International Conference on Multimedia Retrieval, pages 1-8. DOI: 10.1145/2324796.2324810. Online publication date: 5-Jun-2012.
  • (2012) Enhancing semantic features with compositional analysis for scene recognition. Proceedings of the 12th international conference on Computer Vision - Volume Part III, pages 446-455. DOI: 10.1007/978-3-642-33885-4_45. Online publication date: 7-Oct-2012.
  • (2012) A multimedia retrieval framework based on automatic graded relevance judgments. Proceedings of the 18th international conference on Advances in Multimedia Modeling, pages 300-311. DOI: 10.1007/978-3-642-27355-1_29. Online publication date: 4-Jan-2012.
