Perceptual modeling in the problem of active object recognition in visual scenes

Published: 01 August 2016

Abstract

Incorporating models of human perception into the process of scene interpretation and object recognition in visual content is a strong trend in computer vision. In this paper we tackle the modeling of visual perception via automatic visual saliency maps for object recognition. Visual saliency represents an efficient way to drive the scene analysis towards particular areas considered 'of interest' for a viewer and an efficient alternative to computationally intensive sliding window methods for object recognition. Using saliency maps, we consider biologically inspired independent paths of central and peripheral vision and apply them to fundamental steps of the so-called Bag-of-Words (BoW) paradigm, such as feature sampling, pooling and encoding. Our proposal has been evaluated on the challenging task of active object recognition, and the results show that our method not only improves on the baselines, but also achieves state-of-the-art performance on various datasets at very competitive computational times.

Highlights

• Perceptual model that incorporates visual attention into the problem of active object recognition.
• Modeling of foveal and peripheral pathways in the retina.
• Saliency-based non-uniform feature sampling in a variable-resolution space.
• Saliency-sensitive coding of features.
• Saliency-based pooling of features.
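
As a rough illustration of how a saliency map can enter the sampling and pooling steps of a BoW pipeline, the following minimal Python sketch may help. It is not the method of the paper: the function names (`saliency_sample`, `hard_assign`, `saliency_weighted_histogram`), the hard nearest-codeword assignment, and the simple weighted-sum pooling are all assumptions chosen for brevity, and the paper's foveal/peripheral modeling and saliency-sensitive coding are not reproduced here.

```python
import numpy as np


def saliency_sample(saliency, n_points, rng):
    """Draw (row, col) locations with probability proportional to saliency.

    Hypothetical helper: a simple stand-in for non-uniform feature sampling.
    """
    p = saliency.ravel().astype(float)
    p = p / p.sum()  # normalise the map into a probability distribution
    idx = rng.choice(p.size, size=n_points, replace=True, p=p)
    rows, cols = np.unravel_index(idx, saliency.shape)
    return np.stack([rows, cols], axis=1)


def hard_assign(descriptors, codebook):
    """Return the index of the nearest codeword for each descriptor."""
    diff = descriptors[:, None, :] - codebook[None, :, :]
    sq_dist = (diff ** 2).sum(axis=2)
    return sq_dist.argmin(axis=1)


def saliency_weighted_histogram(words, weights, n_words):
    """Pool visual words into a histogram, weighting each by its saliency."""
    hist = np.bincount(words, weights=weights, minlength=n_words)
    total = hist.sum()
    return hist / total if total > 0 else hist


if __name__ == "__main__":
    rng = np.random.default_rng(0)

    # Toy saliency map: one bright blob in an otherwise dark image.
    saliency = np.full((32, 32), 0.01)
    saliency[10:20, 10:20] = 1.0

    points = saliency_sample(saliency, n_points=200, rng=rng)

    # Placeholder descriptors and codebook (random, for illustration only;
    # a real pipeline would compute e.g. SIFT or SURF at the sampled points).
    descriptors = rng.normal(size=(len(points), 16))
    codebook = rng.normal(size=(8, 16))

    words = hard_assign(descriptors, codebook)
    weights = saliency[points[:, 0], points[:, 1]]
    signature = saliency_weighted_histogram(words, weights, n_words=len(codebook))
    print(signature)
```

In this sketch, features falling in salient regions are both sampled more densely and counted more heavily in the final image signature, which is the general intuition behind saliency-driven sampling and pooling.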

    Information

    Published In

    Pattern Recognition, Volume 56, Issue C
    August 2016
    184 pages

    Publisher

    Elsevier Science Inc.

    United States

    Author Tags

    1. Active object recognition
    2. Foveal and peripheral pathways
    3. Perceptual modeling
    4. Visual saliency

    Qualifiers

    • Research-article

    Cited By

    • (2024) ReViT, Pattern Recognition, 156:C, DOI: 10.1016/j.patcog.2024.110853. Online publication date: 18-Nov-2024
    • (2022) Visual vs internal attention mechanisms in deep neural networks for image classification and object detection, Pattern Recognition, 123:C, DOI: 10.1016/j.patcog.2021.108411. Online publication date: 1-Mar-2022
    • (2020) Rapid Autonomous Semantic Mapping, 2020 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pp. 6156-6163, DOI: 10.1109/IROS45743.2020.9341564. Online publication date: 24-Oct-2020
    • (2019) Saliency-based selection of visual content for deep convolutional neural networks, Multimedia Tools and Applications, 78:8, pp. 9553-9576, DOI: 10.1007/s11042-018-6515-2. Online publication date: 1-Apr-2019
    • (2018) First-Person Daily Activity Recognition With Manipulated Object Proposals and Non-Linear Feature Fusion, IEEE Transactions on Circuits and Systems for Video Technology, 28:10, pp. 2946-2955, DOI: 10.1109/TCSVT.2017.2716819. Online publication date: 1-Oct-2018
    • (2017) Connoisseur, Proceedings of the 15th International Workshop on Content-Based Multimedia Indexing, pp. 1-7, DOI: 10.1145/3095713.3095730. Online publication date: 19-Jun-2017
    • (2017) Individual trait oriented scanpath prediction for visual attention analysis, 2017 IEEE International Conference on Image Processing (ICIP), pp. 3745-3749, DOI: 10.1109/ICIP.2017.8296982. Online publication date: 17-Sep-2017
    • (2017) TextProposals, Pattern Recognition, 70:C, pp. 60-74, DOI: 10.1016/j.patcog.2017.04.027. Online publication date: 1-Oct-2017