Abstract
Matching the skills of natural vision is an appealing prospect for artificial vision systems, especially in robotics applications dealing with visual perception of the complex surrounding environment in which robots and humans mutually evolve and/or cooperate, and more generally in those involving human–robot interaction. Addressing the visual attention problem through the human eye-fixation paradigm, in this work we propose a model of artificial visual attention that combines a statistical foundation of visual saliency with a genetic tuning of the related parameters for robots’ visual perception. Computationally, our model relies on the one hand on center-surround statistical features with a nonlinear fusion of the resulting maps, and on the other hand on an evolutionary tuning toward the human way of gazing, leading to the emergence of a kind of artificial eye-fixation-based visual attention. The statistical foundation and bottom-up nature of the proposed model also allow it to be used without prior information while resting on a comprehensive and solid theoretical basis. The eye-fixation paradigm is taken as the keystone of the human-like gazing attribute, molding the robot’s visual behavior toward the human one. The same paradigm, through the MIT1003 and TORONTO image datasets, serves as the evaluation benchmark for the experimental validation of the proposed system. The reported experimental results show the viability of the incorporated genetic tuning process in steering the behavior of the artificial system toward the human-like gazing mechanism. In addition, a comparison with the currently best algorithms in this field is reported on the MIT300 dataset. Although not designed for the eye-fixation prediction task, the proposed system remains comparable to most algorithms within this leading group of state-of-the-art methods. Moreover, running about ten times faster than the currently best state-of-the-art algorithms, the proposed approach offers an execution speed that makes it suitable for an effective implementation meeting the requirements of real-time robotics applications.
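To make the genetic tuning step concrete, the following is a minimal sketch of the kind of evolutionary loop the abstract refers to: a population of fusion-weight vectors is evolved so that the fused saliency map agrees with a human fixation density map. The names, the Pearson-correlation fitness and the selection/mutation scheme are illustrative assumptions, not the exact formulation used in the paper.

```python
# Minimal sketch of an evolutionary tuning loop for saliency fusion weights.
# Illustrative only: the Pearson-correlation fitness and the GA operators are
# assumptions, not the authors' exact design.
import numpy as np

rng = np.random.default_rng(0)

def fuse(feature_maps, weights):
    """Weighted combination of pre-computed feature maps (illustrative fusion)."""
    s = sum(w * m for w, m in zip(weights, feature_maps))
    return (s - s.min()) / (s.max() - s.min() + 1e-9)

def fitness(weights, feature_maps, fixation_map):
    """Agreement between the fused saliency map and a human fixation density map."""
    s = fuse(feature_maps, weights).ravel()
    return np.corrcoef(s, fixation_map.ravel())[0, 1]

def tune(feature_maps, fixation_map, pop_size=30, generations=50, sigma=0.1):
    n = len(feature_maps)
    pop = rng.random((pop_size, n))                           # random initial weights
    for _ in range(generations):
        scores = np.array([fitness(w, feature_maps, fixation_map) for w in pop])
        parents = pop[np.argsort(scores)[-(pop_size // 2):]]  # keep the best half
        children = parents[rng.integers(len(parents), size=pop_size - len(parents))]
        children = np.abs(children + rng.normal(0.0, sigma, children.shape))  # mutate
        pop = np.vstack([parents, children])
    scores = np.array([fitness(w, feature_maps, fixation_map) for w in pop])
    return pop[scores.argmax()]

# Toy usage with random arrays standing in for real feature and fixation maps.
maps = [rng.random((60, 80)) for _ in range(3)]
fix = rng.random((60, 80))
best_weights = tune(maps, fix)
```

In the paper’s setting, the feature maps would be the statistical saliency maps described in the appendices and the fixation maps would come from the MIT1003/TORONTO ground truth.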
References
Achanta R, Estrada F, Wils P, Susstrunk S (2008) Salient region detection and segmentation. In: Proceedings of international conference on computer vision systems, vol 5008, LNCS. Springer, Berlin/Heidelberg, pp 66–75
Achanta R, Hemami S, Estrada F, Susstrunk S (2009) Frequency-tuned salient region detection. In: Proceedings of IEEE international conference on computer vision and pattern recognition, pp 1597–1604
Borji A, Tavakoli HR, Sihite DN, Itti L (2013a) Analysis of scores, datasets, and models in visual saliency prediction. In: Proceedings of IEEE ICCV, pp 921–928
Borji A, Itti L (2013) State-of-the-art in visual attention modeling. IEEE Trans Pattern Anal Mach Intell 35(1):185–207
Borji A, Sihite DN, Itti L (2013) Quantitative analysis of human-model agreement in visual saliency modeling: a comparative study. IEEE Trans Image Process 22(1):55–69
Bruce NDB, Tsotsos JK (2009) Saliency, attention, and visual search: an information theoretic approach. J Vis 9(3):1–24
Chella A, Macaluso I (2009) The perception loop in CiceRobot, a museum guide robot. Neurocomputing 72(4–6):760–766
Contreras-Reyes JE, Arellano-Valle RB (2012) Kullback–Leibler divergence measure for multivariate skew-normal distributions. Entropy 14(9):1606–1626
Cornia M, Baraldi L, Serra G, Cucchiara R (2016a) Predicting human eye fixations via an LSTM-based saliency attentive model. CoRR. arXiv:1611.09571
Cornia M, Baraldi L, Serra G, Cucchiara R (2016b) A deep multi-level network for saliency prediction. In: Proceedings of international conference on pattern recognition (ICPR)
Fawcett T (2006) An introduction to ROC analysis. Pattern Recognit Lett 27:861–874
Harel J, Koch C, Perona P (2007) Graph-based visual saliency. Adv Neural Inf Process Syst 19:545–552
Hayhoe M, Ballard D (2005) Eye movements in natural behavior. Trends Cogn Sci 9:188–194
Holzbach A, Cheng G (2014) A scalable and efficient method for salient region detection using sampled template collation. In: Proceedings of IEEE ICIP, pp 1110–1114
Itti L, Koch C, Niebur E (1998) A model of saliency-based visual attention for rapid scene analysis. IEEE Trans Pattern Anal Mach Intell 20:1254–1259
Jiang M, Xu J, Zhao Q (2014) Saliency in crowd. In: Proceedings of ECCV, Lecture Notes in Computer Science, vol 8695, pp 17–32
Judd T, Ehinger K, Durand F, Torralba A (2009) Learning to predict where humans look. In: Proceedings of IEEE ICCV, pp 2106–2113
Judd T, Durand F, Torralba A (2012) A benchmark of computational models of saliency to predict human fixations. MIT Technical Report, http://saliency.mit.edu/
Kachurka V, Madani K, Sabourin C, Golovko V (2014) A statistical approach to human-like visual attention and saliency detection for robot vision: application to wildland fires detection. In: Proceedings of ICNNAI 2014, Brest, Byelorussia, June 3–6, CCIS series, vol 440. Springer, pp 124–135
Kachurka V, Madani K, Sabourin C, Golovko V (2015) From human eye fixation to human-like autonomous artificial vision. In: Proceedings of the international work-conference on artificial neural networks (IWANN 2015), LNCS series, vol 9094, Part I. Springer, pp 171–184
Kadir T, Brady M (2001) Saliency, scale and image description. Int J Comput Vis 45(2):83–105
Kienzle W, Franz MO, Schölkopf B, Wichmann FA (2009) Center-surround patterns emerge as optimal predictors for human saccade targets. J Vis 9:1–15
Koehler K, Guo F, Zhang S, Eckstein MP (2014) What do saliency models predict? J Vis 14(3):1–27
Kümmerer M, Wallis TS, Bethge M (2016) DeepGaze II: reading fixations from deep features trained on object recognition. arXiv:1610.01563
Kümmerer M, Theis L, Bethge M (2015) Deep Gaze I: boosting saliency prediction with feature maps trained on ImageNet. In: Proceedings of international conference on learning representations (ICLR)
Liang Z, Chi Z, Fu H, Feng D (2012) Salient object detection using content-sensitive hypergraph representation and partitioning. Pattern Recognit 45(11):3886–3901
Liu T, Sun J, Zheng NN, Shum HY (2007) Learning to detect a salient object. In: Proceedings of IEEE CVPR, pp 1–8
Liu T, Yuan Z, Sun J, Wang J, Zheng N, Tang X, Shum H-Y (2011) Learning to detect a salient object. IEEE Trans Pattern Anal Mach Intell 33(2):353–367
Moreno R, Ramik DM, Graña M, Madani K (2012) Image segmentation on the spherical coordinate representation of the RGB color space. IET Image Proc 6(9):1275–1283
Navalpakkam V, Itti L (2006) An integrated model of top-down and bottom-up attention for optimizing detection speed. In: Proceedings of IEEE CVPR, vol 2, pp 2049–2056
Pan J, Canton C, McGuinness K, O’Connor NE, Torres J (2017) SalGAN: visual saliency prediction with generative adversarial networks. In: Proceedings of scene understanding workshop (SUNw), CVPR 2017, July 21 to July 26, 2017, Honolulu, Hawaii, USA
Panerai F, Metta G, Sandini G (2002) Learning visual stabilization reflexes in robots with moving eyes. Neurocomputing 48(1–4):323–337
Peters RJ, Iyer A, Itti L, Koch C (2005) Components of bottom-up gaze allocation in natural images. Vis Res 45(18):2397–2416
Rajashekar U, van der Linde I, Bovik AC, Cormack LK (2008) GAFFE: a gaze-attentive fixation finding engine. IEEE Trans Image Process 17(4):564–573
Ramik DM (2012) Contribution to complex visual information processing and autonomous knowledge extraction: application to autonomous robotics. Ph.D. dissertation, University Paris-Est, Pub. No. 2012PEST1100
Ramik DM, Sabourin C, Madani K (2011) Hybrid salient object extraction approach with automatic estimation of visual attention scale. In: Proceedings of IEEE SITIS, pp 438–445
Ramik DM, Sabourin C, Moreno R, Madani K (2014) A machine learning based intelligent vision system for autonomous object detection and recognition. J Appl Intell 40(2):358–375
Ramik DM, Madani K, Sabourin C (2015) A soft-computing basis for robots’ cognitive autonomous learning. Soft Comput J 19:2407–2421
Riche N, Duvinage M, Mancas M, Gosselin B, Dutoit T (2013) Saliency and human fixations: state-of-the-art and study of comparison metrics. In: Proceedings of IEEE ICCV, pp 1153–1160
Riche N, Mancas M, Duvinage M, Mibulumukini M, Gosselin B, Dutoit T (2013) RARE2012: A multi-scale rarity-based saliency detection with its comparative statistical analysis. Signal Process Image Commun J 28(6):642–658
Shen Ch, Zhao Q (2014) Webpage saliency. In: Proceedings of ECCV, Lecture Notes in Computer Science, vol 8695, pp 34–46
Subramanian R, Katti H, Sebe N, Kankanhalli M, Chua T-S (2010) An eye fixation database for saliency detection in images. In: Proceedings of ECCV, pp 30–43
Tatler BW (2007) The central fixation bias in scene viewing: selecting an optimal viewing position independently of motor biases and image feature distributions. J Vis 7(14):1–17
Tavakoli HR, Laaksonen J (2016) Bottom-up fixation prediction using unsupervised hierarchical models. In: Proceedings of ACCV 2016, Workshop on Assistive Vision, LNCS 10116. Springer, pp 287–302
Triesch J, Ballard DH, Hayhoe MM, Sullivan BT (2003) What you see is what you need. J Vis 3:86–94
Vig E, Dorr M, Cox D (2014) Large-scale optimization of hierarchical features for saliency prediction in natural images. In: Proceedings of IEEE CVPR, pp 2798–2805
Võ ML-H, Smith TJ, Mital PK, Henderson JM (2012) Do the eyes really have it? Dynamic allocation of attention when viewing moving faces. J Vis 12(13):1–14
Zhang J, Sclaroff S (2013) Saliency detection: a Boolean map approach. In: Proceedings of IEEE ICCV, pp 153–160
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest.
Ethical standard
The authors declare that the presented work does not involve any direct human participants. They also declare that all standard benchmark databases used are in accordance with the ethical standards of the institutional and national research committee and with the 1964 Declaration of Helsinki and its later amendments or comparable ethical standards.
Additional information
Communicated by V. Loia.
Appendices
Appendix A-1
Let us suppose the image \(I_{YCC} \), represented by its pixels \(I_{YCC} \left( x \right) \), in the YCrCb color space, where \(x\in \mathrm{N}^{2}\) denotes the 2D pixel position. Let \(I_Y \left( x \right) \), \(I_{Cr} \left( x \right) \) and \(I_{Cb} \left( x \right) \) be the color values in channels Y, Cr and Cb, respectively. Similarly, let \(I_{RGB} \left( x \right) \) be the same image in the RGB color space and \(I_R \left( x \right) \), \(I_G \left( x \right) \) and \(I_B \left( x \right) \) its color values in channels R, G and B, respectively. Finally, let \(\overline{I_Y } \), \(\overline{I_{Cr} } \) and \(\overline{I_{Cb} } \) be the median values of each channel over the whole image.
As a reminder, we recall that the GSM results from a nonlinear fusion of two elementary maps, denoted \(M_Y \left( x \right) \) and \(M_{CrCb} \left( x \right) \), relating to luminance and chromaticity separately (Moreno et al. 2012; Ramik 2012). Equations (A-1), (A-2) and (A-3) detail the calculation of each elementary map as well as the resulting GSM.
\(C\left( x \right) \) is a coefficient computed according to Eq. (A-4); it depends on the saturation of each pixel in the RGB color space, expressed by Eq. (A-5).
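Since Eqs. (A-1)–(A-5) are not reproduced here in closed form, the following sketch only illustrates the kind of computation they describe: elementary maps measuring each pixel’s deviation from the channel medians, a saturation coefficient \(C\left( x \right) \) obtained from the RGB representation, and a saturation-weighted fusion. The deviation-from-median form and the weighting used for the fusion are assumptions made for illustration; OpenCV is assumed for the color-space conversion.

```python
# Hedged sketch of the global saliency map (GSM) pipeline described above.
# The median-deviation elementary maps and the saturation-weighted fusion are
# illustrative stand-ins for Eqs. (A-1)-(A-5), not the exact formulas.
import numpy as np
import cv2  # OpenCV, assumed available for the RGB -> YCrCb conversion

def global_saliency_map(img_rgb):
    ycc = cv2.cvtColor(img_rgb, cv2.COLOR_RGB2YCrCb).astype(np.float32)
    Y, Cr, Cb = ycc[..., 0], ycc[..., 1], ycc[..., 2]

    # Elementary maps: deviation of each pixel from the per-channel median.
    M_Y = np.abs(Y - np.median(Y))
    M_CrCb = np.hypot(Cr - np.median(Cr), Cb - np.median(Cb))

    # Saturation coefficient C(x) computed from the RGB representation.
    rgb = img_rgb.astype(np.float32) / 255.0
    mx, mn = rgb.max(axis=2), rgb.min(axis=2)
    C = np.where(mx > 0, (mx - mn) / (mx + 1e-9), 0.0)

    # Nonlinear fusion: the chromatic map dominates for saturated pixels and the
    # luminance map elsewhere (one plausible reading of the text, not Eq. (A-3)).
    gsm = C * M_CrCb + (1.0 - C) * M_Y
    return (gsm - gsm.min()) / (gsm.max() - gsm.min() + 1e-9)
```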
In the same way, based on the center-surround histogram concept, which involves the statistical properties of two windows centered over each pixel and sliding across the whole image (initially proposed in Liu et al. 2007), the LSM is also obtained from a nonlinear fusion of two statistics-based maps relating to the luminance and chromaticity of the image (Moreno et al. 2012; Ramik 2012). Referring to Fig. 12, let \(P\left( x \right) \) be a sliding window of size P centered over the pixel located at position x, and let \(Q\left( x \right) \) be a surrounding area of size Q around the same pixel, with \(Q-P=p^{2}\). Let us consider the “center histogram” \(H_C \left( x \right) \) as the histogram of pixel intensities in the window \(P\left( x \right) \), with \(h_C \left( {x , k} \right) \) denoting the value of its kth bin. Similarly, let the “surround histogram” \(H_S \left( x \right) \) be the histogram of pixel intensities in the surrounding window \(Q\left( x \right) \), with \(h_S \left( {x , k} \right) \) denoting its kth bin’s value. Finally, let us define the “center-surround feature” \(d_{ch} \left( x \right) \) as the sum of the differences between the normalized center and surround histograms over all 256 histogram bins, computed for each channel ch (\(ch\in \left\{ {Y , Cr , Cb} \right\} )\).
Equation (A-6) details the computation of this center-surround feature, where \(\left| {H_C \left( x \right) } \right| \) and \(\left| {H_S \left( x \right) } \right| \) denote the sums over all histogram bins. For pixels near the image borders, the areas \(P\left( x \right) \) and \(Q\left( x \right) \) are clipped to the image itself (i.e., they do not extend beyond the image borders).
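As an illustration of the prose definition above, a direct (unoptimized) per-pixel computation of the center-surround feature for one channel could look as follows. The absolute value of the bin-wise difference, and the fact that the surround histogram is taken over the whole window \(Q\left( x \right) \) rather than only the ring around \(P\left( x \right) \), are assumptions of this sketch rather than a reproduction of Eq. (A-6).

```python
# Sketch of the center-surround feature d_ch(x) for one channel, following the
# prose definition above. A practical implementation would slide the two windows
# with integral histograms instead of recomputing them at every pixel.
import numpy as np

def center_surround_feature(channel, x, y, p, q):
    """d_ch at pixel (y, x): P(x) is a p-by-p window and Q(x) a q-by-q surround
    (q > p); both windows are clipped to the image borders, as in the text."""
    h, w = channel.shape

    def window_hist(size):
        half = size // 2
        top, bottom = max(0, y - half), min(h, y + half + 1)
        left, right = max(0, x - half), min(w, x + half + 1)
        hist, _ = np.histogram(channel[top:bottom, left:right],
                               bins=256, range=(0, 256))
        return hist.astype(np.float64)

    h_c, h_s = window_hist(p), window_hist(q)
    # Normalize each histogram by its total mass |H_C(x)|, |H_S(x)| and sum the
    # bin-wise differences over the 256 bins.
    return np.abs(h_c / h_c.sum() - h_s / h_s.sum()).sum()
```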
The LSM, resulting from a nonlinear fusion of these center-surround features, is obtained according to Eq. (A-7), where the coefficient \(C_\mu \left( x \right) \) represents the average color saturation over the window \(P\left( x \right) \), computed as the average of the saturations \(C\left( x \right) \) (see Eq. (A-4)) using Eq. (A-8).
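A short sketch of how \(C_\mu \left( x \right) \) can be obtained as a box average of the saturation \(C\left( x \right) \), together with an illustrative stand-in for the fusion of Eq. (A-7), is given below; only the windowed average follows directly from the text, while the weighting itself is an assumption.

```python
# C_mu(x): window-averaged saturation, as described for Eq. (A-8); the fusion
# below is only an illustrative stand-in for Eq. (A-7), which is not reproduced
# in this excerpt.
import numpy as np
import cv2  # assumed available; boxFilter averages C(x) over the p-by-p window

def average_saturation(C, p):
    """Mean of the saturation coefficient C(x) over the window P(x)."""
    return cv2.boxFilter(C.astype(np.float32), ddepth=-1, ksize=(p, p))

def local_saliency_map(d_Y, d_CrCb, C_mu):
    """Assumed saturation-weighted mix of the luminance and chromatic
    center-surround feature maps (not the exact Eq. (A-7))."""
    lsm = C_mu * d_CrCb + (1.0 - C_mu) * d_Y
    return (lsm - lsm.min()) / (lsm.max() - lsm.min() + 1e-9)
```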
Appendix A-2
Table 9 summarizes the notations and symbols used in this article.
Cite this article
Madani, K., Kachurka, V., Sabourin, C. et al. A soft-computing-based approach to artificial visual attention using human eye-fixation paradigm: toward a human-like skill in robot vision. Soft Comput 23, 2369–2389 (2019). https://doi.org/10.1007/s00500-017-2931-x