Abstract
Matching the skills of natural vision is an appealing prospect for artificial vision systems, especially in robotics applications dealing with visual perception of the complex surrounding environment in which robots and humans mutually evolve and/or cooperate, and more generally in those involving human–robot interaction. Addressing the visual attention problem through the human eye-fixation paradigm, in this work we propose a model of artificial visual attention that combines a statistical foundation of visual saliency with a genetic tuning of the related parameters for robots’ visual perception. Computationally, our model relies on the one hand on center-surround statistical features with a nonlinear fusion of the resulting maps, and on the other hand on an evolutionary tuning toward the human way of gazing, leading to the emergence of a kind of artificial eye-fixation-based visual attention. The statistical foundation and bottom-up nature of the proposed model also allow it to be used without prior information while resting on a comprehensive and solid theoretical basis. The eye-fixation paradigm is taken as the keystone of the human-like gazing attribute, molding the robot’s visual behavior toward the human one. The same paradigm, through the MIT1003 and TORONTO image datasets, serves as the evaluation benchmark for the experimental validation of the proposed system. The reported experimental results show the viability of the incorporated genetic tuning process in steering the behavior of the artificial system toward the human-like gazing mechanism. In addition, a comparison with the currently best algorithms in this field is reported on the MIT300 dataset. Although not designed for the eye-fixation prediction task, the proposed system remains comparable to most algorithms within this leading group of state-of-the-art methods. Moreover, running about ten times faster than the currently best state-of-the-art algorithms, the proposed approach offers an execution speed that makes it suitable for an effective implementation meeting the requirements of real-time robotics applications.
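To make the genetic tuning step concrete, the following is a minimal sketch of the kind of evolutionary loop the abstract refers to: a population of fusion-weight vectors is evolved so that the fused saliency map agrees with a human fixation density map. The names, the Pearson-correlation fitness and the selection/mutation scheme are illustrative assumptions, not the exact formulation used in the paper.

```python
# Minimal sketch of an evolutionary tuning loop for saliency fusion weights.
# Illustrative only: the Pearson-correlation fitness and the GA operators are
# assumptions, not the authors' exact design.
import numpy as np

rng = np.random.default_rng(0)

def fuse(feature_maps, weights):
    """Weighted combination of pre-computed feature maps (illustrative fusion)."""
    s = sum(w * m for w, m in zip(weights, feature_maps))
    return (s - s.min()) / (s.max() - s.min() + 1e-9)

def fitness(weights, feature_maps, fixation_map):
    """Agreement between the fused saliency map and a human fixation density map."""
    s = fuse(feature_maps, weights).ravel()
    return np.corrcoef(s, fixation_map.ravel())[0, 1]

def tune(feature_maps, fixation_map, pop_size=30, generations=50, sigma=0.1):
    n = len(feature_maps)
    pop = rng.random((pop_size, n))                           # random initial weights
    for _ in range(generations):
        scores = np.array([fitness(w, feature_maps, fixation_map) for w in pop])
        parents = pop[np.argsort(scores)[-(pop_size // 2):]]  # keep the best half
        children = parents[rng.integers(len(parents), size=pop_size - len(parents))]
        children = np.abs(children + rng.normal(0.0, sigma, children.shape))  # mutate
        pop = np.vstack([parents, children])
    scores = np.array([fitness(w, feature_maps, fixation_map) for w in pop])
    return pop[scores.argmax()]

# Toy usage with random arrays standing in for real feature and fixation maps.
maps = [rng.random((60, 80)) for _ in range(3)]
fix = rng.random((60, 80))
best_weights = tune(maps, fix)
```

In the paper’s setting, the feature maps would be the statistical saliency maps described in the appendices and the fixation maps would come from the MIT1003/TORONTO ground truth.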
References
Achanta R, Estrada F, Wils P, Susstrunk S (2008) Salient region detection and segmentation. In: Proceedings of international conference on computer vision systems, vol 5008, LNCS. Springer, Berlin/Heidelberg, pp 66–75
Achanta R, Hemami S, Estrada F, Susstrunk S (2009) Frequency-tuned salient region detection. In: Proceedings of IEEE international conference on computer vision and pattern recognition, pp 1597–1604
Borji A, Tavakoli HR, Sihite DN, Itti L (2013a) Analysis of scores, datasets, and models in visual saliency prediction. In: Proceedings of IEEE ICCV, pp 921–928
Borji A, Itti L (2013) State-of-the-art in visual attention modeling. IEEE Trans Pattern Anal Mach Intell 35(1):185–207
Borji A, Sihite DN, Itti L (2013) Quantitative analysis of human-model agreement in visual saliency modeling: a comparative study. IEEE Trans Image Process 22(1):55–69
Bruce NDB, Tsotsos JK (2009) Saliency, attention, and visual search: an information theoretic approach. J Vis 9(3):1–24
Chella A, Macaluso I (2009) The perception loop in CiceRobot, a museum guide robot. Neurocomputing 72(4–6):760–766
Contreras-Reyes JE, Arellano-Valle RB (2012) Kullback–Leibler divergence measure for multivariate skew-normal distributions. Entropy 14(9):1606–1626
Cornia M, Baraldi L, Serra G, Cucchiara R (2016a) Predicting human eye fixations via an LSTM-based saliency attentive model. CoRR. arXiv:1611.09571
Cornia M, Baraldi L, Serra G, Cucchiara R (2016b) A deep multi-level network for saliency prediction. In: Proceedings of international conference on pattern recognition (ICPR)
Fawcett T (2006) An introduction to ROC analysis. Pattern Recognit Lett 27:861–874
Harel J, Koch C, Perona P (2007) Graph-based visual saliency. Adv Neural Inf Process Syst 19:545–552
Hayhoe M, Ballard D (2005) Eye movements in natural behavior. Trends Cogn Sci 9:188–194
Holzbach A, Cheng G (2014) A scalable and efficient method for salient region detection using sampled template collation. In: Proceedings of IEEE ICIP, pp 1110–1114
Itti L, Koch C, Niebur E (1998) A model of saliency-based visual attention for rapid scene analysis. IEEE Trans Pattern Anal Mach Intell 20:1254–1259
Jiang M, Xu J, Zhao Q (2014) Saliency in crowd. In: Proceedings of ECCV, Lecture Notes in Computer Science, vol 8695, pp 17–32
Judd T, Ehinger K, Durand F, Torralba A (2009) Learning to predict where humans look. In: Proceedings of IEEE ICCV, pp 2106–2113
Judd T, Durand F, Torralba A (2012) A benchmark of computational models of saliency to predict human fixations. MIT Technical Report, http://saliency.mit.edu/
Kachurka V, Madani K, Sabourin C, Golovko V (2014) A statistical approach to human-like visual attention and saliency detection for robot vision: application to wildland fires detection. In: Proceedings of ICNNAI 2014, Brest, Byelorussia, June 3–6, CCIS series, vol 440. Springer, pp 124–135
Kachurka V, Madani K, Sabourin C, Golovko V (2015) From human eye fixation to human-like autonomous artificial vision. In: Proceedings of the international work-conference on artificial neural networks (IWANN 2015), LNCS series, vol 9094, Part I. Springer, pp 171–184
Kadir T, Brady M (2001) Saliency, scale and image description. Int J Comput Vis 45(2):83–105
Kienzle W, Franz MO, Schölkopf B, Wichmann FA (2009) Center-surround patterns emerge as optimal predictors for human saccade targets. J Vis 9:1–15
Koehler K, Guo F, Zhang S, Eckstein MP (2014) What do saliency models predict? J Vis 14(3):1–27
Kümmerer M, Wallis TS, Bethge M (2016) DeepGaze II: reading fixations from deep features trained on object recognition. arXiv:1610.01563
Kümmerer M, Theis L, Bethge M (2015) Deep Gaze I: boosting saliency prediction with feature maps trained on ImageNet. In: Proceedings of international conference on learning representations (ICLR)
Liang Z, Chi Z, Fu H, Feng D (2012) Salient object detection using content-sensitive hypergraph representation and partitioning. Pattern Recognit 45(11):3886–3901
Liu T, Sun J, Zheng NN, Shum HY (2007) Learning to detect a salient object. In: Proceedings of IEEE CVPR, pp 1–8
Liu T, Yuan Z, Sun J, Wang J, Zheng N, Tang X, Shum H-Y (2011) Learning to detect a salient object. IEEE Trans Pattern Anal Mach Intell 33(2):353–367
Moreno R, Ramik DM, Graña M, Madani K (2012) Image segmentation on the spherical coordinate representation of the RGB color space. IET Image Proc 6(9):1275–1283
Navalpakkam V, Itti L (2006) An integrated model of top-down and bottom-up attention for optimizing detection speed. In: Proceedings of IEEE CVPR, vol 2, pp 2049–2056
Pan J, Canton C, McGuinness K, O’Connor NE, Torres J (2017) SalGAN: visual saliency prediction with generative adversarial networks. In: Proceedings of scene understanding workshop (SUNw), CVPR 2017, July 21 to July 26, 2017, Honolulu, Hawaii, USA
Panerai F, Metta G, Sandini G (2002) Learning visual stabilization reflexes in robots with moving eyes. Neurocomputing 48(1–4):323–337
Peters RJ, Iyer A, Itti L, Koch C (2005) Components of bottom-up gaze allocation in natural images. Vis Res 45(18):2397–2416
Rajashekar U, van der Linde I, Bovik AC, Cormack LK (2008) GAFFE: a gaze-attentive fixation finding engine. IEEE Trans Image Process 17(4):564–573
Ramik DM (2012) Contribution to complex visual information processing and autonomous knowledge extraction: application to autonomous robotics. Ph.D. dissertation, University Paris-Est, Pub. No. 2012PEST1100
Ramik DM, Sabourin C, Madani K (2011) Hybrid salient object extraction approach with automatic estimation of visual attention scale. In: Proceedings of IEEE SITIS, pp 438–445
Ramik DM, Sabourin C, Moreno R, Madani K (2014) A machine learning based intelligent vision system for autonomous object detection and recognition. J Appl Intell 40(2):358–375
Ramik DM, Madani K, Sabourin C (2015) A soft-computing basis for robots’ cognitive autonomous learning. Soft Comput J 19:2407–2421
Riche N, Duvinage M, Mancas M, Gosselin B, Dutoit T (2013) Saliency and human fixations: state-of-the-art and study of comparison metrics. In: Proceedings of IEEE ICCV, pp 1153–1160
Riche N, Mancas M, Duvinage M, Mibulumukini M, Gosselin B, Dutoit T (2013) RARE2012: A multi-scale rarity-based saliency detection with its comparative statistical analysis. Signal Process Image Commun J 28(6):642–658
Shen Ch, Zhao Q (2014) Webpage saliency. In: Proceedings of ECCV, Lecture Notes in Computer Science, vol 8695, pp 34–46
Subramanian R, Katti H, Sebe N, Kankanhalli M, Chua T-S (2010) An eye fixation database for saliency detection in images. In: Proceedings of ECCV, pp 30–43
Tatler BW (2007) The central fixation bias in scene viewing: selecting an optimal viewing position independently of motor biases and image feature distributions. J Vis 7(14):1–17
Tavakoli HR, Laaksonen J (2016) Bottom-up fixation prediction using unsupervised hierarchical models. In: Proceedings of ACCV 2016, Workshop on Assistive Vision, LNCS 10116. Springer, pp 287–302
Triesch J, Ballard DH, Hayhoe MM, Sullivan BT (2003) What you see is what you need. J Vis 3:86–94
Vig E, Dorr M, Cox D (2014) Large-scale optimization of hierarchical features for saliency prediction in natural images. In: Proceedings of IEEE CVPR, pp 2798–2805
Võ ML-H, Smith TJ, Mital PK, Henderson JM (2012) Do the eyes really have it? Dynamic allocation of attention when viewing moving faces. J Vis 12(13):1–14
Zhang J, Sclaroff S (2013) Saliency detection: a Boolean map approach. In: Proceedings of IEEE ICCV, pp 153–160
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest.
Ethical standard
The authors declare that the presented work does not involve any direct human participants. They also declare that all standard benchmark databases used are in accordance with the ethical standards of the institutional and national research committee and with the 1964 Declaration of Helsinki and its later amendments or comparable ethical standards.
Additional information
Communicated by V. Loia.
Appendices
Appendix A-1
Let us suppose the image \(I_{YCC} \), represented by its pixels \(I_{YCC} \left( x \right) \), in the YCrCb color space, where \(x\in \mathrm{N}^{2}\) denotes the 2D pixel position. Let \(I_Y \left( x \right) \), \(I_{Cr} \left( x \right) \) and \(I_{Cb} \left( x \right) \) be the color values in channels Y, Cr and Cb, respectively. Similarly, let \(I_{RGB} \left( x \right) \) be the same image in the RGB color space and \(I_R \left( x \right) \), \(I_G \left( x \right) \) and \(I_B \left( x \right) \) its color values in channels R, G and B, respectively. Finally, let \(\overline{I_Y } \), \(\overline{I_{Cr} } \) and \(\overline{I_{Cb} } \) be the median values of each channel over the whole image.
As a reminder, we recall that the GSM results from a nonlinear fusion of two elementary maps, denoted \(M_Y \left( x \right) \) and \(M_{CrCb} \left( x \right) \), relating to luminance and chromaticity separately (Moreno et al. 2012; Ramik 2012). Equations (A-1), (A-2) and (A-3) detail the calculation of each elementary map as well as the resulting GSM.
\(C\left( x \right) \) is a coefficient computed according to Eq. (A-4); it depends on the saturation of each pixel in the RGB color space, expressed by Eq. (A-5).
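Since Eqs. (A-1)–(A-5) are not reproduced here in closed form, the following sketch only illustrates the kind of computation they describe: elementary maps measuring each pixel’s deviation from the channel medians, a saturation coefficient \(C\left( x \right) \) obtained from the RGB representation, and a saturation-weighted fusion. The deviation-from-median form and the weighting used for the fusion are assumptions made for illustration; OpenCV is assumed for the color-space conversion.

```python
# Hedged sketch of the global saliency map (GSM) pipeline described above.
# The median-deviation elementary maps and the saturation-weighted fusion are
# illustrative stand-ins for Eqs. (A-1)-(A-5), not the exact formulas.
import numpy as np
import cv2  # OpenCV, assumed available for the RGB -> YCrCb conversion

def global_saliency_map(img_rgb):
    ycc = cv2.cvtColor(img_rgb, cv2.COLOR_RGB2YCrCb).astype(np.float32)
    Y, Cr, Cb = ycc[..., 0], ycc[..., 1], ycc[..., 2]

    # Elementary maps: deviation of each pixel from the per-channel median.
    M_Y = np.abs(Y - np.median(Y))
    M_CrCb = np.hypot(Cr - np.median(Cr), Cb - np.median(Cb))

    # Saturation coefficient C(x) computed from the RGB representation.
    rgb = img_rgb.astype(np.float32) / 255.0
    mx, mn = rgb.max(axis=2), rgb.min(axis=2)
    C = np.where(mx > 0, (mx - mn) / (mx + 1e-9), 0.0)

    # Nonlinear fusion: the chromatic map dominates for saturated pixels and the
    # luminance map elsewhere (one plausible reading of the text, not Eq. (A-3)).
    gsm = C * M_CrCb + (1.0 - C) * M_Y
    return (gsm - gsm.min()) / (gsm.max() - gsm.min() + 1e-9)
```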
In the same way, based on the center-surround histogram concept, which involves the statistical properties of two windows centered over each pixel and sliding across the whole image (initially proposed in Liu et al. 2007), the LSM is also obtained from a nonlinear fusion of two statistics-based maps relating to the luminance and chromaticity of the image (Moreno et al. 2012; Ramik 2012). Referring to Fig. 12, let \(P\left( x \right) \) be a sliding window of size P centered over the pixel located at position x, and let \(Q\left( x \right) \) be a surrounding area of size Q around the same pixel, with \(Q-P=p^{2}\). Let us consider the “center histogram” \(H_C \left( x \right) \) as the histogram of pixel intensities in the window \(P\left( x \right) \), with \(h_C \left( {x , k} \right) \) denoting the value of its kth bin. Similarly, let the “surround histogram” \(H_S \left( x \right) \) be the histogram of pixel intensities in the surrounding window \(Q\left( x \right) \), with \(h_S \left( {x , k} \right) \) denoting its kth bin’s value. Finally, let us define the “center-surround feature” \(d_{ch} \left( x \right) \) as the sum of the differences between the normalized center and surround histograms over all 256 histogram bins, computed for each channel ch (\(ch\in \left\{ {Y , Cr , Cb} \right\} )\).
Equation (A-6) details the computation of this center-surround feature, where \(\left| {H_C \left( x \right) } \right| \) and \(\left| {H_S \left( x \right) } \right| \) denote the sums over all histogram bins. For pixels near the image borders, the areas \(P\left( x \right) \) and \(Q\left( x \right) \) are clipped to the image itself (i.e., they do not extend beyond the image borders).
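As an illustration of the prose definition above, a direct (unoptimized) per-pixel computation of the center-surround feature for one channel could look as follows. The absolute value of the bin-wise difference, and the fact that the surround histogram is taken over the whole window \(Q\left( x \right) \) rather than only the ring around \(P\left( x \right) \), are assumptions of this sketch rather than a reproduction of Eq. (A-6).

```python
# Sketch of the center-surround feature d_ch(x) for one channel, following the
# prose definition above. A practical implementation would slide the two windows
# with integral histograms instead of recomputing them at every pixel.
import numpy as np

def center_surround_feature(channel, x, y, p, q):
    """d_ch at pixel (y, x): P(x) is a p-by-p window and Q(x) a q-by-q surround
    (q > p); both windows are clipped to the image borders, as in the text."""
    h, w = channel.shape

    def window_hist(size):
        half = size // 2
        top, bottom = max(0, y - half), min(h, y + half + 1)
        left, right = max(0, x - half), min(w, x + half + 1)
        hist, _ = np.histogram(channel[top:bottom, left:right],
                               bins=256, range=(0, 256))
        return hist.astype(np.float64)

    h_c, h_s = window_hist(p), window_hist(q)
    # Normalize each histogram by its total mass |H_C(x)|, |H_S(x)| and sum the
    # bin-wise differences over the 256 bins.
    return np.abs(h_c / h_c.sum() - h_s / h_s.sum()).sum()
```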
The LSM, resulting from a nonlinear fusion of these center-surround features, is obtained according to Eq. (A-7), where the coefficient \(C_\mu \left( x \right) \) represents the average color saturation over the window \(P\left( x \right) \), computed as the average of the saturations \(C\left( x \right) \) (see Eq. (A-4)) using Eq. (A-8).
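A short sketch of how \(C_\mu \left( x \right) \) can be obtained as a box average of the saturation \(C\left( x \right) \), together with an illustrative stand-in for the fusion of Eq. (A-7), is given below; only the windowed average follows directly from the text, while the weighting itself is an assumption.

```python
# C_mu(x): window-averaged saturation, as described for Eq. (A-8); the fusion
# below is only an illustrative stand-in for Eq. (A-7), which is not reproduced
# in this excerpt.
import numpy as np
import cv2  # assumed available; boxFilter averages C(x) over the p-by-p window

def average_saturation(C, p):
    """Mean of the saturation coefficient C(x) over the window P(x)."""
    return cv2.boxFilter(C.astype(np.float32), ddepth=-1, ksize=(p, p))

def local_saliency_map(d_Y, d_CrCb, C_mu):
    """Assumed saturation-weighted mix of the luminance and chromatic
    center-surround feature maps (not the exact Eq. (A-7))."""
    lsm = C_mu * d_CrCb + (1.0 - C_mu) * d_Y
    return (lsm - lsm.min()) / (lsm.max() - lsm.min() + 1e-9)
```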
Appendix A-2
Table 9 summarizes the notations and symbols used in this article.
Cite this article
Madani, K., Kachurka, V., Sabourin, C. et al. A soft-computing-based approach to artificial visual attention using human eye-fixation paradigm: toward a human-like skill in robot vision. Soft Comput 23, 2369–2389 (2019). https://doi.org/10.1007/s00500-017-2931-x