Application-Oriented Retinal Image Models for Computer Vision
"> Figure 1
<p>The proposed framework to generate application-oriented retinal image models (ARIMs). The workflow begins by defining the application’s requirements regarding operation (e.g., objects’ positioning, illumination) and efficiency (e.g., storage, accuracy). Then, a proper implicit function (e.g., <math display="inline"><semantics> <msub> <mi>l</mi> <mn>2</mn> </msub> </semantics></math>) and the spatial configuration of the retinal image model—comprising foveal and peripheral regions—are chosen. The next step is the generation of the model by means of an optimization procedure that considers the implicit function and the spatial configuration to resample points in the 2D cartesian space. The final artifact is an ARIM comprised by uniformly- and non-uniformly-sampled foveal and peripheral regions, respectively. This model is used to resample uniform images, taking them to a space-variant domain and potentially contemplating the requirements determined beforehand.</p> "> Figure 2
<p>Examples of implicit functions: (<b>a</b>) <math display="inline"><semantics> <msub> <mi>l</mi> <mn>1</mn> </msub> </semantics></math>, (<b>b</b>) <math display="inline"><semantics> <msub> <mi>l</mi> <mn>2</mn> </msub> </semantics></math>, and (<b>c</b>) <math display="inline"><semantics> <msub> <mi>l</mi> <mo>∞</mo> </msub> </semantics></math>.</p> "> Figure 3
<p>Convergence analysis for ARIMs with a single fovea and based on the <math display="inline"><semantics> <msub> <mi>l</mi> <mo>∞</mo> </msub> </semantics></math> implicit function. Examples of ARIMs containing (<b>a</b>) 96, (<b>b</b>) 192, and (<b>c</b>) 384 non-uniform points in the periphery. For each ARIM, the resulting global energy curve over 1000 iterations of the generation process is shown in the model’s respective column.</p> "> Figure 4
<p>Convergence analysis for ARIMs with no fovea and based on the <math display="inline"><semantics> <msub> <mi>l</mi> <mn>2</mn> </msub> </semantics></math> implicit function. Examples of ARIMs containing (<b>a</b>) 96, (<b>b</b>) 192, and (<b>c</b>) 384 non-uniform points in the periphery. For each ARIM, the resulting global energy curve over 2000 iterations of the generation process is shown in the model’s respective column.</p> "> Figure 5
<p>The evolution of an example of ARIM with 256 foveal (uniform), and 192 peripheral (non-uniform) pixels. The <math display="inline"><semantics> <msub> <mi>l</mi> <mo>∞</mo> </msub> </semantics></math> is the implicit function.</p> "> Figure 6
<p>Example of a simulation using one of our ARIMs and a sample sequence from the employed dataset [<a href="#B19-sensors-20-03746" class="html-bibr">19</a>]. Images (<b>a</b>–<b>e</b>) and (<b>k</b>–<b>o</b>) are the original frames; images (<b>f</b>–<b>j</b>) and (<b>p</b>–<b>t</b>) are the reconstructions with a model that considers an optical flow peripheral representation. Green and yellow arrows indicate motion direction to the right and left sides, respectively, whereas the ON and OFF labels refer to the operational status of the foveal (face detection/recognition) and peripheral (optical flow) regions. Note that the motion analysis, besides triggering foveal analysis, is also able to restart conveniently, as long as faces are not detected in the fovea during a time interval of frames (left-most frame in the fourth row).</p> "> Figure 7
<p>Implemented workflow for simulating the use of ARIMs in a specific Computer Vision (CV) application. In an ideal scenario, the ARIM, a captured image frame, and the chosen pixel representations for foveal and periphery areas are input to an hypothetical specific-purpose sensor that changes its configuration at run-time. Such a sensor would yield a stream (bytestream) of pixel data from each region of the captured image. The stream (not the 2D image) would be forwarded to the CV application. However, for simulation purposes, this architecture is fully implemented by software.</p> "> Figure 8
<p>The pixel map of the evaluated ARIM and its configurations. The experimented foveal configurations comprised three uniform sampling setups: <math display="inline"><semantics> <mrow> <mn>100</mn> <mo>×</mo> <mn>100</mn> </mrow> </semantics></math> (half density), <math display="inline"><semantics> <mrow> <mn>150</mn> <mo>×</mo> <mn>150</mn> </mrow> </semantics></math>, and <math display="inline"><semantics> <mrow> <mn>200</mn> <mo>×</mo> <mn>200</mn> </mrow> </semantics></math> (full density) pixels. The pixel representations for the fovea and periphery were based on the grayscale and optical flow (magnitude and direction) values, respectively.</p> "> Figure 9
<p>ROC curves regarding the face recognition task considering the (<b>a</b>) P1E_S2 and (<b>b</b>) P1L_S2 image sequences from all cameras. The figures comprise a mean ROC (blue) curve from all 25 (light blue) class-specific curves, i.e., for each face class in the dataset. These class-specific ROC curves were calculated via a one-versus-all classification procedure.</p> "> Figure 10
<p>Mean face recognition accuracy regarding each evaluated model, the images resized to 25% of their original sizes, and the benchmark images from the (<b>a</b>) P1E, (<b>b</b>) P1L, (<b>c</b>) P2E, and (<b>d</b>) P2L datasets.</p> "> Figure 11
<p>Total energy consumption regarding each evaluated model, the images resized to 25% of their original sizes, and the original (full-size) images from the (<b>a</b>) P1E, (<b>b</b>) P1L, (<b>c</b>) P2E, and (<b>d</b>) P2L datasets.</p> ">
Abstract
1. Introduction
- we provide a framework for designing Application-Oriented Retinal Image Models (ARIMs) for computer vision (CV) applications;
- we evaluate the use of ARIMs in a biometric CV application in terms of memory storage and energy reductions;
- we discuss the trade-offs between the application’s accuracy and the reductions in the computing resources induced by the ARIMs;
- we compare our results to other common setups (original and downsized uniform-resolution images) and show that the obtained storage and energy savings are relevant; and,
- we briefly discuss the use of ARIMs in real-life application scenarios and the nuances of having an ideal hardware layer that resamples images according to ARIMs.
2. Literature Review
2.1. Hardware-Based Approaches
2.2. Software-Based Approaches
2.3. Recent Clustering Techniques
3. Proposed Approach
3.1. Definition of Application Requirements
3.2. Implicit Function Selection
3.3. Definition of Spatial Configuration
- Number of foveas: although the human eye has a single fovea, a model may comprise more than one region of uniform sampling, depending on the application at hand. In our biometric application, we considered a single fovea.
- Location of foveas: the foveas should be spatially arranged according to the specific requirements of the application. In ours, the fovea is centered in the image.
- Density of foveas: the foveas can be downsampled to simulate a uniform reduction in image resolution. We tested different densities (grids) for our fovea.
- Density of periphery: the periphery is an important region that holds only sparse data in a non-uniform sampling configuration. As discussed previously, by retaining and judiciously handling sparse peripheral information (e.g., detecting motion and coarse objects in that area), the application's resource usage can be reduced. A minimal sketch of such a configuration is given after this list.
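Below is a minimal sketch, in Python, of how the spatial configuration choices above (number, location, and density of foveas, plus peripheral density and implicit function) could be recorded before model generation. The class and field names, as well as the example values, are ours for illustration and do not come from the paper's implementation.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

@dataclass
class FoveaConfig:
    """One region of uniform sampling (a 'fovea')."""
    center: Tuple[int, int]   # (x, y) location in the image plane
    grid: Tuple[int, int]     # uniform sampling grid, e.g., (150, 150)

@dataclass
class ARIMSpatialConfig:
    """Spatial configuration of an ARIM: foveas plus a sparse periphery."""
    image_size: Tuple[int, int]                       # (width, height) of the uniform input image
    foveas: List[FoveaConfig] = field(default_factory=list)
    peripheral_points: int = 384                      # number of non-uniform peripheral samples (example value)
    implicit_function: str = "l2"                     # e.g., "l1", "l2", or "linf"

# Example: a single, centered fovea, as in the biometric application
config = ARIMSpatialConfig(
    image_size=(800, 600),
    foveas=[FoveaConfig(center=(400, 300), grid=(150, 150))],
    peripheral_points=384,
    implicit_function="linf",
)
```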
3.4. Model Generation
4. Materials and Methods
4.1. Target Application: Face Detection/Recognition
4.1.1. On the Application Selection
- Step 1: We analyzed the CV application's demands and characteristics. In the biometric application considered, we examined aspects regarding:
  - the use of computational resources, which should preferably be low when running in environments with severe energy and storage limitations;
  - the intrinsic characteristics of the application's domain, such as the task to be executed (face authentication), the expected "behavior" of the input data (person movement and positioning in the images), camera angles, the most relevant part of the image to process, etc.;
  - the possibility of balancing the pixel density of different image regions. Given the application at hand, we decided that the processed image would have different resolutions across its space, inducing a compromise between energy, storage, and accuracy; and,
  - the possibility of adopting distinct pixel representations across the image in order to save computational resources. In the current case, an additional motion analysis is performed by taking advantage of an optical flow pixel representation in some image regions.
- Step 2: Subsequently, based on the previous analysis, we selected an appropriate implicit function to represent the pixel distribution of the image;
- Step 3: Next, we defined the spatial configuration of the foveal and peripheral regions, knowing, for instance, that individuals often move to the central part of the image to allow better authentication. In this case, we defined a single central fovea;
- Step 4: Finally, we created ARIMs encompassing and consolidating the expected image properties defined in the previous steps. A minimal sketch of this generation workflow is given after this list.
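The paper's actual optimization procedure is not reproduced here; the sketch below only illustrates the general idea under simplifying assumptions of our own: a uniform foveal grid plus peripheral points relaxed iteratively so that their spacing grows with the l2 distance from the fovea, while a global energy is tracked for convergence analysis (cf. Figures 3 and 4). All function names, constants, and the specific repulsion rule are ours, not the authors' algorithm.

```python
import numpy as np

def foveal_grid(center, half_size, n):
    """Uniform n x n sampling grid centered on the fovea."""
    xs = np.linspace(center[0] - half_size, center[0] + half_size, n)
    ys = np.linspace(center[1] - half_size, center[1] + half_size, n)
    gx, gy = np.meshgrid(xs, ys)
    return np.stack([gx.ravel(), gy.ravel()], axis=1)

def generate_periphery(image_size, center, n_points, n_iters=1000, seed=0):
    """Relax randomly initialized peripheral points so that their spacing
    grows with the l2 distance from the fovea (coarser sampling outward)."""
    rng = np.random.default_rng(seed)
    w, h = image_size
    pts = rng.uniform([0, 0], [w, h], size=(n_points, 2))
    energies = []
    for _ in range(n_iters):
        # Desired spacing at each point: grows with the l2 distance from the fovea.
        spacing = 1.0 + np.linalg.norm(pts - center, axis=1) / 10.0
        diff = pts[:, None, :] - pts[None, :, :]           # pairwise displacements
        dist = np.linalg.norm(diff, axis=-1) + 1e-9
        # Repulsion between points closer than their combined desired spacing.
        overlap = np.clip(spacing[None, :] + spacing[:, None] - dist, 0, None)
        np.fill_diagonal(overlap, 0.0)
        force = (diff / dist[..., None] * overlap[..., None]).sum(axis=1)
        pts = np.clip(pts + 0.1 * force, [0, 0], [w - 1, h - 1])
        energies.append(0.5 * (overlap ** 2).sum())        # global energy, for convergence plots
    return pts, energies

fovea = foveal_grid(center=(400, 300), half_size=100, n=150)
periphery, energy_curve = generate_periphery((800, 600), (400, 300), n_points=384)
```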
4.1.2. On the Application Implementation
4.1.3. Simulation Details
4.1.4. Technical Information
4.2. Dataset
4.2.1. Justification for the Selected Dataset
4.2.2. Dataset Organization
- P1E and P1L: the subsets of frame sequences of people entering and leaving portal 1, respectively;
- P2E and P2L: the subsets of frame sequences of people entering and leaving portal 2, respectively.
- One (1) of the sequences of individuals entering a portal (P1E_S1_C1) was used to train the face recognizer. This sequence comes from camera 1, which captures near frontal-face images. The same scene is also captured by cameras 2 and 3 at different angles; hence, to avoid biased evaluations, we ignored those sequences (P1E_S1_C2 and P1E_S1_C3), as both contain essentially the same faces as the former, up to slight angle variations.
- Eleven (11) sequences in which no face appears in the fovea were ignored, since no face recognition accuracy evaluation (using our models) would apply to them.
4.3. Evaluated Models
4.4. Evaluation Criteria and Hardware Setup
5. Results and Discussion
5.1. Storage Reduction
5.2. Face Recognition Accuracy
5.3. Energy Consumption Evaluation
5.4. Implications in Real-Time Applications
6. Conclusions
Author Contributions
Funding
Conflicts of Interest
References
- Bailey, D.G.; Bouganis, C.S. Vision Sensor with an Active Digital Fovea. In Recent Advances in Sensing Technology; Mukhopadhyay, S.C., Gupta, G.S., Huang, R.Y.M., Eds.; Springer: Berlin/Heidelberg, Germany, 2009; pp. 91–111. [Google Scholar] [CrossRef]
- Bornholt, J.; Mytkowicz, T.; Mckinley, K.S. The model is not enough: Understanding energy consumption in mobile devices. In Proceedings of the 2012 IEEE Hot Chips 24 Symposium (HCS), Cupertino, CA, USA, 27–29 August 2012; pp. 1–3. [Google Scholar]
- Wandell, B.A. Foundations of Vision; Sinauer Associates, Incorporated: Sunderland, MA, USA, 1995. [Google Scholar]
- Bolduc, M.; Levine, M.D. A Review of Biologically Motivated Space-Variant Data Reduction Models for Robotic Vision. Comput. Vision Image Underst. 1998, 69, 170–184. [Google Scholar] [CrossRef] [Green Version]
- Traver, V.J.; Bernardino, A. A review of log-polar imaging for visual perception in robotics. Robot. Auton. Syst. 2010, 58, 378–398. [Google Scholar] [CrossRef]
- Berton, F.; Sandini, G.; Metta, G. Anthropomorphic visual sensors. In Encyclopedia of Sensors; Grimes, C., Dickey, E., Pishko, M.V., Eds.; American Scientific Publishers: Stevenson Ranch, CA, USA, 2006; pp. 1–16. [Google Scholar]
- González, M.; Sánchez-Pedraza, A.; Marfil, R.; Rodríguez, J.A.; Bandera, A. Data-Driven Multiresolution Camera Using the Foveal Adaptive Pyramid. Sensors 2016, 16, 2003. [Google Scholar] [CrossRef] [Green Version]
- Smeraldi, F.; Bigun, J. Retinal vision applied to facial features detection and face authentication. Pattern Recognit. Lett. 2002, 23, 463–475. [Google Scholar] [CrossRef]
- Akbas, E.; Eckstein, M.P. Object detection through search with a foveated visual system. PLOS Comput. Biol. 2017, 13, 1–28. [Google Scholar] [CrossRef] [Green Version]
- Phillips, D.B.; Sun, M.J.; Taylor, J.M.; Edgar, M.P.; Barnett, S.M.; Gibson, G.M.; Padgett, M.J. Adaptive foveated single-pixel imaging with dynamic supersampling. Sci. Adv. 2017, 3, e1601782. [Google Scholar] [CrossRef] [PubMed] [Green Version]
- Wen, W.; Kajínek, O.; Khatibi, S.; Chadzitaskos, G. A Common Assessment Space for Different Sensor Structures. Sensors 2019, 19, 568. [Google Scholar] [CrossRef] [PubMed] [Green Version]
- Jain, A.K.; Murty, M.N.; Flynn, P.J. Data Clustering: A Review. ACM Comput. Surv. 1999, 31, 264–323. [Google Scholar] [CrossRef]
- Jain, M.; Singh, S. A survey on: Content based image retrieval systems using clustering techniques for large data sets. Int. J. Manag. Inf. Technol. 2011, 3, 23. [Google Scholar]
- Otto, C.; Klare, B.; Jain, A.K. An efficient approach for clustering face images. In Proceedings of the 2015 International Conference on Biometrics (ICB), Phuket, Thailand, 19–22 May 2015; pp. 243–250. [Google Scholar]
- Peng, X.; Feng, J.; Xiao, S.; Yau, W.; Zhou, J.T.; Yang, S. Structured AutoEncoders for Subspace Clustering. IEEE Trans. Image Process. 2018, 27, 5076–5086. [Google Scholar] [CrossRef] [PubMed]
- Peng, X.; Zhu, H.; Feng, J.; Shen, C.; Zhang, H.; Zhou, J.T. Deep Clustering With Sample-Assignment Invariance Prior. IEEE Trans. Neural Netw. Learn. Syst. 2019, 1–12. [Google Scholar] [CrossRef] [PubMed] [Green Version]
- Goldenstein, S.; Vogler, C.; Velho, L. Adaptive Deformable Models for Graphics and Vision. Comput. Graph. Forum 2005, 24, 729–741. [Google Scholar] [CrossRef] [Green Version]
- de Goes, F.; Goldenstein, S.; Velho, L. A Simple and Flexible Framework to Adapt Dynamic Meshes. Comput. Graph. 2008, 32, 141–148. [Google Scholar] [CrossRef]
- Wong, Y.; Chen, S.; Mau, S.; Sanderson, C.; Lovell, B.C. Patch-based Probabilistic Image Quality Assessment for Face Selection and Improved Video-based Face Recognition. In Proceedings of the IEEE Biometrics Workshop, Computer Vision and Pattern Recognition (CVPR) Workshops, Colorado Springs, CO, USA, 20–25 June 2011; pp. 81–88. [Google Scholar]
- Viola, P.; Jones, M. Rapid object detection using a boosted cascade of simple features. In Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR), Kauai, HI, USA, 8–14 December 2001; Volume 1, pp. 511–518. [Google Scholar] [CrossRef]
- He, K.; Zhang, X.; Ren, S.; Sun, J. Deep Residual Learning for Image Recognition. In Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 27–30 June 2016; pp. 770–778. [Google Scholar]
- King, D.E. Dlib-ml: A Machine Learning Toolkit. J. Mach. Learn. Res. 2009, 10, 1755–1758. [Google Scholar]
- Lucas, B.D.; Kanade, T. An Iterative Image Registration Technique with an Application to Stereo Vision. In Proceedings of the 7th International Joint Conference on Artificial Intelligence (IJCAI’81)—Volume 2, Vancouver, BC, Canada, 24–28 August 1981; Morgan Kaufmann Publishers Inc.: San Francisco, CA, USA, 1981; pp. 674–679. [Google Scholar]
- Khan, K.N.; Hirki, M.; Niemi, T.; Nurminen, J.K.; Ou, Z. RAPL in Action: Experiences in Using RAPL for Power Measurements. ACM Trans. Model. Perform. Eval. Comput. Syst. 2018, 3, 1–26. [Google Scholar] [CrossRef]
Task | Theoretical Reference | Library | Method | Input Parameters
---|---|---|---|---
Face Detection | Viola-Jones [20] | OpenCV 3.0.0 | detectMultiScale | scaleFactor = 1.1; minNeighbors = 3
Face Recognition | DNN model [21] + 1-NN | Dlib 19.16 [22] | get_face_chip_details | size = 150; padding = 0.25
Optical Flow | Lucas-Kanade [23] | OpenCV 3.0.0 | cvCalcOpticalFlowPyrLK | winSize = (31, 31); maxLevel = 3; criteria.maxCount = 20; criteria.epsilon = 0.03; minEigThreshold = 0.001
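For reference, the Python sketch below wires these components together with the parameters from the table. It uses the modern OpenCV Python bindings (cv2.calcOpticalFlowPyrLK is the Python counterpart of the legacy cvCalcOpticalFlowPyrLK listed above); the file paths, model files, and tracked points are placeholders, and the 1-NN gallery matching is only indicated.

```python
import cv2
import dlib
import numpy as np

# --- Face detection (Viola-Jones cascade), parameters as in the table ---
cascade = cv2.CascadeClassifier(cv2.data.haarcascades + "haarcascade_frontalface_default.xml")
gray = cv2.imread("foveal_region.png", cv2.IMREAD_GRAYSCALE)     # placeholder foveal image
faces = cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=3)

# --- Face recognition (Dlib ResNet descriptor + 1-NN), parameters as in the table ---
shape_predictor = dlib.shape_predictor("shape_predictor_5_face_landmarks.dat")
face_encoder = dlib.face_recognition_model_v1("dlib_face_recognition_resnet_model_v1.dat")
img = cv2.cvtColor(cv2.imread("foveal_region.png"), cv2.COLOR_BGR2RGB)
for (x, y, w, h) in faces:
    shape = shape_predictor(img, dlib.rectangle(x, y, x + w, y + h))
    chip_details = dlib.get_face_chip_details(shape, size=150, padding=0.25)
    chip = dlib.extract_image_chip(img, chip_details)
    descriptor = np.array(face_encoder.compute_face_descriptor(chip))
    # 1-NN: compare `descriptor` against the gallery of training descriptors (not shown).

# --- Sparse optical flow in the periphery (pyramidal Lucas-Kanade), parameters as in the table ---
prev_gray = cv2.imread("frame_t0.png", cv2.IMREAD_GRAYSCALE)     # placeholder consecutive frames
next_gray = cv2.imread("frame_t1.png", cv2.IMREAD_GRAYSCALE)
prev_pts = np.array([[100.0, 80.0], [300.0, 200.0]], dtype=np.float32).reshape(-1, 1, 2)
next_pts, status, err = cv2.calcOpticalFlowPyrLK(
    prev_gray, next_gray, prev_pts, None,
    winSize=(31, 31), maxLevel=3,
    criteria=(cv2.TERM_CRITERIA_COUNT | cv2.TERM_CRITERIA_EPS, 20, 0.03),
    minEigThreshold=0.001,
)
```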
Model | Num. of Pixels | Pixel Count Reduction | Bytes (FOV) | Bytes (PER) | Total Bytes | Data Size Reduction
---|---|---|---|---|---|---
Original | 480,000 | - | - | - | 1,440,000 | -
Resized (25%) | 120,000 | 75.00% | - | - | 360,000 | 75.00%
Model_1 | 10,384 | 97.83% | 30,000 | 768 | 30,768 | 97.86%
Model_2 | 22,884 | 95.23% | 67,500 | 768 | 68,268 | 95.25%
Model_3 | 40,384 | 91.58% | 120,000 | 768 | 120,768 | 91.61%
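The byte counts above are consistent with 3 bytes per foveal pixel and 2 bytes per peripheral sample (optical flow magnitude and direction); under that assumption, the short sketch below reproduces the Model_1–Model_3 rows (percentages may differ from the table in the last digit due to rounding).

```python
# Reproduces the data-size figures in the table above, assuming 3 bytes per
# foveal pixel and 2 bytes per peripheral sample (flow magnitude + direction).
ORIGINAL_PIXELS = 800 * 600            # 480,000 pixels
ORIGINAL_BYTES = ORIGINAL_PIXELS * 3   # 1,440,000 bytes
PERIPHERAL_SAMPLES = 384

for name, fovea_side in [("Model_1", 100), ("Model_2", 150), ("Model_3", 200)]:
    foveal_pixels = fovea_side ** 2
    total_pixels = foveal_pixels + PERIPHERAL_SAMPLES
    total_bytes = foveal_pixels * 3 + PERIPHERAL_SAMPLES * 2
    pixel_reduction = 1 - total_pixels / ORIGINAL_PIXELS
    size_reduction = 1 - total_bytes / ORIGINAL_BYTES
    print(f"{name}: pixels={total_pixels} ({pixel_reduction:.2%} fewer), "
          f"bytes={total_bytes} ({size_reduction:.2%} smaller)")
```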
Accuracy Loss

Dataset | Model 1 Min. | Model 1 Mean | Model 1 Max. | Model 2 Min. | Model 2 Mean | Model 2 Max. | Model 3 Min. | Model 3 Mean | Model 3 Max.
---|---|---|---|---|---|---|---|---|---
P1E | 0.032 | 0.123 | 0.264 | 0 | 0.050 | 0.108 | 0 | 0.006 | 0.021
P1L | 0.060 | 0.248 | 0.613 | 0 | 0.094 | 0.255 | 0 | 0.023 | 0.103
P2E | 0.174 | 0.353 | 0.500 | 0.032 | 0.172 | 0.318 | 0 | 0.006 | 0.037
P2L | 0.143 | 0.300 | 0.529 | 0.033 | 0.086 | 0.265 | 0 | 0.063 | 0.206
Energy Reduction

Dataset | Model 1 Min. | Model 1 Mean | Model 1 Max. | Model 2 Min. | Model 2 Mean | Model 2 Max. | Model 3 Min. | Model 3 Mean | Model 3 Max.
---|---|---|---|---|---|---|---|---|---
P1E | 0.505 | 0.551 | 0.598 | 0.463 | 0.508 | 0.550 | 0.414 | 0.456 | 0.489
P1L | 0.612 | 0.667 | 0.711 | 0.582 | 0.619 | 0.710 | 0.490 | 0.548 | 0.657
P2E | 0.536 | 0.610 | 0.672 | 0.439 | 0.549 | 0.619 | 0.381 | 0.454 | 0.551
P2L | 0.533 | 0.571 | 0.618 | 0.406 | 0.516 | 0.620 | 0.332 | 0.464 | 0.603
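As an aside on how such energy figures can be collected in practice: on Linux systems with Intel CPUs, the RAPL counters exposed through the powercap interface report cumulative package energy (see the RAPL reference by Khan et al. in the reference list). The sketch below is a generic measurement wrapper, not the authors' exact instrumentation; `process_frame` is a hypothetical workload.

```python
import time

RAPL_ENERGY_FILE = "/sys/class/powercap/intel-rapl:0/energy_uj"  # package-level counter (Linux)

def read_energy_uj():
    """Read the cumulative package energy counter, in microjoules."""
    with open(RAPL_ENERGY_FILE) as f:
        return int(f.read())

def measure_energy(workload, *args, **kwargs):
    """Return (result, joules) consumed while running `workload`.
    Note: the counter periodically wraps around; long runs should handle that."""
    start = read_energy_uj()
    result = workload(*args, **kwargs)
    consumed = (read_energy_uj() - start) / 1e6  # microjoules -> joules
    return result, consumed

# Example usage (hypothetical workload):
# result, joules = measure_energy(process_frame, frame)
```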