Abstract
Human beings have developed a number of evolutionary mechanisms that allow them to distinguish between different objects and to trigger events based on their perception of reality. Visual impairment has a significant impact on individuals' quality of life, including their ability to work and to develop personal relationships, as visually impaired people often feel cut off from the people and things around them. The need for assistive technologies has long been a constant in the daily lives of people with visual impairment, and will remain so in the years to come. Cognitive mapping is extremely important for these individuals in terms of creating a conceptual model of the surrounding space and the objects in it, thereby supporting their interaction with the physical environment. This work describes the use of computer vision techniques, namely feature detectors and descriptors, to detect objects in the scene and help contextualize users within the surrounding space, enhancing their mobility, navigation and cognitive mapping of new environments.
1 Introduction
Of the world's population with visual impairment, about 90 % live in developing countries, and 82 % of people living with blindness are aged 50 years or older. Regrettably, these figures are expected to increase in the coming decades. Visual impairment has a significant impact on individuals' quality of life, including their ability to work and to develop personal relationships. Almost half (48 %) of the visually impaired feel "moderately" or "completely" cut off from the people and things around them [1].
In order to overcome or lessen the difficulties posed by visual impairment, extensive research has been dedicated to building assistive systems. The need for assistive technologies has long been a constant in the daily lives of people with visual impairment, and will remain so in the years to come. Traditional assistive technologies for the blind include white canes, guide dogs, screen readers, and so forth. Modern mobile assistive technologies are becoming more discreet and include (or are delivered via) a wide range of mobile computerized devices, including ubiquitous technologies like mobile phones. Such discreet technologies can help alleviate the cultural stigma associated with the more traditional (and noticeable) assistive devices [2].
Human beings have the ability to acquire and use information obtained from the surrounding environment through their natural sensors. They have developed a number of evolutionary mechanisms that allow them to distinguish between different objects and to trigger events and complex processes based on their perception of reality. Recently, systems have been developed that use computer vision techniques, such as pattern matching, to sense the surrounding environment and detect visual landmarks.
Human beings are able to recognize objects without much effort, despite variations in scale, position and lighting conditions. However, emulating this detection and recognition with electronic devices remains a major challenge.
In this paper we present the use of computer vision techniques, namely feature detectors and descriptors, to detect objects in the scene and help contextualize the user within the surrounding space, enhancing their mobility, navigation and cognitive mapping of a new environment. Section 2 presents related work and how this work builds on previous developments in this field by the research team at the University of Trás-os-Montes e Alto Douro. Section 3 describes the feature detectors used in this work. Section 4 presents the testing methodology and environment setup. Section 5 shows the results of the study and, finally, Sect. 6 discusses the results found.
2 Related Work
To address the task of finding the user's location in indoor environments, several techniques and technologies have been used, such as sonar, radio signal triangulation, radio signal (beacon) emitters, and signal fingerprinting. All of these technologies can be, and have been, used to develop systems that help extend the personal space range of blind or visually impaired users [3].
In recent years, some research teams [4–6] have developed navigation systems based on RFID technology. For outdoor environments, hybrid systems have been proposed that use GPS as the main information source and RFID to correct and minimize the location error. In the last few years, the research team at the University of Trás-os-Montes e Alto Douro (UTAD) has focused on visual impairment and on how existing technology may help in everyday life. From an extensive review of the state of the art and its best practices, three main projects have been developed: SmartVision [7], Nav4B [8] and Blavigator [9].
The prototype developed in the Blavigator project is built with the same modular structure as the SmartVision prototype. The Blavigator project aimed at creating a small, cheap and portable device that included all the features of the SmartVision prototype, with added performance optimization. In its latest optimization, the computer vision module of the Blavigator prototype was developed to work in conjunction with the location module [10]. In a known location, object recognition algorithms can provide contextual feedback to the user and even serve as a validator for the positioning and information modules of a navigation system for the visually impaired.
Based on the previous work described above, this paper proposes a method in which computer vision algorithms validate the outputs of the positioning system through the use of feature detectors. To this end, the performance of several feature detectors is also analyzed.
3 Feature Detection
Robust object recognition has lately been a topic of major focus in computer vision, and researchers have worked for many years to understand how to achieve it. Humans can see, detect, recognize and categorize objects in the real world with relative ease. However, this is not an easy task for computer vision.
Many recognition applications are only intended to recognize objects in predefined positions and orientations. In practice, however, objects vary in geometric structure and color, parts of them may be hidden, and they can be viewed from different points of view and at different scales. Occlusion may be caused by the environment or by the object itself: a book can be hidden in the middle of many others, and the facade of a building can have parts of it hidden by shadow. Such extraneous data should be discarded because it does not help in recognizing the object. Recognizing a building or a book cover from different perspectives is therefore a very difficult task in computer vision.
Keypoints are used very often in computer vision, including for object recognition. The term keypoints (characteristic points) generally refers to the set of points that are used to describe certain patterns. The most common approach to detect and recognize objects using keypoints is divided into three steps (sketched in code below):

- Detect the keypoints in the image;
- Describe the region surrounding each keypoint;
- Use a method (e.g. a distance metric) to compare the descriptors.
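As a concrete illustration of these three steps, the following minimal sketch uses the OpenCV 2.4.9 Java API mentioned in Sect. 4. The ORB/BRIEF pair is just one of the combinations evaluated later; this is our illustrative reconstruction, not the paper's actual code, and it assumes the OpenCV native library has already been loaded.

```java
import org.opencv.core.Mat;
import org.opencv.core.MatOfDMatch;
import org.opencv.core.MatOfKeyPoint;
import org.opencv.features2d.DescriptorExtractor;
import org.opencv.features2d.DescriptorMatcher;
import org.opencv.features2d.FeatureDetector;

public class KeypointPipeline {
    // Matches two already-loaded grayscale images and returns the matches.
    public static MatOfDMatch match(Mat query, Mat scene) {
        // Step 1: detect the keypoints in both images.
        FeatureDetector detector = FeatureDetector.create(FeatureDetector.ORB);
        MatOfKeyPoint kpQuery = new MatOfKeyPoint();
        MatOfKeyPoint kpScene = new MatOfKeyPoint();
        detector.detect(query, kpQuery);
        detector.detect(scene, kpScene);

        // Step 2: describe the region surrounding each keypoint.
        DescriptorExtractor extractor =
                DescriptorExtractor.create(DescriptorExtractor.BRIEF);
        Mat descQuery = new Mat();
        Mat descScene = new Mat();
        extractor.compute(query, kpQuery, descQuery);
        extractor.compute(scene, kpScene, descScene);

        // Step 3: compare the descriptors; binary descriptors such as
        // BRIEF are matched with the Hamming distance.
        DescriptorMatcher matcher =
                DescriptorMatcher.create(DescriptorMatcher.BRUTEFORCE_HAMMING);
        MatOfDMatch matches = new MatOfDMatch();
        matcher.match(descQuery, descScene, matches);
        return matches;
    }
}
```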
In recent years, various methods have been developed to find properties that remain invariant under different conditions, such as changes in scale, rotation and illumination. Scale-invariant detectors select regions at significant locations in the image, together with a corresponding scale parameter representing the size of the region. Hence, processing complexity is reduced because only a limited number of regions remain to be considered as important characteristics.
In order to use the identified characteristic points, their immediate vicinity needs to be described in an efficient and compact manner, so that they can be matched against similar patterns in other images. A descriptor is used to construct this description of the neighborhood of each feature point. The combination of keypoint detectors with feature descriptors enables the robust representation of objects. This article aims to study the performance of detectors and descriptors such as ORB, FREAK, BRISK, BRIEF, STAR, FAST and GFTT for certain classes of objects, in order to identify those that behave best depending on the object class.
SURF (Speeded Up Robust Features) was presented by Herbert Bay in 2006 and can be used in object recognition and 3D reconstruction. It is partly inspired by SIFT and several times faster, being based on sums of 2D Haar wavelet responses and making efficient use of integral images. SURF cannot be used for commercial purposes without permission of the patent holder [11]. STAR, a derivative of CenSurE (Center Surround Extremas), uses polygons such as squares, pentagons and hexagons as computationally less expensive alternatives to circles [12]. FAST (Features from Accelerated Segment Test) is a fast corner detection algorithm well suited to real-time processing. However, it usually detects too many features; because it is not selective, the detected features are often suboptimal and adjacent to each other [13].
After the detection of keypoints, it is necessary to describe them; to this end, descriptors are used. The descriptors analyzed in this work were BRISK, BRIEF, ORB and FREAK.
BRISK (Binary Robust Invariant Scalable Keypoints) presents an alternative to SIFT and SURF that maintains their robustness while being faster [14]. According to the authors, the key to this speed is a scale-space detector based on FAST, combined with a binary string descriptor built from intensity comparisons between sampled points in the neighborhood of each point of interest.

BRIEF (Binary Robust Independent Elementary Features) produces compact binary string descriptors directly, without first computing a full descriptor vector [15]. For example, while SIFT uses a vector of 128 values, not all of these elements are necessary for matching. Binary strings, in contrast, can be matched very efficiently using the Hamming distance, which essentially relies on XOR and bit-count processor instructions; these instructions are extremely fast on processors equipped with SSE instructions. BRIEF is therefore an algorithm that uses little memory and is more efficient than higher-dimensional descriptors.

ORB (Oriented FAST and Rotated BRIEF) has been proposed as a computationally efficient substitute for SIFT, with similar performance. ORB is less affected by noise and can be used in real time [16]. The purpose of this technique is to enable low-power devices without GPU acceleration to perform panorama stitching and patch tracking, and to reduce object detection time. ORB performs similarly to SIFT and better than SURF. It is based on the FAST detector and uses BRIEF for description; both techniques offer good performance at low processing cost.

FREAK (Fast Retina Keypoint) consists of a cascade of binary strings, efficiently computed by comparing pairs of image intensities over a retinal sampling pattern. Interestingly, the selection of pairs to reduce the size of the descriptor yields a highly structured search pattern that mimics the human eye [17].
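As an illustration of the XOR-plus-bit-count matching described above for binary descriptors, the following helper (ours, not from the original work) computes the Hamming distance between two descriptors stored as byte arrays:

```java
// Illustrative only: Hamming distance between two binary descriptors
// (e.g. 32-byte BRIEF/ORB strings), using XOR and a population count.
static int hammingDistance(byte[] a, byte[] b) {
    int distance = 0;
    for (int i = 0; i < a.length; i++) {
        // XOR leaves a 1 in every bit position where the descriptors
        // differ; bitCount tallies those differing bits.
        distance += Integer.bitCount((a[i] ^ b[i]) & 0xFF);
    }
    return distance;
}
```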
The techniques on which this study was based for object detection and recognition were BRIEF, ORB, BRISK and FREAK. The SURF detector is used together with the BRIEF, BRISK and FREAK descriptors. The ORB descriptor is used with the ORB detector because, unlike the other descriptors, it requires information about the orientation of the keypoints. Moreover, according to the evaluation of Heinly et al. [18], the combined use of the ORB/ORB detector and descriptor outperformed SURF/ORB in most cases. On the other hand, SURF keypoints are invariant to rotation and scaling, making them suitable for use with the BRISK and FREAK descriptors. The authors of BRIEF [15] endorse the use of SURF as a detector.
4 Testing Methodology and Environment Setup
The implementation of each of the methods mentioned in the previous section follows the same three essential steps: detecting the feature points, describing each region around a point as a feature vector using a descriptor, and finally comparing the descriptors with a matching function.
Figure 1 illustrates the methodology used to detect and recognize the objects.
The application was developed in the Java programming language using the Eclipse IDE and the OpenCV 2.4.9 library for the Android operating system. All implemented methods received the same input images in order to ensure consistency in the comparisons; the only difference was the detector/descriptor combination used. The combinations tested were ORB/ORB, STAR/ORB and ORB/BRIEF. The FAST detector, although recommended for real-time applications, showed poor performance in our experiments due to the excessive number of detected characteristic points, which in turn requires a high processing time.
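For illustration, the three tested combinations can be instantiated in the OpenCV 2.4.9 Java API along these lines (a hypothetical factory of ours; the paper does not reproduce its application code):

```java
import org.opencv.features2d.DescriptorExtractor;
import org.opencv.features2d.FeatureDetector;

// Hypothetical factory for the three combinations compared in this work:
// ORB/ORB, STAR/ORB and ORB/BRIEF (detector/descriptor).
final class TestedCombinations {
    static FeatureDetector detector(String combo) {
        // ORB/ORB and ORB/BRIEF use the ORB detector; STAR/ORB uses STAR.
        return combo.equals("STAR/ORB")
                ? FeatureDetector.create(FeatureDetector.STAR)
                : FeatureDetector.create(FeatureDetector.ORB);
    }

    static DescriptorExtractor extractor(String combo) {
        // ORB/BRIEF uses the BRIEF descriptor; the other two use ORB.
        return combo.equals("ORB/BRIEF")
                ? DescriptorExtractor.create(DescriptorExtractor.BRIEF)
                : DescriptorExtractor.create(DescriptorExtractor.ORB);
    }
}
```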
5 Results
The evaluation consisted of measuring three parameters on the processed data: the number of keypoints, the accuracy of the matches, and the processing time (in milliseconds). For this purpose we used a set of 17 images, 13 of which are part of the Oxford Buildings dataset (Block I and Block C) [20]. The remaining images are of the cover of the book The Hobbit. In these tests, the mobile device used was a Wiko Gateway with a camera resolution of 800 × 600 pixels and a quad-core Cortex-A7 1.3 GHz processor.
Figure 2 shows the average number of keypoints for each detector/descriptor combination used, namely ORB-ORB, ORB-BRIEF and STAR-ORB. The results show that the ORB detector detects 500 feature points, which is acceptable for processing on smartphones. The STAR detector finds about 700 points on average, which can sometimes be excessive.

Figure 3 shows the ratio (in percentage) between the matches and the total number of keypoints detected. The graph shows that the ORB-BRIEF combination (detector and descriptor) performs best, unlike STAR-ORB, which cannot match the characteristic points as efficiently even though it detects them.

Finally, Fig. 4 shows the processing time in milliseconds. The results show that the STAR-ORB combination has a processing time that is excessive for mobile applications (about 1050 ms). Since the objective is to detect and recognize objects in real time on a mobile device, it is not appropriate in this context. Regarding the ORB-ORB and ORB-BRIEF combinations, the results show that ORB-BRIEF not only has a shorter processing time (about 550 ms) but also better matching accuracy. Thus, the detector and descriptor chosen for testing in this environment was ORB-BRIEF.
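A sketch of how these three metrics could be gathered per combination, building on the KeypointPipeline helper shown earlier. The Hamming cut-off used here to count good matches is a hypothetical value, not one reported in the paper:

```java
import java.util.List;
import org.opencv.core.Mat;
import org.opencv.core.MatOfDMatch;
import org.opencv.features2d.DMatch;  // located in org.opencv.core from OpenCV 3.x on

final class Metrics {
    static void report(Mat query, Mat scene) {
        long start = System.nanoTime();
        MatOfDMatch matches = KeypointPipeline.match(query, scene);
        long elapsedMs = (System.nanoTime() - start) / 1000000;  // Fig. 4 metric

        // One candidate match is produced per described query keypoint.
        List<DMatch> list = matches.toList();
        int good = 0;
        for (DMatch m : list) {
            if (m.distance < 40) {  // hypothetical Hamming cut-off
                good++;
            }
        }
        // Fig. 2 metric: keypoint count; Fig. 3 metric: match ratio in percent.
        double ratio = list.isEmpty() ? 0 : 100.0 * good / list.size();
        System.out.println(list.size() + " keypoints, " + ratio
                + " % matched, " + elapsedMs + " ms");
    }
}
```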
After choosing the detector and descriptor, the application was tested to determine whether the ORB-BRIEF method is able to detect a set of buildings from the Oxford Buildings dataset. For this purpose, we used a set of 10 images (I1, I2, …, I10) of the same building and 15 images (B1, B2, …, B5, C1, C2, …, C5, D1, D2, …, D5) of three other buildings.
The tests were performed with the mobile device mentioned above. Their purpose was to measure the method's performance in recognizing the building in the image. To this end, we used a configurable reference value that decides whether the building is the same in both images (a true match). This value was determined empirically after a few tests. In our implementation, a building is considered found if the number of matches exceeds 15.
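In code, this decision reduces to a simple threshold test; a minimal sketch follows (only the value 15 comes from the paper, and how matches are counted upstream is assumed):

```java
// Empirically determined reference value from the paper.
static final int MATCH_THRESHOLD = 15;

// A building is reported as found when the number of matches between the
// camera image and the reference image exceeds the configured threshold.
static boolean isSameBuilding(int numMatches) {
    return numMatches > MATCH_THRESHOLD;
}
```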
Figures 5, 6 and 7 show the overall results of the tests.
It can be seen that there are some critical cases in which, even though the building in the image is not the correct one, there are more matches than normal, particularly in the case of image C5 (Fig. 7). On the other hand, in the case of image I3 (Fig. 6), the number of matches is not sufficient to recognize the building, even though it is the correct one. Overall, the tests showed a success rate of 96 %, with only one false negative (image I3).
6 Discussion
From the results obtained, it can be seen that the best methods for use on mobile devices are ORB, BRISK, BRIEF and FREAK. The work developed was based on the ORB and BRIEF descriptors. The results of these two methods are very similar, yet BRIEF shows a slight improvement in keypoint matching and processing time, which, depending on the situation, may be relevant. Another important factor is the camera resolution. At very low resolutions the results are not satisfactory because of poor image quality, although the response time is relatively short. At higher resolutions the results are more accurate, but the response time also increases. The resolution is thus another factor to consider in finding the best compromise between quality and processing time.
As future work, we intend to use the results of this study to further develop the Blavigator system, enabling the orientation and navigation of people inside buildings they are unfamiliar with by identifying natural elements such as stairs, elevators and ATMs. This system is specifically developed for people with visual impairments.
References
Hakobyan, L., Lumsden, J., O’Sullivan, D., Bartlett, H.: Mobile assistive technologies for the visually impaired. Surv. Ophthalmol. 58(6), 513–528 (2013)
Thomas Pocklington Trust: Research findings no 4: helping people with sight loss in their homes: housing-related assistive technology (2003). http://www.pocklington-trust.org.uk/research/publications/rf4. Accessed 23 November 2014
Strumillo, P.: Electronic interfaces aiding the visually impaired in environmental access, mobility and navigation. In: 3rd Conference on Human System Interactions (HSI), pp. 17–24 (2010)
Chumkamon, S., Tuvaphanthaphiphat, P., Keeratiwintakorn, P.: A blind navigation system using RFID for indoor environments. In: 5th International Conference on Electrical Engineering/Electronics, Computer, Telecommunications and Information Technology, Thailand, vol. 2, pp. 765–768 (2008)
Willis, S., Helal, S.: RFID information grid for blind navigational and wayfinding. In: Proceedings of the 9th IEEE International Symposium on Wearable Computers, Osaka, pp. 34–37 (2005)
D’Atri, E., Medaglia, C., Panizzi, E., D’Atri, A.: A system to aid blind people in the mobility: a usability test and its results. In: Proceedings of the Second International Conference on Systems, Martinique, p. 35 (2007)
Fernandes, H., du Buf, J., Rodrigues, J.M.F., Barroso, J., Paredes, H., Farrajota, M., José, J.: The SmartVision navigation prototype for blind users. J. Digit. Content Technol. Appl. 5(5), 351–361 (2011)
Fernandes, H., Faria, J., Paredes, H., Barroso, J.: An integrated system for blind day-to-day life autonomy. In: The Proceedings of the 13th International ACM SIGACCESS Conference on Computers and Accessibility, Dundee, Scotland, UK (2011)
Fernandes, H., Adão, T., Magalhães, L., Paredes, H., Barroso, J.: Navigation module of the Blavigator prototype. In: Proceedings of the World Automation Congress 2012, Puerto Vallarta (2012)
Fernandes, H., Costa, P., Paredes, H., Filipe, V., Barroso, J.: Integrating computer vision object recognition with location based services for the blind. In: Stephanidis, C., Antona, M. (eds.) UAHCI 2014, Part III. LNCS, vol. 8515, pp. 493–500. Springer, Heidelberg (2014)
Bay, H., Tuytelaars, T., Van Gool, L.: SURF: speeded up robust features. In: Leonardis, A., Bischof, H., Pinz, A. (eds.) ECCV 2006, Part I. LNCS, vol. 3951, pp. 404–417. Springer, Heidelberg (2006)
Agrawal, M., Konolige, K., Blas, M.R.: CenSurE: center surround extremas for realtime feature detection and matching. In: Forsyth, D., Torr, P., Zisserman, A. (eds.) ECCV 2008, Part IV. LNCS, vol. 5305, pp. 102–115. Springer, Heidelberg (2008)
Rosten, E., Drummond, T.: Fusing points and lines for high performance tracking. In: Tenth IEEE International Conference on Computer Vision, ICCV 2005, vol. 2, pp. 1508–1515 (2005)
Leutenegger, S., Chli, M., Siegwart, R.Y.: BRISK: binary robust invariant scalable keypoints. In: IEEE International Conference on Computer Vision (ICCV), pp. 2548–2555 (2011)
Calonder, M., Lepetit, V., Strecha, C., Fua, P.: BRIEF: binary robust independent elementary features. In: Daniilidis, K., Maragos, P., Paragios, N. (eds.) ECCV 2010, Part IV. LNCS, vol. 6314, pp. 778–792. Springer, Heidelberg (2010)
Rublee, E., Rabaud, V., Konolige, K., Bradski, G.: ORB: an efficient alternative to SIFT or SURF. In: 2011 IEEE International Conference on Computer Vision (ICCV), pp. 2564–2571 (2011)
Alahi, A., Ortiz, R., Vandergheynst, P.: FREAK: fast retina keypoint. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 510–517 (2012)
Heinly, J., Dunn, E., Frahm, J.-M.: Comparative evaluation of binary features. In: Fitzgibbon, A., Lazebnik, S., Perona, P., Sato, Y., Schmid, C. (eds.) ECCV 2012, Part II. LNCS, vol. 7573, pp. 759–773. Springer, Heidelberg (2012)
Mulmule, D., Dravid, A.: A study of computer vision techniques for currency recognition on mobile phone for the visually impaired. Int. J. Adv. Res. Comput. Sci. Softw. Eng. 4(11), 160–165 (2014)
Philbin, J., Arandjelović, R., Zisserman, A.: Oxford Buildings Dataset (2015). http://www.robots.ox.ac.uk/~vgg/data/oxbuildings/. Accessed March 2015
Acknowledgements
This work is financed by the FCT – Fundação para a Ciência e a Tecnologia (Portuguese Foundation for Science and Technology) within projects UID/EEA/50014/2013 and UTAP-EXPL/EEI-SII/0043/2014, and research grants SFRH/BD/89759/2012 and SFRH/BD/87259/2012.