Research article · DOI: 10.1145/2984511.2984518

VizLens: A Robust and Interactive Screen Reader for Interfaces in the Real World

Published: 16 October 2016

Abstract

The world is full of physical interfaces that are inaccessible to blind people, from microwaves and information kiosks to thermostats and checkout terminals. Blind people cannot independently use such devices without at least first learning their layout, and usually only after labeling them with sighted assistance. We introduce VizLens, an accessible mobile application and supporting backend that can robustly and interactively help blind people use nearly any interface they encounter. VizLens users capture a photo of an inaccessible interface and send it to multiple crowd workers, who work in parallel to quickly label and describe elements of the interface to make subsequent computer vision easier. The VizLens application helps users recapture the interface in the field of the camera, and uses computer vision to interactively describe the part of the interface beneath their finger (updating 8 times per second). We show that VizLens provides accurate and usable real-time feedback in a study with 10 blind participants, and our crowdsourcing labeling workflow was fast (8 minutes), accurate (99.7%), and cheap ($1.15). We then explore extensions of VizLens that allow it to (i) adapt to state changes in dynamic interfaces, (ii) combine crowd labeling with OCR technology to handle dynamic displays, and (iii) benefit from head-mounted cameras. VizLens robustly solves a long-standing challenge in accessibility by deeply integrating crowdsourcing and computer vision, and foreshadows a future of increasingly powerful interactive applications that would be currently impossible with either alone.
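For readers who want a concrete sense of the recognition loop the abstract describes, the sketch below outlines one plausible implementation in Python with OpenCV. It is a minimal illustration, not the authors' code: ORB features stand in for the paper's feature matcher, the crowd-labeled reference photo is assumed to come with named bounding boxes, and fingertip detection in the live frame is treated as given.

# Hypothetical sketch of a VizLens-style feedback loop (not the authors' code).
# Assumes a crowd-labeled reference photo of the interface, with each labeled
# element stored as (name, bounding box) in reference-image coordinates.
import cv2
import numpy as np

def build_reference(reference_img):
    """Precompute keypoints and descriptors for the labeled reference image."""
    detector = cv2.ORB_create(nfeatures=2000)  # ORB stands in for the paper's features
    ref_kp, ref_des = detector.detectAndCompute(reference_img, None)
    return detector, ref_kp, ref_des

def finger_to_reference(frame, detector, ref_kp, ref_des, finger_xy):
    """Map a fingertip position in the live frame into reference coordinates
    via feature matching and a homography; return None if matching fails."""
    kp, des = detector.detectAndCompute(frame, None)
    if des is None:
        return None
    matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
    matches = matcher.match(des, ref_des)
    if len(matches) < 10:
        return None  # too few matches; prompt the user to re-aim the camera
    src = np.float32([kp[m.queryIdx].pt for m in matches]).reshape(-1, 1, 2)
    dst = np.float32([ref_kp[m.trainIdx].pt for m in matches]).reshape(-1, 1, 2)
    H, _ = cv2.findHomography(src, dst, cv2.RANSAC, 5.0)
    if H is None:
        return None
    pt = np.float32([[finger_xy]])  # fingertip in live-frame coordinates
    return cv2.perspectiveTransform(pt, H)[0, 0]

def label_under_finger(ref_point, labels):
    """labels: list of (name, (x, y, w, h)) boxes from crowd annotation."""
    x, y = ref_point
    for name, (bx, by, bw, bh) in labels:
        if bx <= x <= bx + bw and by <= y <= by + bh:
            return name  # speak this name via the phone's screen reader
    return None

Precomputing the reference descriptors once keeps the per-frame cost to detection, matching, and a single homography estimate, which is what makes the interactive update rate the abstract reports (8 per second) feasible.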

Supplementary Material

• Supplemental video: suppl.mov (uist1584-file3.mp4)
• MP4 file: p651-guo.mp4




      Information

      Published In

      UIST '16: Proceedings of the 29th Annual Symposium on User Interface Software and Technology
      October 2016
      908 pages
      ISBN:9781450341899
      DOI:10.1145/2984511
      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.


      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      Published: 16 October 2016


      Author Tags

      1. accessibility
      2. computer vision
      3. crowdsourcing
      4. mobile devices
      5. non-visual interfaces
      6. visually impaired users

      Qualifiers

      • Research-article

      Conference

      UIST '16

      Acceptance Rates

      UIST '16 paper acceptance rate: 79 of 384 submissions, 21%.
      Overall acceptance rate: 561 of 2,567 submissions, 22%.

      Bibliometrics & Citations

      Article Metrics

      • Downloads (last 12 months): 80
      • Downloads (last 6 weeks): 12
      Reflects downloads up to 12 Nov 2024

      Cited By
      • (2024) Touchpad Mapper: Examining Information Consumption From 2D Digital Content Using Touchpads by Screen-Reader Users. Proceedings of the 26th International ACM SIGACCESS Conference on Computers and Accessibility, 1-4. https://doi.org/10.1145/3663548.3688505. Online publication date: 27-Oct-2024.
      • (2024) A Recipe for Success? Exploring Strategies for Improving Non-Visual Access to Cooking Instructions. Proceedings of the 26th International ACM SIGACCESS Conference on Computers and Accessibility, 1-15. https://doi.org/10.1145/3663548.3675662. Online publication date: 27-Oct-2024.
      • (2024) ProgramAlly: Creating Custom Visual Access Programs via Multi-Modal End-User Programming. Proceedings of the 37th Annual ACM Symposium on User Interface Software and Technology, 1-15. https://doi.org/10.1145/3654777.3676391. Online publication date: 13-Oct-2024.
      • (2024) SonoHaptics: An Audio-Haptic Cursor for Gaze-Based Object Selection in XR. Proceedings of the 37th Annual ACM Symposium on User Interface Software and Technology, 1-19. https://doi.org/10.1145/3654777.3676384. Online publication date: 13-Oct-2024.
      • (2024) A Contextual Inquiry of People with Vision Impairments in Cooking. Proceedings of the 2024 CHI Conference on Human Factors in Computing Systems, 1-14. https://doi.org/10.1145/3613904.3642233. Online publication date: 11-May-2024.
      • (2024) FetchAid: Making Parcel Lockers More Accessible to Blind and Low Vision People With Deep-learning Enhanced Touchscreen Guidance, Error-Recovery Mechanism, and AR-based Search Support. Proceedings of the 2024 CHI Conference on Human Factors in Computing Systems, 1-15. https://doi.org/10.1145/3613904.3642213. Online publication date: 11-May-2024.
      • (2024) A Systematic Review of Ability-diverse Collaboration through Ability-based Lens in HCI. Proceedings of the 2024 CHI Conference on Human Factors in Computing Systems, 1-21. https://doi.org/10.1145/3613904.3641930. Online publication date: 11-May-2024.
      • (2024) Exploring Space: User Interfaces for Blind and Visually Impaired People for Spatial and Non-verbal Information. Computers Helping People with Special Needs, 267-274. https://doi.org/10.1007/978-3-031-62846-7_32. Online publication date: 8-Jul-2024.
      • (2024) Enhancing Accessible Reading for All with Universally Designed Augmented Reality – AReader: From Audio Narration to Talking AI Avatars. Universal Access in Human-Computer Interaction, 282-300. https://doi.org/10.1007/978-3-031-60881-0_18. Online publication date: 1-Jun-2024.
      • (2023) Opportunities for Accessible Virtual Reality Design for Immersive Musical Performances for Blind and Low-Vision People. Proceedings of the 2023 ACM Symposium on Spatial User Interaction, 1-21. https://doi.org/10.1145/3607822.3614540. Online publication date: 13-Oct-2023.
