DOI: 10.1145/3663548.3688538

Identifying Crucial Objects in Blind and Low-Vision Individuals' Navigation

Published: 27 October 2024

Abstract

This paper presents a curated list of 90 objects essential to the navigation of blind and low-vision (BLV) individuals, spanning road, sidewalk, and indoor environments. We develop the initial list by analyzing 21 publicly available videos of BLV individuals navigating various settings, then refine it through feedback from a focus group study involving blind, low-vision, and sighted companions of BLV individuals. A subsequent analysis reveals that most contemporary datasets used to train recent computer vision models contain only a small subset of the objects on our list. Furthermore, we provide detailed labels for these 90 objects across 31 video segments derived from the original 21 videos. Finally, we make the object list, the 21 videos, and the object labels for the 31 video segments publicly available. This paper aims to fill this gap and foster the development of more inclusive and effective navigation aids for the BLV community.
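To make the coverage claim concrete, the following is a minimal Python sketch (not the authors' released code) of the kind of overlap check the abstract describes: counting how many objects from the curated 90-object list also appear among the category names of a mainstream detection dataset such as COCO. The file name "blv_object_list.txt" is a hypothetical placeholder for the authors' publicly released list, assumed to hold one object name per line; only a subset of COCO's 80 categories is inlined for brevity.

```python
# Hedged sketch of a dataset-coverage check: intersect a curated object
# list with COCO's category names. "blv_object_list.txt" is a hypothetical
# placeholder for the authors' released 90-object list.
from pathlib import Path

# Subset of COCO's 80 detection categories (the full list ships with COCO).
COCO_CATEGORIES = {
    "person", "bicycle", "car", "motorcycle", "bus", "train", "truck",
    "traffic light", "fire hydrant", "stop sign", "parking meter",
    "bench", "chair", "couch", "potted plant", "bed", "dining table",
}

def load_object_list(path: str) -> set[str]:
    """Read one lower-cased object name per line, skipping blank lines."""
    return {
        line.strip().lower()
        for line in Path(path).read_text().splitlines()
        if line.strip()
    }

if __name__ == "__main__":
    blv_objects = load_object_list("blv_object_list.txt")
    covered = blv_objects & COCO_CATEGORIES
    print(f"{len(covered)} of {len(blv_objects)} objects appear in COCO:")
    for name in sorted(covered):
        print(f"  - {name}")
```

With name normalization as the only matching rule, such a check understates coverage when datasets use synonyms (e.g., "crosswalk" vs. "pedestrian crossing"), which is one reason a manually curated mapping, as in the paper's analysis, is the more reliable approach.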



Published In

ASSETS '24: Proceedings of the 26th International ACM SIGACCESS Conference on Computers and Accessibility
October 2024, 1475 pages
ISBN: 9798400706776
DOI: 10.1145/3663548

Publisher

Association for Computing Machinery, New York, NY, United States

Qualifiers

• Poster
• Research
• Refereed limited

Acceptance Rates

Overall acceptance rate: 436 of 1,556 submissions, 28%
