research-article

Open access

VisPhoto: Photography for People with Visual Impairments via Post-Production of Omnidirectional Camera Imaging

Authors: Naoki Hirabayashi, Masakazu Iwamura, Kazunori Minatani, Koichi Kise

ASSETS '23: Proceedings of the 25th International ACM SIGACCESS Conference on Computers and Accessibility

Article No.: 6, Pages 1 - 17

https://doi.org/10.1145/3597638.3608422

Published: 22 October 2023

Abstract

Many people with visual impairments would like to take photographs. However, they often have difficulty pointing the camera at the target. In this paper, we address this problem by proposing a novel photo-taking system called VisPhoto. Unlike conventional methods, VisPhoto generates a photograph in post-production. When the shutter button is pressed, VisPhoto captures an omnidirectional camera image that contains the surrounding scene of the camera. In post-production, the system outputs a cropped region as a “photograph” that satisfies the user’s preference. We conducted an experiment consisting of two parts. First, 24 people with visual impairments took photographs with a genuine iPhone camera app, a conventional method, and VisPhoto. Second, 20 sighted people evaluated the quality of the photographs. The experimental results showed that the participants with visual impairments preferred to use VisPhoto to take photographs of difficult targets, whereas they preferred the conventional method for easy targets. Moreover, we revealed that their preferences for photo-taking methods were influenced by the participants’ needs and values about photography and their confidence in their photographic abilities.
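The crop step described in the abstract can be illustrated with a minimal sketch. It assumes the omnidirectional image is stored in equirectangular form (the abstract does not state the format) and that a target direction is already known. The function names, the default field of view, and the angle conventions are hypothetical and are not taken from the paper. The sketch only computes a pixel window and does not cover how VisPhoto chooses the region from the user's preference.

```python
def crop_window(width, height, yaw_deg, pitch_deg, hfov_deg=60.0, vfov_deg=45.0):
    """Return (left, top, crop_w, crop_h) in pixels for an equirectangular image.

    Assumed conventions (illustrative only):
      yaw_deg   -180..180, where 0 maps to the centre column of the image
      pitch_deg  -90..90,  where 90 maps to the top row
    """
    # An equirectangular image spans 360 degrees horizontally and 180 vertically.
    crop_w = round(width * hfov_deg / 360.0)
    crop_h = round(height * vfov_deg / 180.0)

    # Convert the target direction to pixel coordinates of the window centre.
    center_x = (yaw_deg + 180.0) / 360.0 * width
    center_y = (90.0 - pitch_deg) / 180.0 * height

    # Horizontally the image wraps around, so the left edge is taken modulo width.
    left = round(center_x - crop_w / 2) % width
    # Vertically there is no wrap, so the window is clamped to the image.
    top = min(max(round(center_y - crop_h / 2), 0), height - crop_h)
    return left, top, crop_w, crop_h


def crop_columns(width, left, crop_w):
    """List the source column indices of the window, wrapping across the seam."""
    return [(left + i) % width for i in range(crop_w)]


if __name__ == "__main__":
    print(crop_window(3600, 1800, yaw_deg=0, pitch_deg=0))
    print(crop_window(3600, 1800, yaw_deg=180, pitch_deg=90))
```

A plain rectangular window like this keeps the distortion of the equirectangular projection, which grows toward the poles. A practical system would more likely reproject the selected direction to a perspective view, so this is best read as an outline of the idea rather than a usable cropper.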

Supplemental Material

MP4 File: Video figure (28.82 MB)

Cited By

Xu A, Cai M, Hou D, Chang R, Guo A (2024). ImageExplorer Deployment: Understanding Text-Based and Touch-Based Image Exploration in the Wild. Proceedings of the 21st International Web for All Conference, 59-69. Online publication date: 13-May-2024.
https://dl.acm.org/doi/10.1145/3677846.3677861

Chang R, Liu Y, Zhang L, Guo A (2024). EditScribe: Non-Visual Image Editing with Natural Language Verification Loops. Proceedings of the 26th International ACM SIGACCESS Conference on Computers and Accessibility, 1-19. Online publication date: 27-Oct-2024.
https://dl.acm.org/doi/10.1145/3663548.3675599

Chang R, Liu Y, Guo A (2024). WorldScribe: Towards Context-Aware Live Visual Descriptions. Proceedings of the 37th Annual ACM Symposium on User Interface Software and Technology, 1-18. Online publication date: 13-Oct-2024.
https://dl.acm.org/doi/10.1145/3654777.3676375

Index Terms

VisPhoto: Photography for People with Visual Impairments via Post-Production of Omnidirectional Camera Imaging
1. Human-centered computing
  1. Accessibility
    1. Accessibility systems and tools

Published In

ASSETS '23: Proceedings of the 25th International ACM SIGACCESS Conference on Computers and Accessibility

October 2023

1163 pages

ISBN:9798400702204

DOI:10.1145/3597638

Copyright © 2023 Owner/Author.

Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the Owner/Author.

Sponsors

SIGACCESS: ACM Special Interest Group on Accessible Computing

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 22 October 2023

Qualifiers

Research-article
Research
Refereed limited

Funding Sources

JST Kakenhi
The Telecommunication Advancement Foundation

Conference

ASSETS '23

Sponsor:

SIGACCESS

ASSETS '23: The 25th International ACM SIGACCESS Conference on Computers and Accessibility

October 22 - 25, 2023

New York, NY, USA

Acceptance Rates

ASSETS '23 Paper Acceptance Rate 55 of 182 submissions, 30%;

Overall Acceptance Rate 436 of 1,556 submissions, 28%
