Research article | Open access

Guiding Novice Web Workers in Making Image Descriptions Using Templates

Published: 19 November 2015

Abstract

This article compares two methods of employing novice Web workers to author descriptions of science, technology, engineering, and mathematics images to make them accessible to individuals with visual and print-reading disabilities. The goal is to identify methods of creating image descriptions that are inexpensive, effective, and follow established accessibility guidelines. The first method explicitly presented the guidelines to the worker, who then constructed the image description in an empty text box and table. The second method queried the worker for image information and then used the responses to construct a template-based description according to established guidelines. The descriptions generated through queried image description (QID) were more likely to include information on the image category, title, caption, and units. They were also more similar to one another, based on Jaccard distances of q-grams, indicating that their word usage and structure were more standardized. Finally, the workers preferred describing images using QID and found the task easier. Therefore, explicit instruction on image-description guidelines is not sufficient to produce quality image descriptions when using novice Web workers. Instead, it is better to query workers for information about images and then generate descriptions from their responses using templates.
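
The similarity claim above rests on Jaccard distances computed over q-grams (length-q substrings; see Ukkonen [30]). The following is a minimal Python sketch of that metric, assuming character q-grams and a set-based Jaccard distance; the article's exact tokenization and choice of q are not reproduced here, and the sample descriptions are invented.

    def qgrams(text, q=3):
        """Return the set of character q-grams (length-q substrings) of a string."""
        text = text.lower()
        return {text[i:i + q] for i in range(max(len(text) - q + 1, 0))}

    def jaccard_distance(a, b, q=3):
        """Jaccard distance between two strings over their q-gram sets:
        1 - |A & B| / |A | B|. Identical texts give 0.0, disjoint texts give 1.0."""
        A, B = qgrams(a, q), qgrams(b, q)
        if not A and not B:
            return 0.0
        return 1.0 - len(A & B) / len(A | B)

    # Invented example: two worker-written descriptions of the same chart.
    d1 = "Bar graph showing average monthly rainfall in millimeters."
    d2 = "Bar graph of average rainfall per month, measured in millimeters."
    print(jaccard_distance(d1, d2))  # lower values mean more similar wording

Lower pairwise distances across a set of descriptions indicate more uniform wording and structure, which is the sense in which the abstract reports QID descriptions as more standardized.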

References

[1] Benetech. 2012. POET image description tool. Retrieved October 10, 2015, from http://diagramcenter.org/development/poet.html.
[2] Benetech and Touch Graphics. 2014. Decision tree: Image sorting tool. Retrieved October 10, 2015, from http://diagramcenter.org/decision-tree.html.
[3] Tim Berners-Lee, James Hendler, and Ora Lassila. 2001. The Semantic Web. Scientific American 284, 5, 28--37.
[4] Jeffrey P. Bigham, Chandrika Jayant, Hanjie Ji, Greg Little, Andrew Miller, Robert C. Miller, Robin Miller, Aubrey Tatarowicz, Brandyn White, Samual White, and Tom Yeh. 2010a. VizWiz: Nearly real-time answers to visual questions. In Proceedings of the 23rd Annual Symposium on User Interface Software and Technology. ACM, New York, NY, 333--342.
[5] Jeffrey P. Bigham, Chandrika Jayant, Andrew Miller, Brandyn White, and Tom Yeh. 2010b. VizWiz: LocateIt-enabling blind people to locate objects in their environment. In Proceedings of the Computer Society Conference on Computer Vision and Pattern Recognition Workshops (CVPRW’10). IEEE, Los Alamitos, CA, 65--72.
[6] Jeffrey P. Bigham, Richard E. Ladner, and Yevgen Borodin. 2011. The design of human-powered access technology. In Proceedings of the 13th International Conference on Computers and Accessibility (SIGACCESS’11). ACM, New York, NY, 3--10.
[7] Rune Haubo Bojesen Christensen, Hye-Seong Lee, and Per Bruun Brockhoff. 2012. Estimation of the Thurstonian model for the 2-AC protocol. Food Quality and Preference 24, 1, 119--128.
[8] Leonid Boytsov. 2011. Indexing methods for approximate dictionary searching: Comparative analysis. Journal of Experimental Algorithmics 16, 1.
[9] Sandra Carberry, Stephanie Elzer Schwartz, Kathleen McCoy, Seniz Demir, Peng Wu, Charles Greenbacker, Daniel Chester, Edward Schwartz, David Oliver, and Priscilla Moraes. 2012. Access to multimodal articles for individuals with sight impairments. ACM Transactions on Interactive Intelligent Systems 2, 4, 21.
[10] Surajit Chaudhuri, Kris Ganjam, Venkatesh Ganti, and Rajeev Motwani. 2003. Robust and efficient fuzzy match for online data cleaning. In Proceedings of the International Conference on Management of Data (SIGMOD’03). ACM, New York, NY, 313--324.
[11] Daniel Dardailler. 1997. The ALT-Server (“An Eye for an Alt”). Retrieved October 10, 2015, from http://www.w3.org/WAI/altserv.htm.
[12] Seniz Demir, Sandra Carberry, and Kathleen F. McCoy. 2012. Summarizing information graphics textually. Computational Linguistics 38, 3, 527--574.
[13] Seniz Demir, David Oliver, Edward Schwartz, Stephanie Elzer, Sandra Carberry, Kathleen F. McCoy, and Daniel Chester. 2010. Interactive SIGHT: Textual access to simple bar charts. New Review of Hypermedia and Multimedia 16, 3, 245--279.
[14] Seniz Demir, Stephanie Elzer Schwartz, Richard Burns, and Sandra Carberry. 2013. What is being measured in an information graphic? In Computational Linguistics and Intelligent Text Processing. Springer, 501--512.
[15] Michel Dumontier, Leo Ferres, and Natalia Villanueva-Rosales. 2010. Modeling and querying graphical representations of statistical data. Web Semantics: Science, Services and Agents on the World Wide Web 8, 2, 241--254.
[16] Stephanie Elzer, Sandra Carberry, Ingrid Zukerman, Daniel Chester, Nancy Green, and Seniz Demir. 2005. A probabilistic framework for recognizing intention in information graphics. In Proceedings of the International Joint Conference on Artificial Intelligence, Vol. 19. 1042.
[17] Massimo Fasciano and Guy Lapalme. 1996. Postgraphe: A system for the generation of statistical graphics and text. In Proceedings of the 8th International Workshop on Natural Language Generation (INLG’96). 51--60.
[18] Yansong Feng and Mirella Lapata. 2010. How many words is a picture worth? Automatic caption generation for news images. In Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics. 1239--1249.
[19] Leo Ferres, Gitte Lindgaard, Livia Sumegi, and Bruce Tsuji. 2013. Evaluating a tool for improving accessibility to charts and graphs. ACM Transactions on Computer-Human Interaction 20, 5, 28.
[20] Bryan Gould, Trisha O’Connell, and Geoffrey Freed. 2008. Guidelines for describing STEM images. Retrieved October 10, 2015, from http://ncam.wgbh.org/experience_learn/educational_media/stemdx/guidelines.
[21] Chandrika Jayant, Matt Renzelmann, Dana Wen, Satria Krisnandi, Richard Ladner, and Dan Comden. 2007. Automated tactile graphics translation: In the field. In Proceedings of the 9th International Conference on Computers and Accessibility (SIGACCESS’07). ACM, New York, NY, 75--82.
[22] Geoffrey Keppel and Thomas D. Wickens. 2004. Design and Analysis: A Researcher’s Handbook (4th ed.). Pearson Education, Upper Saddle River, NJ.
[23] Richard E. Ladner, Melody Y. Ivory, Rajesh Rao, Sheryl Burgstahler, Dan Comden, Sangyun Hahn, Matthew Renzelmann, Satria Krisnandi, Mahalakshmi Ramasamy, Beverly Slabosky, Andrew Martin, Amelia Lacenski, Stuart Olsen, and Dmitri Groce. 2005. Automating tactile graphics translation. In Proceedings of the 7th International Conference on Computers and Accessibility (SIGACCESS’05). ACM, New York, NY, 150--157.
[24] Walter Lasecki, Christopher Miller, Adam Sadilek, Andrew Abumoussa, Donato Borrello, Raja Kushalnagar, and Jeffrey Bigham. 2012. Real-time captioning by groups of non-experts. In Proceedings of the 25th Annual Symposium on User Interface Software and Technology. ACM, New York, NY, 23--34.
[25] LimeSurvey Project Team/Carsten Schmitz. 2012. LimeSurvey: An Open Source Survey Tool. LimeSurvey Project, Hamburg, Germany. http://www.limesurvey.org.
[26] Kathleen F. McCoy, Sandra Carberry, Tom Roper, and Nancy Green. 2001. Towards generating textual summaries of graphs. In Proceedings of the International Conference on Universal Access in Human-Computer Interaction. 695--699.
[27] R Core Team. 2013. R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria. http://www.R-project.org.
[28] Daisuke Sato, Masatomo Kobayashi, Hironobu Takagi, and Chieko Asakawa. 2010. Social accessibility: The challenge of improving Web accessibility through collaboration. In Proceedings of the 2010 International Cross Disciplinary Conference on Web Accessibility (W4A’10). ACM, New York, NY, 28.
[29] Hironobu Takagi, Susumu Harada, Daisuke Sato, and Chieko Asakawa. 2013. Lessons learned from crowd accessibility services. In Human-Computer Interaction—INTERACT 2013. Springer, 587--604.
[30] Esko Ukkonen. 1992. Approximate string-matching with q-grams and maximal matches. Theoretical Computer Science 92, 1, 191--211.
[31] Oriol Vinyals, Alexander Toshev, Samy Bengio, and Dumitru Erhan. 2014. Show and tell: A neural image caption generator. arXiv preprint arXiv:1411.4555.
[32] Luis Von Ahn and Laura Dabbish. 2004. Labeling images with a computer game. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems. ACM, New York, NY, 319--326.
[33] Luis Von Ahn, Shiry Ginosar, Mihir Kedia, Ruoran Liu, and Manuel Blum. 2006. Improving accessibility of the Web with a computer game. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems. ACM, New York, NY, 79--82.
[34] Peng Wu, Sandra Carberry, Stephanie Elzer, and Daniel Chester. 2010. Recognizing the intended message of line graphs. In Diagrammatic Representation and Inference. Springer, 220--234.

      Reviews

      William Brinkman

      Making science, technology, engineering, and math (STEM) education available to all who are capable of learning is a moral imperative. Yet our educational system (including authors, textbook publishers, and college professors) struggles to provide appropriate access to the images and figures that are critical to learning STEM subjects.

      Morash et al. envision a system that would allow nonexpert workers (recruited through a service such as Amazon's Mechanical Turk) to create high-quality accessible descriptions (also known as alt-text) of STEM images. The success of such a system could greatly reduce the cost of making STEM teaching materials accessible, and thereby greatly increase access to STEM education for people with visual impairments. Their main contribution is to demonstrate that the design of the system's user interface influences the completeness and uniformity of the resulting alt-text.

      Current web-based systems for this problem simply present the worker with an image and a set of instructions, and allow the worker to enter his or her description as free text. The authors have created a competing system (which they call queried image description, QID) that uses an interactive survey tool to gather information from the worker and then auto-generates the image description using a template. Web workers using QID are significantly less likely to omit key information (such as captions, or units on graphs) than those using free-text entry. There is also significantly less variation in descriptions generated by different workers when using QID as compared to free-text entry, which should simplify quality control and reduce user confusion. While there is a well-founded hope that QID-generated alt-text will be more usable than free text, and comparable to alt-text created by experts, such usability testing is left as future work.

      Another notable aspect of this paper is the bringing together of "greatest hits" from several different areas of computer science research. Ukkonen's approximate string matching, Von Ahn et al.'s image labeling, and the Jaccard coefficient are all ideas that graduate students should see. This paper could therefore be the starting point for a nice seminar course.

      These ideas should find widespread adoption in the future if it can be shown that such a system generates alt-text at a level of quality comparable to an expert's. The pressures on colleges and textbook publishers to make STEM education accessible are intense, and this approach has the potential to remove one of the major barriers to doing so.

      Online Computing Reviews Service
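
      The queried-description idea the review summarizes (structured questions, then a template fill) can be illustrated with a short sketch. The field names and template wording below are hypothetical, not the survey or templates the authors actually used; the point is only that a generated description carries the category, title, caption, and units by construction.

          # Hypothetical QID-style generation: gather structured answers from a
          # worker, then fill a fixed template so no required element can be omitted.
          # Field names and wording are illustrative, not the authors' actual design.
          responses = {
              "category": "bar graph",
              "title": "Average monthly rainfall",
              "caption": "Figure 2. Rainfall recorded at the campus weather station.",
              "x_axis": "month",
              "y_axis": "rainfall",
              "units": "millimeters",
          }

          TEMPLATE = (
              'This {category} is titled "{title}". Caption: {caption} '
              "The horizontal axis shows {x_axis}; the vertical axis shows "
              "{y_axis} in {units}."
          )

          print(TEMPLATE.format(**responses))

      A free-text interface, by contrast, leaves it to the worker to remember each element, which is where the reported omissions of captions and units arise.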


      Information

      Published In

      ACM Transactions on Accessible Computing, Volume 7, Issue 4
      November 2015
      77 pages
      ISSN:1936-7228
      EISSN:1936-7236
      DOI:10.1145/2847216
      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      Published: 19 November 2015
      Accepted: 01 April 2015
      Revised: 01 April 2015
      Received: 01 September 2014
      Published in TACCESS Volume 7, Issue 4


      Author Tags

      1. Accessibility (blind and visually impaired)
      2. access technology
      3. crowdsourcing
      4. human computation
      5. image description

      Qualifiers

      • Research-article
      • Research
      • Refereed

      Funding Sources

      • Office of Special Education Programs
      • Department of Education
      • Cooperative Agreement
      • Benetech's DIAGRAM Center initiative

      Article Metrics

      • Downloads (Last 12 months): 199
      • Downloads (Last 6 weeks): 32
      Reflects downloads up to 19 Nov 2024


      Cited By

      • (2024) Evaluating the Effectiveness of STEM Images Captioning. Proceedings of the 21st International Web for All Conference, 150-159. https://doi.org/10.1145/3677846.3677863. Online publication date: 13-May-2024.
      • (2024) MAIDR Meets AI: Exploring Multimodal LLM-Based Data Visualization Interpretation by and with Blind and Low-Vision Users. Proceedings of the 26th International ACM SIGACCESS Conference on Computers and Accessibility, 1-31. https://doi.org/10.1145/3663548.3675660. Online publication date: 27-Oct-2024.
      • (2024) Context-Aware Image Descriptions for Web Accessibility. Proceedings of the 26th International ACM SIGACCESS Conference on Computers and Accessibility, 1-17. https://doi.org/10.1145/3663548.3675658. Online publication date: 27-Oct-2024.
      • (2024) FigurA11y: AI Assistance for Writing Scientific Alt Text. Proceedings of the 29th International Conference on Intelligent User Interfaces, 886-906. https://doi.org/10.1145/3640543.3645212. Online publication date: 18-Mar-2024.
      • (2024) Natural Language Dataset Generation Framework for Visualizations Powered by Large Language Models. Proceedings of the 2024 CHI Conference on Human Factors in Computing Systems, 1-22. https://doi.org/10.1145/3613904.3642943. Online publication date: 11-May-2024.
      • (2024) MAIDR: Making Statistical Visualizations Accessible with Multimodal Data Representation. Proceedings of the 2024 CHI Conference on Human Factors in Computing Systems, 1-22. https://doi.org/10.1145/3613904.3642730. Online publication date: 11-May-2024.
      • (2024) Designing Unobtrusive Modulated Electrotactile Feedback on Fingertip Edge to Assist Blind and Low Vision (BLV) People in Comprehending Charts. Proceedings of the 2024 CHI Conference on Human Factors in Computing Systems, 1-20. https://doi.org/10.1145/3613904.3642546. Online publication date: 11-May-2024.
      • (2024) “It’s Kind of Context Dependent”: Understanding Blind and Low Vision People’s Video Accessibility Preferences Across Viewing Scenarios. Proceedings of the 2024 CHI Conference on Human Factors in Computing Systems, 1-20. https://doi.org/10.1145/3613904.3642238. Online publication date: 11-May-2024.
      • (2023) WATAA: Web Alternative Text Authoring Assistant for Improving Web Content Accessibility. Companion Proceedings of the 28th International Conference on Intelligent User Interfaces, 41-45. https://doi.org/10.1145/3581754.3584127. Online publication date: 27-Mar-2023.
      • (2023) The Accessibility of Data Visualizations on the Web for Screen Reader Users: Practices and Experiences During COVID-19. ACM Transactions on Accessible Computing 16, 1, 1-29. https://doi.org/10.1145/3557899. Online publication date: 29-Mar-2023.
