Measuring text simplification with the crowd

Authors:

Walter S. Lasecki,

Jeffrey P. Bigham

W4A '15: Proceedings of the 12th International Web for All Conference

Article No.: 4, Pages 1 - 9

https://doi.org/10.1145/2745555.2746658

Published: 18 May 2015

Abstract

Text can often be complex and difficult to read, especially for people with cognitive impairments or low literacy skills. Text simplification is a process that reduces the complexity of both the wording and the structure of a sentence while retaining its meaning. However, this is currently a challenging task for machines, so providing effective on-demand text simplification to those who need it remains an unsolved problem. Even evaluating the simplicity of text is difficult, both for computers, which cannot understand the meaning of text, and for humans, who often struggle to agree on what constitutes a good simplification.

This paper focuses on the evaluation of English text simplification using the crowd. We show that leveraging crowds can result in a collective decision that is accurate and converges to a consensus rating. Our results from 2,500 crowd annotations show that the crowd can effectively rate levels of simplicity. This may give simplification systems and system builders better feedback about how well content is being simplified than standard measures provide, since those measures only classify content as 'simplified' or 'not simplified'. Our study provides evidence that the crowd could be used to evaluate English text simplification and, in future work, to create simplified text.
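
As a rough illustration of what converging to a consensus rating can mean in practice, the sketch below averages numeric simplicity ratings in arrival order and reports how many ratings were needed before the running mean stayed close to the final mean. This is a minimal sketch under stated assumptions, not the aggregation method used in the paper: the numeric rating scale, the function names `running_consensus` and `ratings_needed`, the `tolerance` parameter, and the sample ratings are all hypothetical.

```python
def running_consensus(ratings):
    """Return the running mean after each successive crowd rating."""
    if not ratings:
        raise ValueError("at least one rating is required")
    means = []
    total = 0.0
    for count, rating in enumerate(ratings, start=1):
        total += rating
        means.append(total / count)
    return means


def ratings_needed(ratings, tolerance=0.25):
    """Return the smallest number of ratings after which the running mean
    stays within `tolerance` of the final mean for all later ratings.

    `tolerance` is a hypothetical threshold chosen for illustration.
    """
    means = running_consensus(ratings)
    final = means[-1]
    needed = len(means)
    # Walk backwards from the last rating until the running mean
    # first falls outside the tolerance band around the final mean.
    for index in range(len(means) - 1, -1, -1):
        if abs(means[index] - final) > tolerance:
            break
        needed = index + 1
    return needed


if __name__ == "__main__":
    # Hypothetical ratings for one sentence on an assumed 1-5 scale.
    sample = [2, 4, 3, 3, 3, 3]
    print(running_consensus(sample))
    print(ratings_needed(sample))
```

A plain mean is only one possible aggregate. A median or a majority vote over discrete levels would follow the same pattern, and the choice depends on how the rating scale is defined.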

Index Terms

1. Social and professional topics
  1. Professional topics
    1. Computing profession
      1. Assistive technologies
  2. User characteristics
    1. People with disabilities

Published In

W4A '15: Proceedings of the 12th International Web for All Conference

May 2015

214 pages

ISBN:9781450333429

DOI:10.1145/2745555

General Chairs: Luis Carriço (University of Lisbon, Portugal), Silvia Mirri (University of Bologna, Italy)

Program Chairs: Tiago Guerreiro (University of Lisbon, Portugal), Peter Thiessen (Midokura, Spain)

Copyright © 2015 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Sponsors

SIGWEB: ACM Special Interest Group on Hypertext, Hypermedia, and Web
Intuit: Intuit Inc.
Google Inc.
Zakon Group
SIGCHI: ACM Special Interest Group on Computer-Human Interaction
SIGACCESS: ACM Special Interest Group on Accessible Computing
Ability Magazine: Ability Magazine
TPG: The Paciello Group
IBM: IBM

Publisher

Association for Computing Machinery

New York, NY, United States

Conference

W4A '15

W4A '15: International Web for All Conference

May 18 - 20, 2015

Florence, Italy

Acceptance Rates

W4A '15 Paper Acceptance Rate 11 of 31 submissions, 35%;

Overall Acceptance Rate 171 of 371 submissions, 46%
