DOI: 10.1145/2745555.2746658
research-article

Measuring text simplification with the crowd

Published: 18 May 2015

Abstract

Text can often be complex and difficult to read, especially for people with cognitive impairments or low literacy skills. Text simplification is a process that reduces the complexity of both wording and structure in a sentence, while retaining its meaning. However, this is currently a challenging task for machines, and thus, providing effective on-demand text simplification to those who need it remains an unsolved problem. Even evaluating the simplicity of text remains a challenging problem for both computers, which cannot understand the meaning of text, and humans, who often struggle to agree on what constitutes a good simplification.
This paper focuses on the evaluation of English text simplification using the crowd. We show that leveraging crowds can result in a collective decision that is accurate and converges to a consensus rating. Our results from 2,500 crowd annotations show that the crowd can effectively rate levels of simplicity. This may allow simplification systems and system builders to get better feedback about how well content is being simplified, as compared to standard measures which classify content into 'simplified' or 'not simplified' categories. Our study provides evidence that the crowd could be used to evaluate English text simplification, as well as to create simplified text in future work.
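
The idea of a crowd rating that converges to a consensus can be illustrated with a small sketch. This is not the paper's procedure: the 1-to-5 rating scale, the use of a running mean, and the names `running_consensus`, `annotations_until_stable`, `tolerance`, and `window` are all assumptions made for illustration.

```python
def running_consensus(ratings):
    """Return the running mean of the ratings after each new annotation."""
    means = []
    total = 0
    for count, rating in enumerate(ratings, start=1):
        total += rating
        means.append(total / count)
    return means


def annotations_until_stable(ratings, tolerance=0.25, window=3):
    """Return the smallest number of annotations n such that the next
    `window` running means all stay within `tolerance` of the mean
    after n annotations. Return None if that never happens.

    Both `tolerance` and `window` are hypothetical parameters.
    """
    means = running_consensus(ratings)
    for n in range(1, len(means) - window + 1):
        anchor = means[n - 1]
        if all(abs(m - anchor) <= tolerance for m in means[n:n + window]):
            return n
    return None


# Hypothetical simplicity ratings (1 = very complex, 5 = very simple)
# given by eight crowd workers to one sentence.
ratings = [2, 5, 3, 4, 4, 3, 4, 4]
print(running_consensus(ratings))
print(annotations_until_stable(ratings))
```

A running mean is only one possible aggregate. A median or a majority vote could be substituted in `running_consensus` if ratings are treated as ordinal rather than numeric.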

Published In

W4A '15: Proceedings of the 12th International Web for All Conference
May 2015
214 pages
ISBN:9781450333429
DOI:10.1145/2745555
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. NLP
  2. accessibility
  3. crowdsourcing
  4. text simplification

Qualifiers

  • Research-article

Conference

W4A '15: International Web for All Conference
May 18 - 20, 2015
Florence, Italy

Acceptance Rates

W4A '15 Paper Acceptance Rate 11 of 31 submissions, 35%;
Overall Acceptance Rate 171 of 371 submissions, 46%

Cited By

  • (2021) Comparison of Methods for Evaluating Complexity of Simplified Texts among Deaf and Hard-of-Hearing Adults at Different Literacy Levels. Proceedings of the 2021 CHI Conference on Human Factors in Computing Systems, pages 1-12. DOI: 10.1145/3411764.3445038. Online publication date: 6-May-2021
  • (2020) Patient Oriented Readability Assessment for Heart Disease Healthcare Documents. International Journal of Data Warehousing and Mining, 16(1), pages 63-72. DOI: 10.4018/IJDWM.2020010104. Online publication date: 1-Oct-2020
  • (2019) Causal Effects of Brevity on Style and Success in Social Media. Proceedings of the ACM on Human-Computer Interaction, 3(CSCW), pages 1-23. DOI: 10.1145/3359147. Online publication date: 7-Nov-2019
  • (2019) Emotional Analysis with News Using Text Mining for Framing Theory. Computer and Information Science, pages 95-108. DOI: 10.1007/978-3-030-25213-7_7. Online publication date: 7-Aug-2019
  • (2018) A semantic QA-based approach for text summarization evaluation. Proceedings of the Thirty-Second AAAI Conference on Artificial Intelligence and Thirtieth Innovative Applications of Artificial Intelligence Conference and Eighth AAAI Symposium on Educational Advances in Artificial Intelligence, pages 4800-4807. DOI: 10.5555/3504035.3504623. Online publication date: 2-Feb-2018
  • (2018) Towards Ubiquitous RE: A Perspective on Requirements Engineering in the Era of Digital Transformation. 2018 IEEE 26th International Requirements Engineering Conference (RE), pages 205-216. DOI: 10.1109/RE.2018.00029. Online publication date: Aug-2018
  • (2018) Perusal of readability with focus on web content understandability. Journal of King Saud University - Computer and Information Sciences. DOI: 10.1016/j.jksuci.2018.03.007. Online publication date: Mar-2018
  • (2018) Effective Crowdsourced Generation of Training Data for Chatbots Natural Language Understanding. Web Engineering, pages 114-128. DOI: 10.1007/978-3-319-91662-0_8. Online publication date: 20-May-2018
  • (2016) Accessibility barriers to online education for young adults with intellectual disabilities. Proceedings of the 13th International Web for All Conference, pages 1-10. DOI: 10.1145/2899475.2899481. Online publication date: 11-Apr-2016
