DOI: 10.5555/1866795.1866796

Readability assessment for text simplification

Published: 05 June 2010


We describe a readability assessment approach to support text simplification for low-literacy readers. Given an input text, the goal is to predict its readability level, which corresponds to the literacy level expected of the target reader: rudimentary, basic, or advanced. We complement features traditionally used for readability assessment with a number of new features, and we experiment with three alternative ways of modeling the problem with machine learning: classification, regression, and ranking. The best resulting model is embedded in an authoring tool for text simplification.
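To make the three modeling options concrete, the following is a minimal sketch of how the same labelled texts could be turned into targets for classification, regression, and ranking. It is an illustration only, not the authors' implementation: the function names, the 1-to-3 numeric scale, and the toy corpus are assumptions, and no features or learners from the paper are reproduced.

```python
from itertools import combinations

# The three literacy levels named in the abstract, ordered from easiest to hardest.
LEVELS = ["rudimentary", "basic", "advanced"]


def classification_target(level):
    """Classification: the level is used as a plain category label."""
    return level


def regression_target(level):
    """Regression: the level becomes a number (assumed scale 1..3)."""
    return float(LEVELS.index(level) + 1)


def ranking_pairs(labelled_texts):
    """Ranking: emit (easier_id, harder_id) pairs for texts at different levels."""
    pairs = []
    for (id_a, lvl_a), (id_b, lvl_b) in combinations(labelled_texts, 2):
        rank_a, rank_b = LEVELS.index(lvl_a), LEVELS.index(lvl_b)
        if rank_a < rank_b:
            pairs.append((id_a, id_b))
        elif rank_b < rank_a:
            pairs.append((id_b, id_a))
        # Texts at the same level carry no ordering information and are skipped.
    return pairs


def level_from_score(score):
    """Map a regression output back to the nearest level, clamped to 1..3."""
    index = min(max(round(score), 1), len(LEVELS)) - 1
    return LEVELS[index]


# Hypothetical toy corpus of (text id, gold level); not data from the paper.
corpus = [("t1", "rudimentary"), ("t2", "advanced"), ("t3", "basic"), ("t4", "basic")]

class_targets = [classification_target(lvl) for _, lvl in corpus]
reg_targets = [regression_target(lvl) for _, lvl in corpus]
pairs = ranking_pairs(corpus)
```

The design difference is in what each framing assumes about the levels: classification treats them as unordered categories, regression assumes they lie on a numeric scale with equal spacing, and ranking uses only their relative order.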









Published In

IUNLPBEA '10: Proceedings of the NAACL HLT 2010 Fifth Workshop on Innovative Use of NLP for Building Educational Applications
June 2010
105 pages


Publisher: Association for Computational Linguistics, United States



  • Research-article

Acceptance Rates

IUNLPBEA '10 Paper Acceptance Rate 13 of 28 submissions, 46%;
Overall Acceptance Rate 13 of 28 submissions, 46%



Bibliometrics & Citations


Article Metrics

  • Downloads (Last 12 months): 97
  • Downloads (Last 6 weeks): 14
Reflects downloads up to 17 Feb 2025



Cited By

  • (2023) Approaches, Methods, and Resources for Assessing the Readability of Arabic Texts. ACM Transactions on Asian and Low-Resource Language Information Processing 22:4 (1-30). DOI: 10.1145/3571510. Online publication date: 25-Mar-2023
  • (2022) Reading-Assistance Tools Among Deaf and Hard-of-Hearing Computing Professionals in the U.S.: Their Reading Experiences, Interests and Perceptions of Social Accessibility. ACM Transactions on Accessible Computing 15:2 (1-31). DOI: 10.1145/3520198. Online publication date: 19-May-2022
  • (2019) Understanding Reader Backtracking Behavior in Online News Articles. The World Wide Web Conference (3237-3243). DOI: 10.1145/3308558.3313571. Online publication date: 13-May-2019
  • (2015) Making It Simplext. ACM Transactions on Accessible Computing 6:4 (1-36). DOI: 10.1145/2738046. Online publication date: 11-May-2015
  • (2014) What makes a good biography? Proceedings of the 23rd international conference on World wide web (855-866). DOI: 10.1145/2566486.2567972. Online publication date: 7-Apr-2014
  • (2014) Readability Classification of Bangla Texts. Proceedings of the 15th International Conference on Computational Linguistics and Intelligent Text Processing - Volume 8404 (507-518). DOI: 10.1007/978-3-642-54903-8_42. Online publication date: 6-Apr-2014
  • (2013) ERNESTA. Proceedings of the 14th international conference on Computational Linguistics and Intelligent Text Processing - Volume 2 (476-487). DOI: 10.1007/978-3-642-37256-8_39. Online publication date: 24-Mar-2013
  • (2012) Comparing human versus automatic feature extraction for fine-grained elementary readability assessment. Proceedings of the First Workshop on Predicting and Improving Text Readability for target reader populations (58-64). DOI: 10.5555/2390916.2390926. Online publication date: 7-Jun-2012
  • (2012) Do NLP and machine learning improve traditional readability formulas? Proceedings of the First Workshop on Predicting and Improving Text Readability for target reader populations (49-57). DOI: 10.5555/2390916.2390925. Online publication date: 7-Jun-2012
  • (2012) Making readability indices readable. Proceedings of the First Workshop on Predicting and Improving Text Readability for target reader populations (40-48). DOI: 10.5555/2390916.2390924. Online publication date: 7-Jun-2012
