Abstract
This study set out to delineate the linguistic features that characterize the English reading texts used at the B2 (Independent User) and C1 (Advanced User) levels of the Greek State Certificate of English Language Proficiency (KPG) exams, in order to better define text complexity per level of competence. The main outcome of the research was the L.A.S.T. Text Difficulty Index, which makes possible the automatic classification of B2 and C1 English reading texts based on four in-depth linguistic features: lexical density, syntactic structure similarity, tokens per word family, and academic vocabulary. Given that the predictive accuracy of the formula reached 80% on a new set of reading comprehension texts, with 32 of the 40 new texts assigned to similar levels by both raters, the practical usefulness of the index might extend to EFL testers and materials writers, who are in constant need of calibrated texts.
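As a rough illustration of two quantities named in the abstract, the sketch below computes a simple lexical density score and the agreement rate behind the reported 80% figure. It is not the L.A.S.T. formula, which the abstract does not state. The function-word list, the tokenization, and the names `lexical_density` and `agreement_rate` are assumptions made for illustration only; the chapter's own operationalization of lexical density may differ.

```python
# Illustrative sketch only: the function-word list and the density definition
# are assumptions, not the procedure used in the chapter.
FUNCTION_WORDS = frozenset({
    "a", "an", "the", "and", "or", "but", "of", "in", "on", "at", "to",
    "is", "are", "was", "were", "be", "it", "this", "that", "with", "for",
})


def lexical_density(text: str) -> float:
    """Share of tokens that are not in the (hypothetical) function-word list."""
    # Split on whitespace, strip surrounding punctuation, and lowercase.
    tokens = [t.strip(".,;:!?\"'()").lower() for t in text.split()]
    tokens = [t for t in tokens if t]
    if not tokens:
        return 0.0
    content = [t for t in tokens if t not in FUNCTION_WORDS]
    return len(content) / len(tokens)


def agreement_rate(matching: int, total: int) -> float:
    """Proportion of texts assigned to the same level by both classifications."""
    if total <= 0:
        raise ValueError("total must be positive")
    return matching / total


# 32 of the 40 new texts received matching levels, as reported in the abstract.
print(agreement_rate(32, 40))
# Density of a toy sentence under the assumed function-word list.
print(lexical_density("The cat sat on the mat."))
```

A real replication would replace the toy word list with a part-of-speech tagger or an established function-word inventory, and would combine the four features in a fitted model rather than inspecting them one at a time.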
Copyright information
© 2014 Springer International Publishing Switzerland
About this paper
Cite this paper
Liontou, T. (2014). Focused Information Retrieval & English Language Instruction: A New Text Complexity Algorithm for Automatic Text Classification. In: Prasath, R., O’Reilly, P., Kathirvalavakumar, T. (eds) Mining Intelligence and Knowledge Exploration. Lecture Notes in Computer Science, vol 8891. Springer, Cham. https://doi.org/10.1007/978-3-319-13817-6_13
DOI: https://doi.org/10.1007/978-3-319-13817-6_13
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-13816-9
Online ISBN: 978-3-319-13817-6
eBook Packages: Computer Science (R0)