Automated Writing Evaluation in an EFL Setting: Lessons From China
jaltcalljournal
issn 1832-4215
Vol. 13, No. 2, Pages 117–146
©2017 jalt call sig
Introduction
Though automated writing evaluation (awe), which employs artificial intelligence to evaluate essays and offer feedback, has been in existence since the 1960s, its use in assessment and instruction remains controversial. Some argue that use of such software dehumanizes the writing process, violating the social and communicative nature of writing (cccc Executive Committee, 2004; Ericsson, 2006). Others argue on behalf of automated evaluation, referring to research evidence demonstrating that its reliability is comparable to human
scoring (Elliot, 2003; Klobucar, Elliot, Deess, Rudniy, & Joshi, 2013; Page, 1994; Ramineni,
2013; Shermis & Burstein, 2003). Despite the ongoing debate, there has been growing use of awe software. Educational Testing Service (Heilman & Tetreault, 2012) reports that its e-rater program scored some 5.8 million scripts in 2010 and was at that time utilized in more than 20 applications, including as part of the gre and toefl tests. awe products for the
classroom have been used in schools (e.g., Grimes & Warschauer, 2010; Rich, Harrington,
Kim, & West, 2008; Warschauer & Grimes, 2008; White, Hixson, D’Brot, Perdue, Foster, &
Rhudy, 2010), and universities (e.g., Chen & Cheng, 2008; Li, Link, Ma, Yang, & Hegelheimer,
2014; Li, Link, & Hegelheimer, 2015; Link, Dursun, Karakaya, & Hegelheimer, 2014; Tang &
Rich, 2011; Tang & Wu, 2012; Wang, Shang, & Briody, 2013; Wu & Tang, 2012).
This paper aims to enrich the current literature by summarizing the main impact of awe in Chinese efl classrooms, through an analysis of a series of studies conducted on the use of
awe by secondary and university students who learn English as a foreign language (efl)
in China. We first review relevant research on the use of awe in the classroom, and then
report and discuss the results of these studies.
Warschauer and Grimes’s (2008) mixed-methods exploratory case study of four schools’ use of two awe programs revealed that although the programs encouraged students to revise more, the revision was largely limited to language forms, with little revision of content or organization. In addition, teachers’ use of awe varied from school to school and was determined most by teachers’ prior beliefs about writing pedagogy, which arguably pointed to the need for teacher training on writing pedagogy if awe were to be successfully used in the classroom.
Grimes and Warschauer (2010) conducted a 3-year longitudinal study on the use of awe
in eight schools in California and concluded that awe motivated students to write and
revise more and promoted learner autonomy. They attributed the successful use of awe
partly to the maturity of the awe programs in the study, but more importantly to the
local social factors such as technical, administrative and teacher support, which seemed to verify the claim that the key to technology use might be neither hardware nor software, but rather humanware (Warschauer & Meskill, 2000).
In the efl context, Wang et al. (2013) investigated the impact and effect of using awe
on freshmen writing with a group of 57 students from a university. They used a quasi-
experimental pre-post-test research design, and the results showed a significant difference between the experimental and control groups in writing accuracy, with the experimental group demonstrating obvious gains in both writing accuracy and learner autonomy awareness. In discussing the pedagogical implications, they suggested that teachers should be more actively involved, moving from teaching students structure to teaching students models of writing, so that students knew both how to improve their language accuracy and how to improve their writing content and structure.
In examining the impact of awe corrective feedback on writing accuracy with 70 non-
native esl students in a us university, Li et al. (2015) found that the corrective feedback had
helped increase the number of revisions and improve the accuracy. Their study seemed to
support the claim of the usefulness of the practice proposed by Chen and Cheng (2008) of
requiring a minimum score before submission to awe. Moreover, similar to previous studies (e.g., Grimes & Warschauer, 2010; Warschauer & Grimes, 2008; Wang et al., 2013; Wu & Tang, 2012), their study reinforced the important role of teachers and suggested that the instructor’s way of implementing awe might affect how students engage in revising with awe.
It might be argued that, except for one study (i.e., Wang et al., 2013), the rest did not have a control group; therefore the claimed awe effect on writing performance needs to be interpreted with caution. More importantly, though the importance of teachers’ pedagogical roles has been implied or suggested in some of the studies (e.g., Li et al., 2015; Wang et al., 2013; Warschauer & Grimes, 2008), no systematic training on writing pedagogy was provided to teachers in the studies reviewed. Furthermore, none of these awe studies so far has suggested a tentative procedure for using awe effectively in the classroom; in most cases, the way of using awe mainly depended on teachers (e.g., Link et al., 2014).
In the light of the aforementioned understanding, the current research sought to con-
tribute in these areas by investigating how a group of teacher researchers explored the use
of awe in their efl classrooms and lessons. Our study differs from the previous studies
in that the intervention measures include not just awe but the awe-integrated teaching
experiment and teacher training on writing pedagogy with ongoing support. For the use of awe, our study introduced a tentative procedure for integrating awe in the classroom based on previous studies (e.g., Chen & Cheng, 2008; Li et al., 2015; Link et al., 2014; Tang, 2014); therefore teachers were not left alone, but rather were provided with a working reference framework for how to use the tool in the classroom.
Theoretical approaches
Following Grimes and Warschauer (2010), the current study adopted a social informatics theoretical approach toward the use of awe, viewing technologies, people, and organizations as a “heterogeneous socio-technical network” (Kling, 1999). Rather than the “tools”
view which focuses only on the technology per se, this approach embraces a more com-
plicated and locally situated process of technology integration. Social informatics theory
informed the current research design and drew our attention to the more important local
factors such as people and organizations in shaping the use of technology. In the light of
this understanding, teacher training not only on technology but also on writing pedagogy,
and continuous teacher support were provided throughout the experiment in this study.
In addition, participatory design (pd), commonly used in human-computer educational research to engage the users of computer systems in designing and revising those systems (Steen, 2013), and exploratory practice (ep), practitioner-based research combining research and classroom teaching in the natural setting with the aim of resolving teachers’ and students’ “puzzles” or “problems” in the classroom (Allwright, 2003), were also pertinent to our study design in that our teachers, rather than being subjects to be investigated, were supported to be active researchers and encouraged to explore the best way of using awe in their own teaching context.
Tang & Rich: Automated writing evaluation in an EFL setting
The current study concentrates on the classroom use of an awe software tool, Writing
Roadmap, in the Chinese efl context. The mixed methods quasi-experimental study draws
on questionnaires, journals, interviews and pre- and post-tests. It aims to investigate the
following questions:
1. How does use of awe affect students’ writing performance in English in China?
2. What is the impact of awe use on teaching and learning processes?
Method
The participants
The participants reported in this study consisted of 268 senior high school students from six
intact classes, 460 tertiary students from five universities, and ten teachers (three teachers
from the senior high school and seven from across the five universities).
The high school group comprised three cohorts, the first from Senior High 1 (the first grade of the senior high school), with one class as the experimental group and the other as the control (see Table 1). The second and third cohorts were both from Senior High 2 (the second grade of the senior high school). The students’ ages ranged from 15 to 17, and they had received at least nine years of English language education: six years in primary school and three years in junior high school. Upon junior high graduation, students should be able to use English for basic communication and have a vocabulary of 1,500 words. The initial language proficiency of the experimental and control groups was parallel, based either on performance on the high school entrance exam (cohort 1) or on end-of-term exams from the previous year (cohorts 2 and 3).
For the university group of 460 students, 224 were in the experimental group and 236 in the control group (see Table 2). The universities ranged from teacher education and polytechnic to comprehensive institutions. The majority of the students were non-English majors (390 students, 85%), arts majors (including English) (327 students, 71%), and in their second year at university (306 students, 67%). The first-year students had completed at least 12 years of English language learning before university, could use English for communication, and were required to master at least 3,500 words. The second-year students had studied English for one more year in college, and their vocabulary was expected to reach 4,500 words.
Three teachers participated in the high school study, with each teacher assigned one experimental group and one control group. The three teachers’ teaching experience ranged from two to five years, and they all held Master of Arts degrees in applied linguistics or English language teaching methodology. They were all under 30 years old at the time of this study and were interested in exploring the use of technology to enhance instruction.
The seven university teachers’ teaching experience varied from five to fifteen years; one held a PhD, and the remaining six held Master of Arts degrees in applied linguistics or language education. They were
interested in using technology to improve teaching and volunteered to participate in this
research project.
Research interventions
The intervention measures included introducing a teaching experiment using the Writing
Roadmap software, and offering support to the participating teachers. The teaching experi-
ment extended from September 2010 to July 2011. The students were divided into two groups: experimental and control. Each group was a natural intact class, and the experimental and control groups were made up of two classes of parallel language proficiency levels (based on their end-of-term tests or high-stakes exams such as the senior high school entrance test and the college entrance test). In addition, the same teacher
taught both groups to reduce the number of variables affecting the efficacy of the teaching
experiment. The two groups participated in a pre-test before the experiment and a post-test
after the experiment. The tests were in essay format and administered in the awe system
(see Appendix A).
The software. The automated writing assessment tool investigated in this study is Writing
Roadmap (wrm) from ctb/McGraw-Hill. It provides a set of six-trait writing rubrics or
assessment criteria (ac), each with a set of indicating components. The six traits are “Ideas
and Content”, “Organization”, “Voice”, “Word Choice”, “Fluency”, and “Conventions.” wrm
offers immediate online feedback through highlighting problematic sections, narrative
comments, discrete (trait-specific) and holistic scores, and remarking and rescoring on
revised versions. It also provides a set of writing assistance tools such as “hint,” “tutor,”
“thesaurus,” and “grammar tree” to offer tips on improving writing, on grammar and syntax
and choice of words (sentences with grammar errors are highlighted in blue, and words with spelling errors are in red).
The awe-integrated teaching experiment. The teaching experiment extended over two semesters for both the university and the high school groups. In the university group, except for the English majors (who wrote 11 essays), the four non-English-major classes wrote seven essays each: three essays in semester one and four in semester two. The high school group wrote seven essays on average throughout the experiment. Writing was a stand-alone course for the English majors, which explained why those students could write more essays; for the non-English majors and the high school group, writing was only a component of the general English course.
Both the experimental and control group students knew that they were using software to help them with their writing, but they were not informed whether they were in the control or experimental group, nor did they know the details of the experiment.
Based on previous studies (e.g., Chen & Cheng, 2008; Li et al., 2015; Tang, 2014; Tang &
Rich, 2011) and the local teaching context, our team proposed and implemented the follow-
ing procedure of using awe in the classroom:
1. teacher and student understanding of the awe assessment criteria (i.e., the writing rubrics, abbreviated as ac);
2. teacher-led pre-writing discussion on the writing topic;
3. autonomous writing and revision in awe with the support of awe writing assistance
tools until arriving at the required score;
4. teacher feedback based on the ac and on the awe-generated report of students’ writing
performance and peer feedback in the light of the ac;
5. revision based on teacher and peer feedback;
6. submission of essays in awe.
The process features ac comprehension and application throughout, and integrates self, teacher and peer feedback, promoting students’ autonomy, self-revision and formative learning. Moreover, a required score for revision was introduced during the autonomous writing stage to motivate students to revise until they achieved a satisfactory mark before teacher assessment. The idea of requiring a minimum score before teacher assessment was proposed in Chen and Cheng (2008), and its efficacy was verified in Li et al. (2015); however, the authors also found flaws with this practice and called for further investigation of the issue.
To summarize, the three striking features of the suggested awe integration procedure were the requirement to achieve a minimum score during the autonomous writing stage; the combination of self, peer and teacher feedback during the process; and ac comprehension and application throughout the whole process. It may be argued that though the general procedure was followed, variations also existed across classes. Table 3 demonstrates the writing instruction procedure in the different classes.
Table 3 shows that though teachers varied in their writing instruction, wrm ac interpretation was a core component in all the teachers’ experimental classes. Second, the three high school teachers demonstrated more variation in their writing instruction procedures than the university group. Except for one teacher from University B, who used in-class peer revision in both her control and experimental classes, the remaining teachers seemed to have adopted a similar procedure.
According to the teacher reports, peer feedback for the control group was conducted on paper, with students reading and commenting on each other’s essays according to the assessment criteria given by the teacher. In-class peer feedback for the experimental group was done in the computer lab, with pairs of students reviewing their essays together on the computer.
Teacher support. In response to previous research studies’ call for offering more pedagogi-
cal assistance to teachers (e.g., Grimes & Warschauer, 2010; Warschauer & Grimes, 2008),
our team made arrangements to provide timely support to teachers participating in the
research. First, a 4-member head group (hg) of the project was established to be responsible for research design, implementation, and evaluation; monitoring the experiment/research process; and providing ongoing academic and practical support. Second, technical
support was provided by ctb/McGraw-Hill and the Institute of Online Education of
Beijing Foreign Studies University for wrm system operation and maintenance. The hg
held interactive lectures at the different high schools and universities, organized network and telephone conferences and symposia, and conducted ongoing individual interactions with the participating teachers.
The key issues discussed during the support process included orientation about the
software tool wrm and the process of teaching writing; perceptions of the role of wrm
in teaching and learning (Tang & Wu, 2011); effective ways of integrating wrm in teaching
and learning (Beatty et al., 2008), for example, curriculum-based instructional design input
at the school’s level; and challenges to awe feedback and the role and implementation
of awe scoring and assessment criteria in teaching writing. Teachers were encouraged
to explore different ways of integrating awe into their teaching practice based on their
individual class needs and their local teaching context.
Research methods
It has been noted that multi-method approaches are increasingly used in research because
“mixed methods offers strengths that offset the weaknesses of separately applied quantita-
tive and qualitative research methods” (Creswell & Plano-Clark, 2006, p. 18). The mixed-
methods approach has been adopted in previous awe studies such as Wang et al. (2013), Li
et al. (2015), Warschauer and Grimes (2008), and Grimes and Warschauer (2010).
Hence the current study employed a mixed-methods qualitative and quantitative research approach, with the main aim of examining pre- to post-test changes in writing proficiency and student and teacher perceptions of, and experiences with, the use of automated assessment in the Chinese efl classroom. Questionnaires, teacher journals, interviews, and quasi-experimental pre- and post-tests were used to collect the pertinent data. The use of a mixed-methods approach is justified in that the quantitative comparison of students’ pre- and post-test scores answers the first research question, that is, whether awe use affects students’ writing performance, while the qualitative data drawn from questionnaires, journals and interviews help to reveal the impact of the awe-integrated teaching experiment on the teaching and learning process (i.e., the second research question).
We used a quasi-experimental, non-randomized control/experimental-group pre-/post-test design, examining the average growth of the two groups using gain score analysis. Students’ pre- and post-test writing prompts were administered in the Writing Roadmap online system and scored first automatically, using the generic scoring algorithm, and then by human scorers to ensure the reliability and fairness of the scores.
The post-experiment student questionnaire (see Appendix B) centered on students’ beliefs
toward English learning and writing and evaluation of the awe-integrated teaching experi-
ment. Group interviews were undertaken on students’ experience with wrm and how they
used it to revise and improve their writing. Each group interview consisted of three students
and lasted thirty minutes (see Appendix C for student interview prompts).
The teacher questionnaire concentrated on teacher perceptions toward assessment,
teacher and student communication, teaching mode, beliefs toward writing instruction,
teaching methodology and teachers’ perceptions of student learning autonomy in the appli-
cation of awe (see Appendix D). Teacher interviews were conducted in groups and were
semi-structured (see Appendix E for teacher interview prompts). Two group teacher inter-
views involving five teachers were undertaken, one with the sciences-major university group and
one with the high school. Each interview lasted for about 45 minutes and was recorded and
transcribed. Teacher journals mainly concerned teacher experiences and reflection during
the experiment (see Appendix F for the journal template).
Data analysis
To address the first research question of how the awe-integrated experiment impacts
students’ English writing performance in China, we examined gain scores of students’
pre-post writing tests with effect sizes for university and high school groups. While the
independent-samples t-tests and paired-samples t-tests showed clear evidence of a statistically significant writing performance improvement from the awe-integrated teaching experiment (e.g., Tang, 2014; Tang & Wu, 2012), in this paper we used a General Linear Model (glm) procedure to further explore the intervention effect by university students’ study major and by classroom in the senior high school. In particular, we investigated whether majoring in English or a non-English subject contributed to the differences in the observed score gains in the university sample, and what the classroom effect of the three different teachers was on the observed differences.
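The group-by-major comparison that the glm performs can be pictured with simple cell means; the gains below are invented placeholders showing the shape of the analysis, not the study’s data:

```python
from collections import defaultdict
from statistics import mean

# Invented (group, major, gain) records; the real analysis used students'
# actual pre-/post-test gain scores.
records = [
    ("experimental", "english",     0.9), ("experimental", "english",     1.1),
    ("experimental", "non-english", 0.5), ("experimental", "non-english", 0.7),
    ("control",      "english",     0.2), ("control",      "english",     0.0),
    ("control",      "non-english", 0.1), ("control",      "non-english", 0.3),
]

cells = defaultdict(list)
for group, major, gain in records:
    cells[(group, major)].append(gain)

# Mean gain in each group-by-major cell.
cell_means = {k: mean(v) for k, v in cells.items()}

# The interaction effect asks whether the experimental-vs-control difference
# itself differs between majors; the glm tests whether this is non-zero.
english_diff = cell_means[("experimental", "english")] - cell_means[("control", "english")]
non_english_diff = cell_means[("experimental", "non-english")] - cell_means[("control", "non-english")]
interaction = english_diff - non_english_diff

print(round(english_diff, 2), round(non_english_diff, 2), round(interaction, 2))
```

The cell-means layout is only the descriptive half of the analysis; the glm additionally supplies the F tests and p-values reported in Tables 5 and 7.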
In response to the second research question of how the awe-integrated experiment
impacts the teaching and learning process, we collected and analyzed student and teacher
questionnaire, interview and journal data. The student questionnaire was conducted online, and the overall submission rate was 67%. For the high school group, 71 out of 138 experimental group students (51%) submitted their questionnaires. For the university group, 185 out of 224 experimental group students (83%) completed theirs. The teacher questionnaire
was sent to the teachers via email to fill in and all ten teachers returned their answers.
Multiple-choice responses were analyzed via spss, while responses to open-ended questions, interviews and journals were examined through content analysis. Common themes were extracted, discussed and exemplified to illustrate how the teaching experiment affected the teaching and learning process (see the results below).
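The reported questionnaire return rates can be reproduced with simple arithmetic; the helper function here is only for illustration:

```python
def response_rate(returned, sent):
    """Questionnaire return rate as a whole-number percentage."""
    return round(100 * returned / sent)

# High school experimental group: 71 of 138 questionnaires submitted.
print(response_rate(71, 138))   # 51
# University experimental group: 185 of 224 questionnaires completed.
print(response_rate(185, 224))  # 83
```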
Results
The research results are reported in the order of the two questions. The glm analysis of pre- and post-test scores answers whether the use of wrm in efl instruction results in improved student writing performance. Data from questionnaires, interviews and teacher journals indicate how the teaching and learning process changed during the awe-integrated experiment.
The university group. Tables 4 and 5 present pre- and post-test statistics for the university group, for the overall sample and subgroups. The mean pre-post difference scores across the control and experimental groups indicated that the experimental whole group and subgroup mean scores were higher than the control group mean scores, with effect sizes of 0.79, 1.46, and 0.72. The control overall group and subgroups not only had smaller gains in the post-tests than the experimental overall group and subgroups, but also smaller effect sizes of 0.19, 0.31, and 0.18 (see Table 4). Moreover, the overall F test is significant with p-value 0.0001
(see Table 5), indicating strong evidence that the students in the experimental group of
using Writing Roadmap had statistically significant greater English writing improvement
as measured by the pre- and post-tests than that of the students from the control group.
For the university sample analysis, we used the glm to examine the effect of the automated writing evaluation software across two types of students. The English major group consisted of students who studied English as their major; the other group, labeled non-English major, studied English as a general requirement of other majors. We noted that the English major students were assigned a set of writing topics different from those of the non-English major students (see Appendix A). Because the two variables overlap, we focused on the main effect of major (English vs. non-English) and on the group-by-major interaction effect. Overall, the glm analyses indicated statistically significantly higher mean score gains for English major students than for non-English major students, with a p-value of 0.005 (Table 5). The experimental/control group by major interaction effect was present but not statistically significant, with a p-value of 0.07 (Table 5). For the experimental group, English major students made statistically
greater improvements in writing than the non-English major students. We believe that the significant difference we observed could be explained by the fact that the curriculum for English major students centered on English language development, while the non-English major students took English as only one course in their curriculum.
The high school group. For the senior high school sample, the gain-score difference between the experimental and control groups in the pre- and post-test scores is smaller but still statistically significant, with p-value 0.03 (Table 7).
The senior high school had three classes, listed as Teacher 1, Teacher 2 and Teacher 3 (Table 6) in this study. Teacher 1’s experimental and control classes had very similar gain scores: 0.51 for the experimental class and 0.48 for the control. Teacher 2’s and Teacher 3’s classes had very different results between the experimental and control classes. The following factors might account for this. Teacher 1’s class was in Senior 1, that is, the first year of high school (in China, students study three years in high school before taking an entrance exam for college), when neither teachers nor students had imminent pressure from high-stakes exams such as the college entrance exam, and both had more time and motivation to participate in the teaching experiment. The journal data revealed that Teacher 1 had made active use of the assessment criteria in wrm to guide her writing instruction (see the section on the impact on the teaching process below for details); therefore both of her classes might have benefitted from this additional application of the wrm software. In contrast, Teacher 2’s and Teacher 3’s classes were both in Senior 2, when teachers and students faced increasing pressure from the high-stakes college entrance exam (held at the end of the third year of senior high school). It could be argued that the control groups in particular might have felt less motivated to take part in the wrm post-tests, which might have resulted in only slight changes in their gain scores.
The three teachers from the senior high school sample each taught a control and experimental class. The glm test statistics show that gain scores did not have statistically significant
differences within experimental group classes, nor was there a significant group and class
interaction effect (Table 7). We actually were pleasantly surprised to see that all classes,
despite different levels of students, benefited from the awe-integrated teaching experi-
ment. The descriptive statistics in Table 6 show that the three experimental classes gain
scores are 0.51, 0.33, 0.35 vs. the gain scores from the three control classes: 0.48, 0.05, −0.11.
Meanwhile, the experimental group gain-score effect sizes of 0.71, 0.53, and 0.49 are much bigger than the control groups’ effect sizes of 0.61, 0.11, and −0.16, indicating greater improvements in the experimental group classes. Teacher 1’s control group made strong
post-test improvement, perhaps due to the fact that Teacher 1 may have provided more
motivation in teaching and testing for the control group.
In summary, the glm analysis showed that the writing improvement for the university
experimental group and English major students was statistically significant. Similarly, the
glm analysis found statistically significant writing improvement for the senior high school
experimental group, and all three teachers’ classrooms with different levels of students.
improvement (Warschauer & Grimes, 2008). In the current experiment, however, the mul-
tiple and dynamic assessment and feedback from the teacher, the peer and the awe system,
interacted and motivated the students to write and revise continuously. The awe feedback
was instantaneous and prompt, while the teacher feedback was usually more targeted and
could tackle the more difficult problems.
Second, with teacher guidance and instruction (e.g., teachers provide feedback by cri-
tiquing exemplary essays in the class in the light of wrm six-trait rubrics or assessment
criteria) and constant interaction with the system, students seemed to have learned to use
ac to guide their own writing, which could be noted in the university group.
I used to compose a lengthy opening for my English essay. During the experiment,
through practicing my essays within the system and understanding the ac, I found
the English essays usually state their topics directly in the opening, and with a topic
sentence for each paragraph. I think writing practice in wrm helps me think in English
when writing essays and ensures smooth cross-cultural communication. (Student 1,
University E, Data source: Interview)
It may be argued that, compared with what they did in the past, the students in the experiment had a clearer purpose in writing, using the ac to guide and revise their essays; in this process they gradually internalized the ac, improved their assessment and self-assessment abilities, and became key partners in the assessment process.
Third, it seemed that students had become more autonomous via dynamic interaction
with the system and teacher feedback, correcting their mistakes and revising their essays.
What I found most attractive about the system was that it could force me to practice and
revise my own essays, which improved my autonomy and writing. (Student 3, University
B, Data source: questionnaire)
Traditional writing instruction follows a linear order of student writing followed by teacher feedback, in which the students’ role might be passive and they might lack the motivation to revise their essays, let alone join the assessment process (Carless, 2006). However, in
the wrm-integrated teaching experiment, it seemed that students were motivated to write
and revise through continuous interaction with the system, and peer and teacher assess-
ment and feedback. Moreover, the instant feedback from the system along with teacher
support with the ac interpretation seemed to have helped students internalize and apply
the ac in their own writing and acquire assessment and self-assessment abilities, which
can be shown in the following quote:
The teaching experiment helped me to know better about the ideas and structure of
English essays; it also helped to improve my self-assessment ability. Now I can see very
clearly the strengths and weaknesses of an essay. (Student 1, University A, Data source:
questionnaire)
Consequently, they might change from the traditional role of being assessed to becoming a
co-assessor, during which their autonomous learning abilities could be enhanced.
With language problems largely dealt with by wrm, teachers might not need to spend
as much time correcting and commenting on the language mistakes, and the writing
instruction seemed to witness a shift of focus from language form to content and dis-
course, from product to process (e.g., Wang et al., 2013; Warschauer & Grimes, 2008; Wu
& Tang, 2012).
wrm helped to liberate me from the marking workload. I remember I used to mark
students’ essays every weekend, while students turned a blind eye to my comments. Now
with wrm help, I can have time to think how to provide more targeted writing instruc-
tion based on their weak points. (Teacher F, Data source: post-questionnaire)
More attention seemed now to be directed to the teaching/learning process. Specifically,
a pre-writing phase was incorporated with the main purpose of helping students to brain-
storm ideas for writing as specified in the suggested awe-integrated procedure above.
More importantly, it seems that interpretation of the ac formed a key part of teaching, with the ac regarded both as a teaching goal and as a standard to reach. Teacher Q compared
what she did in the writing class in the past with the present as follows:
My writing teaching in the past involved only assigning topics and marking essays. I did
not provide specific writing requirements and objectives, nor tell students the assess-
ment criteria. After using this system, I have acquired a better understanding of the
importance of writing requirements and assessment criteria. (Teacher Q, Data source:
post-questionnaire)
Teacher G (i.e. Teacher 1 who taught the senior 1 group) from the high school group related
how ac helped with her writing instruction.
During the experiment, I used the ac in wrm to guide my writing instruction, and the
students became aware of the six traits (i.e. ideas and content, structure, word choice,
fluency, voice, conventions and mechanics) of ac and attended to them in their writing.
Due to the assistance of wrm, my writing teaching is now more guided and standard.
(Teacher G, Data source: journals)
The awe system feedback seemed to be more effective as it was immediate and could pos-
sibly help locate the type of problem, assisting students with language form (cf., Grimes
& Warschauer, 2010; Li et al., 2015; Wang et al., 2013; Warschauer & Grimes, 2008). The
teacher feedback was more concrete, targeted and contextual. The self and peer feedback
might help to empower students in self- and other-assessment, and guide them towards
student autonomy. Several teachers commented on how the system and teacher feedback
could complement each other in teaching:
Feedback from the system is relatively general. It can tell me roughly where my students
are, with reference to native-speaker performance. My feedback is very concrete, related
to the topic concerned and the context, with more concern for content and rhetoric.
(Teacher Y, Data source: post-questionnaire)
Concurrent with the teaching method changes observed above, teachers also seemed to change in their roles from being the dominant figure in the class and the sole assessor of students' essays to a facilitator, co-assessor, senior learner, co-manager of learning, and researcher,
as noted in the following:
Teachers now hold new roles: facilitators in learning, assessors to fill in the gaps left
by the awe system in its feedback, senior learners concerning the ac, researchers of
their own teaching for the sake of improving teaching and self. (Teacher W, Data source:
journal)
It is noted that though the teachers largely followed the suggested procedure of using awe in the classroom, they were inspired by the theory of Exploratory Practice (ep) (Allwright, 2003) and were encouraged and supported by the hg research team to conduct action research, examining the efficacy of the proposed awe integration process and exploring the way of awe integration that best suited their context; they thus became "researchers of their own teaching for the sake of improving teaching and self" as related in the journal (Tang et al., 2012).
Discussion
The research demonstrated that the experimental group seemed to outperform the control group in pre- and post-writing tests, along with positive changes in the teaching and learning process. This displayed the usefulness of technology-enhanced formative assessment for learning, the use of awe in particular, and again might verify Grimes and Warschauer's suggestion of awe's "utility in a fallible tool" if deployed effectively (2010, p. 4).
Our research indicated the efficacy of awe as a formative assessment tool in the early
drafting and revising process of writing, which reinforced the findings noted by Chen and
Cheng (2008). However, our study seemed to move beyond Chen and Cheng (2008) not only in sample size (460 university students in our study versus 60 in theirs) and subject range (268 high school students were also included in our study), but also in the experimental process of implementing a procedure for integrating awe into the writing process and evaluating its efficacy through a mixed-methods research design. In conclusion, we
have attempted to display the effectiveness of awe in the drafting and revising process on
a larger scale and with a wider range of student cohorts, and proposed and experimented
with a procedure of integrating autonomous writing with awe support tools and revision
goals, awe feedback, teacher feedback and peer feedback at different stages of writing for
future research to follow (see the part “The awe-integrated teaching experiment” under
“Research design” ).
The key to the success of our project might lie in the introduction of two main inter-
ventions: the awe-integrated teaching experiment and teacher training. The underlying
rationale was that the introduction of new technology into teaching is not just an issue of technology; rather, it concerns various interrelated factors, which is the core insight of the social informatics theory that informed the current study design (Kling, 1996; Warschauer & Meskill, 2000). Five main factors were identified on the basis of those proposed by Kling (1996) and Warschauer and Meskill (2000); however, we developed their three factors of "technology, organization and people" by dividing "people" into "teachers" and "students," and adding the factor of "ways of integrating technology into teaching," which we considered crucial to the introduction of any innovation in teaching.
The study might add to the current literature with the following innovations.
First, it seemed that the study exemplified the role of awe in the classroom within the sociocultural theoretical framework. The study indicated that as a cultural artifact, the awe tool regulated the writing process through providing assessment criteria (ac), instant scaffolding feedback, scores and writing assistance tools within the Zone of
Proximal Development (zpd) (Vygotsky, 1962, 1978). The scaffolding role of awe might
be manifested in the dynamic, formative assessment of the writing process, during which
students could interact with awe through the pre-writing, drafting, revising, rewriting,
editing and finalizing stages; through interacting with the ac and constant practice, students improved their understanding of learning to write and their writing skills. Moreover, different from previous research on dynamic assessment (e.g., Lantolf & Thorne, 2006), this study adopted an innovative mediator, awe, to provide ongoing continuous assessment and feedback, along with teacher and peer support, which ensured that students received multiple and continuous feedback within the zpd.
Second, our study undertook a mixed-methods approach, among which participatory design and exploratory practice were the most salient methods; this again distinguished our study from previous ones (cf. Chen & Cheng, 2008; Warschauer & Grimes, 2010). Rather than being treated as subjects to be researched (e.g., Link et al., 2014; Li et al., 2015), teachers (including the hg team in our study) were actively involved in the experimental teaching and research process and became action researchers themselves. Many of them researched their own ways of using awe pertinent to their individual teaching contexts (see Table 3) and published research papers reporting the exploratory process (see Tang et al., 2012).
Third, teachers and students made active use of the ac of awe, which seems not to have been mentioned in any awe research studies so far. Effective assessment requires comprehensible and explicit assessment criteria, and communicating those criteria to students has always been an important principle of effective assessment (e.g., Brown, Race, & Smith, 1996). During the experiment, many teachers revealed that prior to our study they did not have a set of clear assessment criteria like the wrm ac for marking students' essays. The high school teachers tended to use the essay assessment criteria for marking the college entrance exam (Gaokao), and the college teachers those for marking the College English Test Band 4 (cet4). Of those who did have ac, the criteria were usually not communicated to the students clearly, on the assumption either that students already knew the Gaokao and cet4 ac or that students would not bother to learn about them. The teacher quotes from the experimental groups demonstrated that understanding and interpreting the ac constitutes an important part of teaching writing.
part of teaching writing. With awe serving as both an assessment and teaching tool and
awe ac as both an assessment standard and a teaching goal, teachers and students seemed
to become co-assessors, consequently students became more autonomous through interact-
ing with the system and assessing their own works continuously in the system, and teachers
changed their roles toward that of a facilitator, co-assessor and co-learner.
have learned to correct their own mistakes and improved their autonomy judging from the
student questionnaire and interview data as reported in the “Results” part.
Our research also reiterated the claim that the introduction of technology is not just a technical issue per se (Grimes & Warschauer, 2010; Kling, 1996; Warschauer, 2012); it concerns various interrelated factors: organization, technology, teachers, students, and ways of technology integration. Our research demonstrated that only when teachers have acquired a good understanding of the role of technology and can apply it properly does technology act as a catalyst for positive changes in teachers and teaching.
Notes
1. The numbers in the Experimental and Control columns of Tables 1 and 2 refer to the number of students who completed both the pre- and post-tests. The total number of students enrolled in each class is larger than this.
Acknowledgements
This paper is part of a China national educational sciences planning research project titled
“Investigating the Application of Online Formative Assessment in the efl Classroom” (Grant
number: GFA097005).
The first author is indebted to China Scholarship Council for sponsoring her research
visit to the Open University, uk from September 2016 to July 2017, during which she com-
pleted this paper.
The authors would also like to express their sincere gratitude for the great support pro-
vided by Professor Yi’an Wu, Beijing Foreign Studies University, China; Ms. Yihong Wang,
ctb/McGrawHill, the usa; Professor Mark Warschauer, University of California, Irvine,
the usa ; and Professor Mike Sharples, the Open University, the uk during the project
implementation and manuscript revision.
The authors are also grateful to the two anonymous reviewers for their insightful feedback on the earlier versions of the paper.
References
Allwright, D. (2003). Exploratory practice: Rethinking practitioner research in language
teaching. Language Teaching Research, 7(2), 113–141.
Attali, Y. (2004). Exploring the feedback and revision features of Criterion. Paper
presented at the National Council on Measurement in Education conference. April
2004, San Diego, ca.
Beatty, I. D., Feldman, A., Leonard, W. J., Gerace, W. J., St. Cyr, K., Lee, H., & Harris, R.
(2008). Teacher learning of technology-enhanced formative assessment. Paper
presented at The narst 2008 Conference.
Black, P., & Wiliam, D. (1998). Assessment and classroom learning. Assessment in
Education, 5(1), 7–74.
Brown, S., Race, P., & Smith, B. (1996). An assessment manifesto. 500 Tips on Assessment.
Retrieved from http://www.city.londonmet.ac.uk/deliberations/assessment/manifest.
html (18 August, 2006)
Liang, M. C. (2011). Developing the automated system for scoring English essays. Beijing: Higher Education Press.
Liang, M. C. (2016). Writing right with iWrite. Paper presented at 2016 International
Symposium on Computer-assisted Language Learning. July 22–23, Qingdao, China.
Liu, S., & Kunnan, J. A. (2016). Investigating the application of automated writing
evaluation to Chinese undergraduate English majors: A case study of WriteToLearn.
calico Journal, 33(1), 71–91.
Page, E. (1994). Computer grading of student prose using modern concepts and software.
Journal of Experimental Education, 62(2), 127–142.
Ramineni, C. (2013). Validating automated essay scoring for online writing placement.
Assessing Writing, 18(1), 40–61.
Ramineni, C., & Williamson, D. M. (2013). Automated essay scoring: Psychometric
guidelines and practices. Assessing Writing, 18(1), 25–39.
Rich, C. S., Harrington, H., Kim, J., & West, B. (2008). Automated essay scoring in state formative and summative writing assessment. Paper presented at the Annual Conference of the American Educational Research Association, New York.
Rowntree, D. (1987). Assessing students: How shall we know them? London: Kogan Page.
Shermis, M. D., & Burstein, J. (Eds.). (2003). Automated essay scoring: A cross-disciplinary perspective. Mahwah, New Jersey: Lawrence Erlbaum Associates, Publishers.
Steen, M. (2013). Virtues in participatory design: Cooperation, curiosity, creativity, empowerment and reflexivity. Science and Engineering Ethics, 19, 945–962.
Tang, J. L. (2014). How to integrate an automated writing assessment tool in the efl
classroom? Foreign Language Learning Theory and Practice, 1, 117–125.
Tang, J. L., & Rich, C. S. (2011). Online technology-enhanced English language writing
assessment in the Chinese classroom. Paper presented at Annual Conference of
American Educational Research Association, New Orleans.
Tang, J. L., & Wu, Y. A. (2012). Using automated writing assessment in the college efl
classroom. Foreign Languages and Their Teaching, 265(4), 53–59.
Tang, J. L. et al. (2012). English teaching reform in the digital age – the application of
educational assessment technology in English writing instruction. Foreign Language
Teaching and Research Press.
Tang, J. L., & Wu, Y. A. (2011). Using automated writing evaluation in classroom
assessment: a critical review. Foreign Language Teaching and Research, 43(2), 273–282.
Tang, J. L. (2011). A survey on teacher perceptions of ict and writing instruction. Paper
presented at 2011 Symposium on Using awe in College Writing Instruction, July
21–22, Beijing, Beijing Foreign Studies University.
Vantage Learning. (2007). My access! efficacy report. Retrieved from http://www.vantagelearning.com/school/research/myaccess.html (8 August, 2010)
Vygotsky, L. (1962). Thought and language. Cambridge, ma: mit Press.
Vygotsky, L. (1978). Mind in society: The development of higher psychological processes.
Cambridge, ma: Harvard University Press.
Wang, Y. J., Shang, H. F., & Briody, P. (2013). Exploring the impact of using automated
writing evaluation in English as a foreign language university students’ writing.
Computer Assisted Language Learning, 26(3), 234–257.
Warschauer, M., & Meskill, C. (2000). Technology and second language learning. In J.
Rosenthal (Ed.), Handbook of undergraduate second language education (pp. 303–318).
Mahwah, New Jersey: Lawrence Erlbaum Associates.
Warschauer, M., & Grimes, D. (2008). Automated writing assessment in the classroom.
Pedagogies: An International Journal, 3, 22–36.
Warschauer, M., & Ware, P. (2006). Automated writing evaluation: Defining the
classroom research agenda. Language Teaching Research, 10 (2), 1–24.
Warschauer, M. (2012). Writing to learn and learning to write. Plenary speech presented
at 2012 Glocall Conference and 2012 International Symposium on call, October
18–19, Beijing, China.
White, L., Hixson, N., D’Brot, J., Perdue, J., Foster, S., & Rhudy, V. (2010). Research brief,
impact of Writing Roadmap 2.0 on westest 2 online writing assessment scores.
Retrieved from http://wvde.state.wv.us/oaa/pdf/research/Research%20Brief%20-%20
wrm2.0%20Impact%20final%2001.27.10.pdf (20 September, 2010)
Wu, Y. A. & Tang, J. L. (2012). Impact of integrating an automated assessment tool into
English writing on university teachers. Computer-Assisted Foreign Language Education,
146(4), 3–10.
Author biodata
Jinlan Tang, PhD, is the Deputy Dean and Professor of English in the School of Online and
Continuing Education, Beijing Foreign Studies University, China. Her research has cov-
ered the areas of language assessment, tutor feedback, efl teaching and learning in the
e-learning environment. She also serves as the Secretary-General of the China Computer-
Assisted Language Learning Association (Chinacall, www.chinacall.org.cn) (2016–2020).
Changhua Sun Rich, PhD, is a principal research scientist at act, usa. She works col-
laboratively with act research, test development, and international programs to conduct
international assessment research. Prior to joining act, she was a research director at
ctb/McGraw-Hill, usa and worked on automated essay and speech scoring applications
in China.
Appendix A
Appendix B
Dear student,
This questionnaire intends to acquire your experiences with the use of wrm in your
class. Please answer the questions truthfully. Your answers will be kept strictly confidential
and used for research purposes only. Thank you!
From the project team
I. Personal Information. Please tick in the appropriate place or fill in the blank space.
1. Name:
2. Gender: A. male B. female
3. Grade: A. Grade one B. Grade two
4. English score at the national entrance test for college
5. Major *
6. Name of your English teacher
7. Name of the university
* Questions 4–5 do not apply to the high school students. Except for these two, the rest of
the questions are the same for both groups.
II. Perceived efficacy of English language teaching. Please tick (✓) in the appropriate box.
(Response options: Strongly disagree / Disagree / Agree / Strongly agree)
8. The English class helps me know English history
and culture.
9. I did not learn much from the English class.
10. I have improved my English communication ability
through the English class.
11. The English class only helps me learn some
grammar rules and expressions.
12. I always look forward to having the English class.
13. Teacher feedback on my assignment is very timely.
14. Teacher written feedback on my essay is very
helpful to me.
15. I like to write in English more.
III. Perceptions on English language learning. Please tick (✓) in the appropriate box.
(Response options: Strongly disagree / Disagree / Agree / Strongly agree)
16. I learn English in order to get a good grade.
17. I like learning English.
18. I do not like to listen to teachers lecturing too
much, but prefer to join the English class activities.
19. I am not gifted in learning English.
20. English ability can only be improved through use,
rather than by listening to lectures.
21. My English should be assessed by my teacher,
rather than by myself or my classmates.
22. Teachers should play the main role in English
teaching, and the students should assist the
teacher.
23. The success or failure of my English learning lies
in myself.
24. My English will improve if I study hard.
25. I do not like quizzes during the term.
26. I do not think that self-correction can help improve
my English writing.
27. I do not think that English writing course is
necessary to me.
28. The computer and the Internet help with English
writing with their functions of storage, searching
and revision.
29. My English will improve if I get to know good English learning methods.
30. Teacher feedback will help with my English writing.
31. The computer and the Internet can help improve
English learning.
32. I get very worried before and during the exam.
33. I do not like correcting my own essays.
34. Commenting on other students' work helps improve my English writing a lot.
35. The computer and the Internet cannot provide
reliable and effective assessment of English essays.
36. Good essays result from continuous revisions.
IV. Experiences with the English writing course. Please tick (✓) in the appropriate box.
37. How do you feel about the use of wrm in your writing?
A. Very satisfactory B. Satisfactory
C. Moderate D. Not Satisfactory
VI. Use of wrm in writing. Please tick (✓) in the appropriate box.
51. Do you revise essays in wrm before submission?
A Yes B. No
52. How many times did you revise your essays in wrm before submitting to your teacher?
A. More than five times B. Three to four times
C. Once or twice D. Never
53. I think the ______ is the most helpful tool in wrm.
A. Hint B. Tutor C. Thesaurus D. Grammar Tree
54. Sort the tools below from the most important to the least important, and write your
answers in the blank.
A. Hint B. Tutor C. Thesaurus D. Grammar Tree
55. I ______ wrm to correct punctuation and format errors.
A. never used B. seldom used
C. half the time used D. frequently used
Appendix C
Appendix D
Dear Teacher,
In a week's time it will be a whole academic year since you undertook the wrm-integrated writing teaching experiment, and we would very much like to learn about your experiences of using wrm; hence we designed this questionnaire and would greatly appreciate your feedback. Thank you!
From the project team
Instructions: The questionnaire consists of two parts: the overall and the specific aspects.
Please give brief answers to the questions in the overall aspect and specific answers to the
questions in the specific aspect.
* The tutor questionnaire was designed by Professor Yi'an Wu, advisor and key member of our project team. The authors are grateful for her approval of its use in the study.
7) To what extent do you think your students have mastered the assessment criteria so
far?
8) Do your students have the opportunity to assess their own work? What about peer
assessment? What are their advantages and disadvantages?
9) Do students agree to the scores offered by wrm? If some students question the scor-
ing, how do you respond?
3. Teaching mode
1) Has the introduction of wrm changed your teaching procedure?
2) If yes, what changes are they?
3) Do you need to adjust your teaching from time to time in the class? Please specify.
4) In your class, have you ever encountered a situation in which students respond differently toward wrm? If so, how did you deal with it?
5) With wrm assistance, do you know your students’ problems in the learning process
better?
6) How do you deal with these problems? Please give an example.
6. On learner autonomy
1) Do you find any changes in learner autonomy?
2) If there are changes, what are the changes?
3) Does wrm arouse your students’ interest in learning? Please specify.
4) What types of students are more interested in wrm?
5) What types of students are not interested in wrm?
6) Does students’ interest in wrm relate to their scores, characters, gender, computer
skills?
7. Through one-year teaching experiment, what do you think are the key factors for the
successful use of wrm in teaching?
9. If possible, are you going to continue the use of wrm in your teaching?
Appendix E
Appendix F
Observations and reflections (which can center on the following three points):
1. Students’ overall performance in this essay (i.e. what they have done well, what problems
they have, and what they need to improve)
2. Experiences and reflections on marking this essay (i.e. in what areas the teacher has
done well and what area needs improvement; whether students like to practice writing
and revise in wrm, whether they have problems in writing, what type of problems;
whether they ask the teacher questions, what type of questions they might ask)
3. Students' use of the online assessment system (i.e. students' attitude toward wrm, what
type of students like using wrm, what type of students do not like or are afraid of
using wrm)