This article was downloaded by: [Cito Groep]
On: 27 January 2015, At: 04:31
Publisher: Routledge
Informa Ltd Registered in England and Wales Registered Number: 1072954 Registered office: Mortimer House, 37-41 Mortimer Street, London W1T 3JH, UK
Assessment in Education: Principles, Policy & Practice
Publication details, including instructions for authors and subscription information:
http://www.tandfonline.com/loi/caie20

Integrating data-based decision making, Assessment for Learning and diagnostic testing in formative assessment
Fabienne M. Van der Kleij (a,b), Jorine A. Vermeulen (a,b), Kim Schildkamp (b) & Theo J.H.M. Eggen (a,b)
(a) Psychometric Research Centre, Cito Institute for Educational Measurement, Arnhem, The Netherlands
(b) Faculty of Behavioural Science, University of Twente, Enschede, The Netherlands
Published online: 22 Jan 2015.

To cite this article: Fabienne M. Van der Kleij, Jorine A. Vermeulen, Kim Schildkamp & Theo J.H.M. Eggen (2015): Integrating data-based decision making, Assessment for Learning and diagnostic testing in formative assessment, Assessment in Education: Principles, Policy & Practice, DOI: 10.1080/0969594X.2014.999024

To link to this article: http://dx.doi.org/10.1080/0969594X.2014.999024
Assessment in Education: Principles, Policy & Practice, 2015
http://dx.doi.org/10.1080/0969594X.2014.999024
Integrating data-based decision making, Assessment for Learning
and diagnostic testing in formative assessment
Fabienne M. Van der Kleija,b*, Jorine A. Vermeulena,b, Kim Schildkampb and
Theo J.H.M. Eggena,b
aPsychometric Research Centre, Cito Institute for Educational Measurement, Arnhem, The Netherlands; bFaculty of Behavioural Science, University of Twente, Enschede, The Netherlands
(Received 21 July 2014; accepted 10 December 2014)
Recent research has highlighted the lack of a uniform definition of formative
assessment, although its effectiveness is widely acknowledged. This paper
addresses the theoretical differences and similarities amongst three approaches to
formative assessment that are currently most frequently discussed in educational
research literature: data-based decision making (DBDM), Assessment for Learning (AfL) and diagnostic testing (DT). Furthermore, the differences and similarities in the implementation of each approach were explored. This analysis shows
that although differences exist amongst the theoretical underpinnings of DBDM,
AfL and DT, the combination of these approaches can create more informed
learning environments. The thoughtful integration of the three assessment
approaches should lead to more valid formative decisions, if a range of evidence
about student learning is used to continuously optimise student learning.
Keywords: formative assessment; data-based decision making; assessment for
learning; diagnostic testing; theoretical comparison
Introduction
The complex interdependencies amongst learning, teaching and assessment are
increasingly being recognised. Assessment encompasses the use of a broad spectrum
of processes and instruments for gathering evidence about student learning, such as
paper-and-pencil tests, projects or observations (Stobart, 2008). In education, a distinction is made between summative assessment and formative assessment.
Whenever assessment results are intended to play a role in making a decision
about the mastery of a defined content domain, it has a summative purpose, for
example, in making a decision regarding selection, classification, certification or
placement (Sanders, 2011). Over shorter periods of time, assessments that serve a
summative purpose usually involve grading, which eventually adds up to a pass/fail
decision. If assessment results are intended to steer the learning process, assessment
is formative in purpose. Some researchers have defined assessments as formative
based on the actual use of assessment information to support learning (e.g. Black &
Wiliam, 1998). However, the currently dominant view in the literature supports the
notion that the fundamental distinction between formative and summative assessment lies in their purposes (e.g. Bennett, 2011). The purposes of summative and formative assessments are not mutually exclusive; they can coexist as primary and secondary purposes of the same assessment (Bennett, 2011).

*Corresponding author. Email: fabienne.vanderkleij@acu.edu.au
© 2015 Taylor & Francis
The effectiveness of formative assessment is widely acknowledged. However,
these effectiveness claims are not always well grounded, amongst other things because of the lack of a uniform definition of the concept of formative assessment
(Bennett, 2011).
The term formative was first applied in the context of individual student learning
by Bloom in 1969, describing assessment with the purpose of providing feedback to
direct future learning (although the term used was formative evaluation not formative assessment). Feedback is widely recognised as a crucial aspect of formative
assessment (Bennett, 2011; Brookhart, 2007; Sadler, 1989; Shepard, 2005; Stobart,
2008), and can be defined from two perspectives (Sadler, 1989): (1) the information
resulting from assessments that provides teachers and other stakeholders with
insights into student learning, and (2) feedback provided to students based on their
responses to an assessment task. The first form of feedback can be used by educators
to adapt instruction to the needs of learners, whereas the second can be used to
directly steer learning processes in students.
Formative assessment is a broad concept that comprises many definitions, e.g.
Assessment for Learning (AfL) and diagnostic testing (DT) (Bennett, 2011; Johnson
& Burdett, 2010), that have expanded over time (Brookhart, 2007) to include, for
example, self-regulated learning (e.g. Clark, 2012; Nicol & McFarlane‐Dick, 2006).
While numerous researchers have attempted to define formative assessment (Black
& Wiliam, 2009), to date a widely accepted definition has not emerged in the literature. In this paper, we define formative assessment broadly as any assessment that is
intended to support learning. Formative assessment can be seen as an umbrella term
that covers various approaches to assessment intended to support learning that have
different underlying learning theories (Briggs, Ruiz-Primo, Furtak, Shepard, & Yin,
2012). The term approach captures the underlying principles and intentions that
shape particular assessment uses.
Furthermore, it is helpful to make a distinction between formative evaluation and
formative assessment (Harlen, 2010; Shepard, 2005). The term formative evaluation
refers to the use of assessment data to make decisions concerning the quality of education at a higher aggregation level than the level of the learner or the class. Data
from summative assessment can also be used for formative evaluation (e.g. the use
of assessment data for policy development at the school level). Formative assessment, however, only concerns decisions at the levels of the learner and the class to
accommodate the pupils’ individual educational needs.
This paper examines the theoretical differences and similarities amongst three
approaches to formative assessment that are currently most frequently discussed in
educational research literature. The main feature that these approaches have in common is that the evidence gathered is interpreted and can subsequently be used to
change the learning environment in order to meet learners’ needs (Wiliam, 2011).
However, the way student learning is defined and the nature of the evidence
obtained differ to some extent within each approach. The first approach is
data-based decision making (DBDM), which originated in the USA as a direct
consequence of the No Child Left Behind (NCLB) Act. Within the NCLB, learning
outcomes are defined in terms of results and attaining specified targets (Wayman,
Spikes, & Volonnino, 2013). Second, Assessment for Learning (AfL), originally
introduced by scholars from the UK (Assessment Reform Group [ARG], 1999), is
an assessment approach that focuses on the quality of the learning process, rather
than merely on students’ (final) learning outcomes with emphasis on feedback to
students (Stobart, 2008). Finally, diagnostic testing (DT) was initially used to refer
students to special education, particularly those diagnosed as unable to participate in
mainstream educational settings (Stobart, 2008). DT provides detailed assessment
data about a learner’s problem-solving processes, which can indicate what a student needs to improve his or her learning process and learning outcomes (Crisp,
2012; Keeley & Tobey, 2011). DBDM and DT can be used for both formative and
summative assessment; this paper focuses on their potential formative uses.
Within some of the approaches, terminology and definitions are inappropriately
used interchangeably; therefore, it is valuable to review and compare the theoretical
underpinnings of DBDM, AfL and DT. For example, literature on DBDM tends to
cite literature concerning AfL, but not vice versa (e.g. Swan & Mazur, 2011). Moreover, discussions in assessment literature seem to revolve around finding evidence
of what works. As Elwood (2006) pointed out, these discussions do not acknowledge the complexity of the use of assessment for learning enhancement, and lead to
what she calls ‘quick fixes’ (p. 226). Ignoring the differences in the theoretical
underpinnings has led to theoretical ambiguity in the assessment literature, as shown
by studies that have entangled the terminology and definitions of the three
approaches. As a result, it is not feasible to study the effectiveness of each approach
separately. Bennett (2011) also stressed this ambiguity in the use of definitions:
‘Definition is important because if we can’t clearly define an innovation, we can’t
meaningfully document its effectiveness’ (p. 8).
Currently, a mix of these approaches is implemented in educational practice,
without awareness of which approach is most suitable for a particular goal. In
order to jointly use the three approaches in an effective way, awareness of the possibilities and limitations of each approach is essential. However, due to the lack of
clarity on the definitions of these approaches and their goals, the implications for
practice are currently unclear. In addition, the ambiguity in the use of definitions is
reflected in educational policy, which further contributes to confusion about how formative assessment should be implemented in practice. For example, in British
Columbia, Canada, DBDM and AfL are the pillars of the programme of the Ministry
of Education (2002). However, while DBDM mainly focuses on what has to be
learned, AfL and DT seem to emphasise how students learn what has to be learned
(best), and the quality of the learning process (Stobart, 2008). Nevertheless, all three
approaches stress the importance of using feedback for learning enhancement, but
the procedures advised regarding the provision of feedback differ substantially.
Understanding the relation between the theoretical underpinnings and the prescriptions on why, how and when assessment should be used by learners, teachers and
schools to enhance learning is needed to move the field of educational assessment
forward.
It is important to recognise the differences between these approaches as an initial
exploration of what it might mean to blend these approaches in a meaningful way.
With this comparative theoretical analysis, we aim to contribute to a more coherent
research agenda within the field concerning the effectiveness of educational assessment programmes incorporating these approaches. Although the researchers
acknowledge that some of the differences between these approaches may be the
result of variations in the instigating contextual factors, this paper focuses on
conceptual rather than contextual issues. This is relevant because combinations of
these approaches are currently being used across various contexts, without taking
into account differences in conceptual underpinnings. Further, while our analysis
does not attempt to address the competing issues of summative and formative
assessment, reference is made to summative assessment as a means to clarify the
issue discussed. Note that we do not intend to make any claims about which assessment approach is most effective for improving student learning.
This paper explores the similarities and differences in the theoretical underpinnings of DBDM, AfL and DT. Also, the implications for implementing DBDM, AfL
and DT in educational practice are identified and contrasted. The paper aims to identify possibilities and potential stumbling blocks for the thoughtful integration of the
three approaches.
Learning theories and assessment paradigms
It is remarkable that most literature about assessment approaches rarely makes its theoretical assumptions about learning explicit. For example, a recent review found that studies relating formative assessment to a learning theory are scarce (Sluijsmans, Joosten-ten Brinke, & Van der Vleuten, submitted). As formative assessment lies at the intersection of instruction and assessment, its approaches are influenced by both learning theories and assessment
paradigms. Implementing a system-wide formative assessment approach requires an
alignment of assessment practices, which starts with an understanding of the learning theories behind currently dominant approaches (Elwood, 2006) and their impacts
on assessment and feedback. The following five learning theories are most prominent in current assessment literature, and are relevant to our comparison of the three
assessment approaches: neobehaviourism, cognitivism, metacognitivism, social cultural theory and (social) constructivism (Thurlings, Vermeulen, Bastiaens, & Stijnen,
2013).
The formative concept originated from neobehaviourism, as introduced by
Bloom (1969). As the dominant theory of learning from the 1930s, it focused on
behavioural rather than cognitive mechanisms of learning (Stobart, 2008; Verhofstadt-Denève, Van Geert, & Vyt, 2003). In contrast, cognitivists such as Piaget
focused on changes in cognitive structures rather than in behaviour (Verhofstadt-Denève et al., 2003). Cognitivism highlights information processing and knowledge
representation in the memory, rather than learning mechanisms (Shuell, 1986). In
metacognitivism, the emphasis is on learning how to learn and regulating the learning processes by regularly providing feedback (Butler & Winne, 1995). In 1978,
Vygotsky introduced the social cultural theory of learning. This theory treats feedback in the form of scaffolding as the most important learning mechanism, supporting learners in acquiring knowledge and skills they are not yet able to apply on their own. Learning occurs through social interactions and dialogues between the learner and the teacher or his or her peers. Vygotsky believed
that to promote student learning, assessments should focus on what students are able
to learn, rather than what they have learned so far (Verhofstadt-Denève et al., 2003).
In constructivism, learning is seen as a cyclic process in which new knowledge and
skills are built on prior ones through continuous adaption of the learning environment to the learners’ needs (Jonassen, 1991; Stobart, 2008). In social constructivism,
the learners’ active role is emphasised, and teachers are expected to actively engage
learners in constructing knowledge and developing skills by frequently providing
elaborated feedback (Thurlings et al., 2013).
Assessment approaches are not only underpinned by learning theories, but also
by distinct assessment paradigms. A historically dominant field of study in assessment is psychometrics, which relies on the use of standardised, objective and quantified evidence, typically for summative purposes (Moss, Pullin, Gee, & Haertel,
2005). Assessments in this psychometric paradigm are designed to capture individual student ability, achievement or affect, compared with norm-referenced or criterion-referenced standards, independent of the context in which the assessment task was constructed. As a consequence of these summative purposes, assessment was seen as
a separate activity from teaching and learning, and there were no clear links made to
learning theories.
A contrasting assessment paradigm originates from a sociocultural point of
view, in which the learner is inextricably connected with and involved in the
environment in which learning occurs. Assessment in a sociocultural paradigm is
highly situated and does not directly allow for generalisations of performance to
other situations. The assessment methods employed aim to examine learning in a
particular context and take into account the relationships between learners and the
broader community (Moss et al., 2005). These assessment paradigms have different theoretical and historical foundations, and employ significantly different methods for gathering evidence about student learning and for making decisions based
on this evidence.
While there is no direct alignment between learning theories and assessment paradigms, assessment methods based on different learning theories can be ordered on
a continuum from psychometric to sociocultural. Neobehaviourism and cognitivism
typically employ assessment strategies in line with the psychometric paradigm.
Assessment from a neobehaviourist perspective emphasises memorisation of facts,
and feedback is intended to reinforce correct recall of these facts (Hattie & Gan,
2011; Narciss, 2008). These facts are seen as independent of the context in which
they have been taught (Stobart, 2008). Because the outcome of learning in cognitivism is a change in cognitive structures, the accompanying assessment and teaching practices are
primarily of a retroactive nature, meaning that remediation is used to redirect the
learning process and promote learning (Stobart, 2008). Feedback is often intended to
correct incorrect responses (Kulhavy & Stock, 1989; Thurlings et al., 2013), and
provided by an expert to a passive learner (Evans, 2013). However, the characteristics of the learner and the task are taken into account.
In metacognitivism, assessment is aimed at metacognitive knowledge and skills.
The feedback message is usually about how the learner learns, rather than about
what the learner learns (Brown, 1987; Stobart, 2008). This approach leans towards a
sociocultural paradigm of assessment, because the emphasis is on how a learner
learns in a particular context. Assessment in practices based on social cultural theory
naturally aligns with a sociocultural assessment paradigm. However, although
Vygotsky’s social cultural theory resulted in an international shift in teaching
practices, retroactive assessment practices, which focus on remediation of individual
abilities, have remained popular (Elwood, 2006; Stobart, 2008). Thus, although
learning is seen as a sociocultural interactive activity, assessment remains mostly an
individual activity in practice. Collaborative learning and solving real-world problems, which use peer feedback as an important learning mechanism, characterise
social constructivist learning environments (Lesgold, 2004; Stobart, 2008; Thurlings
et al., 2013). Although individual assessment still occurs in practice in constructivist
and social constructivist learning environments, these learning theories clearly align
with a sociocultural paradigm of assessment.
Analysis of the three approaches
In the following sections, this paper discusses the theoretical underpinnings of each
approach in terms of its origin, definition, goals and assessment paradigms. We recognise the importance of understanding the underlying learning theories in the foundations of the three assessment approaches. However, in comparing the approaches,
emphasis remains on the differences in their assessment paradigms as they provide
clear implications for practice. Next, the implementation of each approach is discussed in terms of aggregation levels, assessment methods and feedback loops.
Theoretical underpinnings of DBDM
Teachers make several instructional decisions intuitively (Ingram, Louis, &
Schroeder, 2004; Slavin, 2002, 2003). However, educational policies such as the
NCLB act have caused an increase in accountability requirements, which has
stimulated the use of data for informing school practice in the USA (Wayman,
Jimerson, & Cho, 2012). Using data to inform decisions in the school, for example
about instruction, is referred to as DBDM (Ledoux, Blok, Boogaard, & Krüger,
2009). Schildkamp and Kuiper (2010) defined DBDM as, ‘systematically analyzing
existing data sources within the school, applying outcomes of analyses to innovate
teaching, curricula, and school performance, and, implementing (e.g. genuine
improvement actions) and evaluating these innovations’ (p. 482). The definition of
data in the context of schools is, ‘information that is systematically collected and
organized to represent some aspect of schooling’ (Lai & Schildkamp, 2013, p. 10).
This definition is broad and includes any relevant information derived from
qualitative and quantitative measurements, but the main emphasis is on objective
data (Lai & Schildkamp, 2013; Wayman et al., 2012).
Data include not only assessment results, but also other data types, such as
student background characteristics. Data use can be described as a complex and
interpretive process, in which data have to be identified, collected, analysed
and interpreted to become meaningful and useful for actions (Coburn, Toure, &
Yamashita, 2009; Coburn & Turner, 2012). The action’s impact is evaluated by
gathering new data, thus creating a feedback loop (Mandinach & Jackson, 2012).
Early initiatives of DBDM were based on neobehaviourism and cognitivism
(Stobart, 2008), which meant that no explicit attention was paid to the sociocultural
environment where learning occurred. Previously, DBDM focused on reaching predetermined goals, checking if the goals had been achieved and adapting the learning
environment where needed. This process was mainly transmissive in nature, meaning that educational facilitators (e.g. teachers) were responsible for delivering adequate instruction to learners. In this view, learning is an individual activity, and
assessments are used to check on the individual student’s ability (Elwood, 2006), in
line with a psychometric paradigm (Moss et al., 2005). As a consequence of this
view, assessment methods used for DBDM, such as standardised tests, do not
account for the variety of contexts in which the learning occurred.
However, lately DBDM seems to have moved somewhat more towards a sociocultural paradigm, focusing on continuously adapting learning environments to facilitate and optimise learning processes, taking into account learners’ needs and
individual characteristics. Thus, instead of just acknowledging the context or controlling for it, the emphasis is on the process of data use within a particular context
(Coburn & Turner, 2011; Schildkamp, Lai, & Earl, 2013; Supovitz, 2010). While
this movement is evidenced in many countries, it has not occurred in all.
By using data, teachers can set appropriate learning goals, given students’ current achievements. Subsequently, teachers can assess and monitor whether students
are reaching their goals, and if necessary, adjust their instruction (Bernhardt, 2003;
Earl & Katz, 2006). In this way, DBDM is used for formative assessment. Data can
also be used for formative evaluation by school leaders and teachers for policy
development and school improvement planning, teacher development and monitoring the implementation of the school’s goals (Schildkamp et al., 2013; Schildkamp
& Kuiper, 2010).
Implementation of DBDM: aggregation level, assessment methods, and feedback
loops
Aggregation level
Data collection regarding DBDM takes place at the school, class and student levels.
Data can be gathered from different stakeholders. At the student and class levels,
assessment outcomes are an important source of information about how learning
processes could be improved for both students and teachers. Data can also be used
at the school level for school development purposes, for example, to increase aggregated student achievement (Schildkamp & Lai, 2013).
Assessment methods
Different data types can be used for school and instructional development. The data
type most often referred to is objective output data from standardised tests, for
example, from a student-monitoring system. However, such data are less frequently available than data from informal assessment situations, such as homework
assignments. Next to these formally gathered data, teachers possess data collected
using various standardised assessment methods, curriculum-embedded assessments
and (structured) observations from daily practice (Ikemoto & Marsh, 2007).
Effective DBDM requires access to high-quality data, because the quality of the
decision depends upon the quality of the data used (Coburn & Turner, 2011;
Schildkamp & Kuiper, 2010). For the implementation of DBDM, schools need
access to multiple sources of high-quality data (especially if the stakes are high) and
therefore need a good data use infrastructure (Breiter & Light, 2006; Wayman &
Stringfield, 2006).
Feedback loops
The most frequently used kind of feedback in DBDM is feedback based on assessment data. Teachers and other educators have to transform assessment data into
meaningful actions for educational improvement. These actions include making
changes in practice and providing students with feedback on their learning processes
and outcomes (Schildkamp et al., 2013). The primary user of feedback is the teacher
at the class level.
DBDM starts with a purpose, often in the form of a problem with regard to student achievement. Assessment data serve as a form of feedback for teacher and student use in identifying gaps between the current and the desired level of
achievement. Subsequently, data are collected to investigate the possible causes of
the discrepancy. These data have to be filtered, organised, analysed and interpreted to become useful information. Combined with
stakeholder understanding and expertise, this becomes actionable knowledge. Based
on the data, actions such as adjusting instruction can be taken, or a need to collect new data may be identified. If actions are taken, evaluation
needs to occur once again to ascertain if these changes solved the discrepancy, creating another feedback loop (Mandinach, Honey, Light, & Brunner, 2008; Marsh,
2012; Marsh, Pane, & Hamilton, 2006; Schildkamp & Poortman, in press). The
length of these cycles of inquiry and feedback loops varies, but it is advised that
teachers engage in continuous cycles of inquiry (Timperley, 2009). These feedback
loops can be relatively long when involving the use of standardised or commercially
available assessments, which are often only available once or twice a year. The
majority of these loops are retroactive in nature, meaning that based on data,
achievement gaps are identified and addressed.
Theoretical underpinnings of AfL
AfL was originally introduced by UK scholars as a reaction against the emphasis on
summative uses of assessments (Stobart, 2008). This approach focuses specifically
on the quality of the learning process instead of on its outcomes. Moreover, ‘it puts
the focus on what is being learned and on the quality of classroom interactions and
relationships’ (Stobart, 2008, p. 145).
The ARG defined AfL as, ‘ … the process of seeking and interpreting evidence
for use by learners and their teachers to decide where the learners are in their learning, where they need to go and how best to get there’ (2002, p. 2). However, this
definition was often misinterpreted (Johnson & Burdett, 2010; Klenowski, 2009).
For this reason, Klenowski (2009) reported on what she referred to as a ‘second-generation definition’ of AfL, which will be used in this paper: ‘part of everyday
practice by students, teachers and peers that seeks, reflects upon and responds to
information from dialogue, demonstration and observation in ways that enhance
ongoing learning’ (p. 264).
Hargreaves (2005) concluded that there are two approaches within AfL, a measurement and an inquiry approach. In the measurement approach, in line with a psychometric paradigm, AfL is viewed as an activity that includes marking, monitoring
and showing a level. In this view, (quantitative) data are used to formulate feedback
and to inform decisions. Assessment is seen as a separate activity from instruction
that shows to what degree a predetermined level has been achieved. This approach
resembles the definition of DBDM, and does not reflect the original intentions of the
AfL approach as formulated by the ARG (2002). In the inquiry approach, AfL is a
process of discovering, reflecting, understanding and reviewing. It is focused on the
process, and assessments are integrated into the learning process. Qualitative sources
of information, such as observations, demonstrations and conversations, play an
important role. This approach reflects a sociocultural paradigm of assessment. In
both AfL approaches, feedback is used to steer future learning; however, in the measurement approach, feedback might be less immediate and feedback loops less frequent. The AfL approach described in this study leans towards the inquiry approach,
as described by Klenowski (2009).
In AfL literature, classroom dialogues are stressed as an important learning activity. This idea is theoretically underpinned by metacognitivism, social cultural theory
and social constructivism. Learning is viewed as a social activity; learning occurs
through interaction. Thus, knowledge and skills are believed to depend on the context, and to exist in the relationships amongst the individuals involved in that context. As a result, assessment should not be seen as an individual activity (Elwood,
2006), reflecting the sociocultural paradigm of assessment (Moss et al., 2005).
AfL is aimed at the quality of the learning process instead of its outcomes (e.g. a
grade). This goal stimulates a learning-oriented rather than an outcome-oriented
classroom culture (Stobart, 2008) and resists the traditionally dominant psychometric
paradigm of assessment. AfL makes it possible to anticipate weaker points in the
current learning process and identify further steps to take for improvement (ARG,
1999). Students have a central role in the learning process; as a result, they actively
participate in the evaluation of their own learning (Elwood & Klenowski, 2002).
Furthermore, AfL aims to increase learner autonomy, motivation and reflection by
facilitating an inquiry-oriented and interactive classroom climate (Klenowski, 2009).
Implementation of AfL: aggregation level, assessment methods and feedback
loops
Aggregation level
The AfL approach takes place within the classroom; it concerns decisions about the
entire class or individual students. The information used to make decisions is gathered from students.
Assessment methods
The data used to inform decisions can come from various assessment sources, such
as paper-and-pencil tests, dialogues (e.g. questioning and discussions), practical
demonstrations of learning, portfolios, peer assessment or self-assessment (Gipps,
1994). Hence, the evidence gathered about the learning process of the learners can
be both qualitative and quantitative in nature. These assessment events can be
planned as well as unplanned, and be formal or informal. Continuous interactions
between learners and the teacher characterise the process.
The quality of the assessment process depends largely on the teacher’s capability to identify usable data about student learning, make inferences
about student learning, and translate this information into instructional decisions and
feedback to students (Bennett, 2011). Thus, the assessment quality depends on the
degree to which assessment results provide actionable information for formative purposes over the short term, which is a low-stakes type of use. The central role of the
student is also emphasised in AfL; students need to understand and (be willing to)
act on feedback (Ruiz-Primo & Furtak, 2006).
Feedback loops
AfL takes place in everyday practice; continuous dialogues and feedback loops characterise the process, in which (immediate) feedback is used to direct further learning
(Stobart, 2008). Since assessments are integrated into the learning process, assessment opportunities are plentiful, and feedback loops are usually short. Moreover,
students are stimulated to assess themselves and their peers, which, amongst other
things, stimulates students’ understanding of what and why they are learning
(Elwood & Klenowski, 2002). Based on the evidence gathered, continuous
adaptation takes place to meet learners’ needs. Thus, the majority of the feedback
loops are interactive in nature, but retroactive or proactive loops also occur.
Theoretical underpinnings of DT
The aim of DT is to identify the learner’s developmental stages by obtaining action-oriented, fine-grained assessment data, also referred to as process data (Rupp,
Gushta, Mislevy, & Shaffer, 2010). By using cognitive theories, process data can be
interpreted and used to identify misconceptions and knowledge associated with the
learner’s developmental stage. The fine-grained data resulting from DT measurements, compared with those from regular assessments, make DT exceptionally useful for formative purposes. DT is the result of the need for assessments that combine
the psychometric and the sociocultural paradigms. Assessment is individual and
employs quantitative methods, but the results are not intended to be generalised to
other contexts or be compared to other students.
Currently, cognitive diagnostic assessment (CDA) is the most commonly used approach to diagnostic testing. Cognitive in this context indicates the use of cognitive and developmental psychology research as input for the design of diagnostic
assessments. By connecting assessments to research regarding learning progression,
teachers are enabled to interpret students’ task behaviour in relation to the stipulated
learning trajectory to find out whether redirection is needed (Gravemeijer, 2004).
Learning progressions, also known as hypothetical learning trajectories (HLT, e.g.
Simon, 1995), consist of hypotheses about the possible ways students could develop
a certain ability (Corcoran, Mosher, & Rogat, 2009; Daro, Mosher, & Corcoran,
2011; Furtak, 2012; Gravemeijer, 2004).
The assumption in DT is that how a task is solved is indicative of the developmental stage of the learner. Collecting data about the procedural steps the learner
takes during an assessment can identify the learner’s (inadequate) reasoning styles,
and wrongly executed procedural steps caused by misconceptions and prior knowledge, amongst other things (Crisp, 2012; Keeley & Tobey, 2011).
DT is based on principles from cognitivism (Leighton & Gierl, 2007a, 2007b).
Furthermore, Stobart described diagnosing student learning needs as identifying
‘how much progress can be made with adult help … ’ (2008, p. 55), that is, the zone
of proximal development (Vygotsky, 1978). The fine-grained process data obtained
with DT are particularly useful for creating scaffolds designed to meet the learner’s
needs. In this way, DT is related to Vygotsky’s sociocultural learning theory, where
assessment focuses on identifying scaffolds that help students reach the next zone of
development.
Implementation of DT: aggregation level, assessment methods and feedback loops
Aggregation level
DT concerns the assessment of the educational needs of individual students. Because
of the nature of the instruments used in DT, data should not be aggregated to levels
higher than the individual level (Rupp et al., 2010). Furthermore, DT is not meant
for comparing students to one another, but for promoting the learning and developmental process of individual students.
Assessment methods
In order to make inferences about the problem-solving process during an assessment, the assessment tasks should be designed to permit valid inferences about how the student’s task behaviour relates back to his or her thinking. This
inferential chain stems from the empirical knowledge available from information-processing theories, cognitive psychology and learning trajectories (Daro et al.,
2011; Leighton & Gierl, 2007a; Verhofstadt-Denève et al., 2003). Based on theoretical assumptions and empirical research, the items in an assessment have certain
characteristics that are assumed to elicit a response behaviour related to the learner’s
developmental stage (Leighton & Gierl, 2007a).
The degree to which the assessment results are indicative of the developmental
stage of a student is crucial to the quality of the assessment methods used in DT.
Although including more items with the same characteristics in the assessment will
increase the certainty of the inferences about related misconceptions, it will also make
the assessment process less efficient (Rupp et al., 2010). For example, if the aim is to
identify an arithmetic misconception, and a student makes an associated error on one
item, it is possible that this error is caused by something other than that particular misconception. However, when the student consistently shows the same error on several
items with similar characteristics, the inference about the misconception becomes
stronger. Nevertheless, choosing detail over certainty, in terms of test accuracy, is not problematic with short feedback loops, because these provide the opportunity to redirect the decisions made. In this case, the stakes in terms of possible negative consequences for the learner are relatively small (Rupp et al., 2010).
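As an illustration (not drawn from the studies cited above), the reasoning that consistent errors across several items with the same characteristic strengthen a misconception inference can be sketched as a simple tally. The function name, the minimum number of items and the threshold are hypothetical choices for the sketch, not parameters from the DT literature:

```python
def misconception_evidence(matches, min_items=3, threshold=0.8):
    """Heuristic strength of evidence for a diagnosed misconception.

    matches: one boolean per administered item sharing the diagnostic
    characteristic; True if the student's error matches the pattern
    predicted by the misconception.
    Returns (proportion, flagged): the proportion of matching errors and
    whether the evidence is strong enough to flag the misconception.
    """
    if not matches:
        return 0.0, False
    proportion = sum(matches) / len(matches)
    # A single matching error is weak evidence (it may have another
    # cause), so require several items before flagging.
    flagged = len(matches) >= min_items and proportion >= threshold
    return proportion, flagged

# One matching error: plausibly caused by something else.
print(misconception_evidence([True]))                     # (1.0, False)
# Consistent errors across several similar items: stronger inference.
print(misconception_evidence([True, True, True, True]))   # (1.0, True)
```

The sketch also shows the efficiency trade-off the text describes: each extra item with the same characteristic raises certainty but lengthens the assessment.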
Moreover, to cope with this trade-off between grain size and certainty about
inferences, assessment developers in DT often consider the design of (computerised)
adaptive tests, meaning that the selection of the next item depends on the student’s
response to the previous item (Eggen, 2004). Adaptivity offers the possibility to
make the assessment process more efficient; items can be chosen based on their content and difficulty, for example, to diagnose a student’s strategy choice. Sometimes,
these types of assessments are referred to as dynamic assessments, which are usually
embedded in a computerised adaptive learning environment. This means that when a
student cannot solve a task, he or she will receive a minimally intrusive hint. In this
way, the materials are used for both assessment and learning, by providing
diagnostic information about a student’s learning needs and item-based feedback
(Stevenson, Hickendorff, Resing, Heiser, & de Boeck, 2013).
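The adaptive selection principle described above — the next item depends on the student’s response to the previous item — can be sketched minimally as follows. The fixed up/down difficulty step and the toy item pool are illustrative simplifications; operational adaptive tests select items using item response theory (Eggen, 2004), not this rule:

```python
import bisect

def next_item(item_pool, current_difficulty, last_correct, step=0.5):
    """Choose the next item from a difficulty-sorted pool.

    Simple up/down rule standing in for IRT-based selection: aim for a
    harder item after a correct response, an easier one after an
    incorrect response, and pick the pool item closest to that target.
    """
    target = current_difficulty + (step if last_correct else -step)
    difficulties = [d for d, _ in item_pool]
    i = bisect.bisect_left(difficulties, target)
    # Consider the neighbours around the insertion point, clamped to
    # the pool bounds, and take the one closest to the target.
    candidates = [c for c in (i - 1, i) if 0 <= c < len(item_pool)]
    best = min(candidates, key=lambda c: abs(difficulties[c] - target))
    return item_pool[best]

# A toy pool of (difficulty, item id) pairs, sorted by difficulty.
pool = [(-1.0, "A"), (-0.5, "B"), (0.0, "C"), (0.5, "D"), (1.0, "E")]
print(next_item(pool, 0.0, last_correct=True))   # (0.5, 'D')
print(next_item(pool, 0.0, last_correct=False))  # (-0.5, 'B')
```

Selecting on content as well as difficulty, as the text notes for diagnosing strategy choice, would add a content filter before the distance comparison.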
Feedback loops
Although DT has the potential to be used for retroactive, proactive or interactive formative assessment, it is primarily used retroactively (Crisp, 2012; Stobart, 2008). In
dynamic assessments, DT is used interactively; learning and assessment are integrated. When DT focuses on the assessment of prior knowledge to plan instruction,
it is used proactively. Finally, when DT is used to identify, for example, misconceptions or buggy problem-solving strategies, feedback is used for remediation, resulting in a retroactive feedback loop. Short feedback loops in DT are preferred because
the learner’s thinking and use of problem-solving strategies are highly likely to
change over time. However, delayed feedback could still be effective when the
change in the learner’s thinking and the development of new strategies cover longer
periods of time. Thus, the length of feedback loops should match the student’s learning curve for the subject matter that is the assessment’s objective. A mismatch
between the two might result in negative consequences, hindering the optimisation
of the learning process.
Comparison of the three approaches
This section addresses the theoretical differences and similarities amongst the three approaches. Furthermore, the variations in the implementation of each approach are explored.
Theoretical underpinnings of DBDM, AfL and DT
To explore the similarities and differences in the dominant assessment paradigms of
DBDM, AfL and DT, we compared the underlying learning theories of these
approaches and their goals (Table 1).
Table 1 shows that DBDM, AfL and DT are underpinned by different dominant assessment paradigms. Consequently, the goals of the three approaches differ
substantially; each approach aims to promote learning through different mechanisms,
which results in different expectations of the roles of teachers, students and other
actors in the learning, assessment and feedback process.

Table 1. Comparison of DBDM, AfL and DT regarding the theoretical underpinnings.

DBDM. Dominant assessment paradigm: psychometric. Goals: improve the quality of education and the quality of instruction by using data to monitor and steer practices to reach intended goals (e.g. increased student achievement).

AfL. Dominant assessment paradigm: sociocultural. Goals: improve the quality of the learning process by engaging learners to evaluate and reflect on their own learning and steering the learning process through continuous feedback.

DT. Dominant assessment paradigm: psychometric and sociocultural. Goals: collect fine-grained data about a student’s zone of proximal development, prior knowledge and reasoning styles that can inform decisions on adapting the learning environment to the learner’s needs.

These expectations sometimes contradict each other; for example, in traditional views on DBDM, the responsibility for the assessment process is primarily in the teacher’s hands, in line with a psychometric paradigm. On the other hand, in AfL, the teacher and students share
this responsibility in line with a sociocultural paradigm, for example, in the form of
self- and peer assessment (Stobart, 2008; Wiliam, 2011). Recent literature on
DBDM, however, shows a shift towards shared teacher–student responsibility for
assessment (Schildkamp et al., 2013).
Implementation of DBDM, AfL and DT
To explore the consequences of the similarities and differences in DBDM, AfL and
DT for implementing these approaches in educational practice, we compared their
aggregation levels, assessment methods and feedback loops (Table 2). Figure 1 shows
the overlapping levels of the decisions in the three approaches. In DBDM, data are
aggregated at the school level to make decisions with regard to improving the school’s
quality (formative evaluation). Additionally, data are used at the class and student levels to adjust instruction to meet students’ needs (formative assessment). The latter
overlaps with AfL. DT solely focuses on assessment and instructional decisions at the
student level. Because the three approaches aim to promote learning at different
aggregation levels, they can be considered to complement each other.
The diversity in the goals of DBDM, AfL and DT is associated with variation in the assessment methods that are typically used, resulting from the varying
dominant assessment paradigms. For example, AfL uses classroom conversations,
while DBDM and DT often employ standardised tests. The different choices in the
use of assessment methods are primarily associated with the nature of the data, and
the purposes and stakes regarding the use of these data.

Table 2. Comparison of the implementation of DBDM, AfL and DT.

DBDM. Level: school, class, student. Assessment methods: standardised assessments, formal classroom assessments, structured classroom observations. Feedback loops: immediate and delayed feedback; retroactive.

AfL. Level: class, student. Assessment methods: informal classroom dialogues, formal classroom assessments, practical demonstrations, portfolios, peer assessments, self-assessments. Feedback loops: immediate feedback; interactive, sometimes retroactive or proactive.

DT. Level: student. Assessment methods: (adaptive) tests with items that elicit detailed information about a student’s reasoning. Feedback loops: immediate and delayed feedback; mostly retroactive, potentially proactive or interactive.

Figure 1. Overlapping levels of the decisions in the three approaches.

In DBDM, most data are
quantitative in nature; especially at the school level, high-quality data are needed as
the stakes are often higher. In contrast, most data are qualitative in AfL, because
they mainly aim to provide immediate information, which informs decisions on how
to direct learning processes. These are low-stakes decisions; if the adaptations in the learning environment do not produce the intended effects, this will quickly become clear from subsequent assessments, whereupon the adaptation strategy can be changed. Thus, the adaptation process is flexible. In DT, fine-grained, quantitative data
are usually gathered and translated into qualitative statements on which teachers can
take immediate action. Although DT uses quantitative data similar to DBDM, the
quality requirements are different from those of DBDM. The stakes associated with
formative decisions for which DT is used are lower as the consequences are not irreversible.
With respect to feedback mechanisms, we found the use of feedback loops in all
three approaches. However, because the approaches aim at formative assessment
and formative evaluation at different levels, these feedback loops also take place at
various levels and frequencies. In DBDM, the retroactive feedback loops that occur
at the school level are spread out over time. In AfL, continuous dialogues and feedback loops are essential, which results in short, frequently interactive, and sometimes retroactive or proactive feedback loops. Regarding DT, the length of feedback
loops should match the student’s learning curve for the subject matter that is the
assessment’s objective.
Discussion
The DBDM, AfL and DT approaches are all seen as powerful ways to support and
enhance student learning. Educational practice implements a mix of these
approaches, but in order to jointly use the three approaches in an effective way,
awareness of the goals, possibilities and limitations of each approach is essential.
The differences amongst the implementation of assessment approaches stem from
differences in their theoretical underpinnings (Stobart, 2008) and their associated
assessment paradigms. This study compared the similarities and differences in the
theoretical bases and implementation of DBDM, AfL and DT.
Our comparison suggests that the original theoretical underpinnings of the
approaches differ in their definitions of learning and dominant assessment paradigms. Nevertheless, all approaches increasingly recognise that assessment focus
should be both on the learning process and on the learning outcomes. Over the
years, the approaches have been borrowing best practices from one another, without
paying specific attention to why certain techniques benefit student learning. This has
led to practices that are sometimes hard to trace back to a specific assessment
approach.
It is important to realise that the various assessment approaches have different
relevance at various stages in the learning process. Moreover, a comprehensive set
of assessment methods that are underpinned by different learning theories are needed
to fully grasp the complexity of learning at all levels. If one wants to use assessments formatively, one should acknowledge which learning mechanisms are applicable for decision making at the school, class or student level. Integrating the three
assessment approaches in a thoughtful way can lead to more valid formative decisions. Decisions are no longer based on a single data type at one aggregation level
based on one dominant assessment paradigm, but on multiple data sources gathered
from multiple perspectives at different aggregation levels. Integrating the assessment
approaches will enable schools to capture various aspects of their curriculum and
the different learning activities of their students. Consequently, school staff will be
able to continuously provide feedback at the school, class and individual levels, to
guide and enhance student learning.
To integrate the approaches, different feedback loops should be simultaneously
active on each level in schools. At different points in the education process, retroactive, interactive or proactive feedback loops can be used to optimise student learning. However, in order for this to be effective, being aware of which approach is
most appropriate in a certain context is an important starting point. At the school
level, DBDM can be used, for example, to monitor curriculum goals, to group students differently to enhance learning and to improve the quality of education. Moreover, DBDM can be applied to monitor student achievement goals at the class level.
Similarly, DBDM can be employed to monitor individual progress. The DBDM
approach in current practice is often connected to the use of standardised external
assessments; therefore, feedback loops usually extend over a longer period of time.
Data in DBDM are objective but decontextualised and often limited in scope, and it
is up to educators to determine how to reach the goals.
The AfL approach can be used at the class and individual levels to improve the
quality of the learning process by engaging learners to evaluate and reflect on their
own learning, and steering the learning process through continuous feedback. This
approach relies to a large extent on rich, qualitative, informally gathered sources of information in a local context, which are highly informative for directing learning and teaching in daily classroom practice. Nevertheless, teachers’ inferences are
likely to be biased to some extent (Bennett, 2011); therefore, standardised assessments should be used once in a while to check on students’ learning outcomes in
terms of overall curriculum goals and standards in line with a DBDM approach.
Regular assessments can often indicate a level of performance, but do not usually
provide information on the causes of this performance.
DT can be employed at the individual level to collect fine-grained data about a
student’s zone of proximal development, prior knowledge and reasoning styles. The
detailed evidence gathered can inform decisions on how students can best be taught
and what is needed to adapt the learning environment to the learner’s needs.
This study shows that although differences exist amongst the theoretical underpinnings of DBDM, AfL and DT, implementing an overarching formative assessment and formative evaluation approach could lead to more valid decisions at
different levels in schools. We initially explored what it might mean to integrate
these approaches. Although the comparison highlighted the promising possibilities
of an overarching formative assessment approach, a crucial question remains: how
to create the right balance amongst three approaches, each of which makes unique
contributions to assessment practices. Educators need clarity on why, how and when
assessment should be used by whom, and there is also a need to triangulate assessment evidence to inform decisions about individual learners and the class as a
whole. It might be that further professional development is required to enable educators to successfully implement an integrated formative assessment approach. Future
research is needed to examine the actual thoughtful integration of these three
approaches with their associated challenges and opportunities.
Acknowledgements
The authors would like to thank the many people who supported and inspired us on this
endeavour. An earlier version of this paper was published in Fabienne van der Kleij’s doctoral dissertation (2013).
Funding
This research was supported by Cito, Institute for Educational Measurement in the
Netherlands, and the University of Twente in the Netherlands. The paper was presented at the
14th annual conference of the Association for Educational Assessment Europe (AEA-E), Paris
2013, with support from the AEA-E Kathleen Tattersall New Researcher Award. Further
partial support in developing this paper was received from Australian Catholic University.
Notes on contributors
Fabienne M. Van der Kleij completed her PhD at the Research Centre for Examinations and
Certification (RCEC), a collaboration between Cito and the University of Twente in the
Netherlands. She conducted her research at Cito’s Psychometric Research Centre. Her
specialisations are: feedback effectiveness, computer-based assessments, Assessment for
Learning, data-driven decision making and diagnostic testing. She is currently employed as a
research fellow at Australian Catholic University.
Jorine A. Vermeulen is a PhD student at RCEC. She conducts her research at Cito’s Psychometric Research Centre. Her specialisations are diagnostic testing, primary school mathematics, classroom assessment, process data and the use of tablets in assessment.
Kim Schildkamp is an associate professor in the Faculty of Behavioural Sciences of the
University of Twente. In 2007, she obtained her PhD on school self-evaluation. Her research,
in the Netherlands but also in other countries, focuses on ‘data-based decision making for
school improvement’. She is a board member of the International Congress for School
Effectiveness and Improvement (ICSEI) and chair of the ICSEI data use network. She has
published widely on the use of data.
Theo J.H.M. Eggen is a senior research scientist at the Psychometric Research Centre of Cito
and full professor of psychometrics at the University of Twente. Consultancy and research on
educational and psychometrical issues of test development are his main activities. His specialisations are item response theory, quality of tests, (inter)national assessment and computerised adaptive testing. He is the author of numerous research articles and chapters of
textbooks. He is the scientific director of RCEC.
References
Assessment Reform Group. (1999). Assessment for learning: Beyond the black box. Retrieved from http://assessmentreformgroup.files.wordpress.com/2012/01/beyond_blackbox.pdf
Assessment Reform Group. (2002). Assessment is for learning: 10 principles. Research-based principles to guide classroom practice. Retrieved from http://assessmentreformgroup.files.wordpress.com/2012/01/10principles_english.pdf
Bennett, R. E. (2011). Formative assessment: A critical review. Assessment in Education:
Principles, Policy & Practice, 18, 5–25. doi:10.1080/0969594X.2010.513678
Bernhardt, V. L. (2003). Using data to improve student achievement. Educational Leadership, 60(5), 26–30. Retrieved from http://www.ascd.org/publications/educational-leadership/feb03/vol60/num05/No-Schools-Left-Behind.aspx
Black, P., & Wiliam, D. (1998). Assessment and classroom learning. Assessment in Education: Principles, Policy & Practice, 5, 7–74. doi:10.1080/0969595980050102
Black, P., & Wiliam, D. (2009). Developing the theory of formative assessment. Educational
Assessment, Evaluation and Accountability, 21, 5–31. doi:10.1007/s11092-008-9068-5
Bloom, B. S. (1969). Some theoretical issues relating to educational evaluation. In R. W.
Tyler (Ed.), Educational evaluation: New roles, new means (National Society for the
Study of Education Yearbook, Vol. 68, Part 2, pp. 26–50). Chicago, IL: University of
Chicago Press.
Breiter, A., & Light, D. (2006). Data for school improvement: Factors for designing effective
information systems to support decision-making in schools. Educational Technology &
Society, 9, 206–217. Retrieved from http://www.ifets.info/journals/9_3/18.pdf
Briggs, D. C., Ruiz-Primo, M. A., Furtak, E., Shepard, L., & Yin, Y. (2012). Meta-analytic
methodology and inferences about the efficacy of formative assessment. Educational
Measurement: Issues and Practice, 31, 13–17. doi:10.1111/j.1745-3992.2012.00251.x
Brookhart, S. B. (2007). Expanding views about formative classroom assessment: A review
of the literature. In J. H. McMillan (Ed.), Formative classroom assessment: Theory into
practice (pp. 43–62). New York, NY: Teachers College Press.
Brown, A. (1987). Metacognition, executive control, self-regulation, and other more mysterious mechanisms. In F. E. Weinert & R. H. Kluwe (Eds.), Metacognition, motivation, and
understanding (pp. 65–116). Hillsdale, NJ: Lawrence Erlbaum.
Butler, D. L., & Winne, P. H. (1995). Feedback and self-regulated learning: A theoretical
synthesis. Review of Educational Research, 65, 245–281. doi:10.3102/00346543065003245
Clark, I. (2012). Formative assessment: Assessment is for self-regulated learning. Educational Psychology Review, 24, 205–249. doi:10.1007/s10648-011-9191-6
Coburn, C. E., Toure, J., & Yamashita, M. (2009). Evidence, interpretation, and persuasion:
Instructional decision making in the district central office. Teachers College Record, 111,
1115–1161. Retrieved from http://www.tcrecord.org/Content.asp?ContentId=15232
Coburn, C. E., & Turner, E. O. (2011). Research on data use: A framework and analysis.
Measurement, 9, 173–206. doi:10.1080/15366367.2011.626729
Coburn, C. E., & Turner, E. O. (2012). The practice of data use: An introduction. American
Journal of Education, 118, 99–111. doi:10.1086/663272
Corcoran, T., Mosher, F. A., & Rogat, A. (2009). Learning progressions in science: An evidence-based approach to reform (CPRE Research Report 63). New York, NY: Center on
Continuous Instructional Improvement, Columbia University. Retrieved from http://files.eric.ed.gov/fulltext/ED506730.pdf
Crisp, G. T. (2012). Integrative assessment: Reframing assessment practice for current and
future learning. Assessment & Evaluation in Higher Education, 37, 33–43. doi:10.1080/02602938.2010.494234
Daro, P., Mosher, F. A., & Corcoran, T. (2011). Learning trajectories in mathematics: A
foundation for standards, curriculum, assessment, and instruction (CPRE Research
Report #RR-68). Philadelphia, PA: Consortium for Policy Research in Education, University of Pennsylvania Graduate School of Education.
Earl, L. M., & Katz, S. (2006). Leading schools in a data-rich world: Harnessing data for
school improvement. Thousand Oaks, CA: Corwin.
Eggen, T. J. H. M. (2004). Contributions to the theory and practice of computerized adaptive
testing (Doctoral dissertation). University of Twente, Enschede. Retrieved from http://
www.cito.nl/~/media/cito_nl/Files/Onderzoek%20en%20wetenschap/cito_dissertatie_
theo_eggen.ashx
Elwood, J. (2006). Gender issues in testing and assessment. In C. Skelton, B. Francis, & L.
Smulyan (Eds.), Handbook of gender and education (pp. 262–278). Thousand Oaks, CA:
Sage.
Elwood, J., & Klenowski, V. (2002). Creating communities of shared practice: The challenges of assessment use in learning and teaching. Assessment & Evaluation in Higher
Education, 27, 243–256. doi:10.1080/0260293022013860
Evans, C. (2013). Making sense of assessment feedback in higher education. Review of Educational Research, 83, 70–120. doi:10.3102/0034654312474350
Furtak, E. M. (2012). Linking a learning progression for natural selection to teachers’ enactment of formative assessment. Journal of Research in Science Teaching, 49, 1181–1210.
doi:10.1002/tea.21054
Gipps, C. (1994). Beyond testing: Towards a theory of educational assessment. London:
Falmer.
Gravemeijer, K. (2004). Local instruction theories as means of support for teachers in reform
mathematics education. Mathematical Thinking and Learning, 6, 105–128. doi:10.1207/s15327833mtl0602_3
Hargreaves, E. (2005). Assessment for learning? Thinking outside the (black) box.
Cambridge Journal of Education, 35, 213–224. doi:10.1080/03057640500146880
Harlen, W. (2010). What is quality teacher assessment? In J. Gardner, W. Harlen, L.
Hayward, & G. Stobart (Eds.), Developing teacher assessment (pp. 29–52). Maidenhead:
Open University Press.
Hattie, J., & Gan, M. (2011). Instruction based on feedback. In P. Alexander & R. E. Mayer
(Eds.), Handbook of research on learning and instruction (pp. 249–271). New York, NY:
Routledge.
Ikemoto, G. S., & Marsh, J. A. (2007). Cutting through the data-driven mantra: Different
conceptions of data-driven decision making. In P. A. Moss (Ed.), Evidence and decision
making (pp. 105–131). Malden, MA: Wiley-Blackwell. Retrieved from http://www.rand.org/content/dam/rand/pubs/reprints/2009/RAND_RP1372.pdf
Ingram, D., Louis, S. K., & Schroeder, R. G. (2004). Accountability policies and teacher
decision making: Barriers to the use of data to improve practice. Teachers College
Record, 106, 1258–1287. doi:10.1111/j.1467-9620.2004.00379.x
Johnson, M., & Burdett, N. (2010). Intention, interpretation and implementation: Some paradoxes of assessment for learning across educational contexts. Research in Comparative
and International Education, 5, 122–130. doi:10.2304/rcie.2010.5.2.122
Jonassen, D. H. (1991). Evaluating constructivist learning. Educational Technology, 31(9),
28–33.
Keeley, P., & Tobey, C. R. (2011). Mathematics formative assessment. Thousand Oaks, CA:
Corwin.
Klenowski, V. (2009). Assessment for learning revisited: An Asia-Pacific perspective. Assessment in Education: Principles, Policy & Practice, 16, 263–268. doi:10.1080/
09695940903319646
Kulhavy, R. W., & Stock, W. A. (1989). Feedback in written instruction: The place of
response certitude. Educational Psychology Review, 1, 279–308. doi:10.1007/
BF01320096
Lai, M. K., & Schildkamp, K. (2013). Data-based decision making: An overview. In
K. Schildkamp, M. K. Lai, & L. Earl (Eds.), Data-based decision making in education:
Challenges and opportunities (pp. 9–21). Dordrecht: Springer. doi:10.1007/978-94-007-4816-3
Ledoux, G., Blok, H., Boogaard, M., & Krüger, M. (2009). Opbrengstgericht werken. Over
waarde van meetgestuurd onderwijs [Data-driven decision making. About the value of
measurement oriented education]. SCO-Rapport 812. Amsterdam: SCO-Kohnstamm Instituut. Retrieved from http://dare.uva.nl/document/170475
Leighton, J. P., & Gierl, M. J. (2007a). Defining and evaluating models of cognition used in
educational measurement to make inferences about examinees’ thinking processes. Educational Measurement: Issues and Practice, 26, 3–16. doi:10.1111/j.1745-3992.2007.00090.x
Leighton, J. P., & Gierl, M. J. (Eds.). (2007b). Cognitive diagnostic assessment for education.
Theory and applications. New York, NY: Cambridge University Press.
Lesgold, A. (2004). Contextual requirements for constructivist learning. International Journal
of Educational Research, 41, 495–502. doi:10.1016/j.ijer.2005.08.014
Mandinach, E., Honey, M., Light, D., & Brunner, C. (2008). A conceptual framework for
data-driven decision-making. In E. B. Mandinach & M. Honey (Eds.), Data-driven
school improvement: Linking data and learning (pp. 13–31). New York, NY: Teachers
College Press.
Mandinach, E. B., & Jackson, S. S. (2012). Transforming teaching and learning through
data-driven decision making. Thousand Oaks, CA: Corwin.
Marsh, J. A. (2012). Interventions promoting educators’ use of data: Research insights and
gaps. Teachers College Record, 114(11), 1–48.
Marsh, J. A., Pane, J. F., & Hamilton, L. S. (2006). Making sense of data-driven decision
making in education. Evidence from recent RAND research. Santa Monica, CA: RAND
Corporation.
Ministry of Education, British Columbia, Canada. (2002). Accountability framework.
Retrieved from http://www.bced.gov.bc.ca/policy/policies/accountability_framework.htm
Moss, P. A., Pullin, D., Gee, J. P., & Haertel, E. H. (2005). The idea of testing: Psychometric
and sociocultural perspectives. Measurement: Interdisciplinary Research and Perspectives, 3, 63–83. doi:10.1207/s15366359mea0302_1
Narciss, S. (2008). Feedback strategies for interactive learning tasks. In J. M. Spector, M. D.
Merrill, J. J. G. van Merriënboer, & M. P. Driscoll (Eds.), Handbook of research on educational communications and technology (3rd ed., pp. 125–144). Mahwah, NJ: Lawrence
Erlbaum Associates.
Nicol, D., & Macfarlane-Dick, D. (2006). Formative assessment and self-regulated learning:
A model and seven principles of good feedback practice. Studies in Higher Education,
31, 199–218. doi:10.1080/03075070600572090
Ruiz-Primo, M. A., & Furtak, E. M. (2006). Informal formative assessment and scientific
inquiry: Exploring teachers’ practices and student learning. Educational Assessment, 11,
237–263. doi:10.1080/10627197.2006.9652991
Rupp, A. A., Gushta, M., Mislevy, R. J., & Shaffer, D. W. (2010). Evidence-centered design
of epistemic games: Measurement principles for complex learning environments. The
Journal of Technology, Learning, and Assessment, 8. Retrieved from http://napoleon.bc.
edu/ojs/index.php/jtla/article/viewFile/1623/1467
Sadler, D. R. (1989). Formative assessment and the design of instructional systems. Instructional Science, 18, 119–144. doi:10.1007/BF00117714
Sanders, P. (2011). Het doel van toetsen [The purpose of testing]. In P. Sanders (Ed.), Toetsen
op school [Testing at school] (pp. 9–20). Arnhem: Cito. Retrieved from http://www.cito.
nl/~/media/cito_nl/Files/Onderzoek%20en%20wetenschap/cito_toetsen_op_school.ashx
Schildkamp, K., & Kuiper, W. (2010). Data-informed curriculum reform: Which data, what
purposes, and promoting and hindering factors. Teaching and Teacher Education, 26,
482–496. doi:10.1016/j.tate.2009.06.007
Schildkamp, K., & Lai, M. K. (2013). Conclusions and a data use framework. In K. Schildkamp,
M. K. Lai, & L. Earl (Eds.), Data-based decision making in education: Challenges and
opportunities (pp. 177–191). Dordrecht: Springer. doi:10.1007/978-94-007-4816-3
Schildkamp, K., Lai, M. K., & Earl, L. (Eds.). (2013). Data-based decision making in education: Challenges and opportunities. Dordrecht: Springer. doi:10.1007/978-94-007-4816-3
Schildkamp, K., & Poortman, C. L. (in press). Factors influencing the functioning of data
teams. Teachers College Record, 117(5).
Shepard, L. A. (2005, October). Formative assessment: Caveat emptor. Paper presented at
the ETS Invitational Conference, The Future of Assessment: Shaping Teaching and
Learning, New York, NY.
Shuell, T. (1986). Cognitive conceptions of learning. Review of Educational Research, 56,
411–436. doi:10.3102/00346543056004411
Simon, M. (1995). Reconstructing mathematics pedagogy from a constructivist perspective.
Journal for Research in Mathematics Education, 26, 114–145. Retrieved from http://
www.jstor.org/stable/749205
Slavin, R. E. (2002). Evidence-based education policies: Transforming educational practice
and research. Educational Researcher, 31(7), 15–21. doi:10.3102/0013189X031007015
Slavin, R. E. (2003). A reader’s guide to scientifically based research. Educational Leadership, 60(5), 12–16. Retrieved from http://www.ascd.org/publications/educational-leader
ship/feb03/vol60/num05/A-Reader’s-Guide-to-Scientifically-Based-Research.aspx
Sluijsmans, D., Joosten-ten Brinke, D., & Van der Vleuten, C. (submitted). Toetsen met
leerwaarde: Een reviewstudie naar de effectieve kenmerken van formatief toetsen [Testing
with learning value: A review study on the effective characteristics of formative assessment]. Manuscript submitted for publication.
Stevenson, C. E., Hickendorff, M., Resing, W. C. M., Heiser, W. J., & de Boeck, P. A. L.
(2013). Explanatory item response modelling of children’s change on a dynamic test of
analogical reasoning. Intelligence, 41, 157–168. doi:10.1016/j.intell.2013.01.003
Stobart, G. (2008). Testing times: The uses and abuses of assessment. Abingdon: Routledge.
Supovitz, J. (2010). Knowledge-based organizational learning for instructional improvement.
In A. Hargreaves, A. Lieberman, M. Fullan, & D. Hopkins (Eds.), Second international
handbook of educational change (pp. 707–723). New York, NY: Springer. doi:10.1007/
978-90-481-2660-6
Swan, G., & Mazur, J. (2011). Examining data driven decision making via formative assessment: A confluence of technology, data interpretation heuristics and curricular policy.
Contemporary Issues in Technology and Teacher Education, 11, 205–222. Retrieved from
http://www.editlib.org/p/36021
Thurlings, M., Vermeulen, M., Bastiaens, T., & Stijnen, S. (2013). Understanding feedback:
A learning theory perspective. Educational Research Review, 9, 1–15. doi:10.1016/j.edurev.2012.11.004
Timperley, H. (2009). Using assessment data for improving teaching practice. Australian College of Educators, 8(3), 21–27. Retrieved from http://oksowhat.wikispaces.com/file/view/
Using+assessment+data+Helen+Timperley.pdf
Verhofstadt-Denève, L., Van Geert, P., & Vyt, A. (2003). Handboek ontwikkelingspsychologie. Grondslagen en theorieën [Handbook of developmental psychology. Principles and
theories]. Houten: Bohn Stafleu Van Loghum.
Vygotsky, L. S. (1978). Mind in society. Cambridge, MA: Harvard University Press.
Wayman, J. C., Jimerson, J. B., & Cho, V. (2012). Organizational considerations in establishing the data-informed district. School Effectiveness and School Improvement: An International Journal of Research, Policy and Practice, 23, 159–178. doi:10.1080/
09243453.2011.652124
Wayman, J. C., Spikes, D. D., & Volonnino, M. (2013). Implementation of a data initiative
in the NCLB era. In K. Schildkamp, M. K. Lai, & L. Earl (Eds.), Data-based decision
making in education: Challenges and opportunities (pp. 135–153). Dordrecht: Springer. doi:10.1007/978-94-007-4816-3
Wayman, J. C., & Stringfield, S. (2006). Data use for school improvement: School practices
and research perspectives. American Journal of Education, 112, 463–468. doi:10.1086/
505055
Wiliam, D. (2011). What is assessment for learning? Studies in Educational Evaluation, 37,
3–14. doi:10.1016/j.stueduc.2011.03.001