Assessment in Education: Principles, Policy & Practice, 2015
http://dx.doi.org/10.1080/0969594X.2014.999024

Integrating data-based decision making, Assessment for Learning and diagnostic testing in formative assessment

Fabienne M. Van der Kleij (a,b)*, Jorine A. Vermeulen (a,b), Kim Schildkamp (b) and Theo J.H.M. Eggen (a,b)

(a) Psychometric Research Centre, Cito Institute for Educational Measurement, Arnhem, The Netherlands; (b) Faculty of Behavioural Science, University of Twente, Enschede, The Netherlands

(Received 21 July 2014; accepted 10 December 2014)

Recent research has highlighted the lack of a uniform definition of formative assessment, although its effectiveness is widely acknowledged.
This paper addresses the theoretical differences and similarities amongst three approaches to formative assessment that are currently most frequently discussed in the educational research literature: data-based decision making (DBDM), Assessment for Learning (AfL) and diagnostic testing (DT). Furthermore, the differences and similarities in the implementation of each approach were explored. This analysis shows that although differences exist amongst the theoretical underpinnings of DBDM, AfL and DT, the combination of these approaches can create more informed learning environments. The thoughtful integration of the three assessment approaches should lead to more valid formative decisions, if a range of evidence about student learning is used to continuously optimise student learning.

Keywords: formative assessment; data-based decision making; assessment for learning; diagnostic testing; theoretical comparison

*Corresponding author. Email: fabienne.vanderkleij@acu.edu.au

Introduction

The complex interdependencies amongst learning, teaching and assessment are increasingly being recognised. Assessment encompasses the use of a broad spectrum of processes and instruments for gathering evidence about student learning, such as paper-and-pencil tests, projects or observations (Stobart, 2008). In education, a distinction is made between summative assessment and formative assessment. Whenever assessment results are intended to play a role in making a decision about the mastery of a defined content domain, the assessment has a summative purpose, for example, in decisions regarding selection, classification, certification or placement (Sanders, 2011). Over shorter periods of time, assessments that serve a summative purpose usually involve grading, which eventually adds up to a pass/fail decision. If assessment results are intended to steer the learning process, assessment is formative in purpose. Some researchers have defined assessments as formative based on the actual use of assessment information to support learning (e.g. Black & Wiliam, 1998). However, the currently dominant view in the literature is that the fundamental distinction between formative and summative assessment lies in their purposes (e.g. Bennett, 2011). The purposes of summative and formative assessment are not mutually exclusive; they can coexist as primary and secondary purposes of the same assessment (Bennett, 2011).

The effectiveness of formative assessment is widely acknowledged. However, these effectiveness claims are not always well grounded, partly because of the lack of a uniform definition of the concept of formative assessment (Bennett, 2011). The term formative was first applied in the context of individual student learning by Bloom in 1969, describing assessment with the purpose of providing feedback to direct future learning (although the term used was formative evaluation, not formative assessment). Feedback is widely recognised as a crucial aspect of formative assessment (Bennett, 2011; Brookhart, 2007; Sadler, 1989; Shepard, 2005; Stobart, 2008), and can be defined from two perspectives (Sadler, 1989): (1) the information resulting from assessments that provides teachers and other stakeholders with insights into student learning, and (2) the feedback provided to students based on their responses to an assessment task.
The first form of feedback can be used by educators to adapt instruction to the needs of learners, whereas the second can be used to directly steer students' learning processes. Formative assessment is a broad concept that comprises many definitions, such as Assessment for Learning (AfL) and diagnostic testing (DT) (Bennett, 2011; Johnson & Burdett, 2010), and these definitions have expanded over time (Brookhart, 2007) to include, for example, self-regulated learning (e.g. Clark, 2012; Nicol & Macfarlane-Dick, 2006). While numerous researchers have attempted to define formative assessment (Black & Wiliam, 2009), to date a widely accepted definition has not emerged in the literature. In this paper, we define formative assessment broadly as any assessment that is intended to support learning. Formative assessment can be seen as an umbrella term that covers various approaches to assessment intended to support learning, approaches that rest on different underlying learning theories (Briggs, Ruiz-Primo, Furtak, Shepard, & Yin, 2012). The term approach captures the underlying principles and intentions that shape particular assessment uses.

Furthermore, it is helpful to make a distinction between formative evaluation and formative assessment (Harlen, 2010; Shepard, 2005). The term formative evaluation refers to the use of assessment data to make decisions concerning the quality of education at a higher aggregation level than the level of the learner or the class. Data from summative assessment can also be used for formative evaluation (e.g. the use of assessment data for policy development at the school level). Formative assessment, however, only concerns decisions at the levels of the learner and the class to accommodate pupils' individual educational needs.

This paper examines the theoretical differences and similarities amongst three approaches to formative assessment that are currently most frequently discussed in the educational research literature. The main feature that these approaches have in common is that the evidence gathered is interpreted and can subsequently be used to change the learning environment in order to meet learners' needs (Wiliam, 2011). However, the way student learning is defined and the nature of the evidence obtained differ to some extent within each approach. The first approach is data-based decision making (DBDM), which originated in the USA as a direct consequence of the No Child Left Behind (NCLB) Act. Within the NCLB, learning outcomes are defined in terms of results and attaining specified targets (Wayman, Spikes, & Volonnino, 2013). Second, Assessment for Learning (AfL), originally introduced by scholars from the UK (Assessment Reform Group [ARG], 1999), is an assessment approach that focuses on the quality of the learning process, rather than merely on students' (final) learning outcomes, with an emphasis on feedback to students (Stobart, 2008). Finally, diagnostic testing (DT) was initially used to refer students to special education, particularly those diagnosed as unable to participate in mainstream educational settings (Stobart, 2008). DT provides detailed assessment data about a learner's problem-solving processes, which can indicate what a student needs to improve his or her learning process and learning outcomes (Crisp, 2012; Keeley & Tobey, 2011). DBDM and DT can be used for both formative and summative assessment; this paper focuses on their potential formative uses.
Within some of the approaches, terminology and definitions are inappropriately used interchangeably; therefore, it is valuable to review and compare the theoretical underpinnings of DBDM, AfL and DT. For example, literature on DBDM tends to cite literature concerning AfL, but not vice versa (e.g. Swan & Mazur, 2011). Moreover, discussions in the assessment literature seem to revolve around finding evidence of what works. As Elwood (2006) pointed out, these discussions do not acknowledge the complexity of the use of assessment for learning enhancement, and lead to what she calls 'quick fixes' (p. 226). Ignoring the differences in the theoretical underpinnings has led to theoretical ambiguity in the assessment literature, as shown by studies that have entangled the terminology and definitions of the three approaches. As a result, it is not feasible to study the effectiveness of each approach separately. Bennett (2011) also stressed this ambiguity in the use of definitions: 'Definition is important because if we can't clearly define an innovation, we can't meaningfully document its effectiveness' (p. 8).

Currently, a mix of these approaches is implemented in educational practice, without awareness of which approach is most suitable for a particular goal. In order to jointly use the three approaches in an effective way, awareness of the possibilities and limitations of each approach is essential. However, due to the lack of clarity on the definitions of these approaches and their goals, the implications for practice are currently unclear. In addition, the ambiguity in the use of definitions is reflected in educational policy, which further contributes to confusion about how formative assessment should be implemented in practice. For example, in British Columbia, Canada, DBDM and AfL are the pillars of the programme of the Ministry of Education (2002). However, while DBDM mainly focuses on what has to be learned, AfL and DT seem to emphasise how students (best) learn what has to be learned, and the quality of the learning process (Stobart, 2008). Nevertheless, all three approaches stress the importance of using feedback for learning enhancement, although the procedures advised regarding the provision of feedback differ substantially.

Understanding the relation between the theoretical underpinnings and the prescriptions on why, how and when assessment should be used by learners, teachers and schools to enhance learning is needed to move the field of educational assessment forward. It is important to recognise the differences between these approaches as an initial exploration of what it might mean to blend them in a meaningful way. With this comparative theoretical analysis, we aim to contribute to a more coherent research agenda within the field concerning the effectiveness of educational assessment programmes incorporating these approaches. Although we acknowledge that some of the differences between these approaches may be the result of variations in the instigating contextual factors, this paper focuses on conceptual rather than contextual issues. This is relevant because combinations of these approaches are currently being used across various contexts, without taking into account differences in conceptual underpinnings.
Further, while our analysis does not attempt to address the competing issues of summative and formative assessment, reference is made to summative assessment as a means of clarifying the issues discussed. Note that we do not intend to make any claims about which assessment approach is most effective for improving student learning.

This paper explores the similarities and differences in the theoretical underpinnings of DBDM, AfL and DT. Also, the implications for implementing DBDM, AfL and DT in educational practice are identified and contrasted. The paper aims to identify possibilities and potential stumbling blocks for the thoughtful integration of the three approaches.

Learning theories and assessment paradigms

It is remarkable that the literature about assessment approaches rarely makes explicit its theoretical assumptions about learning. For example, a recent review on formative assessment found that studies relating formative assessment to a learning theory are scarce (Sluijsmans, Joosten-ten Brinke, & Van der Vleuten, submitted). As formative assessment lies at the intersection of instruction and assessment, its approaches are influenced by both learning theories and assessment paradigms. Implementing a system-wide formative assessment approach requires an alignment of assessment practices, which starts with an understanding of the learning theories behind currently dominant approaches (Elwood, 2006) and their impacts on assessment and feedback.

The following five learning theories are most prominent in the current assessment literature, and are relevant to our comparison of the three assessment approaches: neobehaviourism, cognitivism, metacognitivism, social cultural theory and (social) constructivism (Thurlings, Vermeulen, Bastiaens, & Stijnen, 2013). The formative concept originated from neobehaviourism, as introduced by Bloom (1969). As the dominant theory of learning from the 1930s, neobehaviourism focused on behavioural rather than cognitive mechanisms of learning (Stobart, 2008; Verhofstadt-Denève, Van Geert, & Vyt, 2003). In contrast, cognitivists such as Piaget focused on changes in cognitive structures rather than in behaviour (Verhofstadt-Denève et al., 2003). Cognitivism highlights information processing and knowledge representation in memory, rather than learning mechanisms (Shuell, 1986). In metacognitivism, the emphasis is on learning how to learn and on regulating learning processes by regularly providing feedback (Butler & Winne, 1995). In 1978, Vygotsky introduced the social cultural theory of learning. In this theory, feedback in the form of scaffolding is the most important learning mechanism for supporting the acquisition of knowledge and skills that learners are not yet able to apply on their own. Learning occurs through social interactions and dialogues between the learner and the teacher or his or her peers. Vygotsky believed that to promote student learning, assessments should focus on what students are able to learn, rather than on what they have learned so far (Verhofstadt-Denève et al., 2003). In constructivism, learning is seen as a cyclic process in which new knowledge and skills are built on prior ones through continuous adaptation of the learning environment to the learners' needs (Jonassen, 1991; Stobart, 2008).
In social constructivism, the learners' active role is emphasised, and teachers are expected to actively engage learners in constructing knowledge and developing skills by frequently providing elaborated feedback (Thurlings et al., 2013).

Assessment approaches are underpinned not only by learning theories, but also by distinct assessment paradigms. A historically dominant field of study in assessment is psychometrics, which relies on the use of standardised, objective and quantified evidence, typically for summative purposes (Moss, Pullin, Gee, & Haertel, 2005). Assessments in this psychometric paradigm are designed to capture individual student ability, achievement or affect, compared against norm-referenced or criterion-referenced standards, independent of the context in which the assessment tasks were constructed. As a consequence of these summative purposes, assessment was seen as an activity separate from teaching and learning, and no clear links were made to learning theories. A contrasting assessment paradigm originates from a sociocultural point of view, in which the learner is inextricably connected with and involved in the environment in which learning occurs. Assessment in a sociocultural paradigm is highly situated and does not directly allow for generalisations of performance to other situations. The assessment methods employed aim to examine learning in a particular context and take into account the relationships between learners and the broader community (Moss et al., 2005). These assessment paradigms have different theoretical and historical foundations, and employ significantly different methods for gathering evidence about student learning and for making decisions based on this evidence.

While there is no direct alignment between learning theories and assessment paradigms, assessment methods based on different learning theories can be ordered on a continuum from psychometric to sociocultural. Neobehaviourism and cognitivism typically employ assessment strategies in line with the psychometric paradigm. Assessment from a neobehaviourist perspective emphasises memorisation of facts, and feedback is intended to reinforce correct recall of these facts (Hattie & Gan, 2011; Narciss, 2008). These facts are seen as independent of the context in which they have been taught (Stobart, 2008). Because in cognitivism the outcome of learning is a change in cognitive structures, which can only be inferred from performance after the fact, the accompanying assessment and teaching practices are primarily retroactive in nature, meaning that remediation is used to redirect the learning process and promote learning (Stobart, 2008). Feedback is often intended to correct incorrect responses (Kulhavy & Stock, 1989; Thurlings et al., 2013), and is provided by an expert to a passive learner (Evans, 2013), although the characteristics of the learner and the task are taken into account. In metacognitivism, assessment is aimed at metacognitive knowledge and skills. The feedback message is usually about how the learner learns, rather than about what the learner learns (Brown, 1987; Stobart, 2008). This approach leans towards a sociocultural paradigm of assessment, because the emphasis is on how a learner learns in a particular context. Assessment in practices based on social cultural theory naturally aligns with a sociocultural assessment paradigm.
However, although Vygotsky's social cultural theory resulted in an international shift in teaching practices, retroactive assessment practices, which focus on remediation of individual abilities, have remained popular (Elwood, 2006; Stobart, 2008). Thus, although learning is seen as a sociocultural interactive activity, assessment remains mostly an individual activity in practice. Collaborative learning and solving real-world problems, which use peer feedback as an important learning mechanism, characterise social constructivist learning environments (Lesgold, 2004; Stobart, 2008; Thurlings et al., 2013). Although individual assessment still occurs in practice in constructivist and social constructivist learning environments, these learning theories clearly align with a sociocultural paradigm of assessment.

Analysis of the three approaches

In the following sections, this paper discusses the theoretical underpinnings of each approach in terms of its origin, definition, goals and assessment paradigms. We recognise the importance of understanding the underlying learning theories in the foundations of the three assessment approaches. However, in comparing the approaches, the emphasis remains on the differences in their assessment paradigms, as these provide clear implications for practice. Next, the implementation of each approach is discussed in terms of aggregation levels, assessment methods and feedback loops.

Theoretical underpinnings of DBDM

Teachers make many instructional decisions intuitively (Ingram, Louis, & Schroeder, 2004; Slavin, 2002, 2003). However, educational policies such as the NCLB Act have caused an increase in accountability requirements, which has stimulated the use of data for informing school practice in the USA (Wayman, Jimerson, & Cho, 2012). Using data to inform decisions in the school, for example about instruction, is referred to as DBDM (Ledoux, Blok, Boogaard, & Krüger, 2009). Schildkamp and Kuiper (2010) defined DBDM as 'systematically analyzing existing data sources within the school, applying outcomes of analyses to innovate teaching, curricula, and school performance, and, implementing (e.g. genuine improvement actions) and evaluating these innovations' (p. 482). Data in the context of schools are defined as 'information that is systematically collected and organized to represent some aspect of schooling' (Lai & Schildkamp, 2013, p. 10). This definition is broad and includes any relevant information derived from qualitative and quantitative measurements, but the main emphasis is on objective data (Lai & Schildkamp, 2013; Wayman et al., 2012). Data include not only assessment results, but also other data types, such as student background characteristics. Data use can be described as a complex and interpretive process, in which data have to be identified, collected, analysed and interpreted to become meaningful and useful for action (Coburn, Toure, & Yamashita, 2009; Coburn & Turner, 2012). The impact of the resulting actions is evaluated by gathering new data, thus creating a feedback loop (Mandinach & Jackson, 2012).

Early initiatives in DBDM were based on neobehaviourism and cognitivism (Stobart, 2008), which meant that no explicit attention was paid to the sociocultural environment in which learning occurred. Previously, DBDM focused on reaching predetermined goals, checking whether the goals had been achieved and adapting the learning environment where needed.
This process was mainly transmissive in nature, meaning that educational facilitators (e.g. teachers) were responsible for delivering adequate instruction to learners. In this view, learning is an individual activity, and assessments are used to check on the individual student's ability (Elwood, 2006), in line with a psychometric paradigm (Moss et al., 2005). As a consequence of this view, assessment methods used for DBDM, such as standardised tests, do not account for the variety of contexts in which the learning occurred.

More recently, however, DBDM seems to have moved somewhat towards a sociocultural paradigm, focusing on continuously adapting learning environments to facilitate and optimise learning processes, taking into account learners' needs and individual characteristics. Thus, instead of just acknowledging the context or controlling for it, the emphasis is on the process of data use within a particular context (Coburn & Turner, 2011; Schildkamp, Lai, & Earl, 2013; Supovitz, 2010). While this movement is evident in many countries, it has not occurred in all. By using data, teachers can set appropriate learning goals, given students' current achievements. Subsequently, teachers can assess and monitor whether students are reaching their goals, and if necessary, adjust their instruction (Bernhardt, 2003; Earl & Katz, 2006). In this way, DBDM is used for formative assessment. Data can also be used for formative evaluation by school leaders and teachers for policy development and school improvement planning, teacher development and monitoring the implementation of the school's goals (Schildkamp et al., 2013; Schildkamp & Kuiper, 2010).

Implementation of DBDM: aggregation level, assessment methods and feedback loops

Aggregation level
Data collection in DBDM takes place at the school, class and student levels, and data can be gathered from different stakeholders. At the student and class levels, assessment outcomes are an important source of information about how learning processes could be improved for both students and teachers. Data can also be used at the school level for school development purposes, for example, to increase aggregated student achievement (Schildkamp & Lai, 2013).

Assessment methods
Different data types can be used for school and instructional development. The data type most often referred to is objective output data from standardised tests, for example, from a student-monitoring system. However, such data are less frequently available than those from informal assessment situations, such as homework assignments. In addition to these formally gathered data, teachers possess data collected using various standardised assessment methods, curriculum-embedded assessments and (structured) observations from daily practice (Ikemoto & Marsh, 2007). Effective DBDM requires access to high-quality data, because the quality of the decision depends upon the quality of the data used (Coburn & Turner, 2011; Schildkamp & Kuiper, 2010). For the implementation of DBDM, schools need access to multiple sources of high-quality data (especially if the stakes are high) and therefore need a good data use infrastructure (Breiter & Light, 2006; Wayman & Stringfield, 2006).

Feedback loops
The most frequently used kind of feedback in DBDM is feedback based on assessment data.
Teachers and other educators have to transform assessment data into meaningful actions for educational improvement. These actions include making changes in practice and providing students with feedback on their learning processes and outcomes (Schildkamp et al., 2013). The primary user of feedback is the teacher at the class level. DBDM starts with a purpose, often in the form of a problem with regard to student achievement. Assessment data serve as a form of feedback for teachers and students in identifying gaps between the current and the desired level of achievement. Subsequently, data are collected to investigate the possible causes of the discrepancy. These data have to be filtered, organised, analysed and interpreted to become useful information. Combined with stakeholder understanding and expertise, this information becomes actionable knowledge. On this basis, actions such as adjusting instruction can be taken, or the need to collect new data may become apparent. If actions are taken, evaluation needs to occur once again to ascertain whether these changes resolved the discrepancy, creating another feedback loop (Mandinach, Honey, Light, & Brunner, 2008; Marsh, 2012; Marsh, Pane, & Hamilton, 2006; Schildkamp & Poortman, in press). The length of these cycles of inquiry and feedback loops varies, but it is advised that teachers engage in continuous cycles of inquiry (Timperley, 2009). These feedback loops can be relatively long when they involve standardised or commercially available assessments, which are often administered only once or twice a year. The majority of these loops are retroactive in nature, meaning that achievement gaps are identified and addressed on the basis of data.
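
To make the shape of such an inquiry cycle concrete, a minimal sketch in Python follows. It is our illustration rather than part of the original article or of any existing DBDM tool; the target value, subject domains, scores and function name are all hypothetical.

    # A toy rendering of one retroactive DBDM pass: assessment data reveal a
    # gap between current and desired achievement, further analysis locates a
    # likely cause, an action is chosen, and the next assessment round
    # evaluates that action, closing the feedback loop.
    # The target value, domain names and scores are hypothetical.

    TARGET = 75.0  # desired class-level achievement on a 0-100 scale

    def dbdm_cycle(scores_by_domain):
        """One pass through a class-level DBDM inquiry cycle."""
        # 1. Use assessment data as feedback: where does the class stand?
        all_scores = [s for scores in scores_by_domain.values() for s in scores]
        overall = sum(all_scores) / len(all_scores)
        if overall >= TARGET:
            return "Goal reached; monitor and set a new goal."
        # 2. Filter, organise, analyse and interpret data to locate a cause.
        weakest = min(scores_by_domain,
                      key=lambda d: sum(scores_by_domain[d]) / len(scores_by_domain[d]))
        # 3. Act on the resulting knowledge; the next assessment evaluates it.
        return "Overall %.1f below target: re-teach '%s' and re-assess." % (overall, weakest)

    print(dbdm_cycle({"fractions": [55.0, 60.0], "geometry": [80.0, 85.0]}))
    # -> Overall 70.0 below target: re-teach 'fractions' and re-assess.

In practice, of course, the interpretation and action steps involve professional judgement that no script captures; the sketch only fixes the order of the steps in the loop.
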
Theoretical underpinnings of AfL

AfL was originally introduced by UK scholars as a form of resistance to the emphasis on summative uses of assessment (Stobart, 2008). This approach focuses specifically on the quality of the learning process rather than on its outcomes. Moreover, 'it puts the focus on what is being learned and on the quality of classroom interactions and relationships' (Stobart, 2008, p. 145). The ARG defined AfL as ' ... the process of seeking and interpreting evidence for use by learners and their teachers to decide where the learners are in their learning, where they need to go and how best to get there' (2002, p. 2). However, this definition was often misinterpreted (Johnson & Burdett, 2010; Klenowski, 2009). For this reason, Klenowski (2009) reported on what she referred to as a 'second-generation definition' of AfL, which will be used in this paper: 'part of everyday practice by students, teachers and peers that seeks, reflects upon and responds to information from dialogue, demonstration and observation in ways that enhance ongoing learning' (p. 264).

Hargreaves (2005) concluded that there are two approaches within AfL: a measurement approach and an inquiry approach. In the measurement approach, in line with a psychometric paradigm, AfL is viewed as an activity that includes marking, monitoring and showing a level. In this view, (quantitative) data are used to formulate feedback and to inform decisions. Assessment is seen as an activity separate from instruction that shows to what degree a predetermined level has been achieved. This approach resembles the definition of DBDM, and does not reflect the original intentions of the AfL approach as formulated by the ARG (2002). In the inquiry approach, AfL is a process of discovering, reflecting, understanding and reviewing. It is focused on the process, and assessments are integrated into the learning process. Qualitative sources of information, such as observations, demonstrations and conversations, play an important role. This approach reflects a sociocultural paradigm of assessment. In both AfL approaches, feedback is used to steer future learning; however, in the measurement approach, feedback might be less immediate and feedback loops less frequent. The AfL approach described in this study leans towards the inquiry approach, as described by Klenowski (2009).

In the AfL literature, classroom dialogues are stressed as an important learning activity. This idea is theoretically underpinned by metacognitivism, social cultural theory and social constructivism. Learning is viewed as a social activity; learning occurs through interaction. Thus, knowledge and skills are believed to depend on the context, and to exist in the relationships amongst the individuals involved in that context. As a result, assessment should not be seen as an individual activity (Elwood, 2006), reflecting the sociocultural paradigm of assessment (Moss et al., 2005). AfL is aimed at the quality of the learning process instead of its outcomes (e.g. a grade). This goal stimulates a learning-oriented rather than an outcome-oriented classroom culture (Stobart, 2008) and resists the traditionally dominant psychometric paradigm of assessment. AfL makes it possible to anticipate weaker points in the current learning process and to identify further steps to take for improvement (ARG, 1999). Students have a central role in the learning process; as a result, they actively participate in the evaluation of their own learning (Elwood & Klenowski, 2002). Furthermore, AfL aims to increase learner autonomy, motivation and reflection by facilitating an inquiry-oriented and interactive classroom climate (Klenowski, 2009).

Implementation of AfL: aggregation level, assessment methods and feedback loops

Aggregation level
The AfL approach takes place within the classroom; it concerns decisions about the entire class or individual students. The information used to make decisions is gathered from students.

Assessment methods
The data used to inform decisions can come from various assessment sources, such as paper-and-pencil tests, dialogues (e.g. questioning and discussions), practical demonstrations of learning, portfolios, peer assessment or self-assessment (Gipps, 1994). Hence, the evidence gathered about students' learning processes can be both qualitative and quantitative in nature. These assessment events can be planned as well as unplanned, and formal or informal. Continuous interactions between learners and the teacher characterise the process. The quality of the assessment process depends largely on the teacher's capability to identify usable data about student learning, make inferences about student learning, and translate this information into instructional decisions and feedback to students (Bennett, 2011). Thus, assessment quality depends on the degree to which assessment results provide actionable information for formative purposes over the short term, which is a low-stakes type of use.
The central role of the student is also emphasised in AfL; students need to understand and (be willing to) act on feedback (Ruiz-Primo & Furtak, 2006).

Feedback loops
AfL takes place in everyday practice; continuous dialogues and feedback loops characterise the process, in which (immediate) feedback is used to direct further learning (Stobart, 2008). Since assessments are integrated into the learning process, assessment opportunities are plentiful, and feedback loops are usually short. Moreover, students are stimulated to assess themselves and their peers, which, amongst other things, furthers their understanding of what they are learning and why (Elwood & Klenowski, 2002). Based on the evidence gathered, continuous adaptation takes place to meet learners' needs. Thus, the majority of the feedback loops are interactive in nature, but retroactive or proactive loops also occur.

Theoretical underpinnings of DT

The aim of DT is to identify the learner's developmental stage by obtaining action-oriented, fine-grained assessment data, also referred to as process data (Rupp, Gushta, Mislevy, & Shaffer, 2010). Using cognitive theories, process data can be interpreted and used to identify misconceptions and knowledge associated with the learner's developmental stage. Compared with regular assessments, the fine-grained data resulting from DT measurements make it exceptionally useful for formative purposes. DT is the result of the need for assessments that combine the psychometric and the sociocultural paradigms. Assessment is individual and employs quantitative methods, but the results are not intended to be generalised to other contexts or compared with those of other students.

Currently, cognitive diagnostic assessment (CDA) is the most commonly used approach to diagnostic testing. Cognitive in this context indicates the use of cognitive and developmental psychology research as input for the design of diagnostic assessments. By connecting assessments to research on learning progression, teachers are enabled to interpret students' task behaviour in relation to the stipulated learning trajectory and to find out whether redirection is needed (Gravemeijer, 2004). Learning progressions, also known as hypothetical learning trajectories (HLT; e.g. Simon, 1995), consist of hypotheses about the possible ways students could develop a certain ability (Corcoran, Mosher, & Rogat, 2009; Daro, Mosher, & Corcoran, 2011; Furtak, 2012; Gravemeijer, 2004). The assumption in DT is that how a task is solved is indicative of the developmental stage of the learner. Collecting data about the procedural steps the learner takes during an assessment can identify, amongst other things, the learner's (inadequate) reasoning styles and wrongly executed procedural steps caused by misconceptions and prior knowledge (Crisp, 2012; Keeley & Tobey, 2011).

DT is based on principles from cognitivism (Leighton & Gierl, 2007a, 2007b). Furthermore, Stobart described diagnosing student learning needs as identifying 'how much progress can be made with adult help … ' (2008, p. 55), that is, the zone of proximal development (Vygotsky, 1978). The fine-grained process data obtained with DT are particularly useful for creating scaffolds designed to meet the learner's needs.
In this way, DT is related to Vygotsky's social cultural learning theory, in which assessment focuses on identifying scaffolds that help students reach the next zone of development.

Implementation of DT: aggregation level, assessment methods and feedback loops

Aggregation level
DT concerns the assessment of the educational needs of individual students. Because of the nature of the instruments used in DT, data should not be aggregated to levels higher than the individual level (Rupp et al., 2010). Furthermore, DT is not meant for comparing students to one another, but for promoting the learning and developmental process of individual students.

Assessment methods
In order to make inferences about the problem-solving process during an assessment, the assessment tasks should be designed to allow valid inferences about how the student's task behaviour relates to his or her thinking. This inferential chain stems from the empirical knowledge available from information-processing theories, cognitive psychology and learning trajectories (Daro et al., 2011; Leighton & Gierl, 2007a; Verhofstadt-Denève et al., 2003). Based on theoretical assumptions and empirical research, the items in an assessment have certain characteristics that are assumed to elicit response behaviour related to the learner's developmental stage (Leighton & Gierl, 2007a). The degree to which the assessment results are indicative of the developmental stage of a student is crucial to the quality of the assessment methods used in DT.

Although including more items with the same characteristics in the assessment will increase the certainty of the inferences about related misconceptions, it will also make the assessment process less efficient (Rupp et al., 2010). For example, if the aim is to identify an arithmetic misconception, and a student makes an associated error on one item, it is possible that this error is caused by something other than that particular misconception. However, when the student consistently shows the same error on several items with similar characteristics, the inference about the misconception becomes stronger. Nevertheless, choosing detail over certainty, in terms of test accuracy, is not problematic when feedback loops are short, because short loops provide the opportunity to redirect the decisions made. In this case, the stakes, in terms of possible negative consequences for the learner, are relatively small (Rupp et al., 2010). Moreover, to cope with this trade-off between grain size and certainty about inferences, assessment developers in DT often consider the design of (computerised) adaptive tests, meaning that the selection of the next item depends on the student's response to the previous item (Eggen, 2004). Adaptivity offers the possibility of making the assessment process more efficient; items can be chosen based on their content and difficulty, for example, to diagnose a student's strategy choice. Sometimes, these types of assessments are referred to as dynamic assessments, which are usually embedded in a computerised adaptive learning environment. This means that when a student cannot solve a task, he or she receives a minimally intrusive hint. In this way, the materials are used for both assessment and learning, by providing diagnostic information about a student's learning needs and item-based feedback (Stevenson, Hickendorff, Resing, Heiser, & de Boeck, 2013).
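
The reasoning in this subsection about consistent errors, and about trading certainty against efficiency, can be illustrated numerically. The following Python sketch is ours, not the authors' or any particular CDA system's; the error probabilities, the prior and the stopping threshold are hypothetical choices. It treats the misconception inference as a Bayesian update and stops testing once the belief is decisive, mirroring the efficiency argument for adaptivity.

    # An illustration of two points made above:
    # (1) one associated error is weak evidence of a misconception, while
    #     consistent errors on several similar items strengthen the inference;
    # (2) adaptive administration can stop as soon as the inference is decisive,
    #     trading certainty against efficiency.
    # The probabilities, prior and threshold below are hypothetical.

    P_ERROR_IF_M = 0.9      # P(associated error | misconception present)
    P_ERROR_IF_NOT_M = 0.2  # P(associated error | no misconception), e.g. a slip

    def update_belief(prior, made_error):
        """One Bayesian update of P(misconception) after one item response."""
        like_m = P_ERROR_IF_M if made_error else 1 - P_ERROR_IF_M
        like_not_m = P_ERROR_IF_NOT_M if made_error else 1 - P_ERROR_IF_NOT_M
        return like_m * prior / (like_m * prior + like_not_m * (1 - prior))

    def diagnose(responses, prior=0.5, threshold=0.95):
        """Present similar items until the belief about the misconception is decisive."""
        belief = prior
        for made_error in responses:  # stands in for live adaptive administration
            belief = update_belief(belief, made_error)
            if belief > threshold or belief < 1 - threshold:
                break  # decisive either way: stop testing for efficiency
        return belief

    print(round(diagnose([True]), 2))        # 0.82: one error could still be a slip
    print(round(diagnose([True, True]), 2))  # 0.95: consistent errors -> decisive

With these illustrative numbers, a single associated error leaves the belief at about .82, whereas two consistent errors push it past the .95 threshold, at which point an adaptive test could stop presenting items of this type.
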
Feedback loops
Although DT has the potential to be used for retroactive, proactive or interactive formative assessment, it is primarily used retroactively (Crisp, 2012; Stobart, 2008). In dynamic assessments, DT is used interactively; learning and assessment are integrated. When DT focuses on the assessment of prior knowledge to plan instruction, it is used proactively. Finally, when DT is used to identify, for example, misconceptions or buggy problem-solving strategies, feedback is used for remediation, resulting in a retroactive feedback loop. Short feedback loops are preferred in DT because the learner's thinking and use of problem-solving strategies are highly likely to change over time. However, delayed feedback can still be effective when changes in the learner's thinking and the development of new strategies cover longer periods of time. Thus, the length of feedback loops should match the student's learning curve for the subject matter that is the assessment's objective. A mismatch between the two might result in negative consequences, hindering the optimisation of the learning process.

Comparison of the three approaches

This section addresses the theoretical differences and similarities amongst the three approaches. Furthermore, the variations in the implementation of each approach are explored.

Theoretical underpinnings of DBDM, AfL and DT
To explore the similarities and differences in the dominant assessment paradigms of DBDM, AfL and DT, we compared the underlying learning theories of these approaches and their goals (Table 1).

Table 1. Comparison of DBDM, AfL and DT regarding the theoretical underpinnings.

DBDM
  Dominant assessment paradigm: Psychometric.
  Goals: Improve the quality of education and the quality of instruction by using data to monitor and steer practices to reach intended goals (e.g. increased student achievement).

AfL
  Dominant assessment paradigm: Sociocultural.
  Goals: Improve the quality of the learning process by engaging learners to evaluate and reflect on their own learning and steering the learning process through continuous feedback.

DT
  Dominant assessment paradigm: Psychometric and sociocultural.
  Goals: Collect fine-grained data about a student's zone of proximal development, prior knowledge and reasoning styles that can inform decisions on adapting the learning environment to the learner's needs.

Table 1 shows that DBDM, AfL and DT are underpinned by different dominant assessment paradigms. Consequently, the goals of the three approaches differ substantially; each approach aims to promote learning through different mechanisms, which results in different expectations of the roles of teachers, students and other actors in the learning, assessment and feedback process. These expectations sometimes contradict each other; for example, in traditional views on DBDM, the responsibility for the assessment process lies primarily in the teacher's hands, in line with a psychometric paradigm. In AfL, on the other hand, the teacher and students share this responsibility, in line with a sociocultural paradigm, for example, in the form of self- and peer assessment (Stobart, 2008; Wiliam, 2011). Recent literature on DBDM, however, shows a shift towards shared teacher–student responsibility for assessment (Schildkamp et al., 2013).
Implementation of DBDM, AfL and DT
To explore the consequences of the similarities and differences in DBDM, AfL and DT for implementing these approaches in educational practice, we compared their aggregation levels, assessment methods and feedback loops (Table 2).

Table 2. Comparison of the implementation of DBDM, AfL and DT.

DBDM
  Level: School, class and student.
  Assessment methods: Standardised assessments; formal classroom assessments; structured classroom observations.
  Feedback loops: Immediate and delayed feedback; retroactive.

AfL
  Level: Class and student.
  Assessment methods: Informal classroom dialogues; formal classroom assessments; practical demonstrations; portfolios; peer assessments; self-assessments.
  Feedback loops: Immediate feedback; interactive, sometimes retroactive or proactive.

DT
  Level: Student.
  Assessment methods: (Adaptive) tests with items that elicit detailed information about a student's reasoning.
  Feedback loops: Immediate and delayed feedback; mostly retroactive, potentially proactive or interactive.

Figure 1 shows the overlapping levels of the decisions in the three approaches. In DBDM, data are aggregated at the school level to make decisions with regard to improving the school's quality (formative evaluation). Additionally, data are used at the class and student levels to adjust instruction to meet students' needs (formative assessment). The latter overlaps with AfL. DT solely focuses on assessment and instructional decisions at the student level. Because the three approaches aim to promote learning at different aggregation levels, they can be considered to complement each other.

[Figure 1. Overlapping levels of the decisions in the three approaches.]

The diversity in the goals of DBDM, AfL and DT is associated with the variation in the assessment methods that are typically used, resulting from the varying dominant assessment paradigms. For example, AfL uses classroom conversations, while DBDM and DT often employ standardised tests. The different choices in the use of assessment methods are primarily associated with the nature of the data, and the purposes and stakes regarding the use of these data. In DBDM, most data are quantitative in nature; especially at the school level, high-quality data are needed, as the stakes are often higher. In contrast, most data in AfL are qualitative, because they mainly aim to provide immediate information, which informs decisions on how to direct learning processes. These are low-stakes decisions; if the adaptations in the learning environment do not produce the intended effects, this will quickly become clear from subsequent assessments, whereupon the adaptation strategy can be changed. Thus, the adaptation process is flexible. In DT, fine-grained, quantitative data are usually gathered and translated into qualitative statements on which teachers can take immediate action. Although DT uses quantitative data similar to DBDM, its quality requirements are different from those of DBDM. The stakes associated with formative decisions for which DT is used are lower, as the consequences are not irreversible.

With respect to feedback mechanisms, we found the use of feedback loops in all three approaches. However, because the approaches aim at formative assessment and formative evaluation at different levels, these feedback loops also take place at various levels and frequencies. In DBDM, the retroactive feedback loops that occur at the school level are spread out over time.
In AfL, continuous dialogues and feedback loops are essential, which results in short, frequently interactive, and sometimes retroactive or proactive feedback loops. Regarding DT, the length of feedback loops should match the student's learning curve for the subject matter that is the assessment's objective.

Discussion

The DBDM, AfL and DT approaches are all seen as powerful ways to support and enhance student learning. Educational practice implements a mix of these approaches, but in order to jointly use the three approaches in an effective way, awareness of the goals, possibilities and limitations of each approach is essential. The differences in the implementation of the assessment approaches stem from differences in their theoretical underpinnings (Stobart, 2008) and their associated assessment paradigms. This study compared the similarities and differences in the theoretical bases and implementation of DBDM, AfL and DT. Our comparison suggests that the original theoretical underpinnings of the approaches differ in their definitions of learning and their dominant assessment paradigms. Nevertheless, all approaches increasingly recognise that the focus of assessment should be both on the learning process and on the learning outcomes. Over the years, the approaches have been borrowing best practices from one another, without paying specific attention to why certain techniques benefit student learning. This has led to practices that are sometimes hard to trace back to a specific assessment approach.

It is important to realise that the various assessment approaches have different relevance at various stages in the learning process. Moreover, a comprehensive set of assessment methods underpinned by different learning theories is needed to fully grasp the complexity of learning at all levels. If one wants to use assessments formatively, one should acknowledge which learning mechanisms are applicable for decision making at the school, class or student level. Integrating the three assessment approaches in a thoughtful way can lead to more valid formative decisions. Decisions are then no longer based on a single data type at one aggregation level based on one dominant assessment paradigm, but on multiple data sources gathered from multiple perspectives at different aggregation levels. Integrating the assessment approaches will enable schools to capture various aspects of their curriculum and the different learning activities of their students. Consequently, school staff will be able to continuously provide feedback at the school, class and individual levels, to guide and enhance student learning.

To integrate the approaches, different feedback loops should be simultaneously active at each level in schools. At different points in the education process, retroactive, interactive or proactive feedback loops can be used to optimise student learning. However, in order for this to be effective, being aware of which approach is most appropriate in a certain context is an important starting point. At the school level, DBDM can be used, for example, to monitor curriculum goals, to group students differently to enhance learning and to improve the quality of education. Moreover, DBDM can be applied to monitor student achievement goals at the class level. Similarly, DBDM can be employed to monitor individual progress.
In current practice, the DBDM approach is often connected to the use of standardised external assessments; therefore, feedback loops usually extend over a longer period of time. Data in DBDM are objective but decontextualised and often limited in scope, and it is up to educators to determine how to reach the goals. The AfL approach can be used at the class and individual levels to improve the quality of the learning process by engaging learners to evaluate and reflect on their own learning, and by steering the learning process through continuous feedback. This approach relies to a large extent on rich qualitative and informally gathered sources of information in a local context, which are highly informative for directing learning and teaching in daily classroom practice. Nevertheless, teachers' inferences are likely to be biased to some extent (Bennett, 2011); therefore, standardised assessments should be used once in a while to check on students' learning outcomes in terms of overall curriculum goals and standards, in line with a DBDM approach. Regular assessments can often indicate a level of performance, but do not usually provide information on the causes of this performance. DT can be employed at the individual level to collect fine-grained data about a student's zone of proximal development, prior knowledge and reasoning styles. The detailed evidence gathered can inform decisions on how students can best be taught and what is needed to adapt the learning environment to the learner's needs.

This study shows that although differences exist amongst the theoretical underpinnings of DBDM, AfL and DT, implementing an overarching formative assessment and formative evaluation approach could lead to more valid decisions at different levels in schools. We initially explored what it might mean to integrate these approaches. Although the comparison highlighted the promising possibilities of an overarching formative assessment approach, a crucial question remains: how to create the right balance amongst three approaches, each of which makes unique contributions to assessment practices. Educators need clarity on why, how and when assessment should be used and by whom, and there is also a need to triangulate assessment evidence to inform decisions about individual learners and the class as a whole. It may be that further professional development is required to enable educators to successfully implement an integrated formative assessment approach. Future research is needed to examine the actual thoughtful integration of these three approaches, with its associated challenges and opportunities.

Acknowledgements
The authors would like to thank the many people who supported and inspired us on this endeavour. An earlier version of this paper was published in Fabienne van der Kleij's doctoral dissertation (2013).

Funding
This research was supported by Cito, Institute for Educational Measurement in the Netherlands, and the University of Twente in the Netherlands. The paper was presented at the 14th annual conference of the Association for Educational Assessment Europe (AEA-E), Paris 2013, with support from the AEA-E Kathleen Tattersall New Researcher Award. Further partial support in developing this paper was received from Australian Catholic University.

Notes on contributors
Fabienne M. Van der Kleij completed her PhD at the Research Centre for Examinations and Certification (RCEC), a collaboration between Cito and the University of Twente in the Netherlands. She conducted her research at Cito's Psychometric Research Centre. Her specialisations are feedback effectiveness, computer-based assessments, Assessment for Learning, data-driven decision making and diagnostic testing. She is currently employed as a research fellow at Australian Catholic University.

Jorine A. Vermeulen is a PhD student at RCEC. She conducts her research at Cito's Psychometric Research Centre. Her specialisations are diagnostic testing, primary school mathematics, classroom assessment, process data and the use of tablets in assessment.

Kim Schildkamp is an associate professor in the Faculty of Behavioural Sciences of the University of Twente. In 2007, she obtained her PhD on school self-evaluation. Her research, in the Netherlands but also in other countries, focuses on data-based decision making for school improvement. She is a board member of the International Congress for School Effectiveness and Improvement (ICSEI) and chair of the ICSEI data use network. She has published widely on the use of data.

Theo J.H.M. Eggen is a senior research scientist at the Psychometric Research Centre of Cito and full professor of psychometrics at the University of Twente. Consultancy and research on educational and psychometric issues of test development are his main activities. His specialisations are item response theory, quality of tests, (inter)national assessment and computerised adaptive testing. He is the author of numerous research articles and chapters of textbooks. He is the scientific director of RCEC.

References

Assessment Reform Group. (1999). Assessment for learning: Beyond the black box. Retrieved from http://assessmentreformgroup.files.wordpress.com/2012/01/beyond_blackbox.pdf

Assessment Reform Group. (2002). Assessment is for learning: 10 principles. Research-based principles to guide classroom practice. Retrieved from http://assessmentreformgroup.files.wordpress.com/2012/01/10principles_english.pdf

Bennett, R. E. (2011). Formative assessment: A critical review. Assessment in Education: Principles, Policy & Practice, 18, 5–25. doi:10.1080/0969594X.2010.513678

Bernhardt, V. L. (2003). Using data to improve student achievement. Educational Leadership, 60(5), 26–30. Retrieved from http://www.ascd.org/publications/educational-leadership/feb03/vol60/num05/No-Schools-Left-Behind.aspx

Black, P., & Wiliam, D. (1998). Assessment and classroom learning. Assessment in Education: Principles, Policy & Practice, 5, 7–74. doi:10.1080/0969595980050102

Black, P., & Wiliam, D. (2009). Developing the theory of formative assessment. Educational Assessment, Evaluation and Accountability, 21, 5–31. doi:10.1007/s11092-008-9068-5

Bloom, B. S. (1969). Some theoretical issues relating to educational evaluation. In R. W. Tyler (Ed.), Educational evaluation: New roles, new means (National Society for the Study of Education Yearbook, Vol. 68, Part 2, pp. 26–50). Chicago, IL: University of Chicago Press.

Breiter, A., & Light, D. (2006). Data for school improvement: Factors for designing effective information systems to support decision-making in schools. Educational Technology & Society, 9, 206–217. Retrieved from http://www.ifets.info/journals/9_3/18.pdf
A., Furtak, E., Shepard, L., & Yin, Y. (2012). Meta-analytic methodology and inferences about the efficacy of formative assessment. Educational Measurement: Issues and Practice, 31, 13–17. doi:10.1111/j.1745-3992.2012.00251.x
Brookhart, S. B. (2007). Expanding views about formative classroom assessment: A review of the literature. In J. H. McMillan (Ed.), Formative classroom assessment: Theory into practice (pp. 43–62). New York, NY: Teachers College Press.
Brown, A. (1987). Metacognition, executive control, self-regulation, and other more mysterious mechanisms. In F. E. Weinert & R. H. Kluwe (Eds.), Metacognition, motivation, and understanding (pp. 65–116). Hillsdale, NJ: Lawrence Erlbaum.
Butler, D. L., & Winne, P. H. (1995). Feedback and self-regulated learning: A theoretical synthesis. Review of Educational Research, 65, 245–281. doi:10.3102/00346543065003245
Clark, I. (2012). Formative assessment: Assessment is for self-regulated learning. Educational Psychology Review, 24, 205–249. doi:10.1007/s10648-011-9191-6
Coburn, C. E., Toure, J., & Yamashita, M. (2009). Evidence, interpretation, and persuasion: Instructional decision making in the district central office. Teachers College Record, 111, 1115–1161. Retrieved from http://www.tcrecord.org/Content.asp?ContentId=15232
Coburn, C. E., & Turner, E. O. (2011). Research on data use: A framework and analysis. Measurement, 9, 173–206. doi:10.1080/15366367.2011.626729
Coburn, C. E., & Turner, E. O. (2012). The practice of data use: An introduction. American Journal of Education, 118, 99–111. doi:10.1086/663272
Corcoran, T., Mosher, F. A., & Rogat, A. (2009). Learning progressions in science: An evidence-based approach to reform (CPRE Research Report #RR-63). New York, NY: Center on Continuous Instructional Improvement, Columbia University. Retrieved from http://files.eric.ed.gov/fulltext/ED506730.pdf
Crisp, G. T. (2012). Integrative assessment: Reframing assessment practice for current and future learning. Assessment & Evaluation in Higher Education, 37, 33–43. doi:10.1080/02602938.2010.494234
Daro, P., Mosher, F. A., & Corcoran, T. (2011). Learning trajectories in mathematics: A foundation for standards, curriculum, assessment, and instruction (CPRE Research Report #RR-68). Philadelphia, PA: Consortium for Policy Research in Education, University of Pennsylvania Graduate School of Education.
Earl, L. M., & Katz, S. (2006). Leading schools in a data-rich world: Harnessing data for school improvement. Thousand Oaks, CA: Corwin.
Eggen, T. J. H. M. (2004). Contributions to the theory and practice of computerized adaptive testing (Doctoral dissertation). University of Twente, Enschede. Retrieved from http://www.cito.nl/~/media/cito_nl/Files/Onderzoek%20en%20wetenschap/cito_dissertatie_theo_eggen.ashx
Elwood, J. (2006). Gender issues in testing and assessment. In C. Skelton, B. Francis, & L. Smulyan (Eds.), Handbook of gender and education (pp. 262–278). Thousand Oaks, CA: Sage.
Elwood, J., & Klenowski, V. (2002). Creating communities of shared practice: The challenges of assessment use in learning and teaching. Assessment & Evaluation in Higher Education, 27, 243–256. doi:10.1080/0260293022013860
Evans, C. (2013). Making sense of assessment feedback in higher education. Review of Educational Research, 83, 70–120. doi:10.3102/0034654312474350
Furtak, E. M. (2012).
Linking a learning progression for natural selection to teachers’ enactment of formative assessment. Journal of Research in Science Teaching, 49, 1181–1210. doi:10.1002/tea.21054
Gipps, C. (1994). Beyond testing: Towards a theory of educational assessment. London: Falmer.
Gravemeijer, K. (2004). Local instruction theories as means of support for teachers in reform mathematics education. Mathematical Thinking and Learning, 6, 105–128. doi:10.1207/s15327833mtl0602_3
Hargreaves, E. (2005). Assessment for learning? Thinking outside the (black) box. Cambridge Journal of Education, 35, 213–224. doi:10.1080/03057640500146880
Harlen, W. (2010). What is quality teacher assessment? In J. Gardner, W. Harlen, L. Hayward, & G. Stobart (Eds.), Developing teacher assessment (pp. 29–52). Maidenhead: Open University Press.
Hattie, J., & Gan, M. (2011). Instruction based on feedback. In P. Alexander & R. E. Mayer (Eds.), Handbook of research on learning and instruction (pp. 249–271). New York, NY: Routledge.
Ikemoto, G. S., & Marsh, J. A. (2007). Cutting through the data-driven mantra: Different conceptions of data-driven decision making. In P. A. Moss (Ed.), Evidence and decision making (pp. 105–131). Malden, MA: Wiley-Blackwell. Retrieved from http://www.rand.org/content/dam/rand/pubs/reprints/2009/RAND_RP1372.pdf
Ingram, D., Louis, K. S., & Schroeder, R. G. (2004). Accountability policies and teacher decision making: Barriers to the use of data to improve practice. Teachers College Record, 106, 1258–1287. doi:10.1111/j.1467-9620.2004.00379.x
Johnson, M., & Burdett, N. (2010). Intention, interpretation and implementation: Some paradoxes of assessment for learning across educational contexts. Research in Comparative and International Education, 5, 122–130. doi:10.2304/rcie.2010.5.2.122
Jonassen, D. H. (1991). Evaluating constructivist learning. Educational Technology, 31(9), 28–33.
Keeley, P., & Tobey, C. R. (2011). Mathematics formative assessment. Thousand Oaks, CA: Corwin.
Klenowski, V. (2009). Assessment for learning revisited: An Asia-Pacific perspective. Assessment in Education: Principles, Policy & Practice, 16, 263–268. doi:10.1080/09695940903319646
Kulhavy, R. W., & Stock, W. A. (1989). Feedback in written instruction: The place of response certitude. Educational Psychology Review, 1, 279–308. doi:10.1007/BF01320096
Lai, M. K., & Schildkamp, K. (2013). Data-based decision making: An overview. In K. Schildkamp, M. K. Lai, & L. Earl (Eds.), Data-based decision making in education: Challenges and opportunities (pp. 9–21). Dordrecht: Springer. doi:10.1007/978-94-007-4816-3
Ledoux, G., Blok, H., Boogaard, M., & Krüger, M. (2009). Opbrengstgericht werken. Over waarde van meetgestuurd onderwijs [Data-driven decision making: On the value of measurement-oriented education] (SCO-Rapport 812). Amsterdam: SCO-Kohnstamm Instituut. Retrieved from http://dare.uva.nl/document/170475
Leighton, J. P., & Gierl, M. J. (2007a). Defining and evaluating models of cognition used in educational measurement to make inferences about examinees’ thinking processes. Educational Measurement: Issues and Practice, 26, 3–16. doi:10.1111/j.1745-3992.2007.00090.x
Leighton, J. P., & Gierl, M. J. (Eds.). (2007b). Cognitive diagnostic assessment for education: Theory and applications. New York, NY: Cambridge University Press.
Lesgold, A. (2004). Contextual requirements for constructivist learning.
International Journal of Educational Research, 41, 495–502. doi:10.1016/j.ijer.2005.08.014
Mandinach, E., Honey, M., Light, D., & Brunner, C. (2008). A conceptual framework for data-driven decision-making. In E. B. Mandinach & M. Honey (Eds.), Data-driven school improvement: Linking data and learning (pp. 13–31). New York, NY: Teachers College Press.
Mandinach, E. B., & Jackson, S. S. (2012). Transforming teaching and learning through data-driven decision making. Thousand Oaks, CA: Corwin.
Marsh, J. A. (2012). Interventions promoting educators’ use of data: Research insights and gaps. Teachers College Record, 114(11), 1–48.
Marsh, J. A., Pane, J. F., & Hamilton, L. S. (2006). Making sense of data-driven decision making in education: Evidence from recent RAND research. Santa Monica, CA: RAND Corporation.
Ministry of Education, British Columbia, Canada. (2002). Accountability framework. Retrieved from http://www.bced.gov.bc.ca/policy/policies/accountability_framework.htm
Moss, P. A., Pullin, D., Gee, J. P., & Haertel, E. H. (2005). The idea of testing: Psychometric and sociocultural perspectives. Measurement: Interdisciplinary Research and Perspectives, 3, 63–83. doi:10.1207/s15366359mea0302_1
Narciss, S. (2008). Feedback strategies for interactive learning tasks. In J. M. Spector, M. D. Merrill, J. J. G. van Merriënboer, & M. P. Driscoll (Eds.), Handbook of research on educational communications and technology (3rd ed., pp. 125–144). Mahwah, NJ: Lawrence Erlbaum Associates.
Nicol, D., & Macfarlane-Dick, D. (2006). Formative assessment and self-regulated learning: A model and seven principles of good feedback practice. Studies in Higher Education, 31, 199–218. doi:10.1080/03075070600572090
Ruiz-Primo, M. A., & Furtak, E. M. (2006). Informal formative assessment and scientific inquiry: Exploring teachers’ practices and student learning. Educational Assessment, 11, 237–263. doi:10.1080/10627197.2006.9652991
Rupp, A. A., Gushta, M., Mislevy, R. J., & Shaffer, D. W. (2010). Evidence-centered design of epistemic games: Measurement principles for complex learning environments. The Journal of Technology, Learning, and Assessment, 8. Retrieved from http://napoleon.bc.edu/ojs/index.php/jtla/article/viewFile/1623/1467
Sadler, D. R. (1989). Formative assessment and the design of instructional systems. Instructional Science, 18, 119–144. doi:10.1007/BF00117714
Sanders, P. (2011). Het doel van toetsen [The purpose of testing]. In P. Sanders (Ed.), Toetsen op school [Testing at school] (pp. 9–20). Arnhem: Cito. Retrieved from http://www.cito.nl/~/media/cito_nl/Files/Onderzoek%20en%20wetenschap/cito_toetsen_op_school.ashx
Schildkamp, K., & Kuiper, W. (2010). Data-informed curriculum reform: Which data, what purposes, and promoting and hindering factors. Teaching and Teacher Education, 26, 482–496. doi:10.1016/j.tate.2009.06.007
Schildkamp, K., & Lai, M. K. (2013). Conclusions and a data use framework. In K. Schildkamp, M. K. Lai, & L. Earl (Eds.), Data-based decision making in education: Challenges and opportunities (pp. 177–191). Dordrecht: Springer. doi:10.1007/978-94-007-4816-3
Schildkamp, K., Lai, M. K., & Earl, L. (Eds.). (2013). Data-based decision making in education: Challenges and opportunities. Dordrecht: Springer. doi:10.1007/978-94-007-4816-3
Schildkamp, K., & Poortman, C. L. (in press). Factors influencing the functioning of data teams. Teachers College Record, 117(5).
Shepard, L. A. (2005, October). Formative assessment: Caveat emptor.
Paper presented at the ETS Invitational Conference, The Future of Assessment: Shaping Teaching and Learning, New York, NY.
Shuell, T. (1986). Cognitive conceptions of learning. Review of Educational Research, 56, 411–436. doi:10.3102/00346543056004411
Simon, M. (1995). Reconstructing mathematics pedagogy from a constructivist perspective. Journal for Research in Mathematics Education, 26, 114–145. Retrieved from http://www.jstor.org/stable/749205
Slavin, R. E. (2002). Evidence-based education policies: Transforming educational practice and research. Educational Researcher, 31(7), 15–21. doi:10.3102/0013189X031007015
Slavin, R. E. (2003). A reader’s guide to scientifically based research. Educational Leadership, 60(5), 12–16. Retrieved from http://www.ascd.org/publications/educational-leadership/feb03/vol60/num05/A-Reader’s-Guide-to-Scientifically-Based-Research.aspx
Sluijsmans, D., Joosten-ten Brinke, D., & Van der Vleuten, C. (submitted). Toetsen met leerwaarde: Een reviewstudie naar de effectieve kenmerken van formatief toetsen [Testing with learning value: A review study on the effective characteristics of formative assessment]. Manuscript submitted for publication.
Stevenson, C. E., Hickendorff, M., Resing, W. C. M., Heiser, W. J., & de Boeck, P. A. L. (2013). Explanatory item response modelling of children’s change on a dynamic test of analogical reasoning. Intelligence, 41, 157–168. doi:10.1016/j.intell.2013.01.003
Stobart, G. (2008). Testing times: The uses and abuses of assessment. Abingdon: Routledge.
Supovitz, J. (2010). Knowledge-based organizational learning for instructional improvement. In A. Hargreaves, A. Lieberman, M. Fullan, & D. Hopkins (Eds.), Second international handbook of educational change (pp. 707–723). New York, NY: Springer. doi:10.1007/978-90-481-2660-6
Swan, G., & Mazur, J. (2011). Examining data driven decision making via formative assessment: A confluence of technology, data interpretation heuristics and curricular policy. Contemporary Issues in Technology and Teacher Education, 11, 205–222. Retrieved from http://www.editlib.org/p/36021
Thurlings, M., Vermeulen, M., Bastiaens, T., & Stijnen, S. (2013). Understanding feedback: A learning theory perspective. Educational Research Review, 9, 1–15. doi:10.1016/j.edurev.2012.11.004
Timperley, H. (2009). Using assessment data for improving teaching practice. Australian College of Educators, 8(3), 21–27. Retrieved from http://oksowhat.wikispaces.com/file/view/Using+assessment+data+Helen+Timperley.pdf
Verhofstadt-Denève, L., Van Geert, P., & Vyt, A. (2003). Handboek ontwikkelingspsychologie: Grondslagen en theorieën [Handbook of developmental psychology: Principles and theories]. Houten: Bohn Stafleu Van Loghum.
Vygotsky, L. S. (1978). Mind in society. London: Harvard University Press.
Wayman, J. C., Jimerson, J. B., & Cho, V. (2012). Organizational considerations in establishing the data-informed district. School Effectiveness and School Improvement: An International Journal of Research, Policy and Practice, 23, 159–178. doi:10.1080/09243453.2011.652124
Wayman, J. C., Spikes, D. D., & Volonnino, M. (2013). Implementation of a data initiative in the NCLB era. In K. Schildkamp, M. K. Lai, & L. Earl (Eds.), Data-based decision making in education: Challenges and opportunities (pp. 135–153). Dordrecht: Springer. doi:10.1007/978-94-007-4816-3
Wayman, J. C., & Stringfield, S. (2006). Data use for school improvement: School practices and research perspectives.
American Journal of Education, 112, 463–468. doi:10.1086/505055
Wiliam, D. (2011). What is assessment for learning? Studies in Educational Evaluation, 37, 3–14. doi:10.1016/j.stueduc.2011.03.001