Journal of Educational Measurement

Winter 2008, Vol. 45, No. 4, pp. 407–411

Book Review

Leighton, J. P., & Gierl, M. J. (Eds.) (2007). Cognitive Diagnostic Assessment for
Education: Theory and Applications. New York: Cambridge University Press.

Reviewed by
Paul D. Nichols and Kevin Joldersma
Pearson Educational Measurement
Under No Child Left Behind (NCLB, 2001), states are working toward having 100%
of their students proficient in English Language Arts, Mathematics, and Science by
the 2013–14 academic year. The NCLB legislation did not specify the mechanisms
expected to lead to improved student achievement. But within this environment of
increasing pressure to improve student achievement, one lever for reaching the NCLB
goals is formative test score information, i.e., information that can be used to narrow
the gap between students’ current state of achievement and the targeted state of
achievement (Wiliam & Black, 1996). Educators may be able to use the formative
information provided by assessments to make better instructional decisions that will,
in turn, lead to improved student learning. The use of cognitive diagnostic assessment
(CDA) promises to provide teachers with the kind of formative information needed to
raise student achievement.
Over the last several decades, a number of books and articles have appeared that
discuss CDA. The edited book, Cognitive Diagnostic Assessment for Education:
Theory and Applications, offers perhaps the most up-to-date survey of CDA. The
book self-consciously reviews and extends earlier writings on CDA. The explicit
connections the book makes between past work and present discussions provide
valuable continuity in this emerging field. Because so much of the historical
context is provided, this book can serve as a primer for researchers wishing to gain
an understanding of CDA. But the book also provides a concise summary and critical
review of current work for researchers already active in CDA.
In this book review, we first briefly describe the content covered by the book. Next,
we identify three general themes that stitch together the constituent chapters of the
book and describe how the chapters touch on each theme. We also explore how future
CDA work might extend the themes addressed in these chapters.
The book is divided into three sections. In the first section, the chapters address
two concerns. The first two chapters describe the issues that motivate CDA: test va-
lidity and formative test score information. The following pair of chapters deals with
what Messick (1989) termed “philosophical conceits” or a prescriptive use of the
philosophy of science to lay down how concepts, arguments, methods, and models
should be used. In the second section, the three chapters address the development
and implementation of theories that serve to direct CDA test development. In the
last section, the five chapters address what Mislevy and his collaborators (Mislevy,
Steinberg, Almond, & Lukas, 2006) call the measurement model within the evidence
model of an evidence-centered design approach. The measurement model connects
the knowledge, skills, and abilities of the construct to the observable performance
on the test. The measurement model summarizes across tasks the evidence of the
knowledge, skills, and abilities that is provided in the observable performance.
At least three general themes stitch together the constituent chapters in the book.
The first theme that we drew from the chapters was that CDA is difficult to define.
Nearly every chapter in the book begins by offering a definition of CDA. These defi-
nitions share three features: First, CDAs are designed to measure specific knowledge
structures and processing skills. These constructs are sometimes contrasted with la-
tent ability or content. Second, CDAs use cognitive information processing models
as the construct theory in construct-centered test development. The construct the-
ory is used to join the way assessment evidence is gathered (test design) and inter-
preted (scoring, calibration and scaling) to the knowledge structures and cognitive
processes the assessment is intended to measure. The psychological sciences offer
candidate construct theories other than information processing models, such as the
sociocultural approaches described in Birenbaum and Dochy (1996). Finally, CDAs
are defined by distinguishing them from traditional classroom-based and large-scale
assessments. As Leighton and Gierl note, “It is easy to say more about what CDA is
not than about what it is” (p. 147).
The second theme that we drew from the book was that CDA embraces a strong
program of validity. What is a strong program of validity? A strong program of va-
lidity was outlined by Loevinger (1957) and Messick (1989), among others. Under a
strong program of validity, construct theory dominates every aspect of test develop-
ment. Answers to questions of item content, scoring model, and validity research are
largely deduced from construct theory.
Enthusiasm for a strong program of validity is evident throughout this book. The
chapter by Gorin describes the manner in which the information in the construct
theory is transmitted through each stage of test development. Other chapters describe
how decisions with respect to one aspect of test development or another are driven
by construct theory.
But enthusiasm for a strong program of validity is tempered by an understanding
of the challenges. In a number of the chapters, the authors acknowledge that can-
didates for construct theories are often not available. According to Borsboom and
Mellenbergh, “The problem that researchers are unable to specify the processes that
connect theoretical attributes to test scores is quite widespread throughout the social
sciences” (p. 92). The necessity for a well-developed construct theory to drive test
development may even be fatal to this new field. Leighton and Gierl lament that,
“The strong program of validity required by CDA is, in effect, both the allure and the
possible downfall of this form of assessment” (p. 13).
Looking toward the future, the construction of a dichotomy between strong pro-
grams of validity characterized by explicit chains of reasoning from observation to
cognitive theory, and weak programs of validity that lack such chains of reason-
ing, is likely to impede the development of CDA. Stark contrasts drawn by Nichols
(1994) and others between tests designed from cognitive frameworks and tests
designed from psychometric frameworks may be overly dramatic. The insistence
on a strong program of validity underlying the test development process would halt
many testing programs. Cognitive science currently lacks the depth and breadth in
theory development to support exclusive dependence on a strong program of valid-
ity. However, some research and at least limited theory development on thinking,
learning, and problem solving exists in nearly every field and domain. A moder-
ate program of validity is practical within the current levels of theory development in
cognitive science. Examples of a moderate program of validity—testing practices
that recognize the role of construct theory but move forward in constructing tests
using current levels of theory development—are sprinkled throughout this
book.
The third theme that we drew from the chapters was that CDA provides forma-
tive information. The current understanding of the meaning of formative assessment
derives from Scriven’s (1967) distinction between “formative evaluation” and “sum-
mative evaluation.” The term formative assessment appears to have entered the lexi-
con of educational measurement through work on mastery learning (Airasian, 1971;
Bloom, Hastings, & Madaus, 1971). Despite current attempts to narrow the definition
(Perie, Marion, & Gong, 2007), formative assessment is assessment that provides in-
formation used to narrow the gap between students’ current state of achievement and
the targeted state of achievement (Wiliam & Black, 1996). Summative assessment,
in contrast, is used to assess whether the program, intervention, or person has met
stated goals.
An argument sometimes made is that CDA will help teachers make better in-
structional decisions because these tests yield detailed information about students’
problem solving. While such arguments have merit, they have a flaw that prevents
them from being persuasive. Simply offering more information to educators does not
guarantee that educators can or will use that information to improve instruction and
consequently improve student learning. The argument that a test is a priori formative
is wrongheaded. Like test validity (Kane, 2006; Messick, 1989), what is formative
is not the test as such but the use of the information offered in test scores. As the
chapter by Huff and Goodman demonstrates, the same information derived from test
scores can be used for different purposes by different consumers. Total scores from
large-scale assessments may be used by policymakers for summative decisions but,
as the Huff and Goodman chapter documents, these same scores are used by some
teachers to inform formative decisions.
A number of chapters in this book remark on the need for alignment between
the information offered in the test scores and the information sought by educators.
Several chapters hint at the gap between providing educators information and pro-
viding educators usable information. For example, Huff and Goodman remark that
“. . .assessment design should be informed by the same cognitive framework that
shapes the curriculum and should provide feedback to teachers that informs instruc-
tion” (p. 22). Later in the book, Gierl, Leighton, and Hunka note: “In other words,
cognitive diagnostic assessment occurs as part of a much larger instructional process,
where the goal is to identify learning problems and help remediate these problems”
(p. 244).

CDA may be ideally positioned to address the gap between providing educators
information and providing educators usable information. The CDA field already
embraces a strong program of validity, and researchers in CDA have a predisposi-
tion to use empirically supported theory to inform assessment design. Researchers
need to extend theory beyond the boundaries of construct interpretation to cover test
score use. Test development must be driven by both a theory of the test construct and
a theory of test score use.
The chapter by Gierl and Leighton, the final chapter in the book, sets the stage for
future development in CDA. The issues discussed in this chapter hint at the kinds of
developments we may see. Future development may see fewer catholic treatments
of CDA and more specific treatments of aspects of CDA such as books focusing on
measurement models or principled test design. Future development may see more re-
search integrating theories of cognition, learning, instruction, and assessment. Cer-
tainly, we will see a growing amount of research and development in the spirit of
CDA.

References
Airasian, P. W. (1971). The role of evaluation in mastery learning. In J. H. Block (Ed.), Mastery
learning: Theory and practice (pp. 77–88). New York: Holt, Rinehart and Winston.
Birenbaum, M., & Dochy, F. J. R. C. (Eds.) (1996). Alternatives in assessment of achieve-
ments, learning processes and prior knowledge. Boston: Kluwer.
Bloom, B. S., Hastings, J. T., & Madaus, G. F. (Eds.) (1971). Handbook on the formative and
summative evaluation of student learning. New York: McGraw-Hill.
Kane, M. T. (2006). Validation. In R. L. Brennan (Ed.), Educational measurement (4th ed.,
pp. 17–64). Washington, DC: National Council on Measurement in Education and Ameri-
can Council on Education.
Loevinger, J. (1957). Objective tests as instruments of psychological theory. Psychological
Reports, 3, 635–694.
Messick, S. (1989). Validity. In R. L. Linn (Ed.), Educational measurement (3rd ed.,
pp. 13–103). Washington, DC: American Council on Education and National Council on
Measurement in Education.
Mislevy, R. J., Steinberg, L. S., Almond, R. G., & Lukas, J. F. (2006). Concepts, terminology,
and basic models of evidence-centered design. In D. M. Williamson, I. I. Bejar, & R. J.
Mislevy (Eds.), Automated scoring of complex tasks in computer-based testing (pp. 15–47).
Mahwah, NJ: Erlbaum.
Nichols, P. (1994). A framework for developing cognitively diagnostic assessments. Review
of Educational Research, 64, 575–603.
No Child Left Behind Act of 2001, Pub. L. No. 107–110, 115 Stat. 1435 (2002).
Perie, M., Marion, S., & Gong, B. (2007, June). A framework for considering interim assess-
ments. Paper presented at the National Conference on Large-Scale Assessment, Nashville,
TN.
Scriven, M. (1967). The methodology of evaluation. Washington, DC: American Educational
Research Association.
Wiliam, D., & Black, P. (1996). Meanings and consequences: A basis for distinguishing for-
mative and summative functions of assessment? British Educational Research Journal, 22,
537–548.

Authors
PAUL NICHOLS is Vice President in Assessment and Information, Pearson Educational Mea-
surement, 2510 North Dodge Street, Iowa City, IA 52240; paul.nichols@pearson.com. His
primary research interests include psychology and psychometric methods.
KEVIN JOLDERSMA is a Research Scientist, Measurement Incorporated, 425 Edgepine
Drive, Holly Springs, NC 27540; kevin.joldersma@gmail.com. His primary research in-
terests include multilingual assessments and applied psychometrics.
