Brain Is Related To Behavior
1380-3395/98/2003-419$12.00
Swets & Zeitlinger
METHODOLOGICAL COMMENTARY
Brain is Related to Behavior (p < .05)*
Konstantine K. Zakzanis
York University, Toronto, Canada and Baycrest Centre for Geriatric Care, Toronto, Canada
ABSTRACT
This article demonstrates that sole reliance on tests of statistical significance in the analysis and interpretation of neuropsychological data grounded in quasi-experimentation can systematically confound the conclusions drawn from neuropsychological research regarding brain-behavior relations. The conclusion of this article is that the statistical significance test must be accompanied by more appropriate statistics: namely, point-estimate effect sizes along with interval estimation, and meta-analysis for the analysis of data from multiple studies. The argument for this conclusion is demonstrated through a re-analysis of published neuropsychological test findings. On the basis of this review, it is recommended that the consumer of neuropsychological reports will be better served if due consideration is given to the magnitude of effect in brain-behavior statistical analyses.
* Appreciation is expressed to two anonymous reviewers and to Dr. Linas A. Bieliauskas for their helpful comments.
Address correspondence to: Konstantine K. Zakzanis, Department of Psychology, York University, 4700 Keele
Street, Toronto, Ontario, M3J 1P3, Canada. E-mail: zakzanis@yorku.ca.
Accepted for publication: April 30, 1998.
logical tasks that are sensitive to frontal-executive functioning (e.g., the Wisconsin Card Sorting Test [WCST]; Heaton, Chelune, Talley, Kay, & Curtiss, 1993) by providing data that achieve statistical significance. Although such results illustrate that the two groups (typically patients with schizophrenia and normal healthy controls) are statistically different from one another, the question that goes unaddressed when support for a hypothesis rests solely on the interpretation of a dichotomized p value is: how large is the difference between the two groups being compared, or, more precisely, what is the magnitude of deficit (or effect) in the patient sample, and how confident can we be in the obtained results?
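The sample-size dependence of the dichotomized p value can be made concrete with a short sketch. The numbers below are hypothetical (they are not drawn from any study discussed here), and the helper names `cohens_d` and `t_from_d` are ours; the identity t = d * sqrt(n1 * n2 / (n1 + n2)) is the standard algebraic relation between the independent-samples t statistic and the standardized mean difference.

```python
import math

def cohens_d(mean_p, mean_c, sd_pooled):
    """Standardized mean difference between patient and control groups."""
    return (mean_p - mean_c) / sd_pooled

def t_from_d(d, n1, n2):
    """Independent-samples t statistic implied by an effect size d."""
    return d * math.sqrt(n1 * n2 / (n1 + n2))

# Hypothetical WCST perseverative-error scores: the group difference
# (and therefore d) is identical in both studies; only n changes.
d = cohens_d(mean_p=22.0, mean_c=14.0, sd_pooled=10.0)  # d = 0.8

t_large = t_from_d(d, 117, 68)  # large-n study: t ~ 5.2, comfortably "significant"
t_small = t_from_d(d, 10, 10)   # small-n study: t ~ 1.8, "nonsignificant" at alpha = .05

print(f"d = {d:.2f}, t(n=185) = {t_large:.2f}, t(n=20) = {t_small:.2f}")
```

The same magnitude of deficit thus yields opposite verdicts from the significance test, which is precisely the ambiguity the p value alone cannot resolve.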
Further, all too often a significant p value is taken to imply the presence of a significant deficit. This is a fatal error in data interpretation (Soper, Cicchetti, Satz, Light, & Orsini, 1988). To illustrate, a re-analysis of WCST results in patients with schizophrenia from published studies with competing conclusions will serve to make the point. Although the issues raised in this paper are illustrated with a specific example of a cross-sectional comparison of a continuous outcome (i.e., WCST performance) between two groups (i.e., patients with schizophrenia and healthy normal controls), the reader is reminded that the point being made applies to all types of experimental, quasi-experimental, and observational designs involving the estimation of effects in populations.
In the first study of frontal-executive functioning, WCST results for 117 patients with schizophrenia and 68 healthy normal controls are presented. A statistically significant two-tailed independent-samples t test, with the level of significance set at p < .05, is reported for the perseverative error score on the WCST. This is interpreted as supporting frontal-executive impairment in patients with schizophrenia. In a second study, a nonsignificant (p > .05) two-tailed independent-samples t test is reported for the perseverative error score on the WCST for 10 patients with schizophrenia and 10 healthy normal controls. This result is interpreted as failing to support frontal-executive impairment in schizophrenia, but is also qualified with the suggestion that if there were more patients and controls in the respective groups,
In mathematical terms, d is simply the difference between the patient and control means calibrated in pooled standard deviation units (i.e., [patient mean - control mean] / pooled standard deviation). The effect size d is neither dependent on nor influenced by sample size. Moreover, effect sizes can demonstrate test score overlap (dispersion) between two groups by utilizing and inverting Cohen's (1988) idealized population distributions. That is, a hypothetical percentage overlap is associated with varying degrees of effect size. For example, an effect size of 0.0 corresponds to complete overlap: the two groups are completely indistinguishable from one another on the variable measured. If d = 1.0, the corresponding overlap is 45%: about half of the patient group can be discriminated from the control group on the basis of the variable measured. If d = 3.0, the corresponding overlap is less than 5%: the two groups are almost completely distinguishable from one another with respect to the variable measured. Thus, if d does equal about 3.0 for the variable measured, the effect size may serve as a marker, on account of approximately complete discriminability between experimental (i.e., patient) and control groups (see Zakzanis, 1998b). Briefly, a diagnostic marker should be capable of discriminating approximately all patients from all normal healthy controls on the dependent variable of interest. Such discriminability would have to have an associated effect size greater than 3.0, as this size of effect corresponds to a test score overlap of less than 5% between patients and normal healthy controls. For example, Zakzanis (1998b) showed that delayed recall and structural imaging of the hippocampus in patients with dementia of the Alzheimer's type correspond to effect sizes greater than 3.0 and percentage overlap (OL%) values of less than 5%. This finding was taken to support the notion that temporal-hippocampal dysfunction is a marker for dementia of the Alzheimer's type (Zakzanis, 1998b). In doing so, heuristic benchmark criteria were proposed (i.e., d > 3.0, OL% < 5) that could help further articulate the strength of neuroanatomic and neuropsychological evidence in other disorders with prominent brain pathology. Although such a standard is not
effect-size statistic. One methodological approach that utilizes this statistic is meta-analysis. Meta-analysis has become a statistically sophisticated tool for objective research integration (Cooper & Hedges, 1994; Glass, McGaw, & Smith, 1981; Hedges & Olkin, 1985; Hunter, Schmidt, & Jackson, 1982; Rosenthal, 1991; Schmidt, 1996). In addition to solving the problems of traditional literature reviews, such as the selective inclusion of studies, often based on the reviewer's own impressionistic view of study quality; differential subjective weighting of studies in the interpretation of a set of findings; misleading interpretations of study findings; failure to examine characteristics of the studies as potential explanations for disparate or consistent results across studies; and failure to examine moderating variables in the relationship under examination (Wolf, 1986), meta-analysis provides tools for the analysis of magnitude (i.e., the effect size d). Eligible research studies comprising a common dependent variable, and statistics that can be transformed into effect sizes, are viewed as a population to be systematically sampled and surveyed. Individual study results (typically the means and standard deviations from each group) and moderator variables (e.g., education, duration of disease, gender, age) are then abstracted, quantified, coded, and assembled into a database that is statistically analyzed (Lipsey & Wilson, 1993).
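The abstraction step just described can be sketched in a few lines. The study values below are hypothetical, the helper names are ours, and the simple sample-size-weighted mean is only one of several pooling schemes (inverse-variance weighting, as in Hedges & Olkin, 1985, is the more usual choice); `overlap_pct` implements Cohen's (1988) nonoverlap measure for idealized normal distributions.

```python
from statistics import NormalDist

def pooled_sd(sd1, n1, sd2, n2):
    """Pooled standard deviation of two independent groups."""
    return (((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / (n1 + n2 - 2)) ** 0.5

def cohens_d(m_pat, sd_pat, n_pat, m_con, sd_con, n_con):
    """Standardized mean difference, calibrated in pooled-SD units."""
    return (m_pat - m_con) / pooled_sd(sd_pat, n_pat, sd_con, n_con)

def overlap_pct(d):
    """Percentage overlap of two idealized normal distributions
    separated by |d| SD units (Cohen's 1988 U1 nonoverlap, inverted)."""
    phi = NormalDist().cdf(abs(d) / 2)
    if phi == 0.5:                    # d = 0: complete overlap
        return 100.0
    u1 = (2 * phi - 1) / phi          # proportion of non-overlap
    return 100 * (1 - u1)

# Hypothetical WCST perseverative-error data abstracted from three studies:
# (patient mean, SD, n, control mean, SD, n)
studies = [
    (24.0, 11.0, 117, 14.0, 9.0, 68),
    (20.0, 12.0, 10, 15.0, 10.0, 10),
    (26.0, 10.0, 35, 13.0, 8.0, 30),
]

ds = [cohens_d(*s) for s in studies]
ns = [s[2] + s[5] for s in studies]

# Sample-size-weighted mean effect size across the sampled studies.
mean_d = sum(d * n for d, n in zip(ds, ns)) / sum(ns)

print(f"per-study d: {[round(d, 2) for d in ds]}")
print(f"weighted mean d = {mean_d:.2f}, OL% = {overlap_pct(mean_d):.0f}%")
```

Note that `overlap_pct(1.0)` returns roughly 45, matching the benchmark for d = 1.0 quoted earlier.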
The main statistic presented in a meta-analysis is the mean effect size when there is little or no heterogeneity of effect observed across studies. This statistic is meant to reflect the average individual effect size across the sample of studies included in the synthesis. However, in the vast majority of meta-analyses, in which there is appreciable heterogeneity of effect across studies, the primary goal should be to document and explain such heterogeneity in terms of various characteristics of the study populations or methods, along with the mean effect across studies. That is, moderator variables are correlated with the effect size in order to parse out relationships with subject characteristics that may influence the magnitude of the effect between the groups being compared. Moreover, as indicated above, the effect size can then be
statistical significance, it is ideal for testing ordinal claims relating the order of conditions (see Frick, 1996), but is insufficient when clinical significance is important (see also Bieliauskas, Fastenau, Lacey, & Roper, 1997). As such, when p values are desired, reporting an exact p value (e.g., p = .06) and deleting from our written reports any reference to the results being "significant" or "nonsignificant" would indeed serve the reader of neuropsychological reports well. It would appear, therefore, that when designing a study to test a particular brain-behavior hypothesis in neuropsychology that incorporates a quasi-experimental research design, the magnitude of effect and interval estimation should be taken into consideration and reported alongside our conclusions; for example: "The brain is related to behavior (d = )."
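As a minimal sketch of the kind of reporting advocated here, the following computes an approximate 95% confidence interval around a hypothetical d of 0.8, using the large-sample variance of d given by Hedges and Olkin (1985); the function name and input values are illustrative, not drawn from the studies discussed above.

```python
import math

def d_confidence_interval(d, n1, n2, z=1.96):
    """Approximate 95% CI for Cohen's d, using the large-sample
    variance (n1 + n2)/(n1 * n2) + d^2 / (2 * (n1 + n2))."""
    se = math.sqrt((n1 + n2) / (n1 * n2) + d**2 / (2 * (n1 + n2)))
    return d - z * se, d + z * se

# Hypothetical effect size and group sizes for illustration only.
lo, hi = d_confidence_interval(0.8, 117, 68)
print(f"Brain is related to behavior (d = 0.80, 95% CI [{lo:.2f}, {hi:.2f}])")
```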
REFERENCES
Bakan, D. (1966). The test of significance in psychological research. Psychological Bulletin, 66, 1-29.
Benson, D. F. (1994). The neurology of thinking. New
York: Oxford University Press.
Bieliauskas, L. A., Fastenau, P. S., Lacey, M. A., &
Roper, B. L. (1997). Use of the odds-ratio to translate neuropsychological test scores into real-world
outcomes: From statistical significance to clinical
significance. Journal of Clinical and Experimental
Neuropsychology, 19, 889-896.
Binder, L. M., Rohling, M. L., & Larrabee, G. J.
(1997). A review of mild head trauma. Meta-analytic review of neuropsychological studies. Journal
of Clinical and Experimental Neuropsychology,
19, 421-431.
Bouillaud, J. B. (1825). Recherches cliniques propres à démontrer que la perte de la parole correspond à la lésion des lobules antérieurs du cerveau, et à confirmer l'opinion de M. Gall sur le siège de l'organe du langage articulé. Archives of General Medicine, 8, 25-45.
Broca, P. (1865). Sur la faculté du langage articulé. Bulletins de la Société d'Anthropologie de Paris, 6, 337-393.
Carver, R. P. (1978). The case against statistical significance testing. Harvard Educational Review, 48,
378-399.
Christensen, H., Griffiths, K., Mackinnon, A., &
Jacomb, P. (1997). A quantitative review of cognitive deficits in depression and Alzheimer-type dementia. Journal of the International Neuropsychological Society, 3, 631-651.
Thornton, A. E., & Raz, N. (1997). Memory impairment in multiple sclerosis: A quantitative review.
Neuropsychology, 11, 357-366.
Weinberger, D. R. (1988). Schizophrenia and the
frontal lobe. Trends in Neurosciences, 11, 367-370.
Wernicke, C. (1874). Der aphasische Symptomencomplex. Breslau: Cohn & Weigert.
Wishart, H., & Sharpe, D. (1997). Neuropsychological aspects of multiple sclerosis: A quantitative
review. Journal of Clinical and Experimental Neuropsychology, 19, 810-824.
Wolf, F. M. (1986). Meta-analysis: Quantitative methods for research synthesis. Newbury Park, CA: Sage.
Zakzanis, K. K. (1998a). Neurocognitive deficit in fronto-temporal dementia. Neuropsychiatry, Neuropsychology, and Behavioral Neurology, 11, 127-135.
Zakzanis, K. K. (1998b). Quantitative evidence for neuroanatomic and neuropsychological markers in dementia of the Alzheimer's type. Journal of Clinical and Experimental Neuropsychology, 20, 259-269.
Zakzanis, K. K. (1998c). The reliability of meta-analytic review. Psychological Reports, 83, 215-222.
Zakzanis, K. K. (in press-a). The subcortical dementia of Huntington's disease. Journal of Clinical and Experimental Neuropsychology, 20.
Zakzanis, K. K., & Heinrichs, R. W. (1997, August).
The frontal-executive hypothesis in schizophrenia:
Cognitive and neuroimaging evidence. Paper presented at the Annual Meeting of the American Psychological Association, Chicago, IL.
Zakzanis, K. K., Leach, L., & Kaplan, E. (1998). On the nature and pattern of neurocognitive deficit in major depressive disorder. Neuropsychiatry, Neuropsychology, and Behavioral Neurology, 11, 136-146.