published: 12 April 2017

doi: 10.3389/fpsyg.2017.00557

The Stroop Color and Word Test

Federica Scarpina 1, 2* and Sofia Tagini 2, 3
1 “Rita Levi Montalcini” Department of Neuroscience, University of Turin, Turin, Italy, 2 IRCCS Istituto Auxologico Italiano,
Ospedale San Giuseppe, Piancavallo, Italy, 3 CiMeC Center for the Mind/Brain Sciences, University of Trento, Rovereto, Italy

The Stroop Color and Word Test (SCWT) is a neuropsychological test extensively used
to assess the ability to inhibit cognitive interference that occurs when the processing of
a specific stimulus feature impedes the simultaneous processing of a second stimulus
attribute, a phenomenon known as the Stroop effect. The aim of the present work is to verify
the theoretical adequacy of the various scoring methods used to measure the Stroop
effect. We present a systematic review of studies that have provided normative data
for the SCWT. We referred to both electronic databases (i.e., PubMed, Scopus, Google
Scholar) and citations. Our findings show that while several scoring methods have been
reported in the literature, none of the reviewed methods enables us to fully assess the Stroop
effect. Furthermore, we discuss several normative scoring methods reported in the Italian
literature. We argue for an alternative scoring method which
takes into consideration both speed and accuracy of the response. Finally, we underline
the importance of assessing the performance in all Stroop Test conditions (word reading,
color naming, named color-word).
*Correspondence: Federica Scarpina
Citation: The Stroop Color and Word Test. Front. Psychol. 8:557.

The Stroop Color and Word Test (SCWT) is a neuropsychological test extensively used for
both experimental and clinical purposes. It assesses the ability to inhibit cognitive interference,
which occurs when the processing of a stimulus feature affects the simultaneous processing of
another attribute of the same stimulus (Stroop, 1935). In the most common version of the
SCWT, which was originally proposed by Stroop in 1935, subjects are required to read [...]
third table, named color-word (CW) condition, color-words are printed in an inconsistent
color ink (for instance the word “red” is printed in green ink). Thus, in this incongruent [...]
In other words, the participants are required to perform a less automated task (i.e., naming
ink color) while inhibiting the interference arising from a more automated task (i.e., reading
the word; MacLeod and Dunbar, 1988; Ivnik et al., 1996). This difficulty in inhibiting the
more automated process is called the Stroop effect (Stroop, 1935). While the SCWT is widely
used to measure the ability to inhibit cognitive interference, previous literature also reports its
application to measure other cognitive functions such as attention, processing speed, cognitive
flexibility (Jensen and Rohwer, 1966), and working memory
(Kane and Engle, 2003). Thus, it may be possible to use the SCWT
to measure multiple cognitive functions.
In the present article, we present a systematic review of the
SCWT literature in order to assess the theoretical adequacy of
the different scoring methods proposed to measure the Stroop
effect (Stroop, 1935). We focus on Italian literature, which reports
the use of several versions of the SCWT that vary in terms of
stimuli, administration protocol, and scoring methods. Finally,
we attempt to indicate a scoring method that allows measuring
the ability to inhibit cognitive interference in reference to the
subjects’ performance in SCWT.

METHODS

We looked for normative studies of the SCWT. All studies
included a healthy adult population. Since our aim was to
understand the various available scoring methods, no studies
were excluded on the basis of age, gender, and/or education of
participants, or the specific version of SCWT used (e.g., short
or long, computerized or paper). Studies were identified using
electronic databases and citations from a selection of relevant
articles. The electronic databases searched included PubMed (All
years), Scopus (All years) and Google Scholar (All years). The last
search was run on the 22nd February, 2017, using the following
search terms: “Stroop; test; normative.” All studies written in
English and Italian were included.
Two independent reviewers screened the papers according to
their titles and abstracts; no disagreements about suitability of the
studies were recorded. Thereafter, a summary chart was prepared
to highlight mandatory information that had to be extracted from
each report (see Table 1).
One Author extracted data from papers while the second
author provided further supervision. No disagreements about
extracted data emerged. We did not seek additional information
from the original reports, except for Caffarra et al. (2002),
whose full text was not available: relevant information has been
extracted from Barletta-Rodolfi et al. (2011).
We extracted the following information from each article:

• Year of publication.
• Indexes whose normative data were provided.

Finally, as regards the variables of interest, we focused
on those scores used in the reviewed studies to assess the
performance at the SCWT.

RESULTS

We identified 44 articles from our electronic search and screening
process. Eleven of them were judged inadequate for our purpose
and excluded. Four papers were excluded as they were written
in languages other than English or Italian (Bast-Pettersen, 2006;
Duncan, 2006; Lopez et al., 2013; Rognoni et al., 2013); two were
excluded as they included children (Oliveira et al., 2016) and
a clinical population (Venneri et al., 1992). Lastly, we excluded
six Stroop Test manuals, since not entirely procurable (Trenerry
et al., 1989; Artiola and Fortuny, 1999; Delis et al., 2001; Golden
and Freshwater, 2002; Mitrushina et al., 2005; Strauss et al.,
2006a). At the end of the selection process we had 32 articles
suitable for review (Figure 1).
From the systematic review, we extracted five studies with
Italian normative data. Details are reported in Table 1. Of the
remaining 27 studies that provide normative data for non-Italian
populations, 16 studies (Ivnik et al., 1996; Ingraham et al., 1988;
Rosselli et al., 2002; Moering et al., 2004; Lucas et al., 2005;
Steinberg et al., 2005; Seo et al., 2008; Peña-Casanova et al., 2009;
Al-Ghatani et al., 2011; Norman et al., 2011; Andrews et al.,
2012; Llinàs-Reglà et al., 2013; Morrow, 2013; Lubrini et al., 2014;
Rivera et al., 2015; Waldrop-Valverde et al., 2015) adopted the
scoring method proposed by Golden (1978). In this method, the
number of items correctly named in 45 s in each condition is
calculated (i.e., W, C, CW). Then the predicted CW score (Pcw)
is calculated using the following formula:

Pcw = 45/{((45 × W) + (45 × C))/(W × C)} (1)

equivalent to:

Pcw = (W × C)/(W + C) (2)

Then, the Pcw value is subtracted from the actual number of
items correctly named in the incongruous condition (CW) (i.e.,
IG = CW − Pcw): this procedure yields an interference
score (IG) based on the performance in both W and C conditions.
Thus, a negative IG value represents a pathological ability to
inhibit interference, where a lower score means greater difficulty
in inhibiting interference.
Six articles (Troyer et al., 2006; Bayard et al., 2011;
Campanholo et al., 2014; Bezdicek et al., 2015; Hankee et al.,
2016; Tremblay et al., 2016) adopted the Victoria Stroop Test.
In this version, three conditions are assessed: the C and the CW
correspond to the equivalent conditions of the original version of
the test (Stroop, 1935), while the W condition includes common
words which do not refer to colors. This condition represents
an intermediate inhibition condition, as the interference effect
between the written word and the color name is not present.
In this SCWT form (Strauss et al., 2006b), for each condition,
the completion time and the number of errors (corrected,
non-corrected, and total errors) are recorded and two interference
scores are computed:

I1 = Word/Dot for time (3)
I2 = Interference/Dot for time (4)

Five studies (Strickland et al., 1997; Van der Elst et al., 2006;
Zalonis et al., 2009; Kang et al., 2013; Zimmermann et al., 2015)
adopted different SCWT versions. Three of them (Strickland
et al., 1997; Van der Elst et al., 2006; Kang et al., 2013) computed,
independently, the completion time and the number of errors for
each condition. Additionally, Van der Elst et al. (2006) computed
an interference score based on the speed performance only:

TI = CWT − [(WT + CT)/2] (5)

where WT, CT, and CWT represent the time to complete
the W, C, and CW table, respectively. Zalonis et al. (2009)
recorded: (i) the time; (ii) the number of errors and (iii)
the number of self-corrections in the CW. Moreover, they
computed an interference score subtracting the number
of errors in the CW condition from the number of
items properly named in 120 s in the same table. Lastly,
Zimmermann et al. (2015) computed the number of errors
and the number of correct answers given in 45 s in each
condition. Additionally, they calculated an interference score
derived from the original scoring method provided by Stroop
(1935).

TABLE 1 | Summary of data extracted from reviewed articles; those related to the Italian normative data are marked with an asterisk (*).

References: Ingraham et al., 1988; Ivnik et al., 1996; Rosselli et al., 2002; Moering et al., 2004; Lucas et al., 2005; Steinberg et al., 2005; Seo et al., 2008; Peña-Casanova et al., 2009; Al-Ghatani et al., 2011; Norman et al., 2011; Andrews et al., 2012; Llinàs-Reglà et al., 2013; Morrow, 2013; Lubrini et al., 2014; Rivera et al., 2015; Waldrop-Valverde et al., 2015
Index:
• IG = CW − [(W × C)/(W + C)]
where IG: interference score; CW: number of items properly named in 45 s in the CW condition;
W: number of items properly named in 45 s in the W condition; C: number of items properly
named in 45 s in the C condition.

References: Troyer et al., 2006; Bayard et al., 2011; Campanholo et al., 2014; Bezdicek et al., 2015; Hankee et al., 2016; Tremblay et al., 2016
Index:
• Completion time for each condition.
• Number of errors (corrected, not corrected, total errors) in each condition.
• Low Interference score: I1 = W/C
where W: time to read common words printed in different colored ink; C: time to name colored
dots.
• High Interference score: I2 = CW/C
where CW: time to read color names printed in incongruent colored ink; C: time to name
colored dots.

References: Strickland et al., 1997; Kang et al., 2013
Index:
• Time completion in W, C, and CW condition.
• Errors in W, C, and CW condition.

References: Amato et al., 2006 (*)
Index:
• Time to name 50 items in the CW condition.

References: Barbarotto et al., 1998 (*)
Index:
• Correct answers in 30 s in C and in CW condition.
• Shortest interval (in seconds) of the sequence correctly read in C and CW condition.

References: Brugnolo et al., 2015 (*)
Index:
• Correct answers in 30 s in W, C, and CW condition.
• Time to read the table in W, C, and CW condition.

References: Caffarra et al., 2002 (*)
Index:
• TI = CWT − [(WT + CT)/2]
where TI: time interference score; WT: time to complete W condition; CT: time to complete C
condition; CWT: time to complete CW condition.
• EI = CWE − [(WE + CE)/2]
where EI: error interference score; WE: errors in W condition;
CE: errors in C condition; CWE: errors in CW condition.

References: Valgimigli et al., 2010 (*)
Index:
• I = [(DC − DI)/(DC + DI)] × 100
where DC: correct answers in 20 s in C condition; DI: correct answers in 20 s in CW
condition.

References: Van der Elst et al., 2006
Index:
• Time to complete W, C, and CW conditions.
• Number of errors not self-corrected in W, C, and CW conditions.
• Interference score:
TI = CWT − [(WT + CT)/2]
where TI: time interference score; WT: time to complete W condition; CT: time to complete C
condition; CWT: time to complete CW condition.

References: Zalonis et al., 2009
Index:
• Time to read 112 words of colors printed in incongruous colored ink.
• Number of errors and number of self-corrections in the CW condition.
• Interference score for the CW condition:
Number of items properly named in 120 s − number of errors.

References: Zimmermann et al., 2015
Index:
• Errors in W, C, and CW condition.
• Correct answers in 45 s in W, C, and CW condition.
• Interference score:
Time to read CW + [errors CW × 2(time to read CW/number of items in CW)].
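
As an illustration of the Golden (1978) scoring described in the Results, the following minimal Python sketch computes the predicted CW score of Equations (1) and (2) and the interference score IG = CW − Pcw. The function names and the raw scores are hypothetical and serve only to show the arithmetic; they are not taken from any of the reviewed studies.

```python
def predicted_cw(w: float, c: float) -> float:
    """Predicted CW score, Equation (2): Pcw = (W x C) / (W + C)."""
    if w <= 0 or c <= 0:
        raise ValueError("W and C must be positive item counts")
    return (w * c) / (w + c)


def predicted_cw_long_form(w: float, c: float, seconds: float = 45.0) -> float:
    """Predicted CW score in the unsimplified form of Equation (1)."""
    if w <= 0 or c <= 0:
        raise ValueError("W and C must be positive item counts")
    return seconds / (((seconds * w) + (seconds * c)) / (w * c))


def golden_interference(w: float, c: float, cw: float) -> float:
    """Interference score IG = CW - Pcw; lower values mean more interference."""
    return cw - predicted_cw(w, c)


# Hypothetical raw scores: items correctly named in 45 s in each condition.
w_items, c_items, cw_items = 90, 60, 30

pcw = predicted_cw(w_items, c_items)                    # 5400 / 150 = 36.0
pcw_long = predicted_cw_long_form(w_items, c_items)     # same value via Equation (1)
ig = golden_interference(w_items, c_items, cw_items)    # 30 - 36.0 = -6.0

print(f"Pcw = {pcw}, IG = {ig}")
```

With these assumed scores both forms of the prediction agree, and the negative IG would be read, following the text above, as a difficulty in inhibiting interference.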

FIGURE 1 | Flow diagram of studies selection process.
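
The time-based interference scores reported in the Results, namely the Victoria Stroop Test ratios of Equations (3) and (4) and the difference score of Equation (5), can be sketched in Python as follows. This is a minimal sketch under stated assumptions: the function names and the completion times are hypothetical, and the ratio functions simply divide completion times as the equations indicate.

```python
def victoria_ratios(dot_time: float, word_time: float, interference_time: float):
    """Equations (3)-(4): I1 = Word/Dot and I2 = Interference/Dot, using completion times."""
    if dot_time <= 0:
        raise ValueError("dot_time must be positive")
    low_interference = word_time / dot_time
    high_interference = interference_time / dot_time
    return low_interference, high_interference


def time_interference(wt: float, ct: float, cwt: float) -> float:
    """Equation (5): TI = CWT - (WT + CT) / 2, with all times in seconds."""
    return cwt - (wt + ct) / 2.0


# Hypothetical completion times in seconds, for illustration only.
i1, i2 = victoria_ratios(dot_time=12.0, word_time=15.0, interference_time=24.0)
ti = time_interference(wt=40.0, ct=60.0, cwt=110.0)

print(f"I1 = {i1}, I2 = {i2}, TI = {ti}")
```

Both scores use speed only, so a participant who reads the word instead of naming the ink color would not be penalized by either of them.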

Of the five studies (Barbarotto et al., 1998; Caffarra et al.,
2002; Amato et al., 2006; Valgimigli et al., 2010; Brugnolo et al.,
2015) that provide normative data for the Italian population,
two are originally written in Italian (Caffarra et al., 2002;
Valgimigli et al., 2010), while the others are written in English
(Barbarotto et al., 1998; Amato et al., 2006; Brugnolo et al.,
2015). An English translation of the title and abstract of Caffarra
et al. (2002) is available. Three of the studies consider the
performance only on the SCWT (Caffarra et al., 2002; Valgimigli
et al., 2010; Brugnolo et al., 2015) while the others also include
other neuropsychological tests in the experimental assessment
(Barbarotto et al., 1998; Amato et al., 2006). The studies are
heterogeneous in that they differ in terms of administered
conditions, scoring procedures, number of items, and colors
used. Three studies adopted a 100-item version of the SCWT
(Amato et al., 2006; Valgimigli et al., 2010; Brugnolo et al.,
2015) which is similar to the original version proposed by Stroop
(1935). In this version, in every condition (i.e., W, C, CW), items
are arranged in a matrix of 10 × 10 columns and rows; the colors
are red, green, blue, brown, and purple. However, while two of
these studies administered the W, C, and CW conditions once
(Amato et al., 2006; Valgimigli et al., 2010), Barbarotto et al.
(1998) administered the CW table twice, requiring participants
to read the word during the first administration and then
to name the ink color during the consecutive administration.
Additionally, they also administered a computerized version of
the SCWT in which 40 stimuli are presented in each condition;
red, blue, green, and yellow are used. Valgimigli et al. (2010)
and Caffarra et al. (2002) administered shorter paper versions
of the SCWT including only three colors (i.e., red, blue, green).
More specifically, the former administered only the C and CW
conditions including 60 items each, arranged in six columns of
10 items. The latter employed a version of 30 items for each
condition (i.e., W, C, CW), arranged in three columns of 10 items
each.
Only two of the five studies assessed and provided normative
data for all the conditions of the SCWT (i.e., W, C, CW; Caffarra
et al., 2002; Brugnolo et al., 2015), while others provide only
partial results. Valgimigli et al. (2010) provided normative data
only for the C and CW condition, while Amato et al. (2006) and
Barbarotto et al. (1998) administered all the SCWT conditions
(i.e., W, C, CW) but provide normative data only for the CW
condition, and the C and CW condition respectively.
These studies use different methods to compute subjects’
performance. Some studies record the time needed,
independently in each condition, to read all (Amato et al.,
2006) or a fixed number (Valgimigli et al., 2010) of presented
stimuli. Others consider the number of correct answers produced
in a fixed time (30 s; Amato et al., 2006; Brugnolo et al., 2015).
Caffarra et al. (2002) and Valgimigli et al. (2010) provide a more
complex interference index that relates the subject’s performance
in the incongruous condition with the performance in the others.
In Caffarra et al. (2002), two interference indexes based on
reading speed and accuracy, respectively, are computed using
the following formula:

I = CW − ((W + C)/2) (6)

Furthermore, in Valgimigli et al. (2010) an interference score is
computed using the formula:

I = ((DC − DI)/(DC + DI)) × 100 (7)

where DC represents the correct answers produced in 20 s
in naming colors and DI corresponds to the correct answers
achieved in 20 s in the interference condition. However, they
do not take into account the performance on the word reading
condition.

DISCUSSION

According to the present review, multiple SCWT scoring
methods are available in the literature, with Golden’s (1978) version
being the most widely used. In the Italian literature, the
heterogeneity in SCWT scoring methods increases dramatically.
The parameters of speed and accuracy of the performance,
essential for proper detection of the Stroop effect, are scored
differently between studies, thus highlighting methodological
inconsistencies. Some of the reviewed studies score solely the
speed of the performance (Amato et al., 2006; Valgimigli
et al., 2010). Others measure both the accuracy and speed
of performance (Barbarotto et al., 1998; Brugnolo et al.,
2015); however, they provide no comparisons between subjects’
performance on the different SCWT conditions. On the other
hand, Caffarra et al. (2002) compared performance in the W,
C, and CW conditions; however, they computed speed and
accuracy independently. Only Valgimigli et al. (2010) present a
scoring method in which an index merging speed and accuracy
is computed for the performance in all the conditions; however,
the Authors assessed solely the performance in the C and the
CW conditions, neglecting the subject’s performance in the W
condition.
In our opinion, the reported scoring methods impede an
exhaustive description of the performance on the SCWT, as
suggested by clinical practice. For instance, if only the reading
time is scored, while accuracy is not computed (Amato et al.,
2006) or is computed independently (Caffarra et al., 2002), the
consequences of possible inhibition difficulties on the processing
speed cannot be assessed. Indeed, patients would report a
non-pathological reading speed in the incongruous condition, despite
extremely poor performance, even if they do not apply the
rule “naming ink color,” simply reading the word (e.g., in CW
condition, when the stimulus is the word “red” printed in green
ink, patient says “Red” instead of “Green”). Such behaviors
provide an indication of the failure to maintain consistent
activation of the intended response in the incongruent Stroop
condition, even if the participants properly understand the
task. Such scenarios are often reported in different clinical
populations. For example, in the incongruous condition, patients
with frontal lesions (Vendrell et al., 1995; Stuss et al., 2001; Swick
and Jovanovic, 2002) as well as patients affected by Parkinson’s
Disease (Fera et al., 2007; Djamshidian et al., 2011) reported
significant impairments in terms of accuracy, but not in terms
of processing speed. Counting the number of correct answers in
a fixed time (Amato et al., 2006; Valgimigli et al., 2010; Brugnolo
et al., 2015) may be a plausible solution.
Moreover, it must be noted that error rate (and not the
speed) is an index of inhibitory control (McDowd et al.,
1995) or an index of ability to maintain the task’s goal
temporarily in a highly retrievable state (Kane and Engle, 2003).
Nevertheless, computing exclusively the error rate (i.e., the
accuracy in the performance), without measuring the speed of
performance, would be insufficient for an extensive evaluation
of the performance in the SCWT. In fact, the behavior in the
incongruous condition (i.e., CW) may be affected by difficulties
that are not directly related to an impaired ability to suppress
the interference process, which may lead to misinterpretation
of the patient’s performance. People affected by color-blindness
or dyslexia would represent the extreme case. Nonetheless, and
more ordinarily, slowness, due to clinical circumstances like
dysarthria, mood disorders such as depression, or medication
side effects, may irremediably affect the performance in
the SCWT. In Parkinson’s Disease, ideomotor slowness (Gardner
et al., 1959; Jankovic et al., 1990) impacts the processing speed
in all SCWT conditions, determining a global difficulty in the
response execution rather than a specific impairment in the
CW condition (Stacy and Jankovic, 1992; Hsieh et al., 2008).
Consequently, it seems necessary to relate the performance in
the incongruous condition to word reading and color naming
abilities, when inhibition capability has to be assessed, as
proposed by Caffarra et al. (2002). In this method the W score
and C score were subtracted from CW score. However, as
previously mentioned, the scoring method suggested by Caffarra
et al. (2002) computes errors and speed separately. Thus, so far,
none of the proposed Italian normative scoring methods seems
adequate to assess patients’ performance in the SCWT properly
and informatively.
Examples of more suitable interference scores can be found
in non-Italian literature. Stroop (1935) proposed that the ability
to inhibit cognitive interference can be measured in the SCWT
using the formula:

total time + ((2 × mean time per word) × number of uncorrected errors) (8)

where total time is the overall time for reading; mean time per
word is the overall time for reading divided by the number of
items; and the number of uncorrected errors is the number of
errors not spontaneously corrected. Gardner et al. (1959) also
propose a similar formula:

total time + ((total time/100) × number of errors) (9)

where 100 refers to the number of stimuli used in this
version of the SCWT. When speed and errors are computed
together, patients who show difficulties
in inhibiting interference despite a non-pathological reading
time are more likely to be correctly recognized. However, both the mentioned scores (Stroop,
1935; Mitrushina et al., 2005) may be susceptible to criticism
(Jensen and Rohwer, 1966). In fact, even though accuracy
and speed are merged into a global score in these studies
(Stroop, 1935; Mitrushina et al., 2005), they are not computed
independently. In Gardner et al. (1959) the number of errors
is computed in relation to the mean time per item and then
added to the total time, which may be redundant and lead to a
miscomputation.
The most widely adopted scoring method in the international
literature is that of Golden (1978). Lansbergen et al. (2007) point
out that the index IG might not be adequately corrected for
inter-individual differences in the reading ability, despite its
effective adjustment for color naming. The Authors highlight
that the reading process is more automated in expert readers,
and, consequently, they may be more susceptible to interference
(Lansbergen et al., 2007), thus requiring that the score is
weighted according to individual reading ability. However,
experimental data suggest that increased reading practice
does not affect the susceptibility to interference in SCWT
(Jensen and Rohwer, 1966). Chafetz and Matthews (2004)’s article
might be useful for a deeper understanding of the relationship
between reading words and naming colors, but the debate
about the role of reading ability on the inhibition process
is still open. The issue about the role of reading ability on
the SCWT performance cannot be adequately satisfied even
if the Victoria Stroop Test scoring method (Strauss et al.,
2006b) is adopted, since the absence of the standard W [...]
In the light of the previous considerations, we recommend
that a scoring method for the SCWT should fulfill two main
requirements. First, both accuracy and speed must be computed
for all SCWT conditions. Second, a global index must
be calculated to relate the performance in the incongruous
condition to word reading and color naming abilities. The first
requirement can be achieved by counting the number of correct
answers in each condition within a fixed time (Amato et al.,
2006; Valgimigli et al., 2010; Brugnolo et al., 2015). The second
requirement can be achieved by subtracting the W score and C
score from CW score, as suggested by Caffarra et al. (2002). None
of the studies reviewed satisfies both these requirements.
According to the review, the studies with Italian normative
data present different theoretical interpretations of the SCWT
scores. Amato et al. (2006) and Caffarra et al. (2002) describe the
SCWT score as a measure of the fronto-executive functioning,
while others use it as an index of the attentional functioning
(Barbarotto et al., 1998; Valgimigli et al., 2010) or of general
cognitive efficiency (Brugnolo et al., 2015). Slowing to a response
conflict would be due to a failure of selective attention or a lack in
the cognitive efficiency instead of a failure of response inhibition
(Chafetz and Matthews, 2004); however, the performance in
the SCWT is not exclusively related to concentration, attention
or cognitive effectiveness, but it relies on a more specific
executive-frontal domain. Indeed, subjects have to process
selectively a specific visual feature, continuously blocking out
the automatic processing of reading (Zajano and Gorman, 1986;
Shum et al., 1990), in order to solve the task correctly. The specific
involvement of executive processes is supported by clinical data.
Patients with anterior frontal lesions, and not with posterior
cerebral damage, report significant difficulties in maintaining a
consistent activation of the intended response (Valgimigli et al.,
2010). Furthermore, Parkinson’s Disease patients, characterized
by executive dysfunction due to the disruption of the dopaminergic
pathway (Fera et al., 2007), reported difficulties in SCWT despite
unimpaired attentional abilities (Fera et al., 2007; Djamshidian
et al., 2011).

CONCLUSION

According to the present review, the heterogeneity in the
SCWT scoring methods in international literature, and most
dramatically in Italian literature, seems to require an innovative,
alternative and unanimous scoring system to achieve a more
proper interpretation of the performance in the SCWT. We
propose to adopt a scoring method in which (i) the number of
correct answers in a fixed time in each SCWT condition (W,
C, CW) and (ii) a global index relative to the CW performance
minus reading and/or color naming abilities, are computed.
Further studies are required to collect normative data for
this scoring method and to study its applicability in clinical
settings.

AUTHOR CONTRIBUTIONS

Conception of the work: FS. Acquisition of data: ST. Analysis
and interpretation of data for the work: FS and ST. Writing: ST,
and revising the work: FS. Final approval of the version to be
published and agreement to be accountable for all aspects of the
work: FS and ST.

ACKNOWLEDGMENTS

The Authors thank Prerana Sabnis for her careful proofreading
of the manuscript.
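
As a supplementary illustration of the combined speed and accuracy scores discussed above, Equations (8) and (9), the following minimal Python sketch applies both formulas to two hypothetical participants. The function names, item counts, times, and error counts are assumptions made for the example only.

```python
def stroop_combined_score(total_time: float, n_items: int, uncorrected_errors: int) -> float:
    """Equation (8): total time + (2 x mean time per word) x uncorrected errors."""
    if n_items <= 0:
        raise ValueError("n_items must be positive")
    mean_time_per_word = total_time / n_items
    return total_time + (2 * mean_time_per_word) * uncorrected_errors


def gardner_combined_score(total_time: float, errors: int, n_items: int = 100) -> float:
    """Equation (9): total time + (total time / number of stimuli) x errors."""
    if n_items <= 0:
        raise ValueError("n_items must be positive")
    return total_time + (total_time / n_items) * errors


# Hypothetical CW performance on a 100-item table (times in seconds).
# Participant A is fast but often reads the word instead of naming the ink.
fast_inaccurate = stroop_combined_score(total_time=100.0, n_items=100, uncorrected_errors=20)
# Participant B is slower but makes no uncorrected errors.
slow_accurate = stroop_combined_score(total_time=130.0, n_items=100, uncorrected_errors=0)
# Equation (9) applies half the per-error penalty of Equation (8).
gardner_a = gardner_combined_score(total_time=100.0, errors=20)

print(fast_inaccurate, slow_accurate, gardner_a)
```

Under these assumed values, the raw completion time alone would rank participant A as the better performer, whereas the combined score of Equation (8) ranks A as worse, which is the situation described in the Discussion for patients with a non-pathological reading time.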

