Kung et al. (2018) Attention Questions
doi: 10.1111/apps.12108
INTRODUCTION
Self-report measurement scales are a cornerstone of survey research, and they inform many organisational decisions, such as personnel selection and assessment. Ideally, survey respondents pay adequate attention to each scale item so that their responses are meaningful and offer valid measurement of a psychological construct. Yet reality often falls short of this ideal. Evidence suggests that at least 5 per cent of respondents answer scale items carelessly (Johnson, 2005), and this percentage can reach 60 per cent when respondents receive little or no incentive to complete a survey (Berry et al., 1992; Hauser & Schwarz, 2016; Meade & Craig, 2012). Attention checks, in general, have shown great success in screening out careless survey respondents and protecting the validity of scale measurement (see Meade & Craig, 2012; Woods, 2006).
Another variation of attention checks is called instructional manipulation
checks (IMC; Oppenheimer, Meyvis, & Davidenko, 2009). An IMC item tends
to be elaborate, with a critical cue to the correct answer hidden within a lengthy
instruction. Appendix A provides a typical example of an IMC item. In that
example, instead of answering the surface question (i.e. what workplace facili-
ties are available), the key to passing the IMC is the last sentence of the para-
graph, which instructs respondents to enter “I read the instructions” in the
textbox. The assumption of why IMC traps careless respondents is similar to
that of instructed-response items. In this respect, anyone who has read the
entire set of instructions should be able to follow the “real” instruction in the
last sentence, while any other response indicates inattention. Compared to
instructed-response items, an IMC requires more effort in reading and hence,
in theory, is a more effective technique for identifying careless respondents.
However, because an IMC stands out from the typical survey questions, its
format limits the utility of reusing an IMC, particularly in the same sample.
Moreover, an IMC looks more elaborate and can seem trickier to participants, which may influence a respondent's subjective survey experience more strongly.
Since its publication in 2009, the IMC has been cited over 880 times.1
Despite the increasingly popular use of attention checks, there has been a paucity of research on how they influence survey responses. In particular, attention checks may have a systematic influence on the way respondents answer and understand the actual survey questions (Hauser & Schwarz, 2015).
1. As of March 2017.
comes to mind (e.g. Ferguson, Matthews, & Cox, 1999; Van Lange, 1999).2
Yet, whether attention checks in fact lead participants to deliberate more
and overthink, resulting in more inaccurate or inconsistent scale ratings,
remains an empirical question to explore.
Perhaps the most direct evidence that attention checks can pose a threat to scale validity comes from studies showing that attention checks indeed cause respondents to deliberate more. Hauser and Schwarz (2015) compared the effect of receiving (vs. not receiving) an IMC on respondents' subsequent performance on a deliberation task. Their results indicated that those who had been given an IMC before the deliberation task scored higher on deliberation; for instance, they spent more time thinking about a solution and relied less on intuitive and more on rational reasoning. Moreover, these differences did not depend on how familiar the respondents were with attention checks: seeing an attention check for the first time had the same effect as having seen one previously. Taken together, these initial findings raise the possibility that attention checks may damage scale validity.
Ironically, if attention checks do threaten scale validity, the threat does not fall on careless respondents (i.e. those who do not notice the attention check); rather, it is the careful respondents, whose data are likely retained in the actual analysis, who will be affected. This outcome can be disastrous because it means that the recommended practice of using attention checks for screening (Buhrmester, Kwang, & Gosling, 2011; Paolacci & Chandler, 2014) may create a more serious problem than it solves. Typically, careless participants are not the majority of a sample, and error due to careless responding tends to be random (e.g. Johnson, 2005). Because it is random, error variance due to careless responding can be attenuated as empirical evidence accumulates and the overall sample grows larger. The error due to attention checks, however, if real, is likely to be systematic: it accumulates across studies and cannot be attenuated even by a larger sample size. Such systematic error is a confound that can sway results in a particular direction and bias research conclusions. As using attention checks to screen participants becomes increasingly popular as a "best practice" in the field (DeSimone, Harms, & DeSimone, 2015), their potential threat to scale validity becomes increasingly critical to address.
Are attention checks a threat to scale validity? This is an empirical question that remains open and requires more evidence to address. Our current research answers this question and directly examines whether the inclusion of attention checks influences responses to a scale.
2. We thank a reviewer for this observation.
STUDY 1
This study tests whether embedding instructed-response items in a scale influences people's responses to that scale. For the purpose of this study, we utilised a popular organisational citizenship behavior (OCB) scale developed by Podsakoff, MacKenzie, Moorman, and Fetter (1990). Two criteria guided our choice of this scale. First, with a goal to inform the management literature, we wanted to examine the impact of attention checks on an influential scale. The OCB scale developed by Podsakoff and colleagues fits this criterion, as it is highly cited and widely used.3 Second, the scale is multidimensional. Compared to unidimensional scales, scales with multiple sub-dimensions are more nuanced and should be more sensitive to varying scale responses across groups of respondents. Therefore, to aim for a stronger test of any potential influence of attention checks, we searched for a multidimensional scale. Based on these criteria, we selected the organisational citizenship behavior scale.
Method
Participants and Procedures. We recruited participants through Amazon's Mechanical Turk (MTurk; Buhrmester et al., 2011) to complete a short online survey about workplace experiences.
3. Cited over 4,800 times as of March 2017.
Results
Mean Differences. To examine whether the instructed-response items influenced scale responses, we first compared the mean scores of the scale across the two conditions. Results of a between-subjects ANOVA indicated that the experimental and control conditions did not differ in overall mean scores or in mean scores within each sub-dimension, ps > .08 (see Table 1). These results suggest that attention checks did not significantly alter the degree to which respondents endorsed the items.
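As a check on the effect-size column in Table 1, the reported means and standard deviations can be converted back into standardized mean differences. This is an illustrative sketch, assuming the tabled values are Cohen's d computed with a pooled standard deviation and roughly equal group sizes (the article does not state the formula):

```python
import math

def cohens_d(mean1, sd1, mean2, sd2):
    """Cohen's d for two groups, assuming roughly equal group sizes.

    With equal ns, the pooled standard deviation reduces to the root
    mean square of the two group standard deviations.
    """
    pooled_sd = math.sqrt((sd1 ** 2 + sd2 ** 2) / 2)
    return (mean1 - mean2) / pooled_sd

# Study 1, OCB overall: 5.64 (.67) vs. 5.54 (.71); Table 1 reports .14
print(round(cohens_d(5.64, 0.67, 5.54, 0.71), 2))
# Study 1, Virtue: 5.50 (1.11) vs. 5.39 (1.10); Table 1 reports .10
print(round(cohens_d(5.50, 1.11, 5.39, 1.10), 2))
```

The remaining mean-score rows of Table 1 reproduce the same way, which supports reading the third column of each study as a standardized mean difference.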
4. In the experimental condition, 21 participants answered either one or both instructed-response items incorrectly. These participants reported significantly lower means on the overall OCB scale and most sub-dimensions than the participants who successfully completed the attention check and than participants in the control condition. Including or excluding these participants did not change the overall pattern of results. For a more stringent test of the hypotheses, they were excluded from the analysis.
TABLE 1
Descriptive Statistics by Condition and Study

                                  Study 1                            Study 2
                          Experimental   Control       d     Experimental    Control        d
Mean scores
  OCB overall             5.64 (.67)     5.54 (.71)    .14   5.33 (.79)      5.29 (.85)     .05
  Conscientiousness       5.71 (1.01)    5.70 (.93)    .01   5.61 (.97)      5.57 (1.05)    .04
  Sportsmanship           5.45 (1.05)    5.39 (1.13)   .06   4.72 (.87)      4.72 (.85)     .00
  Virtue                  5.50 (1.11)    5.39 (1.10)   .10   5.26 (1.20)     5.05 (1.31)    .17
  Courtesy                6.00 (.80)     5.87 (.89)    .15   5.59 (.97)      5.59 (1.02)    .00
  Altruism                5.46 (.58)     5.37 (.62)    .15   5.43 (1.09)     5.46 (1.13)    .03
Mean age                  32.50 (8.99)   33.38 (9.18)        34.09 (10.88)   34.16 (10.89)
Male %                    43.6           40.9                52.9            51.8
Mean tenure (in years)    4.61 (5.26)    5.17 (5.69)         4.89 (4.18)     5.73 (5.45)
Race (%)
  Caucasian/White         79.8           74.0                76.3            67.3
  African/Black           6.1            8.3                 5.3             10.8
  Hispanic/Latino         3.7            6.6                 5.8             5.4
  East Asian              0.6            2.8                 4.3             4.9
  South Asian             1.2            2.2                 0.0             4.9
  Native/Aboriginal       0.0            1.1                 1.0             0.9
  Middle Eastern          0.6            0.6                 0.0             0.4
  Other (e.g. mixed-race) 0.8            4.4                 7.2             4.9
Educational attainment (%)
  Less than high school   0.6            0.0                 0.5             0.5
  High school             4.9            8.3                 7.7             7.2
  Some college            39.3           30.4                39.1            36.0
  Bachelor's degree       33.7           36.5                38.2            41.4
  Some graduate work      9.2            8.8                 1.4             2.3
  Advanced degree         12.3           16.0                13.0            12.6
Median income (USD)       51,000–60,999  51,000–60,999       51,000–60,999   51,000–60,999

Note: Standard deviations in parentheses.
TABLE 2
Measurement Invariance Tests of the OCB Scale

Model                          df     χ²       p       RMSEA   CFI    Δdf   Δχ²     p(Δχ²)
Study 1
  Configural invariance        484    939.98   <.001   0.05    0.86   –     –       –
  Metric invariance            503    961.38   <.001   0.05    0.86   19    21.40   0.32
  Scalar invariance            522    993.87   <.001   0.05    0.86   19    32.49   0.03
  Partial scalar invariance^a  521    986.34   <.001   0.05    0.85   18    24.96   0.13
  Equivalent factor means      526    993.59   <.001   0.05    0.85   5     7.25    0.20
Study 2
  Configural invariance        484    941.43   <.001   0.05    0.92   –     –       –
  Metric invariance            503    961.52   <.001   0.05    0.92   19    20.09   0.39
  Scalar invariance            522    982.85   <.001   0.05    0.92   19    21.33   0.32
  Equivalent factor means      527    991.80   <.001   0.05    0.92   5     8.95    0.11

Note: ^a The partial scalar invariance model in Study 1 is tested against the metric invariance model.
We first tested configural invariance, in which the same factor structure was specified in both conditions. The configural model showed acceptable fit, χ²(484) = 939.98, p < .001, RMSEA = .05, CFI = .86 (see Table 2), suggesting that the factor structure and loadings of OCB are satisfactorily equivalent across the two conditions.
Next, we tested metric invariance, in which the factor loadings of the same survey items were constrained to be equal between the experimental and control groups. This is a strong test of factorial invariance, which tells us whether the same survey item relates to the underlying latent factor in the same way in the two conditions. Results for this metric invariance model indicated overall acceptable model fit, χ²(503) = 961.38, p < .001, RMSEA = .05, CFI = .86 (see Table 2). Critically, the χ² difference test between this metric model and the prior (configural) model was non-significant, Δχ² = 21.40, Δdf = 19, p > .05, suggesting metric invariance of the scale across experimental conditions. These results indicate that the OCB scale is structurally similar for participants who are exposed to instructed-response items and those who are not.
To provide an even more stringent measurement invariance test, we conducted a test of scalar invariance, in which the intercepts of the same survey items were constrained to be equal between the experimental and control groups. Although this is the least frequently conducted test of measurement invariance, some researchers have found its results useful for interpreting response threshold differences between groups on the rating of a particular item (see Vandenberg & Lance, 2000). Results for the scalar invariance model of the OCB scale indicated overall acceptable model fit, χ²(522) = 993.87, p < .001, RMSEA = .05, CFI = .85 (see Table 2). However, the χ² difference test between this scalar model and the prior (metric) model was significant, Δχ² = 32.49, Δdf = 19, p = .03, suggesting that not all item intercepts were the same across the two experimental conditions. To identify the source of scalar inequivalence, we examined the item intercepts between the experimental and
control groups (Vandenberg & Lance, 2000). The results indicated that item 16 (see Appendix B), and only this item, had significantly different intercepts across the conditions (experimental = 0.08 vs. control = −0.38). To test for partial scalar invariance, we constrained the intercepts of all survey items to be equal across the conditions except for item 16. Results for the partial scalar invariance model indicated overall acceptable fit, χ²(521) = 986.34, p < .001, RMSEA = .05, CFI = .85 (see Table 2). Moreover, the χ² difference test between the metric and partial scalar invariance models was non-significant, Δχ² = 24.96, Δdf = 18, p > .05. Overall, the scalar invariance test results suggested that the item score intercepts of the OCB scale were very similar for respondents who received the instructed-response items and those who did not.5
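The nested-model comparisons above all rest on the same computation: the difference in χ² between two nested models is referred to a χ² distribution with degrees of freedom equal to the difference in model df. A minimal sketch using the Study 1 values reported above (SciPy's `chi2.sf` gives the upper-tail probability):

```python
from scipy.stats import chi2

def chisq_difference_test(chisq_constrained, df_constrained, chisq_free, df_free):
    """Chi-square difference test for two nested models.

    The more constrained model (e.g. metric) has the larger df; under the
    null hypothesis of invariance, the chi-square difference follows a
    chi-square distribution with df equal to the difference in model df.
    """
    delta_chisq = chisq_constrained - chisq_free
    delta_df = df_constrained - df_free
    p = chi2.sf(delta_chisq, delta_df)  # upper-tail probability
    return delta_chisq, delta_df, p

# Metric vs. configural (Study 1): Δχ² = 21.40, Δdf = 19, reported p = .32
d1, df1, p1 = chisq_difference_test(961.38, 503, 939.98, 484)
# Scalar vs. metric (Study 1): Δχ² = 32.49, Δdf = 19, reported p = .03
d2, df2, p2 = chisq_difference_test(993.87, 522, 961.38, 503)
print(d1, df1, p1, d2, df2, p2)
```

A non-significant Δχ² (as in the metric comparison) retains the more constrained model; a significant one (as in the scalar comparison) motivates the search for partially invariant items.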
In sum, Study 1 provides no evidence that instructed-response items are a threat to scale validity. Respondents who saw attention checks in the survey and those who did not were indistinguishable in their responses to and understanding of the scale. Going beyond the current findings, we also wanted to find out whether an IMC item poses a threat to the scale. Compared to instructed-response items, an IMC is more elaborate and can seem trickier to participants, which may induce deliberation and influence a respondent's subjective survey experience more strongly. To test the effect of an IMC on scale responses and to replicate the findings, we conducted Study 2.
STUDY 2
This study tests whether having answered an IMC influences responses to a
subsequent scale. Consistent with Study 1, we used the organisational citizen-
ship behavior scale (Podsakoff et al., 1990) as the criterion.
Method
Participants and Procedures. We used the same recruitment and selection
procedures as Study 1. Participants completed a 5-minute online survey about
workplace experiences for US$0.30. No respondents in Study 2 had partici-
pated in Study 1. Survey respondents first reported their demographics and
5. We also tested equal factor means between groups using structural equation modeling. In this test, the means of the same factor were constrained to be equal between the experimental and control groups. This test tells us whether there are significant differences between groups in how they scored on each factor of the scale. Results for the equal factor means model indicated overall acceptable fit, χ²(526) = 993.59, p < .001, RMSEA = .05, CFI = .85 (see Table 2). Moreover, the χ² difference test between the partial scalar invariance and equal factor means models was non-significant, Δχ² = 7.25, Δdf = 5, p > .05. Consistent with the between-subjects ANOVA results, including instructed-response items did not affect respondents' mean scores on the scale.
Results
Mean Differences. To examine whether the IMC influenced scale responses, we first compared the scale scores across the two conditions. Results of a between-subjects ANOVA indicated that the experimental and control conditions did not differ in overall mean scores or in mean scores within each sub-dimension, ps > .37 (see Table 1). These results suggest that attention checks did not influence respondents' degree of endorsement of the items.
6. In the experimental condition, 21 participants provided an incorrect answer to the IMC. These participants did not differ in overall or sub-dimensional OCB scale scores from the participants who successfully completed the attention check or from those in the control condition. Consistent with Study 1, including or excluding these participants did not change the overall pattern of results. For a more stringent test of the hypotheses, they were excluded from the analysis. Moreover, the proportion of male participants differed between the studies (see Table 1). However, the pattern of our results was consistent across gender, and therefore gender was not included in the main analyses.
OVERALL DISCUSSION
Taken together, findings from two separate studies both suggest no evidence
that attention checks pose a threat to scale validity. Contrary to what the extant
literature may have predicted, attention checks did not influence respondents
answers to and understanding of the scale. This was consistent regardless of
whether the attention checks were in the form of embedded items (Study 1) or
as an individual IMC (Study 2). These results contribute to organisational sci-
ence and other literatures more broadly. To our knowledge, these studies are
the first to demonstrate that attention checks do not seem to bear an underly-
ing threat to scale validity. Because attention checks are so widely used nowa-
days, this is an especially timely piece of evidence. The findings also contribute
to our understanding of survey methods and help justify researchers use of
attention checks to ensure quality data. Moreover, as the wording of
instructed-response items and IMCs are very similar across studies in the liter-
ature, our findings can generalise to many other attention check variations.
One variation would be the increasingly popular “infrequency items”—ques-
tions that yield an obvious logically right answer.8 Resembling IMCs and
instructed-response items, infrequency items may increase deliberation, but
our research would suggest that they should not pose scale validity concerns.
Furthermore, even though the studies have a strong focus on a management
science audience, our studies are just as important in informing scholars in
other academic fields that frequently use survey designs, such as psychology,
education, political science, and communication studies.
7. We also tested equal factor means between groups using structural equation modeling. Results indicated satisfactory model fit, χ²(527) = 991.80, p < .001, RMSEA = .05, CFI = .92 (see Table 2). Moreover, the χ² difference test between the scalar invariance and equal factor means models was non-significant, Δχ² = 8.95, Δdf = 5, p > .05. Consistent with the between-subjects ANOVA results, including the IMC did not affect respondents' mean scores on the scale.
8. "I work fourteen months in a year" (Huang, Bowling, Liu, & Li, 2015).
study is “more than meets the eye”. In contrast, it is also possible that they
could be affected by attention checks less strongly because they are more used
to a less trusting environment. The effects of individual differences on survey
response style appear to be an interesting avenue to explore.
Furthermore, whereas scale validity could be one direct outcome influenced
by attention checks, there could be other and more indirect ways in which
attention checks affect the validity of measurements. Take convergent and dis-
criminant validity as an example. By inducing more deliberate thinking, atten-
tion checks may alter the way people construe relations between the constructs
measured in a survey. If this is true, attention checks will dampen convergent
and discriminant validity—the degree to which the focal concept is observed to
be similar to related constructs (i.e. convergent) and distinct from unrelated
constructs (i.e. discriminant) as theories would have predicted. One way to test
this phenomenon is to examine whether attention checks affect how well scale
measures fit into their nomological networks (Cronbach & Meehl, 1955).
However, because the literature on the effects of attention checks on scale responses is limited, whether attention checks affect other forms of measurement validity awaits more empirical work. By further illuminating the interplay between individual and survey characteristics, future research can continue to improve the quality of survey methods and findings.
Conclusion
Attention checks have become a popular method in survey design to ensure quality samples and hence the validity of scale measurement. However, recent evidence suggests that attention checks may influence respondents' level of deliberation, posing a potential threat to the very scale validity that attention checks are meant to protect. Our findings provide a critical and timely test and show no evidence that attention checks significantly affect scale responses. Researchers may continue utilising attention checks in survey designs and examining other dynamics between respondents and survey characteristics to advance our research methods in general.
REFERENCES
Arbuckle, J. (2010). IBM SPSS Amos 19 user's guide. Crawfordville, FL: Amos Development Corporation.
Berinsky, A.J., Huber, G.A., & Lenz, G.S. (2012). Evaluating online labor markets for experimental research: Amazon.com's Mechanical Turk. Political Analysis, 20(3), 351–368. https://doi.org/10.1093/pan/mpr057
Berinsky, A.J., Margolis, M.F., & Sances, M.W. (2014). Separating the shirkers from
the workers? Making sure respondents pay attention on self-administered surveys.
APPENDIX A
APPENDIX B