INDIRECT MEASURES IN FORENSIC CONTEXTS
Indirect Measures in Forensic Contexts
Alexander F. Schmidt, Rainer Banse, & Roland Imhoff
University of Bonn
Please cite as:
Schmidt, A. F., Banse, R., & Imhoff, R. (2015). Indirect Measures in Forensic Contexts. In T. M. Ortner
& F. J. R. van de Vijver (Eds.). Behavior-Based Assessment: Going Beyond Self-Report in the
Personality, Affective, Motivation, and Social Domains (pp. 173-194). Göttingen: Hogrefe.
Author Note
Alexander F. Schmidt, Department of Social and Legal Psychology, University of Bonn,
Germany; Rainer Banse, Department of Social and Legal Psychology, University of Bonn, Germany;
Roland Imhoff, Sozialpsychologie: Social Cognition, University of Cologne, Germany.
Correspondence concerning this article should be addressed to Alexander F. Schmidt, Institute
for Psychology, Department of Social and Legal Psychology, University of Bonn, Kaiser-Karl-Ring 9,
53111 Bonn, Germany. Email: afschmidt@uni-bonn.de.
Running Head: INDIRECT MEASURES IN FORENSIC CONTEXTS
Indirect Measures in Forensic Contexts
Whenever psychologists try to diagnose a condition or disposition, psychometrically precise
assessment procedures are of paramount importance. This general need is amplified in applied
forensic contexts where diagnostic decisions may collide with individual and societal rights and
needs. Assessment outcomes have as far-reaching consequences for the respondent (e.g., restraint
of individual freedom) as for members of society as a whole (e.g., risk of future victimization). These
conflicting interests underscore the need for valid measures. However, classic self-report assessment
procedures such as questionnaires and interviews are inherently transparent and can easily be faked
by respondents who are aware of the personal consequences of the assessment outcome.
Another problem of (forensic) assessment based on self-reports is the high demand imposed
on respondents’ introspective abilities. Some forensically relevant constructs may lack introspective
accessibility per se, such as situation-specific impulses and implicit offence-facilitating theories
(cognitive distortions). Other constructs may in principle be open to introspection but the quality of
self-report depends on certain cognitive skills that are not common among prototypical forensic
populations, who usually have relatively low education levels and weak verbal skills. Due to these
crucial validity threats, researchers and practitioners alike question the usefulness of self-report
techniques (but see Grieger, Hosser, & Schmidt, 2012; Walters, 2006). Therefore, specifically in
forensic contexts, there is a strong need for reliable and valid measurement paradigms. Ideally,
these should not rely on explicit self-report and introspection and should be less transparent as well
as less deliberately controllable.
What Are Direct/Indirect vs. Explicit/Implicit Measures?
The search for a solution to the drawbacks of self-report approaches has led to an increasing
research interest in “implicit” and/or “indirect” measures. This measurement approach has shown
remarkable predictive validity across different psychological subdisciplines (e.g., Friese, Hofmann, &
Schmitt, 2008; Perugini, Richetin, & Zogmaister, 2010), including the forensic domain (Snowden &
Gray, 2010). The success of indirect measures has been attributed to the fact that these approaches
benefit from being (a) inherently less transparent than self-report measures (due to the indirect
character of the measurement procedure) and (b) able to tap into automatic attitudes and
behavioral dispositions (because of the implicitness of the constructs to be measured). However,
despite the immense popularity of these measures, the precise terminological differences between
the theoretical attributes “indirect” and “implicit” do not always seem to be unequivocal as these
terms are often used interchangeably.
For the remainder of the chapter we will rely on the terminological distinction proposed by De
Houwer (2006). Accordingly, two uses of the term measure have to be distinguished: It can refer
either to the measurement outcome or to the measurement process. The term
implicit is reserved for various functional properties of the measurement outcome. These properties
describe typical criteria of automaticity that specify the particular sense in which a measure is
implicit (e.g., respondents’ lack of awareness of the relationship between the assessed construct and
the measurement outcome, lack of conscious access to the relevant construct, lack of voluntary
control over the assessment outcome). These properties do not necessarily co-occur and have to be
demonstrated empirically rather than being mere presumptions. The term indirect refers to the
procedural properties of the measurement that are always based on an explicit set of rules of how
the measurement score is derived from the assessment (otherwise, it would not qualify as a measure
in a scientific sense). Whereas direct measures rely on introspective self-descriptions or ratings of
indicators (e.g., questionnaire items) of the relevant construct, indirect assessment procedures use
the behavior exhibited in response to a stimulus (e.g., response latencies while categorizing images)
to draw indirect inferences on the construct in question. Notably, the measurement outcome of an
indirect measure is not necessarily implicit (De Houwer, 2006), as for example individuals might be
fully aware of the purpose of the assessment (which is true for most indirect measures utilized in
forensic psychology).
In line with De Houwer’s (2006) abovementioned terminological distinction, in the remainder
of this chapter we will focus on forensically relevant measures that draw their diagnostic inferences
indirectly from task behavior (i.e., response latencies¹). A wide range of forensically relevant, latency-based
indirect measures have been introduced to tap into various domains of individual differences
(Snowden & Gray, 2010). By far the most research utilizing indirect measures, however, has focused
on deviant sexual interests (DSI) in children² (for recent overviews, see Snowden, Craig, & Gray,
2011; Thornton & Laws, 2009a). Indirect latency-based measures of DSI are often referred to as
“attention-based measures” (Gress & Laws, 2009; Kalmus & Beech, 2005; Ó Ciardha, 2011),
postulating that the underlying processes rely on the differential allocation of attentional resources.
However, for most of these measures this remains hypothetical, as only a small body of empirical
research into their procedural underpinnings exists. Also, empirical demonstrations of relevant
implicitness criteria are missing. As a consequence, in focusing on the nature of the dependent
variable we prefer to use latency-based indirect measures as the theoretically most parsimonious
umbrella term.
Latency-Based Indirect Measures of Deviant Sexual Interests
An overview of latency-based indirect measures of DSI utilizing samples involving male sexual
offenders against children is reported in Table 1. All of these measures capitalize on individual
differences in (sexual) information processing and, as a result, are also framed as cognitive
approaches to the assessment of DSI (Thornton & Laws, 2009a).
(Insert Table 1)
¹ Due to space restrictions we will exclude indirect assessment paradigms that capitalize on physiological
reactions such as polygraphy in the field of deception detection (for a critical overview see National Research
Council, Committee to Review the Scientific Evidence on the Polygraph, 2003; but see Verschuere, Ben-Shakhar,
& Meijer, 2011) or penile plethysmography/phallometry (for an overview see Kalmus & Beech, 2005).
² Throughout this chapter the term deviant refers to sexual interest in prepubescent children (irrespective of
other paraphilic or otherwise abnormal sexual interests) as indicated by corresponding fantasies or behavior.
An important distinction is whether indirect measures assess deviant sexual interest (DSI) or
deviant sexual preferences (DSP). Interest refers to the absolute level of sexual interest in a specific
target category (e.g., prepubescent children) irrespective of interest in other categories (e.g.,
postpubescent individuals), whereas preference denotes relative sexual preference for one target
category over another target category and is usually based on a difference index of a target category
minus a comparison category (e.g., prepubescent over postpubescent individuals). Notably, although
representing two different conceptualizations, interest and preference are often used
interchangeably in the literature. Several indirect measures – the most prominent example being the
Implicit Association Test (IAT; Perugini, Costantini, Richetin, & Zogmaister, this volume) – are
inherently conceptualized as DSP measures because they are calculated from latency differences
based on sexually relevant versus sexually irrelevant trials. DSP measures do not convey diagnostic
information about the absolute level of DSI, because the baseline level of DSI gets eliminated in the
computation. Thus, a person with strong interest in both children and adults will have DSP scores
comparable to a person with weak interest in both target categories.
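A small numerical sketch may help to see why a difference index discards the absolute level of interest. The latencies, the dictionary keys, and the helper dsp_index below are hypothetical and serve illustration only; they are not taken from any study cited in this chapter.

```python
from statistics import mean


def dsp_index(target_latencies, comparison_latencies):
    """Relative preference: mean latency for the target category
    minus mean latency for the comparison category."""
    return mean(target_latencies) - mean(comparison_latencies)


# Hypothetical response latencies in ms (illustrative values only).
strong_both = {"children": [2900, 3100, 3000], "adults": [2950, 3050, 3000]}
weak_both = {"children": [1400, 1600, 1500], "adults": [1450, 1550, 1500]}

for label, person in (("strong interest in both", strong_both),
                      ("weak interest in both", weak_both)):
    dsi_children = mean(person["children"])                 # absolute level (DSI)
    dsp = dsp_index(person["children"], person["adults"])   # relative index (DSP)
    print(f"{label}: DSI(children) = {dsi_children} ms, DSP = {dsp} ms")
```

Both hypothetical respondents obtain the same DSP index although their absolute latency levels differ markedly.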
In addition to the DSI/DSP distinction, latency-based indirect measures of DSI can be grouped
into two distinct measurement approaches: task-relevant and task-irrelevant paradigms. Task-relevant
indirect DSI/DSP measures involve the explicit categorization of sexual target categories –
either as sexually relevant themselves or in combination with classification trials of sexual attributes.
Due to the explicit task requirement to process sexual relevance, it is fairly transparent to the
respondent that DSI is the diagnostic construct of interest. However, as respondents are not usually
informed that response latencies are the central dependent variable and the underlying rationale of
the diagnostic inference is unknown to them, these measures qualify as indirect. Task-irrelevant
measures are based on the detection of sexually irrelevant characteristics (e.g., location, colour,
semantic meaning) of target stimuli that are presented together with distracting sexually relevant vs.
irrelevant background stimuli (e.g., adults, children). The underlying rationale is that sexually relevant
background stimuli interfere with the primary detection task due to attentional capture resulting in
increased response latencies.
Task-relevant indirect measures of deviant sexual interests
Viewing time tasks. Viewing time (VT) was the first latency-based indirect measure of
sexual interest (Rosenzweig, 1942). In the standard VT procedure, participants are asked to evaluate
pictures of target individuals on a graded scale of sexual attractiveness/arousal. The response latency
of this judgment is unobtrusively measured. It is a robust finding that this response latency is longer
for sexually attractive as compared to sexually unattractive targets and, in turn, VT measures can be
used to discriminate between participants with respect to sexual preference (e.g., Imhoff, Schmidt,
Nordsiek, Luzar, Young, & Banse, 2010).
The underlying processes driving the robust effects of longer response latencies for sexually
attractive targets are not entirely clear. Three mechanisms have been frequently proposed: (a)
deliberate delay due to the hedonic quality of sexually preferred targets, (b) automatic attentional
adhesion, (c) slowing down of decision making processes after the presentation of explicit erotic
stimuli (sexual content induced delay; Geer & Bellard, 1996). We conducted a series of experiments
providing the first causal tests of VT processes (Imhoff et al., 2010). Deliberate delay to keep pleasant
stimuli in sight longer was ruled out because prolonged response latencies also emerged for relevant
sexual attractiveness ratings in the absence of target pictures that had been presented beforehand
for a fixed amount of time. Additionally, VT effects emerged when restricted response windows of
1000 ms were utilized. Furthermore, attentional adhesion to sexually attractive stimuli also could not
fully explain VT effects: If sexually attractive stimuli lead to longer response latencies because the
stimuli automatically capture participants’ attention and distract them from the rating task, VT
effects should vanish when the sexually attractive stimuli are taken away from the participants prior
to rating sexual attractiveness. However, as described above, this was not the case. Finally, VT effects
still emerged for stimuli depicting only the targets’ heads without any further indications of erotic
content, thereby ruling out sexual content induced delay as a causal explanation. These results raise
the question of whether the term “viewing time” is a misnomer, as participants saw all stimuli for the
same amount of time under restricted viewing conditions but still differed in their response latencies.
Accordingly, standard VT effects should be described as prolonged decision latencies for sexually
attractive targets (Imhoff et al., 2010).
As a consequence, two further mechanisms remained as plausible explanations for VT effects
even in absence of a stimulus while responding: (a) automatic time-consuming schematic processes
triggered by sexually attractive stimuli (stimulus-specific effects) and (b) cognitive demands
associated with the task of rating sexual attractiveness (task-specific effects). Imhoff, Schmidt, Weiß,
Young, and Banse (2012) disentangled stimulus- and task-specific effects by manipulating the sexual
orientation perspective under which male participants responded to standard VT tasks. It was shown
that VT effects were predominantly a function of the assigned perspective (task-specific account) and
not dependent on participants’ sexual orientation (stimulus-specific account). In other words, sexual
attractiveness ratings from a vicarious (e.g., heterosexual) perspective took longest when the targets
were adult females, regardless of participants’ actual hetero- or homosexual orientation. This is at
odds with the notion that VT measures primarily tap into hot automatic processes elicited by sexually
attractive stimuli rather than task-dependent response strategies (e.g., scrutinizing whether the
stimulus exhibits the right age, sex, and/or attractiveness for being a sexually exciting stimulus). The
latter process is based on the assumption that the more criteria for endorsing sexual attraction have
to be affirmed, the longer the scrutinizing takes (i.e., VTs increase), whereas rejection of any criterion
allows the process to stop immediately (i.e., VTs decrease). In line with this account of task- vs.
stimuli-specific effects, Glasgow (2009) reported that neither perceived sexual competition of female
mate rivals nor filial affection to children in heterosexual women – both stimulus characteristics
hypothesized to increase VTs for sexually irrelevant categories – conflated standard VT effects while
rating sexual attractiveness.
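The task-specific account can be sketched as a serial check with early termination. The sketch below is a toy model under strong assumptions that are made here for illustration and are not part of the cited studies: criteria are checked in a fixed order, and every check counts as one unit of decision time. All names (scrutiny_steps, the criteria, and the target attributes) are hypothetical.

```python
def scrutiny_steps(target, criteria):
    """Count how many criteria are evaluated before the decision stops."""
    steps = 0
    for criterion in criteria:
        steps += 1
        if not criterion(target):
            break  # rejecting any criterion ends the evaluation immediately
    return steps


# Hypothetical criteria for a rater answering from a heterosexual male perspective.
criteria = [
    lambda t: t["age_group"] == "adult",
    lambda t: t["sex"] == "female",
    lambda t: t["attractive"],
]

adult_woman = {"age_group": "adult", "sex": "female", "attractive": True}
adult_man = {"age_group": "adult", "sex": "male", "attractive": True}
girl = {"age_group": "child", "sex": "female", "attractive": True}

for name, target in (("adult woman", adult_woman),
                     ("adult man", adult_man),
                     ("girl", girl)):
    print(name, scrutiny_steps(target, criteria))
```

In this toy model, the number of checks (and hence the decision latency) depends only on the perspective encoded in the criteria, not on who is doing the rating.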
Since the first demonstrations of VT effects for child sexual offenders (Harris et al., 1996),
there have been numerous independent forensic replications (for an overview of VT studies on DSI
see Table 1). Viewing time tasks have been shown to distinguish between child sexual offenders and
non-offending controls (Glasgow, 2009; Harris et al., 1996; ds = 1.61 and 1.00), mixed community and
offender controls (Banse et al., 2010; Fromberger et al., in press; ds from 0.76 to 0.82), as well as
varying offender control groups such as non-sexual offenders (Babchishin, Nunes, & Kessous, 2012;
Banse et al., 2010; ds from 0.86 to 1.84) and adult sexual offenders (Abel et al., 2004; Gress, 2005;
Worling, 2006; ds from 0.51 to 1.08). They also differentiated between different subtypes of child
sexual offenders (e.g., child sexual offenders who victimized boys or boys and girls vs. only girls;
Gress, 2005; Schmidt, Gykiere, Vanhoeck, Mann, & Banse, in press; ds from 0.84 to 1.65).
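For readers less familiar with the effect size metric reported throughout, the following is a minimal sketch of Cohen's d with a pooled standard deviation. It assumes two independent groups; the cited studies may have used variants (e.g., small-sample corrections), and the scores shown are invented for illustration.

```python
from math import sqrt
from statistics import mean, variance


def cohens_d(group_a, group_b):
    """Standardized mean difference (group_a minus group_b), pooled SD."""
    n_a, n_b = len(group_a), len(group_b)
    pooled_var = ((n_a - 1) * variance(group_a)
                  + (n_b - 1) * variance(group_b)) / (n_a + n_b - 2)
    return (mean(group_a) - mean(group_b)) / sqrt(pooled_var)


# Hypothetical index scores for an offender group and a control group.
offenders = [2, 4, 6]
controls = [1, 3, 5]
print(cohens_d(offenders, controls))
```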
Reports of internal consistency on raw latencies (Cronbach’s α) for forensic VT measures
generally ranged between .72 and .93 (with only two notable exceptions of .60 [male child stimuli;
Letourneau, 2002] and .62 [male adolescent stimuli; Worling, 2006]; see Table 1). Sets of African
American stimuli were tested in two studies (Abel et al., 1998; Letourneau, 2002), but only the latter
author reported origin-specific analyses (internal consistency of the African American categories was
comparable to the Caucasian stimulus set with αs ranging from .72 to .87). These generally satisfying
to good coefficients might overestimate the reliability of VT measures as the calculations might be
confounded with general executive functioning (i.e., reaction speed). However, general classification
speed as assessed by a different task was found to be unrelated to VT DSI/DSP measures (Schmidt et
al., in press). At present, no data on retest reliability are available for VT DSI measures.
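The concern that general reaction speed might inflate internal consistency on raw latencies can be illustrated with a short sketch. It uses the standard formula for Cronbach's α; the latency matrix and the person-specific speed offsets are hypothetical values chosen only to make the point visible.

```python
from statistics import variance


def cronbach_alpha(rows):
    """rows: one list of k item scores (e.g., raw latencies) per participant."""
    k = len(rows[0])
    items = list(zip(*rows))  # transpose: one tuple of scores per item
    item_var_sum = sum(variance(item) for item in items)
    total_var = variance([sum(row) for row in rows])
    return k / (k - 1) * (1 - item_var_sum / total_var)


# Hypothetical latencies (ms): 4 participants x 3 stimuli, essentially unrelated.
base = [
    [1500, 1700, 1400],
    [1600, 1450, 1650],
    [1450, 1600, 1550],
    [1700, 1500, 1600],
]

# Add a constant per participant to mimic stable differences in general speed.
speed_offsets = [0, 500, 1000, 1500]
with_speed = [[score + offset for score in row]
              for row, offset in zip(base, speed_offsets)]

alpha_base = cronbach_alpha(base)
alpha_speed = cronbach_alpha(with_speed)
print(alpha_base, alpha_speed)
```

In this toy example, α rises sharply once stable speed differences are added, although the stimulus-specific information is unchanged.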
Viewing time DSI measures oftentimes converge with corresponding self-report measures
(e.g., Abel et al., 1998; Babchishin et al., 2012; Banse et al., 2010; Glasgow, 2009; Harris et al., 1996;
Worling, 2006) as well as DSP Implicit Association Tests (Babchishin, Nunes, & Hermann, in press;
Banse et al., 2010). However, particularly in light of problems with self-reports, comparing VT with
other non-self-reported measures seems advisable. A prime candidate for this is penile
plethysmography (PPG), often regarded as the most valid measure of DSI (Seto, 2008; regarding
methodological shortcomings see, e.g., Kalmus & Beech, 2005). Two studies that concurrently
utilized VT and PPG measures of DSI/DSP confirmed their convergent validity (Letourneau, 2002;
Stinson & Becker, 2008; rs between .28 and .61). However, another study reported a negative
association between VT and PPG indexes (r = -.47; Babchishin et al., 2012), rendering the findings on
convergence of VT with PPG assessments inconclusive. The Screening Scale for Pedophilic Interests
(Seto & Lalumière, 2001) is another indicator of pedophilic interest based on an index of offending
behavior. It is phallometrically validated and also associated with recidivism risk (Seto, Harris, Rice, &
Barbaree, 2004). VT measures were reported to converge with the Screening Scale for Pedophilic
Interests (Banse et al., 2010; Schmidt et al., in press). Additionally, Schmidt et al. (in press) reported
preliminary support for positive correlations of VT DSI/DSP measures with recidivism risk as assessed
by standard actuarial risk indicators (Static-99R; Helmus, Thornton, Hanson, & Babchishin, 2012).
At present, no data have been published on the fakeability of VT paradigms when used in
forensic contexts (as this would risk informing at least a number of offenders of the underlying
scoring procedures). Obviously, VT tasks are easy to fake once the measurement rationale is known.
In line with this, naïvely dissimulating pedophiles were significantly less accurately
classified than non-dissimulating pedophilic child sexual offenders (Gray & Plaud, 2005; d = -2.13).
This finding can be criticized in terms of its post-hoc classification algorithms for the dissimulators
and its strong sample selection effects (Sachsenmaier & Gress, 2009). Opposing evidence stems from
another study: VT tasks did not show differences between non-informed deniers vs. admitters of
child sexual offending such that both deniers and admitters could be discriminated from non-sexual
offender controls (Babchishin et al., 2012; ds = 1.22 and 1.32, respectively). The results from
Babchishin et al. (2012) thus provide the first evidence of VT tasks’ robustness against non-informed
dissimulation although this finding awaits replication.
In summary, VT tasks can be regarded as among the most frequently researched latency-based
measures of DSI. There have been numerous reports from different labs demonstrating that VT
measures are satisfactorily reliable and valid indicators of DSI in forensic contexts. The VT effect has
been regarded as so robust that there are commercially distributed VT paradigms (e.g., Abel, 1995).
However, from a scientific perspective data based on Abel’s VT routines have to be treated with
some caution as crucial methodological details have not been published (for a critical overview see
Sachsenmaier & Gress, 2009). Furthermore, VT tasks have been shown to be robust against denial by
offenders who were uninformed about successful faking strategies (Babchishin et al., 2012).
On the other hand, the predominantly task-driven nature of VT effects is a potential threat to
the diagnostic validity of VT paradigms. The task-dependency also cautions against the interpretation
of VT effects as caused by automatic processes elicited by sexually attractive stimuli and outside of
conscious control (i.e., attentional adhesion; Imhoff et al., 2010). Thus, participants’ compliance in
completing the secondary rating task from their own self-relevant perspective is of crucial
importance. VT measures result in good differentiations of sexually deviant from non-deviant
samples only as long as participants comply with the instructions to rate how subjectively sexually
attractive targets are. However, when participants (naïvely or knowingly) complete the task from a
perspective other than their own (Imhoff et al., 2012) or with a completely different task (e.g., rating
age of the target; Petruschke, Imhoff, Banse, & Weber, in preparation), latency patterns in standard
VT paradigms will most likely be nondiagnostic.
Implicit Association Tests. The Implicit Association Test (IAT) introduced by Greenwald et al.
(1998) is another prominently researched latency-based indirect measure. In forensic contexts, the
prototypical Children/Adults DSP IAT is based on two double discrimination tasks – the so-called
critical blocks – assessing associative strengths between target categories (e.g., Children vs. Adults)
and attribute categories (e.g., Sexually exciting vs. Sexually unexciting), both arranged on bipolar
dimensions (for a detailed description of the assessment procedure and underlying processes see
Perugini et al., this volume). Classical IATs are usually scored by calculating the difference between
the mean response latencies of compatible and incompatible critical blocks, divided by the pooled
standard deviation of the response latencies (Greenwald, Nosek, & Banaji, 2003). Given that this
calculation depends on a standardized difference index, DSP IATs are inherently effect size measures
(analogous to Cohen’s d) of relative DSP (as opposed to absolute DSI measures such as raw VTs).
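The core of this scoring rule can be sketched in a few lines. This is a simplified illustration, not the full scoring algorithm: it assumes that the standard deviation is computed across all trials of both critical blocks and it omits steps such as latency trimming and error penalties. The function name iat_d and the latencies are hypothetical.

```python
from statistics import mean, stdev


def iat_d(compatible, incompatible):
    """Standardized latency difference between the two critical blocks.

    Positive values mean slower responses in the incompatible block.
    """
    sd_all_trials = stdev(list(compatible) + list(incompatible))
    return (mean(incompatible) - mean(compatible)) / sd_all_trials


# Hypothetical latencies in ms for the two critical blocks.
compatible_block = [600, 700, 800]
incompatible_block = [900, 1000, 1100]
print(iat_d(compatible_block, incompatible_block))
```

Because the difference is divided by a standard deviation, the index does not change when all latencies are multiplied by a constant, which is what makes it an effect size measure rather than a raw latency difference.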
There have been several independent reports of DSP IAT effects in forensic populations (for an
overview see Table 1). DSP IATs differentiated between child sexual offenders and non-offending
controls (Mihailides et al., 2004; Nunes et al., 2007; ds = 0.71 and 0.92), mixed community and
offender controls (Banse et al., 2010; ds from 0.43 to 0.82), as well as varying offender control groups
such as non-sexual offenders (Brown et al., 2009; Mihailides et al., 2004; ds = 0.92 and 0.95,
respectively) or adult sexual offenders (Gray et al., 2005; d = 0.84). Furthermore, DSP IATs
distinguished between child sexual offenders who victimized either only boys or boys and girls vs.
only girls (Schmidt et al., in press; ds from 0.64 to 0.72) and child sexual offenders whose victims
were under twelve years of age vs. twelve years and older (Brown et al., 2009; d = 0.77). Van
Leeuwen et al. (2012) reported strong DSP IAT differences between self-identified community
pedophiles and non-offending controls (d = 1.74). DSP IATs were not confounded by general
classification speed abilities (Schmidt et al., in press). In a meta-analysis, Babchishin et al. (in press)
reported a mean DSP IAT effect of d = 0.63 between child molesters and non-molesters. As expected,
group differences were largest for comparisons of child sexual offenders to non-offenders (d = 0.96),
followed by comparisons to non-sexual offenders (d = 0.58) and to rapists (d = 0.48). Notably,
treatment participation was a significant moderator of IAT effects: DSP IATs showed larger effects for
child sexual offenders who had not undergone treatment than for treated child sexual abusers when
compared with control groups. These findings corroborate either treatment effects on indirectly
assessed DSP or confounds associated with child sexual offender treatment (e.g., group selection
effects on behalf of suspected lower DSI levels).
Retest reliability was tested only once for DSP IATs (rtt = .63; Brown et al., 2009). Reports of
internal consistency (Cronbach’s α) for DSP IATs comparing sexual interest in children vs. adults
ranged between .72 and .83 (Table 1), thereby corroborating the satisfactory reliability of these
measures. However, the only two studies using sex-specific DSP IATs (Banse et al., 2010; Schmidt et
al., in press) reported lower alphas for Boys/Men (.61 and .65, respectively) in comparison to
Girls/Women IATs (.79 and .82). This difference might be attributed to variance restrictions in typical
forensic samples: Homosexual orientation is usually overrepresented in randomly selected child
sexual offender samples. This results in less clearly differentiated DSP patterns for Boys/Men IATs
between child sexual offender and control groups: Homosexually oriented child sexual abusers show
less DSP as they are likely to be interested in both boys and men, whereas heterosexual controls are
interested in neither boys nor men and thus show less DSP as well (Banse et al., 2010; Schmidt et al.,
in press). Whether the underlying rationale of sex-specific DSP IATs to disentangle sexual orientation
from sexual maturity preferences is a viable option to increase criterion validity is an open empirical
question as sex-differentiated IATs produced smaller DSP effects than generic Children/Adults IATs
(Babchishin et al., in press).
Convergent validity with other DSP measures was shown meta-analytically (Babchishin et al., in
press): DSP IATs converged with moderate effect sizes with self-report, VT, and offence-behavioral
measures of DSI (Screening Scale for Pedophilic Interests) as well as with actuarial estimates of
recidivism risk (r = .27). No convincing DSP IAT associations with corresponding PPG indexes have
been reported so far.
Although the IAT has been considered resistant to deliberate faking attempts, it has repeatedly
been shown to be fakeable when respondents are informed about successful faking strategies (e.g.,
slowing of latencies in consistent blocks), are experienced with the paradigm, and/or are strongly
motivated to fake results (Teige-Mocigemba et al., 2010). Nevertheless, there is preliminary evidence
(Brown et al., 2009) that DSP IATs can distinguish between denying child sexual offenders and non-sexual
offending control groups (d = 1.01) but not between denying and admitting child sexual
offenders (d = .27; non-significant). On the other hand, Babchishin et al. (2012) failed to show any
differentiation between either deniers or admitters vs. non-sexual offender controls, respectively.
In summary, in addition to VT tasks, IATs have emerged as a second robust paradigm to
indirectly assess DSI/DSP. Multiple studies from independent labs as well as a first meta-analysis
(Babchishin et al., in press) demonstrated that IATs are reliable and valid indicators of DSP for
children over adults in forensic contexts. Still, some issues need further research. First, given that the
interference effects that drive DSP IATs are based on the associative strength between concepts such
as Children and Sexual excitement, where these associations originate from is an interesting question
(Snowden et al., 2011). Is an association of children and sex a valid indicator of genuine DSP or an
indicator of childhood experiences of sexual abuse – a condition quite prevalent among child sexual
abusers (Seto, 2008)? Second, IATs have repeatedly been proven fakeable. Attempts to develop
methodological strategies to detect and even correct deliberate manipulations of IAT results in
forensic populations (Cvencek, Greenwald, Brown, Gray, & Snowden, 2010) need to be viewed with
some caution. Statistics on the discriminatory power of detection algorithms are based on group
means, which may have limited validity in single case diagnostics (i.e., relative group differences used
to classify dissimulation are not available in single case assessments and are based on sample
characteristics that might not be relevant to the actual case in question). Also, as these statistics are
based on comparisons with response-latencies in consistent blocks of an uncritical baseline IAT (e.g.,
Gender/Self IAT; Cvencek et al., 2010), respondents who know this could easily start to fake the
baseline IAT as well (by slowing latencies in the consistent block of the baseline IAT). It seems
somewhat of a paradox to derive faking-resistant countermeasures from a measure that is fakeable
in itself.
Implicit Relational Assessment Procedure. A fairly recent, task-relevant indirect measure is
the Implicit Relational Assessment Procedure (IRAP) introduced by Dawson et al. (2009). In the IRAP,
target categories (Children vs. Adults) and target words representing sexual vs. non-sexual attributes
are presented in either consistent (in accordance with the respondent’s individual associations) or
inconsistent (at odds with the respondent’s associations) pairings. During the task, participants are
forced to categorize the presented pairings as either true or false according to predetermined
contingencies: During one type of block (consistent for non-deviant individuals), participants are
required to categorize adults as sexual (e.g., Adult – Sexual – True; Adult – Nonsexual – False) and
children as nonsexual (e.g., Child – Nonsexual – True; Child – Sexual – False) as opposed to
(inconsistent) blocks during which the feedback contingencies are reversed and participants are
required to categorize adults as nonsexual (e.g., Adult – Sexual – False; Adult – Nonsexual – True) and
children as sexual (e.g., Child – Sexual – True; Child – Nonsexual – False). The rationale of the IRAP is
that it takes less time to respond positively to pairings that are consistent with beliefs than to
pairings inconsistent with beliefs, because during consistent trials answer keys and initial responses
are matched, whereas in inconsistent trials the initial response has to be inhibited and overcome
with an alternative response that is incompatible with automatic individual associations. Similar to
the IAT, the IRAP DSP index is calculated as a d-measure from the difference of response latencies in
consistent vs. inconsistent blocks (Dawson et al., 2009). Dawson et al. (2009) were able to
differentiate between child sexual abusers and non-offender controls (d = .91) and the IRAP DSP
index was unrelated to years of education in their sample. No further psychometric properties were
reported. Hence, the IRAP has to be regarded as among the least researched indirect measures of
DSI/DSP with only preliminary findings concerning its validity.
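The block-wise response contingencies described above can be written down compactly. The sketch below encodes only the mapping stated in the text; the function name required_response and the string labels are hypothetical and do not reflect the original task software.

```python
# Pairings that must be answered "True" in the block that is consistent
# for non-deviant respondents: adults-sexual and children-nonsexual.
CONSISTENT_TRUE_PAIRINGS = {("adult", "sexual"), ("child", "nonsexual")}


def required_response(block, target, attribute):
    """Return True/False as demanded by the contingency of the given block."""
    is_true_in_consistent = (target, attribute) in CONSISTENT_TRUE_PAIRINGS
    if block == "consistent":
        return is_true_in_consistent
    if block == "inconsistent":
        return not is_true_in_consistent  # contingencies are reversed
    raise ValueError(f"unknown block type: {block!r}")


for block in ("consistent", "inconsistent"):
    for target in ("adult", "child"):
        for attribute in ("sexual", "nonsexual"):
            print(block, target, attribute,
                  required_response(block, target, attribute))
```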
Eye Movement Tracking Task. Fromberger et al. (in press) recently demonstrated the
potential of assessing eye movements as another indirect measure of DSP. In a paired comparison
task, pictures of girls vs. women and boys vs. men had to be classified according to which of the
stimuli was more sexually attractive. Initial fixation latencies as well as relative fixation times
aggregated into DSP indexes differentiated between pedophilic child sexual abusers and non-pedophilic (healthy and forensic non-child sexual offending) controls (ds = 1.84 and 1.34,
respectively). Initial fixation latencies showed good diagnostic accuracy in terms of sensitivity and
specificity. This preliminary finding holds promise for forensic purposes because initial fixation
latencies are deemed quite robust against faking attempts as they are regarded as an indicator of
automatic bottom-up information processing. However, mean initial fixation latencies in the
Fromberger et al. (in press) study were roughly one second, which cannot be regarded as indicating
initial automatic information processing. Clearly, more research is needed on the reliability of these
eye movement measures as well as their potential to distinguish deniers and non-deniers from
relevant control groups.
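Diagnostic accuracy in terms of sensitivity and specificity can be illustrated with a short Python sketch. It classifies respondents by comparing an index value to a cutoff and counts the correct classifications in each group. The cutoff, the scores, and the function name are hypothetical and are not taken from Fromberger et al. (in press).

```python
def sensitivity_specificity(scores_target, scores_control, cutoff):
    """Return (sensitivity, specificity) for a 'score >= cutoff' rule.

    scores_target: index values of the group that should be detected
    scores_control: index values of the comparison group
    """
    true_pos = sum(s >= cutoff for s in scores_target)
    true_neg = sum(s < cutoff for s in scores_control)
    return true_pos / len(scores_target), true_neg / len(scores_control)

# Hypothetical index values (illustrative only)
target = [0.9, 0.4, 0.7, 0.1, 0.8]
control = [-0.3, 0.2, -0.6, 0.5, -0.1]
sens, spec = sensitivity_specificity(target, control, cutoff=0.3)
print(sens, spec)
```

Moving the cutoff trades sensitivity against specificity, which is why both values should be reported together.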
Task-irrelevant indirect measures of deviant sexual interests
Emotional Stroop Tasks. The classical Stroop interference paradigm (Stroop, 1935) was
among the first to be adapted as an indirect measure of DSP in forensic populations (for an overview
see Price, Beech, Mitchell, & Humphreys, in press). Initially, Emotional Stroop variants in which
participants had to classify the print colour of sexual vs. neutral word stimuli were utilized. Sexual
words are hypothesized to produce longer response latencies compared to neutral words due to
increased emotional salience that interferes with the colour classification task. In an initial study,
Smith and Waterman (2004) found no such effect between child sexual offenders and rapists for
sexual words representing child molesting and rape themes (only five offenders in each group). On
the other hand, Price and Hanson (2007) were able to discriminate child sexual offenders from non-offending controls (d = 0.82) utilizing the same stimulus words as Smith and Waterman (2004) but
failed to show any effects with an alternative, more offence-specific stimulus set. Another Emotional
Stroop variant utilizing differently coloured pictorial stimuli of children vs. adults did not distinguish
between child sexual offenders and non-offending control groups (Ó Ciardha & Gormley, 2012). Van
Leeuwen et al. (2012) introduced a Picture-Word Stroop variant during which words superimposed
on pictures of children vs. adults had to be classified as either sexual or neutral. Notably, contrary to
the classic Emotional Stroop variants, sexually relevant pictures in this Picture-Word Stroop were
presumed to facilitate classifications of sexual words. Self-identified pedophiles’ response latencies
were shown to differ from non-offending controls’ on this DSP index in the expected directions (d =
1.41). As van Leeuwen et al. (2012) provide evidence for a facilitatory (as opposed to the traditional
inhibitory) effect of sexually relevant images, it remains unclear whether the heterogeneity of these
effects (Price et al., in press) is due to methodological factors (e.g., differing stimulus sets and
Stroop variants) or to sample characteristics (e.g., intra- vs. extrafamilial child sexual abusers). In
summary, there is inconclusive evidence for DSP Stroop paradigms as valid indicators of individual
differences in forensic contexts.
Attentional Blink. Based on the attentional blink phenomenon (Raymond, Shapiro, & Arnell,
1992), Beech et al. (2008) introduced the Rapid Serial Visual Presentation Task (RSVP). This paradigm
capitalizes on the fact that if the first target presented is sexually relevant, it interferes with the
perception of the second target when the targets are presented in rapid succession. Beech and his
colleagues showed that intra- and extrafamilial child molesters in contrast to non-sexual offender
controls made more errors in reporting a second target when they were presented target pictures of
children vs. pictures of animals (ds = 1.00 and 1.28, respectively) in an RSVP task. However, Crooks et
al. (2009) did not replicate these findings in a sample of adolescent child molesters, leaving open the
question of whether the findings might be explained by sample differences (i.e., adolescent child
sexual offenders are deemed to exhibit lower DSI levels than adult child molesters) or by a lack of
task validity.
Choice Reaction Time Task. In the prototypical Choice Reaction Time Task (Wright & Adams,
1994), individuals have to detect target probes (e.g., dots) that are superimposed on either sexually
relevant or irrelevant pictorial stimuli (e.g., pictures of adults vs. children). It has been shown twice
that a DSP index of mean response latencies for infants vs. adults discriminated between child sexual
offenders and non-sex offending controls (Mokros et al., 2010; Pöppl et al., 2011; ds = 1.41 and 0.99,
respectively) without further psychometric properties being reported.
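The group differences reported throughout this section are standardized mean differences (Cohen's d). A minimal Python sketch of the pooled-standard-deviation version is given below; the individual studies may have used other variants, and the data are hypothetical.

```python
from math import sqrt
from statistics import mean, variance

def cohens_d(group_a, group_b):
    """Cohen's d with pooled standard deviation (group_a minus group_b)."""
    n_a, n_b = len(group_a), len(group_b)
    pooled_var = ((n_a - 1) * variance(group_a) +
                  (n_b - 1) * variance(group_b)) / (n_a + n_b - 2)
    return (mean(group_a) - mean(group_b)) / sqrt(pooled_var)

# Hypothetical DSP index values for two groups (illustrative only)
offenders = [0.6, 0.9, 0.4, 0.7]
controls = [0.1, 0.3, -0.1, 0.1]
print(round(cohens_d(offenders, controls), 2))
```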
In summary, attention-based/task-irrelevant paradigms have been quite successfully used in
clinical populations where avoidance of threat or negative valence is claimed to be the source of the
attentional bias (Cisler, Bacon, & Williams, 2009). Yet, corresponding DSI/DSP tasks lack a consistent
pattern of effects congruent with the supposed rationale of selective attention. This might result
from the fact that in the case of DSI/DSP positive valence and approach behavior associated with
sexual interest might facilitate rather than divert attention allocation. Additionally, most attention-based
paradigms cannot disentangle initial attentional capture and subsequent difficulties in
disengagement from the relevant stimuli (Fox, Derakshan, & Standage, 2011). This leads to
theoretical problems in predetermining the directedness of the hypothesized attention biases in
research with sexually relevant stimuli (Prause, Janssen, & Hetrick, 2008), leaving it unclear whether
potential group differences represent sexual interest or other sources of attention biases (e.g.,
phobic avoidance). More elaborate theoretical frameworks that clarify the relationship between
attention biases and DSI are needed. Importantly, data on reliability – a common problem with
attention-based measures of individual differences (Cisler et al., 2009) – are generally missing for
task-irrelevant measures. Hence, it is fair to conclude that task-irrelevant approaches are currently
the least developed indirect DSI measures.
What Goes Up Must Come Down – Implicit Assumptions about Implicit Measures
New developments often foster excessive and to some degree illusory expectations. This is
certainly true for indirect/implicit measures (Perugini & Banse, 2007). It is thus necessary to
thoroughly examine the empirical foundations of the implicit assumptions about implicit measures
(for an overview see Gawronski, 2009). Probably the most common beliefs about implicit measures
are that they assess subconscious associations not accessible through introspection, and, relatedly,
that they are therefore not fakeable. Likewise, it is often claimed that implicit measures circumvent
problems of social desirability because respondents are supposedly not able to adjust their
responses on indirect measures. But lack of introspective access does not necessarily imply that the
associations are subconscious (De Houwer, 2006). In fact, empirical results point to quite the
contrary (e.g., Gawronski, Hofmann, & Wilbur, 2006). Furthermore, although indirect measures are
obviously not as easy to fake as self-report measures, both VTs and IATs are fakeable under specific
boundary conditions as laid out in the sections above (see http://www.innocentdads.org/abel.htm
for detailed instruction on how to fake VT DSI measures; Cvencek et al., 2010). Ultimately, it is likely
that no scientific psychological measure will ever be completely immune to faking attempts,
although resistance across measures will vary along a continuum and indirect measures are obviously
situated on the more resistant end. Paradigms that tap into early levels of bottom-up/stimulus-driven processes such as startle probe reflex (Hecker, King, & Scoular, 2009) or anti-saccade tasks
(Fox et al., 2011) seem to be promising future options as they might be even more resilient against
faking attempts and deliberative top-down regulation.
Another interesting conundrum concerns the idea of the “true value” or the “true self”. All the
issues concerning social desirability and fakeability imply that indirect/implicit measures are able to
tap into individuals’ genuine attitudes, opinions, or sexual interests that the respondent in forensic
contexts is motivated to conceal from self-reports. However, this is dependent on what one would
psychologically regard as the “true self”. On the one hand, it might be assumed that the “true self” is
revealed under circumstances of failing deliberate control (e.g., disinhibition from alcohol
consumption giving rise to the true self). On the other hand, it might be hypothesized that the “true
self” can be inferred from what a person deliberately does and explicitly chooses in a controlled
mode (Gawronski, 2009). Theoretically, from a dual-systems perspective it can be claimed that
indirect measures should predict spontaneous behaviors whereas explicit measures should be
related to deliberate behaviors. However, there is a whole set of situational and personal moderators
of these relationships (Friese et al., 2008; Perugini et al., 2010) underscoring that especially in applied
forensic diagnostics one has to be cautious not to draw diagnostic inferences from single direct or
indirect measures. All these sources of measurement error conflate criterion/predictive validity and
thus it is questionable to interpret any single measure as an absolute index of a specific psychological
attribute. Diagnostic conclusions are much safer if they are derived from multiple valid, convergent,
and conceptually different measures that tap into unique parts of criterion variance. Corroborating
this, combining direct and indirect measures into test batteries has been shown to be incrementally
valid over and above single (direct and indirect) DSI indicators in child sexual abuser populations
(Babchishin et al., 2012; Banse et al., 2010). Therefore, in forensic contexts the pressing question
remains when and under what boundary conditions implicit/indirect measures are incrementally
valid predictors of specific behaviors above and beyond explicit measures.
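Incremental validity can be expressed as the gain in explained criterion variance when an indirect measure is added to a direct one. The following Python sketch computes this gain for two predictors from their intercorrelations, using the standard formula for the squared multiple correlation with two predictors. Variable names and data are hypothetical, and the cited studies may have used other models (e.g., logistic regression).

```python
from statistics import mean

def pearson_r(x, y):
    """Pearson correlation of two equally long score lists."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    ssx = sum((a - mx) ** 2 for a in x)
    ssy = sum((b - my) ** 2 for b in y)
    return cov / (ssx * ssy) ** 0.5

def incremental_r2(criterion, direct, indirect):
    """R-squared gain from adding `indirect` to a model with `direct` only.

    Assumption: the two predictors are not perfectly correlated.
    """
    r_yd = pearson_r(criterion, direct)
    r_yi = pearson_r(criterion, indirect)
    r_di = pearson_r(direct, indirect)
    r2_both = (r_yd**2 + r_yi**2 - 2 * r_yd * r_yi * r_di) / (1 - r_di**2)
    return r2_both - r_yd**2

# Hypothetical scores: the criterion depends on both predictors
direct = [1, -1, 1, -1]
indirect = [1, 1, -1, -1]
criterion = [d + i for d, i in zip(direct, indirect)]
print(round(incremental_r2(criterion, direct, indirect), 2))
```

A gain near zero would indicate that the indirect measure explains no criterion variance beyond what the direct measure already covers.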
What Should We Aim for? A Research Outlook
Important steps on the way to developing DSI measurement tasks have been achieved
(Thornton & Laws, 2009b). The current state of research suggests that VTs and IATs are the most
promising and best validated indirect measures of DSI/DSP. Several replications have been
successfully conducted with independent samples and by different research labs. Preliminary work
on the effects of various moderators on task performance (e.g., general reaction speed; Schmidt et
al., in press; denial; Babchishin et al., 2012; Brown et al., 2009) has been undertaken.
Methodological aims. In terms of methodology, future research should aim to increase the
reliability of indirect DSI/DSP measures and routinely report relevant coefficients that are based on
corresponding test subsets (e.g., split-half coefficients, Cronbach’s α). An effective strategy to
optimize reliability is to use a sufficient number of trials. Additionally, in order to maximize
differences between individuals (as opposed to experimental conditions), a fixed random stimulus
order identical for all participants should be used because a fully randomized stimulus order adds
unnecessary portions of random error to the measure.
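The reliability coefficients named above can be computed from trial- or item-level data. The following Python sketch shows Cronbach's α and an odd-even split-half coefficient with Spearman-Brown correction; it also shows one way to generate a fixed stimulus order from a seed so that every participant receives the same sequence. Function names, the seed, and the data layout are hypothetical.

```python
import random
from statistics import mean, variance

def cronbach_alpha(items):
    """items: list of k lists, each holding one item's scores across persons."""
    k = len(items)
    totals = [sum(person) for person in zip(*items)]
    item_var = sum(variance(i) for i in items)
    return k / (k - 1) * (1 - item_var / variance(totals))

def pearson_r(x, y):
    """Pearson correlation of two equally long score lists."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    ssx = sum((a - mx) ** 2 for a in x)
    ssy = sum((b - my) ** 2 for b in y)
    return cov / (ssx * ssy) ** 0.5

def split_half(items):
    """Odd-even split-half reliability with Spearman-Brown correction."""
    odd = [sum(p) for p in zip(*items[0::2])]
    even = [sum(p) for p in zip(*items[1::2])]
    r = pearson_r(odd, even)
    return 2 * r / (1 + r)

def fixed_order(stimuli, seed=2015):
    """Shuffle once with a fixed seed: the same order for all participants."""
    order = list(stimuli)
    random.Random(seed).shuffle(order)
    return order
```

Calling `fixed_order` with the same seed in every session yields an order that looks random to respondents but does not vary between them.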
More research focusing on convergent and discriminant validity with other established
measures of DSI is needed. Each single validation criterion is plagued with its own set of problems.
Self-reported DSI is regarded as amenable to various impression management influences. Sexual
delinquent behavior such as child molesting represents a criminological/judicial category rather than
a specific indication of a psychological construct such as DSI/DSP (empirically only up to 50% of child
sexual offenders exhibit pedophilic DSP; Seto, 2008). Clinical pedophilia diagnoses suffer from low
reliability and/or validity as these are usually based on inferences from offence behavior rendering
them tautological. Thus, future research should preferably focus on behavioral measures based on
PPG or sexual behavior on the internet. Assessing these additional behaviors might be the only way
to solve the paradox of not having a directly accessible validation criterion for DSI.
Standardization is another important aim. Phallometric assessments have been extensively
criticized for a lack of standardization of stimuli, assessment procedures, and scoring (e.g., Kalmus &
Beech, 2005). This critique has been routinely named as one of the main reasons for the
development of indirect DSI measures. However, at present indirect measures of DSI are likewise far
from being standardized, in terms of either stimuli or scoring algorithms. Such idiosyncrasies
constitute barriers against comparing results and undermine knowledge accumulation. An easily
accomplished way of standardizing latency-based measures might be to use an analog to the d-measure of the IAT (Greenwald et al., 2003) based on the standardized mean differences of the
relevant sexual interest categories as most measures rely on a DSP difference index comparing
responses to child vs. adult stimuli.
Theoretical aims. Future research needs to address questions pertaining to why and how
indirect DSI/DSP measures differentiate between offender subgroups, as well as the boundary
conditions affecting their validity. Therefore, it is highly advisable to control for factors such as intra- vs. extrafamilial child sexual offending, antisociality/psychopathy, pre- vs. postpubescent victims,
victim sex, sexual orientation, and/or risk levels to disentangle the influence of sample characteristics
from methodological issues. Apart from these issues, there is a need for theoretical clarification of
exactly what indirect measures assess and how implicit/indirect sexual interest indicators relate to
sexual behavior when behavior contradicts explicit sexual interests. The exact relationship between
latency-based DSI measures and actual (for example, physiologically assessed) sexual arousal is as
yet unclear (Ó Ciardha, 2011). As latency-based sexual interest indications cannot be regarded as the
same as physiological sexual arousal, the relation between these two observational levels should be
clarified for each latency-based measure and for indirect measures as a whole.
Clinical aims. Ultimately, a desirable goal would be not only to methodologically improve and
theoretically better understand measures of DSI/DSP, but to make them more accessible to clinical
usage. As one example of an approach towards applied usability, Banse et al. (2010) have created the
Explicit and Implicit Sexual Interest Profile (EISIP) – a user-friendly test battery that produces profile
outputs that can be interpreted by clinicians outside of research laboratory settings immediately
after the assessment. Additionally, future work might focus on the development of norms for
relevant offender and non-offender populations. Finally, data on predictive validity, the most
relevant piece of the puzzle for applied purposes, are still missing. Nevertheless, as phallometrically
assessed DSI has been shown to be among the best predictors of sexual reoffending (Mann et al., 2010),
high hopes might be placed in VTs and IATs – the most valid DSI/DSP measures – as less costly and
laborious adjuncts to PPG assessments. Preliminary cross-sectional reports of correlations with
actuarial risk assessment instruments (Schmidt et al., in press) and convergence of VT with PPG
measures (Letourneau, 2002; Stinson & Becker, 2008) corroborate that this long-term research effort
is worth pursuing.
References
Abel, G. G. (1995). The Abel Assessment for Sexual Interest–2 (AASI–2). Atlanta, GA: Abel Screening
Inc.
Abel, G. G., Huffman, J., Warberg, B., & Holland, C. L. (1998). Visual reaction time and
plethysmography as measures of sexual interest in child molesters. Sexual Abuse: A Journal of
Research and Treatment, 10, 81–95. doi:10.1177/107906329801000202
Abel, G. G., Jordan, A., Rouleau, J. L., Emerick, R., Barboza-Whitehead, S., & Osborn, C. (2004). Use of
visual reaction time to assess male adolescents who molest children. Sexual Abuse: A Journal
of Research and Treatment, 16, 255–265. doi:10.1177/107906320401600306
Babchishin, K. M., Nunes, K. L., & Hermann, C. (in press). The validity of Implicit Association Test (IAT)
measures of sexual attraction to Children: A meta-analysis. Archives of Sexual Behavior.
doi:10.1007/s10508-012-0022-8
Babchishin, K. M., Nunes, K. L., & Kessous, N. (2012). A multimodal examination of sexual interest in
children. Manuscript submitted for publication.
Banse, R., Schmidt, A. F., & Clarbour, J. (2010). Indirect measures of sexual interest in child sex
offenders: A multi-method approach. Criminal Justice and Behavior, 37, 319–335.
doi:10.1177/0093854809357598
Beech, A. R., Kalmus, E., Tipper, S. P., Baudouin, J. Y., Flak, V., & Humphreys, G. W. (2008). Children
induce an enhanced attentional blink in child molesters. Psychological Assessment, 20, 397-402. doi:10.1037/a0013587
Brown, A. S., Gray, N. S., & Snowden, R. J. (2009). Implicit measurement of sexual associations in
child sex abusers: Role of victim type and denial. Sexual Abuse: A Journal of Research and
Treatment, 21, 166-180. doi:10.1177/1079063209332234
Cisler, J. M., Bacon, A. K., & Williams, N. L. (2009). Phenomenological characteristics of attentional
biases towards threat: A critical review. Cognitive Therapy and Research, 33, 221-234.
doi:10.1007/s10608-007-9161-y
Crooks, V. L., Rostill-Brookes, H., Beech, A. R., & Bickley, J. A. (2009). Applying Rapid Serial Visual
Presentation to adolescent sexual offenders: Attentional bias as a measure of deviant sexual
interest? Sexual Abuse: A Journal of Research and Treatment, 21, 135–148.
doi:10.1177/1079063208328677
Cvencek, D., Greenwald, A. G., Brown, A. S., Gray, N. S., & Snowden, R. J. (2010). Faking of the Implicit
Association Test is statistically detectable and partly correctable. Basic and Applied Social
Psychology, 32, 302–314. doi:10.1080/01973533.2010.519236
Dawson, D. L., Barnes-Holmes, D., Gresswell, D. M., Hart, A. J. P., & Gore, N. J. (2009). Assessing the
implicit beliefs of sexual offenders using the Implicit Relational Assessment Procedure: A first
study. Sexual Abuse: A Journal of Research and Treatment, 21, 57–75.
doi:10.1177/1079063208326928
De Houwer, J. (2006). What are implicit measures and why are we using them. In R. W. Wiers & A. W.
Stacy (Eds.), The handbook of implicit cognition and addiction (pp. 11-28). Thousand Oaks, CA:
Sage Publishers.
Friese, M., Hofmann, W., & Schmitt, M. (2008). When and why do implicit measures predict
behaviour?: Empirical evidence for the moderating role of opportunity, motivation, and
process reliance. European Review of Social Psychology, 19, 285-338.
doi:10.1080/10463280802556958
Fox, E., Derakshan, N., & Standage, H. (2011). The assessment of human attention. In K. C. Klauer, A.
Voss, & C. Stahl (Eds.), Cognitive methods in social psychology (pp. 15-47). New York, NY:
Wiley.
Fromberger, P., Jordan, K., Steinkrauss, H., von Herder, J., Witzel, J., Stolpmann, G., Kröner-Herwig,
B., & Müller, J. L. (in press). Diagnostic accuracy of eye movements in assessing pedophilia.
Journal of Sexual Medicine. doi: 10.1111/j.1743-6109.2012.02754.x
Gawronski, B. (2009). Ten frequently asked questions about implicit measures and their frequently
supposed, but not entirely correct answers. Canadian Psychology, 50, 141-150.
doi:10.1037/a0013848
Gawronski, B., Hofmann, W., & Wilbur, C. J. (2006). Are “implicit” attitudes unconscious?
Consciousness and Cognition, 15, 485–499. doi:10.1016/j.concog.2005.11.007
Geer, J. H., & Bellard, H. S. (1996). Sexual content induced delays in unprimed lexical decisions:
Gender and context effects. Archives of Sexual Behavior, 25, 91–107. doi:10.1007/BF02437581
Glasgow, D. V. (2009). Affinity: The development of a self-report assessment of paedophile sexual
interest incorporating a viewing time validity measure. In D. Thornton & D. R. Laws (Eds.),
Cognitive approaches to the assessment of sexual interest in sexual offenders (pp. 59-84).
Chichester, UK: Wiley-Blackwell. doi:10.1002/9780470747551.ch3
Gray, N. S., Brown, A. S., MacCulloch, M. J., Smith, J., & Snowden, R. J. (2005). An implicit test of the
associations between children and sex in pedophiles. Journal of Abnormal Psychology, 114,
304-308. doi:10.1037/0021-843X.114.2.304
Gray, S. R., & Plaud, J. J. (2005). A comparison of the Abel Assessment for Sexual Interest and penile
plethysmography in an outpatient sample of sexual offenders. Journal of Sexual Offender
Commitment: Science and the Law, 1, 1-10. Retrieved from http://www.soccjournal.org/
Grieger, L., Hosser, D., & Schmidt, A. F. (2012). Predictive validity of self-reported self-control for
different forms of recidivism. Journal of Criminal Psychology, 2, 80-95.
doi:10.1108/20093821211264405
Greenwald, A. G., McGhee, D. E., & Schwartz, J. L. K. (1998). Measuring individual differences in
implicit cognition: The Implicit Association Test. Journal of Personality and Social Psychology,
74, 1464-1480. doi:10.1037//0022-3514.74.6.1464
Greenwald, A. G., Nosek, B. A., & Banaji, M. R. (2003). Understanding and using the Implicit
Association Test: I. An improved scoring algorithm. Journal of Personality and Social
Psychology, 85, 197-216. doi:10.1037/0022-3514.85.2.197
Gress, C. L. Z. (2005). Viewing time measures and sexual interest: Another piece of the puzzle. Journal
of Sexual Aggression, 11, 117–125. doi:10.1080/13552600500063666
Gress, C. L. Z., & Laws, R. D. (2009). Measuring sexual deviance: Attention-based measures. In A. R.
Beech, L. A. Craig, & K. D. Browne (Eds.), Assessment and treatment of sex offenders: A
Handbook (pp. 109-128). New York, NY: Wiley-Blackwell.
Harris, G. T., Rice, M. E., Quinsey, V. L., & Chaplin, T. C. (1996). Viewing time as a measure of sexual
interest among child molesters and normal heterosexual men. Behaviour Research and
Therapy, 34, 389-394. doi:10.1016/0005-7967(95)00070-4
Hecker, J. E., King, M. W., & Scoular, R. J. (2009). The startle probe reflex: An alternative approach to
the measurement of sexual interest. In D. Thornton & D. R. Laws (Eds.), Cognitive approaches
to the assessment of sexual interest in sexual offenders (pp. 59-84). Chichester, UK: Wiley-Blackwell. doi:10.1002/9780470747551.ch10
Helmus, L., Thornton, D., Hanson, R. K., & Babchishin, K. M. (2012). Improving the predictive accuracy
of Static-99 and Static-2002 with older sex offenders: Revised age weights. Sexual Abuse: A
Journal of Research and Treatment, 24, 64-101. doi: 10.1177/1079063211409951
Imhoff, R., Schmidt, A. F., Nordsiek, U., Luzar, C., Young, A. W., & Banse, R. (2010). Viewing time
effects revisited: Prolonged response latencies for sexually attractive targets under restricted
task conditions. Archives of Sexual Behavior, 39, 1275–1288. doi:10.1007/s10508-009-9595-2
Imhoff, R., Schmidt, A. F., Weiß, S., Young, A. W., & Banse, R. (2012). Vicarious Viewing Time:
Prolonged response latencies for sexually attractive targets as a function of task- or stimulus-specific processing. Archives of Sexual Behavior, 41, 1389-1401. doi: 10.1007/s10508-011-9879-1
Kalmus, E., & Beech, A. R. (2005). Forensic assessment of sexual interest: A review. Aggression and
Violent Behavior, 10, 193-218. doi:10.1016/j.avb.2003.12.002
Letourneau, E. J. (2002). A comparison of objective measures of sexual arousal and interest: Visual
reaction time and penile plethysmography. Sexual Abuse: A Journal of Research and
Treatment, 14, 207–223. doi:10.1177/107906320201400302
Mann, R. E., Hanson, K. R., & Thornton, D. (2010). Assessing risk for sexual recidivism: Some
proposals on the nature of psychologically meaningful risk factors. Sexual Abuse: A Journal of
Research and Treatment, 22, 191-217. doi:10.1177/1079063210366039
Mihailides, S., Devilly, G. J., & Ward, T. (2004). Implicit cognitive distortions and sexual offending.
Sexual Abuse: A Journal of Research and Treatment, 16, 333-350.
doi:10.1177/107906320401600406
Mokros, A., Dombert, B., Osterheider, M., Zappalà, A., & Santtila, P. (2010). Assessment of pedophilic
sexual interest with an attentional choice reaction time task. Archives of Sexual Behavior, 39,
1081-1090. doi:10.1007/s10508-009-9530-6
National Research Council, Committee to Review the Scientific Evidence on the Polygraph. (2003).
The polygraph and lie detection. Washington, DC: National Academy Press.
Nunes, K. L., Firestone, P., & Baldwin, M. W. (2007). Indirect assessment of cognitions of child sexual
abusers with the Implicit Association Test. Criminal Justice and Behavior, 34, 454-475.
doi:10.1177/0093854806291703
Ó Ciardha, C. (2011). A theoretical framework for understanding deviant sexual interest and cognitive
distortions as overlapping constructs contributing to sexual offending against children.
Aggression and Violent Behavior, 16, 493-502. doi:10.1016/j.avb.2011.05.001
Ó Ciardha, C., & Gormley, M. (2012). Using a pictorial-modified Stroop Task to explore the sexual
interests of sexual offenders against children. Sexual Abuse: A Journal of Research and
Treatment, 24, 175-197. doi:10.1177/1079063211407079
Price, S. A., Beech, A. R., Mitchell, I. J., & Humphreys, G. W. (in press). The promises and perils of the
emotional Stroop task: A general review and considerations for use with forensic samples.
Journal of Sexual Aggression, 17, 1-16. doi:10.1080/13552600.2010.545149
Price, S. A., & Hanson, R. K. (2007). A modified Stroop task with sexual offenders: Replication of a
study. Journal of Sexual Aggression, 13, 203–216. doi:10.1080/13552600701785505
Perugini, M. & Banse, R. (2007). Editorial: Personality, implicit self-concept and automaticity.
European Journal of Personality, 21, 257-261. doi:10.1002/per.637
Perugini, M., Richetin, J., & Zogmaister, C. (2010). Prediction of behavior. In B. Gawronski, & B. K.
Payne (Eds.), Handbook of social cognition – Measurement, theory, and applications (pp. 255-277). New York, NY: Guilford.
Petruschke, P., Imhoff, R., Banse, R., & Weber (in preparation). Bottom-up versus top-down
responding to sexually preferred stimuli: an fMRI study. Manuscript in preparation.
Pöppl, T. A., Nitschke, J., Dombert, B., Santtila, P., Greenlee, M. W., Osterheider, M., & Mokros, A.
(2011). Functional cortical and subcortical abnormalities in pedophilia: A combined study using
a choice reaction time task and fMRI. Journal of Sexual Medicine, 8, 1660-1674.
doi:10.1111/j.1743-6109.2011.02248.x
Prause, N., Janssen, E., & Hetrick, W. P. (2008). Attention and emotional responses to sexual stimuli
and their relationship to sexual desire. Archives of Sexual Behavior, 37, 934-949.
doi:10.1007/s10508-007-9236-6
Raymond, J. E., Shapiro, K. L., & Arnell, K. A. (1992). Temporary suppression of visual processing in an
RSVP task: An attentional blink? Journal of Experimental Psychology: Human Perception and
Performance, 18, 849–860. doi:10.1037//0096-1523.18.3.849
Rosenzweig, S. (1942). The photoscope as an objective device for evaluating sexual interest.
Psychosomatic Medicine, 4, 150–157.
Sachsenmaier, S. J., & Gress, C. L. Z. (2009). The Abel Assessment for Sexual Interests-2: A critical
review. In D. Thornton & D. R. Laws (Eds.), Cognitive approaches to the assessment of sexual
interest in sexual offenders (pp. 31–57). Chichester, UK: Wiley-Blackwell.
doi:10.1002/9780470747551.ch2
Schmidt, A. F., Gykiere, K., Vanhoeck, K., Mann, R. E., & Banse, R. (in press). Direct and indirect
measures of sexual maturity preferences differentiate subtypes of child sexual abusers. Sexual
Abuse: A Journal of Research and Treatment.
Seto, M. C. (2008). Pedophilia and sexual offending against children: Theory, assessment and
intervention. Washington, DC: APA. doi:10.1037/11639-000
Seto, M. C., Harris, G. T, Rice, M. E., & Barbaree, H. E. (2004). The Screening Scale for Pedophilic
Interests predicts recidivism among adult sex offenders with child victims. Archives of Sexual
Behavior, 33, 455-466. doi:10.1023/B:ASEB.0000037426.55935.9c
Seto, M. C., & Lalumière, M. L. (2001). A brief screening scale to identify pedophilic interests among
child molesters. Sexual Abuse: A Journal of Research and Treatment, 13, 15-25.
doi:10.1177/107906320101300103
Snowden, R. J., Craig, R. L., & Gray, N. S. (2011). Indirect behavioral measures of cognition among
sexual offenders. Journal of Sex Research, 48, 192-217. doi:10.1080/00224499.2011.557750
Snowden, R. J., & Gray, N. S. (2010). Implicit social cognition in forensic settings. In B. Gawronski, &
B. K. Payne (Eds.), Handbook of implicit social cognition – measurement, theory, and
applications (pp. 522-534).
Smith, P., & Waterman, M. (2004). Processing bias for sexual material: The Emotional Stroop and
sexual offenders. Sexual Abuse: A Journal of Research and Treatment, 16, 163–171.
doi:10.1177/107906320401600206
Stinson, J. D., & Becker, J. V. (2008). Assessing sexual deviance: A comparison of physiological,
historical, and self-report measures. Journal of Psychiatric Practice, 14, 379-388.
doi:10.1097/01.pra.0000341892.51124.85
Stroop, J. R. (1935). Studies of interference in serial verbal reactions. Journal of Experimental
Psychology, 18, 643-662. doi:10.1037/h0054651
Teige-Mocigemba, S., Klauer, K. C., & Sherman, J. W. (2010). A practical guide to Implicit Association
Tests and related tasks. In B. Gawronski, & B. K. Payne (Eds.), Handbook of social cognition –
Measurement, theory, and applications (pp. 117-139). New York, NY: Guilford.
Thornton, D., & Laws, D. R. (2009a). Cognitive approaches to the assessment of sexual interest in
sexual offenders. Chichester, UK: Wiley-Blackwell. doi:10.1002/9780470747551
Thornton, D., & Laws, D. R. (2009b). Postscript: Steps towards effective assessment of sexual interest.
In D. Thornton & D. R. Laws (Eds.), Cognitive approaches to the assessment of sexual interest in
sexual offenders (pp. 59-84). Chichester, UK: Wiley-Blackwell.
doi:10.1002/9780470747551.ch11
Van Leeuwen, M., van Baaren, R., Chakhssi, F., Loonen, M., Lippman, M., & Dijksterhuis, A. (2012).
Detecting implicit paedophilic preferences: Improving predictability. Manuscript submitted for
publication.
Verschuere, B., Ben-Shakar, G., & Meijer, E. (2011). Memory Detection: Theory and Application of the
Concealed Information Test. Cambridge, UK: Cambridge University Press.
Walters, G. D. (2006). Risk-appraisal versus self-report in the prediction of criminal justice outcomes.
Criminal Justice and Behavior, 33, 279-304. doi:10.1177/0093854805284409
Ward, T. (2000). Sexual offenders’ cognitive distortions as implicit theories. Aggression and Violent
Behavior, 5, 491–507. doi:10.1016/S1359-1789(98)00036-6
Ward, T., & Beech, T. (2006). An integrated theory of sexual offending. Aggression and Violent
Behavior, 11, 44-63. doi:10.1016/j.avb.2005.05.002
Worling, J. R. (2006). Assessing sexual arousal with adolescent males who have offended sexually:
Self-report and unobtrusively measured viewing time. Sexual Abuse: A Journal of Research and
Treatment, 18, 383–400. doi:10.1177/107906320601800406
Wright, L. W., & Adams, H. E. (1994). Assessment of sexual preference using a choice reaction time
task. Journal of Psychopathology and Behavioral Assessment, 16, 221–231.
doi:10.1007/BF02229209
Table 1. Overview of psychometric results from studies on latency-based indirect measures of deviant sexual interest in children.
Measure
Viewing Time (VT)
Harris, Rice, & Quinsey (1996)
Categories
Reliability
Adult-Child
n/a
α = .88
α = .89
α = .87
α = .86
α = .84
α = .90
α = .60
α = .75
α = .90
α = .90
n/a
α = .87
α = .86
α = .85
α = .80
n/a
n/a
Abel et al. (2004)
Adult Male
Adolescent Male
Young Male
Adult Female
Adolescent Female
Young Female
Male Children (age 2-4)
Male Children (age 8-10)
Male Adolescents (age 14-17)
Male Adults (age 22 and over)
Male Children (age 0-10)
Female Children (age 2-4)
Female Children (age 8-10)
Female Adolescents (age 14-17)
Female Adults (age 22 and over)
Female Children (0-10)
Children-Adults
Gress (2005)
Children-Adults
n/a
Gray & Plaud (2005)
n/a
n/a
Worling (2006)
Prepubescent/Postpubescent
Male Toddlers
Male Preadolescents
Male Adolescents
Male Adults
Female Toddlers
Female Preadolescents
Female Adolescents
Abel, Huffman, Warberg, & Holland (1998)
Letourneau (2002)
b
n/a
α = .82
α = .79
α = .62
α = .72
α = .73
α = .82
α = .77
Validity
Group Comparison (n)
Effect-size
Reported
Cohen’s d
Equivalent
CSO vs. NOC
Girls-only CSO vs. NOC
d = 1.00**
r = .60***
1.00
1.50
CSO with boy victims (10) vs. SO (47)
r = .69**
2.51
CSO with girl victims (34) vs. SO (23)
Adolescent CSO (1170) vs.
Adolescent AC (534)
CSO (19) vs. Rapists (7)
CSO with male or mixed victims (9) vs. CSO with
female victims (17)
Dissimulating pedophilic CSO (11) vs. Pedophilic
CSO (28)
VT (39) vs. PPG (39)
CSO (52) vs. Sexual offenders with peer or
adolescent victims (26)
CSO with two or more victims vs. SO
CSO with any male victims vs. SO
CSO with only male victims vs. SO
CSO with any female victims vs. SO
CSO with only female victims vs. SO
r = .08
snr
AUC = .64
0.16
0.51
Frequency Table
Frequency Table
1.08*
1.65*
Frequency Table
-2.13*
Frequency Table
b
AUC = .61
0.43
0.40
b
0.36
0.70
0.87
-0.29
-0.25
AUC = .60
b
AUC = .69**
b
AUC = .73**
b
AUC = .42
b
AUC = .43
Measure
Categories
Babchishin, Nunes, & Kessous (2012)
Female Adults
Male Child
Male Preadolescent
Male Adolescent
Male Adult
Female Child
Female Preadolescent
Female Adolescent
Female Adult
Adult-Child
Postpubescent Males
Postpubescent Females
Prepubescent Males
Prepubescent Females
Children-Adults
Postpubescent Males
Postpubescent Females
Prepubescent Males
Prepubescent Females
Children-Adults
Postpubescent Males
Postpubescent Females
Prepubescent Males
Prepubescent Females
Children-Adults
Postpubescent Males
Postpubescent Females
Prepubescent Males
Prepubescent Females
Children-Adults
Child-Adult
Fromberger et al. (in press)
Children-Adults
Glasgow (2009)
Banse, Schmidt, & Clarbour (2010)
Schmidt, Gykiere, Vanhoeck, Mann, &
Banse (2012)
Reliability
α = .77
α = .93
α = .8
α = .9
α = .89
α = .92
α = .87
α = .89
α = .93
n/a
α = .85
α = .86
α = .85
α = .77
n/a
Validity
Group Comparison (n)
Effect-size
Reported
CSO (31) vs. NOC (31)
CSO (38) vs. NSOC (37)/NOC (38)
AUC = .87
AUC = .82*
AUC = .56
AUC = .80*
AUC = .76*
AUC = .51
AUC = .89*
AUC = .63
AUC = .90*
AUC = .81*
AUC = .33
AUC = .78*
AUC = .74*
AUC = .86*
AUC = .73*
AUC = .46
r = .47**
r = -.37**
r = .47*
r = .00
r = .42**
d = 1.15*
d = 1.32*
d = 1.22*
AUC = 0.76***
CSO with boy victims only (14) vs.
NSOC (37)
CSO with girl victims only (16) vs.
NSOC (37)
α = .90
α = .90
α = .95
α = .93
n/a
n/a
n/a
CSO with male or mixed victims (18) vs. Girls-only
CSO (36)
CSO (35) vs. NSOC (21)
Admitting CSO (20) vs. NSOC (20)
Denying CSO (12) vs. NSOC (20)
Pedophilic CSO (19) vs.
AC (7)/NOC (48)
snr
Cohen’s d
Equivalent
1.59
1.29
0.21
1.19
1.00
0.04
1.73
0.47
1.81
1.24
-0.62
1.09
0.91
1.53
0.87
-0.14
1.13
-0.84
1.13
0.00
0.98
1.15
1.32
1.22
1.00
Measure
Implicit Association Test (IAT)
Mihailides, Devilly, & Ward (2004)
Gray, Brown, MacCulloch, Smith, &
Snowden (2005)
Nunes, Firestone, & Baldwin (2007)
Brown, Gray, & Snowden (2009)
Banse, Schmidt, & Clarbour (2010)
Babchishin, Nunes, & Kessous (2012)
Schmidt, Gykiere, Vanhoeck, Mann, &
Banse (2012)
Van Leeuwen et al. (2012)
Categories
Reliability
Group Comparison (n)
Effect-size
Reported
CSO (25) vs. NSOC (25)
CSO (25) vs. NOC (25)
CSO (18) vs. AC (60)
t = 3.15
***
t = 4.76
d = 0.84*
0.63
0.95
0.84
CSO (24) vs. NOC (29)
CSO (27) vs. NOC (29)
CSO with victims < 12 years of age (54) vs.
CSO with victims > 12 years of age (21)
CSO (54) vs. NSOC (49)
Admitting CSO (20) vs.
NSOC (49)
Denying CSO (55) vs. NSOC (49)
Denying (55) vs. Admitting CSO (20)
CSO (38) vs. NSOC (37)/NOC (38)
r = .33*
r = .21
d = 0.77**
0.70
0.43
0.77
d = 0.92***
d = 0.75*
0.92
0.75
d = 1.01**
d = 0.27
AUC = .62*
1.01
0.27
0.43
α = .79
AUC = .72*
0.82
n/a
AUC = .71*
0.78
AUC = .60
AUC = .67
AUC = .71*
AUC = .57
AUC = .56
AUC = .60
d = 0.44
d = 0.35
d = 0.51
r = .29*
0.36
0.62
0.78
0.25
0.21
0.36
0.44
0.35
0.51
0.64
Children-Not children/Sexual-Not sexual
n/a
Children-Adults/Sex-Non-sex
n/a
Children-Adults/Sexy-Not sexy
Children-Adults/Pleasant-Unpleasant
Children-Adults/Sex-Non-sex
Validity
n/a
n/a
α = .80
rtt = .63
**
Cohen’s d
Equivalent
1. Boys-Men/Sexually exciting-Sexually
unexciting
2. Girls-Women/Sexually exciting-Sexually
unexciting
3. Children-Adults/Sexually exciting-Sexually
unexciting
1.
2.
3.
1.
2.
3.
Children-Adults/Sexy-Not sexy
α = .65
Boys-Men/Sexually exciting-Sexually unexciting
α = .61
Girls-Women/Sexually exciting-Sexually
unexciting
Children-Adults/Sexually exciting-Sexually
unexciting
Children-Adults/Sex-related-Neutral
α = .82
r = .23
0.50
n/a
r = .32*
0.72
CSO with boy victims only (14) vs.
NSOC (37)
CSO with girl victims only (16) vs.
NSOC (37)
n/a
n/a
CSO (35) vs. NSOC (21)
Admitting CSO (22) vs. NSOC (21)
Denying CSO (13) vs. NSOC (21)
CSO with male or mixed victims (18) vs. Girls-only
CSO (36)
SCP (20) vs. NOC (20)
snr
AUC = .89
1.73
Measure
Categories
Reliability
Implicit Relational Assessment Procedure (IRAP)
Dawson, Barnes-Holmes, Gresswell, Hart, & Gore (2009)
Children-Adults
Eye Movement Tracking
Fromberger et al. (in press)
Children-Adults (Initial fixation latency)
Children-Adults (Relative fixation time)
Choice Reaction Task (CRT)
Mokros, Dombert, Osterheider, Zappalà, & Santtila (2010)
Infants-Adults
Pöppl et al. (2011)
Infants-Adults
Stroop Variants
Smith & Waterman (2004; Emotional Stroop)
Sexual-Neutral
Price & Hanson (2007; Emotional Stroop)
Sexual-Neutral
Ó Ciardha & Gormley (2012; Picture
Stroop)
Van Leeuwen et al. (2012; Picture-Word
Stroop)
Rapid Serial Visual Presentation (RSVP)
Beech et al. (2008)
Validity
Group Comparison (n)
Effect-size
Reported
n/a
CSO (16) vs. NOC (16)
χ² = 5.489*
0.91
n/a
n/a
Pedophilic CSO (20) vs.
SO with adult victims (7)/NOC (48)
AUC = 0.90***
AUC = 0.83***
1.81
1.35
n/a
CSO (21) vs. NSOC (21)
AUC = 0.84**
1.41
n/a
CSO (9) vs. NSOC (11)
d = 0.99*
0.99
n/a
CSO (5) vs. Rapists (5)
t = 0.831
0.53
n/a
CSO (15) vs. Rapists (15)
CSO (15) vs. Violent NSOC (15)
CSO (15) vs. Non-Violent NSOC (15)
CSO (15) vs. NOC (15)
CSO (15) vs. Rapists (15)
CSO (15) vs. Violent NSOC (15)
CSO (15) vs. Non-Violent NSOC (15)
CSO (15) vs. NOC (15)
CSO (15) vs. Rapists (15)
CSO (24) vs. NOC (24)
Highly deviant CSO (15) vs. NOC (24)
SCP (20) vs. NOC (20)
n/a
n/a
n/a
n/a
n/a
n/a
n/a
n/a
n/a
AUC = .56
AUC = .59
snr
AUC = .84
-0.45
-0.03
0.14
0.82*
-0.28
0.26
0.07
0.58
0.19
0.21
0.32
1.41
Child molesting-Neutral
n/a
Rape-Neutral
Children-Adults
n/a
n/a
Children-Adults
n/a
T1 Child-T1 Animal
n/a
2
Cohen’s d
Equivalent
Intrafamilial CSO (16) vs. NSOC (17)
r = .45**
1.00
Extrafamilial CSO (18) vs NSOC (17)
r = .54***
1.28
Crooks, Rostill-Brookes, Beech, & Bickley (2009)
T1 Child-T1 Animal
n/a
Adolescent CSO (20) vs. Adolescent NSOC (26)
n/a
Note. All comparisons with male participants and based on uncorrected, raw data. n/a = not available; CSO = Child sexual offenders; NSOC = Non-sexual offenders; NOC = Non-offender controls; AC = Non-child sexual offending controls; SO = Sexual offenders with adult and/or child victims; SCP = Self-identified community pedophiles; PPG = Penile plethysmography; VT = Viewing time.
ᵃ No differences reported for all discriminant analyses in Abel et al. (1998).
ᵇ All comparisons/effect sizes reported in Worling (2006) are based on the Prepubescent/Postpubescent Deviance Index.
snr = significance level not reported. * p < .05; ** p < .01; *** p < .001
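The table does not state how the "Cohen's d Equivalent" column was derived from the reported effect sizes. As a minimal sketch, assuming the standard conversions d = √2 · Φ⁻¹(AUC) (two normal distributions with equal variance) and d = 2r / √(1 − r²) (equal group sizes), the following Python helpers appear to reproduce several of the tabled values, for example AUC = .64 giving 0.51, AUC = .87 giving 1.59, and r = .60 giving 1.50. The function names are illustrative and not taken from the chapter; entries that rest on unequal group sizes or on other conversion formulas will not match exactly.

```python
from math import sqrt
from statistics import NormalDist  # standard library, Python 3.8+


def d_from_auc(auc: float) -> float:
    """Cohen's d implied by an AUC.

    Assumption: both groups are normally distributed with equal variance,
    so that AUC = Phi(d / sqrt(2)) and hence d = sqrt(2) * Phi^-1(AUC).
    """
    return sqrt(2) * NormalDist().inv_cdf(auc)


def d_from_r(r: float) -> float:
    """Cohen's d implied by a point-biserial correlation r.

    Assumption: the two groups are of equal size; with unequal groups
    this simple formula misstates d.
    """
    return 2 * r / sqrt(1 - r * r)


if __name__ == "__main__":
    # Values taken from the "Reported" column of Table 1.
    for auc in (0.64, 0.87, 0.90):
        print(f"AUC = {auc:.2f} -> d = {d_from_auc(auc):.2f}")
    for r in (0.60, 0.54, 0.08):
        print(f"r = {r:.2f} -> d = {d_from_r(r):.2f}")
```

An AUC below .50 or a negative r yields a negative d, which is consistent with the negative entries in the table for comparisons in the unexpected direction.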