

Mistaking a House for a Face: Neural Correlates of Misperception in Healthy Humans

2005, Cerebral Cortex

Cerebral Cortex April 2006;16:500–508
doi:10.1093/cercor/bhi129
Advance Access publication July 13, 2005

Mistaking a House for a Face: Neural Correlates of Misperception in Healthy Humans

Christopher Summerfield1, Tobias Egner2, Jennifer Mangels1 and Joy Hirsch2

Individuals with normal vision can sometimes momentarily mistake one object for another. In this functional magnetic resonance imaging study, we investigated how extrastriate visual regions respond during these erroneous perceptual judgements. Subjects were asked to discriminate images of houses and faces that were degraded such that they were close to an individually defined threshold for perception. On correct trials, voxels localized on the inferior occipital (OFA), fusiform (FFA) and parahippocampal (PPA) gyri exhibited selectivity for face and house images as expected. On incorrect trials, no face- or place-selectivity was observed for OFA or PPA. However, consistent with ‘predictive coding’ accounts of perception, we observed that the FFA also responded robustly on trials where a house was misperceived as a face, and concurrent activation was observed in medio-frontal and right parietal regions previously implicated in decision making under uncertainty. We suggest that FFA responses during misperception may be driven by a predictive top-down signal from these regions.

It has yet to be empirically demonstrated that the visual system is testing pre-established hypotheses in a Bayesian fashion. However, aside from its intuitive appeal, there exists considerable circumstantial evidence that predictive coding may be occurring. Firstly, the context of a visual event is highly influential in shaping perception decisions (Palmer, 1975; Biederman et al., 1982; Henderson and Hollingworth, 1999; Bar, 2004). For example, a mailbox can be perceived as a loaf of bread if the context provided by surrounding objects indicates that it is to be found in the kitchen (Palmer, 1975).
With respect to pre-stimulus code generation, recent evidence suggests that the brain is far from silent in the period preceding stimulation; rather, there is a tendency for neural synchrony to increase prior to onset of an expected stimulus (Brunia and Damen, 1988; Engel et al., 2001; Tallon-Baudry et al., 2005). Modelling work has suggested that feedback within a hierarchically organized system in which only prediction error is transferred between layers can account for the response properties of simple and complex cells in early visual cortex (Rao and Ballard, 1999). Moreover, one of the predictions of the model — that V1 activity will be suppressed when there is a good ‘explanation’ for the sensory data — has been borne out in functional magnetic resonance imaging (fMRI) studies in which subjects view matched gestalt and non-gestalt stimuli (Murray et al., 2004). Predictive coding offers a framework for understanding a wide range of neurocognitive phenomena, including repetition suppression (Dolan et al., 1997; Ishai et al., 2004), change blindness (Rensink, 2000), vision occurring in ‘reverse’ (Ahissar and Hochstein, 2004), visual context effects (Bar, 2004) and perceptual hysteresis (Kleinschmidt et al., 2002), as well as patterns of effective connectivity observed in the visual cortex (Pascual-Leone and Walsh, 2001). Under normal viewing conditions, where perceptual information is rich and the visual environment is regular, there is little reason why perception should err. However, where visual information is limited (such as in the dark) this may not be the case. It follows from predictive coding accounts that perceptual errors (or ‘misperceptions’) may occur when higher-order visual regions incorrectly ‘explain’ impoverished information arriving via feedforward pathways from early visual regions. 
In other words, where the bottom-up visual signal is ambiguous but the top-down signal is strong (and wrong), the latter may gain precedence over the former, resulting in the generation of a false or erroneous percept. Interestingly, this view is highly reminiscent of recent models of hallucinatory or illusory experience in patient populations, which have described a mismatch between top-down and bottom-up information as crucial to non-veridical perceptual experiences (Grossberg, 2000; Collerton et al., 2005). The phenomenon of ‘misperceiving’

Introduction

Recent approaches to perception have drawn upon the theory that part of the problem of deciding what it is that we are seeing can be solved before the stimulus is even presented. There exist intrinsic regularities in ongoing perception, such that, for example, each time you walk through the front door of your apartment, the configuration of objects, textures and colours which greets you is likely to be highly similar to the last. Most prominent among these regularities is a powerful temporal autocorrelation in the perceptual signal (what you are seeing now, it is very likely that you will be seeing in a few seconds’ time). According to recent models, the brain can capitalize upon these regularities to generate, over time, a predictive code corresponding to perceptual events which are likely to occur (Mumford, 1992; Rao and Ballard, 1999; Bar, 2003; Friston, 2003; Murray et al., 2004). The role of such a predictive signal would be to transfer part of the computational burden to the epoch preceding the stimulus, thereby limiting post-stimulus processing to the testing of a pre-established ‘prior’ hypothesis (and the further processing of residual prediction error). According to this view, once a stimulus has been presented, bottom-up sensory information is ‘matched’ to a predictive code rather than being processed de novo in feedforward succession.
In other fields of psychology, such as reward learning, the existence of such predictive signals is well established, and it has been shown that reward-related neural activity will shift from reinforcer to predictive cue over the course of repeated pairings (Tobler et al., 2005).

© The Author 2005. Published by Oxford University Press. All rights reserved. For permissions, please e-mail: journals.permissions@oupjournals.org

1Department of Psychology, Columbia University, 406 Schermerhorn Hall, 1190 Amsterdam Ave, New York, NY 10027, USA and 2Functional MRI Research Center, Department of Psychiatry and Radiology, Columbia University Neurological Institute Box 108 710 West 168th Street, New York, NY 10032, USA

Downloaded from http://cercor.oxfordjournals.org/ by guest on May 30, 2013

Keywords: extrastriate, misperception, neural correlates, predictive coding, visual regions

Materials and Methods

Subjects
Subjects (n = 8, four females) were neurologically normal individuals ranging in age from 19 to 34 years. All subjects gave informed consent in accordance with Columbia University Medical Center Institutional Review Board guidelines.

Stimuli
Stimuli were 400 × 400 pixel grayscale images of faces and houses. Faces came from the AR database (Martinez and Benavente, 1998) and houses were from photographs taken by the authors in Brooklyn, New York. Images were cropped within the borders of the face/house such that features (eyes/nose, or windows/door) were more prominent than overall shape (see Fig. 1a). House stimuli were additionally smoothed with a 2D Gaussian filter (width = 11 pixels) to match house and face stimuli for high-frequency information. All stimuli were normalized to a mean luminance of 0.5 (range 0–1). For psychophysical testing, contrast-modulated exemplars of the house/face images were generated by scaling luminance values by c within a range generated by 0.5 ± [c/2].
For example, 0.2 contrast images were generated by scaling all luminance values in the range 0.4–0.6. Mask stimuli were random checkerboards (400 × 400 pixels, each square 40 × 40 pixels) which were smoothed and deformed by the ‘spherize’ and ‘ocean ripple’ tools in Adobe Photoshop. Examples of contrast-modulated face and house images, and masks, can be seen in Figure 1a.

Localizer Task
In a pre-experimental fMRI run, subjects passively viewed 12 alternating blocks of unmodulated, unmasked faces and houses. House/face stimuli were presented for 750 ms with 250 ms inter-stimulus intervals, in blocks of 15 consecutive stimuli. A 10 s rest period (fixation) was interleaved between blocks.

Discrimination Task
In the discrimination task, each trial began with a blank screen for a variable period. This duration was varied in a Gaussian fashion such that total inter-trial interval varied between 2000 and 3000 ms. A fixation cross for 500 ms cued the onset of the stimulus. The face/house image was presented for 100 ms, followed immediately by a randomly selected mask for 250 ms. On each trial, the subject indicated with a button press whether the stimulus was a face or a house, and whether they had high or low confidence in their response. The mask duration was 250 ms, after which it was replaced by a fixation cross for 1000 ms. The sequence of events on each discrimination trial can be seen in Figure 1b.

Figure 1. Example stimuli. (a) Three examples of contrast-modulated face and house images, and example masks (bottom). Images were contrast modulated by normalizing scalar intensity values within a range around 0.5. Face and house exemplars increase in contrast from left to right. (b) The sequence of stimuli presented in each trial. Inter-stimulus intervals are shown interposed between frames.
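
The contrast modulation described under Stimuli can be sketched in a few lines. This is a minimal illustration, not the authors' code: the function name `contrast_modulate` is hypothetical, and it assumes the input image is already normalized to the 0–1 range with a mean luminance of 0.5, so that scaling deviations from 0.5 by c keeps every value within 0.5 ± c/2.

```python
import numpy as np

def contrast_modulate(image, c):
    """Compress luminance values into the range 0.5 +/- c/2.

    Assumption (not stated as code in the paper): `image` holds
    luminance values in [0, 1] with a mean of 0.5.
    """
    image = np.asarray(image, dtype=float)
    # Scale each pixel's deviation from mid-grey (0.5) by the contrast c.
    return 0.5 + c * (image - 0.5)

# Illustrative 2 x 2 "image": extreme values plus mid-grey pixels.
example = contrast_modulate([[0.0, 1.0], [0.5, 0.5]], c=0.2)
```

With c = 0.2, the extreme luminance values 0 and 1 map to 0.4 and 0.6, which matches the worked example in the text, and the mean luminance stays at 0.5.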
one object as another is common to many neurological and psychiatric disorders (ffytche and Howard, 1999), as well as patients with damage to the posterior brain (Warrington and Shallice, 1984), but even healthy individuals frequently report ‘illusory’ perceptual experiences (McKellar, 1957). These misperceptions are most likely to occur when visual information is limited, such as at night or in the darkened interior of a room (Murgatroyd and Prettyman, 2001), as would be expected if such errors were due to erroneous ‘explanation’ of a weakened visual signal.

In the present study, we studied how the brain responds during erroneous perceptual decisions, with a view to understanding more about how predictive signals shape perception. It follows from predictive coding accounts of perception that on incorrect trials (where bottom-up information is weak or ambiguous), neural activity will be observed in visual regions tuned to the predicted stimulus, as well as more anterior structures from which the top-down ‘prediction’ originates. For example, where stimulus A is mistaken for (‘explained as’) stimulus B, extrastriate visual regions representing stimulus B will become active despite the fact that no bottom-up information corresponding to stimulus B has been presented. In other words, if predictive coding is occurring, then selectivity for the reported percept should be preserved during erroneous discrimination.

To test this hypothesis, we used a challenging perceptual task in which subjects discriminated rapid, visually degraded images of faces and houses. Discriminability was carefully controlled with psychophysical thresholding, such that all images were presented very close to individual thresholds for discrimination.
fMRI responses were acquired from ventral posterior cortical regions known to respond preferentially to the face (fusiform and inferior occipital gyri) and house (parahippocampal gyrus) images used in our discrimination task (Dolan et al., 1997; Kanwisher et al., 1997; Aguirre et al., 1998; Epstein and Kanwisher, 1998). By exploring how these regions responded on incorrect discrimination trials (for example, when a house is mistaken for a face), it was possible to assess whether perceptual selectivity is preserved under situations where one object is mistaken for another.

A genetic algorithm was used to generate pseudorandom sequences of faces/houses which were optimized for detection of contrast between trial types (Wager and Nichols, 2003). Levels of contrast modulation were determined individually for each subject with extensive pre-experimental testing. Subjects performed 540 practice trials (180 outside the scanner, 360 in the scanner) on which contrast modulation varied from 0.03 to 0.3 (in steps of 0.03) in an ascending and descending staircase fashion. Each subject’s point of subjective equality (PSE) was defined as the contrast level at which discrimination performance was closest to 75%. During the main task, 10 stimuli were drawn from a Gaussian distribution of contrast-modulated images falling within 0.05 of this level in steps of 0.01.

fMRI Data Analysis
Spatial pre-processing and statistical mapping were carried out with SPM2 software (Wellcome Department of Imaging Neuroscience, University College London, UK, http://www.fil.ion.ucl.ac.uk/spm/spm2.html). Functional T2* images were slice-timing corrected and spatially realigned to the first volume acquired. The first five functional scans from each task were discarded prior to the subsequent analyses. A 128 s temporal highpass filter was applied in order to exclude low-frequency artifacts.
Temporal correlations were estimated using restricted maximum likelihood estimates of variance components using a first-order autoregressive model. The resulting non-sphericity was used to form maximum likelihood estimates of the activations. Each subject’s structural T1 image was co-registered to an individual mean EPI image. Transformation parameters were derived from normalizing the co-registered structural image to a template brain within the stereotactic space of the Montreal Neurological Institute (MNI), and the derived parameters were then applied to normalize each subject’s EPI volumes (from both localizer and task runs). Normalized images were smoothed with a Gaussian kernel of 9 × 9 × 13.5 mm full-width half-maximum (i.e. three times the voxel dimensions as originally acquired).

Trial Classification
For the discrimination task, trials were classified according to whether a face was correctly perceived as a face (FF), a face was incorrectly perceived as a house (FH), a house was incorrectly perceived as a face (HF) or a house was correctly perceived as a house (HH). Correct responses were further subdivided into high (FFhc, HHhc) and low confidence (FFlc, HHlc) options (this subdivision was not possible for incorrect trials, for which the high confidence response was very rarely used), yielding a total of six conditions of interest. Regressors of stimulus events in the discrimination task (convolved with a canonical HRF) were created for each trial type (FFhc, HHhc, FFlc, HHlc, FH, HF), and parametric modulation regressors for image contrast were included with each regressor. Parametric regressors were built using values taken from subject-specific performance curves (% correct faces/houses at each contrast level) rather than values corresponding to the degree of stimulus degradation. Subject-specific parameter estimates associated with each of these six regressors were extracted from the first (within-subject) analysis.
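
The six-way trial classification described above can be expressed as a small labelling function. This is a sketch under stated assumptions: the function name `classify_trial` and the string encodings ("face"/"house", "high"/"low") are hypothetical, chosen only for illustration. Incorrect trials are pooled across confidence levels, as in the text.

```python
def classify_trial(stimulus, response, confidence):
    """Return one of the six condition labels: FFhc, FFlc, HHhc, HHlc, FH, HF.

    stimulus, response: "face" or "house" (hypothetical encoding).
    confidence: "high" or "low" (hypothetical encoding).
    """
    # First letter = stimulus shown, second letter = percept reported.
    label = stimulus[0].upper() + response[0].upper()
    if stimulus == response:
        # Only correct trials are split by confidence.
        label += "hc" if confidence == "high" else "lc"
    return label

# A house reported as a face is an HF trial regardless of confidence.
labels = [
    classify_trial("face", "face", "high"),
    classify_trial("house", "house", "low"),
    classify_trial("house", "face", "low"),
    classify_trial("face", "house", "high"),
]
```

Each label would then index one event regressor (plus its parametric contrast modulator) in the first-level model.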
These estimates (beta coefficients) were then compared with t-tests at the second (group) level for analyses specific to predefined ROIs and in a voxelwise fashion across the entire brain.

Localization of Face- and Place-responsive Regions
Selective face- and place-sensitive voxels were identified with a pre-experimental localizer task in which subjects passively viewed face and

Statistical Analyses
In order to assess predictions of interest, subject-specific parameter estimates within regions of interest (PPA, FFA, OFA) were compared using t-tests at the group level. In particular, we were interested in how neural responses in pre-defined face- and place-sensitive regions varied on incorrect trials (HF > FH, FH > HF). Comparisons were also undertaken for the parametric modulation regressors, to assess whether that portion of the response of each region which varied with contrast similarly differed between trial types. Estimates of the hemodynamic response for each condition were obtained by refitting the data using a finite impulse response (FIR) convolution model to provide a less constrained picture of the hemodynamic response. Note that we only remodelled the data at these ROIs for which we had already established significant responses. Additionally, we performed a conventional whole-brain search for voxels whose activation varied as a function of stimulus and percept for high-confidence correct, low-confidence correct and incorrect trials. For these analyses, which were performed at the second (between-subject) level, only voxels which were significant at P < 0.05 with the correction for false discovery rate (FDR) (Genovese et al., 2002) are reported.

Results

Behavioral Results
Subjects’ PSE (~75% discrimination) fell within contrast values of 0.08 and 0.26. Overall mean discrimination performance on the discrimination task was 71.1 ± 5% (Fig. 2a).
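
The FDR correction mentioned in the Statistical Analyses is commonly implemented with the Benjamini-Hochberg step-up procedure, and the sketch below assumes that procedure. It is illustrative only: the function name `fdr_threshold` and the example P-values are hypothetical and are not taken from the study.

```python
import numpy as np

def fdr_threshold(p_values, q=0.05):
    """Benjamini-Hochberg step-up: largest P-value still declared significant.

    Returns None when no test survives at FDR level q.
    """
    p = np.sort(np.asarray(p_values, dtype=float).ravel())
    n = p.size
    # The i-th smallest P-value (1-based) is compared against q * i / n.
    critical = q * np.arange(1, n + 1) / n
    passing = np.nonzero(p <= critical)[0]
    if passing.size == 0:
        return None
    # Every P-value at or below this cutoff is reported as significant.
    return p[passing[-1]]

# Hypothetical voxelwise P-values, for illustration only.
cutoff = fdr_threshold([0.001, 0.008, 0.039, 0.041, 0.2], q=0.05)
```

In this toy example only the two smallest P-values fall at or below their rank-specific critical values, so the cutoff is the second of them. Unlike a fixed uncorrected threshold, the cutoff adapts to the observed distribution of P-values.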
Subjects were equally likely to correctly detect houses (d′ = 1.79 ± 0.14) and faces (d′ = 1.76 ± 0.16) and substantial numbers of high-confidence correct (229.3 ± 45.5), low-confidence correct (110.0 ± 43.9) and incorrect (139.9 ± 24.0) trials were obtained for each subject. High-confidence incorrect responses were rare, with an average of 8.6 ± 8.7 faces classified with high

Table 1
Voxel locations (Talairach coordinate space) of peak voxel in the fusiform gyrus (FFA) and inferior occipital gyrus (OFA) which responded to the comparisons faces > houses (FFA, OFA) and houses > faces (PPA) in the localizer task

Subject   FFA voxel (x, y, z)   OFA voxel (x, y, z)   PPA voxel (x, y, z)
1          40, -63, -11          36, -79,   2          19, -50, -12
2          38, -67,  -5         -32, -75,  -7          24, -51, -12
3          36, -52,  -9          40, -81,  -5          20, -44, -18
4         -32, -63, -15         -43, -83,  -9         -24, -52, -14
5          36, -62, -15         -41, -79,   3         -28, -46, -14
6         -40, -65, -19         -43, -81, -11          17, -44, -13
7          41, -56, -16         -43, -85,  -7         -25, -62, -15
8         -38, -69, -19         -45, -83, -11          24, -44, -12

fMRI Data Acquisition
Images were acquired with a GE Twin-Speed 1.5 T scanner. All images were acquired parallel to the AC-PC orientation with a T2*-weighted EPI sequence of 24 contiguous axial slices [TR = 2000, TE = 40, flip angle = 60°, field of view (FoV) = 190 mm, array size = 64 × 64] of 4.5 mm thickness and 3 × 3 mm in-plane resolution, providing whole-brain coverage. The region of interest (ROI) localizer task consisted of a single run of 155 scans, and the discrimination task consisted of four runs of 160 scans each. High-resolution anatomical scans were acquired with a T1-weighted SPGR sequence (TR = 19, TE = 5, flip angle = 20, FoV = 220), recording 24 slices at a slice thickness of 1.5 mm and in-plane resolution of 0.86 × 0.86 mm.

place stimuli in alternating blocks. Imaging data from this localizer task were modeled with two box-car functions convolved with a canonical hemodynamic response function (HRF).
These regressors were contrasted with a t-test in each subject (faces > places, places > faces) and the resulting images were thresholded at a liberal threshold (P < 0.001, uncorrected) to identify face- and place-sensitive regions of the brain. Guided by an extensive previous literature, we selected peak voxels responsive to face stimuli in the inferior occipital gyrus (the ‘occipital face area’ or OFA) and fusiform gyrus (the ‘fusiform face area’ or FFA), and voxels responsive to place stimuli in the parahippocampal gyrus (the ‘parahippocampal place area’ or PPA). For each region in each subject, we defined a sphere of 2 mm radius (8 voxels) centered on the voxel showing the peak response to the relevant comparison. Additionally, we defined a single control region of interest, also of 2 mm radius, at the peak voxel falling in early visual cortex which responded to both faces and places in group analysis of the localizer task. The Talairach coordinate locations of these voxels (OFA, FFA, PPA) for each subject are shown in Table 1.

Figure 2. Behavioral data. (a) Psychophysical data from the experimental task (480 trials). Discrimination performance (0–1, chance = 0.5) is on the y-axis and level of contrast modulation (difference from PSE) is on the x-axis. A mean curve is fitted to the data (mean with standard error bars). (b) From the task, tendency to respond ‘face’ as a function of contrast level. The x-axis shows contrast level (difference from PSE); the y-axis shows the percentage of trials on which the subject responded ‘face’. Fitted mean data (black line) is superimposed on mean data points with standard error bars. No bias (0.5) is marked with a dashed grey line.

Control Region
The location of the early visual control region is shown in Figure 3a (cluster thresholded at P < 10^-5).
When imaging data from the discrimination task were extracted from a small, spherical ROI centered on the peak voxel from this ROI (indicated with blue cross-hairs), neural responses did not differ as a function of either stimulus or percept (FFhc > HHhc, P = 0.47; FFlc > HHlc, P = 0.07; HF > FH, P = 0.23). The estimated mean HRF across subjects for this ROI is shown in Figure 3b.

Face-responsive Regions
In the localizer task, all eight subjects showed activation in inferior occipital and fusiform regions in response to passive viewing of faces. Five subjects exhibited a peak fusiform area face-response in the right hemisphere, and three in the left hemisphere. The reverse pattern was observed in the OFA, with 6/8 subjects exhibiting maximal selectivity for faces in the left hemisphere. Table 1 shows the location of the OFA and FFA for each subject, as identified by the localizer task. All FFA peaks fell anterior to all OFA peaks (all FFA within -70 < y < -50; all OFA within -90 < y < -70). Additionally, the locations of selected FFA and OFA ROIs are shown rendered onto the MNI brain in Figure 4a,d.

Figure 4 shows results from the face-responsive regions (FFA, Fig. 4a–c; OFA, Fig. 4d–f). In Figure 4b, mean FFA responses from FFhc (blue lines) and HHhc trials (red lines) are plotted. Face-selectivity in individually defined ROIs (selected FFA ROIs shown in Fig. 4a) was highly preserved for these high-confidence correct trials, with robust and positive-going HRFs to FFhc trials. When compared at the group level, mean parameter estimates were significantly greater for FFhc trials than HHhc (t = 2.7, P < 0.05). However, our main comparison of interest concerned incorrect trials. Houses mistaken as faces elicited a reliably greater neural response in the FFA than faces mistaken as houses (t = 3.1, P < 0.02). HF trials (cyan line) were accompanied by a large, positive-going hemodynamic response in the FFA (Fig.
4c) which peaked later (~8 s) than the response on correct trials. No differences were observed for low-confidence trials (red and blue lines, P = 0.83).

Figure 4d shows selected peak face-responsive voxels on the inferior occipital gyrus identified with the localizer task. As for the FFA, face-selectivity was preserved at these OFA voxels on the discrimination task, with a reliably greater neural response elicited on FFhc trials than HHhc trials (t = 4.2, P < 0.01). HRFs on FFhc (blue) and HHhc trials (red) can be seen in Figure 4e. Blood oxygen level-dependent (BOLD) responses did not differ, however, as a function of stimulus for low-confidence correct trials (P = 0.48) or incorrect trials (P = 0.23). HRFs for low-confidence and incorrect trials are shown in Figure 4f.

Descriptively, different patterns of results were observed in FFA and OFA, with face-selectivity preserved on incorrect trials for fusiform but not inferior occipital voxels. In order to test the statistical reliability of this result, we entered the beta coefficients from the OFA and FFA ROIs into a 2 (region; FFA, OFA) × 2 (condition; FH, HF) analysis of variance. We observed a statistically significant region × condition interaction (F = 9.4, P < 0.02), indicating that face percept selectivity on incorrect trials was indeed observed for fusiform but not inferior occipital face-responsive voxels.

Place-responsive Regions
A sample of the individual PPA locations from which these data were extracted can be seen in Figure 5a. At these individually defined ROIs located on the parahippocampal gyrus, BOLD responses were significantly greater for HHhc than for FFhc trials (t = 2.5, P < 0.04). Mean hemodynamic responses on high-confidence correct trials (HHhc, red; FFhc, blue) are shown in Figure 5b. In Figure 5c, the HRF estimates on other trial types are plotted.
Within this cluster, low-confidence correct house trials (HHlc) also exhibited a reliably greater response than low-confidence correct face trials (FFlc) (t = 5.2, P < 0.01). The comparison between the two types of incorrect trial (HF > FH, FH > HF), however, failed to reach statistical significance (t = 0.29).

confidence as houses (FH), and 2.0 ± 3.0 houses classified with high confidence as faces (HF). In absolute number, more high-confidence incorrect trials were FH than HF (t = 2.93, P < 0.04), but proportionally, more high-confidence trials were HF than FH (t = 2.56, P < 0.05). Subjects displayed an overall bias to classify stimuli as houses (t = 3.0, P < 0.03). However, this tendency varied with contrast (Fig. 2b), with lower-contrast stimuli more likely to be classed as houses (linear trend: F = 35.7, P < 0.001).

Figure 3. Imaging data: V1/V2 control region. (a) Voxels responding to both faces and houses in the localizer task (cluster thresholded at P < 10^-5). The peak voxel within this cluster is marked with the blue cross-hairs. (b) Discrimination-task hemodynamic responses (post-stimulus time histogram) from this peak voxel, for each of the six conditions. Time in seconds is on the x-axis. Bars represent standard errors.

Figure 4. Imaging data: FFA and OFA. (a) Locations of peak fusiform gyrus (FFA) voxels responsive to faces > houses from the localizer task (selected individual subjects). (b) Mean discrimination-task hemodynamic responses from high-confidence correct face (blue) and high-confidence correct house (red) trials extracted from these fusiform loci. (c) Mean discrimination-task hemodynamic responses from low-confidence (red, blue) and incorrect (orange, cyan) trials.
(d) Locations of peak inferior occipital gyrus (OFA) voxels responsive to faces > houses from the localizer task (selected individual subjects). (e) Mean discrimination-task hemodynamic responses from high-confidence correct face (blue) and high-confidence correct house (red) trials for these inferior occipital loci. (f) Discrimination-task hemodynamic responses from low-confidence (red, blue) and incorrect (orange, cyan) trials.

It was observed that, overall, beta values for incorrect and face-stimulus trials in the PPA were less than zero, and the HRF curves were accordingly negative-going. We reasoned that this might simply reflect parameter estimates falling below the mean, due to the capture of variance by the parametric contrast regressors, and so we re-ran the analysis without these regressors. A highly similar pattern of data was observed in the PPA, with HRF curves for face stimulus and incorrect trials dipping below zero in precisely the same fashion (data not shown).

Contrast-modulated Regressors
For each of the three regions of interest, we also compared the contrast-modulated regressors for high-confidence correct, low

Figure 5. Imaging data: PPA. (a) Locations of peak parahippocampal gyrus (PPA) voxels responsive to houses > faces from the localizer task (selected individual subjects). (b) Mean discrimination-task hemodynamic responses from high-confidence correct face (blue) and high-confidence correct house (red) trials extracted from these parahippocampal loci. (c) Mean discrimination-task hemodynamic responses from low-confidence (red, blue) and incorrect (orange, cyan) trials.

Whole-brain Analyses
In order to identify voxels associated with our perceptual decision making task, we first conducted a search for all voxels which responded to presentation of the face/house images irrespective of trial type.
In addition to expected activations in left motor cortex (subjects all responded with their right hand), we observed significant clusters in a network of brain regions previously implicated in perceptual decision making: posterior parietal cortex bilaterally, medial prefrontal cortex, right dorsolateral prefrontal cortex and right anterior insula (not shown). Using these clusters as a mask, we then conducted a whole-brain search for voxels that differed reliably across subjects as a function of trial type. All results reported here are corrected for FDR with an alpha of P < 0.05. At this threshold, the comparison between high-confidence correct face and house trials (FFhc > HHhc, HHhc > FFhc) revealed no significant differences. Similarly, comparing low-confidence correct face and house trials (FFlc > HHlc, HHlc > FFlc) yielded no active voxels. Houses judged to be faces, however, yielded significantly greater activation than faces judged to be houses (HF > FH) at a number of cerebral loci. Statistically significant clusters were observed in the right superior parietal lobe, Brodmann’s area 7 (t = 9.6, FDR P < 0.01) extending into the precuneus (t = 9.1, FDR P < 0.01) and also in the medial frontal gyrus (Brodmann’s area 32; t = 7.3, FDR P < 0.01) extending into the supplementary motor area (Brodmann’s area 6; t = 7.0, FDR P < 0.01). In Figure 6, a statistical map of significant activations is rendered onto the MNI template brain. The peak voxel for the comparison HF > FH is indicated by the blue crosshairs.

Discussion
In order to simulate the experience of illusory perception in the laboratory, subjects were asked to discriminate images of houses and faces which were presented close to the threshold for perception. Image visibility was carefully controlled such that discrimination errors were made on 25% of trials. The object of the study was to determine whether selectivity of responses in face- and place-selective voxels in ventral visual

Figure 6.
Imaging data: whole-brain analyses. Voxels across the brain responsive to the comparison HF > FH. The peak voxels from clusters in the right superior parietal lobe (a) and medial frontal cortex (b) achieved statistical significance at P < 0.05 (FDR correction for multiple comparisons). The red-yellow scale refers to the t-value.

cortex was preserved on these incorrect trials. The results here suggest that higher visual regions are not homogeneous with respect to their responses during misperception of their preferred stimulus. Whereas the FFA responded reliably on both veridical perception and misperception trials, with robust, positive-going HRFs both to faces judged to be faces (FF) and to houses judged to be faces (HF), other face-responsive voxels in the occipital cortex responded only during veridical face perception. A similar pattern, whereby BOLD responses indicated selectivity during veridical perception but not misperception, was observed in place-responsive voxels of parahippocampal gyrus (PPA). Thus, the FFA displayed responses to incorrect trials in line with hypotheses derived from predictive coding accounts of perception (Friston, 2003; Murray et al., 2004) and top-down models of illusory perception (Collerton et al., 2005; Grossberg, 2000), whereas the PPA responded in a fashion concordant with the assumption that incorrect trials simply reflected wrong guesses (Smith and Ratcliff, 2004).

confidence correct, and incorrect trials. None of the comparisons reached statistical threshold (V1: all P-values > 0.3; PPA: all P-values > 0.2; FFA: all P-values > 0.1; OFA: all P-values > 0.5).

fusiform gyrus mediating configuration-based face judgements, presumably via top-down interactions with more anterior structures.
Place-sensitive Regions

By contrast, the PPA did not respond to faces ‘misperceived’ as houses. One possible interpretation of this finding is that mechanisms of perception in the PPA and FFA differ due to differences in the level of structural regularity of their preferred stimulus. The overall structure of a face is highly predictable (two eyes above a centrally positioned nose and mouth, etc.), whereas the PPA is known to respond to a wide range of natural scenes, including interior views, exterior scenes without an obvious horizon (such as views of buildings) and views with a horizon (such as views of mountains). It follows from this variability that a predictive code is less likely to be of use in the processing of natural scene stimuli, and PPA neurons may thus have to rely to a greater extent on bottom-up information. Indeed, there is evidence from previous work that PPA responses are not sensitive to stimulus familiarity or identity (Epstein et al., 1999) whereas those in the FFA are (Dubois et al., 1999; Rotshtein et al., 2005). Unlike the FFA, the PPA does not respond in a viewpoint-invariant fashion (Epstein et al., 2003), suggesting a stronger responsiveness to bottom-up input from primary visual regions. Perhaps most importantly, in patients who experience recurrent visual illusions and hallucinations, the illusory image typically occurs against a veridical background scene, and panoramic hallucinations of entire visual scenes are rare (ffytche and Howard, 1999).

However, it has also previously been reported that during rivalrous stimulation to each eye, PPA responses correlate with subjectively reported perception of buildings rather than bottom-up stimulation (Tong et al., 1998). Additionally, the PPA is activated by mental imagery of places (Ishai et al., 2000) and may mediate context effects in object perception (Bar and Aminoff, 2003).
These results all run contrary to the interpretation that PPA responses uniquely track the ‘bottom-up’ veridical properties of the stimulus with minimal top-down input. Another interpretation of our data is thus that, rather than reflecting fundamental differences in the responsivity of FFA and PPA, the failure to find parahippocampal place-selectivity on incorrect trials relates to a ‘predictive’ strategy employed by the subjects. Even though the face and house stimuli used in our study were well matched with regard to their level of structural regularity, in the real world houses tend to exhibit less regularity than faces, which may have prompted subjects to use a heuristic whereby they ‘predicted’ that the coming stimulus would be a face, using evidence against this prediction as evidence in support of the idea that the stimulus was in fact a house. Indeed, although we did not record this formally, in post-scan debriefing, subjects reported that they were more inclined to respond ‘house’ if they could not see the stimulus. In addition to conforming to predictive coding accounts of perception, the use of this strategic approach is described by ‘random-walk’ theories of decision making in two-choice decisions, which propose that subjects accumulate information along a single response dimension, such that information in favor of one response is information against the other (Link and Heath, 1975; Smith and Ratcliff, 2004).
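To make the single-dimension accumulation idea concrete, the following is a minimal illustrative sketch in Python of a random walk in which evidence for ‘face’ is accumulated and the absence of face evidence counts towards ‘house’. It is not the analysis used in this study: the function names (`simulate_trial`, `house_response_rate`), the constant `baseline` drift against ‘face’, and all parameter values are assumptions chosen purely for illustration.

```python
import random


def simulate_trial(drift, rng, threshold=5.0, noise_sd=1.0, max_steps=1000):
    """Run one random walk along a single 'face' (+) vs. 'house' (-) axis."""
    evidence = 0.0
    for _ in range(max_steps):
        # Each step adds a deterministic drift plus Gaussian noise.
        evidence += drift + rng.gauss(0.0, noise_sd)
        if evidence >= threshold:
            return "face"
        if evidence <= -threshold:
            return "house"
    # No boundary reached: respond according to the sign of the evidence.
    return "face" if evidence > 0 else "house"


def house_response_rate(contrast, n_trials=2000, baseline=0.1, seed=0):
    """Proportion of 'house' responses for an equal mix of faces and houses.

    Hypothetical drift rule (an assumption, not taken from the study):
    a face pushes the walk up in proportion to contrast, a house pushes
    it down, and a constant 'baseline' treats missing face evidence as
    evidence for 'house'.
    """
    rng = random.Random(seed)
    house_count = 0
    for i in range(n_trials):
        stimulus_sign = 1.0 if i % 2 == 0 else -1.0  # alternate face / house
        drift = stimulus_sign * contrast - baseline
        if simulate_trial(drift, rng) == "house":
            house_count += 1
    return house_count / n_trials


if __name__ == "__main__":
    for contrast in (0.0, 0.1, 0.25, 0.5):
        print(contrast, house_response_rate(contrast))
```

Under these assumed parameters, the proportion of ‘house’ responses is highest when contrast is zero and falls towards one half as contrast increases, which is the qualitative pattern that a ‘face prediction’ strategy would be expected to produce.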
One prediction that can be made from this model is that if subjects are accumulating ‘face’ information during discrimination, the tendency to

Face-sensitive Regions

Our main finding may at first appear counterintuitive: when perception errs, the FFA can exhibit a stronger and statistically more robust response to images of houses than images of faces, despite the fact that, in our experiment, face-sensitive voxels were defined to be those which exhibited a greater response to faces than houses on a pre-experimental localizer task. On trials where a house is mistaken for a face (HF), the FFA is not receiving ‘bottom-up’ sensory input signalling the presence of a face stimulus (as no such stimulus is present). It is thus likely that the FFA is strongly modulated by a ‘top-down’ expectation that a face will be presented. In other words, in line with predictive coding accounts of perception, impoverished visual information is being ‘explained’ as corresponding to a face stimulus even where no face stimulus is present. That the hemodynamic response peaked later (~8 s post-stimulus) on these trials relative to correct trials may reflect the increased time required to resolve the ambiguous visual information into a (false) percept.

Previous studies have shown the FFA to be highly sensitive to top-down information. For example, a grey, oval-shaped stimulus was found to activate the FFA when contextual information suggested that it was a face (it was placed on top of a pair of shoulders). However, when the stimulus was viewed out of context, no face-related activity was elicited (Cox et al., 2004). FFA activity has been found to track reported perception under conditions where retinal input remains constant, yet the percept varies, such as binocular rivalry (Tong et al., 1998) or during the presentation of ‘Mooney’ faces (Andrews and Schluppeck, 2004).
Anecdotal evidence suggests that humans have a strong predisposition to see faces where no face exists (in clouds, in the moon, or in landscapes) or to perceive a face from the barest of cues, perhaps resulting from the privileged place which faces are thought to hold in primate phylogeny and ontogeny (Yin, 1969; Farah et al., 1998). Moreover, category-specific activation of FFA is observed when faces are imagined (Ishai et al., 2000; O’Craven and Kanwisher, 2000). It is easy to see how generation of a mental image corresponding to the expected stimulus may accompany a predictive mechanism in perception. Taken together with these findings, the data reported here suggest that this propensity to perceive ‘illusory’ faces is likely to result from greater responsivity of FFA to top-down modulation.

Not all face-responsive voxels, however, showed preserved selectivity on incorrect trials. Voxels in inferior occipital regions (OFA) responded robustly and significantly to high-confidence correct face discriminations, but failed to dissociate HF from FH trials. This result was confirmed by a statistically significant region × condition interaction observed for these two areas. Models of face processing have proposed that faces are discriminated via a two-step mechanism, with early visual processing stages mediating ‘structural encoding’ of the physical properties of the face, and later stages responsible for configural processing underlying face identification (Bruce and Young, 1986). Recently, it was found that the OFA is sensitive to subtle differences in structural properties of a face, whereas the FFA tracks identity shifts across a categorical boundary (Rotshtein et al., 2005). Our data provide further support for the view that there is a dissociation between these two face-processing regions, with more posterior sites sensitive to ‘bottom-up’ featural information, and later processing stages along the

testing predictions during uncertain decisions.
Medial prefrontal and posterior parietal cortical regions are densely interconnected (Wise et al., 1997) and extrastriate regions receive re-entrant connections particularly from the superior parietal lobe (Van Essen et al., 1992). It is plausible that these frontal and parietal sites are the ‘source’ of the top-down prediction about the forthcoming stimulus, and that face-selective FFA responses on incorrect trials are driven by input from these regions.

Other Considerations

It could perhaps be argued that the differences between trial types observed here simply reflect confounding differences in the basic visual properties of the images used. Indeed, one particular concern is that due to the bias exhibited by subjects to respond ‘house’ at low contrasts, HH and FH regressors are contaminated with larger numbers of trials where contrast levels were low. We think that it is unlikely that our results simply reflect low-contrast stimulation on FH trials for a number of reasons. Firstly, contrast regressors did not seem to capture much of the variance, presumably because the overall variation in contrast was slight, as all stimuli were presented in a narrow range around an individually determined threshold for perception. This suggests that image contrast was not a major determinant of the response in these extrastriate regions. Most importantly, however, fMRI responses in posterior regions of the occipital cortex tend to be more sensitive to the overall contrast of the stimulus (Boynton et al., 1999), and yet these regions did not show the effect for the HF > FH comparison. An a priori defined control region which fell in or close to V1 exhibited no reliable differences between HF and FH trials, whereas such differences would be predicted if this result depended on differences in contrast between trial types. Even in the extrastriate cortex the result was not ubiquitous: although HF trials exhibited stronger responses than FH trials in the FFA, this was not the case in the OFA.
We thus think it is unlikely that our results can be accounted for simply by differences in stimulus contrast.

Whole-brain Analyses

When the two classes of incorrect trial were compared with voxelwise comparisons across the brain (HF > FH), areas previously implicated in perceptual decision making were strongly activated, including the posterior parietal cortex, right dorsolateral prefrontal cortex, right anterior insula and dorsomedial prefrontal cortex. Neuroimaging (Huettel et al., 2005; Pessoa and Padmala, 2005) and single-cell electrophysiology (Shadlen and Newsome, 2001) research has suggested that all of these regions play an important role in decision making under uncertainty. Considerable controversy surrounds the precise function of each part of this network, particularly with respect to whether they subserve categorical selection among alternatives or ancillary processes required to make difficult decisions (such as working memory and attention). Whilst our study was not designed to address the function of these regions in decision making, it is interesting to note that comparing the two types of incorrect trials here (HF > FH) isolated voxels in a subset of these regions: the right posterior parietal cortex and precuneus, and the mediodorsal prefrontal cortex. Our data offer a tentative new perspective on the function of these regions in decision making, by suggesting that they may be involved in

Conclusions

In this report we describe how ventral visual regions respond during misperception of one object as another. Robust BOLD responses were observed in face-responsive regions of the fusiform gyrus (but not inferior occipital gyrus) when a house was perceived as a face, and it is argued that this activity may underlie ‘illusory’ face perception. Furthermore, medial frontal and parietal regions previously implicated in perceptual decision making also become active during misperception of faces.
These regions may be candidates for the source of a top-down prediction about what the forthcoming stimulus is to be. These data provide support for the notion that the perceptual system makes use of a predictive code in deciding what it is that we are seeing.

Notes

We thank Suzanne Palmer for help with data collection and analysis. This work was carried out with support from the William J. Keck Foundation, the Columbia University Provosts Academic Quality Fund, and a grant from the National Institutes of Health (#R21066129) to J.A.M. Address correspondence to C. Summerfield, Department of Psychology, Columbia University, 406 Schermerhorn Hall, 1190 Amsterdam Ave, New York, NY 10027, USA. Email: summerfd@paradox.columbia.edu.

respond ‘face’ will vary with the amount of information present in the stimulus, as at very low contrasts, most of the stimuli will be judged to be ‘not-face’ (i.e. house) stimuli. Our behavioral data, which revealed a linear trend for the bias to respond ‘house’ to increase as the stimuli were reduced in contrast (Fig. 2b), thus complement the fMRI data in supporting the idea that subjects were indeed using a ‘face prediction’ strategy. PPA responses on all trials where the subject responded ‘face’ were negative-going, indicating that parameter estimates fell below the mean. This effect persisted even when contrast-modulated regressors were removed from the design matrix, indicating that it did not occur simply because contrast regressors captured much of the available variance. One conjecture is that during discrimination, use of a ‘face-prediction’ strategy led to active suppression of brain regions coding other non-predicted stimuli in the cognitive set. However, this interpretation must remain speculative until addressed with further research.
In an earlier report, visual responses in striate (V1) and extrastriate (V2, V3) cortex were observed to track illusory perception when subjects judged whether a simple visual stimulus was present or absent (Ress and Heeger, 2003). Here, using a forced-choice discrimination paradigm, we show a similar phenomenon for more complex stimuli in higher visual regions. However, one of the limitations of using a forced-choice discrimination (rather than detection) is that it is difficult to draw conclusions about the subjective experiences of the subject, as below-threshold responses can drive behavior (Marcel, 1983). Nevertheless, one possibility is that the failure to find PPA responses during incorrect ‘house’ responses reflects the failure of a conscious ‘mispercept’ of these stimuli to form, perhaps because in our study, FFA received greater top-down input from anterior control structures. This interpretation is in line with the view that neural activity in the PPA does contribute to conscious visual perception (Tong et al., 1998) and, more generally, with the observation that while human observers may frequently ‘misperceive’ objects as faces (pareidolia), such errors are less common for natural scene stimuli.

References

Aguirre GK, Zarahn E, D’Esposito M (1998) An area within human ventral cortex sensitive to ‘building’ stimuli: evidence and implications. Neuron 21:373-383.
Ahissar M, Hochstein S (2004) The reverse hierarchy theory of visual perceptual learning. Trends Cogn Sci 8:457-464.
Andrews TJ, Schluppeck D (2004) Neural responses to Mooney images reveal a modular representation of faces in human visual cortex. Neuroimage 21:91-98.
Bar M (2003) A cortical mechanism for triggering top-down facilitation in visual object recognition. J Cogn Neurosci 15:600-609.
Bar M (2004) Visual objects in context. Nat Rev Neurosci 5:617-629.
Bar M, Aminoff E (2003) Cortical analysis of visual context. Neuron 38:347-358.
Biederman I, Mezzanotte RJ, Rabinowitz JC (1982) Scene perception: detecting and judging objects undergoing relational violations. Cognit Psychol 14:143-177.
Boynton GM, Demb JB, Glover GH, Heeger DJ (1999) Neuronal basis of contrast discrimination. Vision Res 39:257-269.
Bruce V, Young A (1986) Understanding face recognition. Br J Psychol 77 (Pt 3):305-327.
Brunia CH, Damen EJ (1988) Distribution of slow brain potentials related to motor preparation and stimulus anticipation in a time estimation task. Electroencephalogr Clin Neurophysiol 69:234-243.
Collerton D, Perry E, McKeith I (2005) Why people see things that are not there: a novel perception and attention deficit model for recurrent complex visual hallucinations. Behav Brain Sci (in press).
Cox D, Meyers E, Sinha P (2004) Contextually evoked object-specific responses in human visual cortex. Science 304:115-117.
Dolan RJ, Fink GR, Rolls E, Booth M, Holmes A, Frackowiak RS, Friston KJ (1997) How the brain learns to see objects and faces in an impoverished context. Nature 389:596-599.
Dubois S, Rossion B, Schiltz C, Bodart JM, Michel C, Bruyer R, Crommelinck M (1999) Effect of familiarity on the processing of human faces. Neuroimage 9:278-289.
Engel AK, Fries P, Singer W (2001) Dynamic predictions: oscillations and synchrony in top-down processing. Nat Rev Neurosci 2:704-716.
Epstein R, Kanwisher N (1998) A cortical representation of the local visual environment. Nature 392:598-601.
Epstein R, Harris A, Stanley D, Kanwisher N (1999) The parahippocampal place area: recognition, navigation, or encoding? Neuron 23:115-125.
Epstein R, Graham KS, Downing PE (2003) Viewpoint-specific scene representations in human parahippocampal cortex. Neuron 37:865-876.
Farah MJ, Wilson KD, Drain M, Tanaka JN (1998) What is ‘special’ about face perception? Psychol Rev 105:482-498.
ffytche DH, Howard RJ (1999) The perceptual consequences of visual loss: ‘positive’ pathologies of vision. Brain 122:1247-1260.
Friston K (2003) Learning and inference in the brain. Neural Netw 16:1325-1352.
Genovese CR, Lazar NA, Nichols T (2002) Thresholding of statistical maps in functional neuroimaging using the false discovery rate. Neuroimage 15:870-878.
Grossberg S (2000) How hallucinations may arise from brain mechanisms of learning, attention, and volition. J Int Neuropsychol Soc 6:583-592.
Henderson JM, Hollingworth A (1999) High-level scene perception. Annu Rev Psychol 50:243-271.
Huettel SA, Song AW, McCarthy G (2005) Decisions under uncertainty: probabilistic context influences activation of prefrontal and parietal cortices. J Neurosci 25:3304-3311.
Ishai A, Ungerleider LG, Haxby JV (2000) Distributed neural systems for the generation of visual images. Neuron 28:979-990.
Ishai A, Pessoa L, Bikle PC, Ungerleider LG (2004) Repetition suppression of faces is modulated by emotion. Proc Natl Acad Sci USA 101:9827-9832.
Kanwisher N, McDermott J, Chun MM (1997) The fusiform face area: a module in human extrastriate cortex specialized for face perception. J Neurosci 17:4302-4311.
Kleinschmidt A, Buchel C, Hutton C, Friston KJ, Frackowiak RS (2002) The neural structures expressing perceptual hysteresis in visual letter recognition. Neuron 34:659-666.
Link S, Heath R (1975) A sequential theory of psychological discrimination. Psychometrika 40:77-105.
Marcel AJ (1983) Conscious and unconscious perception: experiments on visual masking and word recognition. Cognit Psychol 15:197-237.
Martinez M, Benavente R (1998) The AR face database. CVC Technical Report #24.
McKellar P (1957) Imagination and thinking. London: Cohen & West.
Mumford D (1992) On the computational architecture of the neocortex. II. The role of cortico-cortical loops. Biol Cybern 66:241-251.
Murgatroyd C, Prettyman R (2001) An investigation of visual hallucinosis and visual sensory status in dementia. Int J Geriatr Psychiatry 16:709-713.
Murray SO, Schrater P, Kersten D (2004) Perceptual grouping and the interactions between visual cortical areas. Neural Netw 17:695-705.
O’Craven KM, Kanwisher N (2000) Mental imagery of faces and places activates corresponding stimulus-specific brain regions. J Cogn Neurosci 12:1013-1023.
Palmer S (1975) The effects of contextual scenes on the identification of objects. Mem Cognit 3:519-526.
Pascual-Leone A, Walsh V (2001) Fast backprojections from the motion to the primary visual area necessary for visual awareness. Science 292:510-512.
Pessoa L, Padmala S (2005) Quantitative prediction of perceptual decisions during near-threshold fear detection. Proc Natl Acad Sci USA 102:5612-5617.
Rao RP, Ballard DH (1999) Predictive coding in the visual cortex: a functional interpretation of some extra-classical receptive-field effects. Nat Neurosci 2:79-87.
Rensink RA (2000) Seeing, sensing, and scrutinizing. Vision Res 40:1469-1487.
Ress D, Heeger DJ (2003) Neuronal correlates of perception in early visual cortex. Nat Neurosci 6:414-420.
Rotshtein P, Henson RN, Treves A, Driver J, Dolan RJ (2005) Morphing Marilyn into Maggie dissociates physical and identity face representations in the brain. Nat Neurosci 8:107-113.
Shadlen MN, Newsome WT (2001) Neural basis of a perceptual decision in the parietal cortex (area LIP) of the rhesus monkey. J Neurophysiol 86:1916-1936.
Smith PL, Ratcliff R (2004) Psychology and neurobiology of simple decisions. Trends Neurosci 27:161-168.
Tallon-Baudry C, Bertrand O, Henaff MA, Isnard J, Fischer C (2005) Attention modulates gamma-band oscillations differently in the human lateral occipital cortex and fusiform gyrus. Cereb Cortex 15:654-662.
Tobler PN, Fiorillo CD, Schultz W (2005) Adaptive coding of reward value by dopamine neurons. Science 307:1642-1645.
Tong F, Nakayama K, Vaughan JT, Kanwisher N (1998) Binocular rivalry and visual awareness in human extrastriate cortex. Neuron 21:753-759.
Van Essen DC, Anderson CH, Felleman DJ (1992) Information processing in the primate visual system: an integrated systems perspective. Science 255:419-423.
Wager TD, Nichols TE (2003) Optimization of experimental design in fMRI: a general framework using a genetic algorithm. Neuroimage 18:293-309.
Warrington EK, Shallice T (1984) Category specific semantic impairments. Brain 107:829-854.
Wise SP, Boussaoud D, Johnson PB, Caminiti R (1997) Premotor and parietal cortex: corticocortical connectivity and combinatorial computations. Annu Rev Neurosci 20:25-42.
Yin R (1969) Looking at upside down faces. J Exp Psychol 81:141-145.