Humans view many types of visual scenes—including pictures as well as 3-D environments—that often contain objects or figures. Figures are those regions of the visual field that appear to be shaped by their borders and appear to be in front of the regions abutting their boundaries. Thus, figures have two attributes: They appear (1) to have shape and (2) to be closer to the viewer than the adjoining ground. These two attributes are often coupled (Peterson, 2003), although perceived shape seems to be all-or-none, whereas the perceived distance between a figure and its ground can be graded continuously.

The Rubin vase–faces stimulus in Fig. 1a is a classic figure–ground display. It is a reversible display that nicely illustrates the coupling of the two attributes of figures. When the central white region appears to be the figure, it appears to have a definite shape—that of a vase—whereas the black regions appear to be shapeless near the borders they share with the white region; they appear to simply continue behind the white figure there. In contrast, when the outer black regions appear to be figures, they appear to have definite shapes—that of two profile faces—whereas the white region appears to be shapeless near the borders it shares with the black regions; it appears to simply continue behind the black figures there. Note that the figures in the Rubin display appear to be closer to the viewer than the ground, but not much closer (for magnitude estimation evidence, see Peterson & Gibson, 1993).

Fig. 1

Examples of typical figure–ground formations. a Rubin vase–faces display. b A circular figure in front of a square background that is shapeless near the border shared with the circular figure. c A display used by Peterson and Gibson (1994a)

The relationship between attention and figure–ground perception has long been of interest, with one recurring question being whether attention is automatically allocated to surfaces in a display, relative to their background. To examine this question, Nelson and Palmer (2007) used bipartite figure–ground displays, with two equal-area regions on either side of a central border, one of which was cued by familiarity alone to be perceived as the figure (Peterson & Gibson, 1994b; Fig. 1c). Nelson and Palmer’s (2007) observers demonstrated faster and more accurate responses for targets shown on the region portraying a familiar object than on the complementary region on the other side of the central border. Nelson and Palmer (2007) took these data to suggest that attention is automatically allocated to figures. An alternative explanation, however, could be that attention is simply drawn automatically to familiar items in a display (e.g., Christie & Klein, 1995). Additionally, subjects could have adopted a strategy of allocating attention toward the side of the display where the familiar objects were presented (e.g., Mojica, Salvagio, & Peterson, 2012; Salvagio, Mojica, Kimchi, & Peterson, 2011). Thus, the question of whether attention is automatically drawn to the figural side of a shared border is not answered definitively by Nelson and Palmer’s (2007) study.

More recently Lester, Hecht, and Vecera (2009) used different stimuli and a different task to examine whether attention is automatically allocated to figures rather than to the adjoining ground. Their stimuli portrayed two regions sharing a border, but neither region portrayed a familiar object. One region was specified as nearer than the other by multiple cues, including the figural cue of convexity and the monocular depth cues of larger relative size, shading, interposition, and height in field. The larger convex region appeared to be shaped by the border it shared with the adjoining region, as figures typically do in classic figure–ground displays such as the Rubin vase–faces display and Nelson and Palmer’s (2007) displays. In addition, because the monocular depth cues enhanced the perception of depth in the displays used by Lester et al., the apparent depth step between the figure and the ground was much larger than in classic figure–ground displays and in the displays used by Nelson and Palmer (2007).

The task that Lester et al. (2009) used was a temporal order judgment (TOJ) paradigm in which, after the figure–ground display had been shown for 500 ms, two probes appeared on the adjacent figure and ground regions, one after the other. The observers’ task was to report which probe appeared first. Such TOJs can reveal where attention is allocated, because events occurring at an attended location are perceived as occurring before events at an unattended location, even when both occur simultaneously (e.g., Shore, Spence, & Klein, 2001). Lester et al.’s observers perceived the probe on the figure as appearing first even when the probe on the ground preceded it by approximately 9 ms. This evidence for prior entry for figures led Lester et al. to conclude that attention is preferentially directed toward figures in a visual display.

Although Lester et al.’s (2009) clever use of the TOJ technique with figure–ground displays strongly suggests that attention is allocated to near figural regions, their results cannot speak to whether attention is automatically allocated to near figures. This is because their base display with a near figure and a far ground was exposed for 500 ms before the first probe appeared. This is more than enough time for voluntary shifts of covert attention and, possibly, even overt shifts of attention via eye movements. Moreover, in Lester et al.’s stimuli, the near and far surfaces always shared a border that was perceived as shaping the near surface. Recent evidence that grounds are inhibited in the vicinity of borders they share with shaped figures (Likova & Tyler, 2008; Salvagio, Cacciamani, & Peterson, 2012) makes it difficult to interpret Lester et al.’s TOJ differences in terms of attentional allocation only.

In three experiments, we used a TOJ paradigm to specifically determine whether attention is biased toward the nearer of two surfaces, an important question in its own right. We separated this question from the attentional figure–ground question by using stimuli in which the far surface is not the ground to the near surface (Fig. 2). The first experiment tested objects that contained equivalent figural information but differed in perceived depth. The second experiment examined whether prior entry effects for near surfaces are automatic by eliminating the long delay between stimulus onset and target onset. A third experiment confirmed the results of Experiment 2 while controlling for the possibility that lateral inhibition of pretarget background stimuli contributed to the prioritization of nearer targets.

Fig. 2

Outline of experimental procedure and stimuli for Experiment 1

Experiment 1

In Experiment 1, we asked whether near surfaces attract attention more than do far surfaces. We used a TOJ paradigm similar to that in Lester et al. (2009), where the base display was left up for some time before the targets appeared. Our displays, shown in Fig. 2, were variants of the classic pits (concave stimulus) and bumps (convex stimulus) displays that, on the dominant assumption that light comes from above, use shading and lighting cues to indicate that one region is a bump and the other is a pit (see the left and right circular regions, respectively, in Fig. 2). Thus, unlike Lester et al.’s stimuli, there are no shaped versus shapeless (figure–ground) distinctions between the two stimuli here; for these displays, the borders of both of the circular regions appear to lie on the picture plane, whereas the interior regions of bumps appear to be nearer to the viewer and those of pits appear to be farther from the viewer. Subjects reported the temporal order of two black dot probes that appeared in the center of each base stimulus. By assessing which overlaid dot probe subjects perceived as having onset first as a function of its temporal distribution, we obtained psychometric functions describing the degree of temporal prioritization—or prior entry—one shaded circle stimulus received over the other. If the original finding by Lester et al. applies to near versus far surfaces, we expect that bumps will attract attention and will receive prior entry over pits.

Method

Subjects

Fifteen undergraduates from the University of Toronto participated for course credit. All subjects were naive as to the purpose of the study and had normal or corrected-to-normal vision.

Stimuli and design

Stimuli were presented on a 19-in. ViewSonic Graphic Series G90fb monitor (1,024 × 768 resolution; 120-Hz refresh rate). Viewing distance was held constant at 44 cm by a chinrest. A fixation point was presented in the center of the screen, which consisted of a black square (0.1° × 0.1°). The background color was gray (RGB: 181,181,181). Stimuli used to manipulate perceived surface distance were two circles that were shaded with a luminance gradient that was either brightest at the top, producing the perception of a closer convex surface, or brightest at the bottom, producing the perception of a relatively distant concave surface. Circle stimuli had a diameter of 3.6° and were presented equidistantly on either side of fixation at an eccentricity of 4.1°. Each circle stimulus had an equal chance of being presented to the left or the right. The black dot probe stimuli were 0.65° in diameter, and each was presented in the center of one of the underlying circle stimuli. The dot probe that onset first had an equal chance of appearing over the concave or convex circle stimulus.
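The stimulus sizes above are given in degrees of visual angle at a 44-cm viewing distance. As a minimal sketch, the standard visual-angle geometry can be used to recover approximate physical sizes on the screen. The function names are hypothetical, and the formula for stimulus extent treats each stimulus as centered on the line of sight, which is only an approximation for the circles presented at 4.1° eccentricity.

```python
import math

VIEWING_DISTANCE_CM = 44.0  # from the Method section


def centered_extent_cm(angle_deg, distance_cm):
    """Physical extent of a stimulus that subtends angle_deg.

    Assumes the stimulus is centered on the line of sight (an approximation
    for stimuli shown a few degrees away from fixation).
    """
    return 2.0 * distance_cm * math.tan(math.radians(angle_deg) / 2.0)


def offset_from_fixation_cm(eccentricity_deg, distance_cm):
    """Distance on the screen between fixation and a point at the given eccentricity."""
    return distance_cm * math.tan(math.radians(eccentricity_deg))


circle_cm = centered_extent_cm(3.6, VIEWING_DISTANCE_CM)        # shaded circles
probe_cm = centered_extent_cm(0.65, VIEWING_DISTANCE_CM)        # dot probes
offset_cm = offset_from_fixation_cm(4.1, VIEWING_DISTANCE_CM)   # circle centers

print(f"circle diameter: {circle_cm:.2f} cm")
print(f"probe diameter:  {probe_cm:.2f} cm")
print(f"center offset:   {offset_cm:.2f} cm")
```

Converting these values to pixels would additionally require the visible width of the display in centimeters, which is not reported here.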

Procedure

A typical trial sequence can be seen in Fig. 2. Each subject participated in a block of 20 practice trials and then 10 experimental blocks of 80 trials, which lasted approximately 1 h. Subjects viewed the display binocularly. They were instructed to fixate on the center square, which was presented on screen for a random amount of time ranging from 600 to 1,000 ms and was then followed by the onset of the pair of convex and concave circle stimuli. After a randomly jittered amount of time (600–1,000 ms), one dot probe was presented in the center of each circle stimulus, with the two probe onsets separated by one of four stimulus onset asynchronies (SOAs; 24, 48, 60, or 120 ms). Stimuli remained on screen for 72 ms and were then masked. SOAs were randomly selected from trial to trial. Subjects were asked to report the temporal order of the dot probe stimuli by pressing the “z” key if the left dot appeared first or the “/” key if the right dot appeared first. Subjects were never asked to report the stimulus’s identity but simply the spatial location of the dot that appeared first.
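The trial structure described above can be sketched as a short trial-list generator. This is an illustration under stated assumptions, not the experiment code: the names (`make_block`, `signed_soa_ms`, and so on) are hypothetical, and the sketch balances SOA, first-probe location, and convex side within each 80-trial block before shuffling, whereas the text states only that these factors were random with equal probabilities. Signed SOAs follow the convention used in the analysis, with negative values meaning that the probe on the convex stimulus came first.

```python
import random

SOAS_MS = (24, 48, 60, 120)   # SOA magnitudes from the Procedure
JITTER_MS = (600, 1000)       # range for fixation and pre-probe intervals


def make_block(rng, repeats=5):
    """Build one shuffled block: 4 SOAs x 2 first-probe x 2 sides x repeats trials."""
    trials = []
    for soa in SOAS_MS:
        for first_probe in ("convex", "concave"):
            for convex_side in ("left", "right"):
                for _ in range(repeats):
                    trials.append({
                        # negative SOA: probe on the convex stimulus appears first
                        "signed_soa_ms": -soa if first_probe == "convex" else soa,
                        "first_probe": first_probe,
                        "convex_side": convex_side,
                        "fixation_ms": rng.uniform(*JITTER_MS),
                        "preprobe_ms": rng.uniform(*JITTER_MS),
                    })
    rng.shuffle(trials)
    return trials


rng = random.Random(0)  # fixed seed only so the sketch is reproducible
block = make_block(rng)
session = [make_block(rng) for _ in range(10)]  # 10 experimental blocks
print(len(block), sum(len(b) for b in session))
```

With five repeats per cell, each block contains 80 trials, which matches the block length reported above.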

Results and discussion

Each subject’s data were fit to separate sigmoid functions, with the average performance displayed in Fig. 3. Functions consisted of the percentage of responses in which subjects reported the probe overlaying the convex circle stimulus as having onset first, as a function of SOA between the pit and the bump. Negative SOAs represented trials where the bump (convex) probe was physically presented first, while positive SOAs represented trials where the pit (concave) probe was physically presented first. Each subject’s point of subjective simultaneity (PSS) was then calculated to measure the presence of attentional bias toward one of the concurrently displayed circle stimuli. A positive PSS would indicate visual prior entry for the convex stimulus, while a negative PSS would indicate prior entry for the concave stimulus. A two-tailed one-sample t-test was used to test for visual prior entry for one of the stimuli competing for attention. The average subject perceived the dot probe overlaying the convex stimulus as arriving 9.86 ms prior to the concave dot probe, t(14) = 2.20, p < .05, indicating that the perceptually closer convex stimulus was preferentially attended over its concave counterpart. Thus, like Lester et al. (2009), we found that near surfaces were preferentially attended over farther surfaces. Importantly, this attentional bias was found in the absence of a figure–ground relationship between the two objects, indicating that perceived depth alone is sufficient to bias the allocation of attention to one object over another object. What cannot be determined from this experiment, because the base displays were shown for a long period before the targets appeared, is whether this attentional bias is automatic or arises from top-down processes.
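The analysis described above (a sigmoid fit per subject, a PSS per subject, then a one-sample t-test on the PSS values) can be sketched as follows. This is a hedged illustration, not the authors' analysis code: the text does not specify the sigmoid family or the fitting routine, so the sketch assumes a two-parameter logistic function fit by maximum likelihood with Newton's method, and all function names are hypothetical. The example data are noiseless synthetic proportions generated from a known curve, not experimental results.

```python
import math
import statistics


def fit_logistic(soas_ms, n_convex_first, n_trials, iters=50):
    """ML fit of P(convex first) = 1 / (1 + exp(-(a + b * soa))) by Newton's method.

    Returns (a, b), with b expressed per millisecond.
    """
    xs = [s / 100.0 for s in soas_ms]  # rescale SOAs for numerical stability
    a = b = 0.0
    for _ in range(iters):
        ga = gb = haa = hab = hbb = 0.0
        for x, k, n in zip(xs, n_convex_first, n_trials):
            p = 1.0 / (1.0 + math.exp(-(a + b * x)))
            w = n * p * (1.0 - p)
            ga += k - n * p          # gradient of the log-likelihood
            gb += (k - n * p) * x
            haa += w                 # entries of the (negated) Hessian
            hab += w * x
            hbb += w * x * x
        det = haa * hbb - hab * hab
        if det <= 1e-12:
            break
        da = (hbb * ga - hab * gb) / det
        db = (haa * gb - hab * ga) / det
        a, b = a + da, b + db
        if abs(da) + abs(db) < 1e-10:
            break
    return a, b / 100.0


def pss_ms(a, b):
    """SOA at which the fitted curve crosses 50% 'convex first' responses."""
    return -a / b


def one_sample_t(values, mu=0.0):
    """t statistic for H0: mean(values) == mu, with len(values) - 1 degrees of freedom."""
    n = len(values)
    return (statistics.mean(values) - mu) / (statistics.stdev(values) / math.sqrt(n))


# Synthetic example: noiseless proportions from a curve with a known PSS of 10 ms.
soas = [-120, -60, -48, -24, 24, 48, 60, 120]  # negative: convex probe first
trials = [100] * len(soas)
true_p = [1.0 / (1.0 + math.exp((s - 10.0) / 40.0)) for s in soas]
counts = [n * p for n, p in zip(trials, true_p)]

a_hat, b_hat = fit_logistic(soas, counts, trials)
print(f"recovered PSS: {pss_ms(a_hat, b_hat):.2f} ms")
```

Because responses are coded as "convex first" and negative SOAs mean the convex probe led, the fitted slope is negative, and a positive PSS means that the concave probe must lead for the two probes to appear simultaneous. Applying `one_sample_t` to the list of per-subject PSS values yields the t statistic on n − 1 degrees of freedom.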

Fig. 3

Average function for Experiment 1. Each data point represents the average performance across subjects at that stimulus onset asynchrony

Experiment 2

In Experiment 2, we examined whether evidence of prior entry for near surfaces is obtained under conditions where subjects cannot voluntarily allocate their attention to the near surface before the targets appear. In this experiment, the near and far surfaces were themselves the targets, and they appeared on a white bar that was shown on a backdrop of black and white diagonal bars (see Fig. 4). The targets were circular regions filled with stripes that were either orthogonal to or aligned with the stripes in the backdrop. The targets with stripes aligned with the backdrop were perceived as “holes” in the white bar through which the backdrop surface was visible (see Nelson & Palmer, 2001). The other circular targets, which we have termed “disks,” were filled with stripes orthogonal to those in the backdrop. These targets gave the appearance of a disk-shaped figure on top of the white bar; thus, these targets were both near and shaped. We chose the hole and disk stimuli, which differ in both shapedness and depth, to maximize the chance of finding any evidence for automatic biases in the allocation of attention. Importantly, to eliminate the effects of any voluntary shifts in attention, there was no delay between the presentation of the figure–ground stimuli and the targets (i.e., the disk and the hole are the targets), and the location of the two stimuli was randomly varied across the left and right locations. We predicted that if attention is automatically allocated to near surfaces, we should observe prior entry for near targets (disks) over far targets (holes) even when subjects cannot allocate attention to them before the targets appear.

Fig. 4

Example of stimuli used in Experiment 2

Method

Subjects

Fifteen undergraduates from the University of Toronto participated for course credit. All subjects were naive as to the purpose of the study and had normal or corrected-to-normal vision.

Stimuli and design

The design and procedure were similar to those used in Experiment 1. Stimuli consisted of an initial display of a white bar stimulus (22° × 3.8°) presented for a randomly jittered amount of time (600–1,000 ms) on a diagonal black and white line grating. Targets were circular stimuli that consisted of a black and white diagonal line grating that overlaid the white bar. The gratings making up the circles either were identical to the background, giving the appearance of a hole through the white bar, or ran in the opposite direction from the background, giving the appearance of an object (a disk) on top of the white bar. The circle stimuli had a diameter of 3.15° and were equidistant from fixation at an eccentricity of 4.15°. Trials were counterbalanced across all possible combinations of contrasting line orientations between target and background stimuli. All other aspects of the procedure were the same as those in Experiment 1.

Results and discussion

The results for Experiment 2 are summarized in Fig. 5. Each subject’s data were fit to separate sigmoid functions, and the figure shows the average performance. Functions consisted of the percentage of responses in which subjects reported the near stimulus as having onset first. Negative SOAs represented trials where the disk stimulus was physically presented first, while positive SOAs represented trials where the hole stimulus was physically presented first. Each subject’s PSS was calculated to measure the presence of attentional bias toward one of the concurrently displayed circle stimuli. A positive PSS would indicate visual prior entry for the disk stimulus, while a negative PSS would indicate prior entry for the hole stimulus. A two-tailed one-sample t-test was again used to test for visual prior entry for one of the stimuli competing for attention. On average, subjects perceived the disk stimulus as arriving 10.46 ms prior to the concurrently displayed hole stimulus, t(14) = 3.26, p < .01, indicating that the perceptually closer foreground stimulus demanded attention over its perceptually distant counterpart. Thus, even when not enough time is allowed for people to voluntarily allocate attention across near and far surfaces in advance, perceptually closer surfaces demand attention over perceptually farther surfaces that onset simultaneously.

Fig. 5

Average function for Experiment 2. Each data point represents the average performance across subjects at that stimulus onset asynchrony

Experiment 3

The results of Experiment 2 could be due to lateral inhibition of the hole target stimuli by the backdrop, because the stripes inside the hole were collinear with those outside the hole, whereas the stripes inside the disk were orthogonal to those in the backdrop. Thus, it is not clear whether enhanced attention to the disk stimulus was a consequence of its perceived nearness or whether it was instead a consequence of the hole target being inhibited more by the backdrop than was the disk target, leading to its slower processing. We therefore conducted a third experiment using the same convex and concave stimuli from Experiment 1, where the backdrop is equally dissimilar to the two targets. To test whether attention is preferentially allocated to near surfaces before time is allowed to bias one spatial region of the visual field over another (as in Experiment 1), we used the concave and convex stimuli themselves as targets, instead of dot probes. We predicted that if attention is preferentially allocated to near surfaces, we should observe prior entry again for near (convex) targets over far (concave) targets even when subjects cannot allocate attention before the targets appear or inhibit regions of the background in advance in any way that can bias the competition.

Method

Subjects

Fifteen undergraduates from the University of Montreal participated in Experiment 3. All subjects were naive as to the purpose of the study and had normal or corrected-to-normal vision.

Stimuli and design

The stimuli used in Experiment 1 were used again in Experiment 3. The design and procedure were identical to those in Experiment 1, with the exception that, instead of dot probes being presented as targets, the full pit (concave stimulus) and bump (convex stimulus) stimuli were presented as competing stimuli in the TOJ task.

Results and discussion

The results for Experiment 3 are summarized in Fig. 6. Data were analyzed using separate sigmoid functions following the same procedure as that in Experiments 1 and 2. Functions consisted of the percentage of responses in which subjects reported the convex circle stimulus (the “bump”) as having onset first, as a function of SOA between the pit and the bump. Negative SOAs represented trials where the bump (convex) stimulus was physically presented first, while positive SOAs represented trials where the pit (concave) stimulus was physically presented first. A positive PSS would indicate visual prior entry for the convex stimulus, while a negative PSS would indicate prior entry for the concave stimulus. A two-tailed one-sample t-test was used to test for visual prior entry for one of the stimuli competing for attention. The average subject perceived the convex stimulus as arriving 9.39 ms prior to the concave stimulus, t(14) = 2.30, p < .05, indicating that the perceptually closer convex stimulus was again preferentially attended over its concave counterpart.

Fig. 6

Average function for Experiment 3. Each data point represents the average performance across subjects at that stimulus onset asynchrony

General discussion

The present results suggest that near surfaces automatically bias attention. In three experiments, we found visual prior entry for near surfaces or near objects when depth cues are compelling. Experiment 2 showed that these results could not be due to the voluntary allocation of attention, since no time was allowed for this to occur before the targets were presented. Experiment 3 confirmed this result by controlling for lateral inhibition of pretarget background stimuli. This is the first time visual prior entry for near surfaces has been obtained under conditions where the effects were not potentially due to volitional attention. Thus, our results support the hypothesis that attention is biased to near surfaces over far surfaces. Our results extend the previous results reported by Lester et al. (2009), since their TOJ effects could have been due to the top-down volitional allocation of attention during the presentation of their stimuli or to the inhibition applied to ground surfaces near the borders they share with figures.

Lester et al. (2009) used displays in which one region was specified as nearer than the other by multiple monocular depth cues, as well as by the figural cue of convexity. They interpreted their results as evidence that figures automatically attract attention. Although our results show that surfaces specified to be closer to the viewer by monocular depth cues automatically attract attention, they do not necessarily imply that shaped figures like those in Fig. 1 automatically attract attention. Indeed, because Lester et al. did not separately examine depth and shape cues, their results too may apply only to near surfaces. Related to this point, Mojica et al. (2012; Salvagio et al., 2011) found that regions perceived as figures by virtue of configural cues only, and not depth cues, are given attentional priority only under conditions of spatiotemporal uncertainty.

It is possible that near surfaces are automatically attended to in the same way that a reflexive peripheral cue is automatically processed (e.g., Luck & Thomas, 1999). One possible ecological reason for this early prioritization of near surfaces stems from the fact that we interact with and grasp objects that are near to us. Thus, near surfaces have more motivational significance to us, as compared with background objects, and might be prioritized in the same way as other types of motivationally significant stimuli.

Taken together, Lester et al.’s (2009) experiments and our experiments clearly show the advantages of using TOJs to determine attention capture effects, since the paradigm is very sensitive to differences in visual stimuli and can measure capture without the complication of either peripheral or central attentional cues. Because of this, it may be possible to use TOJs to reexamine the automaticity of many attentional phenomena that have been largely based on cue and reaction time tasks, such as object-based attention (e.g., Egly & Homa, 1984) or attentional control sets (e.g., Folk et al., 1992).