Yale Perception & Cognition Lab

VSS '04 Abstracts
 
 
Cheries, E., Santos, L. R., & Scholl, B. J. (2004). Units of visual identification in rhesus macaques (Macaca mulatta): Objects or unbound visual features? Poster presented at the annual meeting of the Vision Sciences Society, 5/4/04, Sarasota, FL.  
An essential task for the visual system is to structure the incoming wash of unbound visual features into coherent object representations. To date, this process has been explored in two independent research programs. Behavioral research has explored when and how infants and nonhuman primates will use featural information to individuate and identify objects across occlusion, but has not directly demonstrated that such features are bound to the objects in question. Research in psychophysics and neuroscience, in contrast, has focused on the details of how and when the binding process itself occurs, but has not demonstrated that information about binding is used in the service of other higher-level decisions. Here we try to bring these two research programs closer together, by exploring whether rhesus monkeys will use bound object representations in addition to individual unbound features to identify persisting objects over time and occlusion. Subjects observed a stage containing two stationary objects which could differ in both color and shape. After 3 repeated identical presentations (separated by the addition and removal of a screen), a final display was uncovered introducing two objects that had changed shape (e.g. green and yellow circles had changed to green and yellow triangles); changed color (e.g. a green circle and triangle had changed to a yellow circle and triangle); swapped a single feature (e.g. a green circle and yellow triangle had changed to a green triangle and yellow circle); or didn't change at all. The subjects dishabituated to all actual changes. Because the 'feature swap' condition did not introduce any new features into the display, dishabituation here demonstrates that the color and shape features were bound together into coherent objects which the subjects identified over time and occlusion. These results provide the first evidence to date that feature binding is used in monkeys' dynamic object identification.
 
Choi, H., & Scholl, B. J. (2004). The temporal dynamics of causal perception. Poster presented at the annual meeting of the Vision Sciences Society, 5/3/04, Sarasota, FL.  
In simple dynamic events we can perceive not only motion, but also higher-level properties such as causality, as when we see one object collide with another. The perception of causality often appears to be automatic and driven by highly constrained and stimulus-driven 'rules', and ever since Michotte a major project has been to uncover the nature of these rules. Here we take up a different project: beyond determining what information affects causal perception, we explore the temporal dynamics of when such effects occur. We do so by exploiting a recent dynamic grouping effect discovered by Choi and Scholl (in press): when an object (A) moves toward a stationary object (B) until they overlap completely, at which point A stops and B starts moving, observers can easily perceive noncausal 'passing', wherein one moving object passes over another. When another stationary object (C) is added to the display aligned with B, observers still see A and B as noncausal passing. In contrast, when C moves along with B, observers now perceive A and B as causal 'launching'. In our previous work, C always moved in perfect synchrony with B; here, in contrast, it sometimes moved a bit earlier or later, with this temporal offset varied parametrically. Surprisingly, we found that C's motion can induce causal perception even when it begins after the full-overlap between A and B. This illustrates a type of 'postdictive' perception: our conscious perception of the world is not an instantaneous moment-by-moment construction, but rather is formed by integrating information presented within brief extended temporal windows. Such effects have been previously demonstrated for low-level motion phenomena, but the present results demonstrate that postdictive processes can also influence higher-level event perception. These findings help to characterize not only the stimulus-driven 'rules' employed in causal perception, but also the temporal dynamics of how and when they operate.
 
Fiser, J., Scholl, B. J., & Aslin, R. N. (2004). Perception of object trajectories during occlusion constrains statistical learning of visual features. Talk given at the annual meeting of the Vision Sciences Society, 5/4/04, Sarasota, FL.  
A variety of perceptual and attentional processes contribute to the persistence of object identity during occlusion. Do these processes also constrain our implicit learning of new visual features in unfamiliar contexts, or does statistical learning operate equally well on all available information in a scene? To address this question, we studied statistical learning in 'bouncing vs. streaming' displays: two objects moved along the diagonals of a square, passing briefly behind a central occluder at the same moment. Observers can perceive this ambiguous display as either two objects 'streaming' past each other on linear trajectories, or as two objects 'bouncing' off each other while occluded (each changing direction by 90 deg). We first demonstrated that manipulating the acceleration/deceleration profiles of the objects can reliably determine whether bouncing or streaming is perceived, even when the colors and shapes of the objects change randomly each time they pass behind the occluder. Subjects in the main experiment then viewed such displays in which there were statistically reliable patterns in the transition between different features over time. Subjects viewed 192 such displays (half biased to 'bouncing', half to 'streaming') with no explicit task during familiarization. In the test phase, they then viewed two pairs and judged which was more familiar. They could readily identify coherent pairs over random pairs [t(35)=10.58, p<.0001]. More importantly, when their choice involved pairs that always appeared together in the same display during familiarization, their selection was biased to those pairs that were consistent with the bouncing or streaming interpretations imposed by the spatiotemporal characteristics of the display [t(35)=2.17, p<.05]. These results suggest that statistical learning of new visual features is constrained by perceptual and/or attentional processes involved in the computation of object persistence during occlusion.
 
Flombaum, J. I., & Scholl, B. J. (2004). A temporal same-object advantage for persisting objects: Change-detection studies of the 'tunnel effect'. Poster presented at the annual meeting of the Vision Sciences Society, 5/4/04, Sarasota, FL.  
Meaningful visual experience requires computations that identify objects as the same persisting individuals over time and motion. How does the visual system manage this task for scenes that contain occlusion and featural change? Here we explore this question using a new variant of the 'tunnel effect': When an object moves behind an occluder (the 'tunnel') and then an object later emerges, we often irresistibly perceive the continuous motion of a single persisting object. This percept occurs even when the pre- and post-occlusion views are featurally distinct, so long as the perceived trajectory is spatiotemporally consistent. Previous studies of this phenomenon have relied on verbal reports, which are notoriously susceptible to higher-level response biases. Here we introduce the first implicit measure of the tunnel effect, involving change detection in dynamic displays. Observers viewed displays in which several objects oscillated behind occluders, each emerging (a) on a spatiotemporally continuous trajectory ('Tunnel' events); (b) after a delay ('Temporal Gap' events); or (c) in an incongruous location, displaced along the occluder boundary ('Spatial Gap' events). Objects occasionally changed color while occluded, and observers had to detect these changes. Performance was significantly more accurate for Tunnel events than for either Spatial Gap or Temporal Gap events. We argue that these and related results reflect a new type of dynamic 'same-object advantage', in which change detection is improved across temporal scene fragments that are bound into the same persisting object representations. This work also illustrates how spatiotemporal properties play a key role in the perception of persisting objects even in the face of conflicting featural information. We demonstrate how several variants of this task can be used to explore the relative contributions of various cues used by the visual system in the construction of coherent visual experience.
 
Marino, A. C., & Scholl, B. J. (2004). The role of closure in defining the 'objects' of object-based attention. Poster presented at the annual meeting of the Vision Sciences Society, 4/30/04, Sarasota, FL.  
Many recent studies have concluded that the underlying units of visual attention are often discrete objects whose boundaries constrain the automatic spread of attention through a scene. However, relatively few studies have explored the particular stimulus cues that determine what counts as an 'object' of attention. Here we explore this issue in the context of the 'two-rectangles' stimuli previously used by many investigators. We first show, using both spatial cueing and divided-attention paradigms, that same-object advantages occur even when the ends of the two rectangles are not drawn. This is consistent with previous reports which have emphasized the importance of individual contours in guiding attention, and our study shows that such 'line-tracing' effects occur in these paradigms not only in uniform patterns, but also in displays which contain multiple grouping cues. In our divided-attention experiment, however, this contour-driven same-object advantage (without closure) was significantly weaker than that obtained with the standard two-rectangles stimulus (with closure) -- demonstrating that contour-based processes do not account for all 'object-based' effects. Methodologically, our study is consistent with the idea that divided-attention paradigms are a more sensitive measure of object-based effects than spatial cueing. Theoretically, our results confirm and extend the observation that same-object advantages can be observed even without full-fledged 'objects'. At the same time, however, these studies show that boundary closure -- one of the most important cues to 'objecthood' per se -- can directly influence attention. We conclude that object-based attention is not an all-or-nothing phenomenon: object-based effects can be independently strengthened or weakened by multiple cues to objecthood.
 
Mitroff, S. R., & Scholl, B. J. (2004). Online grouping and segmentation without awareness: Evidence from motion-induced blindness. Talk given at the annual meeting of the Vision Sciences Society, 5/4/04, Sarasota, FL.  
The visual system must parse and group the incoming input into discrete units, but it has proven difficult to determine when and how this occurs. Here we show that both object and group representations can be formed, disrupted, and updated without awareness. We do so using the phenomenon of motion-induced blindness (MIB), wherein salient and attended objects will fluctuate into and out of conscious awareness when superimposed onto certain global moving patterns. Previous research has shown that both objecthood and grouping influence MIB. For example, two discs will tend to enter and leave awareness simultaneously if grouped into a single unit (even by cues such as proximity), but will otherwise tend to undergo MIB independently. Here we alter various segmentation and grouping cues while two discs are unseen during MIB, and find that such changes influence whether the discs reappear independently. For example, adding a line to form a dumbbell during MIB causes two discs to reenter awareness together. Similarly, when the connecting line of an initial dumbbell is removed during MIB, the discs reappear independently. These results indicate that object representations can be formed and disrupted outside of awareness. Similar effects occur with grouping cues such as proximity. Observers viewed three evenly-spaced discs in a horizontal line, and reported when all three disappeared due to MIB. At this point -- while the discs were present but unseen -- a single randomly chosen disc gradually faded out, leaving two discs which were either close together or separated. Separated discs tended to reenter awareness independently, whereas neighboring discs reappeared simultaneously, indicating that their grouping strength had been revised outside of awareness. In these and several other examples, we illustrate how MIB can be used as a tool to determine the importance of conscious awareness for several types of visual processing. Demonstrations: http://www.yale.edu/perception/mib/.
 
Scholl, B. J., & Feigenson, L. (2004). When out of sight is out of mind: Perceiving object persistence through occlusion vs. implosion. Talk given at the annual meeting of the Vision Sciences Society, 5/1/04, Sarasota, FL.  
The visual system must not only segment scenes into discrete objects, but also track objects over time, motion, and occlusion as the same persisting individuals. Here we explore how this is accomplished in a multiple-object tracking task where the objects are intermittently invisible. Subjects attentionally tracked 4 target objects as they moved unpredictably for 10 seconds in a field of 4 featurally-identical moving distractors. Previous research (Scholl & Pylyshyn, 1999) demonstrated that tracking is unimpaired by occlusion, but is radically impaired when objects disappear and reappear in ways which do not implicate the presence of an occluding surface (e.g. by imploding and exploding into and out of existence). However, this research never determined whether the impairment was caused by the 'explosion' (due to attentional capture from onsetting looming objects) or 'implosion' (which signalled that the objects ceased to exist). Here we explored these options in several ways. We first showed that when targets behave in 'object-consistent' ways but additional distractors implode and explode, subjects' performance is unimpaired. This suggests that the behavior of the targets, rather than that of the distractors, is critical. A second experiment asked whether impairment was due to target implosion or target explosion by having all objects approach an occluder in one way and depart in a different way. Performance with implosion followed by disocclusion was significantly worse than with occlusion followed by explosion. These results argue against an attentional-capture account, and suggest that implosion cues cause the corresponding 'object files' in mid-level vision to be discarded -- whereas occlusion cues cause object files to be preserved. Following Gibson, we conclude that the visual system -- and attentional tracking in particular -- uses occlusion to infer that objects are merely going out of sight, and implosion to infer that objects are going out of existence.
 
Sussman, R. S., & Scholl, B. J. (2004). Finding the mean: The flexibility and limitations of visual statistical processing. Poster presented at the annual meeting of the Vision Sciences Society, 5/4/04, Sarasota, FL.  
We typically think of visual perception as the recovery of increasingly elaborated information about individual objects in a scene. Recent research, however, suggests that other visual processes automatically exploit regularities of scenes in order to construct 'statistical summary representations'. For example, human observers are able to quickly and effortlessly determine the mean size of a set of heterogeneous circles -- even when they cannot reliably encode information about the particular individuals which compose such a set. To investigate the flexibility of these representations, we explored the types of objects over which such processes can operate. Observers viewed scenes consisting of various shapes, and reported whether the average shape size was greater on the left or right half of the display. We first illustrate the striking flexibility of this process by demonstrating that robust statistical summary representations can be formed even over highly degraded stimuli: for example, observers can easily compare the mean sizes of a set of circles and a second set of crosses, even when both sets are presented in a single display for only 300 ms. Previous research has assumed that mean sizes are compared on the basis of area, but our results show that more fundamental shape dimensions like diameter play a critical role. We also uncover important limitations of this ability: for example, observers are unable to selectively extract only the mean height or width of a set of ellipses. By showing that the heterogeneity and complexity of the stimuli modulate the ability to selectively extract information, we emphasize the stimulus-driven, automatic nature of statistical extraction. Collectively these experiments demonstrate how visual processing is streamlined via statistical summary representations, and more precisely how such representations are constructed.
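The contrast between area and diameter in the abstract above matters because the two measures can rank the same pair of displays differently. The sketch below is only an illustration of that arithmetic, not the stimuli or analysis used in the study: the diameters are hypothetical values in arbitrary units, and the helper names (`mean_diameter`, `mean_area`, `larger_side`) are invented for this example.

```python
import math


def mean_diameter(diameters):
    """Average size of a set of circles, measured by diameter."""
    return sum(diameters) / len(diameters)


def mean_area(diameters):
    """Average size of the same circles, measured by area."""
    return sum(math.pi * (d / 2) ** 2 for d in diameters) / len(diameters)


def larger_side(left, right, measure):
    """Report which half has the greater mean size under `measure`.

    Ties are reported as "right" purely for simplicity.
    """
    return "left" if measure(left) > measure(right) else "right"


# Hypothetical diameters (arbitrary units); NOT data from the study.
left = [2.0, 2.0, 8.0]    # two small circles and one large one
right = [4.5, 4.5, 4.5]   # three medium circles

print("by diameter:", larger_side(left, right, mean_diameter))
print("by area:    ", larger_side(left, right, mean_area))
```

With these assumed values the right half has the greater mean diameter (4.5 vs. 4.0), while the left half has the greater mean area, because area grows with the square of the diameter and the single large circle dominates the average. Displays of this kind are one way an experiment could tell the two accounts apart.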