Yale Perception & Cognition Lab

VSS '05 Abstracts
 
 
Cheries, E. W., Feigenson, L., Scholl, B. J., & Carey, S. (2005). Cues to object persistence in infancy: Tracking objects through occlusion vs. implosion. Poster presented at the annual meeting of the Vision Sciences Society, 5/10/05, Sarasota, FL.  
Objects in the real world frequently move in and out of view, as when they pass behind occluding surfaces. Even infants are able to keep track of objects over time and motion in such situations, despite long occlusion intervals. What factors support and constrain this ability? Research on mid-level vision in adults suggests that persisting object representations are constrained by the precise manner of an item's disappearance at an occluding boundary. Here we explore the power of this cue, in a study of infants' numerical representations. Infants were habituated to dynamic displays of either 2 or 3 randomly moving identical items, which disappeared and reappeared from behind occluders. In the Occlusion condition, the items disappeared and reappeared gradually, via normal accretion and deletion cues along a single edge. In the Implosion condition, the items still disappeared and reappeared gradually (and at the same rate), but did so from all contours simultaneously -- 'imploding' out of existence and then 'exploding' into existence. In a test phase, which was identical across both conditions, infants' looking times were then assessed to 2 versus 3 moving objects without occluders. Infants in the Occlusion condition looked longer to test displays with a novel number of objects compared to habituation, but infants in the Implosion condition showed no such preference for the number of objects. Thus, only infants in the Occlusion condition were able to establish representations of a constant number of items over habituation. We conclude that the local manner in which an item disappears and reappears serves as a fundamental cue to the maintenance of numerical identity over time: occlusion is a cue that an object has gone out of sight, while implosion is a cue that an object has gone out of existence. More generally, these results are consistent with the idea that the same types of representations are being studied in adult mid-level vision and infant object cognition.
 
Choi, H., & Scholl, B. J. (2005). Can the perception of causality be measured with representational momentum? Poster presented at the annual meeting of the Vision Sciences Society, 5/9/05, Sarasota, FL.  
In a collision between two objects, we can perceive not only properties such as shape and motion, but also the seemingly high-level property of causality. It has proven difficult, however, for vision researchers to measure causal perception in a quantitatively rigorous way which goes beyond subjective perceptual reports. Recently researchers have attempted to solve this problem by exploiting the phenomenon of representational momentum (RM): estimates for the final position of a moving target that disappears are displaced in the direction of the motion. Hubbard and colleagues measured RM in the context of 'launching' events, wherein an object (A) moves toward a stationary object (B) until they are adjacent, at which point A stops and B starts moving. In this situation, RM for B is reduced compared to the case when B moves in isolation. This is explained by appeal to a hardwired visual expectation that a 'launched' object is inert and thus should readily cease its movement without a source of self-propulsion. A limitation of these studies, however, is that perceived causality was always associated with either (1) the number of objects in the display, or (2) the existence of spatiotemporally continuous motions -- both likely to influence RM. We studied RM for displays which did not differ in these respects, contrasting causal launching vs. non-causal 'passing' (wherein one object is simply seen to pass through another stationary object). With such displays, however, RM is no smaller for launching than for passing -- despite the fact that we first successfully replicated the results of previous experiments using these same stimulus parameters and statistical power. Our null effect for launching vs. passing replicated several times using various parameters, well matched to those in previous experiments. 
We conclude that the RM-attenuation effect may not be a pure measure of causal perception, but may rather reflect lower-level spatiotemporal correlates of some causal displays.
 
DiMase, J. S., Chun, M. M., Scholl, B. J., Wolfe, J. M., & Horowitz, T. S. (2005). Learning scenes while tracking disks: The effect of MOT load on picture recognition. Poster presented at the annual meeting of the Vision Sciences Society, 5/6/05, Sarasota, FL.  
We have a remarkable ability to recognize a large number of scenes after viewing each only briefly. This ability is significantly reduced if visual attention is diverted to a superimposed letter search task during initial encoding of those scenes (DiMase et al., OPAM 2003). How general is this impairment? In the present study, observers performed multiple object tracking (MOT) during initial encoding. This task served to both allow attentional load to be varied on different trials without changing the display and ensure that attention was broadly distributed across the scene. Additionally, prior experiments assessed scene memory after a substantial delay. The present study uses an additional, immediate (working memory) test. On each trial, observers were asked to track 0, 2, or 4 among 8 moving disks while 3 scene photographs were successively presented behind them. After five seconds, the disks stopped moving and observers indicated the ones they were asked to track. Working memory for the scenes was assessed by the immediate presentation of a single image, which was either one of the 3 scenes displayed during the trial or a completely new scene. Following the set of dual-task trials, long-term memory for the scenes was examined in a test consisting of half new pictures and half old pictures, one from each of the earlier dual-task trials. On the MOT task, observers performed better when tracking 2 items (95%) than when tracking 4 (72%). In the working memory test of scene recognition, performance decreased as MOT load increased (d' for track 0 = 2.27, track 2 = 1.69, track 4 = 1.40). Long-term scene memory was poorer than working memory and was highly impaired for both load conditions (d' for track 2: 0.62, track 4: 0.61) compared to the zero track condition (d' = 1.09). These findings suggest that the ability to encode and recognize scenes in working memory and long-term memory is dependent on the degree to which visual attention is available during presentation.
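For readers unfamiliar with the d' values reported above, the following is a minimal sketch of the standard signal-detection sensitivity index, assuming the usual definition d' = z(hit rate) - z(false-alarm rate). The abstract does not say how d' was computed or whether extreme rates were corrected, so the function name `d_prime` and the example rates are hypothetical illustrations, not the authors' analysis or data.

```python
from statistics import NormalDist


def d_prime(hit_rate: float, false_alarm_rate: float) -> float:
    """Sensitivity index d' = z(H) - z(F) for an old/new recognition test.

    Assumes both rates lie strictly between 0 and 1; inv_cdf raises
    StatisticsError otherwise, so extreme rates need a correction first.
    """
    z = NormalDist().inv_cdf  # inverse CDF of the standard normal
    return z(hit_rate) - z(false_alarm_rate)


# Hypothetical rates for illustration only (not taken from the study).
example = d_prime(0.80, 0.20)
print(round(example, 2))
```

Equal hit and false-alarm rates give d' = 0 (no discrimination between old and new scenes), and larger values indicate better recognition.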
 
Flombaum, J. I., & Scholl, B. J. (2005). Visual working memory for dynamic objects: Manipulations of motion and persistence in sequential change detection. Poster presented at the annual meeting of the Vision Sciences Society, 5/8/05, Sarasota, FL.  
Recent work suggests that the units of visual short-term memory (VSTM) are integrated object representations. The primary evidence for this comes from sequential change detection studies in static displays. Here we study how VSTM operates in dynamic displays, wherein the identities of objects must be maintained over time and motion. On each trial, participants viewed 2, 4, or 6 colored moving shapes, each of which disappeared at an invisible occluder, and reappeared 900 ms later at the occluder's other edge. In the Occlusion condition, objects disappeared and reappeared via deletion and accretion along a single edge. In the No Occlusion condition, objects simply disappeared and reappeared instantaneously. Participants had to detect color changes that occurred on half of the trials. Performance was compared (within subjects) to a Static condition, wherein objects disappeared and reappeared at the same location. Overall performance was equivalent in all conditions, but the interactions with set size proved interesting: There was no difference between conditions with 2 objects, due to a ceiling effect. With 6, performance was better in the Static condition. Most intriguingly, performance with 4 objects was actually *better* with dynamic stimuli in the No Occlusion condition. These results suggest that VSTM storage may occur differently for displays of 4 or fewer objects: up to four objects can be simultaneously attended, with motion then serving as a cue to help divide attention. Efficient VSTM encoding thus occurs in terms of integrated object representations. In contrast, attention cannot be divided over 6 objects in the first place; thus the motion cue cannot aid VSTM encoding, and is only an added distraction. Most generally, these results demonstrate that change detection can be used to explore the factors which aid or constrain VSTM in dynamic displays, and further experiments investigate how manipulations of object persistence influence VSTM encoding.
 
Junge, J. A., Turk-Browne, N. B., & Scholl, B. J. (2005). Visual statistical learning through intervening noise. Poster presented at the annual meeting of the Vision Sciences Society, 5/8/05, Sarasota, FL.  
A primary goal of visual processing is to extract statistical regularities from the environment in both space and time, and recent research on visual statistical learning (VSL) has demonstrated that this extraction can occur rapidly for even subtle correlations in homogeneous streams of stimuli. In the real world, however, most regularities do not exist in isolation, but rather are embedded in noisy and heterogeneous input streams. To explore VSL in such contexts, we measured subjects' ability to extract statistical regularities in time through intervening distractors, in a stream of shapes appearing one at a time. Novel shapes were randomly assigned to one of two color groups and within each group they were clustered into temporal 'triplets' -- three shapes that always appeared in the same order. Shapes from both color groups were then randomly interleaved, maintaining triplet order (e.g. triplets abc in red, and XYZ in green, presented in stream aXbcYZ). Subjects were instructed to perform a repetition detection task for shapes in just one color for 20 min, and were then given an unexpected forced-choice recognition task (without color cues) pitting triplets against random sequences of 3 shapes (from that same color group). This test revealed robust VSL for triplets despite the pervasive interruption by shapes from the other color. This VSL was replicated even with more tightly constrained interleaving, such that no triplet ever occurred without at least one interruption. Additional experiments report (1) whether 'interrupted' VSL of triplets can occur even in the absence of any uninterrupted pairs (aXbYcZ), and (2) whether interrupted VSL occurs even when there are no extrinsic cues (such as color) to distinguish the relevant and irrelevant items.
Overall, these demonstrations of VSL through intervening noise suggest that statistical learning may 'scale up' to more real-world contexts wherein we encounter a constantly shifting array of objects, only some of which are related.
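The interleaving described in this abstract (e.g. aXbcYZ) can be sketched as follows. This is only an illustration under stated assumptions: lowercase letters stand in for red shapes and uppercase letters for green shapes, and the helper names `build_stream` and `interleave` are hypothetical. It does not reproduce the authors' actual stimulus code, the repetition targets, or the more tightly constrained interleaving used in the follow-up experiments.

```python
import random


def build_stream(triplets, repetitions, rng):
    """Concatenate randomly ordered triplets into one within-color stream."""
    stream = []
    for _ in range(repetitions):
        order = list(triplets)
        rng.shuffle(order)          # triplet order varies from block to block
        for triplet in order:
            stream.extend(triplet)  # shapes within a triplet keep their order
    return stream


def interleave(stream_a, stream_b, rng):
    """Randomly merge two streams while preserving each stream's own order."""
    sources = ["a"] * len(stream_a) + ["b"] * len(stream_b)
    rng.shuffle(sources)            # random choice of which stream comes next
    it_a, it_b = iter(stream_a), iter(stream_b)
    return [next(it_a) if s == "a" else next(it_b) for s in sources]


rng = random.Random(0)  # fixed seed so the sketch is reproducible
red = build_stream([("a", "b", "c"), ("d", "e", "f")], repetitions=3, rng=rng)
green = build_stream([("X", "Y", "Z"), ("U", "V", "W")], repetitions=3, rng=rng)
merged = interleave(red, green, rng)
print("".join(merged))
```

Because the merge only decides which stream supplies the next item, dropping all shapes of one color from `merged` recovers the other color's stream intact. That is the sense in which triplet order is maintained despite interruptions.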
 
Mitroff, S. R., Cheries, E. W., Wynn, K., & Scholl, B. J. (2005). Cohesion as a principle of object persistence in infants and adults. Poster presented at the annual meeting of the Vision Sciences Society, 5/10/05, Sarasota, FL.  
A critical task for vision is to represent objects as the same persisting individuals over time and visual change. How is this accomplished? Across several areas of cognitive science, perhaps the most important principle is thought to be cohesion: objects must maintain single bounded contours. Infants, for example, fail to represent complex stimuli that undergo cohesion violations, such as pouring sand, as persisting individuals. Here we explore the role of cohesion in object persistence by examining such violations in their simplest form: a single object splitting into two. We first demonstrated a role for cohesion in adults' visual perception, by showing that splitting (but not similar control manipulations) yields severe performance costs in 'object reviewing' tests of persistence (Mitroff, Scholl, & Wynn, 2004, Psych. Sci.). To explore whether such simple cohesion violations affect infants' perception of persisting objects, we used a forced-choice crawling task with 10- and 12-month-olds. In the control condition, infants were shown one cracker hidden in one location and two crackers hidden in a second location. In the splitting condition they were shown a single cracker hidden in one location and then a larger cracker split into two, with the two resulting pieces hidden in a second location. Infants selectively crawled to the two-cracker location in the control, but they failed to do so in the splitting condition. Even though both conditions involved the same ultimate presentation of one vs. two crackers, the infants were unable to represent the two crackers as 'more' when they resulted from a 'split'. Together, these results with adults and infants suggest that even simple cohesion violations play a key role in the representation of objects as persisting individuals.
 
Noles, N. S., & Scholl, B. J. (2005). What's in an object file? Integral vs. separable features. Poster presented at the annual meeting of the Vision Sciences Society, 5/8/05, Sarasota, FL.  
To make sense of the world we must track objects as the same persisting individuals over time and motion. Such processing may reflect mid-level 'object file' (OF) representations, which track objects over time on the basis of spatiotemporal information while also storing some of their visual features. OFs can be explored via 'object reviewing' (OR) effects, which yield 'object-specific preview benefits' (OSPBs): discriminations of a dynamic object's features are speeded when an earlier preview of those features occurs on the same object, beyond general priming. Here we ask what information is stored in OFs. OR intrinsically requires storing some features, but previous work has suggested that this information may be abstracted, such that changing low-level features of the probe information (e.g. the font of a letter) has no effect. We explored the limits of such abstraction in a modified OR task with more complex stimuli, asking whether the features stored in OFs are always separable (such that irrelevant features can be changed without cost) or may sometimes be integral (such that varying even irrelevant features yields interference). Two faces appeared briefly on objects which then moved, after which a single probe face (rotated to a partial profile) appeared on one of the objects. Observers judged whether the emotion of the probe face matched either of the initial faces' emotions. These judgments yielded robust OSPBs -- but only when the identity of the face itself (independent of the emotion) was also maintained. Identical results were observed with inverted faces, suggesting that these results reflect visual properties, and are not related to specialized categorical processing. In contrast, experiments with simpler stimuli yielded no such differences in OSPBs. We conclude that in some cases OFs store features which are integrally related, such that changes even to task-irrelevant features of the object will foil the maintenance of object-specific information.
 
Scholl, B. J., & Alvarez, G. A. (2005). How does attention select and track spatially extended objects?: New effects of attentional concentration and amplification. Talk given at the annual meeting of the Vision Sciences Society, 5/9/05, Sarasota, FL.  
Much recent research has demonstrated that attention can be allocated to discrete objects in addition to spatial locations, but relatively little research has explored the allocation of attention within individual uniform objects. While it may be that attention spreads uniformly through relatively small objects, real-world situations (e.g. driving) often involve attending to spatially extended objects, often under conditions of motion and high processing load. Here we explore how attention is used to select and track spatially extended objects in a multiple object tracking (MOT) task. Instead of the punctate objects used in most previous MOT studies, observers had to track a number of long, moving, overlapping line segments in a field of identical distractors. At the same time, observers had to respond to sporadic probes, and their probe detection performance was used as a measure of the distribution of attention across the lines. In four experiments we discovered two novel phenomena: First, attention seems to be concentrated at the centers of the lines during tracking, despite their uniformity: probe detection was much more accurate at the centers of the lines than near their endpoints. Second, this 'center advantage' grew as the lines became longer: not only did observers get relatively worse near the endpoints, but they became better at the lines' centers -- as if attention became more concentrated as the objects became more extended. Both of these effects were unusually large and robust. Additional results suggest that these effects reflect automatic visual processing rather than higher-level strategies. Beyond demonstrating that objects can serve as units of attention, these results begin to show *how* attention is actively allocated to extended objects over time in complex dynamic displays.
 
Turk-Browne, N. B., Junge, J. A., & Scholl, B. J. (2005). Attention and automaticity in visual statistical learning. Talk given at the annual meeting of the Vision Sciences Society, 5/11/05, Sarasota, FL.  
We typically think of vision as the recovery of increasingly rich information about individual objects, but there are also massive amounts of information about relations between objects in space and time. Recent studies of visual statistical learning (VSL) have suggested that this information is implicitly and automatically extracted by the visual system. Here we explore this possibility by evaluating the degree to which VSL of temporal regularities (Fiser & Aslin, 2002) is influenced by attention. Observers viewed a 6 min sequence of geometric shapes, appearing one at a time in the same location every 400 ms. Half of the shapes were red and half were green, with a separate pool of shapes for each color. The sequence of shapes was constructed by randomly intermixing a stream of red shapes with a stream of green shapes. Unbeknownst to observers, the color streams were constructed from sub-sequences (or 'triplets') of three shapes that always appeared in succession; these triplets comprised the temporal statistical regularities to be learned. Attention was manipulated by having subjects detect shape repetitions in one of the colors. In a surprise forced-choice familiarity test, triplets from both color streams (now in black) were pitted against foil triplets composed of shapes from the same color. If VSL is preattentive, then observers should be able to pick out the real triplets from both streams equally well. Surprisingly, however, they only learned the temporal regularities in the attended color stream. Further experiments that improved learning of the attended stream failed to elicit commensurate improvements for the unattended stream. We conclude that while VSL is certainly implicit (because it occurred during a secondary task), it is not a completely data-driven process since it appears to be gated by selective attention. The mechanics of VSL may thus be automatic, with top-down selective attention dictating the populations of stimuli over which VSL operates.