Age and Species Comparisons of Visual Mental Manipulation Ability as Evidence for its Development and Evolution

Intelligent behavior is shaped by the abilities to store and manipulate information in visual working memory. Although humans and various non-human animals demonstrate similar storage capacities, the evolution of manipulation ability remains relatively unspecified. To what extent are manipulation limits unique to humans versus shared across species? Here, we compare behavioral signatures of manipulation ability demonstrated by human adults and 6- to 8-year-old children with those of an animal separated from humans by over 300 million years of evolution: a Grey parrot (Psittacus erithacus). All groups of participants completed a variant of the “Shell Game”, which required mentally updating the locations of varying set sizes of occluded objects that swapped places a number of times. The parrot not only demonstrated above-chance performance, but also outperformed children across all conditions. Indeed, the parrot’s accuracy was comparable to (and slightly better than) human adults’ over 12/14 set-size/number-of-swaps combinations, until four items were manipulated with 3–4 swaps, where performance decreased toward that of 6- to 8-year-olds. These results suggest that manipulation of visual working memory representations is an evolutionarily ancient ability. An important next step in this research program is establishing variability across species, and identifying the evolutionary origins (analogous or homologous) of manipulation mechanisms.


Supplementary Experiment 1: Computerized Variant of Shell Game (Adults Only)
To ensure that the behavioral pattern observed in the main experiment could not be attributed to either temporal decay or strategy use, we conducted a computerized variant of the Shell Game with human adults. Once more, participants were presented with 2-4 colors that swapped 0-4 times. However, in this experiment, a dwell period was enforced for each condition after all swaps were completed, such that the overall trial duration was held constant across all conditions (i.e., a 0-swap trial lasted just as long as a 4-swap trial). Thus, observed behavioral limits in manipulation ability could not be attributed to factors relating to temporal decay. Moreover, to prevent participants from using an "n-1" tracking strategy (attending to fewer items in the display and inferring the identity of an untracked item), we altered the way in which the target was tested. Here, a probe appeared around one of the items, and participants were instructed to identify the color of that target by choosing from a color bar of 8 possible color options.

Participants
Thirteen undergraduates from the Johns Hopkins University with normal or corrected-to-normal vision took part in the study in exchange for course credit. All experiments described in this study were approved by the school's internal review board. Sample size was chosen based on a similar pilot experiment containing the same number of participants.

Equipment
The experiment was conducted in a dimly lit room. Stimuli were presented on a Macintosh iMac computer with a viewable area of 43.5 x 27 cm. Viewing distance was not fixed, but averaged approximately 57 cm.

Stimuli
Prior to the onset of each trial, participants were presented with a two-digit (each measuring 1.7° x 0.85° of visual angle) verbal load at the center of the screen. All memory displays consisted of colored circles (diameter of each circle: 1.23° of visual angle), whose locations were randomly chosen from four vertices of an imaginary square (7.35° x 7.35° of visual angle) that was located at the center of the screen. The colors with which these circles could appear were randomly chosen without replacement from eight discrete categories: red, cyan, yellow, green, blue, orange, brown, and magenta. Each circle was framed by a circular white outline, whose thickness measured 0.13° of visual angle.
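For readers who want to relate the visual angles above to physical sizes on the display, the following is a minimal sketch of the standard conversion, size = 2 · d · tan(θ/2). It assumes the approximate average viewing distance of 57 cm reported under Equipment; because viewing distance was not fixed, the resulting values are approximations only. The function name and the dictionary of stimulus sizes are illustrative and are not part of the original experiment code.

```python
import math


def visual_angle_to_cm(angle_deg: float, viewing_distance_cm: float = 57.0) -> float:
    """Physical extent (cm) that subtends `angle_deg` at the given viewing distance.

    ASSUMPTION: 57 cm is only the approximate average viewing distance;
    participants' actual distance was not fixed.
    """
    return 2.0 * viewing_distance_cm * math.tan(math.radians(angle_deg) / 2.0)


# Stimulus dimensions taken from the Stimuli section (degrees of visual angle).
STIMULUS_SIZES_DEG = {
    "circle diameter": 1.23,
    "outline thickness": 0.13,
    "imaginary square side": 7.35,
}

for name, deg in STIMULUS_SIZES_DEG.items():
    print(f"{name}: {deg} deg is roughly {visual_angle_to_cm(deg):.2f} cm")
```

At 57 cm, one degree of visual angle corresponds to just under 1 cm, which is why this viewing distance is a common convention.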

Procedure
Participants completed 300 randomly shuffled trials of a computerized Shell Game task (Supplementary Figure 1). The start of each trial was marked by a central fixation cross (black, 0.5° x 0.5° of visual angle) that was presented for 500 ms. After an interstimulus interval of 100 ms, participants were presented with a two-digit verbal load that they were previously instructed to rehearse out loud (≈1 digit/sec) throughout the trial. An experimenter was seated behind the participant throughout the duration of the entire trial, to ensure compliance with these instructions. These digits (Font Size: 35, Font: Calibri, Color: white) were presented on the screen for 500 ms, and were followed by a memory display, after an interstimulus interval of 1000 ms. Each memory display consisted of a varying set size of colored circles with white outlines. This display was presented for 500 ms, and was followed by a screen in which the colors of the circles disappeared, leaving behind the white outlines. After the colors disappeared, the display remained static for a memory consolidation period lasting 1900 ms. Following this consolidation period, one of two trial types occurred (i.e., no movement or movement trials). In the no movement condition (0 swaps), the consolidation period was immediately succeeded by the presentation of the test display. In this way, performance on 0 swap trials was used to estimate limitations in storage capacity.
In the movement conditions, the consolidation period was followed by dynamic displays where two of the circular items (still defined by their white outlines) proceeded to swap positions at a rate of 100 pixels per frame. During each swap, the pair of items would move smoothly across the screen, following a parabolic trajectory, passing each other, and then come to rest in their new positions. Dynamic trials included 1, 2, 3, or 4 swaps. Each swap lasted an average of 1650 ms, and was followed by a 1600 ms reconsolidation period prior to the onset of the next event (i.e., either the test display or another swap). The selection of which items would participate in a given swap was random and unconstrained. The observer's task was to watch the swap and attempt to update their representations of where each color would appear in the test display. Thus, performance on movement trials was used to estimate the observer's ability to dynamically update color-location information in visual working memory. After all swaps and consolidation periods were completed, all items remained stationary for a variable period of time, such that overall duration was held constant across all conditions (≈16.9 seconds). This dwell period was followed by the test display. In both no movement and movement conditions, a black square outline (2.45° x 2.45° of visual angle) appeared around one of the items. Participants had to indicate the identity of that stimulus by using the mouse to click on a color that appeared in a color bar. The color bar always consisted of 8 squares that were arranged horizontally and contained all of the possible discrete colors used in the memory display. This method provides the additional advantage of characterizing the types of errors participants make in this task (reporting the identity of a color presented in the memory display vs. one that was not shown at all).
By changing the response mode in this way, participants were encouraged to encode the identities of all items.
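The logic of a trial can be made concrete with a short simulation. The sketch below is not the authors' experiment code; it only illustrates how colors are assigned to locations, how random and unconstrained swaps change which color sits at each location, and how a response can be sorted into the error types mentioned above. Several details are assumptions made for illustration: an equal split of 20 trials per condition (300 trials over 15 conditions), the integer labels for the four square vertices, and the error-type labels `wrong_item` and `not_presented`.

```python
import random
from itertools import product

COLORS = ["red", "cyan", "yellow", "green", "blue", "orange", "brown", "magenta"]
N_LOCATIONS = 4            # vertices of the imaginary square, labeled 0-3 (hypothetical labels)
SET_SIZES = (2, 3, 4)
SWAP_COUNTS = (0, 1, 2, 3, 4)
REPS_PER_CONDITION = 20    # ASSUMPTION: 300 trials split evenly over 15 conditions


def make_trial(set_size, n_swaps, rng):
    """Build one trial and track where each hidden color ends up."""
    locations = rng.sample(range(N_LOCATIONS), set_size)
    colors = rng.sample(COLORS, set_size)          # sampled without replacement
    layout = dict(zip(locations, colors))          # location -> hidden color
    initial_layout = dict(layout)
    swaps = []
    for _ in range(n_swaps):
        a, b = rng.sample(locations, 2)            # random, unconstrained pair
        layout[a], layout[b] = layout[b], layout[a]
        swaps.append((a, b))
    probe = rng.choice(locations)                  # location outlined at test
    return {
        "set_size": set_size,
        "n_swaps": n_swaps,
        "shown_colors": colors,
        "initial_layout": initial_layout,
        "swaps": swaps,
        "probe_location": probe,
        "correct_color": layout[probe],
    }


def classify_response(trial, response):
    """Sort a color-bar response into an error type (labels are hypothetical)."""
    if response == trial["correct_color"]:
        return "correct"
    if response in trial["shown_colors"]:
        return "wrong_item"        # a color from the memory display, wrong location
    return "not_presented"         # a color that was never shown on this trial


def make_session(seed=0):
    """Return a shuffled list of trials covering every set size x swaps condition."""
    rng = random.Random(seed)
    trials = [
        make_trial(set_size, n_swaps, rng)
        for set_size, n_swaps in product(SET_SIZES, SWAP_COUNTS)
        for _ in range(REPS_PER_CONDITION)
    ]
    rng.shuffle(trials)
    return trials
```

Calling `make_session()` yields 300 trials. On 0-swap trials the correct color is simply the one originally shown at the probed location, whereas on movement trials it depends on the full swap history, which is what the observer must track mentally.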

Results
Results are illustrated in Supplementary Figure 2. A 3 (set size: 2-4) x 5 (number of swaps: 0-4) within-subjects ANOVA revealed a significant main effect of set size, F(2,46)=40.09, p<.001. Performance was highest for Set Size 2, which differed from both Set Size 3, F(1,46)=28.98, p<.05, and Set Size 4, F(1,46)=61.29, p<.05. Performance for Set Size 3 was also significantly better than for Set Size 4, F(1,46)=21.24, p<.05. Furthermore, the ANOVA revealed a significant main effect of number of swaps, F(4,92)=10.11, p<.001, as performance decreased with increasing numbers of swaps. Specifically, performance was higher in the static (0 swaps) condition compared to the dynamic (1-4 swaps) conditions, F(1,92)=34.84, p<.05, suggesting that there is an increased cost associated with manipulating representations, relative to simply storing them in visual WM. The ANOVA also revealed a significant interaction of set size x number of swaps, F(8,184)=2.64, p<.03. A significant linear (Set Size 2) x linear (Set Sizes 3 and 4) contrast was observed, F(1,184)=22.79, p<.05, suggesting differential effects for manipulating 2 items in visual WM, compared to manipulating 3 or 4. Taken together, these results demonstrate little to no cost for manipulating 2 items, whereas performance systematically decreases as a function of the number of swaps when manipulating 3 or 4. These results replicate the pattern observed in the main experiment, while controlling for factors relating to temporal decay and n-1 tracking strategies.

Bootstrapping Parrot's Performance
To investigate whether the parrot's performance on each condition significantly differed from chance (50%, 33.33% and 25% for Set Sizes 2-4, respectively), we performed a bootstrap analysis. For each condition of set size by number of swaps, we randomly sampled 8 trials (with replacement) from the observed data and calculated the mean of this sample. By repeating this procedure across 10,000 iterations, we obtained a grand mean across all these samples and 95% confidence intervals. The results of this bootstrap analysis suggest that the parrot's performance across all conditions exceeded chance performance for that set size, as the 95% confidence intervals did not overlap with chance values. Supplementary Table 2 presents the means of the 10,000 iterations for each condition, along with confidence intervals and chance level for that condition. The distributions of means across the 10,000 iterations are presented in Supplementary