Using an Innovation Arena to compare wild-caught and laboratory Goffin’s cockatoos

The ability to innovate, i.e., to exhibit new or modified learned behaviours, can facilitate adaptation to environmental changes or exploiting novel resources. We hereby introduce a comparative approach for studying innovation rate, the ‘Innovation Arena’ (IA), featuring the simultaneous presentation of 20 interchangeable tasks, which subjects encounter repeatedly. The new design allows for the experimental study of innovation per time unit and for uncovering group-specific problem-solving abilities – an important feature for comparing animals with different predispositions and life histories. We applied the IA for the first time to investigate how long-term captivity affects innovative capacities in the Goffin’s cockatoo, an avian model species for animal innovation. We found that fewer temporarily-captive wild birds are inclined to consistently interact with the apparatus in comparison to laboratory-raised birds. However, those that are interested solve a similar number of tasks at a similar rate, indicating no difference in the cognitive ability to solve technical problems. Our findings thus provide a contrast to previous literature, which suggested enhanced cognitive abilities and technical problem-solving skills in long-term captive animals. We discuss the impact and discrepancy between motivation and cognitive ability on innovation rate. Our findings contribute to the debate on how captivity affects innovation in animals.

subject was tested repeatedly (once per test day) until it either: (a) did not solve an additional task in five consecutive sessions, or (b) did not solve any task in ten consecutive sessions. This approach allowed us to compare the number of solutions achieved as a function of time across sessions until no new innovations occurred as well as group-specific task difficulty levels.
We apply the IA for the first time to compare two groups of the same species but with different life histories and experiences, laboratory-raised and wild-caught. Some aspects of innovation, such as technical problem-solving (including tool use), seem to be enhanced in several species under long-term captive conditions, a phenomenon known as 'captivity effect' or 'captivity bias' 29-35 . While short-term captivity may already boost performance through forced proximity or food deprivation 29,30 , long-term captivity has been proposed to directly impact cognitive abilities due to enculturation through the extensive exposure to artificial environments and objects as well as the proximity to and frequent interaction with humans (e.g., 36,37 ). The existing evidence concerning captivity bias on innovation is based largely on observational and often anecdotal data. There are relatively few attempts to directly compare problem-solving performance of wild and captive animals 27,28,31,32,37-39 with only two studies on kea and one on hyenas targeting innovative problem-solving performance between wild and long-term captive animals. In both cases the wild subjects were free-ranging and tested in their natural habitat. While ultimately an important undertaking, such studies can be hard to interpret due to the lack of control over social and environmental distractors, as acknowledged by the respective authors 32 .
In this study, we investigate possible effects of long-term captivity on the innovative capacity in Goffin's cockatoos. This parrot is an opportunist-generalist endemic to the Tanimbar Islands, a small archipelago in the Moluccas, Indonesia 40,41 . It has become an important avian model for investigating innovation and problem-solving (see review in 42 ) as innovativeness is often linked to opportunism 5,15,43 and avian technical problem-solving skills parallel higher primates (e.g., 44,45 ). Goffins can innovate sophisticated forms of tool use and manufacture 18,46,47 despite not being dependent on tool-obtained resources in the wild 40,41 . However, all previous findings so far were exclusively based on a long-term laboratory population of Goffins which is problematic as innovativeness in captivity does not necessarily predict innovativeness in the wild 37 . Thus, we consider comparing innovative behaviour and innovation rate in laboratory-raised versus wild-caught Goffins an important next step and an ideal opportunity to apply the IA approach for the first time.
A captivity bias in problem-solving ability would imply an increased innovation rate in laboratory-raised subjects, particularly when considering human-made, artificial problems. Wild-caught birds should find fewer solutions, might show a preference for previously solved tasks across sessions, and have more problems in detecting novel affordances than laboratory birds.

Results
Qualitative analysis of apparatus-directed behaviour. Within both test groups (laboratory ('Lab') and wild-caught ('Field')) significant differences in motivation to interact with the apparatus were observed. The subjects were either very active and engaged with the apparatus readily and for an extended period of time or showed hardly any interest. If individuals did not touch any task within the first 3 min of a test session, their attention was guided towards the IA with food rewards. Individuals who required an implementation of this 'motivational' protocol at any point of this study (see Supplementary Methods for details) were classified as 'unmotivated' , whereas individuals who did not require any motivational protocols were classified as 'motivated' . The ratio of motivated to unmotivated subjects was higher in the Lab group (10 out of 11) than in the Field group (3 out of 8; Fisher´s exact test: p = 0.04).
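As an illustration of the group comparison above, the following minimal sketch computes a two-sided Fisher's exact test for the reported counts (Lab: 10 motivated, 1 unmotivated; Field: 3 motivated, 5 unmotivated). The function name and the "sum of all tables no more probable than the observed one" definition of the two-sided p-value are assumptions made for this example; they are not taken from the authors' R code.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of all tables with the same
    margins that are no more probable than the observed table.
    """
    row1 = a + b          # e.g. number of Lab birds
    col1 = a + c          # e.g. number of motivated birds
    n = a + b + c + d     # total sample size
    denom = comb(n, row1)

    def prob(k):
        # Probability that k of the row1 birds fall into column 1.
        return comb(col1, k) * comb(n - col1, row1 - k) / denom

    k_min = max(0, row1 - (n - col1))
    k_max = min(row1, col1)
    p_obs = prob(a)
    # Small tolerance guards against floating-point ties.
    return sum(
        prob(k) for k in range(k_min, k_max + 1) if prob(k) <= p_obs * (1 + 1e-9)
    )


# Rows: Lab, Field; columns: motivated, unmotivated (counts from the text).
p_value = fisher_exact_two_sided(10, 1, 3, 5)
print(round(p_value, 3))  # 0.041
```

With these counts the sketch gives p ≈ 0.041, consistent with the reported p = 0.04.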

Quantitative analysis of apparatus-directed behaviour.
A Bartlett's test revealed significant correlations (χ² = 1203.5, df = 15, p < 0.001) between the measured apparatus-directed behaviours. We therefore used a principal component analysis before inclusion in the model. It revealed two components which together explained 76.7% of the variance (Supplementary Table S2 shows PCA output). 'Principal Component 1' (PC1) entailed frequency of contact with tasks (baited and solved), duration of time in proximity and the number of tasks touched and explained 58.6% of the variance. 'Principal Component 2' (PC2) loaded negatively on number of tasks touched but not solved, positively on contact with already solved tasks and described 18.2% of the total variance. Post-hoc analysis revealed that the Lab group showed significantly higher levels of PC1 (W = 78, p < 0.001; see Supplementary Fig. S1) and PC2 (W = 77, p < 0.001) than the Field group.
All motivated individuals had mean levels above 0 for PC1 (range: 0.148-1.972), while values for unmotivated subjects ranged from −2.714 to −2.2, supporting our original classification (see Supplementary Table S1).

Probability to solve. We used a Generalized Linear Mixed Model to investigate the probability to find solutions (see Methods for a detailed description). There was a significant combined impact of Group, PC1 and PC2 on the probability to solve (full-null model comparison: χ² = 29.64, df = 3, p < 0.001). We then dropped one term at a time from the model while controlling for others. This procedure reveals whether or not the dropped term (e.g., Group) has a significant influence on the probability to solve when all other measures are kept constant. We found no significant effect for Group (estimate = −0.089, SE ± 1.012, z = −0.09, p = 0.945). An increase in PC1 or PC2 significantly increased the probability to solve tasks (PC1: estimate = 2.713, SE ± 0.588, z = 4.61, p < 0.001; PC2: estimate = 0.906, SE ± 0.315, z = 2.87, p = 0.003). Additionally, we found a significant effect of Session: more tasks were solved in later than in earlier sessions (estimate = 1.719, SE ± 0.526, z = 3.27, p = 0.001) (see Fig. 3, Table 1 and Table 2 for model output).
Individual tasks seemed to be of varying difficulty (Fig. 4, Table 2) but overall, Lab and Field groups did not vary significantly in their success depending on the task (comparison of full model with reduced model lacking random slope of Group within Task: χ² = 7.589, df = 5, p = 0.18).
Post-hoc analysis revealed a significant interaction of Group and Session (estimate = 2.924, SE ± 0.854, z = 3.423, p = 0.001; see Table 3 for an overview of all statistical tests and results).
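The likelihood ratio statistics reported in this section can be converted to p-values with the chi-square survival function. The sketch below is only an illustration: it uses a closed form that holds for odd degrees of freedom (which covers the df of 3, 5 and 15 reported here), and the function name `chi2_sf_odd_df` is hypothetical rather than part of the authors' analysis.

```python
import math


def chi2_sf_odd_df(x, df):
    """P(X > x) for a chi-square variable with an odd number of df.

    Uses erfc(sqrt(x/2)) plus a finite sum of terms
    sqrt(2/pi) * exp(-x/2) * x**(j - 1/2) / (2j - 1)!!  for j = 1..(df-1)/2.
    """
    if df < 1 or df % 2 == 0:
        raise ValueError("this closed form only applies to odd df >= 1")
    if x <= 0:
        return 1.0
    tail = math.erfc(math.sqrt(x / 2.0))
    term = math.sqrt(2.0 * x / math.pi) * math.exp(-x / 2.0)  # j = 1 term
    total = 0.0
    for j in range(1, (df - 1) // 2 + 1):
        total += term
        term *= x / (2 * j + 1)  # step from term j to term j + 1
    return tail + total


# Statistics as reported in the text.
p_full_null = chi2_sf_odd_df(29.64, 3)       # full-null model comparison
p_task_by_group = chi2_sf_odd_df(7.589, 5)   # random slope of Group within Task
print(p_full_null < 0.001, round(p_task_by_group, 2))
```

Under these assumptions the full-null comparison falls well below 0.001 and the task-by-group comparison comes out near 0.18, in line with the values given above.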

Discussion
Our study successfully introduced the IA as a new paradigm to compare innovative behaviour. It is, to our knowledge, the first study specifically targeting innovation rate per time unit in animals and the first systematically controlled direct comparison of problem-solving between captive-born and temporarily wild-caught animals. It yielded a number of interesting findings, with the most significant one being that long-term captivity does not seem to affect the Goffins' overall capacity to innovate in the IA but rather their motivation to do so.
We measured apparatus-directed behaviours and used a PCA to extract principal components. PC1 was affected by task proximity and physical contact with the apparatus while PC2 loaded positively on contact with already solved tasks and negatively on the number of tasks touched but not solved. Such behaviours are commonly used as measures for motivation (reviewed in 12 ). Post-hoc tests revealed that both components were considerably higher in laboratory-raised than in wild-caught subjects ( Supplementary Fig. S1). Although a unified definition of motivation is mostly lacking, two types are commonly recognized (see 48,49 for review). The first, extrinsic motivation, is driven largely by external influences, such as food rewards, while the other, intrinsic motivation, refers to motivation facilitated by gratification from the task itself or 'interest' (e.g., 48,50 ) which is largely driven by exploration and curiosity 51 . Note that a lack of interest does not mean that an animal also lacks the overall cognitive competency to find the appropriate solution. Nonetheless, intrinsic as well as extrinsic motivation is often one of several components underlying performance, both on a species 48,52,53 , as well as on an individual level 13,14,54-59 .
www.nature.com/scientificreports
The difference in subjects' motivation between the two groups was clearly expressed in ratio rather than in degree: all six unmotivated birds (5 wild-caught, 1 laboratory) rarely approached the setup or interacted with the tasks (see Supplementary Table S1). In contrast, the remaining birds (3 wild-caught and 10 laboratory) consistently maintained their interest in the setup (Supplementary Fig. S2) and discovered a similar number of solutions at the same rate (Supplementary Fig. S3) despite being presented with a highly complex, artificial and novel setup. If any cognitive components, aside from motivation, required to find solutions in the IA were fundamentally different in wild-caught versus laboratory-raised birds, we would have expected the motivated wild-caught birds to perform at a different rate than the laboratory-raised birds. In fact, when motivation was controlled for in our model, group identity, i.e., being either laboratory-raised or wild-caught, did not predict the probability of finding solutions to the technical problem-solving tasks in the IA (Fig. 3, Table 1 for estimates). This suggests that the cognitive capacity required for technical problem-solving in this species does not seem to be solely an artefact of the captive lifestyle or experimental history. Note, however, the observed interaction of group and session: Group identity seemed to have more impact in earlier sessions (Fig. 3c), which might be largely due to unmotivated birds.
Our results thus suggest that motivation is the main, if not the sole, cause for the differences observed. In other words, it is plausible that both wild-caught and laboratory-raised birds can (have the general capacity) perform at a similar level within the IA if they want to (are motivated to interact with the tasks). A captive life history does not seem to influence any other aspects of the overall cognitive capacity required to innovate in the IA.
Our findings contrast with previous literature, which suggested enhanced cognitive abilities and technical problem-solving skills in captive animals as a consequence of increased free time and energy as well as exposure to human environments and contact (e.g., 1,29,32,36 ). Whereas this may indeed be true for some species, particularly those closely related to humans, it may not apply to all. In fact, we believe it is possible that a long-term laboratory environment, even one featuring daily enrichment protocols, may fail to offer a greater cognitive challenge for an island dwelling parrot, such as the Goffin, than their natural habitat. The Tanimbar Islands provide a variety of feeding sources, including hard shelled, underground, patchy and/or seasonal items that are opportunistically exploited by the birds 40,41 . The arguably less predictable environment of wild Goffins coupled with their opportunity to encounter a greater variety of different problems and situations might have facilitated their flexibility 60 . Such opportunism is likely to allow the motivated wild-caught birds to swiftly adjust their problem-solving abilities to novel and artificial extractive foraging tasks. However, further studies are required as until now only a handful of studies have directly compared wild and captive-born groups of the same species in problem-solving performance under the same controlled conditions 32,54 .
Concerning the difference in motivation, there are several theoretical explanations for the observed greater proportion of motivated laboratory-raised versus wild-caught Goffins, some of which are less likely than others. The wild-caught birds did not seem to have problems seeing the reward through the acrylic glass because the vast majority of unmotivated birds solved at least one task during the course of this study. Furthermore, it is unlikely that different levels of neophobia substantially influenced subjects' motivation. Although we cannot fully exclude residual neophobic reactions, we followed a detailed habituation protocol (see Supplementary Methods) to minimize its effect. Before test sessions started, all birds consumed a food reward from the top of each IA task while being individually separated.
It is also unlikely that the discrepancy in motivation was caused to a large degree by different levels of extrinsic motivation resulting from different types of food rewards: We used cashews for laboratory-raised and dry corn for wild-caught birds. While a qualitative difference of cashew over corn could in some cases impact motivation, here, the food rewards were carefully chosen according to the animals' top foraging preferences. From all food items offered from within their known feeding repertoire, laboratory-born birds showed the highest preference for cashew nuts and wild-caught birds for corn (see Supplementary Methods for details). All wild-caught subjects immediately started consuming their rewards throughout habituation and also consumed the corn placed at the start position at the beginning of each test session.
In contrast, the observed differences in ratio of motivated birds might be explained by different outcome expectancies. Laboratory-raised birds that participated in many problem-solving tasks 18,44,46,61 may have developed an expectancy to gain food through apparatus interaction, while the wild-caught birds were naïve to such tasks prior to the start of testing. Various expectancy theories on motivation (e.g., 62-66 ) predict that the expectation of an individual, i.e., the perceived likelihood of an occurrence, is influenced by past feedback and affects persistence during task acquisition. Thus, the unmotivated wild-caught birds may not have been as routinely interested in the test setup as a typical laboratory-raised bird would have been. Additionally, many captive-held animals, including a variety of parrot species, show the phenomenon of contra-freeloading, i.e., they willingly put effort into obtaining food (e.g., from an apparatus) although freely available food is simultaneously accessible 67,68 . The laboratory subjects are usually very eager to participate in experiments which suggests that tests might be considered as foraging enrichment substituting natural challenges. Nevertheless, one subject from the laboratory group showed similar low motivation to interact with the IA as the five wild-caught birds, suggesting that expectancy based on experience and contra-freeloading cannot fully explain the observed difference in interest even though it might influence the proportion of motivated subjects.
Despite the extensive problem-solving pre-experience in laboratory-raised birds, both wild-caught and laboratory birds seemed to encounter similar overall difficulties with similar task types (see Fig. 4 and Table 2 for estimates and ranks of task complexity). Subjects of both groups performed better at tasks requiring a non-repetitive movement which was independent of the reward release mechanism (lateral slash, pulling, pushing, shoving, e.g., 'Seesaw' , 'Shovel' , 'Swish' or 'Drawer'; see Fig. 4 and Table 2). Tasks that included repetitive actions, such as biting through toilet paper or turning a mill, seemed to cause more difficulties; however, turning a disk in the 'DJ' task did not. In contrast to the DJ, in both the 'Mill' and the 'Bite' task, the food was possibly out of the subject's sight during the manipulation itself, suggesting that confounded visibility of the reward during the manipulation may have also affected performance 69 . Less clearly structured tasks, in which the functional mechanism was not positioned directly at the reward but was slightly displaced (e.g. 'Wire' and 'Twig'), also proved difficult for the subjects. Although we did not find a significant difference in group on overall task difficulty, laboratory birds tended to perform better in the 'Button' task than wild-caught birds (Fig. 4, Table 2). This task required the subjects to press a bolt to release the reward. In this singular case, experimental experience seemed to be of advantage. All laboratory-raised birds had previously participated in experiments involving the use of stick tools to push a reward 46,47,61,70,71 whereas in their natural habitat wild Goffins are unlikely to require such motor patterns.
Qualitatively, the birds used different techniques to solve tasks. One example is the Bite task where toilet paper was attached with big clips on both sides and held in place by smaller clips at the bottom. Subjects discovered three different non-exclusive solutions to solve this task: They shredded the material with repetitive biting actions, pulled the paper laterally out of the big clips, or removed smaller clips which subsequently allowed them to push the paper inside and access the reward. Individuals used multiple techniques -often in a combined manner -signifying that Goffins do not seem to persistently remain with previously learned motor routines for the same task 18 . The 'Clip' task was mostly solved by applying pressure simultaneously to both sides of the clips but some subjects would position their beak at the side of the aluminium coil distal to the reward and pull or push the sides in opposing vertical directions. The laboratory birds also opened the Wire task in several instances by removing the window hinges (which were closer to the reward) instead of unbending the wire, suggesting a proximity-based innovative strategy possibly due to a conflict in subjects' attentional focus 72 . Both wild-caught and laboratory-raised birds utilized both beaks and feet, particularly in tasks that required insertions. Parrots have highly sensitive soles containing Herbst corpuscles 73 that can be used for haptic exploration and problem-solving (e.g., 69 ).
In summary, our study underlines both the feasibility as well as the informative value of an IA approach for direct comparisons of innovation rate. We were able to compare the identity and number of tasks solved across sessions by wild-caught and laboratory-raised groups as well as the difficulties encountered with specific task features. Notably, a laboratory-raised life history did not seem to affect the cognitive processes required for high rate innovation in the IA aside from motivation which suggests the lack of an overall cognitive shift towards enhanced problem-solving abilities in laboratory-raised versus wild-caught Goffins. However, more laboratory-raised birds were interested in the problem-solving setup than wild-caught birds. Highly controlled comparisons with identical tasks on wild and laboratory-raised populations of the same species are crucial for and enriching to the interpretation of experiments conducted under long-term laboratory conditions 54 .
At this stage, we can only speculate whether the observed difference in motivation is due to life in long-term captivity or due to experimental history. A fruitful avenue for future research might therefore be to test experimentally naïve but laboratory-raised birds using the IA. In order to study the role of the Goffins' natural habitat, a next step may include comparisons of innovation rate between the Goffins and closely related non-island Corella species. Another important future direction is to conduct interspecific comparisons that additionally include more distantly related species, such as primates, using the IA. Ultimately, comparative investigations of animal innovation enhance our understanding of the evolution of problem-solving in various taxa.

Subjects.
We tested 11 adult laboratory (four females; seven males; 6-10 years of age) and eight wild-caught Goffins (six females; two males; age unknown) between March and December 2017. Laboratory-raised birds were purchased as juveniles from certified European breeders and housed at Goffin Lab Goldegg, Austria. Wild birds were caught on the Yamdena island in the Tanimbar archipelago, Indonesia, and kept temporarily at Goffin Lab Tanimbar field station (see Supplementary Methods and Table S1 for a detailed description of subjects, housing, experimental history, and the capture-release procedure).

Apparatus. The Innovation Arena (see Fig. 1) consisted of 20 acrylic glass boxes (base: 15 cm × 16 cm × 17.51 cm; height: 16 cm) which could be interchangeably arranged in a semicircle (distance from central point: approx. 1 m) on a platform. The top surfaces of each box served as lids which could be opened for baiting and were secured by long bars placed across multiple boxes during testing. The bases were screwed directly on the wooden platforms to keep each box in place, while the rest of the cuboid could be placed on any of the 20 positions. Each box constituted a different technical problem task requiring various motor actions to solve (see Fig. 2). All tasks were novel to all subjects. The positions of tasks were randomly re-arranged for each session (for the Lab group randomization was restricted to no task being at the same position twice per subject).

Procedure. To control for different levels of neophobia, subjects were first habituated to the apparatus (see Supplementary Methods and Table S1 for details). In test sessions each task was baited before individuals entered the test compartment voluntarily and were allowed 20 min to solve as many tasks as possible (rationale: 1 min per task). If a subject showed behavioural signs of agitation, the session was aborted. We applied a motivational protocol (see Supplementary Methods) if a subject did not touch any tasks within 3 min to enhance motivation. For each session task positions were reassigned and all tasks were rebaited. Testing was repeated until a subject either: (a) did not find solutions to additional tasks in 5 consecutive sessions, or (b) did not solve any task in 10 consecutive sessions. We used small pieces of cashew nuts as rewards for laboratory birds and dried corn for wild-caught subjects. Both rewards were identified as the food with highest preference value from a variety of available options (see Supplementary Methods) and were only provided for testing purposes. Because wild Goffins fed substantially longer on dry corn kernels than laboratory birds fed on small cashew pieces, the timer during test sessions in the Field group was paused if feeding on a corn kernel exceeded 3 sec. Timing was resumed once the feeding had stopped. Laboratory birds never fed longer than 3 sec on one reward. All sessions were video-recorded by a wide-angle camera (Goffin Lab Goldegg: Dahua DH-SD22204T-GN, Goffin Lab Tanimbar: GoPro Hero 3 White) mounted at the ceiling above the IA.
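The two stopping criteria can be sketched as a small function over a subject's session history. This is a hedged illustration only: the function name, the data layout (one set of solved task names per session) and, in particular, the assumption that criterion (a) starts counting only once a subject has solved at least one task are choices made for this example and are not stated in the text.

```python
def stopping_point(sessions, no_new_limit=5, no_solve_limit=10):
    """Return (session_number, criterion) at which testing would stop, or None.

    sessions: list of sets, each holding the names of the tasks a subject
    solved in one session, in chronological order.
    Criterion 'a': no additional (not previously solved) task in
                   `no_new_limit` consecutive sessions.
    Criterion 'b': no task solved at all in `no_solve_limit` consecutive sessions.
    """
    solved_so_far = set()
    without_new = 0   # consecutive sessions without a newly solved task
    without_any = 0   # consecutive sessions without any solved task
    for number, solved in enumerate(sessions, start=1):
        new_tasks = solved - solved_so_far
        solved_so_far |= solved
        without_any = 0 if solved else without_any + 1
        if new_tasks:
            without_new = 0
        elif solved_so_far:
            # Assumption: (a) only applies after a first solution exists,
            # otherwise it would always pre-empt criterion (b).
            without_new += 1
        if without_new >= no_new_limit:
            return number, "a"
        if without_any >= no_solve_limit:
            return number, "b"
    return None


# Hypothetical histories, using task names from the paper purely as labels.
solver = [{"Seesaw"}, {"Seesaw", "Drawer"}, {"Seesaw"}, {"Drawer"},
          set(), {"Seesaw"}, {"Drawer"}]
never_solved = [set() for _ in range(10)]
print(stopping_point(solver), stopping_point(never_solved))
```

In the first hypothetical history the last new task appears in session 2, so criterion (a) is met after session 7; the second history meets criterion (b) after session 10.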

ethics. The study on the laboratory subjects was approved by the Ethics and Animal
Behavioural coding. Videos were analysed using Behavioral Observation Research Interactive Software (BORIS; version 6.0.5) 74 . We coded which tasks were touched, which tasks were solved, and apparatus-directed behaviours (see Supplementary Table S4 for a detailed coding protocol).
Statistical analysis. Principal component analysis. We used multiple variables to measure apparatus-directed behaviours: the number of contacts with baited ('BaitedContact') and solved tasks ('SolvedContact'), time spent within the 20 cm proximity grid of the tasks ('GridTime'), latency to approach within 20 cm ('LatencyGrid'), the number of tasks touched ('TasksTouched') and the number of tasks touched but not solved ('TouchedNSolved'). A Bartlett's test revealed significant correlations (χ² = 1203.5, df = 15, p < 0.001). We therefore used a Principal Component Analysis with orthogonal rotation to avoid issues of collinearity in the model fitted later. Beforehand, we inspected the distributions of the variables and log-transformed 'LatencyGrid' to avoid influential cases. The PCA yielded two components (PC1 and PC2) with eigenvalues above 1 (Kaiser's criterion 75 ), which were thus included in the model as covariates.
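For reference, the standard form of Bartlett's test of sphericity is given below; it is stated here as background and is not quoted from the authors' methods. With $n$ observations, $p$ variables and sample correlation matrix $\mathbf{R}$, the statistic and its degrees of freedom are as follows, and the six behavioural variables listed above give the reported df of 15.

```latex
\chi^2 = -\left[(n - 1) - \frac{2p + 5}{6}\right] \ln\lvert\mathbf{R}\rvert,
\qquad
df = \frac{p(p - 1)}{2} = \frac{6 \cdot 5}{2} = 15 \quad (p = 6).
```

A large statistic indicates that the correlation matrix differs from the identity matrix, i.e., that the variables are sufficiently correlated for a PCA to be meaningful.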
Probability to solve. We used a Generalized Linear Mixed Model with binomial error structure and logit link function 76 to analyse the effect of 'Group' (Field vs. Lab) on the probability to solve a task and included the session number ('Session') and the two components resulting from the PCA as control predictors. Prior to fitting the model, we z-transformed all covariates to a mean of zero and a standard deviation of one to obtain more easily interpretable estimates 77 . Group was manually dummy coded and centered prior to inclusion. We included random intercepts for Subject and Task in addition to factors combining Subject and Task ('Subj.Task') as well as Subject and Session ('SessionID') in the model. Due to the low number of males in the Field group we did not include sex as a factor in the model. However, for interested readers a graph illustrating the collected data can be found in Supplementary Fig. S4. The model entailed random slopes within Task (PC1, PC2, Session, and Group), Subject (PC1, PC2, Session) and the combined factor of Subject and Task (PC1, PC2, Session). After fitting the model, we confirmed that none of the model assumptions were violated (see Supplementary Statistical Analysis for details). To avoid 'cryptic multiple testing' 78 we first compared the full model with a null model, which comprised the same random effects structure but lacked the fixed effects Group, PC1 and PC2. Only then were individual predictors tested. To assess whether there was a difference in task difficulty between groups we compared our full model with one lacking the random slope of Group within Task. For all comparisons we used likelihood ratio tests. Task difficulty was assessed by the model estimates for each task. The lower the estimate of each task, the less likely it was to be solved. Our sample encompassed 580 observations per estimated effect (5 fixed effects, 4 random effects) from 19 individuals (8-23 sessions each).
The total number of successes amounted to 2,509.
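Because the model uses a logit link, its estimates are on the log-odds scale. The short sketch below shows one way to read the fixed-effect estimates reported in the Results: it recomputes the Wald z-values as estimate divided by standard error and converts each estimate to an odds ratio. The dictionary layout and function names are illustrative assumptions; the numbers are copied from the text, and small discrepancies from the published z-values can arise because the inputs are rounded.

```python
import math

# (estimate, standard error) on the logit scale, as reported in the Results.
reported = {
    "Group": (-0.089, 1.012),
    "PC1": (2.713, 0.588),
    "PC2": (0.906, 0.315),
    "Session": (1.719, 0.526),
}


def wald_z(estimate, se):
    """Wald z-statistic: the estimate in units of its standard error."""
    return estimate / se


def odds_ratio(estimate):
    """Multiplicative change in the odds of solving per unit of the predictor."""
    return math.exp(estimate)


summary = {
    name: (round(wald_z(est, se), 2), round(odds_ratio(est), 2))
    for name, (est, se) in reported.items()
}

for name, (z, ratio) in summary.items():
    print(f"{name}: z = {z}, odds ratio = {ratio}")
```

Since the covariates were z-transformed, each odds ratio refers to an increase of one standard deviation in that covariate; for PC1, for instance, the odds of solving a task rise by a factor of roughly 15 per standard deviation.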
Post-hoc tests. The results of the control predictor session (Fig. 3, Table 1) suggested a possible interaction of Group and Session. For this reason, we added the interaction term to the model and tested significance using a likelihood ratio test, post-hoc. We further inspected the difference between groups for PC1 and PC2 using Mann-Whitney U-tests and compared ratios of 'motivated' and 'unmotivated' birds per group with Fisher´s exact test, post-hoc.
Implementation. All statistical analyses were performed in RStudio 79 (version 1.1.453) using the software R 80 (version 3.5.1). In addition to the base R, we used the following packages: 'rela' 81