Flexible auditory training, psychophysics, and enrichment of common marmosets with an automated, touchscreen-based system

Calapai, A.; Cabrera-Moreno, J.; Moser, T.; Jeschke, M.

doi:10.1038/s41467-022-29185-9


Article
Open access
Published: 28 March 2022


Nature Communications volume 13, Article number: 1648 (2022)


Abstract

Devising new and more efficient protocols to analyze the phenotypes of non-human primates, as well as their complex nervous systems, is rapidly becoming of paramount importance. This is because with genome-editing techniques, recently adapted to non-human primates, new animal models for fundamental and translational research have been established. One aspect in particular, namely cognitive hearing, has been difficult to assess compared to visual cognition. To address this, we devised autonomous, standardized, and unsupervised training and testing of auditory capabilities of common marmosets with a cage-based, standalone, wireless system. All marmosets tested voluntarily operated the device on a daily basis and went from naïve to experienced at their own pace and with ease. Through a series of experiments, here we show that animals autonomously learn to associate sounds with images, to flexibly discriminate sounds, and to detect sounds of varying loudness. The developed platform and training principles combine in-cage training of common marmosets for cognitive and psychoacoustic assessment with an enriched environment that does not rely on dietary restriction or social separation, in compliance with the 3Rs principle.


Introduction

In recent years non-human primates (NHPs) have seen increased interest as animal models for human diseases due to the advent of transgenic primates and genome-editing technologies^1,2. As NHPs are closer to humans than rodents with respect to e.g. physiology, cognition, genetics, and immunology^{3,4,5,6,7,8,9,10}, results from NHP studies investigating cognition are likely more representative of the situation in humans.

In visual neuroscience, attention, object formation, categorization, and other aspects of cognition are extensively studied. In auditory neuroscience, several studies have also used different tasks (e.g. 2-alternative forced choice, go-no go) to probe different cognitive functions (such as memory, categorization, reward processing^11,12,13,14). In general, though, studies in auditory cognition are lagging behind those of visual cognition with respect to overall sophistication of methods, experiments and task complexities. One factor for this is the common observation that monkeys have been notoriously difficult to train in the auditory domain, and generally display a bias towards vision. For example, it has been shown that baboons can easily learn to locate food items based on visual but not auditory cues¹⁵. Among other results, this surprising failure at such a seemingly simple auditory task has led the authors to suggest that inferential reasoning might be modality specific.

However, investigations into auditory capabilities and cognition increase in scope as NHPs have become genetically tractable organisms^1,2,16,17,18. Notably, the common marmoset (Callithrix jacchus) has become a valuable model for biomedical research in general and the neurosciences in particular^19,20,21. Factors such as the relative ease of breeding, early sexual maturation and short life span^22,23 have contributed to the rapid generation of genetic models of human mental and neurological diseases in marmosets^1,24,25,26. While marmoset training generally lags behind the sophistication of cognitive NHP experiments traditionally performed with macaques, auditory capabilities of marmosets have been investigated extensively^{27,28,29,30,31,32}. Furthermore, marmosets have now also become the go-to NHP model for hearing loss and cochlear implant research^33,34,35,36. In the near future many more transgenic primate models will be developed, which will require extensive phenotyping, as is standard for rodent models³⁷. Phenotyping will need to investigate large numbers of subjects in a standardized and experimenter/observer-independent manner^{38,39,40,41,42,43,44}. In addition, increased awareness of species-specific ethical demands asks for refinement of experimentation techniques as much as possible^45,46. This has led to efforts to develop home-cage, computer-based cognitive training of NHPs focusing on the visual domain^{47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63}.

To achieve comparable efforts in the auditory domain, there is a need for automatic, unsupervised cage-based training and testing of auditory tasks. Towards this goal, we built a standalone wireless device for auditory training and testing of common marmosets, directly in their own cage. The system, termed marmoset experimental behavioral instrument (MXBI), is mostly composed of off-the-shelf or 3D-printed components, is entirely programmed in Python, and is based on the Raspberry Pi platform, for maximum flexibility of use, openness, and to allow for easy adaptation by others. The MXBI is set up with a server/client configuration in mind and is capable of animal tagging by means of radio-frequency identification (as in rodent systems⁶⁴), which ultimately allows scalable, standardized, automated, and unsupervised training and testing protocols (AUT in short, from ref. ⁴⁸) in socially housed animals. Moreover, the MXBI and the procedures we describe contribute to the efforts of refining cognitive and environmental enrichments of NHPs in human care. Further, we report results from a set of four experiments: (1) an algorithm-based procedure for gradually and autonomously training naïve animals on the basics of a 2-Alternative-Choice task (2AC visual task); (2) an audio-visual association experiment where a conspecific call is contrasted to an artificial acoustic stimulus; (3) a generalization experiment assessing the flexibility of the acquired discrimination behavior with other stimuli; and (4) a psychoacoustic detection experiment for quantifying hearing thresholds in a cage-based setting. We show that marmosets can be trained to flexibly perform psychoacoustic experiments on a cage-based touchscreen device, via an automated and unsupervised training procedure that requires no human supervision and does not rely on fluid or food control, nor social separation.

Results

In this study 14 adult common marmosets (Callithrix jacchus) of either sex and housed in pairs participated across one initial training phase and four autonomous cage-based experiments. Animals were generally trained in pairs on auditory tasks with a single MXBI attached to the animals’ home cage and without fluid or social restrictions (Fig. 1a). Aside from the initial training (see below) all sessions ran autonomously, while an RFID module identified the animals and an algorithm controlled the individualized, performance-based progression in difficulty (see methods: Automated unsupervised training (AUT)).

**Fig. 1: General engagement on the MXBI across all autonomous experiments.**

Initial training

The goal of the initial training was to instruct naïve animals to interact with the touchscreen to receive liquid reward (Arabic gum or marshmallow solution) from the device’s mouthpiece. The training was divided into three sequential steps: first, habituation to the device (supplementary video 1); second, forming a mouthpiece-reward association (supplementary video 2), and finally, a touch-to-drink phase (supplementary video 3.1 and 3.2). All animals started exploring the device from the very first session. During the touch-to-drink phase, a mesh tunnel was introduced inside the device (Fig. 1a), to allow only one animal at a time inside the MXBI. Animals were encouraged to enter the tunnel and reach the touchscreen by placing small pieces of marshmallows or arabic gum along the tunnel, on the mouthpiece, and on the screen. After the initial training was concluded (mean = 6 ± 1.4 sessions, Table 1), animals were introduced to the automated procedure gradually bringing them from naïve to experienced in discrimination as well as detection-based psychoacoustic tasks.

Table 1 Characteristics and statistics of all animals involved in the experiments.


General engagement on the MXBI across all autonomous experiments

Individual animals engaged with the MXBI in different amounts with the median number of trials per session varying between 31 and 223. On average 116 trials per session (IQR = Q3-Q1 = 192) were performed (Fig. 1b, Table 1). While half of the animals had less than 10% of sessions without a single trial (median = 10.7%, IQR = 16.8%) two animals displayed more than 30% of sessions without performing a trial. On average 100 sessions were conducted per animal and 14 of those sessions had 0 trials (Fig. 1b). Controlling for session duration, we found no significant correlation between the total number of trials performed by each animal and session number (Partial Pearson correlation controlling for session duration; adjusted r² = 0.05, p-value: 0.1, N = 802; CI = −0.01, 0.13, Fig. 1c), suggesting that the level of engagement remained consistent across sessions. Qualitatively, animals tended to engage consistently throughout a session as indicated by the distribution of trial onset times (Fig. 1e). Consequently, the median time point at which half of the trials were performed was 0.52 of the session’s duration (Fig. 1d).
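As a rough illustration of the within-session measure reported above, the sketch below computes the fraction of a session at which half of its trials had been started. The function name, the input format (trial onset times in seconds), and the use of the median onset time are assumptions made for this example; they do not describe the authors' analysis code.

```python
from statistics import median


def half_trials_fraction(trial_onsets_s, session_duration_s):
    """Fraction of the session elapsed when half of the trials had started.

    Assumption: this is taken as the median trial onset time divided by the
    session duration; the source does not spell out the exact computation.
    """
    if not trial_onsets_s:
        raise ValueError("session has no trials")
    return median(trial_onsets_s) / session_duration_s
```

For trials spread evenly over a session this value is close to 0.5, which is the pattern the reported 0.52 points to.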

Automated unsupervised training (AUT)

An automated and unsupervised training protocol (AUT⁴⁸) was implemented to train naïve marmosets at their own pace on the basics of a 2AC visually guided task. In order to identify the appropriate parameters upon which to build such an autonomous procedure, we first designed and tested multiple AUT versions with a subset of 9 animals (described in supplementary tables S1 and S2). The resulting final versions of the protocols (AUTs 8, 9, and 10) were then tested with 4 naïve animals (animals f, k, c, and d). The AUT procedure comprised 4 milestones that unfolded through a total of 48 dynamic steps (Fig. 2; Fig. S4C): (1) decrease of the size of a visual stimulus (trigger) to be touched for reward, (2) change of position of a visual stimulus, (3) introduction of sound and delayed presentation of a visual target, and (4) introduction of a second visual target as a distractor. During each session the transitions between steps and milestones were based on the animal’s performance in a sliding window of 10 trials (hit rate of > 80% to advance, <= 20% to retreat; Fig. S4D). Figure 2c shows the hit rate across individual steps and milestones for the 4 naïve animals that only performed the final versions of the AUT. While the procedure was designed to encourage a smooth transition from step to step, certain steps (and thus milestones) required more trials to be accomplished. As a consequence, the hit rate calculated across animals varies substantially as a function of AUT step (Fig. 2c). Because animals learned at different paces and performed different numbers of trials, we quantified the progression through the AUT as a function of the percentage of total trials completed by each animal (Fig. 2d). This allowed us to visualize and compare learning progress across animals with inherently different working paces on a common frame of reference. Both the total number of trials (expressed by line thickness in Fig. 2d) needed to complete the AUT and the learning curves throughout the AUT vary substantially across animals (Fig. 2e) in the middle portion of the AUT, during which the stimulus changed position on the screen and an acoustic stimulus was introduced. Starting from the introduction of sound (milestone 3) we introduced timeouts (gray screen) to provide further feedback on wrong trials. Analysis of inter-trial intervals (ITIs) revealed shorter average ITIs after correct vs. wrong trials, suggesting an effect of timeouts on animal behavior (Fig. S3 and Table S4).
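The step-transition rule described above (a sliding window of 10 trials, advance at a hit rate above 80%, retreat at 20% or below, 48 steps in total) can be sketched as follows. This is a minimal illustration, not the MXBI code: the class name is hypothetical, and two behaviors are assumptions the text does not specify, namely that no transition happens before the window is full and that the window restarts after every transition.

```python
from collections import deque

N_STEPS = 48               # total AUT steps reported in the text
WINDOW = 10                # sliding window length in trials
ADVANCE_ABOVE = 0.8        # hit rate > 80% -> advance one step
RETREAT_AT_OR_BELOW = 0.2  # hit rate <= 20% -> retreat one step


class AutStaircase:
    """Tracks one animal's AUT step from trial outcomes (illustrative only)."""

    def __init__(self, step=1):
        self.step = step
        self.window = deque(maxlen=WINDOW)

    def update(self, hit):
        """Record one trial outcome and return the step for the next trial."""
        self.window.append(bool(hit))
        if len(self.window) < WINDOW:
            return self.step  # assumption: wait until the window is full
        hit_rate = sum(self.window) / WINDOW
        if hit_rate > ADVANCE_ABOVE and self.step < N_STEPS:
            self.step += 1
            self.window.clear()  # assumption: restart window after a transition
        elif hit_rate <= RETREAT_AT_OR_BELOW and self.step > 1:
            self.step -= 1
            self.window.clear()
        return self.step
```

With these thresholds, an animal scoring exactly 8 hits in 10 trials stays on its current step, since the advance criterion is strictly above 80%.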

Audio-visual association

Next, we tested whether animals would generalize from the visually guided 2AC task introduced via the AUT procedure to an acoustically guided 2AC discrimination. In this experiment animals were required to discriminate between a conspecific juvenile call (in the following referred to as voc) and a pure tone (simple train, sTr, chosen for individual animals from a range between 1.5 and 3.5 kHz), by selecting one of two visual stimuli permanently associated with each sound (supplementary Video 4). 5 out of 9 animals successfully learned to discriminate between the sTr and the voc by selecting a geometric pattern or a conspecific's face, respectively (Fig. 3a, c). The remaining 4 animals performed at chance level. To disentangle whether these animals were unable to solve the task or unwilling to perform above chance, we devised a 3 alternative-choice (3AC; upon sound presentation animals had to choose between 3 visual symbols, see methods) version of the same task (Fig. 3b, c) and tested 2 of these animals and 2 additional animals who had failed a different control condition (see supplementary material: Artificial Discrimination, Figs. S1, S2). In the 3AC task, all 4 animals performed the task significantly above chance (Binomial test, post-hoc corrected for multiple comparisons; Table 2). Taken together these results demonstrate that 9 out of 11 animals learned the audio-visual association. The remaining two animals that did not learn the 2AC discrimination were assigned to a different project and were not tested on the 3AC version. Additionally, 7 out of 9 animals who accomplished the discrimination task exhibited significantly longer reaction times in responding to the target in voc vs sTr trials (Fig. 3d; Table 2), indicating that the animals behaved differently for different acoustic stimuli.
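To make the comparison against chance concrete, the sketch below computes an exact binomial p-value for a given number of hits, with chance set to 1/2 for the 2AC task and 1/3 for the 3AC task. It is an illustration under stated assumptions: the one-sided form of the test and the function name are choices made here, the source does not state which correction for multiple comparisons was used (none is applied in the sketch), and the probabilities are summed in log space only so that the function also works for several thousand trials.

```python
from math import exp, lgamma, log


def binomial_p_above_chance(hits, trials, chance):
    """One-sided exact binomial p-value: P(X >= hits) under the chance rate.

    Assumes 0 < chance < 1. Each term is evaluated in log space to avoid
    overflow for large trial counts.
    """
    log_p, log_q = log(chance), log(1.0 - chance)
    total = 0.0
    for k in range(hits, trials + 1):
        log_pmf = (
            lgamma(trials + 1) - lgamma(k + 1) - lgamma(trials - k + 1)
            + k * log_p + (trials - k) * log_q
        )
        total += exp(log_pmf)
    return min(total, 1.0)
```

The same hit count is stronger evidence in the 3AC task than in the 2AC task, because chance drops from 50% to 33%.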

Table 2 Summary statistics for the audio-visual association across animal and stimuli (Fig. 3c, d).


Generalization to novel stimuli

With the five best performing animals in the audio-visual association experiment, we assessed whether animals would be able to generalize the acquired discrimination to three novel stimuli (Fig. 4a): two different types of vocalizations (an adult marmoset's Phee and a Twitter) and a white noise sound. All 5 animals quickly learned to discriminate the Twitter and the Phee when contrasted to the sTr (Fig. 4b, c). On the other hand, when two new stimuli were contrasted with each other, animals displayed lower hit rates. In the white noise vs Twitter condition, 3 animals acquired the discrimination; 1 animal displayed a bias towards the Twitter it had previously learned; and for 1 animal the performance fluctuated between 0.6 and 0.75 in the sessions prior to the last 2, in which it was not significantly different from chance. When the juvenile vocalization (voc) was juxtaposed to the Twitter, only 2 animals performed significantly above chance and another performed significantly above chance only for the Twitter. Animals seemed to find it more difficult to discriminate between vocalizations, despite having already learned and successfully discriminated both from other stimuli extensively (see Table 3). We interpret this result as an indication that vocalization stimuli (voc, Twitter, and Phee) carry a distinctive meaning to the animals compared to more artificial stimuli (tones or white noise). This could in fact explain why animals readily discriminate them when contrasted to artificial stimuli but do not display significant discrimination between multiple vocalizations. Note that Animal i was not quantified in the voc vs Twitter and in the Phee vs sTr condition due to a limited number of trials (less than 50 trials in each task).

**Fig. 4: Generalization of audio-visual association to novel stimuli.**

Table 3 Summary statistics for the Generalization to novel stimuli across animals and the four conditions (Fig. 4).


Psychoacoustic assessment of stimulus thresholds

Last, we addressed whether the MXBI can be employed for psychoacoustics. We chose to investigate hearing thresholds in a vocalization-detection task and towards this goal trained three animals (animals a, b, and d). In this experiment animals that already knew the association between the acoustic and corresponding visual stimuli (see above: section “audio-visual association”) were now trained to associate the absence of the vocalization with the visual stimulus for the sTr (Fig. 5). The method of constant stimuli was employed by randomly selecting the sound level from a set of values between 0 and 80 dB SPL. The animals were required to report the presence or absence of the vocalization by touching the marmoset face (visual stimulus coupled with the voc) or the triangles (visual stimulus coupled with silence), respectively. Note that due to the nature of the task, reward to the animals for stimuli in the range between 15 and 45 dB SPL was provided regardless of the animal’s choice. This was instrumental to prevent frustration and thus disengagement from the task when the acoustic stimulus was presented at amplitudes presumably close to the animal’s hearing thresholds. In contrast, reward was dependent on the animals’ choice for stimuli at and above 60 dB SPL and at 0 dB SPL. The aim of this reward scheme, illustrated in Fig. 5a, was to encourage the animals to use the triangles and the marmoset face as yes/no options for the presence/absence of the acoustic stimulation. After two to three sessions with only high amplitude stimuli (70 dB SPL) to stabilize the animals’ discrimination performance at 75% or above, test sessions commenced (three sessions for animal d and four sessions for animals a and b; Fig. 5b). The estimated hearing threshold for the vocalization stimulus (mean 37.3 dB SPL; 36 for animal a, 49 for animal b, 27 for animal d) was below the background noise of the facility of 60 dB SPL (measured inside the MXBI with a measurement microphone and amplifier, see methods; spectrograms of 3 representative 1 min long recordings are shown in Fig. 5c).
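The reward scheme described above can be written down as a small decision function, followed by a check of the reported mean threshold. This is a sketch under assumptions: the function and argument names are hypothetical, 0 dB SPL is treated as the "vocalization absent" condition (our reading of the text), and levels that the text does not cover raise an error rather than being guessed.

```python
from statistics import mean


def reward_given(level_db_spl, chose_face):
    """Return True if a trial is rewarded under the scheme described in the text.

    chose_face: True if the animal touched the marmoset face ("present"),
    False if it touched the triangles ("absent").
    """
    if 15 <= level_db_spl <= 45:
        return True            # near-threshold levels: rewarded regardless of choice
    if level_db_spl >= 60:
        return chose_face      # clearly audible: "present" is the rewarded choice
    if level_db_spl == 0:
        return not chose_face  # assumption: 0 dB SPL is treated as "absent"
    raise ValueError("level not covered by the scheme described in the text")


# Per-animal thresholds reported in the text (dB SPL) for animals a, b, and d.
thresholds = {"a": 36, "b": 49, "d": 27}
mean_threshold = round(mean(thresholds.values()), 1)  # 37.3
```

The three individual thresholds average to 37.3 dB SPL, matching the reported mean.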

**Fig. 5: Psychophysical assessment of hearing thresholds for animals a, b, and d after training on the Audio-visual association experiment.**

Discussion

In this study, we report results from four sequential experiments conducted with a stand-alone, touchscreen-based system—termed MXBI—tailored to perform training as well as psychophysical testing of common marmosets in auditory tasks. Animals involved in this experiment operated the device with a consistent level of engagement and for a prolonged time, directly in their own housing environment, without dietary restriction or social separation. All animals navigated an automated, unsupervised training procedure with ease and at their own pace, going from naïve to experienced in a visually guided discrimination task. In a following audio-visual association experiment, nine out of eleven animals further acquired proficiency in an acoustically guided 2AC or 3AC discrimination task. Animals also quickly learned to flexibly discriminate three novel sounds they had never encountered before in a generalization experiment. Finally, we assessed the hearing thresholds of 3 animals with a spectro-temporally complex sound under potentially distracting auditory conditions. Our results indicate that: (1) marmoset monkeys consistently engage in various psychoacoustic experiments; (2) while performing enough trials and at high performance to allow psychometric evaluations; (3) in a self-paced manner; (4) without the need of dietary restriction or separation from their peers; and (5) with high degree of training flexibility.

Home-cage training of naïve animals

For our experiments we designed a cage-based device and employed an unsupervised algorithm to gradually and autonomously make naïve marmosets accustomed to a 2 or 3 alternative-choice task and a simple detection task in the auditory modality. Each of the 14 animals who participated and successfully completed the first experiment learned (1) to seek and consume reward delivered from the mouthpiece; (2) to operate a touchscreen proficiently; (3) to respond with appropriate timing to abstract sensory stimulation; (4) to understand the concept of a trial structure; (5) to tolerate frustration when failing a trial; and (6) ultimately to continuously devise, update, and deploy problem-solving strategies. For practical, experimental, as well as ethical reasons, we aimed at developing an experimental protocol to train many of these aspects directly in the animals’ own housing environment, at the animals’ own pace^{47,51,53,56,65,66}, and without dietary restrictions. Most of these aspects were instructed by a computerized training strategy in which the difficulty was automatically adjusted according to the trial-to-trial performance of the individual animal. The Automated Unsupervised Training (AUT) consisted of a pre-programmed series of steps in which several elements of the task were slowly introduced or adjusted, from trial to trial. The aim of this strategy is to keep animals at a comfortable level of performance to presumably limit frustration, while making the task gradually more difficult and thus making the animals more and more proficient⁴⁸. Additionally, such subtle, gradual, and constant change in the challenge offered to the animals has been suggested to prevent loss of interest^67,68,69,70. We indeed observed a long-term rate of engagement, across several hundred sessions across all animals, suggesting an interest in the experimental sessions that could not be attributed solely to novelty⁶⁹. Additionally, animals were generally kept together with their cage mate in their home-enclosure and were fed normal colony diet, prior to, after or even during the sessions. Fluid was also available ad libitum. Such generalized and continued interest towards the MXBI, free of any additional coercion, was presumably the result of the combination of a highly preferred primary reinforcer (liquid arabic gum or marshmallow solution), a cognitive, sensory, and interactively rich environment^67,71,72, and the dynamical adjustments in task level^48,70. Moreover, we did not observe any behavioral alteration that would suggest excessive attachment to our system at the level of the single individual or cage-pair of animals. Rather, 50% of the trials occurred within the first half of the session, in line with a recent report of a steady rate of interactions in voluntary training of motor tasks throughout the waking hours⁶⁶.

Finally, because we instructed tasks that are typical in cognitive neuroscience and animal cognition (namely a two or three alternative-choice and a detection task), we believe that similar results would be achieved in training as well as testing other sensory or cognitive domains.

Training flexibility of marmosets

With the exception of two animals who were assigned to a different project and could not be trained further, all animals were successfully trained and tested in audio-visual association experiments reported here. It is important to note that while two animals—a and b—readily transferred the knowledge acquired in the visually guided discrimination (Automated Unsupervised Training) to quickly learn the acoustically guided discrimination (audio-visual association), the remaining seven animals required a substantial amount of trials to reach the same level of proficiency. Three animals out of the remaining seven also rapidly generalized the acquired discrimination to novel acoustic stimuli at a comparable rate to animals a and b. Therefore, while the initial transition from the visual to the acoustic domain occurred at variable speed, all tested animals showed a comparable level of flexibility in generalizing to novel stimuli. Finally, all three animals tested in the psychoacoustic assessment, quickly learned to reinterpret the discrimination as a detection task as soon as the reward scheme was adjusted. This allowed for a systematic psychoacoustic assessment of the sound intensity required to detect a vocalization under conditions with background noise.

Together, our results suggest a high degree of training flexibility of common marmosets in general and the auditory modality in particular. Specifically, marmosets can: (1) transfer acquired rules from the visual to the acoustic domain; (2) rapidly learn to discriminate novel acoustic stimuli and (3) flexibly reinterpret a discrimination task as a detection task.

Cognitive hearing in marmosets

The success of the acoustic experiments presented in this study could partly be due to intrinsic properties of the stimuli employed based on the naturalistic connotation in both the visual and the acoustic domain of the juvenile vocalization and juvenile marmoset face association. This ‘natural association’ might then also support the association of the respective other stimuli. Our failed attempts, detailed in the supplementary material, indeed demonstrate the difficulty in having marmosets associate artificial stimuli across the auditory and visual modality. The guiding strategy was that additional properties of the stimuli should match across modalities to support crossmodal association and considered successful concepts from training of rodents and ferrets^73,74. For example, we presented auditory and visual stimuli together with a reward, or a timeout screen, in a temporally overlapping fashion which leads to strong associations of stimulus components in rodents. Also, the sound was presented from the speaker on which side the correct visual response indicator was located. This has been shown to be a strong cue for ferrets to guide choice towards the respective sound direction. In stark contrast, none of these approaches were successful in marmosets.

Results from the generalization experiment indicate that animals could quickly and flexibly learn to discriminate novel auditory stimuli. On the other hand, when two different types of vocalizations were contrasted, only two animals out of 4 performed above chance. Taken together these results indicate that (1) vocalizations might carry a distinctive meaning to the animals that can be exploited to train common marmosets on various psychoacoustic tasks; and (2) the use of a combination of naturalistic and artificial sounds is more likely to instruct marmosets in performing psychoacoustic tasks above chance level.

Psychoacoustic assessment of marmosets in the home enclosure

Performing auditory psychophysics directly in the animals’ colony poses an acoustically challenging environment due to the uncontrolled background noise. The sound pressure needed in order to detect a vocalization of a juvenile marmoset in a cage-based setting—37.3 dB SPL—was below the sound level of the facility’s background noise—~60 dB SPL. This might be explained by the adaptation of the auditory system to background sounds which has been documented along the auditory pathway^{75,76,77,78,79} and has been suggested to optimize perception to the environment^76,77. Additionally, the juvenile vocalization might have been less affected by background noise (mostly driven by ventilation and marmoset vocalizations) as it minimally overlaps the sound spectrum typically encountered in our colony of adult animals. Nonetheless, our data show that NHP’s psychoacoustic training and assessment is feasible within the animals’ home enclosure similar to chair based psychophysics²⁹. While measurements of hearing thresholds in more classical controlled settings are essential to understand auditory processing and sensitivity, the investigation of audition in more naturalistic environments could provide a closer estimate of real-world hearing capabilities. This might be particularly relevant for auditory processes and mechanisms that involve higher-level, top-down, cortical influences^80,81,82 and thus are more susceptible to the influence of environmental contextual factors. Environmental sounds produced by conspecifics, for example, could affect how task-relevant sounds are encoded, processed, and interpreted by marmosets that heavily rely on acoustic communication to cooperate, live together, and survive⁸³.

Towards a high-throughput pipeline for auditory neuroscience

The development of transgenic primate models, and especially marmoset models, for various human diseases^1,24,84,85 will require phenotyping a large number of animals similar to mouse phenotyping pipelines^37,38,86. Consequently, cognitive training and testing paradigms, designed around the marmoset model, need to be developed, tested, and implemented^70,87. Furthermore, in order to allow high-throughput training and testing of common marmosets directly in their own housing environment, our device was designed and built with a series of hardware and software features in mind. First, the use of an inexpensive single-board computer as central control unit of the whole device allows for straightforward scaling to more devices and simple adaptation to new experimental requirements. To the best of our knowledge, besides the MXBI introduced here, a fully wireless cage-based system tailored towards visuo-acoustic stimulation and training, capable of ID tagging and set up to be server/client ready has not been presented yet. The wireless connectivity of the MXBI allowed us to build a network of devices that autonomously interact with a single server node. Upon booting of an MXBI a series of scripts ensures that each device is connected to the central hub where (1) information about the animals’ IDs is stored (used for matching ID codes coming from the implanted chips), (2) data are routinely backed up from the device, and (3) the videos of the sessions are stored. Besides having a unique network address, all devices are essentially identical and can therefore be used on any suitable home cage in our colony. When an animal crosses the RFID coil, information coming from the implanted chip is matched with the database on the server and the local device loads the desired task and AUT step for the given animal. Furthermore, employing a battery-based power solution for the MXBI made the device safer for the animals, due to the exclusively low voltage provided, and easy to handle. While in our case this feature was mostly an add-on, in outdoor cages or on field research sites without direct access to power outlets, this could be a necessary requirement. Combined with image-based animal identification^88,89, this would allow for comparative testing of captive and natural populations⁹⁰. Finally, several structural elements of the MXBI were designed for manufacturability and commissioned to local workshops or locally 3D printed. The combination of structural and electronic hardware elements is particularly well suited, in our opinion, to replicate our device on a large scale. As a result of these built-in features, in the animal facility of our institute, 6 devices are simultaneously active, training 12 animals in parallel over the course of several hours, and generating on average 1500 trials a day while requiring only approximately 35 min of human labor.
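The RFID-driven lookup described above (a tag is read, matched against the animal database, and the task and AUT step for that animal are loaded) could be sketched as follows. Everything named here is hypothetical: the `Assignment` fields, the tag strings, the task labels, and the in-memory dictionary that stands in for the server database. None of it is taken from the MXBI software.

```python
from dataclasses import dataclass


@dataclass
class Assignment:
    animal: str    # animal label
    task: str      # task to load for this animal (hypothetical labels)
    aut_step: int  # AUT step at which the animal resumes


# Hypothetical stand-in for the animal database kept on the server.
REGISTRY = {
    "TAG-0001": Assignment(animal="a", task="audio_visual_2ac", aut_step=48),
    "TAG-0002": Assignment(animal="f", task="aut", aut_step=12),
}


def resolve_assignment(tag_id, registry=REGISTRY):
    """Return the assignment for a detected tag, or None for an unknown tag."""
    return registry.get(tag_id)
```

Returning `None` for an unknown tag leaves it to the caller to decide how to handle an unidentified animal; the source does not describe that case.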

In conclusion, all of these aspects are to be considered when establishing successful high-throughput pipelines (across various fields of cognitive neuroscience) because together they ultimately add up to create automated high-throughput protocols for integrating advanced cognitive and behavioral assessments with physiological data recordings³⁸.

Autonomous devices as cognitive enrichment

Throughout our experiments we found that animals consistently interacted with the device regardless of their performance. On certain occasions animals performed thousands of trials at chance level, across several weeks, even though no social or fluid restriction was applied. While this might seem counterintuitive, we argue that from the animals’ perspective our approach, coupled with the appeal of the liquid arabic gum that the device delivered, represents a form of enrichment^68,69,70. From a psychological standpoint, cognitive enrichment strategies exercise what is known as competence, namely the range of species-specific skills animals employ when faced with various challenges. This, in turn, promotes the sense of agency, described as the capacity of an individual to autonomously and freely act in its environment⁹¹. Promoting both competence and agency has been proposed to be crucial for the psychological wellbeing of captive animals because: (1) animals can better cope with and thus better tolerate captivity; and (2) animals can exercise species-specific cognitive abilities that have little opportunity to be expressed in captivity^68,92.

Study limitations and caveats

Several animals in the audio-visual association tasks performed at chance level for several thousand trials. Receiving a reward in half of the trials might be a successful strategy for animals that are not constrained, isolated, or fluid/food restricted. Under these conditions it is unclear whether animals will attempt to maximize their reward (as has been reported in studies where food or fluid regimes are manipulated^93,94, but see⁹⁵) or are satisfied with chance performance. An animal that is satisfied with performing at chance in a certain task will naturally not ‘learn’ even though it might be cognitively able to. In line with this interpretation, animals that performed at chance level in a 2AC version of an auditory discrimination task successfully performed the auditory discrimination when the overall chance level was reduced from 50 to 33% by employing a 3AC version.
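
To make the role of the chance level concrete, the following sketch computes a one-sided exact binomial p-value for performance above chance. It is an illustration only and not necessarily the analysis used in this study. The function names and the example numbers (90 correct out of 200 trials) are hypothetical.

```python
import math


def log_binom_pmf(k: int, n: int, p: float) -> float:
    """Log of the binomial probability of exactly k successes in n trials."""
    return (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
            + k * math.log(p) + (n - k) * math.log1p(-p))


def p_above_chance(n_correct: int, n_trials: int, chance: float) -> float:
    """One-sided exact p-value: P(X >= n_correct) if the animal were guessing."""
    logs = [log_binom_pmf(k, n_trials, chance)
            for k in range(n_correct, n_trials + 1)]
    # Sum in log space so that sessions with thousands of trials do not overflow.
    m = max(logs)
    return min(1.0, math.exp(m) * sum(math.exp(v - m) for v in logs))


# Hypothetical example: the same 45% correct judged against two chance levels.
p_2ac = p_above_chance(90, 200, 1 / 2)   # two alternatives, chance = 50%
p_3ac = p_above_chance(90, 200, 1 / 3)   # three alternatives, chance = 33%
print(f"2AC: p = {p_2ac:.3f}, 3AC: p = {p_3ac:.5f}")
```

In this hypothetical case, 45% correct is indistinguishable from guessing in a 2AC task but clearly above chance in a 3AC task. The chance level therefore has to be stated whenever hit rates from the two task versions are compared.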

Our data demonstrate the flexibility of auditory training using natural stimuli and lay the groundwork for further investigations, e.g., testing categorical perception of vocalizations by modulating the spectral content of the stimuli used. However, a caveat of our work is that our approaches were not successful in training marmosets to discriminate artificial sounds consistently (see supplementary materials). Among other potential explanations, we attribute this difficulty to the introduction of auditory cues relatively late in training. This might have biased animals to focus on the visual domain, which is considered the dominant sense in primates^96,97, while ignoring other cues. Future studies should therefore explore alternative approaches to training arbitrary acoustic discriminations, potentially by introducing reliable auditory cues very early in training.

Methods

All animal procedures of this study were approved by the responsible regional government office [Niedersächsisches Landesamt für Verbraucherschutz und Lebensmittelsicherheit (LAVES), Permit No. 18/2976], as well as an ethics committee of the German Primate Center (Permit No. E1-20_4_18) and were in accordance with all applicable German and European regulations on husbandry procedures and conditions. It has to be noted, however, that according to European regulations, as implemented in German animal protection law, the procedures described in this study can be considered to be environmental enrichment.

Animals

A total of 14 adult common marmosets (Callithrix jacchus) of either sex (see Table 1) were involved in the experiments carried out in the animal facility of the German Primate Center in Göttingen, Germany. Some of the animals were prepared for neurophysiological and cochlear implant experiments. Animals were pair housed in wire mesh cages measuring 160 cm (H) × 65 cm (W) × 80 cm (D) under a light-dark cycle of 12 h (06:00 to 18:00). Neighboring pairs were visually separated by opaque plastic dividers, while cloths hung from the ceiling prevented visual contact across the room. Experimental sessions occurred mostly in the afternoon and without controlled food/fluid regimes or social separation from the assigned partner. Liquid arabic gum (Gummi Arabic Powder E414, 1:5 dissolved in water; Willy Becker GmbH) or dissolved marshmallows (marshmallow juice, 1:4 water dilution) was provided as a reward by the touchscreen device for every correct response in the various experiments. Marshmallow or arabic gum pieces, stuck to the touchscreen, were used during the initial training phase.

Apparatus

The marmoset experimental behavioral instrument (MXBI) is directly attached onto the animals’ cage and measures 44 cm (H) × 26 cm (W) × 28 cm (D). The device is internally divided into three sections (Fig. S4A). The electronics compartment on top contains: a Raspberry Pi 3B+ (raspberrypi.org); an RFID module with a serial interface (Euro I.D. LID 665 Board); two peristaltic pumps (Verderflex M025 OEM Pump), one on each side; a camera module (Raspberry Pi wide-angle camera module RB-Camera-WW Joy-IT); and a power bank (Powerbank XT-20000QC3) through which 5 and 12 V (max 2.1 A) were provided to the whole system. In our setup and with our tests, the power banks last up to 8 h before the battery is depleted, allowing for continuous training or testing during most of the waking hours of the colony. We chose the Raspberry Pi single-board computer instead of more commonly used tablet PCs^88,98 for ease of interfacing with various external devices. To this end, the Raspberry Pi offers general-purpose input/output capabilities that allow a wide variety of external hardware components, such as microcontrollers and touchscreens, to be integrated via standard communication interfaces (SPI, I2C, I2S). Additionally, new MXBIs can be set up simply by copying the content of the SD card of an existing device onto the SD card of the new device. The behavioral chamber in the middle (internal dimensions: 30 cm (H) × 22 cm (W) × 24 cm (D)) hosts: a 10 inch touchscreen (Waveshare 10.1″ HDMI LCD [H]; in later sessions a 10″ infrared touchscreen, ObeyTec, was attached to the LCD screen); a set of two speakers (Visaton FR58, 8 Ω, 120–20,000 Hz) for binaural acoustic stimulation; a horizontal reward tube with a custom-made mouthpiece (placed at 3 cm from the screen but variable between 2 cm and 5 cm); the coil (or antenna) of the RFID module; and a cylindrical mesh that prevents more than one animal from being inside the device at the same time (Fig. 1a).
Finally, at the bottom of the device, space is left to accommodate a removable tray to collect and clean waste. Hinges on one side allow the device to be opened from the back if cleaning or troubleshooting is needed (Fig. 1a, left). The MXBI can be anchored to the front panel of the animal’s cage via custom-designed rails welded to the cage. A removable sliding door at the front panel allows animals to access the MXBI when attached. Python 3 based software (Python 3.5.3 with the following modules: tkinter 8.6, numpy 1.12.1, RPi.GPIO 0.6.5, pyaudio 0.2.11) running on the Raspberry Pi records all interaction events (screen touches, RFID tag readings, and video recordings), manages stimulus presentation (acoustic and visual), controls the reward system, and finally backs up the data automatically to a server via a wireless local network connection (Fig. S4B).
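
As an illustration of what recording interaction events can look like in practice, the sketch below appends timestamped events to a plain-text log, one JSON object per line. It is not the MXBI software. The file layout, field names, and the functions `log_event` and `read_events` are hypothetical, and only the Python standard library is used.

```python
import json
import time
from pathlib import Path


def log_event(log_path, event_type, animal_id=None, **details):
    """Append one timestamped interaction event to the log as a JSON line."""
    record = {"t": time.time(), "event": event_type, "animal": animal_id, **details}
    with Path(log_path).open("a", encoding="utf-8") as fh:
        fh.write(json.dumps(record) + "\n")
    return record


def read_events(log_path):
    """Read all logged events back as a list of dictionaries."""
    with Path(log_path).open(encoding="utf-8") as fh:
        return [json.loads(line) for line in fh if line.strip()]


# Hypothetical usage: an RFID reading followed by a screen touch.
log_event("session.jsonl", "rfid_read", animal_id="animal_a", tag="00-0A1B-2C3D")
log_event("session.jsonl", "touch", animal_id="animal_a", x=120, y=340, correct=True)
print(len(read_events("session.jsonl")), "events logged")
```

An append-only text log of this kind is simple to back up incrementally, and if power is lost mid-session, all events written before the interruption remain readable.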

Procedure

Behavioral training and testing sessions were started by connecting the Raspberry Pi and LCD display to power, which initiates booting. After booting, a custom script with a series of preconfigured commands was automatically initiated to: (1) connect the device to a central server for automatic, recursive data logging, as well as main database access; (2) start the local camera server for remote monitoring and video recordings (Fig. 2b); and (3) automatically launch the experimental task when needed. The fluid reward was manually loaded in each device and the pump was primed. The device was then attached to the cage and the sliding door in the front panel removed for the duration of the session. At the end of the session, the sliding door was placed back between the device and the cage so that the device could be detached, cleaned, and stored. The touchscreen surface and the behavioral compartment were thoroughly cleaned to remove odors and other traces. Hot water was used daily to clean the reward system to prevent dried reward from clogging the silicone tubes and mouthpiece. The entire process takes a single person around 35 min (15 for setting up and 20 for taking down) with six devices.
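
The labor figures can be put in relation to the throughput reported in the Discussion with a short calculation. The sketch assumes that the reported average of 1500 trials a day is the total across all six devices and 12 animals; the variable names are illustrative.

```python
# Figures taken from the text; the interpretation of TRIALS_PER_DAY as a
# total across all devices is an assumption.
SETUP_MIN = 15          # minutes to set up six devices
TAKEDOWN_MIN = 20       # minutes to take down six devices
N_DEVICES = 6
N_ANIMALS = 12
TRIALS_PER_DAY = 1500

labor_min = SETUP_MIN + TAKEDOWN_MIN                   # 35 min per day
labor_per_device_min = labor_min / N_DEVICES           # about 5.8 min per device
trials_per_animal = TRIALS_PER_DAY / N_ANIMALS         # 125 trials per animal
labor_s_per_trial = labor_min * 60 / TRIALS_PER_DAY    # 1.4 s of labor per trial

print(f"{labor_per_device_min:.1f} min per device, "
      f"{trials_per_animal:.0f} trials per animal, "
      f"{labor_s_per_trial:.1f} s of human labor per trial")
```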

Sessions

To operate the touchscreen at the opposite end from the MXBI’s entrance, the animals are required to go through the opening on the front panel and the mesh cylinder (Fig. 1a). Crossing the antenna inside the mesh cylinder identifies animals via their RFID transponder (Trovan ID-100A), implanted between the animal’s shoulders for husbandry and identification reasons. Standing up inside the mesh places the animal’s head 3 cm above the mouthpiece and 4–5 cm away from the screen, directly in front of a cutout in the mesh of 3.5 × 8.5 cm (H × W) through which the touchscreen can be operated (Fig. 1a). Throughout each session, animals were regularly monitored by the experimenter from a remote location (approximately every 15 min). Additionally, videos from most sessions were recorded and stored. Fluid (either water or tea) was available ad libitum to the animals within their home cage but outside the MXBI. Solid food was provided to the majority of the animals before, after, and during the session, depending on husbandry and/or veterinary requirements.

Experimental paradigm

Throughout the experiments, animals never left their home cage. With the exception of animals a and b, which were pilot subjects and underwent a different initial procedure, all animals were first trained manually to operate the device at a basic level by means of positive reinforcement training and shaping techniques (see section Initial training). Afterwards, all animals were guided by an unsupervised algorithm through a series of preconfigured training steps (see section Automated unsupervised training (AUT)) to acquire basic proficiency in a standard 2AC discrimination task. The animals’ discrimination proficiency was then tested and refined in a second experiment using an acoustically guided discrimination task (see section Audio-visual association). In a third experiment, the acoustic stimuli were replaced with novel stimuli and the animals’ ability to generalize was assessed (see section Generalization to novel stimuli). Last, we developed a psychoacoustic detection task to quantify the animals’ hearing thresholds (see section Psychoacoustic assessment). It is important to note that not all animals took part in all experiments, either because they were assigned to different projects or because they were not always available due to the requirements of other experiments.