The lifespan of pet or companion dogs has been increasing over the years1. Consequently, behavioural and physical deficits in old age have become more prevalent. In the last decades, research on canine ageing has grown exponentially as both scientists and the public have increasingly recognised dogs’ emotional, economic, and scientific value as an animal model species2,3,4,5. Studies have shown that owners of ageing pet dogs often report a decline in the dogs’ visual and auditory function, changes in social behaviour6,7,8,9,10,11,12, and the sleep/wake cycle13,14.

To better understand these phenomena, researchers have developed various behavioural tests to measure the behavioural differences that occur with old age in companion dogs. For instance, a curiosity test showed that the chronological age of the dogs is linked to their neophilic behaviour: specifically, young dogs (1–4 years) sniffed and played for a longer time with novel objects compared to older dogs (> 9 years)15. In a similar study, in the presence of an unfamiliar person, younger dogs (1–4 years) physically interacted with that person more frequently than older dogs (> 9 years)16.

Previous studies also demonstrated an impairment of several cognitive abilities such as memory, learning and flexibility in aged dogs3,4,11,15. For instance, Piotti and colleagues4 showed that, in discrimination and reversal learning tasks, younger dogs (1.5–6.5 years) learned faster than older dogs (8.0–14.5 years). These results have been further validated using EEG, demonstrating a correlation between sleep spindle (non-REM bursts of activity in the sigma range) intrinsic frequency and the number of reversal learning training trials required to reach the criterion17. Sleep spindles predict learning in dogs and vary with age18,19. Ageing also appears to affect dogs’ ability to retain and later exploit spatial information. In a spatial memory task that required the use of short-term memory to find food, younger dogs (3–6 years) were more efficient than older dogs (9–11 years), committing fewer errors and finding the food more often at their first attempt3. The relationship between performance in the spatial memory task and the dogs’ gut microbiome was also investigated; worse memory performance (more errors) was associated with a higher proportion of Actinobacteria in the dogs’ faeces20. These findings agree with the high abundance of some Actinobacteria found in the gastrointestinal tract of patients with Alzheimer’s disease21.

Recently, a battery of standardised outdoor behavioural tests (Mini Mental Test, MMT) was developed to allow the rapid assessment of age-related behavioural differences in family dogs2. Older dogs displayed less social interest, poorer spatial memory, and seemed less interested in and less fearful of a novel, moving object2. However, neither test–retest nor interobserver reliability was reported for this test battery, although both are necessary before the tests can be applied in clinical settings.

The development and quality of behavioural assessments should be assessed through five key measures: defining the test’s purpose, standardisation, reliability, validity, and practicality (or feasibility)22. A biological measurement is the cumulative result of several factors: the true value of the phenomenon that we intend to measure, biological variation, tool sensitivity, the skill and expectation of the observer and the experimenter, subject-related factors (e.g., hunger, fear), as well as external factors, such as environmental temperature or visual, olfactory, and auditory stimuli23. In a standardised test, two parameters are assessed to determine whether the test can be considered relevant and accurate: reliability and validity24,25.

A measure is considered reliable when it is consistent and stable over multiple measurements25. There are three criteria for reliability: 1) intra- and interobserver reliability or agreement, which is the level of consistency within and between observers/coders, assessing the effect of subjective bias on the coding/scoring system25; 2) internal consistency, indicating coherence among components of a scale aimed to measure the same phenomenon26; and 3) test–retest reliability, which shows that the test yields the same results when repeated on the same subjects under identical conditions22.

Validity indicates that the method measures what it is meant to measure, both internally and externally22,26. Internal validity relates to the value of the measure itself, and it is assessed through three categories27. Content validity or a test’s scientific relevance indicates that the method only contains measures relevant to its aims. Construct validity shows whether the hypothesised cause explains the test scores. Criterion validity (predictability) indicates the predictive ability of the measurement in comparison with a previously validated instrument (a “Gold Standard”). Finally, external validity is the degree to which results can be generalised across studies27.

Behavioural tests are frequently used in various contexts, for example, to assess temperament or personality in pet, working, and shelter dogs22,27,28, which may be assessed in person or remotely26.

Despite the widespread use and the importance of behavioural testing for ageing research, some shortcomings have been identified. Some tests require a long training interval and, therefore, cannot be repeated over a short period (e.g.4), which makes it impossible to use them to monitor age-related behaviour changes in a longitudinal study design29. Others rely on social interaction16,30, which different dogs may perceive differently depending on the partner. For example, test accuracy may be undermined by the different responsiveness of dogs towards male and female experimenters. Previous research indicates that shelter dogs show a stronger decrease in defensively-aggressive behaviours (tendency to look, bark) towards women31, lower levels of plasma cortisol and more relaxed posture when petted by women32, as well as more stress-related behaviours (tendency to look, shorter tail-high periods, lip-licking) when walked on a leash by men33. The influence of human gender on behaviour has been understudied in companion dogs34,35 and, so far, has not been listed as a potential confounding factor in field tests aimed at assessing age-related interspecific social behavioural differences. Finally, cognitive tests designed to measure positive affective states have replicability issues and may not be reliable in ageing dogs due to the extensive learning required: for example, studies based on the cognitive bias test, a test for mood utilising discrimination choices, showed that older dogs might struggle to learn the discrimination and therefore it may not be possible to test them4,36. Currently, there are no standardised tests that can measure positive emotions in senior animals, although clinicians need them.

Previously, we determined that the MMT demonstrated content and construct validity (internal validity) and a good degree of external validity2,20. This study aimed to investigate the reliability (interobserver, inter-experimenter, test–retest) of the MMT2, to re-examine its internal validity (content and construct validity), and to adapt it to indoor settings, which provide a controlled environment with limited distractions. To measure interobserver and inter-experimenter agreement as well as test–retest reliability, we modified the protocol to include two experimenters (a woman and a man), tested both old and young dogs, and compared the dogs’ behaviour across two occasions (first occasion, T0; second occasion one to two weeks later, T1) and between the two experimenters. The experimenters and an independent observer coded the dogs’ behaviour to calculate interobserver reliability. We also added a new test to the battery to assess spatial memory and neophilia, the Novel object recognition test (NOR)37,38,39. This test is widely used with murine models but, to our knowledge, has not been applied to dogs37.



All procedures complied with national and EU legislation and institutional guidelines in strict accordance with the International Society for Applied Ethology guidelines for the use of animals in research. The study received ethical permission from the Hungarian Pest County Governmental Office following the ethical review of the Eötvös Loránd University (Permission No.: PE/EA/2019–5/2017). Owners provided written consent for their voluntary participation. We took special care to ensure that the dog owners understood the consent process completely. In the consent form, participants were informed about the identity of the researchers, the procedure, location, expected time commitment of the experiment, handling of personal and research data, and data reuse. The owners were not informed about the exact aim of the tests. The information in the consent form included the participant’s right to withdraw their consent at any time. Participants could decline to participate at any point and request that their data not be used and/or deleted after they were collected. Our consent form was based on the Ethical Codex of Hungarian Psychologists (2004). For Fig. 1, we obtained informed consent from all subjects and/or their legal guardian(s) for publication of identifying information/images in an online open-access publication. For experiments involving human participants, written informed consent was obtained from all subjects and/or their legal guardian(s).

Figure 1 Behavioural tests of the test battery. (a) Exploration; (b) Greeting; (c) Novel object recognition; (d) Problem box; (e) Memory; (f) Novel object (toy dog).


Thirty-eight dogs were recruited through the Department of Ethology, ELTE’s database of participants, social media, and word of mouth. Two groups of dogs were formed based on their age: ‘young dogs’ (N = 20, mean age ± SD = 2.7 ± 0.4, median age 3 years, IQR = 2.50–4.00, 50% female, 65% neutered), and ‘old dogs’ (N = 18, mean age ± SD = 11.8 ± 1.3, median age 11 years, IQR = 10.62–12.88, 33% female, 78% neutered). Age categories (1–4 years for young dogs and above 9 years for old dogs) were based on previous findings regarding the onset of cognitive decline (see40,41 for a review). The sample included 14 mixed-breed dogs and 24 purebred dogs from 16 different breeds (Young: 6 mixed breeds, 3 golden retrievers, American Staffordshire terrier, Akita inu, Australian shepherd, Belgian shepherd, border collie, German shepherd dog, Hungarian sighthound, Kerry blue terrier, rottweiler, Siberian husky, standard poodle; Old: 8 mixed breeds, 2 border collies, American Staffordshire terrier, Belgian shepherd, golden retriever, Hungarian sighthound, labrador retriever, shar pei, vizsla, whippet; see Table S1 for full demographic information). Dogs in both groups were free from overt signs of distress and/or pain during the test.


The study was performed in an experimental room at the Department of Ethology, ELTE. Two tablets (Samsung Galaxy Tab S2), positioned at opposite corners of the room, recorded the behavioural performances of the dog during the test (Fig. 1).

The battery consisted of six indoor subtests (Fig. 1). An experimenter was present in the room for all subtests apart from the Exploration subtest. The owner stood on the experimenter’s left-hand side and a coder, who coded some of the tests live, stood on the experimenter’s right-hand side. The owner kept the dog on the leash unless instructed otherwise.

The dogs underwent the same test twice (T0 = first test, T1 = second test, after 1 to 2 weeks) to measure test–retest reliability. Different objects were used in the second test when the dogs had to be naïve to a specific object (see Supplementary Material). The same experimenter and coder performed the test on both occasions. Half of the dogs were tested by a male experimenter, and the other half by a female experimenter. The allocation to each experimenter was counterbalanced across dogs within age groups (see Table S2).
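The counterbalancing described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name, the dog identifiers, and the shuffle-then-alternate rule are hypothetical, and the allocation actually used in the study is the one reported in Table S2.

```python
import random
from collections import Counter


def allocate_experimenters(dogs, seed=0):
    """Assign each dog to the 'female' or 'male' experimenter.

    `dogs` is a list of (dog_id, age_group) tuples. Within each age group
    the dogs are shuffled and then assigned alternately, so the two
    experimenters test the same number of dogs (or differ by at most one).
    This rule is an assumption made for illustration only.
    """
    rng = random.Random(seed)
    groups = {}
    for dog_id, age_group in dogs:
        groups.setdefault(age_group, []).append(dog_id)

    allocation = {}
    for ids in groups.values():
        rng.shuffle(ids)  # random order within the age group
        for position, dog_id in enumerate(ids):
            allocation[dog_id] = "female" if position % 2 == 0 else "male"
    return allocation


# Hypothetical identifiers matching the reported group sizes (20 young, 18 old).
dogs = [(f"Y{i:02d}", "young") for i in range(20)]
dogs += [(f"O{i:02d}", "old") for i in range(18)]

allocation = allocate_experimenters(dogs)
counts = Counter((group, allocation[dog_id]) for dog_id, group in dogs)
print(counts)
```

Because each dog keeps the same experimenter at T0 and T1, the allocation only needs to be drawn once per dog.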

The behavioural variables measured are presented in Table 1.

Table 1 Subtests of the battery, variables, and their definition (modified from Kubinyi and Iotchev, 2020).


Exploration

The goal of this subtest was to measure the dogs’ activity level and interest in investigating a novel environment2,42,43. The owner walked into the room with the dog on the leash and stayed in a pre-determined position (Fig. 1a) for one minute while reading a paper given by the experimenter (to prevent the owner from looking at or talking to their dog).


Greeting

This subtest aimed to measure the sociability of dogs toward unfamiliar friendly people2,42,43. The experimenter entered the room and greeted the dog (Fig. 1b). If the dog approached the experimenter, the interaction continued in a standardised way (see Supplementary Material), including a ball or tugging game.

Novel object recognition (NOR)

The goal of this subtest was to measure neophilic behaviour44 and short-term memory. Dogs were presented, in a pre-determined order (Table S2), with two pairs of containers with different shapes and colours (Fig. S1). After the dog had explored the first pair for one minute, it was taken out of the room (Fig. 1c). The experimenter swapped the containers with a new pair, in which one container was identical to the first one and the second container had a novel shape and colour. The dog-owner dyad re-entered the room, and the dog had one minute to explore the containers. The position of the novel container and the types of containers were pseudo-randomised and counterbalanced between dogs and between T0 and T1 (Table S2).
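The two NOR variables can be illustrated with a short Python sketch. The study's operational definitions are those given in Table 1, which is not reproduced here, so the sketch assumes the definition commonly used in rodent NOR work: the share of total exploration time spent on the novel object. The function names and the example durations are hypothetical.

```python
def recognition_index(time_novel, time_familiar):
    """Share of exploration time spent on the novel container (0 to 1).

    Assumed definition: novel / (novel + familiar). Returns None when the
    dog explored neither container, because the ratio is then undefined.
    """
    total = time_novel + time_familiar
    if total == 0:
        return None
    return time_novel / total


def explored_novel_first(first_container):
    """Binary neophilia measure: did the dog approach the novel container first?"""
    return first_container == "novel"


# Hypothetical dog: 30 s on the novel container, 10 s on the familiar one.
index = recognition_index(30, 10)
print(index, explored_novel_first("novel"))
```

Under this assumed definition, a value above 0.5 means the dog spent more time on the novel container than on the familiar one.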

Problem box

This subtest aimed to measure the dogs’ persistence. The dog was presented with a food toy (Kong wobbler (Fig. 1d)), filled with 20 pieces of dry food, and had one minute to try and retrieve the food by manipulating the toy with the paw or mouth to make the food drop from a small hole (‘solvable task’). Then the experimenter filled the toy again with a single large piece of dry meat, which was too big to get through the hole, so it was not possible for the dog to retrieve the food (‘unsolvable task’). The dog was given the toy for one minute. None of the dogs had previous experience with this type of toy.


Memory

The goal of this subtest was to detect differences in the dogs’ short-term spatial memory. The dogs were presented with five identical containers (Fig. S2) placed in a semi-circle (Fig. 1e). The experimenter placed a piece of food in one of the containers, which the dog was allowed to retrieve after a break outside the room, according to the procedure described in Piotti et al.3. The procedure was repeated five times, once per container, and the order of the baited container’s location was counterbalanced and pseudo-randomised across participants and varied between T0 and T1 (Table S2). In addition, at the end of T1, the dogs were presented with three additional trials (‘Control Trials’) where the location of the baited container was changed while the dog was prevented from seeing the baiting. This was done to exclude the possibility that the dogs followed odour cues in this subtest.

Novel object (toy dog)

The objective of this test was to measure dogs’ neophilia and neophobia. The dogs were presented for 30 s with an electronic, moving toy dog (Fig. 1f) placed on the ground by the experimenter, according to the procedure described in Kubinyi and Iotchev2. Two toys, identical in shape and rough movement, but different in colours and sound, were used at T0 and T1, and the order was counterbalanced across dogs (Fig. S1 and Table S2).

Statistical analysis

Analyses were performed using R statistical software45 and the packages psych46, ordinal47, and lme448. Cumulative link mixed models (CLMMs) were calculated to analyse ordinal (score) data. The cauchit link function was used for the activity level variable, the probit link function for the social interaction variable, and the log-log link function for the object manipulation variable.

Generalised linear mixed models (GLMMs) were used to analyse frequency, continuous and binomial data. The recognition index, memory errors, object avoidance, and object interaction variables had Poisson error distribution, while the neophilic behaviour and spatial memory/control trials had binomial error distribution.

For each model, we initially created a global model including all the variables of interest as fixed factors, with no interactions, and the dog as a random factor. Each global model included ‘age group’ (old vs young), ‘test number’ (test vs retest), and ‘experimenter’ (A vs B) as fixed factors. The model for the response variable ‘object manipulation’ also included the variable ‘test phase’ (solvable vs unsolvable). The global models for the response variables ‘errors’ and ‘spatial memory’ included ‘trial’ (1 to 5). The main factors ‘age group’, ‘test number’ and ‘experimenter’ were retained in all models as part of our main hypothesis, while for all other factors, we adopted a stepwise approach to select the most parsimonious model for each response variable. Pairwise post-hoc comparisons with Tukey correction were then obtained.

We used a Wilcoxon signed rank test to compare the proportion of times the dogs chose the baited container in trials 1, 3, and 5 at T1 with the proportion during the corresponding control trials in the Memory test.
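For readers unfamiliar with the paired comparison above, the following Python sketch shows how a Wilcoxon signed-rank statistic is computed. It is not the authors' analysis, which was run in R. It assumes a two-sided test, drops zero differences, gives tied differences average ranks, and uses a normal approximation with tie and continuity corrections. The outcome vectors are invented for illustration.

```python
import math


def wilcoxon_signed_rank(x, y):
    """Two-sided Wilcoxon signed-rank test for paired samples.

    Returns (w_plus, w_minus, p_value). The p value comes from a normal
    approximation; for small samples without ties an exact test would be
    preferable.
    """
    diffs = [a - b for a, b in zip(x, y) if a != b]  # drop zero differences
    n = len(diffs)
    if n == 0:
        raise ValueError("all paired differences are zero")

    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * n
    tie_term = 0.0
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        average_rank = (i + j) / 2 + 1  # mean of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        size = j - i + 1
        tie_term += size**3 - size
        i = j + 1

    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    w_minus = sum(r for r, d in zip(ranks, diffs) if d < 0)

    mean = n * (n + 1) / 4
    variance = n * (n + 1) * (2 * n + 1) / 24 - tie_term / 48
    z = max(0.0, abs(w_plus - mean) - 0.5) / math.sqrt(variance)
    p_value = math.erfc(z / math.sqrt(2))  # two-sided tail probability
    return w_plus, w_minus, p_value


# Hypothetical per-dog outcomes (1 = chose the baited container, 0 = did not).
witnessed = [1, 1, 1, 0, 1, 1, 1, 1, 0, 1]
control = [0, 0, 1, 0, 0, 1, 0, 0, 0, 0]
w_plus, w_minus, p_value = wilcoxon_signed_rank(witnessed, control)
print(w_plus, w_minus, p_value)
```

With binary outcomes such as these, every non-zero difference has the same magnitude, so all ranks are tied and the tie correction matters.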

Finally, an independent coder (AS) coded 20% of the video material (16 tests out of 76, from both T0 and T1), and interobserver reliability was assessed using interobserver agreement (kappa) for scores, Cronbach’s alpha for binary data, and Spearman correlations for count data.
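As an illustration of the agreement statistic used for the score variables, the sketch below computes an unweighted Cohen's kappa in plain Python. The text does not state whether a weighted or unweighted kappa was used, so the unweighted form is an assumption, and the scores are invented for the example.

```python
import math
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Unweighted Cohen's kappa for two raters scoring the same items.

    kappa = (p_observed - p_expected) / (1 - p_expected), where p_expected
    is the agreement expected by chance from each rater's category
    frequencies. Returns NaN when chance agreement equals 1 (both raters
    used one and the same category throughout), as kappa is then undefined.
    """
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("ratings must be non-empty and of equal length")
    n = len(rater_a)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    freq_a = Counter(rater_a)
    freq_b = Counter(rater_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / n**2
    if expected == 1:
        return math.nan
    return (observed - expected) / (1 - expected)


# Hypothetical ordinal scores (0-2) given by two coders to six recordings.
coder_1 = [0, 1, 2, 2, 1, 0]
coder_2 = [0, 1, 2, 2, 1, 0]
kappa = cohens_kappa(coder_1, coder_2)
print(kappa)
```

Identical score vectors give a kappa of 1, whereas agreement no better than chance gives a value near 0.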

Ethical approval

The procedures of this study complied with national and EU legislation and institutional guidelines. The study received Ethical Permission from the Hungarian Pest County Governmental Office following the ethical review of the Eötvös Loránd University (Permission No.: PE/EA/2019–5/2017).


Age group

Young and old dogs differed in four variables from three subtests (Table 2). Young dogs were more likely to interact with the novel object first during the NOR, they chose fewer incorrect containers in the Memory subtest (Fig. 2, Fig. S3), and they interacted for a longer time with the toy dog and spent less time avoiding it in the Novel object (toy dog) subtest (Fig. 3, Fig. S4).

Table 2 The results of three cumulative link mixed models (CLMMs) and six generalised linear mixed models (GLMMs). For each predictor, the estimate, the standard error (S.E. in brackets), and the p value (in italics) are reported. Significant p values are bolded.

Figure 2 Number of errors in the memory test. On average, the old dogs made more errors in the memory test compared to the young dogs. A breakdown of the number of errors made in the memory test, divided by age group, is presented in the figure. The middle line in the box plots represents the median number of errors, the extremes of the boxes represent the lower and upper quartiles, and the error bars represent the minimum and the maximum number of errors. The asterisks indicate a statistically significant difference between the groups (** = p < 0.01).

Figure 3 Object avoidance in the Novel object (toy dog) test. The old dogs avoided the toy for a larger proportion of time compared to the young dogs. A breakdown of the percentage of time spent avoiding the toy, divided by age group, is presented in the figure. The middle line in the box plots represents the median proportion of time spent avoiding the toy, the extremes of the boxes represent the lower and upper quartiles, and the error bars represent the minimum and maximum proportion of the time. The asterisk indicates a statistically significant difference between the groups (* = p < 0.05).


The dogs’ behaviour differed between T0 and T1 in four variables of three subtests. At T1, social interaction scores in the Greeting subtest were higher than at T0, i.e. the dogs were quicker to move closer to the experimenter. The Recognition Index and the neophilic behaviour in the NOR subtest were lower than at T0, indicating that the dogs spent less time investigating the novel object as they were less interested in it. Finally, the spatial memory scores were lower in the Memory test, i.e. the dogs found the baited container less frequently during the second test occasion (Table 2).


There were no significant differences between dogs tested by the male and the female experimenter in any of the variables measured (all p > 0.05; Table 2).

Control trials in the memory subtest

The dogs were more successful in choosing the baited container when it was in the location they had witnessed during T1 (Trial 1: p = 0.008; Trial 3: p < 0.001; Trial 5: p = 0.003), compared to the control trials (see Table S3 for statistical details).

Interobserver agreement

Interobserver agreement (kappa), Cronbach’s alpha, and Spearman correlations indicated excellent agreement between coders, as all values were equal to 1 and all p < 0.001 (Table S4).


The first goal of this study was to measure the reliability of a battery of six indoor behavioural subtests for the rapid assessment of behavioural differences between young and old companion dogs. Our results indicate that the variables object avoidance in the Novel object (toy dog) subtest and the errors in the Memory subtest are reliable and can be used to monitor age-related behavioural changes in companion dogs. Both measures were unaffected by the experimenter’s identity or the re-testing. Furthermore, these variables were associated with good interobserver reliability (see Supplementary Material), confirming that the subtest coding was well standardised.

The second aim was to reproduce the results obtained in our previous studies2,3 in an indoor setting. The Novel object (toy dog) subtest confirmed the large effect of age previously observed2. Younger dogs were much less avoidant of the toy than older animals, meaning that the time they spent moving away from the toy was shorter.

Avoidance is a behavioural manifestation of fear or anxiety49, which is known to increase in dogs as they age13. Age-related changes in the regulation of emotions in dogs are thought to depend on the degeneration of the amygdala, causing increased sensitivity to negative stimuli7. However, anxiety in senior dogs may have multiple causes, such as central or peripheral neuropathology, sensory decline6, metabolic, gastrointestinal or urogenital disease, dermatological conditions, pain13, or underlying behaviour problems which worsen with time50,51,52.

During the Novel object subtest, younger dogs spent more time interacting with the toy dog than older dogs. Persistence in interaction with objects might depend on differences in motivation53, which may decline in ageing dogs due to cognitive or physical changes. Moreover, some dogs may have perceived their interaction with the object as a playing activity, and the results could indicate a stronger inclination for playfulness in younger individuals. Playing is a sign of positive emotional states54,55,56, which are fundamental for the individual’s quality of life and should therefore be monitored in senior dogs57. Nevertheless, despite a significant difference between young and old dogs, according to our results, the variable Object interaction in the Novel object subtest displays a re-test effect; therefore, this variable should not be coded over time.

We also replicated our previous findings of a reduced short-term spatial memory performance in aged dogs compared to young dogs2,3, thus confirming the efficiency of this subtest in detecting age-related differences in an independent population of companion dogs. Older dogs more often chose the wrong locations compared to younger dogs. The control trials excluded the possibility that the dogs followed odour cues during the subtest. Therefore, we can conclude that they relied on the visual information they had gathered during the first part of the subtest to find the hidden food in the second part. These findings indicate that the Memory subtest is a reliable and valid behavioural test which could be used to monitor longitudinal changes in dogs’ spatial memory.

Previously, we observed a correlation between errors in the Memory subtest and the composition of the canine gut microbiome20. The present findings could have a large practical impact on the welfare of dogs, as they would allow veterinarians and other animal professionals to use a standardised, reliable, valid, and practical test to monitor an important cognitive skill as dogs age. Such tests are fundamental for distinguishing between normal and pathological ageing41, as well as for monitoring the progress of age-related pathologies, such as Cognitive Dysfunction Syndrome58. The cognitive decline caused by other medical conditions, such as epilepsy59, could also be monitored.

For the remaining subtests, however, we did not reproduce the previous findings. Kubinyi and Iotchev2 detected a small age effect in the Problem box subtest in an outdoor setting, but the present study suggests that, even if the test appears to be consistent over time, it seems to have no construct validity for ageing itself. Similar findings were reported by González-Martinez et al.60, who found significant differences between groups at different levels of cognitive decline (young dogs vs. aged dogs at normal cognitive levels and aged dogs with impaired cognitive levels) but did not detect significant differences between age groups (1–4 years, 5–8 years, 9 years and above). In the current study, young and old dogs manipulated the object similarly. Thus, the test may not be consistently effective in detecting age-related behavioural modifications in companion dogs. Similarly, in the Greeting subtest, we did not detect a significant difference in social interaction between young and old dogs, and this variable seems to be affected by the repetition of the test. Therefore, this subtest should not be considered reliable and suitable for longitudinal evaluations of ageing, at least not in indoor tests. Activity levels in the Exploration subtest were consistent between T0 and T1, but old and young dogs’ performances did not differ significantly, meaning that the dogs’ exploratory behaviour in this subtest was not an effective measure of ageing. According to these findings, the Exploration and Greeting tasks, as described in the current protocol, should not be employed to monitor age-related differences in companion dogs.

This battery of subtests presented, for the first time, a paradigm to measure novel object recognition (NOR) in family dogs. Contrary to what is largely observed in other species, such as murine models37,38,39, we did not find a difference between younger and older dogs in the standard measure of the recognition of the novel object, the Recognition Index. However, the older dogs demonstrated lower neophilic behaviour (i.e., fewer dogs explored the novel object first) than younger dogs. While this difference may not be predictive of cognitive decline, it is in line with the findings on object interaction in the Novel object (toy dog) subtest, suggesting a decrease in curiosity and motivation towards objects with age.

Although some of the subtests of the battery have detected reliable behavioural differences associated with ageing, we must point out that the present results are restricted to a specific population of companion dogs. Firstly, the subjects tested in this study did not suffer from any overt medical conditions, and the dogs did not undergo a cognitive assessment. Therefore, the present results reflect behavioural differences associated with ageing in dogs without overt medical conditions. Further investigations may help evaluate how and to what extent certain pathologies could influence senior dogs’ behavioural performances in the subtests, which could result in the development of assessment tools to aid in the diagnosis of medical conditions, including Canine Cognitive Dysfunction. Secondly, both groups of dogs were medium to large-sized. It is well known that the ageing process is strongly associated with body size, as small-sized dogs age more slowly than large-sized dogs41. However, it is unclear whether the subject’s size is a confounding variable in subtests that assess age-related behavioural differences in dogs. Since dogs’ body size has not yet been taken into account, future studies should also evaluate the potential effect of this factor to ensure high external validity.


It is often difficult to separate medical and behavioural conditions clinically in senior dogs13,52. The presence of pathologies, such as cognitive impairment, is usually related to a modification of behaviours (e.g. disorientation, altered interactions, anxiety) that is often difficult for both owners and clinicians to quantify13,52. Moreover, factors such as breed and individual differences may further confound the correlation between behavioural modifications and specific clinical conditions13. For these reasons, standardised behavioural tests are particularly useful as they may aid the diagnosis and monitoring of age-related changes in dogs, allowing a clearer distinction between healthy and pathological ageing processes.

Overall, the current findings indicate that two variables from two tests are suitable for assessing age-related differences in companion dogs, namely ‘errors’ in the Memory test and ‘object avoidance’ in the Novel object (toy dog) test. These variables have good interobserver and inter-experimenter agreement, as well as test–retest reliability. Taking previous research into account2,20, the Memory test is both valid and reliable. The Novel object (toy dog) test also appears reliable and demonstrates good external validity; further studies should investigate its internal validity. Since these tests are consistent over time, they can be used in longitudinal research to monitor age-related changes in dogs and to examine the relationship between performance and medical conditions, including Canine Cognitive Dysfunction.