Training basic numerical skills in children with Down syndrome using the computerized game “The Number Race”

Individuals with Down syndrome (DS) present reduced basic numerical skills, which have a negative impact on everyday numeracy and mathematical learning. Here, we evaluated the efficacy of the adaptive (non-commercial) computerized game “The Number Race” in improving basic numerical skills in children with DS. The experimental group (EG; N = 30, Mage-in-months 118, range 70–149) completed a training with “The Number Race”, whereas children in the control group (CG; N = 31, Mage-in-months 138, range 76–207) worked with software aimed at improving their reading skills. The training lasted 10 weeks with two weekly sessions of 20–30 min each. We assessed both groups’ numerical and reading skills before and immediately after the end of the training, as well as at a 3-month follow-up. We found weak evidence for post-training group differences in the overall numeracy score. However, the EG displayed substantial improvements in specific numerical skills and in mental calculation, which were maintained over time, and no improvement in reading. Conversely, the CG showed improvements in their reading skills as well as in number skills, but the latter to a lesser extent than the EG. Overall, “The Number Race” appears to be a suitable tool to improve some aspects of numeracy in DS.

Basic numerical skills in Down syndrome. DS results from the trisomy of chromosome 21 and it is characterised by physical abnormalities (e.g., growth delay, flat hypoplastic face with short nose) and intellectual disability 9 . The cognitive profile of DS presents compromised verbal abilities and relatively preserved visuospatial skills 10 .
Individuals with DS display numerical and mathematical skills that are lower than those of typically developing individuals matched on chronological age and, in some cases, lower than those of individuals matched on mental age 11,12 . Several authors have tried to disentangle whether the observed poor math achievement can be fully explained by lower intelligence or whether it is the consequence of specific numerical deficits, especially in the basic numerical skills that are supposed to be the building blocks of mathematical learning.
There are two mechanisms responsible for representing small and large non-symbolic numerical quantities, respectively 13,14 . The Object Tracking System (OTS) is a domain-general system that allows individuals to rapidly and accurately represent small numerical quantities, usually fewer than 4 items. The fast and accurate enumeration of small sets, known as subitizing 15,16 , is a direct expression of the limited capacity of the OTS, which has been related to visuospatial working memory 17,18 . Large numerical quantities, instead, are processed by the Approximate Number System 13,14,19,20 , whereby each numerosity is represented by a Gaussian distribution of activation on a compressed number line. As a consequence, the representations of neighbouring numerosities progressively overlap, giving rise to a ratio-dependent discrimination performance. That is, the accuracy in discriminating between sets decreases as the ratio between the two numerical quantities gets close to 1 (e.g., easy: 8 vs 12; hard: 16 vs 20).
The present study. Most of the interventions to improve numerical skills in DS have focussed on early numerical skills. Previous studies have reported preliminary evidence on the effectiveness of programmes to improve basic numerical skills 59,60 , non-symbolic number comparison 61 , counting 62-65 , arithmetic 66,67 , conservation of numerical quantities 68,69 , and fractions 70 . However, the research in this area is still scarce and presents relevant methodological limitations 71 . In particular, most of the training studies lack an active control group and/or a follow-up assessment, which makes it difficult to evaluate the efficacy of the training and its long-lasting effects. Here, we evaluate the effectiveness of the adaptive computerized game "The Number Race" 72-74 in improving basic numerical skills in children with DS.
The game (which is freely available for non-commercial use in multiple language versions) targets those numerical skills that are usually acquired in the preschool period. The structure of the game is based on four principles: enhancing number sense; cementing the links between representations of number; conceptualizing and automatizing arithmetic; maximizing motivation. Players compete against the software in a numerical comparison task, choosing the larger between two numerical quantities ranging from 1 to 9, which may be sets of dots, digits, or the results of sums or subtractions. An adaptive algorithm modulates the presentation time of the to-be-compared numerical quantities, the size of the dots, or the numerical distance to keep the difficulty of the game at a challenging level, thus working on the zone of proximal learning 75 . The training on the comparison of small and large non-symbolic and symbolic numerical quantities and on basic arithmetic makes the software an appropriate tool, as children with DS display difficulties in such abilities. The player chooses the larger quantity, so the other quantity is given to the opponent (the software). Afterwards, a board with 40 cells (4 × 10) is presented on a different screen, and players can move their characters as many steps as the numerical quantity they chose in the comparison. The request to move the characters on the game board improves children's ability to position numbers on a spatial layout 73,76,77 , which is another impaired ability in children with DS. The race ends when one of the characters reaches the end of the board. Verbal and acoustic feedback is continuously provided to foster motivation. The software has already provided preliminary, but promising, results on its efficacy in improving basic numerical skills in young children and in children with math difficulties 73,78-81 , even though this research presents methodological limitations and more studies are needed 82 .
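The race mechanics described above (the player keeps the chosen quantity, the software receives the other one, and the race ends when a character reaches the end of the 40-cell board) can be illustrated with a minimal sketch. This is not the game's actual code: the function name `play_race`, the rule that the player moves first within a round, and the omission of the adaptive algorithm are all assumptions made only for illustration.

```python
from typing import Callable, Iterable, Tuple

BOARD_CELLS = 40  # 4 x 10 board, as described in the text


def play_race(
    pairs: Iterable[Tuple[int, int]],
    choose: Callable[[int, int], int],
) -> Tuple[str, int, int]:
    """Simulate one race and return (winner, player_cell, opponent_cell).

    `pairs` holds the two quantities shown on each round and `choose`
    returns the quantity the player keeps. Assumption: the player moves
    before the opponent within a round (the source does not specify this).
    """
    player, opponent = 0, 0
    for a, b in pairs:
        picked = choose(a, b)
        other = b if picked == a else a  # the remaining quantity goes to the software
        player = min(player + picked, BOARD_CELLS)
        if player == BOARD_CELLS:
            return "player", player, opponent
        opponent = min(opponent + other, BOARD_CELLS)
        if opponent == BOARD_CELLS:
            return "opponent", player, opponent
    return "unfinished", player, opponent


# A player who always picks the larger quantity wins this toy race.
print(play_race([(9, 1)] * 5, max))
```

In this toy setting, a player who consistently selects the smaller quantity hands the larger one to the software and loses the race, which is the incentive structure the comparison task relies on.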
Here, we present the results of an intervention study, whereby children with DS in the experimental group (EG) played with "The Number Race" whereas children in the control group (CG) worked with software aimed at improving their reading skills. The use of an active control group was designed to ensure a stringent evaluation of the numerical training effects. Though assessing numerical improvements was the primary aim of the present study, we also evaluated whether the reading training may improve literacy in DS. We assessed both groups' numerical and reading skills at pre-test, at post-test (immediately after the end of the training), and at follow-up (3 months after the end of the training). We expected the EG to show relevant improvements in their numerical, but not reading, skills from pre-test to post-test compared to the CG. Such improvement might still be evident at follow-up, months after the end of the training. Conversely, we expected the CG to show a larger improvement in reading skills compared to the EG.

Methods
Participants. Forty-one children with Down Syndrome (DS) from north-eastern Italy took part in the study, after we obtained informed consent from parents and verbal consent from participants. We recruited participants from local associations, which offer support to families of children with intellectual disabilities. We evaluated at pre-test all underage children willing to take part as long as their parents considered them capable of completing the training under the supervision of the experimenter. However, we excluded two participants who at pre-test were not able to complete the standardized numeracy battery or had serious behavioural issues during the testing session. We allocated 20 participants (12 boys, age range in months = 70-149) to the Experimental Group (EG), which played with the Italian version of "The Number Race" 72,74 , and 21 participants (14 boys, age range in months = 76-207) to the active Control Group (CG), which practised with the software "Fondiamoleletterine" 73 (it translates to "Let's Fuse Letters") or "Lettura di base 3" 83 (it translates to "Basic Reading 3"), which aim to improve reading skills. Among children in the CG, 12 played only with "Fondiamoleletterine", 6 only with "Lettura di base 3" and 3 with both. This subdivision was based on the child's reading ability. The two groups were matched on age, fluid intelligence (Raven's Colored Matrices 84 ), and receptive vocabulary (Peabody Picture Vocabulary Test-Revised; PPVT-R 85 ). Children were assessed on these tasks by the experimenter in one of the testing sessions before the beginning of the training. Participants' characteristics are reported in Table 1. The age equivalent scores of receptive vocabulary (age-in-months = 51.90) and fluid intelligence (age-in-months = 65.80) were in line with those expected from a sample of children with DS between 10-11 years of age. Children with DS also displayed the typical cognitive profile with better non-verbal than verbal (i.e., vocabulary) age equivalent scores.

Procedure. We assigned participants to the EG or CG alternately, that is, the first participant was assigned to the EG, the second to the CG, the third to the EG and so on. The parents of four children explicitly requested that their children receive the numerical training; therefore, these participants were allocated to the EG (this deviation was deemed acceptable in light of the difficulty in recruitment). We assigned the last four participants to the CG to obtain a balanced sample size in the two groups. After participants were allocated to the experimental or control group, we assessed their cognitive skills in three testing phases (i.e., pre-test, post-test, and follow-up), each composed of four separate 45-min sessions on different days. The pre-test and post-test phases were completed respectively before and after the training, whereas participants completed the last (follow-up) phase three months after the end of the training. The testing and training sessions were run in a comfortable and quiet room, individually, at home, at the clinical centre, or at school premises according to participants'
The lexical subscale assesses the ability to read and write Arabic numbers as well as the ability to connect number-words to the corresponding digits. The semantic subscale measures the ability to compare numerical quantities (i.e., dots and Arabic digits). The counting subscale assesses the ability to recite the number-words sequence forward and backward as well as the knowledge of the order of Arabic digits from 1 to 5. The pre-syntactical subscale evaluates the ability to link numbers to sets of dots and to order objects based on their size. We used the sum of the four subscales as an index of basic numerical abilities.
Number words comparison. In this task, children were asked to indicate the larger between two number words. The experimenter said: "Which is 'more' between x candy/ies and y candy/ies?", where x and y were number words ranging from 1 to 9. The comparisons were: 4 vs 2, 2 vs 7, 3 vs 8, 2 vs 1, 8 vs 7, 5 vs 4, 3 vs 6, 7 vs 6, 1 vs 5, 9 vs 3, 1 vs 4. We calculated the percentage of errors as the outcome measure. For this task, we only collected the total number of correct responses, which prevented us from calculating the split-half reliability. However, the same task showed a split-half reliability of 0.82 when calculated on data from a previous study with preschool children 36 .
Mental calculation task. The experimenter sequentially read aloud eighteen arithmetic problems (8 additions and 10 subtractions), and the child had to respond as fast as they could. Children could count on their fingers to calculate the answer. We calculated the percentage of correct responses. The split-half reliability was 0.95 at pre-test, 0.97 at post-test, and 0.97 at follow-up.
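As an illustration of how a split-half reliability coefficient of this kind can be obtained from trial-level accuracy data, the sketch below correlates the scores on odd and even trials and applies the Spearman-Brown correction. The source does not state how the halves were formed or whether a correction was applied, so the odd/even split, the correction, and the function names are assumptions.

```python
from math import sqrt
from typing import List, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation between two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def split_half_reliability(scores: List[List[int]]) -> float:
    """Odd/even split-half reliability with Spearman-Brown correction.

    `scores` has one row per child and one 0/1 entry per trial
    (1 = correct response). The odd/even split is an assumption.
    """
    first_half = [sum(row[0::2]) for row in scores]   # trials 1, 3, 5, ...
    second_half = [sum(row[1::2]) for row in scores]  # trials 2, 4, 6, ...
    r = pearson(first_half, second_half)
    return 2 * r / (1 + r)  # Spearman-Brown step-up to full test length


# Hypothetical data for three children and six trials.
example = [
    [1, 1, 0, 0, 0, 0],
    [1, 1, 1, 1, 0, 1],
    [1, 1, 1, 1, 1, 0],
]
print(round(split_half_reliability(example), 3))
```

The correction is used here because each half contains only half of the trials, so the raw correlation between halves would underestimate the reliability of the full task.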
Number-to-position task (NTP 49,50,87 ).
Match-to-sample task. A sample set of white dots on a gray background appeared in the centre of the computer screen for 300 ms. Then, another set of black dots appeared on the screen and the child indicated whether the numerosity of the set was the same or different compared to the sample set (i.e., white dots). The numerosity of the target set (i.e., black dots) could be the same or minus/plus one compared to the sample set. There was no time restriction to provide the answer as the target set remained on the screen until the child responded. Size and spatial arrangement of the dots changed in each trial to prevent children from basing their response on non-numerical visual cues. There were 90 experimental trials: 12 trials for each sample numerosity from 2 to 7, and 9 trials each for 1 and 8. Before starting the task, participants completed ten training trials, whereby the presentation of the sample set was longer and decreased trial by trial, to help the child familiarise with the task. We calculated the percentage of correct responses for each participant. The split-half reliability was 0.58 at pre-test, 0.56 at post-test, and 0.62 at follow-up.
Number naming. The child read aloud the Arabic number presented in the middle of the computer screen. We showed all the numbers from 0 to 20 in random order and calculated the percentage of correct responses. The split-half reliability was 0.97 at pre-test, 0.96 at post-test, and 0.97 at follow-up.
Counting. A set of objects (i.e., bananas or apples) appeared on the screen and the child counted it as fast as possible. There were 2 trials for each of the following target numerosities: 1, 2, 3, 4, 5, 8, 10. We calculated the percentage of correct responses. The split-half reliability was 0.80 at pre-test, 0.84 at post-test, and 0.78 at follow-up.
Digit comparison. Children indicated the larger between two digits, ranging from 1 to 9, that were presented on the left and right side of the computer screen respectively. There were 72 trials presenting all the possible comparisons of digits between 1 and 9, each comparison repeated twice, once with the larger digit on the right side and once with the larger digit on the left side of the screen. The split-half reliability was 0.93 at pre-test, 0.93 at post-test, and 0.91 at follow-up.
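The trial count reported for the digit comparison task follows directly from the design: there are 36 unordered pairs of distinct digits between 1 and 9, and each pair is shown once per side, giving 72 trials. The short sketch below rebuilds such a trial list; the function name and the dictionary layout are hypothetical, and the random presentation order is an assumption that the source does not describe.

```python
import random
from itertools import combinations
from typing import Dict, List, Optional


def digit_comparison_trials(seed: Optional[int] = None) -> List[Dict[str, object]]:
    """Build all comparisons of two distinct digits (1-9), once per side."""
    trials: List[Dict[str, object]] = []
    for small, large in combinations(range(1, 10), 2):
        # Larger digit on the right side of the screen
        trials.append({"left": small, "right": large, "larger_side": "right"})
        # Same comparison with the larger digit on the left side
        trials.append({"left": large, "right": small, "larger_side": "left"})
    if seed is not None:
        random.Random(seed).shuffle(trials)  # assumed randomisation of trial order
    return trials


trials = digit_comparison_trials()
print(len(trials))  # 36 pairs x 2 sides = 72
```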
Letter recognition task 88 . The experimenter read aloud a letter in a triplet of letters and the child indicated the corresponding letter. There were 21 triplets of letters and children obtained a point for each correct recognition. We calculated the percentage of errors for each child. The split-half reliability was 0.91 at pre-test, 0.86 at post-test, and 0.85 at follow-up.
Syllable reading 89 . Children read aloud a series of syllables (a matrix of 10 × 10), in order, from left to right, as fast as possible. The number of errors was calculated for each child. The split-half reliability was 0.997 at pre-test, 0.996 at post-test and 0.997 at follow-up.
Word and pseudoword reading 90 . Children read aloud four lists of 28 words and three lists of 16 pseudo-words. The lists were presented one at a time. We calculated the total number of errors for each child. Test-retest reliability was r = 0.56.
The selected numerical tasks measure those abilities that are the target of the Number Race. Accordingly, the Number Race repeatedly asks children to compare non-symbolic and symbolic numerical quantities as measured in the match-to-sample, number words comparison, and Arabic digit comparison tasks. Some children might count the dots in each set, thereby using serial counting, which we assessed in the counting task. In the advanced stages of the game, the to-be-compared numerical quantities are the results of additions and subtractions, which were tested in the mental calculation task. Moreover, the software always reads aloud the Arabic digits so children can improve the connection between visual and verbal representation of numbers, which we measured using the naming task. The Number Race asks children to move the game characters on a linear board, thereby improving the association between number and space, which was evaluated using the number line tasks. It is worth noting that the administered tasks were structurally similar to the component tasks that form the Number Race game. A notable exception is the number line tasks, because children were not required to mark the spatial position of target numbers in the Number Race. In this vein, the number line tasks could be considered as near transfer tasks. Finally, the BIN is a standardised battery assessing different aspects of numerical knowledge in preschool children, an age range that corresponds to the mental age of the participants with DS. Similarly, we selected the letter recognition, syllable reading, and word and pseudoword reading tasks to assess the effectiveness of the control (reading) intervention.
Training. Both the experimental and the active control group completed 20 training sessions across ten weeks, with two weekly sessions of 20-30 min each. The experimenter supervised participants during the training sessions and continuously provided feedback to support participants' engagement with the training. Participants were encouraged to engage with the training activity for at least 20 min, but no more than 30 min. The training was delivered on the experimenter's laptop and all children completed the planned number of sessions.
Children in the experimental group played with the Italian version of "The Number Race" game 72-74 . Children in the active control group underwent an intervention based on "Fondiamoleletterine" 91 or "Lettura di base 3" 83 , according to their age and level. The former is a software aimed at supporting the early steps of reading acquisition by training phoneme blending. Moreover, it has been shown to be effective in children with DS, in particular, leading to improvements in decoding of syllables and words and repetition of auditory stimuli 92 . It is composed of seven levels of increasing difficulty: in the first one, letters are associated by shape and sound to a figure (to facilitate memorization), while in the following levels, phoneme blending is trained, starting from syllables of increasing difficulty (consonant + vowel at the beginning, three-letter syllables later), to disyllabic words. The child can then listen to the syllable/word reproduced by the software. The latter intervention improved word reading. The game is composed of different kinds of activities, where the child is required to read words or brief texts, for example, phonemic and syllabic inference, word-picture association, vertical word reading. The software gave feedback and reinforcements to the participant and all the activities were set in a playful environment to make exercises pleasant and motivating. The syllables and words utilized in these training sessions were not the same as those adopted in the testing sessions.

Results
We analysed the effect of the training by running mixed ANOVAs for each task with Session (Pre-test vs. Post-test vs. Follow-up) as a within-subjects factor and Group (EG vs. CG) as a between-subjects factor. When the assumption of sphericity was violated, the Greenhouse-Geisser adjustment was applied to p-values (reported as p[gg]). Post-hoc t-tests were two-tailed and the p-values were corrected for multiple comparisons using the Bonferroni method (i.e., alpha value divided by the number of comparisons). Hedges' g was calculated to determine the magnitude of the difference between the groups at each session. We also reported Bayes factors (BF10) expressing the probability of the data given H1 relative to H0 (i.e., values larger than 1 are in favour of H1 whereas values smaller than 1 are in favour of H0 93,94 ). We reported the Bayes factors (BF) as the ratio of the BF10 values of the compared models. If the ratio between the BF10 of model A and the BF10 of model B is larger than 1, then there is evidence for model A. Conversely, if the ratio is smaller than 1, there is evidence for model B. We reported the scores in the administered tasks across sessions for the two groups in Table 2. The zero-order correlations between all pre-test scores are reported in Table S1 in the Supplementary Information. In Table 3, we reported the results of mixed ANOVAs. The contrasts between and within groups were reported only when the interaction between Session and Group was significant. According to Bonferroni correction, we adjusted the alpha levels to 0.016 (i.e., 0.05/3) for comparisons between groups in each one of the three sessions, and to 0.008 (i.e., 0.05/6) for comparisons between pre-test and post-test, pre-test and follow-up, and post-test and follow-up within the two groups. Due to a computer failure, results at follow-up in the mental calculation, naming, counting and digit comparison tasks were missing for 6 children of the EG and 3 of the CG, and results in the match-to-sample task were missing for 6 children of the EG and 11 of the CG.
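The two quantities used for the contrasts above, the Bonferroni-adjusted alpha level and Hedges' g, can be computed as in the following sketch. The source does not report which formula was used for Hedges' g; the version below (Cohen's d with a pooled standard deviation, multiplied by the common approximate small-sample correction) is an assumption, as are the function names and the example scores.

```python
from math import sqrt
from statistics import mean, variance
from typing import Sequence


def bonferroni_alpha(alpha: float, n_comparisons: int) -> float:
    """Alpha level divided by the number of comparisons."""
    return alpha / n_comparisons


def hedges_g(group_a: Sequence[float], group_b: Sequence[float]) -> float:
    """Standardised mean difference with small-sample correction.

    Assumed formula: d = (mean_a - mean_b) / pooled SD, then
    g = d * (1 - 3 / (4 * (n_a + n_b) - 9)).
    """
    n_a, n_b = len(group_a), len(group_b)
    pooled_var = (
        (n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)
    ) / (n_a + n_b - 2)
    d = (mean(group_a) - mean(group_b)) / sqrt(pooled_var)
    correction = 1 - 3 / (4 * (n_a + n_b) - 9)
    return d * correction


print(bonferroni_alpha(0.05, 3))  # between-group contrasts, three sessions
print(bonferroni_alpha(0.05, 6))  # within-group contrasts, both groups
print(hedges_g([2, 4, 6], [1, 3, 5]))  # hypothetical scores
```

Note that 0.05/3 is approximately 0.0167 and 0.05/6 is approximately 0.0083, which correspond to the thresholds of 0.016 and 0.008 given in the text.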

BIN. We found strong evidence (BF = 19.46) in favour of the model with the two main effects and the interaction compared to the model with only the two main effects (Fig. 1). Although we found anecdotal evidence for a difference between the two groups at post-test and follow-up, both groups improved their scores from pre-test to post-test and from pre-test to follow-up but not from post-test to follow-up. These improvements appear to be more substantial in the EG (extreme evidence) compared to the CG (strong evidence).
Number comparison. We found strong evidence (BF = 18.54) in favour of the model with the two main effects and the interaction compared to the model with only the two main effects (Fig. 2). The EG displayed a better performance compared to the CG at follow-up (moderate evidence). Moreover, only the EG reduced the errors from pre-test to post-test and such improvement was maintained until the follow-up (extreme evidence), which did not differ from the post-test.
Mental calculation. We found moderate evidence (BF = 4.61) in favour of the model with the two main effects and the interaction compared to the model with only the two main effects (Fig. 1). Although we found anecdotal evidence for a difference between the two groups at post-test and follow-up, only the EG group displayed a better performance (moderate evidence) between pre-test and post-test and between pre-test and follow-up.
Number-to-position task. For the 0-10 interval, we found anecdotal evidence (BF = 0.405) in favour of the model with the two main effects compared with the model also including the interaction between session and group (Fig. 1). For the 0-20 interval, there was moderate evidence (BF10 = 4.07) in favour of the model with the two main effects and the interaction and strong evidence (BF = 32.31) for its superiority compared to the model with only the two main effects (Fig. 1). The EG did not differ from the CG at pre-test or post-test, but showed a better performance at follow-up (moderate evidence). However, we found only anecdotal evidence for improvement between pre-test, post-test, and follow-up in the two groups.
Match-to-sample. We found anecdotal evidence (BF = 2.59) in favour of the model with the two main effects and the interaction compared to the model with only the two main effects (Fig. 1). There was anecdotal evidence for a difference between groups across the three testing sessions. However, the EG displayed better performance in the post-test (strong evidence) and follow-up (moderate evidence) compared to the pre-test session.
Number naming. We found moderate evidence in favour of the model with the main effects of session and group (BF = 0.22) compared to the model with additionally the interaction (Fig. 1).
Counting. We found moderate evidence in favour of the model with the main effects of session and group (BF = 0.22) compared to the model with additionally the interaction (Fig. 1).

Digit comparison. We found supporting evidence neither for the model including the interaction nor for the model with the two main effects only (BF = 1.01) (Fig. 1).
Letter recognition. We found anecdotal evidence in favour of the model with the main effects of session and group (BF = 0.42) compared to the model with additionally the interaction (Fig. 2).

Syllable reading. There was extreme evidence (BF = 181) in favour of the model with the two main effects and the interaction compared to the model with only the two main effects (Fig. 2). We found mainly anecdotal evidence for a difference between the two groups in the three testing sessions. However, there was moderate evidence for improvement between pre-test and post-test and pre-test and follow-up in the CG.
Word and pseudoword reading. For word reading, there was moderate evidence (BF = 6.67) in favour of the model with the two main effects and the interaction compared to the model with only the two main effects (Fig. 2). There was mainly anecdotal evidence for a difference between the two groups in the three testing sessions. Only the CG showed moderate evidence for improvement between pre-test and follow-up. For pseudoword reading, there was moderate evidence (BF = 7.42) in favour of the model with the two main effects and the interaction compared to the model with only the two main effects (Fig. 2). There was anecdotal evidence for a difference between groups in the testing sessions. However, the CG showed an improvement in performance from pre-test to post-test (strong evidence) and from post-test to follow-up (moderate evidence).
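The Bayes factors reported in this section are ratios between the BF10 of the model with the interaction (model A) and the BF10 of the model with only the two main effects (model B), so values below 1 favour the simpler model. The sketch below shows how such a ratio can be formed and mapped onto verbal labels. The cut-offs (3, 10, 30, 100) follow a commonly used classification and are an assumption; the labels in the text may rest on slightly different boundaries (e.g., BF = 32.31 is described above as strong evidence), and the BF10 inputs in the example are hypothetical.

```python
from typing import Tuple


def bf_ratio(bf10_model_a: float, bf10_model_b: float) -> float:
    """Evidence for model A over model B as a ratio of their BF10 values."""
    return bf10_model_a / bf10_model_b


def evidence_label(bf: float) -> Tuple[str, str]:
    """Return (favoured model, verbal label) for a Bayes factor ratio.

    Cut-offs are an assumed, commonly used classification and are not
    taken from the source.
    """
    if bf <= 0:
        raise ValueError("A Bayes factor must be positive.")
    favoured = "A" if bf > 1 else "B" if bf < 1 else "neither"
    strength = bf if bf >= 1 else 1 / bf  # express evidence on a >= 1 scale
    if strength < 3:
        label = "anecdotal"
    elif strength < 10:
        label = "moderate"
    elif strength < 30:
        label = "strong"
    elif strength < 100:
        label = "very strong"
    else:
        label = "extreme"
    return favoured, label


print(evidence_label(19.46))  # BIN
print(evidence_label(0.22))   # number naming, counting
print(evidence_label(bf_ratio(50.0, 10.0)))  # hypothetical BF10 values
```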

Discussion
Individuals with DS display numerical deficits that range from processing non-symbolic numerical quantities to arithmetic performance 12 . Several studies have evaluated interventions to improve numerical skills in DS, yielding mixed results 71 . We provide new evidence to this research area by evaluating the efficacy of the computerized game "The Number Race" (a non-commercial, freely available software) in improving numerical skills in children with DS. We assigned participants to an experimental group (EG) who played with "The Number Race" and a control group (CG) who worked on reading skills using two different software programs. We evaluated participants' numerical and reading skills before and immediately after the training as well as after three months. The EG and the CG had similar numerical knowledge at pre-test and both groups displayed some improvement in their basic numerical skills immediately after the training, as measured by the numeracy battery (BIN). However, the EG showed a large improvement, whereas the CG obtained a small improvement. Note that the EG improvement on the numeracy battery (BIN), when evaluated against the normative data (which is available for preschoolers), corresponds to the (average) change observed between 51 and 73 months of age in typically developing children. Importantly, the level of basic numerical skills measured at the end of the training was maintained at the 3-month follow-up. The EG displayed better performance in the number comparison task compared to the CG at follow-up, and only the EG showed such improvement from pre-test to post-test. The EG also displayed improvements in mental calculation, which were maintained at follow-up, compared to their pre-test performance, even though there were no significant differences compared to the CG. The evidence for any changes in the CG's scores in other numerical tasks was anecdotal at best.
Conversely, the CG showed a medium improvement in syllable reading and a small improvement in pseudoword reading at the end of the training. The latter score was further improved at follow-up in the CG. We found mainly anecdotal evidence for any improvement of reading skills in the EG. Overall, we found substantial evidence for an improvement of the EG in some but not all numerical tasks. We speculate that training had deeper effects on quantity understanding and manipulation, thereby yielding larger improvements in tasks that force children to actively manipulate numerical information (BIN, number comparison, and calculation) compared to others that do so to a lesser extent (e.g., number naming, counting). Nevertheless, the visual inspection of means suggests that the performance tended to increase across tasks. This suggests that twenty training sessions might not be enough to observe a substantial between-groups difference in children with DS, whereas they appear to be sufficient in typically developing children 73 . Future studies may explore whether longer training could make improvements in the numerical tasks more evident compared to the control group. Another aspect worth considering is the large variability in children's numerical and reading skills, which might have obscured differences between the two groups. Accordingly, some children might have found some aspects of the training superfluous, whereas others would have benefited more from practising on given components of the training. For example, the Number Race always requires players to move the characters on the game board with the aim of improving the linear relation between space and numbers, which some participants may already possess, thereby making the training redundant.
In this light, a training that aims at improving a variety of numerical skills at the same time, such as the Number Race, might dilute the capacity to generate a significant improvement in one specific skill in a limited amount of time. The Number Race cannot change its structure and fully adapt to the participant's knowledge, which can instead be achieved in one-to-one training under the supervision of an expert. Sometimes what matters is not what is trained but how much it is trained. The moderate evidence for an improvement in mental calculation in DS 49 , which, instead, appeared to be stronger in typically developing preschool children 63 , might be due to the fact that the game requires players to perform additions and subtractions only in the more advanced levels, which are reached when participants' accuracy in comparing symbolic numbers is maintained at a high level 64 . Variable performance might have sent children with DS back to game levels that mainly involve number comparison, thereby preventing access to later game levels in which they could benefit from intense training on arithmetic. Not only what and how much, but also how a numerical skill is trained should be considered. In this vein, another possible reason for the limited improvement in arithmetic is that the Number Race does not provide any support in terms of strategy. Admittedly, the Number Race presents the arithmetic operation in terms of dots that are added to or removed from a given set. However, there is no explicit instruction on using counting strategies such as counting-on from the larger set, which has been successfully taught to children with DS in a previous study 56 .
The CG showed some improvements in reading whereas the performance of the EG remained stable across reading tasks. Although the improvement in the CG might reflect a positive effect of the reading training, such evidence should be carefully considered.
Accordingly, the CG tended to display lower error scores in the reading tasks, which became more similar to those of the EG at post-test and follow-up. This amelioration might be due to simple regression to the mean, rather than a substantial effect of the reading training. The CG also displayed some amelioration in their numerical skills. However, the lack of a waiting list group prevents us from disentangling whether this amelioration could be attributed to the simple effect of time or to the effect of the reading training on early numerical skills 95 .
The Number Race simultaneously trains several numerical abilities such as non-symbolic and symbolic number comparison, counting, arithmetic, number-space association, and number recognition. In this light, the game enhances the link between different representations of numbers, which are the cornerstones of early numerical development 30,37 , while introducing the first arithmetical procedures. Nonetheless, it is difficult to draw strong theoretical conclusions on the relation between trained numerical skills, as has been done in other studies in which, for instance, training based exclusively on non-symbolic stimuli transferred to symbolic numbers 96,97 . In this regard, the Number Race aims at establishing and strengthening the main functional components of the cognitive architecture underlying number processing and mathematical learning 74 rather than improving a specific numerical skill.
One theoretical conclusion concerns the trainability of numerical skills in DS. The descriptive statistics suggest that children with DS could improve in a variety of numerical skills, although the training might need to be longer and more intense compared to typical development. Yet, it is unclear whether individuals with DS have simply memorised some numerical facts and procedures (e.g., "seven is larger than six", "one plus one equals two"), which still constitutes a valuable achievement, or have established a deep understanding of the numerical operations they have been practising. If the latter is the case, an improvement should be observed in numerical skills that were not targeted during the training (i.e., transfer). In this regard, the number line tasks could be considered as transfer tasks. Indeed, children were not explicitly requested to place numbers onto a visual line during the game, although the number-space association was trained when children moved characters on the board. Children in the experimental group tended to place numbers more accurately in the number line tasks, especially for the 0-20 interval, compared to the control group at follow-up, thereby suggesting a real mastery of numerical skills that were not directly trained. The same cannot be said for other tasks whose structure resembled the training proposed in the Number Race, such as comparing dots, as done in the match-to-sample task, or comparing symbolic numbers, as done in the number words and digits comparison tasks. Future studies should test whether training effects generalize to numerical activities that need to be carried out in daily life. The best way to put this hypothesis to the test would be to obtain some real-life measures after the training to assess whether individuals with DS apply the learned numerical skills to different tasks and contexts. This would simultaneously assess the presence of a real transfer effect and the ecological validity of the training.
The conclusions of our study should be carefully considered in light of methodological limitations. The participants were alternately allocated to one of the two training groups with the exception of four participants who were allocated to the numerical training to meet their parents' request. As a consequence, four participants were allocated to the reading training to balance the number of participants in each group. We decided to accommodate the parents' requests to increase compliance with the intervention, reduce attrition and increase our final sample size. The implemented assignment procedure, however, diverges from standard random assignment, thereby requiring caution when interpreting the results. For instance, some children (and families) might have had a more positive attitude toward one of the interventions compared to the other. In this light, measuring expectations and attitudes toward the interventions can ensure that participants (and families) across groups have similar willingness to undergo the intervention 98 .
The small sample size may call into question the reliability of the findings. However, the pattern of results went in the expected direction, with the EG improving in numerical tasks and the CG improving in reading tasks, even where the evidence for a reliable improvement was only anecdotal. Indeed, a close look at the descriptive statistics reveals that the EG displayed on average an increase in performance on almost all the administered numerical tasks, even though there was strong or extreme evidence only in a few instances. Larger sample sizes would provide more information on whether a given numerical skill can or cannot be effectively improved in the chosen time frame by training with the Number Race.
Another limitation is the lack of blinding regarding the training participants received. Research assistants conducting the assessment were aware of the group the participants belonged to. The lack of blinding might have generated a bias in the experimenters assessing the performance before and after the intervention. Future studies should achieve blinding by having different experimenters for the supervision of the training and for the assessment before and after the intervention.
A further limitation of our study is that we did not compare the computerized training with another numerical activity delivered in a more traditional way (e.g., one-to-one teaching with paper-and-pencil materials). Nevertheless, the use of an adaptive computerized task brings some intrinsic advantages, as the game requires minimal supervision and can be easily implemented at home and school under the supervision of a non-expert (e.g., parent or teacher). Still, which training programme is most beneficial for DS remains an open question, not only in terms of increasing numerical and mathematical skills, but also in terms of cost-benefit for institutions, practitioners, families, and individuals.
Despite the above-mentioned limitations, the present study shows that The Number Race can be a promising tool to improve basic numerical skills in children with DS.