Introduction

Numeracy constitutes an essential skill as it has been related to overall academic achievement, financial security, and well-being1,2,3,4,5,6. An adequate numeracy level is also necessary to accomplish those numerical tasks that are part of everyday life, such as calculating bills or managing money proficiently. Everyday math constitutes a crucial goal, especially for individuals with intellectual disabilities7. In this regard, training numerical skills becomes relevant in Down syndrome (DS), which is one of the most frequent conditions characterised by intellectual disability8.

Basic numerical skills in Down syndrome

DS results from the trisomy of chromosome 21 and it is characterised by physical abnormalities (e.g., growth delay, flat hypoplastic face with short nose) and intellectual disability9. The cognitive profile of DS presents compromised verbal abilities and relatively preserved visuospatial skills10.

Individuals with DS display numerical and mathematical skills that are lower than those of typically developing individuals matched on chronological age and, in some cases, lower than those of individuals matched on mental age11,12. Several authors have tried to disentangle whether the observed poor math achievement can be fully explained by lower intelligence or whether it is the consequence of specific numerical deficits, especially in the basic numerical skills that are supposed to be the building blocks of mathematical learning.

There are two mechanisms responsible for representing small and large non-symbolic numerical quantities, respectively13,14. The Object Tracking System (OTS) is a domain-general system that allows individuals to rapidly and accurately represent small numerical quantities, usually fewer than 4 items. The fast and accurate enumeration of small sets, known as subitizing15,16, is a direct expression of the limited capacity of the OTS, which has been related to visuospatial working memory17,18. Large numerical quantities, instead, are processed by the Approximate Number System (ANS)13,14,19,20, whereby each numerosity is represented by a Gaussian distribution of activation on a compressed number line. As a consequence, the representations of neighbouring numerosities progressively overlap, giving rise to a ratio-dependent discrimination performance. That is, the accuracy in discriminating between sets decreases as the ratio between the two numerical quantities gets close to 1 (e.g., easy: 8 vs 12; hard: 16 vs 20).
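The ratio dependence described above can be illustrated with a minimal sketch. It assumes one common formalisation, in which each numerosity is a Gaussian on a logarithmic number line with a shared internal Weber fraction `w`. The function name, the value `w = 0.25`, and the exact formula are illustrative assumptions and are not taken from the studies cited here.

```python
import math


def predicted_accuracy(n1: int, n2: int, w: float = 0.25) -> float:
    """Predicted proportion correct when comparing two numerosities.

    Assumption: each numerosity is a Gaussian with standard deviation w on a
    log-compressed number line, so the difference between the two internal
    representations has standard deviation sqrt(2) * w.
    """
    distance = abs(math.log(n2) - math.log(n1))
    # P(correct) = Phi(distance / (sqrt(2) * w)), with Phi the standard normal CDF.
    # Since Phi(x) = 0.5 * (1 + erf(x / sqrt(2))), the erf argument is distance / (2 * w).
    return 0.5 * (1.0 + math.erf(distance / (2.0 * w)))


easy = predicted_accuracy(8, 12)   # ratio 2:3
hard = predicted_accuracy(16, 20)  # ratio 4:5
print(f"8 vs 12: {easy:.3f}, 16 vs 20: {hard:.3f}")
```

Under these assumptions the predicted accuracy depends only on the ratio between the two sets, so 4 vs 8 and 8 vs 16 yield the same value, and accuracy falls towards chance (0.5) as the ratio approaches 1.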

Young children (30-month-old) with DS failed to discriminate between sets with two and three objects, whereas adults with DS (24-year-old) displayed a performance in a numerosity comparison task similar to control individuals21. Six-year-old children with DS displayed the expected ratio-dependent effect in a numerosity comparison task as they could discriminate between 8 and 16 dots (1:2 ratio) but failed to discriminate between 8 and 12 (2:3 ratio) as similarly done by controls matched on both chronological and mental age22. Similarly, children with DS displayed a performance that decreased as a function of the numerical ratio and was higher for large compared to small numerical quantities23. In line with this evidence, two eye-tracking studies showed that children with DS demonstrate looking patterns, when performing a numerosity comparison task, similar to typically developing individuals matched for mental age24,25. Conversely, others have found that children with DS displayed a lower ability in comparing numerical quantities compared to children matched on mental age, but not in performing non-symbolic addition26. Sella, Lanfranchi, and Zorzi27 used a match-to-sample task to explore the ability to discriminate between small and large numerical quantities in children with DS (14-year-old) and controls. Children with DS displayed a lower performance in comparing small numerical quantities (i.e., 2 vs 3, 3 vs 4) compared to children matched on mental and chronological age. The performance in comparing large numerical quantities, instead, was lower only compared to that of chronologically matched individuals but similar to that of mental age controls. Overall, it emerges that individuals with DS display a specific numerical deficit concerning the processing of small numerical quantities. In contrast, the processing of large numerical quantities appears to be in line with mental age23. 
The deficit in processing small numerical quantities might be explained by a limited capacity of the OTS. Accordingly, children with DS show a tracking system whose capacity is limited to one element28 and a specific deficit in processing more than one item at a time29.

The development of early numerical skills is characterised by a progressive integration between non-symbolic and symbolic representations of numbers30. Initially, young children can discriminate between small and large non-symbolic numerical quantities31,32,33. Around the age of 2, children learn how to recite the number sequence in its stable order. However, it takes almost two further years to learn and master counting by repeatedly associating number words with the respective external numerical quantities34,35. Thereafter, children understand the magnitude relation between numerical symbols (i.e., number words and Arabic digits36,37). The ability to compare non-symbolic and symbolic numerical quantities has been repeatedly associated with mathematical achievement38. The more reliable association is between symbolic number comparison and arithmetic fluency, as magnitude understanding supports the choice of an efficient calculation strategy (e.g., counting from larger39).

On the one hand, it has been suggested that children with DS perform counting as a rote behaviour without having a deep understanding of the cardinality principle (i.e., understanding that the last pronounced number word corresponds to the cardinality of the set34,40), which in turn prevents them from developing more advanced numerical skills41. On the other hand, some authors have argued that children with DS master the counting principles42,43, but have a reduced number word sequence44,45 and they are slower when counting items, although comparable to typically developing children matched on mental age27. For instance, Sella, Lanfranchi and Zorzi27 asked children with DS and controls matched on mental and chronological age to perform a match to sample task, in which children had to verify whether the numerosity of a given visual set matched a previously presented Arabic numeral. Children with DS displayed an accuracy slightly lower but mostly comparable to MA children, even though their response times were slower. Overall, it seems that children with DS can understand the counting principles, but their poor counting performance might be related to limited vocabulary capacity and deficit in implementing counting behaviour46.

Children with DS also display a poor performance when translating numbers into a spatial position as assessed by the number line task47,48,49, whereby participants place several consecutively presented target numbers on a visual horizontal line entailing a numerical interval. For instance, participants mark the location of the target number 23 on a line with the number 0 on the left-end side and the number 100 on the right-end side. It has been repeatedly observed that children display a shift from a biased (log-like) to an accurate (linear) positioning with increased numerical knowledge and experience with the proposed numerical interval. Accordingly, preschool children display a linear positioning in the 1–10 interval but a biased one in the 0–100 interval50, whereas older children display a linear positioning also in the 0–100 interval47. Children with DS (14 years-old) displayed a linear positioning in the interval 1–10, in line with the performance of the control group matched on mental age, even though their precision in positioning numbers was lower compared to typically developing children matched on chronological age. In the interval 0–100, both DS and mental age-matched controls displayed a biased (log-like) positioning, whereas the positioning of the chronological age-matched group was linear51.

The poor performance in the number line task could be related to a lack of arithmetic strategies when solving the task, such as using the midpoint of the line as an anchoring point52,53,54,55,56. Several studies have shown that individuals with DS have reduced arithmetic skills as measured on standardized tests. Brigstocke et al.11 reported that only slightly more than half of the children with DS in their sample could complete a battery to assess basic numerical skills, including arithmetic, and even then their scores were extremely low. Similarly, it has been found that only half of the adolescents with DS can perform simple additions57 and adults with DS can recognise numbers and count, but their arithmetic skills are essentially absent58.

In summary, individuals with DS have reduced ability to process small numerical quantities compared to controls matched on mental age, whereas the ability to discriminate large numerical quantities is in line with mental age. Children with DS seem to master the counting principles, even though they are slower when enumerating items and more prone to errors. The accuracy in positioning numbers on the visual line is in line with mental age, whereas arithmetic skills are severely reduced. It is worth highlighting that these specific numerical deficits might be the byproduct of domain-general factors. Accordingly, the deficit in comparing small numerical quantities can be related to poor visuospatial working memory capacity. Similarly, the counting deficit in DS could be attributed to poor knowledge of the number sequence, which, in turn, depends on verbal memory.

The present study

Most of the interventions to improve numerical skills in DS have focussed on early numerical skills. Previous studies have reported preliminary evidence on the effectiveness of programmes to improve basic numerical skills59,60, non-symbolic number comparison61, counting62,63,64,65, arithmetic66,67, conservation of numerical quantities68,69, and fractions70. However, the research in this area is still scarce and presents relevant methodological limitations71. Accordingly, most of the training studies lack an active control group and/or follow-up assessment, which makes it difficult to evaluate the efficacy of the training and its long-lasting effects.

Here, we evaluate the effectiveness of the adaptive computerized game “The Number Race”72,73,74 in improving basic numerical skills in children with DS. The game (which is freely available for non-commercial use in multiple language versions) targets those numerical skills that are usually acquired in the preschool period. The structure of the game is based on four principles: enhancing number sense; cementing the links between representations of number; conceptualizing and automatizing arithmetic; maximizing motivation. Players compete against the software in a numerical comparison task, choosing the larger between two numerical quantities ranging from 1 to 9, which may be sets of dots, digits, or the results of sums or subtractions. An adaptive algorithm modulates the presentation time of the to-be-compared numerical quantities, the size of the dots, or the numerical distance to keep the difficulty of the game at a challenging level, thus working on the zone of proximal learning75. The training on the comparison of small and large non-symbolic and symbolic numerical quantities and basic arithmetic makes the software an appropriate tool as children with DS display difficulties in such abilities. The player chooses the larger quantity, so the other quantity is given to the opponent (the software). Afterwards, a board with 40 cells (4 × 10) is presented on a different screen, and players can move their characters as many steps as the numerical quantity they chose in the comparison. The request to move the characters on the game board improves children’s ability to position numbers on a spatial layout73,76,77, which is another impaired ability in children with DS. The race ends when one of the characters reaches the end of the board. Verbal and acoustic feedback is continuously provided to foster motivation.
The software has already provided preliminary, but promising, results on its efficacy in improving basic numerical skills in young children and in children with math difficulties73,78,79,80,81, even though this research presents methodological limitations and more studies are needed82.

Here, we present the results of an intervention study, whereby children with DS in the experimental group (EG) played with “The Number Race” whereas children in the control group (CG) worked with software aiming at improving their reading skills. The use of an active control group was designed to ensure a stringent evaluation of the numerical training effects. Though assessing numerical improvements was the primary aim of the present study, we also evaluated whether the reading training may improve literacy in DS. We assessed both groups’ numerical and reading skills at pre-test, at post-test (immediately after the end of the training), and at follow-up (3 months after the end of the training). We expected the EG to show relevant improvements in their numerical, but not reading, skills from pre-test to post-test compared to the CG. Such improvement might still be evident at follow-up, months after the end of the training. Conversely, we expected the CG to show a larger improvement in reading skills compared to the EG.

Methods

Participants

Forty-one children with Down syndrome (DS) from north-eastern Italy took part in the study, after we obtained informed consent from parents and verbal consent from participants. We recruited participants from local associations, which offer support to families of children with intellectual disabilities. We evaluated at pre-test all underage children willing to take part as long as their parents considered them capable of completing the training under the supervision of the experimenter. However, we excluded two participants who at pre-test were not able to complete the standardized numeracy battery or had serious behavioural issues during the testing session. We allocated 20 participants (12 boys, age range in months = 70–149) to the Experimental Group (EG), which played with the Italian version of “The Number Race”72,74, and 21 participants (14 boys, age range in months = 76–207) to the active Control Group (CG), which practised with the software “Fondiamoleletterine”73 (it translates to “Let’s Fuse Letters”) or “Lettura di base 3”83 (it translates to “Basic Reading 3”), which aim to improve reading skills. Among children in the CG, 12 played only with “Fondiamoleletterine”, 6 only with “Lettura di base 3” and 3 with both. This subdivision was based on the child’s reading ability. The two groups were matched on age, fluid intelligence (Raven’s Colored Matrices84), and receptive vocabulary (Peabody Picture Vocabulary Test-Revised; PPVT-R85). Children were assessed on these tasks by the experimenter in one of the testing sessions before the beginning of the training. Participants’ characteristics are reported in Table 1. The age equivalent scores of receptive vocabulary (age-in-months = 51.90) and fluid intelligence (age-in-months = 65.80) were in line with those expected from a sample of children with DS between 10 and 11 years of age.
Children with DS also displayed the typical cognitive profile with better non-verbal than verbal age equivalent scores (i.e., vocabulary).

Table 1 Experimental Group (EG) and Control group (CG) characteristics.

Procedure

We assigned participants to the EG or CG alternately, that is, the first participant was assigned to the EG, the second to the CG, the third to the EG and so on. The parents of four children explicitly requested that their children receive the numerical training; therefore, these participants were allocated to the EG (this deviation was deemed acceptable in light of the difficulty in recruitment). We assigned the last four participants to the CG to obtain a balanced sample size in the two groups. After participants were allocated to the experimental or control group, we assessed their cognitive skills in three testing phases (i.e., pre-test, post-test, and follow-up), each composed of four separate 45-min sessions on different days. The pre-test and post-test phases were completed respectively before and after the training, whereas participants completed the follow-up phase three months after the end of the training. The testing and training sessions were run individually in a comfortable and quiet room, at home, at the clinical centre, or at school premises according to participants’ availability. Trained graduate students, who were not blind to group allocation, completed the assessments at pre-test, post-test, and follow-up and supervised children during training. We recruited participants between April 2016 and August 2019 considering the research project resources in terms of staff and working hours. The present study was approved by the Ethics Committee for Psychological Research of the University of Padova and it was performed in accordance with the Declaration of Helsinki. The trial was not pre-registered and there were no substantial changes in the methods or outcome variables after the trial began.

Tasks

We reported the test–retest reliability for the standardised tests whereas for the other tasks we calculated the split-half reliability by correlating the scores, separately for each session, between odd and even trials, and then applying the Spearman-Brown formula (i.e., [2 × r]/[1 + r]).
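The split-half procedure described above can be sketched as follows. The data layout (one list of 0/1 trial scores per child), the function names, and the example data are hypothetical and are not taken from the study; only the Spearman-Brown step, [2 × r]/[1 + r], comes from the text.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation between two equally long lists of numbers."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def spearman_brown(r):
    """Step the half-test correlation up to full test length: 2r / (1 + r)."""
    return (2 * r) / (1 + r)


def split_half_reliability(trials_by_child):
    """Correlate odd-trial and even-trial totals across children, then correct."""
    odd_totals = [sum(trials[0::2]) for trials in trials_by_child]   # trials 1, 3, 5, ...
    even_totals = [sum(trials[1::2]) for trials in trials_by_child]  # trials 2, 4, 6, ...
    return spearman_brown(pearson_r(odd_totals, even_totals))


# Made-up scores for three children, six trials each (not data from the study).
scores = [[1, 1, 1, 0, 1, 1], [0, 1, 0, 0, 1, 0], [1, 0, 0, 0, 0, 0]]
print(round(split_half_reliability(scores), 3))
```

In this sketch the function would be called once per session, mirroring the separate pre-test, post-test, and follow-up coefficients reported for each task. It does not guard against zero variance in either half.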

Numerical intelligence battery (BIN: Batteria Intelligenza Numerica)

The BIN86 is a standardized numeracy test designed for preschool-level children and composed of four subscales: lexical, semantic, counting, and pre-syntactic. The battery demonstrated good psychometric properties, with all subscales presenting high reliability (Lexical subscale: r = 0.89, Semantic subscale: r = 0.69, Counting subscale: r = 0.74, Pre-syntactic subscale: r = 0.79). The lexical subscale assesses the ability to read and write Arabic numbers as well as the ability to connect number-words to the corresponding digits. The semantic subscale measures the ability to compare numerical quantities (i.e., dots and Arabic digits). The counting subscale assesses the ability to recite the number–words sequence forward and backward as well as the knowledge of the order of Arabic digits from 1 to 5. The pre-syntactical scale evaluates the ability to link numbers to sets of dots and to order objects based on their size. We used the sum of the four subscales as an index of basic numerical abilities.

Number words comparison

In this task, children were asked to indicate the larger between two number words. The experimenter said: “Which is ‘more’ between x candy/ies and y candy/ies?”, where x and y were number words ranging from 1 to 9. The comparisons were: 4 vs 2, 2 vs 7, 3 vs 8, 2 vs 1, 8 vs 7, 5 vs 4, 3 vs 6, 7 vs 6, 1 vs 5, 9 vs 3, 1 vs 4. We calculated the percentage of errors as the outcome measure. For this task, we only collected the total number of correct responses, which prevented us from calculating the split-half reliability. However, the same task showed split-half reliability of 0.82 when calculated on data from a previous study with preschool children36.

Mental calculation task

The experimenter sequentially read aloud eighteen arithmetic problems (8 additions and 10 subtractions), and the child had to respond as fast as they could. Children could count on their fingers to calculate the answer. We calculated the percentage of correct responses. The split-half reliability was 0.95 at pre-test, 0.97 at post-test, and 0.97 at follow-up.

Number-to-position task (NTP49,50,87)

Children were presented with a 20-cm line on a white landscape sheet. The left end of the line was labelled with the number 1 and the right end was labelled either with 10 or 20. The target number to be positioned was shown in the upper left corner of the sheet. For each interval, there were eight randomly presented target numbers (i.e., 2, 3, 4, 5, 6, 7, 8, 9 for the 1–10 interval; 2, 4, 6, 7, 13, 15, 16, 18 for the 1–20 interval). On every trial, a new number line was presented with a different target number to be placed. The experimenter said: “Now we are going to play a game with number lines. You can see that this line goes from 1 to 10 (or 20). I will tell you a number and you have to indicate which is the place of this number on the line, as precise as you can.” The instructions were repeated as many times as needed, but no feedback was given. As training trials, children had to place 1 and 10 on the 1–10 line, whereas on the 1–20 line the training trials were 1 and 20. The experimenter named the target number on every trial. Children drew a vertical mark on the line where they thought the target number should be placed. Some children had difficulties in holding the pencil, so they were asked to point with their finger to the position of the target number on the line and the experimenter made the mark. An index of accuracy on this task was obtained by computing the individual percentage of absolute error: (|Estimate − Target Number| / Numerical Interval) × 100. The split-half reliability for the 1–10 interval was 0.78 at pre-test, 0.87 at post-test and 0.87 at follow-up; for the 1–20 interval, it was 0.77 at pre-test, 0.84 at post-test and 0.87 at follow-up.
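The accuracy index above can be sketched in a few lines. Two points are assumptions of this sketch rather than details given in the text: the mark is converted to a number by linear interpolation along the 20-cm line, and the "Numerical Interval" is taken to be the span of the line (upper end minus lower end). The function names are hypothetical.

```python
def mark_to_estimate(mark_cm, lower, upper, line_cm=20.0):
    """Convert a mark position (cm from the left end) into the number it represents.

    Assumption: linear mapping between position on the line and number.
    """
    return lower + (mark_cm / line_cm) * (upper - lower)


def percent_absolute_error(estimate, target, lower, upper):
    """Percentage of absolute error: |estimate - target| / interval * 100.

    Assumption: the numerical interval is the span of the line (upper - lower).
    """
    return abs(estimate - target) / (upper - lower) * 100.0


# A mark at the midpoint of the 1-10 line, made for the target number 5.
estimate = mark_to_estimate(mark_cm=10.0, lower=1, upper=10)
error = percent_absolute_error(estimate, target=5, lower=1, upper=10)
print(estimate, round(error, 2))
```

A child's score for one interval would then be the mean of this value across the eight target numbers. If the interval were instead defined as the upper end of the line (10 or 20), only the denominator would change.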

Match-to-sample task

A sample set of white dots on a gray background appeared in the centre of the computer screen for 300 ms. Then, another set of black dots appeared on the screen and the child indicated whether the numerosity of the set was the same or different compared to the sample set (i.e., white dots). The numerosity of the target set (i.e., black dots) could be the same or minus/plus one compared to the sample set. There was no time restriction to provide the answer as the target set remained on the screen until the child responded. Size and spatial arrangement of the dots changed in each trial to prevent children from basing their responses on non-numerical visual cues. There were 90 experimental trials: 12 trials for each sample numerosity from 2 to 7, and 9 trials each for 1 and 8. Before starting the task, participants completed ten training trials, whereby the presentation of the sample set was longer and decreased trial by trial, to help the child become familiar with the task. We calculated the percentage of correct responses for each participant. The split-half reliability was 0.58 at pre-test, 0.56 at post-test, and 0.62 at follow-up.

Number naming

The child read aloud the Arabic number presented in the middle of the computer screen. We showed all the numbers from 0 to 20 in random order and calculated the percentage of correct responses. The split-half reliability was 0.97 at pre-test, 0.96 at post-test, and 0.97 at follow-up.

Counting

A set of objects (i.e., bananas or apples) appeared on the screen and the child counted them as fast as possible. There were 2 trials for each of the following target numerosities: 1, 2, 3, 4, 5, 8, 10. We calculated the percentage of correct responses. The split-half reliability was 0.80 at pre-test, 0.84 at post-test, and 0.78 at follow-up.

Digit comparison

Children indicated the larger between two digits, ranging from 1 to 9, that were presented on the left and right side of the computer screen respectively. There were 72 trials presenting all the possible comparisons of digits between 1 and 9, each comparison repeated twice, once with the larger digit on the right side and once with the larger digit on the left side of the screen. The split-half reliability was 0.93 at pre-test, 0.93 at post-test, and 0.91 at follow-up.
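The trial count follows directly from the design: there are 36 unordered pairs of distinct digits between 1 and 9, and each is shown twice with the larger digit on either side. A minimal sketch of how such a trial list could be built is given below; the function name and the (left, right) tuple layout are illustrative assumptions, and the presentation order used in the study is not specified here.

```python
from itertools import combinations


def digit_comparison_trials():
    """Return (left, right) pairs: every pair of distinct digits 1-9,
    once with the larger digit on the right and once on the left."""
    trials = []
    for smaller, larger in combinations(range(1, 10), 2):
        trials.append((smaller, larger))  # larger digit on the right
        trials.append((larger, smaller))  # larger digit on the left
    return trials


trials = digit_comparison_trials()
print(len(trials))  # 36 unordered pairs x 2 sides = 72
```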

Letter recognition task88

The experimenter read aloud a letter in a triplet of letters and the child indicated the corresponding letter. There were 21 triplets of letters and children obtained a point for each correct recognition. We calculated the percentage of errors for each child. The split-half reliability was 0.91 at pre-test, 0.86 at post-test, and 0.85 at follow-up.

Syllable reading89

Children read aloud a series of syllables (a matrix of 10 × 10), in order, from left to right, as fast as possible. The number of errors was calculated for each child. The split-half reliability was 0.997 at pre-test, 0.996 at post-test and 0.997 at follow-up.

Word and pseudoword reading90

Children read aloud four lists of 28 words and three lists of 16 pseudo-words. The lists were presented one at a time. We calculated the total number of errors for each child. Test–retest reliability was r = 0.56.

The selected numerical tasks measure those abilities that are the target of the Number Race. Accordingly, the Number Race repeatedly asks children to compare non-symbolic and symbolic numerical quantities as measured in the match-to-sample, number words comparison, and Arabic digit comparison tasks. Some children might count the dots in each set, thereby using serial counting, which we assessed in the counting task. In the advanced stages of the game, the to-be-compared numerical quantities are the results of additions and subtractions, which were tested in the mental calculation task. Moreover, the software always reads aloud the Arabic digits so children can improve the connection between visual and verbal representation of numbers, which we measured using the naming task. The Number Race asks children to move the game characters on a linear board, thereby improving the association between number and space, which was evaluated using the number line tasks. It is worth noting that the administered tasks were structurally similar to the component tasks that form the Number Race game. A notable exception is the number line task(s) because children were not required to mark the spatial position of target numbers in the Number Race. In this vein, the number line tasks could be considered as near transfer tasks. Finally, the BIN is a standardised battery assessing different aspects of numerical knowledge in preschool children, which is the corresponding mental age of the participants with DS. Similarly, we selected the letter recognition, syllables reading, and word and pseudoword reading task to assess the effectiveness of the control (reading) intervention.

Training

Both the experimental and the active control group completed 20 training sessions across ten weeks, with two weekly sessions of 20–30 min each. The experimenter supervised participants during the training sessions and continuously provided feedback to support participants’ engagement with the training. Participants were encouraged to engage with the training activity for at least 20 min, but no more than 30 min. The training was delivered on the experimenter’s laptop and all children completed the planned number of sessions.

Children in the experimental group played with the Italian version of “The Number Race” game72,73,74. Children in the active control group underwent an intervention based on “Fondiamoleletterine”91 or “Lettura di base 3”83, according to their age and level. The former is software that aims to support the early steps of reading acquisition by training phoneme blending. Moreover, it has been shown to be effective in children with DS, in particular, leading to improvements in decoding of syllables and words and repetition of auditory stimuli92. It is composed of seven levels of increasing difficulty: in the first one, letters are associated by shape and sound to a figure (to facilitate memorization), while in the following levels, phoneme blending is trained, starting from syllables of increasing difficulty (consonant + vowel at the beginning, three-letter syllables later), to disyllabic words. The child can then listen to the syllable/word reproduced by the software. The latter intervention is aimed at improving word reading. The game is composed of different kinds of activities, where the child is required to read words or brief texts, for example, phonemic and syllabic inference, word-picture association, vertical word reading. The software gave feedback and reinforcements to the participant and all the activities were set in a playful environment to make exercises pleasant and motivating. The syllables and words used in these training sessions were not the same as those adopted in the testing sessions.

Results

We analysed the effect of the training by running mixed ANOVAs for each task with Session (Pre-test vs. Post-test vs. Follow-up) as a within-subjects factor and Group (EG vs. CG) as a between-subjects factor. When the assumption of sphericity was violated, the Greenhouse–Geisser adjustment was applied to p-values (reported as p[gg]). Post-hoc t-tests were two-tailed and the p-values were corrected for multiple comparisons using the Bonferroni method (i.e., alpha value divided by the number of comparisons). Hedges’ g was calculated to determine the magnitude of the difference between the groups at each session. We also reported Bayes factors (BF10) expressing the probability of the data given H1 relative to H0 (i.e., values larger than 1 are in favour of H1 whereas values smaller than 1 are in favour of H093,94). When comparing models, we reported the Bayes factor (BF) as the ratio between the BF10 values of the two models. If the ratio between BF10 of model A and BF10 of model B is larger than 1, then there is evidence for model A. Conversely, if the ratio is smaller than one, there is evidence for model B. We reported the scores in the administered tasks across sessions for the two groups in Table 2. The zero-order correlations between all pre-test scores are reported in Table S1 in the Supplementary Information.
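Two of the quantities described above lend themselves to a short sketch: the between-group effect size and the ratio of Bayes factors between models. The text does not state which variant of Hedges’ g was computed, so the version below (Cohen’s d with pooled standard deviation multiplied by the usual approximate small-sample correction) is an assumption, as are the function names and the example values.

```python
from math import sqrt
from statistics import mean, variance


def hedges_g(group_a, group_b):
    """Standardised mean difference with a small-sample bias correction.

    Assumption: pooled-SD Cohen's d times 1 - 3 / (4 * (n_a + n_b) - 9).
    """
    n_a, n_b = len(group_a), len(group_b)
    pooled_var = ((n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)) / (n_a + n_b - 2)
    d = (mean(group_a) - mean(group_b)) / sqrt(pooled_var)
    correction = 1 - 3 / (4 * (n_a + n_b) - 9)
    return d * correction


def bf_ratio(bf10_model_a, bf10_model_b):
    """Ratio of two BF10 values: > 1 favours model A, < 1 favours model B."""
    return bf10_model_a / bf10_model_b


# Made-up values for illustration only (not data from the study).
print(round(hedges_g([2, 4, 6], [1, 3, 5]), 2))
print(bf_ratio(10.0, 2.0))
```

A ratio below 1 can be inverted to express the evidence for the simpler model; for instance, a ratio of 0.5 for "interaction model vs. main-effects model" corresponds to a factor of 2 in favour of the main-effects model.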

Table 2 Mean scores (SD) [range] at pre-test, post-test, and follow-up separately for the EG and CG groups.

In Table 3, we reported the results of mixed ANOVAs. The contrasts between and within groups were reported only when the interaction between Session and Group was significant. According to Bonferroni correction, we adjusted the alpha levels to 0.016 (i.e., 0.05/3) for comparisons between groups in each one of the three sessions, and to 0.008 (i.e., 0.05/6) for comparisons between pre-test and post-test, pre-test and follow-up, and post-test and follow-up within the two groups. Due to a computer failure, results at follow-up in mental calculation, naming, counting and digit comparison tasks were missing for 6 children of EG and 3 of CG, and in the match-to-sample results were missing for 6 children of EG and 11 of CG.
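The adjusted thresholds follow from dividing the nominal alpha by the number of comparisons in each family. The sketch below reproduces that arithmetic; the function names are hypothetical, and the note that the thresholds quoted in the text correspond to truncating the quotient to three decimals is an observation about the arithmetic, not a statement from the authors.

```python
def bonferroni_alpha(alpha, n_comparisons):
    """Per-comparison significance threshold under Bonferroni correction."""
    return alpha / n_comparisons


def is_significant(p_value, alpha, n_comparisons):
    """True if the p-value falls below the Bonferroni-adjusted threshold."""
    return p_value < bonferroni_alpha(alpha, n_comparisons)


between_groups = bonferroni_alpha(0.05, 3)  # one comparison per session
within_groups = bonferroni_alpha(0.05, 6)   # three session pairs in each of two groups
print(f"{between_groups:.4f} {within_groups:.4f}")  # 0.0167 0.0083
```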

Table 3 Statistical results of mixed ANOVAs and post-hoc t-tests.

BIN

We found strong evidence (BF = 19.46) in favour of the model with the two main effects and the interaction compared to the model with only the two main effects (Fig. 1). Although we found anecdotal evidence for a difference between the two groups at post-test and follow-up, both groups improved their scores from pre-test to post-test and from pre-test to follow-up but not from the post-test to follow-up. These improvements appear to be more substantial in the EG (extreme evidence) compared to the CG (strong evidence).

Figure 1 Mean scores in the administered numerical tasks for the EG and CG across sessions. Error bars represent between- and within-group (horizontal segment) 95% confidence intervals. Transparent points represent individual scores.

Number comparison

We found strong evidence (BF = 18.54) in favour of the model with the two main effects and the interaction compared to the model with only the two main effects (Fig. 2). The EG displayed a better performance compared to the CG at follow-up (moderate evidence). Moreover, only the EG reduced the errors from pre-test to post-test and such improvement was maintained until the follow-up (extreme evidence), which did not differ from the post-test.

Figure 2

Mean scores in the administered literacy tasks for EG and CG across sessions. Error bars represent between and within groups (horizontal segment) 95% confidence intervals. Transparent points represent individual scores.

Mental calculation

We found moderate evidence (BF = 4.61) in favour of the model with the two main effects and the interaction compared to the model with only the two main effects (Fig. 1). Although we found anecdotal evidence for a difference between the two groups at post-test and follow-up, only the EG improved its performance (moderate evidence) from pre-test to post-test and from pre-test to follow-up.

Number to position task

For the 0–10 interval, we found anecdotal evidence (BF = 0.405) in favour of the model with the two main effects compared with the model also including the interaction between session and group (Fig. 1). For the 0–20 interval, there was moderate evidence (BF10 = 4.07) in favour of the model with the two main effects and the interaction and strong evidence (BF = 32.31) for its superiority compared to the model with only the two main effects (Fig. 1). The EG did not differ from the CG at pre-test or post-test, but showed a better performance at follow-up (moderate evidence). However, we found only anecdotal evidence for improvement between pre-test, post-test, and follow-up in the two groups.
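Bayes factors below 1 in this section are read as evidence for the simpler model, that is, through their reciprocal (e.g., 1/0.405 ≈ 2.47 in favour of the main-effects model). The sketch below illustrates that reading. It assumes that BF is the ratio of the interaction model to the main-effects model and uses conventional cut-offs of 3, 10, 30 and 100; both are assumptions rather than details stated in the text. Under these cut-offs, a value just above 30 would be labelled "very strong", whereas the text describes BF = 32.31 as strong, so the authors' scheme may differ at that boundary. The function name `describe_bf` is hypothetical.

```python
def describe_bf(bf: float) -> tuple[str, str]:
    """Return (favoured model, evidence label) for a Bayes factor.

    Assumption: bf = evidence for the interaction model divided by
    evidence for the model with the two main effects only.
    Assumption: cut-offs 3, 10, 30 and 100 (a conventional scheme).
    """
    if bf <= 0:
        raise ValueError("a Bayes factor must be positive")

    favoured = "interaction" if bf >= 1 else "main effects only"
    # Express the strength of evidence as a ratio of at least 1.
    strength = bf if bf >= 1 else 1 / bf

    if strength < 3:
        label = "anecdotal"
    elif strength < 10:
        label = "moderate"
    elif strength < 30:
        label = "strong"
    elif strength < 100:
        label = "very strong"
    else:
        label = "extreme"
    return favoured, label


# Bayes factors taken from the results reported in this section.
for bf in (19.46, 4.61, 0.405, 0.22, 1.01, 181):
    print(bf, describe_bf(bf))
```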

Match-to-sample

We found anecdotal evidence (BF = 2.59) in favour of the model with the two main effects and the interaction compared to the model with only the two main effects (Fig. 1). There was anecdotal evidence for a difference between groups across the three testing sessions. However, the EG displayed better performance in the post-test (strong evidence) and follow-up (moderate evidence) compared to the pre-test session.

Number naming

We found moderate evidence in favour of the model with the main effects of session and group (BF = 0.22) compared to the model that additionally included the interaction (Fig. 1).

Counting

We found moderate evidence in favour of the model with the main effects of session and group (BF = 0.22) compared to the model that additionally included the interaction (Fig. 1).

Digit comparison

We found supporting evidence neither for the model including the interaction nor for the model with the two main effects only (BF = 1.01) (Fig. 1).

Letter recognition

We found anecdotal evidence in favour of the model with the main effects of session and group (BF = 0.42) compared to the model that additionally included the interaction (Fig. 2).

Syllable reading

There was extreme evidence (BF = 181) in favour of the model with the two main effects and the interaction compared to the model with only the two main effects (Fig. 2). We found mainly anecdotal evidence for a difference between the two groups in the three testing sessions. However, there was moderate evidence for improvement between pre-test and post-test and pre-test and follow-up in the CG.

Word and pseudoword reading

For word reading, there was moderate evidence (BF = 6.67) in favour of the model with the two main effects and the interaction compared to the model with only the two main effects (Fig. 2). There was mainly anecdotal evidence for a difference between the two groups in the three testing sessions. Only the CG showed moderate evidence for improvement between pre-test and follow-up. For pseudoword reading, there was moderate evidence (BF = 7.42) in favour of the model with the two main effects and the interaction compared to the model with only the two main effects (Fig. 2). There was anecdotal evidence for a difference between groups in the testing sessions. However, the CG showed an improvement in performance from pre-test to post-test (strong evidence) and from post-test to follow-up (moderate evidence).

Discussion

Individuals with DS display numerical deficits that range from processing non-symbolic numerical quantities to arithmetic performance12. Several studies have evaluated interventions to improve numerical skills in DS, yielding mixed results71. We contribute new evidence to this research area by evaluating the efficacy of the computerized game “The Number Race” (a non-commercial, freely available software) in improving numerical skills in children with DS. We assigned participants to an experimental group (EG) who played with “The Number Race” and a control group (CG) who worked on reading skills using two different software programs. We evaluated participants’ numerical and reading skills before and immediately after the training as well as after three months. The EG and the CG had similar numerical knowledge at pre-test and both groups displayed some improvement in their basic numerical skills immediately after the training, as measured by the numeracy battery (BIN). However, the EG showed a large improvement, whereas the CG obtained a small improvement. Note that the EG improvement on the numeracy battery (BIN), when evaluated against the normative data (which are available for preschoolers), corresponds to the (average) change observed between 51 and 73 months of age in typically developing children. Importantly, the level of basic numerical skills measured at the end of the training was maintained at the 3-month follow-up. The EG displayed better performance in the number comparison task compared to the CG at follow-up, and only the EG showed an improvement from pre-test to post-test. The EG also displayed improvements in mental calculation, which were maintained at follow-up, compared to their pre-test performance, even though there were no significant differences compared to the CG. The evidence for any changes in the CG’s scores in other numerical tasks was anecdotal at best.
Conversely, the CG showed a medium improvement in syllable reading and a small improvement in pseudoword reading at the end of the training. The latter score was further improved at follow-up in the CG. We found mainly anecdotal evidence for any improvement of reading skills in the EG.

Overall, we found substantial evidence for an improvement of the EG in some but not all numerical tasks. We speculate that training had deeper effects on quantity understanding and manipulation, thereby yielding larger improvements in tasks that force children to actively manipulate numerical information (BIN, number comparison, and calculation) compared to others that do so to a lesser extent (e.g., number naming, counting). Nevertheless, visual inspection of the means suggests that performance tended to increase across tasks. This suggests that twenty training sessions might not be enough to observe a substantial between-groups difference in children with DS, whereas they appear to be sufficient in typically developing children73. Future studies may explore whether longer training could make improvements in the numerical tasks more evident compared to the control group.

Another aspect worth considering is the large variability in children’s numerical and reading skills, which might have obscured differences between the two groups. Accordingly, some children might have found some aspects of the training superfluous, whereas others would have benefited more from practising on given components of the training. For example, the Number Race always requires players to move the characters on the board game, aiming at improving the linear relation between space and numbers, which some participants may already possess, thereby making the training redundant. In this light, a training that aims at improving a variety of numerical skills at the same time, such as the Number Race, might dilute the capacity to generate a significant improvement in one specific skill in a limited amount of time. The Number Race cannot change its structure and fully adapt to the participant’s knowledge, which can instead be achieved in one-to-one training under the supervision of an expert. Sometimes what matters is not what is trained but how much it is trained. The moderate evidence for an improvement in mental calculation in DS49, which, instead, appeared to be stronger in typically developing preschool children63, might be due to the fact that the game requires players to perform additions and subtractions only in the more advanced levels, when participants’ accuracy in comparing symbolic numbers is maintained at a high level64. Variable performance might have sent children with DS back to game levels that mainly involve number comparison, thereby preventing access to later game levels in which they could benefit from intense training on arithmetic. Not only what and how much, but also how a numerical skill is trained should be considered. In this vein, another possibility for the limited improvement in arithmetic is that the Number Race does not provide any support in terms of strategy.
Admittedly, the Number Race presents the arithmetic operation in terms of dots that are added to or removed from a given set. However, there is no explicit instruction on using counting strategies such as counting-on from the larger set, which has been successfully taught to children with DS in a previous study56.

The CG showed some improvements in reading whereas the performance of the EG remained stable across reading tasks. Although the improvement in the CG might reflect a positive effect of the reading training, such evidence should be carefully considered. Specifically, the CG tended to display lower error scores in the reading tasks over time, so that its scores became more similar to those of the EG at post-test and follow-up. This amelioration might be due to simple regression to the mean, rather than a substantial effect of the reading training. The CG also displayed some amelioration in their numerical skills. However, the lack of a waiting list group prevents us from disentangling whether this amelioration could be attributed to the simple effect of time or the effect of the reading training on early numerical skills95.
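Regression to the mean can be illustrated with a small simulation. This is a minimal sketch with entirely hypothetical numbers, not a model of the present data: each simulated child has a stable true error score plus independent measurement noise at two sessions, and no intervention takes place. Selecting the children with the highest pre-test error scores is an assumption made only to make the phenomenon visible; groups in the study were not formed in this way.

```python
import random


def simulate_regression_to_mean(n: int = 10_000, seed: int = 0):
    """Return (pre-test mean, post-test mean) of the high-error half.

    All parameters are hypothetical. No training effect is simulated,
    so any change between sessions is due to measurement noise alone.
    """
    rng = random.Random(seed)
    pre, post = [], []
    for _ in range(n):
        true_score = rng.gauss(50, 10)             # stable error score
        pre.append(true_score + rng.gauss(0, 10))   # noisy pre-test
        post.append(true_score + rng.gauss(0, 10))  # noisy post-test

    # Keep the children whose pre-test error score is above the median.
    cutoff = sorted(pre)[n // 2]
    selected = [i for i in range(n) if pre[i] > cutoff]

    pre_mean = sum(pre[i] for i in selected) / len(selected)
    post_mean = sum(post[i] for i in selected) / len(selected)
    return pre_mean, post_mean


pre_mean, post_mean = simulate_regression_to_mean()
print(f"selected group, pre-test mean error:  {pre_mean:.1f}")
print(f"selected group, post-test mean error: {post_mean:.1f}")
```

In this sketch the selected group's mean error score drops at the second session even though nothing was trained, which is why a reduction in errors alone does not establish a training effect.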

The Number Race simultaneously trains several numerical abilities such as non-symbolic and symbolic number comparison, counting, arithmetic, number-space association, and number recognition. In this light, the game enhances the link between different representations of numbers, which are the cornerstones of early numerical development30,37, while introducing the first arithmetical procedures. Nonetheless, it is difficult to draw strong theoretical conclusions on the relation between trained numerical skills, as done in other studies, whereby, for instance, training based exclusively on non-symbolic stimuli transferred to symbolic numbers96,97. In this regard, the Number Race aims at establishing and strengthening the main functional components of the cognitive architecture underlying number processing and mathematical learning74 rather than improving a specific numerical skill.

One theoretical conclusion concerns the trainability of numerical skills in DS. The descriptive statistics suggest that children with DS could improve in a variety of numerical skills, although the training might need to be longer and more intense compared to typical development. Yet, it is unclear whether individuals with DS have simply memorised some numerical facts and procedures (e.g., “seven is larger than six”, “one plus one equals two”), which still constitutes a valuable achievement, or have established a deep understanding of the numerical operations they have been practising. If the latter is the case, an improvement should be observed in numerical skills that were not targeted during the training (i.e., transfer). In this regard, the number line tasks could be considered as transfer tasks: children were not explicitly requested to place numbers onto a visual line during the game, although the number-space association was trained when children moved characters on the board. Children in the experimental group tended to show a more accurate placing of numbers on the line tasks, especially the 0–20 interval, compared to the control group at follow-up, thereby suggesting a real mastery of numerical skills that were not directly trained. The same cannot be said for other tasks whose structure resembled the training proposed in the Number Race, such as comparing dots, as done in the match-to-sample task, or comparing symbolic numbers, as done in the number word and digit comparison tasks. Future studies should test whether training effects generalize to numerical activities that need to be carried out in daily life. The best way to put this hypothesis to the test would be to obtain some real-life measures after the training to assess whether individuals with DS apply the learned numerical skills to different tasks and contexts. This would simultaneously assess the presence of a real transfer effect and the ecological validity of the training.

The conclusions of our study should be carefully considered in light of methodological limitations. The participants were alternately allocated to one of the two training groups, with the exception of four participants who were allocated to the numerical training to meet the parents’ request. As a consequence, four participants were allocated to the reading training to balance the number of participants in each group. We decided to accommodate the parents’ request to increase compliance with the intervention, reduce attrition and increase our final sample size. The implemented assignment procedure, however, diverges from standard random assignment, thereby requiring caution when interpreting the results. For instance, some children (and families) might have had a more positive attitude toward one of the interventions compared to the other. In this light, measuring expectations and attitudes toward the interventions can ensure that participants (and families) across groups have similar willingness to undergo the intervention98.

The small sample size may call into question the reliability of the findings. However, the pattern of results went in the expected direction, with the EG improving in numerical tasks and the CG improving in reading tasks, even in cases where the evidence for a reliable improvement was only anecdotal. Accordingly, a close look at the descriptive statistics reveals that the EG displayed on average an increase in performance on almost all the administered numerical tasks, even though there was strong or extreme evidence only in a few instances. Larger sample sizes would provide more information on whether a given numerical skill can or cannot be effectively improved in the chosen time frame by training with the Number Race.

Another limitation is the lack of blinding regarding the training participants received. Research assistants conducting the assessment were aware of the group the participants belonged to. The lack of blinding might have generated a bias in the experimenters assessing the performance before and after the intervention. Future studies should achieve blinding by having different experimenters for the supervision of the training and the assessment before and after the intervention.

A further limitation of our study is that we did not compare the computerized training with another numerical activity delivered in a more traditional way (e.g., one-to-one teaching with paper-and-pencil materials). Nevertheless, the use of an adaptive computerized task brings some intrinsic advantages, as the game requires minimal supervision and can be easily implemented at home and school under the supervision of a non-expert (e.g., parent or teacher). Still, identifying the most beneficial training programme for DS remains an open question, not only in terms of increasing numerical and mathematical skills, but also in terms of cost–benefit for institutions, practitioners, families, and individuals.

Despite the above-mentioned limitations, the present study shows that The Number Race can be a promising tool to improve basic numerical skills in children with DS.