Down syndrome (DS, OMIM #190685) is the most common genetic cause of intellectual disability worldwide. Patients experience intellectual disability, congenital malformations, and a range of specific comorbidities. Development in children with DS is characterized by a decrease in the development quotient (DQ) with age, because these children develop more slowly than typical children of the same age, especially in terms of motor and language skills.

Although progress in managing comorbidities has improved life expectancy of patients with DS, no therapeutic options have significantly improved intellectual abilities.1 The pathophysiology of intellectual disability in DS remains unclear. It is believed to be driven by a gene imbalance effect with overexpression of key genes on chromosome 21.2 Given that neuronal abnormalities in DS appear early during the second trimester of pregnancy, it is expected that potential treatment will be more efficient before or at least soon after birth.3,4 Scientific rationale and clinical data support the hypothesis that safe therapeutic use of folic acid (or folinic acid [FA]) and thyroid hormone could be beneficial for the pediatric population with DS.

Folates play a critical role in several key biochemical reactions, including methylation (DNA, RNA, and protein), DNA synthesis and repair, and amino acid synthesis, and are critical during neurodevelopment. In the early development setting, Gao performed a review of studies evaluating the impact of folic acid supplementation during pregnancy, and found 15 of 22 studies demonstrated a beneficial effect on neurodevelopment (motricity, language, executive function, attention) in typical populations.5 There is extensive evidence that folate deficiencies contribute to neuropsychiatric disorders that can be corrected by folic acid supplementation.6,7,8 Several genes of the folate pathway are located on chromosome 21, notably cystathionine-β-synthase, which is overexpressed in DS, while low levels of homocysteine have been reported in DS patients, suggesting an induced functional folate deficiency.9 A correlation between folate plasma concentrations and intelligence quotient was seen in patients with DS who had a low functioning level.10 Nevertheless clinical studies have failed to demonstrate the efficacy of combinations of vitamins and minerals including folate, as reviewed by Schaevitz et al., although the study samples were small and in most studies participants were more than five years old.11 We previously reported the randomized, double-blind ENTRAIN study in infants with DS; patients receiving folinic acid (1 mg/kg/day) for one year showed a significant improvement in developmental age compared with the placebo arm in the per-protocol analysis, although not in the intent-to-treat population.12 Interestingly, this effect was most pronounced in patients receiving concomitant thyroxine.

Thyroid hormones also play a major role in neurological and cognitive development. Thyroid disorders are widespread in DS patients, especially congenital or acquired hypothyroidism.13,14 Hypothyroidism may be overt or subclinical (high thyroid stimulating hormone [TSH] and normal free T4 values). This association between DS and thyroid disorders is possibly a consequence of Dyrk1A gene overexpression.15 There is a long history of trials of thyroid hormone treatment for patients with DS, with or without hypothyroidism, such as that reported by Tirosh et al.16

Recently, van Trotsenburg demonstrated that the distribution of thyroxine (T4) concentrations in a population of newborns with DS is similar to that of the general population but with a downward shift in the mean.17 Given the importance of thyroid hormone for brain maturation, the same team demonstrated in a placebo-controlled randomized study that thyroxine treatment taken for two years after birth induced a weak but significant benefit in motor development (0.7 months), height, and weight gain for children with DS without congenital hypothyroidism.18

The American Academy of Pediatrics (AAP) recommends monitoring thyroid levels of infants with DS at birth, at 6 and 12 months, then annually, and treating patients if levels prove to be abnormal. In France, there are not yet national guidelines for this population; however, monitoring practices for thyroid function are similar. Nevertheless, there is currently no international or national consensus on the TSH level at which substitutive treatment of subclinical hypothyroidism should be started in these patients; in France, a TSH level above 5 or 7 mIU/L in two consecutive analyses generally results in a proposal for treatment. It also remains unclear whether thyroxine treatment can be beneficial for neurodevelopment in infants with normal but suboptimal thyroid function, as defined by van Trotsenburg.17

Given the lack of robust evidence for the efficacy of either treatment, and the possible synergistic interaction,12 we designed a phase 3 clinical trial to evaluate the benefit of FA and thyroxine, in combination or alone, on neurological development in young euthyroid patients with DS.


Study design and oversight

The Assessment of Systematic Treatment With Folinic Acid and Thyroid Hormone on Psychomotor Development of Down Syndrome Young Children (ACTHYF) was a single-center, randomized, four-arm, double-blind, parallel-group, placebo-controlled phase 3 trial, performed in infants with DS. The study was proposed to all eligible patients routinely treated at the Jerome Lejeune Institute (JLI) (Paris), a medical center dedicated to treating patients with cognitive deficiencies of genetic origin. An independent data monitoring committee (IDMC) reviewed all prespecified safety data annually. The protocol was approved by the local ethics committee and the French health authorities and was conducted in accordance with the Declaration of Helsinki and International Conference for Harmonisation of Technical Requirements for Registration of Pharmaceuticals for Human Use (ICH) guidelines. Parents/legal guardians gave written informed consent.


Patients had to be aged between 6 and 18 months at inclusion, with a karyotype demonstrating homogeneous free or Robertsonian translocation, complete trisomy 21, and without any serious cardiac conditions (i.e., unstable or with hemodynamic consequences) or neurological conditions. Main exclusion criteria were gestational age <231 days, 5-minute Apgar score <7, congenital hypothyroidism, hypothyroidism (TSH > 7 mIU/L), hyperthyroidism, and leukemoid reaction at birth. Sleep apnea was not an exclusion criterion because the high frequency of this disease in young children with DS was not clearly established at the beginning of our study, and the AAP recommends detection at the age of 4 years only.19

Randomization and treatment

Patients were assigned to treatment at JLI at the end of the inclusion visit by centralized randomization using a computer program, prepared by a statistician. The process was managed by a Contract Research Organisation (CRO). Randomization was stratified on the basis of age (<12 months vs. ≥12 months) and sex. Blocks of four using a 1:1 ratio for each treatment group were used. Randomization was performed independently using an interactive voice response system (CVM-PhoneAssist). Patients were equally randomized to one of four treatment groups: FA plus L-thyroxine (FA+L-thyroxine), FA plus L-thyroxine placebo (FA), FA placebo plus L-thyroxine (L-thyroxine); or FA placebo plus L-thyroxine placebo (placebo). Folinic acid (Folinoral®, Therabel) 1 mg/kg/day and L-thyroxine (Lévothyrox®, Merck) 3 µg/kg/day (or matched placebo) were administered orally for 12 months. L-thyroxine dose adjustments and treatment discontinuation for hypo- or hyperthyroidism were decided by an independent unblinded pediatric endocrinologist who had no contact with investigators. TSH levels were evaluated one month after treatment initiation, at each visit, and more often if necessary. To ensure blinding, the independent endocrinologist prescribed the same dose adjustments in all groups.
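The allocation scheme described above (permuted blocks of four, one slot per arm, within strata defined by age class and sex) can be illustrated with a minimal Python sketch. It is not the program or the interactive voice response system used in the trial; the class name, the seed handling, and the use of Python's `random` module are assumptions made purely for illustration.

```python
import random

# The four arms named in the text; each block of four contains every arm once.
ARMS = ["FA+L-thyroxine", "FA", "L-thyroxine", "placebo"]


class StratifiedBlockRandomizer:
    """Hypothetical permuted-block randomizer with one block queue per stratum."""

    def __init__(self, seed=0):
        self._rng = random.Random(seed)
        self._blocks = {}  # stratum -> remaining arms in the current block

    @staticmethod
    def stratum(age_months, sex):
        # Stratification factors taken from the text: age class and sex.
        age_class = "<12" if age_months < 12 else ">=12"
        return (age_class, sex)

    def assign(self, age_months, sex):
        key = self.stratum(age_months, sex)
        block = self._blocks.setdefault(key, [])
        if not block:
            # Start a new block: all four arms in random order.
            block.extend(ARMS)
            self._rng.shuffle(block)
        return block.pop()


randomizer = StratifiedBlockRandomizer(seed=2012)
example_arm = randomizer.assign(age_months=9, sex="F")
```

Because every completed block contains each arm exactly once, group sizes within a stratum can never differ by more than one patient, which is the practical reason for using small blocks in a trial of this size.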

Endpoints and assessments

Three visits were scheduled (at baseline, 6 and 12 months after treatment initiation), plus telephone contact at least every 2 months, and follow-up assessments 2 and 4 months after last treatment intake. The primary efficacy endpoint was the adjusted change from baseline in global development quotient (GDQ) at 12 months, using the Griffiths Mental Development Scales (GMDS).20 The GMDS includes six subscales: locomotor, socialization, language, coordination, performance, and reasoning. However, given the intellectual disability in our study population, reasoning was not assessed. For each subscale, a raw score was derived from the contributing items. The total raw score, obtained by adding the subscale raw scores, was converted into a development age using a correspondence table. Subscale and global development quotients were computed by dividing the development age by the chronological age and multiplying by 100.
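As a minimal sketch of the quotient defined above (development age divided by chronological age, times 100), the following Python function may help fix ideas. The function name and the choice of months as the unit are assumptions; the correspondence table that maps raw scores to development age is part of the GMDS material and is not reproduced here.

```python
def development_quotient(development_age_months, chronological_age_months):
    """Development quotient: 100 * development age / chronological age.

    Both ages must be in the same unit (months here, as an assumption).
    """
    if chronological_age_months <= 0:
        raise ValueError("chronological age must be positive")
    return 100.0 * development_age_months / chronological_age_months


# Hypothetical child: development age 6.6 months at a chronological age of
# 12 months, which gives a quotient of about 55.
example_dq = development_quotient(6.6, 12.0)
```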

For preterm infants, chronological age was corrected taking into account the gestational term. Psychomotor development assessments were performed early afternoon after lunch and a nap, and were recorded to check scoring if needed. Psychologists were trained for GMDS before the study and cross-evaluated each other. Secondary endpoints included the change from baseline at 12 months for the Brunet–Lézine GDQ (a French psychomotor test for infants),21 as well as for height and head circumference. Exploratory endpoints included overall evolution of clinical global impression at 12 months from baseline in terms of no progress, slight progress, marked progress, or very marked progress. Safety data (adverse events and laboratory data) were collected at each visit; biological analyses were performed centrally.

Statistical analysis

The study was designed to detect a clinically relevant difference of six points in GDQ between any one of the three active treatment groups and placebo. The initial protocol planned for 256 patients; however, due to difficulties in recruiting patients, the sample size was subsequently reduced to 175 patients, to obtain 140 evaluable patients. Assuming an intragroup standard deviation (SD) of 10.2, a one-sided type I error of 2.5%, and a correlation of 0.5 between the baseline and 12-month values, a sample size of 140 patients gives a 67.9% power if one of the three active groups is efficient, 82.6% if two are efficient, and 88.5% if three are efficient. Sample size calculations are detailed in Supplementary Methods. To account for the revised sample size, the planned analysis was modified: instead of comparing each study drug with the absence of study drug and testing the interaction, a two-step Dunnett step-down procedure was performed for the primary and secondary endpoints, first comparing each active treatment (single agent or combination) with the placebo combination, and then comparing combined L-thyroxine plus folinic acid with each single agent. The analysis of covariance (ANCOVA) was adjusted for sex, age class at randomization (<12 vs. ≥12 months), psychologist pair performing baseline and 12-month evaluations, and baseline GMDS GDQ. The same statistical analyses were performed for secondary endpoints; for clinical global impression, active groups were compared with placebo using Chi-square tests. Psychologist pair adjustment was only used in the analysis models for GMDS and Brunet–Lézine.
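To give a feel for how the stated inputs (six-point difference, SD of 10.2, baseline correlation of 0.5, one-sided 2.5% error, 140 evaluable patients) translate into power, the sketch below uses a normal approximation for a single active-versus-placebo comparison. It does not reproduce the calculation in the Supplementary Methods: equal allocation of 35 patients per arm, the ANCOVA variance reduction factor sqrt(1 - rho^2), and a Bonferroni threshold standing in for the Dunnett critical value are all assumptions made here for illustration.

```python
from math import sqrt
from statistics import NormalDist


def approx_power(delta, sd, rho, n_per_group, alpha_one_sided):
    """Normal-approximation power for one two-group comparison.

    Assumes the baseline-adjusted residual SD is sd * sqrt(1 - rho**2)
    and equal group sizes (both are illustrative assumptions).
    """
    adjusted_sd = sd * sqrt(1.0 - rho ** 2)
    standard_error = adjusted_sd * sqrt(2.0 / n_per_group)
    z_crit = NormalDist().inv_cdf(1.0 - alpha_one_sided)
    return NormalDist().cdf(delta / standard_error - z_crit)


# Single comparison with no multiplicity adjustment.
unadjusted = approx_power(6.0, 10.2, 0.5, 35, 0.025)

# Three comparisons with placebo, using Bonferroni as a conservative
# stand-in for Dunnett (assumption, not the trial's method).
bonferroni = approx_power(6.0, 10.2, 0.5, 35, 0.025 / 3)
```

Under these assumptions the multiplicity-adjusted figure comes out at roughly two-thirds, the same general range as the reported power when only one active group is efficient; the exact published values depend on the Dunnett procedure and should be taken from the Supplementary Methods.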

As recruitment was nationwide and patients were included at a single center, screening and randomization were performed on the same day to minimize patient visits to the center. Eligibility of randomized patients for TSH levels was thus confirmed retrospectively, and patients with TSH >7 mIU/L were discontinued. Thus, the modified intention-to-treat (mITT) population included all randomized patients with a valid informed consent who were not prematurely discontinued due to baseline TSH >7 mIU/L. The per-protocol population was defined as the mITT population without major protocol deviations. Efficacy analyses were performed in the mITT and per-protocol populations.

Data were analyzed according to assigned treatment. The correlation between the changes from baseline in GMDS and Brunet–Lézine GDQ was evaluated by the Pearson correlation coefficient and 95% confidence intervals (CIs) with Fisher transformation. Sensitivity analyses excluding noncooperative children (unwilling to take the test due to their young age) were performed, as well as exploratory analyses evaluating the presence versus absence of each drug, and post hoc analyses in patients aged ≥12 months, in patients with GDQ below the baseline median, or with high baseline TSH (>5 mIU/L). Analyses were done with SAS, version 9.4 (SAS Institute).
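The Pearson coefficient with a Fisher-transformed confidence interval mentioned above can be sketched in a few lines of Python. The trial analyses were run in SAS; this standard-library version is only an illustration, and the function names and the toy data are hypothetical.

```python
from math import atanh, sqrt, tanh
from statistics import NormalDist


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 4:
        raise ValueError("need two sequences of equal length, n >= 4")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def fisher_ci(r, n, level=0.95):
    """Confidence interval for r via the Fisher z transformation.

    z = atanh(r) is approximately normal with SD 1 / sqrt(n - 3);
    the interval is built on the z scale and mapped back with tanh.
    """
    z = atanh(r)
    half_width = NormalDist().inv_cdf(0.5 + level / 2.0) / sqrt(n - 3)
    return tanh(z - half_width), tanh(z + half_width)


# Toy data (not trial data): changes in GDQ on two scales for five children.
gmds_change = [1.0, 2.0, 3.0, 4.0, 5.0]
brunet_lezine_change = [2.0, 4.0, 5.0, 4.0, 5.0]
r = pearson_r(gmds_change, brunet_lezine_change)
low, high = fisher_ci(r, len(gmds_change))
```

Working on the transformed scale keeps the interval inside the range -1 to 1 and makes it asymmetric around r, which matters for correlations as high as those reported between the two scales.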


Patient population

A total of 175 patients were randomized between April 2012 and December 2016. Among them, 156 patients were included in the mITT population (41 placebo; 38 FA; 37 L-thyroxine; 40 FA+L-thyroxine; Fig. S1, Supplementary Appendix). Thirteen mITT patients prematurely discontinued the study and could not be assessed for the primary endpoint, resulting in the primary analysis being performed in 143 patients (40 placebo patients, 30 FA, 35 L-thyroxine and 38 FA+L-thyroxine). Another five children did not cooperate during psychomotor assessment and were excluded from the per-protocol analysis (138 patients: 37 placebo patients, 30 FA, 34 L-thyroxine, and 37 FA+L-thyroxine). Two of these five children showed symptoms of autism spectrum disorder.

Baseline characteristics were well balanced across the four groups in the mITT population for demographics (medical history, cardiac disorders, gestational age, birth biometry, and Apgar at 5 minutes), and T4 and TSH levels (Table 1). Mother’s level of education, rehabilitation therapies, and type of care were well balanced.

Table 1 Baseline demography and clinical characteristics in the mITT population

Primary efficacy outcome

Mean baseline GMDS GDQ in the mITT population was approximately 55 points in all four arms, with standard deviations close to the 10.2 used for the sample size calculation (Table 2, Fig. 1). The adjusted mean change from baseline in GMDS GDQ at 12 months showed similar decreases in all four treatment groups (placebo: −5.10 [95% CI −7.84 to −2.37]; FA: −4.69 [95% CI −7.73 to −1.64]; L-thyroxine: −3.89 [95% CI −6.94 to −0.83]; FA+L-thyroxine: −3.86 [95% CI −6.67 to −1.06]), with no significant differences in the Dunnett step-down procedure for any of the active treatment groups (FA+L-thyroxine combination or either single agent) compared with placebo. Similar results were seen in the per-protocol population (Table 2). The range in GDQ at baseline was large across all groups (overall 26 to 89), and likewise, evolution in GMDS GDQ over 12 months ranged widely in all groups, from losses of up to 28 points to gains of up to 17 points.

Table 2 Changes in GMDS GDQ after 12 months of treatment in mITT and PP populations
Fig. 1: Global development quotient (GDQ) by visit and treatment group in the modified intention-to-treat (mITT) population. (a) Griffiths Mental Development Scales (GMDS); (b) Brunet–Lézine.

No differences in adjusted change from baseline at 12 months were seen when excluding infants considered as noncooperative. Furthermore, exploratory analysis of the global effect of each drug did not show any difference in GMDS GDQ change when patients were grouped by drug intake (Table 3). The covariate of psychologist pair performing evaluations at baseline and after 12 months did not impact the adjusted change from baseline in GDQ (p = 0.64, Table S1, Supplementary Appendix). Covariates of age at randomization (12–18 vs. 6–12 months; p = 0.004) and baseline GDQ (p < 0.0001 per one-point GDQ increase) significantly impacted the adjusted change from baseline in GDQ (Table S1, Supplementary Appendix). However, no statistical differences were seen in adjusted change from baseline in GDQ at 12 months in the active treatment groups versus placebo in post hoc analyses in patients aged ≥12 months or in patients with a GDQ below the median.

Table 3 Adjusted change in GMDS GDQ from baseline after 12 months for patients receiving either drug (single agent or in combination), mITT population

Finally, no significant differences in adjusted mean change were seen for any of the subscales of the GMDS (locomotor, personal–social skills, hearing and language, hand–eye coordination, performance) when comparing any of the three active treatments with placebo, or when comparing each single agent active drug (L-thyroxine+placebo or FA+placebo) with the active drug combination (L-thyroxine+FA).

Other efficacy outcomes

Adjusted mean change at 12 months from baseline was also not significant in the mITT for Brunet–Lézine GDQ (Fig. 1b), height, or head circumference. According to the investigator’s global clinical impression there was an improvement in the FA+L-thyroxine arm compared with placebo after 12 months; nevertheless, because the primary outcome was not significant, exploratory analyses cannot be declared as statistically significant. Correlations between GDQ values at each visit according to the GMDS and Brunet–Lézine scales were high and equivalent in all four treatment arms (0.85 to 0.95), as were correlations between the GDQ change from baseline after 12 months (0.78 to 0.87).

Post hoc analyses in patients with high baseline TSH (>5 mIU/L) showed a smaller decrease in the adjusted mean change in GMDS GDQ after 12 months in patients treated with thyroxine alone, and a slight but nonsignificant increase with thyroxine combined with folinic acid, compared with other groups. No difference was seen when this analysis was repeated for adjusted mean change in height in this high-TSH population.

Changes in GMDS development age at 12 months were similar in all four groups, with mean increases in the mITT population of 5.7 to 6.0 months (Table S2, Supplementary Appendix). Individual changes in developmental age ranged widely over 12 months, with some patients gaining only 2 months, while others gained 10 months.
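The link between a gain in development age and a fall in GDQ follows directly from the definition of the quotient, and a short worked example may make it concrete. The child below is hypothetical: the baseline age of 12 months is an assumption within the eligible range, the baseline GDQ of 55 is the approximate mean reported above, and the function name is illustrative.

```python
def dq_change(baseline_dq, baseline_age_months, dev_age_gain_months,
              interval_months=12.0):
    """Change in development quotient over an interval.

    Recovers baseline development age from the quotient, adds the gain,
    and recomputes the quotient at the later chronological age.
    """
    baseline_dev_age = baseline_dq * baseline_age_months / 100.0
    final_dev_age = baseline_dev_age + dev_age_gain_months
    final_age = baseline_age_months + interval_months
    final_dq = 100.0 * final_dev_age / final_age
    return final_dq - baseline_dq


# Hypothetical child: GDQ 55 at 12 months, gaining 6.0 months of
# development age over the following 12 months.
change_with_6_month_gain = dq_change(55.0, 12.0, 6.0)

# Gain needed to keep the quotient constant: 55% of the 12 elapsed months.
change_with_stable_gain = dq_change(55.0, 12.0, 6.6)
```

For this hypothetical child, a six-month gain still lowers the quotient by 2.5 points, whereas holding it at 55 would require a gain of 6.6 months. This is consistent with the observation that mean gains of 5.7 to 6.0 months coincided with a decrease in GDQ in every arm.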


Among the 174 treated patients, the median exposure duration was 12 months in all groups. Available T4, TSH, and folate analyses confirmed good exposure to treatments (Table 4). Safety profiles of the two single agent and combination treatments were similar to the placebo arm, with a low incidence of treatment-related events and no related serious adverse events. In addition to the five patients in the FA arm (11.6%) who discontinued treatment due to hypothyroidism, two patients discontinued following unrelated events of acute myeloid leukemia and infantile spasms. One patient in the L-thyroxine arm discontinued due to infantile spasms. Sudden death was reported in a female patient treated in the combination arm at 18.5 months. She had normal T4 and TSH levels at her last evaluation and at death, and her death was considered not related to treatment by the IDMC.

Table 4 Change from baseline in TSH, T4, and total folate after 12 months, safety population


Despite a report in a previous study suggesting that administration of 1 mg/kg/day folinic acid for one year improves psychomotor development of children with DS, particularly those on thyroxine treatment,12 the present trial does not support the efficacy of either thyroid hormone or folinic acid as single agents or in combination, for improving psychomotor development in these patients. Nor does it confirm a positive effect on motor development or height and head circumference as demonstrated when L-thyroxine treatment was administered in patients from birth for two years.18

One explanation could be insufficient dosing of folinic acid, or that the presence of folate receptor antibodies modulates the response to treatment, as supported by studies showing that folate receptor antibodies are frequently present in cerebral folate deficiency syndromes, including autism and neurological disorders.8,22 A recent study showed that high-dose folinic acid (2 mg/kg/day) improved verbal communication in children with autism and language impairment, with the presence of folate receptor antibodies predictive of response to treatment.23 Autoimmunity is in fact very frequent in the DS population,24,25 and it is possible that higher-dose folinic acid could have been more efficacious than the dose used in our study. The L-thyroxine dosage was lower in this study than in the randomized study led by van Trotsenburg.18 However, in our study TSH and T4 levels changed significantly from baseline compared with placebo in patients treated with thyroxine, with T4 levels in the higher range, supporting the hypothesis that our thyroxine dosage was adequate.

Another explanation could be that the true effect is lower than expected and that the power was not sufficient to detect such a difference (the final design aimed to detect a statistically significant difference of six points in GMDS GDQ in at least one of the three active groups versus placebo with a power of at least 67.9% if only one active group was efficient, 82.6% if two were efficient, and 88.5% if all three were efficient). Considering the results, it would mean that the treatment effects would be low with a maximum difference of 4.09 points (Table 2) in GDQ, and not of clinical interest. In terms of treatment duration, a 12-month treatment period is appropriate to detect neuronal improvement linked to better metabolism in infants. A major challenge for studies in very young children with psychomotor retardation is to obtain reproducible and reliable psychomotor evaluations, as this can be impacted by several factors, especially the child’s cooperation during evaluation. The two scales used gave homogeneous results with the expected standard deviations, while the impact of the psychologist pairs evaluating patients at baseline and after 12 months was not significant. Similarly, exclusion of noncooperative children did not impact the outcome.

Finally, the study population was highly homogeneous and the four groups were equivalent in terms of biology, sex, rehabilitation, parental socioeconomic or educational level, comorbidities, and baseline development quotient. Thus overall, all treatment groups were comparable, received treatment at efficacious doses for approximately one year, and were correctly evaluated. As to whether thyroxine should be administered when TSH is in the upper quartile of the normal range (TSH >5 mIU/L), our results suggest that any effect is weak.

In conclusion, despite the robust methodology used in this study, the results do not support the recommendation of folinic acid or thyroid hormone supplementation in young children with DS with TSH levels <7 mIU/L. This study confirmed large variability of developmental outcome in the DS population at early stages, with a very large range of baseline development quotients, and developmental trajectories in all groups. Future studies should address this large variability; frequency of infectious diseases likely contributes to this variability, as does sleep apnea, which is now recognized to occur frequently in very young children with DS, with developmental consequences.26 Controlling sleep apnea may be an important target for protecting the brain in infants with DS, while additional factors such as genetic polymorphisms and the role of autoimmunity and inflammation merit further study. Finally, particular attention should be paid to psychomotor assessment, and we recommend that patients be stratified by baseline GDQ in clinical trials.