Introduction

Down syndrome (DS) is the most common chromosomal disorder described in humans1 and affects approximately 1 in every 1000–1100 newborns worldwide2. It is caused by the presence of human chromosome 21 in three copies in the cells of affected subjects. The main features of DS consist of intellectual disability (ID), cardiovascular defects and craniofacial dysmorphisms3.

After a systematic reanalysis of all described partial trisomy 21 cases, Pelleri et al.4 identified a highly restricted Down syndrome critical region (HR-DSCR) on human chromosome 21. This region is 34 kb long and is located on the distal 21q22.13. It is duplicated in all DS subjects while it is not duplicated in subjects without a diagnosis of DS5. To date there is no information about the role of this region (HR-DSCR), which does not include known genes but which is part of a very long intron of the isoform 2 of the KCNJ6 gene (https://www.ncbi.nlm.nih.gov/gene/3763)5.

Recently, Caracausi et al.6 conducted the first analysis of the 1H Nuclear Magnetic Resonance (NMR)-detectable part of the metabolome in plasma and urine samples of 67 DS subjects and 29 healthy controls (CTRLs). They found that DS samples have an altered metabolomic profile; in particular, significantly modified levels of several metabolites were observed and attributed to mitochondrial dysmetabolism.

Lejeune had already hypothesized that DS can be considered a metabolic disease7, suggesting that a “blocked” mechanism might determine the level of ID severity, and that specific molecular protagonists of this complex mechanism might be identified.

Finding the possible relationship between metabolomic profile and ID in DS might be of immense value. Searching for the main altered gene products instead of the gene defects may change the perspective on possible treatments for this condition.

ID is considered to be a significant limitation in both intellectual functioning and adaptive behavior. The mental development of children with DS shows a deceleration during the developmental period8 and their final intelligence quotient (IQ) ranges between 35 and 70, with a mean value of 50 (refs. 9, 10). However, since IQ is calculated with respect to the distribution of typically developing children of the same chronological age, IQ tends to decrease with age as an effect of the developmental deceleration characteristic of this syndrome. Moreover, a high degree of interindividual variability has been demonstrated11.

Previous works indicated that DS is associated with a specific cognitive phenotype, characterized by impairments in speech and language12 (with greater difficulties in expressive language than in receptive language), in working memory13 and in executive functions13,14. Considering motor skills, subjects with DS have fine motor problems due to specific visuo-motor integration and eye-hand coordination problems, combined with slow movement15. On the other hand, non-verbal skills are less severely affected, though recent studies have shown a variable picture depending on which aspect of visuospatial cognition is considered16. Another area of relative strength is social functioning17.

In this work we performed the metabolomic analysis of plasma samples from almost twice as many subjects as in the previous work6. In addition, whenever possible, the cognitive data of DS subjects were collected, and statistical correlations between the metabolomic profiles and cognitive aspects were carried out. The aim of the present work was to confirm the previously observed changes in the metabolome, to determine whether the metabolic imbalance clearly discriminates between DS and CTRL groups, and to verify whether specific metabolomic profiles can be associated with the degree of ID.

Results

Study design

Metabolomic data were obtained from a total of 129 subjects with DS (mean age ± standard deviation (SD) = 11.23 ± 6.64 years) and 46 CTRL subjects (mean age ± SD = 15.18 ± 7.99 years) chosen among DS siblings. Sex distribution was 51 female (F) and 78 male (M) among DS and 22 F and 24 M among CTRLs (Table 1). This sample size allowed us to obtain a statistical power of 0.92 (software G*Power, estimation for Wilcoxon-Mann-Whitney test, two tails, effect size d = 0.6, α = 0.05). The main features of the subjects enrolled in this study are described in Table 1. Data concerning age, IQ and medications at the time of the survey are reported in the Supplementary Dataset S1.
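The reported power can be approximately reproduced with a short calculation. The sketch below is not the G*Power procedure itself: it assumes a normal approximation to the two-sample test and scales the effective sample size by the asymptotic relative efficiency of the Wilcoxon-Mann-Whitney test under normality (3/π). The function names and the approximation are illustrative assumptions.

```python
import math


def normal_cdf(x: float) -> float:
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def wmw_power_approx(n1: int, n2: int, d: float, z_crit: float = 1.959964) -> float:
    """Approximate two-tailed power of a Wilcoxon-Mann-Whitney test.

    Assumptions (not taken from the paper): normal approximation, with the
    effective sample size scaled by the asymptotic relative efficiency (3/pi)
    of the WMW test versus the t-test for normally distributed data.
    z_crit is the two-tailed critical value for alpha = 0.05.
    """
    are = 3.0 / math.pi
    # Noncentrality parameter of the equivalent two-sample z-test.
    delta = d * math.sqrt(are * n1 * n2 / (n1 + n2))
    # Probability of rejecting the null hypothesis in either tail.
    return normal_cdf(delta - z_crit) + normal_cdf(-delta - z_crit)


# Group sizes and effect size as reported in the text.
power = wmw_power_approx(n1=129, n2=46, d=0.6)
print(f"approximate power: {power:.2f}")
```

Under these assumptions the result lands close to the reported 0.92; small differences from G*Power are to be expected because its exact computation may differ from this approximation.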

Table 1 Number of Down syndrome (DS) and healthy control (CTRL) subjects.

As explained in our previous work6, because of the pediatric age of most of the subjects, it was not always possible to collect samples at a fasting state. Thus, to rule out an effect of non-fasting conditions on the results, we performed multivariate and univariate analyses for two groups of subjects: the “all” group including fasting and non-fasting subjects, and the “fasting” group (76 DS and 35 CTRL).

With the aim of minimizing healthcare-induced stress, the metabolomic analyses were conducted on EDTA-plasma samples (as in the previous study6). This sample type is indeed suitable for multiple in vitro testing using different techniques, including the analysis of circulating microRNA18.

Plasma metabolome analysis

Partial Least Squares-Canonical analysis (PLS-CA) of all plasma samples discriminated DS and CTRL groups with an accuracy of 94% in both CPMG (Carr-Purcell-Meiboom-Gill) and NOESY (Nuclear Overhauser Effect Spectroscopy) spectra (Fig. 1A,B, respectively). Under fasting conditions, the discrimination accuracies were 90% in CPMG spectra (Fig. 1C) and 87% in NOESY spectra (Fig. 1D).

Figure 1

PLS-CA analysis of all the plasma samples: (A) CPMG and (B) NOESY spectra. Score plot, each dot represents a different plasma sample. Orange dots: Down syndrome samples (DS, n = 129); blue dots: healthy controls (CTRL, n = 46). PLS-CA analysis of the fasting plasma samples: (C) CPMG and (D) NOESY spectra. Score plot, each dot represents a different plasma sample. Orange dots: Down syndrome samples (DS, n = 76); blue dots: healthy controls (CTRL, n = 35).

A random selection of 46 DS and 46 CTRL samples derived from all subjects, repeated 500 times, highlighted a discrimination between DS and CTRL groups with an average accuracy of 87% (p-value < 0.01) in CPMG spectra and 88% (p-value < 0.01) in NOESY spectra (Supplementary Figures S1A and S1B). Considering only fasting subjects, a random selection of 35 DS and 35 CTRL samples, repeated 500 times, showed a discrimination between DS and CTRL groups with a mean accuracy of 87% in CPMG spectra and 86% in NOESY spectra (p-value < 0.01 in both analyses; Supplementary Figures S1C and S1D).
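The repeated size-matched selection can be sketched as follows. In each comparison the subsample size equals the number of available CTRLs, so the sketch draws that many DS samples without replacement on every repeat and averages the resulting accuracies. The function name and the `evaluate` callback (standing in for fitting and validating a classifier) are hypothetical and not part of the original analysis scripts.

```python
import random
from typing import Callable, List, Sequence


def mean_balanced_accuracy(
    ds: Sequence,
    ctrl: Sequence,
    evaluate: Callable[[List, List], float],
    n_repeats: int = 500,
    seed: int = 0,
) -> float:
    """Average an accuracy estimate over repeated size-matched DS subsamples.

    `evaluate` is a hypothetical callback that fits and validates a
    classifier on the given DS and CTRL samples and returns its accuracy.
    """
    rng = random.Random(seed)
    scores = []
    for _ in range(n_repeats):
        # Draw as many DS samples as there are CTRLs, without replacement.
        ds_subset = rng.sample(list(ds), k=len(ctrl))
        scores.append(evaluate(ds_subset, list(ctrl)))
    return sum(scores) / len(scores)
```

Balancing the groups in this way removes the possibility that the classifier benefits simply from the larger size of the DS group.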

The same analyses were also performed with samples grouped by sex. In all-female samples, PLS-CA discriminated DS from CTRL groups with an accuracy of 89% (p-value < 0.01) in CPMG spectra (Fig. 2A), while in NOESY spectra (Fig. 2B) the discrimination accuracy was 88% (p-value < 0.01). Discrimination accuracy in all-male groups was 90% (p-value < 0.01) in CPMG spectra (Fig. 2C) and 87% (p-value < 0.01) in NOESY spectra (Fig. 2D). Considering only fasting female subjects, the discrimination accuracies were 84% and 85% (p-value < 0.01) in CPMG and NOESY spectra, respectively. Considering only fasting male subjects, the discrimination accuracy was 86% (p-value < 0.01) in both CPMG and NOESY spectra.

Figure 2

PLS-CA analysis of all the plasma samples from female subjects: (A) CPMG and (B) NOESY spectra. Score plot, each dot represents a different plasma sample. Orange dots: Down syndrome samples (DS, n = 51); blue dots: healthy controls (CTRL, n = 22). PLS-CA analysis of all the plasma samples from male subjects: (C) CPMG and (D) NOESY spectra. Score plot, each dot represents a different plasma sample. Orange dots: Down syndrome samples (DS, n = 78); blue dots: healthy controls (CTRL, n = 24).

Among female samples, a random selection of 22 DS and 22 CTRLs, repeated 500 times, gave a discrimination between DS and CTRL groups with a mean accuracy of 82% in CPMG spectra (p-value < 0.01) and 85% in NOESY spectra (p-value < 0.01) (Supplementary Figures S2A and S2B). Among male samples, a random selection of 24 DS and 24 CTRLs, repeated 500 times, gave a discrimination between DS and CTRL groups with a mean accuracy of 80% in CPMG spectra (p-value < 0.01) and 76% in NOESY spectra (p-value < 0.01) (Supplementary Figures S2C and S2D).

The signals of 28 metabolites were unambiguously assigned in all the NMR spectra (Supplementary Dataset S1). Their levels, together with the levels of 3 unknown signals (unk1, unk2 and unk3), were analyzed through univariate statistical analysis (Tables 2 and 3). Considering the samples from all subjects, independently of their fasting state, acetate, pyruvate, acetone, creatine, formate, acetoacetate, unk3 and succinate showed significantly increased levels in DS, with a DS/CTRL ratio > 1, whereas unk1, tyrosine, histidine and threonine showed significantly reduced levels in DS, with a DS/CTRL ratio < 1 (Table 2). Taking into account only fasting samples, the same metabolites were also found to be significantly increased or decreased in DS, with the exception of acetone, threonine and unk1, which were not significantly different between the two groups, and with the addition of methionine, which was significantly increased in DS plasma (Table 3).

Table 2 Univariate statistical analysis of all plasma samples (Down syndrome (DS), n = 129; healthy control (CTRL), n = 46).
Table 3 Univariate statistical analysis of fasting samples (Down syndrome (DS), n = 76; healthy control (CTRL), n = 35).

Correlations between metabolites in DS and control groups

We compared the levels of each metabolite with those of all the others by performing a series of correlation analyses in DS and CTRL samples separately. Partial correlations including “chronological age” as covariate were conducted (see “Statistical analysis” paragraph in the “Materials and methods” section). We considered only correlations with p-value after FDR (False Discovery Rate) correction < 0.05 and with a correlation coefficient r > 0.4 or < −0.4 to be statistically significant. We clustered the results into 7 main groups of metabolites whose levels have statistically significant correlations: the Krebs cycle metabolites (Supplementary Table 1); formate (Supplementary Table 2); ketone bodies (Supplementary Table 3); lactate, glucose and mannose (Supplementary Table 4); branched-chain amino acids (BCAA) (Supplementary Table 5); creatine and creatinine (Supplementary Table 6); amino acids (Supplementary Table 7). Interestingly, some correlations found in CTRL samples had no correspondence in DS samples and vice versa (Supplementary Tables 1, 2, 3, 4, 5, 6, 7; Fig. 3).

Figure 3

Statistically significant correlations between the levels of some of the altered metabolites and all other detected metabolites. Heat maps of the correlations (r) of representative metabolites in both Down syndrome (DS) and healthy control (CTRL) samples, separately. The p-value (after FDR correction) of each correlation is also reported.

Correlations between metabolome and cognitive skills

We investigated if the NMR-detectable part of the plasma metabolome of subjects with DS contains signatures of the different IQ scores.

Thirty-nine subjects with DS were evaluated through the WPPSI-III test (Supplementary Dataset S1). We created two main groups of DS patients on the basis of their IQ scores: one group included all DS subjects with an IQ > 40 (n = 13, mean age = 125.69 months) and the other included all DS subjects with an IQ ≤ 40 (n = 26, mean age = 141.38 months). The difference between the two mean ages was not statistically significant by unpaired t-test (t = −1.694; p = 0.099). Using the PLS-CA model we obtained a discrimination accuracy of 74.5% between the plasma metabolomic profiles of the two groups, i.e. DS subjects with IQ > 40 vs. IQ ≤ 40 (Supplementary Figure S3). Additionally, we were not able to identify any metabolites with statistically significant concentration differences between the two groups.

Twenty-two subjects with DS were evaluated using the Griffiths-III test (Supplementary Dataset S1). Using the same statistical approach specified for the WPPSI-III test, we obtained a discrimination accuracy of 74.3% between the metabolomic profiles of DS subjects with IQ > 40 (n = 17, mean age = 55.94 months) and those with IQ ≤ 40 (n = 5, mean age = 66.4 months). Again, the difference between the two mean ages was not statistically significant by unpaired t-test (t = −1.997; p = 0.06). Also in this case we were not able to identify any metabolites with statistically significant concentration differences (Supplementary Figure S4).

Furthermore, we performed a series of correlation analyses in order to investigate correlations between each metabolite level and the age equivalent (AE) scores collected from Griffiths-III and WPPSI-III tests from DS patients (see AE scores in the Supplementary Dataset S1 and Supplementary Tables 8 and 9). Partial correlations including “chronological age” as covariate were conducted (see “Statistical analysis” paragraph in the “Materials and methods” section). We were unable to obtain any statistically significant correlations considering p < 0.01 and r < −0.4 or r > 0.4 (Supplementary Tables 8 and 9).

Discussion

The aim of this study was to establish whether an alteration in metabolite levels may have an important role in DS. For this reason, we analyzed NMR spectra of plasma samples from patients with DS and their siblings, used as CTRL group, thus significantly increasing the number of subjects with respect to our previous work6 (from 67 to 129 DS subjects and from 29 to 46 CTRLs). First of all, we confirmed the presence of significantly different plasma metabolomic profiles of DS and CTRL groups and identified the metabolites with significantly different concentrations in the two groups. Then, we searched for correlations among metabolite levels in DS and CTRL samples separately and correlations between metabolite levels and cognitive data of DS subjects. Finally, we investigated whether the characteristic plasma metabolomic profile of DS subjects contains signatures of different grades of ID.

The discrimination accuracy between DS and CTRL groups is higher than in our previous analysis6, reaching 94% with both CPMG and NOESY spectra (Fig. 1A,B). If we consider only fasting subjects, the discrimination accuracy decreases slightly, to 90% with CPMG spectra and 87% with NOESY spectra (Fig. 1C,D). This decrease may be explained by the smaller sample size when only fasting samples are used. Sex is not a confounding factor; in fact, our results show similar discrimination accuracy between DS and CTRL groups when considering females and males separately in both CPMG and NOESY spectra (again, a slight decrease in discrimination accuracy with respect to the complete group can be attributed to the reduction in the number of individuals). This is a further confirmation of our previous work6, where we demonstrated that neither sex nor age was a confounding factor for the discrimination between DS and CTRL metabolomic profiles.

The analysis of plasma samples provided the concentrations of 28 different metabolites involved in multiple metabolic pathways.

The univariate analysis confirmed the significant alterations of the levels of metabolites involved in processes related to mitochondrial metabolism in DS highlighted by Caracausi et al.6; the DS/CTRL ratio indicates whether the metabolite level increases (DS/CTRL > 1) or decreases (DS/CTRL < 1) in DS vs CTRL samples. The analysis showed significantly increased concentrations (p-value after FDR correction < 0.05) of the following metabolites: acetate, pyruvate, succinate, formate and creatine. All of them are involved in energy production. Acetate is fundamental in supporting acetyl-coenzyme A metabolism and thus Krebs cycle progression and lipogenesis19. Pyruvate and succinate are produced before and during the Krebs cycle, respectively20. Formate is known to be necessary for the formylation of mitochondrial tRNAs21 and for the formylation of tetrahydrofolate during the folic acid cycle and the purine synthesis pathway. Finally, creatine is used by muscles for the production of ATP22. Previous works demonstrated that measurable enzymes whose genes are located on human chromosome 21 (Hsa21) adhere to the 3:2 overexpression model expected in trisomy 21 (refs. 23–27). Although the dependence of the levels of the metabolites on the enzyme concentration is complex and not easily interpretable, we point out that a DS/CTRL ratio close to 3:2 (1.5) or 2:3 (0.67) is observed for a certain set of molecules. In this framework, the significant alteration of succinate in DS vs CTRL is of particular interest. Succinate levels increase in DS samples (p-value after FDR correction < 0.0001 in both “all samples” and “fasting samples” groups) with a DS/CTRL ratio close to 3:2 (1.37 in “all samples” and 1.57 in “fasting samples” groups).

The alterations of succinate metabolism may play a role in the development of ID, but to date there is no strong evidence. However, it is already known that some enzymes which metabolize succinate are involved in developmental delay and brain injuries. In particular, the succinic semialdehyde dehydrogenase (SSADH) deficiency, caused by a mutation of the ALDH5A1 gene (6p22.3), determines an increase of γ-aminobutyric acid (GABA) and γ-hydroxybutyric acid levels. These alterations cause developmental delay, hypotonia, hyporeflexia, ataxia, neuropsychiatric problems, and epilepsy28.

Furthermore, alterations of activities for the Krebs cycle enzymes were also observed in some brain disorders such as Alzheimer’s disease29 and Huntington’s disease30. In both diseases, the increase of succinate dehydrogenase and malate dehydrogenase and the decrease of the pyruvate dehydrogenase complex, isocitrate dehydrogenase, and the alpha-ketoglutarate dehydrogenase complex were observed, suggesting that a mitochondrial alteration occurs. These results also suggest that measures to improve tricarboxylic acid cycle metabolism might lessen the effects of the diseases29,30.

In order to obtain more information about the alteration of the metabolic pathways in DS we compared the correlations (corrected by age) among the levels of metabolites in DS subjects and CTRL groups, separately.

Importantly, we observed that some correlations are strong/moderate (r > 0.4 or r < −0.4) and significant (p < 0.05) in CTRL samples but lose their significance or become weaker in DS samples. These results concern metabolites involved in the Krebs cycle (pyruvate, citrate and succinate), formate and lactate, and amino acids like alanine, threonine and tyrosine (Supplementary Tables 1, 2, 4, 7; Fig. 3). Conversely, some statistically significant correlations are found only in DS samples, like associations between phenylalanine and branched-chain amino acids (Supplementary Table 7; Fig. 3). Moreover, the loss of correlations among strictly related metabolites (from a biochemical point of view) and the appearance of new correlation patterns in DS confirmed the dysregulation of some metabolic pathways. Accordingly, the inverse correlation between lactate and succinate characteristic of CTRL samples becomes direct in DS samples (Supplementary Tables 1 and 4; Fig. 3).

These findings support the hypothesis of an altered metabolism in DS, strengthening the idea of the involvement of the Krebs cycle and of a few amino acids like leucine, phenylalanine, tyrosine and alanine. Leucine and phenylalanine are both involved in brain metabolism: the role of leucine as a nitrogen donor in the glutamine/glycine pathway may influence the function of some neurotransmitters31, while modifications in the phenylalanine pathway can alter dopamine levels32.

Even if metabolic imbalance clearly discriminates between DS and CTRL groups, it appears that specific metabolomic profiles cannot be associated with the degree of ID, as shown by both the low discrimination accuracies obtained when comparing the metabolomic profiles of DS patients with different IQ scores and the lack of correlation between the levels of the investigated metabolites and the scores obtained in cognitive tests (Supplementary Tables 8 and 9). It would be interesting to test whether strengthening the statistical power of the models by increasing the number of patients in each range of ID could reveal such a correlation, as it would be of immense value in identifying critical points of metabolism as treatment targets. An alternative or complementary explanation for this result might be that, in accordance with the neuroconstructivist perspective on development, the effect of the metabolome on patients’ skills might be mediated by environmental factors (such as family environment, school, early intervention, etc.), so further study is required here.

In addition, it is possible that metabolites critically altered in DS are not included among those we have investigated (i.e., fall below the NMR detection limit). This hypothesis should be further investigated by studying a larger number of metabolites using complementary analytical platforms with higher sensitivity than NMR, like mass spectrometry33,34. By increasing the number of measured metabolites, extended metabolic network models could be developed and might provide deeper insight into the biochemical origin of cognitive impairment35,36.

Materials and methods

Ethics Statement

The study was approved by the independent Ethics Committee of the University Hospital St. Orsola-Malpighi Polyclinic, Bologna, Italy (study number: 39/2013/U/Tess). Informed written consent was obtained from all participants or their legal guardians. The patients, if over 18, or their legal guardians, signed an informed consent form for the collection of blood and clinical data before taking part in the study. All procedures were carried out in accordance with the Ethical Principles for Medical Research Involving Human Subjects of the Helsinki Declaration.

Case selection

All subjects were admitted to the Day Hospital of the Neonatology Unit, Sant’Orsola-Malpighi Polyclinic, Bologna, and this study was proposed in the context of the yearly routine follow-up provided for DS. Inclusion criteria for subjects with DS were diagnosis of DS with homogeneous or mosaic trisomy 21 and minimum age of 2 years.

A total of 175 subjects, between 3 and 37 years old, participated in the study, including 129 patients with DS and 46 healthy CTRLs chosen among DS siblings and with no evidence of abnormal karyotype. Twenty-six DS subjects had at least one sibling enrolled in the study. Blood samples, as well as clinical data, were collected during the follow-up visit.

For every collected sample, parents filled out a form with information about current fasting state, last meal and consumed medications.

Plasma sample preparation

Plasma samples were collected from all subjects enrolled in the study, according to standard procedures37,38,39.

Blood samples were collected in EDTA-coated blood collection tubes and stored at room temperature. They were treated within two hours of blood collection and every delayed treatment of the sample was recorded.

The samples were transferred to a new tube and centrifuged at 1,200 g for 10 min to separate the corpuscular fraction from plasma. The plasma fraction was isolated and centrifuged for a second time at 800 g for 30 min. The supernatant was transferred to new tubes avoiding contact with pellets or with the bottom of the tube and then divided into aliquots of 400 μL. All plasma samples were rapidly stored in a −80 °C freezer.

To avoid contamination all procedures were conducted carefully. The plasma samples were excluded from the analyses when the treatment of blood samples occurred more than two hours after their collection, or when evident contamination by residual erythrocytes at the end of the treatments occurred. Any anomalies like different plasma color or precipitates after centrifugation were noted and considered in further analysis.

NMR sample preparation, spectra processing and spectral analysis

NMR samples were prepared according to standard procedures6,40,41.

NMR spectra for all samples were acquired using a Bruker 600 MHz spectrometer (Bruker BioSpin) operating at 600.13 MHz proton Larmor frequency and equipped with a 5 mm PATXI 1H-13C-15N and 2H-decoupling probe including a z-axis gradient coil, an automatic tuning-matching (ATM) unit and an automatic, refrigerated sample changer (SampleJet, Bruker BioSpin). A BTO 2000 thermocouple served for temperature stabilization of the sample at the level of approximately 0.1 K. Before measurement, samples were kept inside the NMR probe head for 5 minutes for temperature equilibration at 310 K.

For each plasma sample, two monodimensional 1H-NMR spectra were acquired with water peak suppression and different pulse sequences that allowed the selective observation of different molecular components. The spectra were: 1) a standard NOESY42 using 32 scans, 98,304 data points, a spectral width of 18,028 Hz, an acquisition time of 2.7 s, a relaxation delay of 4 s and a mixing time of 0.1 s. This type of spectrum is made up of signals arising from low molecular weight molecules (metabolites) and signals arising from macromolecules such as lipoproteins and lipids; 2) a standard CPMG (Purcell) using 32 scans, 73,728 data points, a spectral width of 12,019 Hz and a relaxation delay of 4 s. This type of spectrum contains only the signals arising from low molecular weight molecules (metabolites).
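As a consistency check on the reported acquisition parameters, the acquisition time can be recomputed from the number of data points and the spectral width. The sketch below assumes the Bruker convention AQ = TD / (2 × SW), where TD counts real plus imaginary points; the function name is illustrative.

```python
def acquisition_time(data_points: int, spectral_width_hz: float) -> float:
    """Acquisition time in seconds, assuming AQ = TD / (2 * SW)."""
    return data_points / (2.0 * spectral_width_hz)


# Parameters reported in the text for the two experiments.
noesy_aq = acquisition_time(98_304, 18_028)
cpmg_aq = acquisition_time(73_728, 12_019)
print(f"NOESY: {noesy_aq:.2f} s, CPMG: {cpmg_aq:.2f} s")
```

Under this assumption the NOESY value agrees with the reported 2.7 s. The CPMG value is derived here for illustration only; it is not stated in the text.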

Before applying Fourier transform, free induction decays were multiplied by an exponential function equivalent to a 0.3 Hz line-broadening factor. Transformed spectra were automatically corrected for phase and baseline distortions and calibrated to the glucose doublet at δ 5.24 ppm, using TopSpin 3.5 (Bruker BioSpin)6,40,41.

Cognitive data collection

Cognitive data were collected and processed for a total of 61 DS children/adolescents from 3 to 16 years old. Following an increasingly widespread procedure in the field of intellectual disability43, the cognitive level was assessed through tests for expected mental age rather than for chronological age. This approach provides a more sensitive measure that avoids floor effects.

Children from 3 to 6 years and 11 months were assessed using the Griffiths-III scale44, a play-oriented developmental test. Considering the DS cognitive profile, two scales were used for the purpose of this study: foundation of learning (scale A), which assesses different aspects of thinking; and language and communication (scale B), which measures overall language development, including expressive language, receptive language, and to a lesser extent, the use of language to communicate socially.

Children/adolescents from 7 to 16 years old were assessed using the WPPSI-III scale45 which consists of different subtests summarized in three principal indexes: Verbal, Non Verbal and Total.

For both tests, raw scores were registered and later converted into AE scores. However, since the AE scores do not take into account the subject’s age, every statistical analysis involving AE values has been corrected for chronological age.

Moreover, an IQ score was calculated as the ratio of the subject’s AE to his/her chronological age, multiplied by 100.
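The ratio IQ defined above, together with the IQ > 40 versus IQ ≤ 40 grouping used in the analyses, can be expressed in a few lines. This is a minimal sketch; the function names and the example values are hypothetical and are not taken from the study data.

```python
def ratio_iq(age_equivalent_months: float, chronological_age_months: float) -> float:
    """Ratio IQ: age equivalent divided by chronological age, times 100."""
    if chronological_age_months <= 0:
        raise ValueError("chronological age must be positive")
    return 100.0 * age_equivalent_months / chronological_age_months


def iq_group(iq: float, threshold: float = 40.0) -> str:
    """Assign a subject to one of the two IQ groups used for comparison."""
    return "IQ > 40" if iq > threshold else "IQ <= 40"


# Hypothetical subject: age equivalent of 48 months at 120 months of age.
example_iq = ratio_iq(48, 120)
print(example_iq, iq_group(example_iq))
```

Both ages must be expressed in the same unit (months here) for the ratio to be meaningful.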

Statistical analysis

Multivariate analysis of the NMR data (aimed at analyzing the spectra as a whole) was performed on binned spectra. Each spectrum in the 10.00–0.20 ppm region was segmented into 0.02 ppm chemical shift bins and the corresponding spectral areas were integrated using the AMIX software (Bruker BioSpin). The presence of EDTA as anticoagulant gives rise to a few NMR signals that are very intense and whose intensities can differ slightly among samples. Thus, together with the spectral region containing water signals (region: 4.40–5.00 ppm), spectral regions including EDTA signals (regions: 2.53–2.60, 2.68–2.73, 3.07–3.24, 3.58–3.64 ppm) were excluded before integration to avoid the presence of potentially confounding factors in multivariate analyses41,46,47. The multivariate statistical analysis was performed using both CPMG and NOESY binned spectra.
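The binning step can be illustrated with a short sketch. It is not the AMIX implementation: it assumes the spectrum is available as paired lists of chemical shifts and intensities, and it approximates the area of each bin by summing the intensities of the points that fall inside it. The function and constant names are illustrative.

```python
from typing import List, Sequence, Tuple

# Regions (ppm) excluded before integration: water first, then EDTA signals.
EXCLUDED_REGIONS: List[Tuple[float, float]] = [
    (4.40, 5.00),
    (2.53, 2.60), (2.68, 2.73), (3.07, 3.24), (3.58, 3.64),
]


def bin_spectrum(
    ppm: Sequence[float],
    intensity: Sequence[float],
    lo: float = 0.2,
    hi: float = 10.0,
    width: float = 0.02,
    excluded: Sequence[Tuple[float, float]] = EXCLUDED_REGIONS,
) -> List[float]:
    """Sum intensities into fixed-width chemical shift bins.

    Assumption: summing point intensities stands in for the area
    integration performed by the vendor software.
    """
    n_bins = round((hi - lo) / width)
    bins = [0.0] * n_bins
    for shift, value in zip(ppm, intensity):
        if not (lo <= shift < hi):
            continue  # outside the analyzed spectral window
        if any(start <= shift <= end for start, end in excluded):
            continue  # water or EDTA region
        index = min(int((shift - lo) / width), n_bins - 1)
        bins[index] += value
    return bins
```

With the default parameters the 0.20–10.00 ppm window yields 490 bins, of which those falling entirely inside the excluded regions remain at zero and would typically be dropped before multivariate modelling.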

Different kinds of multivariate statistical techniques were applied to the obtained bins using in-house scripts written in R 3.0.2 (ref. 41).

Unsupervised Principal Component Analysis (PCA) was used to obtain a preliminary outlook of the data (visualization in reduced space, cluster detection, screening for outliers). Partial Least Squares (PLS) analysis was employed to perform supervised data reduction and classification between samples from healthy and diseased volunteers. Canonical analysis (CA) was used in combination with PLS to increase supervised data reduction and classification. The accuracy for classification was assessed by means of a Monte Carlo validation scheme: each dataset was randomly divided 200 times into a training set (90% of the data), which was used to build the model, and a test set (10% of the data), which was used to test the integrity of the model. The resulting confusion matrix was reported and its discrimination accuracy, specificity and sensitivity were estimated according to standard definitions. Each classification model was also validated using a permutation test (n = 500) and the resulting p-value was reported.
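The Monte Carlo validation scheme can be sketched as follows. To keep the example self-contained, a nearest-centroid classifier is used as a placeholder for the PLS-CA model; this substitution, the function names and the data layout are assumptions made for illustration, and the permutation test is not included.

```python
import random
from typing import Dict, List, Sequence, Tuple

Sample = Tuple[Sequence[float], int]  # (feature vector, label); 1 = DS, 0 = CTRL


def centroid(rows: List[Sequence[float]]) -> List[float]:
    """Column-wise mean of a list of feature vectors."""
    return [sum(col) / len(rows) for col in zip(*rows)]


def predict(x: Sequence[float], centroids: Dict[int, List[float]]) -> int:
    """Return the label of the closest class centroid (placeholder model)."""
    def sq_dist(c: List[float]) -> float:
        return sum((a - b) ** 2 for a, b in zip(x, c))
    return min(centroids, key=lambda label: sq_dist(centroids[label]))


def monte_carlo_validation(
    data: Sequence[Sample],
    n_splits: int = 200,
    test_fraction: float = 0.1,
    seed: int = 0,
) -> Dict[str, float]:
    """Repeated random 90/10 splits; pool the confusion matrix over all splits."""
    rng = random.Random(seed)
    tp = tn = fp = fn = 0
    n_test = max(1, round(len(data) * test_fraction))
    for _ in range(n_splits):
        shuffled = list(data)
        rng.shuffle(shuffled)
        test, train = shuffled[:n_test], shuffled[n_test:]
        # Fit the placeholder model on the training set only.
        centroids = {
            label: centroid([x for x, y in train if y == label])
            for label in (0, 1)
        }
        for x, y in test:
            guess = predict(x, centroids)
            if y == 1 and guess == 1:
                tp += 1
            elif y == 0 and guess == 0:
                tn += 1
            elif y == 0 and guess == 1:
                fp += 1
            else:
                fn += 1
    return {
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        "sensitivity": tp / (tp + fn),  # fraction of DS samples recognized
        "specificity": tn / (tn + fp),  # fraction of CTRL samples recognized
    }
```

Because the test samples never contribute to model fitting within a split, the pooled accuracy estimates performance on unseen samples rather than goodness of fit.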

Univariate analysis of the NMR data was performed on Fourier transformed and calibrated CPMG spectra. Metabolites whose peaks in the spectra were well defined and resolved were assigned and their levels analyzed. The assignment procedure was carried out using a 1H-NMR spectra library of pure organic compounds, public databases (e.g. the Human Metabolome Database48) storing reference 1H-NMR spectra of metabolites, spiking 1H-NMR experiments and literature data49. The relative concentrations of the various metabolites were calculated by integrating the corresponding signals in the spectra50, using the AssureNMR software (Bruker BioSpin) and an in-house tool written in R.

The nonparametric Wilcoxon-Mann-Whitney test was used for the determination of the meaningful metabolites. Here, a p-value < 0.05 was considered statistically significant. Considering FDR, the p-value was corrected using the Benjamini-Hochberg formula and reported as pFDR51.
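The Benjamini-Hochberg adjustment mentioned above can be written compactly. The sketch below is a generic implementation of the standard procedure, not the code used in the study; the function name is illustrative.

```python
from typing import List, Sequence


def benjamini_hochberg(p_values: Sequence[float]) -> List[float]:
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so that adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Illustrative p-values, not taken from the study.
print(benjamini_hochberg([0.01, 0.04, 0.03, 0.005]))
```

A metabolite would then be reported as significantly altered when its adjusted value falls below 0.05.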

SPSS Statistics (IBM, Version 25 for Mac OS X) was used to perform partial correlation between the level of each metabolite and the levels of all the other metabolites, controlling for the effect of chronological age (at the moment of blood collection). We considered an r-value between 0.4 and 0.7 as a moderate correlation and r > 0.7 as a strong correlation52. Briefly, from the main menu of the “SPSS Statistics” software, we selected “Analyze” and then “Correlate”; we chose “Partial...” and finally we inserted our data in the main box and inserted “Age at Blood Collection” in the “Controlling for” box. To obtain the p-value after FDR correction, we created a file with all the p-values obtained from the previous analysis and, using JMP software (SAS Institute, Version 14), from the main menu we selected “Add-in”, then the “False Discovery Rate P-value” command and finally inserted the p-value column in “PValue column”.
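For a single covariate, the partial correlation computed through the menu steps above corresponds to a closed-form expression in terms of the three pairwise Pearson coefficients. The sketch below illustrates that first-order formula together with the moderate/strong labelling; it is not the SPSS implementation, the function names are illustrative, and the handling of the exact boundary values (0.4 and 0.7) is an assumption.

```python
import math
from typing import Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def partial_corr(x: Sequence[float], y: Sequence[float], z: Sequence[float]) -> float:
    """Correlation of x and y after removing the linear effect of z (e.g. age)."""
    r_xy, r_xz, r_yz = pearson_r(x, y), pearson_r(x, z), pearson_r(y, z)
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


def strength(r: float) -> str:
    """Label a coefficient using the thresholds stated in the text."""
    magnitude = abs(r)
    if magnitude > 0.7:
        return "strong"
    if magnitude > 0.4:
        return "moderate"
    return "below threshold"
```

Two metabolites that correlate only because both change with age would show a sizeable Pearson coefficient but a partial correlation close to zero.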

To analyze the correlations between metabolite levels and AE scores obtained from Griffiths-III and WPPSI-III tests, we performed a partial correlation controlling for the effect of chronological age (at the moment of the cognitive test) using SPSS Statistics software (from the main software menu we selected “Analyze” and then “Correlate”, then “Partial...” and finally we included our data in the main box and inserted “Age” in the “Controlling for” box).

To investigate the influence of the different IQ scores on the metabolomic profiles, we divided the metabolome results into two groups: those deriving from subjects with DS having taken the Griffiths-III test and those deriving from subjects with DS having taken the WPPSI-III test. We distinguished the subjects according to the IQ scores obtained from the two kinds of cognitive tests for both groups. It is known that the IQ scores have a mean of 100 and an SD of 15 and that a subject with an IQ < −2SD has an intellectual disability. All subjects with DS included in this study have an IQ < −2SD. To perform the multivariate statistical analysis between a significant number of metabolomic profiles for each kind of cognitive test, we decided to create two main groups of data: a group of metabolomic profiles from subjects with an IQ > 40 (between 2 and 4 SD below average) and a second group from subjects with an IQ ≤ 40 (more than 4 SD below average).