Introduction

A collection of risk factors, including central obesity, insulin resistance, dyslipidemia, and hypertension, describes metabolic syndrome (MetS). Additionally, MetS is a known precursor in cardiovascular disease development1. MetS has become a major public health concern globally due to its increasing prevalence and association with various chronic diseases2. MetS etiology is quite complex, involving a strong interplay between multiple genetic, environmental and lifestyle-related factors. In European ancestry, the heritability of the MetS was estimated to be between 13–30%3,4. The early prognosis of MetS is therefore extremely valuable for early detection of individuals at high genetic risk of developing the disease later in life and for encouraging change in lifestyle to reduce risk. While numerous single nucleotide polymorphisms (SNPs) associated with individual metabolic components and diseases have been reported in genome-wide association studies (GWAS)5,6,7,8, the effect of these polymorphisms on the MetS network and related diseases is not well studied.

Further, of all MetS components, lipid levels seem under higher genetic determination9. This has also been observed in the genetic association studies suggesting that genetic effects on lipid levels are more pronounced than for other traits10. Most of the genetic association studies for MetS have been conducted in adult population5,10,11 and are limited by the usage of one-point measurements7,12,13,14. As the prevalence of MetS in children and young adults is increasing15, a better understanding of the genetics that underlies MetS throughout childhood and adolescence will provide critical insights into the origin of the disease. We performed a longitudinal analysis using a repeated measurement design for the effect of genetic variants on a quantitative MetS score from early childhood to adolescence. We examined the association between 350 pre-selected variants and the MetS score derived from measured waist circumference (WC), high-density lipoprotein (HDL), homeostasis model assessment of insulin resistance (HOMA-IR), triglycerides (TRG), systolic blood pressure (SBP) and diastolic blood pressure (DBP) in a pan-European children cohort.

Methodology

Study population

The study population was enrolled in a pan-European, multi-center, prospective IDEFICS/I.Family cohort across three-time points. The IDEFICS baseline survey included a population-based sample of 16,229 children aged 2 to 9.9 years from eight European countries (Belgium, Cyprus, Estonia, Germany, Hungary, Italy, Spain, and Sweden) who were examined the first time in 2007/2008. Follow-up examinations were conducted after two (T1) and six (T3, I.Family study) years16,17. In our longitudinal analysis using repeated measurement design, both baseline and follow-up data from the IDEFICS and I.Family study were included from all countries except Cyprus, for understanding the associations of genetic variants with MetS. In the IDEFICS/I.Family study, risk factors of lifestyle-related outcomes were investigated in young children and anthropometric and clinical examinations were conducted at each survey wave. Additionally, health characteristics and lifestyle behaviors were collected and biosamples were taken (Details in Supplementary methods). Parents gave written informed consent before study participation and children gave oral consent before the examinations. Ethical approval was obtained from the relevant local or national ethics committees by each of the study centers, namely from the Ethics Committee of the University Hospital Ghent (Belgium), the Tallinn Medical Research Ethics Committee of the National Institutes for Health Development (Estonia), the Ethics Committee of the University Bremen (Germany), the Scientific and Research Ethics Committee of the Medical Research Council Budapest (Hungary), the Ethics Committee of the Health Office Avellino (Italy), the Ethics Committee for Clinical Research of Aragon (Spain), and the Regional Ethical Review Board of Gothenburg (Sweden). We certify that all applicable institutional and governmental guidelines and regulations concerning the ethical use of human volunteers were followed during this research.

MetS Score

There are no universal definitions of MetS in children, we have, therefore, utilized a continuous MetS score as documented in a recent publication on the IDEFICS study. The MetS score was calculated summing age and sex-specific z-scores of WC, HOMA-IR, HDL, TRG, SBP, and DBP according to the following formula by Ahrens et al.18:

$${\rm{M}}{\rm{e}}{\rm{t}}{\rm{S}}\,{\rm{s}}{\rm{c}}{\rm{o}}{\rm{r}}{\rm{e}}={z}_{WC}+\frac{{z}_{SBP}+{z}_{DBP}}{2}+\frac{{z}_{TRG}-{z}_{HDL}}{2}+{z}_{HOMA-IR}$$

The components used to calculate the MetS score were based on the same risk factors used in the adult MetS definition. A higher score was associated with an unfavorable metabolic profile18. A detailed description of the measurements of components of MetS has been published previously18.

Genotyping and quality control of SNP data

Genomic DNA was extracted either from saliva or blood samples. Genotyping was conducted in two batches on 3492 children using the UK Biobank Axiom 196-Array from Affymetrix (Santa Clara, USA). We applied extensive quality control metrics to the data following the recommendations of Weale M19, based on which we excluded the following: SNPs with a call rate of less than 97.5%, failure to meet Hardy-Weinberg equilibrium at a p-value of less than 10−4, a minor allele frequency (MAF) of less than 0.5% (batch 1) and 0.08% (batch 2), samples with a call rate of less than 98% (batch 1) and 96% (batch 2), poor intensity, sex mismatch, anomalous high heterozygosity (cut-off of 3 standard deviations (SD) from mean), cryptic relatedness, no phenotypic information or as population outliers with any of a sample’s standardized principal component (PC) loading exceeds the interval mean ±3 SD19,20. We did quality control filtering using Affymetrix calling software APT and the R packages genABEL21 and SNPRelate22. A sample of 3067 children remained for further analyses. Genome-wide imputation was carried out using the Minimac3 v2.0.1 software and reference haplotypes from unrelated individuals from the 1000 Genomes Project phase III v5.

To address the issue of population stratification, we performed a principal components analysis using the SNPRelate v1.10.2 R package, where the eigenvectors or PCs are sorted in decreasing order of the corresponding eigenvalues. The first eigenvector (PC1) has the most variation in the data on the genetic matrix (SNP by sample); the second eigenvector (PC2) has the second-most, and so on. To account for relatedness in our sample, we calculated the genetic relatedness matrix (GRM) from the genotype data using the program EMMAX v20120210 (https://genome.sph.umich.edu/wiki/EMMAX). The GRM matrix along with relatedness further adjusts for population stratification.

Selection of candidate SNPs

A custom panel of SNPs were selected for analysis in this study using the following three strategies: (a) SNPs significantly associated in previous GWAS studies (p < 5 × 10−8) with MetS were identified using NHGRI-EBI GWAS Catalog23 and PubMed search (n = 29); (b) All SNP from candidate studies which were significantly associated (p < 0.05) with MetS were included using SNP curator platform24 (n = 193); (c) genes associated with MetS (using DisGenet browser25) and involved in lipid metabolism pathway (CTdbase26) were uploaded into the Candidate gene SNP selection (Genepipe) pipeline of “SNPinfo” a web-based SNP selection tool27 with European study population. The algorithm used for selecting SNPs from the provided list of genes was as follows: five kb upstream and 1 kb downstream of the gene coordinate were included in the selection. SNPs showing a MAF of 0.05 or greater were included. Tagging proportion cut-off to filter a gene was kept at 0.8 and the linkage disequilibrium (LD) threshold cut off was kept at 0.8. The minimum number of SNPs tagged by a tag SNP was set to 3. To ensure that each gene has some coverage a minimum of 1 tag SNP to a maximum of 5 tag SNPs per gene were included. Further SNPs were filtered using the functional SNP prediction in “Genepipe” that causes an amino acid change or that may alter the functional or structural properties of the translated protein, disrupt transcription factor binding sites, disrupt splice sites or other functional sites. A total of 156 SNPs were identified using this strategy. Overall, we obtained 371 SNPs after removing duplicates among the three selection strategies, out of which we had genotyping data from 357 SNPs. After excluding 4 monomorphic SNPs and 3 SNPs due to quality control issues, the final analyses were carried out on 350 SNPs (n = 117 genotyped, n = 233 imputed).

Meta-analysis

We carried out a meta-analysis to review associations between FTO variants significantly associated in the present study (rs8050136, rs1121980, rs1558902, rs9939609, rs1421085) and MetS as the outcome. We systematically searched PubMed, Web of Science and Scopus and supplemented it by scanning reference lists of articles identified (including reviews) up to December 2019. The search strategy is detailed in Supplementary Methods. Studies were eligible for inclusion if they had met all of the following criteria: (1) provided additive odds ratios (ORs) or sufficient genotypic information for calculating ORs with 95% confidence intervals (CI); (2) were retrospective or prospective in design, and (3) were conducted in humans. Studies reporting on components of MetS alone were excluded from the analysis. For each study included, the following information was extracted: first author, year of publication, geographical location, study design, sample size, number of cases and controls, information on assay performed for genotyping, effect sizes, allele/genotypic frequency in cases and controls, and confounders adjusted for in reported associations. The quality of each included study was assessed using the Newcastle-Ottawa Scale for case-control studies28 which range from zero points (low quality) to nine points (high quality). If multiple publications on the same study data were available, the most up-to-date or comprehensive information was used. Methods and results are reported following the Preferred Reporting Items for Systematic Review and Meta-Analysis Protocols (PRISMA) guidelines29.

Statistical analysis

The characteristics of study participants were presented as means (± SD) for continuous variables and as frequencies (percentages) for categorical variables. Associations between SNPs and repeated MetS score values of non-independent individuals were analyzed using the Wald t-test with one degree of freedom applied on linear mixed models (LMM), using the R package GMMAT30 adjusting for age, sex, country of residence and the top five PCs as fixed effects, and using a kinship matrix to define the covariance structure of the random effect included in the model.

To account for multiple testing, we corrected the statistical significance level to α = 0.05/350 = 1.4 × 10−4 by the Bonferroni correction and false discovery rate (FDR) method for the 350 hypothesis tests. For further analysis, we presented results for only those SNPs that survived the FDR correction. We stratified association models by sex, controlling for age, country of residence, first five PCs and kinship matrix. Additionally, we performed conditional analyses on the FTO locus rs8050136 as a covariate. To identify the driving factor in the association of SNPs and MetS, we recalculated the LMM with each of the MetS components: WC, HOMA-IR, HDL, TRG, SBP, and DBP. Throughout, we used r² to report LD between pairs of SNPs. Quantile-quantile (Q-Q) plots and the genomic inflation factor (λ) were used to evaluate control of type I error. LocusZoom31 was used to plot regions harboring significant signals (p < 1.4 × 10−4) to visualize LD patterns. Statistical analyses were performed using R 3.5.3 and Stata 15. All statistical tests were 2- sided.

Functional annotation using existing datasets

To identify potential causal genes explaining the observed genetic associations with MetS, we searched for existing expression quantitative trait loci (eQTL) SNPs in the eQTL dataset GTEx V832. We estimated the associations between the identified lead SNP and transcript expression levels for genes within a +/− 1 Mb cis window around the transcription start site or a trans-gene.

In-silico functional analysis

We examined the potential functional significance of the SNPs that reached the significance level using the combined annotation-dependent depletion (CADD) method proposed by Kircher and colleagues33. CADD produces a single C score to measure the deleteriousness of a given variant, which will greatly improve in prioritizing the causal variants while conducting genetic analyses33. We also extracted the RegulomeDB score to describe the regulatory potential of these SNPs34.

Meta-analysis Crude ORs and 95% CIs in each study were estimated using a genetic additive model and evaluated for the strength of the associations between FTO variants and MetS risk. The study reported additive ORs were utilized when sufficient information on genotypic/allelic frequencies were not provided. Study-specific risk estimates were pooled by using random-effects meta-analyses and sensitivity analyses were performed using fixed-effect meta-analyses. To determine whether the genotypes in the control group deviated from Hardy-Weinberg Equilibrium (HWE) we used the R-package HardyWeinberg35. Heterogeneity was assessed using the standard χ2 tests and I2 statistic, where I2 > 50% indicated substantial heterogeneity36. Evidence of publication bias was sought using the Egger regression test for funnel asymmetry in addition to visual inspection of the funnel plots37,38. Two-sided P values <0.05 were considered statistically significant.

Results

After quality control and analytical exclusions, we performed longitudinal analyses with genotypic information on 350 SNPs and repeated measures on study calculated MetS Scores from 3067 children at 3-time points (Fig. 1). Boys and girls were equally present in the analysis with a mean age of 6.20 (±1.77). Almost 5% of study participants were first degree relatives (Table 1).

Figure 1
figure 1

Flowchart for inclusion/exclusion criteria.

Table 1 Study characteristics at baseline.

MetS score was not available for 314 study participants in any survey. In total, 2,753 children were utilized for the main analysis to test the association between pre-selected candidate SNPs and longitudinal MetS score; however, we made use of all children to test SNP effects on the components of the MetS score. Details of exclusions are shown in the appendix (Supplementary Table 1). A genomic control factor λ of 1.22 in the Q-Q plot of the association p-values suggested slight systematic inflation (Supplementary Fig. 1). The first five PCs explain only 1% of variance suggesting there may be no hidden pattern in the dataset (Supplementary Fig. 2).

Our results yielded significant associations for 13 SNPs with p-values corrected for FDR (Table 2). With a stringent Bonferroni adjustment for multiple comparisons, we obtained significant associations (p < 1.4 × 10−4) for 5 SNPs, which were highly correlated in the 16q12.2 chromosomal region in the non-coding intronic region of the FTO gene. The SNPs located in FTO gene were in high LD (r2 > 0.87), with the strongest association signal observed for rs8050136 (Pwald = 1.52 × 10−5) (Fig. 2). In LMMs conditioned on rs8050136, the risk of other variants in 16q12.2 was completely attenuated and non-significant (Supplementary Table 2). We could not replicate previously reported GWAS SNPs of MetS conducted on adults in the present children cohort (Supplementary Table 3). The allele frequencies reported in this study were comparable to those reported for European samples (Supplementary Table 4).

Table 2 Association of markers with longitudinal Metabolic Syndrome score in children of IDEFICS/I.Family study.
Figure 2
figure 2

Regional association plot of markers with longitudinal metabolic syndrome score in children, recombination hotspots, and linkage disequilibrium heatmap for the 16q12.2 locus. −log10 of p values (left y-axis) drawn from the study participants of IDEFICS/I.Family cohort for a 500 kb region covering the entire FTO gene. The purple circle indicates the query variant (rs8050136). The LD estimates are color-coded as a heatmap from dark blue (0 ≥ r2 > 0.2) to red (0.8 ≥ r2 > 1.0). The bottom panel shows the genes and their orientation for each region. We based the association analysis on a one degree of freedom Wald t-test applied on linear mixed model, adjusted for age, sex, country of residence, first five principal components as fixed effects and kinship matrix to define the covariance structure of the random effect. The blue line represents the recombination rate (right y-axis) to estimate putative recombination hotspots across the region from HapMap.

Using data for additional covariates, we performed sex-specific analyses for SNPs that reached statistical significance (Table 3). The associations were stronger in boys compared to girls. We further went ahead to analyze the repeated measures of components of the MetS score as the outcome to understand which of the components drove the observed association. The variants in FTO were associated with higher SBP and larger WC whereas the variant A of rs708272 in CETP was strongly associated with decreased TRG levels and increased HDL levels (Supplementary Table 5).

Table 3 Association of markers with longitudinal Metabolic Syndrome stratified by sex.

A CADD-scaled C score of more than 10 for SNP rs8047395 (Supplementary Table 6) was observed in in-silico analyses. Similarly, a RegulomeDB score of four for three SNPs (rs8050136, rs1121980, and rs8044769; Supplementary Table 6) in the FTO gene was observed. Using existing eQTL datasets, we found that the rs8050136-A allele in muscle-skeletal tissue was associated with higher FTO gene expression based on the linear regression model.

Meta-analysis

We screened 193 records (Fig. 3) and identified 38 eligible studies39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63,64,65,66,67,68,69,70,71,72,73,74,75,76,77 for 5 FTO variants (8, 3, 3, 32, 10 studies for rs8050136, rs1121980, rs1558902, rs9939609 and rs1421085, respectively) on 80856 participants with 22462 cases and 58394 controls (Table 4). Including the present study there were 29760, 6343, 5532, 59411 and 9908 participants for rs8050136, rs1121980, rs1558902, rs9939609 and rs1421085 respectively. The control populations of the included studies were in HWE. In addition to ours, only 4 studies were conducted on children or adolescents. A forest plot of association of FTO variants with MetS is provided in Fig. 4. The OR for MetS and rs8050136, rs1121980, rs1558902, rs9939609 and rs1421085 was 1.17 (95% CI: 1.09–1.26), 1.14 (95% CI: 1.00–1.31), 1.26 (95% CI: 1.11–1.43), 1.14 (95% CI: 1.09–1.19) and 1.21 (95% CI: 1.08–1.35) respectively. The degree of between-study heterogeneity was least with I2 = 20.3% (P = 0.263) for rs8050136 and highest for rs1421085 with I2 = 53.5% (P = 0.018). Sensitivity analyses that used fixed-effect meta-analysis (rather than random-effects meta-analysis as in the primary analysis) yielded similar OR as random effect meta-analysis (Supplementary Fig. 3). There was no evidence for publication bias, as indicated by funnel plot analyses and Egger test for asymmetry (Supplementary Fig. 4).

Figure 3
figure 3

Flow Diagram of Study Selection Process for Meta-analysis.

Table 4 Characteristics of studies included in the meta-analysis.
Figure 4
figure 4

Forest plots of random effect meta-analysis of the association of FTO variants (rs8050136, rs1121980, rs1558902, rs9939609, rs1421085) with Metabolic Syndrome. CI = confidence interval. Sizes of data markers indicate the weight of each study in the analysis. Study-specific odds ratios were pooled using random-effects meta-analysis. Col, 2017; Dusatkova, 2013; Liem 2010; Zhao 2014 were conducted in the young population (age <18 years). Additive ORs were used as indicated in the study for Liem, 2010; Sjogren, 2008; Zhao; 2014.

Discussion

Over the past decade, common genetic loci have been reported to be associated with MetS in different studies, mostly at a single time-point using a cross-sectional or a case-control approach7,76,78. Our study took a step ahead in investigating 350 pre-selected loci for their longitudinal association with a continuous MetS score during the transition from childhood to adolescence in a pan-European cohort of children with a follow-up period of up to seven years. We observed a strong association between common genetic variants in the FTO and longitudinal MetS score after Bonferroni correction for multiple comparisons. We observed stronger associations in boys as compared to girls. The effect sizes observed in our study on children were much larger than those reported in adults further suggesting greater genetic predisposition and lower influence from environmental and behavioral factors in youth.

The FTO gene codes for a nuclear protein of the non-haem iron and 2-oxoglutarate-dependent oxygenase superfamily, which is involved in posttranslational modification, DNA repair, and fatty acid metabolism79. FTO which is primarily expressed in the hypothalamus, plays a key role in energy homeostasis and regulation of food intake80. Even DNA methylation studies have shown an association with many pathological conditions including obesity81,82. FTO may thus play a role in metabolic regulation by altering gene expression in metabolically active tissues83. While the exact mechanism remains to be unraveled, it has been shown that genetic variants within the FTO gene are linked functionally to another obesity-related gene called IRX3, which promotes browning of white adipocytes, maybe a connecting link between FTO variants and obesity-related disorders76,84,85. Further, previous studies have observed that individuals homozygous for the risk alleles in FTO have impaired metabolic profile86,87,88. Similarly, our findings of the FTO association with MetS score may be related to its association with obesity89,90, T2DM91 and/or lipid abnormalities92,93. This is supported by the associations we observed between FTO variants and components of the MetS, particularly with WC and SBP. Various candidate gene studies have observed association between FTO variants and MetS in adults71,73,77,93 across different ethnicities73,76,93,94. Our results confirm the association of FTO variants and MetS in children and adolescent populations via its implication in the regulation of body fatness.

Though the CETP variant did not survive conservative Bonferroni correction, we observed a strong association of rs708272 with increased HDL (ß = 4.03, p = 5.63 × 10−40) and decreased TRG (ß = −2.43, p = 9.60 × 10−5) levels. Consistent to our observations previous literature has shown that some variants in the CETP gene, an essential protein of reverse cholesterol transport process are associated with decreased plasma CETP protein activity and protein levels, culminating in higher concentrations of HDL95,96 and reduced concentrations of TRG13. Similarly, meta-analyses have shown that carriers of the T allele, associated with lower CETP, have higher HDL concentrations than CC homozygotes97 and thereby showing an inverse association with MetS. Further, rs708272 of the CETP gene was moderately correlated (r2 = 0.47, MAF = 0.41) with the GWAS-identified SNP rs17353910, a less common SNP (MAF = 0.30) which could not be detected in the present study given the moderate sample size. We observed a significant association of rs708272 with MetS score after adjusting for BMI z-scores (Supplementary Table 7), suggesting that the association may partly be driven by lipid metabolism in addition to obesity.

In-silico examinations of the possible functional significance of SNPs found in our sample suggested that the FTO gene had a CADD C score of over 10 for one SNP. Likewise, the RegulomeDB score of 4 in the FTO gene for three SNPs suggests that transcription factor binding could be impaired by these SNPs., thus indicating that one or more variants in the FTO gene are likely to have a functional effect. Analysis of the eQTL showed that the rs8050136-A allele may upregulate the level of FTO gene expression in the muscle-skeletal tissue. However, to establish the biological function of these variants of susceptibility, more functional work is needed.

To further assess whether the MetS score association results vary by sex, we performed stratified analysis. The associations remained significant for both boys and girls with slightly stronger associations observed in boys. This is obvious as MetS is more common in adult males as compared to adult females in Europeans and other high-income countries98. A possible explanation could be due to the sex-modulated fat distribution interactions with the dynamics of cardiometabolic risk99.

In recent years there has been no meta-analysis on the FTO variants and MetS94,100,101,102, therefore the present meta-analysis provides an updated overview of the risk associated with variants in 16q12.2 involving data from 38 studies on 80856 participants plus the present IDEFICS/I.Family study. Pooled estimates from the meta-analysis further confirmed our findings for rs8050136, rs1121980, rs1558902, rs9939609, rs1421085 and MetS risk. Again, most of the studies in the meta-analysis were conducted on adults which may not be an appropriate extrapolation to children, given its greater impact in children compared to adults103.

Strengths of our study include the design (samples derived from a well-phenotyped cohort of children), an accurate and highly standardized outcome measurement, and the ability to include several important covariates. To our knowledge, this is the first study to report common genetic variation conferring MetS risk with longitudinal analysis in children104. The study could have benefitted further by in-depth laboratory functional assays, but this was beyond the scope of this paper. We therefore conducted an in-silico functional analysis. Though the study was adequately powered to detect associations with common genetic variations, we couldn’t replicate the previously identified GWAS SNPs conducted in adults, which could be for example attributable to absence of power to detect less common SNPs or SNPs with small effects, to differences in linkage disequilibrium, age group structure or the analytical methods across studies105. However, the greater impact of FTO variants in children as compared to adults is well known106,107, and therefore the association of the FTO variants in childhood MetS etiology, not observed by GWAS of the adult population, implies the involvement of different SNPs at different age groups.

In conclusion, the results from the present study along with the comprehensive meta-analysis advance etiologic understanding of childhood MetS, highlight that the genetic predisposition to MetS is largely driven by genes of obesity and lipid metabolism. Future work on functional characterization will further help in understanding the biological underpinnings underlying long-term MetS regulation. Our observation of distinct associations of variants of FTO and CETP for different component traits of MetS in children, suggests devising polygenic scores for MetS which may prove to be fundamental for identifying children and subsequently adults of the high-risk group to allow earlier targeted interventions.