Common genetic variation in obesity, lipid transfer genes and risk of Metabolic Syndrome: Results from IDEFICS/I.Family study and meta-analysis

As the prevalence of metabolic syndrome (MetS) in children and young adults is increasing, a better understanding of the genetics that underlie MetS will provide critical insights into the origin of the disease. We examined associations between common genetic variants and a repeated MetS score from early childhood to adolescence in the pan-European, prospective IDEFICS/I.Family cohort study, with a baseline survey and follow-up examinations after two and six years. We tested associations in 3067 children using a linear mixed model and confirmed the results with a meta-analysis of the identified SNPs. With a stringent Bonferroni adjustment for multiple comparisons, we obtained significant associations (p < 1.4 × 10⁻⁴) for 5 SNPs, which were in high LD (r² > 0.85) in the 16q12.2 non-coding intronic chromosomal region of the FTO gene, with the strongest association observed for rs8050136 (effect size (β) = 0.31, pWald = 1.52 × 10⁻⁵). We also observed a strong association of rs708272 in CETP with increased HDL (p = 5.63 × 10⁻⁴⁰) and decreased TRG (p = 9.60 × 10⁻⁵) levels. These findings, along with the meta-analysis, advance etiologic understanding of childhood MetS, highlighting that genetic predisposition to MetS is largely driven by genes of obesity and lipid metabolism. Inclusion of the associated genetic variants in polygenic scores for MetS may prove to be fundamental for identifying children, and subsequently adults, in the high-risk group to allow earlier targeted interventions.

Further, of all MetS components, lipid levels seem to be under higher genetic determination 9 . This has also been observed in genetic association studies, which suggest that genetic effects on lipid levels are more pronounced than for other traits 10 . Most of the genetic association studies for MetS have been conducted in adult populations 5,10,11 and are limited by the use of single-time-point measurements 7,12-14 . As the prevalence of MetS in children and young adults is increasing 15 , a better understanding of the genetics that underlies MetS throughout childhood and adolescence will provide critical insights into the origin of the disease. We performed a longitudinal analysis using a repeated measurement design for the effect of genetic variants on a quantitative MetS score from early childhood to adolescence. We examined the association between 350 pre-selected variants and the MetS score derived from measured waist circumference (WC), high-density lipoprotein (HDL), homeostasis model assessment of insulin resistance (HOMA-IR), triglycerides (TRG), systolic blood pressure (SBP) and diastolic blood pressure (DBP) in a pan-European cohort of children.

Methodology
Study population. The study population was enrolled in the pan-European, multi-center, prospective IDEFICS/I.Family cohort and examined at three time points. The IDEFICS baseline survey included a population-based sample of 16,229 children aged 2 to 9.9 years from eight European countries (Belgium, Cyprus, Estonia, Germany, Hungary, Italy, Spain, and Sweden) who were first examined in 2007/2008. Follow-up examinations were conducted after two (T1) and six (T3, I.Family study) years 16,17 . In our longitudinal analysis using a repeated measurement design, both baseline and follow-up data from the IDEFICS and I.Family studies were included from all countries except Cyprus to examine the associations of genetic variants with MetS. In the IDEFICS/I.Family study, risk factors of lifestyle-related outcomes were investigated in young children, and anthropometric and clinical examinations were conducted at each survey wave. Additionally, health characteristics and lifestyle behaviors were recorded and biosamples were taken (details in Supplementary Methods). Parents gave written informed consent before study participation and children gave oral consent before the examinations. Ethical approval was obtained from the relevant local or national ethics committees by each of the study centers, namely from the Ethics Committee of the University Hospital Ghent (Belgium), the Tallinn Medical Research Ethics Committee of the National Institutes for Health Development (Estonia), the Ethics Committee of the University Bremen (Germany), the Scientific and Research Ethics Committee of the Medical Research Council Budapest (Hungary), the Ethics Committee of the Health Office Avellino (Italy), the Ethics Committee for Clinical Research of Aragon (Spain), and the Regional Ethical Review Board of Gothenburg (Sweden). We certify that all applicable institutional and governmental guidelines and regulations concerning the ethical use of human volunteers were followed during this research.
MetS Score. Because there is no universal definition of MetS in children, we used a continuous MetS score as documented in a recent publication on the IDEFICS study. The MetS score was calculated by summing age- and sex-specific z-scores of WC, HOMA-IR, HDL, TRG, SBP, and DBP according to the formula of Ahrens et al. 18 . The components used to calculate the MetS score were based on the same risk factors used in the adult MetS definition. A higher score was associated with an unfavorable metabolic profile 18 . A detailed description of the measurements of the MetS components has been published previously 18 .
Genotyping and quality control of SNP data. Genomic DNA was extracted either from saliva or blood samples. Genotyping was conducted in two batches on 3492 children using the UK Biobank Axiom 196-Array from Affymetrix (Santa Clara, USA). We applied extensive quality control metrics to the data following the recommendations of Weale M 19 , based on which we excluded the following: SNPs with a call rate of less than 97.5%, deviation from Hardy-Weinberg equilibrium at a p-value of less than 10⁻⁴, or a minor allele frequency (MAF) of less than 0.5% (batch 1) and 0.08% (batch 2); and samples with a call rate of less than 98% (batch 1) and 96% (batch 2), poor intensity, sex mismatch, anomalously high heterozygosity (cut-off of 3 standard deviations (SD) from the mean), cryptic relatedness, no phenotypic information, or classification as population outliers, defined as samples with any standardized principal component (PC) loading outside the interval mean ± 3 SD 19,20 . We performed quality control filtering using the Affymetrix calling software APT and the R packages genABEL 21 and SNPRelate 22 . A sample of 3067 children remained for further analyses. Genome-wide imputation was carried out using the Minimac3 v2.0.1 software and reference haplotypes from unrelated individuals from the 1000 Genomes Project phase III v5.
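The formula referred to under "MetS Score" is not reproduced in this text. As a minimal sketch, and assuming the additive form published by Ahrens et al. 18 (the WC z-score, plus the mean of the SBP and DBP z-scores, plus half the difference between the TRG and HDL z-scores, plus the HOMA-IR z-score), the score could be computed as follows. The function name and the example z-scores are hypothetical, and the age- and sex-specific z-scores are assumed to have been derived beforehand.

```python
def mets_score(z_wc, z_sbp, z_dbp, z_trg, z_hdl, z_homa):
    """Continuous MetS score from age- and sex-specific z-scores.

    ASSUMPTION: additive form attributed to Ahrens et al. (ref. 18);
    the formula itself is not printed in this text.
    """
    blood_pressure = (z_sbp + z_dbp) / 2  # mean of the two blood-pressure z-scores
    lipids = (z_trg - z_hdl) / 2          # HDL enters negatively (high HDL is favorable)
    return z_wc + blood_pressure + lipids + z_homa


# Hypothetical z-scores for one child (illustrative values only)
score = mets_score(z_wc=1.0, z_sbp=0.5, z_dbp=0.3, z_trg=0.8, z_hdl=-0.4, z_homa=0.6)
print(round(score, 2))
```

Under this form, a child at the reference mean on every component scores zero, and higher values indicate a less favorable metabolic profile, in line with the description above.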
To address the issue of population stratification, we performed a principal components analysis using the SNPRelate v1.10.2 R package, where the eigenvectors or PCs are sorted in decreasing order of the corresponding eigenvalues. The first eigenvector (PC1) captures the most variation in the genetic data matrix (SNP by sample); the second eigenvector (PC2) captures the second-most, and so on. To account for relatedness in our sample, we calculated the genetic relatedness matrix (GRM) from the genotype data using the program EMMAX v20120210 (https://genome.sph.umich.edu/wiki/EMMAX). In addition to accounting for relatedness, the GRM further adjusts for population stratification.
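To make the two quantities above concrete, the following sketch builds a GRM from standardized genotype dosages and extracts its leading eigenvector by power iteration. It is an illustration under stated assumptions, not the SNPRelate or EMMAX implementation: the standardized-genotype estimator is one common GRM definition and may differ from the one EMMAX applies, and the toy genotype matrix and all function names are hypothetical.

```python
import math
import random


def standardize(genotypes):
    """Center and scale each SNP column of a samples x SNPs dosage matrix (0/1/2).

    ASSUMPTION: monomorphic SNPs were removed beforehand (they would give sd = 0).
    """
    n, m = len(genotypes), len(genotypes[0])
    z = [[0.0] * m for _ in range(n)]
    for j in range(m):
        p = sum(row[j] for row in genotypes) / (2 * n)  # effect allele frequency
        sd = math.sqrt(2 * p * (1 - p))                 # SD under Hardy-Weinberg
        for i in range(n):
            z[i][j] = (genotypes[i][j] - 2 * p) / sd
    return z


def grm(z):
    """Genetic relatedness matrix: Z Z^T / number of SNPs."""
    n, m = len(z), len(z[0])
    return [[sum(z[i][k] * z[j][k] for k in range(m)) / m for j in range(n)]
            for i in range(n)]


def leading_pc(matrix, iterations=200, seed=0):
    """Largest eigenvalue and its eigenvector (PC1) by power iteration."""
    n = len(matrix)
    rng = random.Random(seed)
    v = [rng.random() for _ in range(n)]
    for _ in range(iterations):
        w = [sum(matrix[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    eigenvalue = sum(v[i] * sum(matrix[i][j] * v[j] for j in range(n))
                     for i in range(n))
    return eigenvalue, v


# Hypothetical toy data: 4 samples x 3 SNPs, two genetically distinct pairs
toy = [[2, 2, 0], [2, 1, 0], [0, 0, 2], [0, 1, 2]]
eigenvalue, pc1 = leading_pc(grm(standardize(toy)))
print(eigenvalue, pc1)
```

In this toy example the PC1 loadings separate the first two samples from the last two, which is the kind of structure the PCs absorb when they are entered as fixed effects.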
Selection of candidate SNPs. A custom panel of SNPs was selected for analysis in this study using the following three strategies: (a) SNPs significantly associated with MetS in previous GWAS (p < 5 × 10⁻⁸) were identified using the NHGRI-EBI GWAS Catalog 23 and a PubMed search (n = 29); (b) all SNPs from candidate studies that were significantly associated (p < 0.05) with MetS were included using the SNP curator platform 24 (n = 193); (c) genes associated with MetS (using the DisGenet browser 25 ) and involved in the lipid metabolism pathway (CTdbase 26 ) were uploaded into the candidate gene SNP selection (Genepipe) pipeline of "SNPinfo", a web-based SNP selection tool 27 , with a European study population. The algorithm used for selecting SNPs from the provided list of genes was as follows: 5 kb upstream and 1 kb downstream of the gene coordinates were included in the selection. SNPs showing a MAF of 0.05 or greater were included. The tagging proportion cut-off to filter a gene was kept at 0.8 and the linkage disequilibrium (LD) threshold was kept at 0.8. The minimum number of SNPs tagged by a tag SNP was set to 3. To ensure that each gene had some coverage, a minimum of 1 and a maximum of 5 tag SNPs per gene were included. SNPs were further filtered using the functional SNP prediction in "Genepipe" to retain those that cause an amino acid change or that may alter the functional or structural properties of the translated protein, disrupt transcription factor binding sites, or disrupt splice sites or other functional sites. A total of 156 SNPs were identified using this strategy. Overall, we obtained 371 SNPs after removing duplicates among the three selection strategies, out of which we had genotyping data for 357 SNPs. After excluding 4 monomorphic SNPs and 3 SNPs due to quality control issues, the final analyses were carried out on 350 SNPs (n = 117 genotyped, n = 233 imputed).
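The LD threshold of 0.8 used for tag SNP selection refers to the pairwise r² statistic. As a hedged illustration of the standard definition (not of the internal SNPinfo computation, which is not described here), r² can be obtained from the haplotype frequency of two alleles and their marginal allele frequencies. The function names and the example frequencies are hypothetical.

```python
def ld_r2(p_ab, p_a, p_b):
    """r^2 between two biallelic SNPs.

    p_ab: frequency of the haplotype carrying allele A at SNP 1 and allele B at SNP 2
    p_a, p_b: marginal frequencies of allele A and allele B
    """
    d = p_ab - p_a * p_b  # disequilibrium coefficient D
    return d * d / (p_a * (1 - p_a) * p_b * (1 - p_b))


def is_tagged(p_ab, p_a, p_b, threshold=0.8):
    """True if the pair meets the LD threshold used for tagging."""
    return ld_r2(p_ab, p_a, p_b) >= threshold


# Hypothetical frequencies for a pair of SNPs
print(ld_r2(p_ab=0.38, p_a=0.40, p_b=0.42))
print(is_tagged(p_ab=0.38, p_a=0.40, p_b=0.42))
```

The same statistic underlies the r² values reported for the FTO variants in the Results.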

Meta-analysis.
We carried out a meta-analysis to review associations between the FTO variants significantly associated in the present study (rs8050136, rs1121980, rs1558902, rs9939609, rs1421085) and MetS as the outcome. We systematically searched PubMed, Web of Science and Scopus and supplemented the search by scanning reference lists of the articles identified (including reviews) up to December 2019. The search strategy is detailed in Supplementary Methods. Studies were eligible for inclusion if they met all of the following criteria: (1) provided additive odds ratios (ORs) or sufficient genotypic information for calculating ORs with 95% confidence intervals (CI); (2) were retrospective or prospective in design, and (3) were conducted in humans. Studies reporting on components of MetS alone were excluded from the analysis. For each study included, the following information was extracted: first author, year of publication, geographical location, study design, sample size, number of cases and controls, information on the assay performed for genotyping, effect sizes, allele/genotypic frequency in cases and controls, and confounders adjusted for in reported associations. The quality of each included study was assessed using the Newcastle-Ottawa Scale for case-control studies 28 , which ranges from zero points (low quality) to nine points (high quality). If multiple publications on the same study data were available, the most up-to-date or comprehensive information was used. Methods and results are reported following the Preferred Reporting Items for Systematic Review and Meta-Analysis Protocols (PRISMA) guidelines 29 .
Statistical analysis. The characteristics of study participants were presented as means (± SD) for continuous variables and as frequencies (percentages) for categorical variables. Associations between SNPs and repeated MetS score values of non-independent individuals were analyzed using the Wald t-test with one degree of freedom applied on linear mixed models (LMM), using the R package GMMAT 30 adjusting for age, sex, country of residence and the top five PCs as fixed effects, and using a kinship matrix to define the covariance structure of the random effect included in the model.
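For orientation, the Wald test divides the estimated coefficient by its standard error and compares the result with a reference distribution. The sketch below uses a normal approximation, which is an assumption that is reasonable for large samples and is not the GMMAT implementation. The effect size is the one reported for rs8050136; the standard error is hypothetical, since it is not given in this text.

```python
from statistics import NormalDist


def wald_test(beta, se):
    """Return the Wald statistic and a two-sided p-value (normal approximation)."""
    z = beta / se
    p = 2 * NormalDist().cdf(-abs(z))  # lower tail avoids precision loss for large |z|
    return z, p


# beta as reported for rs8050136; the standard error is HYPOTHETICAL
z, p = wald_test(beta=0.31, se=0.07)
print(f"z = {z:.2f}, p = {p:.2e}")
```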
To account for multiple testing, we applied a Bonferroni correction, setting the significance level to α = 0.05/350 = 1.4 × 10⁻⁴ for the 350 hypothesis tests, and additionally applied the false discovery rate (FDR) method. For further analysis, we presented results for only those SNPs that survived the FDR correction. We stratified association models by sex, controlling for age, country of residence, the first five PCs and the kinship matrix. Additionally, we performed conditional analyses on the FTO locus, including rs8050136 as a covariate. To identify the driving factor in the association of SNPs and MetS, we refitted the LMM with each of the MetS components as the outcome: WC, HOMA-IR, HDL, TRG, SBP, and DBP. Throughout, we used r² to report LD between pairs of SNPs. Quantile-quantile (Q-Q) plots and the genomic inflation factor (λ) were used to evaluate control of type I error. LocusZoom 31 was used to plot regions harboring significant signals (p < 1.4 × 10⁻⁴) to visualize LD patterns. Statistical analyses were performed using R 3.5.3 and Stata 15. All statistical tests were 2-sided.
Functional annotation using existing datasets. To identify potential causal genes explaining the observed genetic associations with MetS, we searched for existing expression quantitative trait loci (eQTL) SNPs in the eQTL dataset GTEx V8 32 . We estimated the associations between the identified lead SNP and transcript expression levels for genes within a ±1 Mb cis window around the transcription start site or a trans-gene.
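The multiple-testing correction and the inflation check described under "Statistical analysis" can be sketched as follows. This is a minimal illustration, assuming the Benjamini-Hochberg procedure as the FDR method (the text does not name the specific procedure) and the usual definition of λ as the ratio of the median observed 1-df chi-square statistic to its expected median under the null. The function names and the example p-values are hypothetical.

```python
from statistics import NormalDist, median


def bonferroni_threshold(alpha, n_tests):
    """Per-test significance level under Bonferroni correction."""
    return alpha / n_tests


def benjamini_hochberg(p_values):
    """BH-adjusted p-values, returned in the input order.

    ASSUMPTION: Benjamini-Hochberg is used as the FDR method.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def genomic_inflation(p_values):
    """Lambda: median observed chi-square (1 df) over its null median (about 0.455)."""
    nd = NormalDist()
    chi2 = [nd.inv_cdf(p / 2) ** 2 for p in p_values]  # two-sided p to 1-df chi-square
    expected_median = nd.inv_cdf(0.75) ** 2
    return median(chi2) / expected_median


threshold = bonferroni_threshold(0.05, 350)  # 350 tests, as in this study
print(f"{threshold:.2e}")
print(1.52e-5 < threshold)                   # reported p-value for rs8050136

# Hypothetical p-values for illustration
print(benjamini_hochberg([0.01, 0.04, 0.03, 0.005]))
```

A value of λ close to 1 indicates that the bulk of the test statistics follows the null distribution; values above 1 point to inflation.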
In-silico functional analysis. We examined the potential functional significance of the SNPs that reached the significance level using the combined annotation-dependent depletion (CADD) method proposed by Kircher and colleagues 33 . CADD produces a single C score that measures the deleteriousness of a given variant, which helps to prioritize causal variants in genetic analyses 33 . We also extracted the RegulomeDB score to describe the regulatory potential of these SNPs 34 .
Meta-analysis. Crude ORs and 95% CIs in each study were estimated using a genetic additive model to evaluate the strength of the associations between FTO variants and MetS risk. The additive ORs reported by a study were used when sufficient information on genotypic/allelic frequencies was not provided. Study-specific risk estimates were pooled using random-effects meta-analyses, and sensitivity analyses were performed using fixed-effect meta-analyses. To determine whether the genotypes in the control group deviated from Hardy-Weinberg Equilibrium (HWE), we used the R package HardyWeinberg 35 . Heterogeneity was assessed using the standard χ² test and the I² statistic, where I² > 50% indicated substantial heterogeneity 36 . Evidence of publication bias was sought using the Egger regression test for funnel asymmetry in addition to visual inspection of the funnel plots 37,38 . Two-sided P values <0.05 were considered statistically significant.
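The pooling steps above can be illustrated with a short sketch. Two assumptions are made that the text does not confirm: the per-allele OR is obtained by allele counting with Woolf's standard error as a simple stand-in for the additive model, and the random-effects estimate uses the DerSimonian-Laird estimator of the between-study variance. The function names and the genotype counts are hypothetical.

```python
import math


def allelic_log_or(case_counts, control_counts):
    """Per-allele log OR and standard error from genotype counts.

    Counts are (risk-allele homozygotes, heterozygotes, other homozygotes).
    ASSUMPTION: allele counting with Woolf's SE approximates the additive model.
    """
    def alleles(counts):
        hom_risk, het, hom_other = counts
        return 2 * hom_risk + het, 2 * hom_other + het

    a, b = alleles(case_counts)     # risk / other alleles in cases
    c, d = alleles(control_counts)  # risk / other alleles in controls
    log_or = math.log((a * d) / (b * c))
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    return log_or, se


def random_effects(log_ors, ses):
    """DerSimonian-Laird pooling; needs at least two studies.

    Returns pooled log OR, its SE, Cochran's Q and I^2 (in percent).
    """
    w = [1 / s ** 2 for s in ses]
    fixed = sum(wi * y for wi, y in zip(w, log_ors)) / sum(w)
    q = sum(wi * (y - fixed) ** 2 for wi, y in zip(w, log_ors))
    df = len(log_ors) - 1
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - df) / c)   # between-study variance
    w_star = [1 / (s ** 2 + tau2) for s in ses]
    pooled = sum(wi * y for wi, y in zip(w_star, log_ors)) / sum(w_star)
    se_pooled = math.sqrt(1 / sum(w_star))
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0
    return pooled, se_pooled, q, i2


# Hypothetical genotype counts (cases, controls) for three studies
studies = [((30, 50, 20), (20, 50, 30)),
           ((45, 90, 65), (35, 95, 70)),
           ((12, 40, 48), (10, 38, 52))]
estimates = [allelic_log_or(cases, controls) for cases, controls in studies]
pooled, se, q, i2 = random_effects([e[0] for e in estimates],
                                   [e[1] for e in estimates])
lower, upper = math.exp(pooled - 1.96 * se), math.exp(pooled + 1.96 * se)
print(math.exp(pooled), lower, upper, i2)
```

When the between-study variance is estimated as zero, the random-effects and fixed-effect estimates coincide, which is what the fixed-effect sensitivity analysis checks for.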

Results
After quality control and analytical exclusions, we performed longitudinal analyses with genotypic information on 350 SNPs and repeated measures of the calculated MetS score from 3067 children at three time points (Fig. 1). Boys and girls were equally represented in the analysis, with a mean age of 6.20 (±1.77) years. Almost 5% of study participants were first-degree relatives (Table 1).
MetS score was not available for 314 study participants in any survey. In total, 2,753 children were utilized for the main analysis to test the association between pre-selected candidate SNPs and longitudinal MetS score; however, we made use of all children to test SNP effects on the components of the MetS score. Details of exclusions are shown in the appendix (Supplementary Table 1). A genomic control factor λ of 1.22 in the Q-Q plot of the association p-values suggested slight systematic inflation (Supplementary Fig. 1). The first five PCs explain only 1% of variance suggesting there may be no hidden pattern in the dataset (Supplementary Fig. 2).
Our results yielded significant associations for 13 SNPs with p-values corrected for FDR (Table 2). With a stringent Bonferroni adjustment for multiple comparisons, we obtained significant associations (p < 1.4 × 10⁻⁴) for 5 SNPs, which were highly correlated in the 16q12.2 chromosomal region in the non-coding intronic region of the FTO gene. The SNPs located in the FTO gene were in high LD (r² > 0.87), with the strongest association signal observed for rs8050136 (P wald = 1.52 × 10⁻⁵) (Fig. 2). In LMMs conditioned on rs8050136, the risk of other variants in 16q12.2 was completely attenuated and non-significant (Supplementary Table 2). We could not replicate previously reported GWAS SNPs of MetS conducted on adults in the present children cohort (Supplementary Table 3). The allele frequencies reported in this study were comparable to those reported for European samples (Supplementary Table 4).
Figure caption (partially recovered): We based the association analysis on a one degree of freedom Wald t-test applied on a linear mixed model, adjusted for age, sex, country of residence and the first five principal components as fixed effects, with a kinship matrix to define the covariance structure of the random effect. The blue line represents the recombination rate (right y-axis) to estimate putative recombination hotspots across the region from HapMap.
Table 3. Association of markers with longitudinal Metabolic Syndrome stratified by sex. ß = estimated coefficient, Chr = chromosome, EAF = effect allele frequency, FDR = false discovery rate, PVAL = p-value, SNP = single nucleotide polymorphism, SE = standard error. The effect allele is the allele corresponding to the calculated risk. Adjusted for age, sex, country of residence, first five principal components as fixed effects and kinship matrix to define the covariance structure of the random effect. The results here are presented for the markers that reached statistical significance after correction for FDR in the main analysis in Table 2.
Using data for additional covariates, we performed sex-specific analyses for SNPs that reached statistical significance (Table 3). The associations were stronger in boys than in girls. We further analyzed the repeated measures of each component of the MetS score as the outcome to understand which of the components drove the observed association. The variants in FTO were associated with higher SBP and larger WC, whereas the A allele of rs708272 in CETP was strongly associated with decreased TRG levels and increased HDL levels (Supplementary Table 5).
A CADD-scaled C score of more than 10 for SNP rs8047395 (Supplementary Table 6) was observed in in-silico analyses. Similarly, a RegulomeDB score of four for three SNPs (rs8050136, rs1121980, and rs8044769; Supplementary Table 6) in the FTO gene was observed. Using existing eQTL datasets, we found that the rs8050136-A allele in muscle-skeletal tissue was associated with higher FTO gene expression based on the linear regression model.

Discussion
Over the past decade, common genetic loci have been reported to be associated with MetS in different studies, mostly at a single time-point using a cross-sectional or a case-control approach 7,76,78 . Our study went a step further by investigating 350 pre-selected loci for their longitudinal association with a continuous MetS score during the transition from childhood to adolescence in a pan-European cohort of children with a follow-up period of up to seven years. We observed a strong association between common genetic variants in FTO and the longitudinal MetS score after Bonferroni correction for multiple comparisons. We observed stronger associations in boys than in girls. The effect sizes observed in our study on children were much larger than those reported in adults, further suggesting greater genetic predisposition and lower influence from environmental and behavioral factors in youth.
The FTO gene codes for a nuclear protein of the non-haem iron and 2-oxoglutarate-dependent oxygenase superfamily, which is involved in posttranslational modification, DNA repair, and fatty acid metabolism 79 . FTO, which is primarily expressed in the hypothalamus, plays a key role in energy homeostasis and regulation of food intake 80 . DNA methylation studies have also shown an association with many pathological conditions, including obesity 81,82 . FTO may thus play a role in metabolic regulation by altering gene expression in metabolically active tissues 83 . While the exact mechanism remains to be unraveled, it has been shown that genetic variants within the FTO gene are linked functionally to another obesity-related gene called IRX3, which promotes browning of white adipocytes and may be a connecting link between FTO variants and obesity-related disorders 76,84,85 . Further, previous studies have observed that individuals homozygous for the risk alleles in FTO have an impaired metabolic profile 86-88 . Similarly, our findings of the FTO association with the MetS score may be related to its association with obesity 89,90 , T2DM 91 and/or lipid abnormalities 92,93 . This is supported by the associations we observed between FTO variants and components of the MetS, particularly WC and SBP. Various candidate gene studies have observed an association between FTO variants and MetS in adults 71,73,77,93 across different ethnicities 73,76,93,94 . Our results confirm the association of FTO variants with MetS in child and adolescent populations via its implication in the regulation of body fatness.
Though the CETP variant did not survive conservative Bonferroni correction, we observed a strong association of rs708272 with increased HDL (ß = 4.03, p = 5.63 × 10⁻⁴⁰) and decreased TRG (ß = −2.43, p = 9.60 × 10⁻⁵) levels. Consistent with our observations, previous literature has shown that some variants in the CETP gene, which encodes an essential protein of the reverse cholesterol transport process, are associated with decreased plasma CETP protein activity and protein levels, culminating in higher concentrations of HDL 95,96 and reduced concentrations of TRG 13 . Similarly, meta-analyses have shown that carriers of the T allele, associated with lower CETP, have higher HDL concentrations than CC homozygotes 97 , thereby showing an inverse association with MetS. Further, rs708272 of the CETP gene was moderately correlated (r² = 0.47, MAF = 0.41) with the GWAS-identified SNP rs173539 10 , a less common SNP (MAF = 0.30) which could not be detected in the present study given the moderate sample size. We observed a significant association of rs708272 with the MetS score after adjusting for BMI z-scores (Supplementary Table 7), suggesting that the association may partly be driven by lipid metabolism in addition to obesity.
In-silico examination of the possible functional significance of the SNPs found in our sample showed a CADD C score of over 10 for one SNP in the FTO gene. Likewise, the RegulomeDB score of 4 for three SNPs in the FTO gene suggests that transcription factor binding could be impaired by these SNPs, indicating that one or more variants in the FTO gene are likely to have a functional effect. Analysis of the eQTL data showed that the rs8050136-A allele may upregulate the level of FTO gene expression in muscle-skeletal tissue. However, more functional work is needed to establish the biological function of these susceptibility variants.
To further assess whether the MetS score association results vary by sex, we performed a stratified analysis. The associations remained significant for both boys and girls, with slightly stronger associations observed in boys. This is in line with the higher prevalence of MetS in adult males than in adult females in European and other high-income populations 98 . A possible explanation is that sex-specific fat distribution interacts with the dynamics of cardiometabolic risk 99 .
In recent years there has been no meta-analysis on FTO variants and MetS 94,100-102 ; the present meta-analysis therefore provides an updated overview of the risk associated with variants in 16q12.2, involving data from 38 studies on 80856 participants plus the present IDEFICS/I.Family study. Pooled estimates from the meta-analysis further confirmed our findings for rs8050136, rs1121980, rs1558902, rs9939609, rs1421085 and MetS risk. However, most of the studies in the meta-analysis were conducted in adults, so the pooled estimates may not extrapolate well to children, given the greater impact of FTO variants in children compared to adults 103 .
Strengths of our study include the design (samples derived from a well-phenotyped cohort of children), an accurate and highly standardized outcome measurement, and the ability to include several important covariates. To our knowledge, this is the first study to report common genetic variation conferring MetS risk with longitudinal analysis in children 104 . The study could have benefitted further from in-depth laboratory functional assays, but this was beyond the scope of this paper. We therefore conducted an in-silico functional analysis. Though the study was adequately powered to detect associations with common genetic variations, we could not replicate the SNPs previously identified in GWAS of adults, which could be attributable, for example, to insufficient power to detect less common SNPs or SNPs with small effects, or to differences in linkage disequilibrium, age group structure or analytical methods across studies 105 . However, the greater impact of FTO variants in children as compared to adults is well known 106,107 , and therefore the association of the FTO variants with childhood MetS, not observed by GWAS of the adult population, implies the involvement of different SNPs at different ages.
In conclusion, the results from the present study, along with the comprehensive meta-analysis, advance etiologic understanding of childhood MetS, highlighting that the genetic predisposition to MetS is largely driven by genes of obesity and lipid metabolism. Future work on functional characterization will further help in understanding the biological underpinnings of long-term MetS regulation. Our observation of distinct associations of FTO and CETP variants with different component traits of MetS in children supports devising polygenic scores for MetS, which may prove to be fundamental for identifying children, and subsequently adults, in the high-risk group to allow earlier targeted interventions.

Data availability
The authors declare that the data supporting the findings of this study are available within the article and its Supplementary Information files.