Pathway analysis of body mass index genome-wide association study highlights risk pathways in cardiovascular disease

Cardiovascular disease (CVD) is a class of diseases that involve the heart or blood vessels. Body mass index (BMI) is reported to be a risk factor for CVD. Genome-wide association studies (GWAS) have recently provided rapid insights into the genetics of CVD and its risk factors. However, the specific mechanisms by which BMI influences CVD risk are largely unknown. We hypothesized that BMI may influence CVD risk through shared genetic pathways. To test this hypothesis, we conducted a pathway analysis of a BMI GWAS, which examined 329,091 single nucleotide polymorphisms in 4763 samples. We identified 31 significant KEGG pathways. There is literature evidence supporting the involvement of GnRH signaling, vascular smooth muscle contraction, dilated cardiomyopathy, gap junction, Wnt signaling, calcium signaling and chemokine signaling in CVD. Collectively, our study supports the potential role of CVD risk pathways in BMI: BMI may influence CVD risk through shared genetic pathways. We believe that our results may advance our understanding of BMI mechanisms in CVD.

Scientific Reports | 5:13025 | DOI: 10.1038/srep13025
Overweight and obese adults had higher levels of each biomarker than normal-weight individuals [4].
Much effort has been put into identifying the genetic determinants of CVD. Genome-wide association studies (GWAS) have recently provided rapid insights into the genetics of CVD and its risk factors [5][6][7][8][9][10][11]. However, these newly identified susceptibility loci exert very small risk effects and cannot fully explain the underlying genetic risk; a large proportion of heritability has yet to be explained. Pathway analyses of GWAS have recently been applied to the investigation of human disease pathogenesis and have yielded important new insights into the genetic mechanisms of complex human diseases, such as Alzheimer's disease [12,13] and rheumatoid arthritis [14]. Until now, the specific mechanisms by which BMI influences CVD risk have remained largely unknown. We hypothesized that BMI may influence CVD risk through shared genetic pathways. To test this hypothesis, we conducted a pathway analysis of a BMI GWAS, which examined 329,091 single nucleotide polymorphisms (SNPs) in 4763 samples.

Materials and Methods
Study Population. The study subjects were members of the Northern Finnish Birth Cohort of 1966 (NFBC1966) [15]. Mothers expected to give birth in the two northern provinces of Oulu and Lapland in 1966 were enrolled in the NFBC1966 (N = 12058 live births, Rantakallio 1969) [15]. Primary clinical data collection on the parents and the child occurred prenatally and at birth. Data collection on the child continued at ages six months, one year, 14 years (no data from one year or 14 years are included in this paper) and 31 years, with assessment of a wide range of trait measures [15]. Informed consent was obtained from all study subjects using protocols approved by the Ethical Committee of the Northern Ostrobothnia Hospital District. The methods were carried out in accordance with the approved guidelines. Participants provided fasting blood samples for evaluation of the metabolic measures [15]. In total, 4763 samples were genotyped using the Illumina Infinium 370cnvDuo array [15]. According to the exclusion criteria and quality control procedures, SNPs were included in subsequent analysis if the call rate in the final sample was > 95%, if the P value from a test of Hardy-Weinberg Equilibrium (HWE) was > 0.0001, and if the Minor Allele Frequency (MAF) was > 1% [15]. In the end, 329,091 SNPs passed quality control and were selected for subsequent analysis. For each SNP, the genotype was coded as 0, 1 or 2 copies of the minor allele. A regression analysis in PLINK was used to test the association between each SNP and BMI [15].
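The SNP-level quality-control thresholds described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the pipeline used in the original study: the function name `snp_qc` and its count-based interface are hypothetical, and the HWE P value is computed with a 1-degree-of-freedom chi-square approximation, which is an assumption because the text does not state which HWE test was applied.

```python
import math


def snp_qc(n_aa, n_ab, n_bb, n_missing,
           min_call_rate=0.95, min_hwe_p=1e-4, min_maf=0.01):
    """Apply the three QC filters to one SNP given its genotype counts.

    n_aa, n_ab, n_bb: counts of the three genotypes among called samples.
    n_missing: number of samples with no genotype call.
    Returns (passes, stats). Hypothetical helper, for illustration only.
    """
    n_called = n_aa + n_ab + n_bb
    n_total = n_called + n_missing
    if n_called == 0:
        return False, {"call_rate": 0.0, "maf": 0.0, "hwe_p": float("nan")}

    call_rate = n_called / n_total

    # Frequency of allele B; the MAF is the smaller of the two frequencies.
    freq_b = (2 * n_bb + n_ab) / (2 * n_called)
    freq_a = 1.0 - freq_b
    maf = min(freq_a, freq_b)

    # Expected genotype counts under Hardy-Weinberg equilibrium.
    expected = (freq_a ** 2 * n_called,
                2 * freq_a * freq_b * n_called,
                freq_b ** 2 * n_called)
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)

    # Upper tail of a chi-square distribution with 1 degree of freedom
    # (assumed approximation; an exact test could be used instead).
    hwe_p = math.erfc(math.sqrt(chi2 / 2))

    passes = (call_rate > min_call_rate
              and hwe_p > min_hwe_p
              and maf > min_maf)
    return passes, {"call_rate": call_rate, "maf": maf, "hwe_p": hwe_p}
```

For example, `snp_qc(490, 420, 90, 0)` describes a fully called SNP whose genotype counts match HWE at an allele frequency of 0.3, so it passes all three filters, whereas `snp_qc(0, 1000, 0, 0)` (all heterozygotes) fails the HWE filter.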
Gene-based testing for GWAS dataset. We obtained the summary results of the SNP-based tests from the original study [15] and performed gene-based testing on this GWAS dataset. ProxyGeneLD was used to assign SNPs to specific genes [16]. ProxyGeneLD begins by retrieving linkage disequilibrium (LD) structures from the HapMap genotyping data [16]. If a group of markers is in high LD in HapMap (r² > 0.8), the markers are tied into a 'proxy cluster' and treated as a single signal [17]. Next, each marker in the BMI GWAS with statistically significant evidence of association is evaluated to see (a) whether it belongs to any proxy cluster and (b) whether the marker itself or any marker in the cluster is located in a genic region [17]. If a marker or cluster overlaps a region extending across a gene, it is assigned as showing possible association with that gene. Finally, a P value is given for each gene [16]. The P value was adjusted for LD patterns in the human genome and for gene length, but was not corrected for multiple hypothesis testing. Genes with adjusted P < 0.05 were considered significant. For more detailed algorithms, please refer to the original study and our previous publications [16].
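The clustering and gene-assignment steps described above can be sketched as follows. This is a simplified illustration and not ProxyGeneLD's actual implementation: the function names, the input structures, the single-chromosome coordinates and the transitive (single-linkage) merging of markers are all assumptions, and the gene-level P value adjustment is not reproduced.

```python
def build_proxy_clusters(markers, ld_pairs, r2_threshold=0.8):
    """Group markers whose pairwise r^2 exceeds the threshold.

    markers: iterable of marker IDs.
    ld_pairs: dict mapping (marker_a, marker_b) -> r^2.
    Markers are merged transitively with a union-find structure
    (an assumption; the real tool may define clusters differently).
    """
    parent = {m: m for m in markers}

    def find(m):
        # Walk to the cluster root, compressing the path as we go.
        while parent[m] != m:
            parent[m] = parent[parent[m]]
            m = parent[m]
        return m

    for (a, b), r2 in ld_pairs.items():
        if r2 > r2_threshold and a in parent and b in parent:
            parent[find(a)] = find(b)

    clusters = {}
    for m in parent:
        clusters.setdefault(find(m), set()).add(m)
    return list(clusters.values())


def assign_clusters_to_genes(clusters, positions, genes):
    """Assign a cluster to a gene if any of its markers lies in the gene.

    positions: dict marker -> base-pair position (one chromosome assumed).
    genes: dict gene name -> (start, end), inclusive.
    Returns dict gene name -> list of sorted marker lists.
    """
    hits = {}
    for cluster in clusters:
        for gene, (start, end) in genes.items():
            if any(start <= positions[m] <= end for m in cluster):
                hits.setdefault(gene, []).append(sorted(cluster))
    return hits
```

Treating each cluster as one signal means that several correlated SNPs in a long gene are not counted as independent evidence, which is the motivation for the LD adjustment described in the text.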

Pathway-based testing for GWAS dataset. The Kyoto Encyclopedia of Genes and Genomes (KEGG) pathways in WebGestalt were used [18]. For a given pathway, the hypergeometric test was used to detect overrepresentation of BMI-related genes among all of the genes in the pathway [18]. The false discovery rate (FDR) method was used to correct for multiple testing. Any pathway with an adjusted P < 0.01 and at least five BMI genes was considered significant. To reduce the multiple-testing burden and to avoid testing overly narrow or broad pathways, we selected pathways that contained at least 20 and at most 300 genes for subsequent analysis.
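The over-representation test and the selection criteria described above can be sketched in plain Python. This is an illustration under stated assumptions rather than WebGestalt's code: the function names are hypothetical, the FDR adjustment is assumed to be the Benjamini-Hochberg procedure (the text says only "FDR method"), and `background_size` stands for the number of genes in the reference set, which the text does not specify.

```python
from math import comb


def hypergeom_upper_tail(k, K, n, N):
    """P(X >= k) for X ~ Hypergeometric: N genes in the background,
    K of them in the pathway, n significant (BMI) genes drawn."""
    total = comb(N, n)
    upper = min(K, n)
    favorable = sum(comb(K, i) * comb(N - K, n - i)
                    for i in range(k, upper + 1))
    return favorable / total


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted P values, in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def significant_pathways(pathways, bmi_genes, background_size,
                         alpha=0.01, min_hits=5, min_size=20, max_size=300):
    """Return (name, n_hits, adjusted_p) for pathways meeting the criteria.

    pathways: dict pathway name -> iterable of gene symbols.
    All genes are assumed to belong to the background set.
    """
    bmi_genes = set(bmi_genes)
    tested = []
    for name, genes in pathways.items():
        genes = set(genes)
        if not (min_size <= len(genes) <= max_size):
            continue  # skip overly narrow or broad pathways
        hits = len(genes & bmi_genes)
        p = hypergeom_upper_tail(hits, len(genes), len(bmi_genes),
                                 background_size)
        tested.append((name, hits, p))
    adjusted = benjamini_hochberg([p for _, _, p in tested])
    return [(name, hits, q)
            for (name, hits, _), q in zip(tested, adjusted)
            if q < alpha and hits >= min_hits]
```

Filtering by pathway size before the test, as in this sketch, reduces the number of hypotheses entering the FDR correction; applying the size filter afterwards would give different adjusted P values.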

Results
Gene-based test for GWAS dataset. We identified 1008 significant BMI genes, including 41 genes with P < 0.001. TFAP2B was the most significant gene and has been reported to be significantly associated with BMI in previous studies [19][20][21]. We also identified new BMI susceptibility genes, which were significantly associated with BMI with P < 0.001 (Table 1).
Pathway-based analysis of GWAS dataset. We identified 31 significant KEGG pathways with at least five BMI genes. Based on the KEGG pathway classification, these 31 pathways fall mainly into environmental information processing (n = 6), cellular processes (n = 5), circulatory system cellular processes (n = 5), metabolism (n = 5), endocrine system (n = 2), genetic information processing (n = 2), immune system (n = 2), nervous system and diseases (n = 2), and cardiovascular diseases (n = 1) (Table 2). The genes in these significant pathways are listed in Supplementary 21.
On the pathway level, we identified 31 significant KEGG pathways. Some of these pathways have been reported to be associated with CVD. The GnRH signaling pathway (hsa04912) was the second most significant pathway. Sitras et al. conducted a gene expression profile analysis of CVD [22]; their gene set enrichment analysis showed a significant association between the GnRH signaling pathway and CVD [22].
Vascular smooth muscle contraction (hsa04270) was the fifth most significant pathway in our analysis. The vascular smooth muscle cell is a highly specialized cell whose principal function is contraction. By contracting, vascular smooth muscle cells shorten and decrease the diameter of a blood vessel to regulate blood flow and pressure. Evidence shows that abnormal contraction of vascular smooth muscle is a major cause of vasospasm of the coronary and cerebral arteries [23].
Here, we also highlighted the involvement of a CVD-related pathway in BMI. There are four CVD pathways in the KEGG database: viral myocarditis (hsa05416), dilated cardiomyopathy (DCM) (hsa05414), hypertrophic cardiomyopathy (HCM) (hsa05410) and arrhythmogenic right ventricular cardiomyopathy (ARVC) (hsa05412). DCM is characterized by left ventricular dilation that is associated with systolic dysfunction [24]. In our analysis, the DCM pathway was significantly associated with BMI (P = 3.30E-03). There is also literature evidence supporting the involvement of gap junction, Wnt signaling, calcium signaling and chemokine signaling in CVD. More detailed information is given in Table 3.
Various software tools are now available for pathway analysis of GWAS data [25]. Some tools, including SNP ratio test [26], GenGen [27], GRASS [28], and PLINK set-test [29], accept raw genotype datasets as input. Other tools, including ProxyGeneLD [16], ALIGATOR [25], i-GSEA4GWAS [25], and GESBAP [25], accept summary results as input for pathway analysis. Here, we selected ProxyGeneLD for the gene-based test because we did not have access to the raw BMI genotype data. This software adjusts for gene length and LD patterns in the human genome, which can reduce sources of bias and increase the reliability of pathway analysis. We selected the KEGG database rather than the GO database for pathway analysis. The KEGG database is reported to be manually compiled based on biological evidence and does not have a hierarchical structure [30,31], whereas the GO database is based on computer predictions and human annotation and has a hierarchical structure [30,31]. GO analysis typically assumes that each functional category is independent, and less than 1% of the GO annotations have been confirmed experimentally [30,31].
Despite these interesting results, we recognize some limitations in our study. Multiple testing correction may not be sufficient to account for all biases in pathway analysis. Ideally, the results from the BMI GWAS would be adjusted using a permutation test. However, the original individual-level SNP genotype data are not currently available to us. Once we obtain the SNP genotype data, we will perform a further pathway analysis using available software such as SNP ratio test [26], GenGen [27], GRASS [28], and PLINK set-test [29]. These methods analyze SNP genotype data directly and can conduct a permutation test. Future studies using genotype data are required to replicate our findings.
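As a rough illustration of the kind of permutation test referred to above, the following Python sketch computes an empirical P value by shuffling phenotype values across individuals. It is not the procedure of any of the cited tools: the function names, the covariance-based statistic and the single-SNP scope are assumptions made for brevity, and a pathway-level test would recompute the gene and pathway statistics within every permutation.

```python
import random


def abs_covariance(genotypes, phenotypes):
    """Absolute covariance between minor-allele dosage (0/1/2) and phenotype.
    A simple stand-in for an association statistic (assumption)."""
    n = len(genotypes)
    mean_g = sum(genotypes) / n
    mean_y = sum(phenotypes) / n
    cov = sum((g - mean_g) * (y - mean_y)
              for g, y in zip(genotypes, phenotypes)) / n
    return abs(cov)


def permutation_p_value(genotypes, phenotypes, statistic=abs_covariance,
                        n_permutations=999, seed=0):
    """Empirical P value from phenotype permutation.

    Shuffling phenotypes breaks any genotype-phenotype association while
    preserving the genotype (and hence LD) structure.
    """
    rng = random.Random(seed)
    observed = statistic(genotypes, phenotypes)
    shuffled = list(phenotypes)
    n_extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(shuffled)
        if statistic(genotypes, shuffled) >= observed:
            n_extreme += 1
    # Add one to numerator and denominator so the estimate is never zero.
    return (n_extreme + 1) / (n_permutations + 1)
```

Because the genotypes are left untouched, this style of test accounts for correlation between SNPs without an explicit LD model, which is why it requires individual-level data that summary statistics cannot provide.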
Collectively, our study supports the potential role of CVD risk pathways in BMI. BMI may influence CVD risk through shared genetic pathways. We believe that our results advance our understanding of BMI mechanisms in CVD and will be very informative for future genetic studies of BMI and CVD.
Table 3. Literature evidence supporting pathways associated with CVD.