Introduction

Cardiovascular disease (CVD) is a class of diseases that involve the heart or blood vessels. Body mass index (BMI) has been reported to be a risk factor for CVD1. To evaluate the association between BMI and CVD mortality, Chen et al. conducted a pooled analysis of 20 prospective cohorts in Asia, including data from 835,082 East Asians and 289,815 South Asians1. The results showed that a high BMI is a risk factor for mortality from overall CVD and from specific diseases, including coronary heart disease, ischaemic stroke and haemorrhagic stroke, in East Asians1. In the United States, all major Hispanic/Latino groups have a high prevalence of obesity. Kaplan et al. analyzed CVD and BMI data from US Hispanic men (N = 6547) and women (N = 9797)2. They found that elevated BMI is common in Hispanic/Latino adults and is associated with a considerable excess of CVD risk factors2.

Lamon-Fava et al. studied the distribution of BMI in men and women and the association of BMI with known coronary heart disease (CHD) risk factors3. Their results indicate that increased BMI is associated with an adverse effect on all major CHD risk factors3. Loprinzi et al. examined the associations of underweight, overweight and obesity with various CVD risk factors4. Their results showed that underweight adults had significantly lower levels of C-reactive protein (CRP), total cholesterol, total cholesterol to high-density lipoprotein cholesterol ratio, low-density lipoprotein (LDL) cholesterol and triglycerides than normal-weight individuals4. Overweight and obese adults had higher levels of each biomarker than normal-weight individuals4.

Much effort has been put into identifying the genetic determinants of CVD. Genome-wide association studies (GWAS) have recently provided rapid insights into the genetics of CVD and its risk factors5,6,7,8,9,10,11. However, these newly identified susceptibility loci exert very small risk effects and cannot fully explain the underlying genetic risk. A large proportion of heritability has yet to be explained. Pathway analyses of GWAS have recently been applied to the investigation of human disease pathogenesis and have yielded important new insights into the genetic mechanisms of complex human diseases, such as Alzheimer’s disease12,13 and rheumatoid arthritis14. Until now, the specific mechanisms by which BMI influences CVD risk have remained largely unknown. We hypothesized that BMI may influence CVD risk through shared genetic pathways. To test this hypothesis, we conducted a pathway analysis of a BMI GWAS, which examined 329,091 single nucleotide polymorphisms (SNPs) from 4763 samples.

Materials and Methods

Study Population

The study subjects were members of the Northern Finnish Birth Cohort of 1966 (NFBC1966)15. Mothers expected to give birth in the two northern provinces of Oulu and Lapland in 1966 were enrolled in the NFBC1966 (N = 12058 live births, Rantakallio 1969)15. Primary clinical data collection on parents and the child occurred prenatally and at birth. Data collection on the child continued at ages six months, one year, 14 years and 31 years (no data from one year or 14 years are included in this paper), with assessment of a wide range of trait measures15. Informed consent from all study subjects was obtained using protocols approved by the Ethical Committee of the Northern Ostrobothnia Hospital District. The methods were carried out in accordance with the approved guidelines. Participants provided fasting blood samples for evaluation of the metabolic measures15. A total of 4763 samples were genotyped using the Illumina Infinium 370cnvDuo array15. According to the exclusion criteria and quality control procedures, SNPs were included in the following analysis if the call rate in the final sample was >95%, if the P value from a test of Hardy-Weinberg Equilibrium (HWE) was >0.0001 and if the Minor Allele Frequency (MAF) was >1%15. In the end, 329,091 SNPs passed quality control and were selected for the following analysis. For each SNP, the genotype was coded as 0, 1 or 2 copies of the minor allele. A regression analysis in PLINK was used to test the association between each SNP and BMI15.
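
The inclusion thresholds and the additive genotype coding described above can be illustrated with a minimal Python sketch. It is not the PLINK procedure used in the original study: the `SnpQc` container, the function names and the toy values are hypothetical, and the regression is assumed here to be a simple least-squares fit of BMI on minor-allele count with no covariates.

```python
from dataclasses import dataclass


@dataclass
class SnpQc:
    """Per-SNP quality-control summary (hypothetical container)."""
    snp_id: str
    call_rate: float  # fraction of samples with a genotype call, 0..1
    hwe_p: float      # P value from the Hardy-Weinberg equilibrium test
    maf: float        # minor allele frequency, 0..0.5


def passes_qc(snp, min_call_rate=0.95, min_hwe_p=1e-4, min_maf=0.01):
    """Apply the three inclusion thresholds quoted in the text."""
    return (snp.call_rate > min_call_rate
            and snp.hwe_p > min_hwe_p
            and snp.maf > min_maf)


def additive_slope(minor_allele_counts, bmi):
    """Least-squares slope of BMI on minor-allele count (0, 1 or 2).

    Assumption: a single-predictor linear model without covariates.
    """
    n = len(minor_allele_counts)
    mean_g = sum(minor_allele_counts) / n
    mean_y = sum(bmi) / n
    sxx = sum((g - mean_g) ** 2 for g in minor_allele_counts)
    sxy = sum((g - mean_g) * (y - mean_y)
              for g, y in zip(minor_allele_counts, bmi))
    return sxy / sxx


# Toy data only: identifiers and values are invented for illustration.
snps = [
    SnpQc("snp_a", call_rate=0.99, hwe_p=0.30, maf=0.20),
    SnpQc("snp_b", call_rate=0.93, hwe_p=0.50, maf=0.25),   # low call rate
    SnpQc("snp_c", call_rate=0.99, hwe_p=1e-6, maf=0.10),   # fails HWE
    SnpQc("snp_d", call_rate=0.98, hwe_p=0.80, maf=0.005),  # MAF too low
]
kept = [s.snp_id for s in snps if passes_qc(s)]
slope = additive_slope([0, 1, 2, 1, 0], [20.0, 22.0, 24.0, 22.0, 20.0])
```

In this toy example only `snp_a` satisfies all three thresholds, and the slope is the estimated change in BMI per additional copy of the minor allele.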

Gene-based testing for GWAS dataset

Here, we obtained the summary results of the SNP-based test from the original study15 and performed gene-based testing of the GWAS dataset. ProxyGeneLD was used to assign SNPs to specific genes16. ProxyGeneLD begins with the retrieval of linkage disequilibrium (LD) structures in the HapMap genotyping data16. If a group of markers is in high LD in HapMap (r² > 0.8), they are tied to a ‘proxy cluster’ and taken as a single signal17. Next, each marker in the BMI GWAS with statistically significant evidence of association is evaluated to see whether (a) it belongs to any proxy cluster and (b) the marker itself or any marker in the cluster is located in a genetic region17. If a marker or cluster overlaps a region extending across a gene, then it is assigned as showing possible association with that gene. Finally, a P value is given for each gene16. The P value is adjusted for the LD patterns in the human genome and for gene length, but is not corrected for multiple hypothesis testing. Genes with adjusted P < 0.05 were considered significant. For more detailed algorithms, please refer to the original study and our previous publications16.
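
The clustering and gene-assignment steps can be sketched as follows. This is an illustration of the idea only, not ProxyGeneLD's code: all function names, marker identifiers and coordinates are hypothetical. Markers are assumed to be merged transitively (single linkage) whenever a pairwise r² exceeds 0.8, which may differ from the tool's actual rule. The adjustment of the gene-level P value for LD and gene length is not reproduced.

```python
def build_proxy_clusters(markers, ld_pairs, r2_threshold=0.8):
    """Group markers linked by pairwise r2 above the threshold.

    ld_pairs is an iterable of (marker_a, marker_b, r2) tuples.
    Assumption: clusters are formed transitively with union-find.
    """
    parent = {m: m for m in markers}

    def find(m):
        while parent[m] != m:
            parent[m] = parent[parent[m]]  # path compression
            m = parent[m]
        return m

    for a, b, r2 in ld_pairs:
        if r2 > r2_threshold and a in parent and b in parent:
            parent[find(a)] = find(b)

    clusters = {}
    for m in markers:
        clusters.setdefault(find(m), set()).add(m)
    return list(clusters.values())


def genes_for_cluster(cluster, positions, genes):
    """Return genes whose region contains any marker of the cluster.

    positions: marker -> (chromosome, base-pair position)
    genes:     gene name -> (chromosome, start, end)
    """
    hits = set()
    for marker in cluster:
        chrom, pos = positions[marker]
        for name, (g_chrom, start, end) in genes.items():
            if chrom == g_chrom and start <= pos <= end:
                hits.add(name)
    return hits


# Toy data only: identifiers and coordinates are invented.
markers = ["rsA", "rsB", "rsC", "rsD"]
ld_pairs = [("rsA", "rsB", 0.95), ("rsB", "rsC", 0.50), ("rsC", "rsD", 0.85)]
positions = {"rsA": ("6", 100), "rsB": ("6", 250),
             "rsC": ("6", 900), "rsD": ("6", 1200)}
genes = {"GENE1": ("6", 200, 300), "GENE2": ("7", 800, 1000)}

clusters = build_proxy_clusters(markers, ld_pairs)
assignment = {tuple(sorted(c)): genes_for_cluster(c, positions, genes)
              for c in clusters}
```

Here `rsA` lies outside `GENE1` but is still assigned to it, because its proxy `rsB` falls inside the gene region. That is the point of treating a cluster as a single signal.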

Pathway-based testing for GWAS dataset

The Kyoto Encyclopedia of Genes and Genomes (KEGG) pathways in WebGestalt were used18. For a given pathway, the hypergeometric test was used to detect the overrepresentation of BMI-related genes among all of the genes in the pathway18. The false discovery rate (FDR) method was used to correct for multiple testing. Any pathway with an adjusted P < 0.01 and at least five BMI genes was considered significant. In order to reduce the multiple-testing issue and to avoid testing overly narrow or broad pathways, we selected pathways that contained at least 20 and at most 300 genes for subsequent analysis.
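
As a worked illustration of the test described above, the following sketch computes the hypergeometric overrepresentation P value, applies a Benjamini-Hochberg adjustment and checks the selection criteria. It is not WebGestalt's implementation. The function names and toy numbers are hypothetical, the choice of background gene set is left to the caller, and Benjamini-Hochberg is assumed as the FDR procedure because the text does not name one.

```python
from math import comb


def hypergeom_enrichment_p(n_background, n_significant, n_pathway, n_overlap):
    """Upper-tail hypergeometric P value, P(X >= n_overlap).

    X is the number of significant genes among n_pathway genes drawn
    without replacement from n_background genes, of which
    n_significant are significant.
    """
    upper = min(n_pathway, n_significant)
    total = comb(n_background, n_pathway)
    tail = sum(comb(n_significant, k)
               * comb(n_background - n_significant, n_pathway - k)
               for k in range(n_overlap, upper + 1))
    return tail / total


def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted P values, in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def is_significant_pathway(size, overlap, adjusted_p,
                           min_size=20, max_size=300,
                           min_overlap=5, alpha=0.01):
    """Selection criteria stated in the text."""
    return (min_size <= size <= max_size
            and overlap >= min_overlap
            and adjusted_p < alpha)


# Toy numbers only, chosen so the result can be checked by hand.
p_small = hypergeom_enrichment_p(10, 4, 5, 3)   # equals 66 / 252
adjusted = benjamini_hochberg([0.01, 0.04, 0.03, 0.005])
```

Exact integer binomial coefficients keep the tail sum free of rounding error until the final division, which is convenient for small examples but slower than a dedicated statistics library for genome-scale inputs.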

Results

Gene-based test for GWAS dataset

We identified 1008 significant BMI genes, including 41 genes with P < 0.001. TFAP2B was the most significant gene, and it has been reported to be significantly associated with BMI in previous studies19,20,21. We also identified other new BMI susceptibility genes, which were associated with BMI at P < 0.001 (Table 1).

Table 1 The top 41 significant genes identified by gene-based analysis of BMI GWAS.

Pathway-based analysis of GWAS dataset

We identified 31 significant KEGG pathways with at least five BMI genes. Based on the classifications of the KEGG pathways, these 31 pathways can be mainly divided into environmental information processing (n = 6), cellular processes (n = 5), circulatory system cellular processes (n = 5), metabolism (n = 5), endocrine system (n = 2), genetic information processing (n = 2), immune system (n = 2), nervous system and diseases (n = 2), cardiovascular diseases (n = 1) (Table 2). The detailed genes in these significant pathways are described in Supplementary Table 1.

Table 2 The significant KEGG pathways with P < 0.01 by pathway analysis of BMI GWAS.

Discussion

To investigate how BMI influences CVD risk, we conducted a pathway analysis of a BMI GWAS using 329,091 SNPs in 4763 Europeans. At the gene level, we identified 41 BMI genes with P < 0.001. TFAP2B was the most significant signal. Previous studies support a significant association of TFAP2B with BMI19,20,21. Speliotes et al. analyzed 249,796 individuals and revealed 18 new loci associated with BMI; the TFAP2B rs987237 variant was significantly associated with BMI (P = 3.00E-20)19. Berndt et al. identified 11 new loci for anthropometric traits; the TFAP2B rs987237 variant was significantly associated with BMI (P = 2.00E-11)20. Wen et al. conducted a meta-analysis of GWAS in East Asian-ancestry populations and reported that the TFAP2B rs9473924 variant is significantly associated with BMI (P = 4.00E-07)21.

At the pathway level, we identified 31 significant KEGG pathways. Some of these pathways have been reported to be associated with CVD. Here, the GnRH signaling pathway (hsa04912) was the second most significant pathway. Sitras et al. conducted a gene expression profile analysis of CVD22. Gene set enrichment analysis showed a significant association between the GnRH signaling pathway and CVD22.

Vascular smooth muscle contraction (hsa04270) was the fifth most significant signal in our research. The vascular smooth muscle cell is a highly specialized cell whose principal function is contraction. By contracting, vascular smooth muscle cells shorten and decrease the diameter of a blood vessel to regulate blood flow and pressure. Evidence shows that abnormal contraction of vascular smooth muscle is a major cause of vasospasm of the coronary and cerebral arteries23.

Here, we also highlight the involvement of CVD-related pathways in BMI. There are four CVD pathways in the KEGG database: viral myocarditis (hsa05416), dilated cardiomyopathy (DCM) (hsa05414), hypertrophic cardiomyopathy (HCM) (hsa05410) and arrhythmogenic right ventricular cardiomyopathy (ARVC) (hsa05412). DCM is characterized by left ventricular dilation that is associated with systolic dysfunction24. In our research, we identified the DCM pathway to be significantly associated with BMI (P = 3.30E-03).

There is also some literature evidence supporting the involvement of Gap junction, Wnt signaling, Calcium signaling and Chemokine signaling in CVD. More detailed information is described in Table 3.

Table 3 Literature evidence supporting pathways associated with CVD.

A variety of software tools are now available for pathway analysis of GWAS data25. Some tools, including SNP ratio test26, GenGen27, GRASS28 and PLINK set-test29, accept raw genotype datasets as input. Other tools, including ProxyGeneLD16, ALIGATOR25, i-GSEA4GWAS25 and GESBAP25, accept summary results for subsequent pathway analysis. Here, we selected ProxyGeneLD for the gene-based test because we did not have access to the raw genotype data. This software adjusts for gene length and LD patterns in the human genome, which can reduce sources of bias and increase the reliability of pathway analysis. We selected the KEGG database rather than the GO database for pathway analysis. The KEGG database is reported to be manually compiled based on biological evidence and does not have a hierarchical structure30,31, whereas the GO database is based on computer predictions and human annotation and has a hierarchical structure30,31. GO analysis typically assumes that each functional category is independent, and less than 1% of the GO annotations have been confirmed experimentally30,31.

Despite these interesting results, we recognize some limitations in our study. Multiple testing corrections may not be sufficient to account for all biases in pathway analysis. The results from the BMI GWAS should be adjusted using a permutation test. However, the original SNP genotype data for each individual are not currently available to us. Once we obtain the SNP genotype data, we will perform a further pathway analysis using available software such as SNP ratio test26, GenGen27, GRASS28 and PLINK set-test29. These methods can analyze SNP genotype data and conduct a permutation test. Future studies using genotype data are required to replicate our findings.
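
For readers unfamiliar with permutation testing, the general principle can be sketched in a few lines. This is a generic single-SNP illustration and not the procedure of any tool named above: the function names, the test statistic (absolute genotype-phenotype covariance) and the toy data are all assumptions chosen for brevity.

```python
import random


def abs_covariance(x, y):
    """Absolute sample covariance, used here as a simple test statistic."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    return abs(sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / n)


def permutation_p(genotypes, phenotype, n_perm=1000, seed=0):
    """Empirical P value from shuffling phenotype labels.

    Shuffling breaks any genotype-phenotype link while preserving both
    marginal distributions; the P value is the share of shuffles whose
    statistic is at least as large as the observed one (with +1
    smoothing so that it is never exactly zero).
    """
    rng = random.Random(seed)
    observed = abs_covariance(genotypes, phenotype)
    shuffled = list(phenotype)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if abs_covariance(genotypes, shuffled) >= observed:
            hits += 1
    return (1 + hits) / (1 + n_perm)


# Toy data only: a perfectly separated trait and a constant trait.
genotypes = [0] * 10 + [2] * 10
p_strong = permutation_p(genotypes, [20.0] * 10 + [30.0] * 10)
p_null = permutation_p(genotypes, [22.0] * 20)
```

A pathway-level version would permute phenotypes in the same way but recompute the full SNP, gene and pathway statistics in each round, which is why it requires individual-level genotype data.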

Collectively, our study supports a potential role of CVD risk pathways in BMI. BMI may influence CVD risk through shared genetic pathways. We believe that our results advance the understanding of BMI mechanisms in CVD and will be informative for future genetic studies of BMI and CVD.

Additional Information

How to cite this article: Zhao, X. et al. Pathway analysis of body mass index genome-wide association study highlights risk pathways in cardiovascular disease. Sci. Rep. 5, 13025; doi: 10.1038/srep13025 (2015).