Novel Common Variants Associated with Obesity and Type 2 Diabetes Detected Using a cFDR Method

Genome-wide association studies (GWASs) have been performed extensively in diverse populations to identify single nucleotide polymorphisms (SNPs) associated with complex diseases or traits. However, to date, the SNPs identified fail to explain a large proportion of the variance of the traits/diseases. GWASs on type 2 diabetes (T2D) and obesity are generally focused on individual traits independently, and genetic intercommunity (common genetic contributions or the product of over correlated phenotypic world) between them are largely unknown, despite extensive data showing that these two phenotypes share both genetic and environmental risk factors. Here, we applied a recently developed genetic pleiotropic conditional false discovery rate (cFDR) approach to discover novel loci associated with BMI and T2D by incorporating the summary statistics from existing GWASs of these two traits. Conditional Q-Q and fold enrichment plots were used to visually demonstrate the strength of pleiotropic enrichment. Adopting a cFDR nominal significance level of 0.05, 287 loci were identified for BMI and 75 loci for T2D, 23 of which for both traits. By incorporating related traits into a conditional analysis framework, we observed significant pleiotropic enrichment between obesity and T2D. These findings may provide novel insights into the etiology of obesity and T2D, individually and jointly.


Results
Assessment of pleiotropic enrichment. The conditional Q-Q plot for BMI conditional on T2D (Upper Panel (left) in Fig. 1) showed some enrichment across varying significance thresholds for T2D. The presence of leftward shift when restricting the analysis to include the SNPs that have more significant associations with BMI indicates an increase in the number of true associations for a given T2D p-value. Similar enrichment is observed for T2D given BMI (Upper Panel (right) in Fig. 1), as there appears to be a similar departure pattern between the different curves. These leftward deflections from the null line indicate a greater proportion of true associations for any given BMI nominal p-value.
Based on the fold-enrichment plot (Lower Panel of Fig. 1), we observed SNP enrichment for BMI across different levels of significance with T2D and vice versa. For progressively stringent p-value thresholds for BMI SNPs, we observed about a 50-fold increase in the proportion of SNPs reaching the genome wide significance level of -log10 (p) > 7.3 when comparing the subset with the most stringent conditional association to the group with all SNPs. A 50-fold increase was also observed for T2D.
As negative controls, conditional Q-Q plots for BMI given nominal p-values of association with attention-deficit/hyperactivity disorder (ADHD) (Upper Panel (left) in Figure S1) and major depressive disorder (MDD) (Lower Panel (left) in Figure S1), and T2D conditional on ADHD (Upper Panel (left) in Figure S2) and MDD (Lower Panel (left) in Figure S2) all showed no enrichment and vice versa.

BMI loci identified with cFDR.
Conditional on their association with T2D, we identified 287 significant SNPs (cFDR < 0.05) for BMI variation ( Fig. 2 and Table S2), which were mapped to 21 different chromosomes  and annotated to 323 genes. In the original meta-analysis for BMI GWAS 31 , 105 SNPs had p-values smaller than 1 × 10 −5 while 36 of them reached the standard genome-wide significance of 5 × 10 −8 . We confirmed 43 SNPs that were reported in the original BMI GWAS analysis 31 and previous BMI related GWASs 32 . Another 40 SNPs that were reported to be associated with BMI-related traits were also confirmed in our analysis [32][33][34] . The rest of the 204 SNPs were not previously reported in the original BMI GWAS 31 or any other previous obesity studies. However, 26 of these 204 SNPs are in high linkage disequilibrium (LD) (r2 > 0.6) with other BMI-associated SNPs reported previously (Table S3). Among the 323 genes these 287 SNPs were annotated to, 146 of these genes were newly detected compared to the original BMI GWAS 31 and previous obesity-related studies (Table S2). Among all the 287 detected loci for BMI, most of the genes were enriched in BMI-related terms such as "positive regulation of cellular metabolic process", "positive regulation of metabolic process" and "regulation of protein metabolic process". GO term enrichment analysis results were detailed in Table 1. T2D gene loci identified with cFDR. We identified 75 SNPs significantly (cFDR < 0.05) associated with T2D given their association with BMI ( Fig. 3 and Table S4), which were located on 20 chromosomes (1-20) and annotated to 89 genes. In the original meta-analysis for T2D GWAS 30 , 38 SNPs had p-values smaller than 1 × 10 −5 while 12 of them reached the standard genome-wide significance of 5 × 10 −8 . We confirmed 17 SNPs that were reported in the original T2D GWAS analysis 30 or previous T2D related GWASs 29,35 . Another 18 SNPs that were  reported to be associated with T2D-related traits were also confirmed in our analysis 33,36 . The remaining 40 SNPs were not previously reported in the original T2D GWAS 30 or any other T2D studies, although nine of these SNPs showed high LD (r2 > 0.6) with the T2D-associated SNPs reported previously (Table S5). For the 89 genes these 75 SNPs were annotated to, 42 of these genes were novel and not identified by the original T2D GWAS 30 or previous T2D-related studies (Table S2). Of the detected loci for T2D, some of the genes were enriched in T2D-related terms such as "pancreas development", "response to glucose" and "endocrine pancreas development". GO term enrichment analysis were detailed in Table 1.

Pleiotropic gene loci for both BMI and T2D. The conjunction FDR analysis detected 23 independent
pleiotropic loci that were significantly (conjunction FDR < 0.05) associated with both traits ( Fig. 4 and Table 2). Of the 23 identified pleiotropic variants, one SNP rs9930506 (FTO) reached genome-wide significance in the original BMI and T2D GWASs 30,31 . The SNPs rs7141420 (NRXN3), rs1996023 (GNPDA2 and GABRG1), rs16945088 (FTO), rs9540493 (LOC10272396 and LINC01052) and rs4238585 (GPR139 and GP2) reached genome-wide significance in only the original BMI GWAS 31 . Six SNPs (rs10787472 (TCF7L2), rs2881654 (PPARG), rs849135 (JAZF1), rs4481184 (IGF2BP2), rs1783598 (FCHSD2) and rs12245680 (TCF7L2)) were reported to be significant for only T2D in the original 30 or previous T2D GWAS 29 . The two SNPs rs6795735 (ADAMTS9-AS2) and rs12454712 (BCL2) were previously reported to be associated with both obesity and T2D 29,37,38 . The other five SNPs (rs17584208, rs11979110, rs10898868, rs1996023 and rs4474658) were previously reported to be associated with high density lipoprotein (HDL) and proinsulin 34,39,40 . The final four SNPs were not previously reported in the original BMI and T2D GWASs or GWAS studies for any related traits. For the 30 genes the identified pleiotropic SNPs were annotated to, we found twelve of them (AKAP6, NPAS3, PSRC1, MYBPHL, MIR29A, GABRG1, ZNF664, FAM101A, LOC10272396, LINC01052, GPR139, and PUM1) were not identified by any BMI or T2D related GWASs. For the SNPs that were annotated to these 12 genes, two SNPs were located in the intronic regions of genes ZNF664 and PUM1 respectively, and the other five SNPs were all located in intergenic regions (Table 2). Of the detected 23 pleiotropic loci, most of the genes were enriched in BMI and T2D related terms such as "regulation of lipid storage", "regulation of fat cell proliferation", "fat cell differentiation", and "fatty acid transport". Detailed information of GO term analysis was given in Table 1.

Protein-protein interaction network.
The 323 identified BMI-associated genes were retrieved from the STRING database. Only 143 genes, including 46 novel genes, were annotated in this database. The 143 genes were clearly enriched in three clusters: TMEM18, PPARG and MAP2K5 ( Figure S3). Two novel genes MSRA and PDILT, respectively encoding methionine sulfoxide reductase A and protein disulfide isomerase-like, were directly connected with the TMEM18 cluster. Another two novel genes, MED23 and ANPC4, respectively encoding mediator complex subunit 23 and anaphase promoting complex subunit 4, were involved in the PPARG cluster. Another three novel genes, MEF2D, RASL11A and PTPN12, respectively encoding myocyte enhancer factor 2D, RAS-like, family 11, member A and protein tyrosine phosphatase, were involved in the MAP2K5 cluster.
( Figure S3). The 89 identified T2D-associated genes were retrieved from the STRING database. Only 37 genes, including 7 novel genes, were annotated in this database. The 37 genes were clearly enriched into three clusters: HNF4A, MTNR1B and TCF7L2 ( Figure S4). Three novel genes, ANXA11, BCL2L11 and NEUROG3, those respectively encoding Annexin A11, BCL2-like 11 and Neurogenin 3, were involved in the HNF4A cluster. Another two novel genes, NPBWR2 and PTHLH, encoding Neuropeptides B/W receptor 2 and Parathyroid hormone-like hormone, were involved in the MTNR1B cluster. The other novel gene MED30 was directly connected with TCF7L2 cluster ( Figure S4)

Discussion
In our study, two GWASs with summary statistic p values were combined to explore the pleiotropic enrichment of SNPs that are associated with BMI and T2D. Compared to the conventional standard single phenotype GWASs, simultaneously analyzing multiple related traits allows for the increased discovery of trait-associated variants without requiring additional larger datasets for each individual trait. By leveraging the power of two different GWAS datasets from BMI and T2D, we discovered 287 loci for BMI and 75 loci for T2D. Using the standard GWAS significance threshold in the datasets, only 36 31 were significant for BMI. Most of the genes have not been reported to show borderline significance with BMI, as detailed in Table S2. Adopting the genetic pleiotropy-informed cFDR method, we found 12 additional novel loci associated with both BMI and T2D. These novel findings may enable us to further dissect the overlapping genetic mechanisms between these two related phenotypes. The improved detection of novel susceptibility loci with genetic pleiotropy may lead us to a better understanding of common etiology between disorders and have a significant impact on the clinical treatment and prevention of related complex human diseases.
The cFDR approach was adopted here to account for some of the missing heritability between traits or diseases. This method employs the idea that a variant with significant effects in two associated phenotypes is more likely to be a true effect, and therefore has a higher probability of being detected in multiple independent studies 4,7 . This technique allows for an increase in effective sample size and therefore a subsequent increase in power to detect true associations for more variants with small to moderate effect sizes, which are often ignored in the standard single phenotype GWAS. In addition, the genetic enrichment presented in conditional Q-Q plots conveys that the decreased cFDR value for a given nominal p value greatly increases power to detect true association effects. When initially implementing the cFDR method, Andreassen et al. 7 demonstrated one advantage of this model-free empirical cdf approach is for the avoidance of bias in cFDR estimates from model misspecification. Through a comparison of traditional unconditional FDR and cFDR methods, they found that the latter resulted in an increase of 15-20 times the number of SNPs under the same FDR threshold of 0.05 7 .
Our cFDR analysis identified 23 pleiotropic signals annotated to 30 genes, providing evidence for the close relationship and shared genetic mechanisms between these two traits. These findings are consistent with the evidence from previous studies 25 that have demonstrated a causal relationship between these two traits. The genes FTO, MC4R and TCF7L2 were frequently reported and replicated in previous BMI and T2D related studies 26,30,41 . However, potential confounding factors and biases might coincidently be responsible for some of these associations. For the genes FTO and MC4R, their respective effects on T2D were found to be modest and previous studies showed that their effects on T2D disappeared after adjustment for BMI 42 . In European populations, TCF7L2 was not reported as a risk factor for obesity although its effect on T2D risk is modulated by obesity because of the interaction between TCF7L2 polymorphisms (rs7903146) and BMI status 43 . The implementation of the cFDR method in our study not only furnishes another empirical validation for the cFDR method to successfully detect novel and known disease associated genetic variants, but also shows the possibility of improved discovery of novel susceptibility loci using existing GWAS summary statistics. There were 14 genes (JAZF1, IGF2BP2, NRXN3,  ADAMTS9-AS2, GIPR, FCHSD2, HHEX, EXOC6, KLF14, ARAP1, GNPDA2, GP2, C2CD4A and C2CD4B) that  Table 2. were associated with either BMI or T2D in previous studies but not with both that were detected as pleiotropic loci in this analysis. Furthermore, 12 novel genes are worth noting because no previous study has reported associations with either BMI or T2D for any of them. For the SNPs that were annotated to these 12 genes, two SNPs were located in the intronic regions of genes ZNF664 and PUM1 and the other five SNPs were all located in intergenic regions. As examples, we will discuss two of these genes ZNF664 and PUM1 for their potential functional relevance and significance.
The SNP rs825461 is located at the intronic region between gene ZNF664 and FAM101A. The ZNF664 gene encodes a protein named zinc finger protein 664, and one study reported that ZNF664 was involved in eye development and that the monogenic form may be associated with high risk of myopia 44 . Furthermore, ZNF664 was previously reported to show suggestive association (P < 1E-4) with adiponectin 45 , a protein involved in many metabolic processes including glucose regulation and fatty acid oxidation 46 . The rs1473 SNP is located at the intronic region of the gene PUM1, a member of the PUF family of proteins that contains a sequence-specific RNA binding domain. One study reported that the protein may be involved in the regulation of embryogenesis, and cell development and differentiation 47 . These genes may be involved in certain processes that are significant in the development of obesity and T2D, however future studies are needed to explore the exact mechanisms of those novel genes we identified.
Our study presents several strengths. First, the statistical power is increased through the cFDR method by leveraging two large GWAS datasets, providing an increase in the effective sample size. Although a meta-analysis of the same data would offer a similar gain, the meta-analysis approach only allows for more powerful detection of loci with the same direction of allelic effects in the phenotypes 48 , whereas the cFDR method allows for detection of loci regardless of their effect directions. Second, we consider two traits that are unlikely to be correlated with BMI and T2D, ADHD and MDD, and generate conditional QQ plots with respect to these "control traits. " The "control trait" enrichment analysis provides an alternative way to examine pleiotropic enrichment and provides a baseline that can be used to statistically partially validate the novel findings in our study. We believe that the collider-stratification bias is unlikely in our analysis because, the GWAS datasets have undergone genomic control (GC) and we also carried out LD pruning with r2 > 0.2. In addition, our conditional analysis provides a model-free method to obtain conservative estimation 4,7,8 .
Our study may also have some important limitations. First, we could not provide information about the effect estimates of pleiotropic loci on the phenotypes due to a lack of detailed individual-study-level data. However, we can infer this information from the summary effect sizes in the original GWAS study. This cFDR approach cannot distinguish between vertical and horizontal pleiotropy of the pleiotropic signals, although this question might be partially addressed in future Mendelian Randomization 49,50 studies. Second, it is likely that some of our cFDR results may be overestimated due to overlapping samples although the model-free approach is able to neutralize this overestimation of the conservative cFDR estimate 4,7,8 . Alternative approaches may be applied to check whether novel loci could still be identified in order to further confirm novel findings in our study or to furnish an empirical comparison of the relative performance of alternative methods, a topic we wish to pursue in the future with comprehensive theoretical and simulation approaches. In summary, by incorporating the shared genetic effects of two closely related traits into a conditional analysis framework, we observed significant pleiotropic enrichment between obesity and T2D. We identified several novel pleiotropic loci of potential functional significance for obesity and T2D in our analysis, and the results may provide us with novel insights into the shared genetic influences between these two disorders.

Materials and Methods
GWAS Datasets. The dataset for T2D contains association summary statistics for a trans-ethnic T2D GWAS meta-analysis of 26,488 cases and 83,964 controls 30 . Ancestry-specific meta-analyses were previously performed by component datasets from the full set of cohorts, including the DIAbetes Genetics Replication and Metaanalysis (DIAGRAM) Consortium (European descent) 29 , the Asian Genetic Epidemiology Network T2D (AGEN-T2D) Consortium (East Asian descent) 51 , the South Asian T2D (SAT2D) Consortium (south Asian descent) 52 , and the Mexican American T2D (MAT2D) Consortium (Mexican and Mexican American descent) 53 . Further details of the samples and methods employed within each ancestry group are presented in the corresponding consortium papers 29,51-53 . Briefly, various genotyping products were applied in the individual study's assay processes, with appropriate sample and SNP quality control (QC). Genotype imputation was conducted within each GWAS dataset using Phase II/III HapMap as the reference panels. Each SNP with MAF > 1% (except MAF > 5% in MAT2D GWAS due to a smaller sample size) and passing QC was analyzed for association with T2D using an adjusted additive model. Association summary statistics of each ancestry-specific meta-analysis were combined using a fixed-effect inverse-variance weighted meta-analysis. Genomic control (GC) was carried at the individual study level, after ethnicity-specific meta-analysis, and finally after trans-ethnic meta-analysis 30 .
The GIANT dataset for BMI contains association summary statistics for the GWAS and Metabochips meta-analysis of 339,224 individuals of various ancestries, including 322,154 individuals of European descent and 17,072 individuals of African-American and Hispanic descent 31 . The data contains the summary p-values from meta-analysis after correction for inflation due to potential population admixture. Two rounds of GC were separately applied both at the cohort level and after meta-analysis 31 .
Data Preparation. Before the implementation of the cFDR method, several preparation steps were performed. First, we checked the European Ancestry cohorts for overlapping samples included in these two datasets (Table S1). Next, we combined the common SNPs included in these two datasets. Then we applied a linkage disequilibrium (LD) based SNP pruning method 7,8 to remove large correlations between pairs of variants. The SNP pruning method begins using a window of 50 SNPs where LD is calculated between each pair of SNPs. The minor allele frequency (MAF) is the basis for the SNP pruning, where for pairs with r 2 > 0.2 we removed the SNP with smaller MAF. Following this initial removal of SNPs, the window slides 5 SNPs forward and the process is repeated until there are no pairs of SNPs that are high in LD. The dataset was pruned using the HapMap 3 genotypes as a reference panel. Last, we performed gene annotation for the final set of 123,804 variants that were included in the analyses.
GC corrections were used in the GWASs to ensure that the variance estimates for each SNP are not inflated due to population structure and cryptic relatedness 54 . Both of the original datasets 30,31 we adopted in our study applied GC at the individual study level and again after meta-analysis, hence there was no need for us to reapply the GC in this analysis.

Statistical analysis.
The cFDR approach is well-established now, which has been widely applied by many other groups 4,7,8,55,56 and our group [12][13][14] . We briefly summarized this cFDR approach as follows: after the data preparation processing, we computed the conditional empirical cumulative distribution functions (cdfs) of the corrected p-values for the x axis in conditional QQ plot. Empirical cdfs for BMI SNP p-values were conditioned on nominal p-values in T2D, and vice versa. For each nominal p-value, an estimate of the cFDR was obtained from the conditional empirical cdfs. Using this cFDR approach, we obtained two cFDR tables-cFDR result for BMI conditioned on T2D and vice versa. Using these tables we identified loci associated with BMI and T2D (cFDR < 0.05), respectively. Then a conjunction method was used to find SNPs significantly associated with both BMI and T2D. Specifically, we took the maximum of those two cFDR values above as our conjunction FDR.
Conditional QQ and enrichment plots for assessing pleiotropic enrichment. As an intuitive illustration, we presented conditional Q-Q plots to graphically assess the pleiotropic enrichment of SNPs of the principal phenotype successively conditioning on various strengths of associations with the conditional phenotype. We plotted the QQ curve for the quantiles of nominal -log (p) 10 -values obtained from GWAS summary statistics for association of the subset of SNPs that are below each significance threshold in the conditional trait. The nominal -log (p) 10 -values were plotted on the y-axis and the empirical quantiles (empirical cumulative distribution functions (cdfs)) of the nominal p-values were plotted on the x-axis. Under the global null hypothesis, the theoretical distribution of p-values is expected to lie approximately on the diagonal line of the Q-Q plots. Enrichment of genetic associations is indicated as a leftward deflection from the null line as the principal phenotype is successively conditioned on increasing strength of associations with the conditional phenotype. The degree of deflection between curves provides important information about the degree of pleiotropy between the two phenotypes. Larger deflections are considered to represent a greater enrichment of pleiotropic genes between the two phenotypes.
For the associated phenotypes BMI and T2D, pleiotropic "enrichment" of BMI with T2D exists if the proportion of SNPs or genes associated with BMI increases as a function of increased association with T2D. To confirm SCientifiC RepoRtS | 7: 16397 | DOI:10.1038/s41598-017-16722-6 the pleiotropic enrichment effect, we presented fold-enrichment plots of nominal -log (p) 10 values for BMI SNPs below the standard GWAS threshold of p < 5 × 10 −8 and for subsets of SNPs determined by the significance of their association with T2D and vice versa. As the p values of the conditional phenotypes become more significant, lower upward shift from the null line will persist.
In order to check the pleiotropic enrichment and provide a baseline that can be used to confirm novel findings, we also generated conditional QQ plots for two control traits that are unlikely to be correlated with BMI and T2D, ADHD and MDD.

Conditional Manhattan plots for localizing genetic variants.
To demonstrate the localization of the SNPs associated with BMI conditional on their significance on T2D, and the reverse, we present conditional Manhattan plots. The plots present the relationship between all SNPs within an LD block and their chromosomal locations. The 22 chromosomal locations are plotted on the x-axis, and the -log (FDR) 10 BMI values conditional on T2D are plotted on the y-axis and vice versa for T2D. Any SNP with a -log (FDR) 10 value greater than 1.3 (FDR < 0.05) was deemed to be significantly associated with the principal phenotype. We also present a conjunction Manhattan plot to demonstrate the locations of the common pleiotropic genetic variants associated with both phenotypes.
Functional annotation and gene enrichment analysis. In order to evaluate the biological functions of the individual trait associated loci identified by cFDR and pleiotropic loci identified by conjunction FDR, we performed functional annotation and gene enrichment analysis using the gene ontology (GO) terms database (http:// geneontology.org/) 57 . All significant genes identified by cFDR and conjunction cFDR in our study were annotated and characterized based on three main categories: biological processes, cellular component and molecular functions. This analysis provided comprehensive biological information, allowing us to partially validate our findings by determining specific genes that are enriched in T2D-and obesity-related GO terms.
Protein-protein interaction network. In order to detect interactions and associations of the BMI-associated and T2D-associated genes respectively, protein-protein interaction analyses were conducted by searching the Search Tool for the Retrieval of Interacting Genes/Proteins (STRING) database (http:// string-db.org/). The STRING database comprises known and predicted associations from curated databases or high-throughput experiments, and also with other associations derived from text mining, co-expression, and protein homology 58 .