Introduction

Type 2 diabetes (T2D) is a widespread epidemic, which disproportionately affects minority populations in the United States, such as African-American (AA) populations, compared with populations of European descent.1 Genetic, environmental and cultural factors may contribute to this disproportionate risk. Genome-wide association studies (GWASs) have discovered many common variants influencing predisposition to T2D. However, the vast majority of these studies have been performed in populations of European and Asian ancestry and little data are available for AAs.2 To date, the only gene verified as being associated with T2D in populations of African descent is TCF7L2.3 However, other studies have cast doubt on whether these and other markers associated with T2D represent the same level of risk in AA populations.4, 5 Results from two case–control association studies evaluating these GWAS-derived T2D single-nucleotide polymorphisms (SNPs) in AAs remain conflicting and warrant further study.6, 7

From 1993 to 2003, investigators of the American Diabetes Association, through the Genetics of NIDDM (GENNID) project, ascertained 1496 individuals from 580 AA families through T2D-diagnosed siblings at multiple sites as a resource for the discovery of genes related to T2D and its complications.8, 9 Family-based association studies allow for better control of population stratification and heterogeneity compared with case–control association studies.10 In this study, we evaluated 32 GWAS-derived T2D-associated SNPs in the GENNID AA pedigrees. Our study verified the association of SNP rs10490072 (in BCL11A) and rs864745 (in JAZF1) with T2D in AAs. These two SNPs fall within the support interval of suggestive linkage peaks (at chromosomes 2 and 7, respectively) for T2D in this cohort;8 thus we performed linkage disequilibrium-based fine mapping of these loci by genotyping 21 tag SNPs within the haplotype block that includes T2D-associated index SNPs. Finally, to develop causal models of diabetes, we sought to define the role of these polymorphisms as cis-regulatory elements in modulating the expression of transcripts in transformed lymphoblastoid cells available for a subset of 160 GENNID family subjects from Arkansas.

Materials and methods

Study cohort

The GENNID study ascertained 1496 subjects from 580 AA families at 10 sites, each family through a sibling pair in which both siblings had a T2D diagnosis. T2D was diagnosed using the National Diabetes Data Group criteria. This study was approved by the Institutional Review Board at each participating institution. The GENNID cohort includes multigenerational families, affected sib pairs and nuclear families with affected siblings, available parents and unaffected sibs. Physical examination data and DNA were available on 1496 subjects; after removal of apparent sample discrepancies, 1450 individuals remained. Characteristics of this study cohort are summarized in Supplementary Table 1; see Elbein et al.8 for more details.

SNP selection

We selected 32 SNPs at 27 loci for our analysis. All SNPs chosen were from prior GWASs for T2D in Caucasian and East-Asian populations, and most of them have been replicated in independent Caucasian ancestry cohorts.11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26 Supplementary Table 2 lists the studies from which the SNPs were selected and in which they were associated with T2D and/or related traits in Caucasian and East-Asian populations. In addition to the GWAS index SNPs (rs864745 and rs10490072), nine tag SNPs across JAZF1 and 10 tag SNPs across BCL11A were selected for genotyping in the GENNID AA sample. Tag SNPs were selected based on HapMap (CEU, YRI and ASW) data and genotype data from an AA ESRD cohort27, 28 under a confidence interval model of linkage disequilibrium-block structure around the index SNP (pair-wise tagging with an r² ≥ 0.90).
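Pairwise tagging rests on the linkage disequilibrium statistic r² between two SNPs. The following is a minimal sketch of the standard r² formula computed from haplotype and allele frequencies; it is not the tagging software used in the study. The function names, the example frequencies and the choice of `>=` for the 0.90 cutoff are assumptions made for illustration.

```python
def ld_r_squared(p_ab, p_a, p_b):
    """Return r^2 between two biallelic SNPs.

    p_ab: frequency of the haplotype carrying allele A at SNP 1 and allele B at SNP 2
    p_a:  frequency of allele A at SNP 1
    p_b:  frequency of allele B at SNP 2
    """
    if not (0.0 < p_a < 1.0 and 0.0 < p_b < 1.0):
        raise ValueError("both SNPs must be polymorphic")
    d = p_ab - p_a * p_b  # coefficient of linkage disequilibrium D
    return d * d / (p_a * (1.0 - p_a) * p_b * (1.0 - p_b))


def is_tagged(p_ab, p_a, p_b, threshold=0.90):
    """Hypothetical helper: True if one SNP tags the other at the given r^2 cutoff.

    Using >= for the comparison is an assumption of this sketch.
    """
    return ld_r_squared(p_ab, p_a, p_b) >= threshold
```

With made-up frequencies, `ld_r_squared(0.5, 0.5, 0.5)` describes two SNPs in complete LD, whereas `ld_r_squared(0.25, 0.5, 0.5)` describes two SNPs in linkage equilibrium.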

Genotyping

Salted-out DNA samples from lymphoblastoid cell lines of GENNID AA subjects were provided by the Coriell Cell Repository (Camden, NJ, USA), quantified by PicoGreen, and adjusted in concentration for genotyping. Supplementary Table 2 lists the 32 T2D GWAS SNPs and the platforms on which they were genotyped. Sixteen SNPs were genotyped using single-base primer extension reactions in a 12-plex format on the GenomeLab-SNPstream Genotyping System (Beckman Coulter, Inc., Fullerton, CA, USA) and the other 16 SNPs were genotyped with pre-designed TaqMan SNP genotyping assays (Applied Biosystems Inc., Foster City, CA, USA) on an ABI-7500 Fast real-time PCR system. SNP genotyping success rates for SNPstream and TaqMan were 99.3% and 98.9%, respectively. An additional 19 JAZF1 and BCL11A haplotype block tag SNPs were genotyped on a Sequenom MassARRAY system (Sequenom Inc., San Diego, CA, USA) according to the manufacturer's iPLEX application guidelines. Details of Sequenom multiplex genotyping assays are shown in Supplementary Table 3. The genotype call rate was above 99%, and genotyping reproducibility was 100%, as verified by 70 duplicate samples distributed evenly across the genotyping plates and by two standard samples on each plate.

Transformed lymphocyte cell line culture

We used total RNA extracted from Epstein–Barr virus-transformed lymphocytes (TLs) for evaluating the role of GWAS-associated SNPs in regulating transcript level expression of nearby genes. TLs used in our study were derived from blood samples of 160 GENNID AA subjects (80 sib pairs) from Arkansas. Cells were grown under normoglycemic (5.6 mM glucose) standard culture conditions in RPMI-1640 culture media (Cat. 11875, Gibco-Invitrogen, Carlsbad, CA, USA) supplemented with 10% Benchmark fetal bovine serum (Cat. 100-106, lot no. A33B00Z, Gemini Bio-Products, West Sacramento, CA, USA).

RNA isolation and gene expression

Total RNA was isolated from TLs by using a Qiagen RNeasy Mini Kit (Qiagen, Valencia, CA, USA). RNA was quantified using a NanoDrop ND-2000 (NanoDrop Technologies Inc., Wilmington, DE, USA), and quality was assessed by an Agilent 2100 Bioanalyzer (Agilent Technologies, Santa Clara, CA, USA). Genome-wide expression analysis using TL total RNA was performed as described elsewhere.29 In brief, labeling and hybridization to Illumina HT-12 beadchip arrays (version 4; San Diego, CA, USA) was performed according to the manufacturer's instructions. Resulting data were processed and normalized using the average normalization algorithm as implemented in GenomeStudio Gene Expression Module v1.0 application software (Illumina). Background was subtracted before the scaling. Probes with detection P-values >0.01 were additionally excluded because of the lack of evidence for reliable quantitative expression.

Statistical analyses

Likelihood analysis as implemented in jPAP was used to test each SNP for association with T2D, age of diagnosis (AOD), body mass index (BMI) and waist–hip ratio (WHR).30 BMI and WHR were transformed separately in males and females using the inverse normal transformation, in which a quantile was assigned to each trait value and the corresponding inverse normal deviate was used as the trait. Transformed BMI and WHR and untransformed AOD were each modeled as a normal density. T2D risk was modeled to account for AOD in the affected pedigree members, while allowing for censored observations.9 SNP genotypes were coded as 0, 1 or 2, thereby assuming an additive effect. Analysis of all traits accounted for heritability and included SNP genotype and gender as covariates; analysis of BMI and WHR also included age as a covariate; analysis of T2D was performed separately without obesity adjustment and with adjustment for either BMI or WHR. Associations were tested by comparing the maximum likelihood obtained when estimating the SNP effect with the maximum likelihood obtained when fixing the SNP effect to zero. The test statistic, twice the natural logarithm of the likelihood ratio, was referred to a chi-square distribution with 1 df to obtain P-values. In each likelihood maximization, all other model parameters were estimated in analyses of BMI, WHR and AOD, whereas only the SNP effect and heritability were estimated in analysis of T2D, with all other parameters fixed at estimates obtained when correcting for ascertainment through an affected sib pair. In this paper we present association results for T2D and AOD. Merlin was used to infer the most likely haplotype for each family member for the JAZF1 and BCL11A region SNPs.31 Association analysis was then performed as before, except that each 2-SNP to 10-SNP (JAZF1) or 11-SNP (BCL11A) haplotype was tested rather than single SNPs.
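Two of the steps described above can be illustrated with a short sketch: the inverse normal transformation of a quantitative trait and the 1-df likelihood ratio test. This is not the jPAP implementation. The function names are hypothetical, and the quantile rule (rank − 0.5)/n with average ranks for ties is an assumption, since the exact rule is not stated here. The 1-df chi-square tail probability uses the identity P(X > x) = erfc(√(x/2)).

```python
import math
from statistics import NormalDist


def inverse_normal_transform(values):
    """Rank-based inverse normal transform of a list of trait values.

    Assumption of this sketch: quantile = (rank - 0.5) / n, ties share
    their average rank.
    """
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        # extend j to the end of the block of tied values
        while j + 1 < n and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # 1-based average rank of the tied block
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    std_normal = NormalDist()
    return [std_normal.inv_cdf((r - 0.5) / n) for r in ranks]


def lrt_p_value(loglik_full, loglik_null):
    """Likelihood ratio test with 1 df.

    loglik_full: maximized log-likelihood with the SNP effect estimated
    loglik_null: maximized log-likelihood with the SNP effect fixed at zero
    Returns (test statistic, P-value).
    """
    stat = max(2.0 * (loglik_full - loglik_null), 0.0)
    p_value = math.erfc(math.sqrt(stat / 2.0))  # chi-square(1) upper tail
    return stat, p_value
```

To mirror the sex-specific transformation, `inverse_normal_transform` would be called once on the male trait values and once on the female trait values.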

We assessed associations between the selected T2D GWAS SNPs and normalized quantitative expression values of local transcripts (within 1 Mb up- and downstream of the tested SNP). The number of transcripts tested varied by SNP, ranging from 3 transcripts (for SNPs rs10923931 and rs864745) to 43 transcripts (for SNP rs1800247). The association of probe-expression level with genotype was assessed with an additive model implemented in SAS software (ver. 9.1, SAS Institute, Cary, NC, USA). The generalized estimating equations procedure was used to account for family membership. To control for potential population stratification, the association was also analyzed using a modification of the within-family association test.32 In brief, this method partitions the association into between- and within-family components represented, respectively, by the sibship mean of the continuous numeric variable for genotype and each individual's deviation from this mean. The test of the significance of the within-family component is a test of co-transmission among siblings, which is robust to population stratification. An additional adjustment for diabetic status was also applied. The P-values were not adjusted for multiple comparisons. Statistical power of our gene expression cohort was modest (92%) to detect 15% of the variation in gene expression levels (assuming a type 1 error rate=0.005, MAF>0.15, additive model) in general association.
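The partition into between- and within-family genotype components can be sketched as follows. This shows only the decomposition step, under the assumption that genotypes are coded 0, 1 or 2 and that each subject carries a sibship identifier; it does not reproduce the SAS generalized estimating equations fit, and the function name is hypothetical.

```python
from collections import defaultdict


def between_within_components(family_ids, genotypes):
    """Split additive genotype codes into between- and within-family parts.

    family_ids: sibship identifier for each subject
    genotypes:  genotype code (0, 1 or 2) for each subject, same order
    Returns (between, within), where between[i] is the sibship mean genotype
    and within[i] is subject i's deviation from that mean.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for fam, g in zip(family_ids, genotypes):
        sums[fam] += g
        counts[fam] += 1
    between = [sums[fam] / counts[fam] for fam in family_ids]
    within = [g - b for g, b in zip(genotypes, between)]
    return between, within
```

Both components would then enter the expression model as covariates; the coefficient on the within-family component is the one that is robust to population stratification.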

Results

SNP rs864745 in the JAZF1 gene showed a nominally significant association with T2D (P=0.018) in the GENNID African-American sample. This association was stronger after adjustment for BMI (P=0.006). SNP rs10490072 in the BCL11A gene was also associated with T2D after adjustment for BMI (P=0.03) and was more strongly associated with AOD (P=0.007). WFS1 SNP rs10010131 showed a marginal association with T2D. The remaining 29 of the 32 GWAS-derived SNPs, including TCF7L2 SNPs rs7903146 and rs12255372, showed no association with T2D in this cohort (Table 1 and Supplementary Table 4). Discriminatory ability of the combined SNP information was assessed by grouping individuals based on the number of risk alleles carried for the three variants (rs10490072, rs864745 and rs10010131) that were associated with T2D in our cohort. The P-values testing for increased risk of T2D for 4+, 5+ and 6 risk alleles were 0.0011, 0.0011 and 0.00979, respectively. Thus, the discriminatory ability of the three SNPs combined in predicting T2D risk was slightly higher than that of a single SNP.
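The grouping by number of risk alleles can be illustrated with a small sketch. It assumes that each genotype is coded as the number of risk-allele copies (0, 1 or 2) at each of the three SNPs; the subject identifiers and genotypes in the usage note are invented, and the statistical test applied to each group is not reproduced here.

```python
def risk_allele_counts(genotypes):
    """Sum risk-allele copies across SNPs for each subject.

    genotypes: dict mapping subject id -> tuple of risk-allele copies
               (0, 1 or 2) at each SNP.
    """
    return {subject: sum(copies) for subject, copies in genotypes.items()}


def group_by_threshold(counts, thresholds=(4, 5, 6)):
    """Return, for each threshold, the subjects carrying at least that many risk alleles."""
    return {
        t: sorted(subject for subject, c in counts.items() if c >= t)
        for t in thresholds
    }
```

For example, with the invented genotypes `{"p1": (2, 2, 2), "p2": (2, 1, 1), "p3": (1, 1, 0)}`, subject `p1` falls in all three groups, `p2` only in the 4+ group and `p3` in none.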

Table 1 Association of SNPs identified in Caucasian GWAS studies with type 2 diabetes in GENNID AA families

In this sample, we earlier reported linkage for T2D on chromosome 2 (logarithm of odds=3.58 at 84 cM, 1-lod drop support interval 77–102 Mb) and chromosome 7 (logarithm of odds=2.62 at 24 cM, 1-lod drop support interval 14–29 Mb).9 SNPs rs10490072 in BCL11A and rs864745 in JAZF1 are within the 1-lod drop support interval of these two linkage peaks at chromosomes 2 and 7, respectively. Thus, 11 BCL11A tag SNPs (including the GWAS index SNP rs10490072) and 10 JAZF1 tag SNPs (including the GWAS index SNP rs864745) were genotyped in these pedigrees to identify causal variant(s) with larger effect sizes, and haplotypic associations were tested in these regions. The LD relationships of genotyped SNPs in these two loci in our cohort are shown in Supplementary Figure 1. For JAZF1, no T2D risk haplotype produced higher significance than did the rs864745-A allele alone, but the protective haplotype GGTGG for SNPs rs864745, rs849140, rs849141, rs10276381 and rs12154248 produced a nominal P-value of 0.000697 in analysis of T2D adjusted for BMI. Likewise for BCL11A, no early AOD risk haplotype produced higher significance than did the rs10490072-A allele alone, but the protective haplotype CCCCAGC for SNPs rs11894442, rs6718203, rs17402905, rs8179712, rs1011407, rs10490072 and rs12468946 produced identical significance in analysis of AOD.

Most of the Caucasian GWAS-derived T2D-associated SNPs are noncoding, reside in either intronic or intergenic regions, are not in LD with known non-synonymous SNPs, and are expected to increase diabetes susceptibility by modulating transcription as cis-regulatory elements. Thus, we analyzed the genotypic association of the 32 SNPs with the expression of 215 expressed local transcripts (represented by 274 probes within ±1 Mb). Six SNPs (rs6698181, rs9472138, rs730497, rs10811661, rs11037909 and rs1153188) were associated with nearby transcript expression in transformed lymphoblast cell lines of GENNID subjects in both general and within-family analyses using generalized estimating equations (Table 2). The strongest association was observed for SNP rs10811661 with the expression of KLHL9 (P=2.7 × 10⁻⁸) under the general model of inheritance.

Table 2 Association of T2D GWAS SNPs with mRNA expression for adjacent genes

Discussion

To our knowledge, this is the first study to evaluate European and Asian T2D GWAS-derived polymorphisms in an AA family cohort. We studied 32 established GWAS-derived SNPs for T2D and related traits and, consistent with earlier reports, most of them showed no significant association in these AA families.

TCF7L2 is one of the most significant diabetes susceptibility genes identified to date in various populations.11 A previous case–control association study by Lewis et al.6 reported a significant association of TCF7L2 rs7903146 with T2D in AA populations. This association was not replicated in our family-based GENNID AA sample. The lack of significance may be the result of the relatively low power of our sample, especially when accounting for family structures.

A recent DIAGRAM+ meta-analysis showed associations of 12 new autosomal and X chromosomal loci in a large discovery cohort of 22 044 Caucasian subjects.33 Most of the loci discovered by this meta-analysis showed odds ratios (OR) <1.1. Because the DIAGRAM+ meta-analysis had far greater statistical power than our study, we did not expect our limited-size sample to have enough power to detect the effects of those SNPs, and we did not select them for validation in our cohort.

Among the other loci examined in this study, the one that showed the most significant association with T2D is SNP rs864745 in JAZF1, a gene encoding a zinc finger protein. rs864745 has been characterized as a risk factor in European populations by Zeggini et al.22 in a large meta-analysis. Deletion of the JAZF1 gene in mice leads to early growth retardation, which was associated with reduced plasma IGF-1 levels, and in adulthood to decreased muscle mass, increased fat mass and insulin resistance.34 rs864745 was associated with JAZF1 expression in muscle, measured by RT-PCR, in our prior population-based sample of mixed ethnicity, where the association was largely contributed by African Americans.35 Our gene expression arrays were unable to detect significant expression of the JAZF1 transcripts in GENNID transformed lymphoblast cell lines. The SNPs in the JAZF1 and BCL11A genes were associated with T2D, especially after adjustment for BMI, and were within the support interval of suggestive linkage peaks for T2D in our GENNID African-American cohort.9 The tag SNP-based analysis in these regions revealed no further common SNP or haplotype that may be a stronger predictor of T2D susceptibility than the index SNPs. None of these associated SNPs explained linkage in these regions, and associations were not significant after correcting for multiple testing. However, a role for rare variants not tagged by haplotypes generated by the common SNPs cannot be excluded by our study.

In summary, our study indicates a nominal role of JAZF1 and BCL11A variants in T2D susceptibility in African-Americans. However, this work suggests little overlap in known susceptibility to T2D between European and African-derived populations if focusing on GWAS SNPs alone. Differences in linkage disequilibrium patterns may result in poor proxies for the tested Caucasian-attributed SNPs in AA populations. In addition, the small effect of the variants may require much larger populations to observe notable associations. Results from GWAS studies in African-Americans are awaited with interest, but further fine mapping studies based on deep sequencing of candidate regions of a representative AA cohort within areas of interest identified in Caucasian GWAS studies may be helpful to target ethnicity-specific genetic risk factors for T2D. Alternatively, as suggested by our study, T2D-associated SNPs may function as cis-regulatory elements and alter the expression of nearby genes, which may fall into certain unknown pathways that contribute to the development of T2D, where the effect size might be different across different populations because of different genetic and/or environmental backgrounds.

Lymphoblast cell lines may not be the most relevant cell type in which to evaluate T2D- and metabolism-related eQTLs. Thus, a limitation of the eQTL screening in this study is that only transformed lymphoblast cell line gene expression data were available for the reported GWAS SNPs. However, several studies have shown that eQTLs in tissues relevant to T2D and associated metabolic disorders (for example, adipose) overlap significantly with eQTLs in lymphoblast cell lines.36, 37 Functional studies of regulatory variants, as well as of the regulated genes themselves, will be essential to uncover the T2D-susceptibility genes among multiple GWAS hits.