Introduction

Type 2 diabetes is a grave public health problem that affects about 30 million individuals in the United States, or more than 9% of the US adult population [1]. African American women are disproportionately affected by type 2 diabetes, with a prevalence that is more than twice as high as among non-Hispanic white women [2]. Differences in traditional major risk factors, such as obesity, do not fully explain the racial disparity in the risk of type 2 diabetes [3, 4]. Genetic factors may contribute to the excess risk observed in African Americans.

The list of loci involved in the pathogenesis of type 2 diabetes has grown substantially as a result of recent genome-wide association studies (GWAS). The majority of GWAS of type 2 diabetes susceptibility have been performed in populations of European or East Asian ancestry [5,6,7,8,9,10,11,12]. These studies have discovered more than 100 single-nucleotide polymorphisms (SNPs) associated with type 2 diabetes. GWAS in African ancestry populations have discovered only three novel type 2 diabetes risk variants [13, 14].

To add to the knowledge of the genetic basis of type 2 diabetes in African Americans, we used samples and data from the Black Women’s Health Study (BWHS) to (1) examine whether individual African ancestry is associated with higher risk of type 2 diabetes and whether such association is mediated by body weight; (2) conduct genome-wide admixture mapping to identify genomic loci with local African ancestry associated with risk of type 2 diabetes; (3) conduct replication analysis of 71 index SNPs previously found associated with type 2 diabetes in European ancestry populations; and (4) fine-map those genetic loci to identify better and/or new type 2 diabetes genetic variants in African Americans.

Materials and methods

Study population

Subjects in the present study are participants in the BWHS, an ongoing prospective cohort study initiated in 1995 when 59,000 African American women 21–69 years of age from across the US completed a postal health questionnaire [15]. Information on demographic factors, health history, and anthropometric measurements, among others, was collected at baseline. Every 2 years, participants complete follow-up questionnaires to update outcome and covariates data. DNA samples were obtained from BWHS participants by the mouthwash-swish method [16]. Saliva samples were provided by 50% of BWHS participants. The BWHS was approved by the institutional review board of Boston University (Boston, MA). All study subjects provided written informed consent for use of their saliva samples.

Cases of diabetes for the present study were randomly selected among those with a self-report of newly diagnosed type 2 diabetes in any of the biennial follow-up questionnaires up to 2013 and who had provided a saliva sample. Accuracy of self-reported diabetes was evaluated among 293 women who gave permission to contact their physicians. Among the 229 women whose physicians provided information on the diagnosis, type 2 diabetes was confirmed in 96% of the cases. We randomly selected one control per case matched on year of birth (±1 year), having completed the same questionnaire as the last questionnaire completed by the case before her diagnosis (index date) of type 2 diabetes, and geographic region of residence (Northeast, South, Midwest, and West) at the matched questionnaire. Controls were selected among those without a self-report of type 2 diabetes who had provided a saliva sample. The prevalence of undiagnosed diabetes in the BWHS was previously assessed using data from collected blood samples [17]. Among 1873 BWHS participants who provided a blood sample in the first year of blood collection and had never reported diabetes, 120 (6.4%) had HbA1c levels of 6.5% (47.5 mmol/mol) or higher, meeting criteria for diabetes [18]. Thus, the prevalence of undiagnosed diabetes in the BWHS is similar to what has been reported for African Americans in the general US population [1].

Assessment of covariates

Age, weight, geographic region of residence, and neighborhood socioeconomic status (SES) information was taken from the last questionnaire completed before diagnosis for cases, and from the matched questionnaire for controls. BMI (kg/m2) was calculated from self-reported adult height at baseline and current weight. In a validation study of anthropometric measures conducted in 115 BWHS participants, Spearman correlations for self-reported vs. technician-measured weight and height were 0.97 and 0.93, respectively [19]. Neighborhood SES was measured by linking participants’ current address to 2000 US Census block groups using geocoding (Mapping Analytics, Rochester, NY, USA) [20]. We used factor analysis to calculate a score for neighborhood SES that included six group census variables: median household income; median housing value; percentage of households receiving interest, dividend, or net rental income; percentage of adults aged 25 years or older who have completed college; percentage of employed persons aged 16 years or older who are in occupations classified as managerial, executive, or professional; and percentage of families with children that are not headed by a single female. Years of education were self-reported in 1995 and 2003. Data on years of education were taken from the closest 1995 or 2003 questionnaire preceding the year of diagnosis for cases, or index year for controls.

SNP selection

Ancestral informative markers (AIMs)

A list of 2918 AIMs was extracted from the Affymetrix Axiom Genomic Database. These AIMs have large frequency differences between African and European ancestry populations from the 1000 Genomes Project.

Index SNPs and fine-mapping of type 2 diabetes loci

We included 71 index SNPs (Supplementary Table 1), representing 70 genomic regions that have been reported to be associated with type 2 diabetes in one or more GWAS. Tagging SNPs for fine-mapping were selected for each of these genomic regions (±100 kb around each index SNP) in order to capture (at r2 ≥ 0.9) all SNPs with minor allele frequency (MAF) ≥ 5%, based on the African populations in the 1000 Genomes Project.
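To make the tagging criterion concrete, the following minimal Python sketch computes the pairwise r2 between two biallelic SNPs from phased haplotypes and checks it against the r2 ≥ 0.9 cutoff stated above. It is an illustration only: the function names and the 0/1 allele coding are assumptions of this sketch, not a description of the software actually used to select tagging SNPs.

```python
def ld_r2(hap_a, hap_b):
    """Pairwise r^2 between two biallelic SNPs.

    hap_a and hap_b hold the 0/1 alleles carried by the same set of
    phased haplotypes (hypothetical coding chosen for this sketch).
    Both SNPs must be polymorphic, otherwise r^2 is undefined.
    """
    n = len(hap_a)
    p_a = sum(hap_a) / n                      # allele frequency at SNP A
    p_b = sum(hap_b) / n                      # allele frequency at SNP B
    p_ab = sum(a * b for a, b in zip(hap_a, hap_b)) / n  # haplotype 1-1 frequency
    d = p_ab - p_a * p_b                      # linkage disequilibrium coefficient D
    return d * d / (p_a * (1 - p_a) * p_b * (1 - p_b))


def is_tagged(hap_tag, hap_target, r2_min=0.9):
    """True if the tag SNP captures the target SNP at the r^2 cutoff."""
    return ld_r2(hap_tag, hap_target) >= r2_min
```

Two SNPs carried on identical haplotypes give r2 = 1 and are therefore tagged, whereas two SNPs whose alleles are statistically independent give r2 = 0.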

Genotyping and QC

Samples were genotyped on an Affymetrix Axiom 45K custom array (Affymetrix, Santa Clara, CA, USA), designed to include genes and SNPs related to type 2 diabetes, as well as AIMs. Genotyping was carried out at the Affymetrix laboratory (Santa Clara, CA, USA). The Axiom array data underwent extensive QC procedures carried out by Affymetrix and the study investigators. We removed about 13% of samples due to high missing call rates (>5%) and poor reproducibility. In addition, we excluded about 17% of SNPs because of poor cluster properties, high missing call rates (>10%), deviation from Hardy–Weinberg proportions (p < 1 × 10−5 in controls), or discordance between duplicate samples. After these exclusions, the final dataset included 5228 subjects (2632 cases and 2596 controls) and 38,008 SNPs, including 22,038 SNPs selected for the current analysis.
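As an illustration of the Hardy–Weinberg filter described above, the sketch below tests genotype counts in controls against Hardy–Weinberg proportions and applies the p < 1 × 10−5 exclusion threshold. The text does not state which test was used; this sketch assumes a 1-df chi-square goodness-of-fit test, and the function names are hypothetical.

```python
import math

HWE_THRESHOLD = 1e-5  # exclusion threshold stated in the text (controls only)


def hwe_chi2_pvalue(n_aa, n_ab, n_bb):
    """P-value of a 1-df chi-square test for Hardy-Weinberg proportions.

    Assumption of this sketch: a chi-square test rather than an exact test.
    """
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)           # frequency of allele A
    q = 1.0 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    # Survival function of a chi-square with 1 df: P(X > x) = erfc(sqrt(x / 2))
    return math.erfc(math.sqrt(chi2 / 2.0))


def passes_hwe(n_aa, n_ab, n_bb, threshold=HWE_THRESHOLD):
    """True if the SNP is retained, i.e. its p-value is not below the threshold."""
    return hwe_chi2_pvalue(n_aa, n_ab, n_bb) >= threshold
```

For example, genotype counts of 100, 200, and 100 match Hardy–Weinberg proportions exactly and are retained, whereas counts of 500, 0, and 500 (no heterozygotes) are excluded.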

Imputation

Once genotypes were obtained, we performed SNP imputation using the imputation server from the University of Michigan (https://www.imputationserver.sph.umich.edu). We used the 1000 Genomes Project phase 3 African (AFR) reference panel, which includes 1322 haplotypes (661 subjects). Imputation resulted in 85,375 SNPs with MAF ≥ 1% and info score ≥ 0.5 for analysis in the 70 type 2 diabetes loci.

Data analysis

Ancestry analysis

We used ADMIXMAP software version 3.8.3103 [21] and genotypes of 2918 AIMs to estimate individual percentage of African ancestry. We categorized individual African ancestry in quintiles for statistical analysis, using the first (lowest) quintile as reference group. We used logistic regression to estimate odds ratios (ORs) and 95% confidence intervals (CIs) of the association of individual African ancestry (quintiles) and type 2 diabetes. Our basic model was adjusted for age, geographic region of residence (Northeast, South, Midwest, and West), and genotyping batch. We further adjusted for score of neighborhood SES (quintiles), and years of education (≤12, 13–15, ≥16 years). We also modeled individual African ancestry as a continuous variable to estimate ORs and 95% CI of type 2 diabetes per 10% increase of individual African ancestry.

We used several approaches to examine whether individual African ancestry affects risk of type 2 diabetes in part through an effect on BMI. First, we ran models without and with adjustment for BMI (continuous). If β1 is the estimate of the association between African ancestry and type 2 diabetes without adjustment for BMI, and β2 is the estimate of the same association with adjustment for BMI, then the proportion of the association between African ancestry and type 2 diabetes that is explained by BMI is given by \(\frac{\beta_1 - \beta_2}{\beta_1}\). Both estimates are on the logarithmic scale (log ORs). Second, we conducted BMI-stratified analyses (non-obese women, BMI < 30 kg/m2, and obese women, BMI ≥ 30 kg/m2). Finally, we evaluated the association of individual African ancestry (continuous) with BMI among controls using linear regression adjusting for age, geographic region of residence (Northeast, South, Midwest, and West), genotyping batch, quintiles of the neighborhood SES score, and years of education (≤12, 13–15, ≥16 years).
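The proportion explained by BMI can be reproduced from published ORs, as in the following minimal Python sketch. The example values are the ORs per 10% increase in African ancestry reported in this article (1.16 before and 1.12 after adjustment for BMI); the function name is hypothetical.

```python
import math


def proportion_explained(or_unadjusted, or_adjusted):
    """Proportion of an association explained by a mediator.

    Both ORs are converted to the logarithmic scale before applying
    (beta1 - beta2) / beta1.
    """
    beta1 = math.log(or_unadjusted)   # log OR without adjustment for BMI
    beta2 = math.log(or_adjusted)     # log OR with adjustment for BMI
    return (beta1 - beta2) / beta1


# ORs per 10% increase in African ancestry, as reported in this article
share = proportion_explained(1.16, 1.12)
print(round(share, 2))  # 0.24
```

The result, about 24%, agrees with the proportion reported in the Discussion.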

Admixture mapping

We used ADMIXMAP software version 3.8.3103 [21] to estimate African locus-specific ancestry and identify genomic regions with local ancestry associated with risk of type 2 diabetes. Admixture mapping can be done using case-only and/or case-control analyses. The case-only approach compares locus-specific ancestry at each chromosomal position with average genome-wide ancestry. The case-control method tests for differences of locus-specific ancestry between cases and controls adjusting for individual admixture and other covariates. Case-only analysis has greater statistical power than case-control analysis, assuming that deviation in ancestry between cases and controls is not due to population stratification. We therefore first performed a case-only admixture scan and then a case-control admixture analysis to confirm local ancestry associations from the case-only results. Case-control analyses were adjusted for global individual African ancestry, age, geographic region of residence, and genotyping batch. Statistical significance was assessed using Z statistics, with a threshold of |Z| > 4.0 considered genome-wide statistically significant [22]. A positive Z-score indicates that higher African ancestry at that particular locus is associated with higher risk of type 2 diabetes; a negative Z-score indicates that lower African ancestry is associated with higher risk of type 2 diabetes.
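The logic of the case-only comparison can be illustrated with a deliberately simplified Python sketch. It is not the score test implemented in ADMIXMAP: it assumes that local and global ancestry proportions are already known for each case and computes a paired Z statistic for their mean difference. All function names and the input format are assumptions of this sketch.

```python
import math
from statistics import mean, stdev


def case_only_z(local_ancestry, global_ancestry):
    """Simplified case-only statistic at one locus.

    local_ancestry:  per-case African ancestry proportion at the locus
                     (0, 0.5, or 1 for 0, 1, or 2 African alleles).
    global_ancestry: per-case genome-wide African ancestry proportion.
    Returns mean(local - global) divided by its standard error.
    Requires at least two cases with non-identical differences.
    """
    diffs = [loc - glob for loc, glob in zip(local_ancestry, global_ancestry)]
    se = stdev(diffs) / math.sqrt(len(diffs))
    return mean(diffs) / se


def is_genome_wide_significant(z, threshold=4.0):
    """Apply the |Z| > 4.0 threshold stated in the text."""
    return abs(z) > threshold
```

Under this simplification, a positive value means that cases carry more African ancestry at the locus than expected from their genome-wide average, which matches the sign convention described above.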

Fine-mapping of type 2 diabetes loci

We used logistic regression to estimate ORs and 95% CIs of the association between genetic variants and type 2 diabetes. ORs were adjusted for age at the last completed questionnaire for cases and at the matched questionnaire for controls, current geographic region of residence, individual African ancestry percentage, and genotyping batch. Within each type 2 diabetes locus, we corrected for multiple testing using the number of SNPs that were genotyped [23].
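One way to read the locus-level correction is as a Bonferroni-style threshold based on the number of genotyped SNPs in the locus. The Python sketch below illustrates that reading; the nominal alpha of 0.05, the function names, and the example SNP count are assumptions of this sketch and are not taken from the text or from [23].

```python
def locus_threshold(n_genotyped_snps, alpha=0.05):
    """Region-wide significance threshold for one locus.

    Assumption: a Bonferroni-style division of alpha by the number of
    genotyped SNPs in the locus.
    """
    return alpha / n_genotyped_snps


def region_wide_significant(p_value, n_genotyped_snps, alpha=0.05):
    """True if a SNP's p-value passes the locus-specific threshold."""
    return p_value < locus_threshold(n_genotyped_snps, alpha)


# Hypothetical locus with 250 genotyped SNPs: threshold = 0.05 / 250 = 2e-4
print(region_wide_significant(7.1e-5, 250))  # True
print(region_wide_significant(0.037, 250))   # False
```

Dividing by the number of genotyped SNPs, rather than by the larger number of imputed SNPs, gives a less stringent threshold because imputed SNPs are correlated with the genotyped SNPs from which they are inferred.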

Results

Table 1 shows the characteristics of BWHS participants in the present study. Cases had higher BMI and individual African ancestry compared to controls. Cases also had fewer years of education and more frequently lived in neighborhoods within the lowest quintile of SES relative to controls. There were no differences regarding geographic region of residence and health insurance coverage.

Table 1 General characteristics of type 2 diabetes cases and controls in the Black Women’s Health Study

Table 2 shows results of the association analyses between individual African ancestry and risk of type 2 diabetes. We found, in the basic model adjusting for age and geographic region of residence, that women in the highest quintile of African ancestry had 75% higher risk of type 2 diabetes compared to women in the lowest quintile (OR = 1.75; 95% CI = 1.46–2.09). The association was attenuated, although it remained highly significant, after successive adjustment for SES (OR = 1.57; 95% CI = 1.31–1.89), and BMI (OR = 1.41; 95% CI = 1.16–1.71). We estimated that each 10% increase of African ancestry was associated with 12% higher risk of type 2 diabetes in the model adjusted for basic covariates plus SES and BMI (OR = 1.12; 95% CI = 1.06–1.18). In BMI-stratified analysis, a significant association between African ancestry and type 2 diabetes was observed only among non-obese women (OR per 10% increase in African ancestry = 1.20; 95% CI = 1.10–1.30). To further explore whether African ancestry may affect risk of type 2 diabetes through an increase in BMI, we assessed the relation of African ancestry with BMI in controls in a multivariate model adjusting for age, geographic region of residence, genotyping batch, and SES. We found that each 10% increase of African ancestry was associated with an increase in BMI of 0.59 kg/m2 (95% CI = 0.37–0.81 kg/m2; p < 1 × 10−4).

Table 2 Odds ratio (OR) according to quintiles of percentage of African ancestry in the Black Women’s Health Study

Table 3 shows results of our genome-wide admixture mapping of type 2 diabetes. A total of 2918 autosomal AIMs were included (Supplementary Table 2). We found two genome-wide significant loci, 3q26 and 12q23, with excess of African ancestry associated with higher risk of type 2 diabetes. Each African allele at 3q26 was associated with 23% higher risk of type 2 diabetes (OR = 1.23; 95% CI = 1.09–1.39). ORs for the association were 1.29 (95% CI = 1.10–1.53) among obese women and 1.19 (95% CI = 0.98–1.44) among non-obese women. At 12q23, each African allele was associated with 13% higher risk of type 2 diabetes (OR = 1.13; 95% CI = 1.00–1.29). In BMI stratified analysis, the association was observed among non-obese women only (OR = 1.33; 95% CI = 1.09–1.62). Regarding their relation with BMI, we found that the high-risk African allele at 3q26 was associated with lower BMI in controls (Beta = −0.55 kg/m2; 95% CI = −1.05, −0.038 kg/m2), and the high-risk African allele at 12q23 was associated with higher BMI in controls (Beta = 0.66 kg/m2; 95% CI = 0.13, 1.18 kg/m2).

Table 3 Genome-wide significant regions of local African ancestry associated with type 2 diabetes in the Black Women’s Health Study

Table 4 shows the 10 index SNPs, out of 71 SNPs, that were associated with type 2 diabetes at a nominal P < 0.05 in the present study. Out of these 10 nominally significant variants, all but two, rs1535500 at KCNK16 and rs6960043 near DGKB, had associations with type 2 diabetes in the same direction as reported in previous GWAS. In six out of the eight SNPs with the same GWAS-reported association direction, the high-risk allele was more frequent in BWHS and African-ancestry populations from 1000 Genomes than in European-ancestry populations. Results for all 71 index SNPs are shown in Supplementary Table 1.

Table 4 Type 2 diabetes GWAS index SNPs with P < 0.05 in the Black Women’s Health Study

Table 5 shows the five genomic loci with region-wide significant fine-mapping results. At 1q32, we identified a new signal rs12091447 (OR = 1.55, p = 1.1 × 10−4) that was not correlated with the index SNP (r2 < 0.01 in European, EUR, and African, AFR, populations from 1000 Genomes Project). We did not find a significant association of the index SNP (rs2075423) with type 2 diabetes in the present study.

Table 5 New significant signals in type 2 diabetes loci in the Black Women’s Health Study

At 5q11, the index SNP rs459193 was nominally associated with type 2 diabetes (OR = 1.09, p = 0.037). We found a new signal rs73127858 (OR = 1.29, p = 7.1 × 10−5) uncorrelated to the index SNP both in AFR and EUR populations (r2 = 0). The effect T-allele has frequency of 11% in the BWHS, 13% in AFR populations, and it is almost absent (0.1% frequency) in EUR populations.

At 9p24, we did not find a significant association of the index SNP rs7041847 and type 2 diabetes (OR = 1.10, p = 0.11). We found a new signal rs114560781 associated with type 2 diabetes in BWHS (OR = 1.80, p = 6.6 × 10−5) that was uncorrelated with the index SNP in AFR populations (r2 = 0). The effect C-allele has frequency of 2% in BWHS, 4% in AFR populations, and is absent in EUR populations.

At 10q25, the index SNP rs7903146 in the TCF7L2 gene was associated with type 2 diabetes (OR = 1.20, p = 2.4 × 10−5). We identified a new signal rs114770437 associated with type 2 diabetes (OR = 1.37, p = 6.4 × 10−5) and uncorrelated with the index SNP in AFR populations (r2 = 0.03). The effect G-allele has frequency of 93% in BWHS, 91% in AFR populations, and 100% in EUR populations.

At 12q14, the index SNP rs1531343 was not associated with type 2 diabetes (OR = 0.99, p = 0.74). We found a stronger marker, rs2583943, which was correlated with the index SNP in EUR populations (r2 = 0.67) and associated with type 2 diabetes (OR = 1.25, p = 5.7 × 10−5). In addition, we identified a new signal rs116333053 associated with type 2 diabetes (OR = 1.44, p = 1.8 × 10−4), uncorrelated with the index SNP in both EUR and AFR populations (r2 < 0.01 in both populations).

Discussion

There were several findings of interest from our admixture mapping and fine-mapping work.

African ancestry and type 2 diabetes

We found that higher individual African ancestry was associated with higher risk of diabetes. Even after adjustment for neighborhood SES and years of education, we observed a clear linear relationship between individual African ancestry and risk of diabetes, with 16% higher risk of diabetes for each 10% increase in individual African ancestry. African ancestry was previously found to be associated with higher risk of type 2 diabetes in combined data from the Atherosclerosis Risk in Communities (ARIC) Study and Jackson Heart Study (JHS), even after adjustment for measures of SES [24]. In that analysis, subjects in the upper tertile of African ancestry had 37% higher risk of type 2 diabetes compared to subjects in the lower tertile [24]. By comparison, we found in the BWHS that women in the upper quintile of African ancestry had 57% higher risk of type 2 diabetes relative to the first quintile after adjustment for SES. These results suggest that differences in SES do not completely explain the excess risk due to African ancestry, and genetic factors may account in part for the higher burden of type 2 diabetes in African Americans.

BMI may be playing a partial mediating role between African ancestry and type 2 diabetes as shown by our analysis. The association of African ancestry with type 2 diabetes was mostly observed among non-obese subjects, and African ancestry was associated with higher BMI among controls. We estimated that BMI explained about 24% of the observed linear association of African ancestry with risk of type 2 diabetes. Cheng et al. [24] also reported a positive association of African ancestry with BMI, although they did not present BMI-stratified results of the association of African ancestry with type 2 diabetes.

Admixture mapping

Our whole-genome admixture mapping identified two genomic loci, 3q26.33 and 12q23.1, that showed an excess of local African ancestry in cases and were confirmed in the case-control analysis.

The 3q26.33 region contains GWAS-identified SNPs associated with type 2 diabetes in Chinese, Malays, and Indians from Singapore [25], acute insulin response in Mexican Americans [26], and total and LDL cholesterol changes in response to fenofibrate in patients with type 2 diabetes [27]. The association of local African ancestry and risk of type 2 diabetes seems to be similar among non-obese and obese subjects, suggesting that the observed association is not modified by body weight.

The admixture signal at 12q23.1 is located about 5 Mb from the insulin-like growth factor 1 (IGF1) gene, which harbors genetic variants associated with fasting insulin and insulin resistance in subjects of European ancestry [6, 28]. The association of local African ancestry and risk of type 2 diabetes was observed only among non-obese individuals, suggesting that body weight may be a modifier of the observed association.

Replication and fine-mapping

Results of our replication efforts showed that some risk SNPs are shared across different ancestry groups. We were able to nominally replicate associations for 8 out of 71, or about 11%, of the examined index SNPs. Our results are similar to findings reported from the meta-analysis of type 2 diabetes in African Americans (MEDIA) Consortium, which replicated associations for 17 out of 104 (or about 16%) index SNPs [14]. In addition, the present study is the first one to report an association of the X-linked rs5945326 SNP near the DUSP9 gene in African Americans, as the previous MEDIA study did not include X-chromosome data.

Our fine-mapping results suggested evidence of independent signals at five loci (PROX1 at 1q32, ANKRD55 at 5q11, GLIS3 at 9p24, TCF7L2 at 10q25, and HMGA2 at 12q14), and evidence of a stronger signal than the European index SNP at HMGA2 at 12q14. Our new SNPs at 1q32 (rs12091447) and 5q11 (rs73127858) were not correlated (r2 < 0.01) with the SNPs identified by the MEDIA consortium at these same loci: rs7548778 and rs459193, respectively [14]. In addition, our new SNP rs114560781 at 9p24 was not correlated (r2 < 0.01) with any of 10 SNPs with locus-wide significant results that were reported by the MEDIA consortium at this same locus [14]. Thus, our findings at 1q32, 5q11, and 9p24 may be false positive results.

We previously reported the presence of a new signal at TCF7L2 tagged by SNP rs114770437 that is independent of the index SNP rs7903146 [29]. SNP rs114770437 is monomorphic in 1000 Genomes European, East Asian, and South Asian populations. This may explain why an independent signal at TCF7L2 has not been found in previous GWAS of European and Asian ancestry subjects. The MEDIA consortium [14] found a group of SNPs, uncorrelated with the index SNP rs7903146 (r2 < 0.05), with locus-wide significant associations with type 2 diabetes, in support of our finding of a new independent signal at TCF7L2. Three of these SNPs (rs7896811, rs11196199, and rs11196203) have moderate correlations (r2 = 0.3) with rs114770437, suggesting that all of these SNPs may be tagging the same causal SNP. It is noteworthy that we had previously postulated the existence of more than one causal variant at TCF7L2 [30], based on the presence of several enhancers with both in vitro and in vivo activity in the TCF7L2 gene [31, 32]. The rs114770437 SNP is located inside one of the enhancer elements identified by Savic et al. [31, 32].

Our strongest association at HMGA2, SNP rs2583943, may represent a better tag of the causal variant in African ancestry populations. Because the rs2583943 SNP has weak correlations (r2 = 0.16) with two SNPs (rs11175944 and rs1480475) identified by the MEDIA consortium at the HMGA2 locus, future expanded fine-mapping efforts would help to identify the causal variant(s) at this locus in African American populations.

Our study has several limitations. First, our sample size of 2632 cases and 2596 controls gives only 40% power to detect an OR of 1.10 for a risk allele with a frequency of 0.10. This low power may explain why we failed to replicate association of many of the index SNPs. Second, we used self-reported type 2 diabetes. However, our validation study showed that 96% of self-reported diabetes cases are confirmed in the medical record. Third, our custom genotyping array did not include many of the known type 2 diabetes loci as it was designed before publication of the latest GWAS. Fourth, prevalence of undiagnosed cases of diabetes among controls, although small and comparable to population estimates, may have resulted in underestimation of the true genetic associations. Fifth, the call-rate threshold of > 90% that we used in the QC of the Affymetrix Axiom 45K custom array is not the standard cutoff; however, association results for the index SNPs were essentially the same when we compared genotyped data (shown in the present analysis) with imputed data. Lastly, we cannot completely rule out the presence of unmeasured confounding of the association between genetic ancestry and risk of type 2 diabetes due to socio-economic factors that were not completely captured by either years of education or the neighborhood SES score.

In summary, the present study has several important findings. First, we found that African ancestry was associated with higher risk of type 2 diabetes. This association was mediated in part by BMI, and was not completely explained by differences in SES. Second, our genome-wide admixture mapping identified two genomic loci, 3q26.33 and 12q23.1, which may harbor genetic risk variants that explain in part the higher risk of type 2 diabetes observed in African American subjects. Third, we were able to replicate several of the index SNPs from European ancestry GWAS. In particular, ours is the first study to replicate the association of the X-linked rs5945326 SNP in African American subjects. Finally, our fine-mapping efforts suggest the presence of new independent signals in at least five loci. Although in four of the loci our strongest SNPs did not correlate with reported SNPs by the MEDIA consortium, findings from the same consortium support the existence of new genetic risk variants in those loci. Future work should help to identify these new causal variants.