Introduction

With the goal of discovering the genes that contribute to the risk of common diseases, numerous susceptibility loci have been identified using linkage analysis. Despite replication of many of these linkages in a second sample, the responsible genes often remain unknown. Some exceptions include TBC1D1, identified as the gene responsible for the obesity linkage on chromosome 4p15–14,1, 2 and HOXB13, identified as the gene responsible for the prostate cancer linkage on chromosome 17q21–22.3 However, for many other common disease linkage signals, the underlying causal genes and variants await discovery.

The expectation that a single gene accounts for a linkage peak may contribute to the difficulty in identifying causal genes. Instead, linkage peaks may reflect clusters of causal genes: Martin et al.4 attributed a triglyceride linkage on chromosome 7q36 to variants in five genes, and Christians et al.5, 6 found that fine-mapping caused a single quantitative trait locus (QTL) for body size in mice to resolve first into three QTLs, then one of those to split into two. Furthermore, the clustering of causal genes may have stymied the multi-group effort to identify type 2 diabetes (T2D) genes on chromosome 1q;7 two strong associations present in populations of European ancestry failed replication and confirmation in other ethnic groups. These examples suggest that abandoned linkage findings might yet reveal susceptibility genes if reappraised while considering the possibility of multiple causal genes.

As for other common diseases, T2D gene discovery presents a challenge despite strong evidence of a genetic component.8 The challenge is even greater in African American (AA) populations where both prevalence and genetic diversity are higher9 and pathophysiology may differ.10, 11 Genome-wide case-control association studies have identified single nucleotide polymorphisms (SNPs) with small effects on T2D risk,12 but few of the associated SNPs, identified primarily in European ancestry populations, are replicated in AAs13 and few SNPs have been identified in AA populations.14 In general, the widespread rejection of family studies in favor of genome-wide case-control association studies has failed to produce the promised prognostic and diagnostic variants for T2D.15

As a resource for the discovery of genes related to T2D and its complications, the American Diabetes Association established the Genetics of NIDDM (GENNID) study. From 1993 to 2003, the GENNID study ascertained families through siblings diagnosed with T2D. We have used this resource to increase understanding of the genetics of T2D by applying linkage and family-based association analysis to the AA subset of the GENNID sample. Genome-wide linkage analysis using genotypes on 5914 SNPs identified chromosomal regions that potentially harbor risk genes for T2D and age of T2D diagnosis (AOD). The strongest signal for T2D occurred on chromosome 2 at 95–121 megabases (Mb), with weaker support for AOD in the same region and for T2D at 68–95 Mb. Both T2D and AOD also showed linkage on chromosome 13 at 19–30 Mb, but the linkage was limited to T2D on chromosome 7 at 50–79 Mb and to AOD on chromosome 18 at 31–65 Mb.16 Other analyses inferred pleiotropy with triglyceride in the chromosome 2p region17 and pleiotropy with obesity in the chromosome 13 region.18

Herein we present linkage and association analyses on the AA subset of the GENNID sample using genotypes on 9203 fine-mapping SNPs added to the genome-scan SNPs used previously. Our first goal was to more precisely localize the T2D-susceptibility genes in each of the five chromosomal regions identified in the genome scan (chromosome 2 at 68–95 and 95–121 Mb, chromosome 7 at 50–79 Mb, chromosome 13 at 19–30 Mb and chromosome 18 at 31–65 Mb). Our second goal was to test for the association of the gene-based fine-mapping SNPs with T2D and AOD in order to identify genes related to T2D in AAs.

Subjects and methods

The GENNID study ascertained families through sibling pairs each with a T2D diagnosis.19 During Phase 1, extended family members were also studied; one site ascertained AAs. During Phase 2, data collection beyond the sibling pair was limited to parents, or, if parents were unavailable, unaffected siblings; five sites ascertained AAs. During Phase 3, only affected sibling pairs and trios were studied; five additional sites ascertained AAs. In total, 1496 AA members of 580 pedigrees were studied at 10 sites. Data cleaning and re-evaluation of T2D diagnoses by current criteria reduced the analysis sample to 1344 members of 524 pedigrees;16 for this study, we selected an informative subset of 1077 members of 415 pedigrees. This study was approved by the Institutional Review Board at each participating institution.

T2D diagnoses, originally following National Diabetes Data Group criteria, were re-evaluated using current criteria before all analyses; 84 affected and 13 unaffected sample members were re-assigned as unknown. AOD was reported on a standardized questionnaire. Body mass index (BMI) was computed from height and weight obtained from physical examination.

We selected a subset of the sample for fine mapping by excluding unaffected individuals aged <53 years, the age by which onset had occurred in 75% of cases, and then any individual, either affected or unaffected, who consequently had no family members in the sample. This sample subset maintained significant lod scores in each of the linkage regions and produced power >80% to detect association. Power to produce a P-value of 0.00001 was estimated by simulating 1000 replicates of a SNP with a minor allele frequency of 0.4 and a heterozygous effect of a 10% increase in penetrance at age 50 years for T2D or a 3-year increase in AOD.

For fine-map genotyping, we used SNAGGER20 (http://snagger.sourceforge.net/) to select tagSNPs within genes in the linkage regions on chromosomes 2, 7, 13 and 18. Using HapMap phases 1 and 2 (http://hapmap.ncbi.nlm.nih.gov), variants in the region were tagged at a linkage disequilibrium (LD) r2 threshold of 0.7. Each SNP selected had a minor allele frequency >0.1 and Illumina design score >0.4; SNP pairs had minimum spacing of 60 base pairs (bp). The Center for Inherited Disease Research (CIDR) performed the genotyping using the Illumina iSelect chip (Illumina, Inc, San Diego, CA, USA). The genetic map locations of the SNPs in centiMorgans (cM) were obtained from the Rutgers Combined Linkage-Physical Map of the Human Genome (http://compgen.rutgers.edu/maps); locations of SNPs not mapped directly were estimated using physical positions from NCBI dbSNP Build 123 in a smoothing calculation (Conway Institute Bioinformatics Service; http://integrin.ucd.ie). Genotyping errors were identified using Pedcheck21 and MERLIN;22 we zeroed 4978 genotypes of 710 SNPs, 74 identified by Pedcheck and 4904 by MERLIN. Genotypes for 24 SNPs, available on <95% of the sample following data cleaning, were retained after confirmation of minimal effects on the results. Multi-point identity-by-descent probabilities were computed at each cM using MERLIN,22 treating as haplotypes SNP sets with pairwise LD r2>0.7. We identified 279, 175, 133 and 259 SNP sets on chromosomes 2, 7, 13 and 18, respectively.
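As an illustration of the LD measure used for tagging and for grouping SNP sets, the following minimal sketch computes pairwise r2 for two biallelic SNPs from haplotype and allele frequencies. It uses the standard definition r2 = D^2 / (pA(1 − pA)pB(1 − pB)) with D = pAB − pA·pB; the function names and the example frequencies are hypothetical and are not taken from SNAGGER or MERLIN.

```python
def ld_r2(p_ab, p_a, p_b):
    """Pairwise LD r^2 for two biallelic SNPs.

    p_ab: frequency of the haplotype carrying allele A at SNP 1 and allele B at SNP 2
    p_a, p_b: frequencies of allele A and allele B
    """
    if not (0.0 < p_a < 1.0 and 0.0 < p_b < 1.0):
        raise ValueError("allele frequencies must lie strictly between 0 and 1")
    d = p_ab - p_a * p_b  # disequilibrium coefficient D
    return d * d / (p_a * (1.0 - p_a) * p_b * (1.0 - p_b))


def exceeds_ld_threshold(p_ab, p_a, p_b, threshold=0.7):
    """True when the SNP pair has r^2 above the threshold (0.7 in the text)."""
    return ld_r2(p_ab, p_a, p_b) > threshold


# Hypothetical frequencies: two SNPs in complete LD, and two in equilibrium.
print(ld_r2(0.5, 0.5, 0.5))
print(ld_r2(0.25, 0.5, 0.5))
```

In practice, haplotype frequencies in pedigree data would have to be estimated rather than observed directly, which this sketch does not attempt.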

Likelihood analysis, as implemented in jPAP,23, 24 was used for univariate linkage analysis of unadjusted T2D (uT2D), T2D adjusted for BMI (bT2D) and AOD and for bivariate association analysis of uT2D and AOD. AOD was adjusted for gender and modeled as a normal density with mean, s.d. and gender effect as parameters. T2D risk was modeled to account for AOD in affected pedigree members, while allowing for censored observations24 with age, gender and BMI (for bT2D only) effects and penetrance as parameters. For each trait, additional parameters included heritability, a QTL effect in linkage analysis and an additive SNP effect in association analysis.

We applied variance components linkage analysis using the univariate model for each trait in conjunction with the identity-by-descent probabilities. At each cM across the five regions, all parameters were estimated for AOD, whereas only heritability and the QTL effect were estimated for uT2D and bT2D. All other parameters for uT2D and bT2D were fixed at the estimates obtained by maximizing the likelihood corrected for the ascertainment of each pedigree through an affected sib pair. No further ascertainment correction was made in the linkage analyses. The lod score at each cM was computed as the common logarithm of the ratio of the maximized likelihood with the QTL effect estimated to the maximized likelihood with the QTL effect set to zero.
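The lod score definition above can be sketched in a few lines. Because likelihood software typically reports natural-log likelihoods, this sketch assumes that input and converts the difference to a common logarithm; the function names and the example values are hypothetical and do not reproduce jPAP output.

```python
import math


def lod_score(loglik_qtl, loglik_null):
    """Lod score from two maximized natural-log likelihoods.

    loglik_qtl:  ln L with the QTL effect estimated
    loglik_null: ln L with the QTL effect fixed at zero
    """
    # log10(L1 / L0) = (ln L1 - ln L0) / ln 10
    return (loglik_qtl - loglik_null) / math.log(10.0)


def peak_position(positions_cm, lods):
    """Return the (position, lod) pair with the largest lod score."""
    best = max(range(len(lods)), key=lambda i: lods[i])
    return positions_cm[best], lods[best]


# Hypothetical example: a likelihood ratio of 1000 corresponds to lod = 3.
print(lod_score(-100.0, -100.0 - math.log(1000.0)))
```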

We assessed each fine-mapping SNP for association with uT2D and AOD by coding the SNP genotype as an additive covariate and testing its effect on both traits in a bivariate model. P-values were determined using a two-degree-of-freedom χ2 statistic computed as twice the natural logarithm of the ratio of the maximized likelihood with covariate effects estimated for both T2D and AOD to the maximized likelihood, with both effects set equal to zero. We controlled for multiple testing separately within each of the five linkage regions by specifying a false discovery rate25, 26 of 0.05 accounting for 1291, 1780, 1819, 1336 and 2977 fine-mapping SNPs for chromosomes 2p, 2q, 7, 13 and 18, respectively.
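The test and the multiple-testing control described above can be sketched as follows. The P-value uses the fact that the upper tail of a χ2 distribution with two degrees of freedom is exp(−x/2). The false discovery rate step is written as the Benjamini-Hochberg step-up procedure, which is an assumption here: the text cites its FDR method only by reference number. All function names and example values are hypothetical.

```python
import math


def lrt_pvalue_2df(loglik_full, loglik_null):
    """P-value for a 2-df likelihood ratio test from natural-log likelihoods."""
    stat = max(2.0 * (loglik_full - loglik_null), 0.0)  # chi-square statistic
    return math.exp(-stat / 2.0)  # exact upper tail for 2 degrees of freedom


def benjamini_hochberg(pvalues, fdr=0.05):
    """Indices of tests declared significant by the BH step-up procedure."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    cutoff_rank = 0
    for rank, idx in enumerate(order, start=1):
        # Keep the largest rank whose P-value falls under its BH threshold.
        if pvalues[idx] <= fdr * rank / m:
            cutoff_rank = rank
    return sorted(order[:cutoff_rank])


# Hypothetical P-values for four SNPs in one linkage region.
print(benjamini_hochberg([0.02, 0.02, 0.03, 0.5]))
```

Applying the procedure separately to each region, as the text describes, would mean calling `benjamini_hochberg` once per region with that region's P-values.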

We tested for an independent effect of each secondary associated SNP, conditional on the SNP within the same gene that attained the highest significance, by comparing the maximized likelihood estimating T2D and AOD covariate effects for both SNPs simultaneously to the maximized likelihood estimating T2D and AOD covariate effects for the most significant SNP alone. P-values were again determined using a two-degree-of-freedom χ2 statistic. P<0.05 after Bonferroni correction for the number of secondary SNPs in the gene supported the independence of the secondary SNP.
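A minimal sketch of the conditional test follows, assuming natural-log likelihoods are available for the two nested models. The two-SNP model adds two covariate effects (one for T2D, one for AOD) to the top-SNP model, which gives the two degrees of freedom. Function names and example values are hypothetical.

```python
import math


def conditional_pvalue(loglik_both, loglik_top_only):
    """2-df P-value comparing the two-SNP model with the top-SNP-only model."""
    stat = max(2.0 * (loglik_both - loglik_top_only), 0.0)
    return math.exp(-stat / 2.0)  # chi-square upper tail, 2 degrees of freedom


def is_independent(loglik_both, loglik_top_only, n_secondary, alpha=0.05):
    """Bonferroni-corrected decision for one secondary SNP in a gene.

    n_secondary: number of secondary associated SNPs tested in that gene
    """
    corrected = min(conditional_pvalue(loglik_both, loglik_top_only) * n_secondary, 1.0)
    return corrected < alpha


# Hypothetical gene with nine secondary SNPs.
print(is_independent(-90.0, -100.0, n_secondary=9))
print(is_independent(-98.0, -100.0, n_secondary=9))
```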

Results

We genotyped 9203 SNPs within five regions encompassing a total of 133 cM and 128 Mb (Table 1) and merged them with 244 (36 duplicates) genome-scan SNPs. Spacing between SNPs averaged 0.0145 cM (13 910 bp) and ranged from 0.0 to 5.3 cM and from 26 bp to 6 Mb.

Table 1 Fine-mapping regions by chromosome, genetic location, physical position and number of SNPs

To comply with the sample size limitations of the CIDR genotyping platform, we selected a subset of the original linkage sample to be informative for both linkage and association. The genotyped subset included 1077 members of 415 pedigrees that ranged from 2 to 11 members (1–8 were genotyped). In this subset, AOD and BMI differed little from the complete sample, but the unaffected sample members, selected for a minimum age of 53 years, were older (Table 2).

Table 2 Characteristics of the sample by gender and T2D status

We previously identified five linkage peaks within four chromosomal regions from autosomal scans of T2D and AOD; the chromosome 2 region contained two peaks separated by 30 Mb and the centromere.16, 18 Using updated genetic map positions for the genome-scan SNPs, we repeated the linkage analyses on the smaller sample: the genome-scan lod scores for all five peaks (Table 3) generally agreed with our published lod scores for bT2D and AOD16 and for uT2D.18

Table 3 Lod scores by trait in each of the five chromosomal regions

Upon adding the fine-mapping SNPs, linkage evidence remained in all five regions but strengthened only on chromosome 2 (Table 3). On chromosome 18, only AOD supported linkage, although uT2D provided weak support (lod=2.62) upon elimination of sample members with BMI>45 kg m−2. The most remarkable effect of fine-mapping was the splitting of the lod score curves into multiple peaks in all of the regions except on chromosome 18.

Additional evidence that multiple susceptibility genes contribute to the linkage signals derived from the identification of associated SNPs that reside in multiple genes within each of the five regions, including on chromosome 18 (Table 4). The number of associated SNPs ranged from 4 on chromosome 13 to 17 on chromosome 2q; the number of associated genes ranged from 2 on chromosome 13 to 9 on chromosome 7. Although 20 of the 27 associated genes were identified through a single SNP, five genes were identified through 2 or 3 SNPs and ARHGAP25 and DPP10 were identified by 8 and 10, respectively. However, six associated SNP pairs exceeded an LD r2 of 0.7: ARHGAP25 (rs6714065 and rs7605681), CTNNA2 (rs968820 and rs1368915), DPP10 (rs843417 and rs1823267, rs10204212 and rs13432035, rs4848376 and rs11694256), and HIP1 (rs1179625 and rs1179622). Nevertheless, conditional association analysis supported the independence of 2 SNPs in ARHGAP25, POLR1B, DPP10 and MTUS2 (Supplementary Table S1).

Table 4 Genes by chromosomal region that associate with uT2D and AOD in bivariate analysis, as identified through FDR<0.05 for SNPs within the genes

Discussion

The fine-mapping of five chromosomal regions allowed us not only to confirm our genome-scan linkages but also to infer the presence of multiple T2D-susceptibility genes in each of the regions. Two types of evidence supported the presence of multiple susceptibility genes underlying the linkage peaks. First, except on chromosome 18, each lod score peak split into at least two peaks when fine-mapping SNPs were added to the linkage analysis. Second, in every region, including chromosome 18, at least two potential susceptibility genes were identified through the association with T2D and AOD of fine-mapping SNPs residing in those genes.

Multiple susceptibility genes may often underlie linkage peaks for common diseases. In fact, Martin et al.4 attributed many failures to identify causal variants to the incorrect assumption that a single or a limited number of variants are responsible for a linkage signal. As evidence, they identified variants in five genes that fully accounted for a linkage peak for plasma triglyceride level. If small effects are typical of common disease-susceptibility genes, their detection through linkage analysis may be limited to locations where the genes cluster. The corollary is that isolated genes may elude detection by linkage analysis. However, fine-mapping using next-generation sequencing may provide sufficient power to detect even isolated genes.

Although chromosome 18 failed to show T2D linkage and chromosome 7 failed to show AOD linkage, we nevertheless expect that the susceptibility genes residing in all the regions both increase T2D risk and decrease AOD. Variation between regions in the information available undoubtedly affects the relative strength of the corresponding lod scores for T2D or AOD. As evidence, the elimination of T2D cases with BMI>45 kg m−2 revealed a weak T2D linkage on chromosome 18 and fine-mapping revealed AOD linkages on chromosomes 2p and 2q. As expected, the lod scores are generally lower for AOD than for T2D, because AOD reflects onset age imprecisely and the familial correlation of AOD may partially result from temporal clustering of T2D diagnoses among relatives. Nevertheless, AOD proved sufficiently accurate to produce linkage evidence in four of the chromosomal regions.

Few of the associated genes have previously been reported to associate with T2D or related traits. The exceptions include GRB10 with T2D,27 NEDD4L with diabetic nephropathy28 and LIPG with lipid metabolism.29 Novel candidates reported herein include a cytokine (IL36B) and genes involved in lipid metabolism (ACOXL) and cell–cell and cell–matrix adhesions (MAGI2, CLDN4, CTNNA2). Interestingly, the candidates also included genes involved in Williams–Beuren syndrome (WBSCR28, WBSCR17, CLDN4),30 whose sufferers have a high prevalence of diabetes and pre-diabetes.31 Another unusual candidate is DPP10, which is reportedly associated with asthma.32 DPP10 is related to DPP4, whose inhibitors are oral anti-hyperglycemics used in T2D therapy,33 and DPP6, which contributes to a triglyceride linkage;4 however, unlike DPP4, DPP10 lacks serine protease/dipeptidyl peptidase activity.34

Support for the functionality of these genes derived from evidence that their expression levels are affected by our associated SNPs or proxies with high LD. We tested 14 SNPs in the 5 genes for which expression levels from transformed lymphocytes were measured on a subset of 160 sample members; nominal significance was obtained for one: P=0.0148 for an effect of rs7579103 on ARHGAP25 expression levels. In other populations and various tissues,35 expression levels of ARHGAP25, AFF3, POLR1B, HIP1, LIPG and NEDD4L showed evidence of an effect of an associated SNP or its proxy (P<1 × 10−5).

For further insight into the nature of the associations, we haplotyped sets of 15–30 fine-mapping SNPs, each encompassing 2 of the associated SNPs reported in Table 4. Haplotypes that attained P<0.00001 are shown in Supplementary Table S2. For ARHGAP25, two haplotypes associated with T2D: the first extended across four of the associated SNPs and contained the risk alleles for all four; the second extended across six of the associated SNPs and contained the risk alleles for all six but differed from the first haplotype for alleles at other SNPs. We propose that each haplotype harbors a distinct causal variant, despite sharing associations with the same risk alleles, demonstrating the complexity of disease associations. Similar patterns occurred for haplotypes in other genes (Supplementary Table S2).

Admixed populations offer both advantages and challenges for genetic studies. The advantages include the opportunity to exploit information on local ancestry to localize disease-susceptibility genes. The challenges include the presence of genetic heterogeneity within the sample. As with any such analysis, confirmation of these findings awaits replication in another sample.

In summary, linkage and association analysis using genotypes on 9203 fine-mapping SNPs added to five chromosomal regions confirmed each linkage and identified potential susceptibility genes in each of the regions. In addition, every region appeared to harbor at least two T2D-susceptibility genes.