Introduction

The phenotype of a complex disease such as diabetes or hypertension results from interactions between genetic and nongenetic (environmental) factors such as diet and physical activity. Genetically influenced factors differ between populations, partly because natural selection favors adaptations specific to the local environment. These differences help to explain why complex traits occur with variable prevalence and varying subphenotypes in different regions of the world.

Both genetic and nongenetic factors can severely confound the results of any genetic study on a complex trait. Although advanced strategies on the phenotype level have been developed over the past decade, little attention has been paid to population stratification and genetic homogeneity of the study sample. In general, there are only a few reports in which researchers tried to address this issue. It has been argued that the effects of stratification can be eliminated simply by carefully matching cases and controls according to self-reported ancestry and geographical origin.1 The argument was supported by studies using empirical methods such as STRUCTURE to detect stratification based on genotypes at unlinked markers.2, 3 In one of four studies involving genotyping of 44 unlinked markers, stratification was detected, but the signal was no longer apparent after more stringent matching of cases and controls based on the birthplaces of the individuals' grandparents.4 This has been interpreted as evidence that stratification may be less of a concern than originally anticipated.

Although systematic differences in the ancestry of cases and controls can be a source of false-positive associations,5, 6 the fraction of published associations that is attributable to stratification is unknown.7 Freedman et al8 found no significant evidence for stratification with STRUCTURE by analyzing data from 24–48 SNPs in 11 association studies spanning a range of disease states and self-reported ancestries and three different epidemiological designs. However, after typing more SNPs and applying the method of Genomic Control to the data they found significant evidence for stratification (P<0.0001).8 Even in the relatively homogeneous genetic isolate from Iceland, Helgason et al9 found evidence for substructures, indicating that sampling strategies need to take account of this issue.

The search for type 2 diabetes (T2DM) associated SNPs provides a sobering example of contradictory association results for a complex trait where undetected population substructures might be responsible for the discrepancy. In an association-based follow-up of a genomewide linkage scan, Horikawa et al10, 11 identified three genetic markers in intron 3 of the calpain-10 gene that significantly contributed to diabetes susceptibility in a Mexican-American population. Haplogenotype 112/121 (UCSNP-43, -19, -63) defined the highest risk (original sample: odds ratio (OR)=2.80, 95% confidence intervals (CI)=1.23–6.34; replication sample: OR=3.58, 95%CI=1.43–8.92). The largest study to date, a meta-analysis by Weedon et al,12 demonstrated a role for calpain-10 in T2DM susceptibility (P=0.0007; OR=1.17; 95%CI: 1.07–1.29). However, most studies that tried to replicate the findings have failed to show any association (Table 1).

Table 1 Tests for association of UCSNP-44 (if performed), -43, -19, -63 in intron 3 of the calpain-10 gene with T2DM in different populations

It is likely that those conflicting results were due to a strong genetic and phenotypic heterogeneity of T2DM. The existence of subtle subphenotypes with a different genetic background can be assumed. Selection pressure under different environmental conditions might result in promoting the ‘survival’ of different mutations in various genes, all leading to comparable but slightly different subphenotypes. As shown by Baier et al,13 ‘the diabetes genotype’ may have been beneficial for survival during the evolution of man, therefore implicating a variety of such subphenotypes.

In our association study of the previously mentioned SNPs at the calpain-10 locus, we therefore wanted to test systematically for genetic stratification of our sample and, if it was found, to perform separate association tests for each subgroup. To minimize heterogeneity further, we applied a two-step procedure. (1) We chose end-stage diabetic nephropathy requiring hemodialysis as a very specific diabetes subphenotype. The specificity relies on the fact that only 25–40% of type 2 diabetic patients develop this form of nephropathy after 25 years of diabetes duration. These patients are usually characterized by severe insulin resistance and frequent micro- and macrovascular complications. Owing to fatal vascular complications, 50% of the patients survive no longer than 2 years after starting hemodialysis.14, 15, 16 (2) We extended this classical and widely used approach with a genetic diversity test in our cases and controls (German ancestry) before we carried out association analyses. By selecting a specific subphenotype and, in addition, narrowing down the phenotype by a genetic diversity test, we hoped to enrich the sample with phenotypes in subgroups that share nearly the same genetic basis.

We chose a multivariate approach, the genetic vector space method (a modified Genomic Control method), which relies on the concept of ‘biological ethnicity’ (see Materials and methods). We identified significant genetic diversity in an apparently homogeneous German population. When we tested the calpain-10 SNPs (UCSNP-43, -19, -63) for association with T2DM, the results changed completely once this genetic diversity was taken into account.

Materials and methods

Subjects

We recruited our case sample of 612 type 2 diabetic patients with end-stage diabetic nephropathy on hemodialysis from dialysis centers throughout Germany within the framework of the German 4D (Die Deutsche Diabetes-Dialysestudie) trial, with headquarters at the Division of Nephrology, Department of Medicine, University of Würzburg, Würzburg, Germany.17 The study was approved by the local ethics committee. All participants gave their written consent. Only patients with end-stage diabetic nephropathy and German ancestry (questionnaire) were included in the trial. Baseline parameters are summarized in Table 2.

Table 2 Clinical and laboratory characteristics of the patients

We found no history of diabetes, stroke, or myocardial infarction in our control sample of 214 healthy controls (112 males, 102 females). Subjects came from the area around Würzburg, Bavaria, Germany. We found hypertension in two (0.93%), hyperlipoproteinemia in two (0.93%), and smoking in 42 subjects (19.63%). Our questionnaire revealed that 8.41% of parents, 0.47% of siblings, and 0% of children of controls were diabetic. Since the average age of the control group was 33.05±9.32 years, we had to assume that approximately 5% of the controls were still at risk of developing T2DM (according to the general population prevalence of T2DM in Germany).

Genotyping

We genotyped 20 microsatellite markers in DNA samples from all subjects. The selection of those markers was based on the preferences for the genetic vector space method by Stassen et al.19 that will be described later. We used this marker set to detect unknown population stratification through the concept of biological ethnicity (Table 3). Polymerase chain reaction (PCR) was performed under standard reaction conditions. We redesigned primer sequences for UCSNP-43 and -63 (UCSNP-43 forward 5′-HEX-GACCCTCACCATGAGTCATAATTG-3′, UCSNP-43 reverse 5′-TCACCAAGTACAAGGCTTAGCCTCACCTTCGTA-3′, UCSNP-63 forward 5′-FAM-CTCCTGATCAACACCTAGCCAAGG-3′, UCSNP-63 reverse 5′-AAGGGGGGCCAGCGCCTGACGGGGGTGGCG-3′). We performed SNP testing as a modified RFLP method (DRMP-PCR) as described in Berger et al.18

Table 3 Twenty uncorrelated polymorphic markers were combined to a multidimensional feature vector in order to assess genetic diversity and to model biological ethnicity in terms of interindividual genetic similarities

Genetic diversity test by multivariate feature vectors (population substructure analysis)

The main principles of this method are described in detail by Stassen et al.19 All algorithms were implemented in the Master.GEN program package.20 Here, we present only the basic aspects of the method.

Unknown population admixture can substantially reduce the power of studies that aim to link phenotype to genotype. Since allele distributions of microsatellites generally display subtle to marked differences between populations, a multivariate configuration of sufficiently polymorphic microsatellites enables quantification of the genetic heterogeneity of genetically diverse sample sets. Such multivariate configurations can be regarded as multivariate ‘feature vectors’ which span a genetic vector space. Subjects are characterized in a vector space as distinct ‘points’ such that genetically similar subjects form compact clouds (‘clusters’), while genetically dissimilar subjects are located in more distant regions of the vector space. ‘Natural’ clusters can then be used to define genetically homogenous subgroups, thus leading to the concept of ‘biological ethnicity’. This concept reduces the problem of genomic control for genetic association studies, where unknown population admixture can produce false-positive as well as false-negative signals.
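The idea of a multivariate feature vector can be illustrated with a minimal sketch. The encoding below (one allele-count coordinate per allele of each marker) and the Euclidean distance are assumptions chosen for illustration; the similarity measure actually used by the genetic vector space method is the one described by Stassen et al.19 The two-marker panel and the three subjects are hypothetical.

```python
from collections import Counter
from math import dist

# Hypothetical marker panel: marker name -> alleles (repeat lengths) seen in the sample.
PANEL = {"M1": (120, 124, 128), "M2": (200, 204)}


def feature_vector(genotype, panel=PANEL):
    """Encode {marker: (allele_a, allele_b)} as a flat vector of allele counts (0, 1 or 2)."""
    vector = []
    for marker, alleles in panel.items():
        counts = Counter(genotype[marker])
        vector.extend(counts[allele] for allele in alleles)
    return vector


# Three made-up subjects: a and b differ by one allele, c shares no allele with a.
subject_a = feature_vector({"M1": (120, 124), "M2": (200, 200)})
subject_b = feature_vector({"M1": (120, 124), "M2": (200, 204)})
subject_c = feature_vector({"M1": (128, 128), "M2": (204, 204)})

# Genetically similar subjects lie close together, dissimilar ones far apart.
print(dist(subject_a, subject_b), dist(subject_a, subject_c))
```

With 20 markers the same encoding simply yields a longer vector; each subject remains a single point in the resulting space.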

In our study, we relied on a slightly modified set of 20 di-, tri- and tetranucleotide polymorphisms (Table 3), which had previously been applied successfully in studies investigating differences in genetic diversity between various US-American populations,21, 22, 23 European populations,24, 25 and population isolates.26, 27 Those markers were unlinked with each other, albeit not randomly distributed over the genome. The method was initially developed for microsatellite markers, for which many reference genotypes and allele frequencies were available. Once the genetic vector space was constructed, cluster analysis was carried out under the following optimization criteria: (1) cluster detection started exclusively with the cases and searched for the largest homogenous group among the cases, thereby excluding the controls; (2) the controls were subsequently treated as independent replication samples, thus supplementing the clusters derived from the patients. As a direct consequence, our cluster analysis method has a slight preference for actually classified cases over controls.
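The two optimization criteria can be sketched as a two-stage procedure: clusters are first derived from the cases alone, and each control is then assigned to the nearest case-derived cluster or left unmatched. This is only a rough illustration under stated assumptions. The single-linkage grouping, the centroid rule and the `radius` parameter are hypothetical choices, not the algorithm implemented in Master.GEN, and the two-dimensional points stand in for the full feature vectors.

```python
from math import dist


def cluster_cases(cases, radius):
    """Single-linkage grouping: cases within `radius` of each other share a label."""
    labels = [None] * len(cases)
    next_label = 0
    for start in range(len(cases)):
        if labels[start] is not None:
            continue
        labels[start] = next_label
        stack = [start]
        while stack:
            current = stack.pop()
            for other in range(len(cases)):
                if labels[other] is None and dist(cases[current], cases[other]) <= radius:
                    labels[other] = next_label
                    stack.append(other)
        next_label += 1
    return labels


def centroids(cases, labels):
    """Mean position of each case-derived cluster."""
    groups = {}
    for point, label in zip(cases, labels):
        groups.setdefault(label, []).append(point)
    return {
        label: tuple(sum(coord) / len(points) for coord in zip(*points))
        for label, points in groups.items()
    }


def assign_controls(controls, cluster_centers, radius):
    """Attach each control to the nearest case cluster, or None if none is close enough."""
    assigned = []
    for point in controls:
        label, distance = min(
            ((lab, dist(point, center)) for lab, center in cluster_centers.items()),
            key=lambda pair: pair[1],
        )
        assigned.append(label if distance <= radius else None)
    return assigned


# Made-up coordinates for illustration only.
cases = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11)]
controls = [(0.5, 0.5), (10, 10), (50, 50)]
case_labels = cluster_cases(cases, radius=1.5)
control_labels = assign_controls(controls, centroids(cases, case_labels), radius=1.5)
```

Because the clusters are defined by the cases only, a control that lies far from every case cluster receives no label, which corresponds to controls that match no patient group.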

Evaluation of linkage disequilibrium between UCSNP-43, -19, -63

We estimated haplotype frequencies for the three loci separately for the total case and control groups using SNPHAP v1.0. In addition, we carried out haplotype estimation separately for each subgroup within cases and controls for the stratified association analyses. We estimated linkage disequilibrium (LD) for each SNP pair separately for cases and controls, using Lewontin's D′ and the squared correlation coefficient Δ2 as measures of LD and the χ2 test for the P-value.28 We used the statistical package R for testing Hardy–Weinberg equilibrium (HWE) and for the evaluation of LD.
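For two biallelic loci, both LD measures follow directly from the estimated haplotype and allele frequencies. The sketch below uses the standard textbook definitions and is not the R code used in the study; the function name, the input frequencies and the chromosome count are illustrative assumptions. The squared correlation coefficient (Δ2 in the text) is returned as `r2`, and the chi-square statistic is taken as the number of chromosomes times `r2` with one degree of freedom.

```python
from math import erfc, sqrt


def ld_measures(p_11, p_a, p_b, n_chromosomes):
    """Return (|D'|, r2, chi2, p_value) for two polymorphic biallelic loci.

    p_11: frequency of the haplotype carrying allele 1 at both loci
    p_a, p_b: allele-1 frequencies at the first and second locus (both strictly between 0 and 1)
    """
    d = p_11 - p_a * p_b
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = abs(d) / d_max
    r2 = d * d / (p_a * (1 - p_a) * p_b * (1 - p_b))
    chi2 = n_chromosomes * r2
    p_value = erfc(sqrt(chi2 / 2))  # chi-square survival function, 1 df
    return d_prime, r2, chi2, p_value


# Illustrative frequencies, not taken from the study data.
d_prime, r2, chi2, p_value = ld_measures(0.3, 0.6, 0.4, n_chromosomes=200)
```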

Analyses of association for UCSNP-43, -19, -63

Allele frequencies for each SNP in controls and cases were computed by allele counting. As in Horikawa et al,11 allele 1 denotes the G allele for UCSNP-43, the C allele at UCSNP-63, and consists of two copies of the 32-bp repetitive sequence at UCSNP-19. We used a significance level of 5% for the initial tests of association. In addition, we evaluated whether an association remained significant after Bonferroni correction for multiple comparisons. Two test statistics were used to test for association between UCSNP-43, -19, -63 and T2DM: (1) the general χ2 test with two degrees of freedom (df) comparing genotypes of cases and controls, and (2) a trend test with one df comparing genotypes in a multiplicative allelic relative risk model. For the stratified analysis a modified Cochran–Mantel–Haenszel test statistic29 was applied, which sums up the relative genotype frequency differences for all subgroups without requiring additional df. This method allows the estimation of a common OR across different subgroups. For small P-values, ORs and 95%CI were calculated. Exact CI for the ORs are shown for small subgroups. For all association analyses, the statistical package SAS was used.
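The single-SNP tests can be sketched in a few lines. The code below is an illustration with made-up genotype counts, not the SAS procedures used in the study. It assumes the Cochran–Armitage form of the trend test with scores 0, 1, 2 and uses the standard Mantel–Haenszel estimator of a common OR as a stand-in for the modified Cochran–Mantel–Haenszel statistic,29 whose exact form is not reproduced here. All function names are hypothetical.

```python
from math import erfc, exp, sqrt


def genotype_chi2(cases, controls):
    """General 2-df test on genotype counts (11, 12, 22); assumes no empty genotype class."""
    case_total, control_total = sum(cases), sum(controls)
    grand_total = case_total + control_total
    chi2 = 0.0
    for n_case, n_control in zip(cases, controls):
        column = n_case + n_control
        for observed, row_total in ((n_case, case_total), (n_control, control_total)):
            expected = row_total * column / grand_total
            chi2 += (observed - expected) ** 2 / expected
    return chi2, exp(-chi2 / 2)  # chi-square survival function, 2 df


def trend_test(cases, controls, scores=(0, 1, 2)):
    """Cochran-Armitage trend test with 1 df on the same genotype counts."""
    case_total, control_total = sum(cases), sum(controls)
    grand_total = case_total + control_total
    columns = [r + s for r, s in zip(cases, controls)]
    sum_case = sum(w * r for w, r in zip(scores, cases))
    sum_all = sum(w * n for w, n in zip(scores, columns))
    sum_sq = sum(w * w * n for w, n in zip(scores, columns))
    numerator = grand_total * (grand_total * sum_case - case_total * sum_all) ** 2
    denominator = case_total * control_total * (grand_total * sum_sq - sum_all ** 2)
    chi2 = numerator / denominator
    return chi2, erfc(sqrt(chi2 / 2))  # chi-square survival function, 1 df


def mantel_haenszel_or(strata):
    """Common OR across strata of allele counts (case_a2, case_a1, control_a2, control_a1)."""
    numerator = sum(a * d / (a + b + c + d) for a, b, c, d in strata)
    denominator = sum(b * c / (a + b + c + d) for a, b, c, d in strata)
    return numerator / denominator


# Made-up counts for illustration only.
chi2_general, p_general = genotype_chi2((10, 20, 30), (30, 20, 10))
chi2_trend, p_trend = trend_test((10, 20, 30), (30, 20, 10))
common_or = mantel_haenszel_or([(10, 90, 20, 80), (5, 45, 10, 40)])
```

The trend test spends a single degree of freedom on a dose-response pattern across genotypes, whereas the general test makes no assumption about the mode of inheritance.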

Haplotype-based analysis of association using SAS

We calculated the estimated number of haplotypes per group for fully genotyped individuals by multiplying the estimated relative frequencies by twice the number of fully genotyped individuals. The rare haplotypes 122, 211, 212, 222 were pooled. We then carried out a global χ2 test with 4 df comparing the haplotype distribution between cases and controls. For the stratified analysis we applied a modified Cochran–Mantel–Haenszel test statistic as the global test statistic, as in the association analyses for individual SNPs. The P-value corresponding to each haplotype shows its importance with respect to the global test statistic. It was calculated by decomposing the global χ2 test statistic into the contributions of the individual haplotypes. For small P-values, we calculated the ORs comparing one haplotype versus all other haplotypes. We calculated the exact CI for the ORs for small subgroups. Frequencies for haplogenotypes containing haplotypes with a significant association with T2DM were calculated as well. Only individuals whose haplogenotype could be determined with a probability of more than 90% were included in this analysis.
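The counting and pooling steps can be sketched as follows. The haplotype frequencies and group sizes are invented for illustration and are not the values in Table 6. The per-haplotype contributions are simply the summands of the global chi-square statistic, and how the study converted them into per-haplotype P-values is not reproduced here. The function names are hypothetical.

```python
from math import exp

COMMON = ("111", "112", "121", "221")  # kept separate; all other haplotypes are pooled


def estimated_counts(frequencies, n_individuals):
    """Estimated haplotype counts: relative frequency x 2 x fully genotyped individuals."""
    counts = dict.fromkeys(COMMON + ("pooled",), 0.0)
    for haplotype, frequency in frequencies.items():
        key = haplotype if haplotype in COMMON else "pooled"
        counts[key] += frequency * 2 * n_individuals
    return counts


def global_chi2(case_counts, control_counts):
    """Global test over five categories (4 df); assumes every category is non-empty."""
    case_total = sum(case_counts.values())
    control_total = sum(control_counts.values())
    grand_total = case_total + control_total
    contributions = {}
    for key in case_counts:
        column = case_counts[key] + control_counts[key]
        expected_case = case_total * column / grand_total
        expected_control = control_total * column / grand_total
        contributions[key] = (
            (case_counts[key] - expected_case) ** 2 / expected_case
            + (control_counts[key] - expected_control) ** 2 / expected_control
        )
    chi2 = sum(contributions.values())
    p_value = exp(-chi2 / 2) * (1 + chi2 / 2)  # chi-square survival function, 4 df only
    return chi2, p_value, contributions


# Invented frequencies for 100 cases and 50 controls.
case_freqs = {"111": 0.40, "112": 0.10, "121": 0.30, "221": 0.15,
              "122": 0.02, "211": 0.01, "212": 0.01, "222": 0.01}
control_freqs = {"111": 0.40, "112": 0.20, "121": 0.25, "221": 0.10, "122": 0.05}
case_counts = estimated_counts(case_freqs, 100)
control_counts = estimated_counts(control_freqs, 50)
chi2, p_value, contributions = global_chi2(case_counts, control_counts)
```

Because the counts are derived from estimated frequencies, they need not be integers.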

Results

Population substructure analyses

The tested sample consisted of 826 subjects (612 cases, 214 controls). Population substructure analyses revealed four subgroups (one large, three small groups) in our sample. In a stratified analysis, we had to exclude 87 controls of subgroup 4 because they did not match any patient group. The sample sizes of the subgroups 2 and 3 were very small so that the results of those groups had to be viewed with caution. The 547 cases and 101 controls of group 1 formed a genetically homogenous population (Table 4). We detected no significant deviation from HWE for SNPs in either the entire sample or any of the identified subgroups of cases and controls. All three markers were in LD as expected.

Table 4 Numbers of cases and controls in the subgroups (number of fully genotyped subjects)

Analyses of association for UCSNP-43, -19, -63

In the combined sample, we did not find evidence for association between T2DM with end-stage diabetic nephropathy and UCSNP-43/-19. In the nonstratified analysis, UCSNP-63 did not show an association with diabetes either. However, when we analyzed the data stratified by the three subgroups, we found a significant association between UCSNP-63 and T2DM (P=0.031) after disregarding the 87 unmatched controls of subgroup 4. The observed association in the entire stratified sample resulted from subgroup 1 (P=0.002). In subgroup 1, the rare allele 2 was more frequent in controls than in cases. When we used a Bonferroni correction for the χ2 test appropriate for testing three independent polymorphisms, that is, a significance level of α=0.05/3=0.017, the association tested across all subgroups in the stratified sample failed to reach significance. Since the polymorphisms are in strong LD with each other, this correction probably leads to a very conservative threshold. In subgroup 1, however, the association remained significant even after additional correction for the analysis in three independent subgroups with a Bonferroni-corrected significance level of α=0.05/9=0.006 (Table 5).
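The Bonferroni arithmetic behind these statements can be checked with a short sketch. The P-values and numbers of tests are those quoted in the paragraph above; the function names are illustrative.

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test significance level after Bonferroni correction."""
    return alpha / n_tests


def is_significant(p_value, alpha, n_tests):
    return p_value < bonferroni_threshold(alpha, n_tests)


# Stratified test across subgroups: three polymorphisms.
across_subgroups = is_significant(0.031, alpha=0.05, n_tests=3)

# Subgroup 1: three polymorphisms x three subgroups = nine tests.
subgroup_1 = is_significant(0.002, alpha=0.05, n_tests=9)

print(round(bonferroni_threshold(0.05, 3), 3), across_subgroups)
print(round(bonferroni_threshold(0.05, 9), 3), subgroup_1)
```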

Table 5 Allele frequencies and tests for association between UCSNP -43, -19, -63 and T2DM

Haplotype-based analysis of association

Four of the eight possible haplotypes were observed at common frequencies in the total data set (111, 112, 121, 221; Table 6). The global χ2 test in the total nonstratified sample and across subgroups did not reveal any haplotype-based evidence for association. We found a significant global P-value across haplotypes (P=0.035) only in subgroup 1. As mentioned before, we could not observe an overall association across groups in the stratified analysis since the two other subgroups showed different haplotype distribution patterns between cases and controls. The association in subgroup 1 did not remain significant when adjusting for multiple testing in three independent subgroups, since the P-value was larger than the Bonferroni-corrected significance level of α=0.05/3=0.017. Further analysis showed that the highest contribution to the global P-value in subgroup 1 resulted from haplotype 112 (P=0.006), which was the main cause of the observed association for UCSNP-63 described in the previous paragraph. Haplotype 112 was the only haplotype with a frequency higher than 1% in the population having allele 2 at UCSNP-63. In our analysis, haplotype 112 was associated with a lower risk of T2DM because it occurred at higher frequency in the control group. We also compared the frequencies of the different haplogenotypes containing haplotype 112. The analysis showed that the observed association between haplotype 112 and diabetes was mainly caused by individuals carrying the haplotype combination 112/121 (Table 7), which was observed more often in controls than in cases.

Table 6 Haplotype frequencies and tests for association between the haplotypes and T2DM
Table 7 Haplogenotype frequencies for haplogenotypes containing haplotype 112

Discussion

The formidable problems of detecting association in complex diseases such as T2DM lie in the significant reduction of power that is associated with the etiological complexity (clinical, genetic, ethnic heterogeneity; polygenic character; gene–environmental interactions).30, 31 Since there is no reliable test available to differentiate distinct subforms of diabetes, it is not surprising that association studies yield conflicting results. Part of the explanation might be potentially hidden population substructure in addition to very heterogeneous phenotypes. Several authors have shown that sampling strategies need to take account of population substructure, even in the Icelandic genetic isolate.9

No study on the role of the described CAPN10 polymorphisms in the development of T2DM has performed a systematic test for genetic diversity in the tested samples from different populations. As in previous studies, our sampling strategy for diabetic nephropathy also relied exclusively on a questionnaire for a ‘valid’ ancestry of study participants. Although we too had recruited apparently relatively homogeneous German groups of cases and controls, we nevertheless detected three distinct subgroups in our cases and controls, with obviously differing genetic ancestry, using a genetic vector space method with 20 microsatellite markers. In addition, 87 controls had to be removed since they did not match any case from our sample. When we analyzed the entire sample disregarding population substructure, we did not detect association between end-stage diabetic nephropathy requiring hemodialysis and the three individual calpain-10 polymorphisms, including possible haplotypes and haplogenotypes. When we grouped all individuals by their population substructure we found a significant association of the common allele 1 at UCSNP-63 with diabetes (P=0.005) in the largest subgroup 1 (547 cases, 101 controls). Even after a very conservative correction for multiple testing (Bonferroni), the calculated P-value remained significant (corrected α=0.006). We were aware of the potential loss of statistical power by subclassification. However, the ‘effective’ statistical power may have been increased by the use of homogeneous subpopulations. Further, the cross comparison between groups enabled the distinction between ‘population-independent’ (the same signal shows up in all subpopulations) and ‘population-related’ (the signal shows up in a compact subpopulation while failing to be detected in the population as a whole) vulnerability factors. This approach generalizes standard genomic control methods that follow a probability-oriented approach in order to test for population stratification.

The direction of the association was somewhat unexpected: the rare allele 2 was more frequent in controls than in cases and decreased the risk for the development of T2DM with end-stage diabetic nephropathy. This difference could not be observed in groups 2 and 3. The stratified test statistic also supported the association (P=0.031) but did not remain significant after Bonferroni correction. This might be explained by the low power of the Cochran–Mantel–Haenszel test statistic for detecting an association if the effect is heterogeneous across the subgroups. We found further that the haplogenotype 112/121 was observed more often in controls than in cases. Our findings indicated a protective function of haplogenotype 112/121 against the development of diabetes with end-stage diabetic nephropathy.

Other studies support the functional impact of our results: Shima et al32 found that the haplogenotype 112/121 was associated with a lower body mass index (BMI) and a lower HbA1c (P=0.016, P=0.008). The same findings were replicated by Ehrmann et al33 in African-American subjects with a specific T2DM phenotype, the polycystic ovary syndrome (PCOS). In terms of common polygenic T2DM, it makes sense that individuals at risk for the disease demonstrate a higher BMI compared to non-diabetic controls. However, these studies, like our own, contradict the results of Horikawa et al, who found that the haplotype combination 112/121 increased the risk for diabetes in a Mexican-American population. A British study supports the findings of Horikawa et al. Subjects with the 112/121 haplotype combination (n=29) had increased fasting (P=0.004) and 2-h plasma glucose levels (P=0.003) compared with the remaining group of subjects having all other haplogenotypes. The 112/121 haplotype combination was also associated with a marked decrease in the insulin secretory response, adjusted for the level of insulin resistance (P=0.002).34 Conflicting results in different populations and even within the same population may indicate a different genetic background for the trait, thus explaining contradictory findings. On the other hand, there is a chance that different subphenotypes with another genetic background were studied. This may have been the case in a meta-analysis reported by Weedon et al12 involving four different Japanese populations. The results ranged from evidence for association to evidence against it. There is clearly a problem: a stronger focus on genetic background stratification is required, as supported by our findings.

We found association in a subsample but not in the undivided sample. A possible interpretation is (1) that the primary genetic mechanism under investigation (CAPN10) may be ethnicity-specific rather than ethnicity-independent in terms of ‘biological ethnicity’, and (2) that this ‘biological ethnicity’ can be quantified through a set of polymorphic microsatellites as demonstrated, for example, by Di Rienzo et al35 for African, Egyptian and Sardinian populations using only 10 microsatellites.

It could be argued that our study results were just a matter of coincidence. However, even a very conservative Bonferroni correction of the significance level for multiple testing did not change the significance. There was also no deviation from HWE in any subgroup, which supports our procedure. Further, it could be argued that our control group was younger than the cases. However, given that association was still detected even though 5–10% of the control subjects will develop diabetes sometime in the future, one would expect an even greater effect in a sample well matched for age and sex.

One might argue that the use of microsatellite markers for a test such as we performed could limit the applicability of genetic vector space methods. In fact, in the age of SNPs it would be ideal to have hundreds of thousands of SNPs with which we could establish a much finer population (or individual) differentiation – yet at the cost of additional complexity as a relatively large number of SNPs is necessary just to get the same information content inherent in one single microsatellite.

Since there is little chance of distinguishing subtle phenotypic differences, we propose tests for genetic homogeneity in the study sample along with the use of advanced phenotyping strategies. In this way, we might be able to enhance the chances for identifying both genetic and nongenetic factors contributing to the disease. The identification of population substructures – or in other words, the identification of genetically similar clusters of individuals – should sharpen up the results of association studies.