Introduction

Case–control analysis using singletons has been shown to be generally more powerful per genotyped individual than any other design for detecting genes associated with disease.1 However, this design is vulnerable to excessive false-positive findings in the presence of population stratification.2, 3, 4, 5, 6, 7 If a study population comprises two subpopulations and the disease prevalence is greater in one subpopulation than in the other, then cases will typically be over-sampled from the subpopulation with higher disease prevalence. Spurious associations will then be observed between the disease and any genetic marker whose allele frequencies differ between the two subpopulations.

The Transmission Disequilibrium Test (TDT) proposed by Spielman et al8 compares the rates at which each allele is transmitted from heterozygous parents to affected offspring. The TDT maintains the desired Type I error rate in the presence of population stratification because the non-transmitted alleles from heterozygous parents serve, in effect, as the control population. Consequently, the power of the TDT depends heavily on the number of heterozygous (informative) parents, who may or may not transmit the allele of interest. It is well appreciated that population stratification results in a loss of heterozygosity compared with that expected assuming Hardy–Weinberg Equilibrium (HWE). This is known as the Wahlund Effect, and it leaves fewer informative parents for the TDT. However, no guidelines currently exist for sample size calculations for the TDT in the presence of population stratification.

In this paper, we describe a method for estimating the frequencies of the two-parent genotype patterns (mating types) seen in the presence of population stratification, using the parental allele frequencies and Wright's coefficient of inbreeding, F. The estimated mating type frequencies are then used for power calculations with the method provided by Knapp.9 We then compare the sample sizes calculated assuming HWE with those calculated using our new method and show how these differences arise.

Methods

Notation and terminology

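Population stratification occurs when a population comprises two or more subpopulations, with random mating and HWE within subpopulations but no mating between subpopulations. Assume that there are G separate subpopulations, where G and the membership of each subpopulation are unknown, and let $w_i$ be the proportion of the stratified population represented by subpopulation i. Consider a single biallelic marker or single nucleotide polymorphism (SNP) with alleles A and B. If $p_i$ is the frequency of allele A in subpopulation i, then

$$\bar{p} = \sum_{i=1}^{G} w_i p_i$$

is the allele A frequency (averaged over all subpopulations) in the overall population. Let $q_i = 1 - p_i$ be the frequency of the B allele in subpopulation i, so that the average frequency of allele B in the overall population is

$$\bar{q} = 1 - \bar{p} = \sum_{i=1}^{G} w_i q_i.$$

The variance of the A allele frequency between subpopulations is defined as

$$\sigma_p^2 = \sum_{i=1}^{G} w_i (p_i - \bar{p})^2.$$

As we assume random mating and HWE within subpopulations, the frequencies of the AA, AB and BB genotypes in subpopulation i are $p_i^2$, $2p_iq_i$ and $q_i^2$, respectively. Let the proportions of individuals in the overall stratified population with genotypes AA, AB and BB be $\mu_{AA}$, $\mu_{AB}$ and $\mu_{BB}$, respectively, so that

$$\mu_{AA} = \sum_{i=1}^{G} w_i p_i^2 = \bar{p}^2 + \sigma_p^2, \qquad \mu_{AB} = \sum_{i=1}^{G} 2 w_i p_i q_i = 2\bar{p}\bar{q} - 2\sigma_p^2, \qquad \mu_{BB} = \sum_{i=1}^{G} w_i q_i^2 = \bar{q}^2 + \sigma_p^2.$$

The mating type is defined as the two-parent genotype combination. We assume symmetry between the mating types (ie AA × AB = AB × AA, etc.), so that instead of nine ordered mating types there are only six. Let $m_1, \ldots, m_6$ denote the frequencies of the mating types AA × AA, AA × AB, AA × BB, AB × AB, AB × BB and BB × BB, respectively. Because both parents come from the same subpopulation,

$$m_1 = \sum_{i=1}^{G} w_i p_i^4, \quad m_2 = \sum_{i=1}^{G} 4 w_i p_i^3 q_i, \quad m_3 = \sum_{i=1}^{G} 2 w_i p_i^2 q_i^2, \quad m_4 = \sum_{i=1}^{G} 4 w_i p_i^2 q_i^2, \quad m_5 = \sum_{i=1}^{G} 4 w_i p_i q_i^3, \quad m_6 = \sum_{i=1}^{G} w_i q_i^4.$$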
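The pooled genotype and mating type frequencies defined above can be checked numerically. The following is a minimal Python sketch, assuming the mating types are indexed m1 to m6 in the order AA × AA, AA × AB, AA × BB, AB × AB, AB × BB, BB × BB (consistent with the later labelling of AA × AB, AB × AB and AB × BB as m2, m4 and m5). The function name `pooled_frequencies` and the example subpopulation values are hypothetical, chosen only for illustration.

```python
from typing import Dict, List, Sequence, Tuple


def pooled_frequencies(
    weights: Sequence[float], p: Sequence[float]
) -> Tuple[Dict[str, float], List[float]]:
    """Pool genotype and mating type frequencies over G subpopulations.

    Assumes random mating and HWE within each subpopulation and no mating
    between subpopulations, so both parents come from the same subpopulation.
    """
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("subpopulation weights must sum to 1")
    mu = {"AA": 0.0, "AB": 0.0, "BB": 0.0}
    m = [0.0] * 6  # m1..m6: AAxAA, AAxAB, AAxBB, ABxAB, ABxBB, BBxBB
    for w, pi in zip(weights, p):
        qi = 1.0 - pi
        mu["AA"] += w * pi**2
        mu["AB"] += w * 2 * pi * qi
        mu["BB"] += w * qi**2
        # Unordered parental pairs: asymmetric mating types get a factor of 2.
        m[0] += w * pi**4
        m[1] += w * 2 * pi**2 * (2 * pi * qi)
        m[2] += w * 2 * pi**2 * qi**2
        m[3] += w * (2 * pi * qi) ** 2
        m[4] += w * 2 * (2 * pi * qi) * qi**2
        m[5] += w * qi**4
    return mu, m


# Hypothetical example: two equal-sized subpopulations with p1 = 0.2, p2 = 0.6.
mu, m = pooled_frequencies([0.5, 0.5], [0.2, 0.6])
```

In this example the average allele frequency is 0.4 and the between-subpopulation variance is 0.04, so the pooled heterozygote frequency is 2(0.4)(0.6) − 2(0.04) = 0.40 rather than the HWE value of 0.48.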

We define ψ2 and ψ1 to be the genotypic relative risks (GRRs) of the risk allele (allele A) homozygote and heterozygote, respectively, compared with the BB homozygote. The GRR ψi is defined as the ratio of the probability of disease in subjects with i risk alleles to that in subjects with 0 risk alleles. We assume that there is no segregation distortion and that the offspring genotype determines the offspring's disease risk. We also assume that the GRRs are the same in each subpopulation (absence of risk heterogeneity); that is, the risk conferred by a given allele does not vary by subpopulation. Therefore, differences in disease prevalence between subpopulations arise only from differences in the risk allele frequency between the subpopulations. For the sample size calculations in this paper, the following modes of inheritance (MOI) were considered for comparison with Knapp: the multiplicative model (ψ1 = γ and ψ2 = γ²), additive model (ψ1 = γ and ψ2 = 2γ), dominant model (ψ1 = ψ2 = γ) and recessive model (ψ1 = 1 and ψ2 = γ), where γ ≥ 1 for all models.9 We note that the multiplicative and additive MOI definitions were chosen to correspond to those defined previously by Camp.10
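For concreteness, the four modes of inheritance can be encoded as a lookup from the model name and γ to (ψ1, ψ2). This is a small illustrative Python sketch; the function name `genotypic_relative_risks` is hypothetical, and the additive model follows the definition used in this paper (ψ2 = 2γ).

```python
from typing import Tuple


def genotypic_relative_risks(model: str, gamma: float) -> Tuple[float, float]:
    """Return (psi1, psi2) for a mode of inheritance, as defined in the text."""
    if gamma < 1:
        raise ValueError("gamma must be at least 1")
    models = {
        "multiplicative": (gamma, gamma**2),
        "additive": (gamma, 2 * gamma),  # additive as defined in this paper
        "dominant": (gamma, gamma),
        "recessive": (1.0, gamma),
    }
    if model not in models:
        raise ValueError(f"unknown mode of inheritance: {model!r}")
    return models[model]


for name in ("multiplicative", "additive", "dominant", "recessive"):
    print(name, genotypic_relative_risks(name, 1.5))
```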

Population stratification results in a decrease in heterozygosity compared to that expected assuming HWE

If a population is divided into distinct subpopulations with random mating within each subpopulation, the decrease in the proportion of heterozygotes, compared with that expected under random mating of the whole population, is proportional to the variance of the allele frequency between subpopulations. This phenomenon is known as the Wahlund Effect.11 Wright's coefficient of inbreeding, F, is commonly used to describe population stratification, where

$$F = \frac{\sigma_p^2}{\bar{p}\bar{q}},$$

and $\sigma_p^2$ is the variance of the allele frequency between subpopulations,

$$\sigma_p^2 = \sum_{i=1}^{G} w_i (p_i - \bar{p})^2.$$

Because $\mu_{AB} = 2\bar{p}\bar{q} - 2\sigma_p^2 = 2\bar{p}\bar{q}(1 - F)$, the heterozygote frequency in a stratified population is exactly 100(1 − F)% of that expected assuming HWE; that is, population stratification results in a loss of heterozygosity of 100F%.
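The Wahlund relation can be confirmed for any set of subpopulations. The Python sketch below computes F from the subpopulation weights and allele frequencies and compares the pooled heterozygosity with the HWE expectation; the function name `wahlund_check` and the 70/30 example are hypothetical.

```python
from typing import Sequence, Tuple


def wahlund_check(weights: Sequence[float], p: Sequence[float]) -> Tuple[float, float, float]:
    """Return (F, heterozygosity expected under HWE, pooled heterozygosity)."""
    p_bar = sum(w * pi for w, pi in zip(weights, p))
    q_bar = 1.0 - p_bar
    var_p = sum(w * (pi - p_bar) ** 2 for w, pi in zip(weights, p))  # sigma_p^2
    F = var_p / (p_bar * q_bar)  # Wright's coefficient of inbreeding
    het_hwe = 2 * p_bar * q_bar  # if the pooled population were in HWE
    het_pooled = sum(w * 2 * pi * (1 - pi) for w, pi in zip(weights, p))
    return F, het_hwe, het_pooled


# Hypothetical 70/30 mixture of two subpopulations.
F, het_hwe, het_pooled = wahlund_check([0.7, 0.3], [0.2, 0.5])
print(f"F = {F:.4f}; observed/expected heterozygosity = {het_pooled / het_hwe:.4f}")
```

The printed ratio equals 1 − F, whatever weights and allele frequencies are supplied.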

F values expected in practice

It is difficult to know a priori how much population stratification may exist in practice. Cavalli-Sforza12 estimated that the worldwide F value averaged over all genes in the genome is approximately 0.139 ± 0.010. A study by Akey et al13 analyzed 26 530 SNPs in three populations (African-American, East Asian and European-American). The F value for each locus was calculated and the distribution of these F values was noted. Of the 25 549 autosomal SNPs genotyped, 29.8% were common in all three populations, 26.8% were common in only two populations, 22.9% were unique to only one population and 5.1% were uncommon. Here, an SNP was defined as common if its minor allele frequency was greater than 20%. The average F values obtained in the coding, intronic and non-coding regions were 0.107, 0.118 and 0.123, respectively. The distribution of F values was skewed to the right, which suggests that F values in the range 0–0.15 are feasible and should be considered in practice. We note that the F values calculated by Akey et al13 assume that equal proportions of each subpopulation are present in the overall stratified population, whereas we allow for differing proportions of each subpopulation. Akey's approach is best suited to calculating the genetic distance between subpopulations, whereas our approach is geared toward detecting the change in genotype frequencies in the presence of stratification.

The effect of population stratification on mating type frequencies

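Let $\Delta_i$ be the difference between the allele A frequency in subpopulation i and the allele A frequency in the entire population, where $\Delta_i = p_i - \bar{p}$, for i = 1, 2, …, G. Furthermore, define the central moments of the allele frequency distribution,

$$\vartheta_j = \sum_{i=1}^{G} w_i \Delta_i^{\,j}, \qquad j = 1, 2, 3, 4.$$

Therefore,

$$\vartheta_1 = 0, \qquad \vartheta_2 = \sum_{i=1}^{G} w_i (p_i - \bar{p})^2 = F\bar{p}\bar{q}.$$

Then, for the AA × AA mating type, substituting $p_i = \bar{p} + \Delta_i$ gives

$$m_1 = \sum_{i=1}^{G} w_i p_i^4 = \sum_{i=1}^{G} w_i (\bar{p} + \Delta_i)^4 = \bar{p}^4 + 6\bar{p}^2\vartheta_2 + 4\bar{p}\vartheta_3 + \vartheta_4.$$

This yields

$$m_1 = \bar{p}^4 + 6F\bar{p}^3\bar{q} + 4\bar{p}\vartheta_3 + \vartheta_4.$$

Similarly, for the AB × AB mating type, substituting $p_i q_i = \bar{p}\bar{q} + (\bar{q} - \bar{p})\Delta_i - \Delta_i^2$ gives

$$m_4 = 4\sum_{i=1}^{G} w_i p_i^2 q_i^2 = 4\left[\bar{p}^2\bar{q}^2 + (1 - 6\bar{p}\bar{q})\vartheta_2 - 2(\bar{q} - \bar{p})\vartheta_3 + \vartheta_4\right].$$

This yields

$$m_4 = 4\left[\bar{p}^2\bar{q}^2 + F\bar{p}\bar{q}(1 - 6\bar{p}\bar{q}) - 2(\bar{q} - \bar{p})\vartheta_3 + \vartheta_4\right].$$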

All six mating types can now be rewritten in terms of the average risk allele frequency p̄, Wright's coefficient of inbreeding F, ϑ3 and ϑ4, as shown in Table 1.
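The moment expansion can be verified numerically. The Python sketch below writes out all six mating type frequencies obtained by expanding p_i = p̄ + Δ_i under the assumptions above. It is a worked illustration of that expansion, not a transcription of Table 1, and it checks the result against the direct subpopulation sums. The function names `central_moments` and `mating_types_from_moments` and the three-subpopulation example are hypothetical.

```python
from typing import List, Sequence, Tuple


def central_moments(weights: Sequence[float], p: Sequence[float]) -> Tuple[float, List[float]]:
    """Return p_bar and the weighted central moments [theta_1, ..., theta_4]."""
    p_bar = sum(w * pi for w, pi in zip(weights, p))
    theta = [sum(w * (pi - p_bar) ** j for w, pi in zip(weights, p)) for j in range(1, 5)]
    return p_bar, theta


def mating_types_from_moments(p_bar: float, F: float, th3: float, th4: float) -> List[float]:
    """m1..m6 (AAxAA, AAxAB, AAxBB, ABxAB, ABxBB, BBxBB) from p_bar, F, theta_3, theta_4."""
    q_bar = 1.0 - p_bar
    th2 = F * p_bar * q_bar  # theta_2 is the between-subpopulation variance
    # Weighted mean of p_i^2 q_i^2, shared by AAxBB and ABxAB.
    e_p2q2 = p_bar**2 * q_bar**2 + (1 - 6 * p_bar * q_bar) * th2 - 2 * (q_bar - p_bar) * th3 + th4
    return [
        p_bar**4 + 6 * p_bar**2 * th2 + 4 * p_bar * th3 + th4,
        4 * (p_bar**3 * q_bar + 3 * p_bar * (q_bar - p_bar) * th2 + (1 - 4 * p_bar) * th3 - th4),
        2 * e_p2q2,
        4 * e_p2q2,
        4 * (p_bar * q_bar**3 + 3 * q_bar * (p_bar - q_bar) * th2 - (1 - 4 * q_bar) * th3 - th4),
        q_bar**4 + 6 * q_bar**2 * th2 - 4 * q_bar * th3 + th4,
    ]


# Hypothetical three-subpopulation example.
weights, p = [0.2, 0.5, 0.3], [0.1, 0.35, 0.7]
p_bar, theta = central_moments(weights, p)
F = theta[1] / (p_bar * (1 - p_bar))
m_moments = mating_types_from_moments(p_bar, F, theta[2], theta[3])
m_direct = [
    sum(w * f(x) for w, x in zip(weights, p))
    for f in (
        lambda x: x**4,
        lambda x: 4 * x**3 * (1 - x),
        lambda x: 2 * x**2 * (1 - x) ** 2,
        lambda x: 4 * x**2 * (1 - x) ** 2,
        lambda x: 4 * x * (1 - x) ** 3,
        lambda x: (1 - x) ** 4,
    )
]
```

Setting F = ϑ3 = ϑ4 = 0 recovers the HWE mating type frequencies, which is a quick sanity check on any transcription of these expressions.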

Table 1 Mating type frequencies in the presence of population stratification parameterized in terms of the central moments of the allele frequency distribution

Change in the relative proportion of informative families because of population stratification

Three of the six mating types are informative for the TDT because they contain at least one heterozygous parent: AA × AB, AB × AB and AB × BB (m2, m4 and m5). However, as shown above, Wright's coefficient of inbreeding, F, is insufficient to calculate all mating type frequencies without ϑ3 and ϑ4. Under HWE, the relative proportions of the informative mating types m2:m4:m5 are p̄² : p̄(1 − p̄) : (1 − p̄)²; however, these proportions no longer hold in the presence of population stratification. To illustrate the changes in the relative proportions of the mating types, we consider a study population comprising two equal-sized subpopulations, where F = 0 and 0.10, ϑ3 = 0 and ϑ4 is allowed to vary, for a multiplicative MOI with γ = 2 (only p̄, F and ϑ3 are required to completely describe a two-subpopulation model). The changes in the relative proportions of the informative mating types AA × AB, AB × AB and AB × BB are shown in Figure 1.
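The shift away from the HWE ratio can be illustrated with a short calculation. The Python sketch below assumes two equal-sized subpopulations with allele A frequencies p̄ − d and p̄ + d, where d² = F p̄ q̄ (so ϑ3 = 0). It computes the parental mating type proportions in the population, not conditioned on an affected child, so it is not intended to reproduce Figure 1. The function name `two_subpop_informative_proportions` is hypothetical.

```python
import math
from typing import Tuple


def two_subpop_informative_proportions(p_bar: float, F: float) -> Tuple[float, float, float]:
    """Relative proportions of AAxAB, ABxAB and ABxBB (m2:m4:m5), normalised to sum to 1."""
    q_bar = 1.0 - p_bar
    d = math.sqrt(F * p_bar * q_bar)  # d**2 equals the between-subpopulation variance
    subpops = (p_bar - d, p_bar + d)
    if not all(0.0 <= x <= 1.0 for x in subpops):
        raise ValueError("F is too large for this average allele frequency")
    m2 = sum(0.5 * 4 * x**3 * (1 - x) for x in subpops)
    m4 = sum(0.5 * 4 * x**2 * (1 - x) ** 2 for x in subpops)
    m5 = sum(0.5 * 4 * x * (1 - x) ** 3 for x in subpops)
    total = m2 + m4 + m5
    return m2 / total, m4 / total, m5 / total


print(two_subpop_informative_proportions(0.3, 0.0))   # HWE: p^2 : pq : q^2
print(two_subpop_informative_proportions(0.3, 0.10))  # stratified
```

With F = 0 the output is the HWE ratio p̄² : p̄q̄ : q̄² rescaled to sum to 1; with F = 0.10 the share of AA × AB families rises.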

Figure 1

Relative proportion of the informative mating types for a multiplicative MOI where γ=2.

The power of the TDT and sample size calculations in the presence of population stratification

The TDT tests the null hypothesis of no association between a marker and disease in the presence of linkage. Knapp provided a method for reliably calculating the power of the TDT for affected child trios (ACTs) in a homogeneous population assuming HWE.9 We retain his symbols for ease of comparison with our method. At the heart of Knapp's method is the characterization of family types (the genotypes of both parents and the affected offspring) for the TDT and the calculation of each family type probability (s1, s2,…, s10), as shown in Table 2. Seven family types are informative for the TDT (s1, s2,…, s7), and Knapp showed how their multinomial frequencies could be calculated from ψ1, ψ2 and p̄ alone.9 However, if population stratification exists, the average frequency of the risk allele A in the stratified population (p̄) is not sufficient to calculate the mating type proportions. The true family type proportions (s*1, s*2,…, s*10) can be calculated from the correct mating type proportions using the method described earlier (based on p̄, F, ϑ3 and ϑ4) and are shown in Table 2. However, ϑ3 and ϑ4 are almost always unknown, and although several published papers report F values expected in practice, there are no published reports of the values of ϑ3 and ϑ4 expected in practice. Yasuda14 showed that the higher central moments (terms in ϑ3 and ϑ4) can be ignored if p̄ > F and 1 − p̄ > F. Therefore, (s*1, s*2,…, s*7) can be estimated by (ŝ*1, ŝ*2,…, ŝ*7) using only p̄ and F, as shown in Table 2, and these estimates can then be used in power calculations. GRRs are set assuming γ = 2.0 and 1.5 for the multiplicative, additive, dominant and recessive models. To evaluate our method for estimating sample sizes, we considered a stratified study population comprising two smaller subpopulations. In the first example, 70% of the trios are from subpopulation 1 and 30% are from subpopulation 2. In the second example, the study population comprises equal proportions of trios from both subpopulations. We then calculated the true sample sizes required to achieve 80% power, using a Type I error rate of α = 1 × 10⁻⁷, at various values of the average allele frequency p̄ (0.1, 0.3, 0.5 and 0.8) and at various levels of population stratification measured by F (0.01, 0.05 and 0.10). The true sample sizes were compared with our estimates using only p̄ and F, and with the sample sizes calculated assuming HWE. These sample size calculations are shown in Tables 3 and 4. A Type I error rate of α = 1 × 10⁻⁷ is used for comparison with Knapp,9 and for comparison with sample size estimates from genome-wide association studies.
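To make the link between mating type frequencies and sample size concrete, the following Python sketch turns a set of mating type frequencies into an approximate number of trios. It is a generic normal approximation to the allelic TDT and is not claimed to reproduce Knapp's first approximation or the values in Tables 3 and 4. It weights each family type by its mating type frequency, Mendelian probability and GRR (the unnormalised family type probability given an affected child), assumes a two-sided test, and replaces the TDT denominator (the number of transmissions from heterozygous parents) by its expectation. The names `FAMILIES`, `hwe_mating_types` and `tdt_trios_needed` are hypothetical.

```python
import math
from statistics import NormalDist
from typing import List, Sequence

# For each mating type m1..m6: (offspring genotype, Mendelian probability,
# D = A minus B transmissions from heterozygous parents, heterozygous parents).
FAMILIES = [
    [("AA", 1.0, 0, 0)],                                           # AA x AA
    [("AA", 0.5, 1, 1), ("AB", 0.5, -1, 1)],                       # AA x AB
    [("AB", 1.0, 0, 0)],                                           # AA x BB
    [("AA", 0.25, 2, 2), ("AB", 0.5, 0, 2), ("BB", 0.25, -2, 2)],  # AB x AB
    [("AB", 0.5, 1, 1), ("BB", 0.5, -1, 1)],                       # AB x BB
    [("BB", 1.0, 0, 0)],                                           # BB x BB
]


def hwe_mating_types(p: float) -> List[float]:
    """Mating type frequencies m1..m6 for a single population in HWE."""
    q = 1.0 - p
    return [p**4, 4 * p**3 * q, 2 * p**2 * q**2, 4 * p**2 * q**2, 4 * p * q**3, q**4]


def tdt_trios_needed(m: Sequence[float], psi1: float, psi2: float,
                     alpha: float = 1e-7, power: float = 0.80) -> int:
    """Approximate number of affected-child trios for the allelic TDT (sketch)."""
    risk = {"AA": psi2, "AB": psi1, "BB": 1.0}
    total = sum_d = sum_d2 = sum_het = 0.0
    for freq, children in zip(m, FAMILIES):
        for genotype, prob, d, het in children:
            weight = freq * prob * risk[genotype]  # unnormalised P(family | affected child)
            total += weight
            sum_d += weight * d
            sum_d2 += weight * d * d
            sum_het += weight * het
    # Per-trio moments among families ascertained through an affected child;
    # `total` plays the role of the normalising (prevalence) factor.
    mean_d, mean_d2, mean_het = sum_d / total, sum_d2 / total, sum_het / total
    if mean_d <= 0:
        raise ValueError("allele A is not over-transmitted under these parameters")
    sd_d = math.sqrt(mean_d2 - mean_d**2)
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided test assumed
    z_beta = NormalDist().inv_cdf(power)
    # Z = sum(D) / sqrt(b + c), with b + c replaced by its expectation n * mean_het.
    n = ((z_alpha * math.sqrt(mean_het) + z_beta * sd_d) / mean_d) ** 2
    return math.ceil(n)
```

For example, `tdt_trios_needed(hwe_mating_types(0.3), psi1=2.0, psi2=4.0)` gives an HWE-based figure for a multiplicative model with γ = 2; passing mating type frequencies that allow for stratification instead shows how the requirement shifts under this approximation. Under the null hypothesis the variance of D equals the expected number of heterozygous parents, which is why the test statistic is standard normal in this sketch.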

Table 2 Family type probabilities given affected child
Table 3 Sample size necessary to achieve 80% power in the TDT with singletons (α = 10⁻⁷, γ = 2), comparing Knapp's first approximation assuming HWE with the sample size estimates assuming population stratification
Table 4 Sample size necessary to achieve 80% power in the TDT with singletons (α = 10⁻⁷, γ = 1.5), comparing Knapp's first approximation assuming HWE with the sample size estimates assuming population stratification

Results

Population stratification alters the frequencies of the mating types compared with those calculated assuming HWE. The change in the distribution of the mating type frequencies directly changes the family type frequencies (mother–father–offspring genotype combinations). When HWE is assumed for TDT power calculations, the expected value and variance of the test statistic under the alternative hypothesis, as well as the variance of the test statistic under the null hypothesis, differ from those calculated when accounting for population stratification. The variance of the TDT statistic under the null hypothesis in the presence of stratification could be larger or smaller than that calculated assuming HWE, depending on the mating type parameters (p̄, F, ϑ3, ϑ4). The difference between the variance of the TDT under the alternative hypothesis accounting for population stratification and the variance assuming HWE depends on the model (ψ2, ψ1) and the mating type parameters (p̄, F, ϑ3, ϑ4). Existing methods for calculating TDT sample sizes ignore the fact that the parental genotypes used in the TDT statistic are no longer in HWE if there is population stratification. Furthermore, these methods ignore the fact that the two parents cannot be considered independent, as population stratification induces non-random mating when considered on a whole-population basis.

Knapp showed that the power of the TDT is a function of the genetic model, the genotypic risk parameters and the risk allele frequency.9 Our model for TDT sample size calculations in the presence of population stratification estimates the mating type frequencies from the average allele frequency, p̄, and Wright's coefficient of inbreeding, F; it allows sampling from any number of subpopulations with any ascertainment scheme and is generalizable to any stratified study population. In addition, we show that the change in the power of the TDT in a stratified population is due to three main factors: (1) the loss in heterozygosity, which falls to 100(1 − F)% of that predicted assuming HWE; (2) the change in the distribution of mating type frequencies; and (3) the change in the relative proportions of the informative mating types. These factors should not be ignored if even mild population stratification is possible.

Interestingly, and somewhat contrary to initial expectation, the loss in heterozygosity caused by population stratification does not always decrease the power of the TDT. Tables 3 and 4 show that the power of the TDT can increase in the presence of population stratification, depending on the change in the relative proportions of the informative mating types and on the disease model. If the disease follows a multiplicative MOI, sample sizes calculated assuming HWE are generally smaller than those actually required. For a dominant, additive or recessive model, the relative frequencies of the AA × AB, AB × BB and AB × AB mating types determine the change in sample size requirements, as the relative information content of these three mating types may not be in the ratio 1:1:2. For example, if the disease is inherited in a dominant MOI, the mating type AA × AB carries no information and the mating type AB × BB carries the most.
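The claim about the information carried by each informative mating type can be made concrete by computing the probability that a heterozygous parent transmits allele A to an affected child; a value of exactly 1/2 means that mating type carries no information about association. The Python sketch below derives these probabilities from Mendelian segregation and the GRRs (relative to BB). The function name `prob_transmit_A` is hypothetical, and `Fraction` is used only to keep the results exact.

```python
from fractions import Fraction


def prob_transmit_A(mating_type: str, psi1, psi2):
    """P(a heterozygous parent transmits A | affected child).

    Offspring risks relative to BB are psi2 (AA), psi1 (AB) and 1 (BB).
    """
    if mating_type == "AAxAB":
        # Child is AA (A transmitted) or AB (B transmitted), each with prob. 1/2.
        return psi2 / (psi2 + psi1)
    if mating_type == "ABxBB":
        # Child is AB (A transmitted) or BB (B transmitted), each with prob. 1/2.
        return psi1 / (psi1 + 1)
    if mating_type == "ABxAB":
        # Either parent: average over the allele transmitted by the other parent.
        return (psi2 + psi1) / (psi2 + 2 * psi1 + 1)
    raise ValueError(f"{mating_type!r} has no heterozygous parent")


gamma = Fraction(2)
for label, (psi1, psi2) in {"dominant": (gamma, gamma),
                            "multiplicative": (gamma, gamma**2)}.items():
    print(label, {t: prob_transmit_A(t, psi1, psi2) for t in ("AAxAB", "ABxAB", "ABxBB")})
```

For a dominant model with γ = 2, AA × AB gives exactly 1/2 (no information) while AB × BB gives 2/3; for a multiplicative model all three informative mating types give γ/(1 + γ) = 2/3, which is why the informative mating types carry equal information per transmission under that model.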

In a stratified population, the risk allele frequency and mating type frequencies in a single subpopulation may cause subjects from that subpopulation to be more informative than subjects from other subpopulations. This results in one subpopulation that heavily influences the overall TDT sample size calculations. The subjects from other subpopulations may not be as informative and add very little to the overall TDT statistic. This suggests that it may be useful to partition a study population into its component subpopulations prior to analysis.

Larger values of Wright's F were associated with larger discrepancies in sample size requirements relative to those calculated assuming HWE. For example, in Table 3, where F = 0.01 and γ = 2, estimates of the required sample size ranged from 3% larger to approximately 15% smaller than the sample size calculated using Knapp's first approximation. However, when F = 0.1 and γ = 2, estimates of the required sample size ranged from 39% larger to about 68% smaller than the sample size calculated using Knapp's first approximation.

Wright's F cannot capture all the information about population stratification pertaining to the distortion in mating type frequencies; however, reasonable estimates of the mating type frequencies can be made using F and the average allele frequency when F ≤ min(p̄, 1 − p̄).

To estimate sample sizes for the TDT, we ignore the third and fourth central moments of the risk allele frequency distribution (ϑ3 and ϑ4). However, even small changes in ϑ3 and ϑ4 matter: for a fixed F, they produce discrepancies between our estimates and the true sample sizes, as seen in Tables 3 and 4.
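The effect of dropping ϑ3 and ϑ4 can be gauged directly on the informative mating types. The Python sketch below evaluates m2, m4 and m5 from the central-moment expansion, once with all moments and once with ϑ3 and ϑ4 set to zero (mimicking an estimate based only on p̄ and F). The function name `informative_mating_types` and the 70/30 example with allele frequencies 0.2 and 0.5 are hypothetical; the resulting errors are illustrative and are not the values behind Tables 3 and 4.

```python
from typing import Sequence, Tuple


def informative_mating_types(weights: Sequence[float], p: Sequence[float],
                             keep_higher_moments: bool = True) -> Tuple[float, float, float]:
    """m2 (AAxAB), m4 (ABxAB) and m5 (ABxBB) from the central-moment expansion.

    With keep_higher_moments=False, theta_3 and theta_4 are set to zero,
    mimicking an estimate that uses only p_bar and F (theta_2 = F p_bar q_bar).
    """
    p_bar = sum(w * x for w, x in zip(weights, p))
    q_bar = 1.0 - p_bar
    th2, th3, th4 = (sum(w * (x - p_bar) ** j for w, x in zip(weights, p)) for j in (2, 3, 4))
    if not keep_higher_moments:
        th3 = th4 = 0.0
    m2 = 4 * (p_bar**3 * q_bar + 3 * p_bar * (q_bar - p_bar) * th2 + (1 - 4 * p_bar) * th3 - th4)
    m4 = 4 * (p_bar**2 * q_bar**2 + (1 - 6 * p_bar * q_bar) * th2 - 2 * (q_bar - p_bar) * th3 + th4)
    m5 = 4 * (p_bar * q_bar**3 + 3 * q_bar * (p_bar - q_bar) * th2 - (1 - 4 * q_bar) * th3 - th4)
    return m2, m4, m5


# Hypothetical 70/30 mixture (allele A frequencies 0.2 and 0.5).
weights, p = [0.7, 0.3], [0.2, 0.5]
exact = informative_mating_types(weights, p)
approx = informative_mating_types(weights, p, keep_higher_moments=False)
for name, e, a in zip(("AAxAB", "ABxAB", "ABxBB"), exact, approx):
    print(f"{name}: exact {e:.4f}, p_bar/F only {a:.4f}, relative error {(a - e) / e:+.1%}")
```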

One potential limitation of our method is that it does not account for population admixture, as we assume both parents are sampled from the same subpopulation. Population admixture occurs when the study population comprises multiple subpopulations but mating occurs both within and between subpopulations. Admixture is complex, and its rate depends on socioeconomic, racial, ethnic, linguistic, migratory and other factors. Additional work is needed to assess how the power of the TDT changes in recently admixed populations.

Knapp uses the disease prevalence as a normalizing factor so that the probabilities of all family types with an affected offspring sum to 1.9 We note that the disease prevalence calculated assuming HWE, denoted by R, differs from the disease prevalence calculated assuming population stratification. The disease prevalence in the stratified population is $R^* = \sum_{i=1}^{G} w_i R_i$, where $R_i$ is the disease prevalence in subpopulation i. We also note that the normalization factor in the stratified population, R*, is greater in magnitude than the normalization factor in the population in HWE, R, if ψ2 + 1 > 2ψ1; however, if ψ2 + 1 < 2ψ1, the normalization factor in the stratified population, R*, is smaller in magnitude than that of the population in HWE. The difference between the normalization factors of the stratified population and the population in HWE, R* − R = Fp̄q̄(ψ2 − 2ψ1 + 1), depends on the genetic model chosen and the GRRs. The difference is due to the altered proportion of risk genotypes under population stratification.
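The identity R* − R = Fp̄q̄(ψ2 − 2ψ1 + 1) can be checked numerically. The Python sketch below assumes that each prevalence is expressed relative to the risk of the BB genotype, so that R_i = p_i²ψ2 + 2p_iq_iψ1 + q_i² and R = p̄²ψ2 + 2p̄q̄ψ1 + q̄²; this scaling is an assumption made for the illustration. The function name `normalising_factors` and the example values are hypothetical.

```python
from typing import Sequence, Tuple


def normalising_factors(weights: Sequence[float], p: Sequence[float],
                        psi1: float, psi2: float) -> Tuple[float, float, float]:
    """Return (R_star, R, F): stratified and HWE normalising factors and Wright's F.

    Both factors are expressed relative to the disease risk of the BB genotype.
    """
    p_bar = sum(w * x for w, x in zip(weights, p))
    q_bar = 1.0 - p_bar
    F = sum(w * (x - p_bar) ** 2 for w, x in zip(weights, p)) / (p_bar * q_bar)
    # R_i for each subpopulation, averaged with the subpopulation weights.
    r_star = sum(w * (x**2 * psi2 + 2 * x * (1 - x) * psi1 + (1 - x) ** 2)
                 for w, x in zip(weights, p))
    r_hwe = p_bar**2 * psi2 + 2 * p_bar * q_bar * psi1 + q_bar**2
    return r_star, r_hwe, F


# Hypothetical example: dominant model with gamma = 2, where psi2 + 1 < 2 * psi1.
r_star, r_hwe, F = normalising_factors([0.5, 0.5], [0.2, 0.6], psi1=2.0, psi2=2.0)
print(r_star, r_hwe, F)
```

In this dominant example ψ2 + 1 < 2ψ1, so R* is smaller than R; for a multiplicative model with γ > 1, ψ2 + 1 > 2ψ1 and the inequality reverses.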

Distortion in the mating type frequencies arises when there is underlying population stratification; in a randomly mating population in HWE, F = ϑ3 = ϑ4 = 0. Distortion in the mating type frequencies can also be observed among the parents of an affected child. For example, the parents of a child with a rare recessive disease will most likely both be heterozygous at the disease locus (AB × AB). Similarly, for a rare dominant disorder, the parents of affected children are most likely a heterozygote and a wild-type homozygote (AB × BB). This distortion is most apparent when the risk allele is rare (frequency below 1% in the population) but confers a substantial risk of disease, explains most of the variability seen in the disease, and the disease can be considered a simple Mendelian disease from a genetic perspective. The proposed method accounts for the genetic MOI and for the distortion in mating type frequencies that arises because the parents are ascertained through an affected child. Our method is suitable for complex diseases, in which the GRR conferred by any single polymorphism is generally less than 3 and the risk allele is relatively common (ie the risk allele frequency in the population is greater than 1%).

Discussion

The TDT was proposed as a solution to the challenge of finding a disease susceptibility gene in a stratified study population. It was proposed primarily because of its robustness to population stratification and admixture, which made it superior to traditional case–control tests that are susceptible to false-positive results. Typically, in the design of family-based studies, HWE is assumed when calculating the sample size required for a pre-specified power level. However, when there is population stratification, homozygosity increases beyond that expected under HWE, and homozygous parents do not contribute to the TDT statistic.15 For example, in a sample of 500 ACTs (1000 parents) in which the estimated disease allele frequency in the parents is 0.1, 180 heterozygous parents would be expected assuming HWE. However, if there is a large degree of population substructure, with F = 0.25, only 135 heterozygous parents are expected under the null hypothesis (that the locus of interest is not associated with the disease of interest).
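The heterozygote counts in this example follow from the Wahlund relation: each trio contributes two parents, and under the null hypothesis each parent is heterozygous with probability 2p̄q̄(1 − F). A minimal Python sketch of that arithmetic follows; the function name `expected_heterozygous_parents` is hypothetical.

```python
def expected_heterozygous_parents(n_trios: int, p_bar: float, F: float = 0.0) -> float:
    """Expected heterozygous parents among n_trios under the null hypothesis."""
    heterozygosity = 2 * p_bar * (1.0 - p_bar) * (1.0 - F)  # 2 p q (1 - F)
    return 2 * n_trios * heterozygosity  # two parents per trio


print(expected_heterozygous_parents(500, 0.1))          # assuming HWE
print(expected_heterozygous_parents(500, 0.1, F=0.25))  # strong substructure
```

The two calls reproduce the 180 and 135 heterozygous parents quoted above, up to floating-point rounding.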

Although there are several methods for calculating the power of the TDT,9, 16, 17 none of them accounts for population stratification. The TDT maintains the correct Type I error rate, but its power can be substantially affected by population stratification. This is important because the TDT is often used precisely when population stratification is possible. Studies whose sample sizes were calculated ignoring population stratification may be underpowered and fail to detect putative disease genes. Our method is the first to incorporate Wright's coefficient of inbreeding, F, as a measure of population stratification to approximate the mating type frequencies for TDT sample size calculation.

One of our major points is that, in the presence of population stratification, the power of the TDT to detect an association between a locus and a disease varies considerably, depending on the underlying genetic structure of the study population. We suggest that future studies report estimates of the parameters required to describe the population structure (p̄, F, ϑ3 and ϑ4), together with their standard errors, alongside all TDT findings. Replicating previously published findings with the TDT may be more challenging than previously thought, as two studies with the same number of subjects could have very different power to detect the same genetic association, depending on the underlying population substructure. This may help explain the many studies that have failed to replicate previously published positive findings of association at certain loci, although the initial finding may itself have been spurious.

In addition to possibly losing power because of population stratification (as a result of altered proportions and relative ratios of informative family types), the TDT can lose power if there is genetic risk heterogeneity between subpopulations. Genetic risk heterogeneity occurs if the relative risk (RR) of disease conferred by the putative deleterious allele varies from one subpopulation to another. Quantitative risk heterogeneity occurs when the effect of the deleterious allele is not homogeneous across subpopulations but is more pronounced in some. A classical example of quantitative risk heterogeneity is the association between ApoE and Alzheimer's disease, which exists pan-ethnically but is strongest in Caucasians and Asians and weaker in Hispanics and African-Americans.18 The results of the TDT remain valid when there is quantitative risk heterogeneity, but the power of the TDT to detect the association with a given sample size will vary with the ethnic composition of the study sample.

Qualitative risk heterogeneity occurs when an allele is deleterious in one subpopulation (RR > 1) but protective in another (RR < 1). Importantly, the power of the TDT diminishes greatly if there is qualitative risk heterogeneity. In fact, the case–control genomic control method proposed by Devlin and Roeder3 and the population stratification model (STRAT) proposed by Pritchard et al19 are both more powerful per genotyped individual than the TDT in the presence of qualitative risk heterogeneity, as illustrated in Table 2 of Pritchard and Donnelly.6

In summary, statistical geneticists and genetic epidemiologists should carefully characterize their study population and, assuming a conservative level of population stratification, follow the proposed guidelines when calculating sample sizes for genetic analyses using the TDT and other family-based tests.