Introduction

With recent advances in high-throughput genotyping technology, many genome-wide association studies have been conducted to unravel the relation between disease and genes. However, analyzing such large-scale data raises statistical issues (Hirschhorn and Daly 2005; Thomas et al. 2005). The first is the multiple testing dilemma: most traditional statistical tests fail to keep the chance of false-positives low while preserving the chance of true-positives (Botstein and Risch 2003; Cardon 2001; Long and Langley 1999; Risch and Merikangas 1996). In addition, as the number of markers escalates, the genotyping cost increases dramatically (Thomas 2006). For the first difficulty, it is common to adopt the Bonferroni correction to handle multiplicity when analyzing large-scale association studies. For example, Klein et al. (2005) analyzed the relationship between age-related macular degeneration and numerous single nucleotide polymorphisms (SNPs). They used the Bonferroni correction, setting the significance level to the nominal level divided by the total number of SNPs. Although this controls the family-wise error rate (FWER), the probability of making at least one false-positive claim, the reduced significance level results in a loss of power to detect the relevant SNPs. Furthermore, such a single-stage strategy is not cost-efficient under limited resources, especially when testing a large number of markers. Hence, a procedure that saves cost and maintains satisfactory power at the same time is urgently needed. Multi-stage procedures can eliminate the great majority of unassociated markers; in particular, two-stage methods have been proposed to optimize the power and conserve the cost of such studies (Hirschhorn and Daly 2005; Skol et al. 2006).

From the design viewpoint, there are two types of two-stage procedures. The first uses independent subjects at different stages (Miller et al. 2001; Saito and Kamatani 2002): a large significance level is adopted to select promising markers first, and then a stringent level is applied at the next stage to control the FWER. Ohashi and Clark (2005) further took cost-efficiency into account and proposed a stage-wise approach under limited total cost. Instead of the FWER, other studies control the false discovery rate (FDR), the proportion of false-positives among the significant markers (Benjamini and Hochberg 1995), which attains larger power to detect associated SNPs. van den Oord et al. (2003) suggested using independent samples at different stages and choosing a suitable threshold for controlling the FDR at an arbitrary bound. However, this type of approach has several limitations. First, the information contained in the data is not fully utilized. For instance, the data of subjects recruited in the first stage are usually discarded, which defeats the purpose of saving as much cost as possible. Second, when the primary concern is to reduce the number of false-positive markers, less attention is usually paid to the proportion of true-positive markers, which seems to conflict with the scientific interest of identifying the associated markers. Third, the FWER-controlling method becomes overly stringent when the total number of markers is huge.

The second type of two-stage procedure combines all available data, including those of previously selected promising markers. One advantage is the complete utilization of all information. Another is that more emphasis is put on the power to detect associated markers, rather than simply on the false-positive rate (Wen et al. 2006). Kuchiba et al. (2006) controlled the FDR with optimal sample size and reduced cost. They also emphasized the influence of the true proportion of associated markers on the performance of two-stage designs. Satagopan et al. (2002, 2004) proposed employing a fraction of the resources (either cost or individuals) at an earlier stage, and using all available individuals in the final stage. Zehetmayer et al. (2005) advocated two-stage designs with controlled FDR and split the sample size into two stages for gene-expression studies. Wang et al. (2006) considered various configurations of per-genotype cost ratio and significance levels in both stages to achieve the desired power with minimum cost. Wen et al. (2006) recommended excluding most irrelevant markers by adopting a large significance level in the first stage, and controlling the overall false-positives with a reduced significance level in the second stage. Unlike Satagopan et al. (2002, 2004), they choose the promising markers at a pre-specified significance level and control the false-positive rate (FPR) adequately. However, the optimal allocation of subjects remains an open issue. In practice, cost or subjects are limited, and this affects the recruitment in both stages with respect to error rates and power. In this paper, we propose optimal designs in this two-stage setting to distribute subjects and select associated markers under two different constraints: fixed total genotyping cost (FTGC) and fixed sample size (FSS). In the following sections, we introduce the rationale and implementation of the optimal design for both FTGC and FSS. Simulation studies are conducted to evaluate the performance of the proposed approach under limited cost and sample size. The comparison with existing alternatives is also discussed.

Methods

In this section, we first introduce the notation and then explain the derivation of the optimal allocation of sample size under limited cost or limited total number of subjects. To detect the association between markers and disease phenotype, we consider SNPs as the testing markers for illustration. Let δ denote the difference in the mean allele frequency between cases and controls, let N1 denote the amount of allele data for each group in the first stage, M the total number of markers in linkage equilibrium, and w the proportion of truly unassociated SNPs. In the first stage, if the individual P-value for a marker is less than the uncorrected level α1 (= 0.05), the marker is considered promising and is verified further with an additional N2 allele data in the second stage. Here we assume a balanced population-based case-control design, so the total numbers of subjects used in stages one and two are N1 and (N1 + N2), respectively. Suppose a total of R promising markers are considered in the second stage, and a stringent significance level, α2 = 0.05/R, is adopted in this stage to reduce the overall type I error inflated by the large α1. We consider two indices, TPR (true-positive rate) and FPR, to evaluate the performance of the two-stage procedure. According to Wen et al. (2006), both the overall FPR and TPR are functions of the sample sizes (N1, N2), the significance levels (α1, α2), the total number of markers M, and the irrelevant proportion w. In addition, disease model parameters such as the allele frequency and the effect size also affect the FPR and TPR. Therefore, an optimal design must take all of these into account.

In the following, we introduce a grid-search algorithm for the optimal allocation of (N1, N2) with a desired power and constrained resources. First, under FTGC, the total genotyping cost is given by T = MN1 + RN2, where R can be replaced with E(R) = Mwα1 + M(1−w)(1−β1), and (1−β1) represents the power in the first stage. For simplicity, let N2 = kN1; we maximize the TPR with respect to k and N1 under the constraint that T = MN1 + E(R)(kN1). Since N1 enters the overall power through several other factors, e.g. E(R) and the significance level in stage 2, it is more flexible to keep the optimization in this low-dimensional setting than to optimize all factors simultaneously in a high-dimensional one. Besides, an analytical solution is not feasible, and hence we suggest a grid search over a wide range of (N1, N2 = kN1); the pair with maximum TPR is the optimal allocation of sample size. Second, under FSS, the constraint is the fixed total sample size N (= N1 + N2) used in the genetic study. For simplicity, let N1 = πN, where π is the proportion of N1 in N and ranges from 0 to 1. Given N, M, and w, both the TPR and the required cost depend on π. Over a plausible range of π, one can conduct a grid search to find the optimal π that attains the maximum TPR together with a substantial cost reduction, striking a balance between power and cost. We use a program written in S-plus 7.0 to perform the searches (the program is available upon request, and more details of the optimization are given in the Appendix).
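As a rough illustration of the FTGC grid search described above, the following Python sketch scans N1, derives N2 and k from the cost constraint T = MN1 + E(R)(kN1), and keeps the allocation with the largest approximate TPR. It is not the authors' S-plus program. The test statistic (a two-sided z-test on allele frequencies under a normal approximation), the approximation of the two-stage TPR by the product of the stage-1 power and the stage-2 power (which ignores the correlation between stages), the use of E(R) in place of R in α2, and all function names are assumptions made only for this sketch.

```python
from statistics import NormalDist

_STD = NormalDist()  # standard normal distribution


def z_test_power(delta, p_bar, n_alleles, alpha):
    """Approximate power of a two-sided z-test for an allele-frequency
    difference `delta` around mean frequency `p_bar`, with `n_alleles`
    alleles per group (ASSUMED test; the far rejection tail is ignored)."""
    se = (2.0 * p_bar * (1.0 - p_bar) / n_alleles) ** 0.5
    z_crit = _STD.inv_cdf(1.0 - alpha / 2.0)
    return _STD.cdf(abs(delta) / se - z_crit)


def expected_promising(m, w, alpha1, power1):
    """E(R) = M*w*alpha1 + M*(1 - w)*(1 - beta1), with power1 = 1 - beta1."""
    return m * w * alpha1 + m * (1.0 - w) * power1


def ftgc_grid_search(total_cost, m, w, delta, p_bar,
                     alpha1=0.05, alpha=0.05, n1_grid=None):
    """Scan N1 and return the allocation with the largest approximate TPR
    under the constraint T = M*N1 + E(R)*k*N1."""
    if n1_grid is None:
        n1_grid = range(50, int(total_cost // m))
    best = None
    for n1 in n1_grid:
        power1 = z_test_power(delta, p_bar, n1, alpha1)
        e_r = expected_promising(m, w, alpha1, power1)
        n2 = (total_cost - m * n1) / e_r  # budget left for stage two
        if n2 <= 0:
            continue
        alpha2 = alpha / e_r  # ASSUMPTION: E(R) stands in for R
        power2 = z_test_power(delta, p_bar, n1 + n2, alpha2)
        tpr = power1 * power2  # ASSUMPTION: product approximation
        if best is None or tpr > best["tpr"]:
            best = {"n1": n1, "n2": n2, "k": n2 / n1,
                    "e_r": e_r, "c1": m * n1 / total_cost, "tpr": tpr}
    return best


best = ftgc_grid_search(total_cost=600 * 5000, m=5000, w=0.999,
                        delta=0.1, p_bar=0.5)
print(best)
```

Because the power model here is only a stand-in, the allocation it returns should not be expected to reproduce the simulated values reported in the tables; the sketch is meant to show the structure of the search, in which only N1 is scanned and k follows from the budget.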

Results

Our purpose is to compare the TPR of the proposed method with that of a single-stage design, in which all markers are genotyped on all samples, and with that of alternative two-stage designs. All strategies were tailored to use a pre-specified cost or sample size, and we compared the false-positive results (i.e. FPR or FDR) and power over a broad range of sample sizes. We also investigated the influence of different allele frequencies, effect sizes, and allelic odds ratios (OR). Under limited cost T, the sample size of a single-stage design would be T/M. The Bonferroni method for this design is denoted as B(T/M). We denote by M(S) a single-stage method that uses the same significance level α2 as the proposed two-stage method with fixed FPR. Table 1 lists the simulation results of TPR and FPR for the two-stage method, B(T/M), and M(S) under FTGC for several values of (w, p̄, δ, OR) with a fixed array of 5,000 SNPs and equal per-genotype cost. The fifth column also shows the proportion of cost in stage one, i.e. \(c_{1} = \frac{{MN_{1}}}{T}.\) The simulated values were close to the analytical results (data not shown). In these examples, the false-positive measures, FPR and FDR (in Fig. 1), were acceptably small under various combinations of N1 and N2. However, the TPR of the proposed method varied greatly with respect to \((N_{1}, N_{2}, \bar{p},\delta, OR),\) and was often larger than that of the single-stage methods, irrespective of the allelic odds ratio and the number of markers associated with disease. Moreover, under the two-stage setting, a larger total sample size did not necessarily yield larger power to detect the markers associated with the disease. Hence, for FTGC we recommend determining the optimal k with an easy-to-recruit sample size, provided that the TPR reaches the desired level. For example, the largest TPR (= 0.911) occurred at (N1, k) = (531, 2.56) with total sample size N = 1,888 for w = 0.999 and \((\bar{p},\delta, OR)=(0.5, 0.1, 1.49).\) This also indicated that allocating 87.89% of the total cost to the earlier stage would maintain optimal power. The relationship between k and c1 can be derived as \(k = \frac{M}{{{\hbox{E}}(R)}} \times \frac{{1 - c_{1}}}{{c_{1}}}.\) Clearly, predetermining a different cost ratio, (1−c1)/c1, or marker ratio, M/E(R), would affect the allocation of (N1, N2).
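The relation between k and the stage-one cost share follows from the cost constraint stated in the Methods, with N2 = kN1 and R replaced by E(R). The short derivation below only rearranges those definitions:

\[
T = MN_{1} + \mathrm{E}(R)\,kN_{1}, \qquad c_{1} = \frac{MN_{1}}{T}
\quad\Longrightarrow\quad 1 - c_{1} = \frac{\mathrm{E}(R)\,kN_{1}}{T},
\]
\[
\frac{1 - c_{1}}{c_{1}} = \frac{\mathrm{E}(R)\,k}{M}
\quad\Longrightarrow\quad k = \frac{M}{\mathrm{E}(R)} \times \frac{1 - c_{1}}{c_{1}}.
\]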

Table 1 Simulation results for the proportion of total cost in stage one, TPR and FPR of two-stage method and two single-stage methods under FTGC
Fig. 1 The curves based on simulations in a and c are false-positive rate (FPR), and in b and d are false discovery rate (FDR), with respect to various N1. In all figures, \(M=5000, \quad w=0.999, (\bar{p},\delta,OR)=(0.5,0.1,1.49),\) and α1 = 0.05. In a and b, FTGC with T = 600M, and in c and d, FSS with N = 1,200. Three lines in a and d denote M(S) (dashed line), Bonferroni method (solid line) and two-stage method (dotted line)

Table 2 gives the simulation results of TPR and FPR of the two-stage method, the Bonferroni method (denoted as B(N)) and M(S) for FSS (N = 1,200). Column 4 shows the percentage of cost saving, namely the reduction in cost of the two-stage method relative to a single-stage design, \((1 - \pi)\left(1 - \frac{{{\hbox{E}}(R)}}{M}\right).\) Comparing columns 5–7, the TPR of B(N) was the worst among the three methods. Over a plausible range of π ∈ (0.55, 0.95), the proposed method yielded power comparable to that of the single-stage design, which requires more genotyping effort, with similar FPR. Hence, for FSS we suggest selecting the optimal π (= 0.55), at which the TPR is satisfactory and the cost reduction is substantial. It is worth noting that at small values of π (≤ 0.25), the markers associated with disease were less likely to be chosen in the earlier stage for further testing, and the overall power was lower than that of the single-stage method. The simulation results are also presented in Figs. 1 and 2. The false-positive results (FPR or FDR) of the two-stage method were stably small (Fig. 1a–d). For FTGC, the TPR of the two-stage setting varied dramatically with (N1, N2), and the optimal k was taken as the one that attained the maximum TPR with an easy-to-recruit sample size (Fig. 2a–b). It also corresponded to spending at least 80% of the total cost in stage one, given the same unit typing cost in both stages. For FSS, the optimal π was around 0.55, where the corresponding design was at minimum cost on condition that the overall power was near-optimal (Fig. 2d). These results for FSS are consistent with those in Wang et al. (2006). However, under FTGC, for a given total or minimum cost, there are still many options for (N1, N2) with the desired power, say 80%. Among those choices, the one with an easy-to-recruit total sample size conditional on maximum TPR would be optimal.
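The cost-saving expression for FSS can be checked against a direct cost calculation: a single-stage design genotypes M markers on all N subjects, whereas the two-stage design genotypes M markers on N1 = πN subjects and E(R) markers on the remaining N2 = (1 − π)N. The Python sketch below compares the two; the function names and the value E(R) = 250 are hypothetical and chosen only for illustration.

```python
def fss_cost_saving(pi, expected_r, m):
    """Fractional cost saving (1 - pi) * (1 - E(R)/M) of the two-stage
    design relative to genotyping all M markers on all N subjects."""
    return (1.0 - pi) * (1.0 - expected_r / m)


def fss_two_stage_cost(pi, n, expected_r, m):
    """Genotyping cost in units of one genotype: M markers on N1 = pi*N,
    then E(R) promising markers on the additional N2 = (1 - pi)*N."""
    n1 = pi * n
    n2 = (1.0 - pi) * n
    return m * n1 + expected_r * n2


# Illustrative values only: N and M follow the simulation settings in the
# text, while E(R) = 250 is a hypothetical number of promising markers.
N, M, E_R = 1200, 5000, 250
for pi in (0.25, 0.55, 0.75, 0.95):
    direct = 1.0 - fss_two_stage_cost(pi, N, E_R, M) / (M * N)
    print(pi, round(fss_cost_saving(pi, E_R, M), 4), round(direct, 4))
```

The two columns printed for each π agree, since 1 − [π + (1 − π)E(R)/M] = (1 − π)(1 − E(R)/M). In practice E(R) itself changes with π through the stage-one power, which this sketch holds fixed.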

Table 2 Simulation results for cost saving, TPR and FPR of two-stage method and two single-stage methods under FSS. The number of replications in the simulation is 1,000 (N = 1,200, M = 5,000, w = 0.999 or 0.995, and α1 = 0.05)
Fig. 2 The curves based on simulations in a–d are true-positive rate (TPR) with respect to various N1. In all figures, \(M=5000, w=0.999, (\bar{p},\delta,OR)=(0.5,0.1,1.49)\) and α1 = 0.05. In a and b, T = 600M, and in c and d, N = 1,200. Distinct lines correspond to different methods. The solid line is for Bonferroni method with the same resources, the dashed line is for M(S) and the dotted line is for two-stage method

We further compared the performance of the optimal two-stage method with some existing alternatives. Under FTGC, Satagopan et al. (2002) proposed an alternative approach (denoted as M(1)) that uses 75% of the cost in stage one to screen all markers and evaluates the most promising 10% of the markers with the remaining cost in stage two. When the sample size was the primary constraint, Satagopan et al. (2004) (denoted as M(2)) advocated evaluating all the markers on 50% of the subjects in stage one and then evaluating the most promising 10% of the markers on the remaining individuals in the second stage, which yielded near-optimal power. For M(1) and M(2), we set the total number of markers to be selected at the end of the study to five.

Table 3 lists the TPR, FPR, FDR, and total sample size or cost saving of the optimal two-stage method, M(1), and M(2) under several parameter configurations for M = 100 and 5,000. In terms of FPR and FDR, the proposed optimal design produced fewer false-positives than M(1) and M(2), regardless of the allelic odds ratio and the total number of markers. In fact, the expected number of false-positive results for the proposed method was less than one at each M, whereas for M(1) or M(2) it was larger than one. In terms of TPR, the power of the optimal design was consistently larger than that of M(1) and M(2). The optimal two-stage design was also superior in terms of total sample size or cost-efficiency. For example, the optimal design recruited fewer individuals than M(1) for FTGC under different allelic odds ratios. For FSS, the optimal design produced a similar cost reduction, but with clearly larger power and fewer false-positive results.

Table 3 Simulation results for TPR, FPR and FDR of optimal two-stage method, M(1) and M(2) under FTGC and FSS. The number of replications in the simulation is 1,000

Discussion

We propose an optimal two-stage design for genetic association studies under constrained FTGC or FSS. Unlike Wen et al. (2006), the optimization accounts for limited resources and focuses on the efficient allocation of subjects. To maintain good power for detecting truly relevant markers, we suggest a grid-search algorithm for finding optimal cost-efficient strategies. The concept can also be applied to other two-stage settings where the factors of the overall power interact differently. Our proposal has several advantages. First, (N1, k) or (N1, π) can be determined numerically by grid search, with optimal TPR, acceptable FPR and satisfactory cost. When the total resources are limited, there are many possible allocations of N1 and N2. The impact of the allocation on the TPR is more pronounced than that on the false-positive results. One could also use the algorithm to assess whether the total cost or sample size is adequate before a study. The rule of thumb is to identify the mode of the TPR curve against k (or against π, as in the case of cost savings for FSS) to find the optimal condition.

Second, under FTGC, we show that the optimal k is about 2.5 with moderate total sample sizes. This translates to a design where M (= 5,000) markers are screened with approximately 88% of the total cost in the earlier stage, and then the R selected markers are tested with the remaining cost. Furthermore, the grid-search algorithm can be extended to different per-genotype cost ratios at each stage for FTGC. For instance, using the factor \(c_{g}\) for the ratio of per-genotype cost in stage 2 versus that in stage 1, we present in Fig. 3 the relationships between (FPR, FDR, TPR) and (N1, N2, k and c1), based on simulation results under \(c_{g} = 15\) (as suggested in Wang et al. 2006). In this case, the optimal k is less than 1 and the corresponding proportion of total cost in stage one is about 60–65%. Alternatively, if the sample size is restricted, we recommend π between 0.5 and 0.6 to obtain higher overall power and a substantial cost reduction. That is, screen M (= 5,000) markers with nearly 55% of the total sample in the earlier stage, and then test the R selected significant markers on all individuals. Finally, we investigated the power and false-positive results of alternative two-stage methods. The optimal two-stage method is superior to the existing alternatives, and the superiority remains when the methods are compared in terms of cost-efficiency. The proposed approach provides specific criteria for formal testing, with a pre-specified significance level for each stage. Satagopan et al. (2002) suggested determining the number of selected markers before a two-stage study. That approach is not straightforward, since the number of markers associated with the disease is usually unknown.

Fig. 3 The curves based on simulations in a and b are false-positive rate (FPR) and false discovery rate (FDR), and in c and d are true-positive rate (TPR), with respect to various N1 for \(c_{g} = 15\) under FTGC. In all figures, \((\bar{p},\delta,OR)=(0.5,0.1,1.49), T=1000\,M, M=5000, w=0.999\) and α1 = 0.05. Distinct lines correspond to different methods. The solid line is for Bonferroni method, the dashed line is for M(S) and the dotted line is for two-stage method

Our method will provide useful guidelines when planning large-scale association studies. Moreover, the scheme of the method does not change with the test statistic used. The same argument applies when more than one locus is considered, though the test may become more complex. Other applications include association tests for tag SNPs or haplotypes. In addition, if the proportion of cost at the earlier stage, c1, is chosen in advance, the TPR and FPR can be estimated for the corresponding N1 = c1T/M and \(k = \frac{M}{{{\hbox{E}}(R)}} \times \frac{{1 - c_{1}}}{{c_{1}}}.\) Moreover, if one fixes a certain proportion of ‘promising’ markers, say 0.1 (i.e. E(R)/M = 0.1), one could also perform a grid search over a plausible range of k and find the optimal allocation of (N1, N2). Kuchiba et al. (2006) recommended the use of their proposal when the proportion of truly associated markers (which they called π1) is greater than or equal to 0.01. In that case, this grid-search algorithm will provide the optimal choice of N1 and N2 as well.
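When the stage-one cost share c1 is fixed in advance, the allocation follows from the two expressions quoted above. The Python sketch below turns them into a small helper; the function name and the numerical inputs (including E(R) = 250) are hypothetical and serve only to show the arithmetic.

```python
def allocation_from_cost_share(c1, total_cost, m, expected_r):
    """Return (N1, k, N2) for a pre-chosen stage-one cost share c1, using
    N1 = c1*T/M and k = (M / E(R)) * (1 - c1) / c1, with N2 = k*N1."""
    n1 = c1 * total_cost / m
    k = (m / expected_r) * (1.0 - c1) / c1
    return n1, k, k * n1


# Illustrative inputs only: T = 600*M and M = 5,000 as in the FTGC figures,
# with a hypothetical E(R) = 250 promising markers.
n1, k, n2 = allocation_from_cost_share(c1=0.8, total_cost=600 * 5000,
                                       m=5000, expected_r=250)
print(n1, k, n2)
# The implied spending M*N1 + E(R)*N2 recovers the total budget T.
print(5000 * n1 + 250 * n2)
```

Because E(R) depends on the stage-one power, and hence on N1, a full calculation would update E(R) for the chosen N1 rather than fix it as done here.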