Genome-wide association is a promising approach to identify common genetic variants that predispose to human disease [1,2,3,4]. Because of the high cost of genotyping hundreds of thousands of markers on thousands of subjects, genome-wide association studies often follow a staged design in which a proportion (πsamples) of the available samples are genotyped on a large number of markers in stage 1, and a proportion (πmarkers) of these markers are later followed up by genotyping them on the remaining samples in stage 2. The standard strategy for analyzing such two-stage data is to view stage 2 as a replication study and focus on findings that reach statistical significance when stage 2 data are considered alone [2]. We demonstrate that the alternative strategy of jointly analyzing the data from both stages almost always results in increased power to detect genetic association, despite the need to use more stringent significance levels, even when effect sizes differ between the two stages. We recommend joint analysis for all two-stage genome-wide association studies, especially when a relatively large proportion of the samples are genotyped in stage 1 (πsamples ≥ 0.30), and a relatively large proportion of markers are selected for follow-up in stage 2 (πmarkers ≥ 0.01).
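The two-stage design and the joint test described above can be illustrated with a minimal Python sketch. It is not the authors' implementation, and the abstract gives no formulas. The sketch assumes each marker yields an approximately standard normal z-score per stage under the null, that the two stages are independent, and that the joint statistic is the sample-size-weighted sum sqrt(πsamples)·z1 + sqrt(1 − πsamples)·z2. All function names are hypothetical. The last function searches for the joint cutoff that holds the per-marker false-positive rate at a chosen level, given that a marker must first pass stage 1.

```python
from math import sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal, mean 0 and sd 1


def stage1_threshold(pi_markers):
    """Two-sided |z| cutoff that lets a fraction pi_markers of null markers through stage 1."""
    return STD_NORMAL.inv_cdf(1.0 - pi_markers / 2.0)


def joint_statistic(z1, z2, pi_samples):
    """Assumed joint statistic: z-scores weighted by the square root of each stage's sample share."""
    return sqrt(pi_samples) * z1 + sqrt(1.0 - pi_samples) * z2


def null_joint_tail(c1, c_joint, pi_samples, steps=2000, upper=12.0):
    """P(|z1| > c1 and |z_joint| > c_joint) under the null, by midpoint integration over z1."""
    w1, w2 = sqrt(pi_samples), sqrt(1.0 - pi_samples)
    h = (upper - c1) / steps
    total = 0.0
    for i in range(steps):
        z1 = c1 + (i + 0.5) * h
        # Given z1, the joint statistic is normal with mean w1 * z1 and sd w2.
        above = 1.0 - STD_NORMAL.cdf((c_joint - w1 * z1) / w2)
        below = STD_NORMAL.cdf((-c_joint - w1 * z1) / w2)
        total += STD_NORMAL.pdf(z1) * (above + below) * h
    # The region z1 < -c1 contributes the same amount by symmetry.
    return 2.0 * total


def joint_threshold(pi_samples, pi_markers, alpha_per_marker, iterations=50):
    """Bisection for the joint cutoff whose null tail probability equals alpha_per_marker."""
    if not 0.0 < alpha_per_marker < pi_markers:
        raise ValueError("alpha_per_marker must lie strictly between 0 and pi_markers")
    c1 = stage1_threshold(pi_markers)
    low, high = 0.0, 12.0
    for _ in range(iterations):
        mid = (low + high) / 2.0
        if null_joint_tail(c1, mid, pi_samples) > alpha_per_marker:
            low = mid  # still too many false positives, so raise the cutoff
        else:
            high = mid
    return (low + high) / 2.0


if __name__ == "__main__":
    # Illustrative inputs only: 30% of samples in stage 1, 1% of markers followed up.
    c1 = stage1_threshold(0.01)
    c_joint = joint_threshold(0.30, 0.01, 1e-6)
    print(f"stage 1 cutoff: {c1:.3f}, joint cutoff: {c_joint:.3f}")
```

Under these assumptions, a marker is declared significant in the joint analysis only if its stage 1 z-score passes `stage1_threshold` and its combined statistic passes `joint_threshold`. A replication-based analysis would instead test the stage 2 z-score alone against a threshold corrected for the number of markers followed up.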
Risch, N. & Merikangas, K. The future of genetic studies of complex human diseases. Science 273, 1516–1517 (1996).
Hirschhorn, J.N. & Daly, M.J. Genome-wide association studies for common diseases and complex traits. Nat. Rev. Genet. 6, 95–108 (2005).
Kruglyak, L. Prospects for whole-genome linkage disequilibrium mapping of common disease genes. Nat. Genet. 22, 139–144 (1999).
Cardon, L.R. & Bell, J.I. Association study designs for complex diseases. Nat. Rev. Genet. 2, 91–99 (2001).
Klein, R.J. et al. Complement factor H polymorphism in age-related macular degeneration. Science 308, 385–389 (2005).
Sachidanandam, R. et al. A map of human genome sequence variation containing 1.42 million single nucleotide polymorphisms. Nature 409, 928–933 (2001).
The International HapMap Consortium. The International HapMap Project. Nature 426, 789–796 (2003).
Hinds, D.A. et al. Whole-genome patterns of common DNA variation in three human populations. Science 307, 1072–1079 (2005).
Johnson, G.C.L. et al. Haplotype tagging for the identification of common disease genes. Nat. Genet. 29, 233–237 (2001).
Ke, X. & Cardon, L.R. Efficient selective screening of haplotype tag SNPs. Bioinformatics 19, 287–288 (2003).
Stram, D.O. et al. Choosing haplotype-tagging SNPS based on unphased genotype data using a preliminary sample of unrelated subjects with an example from the multiethnic cohort study. Hum. Hered. 55, 27–36 (2003).
Satagopan, J.M., Venkatraman, E.S. & Begg, C.B. Two-stage designs for gene-disease association studies with sample size constraints. Biometrics 60, 589–597 (2004).
Satagopan, J.M., Verbel, D.A., Venkatraman, E.S., Offit, K.E. & Begg, C.B. Two-stage designs for gene-disease association studies. Biometrics 58, 163–170 (2002).
Thomas, D., Xie, R.R. & Gebregziabher, M. Two-stage sampling designs for gene association studies. Genet. Epidemiol. 27, 401–414 (2004).
Devlin, B., Roeder, K. & Wasserman, L. Genomic control, a new approach to genetic-based association studies. Theor. Popul. Biol. 60, 155–166 (2001).
Hinds, D.A. et al. Matching strategies for genetic association studies in structured populations. Am. J. Hum. Genet. 74, 317–325 (2004).
Pritchard, J.K. & Donnelly, P. Case-control studies of association in structured or admixed populations. Theor. Popul. Biol. 60, 227–237 (2001).
Ripatti, S., Pitkaniemi, J. & Sillanpaa, M.J. Joint modeling of genetic association and population stratification using latent class models. Genet. Epidemiol. 21, S409–S414 (2001).
Satten, G.A., Flanders, W.D. & Yang, Q.H. Accounting for unmeasured population substructure in case-control studies of genetic association using a novel latent-class model. Am. J. Hum. Genet. 68, 466–477 (2001).
Shmulewitz, D., Zhang, J.Y. & Greenberg, D.A. Case-control association studies in mixed populations: Correcting using genomic control. Hum. Hered. 58, 145–153 (2004).
Yang, B.Z., Zhao, H.Y., Kranzler, H.R. & Gelernter, J. Practical population group assignment with selected informative markers: characteristics and properties of Bayesian clustering via STRUCTURE. Genet. Epidemiol. 28, 302–312 (2005).
This research was supported by the US National Institutes of Health.
The authors declare no competing financial interests.
Significance thresholds and power of joint and replication-based analysis for a number of two-stage genome-wide association designs. (PDF 219 kb)
Power of joint and replication-based analysis for a number of two-stage genome-wide association designs. (PDF 190 kb)
Cite this article
Skol, A., Scott, L., Abecasis, G. et al. Joint analysis is more efficient than replication-based analysis for two-stage genome-wide association studies. Nat Genet 38, 209–213 (2006). https://doi.org/10.1038/ng1706