Estimation of effect size distribution from genome-wide association studies and implications for future discoveries

Journal name:
Nature Genetics


We report a set of tools to estimate the number of susceptibility loci and the distribution of their effect sizes for a trait on the basis of discoveries from existing genome-wide association studies (GWASs). We propose statistical power calculations for future GWASs using estimated distributions of effect sizes. Using reported GWAS findings for height, Crohn's disease and breast, prostate and colorectal (BPC) cancers, we determine that each of these traits is likely to harbor additional loci within the spectrum of low-penetrance common variants. These loci, which can be identified from sufficiently powerful GWASs, together could explain at least 15–20% of the known heritability of these traits. However, for BPC cancers, which have modest familial aggregation, our analysis suggests that risk models based on common variants alone will have limited discriminatory power (area under the curve of 63.5%), even with new discoveries.
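The link between observed discoveries, the underlying number of loci and the power of a future study can be illustrated with a short sketch. It is not the authors' implementation: it assumes a two-sided 1-degree-of-freedom test whose noncentrality is approximately the sample size times the fraction of variance explained by the locus, and it corrects for detection bias by weighting each observed locus by the inverse of its power in the discovery study. The function names, the significance threshold and the effect sizes in the usage section are all hypothetical.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal distribution (mean 0, sd 1)


def gwas_power(variance_explained, n_samples, alpha=1e-7):
    """Approximate power to detect one locus with a two-sided 1-df test.

    Assumption: the test statistic is normal with mean
    sqrt(n_samples * variance_explained) and unit variance.
    """
    z_crit = STD_NORMAL.inv_cdf(1 - alpha / 2)
    shift = (n_samples * variance_explained) ** 0.5
    # Probability that the statistic exceeds the threshold in either tail.
    return STD_NORMAL.cdf(shift - z_crit) + STD_NORMAL.cdf(-shift - z_crit)


def estimate_loci_counts(observed_effects, n_discovery, alpha=1e-7):
    """Weight each observed locus by 1 / power in the discovery study.

    Returns (effect size, estimated number of loci of that size) pairs.
    """
    return [(es, 1.0 / gwas_power(es, n_discovery, alpha))
            for es in observed_effects]


def expected_discoveries(loci_counts, n_future, alpha=1e-7):
    """Expected number of loci a future study of size n_future would detect."""
    return sum(count * gwas_power(es, n_future, alpha)
               for es, count in loci_counts)


if __name__ == "__main__":
    # Hypothetical effect sizes (fraction of variance explained), not from the paper.
    observed = [0.004, 0.002, 0.001, 0.0005]
    counts = estimate_loci_counts(observed, n_discovery=20_000)
    print("estimated total loci:", sum(count for _, count in counts))
    print("expected at n = 50,000:", expected_discoveries(counts, 50_000))
```

Loci with small effects receive large weights because the discovery study had little power to find them, which is why curves based only on observed loci understate the number of small-effect loci.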

At a glance


  Figure 1: Nonparametric estimates for distributions of effect sizes for susceptibility loci.

    (a) Curves based only on observed susceptibility loci; these curves are distorted because loci with larger effect sizes are more likely to have been detected. (b) Curves based on estimated susceptibility loci, representative of the population of all susceptibility loci. (c) Estimated nonparametric distributions after normalization over the common observed range for the three traits.

  Figure 2: Receiver operating characteristic curves for genetic risk models.

    (a,b) Curves for Crohn's disease (a) and BPC cancers (b). AUC is a measure of the discriminatory power of the risk model. Blue, a theoretical genetic risk model that explains all of the known familial risk of the trait. Green, a risk model that includes all of the susceptibility loci (142 for Crohn's disease and 67 on average for BPC cancers) estimated to exist within the range of effect sizes seen in the current GWASs. Red, a risk model that includes only known susceptibility loci (~30 for Crohn's disease and ~7 on average for each of the BPC cancers), which we used to estimate the distribution of effect sizes of these traits. Black, reference line corresponding to a model without discriminatory power in which cases have the same distribution of risk as controls.
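As a rough guide to how AUC relates to the amount of genetic variation a risk model captures, the following sketch uses a standard approximation rather than the authors' own calculation. It assumes a rare disease and a log relative risk that is normally distributed among controls with variance `variance`; under those assumptions the log relative risk among cases is normal with the same variance and a mean shifted upward by `variance`, so AUC equals the standard normal distribution function evaluated at the square root of half the variance. Both function names are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal distribution (mean 0, sd 1)


def auc_from_log_risk_variance(variance):
    """AUC of a risk model whose log relative risk has the given variance.

    Assumption: rare disease, log relative risk normal in controls, so the
    case distribution is the control distribution shifted by `variance`.
    """
    return STD_NORMAL.cdf(sqrt(variance / 2.0))


def log_risk_variance_for_auc(auc):
    """Inverse of auc_from_log_risk_variance, valid for 0.5 <= auc < 1."""
    return 2.0 * STD_NORMAL.inv_cdf(auc) ** 2


if __name__ == "__main__":
    # Variance of log relative risk implied by a few AUC values.
    for auc in (0.55, 0.635, 0.80):
        print(auc, log_risk_variance_for_auc(auc))
```

A variance of zero gives an AUC of 0.5, which corresponds to the black reference line in the figure.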



Author information


  1. Division of Cancer Epidemiology and Genetics, National Cancer Institute, National Institutes of Health, US Department of Health and Human Services, Rockville, Maryland, USA.

    • Ju-Hyun Park,
    • Sholom Wacholder,
    • Mitchell H Gail,
    • Stephen J Chanock &
    • Nilanjan Chatterjee
  2. Fred Hutchinson Cancer Research Center, Seattle, Washington, USA.

    • Ulrike Peters
  3. Core Genotyping Facility, National Cancer Institute, National Institutes of Health, US Department of Health and Human Services, Gaithersburg, Maryland, USA.

    • Kevin B Jacobs &
    • Stephen J Chanock


J.-H.P. and N.C. developed the statistical methods and designed the analyses. J.-H.P. implemented the methods and carried out all analyses. N.C. and S.J.C. drafted the manuscript. S.W., M.H.G., K.B.J. and U.P. made important suggestions for presentation and interpretation of the results. All the authors participated in critically reviewing the paper and approved the final version of the manuscript.

Competing financial interests

The authors declare no competing financial interests.


Supplementary information

PDF files

  1. Supplementary Text and Figures (548K)

    Supplementary Tables 1–7 and Supplementary Note.
