Statistical power and significance testing in large-scale genetic studies

Sham, Pak C.; Purcell, Shaun M.

doi:10.1038/nrg3706

Review Article
Published: 17 April 2014

Nature Reviews Genetics volume 15, pages 335–346 (2014)

Key Points

Significance testing, with appropriate multiple testing correction, is currently the most convenient method for summarizing the evidence for association between a disease and a genetic variant.
Inadequate statistical power increases not only the probability of missing genuine associations but also the probability that significant associations represent false-positive findings.
Statistical power declines rapidly with decreasing allele frequency and effect size, but it can be enhanced by increasing sample size and by selecting appropriate subjects (for example, family history positive cases and 'super normal' controls).
Exome sequencing studies can often identify the mutation responsible for a Mendelian disease by filtering out common variants, synonymous variants or variants that do not co-segregate with disease, and then assigning priority to the remaining variants using bioinformatic tools.
Adequate statistical power for rare-variant association analyses in complex diseases requires the aggregation of the effects of multiple rare variants within a defined portion of the genome (for example, a set of related genes).
Various computational tools are available for calculating the statistical power of genetic studies.
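
The dependence of power on allele frequency, effect size and sample size can be illustrated with a minimal sketch. It assumes a simple two-sided allelic test that compares allele frequencies in cases and controls under a normal approximation; the function names, the default threshold of 5e-8 and the example inputs are illustrative assumptions and are not taken from the article or from any particular power calculator.

```python
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal distribution (mean 0, sd 1)


def case_allele_frequency(control_freq, odds_ratio):
    """Risk-allele frequency in cases implied by an allelic odds ratio."""
    odds = odds_ratio * control_freq / (1.0 - control_freq)
    return odds / (1.0 + odds)


def allelic_test_power(control_freq, odds_ratio, n_cases, n_controls, alpha=5e-8):
    """Approximate power of a two-sided allelic test (normal approximation).

    Assumes 2 alleles per subject and independent alleles; alpha is the
    per-test significance threshold (5e-8 here is an illustrative default).
    """
    p0 = control_freq
    p1 = case_allele_frequency(p0, odds_ratio)
    # Standard error of the allele-frequency difference.
    se = (p1 * (1 - p1) / (2 * n_cases) + p0 * (1 - p0) / (2 * n_controls)) ** 0.5
    shift = abs(p1 - p0) / se           # expected test statistic under the alternative
    z_crit = _STD_NORMAL.inv_cdf(1 - alpha / 2)
    # Probability that the statistic falls in either rejection tail.
    return _STD_NORMAL.cdf(shift - z_crit) + _STD_NORMAL.cdf(-shift - z_crit)


if __name__ == "__main__":
    for freq in (0.30, 0.05, 0.01):
        for odds_ratio in (1.1, 1.2, 1.5):
            power = allelic_test_power(freq, odds_ratio, 5000, 5000)
            print(f"freq={freq:.2f} OR={odds_ratio:.1f} power={power:.3f}")
```

Running the loop with other sample sizes shows how quickly the required number of subjects grows as the allele becomes rarer or the odds ratio approaches 1.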

Abstract

Significance testing was developed as an objective method for summarizing statistical evidence for a hypothesis. It has been widely adopted in genetic studies, including genome-wide association studies and, more recently, exome sequencing studies. However, significance testing in both genome-wide and exome-wide studies must adopt stringent significance thresholds to allow for multiple testing, and it is useful only when studies have adequate statistical power, which depends on the characteristics of the phenotype and the putative genetic variant, as well as the study design. Here, we review the principles and applications of significance testing and power calculation, including recently proposed gene-based tests for rare variants.
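
To make the idea of a stringent threshold concrete, the following sketch computes per-test thresholds that hold the family-wise error rate at a chosen level. It assumes independent tests, and the figures of 1,000,000 tests for a genome-wide scan and 20,000 tests for a gene-based exome scan are illustrative assumptions, not values quoted from the text above; the function names are hypothetical.

```python
from math import expm1, log1p


def bonferroni_threshold(fwer, n_tests):
    """Per-test threshold that bounds the family-wise error rate by fwer."""
    return fwer / n_tests


def sidak_threshold(fwer, n_tests):
    """Per-test threshold giving exactly fwer when the tests are independent."""
    # Equivalent to 1 - (1 - fwer) ** (1 / n_tests), written to avoid round-off.
    return -expm1(log1p(-fwer) / n_tests)


def fwer_independent(per_test_alpha, n_tests):
    """Probability of at least one false positive among independent null tests."""
    # Equivalent to 1 - (1 - per_test_alpha) ** n_tests.
    return -expm1(n_tests * log1p(-per_test_alpha))


if __name__ == "__main__":
    for label, n_tests in (("genome-wide (assumed 1e6 tests)", 1_000_000),
                           ("gene-based exome (assumed 2e4 tests)", 20_000)):
        print(label,
              f"Bonferroni={bonferroni_threshold(0.05, n_tests):.2e}",
              f"Sidak={sidak_threshold(0.05, n_tests):.2e}")
    # Without correction, 1e6 independent tests at 0.05 almost surely yield a false positive.
    print(f"Uncorrected FWER: {fwer_independent(0.05, 1_000_000):.3f}")
```

Because real variants are correlated through linkage disequilibrium, the effective number of independent tests is usually smaller than the number of variants tested, so these thresholds are conservative in practice.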

**Figure 1: Posterior probability of H₀ given the critical significance level and the statistical power of a study, for different prior probabilities of H₀.**
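
The quantity named in the caption follows from Bayes' theorem: among significant results, the share that come from true null hypotheses is α·P(H₀) divided by α·P(H₀) + power·(1 − P(H₀)). The sketch below evaluates that expression; the function name and the example inputs are illustrative assumptions and do not reproduce the values plotted in the figure.

```python
def posterior_null_given_significant(alpha, power, prior_null):
    """P(H0 | significant result) from Bayes' theorem.

    alpha      : critical significance level, P(significant | H0)
    power      : P(significant | H1)
    prior_null : prior probability that H0 is true
    """
    for name, value in (("alpha", alpha), ("power", power), ("prior_null", prior_null)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must lie between 0 and 1")
    false_positive_mass = alpha * prior_null
    true_positive_mass = power * (1.0 - prior_null)
    total = false_positive_mass + true_positive_mass
    if total == 0.0:
        raise ValueError("a significant result has zero probability for these inputs")
    return false_positive_mass / total


if __name__ == "__main__":
    # Illustrative inputs only: a sceptical prior and two power levels.
    for power in (0.8, 0.2):
        value = posterior_null_given_significant(alpha=5e-8, power=power, prior_null=0.99999)
        print(f"power={power:.1f} -> P(H0 | significant) = {value:.4f}")
```

Holding the significance level and the prior fixed, lowering the power raises the posterior probability of H₀, which is why underpowered studies produce a larger share of false-positive significant findings.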

Acknowledgements

This work was supported by The University of Hong Kong Strategic Research Theme on Genomics; Hong Kong Research Grants Council (HKRGC) General Research Funds 777511M, 776412M and 776513M; HKRGC Theme-Based Research Scheme T12-705/11 and T12-708/12-N; and the European Community Seventh Framework Programme Grant on European Network of National Schizophrenia Networks Studying Gene–Environment Interactions (EU-GEI); and the US National Institutes of Health grants R01 MH099126 and R01 HG005827 (to S.M.P.). The authors thank R. Porsch and S.-W. Choi for technical assistance with the manuscript.

Author information

Authors and Affiliations

Department of Psychiatry, Centre for Genomic Sciences, Jockey Club Building for Interdisciplinary Research; State Key Laboratory of Brain and Cognitive Sciences, Li Ka Shing Faculty of Medicine, The University of Hong Kong
Pak C. Sham
Department of Psychiatry, State Key Laboratory of Brain and Cognitive Sciences, Li Ka Shing Faculty of Medicine, The University of Hong Kong, Hong Kong SAR, China
Pak C. Sham
Center for Statistical Genetics, Icahn School of Medicine at Mount Sinai, New York, 10029–6574, USA
Shaun M. Purcell
Center for Human Genetic Research, Massachusetts General Hospital and Harvard Medical School, Boston, 02114, Massachusetts, USA
Shaun M. Purcell

Corresponding author

Correspondence to Pak C. Sham.

Ethics declarations

Competing interests

The authors declare no competing financial interests.

Glossary

Likelihoods: Probabilities (or probability densities) of observed data under an assumed statistical model as a function of model parameters.
Family-wise error rate (FWER): The probability of at least one false-positive significant finding from a family of multiple tests when the null hypothesis is true for all the tests.
C-alpha test: A rare-variant association test based on the distribution of variants in cases and controls (that is, whether such a distribution has inflated variance compared with a binomial distribution).
Sequence kernel association test (SKAT): A test based on score statistics for testing the association of rare variants from sequence data with either a continuous or a discontinuous genetic trait.
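
The C-alpha entry above describes a test for excess variance relative to a binomial distribution. A minimal sketch of one common form of such a statistic is given below: for each rare variant, the squared deviation of the case count from its binomial expectation is compared with the binomial variance, and the sum is standardized by its null standard deviation. The function names and the toy counts are illustrative assumptions, and refinements described in the literature (such as special handling of singleton variants or permutation-based P values) are omitted.

```python
from math import comb, sqrt
from statistics import NormalDist


def c_alpha_z(case_counts, total_counts, case_fraction):
    """Z score for over-dispersion of rare-variant copies between cases and controls.

    case_counts[i]  : copies of rare variant i observed in cases
    total_counts[i] : copies of rare variant i observed in cases plus controls
    case_fraction   : expected share of copies in cases under the null (p0)
    """
    p0 = case_fraction
    q0 = 1.0 - p0
    statistic = 0.0
    variance = 0.0
    for y, n in zip(case_counts, total_counts):
        binomial_var = n * p0 * q0
        # Observed squared deviation minus its expectation under Binomial(n, p0).
        statistic += (y - n * p0) ** 2 - binomial_var
        # Null variance of that term, summed over all possible case counts u.
        for u in range(n + 1):
            prob = comb(n, u) * p0 ** u * q0 ** (n - u)
            variance += ((u - n * p0) ** 2 - binomial_var) ** 2 * prob
    if variance <= 0.0:
        raise ValueError("null variance is zero; the statistic is undefined for this input")
    return statistic / sqrt(variance)


def one_sided_p_value(z):
    """Upper-tail P value, since only inflated variance counts as evidence."""
    return 1.0 - NormalDist().cdf(z)


if __name__ == "__main__":
    # Toy data (hypothetical): variants concentrated in cases or in controls.
    z = c_alpha_z([10, 0, 9, 1], [10, 10, 10, 10], case_fraction=0.5)
    print(f"Z = {z:.2f}, one-sided P = {one_sided_p_value(z):.2e}")
```

Because the statistic squares each deviation, variants enriched in cases and variants enriched in controls both contribute positively, which is what lets this kind of test detect a mixture of risk and protective variants.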

About this article

Cite this article

Sham, P., Purcell, S. Statistical power and significance testing in large-scale genetic studies. Nat Rev Genet 15, 335–346 (2014). https://doi.org/10.1038/nrg3706

Published: 17 April 2014
Issue Date: May 2014
DOI: https://doi.org/10.1038/nrg3706
