Protocol | Published:

Basic statistical analysis in genetic case-control studies

Nature Protocols volume 6, pages 121–133 (2011)

Abstract

This protocol describes how to perform basic statistical analysis in a population-based genetic association case-control study. The steps described involve the (i) appropriate selection of measures of association and relevance of disease models; (ii) appropriate selection of tests of association; (iii) visualization and interpretation of results; (iv) consideration of appropriate methods to control for multiple testing; and (v) replication strategies. Assuming no previous experience with software such as PLINK, R or Haploview, we describe how to use these popular tools for handling single-nucleotide polymorphism data in order to carry out tests of association and visualize and interpret results. This protocol assumes that data quality assessment and control has been performed, as described in a previous protocol, so that samples and markers deemed to have the potential to introduce bias to the study have been identified and removed. Study design, marker selection and quality control of case-control studies have also been discussed in earlier protocols. The protocol should take 1 h to complete.
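As a rough illustration of items (i), (ii) and (iv) above, the sketch below computes an allelic odds ratio, a 1-degree-of-freedom chi-square test of association, and a Bonferroni-adjusted significance threshold for a single SNP. It is not the protocol's own PLINK or R workflow: the function names, the allele counts and the number of tests are hypothetical and chosen only for illustration.

```python
import math


def allelic_test(case_counts, control_counts):
    """Allelic chi-square test (1 df) and odds ratio from allele counts.

    Each argument is a (minor_allele_count, major_allele_count) pair.
    """
    a, b = case_counts      # minor, major allele counts in cases
    c, d = control_counts   # minor, major allele counts in controls
    n = a + b + c + d
    # Pearson chi-square for a 2x2 table, without continuity correction
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # Upper tail of a chi-square with 1 df: P(X > x) = erfc(sqrt(x / 2))
    p_value = math.erfc(math.sqrt(chi2 / 2))
    # Odds of carrying the minor allele in cases relative to controls
    odds_ratio = (a * d) / (b * c)
    return chi2, p_value, odds_ratio


def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold under a Bonferroni correction."""
    return alpha / n_tests


# Hypothetical allele counts: 500 cases and 500 controls (1,000 alleles each)
chi2, p_value, odds_ratio = allelic_test((240, 760), (180, 820))
# Hypothetical multiple-testing burden of one million SNPs
threshold = bonferroni_threshold(0.05, 1_000_000)

print(f"chi2 = {chi2:.2f}, P = {p_value:.2e}, OR = {odds_ratio:.2f}")
print(f"Bonferroni threshold = {threshold:.1e}, significant: {p_value < threshold}")
```

The allelic test is only one of several possible choices; it treats the two alleles of each individual as independent observations, and other disease models call for genotypic or trend tests instead.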

Acknowledgements

G.M.C. is funded by the Wellcome Trust. F.H.P. is funded by the Wellcome Trust. C.A.A. is funded by the Wellcome Trust (WT91745/Z/10/Z). A.P.M. is supported by a Wellcome Trust Senior Research Fellowship. K.T.Z. is supported by a Wellcome Trust Research Career Development Fellowship.

Author information

Affiliations

  1. Genetic and Genomic Epidemiology Unit, Wellcome Trust Centre for Human Genetics, University of Oxford, Oxford, UK.

    • Geraldine M Clarke
    • Fredrik H Pettersson
    • Andrew P Morris
    • Krina T Zondervan
  2. Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, UK.

    • Carl A Anderson
  3. GlaxoSmithKline, King of Prussia, Pennsylvania, USA.

    • Lon R Cardon

Authors

  1. Geraldine M Clarke

  2. Carl A Anderson

  3. Fredrik H Pettersson

  4. Lon R Cardon

  5. Andrew P Morris

  6. Krina T Zondervan

Contributions

G.M.C. wrote the first draft of the manuscript, wrote scripts and performed analyses. G.M.C., C.A.A., A.P.M. and K.T.Z. revised the manuscript and designed the protocol. L.R.C. conceived the protocol.

Competing interests

The authors declare no competing financial interests.

Corresponding author

Correspondence to Geraldine M Clarke.

Supplementary information

Zip files

  1. Supplementary Data 1

    Example genome-wide association (GWA) data.

  2. Supplementary Data 2

    Example candidate gene (CG) data.

About this article

Publication history

Published

DOI

https://doi.org/10.1038/nprot.2010.182
