  • Technical Report

A robust statistical method for case-control association testing with copy number variation


Copy number variation (CNV) is pervasive in the human genome and can play a causal role in genetic diseases. The functional impact of CNV cannot be fully captured through linkage disequilibrium with SNPs. These observations motivate the development of statistical methods for performing direct CNV association studies. We show through simulation that current tests for CNV association are prone to false-positive associations in the presence of differential errors between cases and controls, especially if quantitative CNV measurements are noisy. We present a statistical framework for performing case-control CNV association studies that applies likelihood ratio testing of quantitative CNV measurements in cases and controls. We show that our methods are robust to differential errors and noisy data and can achieve maximal theoretical power. We illustrate the power of these methods for testing for association with binary and quantitative traits, and have made this software available as the R package CNVtools.
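The idea of likelihood ratio testing on quantitative CNV measurements can be illustrated with a deliberately simplified sketch. This is not the authors' model and does not use the CNVtools API. It assumes a two-component Gaussian mixture whose cluster means and common standard deviation are known in advance (the `means` and `sd` defaults are hypothetical), and it tests only whether the mixing proportion differs between cases and controls. All function names are illustrative.

```python
import math
import random


def normal_pdf(x, mean, sd):
    """Density of a normal distribution at x."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def fit_mixing_weight(signals, means, sd, n_iter=50):
    """EM estimate of the weight of the second copy-number class.

    Cluster means and sd are held fixed (an assumption of this sketch).
    Returns (weight, maximised log-likelihood).
    """
    w = 0.5
    for _ in range(n_iter):
        # E-step: posterior probability that each sample is in class 2.
        post = []
        for x in signals:
            p1 = (1.0 - w) * normal_pdf(x, means[0], sd)
            p2 = w * normal_pdf(x, means[1], sd)
            post.append(p2 / (p1 + p2))
        # M-step: the weight is the mean posterior, kept inside (0, 1).
        w = min(max(sum(post) / len(post), 1e-9), 1.0 - 1e-9)
    loglik = sum(
        math.log((1.0 - w) * normal_pdf(x, means[0], sd)
                 + w * normal_pdf(x, means[1], sd))
        for x in signals
    )
    return w, loglik


def lrt_association(cases, controls, means=(1.0, 2.0), sd=0.25):
    """1-d.f. likelihood ratio test on the raw signals, with no hard calls.

    Null: one mixing weight shared by cases and controls.
    Alternative: separate weights for cases and controls.
    """
    _, ll_null = fit_mixing_weight(cases + controls, means, sd)
    _, ll_cases = fit_mixing_weight(cases, means, sd)
    _, ll_controls = fit_mixing_weight(controls, means, sd)
    stat = max(0.0, 2.0 * (ll_cases + ll_controls - ll_null))
    # Upper tail of a chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value


def simulate(n, freq, means=(1.0, 2.0), sd=0.25, rng=random):
    """Draw n noisy signals; freq is the frequency of the second class."""
    return [
        rng.gauss(means[1] if rng.random() < freq else means[0], sd)
        for _ in range(n)
    ]


if __name__ == "__main__":
    rng = random.Random(1)
    cases = simulate(500, 0.35, rng=rng)
    controls = simulate(500, 0.25, rng=rng)
    stat, p_value = lrt_association(cases, controls)
    print(f"LRT statistic = {stat:.3f}, p-value = {p_value:.3g}")
```

Because the test works on the mixture likelihood rather than on thresholded copy-number calls, uncertainty in cluster membership enters the statistic directly. A fuller treatment would also estimate the cluster means and variances instead of fixing them.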


Figure 1: Example of CNV data showing poor clustering quality and differential errors.
Figure 2: Methods for performing CNV-association testing.
Figure 3: Modelling the dependency between copy number and disease.
Figure 4: Sensitivity of 1-d.f. association testing methods to clustering quality and differential errors between cases and controls in simulated data.
Figure 5: Statistical power of the likelihood ratio trend test.
Figure 6: Examples of empirical CNV associations.




C.B., T.F., R.R. and M.E.H. are funded by the Wellcome Trust (WT), J.M. is funded by the WT and the National Institute of General Medical Sciences, V.P. is supported by a Juvenile Diabetes Research Foundation (JDRF) fellowship, and D.C. is supported by a JDRF/WT fellowship. The authors would like to thank the Wellcome Trust Case Control Consortium, D. Conrad, A. Moses, N. Carter, M. Dermitzakis, B. Stranger, J. Armour and E. Hollox for data access and helpful discussions.

Author information


Corresponding author

Correspondence to Matthew E Hurles.

Supplementary information

Supplementary Text and Figures

Supplementary Figures 1–5, Supplementary Table 1, Supplementary Methods (PDF 334 kb)

About this article

Cite this article

Barnes, C., Plagnol, V., Fitzgerald, T. et al. A robust statistical method for case-control association testing with copy number variation. Nat Genet 40, 1245–1252 (2008).
