cnvHap: an integrative population and haplotype–based multiplatform model of SNPs and CNVs

Although genome-wide association studies have uncovered single-nucleotide polymorphisms (SNPs) associated with complex disease, these variants account for only a small portion of heritability. Some of this 'missing heritability' may come from copy-number variants (CNVs), in particular rare CNVs, but assessing this contribution remains challenging because CNVs, particularly small variants, are difficult to genotype accurately. We report a population-based approach for the identification of CNVs that integrates data from multiple samples and platforms. Our algorithm, cnvHap, jointly learns a chromosome-wide haplotype model of CNVs and cluster-based models of allele intensity at each probe. Using data for 50 French individuals assayed on four separate platforms, we found that cnvHap correctly detected at least 14% more deleted and 50% more amplified genotypes than PennCNV or QuantiSNP, with improvements of 82% and 115% for aberrations containing fewer than 10 probes. Combining data from multiple platforms further improved sensitivity.

Figure 1: Schematic flow chart showing cnvHap operation.
Figure 2: Visualization of cnvHap population model on chromosome 1 for integrated Illumina 1M and Agilent 244k datasets.
Figure 3: CNV predictions for chromosome 1, 25.452–25.536 Mb.
Figure 4: Cumulative frequency of squared Pearson's correlation coefficient between predicted and benchmark copy number calls.
Figure 5: ROC curves for detecting CNVs using Illumina 1M data.
Figure 6: Combining datasets improved sensitivity.
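
Figure 4 summarizes agreement between predicted and benchmark copy number calls as a squared Pearson's correlation coefficient. The sketch below illustrates that metric only; it is not taken from the cnvHap software, and the function name `squared_pearson` and the example copy numbers are hypothetical. Returning 0.0 when either series is constant is an assumption made here to avoid division by zero.

```python
def squared_pearson(predicted, benchmark):
    """Squared Pearson correlation (r^2) between two equal-length series.

    Returns 0.0 if either series has zero variance (an assumption of this
    sketch; the correlation is undefined in that case).
    """
    if len(predicted) != len(benchmark):
        raise ValueError("series must have the same length")
    n = len(predicted)
    if n < 2:
        raise ValueError("need at least two paired calls")

    mean_p = sum(predicted) / n
    mean_b = sum(benchmark) / n

    # Sums of centered cross-products and centered squares.
    cov = sum((p - mean_p) * (b - mean_b) for p, b in zip(predicted, benchmark))
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    var_b = sum((b - mean_b) ** 2 for b in benchmark)

    if var_p == 0 or var_b == 0:
        return 0.0
    return (cov * cov) / (var_p * var_b)


# Hypothetical copy number calls for six samples at one locus.
predicted_calls = [2, 2, 1, 3, 2, 0]
benchmark_calls = [2, 2, 1, 3, 2, 1]
r2 = squared_pearson(predicted_calls, benchmark_calls)
print(f"r^2 = {r2:.3f}")
```

Because the coefficient is squared, it rewards calls that track the benchmark linearly regardless of sign or scale, so it measures concordance in the pattern of copy number across samples rather than exact equality of calls.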



We thank D. Serre, A. Montpetit and D. Vincent for advice concerning Illumina arrays and D. Peiffer (Illumina) for providing genotype data on HapMap samples. Genome Canada and Genome Quebec funded genotyping on the Illumina Human1M platform. L.J.M.C. is funded by a Research Council UK fellowship. J.E.A. is supported by the Medical Research Council. R.G.W. is supported by Johnson & Johnson and the South East England Development Agency. J.S.E.-S.M. is supported by an Imperial College Division of Medicine PhD studentship.

Author information

L.J.M.C. designed the project with A.I.F.B., developed the cnvHap algorithm and software, analyzed data and wrote the paper. J.E.A. ran cnvPartition, PennCNV and QuantiSNP on the data and helped write the paper. R.G.W. and J.S.E.-S.M. provided critical comments and helped to write the paper. D.J.B. provided statistical advice. R.S. provided SNP genotype data, advised on its interpretation and edited the paper. A.J.d.S. provided aCGH data and advised on its interpretation. P.F. provided the DNA samples and coordinated the SNP genotyping. A.I.F.B. designed the project with L.J.M.C., coordinated the aCGH analysis, contributed to writing the paper and oversaw the project.

Correspondence to Lachlan J M Coin.

Ethics declarations

Competing interests

The authors declare no competing financial interests.

Supplementary information

Supplementary Text and Figures

Supplementary Figures 1–9, Supplementary Tables 1–3 and Supplementary Note 1 (PDF 1513 kb)

Supplementary Software

Software, documentation and an example. (ZIP 9805 kb)

About this article

Cite this article

Coin, L., Asher, J., Walters, R. et al. cnvHap: an integrative population and haplotype–based multiplatform model of SNPs and CNVs. Nat Methods 7, 541–546 (2010). doi:10.1038/nmeth.1466
