Technical Report

Integrated genotype calling and association analysis of SNPs, common copy number polymorphisms and rare CNVs

Nature Genetics volume 40, pages 1253–1260 (2008)

Abstract

Accurate and complete measurement of single nucleotide (SNP) and copy number (CNV) variants, both common and rare, will be required to understand the role of genetic variation in disease. We present Birdsuite, a four-stage analytical framework instantiated in software for deriving integrated and mutually consistent copy number and SNP genotypes. The method sequentially assigns copy number across regions of common copy number polymorphisms (CNPs), calls genotypes of SNPs, identifies rare CNVs via a hidden Markov model (HMM), and generates an integrated sequence and copy number genotype at every locus (for example, including genotypes such as A-null, AAB and BBB in addition to AA, AB and BB calls). Such genotypes more accurately depict the underlying sequence of each individual, reducing the rate of apparent mendelian inconsistencies. The Birdsuite software is applied here to data from the Affymetrix SNP 6.0 array. Additionally, we describe a method, implemented in PLINK, to utilize these combined SNP and CNV genotypes for association testing with a phenotype.
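
As an illustration of why integrated genotypes can reduce apparent mendelian inconsistencies, the following is a minimal sketch and not Birdsuite's algorithm. It assumes a deletion-only model in which each chromosome carries either one allele (A or B) or no copy of the locus, and it assumes that a genotyper unaware of copy number would tend to report a hemizygous A-null sample as AA. The tuple representation and the function names diploid_view and is_mendelian_consistent are hypothetical, introduced only for this example.

```python
from itertools import product

NULL = "-"  # marks a chromosome on which the locus is deleted (assumed notation)


def diploid_view(genotype):
    """Genotype a copy-number-unaware caller would tend to report (assumption)."""
    alleles = [a for a in genotype if a != NULL]
    if not alleles:
        return None  # no copies present: treated here as a no-call
    if len(alleles) == 1:
        return (alleles[0], alleles[0])  # e.g. A-null is read as AA
    return tuple(sorted(alleles))


def is_mendelian_consistent(child, father, mother):
    """True if the child can be formed from one chromosome of each parent."""
    target = sorted(child)
    return any(sorted(pair) == target for pair in product(father, mother))


# Hypothetical trio: the father carries a deletion and transmits it.
father = ("A", NULL)  # integrated genotype A-null
mother = ("B", "B")   # integrated genotype BB
child = (NULL, "B")   # integrated genotype B-null

# Under a diploid-only view the trio is read as AA x BB -> BB.
naive_child, naive_father, naive_mother = (
    diploid_view(g) for g in (child, father, mother)
)
apparent_error = not is_mendelian_consistent(naive_child, naive_father, naive_mother)

# With integrated genotypes the same trio is consistent.
integrated_ok = is_mendelian_consistent(child, father, mother)

print("apparent inconsistency under diploid calls:", apparent_error)
print("consistent under integrated calls:", integrated_ok)
```

In this toy trio the diploid-only calls look like an inheritance error, while the integrated calls show the child simply inherited the father's deleted chromosome. Duplications (for example AAB or BBB) would need a richer representation than the two-slot tuple used here.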

Acknowledgements

We wish to thank G. Getz for discussions on algorithms and comments regarding the supplemental methods. We also thank E. Lander and J. Hirschhorn for their readings and feedback. Finally, we are indebted to the testing labs that provided us with many replicates of HapMap samples run on the Affymetrix SNP 6.0 array. S.A.M. was supported by a Lilly Life Sciences Research Fellowship.

Author information

Author notes

    Joshua M Korn & Finny G Kuruvilla

    These authors contributed equally to this work.

Affiliations

  1. Broad Institute of Harvard and Massachusetts Institute of Technology, Cambridge, Massachusetts 02142, USA.

    Joshua M Korn, Finny G Kuruvilla, Steven A McCarroll, Alec Wysoker, James Nemesh, Marcia M Nizzari, Stacey B Gabriel, Shaun Purcell, Mark J Daly & David Altshuler

  2. Harvard-MIT Division of Health Sciences and Technology, Cambridge, Massachusetts 02139, USA.

    Joshua M Korn

  3. Graduate Program in Biophysics, Harvard University, Cambridge, Massachusetts 02138, USA.

    Joshua M Korn

  4. Department of Molecular Biology, Massachusetts General Hospital, Boston, Massachusetts 02114, USA.

    Joshua M Korn, Finny G Kuruvilla, Steven A McCarroll & David Altshuler

  5. Center for Human Genetic Research, Massachusetts General Hospital, Boston, Massachusetts 02114, USA.

    Joshua M Korn, Finny G Kuruvilla, Steven A McCarroll, Shaun Purcell, Mark J Daly & David Altshuler

  6. Department of Pathology, Brigham & Women's Hospital, Boston, Massachusetts 02115, USA.

    Finny G Kuruvilla

  7. Affymetrix, Inc., Santa Clara, California 95051, USA.

    Simon Cawley, Earl Hubbell, Jim Veitch & Patrick J Collins

  8. Department of Pathology, Harvard Medical School, Boston, Massachusetts 02115, USA.

    Katayoon Darvishi & Charles Lee

  9. Department of Medicine, Harvard Medical School, Boston, Massachusetts 02115, USA.

    Mark J Daly & David Altshuler

Authors

  1. Joshua M Korn
  2. Finny G Kuruvilla
  3. Steven A McCarroll
  4. Alec Wysoker
  5. James Nemesh
  6. Simon Cawley
  7. Earl Hubbell
  8. Jim Veitch
  9. Patrick J Collins
  10. Katayoon Darvishi
  11. Charles Lee
  12. Marcia M Nizzari
  13. Stacey B Gabriel
  14. Shaun Purcell
  15. Mark J Daly
  16. David Altshuler

Contributions

J.M.K., F.G.K., S.A.M., M.J.D. and D.A. conceived of and refined the four-stage structure of Birdsuite. S.A.M., F.G.K. and J.N. developed and implemented Canary. J.N., S.A.M. and J.M.K. validated Canary calls, using data provided by P.J.C., J.V. and S.C. J.M.K., F.G.K., A.W., S.C. and E.H. developed, implemented, tested and validated Birdseed. J.M.K. developed, implemented and validated Birdseye. A.W. implemented Fawkes, which J.N., A.W. and J.M.K. validated. J.N., A.W., M.M.N. and S.B.G. were responsible for integration of the components and supporting software. K.D., C.L., J.M.K. and S.A.M. compared Birdsuite to Nexus and Partek. S.P. implemented the association tools. J.M.K., F.G.K., S.A.M., S.P., M.J.D. and D.A. wrote the manuscript. Discussion among all authors led to improvements in the algorithms and their implementations.

Competing interests

S.C., E.H., J.V. and P.J.C. are employees of Affymetrix. The remaining authors (J.M.K., F.G.K., S.A.M., A.W., J.N., K.D., C.L., M.M.N., S.B.G., S.P., M.J.D. and D.A.) neither personally nor institutionally receive financial support from Affymetrix, and neither the authors nor their employers receive compensation or royalties from the work described in this article.

Corresponding authors

Correspondence to Joshua M Korn or David Altshuler.

Supplementary information

PDF files

  1. Supplementary Text and Figures

    Supplementary Figures 1 and 2, Supplementary Tables 1 and 2, Supplementary Note and Supplementary Methods

About this article

DOI

https://doi.org/10.1038/ng.237
