Abstract
Accurate and complete measurement of single nucleotide (SNP) and copy number (CNV) variants, both common and rare, will be required to understand the role of genetic variation in disease. We present Birdsuite, a four-stage analytical framework instantiated in software for deriving integrated and mutually consistent copy number and SNP genotypes. The method sequentially assigns copy number across regions of common copy number polymorphisms (CNPs), calls genotypes of SNPs, identifies rare CNVs via a hidden Markov model (HMM), and generates an integrated sequence and copy number genotype at every locus (for example, including genotypes such as A-null, AAB and BBB in addition to AA, AB and BB calls). Such genotypes more accurately depict the underlying sequence of each individual, reducing the rate of apparent mendelian inconsistencies. The Birdsuite software is applied here to data from the Affymetrix SNP 6.0 array. Additionally, we describe a method, implemented in PLINK, to utilize these combined SNP and CNV genotypes for association testing with a phenotype.
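To make the idea of an integrated sequence and copy number genotype concrete, the following is a minimal sketch that enumerates the allele-specific genotype labels possible at a biallelic SNP for a given total copy number. It is an illustration only, not the Birdsuite implementation: the function name `integrated_genotypes` is hypothetical, and the labels "null" (copy number 0) and "B-null" are assumed by analogy with the "A-null" example in the abstract.

```python
def integrated_genotypes(copy_number: int) -> list[str]:
    """Enumerate allele-specific genotype labels for a total copy number.

    Assumption: a biallelic SNP with alleles A and B. The labels "null"
    and "B-null" are illustrative, chosen by analogy with "A-null".
    """
    if copy_number < 0:
        raise ValueError("copy number must be non-negative")
    if copy_number == 0:
        # Homozygous deletion: no allele is present.
        return ["null"]
    if copy_number == 1:
        # Hemizygous deletion: one allele remains on a single chromosome.
        return ["A-null", "B-null"]
    # For n >= 2 copies, every split of n into a copies of A and n - a of B.
    return ["A" * a + "B" * (copy_number - a) for a in range(copy_number, -1, -1)]


if __name__ == "__main__":
    for cn in range(4):
        print(cn, integrated_genotypes(cn))
```

With two copies the function returns the familiar AA, AB and BB calls; with three copies it returns AAA, AAB, ABB and BBB, which a diploid-only genotype caller cannot represent.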
Acknowledgements
We wish to thank G. Getz for discussions on algorithms and comments regarding the supplemental methods. We also thank E. Lander and J. Hirschhorn for their readings and feedback. Finally, we are indebted to the testing labs that provided us with many replicates of HapMap samples run on the Affymetrix SNP 6.0 array. S.A.M. was supported by a Lilly Life Sciences Research Fellowship.
Author information
Contributions
J.M.K., F.G.K., S.A.M., M.J.D. and D.A. conceived of and refined the four-stage structure of Birdsuite. S.A.M., F.G.K. and J.N. developed and implemented Canary. J.N., S.A.M. and J.M.K. validated Canary calls, using data provided by P.J.C., J.V. and S.C. J.M.K., F.G.K., A.W., S.C. and E.H. developed, implemented, tested and validated Birdseed. J.M.K. developed, implemented and validated Birdseye. A.W. implemented Fawkes, which J.N., A.W. and J.M.K. validated. J.N., A.W., M.M.N. and S.B.G. were responsible for integration of the components and supporting software. K.D., C.L., J.M.K. and S.A.M. compared Birdsuite to Nexus and Partek. S.P. implemented the association tools. J.M.K., F.G.K., S.A.M., S.P., M.J.D. and D.A. wrote the manuscript. Discussion among all authors led to improvements in the algorithms and their implementations.
Ethics declarations
Competing interests
S.C., E.H., J.V. and P.J.C. are employees of Affymetrix. The remaining authors (J.M.K., F.G.K., S.A.M., A.W., J.N., K.D., C.L., M.M.N., S.B.G., S.P., M.J.D. and D.A.) neither personally nor institutionally receive financial support from Affymetrix, and neither the authors nor their employers receive compensation or royalties from the work described in this article.
Supplementary information
Supplementary Text and Figures
Supplementary Figures 1 and 2, Supplementary Tables 1 and 2, Supplementary Note and Supplementary Methods (PDF 350 kb)
About this article
Cite this article
Korn, J., Kuruvilla, F., McCarroll, S. et al. Integrated genotype calling and association analysis of SNPs, common copy number polymorphisms and rare CNVs. Nat Genet 40, 1253–1260 (2008). https://doi.org/10.1038/ng.237
DOI: https://doi.org/10.1038/ng.237