Structural variations of DNA greater than 1 kilobase in size account for most bases that vary among human genomes, but are still relatively under-ascertained. Here we use tiling oligonucleotide microarrays, comprising 42 million probes, to generate a comprehensive map of 11,700 copy number variations (CNVs) greater than 443 base pairs, of which most (8,599) have been validated independently. For 4,978 of these CNVs, we generated reference genotypes from 450 individuals of European, African or East Asian ancestry. The predominant mutational mechanisms differ among CNV size classes. Retrotransposition has duplicated and inserted some coding and non-coding DNA segments randomly around the genome. Furthermore, by correlation with known trait-associated single nucleotide polymorphisms (SNPs), we identified 30 loci with CNVs that are candidates for influencing disease susceptibility. Despite this, having assessed the completeness of our map and the patterns of linkage disequilibrium between CNVs and SNPs, we conclude that, for complex traits, the heritability void left by genome-wide association studies will not be accounted for by common CNVs.
The CNV discovery and CNV genotyping data are available at ArrayExpress (http://www.ebi.ac.uk/microarray-as/ae/) under accession numbers E-MTAB-40 and E-MTAB-142, respectively. Normalized CNV discovery data are available at http://www.sanger.ac.uk/humgen/cnv/42mio. CNVs are displayed at the Database of Genomic Variants (http://projects.tcag.ca/variation). CNV locations and genotypes are reported in Supplementary Tables 1 and 2.
We would like to thank A. Boyko, J. J. Emerson, J. Pickrell, S. Kudaravalli, J. Pritchard, T. Down, S. McCarroll, J. Collins, C. Beazley, M. Dermitzakis, P. Eis, T. Richmond, M. Hogan, D. Bailey, S. Giles, G. Speight, N. Sparkes, D. Peiffer, C. Chen, K. Li, P. Oeth, D. Stetson and D. Church for advice, sharing data, sharing software and technical assistance. We are grateful for the efforts and support of our colleagues at NimbleGen, Agilent, Illumina, Applied Biosystems and Sequenom. We thank J. Barrett for comments on an earlier version of the manuscript. The Centre for Applied Genomics at the Hospital for Sick Children and Wellcome Trust Sanger Institute are acknowledged for database, technical assistance and bioinformatics support. This research was supported by the Wellcome Trust (grant no. 077006/Z/05/Z; to M.E.H., N.P.C., C.T.-S.), Canada Foundation of Innovation and Ontario Innovation Trust (to S.W.S.), Canadian Institutes of Health Research (CIHR) (to S.W.S.), Genome Canada/Ontario Genomics Institute (to S.W.S.), the McLaughlin Centre for Molecular Medicine (to S.W.S.), Ontario Ministry of Research and Innovation (to S.W.S.), the Hospital for Sick Children Foundation (to S.W.S.), the Department of Pathology at Brigham and Women’s Hospital (to C.L.) and the National Institutes of Health (NIH) (grants HG004221 and GM081533; to C.L.). K.K. is supported by the Academy of Finland. D.P. is supported by fellowships from the Royal Netherlands Academy of Arts and Sciences (TMF/DA/5801) and the Netherlands Organization for Scientific Research (Rubicon 825.06.031). S.W.S. holds the GlaxoSmithKline Pathfinder Chair in Genetics and Genomics at the University of Toronto and the Hospital for Sick Children.
Author Contributions C.T.-S., N.P.C., C.L., S.W.S. and M.E.H. are all joint senior authors, and planned and managed the project. D.F.C. and D.P. led the data analysis. Data analyses were performed by D.F.C., D.P., R.R., L.F., O.G., Y.Z., J.A., T.D.A., C.B., P.C., T.F., M.H., C.H.I., K.K., D.G.M., J.R.M., I.O., A.W.C.P., S.R., K.S., A.V., K.W., J.W. and M.E.H. The WTCCC collaborated on array design. Validation experiments were performed by Y.Z. and M.H. D.F.C., D.P., S.W.S. and M.E.H. wrote the paper.
This file contains Supplementary Table 1: CNV map. Genomic locations for all 11,700 candidate CNVs, including the number of CEU and YRI individuals in which the CNV was detected during the discovery experiment.
This file contains Supplementary Table 2: CNV genotypes. Absolute integer copy number estimates for 5,238 CNVs in 450 individuals from 4 HapMap populations.