Origins and functional impact of copy number variation in the human genome

Conrad, Donald F.; Pinto, Dalila; Redon, Richard; Feuk, Lars; Gokcumen, Omer; Zhang, Yujun; Aerts, Jan; Andrews, T. Daniel; Barnes, Chris; Campbell, Peter; Fitzgerald, Tomas; Hu, Min; Ihm, Chun Hwa; Kristiansson, Kati; MacArthur, Daniel G.; MacDonald, Jeffrey R.; Onyiah, Ifejinelo; Pang, Andy Wing Chun; Robson, Sam; Stirrups, Kathy; Valsesia, Armand; Walter, Klaudia; Wei, John; Tyler-Smith, Chris; Carter, Nigel P.; Lee, Charles; Scherer, Stephen W.; Hurles, Matthew E.

doi:10.1038/nature08516

Article
Published: 07 October 2009

Donald F. Conrad¹^na1,
Dalila Pinto²^na1,
Richard Redon^1,3,
Lars Feuk^2,4,
Omer Gokcumen⁵,
Yujun Zhang¹,
Jan Aerts¹,
T. Daniel Andrews¹,
Chris Barnes¹,
Peter Campbell¹,
Tomas Fitzgerald¹,
Min Hu¹,
Chun Hwa Ihm⁵,
Kati Kristiansson¹,
Daniel G. MacArthur¹,
Jeffrey R. MacDonald²,
Ifejinelo Onyiah¹,
Andy Wing Chun Pang²,
Sam Robson¹,
Kathy Stirrups¹,
Armand Valsesia¹,
Klaudia Walter¹,
John Wei²,
The Wellcome Trust Case Control Consortium,
Chris Tyler-Smith¹,
Nigel P. Carter¹,
Charles Lee⁵,
Stephen W. Scherer^2,6 &
Matthew E. Hurles¹

Nature volume 464, pages 704–712 (2010)

Abstract

Structural variations of DNA greater than 1 kilobase in size account for most bases that vary among human genomes, but are still relatively under-ascertained. Here we use tiling oligonucleotide microarrays, comprising 42 million probes, to generate a comprehensive map of 11,700 copy number variations (CNVs) greater than 443 base pairs, of which most (8,599) have been validated independently. For 4,978 of these CNVs, we generated reference genotypes from 450 individuals of European, African or East Asian ancestry. The predominant mutational mechanisms differ among CNV size classes. Retrotransposition has duplicated and inserted some coding and non-coding DNA segments randomly around the genome. Furthermore, by correlation with known trait-associated single nucleotide polymorphisms (SNPs), we identified 30 loci with CNVs that are candidates for influencing disease susceptibility. Despite this, having assessed the completeness of our map and the patterns of linkage disequilibrium between CNVs and SNPs, we conclude that, for complex traits, the heritability void left by genome-wide association studies will not be accounted for by common CNVs.

Figure 1: **Overview of experimental strategy for CNV discovery and genotyping.**

Figure 2: **Functional impact of CNVs by type, frequency and population.**

Figure 3: **DNA sequence context enrichments around CNV breakpoints.**

Figure 4: **Circular map showing the genomic distribution of different classes of CNVs and their population differentiation.**

Figure 5: **Population properties of CNV show functional impact.**

Accession codes

Primary accessions

ArrayExpress

Data deposits

The CNV discovery and CNV genotyping data are available at ArrayExpress (http://www.ebi.ac.uk/microarray-as/ae/) under accession numbers E-MTAB-40 and E-MTAB-142, respectively. Normalized CNV discovery data are available at http://www.sanger.ac.uk/humgen/cnv/42mio. CNVs are displayed at the Database of Genomic Variants (http://projects.tcag.ca/variation). CNV locations and genotypes are reported in Supplementary Tables 1 and 2.

Acknowledgements

We would like to thank A. Boyko, J. J. Emerson, J. Pickrell, S. Kudaravalli, J. Pritchard, T. Down, S. McCarroll, J. Collins, C. Beazley, M. Dermitzakis, P. Eis, T. Richmond, M. Hogan, D. Bailey, S. Giles, G. Speight, N. Sparkes, D. Peiffer, C. Chen, K. Li, P. Oeth, D. Stetson and D. Church for advice, sharing data, sharing software and technical assistance. We are grateful for the efforts and support of our colleagues at NimbleGen, Agilent, Illumina, Applied Biosystems and Sequenom. We thank J. Barrett for comments on an earlier version of the manuscript. The Centre for Applied Genomics at the Hospital for Sick Children and Wellcome Trust Sanger Institute are acknowledged for database, technical assistance and bioinformatics support. This research was supported by the Wellcome Trust (grant no. 077006/Z/05/Z; to M.E.H., N.P.C., C.T.-S.), Canada Foundation of Innovation and Ontario Innovation Trust (to S.W.S.), Canadian Institutes of Health Research (CIHR) (to S.W.S.), Genome Canada/Ontario Genomics Institute (to S.W.S.), the McLaughlin Centre for Molecular Medicine (to S.W.S.), Ontario Ministry of Research and Innovation (to S.W.S.), the Hospital for Sick Children Foundation (to S.W.S.), the Department of Pathology at Brigham and Women’s Hospital (to C.L.) and the National Institutes of Health (NIH) (grants HG004221 and GM081533; to C.L.). K.K. is supported by the Academy of Finland. D.P. is supported by fellowships from the Royal Netherlands Academy of Arts and Sciences (TMF/DA/5801) and the Netherlands Organization for Scientific Research (Rubicon 825.06.031). S.W.S. holds the GlaxoSmithKline Pathfinder Chair in Genetics and Genomics at the University of Toronto and the Hospital for Sick Children.

Author Contributions

C.T.-S., N.P.C., C.L., S.W.S. and M.E.H. are all joint senior authors, and planned and managed the project. D.F.C. and D.P. led the data analysis. Data analyses were performed by D.F.C., D.P., R.R., L.F., O.G., Y.Z., J.A., T.D.A., C.B., P.C., T.F., M.H., C.H.I., K.K., D.G.M., J.R.M., I.O., A.W.C.P., S.R., K.S., A.V., K.W., J.W. and M.E.H. The WTCCC collaborated on array design. Validation experiments were performed by Y.Z. and M.H. D.F.C., D.P., S.W.S. and M.E.H. wrote the paper.

Author information

Donald F. Conrad and Dalila Pinto: These authors contributed equally to this work.

Authors and Affiliations

The Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SA, UK
Donald F. Conrad, Richard Redon, Yujun Zhang, Jan Aerts, T. Daniel Andrews, Chris Barnes, Peter Campbell, Tomas Fitzgerald, Min Hu, Kati Kristiansson, Daniel G. MacArthur, Ifejinelo Onyiah, Sam Robson, Kathy Stirrups, Armand Valsesia, Klaudia Walter, Chris Tyler-Smith, Nigel P. Carter & Matthew E. Hurles
The Centre for Applied Genomics and Program in Genetics and Genomic Biology, The Hospital for Sick Children, MaRS Centre–East Tower, 101 College Street, Room 14-701, Toronto, Ontario M5G 1L7, Canada
Dalila Pinto, Lars Feuk, Jeffrey R. MacDonald, Andy Wing Chun Pang, John Wei & Stephen W. Scherer
Inserm UMR915, L’institut du thorax, Nantes 44035, France
Richard Redon
Department of Genetics and Pathology, Rudbeck Laboratory Uppsala University, Uppsala 751 85, Sweden
Lars Feuk
Department of Pathology, Brigham and Women’s Hospital and Harvard Medical School, Boston, Massachusetts 02115, USA
Omer Gokcumen, Chun Hwa Ihm & Charles Lee
Department of Molecular Genetics, University of Toronto, Toronto M5S 1A8, Canada
Stephen W. Scherer

Consortia

The Wellcome Trust Case Control Consortium

Corresponding authors

Correspondence to Stephen W. Scherer or Matthew E. Hurles.

Additional information

Lists of participants and affiliations appear in Supplementary Information.

Supplementary information

Supplementary Notes

This file contains Supplementary Notes, including Figures 1.1-1.12, Tables 1.1-1.7 and an Appendix of WTCCC authors and their affiliations. (PDF 3783 kb)

Supplementary Methods

This file contains Supplementary Methods, including Figures 2.1-2.30 and Tables 2.1-2.9, References and Appendices. (PDF 7430 kb)

Supplementary Table

This file contains Supplementary Table 1: CNV map. Genomic locations for all 11,700 candidate CNVs, including the number of CEU and YRI individuals in which the CNV was detected during the discovery experiment. (XLS 1860 kb)

Supplementary Table

This file contains Supplementary Table 2: CNV genotypes. Absolute integer copy number estimates for 5,238 CNVs in 450 individuals from 4 HapMap populations. (XLS 15811 kb)

About this article

Cite this article

Conrad, D., Pinto, D., Redon, R. et al. Origins and functional impact of copy number variation in the human genome. Nature 464, 704–712 (2010). https://doi.org/10.1038/nature08516

Received: 14 August 2009
Accepted: 21 September 2009
Published: 07 October 2009
Issue Date: 01 April 2010
DOI: https://doi.org/10.1038/nature08516
