Analysis

Patterns of genic intolerance of rare copy number variation in 59,898 human exomes

Nature Genetics volume 48, pages 1107–1111 (2016)

Abstract

Copy number variation (CNV) affecting protein-coding genes contributes substantially to human diversity and disease. Here we characterized the rates and properties of rare genic CNVs (<0.5% frequency) in exome sequencing data from nearly 60,000 individuals in the Exome Aggregation Consortium (ExAC) database. On average, individuals possessed 0.81 deleted and 1.75 duplicated genes, and most (70%) carried at least one rare genic CNV. For every gene, we empirically estimated an index of relative intolerance to CNVs that demonstrated moderate correlation with measures of genic constraint based on single-nucleotide variation (SNV) and was independently correlated with measures of evolutionary conservation. In individuals with schizophrenia, genes affected by CNVs were more intolerant than those affected by CNVs in controls. The ExAC CNV data constitute a critical component of an integrated database spanning the spectrum of human genetic variation, aiding in the interpretation of personal genomes as well as population-based disease studies. These data are freely available for download and visualization online.
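The per-individual figures in the abstract (mean deleted and duplicated genes per person, and the share of people carrying at least one rare genic CNV) reduce to simple counting once CNV calls are available. The Python sketch below is only an illustration of that arithmetic, not the ExAC pipeline: it assumes a hypothetical in-memory call record (`CnvCall`, with illustrative field names) carrying a sample ID, a CNV type, the overlapped genes, and a population frequency, and it applies the abstract's <0.5% frequency cutoff for "rare". Real calls would also pass through caller-specific quality filters that are not modeled here.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable

# "Rare" follows the abstract's definition: below 0.5% frequency.
RARE_FREQ_CUTOFF = 0.005


@dataclass(frozen=True)
class CnvCall:
    """Hypothetical CNV call record; field names are illustrative, not an ExAC schema."""
    sample_id: str
    cnv_type: str            # "DEL" (deletion) or "DUP" (duplication)
    genes: FrozenSet[str]    # genes overlapped by the call
    frequency: float         # population frequency of this CNV, between 0 and 1


def summarize_rare_genic_cnvs(calls: Iterable[CnvCall], sample_ids: Iterable[str]) -> dict:
    """Per-individual summary of rare genic CNVs across a cohort.

    Returns the mean number of distinct deleted and duplicated genes per
    individual and the fraction of individuals carrying at least one rare
    genic CNV. Individuals with no calls still count in the denominator.
    """
    samples = list(sample_ids)
    deleted = {s: set() for s in samples}
    duplicated = {s: set() for s in samples}
    for call in calls:
        # Skip common calls, calls touching no gene, and samples outside the cohort.
        if call.frequency >= RARE_FREQ_CUTOFF or not call.genes or call.sample_id not in deleted:
            continue
        if call.cnv_type == "DEL":
            deleted[call.sample_id].update(call.genes)
        elif call.cnv_type == "DUP":
            duplicated[call.sample_id].update(call.genes)
    n = len(samples)
    carriers = sum(1 for s in samples if deleted[s] or duplicated[s])
    return {
        "mean_deleted_genes": sum(len(g) for g in deleted.values()) / n,
        "mean_duplicated_genes": sum(len(g) for g in duplicated.values()) / n,
        "fraction_carriers": carriers / n,
    }
```

For example, in a cohort of three individuals where one carries a rare deletion of one gene and a rare duplication of two genes, and another carries only a common duplication, the function returns means of 1/3 deleted and 2/3 duplicated genes and a carrier fraction of 1/3. Genes are counted as distinct per individual and per CNV type, which is one reasonable reading of "deleted genes per individual" but an assumption nonetheless.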



Acknowledgements

We would like to acknowledge E. Fluder and K. Shakir for their help in running XHMM at the large scale required for over 60,000 samples. Work at the Icahn School of Medicine at Mount Sinai was supported by the Institute for Genomics and Multiscale Biology (including computational resources and staff expertise provided by the Department of Scientific Computing) and NIH grants R01-HG005827 and R01-MH099126 (to S.M.P.).

Author information

Author notes

    • Menachem Fromer & Shaun M Purcell

    These authors contributed equally to this work.

Affiliations

  1. Division of Psychiatric Genomics, Department of Psychiatry, Icahn School of Medicine at Mount Sinai, New York, New York, USA.

    Douglas M Ruderfer, Tymor Hamamsy, David Kavanagh, Menachem Fromer & Shaun M Purcell

  2. Institute for Genomics and Multiscale Biology, Icahn School of Medicine at Mount Sinai, New York, New York, USA.

    Douglas M Ruderfer, David Kavanagh, Menachem Fromer & Shaun M Purcell

  3. Broad Institute of MIT and Harvard, Cambridge, Massachusetts, USA.

    Douglas M Ruderfer, Monkol Lek, Konrad J Karczewski, Kaitlin E Samocha, Mark J Daly, Daniel G MacArthur, Menachem Fromer & Shaun M Purcell

  4. Analytic and Translational Genetics Unit, Psychiatric and Neurodevelopmental Genetics Unit, Massachusetts General Hospital, Boston, Massachusetts, USA.

    Monkol Lek, Konrad J Karczewski, Kaitlin E Samocha, Mark J Daly, Daniel G MacArthur, Menachem Fromer & Shaun M Purcell

  5. Department of Psychiatry, Brigham and Women's Hospital, Harvard Medical School, Boston, Massachusetts, USA.

    Shaun M Purcell

Consortia

  1. Exome Aggregation Consortium

    A list of members and affiliations appears in the Supplementary Note.

Authors

  1. Douglas M Ruderfer
  2. Tymor Hamamsy
  3. Monkol Lek
  4. Konrad J Karczewski
  5. David Kavanagh
  6. Kaitlin E Samocha
  7. Mark J Daly
  8. Daniel G MacArthur
  9. Menachem Fromer
  10. Shaun M Purcell

Contributions

D.M.R., M.F., and S.M.P. designed the study. M.L., K.J.K., and D.G.M. handled sample and data management. D.M.R., T.H., K.E.S., M.F., and S.M.P. contributed to statistical analyses. D.K., D.M.R., and K.J.K. designed and implemented website visualizations. D.M.R., M.J.D., D.G.M., M.F., and S.M.P. contributed to primary interpretations. D.M.R., M.F., and S.M.P. performed the primary drafting of the manuscript. All authors contributed to, read, and approved the final manuscript.

Competing interests

M.F. is now an employee at Verily Life Sciences.

Corresponding authors

Correspondence to Douglas M Ruderfer or Shaun M Purcell.


Supplementary information

PDF files

  1. Supplementary Text and Figures

    Supplementary Figures 1–7 and Supplementary Note.

Excel files

  1. Supplementary Table 1

    Summary of the number of CNVs and genes affected by CNVs, stratified by ethnicity and gender.

  2. Supplementary Table 2

    Output of the full linear regression model used to create CNV intolerance scores for the seven main variables included.

  3. Supplementary Table 3

    Results from t tests of highly expressed genes from a given tissue versus the remaining genes, stratified by all CNV, deletions, and duplications.

  4. Supplementary Table 4

    Results from t tests of disease-related gene sets versus the remaining genes, stratified by all CNV, deletions, and duplications.

  5. Supplementary Table 5

    Summary of gene set enrichment results from the 5% most intolerant genes (n = 787) from ToppFun.

  6. Supplementary Table 6

    List of genes among the top 5% of intolerance (most intolerant) that were present in at least one group of significant pathways, along with the CNV intolerance score, the number of pathways in each group, and the number of groups.

  7. Supplementary Table 7

    Summary of gene set enrichment results for the 5% most tolerant genes (n = 787) from ToppFun.

  8. Supplementary Table 8

    List of genes among the bottom 5% of intolerance (most tolerant) that were present in at least one group of significant pathways, along with the CNV intolerance score, the number of pathways in each group, and the number of groups.
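Supplementary Table 2 refers to a linear regression used to create the CNV intolerance scores, and Supplementary Tables 3 and 4 refer to t tests of gene sets against the remaining genes. The Python sketch below illustrates one common way such pieces fit together, under explicit assumptions: the score is taken to be the residual of an ordinary least-squares fit of per-gene CNV counts on gene-level covariates (observed minus expected), with more negative values read as fewer CNVs than expected and hence greater intolerance, and gene sets are compared with Welch's t statistic. The covariates, the residual definition, and the sign convention are illustrative assumptions; the seven variables and the model actually used are those reported in Supplementary Table 2.

```python
import math
from statistics import mean, variance


def ols_fit(X, y):
    """Ordinary least squares with an intercept, solved via the normal equations.

    X is a list of rows (one per gene) of covariate values; y is the per-gene
    CNV count. Returns coefficients [intercept, b1, b2, ...].
    """
    rows = [[1.0] + list(r) for r in X]  # prepend the intercept column
    k = len(rows[0])
    # Build the augmented system (X^T X | X^T y).
    A = []
    for i in range(k):
        lhs = [sum(r[i] * r[j] for r in rows) for j in range(k)]
        rhs = sum(r[i] * yi for r, yi in zip(rows, y))
        A.append(lhs + [rhs])
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        if abs(A[col][col]) < 1e-12:
            raise ValueError("covariates are collinear")
        for r in range(k):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    # The system is now diagonal; divide each right-hand side by its pivot.
    return [A[i][k] / A[i][i] for i in range(k)]


def intolerance_scores(X, y):
    """Residual (observed minus fitted) CNV count per gene; negative = fewer CNVs than expected."""
    beta = ols_fit(X, y)
    fitted = [beta[0] + sum(b * x for b, x in zip(beta[1:], row)) for row in X]
    return [obs - exp for obs, exp in zip(y, fitted)]


def welch_t(group, rest):
    """Welch's t statistic for the mean score of a gene set versus the remaining genes."""
    se = math.sqrt(variance(group) / len(group) + variance(rest) / len(rest))
    return (mean(group) - mean(rest)) / se
```

Under this convention, the "most intolerant" and "most tolerant" 5% of genes would be the lower and upper tails of the residual distribution, and a negative t statistic would indicate a gene set carrying fewer CNVs than its covariates predict. Welch's form is used here because it does not assume equal variances between a small gene set and the rest of the genome; whether the original analysis used that variant is not stated above. A production analysis would normally use a numerical library for the regression and would also report p-values.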

About this article


DOI

https://doi.org/10.1038/ng.3638
