Patterns of genic intolerance of rare copy number variation in 59,898 human exomes

Abstract

Copy number variation (CNV) affecting protein-coding genes contributes substantially to human diversity and disease. Here we characterized the rates and properties of rare genic CNVs (<0.5% frequency) in exome sequencing data from nearly 60,000 individuals in the Exome Aggregation Consortium (ExAC) database. On average, individuals possessed 0.81 deleted and 1.75 duplicated genes, and most (70%) carried at least one rare genic CNV. For every gene, we empirically estimated an index of relative intolerance to CNVs that demonstrated moderate correlation with measures of genic constraint based on single-nucleotide variation (SNV) and was independently correlated with measures of evolutionary conservation. For individuals with schizophrenia, genes affected by CNVs were more intolerant than in controls. The ExAC CNV data constitute a critical component of an integrated database spanning the spectrum of human genetic variation, aiding in the interpretation of personal genomes as well as population-based disease studies. These data are freely available for download and visualization online.

Figure 1: Distribution of the number and amount of CNV across 59,898 exome-sequenced individuals.
Figure 2: Genic summary of rare deletions and duplications in the ExAC sample.
Figure 3: Brain-relevant genes demonstrate the greatest intolerance to dosage changes from CNVs.

Acknowledgements

We would like to acknowledge E. Fluder and K. Shakir for their help in running XHMM at the large scale required for over 60,000 samples. Work at the Icahn School of Medicine at Mount Sinai was supported by the Institute for Genomics and Multiscale Biology (including computational resources and staff expertise provided by the Department of Scientific Computing) and NIH grants R01-HG005827 and R01-MH099126 (to S.M.P.).

Author information

Contributions

D.M.R., M.F., and S.M.P. designed the study. M.L., K.J.K., and D.G.M. handled sample and data management. D.M.R., T.H., K.E.S., M.F., and S.M.P. contributed to statistical analyses. D.K., D.M.R., and K.J.K. designed and implemented website visualizations. D.M.R., M.J.D., D.G.M., M.F., and S.M.P. contributed to primary interpretations. D.M.R., M.F., and S.M.P. performed the primary drafting of the manuscript. All authors contributed to, read, and approved the final manuscript.

Corresponding authors

Correspondence to Douglas M. Ruderfer or Shaun M. Purcell.

Ethics declarations

Competing interests

M.F. is now an employee at Verily Life Sciences.

Additional information

A list of members and affiliations appears in the Supplementary Note.

Integrated supplementary information

Supplementary Figure 1 Histogram showing the proportion of genotyping array–based CNV calls that were also called by exome sequencing in 10,091 samples where CNVs from both platforms existed.

The histogram is stratified by the number of exome sequencing targets in which the array-based CNV overlapped.

Supplementary Figure 2 Histogram showing the number of CNVs called by exome sequencing and the number that were also called by genotyping arrays in 10,091 samples where CNVs from both platforms existed.

The histogram is stratified by the number of exome sequencing targets in which the CNV overlapped.

Supplementary Figure 3 Correlation of number of CNVs and average read depth by ExAC cohort.

ExAC cohorts are shown stratified by sample and population (indicated by color), with mean read depth on the x axis and mean number of CNVs on the y axis.

Supplementary Figure 4 Average number of CNVs by population and ExAC cohort.

Mean number of CNVs, stratified by ethnicity on the x axis. Each point corresponds to an ExAC cohort (as denoted by its color), and the size of the point is proportional to the size of the subcohort from a particular ethnicity group.

Supplementary Figure 5 CNV frequency mediated by the number of pairs of segmental duplications within which a gene occurs.

Number of CNVs for each gene, binned by the number of pairs of segmental duplications between which the gene is found; note that 6+ denotes 6 or more pairs.

Supplementary Figure 6 Distribution of CNV intolerance scores.

Histogram of the CNV intolerance score for each gene before and after winsorizing the most tolerant end of the distribution.
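As an illustration of what winsorizing one end of a score distribution involves, the following is a minimal sketch, not the authors' procedure. It assumes that lower scores mean greater tolerance, and the function name `winsorize_tolerant_end` and the 5% cutoff are hypothetical choices made for the example.

```python
def winsorize_tolerant_end(scores, quantile=0.05):
    """Clamp scores below the given quantile up to that quantile's value.

    Assumption for this sketch: lower score = more tolerant, so only the
    lower tail is clamped. The quantile is an illustrative choice.
    """
    if not scores:
        return []
    if not 0.0 <= quantile <= 1.0:
        raise ValueError("quantile must be between 0 and 1")
    ordered = sorted(scores)
    # Position of the floor value in the sorted list (rounded down).
    idx = int(quantile * (len(ordered) - 1))
    floor = ordered[idx]
    # Values above the floor are left unchanged.
    return [max(s, floor) for s in scores]


if __name__ == "__main__":
    raw = [-10.0, -1.0, 0.0, 1.0, 2.0]
    print(winsorize_tolerant_end(raw, quantile=0.25))
```

Clamping rather than discarding keeps every gene in the distribution while limiting the influence of extreme values on downstream comparisons.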

Supplementary Figure 7 Correlation of CNV intolerance scores and loss-of-function constraint scores.

The violin plots, stratified by decile, show that CNV intolerance scores increase with both missense and loss-of-function SNV constraint scores.

Supplementary information

Supplementary Text and Figures

Supplementary Figures 1–7 and Supplementary Note. (PDF 1195 kb)

Supplementary Table 1

Summary of the number of CNVs and genes affected by CNVs, stratified by ethnicity and gender. (XLSX 10 kb)

Supplementary Table 2

Output of the full linear regression model used to create CNV intolerance scores for the seven main variables included. (XLSX 8 kb)
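A regression-based score of this kind can be sketched as a standardized residual: fit the observed CNV count per gene against gene-level covariates, then score each gene by how far it falls below expectation. The example below is a simplified illustration under stated assumptions, not the published model. It uses a single hypothetical covariate (for instance, gene length) where the table describes seven variables, and the sign convention (fewer CNVs than expected gives a higher score) is assumed.

```python
from statistics import mean, pstdev


def residual_intolerance_scores(observed, covariate):
    """Negated, standardized residuals of observed ~ a + b * covariate.

    Hypothetical one-covariate sketch fitted by ordinary least squares.
    Assumed convention: a gene with fewer CNVs than predicted gets a
    positive (more intolerant) score.
    """
    n = len(observed)
    if n != len(covariate) or n < 3:
        raise ValueError("need two equal-length lists of at least 3 values")
    mx, my = mean(covariate), mean(observed)
    sxx = sum((x - mx) ** 2 for x in covariate)
    if sxx == 0:
        raise ValueError("covariate has no variance")
    slope = sum((x - mx) * (y - my) for x, y in zip(covariate, observed)) / sxx
    intercept = my - slope * mx
    residuals = [y - (intercept + slope * x) for x, y in zip(covariate, observed)]
    sd = pstdev(residuals)
    if sd == 0:
        raise ValueError("residuals have no variance")
    # OLS residuals with an intercept have mean zero, so dividing by the
    # standard deviation is enough to standardize them.
    return [-r / sd for r in residuals]


if __name__ == "__main__":
    cnv_counts = [2, 4, 6, 20]   # made-up observed CNV counts per gene
    gene_size = [1, 2, 3, 4]     # made-up covariate values
    print(residual_intolerance_scores(cnv_counts, gene_size))
```

Using residuals rather than raw counts means a long gene with many CNVs is not automatically labeled tolerant; only the deviation from what its covariates predict contributes to the score.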

Supplementary Table 3

Results from t tests of highly expressed genes from a given tissue versus the remaining genes, stratified by all CNV, deletions, and duplications. (XLSX 12 kb)
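A two-group comparison of this form can be sketched as follows. The source says only "t tests", so the choice of Welch's unequal-variance variant here is an assumption, and the function name `welch_t` and the input values are hypothetical. The sketch returns the test statistic and degrees of freedom only; converting these to a P value needs a t-distribution function that the Python standard library does not provide.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(group_a, group_b):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom.

    group_a: scores for genes in the set of interest (e.g. highly
             expressed in one tissue); group_b: scores for the rest.
    """
    na, nb = len(group_a), len(group_b)
    if na < 2 or nb < 2:
        raise ValueError("each group needs at least 2 values")
    # Squared standard error contributed by each group.
    va = variance(group_a) / na
    vb = variance(group_b) / nb
    se2 = va + vb
    if se2 == 0:
        raise ValueError("both groups have zero variance")
    t = (mean(group_a) - mean(group_b)) / sqrt(se2)
    df = se2 ** 2 / (va ** 2 / (na - 1) + vb ** 2 / (nb - 1))
    return t, df


if __name__ == "__main__":
    in_set = [1.0, 2.0, 3.0]   # made-up intolerance scores
    others = [2.0, 3.0, 4.0]
    print(welch_t(in_set, others))
```

A positive statistic would indicate that the gene set has a higher mean score than the remaining genes; running the comparison separately on scores from all CNVs, deletions, and duplications mirrors the stratification the table describes.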

Supplementary Table 4

Results from t tests of disease-related gene sets versus the remaining genes, stratified by all CNV, deletions, and duplications. (XLSX 11 kb)

Supplementary Table 5

Summary of gene set enrichment results from the 5% most intolerant genes (n = 787) from ToppFun. (XLSX 16 kb)

Supplementary Table 6

List of genes among the top 5% of intolerance (most intolerant) that were present in at least one group of significant pathways, along with the CNV intolerance score, the number of pathways in each group, and the number of groups. (XLSX 39 kb)

Supplementary Table 7

Summary of gene set enrichment results for the 5% most tolerant genes (n = 787) from ToppFun. (XLSX 17 kb)

Supplementary Table 8

List of genes among the bottom 5% of intolerance (most tolerant) that were present in at least one group of significant pathways, along with the CNV intolerance score, the number of pathways in each group, and the number of groups. (XLSX 15 kb)

Cite this article

Ruderfer, D., Hamamsy, T., Lek, M. et al. Patterns of genic intolerance of rare copy number variation in 59,898 human exomes. Nat Genet 48, 1107–1111 (2016). https://doi.org/10.1038/ng.3638

  • DOI: https://doi.org/10.1038/ng.3638
