Abstract
Copy number variation (CNV) affecting protein-coding genes contributes substantially to human diversity and disease. Here we characterized the rates and properties of rare genic CNVs (<0.5% frequency) in exome sequencing data from nearly 60,000 individuals in the Exome Aggregation Consortium (ExAC) database. On average, individuals possessed 0.81 deleted and 1.75 duplicated genes, and most (70%) carried at least one rare genic CNV. For every gene, we empirically estimated an index of relative intolerance to CNVs that demonstrated moderate correlation with measures of genic constraint based on single-nucleotide variation (SNV) and was independently correlated with measures of evolutionary conservation. For individuals with schizophrenia, genes affected by CNVs were more intolerant than in controls. The ExAC CNV data constitute a critical component of an integrated database spanning the spectrum of human genetic variation, aiding in the interpretation of personal genomes as well as population-based disease studies. These data are freely available for download and visualization online.
Acknowledgements
We would like to acknowledge E. Fluder and K. Shakir for their help in running XHMM at the large scale required for over 60,000 samples. Work at the Icahn School of Medicine at Mount Sinai was supported by the Institute for Genomics and Multiscale Biology (including computational resources and staff expertise provided by the Department of Scientific Computing) and NIH grants R01-HG005827 and R01-MH099126 (to S.M.P.).
Author information
Contributions
D.M.R., M.F., and S.M.P. designed the study. M.L., K.J.K., and D.G.M. handled sample and data management. D.M.R., T.H., K.E.S., M.F., and S.M.P. contributed to statistical analyses. D.K., D.M.R., and K.J.K. designed and implemented website visualizations. D.M.R., M.J.D., D.G.M., M.F., and S.M.P. contributed to primary interpretations. D.M.R., M.F., and S.M.P. performed the primary drafting of the manuscript. All authors contributed to, read, and approved the final manuscript.
Ethics declarations
Competing interests
M.F. is now an employee at Verily Life Sciences.
Additional information
A list of members and affiliations appears in the Supplementary Note.
Integrated supplementary information
Supplementary Figure 1 Histogram showing the proportion of genotyping array–based CNV calls that were also called by exome sequencing in 10,091 samples where CNVs from both platforms existed.
The histogram is stratified by the number of exome sequencing targets in which the array-based CNV overlapped.
Supplementary Figure 2 Histogram showing the number of CNVs called by exome sequencing and the number that were also called by genotyping arrays in 10,091 samples where CNVs from both platforms existed.
The histogram is stratified by the number of exome sequencing targets in which the CNV overlapped.
Supplementary Figure 3 Correlation of number of CNVs and average read depth by ExAC cohort.
ExAC cohorts stratified by sample and population (corresponding to colors) with mean read depth on the x axis and mean number of CNVs on the y axis.
Supplementary Figure 4 Average number of CNVs by population and ExAC cohort.
Mean number of CNVs, stratified by ethnicity on the x axis. Each point corresponds to an ExAC cohort (as denoted by its color), and the size of the point is proportional to the size of the subcohort from a particular ethnicity group.
Supplementary Figure 5 CNV frequency mediated by the number of pairs of segmental duplications within which a gene occurs.
Number of CNVs for each gene, binned by the number of pairs of segmental duplications between which the gene is found; note that 6+ denotes 6 or more pairs.
Supplementary Figure 6 Distribution of CNV intolerance scores.
Histogram of the CNV intolerance score for each gene before and after winsorizing the most tolerant end of the distribution.
Supplementary Figure 7 Correlation of CNV intolerance scores and loss-of-function constraint scores.
The violin plots show that increases in CNV intolerance score track with increases in both missense and loss-of-function SNV constraint scores, stratified by decile.
Supplementary information
Supplementary Text and Figures
Supplementary Figures 1–7 and Supplementary Note. (PDF 1195 kb)
Supplementary Table 1
Summary of the number of CNVs and genes affected by CNVs, stratified by ethnicity and gender. (XLSX 10 kb)
Supplementary Table 2
Output of the full linear regression model used to create CNV intolerance scores for the seven main variables included. (XLSX 8 kb)
Supplementary Table 3
Results from t tests of highly expressed genes from a given tissue versus the remaining genes, stratified by all CNV, deletions, and duplications. (XLSX 12 kb)
Supplementary Table 4
Results from t tests of disease-related gene sets versus the remaining genes, stratified by all CNV, deletions, and duplications. (XLSX 11 kb)
Supplementary Table 5
Summary of gene set enrichment results from the 5% most intolerant genes (n = 787) from ToppFun. (XLSX 16 kb)
Supplementary Table 6
List of genes among the top 5% of intolerance (most intolerant) that were present in at least one group of significant pathways, along with the CNV intolerance score, the number of pathways in each group, and the number of groups. (XLSX 39 kb)
Supplementary Table 7
Summary of gene set enrichment results for the 5% most tolerant genes (n = 787) from ToppFun. (XLSX 17 kb)
Supplementary Table 8
List of genes among the bottom 5% of intolerance (most tolerant) that were present in at least one group of significant pathways, along with the CNV intolerance score, the number of pathways in each group, and the number of groups. (XLSX 15 kb)
About this article
Cite this article
Ruderfer, D., Hamamsy, T., Lek, M. et al. Patterns of genic intolerance of rare copy number variation in 59,898 human exomes. Nat Genet 48, 1107–1111 (2016). https://doi.org/10.1038/ng.3638
DOI: https://doi.org/10.1038/ng.3638