Copy number variation (CNV) affecting protein-coding genes contributes substantially to human diversity and disease. Here we characterized the rates and properties of rare genic CNVs (<0.5% frequency) in exome sequencing data from nearly 60,000 individuals in the Exome Aggregation Consortium (ExAC) database. On average, individuals possessed 0.81 deleted and 1.75 duplicated genes, and most (70%) carried at least one rare genic CNV. For every gene, we empirically estimated an index of relative intolerance to CNVs that demonstrated moderate correlation with measures of genic constraint based on single-nucleotide variation (SNV) and was independently correlated with measures of evolutionary conservation. For individuals with schizophrenia, genes affected by CNVs were more intolerant than in controls. The ExAC CNV data constitute a critical component of an integrated database spanning the spectrum of human genetic variation, aiding in the interpretation of personal genomes as well as population-based disease studies. These data are freely available for download and visualization online.
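The per-gene intolerance index can be illustrated with a minimal sketch. The supplementary captions describe a linear regression model over seven main variables, but this page does not name them. The covariates, the residual-based scoring, and the sign convention below are therefore assumptions for illustration, not the authors' implementation. The sketch regresses observed per-gene CNV counts on gene-level covariates and treats genes with fewer CNVs than predicted as more intolerant.

```python
import numpy as np

def cnv_intolerance_scores(cnv_counts, covariates):
    """Return one standardized score per gene; higher = fewer CNVs than expected.

    cnv_counts : observed number of rare CNVs overlapping each gene
    covariates : per-gene predictors (hypothetical, e.g. gene length, GC content)
    """
    y = np.asarray(cnv_counts, dtype=float)
    X = np.asarray(covariates, dtype=float)
    if X.ndim == 1:
        X = X[:, None]  # allow a single covariate passed as a flat vector

    # Ordinary least squares with an intercept column.
    design = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)

    # Residual = observed minus expected CNV count for each gene.
    residuals = y - design @ coef

    # Standardize and flip the sign (assumed convention): a gene with
    # fewer CNVs than the model predicts receives a positive score.
    # Requires more genes than fitted parameters.
    sd = residuals.std(ddof=design.shape[1])
    return -residuals / sd


# Toy data: counts rise with one covariate, except the third gene,
# which carries far fewer CNVs than the trend predicts.
covariate = [1, 2, 3, 4, 5, 6]
counts = [2, 4, 1, 8, 10, 12]
scores = cnv_intolerance_scores(counts, covariate)
```

In this toy input the third gene is depleted for CNVs relative to the fitted trend, so it receives the highest score. A real analysis would need the actual covariates and would have to check that a linear model suits count data.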
We would like to acknowledge E. Fluder and K. Shakir for their help in running XHMM at the large scale required for over 60,000 samples. Work at the Icahn School of Medicine at Mount Sinai was supported by the Institute for Genomics and Multiscale Biology (including computational resources and staff expertise provided by the Department of Scientific Computing) and NIH grants R01-HG005827 and R01-MH099126 (to S.M.P.).
Integrated supplementary information
Summary of the number of CNVs and genes affected by CNVs, stratified by ethnicity and gender.
Output of the full linear regression model used to create CNV intolerance scores for the seven main variables included.
Results from t tests of highly expressed genes from a given tissue versus the remaining genes, stratified by all CNV, deletions, and duplications.
Results from t tests of disease-related gene sets versus the remaining genes, stratified by all CNV, deletions, and duplications.
Summary of gene set enrichment results from the 5% most intolerant genes (n = 787) from ToppFun.
List of genes among the top 5% of intolerance (most intolerant) that were present in at least one group of significant pathways, along with the CNV intolerance score, the number of pathways in each group, and the number of groups.
Summary of gene set enrichment results for the 5% most tolerant genes (n = 787) from ToppFun.
List of genes among the bottom 5% of intolerance (most tolerant) that were present in at least one group of significant pathways, along with the CNV intolerance score, the number of pathways in each group, and the number of groups.
About this article
Genetics in Medicine (2019)