A map of constrained coding regions in the human genome

Abstract

Deep catalogs of genetic variation from thousands of humans enable the detection of intraspecies constraint by identifying coding regions with a scarcity of variation. While existing techniques summarize constraint for entire genes, single gene-wide metrics conceal regional constraint variability within each gene. Therefore, we have created a detailed map of constrained coding regions (CCRs) by leveraging variation observed among 123,136 humans from the Genome Aggregation Database. The most constrained CCRs are enriched for pathogenic variants in ClinVar and mutations underlying developmental disorders. CCRs highlight protein domain families under high constraint and suggest unannotated or incomplete protein domains. The highest-percentile CCRs complement existing variant prioritization methods when evaluating de novo mutations in studies of autosomal dominant disease. Finally, we identify highly constrained CCRs within genes lacking known disease associations. This observation suggests that CCRs may identify regions under strong purifying selection that, when mutated, cause severe developmental phenotypes or embryonic lethality.

Fig. 1: Gene-wide summary measures of constraint are prone to overstating and understating constraint within specific regions of protein-coding genes.
Fig. 2: The most constrained CCRs are enriched for pathogenic variants and are restricted to a small subset of genes.
Fig. 3: The relationship between CCRs and interspecies conservation.
Fig. 4: A comparison of CCRs with other models of genic and regional constraint.
Fig. 5: Evaluation of de novo mutations from a cohort with severe developmental delay, intellectual disability, and epileptic encephalopathy versus de novo variation from unaffected siblings of autism probands.

Data availability

The segmental duplications can be found at ftp://hgdownload.soe.ucsc.edu/goldenPath/hg19/database/genomicSuperDups.txt.gz. The self-chains can be found at http://hgdownload.cse.ucsc.edu/goldenPath/hg19/database/chainSelf.txt.gz. The Pfam domains can be found at http://hgdownload.cse.ucsc.edu/goldenPath/hg19/database/ucscGenePfam.txt.gz. The Ensembl exons file can be found at ftp://ftp.ensembl.org/pub/release-75/gtf/homo_sapiens/Homo_sapiens.GRCh37.75.gtf.gz. The gnomAD file can be found at https://storage.googleapis.com/gnomad-public/release/2.0.1/vcf/exomes/gnomad.exomes.r2.0.1.sites.vcf.gz. The gnomAD coverage files can be found at the location given by the following pattern, in which $chrom stands for the chromosome: https://storage.googleapis.com/gnomad-public/release/2.0.1/coverage/exomes/gnomad.exomes.r2.0.1.chr$chrom.coverage.txt.gz. The CADD files for both indels and SNPs can be found at http://krishna.gs.washington.edu/download/CADD/v1.3/InDels.tsv.gz and http://krishna.gs.washington.edu/download/CADD/v1.3/whole_genome_SNVs.tsv.gz. The GERP++ file can be found at http://mendel.stanford.edu/SidowLab/downloads/gerp/hg19.GERP_scores.tar.gz. The file for MPC can be found at ftp://ftp.broadinstitute.org/pub/ExAC_release/release1/regional_missense_constraint/fordist_constraint_official_mpc_values.txt.gz. The whole-exome MTR file can be found, courtesy of the author, at http://mtr-viewer.mdhs.unimelb.edu.au:8079/mtrflatfile_1.0.txt.gz. The REVEL file can be found at https://rothsj06.u.hpc.mssm.edu/revel/revel_all_chromosomes.csv.zip. The file for pLI can be found at ftp://ftp.broadinstitute.org/pub/ExAC_release/release1/manuscript_data/forweb_cleaned_exac_r03_march16_z_data_pLI.txt.gz. The ClinVar VCF file used in the analyses can be found at ftp://ftp.ncbi.nih.gov/pub/clinvar/vcf_GRCh37/archive_2.0/2017/clinvar_20170802.vcf.gz. Lastly, the de novo variants file from ref. 41 can be found on our S3 server at https://s3.us-east-2.amazonaws.com/pathoscore-data/samocha/samochadenovo.xlsx.
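
The gnomAD coverage URL above is a pattern with a `$chrom` placeholder rather than a single file. As a minimal sketch, assuming the placeholder takes the labels 1 to 22, X, and Y (the statement does not list them), the per-chromosome URLs could be expanded as follows. The names `COVERAGE_PATTERN`, `CHROMOSOMES`, and `coverage_urls` are illustrative only.

```python
from string import Template

# URL pattern quoted in the Data availability statement.
COVERAGE_PATTERN = Template(
    "https://storage.googleapis.com/gnomad-public/release/2.0.1/coverage/"
    "exomes/gnomad.exomes.r2.0.1.chr$chrom.coverage.txt.gz"
)

# Assumption: chromosome labels are 1-22 plus X and Y.
CHROMOSOMES = [str(n) for n in range(1, 23)] + ["X", "Y"]


def coverage_urls(chromosomes=CHROMOSOMES):
    """Return one coverage-file URL per chromosome label."""
    return [COVERAGE_PATTERN.substitute(chrom=c) for c in chromosomes]
```

Calling `coverage_urls(["X"])` would return only the X-chromosome URL, which may be convenient for chromosome-specific analyses.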

References

  1. Wallis, W. A. The statistical research group, 1942–1945. J. Am. Stat. Assoc. 75, 320–330 (1980).
  2. Petrovski, S., Wang, Q., Heinzen, E. L., Allen, A. S. & Goldstein, D. B. Genic intolerance to functional variation and the interpretation of personal genomes. PLoS Genet. 9, e1003709 (2013).
  3. Fu, W. et al. Analysis of 6,515 exomes reveals the recent origin of most human protein-coding variants. Nature 493, 216–220 (2013).
  4. Lek, M. et al. Analysis of protein-coding genetic variation in 60,706 humans. Nature 536, 285–291 (2016).
  5. Samocha, K. E. et al. A framework for the interpretation of de novo mutation in human disease. Nat. Genet. 46, 944–950 (2014).
  6. Finn, R. D. et al. The Pfam protein families database: towards a more sustainable future. Nucleic Acids Res. 44, D279–D285 (2016).
  7. Letunic, I., Doerks, T. & Bork, P. SMART 7: recent updates to the protein domain annotation resource. Nucleic Acids Res. 40, D302–D305 (2012).
  8. Tatusov, R. L. et al. The COG database: an updated version includes eukaryotes. BMC Bioinformatics 4, 41 (2003).
  9. Klimke, W. et al. The National Center For Biotechnology Information’s Protein Clusters Database. Nucleic Acids Res. 37, D216–D223 (2009).
  10. Haft, D. H., Selengut, J. D. & White, O. The TIGRFAMs database of protein families. Nucleic Acids Res. 31, 371–373 (2003).
  11. Bailey, J. A. et al. Recent segmental duplications in the human genome. Science 297, 1003–1007 (2002).
  12. Cabanski, C. R. et al. BlackOPs: increasing confidence in variant detection through mappability filtering. Nucleic Acids Res. 41, e178 (2013).
  13. Lister, R. et al. Human DNA methylomes at base resolution show widespread epigenomic differences. Nature 462, 315–322 (2009).
  14. Aggarwala, V. & Voight, B. F. An expanded sequence context model broadly explains variability in polymorphism levels across the human genome. Nat. Genet. 48, 349–355 (2016).
  15. Mugal, C. F. & Ellegren, H. Substitution rate variation at human CpG sites correlates with non-CpG divergence, methylation level and GC content. Genome Biol. 12, R58 (2011).
  16. Carlson, J. et al. Extremely rare variants reveal patterns of germline mutation rate heterogeneity in humans. Preprint at bioRxiv https://doi.org/10.1101/108290 (2017).
  17. Yates, A. et al. Ensembl 2016. Nucleic Acids Res. 44, D710–D716 (2016).
  18. Marfella, C. G. A. & Imbalzano, A. N. The Chd family of chromatin remodelers. Mutat. Res. 618, 30–40 (2007).
  19. Van Houdt, J. K. J. et al. Heterozygous missense mutations in SMARCA2 cause Nicolaides-Baraitser syndrome. Nat. Genet. 44, 445–449 (2012).
  20. Spataro, N., Rodríguez, J. A., Navarro, A. & Bosch, E. Properties of human disease genes and the role of genes linked to Mendelian disorders in complex disease aetiology. Hum. Mol. Genet. 26, 489–500 (2017).
  21. Gibson, J., Tapper, W., Ennis, S. & Collins, A. Exome-based linkage disequilibrium maps of individual genes: functional clustering and relationship to disease. Hum. Genet. 132, 233–243 (2013).
  22. Collins, A. The genomic and functional characteristics of disease genes. Brief. Bioinform. 16, 16–23 (2014).
  23. Lelieveld, S. H. et al. Spatial clustering of de novo missense mutations identifies candidate neurodevelopmental disorder-associated genes. Am. J. Hum. Genet. 101, 478–484 (2017).
  24. Davydov, E. V. et al. Identifying a high fraction of the human genome to be under selective constraint using GERP++. PLoS Comput. Biol. 6, e1001025 (2010).
  25. Gussow, A. B., Petrovski, S., Wang, Q., Allen, A. S. & Goldstein, D. B. The intolerance to functional genetic variation of protein domains predicts the localization of pathogenic mutations within genes. Genome Biol. 17, 9 (2016).
  26. Lee, M. P. et al. Low frequency of p57KIP2 mutation in Beckwith-Wiedemann syndrome. Am. J. Hum. Genet. 61, 304–309 (1997).
  27. Romanelli, V. et al. CDKN1C (p57(Kip2)) analysis in Beckwith-Wiedemann syndrome (BWS) patients: genotype-phenotype correlations, novel mutations, and polymorphisms. Am. J. Med. Genet. A 152A, 1390–1397 (2010).
  28. Higashimoto, K., Soejima, H., Saito, T., Okumura, K. & Mukai, T. Imprinting disruption of the CDKN1C/KCNQ1OT1 domain: the molecular mechanisms causing Beckwith-Wiedemann syndrome and cancer. Cytogenet. Genome Res. 113, 306–312 (2006).
  29. Baran, Y. et al. The landscape of genomic imprinting across diverse adult human tissues. Genome Res. 25, 927–936 (2015).
  30. Finn, R. D. et al. The Pfam protein families database. Nucleic Acids Res. 38, D211–D222 (2010).
  31. Weckhuysen, S. et al. KCNQ2 encephalopathy: emerging phenotype of a neonatal epileptic encephalopathy. Ann. Neurol. 71, 15–25 (2012).
  32. Tinel, N., Lauritzen, I., Chouabe, C., Lazdunski, M. & Borsotto, M. The KCNQ2 potassium channel: splice variants, functional and developmental expression. Brain localization and comparison with KCNQ3. FEBS Lett. 438, 171–176 (1998).
  33. Ocorr, K. et al. KCNQ potassium channel mutations cause cardiac arrhythmias in Drosophila that mimic the effects of aging. Proc. Natl Acad. Sci. USA 104, 3943–3948 (2007).
  34. Mark, M., Rijli, F. M. & Chambon, P. Homeobox genes in embryogenesis and pathogenesis. Pediatr. Res. 42, 421–429 (1997).
  35. Stevenson, R. E. in GeneReviews (eds Adam, M. P. et al.) (Univ. Washington, 1993–2018).
  36. Higgs, D. R. et al. Understanding α-globin gene regulation: aiming to improve the management of thalassemia. Ann. NY Acad. Sci. 1054, 92–102 (2005).
  37. Baker, L. A., Allis, C. D. & Wang, G. G. PHD fingers in human diseases: disorders arising from misinterpreting epigenetic marks. Mutat. Res. 647, 3–12 (2008).
  38. Musselman, C. A. & Kutateladze, T. G. PHD fingers: epigenetic effectors and potential drug targets. Mol. Interv. 9, 314–323 (2009).
  39. Matthews, A. G. W. et al. RAG2 PHD finger couples histone H3 lysine 4 trimethylation with V(D)J recombination. Nature 450, 1106–1110 (2007).
  40. Nishimura, K., Lee, S. B., Park, J. H. & Park, M. H. Essential role of eIF5A-1 and deoxyhypusine synthase in mouse embryonic development. Amino Acids 42, 703–710 (2012).
  41. Samocha, K. E. et al. Regional missense constraint improves variant deleteriousness prediction. Preprint at bioRxiv https://doi.org/10.1101/148353 (2017).
  42. de Ligt, J. et al. Diagnostic exome sequencing in persons with severe intellectual disability. N. Engl. J. Med. 367, 1921–1929 (2012).
  43. Rauch, A. et al. Range of genetic mutations associated with severe non-syndromic sporadic intellectual disability: an exome sequencing study. Lancet 380, 1674–1682 (2012).
  44. Lelieveld, S. H. et al. Meta-analysis of 2,104 trios provides support for 10 new genes for intellectual disability. Nat. Neurosci. 19, 1194–1196 (2016).
  45. Deciphering Developmental Disorders Study. Large-scale discovery of novel genetic causes of developmental disorders. Nature 519, 223–228 (2015).
  46. Deciphering Developmental Disorders Study. Prevalence and architecture of de novo mutations in developmental disorders. Nature 542, 433–438 (2017).
  47. Epi4K Consortium. et al. De novo mutations in epileptic encephalopathies. Nature 501, 217–221 (2013).
  48. Iossifov, I. et al. The contribution of de novo coding mutations to autism spectrum disorder. Nature 515, 216–221 (2014).
  49. De Rubeis, S. et al. Synaptic, transcriptional and chromatin genes disrupted in autism. Nature 515, 209–215 (2014).
  50. Kircher, M. et al. A general framework for estimating the relative pathogenicity of human genetic variants. Nat. Genet. 46, 310–315 (2014).
  51. Ioannidis, N. M. et al. REVEL: an ensemble method for predicting the pathogenicity of rare missense variants. Am. J. Hum. Genet. 99, 877–885 (2016).
  52. Traynelis, J. et al. Optimizing genomic medicine in epilepsy through a gene-customized approach to missense variant interpretation. Genome Res. 27, 1715–1729 (2017).
  53. Youden, W. J. Index for rating diagnostic tests. Cancer 3, 32–35 (1950).
  54. Kosmicki, J. A. et al. Refining the role of de novo protein-truncating variants in neurodevelopmental disorders by using population reference samples. Nat. Genet. 49, 504–510 (2017).
  55. Turner, T. N. et al. Genomic patterns of de novo mutation in simplex autism. Cell 171, 710–722 (2017).
  56. Werling, D. M. et al. An analytical framework for whole-genome sequence association studies and its implications for autism spectrum disorder. Nat. Genet. 50, 727–736 (2018).
  57. Homsy, J. et al. De novo mutations in congenital heart disease with neurodevelopmental and other congenital anomalies. Science 350, 1262–1266 (2015).
  58. Keinan, A. & Clark, A. G. Recent explosive human population growth has resulted in an excess of rare genetic variants. Science 336, 740–743 (2012).
  59. Zou, J. et al. Quantifying unobserved protein-coding variants in human populations provides a roadmap for large-scale sequencing projects. Nat. Commun. 7, 13293 (2016).
  60. Villard, E. et al. Mutation screening in dilated cardiomyopathy: prominent role of the beta myosin heavy chain gene. Eur. Heart J. 26, 794–803 (2005).
  61. Tan, A., Abecasis, G. R. & Kang, H. M. Unified representation of genetic variants. Bioinformatics 31, 2202–2204 (2015).
  62. McLaren, W. et al. The Ensembl Variant Effect Predictor. Genome Biol. 17, 122 (2016).
  63. Berg, J. S. et al. An informatics approach to analyzing the incidentalome. Genet. Med. 15, 36–44 (2013).
  64. Quinlan, A. R. & Hall, I. M. BEDTools: a flexible suite of utilities for comparing genomic features. Bioinformatics 26, 841–842 (2010).
  65. Mi, H., Muruganujan, A., Casagrande, J. T. & Thomas, P. D. Large-scale gene function analysis with the PANTHER classification system. Nat. Protoc. 8, 1551–1566 (2013).

Acknowledgements

We acknowledge W. Pearson, C. Feschotte, J. Seger, G. Marth, N. Elde, and S. Kravitz for insightful discussions that motivated some of the analyses presented in this manuscript. We also thank the investigators who contributed to and created the Genome Aggregation Database for openly sharing the genetic variation datasets that facilitated our research. A.R.Q. was supported by the US National Institutes of Health through grants from the National Human Genome Research Institute (R01HG006693 and R01HG009141), the National Institute of General Medical Sciences (R01GM124355), and the National Cancer Institute (U24CA209999). R.M.L. was supported by a K99 award from the National Human Genome Research Institute (K99HG009532).

Author information

Contributions

A.R.Q. conceived the research question and organized the study. J.M.H. led the research and analysis. J.M.H., B.S.P., R.M.L., and A.R.Q. designed the coding constraint region model and contributed to the analyses. J.M.H. and A.R.Q. wrote the manuscript.

Corresponding author

Correspondence to Aaron R. Quinlan.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Integrated supplementary information

Supplementary Figure 1 Evaluation of CCR models by sequencing coverage threshold.

Evaluation of CCR models constructed using different coverage thresholds and different thresholds for the percentage of gnomAD individuals meeting the minimum coverage depth. For example, ‘10x.5 CCR’ reflects a CCR model where every position in a CCR region was required to have 10× coverage in at least 50% of gnomAD individuals. a, ROC curve based on the ClinVar variant set. b, PR curve based on ClinVar. True positives are pathogenic and likely pathogenic variants from ClinVar. True negatives are ClinVar variants labeled as benign. The models perform very similarly, and the ‘10x.5 CCR’ model imposed the most relaxed coverage requirement while exhibiting the highest performance. It was therefore chosen as the coverage threshold for the final model. The evaluation dataset comprised 24,554 pathogenic and 4,689 benign variants from ClinVar.
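
The ‘10x.5’ rule in the caption above can be read as a per-position test. The following is a minimal sketch under an assumed input format, namely a list giving, for each position in a candidate region, the fraction of individuals covered at 10× or more. The function name and input layout are hypothetical and are not taken from the authors' pipeline.

```python
def passes_coverage(fraction_at_depth, min_fraction=0.5):
    """Return True if every position meets the coverage requirement.

    fraction_at_depth: one value per position in the candidate region,
    giving the fraction of individuals covered at >= 10x
    (hypothetical input format).
    min_fraction: required fraction of individuals; 0.5 mirrors the
    '10x.5' model described in the caption.
    """
    return all(f >= min_fraction for f in fraction_at_depth)
```

Raising `min_fraction` gives a stricter model of the kind the figure compares against; note that an empty region trivially passes in this sketch.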

Supplementary Figure 2 Correlation between exonic CpG density and genetic variation.

The sample size is the number of CCRs, which is 8,065,333 unique regions. Pearson’s correlation was used. a, Exonic CpG density compared to the density of exonic C>T or G>A transitions. b, Exonic CpG density compared to the density of all exonic variant types.

Supplementary Figure 3 Average exonic distance for adjacent gnomAD variants.

Distribution of the exonic distance between protein-changing (missense or LoF) variants in gnomAD without filtering regions by coverage, segmental duplications, or self-chains. The red dashed line is the average distance between protein-changing variants. The blue and black dashed lines represent the average length of CCRs in the 95th and 99th percentile, respectively.
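
The distribution described above is built from gaps between neighbouring variants. A minimal sketch, assuming the variant positions have already been converted to exonic coordinates (so introns contribute no distance), which is an assumption of this example and not a detail given in the caption:

```python
def adjacent_distances(positions):
    """Distances between consecutive variant positions.

    Positions are sorted and deduplicated first, so several variants
    at the same site do not produce zero-length gaps.
    """
    ordered = sorted(set(positions))
    return [b - a for a, b in zip(ordered, ordered[1:])]


def mean_distance(positions):
    """Average gap between adjacent variants, or None with fewer than two sites."""
    gaps = adjacent_distances(positions)
    return sum(gaps) / len(gaps) if gaps else None
```

The value returned by `mean_distance` corresponds to the kind of average marked by the red dashed line in the figure.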

Supplementary Figure 4 Correlation of constrained coding regions to other models of genic constraint.

The sample size is the number of CCRs ≥95%, which is 21,650 unique regions, and the number of genes with a Missense Z constraint score or pLI score is 18,225 genes for both sets. a, The correlation between a gene’s Missense Z metric (least to most constrained from left to right) and the number of CCRs in the 95th percentile or higher observed in the gene. b, The correlation between a gene’s RVIS metric (least to most constrained from left to right) and the number of CCRs in the 95th percentile or higher observed in the gene.

Supplementary Figure 5 Total number of shared and unique genes across metrics for predetermined constraint metric cutoffs.

a,b, Comparison of genes covered by each metric’s cutoff for constraint (CCR ≥ 95 (a) or 99 (b), pLI ≥ 0.9, and missense depletion ≤ 0.4). The dark blue bar indicates how many genes are unique to a particular metric’s cutoff for constraint, and the light blue-green bar represents how many of the genes for that cutoff are shared with at least one of the other two metrics.
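
The unique and shared counts described above reduce to set operations on the gene lists that pass each cutoff. A minimal sketch with made-up gene labels (`G1` to `G5` are placeholders, not real results):

```python
def unique_and_shared(gene_sets):
    """Split each metric's genes into unique vs shared with any other metric.

    gene_sets: dict mapping a metric name to the set of genes that pass
    that metric's constraint cutoff.
    """
    result = {}
    for name, genes in gene_sets.items():
        # Union of the genes passing every other metric's cutoff.
        others = set().union(*(g for n, g in gene_sets.items() if n != name))
        result[name] = {"unique": genes - others, "shared": genes & others}
    return result


# Hypothetical example; gene labels are placeholders.
example = {
    "CCR": {"G1", "G2", "G3"},
    "pLI": {"G2", "G4"},
    "missense_depletion": {"G3", "G4", "G5"},
}
```

Taking `len()` of the `unique` and `shared` sets gives the two bar heights for each metric.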

Supplementary Figure 6 Precision–recall (PR) curves for the developmental disorder de novo variant evaluation set.

The true positives are 3,400 missense-only de novo variants from patients with developmental disorders. The true negatives are 1,269 missense de novo variants from the unaffected siblings of autism patients. The dots indicate the score cutoff with the maximal Youden J statistic for each tool. Values in parentheses indicate the F1 score, the harmonic mean of recall and precision, at the J-score cutoff.
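
For readers unfamiliar with the two statistics named in this caption, the sketch below shows one way to locate the cutoff with the maximal Youden J statistic (sensitivity + specificity − 1) and to report the F1 score (the harmonic mean of precision and recall) at that cutoff. It assumes higher scores indicate pathogenicity and that both classes are present. The function names are illustrative and are not the authors' code.

```python
def confusion_counts(scores, labels, cutoff):
    """Count TP, FP, TN, FN when scores >= cutoff are called positive."""
    tp = sum(1 for s, y in zip(scores, labels) if s >= cutoff and y)
    fp = sum(1 for s, y in zip(scores, labels) if s >= cutoff and not y)
    fn = sum(1 for s, y in zip(scores, labels) if s < cutoff and y)
    tn = sum(1 for s, y in zip(scores, labels) if s < cutoff and not y)
    return tp, fp, tn, fn


def youden_j(tp, fp, tn, fn):
    """J = sensitivity + specificity - 1 (needs both classes present)."""
    return tp / (tp + fn) + tn / (tn + fp) - 1


def f1_score(tp, fp, fn):
    """Harmonic mean of precision and recall, as 2TP / (2TP + FP + FN)."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def best_cutoff(scores, labels):
    """Return (cutoff, J, F1) for the cutoff with the largest J.

    Ties are resolved in favour of the lowest cutoff.
    """
    best = None
    for cutoff in sorted(set(scores)):
        tp, fp, tn, fn = confusion_counts(scores, labels, cutoff)
        j = youden_j(tp, fp, tn, fn)
        if best is None or j > best[1]:
            best = (cutoff, j, f1_score(tp, fp, fn))
    return best
```

Here `labels` holds `True` for variants from affected patients and `False` for variants from unaffected siblings.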

Supplementary Figure 7 X-chromosome variant pathogenicity prediction comparison for CCR versus other metrics.

a, Enrichment of 166 pathogenic de novo mutations on the X chromosome in the most constrained X-CCRs and 43 benign mutations in the least constrained X-CCRs. The error bars represent 95% confidence intervals of 0.043–0.226 for the 0–20 bin, 0.46–2.07 for the 20–80 bin, 0.85–16.5 for the 80–90 bin, 0.69–41.1 for the 90–95 bin, and 1.35–77.2 for the 95–100 bin. b, ROC curve for the developmental disorder de novo variant evaluation set. The true positives are 166 missense-only de novo variants from patients with developmental disorders. The true negatives are 43 missense de novo variants from the unaffected siblings of autism patients. c, PR curve for X-CCR versus other metrics for the de novo set. The dots in b and c indicate the score cutoff with the maximal Youden J statistic for each tool. Values in parentheses indicate AUC and peak J score (respectively) for b and the F1 score, the harmonic mean of recall and precision, at the J-score cutoff for c.

Supplementary Figure 8 Odds ratio comparison between ExAC-based CCR and gnomAD-based CCR for the ClinVar variant set.

True positives are 24,554 pathogenic variants and likely pathogenic variants from ClinVar. True negatives are 4,689 variants labeled as benign from ClinVar. For ExAC v1, the 95% confidence intervals are 0.021–0.028 for the 0–20 bin, 20.5–29.6 for the 20–80 bin, 9.09–20.0 for the 80–90 bin, 11.8–47.4 for the 90–95 bin, and 14.1–36.8 for the 95–100 bin. For gnomAD, the 95% confidence intervals are 0.015–0.023 for the 0–20 bin, 23.9–36.6 for the 20–80 bin, 14.6–45.4 for the 80–90 bin, 22.8–1151.0 for the 90–95 bin, and 40.4–647.5 for the 95–100 bin.

Supplementary information

Supplementary Text and Figures

Supplementary Figures 1–8

Reporting Summary

Supplementary Table 1

Genes with CCRs in the 99th percentile or higher

Supplementary Table 2

CCRs under purifying selection specifically in humans

Supplementary Table 3

CCR enrichment in Pfam domains

Supplementary Table 4

Highly constrained CCRs not covered by missense depletion

Supplementary Data

CCR percentile distributions for all Pfam domains

Cite this article

Havrilla, J.M., Pedersen, B.S., Layer, R.M. et al. A map of constrained coding regions in the human genome. Nat Genet 51, 88–95 (2019). https://doi.org/10.1038/s41588-018-0294-6
