A map of constrained coding regions in the human genome

Havrilla, James M.; Pedersen, Brent S.; Layer, Ryan M.; Quinlan, Aaron R.

doi:10.1038/s41588-018-0294-6

Article
Published: 10 December 2018

Nature Genetics volume 51, pages 88–95 (2019)

Abstract

Deep catalogs of genetic variation from thousands of humans enable the detection of intraspecies constraint by identifying coding regions with a scarcity of variation. While existing techniques summarize constraint for entire genes, single gene-wide metrics conceal regional constraint variability within each gene. Therefore, we have created a detailed map of constrained coding regions (CCRs) by leveraging variation observed among 123,136 humans from the Genome Aggregation Database. The most constrained CCRs are enriched for pathogenic variants in ClinVar and mutations underlying developmental disorders. CCRs highlight protein domain families under high constraint and suggest unannotated or incomplete protein domains. The highest-percentile CCRs complement existing variant prioritization methods when evaluating de novo mutations in studies of autosomal dominant disease. Finally, we identify highly constrained CCRs within genes lacking known disease associations. This observation suggests that CCRs may identify regions under strong purifying selection that, when mutated, cause severe developmental phenotypes or embryonic lethality.

**Fig. 1: Gene-wide summary measures of constraint are prone to overstating and understating constraint within specific regions of protein-coding genes.**

**Fig. 2: The most constrained CCRs are enriched for pathogenic variants and are restricted to a small subset of genes.**

**Fig. 3: The relationship between CCRs and interspecies conservation.**

**Fig. 4: A comparison of CCRs with other models of genic and regional constraint.**

**Fig. 5: Evaluation of de novo mutations from a cohort with severe developmental delay, intellectual disability, and epileptic encephalopathy versus de novo variation from unaffected siblings of autism probands.**

Data availability

The segmental duplications can be found at ftp://hgdownload.soe.ucsc.edu/goldenPath/hg19/database/genomicSuperDups.txt.gz. The self-chains can be found at http://hgdownload.cse.ucsc.edu/goldenPath/hg19/database/chainSelf.txt.gz. The Pfam domains can be found at http://hgdownload.cse.ucsc.edu/goldenPath/hg19/database/ucscGenePfam.txt.gz. The Ensembl exons file can be found at ftp://ftp.ensembl.org/pub/release-75/gtf/homo_sapiens/Homo_sapiens.GRCh37.75.gtf.gz. The gnomAD file can be found at https://storage.googleapis.com/gnomad-public/release/2.0.1/vcf/exomes/gnomad.exomes.r2.0.1.sites.vcf.gz. The gnomAD coverage files can be found at the location indicated by the pattern below: https://storage.googleapis.com/gnomad-public/release/2.0.1/coverage/exomes/gnomad.exomes.r2.0.1.chr$chrom.coverage.txt.gz. The CADD files for both indels and SNPs can be found at http://krishna.gs.washington.edu/download/CADD/v1.3/InDels.tsv.gz and http://krishna.gs.washington.edu/download/CADD/v1.3/whole_genome_SNVs.tsv.gz. The GERP++ file can be found at http://mendel.stanford.edu/SidowLab/downloads/gerp/hg19.GERP_scores.tar.gz. The file for MPC can be found at ftp://ftp.broadinstitute.org/pub/ExAC_release/release1/regional_missense_constraint/fordist_constraint_official_mpc_values.txt.gz. The whole-exome MTR file can be found, courtesy of the author, at http://mtr-viewer.mdhs.unimelb.edu.au:8079/mtrflatfile_1.0.txt.gz. The REVEL file can be found at https://rothsj06.u.hpc.mssm.edu/revel/revel_all_chromosomes.csv.zip. The file for pLI can be found at ftp://ftp.broadinstitute.org/pub/ExAC_release/release1/manuscript_data/forweb_cleaned_exac_r03_march16_z_data_pLI.txt.gz. The ClinVar VCF file used in the analyses can be found at ftp://ftp.ncbi.nih.gov/pub/clinvar/vcf_GRCh37/archive_2.0/2017/clinvar_20170802.vcf.gz. Lastly, the de novo variants file from ref. ⁴¹ can be found on our s3 server at https://s3.us-east-2.amazonaws.com/pathoscore-data/samocha/samochadenovo.xlsx.
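The `$chrom` placeholder in the gnomAD coverage pattern above can be expanded programmatically. The following is a minimal sketch rather than the authors' download procedure: the chromosome labels (1 to 22, X, and Y) and the function name `coverage_urls` are assumptions, since the statement gives only the pattern.

```python
from string import Template

# Pattern quoted from the data availability statement; "$chrom" is its placeholder.
COVERAGE_URL_PATTERN = Template(
    "https://storage.googleapis.com/gnomad-public/release/2.0.1/coverage/exomes/"
    "gnomad.exomes.r2.0.1.chr$chrom.coverage.txt.gz"
)

# Assumption: one file per autosome plus X and Y. The statement does not list
# which chromosome labels exist, so adjust this list as needed.
ASSUMED_CHROMS = [str(n) for n in range(1, 23)] + ["X", "Y"]


def coverage_urls(chroms=ASSUMED_CHROMS):
    """Return one coverage-file URL per chromosome label (hypothetical helper)."""
    return [COVERAGE_URL_PATTERN.substitute(chrom=c) for c in chroms]


if __name__ == "__main__":
    for url in coverage_urls():
        print(url)
```

The sketch only builds the URLs; whether each file is still hosted at that location is not checked.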

References

Wallis, W. A. The statistical research group, 1942–1945. J. Am. Stat. Assoc. 75, 320–330 (1980).
Petrovski, S., Wang, Q., Heinzen, E. L., Allen, A. S. & Goldstein, D. B. Genic intolerance to functional variation and the interpretation of personal genomes. PLoS Genet. 9, e1003709 (2013).
Fu, W. et al. Analysis of 6,515 exomes reveals the recent origin of most human protein-coding variants. Nature 493, 216–220 (2013).
Lek, M. et al. Analysis of protein-coding genetic variation in 60,706 humans. Nature 536, 285–291 (2016).
Samocha, K. E. et al. A framework for the interpretation of de novo mutation in human disease. Nat. Genet. 46, 944–950 (2014).
Finn, R. D. et al. The Pfam protein families database: towards a more sustainable future. Nucleic Acids Res. 44, D279–D285 (2016).
Letunic, I., Doerks, T. & Bork, P. SMART 7: recent updates to the protein domain annotation resource. Nucleic Acids Res. 40, D302–D305 (2012).
Tatusov, R. L. et al. The COG database: an updated version includes eukaryotes. BMC Bioinformatics 4, 41 (2003).
Klimke, W. et al. The National Center For Biotechnology Information’s Protein Clusters Database. Nucleic Acids Res. 37, D216–D223 (2009).
Haft, D. H., Selengut, J. D. & White, O. The TIGRFAMs database of protein families. Nucleic Acids Res. 31, 371–373 (2003).
Bailey, J. A. et al. Recent segmental duplications in the human genome. Science 297, 1003–1007 (2002).
Cabanski, C. R. et al. BlackOPs: increasing confidence in variant detection through mappability filtering. Nucleic Acids Res. 41, e178 (2013).
Lister, R. et al. Human DNA methylomes at base resolution show widespread epigenomic differences. Nature 462, 315–322 (2009).
Aggarwala, V. & Voight, B. F. An expanded sequence context model broadly explains variability in polymorphism levels across the human genome. Nat. Genet. 48, 349–355 (2016).
Mugal, C. F. & Ellegren, H. Substitution rate variation at human CpG sites correlates with non-CpG divergence, methylation level and GC content. Genome Biol. 12, R58 (2011).
Carlson, J. et al. Extremely rare variants reveal patterns of germline mutation rate heterogeneity in humans. Preprint at bioRxiv https://doi.org/10.1101/108290 (2017).
Yates, A. et al. Ensembl 2016. Nucleic Acids Res. 44, D710–D716 (2016).
Marfella, C. G. A. & Imbalzano, A. N. The Chd family of chromatin remodelers. Mutat. Res. 618, 30–40 (2007).
Van Houdt, J. K. J. et al. Heterozygous missense mutations in SMARCA2 cause Nicolaides-Baraitser syndrome. Nat. Genet. 44, 445–449 (2012).
Spataro, N., Rodríguez, J. A., Navarro, A. & Bosch, E. Properties of human disease genes and the role of genes linked to Mendelian disorders in complex disease aetiology. Hum. Mol. Genet. 26, 489–500 (2017).
Gibson, J., Tapper, W., Ennis, S. & Collins, A. Exome-based linkage disequilibrium maps of individual genes: functional clustering and relationship to disease. Hum. Genet. 132, 233–243 (2013).
Collins, A. The genomic and functional characteristics of disease genes. Brief. Bioinform. 16, 16–23 (2014).
Lelieveld, S. H. et al. Spatial clustering of de novo missense mutations identifies candidate neurodevelopmental disorder-associated genes. Am. J. Hum. Genet. 101, 478–484 (2017).
Davydov, E. V. et al. Identifying a high fraction of the human genome to be under selective constraint using GERP++. PLoS Comput. Biol. 6, e1001025 (2010).
Gussow, A. B., Petrovski, S., Wang, Q., Allen, A. S. & Goldstein, D. B. The intolerance to functional genetic variation of protein domains predicts the localization of pathogenic mutations within genes. Genome Biol. 17, 9 (2016).
Lee, M. P. et al. Low frequency of p57^KIP2 mutation in Beckwith-Wiedemann syndrome. Am. J. Hum. Genet. 61, 304–309 (1997).
Romanelli, V. et al. CDKN1C (p57^Kip2) analysis in Beckwith-Wiedemann syndrome (BWS) patients: genotype-phenotype correlations, novel mutations, and polymorphisms. Am. J. Med. Genet. A 152A, 1390–1397 (2010).
Higashimoto, K., Soejima, H., Saito, T., Okumura, K. & Mukai, T. Imprinting disruption of the CDKN1C/KCNQ1OT1 domain: the molecular mechanisms causing Beckwith-Wiedemann syndrome and cancer. Cytogenet. Genome Res. 113, 306–312 (2006).
Baran, Y. et al. The landscape of genomic imprinting across diverse adult human tissues. Genome Res. 25, 927–936 (2015).
Finn, R. D. et al. The Pfam protein families database. Nucleic Acids Res. 38, D211–D222 (2010).
Weckhuysen, S. et al. KCNQ2 encephalopathy: emerging phenotype of a neonatal epileptic encephalopathy. Ann. Neurol. 71, 15–25 (2012).
Tinel, N., Lauritzen, I., Chouabe, C., Lazdunski, M. & Borsotto, M. The KCNQ2 potassium channel: splice variants, functional and developmental expression. Brain localization and comparison with KCNQ3. FEBS Lett. 438, 171–176 (1998).
Ocorr, K. et al. KCNQ potassium channel mutations cause cardiac arrhythmias in Drosophila that mimic the effects of aging. Proc. Natl Acad. Sci. USA 104, 3943–3948 (2007).
Mark, M., Rijli, F. M. & Chambon, P. Homeobox genes in embryogenesis and pathogenesis. Pediatr. Res. 42, 421–429 (1997).
Stevenson, R. E. in GeneReviews (eds Adam, M. P. et al.) (Univ. Washington, 1993–2018).
Higgs, D. R. et al. Understanding α-globin gene regulation: aiming to improve the management of thalassemia. Ann. NY Acad. Sci. 1054, 92–102 (2005).
Baker, L. A., Allis, C. D. & Wang, G. G. PHD fingers in human diseases: disorders arising from misinterpreting epigenetic marks. Mutat. Res. 647, 3–12 (2008).
Musselman, C. A. & Kutateladze, T. G. PHD fingers: epigenetic effectors and potential drug targets. Mol. Interv. 9, 314–323 (2009).
Matthews, A. G. W. et al. RAG2 PHD finger couples histone H3 lysine 4 trimethylation with V(D)J recombination. Nature 450, 1106–1110 (2007).
Nishimura, K., Lee, S. B., Park, J. H. & Park, M. H. Essential role of eIF5A-1 and deoxyhypusine synthase in mouse embryonic development. Amino Acids 42, 703–710 (2012).
Samocha, K. E. et al. Regional missense constraint improves variant deleteriousness prediction. Preprint at bioRxiv https://doi.org/10.1101/148353 (2017).
de Ligt, J. et al. Diagnostic exome sequencing in persons with severe intellectual disability. N. Engl. J. Med. 367, 1921–1929 (2012).
Rauch, A. et al. Range of genetic mutations associated with severe non-syndromic sporadic intellectual disability: an exome sequencing study. Lancet 380, 1674–1682 (2012).
Lelieveld, S. H. et al. Meta-analysis of 2,104 trios provides support for 10 new genes for intellectual disability. Nat. Neurosci. 19, 1194–1196 (2016).
Deciphering Developmental Disorders Study. Large-scale discovery of novel genetic causes of developmental disorders. Nature 519, 223–228 (2015).
Deciphering Developmental Disorders Study. Prevalence and architecture of de novo mutations in developmental disorders. Nature 542, 433–438 (2017).
Epi4K Consortium. et al. De novo mutations in epileptic encephalopathies. Nature 501, 217–221 (2013).
Iossifov, I. et al. The contribution of de novo coding mutations to autism spectrum disorder. Nature 515, 216–221 (2014).
De Rubeis, S. et al. Synaptic, transcriptional and chromatin genes disrupted in autism. Nature 515, 209–215 (2014).
Kircher, M. et al. A general framework for estimating the relative pathogenicity of human genetic variants. Nat. Genet. 46, 310–315 (2014).
Ioannidis, N. M. et al. REVEL: an ensemble method for predicting the pathogenicity of rare missense variants. Am. J. Hum. Genet. 99, 877–885 (2016).
Traynelis, J. et al. Optimizing genomic medicine in epilepsy through a gene-customized approach to missense variant interpretation. Genome Res. 27, 1715–1729 (2017).
Youden, W. J. Index for rating diagnostic tests. Cancer 3, 32–35 (1950).
Kosmicki, J. A. et al. Refining the role of de novo protein-truncating variants in neurodevelopmental disorders by using population reference samples. Nat. Genet. 49, 504–510 (2017).
Turner, T. N. et al. Genomic patterns of de novo mutation in simplex autism. Cell 171, 710–722 (2017).
Werling, D. M. et al. An analytical framework for whole-genome sequence association studies and its implications for autism spectrum disorder. Nat. Genet. 50, 727–736 (2018).
Homsy, J. et al. De novo mutations in congenital heart disease with neurodevelopmental and other congenital anomalies. Science 350, 1262–1266 (2015).
Keinan, A. & Clark, A. G. Recent explosive human population growth has resulted in an excess of rare genetic variants. Science 336, 740–743 (2012).
Zou, J. et al. Quantifying unobserved protein-coding variants in human populations provides a roadmap for large-scale sequencing projects. Nat. Commun. 7, 13293 (2016).
Villard, E. et al. Mutation screening in dilated cardiomyopathy: prominent role of the beta myosin heavy chain gene. Eur. Heart J. 26, 794–803 (2005).
Tan, A., Abecasis, G. R. & Kang, H. M. Unified representation of genetic variants. Bioinformatics 31, 2202–2204 (2015).
McLaren, W. et al. The Ensembl Variant Effect Predictor. Genome Biol. 17, 122 (2016).
Berg, J. S. et al. An informatics approach to analyzing the incidentalome. Genet. Med. 15, 36–44 (2013).
Quinlan, A. R. & Hall, I. M. BEDTools: a flexible suite of utilities for comparing genomic features. Bioinformatics 26, 841–842 (2010).
Mi, H., Muruganujan, A., Casagrande, J. T. & Thomas, P. D. Large-scale gene function analysis with the PANTHER classification system. Nat. Protoc. 8, 1551–1566 (2013).

Acknowledgements

We acknowledge W. Pearson, C. Feschotte, J. Seger, G. Marth, N. Elde, and S. Kravitz for insightful discussions that motivated some of the analyses presented in this manuscript. We also thank the investigators who contributed to and created the Genome Aggregation Database for openly sharing the genetic variation datasets that facilitated our research. A.R.Q. was supported by the US National Institutes of Health through grants from the National Human Genome Research Institute (R01HG006693 and R01HG009141), the National Institute of General Medical Sciences (R01GM124355), and the National Cancer Institute (U24CA209999). R.M.L. was supported by a K99 award from the National Human Genome Research Institute (K99HG009532).

Author information

Authors and Affiliations

Department of Human Genetics, University of Utah, Salt Lake City, UT, USA
James M. Havrilla, Brent S. Pedersen & Aaron R. Quinlan
USTAR Center for Genetic Discovery, University of Utah, Salt Lake City, UT, USA
James M. Havrilla, Brent S. Pedersen & Aaron R. Quinlan
BioFrontiers Institute, University of Colorado, Boulder, CO, USA
Ryan M. Layer
Department of Computer Science, University of Colorado, Boulder, CO, USA
Ryan M. Layer
Department of Biomedical Informatics, University of Utah, Salt Lake City, UT, USA
Aaron R. Quinlan

Contributions

A.R.Q. conceived the research question and organized the study. J.M.H. led the research and analysis. J.M.H., B.S.P., R.M.L., and A.R.Q. designed the coding constraint region model and contributed to the analyses. J.M.H. and A.R.Q. wrote the manuscript.

Corresponding author

Correspondence to Aaron R. Quinlan.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Integrated supplementary information

Supplementary Figure 1 Evaluation of CCR models by sequencing coverage threshold.

Evaluation of CCR models constructed using different coverage thresholds and different thresholds for the percentage of gnomAD individuals meeting the minimum coverage depth. For example, ‘10x.5 CCR’ reflects a CCR model where every position in a CCR region was required to have 10× coverage in at least 50% of gnomAD individuals. a, ROC curve based on the ClinVar variant set. b, PR curve based on ClinVar. True positives are pathogenic variants and likely pathogenic variants from ClinVar. True negatives are variants labeled as benign from ClinVar. The performance of each model is clearly very similar, and the ‘10x.5 CCR’ model imposed the most relaxed coverage requirement while exhibiting the highest performance. It was therefore chosen as the coverage threshold for the final model. 24,554 pathogenic variants from ClinVar were used, and 4,689 benign variants were used for the evaluation dataset.
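The coverage rule described in this caption can be illustrated with a short sketch. It is not the authors' implementation: the function names and the input format (one value per position giving the fraction of individuals at or above the minimum depth) are assumptions made for illustration.

```python
def position_passes(fraction_at_min_depth, min_fraction=0.5):
    """True if enough individuals reach the minimum depth at one position.

    fraction_at_min_depth: assumed input, the fraction of individuals with at
    least the required depth (for example 10x) at this position.
    min_fraction: 0.5 corresponds to the '.5' in the '10x.5' label.
    """
    return fraction_at_min_depth >= min_fraction


def region_passes(fractions, min_fraction=0.5):
    """A region qualifies only if every position in it passes the rule."""
    fractions = list(fractions)
    # An empty region is treated as failing, a choice made for this sketch.
    return bool(fractions) and all(
        position_passes(f, min_fraction) for f in fractions
    )


if __name__ == "__main__":
    print(region_passes([0.9, 0.5, 0.75]))  # every position at or above 0.5
    print(region_passes([0.9, 0.49]))       # one position falls below 0.5
```

Under this reading, a single poorly covered position is enough to disqualify a candidate region, which is why a more relaxed threshold keeps more of the exome available to the model.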

Supplementary Figure 2 Correlation between exonic CpG density and genetic variation.

The sample size is the number of CCRs, which is 8,065,333 unique regions. Pearson’s correlation was used. a, Exonic CpG density compared to the density of exonic C>T or G>A transitions. b, Exonic CpG density compared to the density of all exonic variant types.

Supplementary Figure 3 Average exonic distance for adjacent gnomAD variants.

Distribution of the exonic distance between protein-changing (missense or LoF) variants in gnomAD without filtering regions by coverage, segmental duplications, or self-chains. The red dashed line is the average distance between protein-changing variants. The blue and black dashed lines represent the average length of CCRs in the 95th and 99th percentile, respectively.

Supplementary Figure 4 Correlation of constrained coding regions to other models of genic constraint.

The sample size is the number of CCRs ≥95%, which is 21,650 unique regions, and the number of genes with a Missense Z constraint score or pLI score is 18,225 genes for both sets. a, The correlation between a gene’s Missense Z metric (least to most constrained from left to right) and the number of CCRs in the 95th percentile or higher observed in the gene. b, The correlation between a gene’s RVIS metric (least to most constrained from left to right) and the number of CCRs in the 95th percentile or higher observed in the gene.

Supplementary Figure 5 Total number of shared and unique genes across metrics for predetermined constraint metric cutoffs.

a,b, Comparison of genes covered by each metric’s cutoff for constraint (CCR ≥ 95 (a) or 99 (b), pLI ≥ 0.9, and missense depletion ≤ 0.4). The dark blue bar indicates how many genes are unique to a particular metric’s cutoff for constraint, and the light blue-green bar represents how many of the genes for that cutoff are shared with at least one of the other two metrics.

Supplementary Figure 6 Precision–recall (PR) curves for the developmental disorder de novo variant evaluation set.

The true positives are 3,400 missense-only de novo variants from patients with developmental disorders. The true negatives are 1,269 missense de novo variants from the unaffected siblings of autism patients. The dots indicate the score cutoff with the maximal Youden J statistic for each tool. Values in parentheses indicate the F1 score, the weighted average of recall and precision, at the J-score cutoff.
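The two summary statistics named in this caption can be computed as in the sketch below. It uses the standard definitions (Youden's J as sensitivity plus specificity minus one, and F1 as the harmonic mean of precision and recall) and assumes that higher scores indicate a more likely pathogenic variant. The scores and labels are made-up toy values, not data from the study, and the function names are hypothetical.

```python
def confusion_counts(scores, labels, cutoff):
    """Count TP, FP, TN, FN when 'score >= cutoff' is called positive."""
    tp = fp = tn = fn = 0
    for score, is_positive in zip(scores, labels):
        called_positive = score >= cutoff
        if called_positive and is_positive:
            tp += 1
        elif called_positive:
            fp += 1
        elif is_positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, tn, fn


def youden_j(tp, fp, tn, fn):
    """Youden's J = sensitivity + specificity - 1 (needs both classes present)."""
    return tp / (tp + fn) + tn / (tn + fp) - 1


def f1_score(tp, fp, fn):
    """F1 = harmonic mean of precision and recall; 0.0 if undefined."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    total = precision + recall
    return 2 * precision * recall / total if total else 0.0


def best_cutoff(scores, labels):
    """Return the observed score that maximizes Youden's J."""
    return max(
        sorted(set(scores)),
        key=lambda c: youden_j(*confusion_counts(scores, labels, c)),
    )


if __name__ == "__main__":
    toy_scores = [0.1, 0.6, 0.4, 0.7, 0.9]  # hypothetical tool scores
    toy_labels = [0, 0, 1, 1, 1]            # 1 = case variant, 0 = control
    cutoff = best_cutoff(toy_scores, toy_labels)
    tp, fp, tn, fn = confusion_counts(toy_scores, toy_labels, cutoff)
    print(cutoff, youden_j(tp, fp, tn, fn), f1_score(tp, fp, fn))
```

Reporting F1 at the J-maximizing cutoff, as the caption describes, ties the precision-recall summary to the same operating point used on the ROC curve.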

Supplementary Figure 7 X-chromosome variant pathogenicity prediction comparison for CCR versus other metrics.

a, Enrichment of 166 pathogenic de novo mutations on the X chromosome in the most constrained X-CCRs and 43 benign mutations in the least constrained X-CCRs. The error bars represent 95% confidence intervals of 0.043–0.226 for the 0–20 bin, 0.46–2.07 for the 20–80 bin, 0.85–16.5 for the 80–90 bin, 0.69–41.1 for the 90–95 bin, and 1.35–77.2 for the 95–100 bin. b, ROC curve for the developmental disorder de novo variant evaluation set. The true positives are 166 missense-only de novo variants from patients with developmental disorders. The true negatives are 43 missense de novo variants from the unaffected siblings of autism patients. c, PR curve for X-CCR versus other metrics for the de novo set. The dots in b and c indicate the score cutoff with the maximal Youden J statistic for each tool. Values in parentheses indicate AUC and peak J score (respectively) for b and the F1 score, the weighted average of recall and precision, at the J-score cutoff for c.

Supplementary Figure 8 Odds ratio comparison between ExAC-based CCR and gnomAD-based CCR for the ClinVar variant set.

True positives are 24,554 pathogenic variants and likely pathogenic variants from ClinVar. True negatives are 4,689 variants labeled as benign from ClinVar. For ExAC v1, the 95% confidence intervals are 0.021–0.028 for the 0–20 bin, 20.5–29.6 for the 20–80 bin, 9.09–20.0 for the 80–90 bin, 11.8–47.4 for the 90–95 bin, and 14.1–36.8 for the 95–100 bin. For gnomAD, the 95% confidence intervals are 0.015–0.023 for the 0–20 bin, 23.9–36.6 for the 20–80 bin, 14.6–45.4 for the 80–90 bin, 22.8–1151.0 for the 90–95 bin, and 40.4–647.5 for the 95–100 bin.
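For readers who want to see how a per-bin odds ratio and its 95% confidence interval can be obtained, the sketch below uses the common log-odds (Woolf) approximation. The caption does not state which interval method the authors used, so this choice is an assumption, and the counts in the example are hypothetical rather than taken from the study.

```python
import math


def odds_ratio_ci(path_in, path_out, benign_in, benign_out, z=1.96):
    """Odds ratio for pathogenic-variant enrichment in a bin, with a Woolf CI.

    path_in / path_out: pathogenic variants inside / outside the bin.
    benign_in / benign_out: benign variants inside / outside the bin.
    All four counts must be positive for the log approximation to be defined.
    z=1.96 gives an approximate 95% interval.
    """
    counts = (path_in, path_out, benign_in, benign_out)
    if min(counts) <= 0:
        raise ValueError("all four counts must be positive")
    odds_ratio = (path_in * benign_out) / (path_out * benign_in)
    # Standard error of log(OR) under the Woolf approximation.
    se_log_or = math.sqrt(sum(1 / n for n in counts))
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


if __name__ == "__main__":
    # Hypothetical counts for one percentile bin.
    print(odds_ratio_ci(path_in=40, path_out=60, benign_in=10, benign_out=90))
```

Because the standard error includes the reciprocal of every cell count, bins containing very few benign variants produce wide intervals, which is consistent with the broad ranges reported for the highest-percentile bins.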

About this article

Cite this article

Havrilla, J.M., Pedersen, B.S., Layer, R.M. et al. A map of constrained coding regions in the human genome. Nat Genet 51, 88–95 (2019). https://doi.org/10.1038/s41588-018-0294-6

Received: 14 December 2017
Accepted: 29 October 2018
Published: 10 December 2018
Issue Date: January 2019
DOI: https://doi.org/10.1038/s41588-018-0294-6