Abstract
Patterns of amino acid conservation have served as a tool for understanding protein evolution [1]. The same principles have also found broad application in human genomics, driven by the need to interpret the pathogenic potential of variants in patients [2]. Here we performed a systematic comparative genomics analysis of human disease-causing missense variants. We found that an appreciable fraction of disease-causing alleles are fixed in the genomes of other species, suggesting a role for genomic context. We developed a model of genetic interactions that predicts most of these to be simple pairwise compensations. Functional testing of this model on two known human disease genes [3,4] revealed discrete cis amino acid residues that, although benign on their own, could rescue the human mutations in vivo. We also applied this approach to ab initio gene discovery, supporting the identification of a de novo disease driver in BTG2 that is subject to protective cis-modification in more than 50 species. Finally, on the basis of our data and models, we developed a computational tool to predict candidate residues subject to compensation. Taken together, our data highlight the importance of cis-genomic context as a contributor to protein evolution, provide insight into the complexity of allelic effects on phenotype, and are likely to assist methods for predicting allele pathogenicity [5,6].
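The core comparative-genomics test described above can be illustrated with a minimal sketch. It assumes a hypothetical, pre-computed alignment in which every sequence is already aligned to the human protein; the species, sequences and the variant p.K2R are invented for illustration and are not data from the study. The hypothetical function `cpd_carriers` flags a variant as a candidate compensated pathogenic deviation (CPD) when the human disease residue is the wild-type residue in at least one other species. The second hypothetical function, `candidate_compensating_positions`, is a deliberately simple co-occurrence filter, not the authors' predictor: it lists positions where every carrier species differs from human.

```python
from typing import Dict, List

# Hypothetical toy alignment (not data from the study): each sequence is
# already aligned to the human protein, with '-' marking a gap.
ALIGNMENT: Dict[str, str] = {
    "human":   "MKTAYIAKQR",
    "mouse":   "MKTAYIAKQR",
    "chicken": "MRTAYVAKQR",
    "frog":    "MRTAYVSKQR",
}


def cpd_carriers(alignment: Dict[str, str], pos: int, ref: str, alt: str,
                 reference: str = "human") -> List[str]:
    """Species whose aligned residue at 1-based `pos` is the pathogenic `alt`.

    A non-empty result means the human disease allele is the wild-type
    residue in another species, i.e. a candidate CPD.
    """
    if alignment[reference][pos - 1] != ref:
        raise ValueError(f"reference residue at position {pos} is not {ref}")
    return [species for species, seq in alignment.items()
            if species != reference and seq[pos - 1] == alt]


def candidate_compensating_positions(alignment: Dict[str, str], pos: int,
                                     carriers: List[str],
                                     reference: str = "human") -> List[int]:
    """Hypothetical heuristic: 1-based positions (other than `pos`) where every
    carrier species has a non-gap residue that differs from the human one."""
    if not carriers:
        return []
    human = alignment[reference]
    hits = []
    for i, human_res in enumerate(human, start=1):
        if i == pos:
            continue
        if all(alignment[sp][i - 1] not in ("-", human_res) for sp in carriers):
            hits.append(i)
    return hits


if __name__ == "__main__":
    # Hypothetical human disease variant p.K2R
    carriers = cpd_carriers(ALIGNMENT, pos=2, ref="K", alt="R")
    print(carriers)  # ['chicken', 'frog']
    print(candidate_compensating_positions(ALIGNMENT, 2, carriers))  # [6]
```

A real analysis would also have to account for alignment quality, gaps and the phylogenetic non-independence of species, all of which this sketch ignores.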
References
1. Alföldi, J. & Lindblad-Toh, K. Comparative genomics as a tool to understand evolution and disease. Genome Res. 23, 1063–1068 (2013)
2. Cooper, G. M. & Shendure, J. Needles in stacks of needles: finding disease-causal variants in a wealth of genomic data. Nature Rev. Genet. 12, 628–640 (2011)
3. Katsanis, N. et al. BBS4 is a minor contributor to Bardet-Biedl syndrome and may also participate in triallelic inheritance. Am. J. Hum. Genet. 71, 22–29 (2002)
4. Khanna, H. et al. A common allele in RPGRIP1L is a modifier of retinal degeneration in ciliopathies. Nature Genet. 41, 739–745 (2009)
5. Adzhubei, I. A. et al. A method and server for predicting damaging missense mutations. Nature Methods 7, 248–249 (2010)
6. Sim, N. L. et al. SIFT web server: predicting effects of amino acid substitutions on proteins. Nucleic Acids Res. 40, W452–W457 (2012)
7. Breen, M. S., Kemena, C., Vlasov, P. K., Notredame, C. & Kondrashov, F. A. Epistasis as the primary factor in molecular evolution. Nature 490, 535–538 (2012)
8. Weinreich, D. M., Delaney, N. F., DePristo, M. A. & Hartl, D. L. Darwinian evolution can follow only very few mutational paths to fitter proteins. Science 312, 111–114 (2006)
9. McCandlish, D. M., Rajon, E., Shah, P., Ding, Y. & Plotkin, J. B. The role of epistasis in protein evolution. Nature 497, E1–2 (2013)
10. Corbett-Detig, R. B., Zhou, J., Clark, A. G., Hartl, D. L. & Ayroles, J. F. Genetic incompatibilities are widespread within species. Nature 504, 135–137 (2013)
11. Gao, L. & Zhang, J. Why are some human disease-associated mutations fixed in mice? Trends Genet. 19, 678–681 (2003)
12. Weinreich, D. M., Lan, Y., Wylie, C. S. & Heckendorn, R. B. Should evolutionary geneticists worry about higher-order epistasis? Curr. Opin. Genet. Dev. 23, 700–707 (2013)
13. Chou, H. H., Chiu, H. C., Delaney, N. F., Segre, D. & Marx, C. J. Diminishing returns epistasis among beneficial mutations decelerates adaptation. Science 332, 1190–1192 (2011)
14. Kondrashov, A. S., Sunyaev, S. & Kondrashov, F. A. Dobzhansky-Muller incompatibilities in protein evolution. Proc. Natl Acad. Sci. USA 99, 14878–14883 (2002)
15. Kulathinal, R. J., Bettencourt, B. R. & Hartl, D. L. Compensated deleterious mutations in insect genomes. Science 306, 1553–1554 (2004)
16. Soylemez, O. & Kondrashov, F. A. Estimating the rate of irreversibility in protein evolution. Genome Biol. Evol. 4, 1213–1222 (2012)
17. Ferrer-Costa, C., Orozco, M. & de la Cruz, X. Characterization of compensated mutations in terms of structural and physico-chemical properties. J. Mol. Biol. 365, 249–256 (2007)
18. Waterston, R. H. et al. Initial sequencing and comparative analysis of the mouse genome. Nature 420, 520–562 (2002)
19. Mottaz, A., David, F. P., Veuthey, A. L. & Yip, Y. L. Easy retrieval of single amino-acid polymorphisms and phenotype information using SwissVar. Bioinformatics 26, 851–852 (2010)
20. Landrum, M. J. et al. ClinVar: public archive of relationships among sequence variation and human phenotype. Nucleic Acids Res. 42, D980–D985 (2014)
21. Ahola, V., Aittokallio, T., Vihinen, M. & Uusipaikka, E. Model-based prediction of sequence alignment quality. Bioinformatics 24, 2165–2171 (2008)
22. Giudicessi, J. R. & Ackerman, M. J. Determinants of incomplete penetrance and variable expressivity in heritable cardiac arrhythmia syndromes. Transl. Res. 161, 1–14 (2013)
23. Kimura, M. The Neutral Theory of Molecular Evolution (Cambridge Univ. Press, 1984)
24. Povolotskaya, I. S. & Kondrashov, F. A. Sequence space and the ongoing expansion of the protein universe. Nature 465, 922–926 (2010)
25. Zaghloul, N. A. et al. Functional analyses of variants reveal a significant role for dominant negative and common alleles in oligogenic Bardet-Biedl syndrome. Proc. Natl Acad. Sci. USA 107, 10602–10607 (2010)
26. Katsanis, N., Cotten, M. & Angrist, M. Exome and genome sequencing of neonates with neurodevelopmental disorders. Future Neurol. 7, 655–658 (2012)
27. Herman, D. S. et al. Truncations of titin causing dilated cardiomyopathy. N. Engl. J. Med. 366, 619–628 (2012)
28. Montagnoli, A., Guardavaccaro, D., Starace, G. & Tirone, F. Overexpression of the nerve growth factor-inducible PC3 immediate early gene is associated with growth inhibition. Cell Growth Differ. 7, 1327–1336 (1996)
29. Beunders, G. et al. Exonic deletions in AUTS2 cause a syndromic form of intellectual disability and suggest a critical role for the C terminus. Am. J. Hum. Genet. 92, 210–220 (2013)
30. Fraïsse, C., Elderfield, J. A. & Welch, J. J. The genetics of speciation: are complex incompatibilities easier to evolve? J. Evol. Biol. 27, 688–699 (2014)
31. Tennessen, J. A. et al. Evolution and functional impact of rare coding variation from deep sequencing of human exomes. Science 337, 64–69 (2012)
32. Karolchik, D. et al. The UCSC Genome Browser database: 2014 update. Nucleic Acids Res. 42, D764–D770 (2014)
33. Flicek, P. et al. Ensembl 2014. Nucleic Acids Res. 42, D749–D755 (2014)
34. Bainbridge, M. N. et al. Targeted enrichment beyond the consensus coding DNA sequence exome reveals exons with higher variant densities. Genome Biol. 12, R68 (2011)
35. Challis, D. et al. An integrative variant analysis suite for whole exome next-generation sequencing data. BMC Bioinformatics 13, 8 (2012)
36. Niederriter, A. R. et al. In vivo modeling of the morbid human genome using Danio rerio. J. Vis. Exp. 78, e50338 (2012)
Acknowledgements
We thank Y. Liu and D. Balick for helpful discussions, M. Kousi for assistance with the NCL mutational list, and M. Talkowski, A. Kondrashov and G. Lyon for critical review of the manuscript. This work was supported by grants R01HD04260, R01DK072301 and R01DK075972 (N.K.); R01 GM078598, R01 MH101244, R01 DK095721 and U01 HG006500 (S.R.S.); R01EY021872 (E.E.D.); and a NARSAD Young Investigator Award (C.G.). N.K. is a Distinguished Brumley Professor.
Author information
Contributions
D.M.J., S.G.F., S.R.S. and N.K. designed the overall study. D.M.J., C.A.C. and S.R.S. conceptualized the principle of compensated pathogenic deviations (CPDs) and performed all computational analyses. S.G.F., E.E.D. and N.K. conceptualized the biological properties of CPDs and implemented in vivo testing with the assistance of C.G. J.K. referred the index patient and evaluated clinical data in the context of molecular discoveries. The Task Force for Neonatal Genomics constructed the platforms and methods for recruitment, ascertainment and evaluation of clinical and molecular data and return of results.
Ethics declarations
Competing interests
The authors declare no competing financial interests.
Additional information
Lists of participants and their affiliations appear in the Supplementary Information.
Extended data figures and tables
Extended Data Figure 1 Different alignment methodologies with HumVar and ClinVar produce qualitatively similar alignments.
a, b, Distributions of missense variants annotated as neutral (a) or pathogenic (b) in the HumVar and ClinVar data sets, with each of the five alignment strategies described in the text (MultiZ unfiltered, MultiZ mammals-only, EPO, MultiZ with alignment quality filter, MultiZ with >1 sequence filter). All distributions are quantitatively similar. Compare with Fig. 2c, d.
Extended Data Figure 2 Protein domain structure of functionally tested human disease genes.
a, BBS4 (519 amino acids) has eight tetratricopeptide repeat (TPR) domains (yellow); b, RPGRIP1L (1,315 amino acids) has multiple coiled-coil domains (green rectangles) and two protein kinase C conserved region 2 (C2) domains (green hexagons); c, BTG2 (158 amino acids) has one BTG1 domain (purple pentagon). Disease-causing alleles are shown as red stars and complementing alleles as blue stars; an amino acid scale in increments of 100 is shown below each schematic.
Extended Data Figure 3 Evaluation of btg2 and nos2a/b morpholinos (MOs).
a–c, Schematic of the D. rerio btg2, nos2a and nos2b loci. Blue boxes, exons; dashed lines, introns; white boxes, untranslated regions; red boxes, MOs; ATG indicates the translational start site; arrows, polymerase chain reaction with reverse transcription (RT–PCR) primers; number indicates the targeted exon. d, e, Agarose gel images of nos2a/b RT–PCR products.
Extended Data Figure 4 HuC/HuD staining and quantification of zebrafish embryos at 2 days post-fertilization (dpf) confirm the pathogenicity of BTG2 V141M.
a, Suppression of btg2 leads to a decrease in HuC/HuD levels at 2 dpf. Representative ventral images of a control embryo, btg2 morphants (showing unilateral or absent HuC/HuD expression), and a rescued embryo injected with btg2 MO plus human BTG2 wild-type (WT) mRNA. Scale bar, 250 μm. b, Percentage of embryos with normal, bilateral HuC/HuD protein levels in the anterior forebrain or with decreased or unilateral HuC/HuD protein levels, among embryos injected with btg2 MOs alone or with MOs plus human BTG2 wild-type or variant mRNAs (p.V141M, index case; p.A126S and p.R145Q, control alleles). *P < 0.05 (two-tailed t-test comparing MO-injected and rescued embryos; n = 38–78 embryos per injection batch).
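The caption reports the percentage of embryos with normal staining and a two-tailed t-test, but does not state exactly which quantities entered the test. A minimal sketch, assuming the test was run on per-batch percentages of normal embryos with a pooled-variance (Student's) two-sample t statistic, is shown below. The counts and the names `percent_normal` and `students_t` are hypothetical and are not the study's data or code. The two-tailed P value would then be read from the t distribution with the returned degrees of freedom (for example, `scipy.stats.ttest_ind` computes the same statistic and a two-sided P value by default).

```python
import math
from statistics import mean, variance
from typing import List, Tuple

# Hypothetical (normal, total) embryo counts per injection batch; not study data.
MO_ONLY = [(20, 50), (25, 50), (18, 40)]
MO_PLUS_WT = [(40, 50), (45, 50), (34, 40)]


def percent_normal(batches: List[Tuple[int, int]]) -> List[float]:
    """Percentage of embryos scored as normal (bilateral HuC/HuD) in each batch."""
    return [100.0 * normal / total for normal, total in batches]


def students_t(a: List[float], b: List[float]) -> Tuple[float, int]:
    """Pooled-variance two-sample t statistic and its degrees of freedom."""
    na, nb = len(a), len(b)
    pooled = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    t = (mean(a) - mean(b)) / math.sqrt(pooled * (1 / na + 1 / nb))
    return t, na + nb - 2


if __name__ == "__main__":
    mo = percent_normal(MO_ONLY)          # [40.0, 50.0, 45.0]
    rescue = percent_normal(MO_PLUS_WT)   # [80.0, 90.0, 85.0]
    t, df = students_t(mo, rescue)
    print(f"t = {t:.3f}, df = {df}")      # t = -9.798, df = 4
```

Treating each injection batch as one replicate keeps batches of different size (here 40 or 50 embryos) on equal footing; a test on pooled embryo counts would be a different, equally plausible design choice.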
Supplementary information
Supplementary Information
This file contains Supplementary Text and Supplementary Tables 1–11. (PDF 992 kb)
Supplementary Data
This file contains the Predictor Code, the source code for the publicly accessible online prediction algorithm. (TXT 6 kb)
About this article
Cite this article
Jordan, D., Frangakis, S., Golzio, C. et al. Identification of cis-suppression of human disease mutations by comparative genomics. Nature 524, 225–229 (2015). https://doi.org/10.1038/nature14497
DOI: https://doi.org/10.1038/nature14497
This article is cited by
- Intragenic compensation through the lens of deep mutational scanning. Biophysical Reviews (2022)
- Compensatory epistasis explored by molecular dynamics simulations. Human Genetics (2021)
- Evolutionary and biomedical insights from a marmoset diploid genome assembly. Nature (2021)
- Clinical evolution, genetic landscape and trajectories of clonal hematopoiesis in SAMD9/SAMD9L syndromes. Nature Medicine (2021)
- Clinical and genetic variability in children with partial albinism. Scientific Reports (2019)