Abstract
The phenomenon of de novo gene birth from junk DNA is surprising, because random polypeptides are expected to be toxic. There are two conflicting views about how de novo gene birth is nevertheless possible: the continuum hypothesis invokes a gradual gene birth process, whereas the preadaptation hypothesis predicts that young genes will show extreme levels of gene-like traits. We show that intrinsic structural disorder conforms to the predictions of the preadaptation hypothesis and falsifies the continuum hypothesis, with all genes having higher levels than translated junk DNA, but young genes having the highest level of all. Results are robust to homology detection bias, to the non-independence of multiple members of the same gene family and to the false positive annotation of protein-coding genes.
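The contrast between the two hypotheses can be sketched as a simple ordering test on mean disorder scores. This is a minimal illustration, not the authors' analysis: the function name `classify_pattern`, the category labels and the numeric scores are all hypothetical, and reading the continuum hypothesis as "young genes lie between non-genic sequence and old genes" is an assumption drawn from the word "gradual" in the abstract.

```python
def classify_pattern(junk, young, old):
    """Say which hypothesis an ordering of mean disorder scores resembles.

    junk  -- mean predicted disorder of translated non-genic sequence
    young -- mean predicted disorder of young genes
    old   -- mean predicted disorder of old genes

    All three inputs are hypothetical summary values, not study data.
    """
    # Preadaptation (as stated in the abstract): all genes exceed junk,
    # and young genes are the most extreme.
    if young > old > junk:
        return "preadaptation-like"
    # Continuum (assumed reading): young genes are intermediate between
    # non-genic sequence and old genes.
    if junk <= young <= old and junk < old:
        return "continuum-like"
    return "neither"


# Made-up scores chosen only to exercise each branch.
print(classify_pattern(junk=0.30, young=0.60, old=0.45))
print(classify_pattern(junk=0.30, young=0.40, old=0.50))
```

The first call mirrors the ordering reported in the abstract (young genes highest, all genes above non-genic sequence); the second shows the intermediate ordering that a gradual process would be expected to produce under the assumption above.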
Acknowledgements
Work was supported by the John Templeton Foundation (39667), the National Institutes of Health (GM104040) and ERC grant NewGenes (322564). We thank D. Tautz and M. Cordes for discussions, R. Bakaric for assistance with phylostratigraphy and A.-R. Carvunis for comments on a draft of the manuscript and for sharing data.
Author information
Contributions
J.M. and R.N. conceived the approach, R.N. performed the phylostratigraphy, B.A.W. and S.G.F. completed all other data analyses, and J.M. wrote the paper.
Ethics declarations
Competing interests
The authors declare no competing financial interests.
Supplementary information
Supplementary Figure 1 and Supplementary Table 1 (PDF 376 kb)
Supplementary Table 2
M. musculus proteins. (CSV 12497 kb)
Supplementary Table 3
Nucleotide sequences from intergenic regions of M. musculus genome (CSV 70010 kb)
Supplementary Table 4
Nucleotide sequences from intergenic regions of the masked M. musculus genome (CSV 70154 kb)
Supplementary Table 5
Randomly generated nucleotide sequences (CSV 35221 kb)
Supplementary Table 6
Scrambled amino acid sequences (CSV 12119 kb)
Supplementary Table 7
S. cerevisiae proteins from Table 1 (CSV 3330 kb)
About this article
Cite this article
Wilson, B., Foy, S., Neme, R. et al. Young genes are highly disordered as predicted by the preadaptation hypothesis of de novo gene birth. Nat Ecol Evol 1, 0146 (2017). https://doi.org/10.1038/s41559-017-0146
DOI: https://doi.org/10.1038/s41559-017-0146