Young genes are highly disordered as predicted by the preadaptation hypothesis of de novo gene birth


The phenomenon of de novo gene birth from junk DNA is surprising, because random polypeptides are expected to be toxic. There are two conflicting views about how de novo gene birth is nevertheless possible: the continuum hypothesis invokes a gradual gene birth process, whereas the preadaptation hypothesis predicts that young genes will show extreme levels of gene-like traits. We show that intrinsic structural disorder conforms to the predictions of the preadaptation hypothesis and falsifies the continuum hypothesis, with all genes having higher levels than translated junk DNA, but young genes having the highest level of all. Results are robust to homology detection bias, to the non-independence of multiple members of the same gene family and to the false positive annotation of protein-coding genes.


Figure 1
Figure 2: Young genes have higher ISD (black circles) than old genes.
Figure 3: In agreement with many previous studies, young genes evolve faster and are shorter.
Figure 4: Elevated ISD can be broken down into contributions from amino acid composition and from exact amino acid order.
Figure 5: Putative evidence for the continuum hypothesis can be explained as a statistical artefact known as Simpson’s paradox.
Figure 6: Young yeast genes, like the young mouse genes in Fig. 2, have higher ISD.
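The Simpson's paradox mentioned in the Figure 5 caption can be illustrated with a minimal sketch. All numbers below are invented for illustration and are not taken from the article; the subgroup names and age classes are hypothetical. The sketch shows how a score can be higher in the young class within every subgroup while the pooled averages point the other way, purely because the subgroups are mixed in different proportions in each age class.

```python
# Hypothetical (mean_score, count) per subgroup and age class.
# Every value here is invented to illustrate Simpson's paradox.
data = {
    "young": {"subgroup_a": (0.50, 10), "subgroup_b": (0.20, 90)},
    "old":   {"subgroup_a": (0.40, 90), "subgroup_b": (0.15, 10)},
}


def pooled_mean(groups):
    """Count-weighted mean across the subgroups of one age class."""
    total = sum(mean * n for mean, n in groups.values())
    count = sum(n for _, n in groups.values())
    return total / count


# Pooled average per age class, ignoring subgroup membership.
pooled = {age: pooled_mean(groups) for age, groups in data.items()}

# Within each subgroup, is the young mean higher than the old mean?
within = {
    sub: data["young"][sub][0] > data["old"][sub][0]
    for sub in data["young"]
}

if __name__ == "__main__":
    print("young > old within each subgroup:", within)
    print("pooled means:", pooled)
```

In this toy data the young class has the higher score inside both subgroups, yet the pooled young mean is lower than the pooled old mean, because the young class is dominated by the low-scoring subgroup. Stratifying by subgroup before comparing age classes removes the reversal.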




Work was supported by the John Templeton Foundation (39667), the National Institutes of Health (GM104040) and ERC grant NewGenes (322564). We thank D. Tautz and M. Cordes for discussions, R. Bakaric for assistance with phylostratigraphy and A.-R. Carvunis for comments on a draft of the manuscript and for sharing data.

Author information

J.M. and R.N. conceived the approach, R.N. performed the phylostratigraphy, B.A.W. and S.G.F. completed all other data analyses, and J.M. wrote the paper.

Corresponding author

Correspondence to Joanna Masel.

Ethics declarations

Competing interests

The authors declare no competing financial interests.

Supplementary information

Supplementary Figure 1 and Supplementary Table 1 (PDF 376 kb)

Supplementary Table 2

M. musculus proteins. (CSV 12497 kb)

Supplementary Table 3

Nucleotide sequences from intergenic regions of M. musculus genome (CSV 70010 kb)

Supplementary Table 4

Nucleotide sequences from intergenic regions of the masked M. musculus genome (CSV 70154 kb)

Supplementary Table 5

Randomly generated nucleotide sequences (CSV 35221 kb)

Supplementary Table 6

Scrambled amino acid sequences (CSV 12119 kb)

Supplementary Table 7

S. cerevisiae proteins from Table 1 (CSV 3330 kb)

Cite this article

Wilson, B., Foy, S., Neme, R. et al. Young genes are highly disordered as predicted by the preadaptation hypothesis of de novo gene birth. Nat Ecol Evol 1, 0146 (2017).
