Young genes are highly disordered as predicted by the preadaptation hypothesis of de novo gene birth


The phenomenon of de novo gene birth from junk DNA is surprising, because random polypeptides are expected to be toxic. There are two conflicting views about how de novo gene birth is nevertheless possible: the continuum hypothesis invokes a gradual gene birth process, whereas the preadaptation hypothesis predicts that young genes will show extreme levels of gene-like traits. We show that intrinsic structural disorder conforms to the predictions of the preadaptation hypothesis and falsifies the continuum hypothesis, with all genes having higher levels than translated junk DNA, but young genes having the highest level of all. Results are robust to homology detection bias, to the non-independence of multiple members of the same gene family and to the false positive annotation of protein-coding genes.

Figure 1
Figure 2: Young genes have higher ISD (black circles) than old genes.
Figure 3: In agreement with many previous studies, young genes evolve faster and are shorter.
Figure 4: Elevated ISD can be broken down into contributions from amino acid composition and from exact amino acid order.
Figure 5: Putative evidence for the continuum hypothesis can be explained as a statistical artefact known as Simpson’s paradox.
Figure 6: Young yeast genes, like the young mouse genes in Fig. 2, have higher ISD.
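Simpson's paradox, named in the Figure 5 caption, arises when a trend within each subgroup reverses once the subgroups are pooled. The sketch below is a minimal illustration of that general effect. The two groups and all of their numbers are invented for the example and are not data from the paper, and the helper names `slope` and `group_slope` are hypothetical.

```python
from statistics import mean


def slope(xs, ys):
    """Ordinary least-squares slope of ys regressed on xs."""
    mx, my = mean(xs), mean(ys)
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = sum((x - mx) ** 2 for x in xs)
    return num / den


def group_slope(points):
    """Slope for a list of (x, y) pairs."""
    xs, ys = zip(*points)
    return slope(xs, ys)


# Invented toy data (NOT from the paper): two subgroups of (x, y) points.
# Within each subgroup y rises with x, but group_b sits at high x and low y.
group_a = [(1, 10), (2, 11), (3, 12)]
group_b = [(6, 2), (7, 3), (8, 4)]

slope_a = group_slope(group_a)                 # trend inside subgroup A
slope_b = group_slope(group_b)                 # trend inside subgroup B
slope_pooled = group_slope(group_a + group_b)  # trend when subgroups are mixed

print(f"within A: {slope_a:+.2f}, within B: {slope_b:+.2f}, "
      f"pooled: {slope_pooled:+.2f}")
```

Both within-group slopes are positive while the pooled slope is negative, so a pooled analysis alone would suggest the opposite of what holds inside every subgroup.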



Acknowledgements

Work was supported by the John Templeton Foundation (39667), the National Institutes of Health (GM104040) and ERC grant NewGenes (322564). We thank D. Tautz and M. Cordes for discussions, R. Bakaric for assistance with phylostratigraphy and A.-R. Carvunis for comments on a draft of the manuscript and for sharing data.

Author information




J.M. and R.N. conceived the approach, R.N. performed the phylostratigraphy, B.A.W. and S.G.F. completed all other data analyses, and J.M. wrote the paper.

Corresponding author

Correspondence to Joanna Masel.

Ethics declarations

Competing interests

The authors declare no competing financial interests.

Supplementary information

Supplementary Figure 1 and Supplementary Table 1 (PDF 376 kb)

Supplementary Table 2

M. musculus proteins. (CSV 12497 kb)

Supplementary Table 3

Nucleotide sequences from intergenic regions of M. musculus genome (CSV 70010 kb)

Supplementary Table 4

Nucleotide sequences from intergenic regions of the masked M. musculus genome (CSV 70154 kb)

Supplementary Table 5

Randomly generated nucleotide sequences (CSV 35221 kb)

Supplementary Table 6

Scrambled amino acid sequences (CSV 12119 kb)

Supplementary Table 7

S. cerevisiae proteins from Table 1 (CSV 3330 kb)
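Supplementary Table 6 lists scrambled amino acid sequences, and the Figure 4 caption separates the contribution of amino acid composition from that of exact amino acid order. A scramble control of this kind can be sketched as follows, assuming that scrambling means shuffling residue order while keeping composition fixed. The sequence, the `HYDROPHOBIC` residue set and both toy metrics are illustrative assumptions; they are stand-ins for a real disorder predictor, not the authors' method.

```python
import random
from collections import Counter

# Assumed illustrative residue set; not a definition taken from the paper.
HYDROPHOBIC = set("AVILMFWC")


def scramble(seq, seed=0):
    """Shuffle residue order; composition and length are preserved."""
    residues = list(seq)
    random.Random(seed).shuffle(residues)
    return "".join(residues)


def hydrophobic_fraction(seq):
    """Composition-only metric: unchanged by any reordering of residues."""
    return sum(r in HYDROPHOBIC for r in seq) / len(seq)


def longest_hydrophobic_run(seq):
    """Order-dependent metric: can change when residues are reordered."""
    best = run = 0
    for r in seq:
        run = run + 1 if r in HYDROPHOBIC else 0
        best = max(best, run)
    return best


# Invented toy sequence (NOT from the paper) with clustered hydrophobic residues.
original = "AAAAVVVVLLLLSSSSGGGGKKKKEEEE"
shuffled = scramble(original)

for name, seq in (("original", original), ("scrambled", shuffled)):
    print(name, seq,
          f"fraction={hydrophobic_fraction(seq):.2f}",
          f"longest_run={longest_hydrophobic_run(seq)}")
```

Any difference in a metric between a sequence and its scramble is attributable to residue order, because composition is identical by construction.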

Cite this article

Wilson, B., Foy, S., Neme, R. et al. Young genes are highly disordered as predicted by the preadaptation hypothesis of de novo gene birth. Nat Ecol Evol 1, 0146 (2017).
