Article

Young genes are highly disordered as predicted by the preadaptation hypothesis of de novo gene birth

Nature Ecology & Evolution volume 1, Article number: 0146 (2017)

Abstract

The phenomenon of de novo gene birth from junk DNA is surprising, because random polypeptides are expected to be toxic. There are two conflicting views about how de novo gene birth is nevertheless possible: the continuum hypothesis invokes a gradual gene birth process, whereas the preadaptation hypothesis predicts that young genes will show extreme levels of gene-like traits. We show that intrinsic structural disorder conforms to the predictions of the preadaptation hypothesis and falsifies the continuum hypothesis, with all genes having higher levels than translated junk DNA, but young genes having the highest level of all. Results are robust to homology detection bias, to the non-independence of multiple members of the same gene family and to the false positive annotation of protein-coding genes.

Acknowledgements

Work was supported by the John Templeton Foundation (39667), the National Institutes of Health (GM104040) and ERC grant NewGenes (322564). We thank D. Tautz and M. Cordes for discussions, R. Bakaric for assistance with phylostratigraphy and A.-R. Carvunis for comments on a draft of the manuscript and for sharing data.

Author information

Author notes

    • Scott G. Foy & Rafik Neme

    Present addresses: St. Jude Children's Research Hospital, Memphis, Tennessee 38105, USA (S.G.F.); Department of Biochemistry and Molecular Biophysics, Columbia University Medical Center, New York 10032, USA (R.N.).

    • Benjamin A. Wilson & Scott G. Foy

    These authors contributed equally to this work.

Affiliations

  1. Department of Ecology and Evolutionary Biology, University of Arizona, Tucson, Arizona 85721, USA.

    • Benjamin A. Wilson, Scott G. Foy & Joanna Masel

  2. Department of Evolutionary Genetics, Max Planck Institute for Evolutionary Biology, Plön SH 24306, Germany.

    • Rafik Neme

Authors

  1. Benjamin A. Wilson

  2. Scott G. Foy

  3. Rafik Neme

  4. Joanna Masel

Contributions

J.M. and R.N. conceived the approach, R.N. performed the phylostratigraphy, B.A.W. and S.G.F. completed all other data analyses, and J.M. wrote the paper.

Competing interests

The authors declare no competing financial interests.

Corresponding author

Correspondence to Joanna Masel.

Supplementary information

PDF files

  1. Supplementary information: Supplementary Figure 1 and Supplementary Table 1.

CSV files

  1. Supplementary Table 2: M. musculus proteins.

  2. Supplementary Table 3: Nucleotide sequences from intergenic regions of the M. musculus genome.

  3. Supplementary Table 4: Nucleotide sequences from intergenic regions of the masked M. musculus genome.

  4. Supplementary Table 5: Randomly generated nucleotide sequences.

  5. Supplementary Table 6: Scrambled amino acid sequences.

  6. Supplementary Table 7: S. cerevisiae proteins from Table 1.

About this article

DOI

https://doi.org/10.1038/s41559-017-0146