Translation of neutrally evolving peptides provides a basis for de novo gene evolution

  • Nature Ecology & Evolution volume 2, pages 890–896 (2018)
  • doi:10.1038/s41559-018-0506-6


Accumulating evidence indicates that some protein-coding genes have originated de novo from previously non-coding genomic sequences. However, the processes underlying de novo gene birth are still enigmatic. In particular, the appearance of a new functional protein seems highly improbable unless there is already a pool of neutrally evolving peptides that are translated at significant levels and that can at some point acquire new functions. Here, we use deep ribosome-profiling sequencing data, together with proteomics and single nucleotide polymorphism information, to search for these peptides. We find hundreds of open reading frames that are translated and that show no evolutionary conservation or selective constraints. These data suggest that the translation of these neutrally evolving peptides may be facilitated by the chance occurrence of open reading frames with a favourable codon composition. We conclude that the pervasive translation of the transcriptome provides plenty of material for the evolution of new functional proteins.
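As a rough illustration of what an open reading frame and its codon composition mean operationally, the sketch below scans the three forward frames of a transcript for ATG-to-stop ORFs and tallies codon usage. It is not the authors' pipeline, which relies on ribosome-profiling reads, proteomics and polymorphism data. The function names `find_orfs` and `codon_usage` and the `min_codons` threshold are hypothetical choices made only for this example.

```python
from collections import Counter

STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(transcript, min_codons=8):
    """Return (start, end, frame) for each ATG-to-stop ORF on the forward strand.

    `start` is the index of the A in ATG, `end` is one past the stop codon,
    and `min_codons` (an illustrative threshold) counts codons before the stop.
    """
    seq = transcript.upper().replace("U", "T")
    orfs = []
    for frame in range(3):
        start = None
        # Walk the frame codon by codon.
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # first in-frame start codon opens an ORF
            elif start is not None and codon in STOP_CODONS:
                if (i - start) // 3 >= min_codons:
                    orfs.append((start, i + 3, frame))
                start = None  # close the ORF and look for the next one
    return orfs


def codon_usage(orf_seq):
    """Count each codon in an in-frame ORF sequence (stop codon included if present)."""
    seq = orf_seq.upper().replace("U", "T")
    return Counter(seq[i:i + 3] for i in range(0, len(seq) - 2, 3))


if __name__ == "__main__":
    example = "CC" + "ATG" + "GCT" * 9 + "TAA" + "GG"
    for start, end, frame in find_orfs(example):
        print(start, end, frame, dict(codon_usage(example[start:end])))
```

A real analysis would additionally require evidence of translation for each candidate ORF, for example three-nucleotide periodicity of ribosome-profiling reads, before treating it as translated.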


Additional information

Publisher’s note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.




Acknowledgements

We are grateful for valuable discussions with many colleagues during this study. This work was funded by grants BFU2012-36820, BFU2015-65235-P and TIN2015-69175-C4-3-R from Ministerio de Economía e Innovación (Spanish Government) and co-funded by FEDER (EC). We also received funding from Agència de Gestió d’Ajuts Universitaris i de Recerca Generalitat de Catalunya (AGAUR), grant no. 2014SGR1121.

Author information


  1. Evolutionary Genomics Group, Research Programme on Biomedical Informatics, Hospital del Mar Research Institute, Universitat Pompeu Fabra, Barcelona, Spain

    • Jorge Ruiz-Orera
    • José Luis Villanueva-Cañas
    • M. Mar Albà
  2. Computer Sciences Department, Universitat Politècnica de Catalunya, Barcelona, Spain

    • Pol Verdaguer-Grau
    • Xavier Messeguer
  3. Catalan Institution for Research and Advanced Studies, Barcelona, Spain

    • M. Mar Albà




Contributions

J.R.-O. and M.M.A. conceived the study, interpreted the data and wrote the paper. J.R.-O. performed most of the analyses, including the transcript assemblies, identification of translated ORFs, BLAST searches, SNP mapping and generation of controls. J.R.-O., P.V.-G. and J.L.V.-C. wrote the code and performed analyses on the coding score. X.M. wrote the code to calculate the expected SNP frequencies. M.M.A. coordinated the study.

Competing interests

The authors declare no competing interests.

Corresponding authors

Correspondence to Jorge Ruiz-Orera or M. Mar Albà.

Supplementary information

  1. Supplementary Information

    Supplementary Tables 1–6, Supplementary Figures 1–10

  2. Life Sciences Reporting Summary