Incipient de novo genes can evolve from frozen accidents that escaped rapid transcript turnover

Abstract

A recent surge of studies has suggested that many novel genes arise de novo from previously noncoding DNA rather than by duplication. However, most studies have concentrated on longer evolutionary time scales and have rarely considered protein structural properties. It therefore remains unclear how these properties are shaped by evolution, how they depend on genetic mechanisms and how they influence gene survival. Here we compare open reading frames (ORFs) from high-coverage transcriptomes of mouse and four other mammals, covering 160 million years of evolution. We find that novel ORFs pervasively emerge from noncoding regions but are rapidly lost again, whereas relatively fewer arise from the divergence of coding sequences but are retained much longer. We also find that a subset (14%) of the mouse-specific ORFs bind ribosomes and are potentially translated, showing that such ORFs can be the starting points of gene emergence. Surprisingly, disorder and other protein properties of young ORFs hardly change with gene age over short time frames. Only length and nucleotide composition change significantly. Thus, some transcribed de novo genes resemble ‘frozen accidents’ of randomly emerged ORFs that survived initial purging. This perspective is consistent with recent studies indicating that some neutrally evolving transcripts containing random protein sequences may be translated and may be viable starting points of de novo gene emergence.
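As an illustration of the two properties the abstract reports as changing with ORF age (length and nucleotide composition), the following is a minimal Python sketch of extracting ORFs from a transcript and measuring them. It is not the pipeline used in the study: the function names `find_orfs` and `gc_content`, the `min_codons` threshold, the restriction to the three forward frames and the example sequence are all assumptions made for illustration.

```python
# Minimal sketch (assumed, not the authors' method): scan the three forward
# reading frames of a transcript for ATG...stop ORFs, then report the length
# and GC content of each ORF.

STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(seq, min_codons=10):
    """Return ORF nucleotide sequences (start codon through stop codon).

    min_codons is a hypothetical cutoff on the number of codons before the
    stop codon; it is not a value taken from the article.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None  # index of the first ATG of the currently open ORF
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                if (i - start) // 3 >= min_codons:
                    orfs.append(seq[start:i + 3])
                start = None  # close the ORF and look for the next ATG
    return orfs


def gc_content(seq):
    """Fraction of G and C nucleotides in seq (0.0 for an empty sequence)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


if __name__ == "__main__":
    # Made-up toy transcript, used only to exercise the functions.
    transcript = "CC" + "ATG" + "GCC" * 3 + "TAA" + "GG"
    for orf in find_orfs(transcript, min_codons=4):
        print(orf, len(orf), round(gc_content(orf), 3))
```

A real analysis would also need to scan the reverse strand, handle ORFs without an in-frame stop codon and choose a biologically motivated length cutoff; those choices are left open here.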

Fig. 1: ORF annotation status varies with ORF age.
Fig. 2: Comparison of ORF sequence properties across age classes.
Fig. 3: Comparison of mouse-specific ORF sequences to randomly generated sequences.

Data availability

The datasets used and/or analysed during the current study are available at https://doi.org/10.6084/m9.figshare.6225563.v1.

Acknowledgements

J.F.S. was supported by an HFSP grant to E.B.-B. We thank D. Tautz and R. Neme for input into the initial study design and A. Lange for valuable feedback on the manuscript.

Author information

Contributions

All authors conceived the study. J.F.S. performed the analyses. All authors analysed the data. J.F.S. and E.B.-B. wrote the paper. All authors read, finalized and approved the final manuscript.

Corresponding author

Correspondence to Erich Bornberg-Bauer.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

Supplementary Information

Supplementary Tables and Supplementary Figures

Reporting Summary

About this article

Cite this article

Schmitz, J.F., Ullrich, K.K. & Bornberg-Bauer, E. Incipient de novo genes can evolve from frozen accidents that escaped rapid transcript turnover. Nat Ecol Evol 2, 1626–1632 (2018). https://doi.org/10.1038/s41559-018-0639-7
