A recent surge of studies has suggested that many novel genes arise de novo from previously noncoding DNA and not by duplication. However, most studies concentrated on longer evolutionary time scales and rarely considered protein structural properties. Therefore, it remains unclear how these properties are shaped by evolution, depend on genetic mechanisms and influence gene survival. Here we compare open reading frames (ORFs) from high-coverage transcriptomes of mouse and four other mammals, covering 160 million years of evolution. We find that novel ORFs pervasively emerge from noncoding regions but are rapidly lost again, whereas relatively fewer arise from the divergence of coding sequences, and these are retained much longer. We also find that a subset (14%) of the mouse-specific ORFs bind ribosomes and are potentially translated, showing that such ORFs can be the starting points of gene emergence. Surprisingly, disorder and other protein properties of young ORFs hardly change with gene age over short time frames. Only length and nucleotide composition change significantly. Thus, some transcribed de novo genes resemble ‘frozen accidents’ of randomly emerged ORFs that survived initial purging. This perspective is consistent with very recent studies indicating that some neutrally evolving transcripts containing random protein sequences may be translated and be viable starting points of de novo gene emergence.
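To make the notion of an ORF in a transcript concrete, the following is a minimal sketch of a forward-strand ORF scan. It is not the authors' pipeline: the function name `find_orfs`, the `min_codons` threshold and the ATG-to-stop definition are all assumptions chosen for illustration, and the study's actual ORF criteria may differ.

```python
# Illustrative sketch only; names and thresholds are hypothetical,
# not taken from the study.
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(transcript, min_codons=10):
    """Scan the three forward reading frames for ATG...stop ORFs.

    Returns a list of (start, end, frame) tuples, where start is the
    0-based index of the ATG, end is the index just past the stop
    codon, and frame is 0, 1 or 2. Only the first ATG after the
    previous stop is used as the start of each ORF.
    """
    seq = transcript.upper()
    orfs = []
    for frame in range(3):
        start = None
        # Step through complete codons in this reading frame.
        for pos in range(frame, len(seq) - 2, 3):
            codon = seq[pos:pos + 3]
            if start is None and codon == "ATG":
                start = pos
            elif start is not None and codon in STOP_CODONS:
                # Codon count includes the ATG but excludes the stop codon.
                n_codons = (pos - start) // 3
                if n_codons >= min_codons:
                    orfs.append((start, pos + 3, frame))
                start = None
    return orfs
```

Calling `find_orfs(sequence, min_codons=3)` on a transcript string returns the coordinates of every qualifying ORF. A real analysis would also scan the reverse strand and would compare the resulting ORFs across species to assign them an age, which this sketch does not attempt.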
The datasets used and/or analysed during the current study are available at https://doi.org/10.6084/m9.figshare.6225563.v1.
J.F.S. was supported by an HFSP grant to E.B.-B. We thank D. Tautz and R. Neme for input into the initial study design and A. Lange for valuable feedback on the manuscript.
The authors declare no competing interests.
About this article
Cite this article
Schmitz, J.F., Ullrich, K.K. & Bornberg-Bauer, E. Incipient de novo genes can evolve from frozen accidents that escaped rapid transcript turnover. Nat Ecol Evol 2, 1626–1632 (2018). https://doi.org/10.1038/s41559-018-0639-7