Novel protein-coding genes can arise either through re-organization of pre-existing genes or de novo [1, 2]. Processes involving re-organization of pre-existing genes, notably after gene duplication, have been extensively described [1, 2]. In contrast, de novo gene birth remains poorly understood, mainly because translation of sequences devoid of genes, or ‘non-genic’ sequences, is expected to produce insignificant polypeptides rather than proteins with specific biological functions [1, 3, 4, 5, 6]. Here we formalize an evolutionary model according to which functional genes evolve de novo through transitory proto-genes [4] generated by widespread translational activity in non-genic sequences. Testing this model at the genome scale in Saccharomyces cerevisiae, we detect translation of hundreds of short species-specific open reading frames (ORFs) located in non-genic sequences. These translation events seem to provide adaptive potential [7], as suggested by their differential regulation upon stress and by signatures of retention by natural selection. In line with our model, we establish that S. cerevisiae ORFs can be placed within an evolutionary continuum ranging from non-genic sequences to genes. We identify ~1,900 candidate proto-genes among S. cerevisiae ORFs and find that de novo gene birth from such a reservoir may be more prevalent than sporadic gene duplication. Our work illustrates that evolution exploits seemingly dispensable sequences to generate adaptive functional innovation.
- The evolutionary origin of orphan genes. Nature Rev. Genet. 12, 692–702 (2011)
- Origins, evolution, and phenotypic impact of new genes. Genome Res. 20, 1313–1326 (2010)
- Evolution and tinkering. Science 196, 1161–1166 (1977)
- Darwinian alchemy: human genes from noncoding DNA. Genome Res. 19, 1693–1695 (2009)
- More than just orphans: are taxonomically-restricted genes important in evolution? Trends Genet. 25, 404–413 (2009)
- Putatively noncoding transcripts show extensive association with ribosomes. Genome Biol. Evol. 3, 1245–1252 (2011)
- Protein homeostasis and the phenotypic manifestation of genetic diversity: principles and mechanisms. Annu. Rev. Genet. 44, 189–216 (2010)
- De novo origination of a new protein-coding gene in Saccharomyces cerevisiae. Genetics 179, 487–496 (2008)
- De novo origin of human protein-coding genes. PLoS Genet. 7, e1002379 (2011)
- Identifying and quantifying orphan protein sequences in fungi. J. Mol. Biol. 396, 396–405 (2010)
- The relationship of protein conservation and sequence length. BMC Evol. Biol. 2, 20 (2002)
- The universal distribution of evolutionary rates of genes and distinct characteristics of eukaryotic genes of different apparent ages. Proc. Natl Acad. Sci. USA 106, 7273–7280 (2009)
- Relaxed purifying selection and possibly high rate of adaptation in primate lineage-specific genes. Genome Biol. Evol. 2, 393–409 (2010)
- The ambiguous boundary between genes and pseudogenes: the dead rise up, or do they? Trends Genet. 23, 219–224 (2007)
- The complete DNA sequence of yeast chromosome III. Nature 357, 38–46 (1992)
- Saccharomyces cerevisiae S288C genome annotation: a working hypothesis. Yeast 23, 857–865 (2006)
- The transcriptional landscape of the yeast genome defined by RNA sequencing. Science 320, 1344–1349 (2008)
- Large-scale exploration of growth inhibition caused by overexpression of genomic fragments in Saccharomyces cerevisiae. Genome Biol. 5, R72 (2004)
- High-resolution view of the yeast meiotic program revealed by ribosome profiling. Science 335, 552–557 (2012)
- Revisiting the Saccharomyces cerevisiae predicted ORFeome. Genome Res. 18, 1294–1303 (2008)
- Analysis of the yeast transcriptome with structural and functional categories: characterizing highly expressed proteins. Nucleic Acids Res. 28, 1481–1488 (2000)
- The conversion of 3′ UTRs into coding regions. Mol. Biol. Evol. 24, 457–464 (2007)
- Codon usage is associated with the evolutionary age of genes in metazoan genomes. BMC Evol. Biol. 9, 285 (2009)
- Composition bias and the origin of ORFan genes. Bioinformatics 26, 996–999 (2010)
- Genome-wide analysis in vivo of translation with nucleotide resolution using ribosome profiling. Science 324, 218–223 (2009)
- Young proteins experience more variable selection pressures than old proteins. Genome Res. 20, 1574–1581 (2010)
- Very low gene duplication rate in the yeast genome. Science 306, 1367–1370 (2004)
- Cryptic genetic variation promotes rapid evolutionary adaptation in an RNA enzyme. Nature 474, 92–95 (2011)
- Highly expressed genes in yeast evolve slowly. Genetics 158, 927–931 (2001)
- A single determinant dominates the rate of yeast protein evolution. Mol. Biol. Evol. 23, 327–337 (2006)
- Supplementary Information (11.1M)
This file contains Supplementary Figures 1-8, Supplementary Methods, Supplementary Tables 1-4 and additional references.