Novel protein-coding genes can arise either through re-organization of pre-existing genes or de novo [1,2]. Processes involving re-organization of pre-existing genes, notably after gene duplication, have been extensively described [1,2]. In contrast, de novo gene birth remains poorly understood, mainly because translation of sequences devoid of genes, or ‘non-genic’ sequences, is expected to produce insignificant polypeptides rather than proteins with specific biological functions [1,3,4,5,6]. Here we formalize an evolutionary model according to which functional genes evolve de novo through transitory proto-genes [4] generated by widespread translational activity in non-genic sequences. Testing this model at the genome scale in Saccharomyces cerevisiae, we detect translation of hundreds of short species-specific open reading frames (ORFs) located in non-genic sequences. These translation events seem to provide adaptive potential [7], as suggested by their differential regulation upon stress and by signatures of retention by natural selection. In line with our model, we establish that S. cerevisiae ORFs can be placed within an evolutionary continuum ranging from non-genic sequences to genes. We identify 1,900 candidate proto-genes among S. cerevisiae ORFs and find that de novo gene birth from such a reservoir may be more prevalent than sporadic gene duplication. Our work illustrates that evolution exploits seemingly dispensable sequences to generate adaptive functional innovation.

References

  1. The evolutionary origin of orphan genes. Nature Rev. Genet. 12, 692–702 (2011)
  2. Origins, evolution, and phenotypic impact of new genes. Genome Res. 20, 1313–1326 (2010)
  3. Evolution and tinkering. Science 196, 1161–1166 (1977)
  4. Darwinian alchemy: human genes from noncoding DNA. Genome Res. 19, 1693–1695 (2009)
  5. More than just orphans: are taxonomically-restricted genes important in evolution? Trends Genet. 25, 404–413 (2009)
  6. Putatively noncoding transcripts show extensive association with ribosomes. Genome Biol. Evol. 3, 1245–1252 (2011)
  7. Protein homeostasis and the phenotypic manifestation of genetic diversity: principles and mechanisms. Annu. Rev. Genet. 44, 189–216 (2010)
  8. De novo origination of a new protein-coding gene in Saccharomyces cerevisiae. Genetics 179, 487–496 (2008)
  9. De novo origin of human protein-coding genes. PLoS Genet. 7, e1002379 (2011)
  10. Identifying and quantifying orphan protein sequences in fungi. J. Mol. Biol. 396, 396–405 (2010)
  11. The relationship of protein conservation and sequence length. BMC Evol. Biol. 2, 20 (2002)
  12. The universal distribution of evolutionary rates of genes and distinct characteristics of eukaryotic genes of different apparent ages. Proc. Natl Acad. Sci. USA 106, 7273–7280 (2009)
  13. Relaxed purifying selection and possibly high rate of adaptation in primate lineage-specific genes. Genome Biol. Evol. 2, 393–409 (2010)
  14. The ambiguous boundary between genes and pseudogenes: the dead rise up, or do they? Trends Genet. 23, 219–224 (2007)
  15. The complete DNA sequence of yeast chromosome III. Nature 357, 38–46 (1992)
  16. Saccharomyces cerevisiae S288C genome annotation: a working hypothesis. Yeast 23, 857–865 (2006)
  17. The transcriptional landscape of the yeast genome defined by RNA sequencing. Science 320, 1344–1349 (2008)
  18. Large-scale exploration of growth inhibition caused by overexpression of genomic fragments in Saccharomyces cerevisiae. Genome Biol. 5, R72 (2004)
  19. High-resolution view of the yeast meiotic program revealed by ribosome profiling. Science 335, 552–557 (2012)
  20. Revisiting the Saccharomyces cerevisiae predicted ORFeome. Genome Res. 18, 1294–1303 (2008)
  21. Analysis of the yeast transcriptome with structural and functional categories: characterizing highly expressed proteins. Nucleic Acids Res. 28, 1481–1488 (2000)
  22. The conversion of 3′ UTRs into coding regions. Mol. Biol. Evol. 24, 457–464 (2007)
  23. Codon usage is associated with the evolutionary age of genes in metazoan genomes. BMC Evol. Biol. 9, 285 (2009)
  24. Composition bias and the origin of ORFan genes. Bioinformatics 26, 996–999 (2010)
  25. Genome-wide analysis in vivo of translation with nucleotide resolution using ribosome profiling. Science 324, 218–223 (2009)
  26. Young proteins experience more variable selection pressures than old proteins. Genome Res. 20, 1574–1581 (2010)
  27. Very low gene duplication rate in the yeast genome. Science 306, 1367–1370 (2004)
  28. Cryptic genetic variation promotes rapid evolutionary adaptation in an RNA enzyme. Nature 474, 92–95 (2011)
  29. Highly expressed genes in yeast evolve slowly. Genetics 158, 927–931 (2001)
  30. A single determinant dominates the rate of yeast protein evolution. Mol. Biol. Evol. 23, 327–337 (2006)

Acknowledgements

We thank L. Duret, E. Levy, J. Vandenhaute, Q. Li, H. Yu, P. Braun, M. Dreze, C. Foo, M. Mann, N. Kulak, J. Cox, C. Maire and S. Jhavery-Schneider as well as members of the Center for Cancer Systems Biology (CCSB), in particular A. Dricot-Ziter, A. MacWilliams, F. Roth, Y. Jacob and D. Hill for discussions and proofreading. A.R. was supported by a National Institutes of Health Pioneer Award, a Career Award at the Scientific Interface from the Burroughs Wellcome Fund and the Howard Hughes Medical Institute (HHMI). I.W. is an HHMI fellow of the Damon Runyon Cancer Research Institute. G.A.B. was supported by American Cancer Society Postdoctoral fellowship 117945-PF-09-136-01-RMC. M.V. is a Chercheur Qualifié Honoraire from the Fonds de la Recherche Scientifique (FRS-FNRS, Wallonia-Brussels Federation, Belgium). This work was supported by the grant R01-HG006061 from the National Human Genome Research Institute awarded to M.V.

Author information

Author notes

    • Nicolas Simonis

    Present address: Laboratoire de Bioinformatique des Génomes et des Réseaux (BiGRe), Campus Plaine, Free University of Brussels, 1050 Brussels, Wallonia-Brussels Federation, Belgium.


Affiliations

  1. Center for Cancer Systems Biology (CCSB) and Department of Cancer Biology, Dana-Farber Cancer Institute, Boston, Massachusetts 02215, USA

    • Anne-Ruxandra Carvunis, Thomas Rolland, Michael A. Calderwood, Nicolas Simonis, Benoit Charloteaux, Justin Barbette, Balaji Santhanam, Michael E. Cusick & Marc Vidal

  2. Department of Genetics, Harvard Medical School, Boston, Massachusetts 02115, USA

    • Anne-Ruxandra Carvunis, Thomas Rolland, Michael A. Calderwood, Nicolas Simonis, Benoit Charloteaux, Justin Barbette, Balaji Santhanam, Michael E. Cusick & Marc Vidal

  3. UJF-Grenoble 1/CNRS/TIMC-IMAG UMR 5525, Computational and Mathematical Biology Group, Grenoble F-38041, France

    • Anne-Ruxandra Carvunis & Nicolas Thierry-Mieg

  4. Department of Systems Biology, Harvard Medical School, Boston, Massachusetts 02115, USA

    • Ilan Wapinski

  5. Center for International Development and Harvard University, Cambridge, Massachusetts 02138, USA

    • Muhammed A. Yildirim

  6. Unit of Animal Genomics, GIGA-R and Faculty of Veterinary Medicine, University of Liege, 4000 Liege, Wallonia-Brussels Federation, Belgium

    • Benoit Charloteaux

  7. The MIT Media Lab, Massachusetts Institute of Technology, Cambridge, Massachusetts 02142, USA

    • César A. Hidalgo

  8. Howard Hughes Medical Institute, Department of Cellular and Molecular Pharmacology, University of California, San Francisco, and California Institute for Quantitative Biosciences, San Francisco, California 94158, USA

    • Gloria A. Brar & Jonathan S. Weissman

  9. Broad Institute of MIT and Harvard, Cambridge, Massachusetts 02142, USA

    • Aviv Regev

  10. Howard Hughes Medical Institute, Department of Biology, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, USA

    • Aviv Regev


Contributions

A.-R.C., I.W., M.E.C. and M.V. conceived the project. A.-R.C. led the project and performed most of the analyses. T.R. evaluated cross-species transfer events, optimized the ribosome footprint analysis pipeline and assisted in other analyses. I.W. designed the conservation level tool and calculated most of the purifying selection statistics. M.A.C., C.A.H., A.R. and N.T.-M. advised on the research. M.A.Y. aligned the sequencing reads. B.S. predicted disordered and transmembrane regions and assisted in the cross-species transfer analyses. N.S. and B.C. assisted in analyses. G.A.B. and J.S.W. shared their expertise in ribosome footprinting data analysis and provided the meiosis ribosome footprinting raw and processed data. A.-R.C., T.R., M.E.C. and M.V. designed the figures. All authors contributed to writing the manuscript.

Competing interests

The authors declare no competing financial interests.

Corresponding author

Correspondence to Marc Vidal.

Supplementary information

PDF files

  1. Supplementary Information

    This file contains Supplementary Figures 1–8, Supplementary Methods, Supplementary Tables 1–4 and additional references.
