Review Article

The evolutionary origin of orphan genes

Nature Reviews Genetics volume 12, pages 692–702 (2011)

Abstract

Gene evolution has long been thought to be primarily driven by duplication and rearrangement mechanisms. However, every evolutionary lineage harbours orphan genes that lack homologues in other lineages and whose evolutionary origin is only poorly understood. Orphan genes might arise from duplication and rearrangement processes followed by fast divergence; however, de novo evolution out of non-coding genomic regions is emerging as an important additional mechanism. This process appears to provide raw material continuously for the evolution of new gene functions, which can become relevant for lineage-specific adaptations.

Key points

  • Current models of gene evolution have concentrated on mechanisms that are mediated by duplication and by transposable elements, but these models have not yet fully evaluated the possibility of de novo evolution.

  • Orphan genes with no homology to genes in other evolutionary lineages occur in all genomes and are candidates for the de novo evolution of genes.

  • Several examples of de novo evolved genes have now been found, and the emergence process can involve a phase in which a gene functions as a non-coding RNA.

  • Orphan gene emergence rates are elevated during adaptive radiations.

  • Orphan gene emergence over time appears to be continuously high, but most of these newly emerged genes are likely to be subsequently lost.

  • It is still an open question whether newly evolved genes can form new stable protein domains or whether they act as intrinsically unstructured proteins.

  • The emergence of new genes may contribute to evolutionary novelties to the same degree as does the emergence of new regulatory interactions.

Acknowledgements

We thank former and current colleagues and laboratory members, as well as three anonymous reviewers who have contributed to the ideas presented here. We thank R. Neme, M. Matejčić and M. S. Šestak for providing phylostratigraphic maps. The work of the authors is supported by institutional funds of the Max-Planck Society, the Ruđer Bošković Institute, the Zoological Institute of the Christian-Albrechts-University Kiel and the Unity Through Knowledge Fund (grant number 49).

Author information

Affiliations

  1. Max-Planck Institut für Evolutionsbiologie, August-Thienemannstrasse 2, 24306 Plön, Germany.

    • Diethard Tautz
  2. Zoological Institute, Christian Albrechts University Kiel, 24105 Kiel, Germany.

    • Tomislav Domazet-Lošo
  3. Laboratory of Evolutionary Genetics, Division of Molecular Biology, Ruđer Bošković Institute, Bijenička cesta 54, P.P. 180, 10002 Zagreb, Croatia.

    • Tomislav Domazet-Lošo

Competing interests

The authors declare no competing financial interests.

Corresponding author

Correspondence to Diethard Tautz.

Glossary

Orphan genes

Genes that lack homologues in other lineages — that is, they cannot be linked by overall similarity or shared domains to genes or gene families known from other organisms.

Purifying selection

The removal of deleterious mutations through natural selection.

Basic Local Alignment Search Tool

(BLAST). A program that compares nucleotide or protein sequences to sequence databases and calculates the statistical significance of matches.

Protostomes

The animal superphylum that includes nematodes (for example, Caenorhabditis elegans) and arthropods (for example, Drosophila melanogaster).

Deuterostomes

The animal superphylum that includes vertebrates (for example, zebrafish) and mammals (for example, humans).

Founder genes

The phylogenetically oldest genes forming the basis of a new gene lineage, new protein domain or new gene family. The origin of founder genes is expected to correlate with evolution of functional novelty.

Phylostratigraphy

A systematic procedure to identify the origin of genes within a comparative framework of fully sequenced genomes at multiple levels of the phylogenetic hierarchy (the phylostrata).

Horizontal gene transfer

(HGT). The exchange of genes between different evolutionary lineages.

Retrotransposons

Transposons that require an RNA intermediate for their transposition.

Sexual selection

A form of selection that arises from the interaction between the sexes and their gametes rather than from interactions with the environment.

Positive selection

The increase in frequency and fixation of alleles that contribute to the fitness of an organism.

Selective sweeps

The reduction or elimination of nucleotide variation in the genomic region that surrounds a positively selected new mutation.

Phylostratum

A node in the phylogenetic hierarchy that is represented by one or more fully sequenced genomes and where a set of genes from an organism coalesce to founder genes.

Synteny

Conserved genomic arrangements of genes in a linear order.
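
The phylostratigraphy and phylostratum entries above describe a procedure that can be illustrated with a small sketch. The Python below is only a minimal illustration under stated assumptions: the list of phylostrata, the gene names and the hit sets are all hypothetical, and homology detection (for example, with BLAST) is assumed to have been done beforehand. Each gene is assigned to the oldest phylostratum in which a homologue was found, and a gene with no hits at any level is placed in the youngest stratum, where it would count as an orphan.

```python
from typing import Dict, List, Set

# Hypothetical phylostrata for a focal species, ordered from oldest to youngest.
PHYLOSTRATA: List[str] = ["cellular organisms", "Eukaryota", "Metazoa", "Drosophila"]


def assign_phylostratum(hit_strata: Set[str], strata: List[str]) -> str:
    """Return the oldest phylostratum in which a homologue was detected.

    A gene without any detected homologue falls into the youngest stratum,
    i.e. it is treated as an orphan of the focal lineage.
    """
    for stratum in strata:  # iterate oldest first
        if stratum in hit_strata:
            return stratum
    return strata[-1]


def phylostratigraphic_map(
    hits: Dict[str, Set[str]], strata: List[str]
) -> Dict[str, List[str]]:
    """Group genes by the phylostratum assigned to them."""
    result: Dict[str, List[str]] = {s: [] for s in strata}
    for gene, hit_strata in sorted(hits.items()):
        result[assign_phylostratum(hit_strata, strata)].append(gene)
    return result


# Hypothetical input: for each gene, the strata in which a homologue was found.
example_hits: Dict[str, Set[str]] = {
    "geneA": {"cellular organisms", "Eukaryota", "Metazoa", "Drosophila"},
    "geneB": {"Metazoa", "Drosophila"},
    "geneC": set(),  # no homologue detected anywhere: orphan candidate
}

pmap = phylostratigraphic_map(example_hits, PHYLOSTRATA)
for stratum in PHYLOSTRATA:
    print(stratum, pmap[stratum])
```

In this toy input, geneA traces back to the oldest stratum, geneB to the metazoan stratum, and geneC remains in the youngest stratum. In practice the result depends strongly on the sensitivity of the homology search, so an orphan assignment from such a map is better read as a hypothesis than as proof of de novo origin.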

About this article

DOI

https://doi.org/10.1038/nrg3053
