Open questions in the study of de novo genes: what, how and why

Journal name:
Nature Reviews Genetics


The study of de novo protein-coding genes is maturing from the ad hoc reporting of individual cases to the systematic analysis of extensive genomic data from several species. We identify three key challenges for this emerging field: understanding how best to identify de novo genes, how they arise and why they spread. We highlight the intellectual challenges of understanding how a de novo gene becomes integrated into pre-existing functions and becomes essential. We suggest that, as with protein sequence evolution, antagonistic co-evolution may be key to de novo gene evolution, particularly for new essential genes and new cancer-associated genes.

At a glance


  1. Figure 1: A systematic approach to the classification of novel genes.

    This classification is based on tracing the evolution of a locus to the most recent common ancestor in which it can be inferred that there was no expressed open reading frame (ORF) at the orthologous location, and then inspecting the evolutionary steps in the origin of the gene. This requires abundant, closely related, high-quality genomes. Evidence of the presence of a translated ORF, even a relatively short one, can be sought through ribosome protection assays (refs 33, 121, 122). The defining characteristic of a de novo gene is that it evolved from previously non-coding sequence; therefore, it must be possible to identify that sequence; otherwise, the classification must remain ambiguous. ORFs for which an ORF-less orthologous location cannot be found cannot be defined as de novo or new, as we cannot exclude the possibility that they are ancient but fast evolving. The approach using sequence similarity alone can, at best, suggest that the new gene bears no resemblance to extant genes or horizontally transferred genes, thus implicating de novo origination by exclusion. If no attempt is made to reconstruct the ancestral state or if it is not practically possible to do so, these are classified as 'putatively de novo genes' and are not considered in this taxonomy. We propose a hierarchical classification of de novo genes based on whether or not there is newly inserted sequence in the locus and whether or not that sequence had been previously under natural selection: type Ia and type Ib are entirely derived from non-protein-coding sequence; type II contains a minority of sequence that was previously under selection (such as transposable elements or portions of a pre-existing gene), but that does not explain the function of the modern gene; and type III de novo genes are chimaeras of sequences previously under selection, but which contain some novel sequence not previously under selection for protein-coding function.
Type I genes are the most clean-cut, whereas the gene histories for type II and type III are progressively more convoluted. Real world complexity probably means that type III genes will be contested in some instances. For example, jingwei (jgw) has not previously been considered as a de novo gene, but it is a new gene that includes previously intronic (that is, non-protein-coding) sequence, and classifying it as a type III de novo gene acknowledges this mixed history. CLLU1, chronic lymphocytic leukaemia upregulated 1; D. melanogaster, Drosophila melanogaster; ESRG, embryonic stem cell related; H. sapiens, Homo sapiens; PBOV1, prostate and breast cancer overexpressed 1; S. cerevisiae, Saccharomyces cerevisiae.

  2. Figure 2: Validation of novel genes.

    There are significant challenges in verifying lineage-specific de novo genes owing to the unavailability of some tools and the uninformative nature of others. Well-established genes receive support from sequence similarity across many genomes and evolutionary patterns of selection, and a certain fraction of them will show a phenotype in a knockout or knockdown experiment. For most of these tests, species-specific genes are indistinguishable from spurious expression in the genome. Functional characterization through observation of knockdown or knockout phenotypes is one way to distinguish genuine novel genes from spurious expression. Note that the distinction between 'taxonomically restricted' and 'species-specific' need not be absolute in that the latter can become the former when more closely related genomes are added. In the lower panel, ticks indicate observed and crosses indicate not observed. NA, not available; ORF, open reading frame.

  3. Figure 3: Features of genome anatomy that alter the likelihood of novel gene origination.

    a | A pre-existing gene (dark blue box) may facilitate the origin of a de novo gene (light blue box) through the re-use of the promoter, perhaps in a bidirectional conformation. b | A pre-existing gene may also contribute to novel gene origination through transcriptional read-through, which can occur in as many as 11% of cases. c | Start (ATG) and stop (TAA, TAG and TGA) codons are AT-rich and are thus less frequent in GC-rich regions of the genome. This means that the distance between any start and stop codon is longer in GC-rich regions of the genome, resulting in longer random open reading frames (ORFs; light blue boxes). Conversely, although stop codons are common in GC-poor regions, so are start codons; thus, random ORFs will be short but so will the inter-ORF distances, resulting in high ORF density. d | Transposable elements (yellow boxes) can facilitate the origin of new genes by providing regulatory sequences (arrows) or by contributing to the ORF of the novel gene (light blue boxes). e | Mutation of a transcription factor (TF) may alter its DNA-binding specificity, thus activating expression at many previously unexpressed loci (light blue boxes) with or without affecting the expression of pre-existing genes (dark blue boxes).
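
The argument in panel c of Figure 3 can be made concrete with a small calculation. The sketch below is illustrative only and is not taken from the article: it assumes that bases are drawn independently, that A and T are equally frequent, and that G and C are equally frequent, and it ignores dinucleotide biases and other features of real genomes. The function names are hypothetical.

```python
def stop_codon_probability(gc: float) -> float:
    """Chance that a random codon is TAA, TAG or TGA at GC fraction `gc`.

    Assumes independent bases with P(A) = P(T) and P(G) = P(C).
    """
    if not 0.0 <= gc <= 1.0:
        raise ValueError("gc must lie between 0 and 1")
    at = (1.0 - gc) / 2.0  # probability of A (and of T)
    g = gc / 2.0           # probability of G (and of C)
    # TAA contributes at^3; TAG and TGA each contribute at^2 * g.
    return at ** 3 + 2.0 * at ** 2 * g


def start_codon_probability(gc: float) -> float:
    """Chance that a random codon is ATG under the same assumptions."""
    if not 0.0 <= gc <= 1.0:
        raise ValueError("gc must lie between 0 and 1")
    at = (1.0 - gc) / 2.0
    g = gc / 2.0
    return at * at * g


def expected_orf_codons(gc: float) -> float:
    """Mean number of sense codons read before the first in-frame stop.

    Treats each codon as an independent trial, so the run length is
    geometric with mean (1 - p) / p, where p is the stop probability.
    """
    p = stop_codon_probability(gc)
    if p == 0.0:
        return float("inf")  # no A or T at all, so no stop codon is possible
    return (1.0 - p) / p


if __name__ == "__main__":
    for gc in (0.3, 0.5, 0.7):
        print(
            f"GC={gc:.1f}  P(stop)={stop_codon_probability(gc):.4f}  "
            f"P(ATG)={start_codon_probability(gc):.4f}  "
            f"mean ORF length={expected_orf_codons(gc):.1f} codons"
        )
```

Under these simplifying assumptions the stop-codon probability falls steadily as GC content rises, so the expected random ORF lengthens, which is the qualitative pattern the caption describes.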

Change history

Corrected online 27 July 2016
In Table 1 of the original version of this article the gene name NCYM was incorrectly written as NYCM. This has now been corrected. The editors apologize for this error.


  1. Levine, M. T., Jones, C. D., Kern, A. D., Lindfors, H. A. & Begun, D. J. Novel genes derived from noncoding DNA in Drosophila melanogaster are frequently X-linked and exhibit testis-biased expression. Proc. Natl Acad. Sci. USA 103, 9935–9939 (2006).
  2. Begun, D. J., Lindfors, H. A., Thompson, M. E. & Holloway, A. K. Recently evolved genes identified from Drosophila yakuba and D. erecta accessory gland expressed sequence tags. Genetics 172, 1675–1681 (2006).
  3. Xiao, W. et al. A rice gene of de novo origin negatively regulates pathogen-induced defense response. PLoS ONE 4, e4603 (2009).
  4. Knowles, D. G. & McLysaght, A. Recent de novo origin of human protein-coding genes. Genome Res. 19, 1752–1759 (2009).
  5. Li, L. et al. Identification of the novel protein QQS as a component of the starch metabolic network in Arabidopsis leaves. Plant J. 58, 485–498 (2009).
  6. Cai, J., Zhao, R., Jiang, H. & Wang, W. De novo origination of a new protein-coding gene in Saccharomyces cerevisiae. Genetics 179, 487–496 (2008).
  7. Zhou, Q. & Wang, W. On the origin and evolution of new genes – a genomic and experimental perspective. J. Genet. Genom. 35, 639–648 (2008).
  8. Toll-Riera, M. et al. Origin of primate orphan genes: a comparative genomics approach. Mol. Biol. Evol. 26, 603–612 (2009).
  9. Wu, D.-D., Irwin, D. M. & Zhang, Y.-P. De novo origin of human protein-coding genes. PLoS Genet. 7, e1002379 (2011).
  10. Tautz, D. & Domazet-Loso, T. The evolutionary origin of orphan genes. Nat. Rev. Genet. 12, 692–702 (2011).
  11. McLysaght, A. & Guerzoni, D. New genes from non-coding sequence: the role of de novo protein-coding genes in eukaryotic evolutionary innovation. Phil. Trans. R. Soc. B 370, 20140332 (2015).
  12. Schlötterer, C. Genes from scratch – the evolutionary fate of de novo genes. Trends Genet. 31, 215–219 (2015).
  13. Guerzoni, D. & McLysaght, A. De novo genes arise at a slow but steady rate along the primate lineage and have been subject to incomplete lineage sorting. Genome Biol. Evol. 8, 1222–1232 (2016).
  14. Domazet-Loso, T., Brajković, J. & Tautz, D. A phylostratigraphy approach to uncover the genomic history of major adaptations in metazoan lineages. Trends Genet. 23, 533–539 (2007).
  15. Wolfe, K. Evolutionary genomics: yeasts accelerate beyond BLAST. Curr. Biol. 14, R392–R394 (2004).
  16. Elhaik, E., Sabath, N. & Graur, D. The “inverse relationship between evolutionary rate and age of mammalian genes” is an artifact of increased genetic distance with rate of evolution and time of divergence. Mol. Biol. Evol. 23, 1–3 (2006).
  17. Moyers, B. A. & Zhang, J. Phylostratigraphic bias creates spurious patterns of genome evolution. Mol. Biol. Evol. 32, 258–267 (2015).
  18. Moyers, B. A. & Zhang, J. Evaluating phylostratigraphic evidence for widespread de novo gene birth in genome evolution. Mol. Biol. Evol. 33, 1245–1256 (2016).
  19. Carvunis, A.-R. et al. Proto-genes and de novo gene birth. Nature 487, 370–374 (2012).
  20. Neme, R. & Tautz, D. Phylogenetic patterns of emergence of new genes support a model of frequent de novo evolution. BMC Genomics 14, 117 (2013).
  21. Alba, M. M. & Castresana, J. On homology searches by protein Blast and the characterization of the age of genes. BMC Evol. Biol. 7, 53 (2007).
  22. Alba, M. M. & Castresana, J. Inverse relationship between evolutionary rate and age of mammalian genes. Mol. Biol. Evol. 22, 598–606 (2005).
  23. Domazet-Loso, T. & Tautz, D. An ancient evolutionary origin of genes associated with human genetic diseases. Mol. Biol. Evol. 25, 2699–2707 (2008).
  24. Smith, N. G. C. & Eyre-Walker, A. Human disease genes: patterns and predictions. Gene 318, 169–175 (2003).
  25. ENCODE Project Consortium. An integrated encyclopedia of DNA elements in the human genome. Nature 489, 57–74 (2012).
  26. Hurst, L. D. Open questions: a logic (or lack thereof) of genome organization. BMC Biol. 11, 58 (2013).
  27. Graur, D. et al. On the immortality of television sets: 'function' in the human genome according to the evolution-free gospel of ENCODE. Genome Biol. Evol. 5, 578–590 (2013).
  28. Doolittle, W. F. Is junk DNA bunk? A critique of ENCODE. Proc. Natl Acad. Sci. USA 110, 5294–5300 (2013).
  29. Jaillon, O. et al. Translational control of intron splicing in eukaryotes. Nature 451, 359–362 (2008).
  30. Cusack, B. P., Arndt, P. F., Duret, L. & Roest Crollius, H. Preventing dangerous nonsense: selection for robustness to transcriptional error in human genes. PLoS Genet. 7, e1002276 (2011).
  31. Dewey, C. N., Rogozin, I. B. & Koonin, E. V. Compensatory relationship between splice sites and exonic splicing signals depending on the length of vertebrate introns. BMC Genomics 7, 311 (2006).
  32. Schüler, A., Ghanbarian, A. T. & Hurst, L. D. Purifying selection on splice-related motifs, not expression level nor RNA folding, explains nearly all constraint on human lincRNAs. Mol. Biol. Evol. 31, 3164–3183 (2014).
  33. Ruiz-Orera, J., Messeguer, X., Subirana, J. A. & Alba, M. M. Long non-coding RNAs as a source of new peptides. eLife 3, e03523 (2014).
  34. Chen, J.-Y. et al. Emergence, retention and selection: a trilogy of origination for functional de novo proteins from ancestral lncRNAs in primates. PLoS Genet. 11, e1005391 (2015).
  35. Zhao, L., Saelao, P., Jones, C. D. & Begun, D. J. Origin and spread of de novo genes in Drosophila melanogaster populations. Science 343, 769–772 (2014).
  36. Galtier, N., Duret, L., Glémin, S. & Ranwez, V. GC-biased gene conversion promotes the fixation of deleterious amino acid changes in primates. Trends Genet. 25, 1–5 (2009).
  37. Blomen, V. A. et al. Gene essentiality and synthetic lethality in haploid human cells. Science 350, 1092–1096 (2015).
  38. Wang, T. et al. Identification and characterization of essential genes in the human genome. Science 350, 1096–1101 (2015).
  39. Wang, J. et al. Primate-specific endogenous retrovirus-driven transcription defines naive-like stem cells. Nature 516, 405–409 (2014).
  40. Lavialle, C. et al. Paleovirology of 'syncytins', retroviral env genes exapted for a role in placentation. Phil. Trans. R. Soc. B 368, 20120507 (2013).
  41. Li, D., Yan, Z., Lu, L., Jiang, H. & Wang, W. Pleiotropy of the de novo-originated gene MDF1. Sci. Rep. 4, 7280 (2014).
  42. Li, D. et al. A de novo originated gene depresses budding yeast mating pathway and is repressed by the protein encoded by its antisense strand. Cell Res. 20, 408–420 (2010).
  43. Ghysen, A. Debatable issues. Interview with L Wolpert and A García-Bellido. Int. J. Dev. Biol. 42, 511–518 (1998).
  44. Tautz, D. A genetic uncertainty problem. Trends Genet. 16, 475–477 (2000).
  45. Chalfin, L. et al. Mapping ecologically relevant social behaviours by gene knockout in wild mice. Nat. Commun. 5, 4569 (2014).
  46. Xu, J. & Zhang, J. Are human translated pseudogenes functional? Mol. Biol. Evol. 33, 755–760 (2016).
  47. Chen, S., Zhang, Y. E. & Long, M. New genes in Drosophila quickly become essential. Science 330, 1682–1685 (2010).
  48. Bird, A. P. Gene number, noise reduction and biological complexity. Trends Genet. 11, 94–100 (1995).
  49. Hurst, L. D. Evolutionary genomics and the reach of selection. J. Biol. 8, 12 (2009).
  50. Prestridge, D. S. & Burks, C. The density of transcriptional elements in promoter and non-promoter sequences. Hum. Mol. Genet. 2, 1449–1453 (1993).
  51. Hoekstra, H. E. & Coyne, J. A. The locus of evolution: evo devo and the genetics of adaptation. Evolution 61, 995–1016 (2007).
  52. Begun, D. J., Lindfors, H. A., Kern, A. D. & Jones, C. D. Evidence for de novo evolution of testis-expressed genes in the Drosophila yakuba/Drosophila erecta clade. Genetics 176, 1131–1137 (2007).
  53. Ebisuya, M., Yamamoto, T., Nakajima, M. & Nishida, E. Ripples from neighbouring transcription. Nat. Cell Biol. 10, 1106–1113 (2008).
  54. Siepel, A. Darwinian alchemy: human genes from noncoding DNA. Genome Res. 19, 1693–1695 (2009).
  55. Murphy, D. N. & McLysaght, A. De novo origin of protein-coding genes in murine rodents. PLoS ONE 7, e48650 (2012).
  56. Gotea, V., Petrykowska, H. M. & Elnitski, L. Bidirectional promoters as important drivers for the emergence of species-specific transcripts. PLoS ONE 8, e57323 (2013).
  57. Wu, X. & Sharp, P. A. Divergent transcription: a driving force for new gene origination? Cell 155, 990–996 (2013).
  58. Akiva, P. et al. Transcription-mediated gene fusion in the human genome. Genome Res. 16, 30–36 (2006).
  59. Parra, G. et al. Tandem chimerism as a means to increase protein complexity in the human genome. Genome Res. 16, 37–44 (2006).
  60. Nacu, S. et al. Deep RNA sequencing analysis of readthrough gene fusions in human prostate adenocarcinoma and reference samples. BMC Med. Genom. 4, 11 (2011).
  61. Ruiz-Orera, J. et al. Origins of de novo genes in human and chimpanzee. PLoS Genet. 11, e1005721 (2015).
  62. Neme, R. & Tautz, D. Fast turnover of genome transcription across evolutionary time exposes entire non-coding DNA to de novo gene emergence. eLife 5, e09977 (2016).
  63. Necsulea, A. & Kaessmann, H. Evolutionary dynamics of coding and non-coding transcriptomes. Nat. Rev. Genet. 15, 734–748 (2014).
  64. Warnecke, T., Huang, Y., Przytycka, T. M. & Hurst, L. D. Unique cost dynamics elucidate the role of frameshifting errors in promoting translational robustness. Genome Biol. Evol. 2, 636–645 (2010).
  65. Lercher, M. J., Urrutia, A. O., Pavlícek, A. & Hurst, L. D. A unification of mosaic structures in the human genome. Hum. Mol. Genet. 12, 2411–2415 (2003).
  66. Wang, J. et al. Sequence features and chromatin structure around the genomic regions bound by 119 human transcription factors. Genome Res. 22, 1798–1812 (2012).
  67. Wang, T. et al. Species-specific endogenous retroviruses shape the transcriptional network of the human tumor suppressor protein p53. Proc. Natl Acad. Sci. USA 104, 18613–18618 (2007).
  68. Gotea, V. & Makałowski, W. Do transposable elements really contribute to proteomes? Trends Genet. 22, 260–267 (2006).
  69. Thornburg, B. G., Gotea, V. & Makałowski, W. Transposable elements as a significant source of transcription regulating signals. Gene 365, 104–110 (2006).
  70. Göke, J. et al. Dynamic transcription of distinct classes of endogenous retroviral elements marks specific populations of early human embryonic cells. Cell Stem Cell 16, 135–141 (2015).
  71. Denli, A. M. et al. Primate-specific ORF0 contributes to retrotransposon-mediated diversity. Cell 163, 583–593 (2015).
  72. Wang, Y. et al. Endogenous miRNA sponge lincRNA-RoR regulates Oct4, Nanog, and Sox2 in human embryonic stem cell self-renewal. Dev. Cell 25, 69–80 (2013).
  73. Galagan, J. E. & Selker, E. U. RIP: the evolutionary cost of genome defense. Trends Genet. 20, 417–423 (2004).
  74. Xie, C. et al. Hominoid-specific de novo protein-coding genes originating from long non-coding RNAs. PLoS Genet. 8, e1002942 (2012).
  75. Palmieri, N., Kosiol, C. & Schlötterer, C. The life cycle of Drosophila orphan genes. eLife 3, e01311 (2014).
  76. Neme, R. & Tautz, D. Evolution: dynamics of de novo gene emergence. Curr. Biol. 24, R238–R240 (2014).
  77. Kamijyo, A., Yura, K. & Ogura, A. Distinct evolutionary rate in the eye field transcription factors found by estimation of ancestral protein structure. Gene 555, 73–79 (2015).
  78. Vaquerizas, J. M., Kummerfeld, S. K., Teichmann, S. A. & Luscombe, N. M. A census of human transcription factors: function, expression and evolution. Nat. Rev. Genet. 10, 252–263 (2009).
  79. Hayashi, Y., Sakata, H., Makino, Y., Urabe, I. & Yomo, T. Can an arbitrary sequence evolve towards acquiring a biological function? J. Mol. Evol. 56, 162–168 (2003).
  80. Zhang, W., Landback, P., Gschwend, A. R., Shen, B. & Long, M. New genes drive the evolution of gene interaction networks in the human and mouse genomes. Genome Biol. 16, 202 (2015).
  81. Lercher, M. J. & Pál, C. Integration of horizontally transferred genes into regulatory interaction networks takes many million years. Mol. Biol. Evol. 25, 559–567 (2008).
  82. Batada, N. N., Hurst, L. D. & Tyers, M. Evolutionary and physiological importance of hub proteins. PLoS Comp. Biol. 2, e88 (2006).
  83. Force, A. et al. Preservation of duplicate genes by complementary, degenerative mutations. Genetics 151, 1531–1545 (1999).
  84. Schoorlemmer, J., Pérez-Palacios, R., Climent, M., Guallar, D. & Muniesa, P. Regulation of mouse retroelement MuERV-L/MERVL expression by REX1 and epigenetic control of stem cell potency. Front. Oncol. 4, 14 (2014).
  85. Macfarlan, T. S. et al. Embryonic stem cell potency fluctuates with endogenous retrovirus activity. Nature 487, 57–63 (2012).
  86. Imakawa, K., Nakagawa, S. & Miyazawa, T. Baton pass hypothesis: successive incorporation of unconserved endogenous retroviral genes for placentation during mammalian evolution. Genes Cells 20, 771–788 (2015).
  87. Aakre, C. D. et al. Evolving new protein-protein interaction specificity through promiscuous intermediates. Cell 163, 594–606 (2015).
  88. Esnault, C., Cornelis, G., Heidmann, O. & Heidmann, T. Differential evolutionary fate of an ancestral primate endogenous retrovirus envelope gene, the EnvV syncytin, captured for a function in placentation. PLoS Genet. 9, e1003400 (2013).
  89. Cornelis, G. et al. Retroviral envelope syncytin capture in an ancestrally diverged mammalian clade for placentation in the primitive Afrotherian tenrecs. Proc. Natl Acad. Sci. USA 111, E4332–E4341 (2014).
  90. Cornelis, G. et al. Retroviral envelope gene captures and syncytin exaptation for placentation in marsupials. Proc. Natl Acad. Sci. USA 112, E487–E496 (2015).
  91. Cornelis, G. et al. Captured retroviral envelope syncytin gene associated with the unique placental structure of higher ruminants. Proc. Natl Acad. Sci. USA 110, E828–E837 (2013).
  92. Dupressoir, A., Lavialle, C. & Heidmann, T. From ancestral infectious retroviruses to bona fide cellular genes: role of the captured syncytins in placentation. Placenta 33, 663–671 (2012).
  93. Emera, D. et al. Convergent evolution of endometrial prolactin expression in primates, mice, and elephants through the independent recruitment of transposable elements. Mol. Biol. Evol. 29, 239–247 (2012).
  94. Maston, G. A. & Ruvolo, M. Chorionic gonadotropin has a recent origin within primates and an evolutionary history of selection. Mol. Biol. Evol. 19, 320–335 (2002).
  95. Ross, B. D. et al. Stepwise evolution of essential centromere function in a Drosophila neogene. Science 340, 1211–1214 (2013).
  96. Elliot, M. G. & Crespi, B. J. Phylogenetic evidence for early hemochorial placentation in eutheria. Placenta 30, 949–967 (2009).
  97. Elliot, M. G. & Crespi, B. J. Genetic recapitulation of human pre-eclampsia risk during convergent evolution of reduced placental invasiveness in eutherian mammals. Phil. Trans. R. Soc. B 370, 20140069 (2015).
  98. Izsvák, Z., Wang, J., Singh, M., Mager, D. L. & Hurst, L. D. Pluripotency and the endogenous retrovirus HERVH: conflict or serendipity? Bioessays 38, 109–117 (2016).
  99. Landmann, F., Orsi, G. A., Loppin, B. & Sullivan, W. Wolbachia-mediated cytoplasmic incompatibility is associated with impaired histone deposition in the male pronucleus. PLoS Pathog. 5, e1000343 (2009).
  100. Fine, P. E. On the dynamics of symbiote-dependent cytoplasmic incompatibility in culicine mosquitoes. J. Invertebr. Pathol. 31, 10–18 (1978).
  101. Merrill, C., Bayraktaroglu, L., Kusano, A. & Ganetzky, B. Truncated RanGAP encoded by the Segregation Distorter locus of Drosophila. Science 283, 1742–1745 (1999).
  102. Gerdes, K. et al. The hok killer gene family in gram-negative bacteria. New Biol. 2, 946–956 (1990).
  103. Hurst, L. D. scat+ is a selfish gene analogous to Medea of Tribolium castaneum. Cell 75, 407–408 (1993).
  104. Marshall, J. M. The toxin and antidote puzzle: new ways to control insect pest populations through manipulating inheritance. Bioeng. Bugs 2, 235–240 (2011).
  105. Chen, C.-H. et al. A synthetic maternal-effect selfish genetic element drives population replacement in Drosophila. Science 316, 597–600 (2007).
  106. Phadnis, N. & Orr, H. A. A single gene causes both male sterility and segregation distortion in Drosophila hybrids. Science 323, 376–379 (2009).
  107. Hurst, L. D. & Pomiankowski, A. Causes of sex ratio bias may account for unisexual sterility in hybrids: a new explanation of Haldane's rule and related phenomena. Genetics 128, 841–858 (1991).
  108. Nielsen, R. et al. A scan for positively selected genes in the genomes of humans and chimpanzees. PLoS Biol. 3, e170 (2005).
  109. Kosiol, C. et al. Patterns of positive selection in six mammalian genomes. PLoS Genet. 4, e1000144 (2008).
  110. Goriely, A. et al. Gain-of-function amino acid substitutions drive positive selection of FGFR2 mutations in human spermatogonia. Proc. Natl Acad. Sci. USA 102, 6051–6056 (2005).
  111. Suenaga, Y. et al. NCYM, a cis-antisense gene of MYCN, encodes a de novo evolved protein that inhibits GSK3β resulting in the stabilization of MYCN in human neuroblastomas. PLoS Genet. 10, e1003996 (2014).
  112. Samusik, N., Krukovskaya, L., Meln, I., Shilov, E. & Kozlov, A. P. PBOV1 is a human de novo gene with tumor-specific expression that is associated with a positive clinical outcome of cancer. PLoS ONE 8, e56162 (2013).
  113. Zendman, A. J. W., Ruiter, D. J. & Van Muijen, G. N. P. Cancer/testis-associated genes: identification, expression profile, and putative function. J. Cell. Physiol. 194, 272–288 (2003).
  114. Simpson, A. J. G., Caballero, O. L., Jungbluth, A., Chen, Y.-T. & Old, L. J. Cancer/testis antigens, gametogenesis and cancer. Nat. Rev. Cancer 5, 615–625 (2005).
  115. Hofmann, O. et al. Genome-wide analysis of cancer/testis gene expression. Proc. Natl Acad. Sci. USA 105, 20422–20427 (2008).
  116. Kohn, D. B., Sadelain, M. & Glorioso, J. C. Occurrence of leukaemia following gene therapy of X-linked SCID. Nat. Rev. Cancer 3, 477–488 (2003).
  117. Bornberg-Bauer, E. & Alba, M. M. Dynamics and adaptive benefits of modular protein evolution. Curr. Opin. Struct. Biol. 23, 459–466 (2013).
  118. Heinen, T. J. A. J., Staubach, F., Häming, D. & Tautz, D. Emergence of a new gene from an intergenic region. Curr. Biol. 19, 1527–1531 (2009).
  119. Broustas, C. G. et al. BRCC2, a novel BH3-like domain-containing protein, induces apoptosis in a caspase-dependent manner. J. Biol. Chem. 279, 26780–26788 (2004).
  120. Broustas, C. G. et al. The proapoptotic molecule BLID interacts with Bcl-XL and its downregulation in breast cancer correlates with poor disease-free and overall survival. Clin. Cancer Res. 16, 2939–2948 (2010).
  121. Andrews, S. J. & Rothnagel, J. A. Emerging evidence for functional peptides encoded by short open reading frames. Nat. Rev. Genet. 15, 193–204 (2014).
  122. Ji, Z., Song, R., Regev, A. & Struhl, K. Many lncRNAs, 5′UTRs, and pseudogenes are translated and some are likely to express functional proteins. eLife 4, e08890 (2015).
  123. Buhl, A. M. et al. Identification of a gene on chromosome 12q22 uniquely overexpressed in chronic lymphocytic leukemia. Blood 107, 2904–2911 (2006).
  124. Lin, B. et al. PART-1: a novel human prostate-specific, androgen-regulated gene that maps to chromosome 5q12. Cancer Res. 60, 858–863 (2000).
  125. Pekarsky, Y., Rynditch, A., Wieser, R., Fonatsch, C. & Gardiner, K. Activation of a novel gene in 3q21 and identification of intergenic fusion transcripts with ecotropic viral insertion site I in leukemia. Cancer Res. 57, 3914–3919 (1997).
  126. Kaushal, A. et al. A novel transcript from the KLKP1 gene is androgen regulated, down-regulated during prostate cancer progression and encodes the first non-serine protease identified from the human kallikrein gene locus. Prostate 68, 381–399 (2008).


Author information


  1. The Smurfit Institute of Genetics, University of Dublin, Trinity College, Dublin 2, Ireland.

    • Aoife McLysaght
  2. The Milner Centre for Evolution, Department of Biology and Biochemistry, University of Bath, Bath, Somerset BA2 7AY, UK.

    • Laurence D. Hurst

Competing interests statement

The authors declare no competing interests.

Author details

  • Aoife McLysaght

    Aoife McLysaght is a professor in genetics at the University of Dublin, Trinity College, Ireland, where she also obtained her Ph.D. Her research is in molecular evolution, using bioinformatics and computational biology approaches. Her principal research focus is the evolution of vertebrate genomes with respect to fundamental questions surrounding new genes, gene and genome duplication, and evolutionary constraints.

  • Laurence D. Hurst

    Laurence D. Hurst is Director of the Milner Centre for Evolution, Director of the Genetics and Evolution Teaching Project and a professor of evolutionary genetics at the University of Bath, UK. He received his B.A. from the University of Cambridge, UK, and his D.Phil. from the University of Oxford, UK. He is interested in fundamental problems concerning the evolution of genes, genomes and genetic systems.
