Genes and translated open reading frames (ORFs) that emerged de novo from previously non-coding sequences provide species with opportunities for adaptation. When aberrantly activated, some human-specific de novo genes and ORFs have disease-promoting properties, for instance driving tumour growth. Thousands of putative de novo coding sequences have been described in humans, but we still do not know what fraction of those ORFs has readily acquired a function. Here, we discuss the challenges and controversies surrounding the detection, mechanisms of origin, annotation, validation and characterization of de novo genes and ORFs. Through manual curation of literature and databases, we provide a thorough table with most de novo genes reported for humans to date. We re-evaluate each locus by tracing the enabling mutations and list proposed disease associations, protein characteristics and supporting evidence for translation and protein detection. This work will support future explorations of de novo genes and ORFs in humans.
The authors declare no competing interests.
Peer review information
Nature Ecology & Evolution thanks Chuan-Yun Li, Lars Eicholt and Nikolaos Vakirlis for their contribution to the peer review of this work.
Cite this article
Broeils, L.A., Ruiz-Orera, J., Snel, B. et al. Evolution and implications of de novo genes in humans. Nat Ecol Evol 7, 804–815 (2023). https://doi.org/10.1038/s41559-023-02014-y