Data availability
Ribo-seq and matched RNA sequencing (RNA-seq) data for adult brain (cortex), liver and testis from the different mammals used to assess potential translation of the loci reported by An et al. (ref. 3) stem from our previous study (ref. 7) and can be retrieved from ArrayExpress at www.ebi.ac.uk/arrayexpress/ (accession code E-MTAB-7247). Ribo-seq and matched RNA-seq data for the developing human cortex stem from a previous study (ref. 21) and were retrieved from the database of Genotypes and Phenotypes (accession code phs002489). The developmental human RNA-seq data stem from our previous study (ref. 16) and can be retrieved from ArrayExpress (accession code E-MTAB-6814). Analyses of the various gene annotations were based on different versions of Ensembl (http://www.ensembl.org), as detailed in the main text. Protein annotation for ENSG00000205704 was also assessed using UniProt (www.uniprot.org/uniprotkb; accession code A0A590UK83), as described in the main text.
Code availability
To create the plots and analyse the data, we used the R (https://www.r-project.org/) packages ggplot2 (version 3.3.5) and tidyverse (version 1.3.1) throughout the manuscript. We used Integrative Genomics Viewer (version 2.3.60) to visualize the mapped reads. We wrote a custom R script to calculate the translational likelihoods (https://github.com/evgenyleushkin/translationLR). We used codeml from the Phylogenetic Analysis by Maximum Likelihood (version 4.8) package (http://abacus.gene.ucl.ac.uk/software/paml.html) to assess the dN/dS values. The sequence alignments were extracted from existing University of California, Santa Cruz (https://genome.ucsc.edu/) alignments (https://hgdownload.soe.ucsc.edu/downloads.html).
References
1. Kaessmann, H. Origins, evolution, and phenotypic impact of new genes. Genome Res. 20, 1313–1326 (2010).
2. McLysaght, A. & Hurst, L. D. Open questions in the study of de novo genes: what, how and why. Nat. Rev. Genet. 17, 567–578 (2016).
3. An, N. A. et al. De novo genes with an lncRNA origin encode unique human brain developmental functionality. Nat. Ecol. Evol. 7, 264–278 (2023).
4. Cunningham, F. et al. Ensembl 2022. Nucleic Acids Res. 50, D988–D995 (2022).
5. Karginov, T. A., Pastor, D. P. H., Semler, B. L. & Gomez, C. M. Mammalian polycistronic mRNAs and disease. Trends Genet. 33, 129–142 (2017).
6. Delihas, N. Evolution of a human-specific de novo open reading frame and its linked transcriptional silencer. Int. J. Mol. Sci. 25, 3924 (2024).
7. Wang, Z. Y. et al. Transcriptome and translatome co-evolution in mammals. Nature 588, 642–647 (2020).
8. Xu, J. & Zhang, J. Are human translated pseudogenes functional? Mol. Biol. Evol. 33, 755–760 (2016).
9. Jeong, K., Kim, S. & Bandeira, N. False discovery rates in spectral identification. BMC Bioinformatics 13, S2 (2012).
10. Tabb, D. L., Friedman, D. B. & Ham, A. J. Verification of automated peptide identifications from proteomic tandem mass spectra. Nat. Protoc. 1, 2213–2222 (2006).
11. Wu, D. D., Irwin, D. M. & Zhang, Y. P. De novo origin of human protein-coding genes. PLoS Genet. 7, e1002379 (2011).
12. Bak, R. O. & Mikkelsen, J. G. miRNA sponges: soaking up miRNAs for regulation of gene expression. Wiley Interdiscip. Rev. RNA 5, 317–333 (2014).
13. Charley, P. A. & Wilusz, J. Sponging of cellular proteins by viral RNAs. Curr. Opin. Virol. 9, 14–18 (2014).
14. HafezQorani, S., Houdjedj, A., Arici, M., Said, A. & Kazan, H. RBPSponge: genome-wide identification of lncRNAs that sponge RBPs. Bioinformatics 35, 4760–4763 (2019).
15. Broeils, L. A., Ruiz-Orera, J., Snel, B., Hubner, N. & van Heesch, S. Evolution and implications of de novo genes in humans. Nat. Ecol. Evol. 7, 804–815 (2023).
16. Cardoso-Moreira, M. et al. Gene expression across mammalian organ development. Nature 571, 505–509 (2019).
17. Pertea, M. et al. StringTie enables improved reconstruction of a transcriptome from RNA-seq reads. Nat. Biotechnol. 33, 290–295 (2015).
18. Chen, J. Y. et al. Emergence, retention and selection: a trilogy of origination for functional de novo proteins from ancestral lncRNAs in primates. PLoS Genet. 11, e1005391 (2015).
19. Perez-Riverol, Y. et al. The PRIDE database resources in 2022: a hub for mass spectrometry-based proteomics evidences. Nucleic Acids Res. 50, D543–D552 (2022).
20. Qi, J. et al. A human-specific de novo gene promotes cortical expansion and folding. Adv. Sci. (Weinh.) 10, e2204140 (2023).
21. Duffy, E. E. et al. Developmental dynamics of RNA translation in the human brain. Nat. Neurosci. 25, 1353–1365 (2022).
22. Pollard, K. S., Hubisz, M. J., Rosenbloom, K. R. & Siepel, A. Detection of nonneutral substitution rates on mammalian phylogenies. Genome Res. 20, 110–121 (2010).
23. Luo, Y. et al. New developments on the Encyclopedia of DNA Elements (ENCODE) data portal. Nucleic Acids Res. 48, D882–D889 (2020).
24. Robinson, J. T. et al. Integrative genomics viewer. Nat. Biotechnol. 29, 24–26 (2011).
25. Altschul, S. F. et al. Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res. 25, 3389–3402 (1997).
Acknowledgements
We thank the members of H.K.’s laboratory for discussions.
Author information
Contributions
E.L. and H.K. conceived of and organized the study, analysed the data and co-wrote the manuscript. E.L. performed all of the large-scale analyses.
Ethics declarations
Competing interests
The authors declare no competing interests.
Peer review
Peer review information
Nature Ecology & Evolution thanks Anne-Ruxandra Carvunis, Sebastiaan van Heesch and the other, anonymous, reviewer(s) for their contribution to the peer review of this work.
Additional information
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Extended data
Extended Data Fig. 1 Translation and transcription of the SMIM45 (ENSG00000205704) gene during human cortex development.
a, Integrative Genomics Viewer (IGV; ref. 24) browser screenshot of ribosome profiling footprints within SMIM45 in fetal (12–23 weeks post-conception) human cortical development, based on ribosome profiling data from Duffy et al. (ref. 21). b, IGV browser screenshot of total RNA sequencing reads during different stages of human cortical development, based on RNA sequencing data from Cardoso-Moreira et al. (ref. 16).
Extended Data Fig. 2 Evolutionary relationships of SMIM45-201 and SEPTIN3, and developmental expression of SEPTIN3.
a, SMIM45 and SEPTIN3 are adjacent but independent protein-coding genes in the human genome, and correspond to regions within a single protein-coding gene (annotated as Septin-3) in bony (ray-finned) fish, as illustrated for the fathead minnow. Amino acid sequence similarity was detected with an iterative PSI-BLAST (ref. 25) search (sequence identity, ‘id’, is indicated). b, Expression (transcript abundance) pattern of SEPTIN3 (that is, that of the longest isoform, SEPTIN3-207) across the development of different organs in humans, based on RNA sequencing data from Cardoso-Moreira et al. (ref. 16). Error bars indicate the range of values across biological replicates.
Extended Data Fig. 3 Translational profile for ENSG00000120742 (SERP1) in the brain.
Transcription/translation signals along the SERP1 sequence. From top to bottom: total RNA (TR) read counts averaged in five trinucleotide windows, ribosome profiling footprint (RP) counts averaged in five trinucleotide windows; translational likelihood ratio (LR) (see legend of Fig. 2 for details); ORFs with LR > 10 in first, second, and third reading frame (relative to position 1 in a spliced transcript; intron positions indicated by gray vertical lines); resulting translated ORF annotation of SERP1 includes a uORF (studied by An et al., to the left) and the main (downstream) canonical ORF/coding region (to the right). See the main text for details.
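The window averaging mentioned in this legend can be illustrated with a minimal Python sketch. It assumes one plausible reading of "averaged in five trinucleotide windows", namely non-overlapping windows of five codons (15 nucleotides) applied to per-nucleotide counts; the function name `codon_window_means` and the toy counts are hypothetical and do not come from the authors' R script.

```python
from typing import List


def codon_window_means(per_nt_counts: List[float],
                       codons_per_window: int = 5) -> List[float]:
    """Average per-nucleotide counts over non-overlapping codon windows.

    Assumption: a window spans `codons_per_window` trinucleotides
    (15 nt by default); a shorter final window is averaged over its
    own length.
    """
    window_nt = 3 * codons_per_window
    means = []
    for start in range(0, len(per_nt_counts), window_nt):
        chunk = per_nt_counts[start:start + window_nt]
        means.append(sum(chunk) / len(chunk))
    return means


# Toy per-nucleotide read counts: two full windows and a short tail.
toy_counts = [1.0] * 15 + [3.0] * 15 + [6.0] * 6
window_means = codon_window_means(toy_counts)
```

If the original analysis used sliding rather than non-overlapping windows, the loop step would change from `window_nt` to 3 (one codon), but the averaging itself would stay the same.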
Extended Data Fig. 4 Translational profile for ENSG00000145063.
Transcription/translation signals along the ENSG00000145063 sequence in three organs (brain, liver, testis). Upper panel: exonic sequence ENSG00000145063, with the translated coding region shaded in turquoise and the ORF studied by An et al. in red (bold). Lower panels, respectively for the three organs, from top to bottom: total RNA (TR) read counts averaged in five trinucleotide windows, ribosome profiling footprint (RP) counts averaged in five trinucleotide windows; translational likelihood ratio (LR) (see legend of Fig. 2 for details); ORFs with LR > 10 in first, second, and third reading frame (relative to position 1 in a spliced transcript; intron positions indicated by gray vertical lines); resulting translated (brain/cortex) ORF annotation of ENSG00000145063 includes the main canonical ORF/coding region (light blue box, bottom figure). The non-translated ORF considered by An et al. is indicated by a red line in lower (left) ORF figure. See the main text for details. Icons reproduced from ref. 16, Springer Nature Limited.
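The step of listing ORFs in the three reading frames and keeping those with LR > 10 could be sketched as follows. This is an illustration only: the LR values are treated as a precomputed, hypothetical input (the actual likelihood ratio is defined in the legend of Fig. 2 and computed by the authors' translationLR script, neither of which is reproduced here), and the ATG-to-stop ORF definition and the `min_codons` parameter are assumptions of this sketch.

```python
from typing import Dict, List, Tuple

STOP_CODONS = {"TAA", "TAG", "TGA"}
Orf = Tuple[int, int, int]  # (frame 1-3, start, end); 0-based, end exclusive


def find_orfs(seq: str, min_codons: int = 1) -> List[Orf]:
    """Scan the three forward frames of a spliced transcript for ORFs.

    Assumption: an ORF runs from the first ATG after the previous stop
    to the next in-frame stop codon (stop included in the interval).
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        for pos in range(frame, len(seq) - 2, 3):
            codon = seq[pos:pos + 3]
            if start is None and codon == "ATG":
                start = pos
            elif start is not None and codon in STOP_CODONS:
                if (pos + 3 - start) // 3 >= min_codons:
                    orfs.append((frame + 1, start, pos + 3))
                start = None
    return orfs


def translated_orfs(orfs: List[Orf], lr_by_orf: Dict[Orf, float],
                    threshold: float = 10.0) -> List[Orf]:
    """Keep ORFs whose (precomputed, hypothetical) LR exceeds the threshold."""
    return [orf for orf in orfs if lr_by_orf.get(orf, 0.0) > threshold]


toy_orfs = find_orfs("CATGAAATAAG")            # one ORF in frame 2
toy_lr = {(2, 1, 10): 25.0}                    # made-up LR value
toy_translated = translated_orfs(toy_orfs, toy_lr)
```

The frame index is relative to position 1 of the spliced transcript, matching the convention stated in the legend.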
Extended Data Fig. 5 Translational profile for ENSG00000260456 (C16orf95) and its mouse ortholog.
Transcription/translation signals along the ENSG00000260456 sequence in the testis of human and mouse. Panel a (top): human exonic sequence for ENSG00000260456, with the translated coding region shaded in turquoise and the ORF studied by An et al. in red (bold). Panel b (top): mouse exonic sequence for ENSMUSG00000031809, with the translated coding region shaded in turquoise. Below the sequences, respectively for human (a) and mouse (b), from top to bottom: total RNA (TR) read counts averaged in five trinucleotide windows, ribosome profiling footprint (RP) counts averaged in five trinucleotide windows; translational likelihood ratio (LR) (see legend of Fig. 2 for details); ORFs with LR > 10 in first, second, and third reading frame (relative to position 1 in a spliced transcript; intron positions indicated by gray vertical lines); resulting translated (brain/cortex) ORF annotations of ENSG00000260456 and ENSMUSG00000031809, respectively, include the main canonical ORF/coding region (light blue box, bottom figure). The non-translated ORF considered by An et al. in humans is indicated by a red line in the lower (left) ORF figure. See the main text for details. Icons reproduced from ref. 16, Springer Nature Limited.
Extended Data Fig. 6 Translational profile for ENSG00000203930.
Transcription/translation signals along the ENSG00000203930 sequence in the human brain. a, three-exon transcript isoform with the ORF studied by An et al. (ENST00000370535). b, main, efficiently translated two-exon isoform (ENST00000607004). From top to bottom in both panels, respectively: total RNA (TR) read counts averaged in five trinucleotide windows, ribosome profiling footprint (RP) counts averaged in five trinucleotide windows; translational likelihood ratio (LR) (see legend of Fig. 2 for details); ORFs with LR > 10 in first, second, and third reading frame (relative to position 1 in a spliced transcript; intron positions indicated by gray vertical lines); resulting translated (brain/cortex) ORF annotations of the two isoforms, respectively. Only the main isoform ENST00000607004 is efficiently translated (5.1 ribosomal footprints per nucleotide), whereas for the isoform ENST00000370535, the ORF sequence not overlapping with that of the main isoform shows a low translation signal (average footprint coverage across ORF: 0.28 ribosomal footprints per nucleotide). See the main text for details. Icons reproduced from ref. 16, Springer Nature Limited.
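The per-nucleotide footprint coverage quoted in this legend is, under a straightforward reading, the mean footprint count over the positions of an ORF, optionally restricted to positions that do not overlap another isoform's ORF. The Python sketch below illustrates that arithmetic with made-up numbers; the function name, the `exclude` parameter and the toy coverage vector are hypothetical and are not taken from the authors' code or data.

```python
from typing import FrozenSet, List


def mean_footprint_coverage(per_nt_footprints: List[float],
                            orf_start: int, orf_end: int,
                            exclude: FrozenSet[int] = frozenset()) -> float:
    """Mean footprints per nucleotide over [orf_start, orf_end).

    Positions listed in `exclude` (for example, those shared with the
    ORF of another isoform) are left out of the average.
    """
    positions = [i for i in range(orf_start, orf_end) if i not in exclude]
    if not positions:
        raise ValueError("no ORF positions left to average over")
    return sum(per_nt_footprints[i] for i in positions) / len(positions)


# Toy coverage: a well-covered shared region followed by a sparse one.
toy_coverage = [5.0, 5.0, 5.0, 5.0, 0.0, 1.0, 0.0, 0.0]
whole_orf = mean_footprint_coverage(toy_coverage, 0, 8)
unique_part = mean_footprint_coverage(toy_coverage, 0, 8,
                                      exclude=frozenset(range(0, 4)))
```

In the toy example the shared region dominates the whole-ORF average, which is why restricting the calculation to the non-overlapping part gives a much lower value.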
Supplementary information
Supplementary Tables
Supplementary Table 1. Reassessment of the 74 loci reported by An et al. (ref. 3).
Supplementary Table 2. Overview of the peptide identifiers from Chen et al. (ref. 18), where 64 of the loci analysed by An et al. (ref. 3) had already been reported, for which assignment to a database (PRIDE) and hence identification of the tissue source was possible.
Supplementary Data 1
Annotation of translated ORFs and expressed transcripts for three organs (brain, liver and testis).
Supplementary Data 2
Sequence alignment files for four genes (ENSG00000120742, ENSG00000145063, ENSG00000203930 and ENSG00000260456).
About this article
Cite this article
Leushkin, E., Kaessmann, H. Identification of old coding regions disproves the hominoid de novo status of genes. Nat Ecol Evol 8, 1826–1830 (2024). https://doi.org/10.1038/s41559-024-02513-6
DOI: https://doi.org/10.1038/s41559-024-02513-6