  • Matters Arising
  • Published:

Identification of old coding regions disproves the hominoid de novo status of genes

Matters Arising to this article was published on 26 August 2024

The Original Article was published on 02 January 2023

Fig. 1: Patterns of translation and conservation of the SMIM45 (ENSG00000205704) gene.
Fig. 2: Translation of SMIM45 (ENSG00000205704) across species.

Data availability

Ribo-seq and matched RNA sequencing (RNA-seq) data for adult brain (cortex), liver and testis from the different mammals used to assess potential translation of the loci reported by An et al. (ref. 3) stem from our previous study (ref. 7) and can be retrieved from ArrayExpress at www.ebi.ac.uk/arrayexpress/ (accession code E-MTAB-7247). Ribo-seq and matched RNA-seq data for the developing human cortex stem from a previous study (ref. 21) and were retrieved from the database of Genotypes and Phenotypes (accession code phs002489). The developmental human RNA-seq data stem from our previous study (ref. 16) and can be retrieved from ArrayExpress (accession code E-MTAB-6814). Analyses of the various gene annotations were based on different versions of Ensembl (http://www.ensembl.org), as detailed in the main text. Protein annotation for ENSG00000205704 was also assessed using UniProt (www.uniprot.org/uniprotkb; accession code A0A590UK83), as described in the main text.

Code availability

To create the plots and analyse the data, we used the R (https://www.r-project.org/) packages ggplot2 (version 3.3.5) and tidyverse (version 1.3.1) throughout the manuscript. We used Integrative Genomics Viewer (version 2.3.60) to visualize the mapped reads. We wrote a custom R script to calculate the translational likelihoods (https://github.com/evgenyleushkin/translationLR). We used codeml from the Phylogenetic Analysis by Maximum Likelihood (version 4.8) package (http://abacus.gene.ucl.ac.uk/software/paml.html) to assess the dN/dS values. The sequence alignments were extracted from existing University of California, Santa Cruz (https://genome.ucsc.edu/) alignments (https://hgdownload.soe.ucsc.edu/downloads.html).

References

  1. Kaessmann, H. Origins, evolution, and phenotypic impact of new genes. Genome Res. 20, 1313–1326 (2010).

  2. McLysaght, A. & Hurst, L. D. Open questions in the study of de novo genes: what, how and why. Nat. Rev. Genet. 17, 567–578 (2016).

  3. An, N. A. et al. De novo genes with an lncRNA origin encode unique human brain developmental functionality. Nat. Ecol. Evol. 7, 264–278 (2023).

  4. Cunningham, F. et al. Ensembl 2022. Nucleic Acids Res. 50, D988–D995 (2022).

  5. Karginov, T. A., Pastor, D. P. H., Semler, B. L. & Gomez, C. M. Mammalian polycistronic mRNAs and disease. Trends Genet. 33, 129–142 (2017).

  6. Delihas, N. Evolution of a human-specific de novo open reading frame and its linked transcriptional silencer. Int. J. Mol. Sci. 25, 3924 (2024).

  7. Wang, Z. Y. et al. Transcriptome and translatome co-evolution in mammals. Nature 588, 642–647 (2020).

  8. Xu, J. & Zhang, J. Are human translated pseudogenes functional? Mol. Biol. Evol. 33, 755–760 (2016).

  9. Jeong, K., Kim, S. & Bandeira, N. False discovery rates in spectral identification. BMC Bioinformatics 13, S2 (2012).

  10. Tabb, D. L., Friedman, D. B. & Ham, A. J. Verification of automated peptide identifications from proteomic tandem mass spectra. Nat. Protoc. 1, 2213–2222 (2006).

  11. Wu, D. D., Irwin, D. M. & Zhang, Y. P. De novo origin of human protein-coding genes. PLoS Genet. 7, e1002379 (2011).

  12. Bak, R. O. & Mikkelsen, J. G. miRNA sponges: soaking up miRNAs for regulation of gene expression. Wiley Interdiscip. Rev. RNA 5, 317–333 (2014).

  13. Charley, P. A. & Wilusz, J. Sponging of cellular proteins by viral RNAs. Curr. Opin. Virol. 9, 14–18 (2014).

  14. HafezQorani, S., Houdjedj, A., Arici, M., Said, A. & Kazan, H. RBPSponge: genome-wide identification of lncRNAs that sponge RBPs. Bioinformatics 35, 4760–4763 (2019).

  15. Broeils, L. A., Ruiz-Orera, J., Snel, B., Hubner, N. & van Heesch, S. Evolution and implications of de novo genes in humans. Nat. Ecol. Evol. 7, 804–815 (2023).

  16. Cardoso-Moreira, M. et al. Gene expression across mammalian organ development. Nature 571, 505–509 (2019).

  17. Pertea, M. et al. StringTie enables improved reconstruction of a transcriptome from RNA-seq reads. Nat. Biotechnol. 33, 290–295 (2015).

  18. Chen, J. Y. et al. Emergence, retention and selection: a trilogy of origination for functional de novo proteins from ancestral lncRNAs in primates. PLoS Genet. 11, e1005391 (2015).

  19. Perez-Riverol, Y. et al. The PRIDE database resources in 2022: a hub for mass spectrometry-based proteomics evidences. Nucleic Acids Res. 50, D543–D552 (2022).

  20. Qi, J. et al. A human-specific de novo gene promotes cortical expansion and folding. Adv. Sci. (Weinh.) 10, e2204140 (2023).

  21. Duffy, E. E. et al. Developmental dynamics of RNA translation in the human brain. Nat. Neurosci. 25, 1353–1365 (2022).

  22. Pollard, K. S., Hubisz, M. J., Rosenbloom, K. R. & Siepel, A. Detection of nonneutral substitution rates on mammalian phylogenies. Genome Res. 20, 110–121 (2010).

  23. Luo, Y. et al. New developments on the Encyclopedia of DNA Elements (ENCODE) data portal. Nucleic Acids Res. 48, D882–D889 (2020).

  24. Robinson, J. T. et al. Integrative genomics viewer. Nat. Biotechnol. 29, 24–26 (2011).

  25. Altschul, S. F. et al. Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res. 25, 3389–3402 (1997).

Acknowledgements

We thank the members of H.K.’s laboratory for discussions.

Author information

Contributions

E.L. and H.K. conceived of and organized the study, analysed the data and co-wrote the manuscript. E.L. performed all of the large-scale analyses.

Corresponding authors

Correspondence to Evgeny Leushkin or Henrik Kaessmann.

Ethics declarations

Competing interests

The authors declare no competing interests.

Peer review

Peer review information

Nature Ecology & Evolution thanks Anne-Ruxandra Carvunis, Sebastiaan van Heesch and the other, anonymous, reviewer(s) for their contribution to the peer review of this work.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Extended data

Extended Data Fig. 1 Translation and transcription of the SMIM45 (ENSG00000205704) gene during human cortex development.

a, IGV (Integrative Genomics Viewer; ref. 24) browser screenshot of ribosome profiling footprints within SMIM45 in fetal (12–23 weeks post-conception) human cortical development, based on ribosome profiling data from Duffy et al. (ref. 21). b, IGV browser screenshot of total RNA sequencing reads during different stages of human cortical development, based on RNA sequencing data from Cardoso-Moreira et al. (ref. 16).

Extended Data Fig. 2 Evolutionary relationships of SMIM45-201 and SEPTIN3, and developmental expression of SEPTIN3.

a, SMIM45 and SEPTIN3 are adjacent but independent protein-coding genes in the human genome, and correspond to regions within a single protein-coding gene (annotated as Septin-3) in bony (ray-finned) fish, as illustrated for the fathead minnow. Amino acid sequence similarity was detected with an iterative PSI-BLAST (ref. 25) search (sequence identity, ‘id’, is indicated). b, Expression (transcript abundance) pattern of SEPTIN3 (that is, that of the longest isoform, SEPTIN3-207) across the development of different organs in humans, based on RNA sequencing data from Cardoso-Moreira et al. (ref. 16). Error bars indicate the range of values across biological replicates.

Extended Data Fig. 3 Translational profile for ENSG00000120742 (SERP1) in the brain.

Transcription/translation signals along the SERP1 sequence. From top to bottom: total RNA (TR) read counts averaged in five trinucleotide windows; ribosome profiling footprint (RP) counts averaged in five trinucleotide windows; translational likelihood ratio (LR) (see legend of Fig. 2 for details); ORFs with LR > 10 in the first, second and third reading frames (relative to position 1 in a spliced transcript; intron positions indicated by gray vertical lines); and the resulting translated ORF annotation of SERP1, which includes a uORF (studied by An et al., to the left) and the main (downstream) canonical ORF/coding region (to the right). See the main text for details.
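The "read counts averaged in five trinucleotide windows" tracks can be illustrated with a minimal Python sketch. It is not the authors' code (their analysis used a custom R script), and it rests on two assumptions that the legend does not specify: per-nucleotide counts are first summed per trinucleotide, and windows are consecutive and non-overlapping. The function names codon_counts and window_average are hypothetical.

```python
from typing import List


def codon_counts(per_nt: List[float]) -> List[float]:
    """Sum per-nucleotide read counts into consecutive trinucleotides.

    Assumption: position 0 is the first base of the spliced transcript;
    a trailing partial trinucleotide is dropped.
    """
    usable = len(per_nt) - len(per_nt) % 3
    return [sum(per_nt[i:i + 3]) for i in range(0, usable, 3)]


def window_average(values: List[float], width: int = 5) -> List[float]:
    """Average values over consecutive, non-overlapping windows.

    Assumption: windows do not overlap; the last window may be shorter
    than `width` and is averaged over its actual length.
    """
    averages = []
    for i in range(0, len(values), width):
        window = values[i:i + width]
        averages.append(sum(window) / len(window))
    return averages


# Illustrative (made-up) per-nucleotide counts for a 30-nt stretch.
example_counts = [1] * 30
track = window_average(codon_counts(example_counts))
```

A sliding window would give a smoother track; the non-overlapping choice here is only the simplest reading of the legend.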

Extended Data Fig. 4 Translational profile for ENSG00000145063.

Transcription/translation signals along the ENSG00000145063 sequence in three organs (brain, liver, testis). Upper panel: exonic sequence of ENSG00000145063, with the translated coding region shaded in turquoise and the ORF studied by An et al. in red (bold). Lower panels, respectively for the three organs, from top to bottom: total RNA (TR) read counts averaged in five trinucleotide windows; ribosome profiling footprint (RP) counts averaged in five trinucleotide windows; translational likelihood ratio (LR) (see legend of Fig. 2 for details); ORFs with LR > 10 in the first, second and third reading frames (relative to position 1 in a spliced transcript; intron positions indicated by gray vertical lines); and the resulting translated (brain/cortex) ORF annotation of ENSG00000145063, which includes the main canonical ORF/coding region (light blue box, bottom figure). The non-translated ORF considered by An et al. is indicated by a red line in the lower (left) ORF figure. See the main text for details. Icons reproduced from ref. 16, Springer Nature Limited.
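The phrase "first, second, and third reading frame (relative to position 1 in a spliced transcript)" can be made concrete with a short Python sketch that lists candidate ORFs per frame. This is an illustration only: it assumes an ORF runs from the first in-frame ATG to the next in-frame stop codon, and it does not reproduce the LR > 10 filter, which comes from the authors' custom R script. The function name orfs_by_frame and the min_codons parameter are hypothetical.

```python
from typing import Dict, List, Tuple

STOP_CODONS = {"TAA", "TAG", "TGA"}


def orfs_by_frame(seq: str, min_codons: int = 2) -> Dict[int, List[Tuple[int, int]]]:
    """List candidate ORFs in each of the three forward reading frames.

    Frames are numbered 1-3 relative to position 1 of the spliced transcript.
    Coordinates are 0-based and half-open, and include the stop codon.
    Assumption: an ORF starts at the first in-frame ATG and ends at the
    next in-frame stop codon.
    """
    seq = seq.upper()
    found: Dict[int, List[Tuple[int, int]]] = {1: [], 2: [], 3: []}
    for offset in range(3):
        start = None
        for i in range(offset, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                if (i + 3 - start) // 3 >= min_codons:
                    found[offset + 1].append((start, i + 3))
                start = None
    return found


# Made-up toy transcript: one ORF in the second reading frame.
toy = orfs_by_frame("CATGCCCTGAA")
```

In the figure, only ORFs that additionally pass the translational likelihood ratio threshold are drawn; that step would follow this enumeration.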

Extended Data Fig. 5 Translational profile for ENSG00000260456 (C16orf95) and its mouse ortholog.

Transcription/translation signals along the ENSG00000260456 sequence in the testis of human and mouse. Panel a (top): human exonic sequence for ENSG00000260456, with the translated coding region shaded in turquoise and the ORF studied by An et al. in red (bold). Panel b (top): mouse exonic sequence for ENSMUSG00000031809, with the translated coding region shaded in turquoise. Below the sequences, respectively for human (a) and mouse (b), from top to bottom: total RNA (TR) read counts averaged in five trinucleotide windows; ribosome profiling footprint (RP) counts averaged in five trinucleotide windows; translational likelihood ratio (LR) (see legend of Fig. 2 for details); ORFs with LR > 10 in the first, second and third reading frames (relative to position 1 in a spliced transcript; intron positions indicated by gray vertical lines); and the resulting translated (brain/cortex) ORF annotations of ENSG00000260456 and ENSMUSG00000031809, respectively, which include the main canonical ORF/coding region (light blue box, bottom figure). The non-translated ORF considered by An et al. in humans is indicated by a red line in the lower (left) ORF figure. See the main text for details. Icons reproduced from ref. 16, Springer Nature Limited.

Extended Data Fig. 6 Translational profile for ENSG00000203930.

Transcription/translation signals along the ENSG00000203930 sequence in the human brain. a, Three-exon transcript isoform with the ORF studied by An et al. (ENST00000370535). b, Main, efficiently translated two-exon isoform (ENST00000607004). From top to bottom in both panels, respectively: total RNA (TR) read counts averaged in five trinucleotide windows; ribosome profiling footprint (RP) counts averaged in five trinucleotide windows; translational likelihood ratio (LR) (see legend of Fig. 2 for details); ORFs with LR > 10 in the first, second and third reading frames (relative to position 1 in a spliced transcript; intron positions indicated by gray vertical lines); and the resulting translated (brain/cortex) ORF annotations of the two isoforms, respectively. Only the main isoform ENST00000607004 is efficiently translated (5.1 ribosomal footprints per nucleotide), whereas for the isoform ENST00000370535, the ORF sequence not overlapping with that of the main isoform shows a low translation signal (average footprint coverage across ORF: 0.28 ribosomal footprints per nucleotide). See the main text for details. Icons reproduced from ref. 16, Springer Nature Limited.
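The "ribosomal footprints per nucleotide" figures quoted in this legend can be read as an average coverage over an ORF. The following Python sketch shows one plausible way to compute such a value; it assumes the statistic is the mean of per-nucleotide footprint counts across the ORF, which the legend does not state explicitly. The function name and the example counts are hypothetical.

```python
from typing import Sequence


def footprints_per_nucleotide(per_nt_counts: Sequence[float],
                              orf_start: int,
                              orf_end: int) -> float:
    """Mean footprint coverage across an ORF.

    Coordinates are 0-based and half-open on the spliced transcript.
    Assumption: coverage is the sum of per-nucleotide footprint counts
    divided by the ORF length in nucleotides.
    """
    length = orf_end - orf_start
    if length <= 0:
        raise ValueError("orf_end must be greater than orf_start")
    return sum(per_nt_counts[orf_start:orf_end]) / length


# Made-up counts, only to show the call.
coverage = footprints_per_nucleotide([0, 2, 4, 6, 0], 1, 4)
```

Taking the two values reported in the legend at face value, 5.1 / 0.28 gives roughly an 18-fold difference in average coverage between the main isoform and the non-overlapping part of the ORF in ENST00000370535.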

Supplementary information

Reporting Summary

Supplementary Tables

Supplementary Table 1. Reassessment of the 74 loci reported by An et al. (ref. 3).

Supplementary Table 2. Overview of the peptide identifiers from Chen et al. (ref. 18), where 64 of the loci analysed by An et al. (ref. 3) had already been reported, for which assignment to a database (PRIDE) and hence identification of the tissue source was possible.

Supplementary Data 1

Annotation of translated ORFs and expressed transcripts for three organs (brain, liver and testis).

Supplementary Data 2

Sequence alignment files for four genes (ENSG00000120742, ENSG00000145063, ENSG00000203930 and ENSG00000260456).

About this article

Cite this article

Leushkin, E., Kaessmann, H. Identification of old coding regions disproves the hominoid de novo status of genes. Nat Ecol Evol 8, 1826–1830 (2024). https://doi.org/10.1038/s41559-024-02513-6

  • DOI: https://doi.org/10.1038/s41559-024-02513-6