Young genes are highly disordered as predicted by the preadaptation hypothesis of de novo gene birth

Wilson, Benjamin A.; Foy, Scott G.; Neme, Rafik; Masel, Joanna

doi:10.1038/s41559-017-0146

Article
Published: 24 April 2017

Nature Ecology & Evolution volume 1, Article number: 0146 (2017)

Abstract

The phenomenon of de novo gene birth from junk DNA is surprising, because random polypeptides are expected to be toxic. There are two conflicting views about how de novo gene birth is nevertheless possible: the continuum hypothesis invokes a gradual gene birth process, whereas the preadaptation hypothesis predicts that young genes will show extreme levels of gene-like traits. We show that intrinsic structural disorder (ISD) conforms to the predictions of the preadaptation hypothesis and falsifies the continuum hypothesis: all genes have higher ISD than translated junk DNA, but young genes have the highest ISD of all. Results are robust to homology detection bias, to the non-independence of multiple members of the same gene family and to the false positive annotation of protein-coding genes.
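
Comparing genes with translated junk DNA presupposes that non-coding nucleotide sequence has first been turned into hypothetical polypeptides. A minimal sketch of that step, assuming the standard genetic code, reading from the first nucleotide of the sequence, and stopping at the first stop codon; the function translate_until_stop is hypothetical and is not the authors' pipeline.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1). Codons are enumerated in
# TCAG order with the first base varying slowest, matching AMINO_ACIDS.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

def translate_until_stop(dna: str) -> str:
    """Translate frame 0 of dna until the first stop codon or the sequence end.

    Assumption: a codon containing a non-ACGT character (e.g. N from repeat
    masking) also ends the hypothetical peptide.
    """
    dna = dna.upper()
    peptide = []
    for i in range(0, len(dna) - 2, 3):
        aa = CODON_TABLE.get(dna[i:i + 3])
        if aa is None or aa == "*":
            break
        peptide.append(aa)
    return "".join(peptide)

print(translate_until_stop("ATGGCCTTTTAAGGG"))  # prints MAF
```

Ending the peptide at a masked base is one design choice among several; skipping such codons or discarding the whole sequence would be equally defensible.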

**Figure 2: Young genes have higher ISD (black circles) than old genes.**

**Figure 3: In agreement with many previous studies, young genes evolve faster and are shorter.**

**Figure 4: Elevated ISD can be broken down into contributions from amino acid composition and from exact amino acid order.**
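
One way to separate the composition and order contributions, in the spirit of the scrambled amino acid sequences listed as Supplementary Table 6, is to compare a protein's score with the scores of shuffled copies that keep its amino acid composition but discard its residue order. The sketch below assumes a uniform random permutation; scramble, order_effect and composition_score are hypothetical names, and the score argument stands in for a real disorder predictor such as IUPred, which is not reimplemented here.

```python
import random
from collections import Counter
from statistics import mean
from typing import Callable

def scramble(seq: str, rng: random.Random) -> str:
    """Return a uniform random permutation of seq (same amino acid composition)."""
    residues = list(seq)
    rng.shuffle(residues)
    return "".join(residues)

def order_effect(seq: str, score: Callable[[str], float],
                 n_shuffles: int = 100, seed: int = 0) -> float:
    """Score of the real sequence minus the mean score of its scrambled copies.

    A positive value means the exact residue order raises the score beyond
    what the amino acid composition alone would give.
    """
    rng = random.Random(seed)
    scrambled_scores = [score(scramble(seq, rng)) for _ in range(n_shuffles)]
    return score(seq) - mean(scrambled_scores)

# Hypothetical toy stand-in for a disorder predictor: the fraction of residues
# from a set often described as disorder-promoting. It ignores residue order.
DISORDER_PROMOTING = set("ARGQSPEK")

def composition_score(seq: str) -> float:
    return sum(aa in DISORDER_PROMOTING for aa in seq) / len(seq)

protein = "MKRSPEQAGLLVIFW"  # arbitrary example sequence
assert Counter(scramble(protein, random.Random(1))) == Counter(protein)
print(order_effect(protein, composition_score))  # prints 0.0
```

Because the toy score depends on composition alone, its order effect is exactly zero; a predictor that uses local sequence context can give a non-zero value, and the composition contribution would then be measured by comparing the scrambled scores with a baseline such as translated junk DNA.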

**Figure 5: Putative evidence for the continuum hypothesis can be explained as a statistical artefact known as Simpson’s paradox.**
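
Simpson's paradox arises when a trend that holds within every subgroup reverses once the subgroups are pooled, because the subgroups differ both in their baseline and in how they are spread across the comparison of interest. The numbers below are invented solely to show that mechanism and are not data from the paper; the groups "A" and "B" stand for a hypothetical grouping variable, and mean_isd_by is a hypothetical helper.

```python
from collections import defaultdict
from statistics import mean

# Invented toy records (group, age_class, isd); NOT data from the paper.
# Group A has low baseline ISD and mostly young genes;
# group B has high baseline ISD and mostly old genes.
RECORDS = [
    ("A", "young", 0.30), ("A", "young", 0.30), ("A", "young", 0.30), ("A", "old", 0.20),
    ("B", "young", 0.70), ("B", "old", 0.60), ("B", "old", 0.60), ("B", "old", 0.60),
]

def mean_isd_by(records, key):
    """Mean ISD for each value returned by key(group, age_class)."""
    buckets = defaultdict(list)
    for group, age, isd in records:
        buckets[key(group, age)].append(isd)
    return {k: mean(v) for k, v in buckets.items()}

within = mean_isd_by(RECORDS, lambda g, a: (g, a))
pooled = mean_isd_by(RECORDS, lambda g, a: a)

for g in ("A", "B"):
    # Within each group, young genes have the higher mean ISD...
    print(g, f"young={within[(g, 'young')]:.2f}", f"old={within[(g, 'old')]:.2f}")
# ...but pooling over groups reverses the comparison.
print("pooled", f"young={pooled['young']:.2f}", f"old={pooled['old']:.2f}")
```

Run as written, it prints "A young=0.30 old=0.20", "B young=0.70 old=0.60" and "pooled young=0.40 old=0.50": young genes have higher mean ISD in both groups, yet lower mean ISD overall, because they are concentrated in the low-ISD group.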

**Figure 6: Young yeast genes, like the young mouse genes in Fig. 2, have higher ISD.**

Acknowledgements

Work was supported by the John Templeton Foundation (39667), the National Institutes of Health (GM104040) and ERC grant NewGenes (322564). We thank D. Tautz and M. Cordes for discussions, R. Bakaric for assistance with phylostratigraphy and A.-R. Carvunis for comments on a draft of the manuscript and for sharing data.

Author information

Present addresses: St. Jude Children's Research Hospital, Memphis, Tennessee 38105, USA (Scott G. Foy); Department of Biochemistry and Molecular Biophysics, Columbia University Medical Center, New York 10032, USA (Rafik Neme).
Benjamin A. Wilson and Scott G. Foy: These authors contributed equally to this work.

Authors and Affiliations

Department of Ecology and Evolutionary Biology, University of Arizona, Tucson, Arizona 85721, USA.
Benjamin A. Wilson, Scott G. Foy & Joanna Masel
Department of Evolutionary Genetics, Max Planck Institute for Evolutionary Biology, Plön, SH 24306, Germany.
Rafik Neme

Contributions

J.M. and R.N. conceived the approach; R.N. performed the phylostratigraphy; B.A.W. and S.G.F. completed all other data analyses; and J.M. wrote the paper.

Corresponding author

Correspondence to Joanna Masel.

Ethics declarations

Competing interests

The authors declare no competing financial interests.

Supplementary information

Supplementary Figure 1 and Supplementary Table 1 (PDF 376 kb)

Supplementary Table 2

M. musculus proteins (CSV 12497 kb)

Supplementary Table 3

Nucleotide sequences from intergenic regions of M. musculus genome (CSV 70010 kb)

Supplementary Table 4

Nucleotide sequences from intergenic regions of the masked M. musculus genome (CSV 70154 kb)

Supplementary Table 5

Randomly generated nucleotide sequences (CSV 35221 kb)

Supplementary Table 6

Scrambled amino acid sequences (CSV 12119 kb)

Supplementary Table 7

S. cerevisiae proteins from Table 1 (CSV 3330 kb)

About this article

Cite this article

Wilson, B., Foy, S., Neme, R. et al. Young genes are highly disordered as predicted by the preadaptation hypothesis of de novo gene birth. Nat Ecol Evol 1, 0146 (2017). https://doi.org/10.1038/s41559-017-0146

Received: 23 September 2016
Accepted: 16 March 2017
Published: 24 April 2017
DOI: https://doi.org/10.1038/s41559-017-0146