Rapid evolution of protein diversity by de novo origination in Oryza

Abstract

New protein-coding genes that arise de novo from non-coding DNA sequences contribute to protein diversity. However, de novo gene origination is challenging to study as it requires high-quality reference genomes for closely related species, evidence for ancestral non-coding sequences, and transcription and translation of the new genes. High-quality genomes of 13 closely related Oryza species provide unprecedented opportunities to understand de novo origination events. Here, we identify a large number of young de novo genes with discernible recent ancestral non-coding sequences and evidence of translation. Using pipelines examining the synteny relationship between genomes and reciprocal-best whole-genome alignments, we detected at least 175 de novo open reading frames in the focal species O. sativa subspecies japonica, which were all detected in RNA sequencing-based transcriptomes. Mass spectrometry-based targeted proteomics and ribosomal profiling show translational evidence for 57% of the de novo genes. In recent divergence of Oryza, an average of 51.5 de novo genes per million years were generated and retained. We observed evolutionary patterns in which excess indels and early transcription were favoured in origination with a stepwise formation of gene structure. These data reveal that de novo genes contribute to the rapid evolution of protein diversity under positive selection.
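The reciprocal-best criterion mentioned above can be illustrated with a minimal sketch. It is not the authors' pipeline: the hit tuples, region names and scores are hypothetical, and the study's synteny checks are not modelled here. The sketch only shows the general idea that a pair is kept when each region is the other's top-scoring match.

```python
from typing import Dict, List, Tuple

# Hypothetical record: (query region, target region, alignment score)
Hit = Tuple[str, str, float]


def best_hits(hits: List[Hit]) -> Dict[str, str]:
    """Return the highest-scoring target for each query (first hit wins ties)."""
    best: Dict[str, Tuple[str, float]] = {}
    for query, target, score in hits:
        if query not in best or score > best[query][1]:
            best[query] = (target, score)
    return {query: target for query, (target, _) in best.items()}


def reciprocal_best(a_to_b: List[Hit], b_to_a: List[Hit]) -> List[Tuple[str, str]]:
    """Keep pairs (a, b) where b is a's best hit and a is b's best hit."""
    forward = best_hits(a_to_b)
    backward = best_hits(b_to_a)
    return sorted((a, b) for a, b in forward.items() if backward.get(b) == a)


# Toy data, invented for illustration only
a_to_b = [("a1", "b1", 90.0), ("a1", "b2", 50.0), ("a2", "b2", 80.0), ("a3", "b2", 85.0)]
b_to_a = [("b1", "a1", 88.0), ("b2", "a3", 85.0), ("b2", "a2", 80.0)]
pairs = reciprocal_best(a_to_b, b_to_a)
```

In this toy input, region a2 points to b2, but b2's best match is a3, so a2 is left unpaired. A one-directional best-hit search would not filter out that kind of ambiguity.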

Fig. 1: Identification of de novo genes that originated recently during Oryza diversification.
Fig. 2: Stepwise origination processes for the de novo gene Osjap05g30030.
Fig. 3: Stepwise origination process for the de novo gene Osjap06g21910.
Fig. 4: Patterns of de novo origination in evolution, expression and gene structures.
Fig. 5: Example of the verification of protein products translated from a candidate de novo gene, Osjap05g20760.
Fig. 6: Summary of the protein products translated from candidate de novo genes in O. sativa subspecies japonica, as detected by experimental proteomics and ribosomal profiling analyses.

Data availability

The data that support the findings of this study are available in Supplementary Files 1 and 2, Supplementary Figs. 4 and 5, and Supplementary Tables 11 and 14.

Acknowledgements

We appreciate valuable discussions with N. Jiang at MSU, the group of M. L. at Chicago, Y. Liao and M. Chen at the Institute of Genetics and Development in Beijing, and J. P. Staley at Chicago. We are thankful for the editing done by E. Mortola. This work was supported by the USA National Science Foundation (NSF) under Plant Genome Research Program numbers 0321678, 0638541 and 0822284, the Bud Antle Endowed Chair of Excellence in Agriculture and Life Sciences, and the AXA Chair for Evolutionary Genomics and Genome Biology (to R.A.W.), NSF MCB number 1026200 (to M.L. and R.A.W.), NSF MCB 1051826 and NIH R01 GM 100768 (to M.L.), the National Key R&D Program of China 2017YFC0908400 (to S.L.) and the National Program for Support of Top-notch Young Professionals of China (to Y.O.).

Author information

Contributions

L.Z., R.A.W., S.L. and M.L. conceived and designed the project. L.Z., Y.R., R.A.W., S.L. and M.L. wrote the manuscript, with significant contributions from C.Z., A.R.G., J.C. and Y.Z. L.Z. conducted the computational genomic analysis, with significant contributions from A.R.G., K.C., J.Z. and Y.Z. C.Z., Y.Y., J.Z., K.C., M.W., D.C. and R.A.W. generated and annotated the genome sequences. Y.R., G.H., J.Z., L.Z. and S.L. designed and conducted the proteomics experiments to detect proteins translated from de novo genes. R.Z., B.W., L.Z. and Z.P. conducted the analysis of public proteomics databases. Y.R., L.Z., J.C., M.L. and S.L. performed further evolutionary and proteomics analyses. T.Y., G.L. and Y.O. grew rice strains in Sanya (China) and dissected rice tissues. J.C., L.Z., C.Z. and M.L. conducted the evolutionary substitution analyses of de novo genes.

Corresponding authors

Correspondence to Rod A. Wing or Siqi Liu or Manyuan Long.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

Supplementary Figures

Supplementary Figures 1–6

Reporting Summary

Supplementary File 1

Sequence alignments of 929 orphan genes exported from the MACSE program. Alignments were manually annotated at a later stage and can be found online.

Supplementary File 2

Ribosome profiling evidence for candidate de novo genes

Supplementary Table 1

ORF status of de novo gene candidates in each species

Supplementary Table 2

Transcription status of de novo gene candidates in each species

Supplementary Table 3

Candidate de novo genes with matches in GenBank's nr database

Supplementary Table 4

Statistics of mutations that are crucial for the transformation of noncoding to coding sequences

Supplementary Table 5

Population genomics of indels and SNPs in O. sativa japonica and O. barthii

Supplementary Table 6

Expression level and tissue specificity of candidate de novo genes and old singleton 76 genes derived from OGE datasets including leaf, root, and panicle.

Supplementary Table 7

Gene structures with relevant statistics

Supplementary Table 8

Intron phase distributions for different gene categories

Supplementary Table 9

Candidate de novo genes with signals of natural selection resulting from the branch model analyses in PAML

Supplementary Table 10

Candidate de novo genes identified with peptide support by the MRM method

Supplementary Table 11

The eight datasets used for proteomics analysis of candidate de novo genes

Supplementary Table 12

Candidate de novo genes identified with peptide support

Supplementary Table 13

Candidate de novo genes supported by ribosome profiling evidence

Supplementary Table 14

tRNA adaptation indexes (tAIs) in 175 de novo genes (plus 7 isoforms) and 4,965 single-copy genes (plus 2,079 isoforms)

About this article

Cite this article

Zhang, L., Ren, Y., Yang, T. et al. Rapid evolution of protein diversity by de novo origination in Oryza. Nat Ecol Evol 3, 679–690 (2019). https://doi.org/10.1038/s41559-019-0822-5
