A complete gap-free diploid genome in Saccharum complex and the genomic footprints of evolution in the highly polyploid Saccharum genus

Abstract

A diploid genome in the Saccharum complex facilitates our understanding of evolution in the highly polyploid Saccharum genus. Here we have generated a complete, gap-free genome assembly of Erianthus rufipilus, a diploid species within the Saccharum complex. The complete assembly revealed that centromere satellite homogenization was accompanied by the insertions of Gypsy retrotransposons, which drove centromere diversification. An overall low rate of gene transcription was observed in the palaeo-duplicated chromosome EruChr05, as in other grasses; this might be regulated by methylation patterns mediated by homologous 24 nt small RNAs, potentially mediating the functions of many nucleotide-binding site genes. Sequencing data for 211 accessions in the Saccharum complex indicated that Saccharum probably originated in the trans-Himalayan region from a diploid ancestor (x = 10) around 1.9–2.5 million years ago. Our study provides new insights into the origin and evolution of Saccharum and accelerates translational research in cereal genetics and genomics.

Fig. 1: Complete assembly and validation of the T2T E. rufipilus Yunnan2009-3 genome.
Fig. 2: Characteristics of centromeres and CEN137 satellite repeat library.
Fig. 3: The evolution of E. rufipilus.
Fig. 4: Methylation regulates expression of genes located on the PdCPs.
Fig. 5: Distribution of 24 nt sRNAs and the potential regulatory mechanisms involved in methylation and gene expression in E. rufipilus.
Fig. 6: Population evolution of Saccharum.

Data availability

The assembled genome sequences and all raw sequencing data for E. rufipilus were deposited in the National Genomics Data Center (NGDC) database under BioProject accession PRJCA014818. Genome assemblies and annotation files of E. rufipilus are also available in the sugarcane database (http://sugarcane.zhangjisenlab.cn/SugarcaneDB/#/downloads). Source data are provided with this paper.

Acknowledgements

This work was supported by the National Key Research and Development programme (2021YFF1000101 and 2021YFF1000104); the Science and Technology Planting Project of Guangdong Province (2019B020238001); the National High-tech R&D Program (2013AA100604); the National Natural Science Foundation of China (31660420); the Science and Technology Major Project of Guangxi (AA17202025); the Fujian Provincial Department of Education (JA12082); the Natural Science Foundation of Fujian Province, China (2019J0102); the National Natural Science Foundation of China (32201794); the fellowship of China National Postdoctoral Program for Innovative Talents (BX20220349); and the National Natural Science Foundation of China (32001605).

Author information

Contributions

J.Z. conceived this project and coordinated research activities; J.Z., W.Y. and H.T. designed the experiments; T.W., J.Z., B.W., X.L., B.C., X.M., R.M. and M.Z. collected E. rufipilus and sugarcane materials; X.H., T.W., B.W., Zhe Z. and H.S. compared the morphological and anatomical features; Z.Y., Y.H. and Z.D. performed the oligo-FISH experiments; T.W., B.W., Q.Z., G.W. and Y.L. assembled, validated and annotated the T2T E. rufipilus genome; B.W., Y.Q. and T.W. characterized the centromeric sequences; Zeyu Z., L.G. and Yongjun W. analysed bisulfite sequencing data and sRNA sequencing data; T.W., B.Y., Q.Z., J.M., Y.Z., Yuhao W., Z.L., H.P. and S.C. analysed the genomic characteristics and the evolution of PdCPs; T.W., B.W., R.G., Y.Q. and Yuhao W. studied the population genomics; J.Z., T.W., H.T. and J.W. wrote the paper.

Corresponding authors

Correspondence to Wei Yao or Jisen Zhang.

Ethics declarations

Competing interests

The authors declare no competing interests.

Peer review

Peer review information

Nature Plants thanks Ling-Ling Chen, John Riascos and the other, anonymous, reviewer(s) for their contribution to the peer review of this work.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Extended data

Extended Data Fig. 1 FISH using chromosome-specific oligo probes according to the S. officinarum genome in the same metaphase cells of Erianthus rufipilus Yunnan2009-3.

a, b, c, d and e. E. rufipilus chromosome-specific oligo probes for Chr01, Chr03, Chr05, Chr07, and Chr09 are visualized in red. E. rufipilus chromosome-specific oligo probes for Chr02, Chr04, Chr06, Chr08, and Chr10 are visualized in green. Karyotypes of E. rufipilus are shown in f and g. These results confirm the chromosome number of E. rufipilus as 20 and the basic chromosome number as x = 10. The experiment was repeated at least 5 times independently with similar results. Bars = 10 μm.

Extended Data Fig. 2 Genome-wide chromatin interactions in the E. rufipilus genome assembly at 1000-kb resolution.

a. Genome-wide chromatin interactions. b. Chromatin interactions in each of its 10 chromosomes. The intensity of pixels represents the links between 1000-kb windows on all chromosomes. Darker red color indicates higher contact probability.

Extended Data Fig. 3 Alignment of Bionano optical map against in-silico maps of the E. rufipilus genome in the centromere regions.

Highly matched lines between the Bionano optical map and the in silico maps of the E. rufipilus genome verify the accurate assembly of the centromeres.

Extended Data Fig. 4 Characteristics and epigenetics of all 10 centromeres of E. rufipilus.

The distributions of CEN137 per 10-kbp on forward (red) or reverse (blue) strands, genes (dark blue), transcript abundance (green), LTR in the GYPSY superfamily (purple), methylation patterns (CG, green; CHG, yellow; CHH, dark blue) and CEN137 sequence similarity on the 10 centromeric regions were plotted successively. The bottom shows the methylation patterns on a chromosomal scale.

Extended Data Fig. 5 Whole plastid phylogram indicates that Erianthus rufipilus belongs to the Saccharum genus.

Phylogram of whole chloroplast alignments for Erianthus rufipilus accessions (in red color) from the trans-Himalayan region and “Saccharum” (Tripidium) rufipilum accessions from the South African Sugarcane Research Institute (in red color) with 52 representative chloroplast sequences. Numbers next to nodes represent bootstrap values and the scale bar at the base of the phylogeny represents the expected number of substitutions per site.

Extended Data Fig. 6 Gene family expansion and contraction in E. rufipilus and representative species in the grass lineages.

Red and blue indicate different rates of evolutionary change (lambda) in the tree, and the size of each circle represents the average ratio of expansion or contraction. The numbers on the left and on the right represent the expansion (+) or contraction (-) of gene families on the node and in each species, respectively. The numbers in brackets represent the number of rapidly expanding or contracting gene families.

Extended Data Fig. 7 Evolution of paleo-duplicated chromosome pairs of Saccharum.

Syntenic blocks were identified on E. rufipilus Chr05 and Chr08 as well as in the corresponding chromosomes in representative grass lineages. The colors in the legend indicate the Ks value of the syntenic gene pairs, and the green arrows indicate the large fragment inversion on E. rufipilus Chr08 and the corresponding chromosome in representative grass lineages.

Extended Data Fig. 8 Population characteristics and evolution of E. rufipilus.

a. Principal component analysis with PC1 and PC2. b. Genetic differentiation values (Fst) between different groups are presented on the dashed lines and nucleotide diversity (π) in different groups is presented in the circles. c. Genome-wide linkage disequilibrium (LD) analysis of E. rufipilus accessions. d. The distribution of Tajima’s D values among E. rufipilus accessions. e. ADMIXTURE plot of E. rufipilus accessions for K = 3 through 7.

Extended Data Fig. 9 Selective sweeps in Saccharum.

a. Selective sweep detection by estimating ROD (the genomic nucleotide diversity decrease ratio) implicated genes related to disease resistance or response to water deprivation in S. spontaneum. b. Selective sweeps were detected by estimating ROD and implicated genes related to reproduction, development, or photosynthesis in S. officinarum. c. Expression profiles (log2(FPKM + 1)) of the genes (marked in a) in the developmental gradient leaf segments in S. spontaneum. d. Expression profiles of the genes (marked in b) in the developmental gradient leaf segments in S. officinarum.

Extended Data Fig. 10 Deleterious mutations in Saccharum.

a. Comparison of total deleterious mutation numbers in the (corresponding) chromosomes of S. officinarum, E. rufipilus, and S. spontaneum. On each box, the centerline represents the median; the lower and upper hinges represent the 25th and 75th percentiles and the whiskers represent 1.5× the interquartile range. n = 10 for all three species. t-test, with P-values adjusted using the Holm procedure; * stands for P < 0.1 and **** stands for P < 0.0001. b. Distributions of deleterious mutations on different chromosomes in S. officinarum, E. rufipilus, and S. spontaneum. c. GO term enrichment for genes carrying deleterious mutations in E. rufipilus. Fisher’s exact test, with P-values adjusted using the Benjamini–Hochberg correction for multiple hypothesis testing.
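
For readers unfamiliar with the Holm procedure named in the caption above, the following is a minimal sketch of the standard step-down adjustment. It is not the authors' analysis code: the function name `holm_adjust` and the input P-values are hypothetical, chosen only for illustration.

```python
def holm_adjust(p_values):
    """Return Holm step-down adjusted P-values, in the same order as the input."""
    m = len(p_values)
    # Indices of the raw P-values, sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, idx in enumerate(order):
        # The k-th smallest P-value (k = rank + 1) is multiplied by (m - k + 1).
        scaled = min(1.0, (m - rank) * p_values[idx])
        # Enforce monotonicity: an adjusted value never falls below an earlier one.
        running_max = max(running_max, scaled)
        adjusted[idx] = running_max
    return adjusted


# Hypothetical raw P-values for three pairwise comparisons (not from the paper).
raw = [0.01, 0.04, 0.03]
adjusted = holm_adjust(raw)
print(adjusted)
```

Each adjusted value is then compared directly against the chosen significance threshold; the running maximum is what distinguishes Holm's method from simply multiplying each P-value by a rank-dependent factor.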

Supplementary information

Supplementary Information

Supplementary Notes 1 and 2, Figs. 1–20 and Tables 1–18.

Reporting Summary

Supplementary Data 1

The seed LTR sequences of E. rufipilus.

Supplementary Data 2

Estimated insertion times of LTRs in centromeres.

Supplementary Data 3

The total number of methylated cytosines of CG, CHG and CHH context in root, stem and leaf.

Supplementary Data 4

Collinear gene pair IDs in inverted regions of E. rufipilus and Sorghum.

Supplementary Data 5

Enriched GO terms of rapidly expanding gene families.

Supplementary Data 6

Synteny gene pairs on PdCPs in E. rufipilus, rice, sorghum, Miscanthus, S. spontaneum Np-X and S. spontaneum AP85-441.

Supplementary Data 7

NBS genes in E. rufipilus, rice and sorghum.

Supplementary Data 8

P values of CG, CHG and CHH methylation on chromosome 05, chromosome 08 and other chromosomes (t-test).

Supplementary Data 9

Correlation of CG, CHG and CHH methylation with gene expression.

Supplementary Data 10

Highly expressed 24 nt sRNA at promoter in the examined tissues.

Supplementary Data 11

Annotation of the corresponding selective sweep genes in S. spontaneum.

Supplementary Data 12

Annotation of the corresponding selective sweep genes in S. officinarum.

Source data

Source Data Fig. 1

Unprocessed gel of PCR product for gap filling in Fig. 1b.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Wang, T., Wang, B., Hua, X. et al. A complete gap-free diploid genome in Saccharum complex and the genomic footprints of evolution in the highly polyploid Saccharum genus. Nat. Plants 9, 554–571 (2023). https://doi.org/10.1038/s41477-023-01378-0

  • DOI: https://doi.org/10.1038/s41477-023-01378-0
