A complete gap-free diploid genome in Saccharum complex and the genomic footprints of evolution in the highly polyploid Saccharum genus

Wang, Tianyou; Wang, Baiyu; Hua, Xiuting; Tang, Haibao; Zhang, Zeyu; Gao, Ruiting; Qi, Yiying; Zhang, Qing; Wang, Gang; Yu, Zehuai; Huang, Yongji; Zhang, Zhe; Mei, Jing; Wang, Yuhao; Zhang, Yixing; Li, Yihan; Meng, Xue; Wang, Yongjun; Pan, Haoran; Chen, Shuqi; Li, Zhen; Shi, Huihong; Liu, Xinlong; Deng, Zuhu; Chen, Baoshan; Zhang, Muqing; Gu, Lianfeng; Wang, Jianping; Ming, Ray; Yao, Wei; Zhang, Jisen

doi:10.1038/s41477-023-01378-0

Article
Published: 30 March 2023

Tianyou Wang¹^na1,
Baiyu Wang²^na1,
Xiuting Hua²^na1,
Haibao Tang ORCID: orcid.org/0000-0002-3460-8570³^na1,
Zeyu Zhang⁴,
Ruiting Gao²,
Yiying Qi¹,
Qing Zhang³,
Gang Wang⁵,
Zehuai Yu²,
Yongji Huang⁶,
Zhe Zhang³,
Jing Mei³,
Yuhao Wang¹,
Yixing Zhang³,
Yihan Li²,
Xue Meng³,
Yongjun Wang¹,
Haoran Pan¹,
Shuqi Chen³,
Zhen Li¹,
Huihong Shi³,
Xinlong Liu⁷,
Zuhu Deng¹,
Baoshan Chen²,
Muqing Zhang²,
Lianfeng Gu ORCID: orcid.org/0000-0002-3810-2411⁴,
Jianping Wang ORCID: orcid.org/0000-0002-0259-1508⁸,
Ray Ming ORCID: orcid.org/0000-0002-9417-5789³,
Wei Yao ORCID: orcid.org/0000-0001-9919-1940² &
Jisen Zhang ORCID: orcid.org/0000-0003-1041-2757²

Nature Plants volume 9, pages 554–571 (2023)

Abstract

A diploid genome in the Saccharum complex facilitates our understanding of evolution in the highly polyploid Saccharum genus. Here we have generated a complete, gap-free genome assembly of Erianthus rufipilus, a diploid species within the Saccharum complex. The complete assembly revealed that centromere satellite homogenization was accompanied by the insertions of Gypsy retrotransposons, which drove centromere diversification. An overall low rate of gene transcription was observed in the palaeo-duplicated chromosome EruChr05 similar to other grasses, which might be regulated by methylation patterns mediated by homologous 24 nt small RNAs, and potentially mediating the functions of many nucleotide-binding site genes. Sequencing data for 211 accessions in the Saccharum complex indicated that Saccharum probably originated in the trans-Himalayan region from a diploid ancestor (x = 10) around 1.9–2.5 million years ago. Our study provides new insights into the origin and evolution of Saccharum and accelerates translational research in cereal genetics and genomics.

**Fig. 1: Complete assembly and validation of the T2T *E. rufipilus* Yunnan2009-3 genome.**

**Fig. 2: Characteristics of centromeres and *CEN137* satellite repeat library.**

**Fig. 3: The evolution of *E. rufipilus*.**

**Fig. 4: Methylation regulates expression of genes located on the PdCPs.**

**Fig. 5: Distribution of 24 nt sRNAs and the potential regulatory mechanisms involved in methylation and gene expression in *E. rufipilus*.**

**Fig. 6: Population evolution of *Saccharum*.**

Data availability

The assembled genome sequences and all raw sequencing data for E. rufipilus were deposited in the National Genomics Data Center (NGDC) database under BioProject accession PRJCA014818. Genome assemblies and annotation files of E. rufipilus are also available in the sugarcane database (http://sugarcane.zhangjisenlab.cn/SugarcaneDB/#/downloads). Source data are provided with this paper.

Acknowledgements

This work was supported by the National Key Research and Development programme (2021YFF1000101 and 2021YFF1000104); the Science and Technology Planting Project of Guangdong Province (2019B020238001); the National High-tech R&D Program (2013AA100604); the National Natural Science Foundation of China (31660420); the Science and Technology Major Project of Guangxi (AA17202025); the Fujian Provincial Department of Education (JA12082); the Natural Science Foundation of Fujian Province, China (2019J0102); the National Natural Science Foundation of China (32201794); the fellowship of China National Postdoctoral Program for Innovative Talents (BX20220349); and the National Natural Science Foundation of China (32001605).

Author information

These authors contributed equally: Tianyou Wang, Baiyu Wang, Xiuting Hua, Haibao Tang.

Authors and Affiliations

National Engineering Research Center for Sugarcane, College of Agriculture, Fujian Agriculture and Forestry University, Fuzhou, China
Tianyou Wang, Yiying Qi, Yuhao Wang, Yongjun Wang, Haoran Pan, Zhen Li & Zuhu Deng
State Key Lab for Conservation and Utilization of Subtropical AgroBiological Resources and Guangxi Key Lab for Sugarcane Biology, Guangxi University, Nanning, China
Baiyu Wang, Xiuting Hua, Ruiting Gao, Zehuai Yu, Yihan Li, Baoshan Chen, Muqing Zhang, Wei Yao & Jisen Zhang
Center for Genomics and Biotechnology, Fujian Provincial Key Laboratory of Haixia Applied Plant Systems Biology, Haixia Institute of Science, Fujian Agriculture and Forestry University, Fuzhou, China
Haibao Tang, Qing Zhang, Zhe Zhang, Jing Mei, Yixing Zhang, Xue Meng, Shuqi Chen, Huihong Shi & Ray Ming
Basic Forestry and Proteomics Research Center, College of Forestry, Haixia Institute of Science, Fujian Agriculture and Forestry University, Fuzhou, China
Zeyu Zhang & Lianfeng Gu
Jiangsu Key Laboratory for Bioresources of Saline Soils, Yancheng Teachers University, Yancheng, China
Gang Wang
Institute of Oceanography, Marine Biotechnology Center, Minjiang University, Fuzhou, China
Yongji Huang
Yunnan Key Laboratory of Sugarcane Genetic Improvement, Sugarcane Research Institute, Yunnan Academy of Agricultural Sciences, Kaiyuan, China
Xinlong Liu
Department of Agronomy, University of Florida, Gainesville, FL, USA
Jianping Wang

Contributions

J.Z. conceived this project and coordinated research activities; J.Z., W.Y. and H.T. designed the experiments; T.W., J.Z., B.W., X.L., B.C., X.M., R.M. and M.Z. collected E. rufipilus and sugarcane materials; X.H., T.W., B.W., Zhe Z. and H.S. compared the morphological and anatomical features; Z.Y., Y.H. and Z.D. performed the oligo-FISH experiments; T.W., B.W., Q.Z., G.W. and Y.L. assembled, validated and annotated the T2T E. rufipilus genome; B.W., Y.Q. and T.W. characterized the centromeric sequences; Zeyu Z., L.G. and Yongjun W. analysed bisulfite sequencing data and sRNA sequencing data; T.W., B.Y., Q.Z., J.M., Y.Z., Yuhao W., Z.L., H.P. and S.C. analysed the genomic characteristics and evolution of PdCPs; T.W., B.W., R.G., Y.Q. and Yuhao W. studied the population genomics; J.Z., T.W., H.T. and J.W. wrote the paper.

Corresponding authors

Correspondence to Wei Yao or Jisen Zhang.

Ethics declarations

Competing interests

The authors declare no competing interests.

Peer review

Peer review information

Nature Plants thanks Ling-Ling Chen, John Riascos and the other, anonymous, reviewer(s) for their contribution to the peer review of this work.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Extended data

Extended Data Fig. 1 FISH using chromosome-specific oligo probes according to the S. officinarum genome in the same metaphase cells of Erianthus rufipilus Yunnan2009-3.

a, b, c, d and e. E. rufipilus chromosome-specific oligo probes for Chr01, Chr03, Chr05, Chr07, and Chr09 are visualized in red. E. rufipilus chromosome-specific oligo probes Chr02, Chr04, Chr06, Chr08, and Chr10 are visualized in green. Karyotypes of E. rufipilus are shown in f and g. These results confirm the chromosome number of E. rufipilus as 20 and the basic chromosome number as x = 10. Experiment was repeated at least 5 times independently with similar results. Bars = 10 μm.

Extended Data Fig. 2 Genome-wide chromatin interactions in the E. rufipilus genome assembly at 1000-kb resolution.

a. Genome-wide chromatin interactions. b. Chromatin interactions within each of the 10 chromosomes. Pixel intensity represents the links between 1000-kb windows on all chromosomes. Darker red indicates higher contact probability.

Extended Data Fig. 3 Alignment of Bionano optical map against in-silico maps of the E. rufipilus genome in the centromere regions.

The high density of match lines between the Bionano optical map and the in-silico maps of the E. rufipilus genome verifies the accurate assembly of the centromeres.

Extended Data Fig. 4 Characteristics and epigenetics of all 10 centromeres of E. rufipilus.

The distributions of CEN137 per 10-kbp on forward (red) or reverse (blue) strands, genes (dark blue), transcript abundance (green), LTR in the GYPSY superfamily (purple), methylation patterns (CG, green; CHG, yellow; CHH, dark blue) and CEN137 sequence similarity on the 10 centromeric regions were plotted successively. The bottom shows the methylation patterns on a chromosomal scale.

Extended Data Fig. 5 Whole plastid phylogram indicates that Erianthus rufipilus belongs to the Saccharum genus.

Phylogram of whole chloroplast alignments for Erianthus rufipilus accessions (in red color) from trans-Himalayan region and “Saccharum” (Tripidium) rufipilum accessions from South African Sugarcane Research Institute (in red color) with 52 representative chloroplast sequences. Numbers next to nodes represent bootstraps and the scale bar at the base of the phylogeny represents the expected number of substitutions per site.

Extended Data Fig. 6 Gene family expansion and contraction in E. rufipilus and representative species in the grass lineages.

Red and blue indicate different rates of evolutionary change (lambda) in the tree, and the size of each circle represents the average ratio of expansion or contraction. The numbers on the left and on the right represent the expansion (+) or contraction (-) of gene families on the node and in each species, respectively. The numbers in brackets represent the number of rapidly expanding or contracting gene families.

Extended Data Fig. 7 Evolution of paleo-duplicated chromosome pairs of Saccharum.

Syntenic blocks were identified on E. rufipilus Chr05 and Chr08 as well as in the corresponding chromosomes in representative grass lineages. The colors in the legend indicate the Ks value of the syntenic gene pairs, and the green arrows indicate the large fragment inversion on E. rufipilus Chr08 and the corresponding chromosome in representative grass lineages.

Extended Data Fig. 8 Population characteristics and evolution of E. rufipilus.

a. Principal component analysis with PC1 and PC2. b. Genetic differentiation values (F_ST) between different groups are presented on the dashed lines, and nucleotide diversity (π) in each group is presented in the circles. c. Genome-wide linkage disequilibrium (LD) analysis of E. rufipilus accessions. d. The distribution of Tajima’s D values among E. rufipilus accessions. e. ADMIXTURE plot of E. rufipilus accessions for K = 3 through 7.
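As an illustration of the nucleotide diversity (π) statistic reported in panel b, the following minimal sketch computes π as the average number of pairwise differences per site across a set of aligned sequences. The function name and the toy sequences are hypothetical, and the sketch is not the pipeline used in this study, which is not described in this caption.

```python
from itertools import combinations


def nucleotide_diversity(sequences):
    """Average pairwise differences per site for equal-length aligned sequences.

    Illustrative only: real pipelines work from variant calls and handle
    missing data, which this sketch ignores.
    """
    n = len(sequences)
    if n < 2:
        raise ValueError("at least two sequences are required")
    length = len(sequences[0])
    if length == 0 or any(len(s) != length for s in sequences):
        raise ValueError("sequences must be non-empty and of equal length")

    # Count mismatching sites for every unordered pair of sequences.
    total_differences = sum(
        sum(a != b for a, b in zip(s1, s2))
        for s1, s2 in combinations(sequences, 2)
    )
    n_pairs = n * (n - 1) // 2
    return total_differences / (n_pairs * length)


# Hypothetical toy alignment of three haplotypes over four sites.
toy_alignment = ["ACGT", "ACGA", "ACGT"]
pi_toy = nucleotide_diversity(toy_alignment)
```

In the toy alignment, two of the three pairs differ at one site each, so π is 2 differences over 3 pairs and 4 sites.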

Extended Data Fig. 9 Selective sweeps in Saccharum.

a. Selective sweep detection by estimating ROD (the genomic nucleotide diversity decrease ratio) implicated genes related to disease resistance or response to water deprivation in S. spontaneum. b. Selective sweeps were detected by estimating ROD and implicated genes related to reproduction, development, or photosynthesis in S. officinarum. c. Expression profiles (log₂(FPKM + 1)) of the genes (marked in a) in the developmental gradient leaf segments in S. spontaneum. d. Expression profiles of the genes (marked in b) in the developmental gradient leaf segments in S. officinarum.
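The two quantities named in this caption can be sketched as follows. The caption defines ROD only as the genomic nucleotide diversity decrease ratio, so the formula below (one minus the ratio of π in the focal group to π in a reference group) is an assumption based on common usage, not a definition taken from this article. The window names and π values are hypothetical.

```python
import math


def rod(pi_reference, pi_target):
    """Reduction of diversity, assumed here to be 1 - pi_target / pi_reference."""
    if pi_reference <= 0:
        raise ValueError("pi_reference must be positive")
    return 1.0 - pi_target / pi_reference


def log2_fpkm(fpkm):
    """Expression transform used in panels c and d: log2(FPKM + 1)."""
    if fpkm < 0:
        raise ValueError("FPKM cannot be negative")
    return math.log2(fpkm + 1.0)


# Hypothetical per-window diversity values: (window, pi_reference, pi_target).
windows = [
    ("window_1", 0.010, 0.009),
    ("window_2", 0.012, 0.002),
    ("window_3", 0.008, 0.008),
]

rod_by_window = {name: rod(ref, tgt) for name, ref, tgt in windows}

# The window with the largest diversity decrease is the strongest sweep candidate.
top_window = max(rod_by_window, key=rod_by_window.get)
```

Under this assumed definition, a window whose diversity is unchanged has a ROD of 0, and values approaching 1 indicate a near-complete loss of diversity in the focal group.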

Extended Data Fig. 10 Deleterious mutations in Saccharum.

a. Comparison of total deleterious mutation numbers in the (corresponding) chromosomes of S. officinarum, E. rufipilus, and S. spontaneum. On each box, the centerline represents the median, the lower and upper hinges represent the 25th and 75th percentiles, and the whiskers represent 1.5× the interquartile range. n = 10 for all 3 species. t-test, with P-values adjusted using the Holm procedure; * stands for P < 0.1, and **** stands for P < 0.0001. b. Distributions of deleterious mutations on different chromosomes in S. officinarum, E. rufipilus, and S. spontaneum. c. GO term enrichment for genes carrying deleterious mutations in E. rufipilus. Fisher’s exact test, with P-values adjusted using the Benjamini–Hochberg correction for multiple hypothesis testing.
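The two multiple-testing adjustments named in this caption can be sketched in plain Python as below. This is a generic illustration of the Holm step-down and Benjamini–Hochberg step-up procedures, not the software used in this study; the function names and the example P-values are hypothetical.

```python
def holm_adjust(pvalues):
    """Holm step-down adjusted P-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, idx in enumerate(order):
        # The smallest P-value is multiplied by m, the next by m - 1, and so on.
        value = min(1.0, (m - rank) * pvalues[idx])
        # Enforce monotonicity: adjusted values never decrease with rank.
        running_max = max(running_max, value)
        adjusted[idx] = running_max
    return adjusted


def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted P-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P-value down so the running minimum can be applied.
    for rank in range(m - 1, -1, -1):
        idx = order[rank]
        value = min(1.0, pvalues[idx] * m / (rank + 1))
        running_min = min(running_min, value)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical raw P-values from three tests.
raw_pvalues = [0.01, 0.04, 0.03]
holm_values = holm_adjust(raw_pvalues)
bh_values = bh_adjust(raw_pvalues)
```

Holm controls the family-wise error rate and is the more conservative of the two, whereas Benjamini–Hochberg controls the false discovery rate, which suits enrichment analyses that test many GO terms at once.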

Supplementary information

Supplementary Information

Supplementary Notes 1 and 2, Figs. 1–20 and Tables 1–18.

Reporting Summary

Supplementary Data 1

The seed LTR sequences of E. rufipilus.

Supplementary Data 2

Estimated insertion times of LTRs in centromeres.

Supplementary Data 3

The total number of methylated cytosines of CG, CHG and CHH context in root, stem and leaf.

Supplementary Data 4

Collinear gene pair IDs in inverted regions of E. rufipilus and Sorghum.

Supplementary Data 5

Enriched GO terms of rapidly expanding gene families.

Supplementary Data 6

Synteny gene pairs on PdCPs in E. rufipilus, rice, sorghum, Miscanthus, S. spontaneum Np-X and S. spontaneum AP85-441.

Supplementary Data 7

NBS genes in E. rufipilus, rice and sorghum.

Supplementary Data 8

P values of CG, CHG and CHH methylation on chromosome 05, chromosome 08 and other chromosomes (t-test).

Supplementary Data 9

Correlation of CG, CHG and CHH methylation with gene expression.

Supplementary Data 10

Highly expressed 24 nt sRNAs at promoters in the examined tissues.

Supplementary Data 11

Annotation of the corresponding selective sweep genes in S. spontaneum.

Supplementary Data 12

Annotation of the corresponding selective sweep genes in S. officinarum.

Source data

Source Data Fig. 1

Unprocessed gel of PCR product for gap filling in Fig. 1b.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Wang, T., Wang, B., Hua, X. et al. A complete gap-free diploid genome in Saccharum complex and the genomic footprints of evolution in the highly polyploid Saccharum genus. Nat. Plants 9, 554–571 (2023). https://doi.org/10.1038/s41477-023-01378-0

Received: 24 September 2022
Accepted: 21 February 2023
Published: 30 March 2023
Issue Date: April 2023
DOI: https://doi.org/10.1038/s41477-023-01378-0
