Introduction

Sugarcane (Saccharum spp.) plays a vital role as a primary sugar-producing crop (sugar 80%) and has major potential as a renewable bioenergy crop (ethanol 50%) in world agriculture1. The Saccharum complex contains six main species: the two wild species are S. spontaneum and S. robustum, and the four cultivated species are S. officinarum, S. sinense, S. barberi and S. edule2. In addition, Erianthus arundinaceus, a species of the genus Erianthus with strong abiotic stress tolerance, could be widely used in modern sugarcane breeding and as a potential bioenergy plant3. Currently, commercial sugarcane breeding populations around the world share a narrow genetic base because of their common origin from a number of popular cultivars, such as POJ2878, Co419 and NCo310, which were developed in the early 1900s2. Furthermore, these exotic varieties were developed through complex interspecific hybridization between S. officinarum and wild clones of S. spontaneum in a breeding process known as nobilization4. Sugarcane breeders remain greatly interested in broadening the genetic base of the crop and in tapping into the gene pool of the wild relatives to enhance stress resistance and sucrose content5.

Since the late 1980s, sugarcane breeders and geneticists have discovered and used several types of DNA molecular markers to improve Saccharum breeding, including amplified fragment length polymorphisms (AFLP), restriction fragment length polymorphisms (RFLP), random amplified polymorphic DNA (RAPD), single nucleotide polymorphisms (SNP), simple sequence repeats (SSR), inter-simple sequence repeats (ISSR), and expressed sequence tag-simple sequence repeats (EST-SSR)6. Among these molecular markers, SSR (microsatellite) markers have been widely used in sugarcane for studies of genetic diversity7, genetic mapping8, cross-transferability9, paternity analysis10, segregation analysis11, and marker-assisted selection12. SSR markers are considered among the most capable markers for plant genetics and breeding programs because of their co-dominant, multi-allelic nature, relative abundance, and excellent genome coverage13.

Early molecular marker research focused on the origin of Saccharum species. Lu et al.14 proposed a hybrid origin for S. barberi and S. sinense from natural hybridization between S. spontaneum and S. officinarum, based on a factorial correspondence analysis of RFLP markers. These results were subsequently supported by Irvine15 and Selvi et al.16 using SSR markers and by D’Hont et al.17 using genomic in situ hybridization (GISH). Based on analysis of agronomic traits and mitochondrial profiles, S. barberi and S. sinense were placed in adjacent clusters, but apart from S. robustum18,19,20. Later, a number of reports focused on the analysis of genetic diversity and population structure among commercial varieties of Saccharum spp. hybrids7,21,22,23,24,25 and among S. spontaneum populations with different ploidy levels in China26. Therefore, there has been increasing interest among sugarcane breeders in investigating the genetic diversity of parental resources and in broadening the genetic base by tapping into the gene pools of the wild relatives27,28,29.

To better understand the genetic background of these euploid sugarcane clones, this study aimed to characterize the genetic diversity and population structure of 115 accessions belonging to S. officinarum, S. spontaneum, S. robustum, S. barberi, S. sinense, E. arundinaceus, and Saccharum spp. hybrids. The results may provide invaluable information for the better utilization of Saccharum and Erianthus wild germplasms at different ploidy levels in sugarcane breeding.

Results

Total alleles amplification of 21 SSR markers

A total of 167 SSR alleles were amplified from the DNA of 115 accessions, including five Saccharum species, E. arundinaceus, and 30 clones of Saccharum spp. hybrids, with the 21 fluorescence-labeled SSR primer pairs and a capillary electrophoresis (CE) detection system. Sixteen SSR alleles reported earlier by Pan30 were not detected in our CE data, whereas 38 new SSR alleles that had never been reported before were found (Table 1). Furthermore, the numbers of new and absent SSR alleles detected in this study were greater than the 20 new and 13 absent SSR alleles reported previously by Ali et al.7.

Table 1 The general utility and amplification profile of 21 SSR primer pairs based on a capillary electrophoresis (CE) detection platform.

The number of alleles detected by the CE system varied from as few as four (SMC36BUQ) to as many as 13 (SMC597CS), with an average of 7.95 per SSR primer pair. Seven SSR primer pairs, namely, SMC24DUQ, SMC31CUQ, SMC336BS, SMC597CS, SMC703BS, mSSCIR3 and mSSCIR43, were highly polymorphic, each producing 10 to 13 alleles. Eleven other SSR primer pairs, namely, SMC119CG, SMC1604SA, SMC1751CL, SMC18SA, SMC22DU, SMC278CS, SMC334BS, SMC7CUQ, SMC851MS, mSSCIR66 and mSSCIR74, were moderately polymorphic, each producing six to nine alleles. The remaining three SSR primer pairs, namely, SMC36BUQ, SMC486CG and SMC569CS, were less polymorphic, each producing fewer than six alleles (Table 1). The PIC values of these primer pairs ranged from 0.80 (SMC36BUQ) to 0.95 (SMC24DUQ, SMC31CUQ, SMC336BS, SMC597CS, SMC703BS, mSSCIR3, mSSCIR43) with an average of 0.92 (Table 1).

The PIC values for each Saccharum species and for E. arundinaceus were also calculated in our study. The maximum PIC value was 0.95 for mSSCIR3 in S. spontaneum and the minimum PIC value was 0.28 for SMC119CG in E. arundinaceus. Generally, the highest PIC values were found in Saccharum spp. hybrids, with an average value of 0.87, followed by S. spontaneum, with an average value of 0.86 (Table 2).

Table 2 Polymorphism information content (PIC) of 21 SSR primer pairs analysed using 115 accessions from Saccharum, Erianthus, and Saccharum spp. hybrids.

Genetic variability

Using the CE detection system, an average of 138 polymorphic SSR bands was observed in each Saccharum or E. arundinaceus species. Among the five species of Saccharum, one species of Erianthus, and Saccharum spp. hybrids, both the highest number of polymorphic loci (NPL) and the highest percentage of polymorphic loci (PPL) were observed in the Saccharum spp. hybrids population (NPL = 165, PPL = 98.8%), followed by S. spontaneum (NPL = 159, PPL = 95.21%), while the lowest number and percentage of polymorphic loci were found in E. arundinaceus (NPL = 93, PPL = 55.69%) (Fig. 1a). The highest number of observed alleles (Na = 1.98) was found in Saccharum spp. hybrids, while the lowest (Na = 1.55) was found in E. arundinaceus (Fig. 1b). Moreover, the highest number of effective alleles (Ne = 1.70) was found in the Saccharum spp. hybrids, followed by S. spontaneum (Ne = 1.64). The lowest number of effective alleles (Ne = 1.30) was observed in E. arundinaceus (Fig. 1c). Shannon’s information index (I) of the different populations ranged from 0.28 (E. arundinaceus) to 0.57 (Saccharum spp. hybrids). Analysis of Shannon’s index showed that Saccharum spp. hybrids and S. spontaneum differed from the other Saccharum species by sharing the highest Shannon’s index value of 0.57. The lowest Shannon’s diversity index value of 0.28 was observed in S. sinense (Fig. 1d). Nei’s gene diversity (h) of the seven populations ranged from 0.21 to 0.39. The higher genetic diversity values of 0.39, 0.36 and 0.34 were observed in the Saccharum spp. hybrids, S. spontaneum and S. officinarum populations, respectively, while the E. arundinaceus and S. barberi populations had the lower genetic diversity values of 0.21 and 0.26 (Fig. 1e).

Figure 1

Statistical analysis of genetic variability among Saccharum, Erianthus and Saccharum spp. hybrids populations based on SSR data. Polymorphism index (PI) (a), Number of observed alleles (Na) (b), Number of effective alleles (Ne) (c), Shannon’s index (I) (d), and Nei’s genetic diversity (h) (e).
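The indices summarized in Fig. 1 can be illustrated with a minimal sketch on a toy presence/absence matrix. It assumes that each SSR band is treated as a two-state locus and that the band frequency is used directly as the allele frequency; this is a simplification, and POPGENE may estimate allele frequencies differently. The function name `diversity_indices` and the toy data are hypothetical.

```python
import math

def diversity_indices(matrix):
    """Summarize a 0/1 band matrix (rows = accessions, columns = loci).

    Assumption: the band-present frequency p is used directly as an
    allele frequency, with q = 1 - p for the band-absent state.
    """
    n_acc = len(matrix)
    n_loci = len(matrix[0])
    npl = 0
    ne_sum = h_sum = i_sum = 0.0
    for j in range(n_loci):
        p = sum(row[j] for row in matrix) / n_acc
        q = 1.0 - p
        if 0.0 < p < 1.0:          # locus is polymorphic in this population
            npl += 1
        hom = p * p + q * q        # expected homozygosity
        ne_sum += 1.0 / hom        # effective number of alleles
        h_sum += 1.0 - hom         # Nei's gene diversity
        i_sum += -sum(f * math.log(f) for f in (p, q) if f > 0)  # Shannon
    return {
        "NPL": npl,
        "PPL": 100.0 * npl / n_loci,
        "Na": 1.0 + npl / n_loci,  # 2 states if polymorphic, else 1
        "Ne": ne_sum / n_loci,
        "h": h_sum / n_loci,
        "I": i_sum / n_loci,
    }

# Toy population: 4 accessions scored at 3 loci (hypothetical data).
toy = [[1, 1, 0], [1, 0, 0], [1, 1, 0], [1, 0, 0]]
print(diversity_indices(toy))
```

Under this convention, the mean number of observed alleles is simply 1 + PPL/100, so Na and PPL carry the same information, whereas Ne, h and I also depend on how evenly the bands are distributed among accessions.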

Principal Component Analysis (PCA)

Principal component analysis (PCA) data for all 115 accessions are shown in Fig. 2. The analysis classified these accessions into eleven groups that corresponded to different Saccharum and Erianthus species to some extent, i.e., Group I-A and I-B (Saccharum spp. hybrids), Group II-A and II-B (S. spontaneum), Group III (S. barberi), Group IV-A and IV-B (S. robustum), Group V-A and V-B (S. sinense), Group VI (S. officinarum), and Group VII (E. arundinaceus) (Fig. 2). The amount of variance accounted for by the global three-dimensional plot is 13.4% for Dim1, 7.12% for Dim2, and 6.51% for Dim3, for a total of 27.03% across the three dimensions. This is an acceptable fit, given the small amount of variability from the large number of accessions and SSR alleles used in the analysis.

Figure 2

Three-dimensional principal component analysis (PCA) plot of Saccharum, Erianthus, and Saccharum spp. hybrids based on SSR data.
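The proportion of variance carried by each dimension can be illustrated with a small, dependency-free sketch. It runs a covariance-based PCA directly on a toy 0/1 band matrix using power iteration with deflation; this is an assumption made for illustration and is not the NTSYS-pc procedure, which started from a similarity matrix. The function name `explained_variance` and the toy data are hypothetical.

```python
import math

def explained_variance(rows, n_components=2, iters=500):
    """Return the share of total variance captured by the leading components."""
    n, m = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(m)]
    x = [[r[j] - means[j] for j in range(m)] for r in rows]
    # Sample covariance matrix of the centered columns.
    cov = [[sum(x[k][i] * x[k][j] for k in range(n)) / (n - 1)
            for j in range(m)] for i in range(m)]
    total = sum(cov[i][i] for i in range(m))
    ratios = []
    for _ in range(n_components):
        v = [1.0 / (i + 1) for i in range(m)]   # deterministic start vector
        for _ in range(iters):
            w = [sum(cov[i][j] * v[j] for j in range(m)) for i in range(m)]
            norm = math.sqrt(sum(t * t for t in w))
            if norm == 0.0:
                break
            v = [t / norm for t in w]
        # Rayleigh quotient gives the eigenvalue for the converged vector.
        lam = sum(v[i] * sum(cov[i][j] * v[j] for j in range(m))
                  for i in range(m))
        ratios.append(lam / total)
        # Deflation: remove the component just found.
        cov = [[cov[i][j] - lam * v[i] * v[j] for j in range(m)]
               for i in range(m)]
    return ratios

# Two perfectly separated toy groups: all variance lies on one axis.
toy = [[1, 1, 0, 0], [1, 1, 0, 0], [0, 0, 1, 1], [0, 0, 1, 1]]
print(explained_variance(toy))
```

With many accessions and alleles, the variance is usually spread over many dimensions, so the first three axes capture only part of the total.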

Phylogenetic analysis

A phylogenetic tree is shown in Fig. 3. Based on phylogenetic analysis, the 115 accessions were clustered at the level of the Saccharum and Erianthus genera into seven major clades, which also corresponded to different Saccharum and Erianthus species to some extent. Clade-I contained 27 accessions from S. officinarum, S. robustum, S. barberi and S. sinense. Clade-II included 16 accessions from S. spontaneum. Clade-III comprised three accessions of S. officinarum, three accessions of S. robustum, and three accessions of S. barberi. Clade-IV and Clade-V held 22 accessions of Saccharum spp. hybrids. Clade-VI clustered 13 accessions of S. robustum and S. spontaneum and five accessions of E. arundinaceus. However, one E. arundinaceus accession, Guizhou 78-I-24 (Earu05), was clustered with six S. spontaneum accessions. Finally, Clade-VII contained eight accessions of Saccharum spp. hybrids, four accessions of S. officinarum, five accessions of S. barberi, and six accessions of S. sinense.

Figure 3

Phylogenetic tree of Saccharum, Erianthus, and Saccharum spp. hybrids based on SSR data. A distance tree was constructed in MEGA 6 using the UPGMA method. Robustness of the nodes of the phylogenetic tree was assessed from 1000 bootstrap replicates, and bootstrap values of >60% are shown.
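The UPGMA clustering named in the legend of Fig. 3 can be sketched in a few lines. The example below is illustrative only: it works on a hypothetical four-taxon distance matrix, does not reproduce the distance model or the bootstrap procedure of MEGA 6, and the function name `upgma` is an assumption.

```python
def upgma(labels, dist):
    """Cluster items by UPGMA (average linkage).

    labels: list of item names; dist: symmetric matrix as a list of lists.
    Returns a nested tuple (left, right, height); leaves are plain strings.
    """
    n = len(labels)
    clusters = {i: (labels[i], 1) for i in range(n)}   # id -> (subtree, size)
    d = {(i, j): dist[i][j] for i in range(n) for j in range(i + 1, n)}
    next_id = n
    while len(clusters) > 1:
        # Join the closest pair of clusters.
        (a, b), d_ab = min(d.items(), key=lambda kv: kv[1])
        tree_a, size_a = clusters.pop(a)
        tree_b, size_b = clusters.pop(b)
        del d[(a, b)]
        # Distance to the new cluster is the size-weighted average.
        for c in clusters:
            d_ac = d.pop((min(a, c), max(a, c)))
            d_bc = d.pop((min(b, c), max(b, c)))
            d[(c, next_id)] = (size_a * d_ac + size_b * d_bc) / (size_a + size_b)
        clusters[next_id] = ((tree_a, tree_b, d_ab / 2), size_a + size_b)
        next_id += 1
    return next(iter(clusters.values()))[0]

# Hypothetical distances among four accessions.
names = ["A", "B", "C", "D"]
matrix = [
    [0, 2, 6, 10],
    [2, 0, 6, 10],
    [6, 6, 0, 10],
    [10, 10, 10, 0],
]
print(upgma(names, matrix))
```

In this toy case A and B join first at height 1, C joins them at height 3, and D joins last at height 5, so the tree is ultrametric, as expected for UPGMA.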

To identify a core set among the 21 SSR primer pairs, we compared phylogenetic trees constructed from the CE data of 21 SSR primer pairs vs 7 SSR primer pairs and of 21 SSR primer pairs vs 6 SSR primer pairs using the Robinson-Foulds distance. Further analysis with Dendextend showed a higher cophenetic correlation coefficient (0.93) between the trees from 21 and 7 SSR primer pairs than between the trees from 21 and 6 SSR primer pairs (0.91). The two phylogenetic trees based on the CE data of 21 SSR primer pairs vs 7 SSR primer pairs are plotted with tanglegrams in Fig. 4.
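The idea behind the Robinson-Foulds comparison can be shown with a short sketch that counts the clades present in one tree but not in the other. It assumes rooted trees written as nested tuples of leaf names, so the value may differ from what the Phangorn package reports for unrooted trees; the function names `clades` and `rf_distance` and the toy trees are hypothetical.

```python
def clades(tree):
    """Return (leaf set, set of clades) for a nested-tuple tree of strings."""
    if isinstance(tree, str):
        return frozenset([tree]), set()
    leaves = frozenset()
    found = set()
    for child in tree:
        child_leaves, child_clades = clades(child)
        leaves |= child_leaves
        found |= child_clades
    found.add(leaves)          # the clade rooted at this internal node
    return leaves, found

def rf_distance(tree1, tree2):
    """Count clades found in exactly one of two rooted trees."""
    leaves1, clades1 = clades(tree1)
    leaves2, clades2 = clades(tree2)
    if leaves1 != leaves2:
        raise ValueError("trees must share the same leaf labels")
    return len(clades1 ^ clades2)

# Two toy topologies that disagree on the innermost pair.
full_tree = ((("A", "B"), "C"), "D")
core_tree = ((("A", "C"), "B"), "D")
print(rf_distance(full_tree, core_tree))  # 2
print(rf_distance(full_tree, full_tree))  # 0
```

A distance of zero means the reduced primer set recovers exactly the same clades as the full set; larger values indicate more topological disagreement.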

Figure 4

Two phylogenetic trees constructed using SSR data derived from 21 SSR primer pairs vs 7 SSR primer pairs with tanglegrams.

Genetic identity analysis

Percent of genetic identity was estimated between and within the seven phylogenetic groups. Percent genetic identity between phylogenetic groups ranged from 26.9% (Saccharum spp. hybrids and S. spontaneum) to 96.4% (E. arundinaceus and S. spontaneum). Percent genetic identity within phylogenetic groups ranged from 38.9% (within S. barberi or Saccharum spp. hybrids) to 100% (within S. robustum) (Table 3).

Table 3 Genetic identity (%) among five Saccharum species, one Erianthus species, and Saccharum spp. hybrids based on SSR data.
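A percent identity value of this kind can be illustrated on “A”/“C”-coded allele profiles. The sketch assumes that identity is the share of positions at which two equally long profiles carry the same character; the exact handling in BioEdit may differ, and the function names and profiles are hypothetical.

```python
def percent_identity(profile_a, profile_b):
    """Percentage of positions at which two coded profiles agree."""
    if len(profile_a) != len(profile_b) or not profile_a:
        raise ValueError("profiles must be non-empty and of equal length")
    matches = sum(x == y for x, y in zip(profile_a, profile_b))
    return 100.0 * matches / len(profile_a)

def identity_matrix(profiles):
    """Pairwise percent identity for a dict of name -> profile."""
    names = list(profiles)
    return {(a, b): percent_identity(profiles[a], profiles[b])
            for a in names for b in names}

# Hypothetical profiles: 'A' = band present, 'C' = band absent.
toy = {"acc1": "AACCA", "acc2": "AACAA", "acc3": "CCAAC"}
result = identity_matrix(toy)
print(result[("acc1", "acc2")])  # 80.0
print(result[("acc1", "acc3")])  # 0.0
```

Note that shared absences count as matches under this definition, which tends to raise identity between accessions that amplify few alleles.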

Discussion

Since the 1950s, wild accessions of Saccharum and Erianthus have been continuously collected on mainland China and maintained in the Sugarcane Germplasm Nurseries in Yacheng, Hainan Province or Kaiyuan, Yunnan Province, China. However, the genetic relationships and molecular identities of the accessions in these two germplasm collections have never been fully examined. Molecular markers are considered to be most effective in analyzing the genetic diversity, population structure, and phylogenetic relationships within sugarcane germplasm31. In recent years, SSR markers have proven very useful for a variety of applications in plants, including linkage mapping, segregation analysis, population structure analysis, marker-assisted selection, marker-assisted backcrossing, assessment of genetic relationships between individuals, mapping of genes of interest, population genetics, and phylogenetic studies32,33.

In this study, we investigated the genetic diversity and population structure of 115 accessions of the Saccharum and Erianthus genera, which originated from two collections on mainland China and a local collection in the USA, using 21 SSR primer pairs. The 21 primer pairs amplified 167 polymorphic SSR alleles detectable by the CE platform, of which 38 alleles have never been reported before. Every primer pair amplified varying numbers of SSR alleles from all accessions tested, regardless of their geographical origins. Seven core SSR primer pairs, namely, SMC24DUQ, SMC31CUQ, SMC336BS, SMC597CS, SMC703BS, mSSCIR3, and mSSCIR43, produced at least ten alleles among the 115 accessions, while four of the seven core primer pairs, namely, SMC31CUQ, SMC336BS, SMC597CS, and mSSCIR3, also amplified more than 10 alleles among 92 Chinese commercial sugarcane varieties7. Therefore, these seven core SSR primer pairs should be given priority when identifying clones from either Saccharum species or Saccharum spp. hybrids.

The number of polymorphic SSR alleles detected in this study was higher than the 144 alleles reported by Pan30 or the 151 alleles reported by Ahmad et al.7, but lower than the 205 polymorphic alleles reported by You et al.25. We considered that the differences were due to different Saccharum clones being used in previous studies or to different scoring criteria. The differences may also be due to the complex genomes of Saccharum and Erianthus on one hand and relatively narrow genetic base of commercial sugarcane varieties on the other hand.

In this study, we observed different levels of genetic variation among the accessions of Saccharum and Erianthus tested. In general, Saccharum spp. hybrids and S. spontaneum accessions had a higher genetic diversity than S. sinense and E. arundinaceus accessions. Notably, the highest number of observed alleles, number of effective alleles and polymorphism index were observed in accessions of Saccharum spp. hybrids, which are polyploid with genome contributions from several Saccharum species. Historically, the modern Saccharum spp. hybrids were developed from crosses between the “Noble” cane S. officinarum and its relatives, namely, S. spontaneum, S. sinense, or S. barberi in the early 20th century34,35. The overall genetic variation values from this study were higher than those reported by You et al.24,25. We hypothesize that this was due to the use of a larger number of SSR primer pairs and a large number of accessions from diverse Saccharum and Erianthus species in our study.

It is worth noting that the 21 SSR primer pairs worked well in clustering clones of Saccharum, Erianthus, and Saccharum spp. hybrids during phylogenetic analysis. Two clones of Saccharum spp. hybrids, R570 (Sspp17) from France and Q124 (Sspp18) from Australia, were clustered into a sub-clade in Clade-VII with four accessions of S. officinarum. The reason could be that R570 and Q124 have a closer affiliation with S. officinarum. In addition, the six accessions of E. arundinaceus were clustered with accessions of S. robustum and S. spontaneum in Clade-VI rather than forming a separate clade. This was because all the 21 SSR primer pairs were designed from the genomic DNA sequences of two cultivars, either Q124 or R57030. Unlike some consensus primers that are able to prime the PCR amplification of plant genomic sequences36, these SSR primer pairs may not amplify Erianthus genomic DNA as efficiently as they amplify the Saccharum genomes. Another reason is that it is now generally accepted that Noble cultivars might have emerged directly from S. robustum. It has also been hypothesized that S. robustum evolved from complex introgressions between S. spontaneum and other genera, particularly Erianthus and Miscanthus, which share a close genetic affiliation37,38. The genetic diversity results from our study were in general conformity with the evolutionary course of the sugarcane cultivars in that the order of contributing species in today’s accessions is S. officinarum, S. spontaneum, S. robustum, S. barberi, S. sinense, E. arundinaceus and Saccharum spp. hybrids. PCA also revealed a similar pattern of phylogeny to some extent.

Today, China holds more than 2,000 accessions of Saccharum and Erianthus, among which some are wild types. These accessions are either native to China or were obtained through foreign introductions. As the sugarcane germplasm collection grows, genetic information on the accessions becomes more critical for the maintenance and utilization strategies designed to establish cross parentages in China’s breeding programs. We conclude that the estimation of genetic diversity and population structure of 115 accessions of the Saccharum and Erianthus genera using SSR primer pairs may provide more accurate information to sugarcane breeders than the classical pedigree method. The 21 SSR primer pairs used in our study may also be of potential value for further research on genetic mapping, segregation analysis, marker-assisted selection, QTL mapping and gene tagging in sugarcane. In addition, further study with consensus PCR primers may be needed to assess the phylogenetic status of the Erianthus genus within the “Saccharum Complex”38.

Materials and Methods

Plant materials

One hundred and fifteen accessions were used in this study, including 12 accessions from S. officinarum, 22 from S. spontaneum, 14 from S. robustum, 17 from S. barberi, 14 from S. sinense, 30 from Saccharum spp. hybrids, and six from E. arundinaceus. The leaf samples of all the clones were collected either from the Sugarcane Germplasm Nursery in Yacheng, Hainan, China or from a local collection at the USDA-ARS, Sugarcane Research Unit, Houma, Louisiana, USA (Table 4). After collection, the leaf samples were wiped with 75% ethanol and kept at −80 °C until DNA extraction.

Table 4 A list of 115 accessions from Saccharum, Erianthus, and Saccharum spp. hybrids.

DNA extraction

Genomic DNA was extracted from leaf tissues using a modified cetyl tri-methyl ammonium bromide (CTAB) method39 as previously described by Ahmad et al.7. The quality and concentration of DNA were measured using UV absorbance assay with a Synergy™ H1 Multi-Mode Reader (BioTek, Winooski, VT, USA) and 0.8% agarose gel electrophoresis with ethidium bromide staining.

SSR markers and SSR reactions

The 21 polymorphic SSR primer pairs from Pan30 were used in this study because of their high PIC values, which exceeded 0.78 in earlier reports7,40,41. All forward primers were labeled with the fluorescent dye 6-carboxy-fluorescein (FAM). PCR amplification was performed under a series of cycling conditions to detect the SSR DNA fingerprints7,30. Capillary electrophoresis (CE) of the PCR products was conducted on an ABI 3730XL DNA Analyzer (Applied Biosystems Inc., Foster City, CA, USA) following the manufacturer’s instructions to generate GeneScan files.

Marker scoring

The GeneScan files were analyzed with the GeneMarker™ software (version 1.80) (SoftGenetics LLC®, State College, PA, USA, www.softgenetics.com) to reveal capillary electropherograms of PCR-amplified SSR DNA fragments. Fragment sizes were computed automatically against the GS500 DNA size standards (Applied Biosystems, Inc., Foster City, CA, USA). SSR alleles were manually assigned to unique, true “Plus-adenine” DNA fingerprints that gave quantifiable fluorescence values. Irregular peaks and stutter peaks were not scored, according to Pan et al.41. Data were scored manually in a binary format into a data matrix file, with the presence of a band scored as “1” or “A” and its absence scored as “0” or “C”. The polymorphism information content (PIC) values were calculated using the formula of Liu et al.23.

$$PIC=1-\sum _{j=1}^{n}\,{P}_{ij}^{2}$$

where Pij is the frequency of the jth allele at the ith locus and the summation extends over the n alleles.
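As a worked illustration of the formula, the sketch below computes PIC for one locus from a list of allele frequencies or counts. The values are normalized to sum to one before squaring, which is an assumption, since the text does not state how band scores were converted to allele frequencies; the function name `pic` and the example values are hypothetical.

```python
def pic(allele_values):
    """PIC = 1 - sum of squared allele frequencies at one locus.

    allele_values may be frequencies or raw counts; they are normalized
    to sum to 1 (an assumption made for this sketch).
    """
    total = sum(allele_values)
    if total <= 0:
        raise ValueError("at least one allele must have a positive value")
    freqs = [v / total for v in allele_values]
    return 1.0 - sum(f * f for f in freqs)

print(pic([0.25, 0.25, 0.25, 0.25]))  # 0.75, four equally frequent alleles
print(pic([1.0]))                     # 0.0, a single fixed allele
print(pic([5, 5]))                    # 0.5, counts are normalized first
```

With normalized frequencies, PIC rises as the number of alleles increases and as their frequencies become more even, which is why highly polymorphic primer pairs score higher.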

Data analysis

The allelic data matrix of “1” or “0” was used for population genetic analysis in POPGENE version 1.3242, including the number of observed alleles (Na) and the number of effective alleles (Ne). Nei’s genetic diversity (h), polymorphism index (PI) and Shannon’s index (I) were computed for each Saccharum and Erianthus population based on the obtained allele frequencies. The allelic data matrix of “A” or “C” was used for phylogenetic analysis. A phylogenetic tree was constructed with MEGA 6 using the UPGMA method with the Maximum Composite Likelihood substitution model43. Robustness of the nodes of the phylogenetic tree was assessed from 1000 bootstrap replicates. To identify core primer pairs among the 21 SSR primer pairs, two additional phylogenetic trees were constructed using SSR data from the six or seven most polymorphic SSR primer pairs. The Robinson-Foulds distances of 21 SSR vs 7 SSR and 21 SSR vs 6 SSR were then determined from the three tree files with the Phangorn package44, and cophenetic correlation coefficients of the topological distance were analyzed with Dendextend45. To better view the comparison between trees, Dendextend was used to plot two trees with tanglegrams. A genetic identity matrix was calculated using BioEdit Sequence Alignment Editor version 7.1.946. Genetic similarity coefficients among Saccharum and Erianthus populations were estimated with the SIMQUAL subprogram using Jaccard’s coefficient, followed by principal component analysis (PCA) with the DICE subprogram as implemented in NTSYS-pc version 2.10e47.
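Jaccard’s coefficient, used here for the similarity estimates, can be illustrated on “1”/“0” band profiles: it is the number of bands shared by two accessions divided by the number of bands present in at least one of them, so shared absences are ignored. The sketch below is a plain-Python illustration, not the SIMQUAL implementation; the function names and toy profiles are hypothetical.

```python
def jaccard(profile_a, profile_b):
    """Jaccard similarity between two equally long 0/1 band profiles."""
    if len(profile_a) != len(profile_b):
        raise ValueError("profiles must have the same number of loci")
    both = sum(1 for x, y in zip(profile_a, profile_b) if x == 1 and y == 1)
    either = sum(1 for x, y in zip(profile_a, profile_b) if x == 1 or y == 1)
    # Undefined when neither profile has any band; 0.0 is returned here
    # by convention (an assumption of this sketch).
    return both / either if either else 0.0

def similarity_matrix(profiles):
    """Pairwise Jaccard similarity for a list of 0/1 profiles."""
    return [[jaccard(a, b) for b in profiles] for a in profiles]

# Hypothetical band profiles for three accessions at five loci.
toy = [
    [1, 1, 0, 1, 0],
    [1, 0, 0, 1, 1],
    [0, 0, 1, 0, 0],
]
for row in similarity_matrix(toy):
    print(row)
```

For the first two toy accessions, two bands are shared and four are present in at least one profile, giving a similarity of 0.5; the third accession shares no bands with either and scores 0.0 against both.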