Genetic diversity and population structure analysis of Saccharum and Erianthus genera using microsatellite (SSR) markers

To understand the genetic diversity and structure within and between the genera Saccharum and Erianthus, 79 accessions from five species (S. officinarum, S. spontaneum, S. robustum, S. barberi, S. sinense), six accessions of E. arundinaceus, and 30 Saccharum spp. hybrids were analyzed using 21 pairs of fluorescence-labeled, highly polymorphic SSR primers and a capillary electrophoresis (CE) detection system. A total of 167 polymorphic SSR alleles were identified by CE, with a mean polymorphic information content (PIC) of 0.92. Genetic diversity parameters among these 115 accessions revealed that Saccharum spp. hybrids were more diverse than the Saccharum and Erianthus species. Based on the SSR data, the 115 accessions were classified into seven main phylogenetic groups, which corresponded to the Saccharum and Erianthus genera through phylogenetic analysis and principal component analysis (PCA). We propose that seven core SSR primer pairs, namely, SMC31CUQ, SMC336BS, SMC597CS, SMC703BS, SMC24DUQ, mSSCIR3, and mSSCIR43, may have wide applicability in genotype identification of Saccharum species and Saccharum spp. hybrids. Thus, the information from this study contributes to the management of sugarcane genetic resources.

Early molecular marker research focused on the origin of wild Saccharum species. Lu et al. 14 proposed a hybrid origin for S. barberi and S. sinense from natural hybridization between S. spontaneum and S. officinarum, based on a factorial correspondence analysis of RFLP markers. Subsequently, these results were supported by Irvine 15 and Selvi et al. 16 using SSR markers and by D'Hont et al. 17 using genomic in situ hybridization (GISH). Based on analysis of agronomic traits and mitochondrial profiles, S. barberi and S. sinense were placed in adjacent clusters, but apart from S. robustum [18][19][20] . Later, a number of reports focused on the analysis of genetic diversity and population structure among commercial Saccharum spp. hybrids varieties 7,21-25 and among S. spontaneum populations with different ploidy levels in China 26 . Therefore, there has been increasing interest among sugarcane breeders in investigating the genetic diversity of parental resources and in broadening the genetic base by tapping into the gene pools of the wild relatives [27][28][29] .
To better understand the genetic background of these euploid sugarcane clones, this study aimed to characterize the genetic diversity and population structure of 115 accessions belonging to S. officinarum, S. spontaneum, S. robustum, S. barberi, S. sinense, E. arundinaceus, and Saccharum spp. hybrids. The results may provide invaluable information for the better utilization of Saccharum and Erianthus wild germplasms at different ploidy levels in sugarcane breeding.

Results
Total alleles amplification of 21 SSR markers. A total of 167 SSR alleles were amplified from the DNA of 115 accessions including five Saccharum species, E. arundinaceus, and 30 clones of Saccharum spp. hybrids with the 21 fluorescence-labeled SSR primer pairs and capillary electrophoresis (CE) detection system. We could not find in our CE data the 16 SSR alleles reported earlier by Pan 30 , but instead, we have found 38 new SSR alleles that were never reported before (Table 1). Furthermore, the numbers of new and absent SSR alleles detected in this study were greater than the 20 new and 13 absent SSR alleles reported previously by Ali et al. 7 .
The number of alleles detected by the CE system varied from as few as four (SMC36BUQ) to as many as 13 (SMC597CS), with an average of 7.95 per SSR primer pair. Seven SSR primer pairs, namely, SMC24DUQ, SMC31CUQ, SMC336BS, SMC597CS, SMC703BS, mSSCIR3, and mSSCIR43, were highly polymorphic, each producing 10 to 13 alleles. Eleven other SSR primer pairs, namely, SMC119CG, SMC1604SA, SMC1751CL, SMC18SA, SMC22DU, SMC278CS, SMC334BS, SMC7CUQ, SMC851MS, mSSCIR66, and mSSCIR74, were moderately polymorphic, each producing six to nine alleles. The remaining three SSR primer pairs, namely, SMC36BUQ, SMC486CG, and SMC569CS, were less polymorphic, each producing fewer than six alleles (Table 1). The PIC values of these primer pairs ranged from 0.80 (SMC36BUQ) to 0.95 (SMC24DUQ, SMC31CUQ, SMC336BS, SMC597CS, SMC703BS, mSSCIR3, mSSCIR43), with an average of 0.92 (Table 1). The PIC values for each Saccharum species and for E. arundinaceus were also calculated. The maximum PIC value was 0.95 for mSSCIR3 in S. spontaneum, and the minimum PIC value was 0.28 for SMC119CG in E. arundinaceus. Generally, higher PIC values were found in Saccharum spp. hybrids, with an average value of 0.87, followed by S. spontaneum with an average PIC value of 0.86 (Table 2).
Genetic variability. Using the CE detection system, an average of 138 polymorphic SSR bands was observed in each Saccharum or E. arundinaceus species. Among the five Saccharum species, E. arundinaceus, and Saccharum spp. hybrids, both the highest number of polymorphic loci (NPL) and the highest percentage of polymorphic loci (PPL) were observed in the Saccharum spp. hybrids population (NPL = 165, PPL = 98.8%), followed by S. spontaneum (NPL = 159, PPL = 95.21%), while the lowest number and percentage of polymorphic loci were found in E. arundinaceus (NPL = 93, PPL = 55.69%) (Fig. 1a). The highest number of observed alleles (Na = 1.98) was found in Saccharum spp. hybrids, while the lowest (Na = 1.55) was found in E. arundinaceus (Fig. 1b). Moreover, the highest number of effective alleles (Ne = 1.70) was found in Saccharum spp. hybrids, followed by S. spontaneum (Ne = 1.64). The lowest number of effective alleles (Ne = 1.30) was observed in E. arundinaceus (Fig. 1c). Shannon's index (I) of the different populations ranged from 0.28 (E. arundinaceus) to 0.57 (Saccharum spp. hybrids). Saccharum spp. hybrids and S. spontaneum differed from the other Saccharum species by sharing the highest Shannon's index value of 0.57. The lowest Shannon's index value of 0.28 was observed in S. sinense (Fig. 1d).
In the global three-dimensional PCA plot (Fig. 2), Dim1, Dim2, and Dim3 accounted for 13.4%, 7.12%, and 6.51% of the variance, respectively, for a total of 27.03% across the three dimensions. This is an acceptable fit, given the small amount of variability from the large number of accessions and SSR alleles used in the analysis.
Phylogenetic analysis. A phylogenetic tree is shown in Fig. 3. Based on phylogenetic analysis, the 115 accessions were clearly clustered at the Saccharum and Erianthus genus level into seven major clades, which also reflected the different Saccharum and Erianthus species to some extent. Clade-I contained 27 accessions from S. officinarum, S. robustum, S. barberi, and S. sinense. Clade-II included 16 accessions from S. spontaneum. Clade-III comprised three accessions of S. officinarum, three accessions of S. robustum, and three accessions of S. barberi. Clade-IV [...]
To verify a set of core SSR primer pairs among the 21 primer pairs, we compared phylogenetic trees constructed from the CE data of 21 vs 7 SSR primer pairs and of 21 vs 6 SSR primer pairs using the Robinson-Foulds distance. Further analysis with Dendextend showed a higher cophenetic correlation coefficient (0.93) between the trees from 21 and 7 SSR primer pairs than between the trees from 21 and 6 SSR primer pairs (0.91). The two phylogenetic trees based on the CE data of 21 vs 7 SSR primer pairs are plotted as tanglegrams in Fig. 4.
Genetic identity analysis. Percent genetic identity was estimated between and within the seven phylogenetic groups. Percent genetic identity between phylogenetic groups ranged from 26.9% (Saccharum spp. hybrids and S. spontaneum) to 96.4% (E. arundinaceus and S. spontaneum). Percent genetic identity within phylogenetic groups ranged from 38.9% (within S. barberi or Saccharum spp. hybrids) to 100% (within S. robustum) (Table 3).
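As an illustration of how a percent identity value of this kind can be obtained from the "A"/"C" allele profiles described in Materials and Methods, the following is a minimal Python sketch. It assumes that identity is simply the share of allele positions at which two equal-length profiles carry the same character; the function names and the three example profiles are hypothetical and are not taken from the study data or from BioEdit.

```python
from typing import Dict


def percent_identity(profile_a: str, profile_b: str) -> float:
    """Percent of positions where two equal-length 'A'/'C' profiles match.

    Assumption: identity = 100 * matching positions / total positions.
    """
    if len(profile_a) != len(profile_b):
        raise ValueError("profiles must cover the same set of alleles")
    if not profile_a:
        raise ValueError("profiles must not be empty")
    matches = sum(1 for x, y in zip(profile_a, profile_b) if x == y)
    return 100.0 * matches / len(profile_a)


def identity_matrix(profiles: Dict[str, str]) -> Dict[str, Dict[str, float]]:
    """Pairwise percent identity for every pair of named profiles."""
    names = list(profiles)
    return {
        a: {b: percent_identity(profiles[a], profiles[b]) for b in names}
        for a in names
    }


# Hypothetical 8-allele profiles ('A' = band present, 'C' = band absent).
example = {
    "acc1": "AACCACAA",
    "acc2": "AACAACCA",
    "acc3": "CCAACACC",
}
matrix = identity_matrix(example)
print(matrix["acc1"]["acc2"])  # 6 of 8 positions match
```

Group-level values such as those in Table 3 would then be summaries of these pairwise values within or between groups; how the study aggregated them is not stated here.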

Discussion
Since the 1950s, wild accessions of Saccharum and Erianthus have been continuously collected in mainland China and maintained in the Sugarcane Germplasm Nurseries in Yacheng, Hainan Province or Kaiyuan, Yunnan Province, China. However, the genetic relationships between these two germplasm collections and the molecular identification of their accessions have never been fully examined. Molecular markers are considered to be most effective in analyzing the genetic diversity, population structure, and phylogenetic relationships within sugarcane germplasm 31 .
In recent years, SSR markers have proven very useful for a variety of applications in plants, including linkage map analysis, segregation analysis, population structure analysis, marker-assisted selection and backcrossing, assessment of genetic relationships between individuals, mapping of genes of interest, and population genetic and phylogenetic studies 32,33 .
In this study, we investigated the genetic diversity and population structure of 115 accessions of the Saccharum and Erianthus genera that originated from two collections in mainland China and a local collection in the USA, using 21 SSR primer pairs. The 21 primer pairs primed the amplification of 167 polymorphic SSR alleles detectable by the CE platform, of which 38 alleles have never been reported before. Every primer pair was able to amplify varying numbers of SSR alleles from all accessions tested, regardless of their geographical origins. Seven core SSR primer pairs, namely, SMC24DUQ, SMC31CUQ, SMC336BS, SMC597CS, SMC703BS, mSSCIR3, and mSSCIR43, produced ten or more alleles among the 115 accessions, while four of the seven core primer pairs, namely, SMC31CUQ, SMC336BS, SMC597CS, and mSSCIR3, also primed the amplification of more than 10 alleles among 92 Chinese commercial sugarcane varieties 7 . Therefore, these seven core SSR primer pairs should be given priority when identifying clones from either Saccharum species or Saccharum spp. hybrids.
The number of polymorphic SSR alleles detected in this study was higher than the 144 alleles reported by Pan 30 or the 151 alleles reported by Ahmad et al. 7 , but lower than the 205 polymorphic alleles reported by You et al. 25 . We attribute these differences to the different Saccharum clones used in previous studies or to different scoring criteria. The differences may also reflect the complex genomes of Saccharum and Erianthus on the one hand and the relatively narrow genetic base of commercial sugarcane varieties on the other.
In this study, we observed different levels of genetic variation among the Saccharum and Erianthus accessions tested. In general, Saccharum spp. hybrids and S. spontaneum accessions had a higher genetic diversity than S. sinense and E. arundinaceus accessions. However, the highest number of observed alleles, number of effective alleles, and polymorphism index were observed in accessions of Saccharum spp. hybrids, which are polyploid with genome contributions from several Saccharum species. Historically, the modern Saccharum spp. hybrids were developed in the early 20th century from crosses between the "Noble" cane S. officinarum and its relatives, namely, S. spontaneum, S. sinense, or S. barberi 34,35 . The overall genetic variation values from this study were higher than those reported by You et al. 24,25 . We hypothesize that this was due to the larger number of SSR primer pairs and the large number of accessions from diverse Saccharum and Erianthus species used in our study. It is worth noting that the 21 SSR primer pairs worked well in clustering Saccharum, Erianthus, and Saccharum spp. hybrids clones during phylogenetic analysis. Two Saccharum spp. hybrids clones, R570 (Sspp17) from France and Q124 (Sspp18) from Australia, were clustered into a sub-clade in Clade-VII with four accessions of S. officinarum. The reason could be that R570 and Q124 have a closer affiliation with S. officinarum. In addition, the six accessions of E. arundinaceus were clustered with accessions of S. robustum and S. spontaneum in Clade-VI rather than forming a separate clade. One reason is that all 21 SSR primer pairs were designed from the genomic DNA sequences of two cultivars, either Q124 or R570 30 .
Unlike some consensus primers that are able to prime the PCR amplification of plant genomic sequences 36 , these SSR primer pairs may not amplify Erianthus genomic DNA as efficiently as they amplify Saccharum genomic DNA. Another possible reason is that Noble cultivars are now generally thought to have emerged directly from S. robustum. It has also been hypothesized that S. robustum evolved from complex introgressions between S. spontaneum and other genera, particularly Erianthus and Miscanthus, which share a close genetic affiliation 37,38 . The genetic diversity results from our study were in general conformity with the evolutionary course of the sugarcane cultivars in that the order of contributing species in today's accessions is S. officinarum, S. spontaneum, S. robustum, S. barberi, S. sinense, E. arundinaceus and Saccharum spp. hybrids. PCA also revealed a similar pattern of phylogeny to some extent.
Today, China holds more than 2,000 accessions of Saccharum and Erianthus, among which some are wild types. These accessions either originated in China or were introduced from abroad. As the size of the sugarcane germplasm collection grows, the genetic information among accessions becomes more critical for the maintenance and utilization strategies designed to establish cross parentages in China's breeding programs. We conclude that the estimation of genetic diversity and population structure of 115 accessions of the Saccharum and Erianthus genera using SSR primer pairs may provide more accurate information to sugarcane breeders than the classical pedigree method. The 21 SSR primer pairs used in our study may also be of potential value for further research on genetic mapping, segregation analysis, marker-assisted selection, QTL mapping, and gene tagging in sugarcane. In addition, further study with consensus PCR primers may be needed to assess the phylogenetic status of the Erianthus genus within the "Saccharum Complex" 38 .

Materials and Methods
Plant materials. One hundred and fifteen accessions were used in this study, including 12 accessions from S. officinarum, 22 from S. spontaneum, 14 from S. robustum, 17 from S. barberi, 14 from S. sinense, 30 from Saccharum spp. hybrids, and six from E. arundinaceus. The leaf samples of all the clones were collected either from the Sugarcane Germplasm Nursery in Yacheng, Hainan, China or from a local collection at the USDA-ARS, Sugarcane Research Unit, Houma, Louisiana, USA (Table 4). The leaf samples were wiped with 75% ethanol and kept at −80 °C until DNA extraction.
DNA extraction. Genomic DNA was extracted from leaf tissues using a modified cetyl tri-methyl ammonium bromide (CTAB) method 39 as previously described by Ahmad et al. 7 . The quality and concentration of DNA were measured using a UV absorbance assay with a Synergy™ H1 Multi-Mode Reader (BioTek, Winooski, VT, USA) and 0.8% agarose gel electrophoresis with ethidium bromide staining.
SSR markers and SSR reactions. The 21 polymorphic SSR primer pairs from Pan 30 were used in this study based on their high PIC values of greater than 0.78 7,40,41 . All forward primers were labeled with the fluorescent dye 6-carboxy-fluorescein (FAM). PCR was run through a series of cycling conditions to detect the SSR DNA fingerprints 7,30 . Capillary electrophoresis (CE) of the PCR products was conducted on an ABI 3730XL DNA Analyzer (Applied Biosystems Inc., Foster City, CA, USA) following the manufacturer's instructions to generate GeneScan files.
Marker scoring. The GeneScan files were analyzed with the GeneMarker™ software (version 1.80) (SoftGenetics LLC®, State College, PA, USA, www.softgenetics.com) to reveal capillary electropherograms of PCR-amplified SSR DNA fragments. Fragment sizes were computed automatically against the GS500 DNA size standards (Applied Biosystems, Inc., Foster City, CA, USA). SSR alleles were manually assigned to unique, true "Plus-adenine" DNA fingerprints that gave quantifiable fluorescence values. Irregular peaks and stutter peaks were not scored, according to Pan et al. 41 . Data were scored manually in a binary format into a data matrix file, with the presence of a band scored as "1" or "A" and its absence scored as "0" or "C". The polymorphism information content (PIC) values were calculated using the formula of Liu et al. 23 , where P ij is the frequency of the j th allele for the i th locus and the summation extends over n alleles.
Data analysis. The allelic data matrix of "1" or "0" was used for population genetic analysis in POPGENE version 1.32 42 , including the number of observed alleles (Na) and the number of effective alleles (Ne). Nei's genetic diversity (h), polymorphism index (PI), and Shannon's index (I) were computed for each Saccharum and Erianthus population based on the obtained allele frequencies. The allelic data matrix of "A" or "C" was used to perform phylogenetic analysis. A phylogenetic tree was constructed in MEGA 6 using the UPGMA method with the Maximum Composite Likelihood substitution model 43 . The robustness of the tree nodes was assessed with 1000 bootstrap replicates. To identify core primer pairs among the 21 SSR primer pairs, two other phylogenetic trees were constructed using SSR data from six or seven highly polymorphic SSR primer pairs. The three tree files were then compared: Robinson-Foulds distances for 21 vs 7 and 21 vs 6 SSR primer pairs were determined with the Phangorn package 44 , and cophenetic correlation coefficients of the topological distances were analyzed with Dendextend 45 . To better visualize the comparison between trees, Dendextend was used to plot two trees as tanglegrams. A genetic identity matrix was calculated using BioEdit Sequence Alignment Editor version 7.1.9 46 . Genetic similarity coefficients among Saccharum and Erianthus populations were estimated with the SIMQUAL subprogram using Jaccard's coefficient, followed by principal component analysis (PCA) with the DICE subprogram as implemented in NTSYS-pc version 2.10e 47 .
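Two of the quantities named above can be sketched in a few lines of Python. The Jaccard coefficient is computed between two "1"/"0" band profiles as shared bands divided by bands present in either profile. For PIC, the sketch assumes the widely used form PIC_i = 1 − Σ_j P_ij², which is consistent with the definitions given under Marker scoring but is an assumption, since the formula itself is not reproduced in the text. The function names and example values are hypothetical, and this is not the NTSYS-pc or POPGENE implementation.

```python
from typing import Sequence


def jaccard_similarity(a: Sequence[int], b: Sequence[int]) -> float:
    """Jaccard coefficient between two 0/1 band profiles of equal length.

    Shared absences (0/0) are ignored, as in the usual Jaccard definition.
    """
    if len(a) != len(b):
        raise ValueError("profiles must cover the same set of alleles")
    shared = sum(1 for x, y in zip(a, b) if x == 1 and y == 1)
    either = sum(1 for x, y in zip(a, b) if x == 1 or y == 1)
    # Assumption: two profiles without any band are given similarity 0.0.
    return shared / either if either else 0.0


def pic(allele_frequencies: Sequence[float]) -> float:
    """PIC for one locus, assuming PIC = 1 - sum of squared allele frequencies."""
    return 1.0 - sum(p * p for p in allele_frequencies)


# Hypothetical band profiles for two accessions scored at five alleles.
acc1 = [1, 1, 0, 1, 0]
acc2 = [1, 0, 0, 1, 1]
print(jaccard_similarity(acc1, acc2))  # 2 shared bands / 4 bands in either

# Hypothetical locus with four equally frequent alleles.
print(pic([0.25, 0.25, 0.25, 0.25]))
```

Under the assumed PIC form, a locus with more alleles at more even frequencies yields a higher value, which matches the pattern reported for the highly polymorphic primer pairs.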