Cross-species transferability of EST-SSR markers developed from the transcriptome of Melilotus and their application to population genetics research

Melilotus is one of the most important legume forages, but the lack of molecular markers has limited the development and utilization of Melilotus germplasm resources. In the present study, 151 M clean reads were generated from various genotypes of Melilotus albus using Illumina sequencing. A total of 19,263 potential EST-SSRs were identified from 104,358 unigene sequences. Moreover, 18,182 primer pairs were successfully designed, and 550 primer pairs were selected using the criteria of base repeat type, fragment length and annealing temperature. These 550 primer pairs were screened by PCR amplification and used to assess polymorphism in 15 M. albus accessions. A total of 114 primer pairs were found to be highly polymorphic, with an average polymorphism information content (PIC) value of 0.79. Furthermore, those 114 polymorphic primer pairs were used to evaluate transferability to 18 species of the genus Melilotus, and 70 EST-SSR markers were found to be transferable among the 18 Melilotus species. According to the UPGMA dendrogram and STRUCTURE analysis, the 18 Melilotus species were classified into three clusters. This study offers a valuable resource for genetic diversity studies and molecular-assisted breeding of germplasm resources in the genus Melilotus.

EST-SSRs are a type of molecular marker based on expressed sequence tags. Compared with genomic SSRs, EST-SSRs have a higher level of transferability across related species because they originate from the transcribed regions of genomes and their sequences are conserved among homologous genes 22 . At present, EST-SSRs are widely used in plant genomics research, such as genetic map construction, comparative mapping, genetic diversity evaluation, germplasm identification, and phylogenetic and evolutionary studies [23][24][25] .
In this study, we aimed to (1) develop EST-SSR markers for M. albus using the Illumina HiSeq 2000 sequencing platform, (2) screen the 550 primer pairs, selected on the basis of base repeat type, fragment length and annealing temperature, by PCR and PAGE in fifteen accessions of M. albus, (3) detect transferable and polymorphic EST-SSR markers for the genus Melilotus, and (4) reveal the population structure of Melilotus species.

Development, screening, and polymorphism of EST-SSR.
In the initial screen, the 550 EST-SSR primer pairs selected on the basis of base repeat type, fragment length and annealing temperature were tested using the genomic DNA of fifteen accessions of M. albus. A total of 351 primer pairs generated amplification products, while the remaining 199 failed to produce detectable PCR products at multiple annealing temperatures. Of the 351 primer pairs that amplified, 290 yielded clear products of the expected size, while the remaining 61 amplified bands that were larger or smaller than the expected fragment size. Of the 290 primer pairs that amplified the expected fragment size, 206 showed polymorphism, while the remaining 84 did not (Table S2). The 290 EST-SSR primer pairs were then screened further for polymorphism using sixty individuals from the fifteen M. albus accessions. In total, 114 polymorphic primer pairs were selected and used for transferability analysis in the genus Melilotus (Fig. 1, Table S3). Furthermore, to verify the accuracy and
Genetic diversity analysis and population structure analysis. As shown in Table 3 (Fig. S1). Cluster I contained five germplasms: PI 478773, PI 508617, PI 342796, PI 662296 and PI 478468. Cluster II contained PI 342765, PI 494706, ZXY06P-1732, Zhongxu-1226 and ZXY07P-3150, and the remaining five accessions formed Cluster III. The cluster analysis showed that the individual plants of each germplasm clustered together, with genetic similarity coefficients ranging from 0.79 to 0.98. These high similarity coefficients indicate that the 15 germplasms are closely related.
According to the UPGMA dendrogram (Fig. 3 The genetic structure of 54 individual plants belonging to 18 Melilotus species was analysed with the EST-SSR markers using the STRUCTURE software, which was run for K = 2-8. The optimal number of groups was three, based on maximum likelihood and delta K (ΔK) values (Fig. 4).
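As an illustration of how a ΔK value can be obtained from replicate STRUCTURE runs, the sketch below uses the commonly applied form ΔK = |L(K+1) − 2L(K) + L(K−1)| / sd(L(K)), where L(K) is the mean log-likelihood over replicates at a given K. It is a minimal sketch, not the procedure used in this study: the function name `delta_k` is hypothetical and the log-likelihood values are made-up toy numbers, not results from this work.

```python
from statistics import mean, stdev

def delta_k(lnp_by_k):
    """Compute delta K for every K that has both neighbours K-1 and K+1.

    lnp_by_k maps K to a list of log-likelihoods from replicate runs
    (at least two replicates per K are needed for the standard deviation).
    """
    means = {k: mean(values) for k, values in lnp_by_k.items()}
    result = {}
    for k in sorted(lnp_by_k):
        if k - 1 in means and k + 1 in means:
            # Absolute second-order difference of the mean log-likelihood.
            second_diff = abs(means[k + 1] - 2 * means[k] + means[k - 1])
            sd = stdev(lnp_by_k[k])
            # Identical replicates give sd = 0; report infinity in that case.
            result[k] = second_diff / sd if sd > 0 else float("inf")
    return result

# Toy, made-up log-likelihoods for illustration only (two replicates per K).
toy_runs = {
    2: [-1000.0, -1002.0],
    3: [-900.0, -902.0],
    4: [-890.0, -894.0],
}
toy_delta_k = delta_k(toy_runs)
```

With runs for K = 2-8, as in the text, ΔK is defined only for K = 3-7, because the first and last K lack one neighbour.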

Discussion
SSR markers are well known and widely used for genetic diversity analysis, germplasm identification, comparative genetics, phylogenetic analysis, QTL analysis, linkage mapping and marker-assisted selection 26,27 . Developing SSR markers from the M. albus transcriptome is therefore useful for genetic studies and breeding applications. In this study, a total of 19,263 potential EST-SSRs were identified from 104,358 Melilotus unigenes, revealing that the abundance of SSRs in Melilotus ESTs was higher than in alfalfa 28 , Medicago truncatula 29 , and cupuassu tree 30 . In total, primer pairs were designed for 18,182 of the 19,263 potential EST-SSRs. The study did not analyse mononucleotide repeats because they are difficult to distinguish from polyadenylation products and from single-nucleotide stretch errors produced by sequencing 28 . Here, di- and tri-nucleotide repeats were the most abundant repeats in Melilotus, as in most plants examined in previous studies. Dinucleotide repeats were found to be the most abundant repeat motif in pistachio 21 , tea 31 , rubber tree 32 and Neottopteris nidus 33 . Trinucleotide repeats were the most abundant repeat motif in bread wheat 34 , Siberian wildrye 35 , alfalfa 28 and castor bean 36 . Fifteen M. albus accessions were used to screen the 550 primer pairs by PCR amplification, and clear bands were detected with 351 pairs. This was lower than for Siberian wildrye 35 but higher than for alfalfa 28 and rubber tree 32 . Among the 351 primer pairs that produced amplification, 290 yielded clear products of the expected size, while the remaining 61 amplified bands that were larger or smaller than expected. These deviations may be due to large insertions, variation in the repeat number, lack of specificity, assembly error or the presence of introns 37,38 .
Of those 290 primer pairs, 114 were polymorphic among 60 individuals of 15 M. albus accessions. In the tested accessions, the percentage of polymorphic loci was higher than that reported for Siberian wildrye 35,39 . The high level of polymorphism observed may be due to the M. albus materials selected for screening the primers.
In this study, the 114 newly developed EST-SSR markers were used to evaluate transferability to 18 species of the genus Melilotus. In total, 70 of the 114 primer pairs amplified successfully in all species and showed stable transferability (61.4%), which is higher than the rates obtained in Siberian wildrye (49.11%) 35 and Cucumis (12.7%) 40 . The high rate of transferability of the EST-SSRs may be because all genotypes belonged to the genus Melilotus and because EST-SSR markers are derived from transcribed regions that are conserved across species. Additionally, EST-SSRs have higher transferability than SSRs from untranscribed regions 41    previous studies in Melilotus 7 . This may be because the expressed sequences, from which EST-SSRs are derived, are highly conserved.
EST-SSR markers detected a lower rate of polymorphism than the genomic SSRs 42,43 . However, the average values of H E and PIC were higher than in previous studies in Elymus 35 , alfalfa (Medicago sativa L.) 28 and millet (Setaria italica L.) 44 . The value of H O , by contrast, was low, with an average of 0.11. Therefore, to analyse the genetic diversity and promote genetic breeding programs in future studies, it is necessary to improve the heterozygosity of Melilotus accessions 7 .
As shown in the UPGMA results, M. albus, M. altissimus, M. dentatus, M. elegans, M. hirsutus, M. officinalis, M. polonicus, M. tauricus, M. wolgicus and M. suaveolens were clustered into a single group, which is consistent with the results of previous studies 13 . However, in a previous study, M. dentatus and M. tauricus were not clustered into the same group as the other species 7 . Among the 40 Melilotus germplasms, the clustering pattern was only weakly associated with geographical distribution. This result may be due to the small number of accessions from each geographical location used in this study. Similar results have been reported in alfalfa 45 and drumstick 46 . These results suggest that genetic distance cannot be the only criterion for the genetic differentiation of populations. Moreover, the genetic structure analysis revealed some species showing admixture between group I and group II, while species in group III showed less admixture. Therefore, it is important to use more EST-SSR loci and more individual plants to reveal the relationships among Melilotus species in future studies.
In the present study, we developed a large number of EST-SSR markers for Melilotus from transcriptome data. A total of 104,358 unigenes were generated, and 19,263 EST-SSRs were identified. For these EST-SSRs, 18,182 primer pairs were successfully designed, providing an important foundation for molecular marker development in Melilotus. Of these, 550 were selected on the basis of base repeat type, fragment length and annealing temperature for further validation. A total of 114 primer pairs detected high polymorphism among 15 accessions of M. albus. In addition, 70 of the 114 polymorphic primer pairs were transferable among 18 Melilotus species. The results suggest that these 70 primer pairs will be useful in future studies of Melilotus population structure, genetic diversity, molecular-assisted selection, QTL analysis, and evaluation of germplasm accessions. This study offers a valuable resource for genetic diversity studies and molecular-assisted breeding of germplasm resources in the genus Melilotus.
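The screening counts reported in this study can be read as a funnel, and the step-to-step retention rates follow by simple division. The short sketch below recomputes them from the reported counts; the labels and the helper name `step_rates` are illustrative only. The last step reproduces the 61.4% transferability rate quoted above.

```python
# Primer-pair counts as reported in the text, in screening order.
funnel = [
    ("selected for screening", 550),
    ("amplified a product", 351),
    ("expected fragment size", 290),
    ("polymorphic across 60 individuals", 114),
    ("transferable to all 18 species", 70),
]

def step_rates(steps):
    """Return (label, count, percent retained from the previous step)."""
    return [
        (label, n, round(100 * n / prev_n, 1))
        for (_, prev_n), (label, n) in zip(steps, steps[1:])
    ]

rates = step_rates(funnel)
for label, n, pct in rates:
    print(f"{label}: {n} primer pairs ({pct}% of previous step)")
```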

Materials and Methods
Plant materials and DNA extraction. Seeds from Melilotus species were obtained from the National Gene Bank of Forage Germplasm (NGBFG, China) and the National Plant Germplasm System (NPGS, USA), as summarized in Tables 4 and 5. Fifteen accessions of M. albus, each represented by four individual plants, were used to screen EST-SSR markers for polymorphism and to assess genetic diversity (Table 4). Forty accessions of eighteen species in the genus Melilotus were collected to evaluate the transferability of the newly developed EST-SSR markers to related species and to analyse genetic diversity among Melilotus species (Table 5); each species was represented by three individual plants from different accessions. Genomic DNA was extracted from young leaves using the sodium dodecyl sulfate (SDS) method 47 . The extracted DNA was checked by agarose gel electrophoresis. The samples were diluted with ddH2O to 50 ng/μL and stored at −20 °C.
Detection of the EST-SSR markers and primer design. SSRs were detected in the assembled unigenes using the Simple Sequence Repeat Identification Tool program (MicroSatellite). SSRs were defined as mono-, di-, tri-, tetra-, penta-, and hexa-nucleotide repeats with minimum repeat numbers of ten, six, five, five and five, respectively. The EST-SSR primers were designed using BatchPrimer3 and synthesized by Shanghai Sangon Biological Engineering Technology (Shanghai, China).
Primer selection and PCR conditions. A total of 550 primer pairs were selected according to base repeat type (excluding mononucleotide repeats), fragment length (150-200 bp) and annealing temperature (55-60 °C). PCR and electrophoresis were performed to screen the EST-SSR primer pairs for polymorphism in Melilotus species. PCR amplifications were performed in a final volume of 10 μL containing 1 μL genomic DNA (50 ng/μL), 4.9 μL 2 × Reaction Mix (dNTPs at 500 μM each, 20 mM Tris-HCl, 100 mM KCl, 3 mM MgCl2), 0.1 μL 2.5 U/μL Golden DNA Polymerase, 1.0 μL of each primer (4 μM each) and 2.0 μL double-distilled water. PCR cycling conditions were 3 min at 94 °C, followed by 35 cycles of 30 s at 94 °C, 30 s at the annealing temperature and 30 s at 72 °C, and a final extension step of 7 min at 72 °C. The PCR products were separated by electrophoresis on 6.0% non-denaturing polyacrylamide gels and visualized by silver staining. The DL500 DNA marker was used to determine the sizes of the PCR products.
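The three selection criteria above amount to a simple filter over candidate primer pairs. The sketch below shows one way to express it; the `PrimerCandidate` record, its field names and the toy candidates are hypothetical and are not taken from the study's primer tables.

```python
from dataclasses import dataclass

@dataclass
class PrimerCandidate:
    name: str              # hypothetical identifier
    motif: str             # repeat unit of the target SSR, e.g. "AG"
    product_size: int      # expected amplicon length in bp
    annealing_temp: float  # annealing temperature in degrees Celsius

def passes_selection(p, size_range=(150, 200), temp_range=(55.0, 60.0)):
    """Apply the three criteria: repeat type, fragment length, annealing temperature."""
    return (
        len(p.motif) >= 2  # exclude mononucleotide repeats
        and size_range[0] <= p.product_size <= size_range[1]
        and temp_range[0] <= p.annealing_temp <= temp_range[1]
    )

# Made-up candidates for illustration only.
candidates = [
    PrimerCandidate("toy_1", "AG", 180, 57.5),   # passes all three criteria
    PrimerCandidate("toy_2", "A", 170, 58.0),    # mononucleotide repeat
    PrimerCandidate("toy_3", "AAG", 240, 56.0),  # product too long
    PrimerCandidate("toy_4", "AT", 160, 62.0),   # annealing temperature too high
]
selected = [p.name for p in candidates if passes_selection(p)]
```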

Sequencing of PCR amplification products.
To verify the accuracy and authenticity of the PCR amplification products, we selected some of them for sequencing by Shanghai Sangon Biological Engineering Technology (Shanghai, China). PCR products were selected on the basis of single bands and high amplification efficiency.
Data analysis. The number of alleles at each EST-SSR locus was calculated based on presence (1) or absence (0). The observed heterozygosity (H O ), expected heterozygosity (H E ), and polymorphic information content (PIC) were calculated as previously reported 48 . The program POPGENE 32 49 was used to calculate the number of polymorphic loci (NPL), the percentage of polymorphic loci (PPL), the observed number of alleles (na), the effective number of alleles (ne), Nei's gene diversity (h) and Shannon's information index (I). Nei's unbiased genetic distance and UPGMA were used for cluster analysis with the SAHN clustering module of NTSYSpc v.2.1 to generate a dendrogram 50 . The model-based approach implemented in the program STRUCTURE 2.3 was used to subdivide the individuals into subgroups 51 .
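For readers unfamiliar with these statistics, the sketch below computes H O , H E and PIC for a single locus from diploid genotype calls. It uses the standard textbook definitions (H E = 1 − Σp², and PIC = 1 − Σp² − ΣΣ2p_i²p_j² over allele pairs i < j), which are assumed here and may differ in detail from the method of reference 48. The function name and the toy genotypes are hypothetical.

```python
from collections import Counter
from itertools import combinations

def locus_stats(genotypes):
    """Compute allele count, H_O, H_E and PIC for one codominant locus.

    genotypes is a list of (allele_a, allele_b) tuples, one per diploid individual.
    """
    allele_counts = Counter(a for genotype in genotypes for a in genotype)
    total = sum(allele_counts.values())
    freqs = [count / total for count in allele_counts.values()]

    # Expected heterozygosity: 1 minus the sum of squared allele frequencies.
    h_e = 1 - sum(p * p for p in freqs)
    # PIC subtracts a further term for each unordered pair of alleles.
    pic = h_e - sum(2 * p * p * q * q for p, q in combinations(freqs, 2))
    # Observed heterozygosity: fraction of individuals with two different alleles.
    h_o = sum(1 for a, b in genotypes if a != b) / len(genotypes)

    return {"n_alleles": len(freqs), "H_O": h_o, "H_E": h_e, "PIC": pic}

# Toy genotypes (allele sizes in bp) for illustration only.
toy_genotypes = [(150, 150), (150, 156), (156, 156), (150, 156)]
toy_stats = locus_stats(toy_genotypes)
```

For two alleles at equal frequency, as in the toy data, these definitions give H E = 0.5 and PIC = 0.375, which is the maximum PIC a biallelic locus can reach.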
Data availability. The RNA-seq data supporting the results of this article are available at NCBI under BioProject with accession PRJNA331091.