Construction of a high-density linkage map and QTL mapping for important agronomic traits in Stylosanthes guianensis (Aubl.) Sw.

Stylosanthes guianensis (Aubl.) Sw. is an economically important pasture and forage legume in tropical regions of the world. Genetic improvement of the crop can be enhanced through marker-assisted breeding. However, neither single nucleotide polymorphism (SNP) markers nor SNP-based genetic linkage map has been previously reported. In this study, a high-quality genetic linkage map of 2572 SNP markers for S. guianensis is generated using amplified-fragment single nucleotide polymorphism and methylation (AFSM) approach. The genetic map has 10 linkage groups (LGs), which spanned 2226.6 cM, with an average genetic distance of 0.87 cM between adjacent markers. Genetic mapping of quantitative trait loci (QTLs) for important agronomic traits such as yield-related and nutritional or quality-related traits was performed using F2 progeny of a cross between a male-sterile female parent TPRC1979 and male parent TPRCR273 with contrasting phenotypes for morphological and physiological traits. A total of 30 QTLs for 8 yield-related traits and 18 QTLs for 4 nutritional or quality-related traits are mapped on the linkage map. Both the high-quality genetic linkage map and the QTL mapping for important agronomic traits described here will provide valuable genetic resources for marker-assisted selection for S. guianensis.

The genus Stylosanthes, belonging to the tribe Aeschynomeneae, subtribe Stylosanthinae and the family Fabaceae, comprises 48 species native to tropical, subtropical and temperate regions of America, Africa and Southeast Asia, mainly South America 20,21 . Stylosanthes (stylo) is currently used as cut-and-carry feed for livestock, wasteland reclamation, fallow crops and hay 22,23 . Several species of stylo such as S. guianensis, S. hamata, S. scabra, S. fruticosa, S. seabrana, are considered as economically important tropical pasture legumes due to their high productivity, wide adaptability, potential in improving soil fertility (through nitrogen fixation, reclaiming degraded wastelands, and water and soil conservation) and extensive use in a range of agricultural systems 22 . Of these, S. guianensis is the most widespread perennial herbaceous Stylosanthes species in Australia, China, India and many other tropical countries of Africa, Southeast Asia and South America 24,25 . S. guianensis, a diploid species (2n = 20) with great morphological diversity 26 , has a mixed mating system with predominance of autogamy, and the outcrossing rate is estimated as 26% 27 .
Although genetic linkage maps of an interspecific cross between S. hamata and S. scabra 28 , and of S. scabra 29 had been reported with RAPD markers of low polymorphism, to date, high-density genetic linkage map for S. guianensis is not available. Even the literature on the development and application of molecular markers for S. guianensis is still relatively scarce. Microsatellites or simple sequence repeats (SSRs) have been developed to study the genetic diversity and population structure of S. guianensis [30][31][32] , and to determine the mating system of S. guianensis. Sequence-related amplified polymorphism (SRAP) has been used to select the true hybrids of S. guianensis 33 . The generation of a high-density genetic linkage map is critical step toward providing the framework for stylo genetic improvement through MAS. However, due to the non-availability of co-dominant markers of high level polymorphism, high-density linkage map for S. guianensis has not been generated successfully.
AFSM represents a quick and simple SNP discovery based on the use of two restriction enzyme pairs (EcoRI-MspI and EcoRI-HpaII) and next-generation sequencing. It has been successfully applied to construct high-density linkage map for cassava (Manihot esculanta Crantz), proving that it is an efficient and cost-effective for high-throughput strategy for de novo SNP genotyping of large genomes 14 . Here, we describe the development of the first high-density linkage map using AFSM approach and QTL mapping for important agronomic traits in S. guianensis using the genome-wide composite interval mapping (GCIM). These resources will help to promote genetic dissection of important agronomic traits and subsequent MAS in accelerating genetic improvement for S. guianensis.

Materials and Methods
plant materials and DNA extraction. A F 2 population of 202 individuals was produced by self-pollinating a F 1 hybrid derived from the cross between S. guianensis var. TPRC1979 (female) and var. TPRCR273 (male) with contrasting phenotypes for morphological and physiological traits 34 . Blocks for F 2 population, F 1 and their parents were established in a randomized complete block design with plant and row spacing of 2 m at the base of teaching and research (Danzhou Campus), Tropical Agriculture and Forestry Institute, Hainan University in 2015. Total genomic DNA was extracted from the young fully-expanded leaves of the parents, F 1 and F 2 progenies with the DNeasy 96 Plant Kit (Qiagen), and the two technical replicates were pooled for each individual. DNA was quantified using a NanoDrop 2000C spectrophotometer (ND2000; Thermo Fisher Scientific, USA), and DNA concentrations were normalized to 100 ng/μL. DNA quality and integrity were measured by electrophoresis on 1.0% agarose gels.
AFsM library construction, sequencing and genotyping. AFSM library construction, sequencing and genotyping followed the previously established protocol 14 . The genomic DNA of each sample was digested with two restriction enzyme pairs, EcoRI-MspI and EcoRI-HpaII (New England BioLabs Inc., Ipswich, MA) and ligated to barcode adapters and methylation adapters. Restriction fragments from each library were then amplified, and the resulting PCR products ranging from 250 bp to 500 bp in size were isolated to construct EcoRI-MspI library and EcoRI-HpaII library. Then paired-end sequencing was conducted on an Illumina HiSeq2500 system (Illumina Inc., San Diego, CA). Sequenced reads were processed using custom Perl scripts (http://afsmseq. sourceforge.net/), and the filtered reads were used for SNP identification using the TASSEL-GBS pipeline 35 and VCFtools_v0.1.9 (http://vcftools.sourceforge.net/) 36 . Genetic linkage map construction. Bi-allelic SNPs identification and genetic linkage map construction with JoinMap 4.1 37 were performed using procedures described by Xia et al. 14 . After the evaluation of all pairs of tags for the presence of at least five reads, bi-allelic SNPs (indels and methylation polymorphism sites) were identified by querying the filtered tags for pairs of sequences with the following requirements: identical in at least five reads; present in >50% of the individuals; passed a Fisher's exact test for independence; fit to the expected Mendelian segregation ratio as calculated with the chisquare test (χ 2 ) for P < 0.01; possessed a recombination frequency of <0.4; had AFSM markers specific to the female or male parent that fit to a 1:1 segregation ratio in addition to shared AFSM markers that fit to a 3:1 segregation ratio in the F1 population. The parent-specific AFSM markers, which segregated at a 1:1 ratio in the population, were set to the lm × ll (marker in the female parent) and nn × np (marker in the male parent) configuration, respectively. The AFSM markers that were present in both parents and segregated at a 3:1 ratio in the population were labeled as hk × hk (marker present in both parents) configuration. The Kosambi mapping function was used for map distance estimation, and the "Ripple" function was employed to confirm the orders of markers within each linkage group. phenotypic assessment of morphological and physiological indexes. Agronomic traits of F2 population were determined with three replicates for each item, including 10 yield-related indexes such as fresh www.nature.com/scientificreports www.nature.com/scientificreports/ weight (FW), dry weight (DW), the fresh weight: dry weight ratio (FW/DW), plant height (PH), primary branch number (PBN), plant width (PW), the maximum length of branches (LB), leaf length (LL), leaf width (LW) and the leaf length: leaf width ratio (LL/LW), and 8 nutritional or quality-related indexes such as the contents of crude protein, crude fiber, crude fat, crude ash, calcium, phosphate, potassium and magnesium. Plants propagated from cuttings of F1 and F2 individuals were used to repeat the experiments. Phenotypic evaluation and data analysis were carried out with the methods described by Ding et al. 34 .
QtL analysis and mapping. The GCIM 38 , QTL.gCIMapping 3.1, was applied to map QTLs. Permutation tests were executed to determine the statistical significance of LOD threshold for each trait with the number of permutations set at 10,000. logarithm of the odds (LOD) peaks exceeding LOD threshold (>2.5) were identified as QTL, and based on the location of its LOD peak and the surrounding region, the position of each QTL was determined.

Results
Genetic linkage mapping. A total of 223,011 SNPs were identified with AFSM approach. After filtering, the low-quality SNPs with coverage less than one third of mapping population and the SNPs showing a statistically significant deviation from Mendelian segregation were removed, and a final set of 13,661 high-quality SNP markers were evaluated for subsequent genetic linkage mapping. Excluding the unlinked and co-segregated SNP markers, we constructed the linkage map with 2572 markers distributed across 10 linkage groups (LGs). The resulting map had a total length of 2226.6 cM and an average interlocus distance of 0.87 cM (Fig. 1, Table 1, Table S1). The longest LG (LG9) contained 249 markers spanning 317.3 cM and the shortest LG (LG2) contained 273 markers spanning 121 cM, with an average genetic distance between successive markers being 1.27 cM and 0.44 cM, respectively. The highest-density LG (LG1) consisted of 600 markers spanning 162.6 cM, with an average marker interval being 0.41 cM.  www.nature.com/scientificreports www.nature.com/scientificreports/ evaluation of the phenotypic data. On the basis of the phenotypic data 34 , it was revealed that two parents were clearly different in terms of the agronomic traits, and the phenotypic values for each trait of segregation population exhibited wide range, with the coefficient of variation (CV) ranging from 8.31% (LL) to 46.19% (FW) for the yield-related traits and from 10.32% (crude ash content) to 34.01% (crude fat content) for nutritional or quality-related traits (Tables S2, S3 and S4). Transgressive segregation in both directions was observed for all traits. All traits exhibited normal distributions based on the testing for normality with Shapiro-Wilk test, suggesting they are quantitative traits (Tables S5 and S6).
Correlation analysis showed that highly significant positive correlation existed between the FW and other yield-related traits (Table S7). Crude protein content had highly significant positive correlation with P content, but had highly significant negative correlation with crude fiber content and not any obvious correlation with other nutritional traits (Table S8). Crude fiber content was significantly positively correlated with crude fat content but significantly negatively correlated with crude ash content and P content. Ca content had significant positive correlations with P content and Mg content but significant negative correlation with K content.
QtL mapping analysis of agronomic traits. A total of 30 QTLs, significant at a LOD score of >2.5, were identified for 8 yield-related traits including PW, LB, PH, LL, LW, PBN, FW/DW and LL/LW (Table 2). Eighteen QTLs controlling four nutritional or quality-related traits (crude protein content, crude ash content, calcium content, and potassium content) were mapped on the linkage map (Table 3).
Five QTLs for PW were identified on four LGs, namely, 2 on LG5, 1 on LG6, 1on LG7 and 1 on LG9 respectively, and explained 1.9-9.5% of the total phenotypic variance at a LOD score of 3.04-9.62. Twelve QTLs for LB were mapped to three LGs, namely, 6 QTLs on LG4, 3 QTLs on LG5 and 3 QTLs on LG9, and the phenotypic variances explained by these QTLs varied from 1.1% to 4.3% at the LOD scores of 5.64-65.82. Seven QTLs for PH were detected on five LGs, namely, 1 on LG2, 2 on LG3, 1 on LG4, 1 on LG5, 1 on LG7 and 1 on LG8, explaining 1.8-6.6% of the phenotypic variance at a LOD score of 2.52-4.65. Only one QTL related to LL were detected on LG8, accounting for 1.3% of the phenotypic variance at a LOD score of 2.51. Only one QTL for LW were identified  www.nature.com/scientificreports www.nature.com/scientificreports/ on LG8, accounting for 4.9% of the phenotypic variance at a LOD score of 4.65. Only one QTL for PBN were identified on LG4, accounting for 56.2% of the phenotypic variance at a LOD score of 5.37, and had the largest contribution to the trait's variance. Two QTLs for FW/DW were distributed across two LGs, namely, 1 on LG8 and 1 on LG10, explaining 4.2-5.0% of the phenotypic variance at a LOD score of ranging from 5.28 to 6.39. Only one QTL for LL/LW identified on LG9, explained 1.8% of the total phenotypic variance at a LOD score of 2.57.

Discussion
Construction of a high-density genetic map can provide a valuable resource for understanding the basis of important complex agronomic traits in S. guianensis. For S. guianensis, the major hurdles in constructing useful genetic map included the lack of the highly polymorphic, co-dominant molecular markers, and the difficulty in artificial emasculation and hybridization. Taking the advantages of the AFSM technology, a cost-effective and improved high-throughput SNP discovery approach which is better suited for non-model species 14 , and with a mapping population created with a male-sterile female parent (S. guianensis var. TPRC1979) newly found in China, we successfully developed the first high-density genetic linkage map for S. guianensis with 2572 polymorphic SNP markers, which spanned 2226.6 cM with an average distance of 0.87 cM between the flanking markers. With the extensive construction of genetic linkage maps, many excellent software programs have been developed such as FsLinkageMap by Tong et al. 39 for constructing high-density genetic linkage maps in some outcrossing plant species, especially in forest trees. S. guianensis is a perennial herbaceous species with predominance of autogamy, and the parents, S. guianensis var. TPRC1979 (female) and var. TPRCR273 (male), could be regarded as inbred lines, thus the linkage map construction in this study was performed with the JoinMap 4.1 37 . S. guianensis has 10 chromosomes in its haploid genome and current genetic map developed in this study is composed of 10 LGs, suggesting that the number of LGs is consistent with haploid chromosome number, and provides adequate genome coverage.
Kazan et al. 28 developed a linkage map containing a total of 44 RAPD loci based on an interspecific cross between Stylosanthes hamata and S. scabra. Thumma et al. 29 reported that a genetic linkage map of S. scabra consisting of 120 RAPD markers had been constructed, with the total genome length of 1406 cM and an average interval of 12 cM. However, these maps with average marker spacing above 5 cM could not be considered as the high-density maps. In addition, RAPD markers are not only low polymorphic and anonymous (their nucleotide sequences are not known) but also difficult to be transferred to new populations due to their dominant nature.
A high-density genetic map is principally applicable for fine-mapping of QTLs for important agronomic traits. Despite the importance of the agronomic traits such as yield-related and nutritional or quality-related traits, few QTL studies for these traits have been reported due to the lack of high-density genetic maps. Here, for the first time, we mapped the QTLs for the important agronomic traits on the basis of the high-density genetic map we

Trait
LG QTL Location (cM) Flanking marker LOD r 2 (%)  Table 3. Quantitative trait loci (QTLs) detected for the nutritional or quality-related traits based on the data from F2 population derived from the cross between TPRC1979 (female) and TPRCR273 (male) of Stylosanthes guianensis. r 2 (%): proportion of phenotypic variance explained by single QTL.
www.nature.com/scientificreports www.nature.com/scientificreports/ obtained. In total, 30 QTLs for 8 yield-related traits (PW, LB, PH, LL, LW, PBN, FW/DW and LL/LW) and 18 QTLs controlling 4 quality-related traits (crude protein content, crude ash content, calcium content, and potassium content) were detected on the linkage map. Preliminary mapping QTLs with the MapQTL 5.0 40 using both interval mapping (IM) and multiple QTL model (MQM) mapping generated many problematic results such as two QTLs in the same marker interval, the total of phenotypic variation explained by those QTLs more than 100%, and too many major effective QTLs detected. These issues were addressed very well by the GCIM developed by Wen et al. 38 , which is a more powerful method for the detection of closely linked and small-effect QTLs. Next step is to validate the QTL mapping in different environments and/or genetic backgrounds and in relatively large mapping populations.
In summary, we have constructed the first high-density genetic map for S. guianensis, which provides a platform for mapping QTLs and cloning genes and comparative genomic study. Once confirmed, the localization of these major QTLs on the linkage map based on next generation sequencing methods will be valuable resources for QTL cloning and genetic improvement by pyramiding favorable alleles into the improved cultivars.